SegAnnDB

The basic idea is to find a model which agrees with your visual definition of a breakpoint. Our approach to segmentation uses computer vision and machine learning, exploiting the strong points of your eyes and mathematical models:

Breakpoint detector | Strong points | Weak points |
---|---|---|
Your eyes | Identifying outliers, noise, and breakpoints over large regions | Finding the exact breakpoint |
Mathematical models | Using optimization to find the breakpoints of maximum likelihood | Tuning parameters, which limit the number of breakpoints and must be chosen using various heuristics |

Each scatterplot of probe logratio is shown with a piecewise constant smoothing model. If the breakpoints detected by the model are not what you expect, you can add breakpoint annotations to update the segmentation. Breakpoint annotations are regions that encode your visual interpretation of the signal. They were first described for segmentation model selection in "Learning smoothing models of copy number profiles using breakpoint annotations".

When a profile is uploaded, each chromosome is pre-processed:

- Python Imaging Library is used to draw PNG scatterplots for several zoom levels, so even high-density profiles can be quickly plotted and zoomed.
- Pruned Dynamic Programming (DP) is used to find the optimal segmentation for several model sizes.
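To make "optimal segmentation for several model sizes" concrete, here is a minimal sketch of the underlying dynamic program. It assumes a squared-error cost around each segment mean and uses the plain, unpruned recursion, which is quadratic in the number of probes. The pruned algorithm used by the site computes the same optima faster. The function name and return format are illustrative, not part of SegAnnDB.

```python
def optimal_segmentations(y, max_segments):
    """Return {k: (cost, breakpoints)} for k = 1..max_segments.

    Plain O(max_segments * n^2) dynamic programming, NOT the pruned variant.
    A breakpoint b means the mean changes between y[b - 1] and y[b].
    """
    n = len(y)
    # Prefix sums give the squared error of any interval in O(1).
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        # Squared error of y[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting y[:j] into k segments.
    cost = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    # last[k][j]: start index of the final segment in that best split.
    last = [[0] * (n + 1) for _ in range(max_segments + 1)]
    for j in range(1, n + 1):
        cost[1][j] = sse(0, j)
    for k in range(2, max_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    last[k][j] = i

    models = {}
    for k in range(1, min(max_segments, n) + 1):
        # Walk back through the stored split points to recover breakpoints.
        breaks, j = [], n
        for seg in range(k, 1, -1):
            j = last[seg][j]
            breaks.append(j)
        models[k] = (cost[k][n], sorted(breaks))
    return models


if __name__ == "__main__":
    signal = [0, 0, 0, 5, 5, 5, 1, 1, 1]
    for k, (c, breaks) in optimal_segmentations(signal, 3).items():
        print(k, round(c, 3), breaks)
```

Computing every model size once at upload time means that later steps only need to choose among stored segmentations rather than refit the signal.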

**Optimal fitting:** the displayed segmentation can be
edited by adding 0/1 breakpoint annotations on the bottom half of the
plot. Given a set of breakpoint annotations, **how do we find a
consistent segmentation?** The color of the displayed breakpoints
indicates the algorithm used to find the displayed segmentation:

- **green=PrunedDP**. After annotation, we first check if there are any segmentations with 0 annotation error from Pruned DP. If there are, we use one of those.
- **purple=SegAnnot**. If there are no 0-error segmentations from Pruned DP, then SegAnnot is used to find the most likely segmentation that is consistent with the given annotations.

Thus the segmentation model is always consistent with the visually-defined breakpoint annotations.
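The check for 0-error segmentations can be sketched as follows. This assumes that each annotation is a region `(start, end, expected)` in which `expected` is 0 (no breakpoint) or 1 (exactly one breakpoint), and that a model counts one error for every region whose breakpoint count differs from `expected`. The function names and data layout are hypothetical, and the SegAnnot fallback itself is not shown.

```python
def annotation_error(breakpoints, annotations):
    """Count annotated regions whose breakpoint count differs from expected."""
    errors = 0
    for start, end, expected in annotations:
        found = sum(1 for b in breakpoints if start <= b <= end)
        if found != expected:
            errors += 1
    return errors


def consistent_models(models, annotations):
    """Return the model sizes whose breakpoints give 0 annotation error.

    models maps a model size k to its list of breakpoint positions.
    An empty result is the case where a fallback such as SegAnnot is needed.
    """
    return [
        k
        for k, breaks in sorted(models.items())
        if annotation_error(breaks, annotations) == 0
    ]


if __name__ == "__main__":
    models = {1: [], 2: [30], 3: [30, 60], 4: [30, 45, 60]}
    annotations = [(25, 35, 1), (40, 50, 0)]
    print(consistent_models(models, annotations))
```

With few annotations, several model sizes usually pass this check, which is why a separate prediction step is needed to choose among them.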

**Optimal prediction:** if there are few breakpoint annotations,
then there may be several Pruned DP segmentations which are
consistent. For example, when there are no breakpoint annotations, all
the Pruned DP models are consistent. In that case, **how do we
predict the optimal number of breakpoints?** We use
a max-margin
interval regression model learned on all the other annotated
signals to select the number of breakpoints. The breakpoint
detection of this model gets better as you add more breakpoint
annotations.
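One common way to turn a learned prediction into a number of breakpoints is penalized model selection: the regression model predicts a penalty for the signal, and the chosen model size minimizes cost plus penalty times the number of segments. The sketch below shows only that selection step. The penalty form and the function name are assumptions for illustration; this page does not state how the site combines the prediction with the Pruned DP models.

```python
def select_num_segments(costs, penalty):
    """Pick the model size k minimizing costs[k] + penalty * k.

    costs maps a model size k to the fitting cost of the best k-segment model.
    penalty is assumed to come from a learned predictor (not shown here).
    Ties are broken in favor of the smaller model.
    """
    return min(sorted(costs), key=lambda k: costs[k] + penalty * k)


if __name__ == "__main__":
    costs = {1: 100.0, 2: 24.0, 3: 0.5, 4: 0.4}
    for penalty in (0.01, 1.0, 50.0, 200.0):
        print(penalty, select_num_segments(costs, penalty))
```

Under this assumption, a larger predicted penalty selects fewer breakpoints, so learning from annotated signals amounts to learning which penalties select models that agree with the annotations.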

**Copy number annotations:** there are 5 copy number annotations:

Annotation | Approximate copy number |
---|---|
amplification | >>2 |
gain | >2 |
normal | ≈2 |
loss | <2, typically 1 |
deletion | <<2, typically 0 |

Each copy number annotation is used to assign a copy number status to its overlapping segment. Annotated segments are the same color as the copy number annotations, and un-annotated segments are green. The annotated segments of a profile are used to predict copy number status for un-annotated segments on the same profile. But copy number is not generalized between profiles, so you need to label a few segments on each profile.
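The two steps in this paragraph can be illustrated with a small sketch. The first function assigns each annotation's label to the segment it overlaps. The second fills in un-annotated segments by copying the label of the annotated segment with the closest mean logratio on the same profile. That nearest-mean rule is an assumption chosen for simplicity, since this page does not describe the actual predictor, and all names here are hypothetical.

```python
def label_segments(segments, annotations):
    """Give each segment the label of the annotation it overlaps most.

    segments: list of (start, end, mean_logratio)
    annotations: list of (start, end, label)
    Segments with no overlapping annotation get None.
    """
    labels = []
    for seg_start, seg_end, _ in segments:
        best_label, best_overlap = None, 0
        for ann_start, ann_end, label in annotations:
            overlap = min(seg_end, ann_end) - max(seg_start, ann_start)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        labels.append(best_label)
    return labels


def predict_unlabeled(segments, labels):
    """Fill None labels using the annotated segment with the nearest mean.

    ASSUMPTION: nearest-mean prediction stands in for the site's predictor.
    Only segments of the same profile are used, matching the text above.
    """
    known = [(seg[2], lab) for seg, lab in zip(segments, labels) if lab is not None]
    if not known:
        return list(labels)
    filled = []
    for seg, lab in zip(segments, labels):
        if lab is None:
            lab = min(known, key=lambda m: abs(m[0] - seg[2]))[1]
        filled.append(lab)
    return filled


if __name__ == "__main__":
    segments = [(0, 100, 0.0), (100, 200, 0.6), (200, 300, 0.05), (300, 400, -0.9)]
    annotations = [(10, 50, "normal"), (120, 180, "gain"), (310, 390, "loss")]
    labels = label_segments(segments, annotations)
    print(predict_unlabeled(segments, labels))
```

Because the known labels come only from the current profile, a profile with no copy number annotations gets no predictions, which is why a few segments must be labeled on each profile.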

Rendering large zoomed PNG scatterplots of high-density profiles is not supported on all browsers. For example, chr2 on profile dr3hg19 has 153,662 probes, and the following table shows which web browsers can display the different zoom levels.

Zoom level | Width (pixels) | Chrome/Safari iPad | Firefox Ubuntu/Windows | Chrome Windows | Chrome Ubuntu |
---|---|---|---|---|---|
standard | 1,500 | yes | yes | yes | yes |
ipad | 20,000 | yes | yes | yes | yes |
1pixel_per_probe | 153,662 | yes | yes | | |
5pixels_per_probe | 768,310 | yes | | | |

W3C standard HTML5 web site made by Toby Dylan Hocking using Emacs, Python, Pyramid, SegAnnot, Berkeley DB, D3.

Thanks to INRIA for hosting the server, and INRIA GForge for hosting the GPL-3 free software source code which runs this site.