The basic idea is to find a model which agrees with your visual definition of a breakpoint. Our approach to segmentation uses computer vision and machine learning, exploiting the strong points of your eyes and mathematical models:
|Breakpoint detector||Strong points||Weak points|
|Your eyes||Identifying outliers, noise, and breakpoints over large regions||Finding the exact breakpoint|
|Mathematical models||Using optimization to find the breakpoints of maximum likelihood||Tuning parameters that limit the number of breakpoints must be chosen using various heuristics|
Each scatterplot of probe logratio is shown with a piecewise constant smoothing model. If the breakpoints detected by the model are not what you expect, then you can add breakpoint annotations to update the segmentation. Breakpoint annotations are regions that encode your visual interpretation of the signal, and were first described for segmentation model selection in "Learning smoothing models of copy number profiles using breakpoint annotations".
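To make the term "piecewise constant smoothing model" concrete, here is a minimal Python sketch that computes the fitted value for every probe once the breakpoints are known. The function name and the index convention (a breakpoint at index i means a new segment starts at probe i) are assumptions made for illustration; this is not the code that runs this site.

```python
def piecewise_constant_fit(logratio, breakpoints):
    """Return one fitted value per probe: the mean of the segment it falls in.

    logratio: list of floats, ordered by genomic position.
    breakpoints: sorted probe indices i with 0 < i < len(logratio);
        a new segment starts at probe i (hypothetical convention).
    """
    # Segment boundaries: start of signal, each breakpoint, end of signal.
    bounds = [0] + list(breakpoints) + [len(logratio)]
    fitted = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = logratio[start:end]
        mean = sum(segment) / len(segment)
        # Every probe in the segment gets the same (constant) value.
        fitted.extend([mean] * len(segment))
    return fitted
```

For example, `piecewise_constant_fit([0.0, 2.0, 10.0, 12.0], [2])` fits two segments with means 1.0 and 11.0. With an empty list of breakpoints, the model is a single flat line at the overall mean.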
When a profile is uploaded, each chromosome is pre-processed:
Optimal fitting: the displayed segmentation can be edited by adding 0/1 breakpoint annotations on the bottom half of the plot. Given a set of breakpoint annotations, how do we find a consistent segmentation? The color of the displayed breakpoints indicates the algorithm used to find the displayed segmentation:
Optimal prediction: if there are few breakpoint annotations, then there may be several Pruned DP segmentations which are consistent. For example, when there are no breakpoint annotations, all the Pruned DP models are consistent. In that case, how do we predict the optimal number of breakpoints? We use a max-margin interval regression model learned on all the other annotated signals to select the number of breakpoints. The breakpoint detection of this model gets better as you add more breakpoint annotations.
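The two steps described above (keeping only the segmentations that agree with the annotations, then choosing among them) can be sketched as follows. This is a simplified illustration under stated assumptions: a segmentation is treated as consistent when each annotated region contains exactly the annotated number of breakpoints (0 or 1), and the final choice uses a penalized cost in which the `penalty` argument stands in for the output of the learned max-margin interval regression model. The function names, the data layout, and the penalized criterion are hypothetical and are not taken from the site's source code.

```python
def count_in_region(breakpoints, start, end):
    """Count breakpoint positions that fall inside [start, end]."""
    return sum(1 for b in breakpoints if start <= b <= end)


def is_consistent(breakpoints, annotations):
    """Check a segmentation against 0/1 breakpoint annotations.

    annotations: list of (start, end, expected), where expected is 0 or 1.
    """
    return all(
        count_in_region(breakpoints, start, end) == expected
        for start, end, expected in annotations
    )


def select_model(models, annotations, penalty):
    """Pick the number of breakpoints among consistent candidate models.

    models: dict mapping number of breakpoints k to a pair
        (breakpoint positions, squared error of the fit).
    penalty: cost per breakpoint; an assumed stand-in for the learned model.
    Returns the selected k, or None if no candidate is consistent.
    """
    consistent = {
        k: (positions, error)
        for k, (positions, error) in models.items()
        if is_consistent(positions, annotations)
    }
    if not consistent:
        return None
    # Among consistent models, trade off fit error against model size.
    return min(consistent, key=lambda k: consistent[k][1] + penalty * k)
```

With no annotations, every candidate passes `is_consistent`, so the choice depends entirely on the penalty. Each added annotation shrinks the set of consistent candidates.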
Copy number annotations: there are 5 copy number annotations:
|Annotation||Approximate copy number|
|loss||<2, typically 1|
|deletion||<<2, typically 0|
Each copy number annotation is used to assign a copy number status to its overlapping segment. Annotated segments are the same color as the copy number annotations, and un-annotated segments are green. The annotated segments of a profile are used to predict copy number status for un-annotated segments on the same profile. However, copy number predictions are not generalized between profiles, so you need to label a few segments on each profile.
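A rough Python sketch of this labeling logic follows. The overlap rule mirrors the description above. The prediction rule for un-annotated segments (copy the label of the annotated segment with the closest mean logratio) is only an assumption chosen for illustration, since this page does not say how the prediction is made. The function name, the dictionary keys, and the "unlabeled" placeholder are likewise hypothetical.

```python
def assign_copy_number(segments, annotations):
    """Label each segment of one profile with a copy number status.

    segments: list of dicts with keys 'start', 'end', 'mean' (mean logratio).
    annotations: list of (start, end, label) copy number annotations.
    Returns one label per segment, in the same order.
    """
    # Step 1: a segment takes the label of the first annotation it overlaps.
    labels = []
    for seg in segments:
        label = None
        for a_start, a_end, a_label in annotations:
            if seg["start"] <= a_end and a_start <= seg["end"]:
                label = a_label
                break
        labels.append(label)

    # Step 2 (assumed rule): an un-annotated segment copies the label of the
    # annotated segment on the same profile with the nearest mean logratio.
    known = [
        (seg["mean"], label)
        for seg, label in zip(segments, labels)
        if label is not None
    ]
    result = []
    for seg, label in zip(segments, labels):
        if label is None and known:
            label = min(known, key=lambda pair: abs(pair[0] - seg["mean"]))[1]
        result.append(label if label is not None else "unlabeled")
    return result
```

Because the function only sees the segments and annotations of a single profile, nothing is carried over from one profile to another, which matches the behavior described above.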
Rendering large zoomed PNG scatterplots of high-density profiles is not supported on all browsers. For example, chr2 on profile dr3hg19 has 153,662 probes, and the following table shows which web browsers can display the different zoom levels.
|Zoom level||Width (pixels)||Chrome/Safari
|W3C standard HTML5 web site made by Toby Dylan Hocking using Emacs, Python, Pyramid, SegAnnot, Berkeley DB, D3.|
|Thanks to INRIA for hosting the server, and INRIA GForge for hosting the GPL-3 free software source code which runs this site.|