Copy number variation detection method based on next generation sequencing

A copy number variation detection method, applied in the field of copy number variation detection based on next-generation sequencing. It addresses systematic errors, neglect of the correlation between internal sites, and small sample sizes, so as to reduce systematic or sequencing-platform errors, improve detection efficiency, and improve accuracy.

Active Publication Date: 2016-07-13
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

First, the problem itself is difficult: a) the number of sites is as high as 1.8 million while the number of samples is often small, giving a high-dimensional, small-sample data pattern; b) different sequencing platforms and sequencing levels introduce systematic errors, and samples from different sequencing levels must be normalized; c) the read signal (read depth, RD) at a gene locus is easily affected by noise such as sequencing errors and alignment errors; d) the sites are strongly correlated rather than independent, so the detection factors interact; e) detecting copy number amplification or deletion requires two features, namely the number of reads at a site and the correlation between sites, and therefore a reasonable mechanism for balancing them.
Second, the theories and methods for solving the problem are challenging: a) the data are large, so controlling the time and space complexity of the computation is difficult; b) it is hard to take full account of the correlation between CNV sites and so reduce the conservatism of the estimated CNV significance level; c) building a null distribution that is consistent with the statistic, so that the estimated significance level is statistically meaningful, is a key issue that has not yet been solved.
[0013] (1) Statistics that use a single CNV site as the primitive tend to give conservative estimates of the significance level. Statistics based on CNV structural fragments retain the inherent structural characteristics of copy number to some extent, but they ignore the correlation between internal sites, which makes it difficult to estimate the significance level of a CNV objectively.
[0014] (2) There is no reasonable balance between the frequency of CNV and the correlation of mutation sites, which makes it difficult to pinpoint the biological relationship between CNV and cancer.
[0015] (3) When a method based on single-sample detection is used to detect cCNV across multiple samples, systematic errors or platform errors are severe.
[0016] (4) Multiple samples from different sequencing platforms or sequencing levels are not combined automatically, which greatly limits the detection of CNV functional patterns that co-occur in multiple samples.
[0017] (5) For sample data with low coverage, existing methods are insensitive and detect poorly.

Examples

Embodiment Construction

[0043] To make the object, technical solution and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the examples. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

[0044] During data processing, the present invention applies different normalization methods to data with different coverage levels. For high-coverage data in particular, it first defines the copy number amplification and deletion states from the characteristics of the data's frequency histogram, and then separates the data into a normal (0) / amplification (1) data set and a normal (0) / deletion (-1) data set. When designing the statistic, the present invention uses a single site as the detection primitive, and when quantifying the statistic it combines the number of reads at a single CNV site with the information of the correlation betwe...
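
The separation into a normal/amplification data set and a normal/deletion data set can be illustrated with a minimal sketch. It assumes the read-depth signal has already been normalized to a ratio around 1.0 and that two cut-offs are available; the function name `split_gain_loss` and the parameters `gain_cut` and `loss_cut` are hypothetical, and the source does not state how the cut-offs are derived from the frequency histogram.

```python
def split_gain_loss(depth_ratio, gain_cut, loss_cut):
    """Encode a samples-by-sites matrix as two separate data sets.

    depth_ratio: list of rows (one per sample) of normalized read-depth ratios.
    gain_cut / loss_cut: assumed cut-offs (hypothetical; not from the source).
    Returns (gain, loss): gain holds 0 (normal) or 1 (amplification),
    loss holds 0 (normal) or -1 (deletion).
    """
    gain = [[1 if v >= gain_cut else 0 for v in row] for row in depth_ratio]
    loss = [[-1 if v <= loss_cut else 0 for v in row] for row in depth_ratio]
    return gain, loss


# Two samples, three sites; the cut-offs are purely illustrative.
ratios = [[1.0, 1.6, 0.4],
          [0.9, 1.0, 1.7]]
gain, loss = split_gain_loss(ratios, gain_cut=1.5, loss_cut=0.5)
```

Keeping the two data sets apart means amplification and deletion signals at the same site in different samples cannot cancel each other when they are later summed across samples.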

Abstract

The invention discloses a copy number variation detection method based on next generation sequencing. The method comprises the following steps: pre-processing the copy number variation data, constructing a sliding window, calculating the statistic, applying a permutation strategy, constructing the null distribution, and evaluating the performance of the algorithm. The performance evaluation judges whether the algorithm achieves a relatively high true positive rate while the false positive rate is kept under control, evaluates whether the algorithm estimates the p value accurately, tests its ability to detect the boundaries of copy number variation, and analyzes its computational complexity. The method resolves copy number variation detection errors caused by differences in sequencing platform and sequencing level, giving a more accurate result. Data are normalized using the characteristics of a multi-peak frequency histogram, so that normal regions and copy number variation regions are divided accurately. A new model is built on the combined effect of the number of variant reads and the correlation between variant sites, which resolves the inconsistency problem and allows the significance level of copy number variation to be estimated objectively.
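
The sliding-window, permutation and null-distribution steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented statistic: the window score here is a plain sum of per-site totals across samples, whereas the source describes a statistic that also accounts for correlation between sites. The function names, the choice to shuffle sites within each sample, and the use of the maximum window score as the null statistic are all assumptions.

```python
import random


def window_scores(matrix, width):
    """Score each sliding window by summing per-site totals across samples."""
    n_sites = len(matrix[0])
    col_sums = [sum(row[j] for row in matrix) for j in range(n_sites)]
    return [sum(col_sums[s:s + width]) for s in range(n_sites - width + 1)]


def permutation_p_values(matrix, width, n_perm=200, seed=0):
    """Estimate a p value per window from a permutation null distribution.

    Assumption: sites are shuffled independently within each sample, and the
    null statistic is the maximum window score of each permuted matrix.
    """
    rng = random.Random(seed)
    observed = window_scores(matrix, width)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        # rng.sample with k == len(row) returns a shuffled copy of the row.
        shuffled = [rng.sample(row, len(row)) for row in matrix]
        null_max = max(window_scores(shuffled, width))
        for i, obs in enumerate(observed):
            if null_max >= obs:
                exceed[i] += 1
    # Add-one correction so that no estimated p value is exactly zero.
    return [(c + 1) / (n_perm + 1) for c in exceed]


# Six samples, twenty sites, with a shared gain at sites 5 to 7.
data = [[1 if 5 <= j <= 7 else 0 for j in range(20)] for _ in range(6)]
p_values = permutation_p_values(data, width=3)
```

In this toy input the window starting at site 5 receives a small p value, while windows containing no gain receive a p value of 1. Comparing each window against the maximum over all windows is one common way to keep the false positive rate under control across many windows; whether the source uses this form is not stated.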

Description

Technical field

[0001] The invention belongs to the technical field of high-throughput sequencing for determining the sequence of DNA molecules, and in particular relates to a copy number variation detection method based on next-generation sequencing.

Background art

[0002] Copy number variation (CNV) is an important phenomenon in cancer genomes. It appears mainly in two states, copy number amplification and copy number deletion, and is closely related to the occurrence and development of cancer cells. Detecting CNVs that co-occur in the same region across multiple cancer samples, integrating and analyzing the impact of CNVs on genome-wide expression levels, and identifying the cancer genes affected by CNVs are of great significance for the study of cancer occurrence and metastasis. Although single-sample CNV detection methods have become increasingly mature, these methods still cannot deliver the sensitivity and accuracy needed to detect CNV regions in multi...

Application Information

IPC(8): G06F19/22, G06F19/24
CPC: G16B30/00, G16B40/00
Inventor: 李垚垚, 袁细国, 张军英, 杨利英, 白俊
Owner XIDIAN UNIV