Three-generation sequencing-based whole genome structure variation analysis method and system

A whole-genome structural variation technology, applied in the field of whole-genome structural variation analysis based on third-generation sequencing. It addresses problems such as low sensitivity, a high single-base error rate, and the randomness of errors, meets accuracy and sensitivity requirements, reduces time consumption, and improves detection speed.

Active Publication Date: 2017-09-19
BEIJING GRANDOMICS BIOTECH
Cites: 5 | Cited by: 25

AI Technical Summary

Problems solved by technology

However, because second-generation sequencing reads are short (about 100-150 bp), they cannot span an entire variant region. Although many algorithms have been applied, detection of genome structural variation with these reads still suffers from low accuracy and low sensitivity. Third-generation sequencing technology produces extremely long read...
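As a rough illustration of why read length matters, the sketch below checks whether a single read can contain an entire insertion plus a flanking anchor on each side. The criterion, the function name, and the 20 bp `min_anchor` value are assumptions chosen for illustration, not parameters from this patent.

```python
def read_can_span_insertion(read_length: int, insertion_length: int, min_anchor: int) -> bool:
    """Return True if one read can hold the whole inserted sequence plus
    `min_anchor` bases of flanking reference on each side (illustrative rule)."""
    return read_length >= insertion_length + 2 * min_anchor


if __name__ == "__main__":
    # Hypothetical 500 bp insertion with 20 bp anchors required on each side.
    print(read_can_span_insertion(150, 500, 20))   # False: short read
    print(read_can_span_insertion(9441, 500, 20))  # True: long read
```

A 150 bp read falls far short of the 540 bp needed here, while a read of the average polymerase read length reported in Table 1 (9,441 bp) clears it easily.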



Examples


Embodiment 1

[0091] Sample: This sample was obtained from a volunteer donor to our company. It has already been well characterized by first-generation and next-generation sequencing, so this example uses it as a demonstration case to illustrate the accuracy of the system.

[0092] Data analysis and result statistics:

[0093] Raw data statistics

[0094] Table 1 Raw data statistics

[0095]
Metric                              Value
Number of sequencing bases          34.28 Gb
Number of polymerase reads          3.59 M
Polymerase read average length      9,441 bp
Polymerase read length N50          16,694 bp
Number of subreads                  12.88 M
Subread average length              2,624 bp
Subread length N50                  3,208 bp
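The N50 values in the table summarize the read length distribution. A minimal sketch of the usual N50 definition follows; the function name and the toy lengths are illustrative only.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the largest length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no reads


if __name__ == "__main__":
    toy = [60, 60, 60, 100]               # hypothetical read lengths
    print(n50(toy), sum(toy) / len(toy))  # 60 70.0
```

The toy input shows that an N50 can be smaller than the mean read length, so an N50 below the average length (as in Table 4) is not contradictory on that ground alone.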

[0096] Alignment result statistics

[0097] Using BLASR, 12.85M reads were aligned to the human reference genome (version hg19).
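Assuming the 12.85M aligned reads are subreads counted against the 12.88M subreads in Table 1 (the text does not state the denominator explicitly), the implied alignment rate can be computed as below. Both counts are rounded, so the result is approximate.

```python
def alignment_rate(aligned_reads: float, total_reads: float) -> float:
    """Fraction of input reads that aligned to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return aligned_reads / total_reads


if __name__ == "__main__":
    rate = alignment_rate(12.85e6, 12.88e6)  # assumed: aligned subreads / all subreads
    print(f"{rate:.2%}")  # 99.77%
```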

[0098] Comparison with standard data

[0099] It is currently known that there are 2,194 deletions and 68 insertions longer than 200 bp in the samples use...
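Comparison against a known variant set such as this is commonly scored with precision and recall. The sketch below matches calls to the standard set by 50% reciprocal overlap. That matching rule, the `SV` record, and the convention of representing an insertion as the interval [pos, pos + inserted length) are all assumptions, since the text does not say how calls were matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive; insertions assumed as [pos, pos + inserted length)
    svtype: str  # e.g. "DEL" or "INS"


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap as a fraction of the longer of the two intervals (0 if none)."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    len_a, len_b = a.end - a.start, b.end - b.start
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0 or len_a <= 0 or len_b <= 0:
        return 0.0
    return min(overlap / len_a, overlap / len_b)


def precision_recall(calls: list[SV], truth: list[SV],
                     min_ro: float = 0.5) -> tuple[float, float]:
    """Precision: share of calls matching a standard variant.
    Recall: share of standard variants matched by at least one call."""
    def matches(x: SV, pool: list[SV]) -> bool:
        return any(reciprocal_overlap(x, y) >= min_ro for y in pool)

    true_calls = sum(matches(c, truth) for c in calls)
    found = sum(matches(t, calls) for t in truth)
    precision = true_calls / len(calls) if calls else 0.0
    recall = found / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 5300, "INS")]
    calls = [SV("chr1", 1100, 2050, "DEL"), SV("chr2", 100, 400, "DEL")]
    print(precision_recall(calls, truth))  # (0.5, 0.5)
```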

Embodiment 2

[0106] Sample: This sample was sequenced genome-wide by our company using third-generation sequencing technology. Its sequencing depth is as high as 100X, so the structural variants detected in it are highly reliable. In this embodiment, the genomic structural variants detected by various systems under high-depth conditions serve as the standard set, and 10X data is randomly selected as test data to evaluate the accuracy of the present invention.
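Randomly drawing a lower-depth subset from high-depth data can be sketched as below: reads are shuffled with a fixed seed and taken until their total length reaches the target depth times the genome size. The function name, the seed, and this stopping rule are assumptions; the text only states that 10X data was randomly selected.

```python
import random


def subsample_to_depth(read_lengths: list[int], target_depth: float,
                       genome_size: int, seed: int = 0) -> tuple[list[int], int]:
    """Return (indices of selected reads, total selected bases).

    Reads are visited in random order and added until the selected bases
    reach target_depth * genome_size (the last read may overshoot slightly).
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    target_bases = target_depth * genome_size
    chosen: list[int] = []
    total = 0
    for i in order:
        if total >= target_bases:
            break
        chosen.append(i)
        total += read_lengths[i]
    return sorted(chosen), total


if __name__ == "__main__":
    # Toy example: 100 reads of 1,000 bp on a hypothetical 10 kb genome (10X),
    # subsampled to 2X.
    idx, bases = subsample_to_depth([1000] * 100, target_depth=2, genome_size=10_000)
    print(len(idx), bases)  # 20 20000
```

Fixing the seed makes the subset reproducible, which matters when the same test data must be rerun through several callers.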

[0107] Data analysis and result statistics:

[0108] The statistics of the test data in this embodiment are as follows:

[0109] Table 4 Raw data statistics

[0110]
Metric                              Value
Number of sequencing bases          34.22 Gb
Number of polymerase reads          2.39 M
Polymerase read average length      14,344 bp
Polymerase read length N50          12,169 bp
Number of subreads                  3.03 M
Subread average length              11,294 bp
Subread length N50                  9,954 bp
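The base count in Table 4 can be converted to an approximate mean depth by dividing by the genome size. The figure of roughly 3.1 Gb for the human genome is an assumption made here; with it, the 34.22 Gb of test data corresponds to about 11X, in line with the roughly 10X subset described for this embodiment.

```python
# Approximate human genome size in bases (an assumption for this sketch).
ASSUMED_GENOME_SIZE = 3.1e9


def mean_depth(total_bases: float, genome_size: float = ASSUMED_GENOME_SIZE) -> float:
    """Mean sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    print(round(mean_depth(34.22e9), 1))  # 11.0
```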

[0111] ...



Abstract

The invention discloses a whole-genome structural variation analysis method and system based on third-generation sequencing. The method comprises the steps of: 1) sequence splitting; 2) sequence alignment; 3) preliminary detection of genome structural variation; 4) combining and screening the preliminary detection results; and 5) functional annotation of genome structural variation. The system comprises a sequence splitting module, a sequence alignment module, a genome structural variation preliminary detection module, a module for combining and screening the preliminary detection results, and a genome structural variation functional annotation module. By integrating the existing third-generation structural variation detection tools PBhoney and Sniffles, the method and system effectively improve the accuracy and sensitivity of genome structural variation detection at low coverage, ensuring reliable results while reducing detection cost.
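Step 4 (combining and screening the preliminary results) could, for example, be sketched as follows: PBhoney and Sniffles calls of the same type are merged when both breakpoints fall within a tolerance, and the merged set is then screened, keeping calls reported by both tools or by one tool with enough supporting reads. The abstract does not give the merging rule; the 500 bp tolerance, the support threshold of 3, and the `Call` record are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    chrom: str
    start: int
    end: int
    svtype: str                 # e.g. "DEL", "INS", "INV"
    support: int                # number of supporting reads
    callers: set[str] = field(default_factory=set)


def merge_calls(pbhoney: list[Call], sniffles: list[Call], tolerance: int = 500) -> list[Call]:
    """Union both call sets, collapsing same-type calls whose start and end
    breakpoints each lie within `tolerance` bp of an already merged call."""
    tagged = [(c, "PBhoney") for c in pbhoney] + [(c, "Sniffles") for c in sniffles]
    tagged.sort(key=lambda pair: (pair[0].chrom, pair[0].start))
    merged: list[Call] = []
    for call, caller in tagged:
        for m in merged:
            if (m.chrom == call.chrom and m.svtype == call.svtype
                    and abs(m.start - call.start) <= tolerance
                    and abs(m.end - call.end) <= tolerance):
                m.callers.add(caller)
                m.support = max(m.support, call.support)
                break
        else:  # no existing call matched: start a new merged call
            merged.append(Call(call.chrom, call.start, call.end, call.svtype,
                               call.support, {caller}))
    return merged


def screen_calls(merged: list[Call], min_support: int = 3) -> list[Call]:
    """Keep calls found by both tools, or by one tool with >= min_support reads."""
    return [m for m in merged if len(m.callers) == 2 or m.support >= min_support]


if __name__ == "__main__":
    pb = [Call("chr1", 1000, 2000, "DEL", 2)]
    sn = [Call("chr1", 1100, 2050, "DEL", 4), Call("chr2", 500, 900, "INS", 1)]
    kept = screen_calls(merge_calls(pb, sn))
    print([(k.chrom, k.start, sorted(k.callers)) for k in kept])
    # [('chr1', 1000, ['PBhoney', 'Sniffles'])]
```

Requiring agreement between two independent callers, or strong read support from one, is one plausible way to trade sensitivity against false positives at low coverage; the patent's actual screening criteria may differ.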

Description

Technical field
[0001] The invention belongs to the field of genome structural variation detection, and in particular relates to a method and system for whole-genome structural variation analysis based on third-generation sequencing.
Background art
[0002] Genome structural variation usually refers to large-fragment insertions, deletions, duplications, inversions, translocations, and DNA copy number variations (CNVs) in the genome. Structural variants affect more of the genome sequence (~13%) than short sequence variants (SNPs, indels, etc.), and therefore also play a very important role in many diseases. At present, genomic structural variation is mainly detected with first-generation technologies such as oligonucleotide-based array-CGH, SNP arrays, MLPA, and qPCR; with BreakDancer, readDepth, DELLY, and Pindel, analysis tools based on second-generation sequencing; and with PBhoney and Sniffles, analysis technologies based on third-generation sequencing...


Application Information

IPC(8): G06F19/18
CPC: G16B20/00
Inventors: 汪德鹏 (Wang Depeng), 方立 (Fang Li), 王凯 (Wang Kai), 张朋 (Zhang Peng), 胡江 (Hu Jiang)
Owner: BEIJING GRANDOMICS BIOTECH