High-fault-tolerance genome complex structure variation detection method based on filtering strategy

A complex structural variation detection technology, applied in genomics, proteomics, instruments, etc., that addresses problems such as interference from sequencing errors.

Active Publication Date: 2020-07-24
XI AN JIAOTONG UNIV
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0007] The technical problem to be solved by the present invention is to provide a high-fault-tolerance genome complex structural variation detection method based on a filtering strategy, so as to eliminate the interference caused by sequencing errors...



Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0080] The present invention provides a highly fault-tolerant genome mutation detection method based on a filtering strategy. The highly fault-tolerant detection algorithm based on a filtering strategy, CIDDⅡ, is a complex indel detection algorithm built on the fault-tolerance mechanisms of the Kalman filter and the support vector machine (SVM).
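The Kalman-filter half of CIDDⅡ can be illustrated with a scalar filter run along the per-site variation score function. This is a minimal sketch, assuming a random-walk state model; the function name `kalman_smooth_1d` and the noise variances `q` and `r` are hypothetical choices for illustration, not parameters taken from the patent.

```python
from typing import List, Sequence


def kalman_smooth_1d(scores: Sequence[float], q: float = 1e-3, r: float = 0.1) -> List[float]:
    """Scalar Kalman filter over a per-site score sequence.

    Assumed model (hypothetical): the true score follows a random walk
    with process-noise variance q, and each observed site score carries
    measurement-noise variance r.
    """
    if not scores:
        return []
    x = float(scores[0])  # state estimate, initialised from the first site
    p = 1.0               # initial estimate variance (assumption)
    out = [x]
    for z in scores[1:]:
        # Predict: a random walk keeps the estimate but inflates its variance.
        p += q
        # Update: blend prediction and observation using the Kalman gain.
        k = p / (p + r)
        x += k * (z - x)
        p *= 1.0 - k
        out.append(x)
    return out


# An isolated one-site spike (e.g. a single sequencing error) is damped.
print(kalman_smooth_1d([0.0] * 10 + [1.0] + [0.0] * 10)[10])
```

A small `q` relative to `r` makes the filter trust the running estimate over any single site, which is what suppresses isolated error-driven spikes while still tracking a sustained rise in the score.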

[0081] Referring to Figure 1, the present invention is a high-fault-tolerance genome complex structural variation detection method based on a filtering strategy, comprising the following steps:

[0082] S1, preprocess the SAM file;

[0083] Traverse all alignment records in the SAM file. The SAM file already provides the alignment (mapping) quality of each read, and the records are sorted accordingly. For each read, only the best-quality alignment is processed; that is, the CIGAR field of each read's best-quality alignment is traversed.
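Step S1 can be sketched with the Python standard library alone. The sketch assumes tab-separated SAM text lines; rather than relying on the sort order described above, it keeps the alignment with the highest MAPQ for each read name, which yields the same selection on a sorted file. The helper names `parse_cigar` and `best_alignments` are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

Cigar = List[Tuple[int, str]]


def parse_cigar(cigar: str) -> Cigar:
    """Split a CIGAR string such as '10M2D5I' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def best_alignments(sam_lines: Iterable[str]) -> Dict[str, Tuple[str, int, Cigar]]:
    """Keep, for each read name, the alignment with the highest MAPQ.

    Returns {read_name: (reference_name, 1-based POS, parsed CIGAR)}.
    """
    best: Dict[str, Tuple[int, str, int, Cigar]] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        f = line.rstrip("\n").split("\t")
        qname, flag, rname = f[0], int(f[1]), f[2]
        pos, mapq, cigar = int(f[3]), int(f[4]), f[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read, or no CIGAR to traverse
        if qname not in best or mapq > best[qname][0]:
            best[qname] = (mapq, rname, pos, parse_cigar(cigar))
    return {q: (r, p, c) for q, (_, r, p, c) in best.items()}
```

In a real pipeline a SAM/BAM library would replace the hand-written parser; the plain-text version is used here only to keep the sketch dependency-free.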

[0084] S2, according to the aligned CIGAR field and the variation score calculation criterion, calculate the va...
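Because the scoring criterion is cut off here, the following is only a sketch under assumed weights: each explicit mismatch (`X`) or deleted reference base (`D`) adds 1, and an insertion adds its length to the reference site on its left. Scores from one read are summed per site, each read then contributes one value to that site's score set, and each set is averaged to give the site's final score, as the abstract describes. The weights `W_MISMATCH`, `W_DEL`, `W_INS` and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical weights; the patent's actual criterion is not given here.
W_MISMATCH, W_DEL, W_INS = 1.0, 1.0, 1.0

Cigar = Sequence[Tuple[int, str]]


def read_site_scores(pos: int, cigar: Cigar) -> Dict[int, float]:
    """Score every reference site covered by one read (1-based coordinates)."""
    local: Dict[int, float] = {}
    ref = pos  # next reference position to be consumed
    for length, op in cigar:
        if op in "M=":
            # 'M' may hide mismatches; this sketch scores only explicit 'X'.
            for site in range(ref, ref + length):
                local.setdefault(site, 0.0)
            ref += length
        elif op in "XD":
            w = W_MISMATCH if op == "X" else W_DEL
            for site in range(ref, ref + length):
                local[site] = local.get(site, 0.0) + w
            ref += length
        elif op == "N":
            ref += length  # skipped reference region: not scored
        elif op == "I":
            anchor = ref - 1 if ref > pos else pos  # site left of the insertion
            local[anchor] = local.get(anchor, 0.0) + W_INS * length
        # S, H and P consume no reference bases and are ignored.
    return local


def site_score_function(alignments: Sequence[Tuple[int, Cigar]]) -> Dict[int, float]:
    """Gather each read's per-site score into that site's set, then average."""
    score_sets: Dict[int, List[float]] = defaultdict(list)
    for pos, cigar in alignments:
        for site, s in read_site_scores(pos, cigar).items():
            score_sets[site].append(s)
    return {site: sum(v) / len(v) for site, v in sorted(score_sets.items())}
```

With two reads starting at site 1, one carrying `3M2D2M` and one a clean `7M`, sites 4 and 5 average to 0.5 and every other site to 0.0.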



Abstract

The invention discloses a high-fault-tolerance genome complex structural variation detection method based on a filtering strategy. The method comprises the following steps: preprocessing an input file in SAM format and traversing the CIGAR field of each read's best-quality alignment; according to the aligned CIGAR field and a variation score calculation criterion, calculating the variation score of each site covered by the current read and storing it in that site's pre-established variation score set; taking the average of each site's variation score set as the site's final variation score, which yields the variation score function of the sample; applying Kalman or Gaussian filtering to the variation score function to obtain a filtered, noise-reduced variation score function; setting a threshold on the filtered function to separate structural variation regions, and extracting their features; and training a support vector machine (SVM) model and using the trained model to classify the structural variation regions, yielding the complex indel result set. The invention eliminates the interference of sequencing errors with the determination of structural variation.
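The filtering and region-separation steps in the abstract can be sketched as follows using the Gaussian option (Kalman filtering is the other alternative named). This is an illustration under assumptions: `sigma`, the threshold, and the feature set (start, end, length, mean, max) are hypothetical, and sites are indexed 0..n-1 along a contiguous score function.

```python
import math
from typing import Dict, List, Sequence


def gaussian_smooth(scores: Sequence[float], sigma: float = 2.0) -> List[float]:
    """Convolve the score function with a Gaussian kernel.

    Near the edges the kernel is renormalised over in-range weights,
    so a constant input stays constant.
    """
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    n = len(scores)
    out = []
    for i in range(n):
        acc = wsum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * scores[j]
                wsum += w
        out.append(acc / wsum)
    return out


def separate_regions(filtered: Sequence[float], threshold: float) -> List[Dict[str, float]]:
    """Return maximal runs of sites with filtered score >= threshold,
    each described by a small hypothetical feature set."""
    regions = []
    i, n = 0, len(filtered)
    while i < n:
        if filtered[i] < threshold:
            i += 1
            continue
        j = i
        while j + 1 < n and filtered[j + 1] >= threshold:
            j += 1
        seg = filtered[i:j + 1]
        regions.append({"start": i, "end": j, "length": j - i + 1,
                        "mean": sum(seg) / len(seg), "max": max(seg)})
        i = j + 1
    return regions
```

The resulting feature dictionaries would then be vectorised and passed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`; the excerpt does not specify the SVM kernel or the exact features, so that step is left out of the sketch.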

Description

Technical field [0001] The invention belongs to the technical field of third-generation nucleic acid sequencing (Single Molecule Real Time, SMRT), and specifically relates to a high-fault-tolerance genome mutation detection method based on a filtering strategy.

Background art [0002] Complex insertion-deletion (complex indel) is a genomic structural variation that is relatively rare in the population but more frequent in tumor genomes. A complex indel is a compound mutation in which a DNA fragment of a gene is deleted; owing to the self-repair mechanism of the DNA molecule, other DNA fragments are subsequently inserted at the same site, and the inserted fragments may be inverted. Dozens of forms of complex indel have been discovered so far. As an important structural variation, the detection of complex indels is the basis for downstream analysis of the correlation between tumor susceptibility and phenotype. Exp...
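The structure of a complex indel described above (a deletion followed by an insertion at the same site, possibly inverted) can be illustrated on a toy sequence. The function names and the example sequence are hypothetical, and "inverted" is modelled here as the reverse complement of the inserted fragment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_complex_indel(ref: str, start: int, del_len: int, insert: str,
                        inverted: bool = False) -> str:
    """Delete ref[start:start + del_len] (0-based) and insert a fragment
    at the same site; if inverted, insert its reverse complement."""
    ins = reverse_complement(insert) if inverted else insert
    return ref[:start] + ins + ref[start + del_len:]


ref = "AAAACCCCGGGGTTTT"
print(apply_complex_indel(ref, 4, 4, "GAT"))                 # AAAAGATGGGGTTTT
print(apply_complex_indel(ref, 4, 4, "GAT", inverted=True))  # AAAAATCGGGGTTTT
```

For the non-inverted case, one way an aligner could express the resulting read against the reference is the CIGAR `4M4D3I8M`: four matches, four deleted reference bases, three inserted bases, then eight matches.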


Application Information

IPC(8): G16B20/20; G16B40/20; G06K9/62
CPC: G16B20/20; G16B40/20; G06F18/2411
Inventors: 张选平 (Zhang Xuanping), 刘佳琦 (Liu Jiaqi), 王嘉寅 (Wang Jiayin), 陈恒伟 (Chen Hengwei), 黄毅 (Huang Yi)
Owner XI AN JIAOTONG UNIV