Third-generation data correction method based on DNA variation detection

A mutation detection and data correction technology, applied in the field of bioinformatics, that addresses problems such as high cost and high error rate, with the effect of reducing cost, improving accuracy and facilitating data analysis.

Active Publication Date: 2018-09-28
BEIJING UNIV OF CHEM TECH +1

AI Technical Summary

Problems solved by technology

The present invention solves the problems of high error rate and high cost in the third-generation sequencing technology thr...

Examples

Embodiment 1

[0029] The third-generation data used in the test are the 85X Escherichia coli (Escherichia coli K12 MG1655 Methylome) sequencing data provided by PacBio (download address: https://github.com/PacificBiosciences/DevNet/wiki/Datasets). The second-generation data are the 290X Escherichia coli Illumina (Escherichia coli K12 MG1655 Methylome) sequencing data downloaded from NCBI's SRA database under accession number ERR022075. The reference genome is the standard Escherichia coli K12 MG1655 reference genome downloaded from the NCBI Genome database (download address: https://www.ncbi.nlm.nih.gov/genome/167?genome_assembly_id=161521).

[0030] The coverage of the PacBio data is set to 10X, 20X, and 30X respectively, and the coverage of the Illumina data is set to 30X. First, the PacBio data are aligned to the reference genome, the numbers of inserted and deleted bases are counted from the mapping information (the CIGAR string) in column 6 of the SAM file, and the sequencing error ...
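
The indel counting step in [0030] can be sketched as follows. This is a minimal illustration, not the program used in the patent: the function names `count_indels` and `indel_totals` are hypothetical, and the sketch stops at raw counts because the text does not show how the counts are converted into an error rate.

```python
import re

# One CIGAR element is a length followed by an operation code (SAM column 6).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_indels(cigar):
    """Return (inserted_bases, deleted_bases, aligned_bases) for one CIGAR string."""
    inserted = deleted = aligned = 0
    if cigar == "*":  # unmapped read: no alignment information
        return inserted, deleted, aligned
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            inserted += n
        elif op == "D":
            deleted += n
        elif op in "M=X":
            aligned += n
    return inserted, deleted, aligned


def indel_totals(sam_lines):
    """Sum indel counts over all alignment records in an iterable of SAM lines."""
    total_ins = total_del = total_aligned = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # malformed record, skip it
            continue
        ins, dels, aligned = count_indels(fields[5])
        total_ins += ins
        total_del += dels
        total_aligned += aligned
    return total_ins, total_del, total_aligned
```

For a real file, the same function can be fed an open file handle, for example `indel_totals(open("pacbio_vs_ref.sam"))`, where the file name is a placeholder.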

Abstract

The invention provides a third-generation data correction method based on DNA variation detection and belongs to the field of bioinformatics. In the method, the third-generation sequencing data are processed to serve as reference sequence data, and the second-generation sequencing data are then aligned to the processed third-generation sequencing data to obtain an alignment file. Variation detection is performed on the alignment file to obtain the variation information of the second-generation sequencing data relative to the third-generation sequencing data, and this variation information is used to correct the third-generation sequencing data. Applying a DNA variation detection method to third-generation sequencing data error correction and combining the second-generation and third-generation sequencing data lowers the cost of third-generation data correction, and multithreading in the program increases the correction speed. Through this joint correction technique, the method addresses the high error rate and high cost of third-generation sequencing technology and lays a foundation for subsequent variation detection on third-generation sequencing data.
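
The final step of the abstract, using variation information to correct a third-generation read, might look like the sketch below. It is an assumption-laden illustration rather than the patented implementation: the function name `apply_variants` and the `(position, ref, alt)` tuple layout (0-based position on the read, VCF-style alleles) are hypothetical, and variants whose reference allele does not match the read are simply skipped.

```python
def apply_variants(read_seq, variants):
    """Apply substitution, insertion and deletion calls to one long read.

    variants: iterable of (pos, ref, alt), where pos is a 0-based offset into
    read_seq, ref is the allele present in the read and alt is the allele
    supported by the aligned second-generation data (assumed layout).
    """
    pieces = []
    cursor = 0  # first base of read_seq not yet copied to the output
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            continue  # overlaps a variant that was already applied
        if read_seq[pos:pos + len(ref)] != ref:
            continue  # call is inconsistent with the read, leave it alone
        pieces.append(read_seq[cursor:pos])  # unchanged stretch before the variant
        pieces.append(alt)                   # corrected allele
        cursor = pos + len(ref)
    pieces.append(read_seq[cursor:])  # unchanged tail
    return "".join(pieces)
```

Because each read is corrected independently of the others, calls like this can be distributed across worker threads or processes, which is one way to read the multithreading mentioned in the abstract.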

Description

Technical field

[0001] The invention belongs to the technical field of bioinformatics, and in particular relates to a third-generation data correction method based on DNA variation detection.

Background

[0002] With the development of third-generation sequencing technology, the fragment length of sequencing data is continuously increasing. At the same time, with the proposal and development of precision medicine, the scale of sequencing data has grown explosively. The field is currently in transition from second-generation to third-generation sequencing technology, and some defects of third-generation sequencing restrict its development and application. The third-generation sequencing data correction software currently in use mainly includes FALCON and PBcR. They use the third-generation data self-correction method to correct the sequencing data, wh...

Application Information

IPC(8): G06F19/20, G06F19/18, G06F17/30
CPC: G16B20/00, G16B25/00
Inventors: 高敬阳, 高峰, 陈禹保
Owner BEIJING UNIV OF CHEM TECH