Third-generation PacBio sequencing data comparison method

A method for comparing (aligning) sequencing data in the field of bioinformatics. It addresses the problem of repeated sequences distorting downstream biological analysis, and aims to reduce memory use and comparison time while improving comparison accuracy and speed.

Active Publication Date: 2016-10-12
HANGZHOU HEYI GENE TECH

AI Technical Summary

Problems solved by technology

Existing comparison software also compares and outputs repeated sequences, which affects subsequent biological analysis (such as assembly and expression analysis).

Examples


Embodiment

[0032] (1) Use second-generation Illumina sequencing data to build a k-mer model and extract the unique-kmers from it.

[0033] Use the jellyfish software to compute k-mer statistics on the second-generation Illumina sequencing data: all reads are broken into fragments of length k (called k-mers) and the occurrences of each k-mer are counted. In the resulting k-mer distribution plot, the abscissa is the k-mer frequency and the ordinate is the number of distinct k-mers at that frequency. From this plot, the k-mers whose frequency is within twice that of the main peak are taken as the unique-kmers. For k≤17, they are stored in a bit file (*.bit) with a size of 2G; for k>17, they are stored in an (*.h5) file using the GATB open-source package. Here, second-generation Illumina sequencing data refers to next-generation sequencing data produced by an Illumina sequencer.
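The selection and storage steps described in [0033] can be illustrated with a minimal Python sketch. It is not the patent's program and does not call jellyfish or GATB. All function names are hypothetical, the main peak is taken simply as the frequency shared by the most distinct k-mers, and "within twice the main peak" is assumed to mean a frequency of at most 2 × the peak frequency (the source does not state a lower bound). The bit table uses one bit per possible k-mer, which for k = 17 gives 4^17 bits = 2^31 bytes, consistent with the 2G file size mentioned above.

```python
from collections import Counter

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit code per base (assumed layout)

def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def main_peak(counts):
    """Frequency shared by the largest number of distinct k-mers.

    Ties are broken toward the lower frequency. Real data usually needs
    the low-frequency error peak excluded first; that is omitted here.
    """
    histogram = Counter(counts.values())
    return min(histogram, key=lambda f: (-histogram[f], f))

def unique_kmers(counts, peak):
    """Assumed rule: keep k-mers whose frequency is at most 2 * peak."""
    return {kmer for kmer, freq in counts.items() if freq <= 2 * peak}

def kmer_to_index(kmer):
    """Pack a k-mer into an integer, two bits per base."""
    index = 0
    for base in kmer:
        index = (index << 2) | ENCODE[base]
    return index

def build_bit_table(kmers, k):
    """One bit per possible k-mer; a set bit marks a unique-kmer."""
    table = bytearray((4 ** k + 7) // 8)
    for kmer in kmers:
        idx = kmer_to_index(kmer)
        table[idx >> 3] |= 1 << (idx & 7)
    return table

def in_bit_table(table, kmer):
    """Test whether the bit for this k-mer is set."""
    idx = kmer_to_index(kmer)
    return bool(table[idx >> 3] & (1 << (idx & 7)))
```

With toy reads such as `["AAAAAAAA", "ACG", "CGT"]` and k = 3, the repeated k-mer `AAA` is excluded while `ACG` and `CGT` are kept. The sketch ignores reverse complements and bases other than A, C, G and T.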

[0034] Following the above method, a program was written to extract the unique-kmers. The specific operation commands are as follows:

[0035]

[0036] ...


Abstract

The invention provides a third-generation PacBio sequencing data comparison method that effectively reduces comparison errors caused by repeated sequences. A k-mer model is built from second-generation Illumina data and the unique-kmers are extracted from it. When third-generation PacBio sequencing data are compared, the unique-kmers serve as the seeds, which greatly reduces the influence of repeated sequences and increases comparison speed.
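The idea of restricting seeds to unique-kmers can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the target is assumed to be a single sequence held in memory, seeds are exact k-mer matches, and the candidate placement is chosen by a simple vote over diagonals (target position minus read position).

```python
from collections import Counter

def index_unique_kmers(target, k, unique):
    """Map each unique-kmer to the positions where it occurs in the target."""
    index = {}
    for pos in range(len(target) - k + 1):
        kmer = target[pos:pos + k]
        if kmer in unique:
            index.setdefault(kmer, []).append(pos)
    return index

def find_seeds(read, k, index):
    """Return (read_pos, target_pos) pairs for read k-mers found in the index.

    K-mers outside the unique set (for example, those from repeats) are
    absent from the index and therefore never produce a seed.
    """
    seeds = []
    for i in range(len(read) - k + 1):
        for target_pos in index.get(read[i:i + k], ()):
            seeds.append((i, target_pos))
    return seeds

def best_diagonal(seeds):
    """Return (diagonal, seed_count) for the best-supported diagonal, or None."""
    votes = Counter(target_pos - read_pos for read_pos, target_pos in seeds)
    return votes.most_common(1)[0] if votes else None
```

Because repeat-derived k-mers are never indexed, a read that falls entirely within a repeat yields no seeds, and fewer candidate locations need to be extended. A real aligner would follow the seeding step with an error-tolerant extension, which is not shown here.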

Description

Technical field

[0001] The present invention relates to the technical field of bioinformatics, in particular to a method for comparing DNA sequences. It uses second-generation Illumina sequencing data to build a model and extract key information, and uses that key information to assist the comparison of third-generation PacBio sequencing data.

Background

[0002] For third-generation PacBio sequencing data, the single-pass error rate is about 15%. Few alignment tools specifically support third-generation data. At present, the following two are the most widely used: (1) blasr; (2) dalign.

[0003] Both are very good third-generation comparison tools and tolerate PacBio's high error rate. However, because the genome itself contains repetitive sequences, it has highly similar regions. These comparison tools compare and output these repeated sequences, thereby affecting subsequent biolog...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/20
CPC: G16B25/00
Inventor: 詹东亮, 王军一, 郝美荣, 何荣军, 俞凯成, 高金龙, 蔡庆乐
Owner: HANGZHOU HEYI GENE TECH