
Three-generation PacBio sequencing data-based hole filling method

A technology for filling holes in genome assemblies with sequencing data, applied in the field of bioinformatics. It addresses the slow alignment speed, waste, and long scaffolding time of existing tools, and aims to improve accuracy, save memory, and reduce alignment time.

Active Publication Date: 2016-10-12
HANGZHOU HEYI GENE TECH
Cites: 4; Cited by: 10

AI Technical Summary

Problems solved by technology

At present, PBjelly is available for filling holes with third-generation PacBio sequencing data, but it relies on the blasr alignment software. Because blasr aligns very slowly, constructing the entire scaffold also takes a very long time.
Especially for genomes larger than 1 Gb sequenced at a depth greater than 10X, it usually takes several months.



Examples


Embodiment

[0041] 1. Extract the unique-kmer from the contig. In step (1), the Jellyfish software is used to count k-mers in the second-generation Illumina sequencing data, and every k-mer that appears exactly once is taken as a unique-kmer. For k ≤ 17, the unique-kmers are stored in a bit file (*.bit) with a size of 2G; for k > 17, they are stored in an (*.h5) file using the GATB open-source package. Here, k-mers are the fragments of length k obtained by breaking all the data into pieces of length k, and second-generation Illumina sequencing data refers to next-generation sequencing data produced by sequencers from Illumina.
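The selection rule in [0041] can be illustrated with a minimal pure-Python sketch. It is not the patent's program and it does not call Jellyfish; it only shows what "a k-mer that appears once" means. The function names `count_kmers` and `unique_kmers` are hypothetical, and the sketch assumes forward-strand counting only (no canonical k-mers).

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def unique_kmers(reads, k):
    """Return the set of k-mers that occur exactly once in the input."""
    return {kmer for kmer, n in count_kmers(reads, k).items() if n == 1}

if __name__ == "__main__":
    # Toy input; real data would be Illumina reads and k around 17.
    reads = ["ACGTAC", "GTACGG"]
    print(sorted(unique_kmers(reads, 3)))
```

For real data sets an in-memory counter like this would not scale, which is presumably why the source relies on a dedicated k-mer counter.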

[0042] The program was written according to the above method, and the usage is as follows:

[0043]

[0044]

[0045] Put the contig path into a list file, file.lst

[0046]

[0047] Then run the program to get unique-kmer:

[0048]

[0049] Because k=17 is selected, the result is stored in the bit file: k17.bit
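One possible reading of the 2G bit file, offered as an assumption rather than something the source states, is a directly addressed bit array with one bit per possible k-mer: with 2 bits per base there are 4^17 = 2^34 possible 17-mers, and 2^34 bits is exactly 2 GiB. The sketch below shows that arithmetic and a small bit set of this kind. `kmer_to_index`, `bitset_bytes`, and `KmerBitset` are hypothetical names, and the layout is not claimed to match k17.bit.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_to_index(kmer):
    """Pack a k-mer into an integer using 2 bits per base."""
    idx = 0
    for base in kmer:
        idx = (idx << 2) | CODE[base]
    return idx

def bitset_bytes(k):
    """Bytes needed to hold one bit for each of the 4**k possible k-mers."""
    return (4 ** k + 7) // 8

class KmerBitset:
    """Membership set for k-mers backed by a flat bit array (assumed layout)."""

    def __init__(self, k):
        self.k = k
        self.bits = bytearray(bitset_bytes(k))

    def add(self, kmer):
        i = kmer_to_index(kmer)
        self.bits[i >> 3] |= 1 << (i & 7)

    def __contains__(self, kmer):
        i = kmer_to_index(kmer)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

if __name__ == "__main__":
    print(bitset_bytes(17))      # 2147483648 bytes, i.e. 2 GiB
    small = KmerBitset(5)        # small k so the demo allocates only 128 bytes
    small.add("ACGTA")
    print("ACGTA" in small, "ACGTC" in small)
```

Under this reading, the switch to a different container for k > 17 would follow from the array growing fourfold with each extra base.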

[0050] 2. Use unique-kmer as the seed for com...


Abstract

The invention discloses a hole filling method based on third-generation PacBio sequencing data. The method greatly shortens the alignment time in the hole filling process and markedly improves genome hole filling speed. The method comprises the following steps: aligning third-generation PacBio sequencing data to the two ends of each hole in a genome using the corresponding software; extracting part of each aligned PacBio read and clustering the extracted data according to the hole it belongs to; and performing error correction with the dazcon software, then joining the sequences using the error-corrected data.
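The clustering step in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the hit format `(read_id, hole_id, side, read_pos)`, the function name `cluster_by_hole`, and the rule that a read must anchor both flanks of a hole are all hypothetical, and strand orientation is ignored.

```python
from collections import defaultdict

def cluster_by_hole(reads, hits):
    """Group the hole-spanning part of each read by the hole it covers.

    reads: dict mapping read_id -> read sequence.
    hits:  iterable of (read_id, hole_id, side, read_pos), where side is
           "L" or "R" and read_pos is the read coordinate at which the
           left flank ends ("L") or the right flank begins ("R").
    """
    anchors = defaultdict(dict)
    for read_id, hole_id, side, pos in hits:
        anchors[(read_id, hole_id)][side] = pos

    clusters = defaultdict(list)
    for (read_id, hole_id), sides in anchors.items():
        # Keep only reads anchored on both flanks, in a consistent order.
        if "L" in sides and "R" in sides and sides["L"] < sides["R"]:
            clusters[hole_id].append(reads[read_id][sides["L"]:sides["R"]])
    return dict(clusters)

if __name__ == "__main__":
    reads = {"r1": "AAAACCCCGGGG", "r2": "TTAAAACCCGGGGTT"}
    hits = [
        ("r1", "h1", "L", 4), ("r1", "h1", "R", 8),
        ("r2", "h1", "L", 6), ("r2", "h1", "R", 9),
        ("r1", "h2", "L", 2),  # only one flank anchored, so it is dropped
    ]
    print(cluster_by_hole(reads, hits))
```

Each per-hole cluster would then be the input to an error-correction step such as the dazcon run named in the abstract, which is not modeled here.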

Description

Technical field

[0001] The invention relates to the technical field of bioinformatics, in particular to a hole filling method for DNA assembly, which uses third-generation PacBio sequencing data to fill holes in genome data.

Background

[0002] Third-generation PacBio sequencing is known for its long read length. The P6-C4 reagent currently used for sequencing gives an average read length of 10-15 kb, and the sequencing shows no obvious GC bias, so in theory it can fill holes in the genome very well. Currently, PBjelly is available as hole-filling software based on third-generation PacBio sequencing data, but it relies on the blasr alignment software. Because blasr aligns very slowly, building the scaffold takes a long time. Especially for genomes larger than 1 Gb sequenced at a depth greater than 10X, it usually takes several months.

Contents of the invention

[0003] The ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/24
CPC: G16B40/00
Inventor: 詹东亮, 蔡庆乐, 王兆宝, 罗亚丹, 范崇仪, 王军一, 范玉美
Owner: HANGZHOU HEYI GENE TECH