Method and devices of using third-generation sequence to optimize second-generation assembly result

This third-generation sequencing technique, applied in the field of optimizing second-generation assembly results with third-generation sequences, addresses problems such as high sample requirements, a high single-base error rate, and the resulting limitations of third-generation data, and improves the accuracy and completeness of the assembly.

Active Publication Date: 2018-08-28
BGI TECH SOLUTIONS

AI Technical Summary

Problems solved by technology

Because of the shortcomings of PacBio sequencing, such as high sample requirements, low sequencing yield, high sequencing cost, and a high single-base error rate, the development of whole-genome assembly with third-generation data is greatly limited.

Embodiment Construction

[0040] The present invention will be further described in detail below through specific embodiments in conjunction with the accompanying drawings.

[0041] The second-generation sequence in the present invention can come from any next-generation sequencing platform. Existing platforms mainly include the Roche/454 FLX, the Illumina/Solexa Genome Analyzer, and the Applied Biosystems SOLiD system; the Illumina sequencing platform is preferred. The third-generation sequence comes from the PacBio sequencing platform, which uses single-molecule real-time (SMRT) sequencing technology.

[0042] In one embodiment of the present invention, an assembly solution is provided that combines second-generation Illumina sequencing with third-generation PacBio sequencing. Its purpose is to address the insufficient assembly metrics and low assembly accuracy obtained for complex genomes.

[0043] The method ...

Abstract

The invention discloses a method and devices for using a third-generation sequence to optimize a second-generation assembly result. The method includes: obtaining the second-generation assembly result and a third-generation assembly result; using the third-generation assembly result as a reference sequence and aligning the second-generation assembly result to it; in the contig sequences on both sides of each gap sequence of the second-generation assembly result, identifying the sequences that are aligned and those that are not aligned to the reference sequence; replacing the sequences that are not aligned to the reference sequence with gap sequence to obtain new gap sequences; and using third-generation data to fill the new gap sequences, yielding an optimized second-generation assembly result. The method can improve genome assembly metrics and assembly accuracy.
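The gap-redefinition step can be illustrated with a minimal Python sketch. It is not the patent's implementation. It assumes that gaps are runs of `N`, that the alignment to the third-generation reference has already been reduced to half-open `(start, end)` intervals in scaffold coordinates, and that unaligned bases directly adjacent to a gap are absorbed into it. The function name `expand_gaps` and its parameters are hypothetical.

```python
import re


def expand_gaps(scaffold, aligned_intervals):
    """Grow each gap (run of 'N') over adjacent contig bases that did not align.

    scaffold: second-generation scaffold, with gaps written as runs of 'N'.
    aligned_intervals: half-open (start, end) scaffold coordinates that aligned
        to the third-generation reference (assumed to be precomputed).
    """
    # Mark every scaffold position covered by at least one aligned interval.
    covered = [False] * len(scaffold)
    for start, end in aligned_intervals:
        for i in range(max(0, start), min(end, len(scaffold))):
            covered[i] = True

    seq = list(scaffold)
    for gap in re.finditer(r"N+", scaffold):
        # Walk left from the gap until an aligned base or another gap is reached.
        left = gap.start() - 1
        while left >= 0 and scaffold[left] != "N" and not covered[left]:
            seq[left] = "N"
            left -= 1
        # Walk right from the gap in the same way.
        right = gap.end()
        while right < len(scaffold) and scaffold[right] != "N" and not covered[right]:
            seq[right] = "N"
            right += 1
    return "".join(seq)


if __name__ == "__main__":
    # Two unaligned bases on each side of the 4-base gap are absorbed into it.
    print(expand_gaps("ACGTACGTNNNNACGTACGT", [(0, 6), (14, 20)]))
```

The enlarged gaps would then be passed to a gap-filling step that uses the third-generation data, which this sketch does not cover.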

Description

Technical field

[0001] The present invention relates to the technical field of sequence assembly, in particular to a method and device for optimizing second-generation assembly results using third-generation sequences.

Background

[0002] At present, genome assembly is mainly based on next-generation sequencing data obtained by whole-genome shotgun (WGS) sequencing on the Illumina platform. Its main features are high throughput, high speed, high accuracy, low cost, and the ability to sequence DNA fragment libraries of different insert sizes, especially libraries with inserts larger than 1 kb. This sequencing method works well for assembling simple genomes and some relatively complex genomes. When the average sequencing depth is sufficient, it can basically guarantee the accuracy of the assembly results and the completeness of the genome. Therefore, next-generation sequencing is the current mainstream sequencing...

Application Information

IPC(8): G06F19/20
CPC: G16B25/00
Inventor: 贺丽娟, 邓天全, 刘亚斌, 杨林峰, 高强
Owner: BGI TECH SOLUTIONS