Method for de novo assembly of genome by comprehensively applying third-generation ultra-long reads and second-generation linked reads

A genome assembly technology for the hybrid assembly of third-generation and second-generation sequencing data. It addresses rising sequencing and computing costs, growing sequencing data volumes, and obstacles in assembly. It lowers sequencing and computational costs, avoids mismatches, and reduces complexity.

Pending Publication Date: 2020-03-03
KUNMING INST OF ZOOLOGY CHINESE ACAD OF SCI
Cites 2; cited by 4

AI Technical Summary

Problems solved by technology

[0004] However, the assembly problems caused by the high error rate of third-generation sequencing have become an obstacle to its large-scale adoption.
Errors in third-generation sequencing data occur at random positions. They can be corrected by increasing sequencing coverage, but higher coverage means more sequencing data, which raises the cost of both sequencing and computing.
Although PacBio and Nanopore sequencing have been successfully applied to de novo genome sequencing, high sequencing and computational costs hinder the large-scale application of third-generation sequencing technologies.
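Why coverage helps against random errors can be illustrated with a simplified model, which is an assumption and not part of the patent. Each read covering a position is taken to be wrong independently with probability `error_rate`. All wrong reads are pessimistically assumed to agree on the same wrong base, and a position is "called" by simple majority vote, with ties counted as failures. The sketch computes the chance that the consensus is wrong at one position. It ignores indels and systematic (non-random) errors.

```python
from math import comb


def consensus_error_prob(error_rate: float, coverage: int) -> float:
    """Probability that a majority-vote consensus is wrong at one position.

    Simplified model (assumption): reads err independently with probability
    error_rate, all erroneous reads agree on the same wrong base, and a tie
    between correct and erroneous reads counts as a failure.
    """
    if coverage < 1:
        raise ValueError("coverage must be >= 1")
    p_fail = 0.0
    for k in range(coverage + 1):  # k = number of erroneous reads
        if 2 * k >= coverage:  # erroneous reads match or outnumber correct ones
            p_fail += (comb(coverage, k)
                       * error_rate ** k
                       * (1 - error_rate) ** (coverage - k))
    return p_fail


# Illustrative per-read error rate of 10% (an assumed value, not from the text).
for c in (1, 5, 10, 30):
    print(f"coverage {c:>2}X: {consensus_error_prob(0.10, c):.2e}")
```

The failure probability falls steeply as coverage grows. This is why random errors can be "bought off" with more reads, at the price of proportionally more sequencing data.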

Embodiment Construction

[0019] We used the human genome to verify the effect of the method of the present invention. Scaffolds were assembled from 56X second-generation linked-read sequencing data with Supernova and then converted into contigs, and 7X of third-generation ultra-long reads were selected. The contigs and the 7X ultra-long reads were then hybrid-assembled using DBG2OLC. For comparison, we used the assembly from the second-generation linked reads alone and the assemblies from 30X and 35X third-generation sequencing reads to compare the assembly quality and sequencing cost of the present invention (results are shown in Table 1).
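The text does not say how the Supernova scaffolds were converted into contigs. A common convention, assumed here, is to break each scaffold at runs of the gap character `N`. The minimal sketch below reads FASTA text and splits each record at such runs. `fasta_records`, `scaffold_to_contigs`, and the `min_gap` / `min_len` thresholds are hypothetical names chosen for illustration, not part of Supernova or DBG2OLC.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines (hypothetical helper)."""
    name, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq)
            header = line[1:].split()
            name, seq = (header[0] if header else ""), []
        else:
            seq.append(line)
    if name is not None:
        yield name, "".join(seq)


def scaffold_to_contigs(scaffold: str, min_gap: int = 1, min_len: int = 1) -> List[str]:
    """Split a scaffold at runs of at least min_gap 'N'/'n' characters.

    Hypothetical parameters: min_gap is the shortest N-run treated as a gap;
    contigs shorter than min_len are discarded.
    """
    gap = re.compile(rf"[Nn]{{{min_gap},}}")
    return [piece for piece in gap.split(scaffold) if len(piece) >= min_len]


example = [">scaf1", "ACGTNNNNNACGTTT", "GGNNAA", ">scaf2", "TTTT"]
for name, seq in fasta_records(example):
    for i, ctg in enumerate(scaffold_to_contigs(seq, min_gap=2), start=1):
        print(f">{name}_ctg{i}\n{ctg}")
```

Writing the resulting contigs to a FASTA file would give an input of the kind described for the DBG2OLC hybrid step. The exact gap threshold used by the inventors is not stated.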

[0020] Table 1. Comparison of 10X Genomics Linked Read Assembly Results, Nanopore Sequencing Data Assembly Results, and Hybrid Assembly Results

[0021]

[0022] **: the method used in the present invention.

[0023] The total length in Table 1 is the overall length of the assembled genome sequence. The total length of the human genome is 3,000,000,000 bp. The larger the value...
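Using the 3,000,000,000 bp genome length above and the coverages stated in [0019], the raw data volume of each dataset follows from coverage × genome size. The sketch below does this arithmetic. Treating data volume as a rough proxy for sequencing cost is an assumption; per-base prices differ between platforms.

```python
GENOME_SIZE_BP = 3_000_000_000  # human genome length stated in [0023]


def data_volume_gb(coverage: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Approximate raw sequencing yield in gigabases for a given fold coverage."""
    return coverage * genome_size_bp / 1e9


# Coverages taken from paragraph [0019].
datasets = {
    "10x linked reads (Supernova input)": 56,
    "ultra-long reads used in the hybrid assembly": 7,
    "ultra-long reads, 30X comparison": 30,
    "ultra-long reads, 35X comparison": 35,
}
for label, cov in datasets.items():
    print(f"{label}: {cov}X ~ {data_volume_gb(cov):.0f} Gb")

# Reduction in third-generation data relative to the long-read-only comparisons.
for ref in (30, 35):
    print(f"7X vs {ref}X: {1 - 7 / ref:.0%} less ultra-long read data")
```

Under this arithmetic, the hybrid assembly uses about 21 Gb of ultra-long reads, compared with about 90 Gb or 105 Gb for the long-read-only assemblies. This third-generation data saving is the one the method aims to exploit.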


Abstract

The invention discloses a method for efficient, high-quality de novo genome assembly that jointly applies third-generation ultra-long reads and second-generation linked reads. The third-generation ultra-long reads are generated by Nanopore and PacBio, currently the most widely used third-generation sequencing technologies. The second-generation linked reads are generated by the 10x Genomics sequencing platform. A high-quality genome sequence is assembled with efficient hybrid assembly software. The method fully exploits and integrates the advantages of third-generation ultra-long reads and second-generation linked reads, and combines the efficient assembly software DBG2OLC and SPARC, so that the cost of applying third-generation sequencing technology is greatly reduced. The invention provides an efficient, reliable and economical method for large-scale, high-quality de novo genome assembly using third-generation sequencing technology.

Description

Technical field

[0001] The invention relates to a method for de novo assembly of genome sequencing data, in particular a hybrid assembly method for third-generation and second-generation sequencing data. The third-generation sequencing data are mainly ultra-long reads generated by PacBio, Nanopore or other sequencing technologies, and the second-generation sequencing data are mainly linked reads generated by 10x Genomics sequencing. Combined with the efficient assembly software DBG2OLC, the method greatly reduces the cost of sequencing and computation (especially the cost of applying third-generation sequencing technology). It provides an efficient and reliable method for large-scale, high-quality de novo genome assembly using third-generation sequencing technology.

Background art

[0002] With the development of sequencing technology, the genome sequence information produced by de novo genome assembly is becoming increasingly detailed and accurate. The 10X Geno...


Application Information

Patent type and authority: Application (China)
IPC(8): G16B30/20
Inventors: 马占山 (Ma Zhanshan), 张亚平 (Zhang Yaping), 李连伟 (Li Lianwei), 彭旻晟 (Peng Minsheng)
Owner KUNMING INST OF ZOOLOGY CHINESE ACAD OF SCI