Method and system for assembling genome sequence

A technique for assembling genomes and genome frameworks, applied in the field of bioinformatics. It addresses the high error rate of single-molecule reads and the heavy computing cost of assembling large genomes, improving the accuracy and speed of alignment and significantly reducing the time spent on data collation.

Inactive Publication Date: 2015-04-22
HANGZHOU HEYI GENE TECH
3 Cites, 41 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, single-molecule sequencing has a relatively high error rate: about 15% for single-pass sequencing and about 8% for cyclic sequencing, which leaves a large gap in accuracy compared with second-generation sequencing technology. Working with such data is very computationally resource-intensive, and the computation required for large genomes is so great that only a few institutions can afford to use this technology.
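To make the cost of these error rates concrete, the short sketch below estimates how often a stretch of k consecutive bases is read without any error. It is an illustration only: it assumes errors are independent per base, and the value k = 20 and the function name `error_free_kmer_probability` are hypothetical choices, not taken from the patent.

```python
def error_free_kmer_probability(error_rate: float, k: int) -> float:
    """Probability that k consecutive bases contain no error,
    assuming independent per-base errors (a simplifying assumption)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    return (1.0 - error_rate) ** k


# Error rates quoted in paragraph [0004]; k = 20 is an illustrative seed length.
for label, rate in [("single-pass", 0.15), ("cyclic", 0.08)]:
    p = error_free_kmer_probability(rate, 20)
    print(f"{label}: P(error-free 20-mer) = {p:.4f}")
```

Under this assumption only a small fraction of short exact matches survive in raw single-molecule reads, which is one reason aligning and correcting them is expensive.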




Embodiment Construction

[0041] Embodiments of the present invention are described in further detail below in conjunction with accompanying drawings:

[0042] Efficient and fast de novo assembly helps to discover large-fragment structural variations, which is of great significance for understanding disease-related genomes and the genetic changes of diseases involving fusion genes, copy number variations, and large-scale structural variations. High-quality genome assemblies are also very important for genome annotation and comparative genome analysis. The method of the present invention makes full use of the read-length advantage of the third-generation sequencer PacBio RSII and combines its data with the accurate short-read data generated by a second-generation sequencer, so that the accuracy of the genome assembly is greatly improved and the average length of the assembled contigs is more than twice that obtained with second-generation sequencers.

[0043] figure 1 is a schemati...


Abstract

The invention discloses a method and a system for assembling a genome sequence. High-precision short-read sequence data obtained by second-generation sequencing and long-read sequence data obtained by single-molecule real-time sequencing are combined to assemble a genome sequence, which improves assembly efficiency and accuracy. The method comprises the following steps: (1) sequencing a sample with second-generation sequencing technology to obtain high-precision short-read sequences; (2) assembling the high-precision short-read sequences to obtain a high-precision framework map; (3) sequencing a sample from the same source with single-molecule sequencing technology to obtain third-generation sequencing data; (4) aligning the third-generation sequencing data to the framework map to obtain detailed alignment information between the third-generation sequencing data and the framework map; (5) using the detailed alignment information to cluster the third-generation sequencing data and construct a genome framework, and correcting the genome framework to obtain a fine genome map.
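Steps (4) and (5) can be pictured with a minimal sketch. It is not the patented implementation: the `Alignment` record, the 0.80 identity cutoff, and the functions `cluster_reads` and `contig_links` are hypothetical names and values chosen for illustration. The sketch assumes the long reads have already been aligned to the framework map by some external aligner, then groups each read under the contig it matches best and counts reads that touch two contigs as candidate joins.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Alignment(NamedTuple):
    """One long-read-to-contig alignment (hypothetical record layout)."""
    read_id: str
    contig_id: str
    contig_start: int
    contig_end: int
    identity: float


def cluster_reads(alignments, min_identity=0.80):
    """Group long reads by the contig they align to over the longest span."""
    best = {}
    for aln in alignments:
        if aln.identity < min_identity:
            continue  # discard low-identity alignments
        span = aln.contig_end - aln.contig_start
        current = best.get(aln.read_id)
        if current is None or span > current.contig_end - current.contig_start:
            best[aln.read_id] = aln
    clusters = defaultdict(list)
    for read_id, aln in best.items():
        clusters[aln.contig_id].append(read_id)
    return dict(clusters)


def contig_links(alignments, min_identity=0.80):
    """Count long reads that align to two different contigs (candidate joins)."""
    contigs_per_read = defaultdict(set)
    for aln in alignments:
        if aln.identity >= min_identity:
            contigs_per_read[aln.read_id].add(aln.contig_id)
    links = defaultdict(int)
    for contigs in contigs_per_read.values():
        for a, b in combinations(sorted(contigs), 2):
            links[(a, b)] += 1
    return dict(links)


# Toy alignments, invented for illustration only.
alignments = [
    Alignment("r1", "c1", 100, 5100, 0.86),
    Alignment("r1", "c2", 0, 2000, 0.85),
    Alignment("r2", "c1", 4000, 9000, 0.88),
    Alignment("r3", "c2", 500, 6500, 0.84),
    Alignment("r3", "c1", 8000, 9000, 0.60),
]
clusters = cluster_reads(alignments)
links = contig_links(alignments)
print(clusters)
print(links)
```

In this toy input, read r1 aligns to both c1 and c2, so it supports a link between the two contigs. A real pipeline would also need orientation and gap estimates before correcting the framework.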

Description

Technical field

[0001] The invention relates to the technical field of bioinformatics, in particular to a method and system for assembling genome sequences.

Background technique

[0002] Illumina's next-generation sequencing technology, with its high throughput and accuracy, has become the preferred platform for many research institutions. At present, its average read length is 100bp-300bp. Thanks to its high throughput and low cost, it has greatly advanced the development of bioinformatics, and many genomes have been studied on this platform. However, because read length is limited and complex genomes contain many high-GC and highly repetitive regions, Illumina's performance in assembling these genomes is not ideal.

[0003] PacBio RSII is currently the most mature third-generation sequencing platform on the market. Its average sequencing read length has grown from 2k at the beginning to 14k at present, and its reads can span most repetitive regions. It has great ad...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): C12Q1/68, C12M1/00
CPC: C12N15/1027
Inventor: 詹东亮, 张姝, 蔡庆乐, 何荣军, 郝美荣, 梁倩, 韩雪莲, 刘三阳, 王军一
Owner: HANGZHOU HEYI GENE TECH