Method and device for assembling genome sequence

A technology for assembling genomes against reference genomes, applied in the field of bioinformatics. It addresses problems such as assembly bottlenecks and short fragment sequences, and increases the length of assembled sequences and the efficiency of assembly.

Active Publication Date: 2011-10-05
WUXI QINGLAN BIOLOGICAL SCI & TECH +1

AI Technical Summary

Problems solved by technology

In the prior art, because the fragment sequences generated during sequencing are very short, restoring large-fragment data requires a very large amount of computation.
[0005] At the same time, as one of the measures of the quality of the genome map, the fragment length N50 (N50 is to arrange all the assembled sequences from large to small and add them according to the lengt
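The N50 description above is cut off in the source. As a minimal sketch, assuming the commonly used definition (sort the assembled sequences from longest to shortest, accumulate their lengths, and report the length at which the running sum first reaches half of the total), the metric could be computed as follows. The function name `n50` and the example lengths are illustrative and do not come from the patent.

```python
def n50(lengths):
    """Return the N50 of a collection of assembled sequence lengths.

    Assumes the common definition: the length of the sequence at which the
    cumulative sum, taken from longest to shortest, first reaches at least
    half of the total assembled length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Illustrative lengths only (not data from the patent).
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

Under this definition, longer assembled sequences raise N50, which is why the metric is used as one measure of genome map quality.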



Examples


Embodiment Construction

[0027] The present invention will be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments of the invention are illustrated. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention, but do not constitute an improper limitation of the present invention.

[0028] The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way to be taken as limiting the invention, its application or its uses.

[0029] Fosmids and bacterial artificial chromosomes (BACs) are large-insert clones used in genome research. A BAC can usually carry inserts of about 100 kb to 200 kb, and a fosmid generally carries inserts of about 40 kb. Besides their long inserts, BACs and fosmids are also very stable, so they are important tools for genomics research and play an important role in map-based gene cloning, gene analysis, structural...



Abstract

The invention discloses a method and device for assembling a genome sequence. The method comprises the following steps: filtering the short-fragment sequences output by end sequencing of a long-insert library to remove unqualified sequences; aligning the filtered short-fragment sequences to a reference genome sequence; dividing the aligned paired short-fragment sequences into soap reads, single reads and unmap reads according to the alignment result, and counting the number of sequences of each type; using the soap reads to calculate the distance between paired short-fragment sequences located on the same segment of the reference genome sequence, and compiling the distance distribution of all such pairs on the reference genome sequence; and, when the distance distribution satisfies the threshold requirement, assembling the genome sequence using the uniquely aligned paired single reads whose two ends lie on different segments of the reference genome sequence.
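The classification and distance steps of the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the `Hit` record, the function names, the rule that a pair with any unaligned end counts as unmap, and the form of the threshold check (a minimum fraction of distances within a tolerance of the expected insert size) are all hypothetical, because the abstract does not specify them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment record for one end of a read pair."""
    segment: str   # name of the reference segment the end aligned to
    pos: int       # leftmost aligned position on that segment
    unique: bool   # True if the end aligned to exactly one location


Pair = Tuple[Optional[Hit], Optional[Hit]]


def classify_pair(a: Optional[Hit], b: Optional[Hit]) -> str:
    """Assumed rule: any unaligned end -> unmap; same segment -> soap;
    different segments -> single."""
    if a is None or b is None:
        return "unmap"
    return "soap" if a.segment == b.segment else "single"


def count_classes(pairs: List[Pair]) -> Counter:
    """Count how many pairs fall into each class."""
    return Counter(classify_pair(a, b) for a, b in pairs)


def soap_distances(pairs: List[Pair]) -> List[int]:
    """Distances between the two ends of pairs on the same segment."""
    return [abs(a.pos - b.pos) for a, b in pairs
            if classify_pair(a, b) == "soap"]


def distribution_ok(distances: List[int], expected: int,
                    tolerance: int, min_fraction: float) -> bool:
    """Hypothetical threshold check on the distance distribution."""
    if not distances:
        return False
    within = sum(1 for d in distances if abs(d - expected) <= tolerance)
    return within / len(distances) >= min_fraction


def linking_pairs(pairs: List[Pair]) -> List[Pair]:
    """Single pairs whose two ends both align uniquely; these span
    different segments and could be used to link them."""
    return [(a, b) for a, b in pairs
            if classify_pair(a, b) == "single" and a.unique and b.unique]


# Made-up example pairs; 40 kb is the approximate fosmid insert size.
pairs: List[Pair] = [
    (Hit("seg1", 1000, True), Hit("seg1", 41000, True)),
    (Hit("seg1", 5000, True), Hit("seg1", 44000, True)),
    (Hit("seg1", 100, True), Hit("seg2", 200, True)),
    (Hit("seg1", 100, False), Hit("seg3", 200, True)),
    (None, Hit("seg1", 5, True)),
]
counts = count_classes(pairs)
distances = soap_distances(pairs)
if distribution_ok(distances, expected=40000, tolerance=5000, min_fraction=0.9):
    links = linking_pairs(pairs)
else:
    links = []
```

Gating the linking step on the distance distribution means the cross-segment pairs are only trusted when the same-segment pairs confirm that the library behaves as expected.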

Description

Technical field

[0001] The present invention relates to the technical field of bioinformatics, and in particular to a method and device for assembling genome sequences.

Background

[0002] With the arrival of the next-generation sequencing technologies 454 (Roche), Solexa (Illumina) and SOLiD (ABI), sequencing throughput has increased rapidly while the cost of sequencing has dropped sharply. This breakthrough has greatly promoted the development of genome science, and the whole-genome sequences of a large number of species have been published, including James Watson's personal genome, the genome of the first Asian individual, the giant panda, the cucumber, etc.

[0003] Each run of a next-generation sequencer yields millions of short-fragment sequences, and several such runs are required to sequence a genome completely, which means that in order to obtain a complete genome atlas, it is necessary to map, localize, an...


Application Information

IPC(8): C12Q1/68, C12M1/00, G16B30/20, G16B30/10
CPC: G06F19/22, G01F19/007, C12M1/00, C12Q1/68, G16B30/00, G16B30/10, G16B30/20
Inventor 韩长磊, 陈文彬, 张秀清, 杨焕明
Owner WUXI QINGLAN BIOLOGICAL SCI & TECH