Genome sequencing data sequence assembling method

A genome sequence assembly technology that uses a reference genome, applied in the fields of electrical digital data processing, special data processing applications, and instruments. It addresses the inability of existing methods to determine the exact placement of fragments on the genome, and aims at accurate and effective assembly.

Active Publication Date: 2015-07-01
天工生物科技(天津)有限公司

AI Technical Summary

Problems solved by technology

[0008] In addition, for repetitive regions much longer than the sequencing read length, commonly used assembly algorithms can only force the related sequencing fragments together into a single consensus fragment. Other published methods can estimate how many times a repeated segment occurs in the genome from its coverage depth, but they cannot determine the exact placement of the segment on the genome.
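
To make the coverage-depth idea concrete, the sketch below estimates how many copies of a repeat a collapsed fragment represents by comparing its depth with a genome-wide baseline. It illustrates only the general prior-art idea mentioned above, not the method of this patent; the function name `estimate_copy_number` and the use of the median as the baseline are assumptions made for illustration.

```python
from statistics import median


def estimate_copy_number(region_depths, genome_depths):
    """Estimate the copy number of a collapsed repeat from coverage depth.

    Assumption (not from the patent): single-copy regions are covered at
    roughly the genome-wide median depth, so a repeat collapsed into one
    fragment shows about (copy number x baseline) depth.
    """
    baseline = median(genome_depths)
    if baseline <= 0:
        raise ValueError("genome-wide baseline depth must be positive")
    # Round the depth ratio to the nearest whole copy, never below one.
    return max(1, round(median(region_depths) / baseline))


# Hypothetical per-base depths, for illustration only.
genome_depths = [30, 29, 31, 30, 32, 28, 30]
repeat_depths = [88, 92, 90, 91]
copies = estimate_copy_number(repeat_depths, genome_depths)
```

An estimate like this says how many copies exist, but nothing about where each copy sits on the genome, which is the gap the paragraph above describes.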



Examples


Embodiment Construction

[0034] Figure 2 is a schematic flowchart of the genome sequencing data sequence assembly method in an embodiment of the present invention. As shown in Figure 2, the method includes:

[0035] Step 201: Construct an overlap graph and its reverse-complement graph from the overlap relationships between the reads obtained by sequencing. Each node in the overlap graph and its corresponding node in the reverse-complement graph are reverse complements of each other and are equivalent. Because we usually know only whether two sequences overlap, not the final order of the sequences in the assembly result, both graphs must be constructed at the same time: the overlap graph G and its reverse-complement graph G'. As long as two sequence fragments overlap, the relationship can be marked in the overlap relationship dia...
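
A minimal sketch of Step 201 follows, under stated assumptions: reads are plain `ACGT` strings, an overlap is an exact suffix-prefix match of at least `min_len` bases, and the node for the reverse complement of read `r` is named `r'`. The function names and the exact-match criterion are illustrative choices, not details taken from the patent, whose text is truncated at this point.

```python
from itertools import permutations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_length(a, b, min_len):
    """Longest proper suffix of `a` that equals a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """Overlap graph G: edge a -> b weighted by the overlap length."""
    graph = {name: {} for name in reads}
    for a, b in permutations(reads, 2):
        k = overlap_length(reads[a], reads[b], min_len)
        if k:
            graph[a][b] = k
    return graph


def build_reverse_complement_graph(graph):
    """Graph G': every edge a -> b in G becomes b' -> a' with equal weight.

    This holds because a suffix of a matching a prefix of b implies that
    a suffix of rc(b) matches a prefix of rc(a).
    """
    rc_graph = {name + "'": {} for name in graph}
    for a, edges in graph.items():
        for b, k in edges.items():
            rc_graph[b + "'"][a + "'"] = k
    return rc_graph


# Hypothetical reads, for illustration only.
reads = {"r1": "ACGTAC", "r2": "TACGGA", "r3": "GGATTC"}
G = build_overlap_graph(reads)
G_rc = build_reverse_complement_graph(G)
```

Deriving G' from G by reversing edges keeps the two graphs consistent node for node, so a traversal in either orientation can be mapped to the other.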


Abstract

An embodiment of the invention provides a method for assembling genome sequencing data. The method combines the advantages of de novo sequencing with those of resequencing to assemble genome sequencing data effectively. Two inputs are known in advance: a draft sequence traversal path of the genome sequencing reads, obtained by aligning the sequencing data to a reference sequence, and the set of overlap relationships among the genome sequencing data. This set comprises a determined-relationship subset and an undetermined-relationship subset. The method includes the following steps: after the sequencing reads are aligned to a closely related reference genome and the draft sequence traversal path is obtained from the reference sequence, the nodes in the draft path are checked one by one; the draft path is iteratively revised according to the connections in the determined subset and/or the undetermined subset of the overlap relationship set, and the overlap relationship set is updated; the next node is then checked against the updated draft path and the updated overlap relationship set, until the last node has been checked.
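
The node-by-node revision loop described in the abstract can be sketched as follows. The abstract does not give the revision rules, so the ones used here are assumptions for illustration: a determined relationship is a single known successor that overrides the draft order, and an undetermined relationship is a set of candidate successors that becomes determined when the draft path agrees with one of them. The name `revise_draft_path` and the data layout are hypothetical.

```python
def revise_draft_path(draft_path, determined, undetermined):
    """Check each node of a draft traversal path and revise it iteratively.

    draft_path   -- read IDs ordered by their alignment to the reference
    determined   -- dict: node -> its single known successor
    undetermined -- dict: node -> set of candidate successors
    Returns the revised path and the updated relationship subsets.
    """
    path = list(draft_path)
    determined = dict(determined)
    undetermined = {n: set(c) for n, c in undetermined.items()}

    i = 0
    while i < len(path):
        node = path[i]
        draft_next = path[i + 1] if i + 1 < len(path) else None
        if node in determined:
            succ = determined[node]
            if succ != draft_next:
                if succ not in path:
                    # Successor is missing from the draft: insert it.
                    path.insert(i + 1, succ)
                elif path.index(succ) > i:
                    # Successor is placed later in the draft: move it up.
                    path.remove(succ)
                    path.insert(i + 1, succ)
                # A successor already behind us is left alone (assumed rule).
        elif node in undetermined and draft_next in undetermined[node]:
            # The draft order settles an ambiguous overlap (assumed rule),
            # so the overlap relationship set is updated.
            determined[node] = draft_next
            del undetermined[node]
        i += 1
    return path, determined, undetermined


# Hypothetical inputs, for illustration only.
path, det, undet = revise_draft_path(
    ["r1", "r3", "r2", "r4"],
    determined={"r1": "r2"},
    undetermined={"r2": {"r3", "r5"}},
)
```

In this example the determined overlap `r1 -> r2` moves `r2` ahead of `r3`, and the revised draft order then resolves the ambiguous successor of `r2`.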

Description

Technical field

[0001] The invention relates to genome sequence assembly technology, in particular to a genome assembly method for cases where a closely related reference sequence is available.

Technical background

[0002] With the continuous advancement of sequencing technology, a large number of microbial genomes have been completed and submitted to databases. Most industrial strains of microorganisms are obtained through continuous screening and modification of existing strains. The genome sequences of these starting strains or of closely related strains can therefore provide guidance and reference for the genome assembly process.

[0003] To obtain a complete genome map of an industrial strain, de novo sequencing is a commonly used approach. De novo sequencing refers to the technical process of sequencing, assembling, scaffolding, and gap filling the genome of the target species using sequencing and conventional...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 孙际宾, 李澎鹏, 郑平, 马延和
Owner: 天工生物科技(天津)有限公司