Genome assembling method

Genome assembly and consistency technology, applied in the field of genome assembly. It can solve problems such as indistinguishable repeat copies, fragmentation of continuous sequence fragments, and an excessive number of operation steps.

Active Publication Date: 2019-01-18
INST OF GENETICS & DEVELOPMENTAL BIOLOGY CHINESE ACAD OF SCI
Cites 2 documents; cited by 13 documents.

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a fast and efficient genome assembly method that effectively solves the problems of the prior art. In particular, the prior art assembles a genome by compressing reads to find paths without crossover nodes in the assembly graph. However, this process involves too many operation steps, so assembly is slow and the software is complex. In addition, when similar multi-copy repeat sequences are compressed, many reads from different copies of a repeat are compressed together, so the different copies collapse into one and can no longer be distinguished. Finally, because crossover nodes arise from similar sequences or sequencing errors, the resulting sequence fragments are broken at the start and end positions of the compressed paths, so the assembled sequence suffers from fragmentation of its continuous fragments.
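The prior-art behaviour criticised in [0004] can be illustrated with a generic compaction of an overlap graph into maximal non-branching paths. This is a minimal sketch for illustration only, not code from the patent: the graph format (read -> list of successor reads) and the node names `u1`, `v1`, `rep`, etc. are hypothetical. Reads from two repeat copies that were collapsed into one node `rep` create a crossover node, and every path stops there, so the output is four short fragments instead of two long ones.

```python
def maximal_non_branching_paths(graph):
    """Compact an overlap graph into maximal non-branching paths.

    graph maps each read to the list of reads it overlaps (read -> successors).
    A path continues only through nodes with exactly one incoming and one
    outgoing edge, so every branching (crossover) node ends a path.
    Isolated cycles are ignored in this sketch.
    """
    nodes = set(graph)
    for succs in graph.values():
        nodes.update(succs)
    indeg = {v: 0 for v in nodes}
    for succs in graph.values():
        for w in succs:
            indeg[w] += 1
    outdeg = {v: len(graph.get(v, [])) for v in nodes}

    paths = []
    for v in sorted(nodes):
        if indeg[v] == 1 and outdeg[v] == 1:
            continue  # interior node: it is covered by a path from some start node
        for w in graph.get(v, []):
            path = [v, w]
            while indeg[w] == 1 and outdeg[w] == 1:
                w = graph[w][0]
                path.append(w)
            paths.append(path)
    return paths


# Hypothetical toy graph: reads from two repeat copies were collapsed into "rep"
toy = {"u1": ["u2"], "u2": ["rep"], "v1": ["v2"], "v2": ["rep"], "rep": ["w1", "x1"]}
print(maximal_non_branching_paths(toy))
# [['rep', 'w1'], ['rep', 'x1'], ['u1', 'u2', 'rep'], ['v1', 'v2', 'rep']]
```

Because `rep` has two predecessors and two successors, the compaction cannot tell which flank belongs to which repeat copy, which is exactly the fragmentation and copy-merging problem the paragraph describes.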



Embodiment Construction

[0056] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0057] An embodiment of the present invention is a genome assembly method which, as shown in Figure 1, includes the following steps:

[0058] S1: Compare all known DNA sequence fragments with each other and find all pairs of overlapping reads whose overlapping regions are similar. The known DNA sequences include an anchor sequence fragment set A and a random sequencing read set B. The anchor sequence fragment set A includes one or several of the following: a sequence fragment set A1 extracted from the DNA sequence, a set A2 of already-assembled sequence fragments, and a read set A3 selected from the random sequencing reads. Comparing all known DNA sequence fragments with each other includes comparing all anchor sequence fragments with all sequencing reads, comparing all sequencing reads with each other, or comparing all T...
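Step S1 can be pictured with a small all-versus-all overlap search. This is a sketch under stated assumptions, not the patent's comparison procedure: the excerpt does not name an aligner, so "similar overlapping region" is approximated here as a suffix-prefix match with at most `max_mismatch` substitutions (no indels, no reverse complements), and `min_len`, `max_mismatch` and the sequence IDs are hypothetical. Anchor-vs-anchor pairs are skipped because the text only lists anchor-vs-read and read-vs-read comparisons.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len, max_mismatch):
    """Longest L >= min_len where the last L bases of a match the first L bases
    of b with at most max_mismatch substitutions; 0 if there is none."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatch:
            return length
    return 0


def find_overlapping_pairs(anchors, reads, min_len=3, max_mismatch=0):
    """Return (left_id, right_id, overlap_len) for every overlapping ordered pair.

    Anchor-vs-read, read-vs-anchor and read-vs-read pairs are compared;
    anchor-vs-anchor pairs are skipped. IDs must be unique across both sets.
    """
    seqs = {**anchors, **reads}
    pairs = []
    for (i, a), (j, b) in permutations(seqs.items(), 2):
        if i in anchors and j in anchors:
            continue
        olen = suffix_prefix_overlap(a, b, min_len, max_mismatch)
        if olen:
            pairs.append((i, j, olen))
    return pairs


anchors = {"A1": "ACGTAC"}                 # hypothetical anchor fragment (set A)
reads = {"B1": "TACGGA", "B2": "GGATTC"}   # hypothetical sequencing reads (set B)
print(find_overlapping_pairs(anchors, reads))
# [('A1', 'B1', 3), ('B1', 'B2', 3)]
```

Raising `max_mismatch` admits more divergent overlaps (for example `B1` followed by `A1` with one substitution), which trades sensitivity to sequencing errors against the risk of joining different repeat copies.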


Abstract

The invention discloses a genome assembly method comprising four steps: sequence comparison, sequence extension, completion of extension, and removal of redundancy. Genome-wide assembly is divided into two main stages: assembly of single-copy sequences and assembly of the remaining sequences. This simplifies the implementation, makes the whole method fast, efficient and error-free, and greatly improves the continuity of the assembled sequence fragments and the assembly quality. With the method of the invention, the whole-genome sequence can be recovered quickly and efficiently, and entire chromosome and whole-genome sequences can be recovered more easily. The method can also be used to fill in the sequence of gap regions in a genome sequence; in particular, the assembly result can be greatly improved by combining it with genome optical map information or chromosome grouping and ordering information. It can also be used to judge whether any two sequences are connected, or to estimate the distance between two adjacent sequences.
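The four steps named in the abstract can be sketched as a toy pipeline. This is a hypothetical illustration, not the patented algorithm: the abstract does not give the extension or redundancy rules, so the sketch assumes greedy rightward extension of each anchor with the unused read that has the longest exact overlap, treats "completion of extension" as the point where no unused read overlaps the growing end, and removes contigs contained in a longer one. The single-copy-first staging is not modelled, and all names and sequences are made up.

```python
def longest_overlap(a, b, min_len):
    """Longest exact suffix(a) == prefix(b) overlap of at least min_len that
    still leaves at least one new base in b; 0 if there is none."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def extend_anchor(anchor, reads, min_len=3):
    """Sequence extension: repeatedly append the unused read with the longest overlap."""
    contig, used = anchor, set()
    while True:
        best_id, best_len = None, 0
        for rid, seq in reads.items():
            if rid in used:
                continue
            olen = longest_overlap(contig, seq, min_len)
            if olen > best_len:
                best_id, best_len = rid, olen
        if best_id is None:  # completion of extension: nothing overlaps the end
            return contig
        used.add(best_id)
        contig += reads[best_id][best_len:]


def remove_redundant(contigs):
    """Removal of redundancy: drop contigs contained in a longer retained contig."""
    kept = []
    for c in sorted(contigs, key=len, reverse=True):
        if not any(c in k for k in kept):
            kept.append(c)
    return kept


def assemble(anchors, reads, min_len=3):
    contigs = [extend_anchor(a, reads, min_len) for a in anchors.values()]
    return remove_redundant(contigs)


anchors = {"A1": "ACGTAC", "A2": "CGGAT"}   # hypothetical anchors
reads = {"B1": "TACGGA", "B2": "GGATTC"}    # hypothetical reads
print(assemble(anchors, reads))
# ['ACGTACGGATTC']
```

Each anchor is extended independently and may reuse reads another anchor used, so `A2` grows into `CGGATTC`, which lies inside the `A1` contig; the redundancy-removal step is what discards it.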

Description

Technical field

[0001] The invention relates to a genome assembly method and belongs to the technical field of genome assembly.

Background

[0002] Sequencers generate random read sequence fragments (reads) by sequencing fragments of the genome, and these reads are distributed randomly across the genome. Genome assembly arranges and connects these reads in the correct order, assembles them into base-continuous DNA sequence fragments (continuous fragments), and finally restores the sequence of each entire chromosome and of the entire genome. The assembly process generally includes three steps: assembly of continuous fragments, assembly of discontinuous fragments containing gaps, and filling of the gaps. The difficulty of genome assembly comes from the large number of repetitive sequences in the genome, that is, two or more segments of variable length whose sequences are similar or identical. In addition, the sequencer will make...
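A toy illustration of why repeats make assembly difficult: two different genomes can yield exactly the same set of error-free reads when a repeat is long relative to the reads. This example is not from the patent; the sequences, the repeat `GG`, and the read length `k` are hypothetical, and reads are idealised as every length-k window with no sequencing errors.

```python
from collections import Counter


def read_spectrum(genome, k):
    """Multiset of all error-free length-k reads, one per start position."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


repeat = "GG"  # hypothetical three-copy repeat
g1 = "A" + repeat + "C" + repeat + "T" + repeat + "A"  # AGGCGGTGGA
g2 = "A" + repeat + "T" + repeat + "C" + repeat + "A"  # AGGTGGCGGA: C and T swapped

print(g1 != g2)                                    # True
print(read_spectrum(g1, 3) == read_spectrum(g2, 3))  # True: 3-base reads cannot tell them apart
print(read_spectrum(g1, 4) == read_spectrum(g2, 4))  # False: 4-base reads span the repeat
```

With 3-base reads no assembler can decide whether `C` or `T` comes first between the repeat copies; once reads are long enough to span a repeat copy plus its flanks, the ambiguity disappears.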


Application Information

Patent Type & Authority: Applications (China)
IPC (8): C12N15/10
CPC: C12N15/1027
Inventors: 梁承志, 杜会龙
Owner: INST OF GENETICS & DEVELOPMENTAL BIOLOGY CHINESE ACAD OF SCI