Genome assembly method for constructing ultra-long continuous DNA (deoxyribonucleic acid) sequences

A genome assembly technology for DNA sequences, applied in the field of genome assembly, that addresses the problems that different copies of a repeat sequence cannot be distinguished and that the original genome sequence cannot be truly restored.

Active Publication Date: 2018-11-06
INST OF GENETICS & DEVELOPMENTAL BIOLOGY CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

Because sequence comparison tolerates errors, Reads from different copies of a repeat sequence are compressed together, so the different copies collapse into one and cannot be distinguished.
However, due to the existence



Examples


Embodiment Construction

[0064] Embodiment of the present invention: a genome assembly method for constructing ultra-long continuous DNA sequences, as shown in Figure 1, including the following steps:

[0065] S1: compare all known DNA sequences pairwise and find the similar overlapping regions between each pair of sequences. The known DNA sequences include anchor sequence fragments and random sequencing Read sequences. Anchor sequence fragments are sequence fragments used for anchoring, for example one or several specific sequence fragments intercepted from the DNA sequence, and/or one or several specific sequence fragments that have already been assembled, and/or one or several specific Read sequences selected from the random sequencing Read sequences. To improve the accuracy of the Read sequences, they can be corrected first, or the original random sequencing Read sequences can be used directly without correction; the correction method includes using the sequencing ...
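The pairwise comparison in S1 can be illustrated with a minimal Python sketch. It is not the patent's comparison method: it assumes a brute-force suffix-prefix scan on a single strand, and the names `suffix_prefix_overlap`, `all_overlaps`, `min_len`, and `max_mismatch_rate` are hypothetical. A real pipeline would use an indexed aligner and would also handle reverse complements and indels.

```python
from itertools import permutations


def suffix_prefix_overlap(a, b, min_len=4, max_mismatch_rate=0.0):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    A candidate overlap is accepted when its mismatch count does not exceed
    max_mismatch_rate * overlap length. Returns 0 if no overlap qualifies.
    (Illustrative brute force; the parameters are assumptions.)
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


def all_overlaps(seqs, min_len=4, max_mismatch_rate=0.0):
    """Compare every ordered pair of sequences and record qualifying overlaps.

    Returns a dict mapping (i, j) to the overlap length between the end of
    seqs[i] and the start of seqs[j].
    """
    overlaps = {}
    for (i, a), (j, b) in permutations(enumerate(seqs), 2):
        n = suffix_prefix_overlap(a, b, min_len, max_mismatch_rate)
        if n:
            overlaps[(i, j)] = n
    return overlaps


if __name__ == "__main__":
    # Toy input: one anchor fragment followed by two reads.
    sequences = ["ACGTACGGA", "CGGATTCA", "TTCAGGGT"]
    print(all_overlaps(sequences))
```

Anchor fragments and Reads are treated identically here, which mirrors the statement that all known DNA sequences are compared pairwise; distinguishing them only matters in the later extension steps.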


Abstract

The invention discloses a genome assembly method for constructing an ultra-long continuous DNA (deoxyribonucleic acid) sequence. The method comprises the following steps: S1, finding the overlapping region between each pair of known DNA sequences; S2, starting from a free end of any anchor sequence fragment, extending that fragment with a Read sequence that overlaps it, and repeating the extension until a Read sequence that matches the end of another anchor sequence fragment is encountered, thereby obtaining one or more path sequences; S3, selecting at most one of the path sequences as the effective linker sequence that links the end of the starting anchor sequence fragment to the end of the other (endpoint) anchor sequence fragment; S4, linking the starting anchor sequence fragment to the corresponding endpoint anchor sequence fragment with the effective linker sequence, then either taking the linked sequence as a new anchor sequence fragment or recording the free ends of the remaining anchor sequence fragments, and returning to S2. S2 to S4 are repeated until the ultra-long continuous DNA sequence is formed. The method makes it easier to restore the sequences of whole chromosomes and of the whole genome.
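Steps S2 and S3 can be sketched in Python as a depth-first extension followed by a selection rule. This is a toy illustration under stated assumptions, not the patented procedure: overlaps are exact and single-stranded, each Read is used at most once per path, and the selection rule (accept a linker only when exactly one distinct path exists) is a stand-in because the abstract does not state the actual criteria. All function and parameter names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Longest exact suffix of `a` equal to a prefix of `b` (0 if none)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def find_paths(start_idx, anchors, reads, min_len=3, max_steps=10):
    """S2 sketch: extend the free end of anchors[start_idx] with overlapping
    reads until the growing sequence overlaps the start of another anchor.

    Returns a list of (end_anchor_index, path_sequence) tuples.
    """
    paths = []

    def extend(seq, used, steps):
        reached = False
        for k, anchor in enumerate(anchors):
            if k == start_idx:
                continue
            n = overlap(seq, anchor, min_len)
            if n:
                paths.append((k, seq + anchor[n:]))
                reached = True
        if reached or steps == max_steps:
            return  # stop extending once another anchor has been reached
        for i, read in enumerate(reads):
            if i in used:
                continue
            n = overlap(seq, read, min_len)
            if n and n < len(read):  # the read must add new bases
                extend(seq + read[n:], used | {i}, steps + 1)

    extend(anchors[start_idx], frozenset(), 0)
    return paths


def select_linker(paths):
    """S3 sketch: keep at most one path. Assumed rule: accept only when all
    candidate paths agree, otherwise reject as ambiguous."""
    distinct = set(paths)
    return next(iter(distinct)) if len(distinct) == 1 else None


if __name__ == "__main__":
    anchors = ["AAACGT", "GGATCC"]
    reads = ["CGTTTG", "TTGGGAT"]
    print(select_linker(find_paths(0, anchors, reads)))
```

Returning `None` for ambiguous cases reflects the "at most one" wording of S3: when competing paths exist, for example through different repeat copies, no linker is accepted and the anchor end stays free for a later round.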

Description

Technical field

[0001] The invention relates to a genome assembly method for constructing ultra-long continuous DNA sequences, and belongs to the technical field of genome assembly.

Background

[0002] A sequencer generates random reads (Reads) by sequencing genome fragments. The distribution of these Reads on the genome is random. Genome assembly is the process of arranging and connecting these Reads in the correct order, assembling them into base-continuous DNA fragments (Contigs), and finally restoring the sequence of the entire chromosome and the entire genome. The assembly process generally includes three steps: assembly of continuous fragments (Contigs), assembly of discontinuous fragments with gaps (Scaffolds), and gap filling (GF). The difficulty of genome assembly comes from the large number of repetitive sequences in the genome (that is, two or more segments with similar or identical sequences). Repeats in the genome can be di...
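The difficulty that repeats cause can be shown with a small Python sketch. It assumes a simple position-by-position identity score and a hypothetical `min_identity` threshold; real aligners use more elaborate scoring, so this only illustrates why error tolerance makes near-identical repeat copies indistinguishable.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_reads(reads, copies, min_identity):
    """Map each read to every repeat copy it matches at or above the
    identity threshold (threshold value is an illustrative assumption)."""
    return {
        read: [name for name, seq in copies.items()
               if identity(read, seq) >= min_identity]
        for read in reads
    }


if __name__ == "__main__":
    # Two repeat copies that differ only at the last base.
    copies = {"copy1": "ACGTACGTAC", "copy2": "ACGTACGTAG"}
    reads = ["ACGTACGTAC", "ACGTACGTAG"]
    print(assign_reads(reads, copies, min_identity=0.9))  # tolerant matching
    print(assign_reads(reads, copies, min_identity=1.0))  # exact matching
```

With a tolerant threshold each Read is assigned to both copies, so the copies are effectively merged; only exact matching separates them, which is not practical when the Reads themselves contain sequencing errors.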


Application Information

IPC(8): C12N15/10
CPC: C12N15/1027
Inventor: 梁承志, 杜会龙
Owner: INST OF GENETICS & DEVELOPMENTAL BIOLOGY CHINESE ACAD OF SCI