Nucleotide sequence assembly method and device

A nucleic acid sequence assembly technology, applied in the field of nucleic acid sequence splicing methods and devices. It addresses problems such as short read length, difficulty for the splicing algorithm, and poor assembly results, and achieves the effect of improving scaffold construction.

Active Publication Date: 2015-08-19
BGI SHENZHEN CO LTD
4 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

[0003] The basic splicing problem facing second-generation sequencing technology is that its short read length makes the splicing algorithm's task harder, in particular handling sequencing errors and handling repeated sequences.
In real genome projects, large genomes, especially plant genomes, have very high repeat content, in some cases as high as 60%. The resulting short, fragmented contig sequences hamper the construction of scaffolds (a scaffold consists of contigs joined in a determined order) and gap filling in the subsequent splicing process, so the results are poor.
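
To illustrate why repeats are hard for short fragments, here is a minimal Python sketch. The toy sequence, the function names, and the choice of fragment lengths are all assumptions made for illustration; none of them come from the patent. It lists the fragments of length k that are followed by more than one distinct fragment, which is where an assembler cannot tell which way to continue.

```python
from collections import defaultdict


def kmer_successors(seq, k):
    """Map each length-k fragment of seq to the set of fragments that follow it."""
    succ = defaultdict(set)
    for i in range(len(seq) - k):
        succ[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return succ


def ambiguous_kmers(seq, k):
    """Return the fragments that have more than one distinct successor."""
    succ = kmer_successors(seq, k)
    return sorted(kmer for kmer, nxt in succ.items() if len(nxt) > 1)


# Hypothetical toy genome: the repeat "ACGT" occurs twice,
# each time followed by a different base.
genome = "TTACGTGGACGTCC"

print(ambiguous_kmers(genome, 3))  # ['CGT']: shorter than the repeat, so ambiguous
print(ambiguous_kmers(genome, 5))  # []: longer than the repeat, so no ambiguity
```

In this toy case a fragment longer than the repeat removes the ambiguity. Real repeats are often far longer than a second-generation read, so that option is not available.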

Examples

Embodiment Construction

[0023] Researchers have developed a variety of algorithms and written a large number of splicing programs. Among the more successful are Velvet and SOAPdenovo. SOAPdenovo, developed by Shenzhen Huada Genomics (BGI), is a splicing program for Illumina high-throughput sequencing data and can be downloaded from http://soap.genomics.org.cn/soapdenovo.html. It is based on the de Bruijn graph algorithm, and its splicing process generally comprises the following steps A-F.

[0024] Step A, constructing read libraries with different insert sizes, such as 180 bp (base pairs), 500 bp, etc.;

[0025] Step B, cutting the reads sequenced from all the small-insert libraries (180/500 bp) into shorter sequence fragments, and constructing a de Bruijn graph from the overlaps between those fragments, thereby connecting the reads;

[0026] Step C, the de Bruijn graph constructed in step B is very complicated. In or...
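
Step B can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SOAPdenovo's implementation: the function names, the toy reads, and the fragment length k = 3 are hypothetical, and the sketch ignores reverse complements and sequencing errors. Each read is cut into fragments of length k, and each fragment becomes an edge from its first k-1 bases to its last k-1 bases.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:  # keep one edge per distinct k-mer
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def walk_unambiguous(graph, start):
    """Extend a path from start while the current node has exactly one successor."""
    path, node, visited = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in visited:  # stop on a cycle
            break
        visited.add(node)
        path += node[-1]
    return path


# Hypothetical overlapping reads drawn from the toy sequence "ATGGCTTAC".
reads = ["ATGGC", "GGCTT", "CTTAC"]
graph = build_de_bruijn(reads, k=3)
print(walk_unambiguous(graph, "AT"))  # ATGGCTTAC
```

The walk stops as soon as a node has zero or several successors, which is exactly the situation that repeats create in a real graph.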

Abstract

The invention discloses a nucleotide sequence assembly method and device. The method comprises the steps of: receiving sequencing sequences, wherein the sequencing sequences comprise reads and sequenced data; creating an original assembly graph from the reads; aligning the sequenced data to the edges of the original assembly graph; selecting anchor edges from the edge set of the original assembly graph; creating partial subgraphs centered on the anchor edges; simplifying the partial subgraphs, then repeatedly selecting anchor edges from the simplified result and processing them until no new anchor edge exists; and, after processing, merging the remaining partial subgraphs and outputting the merged result as the assembly result. In this method, the sequenced data is aligned to the original assembly graph built from the reads, anchor edges are selected from that graph, and partial subgraphs are built around them. Simplifying and merging the partial subgraphs yields longer paths, which resolves the path-selection problem for repetitive sequences lying between anchor edges, completes the assembly of the sequencing sequences, and is likely to improve scaffold construction.
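
The step of creating a partial subgraph centered on an anchor edge can be pictured with the following Python sketch. The abstract does not say how the extent of a partial subgraph is decided, so the hop-count `radius`, the edge-list representation, and the function name are all assumptions made for illustration; this is a generic bounded breadth-first search, not the patented procedure.

```python
from collections import deque


def local_subgraph(edges, anchor, radius):
    """Return the edges whose endpoints both lie within `radius` hops of the anchor edge.

    edges  -- list of (source, target) node pairs
    anchor -- one (source, target) pair taken from edges
    radius -- assumed cut-off in hops, ignoring edge direction
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Breadth-first search outward from both endpoints of the anchor edge.
    dist = {anchor[0]: 0, anchor[1]: 0}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    return [(u, v) for u, v in edges if u in dist and v in dist]


# Hypothetical toy graph with one disconnected edge ("x", "y").
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("x", "y")]
print(local_subgraph(toy_edges, ("b", "c"), radius=1))
# [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

Working on such a bounded neighborhood keeps each simplification step local, which matches the abstract's description of simplifying partial subgraphs before merging them.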

Description

Technical field

[0001] The invention relates to the field of biological information processing, in particular to a nucleic acid sequence splicing method and device.

Background

[0002] In the development of genome sequencing technology, second-generation sequencing, with its low cost, high throughput (the amount of data output within a given period of time), and improved accuracy, has opened up unprecedented application prospects, such as genome assembly and structural variation detection. The reads (small fragments obtained by randomly fragmenting DNA sequences) produced by second-generation sequencing are not only short but also extremely numerous and high in coverage, which makes sequence assembly unprecedentedly difficult. A key step in genome sequencing is sequence assembly. Sequence splicing is to compare and merge small reads into larg...

Application Information

IPC(8): G06F19/18, G06F19/20
Inventor: 李振宇, 陈燕香, 张浩, 袁剑颖, 张广鑫, 李一萱
Owner BGI SHENZHEN CO LTD