Nucleic acid sequence splicing method and device

The nucleic acid sequence splicing method and device apply sequencing technology to sequence assembly. They address problems such as short read length, the resulting difficulty for splicing algorithms, and poor assembly results, and they improve scaffold construction.

Active Publication Date: 2017-11-07
BGI SHENZHEN CO LTD

AI Technical Summary

Problems solved by technology

[0003] The basic splicing problem in second-generation sequencing is that short read lengths make the work of the splicing algorithm difficult, in particular the handling of sequencing errors and of repeated sequences. In real genome projects, large genomes, especially plant genomes, have a very high repeat content, in some cases as high as 60%. The resulting short, fragmented contig sequences degrade scaffold construction (a scaffold is a set of contigs joined in a determined order) and gap filling in later splicing steps, so the overall result is poor.



Examples


Embodiment Construction

[0023] Researchers have developed a variety of algorithms and written a large number of splicing programs. Among the more successful are Velvet and SOAPdenovo. SOAPdenovo, developed by Shenzhen Huada Genomics (BGI), is a splicing program for Illumina high-throughput sequencing data and can be downloaded from http://soap.genomics.org.cn/soapdenovo.html. It is based on the de Bruijn graph algorithm, and its splicing process generally includes the following steps A-F.

[0024] Step A, constructing read libraries with different insert sizes, such as 180 bp (base pairs) and 500 bp;

[0025] Step B, cutting the reads sequenced from all the small-insert libraries (180/500 bp) into smaller sequence fragments, and constructing a de Bruijn graph from the overlaps between those fragments, so as to connect the reads;

[0026] Step C, the de Bruijn graph constructed in step B is very complicated. In or...
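The graph construction described in step B can be illustrated with a minimal sketch. It is not SOAPdenovo's implementation: it assumes the "smaller sequence fragments" are fixed-length k-mers, uses (k-1)-mers as nodes, and ignores sequencing errors and reverse complements. The function names `kmers`, `build_de_bruijn`, and `walk_unbranched`, the toy reads, and the value of `k` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every length-k substring of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to.

    Every k-mer contributes one edge from its prefix to its suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unbranched(graph, start):
    """Spell a sequence by following successors while out-degree is exactly one.

    Stops at a node with no successor, at a branch, or when a node repeats.
    """
    sequence = start
    node = start
    seen = {start}
    while node in graph and len(graph[node]) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        sequence += nxt[-1]
        seen.add(nxt)
        node = nxt
    return sequence


# Toy reads (assumed data): three overlapping fragments of one short sequence.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)
```

In this toy case every node has a single successor, so the walk reconnects the three reads into one sequence. A repeated sequence longer than k-1 bases would give some node more than one successor, and the walk would stop there, which is the source of the short, fragmented contigs discussed in the problem statement.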


Abstract

The invention discloses a nucleic acid sequence splicing method and device. The method comprises: receiving sequencing data that includes reads and read-through data; constructing an original splicing graph from the reads; aligning the read-through data to the edges of the original splicing graph; selecting anchor edges from the edge set of the original splicing graph; constructing a local subgraph centered on each anchor edge; simplifying the local subgraph, and repeatedly selecting anchor edges in the simplified result for further processing until no new anchor edge remains; and merging the local subgraphs that remain after processing and outputting the merged result as the splicing result. By aligning the read-through data to the original splicing graph built from the reads, selecting anchor edges from it, building local subgraphs around them, and then simplifying and merging those subgraphs, the method obtains longer paths. This resolves path selection through repeated sequences between anchor points, completes the sequence splicing task, and makes it possible to improve scaffold construction.
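Two of the steps named in the abstract, selecting anchor edges and building a local subgraph around one of them, can be sketched as follows. The abstract does not state the selection criterion or how far a local subgraph extends, so both are assumptions here: an edge counts as an anchor when at least `min_hits` aligned records land on it, and the subgraph collects every edge within `radius` node-hops of the anchor's endpoints. The names `select_anchor_edges` and `local_subgraph` and the toy graph are hypothetical, and the simplification and merging steps are not shown.

```python
from collections import defaultdict, deque


def select_anchor_edges(hits, min_hits):
    """Return edge ids whose alignment count reaches min_hits (assumed criterion)."""
    return sorted(edge for edge, count in hits.items() if count >= min_hits)


def local_subgraph(edges, anchor, radius):
    """Return ids of edges within `radius` node-hops of the anchor edge's endpoints.

    `edges` maps an edge id to its (source, target) node pair. Direction is
    ignored when exploring the neighbourhood (an assumption of this sketch).
    """
    incident = defaultdict(set)
    for edge_id, (u, v) in edges.items():
        incident[u].add(edge_id)
        incident[v].add(edge_id)

    src, dst = edges[anchor]
    depth = {src: 0, dst: 0}
    queue = deque([src, dst])
    picked = {anchor}
    while queue:
        node = queue.popleft()
        if depth[node] >= radius:
            continue
        for edge_id in incident[node]:
            picked.add(edge_id)
            for other in edges[edge_id]:
                if other not in depth:
                    depth[other] = depth[node] + 1
                    queue.append(other)
    return picked


# Toy splicing graph (assumed data): five edges, with alignment counts per edge.
edges = {
    "e1": ("A", "B"),
    "e2": ("B", "C"),
    "e3": ("C", "D"),
    "e4": ("D", "E"),
    "e5": ("B", "F"),
}
hits = {"e1": 0, "e2": 5, "e3": 1, "e4": 4, "e5": 0}

anchors = select_anchor_edges(hits, min_hits=3)
neighbourhood = local_subgraph(edges, "e2", radius=1)
print(anchors, sorted(neighbourhood))
```

Working on a bounded neighbourhood rather than the whole graph keeps each simplification step small, which is one plausible reading of why the method is organized around anchor edges.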

Description

Technical field

[0001] The invention relates to the field of biological information processing, in particular to a nucleic acid sequence splicing method and device.

Background

[0002] In the development of genome sequencing technology, second-generation sequencing, with its low cost, high throughput (the amount of data output within a given period of time), and improved accuracy, has opened up unprecedented application prospects, such as genome assembly and structural variation detection. The reads (small fragments obtained by randomly fragmenting DNA sequences) produced by second-generation sequencing are short, extremely numerous, and high in coverage, which makes sequence assembly unprecedentedly difficult. A key step in genome sequencing is sequence assembly. Sequence splicing is to compare and merge small reads into larg...
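The idea of comparing and merging reads can be illustrated with a minimal sketch that joins two reads on their longest exact suffix-prefix overlap. This is a generic overlap merge, not the method of the invention; the function names `overlap_length` and `merge_reads`, the `min_overlap` parameter, and the example reads are hypothetical, and sequencing errors are ignored.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Assumed example reads sharing the four bases "GCGT".
merged = merge_reads("ATGGCGT", "GCGTGCA")
print(merged)
```

With millions of short reads, comparing every pair this way becomes expensive, and repeats make many overlaps ambiguous, which is part of why graph-based approaches are used for second-generation data.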


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F19/18, G06F19/20
Inventor: 李振宇, 陈燕香, 张浩, 袁剑颖, 张广鑫, 李一萱
Owner: BGI SHENZHEN CO LTD