Parallel gene splicing method based on De Bruijn graph

A gene splicing and depth graph technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the inability of existing methods to splice large genomes with massive data volumes, and achieves fast splicing, a high degree of parallelism, and a simple graph simplification process.

Active Publication Date: 2013-08-21
SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0008] The present invention provides a parallel gene splicing method based on a De Bruijn graph, aiming to solve the problem that traditional single-machine serial gene splicing ...


Example Embodiment

[0044] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0045] As shown in Figure 1, an embodiment of the present invention provides a parallel gene splicing method based on a De Bruijn graph. The method includes the following steps:

[0046] Step S1: Construct a distributed De Bruijn graph in parallel.
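As a rough illustration of what a distributed De Bruijn graph might look like, the sketch below simulates several processors in a single Python process. It is not the patented procedure: the hash-based ownership rule, the choice of (k-1)-mers as nodes, and the names `owner`, `build_distributed_graph` and `successors` are all assumptions made for this example.

```python
from collections import defaultdict
import zlib


def owner(node: str, num_procs: int) -> int:
    """Map a node to the simulated processor that stores it (assumed rule)."""
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(node.encode("ascii")) % num_procs


def build_distributed_graph(reads, k, num_procs):
    """Return one adjacency dict per simulated processor.

    Nodes are (k-1)-mers. Every k-mer in a read contributes an edge
    prefix -> suffix, stored on the processor that owns the prefix,
    together with a count of how often the edge was seen.
    """
    partitions = [defaultdict(lambda: defaultdict(int)) for _ in range(num_procs)]
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            prefix, suffix = kmer[:-1], kmer[1:]
            partitions[owner(prefix, num_procs)][prefix][suffix] += 1
    return partitions


def successors(partitions, node):
    """Look up the outgoing edges of a node on the processor that owns it."""
    part = partitions[owner(node, len(partitions))]
    return dict(part.get(node, {}))


if __name__ == "__main__":
    parts = build_distributed_graph(["ACGTAC", "CGTACG"], k=3, num_procs=2)
    print(successors(parts, "AC"))
```

In a real cluster each partition would live in a different process or machine, and edges whose prefix is owned elsewhere would be sent over the network rather than written into a local list.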

[0047] In this embodiment, the distributed De Bruijn graph is constructed in parallel through the following steps, as shown in Figure 2:

[0048] Step S11: Initialization. All processors read the original short-read sequence file in parallel, with each processor reading one part of the file. Depending on ...
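The idea of each processor reading its own part of the read file can be sketched as follows. This is only one plausible way to do it, under stated assumptions: the file holds one read per line (real FASTQ or FASTA input needs record-aware splitting), the file is split into byte ranges, and the helper names `byte_ranges` and `read_chunk` are hypothetical. A read that straddles a range boundary is assigned to the processor whose range contains its first byte.

```python
import os
import tempfile


def byte_ranges(file_size, num_procs):
    """Split [0, file_size) into num_procs nearly equal contiguous ranges."""
    base, extra = divmod(file_size, num_procs)
    ranges, start = [], 0
    for rank in range(num_procs):
        end = start + base + (1 if rank < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def read_chunk(path, start, end):
    """Return the reads whose first byte lies in [start, end).

    Assumes one read per line (an assumption of this sketch).
    """
    reads = []
    with open(path, "rb") as f:
        if start > 0:
            # Move to the first line that begins at or after `start`.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            line = line.strip()
            if line:
                reads.append(line.decode("ascii"))
    return reads


def demo():
    """Write a small read file and split it among three simulated processors."""
    reads = ["ACGTAC", "CGTACG", "TTGACA", "GGCATC", "ACCGTA"]
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("\n".join(reads) + "\n")
        path = tmp.name
    try:
        size = os.path.getsize(path)
        chunks = [read_chunk(path, s, e) for s, e in byte_ranges(size, 3)]
    finally:
        os.remove(path)
    return reads, chunks


if __name__ == "__main__":
    print(demo()[1])
```

Because every processor computes its range from the file size and its own rank alone, no coordination is needed before reading starts.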

Abstract

The invention relates to the technical field of gene sequencing and provides a parallel gene splicing method based on a De Bruijn graph. The method comprises the following steps: S1, a distributed De Bruijn graph is constructed in parallel; S2, erroneous paths are removed; S3, the De Bruijn graph is simplified using a depth graph traversal method; S4, contigs are merged and scaffolds are generated; S5, the scaffolds are output. The method runs on a cluster system and builds the De Bruijn graph in parallel, which solves the problem that traditional single-machine serial gene splicing algorithms cannot build the graph, and therefore cannot proceed, when splicing large genomes whose data volume is too large. In addition, simplification is performed in parallel based on depth graph traversal, so the graph simplification process is simple, the degree of parallelism is high, and splicing is fast.
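To give a feel for what simplifying a De Bruijn graph can mean, the sketch below merges maximal non-branching paths into contig strings. This is a common, serial textbook technique chosen for illustration; it is not the parallel depth graph traversal described in the abstract, and the function names `build_graph` and `compact_paths` are hypothetical.

```python
from collections import defaultdict


def build_graph(kmers):
    """Build successor and predecessor maps over (k-1)-mer nodes."""
    succ, pred = defaultdict(set), defaultdict(set)
    for kmer in kmers:
        u, v = kmer[:-1], kmer[1:]
        succ[u].add(v)
        pred[v].add(u)
    return succ, pred


def compact_paths(kmers):
    """Merge maximal non-branching paths into contig strings.

    Isolated cycles, in which every node has exactly one predecessor
    and one successor, are not reported by this simple sketch.
    """
    succ, pred = build_graph(kmers)
    nodes = set(succ) | set(pred)

    def is_internal(n):
        # A node with exactly one way in and one way out can be merged.
        return len(pred[n]) == 1 and len(succ[n]) == 1

    contigs = []
    for start in sorted(nodes):
        if is_internal(start):
            continue  # paths only start at branch points, sources or sinks
        for nxt in sorted(succ[start]):
            contig, cur = start + nxt[-1], nxt
            while is_internal(cur):
                cur = next(iter(succ[cur]))
                contig += cur[-1]
            contigs.append(contig)
    return contigs


if __name__ == "__main__":
    print(compact_paths(["ACG", "CGT", "GTA", "GTC"]))
```

In the example, the chain AC -> CG -> GT collapses into the single contig ACGT, while the two branches leaving GT stay separate.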

Description

Technical field

[0001] The invention relates to the technical field of gene sequencing, and in particular to a parallel gene splicing method based on a De Bruijn graph.

Background

[0002] Gene sequencing is one of the most important problems in modern bioinformatics. With the development of modern biology, gene sequencing is used more and more widely, for example in gene diagnosis, gene therapy and drug design. An important step in gene sequencing is gene splicing.

[0003] With the widespread application of gene sequencing, on the one hand, large numbers of organisms with large genomes need to be sequenced, and sequencing large genomes produces very large amounts of data; on the other hand, this demands ever faster gene splicing algorithms.

[0004] There are two main types of gene sequence assembly algorithms. The first type is based on overlap graphs. In the overlap graph, each short se...

Application Information

IPC(8): G06F19/18
Inventor: 曾理, 成杰峰, 孟金涛, 涂志兵, 冯圣中
Owner: SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI