Analysis and integration method and device for sequencing of medium-short gene segment

A gene fragment sequencing technology applied to the analysis and splicing of short and medium gene fragment sequencing data. It addresses problems such as low error tolerance, unsatisfactory splicing performance, and high time consumption.

Inactive Publication Date: 2015-10-07
XI AN JIAOTONG UNIV
3 Cites, 11 Cited by

AI Technical Summary

Problems solved by technology

However, this algorithm consumes a lot of time and memory to build de Bruijn graphs, has a low fault tolerance rate, and its splicing performance cannot meet requirements.



Examples


Embodiment Construction

[0058] The invention provides a method for analyzing and splicing the sequencing of short and medium gene fragments, the process of which is shown in Figure 1. The implementation of the present invention will be described in detail below in conjunction with the drawings and examples.

[0059] The flow chart of the analysis and splicing method for sequencing short and medium gene fragments is shown in Figure 1.

[0060] 1. Check the read sequences and discard two kinds of reads: those containing the unrecognized base N, and those in which more than 90% of the bases are A.
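The filtering rule in step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `keep_read` and the sample reads are hypothetical, and "more than 90%" is read here as a strict inequality.

```python
def keep_read(read: str) -> bool:
    """Return True if a read passes both checks described in step 1."""
    seq = read.upper()
    if not seq:
        return False  # assumption: empty reads are treated as unusable
    if "N" in seq:
        return False  # contains an unrecognized base
    # Discard when strictly more than 90% of the bases are A.
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    if seq.count("A") * 10 > len(seq) * 9:
        return False
    return True


# Hypothetical sample reads, shortened for readability.
reads = ["ACGTACGTAC", "ACGTNACGTA", "AAAAAAAAAA", "AAAAAAAAAC"]
kept = [r for r in reads if keep_read(r)]
```

Under this reading, a read with exactly 90% A bases is kept; only reads above that threshold are dropped.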

[0061] 2. The read length in the embodiment of the present invention is 35 bp. Each read is broken into 28 k-mers of length 8 (that is, adjacent k-mers overlap by 7 base positions), and each base is compressed into a two-bit binary number by constructing a mapping relationship.
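The arithmetic in step 2 (35 - 8 + 1 = 28 k-mers per read) and the two-bit compression can be sketched as below. The specific mapping A=00, C=01, G=10, T=11 is an assumption, since the text does not state which codes the invention assigns; the names `BASE_TO_BITS`, `split_kmers`, and `encode_kmer` are likewise hypothetical.

```python
# Assumed mapping; the source only says each base becomes a two-bit number.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def split_kmers(read: str, k: int = 8) -> list[str]:
    """Slide a window of length k over the read, one base at a time."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, two bits per base, first base highest."""
    value = 0
    for base in kmer:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


read = "ACGT" * 8 + "ACG"          # a made-up 35 bp read
parts = split_kmers(read)           # 28 k-mers of length 8
codes = [encode_kmer(p) for p in parts]
```

With this packing an 8-mer fits in 16 bits, so each k-mer can be stored and compared as a small integer rather than as a string.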

[0062] In the process of data analy...

Abstract

The present invention provides an analysis and integration method and device for sequencing medium-short gene segments. The method comprises: checking the read sequences and removing those that contain errors or unreliable information; reading the processed read data, analyzing it, and constructing a k-mer structure and a quad-tree structure; constructing an integration storage table that records the progress of the integration process and the reads currently participating in integration; selecting an initial k-mer to start integration, then repeatedly selecting the subsequent k-mer according to an integration scoring formula while updating the integration storage table in real time, so as to obtain contig sequences; and combining the contig sequences using read-pair information and a longest common subsequence method to generate and output super-contigs. To meet the special performance requirements of the integration method, the device provided by the present invention has an embedded handheld structure. With the method and device provided by the present invention, analysis and integration of medium-short gene segment sequencing can be carried out rapidly and accurately.
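The abstract names a longest common subsequence (LCS) method for combining contigs but does not say how it is applied together with the read-pair information. As background only, the textbook LCS dynamic program is sketched below; the function name and the example strings are hypothetical, and this is not presented as the invention's merging procedure.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Textbook O(len(a) * len(b)) dynamic program returning one LCS."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


# Two made-up contig ends that share the run "GTTGCA".
shared = longest_common_subsequence("ACGTTGCA", "GTTGCATT")
```

A long common subsequence between the end of one contig and the start of another is one plausible signal that the two overlap, which is presumably why the method is used at the super-contig stage.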

Description

Technical field

[0001] The invention relates to the technical field of biological gene sequencing, in particular to an analysis and splicing method and equipment for sequencing short and medium gene fragments.

Background art

[0002] In recent years, biological gene sequencing technology has developed rapidly. Although the accuracy of gene sequencing has improved, the time has been shortened, and the cost has been reduced, the amount of data that needs to be processed in the process of gene sequencing has increased. Analyzing and processing massive data and efficiently completing gene fragment splicing with the help of computer technology is therefore a key link in gene sequencing.

[0003] The main characteristics of the sequencing fragments (referred to as reads) obtained by the early Sanger first-generation sequencing technology are: the reads are long (500-1000bp), the number of reads is relatively small, and the overlapping relationship between reads is easy to find, ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/20
Inventor: 韩九强, 李严桵, 钟德星, 刘俊, 张新曼
Owner: XI AN JIAOTONG UNIV