Method and system for filtering sequence segments in short-sequence assembly

A method and system for filtering sequence fragments, applied in the field of genetic engineering. It addresses problems such as the inability to analyze gene sequences, huge memory consumption, and bubble-type errors, and achieves small error, a reduced short-string set, and improved performance.

Active Publication Date: 2013-04-24
深圳市弘志拓新创业投资企业(有限合伙)

AI Technical Summary

Problems solved by technology

For the second problem, high-throughput data itself generates k-mer nodes on a large scale, which are built into graphs for analysis, and the introduction of sequencing errors makes the number of k-mer nodes 5 times larger. For example, human genome sequencing data generates about 15G k-mers. If the k-mers produced by sequencing errors are loaded into the computer and processed directly, they consume a huge amount of memory; for example, if the human...



Examples


Embodiment Construction

[0041] In order to enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0042] In the embodiment of the present invention, short strings (k-mers) of fixed base length are obtained by sliding along each received sequencing read and cutting it base by base. The sequence value of each short string is stored and its occurrence frequency is counted. Based on the occurrence frequency of each short string, a frequency statistics graph of the short strings is drawn, a short-string frequency threshold is calculated, and the short strings whose frequency is less than the threshold are filtered out.
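The steps in paragraph [0042] can be illustrated with a minimal Python sketch. The excerpt does not state how the threshold is derived from the frequency graph, so the rule used here (the first valley of the frequency histogram) is an assumption, as are all function names; this is not the patent's own implementation.

```python
from collections import Counter


def kmers(read, k):
    """Slide a window of width k along the read, one base at a time."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def frequency_histogram(counts):
    """Map each frequency f to the number of distinct k-mers seen f times."""
    return Counter(counts.values())


def valley_threshold(hist):
    """ASSUMED rule: return the first frequency f at which the histogram
    starts rising again (hist[f + 1] > hist[f]). Low-frequency k-mers
    left of this valley are treated as likely sequencing errors."""
    if not hist:
        return 1
    for f in range(1, max(hist)):
        if hist.get(f + 1, 0) > hist.get(f, 0):
            return f
    return 1  # no valley found: keep every k-mer


def filter_kmers(counts, threshold):
    """Drop k-mers whose frequency is less than the threshold."""
    return {kmer: n for kmer, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Five error-free copies of one read plus one read with a substitution.
    reads = ["ACGTTGCA"] * 5 + ["ACGATGCA"]
    counts = count_kmers(reads, k=4)
    threshold = valley_threshold(frequency_histogram(counts))
    print(threshold, sorted(filter_kmers(counts, threshold)))
```

In this toy input the four k-mers introduced by the substitution occur once each, while the correct k-mers occur five or six times, so the valley rule separates the two groups. Real data would need a more robust threshold estimate.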

[0043] Figure 1 shows...


Abstract

The invention discloses a method for filtering sequence segments in short-sequence assembly. The method comprises the following steps: receiving sequencing reads and sliding along each received read, cutting it base by base, to obtain short strings of fixed base length; storing the sequence value and occurrence frequency of each short string as a node; calculating a short-string frequency threshold; and filtering out the short strings whose frequency is smaller than the threshold. The invention further provides a system for filtering sequence segments in short-sequence assembly. The method and system filter out erroneous short strings, shrink the set of short strings to be assembled and spliced, reduce the memory required by assembly and splicing programs, and improve their performance. They also count short-string frequencies while storing the short strings, are simple to operate, and introduce little error.
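As a hedged illustration of storing "sequence value and occurrence frequency as a node", the sketch below packs each k-mer into an integer at 2 bits per base and keeps one node per distinct value. The 2-bit encoding, the `Node` class, and the function names are assumptions made for illustration; the excerpt does not specify the node layout. The sketch also assumes reads contain only A, C, G, and T.

```python
from dataclasses import dataclass

# Assumed 2-bit code per base; reads with other symbols (e.g. N) are not handled.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


@dataclass
class Node:
    value: int          # k-mer packed at 2 bits per base
    frequency: int = 0  # number of occurrences seen so far


def encode(kmer):
    """Pack a k-mer into an integer, first base in the highest bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def decode(value, k):
    """Recover the k-mer string from its packed integer value."""
    return "".join(BASES[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


def build_nodes(reads, k):
    """Slide base by base over each read, updating the packed value
    incrementally and counting occurrences in one node per k-mer."""
    mask = (1 << (2 * k)) - 1
    nodes = {}
    for read in reads:
        if len(read) < k:
            continue
        value = encode(read[:k])
        for i in range(k, len(read) + 1):
            nodes.setdefault(value, Node(value)).frequency += 1
            if i < len(read):
                # Shift out the oldest base and shift in the next one.
                value = ((value << 2) | CODE[read[i]]) & mask
    return nodes


if __name__ == "__main__":
    nodes = build_nodes(["ACGTAC"], k=2)
    for node in nodes.values():
        print(decode(node.value, 2), node.frequency)
```

Packing the k-mer into an integer keeps each node small and makes the sliding step a constant-time shift and mask, which matters when the node count reaches the scale described above.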

Description

Technical field

[0001] The invention relates to the technical field of genetic engineering, in particular to a method and system for filtering sequence fragments in short-sequence assembly.

Background

[0002] The short sequences generated by the new sequencing technology have two characteristics: first, the sequence length is short; second, the data volume is large. Commonly used long-sequence assembly software such as phrap splices and assembles sequences based on the overlaps between them. Applied to short sequences, this approach requires too much computation and has no practical value. Emerging short-sequence assemblers are limited by memory, time, and other factors, and have so far been applied successfully only to smaller prokaryotic genomes. Next-generation sequencing analysis faces the following difficulties: first, massive numbers of sequence fragments; the length of the genome source sequence ranges from 100,000 bases (such as porc...


Application Information

IPC IPC(8): G06F19/22
Inventor 孟金涛, 魏彦杰, 曾理, 成杰峰, 冯圣中
Owner 深圳市弘志拓新创业投资企业(有限合伙)