Bitmap-based fast positioning method for gene sequence fragments

A gene sequence positioning technology, applied in genomics, special-purpose data processing applications, instruments, etc. It addresses problems such as heavy computational burden, achieves fast and accurate positioning, and accelerates queries over data organized into hash buckets.

Active Publication Date: 2016-02-17
GENETALKS BIO TECH CHANGSHA CO LTD
Cites: 6; Cited by: 6

AI Technical Summary

Problems solved by technology

In summary, the Bloom filter is designed mainly for situations in which most query data does not hit any data in the database. Its disadvantage is that it can only give a
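For context, a Bloom filter answers set-membership queries with k hash positions over an m-bit array: a 0 at any hashed position proves the key is absent, while all 1s means only that the key may be present. The minimal Python sketch below illustrates this; the sizes (m = 1024, k = 3) and the SHA-256-based hashing are arbitrary illustrative choices, not taken from the patent. A positive answer carries no location information, so a separate lookup is still needed to find where the key is actually stored.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)  # every bit starts at 0

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely absent. True: possibly present (false positives allowed).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = BloomFilter(m_bits=1024, k_hashes=3)
for kmer in ["ACGTAC", "CGTACG", "GTACGT"]:
    bf.add(kmer)

print(bf.might_contain("ACGTAC"))  # True: inserted keys are never reported absent
print(bf.might_contain("TTTTTT"))  # False unless a false positive occurs
```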

Method used



Embodiment Construction

[0037] As shown in Figure 1, the steps of the bitmap-based method for rapidly locating gene sequence fragments in this embodiment include:

[0038] 1) Construct a bitmap for storing gene sequence fragment information, and initialize every data bit of the bitmap to 0;

[0039] 2) Cyclically shift each gene sequence fragment in the gene reference chain to generate a plurality of gene sequence fragment vectors; map each vector to a unique data bit in the bitmap using a hash function and set that bit from 0 to 1; count the number t of 1 bits in row R, where the data bit is located, from column 0 through the data bit's column; and store the key-value pair <Key_t, Value_t> in the t-th position of the hash bucket corresponding to row number R in the database. If all the data bits of the multiple gene sequence fragment vectors of a certain gene sequence fragment hav...
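Steps 1) and 2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the bitmap shape (`ROWS` by `COLS`), the SHA-256-based `bit_of` hash, and the names `rotations`, `BitmapIndex`, `rank` and `insert` are hypothetical. Because the source text is truncated where it describes fragments whose bits are all set, the policy of claiming the first rotation whose bit is still 0, and otherwise falling back to a collision set as the abstract describes, is also an assumption.

```python
from __future__ import annotations

import hashlib

ROWS, COLS = 64, 256  # hypothetical bitmap shape; the source does not specify one


def rotations(fragment: str) -> list[str]:
    """All cyclic shifts of a fragment, e.g. "ACG" -> ["ACG", "CGA", "GAC"]."""
    return [fragment[i:] + fragment[:i] for i in range(len(fragment))]


def bit_of(vector: str) -> tuple[int, int]:
    """Hypothetical hash function: map one fragment vector to a (row, column) bit."""
    h = int.from_bytes(hashlib.sha256(vector.encode()).digest()[:8], "big")
    return (h // COLS) % ROWS, h % COLS


class BitmapIndex:
    def __init__(self) -> None:
        self.rows = [0] * ROWS                    # step 1: every data bit starts at 0
        self.buckets = [[] for _ in range(ROWS)]  # one hash bucket per row number R
        self.collisions = []                      # (key, value) pairs with no free bit

    def rank(self, row: int, col: int) -> int:
        """t = number of 1 bits in `row` from column 0 through column `col`."""
        return bin(self.rows[row] & ((1 << (col + 1)) - 1)).count("1")

    def insert(self, key: str, value: int) -> None:
        # Assumed policy: claim the first rotation whose bit is still 0.
        for vec in rotations(key):
            row, col = bit_of(vec)
            if not (self.rows[row] >> col) & 1:
                self.rows[row] |= 1 << col         # set the bit from 0 to 1
                t = self.rank(row, col)
                # Slot t (1-based) of bucket R; list.insert shifts later entries,
                # matching the rank shift of every 1 bit to the right of `col`.
                self.buckets[row].insert(t - 1, (key, value))
                return
        self.collisions.append((key, value))      # every rotation's bit was already 1


ref, n = "ACGTACGTTTGA", 4
idx = BitmapIndex()
for pos in range(len(ref) - n + 1):
    idx.insert(ref[pos:pos + n], pos)
print(sum(len(b) for b in idx.buckets), idx.collisions)
```

Keeping each row's bucket ordered by bit column means the rank t of a set bit is exactly its slot in the bucket, so no per-entry pointer needs to be stored; the trade-off is that setting a new bit shifts the later entries of that row. In this toy reference, the second `ACGT` (position 4) lands in the collision set because its four rotations were already claimed by positions 0 to 3.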



Abstract

The invention discloses a bitmap-based method for rapidly locating gene sequence fragments. The method comprises the following steps: constructing a bitmap; cyclically shifting each gene sequence fragment in a gene reference chain to form a plurality of gene sequence fragment vectors; mapping each vector to a unique data bit in the bitmap and setting that bit from 0 to 1; counting the number t of 1 bits in the data bit's row from column 0 up to the data bit's column; and storing the key-value pair in the t-th position of the hash bucket corresponding to the data bit's row number R in the database. If all of a fragment's data bits are already set to 1, the fragment is added to a collision set. To locate a fragment, the gene sequence fragment to be located is cyclically shifted to form a plurality of gene sequence fragment vectors, which are mapped to unique data bits in the bitmap, thereby locating the fragment in the hash data table. The method can quickly filter out query data that does not hit and can give a relatively precise position within the hash bucket, greatly accelerating queries over data organized into hash buckets. The bitmap-based method is space-efficient, fast, and accurate in positioning.
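The query side described in the abstract can be sketched in the same way. This is a hedged illustration, not the patented code: the index is rebuilt here so the example is self-contained, and the bitmap shape, the SHA-256-based hash, the first-free-bit insertion policy and the function names (`insert`, `locate`) are all hypothetical. What it shows is the filtering: a rotation whose bit is 0 is skipped without touching any bucket, a rotation whose bit is 1 leads straight to slot t of row R's bucket, and the collision set only needs checking when every rotation's bit is 1.

```python
from __future__ import annotations

import hashlib

ROWS, COLS = 64, 256                 # hypothetical bitmap shape
rows = [0] * ROWS                    # bitmap, one int bitmask per row
buckets = [[] for _ in range(ROWS)]  # hash bucket per row number R
collisions = []                      # (key, value) pairs whose bits were all 1


def rotations(s: str) -> list[str]:
    return [s[i:] + s[:i] for i in range(len(s))]


def bit_of(vec: str) -> tuple[int, int]:
    # Hypothetical hash function mapping a vector to one (row, column) bit.
    h = int.from_bytes(hashlib.sha256(vec.encode()).digest()[:8], "big")
    return (h // COLS) % ROWS, h % COLS


def is_set(row: int, col: int) -> bool:
    return bool((rows[row] >> col) & 1)


def rank(row: int, col: int) -> int:
    # t = number of 1 bits in the row from column 0 through `col`.
    return bin(rows[row] & ((1 << (col + 1)) - 1)).count("1")


def insert(key: str, value: int) -> None:
    # Assumed policy: claim the first rotation whose bit is still 0.
    for vec in rotations(key):
        row, col = bit_of(vec)
        if not is_set(row, col):
            rows[row] |= 1 << col
            buckets[row].insert(rank(row, col) - 1, (key, value))
            return
    collisions.append((key, value))


def locate(query: str) -> list[int]:
    """Reference positions stored under `query`; [] means no hit."""
    hits, seen, all_set = [], set(), True
    for vec in rotations(query):
        bit = bit_of(vec)
        if bit in seen:
            continue  # periodic fragments or hash clashes can repeat a bit
        seen.add(bit)
        if not is_set(*bit):
            all_set = False  # filtered: no bucket access for this vector
            continue
        key, value = buckets[bit[0]][rank(*bit) - 1]  # rank t gives the slot directly
        if key == query:
            hits.append(value)
    if all_set:  # only fragments whose bits are all 1 can be in the collision set
        hits.extend(v for k, v in collisions if k == query)
    return sorted(hits)


ref, n = "ACGTACGTTTGA", 4
for pos in range(len(ref) - n + 1):
    insert(ref[pos:pos + n], pos)

print(locate("ACGT"))  # [0, 4]
print(locate("GTTT"))  # [6]
print(locate("TTTT"))  # []
```

Here `ACGT` is found at position 0 through its bucket slot and at position 4 through the collision set, since its rotations (ACGT, CGTA, GTAC, TACG) were all claimed by positions 0 to 3.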

Description

Technical field

[0001] The invention relates to gene sequencing technology, and in particular to a bitmap-based method for rapidly locating gene sequence fragments.

Background art

[0002] Gene fragment mapping is the foundation of current high-throughput gene sequencing. High-throughput sequencing generates a large number of gene sequence fragments, and experiments have found that most of these fragments have exact, complete matches in a much longer reference gene sequence. The reference gene sequence can be regarded as a continuous string over the four letters A, C, G and T, and its length is typically well over 10^9 characters. Starting from each character of such a long reference sequence, the next n characters can be taken as a Key in a Key-Value database, with the fragment's position on the reference chain and other auxiliary information as the Value; together these form a huge Key-Value database.

[0003] Generally speaking, it is m...
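The Key-Value organization described in [0002] can be sketched as a plain in-memory dictionary. This is only an illustration: the Value here holds just the start positions (no other auxiliary information), the function name `build_kmer_index` is hypothetical, and a real reference of more than 10^9 characters would likely need a disk-backed or otherwise compact store. Only the first len(reference) - n + 1 characters can start a full n-character key.

```python
from __future__ import annotations

from collections import defaultdict


def build_kmer_index(reference: str, n: int) -> dict[str, list[int]]:
    """Key: each length-n substring; Value: every start position on the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - n + 1):  # last n - 1 characters cannot start a key
        index[reference[i:i + n]].append(i)
    return dict(index)


ref = "ACGTACGTTTGA"
kmers = build_kmer_index(ref, 4)
print(kmers["ACGT"])  # [0, 4]
print(kmers["TTGA"])  # [8]
```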

Claims


Application Information

IPC (8): G06F19/18
CPC: G16B20/00
Inventors: 宋卓 (Song Zhuo), 李根 (Li Gen)
Owner GENETALKS BIO TECH CHANGSHA CO LTD