Quick comparing and positioning method for gene sequence segments on reference genome

A reference genome and gene sequence technology, applied in the field of rapid comparison and positioning of gene sequence fragments on a reference genome. It addresses the large amount of computing resources and time consumed by sequence alignment, which holds back the acceleration of data analysis, and achieves low time complexity and high positioning efficiency.

Status: Inactive. Publication date: 2016-01-13
GENETALKS BIO TECH CHANGSHA CO LTD
Cites: 2, Cited by: 21

AI Technical Summary

Problems solved by technology

The first step in analyzing sequencing data is sequence alignment, i.e. aligning sequence fragments to the reference genome, which often consumes a lot of computing resources and time.
Aligning and mapping sequence fragments is becoming a bottleneck in accelerating data analysis workflows.



Examples


Embodiment 1

[0041] Hereinafter, taking CNV (copy number variation) analysis by rapid sequence alignment as an example, the method of the present invention for rapid alignment and positioning of gene sequence fragments on a reference genome is described further.

[0042] As shown in Figure 1, the method for rapid alignment and positioning of gene sequence fragments on the reference genome in this embodiment comprises the following steps:

[0043] 1) Extract gene sequence fragments from the reference genome;

[0044] 2) For each gene sequence fragment on the reference genome, establish a key-value pair with the gene sequence fragment as the Key part and the target information of the fragment as the Value part; use a hash function to map the fragment to its target storage location in the database, write the key-value pair into that location, and finally complete the library cons...
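Steps 1) and 2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: fragments are taken to be all overlapping k-mers (k = 31 is borrowed from Embodiment 2), the Value part is assumed to be a list of (sequence name, 0-based position) pairs, and the "database" is a hypothetical in-memory table of `NUM_BUCKETS` buckets addressed by a BLAKE2b hash. The names `extract_fragments`, `bucket_of`, `build_index` and `lookup` are illustrative only.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple

K = 31              # fragment length (assumed; Embodiment 2 uses 31-mers)
NUM_BUCKETS = 1024  # number of storage locations (hypothetical)

Target = Tuple[str, int]               # Value part: (sequence name, 0-based start)
Table = List[Dict[str, List[Target]]]  # one small dict per storage location


def extract_fragments(name: str, seq: str, k: int = K) -> Iterator[Tuple[str, Target]]:
    """Step 1: yield every overlapping fragment of length k with its origin."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], (name, i)


def bucket_of(fragment: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash a fragment to its target storage location (a bucket index)."""
    digest = hashlib.blake2b(fragment.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def build_index(reference: Dict[str, str], k: int = K,
                num_buckets: int = NUM_BUCKETS) -> Table:
    """Step 2: write each Key (fragment) -> Value (targets) pair into its bucket."""
    table: Table = [{} for _ in range(num_buckets)]
    for name, seq in reference.items():
        for fragment, target in extract_fragments(name, seq.upper(), k):
            bucket = table[bucket_of(fragment, num_buckets)]
            bucket.setdefault(fragment, []).append(target)
    return table


def lookup(table: Table, fragment: str) -> List[Target]:
    """Hash an (upper-case) query to the same bucket and read back its Value part."""
    return table[bucket_of(fragment, len(table))].get(fragment, [])


ref = {"chrA": "ACGTACGTGGCC", "chrB": "TTACGTAC"}  # toy sequences
idx = build_index(ref, k=4, num_buckets=8)
print(lookup(idx, "ACGT"))  # [('chrA', 0), ('chrA', 4), ('chrB', 2)]
print(lookup(idx, "GGGG"))  # []
```

Each bucket keeps the full fragment as its key, so two different fragments that hash to the same bucket are still told apart. A real deployment would presumably back the table with an on-disk key-value database rather than a Python list.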

Embodiment 2

[0085] In addition to CNV analysis, the method of the present invention for rapid alignment and positioning of gene sequence fragments on a reference genome can also be used for bacterial species identification. In this embodiment, the reference genome is a cross-species reference genome. When the reference genome database is built in step 2), the human reference genome and all bacterial and viral genome sequences are downloaded from the NCBI RefSeq database (http://www.ncbi.nlm.nih.gov/refseq/). The gem-mappability software is used to compute the 31-mer mappability map of the reference sequences (human, bacterial, viral), and the unique sequence fragments, i.e. those with a mappability of 1, are selected; detailed taxonomic information for each species is downloaded from the NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy). For each gene sequence fragment on the reference genome, the key-value pair is established with the gene sequence fragment as the Key part and the tar...
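The species-identification use above can be illustrated with the following Python sketch, under simplifying assumptions. Uniqueness is approximated by counting exact k-mer occurrences across all input genomes instead of running gem-mappability, the Value part is assumed to be just a taxon label, and classifying a read by majority vote over its unique k-mers is an assumed rule, because the source text is truncated before it describes the lookup step. Python's built-in `dict` stands in for the hash-mapped database; `unique_kmer_taxa` and `classify_read` are hypothetical names, and the sequences are toy data.

```python
from collections import Counter
from typing import Dict, Optional


def unique_kmer_taxa(genomes: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer that occurs exactly once across all genomes to its taxon.

    Exact counting is a simplified stand-in for selecting fragments with
    mappability 1 (the embodiment uses gem-mappability for this).
    """
    counts: Counter = Counter()
    owner: Dict[str, str] = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            counts[kmer] += 1
            owner[kmer] = taxon
    return {kmer: owner[kmer] for kmer, n in counts.items() if n == 1}


def classify_read(read: str, index: Dict[str, str], k: int) -> Optional[str]:
    """Assign a read to the taxon hit by most of its unique k-mers (assumed rule)."""
    votes = Counter(
        index[read[i:i + k]]
        for i in range(len(read) - k + 1)
        if read[i:i + k] in index
    )
    return votes.most_common(1)[0][0] if votes else None


genomes = {"Escherichia coli": "ACGTTGCA", "Homo sapiens": "TTGCAGGA"}
index = unique_kmer_taxa(genomes, k=4)
print(classify_read("ACGTTG", index, k=4))  # Escherichia coli
print(classify_read("GCAGGA", index, k=4))  # Homo sapiens
print(classify_read("TTGCA", index, k=4))   # None (only shared k-mers)
```

Discarding shared k-mers such as `TTGC` keeps every stored Key unambiguous, which is the point of restricting the database to fragments with a mappability of 1.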


Abstract

The present invention discloses a quick comparing and positioning method for gene sequence segments on a reference genome. The method comprises: first, extracting gene sequence segments from the reference genome; for each gene sequence segment, establishing a key-value pair with the segment as the Key part and its target information as the Value part, mapping the segment with a hash function to determine its target storage location in a database, and writing the key-value pair into that location, thereby completing the database of the reference genome; and, when a gene sequence segment to be matched needs to be quickly compared and positioned, first mapping it with the hash function to determine its target storage location in the database, and then reading from that location the target information stored in the key-value pair corresponding to the matched segment. The disclosed method has low time complexity, high comparison and positioning speed, high positioning efficiency and a wide application range, and is suitable for rapid analysis of mixed cross-species samples.
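The query step in the abstract is a single hash lookup per segment. Sequencing reads are usually longer than the indexed fragments, so the Python sketch below goes one assumption beyond the abstract: it looks up every k-mer of a read and lets each hit vote for the read start position it implies, reporting the start with the most votes. `build_kmer_index` and `locate_read` are hypothetical names, and the reference is toy data. Since each lookup is an expected O(1) hash access, locating a read of length L costs about L − k + 1 lookups plus the number of hits, consistent with the low time complexity claimed above.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, int]  # (sequence name, 0-based position)


def build_kmer_index(reference: Dict[str, str], k: int) -> Dict[str, List[Hit]]:
    """Key: k-mer; Value: every position where it occurs in the reference."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for name, seq in reference.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return dict(index)


def locate_read(read: str, index: Dict[str, List[Hit]], k: int) -> Optional[Hit]:
    """Return the (sequence, start) supported by the most k-mer hits."""
    votes: Counter = Counter()
    for offset in range(len(read) - k + 1):
        # one hash lookup per k-mer; each hit votes for the implied read start
        for name, pos in index.get(read[offset:offset + k], []):
            votes[(name, pos - offset)] += 1
    return votes.most_common(1)[0][0] if votes else None


reference = {"chr1": "GATTACAGATTACCA"}
index = build_kmer_index(reference, k=4)
print(locate_read("TTACAGAT", index, k=4))  # ('chr1', 2)
print(locate_read("CCCCCC", index, k=4))    # None
```

Voting resolves repeats: the read's first k-mer `TTAC` also occurs at position 9, but the other four k-mers all support position 2.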

Description

Technical field

[0001] The invention relates to bioinformatics analysis technology for gene sequencing data, and in particular to a method for rapid alignment and positioning of gene sequence fragments on a reference genome.

Background art

[0002] With the development of gene sequencing technology, the price of sequencing has dropped exponentially, even faster than Moore's Law. The resulting large volume of sequencing data poses great challenges for fast and accurate computational analysis. The first step in analyzing sequencing data is sequence alignment, i.e. aligning sequence fragments to the reference genome, which often consumes a lot of computing resources and time. Aligning and mapping sequence fragments is becoming a bottleneck in speeding up data analysis workflows.

[0003] To solve the problem of aligning and positioning sequence fragments on the reference genome, many algorithms and widely used specific implementations have been developed,...


Application Information

IPC(8): G06F19/18
Inventors: 宋卓 (Song Zhuo), 李根 (Li Gen)
Owner GENETALKS BIO TECH CHANGSHA CO LTD