Sequence matching method based on k-mer

A k-mer-based sequence alignment technology that addresses the large memory footprint and slow processing speed of existing alignment methods. It delivers the required alignment results while improving efficiency and accuracy and saving computing resources and time.

Pending Publication Date: 2022-05-20
浙江天科高新技术发展有限公司
AI Technical Summary

Problems solved by technology

In the prior art, the most representative alignment algorithms are the bitmap method and the dynamic programming algorithm, but both suffer from slow processing and high memory usage when applied to large amounts of data.

Examples

Embodiment Construction

[0047] To better illustrate the present invention, it is further described below in conjunction with embodiments, examples of which are shown in the accompanying drawings. The embodiments described with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.

[0048] The sequence matching method provided by the present invention builds on the core idea of global alignment and further introduces the k-mer concept, using custom scoring and specific backtracking rules to align the sequences. As shown in Figure 1, the core steps are as follows. In the first step, k-mer analysis is performed on the input seq1 and seq2 sequences according to a first predetermined length to obtain the k-mer sequence sets; the predetermined k-mer length is greater than the number of mismatc...
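The first step described above, k-mer analysis of both inputs, can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the function names `kmer_positions` and `shared_kmers`, the example sequences, and the choice k = 4 are all assumptions made here, since the source text is truncated before it states how k is chosen.

```python
def kmer_positions(seq, k):
    """Map each k-mer in seq to the list of 0-based start positions."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return positions


def shared_kmers(seq1, seq2, k):
    """Return the set of k-mers that occur in both sequences."""
    return set(kmer_positions(seq1, k)) & set(kmer_positions(seq2, k))


# Hypothetical inputs that differ by a single substitution.
seq1 = "ACGTACGTGA"
seq2 = "ACGTTCGTGA"
print(sorted(shared_kmers(seq1, seq2, 4)))  # ['ACGT', 'CGTG', 'GTGA']
```

Keeping the start positions alongside each k-mer, rather than only the set of k-mers, is a design choice made here so that shared fragments can later be located in both sequences.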

Abstract

The invention relates to a sequence matching method based on k-mers. In the method, k-mer analysis is performed on the seq1 and seq2 sequences to obtain the k-mer set of each sequence, and the consistent fragments are screened out. The sequences are then divided at the consistent fragments, and global matching is performed on each pair of differing fragments. Finally, the matching results are combined from the 5' end to the 3' end to obtain the matching result for the complete sequences. By using k-mers, the method greatly shortens sequence matching time and greatly reduces the memory occupied during matching, establishing a new core idea for sequence matching and providing a new, efficient technical means for it.
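The pipeline in the abstract (screen consistent fragments, divide the sequences at them, globally match the differing pieces, and join the results from 5' to 3') might look roughly like the sketch below. Several details are assumptions made for illustration, because the source does not give them: anchors are taken to be k-mers that occur exactly once in each sequence and are chained greedily from left to right, the global matching uses a plain Needleman-Wunsch recurrence, and the scores (match 1, mismatch -1, gap -1) are placeholders rather than the patent's custom scoring and backtracking rules. All function names are hypothetical.

```python
def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    seen, dup = {}, set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            dup.add(kmer)
        seen[kmer] = i
    return {km: i for km, i in seen.items() if km not in dup}


def find_anchors(seq1, seq2, k):
    """Greedily chain non-overlapping, same-order k-mers unique to both inputs."""
    pos1, pos2 = unique_kmers(seq1, k), unique_kmers(seq2, k)
    anchors, end1, end2 = [], 0, 0
    for kmer, i in sorted(pos1.items(), key=lambda item: item[1]):
        j = pos2.get(kmer)
        if j is not None and i >= end1 and j >= end2:
            anchors.append((i, j))
            end1, end2 = i + k, j + k
    return anchors


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring the diagonal move.
    out_a, out_b, i, j = [], [], len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def kmer_align(seq1, seq2, k):
    """Split both sequences at shared anchors and align only the gaps between them."""
    parts1, parts2, p1, p2 = [], [], 0, 0
    for i, j in find_anchors(seq1, seq2, k):
        g1, g2 = global_align(seq1[p1:i], seq2[p2:j])
        parts1 += [g1, seq1[i:i + k]]
        parts2 += [g2, seq2[j:j + k]]
        p1, p2 = i + k, j + k
    g1, g2 = global_align(seq1[p1:], seq2[p2:])  # trailing segment after the last anchor
    return "".join(parts1 + [g1]), "".join(parts2 + [g2])


# Hypothetical inputs: seq2 carries one inserted T relative to seq1.
print(kmer_align("AAACCCGGGTTT", "AAACCCTGGGTTT", 3))
# ('AAACCC-GGGTTT', 'AAACCCTGGGTTT')
```

Because the dynamic programming table is built only for the short segments between anchors, its size depends on the length of the differing fragments rather than on the full sequence lengths, which is the memory and time saving the abstract describes.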

Description

Technical field

[0001] The present invention belongs to the field of bioinformatics; in particular, it relates to a k-mer-based sequence comparison method.

Background

[0002] With the advent of the next-generation sequencing technologies 454 (Roche), Solexa (Illumina), and SOLiD (ABI), sequencing throughput has increased rapidly while sequencing costs have dropped dramatically, a breakthrough that has greatly advanced genomic science. Species identification by next-generation sequencing is faster and more accurate than traditional biochemical identification. The general procedure for bacterial identification by first-generation sequencing is to obtain the entire sequence by detecting the fluorescence signal and then compare the sequence against a database to obtain the species identification. Through transcriptomics and proteomics and other related technologies, the matching analysis of gene expression profil...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B30/10
CPC: G16B30/10
Inventor: 王庭璋, 李樱红, 张力, 孙玲莉, 洪烨, 庞襄伟, 刘洋, 邱晓力
Owner: 浙江天科高新技术发展有限公司