Divide-and-conquer global alignment algorithm for finding highly similar candidates of a sequence in database

A divide-and-conquer alignment technique for highly similar sequences that addresses the problems of inaccurate alignment, insufficient speed, and the inability to handle long, error-prone sequencing data.

Active Publication Date: 2018-03-13
ACAD SINIC

AI Technical Summary

Problems solved by technology

[0005] Although many alignment methods can handle the huge amount of short-read data generated by NGS technology, some are not fast enough and others are not accurate enough.
In addition, third-generation sequencing technology makes alignment more challenging, because it produces longer reads with higher error rates.
For example, the PacBio RS II system can generate reads with an average length of 5,500 bp to 8,500 bp, but the accuracy of a single read averages only 87%, and most short-read alignment methods cannot handle such data.
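
To give a sense of scale for the figures quoted above, the following minimal sketch estimates how many erroneous bases a single read would contain. It assumes the 87% figure is a per-base accuracy and that errors are spread evenly over the read; both are simplifying assumptions, and `expected_errors` is a hypothetical helper name.

```python
def expected_errors(read_length_bp, accuracy):
    """Expected number of erroneous bases in one read.

    Assumes `accuracy` is a per-base accuracy (an assumption, see lead-in).
    """
    return read_length_bp * (1.0 - accuracy)


# Read lengths and accuracy taken from the paragraph above.
low = expected_errors(5500, 0.87)
high = expected_errors(8500, 0.87)
print(round(low), round(high))
```

Under these assumptions a single long read carries on the order of several hundred to about a thousand errors, which is far more than aligners designed for short, accurate reads are typically tuned to tolerate.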




Embodiment Construction

[0018] The present invention will be further described below in conjunction with specific embodiments, and the advantages and characteristics of the present invention will become clearer along with the description. However, these embodiments are only exemplary and do not constitute any limitation to the scope of the present invention. Those skilled in the art should understand that the details and forms of the technical solutions of the present invention can be modified or replaced without departing from the spirit and scope of the present invention, but these modifications and replacements all fall within the protection scope of the present invention.

[0019] Algorithm overview

[0020] Most alignment methods based on suffix arrays or block-sorting compression (the Burrows-Wheeler transform) follow the "seed and extend" approach: a maximal exact match (MEM) between the read and the reference is used as the seed and extended to the left and right to produce the final sequence alignment, while the extension approach is imp...
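
The "seed and extend" idea can be illustrated with a small sketch. This is not the method of any particular aligner: a naive quadratic search for the longest exact match stands in for a suffix-array or BWT index, and the extension step is a gap-free comparison along the seed's diagonal. The function names `longest_exact_match` and `seed_and_extend` are hypothetical.

```python
def longest_exact_match(read, ref):
    """Return (read_start, ref_start, length) of the longest common substring.

    Naive dynamic programming; a real aligner would query an index instead.
    """
    best = (0, 0, 0)
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(read) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if read[i - 1] == ref[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def seed_and_extend(read, ref):
    """Place the read on the reference using one seed and ungapped extension.

    Returns (ref_start, mismatches), or None if no placement is possible.
    """
    read_pos, ref_pos, length = longest_exact_match(read, ref)
    if length == 0:
        return None
    # Extending left and right without gaps fixes the read on one diagonal.
    ref_start = ref_pos - read_pos
    if ref_start < 0 or ref_start + len(read) > len(ref):
        return None
    window = ref[ref_start:ref_start + len(read)]
    mismatches = sum(a != b for a, b in zip(read, window))
    return ref_start, mismatches


ref = "TTGACCGTAAGCTTAGGCA"
read = "ACCGTTAGCTTAG"  # ref[3:16] with one substituted base
placement = seed_and_extend(read, ref)
print(placement)
```

Here the seed is the 7-base exact match `AGCTTAG`, and extending it to the left crosses the single substituted base. Extension without gaps breaks down as soon as a read contains insertions or deletions, which is why practical extension steps rely on gapped dynamic programming.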


Abstract

A divide-and-conquer global alignment algorithm for finding highly similar candidates of a sequence in a database is disclosed. The invention provides a divide-and-conquer algorithm called Kart that separates the given sequence into smaller pieces whose alignments can be carried out independently; the concatenation of these alignments constitutes the global alignment of the entire sequence. Kart can be viewed as aligning multiple seeds simultaneously, in parallel. We illustrate the idea using read mapping for next-generation sequencing (NGS) as an example. NGS provides a great opportunity to investigate genome-wide variation at nucleotide resolution. Because of the huge amount of data, NGS applications require very fast alignment algorithms. The invention can process long reads as fast as short reads, and it can tolerate much higher error rates. Experiments show that Kart spends much less time on longer reads than most aligners and still produces reliable alignments.
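
The divide-and-conquer idea in the abstract can be sketched as follows. This is an illustrative sketch, not Kart's actual implementation: it assumes a list of exact-match seeds is already available, sorted and non-overlapping, and it uses a textbook Needleman-Wunsch aligner with arbitrary scores for the regions between seeds. The names `needleman_wunsch` and `divide_and_conquer_align` are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Textbook global alignment of two short strings (scores are assumptions)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def divide_and_conquer_align(read, ref, seeds):
    """Align `read` to `ref` piece by piece.

    `seeds` is a sorted list of non-overlapping exact matches given as
    (read_pos, ref_pos, length). Seed regions need no alignment; each region
    between seeds is aligned independently, and the pieces are concatenated.
    """
    pieces = []
    r, g = 0, 0
    for read_pos, ref_pos, length in seeds:
        pieces.append(needleman_wunsch(read[r:read_pos], ref[g:ref_pos]))
        pieces.append((read[read_pos:read_pos + length],
                       ref[ref_pos:ref_pos + length]))
        r, g = read_pos + length, ref_pos + length
    pieces.append(needleman_wunsch(read[r:], ref[g:]))
    return "".join(p[0] for p in pieces), "".join(p[1] for p in pieces)


ref = "ACGTTGCAAGGCTA"
read = "ACGTGCATGGCTA"  # one deleted base and one substitution relative to ref
seeds = [(0, 0, 4), (4, 5, 3), (8, 9, 5)]
aligned_read, aligned_ref = divide_and_conquer_align(read, ref, seeds)
print(aligned_read)
print(aligned_ref)
```

Because each region between seeds is aligned without reference to the others, the calls to `needleman_wunsch` could be dispatched in parallel, and the dynamic-programming cost depends on the sizes of the individual regions rather than on the full read length.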

Description

Technical field

[0001] The invention relates to an alignment method, in particular to a divide-and-conquer alignment method for highly similar sequences.

Background

[0002] Next-generation sequencing (NGS), also known as high-throughput sequencing, allows biologists to explore differences between genomes at nucleotide resolution, and it has led to many important research discoveries. NGS has become one of the main methods for DNA sequencing and for studying genomic variation among populations. Because new sequencing technologies can generate sequencing data comprising millions or even billions of nucleotides in a day, many NGS applications require fast alignment methods for analyzing large numbers of sequences. Traditional sequence alignment methods such as BLAST[1] or BLAT[2] cannot efficiently handle such a large amount of short-read data, so many alignment methods for NGS short reads have been developed in rece...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/22, G16B30/10
CPC: G16B30/00, G16B30/10, G06F16/24566, G06F16/285
Inventor: 许闻廉, 林信男
Owner: ACAD SINIC