Parallel rapid matching method and system for stored DNA sequence

A parallel fast matching method and system for stored DNA sequences, applied to the compressed storage of DNA sequences. It addresses the low efficiency of DNA sequence matching, improving matching efficiency and speeding up computation.

Inactive Publication Date: 2016-11-09

SHENZHEN UNIV

Cites: 5; Cited by: 25


AI Technical Summary

Problems solved by technology

[0005] In view of this, the purpose of the present invention is to provide a parallel fast matching method and system for stored DNA sequences, aiming to solve the problem of low DNA sequence matching efficiency in the prior art.



Embodiment Construction

[0058] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0059] The specific embodiment of the present invention provides a parallel fast matching method for stored DNA sequences, which is applied to the compressed storage of DNA sequences, wherein the method mainly includes the following steps:

[0060] S11. Hash index construction step: using a specified prefix, scan the reference genome in FASTA format, find all kmers that begin with that prefix, and use them as keys to build a hash index table in which each entry stores the positions where the corresponding kmer appears;
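Step S11 can be illustrated with a small sketch. It assumes a single-record FASTA reference held in memory, 0-based positions, and hypothetical parameters `k` (kmer length) and `prefix` (the specified prefix); the patent text shown here does not give values for either, and the function names are illustrative only.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def read_fasta_sequence(lines: Iterable[str]) -> str:
    """Concatenate the sequence lines of a single-record FASTA file (header lines skipped)."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def build_prefix_kmer_index(reference: str, k: int, prefix: str) -> dict[str, list[int]]:
    """Map every kmer of `reference` that starts with `prefix` to its 0-based start positions.

    Hypothetical sketch: `k` and `prefix` are assumed parameters, not values from the patent.
    """
    if not 0 < len(prefix) <= k:
        raise ValueError("prefix length must be between 1 and k")
    index: defaultdict[str, list[int]] = defaultdict(list)
    ref = reference.upper()
    # Slide a window of length k over the reference; only kmers carrying the prefix become keys,
    # which keeps the table much smaller than a full kmer index.
    for pos in range(len(ref) - k + 1):
        kmer = ref[pos:pos + k]
        if kmer.startswith(prefix):
            index[kmer].append(pos)
    return dict(index)
```

For example, `build_prefix_kmer_index("ACGTACGTAC", 4, "AC")` keeps only `ACGT`, which occurs at positions 0 and 4. Restricting keys to one prefix trades sensitivity (fewer seeds per read) for a smaller, faster-to-build table.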

[0061] S12. File partitioning step: input the DNA sequence file in F...
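The S12 text is truncated here, but the abstract states that a FASTQ file is input and split into sub-blocks. A minimal sketch follows, assuming standard 4-line FASTQ records and assuming the blocks are contiguous and near-equal in size; the patent's actual block-size rule is not shown, and `parse_fastq` and `split_into_blocks` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read id, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (id, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return
        # Each record: @id / sequence / + / quality
        if len(chunk) != 4 or not chunk[0].startswith("@") or not chunk[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {chunk!r}")
        yield chunk[0][1:], chunk[1], chunk[3]


def split_into_blocks(records: List[Record], num_blocks: int) -> List[List[Record]]:
    """Split records into at most `num_blocks` contiguous sub-blocks of near-equal size.

    Assumption: blocks never split a record, and empty blocks are dropped.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be >= 1")
    size, extra = divmod(len(records), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            blocks.append(records[start:end])
        start = end
    return blocks
```

Splitting on whole records (never mid-record) is what lets each sub-block be matched independently by its own thread.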


Abstract

The invention provides a parallel rapid matching method and system for stored DNA sequences, applied to the compressed storage of DNA sequences. The method comprises the following steps. Hash index construction: based on a designated prefix, all kmers of the reference genome in FASTA format that carry the prefix are found, a hash index table is built with these kmers as keys, and each entry stores the positions where the corresponding kmer appears. File segmentation: the DNA sequence file in FASTQ format is input and split into sub-blocks. Multithreaded processing: multiple threads are started to process the tasks determined by the number of threads; the sub-blocks simultaneously call a matching function that locates reads quickly using the kmer hash index, and are matched in parallel against the target reference genome in FASTA format. Compressed storage is achieved by storing the matching results in place of the original DNA sequences.
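The multithreaded matching stage can be sketched as below. This is an illustration under stated assumptions, not the patented matching function: it assumes reads are located by seeding with prefixed kmers from the hash index and then verified by ungapped comparison allowing at most a hypothetical `max_mismatches` substitutions; the function names and that threshold are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple

Match = Tuple[int, List[Tuple[int, str]]]  # (reference start, [(read offset, read base), ...])


def match_read(read: str, reference: str, index: Dict[str, List[int]],
               k: int, prefix: str, max_mismatches: int = 2) -> Optional[Match]:
    """Locate `read` via prefixed kmer seeds; return the best ungapped hit or None (assumed scheme)."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        if seed.startswith(prefix):
            for pos in index.get(seed, ()):
                start = pos - offset  # implied alignment start of the whole read
                if 0 <= start and start + len(read) <= len(reference):
                    candidates.add(start)
    best: Optional[Match] = None
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        diffs = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
        if len(diffs) <= max_mismatches and (best is None or len(diffs) < len(best[1])):
            best = (start, diffs)
    return best


def match_blocks_parallel(blocks: List[List[str]], reference: str, index: Dict[str, List[int]],
                          k: int, prefix: str, num_threads: int = 4) -> List[List[Optional[Match]]]:
    """Match each sub-block on a worker thread; output keeps block order and read order."""
    def match_block(block: List[str]) -> List[Optional[Match]]:
        return [match_read(r, reference, index, k, prefix) for r in block]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(match_block, blocks))
```

In CPython the global interpreter lock limits the speedup of this pure-Python loop; `ProcessPoolExecutor` or a compiled implementation would be needed to realize the parallel gain the method targets. The thread pool is used here only to mirror the structure the abstract describes.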

Description

Technical field

[0001] The invention relates to the field of data compression, and in particular to a parallel fast matching method for stored DNA sequences and a system thereof.

Background art

[0002] The development of next-generation sequencing technology has driven the generation of high-throughput DNA sequencing data, which grows at an exponential rate exceeding the growth of computer microprocessors and storage devices. Compressing high-throughput DNA sequencing data is an effective way to address the storage and transmission of DNA sequences. Before compressed storage, a common practice is to match the high-throughput sequencing data, stored as a FASTQ sequence file, against an existing genome, namely the reference genome, whose file is in FASTA format. Storing the result of matching the target sequence against the reference genome in place of the original sequence achieves compressed storage, ...
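The idea of replacing a sequence by its matching result can be shown with a small encode/decode pair. This is a hypothetical sketch, assuming a match is recorded as the reference start position, the read length, and the substituted bases; it covers only the base sequence, so read identifiers and quality strings from the FASTQ file would need separate storage for lossless compression.

```python
from typing import List, Tuple

Encoded = Tuple[int, int, List[Tuple[int, str]]]  # (reference start, read length, [(offset, base)])


def encode_read(read: str, reference: str, start: int) -> Encoded:
    """Replace a read by its reference start, its length and only the bases that differ."""
    window = reference[start:start + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    diffs = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
    return start, len(read), diffs


def decode_read(encoded: Encoded, reference: str) -> str:
    """Rebuild the original read from the reference and the stored differences."""
    start, length, diffs = encoded
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)
```

For a read that matches the reference closely, the stored tuple is much smaller than the read itself, which is the source of the compression.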


Application Information


IPC (8): G06F19/20; G06F17/30; G06F9/48

CPC: G06F9/485; G06F16/2255; G16B25/00

Inventors: Zhu Zexuan (朱泽轩), Deng Qingjin (邓清津), Chu Ying (储颖), Sun Yiwen (孙怡雯)

Owner: SHENZHEN UNIV
