Gene sequence alignment method and system

A gene sequence alignment method and system based on a heterogeneous cloud platform, applied to the alignment of gene and genome sequences, with the effect of improving alignment speed and accuracy.

Pending Publication Date: 2021-04-30
HUAZHONG AGRI UNIV
0 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

Sequence alignment algorithms for third-generation sequencing data have developed steadily in recent years, but there is still considerable room for improvement in both speed and accuracy.

Examples

Embodiment Construction

[0065] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0066] At present, the main distributed computing frameworks are Hadoop and Spark. Spark, the newer of the two, is an open-source cluster computing system based on in-memory computing, and its in-memory computations can run up to 100 times faster than Hadoop's. In addition, as GPU and CUDA technology has matured, CUDA parallel computing has come into wide use in many fields. Spark and CUDA both have natural parallel processing capabilities, and how to integrat...

Abstract

The invention discloses a gene sequence alignment method and system. The method comprises the following steps: storing a reference genome sequence and a query genome sequence in a distributed storage system; under a Spark heterogeneous distributed computing platform framework, segmenting the reference genome sequence according to row offset and preprocessing the segments to obtain a plurality of preprocessed reference data sets; building an index for each preprocessed reference data set using a suffix array algorithm, and merging the indexed data sets to obtain a reference sequence index file; performing CUDA fine-grained sequence alignment of each fragment in the query genome sequence against the reference sequence index file using a seed-extension algorithm, and determining the position of each fragment in the reference sequence index file; and combining the position information of all fragments to obtain the gene sequence alignment result. The invention improves the computation speed and accuracy of large-scale sequence alignment.
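
The steps in the abstract can be illustrated with a minimal, single-machine sketch. It is not the patented implementation: plain serial Python stands in for the Spark partitions and CUDA threads, fixed-size character chunks stand in for the row-offset segmentation, the suffix array is built naively, and "extension" is a simple ungapped mismatch count. All function names and parameters (`split_reference`, `build_suffix_array`, `find_seed`, `align_read`, `align`, `chunk_size`, `seed_len`, `max_mismatches`) are hypothetical and chosen only for illustration.

```python
def split_reference(reference, chunk_size, overlap):
    """Yield (offset, chunk) pairs. Chunks overlap so that a read
    starting near the end of one chunk still fits inside it."""
    for offset in range(0, len(reference), chunk_size):
        yield offset, reference[offset:offset + chunk_size + overlap]


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order.
    Fine for a sketch; real indexes use linear-time construction."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_seed(text, sa, seed):
    """Binary-search the suffix array for every occurrence of seed."""
    k = len(seed)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound on the k-character prefixes
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < seed:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(sa) and text[sa[lo]:sa[lo] + k] == seed:
        hits.append(sa[lo])
        lo += 1
    return hits


def align_read(read, text, sa, seed_len, max_mismatches):
    """Seed and extend: locate exact seed hits, then compare the whole
    read against the reference window implied by each hit (ungapped)."""
    found = set()
    for seed_start in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[seed_start:seed_start + seed_len]
        for pos in find_seed(text, sa, seed):
            start = pos - seed_start
            if start < 0 or start + len(read) > len(text):
                continue  # the read would run off this chunk
            window = text[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                found.add((start, mismatches))
    return found


def align(reference, reads, chunk_size, seed_len=3, max_mismatches=1):
    """Index every chunk, align every read against every chunk, and
    merge the hits into global (position, mismatches) pairs per read."""
    overlap = max(len(r) for r in reads) - 1
    indexes = [(off, chunk, build_suffix_array(chunk))
               for off, chunk in split_reference(reference, chunk_size, overlap)]
    results = {}
    for read in reads:
        merged = set()
        for off, chunk, sa in indexes:
            for start, mm in align_read(read, chunk, sa, seed_len, max_mismatches):
                merged.add((off + start, mm))  # local -> global coordinate
        results[read] = sorted(merged)
    return results


reference = "ACGTACGTTAGCCGATAGGCTA"
reads = ["TTAGCC", "GATAGG", "ACGTAC", "TTAGCA"]
results = align(reference, reads, chunk_size=8)
for read, hits in results.items():
    print(read, hits)
```

In this sketch each chunk is indexed independently and hits are shifted by the chunk offset before merging, which is why the chunked run reports the same positions as a run over the unsplit reference. In a distributed setting the per-chunk indexing and the per-read alignment are the two loops that would be parallelized.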

Description

Technical field

[0001] The invention relates to the technical field of gene sequence alignment, and in particular to a gene sequence alignment method and system based on a heterogeneous cloud platform.

Background

[0002] Since the invention of first-generation genome sequencing technology, genome sequencing has gone through several technical iterations. First-generation sequencing, also called Sanger sequencing, is characterized by long reads of up to 1 kbp (kilobase pairs) and high accuracy. However, because of its high price and low throughput, it has gradually been replaced by second-generation genome sequencing technology (Next Generation Sequencing, NGS). Based on the characteristics of next-generation sequencing data, researchers in various countries have designed a range of algorithms for sequence alignment, sequence assembly, and sequence classification. Most of this research on sequence alignment algorithms is base...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B30/10, G16B40/00
CPC: G16B30/10, G16B40/00
Inventor: 郑芳赵良田芳倪福川汪毅姚雅鹃姚娟章程
Owner: HUAZHONG AGRI UNIV