
Re-sequencing sequence alignment method based on Spark framework

A sequence alignment and resequencing technology in the fields of computer science and bioinformatics. It addresses limitations of existing aligners, such as insufficient accuracy and poor support for paired-end data, and improves the efficiency of resequencing data analysis.

Status: Inactive. Publication date: 2019-08-16
SHENZHEN INST OF ADVANCED TECH
Cites 4 documents; cited by 2 documents

AI Technical Summary

Problems solved by technology

[0004] Many sequence alignment tools are available for genome resequencing, such as SOAP, BWA and bowtie2. SOAP was the first alignment tool able to align gene sequences using only a small amount of memory, but its support for paired-end data is poor. BWA and bowtie2 appeared later than SOAP and handle sequencing data in both single-end and paired-end formats well. In terms of processing speed, bowtie2 has a clear advantage over BWA, but its accuracy is slightly lower. However, most of these alignment tools are designed to run on a single node. The preprocessing, management and analysis of sequencing data sets from diverse sources and conditions now exceed the capabilities of many bioinformaticians. Analyzing the whole-genome sequencing data of a single patient currently often takes several days, which greatly delays follow-up life-science and medical research.



Examples


Embodiment Construction

[0027] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0028] It should be noted that, provided there is no conflict, the embodiments of the present invention and the features in the embodiments can be combined with each other.

[0029] Spark is a low-latency distributed cluster computing system for very large data sets and is described as being about 40 times faster than MapReduce.

[0030] The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS provides high-throughput data access and is well suited to applications that work with large data sets.
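HDFS stores each file as a sequence of large fixed-size blocks, each replicated on several DataNodes. The sketch below, assuming the Hadoop 2.x and later defaults of a 128 MiB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`), shows how a file of a given size maps onto blocks; the function names and the 300 MiB file are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Hadoop 2.x+ defaults (dfs.blocksize, dfs.replication); clusters and
# individual files may override both.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB
DEFAULT_REPLICATION = 3


def block_layout(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return the (offset, length) of each block a file of file_size bytes occupies.

    All blocks are full-sized except possibly the last; the final block is not
    padded, so it only holds the remaining bytes.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def raw_storage(file_size: int, replication: int = DEFAULT_REPLICATION) -> int:
    """Total bytes stored across DataNodes when every block has `replication` copies."""
    return file_size * replication


if __name__ == "__main__":
    size = 300 * 1024 * 1024  # hypothetical 300 MiB FASTQ file
    for offset, length in block_layout(size):
        print(f"block at {offset:>10} holds {length} bytes")
    print("raw storage:", raw_storage(size), "bytes")
```

Because block boundaries are plain byte offsets, a multi-line FASTQ record can straddle two blocks, so any reader that processes blocks independently has to realign to record boundaries at the start of each split.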

[0031] FASTQ is a text-based, standard format fo...
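In the common unwrapped layout, each FASTQ record occupies four lines: an `@` header with the read identifier, the base sequence, a `+` separator line (optionally repeating the identifier), and a quality string with one character per base. The following minimal reader is a sketch under those assumptions; it also assumes the Phred+33 quality encoding, and the names `FastqRecord`, `parse_fastq` and `phred_scores` are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # identifier following the leading '@'
    sequence: str  # base calls
    quality: str   # one ASCII quality character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record.

    Assumes sequences and qualities are not wrapped across several lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters assuming the Phred+33 encoding."""
    return [ord(c) - offset for c in quality]


if __name__ == "__main__":
    data = "@read1\nACGT\n+\nII#I\n@read2\nGGC\n+read2\n!!!\n"
    for rec in parse_fastq(io.StringIO(data)):
        print(rec.name, rec.sequence, phred_scores(rec.quality))
```

Reading one record at a time keeps memory use constant, which matters for whole-genome FASTQ files that are far larger than RAM.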


Abstract

The invention relates to the technical fields of computer science and bioinformatics, and in particular to a resequencing sequence alignment method based on the Spark framework. The method comprises three steps: an RDD creation step, a Map step and a Reduce step. RDDs are created from the FASTQ files and stored in HDFS. The BWA sequence alignment algorithm is then applied to each RDD, so that the RDDs are mapped across multiple nodes. Finally, whether to execute a final combining step is decided according to the processing requirements. By integrating the BWA sequence aligner used in the resequencing step into the Spark big-data processing framework, the method optimizes the resequencing workflow through distributed computation and thereby improves the efficiency of resequencing data analysis.
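The three-step flow in the abstract (create partitioned data, align each partition independently, optionally combine) can be illustrated without a cluster. The sketch below simulates RDD partitions with plain Python lists; it is not the patented implementation. A naive exact-match search (`toy_align`, hypothetical) stands in for BWA, which in reality uses a Burrows-Wheeler-transform index and tolerates mismatches and gaps. In PySpark, the map step would correspond roughly to `RDD.mapPartitions`.

```python
from typing import List, Optional, Sequence, Tuple, Union

Read = Tuple[str, str]                 # (read name, bases)
Alignment = Tuple[str, Optional[int]]  # (read name, 0-based position or None)


def create_partitions(reads: Sequence[Read], n: int) -> List[List[Read]]:
    """Step 1 (stand-in for RDD creation): split reads round-robin into n partitions."""
    if n <= 0:
        raise ValueError("n must be positive")
    parts: List[List[Read]] = [[] for _ in range(n)]
    for i, read in enumerate(reads):
        parts[i % n].append(read)
    return parts


def toy_align(reference: str, read: Read) -> Alignment:
    """Hypothetical exact-match aligner standing in for BWA (no mismatches, no gaps)."""
    name, bases = read
    pos = reference.find(bases)
    return name, (pos if pos >= 0 else None)


def map_step(reference: str, partitions: List[List[Read]]) -> List[List[Alignment]]:
    """Step 2: align every partition independently, as separate nodes would."""
    return [[toy_align(reference, r) for r in part] for part in partitions]


def reduce_step(mapped: List[List[Alignment]],
                combine: bool) -> Union[List[List[Alignment]], List[Alignment]]:
    """Step 3: optionally merge per-partition results into one position-sorted list."""
    if not combine:
        return mapped  # keep one result list per partition
    merged = [a for part in mapped for a in part]
    # Sort by position; unmapped reads (None) go last.
    return sorted(merged, key=lambda a: (a[1] is None, a[1] or 0, a[0]))


if __name__ == "__main__":
    reference = "ACGTACGGTCAGT"
    reads = [("r1", "CGGT"), ("r2", "ACGT"), ("r3", "TTTT"), ("r4", "TCAG")]
    mapped = map_step(reference, create_partitions(reads, 2))
    print(reduce_step(mapped, combine=False))
    print(reduce_step(mapped, combine=True))
```

Leaving `combine=False` mirrors the abstract's option to skip the final merge when downstream tools can consume per-partition output directly, which avoids funnelling all results through a single node.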

Description

Technical field

[0001] The invention relates to the technical fields of computer science and bioinformatics, and in particular to a resequencing sequence alignment method based on the Spark framework.

Background art

[0002] Whole-genome resequencing sequences the genomes of different individuals of a species for which a reference sequence already exists, and on that basis analyzes differences at the individual or population level. Through whole-genome resequencing, researchers can find large numbers of variant sites, such as single nucleotide polymorphisms (SNPs), copy number variations (CNVs), insertions/deletions (InDels) and structural variations (SVs), which are of great value in research on human diseases and in animal and plant breeding. As the cost of sequencing decreases, it is foreseeable that we will accumulate a large amount of genome resequencing data of biological individual...
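To make "analyze differences against a reference" concrete, the sketch below finds candidate SNPs by stacking aligned reads into a pileup and reporting positions where the most frequent base differs from the reference. It is a deliberately naive illustration, assuming gapless alignments given as (start position, bases) pairs and a hypothetical `min_depth` threshold; real variant callers model base qualities, mapping qualities and genotype likelihoods.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def call_snps(reference: str,
              alignments: Sequence[Tuple[int, str]],
              min_depth: int = 2) -> List[Tuple[int, str, str, int]]:
    """Return (position, ref_base, alt_base, depth) for majority-base mismatches.

    `alignments` holds gapless reads as (0-based start, bases) pairs.
    """
    pileup: Dict[int, Counter] = {}
    for start, bases in alignments:
        for offset, base in enumerate(bases):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup.setdefault(pos, Counter())[base] += 1
    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        majority, _ = counts.most_common(1)[0]
        # Report only well-covered positions whose consensus base differs from the reference.
        if depth >= min_depth and majority != reference[pos]:
            calls.append((pos, reference[pos], majority, depth))
    return calls


if __name__ == "__main__":
    ref = "ACGTACGT"
    reads = [(0, "ACGA"), (1, "CGAA"), (2, "GAAC")]
    print(call_snps(ref, reads))
```

Every read covering position 3 carries `A` where the reference has `T`, so that position is reported; the per-position counting is also what makes this step easy to distribute by genomic region.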


Application Information

Patent type and authority: Application (China)
IPC (8): G16B30/00
CPC: G16B30/00
Inventors: Zheng Zhichun (郑志春), Guo Ning (郭宁), Wei Yanjie (魏彦杰), Feng Shengzhong (冯圣中), Zhou Jiaxiu (周家秀)
Owner SHENZHEN INST OF ADVANCED TECH