A variation detection method based on the cloud computing platform Spark

A cloud computing platform and variation detection technology, applied in the field of bioinformatics. It addresses the problems that the HaplotypeCaller variation detection method cannot adapt to multi-node environments and that load becomes unbalanced across nodes, and it achieves good load balance, fewer data computation steps, and strong scalability.

Active Publication Date: 2020-05-22
SOUTH CHINA UNIV OF TECH
AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to overcome the deficiencies of the prior art by providing a variation detection method based on the cloud computing platform Spark. The method effectively solves the problem that the HaplotypeCaller variation detection method cannot adapt to multi-node environments, or that its load is unbalanced when it runs on multiple nodes.

Examples

Detailed Description of the Embodiments

[0033] The present invention will be further described below in conjunction with specific examples.

[0034] As shown in Figure 1, the variation detection method based on the cloud computing platform Spark provided in this embodiment includes the following steps:

[0035] 1) The Spark master node intercepts part of the input Sequence Alignment/Map format file and distributes it to each Spark worker node.
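
Step 1 can be pictured with a small sketch. It assumes that the intercepted part is simply a leading slice of the alignment records and that fragments are handed out round-robin. Both choices, along with the names `intercept_sample` and `distribute`, are illustrative assumptions rather than details given in the patent, and plain Python lists stand in for the Spark master and worker nodes.

```python
from itertools import islice
from typing import Iterable, List


def intercept_sample(records: Iterable[str], sample_size: int) -> List[str]:
    """Take the first `sample_size` records (hypothetical sampling rule)."""
    return list(islice(records, sample_size))


def distribute(sample: List[str], num_workers: int) -> List[List[str]]:
    """Split the sample into one fragment per worker, round-robin."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    fragments: List[List[str]] = [[] for _ in range(num_workers)]
    for index, record in enumerate(sample):
        fragments[index % num_workers].append(record)
    return fragments


# Placeholder record names; real input would be SAM/BAM alignment records.
records = [f"read{i}" for i in range(10)]
sample = intercept_sample(records, sample_size=6)
fragments = distribute(sample, num_workers=3)
print(fragments)
```

In an actual Spark job the fragments would be shipped to executors instead of being kept in local lists; the sketch only shows the sample-then-split shape of the step.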

[0036] The input to the method of the invention is a sequence alignment file. The common format for such files is SAM (Sequence Alignment/Map), which records, as text, how sequencing reads align to the reference sequence. To save storage space and speed up transfer, SAM files are usually compressed into binary BAM files. A BAM file is a block-based compressed format consisting of a series of data blocks, each no larger than 64 KB. This feature allows efficient r...
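
To make the text format concrete, the following minimal sketch parses one SAM alignment line into its eleven mandatory tab-separated fields as defined by the SAM specification. The `SamRecord` class, the `parse_sam_line` helper, and the sample read are illustrative assumptions, not part of the patent; optional tag fields are ignored.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read bases
    qual: str    # base qualities


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; header lines start with '@' and are rejected."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10],
    )


# Made-up example read aligned to chr1 at position 100.
record = parse_sam_line("read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n")
print(record.rname, record.pos, record.cigar)
```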

Abstract

The invention discloses a variation detection method based on the cloud computing platform Spark. First, the Spark master node intercepts part of the input Sequence Alignment/Map format file and distributes it to each Spark worker node. Second, the Spark worker nodes preprocess their file fragments in parallel and return the resulting preprocessing information to the Spark master node. Third, the Spark master node divides the input Sequence Alignment/Map format file at a self-defined granularity according to the preprocessing information and distributes the resulting fragments to each Spark worker node. Fourth, the Spark worker nodes perform variation detection on their file fragments, and the Spark master node receives the results returned by all worker nodes and writes them to a file. The method effectively solves the problem that the HaplotypeCaller variation detection method cannot adapt to multi-node environments or that its load is unbalanced under multi-node conditions.
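
The third step, dividing the input at a self-defined granularity based on preprocessing information, is where load balance is decided. The sketch below is one possible reading: it assumes the preprocessing information is an estimated cost per genomic region (for example, a read count) and greedily cuts the regions into contiguous chunks whose costs are close to an even share. The cost model, the greedy rule, and the name `split_by_cost` are assumptions for illustration; the abstract does not specify them.

```python
from typing import List


def split_by_cost(costs: List[float], num_chunks: int) -> List[List[int]]:
    """Group region indices into at most `num_chunks` contiguous chunks
    of roughly equal total cost (greedy, hypothetical rule)."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    target = sum(costs) / num_chunks  # even share of the total cost
    chunks: List[List[int]] = []
    current: List[int] = []
    running = 0.0
    for index, cost in enumerate(costs):
        current.append(index)
        running += cost
        # Close the chunk once it reaches the target, but leave the
        # final chunk open so it absorbs all remaining regions.
        if running >= target and len(chunks) < num_chunks - 1:
            chunks.append(current)
            current, running = [], 0.0
    if current:
        chunks.append(current)
    return chunks


# Made-up per-region costs, as if reported by the preprocessing step.
costs = [5, 1, 1, 1, 8, 2, 2, 4]
chunks = split_by_cost(costs, num_chunks=3)
chunk_costs = [sum(costs[i] for i in chunk) for chunk in chunks]
print(chunks, chunk_costs)
```

With equal-sized splits, the single expensive region in this example would share a chunk with its neighbors and dominate one worker's runtime; cutting by estimated cost gives it a chunk of its own.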

Description

Technical field

[0001] The invention belongs to the technical field of bioinformatics, and in particular relates to a variation detection method based on the cloud computing platform Spark.

Background

[0002] In recent years, with the continued development of second-generation sequencing technology (high-throughput sequencing), the time and cost of sequencing an individual human genome have fallen considerably. The cost has dropped from an original $1 per base to $1,000-$5,000 per genome, and the time has dropped from the 13 years needed to complete the first human genome map to only a few weeks. Research on the human genome has thus entered the era of low-cost, high-throughput sequencing. However, the rapid growth of genetic data means that traditional gene processing software struggles to keep up with the volume. Only when processing speed matches data acquisition speed can the advantages of high-throughput sequencing be fully realized. [0...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B20/50
CPC: G16B30/00
Inventor: 董守斌, 吴宗泽, 袁华, 付佳兵, 张铃启
Owner: SOUTH CHINA UNIV OF TECH