
A Cluster-Based High-Throughput Data Analysis Method

A data analysis technology applied in fields such as genomics, instrumentation, and proteomics. It targets the problems that existing approaches give no solution for high-throughput sequencing, cannot cope well with whole-exome or targeted sequencing, and add analysis time and inconvenience. Its stated effect is a clear improvement in analysis speed and a reduction in analysis time.

Active Publication Date: 2019-03-05
SHANGHAI MAJORBIO BIO PHARM TECH

AI Technical Summary

Problems solved by technology

The application states that the above-mentioned solution addresses the excessively long analysis time for human resequencing bioinformatics. However, that document does not involve high-throughput gene sequencing technology and does not offer a solution to the problems in existing high-throughput gene sequencing workflows.
[0007] The existing solutions can therefore only process whole-genome data in parallel and cannot cope well with whole-exome or targeted sequencing. In addition, if the number of parallel tasks is changed, the reference genome must be re-segmented and re-indexed, which adds analysis time and inconvenience.



Embodiment Construction

[0036] As shown in Figure 1, the off-machine data (the raw output of the high-throughput sequencer) is split into n data slice files. Each data slice file is aligned to the reference genome; once all alignments are complete, the n resulting alignment result slice files are merged into a single alignment result file.
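The split, align, and merge flow in [0036] can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `split_reads`, `align_slice`, and `align_in_parallel` are hypothetical, the "aligner" is a toy exact-substring search rather than a real alignment tool, and a local thread pool stands in for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# (read, 0-based position in the reference, or -1 if the read is not found)
Alignment = Tuple[str, int]


def split_reads(reads: List[str], n: int) -> List[List[str]]:
    """Split the reads into at most n contiguous slices of near-equal size."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not reads:
        return []
    size = -(-len(reads) // n)  # ceiling division
    return [reads[i:i + size] for i in range(0, len(reads), size)]


def align_slice(reads: List[str], reference: str) -> List[Alignment]:
    """Toy stand-in for an aligner: exact substring search (assumption)."""
    return [(read, reference.find(read)) for read in reads]


def align_in_parallel(reads: List[str], reference: str, n: int) -> List[Alignment]:
    """Align every slice concurrently, then merge the per-slice results."""
    slices = split_reads(reads, n)
    if not slices:
        return []
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        # pool.map preserves slice order, so the merge is a plain concatenation.
        per_slice = list(pool.map(lambda s: align_slice(s, reference), slices))
    return [alignment for part in per_slice for alignment in part]


if __name__ == "__main__":
    merged = align_in_parallel(["ACGT", "TTGA", "GGGG", "CGTT"], "ACGTACGTTTGA", n=2)
    print(merged)
```

The merge step only starts after every slice has finished, which mirrors the requirement that all alignments complete before the result slice files are combined.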

[0037] As shown in Figure 2, a region file is specified in advance and split into n region sub-files. Data is then extracted from the alignment result file according to these n region sub-files, which splits it again into n data slice files that are passed to the subsequent processing steps.
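A minimal sketch of the region-based re-slicing in [0037] is shown below. The source does not specify the region file format or how regions are distributed among sub-files, so this assumes BED-like half-open intervals `(chrom, start, end)`, alignment records reduced to `(chrom, position, name)`, and a round-robin distribution; all function names are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]   # (chromosome, start, end), half-open [start, end)
Record = Tuple[str, int, str]   # (chromosome, position, read name)


def split_regions(regions: List[Region], n: int) -> List[List[Region]]:
    """Distribute regions round-robin into exactly n sub-lists (assumption)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [regions[i::n] for i in range(n)]


def extract(records: List[Record], regions: List[Region]) -> List[Record]:
    """Keep the records whose position falls inside any of the given regions."""
    return [
        rec for rec in records
        if any(chrom == rec[0] and start <= rec[1] < end
               for chrom, start, end in regions)
    ]


def slice_alignments(records: List[Record], regions: List[Region],
                     n: int) -> List[List[Record]]:
    """Produce one data slice per region sub-list."""
    return [extract(records, sub) for sub in split_regions(regions, n)]


if __name__ == "__main__":
    regions = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 50)]
    records = [("chr1", 10, "r1"), ("chr1", 250, "r2"),
               ("chr2", 20, "r3"), ("chr1", 150, "r4")]
    for data_slice in slice_alignments(records, regions, n=2):
        print(data_slice)
```

Because the slices are defined by the region file rather than by the reference genome, changing n only changes how the regions are grouped; records outside every region (such as `r4` here) are simply not extracted.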

[0038] The data slice files above are split on record boundaries, so no record is divided between two files. When splitting, the number of lines per file is preset; this controls how many slice files are generated and therefore how many tasks are processed in parallel.
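The record-unit splitting in [0038] might look like the sketch below. It assumes each record occupies a fixed number of lines (four, as in FASTQ, which the source does not state), and the function name `split_by_record` and its parameters are hypothetical.

```python
from typing import Iterable, Iterator, List


def split_by_record(lines: Iterable[str], lines_per_slice: int,
                    record_len: int = 4) -> Iterator[List[str]]:
    """Yield slices of at most lines_per_slice lines without cutting a record.

    record_len is the number of lines per record (4 assumes FASTQ-style input).
    """
    if lines_per_slice <= 0 or lines_per_slice % record_len != 0:
        raise ValueError("lines_per_slice must be a positive multiple of record_len")
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == lines_per_slice:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, slice
        yield chunk


if __name__ == "__main__":
    # 10 records of 4 lines each, 16 lines (4 records) per slice.
    demo = [f"line{i}" for i in range(40)]
    print([len(s) for s in split_by_record(demo, lines_per_slice=16)])
```

With 40 input lines and 16 lines per slice, this produces three slices of 16, 16, and 8 lines, so the preset line count directly fixes the number of parallel tasks.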

[0039] For the cal...



Abstract

The invention discloses a high-throughput data analysis method for processing offline data from high-throughput sequencing. The method comprises the following steps: partitioning the offline data to generate a plurality of front data slice files; after all alignments with a reference genome are finished, combining the plurality of generated alignment result slice files into one alignment result file; pre-specifying a region file and partitioning it into a specified number of region sub-files; and extracting data from the alignment result file according to the region sub-files, which partitions the data once again into a plurality of back data slice files that are processed in a subsequent step. For the processing of the data slices obtained after partitioning, computing resources, including computing nodes and the corresponding CPU (central processing unit) and memory, are allocated by a cluster management tool.
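To make the resource allocation step concrete, the sketch below places one task per data slice on nodes with enough free CPU and memory. It is a toy first-fit allocator written for illustration only; the source does not name the cluster management tool, and `Node` and `allocate` are hypothetical names that do not correspond to any real scheduler's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpus: int
    free_mem_gb: int


def allocate(n_tasks: int, cpus_per_task: int, mem_gb_per_task: int,
             nodes: List[Node]) -> Dict[int, Optional[str]]:
    """First-fit placement: map each task id to a node name, or None if no
    node currently has enough free CPU and memory (the task would wait)."""
    placement: Dict[int, Optional[str]] = {}
    for task_id in range(n_tasks):
        placement[task_id] = None
        for node in nodes:
            if (node.free_cpus >= cpus_per_task
                    and node.free_mem_gb >= mem_gb_per_task):
                # Reserve the resources on the first node that fits.
                node.free_cpus -= cpus_per_task
                node.free_mem_gb -= mem_gb_per_task
                placement[task_id] = node.name
                break
    return placement


if __name__ == "__main__":
    cluster = [Node("n1", free_cpus=4, free_mem_gb=16),
               Node("n2", free_cpus=2, free_mem_gb=8)]
    print(allocate(n_tasks=4, cpus_per_task=2, mem_gb_per_task=8, nodes=cluster))
```

In this example the first two tasks fit on `n1`, the third on `n2`, and the fourth is left unplaced, which is where a real scheduler would queue it until resources are released.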

Description

Technical field

[0001] The invention belongs to the technical field of gene sequencing, and in particular relates to a cluster-based high-throughput data analysis method.

Background

[0002] High-throughput gene sequencing technology, also known as next-generation sequencing (NGS) technology, can determine hundreds of thousands or even millions of sequences at one time and is the most widely used sequencing technology today. Compared with traditional Sanger sequencing, NGS offers high speed, high throughput, and low price.

[0003] Variation detection based on high-throughput sequencing has developed rapidly in recent years. With the current vigorous promotion of precision medicine, demand for mutation detection is growing explosively. Mutation detection routinely involves processing gigabytes or even hundreds of gigabytes of data, and a routine analysis takes from a few hours to a few days. It takes a lon...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B20/30
CPC: G16B20/00
Inventor: 杨飞陈昌岳任一占雪峰张祥林
Owner: SHANGHAI MAJORBIO BIO PHARM TECH