Data processing method of next-generation sequencing (NGS) data analysis platform (IMP)

A data analysis technology for next-generation sequencing, applied in the fields of electrical digital data processing, special data processing applications, and instruments, that addresses the problems of high hardware cost and maintenance cost.

Active Publication Date: 2018-01-19
厦门极元科技有限公司

AI Technical Summary

Problems solved by technology

The advantage is that users do not need to maintain their own hardware. The disadvantage is that massive amounts of genetic data must be transmitted and stored over the network. At the same time, how to protect confidentiality and security of gene...

Examples

Embodiment Construction

[0076] The data processing method of IMP, the next-generation sequencing data analysis platform of the present invention, uses hash-table addressing and relies on the hash table to support data sorting and the removal of duplicate sequences. Lossless compression of the data avoids excessive memory use. All data processing is based on in-memory storage and computation, with multi-threaded parallel processing both within each module and between different modules.
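The text does not say which lossless compression scheme IMP uses. As one common illustration of keeping sequence data compact in memory, the sketch below packs each base into 2 bits and records the positions of `N` bases separately so that the round trip stays lossless. The function names `pack` and `unpack` and the handling of `N` are assumptions made for this example, not details from the patent.

```python
# Hypothetical sketch: 2-bit packing of DNA bases. This is NOT the patent's
# actual codec, only an illustration of lossless in-memory compression.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def pack(seq: str) -> tuple[bytes, int, list[int]]:
    """Pack an ACGTN string into 2 bits per base.

    Returns (packed bytes, original length, positions of 'N' bases).
    'N' is stored as code 0 and restored later from the position list.
    Any other character raises KeyError, so nothing is silently lost.
    """
    n_positions = [i for i, base in enumerate(seq) if base == "N"]
    out = bytearray((len(seq) + 3) // 4)  # four bases per byte
    for i, base in enumerate(seq):
        code = 0 if base == "N" else _CODE[base]
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out), len(seq), n_positions


def unpack(packed: bytes, length: int, n_positions: list[int]) -> str:
    """Invert pack(): rebuild the original string exactly."""
    bases = [_BASE[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)]
    for i in n_positions:
        bases[i] = "N"
    return "".join(bases)


read = "ACGTNACGTTA"
packed, length, n_positions = pack(read)
assert unpack(packed, length, n_positions) == read
print(len(read), "bases stored in", len(packed), "bytes")
```

For reads that consist mostly of A, C, G and T this stores four bases per byte, roughly a quarter of the size of one byte per base, at the cost of a small side list for ambiguous bases.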

[0077] As shown in Figure 1, a data processing method of a next-generation sequencing data analysis platform IMP according to the present invention specifically includes the following steps:

[0078] Step 1. The next-generation sequencing data analysis platform IMP inputs short-read sequence files and indexed reference sequences;

[0079] Step 2. When comparing sequences, read a certain length of short-read sequences each time and put them into the cache, and use the multi-threaded working mode on the...
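The description of Step 2 is cut off in the source, so the following is only a rough sketch of the part that is stated: reading a fixed number of short reads into a buffer and handing each buffer to several worker threads. The names `read_fastq`, `batches`, `align_stub` and `process`, the batch size, and the stand-in "alignment" function are all hypothetical; the real aligner is not described in the available text.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), seq, qual


def batches(records, size):
    """Group an iterator of records into lists of at most `size` items."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def align_stub(record):
    """Hypothetical placeholder for the aligner: returns read name and length."""
    name, seq, _qual = record
    return name, len(seq)


def process(handle, batch_size=2, workers=4):
    """Read one batch at a time into memory and process it on a thread pool."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(read_fastq(handle), batch_size):
            # pool.map preserves input order within each batch
            results.extend(pool.map(align_stub, batch))
    return results


sample = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n@r3\nT\n+\nI\n")
print(process(sample))  # [('r1', 4), ('r2', 2), ('r3', 1)]
```

In CPython, threads do not run CPU-bound Python code in parallel, so this only illustrates the batching and dispatch pattern; a production aligner would typically do the per-read work in native code.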

Abstract

The invention discloses a data processing method of a next-generation sequencing (NGS) data analysis platform (IMP). The platform implements the entire next-generation sequencing processing workflow as a single step, from the input of short-read sequences in FASTQ format to the output of mutation detection results in standard VCF format. The method also provides an option to output the intermediate sequence alignment result in standard SAM or BAM format. By exchanging data through large-scale memory access rather than slow I/O, it avoids the long data search and load times incurred by I/O access to hard drives and SSDs. Hash table reads and writes, removal of duplicate alignment records, and mutation detection all become faster. Fast next-generation sequencing data analysis is achieved without affecting analysis quality, and the method is up to 20 times faster than existing schemes.
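As a minimal sketch of what hash-table-based removal of duplicate alignment records could look like, the example below keys each record by chromosome, position and strand in a Python `dict` and keeps one record per key, then returns the survivors in coordinate order. The `Alignment` fields, the choice of key, and the rule of keeping the record with the highest mapping quality are assumptions for illustration; the source does not specify them.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not the patent's data layout)."""
    name: str
    chrom: str
    pos: int
    reverse: bool
    mapq: int


def dedup_and_sort(alignments):
    """Keep one record per (chrom, pos, strand) key, then sort by coordinate."""
    best = {}  # the hash table: key -> best record seen so far
    for aln in alignments:
        key = (aln.chrom, aln.pos, aln.reverse)
        kept = best.get(key)
        # Assumed tie-break rule: prefer the higher mapping quality
        if kept is None or aln.mapq > kept.mapq:
            best[key] = aln
    return sorted(best.values(), key=lambda a: (a.chrom, a.pos, a.reverse))


records = [
    Alignment("a", "chr1", 100, False, 30),
    Alignment("b", "chr1", 100, False, 60),  # duplicate of "a", higher MAPQ
    Alignment("c", "chr1", 50, True, 20),
]
for aln in dedup_and_sort(records):
    print(aln.name, aln.chrom, aln.pos)
```

Each lookup and insert is an average constant-time operation on in-memory data, which is the property the abstract relies on when it says duplicate removal becomes faster without disk I/O.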

Description

Technical field

[0001] The invention relates to a data processing method of a next-generation sequencing data analysis platform.

Background

[0002] With the smooth implementation of the Human Genome Project and the rapid development of sequencing technology, the cost of sequencing has fallen significantly and sequencing speed has improved significantly. The cost of human whole-genome sequencing has dropped to less than $1000, and the amount of DNA sequence data has grown exponentially. How to quickly use and express these data, analyze and explain potential problems in gene sequences, and discover information beneficial to human beings from massive data has become an urgent problem to be solved. The sequence data generated by human whole-genome sequencing (WGS) is widely used, and the continuous demand for rapid analysis and processing of massive sequence data makes data analysis a new technical bottleneck. The clinical appli...

Application Information

IPC(8): G06F19/22, G06F19/28
Inventor: 杨文娴, 张翔, 俞容山
Owner: 厦门极元科技有限公司