
A parallel compression and decompression method for FASTQ files of DNA read sequence data

A compression and decompression technology applied in concurrent instruction execution, electrical digital data processing, special-purpose data processing applications, etc. It addresses the absence of published parallel algorithms for multi-core CPUs for this class of FASTQ compression algorithms.

Status: Inactive. Publication date: 2016-07-06
INST OF SOFTWARE - CHINESE ACAD OF SCI +1
Cites: 3. Cited by: 0.

AI Technical Summary

Problems solved by technology

[0005] Both the G-SQZ and DSRC algorithms mentioned above are serial, and no research articles or patents have yet addressed parallel versions of this type of algorithm for multi-node, multi-core CPU systems.



Examples


Embodiment Construction

[0062] The present invention provides a method for parallel compression and decompression of FASTQ files of DNA read sequence data. To make the purpose, technical solution and effects of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the invention, not to limit it.

[0063] The raw-data reading thread of the parallel compression method for FASTQ files is described in detail below. Its implementation steps are as follows:

[0064] (1) Open the FASTQ file of raw DNA read sequence data to be compressed.

[0065] (2) Obtain the memory page size of the machine on which the program is running.

[0066] (3) Set the size of the memory-mapped region according to the page size.

[0067] (4) According to the rang...
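Steps (2) and (3) can be illustrated with a minimal Python sketch. It assumes that "memory mapping space size" means a window that is a whole multiple of the page size, and it maps a file window by window. The function names and the `pages_per_window` parameter are hypothetical. Because the later steps are truncated in the source, the handling of FASTQ records that straddle a window boundary is not shown.

```python
import mmap
import os


def mapping_window_size(pages_per_window: int = 1024) -> int:
    """Hypothetical policy: a mapping window of N pages."""
    # mmap offsets must be multiples of mmap.ALLOCATIONGRANULARITY
    # (the page size on Linux, 64 KiB on Windows), so align to the larger value.
    granularity = max(mmap.PAGESIZE, mmap.ALLOCATIONGRANULARITY)
    return granularity * max(1, pages_per_window)


def iter_mapped_windows(path: str, window: int):
    """Yield the file contents as successive read-only memory-mapped windows."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            length = min(window, size - offset)  # last window may be shorter
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as mm:
                yield mm[:]  # copy out so the mapping can be released
            offset += length  # stays aligned: every full window is a multiple of the granularity
```

Mapping a bounded window instead of the whole file keeps the address-space and memory footprint per reading thread fixed, regardless of file size.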



Abstract

A method for parallel compression and parallel decompression of FASTQ files of DNA read sequence data. It combines circular double-buffer queues, circular double memory mapping, memory mapping combined with data-block processing, multithreaded pipelined compression and decompression, and a two-dimensional read/write sequence array to parallelize compression and decompression both across multiple processes and across multiple threads within each process. It can be implemented with MPI and OpenMP, or with MPI and Pthread. The invention makes full use of the computing power of every compute node and of the multi-core CPUs within each node, and overcomes the processor, memory and other resource limits that constrain serial compression and decompression programs.
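The abstract names circular double-buffer queues and multithreaded pipelining without giving details on this page. The following Python sketch shows one plausible reading of a two-slot circular buffer between a reading thread and a compression thread. It assumes a single producer and a single consumer. The class and function names are hypothetical, and `zlib.compress` stands in for whatever compressor the patent actually uses.

```python
import threading
import zlib
from typing import Callable, Iterable, List, Optional

_EMPTY = object()  # marks a slot that currently holds no block


class DoubleBufferQueue:
    """Two slots used circularly: the producer fills one slot
    while the consumer is still working on the other."""

    def __init__(self) -> None:
        self._slots: list = [_EMPTY, _EMPTY]
        self._write = 0  # index of the slot the producer fills next
        self._read = 0   # index of the slot the consumer drains next
        self._cond = threading.Condition()

    def put(self, block: Optional[bytes]) -> None:
        """Fill the next slot, waiting until the consumer has released it."""
        with self._cond:
            while self._slots[self._write] is not _EMPTY:
                self._cond.wait()
            self._slots[self._write] = block
            self._write ^= 1  # wrap around: 0 -> 1 -> 0
            self._cond.notify_all()

    def acquire(self) -> Optional[bytes]:
        """Wait for the next filled slot and return its block without freeing it."""
        with self._cond:
            while self._slots[self._read] is _EMPTY:
                self._cond.wait()
            return self._slots[self._read]

    def release(self) -> None:
        """Free the slot returned by acquire() and advance to the other slot."""
        with self._cond:
            self._slots[self._read] = _EMPTY
            self._read ^= 1
            self._cond.notify_all()


def pipeline_compress(blocks: Iterable[bytes],
                      compress: Callable[[bytes], bytes] = zlib.compress) -> List[bytes]:
    """A reader thread feeds blocks through the double buffer; the caller compresses them."""
    buf = DoubleBufferQueue()

    def reader() -> None:
        for block in blocks:
            buf.put(block)
        buf.put(None)  # end-of-stream marker

    t = threading.Thread(target=reader)
    t.start()
    out: List[bytes] = []
    while True:
        block = buf.acquire()
        if block is None:
            buf.release()
            break
        out.append(compress(block))  # the reader may refill the other slot meanwhile
        buf.release()
    t.join()
    return out
```

With two slots, reading block k+1 can overlap with compressing block k. The patented method additionally spreads work over MPI processes and several compression threads per process, which this sketch does not attempt.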

Description

Technical field

[0001] The invention relates to the fields of bioinformatics, data compression and high-performance computing, and in particular to a method for parallel compression and parallel decompression of FASTQ files of DNA read sequence data.

Background

[0002] One of the main tasks of bioinformatics is to collect and analyze large amounts of genetic data. These data are critical to genetic research, helping to identify genetic factors that prevent or cause disease and to develop targeted therapies. High-throughput sequencing methods and equipment generate massive amounts of short-read sequence data. DNA read sequence data are commonly stored, managed and transmitted in the FASTQ file format, which mainly contains the DNA read sequences together with annotation information for each base, such as quality scores representing the uncertainty of the sequencing process. Read sequence identifiers and other descriptions such ...
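A FASTQ record normally spans four lines: an identifier line starting with `@`, the base sequence, a separator line starting with `+` (optionally repeating the identifier), and a quality string with one character per base. FASTQ compressors commonly split these fields into separate streams because their statistics differ greatly. The sketch below does this for a buffer of complete records. It is not taken from the patent: the function name is hypothetical, and it assumes LF line endings and single-line (unwrapped) sequences.

```python
from typing import List, Tuple


def split_fastq_streams(data: bytes) -> Tuple[List[bytes], List[bytes], List[bytes]]:
    """Split complete 4-line FASTQ records into identifier, sequence and quality streams."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece left by a trailing newline
    if len(lines) % 4:
        raise ValueError("input does not contain a whole number of 4-line records")
    ids: List[bytes] = []
    seqs: List[bytes] = []
    quals: List[bytes] = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith(b"@") or not plus.startswith(b"+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):  # one quality character per base
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        ids.append(header[1:])  # identifier without the leading '@'
        seqs.append(seq)
        quals.append(qual)
    return ids, seqs, quals
```

Each returned stream could then be handed to its own compressor or compression thread.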


Application Information

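Patent type and authority: Patent (China)
IPC (8th edition): G06F9/38; G06F17/30; H03M7/30
Inventors: 郑晶晶 (Zheng Jingjing), 王婷 (Wang Ting), 张常有 (Zhang Changyou), 詹科 (Zhan Ke)
Owner: Institute of Software, Chinese Academy of Sciences (INST OF SOFTWARE - CHINESE ACAD OF SCI)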