Gene sequencing data compression and transmission method

A technology for gene sequencing and sequencing data, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as the difficulty of meeting big data analysis needs, reduced data transmission efficiency, and large data storage requirements, and achieves the effects of preserving basic accuracy, high data transmission efficiency, and small storage space.

Inactive Publication Date: 2017-07-21
首度生物科技(苏州)有限公司 +1

AI Technical Summary

Problems solved by technology

For the compression, storage and transmission of sequencing data, conventional compression methods find it increasingly difficult to meet the needs of big data analysis. The huge data volume not only requires a large storage space but also takes a long time to transfer, greatly reducing the efficiency of data transmission.



Examples


Embodiment 1

[0032] This example uses a batch of DNA sequencing data produced on January 8, 2017. The data output platform is the Illumina NextSeq 500, the standard DNA sequence is Homo sapiens, the standard DNA sequence database is hg19, the data volume of the DNA sequencing data is 1M paired reads, and the sequencing read length is 150 bp.

[0033] As shown in Figure 1, a gene sequencing data compression method comprises the following steps:

[0034] A. Establish a standard DNA sequence database: deploy the hg19 database to the data processing equipment. The hg19 database contains DNA sequences and a number corresponding to each DNA sequence;

[0035] B. Preprocessing of DNA sequencing data: Compare the DNA sequencing data produced by the Illumina NextSeq 500 with the hg19 database one by one to generate a corresponding relationship, and replace the original text of the DNA sequencing data with the number in the standard DNA database. This example adopts non-destructive replacement. The same da...
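
The replacement described in step B can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the toy `REFERENCE` string stands in for hg19, the "number" is taken to be a 0-based offset into that string, and the helpers `encode_read` and `decode_read` as well as the `max_diffs` threshold are hypothetical names introduced here. A real pipeline would use a proper aligner rather than a brute-force scan.

```python
from typing import List, Optional, Tuple

# Hypothetical toy reference standing in for the hg19 database.
REFERENCE = "ACGTTGCAAGGCTTACGGATCCATGCAAGT"

Diff = Tuple[int, str]                  # (offset within read, base seen in read)
Encoded = Tuple[int, int, List[Diff]]   # (reference number, read length, diffs)


def encode_read(read: str, reference: str, max_diffs: int = 2) -> Optional[Encoded]:
    """Replace a read with the number (assumed here: 0-based offset) of the
    closest reference window; mismatching bases are kept separately."""
    best: Optional[Encoded] = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        diffs = [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]
        if len(diffs) <= max_diffs and (best is None or len(diffs) < len(best[2])):
            best = (pos, len(read), diffs)
    return best  # None: no window is close enough, store the read verbatim


def decode_read(encoded: Encoded, reference: str) -> str:
    """Rebuild the original read from the reference plus the stored diffs."""
    pos, length, diffs = encoded
    bases = list(reference[pos:pos + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)


exact = encode_read("GGCTTACGGA", REFERENCE)     # identical to a reference window
variant = encode_read("GGCTTTCGGA", REFERENCE)   # one base differs
unmatched = encode_read("TTTTTTTTTT", REFERENCE)
```

Because the differing bases are stored alongside the number, decoding restores the read exactly, which is what makes the replacement non-destructive in this sketch.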

Embodiment 2

[0045] As shown in Figure 3, the DNA sequencing data and the standard DNA database used in Embodiment 2 are the same as those in Embodiment 1. Embodiment 2 is provided with a first data processing device and a second data processing device. The first data processing device is a core computer, and the second data processing device is a processing terminal. Transmission between the core computer and the processing terminal uses a Gigabit network card, and the system is a Unix operating system.

[0046] A method for transmitting gene sequencing data, comprising the following steps:

[0047] A. Establish a standard DNA sequence database: deploy the hg19 database to the core computer and the processing terminal. The hg19 database contains DNA sequences and a number corresponding to each DNA sequence;

[0048] B. Preprocessing of DNA sequencing data: the core computer compares the DNA sequencing data produce...
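
The transmission idea, in which both machines hold the same reference so only numbers and differences travel over the network, can be sketched as below. Everything concrete here is an assumption made for illustration: the toy `REFERENCE`, the JSON record layout (`pos`, `len`, `diffs`, `raw`), the function names, and the use of `zlib` as the compressor. The source text does not specify any of these.

```python
import json
import zlib
from typing import List

# Hypothetical toy reference; in Embodiment 2 both machines hold hg19.
REFERENCE = "ACGTTGCAAGGCTTACGGATCCATGCAAGT"


def pack_for_transmission(records: List[dict]) -> bytes:
    """Sender side (core computer): serialize the preprocessed records and
    compress them. zlib is an assumed stand-in for the compression step."""
    payload = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload, 9)


def unpack_and_restore(blob: bytes, reference: str) -> List[str]:
    """Receiver side (processing terminal): decompress, then rebuild each
    read from the local reference copy plus the separately stored diffs."""
    records = json.loads(zlib.decompress(blob).decode("utf-8"))
    reads = []
    for rec in records:
        if "raw" in rec:  # read that matched nothing in the reference
            reads.append(rec["raw"])
            continue
        start = rec["pos"]
        bases = list(reference[start:start + rec["len"]])
        for offset, base in rec["diffs"]:
            bases[offset] = base
        reads.append("".join(bases))
    return reads


# Records as they might look after preprocessing (hypothetical layout).
records = [
    {"pos": 9, "len": 10, "diffs": []},
    {"pos": 9, "len": 10, "diffs": [[5, "T"]]},
    {"raw": "TTTTTTTTTT"},
]
blob = pack_for_transmission(records)            # bytes sent over the network
restored = unpack_and_restore(blob, REFERENCE)   # reads rebuilt at the terminal
```

The reference itself never crosses the network in this sketch; the receiver relies on its own deployed copy, which is why step A deploys the database to both devices.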


Abstract

The invention discloses a gene sequencing data compression and transmission method. The method comprises the following steps: A, establishing a standard DNA sequence database; B, deploying the standard DNA sequence database to a data processing device; C, preprocessing the DNA sequencing data by comparing it with the standard DNA database one by one, generating a corresponding relationship, replacing the original text of the DNA sequencing data with numbers from the standard DNA database, and separately storing the part of the DNA sequencing data that differs from the standard DNA database; D, performing compression; and E, performing storage or transmission. Because the standard DNA sequence database is stored in the data processing device, a large amount of the information contained in the DNA sequencing data can be represented by numbers from the standard DNA database, so the data volume is greatly reduced after preprocessing. Further compression reduces it again, so the DNA sequencing data needs less storage space and is transmitted more efficiently. The method is compatible with the output data of second-generation and even third-generation sequencing technology.

Description

Technical field

[0001] The invention relates to the technical field of gene detection, in particular to a gene sequencing data compression and transmission method.

Background

[0002] With the development of gene sequencing technology and the reduction of sequencing costs, especially the application and popularization of next-generation sequencing (NGS), the output of sequencing data is increasing exponentially, and how to efficiently store and transmit sequencing data has become a major challenge for the industry. Mature DNA sequencing technology began in the 1970s with the chemical degradation and dideoxy chain termination methods, followed by fluorescence, hybridization and other sequencing methods, collectively referred to as the first generation of DNA sequencing technology; the output data volume is usually on the order of bp or kb. Around 2005, technologies such as 454 sequencing, Solexa sequencing and SOLiD sequencing appeared successively, a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/20, G06F19/28, G06F17/30
CPC: G06F16/1744, G16B25/00, G16B50/00
Inventors: 左褀洋, 唐元华, 徐健
Owner: 首度生物科技(苏州)有限公司