Gene sequencing quality row data compression preprocessing, decompression and restoration method and system

A technology for compression preprocessing of gene sequencing quality row data, applied in sequence analysis, instruments, electrical components, etc. It addresses the small preprocessing window and limited compression efficiency of existing methods, and achieves a better compression rate and higher compression efficiency.

Active Publication Date: 2019-11-08
GENETALKS BIO TECH CHANGSHA CO LTD

AI Technical Summary

Problems solved by technology

However, the BWT algorithm has the following two defects:
(1) Large additional overhead: because the BWT algorithm must save the position information I of the original string S in the matrix M, it introduces additional storage overhead in the preprocessing stage. Because of this overhead, the preprocessed result may not improve compression efficiency.
(2) Small preprocessing window: the BWT algorithm only adjusts the order of the characters within a string, so its preprocessing window is a single fixed-length string. It does not reorder data from the perspective of a whole file or a large data block.
[0013] In a massive-data environment, the small preprocessing window of the BWT algorithm limits its ability to improve data similarity within large data blocks. In addition, the additional overhead of its preprocessing limits further improvement of compression efficiency.
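
To make the first defect concrete, the following is a minimal sketch of a naive BWT in Python. It is a textbook-style illustration, not code from the patent: the function names `bwt` and `inverse_bwt` are hypothetical, the input is assumed to be a non-empty string, and production implementations use suffix arrays rather than sorting all rotations. The point is that the transform returns two things, the last column of the sorted rotation matrix M and the row position I of the original string S, and I must be stored alongside the output for the inverse to work.

```python
def bwt(s):
    """Naive BWT: return (last column of sorted rotation matrix M, position I of s in M)."""
    # Build every rotation of s and sort them to form the matrix M.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    last_column = "".join(row[-1] for row in rotations)
    # I is the extra value that must be saved with the transformed string.
    return last_column, rotations.index(s)


def inverse_bwt(last_column, i):
    """Rebuild the original string from the last column and the saved position I."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column to the partial rows, then re-sort.
        table = sorted(c + row for c, row in zip(last_column, table))
    return table[i]


transformed, position = bwt("banana")
print(transformed, position)
print(inverse_bwt(transformed, position))
```

Without `position`, the inverse cannot tell which of the sorted rotations is the original string, which is the per-window overhead the passage above refers to.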



Examples


Embodiment Construction

[0069] As shown in Figure 1, the gene sequencing quality row data compression preprocessing method in this embodiment comprises the following steps:

[0070] 1) Read the original data block Data of the quality row data and determine the column number Index_No of its index column;

[0071] 2) Create a grouping information table IIT (Index Information Table) according to the index column of the original data block Data;

[0072] 3) According to the grouping information table IIT, each quality row in the original data block Data is rearranged according to the index column information, and the data in the index column part is deleted to obtain the grouped rearranged data Grouped_Data;

[0073] 4) Extract the data Index_Data of the index column of the original data block Data, and output the column number Index_No of the index column, the data Index_Data of the index column of the original data block Data, and the grouped rearranged data Grouped_Data as the result of co...
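
Steps 1) to 4) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name `preprocess` is hypothetical, each quality row is assumed to be a string longer than `index_no`, a single character column serves as the index column, the IIT is modeled as a dictionary from index value to the original row positions, and groups are emitted in sorted order of the index value (the excerpt does not say how groups are ordered).

```python
def preprocess(data, index_no):
    """Group quality rows by the character in column index_no.

    Returns (Index_No, Index_Data, Grouped_Data) as named in the steps above.
    """
    # Step 4 (extraction): the index column, kept in original row order.
    index_data = [row[index_no] for row in data]

    # Step 2: build the grouping information table (IIT).
    # Assumption: IIT maps each index value to the positions of its rows.
    iit = {}
    for pos, key in enumerate(index_data):
        iit.setdefault(key, []).append(pos)

    # Step 3: rearrange rows group by group, keeping the original relative
    # order inside each group, and delete the index column from every row.
    grouped_data = []
    for key in sorted(iit):  # assumption: groups are ordered by index value
        for pos in iit[key]:
            row = data[pos]
            grouped_data.append(row[:index_no] + row[index_no + 1:])

    return index_no, index_data, grouped_data


rows = ["IIHG", "FIHH", "IHHG", "FFHH"]
print(preprocess(rows, 0))
```

In this sketch the rows whose first character is `F` end up adjacent, as do those starting with `I`, and the removed column travels separately as `index_data` so that nothing is lost.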


Abstract

The invention discloses a gene sequencing quality row data compression preprocessing, decompression and restoration method and system. The basic principle is that one or more columns are taken out of the input quality row file or data block to serve as index columns, and all the quality row data are rearranged so that quality rows with the same index column value form a group and are placed together, ordered by their relative positions in the original data. Because quality rows with the same index column value are usually more similar to each other, this reorganization places similar gene sequencing data together and so improves the local similarity of the data. No additional storage overhead is introduced, data reorganization over a large data window is achieved with very small computational overhead, and compression efficiency is improved. The method and system are suitable for compression preprocessing of quality row data in the gene sequencing process, and the larger the data block, the more obvious the advantage.
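
The restoration direction is not spelled out in this excerpt, so the following Python sketch is an inference from the abstract rather than the patented procedure. It assumes the preprocessing output consists of the index column number, the index column values in original row order, and the grouped rows with the index column removed, and that groups were concatenated in sorted order of the index value with original relative order preserved inside each group. The function name `restore` is hypothetical.

```python
from collections import Counter


def restore(index_no, index_data, grouped_data):
    """Rebuild the original quality rows from the preprocessing output."""
    # Recompute where each group starts inside grouped_data.
    # Assumption: groups were written in sorted order of the index value.
    counts = Counter(index_data)
    next_row = {}
    start = 0
    for key in sorted(counts):
        next_row[key] = start
        start += counts[key]

    # Walk the index column in original order; each value tells us which
    # group the next original row came from.
    restored = []
    for key in index_data:
        row = grouped_data[next_row[key]]
        next_row[key] += 1
        # Reinsert the index character at its original column.
        restored.append(row[:index_no] + key + row[index_no:])
    return restored


print(restore(0, ["I", "F", "I", "F"], ["IHH", "FHH", "IHG", "HHG"]))
```

Under these assumptions the index column itself carries all the ordering information, which is consistent with the abstract's statement that no additional storage is introduced.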

Description

Technical field

[0001] The invention relates to compression preprocessing and decompression technology for gene sequencing quality row data, in particular to a gene sequencing quality row data compression preprocessing, decompression and restoration method and system.

Background

[0002] Genetic testing is a technology that examines DNA obtained from blood, other body fluids, or cells. Using specific equipment, it reads the DNA molecular information in the cells of the person tested and analyzes whether the gene types, gene defects, and their expression functions are normal. It lets people understand their own genetic information and clarify the cause of a disease or predict the risk of developing one, so it can be used both to diagnose diseases and to predict disease risk. With the continuous upgrading of gene sequencing technology, the sequencing throughput is getting higher and higher, and at the same time, the c...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B30/00, G16B50/00, H03M7/30
CPC: H03M7/30, G16B50/50, H03M7/3077, G16B30/00, G16B50/30
Inventor: 赵强利, 宋卓, 李根, 蒋艳凰, 冯博伦, 唐宏伟, 徐霞丽, 毛海波
Owner: GENETALKS BIO TECH CHANGSHA CO LTD