DNA sequence data compression system

A DNA sequence data compression technology, applied in sequence analysis, electrical digital data processing, and special data processing applications. It addresses the expansion of the encoded data volume and the limited compression performance of the BioCompress-2 system, eliminating redundancy and improving the overall compression ratio.

Status: Inactive
Publication Date: 2011-06-01
Owner: SHENZHEN UNIV

AI Technical Summary

Problems solved by technology

Therefore, when searching, the LZ algorithm can only find small-scale fragment repetitions, which expands the encoded data volume. This also largely limits the compression performance of the BioCompress-2 system.



Embodiment Construction

[0044] The present invention provides a DNA sequence data compression system. To make the purpose, technical solution, and effect of the present invention clearer, the invention is described in further detail below. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

[0045] Compared with ordinary text strings, DNA sequence data has the following three main distinctive features:

[0046] First, DNA sequence data contains a large amount of redundancy in the form of similar segments, ranging from simple segment duplication to large-scale gene sequence duplication. This high similarity is the fundamental basis for compressing DNA sequence data. In theory, if a data model with sufficient coverage can describe the redundancy in DNA sequence data, a higher compression ratio can be achieved.
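
The redundancy described in [0046] can be illustrated with a rough measurement. The following is a minimal sketch, not part of the patent: the function name `repeated_kmer_fraction` and the window length `k` are assumptions chosen for illustration. It counts how many fixed-length windows of a sequence have already appeared earlier in the same sequence, which gives a crude indication of how much exact repetition a compressor could exploit.

```python
def repeated_kmer_fraction(seq: str, k: int = 4) -> float:
    """Return the fraction of k-length windows that already occurred earlier in seq.

    Illustrative only: this detects exact repeats, not the approximate
    repeats that the patent's model targets.
    """
    if k <= 0 or len(seq) < k:
        return 0.0
    seen = set()            # k-mers observed so far
    repeats = 0             # windows whose k-mer was seen before
    total = len(seq) - k + 1
    for i in range(total):
        kmer = seq[i:i + k]
        if kmer in seen:
            repeats += 1
        else:
            seen.add(kmer)
    return repeats / total


if __name__ == "__main__":
    # "ACGT" appears twice, so one of the five 4-base windows is a repeat.
    print(repeated_kmer_fraction("ACGTACGT", k=4))
```

A value near zero suggests little exact repetition at that window length, while a value near one suggests the sequence is dominated by repeated segments.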

[0047] Second, repetitions in DNA sequence data...


Abstract

The invention discloses a DNA sequence data compression system, specifically a lossless compression system for DNA sequence data based on an MA-ARV codebook. In the invention, approximate repeats of an MA-ARV code vector can be searched for across the complete sequence, and a memetic algorithm (MA), a heuristic optimization method, is used to optimize the construction of the compression codebook, so that the repetitive characteristics of DNA sequence data are exploited more fully and redundancy is eliminated effectively.
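
The idea of describing a segment as an approximate repeat of earlier data can be sketched as follows. This excerpt does not give the fields of the patent's ARV code vector, so the representation here is a hypothetical, substitution-only one: the `ApproxRepeat` type and the `encode_segment` and `decode_segment` helpers are assumptions for illustration, and the memetic optimization of the codebook is not shown.

```python
from typing import List, NamedTuple, Tuple


class ApproxRepeat(NamedTuple):
    """Hypothetical record: a segment expressed relative to an earlier one."""
    src: int                           # start of the earlier (reference) segment
    length: int                        # number of bases covered
    mismatches: List[Tuple[int, str]]  # (offset, base) where the copy differs


def encode_segment(seq: str, src: int, dst: int, length: int) -> ApproxRepeat:
    """Describe seq[dst:dst+length] as a copy of seq[src:src+length] plus substitutions.

    Assumes src + length <= dst, so the reference lies fully before the target.
    """
    mismatches = [
        (i, seq[dst + i])
        for i in range(length)
        if seq[src + i] != seq[dst + i]
    ]
    return ApproxRepeat(src, length, mismatches)


def decode_segment(prefix: str, rep: ApproxRepeat) -> str:
    """Rebuild the target segment from already-decoded data and the record."""
    bases = list(prefix[rep.src:rep.src + rep.length])
    for offset, base in rep.mismatches:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    seq = "ACGTTGCAACGATGCA"
    rep = encode_segment(seq, src=0, dst=8, length=8)
    print(rep)
    print(decode_segment(seq[:8], rep) == seq[8:16])
```

Because the decoder reproduces the target segment exactly from the reference and the recorded substitutions, a scheme of this kind stays lossless. It only saves space when the mismatch list is short relative to the segment length.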

Description

Technical field

[0001] The invention relates to the field of data compression, in particular to a lossless compression system for DNA sequence data based on a memetic algorithm approximate repeat vector (MA-ARV) model.

Background technique

[0002] DNA is a double-stranded polymer that stores genetic instruction information in the cells of organisms, and is an important material basis for the survival, continuation, and development of organisms. DNA sequence data is an abstract model of DNA material in bioinformatics; it contains complete genetic information and has important scientific research value and social significance. To obtain the genetic information of various organisms, DNA sequencing projects have been launched one after another, generating massive amounts of DNA sequence data and placing enormous pressure on existing data storage and transmission resources. Therefore, it is necessary to compress DNA sequence data. At present, t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/10, G16B50/50, G16B30/00
CPC: G06F19/22, G06F19/10, G16B30/00, G16B50/50, G16B99/00
Inventor: 纪震, 周家锐, 朱泽轩, 储颖
Owner: SHENZHEN UNIV