A Sequencing Data Compression Method Based on Double Codons

A sequencing data compression method, applied in the field of biomedicine, that addresses the failure of existing compression strategies to exploit the regularity of double-codon-based sequencing data, reducing occupied space and saving data transmission time.

Active Publication Date: 2019-11-05
洛阳晶云信息科技有限公司

AI Technical Summary

Problems solved by technology

Existing compression strategies do not take advantage of the regularity of double-codon-based sequencing data.



Embodiment Construction

[0023] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0024] Referring to Figures 1-5, in an embodiment of the present invention, a double-codon-based sequencing data compression method comprises the following steps:

[0025] (1) Input a sequencing data file. Assume that the sequence in the input file is:

[0026] AUGGUGCUGUCUCCUGCCCCUGCCCCUGCCGACAAGACCAACACCAACACCAACACCAACGUCAAGGCCGCCUG(; GGUUGGGGUAAGGUC

Segment the sequence into units of 6 bases each, as follows: AUGGUG|CUGUCU|CCUGCC|CC...
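The segmentation step can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `segment_double_codons` is hypothetical, and keeping a trailing remainder shorter than 6 bases as its own unit is an assumption, since the excerpt does not say how a remainder is handled. Only the first 18 bases of the example sequence are used.

```python
def segment_double_codons(seq: str, size: int = 6) -> list[str]:
    """Split a base sequence into consecutive units of `size` bases.

    Assumption: a final unit shorter than `size` is kept as-is.
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


# First 18 bases of the example sequence in [0026].
prefix = "AUGGUGCUGUCUCCUGCC"
print("|".join(segment_double_codons(prefix)))  # AUGGUG|CUGUCU|CCUGCC
```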


Abstract

The invention discloses a sequencing data compression method based on double codons. The method comprises the following steps: (1) segmentation; (2) building a hash table; (3) sorting; (4) building a Huffman tree; (5) encoding; and (6) calculating the compression ratio, where the compression ratio of the algorithm is 8M / L. The method greatly reduces the space occupied by sequencing data, which matters for storing and transmitting large-scale sequencing data, and it saves transmission time between local and cloud storage.
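The six steps can be sketched with standard Huffman coding over 6-base units. This is an illustrative sketch only: the excerpt does not show the patent's detailed procedure, the function names `build_codes` and `compress` are hypothetical, and the symbols M and L are not defined in the excerpt. The code assumes M is the number of input characters (stored at 8 bits each) and L is the number of encoded bits, and it ignores the space needed to store the code table.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(units):
    """Return a {unit: bitstring} Huffman code for the given units."""
    freq = Counter(units)                                  # step 2: hash table of counts
    ordered = sorted(freq.items(), key=lambda kv: kv[1])   # step 3: sort by frequency
    tiebreak = count()                                     # unique index so dicts are never compared
    heap = [(n, next(tiebreak), {u: ""}) for u, n in ordered]
    heapq.heapify(heap)
    if len(heap) == 1:                                     # one distinct unit: assign a single bit
        return {u: "0" for u in heap[0][2]}
    while len(heap) > 1:                                   # step 4: merge the two rarest subtrees
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {u: "0" + c for u, c in left.items()}
        merged.update({u: "1" + c for u, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def compress(seq, size=6):
    """Segment `seq`, build a Huffman code, and return (codes, bitstring)."""
    units = [seq[i:i + size] for i in range(0, len(seq), size)]  # step 1: segmentation
    codes = build_codes(units)
    bits = "".join(codes[u] for u in units)                      # step 5: encoding
    return codes, bits


# Made-up toy input (not from the patent): three distinct units with counts 4, 2, 1.
seq = "AUGGUG" * 4 + "CUGUCU" * 2 + "CCUGCC"
codes, bits = compress(seq)

# Step 6, under the stated assumption about M and L.
M, L = len(seq), len(bits)
ratio = 8 * M / L
print(codes)
print(f"M={M} characters, L={L} bits, ratio={ratio:.1f}")
```

Because the most frequent unit receives the shortest code, the ratio grows as a few double codons come to dominate the input.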

Description

Technical field

[0001] The invention relates to biomedicine, in particular to a sequencing data compression method based on double codons.

Background technique

[0002] Because sequencing sequences are generally very long, the amount of data is very large. Transmitting large amounts of data puts direct strain on bandwidth, so a compression strategy is used to reduce the amount of data transmitted, especially when network conditions are poor. Everything on the Internet, from local and cloud storage to data streaming, relies heavily on compression algorithms and would be very inefficient without them.

[0003] The general flow of a compression strategy is as follows. First, the raw data is input; it contains the sequence of symbols to be compressed. Second, the compressor encodes these symbols. Third, the result, that is, the encoded data, is output. There are two types of compression strategies, loss...
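The three-stage flow in [0003] (raw input, compressor, encoded output) can be illustrated with Python's standard `zlib` module. Here `zlib` is only a stand-in general-purpose compressor, not the method of this patent, and the repeated input string is made up for the example.

```python
import zlib

# Stage 1: raw input, a sequence of symbols to be compressed.
raw = ("AUGGUGCUGUCU" * 50).encode("ascii")

# Stage 2: the compressor encodes the symbols.
encoded = zlib.compress(raw)

# Stage 3: the encoded data is the output; decompressing restores the input exactly.
restored = zlib.decompress(encoded)

print(len(raw), "bytes ->", len(encoded), "bytes")
print("round trip ok:", restored == raw)
```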


Application Information

Patent Type & Authority: Patents (China)
IPC(8): H03M7/40
CPC: H03M7/40
Inventor: 赵屹, 卜德超, 赵连鹤
Owner: 洛阳晶云信息科技有限公司