An Encoding Method for Rapidly Encoding Gene Character Sequences into Binary Sequences

A binary-sequence and character-sequence technology, applied in electrical digital data processing, special data processing applications, instruments, and related fields. It addresses encoding that is time-consuming, slow, and inefficient, and achieves high coding efficiency, fast coding speed, and improved coding compression.

Active Publication Date: 2017-12-26
GENETALKS BIO TECH CHANGSHA CO LTD

AI Technical Summary

Problems solved by technology

Because such encoding is performed character by character, it is very time-consuming for gigabytes or even terabytes of DNA data, so improving encoding speed and efficiency is important.
Consequently, existing methods for encoding a gene character sequence into a binary sequence are slow and inefficient when applied to the massive data produced by DNA sequencing.



Examples


Embodiment Construction

[0027] As shown in Figure 1, the method of this embodiment for rapidly encoding a gene character sequence into a binary sequence comprises the following steps:

[0028] 1) Take a specified number of characters from the gene character sequence to be encoded as the current string to process;

[0029] 2) Shift the binary code of the current string right by 1 bit;

[0030] 3) Perform a bitwise AND of each character in the right-shifted current string with 0x3;

[0031] 4) Extract the lowest two bits of each character from the result of the AND operation and assemble them into the compact binary sequence corresponding to the current string;

[0032] 5) Determine whether the entire gene character sequence has been processed; if not, return to step 1); otherwise, finish and exit.
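
The five steps above can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation. It assumes ASCII input, a block size of 8 characters treated as one little-endian integer, and a plain loop for the extraction in step 4. The names `MASK`, `encode_block`, and `encode` are hypothetical.

```python
MASK = 0x0303030303030303  # 0x3 repeated in each of 8 bytes (assumed block size)


def encode_block(chunk: bytes) -> int:
    """Pack up to 8 base characters into 2 bits each (character i -> bits 2i..2i+1)."""
    word = int.from_bytes(chunk, "little")  # step 1: the block as one integer
    word >>= 1                              # step 2: shift the whole block right by 1 bit
    word &= MASK                            # step 3: AND every byte with 0x3
    packed = 0
    for i in range(len(chunk)):             # step 4: gather the low 2 bits of each byte
        packed |= ((word >> (8 * i)) & 0x3) << (2 * i)
    return packed


def encode(seq: str, block: int = 8) -> list:
    """Step 5: repeat block by block until the whole sequence is consumed."""
    data = seq.encode("ascii")
    return [encode_block(data[i:i + block]) for i in range(0, len(data), block)]
```

Shifting the whole block at once lets the low bit of each byte spill into the top bit of its neighbor, but the per-byte 0x3 mask discards that bit, so each byte ends up holding only its own two code bits. The sketch does not record the length of a short final block; a decoder would need that length from elsewhere.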

[0033] See Figure 2. The binary codes of the ...


Abstract

The invention discloses an encoding method for rapidly encoding a gene character sequence into a binary sequence. The method comprises the steps of: 1, taking a specified number of characters from the gene character sequence to be encoded as the current string to process; 2, shifting the binary code of the current string right by one bit; 3, performing a bitwise AND of each character in the right-shifted current string with 0x3; 4, extracting the two lowest bits of each character from the result of the AND operation and assembling them into the compact binary sequence corresponding to the current string; and 5, determining whether the entire gene character sequence has been processed, returning to step 1 if it has not, and otherwise finishing and exiting. The method offers high compression efficiency, high encoding speed, and high encoding efficiency; it requires no case conversion, can be implemented flexibly, and has a wide range of applications.
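
The claim that no case conversion is needed follows from the ASCII codes of the base letters: upper and lower case differ only in bit 5, while the shift and mask keep only bits 1 and 2. A minimal Python check, assuming ASCII input (the names `code2` and `DECODE` are hypothetical):

```python
def code2(ch: str) -> int:
    """2-bit code of one base: shift right by 1 bit, then AND with 0x3."""
    return (ord(ch) >> 1) & 0x3


# Print each base with its 8-bit ASCII code and the resulting 2-bit code.
for ch in "ACGTacgt":
    print(ch, format(ord(ch), "08b"), format(code2(ch), "02b"))

# The resulting codes are A=00, C=01, T=10, G=11 for both cases,
# so a decoder can index this string by the 2-bit code.
DECODE = "ACTG"
```

Decoding by this table always yields upper case, so the original letter case is not preserved by the 2-bit code.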

Description

Technical field

[0001] The invention relates to gene sequencing technology, and in particular to an encoding method for rapidly encoding a gene character sequence into a binary sequence.

Background

[0002] With the development of gene sequencing technology, the price of sequencing has dropped exponentially, even faster than Moore's Law. Although the Internet and hardware continue to improve, compressing, storing, and transmitting the massive data generated by sequencing still poses great challenges. Both DNA genome sequences and sequencing reads are composed of the four bases A, T, C, and G, so the encoding of A, T, C, and G is very important in data compression. The usual practice is as follows: because the characters may be in mixed case, first convert them all to upper case or all to lower case (or judge each character directly without converting case, in which case the number of cases to judge increases ...
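
For contrast, the conventional practice described in the background (unify the case, then handle each character individually) might look roughly like the Python sketch below. The source does not give the conventional mapping, so the table `CODES` is an assumption chosen for illustration, and `encode_per_char` is a hypothetical name.

```python
# Assumed 2-bit mapping for illustration only; the source does not specify one.
CODES = {"A": 0, "C": 1, "T": 2, "G": 3}


def encode_per_char(seq: str) -> list:
    """Encode one character at a time, normalizing case before each lookup."""
    out = []
    for ch in seq:
        out.append(CODES[ch.upper()])  # case conversion plus a per-character lookup
    return out
```

Every character costs a case conversion and a lookup here, which is the per-character overhead the background identifies as slow for very large inputs.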


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F19/20
CPC: G16B25/00
Inventors: 李根, 宋卓
Owner: GENETALKS BIO TECH CHANGSHA CO LTD