DNA storage coding method for optimizing Chinese storage

A coding method for Chinese text, applied in the field of DNA storage. It addresses the low storage efficiency and high redundancy of existing DNA storage models, and aims to improve coding potential, compression, and error correction.

Active Publication Date: 2020-08-28
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

[0007] Existing DNA storage models store Chinese text with low efficiency and high redundancy. To address this, an optimized coding scheme for Chinese is adopted that reduces the redundancy of Chinese text and improves the compression achieved by DNA storage coding.


Examples


Embodiment 1

[0038] Embodiment 1: The introduction to the tenth chapter of the Chinese text Outlaws of the Marsh is selected as input data, in txt format (see the text example in Figure 3). The steps, illustrated in Figure 1 and Figure 2, are as follows:

[0039] Encoding process:

[0040] 1) According to GB2312-80 "Chinese Character Coded Character Set for Information Exchange", the first-level Chinese characters are renumbered in order from 0 to 3754.

[0041] 2) Input the Chinese text to be encoded, and design two character numbering methods according to the different types of characters contained in the text:

[0042] ① Numbering method E1: Count the number of character types other than first-level Chinese characters that appear in the text. If there are no more than 341 types, the $N_1$ characters other than first-level Chinese characters in the text are numbered from 3755 to $3755 + N_1 - 1$, where $N_1 \le 341$; go to step 3).

[0043] ② Numbering method E2: If the number of character ty...
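Step 1) can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: it assumes the renumbering follows the GB2312-80 row and cell order, in which the first-level characters occupy rows 16 to 55, and the helper name `level1_index` is hypothetical.

```python
from typing import Optional

FIRST_ROW = 16        # first GB2312-80 row holding first-level Chinese characters
LAST_ROW = 55         # last first-level row; only its first 89 cells are assigned
CELLS_PER_ROW = 94
LEVEL1_COUNT = 3755   # 39 full rows * 94 + 89 = 3755 characters


def level1_index(ch: str) -> Optional[int]:
    """Return the 0..3754 number of a first-level character, else None.

    Assumption: numbers follow GB2312-80 row/cell order.
    """
    try:
        raw = ch.encode("gb2312")
    except UnicodeEncodeError:
        return None                       # character is not in GB2312 at all
    if len(raw) != 2:
        return None                       # single-byte (ASCII) character
    row, cell = raw[0] - 0xA0, raw[1] - 0xA0
    if not FIRST_ROW <= row <= LAST_ROW:
        return None                       # symbols or second-level characters
    index = (row - FIRST_ROW) * CELLS_PER_ROW + (cell - 1)
    return index if index < LEVEL1_COUNT else None
```

Under this assumption, "啊" (row 16, cell 1) receives number 0, and punctuation or ASCII characters return `None` so that they can be numbered separately in step 2).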

Embodiment 2

[0064] Embodiment 2: The Chinese text Three Hundred Tang Poems is selected as input data (see the text example in Figure 4). The steps are as follows:

[0065] Encoding process:

[0066] 1) According to GB2312-80 "Chinese Character Coded Character Set for Information Exchange", the first-level Chinese characters are renumbered in order from 0 to 3754.

[0067] 2) Input the Chinese text to be encoded, and design two character numbering methods according to the different types of characters contained in the text:

[0068] ① Numbering method E1: Count the number of character types other than first-level Chinese characters that appear in the text. If there are no more than 341 types, the $N_1$ characters other than first-level Chinese characters in the text are numbered from 3755 to $3755 + N_1 - 1$, where $N_1 \le 341$; go to step 3).

[0069] ② Numbering method E2: If the number of character types other than first-level Chinese characters exceeds 341, the second-level Chinese characters in GB2312-80 are ...
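Numbering method E1 can be sketched as below. The sketch rests on assumptions: the source does not say in what order the extra characters receive their numbers, so order of first appearance is assumed here, and the names `is_level1` and `build_e1_table` are hypothetical. Note that 3755 + 341 = 4096, so under E1 every number lies in 0 to 4095 and fits in 12 bits; the excerpt itself does not state a code width.

```python
from typing import Dict, Optional

LEVEL1_COUNT = 3755   # first-level characters are numbered 0..3754
MAX_EXTRA = 341       # E1 limit from the text; 3755 + 341 = 4096


def is_level1(ch: str) -> bool:
    """True if ch lies in GB2312-80 rows 16..55 (first-level characters)."""
    try:
        raw = ch.encode("gb2312")
    except UnicodeEncodeError:
        return False
    return len(raw) == 2 and 0xB0 <= raw[0] <= 0xD7


def build_e1_table(text: str) -> Optional[Dict[str, int]]:
    """Number every other character 3755, 3756, ... in order of first
    appearance (an assumed order). Return None when E1 does not apply."""
    # dict.fromkeys keeps one copy of each character, in first-seen order.
    extras = [ch for ch in dict.fromkeys(text) if not is_level1(ch)]
    if len(extras) > MAX_EXTRA:
        return None   # more than 341 types: the method switches to E2
    return {ch: LEVEL1_COUNT + i for i, ch in enumerate(extras)}
```

For a text whose only characters outside the first level are "，" and "。", appearing in that order, the sketch assigns them 3755 and 3756.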


Abstract

The invention discloses a DNA storage coding method for optimizing Chinese storage, comprising the following steps: 1) inputting a Chinese text and, according to the types of characters it contains, recoding the first-level Chinese characters, or the first-level and second-level Chinese characters, of the GB2312-80 standard; 2) counting the occurrence frequency of each segmented word in the text, multiplying the frequency by the length of the word, sorting the products, and encoding the top-ranked segmented words; 3) converting all characters into a binary sequence and compressing it with Huffman coding; 4) converting the binary sequence into a DNA sequence and adding an address code and an RS (Reed-Solomon) error correction code; 5) decoding, which is the reverse of encoding: error correction is carried out first, then sequence splicing, and then the DNA sequence is converted into a binary sequence. The method reduces the redundancy of Chinese text, improves the compression achieved by DNA storage coding, and obtains an extremely high Chinese coding potential.
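Two of the steps in the abstract lend themselves to a short sketch: ranking segmented words by frequency times length (step 2) and turning a binary sequence into bases (step 4). The following Python is illustrative only. It assumes the text has already been segmented into a list of words, and it assumes a plain two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T); the abstract does not specify the mapping actually used. The Huffman, address code and RS steps are omitted, and all function names are hypothetical.

```python
from collections import Counter
from typing import List

BASES = "ACGT"   # assumed mapping: 00->A, 01->C, 10->G, 11->T


def rank_words(words: List[str], top_k: int) -> List[str]:
    """Rank words by (occurrence count * word length), highest first.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(words)
    ranked = sorted(counts, key=lambda w: (-counts[w] * len(w), w))
    return ranked[:top_k]


def bits_to_dna(bits: str) -> str:
    """Map each pair of bits to one base; pad with one '0' if length is odd."""
    if len(bits) % 2:
        bits += "0"
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def dna_to_bits(dna: str) -> str:
    """Inverse of bits_to_dna (any padding bit is kept in the output)."""
    return "".join(format(BASES.index(base), "02b") for base in dna)
```

With the word list `["林冲", "的", "林冲", "好汉", "的", "的"]`, the products are 4 for "林冲", 3 for "的" and 2 for "好汉", so `rank_words(words, 2)` returns `["林冲", "的"]`. A real design would likely also constrain the base sequence (for example, avoiding long runs of one base), which this sketch does not attempt.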

Description

Technical field

[0001] The invention relates to a DNA storage coding method for optimizing Chinese storage, and belongs to the technical field of DNA storage.

Background technique

[0002] The total amount of global data has reached 30 ZB and will soon exceed the capacity of existing storage media such as hard disks. DNA data storage technology has opened up a new storage mode, and its development plays an important role in reducing the energy used for storage and in advancing big data storage. DNA data storage has become a global research hotspot in recent years. Many research institutions at home and abroad, including Harvard University, Columbia University, Microsoft Research, the University of Washington and the University of Cambridge, have carried out research on DNA storage.

[0003] A unit mass of DNA has about $10^{21}$ bases and can store 455 EB of information, which is 1/4 of the total annual information in the world; a unit volume of DNA can s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H03M7/40
CPC: H03M7/4012
Inventors: 毕昆, 陆祖宏
Owner: SOUTHEAST UNIV