Code word design method for DNA storage

A code word design method for DNA sequences, applied in bioinformatics, instruments, etc., that addresses the problems of high synthesis cost and low coding rate, and achieves a simple structure, an improved coding rate, and reduced cost.

Pending Publication Date: 2022-02-08
DALIAN UNIV

AI Technical Summary

Problems solved by technology

Current DNA encoding methods simply map binary data into DNA sequences, which results in a low encoding rate and a high synthesis cost.



Examples


Embodiment 1

[0040] The embodiments of the present invention are implemented on the basis of its technical solutions, and detailed implementation methods and specific operating procedures are given; however, the protection scope of the present invention is not limited to the following embodiments. In this embodiment, the encoding algorithm is used to encode a 511-byte text file, subject to the constraints described above.

[0041] Step 1: Take the input data and convert it to binary data.

[0042] Specifically, the abs function is first used to convert the characters in the text file into ASCII codes, which are then converted into 8-bit binary numbers. These operations convert the 511-byte text file into 4,088 bits of binary data (511 × 8 = 4088).
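
As a rough illustration of Step 1, the following Python sketch maps each character to its ASCII code and then to an 8-bit binary string. The function name `text_to_bits` and the sample input are hypothetical; Python's built-in `ord` is used here in place of the `abs` call mentioned in the text, and the input is assumed to contain only ASCII characters.

```python
def text_to_bits(text: str) -> str:
    """Map each character to its ASCII code, then to an 8-bit binary string."""
    codes = [ord(ch) for ch in text]  # ord stands in for the abs call in the text
    if any(code > 127 for code in codes):
        # Assumption: the input file is plain ASCII, one byte per character.
        raise ValueError("non-ASCII character in input")
    return "".join(format(code, "08b") for code in codes)


# Hypothetical stand-in for the 511-byte text file of the embodiment.
sample = "A" * 511
bits = text_to_bits(sample)
print(len(bits))  # 4088
```

Each character contributes exactly 8 bits, so a 511-character ASCII input always yields 4,088 bits regardless of its content.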

[0043] Step 2: Compress the binary data.

[0044] Note that the binary data is divided into 16-bit groups, which are then compressed using the minimum variance Huffman tree, where the Huffman tree is constructed ...
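
The construction details are cut off in the text above, so the following Python sketch is only one plausible reading of Step 2. It uses a common way to obtain a minimum-variance Huffman code: when two subtrees have equal weight, the shallower one is merged first. The names `group_bits` and `min_variance_huffman`, the tie-breaking rule, and the handling of a final group shorter than 16 bits (kept as its own symbol) are all assumptions, not details taken from the patent.

```python
import heapq
from collections import Counter
from itertools import count


def group_bits(bits: str, size: int = 16) -> list[str]:
    """Split a bit string into non-overlapping groups (the last may be shorter)."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]


def min_variance_huffman(symbols: list[str]) -> dict[str, str]:
    """Build a Huffman code, breaking weight ties in favor of shallower subtrees."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    order = count()  # unique counter so that trees are never compared directly
    # Heap entries: (weight, subtree height, insertion order, tree).
    heap = [(w, 0, next(order), sym) for sym, w in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, h1, _, t1 = heapq.heappop(heap)
        w2, h2, _, t2 = heapq.heappop(heap)
        merged = (w1 + w2, max(h1, h2) + 1, next(order), (t1, t2))
        heapq.heappush(heap, merged)

    codes: dict[str, str] = {}

    def walk(tree, prefix: str) -> None:
        if isinstance(tree, tuple):  # internal node: (left, right)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:  # leaf: the symbol itself
            codes[tree] = prefix

    walk(heap[0][3], "")
    return codes


# Illustrative input only: three copies of one 16-bit group and one of another.
bits = "0100100001101001" * 3 + "0000000000000001"
groups = group_bits(bits)
codes = min_variance_huffman(groups)
compressed = "".join(codes[g] for g in groups)
```

Sorting on subtree height after weight keeps the tree as balanced as the symbol frequencies allow, which is what reduces the variance of the code word lengths compared with an arbitrary tie-breaking order.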



Abstract

The invention discloses a code word design method for DNA storage that converts stored information into a DNA sequence through the following steps. First, the information is converted into binary data. Second, a minimum variance Huffman tree is constructed and used to compress the binary data. Third, the compressed binary data is partitioned into non-overlapping groups of 4 bits, yielding at most 16 distinct combinations, and code words are selected from the dictionary in order of the combinations' probabilities and mapped to them to obtain a DNA sequence. Finally, the GC content of the DNA sequence is computed; if it is higher than 60% or lower than 40%, the mapping relation is adjusted so that the GC content falls between 40% and 60%. The sequence is then checked for homopolymer runs longer than 3, and any such runs are replaced and modified. The method offers a high coding rate and a simple structure, and the encoded DNA sequence satisfies the constraints that the GC content is between 40% and 60% and the homopolymer run length does not exceed 3.
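
The partitioning and constraint checks described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the code word dictionary and the adjustment and replacement rules are not reproduced because the text does not give them, the bit string length is assumed to be a multiple of 4, and DNA sequences are assumed to be non-empty strings over A, C, G, T.

```python
from collections import Counter
from itertools import groupby


def rank_4bit_groups(bits: str) -> list[str]:
    """Split bits into non-overlapping 4-bit groups, ranked by descending frequency."""
    if len(bits) % 4 != 0:
        # Assumption: padding of a trailing partial group is handled elsewhere.
        raise ValueError("bit string length must be a multiple of 4")
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return [group for group, _ in Counter(groups).most_common()]


def gc_content(seq: str) -> float:
    """Fraction of bases in a non-empty sequence that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(run)) for _, run in groupby(seq))


def satisfies_constraints(seq: str, gc_low: float = 0.40,
                          gc_high: float = 0.60, max_run: int = 3) -> bool:
    """Check the GC-content window and the homopolymer run-length limit."""
    return (gc_low <= gc_content(seq) <= gc_high
            and max_homopolymer_run(seq) <= max_run)


print(rank_4bit_groups("000011110000"))   # ['0000', '1111']
print(satisfies_constraints("ACGTACGT"))  # True
print(satisfies_constraints("AAAACGCG"))  # False: run of four A's
```

A sequence that fails `satisfies_constraints` is the case in which the abstract calls for adjusting the mapping or replacing the offending run.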

Description

Technical field

[0001] The invention relates to the technical field of code design, in particular to a code word design method for DNA storage.

Background technique

[0002] Global demand for data storage is currently outstripping growth in global storage capacity. As the carrier of natural genetic information, DNA provides a stable, resource-efficient, and sustainable data storage solution. It was not until the 2010s that the pioneering work of Church and Goldman brought DNA storage into the mainstream. Church et al. successfully stored up to 659KB of data in DNA molecules, whereas prior to this work the largest amount of stored data was less than 1KB. Goldman et al. stored more data, reaching 739KB. Notably, the data stored in these two studies contained not only text but also images, sounds, PDFs, etc., which confirms that DNA can store multiple data types.

[0003] Specifically, DNA data storage is an emerging research field that converts binary digital information into DNA sequence...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B50/50, G16B50/00
CPC: G16B50/50, G16B50/00
Inventors: 王宾, 郑燕芬, 胡轶男, 张强
Owner: DALIAN UNIV