Method for Biologically Storing and Restoring Data

A data conversion technology applied in the fields of synthetic biology, computer science, and bioinformatics. It addresses reduced conversion efficiency, the increased cost of DNA synthesis and sequencing, and the large gap between achieved and theoretical information storage density, and it has the effect of preventing single-base consecutive repeats.

Active Publication Date: 2021-07-13
TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

First, the binary algorithm adopted by Church et al. leaves considerable room for improvement in information storage density, and it does not solve the high mutation rate introduced by consecutive repeats of a single base.

Second, although Professor Goldman's team applied a ternary algorithm that improves on both problems, the information storage density they obtained, 2.2 PB per gram of single-stranded DNA, is still far from the theoretical value of 445 EB per gram of single-stranded DNA. This is due partly to the limitation of the ternary conversion rule itself and partly to the quadruple-redundancy error correction mechanism, which increases the sequence length to 4 times that of the original sequence and reduces the conversion efficiency to a quarter. The cost of DNA synthesis and sequencing accordingly rises to 4 times as well.

Third, both Church et al. and Goldman et al. only addressed storing data in DNA that is preserved in vitro; they did not give a good solution for the biological adaptability and error correction mechanism required to implant data DNA into organisms.

Finally, David Haughton et al., from the computer field, combined a "quaternary" algorithm with channel coding technology to significantly improve the information storage density, and gave a near-optimal solution that satisfies the requirements of biological adaptability and error correction. Problems remain, however: the last 1 or 2 bits of the 0/1 binary sequence cannot be correctly encoded by the "quaternary" algorithm, and the occurrence of start codons is not prevented during the generation and integration of position information sequences. In addition, David Haughton et al. only described how to convert data into data DNA sequences; they gave no solution for the complete process of biological storage, and the approach was not actually tried and tested.
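The trade-off described above, between bits stored per base and consecutive repeats of a single base, can be illustrated with a minimal sketch. It assumes a naive 2-bits-per-base mapping (00→A, 01→C, 10→G, 11→T). This mapping and the function names are hypothetical choices made for illustration; they are not the encoding of Church, Goldman, Haughton, or the present invention.

```python
# Illustrative only: a naive "quaternary" mapping, NOT the patent's algorithm.
# Assumed mapping (hypothetical): 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BASES = "ACGT"


def naive_quaternary_encode(data: bytes) -> str:
    """Encode each byte as 4 bases, 2 bits per base, most significant bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base in seq."""
    longest = 0
    run = 0
    previous = ""
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


sequence = naive_quaternary_encode(b"\x00A")
print(sequence, longest_homopolymer(sequence))
```

The naive mapping reaches 2 bits per base, but nothing in it prevents a single base from repeating, which is the kind of repeat the passage above associates with a high mutation rate.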

Examples

Embodiment 1

[0203] Embodiment 1 Conversion and restoration of text data

[0204] The following takes text-type data as an example to illustrate the data conversion process and the restoration process of the present invention.

[0205] Data of different types is first preprocessed and converted into a text file "written" in characters from the ASCII table. The converter therefore faces a single text, which can also be understood as one very long character string. The data text is converted into data DNA sequences one string unit at a time. As shown in Figure 2, every 20 characters form a string, which is one conversion unit and is encoded into one single-stranded data DNA sequence. Starting from the first conversion unit (#1) of the data text, each subsequent conversion unit (#2, #3, etc.) is encoded in order, generating multiple single-stranded data DNA sequences.
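The division of a data text into 20-character conversion units can be sketched as follows. This is a minimal illustration, not the converter's actual code: the function name `split_into_units` is hypothetical, and leaving a short final unit unpadded is an assumption, since the excerpt does not say how a trailing partial unit is handled.

```python
def split_into_units(text: str, unit_size: int = 20) -> list[str]:
    """Split text into consecutive conversion units of at most unit_size characters.

    Assumption: a final unit shorter than unit_size is kept as-is (no padding).
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    return [text[i:i + unit_size] for i in range(0, len(text), unit_size)]


text = "Dai Lab, Tsinghua University, Synthetic Yeast, Synthetic Biology"
units = split_into_units(text)

# Number the units from #1, matching the #1, #2, #3 labels used in the text.
for number, unit in enumerate(units, start=1):
    print(f"#{number}: {unit!r}")
```

Each unit would then be handed to the encoder in order. Joining the units back together reproduces the original text, which is the property the restoration step relies on.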

[0206] Generation and restoration of indexDNA sequences

[0207] (1) Algorithm for gener...

Embodiment 2

[0294] Embodiment 2 Algorithm test and results

[0295] With the above algorithm and design as the core, a simple biological converter was written, and its performance was tested.

[0296] (1) Storage of small-scale text data

[0297] The first generation of the converter had no index or correction modules, so it could only convert very short texts. For short texts, omitting the indexDNA and correctionDNA sequence parts shortens the data DNA sequence, which improves efficiency and reduces cost at the application level. Moreover, in the short term, biological storage of short texts is expected to be the more common application. The text "Dai Lab, Tsinghua University, Synthetic Yeast, Synthetic Biology" was used as the test text and converted to the dataDNA sequence shown in Table 6:

[0298] Table 6 Storage test results of small-scale text data

[0299]

[0300]The above dataDNA seque...

Abstract

The present invention relates to a method and a device for converting data into data DNA sequences with good biological implantability and for restoring the DNA sequence library to the original data. It also relates to a software product that implements the method and to a computer-readable storage medium that stores the software product. By constructing a data DNA library, the invention makes it possible to store data in organisms.

Description

Technical field

[0001] The invention belongs to the fields of bioinformatics, synthetic biology, and computer science, and in particular relates to a conversion method capable of converting data into biologically adaptable DNA sequences and restoring the DNA sequence library to the original data.

Background art

[0002] The 21st century is the century of life sciences, as well as the century of information and big data. With the vigorous development of information technology, an important associated issue is how to handle increasingly large volumes of data. According to International Data Corporation, the total amount of data generated worldwide reached about 0.8 ZB (1 ZB = 1.18 × 10^21 B) in 2009. The organization also predicted that by 2020 the total amount of global data would reach 40 ZB. Existing data storage technology exposes its shortcomings of low storage density, high storage energy consumption, and shor...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G16B50/00
CPC: G16B50/00
Inventor: 戴俊彪, 吴庆余, 乃哥麦提·伊加提, 孙凯文, 董俊凯, 秦怡然
Owner: TSINGHUA UNIV