Data biological storage and restoration method

A data storage and restoration technology applied in the fields of synthetic biology, computing, and bioinformatics. It addresses problems such as information storage density falling far short of the theoretical value, increased DNA synthesis and sequencing costs, and the inability to correctly encode the final bits of a binary sequence.

Active Publication Date: 2018-03-13
TSINGHUA UNIV
AI Technical Summary

Problems solved by technology

First, the binary algorithm adopted by Church et al. leaves considerable room for improvement in information storage density, and it does not solve the high mutation rate introduced by consecutive repeats of a single base. Second, although Professor Goldman's team applied a ternary algorithm that improves on both problems, the information storage density they obtained, 2.2 PB per gram of single-stranded DNA, is still far from the theoretical value of 445 EB per gram of single-stranded DNA. This is partly due to the limitation of the ternary conversion rule itself and partly due to the quadruple-redundancy error correction mechanism, which increases the sequence length to 4 times that of the original sequence and reduces the conversion efficiency to a quarter; the cost of DNA synthesis and sequencing increases 4-fold accordingly. Moreover, Church, Goldman, and their colleagues only addressed storing data in DNA that is preserved in vitro; they gave no good solution for the biological adaptability and error correction mechanism required for implanting data DNA into organisms. Finally, David Haughton et al., from the computer field, combined a "quaternary" algorithm with channel coding technology to significantly improve the information storage density, and gave a near-optimal solution that satisfies the requirements of biological adaptability and error correction. Problems remain, however: the last 1 or 2 bits of the 0/1 binary sequence cannot be correctly encoded by the "quaternary" algorithm, and start codons must be prevented from arising when position information sequences are generated and integrated. In addition, David Haughton et al. only gave a scheme for converting data into data DNA sequences; they gave no solution for the complete process of biological storage, and the scheme has not been tested in practice.
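
The figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes decimal unit prefixes (1 PB = 10^15 B, 1 EB = 10^18 B), and the names `density_gap`, `synthesized_length`, and `conversion_efficiency` are hypothetical helpers, not terms from the patent.

```python
# Figures quoted in the text; decimal unit prefixes are an assumption.
PB = 10**15  # bytes per petabyte (assumed decimal)
EB = 10**18  # bytes per exabyte (assumed decimal)

goldman_density = 2.2 * PB      # bytes per gram of single-stranded DNA
theoretical_density = 445 * EB  # bytes per gram of single-stranded DNA

# How far the reported density is from the theoretical value.
density_gap = theoretical_density / goldman_density

REDUNDANCY = 4  # quadruple-redundancy error correction


def synthesized_length(payload_bases: int, redundancy: int = REDUNDANCY) -> int:
    """Bases that must be synthesized and sequenced for a given payload."""
    return payload_bases * redundancy


def conversion_efficiency(redundancy: int = REDUNDANCY) -> float:
    """Fraction of synthesized bases that carry unique payload."""
    return 1 / redundancy


print(f"theoretical / reported density: about {density_gap:,.0f}x")
print(f"bases synthesized for a 1,000-base payload: {synthesized_length(1000)}")
print(f"conversion efficiency: {conversion_efficiency():.2f}")
```

Under these assumptions the reported density is roughly five orders of magnitude below the theoretical value, and every payload base costs four synthesized bases.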

Examples

Embodiment 1

[0203] Embodiment 1 Conversion and restoration of text data

[0204] The following takes text type data as an example to illustrate the data conversion process and restoration process of the present invention.

[0205] Data of different types is first preprocessed and converted into a text file written with characters from the ASCII table. The converter therefore operates on a single data text, which can be regarded as a very long sequence of character strings. The data text is converted into data DNA sequences one string unit at a time. As shown in Figure 2, every 20 characters form a string, which is one conversion unit and is encoded into one single-stranded data DNA sequence. Starting from the first conversion unit (#1) of the data text, each conversion unit (#2, #3, etc.) is encoded in turn to generate multiple single-stranded data DNA sequences.
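
The splitting step described in [0205] can be sketched as follows. This is a minimal illustration, not the converter of the invention: the function name `split_into_units` is hypothetical, and because this excerpt does not say how a final unit shorter than 20 characters is handled, the sketch simply leaves it short.

```python
from typing import List

UNIT_LENGTH = 20  # characters per conversion unit, as stated in [0205]


def split_into_units(text: str, unit_length: int = UNIT_LENGTH) -> List[str]:
    """Split an ASCII data text into consecutive conversion units.

    The last unit may be shorter than unit_length; padding rules are
    not given in the excerpt, so none are applied here (assumption).
    """
    if not text.isascii():
        raise ValueError("data text must contain only ASCII characters")
    return [text[i:i + unit_length] for i in range(0, len(text), unit_length)]


sample_text = "Dai Lab, Tsinghua University, Synthetic Yeast, Synthetic Biology"
units = split_into_units(sample_text)

# Number the units from #1, matching the numbering used in the text.
for number, unit in enumerate(units, start=1):
    print(f"#{number}: {unit!r}")
```

Each unit would then be passed to the encoder in order, so that unit #1 yields the first single-stranded data DNA sequence, unit #2 the second, and so on.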

[0206] Generation and restoration of indexDNA sequences

[0207] (1) Algorithm for gener...

Embodiment 2

[0294] Embodiment 2 Algorithm test and results

[0295] With the above algorithm and design as the core, a simple biological converter was written and its performance was tested.

[0296] (1) Storage of small-scale text data

[0297] The first-generation converter had no index or correction modules, so it could only convert very short texts. For short texts, omitting the indexDNA and correctionDNA sequence parts shortens the data DNA sequence, which improves efficiency and reduces cost at the application level. Moreover, in the short term, biological storage of short texts is likely to be the more common application. The text "Dai Lab, Tsinghua University, Synthetic Yeast, Synthetic Biology" was used as the test text and converted to the dataDNA sequence shown in Table 6:

[0298] Table 6 Storage test results of small-scale text data

[0299]

[0300] The above dataDNA seque...
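
To make the idea of converting a short text to a base sequence and restoring it concrete, here is a deliberately naive sketch. It is not the conversion rule of the invention, which is not reproduced in this excerpt: the 2-bits-per-base mapping (00→A, 01→C, 10→G, 11→T) and the function names `text_to_dna` and `dna_to_text` are assumptions for illustration, and the sketch has no indexDNA or correctionDNA parts and makes no attempt to avoid single-base repeats or start codons.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def text_to_dna(text: str) -> str:
    """Encode ASCII text as bases, 2 bits per base, 4 bases per character."""
    data = text.encode("ascii")
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)  # most significant bit pair first
    )


def dna_to_text(dna: str) -> str:
    """Restore the ASCII text from a sequence produced by text_to_dna."""
    if len(dna) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    restored = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        restored.append(byte)
    return restored.decode("ascii")


test_text = "Dai Lab, Tsinghua University, Synthetic Yeast, Synthetic Biology"
sequence = text_to_dna(test_text)
print(len(sequence), "bases")
print(dna_to_text(sequence) == test_text)
```

A mapping this simple readily produces runs of identical bases and may contain start codons, which are exactly the problems the background section attributes to earlier schemes.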

Abstract

The invention relates to a method and an apparatus for converting data into data DNA sequences with good biological implantability and for restoring a DNA sequence library to the original data. It further relates to a software product for implementing the method and to a computer-readable storage medium storing the software product. By constructing a DNA library, the invention makes it possible to store data in a living organism.

Description

Technical field

[0001] The invention belongs to the fields of bioinformatics, synthetic biology, and computing, and in particular relates to a method for converting data into biologically adaptable DNA sequences and for restoring a DNA sequence library to the original data.

Background

[0002] The 21st century is the century of life sciences, as well as the century of information and big data. With the vigorous development of information technology, an important associated issue is how to deal with increasingly large volumes of data. According to International Data Corporation, the total amount of data generated worldwide reached about 0.8 ZB (1 ZB = 1.18 × 10^21 B) in 2009. The organization also predicts that by 2020 the total amount of global data will reach 40 ZB. Existing data storage technology exposes its shortcomings of low storage density, high storage energy consumption, and shor...

Application Information

IPC(8): G06F19/28
CPC: G16B50/00
Inventor: 戴俊彪, 吴庆余, 乃哥麦提·伊加提, 孙凯文, 董俊凯, 秦怡然
Owner: TSINGHUA UNIV