A DNA data storage hybrid error correction and data recovery method

An error correction and data recovery method for deoxyribonucleic acid (DNA) data storage. It addresses the high cost of synthesizing and reading digital information stored in DNA, the low reliability of data recovery, and the waste of synthesis and sequencing resources, with the aim of ensuring reliable recovery and improving the utilization of sequencing data.

Active Publication Date: 2021-08-13
TIANJIN UNIV
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

The downside of DNA storage is that synthesizing and reading the digital information stored in DNA is expensive, although day-to-day storage of DNA molecules is relatively cheap.
The reading method proposed by Nick Goldman et al. uses Hamming distance as its screening criterion, so it may discard reads that contain only a few insertion or deletion errors; this reduces the number of samples available for data recovery and wastes synthesis and sequencing resources.
In addition, when the amount of data is small, recovering data directly by majority merging reduces the reliability of recovery; and if the reads contain insertion or deletion errors, the merging method cannot work effectively.
That scheme uses a fourfold repetition code, and merging the repeated copies suffers from similar problems.
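
To illustrate why a Hamming-distance filter can throw away nearly correct reads, the following minimal sketch compares a reference with a read that has lost a single base. The sequences and the helper names (`hamming_with_length_penalty`, `edit_distance`) are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
def hamming_with_length_penalty(a, b):
    """Position-by-position mismatches, plus the length difference."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


# Hypothetical example: the read is the reference with one base deleted.
ref = "ACGTACGTACGT"
read = "ACTACGTACGT"  # the G at index 2 is missing

print(hamming_with_length_penalty(ref, read))  # large: every later base is shifted
print(edit_distance(ref, read))                # small: a single deletion
```

A single deletion shifts every downstream base, so a position-wise comparison reports many mismatches even though the read differs from the reference by one edit.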

Embodiment Construction

[0045] In order to make the purpose, technical solution and advantages of the present invention clearer, the implementation manners of the present invention will be further described in detail below.

[0046] Referring to Figure 1, a DNA data storage hybrid error correction and data recovery method comprises the following steps:

[0047] (1) Preprocess the sequencing data: for each pair of paired-end reads, reverse-complement one of the two reads to obtain two overlapping reads in the same orientation;

[0048] (2) Screen the sequencing reads according to the edit distance between the overlapping parts of the paired-end reads and according to whether the check information in the label part is correct;

[0049] (3) Cluster the reads according to the recovered label and file number (the label part having already been processed), and divide the reads in each cluster into two parts: the middle overlapping part and the non-overlapping part;

[0050] (4) If the copy n...
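
Steps (1) to (3) above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the overlap length is assumed to be known in advance (`overlap_len`), the screening threshold `max_dist` is a free parameter, and `parse_label` is a hypothetical callback that returns a `(file_number, label)` key, or `None` when the label check fails. The patent text shown here does not specify any of these details.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Step (1): complement each base, then reverse the read."""
    return seq.translate(COMPLEMENT)[::-1]


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def cluster_reads(pairs, parse_label, overlap_len, max_dist):
    """Steps (2) and (3): screen read pairs, then group them by label key.

    Each cluster entry is split into the non-overlapping parts of the two
    reads and the two copies of the middle overlapping part.
    """
    clusters = defaultdict(list)
    for read1, read2 in pairs:
        r2 = reverse_complement(read2)                 # same orientation as read1
        ov1, ov2 = read1[-overlap_len:], r2[:overlap_len]
        if edit_distance(ov1, ov2) > max_dist:         # overlap disagrees too much
            continue
        key = parse_label(read1)                       # hypothetical label check
        if key is None:
            continue
        clusters[key].append({
            "non_overlap": (read1[:-overlap_len], r2[overlap_len:]),
            "overlap": (ov1, ov2),
        })
    return dict(clusters)
```

For instance, with a hypothetical 16-base fragment read 12 bases from each end, `overlap_len` would be 8, and `parse_label` could be as simple as `lambda r: ("file0", r[:2])` for a toy test.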

Abstract

The invention discloses a DNA data storage hybrid error correction and data recovery method, which includes: screening the sequencing reads according to the edit distance between the overlapping parts of the paired-end reads and according to whether the check information in the label part is correct; clustering the reads according to the recovered label and file number, and dividing the reads in each cluster into two parts, the middle overlapping part and the non-overlapping part; if the number of copies of the middle overlapping part or the non-overlapping part is greater than a set threshold, determining the central sequence by a clustering method, and otherwise determining it by multiple-sequence merging; dividing the bases corresponding to the data part of each read into segments of a preset length, and correcting each segment jointly with the corresponding preceding and following segments according to the parity of its column number; the error correction uses multiple-sequence merging and finally yields a reliable recovery of the repetition-coded fragments of the preset length. The invention mainly addresses insertion and deletion errors in sequencing reads and the merging of reads at low sequencing coverage.
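
The copy-count rule in the abstract can be sketched as below. The abstract does not say which clustering method finds the central sequence or how the merging is carried out, so this sketch substitutes two simple stand-ins that are assumptions of this example: a medoid (the read with the smallest total edit distance to the others) for the clustering branch, and a column-wise majority vote over the reads of the most common length for the merging branch. The function names are hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def medoid(reads):
    """Stand-in for the clustering branch: the read closest to all others."""
    return min(reads, key=lambda r: sum(edit_distance(r, o) for o in reads))


def majority_merge(reads):
    """Stand-in for the merging branch: per-column majority vote.

    Only reads of the most common length are used, because a plain
    column vote cannot handle insertions or deletions.
    """
    length = Counter(len(r) for r in reads).most_common(1)[0][0]
    kept = [r for r in reads if len(r) == length]
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*kept))


def central_sequence(reads, threshold):
    """Choose the branch by copy count, as the abstract describes."""
    if len(reads) > threshold:
        return medoid(reads)
    return majority_merge(reads)
```

With many copies, the medoid tolerates reads of different lengths; with few copies, the vote still recovers the right base in a column as long as most of the equal-length reads agree there.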

Description

Technical field

[0001] The invention relates to the field of data storage using deoxyribonucleic acid (DNA), in particular to a DNA data storage hybrid error correction and data recovery method.

Background

[0002] Deoxyribonucleic acid (DNA) is a double-stranded structure composed of deoxyribose sugar and four nitrogenous bases (adenine A, thymine T, cytosine C and guanine G). It is the carrier of genetic information for all life, controlling the development and continuation of life and the operation of life functions, and is the most important natural information storage medium. With the development of biotechnology, especially DNA synthesis and sequencing technology, it has become technically feasible to use DNA sequences as a digital data storage medium. DNA digital information storage refers to storing digital information in the base sequence of DNA, using different bases or base combinations to represent data. This ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F11/10, G06F16/28
CPC: G06F11/10, G06F16/285
Inventor: 陈为刚, 黄刚, 韩昌彩, 杨晋生
Owner: TIANJIN UNIV