DNA data storage mixed error correction and data recovery method

An error correction and data recovery method for deoxyribonucleic acid (DNA) data storage. It addresses the high cost of storing digital information in DNA, the waste of synthesis and sequencing resources, and the reduced reliability of data recovery, improving the utilization of sequencing data while keeping recovery highly reliable.

Active Publication Date: 2019-11-12
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

The drawback of DNA storage is that synthesizing and reading the digital information stored in DNA is expensive, although the day-to-day storage of DNA molecules is relatively cheap.
The reading method proposed by Nick Goldman et al. uses Hamming distance as the screening criterion during data screening, which may discard reads that contain only a small number of insertion or deletion errors, reducing the number of samples available for data recovery and wast...

Examples

Embodiment Construction

[0045] To make the purpose, technical solution, and advantages of the present invention clearer, embodiments of the invention are described in further detail below.

[0046] Referring to Figure 1, a hybrid error correction and data recovery method for DNA data storage comprises the following steps:

[0047] (1) Preprocess the sequencing data. Specifically, for paired-end reads, reverse-complement one read of each pair to obtain two overlapping reads.

[0048] (2) Screen the sequencing reads according to two criteria: the edit distance between the overlapping parts of the paired-end reads, and whether the verification information in the label part is correct.

[0049] (3) Cluster the reads according to the recovered label and file number. Because the label part has already been processed, divide the reads in each cluster into two parts: the middle overlapping part and the non-overlapping part.

[0050] (4) If the copy n...
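
Steps (1) to (3) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `Read` record, and the `max_distance` threshold are all hypothetical, and the overlapping parts of the two reads are assumed to have been extracted already. The check of the label's verification information is not shown, because this excerpt does not specify how it is computed.

```python
from collections import defaultdict
from typing import NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Step (1): reverse-complement one read of a paired-end pair."""
    return seq.translate(_COMPLEMENT)[::-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: counts substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def passes_overlap_screen(overlap_1: str, overlap_2: str,
                          max_distance: int = 2) -> bool:
    """Step (2), first criterion only. max_distance is an assumed value."""
    return edit_distance(overlap_1, overlap_2) <= max_distance


class Read(NamedTuple):
    """Hypothetical record for a read whose label has been decoded."""
    file_id: int
    label: int
    bases: str


def cluster_reads(reads):
    """Step (3): group read sequences by (file number, label)."""
    clusters = defaultdict(list)
    for read in reads:
        clusters[(read.file_id, read.label)].append(read.bases)
    return dict(clusters)


# Hypothetical 10-base fragment AACCGGTTAC sequenced from both ends.
read_1 = "AACCGGT"
read_2_rc = reverse_complement("GTAACCG")  # now on the same strand as read_1
print(read_1[3:], read_2_rc[:4])           # the two copies of the overlap
```

A single deleted base gives an edit distance of 1 (for example, `ACGTACGT` versus `ACTACGT`), whereas a position-by-position comparison would count a mismatch at most positions after the deletion. That is why an edit-distance screen can retain reads that a Hamming-distance screen would discard.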

Abstract

The invention discloses a hybrid error correction and data recovery method for DNA data storage. The method comprises the following steps: screening sequencing reads according to two criteria, namely the edit distance between the overlapping parts of paired-end reads and whether the verification information in the label part is correct; clustering the reads according to the recovered label and file number, and dividing the reads in each cluster into a middle overlapping part and a non-overlapping part; if the number of copies of the middle overlapping part or the non-overlapping part is greater than a set threshold, determining a central sequence by a clustering method, and otherwise determining it by multi-sequence combination; dividing the bases of the data part of each read into several segments of a preset length, and jointly correcting each segment according to the parity of its column index together with the corresponding preceding and following segments; and, during error correction, using multi-sequence combination to reliably recover the segments whose repetition-code length is the preset length. The method mainly addresses insertion and deletion errors in sequencing reads and the combination of reads under low sequencing coverage.
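
The abstract does not define "multi-sequence combination" in detail. One simple reading, assumed here purely for illustration, is a per-position majority vote over copies of the same segment that already have equal length. The function name `majority_consensus` is hypothetical, and the sketch only handles substitution errors; copies with insertions or deletions would need to be aligned first.

```python
from collections import Counter


def majority_consensus(copies):
    """Per-position majority vote over equal-length copies (assumed scheme)."""
    if not copies:
        raise ValueError("at least one copy is required")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("all copies must have the same length")
    # zip(*copies) yields one column of bases per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


# Three hypothetical copies of one segment, each with at most one substitution.
print(majority_consensus(["ACGT", "ACGA", "TCGT"]))
```

With three copies and at most one substitution per position, the vote restores the original base at every position. With an even number of copies a tie is possible, so a real decoder would need an explicit tie-breaking rule.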

Description

Technical field

[0001] The invention relates to the field of data storage using deoxyribonucleic acid (DNA), and in particular to a hybrid error correction and data recovery method for DNA data storage.

Background

[0002] Deoxyribonucleic acid (DNA) is a double-stranded molecule built from deoxyribose sugar, phosphate groups, and four nitrogenous bases (adenine A, thymine T, cytosine C, and guanine G). It is the genetic information carrier of all life, controlling the development and continuation of life and the operation of life functions, and is the most important natural information storage carrier. With the development of biotechnology, especially DNA synthesis and sequencing technology, it has become technically feasible to use DNA sequences as a digital data storage medium. DNA digital information storage refers to storing digital information in the base sequence of DNA, using different bases or base combinations to represent data. This ...

Application Information

IPC(8): G06F11/10, G06F16/28
CPC: G06F11/10, G06F16/285
Inventors: 陈为刚, 黄刚, 韩昌彩, 杨晋生
Owner: TIANJIN UNIV