A DNA data storage decoding method and apparatus

By employing an improved normalized minimum sum algorithm and multiple iterative decoding processes, the high error rate in DNA data storage decoding technology was resolved, thereby enhancing decoding performance.

CN116192160B · Active · Publication Date: 2026-06-19 · GUANGDONG UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
GUANGDONG UNIV OF TECH
Filing Date
2023-03-13
Publication Date
2026-06-19

Figure CN116192160B_ABST

Abstract

This application discloses a DNA data storage decoding method and apparatus. The technical solution provided by this application adopts an improved normalized minimum sum decoding logic in the variable node update stage. After completing the first round of decoding and determining that the decoding is unsuccessful, the useful information obtained from the first round of decoding is used to update the initial LLR value. Then, the updated initial LLR value is used to perform a new round of decoding, thereby reducing the bit error rate and improving the overall bit error rate performance. This solves the technical problem of high bit error rate in existing DNA data storage decoding technologies.

Description

Technical Field

[0001] This application relates to the field of information technology, and in particular to a DNA data storage decoding method and apparatus.

Background

[0002] With the continuous upgrading of information technology and internet applications, how to store the massive amounts of data generated has become an unavoidable problem. To alleviate the growing gap between the explosive growth of data production and current storage capacity, the demand for new storage media is becoming increasingly urgent. Considering its ultra-high density storage capacity and stability, deoxyribonucleic acid (DNA) is considered a suitable storage medium for long-term data storage.

[0003] DNA storage technology uses artificially synthesized DNA to store data such as text documents, images, and audio files, which can later be fully retrieved. It offers advantages such as high efficiency, large storage capacity, long storage time, easy access, and maintenance-free operation. DNA consists of four nitrogenous bases: adenine (A), thymine (T), cytosine (C), and guanine (G). These four bases combine in different arrangements to form oligonucleotides (Oligs), i.e., DNA strands. Through complementary base pairing between A and T and between C and G, a double-helix DNA molecule can be formed, with each base representing a certain number of bits of data.

[0004] Current data storage generally employs error-correcting coding to protect data. Redundancy is added to the information sequence, and the information sequence together with the redundancy forms a codeword; to recover the information sequence, the codeword is decoded. However, existing initial LLR values calculated directly from the received symbols are inaccurate, and feeding these LLRs directly into the decoder can easily degrade decoding performance over the DNA channel, leading to a high bit error rate.

Summary of the Invention

[0005] This application provides a DNA data storage and decoding method and apparatus to solve the technical problem of high error rate in existing DNA data storage and decoding technologies.

[0006] To address the aforementioned technical problems, the first aspect of this application provides a DNA data storage and decoding method, comprising:

[0007] Obtain the DNA sequence and calculate two initial LLR values for each bit pair in the DNA sequence;

[0008] The variable nodes and check nodes of the DNA sequence are updated using a node update equation based on an improved normalized minimum sum algorithm.

[0009] Based on the updated variable nodes, check nodes, and the initial LLR value, the DNA sequence is subjected to variable node decision processing to obtain the codeword data for this decoding cycle.

[0010] The variable node update process, check node update process, and variable node decision process are executed in a loop. When the preset loop termination condition is met, the iterative decoding process is terminated, and the latest codeword data is used as the decoding output.

[0011] According to the preset decoding result judgment conditions, the decoding output result is verified. If the verification indicates that decoding was unsuccessful, the initial LLR value is updated, and decoding is performed again based on the updated initial LLR value to obtain a new decoding output result.

[0012] Preferably, the node update equation is as follows:

[0013]

[0014]

[0015] In the formulas, N(j) is the set of all check nodes connected to variable node j, and Ch_j is the initial LLR value of variable node j. The message terms are the external information passed from variable node j to check node i in the iter-th iteration, the external information passed from check node i to variable node j in the iter-th iteration, and the external information passed from variable node j to check node i in the (iter-1)-th iteration; δ is the compensation factor. N(i) is the set of all variable nodes connected to check node i, and j' denotes each variable node in N(i) other than variable node j.

[0016] Preferably, the step of performing variable node decision processing on the DNA sequence based on the updated variable node, check node, and the initial LLR value to obtain the codeword data for this decoding cycle specifically includes:

[0017] Based on the updated variable node, check node, and the initial LLR value, the decision value of the variable node is obtained from the decision value calculation formula. The decision value is then compared with a preset decision threshold to determine the codeword data for the current decoding cycle.

[0018] Preferably, the formula for calculating the decision value is as follows:

[0019]

[0020] In the formula, the left-hand side is the decision value of variable node j, N(j) is the set of all check nodes connected to variable node j, Ch_j is the initial LLR value of variable node j, and the remaining term represents the total external information of variable node j.

[0021] Preferably, updating the initial LLR value if the verification result of the decoding output is that the decoding was unsuccessful specifically includes:

[0022] If the decoding output result is unsuccessful, the absolute value of the total external information of each bit pair in the current DNA sequence is compared with the bit flip threshold of the bit pair. If the absolute value of the total external information is less than the bit flip threshold, the corresponding bit of the bit pair is flipped; otherwise, the bit pair remains unchanged. The bit flip threshold is calculated by multiplying the initial LLR absolute value of the bit pair by the number of check equations associated with that bit.

[0023] Meanwhile, a second aspect of this application provides a DNA data storage decoding device, comprising:

[0024] A DNA sequence preprocessing unit is used to acquire a DNA sequence and calculate two initial LLR values for each bit pair in the DNA sequence.

[0025] The node processing unit is used to update the variable nodes and check nodes of the DNA sequence using a node update equation based on an improved normalized minimum sum algorithm.

[0026] The codeword data acquisition unit is used to perform variable node decision processing on the DNA sequence based on the updated variable node, check node and the initial LLR value to obtain the codeword data for the current decoding cycle.

[0027] The loop control unit is used to perform variable node update processing, check node update processing, and variable node decision processing in a loop. When the preset loop termination condition is met, the iterative decoding process is terminated, and the latest codeword data is used as the decoding output result.

[0028] The decoding result verification unit is used to verify the decoding output result according to the preset decoding result judgment conditions. If the verification indicates that decoding was unsuccessful, the initial LLR value is updated, and decoding is performed again based on the updated initial LLR value to obtain a new decoding output result.

[0029] Preferably, the node update equation is as follows:

[0030]

[0031]

[0032] In the formulas, N(j) is the set of all check nodes connected to variable node j, and Ch_j is the initial LLR value of variable node j. The message terms are the external information passed from variable node j to check node i in the iter-th iteration, the external information passed from check node i to variable node j in the iter-th iteration, and the external information passed from variable node j to check node i in the (iter-1)-th iteration; δ is the compensation factor. N(i) is the set of all variable nodes connected to check node i, and j' denotes each variable node in N(i) other than variable node j.

[0033] Preferably, the codeword data acquisition unit is specifically used for:

[0034] Based on the updated variable node, check node, and the initial LLR value, the decision value of the variable node is obtained from the decision value calculation formula. The decision value is then compared with a preset decision threshold to determine the codeword data for the current decoding cycle.

[0035] Preferably, the formula for calculating the decision value is as follows:

[0036]

[0037] In the formula, the left-hand side is the decision value of variable node j, N(j) is the set of all check nodes connected to variable node j, Ch_j is the initial LLR value of variable node j, and the remaining term represents the total external information of variable node j.

[0038] Preferably, updating the initial LLR value if the verification result of the decoding output is that the decoding was unsuccessful specifically includes:

[0039] If the decoding output result is unsuccessful, the absolute value of the total external information of each bit pair in the current DNA sequence is compared with the bit flip threshold of the bit pair. If the absolute value of the total external information is less than the bit flip threshold, the corresponding bit of the bit pair is flipped; otherwise, the bit pair remains unchanged. The bit flip threshold is calculated by multiplying the initial LLR absolute value of the bit pair by the number of check equations associated with that bit.

[0040] As can be seen from the above technical solutions, the embodiments of this application have the following advantages:

[0041] The technical solution provided in this application adopts an improved normalized minimum sum decoding logic in the variable node update stage. After completing the first round of decoding and determining that the decoding was unsuccessful, the useful information obtained from the first round of decoding is used to update the initial LLR value, and then this LLR value is used for a new round of decoding, thereby reducing the bit error rate and improving the overall bit error rate performance. This solves the technical problem of high bit error rate in existing DNA data storage decoding technology.

Description of the Drawings

[0042] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0043] Figure 1 is a flowchart of a DNA data storage decoding method provided in this application.

[0044] Figure 2 is a schematic diagram of a DNA data storage decoding device provided in this application.

[0045] Figure 3 is a schematic diagram of the error probability model for DNA decoding.

[0046] Figure 4 is a comparison chart of the LDPC code simulation results at a code rate of 0.5.

[0047] Figure 5 is a comparison chart of the LDPC code simulation results at a code rate of 0.9.

Detailed Description

[0048] This application provides a DNA data storage and decoding method, apparatus, terminal, and medium to solve the technical problem of high error rate in existing DNA data storage and decoding technologies.

[0049] To make the inventive objectives, features, and advantages of this application more apparent and understandable, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the embodiments described below are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0050] DNA is composed of four nitrogenous bases (nucleotides, nt): adenine (A), thymine (T), cytosine (C), and guanine (G). These four bases combine in different arrangements to form oligonucleotides (Oligs), which are DNA strands. Through complementary base pairing between A and T and between C and G, a double-helix DNA molecule can be formed, with each base representing a certain number of bits of data.

[0051] First, the data to be stored is input in binary sequence form. After source compression coding and channel error correction coding, the binary sequence is converted into a base sequence composed of A, G, C, and T using a certain mapping rule. Finally, the mapped base sequence is synthesized into oligonucleotides using biochemical methods and randomly and independently stored in specialized storage containers in solution or dry powder form. Current DNA synthesis technology can efficiently synthesize an oligonucleotide of 200 to 1000 bases in length; therefore, the written base sequence needs to be divided into short sequences of equal length before synthesis. (Each base sequence to be synthesized mainly includes primers, addresses, information data, and check bits. The addresses are used to locate, splice, and search the segmented short sequences; while the primers are located at both ends of the sequence, serving as the connection points between the short sequences.)

[0052] During data reading, the DNA strands in the storage pool must first be amplified using polymerase chain reaction (PCR) technology to copy the data. PCR is a process that exponentially increases the number of copies of selected oligonucleotides. Then, the amplified oligonucleotides are sequenced using a sequencing instrument to read their base arrangement. (A typical sequencing method utilizes the different colors emitted by fluorescent nucleotides; by detecting the color, the DNA sequence represented by the oligonucleotide can be read.) Subsequently, the sequence is reverse-mapped into a binary sequence, followed by error correction decoding and decompression to finally obtain the read information sequence.

[0053] First, an embodiment of the DNA data storage and decoding method provided in this application is described, as follows:

[0054] Referring to Figure 1, this embodiment provides a DNA data storage decoding method, including:

[0055] Step 101: Obtain the DNA sequence and calculate the two initial LLR values for each bit pair in the DNA sequence.

[0056] It should be noted that the DNA sequence decoding process provided in this embodiment mainly includes: LLR calculation, node update, and decision.

[0057] The first stage is the LLR calculation stage. In DNA data storage, information sequences are mapped to base sequences; the main mapping modes are binary, ternary, quaternary, and mixtures of these.

[0058] Since DNA sequences use four bases—A, T, C, and G—to store information, it is a natural quaternary sequence. Furthermore, the quaternary model has the highest theoretical storage limit for bases (2 bits / nt), making it a widely used mapping method. In this embodiment, the quaternary mapping method is preferred, and the mapping rules are set as follows: T→10, C→01, A→00, G→11.
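The mapping rules above mean that each base carries one high-order and one low-order bit, so a base sequence can be split into two bit sequences and rebuilt from them. The following is a minimal Python sketch of that idea, assuming bits are held as lists of 0/1 integers; the names `split_bits` and `merge_bits` are hypothetical and not taken from the application.

```python
# Quaternary mapping rules from [0058]: T->10, C->01, A->00, G->11
# (first bit = high-order bit, second bit = low-order bit).
BASE_TO_BITS = {"T": (1, 0), "C": (0, 1), "A": (0, 0), "G": (1, 1)}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def split_bits(seq):
    """Split a base sequence into its high-order and low-order bit sequences."""
    msb = [BASE_TO_BITS[base][0] for base in seq]
    lsb = [BASE_TO_BITS[base][1] for base in seq]
    return msb, lsb


def merge_bits(msb, lsb):
    """Map a (high-order, low-order) bit-sequence pair back to a base sequence."""
    return "".join(BITS_TO_BASE[(h, l)] for h, l in zip(msb, lsb))


msb, lsb = split_bits("TCAG")
print(msb, lsb)              # [1, 0, 0, 1] [0, 1, 0, 1]
print(merge_bits(msb, lsb))  # TCAG
```

The same pair of functions is what the later LLR update needs when the two decoded bit sequences are mapped back to a new symbol sequence.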

[0059] According to the mapping rules above, each DNA sequence carries two bit sequences. In this embodiment, the acquired DNA sequence is divided into a high-order bit sequence and a low-order bit sequence, and the initial log-likelihood ratio (LLR) of each bit is calculated before decoding. Let x_j be the symbol at the j-th position when the DNA sequence first enters the storage pool, with x_j^msb its high-order bit and x_j^lsb its low-order bit, and let y_j be the symbol at the j-th position when the DNA sequence is read out, with y_j^msb its high-order bit and y_j^lsb its low-order bit. The initial LLRs Ch_j^msb and Ch_j^lsb of the high-order and low-order bits are then calculated as follows:

[0060] $$Ch_j^{msb} = \ln \frac{P(x_j^{msb} = 0 \mid y_j)}{P(x_j^{msb} = 1 \mid y_j)}$$

[0061] $$Ch_j^{lsb} = \ln \frac{P(x_j^{lsb} = 0 \mid y_j)}{P(x_j^{lsb} = 1 \mid y_j)}$$

[0062] Here P(x_j^msb = 0 | y_j) is the probability that the high-order input bit is 0 given that the symbol read out is y_j; likewise, P(x_j^msb = 1 | y_j), P(x_j^lsb = 0 | y_j), and P(x_j^lsb = 1 | y_j) are the probabilities that the high-order input bit is 1, the low-order input bit is 0, and the low-order input bit is 1, given the read symbol y_j. These probabilities can be obtained from the error probability model shown in Figure 3. For example, if the symbol read out after the nanopore sequencing channel is y_j = C, then by Bayes' theorem the initial LLRs can be expressed as:

[0063]

[0064]
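The initial LLR calculation of [0059] to [0064] can be sketched as follows, assuming all four written symbols are equally likely, so that by Bayes' theorem P(x_j | y_j) is proportional to P(y_j | x_j) and each LLR reduces to a ratio of summed channel transition probabilities. The transition table `P_Y_GIVEN_X` holds hypothetical placeholder values (chosen only so that T and C are most often confused and A and G almost never are, as the text describes), not the values of Figure 3; `initial_llrs` is an illustrative name.

```python
import math

# Quaternary mapping from [0058]: base -> (high-order bit, low-order bit).
BASE_TO_BITS = {"T": (1, 0), "C": (0, 1), "A": (0, 0), "G": (1, 1)}

# HYPOTHETICAL channel model P(read y | written x); placeholder numbers only,
# NOT taken from Figure 3. Each row (written base x) sums to 1.
P_Y_GIVEN_X = {
    "A": {"A": 0.98, "T": 0.01, "C": 0.009, "G": 0.001},
    "T": {"A": 0.005, "T": 0.95, "C": 0.04, "G": 0.005},
    "C": {"A": 0.005, "T": 0.04, "C": 0.95, "G": 0.005},
    "G": {"A": 0.001, "T": 0.009, "C": 0.01, "G": 0.98},
}


def initial_llrs(y, p_y_given_x=P_Y_GIVEN_X):
    """Return (Ch_msb, Ch_lsb) = ln P(bit=0 | y) / P(bit=1 | y) for read symbol y.

    Assumes uniform priors on the written symbol, so the posterior ratio
    equals the ratio of summed likelihoods P(y | x).
    """
    llrs = []
    for k in (0, 1):  # k = 0: high-order bit, k = 1: low-order bit
        p0 = sum(p_y_given_x[x][y] for x, bits in BASE_TO_BITS.items() if bits[k] == 0)
        p1 = sum(p_y_given_x[x][y] for x, bits in BASE_TO_BITS.items() if bits[k] == 1)
        llrs.append(math.log(p0 / p1))
    return llrs[0], llrs[1]


ch_msb, ch_lsb = initial_llrs("C")  # C maps to 01, so ch_msb > 0 and ch_lsb < 0
```

Because the channel model is fixed, calling `initial_llrs` once for each of A, T, C, and G yields the eight fixed values (ACh_msb through GCh_lsb) that the LLR update step relies on.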

[0065] Step 102: Use the node update equation based on the improved normalized minimum sum algorithm to update the variable nodes and check nodes of the DNA sequence.

[0066] Following step 101, the next step is node update, which includes updating variable nodes and check nodes. This embodiment uses a node update equation based on an improved normalized minimum sum algorithm to update the variable nodes and check nodes of the DNA sequence. The node update equation in this embodiment is as follows:

[0067]

[0068]

[0069] In the formulas, N(j) is the set of all check nodes connected to variable node j, and Ch_j is the initial LLR value of variable node j. The message terms are the external information passed from variable node j to check node i in the iter-th iteration, the external information passed from check node i to variable node j in the iter-th iteration, and the external information passed from variable node j to check node i in the (iter-1)-th iteration; δ is the compensation factor. N(i) is the set of all variable nodes connected to check node i, and j' denotes each variable node in N(i) other than variable node j.
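The equation images for the node update are not reproduced in this text. For orientation only, the textbook normalized min-sum updates written with the symbols defined above take the form below, where $L_{j\to i}^{(iter)}$ and $L_{i\to j}^{(iter)}$ are notation assumed here for the variable-to-check and check-to-variable messages. The application's improved variant may differ from this standard form.

```latex
\begin{aligned}
L_{i \to j}^{(iter)} &= \delta \cdot \prod_{j' \in N(i) \setminus \{j\}} \operatorname{sign}\!\left(L_{j' \to i}^{(iter-1)}\right) \cdot \min_{j' \in N(i) \setminus \{j\}} \left| L_{j' \to i}^{(iter-1)} \right| \\
L_{j \to i}^{(iter)} &= Ch_j + \sum_{i' \in N(j) \setminus \{i\}} L_{i' \to j}^{(iter)}
\end{aligned}
```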

[0070] It should be noted that the node update stage in this embodiment employs a Normalized Min-Sum (NMS) decoding algorithm, which approximates the check node computation by a minimum value and thereby reduces computational complexity. Because this approximation yields a result larger than the true value, which would cause performance loss and a higher bit error rate, this embodiment provides an improved NMS decoding algorithm that introduces a compensation factor δ to compensate for the loss caused by the approximation, achieving a balance between decoding performance and computational complexity.
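A plain NMS message-passing step can be sketched with NumPy as below. This is a generic flooding-schedule sketch, not necessarily the application's improved variant: it assumes a binary parity-check matrix `H` of shape (m, n), messages stored in (m, n) arrays that are zero off the Tanner-graph edges, and every check node having degree at least 2. The function names are hypothetical.

```python
import numpy as np


def check_node_update(H, v2c, delta=0.8):
    """NMS check-node update: delta * (product of signs) * (minimum magnitude)
    over the messages from all variable nodes j' in N(i) except j."""
    H = np.asarray(H, dtype=bool)
    v2c = np.asarray(v2c, dtype=float)
    c2v = np.zeros(H.shape)
    for i in range(H.shape[0]):
        idx = np.flatnonzero(H[i])            # N(i): variable nodes on check i
        msgs = v2c[i, idx]
        for k, j in enumerate(idx):
            others = np.delete(msgs, k)       # messages from N(i) \ {j}
            c2v[i, j] = delta * np.prod(np.sign(others)) * np.min(np.abs(others))
    return c2v


def variable_node_update(H, ch, c2v):
    """Variable-node update: Ch_j plus messages from all checks in N(j) except i."""
    H = np.asarray(H, dtype=bool)
    total = np.asarray(ch, dtype=float) + c2v.sum(axis=0)   # Ch_j + sum over N(j)
    return np.where(H, total[None, :] - c2v, 0.0)           # drop check i's own message
```

Decoding would start from `v2c = np.where(H, ch, 0.0)` (every edge carries Ch_j) and alternate the two updates; δ = 0.8 mirrors the value used in the simulation described later.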

[0071] Step 103: Based on the updated variable nodes, check nodes, and initial LLR values, perform variable node decision processing on the DNA sequence to obtain the codeword data for this decoding cycle.

[0072] Step 104: Execute variable node update processing, check node update processing, and variable node decision processing in a loop. When the preset loop termination condition is met, terminate the iterative decoding process and use the latest codeword data as the decoding output.

[0073] It should be noted that the formula for calculating the decision value of each variable node after iteration is as follows:

[0074]

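[0075] In the formula, the term added to the initial LLR value Ch_j represents the total external information of variable node j. Decision rule: if the decision value of variable node j is less than 0, the corresponding bit is decided as 1; otherwise, it is decided as 0.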

[0076] The decoding process repeats the node updates and the decision, i.e., steps 102 and 103, until a preset loop termination condition is met. Decoding then terminates, and the latest codeword data is used as the decoding output.

[0077] More specifically, in this embodiment the loop terminates when either of the following holds: the product of the parity-check matrix and the codeword obtained in the latest decoding iteration is an all-zero vector (that is, every parity-check equation is satisfied), or the number of iterations reaches its maximum value.
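The decision rule and termination condition can be sketched as follows, assuming the decision threshold is 0 (consistent with an LLR of the form ln P(0)/P(1), where a negative value favours bit 1) and that check-to-variable messages are stored in an (m, n) array that is zero off the Tanner-graph edges. `decide` and `should_stop` are hypothetical helper names.

```python
import numpy as np


def decide(ch, c2v):
    """Decision value = Ch_j + total external information; bit is 1 if it is negative."""
    ext = np.asarray(c2v, dtype=float).sum(axis=0)   # total external information per bit
    post = np.asarray(ch, dtype=float) + ext
    return (post < 0).astype(int), ext


def should_stop(H, bits, iteration, max_iter=50):
    """Stop when H * c = 0 (mod 2) or when the iteration limit is reached."""
    syndrome = (np.asarray(H, dtype=int) @ np.asarray(bits, dtype=int)) % 2
    return (not syndrome.any()) or iteration >= max_iter
```

Returning the total external information alongside the bits is a design choice made here because the second decoding round compares its absolute value against a flipping threshold.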

[0078] Step 105: According to the preset decoding result judgment conditions, the decoding output result is verified. If the verification indicates that decoding was unsuccessful, the initial LLR value is updated, and decoding is performed again based on the updated initial LLR value to obtain a new decoding output result.

[0079] After the decoding result is output in step 104, the decoding output result can be verified. If it is determined that the previous decoding result was unsuccessful, the initial LLR value is updated, and the second round of decoding is performed based on the updated initial LLR value to obtain a new decoding output result.

[0080] The implementation of updating the initial LLR value when the previous decoding result is determined to be unsuccessful, as mentioned in this embodiment, can be seen in the following example:

[0081] After the first round of decoding, two new bit sequences are obtained. If the check of the decoding output indicates that decoding was unsuccessful, the initial LLR value of each bit position for the second round of iterative decoding is first set equal to the initial LLR value used in the first round of iterative decoding; this copy serves as temporary intermediate data during the subsequent update of the initial LLR values.

[0082] The absolute value of the total external information of each bit in the current DNA sequence (that is, each bit of each bit pair) is compared with the bit-flipping threshold of that bit. If the absolute value is less than the threshold, the bit is flipped (from 1 to 0, or from 0 to 1); otherwise, it remains unchanged. The bit-flipping threshold is the absolute value of the bit's initial LLR in the first round of decoding multiplied by the number of check equations associated with that bit; that is, the bit is flipped when the absolute value of its total external information is less than |Ch_j| × |N(j)|, where |N(j)| is the number of check nodes connected to variable node j. Two new bit sequences are obtained in this way.

[0083] The reason is that both the external information and the initial LLR value express, in logarithmic form, the ratio of the probability that the bit is 0 to the probability that it is 1. This embodiment initially treats them as equivalent; therefore, when the total external information provided by all check equations falls below this threshold, the external information is considered unreliable and the bit is flipped. The two new bit sequences are then mapped to a new symbol sequence, and the initial LLR values are updated by combining this sequence with the channel transition probabilities.
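The flipping rule of [0082] can be sketched for one bit sequence as below. This assumes the number of check equations associated with bit j is the column weight of a parity-check matrix `H` for that sequence, and that the high-order and low-order sequences would each be processed with their own call; `flip_unreliable_bits` is a hypothetical name.

```python
import numpy as np


def flip_unreliable_bits(bits, ext, ch, H):
    """Flip bit j when |total external info| < |Ch_j| * |N(j)|.

    bits: hard decisions from the first round
    ext:  total external information of each bit
    ch:   first-round initial LLRs Ch_j
    H:    parity-check matrix; column weight = number of checks on bit j
    """
    bits = np.asarray(bits, dtype=int)
    degree = np.asarray(H, dtype=int).sum(axis=0)                # |N(j)|
    threshold = np.abs(np.asarray(ch, dtype=float)) * degree
    flip = np.abs(np.asarray(ext, dtype=float)) < threshold      # unreliable bits
    return np.where(flip, 1 - bits, bits)
```

For example, with column weights [1, 2, 1], unit initial LLRs, and total external information [0.5, 3.0, 2.0], only the first bit falls below its threshold and is flipped.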

[0084] As can be seen from Figure 3, under the same channel conditions the transition probabilities between symbols are fixed. Therefore, for the four symbols A, T, C, and G, the initial LLRs of their high-order and low-order bits are fixed, giving eight possible values in total. For ease of explanation, this embodiment denotes the initial LLRs of the high-order and low-order bits corresponding to symbol A as ACh_msb and ACh_lsb, those corresponding to symbol T as TCh_msb and TCh_lsb, those corresponding to symbol C as CCh_msb and CCh_lsb, and those corresponding to symbol G as GCh_msb and GCh_lsb.

[0085] After the first round of decoding, two new bit sequences are generated. They are mapped to a new symbol sequence, which is compared position by position with the originally received symbol sequence. Where the symbols at a position are identical, the current initial LLR values are kept; where they differ, the initial LLR values are recalculated according to the received symbol, in four cases: A, T, C, and G.

[0086] (1) The received symbol is T:

[0087]

[0088]

[0089] The initial LLR values at the corresponding bit positions are replaced with CCh_msb and CCh_lsb, because the transition probability between symbols T and C is much greater than the transition probability between T and any other symbol.

[0090] (2) The received symbol is C:

[0091]

[0092]

[0093] The initial LLR values at the corresponding bit positions are replaced with TCh_msb and TCh_lsb, because the transition probability between symbols C and T is much greater than the transition probability between C and any other symbol.

[0094] (3) The received symbol is A:

[0095] When the symbol at the same position in the new symbol sequence is T, the initial LLR values at the corresponding bit positions are changed to TCh_msb and TCh_lsb;

[0096]

[0097]

[0098] When the symbol at the same position in the new symbol sequence is C, the initial LLR values at the corresponding bit positions are changed to CCh_msb and CCh_lsb;

[0099]

[0100]

[0101] When the symbol at the same position in the new symbol sequence is G, the bit values of the new bit sequences must be compared individually with those of the original bit sequences, because the transition probability between symbols A and G is almost zero. The initial LLR values at the corresponding bit positions are replaced with TCh_msb and CCh_lsb;

[0102]

[0103]

[0104] (4) The received symbol is G:

[0105] When the symbol at the same position in the new symbol sequence is T, the initial LLR values at the corresponding bit positions are changed to TCh_msb and TCh_lsb;

[0106]

[0107]

[0108] When the symbol at the same position in the new symbol sequence is C, the initial LLR values at the corresponding bit positions are changed to CCh_msb and CCh_lsb;

[0109]

[0110]

[0111] When the symbol at the same position in the new symbol sequence is A, the bit values of the new bit sequences must be compared individually with those of the original bit sequences, because the transition probability between symbols G and A is almost zero. The initial LLR values at the corresponding bit positions are replaced with CCh_msb and TCh_lsb;

[0112]

[0113]
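The four cases above amount to a lookup rule per symbol position. The sketch below assumes the eight fixed values are supplied in a dictionary keyed by symbol, for example computed from a channel model; `replacement_llrs` and the dictionary layout are hypothetical.

```python
VALID = set("ATCG")


def replacement_llrs(received, decoded, current, table):
    """Initial (msb, lsb) LLRs for one position, following cases (1) to (4).

    received: symbol read from the channel at this position
    decoded:  symbol at the same position of the new symbol sequence
    current:  (msb, lsb) initial LLRs currently used at this position
    table:    {"A": (ACh_msb, ACh_lsb), "T": (...), "C": (...), "G": (...)}
    """
    if received not in VALID or decoded not in VALID:
        raise ValueError(f"unknown symbol pair {received!r} -> {decoded!r}")
    if received == decoded:
        return current                       # same symbol: keep current values
    if received == "T":
        return table["C"]                    # case (1): T is mostly confused with C
    if received == "C":
        return table["T"]                    # case (2): C is mostly confused with T
    if decoded in ("T", "C"):
        return table[decoded]                # cases (3)/(4): new symbol is T or C
    if received == "A":                      # case (3), new symbol G: both bits differ
        return table["T"][0], table["C"][1]  # TCh_msb, CCh_lsb
    return table["C"][0], table["T"][1]      # case (4), new symbol A: CCh_msb, TCh_lsb
```

Applying this function at every position, with the received sequence and the new symbol sequence aligned, produces the initial LLRs for the second round of decoding.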

[0114] After the initial LLR values are recalculated, they replace the initial LLR values for the second round of decoding, and the variable node update, check node update, and decision are performed in sequence. Decoding continues until the maximum number of iterations is reached, at which point the bit error rate is calculated.

[0115] To further illustrate the technical solution of this application, a simulation experiment of the DNA data storage decoding method of this application is also provided. The simulation was conducted under the Windows 10 operating system, using MATLAB for bit error rate simulation. LDPC codes with code rates of 0.5 and 0.9 were selected for the simulation, and the results are shown in Figure 4 and Figure 5. The parameter α, associated with the asymmetric error characteristics of the nanopore sequencer channel, is assumed to range from 0.022 to 0.034 in steps of 0.002; the range of α is 0.002 to 0.0026. The maximum number of decoding iterations is 50, and the normalization factor δ of the NMS decoding algorithm is set to 0.8. As can be seen from Figure 4 and Figure 5, the proposed solution outperforms the traditional BP decoding scheme in both complexity and bit error rate performance. This is because this embodiment exploits the asymmetric error rate of the DNA channel, combined with the external information generated during decoding, for data error correction, and employs the lower-complexity NMS decoding algorithm, which is more suitable for practical applications.

[0116] The above is a detailed description of a specific embodiment of the DNA data storage decoding method provided in this application. The following is a detailed description of a specific embodiment of the DNA data storage decoding device provided in this application.

[0117] Referring to Figure 2, this embodiment provides a DNA data storage decoding device, including:

[0118] DNA sequence preprocessing unit 201 is used to acquire DNA sequences and calculate two initial LLR values for each bit pair in the DNA sequence.

[0119] The node processing unit 202 is used to update the variable nodes and check nodes of the DNA sequence using a node update equation based on an improved normalized minimum sum algorithm.

[0120] The codeword data acquisition unit 203 is used to perform variable node decision processing on the DNA sequence based on the updated variable node, check node and initial LLR value to obtain the codeword data for this decoding cycle;

[0121] The loop control unit 204 is used to perform variable node update processing, check node update processing and variable node decision processing in a loop. When the preset loop termination condition is met, the iterative decoding process is terminated and the latest codeword data is used as the decoding output result.

[0122] The decoding result verification unit 205 is used to verify the decoding output result according to the preset decoding result judgment conditions. If the verification indicates that decoding was unsuccessful, the initial LLR value is updated, and decoding is performed again based on the updated initial LLR value to obtain a new decoding output result.

[0123] More specifically, the node update equation is as follows:

[0124]

[0125]

[0126] In the formulas, N(j) is the set of all check nodes connected to variable node j, and Ch_j is the initial LLR value of variable node j. The message terms are the external information passed from variable node j to check node i in the iter-th iteration, the external information passed from check node i to variable node j in the iter-th iteration, and the external information passed from variable node j to check node i in the (iter-1)-th iteration; δ is the compensation factor. N(i) is the set of all variable nodes connected to check node i, and j' denotes each variable node in N(i) other than variable node j.

[0127] More specifically, the codeword data acquisition unit is used for:

[0128] Based on the updated variable nodes, check nodes, and initial LLR values, the decision value of each variable node is obtained from the decision value calculation formula. The decision value is then compared with a preset decision threshold to determine the codeword data for the current decoding cycle.

[0129] More specifically, the formula for calculating the decision value is as follows:

[0130]

[0131] In the formula, the quantities are the decision value of variable node j; N(j), the set of all check nodes connected to variable node j; Ch_j, the initial LLR value of variable node j; and the total external information of variable node j.
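The decision step can be sketched under two assumptions. The first is that the decision value takes the usual form: Ch_j plus the total external information, where the total external information is the sum of the messages from all check nodes in N(j). The second is that the preset decision threshold is 0 under the convention LLR = log(P(bit=0)/P(bit=1)). The dictionary layout used for the check-to-variable messages is hypothetical.

```python
def decide(
    ch: list[float],
    c2v: dict[tuple[int, int], float],
    threshold: float = 0.0,
) -> tuple[list[float], list[float], list[int]]:
    """Compute decision values and hard bits for every variable node.

    ch[j] is the initial LLR Ch_j; c2v maps (i, j) to the message from check
    node i to variable node j. A bit is decoded as 1 when its decision value
    falls below the threshold.
    """
    total_ext = [0.0] * len(ch)
    for (_, j), msg in c2v.items():
        total_ext[j] += msg  # accumulate over all check nodes i in N(j)
    decision = [c + t for c, t in zip(ch, total_ext)]
    bits = [1 if d < threshold else 0 for d in decision]
    return decision, total_ext, bits
```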

[0132] More specifically, if the verification result of the decoding output indicates that decoding was unsuccessful, updating the initial LLR value specifically includes:

[0133] If the decoding output shows that decoding was unsuccessful, the absolute value of the total external information of each bit pair in the current DNA sequence is compared with the bit flip threshold of that bit pair. If the absolute value of the total external information is less than the bit flip threshold, the corresponding bit of the bit pair is flipped; otherwise, the bit pair remains unchanged. The bit flip threshold is the absolute value of the initial LLR of the bit pair multiplied by the number of check equations associated with that bit.
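The flip rule of [0133] can be sketched per bit as shown below. The threshold follows the paragraph: the absolute initial LLR multiplied by the number of check equations involving the bit, which is its column weight in the parity-check matrix. The paragraph does not state how a flip changes the initial LLR. This sketch assumes the flip negates that LLR.

```python
def update_initial_llrs(
    ch: list[float], total_ext: list[float], col_weight: list[int]
) -> list[float]:
    """Bit-flip update of the initial LLRs after a failed decoding attempt.

    Flip threshold for bit j: |Ch_j| * col_weight[j]. If |total_ext[j]| is
    below it, the bit is flipped, modelled here (assumption) by negating Ch_j.
    """
    if not len(ch) == len(total_ext) == len(col_weight):
        raise ValueError("inputs must have equal length")
    updated = []
    for c, ext, w in zip(ch, total_ext, col_weight):
        threshold = abs(c) * w
        updated.append(-c if abs(ext) < threshold else c)
    return updated
```

For example, with Ch = [2.0, -1.0], total external information [5.0, 0.5], and column weights [2, 3], the thresholds are 4 and 3. Only the second bit is flipped, which gives [2.0, 1.0].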

[0134] Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the terminals, apparatuses, and units described above are the same as the corresponding processes in the foregoing method embodiments. They are therefore not repeated here.

[0135] In the several embodiments provided in this application, it should be understood that the disclosed terminals, apparatuses, and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between apparatuses or units through some interfaces, and may be electrical, mechanical, or of another form.

[0136] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of the application described herein can be implemented, for example, in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0137] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0138] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.

[0139] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0140] The above-described embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A DNA data storage decoding method, characterized in that it comprises: obtaining a DNA sequence and calculating two initial LLR values for each bit pair in the DNA sequence; updating the variable nodes and check nodes of the DNA sequence using a node update equation based on an improved normalized minimum sum algorithm; performing variable node decision processing on the DNA sequence based on the updated variable nodes, check nodes, and the initial LLR values to obtain the codeword data for the current decoding cycle; executing the variable node update processing, check node update processing, and variable node decision processing in a loop, terminating the iterative decoding process when a preset loop termination condition is met, and using the latest codeword data as the decoding output result; and verifying the decoding output result according to preset decoding result judgment conditions, wherein, if the verification result indicates that decoding is unsuccessful, the absolute value of the total external information of each bit pair in the current DNA sequence is compared with the bit flip threshold of that bit pair; if the absolute value of the total external information is less than the bit flip threshold, the corresponding bit of the bit pair is flipped, otherwise the bit pair remains unchanged; and decoding is performed again based on the updated initial LLR values to obtain a new decoding output result; wherein the bit flip threshold is the absolute value of the initial LLR of the bit pair multiplied by the number of check equations associated with that bit.

2. The DNA data storage decoding method according to claim 1, characterized in that the node update equation is specifically as follows: In the formula, N(j) denotes the set of all check nodes connected to variable node j, and Ch_j is the initial LLR value of variable node j. The formula also involves the external information passed from variable node j to check node i in the iter-th iteration, the external information passed from check node i to variable node j in the iter-th iteration, and the external information passed from variable node j to check node i in the (iter-1)th iteration. δ is the compensation factor. N(i) is the set of all variable nodes connected to check node i, and j' denotes each variable node in N(i) other than variable node j.

3. The DNA data storage decoding method according to claim 2, wherein the step of performing variable node decision processing on the DNA sequence based on the updated variable nodes, check nodes, and the initial LLR values to obtain the codeword data for the current decoding cycle specifically includes: obtaining the decision value of each variable node from the decision value calculation formula based on the updated variable nodes, check nodes, and the initial LLR values; and comparing the decision value with a preset decision threshold to determine the codeword data for the current decoding cycle.

4. The DNA data storage decoding method according to claim 3, wherein the formula for calculating the decision value is specifically as follows: In the formula, the quantities are the decision value of variable node j; N(j), the set of all check nodes connected to variable node j; Ch_j, the initial LLR value of variable node j; and the total external information of variable node j.

5. A DNA data storage decoding apparatus, characterized in that it comprises: a DNA sequence preprocessing unit, used to acquire a DNA sequence and calculate two initial LLR values for each bit pair in the DNA sequence; a node processing unit, used to update the variable nodes and check nodes of the DNA sequence using a node update equation based on an improved normalized minimum sum algorithm; a codeword data acquisition unit, used to perform variable node decision processing on the DNA sequence based on the updated variable nodes, check nodes, and the initial LLR values to obtain the codeword data for the current decoding cycle; a loop control unit, used to perform variable node update processing, check node update processing, and variable node decision processing in a loop, to terminate the iterative decoding process when a preset loop termination condition is met, and to use the latest codeword data as the decoding output result; and a decoding result verification unit, used to verify the decoding output result according to preset decoding result judgment conditions, wherein, if the verification result indicates that decoding is unsuccessful, the unit compares the absolute value of the total external information of each bit pair in the current DNA sequence with the bit flip threshold of that bit pair; if the absolute value of the total external information is less than the bit flip threshold, the corresponding bit of the bit pair is flipped, otherwise the bit pair remains unchanged; and decoding is performed again based on the updated initial LLR values to obtain a new decoding output result; wherein the bit flip threshold is the absolute value of the initial LLR of the bit pair multiplied by the number of check equations associated with that bit.

6. The DNA data storage decoding apparatus according to claim 5, wherein the node update equation is specifically as follows: In the formula, N(j) denotes the set of all check nodes connected to variable node j, and Ch_j is the initial LLR value of variable node j. The formula also involves the external information passed from variable node j to check node i in the iter-th iteration, the external information passed from check node i to variable node j in the iter-th iteration, and the external information passed from variable node j to check node i in the (iter-1)th iteration. δ is the compensation factor. N(i) is the set of all variable nodes connected to check node i, and j' denotes each variable node in N(i) other than variable node j.

7. The DNA data storage decoding apparatus according to claim 6, characterized in that the codeword data acquisition unit is specifically used to: obtain the decision value of each variable node from the decision value calculation formula based on the updated variable nodes, check nodes, and the initial LLR values; and compare the decision value with a preset decision threshold to determine the codeword data for the current decoding cycle.

8. The DNA data storage decoding apparatus according to claim 7, wherein the formula for calculating the decision value is specifically as follows: In the formula, the quantities are the decision value of variable node j; N(j), the set of all check nodes connected to variable node j; Ch_j, the initial LLR value of variable node j; and the total external information of variable node j.