An ultra-high-throughput LDPC decoder based on a shift-type base graph

A shift-based LDPC decoder addresses the high hardware resource consumption of existing decoders in high-throughput scenarios. It exploits the row-wise cyclic shift relationship of the LDPC code base graph together with multi-core parallel decoding, improving both hardware resource utilization and throughput.

CN115694513B · Active · Publication Date: 2026-06-19 · NANJING UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: NANJING UNIV
Filing Date: 2022-09-26
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing LDPC decoders consume significant hardware resources in high-throughput scenarios, and some parallel architectures struggle to further improve the throughput-to-area ratio.

Method used

An LDPC decoder based on a shift-type base graph is adopted. It exploits the fixed cyclic shift relationship between adjacent rows of the LDPC code base graph and combines a partially parallel architecture with the min-sum decoding algorithm. Multi-core parallel decoding and a row-grouped decoding schedule reduce hardware resource consumption and improve decoder throughput.

Benefits of technology

It effectively reduces the hardware area of the decoder, improves the efficiency per unit area, and significantly improves the decoding throughput through multi-core parallel decoding.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses an LDPC decoder belonging to the field of channel coding and decoding technology. In some modern communication scenarios, LDPC decoders must deliver high throughput but consume significant hardware resources. This invention designs an ultra-high-throughput LDPC decoder based on a shift-type base graph. The decoder instantiates multiple decoder cores internally, and each decoder core uses an LDPC parity-check matrix built on a shift-type base graph and decodes with a row-grouped min-sum decoding algorithm. The advantages of this invention are: designing the decoder architecture around an LDPC parity-check matrix based on a shift-type base graph reduces the hardware resource consumption of the read and write-back modules in the row-grouped decoder, reduces the decoder area, and effectively improves the throughput per unit area, while multi-core parallel decoding significantly improves the decoding throughput.

Description

Technical Field

[0001] This invention belongs to the field of channel coding and decoding technology in the field of communications, and specifically relates to a shift-based LDPC decoder with low hardware resource consumption and high throughput for use in channel decoding.

Background Technology

[0002] With the increasing informatization of society, China's communication technology is developing rapidly. For example, research on technologies for fifth-generation and more advanced mobile communications is constantly advancing. High transmission efficiency and high reliability are goals that communication technology constantly pursues.

[0003] Low-Density Parity-Check (LDPC) codes, proposed by Dr. Gallager in 1962, are a channel coding scheme for forward error correction. They offer excellent soft-decision decoding performance, but limitations in computing power and semiconductor manufacturing processes at the time prevented their immediate application. Their superior error correction performance led to their rediscovery by industry and academia in the 1990s, resulting in widespread attention and research. Because of the high reliability this performance provides, LDPC codes are now used in high-speed memory, optical communication, and mobile communication. These fields also place high demands on transmission efficiency (throughput) and on the cost of the decoder hardware implementation.

[0004] LDPC decoders implemented in a fully parallel manner suffer from severe wiring congestion. Serial decoding architectures generally cannot provide high decoding throughput and have high decoding latency. Therefore, partially parallel architectures are more commonly used, as they can better balance throughput and hardware complexity. However, for scenarios with extremely high throughput, partially parallel architectures still need continuous improvement to achieve a higher throughput-to-area ratio, thereby improving the transmission efficiency of encoding and decoding.

Summary of the Invention

[0005] Objective of the Invention: This invention addresses the shortcomings of current LDPC decoders by disclosing a low-hardware-area, high-throughput LDPC decoder based on a shift-type base graph. This decoder is applicable to, but not limited to, fields such as mobile communications.

[0006] Technical Solution: A high-throughput LDPC decoder based on a shift-type base graph, characterized in that adjacent rows of the LDPC code base graph used in this decoder have a fixed row-wise cyclic shift relationship. The base graph represents the positions of all non-zero elements in the LDPC code parity-check matrix. Every row of the shift-type base graph except the first is obtained by cyclically shifting the first row by a certain number of positions. The decoder core used internally is based on a partially parallel architecture and decodes with a row-grouped decoding schedule and the min-sum decoding algorithm, with the group width being the dimension of the quasi-cyclic LDPC code submatrix.
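
The shift property described above can be illustrated with a minimal sketch. It assumes, which the text does not state, that each row is the previous row cyclically shifted right by a fixed number of columns; the function name `build_shift_base_graph`, the `shift` parameter, and the 8-column first row are hypothetical and chosen only for illustration.

```python
def build_shift_base_graph(first_row, num_rows, shift=1):
    """Return a base graph whose row r is first_row cyclically shifted right by r*shift.

    Assumption: a fixed right shift of `shift` columns between adjacent rows.
    """
    n = len(first_row)
    graph = []
    for r in range(num_rows):
        k = (r * shift) % n
        # Right cyclic shift by k: the entry at column c comes from column c - k.
        graph.append([first_row[(c - k) % n] for c in range(n)])
    return graph


# Hypothetical first row with 8 columns, used only for illustration.
first_row = [1, 1, 0, 1, 0, 0, 1, 0]
base_graph = build_shift_base_graph(first_row, num_rows=4)
for row in base_graph:
    print(row)
```

Every row produced this way has the same weight as the first row, which is why a single set of fixed read and write-back positions can serve all rows.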

[0007] Specifically, it includes an input data bit width conversion module, a bit width conversion memory, an input arbitration module, a shift register, an a posteriori information reading module, a variable node processing module, a check node processing module, a barrel shifter, an a posteriori information processing module, an a posteriori information write-back module, an output selection module, a shift value storage module, a check node information memory, a check node information decompression module, an early termination control module, and a selector;

[0008] This decoder instantiates multiple decoder cores internally. The bit width conversion module is used to receive external channel information and convert it into the bit width required by the decoder core. The output selection module is used to output the data decoded by the decoder core in a time-division manner.

[0009] The decoding process is as follows:

[0010] The channel information input to the decoder is first entered into the input data bit-width conversion module in blocks. Once a complete frame of data is collected, it is stored in the bit-width conversion memory. This frame information is then transmitted to the corresponding decoder core's shift register after the input arbitration module arbitrates the current decoder core's idle state. The input arbitration module uses a fixed-priority arbitration method; under the same idle state, the core with higher priority receives the frame information first. While receiving external information, the internal non-idle decoder cores are in decoding mode.
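
A fixed-priority arbiter of the kind described above can be sketched as follows. The sketch assumes that a lower core index means higher priority, which the text does not specify; `pick_core` is a hypothetical helper name.

```python
def pick_core(idle_flags):
    """Return the index of the highest-priority idle core, or None if all are busy.

    Assumption: a lower index means a higher priority.
    """
    for core_id, idle in enumerate(idle_flags):
        if idle:
            return core_id
    return None


# Two decoder cores, both idle: the higher-priority core receives the frame.
print(pick_core([True, True]))
# Core 0 is busy decoding, so the frame goes to core 1.
print(pick_core([False, True]))
```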

[0011] The decoder core's a posteriori information reading module selects a portion of the a posteriori information from the shift register based on the LDPC code base graph. The selected positions are fixed according to the positions of the non-zero elements in the first row of the LDPC code base graph. The selected information is used for subsequent calculations and updates. The input to the reading module is the entire a posteriori information block, and the output is as many information blocks as the maximum row weight.

[0012] After decoding begins, the barrel shifters shift the read information to align it with the check node. The number of barrel shifters is equal to the maximum row weight.
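
Functionally, a barrel shifter applies a cyclic rotation to one block of z messages. The sketch below models that behavior in software; the left rotation direction and the name `barrel_shift` are assumptions, since the text does not fix a direction.

```python
def barrel_shift(block, shift):
    """Cyclically rotate a block of z messages left by `shift` positions.

    Assumption: left rotation; a hardware design may use the opposite direction.
    """
    z = len(block)
    s = shift % z
    return block[s:] + block[:s]


block = [10, 11, 12, 13]
aligned = barrel_shift(block, 1)           # align to the check nodes
restored = barrel_shift(aligned, 4 - 1)    # the complementary shift undoes it
print(aligned, restored)
```

Rotating by s and then by z - s returns the block to its original order, which is the relationship between the alignment shift before processing and the shift applied before write-back.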

[0013] The variable node processing module receives the channel information read from the shift register by the a posteriori information reading module and the check node information read from the check node information memory and decompressed by the check node information decompression module, and calculates the updated variable node information. This decoder's variable node processing module contains multiple variable node processing units that calculate the variable node information in parallel. The number of variable node processing units is equal to the dimension of the quasi-cyclic LDPC code submatrix multiplied by the maximum row weight of the LDPC parity-check matrix.

[0014] After the variable node processing module finishes updating, the check node processing module receives the updated variable node information and calculates the updated check node information, including the magnitude and sign of the information. The decoder's check node processing module contains multiple check node processing units. These units calculate the check node information in parallel, with the number of units (i.e., the degree of parallelism) equal to the dimension of the quasi-cyclic LDPC code submatrix. The number of inputs to each check node processing unit is the maximum row weight, and the outputs are the first minimum value, the second minimum value, and the index of the first minimum value. The output of the entire check node processing module also includes the sign of each check node information.
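
A single check node processing unit under the min-sum algorithm can be sketched as below. The function name `check_node_unit`, the tuple layout, and the sign encoding (1 for negative, zero treated as positive) are illustrative assumptions; the sketch needs at least two inputs.

```python
def check_node_unit(messages):
    """Compress the variable-to-check messages of one check node (min-sum).

    Returns (min1, min2, idx1, out_signs):
      min1, min2 - smallest and second-smallest input magnitudes
      idx1       - index of the input holding min1
      out_signs  - sign bit of each outgoing check message (1 = negative)
    """
    mags = [abs(m) for m in messages]
    idx1 = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx1]
    min2 = min(m for i, m in enumerate(mags) if i != idx1)
    in_signs = [1 if m < 0 else 0 for m in messages]
    parity = sum(in_signs) % 2
    # The outgoing sign on edge i is the product of all other input signs.
    out_signs = [parity ^ s for s in in_signs]
    return min1, min2, idx1, out_signs


print(check_node_unit([3, -1, 4, -2]))  # (1, 2, 1, [0, 1, 0, 1])
```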

[0015] After the check node update is completed, the updated check node information is stored in the check node information memory in compressed form. The compressed information includes: the first minimum value, the second minimum value, the index of the first minimum value, and the sign bit. It must be decompressed by the check node information decompression module before being transmitted to the a posteriori information processing module.
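
Decompression expands one compressed record back into one check node message per edge. A minimal sketch, assuming the record holds one outgoing sign bit per edge (1 for negative); the name `decompress` and the argument order are hypothetical.

```python
def decompress(min1, min2, idx1, out_signs):
    """Expand one compressed record into one check-to-variable message per edge."""
    messages = []
    for i, sign in enumerate(out_signs):
        # The edge that supplied the minimum receives the second minimum instead.
        mag = min2 if i == idx1 else min1
        messages.append(-mag if sign else mag)
    return messages


# Record for the inputs [3, -1, 4, -2]: min1=1, min2=2, idx1=1.
print(decompress(1, 2, 1, [0, 1, 0, 1]))  # [1, -2, 1, -1]
```

Storing two magnitudes, one index, and the sign bits instead of one full message per edge is what keeps the check node information memory small.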

[0016] The a posteriori information processing module receives the updated and decompressed check node information and the updated variable node information, and calculates the updated a posteriori information. This decoder's a posteriori information processing module contains multiple a posteriori information processing units that calculate the a posteriori information in parallel. The number of a posteriori information processing units is equal to the dimension of the quasi-cyclic LDPC code submatrix multiplied by the maximum row weight of the LDPC parity-check matrix. The a posteriori information processing module also updates the row check results, confirming whether the check for the row currently being decoded is satisfied and whether all row checks are satisfied. The results are used to terminate decoding early.
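
The variable node update and the a posteriori update for one check row amount to a subtraction followed by an addition. The sketch below shows both for the edges of a single check node, assuming the common convention that a negative LLR is decided as bit 1; all function names are hypothetical.

```python
def variable_node_update(app, check_old):
    """Variable node message = a posteriori value minus the old check message."""
    return [l - r for l, r in zip(app, check_old)]


def app_update(var_msgs, check_new):
    """New a posteriori value = variable node message plus the new check message."""
    return [q + r for q, r in zip(var_msgs, check_new)]


def row_check_satisfied(app):
    """Parity of the hard decisions; assumes a negative LLR means bit 1."""
    return sum(1 for l in app if l < 0) % 2 == 0


app = [3, -1, 4, -2]                      # a posteriori values on one check row
var_msgs = variable_node_update(app, [0, 0, 0, 0])   # first iteration: zeros
new_app = app_update(var_msgs, [1, -2, 1, -1])
print(new_app, row_check_satisfied(new_app))  # [4, -3, 5, -3] True
```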

[0017] After the a posteriori information produced by the a posteriori information processing module is shifted by the barrel shifter, the a posteriori information write-back module applies the shift required by the shift parameters of the LDPC code base graph and writes the information back to the shift register. The write-back positions are fixed according to the positions of the non-zero elements in the first row of the LDPC code base graph.

[0018] An iteration is complete when all rows have been decoded and all posterior information has been updated. The decoder traverses all rows of the parity-check matrix in cyclic order; that is, after the last row is decoded, the next row to be decoded is the second-to-last row. If the decoder core completes decoding within the specified number of iterations, it will terminate decoding early and output the result through the selector and output selection module; otherwise, if the decoder fails to complete decoding after reaching the maximum number of iterations, the decoder core will also terminate decoding but declare decoding failure.
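
The termination rule above (stop early on success, otherwise stop at the iteration limit and declare failure) can be sketched as a control loop. This is a simplification under stated assumptions: rows are visited in the order 0 to M-1 in every iteration, which is not the traversal order described in the text, and `decode_row` is a hypothetical caller-supplied function.

```python
def run_decoder(decode_row, num_rows, max_iterations):
    """Iterate until every row check passes or the iteration budget is used up.

    decode_row(row) is a caller-supplied function (hypothetical) that updates
    the a posteriori information for one row and returns True if that row's
    check is satisfied.
    Returns (success, iterations_used).
    """
    for iteration in range(1, max_iterations + 1):
        results = [decode_row(row) for row in range(num_rows)]
        if all(results):
            return True, iteration    # early termination, decoding succeeded
    return False, max_iterations      # budget exhausted, failure declared


# Stub row decoder that starts passing its checks after eight calls.
calls = {"n": 0}

def stub_decode_row(row):
    calls["n"] += 1
    return calls["n"] > 8

print(run_decoder(stub_decode_row, num_rows=4, max_iterations=10))  # (True, 3)
```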

[0019] The output selection module uses a fixed-priority output selection method. When all decoders have data to be output, the core with the higher priority outputs data first. If all row checks are satisfied, causing early decoding termination, or if the maximum number of decoding iterations is reached, causing decoding termination, the decoder will enter the output-waiting state. Before the decoding result is fully output in the core, the decoder core will not receive new external channel information.

[0020] The beneficial effects of this decoder are as follows: There is a fixed row-wise cyclic shift relationship between adjacent rows of the LDPC code base map used in this decoder. When using the decoding timing of row-wise grouping, the hardware resource consumption of the read and write-back modules in the row-wise grouping decoder is reduced by utilizing the property of the LDPC parity check matrix of the shifted base map, thereby reducing the area of the decoder and effectively improving the efficiency per unit area. Furthermore, the decoding throughput is significantly improved through multi-core parallel decoding.

Attached Figure Description

[0021] Figure 1a is a schematic diagram of the shift-type base graph for quasi-cyclic LDPC codes;

[0022] Figure 1b is a schematic diagram of the quasi-cyclic LDPC code parity-check matrix in a specific implementation;

[0023] Figure 1c is a diagram illustrating the positions of the a posteriori information that must be selected when decoding each row;

[0024] Figure 1d is a schematic diagram of data movement in the shift register;

[0025] Figure 2 is a diagram showing the overall architecture of this decoder;

[0026] Figure 3 is a diagram of the decoder core structure in this decoder.

Detailed Implementation

[0027] The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. After reading this invention, any modifications of the invention in various equivalent forms by those skilled in the art will fall within the scope defined by the appended claims.

[0028] A shift-type base graph is a special type of LDPC code base graph, denoted S_pg. Each element of the shift-type base graph is either 0 or 1. A value of 1 indicates a connection between the variable node and the check node at the current position, while a value of 0 indicates no connection between them. The numbers of rows and columns of S_pg are M and N, respectively. Rows 2, 3, ..., M of S_pg are all obtained by cyclically shifting row 1; this is the fundamental property and constraint of the shift-type base graph. The S_pg used here is for illustrative purposes only; S_pg can be flexibly transformed as long as the fundamental property and constraint of the shift-type base graph are respected. S_pcm denotes the LDPC code parity-check matrix corresponding to the above base graph S_pg. The submatrix size used by S_pcm is z. In the following explanation, the specific parameters are M=4 and N=16, and the specific S_pg is shown in Figure 1a. The number of decoder cores is set to 2, the channel information quantization is set to 6 bits, the submatrix size is set to z = 128, the decoder input/output interface bit width is set to 512 bits, and the LDPC parity-check matrix S_pcm used for decoding is shown in Figure 1b. The maximum row weight of S_pcm is 12, and the working clock frequency of the decoder core is set to 200MHz. Since the number of channel information quantization bits is 6, the working frequency of the input data bit width conversion module can be set to 1200MHz. This working frequency can be determined according to the required throughput.

[0029] Based on the above parameter settings, the workflow of this LDPC decoder is as follows: Before actual decoding begins, the decoder first receives decoding parameters, such as the number of decoder rows and the maximum number of iterations. After the parameters are received, the channel information is received. The length of the entire codeword channel information is 16*128*6 bits. Therefore, the input data bit-width conversion module needs 24 clock cycles to receive one complete frame of codeword channel information. After reception, the frame of codeword channel information is stored in the bit-width conversion memory, waiting to be distributed to the decoder core.
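
The 24-cycle figure follows directly from the parameters of this embodiment, as the short calculation below shows. The constant names are illustrative; the values are those given in the text.

```python
N_COLUMNS = 16        # base graph columns (N)
Z = 128               # submatrix size (z)
QUANT_BITS = 6        # channel information quantization bits
INTERFACE_BITS = 512  # decoder input/output interface bit width

frame_bits = N_COLUMNS * Z * QUANT_BITS
# Ceiling division: cycles needed to move one frame through the interface.
cycles_per_frame = -(-frame_bits // INTERFACE_BITS)
print(frame_bits, cycles_per_frame)  # 12288 24
```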

[0030] The overall architecture of this decoder is shown in Figure 2. In the initial state, all decoder cores are idle. The input arbitration module directly inputs the codeword channel information from the bit-width conversion memory into the shift register of decoder core 1. At this point, the data reception phase is complete, and iterative decoding begins. While decoder core 1 is decoding, the input data bit-width conversion module receives the next frame of codeword channel information. After a complete frame has been received, it is distributed to decoder core 2. Each subsequent complete frame of codeword channel information is distributed to a decoder core according to the cores' idle states.

[0031] For the specific decoding process within a single decoder core: in the first clock cycle of the first iteration, the a posteriori information reading module reads the codeword channel information from the shift register. According to the LDPC parity-check matrix S_pcm used for decoding, the a posteriori information reading module reads out the block information corresponding to the first row; because the design is based on a shift-type base graph, the module selects these blocks through direct connections.

[0032] Note that selecting data through direct connections saves a large number of the selectors otherwise needed for data selection. For the first row, following the a posteriori information marked in the first row of Figure 1c, the a posteriori information reading module selects shift register segments LLR1, LLR2, LLR4, LLR5, LLR7, LLR9, LLR10, LLR11, LLR13, LLR14, LLR15, and LLR16, a total of 12 directly connected segments, each with a data width of 128*6 bits. In the figure, LLRn represents the initial channel information or a posteriori information of the corresponding column, where n is the column index. These 12 data segments are input to the barrel shifters for shifting and alignment to the check nodes. The number of barrel shifters equals the maximum row weight, i.e., 12 groups. The variable node processing module receives the information shifted by the barrel shifters. Simultaneously, the compressed information in the check node information memory is decompressed and also transmitted to the variable node processing module. The basic unit of the variable node processing module, the variable node processing unit, subtracts the check node information from the corresponding a posteriori information to obtain the new variable node information required for the current row iteration. During the first iteration, the check node information memory holds no valid data, and the check node information decompression module outputs all zeros. Each row of the check node information memory stores only 128 compressed information entries. The data format of each compressed entry is: the first minimum value, the second minimum value, the index of the first minimum value, and the sign bit of the check node information. After decompression, the number of information items becomes 12*128, and the variable node processing module likewise processes 12*128 variable node information items.
The variable node information is then regrouped into 128 groups according to the corresponding group indices, with each group containing 12 information items, and input into the check node processing module. The check node processing module contains 128 check node processing units. Each check node processing unit calculates the first minimum value and its index, and the second minimum value, within its group of information, and outputs one compressed information item. The 128 compressed information items output by the check node processing module are stored in the check node information memory according to their corresponding rows, and are simultaneously transmitted to the check node information decompression module to generate 12*128 new check node information items. The a posteriori information processing module receives the new check node information and the new variable node information. It contains 12*128 a posteriori information processing units, each of which adds one new check node information item to the corresponding new variable node information item to obtain a new a posteriori information item. The number of new a posteriori information items is 12*128. This information is transmitted to 12 groups of barrel shifters for further shifting and alignment to the variable nodes. The shifted information is then written back to the shift register by the a posteriori information write-back module through direct connections. So that the a posteriori information reading module can continue to read through the same direct connections when decoding the next row, the shift register based on the shift-type base graph is shifted up by one submatrix size (128*6 bits) during write-back.
As shown in Figure 1d, after the first row is decoded, the relative positions of the a posteriori information stored in the shift register are those shown in the second row of Figure 1d.
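
The reason fixed read positions suffice can be checked with a small model of the shift register. The sketch uses the first-row positions listed above and assumes, since the contents of Figures 1a and 1c are not reproduced here, that each base graph row is the previous row cyclically shifted right by one column and that the register rotates by one block after each row. `TAPS`, `row_columns`, and `register` are hypothetical names.

```python
from collections import deque

# First-row non-zero columns (1-based), as listed in the text for row 1.
TAPS = [1, 2, 4, 5, 7, 9, 10, 11, 13, 14, 15, 16]
N, M = 16, 4


def row_columns(r):
    """Non-zero columns (1-based) of row r (0-based).

    Assumption: each row is the previous row shifted right by one column.
    """
    return sorted((t - 1 + r) % N + 1 for t in TAPS)


# Each register slot holds the label of the LLR block currently stored there.
register = deque(f"LLR{c}" for c in range(1, N + 1))

for r in range(M):
    # Direct connection: the same slots are read for every row.
    read = [register[t - 1] for t in TAPS]
    expected = [f"LLR{c}" for c in row_columns(r)]
    assert sorted(read) == sorted(expected)
    # Write-back with a one-block shift prepares the register for the next row.
    register.rotate(-1)

print(row_columns(1))
```

Because the register rotates in step with the base graph rows, the read and write-back paths need no multiplexers, only fixed wiring.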

[0033] Note that writing back through direct connections likewise saves a large number of multiplexers used for data selection. When the a posteriori information processing module calculates new a posteriori information, it determines from the signs whether the parity-check equations corresponding to the entire parity-check matrix are satisfied. This result is used for the early exit function during the decoding process.

[0034] The above process completes the decoding of the first row of the parity-check matrix in Figure 1b in the first iteration. The second, third, and fourth rows of the parity-check matrix are decoded by the same process. The a posteriori information to be selected for the second, third, and fourth rows is shown in rows 2, 3, and 4 of Figure 1c. Because the shift register performs a shift operation at the end of each row's decoding, when reading a posteriori information for rows 2, 3, and 4 the a posteriori information reading module still only needs to select the same 12 directly connected segments: LLR1, LLR2, LLR4, LLR5, LLR7, LLR9, LLR10, LLR11, LLR13, LLR14, LLR15, and LLR16. After row 4 is decoded, the second iteration begins. The second iteration starts decoding from row 3. Note that in the second and subsequent iterations, the data output by the check node information decompression module to the variable node processing module is no longer all zeros, but valid data obtained by decompressing the information written to the check node information memory in the previous iteration. The second iteration ends when it reaches row 1. Decoding then continues with the third and subsequent iterations. If the early exit condition is met before the maximum number of iterations is reached, decoding terminates early and is declared successful; otherwise, decoding terminates when the maximum number of iterations is reached, and success is determined by whether all check equations corresponding to the parity-check matrix are satisfied.

Claims

1. A high-throughput LDPC decoder based on a shift-type base graph, characterized in that: The LDPC code base graph used in this decoder has a fixed row-wise cyclic shift relationship between adjacent rows. The base graph represents the relative positions of all non-zero elements in the LDPC code parity-check matrix. The decoder core used inside this decoder adopts a row-grouped decoding schedule and the min-sum decoding algorithm. The width of each row is the dimension of the quasi-cyclic LDPC code submatrix. Specifically, it includes an input data bit width conversion module, a bit width conversion memory, an input arbitration module, a shift register, an a posteriori information reading module, a variable node processing module, a check node processing module, a barrel shifter, an a posteriori information processing module, an a posteriori information write-back module, an output selection module, a check node information memory, a shift value storage module, a check node information decompression module, an early termination control module, and a selector; This decoder instantiates multiple decoder cores internally. The input data bit width conversion module is used to receive external information and convert it into the bit width required by the decoder core. The output selection module is used to output the data decoded by the decoder cores in a time-division manner. The decoding process is as follows: The channel information input to the decoder is first entered into the input data bit width conversion module in blocks. Once the data of an entire frame has been collected, it is stored in the bit width conversion memory. The frame information is then transmitted by the input arbitration module to the shift register of the corresponding decoder core according to the current idle state of the decoder cores.
After decoding begins, the variable node processing module receives and processes two parts of information: first, the channel information read from the shift register by the a posteriori information reading module and then shifted by the barrel shifter; second, the check node information read from the check node information memory and then decompressed by the check node information decompression module. These are processed to obtain the updated variable node information. After the variable node processing module finishes updating, the check node processing module receives the updated variable node information and calculates the updated check node information, including its magnitude and sign. The shift values of the barrel shifter all originate from the shift value storage module. After the check node update is completed, the updated check node information is stored in the check node information memory. At the same time, the a posteriori information processing module receives the updated check node information and the updated variable node information, and calculates the updated a posteriori information. The a posteriori information processing module also updates the row check results to confirm whether the check for the row currently being decoded is satisfied and whether all row checks are satisfied. The results are used to terminate the decoding early. The early termination control module receives the check result of this row and compares the sign bit of the updated a posteriori information with the sign bit of the old a posteriori information. The comparison result is used for the early exit judgment. After the a posteriori information obtained by the a posteriori information processing module is shifted by the barrel shifter, the a posteriori information write-back module applies the shift required by the shift parameters of the LDPC code base graph and writes the information back to the shift register.
At this time, the decoding of one row of the parity-check matrix is completed. An iteration is completed when all rows are decoded and all a posteriori information is updated. At this point, the channel information stored in the shift register has been updated to the a posteriori information. According to the properties of the shift-type base graph, the decoder core traverses all rows of the parity-check matrix in cyclic order. That is, after the last row is decoded, the next row to be decoded is the second-to-last row. If the decoder core completes the decoding within the specified number of iterations, the decoder core terminates the decoding early and outputs the result through the selector and output selection module. Otherwise, if the decoder has not completed the decoding after reaching the maximum number of iterations, the decoder core also terminates the decoding, but declares decoding failure.

2. The ultra-high throughput LDPC decoder based on a shift-type base graph as described in claim 1, characterized in that: The decoder contains multiple decoder cores that perform parallel decoding.

3. The ultra-high throughput LDPC decoder based on a shift-type base graph as described in claim 1, characterized in that: The a posteriori information reading module of the decoder core selects a portion of the a posteriori information from the shift register according to the LDPC code base graph. The selected positions are fixed according to the positions of the non-zero elements in the first row of the LDPC code base graph. The selected portion of information is used for subsequent calculations and updates.

4. The ultra-high throughput LDPC decoder based on a shift-type base graph as described in claim 1, characterized in that: The a posteriori information write-back module of the decoder core writes the updated a posteriori information back to the shift register according to the LDPC code base graph. The write-back positions are fixed according to the positions of the non-zero elements in the first row of the LDPC code base graph.

5. The ultra-high throughput LDPC decoder based on a shift-type base graph as described in claim 1, characterized in that: The decoder core can also support LDPC code base graphs in which only part of the graph has the inter-row shift relationship, meaning that some columns may not participate in the shift.