A hardware-software cooperative EDAC implementation method for a memory access unit
By implementing the EDAC method for memory access units through hardware and software collaboration, memory access data masks are generated and address resolution is optimized. Combined with macro definition technology, the problems of low storage efficiency and high hardware overhead of on-chip memory access units in microprocessors are solved, multi-granularity data reliability protection is achieved, and the overall area and power consumption of microprocessors are reduced.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NAT UNIV OF DEFENSE TECH
- Filing Date
- 2023-12-26
- Publication Date
- 2026-06-12
AI Technical Summary
Existing technologies for EDAC encoding in on-chip memory access units of microprocessors suffer from low storage efficiency, high hardware overhead, and complex control logic, making it difficult to achieve reliable protection of multi-granular data without affecting microprocessor performance.
A hardware-software co-operational EDAC implementation method for memory access units is adopted. By generating memory access data masks, address resolution and addressing, generating EDAC checksums, and optimizing read and write operations, error correction protection for multi-granularity data is achieved. Combined with macro definition technology, small-granularity data is re-encapsulated to reduce hardware overhead.
It significantly reduces the area and power consumption of the microprocessor memory access unit, realizes comprehensive and consistent reliability design and protection for multi-granularity data, supports EDAC functions for multiple data granularities, and reduces the hardware overhead and memory redundancy of the EDAC circuit.

Figure CN117667833B_ABST
Abstract
Description
Technical Field
[0001] This invention mainly relates to the field of microprocessor microarchitecture design technology, specifically a hardware-software cooperative method for implementing EDAC in a memory access unit.
Background Technology
[0002] As semiconductor process dimensions continue to shrink, the density of transistors integrated on microprocessors increases, on-chip memory capacity grows larger, the distance between memory cells decreases, and supply voltage decreases. This makes them more susceptible to high-energy particle impacts and external environmental interference, leading to single-event upsets (SEUs) and a continuously increasing probability of soft errors in memory access units. Soft errors in on-chip memory access units directly cause data corruption, becoming a major problem affecting the normal operation of microprocessors. Therefore, it is essential to strengthen the reliability design of microprocessor memory access units.
[0003] Since large-capacity on-chip memory access units typically occupy a significant proportion of the microprocessor area, reliability hardening designs for these units often employ EDAC (Error Detection and Correction) coding techniques to protect on-chip memory from SEUs. Considering the hardware complexity and latency of EDAC encoding and decoding implementations, linear block codes such as Hamming codes, Hsiao codes, and other SEC-DED codes are commonly used in the reliability design of on-chip memory access. The latency and area overhead of EDAC-supporting memory access units are key concerns in their circuit architecture design. The goal is to minimize the hardware implementation area of the memory access units while preserving microprocessor performance and meeting reliability requirements, thereby reducing the overall area and power consumption of the microprocessor.
[0004] Microprocessor memory access units typically need to support multiple data access granularities to meet the needs of different applications and of data interactions at different granularities in multi-core architectures. Under cost constraints, this makes it harder to use EDAC check codes to uniformly protect data at multiple memory access granularities. For the linear block codes commonly used for on-chip memory protection, the number of check bits $r(m)$ grows with the EDAC-protected data granularity $m$ ($m$ is a positive integer) only on the order of $O(\log_2 m)$. For the same data capacity $V$, the additional redundant storage space needed for the checksums is therefore $\frac{V}{m} \cdot r(m)$, which shrinks as $m$ grows. That is, for the same data capacity, the smaller the memory access granularity $m$ of EDAC protection, the greater the hardware overhead of the EDAC circuit.
[0005] Without concern for EDAC overhead, some practitioners have proposed an intuitive and feasible scheme for implementing EDAC protection for multi-granularity memory accesses: EDAC encoding is performed according to the smallest data access granularity, and EDAC encoding for other granularities can be achieved by reusing multiple EDAC encoding modules of the smallest granularity. Therefore, the prerequisite for implementing this scheme is that all memory access granularities must be positive integer multiples of the smallest memory access granularity. However, this scheme mainly has two problems:
[0006] 1. Low storage efficiency; large redundant storage capacity for EDAC checksum information. Take 8 bits as the smallest data access granularity and 128 bits as the largest as an example. Using a Hsiao code, which is a linear block code, EDAC is implemented in groups of 8 bits, i.e., with the Hsiao (13,8) code, so every 8 bits of data require an additional 5 bits of EDAC checksum. Encoding 128 bits of data with the Hsiao (13,8) code therefore requires 16 × 5 = 80 additional checksum bits, whereas the Hsiao (137,128) code requires only 9 additional checksum bits for every 128 bits of data. That is, for the same memory access unit design capacity, the former increases the original storage capacity of the on-chip memory access unit by 5 / 8 to store EDAC checksums, compared with 9 / 128 for the latter. The redundant storage overhead of the former is significant, and the storage efficiency of the memory access unit is low.
[0007] 2. High hardware overhead; numerous encoding / decoding modules. Taking 8 bits as the minimum data access granularity and 128 bits as the maximum data access granularity as an example, if EDAC is performed with the minimum data granularity, to meet the maximum data granularity memory access bandwidth, 16 Hsiao code (13, 8) encoding / decoding modules (128 / 8) are needed. However, if EDAC is performed with the maximum data access granularity of 128 bits, only one Hsiao code (137, 128) encoding / decoding module is needed. Although the hardware implementation of a single encoding / decoding module with the minimum data access granularity is simpler and has lower hardware overhead, the number of encoding / decoding modules increases significantly compared to EDAC with a larger data granularity, resulting in a higher overall hardware overhead for EDAC.
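The arithmetic behind the two problems above can be checked with a short sketch. It assumes a generic SEC-DED bound (the smallest r with 2^r ≥ m + r + 1, plus one overall parity bit), which reproduces the (13,8) and (137,128) figures quoted in the text; the function names are illustrative and do not come from the patent.

```python
def sec_ded_check_bits(m: int) -> int:
    """Check bits of a SEC-DED code over m data bits (assumed generic bound)."""
    r = 1
    while 2 ** r < m + r + 1:   # Hamming bound for single-error correction
        r += 1
    return r + 1                # one extra bit for double-error detection


def redundant_bits(capacity_bits: int, m: int) -> int:
    """Extra storage needed to protect capacity_bits of data at granularity m."""
    return (capacity_bits // m) * sec_ded_check_bits(m)


def codec_modules(max_granularity: int, m: int) -> int:
    """Encoder/decoder pairs needed to sustain the maximum access width."""
    return max_granularity // m


for m in (8, 128):
    print(m, sec_ded_check_bits(m), redundant_bits(128, m), codec_modules(128, m))
```

For a 128-bit access this gives 80 check bits and 16 codec modules at 8-bit granularity, against 9 check bits and 1 module at 128-bit granularity.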
[0008] Another approach proposed by industry professionals is the adaptive EDAC scheme, which achieves fault tolerance for multi-granularity data by adaptively adjusting the EDAC encoding length. While the adaptive EDAC scheme effectively reduces dynamic power consumption compared to the aforementioned EDAC scheme with minimum memory access granularity, the adaptive nature increases the hardware control logic overhead for selecting different EDAC encoding / decoding modules. Furthermore, with multiple EDAC encoding / decoding modules in the hardware, the problem of large EDAC hardware area overhead remains.
Summary of the Invention
[0009] The technical problem to be solved by this invention is: in view of the technical problems existing in the prior art, this invention provides a software and hardware co-operated EDAC memory access unit implementation method that is simple in principle, easy to implement, and can improve the reliability of on-chip memory while reducing overhead.
[0010] To solve the above-mentioned technical problems, the present invention adopts the following technical solution:
[0011] A hardware-software cooperative EDAC implementation method for a memory access unit includes:
[0012] Store accessed data and its checksum;
[0013] Generate memory access data mask: Generate a memory access data mask based on the granularity of data memory access;
[0014] Address resolution and addressing: Resolve and address the redundant memory for incoming read or write memory access requests;
[0015] EDAC checksum generation: EDAC checksum generation is performed before write access data is written to memory;
[0016] Decoding and error correction of read access data: During a read access operation, data is read from the redundant memory at the granularity selected by the read access data mask, and the corresponding data is EDAC decoded and corrected at the n-bit data granularity.
[0017] After decoding and correcting the read request, the data is selected according to the granularity of the memory access data.
[0018] As a further improvement to the method of the present invention: during the storage of memory access data and its check code, the memory body is organized by high-order address interleaving or low-order address interleaving to improve data access bandwidth and realize data parallel operation, and read and write memory access addressing is realized by memory access address; during memory access operation, memory access request, read / write type, memory access address, and data memory access granularity will enter the EDAC circuit through the memory access pipeline.
[0019] As a further improvement to the method of the present invention: in the process of generating memory access data masks, for large memory access granularity, each n-bit data granularity corresponds to a 1-bit mask; for small memory access granularity, the n-bit data in which the small memory access granularity data is located corresponds to a 1-bit mask.
[0020] As a further improvement to the method of the present invention: each bit of the memory access data mask enables or disables one group of n-granularity data, where a memory access granularity smaller than n is counted as one n-granularity group. The write memory access data mask enables k write-select bits, which control the writing of k n-granularity data groups and their k corresponding check codes to the redundant memory location addressed by the memory access address; the read memory access data mask enables k read-select bits, which control the reading of the corresponding data groups and check codes from the redundant memory.
[0021] As a further improvement to the method of the present invention: during the EDAC check code generation process, the write data of the write request is divided into granularity n, and each n-granularity data is EDAC encoded to generate an EDAC check code check_bits corresponding to each n-granularity data to realize error detection and correction protection for the read data during read memory access; while selecting k n-granularity write memory access data according to the memory access data mask and writing them to the target address of the redundant memory, the generated k check codes are written to the corresponding address of the redundant memory.
[0022] As a further improvement to the method of the present invention: during the address resolution and addressing process, the redundant memory access ports for read and write memory access request operations are specifically located.
[0023] As a further improvement to the method of the present invention: a macro definition technique is adopted for small-granularity write data, which concatenates the small-granularity data of the small-granularity write request into the defined n-granularity data and then performs EDAC encoding; the macro definition technique is used to realize the re-encapsulation of small-granularity write memory access.
[0024] As a further improvement to the method of the present invention: the processing flow for sparse, small-granularity write data includes:
[0025] Step S1: Read the n-granularity data A and its checksum from the redundant memory at the address corresponding to the small-granularity write request, perform EDAC decoding and error correction, and then write the result into a general-purpose register; this can be achieved with a single read memory access operation. The hardware aligns the address automatically, so the address of the read request does not need to be modified.
[0026] Step S2: Align the small-granularity write data B with the n-granularity data based on the memory access address offset and shift operation;
[0027] Step S3: Delete the old small-granularity data in the corresponding small-granularity data B position in granularity A to prepare for writing small-granularity B;
[0028] Step S4: Complete the concatenation of small-granularity B data and n-granularity A data to generate n-granularity data A' to be written;
[0029] Step S5: Encode the newly spliced n-granularity data A' using EDAC and write it into the same address space of the redundant memory; this operation can be achieved through a single write access operation.
[0030] As a further improvement to the method of the present invention: step S1 is implemented through a single read memory access operation, and step S5 is implemented through a single write memory access operation.
[0031] As a further improvement to the method of the present invention: the processing flow for high-density, continuous, small-granularity write data includes:
[0032] Step S10: Based on the memory access address offset and shift operation, perform an alignment operation on the write data of p consecutive small-granularity write requests; where p is the granularity n divided by the size of a certain small granularity.
[0033] Step S20: Using the OR operation, concatenate the aligned p data points to obtain a complete n-granularity data A';
[0034] Step S30: Combine p small-granularity write requests into an n-granularity write request, perform EDAC encoding on A' and write it to redundant memory.
[0035] Compared with the prior art, the advantages of the present invention are as follows:
[0036] 1. The hardware-software co-operational EDAC implementation method for memory access units of the present invention is simple in principle, easy to implement, and can improve the reliability of on-chip memory while reducing overhead. The present invention significantly reduces the area of microprocessor memory access units, thereby effectively reducing the overall area and power consumption of the microprocessor. That is, the present invention can implement the EDAC function of commonly used memory access granularity n with low overhead while significantly reducing hardware costs, and can also provide the same error correction capability for all small-granularity data accesses, realizing comprehensive and consistent reliability design and protection for multi-granularity memory access data of microprocessor on-chip memory.
[0037] 2. The hardware-software co-operated EDAC implementation method of the present invention is simple to implement and has low hardware overhead. The present invention uses the larger data memory access granularity commonly used in microprocessors to construct the EDAC hardware circuit. Compared with existing EDAC schemes that support small-granularity data memory access, it not only significantly reduces the EDAC check code capacity and greatly reduces the memory area, but also has fewer EDAC encoding and decoding circuit modules, further reducing hardware overhead; it can significantly reduce the memory access unit area and power consumption, effectively reducing the overall area of the microprocessor.
[0038] 3. The hardware-software cooperative EDAC implementation method for memory access units of this invention protects the data of small-granularity write requests through software macro definition programming. The software part uses macro definitions to re-encapsulate each small-granularity write operation as a read followed by a write of n-granularity data, concatenating the small-granularity data into the n-granularity data so that EDAC encoding is performed uniformly. The macro definitions give users a convenient calling interface; users do not need to understand the internal implementation of the macros and can call them as needed.
[0039] The present invention provides a hardware-software cooperative EDAC circuit and its implementation method, which, while meeting the reliability design requirements for error correction capability, can...
[0040] 4. The hardware-software cooperative EDAC implementation method of the present invention yields an EDAC circuit that is simple to implement, has good portability, and is easy to apply to the reliability design of on-chip memory in various microprocessors. The present invention can achieve high storage efficiency, small hardware area overhead, and support for multi-granularity memory access and data integrity protection.
Attached Figure Description
[0041] Figure 1 is a flowchart illustrating the method of the present invention.
[0042] Figure 2 is a schematic diagram illustrating the principle of macro definition operations in a specific application example of the present invention.
[0043] Figure 3 is a schematic diagram of the EDAC circuit architecture and simulated data flow formed in a specific application example of the present invention.
[0044] Figure 4 is a schematic diagram illustrating a software macro definition example in a specific application example of the present invention.
Detailed Implementation
[0045] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0046] The microprocessor memory access unit addressed in this invention aims to implement memory access operations at multiple data granularities and support EDAC reliability design for the larger memory access granularity n; that is, the multiple data memory access granularities supported by the memory access unit include two types of data memory access granularities:
[0047] One type is various small memory access granularities less than n bits;
[0048] The other type is the various large memory access granularities that are k times n, where k is a positive integer (k = 1, 2, 3, ...).
[0049] Furthermore, the reliability design requirements of the memory access unit necessitate error detection and correction at a relatively large memory access granularity of n bits. Therefore, for this type of memory access unit, regardless of which memory access granularity is used, it must support error detection and correction capabilities at a data granularity of no more than n bits in order to meet its reliability design requirements.
[0050] A memory access unit typically includes multi-granularity data write and read memory access pipelines. The memory access pipeline mainly implements instruction decoding, address calculation, memory access decoding, data writing or data reading, data selection, and write-back.
[0051] This invention mainly relates to the field of data access reliability design. It addresses the reliability design requirements of microprocessor memory access units for error detection and correction (EDAC) at a large data access granularity. By adopting a hardware and software co-operation approach, it can achieve comprehensive protection of memory access units for multiple data granularities with low hardware overhead.
[0052] As shown in Figure 1, the present invention provides a hardware-software cooperative EDAC implementation method for a memory access unit, which includes:
[0053] Store accessed data and its checksum;
[0054] Generate memory access data mask: Generate a memory access data mask based on the granularity of data memory access;
[0055] Address resolution and addressing: Resolve and address the redundant memory for incoming read or write memory access requests;
[0056] EDAC checksum generation: EDAC checksum generation is performed before write access data is written to memory;
[0057] Decoding and error correction of read access data: During a read access operation, data is read from the redundant memory at the granularity selected by the read access data mask, and the corresponding data is EDAC decoded and corrected at the n-bit data granularity.
[0058] After decoding and correcting the read request, the data is selected according to the granularity of the memory access data.
[0059] In specific application examples, during the storage of memory access data and its checksum, the memory is organized by interleaving high-order or low-order addresses to improve data access bandwidth and achieve parallel data operations, and read and write memory access addressing is implemented by memory access address; during memory access operations, memory access request, read / write type, memory access address, and data memory access granularity will enter the EDAC circuit through the memory access pipeline.
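As a rough illustration of the two interleaving options mentioned above, the sketch below maps a word address to a (bank, row) pair. The bank count, bank depth, and function names are assumptions chosen for the example; the patent does not specify them.

```python
def low_order_interleave(word_addr: int, num_banks: int) -> tuple[int, int]:
    """Low-order bits select the bank, so consecutive words hit different banks."""
    return word_addr % num_banks, word_addr // num_banks


def high_order_interleave(word_addr: int, words_per_bank: int) -> tuple[int, int]:
    """High-order bits select the bank, so each bank holds a contiguous range."""
    return word_addr // words_per_bank, word_addr % words_per_bank


# Four consecutive words spread over four banks under low-order interleaving.
print([low_order_interleave(a, 4)[0] for a in range(4)])
```

Low-order interleaving favors parallel access to consecutive addresses, while high-order interleaving keeps each address range inside one bank.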
[0060] In specific application examples, during the process of generating memory access data masks, for large memory access granularity, each n-bit data granularity corresponds to a 1-bit mask; for small memory access granularity, the n-bit data granularity containing the small memory access granularity corresponds to a 1-bit mask.
[0061] In a specific application example, each bit of the memory access data mask enables or disables one group of n-granularity data, where a memory access granularity smaller than n is counted as one n-granularity group. The write memory access data mask enables k write-select bits, which control the writing of k n-granularity data groups and their k corresponding checksums to the redundant memory location addressed by the memory access address; the read memory access data mask enables k read-select bits, which control the reading of the corresponding data groups and checksums from the redundant memory.
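A minimal sketch of mask generation under stated assumptions: one memory row holds M groups of n bits, the access is described by a bit offset within the row and a size in bits, and bit i of the mask enables group i. The function name and parameters are hypothetical and are not taken from the patent.

```python
def access_mask(offset_bits: int, size_bits: int, n: int, M: int) -> int:
    """Return an M-bit mask with bit i set when n-bit group i is accessed."""
    if size_bits <= 0 or offset_bits < 0:
        raise ValueError("access must cover at least one bit")
    first = offset_bits // n
    last = (offset_bits + size_bits - 1) // n
    if last >= M:
        raise ValueError("access exceeds the row width")
    mask = 0
    for group in range(first, last + 1):
        mask |= 1 << group      # an access smaller than n still enables one group
    return mask


# n = 32, M = 4: an 8-bit access inside group 1 enables exactly that group.
print(bin(access_mask(40, 8, 32, 4)))
```

A 64-bit access at offset 0 enables two groups (k = 2), and a 128-bit access enables all four.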
[0062] In a specific application example, during the EDAC checksum generation process, the write data of the write request is divided into granularities n, and each n-granularity data is EDAC encoded to generate an EDAC checksum check_bits corresponding to each n-granularity data to achieve error detection and correction protection for the read data during read memory access; while selecting k n-granularity write memory access data according to the memory access data mask and writing them to the target address of the redundant memory, the generated k checksums are written to the corresponding address of the redundant memory.
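To make the encode and decode steps concrete, the sketch below generates check_bits for one n-bit group and corrects a single flipped bit on read. It uses an extended Hamming SEC-DED code as a stand-in, not the Hsiao code named in the background section, although both need 5 check bits for n = 8; all function names are illustrative.

```python
def _layout(n: int):
    """Hamming parity count r and the 1-based codeword positions that hold data."""
    r = 1
    while 2 ** r < n + r + 1:
        r += 1
    data_pos = [p for p in range(1, n + r + 1) if p & (p - 1)]  # skip powers of two
    return r, data_pos


def _ones(x: int) -> int:
    return bin(x).count("1")


def edac_encode(data: int, n: int) -> int:
    """Return check_bits: r Hamming parity bits plus one overall parity bit."""
    r, data_pos = _layout(n)
    ham = 0
    for i, pos in enumerate(data_pos):
        if (data >> i) & 1:
            ham ^= pos          # parity bit j covers positions with bit j set
    overall = (_ones(data) + _ones(ham)) & 1
    return ham | (overall << r)


def edac_decode(data: int, check_bits: int, n: int):
    """Return (data, status); corrects one flipped bit, detects two."""
    r, data_pos = _layout(n)
    stored_ham = check_bits & ((1 << r) - 1)
    stored_overall = (check_bits >> r) & 1
    syndrome = stored_ham ^ (edac_encode(data, n) & ((1 << r) - 1))
    odd = (_ones(data) + _ones(stored_ham) + stored_overall) & 1
    if syndrome == 0 and not odd:
        return data, "ok"
    if odd:                     # odd overall parity: a single-bit error
        if syndrome in data_pos:
            data ^= 1 << data_pos.index(syndrome)
        return data, "corrected"   # otherwise only a check bit was hit
    return data, "uncorrectable"   # even parity, nonzero syndrome: double error


check = edac_encode(0xA5, 8)
print(edac_decode(0xA5 ^ 0x10, check, 8))
```

In the circuit described here, one such encoder and decoder pair would serve each n-bit group, with the k check codes stored alongside the k data groups.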
[0063] In specific application examples, during the address resolution and addressing process, the redundant memory access ports for read and write memory access request operations are specifically located.
[0064] In specific application examples, macro definition technology is used for small-granularity write data. The small-granularity data of the small-granularity write request is concatenated into the defined n-granularity data and then EDAC encoded. The macro definition technology is used to realize the re-encapsulation of small-granularity write memory access.
[0065] This invention further addresses EDAC reliability design for small-granularity write data by employing macro definition technology. The small-granularity data of the small-granularity write request is concatenated into defined n-granularity data and then EDAC encoded, thereby achieving EDAC reliability design for small-granularity data.
[0066] As shown in Figure 2, the macro definition described above re-encapsulates small-granularity write accesses, replacing each small-granularity write access operation with the following steps:
[0067] Step S1: Read the n-granularity data A and its checksum from the redundant memory at the address corresponding to the small-granularity write request, perform EDAC decoding and error correction, and then write the result into a general-purpose register; this can be achieved with a single read memory access operation. The hardware aligns the address automatically, so the address of the read request does not need to be modified.
[0068] Step S2: Align the small-granularity write data B with the n-granularity data based on the memory access address offset and shift operation;
[0069] Step S3: Delete the old small-granularity data in the corresponding small-granularity data B position in granularity A to prepare for writing small-granularity B;
[0070] Step S4: Complete the concatenation of small-granularity B data and n-granularity A data to generate n-granularity data A' to be written;
[0071] Step S5: Encode the newly spliced n-granularity data A' using EDAC and write it into the same address space of the redundant memory; this operation can be achieved through a single write access operation.
[0072] The above operations are used to append small-granularity write request data into the n-granularity data, and replace small-granularity write operations with n-granularity read and write memory access operations, thereby achieving unified EDAC encoding protection.
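Steps S1 to S5 amount to a read-modify-write at n-bit granularity. The sketch below models them against a toy memory; it assumes n = 64, byte addressing, and little-endian placement, and it assumes that EDAC encoding and decoding happen inside the memory's read and write calls. The class and function names are hypothetical.

```python
N_BITS = 64                      # assumed EDAC granularity n
N_BYTES = N_BITS // 8


class RedundantMemory:
    """Toy model; EDAC encode/decode is assumed to happen inside read/write."""

    def __init__(self):
        self.words = {}
        self.reads = 0
        self.writes = 0

    def read_word(self, byte_addr: int) -> int:
        self.reads += 1
        return self.words.get(byte_addr // N_BYTES, 0)   # address auto-aligned

    def write_word(self, byte_addr: int, value: int) -> None:
        self.writes += 1
        self.words[byte_addr // N_BYTES] = value & ((1 << N_BITS) - 1)


def small_write(mem: RedundantMemory, byte_addr: int, value: int, size_bits: int) -> None:
    a = mem.read_word(byte_addr)                 # S1: read and correct A
    shift = (byte_addr % N_BYTES) * 8
    field = (1 << size_bits) - 1
    b = (value & field) << shift                 # S2: align B by address offset
    a &= ~(field << shift)                       # S3: clear the old data in A
    a_new = a | b                                # S4: splice B into A to form A'
    mem.write_word(byte_addr, a_new)             # S5: encode and write A'


mem = RedundantMemory()
mem.write_word(0, 0x1122334455667788)
small_write(mem, 2, 0xAB, 8)
print(hex(mem.read_word(0)))
```

Each small write costs one n-granularity read and one n-granularity write, which matches the single read access in S1 and the single write access in S5.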
[0073] In general high-performance computing and other applications, small-granularity write operations are relatively sparse. Therefore, the time delay introduced by extending small-granularity writes to the above five steps is acceptable. Furthermore, to ensure the comprehensiveness and completeness of this method, this invention not only considers the currently existing sparse small-granularity write scenario but also takes into account the extreme case of a large number of continuous small-granularity writes. For high-density continuous small-granularity write operations, this invention also proposes an optimization scheme for the above macro definition. The optimized operation steps are as follows:
[0074] Step S10: Based on the memory access address offset and shift operation, perform an alignment operation on the write data of p consecutive small-granularity write requests. Here, p is the granularity n divided by the size of a certain small granularity.
[0075] Step S20: Using the OR operation, concatenate the aligned p data points to obtain a complete n-granularity data A';
[0076] Step S30: Combine p small-granularity write requests into an n-granularity write request, perform EDAC encoding on A' and write it to redundant memory.
[0077] Compared to the original macro, the optimized macro reduces one read memory access operation and one old data deletion operation. Furthermore, by leveraging the contiguous address memory access characteristic, it merges multiple write memory access requests, further reducing the number of memory accesses. In addition, the finer the granularity of contiguous write operations, the more requests can be merged, thus improving the macro's performance.
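The optimized flow in steps S10 to S30 can be sketched as a pure merge of p consecutive small writes into one n-bit word. The little-endian ordering (the first request lands in the lowest bits) and the function name are assumptions made for the example.

```python
def merge_small_writes(values: list[int], size_bits: int, n_bits: int) -> int:
    """Combine p = n_bits // size_bits consecutive small writes into one word A'."""
    p = n_bits // size_bits
    if len(values) != p:
        raise ValueError(f"expected {p} consecutive writes, got {len(values)}")
    field = (1 << size_bits) - 1
    word = 0
    for i, value in enumerate(values):
        aligned = (value & field) << (i * size_bits)   # S10: shift into position
        word |= aligned                                # S20: OR the pieces together
    return word                                        # S30: write as one n-bit request


word = merge_small_writes([0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11], 8, 64)
print(hex(word))
```

Here eight byte writes collapse into one 64-bit write with no preceding read, and with 16-bit writes p drops to 4, so finer granularities merge more requests per word.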
[0078] A low-hardware-overhead EDAC circuit is designed for the memory access unit to achieve error detection and correction at n-bit data granularity. In a specific application example, applying the above-described method of this invention yields the EDAC circuit shown in Figure 3. The EDAC circuit mainly includes a redundant memory, a memory access decoding module, an EDAC encoding module, an EDAC decoding and error correction module, and a readout data selection module; among which:
[0079] Redundant memory is used to store accessed data and its checksum.
[0080] In practical applications, memory is generally organized by interleaving high-order or low-order addresses to improve data access bandwidth and enable parallel data operations, and read and write memory access addresses are implemented based on the memory access address.
[0081] In practical applications, during memory access operations, the memory access request, read / write type, memory access address, and data access granularity will enter the EDAC circuit through the memory access pipeline.
[0082] The EDAC circuit generates a memory access data mask based on the data access granularity: for large memory access granularity, each n-bit data granularity corresponds to a 1-bit mask; for small memory access granularity, the n-bit data containing the small memory access granularity corresponds to a 1-bit mask. This memory access data mask selectively enables or disables data groups for reading or writing, allowing selective reading or writing based on the memory access granularity. Each bit of the memory access data mask enables or disables one group of n-bit granularity data (a memory access granularity smaller than n is counted as one n-bit group). The write memory access data mask enables k write-select bits, which control the writing of k n-granularity data groups and their k corresponding checksums to the redundant memory location addressed by the memory access address; the read memory access data mask enables k read-select bits, which control the reading of the corresponding data groups and checksums from the redundant memory.
[0083] The memory access decoding module is used to resolve the memory access address and address the redundant memory for the incoming read or write memory access request, specifically locating the redundant memory access port for the read or write memory access request operation.
[0084] The EDAC encoding module is used to implement write memory access data encoding. Specifically, it generates EDAC checksums for the write data before it is written to memory. The write request data is divided into granularities n, and each n-granularity data is EDAC encoded to generate corresponding EDAC checksums (check_bits) for each n-granularity data, thus enabling error detection and correction protection during read memory access. Simultaneously, based on the memory access data mask, k n-granularity write memory access data are selected and written to the target address of the redundant memory, while the generated k checksums are written to the corresponding address in the redundant memory.
[0085] The EDAC decoding and error correction module decodes and corrects read memory access data. Specifically, during a read memory access operation, data is read from the redundant memory at the granularity selected by the read memory access data mask, and EDAC decoding and error correction are performed on the corresponding data at an n-bit data granularity.
[0086] The read data selection module is used to select data according to the memory access data granularity after decoding and correcting the read request.
[0087] The aforementioned EDAC circuit can effectively encode write request data of large memory access granularity, that is, granularity n and its k-fold multiples, and write the write data and its corresponding checksums into the redundant memory. For small-granularity read access requests, it first reads out the entire n-granularity data containing the requested data, together with its checksum, and performs the EDAC operation; the data selection module then selects and outputs the small-granularity data according to the memory access granularity. Therefore, this circuit supports the correct reading of data with granularity n, multiples of n, and any granularity less than n, and decodes and corrects the read data in units of n-bit data granularity.
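The final selection step for a small-granularity read can be sketched as a shift and mask applied to the already corrected n-bit word. Byte addressing and little-endian placement are assumptions, and the function name is illustrative.

```python
def select_read_data(corrected_word: int, byte_offset: int, size_bits: int) -> int:
    """Pick the requested small-granularity field out of a corrected n-bit word."""
    return (corrected_word >> (byte_offset * 8)) & ((1 << size_bits) - 1)


word = 0x1122334455667788        # n = 64 bits, after EDAC decoding and correction
print(hex(select_read_data(word, 2, 8)), hex(select_read_data(word, 4, 16)))
```

Because selection happens after decoding, a small read receives the same error correction as a full n-bit read.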
[0088] However, for small-granularity write access requests smaller than granularity n, the macro definition technique described above in this invention is used.
[0089] The invention will be described in detail below with reference to a specific application example.
[0090] The microprocessor memory access unit addressed in this invention implements one or more small-granularity memory access operations of less than n bits and supports several large-granularity memory access operations whose size is a positive integer k times n, where the maximum value of k is M. The granularity M × n is the maximum memory access bandwidth supported by the memory access unit, and the microprocessor requires the EDAC reliability design to be implemented at the larger memory access granularity n.
[0091] Figure 3 This is the hardware implementation of the EDAC circuit of the present invention. The EDAC circuit includes a memory access decoding module, M n-granularity EDAC encoding modules, M n-granularity EDAC decoding and error correction modules, redundant memory, and readout data selection module.
[0092] The blue arrows in the diagram indicate the write data flow for granularity n and its multiples. The write request, write data, write address, and write data granularity are input into the EDAC circuit via the memory access pipeline. First, the memory access decoding module obtains the corresponding port number of the redundant memory and the write access data mask (hereinafter referred to as the write mask). The write request, write address, and write mask are then connected to the corresponding port; at the same time, the write request and the k n-granularity write data are passed to the corresponding k n-granularity EDAC encoding modules, and the resulting encoded write data is connected to the data port of the redundant memory. Based on the write request at its input, the redundant memory locates the storage location corresponding to the write address and uses the write mask to select the k data items and their check codes from among the M and write them into the redundant memory.
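A minimal sketch of how a memory access data mask with one bit per n-bit granule might be derived from the address and the access granularity. The function name, byte addressing, and the mapping of address bits to granule slots are assumptions for illustration; the source only states that each n-bit granule corresponds to one mask bit and that an access smaller than n counts as one granule.

```python
def access_mask(byte_addr: int, gran_bits: int, n: int, m: int) -> int:
    """Return an m-bit mask with one bit set per n-bit granule touched by the access.

    Assumptions (not from the source): byte addressing, large granularities are
    exact multiples of n, and a row of the redundant memory holds m granules.
    """
    k = max(1, gran_bits // n)            # an access smaller than n counts as one granule
    first = (byte_addr * 8 // n) % m      # slot of the first granule inside the row
    if first + k > m:
        raise ValueError("access crosses the row of M granules")
    return ((1 << k) - 1) << first


# n = 64, M = 4: a 128-bit write at byte address 16 enables granules 2 and 3.
print(format(access_mask(16, 128, 64, 4), "04b"))   # 1100
# A 16-bit access at byte address 10 falls inside granule 1 and enables only that bit.
print(format(access_mask(10, 16, 64, 4), "04b"))    # 0010
```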
[0093] The green arrows in the diagram indicate the data flow for read operations at arbitrary memory access granularity. The read request, read address, and read data granularity are input into the EDAC circuit via the memory access pipeline. First, the memory access decoding module obtains the corresponding port number of the redundant memory and the read access data mask (hereinafter referred to as the read mask). Then, the read request, read address, and read mask are input to the corresponding port of the redundant memory, while the read request and read data granularity are input to the read data selection module. Based on the read request, the redundant memory locates the storage location corresponding to the read address, reads k n-granularity data items and their check codes according to the read mask, and outputs them to the corresponding n-granularity EDAC decoding and error correction modules. These modules decode and correct the input data at granularity n to obtain n-granularity data and output it to the read data selection module, which selects and outputs the data according to the read request and read data granularity.
[0094] The orange arrows in the diagram indicate the data flow of a small-granularity write operation. First, the small-granularity write operation is converted into an n-granularity read operation: following the green arrows, the n-granularity data corresponding to the small-granularity write address space is read from the redundant memory and cached in a general-purpose register. Then, the small-granularity data is spliced into the n-granularity data that was read, using arithmetic units such as the ALU. The new n-granularity data is then stored in the redundant memory at the small-granularity write address, following the blue arrows. Furthermore, the improved EDAC scheme of this invention supports concatenating multiple small-granularity data items into a single n-granularity data item, which eliminates the n-granularity read operation; the result is written directly to the redundant memory at the corresponding address along the blue-arrow path.
[0095] Figure 4 shows an example of the software macro definitions of the present invention, taking the n / 2 granularity macro definition as an example. Figure 4 (b) shows the code for writing small-granularity data directly to the redundant memory with EDAC encoding, which requires only two clock cycles. However, because the n-granularity EDAC circuit cannot correctly encode data smaller than granularity n, the program in Figure 4 (b) produces a memory access error. Figure 4 (a) is the n / 2 granularity macro definition, whose internal logic mainly consists of the following 7 steps:
[0096] Step ①: Load the small-granularity write address into the scalar address register ar9;
[0097] Step ②: Calculate the write address offset from the low-order bits of addr. This offset determines where the n / 2 granularity data is spliced into the n-granularity data, i.e., the high n / 2 portion or the low n / 2 portion. At the same time, read the n-granularity data from the address space pointed to by ar9 into a general-purpose register;
[0098] Step ③: To improve the robustness of the macro, guard against overflow of the n / 2 granularity data so that it cannot contaminate the other valid bits of the n-granularity data into which it is spliced;
[0099] Step ④: Align the small-granularity data within granularity n according to the address offset using a shift operation. If the offset is n / 2, the data is shifted to the high n / 2 position, with the low bits padded with zeros; if the offset is 0, the data remains unchanged;
[0100] Step ⑤: Use the AND operation to clear the corresponding old data in the n-granularity data;
[0101] Step ⑥: Use the OR operation to splice the small-granularity data into the n-granularity data to obtain the new n-granularity data;
[0102] Step ⑦: EDAC encode the new n-granularity data and write it back to the redundant memory at the original address.
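The seven steps above can be sketched in software as a read-modify-write on a modeled memory. This is an illustration under stated assumptions, not the macro of Figure 4 (a): the dictionary-based memory model, the function names, n = 64, byte addressing, and the single parity bit standing in for the real EDAC check code are all hypothetical.

```python
N = 64                      # assumed EDAC granularity n
HALF = N // 2               # the n / 2 small granularity
HALF_MASK = (1 << HALF) - 1


def toy_check(word: int) -> int:
    """Placeholder for EDAC encoding: one parity bit, NOT a real EDAC code."""
    return bin(word).count("1") & 1


def write_half_word(mem: dict, addr: int, value: int) -> int:
    """Write an n/2-bit value at byte address addr via read-modify-write."""
    base = addr & ~(N // 8 - 1)               # step 1: address of the enclosing n-bit word
    offset = (addr * 8) % N                   # step 2: offset is 0 or n/2
    old, _ = mem[base]                        # step 2: read the n-granularity data
    value &= HALF_MASK                        # step 3: guard against overflow
    aligned = value << offset                 # step 4: shift into the low or high half
    cleared = old & ~(HALF_MASK << offset)    # step 5: AND clears the old half
    new = cleared | aligned                   # step 6: OR splices in the new half
    mem[base] = (new, toy_check(new))         # step 7: re-encode and write back
    return new


mem = {0: (0x1111111122222222, toy_check(0x1111111122222222))}
print(hex(write_half_word(mem, 4, 0xAAAAAAAA)))    # 0xaaaaaaaa22222222
print(hex(write_half_word(mem, 0, 0x1BBBBBBBB)))   # 0xaaaaaaaabbbbbbbb
```

The second call passes a value one bit wider than n / 2; the overflow guard of step 3 drops the extra bit so the upper half written by the first call is not disturbed.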
[0103] Compared with Figure 4 (b), Figure 4 (a) increases the absolute runtime of a single small-granularity write operation by 12 clock cycles. However, since small-granularity memory access operations account for a negligible proportion of general high-performance computing applications, and write accesses are even less frequent than read accesses, the 12-cycle overhead added by calling the macro is tolerable. Furthermore, when consecutive small-granularity write operations exist, the macro can exploit hardware pipelining to achieve a certain degree of macro pipelining, thereby significantly reducing the overall latency of the macro calls.
[0104] This invention also considers the extreme case of numerous consecutive small-granularity write requests to contiguous addresses, and proposes an optimized macro definition scheme for this situation, shown in the example of Figure 4 (c), which gives users a flexible choice. The optimized n / 2 granularity macro implementation in Figure 4 (c) mainly consists of the following five steps:
[0105] Step ①: Load the small-granularity write address into the scalar address register ar9;
[0106] Step ②: Move Data2 to the high n / 2 position using a shift operation;
[0107] Step ③: Prevent Data1 from overflowing and improve macro robustness;
[0108] Step ④: Concatenate Data1 and Data2 into a single data item of granularity n;
[0109] Step ⑤: EDAC encode the spliced n-granularity data and write it into the address space pointed to by ar9.
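The five steps above can be sketched as follows. As before, this is a hypothetical model rather than the macro of Figure 4 (c): the memory dictionary, the function names, n = 64, and the parity bit used in place of the real EDAC check code are assumptions.

```python
N = 64                      # assumed EDAC granularity n
HALF = N // 2
HALF_MASK = (1 << HALF) - 1
WORD_MASK = (1 << N) - 1


def toy_check(word: int) -> int:
    """Placeholder for EDAC encoding: one parity bit, NOT a real EDAC code."""
    return bin(word).count("1") & 1


def write_two_half_words(mem: dict, base: int, data1: int, data2: int) -> int:
    """Combine two n/2-bit writes to one n-aligned address into a single n-bit write."""
    # step 1: base plays the role of the address held in the address register
    high = (data2 << HALF) & WORD_MASK    # step 2: shift Data2 into the high half
                                          #         (mask models an n-bit register)
    low = data1 & HALF_MASK               # step 3: keep Data1 from overflowing
    word = high | low                     # step 4: concatenate into one n-bit word
    mem[base] = (word, toy_check(word))   # step 5: encode and write, with no prior read
    return word


mem = {}
print(hex(write_two_half_words(mem, 0, 0x11111111, 0x22222222)))   # 0x2222222211111111
```

Because both halves of the word are supplied by software, the old contents never need to be read or cleared, which is where the saving over two read-modify-write calls comes from.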
[0110] Compared with two consecutive small-granularity write operations of Figure 4 (b), Figure 4 (c) adds only 3 clock cycles of overhead; compared with two consecutive small-granularity write operations of Figure 4 (a), it saves 21 clock cycles, significantly improving macro performance. Figure 4 takes the n / 2 granularity macro definition as an example; all other small-granularity macro definitions have also been implemented, and their implementation ideas and steps are consistent with Figure 4 and are not described in detail here. As a supplementary note, the registers used in the macro definitions are not limited to those in the Figure 4 example.
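The cycle counts quoted for Figure 4 can be cross-checked with a short calculation. The constant names are hypothetical; the three input figures (2, 12, and 3 cycles) are the ones stated in the description, and the sketch assumes the calls are not overlapped by pipelining.

```python
DIRECT_WRITE = 2        # Figure 4 (b): direct small-granularity write
MACRO_A_EXTRA = 12      # extra cycles of Figure 4 (a) over one Figure 4 (b) write
MACRO_C_EXTRA = 3       # extra cycles of Figure 4 (c) over two Figure 4 (b) writes

macro_a = DIRECT_WRITE + MACRO_A_EXTRA          # one read-modify-write macro call
macro_c = 2 * DIRECT_WRITE + MACRO_C_EXTRA      # one combined write of two halves
saving = 2 * macro_a - macro_c                  # two (a) calls versus one (c) call

print(macro_a, macro_c, saving)                 # 14 7 21
```

The result matches the 21-cycle reduction stated in the text.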
[0111] The above are merely preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should be considered within the scope of protection of the present invention.
Claims
1. A hardware-software co-operated EDAC implementation method for a memory access unit, characterized in that it includes:
Storing memory access data and its checksum;
Generating a memory access data mask: a memory access data mask is generated based on the data memory access granularity;
Address resolution and addressing: incoming read or write memory access requests are resolved and addressed to the redundant memory;
EDAC checksum generation: an EDAC checksum is generated before write access data is written to memory;
Decoding and error correction of read access data: during a read access operation, the read request reads data from the redundant memory according to the granularity indicated by the read access data mask, and the corresponding data is EDAC decoded and corrected at n-bit data granularity;
After the read request has been decoded and corrected, data is selected according to the memory access data granularity;
For small-granularity write data, a macro definition technique is used to concatenate the small-granularity data of the small-granularity write request into the defined n-granularity data, which is then EDAC encoded; the macro definition technique repackages small-granularity write accesses;
The processing flow for sparse small-granularity write data includes:
Step S1: Read the n-granularity data A and its check code from the redundant memory at the address corresponding to the small-granularity write request, perform EDAC decoding and error correction, and then write the result into a general-purpose register; this operation can be achieved through a single read memory access operation, and the hardware can automatically align the address without modifying the address of the read request;
Step S2: Align the small-granularity write data B with the n-granularity data based on the memory access address offset and a shift operation;
Step S3: Clear the old small-granularity data at the position in the n-granularity data A that corresponds to the small-granularity data B, in preparation for writing B;
Step S4: Concatenate the small-granularity data B with the n-granularity data A to generate the n-granularity data A' to be written;
Step S5: EDAC encode the newly spliced n-granularity data A' and write it into the same address space of the redundant memory; this operation can be achieved through a single write memory access operation.
2. The hardware-software co-operated EDAC memory access unit implementation method according to claim 1, characterized in that, during the storage of memory access data and its checksum, the memory is organized by interleaving on high-order or low-order address bits to improve data access bandwidth and enable parallel data operations; read and write addressing is performed using the memory access address; and during a memory access operation, the memory access request, read / write type, memory access address, and data access granularity enter the EDAC circuit through the memory access pipeline.
3. The hardware-software co-operated EDAC memory access unit implementation method according to claim 1, characterized in that, During the generation of memory access data masks, for large memory access granularity, each n-bit data granularity corresponds to a 1-bit mask; for small memory access granularity, the n-bit data granularity containing the small memory access granularity corresponds to a 1-bit mask.
4. The hardware-software co-operated EDAC memory access unit implementation method according to claim 3, characterized in that, each bit of the memory access data mask controls the enabling or disabling of one group of n-granularity data, where a memory access granularity smaller than n is counted as one n granularity; the write access data mask enables ... write-select bits, which control the writing of the k n-granularity data and their corresponding k encoded check codes, i.e., ... bits of data, into the redundant memory addressed by the memory access address; the read access data mask enables ... read-select bits, which control the reading of ... bits of data from the redundant memory.
5. The method for implementing the hardware-software co-operated EDAC memory access unit according to claim 1, characterized in that, During the EDAC checksum generation process, the write data of the write request is divided into granularities n, and each n-granularity data is EDAC encoded to generate an EDAC checksum check_bits corresponding to each n-granularity data to achieve error detection and correction protection for the read data during memory access; while selecting k n-granularity write memory access data according to the memory access data mask and writing them to the target address of the redundant memory, the generated k checksums are written to the corresponding address of the redundant memory.
6. The hardware-software co-operated EDAC memory access unit implementation method according to claim 1, characterized in that, During address resolution and addressing, the redundant memory access ports for read and write memory access requests are specifically located.
7. The hardware-software co-operated EDAC memory access unit implementation method according to claim 1, characterized in that, steps S1 and S5 are each implemented through a single memory access operation, a read access for step S1 and a write access for step S5.
8. The hardware-software co-operated EDAC memory access unit implementation method according to claim 1, characterized in that, the processing flow for high-density, continuous, small-granularity write data includes: Step S10: Based on the memory access address offset and a shift operation, align the write data of p consecutive small-granularity write requests, where p is the granularity n divided by the size of the small granularity concerned; Step S20: Using the OR operation, concatenate the p aligned data items to obtain a complete n-granularity data A'; Step S30: Combine the p small-granularity write requests into one n-granularity write request, EDAC encode A', and write it to the redundant memory.