Hierarchical adder tree structure module with fine-grained, accurate, and reconfigurable approximate computation

By designing a fine-grained, precise, and reconfigurable hierarchical addition tree structure module, the problem of low energy efficiency in existing approximate adders with multiple data inputs is solved, and low-power, high-precision approximate addition calculations are achieved.

CN115220689B · Active · Publication Date: 2026-06-26 · NANJING RES INST OF ELECTRONICS TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: NANJING RES INST OF ELECTRONICS TECH
Filing Date: 2022-07-29
Publication Date: 2026-06-26

Figure CN115220689B_ABST

Abstract

The application relates to a hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate calculation, comprising a multi-data input module, an adder tree structure generation module, and a calculation and result output module. The multi-data input module receives the addends, defines the approximate adder calculation, and receives the precision configuration required by the user. The input of the adder tree structure generation module receives the addends; after initialization, the number of addition tree layers is transmitted to the approximate adder tree structure module to complete generation of the fine-grained, accurate, and reconfigurable approximate adder module. The calculation and result output module performs the approximate addition using the approximate adder generated in the previous stage to complete the final approximate calculation task, and controls the precision required by each layer during the approximate calculation. The application realizes approximate addition on a data vector with higher energy efficiency and appropriate precision, and solves the problem that existing approximate adder configuration schemes cannot cope with data vector calculation.

Description

Technical Field

[0001] This invention relates to the field of integrated circuit design and manufacturing technology, and in particular to a hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate calculation.

Background Technology

[0002] Functional accuracy is a primary requirement for traditional arithmetic circuits. However, for some applications, arithmetic processing can be performed on an "inaccurate" or "approximate" basis, and approximate arithmetic circuits have been widely considered for fault-tolerant applications. To improve adder performance, approximate computation incorporates accuracy into a trade-off, sacrificing some accuracy for a significant performance boost. Introducing approximate designs into adders can reduce area and power consumption within a tolerable range of accuracy loss.

[0003] Some typical approximate adders include low-order OR gate adders, fault-tolerant adders, precision-configurable adders, carry-predictive selection adders, and carry-skip adders. These approximate adders are all used for multi-bit computations, performing precise calculations on bits with higher weights, while using approximate methods to calculate bits with lower weights.

[0004] However, previous approximate adder architectures were designed for single 16-bit data inputs. Accelerating convolution in CNNs typically involves 3x3 convolution operations, each of which requires computation on a data vector of nine 16-bit values. Therefore, this application aims to design a hierarchical adder tree structure with fine-grained, accurate, and reconfigurable approximate computation that handles data vectors with multiple data inputs and achieves higher energy efficiency.

Summary of the Invention

[0005] To address the existing technical problems, this invention provides a hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate calculations.

[0006] The specific content of the present invention is as follows: a hierarchical adder tree structure module with fine-grained, accurate, and reconfigurable approximate calculation, including a multi-data input module, an adder tree construction and generation module, and a calculation and result output module;

[0007] The input terminal A of the multi-data input module receives n1 addends and defines the approximate adder calculation as the addition of n1 numbers. Its input terminal B receives the user's required precision configuration n2 and transmits it to the precision control module.

[0008] The adder tree construction and generation module receives n1 addends from the input module and passes them through. Its adder generation process calls both the approximate adder calculation unit library and the full-precision adder calculation unit library. Its initialization process configures the adder tree layer number based on the number of addends input by the user. The initialized adder tree layer number is then transmitted to the approximate adder tree construction module, completing the generation of a fine-grained, accurate, and reconfigurable approximate adder module.

[0009] The calculation and result output module performs the approximate addition using the approximate adder generated in the previous stage to complete the final approximate calculation task. At the same time, this module transmits the precision configuration requirements in the input signal to the precision control module, which controls the precision required by each layer during the approximate calculation. Finally, the output link completes the data output.

[0010] Furthermore, the multiple data input module is a 16-bit data input module, with input terminal A receiving n1 16-bit addends.

[0011] Furthermore, the 16-bit multi-data input module receives 16-bit input data, determines the number of data points, and obtains the precision configuration required by the user.

[0012] Furthermore, the 16-bit multiple data input module's data processing steps include:

[0013] Step 101: Define the number of data points in the input data vector, and initialize the system's calculation bit unit based on the number of input data points n1;

[0014] Step 102: Input the precision configuration data n2 as the control signal for subsequent calculations to complete the precision control signal initialization;

[0015] Step 103: Precision compliance check. The required precision configuration n2 is checked against the data bit width requirements. If it meets them, it is passed to the next level. If it does not, the user is required to re-enter it.

[0016] Step 104: The final integration module will integrate the data that has been processed and output it to the next level system.
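Steps 101 to 104 can be sketched in software as follows. This is only an illustrative model, not the circuit described here: the names `InputBundle` and `prepare_inputs` are hypothetical, the addends are assumed to be unsigned, and n2 is assumed to count low-order bits so that the bit-width check reduces to 0 <= n2 <= 16. The text does not define the exact meaning of n2.

```python
from dataclasses import dataclass
from typing import List

DATA_WIDTH = 16  # bit width of each addend in the 16-bit input module


@dataclass
class InputBundle:
    """Checked inputs handed to the next stage (hypothetical container)."""
    addends: List[int]  # the n1 addends received at terminal A
    n1: int             # number of addends (Step 101)
    n2: int             # precision configuration from terminal B (Step 102)


def prepare_inputs(addends: List[int], n2: int) -> InputBundle:
    # Step 101: determine the number of data points in the input vector.
    n1 = len(addends)
    if n1 < 2:
        raise ValueError("at least two addends are needed")
    for x in addends:
        # ASSUMPTION: addends are unsigned 16-bit values.
        if not 0 <= x < (1 << DATA_WIDTH):
            raise ValueError(f"addend {x} does not fit in {DATA_WIDTH} bits")
    # Steps 102-103: accept n2 and check it against the data bit width.
    # ASSUMPTION: n2 is a bit count, so it must lie in [0, DATA_WIDTH].
    if not 0 <= n2 <= DATA_WIDTH:
        raise ValueError("precision configuration rejected; re-enter n2")
    # Step 104: integrate the checked data for the next stage.
    return InputBundle(addends=list(addends), n1=n1, n2=n2)
```

A rejected n2 raises an error here, which stands in for the "re-enter" path of Step 103.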

[0017] Furthermore, the adder tree construction generation module includes:

[0018] Single-layer adder structure module: The single-layer adder structure is used for the generation of adders at each level of the adder tree;

[0019] Addition tree generation module based on computational requirements: this module configures the number of addition tree layers according to the number of addends to be calculated; the addition tree accumulates multiple operands layer by layer, and operands and intermediate calculation results can be added in pairs;

[0020] Approximate adder generation module: for each layer of the addition tree produced by the addition tree generation module based on computational requirements, this unit configures adders from the single-layer adder structure module. Each layer is configured with both an approximate adder and a full-precision adder. The precision control module controls which adder is used in the final calculation at each layer, thus completing the generation of an accurate and reconfigurable approximate computation adder.

[0021] Furthermore, in the single-layer adder structure module, four types of adders are required for subsequent adder generation: the first adder structure is a full-precision adder FA; the second adder structure is AXA1, with the low-order bits consisting of OR gate adders and the high-order bits consisting of reconfigurable ORA and FA; the third adder structure is AXA2, with the low-order bits consisting of OR gate adders and the high-order bits consisting of reconfigurable TGA2 and FA.
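The low-order OR gate part of AXA1 and AXA2 can be illustrated with a small behavioural model. The sketch below is a simplified stand-in, not the AXA1 or AXA2 circuit: the function name `lower_or_add` and the parameter `k` (number of low-order bits handled by OR gates) are hypothetical, no carry is passed from the low part to the high part, and the reconfigurable high-order section (ORA, TGA2) is replaced by an exact addition.

```python
def lower_or_add(a: int, b: int, k: int) -> int:
    """Approximate a + b for non-negative integers.

    The k low-order bits are produced by a bitwise OR (no carry chain);
    the remaining high-order bits are added exactly. No carry is
    propagated from the low part into the high part.
    """
    if a < 0 or b < 0 or k < 0:
        raise ValueError("a, b and k must be non-negative")
    mask = (1 << k) - 1
    low = (a | b) & mask          # OR-gate approximation of the low bits
    high = (a >> k) + (b >> k)    # exact addition of the high bits
    return (high << k) | low


# With k = 0 the result is the exact sum; with k > 0 it can only fall short
# of the exact sum, because OR never exceeds the true sum of the low parts.
exact = lower_or_add(5, 3, 0)
approx = lower_or_add(5, 3, 2)
```

Because the OR of two bit fields is never larger than their sum, this model only underestimates, which is consistent with the negative error attributed to ORA in this description.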

[0022] Furthermore, the addition tree generation module based on computational requirements accumulates multiple operands in a hierarchical manner. Operands and intermediate calculation results can be added in pairs. Except for the last operand, all operands are added in pairs in the first level. The last operand is added, in the last level, to the accumulated result of all operands preceding it.

[0023] Furthermore, the adder tree construction and generation module includes the following data processing steps:

[0024] Step 201: The adder number generation module receives the number of input data passed from the previous level to this level. First, it sets the number of levels in the adder tree.

[0025] Step 202: Preload the single-layer adder structure modules to be used according to the number of layers of the generated adder tree, including full-precision adders FA, AXA1 and AXA2;

[0026] Step 203: Generate an approximate adder. Each layer contains a single-layer FA full-precision adder and a single-layer approximate adder, as well as a precision control block that takes the precision input and selects whether FA or approximate addition is used during calculation; in the reconfigurable parts of adjacent layers, TGA2 (positive error) and ORA (negative error) are used alternately so that their errors compensate each other.
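The outcome of steps 201 to 203 can be pictured as a per-layer plan naming the adder type used in each layer. The sketch below is an assumption-laden illustration: `plan_layers` and the boolean list `approx_layers` are hypothetical (the text does not say how n2 is decoded into per-layer choices), the layer count is taken as ceil(log2(n1)), which agrees with the nine-input and twenty-five-input examples in this description, and the odd/even assignment follows the statement that AXA1 is used in odd-numbered layers and AXA2 in even-numbered layers.

```python
from typing import List


def plan_layers(n1: int, approx_layers: List[bool]) -> List[str]:
    """Return the adder type selected for each layer of the tree.

    approx_layers[i] is True when layer i+1 should use its approximate
    adder and False when it should use the full-precision adder FA.
    """
    if n1 < 2:
        raise ValueError("at least two addends are needed")
    # Step 201: set the number of layers (ASSUMPTION: ceil(log2(n1))).
    depth = (n1 - 1).bit_length()
    if len(approx_layers) != depth:
        raise ValueError(f"expected {depth} per-layer flags")
    plan = []
    for layer in range(1, depth + 1):
        # Steps 202-203: every layer holds FA plus one approximate adder;
        # the precision control picks one of the two.
        if not approx_layers[layer - 1]:
            plan.append("FA")
        elif layer % 2 == 1:
            plan.append("AXA1")   # odd-numbered layers
        else:
            plan.append("AXA2")   # even-numbered layers
    return plan


# Nine addends, approximate in the first two layers, exact afterwards.
nine_input_plan = plan_layers(9, [True, True, False, False])
```

Keeping the exact adder in the later layers is one plausible policy, since errors introduced there affect higher-weight partial sums; the text leaves the actual policy to the precision control module.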

[0027] Furthermore, the calculation and result output module includes:

[0028] The approximate calculation unit module uses the adder generated in the previous-level approximate adder generation module to complete the approximate addition calculation;

[0029] The precision control module sets the selection of adders for each layer of the approximate adders in the calculation process according to the required precision of the input.

[0030] The approximate calculation result output module is interconnected with other subsequent circuits to complete data transmission; it has basic logic transmission and level isolation functions.

[0031] Furthermore, the data processing steps in the calculation and result output module include:

[0032] Step 301: Receive the data and the generated approximate adder from the first two modules, and prepare for calculation;

[0033] Step 302: Perform parallel computation, during which the precision configuration n2 of the previous stage is called to select the adders in each layer of the approximate adder;

[0034] Step 303: The synchronous timing control module detects the timing of the output signal to ensure that the data output of the current cycle has been completed before writing the data of the next cycle, so as to avoid logical conflicts;

[0035] Step 304: The final output module will output the data from the final calculation to the final peripheral unit.

[0036] This invention generates an approximate adder for calculating data vectors within the allowable error range, based on the approximate accuracy requirements in approximate calculation technology. This achieves approximate addition calculations for data vectors, resulting in higher energy efficiency and appropriate accuracy, and solves the problem that existing approximate adder configurations cannot handle data vector calculations.

Attached Figure Description

[0037] The specific embodiments of the present invention will be further explained below with reference to the accompanying drawings.

[0038] Figure 1 is an overall architecture diagram of a hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate calculation according to the present invention;

[0039] Figure 2 shows the 3*3 nine-input adder tree structure in the adder tree construction generation module of the present invention;

[0040] Figure 3 shows the 5*5 twenty-five-input adder tree structure in the adder tree construction generation module of the present invention;

[0041] Figure 4 shows the adders in the single-layer adder structure module of the present invention.

Detailed Implementation

[0042] With reference to Figure 1, this invention builds on existing approximate calculation modules and newly designs a hierarchical adder tree configuration scheme with fine-grained, accurate, and reconfigurable approximate calculation. It generates a reconfigurable multi-layer approximate adder that performs approximate addition on the data vectors to be calculated while keeping the error of the approximate addition within an acceptable range, thus greatly improving the reliability of the approximation scheme. Specifically, this invention includes a 16-bit multi-data input module, an adder tree construction and generation module, and a calculation and result output module, wherein:

[0043] The 16-bit multiple data input module has input A receiving n1 16-bit addends and defining the approximate adder calculation as the addition of n1 numbers; its input B receiving the user's required precision configuration n2 and transmitting it to the precision control module.

[0044] The adder tree construction and generation module receives n1 16-bit addends from the input module. Its adder generation process calls both the approximate adder computation unit library and the full-precision adder computation unit library. Its initialization process configures the adder tree layer number based on the number of addends input by the user. The initialized adder tree layer number is then transferred to the approximate adder tree construction module, completing the generation of a fine-grained, accurate, and reconfigurable approximate adder module.

[0045] The calculation and result output module performs the approximate addition using the approximate adder generated in the previous stage to complete the final approximate calculation task. At the same time, this module transmits the precision configuration requirements in the input signal to the precision control module, which controls the precision required by each layer during the approximate calculation. Finally, the output link completes the data output.

[0046] The 16-bit multiple data input module receives 16-bit input data, determines the number of data points, and obtains the precision configuration required by the user.

[0047] The adder tree construction generation module includes:

[0048] First, the single-layer adder structure module is used for generating adders at each level of the adder tree. This module contains four types of adders required for subsequent adder generation. The first adder structure is a full-precision adder FA; the second adder structure is AXA1, with the low-order bits consisting of an OR gate adder (ORA) and the high-order bits consisting of a reconfigurable ORA and FA (RA1); the third adder structure is AXA2, with the low-order bits consisting of an OR gate adder (ORA) and the high-order bits consisting of a reconfigurable TGA2 and FA (RA2). Figure 4 shows the adders in the single-layer adder structure module of this invention.

[0049] Second, the addition tree generation module based on computational requirements. This module configures the number of addition tree levels according to the number of addends to be calculated. The addition tree accumulates multiple operands hierarchically, with operands and intermediate calculation results added in pairs. Except for the last operand, all operands are added pairwise in the first level. The last operand is added, in the last level, to the sum of all operands preceding it. The addition tree employs a parallel structure, which can be used as a pipelined computation structure with a reduced transistor count. To prevent data overflow during accumulation and reduce errors, the bit width of each adder can be increased by 1. Figure 2 and Figure 3 show the 3*3 nine-input and 5*5 twenty-five-input adder trees, respectively. The nine-input adder tree is processed in 4 layers: operand [9] is added to 3#1, the cumulative result of all other operands produced in the 3rd layer. The twenty-five-input adder tree is processed in 5 layers: operand [25] is added to 3#1, an intermediate calculation result produced in the 3rd layer.
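The pairing rule described above can be turned into a small schedule generator. The sketch below is one possible reading of the text, not the patented generator: `build_schedule` is a hypothetical name, labels such as `[9]` and `3#1` imitate the notation of the description, and the rule "hold the last operand back until a layer has an odd number of intermediate results, then pair it with the first of them" is an assumption chosen because it reproduces both examples (4 layers for nine inputs, 5 layers for twenty-five inputs).

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, str]


def build_schedule(n1: int) -> List[List[Pair]]:
    """Return, for each layer, the list of (left, right) labels added there."""
    if n1 < 2:
        raise ValueError("at least two addends are needed")
    current = [f"[{i}]" for i in range(1, n1 + 1)]
    held: Optional[str] = None
    if n1 % 2 == 1:
        held = current.pop()  # the last operand waits outside the first layer
    layers: List[List[Pair]] = []
    level = 0
    while len(current) > 1 or held is not None:
        level += 1
        if len(current) % 2 == 1 and held is not None:
            # Odd number of values: fold the held operand in next to the first.
            current = [current[0], held] + current[1:]
            held = None
        pairs = [(current[i], current[i + 1])
                 for i in range(0, len(current) - 1, 2)]
        results = [f"{level}#{j}" for j in range(1, len(pairs) + 1)]
        if len(current) % 2 == 1:
            results.append(current[-1])  # unpaired value passes through
        layers.append(pairs)
        current = results
    return layers


nine = build_schedule(9)          # 4 layers; layer 4 adds 3#1 and [9]
twenty_five = build_schedule(25)  # 5 layers; layer 4 adds 3#1 and [25]
```

In both cases the held operand meets the result labelled 3#1, matching the description of Figures 2 and 3.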

[0050] Third, the approximate adder generation module. For each layer of the addition tree produced by the addition tree generation module based on computational requirements, this unit configures adders from the single-layer adder structure module. Each layer is configured with both an approximate adder and a full-precision adder. The precision control module controls which adder is used in the final calculation at each layer, thus completing the generation of an accurate and reconfigurable approximate computation adder.

[0051] The calculation and result output module includes:

[0052] First, the approximate calculation unit module. This module uses the adder generated in the previous-level approximate adder generation module to complete the approximate addition calculation.

[0053] Second, the precision control module. This module sets the adder selection for each layer of the approximate adder in the calculation process according to the required precision of the input.

[0054] Third, the approximate calculation result output module. This module is used to interconnect with other subsequent circuits to complete the data transmission function; it has basic logic transmission and level isolation functions.

[0055] The working process of each module is as follows:

[0056] The workflow of the 16-bit multiple data input module specifically includes steps 101 to 104, as follows:

[0057] Step 101: Define the number of data points in the input data vector, and initialize the system's bit-number unit based on the number of input data points n1.

[0058] Step 102: Input the precision configuration data n2 as the control signal for subsequent calculations to complete the precision control signal initialization.

[0059] Step 103: Precision compliance check. The required precision configuration n2 is checked against the data bit width requirements. If it meets them, it is passed to the next level. If it does not, the user is required to re-enter it.

[0060] Step 104: The final integration module will integrate the data that has been processed and output it to the next level system.

[0061] The workflow of the adder tree construction module specifically includes steps 201 to 203, as follows:

[0062] Step 201: The adder number generation module receives the number of input data passed from the previous level to this level. First, it sets the number of levels in the adder tree.

[0063] Step 202: Preload the single-layer adder structure modules to be used according to the number of layers of the generated adder tree, including full-precision adders FA, AXA1 and AXA2.

[0064] Step 203: Generate approximate adders. Each layer contains a single-layer FA full-precision adder and a single-layer approximate adder, as well as a precision control block that takes the precision input and selects whether FA or approximate addition is used during computation. In the reconfigurable parts of adjacent layers, positive-error TGA2 and negative-error ORA are used alternately so that their errors compensate each other. Taking a two-layer accumulation as an example, it was first determined that RA1 in the first-layer AXA1 uses TGA2, which has a positive error. Then, a library of 430 approximate 8-bit adders, automatically generated using Cartesian genetic programming (CGP) and a multi-objective genetic algorithm, was searched for the negative-error approximate adder that best matches TGA2. Experimental results show that using ORA for RA2 in the second-layer AXA2 minimizes the structural error of the two-layer accumulation. Using AXA1 in odd-numbered layers and AXA2 in even-numbered layers minimizes the final error.
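The sign of the ORA error and the benefit of alternating signs can be made explicit with a short derivation. The symbols e_1 and e_2 below are notation introduced here for illustration only; the first part is a standard bitwise identity for non-negative integers, and the second part is a generic bound that says nothing about the actual error magnitudes of TGA2.

```latex
% Bitwise identities for non-negative integers a, b:
%   a + b    = (a \oplus b) + 2\,(a \wedge b)
%   a \vee b = (a \oplus b) + (a \wedge b)
% Subtracting gives the error of an OR-based adder:
\[
  (a \vee b) - (a + b) = -(a \wedge b) \le 0 .
\]
% Two-layer accumulation with per-layer errors e_1 (positive-error layer)
% and e_2 (negative-error layer), where s is the exact sum:
\[
  \tilde{s} = s + e_1 + e_2, \qquad e_1 \ge 0,\; e_2 \le 0
  \;\Longrightarrow\;
  |e_1 + e_2| \le \max\bigl(|e_1|, |e_2|\bigr) .
\]
```

In other words, an OR-based adder can only underestimate, and pairing it with a layer that overestimates keeps the combined error no larger than the worse of the two individual errors, whereas two errors of the same sign would add up.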

[0065] The workflow of the calculation and result output module specifically includes steps 301 to 304, as follows:

[0066] Step 301: Receive the data and the generated approximate adder from the first two modules, and prepare to perform calculations.

[0067] Step 302: Perform parallel computation, during which the previous precision configuration n2 is called to select the adders in each layer of the approximate adder.

[0068] Step 303: The synchronous timing control module detects the timing of the output signal to ensure that the data output of the current cycle has been completed before writing the data for the next cycle, thus avoiding logical conflicts.

[0069] Step 304: Final output module. The data from the final calculation will be output to the final peripheral unit via this module.
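Steps 301 and 302 amount to reducing the data vector layer by layer with the adder selected for each layer. The following behavioural sketch rests on several assumptions: `tree_sum`, `exact_add` and `make_lower_or_adder` are hypothetical names, the approximate adder is a plain low-order OR model rather than AXA1 or AXA2, an unpaired value is simply carried forward to the next layer, and the timing control of Step 303 is not modelled.

```python
from typing import Callable, List, Sequence

Adder = Callable[[int, int], int]


def exact_add(a: int, b: int) -> int:
    """Full-precision stand-in for FA."""
    return a + b


def make_lower_or_adder(k: int) -> Adder:
    """Build an adder whose k low-order bits use OR (simplified model)."""
    mask = (1 << k) - 1

    def add(a: int, b: int) -> int:
        return (((a >> k) + (b >> k)) << k) | ((a | b) & mask)

    return add


def tree_sum(values: Sequence[int], layer_adders: Sequence[Adder]) -> int:
    """Reduce values pairwise, using layer_adders[i] in layer i+1."""
    if not values:
        raise ValueError("no addends given")
    depth = (len(values) - 1).bit_length()
    if len(layer_adders) < depth:
        raise ValueError(f"need {depth} layer adders")
    current: List[int] = list(values)
    layer = 0
    while len(current) > 1:
        add = layer_adders[layer]  # Step 302: per-layer adder selection
        nxt = [add(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            nxt.append(current[-1])  # unpaired value is carried forward
        current = nxt
        layer += 1
    return current[0]  # Step 304: final result handed to the output stage


data = [3] * 9  # nine addends, as in a 3*3 window
exact_result = tree_sum(data, [exact_add] * 4)
approx_result = tree_sum(data, [make_lower_or_adder(2)] + [exact_add] * 3)
```

Comparing `approx_result` with `exact_result` shows how much accuracy is traded away when only the first layer is approximate; choosing which layers receive `exact_add` plays the role of the precision configuration.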

[0070] The adder disclosed in this invention possesses the functions of approximate calculation of data vectors and generation of multi-level parallel adder trees. Its basic function is to achieve low-power, high-precision approximate addition calculations, specifically implementing the following functions: preprocessing the input data vector to obtain the number of 16-bit data points in the input data vector; setting the number of levels in the approximate adder tree based on the number of input data points; constructing a multi-level reconfigurable adder tree structure based on the single-level approximate calculation module to complete the approximate adder generation, and outputting the final calculation result data. Simultaneously, during the calculation process, the use of approximate calculation in each level of the approximate calculation adder is selected according to the accuracy configuration requirements to meet the accuracy requirements.

[0071] In the adder tree construction process, this application selects the most suitable reconfigurable single-layer approximate adder module configuration for each layer based on preliminary experimental results, thus completing the configuration of the approximate adder circuit. By using approximate adders with positive and negative errors for odd and even layers, the calculation error of each layer can be minimized. Simultaneously, the selection between approximate calculation and full-precision adders is achieved through a control unit, configured according to the required accuracy. This approach balances the accuracy requirements of approximation during calculation while also reducing power consumption and improving energy efficiency.

[0072] Many specific details have been set forth in the foregoing description to provide a thorough understanding of the present invention. However, the above description is merely a preferred embodiment of the present invention, and the present invention can be implemented in many other ways different from those described herein. Therefore, the present invention is not limited to the specific embodiments disclosed above. Furthermore, any person skilled in the art can make many possible variations and modifications to the technical solutions of the present invention, or modify them into equivalent embodiments, using the methods and techniques disclosed above, without departing from the scope of the present invention. Any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention, without departing from the content of the present invention, shall still fall within the protection scope of the present invention.

Claims

1. A hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate computation, characterized in that: It includes a multi-data input module, an adder tree construction and generation module, and a calculation and result output module; The input terminal A of the multi-data input module receives n1 addends and defines the approximate adder calculation as the addition of n1 numbers. Its input terminal B receives the user's required precision configuration n2 and transmits it to the precision control module. The adder tree construction and generation module receives n1 addends from the multiple data input module. Its adder generation process calls both the approximate adder calculation unit library and the full-precision adder calculation unit library. Its initialization process configures the adder tree layer number based on the user-input addend number. The initialized adder tree layer number is then transmitted to the approximate adder tree construction module, completing the generation of a fine-grained, accurate, and reconfigurable approximate adder module. The calculation and result output module performs approximate addition operations on the approximate adder generated in the previous stage to complete the final approximate calculation task. At the same time, this module transmits the precision configuration requirements in the input signal to the precision control module to control the precision required by each layer in the approximate calculation process. Finally, the output link completes the data output. The multiple data input module is a 16-bit data input module, and input terminal A receives n1 16-bit addends; The 16-bit multiple data input module can receive 16-bit input data, determine the number of data, receive input data, and obtain the precision configuration required by the user. 
The data processing steps of the 16-bit multiple data input module include: Step 101: Define the number of data points in the input data vector, and initialize the system's calculation bit unit based on the number of input data points n1; Step 102: Input the precision configuration data n2 as the control signal for subsequent calculations to complete the precision control signal initialization; Step 103: Precision compliance check. The required precision configuration n2 is checked to see if it meets the data bit width requirements. If it does, it will be passed to the next level. If it does not meet the requirements, it will be required to re-enter the data; Step 104: The final integration module will integrate the processed input data and output it to the next level system; The adder tree construction and generation module's data processing includes: Step 201: The adder number generation module receives the number of input data passed from the previous level to this level. First, it sets the number of levels in the adder tree; Step 202: Preload the single-layer adder structure modules to be used according to the number of layers of the generated adder tree, including full-precision adders FA, AXA1 and AXA2; Step 203: Generate an approximate adder, each layer containing a single-layer FA full-precision adder and a single-layer approximate adder, as well as a precision control block that requires precision input to select whether to use FA or approximate addition during calculation; in the reconfigurable part of adjacent layers, a structure is used alternately for error compensation using TGA2 with positive error and ORA with negative error.

2. The hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate computation according to claim 1, characterized in that: The adder tree construction generation module includes: Single-layer adder structure module: The single-layer adder structure is used for the generation of adders at each level of the adder tree; The addition tree generation module is based on computational requirements: the addition tree generation module configures the number of addition tree layers according to the number of addends to be calculated; the addition tree layers handle the accumulation of multi-digit numbers, and operands and intermediate calculation results can be added in pairs; Approximate adder generation module: This unit configures adders in the single-layer adder structure module at each layer according to the addition tree generated by the addition tree generation module based on the computational requirements. Each layer is configured with an approximate adder and a full-precision adder respectively. The precision control module controls which adder module is used in the final calculation at each layer, thus completing the generation of an accurate and reconfigurable approximate calculation adder.

3. The hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate computation according to claim 2, characterized in that: In the single-layer adder structure module, four types of adders are required for subsequent adder generation: the first adder structure is a full-precision adder FA; the second adder structure is AXA1, with the low-order bits consisting of OR gate adders and the high-order bits consisting of reconfigurable ORA and FA; the third adder structure is AXA2, with the low-order bits consisting of OR gate adders and the high-order bits consisting of reconfigurable TGA2 and FA.

4. The hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate computation according to claim 2, characterized in that: The addition tree generation module is based on computational requirements. The addition tree is used to process the accumulation of multi-digit numbers in a hierarchical manner. Operands and intermediate calculation results can be added in pairs. Except for the last operand, all operands are added in pairs in the first level. The last operand needs to be added to the accumulation results of all operands in the last level.

5. The hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate computation according to claim 1, characterized in that: The calculation and result output module includes: The approximate calculation unit module uses the adder generated in the previous-level approximate adder generation module to complete the approximate addition calculation; The precision control module sets the selection of adders for each layer of the approximate adders in the calculation process according to the required precision of the input. The approximate calculation result output module is interconnected with other subsequent circuits to complete data transmission; it has basic logic transmission and level isolation functions.

6. The hierarchical addition tree structure module with fine-grained, accurate, and reconfigurable approximate computation according to claim 5, characterized in that: The data processing steps in the calculation and result output module include: Step 301: Receive the data and the generated approximate adder from the first two modules, and prepare for calculation; Step 302: Perform parallel computation, during which the precision configuration n2 of the previous stage is called to select the adders in each layer of the approximate adder; Step 303: The synchronous timing control module detects the timing of the output signal to ensure that the data output of the current cycle has been completed before writing the data of the next cycle, so as to avoid logical conflicts; Step 304: The final output module will output the data from the final calculation to the final peripheral unit.