An approximate 4-2 compressor and an approximate multiplier based on pair-wise error compensation

By using an approximate 4-2 compressor and a pairwise error compensation method, the hardware efficiency and accuracy of the approximate multiplier are optimized, solving the problem of high error rate in existing technologies and achieving efficient computation in image processing and neural network training.

CN121680780B · Active · Publication Date: 2026-06-19 · GREEN IND INNOVATION RES INST OF ANHUI UNIV

Cites: 2 · Cited by: 0

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: GREEN IND INNOVATION RES INST OF ANHUI UNIV
Filing Date: 2026-02-09
Publication Date: 2026-06-19

Application Information

Patent Timeline

09 Feb 2026: Application
19 Jun 2026: Publication (CN121680780B)

IPC: G06F7/523; G06F7/575; G06F7/501

AI Tagging

Application Domain

Digital data processing details

AI Technical Summary

Technical Problem

Existing approximate multipliers suffer from high error rates in image processing and neural network training scenarios, resulting in low hardware efficiency and failing to meet the requirements of high-precision computing.

Method used

By employing an approximate 4-2 compressor and a pairwise error compensation method, and through hybrid logic gate design and error compensation technology, hardware efficiency and accuracy are optimized. Combined with the precise-approximate-truncated Wallace tree compression method, the error rate is reduced.

Benefits of technology

It achieves a significant reduction in error rate in image processing and neural network training, improves hardware efficiency, reduces power consumption and area, and maintains high-precision computing performance.

✦ Generated by Eureka AI based on patent content.

Figure CN121680780B_ABST

Abstract

This application relates to an approximate 4-2 compressor and an approximate multiplier based on pairwise error compensation. The approximate 4-2 compressor achieves synergistic optimization of hardware efficiency and accuracy through a hybrid design of transmission gates and logic gates. Drawing on the structural simplicity of OR-gate compressors, it groups the four operands in pairs and compresses each pair with an OR operation. An XOR gate then combines the two compressed values to calculate the current bit, and an AND gate combines them to calculate the carry. To further reduce error, the two operands of one of the ORed pairs are also ANDed, and this AND output is ORed with the current bit, further lowering the error rate.

Description

Technical Field

[0001] This invention belongs to the field of integrated circuit technology and relates to an approximate 4-2 compressor and an approximate multiplier based on pairwise error compensation.

Background Technology

[0002] In emerging applications such as artificial intelligence and IoT edge devices, the design of low-power arithmetic units has become a core challenge for chip energy efficiency optimization. As a fundamental module of digital signal processing, the power consumption, area, and latency performance of multipliers directly constrain the overall system performance. Traditional precise multipliers rely on full-precision logic to perform operations. In error-tolerant scenarios such as image recognition and neural network inference, overly precise calculations can lead to significant hardware redundancy. Approximate computation techniques, by sacrificing precision for hardware efficiency, are becoming a key path to overcome this bottleneck. For example, the insensitivity of the human visual system to subtle errors in image smoothing, sharpening, and image multiplication allows arithmetic units to flexibly balance precision and power consumption. Furthermore, the inherent tolerance of neural network inference to errors provides important application scenarios for improving the hardware energy efficiency of approximate matrix multiplication units.

[0003] Significant progress has been made in optimizing the hardware efficiency of approximate multipliers. Structural simplifications such as 4:2 compressor logic optimization and partial-product truncation have reduced area by 30%-70% and power consumption by 25%-60%. A typical approximate compressor decomposes the traditional single-stage 4:2 compression logic into two cascaded, simplified stages:

[0004] The first stage (primary compression) uses a 3:2 compressor (i.e., a full adder) to process three of the four input bits (X1, X2, X3, X4). For example, taking X1, X2, and X3 as inputs generates an intermediate sum S_mid and an intermediate carry C_mid.

[0005] The second stage (secondary merging) uses highly simplified carry-save addition logic to merge the intermediate sum S_mid and intermediate carry C_mid from the first stage with the fourth input bit X4, finally producing two output bits: Sum and Carry.

[0006] To lower power consumption and area, designers remove some of the complex XOR gates or majority gates of the standard full adder in the second stage and approximate the addition function directly with basic gates such as AND and OR. This simplification breaks the exact carry propagation chain, which is the root cause of the calculation error, but it also substantially reduces the transistor count.
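To make the conventional two-stage scheme concrete, here is a minimal Python sketch. The first stage is an exact full adder on X1, X2, and X3. The second stage is a hypothetical simplified merge in which the XOR that would form Sum is replaced by an OR, so only AND/OR gates remain. This stage-2 logic and all function names are illustrative assumptions and do not come from any design cited in this document. The sketch lists every input whose two-bit result differs from the exact sum.

```python
from itertools import product


def full_adder(x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Stage 1: exact 3:2 compressor. Returns (S_mid, C_mid)."""
    s_mid = x1 ^ x2 ^ x3
    c_mid = (x1 & x2) | (x1 & x3) | (x2 & x3)  # majority of the three inputs
    return s_mid, c_mid


def simplified_merge(s_mid: int, c_mid: int, x4: int) -> tuple[int, int]:
    """Stage 2 (hypothetical): the XOR that would form Sum is replaced by an OR,
    so only AND/OR gates remain. Returns (Sum, Carry)."""
    sum_bit = s_mid | x4
    carry = c_mid | (s_mid & x4)
    return sum_bit, carry


def two_stage_errors() -> dict[tuple[int, int, int, int], int]:
    """Map each erroneous input (X1, X2, X3, X4) to approx - exact."""
    errors = {}
    for x1, x2, x3, x4 in product((0, 1), repeat=4):
        s_mid, c_mid = full_adder(x1, x2, x3)
        sum_bit, carry = simplified_merge(s_mid, c_mid, x4)
        error = (2 * carry + sum_bit) - (x1 + x2 + x3 + x4)
        if error:
            errors[(x1, x2, x3, x4)] = error
    return errors


for inputs, error in two_stage_errors().items():
    print(inputs, error)
```

Because Sum and Carry together encode at most 3, the input 1111 (exact sum 4) can never be represented exactly. The three +1 errors arise only from replacing the XOR with an OR, which illustrates the trade-off described in [0006].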

[0007] Existing solutions still have key bottlenecks in accuracy control. In image processing, the mean relative error distance (MRED) of traditional approximate multipliers is generally above 1.5%, which keeps the peak signal-to-noise ratio (PSNR) of operations such as Gaussian smoothing below 42 dB. In neural network training, error accumulation can slow model convergence by more than 40%, and insufficient accuracy often lowers classification accuracy at the inference stage.

Summary of the Invention

[0008] To address the problems existing in the above-mentioned traditional methods, this invention proposes an approximate 4-2 compressor and an approximate multiplier based on pairwise error compensation.

[0009] To achieve the above objectives, the embodiments of the present invention adopt the following technical solutions:

[0010] On the one hand, an approximate 4-2 compressor is provided, comprising: a first OR gate, a second OR gate, a third OR gate, a first AND gate, a second AND gate, and an XOR gate.

[0011] The first input terminal of the first OR gate is connected to the first input terminal of the first AND gate and serves as the second input terminal of the approximate 4-2 compressor. The second input terminal of the first OR gate is connected to the second input terminal of the first AND gate and serves as the third input terminal of the approximate 4-2 compressor. The output terminal of the first AND gate is connected to the first input terminal of the third OR gate. The output terminal of the first OR gate is connected to the first input terminal of the XOR gate and the first input terminal of the second AND gate. The output terminal of the XOR gate is connected to the second input terminal of the third OR gate. The output terminal of the third OR gate serves as the S output terminal of the approximate 4-2 compressor. The two input terminals of the second OR gate serve as the first and fourth input terminals of the approximate 4-2 compressor, respectively. The output terminal of the second OR gate is connected to the second input terminal of the XOR gate and the second input terminal of the second AND gate. The output terminal of the second AND gate serves as the CO output terminal of the approximate 4-2 compressor.

[0012] In one embodiment, the approximate 4-2 compressor further includes a lightweight error-aware module for sensing error compensation terms of the approximate 4-2 compressor.

[0013] In one embodiment, a pairwise error compensation method is used to compensate for errors in two approximate 4-2 compressors. The pairwise error compensation method specifically includes: performing an OR operation on the error compensation terms of the two approximate 4-2 compressors to obtain a total error compensation term; and summing the total error compensation term with the results of the two approximate 4-2 compressors in the next stage of operation.

[0014] On the other hand, an approximate multiplier based on pairwise error compensation is also provided, including:

[0015] The partial product generation module receives two 8-bit binary numbers, performs a bitwise AND operation on the two 8-bit binary numbers in sequence, and constructs a Wallace tree with 15 columns and 8 rows.

[0016] The hybrid compression tree module is used to compress the Wallace tree using a combined exact-approximate-truncation scheme and outputs two compressed result vectors. The module is divided into three regions according to column significance:

[0017] Precise compression region: the high-order columns of the Wallace tree are compressed using precise compression units.

[0018] Approximate compression region: the middle-order columns of the Wallace tree are compressed using the above-mentioned approximate 4-2 compressor. In both the hybrid compression tree module and the intermediate compression module, the approximate 4-2 compressors used in the same column are paired, and each pair is error-compensated using the pairwise error compensation method.

[0019] Truncated region: the low-order columns of the Wallace tree are either not compressed or only simplified.

[0020] The intermediate compression module is used to compress two compressed result vectors using a full adder, a half adder, and an approximate 4-2 compressor to obtain two intermediate result vectors.

[0021] The final adder module is used to add the two intermediate result vectors and output a 16-bit product.

[0022] In one embodiment, the precise compression unit includes a precise compressor and a full adder.

[0023] In one embodiment, the precise compressor is an exact 4-2 compressor.

[0024] In one embodiment, columns 9 to 15 of the Wallace tree are high-order columns, columns 5 to 8 of the Wallace tree are middle-order columns, and columns 1 to 4 of the Wallace tree are low-order columns.

[0025] In one embodiment, in the approximate compression region, columns 8 and 7 of the Wallace tree are each compressed using two approximate 4-2 compressors, and columns 6 and 5 are each compressed using one approximate 4-2 compressor.

[0026] In one embodiment, the intermediate compression module is further configured to compress the results corresponding to the high-order columns of the two compressed result vectors using a full adder, compress the results corresponding to the 8th column of the Wallace tree in the two compressed result vectors using two approximate 4-2 compressors, compress the results corresponding to the 7th column of the Wallace tree in the two compressed result vectors using two full adders, compress the results corresponding to the 6th column of the Wallace tree in the two compressed result vectors using one approximate 4-2 compressor and one full adder respectively, and compress the results corresponding to the 5th column of the Wallace tree in the first compressed result vector using a half adder.

[0027] In one embodiment, in the final adder module, a full adder is used to calculate the results in the two intermediate result vectors that correspond to columns 12 to 16 of the Wallace tree, and a half adder is used to calculate the results in the two intermediate result vectors that correspond to columns 9 to 11 of the Wallace tree.

[0028] One of the above technical solutions has the following advantages and beneficial effects:

[0029] The approximate 4-2 compressor and the approximate multiplier based on pairwise error compensation described above achieve synergistic optimization of hardware efficiency and accuracy through a hybrid design of transmission gates and logic gates. Borrowing the structural simplicity of OR-gate compressors, the four operands are grouped in pairs and each pair is compressed with an OR operation. An XOR gate then combines the two compressed values to calculate the current bit, and an AND gate combines them to calculate the carry. To further reduce error, the two operands of one of the ORed pairs are also ANDed, and this AND output is ORed with the current bit, further lowering the error rate.

Brief Description of the Drawings

[0030] To more clearly illustrate the technical solutions in the embodiments of this application or the conventional technology, the drawings used in the description of the embodiments or the conventional technology will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0031] Figure 1 is a schematic diagram of an approximate 4-2 compressor in one embodiment;

[0032] Figure 2 is a schematic diagram of an approximate multiplier based on pairwise error compensation in one embodiment.

Detailed Description

[0033] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0034] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

[0035] It should be noted that, in this document, reference to an "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of the invention. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art will understand that the embodiments described herein can be combined with other embodiments. The term "and/or" as used herein refers to any one or more of the associated listed items and includes all possible combinations thereof.

[0036] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0037] In one embodiment, as shown in Figure 1, an approximate 4-2 compressor is provided, including: a first OR gate U1, a second OR gate U2, a third OR gate U3, a first AND gate U4, a second AND gate U5, and an XOR gate U6.

[0038] The first input terminal of the first OR gate U1 is connected to the first input terminal of the first AND gate U4 and serves as the second input terminal X1 of the approximate 4-2 compressor. The second input terminal of the first OR gate U1 is connected to the second input terminal of the first AND gate U4 and serves as the third input terminal X2 of the approximate 4-2 compressor. The output terminal of the first AND gate U4 is connected to the first input terminal of the third OR gate U3. The output terminal of the first OR gate U1 is connected to the first input terminal of the XOR gate U6 and the first input terminal of the second AND gate U5. The output terminal of the XOR gate U6 is connected to the second input terminal of the third OR gate U3. The output terminal of the third OR gate U3 serves as the S output terminal of the approximate 4-2 compressor. The two input terminals of the second OR gate U2 serve as the first input terminal X0 and the fourth input terminal X3 of the approximate 4-2 compressor, respectively. The output terminal of the second OR gate U2 is connected to the second input terminal of the XOR gate U6 and the second input terminal of the second AND gate U5. The output terminal of the second AND gate U5 serves as the CO output terminal of the approximate 4-2 compressor.
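The netlist above can be checked with a short behavioral model. The following Python sketch is an assumption-level translation of the gate connections in [0038]: the gate names U1 to U6 follow Figure 1, and the function name `approx_4_2` is hypothetical. It prints the truth table with the approximate value 2·CO + S next to the exact sum of the four inputs.

```python
from itertools import product


def approx_4_2(x0: int, x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Behavioral model of the Figure 1 netlist. Returns (S, CO)."""
    u1 = x1 | x2   # first OR gate U1: pair (X1, X2)
    u2 = x0 | x3   # second OR gate U2: pair (X0, X3)
    u4 = x1 & x2   # first AND gate U4: same pair as U1
    u6 = u1 ^ u2   # XOR gate U6: current bit before correction
    s = u4 | u6    # third OR gate U3 -> S output
    co = u1 & u2   # second AND gate U5 -> CO output
    return s, co


print("X3X2X1X0  S CO  approx exact")
for x3, x2, x1, x0 in product((0, 1), repeat=4):
    s, co = approx_4_2(x0, x1, x2, x3)
    print(f"  {x3}{x2}{x1}{x0}    {s}  {co}    {2 * co + s}     {x0 + x1 + x2 + x3}")
```

The model mirrors the netlist one gate per line, so each wire can be compared directly against Figure 1 when checking a hardware implementation.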

[0039] Specifically, the approximate 4-2 compressor approximates the sum of four one-bit binary numbers. OR-gate compression has been proposed previously: when an OR gate compresses two numbers, an error of -1 occurs only when both numbers are 1. However, when an OR gate approximates the sum of more numbers (e.g., four), its error rate rises significantly. The approximate 4-2 compressor proposed in this application borrows the structural simplicity of the OR-gate compressor. It groups the four operands in pairs and compresses each pair with an OR operation. An XOR gate then combines the two compressed values to calculate the current bit, and an AND gate combines them to calculate the carry. To further reduce the error, the operands of one of the two pairs (X2 and X1 in this embodiment) are also ANDed. This detects the case X2 = X1 = 1, which would otherwise cause an error. Because that error is 1, the AND output is ORed with the current bit to lower the error rate. The error distribution of this compressor is shown in Table 1. The compressor produces an error of -1 when the input is 0110, 1001, 1011, 1101, 1110, or 1111. The compressor's error rate is 28/256.
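The 28/256 figure is consistent with weighting each input pattern by its probability when every input bit is 1 with probability 1/4. This is the usual assumption for partial products formed by ANDing two uniformly random bits. For example, 0110 has weight (1/4)²(3/4)² = 9/256, and the six listed patterns sum to 9 + 9 + 3 + 3 + 3 + 1 = 28. The sketch below assumes that weighting and uses a hypothetical compact model of the netlist in [0038]. It enumerates all 16 inputs and sums the weights of the erroneous ones.

```python
from fractions import Fraction
from itertools import product


def approx_4_2(x0: int, x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Compact model of the Figure 1 netlist. Returns (S, CO)."""
    u1, u2 = x1 | x2, x0 | x3
    return (x1 & x2) | (u1 ^ u2), u1 & u2


def error_distribution(p_one: Fraction = Fraction(1, 4)) -> dict[str, tuple[int, Fraction]]:
    """For each erroneous input pattern 'X3X2X1X0', return (error, probability),
    assuming every input bit is independently 1 with probability p_one."""
    table = {}
    for x3, x2, x1, x0 in product((0, 1), repeat=4):
        s, co = approx_4_2(x0, x1, x2, x3)
        ones = x0 + x1 + x2 + x3
        error = (2 * co + s) - ones
        if error:
            prob = p_one ** ones * (1 - p_one) ** (4 - ones)
            table[f"{x3}{x2}{x1}{x0}"] = (error, prob)
    return table


dist = error_distribution()
for pattern, (error, prob) in dist.items():
    print(pattern, error, prob)
print("error rate:", sum(prob for _, prob in dist.values()))
```

Applied to the netlist as written in [0038], this sketch reports errors for 0110, 1001, 1011, 1101, and 1111, each -1, for a weighted rate of 25/256. Input 1110 evaluates to S = 1 and CO = 1, which is exactly 3. The sixth pattern and the extra 3/256 quoted above therefore presumably reflect details of Figure 1 or Table 1 that are not captured in this text.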

[0040] Table 1 Approximate 4-2 Compressor Error Distribution

[0041]

[0042] In the approximate 4-2 compressor described above, hardware efficiency and accuracy are optimized through a hybrid design of transmission gates and logic gates. Drawing on the structural simplicity of OR-gate compressors, the four operands are grouped in pairs and each pair is compressed with an OR operation. An XOR gate then combines the two compressed values to calculate the current bit, and an AND gate combines them to calculate the carry. To further reduce error, the two operands of one of the ORed pairs are also ANDed, and this AND output is ORed with the current bit, further lowering the error rate.

[0043] In one embodiment, the approximate 4-2 compressor further includes a lightweight error-aware module for sensing error compensation terms of the approximate 4-2 compressor.

[0044] In one embodiment, a pairwise error compensation method is used to compensate for errors in two approximate 4-2 compressors. The pairwise error compensation method specifically includes: performing an OR operation on the error compensation terms of the two approximate 4-2 compressors to obtain a total error compensation term; and summing the total error compensation term with the results of the two approximate 4-2 compressors in the next stage of operation.

[0045] Specifically, to further reduce the error rate and improve the accuracy of the approximate 4-2 compressor in image processing and neural networks, this application introduces a lightweight error-aware module. As Table 1 shows, the approximate 4-2 compressor always produces an error when X3 and X0 are both 1. An error compensation term can therefore be defined as e = X3 · X0 (the AND of X3 and X0).

[0046] However, simply adding each compressor's error compensation term to its own result would significantly increase the overhead of the approximate multiplier or approximate matrix multiplier designed in this application, which defeats the purpose of approximate computing. This application therefore proposes a pairwise error compensation method. The error compensation terms of two approximate 4-2 compressors are ORed into a single compensation term, which is summed with the results of the two compressors in the next stage. With pairwise error compensation, the two compressors produce errors in two situations: (1) the input of either compressor is 0110; (2) X3 and X0 are both 1 in both compressors. In the first case, the two approximate 4-2 compressors have a combined error rate of 18/256, with an error distance of -1 or -2. In the second case, they have a combined error rate of 1.12/256, and the error distance drops from -2 before compensation to -1. Two approximate 4:2 compressors using pairwise compensation therefore have a total error rate of 19.12/256, or 9.56/256 per compressor. This is a significant accuracy improvement over other 4:2 compressors.
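Here is a minimal sketch of pairwise compensation under the same assumptions as the Table 1 check: the netlist of [0038], with each input bit equal to 1 with probability 1/4. The compensation bit e_A OR e_B, with e = X3 · X0 for each compressor, is added once to the combined output of two compressors, and all 256 input pairs are enumerated. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def approx_4_2(x0: int, x1: int, x2: int, x3: int) -> tuple[int, int]:
    """Compact model of the Figure 1 netlist. Returns (S, CO)."""
    u1, u2 = x1 | x2, x0 | x3
    return (x1 & x2) | (u1 ^ u2), u1 & u2


def pair_with_compensation(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Value of two compressors plus one shared compensation bit.

    a, b are (X0, X1, X2, X3); each compressor's term is e = X3 AND X0,
    and the two terms are ORed into a single bit (pairwise compensation).
    """
    total = 0
    for x0, x1, x2, x3 in (a, b):
        s, co = approx_4_2(x0, x1, x2, x3)
        total += 2 * co + s
    e_total = (a[3] & a[0]) | (b[3] & b[0])
    return total + e_total


def pair_error_stats(p_one: Fraction = Fraction(1, 4)) -> tuple[Fraction, set[int]]:
    """Enumerate all 256 input pairs; return (error probability, error distances)."""
    rate, distances = Fraction(0), set()
    for bits in product((0, 1), repeat=8):
        error = pair_with_compensation(bits[:4], bits[4:]) - sum(bits)
        if error:
            ones = sum(bits)
            rate += p_one ** ones * (1 - p_one) ** (8 - ones)
            distances.add(error)
    return rate, distances


rate, distances = pair_error_stats()
print(f"pair error rate = {float(rate * 256):.2f}/256, distances = {sorted(distances)}")
```

Under these assumptions, exhaustive enumeration gives about 18.68/256 for the pair (about 9.34/256 per compressor), with error distances of -1 and -2 only. The 19.12/256 figure above follows the authors' own case accounting and is not reproduced exactly by this model. The design choice is visible in the code: a single OR gate and one extra bit replace two separate compensation additions.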

[0047] In one embodiment, as shown in Figure 2, an approximate multiplier based on pairwise error compensation is also provided, including:

[0048] The partial product generation module receives two 8-bit binary numbers, performs a bitwise AND operation on the two 8-bit binary numbers in sequence, and constructs a Wallace tree with 15 columns and 8 rows.

[0049] The hybrid compression tree module is used to compress the Wallace tree using a combined exact-approximate-truncation scheme and outputs two compressed result vectors. The module is divided into three regions according to column significance:

[0050] Precise compression region: the high-order columns of the Wallace tree are compressed using precise compression units.

[0051] Approximate compression region: the middle-order columns of the Wallace tree are compressed using the above-mentioned approximate 4-2 compressor. In both the hybrid compression tree module and the intermediate compression module, the approximate 4-2 compressors used in the same column are paired, and each pair is error-compensated using the pairwise error compensation method.

[0052] Truncated region: the low-order columns of the Wallace tree are either not compressed or only simplified.

[0053] The intermediate compression module is used to compress two compressed result vectors using a full adder, a half adder, and an approximate 4-2 compressor to obtain two intermediate result vectors.

[0054] The final adder module is used to add the two intermediate result vectors and output a 16-bit product.

[0055] Specifically, existing solutions still have key bottlenecks in accuracy control. In image processing, the mean relative error distance (MRED) of traditional approximate multipliers is generally above 1.5%, which keeps the peak signal-to-noise ratio (PSNR) of operations such as Gaussian smoothing below 42 dB. In neural network training, error accumulation can slow model convergence by more than 40%, and insufficient accuracy often lowers classification accuracy at the inference stage.

[0056] To address this challenge, this application proposes an approximate multiplier based on pairwise error compensation. This 8-bit approximate multiplier is built on the proposed approximate 4-2 compressor and achieved a mean relative error of only 0.88% in measurements using a 45 nm process. In image processing, smoothing, sharpening, and image multiplication achieve results close to those of an exact multiplier. In neural network inference, the ResNet-18 model maintains ImageNet classification accuracy above experimental data, and gradient update errors during training stay within an acceptable range. These results validate the applicability of this design in high-precision scenarios.

[0057] The first step of this pairwise error-compensated approximate multiplier is to AND the two 8-bit binary numbers bit by bit, producing a Wallace tree with 15 columns and 8 rows. The Wallace tree is then processed with the combined exact-approximate-truncation method. Because the higher-order columns contribute most to the accuracy of the product, columns 9 to 15 are compressed with exact compressors. Columns 5 to 8 are compressed with the approximate 4-2 compressor proposed in this application to reduce area, power consumption, and latency. Because columns 1 to 4 contribute little to the accuracy of the product, they are truncated, and the constant 0110 is output directly.
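The partial-product tree and its three column regions can be modeled directly. The Python sketch below builds the 15 columns from the bitwise AND of two 8-bit operands and tags each column with its region from [0062]. It then isolates the effect of truncation by summing columns 5 to 15 exactly and replacing columns 1 to 4 with the constant 0110. Reading 0110 as the bits of columns 4 down to 1 (value 6) is an assumption, and all function names are hypothetical. The model does not include the approximate compressors.

```python
def partial_product_columns(a: int, b: int, width: int = 8) -> list[list[int]]:
    """Bitwise-AND partial products grouped by column (index 0 is column 1, the LSB)."""
    cols: list[list[int]] = [[] for _ in range(2 * width - 1)]  # 15 columns for 8 bits
    for i in range(width):
        for j in range(width):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def region(column: int) -> str:
    """Region of a 1-indexed column, following the split in [0062]."""
    if column <= 4:
        return "truncated"
    return "approximate" if column <= 8 else "exact"


TRUNC_CONST = 0b0110  # hypothetical reading: bits of columns 4..1, MSB first


def truncated_product(a: int, b: int) -> int:
    """Columns 5-15 summed exactly; columns 1-4 replaced by TRUNC_CONST.
    Isolates the truncation error only (no approximate compression)."""
    cols = partial_product_columns(a, b)
    upper = sum(sum(col) << k for k, col in enumerate(cols) if k >= 4)
    return upper + TRUNC_CONST


print([len(col) for col in partial_product_columns(0, 0)])
print([region(c) for c in range(1, 16)])
print(truncated_product(200, 150), 200 * 150)
```

The first line prints the column heights 1, 2, ..., 8, ..., 1. The tallest column, column 8, holds eight partial products, which matches the two approximate 4-2 compressors assigned to it in [0063]. Under this reading of 0110, truncation alone shifts the product by between -43 and +6.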

[0058] The multiplier adopts a pairwise compensation method, in which approximate 4-2 compressors in the same column are paired for error compensation. This reduces errors and, compared with simple per-compressor error compensation, also reduces area, power consumption, and delay.

[0059] In the aforementioned approximate multiplier based on pairwise error compensation, the approximate multiplier is an 8-bit approximate multiplier built on the approximate 4-2 compressor. It adopts a pairwise compensation method, which performs error compensation on pairs of approximate 4-2 compressors in the same column. This not only reduces the error, but also reduces the area, power consumption and latency compared to simple error compensation. It achieves a computational accuracy superior to existing approximate multipliers, and is especially suitable for edge computing scenarios with a certain tolerance for error, such as image processing and neural network inference.

[0060] In one embodiment, the precise compression unit includes a precise compressor and a full adder.

[0061] In one embodiment, the precise compressor is an exact 4-2 compressor.

[0062] In one embodiment, columns 9 to 15 of the Wallace tree are high-order columns, columns 5 to 8 of the Wallace tree are middle-order columns, and columns 1 to 4 of the Wallace tree are low-order columns.

[0063] In one embodiment, in the approximate compression region, columns 8 and 7 of the Wallace tree are each compressed using two approximate 4-2 compressors, and columns 6 and 5 are each compressed using one approximate 4-2 compressor.

[0064] In one embodiment, the intermediate compression module is further configured to compress the results corresponding to the high-order columns of the two compressed result vectors using a full adder, compress the results corresponding to the 8th column of the Wallace tree in the two compressed result vectors using two approximate 4-2 compressors, compress the results corresponding to the 7th column of the Wallace tree in the two compressed result vectors using two full adders, compress the results corresponding to the 6th column of the Wallace tree in the two compressed result vectors using one approximate 4-2 compressor and one full adder respectively, and compress the results corresponding to the 5th column of the Wallace tree in the first compressed result vector using a half adder.

[0065] In one embodiment, in the final adder module, a full adder is used to calculate the results in the two intermediate result vectors that correspond to columns 12 to 16 of the Wallace tree, and a half adder is used to calculate the results in the two intermediate result vectors that correspond to columns 9 to 11 of the Wallace tree.

[0066] In a verification embodiment, Table 2 compares the approximate 4-2 compressor proposed in this application with three others: the exact 4:2 compressor; the approximate 4-2 compressor of Reference 1 (Xilin Yi, Haoran Pei, Ziji Zhang, Hang Zhou, and Yajuan (UESTC), "Design of an Energy-Efficient Approximate Compressor for Error-Resilient Multiplications," 2019 IEEE International Symposium on Circuits and Systems (ISCAS)); and the approximate 4-2 compressor of Reference 2 (Minho Ha and Sunggu Lee, "Multipliers With Approximate 4-2 Compressors and Error Recovery Modules," IEEE Embedded Systems Letters, vol. 10, no. 1, March 2018). The proposed approximate 4-2 compressor has the lowest area and power consumption of all the compressors.

[0067] Table 2 Performance Comparison of Four Types of 4:2 Compressors

[0068]

[0069] As shown in Table 3, the approximate multiplier based on pairwise error compensation proposed in this application is an 8-bit approximate multiplier. Its area is slightly larger than that of the approximate multiplier in Reference 2, mainly because of the pairwise error compensation method. Other multipliers need only three 4-2 compressors in the 8th column of the Wallace tree, whereas this application needs four in that column to perform pairwise error compensation, at some cost in area. However, because the proposed approximate 4-2 compressor itself has a smaller area, the resulting multiplier has a smaller area than the approximate multiplier in Reference 1. Its area is 23.37% smaller than that of the exact multiplier.

[0070] Table 3 Performance Comparison of Four 8-bit Multipliers

[0071]

[0072] As shown in Table 3, the approximate multiplier based on pairwise error compensation proposed in this application consumes less power than the approximate multipliers in Reference 1 and Reference 2. Its power consumption is 20.66% lower than that of the exact multiplier.

[0073] Similarly, the proposed multiplier has a larger delay than the approximate multiplier in Reference 2. This is because the pairwise error compensation method adds one more compressor delay than other approximate multipliers have. However, because the proposed approximate 4-2 compressor has a smaller delay, the multiplier's delay is still lower than that of the approximate multiplier in Reference 1.

[0074] The power area delay product (PADP) is an important metric for evaluating the performance of a design. The PADP of the proposed approximate multiplier is only slightly lower than that of the approximate multipliers in Reference 1 and Reference 2. Compared to the exact multiplier, the PADP of the approximate multiplier proposed in this application is reduced by 40.19%.
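Because PADP is the product of power, area, and delay, the three reductions quoted against the exact multiplier also determine the relative delay. The short check below assumes that all three percentages are normalized to the same exact multiplier in Table 3, which is not reproduced here.

```python
# Ratios relative to the exact multiplier (exact = 1.0), from [0069], [0072], [0074].
area_ratio = 1 - 0.2337   # area reduced by 23.37%
power_ratio = 1 - 0.2066  # power reduced by 20.66%
padp_ratio = 1 - 0.4019   # PADP reduced by 40.19%

# PADP = power * area * delay, so the delay ratio follows from the other three.
delay_ratio = padp_ratio / (power_ratio * area_ratio)
print(f"implied delay relative to the exact multiplier: {delay_ratio:.3f}")
```

This gives a delay ratio of roughly 0.98. Under these assumptions, most of the PADP saving therefore comes from area and power, and the proposed multiplier's delay is close to that of the exact multiplier.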

[0075] For accuracy, this embodiment compares the mean relative error distance (MRED). The MRED of the proposed pairwise error-compensated approximate multiplier is significantly lower than that of the other approximate multipliers, so its accuracy is superior. This advantage is mainly attributable to the pairwise error compensation method. The method improves accuracy at the cost of some area and power consumption, which suits application scenarios that require high accuracy together with low area and low power consumption. Each column of the 8-bit multiplier's Wallace tree contains only a limited number of partial products. As a result, pairwise error compensation requires an additional compressor, which sacrifices some area, power, and latency. The method may therefore be better suited to matrix multipliers, which perform more partial-product compression.
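MRED is commonly defined as the mean of |M' - M| / M over all input pairs, where M is the exact product and M' is the approximate one. The sketch below assumes that definition and excludes zero operands to avoid division by zero. It exercises the metric on the exact multiplier and on a hypothetical toy approximation that clears the four least significant product bits. It does not model the proposed multiplier and does not reproduce the 0.88% figure.

```python
from typing import Callable


def mred(approx_mul: Callable[[int, int], int], width: int = 8) -> float:
    """Mean relative error distance over all nonzero operand pairs."""
    total, count = 0.0, 0
    for a in range(1, 1 << width):
        for b in range(1, 1 << width):
            exact = a * b
            total += abs(approx_mul(a, b) - exact) / exact
            count += 1
    return total / count


def exact_mul(a: int, b: int) -> int:
    return a * b


def drop_low_bits(a: int, b: int, k: int = 4) -> int:
    """Hypothetical toy approximation: clear the k least significant product bits."""
    return (a * b) & ~((1 << k) - 1)


print(f"exact multiplier MRED: {mred(exact_mul):.4%}")
print(f"toy multiplier MRED:   {mred(drop_low_bits):.4%}")
```

Any approximate multiplier with the signature `(a, b) -> int`, such as a full behavioral model of the hybrid compression tree, can be passed to `mred` in the same way.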

[0076] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

[0077] The above embodiments are merely illustrative of several implementation methods of this application, and their descriptions are relatively specific and detailed. However, they should not be construed as limiting the scope of protection of this application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and all such modifications and improvements fall within the scope of protection of this application.

Claims

1. An approximate 4-2 compressor, characterized in that it comprises a first OR gate, a second OR gate, a third OR gate, a first AND gate, a second AND gate, and an XOR gate; wherein the first input terminal of the first OR gate is connected to the first input terminal of the first AND gate and serves as the second input terminal of the approximate 4-2 compressor; the second input terminal of the first OR gate is connected to the second input terminal of the first AND gate and serves as the third input terminal of the approximate 4-2 compressor; the output terminal of the first AND gate is connected to the first input terminal of the third OR gate; the output terminal of the first OR gate is connected to the first input terminal of the XOR gate and to the first input terminal of the second AND gate; the output terminal of the XOR gate is connected to the second input terminal of the third OR gate; the output terminal of the third OR gate serves as the S output terminal of the approximate 4-2 compressor; the two input terminals of the second OR gate serve as the first and fourth input terminals of the approximate 4-2 compressor, respectively; the output terminal of the second OR gate is connected to the second input terminal of the XOR gate and to the second input terminal of the second AND gate; and the output terminal of the second AND gate serves as the CO output terminal of the approximate 4-2 compressor.
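The gate connections recited in claim 1 can be checked with a behavioral Python model. The sketch below assumes that the CO output carries weight 2 and the S output carries weight 1, which is the usual convention for approximate 4-2 compressors without carry-in or carry-out. Under that assumption, it lists every input pattern for which 2·CO + S differs from the number of 1 inputs.

```python
from itertools import product
from typing import Tuple


def approx_4_2(x1: int, x2: int, x3: int, x4: int) -> Tuple[int, int]:
    """Gate-level model of the approximate 4-2 compressor of claim 1; returns (S, CO)."""
    or1 = x2 | x3      # first OR gate: compressor inputs 2 and 3
    and1 = x2 & x3     # first AND gate: compressor inputs 2 and 3
    or2 = x1 | x4      # second OR gate: compressor inputs 1 and 4
    xor = or1 ^ or2    # XOR gate
    s = and1 | xor     # third OR gate drives S (weight 1, assumed)
    co = or1 & or2     # second AND gate drives CO (weight 2, assumed)
    return s, co


# Compare the approximate value 2*CO + S with the exact count of 1 inputs.
for x1, x2, x3, x4 in product((0, 1), repeat=4):
    s, co = approx_4_2(x1, x2, x3, x4)
    err = (2 * co + s) - (x1 + x2 + x3 + x4)
    if err:
        print(f"inputs x1x2x3x4={x1}{x2}{x3}{x4} error={err}")
```

Under this model, 5 of the 16 input patterns produce an error: 0110, 1001, 1011, 1101, and 1111 (written as x1x2x3x4). Each error is −1, so the compressor never overestimates. This one-sided error is what makes a single-bit compensation term sufficient.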

2. The approximate 4-2 compressor according to claim 1, characterized in that the approximate 4-2 compressor further comprises a lightweight error detection module for detecting the error compensation term of the approximate 4-2 compressor.
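Claim 2 does not specify the internal logic of the lightweight error detection module. One hypothetical possibility comes from enumerating all inputs of the claim-1 compressor, assuming CO has weight 2 and S has weight 1. That enumeration yields a single-bit term E = x1·x4 + x2·x3·¬(x1 + x4), which equals 1 exactly when the compressor under-counts by one. The Python sketch below checks that adding E restores the exact count for every input. The name `error_term` and the formula are illustrative and are not taken from this application.

```python
from itertools import product
from typing import Tuple


def approx_4_2(x1: int, x2: int, x3: int, x4: int) -> Tuple[int, int]:
    """Approximate 4-2 compressor of claim 1; returns (S, CO)."""
    or1, or2 = x2 | x3, x1 | x4
    s = (x2 & x3) | (or1 ^ or2)
    co = or1 & or2
    return s, co


def error_term(x1: int, x2: int, x3: int, x4: int) -> int:
    """Hypothetical error flag derived by enumeration, not the patent's circuit.

    Returns 1 when x1 and x4 are both 1, or when x2 and x3 are both 1
    while x1 and x4 are both 0; these are the five under-counting cases.
    """
    return (x1 & x4) | (x2 & x3 & ((x1 | x4) ^ 1))


for x in product((0, 1), repeat=4):
    s, co = approx_4_2(*x)
    # The compensated value must equal the exact number of 1 inputs.
    assert 2 * co + s + error_term(*x) == sum(x)
print("approximate value + E matches the exact count for all 16 inputs")
```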

3. The approximate 4-2 compressor according to claim 2, characterized in that error compensation is performed on two approximate 4-2 compressors using a pairwise error compensation method, which specifically comprises: performing an OR operation on the error compensation terms of the two approximate 4-2 compressors to obtain a total error compensation term; and adding the total error compensation term to the results of the two approximate 4-2 compressors in the next stage of operation.
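The pairwise method of claim 3 can be simulated in Python by combining two compressors from the same column. The sketch makes three assumptions:

- Each compressor's error term is a hypothetical enumeration-derived flag, not the patent's own detection circuit.
- The total compensation term is added with the weight of the S outputs.
- CO has weight 2.

It tallies residual errors over all 256 input combinations of the two compressors, both without compensation and with the OR-combined compensation term.

```python
from collections import Counter
from itertools import product


def approx_value(x1: int, x2: int, x3: int, x4: int) -> int:
    """Value 2*CO + S of the claim-1 approximate 4-2 compressor (CO weight 2 assumed)."""
    or1, or2 = x2 | x3, x1 | x4
    s = (x2 & x3) | (or1 ^ or2)
    co = or1 & or2
    return 2 * co + s


def error_term(x1: int, x2: int, x3: int, x4: int) -> int:
    """Hypothetical 1-bit error flag (derived by enumeration; illustrative only)."""
    return (x1 & x4) | (x2 & x3 & ((x1 | x4) ^ 1))


def pair_errors():
    """Histograms of (approximate - exact) without and with pairwise compensation."""
    plain, paired = Counter(), Counter()
    for bits in product((0, 1), repeat=8):
        a, b = bits[:4], bits[4:]          # inputs of the two paired compressors
        exact = sum(bits)
        approx = approx_value(*a) + approx_value(*b)
        total_e = error_term(*a) | error_term(*b)   # OR of the two error terms
        plain[approx - exact] += 1
        paired[approx + total_e - exact] += 1
    return plain, paired


plain, paired = pair_errors()
print("without compensation:", dict(plain))
print("with pairwise compensation:", dict(paired))
```

Under these assumptions, the uncompensated pair is wrong in 135 of the 256 cases, by up to −2. With the OR-combined term, a residual error of −1 remains only in the 25 cases where both compressors err at the same time, because OR-ing two 1s yields 1 rather than 2. One shared compensation bit therefore serves two compressors.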

4. An approximate multiplier based on pairwise error compensation, characterized in that it comprises:

- a partial product generation module, configured to receive two 8-bit binary numbers, perform AND operations on their bits in sequence, and form a Wallace tree with 15 columns and 8 rows;
- a hybrid compression tree module, configured to compress the Wallace tree using a combined precise-approximate-truncation method and to output two compressed result vectors, wherein the hybrid compression tree module is divided into three regions according to column significance, specifically:
  - a precise compression region, in which the high-order columns of the Wallace tree are compressed using precise compression units;
  - an approximate compression region, in which the middle columns of the Wallace tree are compressed using the approximate 4-2 compressor according to claim 3, wherein, in the hybrid compression tree module and the intermediate compression module, the approximate 4-2 compressors used in the same column are paired and error-compensated using the pairwise error compensation method; and
  - a truncation region, in which the low-order columns of the Wallace tree are either not compressed or only simplified;
- an intermediate compression module, configured to compress the two compressed result vectors using full adders, half adders, and the approximate 4-2 compressors to obtain two intermediate result vectors; and
- a final adder module, configured to add the two intermediate result vectors and output a 16-bit product.
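The partial product generation module of claim 4 can be sketched in Python as an 8 × 8 AND array whose bits are grouped by weight. The sketch shows where the 15 columns and the maximum height of 8 rows come from. The function names are illustrative. The 0-based column index k in the code corresponds to column k + 1 in the 1-based numbering used in the claims.

```python
from typing import List


def partial_products(a: int, b: int, bits: int = 8) -> List[List[int]]:
    """AND-array partial products grouped by column.

    columns[k] holds the partial-product bits of weight 2**k
    (column k + 1 in the 1-based numbering of the claims).
    """
    columns: List[List[int]] = [[] for _ in range(2 * bits - 1)]
    for i in range(bits):          # bit i of operand a
        for j in range(bits):      # bit j of operand b
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return columns


def column_sum(columns: List[List[int]]) -> int:
    """Exact value represented by the column-grouped partial products."""
    return sum(sum(col) << k for k, col in enumerate(columns))


cols = partial_products(0xB7, 0x5C)
print(len(cols), [len(c) for c in cols])  # 15 [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
assert column_sum(cols) == 0xB7 * 0x5C
```

The column heights rise from 1 to 8 and fall back to 1. The tallest columns (5 to 8 in 1-based numbering) therefore lie in the middle region of the tree.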

5. The approximate multiplier based on pairwise error compensation according to claim 4, characterized in that the precise compression unit comprises a precise compressor and a full adder.

6. The approximate multiplier based on pairwise error compensation according to claim 5, characterized in that the precise compressor is a 4-2 compressor.
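Claim 6 does not detail the internal structure of the precise 4-2 compressor. A widely used construction, shown here only as an assumed example, cascades two full adders so that x1 + x2 + x3 + x4 + Cin = Sum + 2·(Carry + Cout), with Cout independent of Cin. A Python sketch follows.

```python
from itertools import product
from typing import Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """One-bit full adder; returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def exact_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> Tuple[int, int, int]:
    """Conventional exact 4-2 compressor built from two cascaded full adders.

    Returns (sum, carry, cout) with x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    cout depends only on x1..x3, so it does not ripple along a row of compressors.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


for bits in product((0, 1), repeat=5):
    s, carry, cout = exact_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
print("exact 4-2 compressor identity holds for all 32 inputs")
```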

7. The approximate multiplier based on pairwise error compensation according to claim 4, characterized in that the 9th to 15th columns of the Wallace tree are the high-order columns, the 5th to 8th columns are the middle columns, and the 1st to 4th columns are the low-order columns.
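The column partition of claim 7 can be expressed in Python as a simple lookup. The sketch also computes, purely as a hypothetical worst case, the largest value that could be lost if the truncation region were discarded outright. Claim 4 allows these columns to be left uncompressed or only simplified, so this bound is illustrative rather than a property of the claimed design. The names `region` and `height` are illustrative.

```python
def region(col: int) -> str:
    """Region of the 1-based Wallace-tree column `col`, following claim 7."""
    if 9 <= col <= 15:
        return "precise"
    if 5 <= col <= 8:
        return "approximate"
    if 1 <= col <= 4:
        return "truncated"
    raise ValueError(f"column {col} is outside 1..15")


def height(col: int, bits: int = 8) -> int:
    """Number of partial-product bits in 1-based column `col` of a bits x bits AND array."""
    return col if col <= bits else 2 * bits - col


# Hypothetical worst case: every bit in the low region is 1 and the region is dropped.
max_dropped = sum(height(c) << (c - 1) for c in range(1, 16) if region(c) == "truncated")
print(max_dropped)  # 49
```

Even this pessimistic bound (49) is small compared with the largest 8-bit product, 255 × 255 = 65025. This is why the least significant columns are the natural place to spend the least hardware.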

8. The approximate multiplier based on pairwise error compensation according to claim 4, characterized in that, in the approximate compression region, the 8th and 7th columns of the Wallace tree are each compressed using two approximate 4-2 compressors, and the 6th and 5th columns of the Wallace tree are each compressed using one approximate 4-2 compressor.

9. The approximate multiplier based on pairwise error compensation according to claim 4, characterized in that the intermediate compression module is further configured to:

- compress, using full adders, the results in the two compressed result vectors that correspond to the high-order columns;
- compress, using two approximate 4-2 compressors, the results in the two compressed result vectors that correspond to the 8th column of the Wallace tree;
- compress, using two full adders, the results that correspond to the 7th column;
- compress, using one approximate 4-2 compressor and one full adder, the results that correspond to the 6th column; and
- compress, using a half adder, the result in the first compressed result vector that corresponds to the 5th column of the Wallace tree.

10. The approximate multiplier based on pairwise error compensation according to claim 4, characterized in that, in the final adder module, full adders are used to compute the results in the two intermediate result vectors that correspond to columns 12 to 16 of the Wallace tree, and half adders are used to compute the results that correspond to columns 9 to 11 of the Wallace tree.