Design method of an in-memory computing chip for mixed-precision additive neural networks
By designing an in-memory computing array circuit built on a bit-flexible minimum selector circuit, the method supports efficient computation of mixed-precision additive neural networks, addresses the low computational and energy efficiency of existing designs, and enables efficient operator deployment and computation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- PEKING UNIV
- Filing Date
- 2025-12-15
- Publication Date
- 2026-06-19
AI Technical Summary
Existing additive neural network in-memory computing chip designs cannot efficiently support mixed-precision computing, resulting in low computational and energy efficiency.
Design an in-memory computing array circuit based on a bit-flexible minimum selector circuit that supports additive neural network computation at 2-bit, 4-bit, and 8-bit precision, and deploy and compute operators of different precisions through a flexible, configurable multi-precision deployment technology.
It improves the computational efficiency of edge-side additive neural networks, achieves 100% hardware utilization, and reduces the power consumption of edge devices.
Figure CN121766232B_ABST
Description
Technical Field
[0001] This invention belongs to the field of integrated circuit design and relates to in-memory computing integrated circuit design. Specifically, it relates to an in-memory computing chip design technology for mixed-precision quantized additive neural networks, including the circuit structure, the operator deployment data flow, and its implementation method.
Background Technology
[0002] In recent years, with the rapid development of artificial intelligence, the demand for AI chips has increased significantly. Artificial intelligence is mainly based on neural network models, such as convolutional neural networks, which have a large number of parameters and a high computational cost dominated by matrix multiply-accumulate operations. This workload causes significant power consumption and performance loss, especially when neural networks are deployed on edge chips. To reduce the computational overhead of neural networks, recent research has proposed additive neural networks, which significantly reduce computational complexity by replacing the multiply-accumulate of weights and activation values with an absolute-value subtraction.
[0003] In-memory computing (CIM) technology significantly reduces data movement and shortens the distance between computation and storage by inserting computing circuitry into the memory array, thereby greatly reducing power consumption and becoming a recent research hotspot. Among these, CIM technology based on static random access memory (SRAM) is particularly suitable for edge-side applications due to its mature process and stable memory nodes. Recent research has also proposed in-memory computing chip designs for additive neural networks, which further simplify absolute value subtraction to a minimum value summation calculation, significantly improving energy efficiency.
[0004] Mixed-precision quantization further reduces the size of model parameters and computational complexity by quantizing different weights and inputs in the same model into different bit widths. However, minimum summation calculations require different computational circuit designs for different precisions. Therefore, current in-memory computing chip designs for additive neural networks are all designed for fixed bit widths and cannot efficiently support mixed-precision calculations.
Summary of the Invention
[0005] To address the shortcomings of the existing technologies, this invention provides a method for designing an in-memory computing chip for mixed-precision additive neural networks. This invention designs an in-memory computing array circuit based on a bit-flexible minimum selector circuit and a multi-precision flexible configurable deployment technology. The designed in-memory computing chip can efficiently support the computation of 2-bit, 4-bit, and 8-bit precision additive neural networks, and has high circuit utilization and energy efficiency.
[0006] The technical solution of this invention is:
[0007] A method for designing an in-memory computing chip for mixed-precision additive neural networks: Based on conventional in-memory computing (CIM) systems, this invention improves upon existing CIMs by designing an in-memory computing array circuit based on a bit-flexible minimum selector circuit. This circuit efficiently supports 2-bit, 4-bit, and 8-bit precision additive neural networks. Furthermore, this invention employs a multi-precision, flexible, and configurable deployment technology. For 2-bit, 4-bit, and 8-bit precision additive neural networks, the circuit can be flexibly configured to achieve efficient operator deployment and computation, significantly improving the computational efficiency of edge-side additive neural networks.
[0008] In specific implementation, the in-memory array circuit based on the bit-flexible minimum selector circuit designed in this invention includes a read / write circuit, an address decoder circuit, multiple sub-blocks (banks), an activation summation adder tree circuit (summing the activation values), and multiple result generation circuits. Each bank contains multiple static random access memories (SRAMs), multiple bit-flexible minimum selector circuits, and one adder tree (summing the results of multiple bit-flexible minimum selector circuits). The SRAM read / write circuit is responsible for reading and writing weight data in the SRAM within the bank; the bank is responsible for storing the weights W and weight summation data Wsum of the additive neural network, comparing the weights W and the input activation values ACT to obtain the minimum value, and summing the minimum values to obtain the sum of the minimum values Csum; the activation summation adder tree circuit is responsible for summing and accumulating the activation values of the input additive neural network to obtain the activation value summation data ACTsum; the result generation circuit is used to generate the final calculation result based on the operators of the additive neural network. The bit-flexible minimum selector circuit designed in this invention consists of 8 encoder circuits, 1 multi-level comparison tree circuit, and 8 selector circuits. The input to the bit-flexible minimum selector circuit includes an 8-bit weight W, an 8-bit activation value ACT, and a precision configuration signal (used to configure the selection of different bits). The output is the minimum value between the weight and the activation value. The encoder circuit in the bit-flexible minimum selector circuit performs digital logic operations on the weight and activation value, converting them into a generation signal G and a propagation signal P. 
Then, a multi-level comparison tree circuit compares the G and P signals in stages, obtaining comparison signals for 2 bits, 4 bits, and 8 bits respectively. The bit-flexible minimum selector circuit first selects the corresponding bit comparison signal based on the precision configuration signal, and then selects the minimum value between the weight and activation values of the additive neural network based on the comparison signal, outputting it to the addition tree. Ultimately, it can achieve minimum value selection and summation calculation at different precisions.
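The behavior described above can be illustrated with a small functional model. This is only a sketch under stated assumptions, not the patented circuit: it assumes unsigned operands, assumes that an 8-bit word is split into lanes starting from the least significant bits, and uses the hypothetical names `lane_min` and `lane_min_sum`.

```python
def lane_min(weight_word, act_word, precision):
    """Return the per-lane minimum of two 8-bit words.

    Assumptions (not stated in the source): operands are unsigned and
    lane 0 occupies the least significant bits of the word.
    """
    if precision not in (2, 4, 8):
        raise ValueError("precision must be 2, 4, or 8")
    if not (0 <= weight_word < 256 and 0 <= act_word < 256):
        raise ValueError("words must fit in 8 bits")
    mask = (1 << precision) - 1
    minima = []
    # Walk the lanes from least significant to most significant.
    for shift in range(0, 8, precision):
        w = (weight_word >> shift) & mask
        a = (act_word >> shift) & mask
        minima.append(min(w, a))
    return minima


def lane_min_sum(weight_word, act_word, precision):
    """Sum of the per-lane minima, i.e. this word's contribution to Csum."""
    return sum(lane_min(weight_word, act_word, precision))


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, lane_min(0xD8, 0x6B, n), lane_min_sum(0xD8, 0x6B, n))
```

The same pair of 8-bit words yields four, two, or one minimum depending on the precision setting, which is the role the precision configuration signal plays in the description.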
[0009] In the specific deployment of the additive neural network operator, the SRAM in each bank of the in-memory computing array circuit stores the neural network's weight data W and the weight summation data Wsum. The activation value data is fed into the in-memory computing array circuit as an input vector and computed against the weights. The deployment scheme can be flexibly configured for additive neural networks of different precisions (e.g., 2-bit, 4-bit, and 8-bit). For example, for an N-bit additive neural network (N is 2, 4, or 8) and a bank containing M bits of SRAM cells, each bank stores M / N (M divided by N) weight data, and M / N activation data are input for each calculation, with each weight data corresponding to one activation data. The bit-flexible minimum selector circuit is then configured for N-bit precision calculation. In the bit-flexible minimum selector circuit, the weight data and activation value data are compared to obtain M / N minimum value data, which are then sent to subsequent circuits to complete the computation of the entire operator.
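The M / N relationship can be written down directly. The helper below is an illustrative sketch only; the name `operands_per_bank` is hypothetical, and the example value M = 256 is taken from the embodiment in the detailed implementation, where each bank contains 256 SRAMs.

```python
def operands_per_bank(sram_bits, precision):
    """Number of N-bit weights (and matching activations) one bank handles.

    sram_bits is M, the number of SRAM bits per bank; precision is N.
    """
    if precision not in (2, 4, 8):
        raise ValueError("precision must be 2, 4, or 8")
    if sram_bits <= 0 or sram_bits % precision != 0:
        raise ValueError("sram_bits must be a positive multiple of precision")
    return sram_bits // precision


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n}-bit: {operands_per_bank(256, n)} weights per bank")
```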
[0010] During computation, the activation value data first passes through the activation summation adder tree circuit to obtain the activation value summation data ACTsum. Then, each bank within the in-memory computing array reads multiple weight data stored in SRAM. According to the additive neural network operator, each weight data corresponds to one activation data. The weight data and activation value data are fed together into the bit-flexible minimum selector circuit and the adder tree circuit within the bank. The bit-flexible minimum selector circuit selects the N-bit comparison signal from the multi-level comparison tree circuit based on the precision configuration signal, selects the minimum of the weight and the activation based on that comparison signal, and sums the selected minimum values to obtain the minimum value sum Csum. The weight summation data Wsum, the activation value summation data ACTsum, and the minimum value sum Csum are fed together into the result generation circuit, which calculates ACTsum + Wsum - 2 × Csum to obtain the calculation result PSUM of the additive neural network operator; PSUM is the activation value data output by this layer. After the subsequent activation function calculation, PSUM is passed on as the activation value data input for the next layer.
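The result formula follows from the identity a + w - 2 × min(a, w) = |a - w|, which holds for any pair of numbers. The sketch below checks that ACTsum + Wsum - 2 × Csum equals the sum of absolute differences. It is a plain software model with hypothetical function names; it assumes unsigned quantized values, and it computes Wsum on the fly, whereas the description stores Wsum in SRAM.

```python
def adder_operator(weights, activations):
    """PSUM = ACTsum + Wsum - 2 * Csum, as in the result generation circuit."""
    if len(weights) != len(activations):
        raise ValueError("each weight needs exactly one activation")
    act_sum = sum(activations)                      # ACTsum
    w_sum = sum(weights)                            # Wsum (stored in SRAM in the source)
    c_sum = sum(min(w, a) for w, a in zip(weights, activations))  # Csum
    return act_sum + w_sum - 2 * c_sum


def l1_distance(weights, activations):
    """Reference: sum of |activation - weight| over all pairs."""
    return sum(abs(a - w) for w, a in zip(weights, activations))


if __name__ == "__main__":
    w = [3, 0, 7, 2]
    x = [1, 5, 7, 4]
    print(adder_operator(w, x), l1_distance(w, x))
```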
[0011] Compared with the prior art, the beneficial technical effects of the present invention are as follows:
[0012] This invention provides a method for designing an in-memory computing chip for mixed-precision additive neural networks. It involves designing a bit-flexible minimum selector circuit and an in-memory computing array circuit based on the bit-flexible minimum selector circuit. The bit-flexible minimum selector circuit includes multiple encoder circuits, a multi-level comparison tree circuit, and multiple selector circuits. The input to the bit-flexible minimum selector circuit includes the weight values, activation values, and precision configuration signals of the additive neural network. The output is the minimum value among the weight values and activation values. Flexible and configurable deployment is implemented for additive neural network operators of different precisions, and configuration signals are set for the bit-flexible minimum selector circuit. The activation values are then accumulated through an activation summation addition tree circuit, completing the computation of the additive neural network operator. This invention can flexibly support the deployment and computation of additive network operators under different bit precisions and achieve 100% hardware utilization for different bit precisions. This enables accelerated inference computation for mixed-precision quantized additive neural networks, significantly improving the efficiency of edge operation and reducing the power consumption of edge devices.
Attached Figure Description
[0013] Figure 1 is a flowchart of the method of the present invention.
[0014] Figure 2 is a schematic diagram of the architecture of the in-memory computing array circuit.
[0015] Figure 3 is a schematic diagram of the architecture of the bit-flexible minimum selector circuit.
[0016] Figure 4 is a schematic diagram illustrating the working principle of the bit-flexible minimum selector circuit at different precision levels.
Detailed Implementation
[0017] The present invention will be further illustrated below with reference to the accompanying drawings and embodiments, but the scope of the invention is not limited in any way.
[0018] In practical implementation, the in-memory computing chip design method proposed in this invention can significantly improve the inference efficiency of mixed-precision quantized additive neural networks and efficiently supports additive neural network computation at 2-bit, 4-bit, and 8-bit precision. The method is described in detail below. As shown in Figure 1, the method of the present invention includes the following steps:
[0019] 1) Design of an in-memory computing array circuit based on a bit-flexible minimum selector circuit: This includes a read / write circuit, multiple sub-blocks (banks), an address decoder circuit, an activation summation adder tree circuit, and multiple result generation circuits. Each bank contains multiple SRAMs, multiple bit-flexible minimum selector circuits, and one adder tree circuit. The innovation of this invention lies in the design of the bit-flexible minimum selector circuit and of the in-memory computing array circuit based on it. This in-memory computing array circuit supports the deployment and computation of additive neural network operators with different precisions, significantly improving the inference efficiency of additive neural networks after mixed-precision quantization.
[0020] 2) Flexible and configurable deployment of additive neural network operators with different precisions: For an additive neural network operator of a given precision, the weights at the corresponding bit width are stored in the SRAM of the in-memory computing array circuit, the activations at the corresponding bit width are input into the in-memory computing array circuit, and the precision configuration signal of the bit-flexible minimum selector circuit is set according to the precision of the current operator.
[0021] 3) Completion of the additive neural network operator calculation: The activation value data is processed by the activation summation adder tree circuit to obtain the activation value summation data ACTsum. Each bank within the in-memory computing array reads the additive neural network's weights W and weight summation data Wsum from the SRAM. The weight data W and the activation value data ACT are fed together into the bit-flexible minimum selector circuit and the adder tree circuit within the bank. The bit-flexible minimum selector circuit selects the comparison signal for the corresponding bit width according to the precision configuration signal, selects the minimum of the weight and the activation according to that comparison signal, and sums the selected minimum values to obtain the minimum value sum Csum. Wsum, ACTsum, and Csum are then fed together into the result generation circuit, which calculates ACTsum + Wsum - 2 × Csum to obtain the calculation result PSUM of the additive neural network operator.
[0022] In a specific implementation, this invention designs an in-memory computing array circuit based on a bit-flexible minimum selector circuit. As shown in Figure 2, it includes one read / write circuit, 32 sub-blocks (banks), one address decoder circuit, one activation summation adder tree circuit, and 32 result generation circuits. Each bank contains 256 SRAMs, 32 bit-flexible minimum selector circuits, and one adder tree. The SRAM read / write circuit is responsible for reading and writing data in the SRAM within the bank; the bank is responsible for storing the weights W and the weight summation data Wsum, and for performing minimum value selection and summation to obtain the minimum value sum Csum; the activation summation adder tree circuit is responsible for summing the input activation values to obtain the activation value summation data ACTsum; the result generation circuit generates the final calculation result PSUM based on the operators of the additive neural network.
[0023] The bit-flexible minimum selector circuit is shown in Figure 3. Its inputs are an 8-bit weight value W<0:7>, the 8-bit logical inverse of the weight WB<0:7>, and the 8-bit logical inverse of the activation XB<0:7>. The bit-flexible minimum selector circuit consists of 8 encoder circuits, 1 three-level comparison tree circuit, and 8 selector circuits. Each encoder performs a logical AND operation on 1 bit of W and 1 bit of XB to obtain a generation signal G, and performs a logical XOR operation on 1 bit of WB and 1 bit of XB to obtain a propagation signal P. Together, the 8 encoder circuits produce an 8-bit generation signal G<0:7> and an 8-bit propagation signal P<0:7>. The three-level comparison tree circuit compares the G<0:7> and P<0:7> signals in stages to obtain comparison signals at 2 bits, 4 bits, and 8 bits, respectively. The 8 selector circuits first select the comparison signal for the corresponding bit width according to the precision configuration signal, then select the minimum value OUT<0:7> of the weight and the activation according to that comparison signal, and output it to the adder tree. In this way, minimum value selection and summation can be performed at different precisions. The bit-flexible minimum selector can be flexibly configured according to different bit precisions, as shown in Figure 4. For a 2-bit additive neural network, configuring the bit-flexible minimum selector circuit for 2-bit precision computation allows each circuit to perform four sets of 2-bit computations. For a 4-bit additive neural network, configuring the bit-flexible minimum selector circuit for 4-bit precision computation allows each circuit to perform two sets of 4-bit computations. For an 8-bit additive neural network, configuring the bit-flexible minimum selector circuit for 8-bit precision computation allows each circuit to perform one set of 8-bit computations.
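A bit-level software model may help make the encoder, comparison tree, and selector stages concrete. The encoder follows the paragraph above (G = W AND XB, P = WB XOR XB per bit). Everything else is an assumption made for illustration: operands are treated as unsigned, bit 0 is taken as the least significant bit, and the `merge` rule for combining adjacent groups is a standard magnitude-comparator construction that the source does not spell out. All function names are hypothetical.

```python
def encode(w, x):
    """Per-bit signals from the 8 encoders: G = W & XB, P = WB ^ XB."""
    wb = ~w & 0xFF  # logical inverse of the weight
    xb = ~x & 0xFF  # logical inverse of the activation
    g = [(w >> i) & (xb >> i) & 1 for i in range(8)]     # weight bit 1, activation bit 0
    p = [((wb >> i) ^ (xb >> i)) & 1 for i in range(8)]  # 1 where the two bits differ
    return g, p


def merge(hi, lo):
    """Combine two adjacent groups (assumed rule, not given in the source).

    If the high group differs (p_hi = 1) it decides the comparison;
    otherwise the result of the low group is passed through.
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | ((1 - p_hi) & g_lo), p_hi | p_lo


def comparison_tree(g, p):
    """Three-level tree giving 'weight > activation' flags per 2/4/8-bit lane."""
    bits = list(zip(g, p))
    pairs = [merge(bits[i + 1], bits[i]) for i in range(0, 8, 2)]
    nibbles = [merge(pairs[i + 1], pairs[i]) for i in range(0, 4, 2)]
    byte = [merge(nibbles[1], nibbles[0])]
    return {
        2: [greater for greater, _ in pairs],
        4: [greater for greater, _ in nibbles],
        8: [greater for greater, _ in byte],
    }


def bit_flexible_min(w, x, precision):
    """Return OUT<0:7>: the lane-wise minimum of w and x, packed in 8 bits."""
    if precision not in (2, 4, 8):
        raise ValueError("precision must be 2, 4, or 8")
    if not (0 <= w < 256 and 0 <= x < 256):
        raise ValueError("operands must fit in 8 bits")
    g, p = encode(w, x)
    greater = comparison_tree(g, p)[precision]  # chosen by the precision configuration
    out = 0
    # One selector per output bit: take the activation bit if weight > activation.
    for bit in range(8):
        source = x if greater[bit // precision] else w
        out |= ((source >> bit) & 1) << bit
    return out


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, hex(bit_flexible_min(0xD8, 0x6B, n)))
```

In this model all three comparison results are always computed by the tree, and the precision setting only chooses which level drives the eight selectors, which matches the staged selection described above.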
[0024] In the specific deployment of additive neural network operators, the SRAM in the in-memory computing unit stores the neural network's weight data W and the weight summation data Wsum. Activation value data is fed into the in-memory computing array as an input vector and computed against the weights. The deployment scheme can be flexibly configured for additive neural networks of different precisions (2-bit, 4-bit, and 8-bit). For a 2-bit additive neural network, each bank stores 128 2-bit weight data, and 128 2-bit activation data are input for each calculation, with each weight data corresponding to one activation data. The bit-flexible minimum selector circuit is then configured for 2-bit precision computation. Each bit-flexible minimum selector circuit performs four sets of 2-bit computations, so each bank performs minimum selection and summation on 128 weight data and 128 activation data. For a 4-bit additive neural network, each bank stores 64 4-bit weight data, and 64 4-bit activation data are input for each calculation, with each weight data corresponding to one activation data. The bit-flexible minimum selector circuit is then configured for 4-bit precision computation. Each bit-flexible minimum selector circuit performs two sets of 4-bit computations, so each bank performs minimum selection and summation on 64 weight data and 64 activation data. For an 8-bit additive neural network, each bank stores 32 8-bit weight data, and 32 8-bit activation data are input for each calculation, with each weight data corresponding to one activation data. The bit-flexible minimum selector circuit is then configured for 8-bit precision computation. Each bit-flexible minimum selector circuit performs one set of 8-bit computations, so each bank performs minimum selection and summation on 32 weight data and 32 activation data.
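The three deployment cases can be checked with a small packing sketch: at every precision, the operands of one bank fill exactly 32 eight-bit words, one per bit-flexible minimum selector circuit. The packing order (lowest lane in the least significant bits) and the names `pack_operands` and `SELECTORS_PER_BANK` are assumptions for illustration; the source gives only the operand counts.

```python
SELECTORS_PER_BANK = 32  # from the embodiment: 32 selector circuits per bank


def pack_operands(values, precision):
    """Pack unsigned precision-bit values into 8-bit words, lowest lane first."""
    if precision not in (2, 4, 8):
        raise ValueError("precision must be 2, 4, or 8")
    per_word = 8 // precision
    if len(values) % per_word != 0:
        raise ValueError("value count must fill whole 8-bit words")
    limit = 1 << precision
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for lane, value in enumerate(values[start:start + per_word]):
            if not 0 <= value < limit:
                raise ValueError("value does not fit in the chosen precision")
            word |= value << (lane * precision)
        words.append(word)
    return words


if __name__ == "__main__":
    for precision, count in ((2, 128), (4, 64), (8, 32)):
        words = pack_operands([1] * count, precision)
        busy = len(words) / SELECTORS_PER_BANK
        print(f"{precision}-bit: {count} operands -> {len(words)} words, "
              f"selector utilization {busy:.0%}")
```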
[0025] During calculation, the activation value data first passes through the activation summation adder tree circuit to obtain the activation value summation data ACTsum. Then, each bank within the in-memory computing array reads multiple weight data stored in SRAM. According to the additive neural network operator, each weight data corresponds to one activation data. The weight data and activation value data are fed together into the bit-flexible minimum selector circuit and the adder tree circuit within the bank. The selector circuit selects the comparison signal for the configured bit width (for example, the 8-bit comparison signal for an 8-bit operator) from the multi-level comparison tree based on the precision configuration signal, selects the minimum of the weight data W and the activation value data ACT based on that comparison signal, and sums the selected minimum values to obtain the minimum value sum Csum. Wsum, ACTsum, and Csum are then fed together into the result generation circuit, which calculates ACTsum + Wsum - 2 × Csum to obtain the calculation result PSUM of the additive neural network operator.
[0026] It should be noted that the purpose of disclosing the embodiments is to help further understand the present invention. However, those skilled in the art will understand that various substitutions and modifications are possible without departing from the scope of the present invention and the appended claims. Therefore, the present invention should not be limited to the content disclosed in the embodiments, and the scope of protection of the present invention is defined by the scope of the claims.
Claims
1. A method for designing an in-memory computing chip for mixed-precision additive neural networks, characterized in that it includes the following steps:
1) Design a bit-flexible minimum selector circuit and an in-memory computing array circuit based on the bit-flexible minimum selector circuit;
11) The bit-flexible minimum selector circuit includes multiple encoder circuits, a multi-level comparison tree circuit, and multiple selector circuits; the inputs of the bit-flexible minimum selector circuit include the weights, activation values, and precision configuration signal of the additive neural network, and the output is the minimum value among the weights and activation values; the encoder circuits in the bit-flexible minimum selector circuit perform digital logic operations on the weights and activation values, converting them into generation signals and propagation signals; the multi-level comparison tree circuit compares the generation signals and propagation signals in stages to obtain the comparison signals for the corresponding bit widths; the comparison signal for the corresponding bit width is selected based on the precision configuration signal; then, the minimum of the weight and the activation value of the additive neural network is selected based on the comparison signal and output to the adder tree, realizing minimum value selection and summation at different precisions;
12) Design an in-memory computing array circuit based on the bit-flexible minimum selector circuit; the in-memory computing array circuit includes a read / write circuit, sub-blocks (banks), an address decoder circuit, an activation summation adder tree circuit, and a result generation circuit; each bank contains multiple static random access memories (SRAMs), multiple bit-flexible minimum selector circuits, and one adder tree; the read / write circuit is used to read and write data in the SRAM within the bank; the bank stores the weights and the weight summation data, and performs minimum value selection and summation; the activation summation adder tree circuit sums the input activation values; the result generation circuit generates the calculation results based on the operators of the additive neural network;
2) Flexible and configurable deployment for additive neural network operators of different precisions: for an additive neural network operator of a given precision, the weights at the corresponding bit width are stored in the SRAM of the in-memory computing array circuit, the activation values at the corresponding bit width are input into the in-memory computing array circuit, and the precision configuration signal of the bit-flexible minimum selector circuit is set according to the precision of the current operator;
3) The activation value summation data is obtained through the activation summation adder tree circuit, and the calculation of the additive neural network operator is completed; each bank in the in-memory computing array reads the weights and the weight summation data from the SRAM; the weights and activation values are fed together into the bit-flexible minimum selector circuit and the adder tree circuit in the bank; the bit-flexible minimum selector circuit selects the comparison signal for the corresponding bit width according to the precision configuration signal, selects the minimum of the weight and the activation value according to the comparison signal, and sums the selected minimum values to obtain the minimum value sum; the weight summation data Wsum, the activation value summation data ACTsum, and the minimum value sum Csum are fed together into the result generation circuit for calculation, thus obtaining the activation value result of the additive neural network operator.
2. The in-memory computing chip design method for mixed-precision additive neural networks as described in claim 1, characterized in that, in step 3), the specific calculation in the result generation circuit is ACTsum + Wsum - 2 × Csum, which serves as the calculation result of the additive neural network operator.
3. The in-memory computing chip design method for mixed-precision additive neural networks as described in claim 1, characterized in that the bit-flexible minimum selector circuit consists of 8 encoder circuits, 1 multi-level comparison tree circuit, and 8 selector circuits.
4. The in-memory computing chip design method for mixed-precision additive neural networks as described in claim 3, characterized in that the inputs of the bit-flexible minimum selector circuit include an 8-bit weight W, an 8-bit activation value ACT, and a precision configuration signal that selects among the different bit widths.
5. The in-memory computing chip design method for mixed-precision additive neural networks as described in claim 1, characterized in that the in-memory computing array circuit based on the bit-flexible minimum selector circuit includes one read / write circuit, 32 sub-blocks (banks), one address decoder circuit, one activation summation adder tree circuit, and 32 result generation circuits; each bank contains 256 SRAMs, 32 bit-flexible minimum selector circuits, and one adder tree.
6. The in-memory computing chip design method for mixed-precision additive neural networks as described in claim 1, characterized in that the bit-flexible minimum selector can be flexibly configured according to different bit precisions.
7. The in-memory computing chip design method for mixed-precision additive neural networks as described in claim 6, characterized in that, for a 2-bit additive neural network, the bit-flexible minimum selector circuit is configured for 2-bit precision calculation, and each bit-flexible minimum selector circuit implements four sets of 2-bit calculations.
8. The in-memory computing chip design method for mixed-precision additive neural networks as described in claim 6, characterized in that, for a 4-bit additive neural network, the bit-flexible minimum selector circuit is configured for 4-bit precision calculation, and each bit-flexible minimum selector circuit implements two sets of 4-bit calculations.
9. The in-memory computing chip design method for mixed-precision additive neural networks as described in claim 6, characterized in that, for an 8-bit additive neural network, the bit-flexible minimum selector circuit is configured for 8-bit precision calculation, and each bit-flexible minimum selector circuit implements one set of 8-bit calculations.