Method and apparatus for determining multiply-add, electronic device, and storage medium

By determining the target segment of the operands and generating the target product flag and weights, the problems of low energy efficiency and low hardware utilization in the prior art are solved, and the energy efficiency of the multiplication and addition process is improved.

CN116301717B · Active · Publication Date: 2026-06-19 · INST OF AUTOMATION CHINESE ACAD OF SCI +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INST OF AUTOMATION CHINESE ACAD OF SCI
Filing Date
2022-11-22
Publication Date
2026-06-19


Abstract

This invention provides a method, apparatus, electronic device, and storage medium for determining a multiply-addition sum, relating to the field of computer technology. The method includes: determining at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand; the first operand has a data precision of q bits, and the second operand has a data precision of p bits; q and p are both positive integers; generating a target product flag bit for at least one target product based on each target segment; generating a weight for each target product based on each target product flag bit and a sign flag bit corresponding to each set of operands; and determining the multiply-addition sum corresponding to at least one set of operands based on each target product and its weight. The method provided by this invention reduces redundant computation, thereby reducing energy consumption and improving energy efficiency.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method, apparatus, electronic device, and storage medium for determining a multiply-addition sum.

Background Technology

[0002] Deep neural network (DNN) algorithms have three characteristics: (1) the basic pattern of most computations in a DNN is the multiply-accumulate (MAC) operation, i.e., $\sum_i a_i \cdot w_i$, where $a_i$ is the activation value of a convolutional layer (usually the output of the previous convolutional layer) and $w_i$ is the weight of the convolution kernel; (2) the operand precision of the MAC operation (i.e., the bit width of the operands, also known as the data bit width) can be reduced without significantly degrading the DNN computation results, and common operand precisions are 2 bits, 4 bits, 8 bits, and 16 bits; (3) to maintain the accuracy of the DNN computation results, the operand precision of the MAC operation can be adjusted separately for different convolutional layers of the same DNN, and the precision varies greatly between layers.

[0003] Fixed-bit-width accelerators can only perform MAC operations with one data precision, which has significant limitations when running DNNs with variable precision. Variable-bit-width accelerators effectively solve the problems of fixed-bit-width accelerators. The basic computational unit of a variable-bit-width accelerator is a variable-precision multiply-accumulate unit, which can be configured to different operating modes according to the precision of the operands, and can well accommodate MAC operations with different precision data. Common design methods for variable-precision multiply-accumulate units can be divided into two categories: Low Precision Combination (LPC) methods and High Precision Split (HPS) methods. Among them, the LPC method has low energy efficiency in high-precision (e.g., 16-bit) mode, while the HPS method has low hardware utilization when performing low-precision (e.g., 2-bit) multiplication.

Summary of the Invention

[0004] This invention provides a method, apparatus, electronic device, and storage medium for determining the sum of multiplication and addition, which addresses the shortcomings of low energy efficiency in high-precision mode in the prior art. It reduces redundant calculations in the calculation of the sum of multiplication and addition of at least one operand, thereby reducing energy consumption and improving energy efficiency.

[0005] This invention provides a method for determining the sum of multiplication and addition, comprising:

[0006] At least one target segment is determined based on at least one set of operands; each set of operands includes a first operand and a second operand; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; where q and p are both positive integers.

[0007] Based on each of the target segments, a target product flag bit is generated for at least one target product;

[0008] Based on each of the target product flag bits and the sign flag bits corresponding to each set of operands, the weight of each target product is generated;

[0009] Based on each target product and the weight of each target product, the multiply-addition sum corresponding to the at least one set of operands is determined.

[0010] According to a method for determining a multiply-addition sum provided by the present invention, determining at least one target segment based on at least one set of operands includes:

[0011] For each set of operands, the first operand and the second operand are preprocessed to obtain the third operand and the fourth operand;

[0012] The third operand and the fourth operand are divided into segments to obtain at least one target segment.

[0013] According to a method for determining a multiply-addition sum provided by the present invention, generating a target product flag bit for at least one target product based on each of the target segments includes:

[0014] Based on each of the target segments, at least one segment combination is generated;

[0015] Based on the numerical values corresponding to each of the segment combinations, a target product flag bit is generated for each of the target products.

[0016] According to a method for determining a multiply-addition sum provided by the present invention, the step of generating the weights of each of the target products based on each of the target product flag bits and the sign flag bits corresponding to each set of operands includes:

[0017] Based on each of the target product flag bits, generate a target product flag bit combination corresponding to each of the target products;

[0018] The weights of each target product are determined based on the target product flag combination and the sign flag corresponding to each set of operands.

[0019] According to a method for determining a multiply-addition sum provided by the present invention, the step of determining the weight of each target product based on the target product flag combination and the sign flag corresponding to each group of operands includes:

[0020] The target coefficient is determined based on the target product flag combination and the sign flag corresponding to each group of operands;

[0021] Based on the target coefficients, the weights of each target product are determined.

[0022] According to a method for determining a multiply-addition sum provided by the present invention, determining the multiply-addition sum corresponding to at least one set of operands based on each of the target products and the weights of each of the target products includes:

[0023] Based on each of the target products and the weights of each of the target products, at least one partial sum is determined;

[0024] The partial sums are added to obtain the multiply-addition sum corresponding to the at least one set of operands.
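The four steps summarized above can be illustrated with a small software sketch. It is not the claimed hardware: it assumes that the preprocessing step takes the magnitude of each operand, that target segments are unsigned 2-bit slices, and that each weight accumulates the signed powers of two of every segment pair producing a given target product. The names `split_2bit` and `multiply_add` are hypothetical, and the segment multiplication stands in for the flag lookup described in the text.

```python
from collections import defaultdict

TARGET_PRODUCTS = (1, 2, 3, 4, 6, 9)  # non-zero products of two 2-bit unsigned values


def split_2bit(value, bits):
    """Split a non-negative integer into bits // 2 unsigned 2-bit segments, LSB first."""
    return [(value >> (2 * i)) & 0b11 for i in range(bits // 2)]


def multiply_add(pairs, q, p):
    """Sum of x*y over all pairs, computed as sum(target_product * weight)."""
    weights = defaultdict(int)
    for x, y in pairs:
        # Sign flag: XOR of the two sign bits (assumed to apply to the whole product).
        sign = -1 if (x < 0) != (y < 0) else 1
        xs = split_2bit(abs(x), q)  # assumed preprocessing: use magnitudes
        ys = split_2bit(abs(y), p)
        for m, xm in enumerate(xs):
            for n, yn in enumerate(ys):
                prod = xm * yn  # stands in for the one-hot flag lookup
                if prod:        # zero products contribute nothing
                    weights[prod] += sign * (1 << (2 * (m + n)))
    # Only six multiplications by small constants remain.
    return sum(t * weights[t] for t in TARGET_PRODUCTS)


pairs = [(-32768, 32767), (123, -456), (-7, -9)]
result = multiply_add(pairs, 16, 16)
```

Under these assumptions the result equals the directly computed sum of products, while the only true multiplications left are the six products of a target value and its weight.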

[0025] The present invention also provides an apparatus for determining the sum of multiplication and addition, comprising:

[0026] The first determining module is used to determine at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand;

[0027] The first generation module is used to generate a target product flag bit for at least one target product based on each of the target segments;

[0028] The second generation module is used to generate the weights of each target product based on the target product flag bits and the sign flag bits corresponding to each set of operands;

[0029] The second determining module is used to determine the multiply-addition sum corresponding to the at least one set of operands based on each of the target products and the weights of each of the target products.

[0030] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for determining the multiplication-addition sum as described above.

[0031] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method for determining the multiplication-addition sum as described above.

[0032] The present invention also provides a computer program product, including a computer program that, when executed by a processor, implements the method for determining the multiplication-addition sum as described above.

[0033] The present invention provides a method, apparatus, electronic device, and storage medium for determining a multiply-addition sum. This involves determining at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand; the first operand has a data precision of q bits, and the second operand has a data precision of p bits; both q and p are positive integers; then, based on multiple target segments, at least one target product flag is generated; based on each target product flag and the sign flag corresponding to each set of operands, a weight for each target product is generated; and finally, based on multiple target products and the weight of each target product, the multiply-addition sum corresponding to at least one set of operands is determined. This reduces redundant calculations during the calculation of the multiply-addition sum of at least one set of operands, thereby reducing energy consumption and improving energy efficiency.

Attached Figure Description

[0034] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0035] Figure 1 is a schematic diagram of the LPC method provided by the prior art;

[0036] Figure 2 is a schematic diagram of the HPS method provided by the prior art;

[0037] Figure 3 is a schematic diagram of operand bit segmentation provided in the prior art;

[0038] Figure 4 is a schematic diagram of an LPC array provided in the prior art;

[0039] Figure 5 is one of the flowcharts illustrating the method for determining the multiply-addition sum provided by the present invention;

[0040] Figure 6(a) is a numerical schematic diagram of the input operand representation under 16-bit precision provided by the present invention;

[0041] Figure 6(b) is a numerical diagram of the input operand representation under 8-bit precision provided by the present invention;

[0042] Figure 6(c) is a numerical schematic diagram of the input operand representation under 4-bit precision provided by the present invention;

[0043] Figure 6(d) is a numerical diagram of the input operand representation under 2-bit precision provided by the present invention;

[0044] Figure 7 shows the bit segment numbering corresponding to the operands provided by the present invention;

[0045] Figure 8(a) is one of the schematic diagrams of the generation structure of the target product flag provided by the present invention;

[0046] Figure 8(b) is the second schematic diagram of the generation structure of the target product flag provided by the present invention;

[0047] Figure 8(c) is the third schematic diagram of the generation structure of the target product flag provided by the present invention;

[0048] Figure 8(d) is the fourth schematic diagram of the generation structure of the target product flag provided by the present invention;

[0049] Figure 8(e) is the fifth schematic diagram of the generation structure of the target product flag provided by the present invention;

[0050] Figure 8(f) is the sixth schematic diagram of the generation structure of the target product flag provided by the present invention;

[0051] Figure 9 is a schematic diagram illustrating the generation of the target product flag combination provided by the present invention;

[0052] Figure 10 shows the calculation structure of the target coefficient $c_7$ at 16-bit precision provided by the present invention;

[0053] Figure 11(a) is a schematic diagram, provided by the present invention, of combining the $c_i$ in 16-bit precision mode to obtain the $c_i$ in 8-bit precision mode;

[0054] Figure 11(b) is a schematic diagram, provided by the present invention, of combining the $c_i$ in 8-bit precision mode to obtain the $c_i$ in 4-bit precision mode;

[0055] Figure 11(c) is a schematic diagram, provided by the present invention, of combining the $c_i$ in 4-bit precision mode to obtain the $c_i$ in 2-bit precision mode;

[0056] Figure 12(a) is a schematic diagram of the splicing method in the 16-bit precision mode provided by the present invention;

[0057] Figure 12(b) is a schematic diagram of the splicing method in the 8-bit precision mode provided by the present invention;

[0058] Figure 12(c) is a schematic diagram of the splicing method in the 4-bit precision mode provided by the present invention;

[0059] Figure 12(d) is a schematic diagram of the splicing method in the 2-bit precision mode provided by the present invention;

[0060] Figure 13 is one of the structural schematic diagrams of the variable-precision multiply-accumulate calculation unit provided by the present invention;

[0061] Figure 14 is the second structural schematic diagram of the variable-precision multiply-accumulate calculation unit provided by the present invention;

[0062] Figure 15 is a schematic diagram of the structure of the apparatus for determining the multiply-addition sum provided by the present invention;

[0063] Figure 16 is a schematic diagram of the structure of the electronic device provided by the present invention.

Detailed Implementation

[0064] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.

[0065] To facilitate a clearer understanding of the various embodiments of this application, some relevant background knowledge will be introduced as follows.

[0066] Figure 1 is a schematic diagram of the LPC method provided by the prior art. As shown in Figure 1, low-precision units are combined with configurable shifters to achieve multi-precision MAC operations within one clock cycle. Each LPC MAC unit includes 16 basic 2-bit signed fixed-point multipliers. The fragment products of these 16 multipliers are divided into 4 groups, and the products in each group are added together to obtain 4 partial sums. The shift values of the shifters in the LPC MAC unit of Figure 1 differ across modes. In 2-bit fixed-point mode, the 16 fragment products and the 4 partial sums are added without shifting. In 4-bit fixed-point mode, the fragment products in each group are shifted left by 0, 2, 2, and 4 bits respectively and then added to obtain a partial sum, and the partial sums are added together directly. In 8-bit fixed-point mode, the partial sums are generated in the same way as in 4-bit mode, but the four partial sums are shifted left by 0, 4, 4, and 8 bits respectively before being added together. Therefore, in 2-bit, 4-bit, and 8-bit fixed-point modes, the LPC MAC unit can complete a multiply-accumulate of 16 products, 4 products, and 1 product, respectively, within one clock cycle. However, achieving full reconfigurability of this MAC unit requires significant hardware cost, and the unit is less energy efficient in higher-precision (e.g., 16-bit) modes.
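The shift pattern described for the 4-bit and 8-bit modes can be checked with a short sketch. For brevity it assumes unsigned operands (the prior-art unit uses signed 2-bit multipliers), and the function names are hypothetical.

```python
def segments(value, count):
    """Return `count` unsigned 2-bit segments of value, least significant first."""
    return [(value >> (2 * i)) & 0b11 for i in range(count)]


def mul4_lpc(x, y):
    """4-bit x 4-bit product from four 2-bit fragment products shifted by 0, 2, 2, 4."""
    x0, x1 = segments(x, 2)
    y0, y1 = segments(y, 2)
    return ((x0 * y0) << 0) + ((x1 * y0) << 2) + ((x0 * y1) << 2) + ((x1 * y1) << 4)


def mul8_lpc(x, y):
    """8-bit x 8-bit product from four 4-bit partial sums shifted by 0, 4, 4, 8."""
    xl, xh = x & 0xF, x >> 4
    yl, yh = y & 0xF, y >> 4
    return (
        (mul4_lpc(xl, yl) << 0)
        + (mul4_lpc(xh, yl) << 4)
        + (mul4_lpc(xl, yh) << 4)
        + (mul4_lpc(xh, yh) << 8)
    )
```

Each 8-bit product consumes 16 fragment products here, which matches the 16 basic multipliers in one LPC MAC unit.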

[0067] Figure 2 is a schematic diagram of the HPS method provided by the prior art. As shown in Figure 2, gating divides the high-precision unit into several sub-units to accommodate low-precision operations. The HPS MAC unit is a two-dimensional symmetric scalable architecture. In 8-bit, 4-bit, or 2-bit fixed-point mode, the HPS MAC unit can perform one 8-bit fixed-point multiplication, two 4-bit fixed-point multiplications, or four 2-bit fixed-point multiplications, respectively. In 8-bit fixed-point multiplication mode, the hardware utilization of HPS is 100%; in 4-bit fixed-point multiplication mode, the hardware utilization is 50%; and in 2-bit fixed-point multiplication mode, the hardware utilization is only 25%. It is evident that when HPS performs low-precision multiplication, the hardware utilization is relatively low.

[0068] The LPC method will be explained further below.

[0069] Formula (1) describes the basic idea of the LPC method:

[0070] $$\sum_{l=1}^{L} x_l \cdot y_l = \sum_{l=1}^{L} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \alpha_{m,n} \cdot x_{m,l} \cdot y_{n,l} \tag{1}$$

[0071] Where L represents the number of multiplications in the multiply-add operation, q represents the data precision of operand x (q bits), p represents the data precision of operand y (p bits), l indexes the l-th multiplication in the multiply-add operation, M represents the number of 2-bit segments into which operand x is split (M = q / 2), and N represents the number of 2-bit segments into which operand y is split (N = p / 2). $x_{m,l}$ represents the m-th 2-bit segment of operand x in the l-th multiplication, $y_{n,l}$ represents the n-th 2-bit segment of operand y in the l-th multiplication, and $\alpha_{m,n}$ represents the coefficient of the corresponding segment product, with $\alpha_{m,n} = 2^{2(m+n)}$.

[0072] By calculating signed multiplications between different segments and shifting the products, operand multiplication in MAC is transformed into addition of bit segment product shift values.
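As a sanity check of this shift-and-add view, the sketch below recomputes a signed multiply-add from 2-bit segments with coefficient 2^(2(m+n)). It assumes two's-complement operands whose top segment is interpreted as a signed 2-bit value; the function names are hypothetical.

```python
def signed_segments(value, bits):
    """2-bit segments of a two's-complement integer, least significant first.

    Lower segments are unsigned (0..3); the top segment carries the sign (-2..1).
    """
    raw = value & ((1 << bits) - 1)
    segs = [(raw >> (2 * i)) & 0b11 for i in range(bits // 2)]
    if segs[-1] >= 2:  # sign bit set: reinterpret the top segment as signed
        segs[-1] -= 4
    return segs


def mac_by_segments(xs, ys, q, p):
    """Multiply-add over operand pairs using only segment products and shifts."""
    total = 0
    for x, y in zip(xs, ys):
        for m, xm in enumerate(signed_segments(x, q)):
            for n, yn in enumerate(signed_segments(y, p)):
                total += (xm * yn) << (2 * (m + n))  # coefficient 2^(2(m+n))
    return total


xs = [-32768, 32767, -1, 1234]
ys = [32767, -32768, -1, -4321]
check = mac_by_segments(xs, ys, 16, 16)
```

For 16-bit operands each pair expands into 8 x 8 = 64 segment products, one per basic unit of the array discussed next.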

[0073] The following section uses a 16-bit multiply-accumulate operation as an example for further explanation, and qualitatively describes the low energy efficiency problem of the LPC method in high-precision mode.

[0074] Figure 3 is a schematic diagram of operand bit segmentation provided in the prior art. As shown in Figure 3, operands x and y are both 16-bit signed data, and each is split into 2-bit segments. The split segments undergo signed multiplication in the basic units of the array shown in Figure 4 (which requires each 2-bit segment to be extended by a 1-bit sign bit). Figure 4 is a schematic diagram of an LPC array provided in the prior art. As shown in Figure 4, the LPC array contains 64 basic units (called BitBricks, BBs). Each BB contains a 2-bit signed multiplier and a configurable shifter. The signed multiplier performs segment multiplication with sign bit extension, and the configurable shifter shifts the segment product left by a certain number of bits. This LPC array can perform a multiply-add operation over 1 pair of 16-bit operands, 4 pairs of 8-bit operands, 16 pairs of 4-bit operands, or 64 pairs of 2-bit operands. Table 1 shows the multiplication and shifting operations performed within each BB when the LPC array performs a multiply-add operation with 16-bit operand precision.

[0075] The rules for sign bit extension of a bit segment are as follows:

[0076] Rule 1: If the segment is an unsigned segment (it does not contain the sign bit of the original value), such as segments 0 to 6 in Figure 3, its sign bit is extended with 0. The 3-bit signed number represented by the extended segment is then numerically equal to the 2-bit unsigned number represented by the original unsigned segment.

[0077] Rule 2: If the segment is a signed segment (it contains the sign bit of the original value), such as segment 7 (a 2-bit signed segment) in Figure 3, its sign bit is extended with the sign bit of the operand. For example, if the sign bit of operand x is 0, the sign bit of segment 7 is extended with 0, and the signed number represented by this segment is numerically equal to the 2-bit unsigned number represented by the original segment. If the sign bit of operand x is 1, the sign bit of segment 7 is extended with 1, and the signed number represented by this segment is numerically equal to the 2-bit signed number represented by the original segment.
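The two extension rules can be illustrated at the bit level. The sketch below is only an illustration with hypothetical helper names; it extends each 2-bit segment of a 16-bit two's-complement operand to a 3-bit pattern and decodes that pattern as a signed value.

```python
def extend_segment(segment, is_top, operand_sign_bit):
    """Return the 3-bit pattern (0..7) of a 2-bit segment after sign-bit extension."""
    ext = operand_sign_bit if is_top else 0  # rule 2 for the top segment, rule 1 otherwise
    return (ext << 2) | segment


def decode_3bit_signed(pattern):
    """Interpret a 3-bit pattern as a two's-complement value in -4..3."""
    return pattern - 8 if pattern & 0b100 else pattern


def extended_values(x, bits=16):
    """Signed values of all sign-extended segments of x, least significant first."""
    raw = x & ((1 << bits) - 1)
    sign_bit = raw >> (bits - 1)
    count = bits // 2
    values = []
    for i in range(count):
        seg = (raw >> (2 * i)) & 0b11
        values.append(decode_3bit_signed(extend_segment(seg, i == count - 1, sign_bit)))
    return values


def recombine(values):
    """Rebuild the operand from its segment values (segment i has weight 4**i)."""
    return sum(v << (2 * i) for i, v in enumerate(values))
```

For example, -3 yields the segment values 1, 3, 3, 3, 3, 3, 3, -1: only the top segment can be negative, which is why the lower segments behave as unsigned numbers.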

[0078] Table 1. Operations between bit segments

[0079]
BB number            0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
x fragment           0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1
y fragment           0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7
Product shift value  0   2   4   6   8   10  12  14  2   4   6   8   10  12  14  16

BB number            16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
x fragment           2   2   2   2   2   2   2   2   3   3   3   3   3   3   3   3
y fragment           0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7
Product shift value  4   6   8   10  12  14  16  18  6   8   10  12  14  16  18  20

BB number            32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
x fragment           4   4   4   4   4   4   4   4   5   5   5   5   5   5   5   5
y fragment           0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7
Product shift value  8   10  12  14  16  18  20  22  10  12  14  16  18  20  22  24

BB number            48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
x fragment           6   6   6   6   6   6   6   6   7   7   7   7   7   7   7   7
y fragment           0   1   2   3   4   5   6   7   0   1   2   3   4   5   6   7
Product shift value  12  14  16  18  20  22  24  26  14  16  18  20  22  24  26  28

[0080] Based on the above sign bit extension rules, the operands of the 49 blocks in Table 1 that involve only fragments 0 to 6 of both x and y (the blocks that are neither bold nor slanted in the original table) are all sign-extended according to rule 1. In these blocks, a 3-bit signed multiplication is actually equivalent to a 2-bit unsigned multiplication. The remaining 15 blocks, which involve fragment 7 of x or of y (the bold, slanted blocks in the original table), have an operand that is sign-extended according to rule 2. Furthermore, when both operand x and operand y are positive (i.e., their sign bits are 0), the sign bit extension of the operands in these blocks is also 0, so a 3-bit signed multiplication is again equivalent to a 2-bit unsigned multiplication. Therefore, at the basic computational unit level, the LPC method wastes computational resources.
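The regular structure of Table 1 and the 49/15 split can be reproduced with a few lines. This is an illustrative sketch only; the row layout `(bb, x_fragment, y_fragment, shift)` and the function name are assumptions made for the example.

```python
def bb_table(segments_per_operand=8):
    """Rows (bb, x_fragment, y_fragment, shift) for a 16-bit x 16-bit LPC array."""
    rows = []
    for bb in range(segments_per_operand ** 2):
        m, n = divmod(bb, segments_per_operand)  # x fragment, y fragment
        rows.append((bb, m, n, 2 * (m + n)))     # shift matches 2^(2(m+n))
    return rows


rows = bb_table()
TOP = 7  # fragment that contains the operand's sign bit

# Blocks touching the signed fragment of x or y (rule 2 applies to an operand).
rule2_blocks = [r for r in rows if r[1] == TOP or r[2] == TOP]
# Blocks where both fragments are unsigned (rule 1 only).
rule1_blocks = [r for r in rows if r[1] != TOP and r[2] != TOP]
```

Here `len(rule1_blocks)` is 49 and `len(rule2_blocks)` is 15, the two block counts discussed above.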

[0081] Table 2. All combinations of 2-bit unsigned number multiplication

[0082]
Segment 1  00  00  00  00  01  01  01  01  10  10  10  10  11  11  11  11
Segment 2  00  01  10  11  00  01  10  11  00  01  10  11  00  01  10  11
Product    0   0   0   0   0   1   2   3   0   2   4   6   0   3   6   9

[0083] Furthermore, Table 2 shows all combinations of 2-bit unsigned number multiplication. According to Table 2, there are 9 valid combinations (both multipliers are non-zero) and 6 distinct valid products. The LPC method nevertheless uses 49 BBs to perform all the 3-bit signed multiplications whose extended sign bit is 0. Therefore, at the array level, the LPC method suffers from significant computational redundancy.
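The counts quoted from Table 2 follow from a direct enumeration, sketched below with illustrative variable names.

```python
from itertools import product

# All pairs of 2-bit unsigned values (16 combinations, as in Table 2).
combos = list(product(range(4), repeat=2))

# Valid combinations: both multipliers are non-zero.
valid = [(a, b) for a, b in combos if a and b]

# Distinct non-zero products that can ever occur.
distinct_products = sorted({a * b for a, b in valid})
```

Only six distinct values occur among the products, yet the array dedicates 49 basic units to computing them, which is the redundancy the paragraph above points out.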

[0084] The method for determining the multiply-addition sum of the present invention is described below with reference to Figures 1-13.

[0085] Figure 5 is one of the flowcharts illustrating the method for determining the multiply-addition sum provided by the present invention. As shown in Figure 5, the method includes steps 510-540, wherein:

[0086] Step 510: Determine at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; q and p are both positive integers.

[0087] It should be noted that the method for determining the multiplication sum provided by this invention can be applied to the scenario of convolution calculation in deep neural networks. The executing entity of the method for determining the multiplication sum provided by this invention can be the multiplication sum determination device provided by this invention, such as a variable precision multiplication-accumulation calculation unit or a control module in the multiplication sum determination device for executing the method for determining the multiplication sum.

[0088] Specifically, each operand pair includes a first operand and a second operand. The first operand has a data precision of q bits, and the second operand has a data precision of p bits; both q and p are positive integers. The first and second operands can be signed or unsigned. For example, each operand pair may include one 16-bit first operand and one 16-bit second operand, or four 8-bit first operands and one 8-bit second operand, or sixteen 4-bit first operands and one 4-bit second operand, or sixty-four 2-bit first operands and one 2-bit second operand.

[0089] For example, when the input operands are 128 bits wide and the multiply-add precision is 16 bits, 8 bits, 4 bits, or 2 bits, the numerical representation of the input operands provided by the present invention is shown in Figures 6(a), 6(b), 6(c), and 6(d), respectively.

[0090] In practice, for each set of operands, at least one target segment can be determined based on the data precision of the first operand and the second operand, where each target segment is a 2-bit segment. For example, if the data precision of the first operand is q bits and the data precision of the second operand is p bits, then the first operand corresponds to M 2-bit segments (M = q / 2), and the second operand corresponds to N 2-bit segments (N = p / 2).

[0091] Step 520: Generate a target product flag based on each of the target segments.

[0092] Specifically, at least one target product flag can be generated based on at least one determined target segment.

[0093] For example, if the target segment is a 2-bit segment, Table 2 shows all combinations of multiplication of a pair of 2-bit unsigned numbers. As shown in Table 2, the product of the 2-bit unsigned numbers corresponding to segment 1 and segment 2 has a total of 7 possibilities: 0, 1, 2, 3, 4, 6, and 9. In this application, valid products of 1, 2, 3, 4, 6, and 9 are used as the target product.

[0094] The target product flag is represented by a 6-bit one-hot code, used to indicate which of the six target products (1, 2, 3, 4, 6, and 9) the product of each target segment belongs to. For example, if the target product flag is 100000, it means that the target segment corresponding to the first operand is x_slice = 01, and the target segment corresponding to the second operand is y_slice = 01, i.e., x_slice × y_slice = 1. In this case, the target product flag is 100000, and the target product indicated by the target product flag is 1.
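A one-hot flag of this kind can be sketched as follows. The text only fixes the code 100000 for the product 1; the bit order for the remaining products (1, 2, 3, 4, 6, 9 from the most significant bit down) and the all-zero code for a zero product are assumptions of this example. The multiplication is used only to pick the flag, where hardware would use combinational logic on the segment bits.

```python
TARGET_PRODUCTS = (1, 2, 3, 4, 6, 9)  # assumed bit order, most significant bit first


def product_flag(x_slice, y_slice):
    """Return the 6-bit one-hot flag (as a string) for two 2-bit unsigned segments."""
    value = x_slice * y_slice
    if value == 0:
        return "000000"  # assumption: a zero product sets no flag bit
    index = TARGET_PRODUCTS.index(value)
    return "".join("1" if i == index else "0" for i in range(len(TARGET_PRODUCTS)))


flag_one = product_flag(0b01, 0b01)   # the example from the text
flag_nine = product_flag(0b11, 0b11)
```

With this ordering `flag_one` is `"100000"`, in agreement with the example above.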

[0095] Step 530: Generate the weights of each target product based on the target product flag bits and the sign flag bits corresponding to each set of operands.

[0096] It should be noted that reordering the target segments yields the segment combinations corresponding to those target segments. During the weight generation process of the target product, each target segment needs to be accompanied by a 1-bit sign flag that indicates whether the value it contributes is positive or negative. The 1-bit sign flags are arranged according to the shaping (arrangement) method of the operands.

[0097] The sign flag for each set of operands can be obtained from the sign bits of the first and second operands, and can be represented by formula (2):

[0098] $$sign\_p = sign\_x \oplus sign\_y \tag{2}$$

[0099] Where sign_p represents the sign flag of each operand group, sign_x represents the sign bit of operand x, sign_y represents the sign bit of operand y, and $\oplus$ denotes the XOR operation.

[0100] For example, by extracting the sign bits of the first operand x and the second operand y from the 128-bit input data and performing a bitwise XOR operation, the sign flag bits of the target products can be obtained. For 16-bit precision mode, one 1-bit target product sign flag is obtained, denoted as $s_{16,0}$. For 8-bit precision mode, four 1-bit target product sign flags are obtained, denoted as $s_{8,0}$ to $s_{8,3}$. For 4-bit precision mode, 16 1-bit target product sign flags are obtained, denoted as $s_{4,0}$ to $s_{4,15}$. For 2-bit precision mode, 64 1-bit target product sign flags are obtained, denoted as $s_{2,0}$ to $s_{2,63}$. The sign flags of the target products are arranged (shaped) as shown in Tables 3-6. Table 3 shows the shaping method for the target product sign flags corresponding to 16-bit operands, Table 4 shows the shaping method for 8-bit operands, Table 5 shows the shaping method for 4-bit operands, and Table 6 shows the shaping method for 2-bit operands.
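The sign flag extraction can be sketched as below. The operand lists and the function name are hypothetical, and the closed form used for the number of flags is only an observation that reproduces the counts stated above (1, 4, 16, 64).

```python
def sign_flags(xs, ys, precision):
    """Per-pair sign flag: XOR of the sign bits of two `precision`-bit operands."""
    msb = precision - 1
    return [((x >> msb) & 1) ^ ((y >> msb) & 1) for x, y in zip(xs, ys)]


# Four 8-bit operand pairs, as in the 8-bit precision mode.
xs = [-3, 5, -1, 7]
ys = [4, -2, -8, 1]
flags_8bit = sign_flags(xs, ys, 8)

# Number of sign flags per precision mode; matches the counts given in the text.
flags_per_mode = {bits: (16 // bits) ** 2 for bits in (16, 8, 4, 2)}
```

A flag of 1 marks a negative product and 0 a non-negative one, so each flag can be shared by every segment pair that belongs to the same operand pair.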

[0101] Table 3. Shaping method of the target product sign flags for 16-bit operands

[0102]
x fragment  0     1     0     2     1     0     3     2     1     0     4     3     2     1     0     5     4     3     2     1     0     6     5     4     3     2     1     0     7     6     5     4
y fragment  0     0     1     0     1     2     0     1     2     3     0     1     2     3     4     0     1     2     3     4     5     0     1     2     3     4     5     6     0     1     2     3
Sign flag   s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0

[0103] Table 3 (continued). Shaping method of the target product sign flags for 16-bit operands

[0104]
x fragment  3     2     1     0     7     6     5     4     3     2     1     7     6     5     4     3     2     7     6     5     4     3     7     6     5     4     7     6     5     7     6     7
y fragment  4     5     6     7     1     2     3     4     5     6     7     2     3     4     5     6     7     3     4     5     6     7     4     5     6     7     5     6     7     6     7     7
Sign flag   s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0 s16,0

[0105] Table 4. Integer shaping of the target product sign flag for 8-bit operands

[0106] x fragment 3 1 0 0 2 1 0 4 8 12 2 0 1 6 4 5 4 9 8 13 12 5 10 8 9 14 12 13 7 4 6 5 y fragment 0 0 1 3 1 2 0 4 8 12 0 2 1 4 6 4 5 8 9 12 13 5 8 10 9 12 14 13 4 7 5 6 Symbol flag <![CDATA[s 8, 0]]> <![CDATA[s 8, 0]]> <![CDATA[s 8, 0]]> <![CDATA[s 8, 0]]> <![CDATA[s 8,0 ]]> <![CDATA[s 8, 0]]> <![CDATA[s 8, 0]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 2]]> <![CDATA[s 8, 3]]> <![CDATA[s 8, 0]]> <![CDATA[s 8, 0]]> <![CDATA[s 8, 0]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 2]]> <![CDATA[s 8, 3]]> <![CDATA[s 8, 3]]> <![CDATA[s 8, 3]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 2]]> <![CDATA[s 8, 2]]> <![CDATA[s 8, 2]]> <![CDATA[s 8, 3]]> <![CDATA[s 8, 3]]> <![CDATA[s 8, 3]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 1]]> <![CDATA[s 8, 1]]>

[0107] Table 4 (continued). The shaping method of the target product sign flag for 8-bit operands.

[0108]
x fragment: 11 8 10 1 3 1 2 7 5 6 11 3 2 7 6 11 10 9 10 15 13 14 3 7 11 15 15 12 14 15 14 13
y fragment: 8 11 9 10 1 3 2 5 7 6 9 2 3 6 7 10 11 11 10 13 15 14 3 7 11 15 12 15 13 14 15 14
Sign flag: s_{8,2} s_{8,2} s_{8,2} s_{8,2} s_{8,0} s_{8,0} s_{8,0} s_{8,1} s_{8,1} s_{8,1} s_{8,2} s_{8,0} s_{8,0} s_{8,1} s_{8,1} s_{8,2} s_{8,2} s_{8,2} s_{8,2} s_{8,3} s_{8,3} s_{8,3} s_{8,0} s_{8,1} s_{8,2} s_{8,3} s_{8,3} s_{8,3} s_{8,3} s_{8,3} s_{8,3} s_{8,3}

[0109] Table 5. Shaping method of the target product sign flag for 4-bit operands

[0110]
x fragment: 30 31 28 29 26 27 30 28 26 24 22 20 18 16 14 24 25 22 23 20 21 12 10 8 6 4 2 0 18 19 16 17
y fragment: 31 30 29 28 27 26 30 28 25 24 22 20 18 16 14 25 24 23 22 21 20 12 10 8 6 4 2 0 19 18 17 16
Sign flag: s_{4,0} s_{4,0} s_{4,1} s_{4,1} s_{4,2} s_{4,2} s_{4,0} s_{4,1} s_{4,2} s_{4,3} s_{4,4} s_{4,5} s_{4,6} s_{4,7} s_{4,8} s_{4,3} s_{4,3} s_{4,4} s_{4,4} s_{4,5} s_{4,5} s_{4,9} s_{4,10} s_{4,11} s_{4,12} s_{4,13} s_{4,14} s_{4,15} s_{4,6} s_{4,6} s_{4,7} s_{4,7}

[0111] Table 5 (continued). The shaping method of the target product sign flag for 4-bit operands.

[0112]
x fragment: 14 15 12 13 31 29 27 25 23 21 19 10 11 8 9 6 7 17 15 13 11 9 7 5 3 1 4 5 2 3 0 1
y fragment: 15 14 13 12 31 29 27 25 23 21 19 11 10 9 8 7 6 17 15 13 11 9 7 5 3 1 5 4 3 2 1 0
Sign flag: s_{4,8} s_{4,8} s_{4,9} s_{4,9} s_{4,0} s_{4,1} s_{4,2} s_{4,3} s_{4,4} s_{4,5} s_{4,6} s_{4,10} s_{4,10} s_{4,11} s_{4,11} s_{4,12} s_{4,12} s_{4,7} s_{4,8} s_{4,9} s_{4,10} s_{4,11} s_{4,12} s_{4,13} s_{4,14} s_{4,15} s_{4,13} s_{4,13} s_{4,14} s_{4,14} s_{4,15} s_{4,15}

[0113] Table 6. Shaping method of the target product sign flag for 2-bit operands

[0114]
x fragment: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
y fragment: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Sign flag: s_{2,0} s_{2,1} s_{2,2} s_{2,3} s_{2,4} s_{2,5} s_{2,6} s_{2,7} s_{2,8} s_{2,9} s_{2,10} s_{2,11} s_{2,12} s_{2,13} s_{2,14} s_{2,15} s_{2,16} s_{2,17} s_{2,18} s_{2,19} s_{2,20} s_{2,21} s_{2,22} s_{2,23} s_{2,24} s_{2,25} s_{2,26} s_{2,27} s_{2,28} s_{2,29} s_{2,30} s_{2,31}

[0115] Table 6 (continued). The shaping method of the target product sign flag for 2-bit operands.

[0116]
x fragment: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
y fragment: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
Sign flag: s_{2,32} s_{2,33} s_{2,34} s_{2,35} s_{2,36} s_{2,37} s_{2,38} s_{2,39} s_{2,40} s_{2,41} s_{2,42} s_{2,43} s_{2,44} s_{2,45} s_{2,46} s_{2,47} s_{2,48} s_{2,49} s_{2,50} s_{2,51} s_{2,52} s_{2,53} s_{2,54} s_{2,55} s_{2,56} s_{2,57} s_{2,58} s_{2,59} s_{2,60} s_{2,61} s_{2,62} s_{2,63}

[0117] Specifically, the weights of each target product can be generated based on the flag bits of each target product and the sign flag bits corresponding to each set of operands.

[0118] Step 540: Based on each of the target products and the weight of each of the target products, determine the multiply-add sum corresponding to the at least one set of operands.

[0119] Specifically, the multiply-add sum corresponding to the at least one set of operands can be determined based on each target product and its weight.

[0120] The method for determining the sum of multiplication and addition provided by this invention determines at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; q and p are both positive integers; then, based on multiple target segments, at least one target product flag is generated; based on each target product flag and the sign flag corresponding to each set of operands, the weight of each target product is generated; and finally, based on multiple target products and the weight of each target product, the sum of multiplication and addition corresponding to at least one set of operands is determined. This method reduces redundant calculations during the calculation of the sum of multiplication and addition of at least one operand, thereby reducing energy consumption and improving energy efficiency.

[0121] Optionally, the specific implementation of step 510 above includes:

[0122] Step 1) For each set of operands, preprocess the first operand and the second operand to obtain the third operand and the fourth operand.

[0123] It should be noted that for the first operand x and the second operand y (assuming x < 0, y > 0) where the product s is negative, the product s is calculated using the LPC method, and the product s is expressed by formula (3), where:

[0124]

[0125] Where M represents the number of 2-bit segments of the first operand x, N represents the number of 2-bit segments of the second operand y, and sg(·) represents signed multiplication. The two segment symbols in formula (3) denote the m-th 2-bit segment of the first operand x and the n-th 2-bit segment of the second operand y, respectively. $\alpha_{m,n}$ represents the coefficient of the product of the m-th 2-bit segment and the n-th 2-bit segment, with $\alpha_{m,n} = 2^{2(m+n)}$.

[0126] In this case, the multiplication between a segment of x and a segment of y is a signed multiplication. If the product s is instead calculated using formula (4), then:

[0127]

[0128] Here, us(·) represents unsigned multiplication.

[0129] In this case, the multiplication between a segment of x and a segment of y can be treated as an unsigned multiplication.

[0130] Therefore, the first and second operands can be preprocessed to convert them into positive numbers, as shown in formula (5), where:

[0131]

[0132] Where op_p represents the sign bit of the first or second operand after the conversion, and sign_op represents the sign bit of the first or second operand before the conversion.

[0133] Check the sign bit of each of the first and second operands. If the sign bit of an operand is 0, no processing is needed; otherwise, the operand is converted to its opposite number. Specifically, each bit of the operand is inverted and 1 is added to the result.
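The sign handling described above can be sketched in Python as follows. This is only a software illustration of a hardware step: the width `WIDTH`, the helper name `preprocess`, and the choice to return the original sign bit alongside the converted value are assumptions made for the example.

```python
WIDTH = 16  # assumed operand width in bits


def preprocess(raw: int, width: int = WIDTH) -> tuple[int, int]:
    """Return (magnitude, sign_bit) for a two's-complement bit pattern."""
    mask = (1 << width) - 1
    raw &= mask
    sign = raw >> (width - 1)          # sign bit of the original operand
    if sign == 0:
        return raw, 0                  # non-negative operand: left unchanged
    magnitude = ((~raw) + 1) & mask    # invert every bit, then add one
    return magnitude, 1
```

The returned sign bits of the two operands can then be XOR-ed to give the sign of their product, in the way sign_x^sign_y is used later in the description.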

[0134] It should be noted that if all negative operands involved in the multiply-add calculation are converted to positive numbers before the calculation, then formula (1) above can be expressed as formula (6), where:

[0135]

[0136] where

[0137] Since the number of target products in unsigned multiplication is finite, formula (6) can be transformed into formula (7), where:

[0138]

[0139] Where A represents the set of coefficients for the target product k, and $N_k$ can be expressed by formula (8), where:

[0140]

[0141] Where i = m + n, and $c_{i,k}$ represents the number of times the target product k appears in the set of all products that need to be left-shifted by 2i bits.

[0142] Step 2) Divide the third operand and the fourth operand respectively to obtain at least one target segment.

[0143] Specifically, after the first and second operands are preprocessed to obtain the third and fourth operands, the third and fourth operands are each divided to obtain at least one target segment. For example, if the data precision of the third operand is q bits and that of the fourth operand is p bits, the third operand can be divided into M 2-bit segments (M = q / 2) and the fourth operand into N 2-bit segments (N = p / 2).
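As a minimal sketch of this division step, the following Python code splits a non-negative operand into 2-bit segments and checks that weighting every segment product by 2^(2(m+n)) reproduces the full product. The function names and the least-significant-first segment numbering are assumptions made for the example.

```python
def split_2bit(value: int, bits: int) -> list[int]:
    """Split a non-negative `bits`-wide value into bits // 2 two-bit segments.

    Segment 0 holds the two least significant bits (an assumed ordering).
    """
    return [(value >> (2 * i)) & 0b11 for i in range(bits // 2)]


def product_from_segments(x: int, y: int, q: int, p: int) -> int:
    """Rebuild x * y from all pairwise 2-bit segment products."""
    xs = split_2bit(x, q)   # M = q / 2 segments
    ys = split_2bit(y, p)   # N = p / 2 segments
    total = 0
    for m, xm in enumerate(xs):
        for n, yn in enumerate(ys):
            total += (xm * yn) << (2 * (m + n))   # weight 2^(2(m+n))
    return total
```

Because each segment product can only be 0, 1, 2, 3, 4, 6, or 9, the double loop visits many repeated values, which is the redundancy the method sets out to remove.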

[0144] The method for determining the multiplication-addition sum provided by the present invention preprocesses the first and second operands in each group of operands to obtain the third and fourth operands, both of which have a sign bit of 0; then, the third and fourth operands are divided to obtain at least one target segment. This reduces redundant calculations in the calculation of the multiplication-addition sum of at least one operand, thereby reducing energy consumption and improving energy efficiency.

[0145] Optionally, the specific implementation of step 520 above includes the following steps:

[0146] Step a) Generate at least one combination of segments based on each of the target segments.

[0147] Specifically, based on multiple target segments, all segment combinations can be generated using the LPC method. That is, by reshaping the target segments obtained from the preprocessed third and fourth operands, at least one segment combination can be generated. For example, Tables 7 to 10 show the reordering methods for 16-bit, 8-bit, 4-bit, and 2-bit precision data, respectively.

[0148] Table 7. Segment combination methods for 16-bit operands

[0149]

[0150] Table 7 (continued). Segment combination methods for 16-bit operands

[0151]

[0152] Table 8. Segment combination methods for 8-bit operands

[0153]

[0154] Table 8 (continued). Segment combination methods for 8-bit operands

[0155]

[0156] Table 9. Segment combination methods for 4-bit operands

[0157]

[0158] Table 9 (continued). Segment combination methods for 4-bit operands

[0159]

[0160] Table 10. Segment combination methods for 2-bit operands

[0161]

[0162] Table 10 (continued). Segment combination methods for 2-bit operands

[0164] The meaning of the segment numbers in Tables 7 to 10 is shown in Figure 7, which is a schematic diagram of the bit-segment numbering of the operands provided by this invention. It should be noted that the number of a segment of x can be represented by m, and the number of a segment of y can be represented by n.

[0165] Step b) Generate the target product flag bit for each target product based on the numerical values ​​corresponding to each of the segment combinations.

[0166] Specifically, since unsigned multiplication of two 2-bit segments has only 6 valid (non-zero) products, a 6-bit one-hot code is generated for each of the 64 segment combinations obtained after shaping. This code indicates which of the 6 target products the product of the segment combination equals.

[0167] x[1] represents the most significant bit (MSB) of segment x, x[0] represents the least significant bit (LSB) of segment x, y[1] represents the MSB of segment y, and y[0] represents the LSB of segment y. Figures 8(a) to 8(f) are the first to sixth schematic diagrams, respectively, of the generation structure of the target product flag provided by the present invention.
If a segment combination is x_slice = 01 and y_slice = 01, the output bit of Figure 8(a) is 1 and the output bits of Figures 8(b) to 8(f) are all 0. The target product flag of this segment combination is therefore 100000, indicating that multiplying these segments would give a product of 1, i.e., x_slice × y_slice = 1.
If a segment combination is x_slice = 01, y_slice = 10, or x_slice = 10, y_slice = 01, the output bit of Figure 8(b) is 1 and the output bits of Figure 8(a) and Figures 8(c) to 8(f) are all 0. The target product flag is therefore 010000, indicating a product of 2, i.e., x_slice × y_slice = 2.
If a segment combination is x_slice = 01, y_slice = 11, or x_slice = 11, y_slice = 01, the output bit of Figure 8(c) is 1 and the output bits of Figures 8(a), 8(b), and 8(d) to 8(f) are all 0. The target product flag is therefore 001000, indicating a product of 3, i.e., x_slice × y_slice = 3.
If a segment combination is x_slice = 10 and y_slice = 10, the output bit of Figure 8(d) is 1 and the output bits of Figures 8(a) to 8(c), 8(e), and 8(f) are all 0. The target product flag is therefore 000100, indicating a product of 4, i.e., x_slice × y_slice = 4.
If a segment combination is x_slice = 10, y_slice = 11, or x_slice = 11, y_slice = 10, the output bit of Figure 8(e) is 1 and the output bits of Figures 8(a) to 8(d) and 8(f) are all 0. The target product flag is therefore 000010, indicating a product of 6, i.e., x_slice × y_slice = 6.
If a segment combination is x_slice = 11 and y_slice = 11, the output bit of Figure 8(f) is 1 and the output bits of Figures 8(a) to 8(e) are all 0. The target product flag is therefore 000001, indicating a product of 9, i.e., x_slice × y_slice = 9.
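The six cases above amount to a small lookup, sketched here in Python. The function name `product_flag` is hypothetical, and the software multiplication is only a shortcut for the example; the structures in Figure 8 derive each flag bit from x[1], x[0], y[1], and y[0] without multiplying.

```python
TARGET_PRODUCTS = (1, 2, 3, 4, 6, 9)  # the six non-zero 2-bit x 2-bit products


def product_flag(x_slice: int, y_slice: int) -> str:
    """Return the 6-bit one-hot flag; the leftmost bit corresponds to product 1."""
    product = x_slice * y_slice
    return "".join("1" if product == k else "0" for k in TARGET_PRODUCTS)
```

A segment combination whose product is 0 yields 000000, so it contributes to none of the six target products.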

[0168] The method for determining the sum of multiplication and addition provided by the present invention generates at least one combination of segments based on multiple target segments; and generates a target product flag bit for each target product based on the value corresponding to each combination of segments. This reduces redundant calculations in the calculation process of the sum of multiplication and addition of at least one operand, thereby reducing energy consumption and improving energy efficiency.

[0169] Optionally, the specific implementation of step 530 above includes:

[0170] 1) Based on each of the target product flag bits, generate a target product flag bit combination corresponding to each target product.

[0171] Specifically, based on each target product flag bit, by combining the bits corresponding to the target product flag bits of 64 fragment pairs, a corresponding target product flag bit combination can be generated for each target product of unsigned 2-bit multiplication, for example, a flag (flag_k) of length 64 bits.

[0172] Figure 9 is a schematic diagram of the generation of the target product flag combination provided by the present invention. As shown in Figure 9, the input data is 256 bits wide, and the 64 corresponding segment pairs are [3:0], [7:4], [11:8], ..., [247:244], [251:248], and [255:252]. Each segment pair represents two 2-bit unsigned numbers. For each segment pair, the target product flag generation structure shown in Figure 8 is used to determine the target product flag of that segment pair. By combining the bits at the same position of the target product flags of the 64 segment pairs, the target product flag combinations corresponding to the six target products are obtained.

[0173] For example:
if the value of segment pair [255:252] is 0101, inputting 0101 into the target product flag generation structure shown in Figure 8 gives the target product flag 100000, which indicates that the product of the segment pair is 1;
if the value of segment pair [251:248] is 1010, inputting 1010 gives the target product flag 000100, which indicates that the product of the segment pair is 4;
if the value of segment pair [247:244] is 1001, inputting 1001 gives the target product flag 010000, which indicates that the product of the segment pair is 2;
if the value of segment pair [11:8] is 0110, inputting 0110 gives the target product flag 010000, which indicates that the product of the segment pair is 2;
if the value of segment pair [7:4] is 1111, inputting 1111 gives the target product flag 000001, which indicates that the product of the segment pair is 9;
if the value of segment pair [3:0] is 0000, inputting 0000 gives the target product flag 000000, which indicates that the product of the segment pair is 0, an invalid product.

[0174] Finally, the bits at the same position of the target product flags for each of the 64 segment pairs are combined. For example, the first bit 1 of the target product flag 100000 for segment pair [255:252], the first bit 0 of the target product flag 000100 for segment pair [251:248], the first bit 0 of the target product flag 010000 for segment pair [247:244], the first bit 0 of the target product flag 010000 for segment pair [11:8], and the first bit 0 of the target product flag 000001 for segment pair [7:4] are combined to obtain the target product flag combination corresponding to target product 1 as: 100·····000.
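The combination step can be sketched in Python as follows. The function name, the packing of each 4-bit pair as an upper 2-bit x segment and a lower 2-bit y segment, and the placement of pair [255:252] in the most significant bit of each flag_k are assumptions made for the example.

```python
TARGET_PRODUCTS = (1, 2, 3, 4, 6, 9)


def flag_combinations(data: int) -> dict[int, int]:
    """Map each target product k to its 64-bit flag_k for a 256-bit input."""
    flags = {k: 0 for k in TARGET_PRODUCTS}
    for pair in range(63, -1, -1):                 # pair 63 is bits [255:252]
        nibble = (data >> (4 * pair)) & 0xF
        x_slice, y_slice = nibble >> 2, nibble & 0b11   # assumed packing
        product = x_slice * y_slice
        for k in TARGET_PRODUCTS:
            # Append one bit per pair: 1 if this pair's product equals k.
            flags[k] = (flags[k] << 1) | (product == k)
    return flags


# The six segment pairs from the example above; all other pairs are zero.
example = (
    (0b0101 << 252) | (0b1010 << 248) | (0b1001 << 244)
    | (0b0110 << 8) | (0b1111 << 4)
)
flags = flag_combinations(example)
```

With this input, only the most significant bit of the flag combination for target product 1 is set, matching the 100...000 pattern described above.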

[0175] 2) Determine the weight of each target product based on the target product flag combination and the sign flag corresponding to each group of operands.

[0176] Specifically, the weight of each target product can be determined based on the combination of target product flags and the sign flags corresponding to each set of operands.

[0177] Optionally, determining the weight of each target product based on the target product flag combination and the sign flag corresponding to each group of operands includes:

[0178] Based on the target product flag combination and the sign flag corresponding to each group of operands, the target coefficient is determined; based on the target coefficient, the weight of each target product is determined.

[0179] Specifically, the $N_k$ corresponding to each target product k is determined based on the target product flag combination and the sign flag corresponding to each group of operands. Formula (8) above gives the calculation of $N_k$, where $c_{i,k}$ represents the number of occurrences of the target product k in the set of all products that need to be left-shifted by 2i bits. The following example uses 16-bit multiplication to illustrate how $N_k$ is generated. Assume the target product flag combination flag_k corresponding to target product k is:

[0180] flag_k=1010_1111_0110_0101_0000_0010_1000_1111_0111_0001_1001_0110_0000_1001_1101_1011

[0181] This indicates that the products in the italicized and bolded positions of Table 7 are all k. It is important to note that when k appears at a certain position, the number of occurrences of k at that position is recorded as 1 or -1, depending on the sign of the product of the original operands to which the segments belong. Taking a 16-bit multiplication unit as an example, if the product of the original operands is positive, i.e., sign_p = sign_x^sign_y = 0, then the values of the target coefficients $c_i$ are as shown in Table 11, which lists the target coefficient values corresponding to 16-bit precision unsigned multiplication. Then, according to formula (8) above, the weight $N_k$ of the target product k can be expressed by formula (9), where:

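[0182] $N_k = 1\times2^{0} + 1\times2^{2} + 2\times2^{4} + 3\times2^{6} + 2\times2^{8} + 1\times2^{10} + 2\times2^{12} + 7\times2^{14} + 2\times2^{16} + 3\times2^{18} + 1\times2^{20} + 3\times2^{22} + 2\times2^{24} + 1\times2^{26} + 1\times2^{28}$ (9)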

[0183] Table 11. Values of the target coefficients $c_i$ for 16-bit precision multiplication (sign_x^sign_y = 0)

[0184]
Coefficient: c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14
Value: 1 1 2 3 2 1 2 7 2 3 1 3 2 1 1

[0185] If the product of the original operands is negative, i.e., sign_p = sign_x^sign_y = 1, then the values of the target coefficients $c_i$ are as shown in Table 12, which lists the target coefficient values corresponding to 16-bit precision signed multiplication. Then, according to formula (8) above, the weight $N_k$ of the target product k can be expressed by formula (10), where:

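[0186] $N_k = -(1\times2^{0} + 1\times2^{2} + 2\times2^{4} + 3\times2^{6} + 2\times2^{8} + 1\times2^{10} + 2\times2^{12} + 7\times2^{14} + 2\times2^{16} + 3\times2^{18} + 1\times2^{20} + 3\times2^{22} + 2\times2^{24} + 1\times2^{26} + 1\times2^{28})$ (10)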

[0187] Table 12. Values of the target coefficients $c_i$ for 16-bit precision multiplication (sign_x^sign_y = 1)

[0188]
Coefficient: c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14
Value: -1 -1 -2 -3 -2 -1 -2 -7 -2 -3 -1 -3 -2 -1 -1
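The coefficient values in Tables 11 and 12 can be reproduced from the example flag_k with a short Python sketch. It assumes that the leftmost bit of flag_k belongs to c0 and that consecutive groups of 1, 2, ..., 8, ..., 2, 1 bits belong to c0 through c14, which is the column order of the 16-bit sign flag table; the function names are hypothetical.

```python
FLAG_K = (
    "1010_1111_0110_0101_0000_0010_1000_1111_"
    "0111_0001_1001_0110_0000_1001_1101_1011"
)

# Number of segment pairs with m + n = i for 8 x 8 two-bit segments (i = 0..14).
GROUP_SIZES = [min(i, 14 - i) + 1 for i in range(15)]


def coefficients(flag_k: str, sign_p: int) -> list[int]:
    """Count the set bits in each group; negate the counts if sign_p is 1."""
    bits = flag_k.replace("_", "")
    coeffs, pos = [], 0
    for size in GROUP_SIZES:                 # assumed grouping, leftmost = c0
        count = bits[pos:pos + size].count("1")
        coeffs.append(-count if sign_p else count)
        pos += size
    return coeffs


def weight(coeffs: list[int]) -> int:
    """N_k as the sum of c_i * 2^(2i)."""
    return sum(c << (2 * i) for i, c in enumerate(coeffs))
```

Under these assumptions, `coefficients(FLAG_K, 0)` gives the fifteen positive values and `coefficients(FLAG_K, 1)` their negatives, so the two weights differ only in sign.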

[0189] Based on the above description, the process of finding $N_k$ is the process of finding $c_i$. The following first introduces the calculation structure of $c_i$ at 16-bit precision, and then the compatible design of $c_i$ for 8-bit, 4-bit, and 2-bit precision.

[0190] Figure 10 is a schematic diagram of the calculation structure of the target coefficient c7 at 16-bit precision provided by the present invention. As shown in Figure 10, this structure has 8 pairs of inputs, each pair containing one bit of flag_k and its corresponding sign flag. A sign flag of 0 indicates that the k at that position is positive, and the count is incremented by 1; a sign flag of 1 indicates that the k at that position is negative, and the count is decremented by 1. Let $c_{ip}$ be the number of positive products and $c_{in}$ be the number of negative products. The target coefficient $c_i$ is then the difference between the two, i.e., $c_i = c_{ip} - c_{in}$.
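A minimal Python sketch of this signed count follows. The function name and the list-based inputs are assumptions made for the example; the hardware structure operates on pairs of wires rather than lists.

```python
def target_coefficient(flag_bits: list[int], sign_flags: list[int]) -> int:
    """Return c_i = c_ip - c_in for one group of (flag bit, sign flag) pairs."""
    # Positions where k occurs and the operand product is positive.
    c_ip = sum(1 for f, s in zip(flag_bits, sign_flags) if f and not s)
    # Positions where k occurs and the operand product is negative.
    c_in = sum(1 for f, s in zip(flag_bits, sign_flags) if f and s)
    return c_ip - c_in
```

Positions whose flag bit is 0 do not contribute, whatever their sign flag is.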

[0191] It should be noted that the calculation unit structure of the remaining $c_i$ is the same as that in Figure 9; the only difference is the number of inputs. Table 13 shows the number of input ports and the input data of each $c_i$ calculation unit at 16-bit precision. Taking c2 as an example, its input data are bits 58 to 60 of flag_k and their corresponding sign flags.

[0192] Table 13. Number of input ports and input data of each $c_i$ calculation unit at 16-bit precision

[0193]

[0194] Furthermore, the number of $c_i$ and the number of products corresponding to each $c_i$ differ under different precision configurations. Table 14 shows the $c_i$ and the number of target products corresponding to each $c_i$ under each precision configuration.

[0195] Table 14. The $c_i$ and the corresponding number of products under each precision configuration

[0196]

[0197] Furthermore, the $c_i$ under a low-precision configuration can be obtained by combining the calculation results of the $c_i$ under a higher-precision configuration.

[0198] Figure 11(a) is a schematic diagram, provided by the present invention, of combining the $c_i$ in 16-bit precision mode to obtain the $c_i$ in 8-bit precision mode. Figure 11(b) is a schematic diagram, provided by the present invention, of combining the $c_i$ in 8-bit precision mode to obtain the $c_i$ in 4-bit precision mode. Figure 11(c) is a schematic diagram, provided by the present invention, of combining the $c_i$ in 4-bit precision mode to obtain the $c_i$ in 2-bit precision mode.

[0199] For example, in 8-bit mode, the sign flag bits corresponding to 16 1-bit segments need to be added together to obtain c3. Because the 16 segments corresponding to c3 are located at the positions corresponding to c0, c2, c7, c12, and c14 in 16-bit mode, the calculation of c3 can be decomposed into the calculations of c0, c2, c7, c12, and c14 in 16-bit mode, thereby reusing the $c_i$ calculation structure of 16-bit precision. This is represented by the following formula (11), where:

[0200] $c_{3,8b} = c_{0,16b} + c_{2,16b} + c_{7,16b} + c_{12,16b} + c_{14,16b}$ (11)

[0201] Where $c_{3,8b}$ represents the target coefficient c3 in 8-bit precision mode, and $c_{0,16b}$, $c_{2,16b}$, $c_{7,16b}$, $c_{12,16b}$, and $c_{14,16b}$ represent the target coefficients c0, c2, c7, c12, and c14 in 16-bit precision mode, respectively.

[0202] Based on the above description, c1 in 4-bit precision mode and c0 in 2-bit precision mode can be obtained respectively, expressed by formula (12) and formula (13), where:

[0203] $c_{1,4b} = c_{1,8b} + c_{3,8b} + c_{5,8b}$ (12)

[0204] $c_{0,2b} = c_{0,4b} + c_{1,4b} + c_{2,4b}$ (13)

[0205] After the target coefficients $c_i$ are obtained, $N_k$ can be obtained according to formula (8) above. It should be further noted that this application uses concatenation instead of multiplication and addition, which avoids introducing wide multiplication and addition units into the calculation of $N_k$. The principle of using concatenation to replace multiplication and addition is explained below.

[0206] Assuming operands a and b are both 5-bit signed operands, where the highest bit is the sign bit and the lower 4 bits are the value bits, then $a \times 2^{m} + b$ ($m \ge 4$) is numerically equal to concat{{sign_a'}, a', (m-4){sign_b}, b[3:0]}, where sign_b represents the sign bit of operand b and sign_a' represents the sign bit of a'. a' can be represented by formula (14), where:

[0207]
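The concatenation identity can be checked numerically with the Python sketch below. Formula (14) is not reproduced in this text, so the sketch assumes a' = a when b is non-negative and a' = a - 1 when b is negative, which is the adjustment two's-complement arithmetic requires; the function name is hypothetical.

```python
def concat_mul_add(a: int, b: int, m: int) -> int:
    """Evaluate a * 2**m + b by concatenation (a, b in [-16, 15], m >= 4)."""
    sign_b = 1 if b < 0 else 0
    a_prime = a - sign_b              # assumed form of formula (14)
    low = b & ((1 << m) - 1)          # (m-4){sign_b} followed by b[3:0]
    return (a_prime << m) | low       # a' placed above the low m bits


# Exhaustive check over all 5-bit signed operands and several shift amounts.
identity_holds = all(
    concat_mul_add(a, b, m) == a * 2 ** m + b
    for a in range(-16, 16)
    for b in range(-16, 16)
    for m in range(4, 9)
)
```

The point of the identity is that the low m bits are just the sign-extended b, so no adder is needed across the full width; only the small adjustment from a to a' remains.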

[0208] For multiply-accumulate units compatible with different precisions, the maximum bit width of each target coefficient is shown in Table 13. Figure 12(a) is a schematic diagram of the concatenation method in the 16-bit precision mode provided by the present invention, Figure 12(b) is a schematic diagram of the concatenation method in the 8-bit precision mode, Figure 12(c) is a schematic diagram of the concatenation method in the 4-bit precision mode, and Figure 12(d) is a schematic diagram of the concatenation method in the 2-bit precision mode, wherein $s\_c_{i}$ represents the sign bit of the target coefficient $c_i$; for example, $s\_c_{13}$ represents the sign bit of the target coefficient c13.

[0209] Therefore, the calculation of $N_k$ in formula (8) above is simplified to the addition of three 30-bit signed numbers, as represented by formula (15), which reduces area overhead. Formula (15) is expressed as:

[0210] $N_k = num_{k\_1} + num_{k\_2} + num_{k\_3}$ (15)

[0211] Where $num_{k\_1}$, $num_{k\_2}$, and $num_{k\_3}$ represent the three 30-bit signed numbers obtained by concatenating the $c_i$.

[0212] Optionally, the specific implementation of step 540 above includes:

[0213] Based on each of the target products and their weights, at least one partial sum is determined; the partial sums are added together to obtain the multiply-add sum corresponding to the at least one set of operands.

[0214] Specifically, after the weight $N_k$ of the target product k is obtained, the partial sum corresponding to the target product k can be obtained as $N_k \cdot k$ according to formula (7) above.

[0215] In this application, $N_k \cdot k$ is calculated in a hardware-friendly manner to avoid introducing multiplication with a large bit width. The target product k can only take 6 values, namely 1 (0001), 2 (0010), 3 (0011), 4 (0100), 6 (0110), and 9 (1001). Each branch in Figure 8 corresponds to one of the values of k, and the structure of the partial sum generation module differs from one target product to another.

[0216] For the three branches N1·1, N2·2 and N4·4, we can obtain the partial sum by left-shifting N1, N2 and N4 by 0 bits, 1 bit and 2 bits respectively.

[0217] For N3·3, N6·6 and N9·9, they can be decomposed according to the following formulas (16)-(18), respectively, where:

[0218] N3·3 = N3·2 + N3·1 = (N3 << 1) + N3 (16)

[0219] N6·6 = N6·4 + N6·2 = (N6 << 2) + (N6 << 1) (17)

[0220] N9·9 = N9·8 + N9·1 = (N9 << 3) + N9 (18)

[0221] The partial sum generation module corresponding to the three branches N3·3, N6·6 and N9·9 only requires two shifters and one adder to obtain the partial sum.

[0222] After each partial sum is obtained, the partial sums are added together to obtain the final multiply-add sum.
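The shift-based partial sums can be sketched in Python as follows. The function name is hypothetical, and the decomposition used for k = 9 (9 = 8 + 1) is an assumption, chosen because it is the only way to write 9 as a sum of two powers of two and so fits the two-shifter, one-adder structure.

```python
def partial_sum(n_k: int, k: int) -> int:
    """Return n_k * k using only shifts and at most one addition."""
    if k == 1:
        return n_k                        # shift by 0 bits
    if k == 2:
        return n_k << 1
    if k == 4:
        return n_k << 2
    if k == 3:
        return (n_k << 1) + n_k           # 3 = 2 + 1
    if k == 6:
        return (n_k << 2) + (n_k << 1)    # 6 = 4 + 2
    if k == 9:
        return (n_k << 3) + n_k           # 9 = 8 + 1 (assumed decomposition)
    raise ValueError(f"{k} is not one of the six target products")
```

The shifts also behave correctly for negative weights, since Python integers follow two's-complement semantics under left shift.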

[0223] The method for determining the sum of multiplication and addition provided by the present invention determines at least one partial sum based on each of the target products and their weights; the partial sums are added together to obtain the sum of multiplication and addition corresponding to at least one set of operands. The method implements the partial sums of each target product in a hardware-friendly manner, thereby obtaining the sum of multiplication and addition. This reduces redundant calculations in the calculation of the sum of multiplication and addition of at least one operand, thereby reducing energy consumption and improving energy efficiency.
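Putting the steps together, the following Python sketch is a behavioural model of the whole method under stated assumptions: operands are ordinary signed Python integers whose magnitudes fit in `bits` bits, segments are numbered from the least significant end, and weights are accumulated in a dictionary rather than through flag vectors and concatenation. All names are hypothetical.

```python
from collections import Counter

TARGET_PRODUCTS = (1, 2, 3, 4, 6, 9)


def segments(value: int, bits: int) -> list[int]:
    """Split a non-negative value into bits // 2 two-bit segments."""
    return [(value >> (2 * i)) & 0b11 for i in range(bits // 2)]


def multiply_add(pairs: list[tuple[int, int]], bits: int = 16) -> int:
    """Sum of x * y over all pairs, computed through the weights N_k."""
    weights = Counter()                       # N_k for each target product k
    for x, y in pairs:
        sign = -1 if (x < 0) != (y < 0) else 1
        xs, ys = segments(abs(x), bits), segments(abs(y), bits)
        for m, xm in enumerate(xs):
            for n, yn in enumerate(ys):
                k = xm * yn
                if k:                         # zero products are skipped
                    weights[k] += sign << (2 * (m + n))
    # Six partial sums N_k * k, one per target product, added at the end.
    return sum(weights[k] * k for k in TARGET_PRODUCTS)
```

However many operand pairs are accumulated, only six weighted products are formed at the end, which is where the saving over multiplying every segment pair comes from.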

[0224] Figure 13 is the first structural schematic diagram of the variable-precision multiply-accumulate calculation unit provided by the present invention. As shown in Figure 13, the variable-precision multiply-accumulate calculation unit includes an operand preprocessing module, an operand shaping module, a target product flag generation module, a sign flag generation module, a target product weight generation module, a partial sum generation module, and a result generation module. The operand preprocessing module converts negative operands among input operands 1 and 2 into positive numbers. The operand shaping module divides the converted values into 2-bit segments and generates all segment combinations according to the LPC method. The target product flag generation module generates the six target product flags based on all segment combinations. The sign flag generation module generates the target product sign flags based on the sign bits of input operands 1 and 2. The target product weight generation module generates the weight $N_k$ of the target product k based on the target product flags, the target product sign flags, and all segment combinations. The partial sum generation module uses the weight $N_k$ of the target product k to generate, in a hardware-friendly manner, the partial sum $N_k \cdot k$ corresponding to the target product k. The result generation module adds the six partial sums corresponding to the six target products using an addition structure to obtain the final multiply-add sum.

[0225] It should be noted that Figure 13 shows only one schematic structure of the variable-precision multiply-accumulate calculation unit; several variant structures are also possible.

[0226] Variant Structure 1: Only one branch is retained, and the partial sums corresponding to the different target products are calculated in a time-shared manner and then accumulated.

[0227] Variant Structure 2: Figure 14 is the second structural schematic diagram of the variable-precision multiply-accumulate calculation unit provided by the present invention. As shown in Figure 14, four pairs of operands share a single target product weight generation module. Module 1 includes the operand preprocessing module, operand shaping module, target product flag generation module, and sign flag generation module shown in Figure 13.

[0228] In the concatenation method shown in Figure 12, every coefficient except c7 in Figure 12(a) and c8 in Figure 12(b) still has room for bit-width expansion, which means that this concatenation method can count a larger number of target products. Therefore, multiple pairs of operands can share a single target product weight generation module.

[0229] For c7 in Figure 12(a) and c8 in Figure 12(b), supporting a target product weight generation module shared by four pairs of operands requires 7 bits to represent the value of each of these two coefficients, but only 6 bits of space are available. Therefore, in Variant Structure 2 the present invention adds an additional processing module for these two coefficients. Specifically, when the value of c7 in Figure 12(a) is greater than 63, the lower 6 bits of c7 are concatenated into num_{k_1} and, at the same time, the value of c10 is incremented by 1, which ensures the correctness of the final multiply-addition sum.
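
The overflow handling of paragraph [0229] can be pictured as a carry out of a 6-bit field. The Python sketch below is a minimal illustration under an assumption that is not confirmed by the text available here: that one unit of c10 is worth 2**6 units of c7 in the concatenated value, so that moving the overflow into c10 leaves the combined value unchanged. The names `fold_overflow` and `combined_value` are hypothetical.

```python
FIELD_BITS = 6
FIELD_MASK = (1 << FIELD_BITS) - 1  # 63, the largest value a 6-bit field can hold


def fold_overflow(c7, c10):
    """Keep the lower 6 bits of c7 and carry the overflow into c10.

    c7 is expected to need at most 7 bits (0..127), as stated for four operand pairs.
    """
    if c7 > FIELD_MASK:
        return c7 & FIELD_MASK, c10 + 1
    return c7, c10


def combined_value(c7, c10):
    # Assumed layout: c10 is weighted 2**6 relative to c7.
    return c7 + (c10 << FIELD_BITS)


# The carry leaves the combined value unchanged for every 7-bit c7.
for c7 in range(128):
    folded = fold_overflow(c7, 5)
    assert folded[0] <= FIELD_MASK
    assert combined_value(*folded) == combined_value(c7, 5)
```

Under this assumed layout the extra processing module amounts to a comparator on c7 and an incrementer on c10.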

[0230] The device for determining the multiply-addition sum provided by the present invention is described below. The device described below and the method for determining the multiply-addition sum described above may be referred to in correspondence with each other.

[0231] Figure 15 is a schematic structural diagram of the device for determining the multiply-addition sum provided by the present invention. As shown in Figure 15, the multiply-addition sum determination device 1500 includes: a first determination module 1501, a first generation module 1502, a second generation module 1503, and a second determination module 1504; wherein,

[0232] The first determining module 1501 is used to determine at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; q and p are both positive integers.

[0233] The first generation module 1502 is used to generate a target product flag bit for at least one target product based on each of the target segments;

[0234] The second generation module 1503 is used to generate the weights of each of the target products based on the target product flag bits and the sign flag bits corresponding to each set of operands.

[0235] The second determining module 1504 is used to determine the multiply-addition sum corresponding to at least one set of operands based on each of the target products and the weight of each target product.

[0236] The multiply-addition sum determination device provided by the present invention determines at least one target segment based on at least one set of operands, where each set of operands includes a first operand and a second operand, the data precision of the first operand is q bits, the data precision of the second operand is p bits, and q and p are both positive integers. Based on each target segment, a target product flag bit is generated for at least one target product; based on each target product flag bit and the sign flag bit corresponding to each set of operands, a weight is generated for each target product; and based on each target product and its weight, the multiply-addition sum corresponding to at least one set of operands is determined. This reduces redundant calculations in computing the multiply-addition sum of at least one set of operands, thereby reducing energy consumption and improving energy efficiency.

[0237] Optionally, the first determining module 1501 is specifically used for:

[0238] For each set of operands, the first operand and the second operand are preprocessed to obtain the third operand and the fourth operand;

[0239] The third operand and the fourth operand are divided into segments to obtain at least one target segment.
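
The preprocessing and division steps of paragraphs [0238] and [0239] can be sketched in Python as follows. This is a hedged illustration, not the disclosed hardware: the helper names `preprocess`, `to_segments`, and `from_segments` are hypothetical, and the least-significant-first segment order is an assumption. The first and second operands may have different precisions q and p, which here simply yields different segment counts.

```python
def preprocess(operand):
    """Return (sign_bit, magnitude); a negative operand is converted to a positive value."""
    return (1 if operand < 0 else 0), abs(operand)


def to_segments(magnitude, bits):
    """Divide a magnitude of the given precision into 2-bit segments, LSB first."""
    if magnitude < 0 or magnitude >> bits:
        raise ValueError("magnitude does not fit in the stated precision")
    return [(magnitude >> shift) & 0b11 for shift in range(0, bits, 2)]


def from_segments(segments):
    """Rebuild the magnitude; segment i carries the weight 4**i."""
    return sum(seg << (2 * i) for i, seg in enumerate(segments))


sign, third_operand = preprocess(-182)          # q = 8 bits in this example
segments = to_segments(third_operand, bits=8)   # 182 = 0b10_11_01_10
assert (sign, segments) == (1, [2, 1, 3, 2])
assert from_segments(segments) == 182
```

Since the segments reconstruct the magnitude exactly, any product of two magnitudes can be expressed as a position-weighted sum of products of 2-bit segments.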

[0240] Optionally, the first generation module 1502 is specifically used for:

[0241] Based on each of the target segments, at least one segment combination is generated;

[0242] Based on the numerical values corresponding to each of the segment combinations, a target product flag is generated for each of the target products.
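
The flag generation of paragraphs [0241] and [0242] relies on the fact that two 2-bit unsigned values can only produce the non-zero products 1, 2, 3, 4, 6, and 9. A minimal Python sketch of a 6-bit one-hot flag follows; the function name `product_flag`, the bit order (bit i corresponds to the i-th entry of `TARGET_PRODUCTS`), and the all-zero flag for a zero product are assumptions made for illustration.

```python
TARGET_PRODUCTS = (1, 2, 3, 4, 6, 9)

# The six target products are exactly the non-zero products of 2-bit values.
assert {a * b for a in range(4) for b in range(4)} - {0} == set(TARGET_PRODUCTS)


def product_flag(seg_a, seg_b):
    """Return a 6-bit flag; bit i is set when seg_a * seg_b == TARGET_PRODUCTS[i]."""
    product = seg_a * seg_b
    flag = 0
    for i, k in enumerate(TARGET_PRODUCTS):
        if product == k:
            flag |= 1 << i
    return flag  # 0 when either segment is 0 (assumed: no target product is flagged)


assert product_flag(1, 1) == 0b000001
assert product_flag(2, 3) == 0b010000
assert product_flag(3, 3) == 0b100000
```

At most one bit of the flag is ever set, so downstream logic can count occurrences of each target product by summing the flag bits column by column.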

[0243] Optionally, the second generation module 1503 is specifically used for:

[0244] Based on each of the target product flag bits, generate a target product flag bit combination corresponding to each of the target products;

[0245] The weights of each target product are determined based on the target product flag combination and the sign flag corresponding to each set of operands.

[0246] Optionally, the second generation module 1503 is specifically used for:

[0247] The target coefficient is determined based on the target product flag combination and the sign flag corresponding to each group of operands;

[0248] Based on the target coefficients, the weights of each target product are determined.

[0249] Optionally, the second determining module 1504 is specifically used for:

[0250] Based on each of the target products and the weights of each of the target products, at least one partial sum is determined;

[0251] The partial sums are added together to obtain the multiply-addition sum corresponding to at least one set of operands.
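
One possible reading of generating the partial sums "in a hardware-friendly manner" is that each multiplication by a fixed target product k is replaced by shifts and at most one addition. The source does not spell out the circuit, so the Python sketch below is an assumption offered for illustration only; `partial_sum` and `sum_partial_sums` are hypothetical names.

```python
def partial_sum(k, n_k):
    """Compute n_k * k using only shifts and additions (k in 1, 2, 3, 4, 6, 9)."""
    if k == 1:
        return n_k
    if k == 2:
        return n_k << 1
    if k == 3:
        return (n_k << 1) + n_k
    if k == 4:
        return n_k << 2
    if k == 6:
        return (n_k << 2) + (n_k << 1)
    if k == 9:
        return (n_k << 3) + n_k
    raise ValueError("k is not one of the six target products")


def sum_partial_sums(weights):
    """Add the partial sums of all target products; weights maps k to N_k."""
    return sum(partial_sum(k, n_k) for k, n_k in weights.items())


# Weights may be negative because they absorb the sign flag.
weights = {1: 5, 2: -3, 3: 0, 4: 7, 6: -2, 9: 1}
assert sum_partial_sums(weights) == sum(k * n for k, n in weights.items())
```

Python's left shift on negative integers multiplies by a power of two, so the check also covers negative weights.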

[0252] Figure 16 is a schematic diagram of the physical structure of an electronic device provided by the present invention. As shown in Figure 16, the electronic device may include: a processor 1610, a communications interface 1620, a memory 1630, and a communication bus 1640, wherein the processor 1610, the communications interface 1620, and the memory 1630 communicate with each other through the communication bus 1640. The processor 1610 can call logical instructions in the memory 1630 to execute a method for determining a multiply-addition sum, the method including: determining at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; q and p are both positive integers; generating a target product flag bit for at least one target product based on each target segment; generating a weight for each target product based on each target product flag bit and a sign flag bit corresponding to each set of operands; and determining the multiply-addition sum corresponding to at least one set of operands based on each target product and its weight.

[0253] Furthermore, the logical instructions in the aforementioned memory 1630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0254] In another aspect, the present invention also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the method for determining the multiply-addition sum provided by the methods described above. The method includes: determining at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; q and p are both positive integers; generating a target product flag bit for at least one target product based on each target segment; generating a weight for each target product based on each target product flag bit and a sign flag bit corresponding to each set of operands; and determining the multiply-addition sum corresponding to at least one set of operands based on each target product and its weight.

[0255] In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements a method for determining the multiply-addition sum provided by the methods described above. This method includes: determining at least one target segment based on at least one set of operands; each set of operands includes a first operand and a second operand; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; q and p are both positive integers; generating a target product flag bit for at least one target product based on each target segment; generating a weight for each target product based on each target product flag bit and a sign flag bit corresponding to each set of operands; and determining the multiply-addition sum corresponding to at least one set of operands based on each target product and its weight.

[0256] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0257] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0258] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of determining a sum-of-products, characterized in that the method is applied to a variable-width accelerator whose computational unit is a variable-precision multiply-accumulate computation unit, the unit including: an operand preprocessing module, an operand shaping module, a target product flag generation module, a target product weight generation module, and a result generation module; the operand preprocessing module is used to preprocess, for each set of operands, the first operand and the second operand to obtain a third operand and a fourth operand; each set of operands includes the first operand and the second operand; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; q and p are both positive integers; the operand shaping module is used to divide the third operand and the fourth operand respectively to obtain at least one 2-bit target segment; the target product flag generation module is used to generate at least one segment combination based on each of the 2-bit target segments, and to generate a target product flag for each of the target products based on the values corresponding to each of the segment combinations, which specifically includes: since there are only 6 valid products in 2-bit unsigned multiplication, for the values corresponding to each pair of segments among the 64 segment-pair combinations obtained after 2-bit operand shaping, a 6-bit one-hot code is generated as the target product flag, which indicates which of the 6 target products the product of the values corresponding to each pair of segments belongs to; the target product weight generation module is used to generate the weight of each target product based on the target product flag bits and the sign flag bits corresponding to each set of operands; the result generation module is used to determine the multiply-addition sum corresponding to at least one set of operands based on each of the target products and the weight of each target product.

2. The method of determining a sum-of-products according to claim 1, wherein the step of generating the weight of each target product based on the target product flag bits and the sign flag bits corresponding to each set of operands includes: generating, based on the target product flag bits and the segment combinations corresponding to each target segment, the target product flag bit combination corresponding to each target product; and determining the weight of each target product based on the target product flag bit combination and the sign flag bit corresponding to each set of operands.

3. The method of determining a sum-of-products according to claim 2, wherein the step of determining the weight of each target product based on the target product flag bit combination and the sign flag bit corresponding to each set of operands includes: determining a target coefficient based on the target product flag bit combination and the sign flag bit corresponding to each set of operands; and determining the weight of each target product based on the target coefficient.

4. The method of determining a sum-of-products according to claim 1, wherein the step of determining the multiply-addition sum corresponding to at least one set of operands based on each of the target products and the weight of each target product includes: determining at least one partial sum based on each of the target products and the weight of each target product; and adding the partial sums together to obtain the multiply-addition sum corresponding to at least one set of operands.

5. A device for determining a multiply-addition sum, characterized in that the device is applied to a variable-width accelerator whose computational unit is a variable-precision multiply-accumulate computation unit, the unit including: an operand preprocessing module, an operand shaping module, a target product flag generation module, a target product weight generation module, and a result generation module; the first determining module is used to preprocess, for each set of operands, the first operand and the second operand to obtain a third operand and a fourth operand, and to divide the third operand and the fourth operand respectively to obtain at least one 2-bit target segment; the data precision of the first operand is q bits, and the data precision of the second operand is p bits; q and p are both positive integers; the first generation module is used to generate at least one segment combination based on each of the 2-bit target segments, and to generate a target product flag bit for each of the target products based on the numerical values corresponding to each of the segment combinations, which specifically includes: since there are only 6 valid products in 2-bit unsigned multiplication, for the numerical values corresponding to each pair of segments among the 64 segment-pair combinations obtained after 2-bit operand shaping, a 6-bit one-hot code is generated as the target product flag bit, which indicates which of the 6 target products the product of the numerical values corresponding to each pair of segments belongs to; the second generation module is used to generate the weight of each target product based on the target product flag bits and the sign flag bits corresponding to each set of operands; the second determining module is used to determine the multiply-addition sum corresponding to at least one set of operands based on each of the target products and the weight of each target product.

6. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the method for determining the multiply-addition sum as described in any one of claims 1 to 4.

7. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method for determining the multiply-addition sum as described in any one of claims 1 to 4.

8. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the method for determining the multiply-addition sum as described in any one of claims 1 to 4.