Floating point multiplier-adder, processor and electronic device

By designing a control module in the floating-point multiply-adder to bypass the multiplication array module, the power consumption problem during addition operations is solved, achieving more efficient energy management.

CN119148978B · Active · Publication Date: 2026-06-26 · HYGON INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HYGON INFORMATION TECH CO LTD
Filing Date
2024-08-26
Publication Date
2026-06-26


Abstract

The present disclosure provides a floating point multiplier-adder, a processor and an electronic device. The floating point multiplier-adder includes a control module, a receiving module, a splitting module, an exponent operation module, a multiplication array module, an alignment module and an addition module. The receiving module receives either all of the first, second and third operands or only the first and third operands for an operation. The splitting module splits each operand received by the receiving module into a sign bit, an exponent and a mantissa. The addition module performs a first addition process on a mantissa alignment result and a multiplication result when performing a multiply-add operation, or performs a second addition process on the mantissa alignment result and a first mantissa of the first operand when performing an addition operation. The control module controls the splitting module, the exponent operation module, the addition module and the multiplication array module according to the type of the operation. The floating point multiplier-adder can reduce the power loss incurred when it performs an addition operation.

Description

Technical Field

[0001] Embodiments of this disclosure relate to a floating-point multiply-accumulator, a processor, and an electronic device.

Background Technology

[0002] The floating-point multiply-accumulator (FMA) is a commonly used computing unit in modern high-performance processors. It executes floating-point multiply-accumulate instructions to perform the multiply-accumulate operation A*B+C, where A, B, and C are the floating-point operands. The FMA can also execute floating-point addition and multiplication instructions. For example, when executing a floating-point multiplication instruction, operand C is set to 0, which is equivalent to performing the multiplication A*B. When executing a floating-point addition instruction, operand B is set to 1, which is equivalent to performing the addition A+C.

Summary of the Invention

[0003] At least one embodiment of this disclosure provides a floating-point multiply-accumulator, comprising: a control module, a receiving module, a splitting module, an exponentiation module, a multiplication array module, an alignment module, and an addition module. The receiving module is configured to receive, from among a first operand, a second operand, and a third operand used for an operation, either all three operands or only the first operand and the third operand, wherein the operation is a multiply-accumulate operation or an addition operation. The splitting module is configured to split each operand received by the receiving module into a sign bit, an exponent, and a mantissa. The exponentiation module is configured to perform an exponent operation on the exponents obtained by the splitting module to obtain an exponent operation result. The multiplication array module is configured to, when the multiply-accumulate operation is performed, perform a multiplication operation on the first mantissa of the first operand and the second mantissa of the second operand to obtain a multiplication result. The alignment module is configured to perform an alignment shift on the third mantissa of the third operand based on the exponent operation result to obtain a mantissa alignment result. The addition module is configured to, when the multiply-accumulate operation is performed, perform a first addition process based on the mantissa alignment result and the multiplication result to obtain a first addition process result, or, when the addition operation is performed, perform a second addition process based on the mantissa alignment result and the first mantissa of the first operand to obtain a second addition process result. The control module is configured to control the operation of the splitting module, the exponentiation module, the addition module, and the multiplication array module according to the type of the operation.

[0004] The floating-point multiply-accumulator provided in at least one embodiment of this disclosure further includes a selection module, wherein the selection module is coupled to the splitting module, the exponentiation module and the addition module, and is configured to select the exponent and mantissa obtained by splitting from the splitting module according to the control instructions provided by the control module, and send the exponent and mantissa obtained by splitting into the exponentiation module and the addition module in the mode of the multiply-accumulate operation or the addition operation.

[0005] In at least one embodiment of this disclosure, a floating-point multiply-accumulator is provided, wherein the selection module includes a plurality of selectors, the plurality of selectors including a first selector; the first selector is coupled to the splitting module and the exponentiation module, and is configured to receive an exponent offset value and, from the splitting module, a second exponent of the second operand, wherein when the operation is the multiply-accumulate operation, the second exponent is selected and output to the exponentiation module, or, when the operation is the addition operation, the exponent offset value is selected and output to the exponentiation module.

[0006] In at least one embodiment of this disclosure, a floating-point multiply-accumulator is provided, wherein the addition module includes a carry-save adder and a summation leading zero prediction module. The carry-save adder is coupled to the alignment module and configured to compress the mantissa alignment result, a received first addend, and a received second addend to obtain first compressed data and second compressed data. The summation leading zero prediction module is configured to perform the first addition process on a received third addend and a received fourth addend when the operation is the multiply-accumulate operation to obtain the first addition process result, or to perform the second addition process on the received third addend and fourth addend when the operation is the addition operation to obtain the second addition process result, and is further configured to calculate the number of leading zeros in the first addition process result or the second addition process result.

[0007] In at least one embodiment of this disclosure, a floating-point multiply-accumulator is provided, wherein the plurality of selectors includes a second selector and a third selector, the summation leading zero prediction module is coupled to the carry-save adder, and the third addend and the fourth addend are the first compressed data and the second compressed data, respectively; the second selector is configured to, when the operation is the multiply-accumulate operation, provide the first result of the multiplication result to the carry-save adder as the first addend, or, when the operation is the addition operation, provide zero to the carry-save adder as the first addend; the third selector is configured to, when the operation is the multiply-accumulate operation, provide the second result of the multiplication result to the carry-save adder as the second addend, or, when the operation is the addition operation, provide the first mantissa obtained by the splitting module to the carry-save adder as the second addend.

[0008] In at least one embodiment of this disclosure, a floating-point multiply-accumulate unit is provided, wherein the splitting module is further configured to perform bit-width expansion processing on the first mantissa of the first operand when the operation is the addition operation, and provide the expanded first mantissa to the third selector, wherein the expanded first mantissa has the same bit width as the second result.

[0009] In at least one embodiment of this disclosure, a floating-point multiply-accumulator is provided, wherein the plurality of selectors includes a second selector and a third selector, and the first addend and the second addend are respectively the first result and the second result of the multiplication result; the second selector is configured to, when the operation is the multiply-accumulate operation, provide the first compressed data to the summation leading zero prediction module as the third addend, or, when the operation is the addition operation, provide the mantissa alignment result obtained by the alignment module to the summation leading zero prediction module as the third addend; the third selector is configured to, when the operation is the multiply-accumulate operation, provide the second compressed data to the summation leading zero prediction module as the fourth addend, or, when the operation is the addition operation, provide the first mantissa obtained by the splitting module to the summation leading zero prediction module as the fourth addend.

[0010] In at least one embodiment of this disclosure, a floating-point multiply-accumulator is provided, wherein the receiving module includes a first register, a second register, and a third register, wherein the first register, the second register, and the third register are respectively configured to store the first operand, the second operand, and the third operand.

[0011] In at least one embodiment of this disclosure, a floating-point multiply-accumulator is provided, wherein the receiving module further includes a fourth register, which is configured to store the first operand when the operation is the addition operation, and to provide the first operand to the splitting module to obtain the first mantissa that is used for the addition operation and provided to the addition module.

[0012] At least one embodiment of this disclosure provides a floating-point multiply-accumulator, further comprising: a normalization shift module configured to logically shift the first addition result or the second addition result to obtain a normalized result, wherein the number of bits of the logical shift is equal to the number of leading zeros; and a rounding module configured to round the normalized result.

[0013] At least one embodiment of this disclosure also provides a processor that includes the floating-point multiply-accumulator provided in any of the above embodiments.

[0014] At least one embodiment of this disclosure also provides an electronic device that includes the floating-point multiply-accumulator provided in any of the above embodiments.

Attached Figure Description

[0015] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings of the embodiments will be briefly described below. Obviously, the drawings described below only relate to some embodiments of this disclosure and are not intended to limit this disclosure.

[0016] Figure 1 shows a schematic diagram of a conventional floating-point multiply-accumulator;

[0017] Figure 2 shows a schematic diagram of the structure of a floating-point multiply-accumulator provided in at least one embodiment of this disclosure;

[0018] Figure 3 shows a schematic diagram of bit-width extension processing according to at least one embodiment of this disclosure;

[0019] Figure 4 shows a schematic diagram of another floating-point multiply-accumulator provided in at least one embodiment of this disclosure; and

[0020] Figure 5 shows a schematic diagram of the structure of an electronic device provided in at least one embodiment of this disclosure.

Detailed Implementation

[0021] To make the objectives, technical solutions, and advantages of the embodiments of this disclosure clearer, the technical solutions of the embodiments of this disclosure will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this disclosure. All other embodiments obtained by those skilled in the art based on the described embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.

[0022] Unless otherwise defined, the technical or scientific terms used in this disclosure shall have the ordinary meaning understood by one of ordinary skill in the art to which this disclosure pertains. The terms “first,” “second,” and similar terms used in this disclosure do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Similarly, the terms “an,” “a,” or “the,” and similar terms do not indicate a quantity limitation, but rather indicate the presence of at least one. The terms “including,” “comprising,” or “containing,” and similar terms mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. The terms “connected,” “linked,” or similar terms are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. The terms “upper,” “lower,” “left,” and “right,” etc., are used only to indicate relative positional relationships, and these relative positional relationships may change accordingly when the absolute position of the described objects changes.

[0023] Floating-point numbers are used in computers to represent real numbers. They represent data whose radix point can float, and are often written as N = M × R^E. A floating-point number includes a sign bit, an exponent field E, and a mantissa M, where R is the base of the exponent (e.g., 2, 8, or 16).

[0024] For example, in the IEEE 754 standard, a single-precision floating-point number (e.g., FP32) is 32 bits long, which includes 1 sign bit, 8 exponent bits, and 23 mantissa bits (i.e., the sign bit occupies 1 bit, the exponent part occupies 8 bits, and the mantissa part occupies 23 bits); a double-precision floating-point number (e.g., FP64) is 64 bits long, which includes 1 sign bit, 11 exponent bits, and 52 mantissa bits.

[0025] The sign bit indicates whether the operand is positive or negative; for example, 0 represents a positive number and 1 represents a negative number. The exponent part stores the exponent value, which determines the position of the decimal point relative to the mantissa. The mantissa represents the fractional part of the operand; for example, the mantissa is a binary fraction. It can be assumed that there is a hidden highest bit "1" (this highest bit may not be explicitly saved during storage to save space). In binary floating-point numbers, these bits represent the digits after the binary decimal point.
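The field layout described above can be sketched in Python as follows. This is only an illustration of the FP32 format, not part of the disclosed circuit; the function names `split_fp32` and `significand` are hypothetical and chosen for this example.

```python
import struct


def split_fp32(x: float) -> tuple[int, int, int]:
    """Split a value (rounded to FP32) into (sign, biased exponent, 23-bit fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8 exponent bits (biased)
    fraction = bits & 0x7FFFFF        # 23 stored mantissa bits
    return sign, exponent, fraction


def significand(exponent: int, fraction: int) -> int:
    """Restore the hidden leading 1 for normal numbers (exponent field 1..254)."""
    hidden = 1 if 0 < exponent < 255 else 0
    return (hidden << 23) | fraction


print(split_fp32(-2.5))  # sign, biased exponent, fraction of -2.5
```

The hidden bit is restored only for normal numbers; subnormal numbers, infinities, and NaNs would need separate handling that this sketch leaves out.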

[0026] Floating-point addition refers to the process of adding two or more values encoded in floating-point representation. This process is not simply adding the mantissas, because the decimal point position of a floating-point number is not fixed but determined by the exponent. Typically, floating-point addition includes the following steps: alignment, mantissa addition, overflow handling, rounding, normalization, and special handling.

[0027] The alignment operation (i.e., exponent alignment or order alignment) ensures that all floating-point operands involved in the addition have the same decimal point position before the addition is performed, that is, the exponents of the operands are brought to the same value. If the exponents of two addends differ, alignment is achieved by shifting the mantissa of the addend with the smaller exponent. For example, if one operand has a larger exponent, the mantissa of the other operand is right-shifted by the difference between the two exponents, so that its exponent matches the larger exponent. In mantissa addition, once the exponents of all operands involved in the addition have been aligned, the mantissas of the aligned operands are added. This process is similar to ordinary integer addition, but because it is performed in binary and limited by the number of significant bits in the mantissa, it may involve additional carries and subsequent rounding operations (described below).
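The alignment and mantissa-addition steps can be illustrated with a small integer model. This is a minimal sketch under simplifying assumptions: both operands are positive, each operand is given as a pair (exponent, integer significand) representing significand × 2^exponent, and the name `align_and_add` and the `guard_bits` parameter are hypothetical.

```python
def align_and_add(exp_a: int, sig_a: int, exp_b: int, sig_b: int,
                  guard_bits: int = 3) -> tuple[int, int]:
    """Add two positive operands, each representing sig * 2**exp.

    Returns (exp, sig) of the unrounded sum; sig carries `guard_bits`
    extra low-order bits so that some shifted-out bits are retained.
    """
    # Make operand A the one with the larger exponent.
    if exp_a < exp_b:
        exp_a, sig_a, exp_b, sig_b = exp_b, sig_b, exp_a, sig_a
    shift = exp_a - exp_b
    # Widen both significands, then right-shift the one with the smaller exponent.
    wide_a = sig_a << guard_bits
    wide_b = (sig_b << guard_bits) >> shift
    return exp_a - guard_bits, wide_a + wide_b


# 1.5 = 3 * 2**-1 and 0.25 = 1 * 2**-2
exp, sig = align_and_add(-1, 3, -2, 1)
print(sig * 2.0 ** exp)
```

Bits shifted out beyond the guard bits are simply dropped here; a hardware adder would typically keep enough information about them to round correctly.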

[0028] Furthermore, floating-point multiplication refers to the process of multiplying two or more values encoded according to floating-point representation. For example, floating-point multiplication following the IEEE 754 standard (the standard adopted by most modern computers) includes sign bit calculation, exponent calculation, mantissa calculation, normalization and rounding, overflow checking, etc.

[0029] The sign bit calculation determines the sign of the result: if the sign bits of the two operands are the same (both positive or both negative), the sign bit of the result is 0 (positive); if the sign bits are different, the sign bit of the result is 1 (negative). The exponent calculation determines the exponent of the result: since the operation is a multiplication, the exponent of the result equals the sum of the exponents of the two operands, minus the offset (if a biased exponent representation is used). For example, in single-precision floating-point numbers, the exponent has 8 bits and an offset value of 127. The mantissa calculation multiplies the mantissas of the two operands. The mantissa after multiplication may exceed the normalization range, so normalization is required, i.e., the mantissa and exponent are adjusted so that the mantissa is at least 1.0 and less than 2.0 (for normalized floating-point numbers). Normalization and rounding proceed as follows: if the mantissa after multiplication exceeds the normalization range, a right shift of the mantissa and an increment of the exponent may be needed to renormalize the result. During normalization, rounding of the mantissa may also be necessary to meet the precision requirements of floating-point numbers. Rounding rules can include rounding to nearest, rounding toward zero, rounding up, or rounding down. Overflow and underflow checks determine whether the exponent exceeds the representable range, which may lead to overflow (the value is too large) or underflow (the value is too small to be accurately represented). In the event of overflow, the result may be set to infinity, and in the event of underflow, it may be set to zero or to a very small non-zero number.
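The sign, exponent, mantissa, and normalization steps described above can be sketched for FP32 as follows. This is a simplified model that assumes normal (non-zero, finite) operands and omits rounding, overflow, and underflow handling; `multiply_fields` and `to_float` are hypothetical names, and each operand is given as (sign, biased exponent, 24-bit significand with the hidden 1 included).

```python
BIAS = 127  # exponent offset for single precision


def multiply_fields(a: tuple[int, int, int],
                    b: tuple[int, int, int]) -> tuple[int, int, int]:
    """Multiply two normal FP32 operands given as (sign, biased exp, 24-bit significand).

    Returns (sign, biased exponent, product significand scaled by 2**46).
    """
    sign_a, exp_a, sig_a = a
    sign_b, exp_b, sig_b = b
    sign = sign_a ^ sign_b            # same signs -> 0, different signs -> 1
    exponent = exp_a + exp_b - BIAS   # sum of biased exponents, minus one bias
    product = sig_a * sig_b           # value in [1.0, 4.0), scaled by 2**46
    if product >> 47:                 # product >= 2.0: renormalize
        product >>= 1                 # this sketch truncates the shifted-out bit
        exponent += 1
    return sign, exponent, product


def to_float(fields: tuple[int, int, int]) -> float:
    """Convert the unrounded result fields back to a Python float for checking."""
    sign, exponent, product = fields
    return (-1.0) ** sign * product * 2.0 ** (exponent - BIAS - 46)


# -1.5 * 2.5: significands are 1.5 * 2**23 and 1.25 * 2**23
print(to_float(multiply_fields((1, 127, 0xC00000), (0, 128, 0xA00000))))
```

A hardware multiplier would additionally round the 48-bit product back to 24 bits and check the exponent range, which is left out here to keep the four calculation steps visible.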

[0030] As mentioned earlier, a floating-point multiply-accumulator (FMA) can perform multiply-accumulate, addition, and multiplication operations. In the design of an FMA, the pipeline length and operational logic are usually divided according to the operating frequency of the various functional modules of the FMA (e.g., multipliers, adders, registers, etc.) to ensure that the FMA can operate efficiently at different operating frequencies.

[0031] For example, Figure 1 shows a schematic diagram of a floating-point multiply-accumulator.

[0032] As shown in Figure 1, taking a floating-point multiply-accumulator with a latency of 4 clock cycles as an example, the floating-point multiply-accumulator includes a four-stage pipeline, with each pipeline stage corresponding to one clock cycle. The floating-point multiply-accumulator performs a portion of the computational task within each clock cycle. Due to the clock cycle duration limitation, the alignment shift operation performed by the alignment module 130 is divided into the alignment operations Align_1 and Align_2; the multiplication operation of the multiplication array module 140 is divided into the multiplication operations Mult_1 and Mult_2; and the shift operation of the normalization shift module 160 is divided into the operations Norm_1 and Norm_2, each executed in a different pipeline stage.

[0033] As shown in Figure 1, the first-stage pipeline includes a splitting module 110, an exponentiation module 120, an alignment module 130, and a multiplication array module 140. Three floating-point operands (e.g., A, B, and C) are written to three registers 101-103. The splitting module 110 splits each operand to obtain its corresponding sign bit, exponent, and mantissa. For example, the splitting module 110 splits operand A into its corresponding sign bit SA, exponent EA, and mantissa MA. The exponentiation module 120 calculates the exponent operation result based on the respective exponents EA, EB, and EC of operands A, B, and C. The alignment module 130 performs an alignment operation (Align_1) based on the received exponent operation result and the mantissa MC of operand C. The multiplication array module 140 performs a multiplication operation (Mult_1) on the mantissas MA and MB of operands A and B.

[0034] As shown in Figure 1, the second-stage pipeline includes the alignment module 130, the multiplication array module 140, and a carry-save adder 151. The alignment module 130 performs an alignment operation (Align_2) to obtain the mantissa alignment result and inputs it to the carry-save adder 151 for processing. The multiplication array module 140 performs a multiplication operation (Mult_2) to obtain the multiplication result and inputs it to the carry-save adder 151 for processing. The processing result generated by the carry-save adder 151 is input to the summation leading zero prediction module 152.

[0035] As shown in Figure 1, the third-stage pipeline includes the summation leading zero prediction module 152 and a normalization shift module 160. The summation leading zero prediction module 152 performs addition on the processing result generated by the carry-save adder 151 and calculates the number of leading zeros, and outputs the addition result and the calculated number of leading zeros to the normalization shift module 160, so that the normalization shift module 160 performs the Norm_1 operation in the third-stage pipeline.

[0036] As shown in Figure 1, the fourth-stage pipeline includes the normalization shift module 160 and a rounding module 170. The normalization shift module 160 performs the Norm_2 operation and obtains the normalization result. The rounding module 170 performs rounding processing on the normalization result.
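The compression performed by the adder 151 in the second stage can be illustrated with a textbook 3:2 carry-save step. This is a general sketch of the technique, not the circuit of Figure 1; it assumes the three inputs are non-negative integers (for instance the aligned mantissa and the two partial results of the multiplier), and `carry_save_add` is a hypothetical name.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: reduce three addends to a sum word and a carry word.

    No carry propagates between bit positions; the final carry-propagate
    addition of the two words is left to a later stage.
    """
    sum_word = x ^ y ^ z                          # per-bit sum without carries
    carry_word = ((x & y) | (x & z) | (y & z)) << 1  # per-bit majority, weighted x2
    return sum_word, carry_word


s, c = carry_save_add(13, 7, 9)
print(s, c, s + c)  # the two words always add up to x + y + z
```

Because each output bit depends only on the three input bits in the same position, the delay of this step does not grow with the operand width, which is why the final addition can be deferred to the next pipeline stage.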
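The relationship between the leading-zero count and the normalization shift can be sketched as follows. Note the simplification: a leading zero prediction module estimates the count in parallel with the addition, whereas this sketch simply counts the zeros of the finished sum. The names `count_leading_zeros` and `normalize` and the fixed `width` parameter are hypothetical.

```python
def count_leading_zeros(value: int, width: int) -> int:
    """Number of zero bits above the most significant 1 in a `width`-bit field."""
    return width - value.bit_length()


def normalize(value: int, width: int) -> tuple[int, int]:
    """Left-shift `value` so that the top bit of the field is 1.

    Returns (normalized value, shift amount). Zero has no leading 1 and is
    returned unchanged in this sketch.
    """
    if value == 0:
        return 0, 0
    shift = count_leading_zeros(value, width)
    return (value << shift) & ((1 << width) - 1), shift


print(normalize(0b00010110, 8))  # shift amount equals the leading-zero count
```

In a full datapath the exponent would be decreased by the same shift amount, and the normalized value would then be passed to rounding.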

[0037] When the above floating-point multiply-accumulator performs a multiplication operation, operand C can be set to 0 to perform the multiplication-accumulation operation A*B+0. When the above floating-point multiply-accumulator performs an addition operation, operand B can be set to 1 to perform the multiplication-accumulation operation A*1+C.
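The way multiplication and addition reduce to A*B+C can be checked with a small reference model. This is a software sketch, not the hardware datapath: it assumes finite inputs, computes A*B+C exactly with rational arithmetic, and rounds once at the end. The function names `fma`, `multiply`, and `add` are hypothetical.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Reference A*B+C for finite inputs: exact arithmetic, one rounding at the end."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def multiply(a: float, b: float) -> float:
    """Multiplication expressed as A*B+0."""
    return fma(a, b, 0.0)


def add(a: float, c: float) -> float:
    """Addition expressed as A*1+C."""
    return fma(a, 1.0, c)


print(multiply(1.5, 2.5), add(0.1, 0.2) == 0.1 + 0.2)
```

The model does not reproduce signed zeros, infinities, or NaNs; it only shows that fixing C to 0 or B to 1 yields the ordinary product or sum.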

[0038] The inventors of this disclosure have noted that the aforementioned floating-point multiply-accumulator includes a multiplication array module to perform multiplication operations, and this multiplication array module occupies a relatively large area. When the floating-point multiply-accumulator is used to perform addition operations, this multiplication array module continues to operate, resulting in unnecessary power consumption. To address this problem, a common approach is to include a separate floating-point adder within the floating-point multiply-accumulator to perform addition operations. While this method avoids unnecessary power consumption, the added floating-point adder also increases the chip's area overhead and static power consumption. This solution does not effectively solve the aforementioned problem.

[0039] One or more embodiments of this disclosure provide a floating-point multiply-accumulator. The floating-point multiply-accumulator includes a control module, a receiving module, a splitting module, an exponentiation module, a multiplication array module, an alignment module, and an addition module. The receiving module is configured to receive all or only the first and third operands from a first operand, a second operand, and a third operand used for operations, wherein the operations include multiplication-accumulation or addition. The splitting module is configured to split the operands received by the receiving module into a sign bit, an exponent, and a mantissa, respectively. The exponentiation module is configured to perform exponentiation on the exponents obtained by the splitting module to obtain an exponentiation result. The multiplication array module is configured to perform multiplication on the first mantissa of the first operand and the second mantissa of the second operand during multiplication-accumulation operations to obtain a multiplication result. The alignment module is configured to perform alignment shift on the third mantissa of the third operand according to the exponentiation result to obtain a mantissa alignment result. The addition module is configured to, when performing a multiplication-addition operation, perform a first addition process based on the mantissa alignment result and the multiplication result to obtain a first addition result, or, when performing an addition operation, perform a second addition process based on the mantissa alignment result and the first mantissa of the first operand to obtain a second addition result. The control module is configured to control the operations of the splitting module, the exponentiation module, the addition module, and the multiplication array module according to the type of operation. 
Under the control of the control module, the floating-point multiply-accumulator provided in this embodiment can bypass the multiplication array module when executing floating-point addition instructions, thereby reducing the power consumed when the floating-point multiply-accumulator performs addition operations.

[0040] The present disclosure will now be described through several specific embodiments. To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and components may be omitted. When any component of the embodiments of the present disclosure appears in more than one drawing, the component is represented by the same or similar reference numerals in each drawing.

[0041] The following detailed description, with reference to the accompanying drawings, describes some embodiments and examples of this disclosure.

[0042] Figure 2 shows a schematic diagram of the structure of a floating-point multiply-accumulator provided in at least one embodiment of this disclosure.

[0043] As shown in Figure 2, the floating-point multiply-accumulator includes a control module (not shown in Figure 2), a receiving module 200, a splitting module 210, an exponentiation module 220, a multiplication array module 240, an alignment module 230, and an addition module 250.

[0044] For example, the pipeline of this floating-point multiply-accumulator can be divided into four stages, meaning that the operation process of the floating-point multiply-accumulator is divided into four stages. This is similar to the floating-point multiply-accumulator shown in Figure 1 and described above, and will not be described in detail here. However, in the embodiments of this disclosure, the pipeline division of the floating-point multiply-accumulator can be determined according to the operating frequency of each functional module or component in the floating-point multiply-accumulator, and the embodiments of this disclosure do not limit this.

[0045] In this embodiment, the receiving module 200 is configured to receive the floating-point operands for an operation, namely either all of operands A, B, and C or only operands A and C. Correspondingly, the operation performed is the multiply-accumulate operation A*B+C or the addition operation A+C. The splitting module 210 is configured to split the operands (A, B, and C, or A and C) received by the receiving module 200 into their respective sign bits, exponents, and mantissas. The exponentiation module 220 is configured to perform an exponent operation on the exponents obtained by the splitting module 210 to obtain the exponent operation result.

[0046] For example, in at least one example, the receiving module 200 may include registers 201-203, which are configured to store operands A, B, and C, respectively. In other examples, the receiving module may also include register 204. Register 204 is configured to store operand D when the floating-point multiply-accumulator performs an addition operation, and to provide operand D to the splitting module 210 to obtain the corresponding sign bit SD, exponent ED, and mantissa MD. Operand D is a copy of operand A for use in the addition operation, which simplifies control during the addition operation.

[0047] It should be noted that, in order to facilitate the distinction between "operand A" used for addition and "operand A" used for multiplication, one or more embodiments of this disclosure use "operand D" (stored in register 204) to represent "operand A" used for addition.

[0048] When the floating-point multiply-adder performs the multiplication-addition operation A*B+C, the multiplication array module 240 is configured to perform a multiplication operation on the mantissa MA of operand A and the mantissa MB of operand B. That is, the multiplication array module 240 performs the multiplication operation Mult_1 in the first-stage pipeline, and then performs the multiplication operation Mult_2 in the second-stage pipeline, obtaining the multiplication result. This multiplication result includes a first result Carry (carry) and a second result Sum (sum).

[0049] Alignment module 230 is configured to perform alignment shift on the mantissa MC of operand C according to the result of exponent operation. That is, alignment module 230 performs alignment Align_1 operation in the first-stage pipeline, and then performs alignment Align_2 operation in the second-stage pipeline to obtain mantissa alignment result MC_Align.

[0050] When the floating-point multiply-adder performs the multiplication-addition operation A*B+C, the addition module 250 is configured to perform a first addition operation on the mantissa alignment result MC_Align and the multiplication result to obtain a first addition result. Alternatively, when the floating-point multiply-adder performs the addition operation A+C (i.e., D+C), the addition module 250 is configured to perform a second addition operation on the mantissa alignment result MC_Align and the mantissa MD of the operand D to obtain a second addition result.

[0051] The control module (not shown in Figure 2) is configured to control the operations of the splitting module 210, the exponentiation module 220, the addition module 250, and the multiplication array module 240 according to the type of operation, thereby realizing different operations; for example, the control module determines the type of operation based on the received operation instruction, and then issues the corresponding control signal. For example, the control module can be implemented by logic processing circuits (e.g., microprogram control) or by hardwiring, and the embodiments of this disclosure are not limited in this regard.

[0052] For example, when the floating-point multiply-accumulator receives a floating-point addition instruction, the control module can turn off the clocks of registers 201 and 202, and possibly of register 206 in the multiplication array module 240. Register 206 is used to store the intermediate result Mult_Tmp during the multiplication operation. With the clocks of registers 201, 202, and 206 turned off, the data in these registers is not updated or overwritten as the clock signal changes. This prevents the splitting module from splitting operands A and B in registers 201 and 202, so that the multiplication array module 240 is bypassed when the floating-point multiply-accumulator performs addition, avoiding the unnecessary power consumption that would be caused by the multiplication array module 240 participating in addition operations. That is, during an addition operation of the floating-point multiply-accumulator, the grayed-out modules in Figure 2 (registers 201, 202, 206 and the multiplication array module 240) do not participate in the addition operation, while the modules that are not grayed out (such as registers 203, 204, etc.) remain in their normal working state, thereby reducing the power consumption of the floating-point multiply-accumulator when performing addition operations.
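The clock-gating decision described above can be modeled as a small lookup. This is a behavioral sketch under stated assumptions, not the control module itself: the operation names "fma" and "add", the function `clock_enables`, and the dictionary keys are hypothetical, and gating register 204 during multiply-accumulate operations is an assumption that the text does not state.

```python
def clock_enables(operation: str) -> dict[str, bool]:
    """Return which register clocks stay enabled for the given operation type."""
    if operation not in ("fma", "add"):
        raise ValueError(f"unsupported operation: {operation}")
    is_add = operation == "add"
    return {
        "reg201_A": not is_add,         # gated during addition
        "reg202_B": not is_add,         # gated during addition
        "reg203_C": True,               # operand C is used in both modes
        "reg204_D": is_add,             # copy of A for addition (gating in fma mode is assumed)
        "reg206_mult_tmp": not is_add,  # multiplier intermediate result, gated during addition
    }


print(clock_enables("add"))
```

With the input registers of the multiplier held constant, its internal nodes stop toggling, which is the source of the dynamic power saving described in the text.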

[0053] The floating-point multiply-accumulator provided in one or more embodiments of this disclosure may further include a selection module. The selection module is coupled to the splitting module, the exponent operation module, and the addition module. The selection module is configured to select, according to control instructions provided by the control module, the exponents and mantissas obtained by the splitting module, and to send the selected exponents and mantissas to the exponent operation module and the addition module according to whether the operation is a multiplication-addition operation or an addition operation.

[0054] For example, the selection module can include multiple selectors. In the embodiment illustrated in Figure 2, the selectors of the selection module (labeled "MUX" in Figure 2) include selectors 2802, 2803, and 2804. Selector 2802 is coupled to the splitting module 210 and the exponent operation module 220, and is configured to receive the exponent offset value Bias and the exponent EB of operand B obtained from the splitting module 210. When the floating-point multiply-accumulator performs a multiplication-addition operation, selector 2802 outputs the exponent EB to the exponent operation module 220; when it performs an addition operation, selector 2802 outputs the exponent offset value Bias to the exponent operation module 220. For example, the exponent offset value Bias can be determined according to the IEEE 754 standard and the data format of the floating-point operand, i.e., Bias = 2^(n-1) - 1, where n is the number of bits in the exponent of the floating-point number. For example, in the IEEE 754 standard, single-precision floating-point numbers have an 8-bit exponent, so the bias used for single precision is 2^(8-1) - 1 = 127; double-precision floating-point numbers have an 11-bit exponent, so the bias used for double precision is 2^(11-1) - 1 = 1023. The connections and functions of selector 2803 and selector 2804 are described below and are not detailed here.

[0055] For example, if the receiving module includes register 204, the plurality of selectors described above may also include selector 2801. As shown in Figure 2, selector 2801 is coupled to the splitting module 210 and the exponent operation module 220, and is configured to receive from the splitting module 210 the exponent EA of operand A stored in register 201 and the exponent ED of operand D stored in register 204. When the floating-point multiply-accumulator performs a multiplication-addition operation, selector 2801 outputs the exponent EA to the exponent operation module 220; when it performs an addition operation, selector 2801 outputs the exponent ED to the exponent operation module 220.

[0056] In one example of this disclosure, when the floating-point multiply-accumulator performs a multiplication-addition operation, the exponent operation module receives the exponents EA, EB, and EC, and the exponent operation result can be expressed as E = EC - (EA + EB) + Bias0. When the floating-point multiply-accumulator performs an addition operation, the exponent operation module receives the exponent EC, the exponent ED, and the exponent offset value Bias, and the exponent operation result can be expressed as E = EC - (ED + Bias) + Bias0 = EC - ED. Here, Bias0 is the initial exponent offset value preset in the exponent operation module for the exponent operation, and Bias0 and Bias have the same value.
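The two exponent expressions above can be checked with a short sketch. The function names ieee754_bias and exponent_result are hypothetical and introduced only for illustration; the sketch assumes biased integer exponents and a single-precision exponent width by default.

```python
def ieee754_bias(exponent_bits):
    """Exponent bias 2^(n-1) - 1 for an n-bit exponent field."""
    return (1 << (exponent_bits - 1)) - 1


def exponent_result(op, ec, ea=0, eb=0, ed=0, exponent_bits=8):
    """Sketch of E = EC - (x + y) + Bias0, where x and y are the selector outputs.

    op is "fma" for A*B+C or "add" for D+C (labels are illustrative assumptions).
    """
    bias0 = ieee754_bias(exponent_bits)  # preset inside the exponent operation module
    bias = bias0                         # Bias and Bias0 have the same value
    if op == "fma":
        sel_2801, sel_2802 = ea, eb      # selectors pass EA and EB
    elif op == "add":
        sel_2801, sel_2802 = ed, bias    # selectors pass ED and Bias
    else:
        raise ValueError("op must be 'fma' or 'add'")
    return ec - (sel_2801 + sel_2802) + bias0
```

In the addition case, the Bias supplied through selector 2802 cancels Bias0, so the same subtractor structure yields EC - ED without a separate data path.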

[0057] For example, in one or more embodiments of the floating-point multiply-adder provided in this disclosure, the addition module includes a carry-save adder and a summation leading zero prediction module.

[0058] A carry-save adder (CSA) is a high-efficiency adder design suitable for performing addition operations on multiple numbers. During calculation, the carry-save adder does not immediately process carry-overs generated in the lower bits; instead, it saves the carry-over for later processing. This approach helps reduce latency in addition operations and improves computation speed.

[0059] As shown in Figure 2, the carry-save adder 251 has three inputs and two outputs and is therefore also called a "3:2 compressor". It compresses the sum of three input data (e.g., A1 + A2 + A3) into the sum of two output data (e.g., PS + PC). The three inputs and two outputs satisfy the following bitwise relationships: PS = A1⊕A2⊕A3 and PC = (A1·A2) + (A1·A3) + (A2·A3), where "⊕" represents the XOR operation, "·" represents the AND operation, and "+" in the expression for PC represents the OR operation. Depending on actual needs, the floating-point multiply-adder in this disclosure can alternatively use a 4:2 compressor, a 5:2 compressor, etc.; this disclosure does not impose any limitation in this regard.
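The 3:2 compression can be sketched on Python integers, treating each integer as a bit vector. The function name csa_3to2 is hypothetical. The sketch assumes the usual carry-save convention in which each carry bit has the weight of the next higher bit position, so the arithmetic identity is checked against PS plus PC shifted left by one bit.

```python
def csa_3to2(a1, a2, a3):
    """Bitwise 3:2 compressor: returns (partial sum PS, carry word PC)."""
    ps = a1 ^ a2 ^ a3                        # sum bits without carries
    pc = (a1 & a2) | (a1 & a3) | (a2 & a3)   # a carry arises where at least two inputs are 1
    return ps, pc


ps, pc = csa_3to2(0b101, 0b011, 0b110)
# Carries belong one bit position higher, hence the shift when recombining.
total = ps + (pc << 1)
```

No carry propagates from one bit position to another inside the compressor itself; the single carry-propagating addition is deferred to the final adder.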

[0060] As shown in Figure 2, the carry-save adder 251 is coupled to the output of the alignment module 230 and is configured to compress the mantissa alignment result MC_Align, the received first addend, and the received second addend to obtain compressed data PS and PC. The first and second addends are the two data received by the carry-save adder 251 other than the mantissa alignment result MC_Align. The compressed data PS represents the sum of the alignment result MC_Align, the first addend, and the second addend without considering carries, also known as the "partial sum." The compressed data PC (Propagate Carry) represents the carries generated when adding the alignment result MC_Align, the first addend, and the second addend.

[0061] The summation leading zero prediction module 252 is configured to perform a first addition operation on the received third and fourth addends during multiplication by the floating-point multiplier-adder, obtaining a first addition result; or to perform a second addition operation on the received third and fourth addends during addition by the floating-point multiplier-adder, obtaining a second addition result. The summation leading zero prediction module 252 is also configured to calculate the number of leading zeros in the first or second addition result. Here, "third addend" and "fourth addend" describe the data received at the two inputs of the summation leading zero prediction module 252, and the "first addition operation" and "second addition operation" have the same addition steps.

[0062] In one embodiment of this disclosure, as shown in Figure 2, selector 2803 is coupled to the output of the multiplication array module 240 and the input of the carry-save adder 251, and selector 2804 is coupled to the multiplication array module 240, the carry-save adder 251, and the splitting module 210.

[0063] For example, when the floating-point multiply-adder performs a multiplication-addition operation, selector 2803 provides the first result Carry of the multiplication result to the carry-save adder 251 as the first addend, and selector 2804 provides the second result Sum of the multiplication result to the carry-save adder 251 as the second addend. Then, in the second pipeline stage, the carry-save adder 251 compresses the mantissa alignment result MC_Align, the first result Carry, and the second result Sum to obtain compressed data PS and PC.

[0064] For example, when the floating-point multiply-adder performs an addition operation, selector 2803 provides 0 (i.e., the case without carry) to the carry-save adder 251 as the first addend, and selector 2804 provides the mantissa MD obtained from the splitting module 210 to the carry-save adder 251 as the second addend. Then, in the second pipeline stage, the carry-save adder 251 compresses the mantissa alignment result MC_Align, 0, and the mantissa MD to obtain compressed data PS and PC.

[0065] Continuing the above embodiment, the summation leading zero prediction module 252 is coupled to the carry-save adder 251, and the compressed data PS and PC generated by the carry-save adder 251 are provided directly to the summation leading zero prediction module 252. In the third pipeline stage, the summation leading zero prediction module 252 first adds the compressed data PS and PC to obtain the addition result, and then calculates the number of leading zeros in the addition result.
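The selection and addition path of the Figure 2 embodiment can be sketched as follows. This is a behavioral illustration only: the function names, the op labels "fma" and "add", and the width parameter are assumptions, and the leading zeros are simply counted after the addition rather than predicted in parallel as a hardware module might do. The carry word is assumed to carry the weight of the next higher bit position.

```python
def csa_3to2(a1, a2, a3):
    """Bitwise 3:2 compressor returning (PS, PC)."""
    return a1 ^ a2 ^ a3, (a1 & a2) | (a1 & a3) | (a2 & a3)


def add_stage_fig2(op, mc_align, carry, sum_, md_d1, width):
    """Sketch of selectors 2803/2804 feeding the carry-save adder, then the final add."""
    first = carry if op == "fma" else 0       # selector 2803: Carry or 0
    second = sum_ if op == "fma" else md_d1   # selector 2804: Sum or extended mantissa
    ps, pc = csa_3to2(mc_align, first, second)
    total = (ps + (pc << 1)) & ((1 << width) - 1)  # final carry-propagating addition
    leading_zeros = width - total.bit_length()
    return total, leading_zeros
```

For example, with an 8-bit width, add_stage_fig2("add", 0b0011, 0, 0, 0b0100, 8) adds the aligned mantissa and the extended mantissa without using the multiplier outputs at all.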

[0066] It should be noted that when the floating-point multiply-accumulator performs an addition operation, the mantissa MD received by selector 2804 in Figure 2 is actually the mantissa MD_d1 obtained after the extension operation. In the first pipeline stage, the splitting module 210 performs bit-width extension on the mantissa MD of operand D and provides the extended mantissa MD_d1 to selector 2804. The extended mantissa MD_d1 has the same bit width as the second result Sum. The extended mantissa MD_d1 can be stored in register 205, which is coupled to selector 2804 so that selector 2804 can read the mantissa MD_d1.

[0067] Figure 3 shows a schematic diagram of the bit-width extension process provided in at least one embodiment of the present disclosure.

[0068] As shown in Figure 3, the mantissa MD has a bit width of W, and the second result Sum of the multiplication operation has a bit width of 2W. Because the bit width of the second result Sum is greater than that of the mantissa MD, the extension is a positive extension: one 0 bit is added above the high-order end of the mantissa MD, and (W-1) 0 bits are appended below its low-order end, resulting in a mantissa MD_d1 with a bit width of 2W. The highest bit of the extended mantissa MD_d1 is 0, the lowest (W-1) bits are all 0, and the middle W bits hold the mantissa MD.
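The extension from W bits to 2W bits amounts to a left shift by (W-1) positions, which can be sketched as follows. The function name extend_mantissa is hypothetical, and W = 4 is chosen only to keep the example readable.

```python
def extend_mantissa(md, w):
    """Place a W-bit mantissa in the middle of a 2W-bit word.

    Layout of the result: one leading 0 bit, then the W mantissa bits,
    then (W-1) trailing 0 bits.
    """
    if not 0 <= md < (1 << w):
        raise ValueError("mantissa does not fit in W bits")
    return md << (w - 1)


W = 4
md_d1 = extend_mantissa(0b1011, W)
bits = format(md_d1, "0{}b".format(2 * W))  # 2W-bit binary string of MD_d1
```

Here the 4-bit mantissa 1011 becomes the 8-bit word 01011000.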

[0069] In at least one embodiment of this disclosure, by extending the mantissa (also known as “bit width alignment”), it can be ensured that data can be correctly processed and operated on inside the floating-point multiply-accumulator, thereby improving operational efficiency and ensuring compatibility with hardware design.

[0070] For example, the floating-point multiply-accumulator provided in one or more embodiments of this disclosure also includes a normalization shift module and a rounding module.

[0071] As shown in Figure 2, the normalization shift module 260 is configured to logically shift the first or second addition result output by the addition module 250 to obtain a normalized result, wherein the number of bits shifted is equal to the number of leading zeros in the first or second addition result. The logical shift is divided into two parts: the Norm_1 operation executed in the third pipeline stage and the Norm_2 operation executed in the fourth pipeline stage. The rounding module 270 is configured to round the normalized result generated by the normalization shift module 260 in the fourth pipeline stage.
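The normalization shift can be sketched as a single left shift by the leading-zero count. The function name normalize and the fixed width parameter are assumptions for illustration; the sketch performs the shift in one step and does not model how the shift is divided between Norm_1 and Norm_2, which the description does not detail.

```python
def normalize(value, width):
    """Logically shift left by the number of leading zeros within a fixed width."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    leading_zeros = width - value.bit_length()
    shifted = (value << leading_zeros) & ((1 << width) - 1)
    return shifted, leading_zeros


normalized, shift = normalize(0b00010110, 8)
```

After the shift, the most significant bit of a nonzero result is 1, which is the form expected by the subsequent rounding step.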

[0072] It should be noted that the floating-point multiply-accumulator shown in Figure 2 omits the sign bit prediction module, which operates on the sign bits obtained from the splitting operation. The sign bit prediction module can determine the sign of the final operation result based on the exponent operation result, that is, determine whether the final operation result is positive or negative. The specific determination method is conventional and is not described in detail here.

[0073] In another embodiment, in contrast to Figure 2, register 204 is omitted and register 201 directly provides operand A for the addition. In this case, the exponent EA and mantissa MA obtained by splitting operand A are selectively provided directly to the exponent operation module 220 (so selector 2801 is not needed) and to selector 2804 (and from there to the addition module 250). For example, another selector is provided to selectively connect the splitting module corresponding to operand A to the multiplication array module 240: for a multiplication-addition operation, the splitting module corresponding to operand A is connected to the multiplication array module 240, while for an addition operation it is disconnected from the multiplication array module 240. For the other connections and operations of this embodiment, refer to the description of Figure 2 above; they are not repeated here.

[0074] Figure 4 shows a schematic diagram of another floating-point multiply-accumulator provided in at least one embodiment of the present disclosure.

[0075] The floating-point multiply-accumulator of the embodiment shown in Figure 4 also includes a control module, a receiving module, a splitting module, an exponent operation module, a normalization shift module, a rounding module, etc. For the connections and functions of these modules, refer to the floating-point multiply-accumulator shown in Figure 2 above; they are not described again here. The following focuses on the differences between the embodiment of Figure 4 and the embodiment of Figure 2.

[0076] As shown in Figure 4, the carry-save adder 451 is a 3:2 compressor. The three inputs of the carry-save adder 451 are coupled to the output of the alignment module 430 and the two outputs of the multiplication array module, respectively. The carry-save adder 451 is configured to compress the mantissa alignment result MC_Align and the first result Carry and second result Sum of the multiplication operation to obtain compressed data PS and PC. The compressed data PS and PC are then provided to selector 4803 and selector 4804, respectively.

[0077] As shown in Figure 4, selector 4803 is coupled to the alignment module 430 and the carry-save adder 451 to receive the mantissa alignment result MC_Align and the compressed data PS. Selector 4804 is coupled to the carry-save adder 451 and register 405 to receive the compressed data PC and the extended mantissa MD_d1. For the method of generating the extended mantissa MD_d1, refer to the bit-width extension process shown in Figure 3 above, which is not elaborated further here.

[0078] For example, when the floating-point multiply-adder performs a multiplication-addition operation, selector 4803 provides the compressed data PS to the summation leading zero prediction module 452 as the third addend, and selector 4804 provides the compressed data PC to the summation leading zero prediction module 452 as the fourth addend. The summation leading zero prediction module 452 first performs a first addition process on the compressed data PS and PC to obtain the first addition process result, and then calculates the number of leading zeros in the first addition process result. For example, when the floating-point multiply-accumulator provided in the embodiment performs an addition operation, selector 4803 provides the mantissa alignment result MC_Align to the summation leading zero prediction module 452 as the third addend, and selector 4804 provides the extended mantissa MD_d1 to the summation leading zero prediction module 452 as the fourth addend. The summation leading zero prediction module 452 first performs a second addition process on the mantissa alignment result MC_Align and the extended mantissa MD_d1 to obtain the second addition process result, and then calculates the number of leading zeros in the second addition process result.
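The Figure 4 arrangement, in which the selectors sit after the carry-save adder, can be sketched as follows. The function names, the op labels, and the width parameter are assumptions made for illustration; the carry word is assumed to carry the weight of the next higher bit position, and the leading zeros are counted after the addition rather than predicted.

```python
def csa_3to2(a1, a2, a3):
    """Bitwise 3:2 compressor returning (PS, PC)."""
    return a1 ^ a2 ^ a3, (a1 & a2) | (a1 & a3) | (a2 & a3)


def add_stage_fig4(op, mc_align, carry, sum_, md_d1, width):
    """Sketch of selectors 4803/4804 choosing the inputs of the final addition."""
    if op == "fma":
        ps, pc = csa_3to2(mc_align, carry, sum_)
        third, fourth = ps, pc << 1        # compressed data from the carry-save adder
    elif op == "add":
        third, fourth = mc_align, md_d1    # carry-save adder output is not used
    else:
        raise ValueError("op must be 'fma' or 'add'")
    total = (third + fourth) & ((1 << width) - 1)
    return total, width - total.bit_length()
```

In this sketch the compressor is evaluated only on the multiplication-addition path, which reflects that in addition mode the selected addends do not depend on its outputs.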

[0079] One or more embodiments of this disclosure also provide a processor that includes the floating-point multiply-accumulator provided in any of the above embodiments. The processor may be, for example, a central processing unit (CPU) or a coprocessor. The coprocessor may be, for example, a graphics processing unit (GPU), a general-purpose graphics processing unit (GPGPU), an AI accelerator (e.g., a tensor processor), etc., and the embodiments of this disclosure do not limit this; furthermore, the embodiments of this disclosure do not limit the specific architecture of the processor, the applicable instruction set, etc.

[0080] One or more embodiments of this disclosure also provide an electronic device that includes the processor provided in any of the above embodiments.

[0081] For example, Figure 5 shows a schematic diagram of the structure of an electronic device provided in at least one embodiment of the present disclosure.

[0082] The electronic devices in this disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle terminals (e.g., vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers, and may also be used to implement various types of servers.

[0083] For example, as shown in Figure 5, in some examples, electronic device 500 includes a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 501, which may include a floating-point multiply-accumulator provided in at least one embodiment of this disclosure. The processing device 501 can perform various appropriate actions and processes based on a program stored in read-only memory (ROM) 502 or a program loaded from storage device 508 into random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the computer system. The processing device 501, ROM 502, and RAM 503 are connected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

[0084] For example, the following components can be connected to the I/O interface 505: input devices 506 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 507 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 508 including, for example, magnetic tapes, hard disks, etc.; and communication devices 509, including, for example, network interface cards such as LAN cards and modems. The communication device 509 allows electronic device 500 to communicate with other devices wirelessly or by wire to exchange data, and performs communication processing via networks such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. Removable media 511, such as magnetic disks, optical disks, magneto-optical disks, semiconductor memories, etc., are mounted on the drive 510 as needed so that computer programs read from them can be installed into storage device 508 as needed. Although Figure 5 shows an electronic device 500 including various devices, it should be understood that not all of the devices shown need to be implemented or included. More or fewer devices may alternatively be implemented or included.

[0085] For example, the electronic device 500 may further include a peripheral interface (not shown in the figure). This peripheral interface can be various types of interfaces, such as a USB interface, a Lightning interface, etc. The communication device 509 can communicate wirelessly with a network and other devices, such as the Internet, an intranet, and / or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and / or a metropolitan area network (MAN). Wireless communication can use any of a variety of communication standards, protocols, and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g., based on IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and / or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), Wi-MAX, protocols for email, instant messaging, and / or Short Message Service (SMS), or any other suitable communication protocol.

[0086] In addition to the illustrative examples described above, the following points also need to be noted:

[0087] (1) The accompanying drawings of the embodiments of this disclosure relate only to the structures involved in these embodiments; for other structures, refer to conventional designs.

[0088] (2) Where there is no conflict, the embodiments of this disclosure and the features in the embodiments can be combined with each other to obtain new embodiments.

[0089] The above description is merely an exemplary embodiment of this disclosure and is not intended to limit the scope of protection of this disclosure, which is determined by the appended claims.

Claims

1. A floating-point multiply-accumulator, comprising a control module, a receiving module, a splitting module, an exponent operation module, a multiplication array module, an alignment module, and an addition module, wherein: the receiving module is configured to receive, for an operation, either all of a first operand, a second operand, and a third operand, or only the first operand and the third operand, wherein the operation includes a multiplication-addition operation or an addition operation; the splitting module is configured to split each operand received by the receiving module into a sign bit, an exponent, and a mantissa; the exponent operation module is configured to perform an exponent operation on the exponents obtained by the splitting module to obtain an exponent operation result; the multiplication array module is configured to, when the multiplication-addition operation is performed, perform a multiplication operation on a first mantissa of the first operand and a second mantissa of the second operand to obtain a multiplication result; the alignment module is configured to perform an alignment shift on a third mantissa of the third operand based on the exponent operation result to obtain a mantissa alignment result; the addition module is configured to, when the multiplication-addition operation is performed, perform a first addition operation based on the mantissa alignment result and the multiplication result to obtain a first addition result, or, when the addition operation is performed, perform a second addition operation based on the mantissa alignment result and the first mantissa of the first operand to obtain a second addition result; and the control module is configured to control the operation of the splitting module, the exponent operation module, the addition module, and the multiplication array module according to the type of the operation.

2. The floating-point multiply-accumulator as described in claim 1, further comprising a selection module, wherein, The selection module is coupled to the splitting module, the exponent calculation module, and the addition module, and is configured to select the exponent and mantissa obtained from the splitting module according to the control instructions provided by the control module, and send the exponent and mantissa obtained from the splitting to the exponent calculation module and the addition module in the mode of the multiplication-addition operation or the addition operation.

3. The floating-point multiply-accumulator as described in claim 2, wherein, The selection module includes multiple selectors, and the multiple selectors include a first selector; The first selector is coupled to the splitting module and the exponentiation module, and is configured to receive an exponent offset value and a second exponent of the second operand received from the splitting module. When the operation is the multiplication operation, the second exponent is selected and output to the exponentiation module; or, when the operation is the addition operation, the exponent offset value is selected and output to the exponentiation module.

4. The floating-point multiply-accumulator as described in claim 3, wherein, The addition module includes a carry-preserving adder and a leading zero prediction module. The carry-preserving adder is coupled to the alignment module and is configured to compress the mantissa alignment result, the received first addend, and the received second addend to obtain first compressed data and second compressed data. The summation leading zero prediction module is configured to, when the operation is the multiplication operation, perform the first addition process on the received third addend and fourth addend to obtain the first addition process result, or when the operation is the addition operation, perform the second addition process on the received third addend and fourth addend to obtain the second addition process result, and is configured to calculate the number of leading zeros in the first addition process result or the second addition process result.

5. The floating-point multiply-accumulator as described in claim 4, wherein, The plurality of selectors includes a second selector and a third selector, the summation leading zero prediction module is coupled to the carry-preserving adder, and the third addend and the fourth addend are the first compressed data and the second compressed data, respectively; The second selector is configured to, when the operation is the multiplication operation, select to provide the first result of the multiplication operation result to the carry-retaining adder as the first addend, or, when the operation is the addition operation, select to provide zero to the carry-retaining adder as the first addend; The third selector is configured to, when the operation is the multiplication operation, select to provide the second result of the multiplication operation result to the carry-retaining adder as the second addend, or, when the operation is the addition operation, select to provide the first mantissa obtained by the splitting module to the carry-retaining adder as the second addend.

6. The floating-point multiply-accumulate unit as described in claim 5, wherein, The splitting module is further configured to, when the operation is the addition operation, perform bit-width expansion processing on the first mantissa of the first operand and provide the expanded first mantissa to the third selector, wherein the expanded first mantissa has the same bit width as the second result.

7. The floating-point multiply-accumulator as described in claim 4, wherein, The plurality of selectors includes a second selector and a third selector, wherein the first addend and the second addend are respectively the first result and the second result in the multiplication operation result; The second selector is configured to, when the operation is the multiplication operation, select to provide the first compressed data to the summation leading zero prediction module as the third addend, or, when the operation is the addition operation, select to provide the mantissa alignment result obtained by the alignment module to the summation leading zero prediction module as the third addend; The third selector is configured to, when the operation is the multiplication operation, select to provide the second compressed data to the summation leading zero prediction module as the fourth addend, or, when the operation is the addition operation, select to provide the first mantissa obtained by the splitting module to the summation leading zero prediction module as the fourth addend.

8. The floating-point multiply-accumulator as claimed in claim 1, wherein, The receiving module includes a first register, a second register, and a third register, wherein the first register, the second register, and the third register are respectively configured to store the first operand, the second operand, and the third operand.

9. The floating-point multiply-accumulator as described in claim 8, wherein, The receiving module also includes a fourth register. The fourth register is configured to store the first operand when the operation is the addition operation, and to provide the first operand to the splitting module to obtain the first mantissa used for the addition operation and provided to the addition module.

10. The floating-point multiply-accumulator of claim 1, further comprising: A normalization shift module is configured to logically shift the result of the first addition or the result of the second addition to obtain a normalized result, wherein the number of bits of the logical shift is equal to the number of leading zeros; and The rounding module is configured to perform rounding processing on the normalized result.

11. A processor, comprising: The floating-point multiply-accumulator as described in any one of claims 1-10.

12. An electronic device comprising the processor as claimed in claim 11.