FPGA-based parallelization method for floating-point multiplication

By designing a parallel floating-point multiplication operator on an FPGA, the IEEE 754 floating-point number is divided into exponent blocks and floating blocks. Parallel processing is then performed using the FPGA, solving the problems of accuracy and speed in IEEE floating-point multiplication calculations and achieving high flexibility and efficient computation.

CN116028012B · Active · Publication Date: 2026-06-26 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SOUTHEAST UNIV
Filing Date: 2023-02-22
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing IEEE floating-point multiplication calculations suffer from poor computational precision and flexibility, as well as long latency, in in-memory computing architectures. This is especially true when the IEEE 754 floating-point representation is fixed and cannot be divided into blocks, making it difficult to meet the requirements for high flexibility and high computational speed.

Method used

Design an FPGA-based floating-point parallel multiplication operator. By dividing IEEE754 floating-point numbers into exponent blocks and floating blocks, a variable-bit floating block fixed-point multiplication is adopted, and the parallel processing capability of the FPGA is utilized to realize the parallel calculation of the sign bit, exponent bit, and mantissa bit.

Benefits of technology

It achieves high flexibility and precision in floating-point calculations while minimizing storage resource requirements, thus improving calculation speed, reducing computational complexity, and enabling large-scale operations to be completed simultaneously, thereby reducing latency in linear processing.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses an FPGA-based parallel multiplication method for floating-point numbers, comprising the following steps: first, based on the IEEE 754 representations at different precisions, a blockable low-precision storage format is designed, including the design of the exponent block and the floating blocks; next, fixed-point addition for floating blocks of arbitrary variable bit width is designed, and variable-bit floating-block fixed-point multiplication is realized by means of a multiplication pool; then, using the FPGA, the sign bit and the exponent bits of the product of the multiplicand and the multiplier are obtained by an exclusive-OR operation and by fixed-point addition, respectively, and the product of the significand bits of the multiplicand and the multiplier is obtained by fixed-point multiplication; finally, the multiplication result is normalized. The application can be applied effectively to in-memory computing: as the number of blocks increases, the computation delay is significantly reduced, and the variable precision improves computational flexibility.

Description

Technical Field

[0001] The present invention relates to the field of computers, and relates to the design of floating-point multiplication operators. More specifically, it relates to a parallel multiplication operator designed based on FPGA and using the block division of floating-point numbers.

Background Art

[0002] The dominant computing architecture today is the von Neumann architecture, in which data reside on several storage media and must be moved frequently between them to meet the needs of computing and storage. Low-latency requirements and large data volumes exacerbate the efficiency and energy-consumption problems of the von Neumann structure. In contrast, the in-memory computing (PIM) architecture, which integrates computing and storage, can effectively eliminate the data transfer between the storage unit and the computing unit and is not restricted by the energy consumption ratio, thus effectively addressing the von Neumann bottleneck. Therefore, PIM is receiving extensive attention.

[0003] The design of low-precision multiplication operators is the core of the in-memory computing platform. Currently, the commonly used low-precision storage method is the IEEE floating-point arithmetic standard (IEEE Standard for Floating-Point Arithmetic, IEEE 754), which defines a sign bit for a number, a fixed number of exponent bits, and a fixed number of mantissa bits. The representation method of floating-point numbers is similar to scientific notation. Taking the IEEE single-precision floating-point data format standard as an example, it has a total of 32 bits. The value of a normalized floating-point number is $(-1)^{s} \times 1.f \times 2^{e-127}$ $(0 < e < 255)$. When s takes 0, it represents a positive value, and when it takes 1, it represents a negative value; e represents the exponent bits, recording the power exponent value with a base of 2, from which an offset value usually needs to be subtracted; 1.f represents the mantissa.
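The single-precision formula above can be checked with a short sketch. It is written in Python and is not part of the patent; the function names are illustrative, and the field widths (1 sign bit, 8 exponent bits, 23 fraction bits) are those of the standard IEEE 754 binary32 layout.

```python
import struct

def decode_float32(x: float):
    """Return the (s, e, f) bit fields of x after rounding to IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31            # 1 sign bit
    e = (bits >> 23) & 0xFF   # 8 exponent bits (biased by 127)
    f = bits & 0x7FFFFF       # 23 fraction bits
    return s, e, f

def normalized_value(s: int, e: int, f: int) -> float:
    """Evaluate (-1)^s * 1.f * 2^(e-127) for a normalized number (0 < e < 255)."""
    if not 0 < e < 255:
        raise ValueError("only normalized numbers are handled in this sketch")
    return (-1) ** s * (1 + f / 2**23) * 2.0 ** (e - 127)

s, e, f = decode_float32(-6.5)
print(s, e, f, normalized_value(s, e, f))
```

Here -6.5 is -1.625 × 2^2, so the stored exponent field is 2 + 127 = 129.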

[0004] This representation method has a fixed relative error and a large representation range. However, due to its non-blockability, the representation method of IEEE floating-point numbers is fixed, and the flexibility of computing precision is poor. In addition, when using IEEE 754 floating-point numbers for multiplication calculations, data usually needs to be processed linearly, and the number of processing times is exponential, resulting in a long latency. Therefore, in the in-memory computing architecture, the design of low-precision multiplication operators with high flexibility and high computing speed has become a major challenge.

Summary of the Invention

[0005] To solve the above problems, based on the IEEE 754 floating-point storage method, the present invention proposes a floating-point multiplication operator that can achieve parallel processing based on FPGA.

[0006] To achieve the above objectives, this invention proposes a floating-point parallel multiplication method based on FPGA. The operator design is carried out in the following steps: First, a block-based low-precision storage method is designed, which includes the design of exponent blocks and floating blocks. Then, a floating block fixed-point multiplication method with arbitrary variable bit length is designed. This process includes addition algorithms and shifting processes. Next, the FPGA is used to complete the calculation of the sign bit and exponent bits, and the multiplication of the mantissa bits of the multiplier and multiplicand is implemented. Finally, the obtained multiplication calculation result is normalized.

[0007] The present invention provides a method for parallelizing floating-point multiplication based on FPGA, comprising the following steps:

[0008] Step 1: Extract the sign bits of the multiplicand A and the multiplier B, and perform an XOR operation on the sign bits to obtain the sign bit of the result value;

[0009] Step 2: Extract the exponent bits of the floating-point number A (multiplicand) and the floating-point number B (multiplier), and add them together to obtain the exponent bits of the multiplication result.

[0010] Step 3: Divide the mantissas of the floating-point multiplicand A and multiplier B into separate blocks to obtain the number m of floating blocks, as shown below:

[0011] $m = \lceil m_0 / F \rceil$

[0012] where $\lceil \cdot \rceil$ denotes rounding up, $m_0$ represents the number of bits in the mantissa of IEEE 754, and F represents the number of bits in each floating block, which is a user-defined parameter.

[0013] Step 4: Construct a multiplication pool for fixed-point multiplication of the mantissa. The multiplication pool is 2m long and consists of 2m addition pools. Each addition pool digests the floating blocks stored in it through fixed-point addition, and another floating block is constructed to store the digestion result of this addition pool, that is, to store the final result after fixed-point addition of all floating blocks in the addition pool.

[0014] Step 5: For all 1≤i,j≤m, calculate the product of A(i) and B(j). Place the lower F bits of the multiplication result into the (i+j)th addition pool of the multiplication pool, and place the higher F bits of the multiplication result into the (i+j-1)th addition pool of the multiplication pool. Repeat this process for all i,j, completing $m^2$ floating-block fixed-point multiplications simultaneously;

[0015] Step 6: Based on the fixed-point addition of floating blocks, calculate the cumulative result of the floating blocks stored in each addition pool, and store the result in the floating block constructed by this addition pool;

[0016] Step 7: Normalize the multiplication result; retain the first m floating blocks, which are the complete mantissa bits in the multiplication result. Add a floating block consisting of the sign bit and exponent bit before the floating block corresponding to the mantissa bits to obtain the final multiplication result of the floating-point parallel multiplication operator.
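One way to see why Step 5 routes the two halves of each partial product to pools i+j and i+j-1 is to write the mantissas as fractions in base $2^F$. The block weighting below (block A(i) carrying weight $2^{-Fi}$) is an assumption; it is consistent with the worked example in the detailed implementation, but the text does not state it explicitly. $M_A$, $M_B$, $H_{ij}$ and $L_{ij}$ are names introduced here for the two mantissas and for the high and low F bits of A(i)·B(j).

```latex
\begin{aligned}
M_A &= \sum_{i=1}^{m} A(i)\,2^{-Fi}, \qquad M_B = \sum_{j=1}^{m} B(j)\,2^{-Fj},\\
A(i)\,B(j) &= H_{ij}\,2^{F} + L_{ij}, \qquad 0 \le H_{ij},\, L_{ij} < 2^{F},\\
M_A M_B &= \sum_{i=1}^{m}\sum_{j=1}^{m}\Bigl(H_{ij}\,2^{-F(i+j-1)} + L_{ij}\,2^{-F(i+j)}\Bigr).
\end{aligned}
```

Under this reading, addition pool k collects every term of weight $2^{-Fk}$, and k runs from 1 (the high half of A(1)·B(1)) to 2m (the low half of A(m)·B(m)), which matches the stated pool length of 2m.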

[0017] Furthermore, in step 2, the exponent bits of the floating-point number A (multiplicand) and the floating-point number B (multiplier) are extracted, and the two are added together to obtain the exponent bits of the multiplication result; specifically:

[0018] Define the index of the first floating block to the left of the decimal point as 0, and the indexes of the floating blocks to its left increase sequentially, while the indexes of the floating blocks to its right decrease sequentially. The index of the most significant floating block is represented as:

[0019]

[0020] Where d represents the initial decimal data.

[0021] If the number of bits in the highest-order floating block is less than F, add leading zeros until the number of bits in the highest-order floating block is also equal to F.

[0022] The decimal result of the exponent is represented as:

[0023] $e_{754} = m_H + 2^{E-2} - 1$, where $m_H$ is the index of the most significant floating block.

[0024] The decimal result of the exponent, $e_{754}$, is converted to its binary representation $e_b$.

[0025] The floating block formed by combining the sign bit of the stored number with $e_b$ is referred to as the exponent block;

[0026] The exponents of the multiplicand A and the multiplier B are added together after removing their sign bits to obtain the exponent of the multiplication result.

[0027] Furthermore, in step 3, floating blocks are used to represent the mantissa as follows:

[0028] The mantissa of the multiplicand A is represented as [A(1), A(2), ..., A(i), ..., A(m)], and the mantissa of the multiplier B is represented as [B(1), B(2), ..., B(j), ..., B(m)];

[0029] Furthermore, the calculation of the floating block fixed-point multiplication in step 5 specifically includes the following steps:

[0030] Step 5.1: For two floating blocks with the same number of bits, represent the F-bit fixed-point multiplicand as a = [a(0), a(1), a(2)...a(F-1)], and the F-bit fixed-point multiplier as b = [b(0), b(1), b(2)...b(F-1)]. Let $n_b$ denote the number of non-zero bits in the F-bit data of the fixed-point multiplier b;

[0031] Step 5.2: For 0≤i≤F-1, if b(i)=1, shift the entire a left by (F-1-i) bits. For data exceeding the F-bit range, pad the data with zeros before it and store it in the high F bits. For data below the F-bit range, pad the data with zeros on the right and store it in the low F-bit register. If b(i)=0, do not process it and move to the next bit of b to continue the judgment.

[0032] Step 5.3: After traversing all the data in b, a total of $n_b$ high F-bit values and $n_b$ low F-bit values are obtained. Using floating block fixed-point addition, after $2n_b - 2$ F-bit fixed-point addition operations, all the lower F-bit data and the higher F-bit data are added together to obtain the multiplication result of the F-bit fixed-point multiplicand a and the F-bit multiplier b.
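Steps 5.1 to 5.3 can be sketched in software as a shift-and-add multiplier. This Python sketch is illustrative, not the patent's FPGA implementation. It treats b(0) as the most significant bit, which is what the shift amount (F-1-i) implies. It also feeds the carry out of the low-half sum into the high half; the text does not spell out that detail, so it is an assumption.

```python
def block_multiply(a: int, b: int, F: int = 8):
    """Multiply two F-bit blocks by shift-and-add.

    Returns (high F bits, low F bits, number of F-bit additions counted as 2*n_b - 2).
    """
    mask = (1 << F) - 1
    highs, lows = [], []
    for i in range(F):                       # b(0) is treated as the most significant bit
        if (b >> (F - 1 - i)) & 1:           # b(i) == 1
            shifted = a << (F - 1 - i)       # at most 2F-1 bits wide
            highs.append(shifted >> F)       # part that exceeds the F-bit range
            lows.append(shifted & mask)      # part that stays within the F-bit range
        # b(i) == 0: nothing to do, move on to the next bit
    n_b = len(lows)
    additions = max(2 * n_b - 2, 0)          # (n_b - 1) for the lows plus (n_b - 1) for the highs
    low_total = sum(lows)
    high_total = sum(highs) + (low_total >> F)   # assumption: low-half carry feeds the high half
    return high_total & mask, low_total & mask, additions

high, low, adds = block_multiply(54, 231)
print(high, low, adds)
```

With a = 54 and b = 231 (six non-zero bits), the two halves recombine to 54 × 231 = 12474 and the count of additions is 2·6 - 2 = 10.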

[0033] Furthermore, the fixed-point addition of floating blocks in the addition pool described in step 6 specifically includes the following:

[0034] First, ensure that the lengths of the floating blocks to be added are the same. After aligning all the floating blocks, preset a carry storage variable to 0. Then, starting from the least significant bit of the determined operation length, add them sequentially, where the result of each addition is 0, 1, 2, or 3. Save the operation result and update the carry variable each time until the operation of the most significant bit of the determined operation length is completed. Finally, return the carry of the calculation result and the addition result of the two floating blocks.
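The bit-serial addition described above can be sketched as follows. This is a Python illustration with an invented function name; bit lists are written most significant bit first, matching the way the blocks are listed in the embodiment.

```python
def add_blocks(x, y):
    """Add two aligned floating blocks given as bit lists (MSB first).

    Returns (carry_out, sum_bits).
    """
    if len(x) != len(y):
        raise ValueError("floating blocks must be aligned to the same length")
    carry = 0                                 # preset carry storage variable
    out = [0] * len(x)
    for k in range(len(x) - 1, -1, -1):       # start from the least significant bit
        t = x[k] + y[k] + carry               # each step yields 0, 1, 2, or 3
        out[k] = t & 1                        # save the result bit
        carry = t >> 1                        # update the carry variable
    return carry, out

carry, total = add_blocks([1, 0, 1, 1, 1, 0, 1, 0], [1, 1, 1, 0, 0, 1, 1, 1])
print(carry, total)
```

The example adds 186 and 231; the carry returned alongside the 8-bit sum is what a neighbouring, more significant block would have to absorb.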

[0035] Furthermore, the normalization of the multiplication calculation result described in step 7 means that for the 2m floating blocks in the multiplication pool, if the floating block with the highest bit of the mantissa multiplication result is a floating block with all zeros, then the floating block is discarded, the exponent is decremented by 1, and then the remaining floating blocks in the multiplication pool are rounded.

[0036] Compared with the prior art, the present invention has the following advantages and beneficial effects:

[0037] (1) The present invention can divide the IEEE754 floating-point number into a variable number of blocks by means of storage iteration, and realize the block division of floating-point number by using multiple low-width data, which greatly saves storage resources and matches the needs of in-memory computing.

[0038] (2) Compared with the IEEE 754 floating-point system where the exponent bits store the scientific notation, this invention uses the exponent bits to represent the sequence number of the floating block containing the highest bit, which greatly improves the data storage range.

[0039] (3) Compared with the fixed-bit IEEE754 floating-point storage method, the present invention allows for customization of the number of bits in the floating block, which greatly improves the flexibility of calculation accuracy.

[0040] (4) This invention makes full use of the advantages of FPGA parallel data processing to achieve large-scale operations in the same time, avoids linear calculations, greatly improves the running speed and reduces the computational complexity.

Attached Figure Description

[0041] Figure 1 is an example block diagram of the floating-point parallel multiplication calculation proposed in this invention;

[0042] Figure 2 is a graph showing how the variance of the simulation error varies with the number of consecutive computations, for the floating-point parallelized multiplication operator proposed in this invention and for IEEE 754;

[0043] Figure 3 is a line graph showing the computation delay of the floating-point parallelized multiplication operator proposed in this invention under different numbers of blocks.

Detailed Implementation

[0044] The technical solution provided by the present invention will be described in detail below with reference to the accompanying drawings and specific examples. It should be understood that the following specific embodiments are only used to illustrate the present invention and are not intended to limit the scope of the present invention. After reading the present invention, any modifications of the present invention in various equivalent forms by those skilled in the art fall within the scope defined by the appended claims.

[0045] As shown in Figure 1, a method for parallelizing floating-point multiplication based on FPGA, wherein the floating-point numbers are in IEEE single-precision floating-point data format, includes the following steps:

[0046] Step 1: Extract the sign bits of the multiplicand A and the multiplier B, and perform an XOR operation on the sign bits to obtain the sign bit of the result value;

[0047] In this embodiment, the decimal representation of multiplicand A is 0.21321, and the decimal representation of multiplier B is 231.4314123. Based on the IEEE 754 floating-point representation of multiplicand A and multiplier B, the IEEE 754 sign bit is preserved, with 0 representing positive and 1 representing negative. In this embodiment, the sign bits of both multiplicand A and multiplier B are 0. Therefore, the sign bit 0 of the multiplication result can be obtained by XORing the sign bit 0 of A with the sign bit 0 of B.

[0048] Step 2: Extract the exponent bits of the floating-point number A (multiplicand) and the floating-point number B (multiplier), and add them together to obtain the exponent bits of the multiplication result.

[0049] The first floating block to the left of the decimal point is defined as index 0. The indexes of the floating blocks to its left increase sequentially, and the indexes of the floating blocks to its right decrease sequentially. The index of the most significant floating block is represented as:

[0050]

[0051] Where d represents the initial decimal data.

[0052] If the number of bits in the highest-order floating block is less than F, add leading zeros until the number of bits in the highest-order floating block is also equal to F.

[0053] The decimal result of the exponent is represented as:

[0054] $e_{754} = m_H + 2^{E-2} - 1$, where $m_H$ is the index of the most significant floating block.

[0055] The decimal result of the exponent, $e_{754}$, is converted to its binary representation $e_b$.

[0056] The floating block formed by combining the sign bit of the stored number with $e_b$ is referred to as the exponent block;

[0057] Based on the representation of multiplicand A and multiplier B according to IEEE 754 floating-point numbers, in this example, the number of bits E of the exponent block is set to 8. Combined with the sign bit, the sign bit is placed at the beginning of the exponent block. Then the exponent of multiplicand A can be represented as: [0, 0, 1, 1, 1, 1, 1, 1], and the exponent of multiplier B can be represented as: [0, 1, 0, 0, 0, 0, 0, 0];

[0058] After removing the sign bit, the exponent of A is [0, 1, 1, 1, 1, 1, 1], and the exponent of B is [1, 0, 0, 0, 0, 0, 0]. Therefore, after adding the exponents of the multiplicand A and the multiplier B, and adding an offset (carry) value, the exponent of the multiplication result is [1, 0, 0, 0, 0, 0, 0].
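The exponent blocks in this example can be reproduced with a short Python sketch. Two parts of it are assumptions rather than statements from the text. First, the expression for $m_H$ is not reproduced in the text; the sketch uses floor(log2|d| / F) + 1, chosen only because it yields the exponent bits listed for A and B. Second, the exponent of the product is formed by adding the two exponents and subtracting one copy of the bias $2^{E-2} - 1$, which matches the stated result [1, 0, 0, 0, 0, 0, 0].

```python
import math

def exponent_block(d: float, F: int = 8, E: int = 8):
    """Return (e_754, exponent block bits) for a non-zero number d.

    The exponent block is the sign bit followed by E-1 exponent bits (MSB first).
    """
    m_h = math.floor(math.log2(abs(d)) / F) + 1   # ASSUMED formula for m_H (not given in the text)
    e_754 = m_h + 2 ** (E - 2) - 1                # e_754 = m_H + 2^(E-2) - 1
    sign = 1 if d < 0 else 0
    bits = [sign] + [(e_754 >> (E - 2 - k)) & 1 for k in range(E - 1)]
    return e_754, bits

BIAS = 2 ** (8 - 2) - 1                           # 63 for E = 8

e_a, block_a = exponent_block(0.21321)
e_b, block_b = exponent_block(231.4314123)
e_product = e_a + e_b - BIAS                      # ASSUMED bias handling
print(block_a, block_b, format(e_product, "07b"))
```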

[0059] Step 3: Divide the mantissa of the floating-point number as defined by IEEE 754 into blocks to obtain several floating blocks with the same number of bits. The number of floating blocks is:

[0060] $m = \lceil m_0 / F \rceil$

[0061] where $\lceil \cdot \rceil$ denotes rounding up, $m_0$ represents the number of bits in the mantissa of IEEE 754, and F represents the number of bits in each floating block, which is a user-defined parameter.

[0062] The mantissa of the multiplicand A is represented as [A(1), A(2), ..., A(i), ..., A(m)], and the mantissa of the multiplier B is represented as [B(1), B(2), ..., B(j), ..., B(m)];

[0063] Step 4: Construct a multiplication pool to store the multiplication results of the mantissa. The multiplication pool is 2m long and consists of 2m addition pools. Each addition pool digests the floating blocks stored in it through fixed-point addition, and then constructs another floating block to store the digested result of this addition pool, i.e., it stores the final result after fixed-point addition of all floating blocks in this addition pool. Therefore, a total of 2m floating blocks are needed to store the multiplication results of the mantissa in the multiplication pool. The total number of bits of data contained in the multiplication pool is represented as 2F*m.

[0064] Based on the mantissa representation of multiplicand A and multiplier B in step 2, in this example, the number of bits F of the floating block is 8 bits, and the mantissa of multiplicand A is represented as [0, 0, 1, 1, 0, 1, 1, 0], [1, 0, 0, 1, 0, 1, 0, 0], [1, 1, 1, 0, 1, 1, 1, 0], [0, 0, 1, 1, 1, 0, 0, 1], [0, 0, 1, 0, 1, 1, 1, 0], [0, 0, 0, 1, 1, 1, 1, 0], [1, 1, 1, 1, 1, 0, 0, 0];

[0065] The mantissa of the multiplier B is represented as [1, 1, 1, 0, 0, 1, 1, 1], [0, 1, 1, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0, 0, 1], [0, 1, 0, 1, 1, 1], [1, 0, 0, 1, 1, 1, 1], [1, 0, 0, 1, 1, 0, 0, 0].
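The block values above can be compared against a small Python sketch that splits a number into F-bit floating blocks. It rests on two assumptions. The first is that the mantissa is the number scaled into a fraction below 1 by a power of $2^F$, whose base-$2^F$ digits are the blocks. The second is the same assumed expression for $m_H$, floor(log2|d| / F) + 1, which the text does not give. The function name is illustrative, and `Fraction` is used so that the digits of the stored double-precision value are exact.

```python
import math
from fractions import Fraction

def split_mantissa(d: float, F: int = 8, m: int = 7):
    """Return (m_H, [block_1, ..., block_m]) for a non-zero number d."""
    m_h = math.floor(math.log2(abs(d)) / F) + 1          # ASSUMED formula for m_H
    frac = Fraction(abs(d)) / Fraction(2 ** F) ** m_h    # exact value, 0 < frac < 1
    blocks = []
    for _ in range(m):
        frac *= 2 ** F
        block = int(frac)        # next base-2^F digit
        blocks.append(block)
        frac -= block
    return m_h, blocks

for d in (0.21321, 231.4314123):
    m_h, blocks = split_mantissa(d)
    print(m_h, [format(v, "08b") for v in blocks])
```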

[0066] Step 5, Fixed-point multiplication of floating blocks

[0067] For 1≤i, j≤7, calculate the fixed-point floating block multiplication result between floating block A(i) and floating block B(j). The lower eight bits of the multiplication result are placed in the (i+j)th addition pool of the multiplication pool, and the higher eight bits are placed in the (i+j-1)th addition pool of the multiplication pool. Utilizing the parallelization characteristics of the FPGA, multiple always statements are used to traverse all i and j, completing $m^2$ floating-block fixed-point multiplications in the same time.

[0068] Specifically, the calculation of fixed-point multiplication of floating blocks includes the following steps:

[0069] Step 5.1: For two floating blocks with the same number of bits, represent the F-bit fixed-point multiplicand floating block as a = [a(0), a(1), a(2)...a(F-1)], and the F-bit fixed-point multiplier floating block as b = [b(0), b(1), b(2)...b(F-1)]. Let $n_b$ denote the number of non-zero bits in the F-bit data of the fixed-point number b.

[0070] Step 5.2: For 0≤i≤F-1, if b(i)=1, shift the entire a left by (F-1-i) bits. For data exceeding the F-bit range, pad the data with zeros before it and store it in the high F bits. For data below the F-bit range, pad the data with zeros on the right and store it in the low F-bit register. If b(i)=0, do not process it and move to the next bit of b to continue the judgment.

[0071] Step 5.3: After traversing all the data in b, a total of $n_b$ high F-bit values and $n_b$ low F-bit values are obtained. After $2n_b - 2$ F-bit fixed-point addition operations, all the low F-bit data and high F-bit data are added together to obtain the floating block fixed-point multiplication result of the F-bit fixed-point multiplicand floating block a and the F-bit multiplier floating block b. This result is divided into low F-bit data and high F-bit data, thus completing the design of floating block fixed-point multiplication.

[0072] Step 6: For each addition pool within the multiplication pool, based on floating block fixed-point addition, the floating blocks stored in this addition pool are accumulated and digested. A new floating block is then constructed to store the digestion result of this addition pool, i.e., the final result after fixed-point addition of all floating blocks in this addition pool. There are a total of 2m addition pools in the multiplication pool. During the entire calculation process, a total of 2m*(m-1) F-bit fixed-point additions are required. Utilizing the parallelization characteristics of the FPGA, multiple always statements are used to complete the calculation of 2m*(m-1) floating block fixed-point additions simultaneously.

[0073] In this embodiment, the entire calculation process requires a total of 84 8-bit fixed-point additions, after which the multiplication pool holds 14 8-bit data items, one per addition pool.
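The count of 84 additions can be checked by tallying how many floating blocks Step 5 places in each addition pool. This Python sketch uses illustrative function names. It counts one addition for every block beyond the first in a pool and ignores any carry handling between pools, which the text does not quantify.

```python
from collections import Counter

def pool_occupancy(m: int) -> Counter:
    """Count how many F-bit values land in each addition pool (pools are numbered 1..2m)."""
    pools = Counter()
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            pools[i + j] += 1        # low F bits of A(i)*B(j)
            pools[i + j - 1] += 1    # high F bits of A(i)*B(j)
    return pools

def additions_needed(m: int) -> int:
    """Summing n values takes n - 1 additions, applied per pool."""
    return sum(n - 1 for n in pool_occupancy(m).values())

print(sorted(pool_occupancy(7).items()))
print(additions_needed(7))
```

The 2m² values spread over 2m non-empty pools give 2m² - 2m = 2m(m-1) additions, which is 84 for m = 7.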

[0074] Step 7: Since the number of mantissa blocks exceeds the specified storage length for mantissas during the calculation, the multiplication result is normalized. For the 14 floating blocks in the multiplication pool, if the highest-order floating block of the mantissa multiplication result is an all-zero floating block, the floating block is discarded, and the exponent is decremented by 1. Then, the remaining floating blocks in the multiplication pool are rounded, and only the first 7 floating blocks are retained. These 7 floating blocks are the complete mantissas in the multiplication result. Adding a floating block to store the sign and exponent bits before the mantissas yields the final multiplication result of the floating-point parallel multiplication operator. Converting this to decimal, it is 49.343491416483005, which perfectly matches the result of multiplying the multiplicand A (0.21321) and the multiplier B (231.4314123).
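Steps 1 to 7 can be put together in a sequential Python sketch that reproduces this example to within rounding. It is a software model under stated assumptions, not the FPGA design. The assumed parts are: the $m_H$ expression floor(log2|d| / F) + 1, which the text does not give; removal of one bias from the summed exponents; propagation of carries from each pool to the next more significant one; and truncation in place of the final rounding. All function names are illustrative.

```python
import math
from fractions import Fraction

F, M, BIAS = 8, 7, 63      # block width, blocks per mantissa, 2**(E-2) - 1 with E = 8
MASK = 2 ** F - 1

def encode(d: float):
    """Return (sign, exponent, blocks) for a non-zero float d."""
    m_h = math.floor(math.log2(abs(d)) / F) + 1           # ASSUMED formula for m_H
    frac = Fraction(abs(d)) / Fraction(2 ** F) ** m_h
    blocks = []
    for _ in range(M):
        frac *= 2 ** F
        blocks.append(int(frac))
        frac -= blocks[-1]
    return (1 if d < 0 else 0), m_h + BIAS, blocks

def multiply(x, y):
    (sx, ex, a), (sy, ey, b) = x, y
    sign = sx ^ sy                                        # Step 1: XOR of sign bits
    exponent = ex + ey - BIAS                             # Step 2: bias removal is ASSUMED
    pools = [[] for _ in range(2 * M + 1)]                # Step 4: pools 1..2m (index 0 unused)
    for i in range(1, M + 1):                             # Step 5: done in parallel on the FPGA
        for j in range(1, M + 1):
            p = a[i - 1] * b[j - 1]
            pools[i + j].append(p & MASK)                 # low F bits
            pools[i + j - 1].append(p >> F)               # high F bits
    result, carry = [0] * (2 * M + 1), 0
    for k in range(2 * M, 0, -1):                         # Step 6: carries passed upward (ASSUMED)
        total = sum(pools[k]) + carry
        result[k], carry = total & MASK, total >> F
    blocks = result[1:]
    if blocks[0] == 0:                                    # Step 7: drop an all-zero leading block
        blocks, exponent = blocks[1:], exponent - 1
    return sign, exponent, blocks[:M]                     # truncation instead of rounding

def decode(sign, exponent, blocks) -> float:
    frac = sum(Fraction(v, 2 ** (F * (k + 1))) for k, v in enumerate(blocks))
    return float((-1) ** sign * frac * Fraction(2 ** F) ** (exponent - BIAS))

product = decode(*multiply(encode(0.21321), encode(231.4314123)))
print(product, 0.21321 * 231.4314123)
```

Because the sketch truncates to seven blocks, its output may differ from the native double-precision product in the last bit.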

[0075] Figure 2 shows the variation of the variance of the simulated iteration error with the number of operations when performing continuous calculations using the floating-point parallelized multiplication operator proposed in this invention and the IEEE 754 multiplication method. For randomly generated multipliers and multiplicands, the data is always stored in 32-bit format. IEEE 754 does not use block partitioning; this invention divides the mantissa of IEEE 754 into 2, 4, 6, and 8 blocks for continuous calculations. It can be seen that the iteration error variance of both the multiplication operator and IEEE 754 increases linearly with the number of continuous calculations. The more floating blocks the mantissa of IEEE 754 is divided into in this invention, the smaller the iteration error variance becomes. Meanwhile, the iteration error variance of the floating-point parallelized multiplication operator proposed in this invention is always slightly larger than that of IEEE 754, but its order of magnitude is very small, around 10 to the power of -18, and its impact is negligible. This invention significantly improves the calculation speed while having almost no impact on the iteration error variance of continuous calculations.

[0076] Figure 3 is a line graph showing the computation latency of the proposed floating-point parallelized multiplication operator as a function of the number of blocks during multiplication. The number of blocks ranges from 1 to 8. The horizontal line represents the reference computation latency using the IEEE 754 standard, while the dashed lines (represented by triangles) represent the computation latencies obtained using this invention. Data is always stored in 32-bit format. It can be seen that the computation latency of the proposed multiplication operator decreases as the number of blocks increases, indicating that after multiple block divisions for parallel data processing, the computational complexity of this invention is reduced and remains consistently lower than that of the IEEE 754 standard.

Claims

1. A method for parallelizing floating-point multiplication operations based on FPGA, characterized in that it includes the following steps: Step 1: Extract the sign bits of the multiplicand A and the multiplier B, and perform an XOR operation on the sign bits to obtain the sign bit of the result value; Step 2: Extract the exponent bits of the floating-point number A (multiplicand) and the floating-point number B (multiplier), and add them together to obtain the exponent bits of the multiplication result; Step 3: Divide the mantissas of the floating-point multiplicand A and multiplier B into separate blocks to obtain the number m of floating blocks, as shown below: $m = \lceil m_0 / F \rceil$, where $\lceil \cdot \rceil$ denotes rounding up, $m_0$ represents the number of bits in the mantissa of IEEE 754, and F represents the number of bits in each floating block, which is a user-defined parameter; Step 4: Construct a multiplication pool for fixed-point multiplication of the mantissa. The multiplication pool is 2m long and consists of 2m addition pools. Each addition pool digests the floating blocks stored in the addition pool through fixed-point addition, and another floating block is constructed to store the digestion result of this addition pool, that is, to store the final result after fixed-point addition of all floating blocks in the addition pool; Step 5: For all 1≤i,j≤m, calculate the product of A(i) and B(j). Place the lower F bits of the multiplication result into the (i+j)th addition pool of the multiplication pool, and place the higher F bits of the multiplication result into the (i+j-1)th addition pool of the multiplication pool. All i,j are traversed, completing $m^2$ floating-block fixed-point multiplications simultaneously; Step 6: Based on the fixed-point addition of floating blocks, calculate the cumulative result of the floating blocks stored in each addition pool, and store the result in the floating block constructed by this addition pool; Step 7: Normalize the multiplication result; retain the first m floating blocks, which are all the mantissa bits in the multiplication result; add a floating block composed of the sign bit and exponent bit before the floating block corresponding to the mantissa bits to obtain the final multiplication result of the floating-point parallel multiplication operator.

2. The FPGA-based floating-point parallel multiplication method according to claim 1, characterized in that Step 2 extracts the exponent bits of the floating-point number A (multiplicand) and the floating-point number B (multiplier), and adds them together to obtain the exponent bits of the multiplication result; specifically: The first floating block to the left of the decimal point is defined as index 0. The indexes of the floating blocks to its left increase sequentially, and the indexes of the floating blocks to its right decrease sequentially. The index of the most significant floating block is represented as: Where d represents the initial decimal data; If the number of bits in the highest-order floating block is less than F, pad with 0s in front until the number of bits in the highest-order floating block is also equal to F; The decimal result of the exponent is represented as: $e_{754} = m_H + 2^{E-2} - 1$. The decimal result of the exponent, $e_{754}$, is converted to its binary representation $e_b$; The floating block formed by combining the sign bit of the stored number with $e_b$ is referred to as the exponent block; The exponents of the multiplicand A and the multiplier B are added together after removing their sign bits to obtain the exponent of the multiplication result.

3. The FPGA-based parallel floating-point multiplication method according to claim 1, characterized in that, In step 3, floating blocks are used to represent the mantissa as follows: The mantissa of the multiplicand A is represented as [A(1),A(2),…,A(i),…,A(m)], and the mantissa of the multiplier B is represented as [B(1),B(2),…,B(j),…,B(m)].

4. The FPGA-based floating-point parallel multiplication method according to claim 1, characterized in that the calculation of the fixed-point multiplication of floating blocks in step 5 specifically includes the following steps: Step 5.1: For two floating blocks with the same number of bits, represent the F-bit fixed-point multiplicand as a = [a(0), a(1), a(2)...a(F-1)], and the F-bit fixed-point multiplier as b = [b(0), b(1), b(2)...b(F-1)]; let $n_b$ denote the number of non-zero bits in the F-bit data of the fixed-point multiplier b; Step 5.2: For 0≤i≤F-1, if b(i)=1, shift a to the left by (F-1-i) bits. For data that exceeds the range of F bits, pad with zeros in front of the data and store it in the high F bits. For data that is lower than F bits, pad with zeros on the right and store it in the low F bits register. If b(i)=0, do not process it and move to the next bit of b to continue the judgment. Step 5.3: After traversing all the data in b, a total of $n_b$ high F-bit values and $n_b$ low F-bit values are obtained. Using floating block fixed-point addition, after $2n_b - 2$ F-bit fixed-point addition operations, all the lower F-bit data and the higher F-bit data are added together to obtain the multiplication result of the F-bit fixed-point multiplicand a and the F-bit multiplier b.

5. The FPGA-based parallel floating-point multiplication method according to claim 1, characterized in that, The fixed-point addition of floating blocks in the addition pool mentioned in step 6 specifically includes the following: First, ensure that the lengths of the floating blocks to be added are the same. After aligning all the floating blocks, preset a carry storage variable to 0. Then, starting from the least significant bit of the determined operation length, add them sequentially, with each addition resulting in 0, 1, 2, or 3. Save the operation result and update the carry variable each time until the operation of the most significant bit of the determined operation length is completed. Finally, return the carry of the calculation result and the addition result of the two floating blocks.

6. The FPGA-based floating-point parallel multiplication method according to claim 1, characterized in that, The normalization of the multiplication result described in step 7 refers to the following: for the 2m floating blocks in the multiplication pool, if the floating block with the highest bit of the multiplication result in the mantissa is a floating block with all zeros, then the floating block is discarded, the exponent is decremented by 1, and then the remaining floating blocks in the multiplication pool are rounded.