System and method for floating point vector operations for neural networks and convolutions
By checking the range of the operands of each floating-point multiplication in hardware and converting the products directly to fixed-point integers, the inefficiency of floating-point vector dot product calculation is addressed, resulting in faster processing, lower power consumption, and better semiconductor utilization.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- EXPEDERA INC
- Filing Date
- 2024-10-10
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, computing the dot product of floating-point vectors is computationally and power-intensive and requires a large semiconductor area, resulting in low processing speed and poor power efficiency.
By checking the range of the elements in hardware before performing floating-point multiplication, multiplications whose results would fall outside the range of the fixed-point result are avoided, and the hardware products are converted directly to fixed-point integers, reducing unnecessary computational steps and power consumption. Addition is performed using a fixed-point accumulator.
It improves computing speed, reduces semiconductor area and power consumption, and achieves more efficient semiconductor utilization.
Figure CN122249804A_ABST
Description
Technical Field
[0001] This application relates to the field of application-specific semiconductor circuits and electronic hardware, providing apparatus and methods for improving the efficiency of tensor operations (including vector dot products). Vector dot products form the basis of tensor operations, including convolution and computation within neural networks. Tensor multiplication comprises multiple vector dot product operations. The apparatus is designed for semiconductor implementations that utilize semiconductor area, power, and processing speed more efficiently. These operations are commonly used in image processing convolution functions and neural networks.
Background
[0002] The inclusion of any method in this section should not be construed as an admission that it is prior art. The calculation of the dot product of two floating-point vectors, when conforming to IEEE standards, can be computationally and power-intensive, requiring significant semiconductor area. What is needed are methods and digital semiconductor structures for improving the processing speed, power efficiency, and semiconductor utilization of hardware that performs floating-point dot products, including applications such as convolutions and neural networks.
Summary of the Invention
[0003] This summary is provided to introduce a selection of concepts in a simplified form, which are further described in the detailed description below. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0004] In one aspect of the invention, a method for generating the dot product of two floating-point vectors in hardware is disclosed. The method includes retrieving elements from a first vector and elements from a second vector from memory. In digital semiconductor hardware, floating-point multiplication is performed on the corresponding elements from the first and second vectors. The resulting floating-point product is then converted to a fixed-point representation of the floating-point product.
[0005] In one embodiment, before the floating-point multiplication is performed, the multiplicand and multiplier elements are checked to determine whether the resulting product would fall outside the range that can be converted to the fixed-point result. If it would, the multiplication is not performed, saving both computation time and the power consumed by switching gates during the multiplication operation. This range check can be performed by examining the exponents of the elements being multiplied.
[0006] Next, the hardware product is converted to a fixed-point integer having a predetermined number of bits representing the integer value and a predetermined number of bits representing the fractional value. Optionally, this step is skipped if the product is out of range.
[0007] Next, the hardware accumulator adds the converted fixed-point integer to the accumulated value.
Description of the Drawings
[0009] Exemplary embodiments are illustrated by way of example and are not limited to the accompanying drawings, wherein similar reference numerals indicate similar elements.
[0010] Figure 1 shows an example of the bit organization of a 16-bit floating-point number.
[0011] Figure 2 shows examples of fixed-point numbers.
[0012] Figure 3 is a flowchart of a method for efficiently performing the dot product of two vectors.
[0013] Figure 4 is an example of a system used to perform the dot product of floating-point vectors.
[0014] Figure 5 is a diagram of the weighted sum of a neuron, performed as a floating-point dot product.
Detailed Description
[0015] The following detailed description includes references to the accompanying drawings, which form an integral part of the detailed description. The drawings show illustrations in accordance with exemplary embodiments. These exemplary embodiments (also referred to herein as “examples”) are described in enough detail to enable those skilled in the art to practice the subject matter. Embodiments may be combined, other embodiments may be utilized, or structural, functional, logical, and electrical changes may be made without departing from the scope of the claims. Therefore, the following detailed description should not be considered limiting, and its scope is defined by the appended claims and their equivalents.
[0016] Data representation in computer memory can vary greatly. In floating-point (scientific) notation, $12.5 = 1.25 \times 10^{1} = 12.5000$, but the same value can be represented in different ways in computer memory, including IEEE-compliant floating-point or integer representations. This matters because the representation determines the space required to store the number and the speed at which multiplication and addition can be performed.
[0017] For example, but not limited to, two representations of data include FP-16 (floating-point 16) and 16-bit representation. Other representations of floating-point numbers include FP-32 and FP-64.
[0018] Referring to Figure 1, the organization 100 of an FP-16 number is shown. There is a sign bit 110, indicating a positive or negative sign. Five bits 120 are used for the exponent, which is the power to which 2 is raised in scientific notation. Ten bits 130 are used for the mantissa, which holds the significant digits of the fraction. In the IEEE format, the mantissa is preceded by an implicit leading "1".
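The field layout of Figure 1 can be sketched in software. The following is a minimal illustration, not the disclosed hardware: it relies on Python's `struct` format code `e` (IEEE half precision) and on the IEEE exponent bias of 15, which the description does not state. The function names `fp16_fields` and `fp16_value` are hypothetical.

```python
import struct


def fp16_fields(value: float) -> tuple[int, int, int]:
    """Round value to IEEE half precision and return (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    sign = bits >> 15                # 1 sign bit (110 in Figure 1)
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits (120), stored with a bias of 15
    mantissa = bits & 0x3FF          # 10 mantissa bits (130); the leading 1 is implicit
    return sign, exponent, mantissa


def fp16_value(sign: int, exponent: int, mantissa: int) -> float:
    """Rebuild the value of a normal number (stored exponent 1..30) from its fields."""
    return (-1.0) ** sign * (1 + mantissa / 1024) * 2.0 ** (exponent - 15)


# 12.5 = 1.5625 * 2**3, so the stored exponent is 3 + 15 and the mantissa is 0.5625 * 1024.
print(fp16_fields(12.5))
```

Zero, subnormal, infinity, and NaN encodings are deliberately left out of `fp16_value` to keep the sketch short.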
[0019] The processing steps required to perform floating-point multiplication are simple. As shown below, the exponents are added together and the mantissas are multiplied.
[0020] Multiplication:
[0021] $0.0000004 \times 3{,}250{,}000{,}000$
[0022] $= (4 \times 10^{-7})(3.25 \times 10^{9})$
[0023] $= (4 \times 3.25)(10^{-7} \times 10^{9})$
[0024] $= 13 \times 10^{2}$
[0025] $= 1300$
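The same decomposition applies in base 2. As a rough sketch, assuming ordinary Python floats rather than FP-16 hardware, `math.frexp` splits a number into a mantissa in [0.5, 1) and a power-of-two exponent, and `math.ldexp` recombines them. The function name `multiply_by_parts` is hypothetical.

```python
import math


def multiply_by_parts(a: float, b: float) -> float:
    """Multiply two floats by multiplying mantissas and adding exponents."""
    mantissa_a, exponent_a = math.frexp(a)   # a == mantissa_a * 2**exponent_a
    mantissa_b, exponent_b = math.frexp(b)   # b == mantissa_b * 2**exponent_b
    # One multiplication for the mantissas, one addition for the exponents.
    return math.ldexp(mantissa_a * mantissa_b, exponent_a + exponent_b)


print(multiply_by_parts(0.0000004, 3_250_000_000.0))
```

Because 0.0000004 has no exact binary representation, the printed value is only approximately 1300.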
[0026] The method used for floating-point multiplication does not need to be changed. However, floating-point addition is more complex. Before two numbers can be added, the mantissa of one operand must be shifted according to the difference between the exponents so that both operands share the same exponent, which slows down the addition.
[0027] Another optimization that can be implemented when generating floating-point vector dot products in semiconductor hardware is to reduce the steps required for floating-point multiplication. Typically, a floating-point multiplication circuit consists of a mantissa multiplier and an exponent adder, followed by shift, rounding, and normalization logic. When the result of a floating-point multiplication is going to be converted to a fixed-point representation, some of the rounding and normalization operations are largely unnecessary. Instead, the intermediate result produced by the mantissa multiplier and the exponent adder is converted directly to a fixed-point representation. This involves comparing the exponents, after which shift logic shifts the multiplier result to the correct binary point position.
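The difference between the two kinds of addition can be illustrated as follows. This is a simplified sketch with hypothetical function names: numbers are modeled as an integer mantissa times a power of two, and the operand with the larger exponent is shifted left so that the result stays exact, whereas real floating-point hardware usually shifts the smaller operand right and then rounds and renormalizes.

```python
def add_with_alignment(m_a: int, e_a: int, m_b: int, e_b: int) -> tuple[int, int]:
    """Add m_a * 2**e_a and m_b * 2**e_b; return (mantissa, exponent) of the sum."""
    common = min(e_a, e_b)
    # Alignment step: both mantissas are shifted to the common exponent before adding.
    return (m_a << (e_a - common)) + (m_b << (e_b - common)), common


def add_fixed(a_raw: int, b_raw: int) -> int:
    """Add two fixed-point values that share one binary point: a single integer add."""
    return a_raw + b_raw


print(add_with_alignment(3, 2, 5, 0))   # 3 * 2**2 + 5 * 2**0
print(add_fixed(960, -128))             # 3.75 + (-0.5) with 8 fractional bits
```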
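One way to picture this shortcut is a software model that multiplies the integer significands, adds the exponents, and shifts the raw product straight into a fixed-point format, without first rounding it back to a floating-point number. The sketch below makes several assumptions that are not in the description: FP-16 inputs, 8 fractional bits, truncation toward zero, and no handling of infinity or NaN. The names `fp16_parts` and `product_to_fixed` are hypothetical.

```python
import struct

FRAC_BITS = 8  # assumed number of fractional bits in the fixed-point result


def fp16_parts(value: float) -> tuple[int, int, int]:
    """Return (sign, significand, exponent) with |value| == significand * 2**exponent."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    sign = bits >> 15
    biased = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if biased == 0:                       # zero or subnormal: no implicit leading 1
        return sign, fraction, -24
    return sign, fraction | 0x400, biased - 15 - 10


def product_to_fixed(a: float, b: float, frac_bits: int = FRAC_BITS) -> int:
    """Multiply two FP-16 values and return the product as a raw fixed-point integer."""
    sign_a, sig_a, exp_a = fp16_parts(a)
    sign_b, sig_b, exp_b = fp16_parts(b)
    raw = sig_a * sig_b                   # mantissa multiplier output, not normalized
    shift = exp_a + exp_b + frac_bits     # exponent adder output selects the shift
    magnitude = raw << shift if shift >= 0 else raw >> -shift
    return -magnitude if sign_a ^ sign_b else magnitude


print(product_to_fixed(1.5, 2.5))         # 3.75 scaled by 2**8
```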
[0028] Flowchart 300 of Figure 3 illustrates a computationally efficient process for performing the dot product of two floating-point vectors. This efficiency includes faster computation speed, less semiconductor area required, and less power consumption.
[0029] In step 310, the floating-point elements of the first vector and the elements of the second vector are received. These are the elements for which the dot product calculation is to be performed efficiently in hardware.
[0030] In optional step 320, a check is performed to determine whether the product of the corresponding elements would fall outside the limits of the hardware accumulator. This can be done by checking the exponents of the two floating-point numbers to be multiplied. For example, the accumulator can only represent magnitudes as small as its allocated fractional bits allow: $2^{-8}$ can be the smallest number in a 16-bit accumulator, or $2^{-16}$ can be the smallest number in a 32-bit accumulator. Therefore, if the product of the two elements would be smaller than this value, performing the multiplication and accumulation wastes processing time and power, because it would have no effect on the final result in the accumulator.
[0031] Furthermore, if the product of the corresponding elements would exceed the maximum value of the accumulator, the accumulator can be set to its maximum value. This check can also be performed by examining the exponents of the two floating-point numbers to be multiplied. Again, this saves power and time compared to performing a full multiplication.
[0032] If the product is not within the range of the accumulator, the process returns to step 310 to check the next pair of elements to be multiplied in the dot product.
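The exponent-only check of step 320 might look like the following sketch. It assumes a signed 16-bit accumulator with 7 integer bits and 8 fractional bits and uses `math.frexp` on Python floats to stand in for reading the exponent field. The names `Decision` and `range_check` are hypothetical. The test is conservative: a pair that passes may still produce a product slightly outside the range.

```python
import math
from enum import Enum


class Decision(Enum):
    SKIP = "skip"            # product too small to change the accumulator
    SATURATE = "saturate"    # product certain to exceed the accumulator range
    MULTIPLY = "multiply"    # product may be in range, so perform the multiplication


def range_check(a: float, b: float, int_bits: int = 7, frac_bits: int = 8) -> Decision:
    """Decide from the exponents alone whether a * b is worth computing."""
    if a == 0.0 or b == 0.0:
        return Decision.SKIP
    exp_sum = math.frexp(a)[1] + math.frexp(b)[1]
    # frexp mantissas lie in [0.5, 1), so 2**(exp_sum - 2) <= |a * b| < 2**exp_sum.
    if exp_sum <= -frac_bits:
        return Decision.SKIP
    if exp_sum - 2 >= int_bits:
        return Decision.SATURATE
    return Decision.MULTIPLY


print(range_check(2.0 ** -5, 2.0 ** -5))   # product 2**-10 is below 2**-8
print(range_check(64.0, 64.0))             # product 4096 is above the 7-bit integer range
```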
[0033] In step 330, floating-point multiplication of the corresponding elements is performed.
[0034] In step 340, the floating-point multiplication result is converted to a fixed-point integer that is compatible with the configuration of the integer and fractional parts of the fixed-point integer accumulator.
[0035] In step 350, the transformed fixed-point multiplication result is added to the accumulator. The process continues at step 310, where the next pair of corresponding vector elements to be multiplied and accumulated is processed. Once the end of the vector is reached, the process ends.
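Steps 310 through 350 can be put together in a small software model. This is only a sketch of the flow under stated assumptions: a signed 32-bit accumulator with 15 integer bits and 16 fractional bits, truncation toward zero, and sign-aware saturation (the description mentions only setting the accumulator to its maximum value). The names `dot_fixed`, `to_fixed`, `ACC_MAX`, and `ACC_MIN` are hypothetical.

```python
import math

INT_BITS, FRAC_BITS = 15, 16                     # assumed accumulator layout
ACC_MAX = (1 << (INT_BITS + FRAC_BITS)) - 1
ACC_MIN = -(1 << (INT_BITS + FRAC_BITS))


def to_fixed(x: float) -> int:
    """Convert a float to a raw fixed-point integer, truncating toward zero."""
    return int(x * (1 << FRAC_BITS))


def dot_fixed(xs: list[float], ws: list[float]) -> int:
    """Return the dot product of xs and ws as a raw fixed-point accumulator value."""
    acc = 0
    for x, w in zip(xs, ws):                     # step 310: next pair of elements
        if x == 0.0 or w == 0.0:
            continue
        exp_sum = math.frexp(x)[1] + math.frexp(w)[1]
        if exp_sum <= -FRAC_BITS:                # step 320: too small, skip the multiply
            continue
        if exp_sum - 2 >= INT_BITS:              # step 320: too large, saturate instead
            acc = ACC_MAX if (x > 0) == (w > 0) else ACC_MIN
            continue
        product = x * w                          # step 330: floating-point multiply
        acc += to_fixed(product)                 # steps 340 and 350: convert, accumulate
        acc = max(ACC_MIN, min(ACC_MAX, acc))    # keep the sum inside the accumulator
    return acc


raw = dot_fixed([1.5, 2.0, 2.0 ** -20], [2.5, -0.25, 2.0 ** -20])
print(raw, raw / (1 << FRAC_BITS))
```

In this example the third pair is skipped by the exponent check, so only two multiplications are performed.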
[0036] Not shown in the diagram, the process may include converting the resulting accumulation from a fixed-point number back to a floating-point number via a hardware implementation.
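A software analogue of that final conversion is sketched below, assuming 16 fractional bits and an FP-16 output. The function name `fixed_to_fp16` is hypothetical, and `struct.pack` with format `e` raises `OverflowError` for values beyond the half-precision range, which this sketch does not handle.

```python
import struct


def fixed_to_fp16(acc: int, frac_bits: int = 16) -> float:
    """Scale a raw fixed-point accumulator value and round it to IEEE half precision."""
    value = acc / (1 << frac_bits)                            # undo the fixed-point scaling
    (rounded,) = struct.unpack("<e", struct.pack("<e", value))
    return rounded


print(fixed_to_fp16(212992))   # raw value of 3.25 with 16 fractional bits
```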
[0037] The system block diagram 400 of Figure 4 shows semiconductor logic blocks configured to perform efficient floating-point vector dot products in custom digital semiconductor hardware. The system may include a processor 470, a sequencer 410, a memory unit 420, optional test logic 430 for determining whether multiplication and accumulation operations should be performed, computational logic including one or more floating-point multipliers 435, one or more semiconductor logic tests 440, one or more floating-point to fixed-point converters 450, and one or more accumulators 460.
[0038] Processor 470 provides high-level control to enable various applications that involve generating floating-point vector dot products. These applications may include (but are not limited to) convolutions and neural networks. Processor 470 may be a digital signal processor, a microprocessor, a neural processor, or other custom computational logic suitable for the aforementioned functions.
[0039] Sequencer 410 includes the microelectronics required to control the data flow from memory to the logic processing components, including (but not limited to) logic 430 for determining whether floating-point multiplication and accumulation should be performed, one or more floating-point multipliers 435, one or more semiconductor logic tests 440, one or more semiconductor conversion logic blocks 450 for floating-point to fixed-point conversion, and one or more accumulators 460. Sequencer 410 also controls the data flow to and from memory unit 420 and controls the data flow and processing sequence through logic blocks 430, 440, 450, and 460. A person of ordinary skill in the art (POSITA) of digital semiconductors will know how to design sequencers and associated hardware to control the data flow between memory and associated hardware to perform dot product calculations.
[0040] Memory unit 420 may contain multiple memory blocks to support applications suitable for parallel processing. These applications may include image convolution processing or neural nodes in a neural network.
[0041] The floating-point multiplier 430 can be a semiconductor implementation of IEEE-compliant floating-point multiplication. The multiplier can operate on 16-bit, 32-bit, 64-bit, or wider floating-point numbers. The multiplier may implement the optimization described above, which eliminates unnecessary steps when generating fixed-point numbers from floating-point numbers. Hardware logic 440 for determining whether floating-point multiplication and accumulation should be performed can be replicated for each multiplier 430. The result is fed back to the sequencer to control the process.
[0042] Each multiplier can feed one or more semiconductor conversion logic blocks 450 for floating-point to fixed-point conversion.
[0043] One or more accumulators 460 are implemented in semiconductor hardware. The accumulators can be of any size, but are preferably at least 16 bits. The binary point can be located at any bit position.
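To make the effect of accumulator width and binary point position concrete, here is a small software model. It is a sketch only: the class name `FixedAccumulator` and its fields are hypothetical, the values are treated as signed two's-complement, and saturating on overflow is one possible policy chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FixedAccumulator:
    total_bits: int = 16   # overall width, sign bit included
    frac_bits: int = 8     # position of the binary point
    raw: int = 0           # accumulated value as a raw integer

    @property
    def max_raw(self) -> int:
        return (1 << (self.total_bits - 1)) - 1

    @property
    def min_raw(self) -> int:
        return -(1 << (self.total_bits - 1))

    def add(self, raw_value: int) -> None:
        """Add a raw fixed-point value, clamping to the representable range."""
        self.raw = max(self.min_raw, min(self.max_raw, self.raw + raw_value))

    def value(self) -> float:
        """Return the accumulated value as a real number."""
        return self.raw / (1 << self.frac_bits)


acc = FixedAccumulator(total_bits=16, frac_bits=8)
acc.add(960)                       # 3.75 with 8 fractional bits
print(acc.value(), acc.max_raw / 256)
```

Moving the binary point trades range for resolution: with 16 total bits, 8 fractional bits give steps of 2**-8 and a maximum just under 128.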
[0044] Referring to Figure 5, a node 500 that can reside within a hardware implementation of a neural network is shown. The node performs a dot product between the input vector 510 and the weight vector 520. The result of each multiplication, for example x1·w1, is summed by accumulator 530. The resulting output 540 can be a floating-point number or a fixed-point number.
[0045] The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the appended claims are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of this technology has been presented for purposes of illustration and description and is not intended to be exhaustive or limited to the forms disclosed. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of this technology. Exemplary embodiments were chosen and described to best explain the principles of this technology and its practical application, and to enable others of ordinary skill in the art to understand this technology for various embodiments with various modifications as are suited to the particular use contemplated.
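The weighted sum of node 500 can be written compactly. In this sketch, the symbols $n$, $f$, $Q_f$, $A$, and $\hat{y}$ are notation assumed here rather than taken from the figures: $x_i$ and $w_i$ are the elements of input vector 510 and weight vector 520, $f$ is the number of fractional bits in accumulator 530, and truncation is only one possible conversion rule.

```latex
\begin{align}
y &= \sum_{i=1}^{n} x_i w_i, \\
A &= \sum_{i=1}^{n} Q_f(x_i w_i), \qquad Q_f(p) = \operatorname{trunc}\!\left(p \cdot 2^{f}\right), \\
\hat{y} &= A \cdot 2^{-f} \approx y.
\end{align}
```

Here $y$ is the exact weighted sum, $A$ is the raw integer held in the accumulator, and $\hat{y}$ is the fixed-point approximation of output 540.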
[0046] The foregoing description, with reference to flowchart illustrations and / or block diagrams of methods and apparatus (systems) according to embodiments of the present technology, describes aspects of the present technology.
[0047] The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present technology. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code, including one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions noted in a block may occur out of the order shown in the figures. For example, depending on the functionality involved, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented by dedicated hardware-based systems, or combinations of dedicated hardware, that perform the specified functions or actions.
[0048] In the following description, specific details, such as particular embodiments, processes, techniques, etc., are set forth for purposes of explanation and not limitation, in order to provide a thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention may be practiced in other embodiments departing from these specific details.
[0049] Throughout this specification, references to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Therefore, the phrases "in one embodiment," "in an embodiment," or "according to one embodiment" (or other phrases with similar meanings) appearing in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Additionally, depending on the context discussed herein, singular terms may include their plural forms, and plural terms may include their singular forms. Similarly, hyphenated terms (e.g., "on-demand") may occasionally be used interchangeably with their unhyphenated versions (e.g., "on demand"), capitalized entries (e.g., "Software") may be used interchangeably with their non-capitalized versions (e.g., "software"), plural terms may be indicated with or without an apostrophe (e.g., PE's or PEs), and italicized terms (e.g., "N+1") may be used interchangeably with their non-italicized versions (e.g., "N+1"). Such occasional interchangeable uses should not be considered inconsistent with each other.
[0050] Furthermore, some embodiments may be described in terms of "means for" performing a task or set of tasks. It should be understood that a "means for" may be expressed herein in terms of a structure, such as a processor, a memory, an I / O device such as a camera, or combinations thereof. Alternatively, the "means for" may include an algorithm that is descriptive of a function or method step, while in other embodiments the "means for" is expressed in terms of a mathematical formula, prose, a flowchart, or a signal diagram.
[0051] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used herein, unless the context clearly indicates otherwise, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well. It should be further understood that the terms “comprises” and / or “comprising,” when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof.
[0052] It should be noted that the terms “coupled,” “connected,” “connecting,” “electrically connected,” etc., are used interchangeably herein and generally refer to the state of being electrically / electronically connected. Similarly, a first entity is considered to be “communicating” with a second entity (or entities) when the first entity electrically sends and / or receives (whether via wired or wireless means) information signals (whether containing data information or non-data / control information) to the second entity, regardless of the type of these signals (analog or digital). It should be further noted that the various diagrams (including component diagrams) shown and discussed herein are for illustrative purposes only and are not drawn to scale.
[0053] If any disclosure is incorporated herein by reference and conflicts in whole or in part with this disclosure, then to the extent of the conflict, and / or the broader disclosure, and / or the broader definition of terms, this disclosure shall prevail. If any such incorporated disclosures conflict in whole or in part with each other, then to the extent of the conflict, the later disclosure shall prevail.
[0054] While various embodiments have been described above, it should be understood that they are presented by way of example only and not by way of limitation. The description is not intended to limit the scope of the invention to the specific forms set forth herein. Rather, this description is intended to cover such alternatives, modifications, and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims, and otherwise understood by one of ordinary skill in the art. Therefore, the breadth and scope of the preferred embodiments should not be limited by any of the exemplary embodiments described above.
Claims
1. A method for generating the dot product of two vectors, the method comprising: retrieving floating-point elements of a first vector and elements of a second vector from memory; performing, in hardware, floating-point multiplication of the corresponding array elements of the first vector and the second vector to generate a floating-point product; converting, in hardware, the floating-point product to a fixed-point integer, the fixed-point integer including multiple bits representing a binary integer value and multiple bits representing a fractional binary integer value; and accumulating the fixed-point integer using a hardware integer accumulator.
2. The method according to claim 1, further comprising: checking, in hardware, whether the floating-point product would exceed the range of the integer accumulator, and if it would, not performing the floating-point multiplication, the conversion, and the fixed-point accumulation.
3. The method of claim 2, wherein the check for each floating-point multiplication is performed by examining the floating-point exponents of the input tensor value and the kernel tensor value.
4. The method of claim 3, wherein the floating-point product range check includes whether the result is higher or lower than the range of the integer accumulator.
5. The method of claim 1, further comprising: If the accumulation overflows the accumulator, then an accumulator with a larger range is used.
6. The method of claim 1, wherein the fixed-point integer is between sixteen bits and sixty-four bits.
7. The method of claim 6, wherein half of the bits are used to represent the fractional binary value.
8. The method of claim 1, wherein one of the two vectors is an input to a layer of the neural network and the other of the two vectors is a weight vector of the neural network, and the dot product generates an input to another layer of the neural network or an output of the neural network.
9. The method of claim 1, wherein the floating-point multiplier and integer accumulator are implemented as part of a semiconductor circuit.
10. A hardware system for generating the dot product of two tensors, the hardware system comprising: a hardware floating-point multiplier; a hardware integer accumulator; and a sequencer configured to generate a dot product of a first tensor and a second tensor, the tensor dot product comprising multiple vector multiplications and the accumulation of multiple vector dot products, each vector dot product comprising an array of elements, a first vector and a second vector, including... the sequencer being configured to access the floating-point elements of the first vector and the second vector and to perform the following: multiplying the corresponding array elements of the first vector and the second vector to generate a plurality of floating-point products; converting, in hardware, the plurality of floating-point products into a plurality of fixed-point integers, wherein the fixed-point integers include a plurality of bits representing binary integer values and a plurality of bits representing fractional binary integer values; and determining the dot product of the binary integer vectors by accumulating the plurality of fixed-point integers using the integer accumulator.
11. The hardware system according to claim 10, wherein the plurality of floating-point products are checked to determine whether each product exceeds the range of the integer accumulator, and wherein for any floating-point product that exceeds the range, the floating-point multiplication, the conversion, and the fixed-point accumulation are not performed.
12. The hardware system of claim 11, wherein the check for each floating-point multiplication is performed by examining the floating-point exponents of the input tensor value and the kernel tensor value.
13. The hardware system of claim 12, wherein the floating-point product range check includes whether the result is higher or lower than the range of the integer accumulator.
14. The hardware system of claim 10, further comprising: If the accumulation overflows the accumulator, then an accumulator with a larger range is used.
15. The hardware system of claim 10, wherein the fixed-point integer is between sixteen bits and sixty-four bits.
16. The hardware system of claim 15, wherein half of the bits are used to represent the fractional binary value.
17. The hardware system of claim 10, wherein one of the two vectors is an input to a layer of the neural network and the other of the two vectors is a weight vector of the neural network, and the dot product generates an input to another layer of the neural network or an output of the neural network.
18. The hardware system of claim 10, wherein the floating-point multiplier and integer accumulator are implemented as part of a semiconductor circuit.