Softmax operation device for low power and acceleration

The softmax computing device addresses the time and power challenges of softmax operations in neural networks by employing an optimized circuit structure for high-speed, low-power computation using logic circuits and adaptive thresholding.

WO2026127369A1 · PCT designated stage · Publication Date: 2026-06-18 · INDUSTRY UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
INDUSTRY UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY
Filing Date
2025-10-31
Publication Date
2026-06-18

AI Technical Summary

Technical Problem

Softmax operations in neural networks, particularly in large language models, consume significant time and power as sequence lengths increase, so they need to be accelerated and made more power-efficient.

Method used

A softmax computing device with an optimized circuit structure comprising a selection circuit, a comparison circuit, a one-hot encoding circuit, and a softmax operation circuit. It performs an approximated softmax operation using logic circuits, without a processor, and sets the threshold adaptively based on the model structure and the dimension of the input data.

Benefits of technology

Enables high-speed softmax computation with reduced power consumption by approximating softmax operations through circuit-based methods, thereby accelerating and simplifying the process.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure KR2025017696_18062026_PF_FP_ABST
Patent Text Reader

Abstract

Disclosed is a softmax operation device for low power and acceleration. The device comprises: a selection circuit for selecting the largest value and the second largest value among the elements of an input vector; a comparison circuit for comparing a difference value between the largest value and the second largest value with a preset threshold value and outputting a comparison result; a one-hot encoding circuit which performs one-hot encoding, setting the element corresponding to the largest value of the input vector to 1 and the remaining elements to 0, if the difference value is equal to or greater than the preset threshold value; and a softmax operation circuit which performs a softmax operation on the input vector if the difference value is equal to or less than the preset threshold value. The softmax operation circuit includes: a first shift register for shifting each element (Si) of the input vector to the right and outputting 0.5Si; a first summer for summing the element of the input vector and the output of the first shift register and outputting 1.5Si; a second shift register for shifting the element of the input vector to the right four times and outputting 0.0625Si; and an inverter for inverting the output of the second shift register and outputting -0.0625Si. The softmax operation device can perform softmax operations at high speed without using a processor and can operate at low power through an optimized circuit structure and settings.

Description

Softmax computing unit for low power and acceleration

[0001] The present invention relates to a softmax computing device, and more specifically, to a softmax computing device for low power consumption and acceleration.

[0002]

[0003] The softmax operation is a method used in various neural networks to convert a given input vector into probability values. The softmax operation has been primarily used to calculate class-specific probability values in the final layers of classification and prediction neural networks.

[0004] With recent advances in generative neural networks, softmax operations are now widely used within these networks. In particular, in large language models, a type of generative neural network, sequence lengths increase linearly, and performing softmax operations on such long sequences requires significant time. Furthermore, the increased computation time leads to substantial power consumption.

[0005] When the number of tokens supported by the language model is 2048 or more, the softmax operation accounts for 40% of the total execution time, so there is an urgent need for acceleration and simplification of the softmax operation.

[0006]

[0007] The present invention proposes a softmax computing device capable of performing softmax computation at high speed without using a processor.

[0008] The present invention proposes a softmax computing device capable of operating at low power through an optimized circuit structure and settings.

[0009]

[0010] According to one aspect of the present invention, a softmax operation device is provided that includes: a selection circuit for selecting the largest value and the second largest value among the elements of an input vector; a comparison circuit for comparing the difference between the largest value and the second largest value with a preset threshold value and outputting a comparison result; a one-hot encoding circuit for setting the element corresponding to the largest value of the input vector to 1 and the remaining elements to 0 when the difference value is greater than or equal to the preset threshold value; and a softmax operation circuit for performing a softmax operation on the input vector when the difference value is less than or equal to the preset threshold value, wherein the softmax operation circuit performs a softmax operation as shown in the following mathematical formula.

[0011] $$\mathrm{Softmax}(S_i) = \frac{2^{1.4375 S_i}}{\sum_{j} 2^{1.4375 S_j}}$$

[0012] In the above mathematical formula, Si represents an element of the input vector.

[0013] The above softmax operation circuit includes a first shift register that outputs 0.5Si by shifting each element (Si) of the input vector to the right, and a first summer that outputs 1.5Si by summing the element of the input vector and the output of the first shift register.

[0014] The above softmax operation circuit further includes a second shift register that outputs 0.0625Si by shifting the elements of the input vector to the right four times, and an inverter that outputs -0.0625Si by inverting the output of the second shift register.

[0015] The above softmax calculation circuit further includes a second summer that sums the output of the first summer and the output of the inverter to output 1.4375Si.

[0016] The above softmax operation circuit further includes a bit shifter that bit-shifts the output of the second summer to the left and outputs 2^(1.4375Si).

[0017] The above softmax operation circuit further includes a converted input vector register that stores, element by element, 2^(1.4375Si), the converted input vector element obtained from the output of the second summer.

[0018] The above softmax operation circuit further includes a round circuit that converts the 1.4375Si into an integer before inputting it to the bit shifter.

[0019] According to another aspect of the present invention, a softmax operation device is provided that includes: a selection circuit for selecting the largest value and the second largest value among the elements of an input vector; a comparison circuit for comparing the difference between the largest value and the second largest value with a preset threshold value and outputting a comparison result; a one-hot encoding circuit for performing one-hot encoding in which, if the difference value is greater than or equal to the preset threshold value, the element corresponding to the largest value of the input vector is set to 1 and the remaining elements are set to 0; and a softmax operation circuit that performs a softmax operation on the input vector when the difference value is less than or equal to the preset threshold value, wherein the softmax operation circuit includes a first shift register that outputs 0.5Si by shifting each element (Si) of the input vector to the right, a first summer that outputs 1.5Si by summing the element of the input vector and the output of the first shift register, a second shift register that outputs 0.0625Si by shifting the element of the input vector to the right four times, and an inverter that outputs -0.0625Si by inverting the output of the second shift register.

[0020]

[0021] The softmax computing device of the present invention has the advantage of being able to perform softmax computation at high speed without using a processor and to operate at low power through an optimized circuit structure and settings.

[0022]

[0023] FIG. 1 is a drawing illustrating the overall structure of a softmax computing device for low power and acceleration according to one embodiment of the present invention.

[0024] FIG. 2 is a drawing illustrating the detailed structure of a softmax calculation circuit according to one embodiment of the present invention.

[0025] FIG. 3 is a diagram showing pseudocode for controlling a softmax computing device according to an embodiment of the present invention.

[0026] FIG. 4 is a diagram illustrating the overall flow of a softmax computation method for low power and acceleration according to an embodiment of the present invention.

[0027]

[0028] Hereinafter, specific embodiments according to embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to facilitate a comprehensive understanding of the methods, devices, and / or systems described herein. However, this is merely illustrative and the present invention is not limited thereto.

[0029] In describing the embodiments of the present invention, if it is determined that a detailed description of known technology related to the present invention may unnecessarily obscure the essence of the embodiments, such detailed description will be omitted. Furthermore, the terms described below are defined in consideration of their functions in the present invention, and these may vary depending on the intentions or practices of the user or operator. Therefore, such definitions should be based on the content throughout this specification. Terms used in the detailed description are intended merely to describe specific embodiments and should not be limiting. Unless explicitly stated otherwise, expressions in the singular form include the meaning of the plural form. In this description, expressions such as “include” or “comprising” are intended to refer to certain characteristics, numbers, steps, actions, elements, parts thereof, or combinations thereof, and should not be interpreted to exclude the existence or possibility of one or more other characteristics, numbers, steps, actions, elements, parts thereof, or combinations thereof other than those described.

[0030] FIG. 1 is a diagram illustrating the overall structure of a softmax computing device for low power and acceleration according to one embodiment of the present invention.

[0031] Referring to FIG. 1, a softmax operation device for low power and acceleration according to one embodiment of the present invention includes a vector element selection circuit (100), a comparison circuit (200), a one-hot encoding circuit (300), and a softmax operation circuit (400).

[0032] The vector element selection circuit (100) selects the largest value and the second largest value among the elements of the input vector that is subject to the softmax operation. For example, if the input vector is the 8-dimensional vector (5, 10, 12, 3, 1, 2, 9, 0), the vector element selection circuit (100) selects the largest element, 12, and the second largest element, 10. Since circuits that select specific elements of an input vector based on magnitude are widely known, the detailed structure of the vector element selection circuit is not described, and any circuit configured to output only specific values among several scalar values based on magnitude may be used as the vector element selection circuit (100) of the present invention.
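The selection described above can be illustrated in software. The following is a minimal sketch, not the circuit of the invention: the function name `select_top_two` and the single-pass strategy are assumptions made for illustration, since the text leaves the internal structure of the selection circuit open.

```python
def select_top_two(vector):
    """Return (largest, second_largest) using one pass over the vector.

    Hypothetical software stand-in for the vector element selection
    circuit (100); the source does not prescribe an algorithm.
    """
    if len(vector) < 2:
        raise ValueError("need at least two elements")
    largest = second = float("-inf")
    for value in vector:
        if value > largest:
            # New maximum: the old maximum becomes the runner-up.
            largest, second = value, largest
        elif value > second:
            second = value
    return largest, second


print(select_top_two([5, 10, 12, 3, 1, 2, 9, 0]))  # (12, 10)
```

With the example vector from the text, the sketch returns 12 and 10, matching the values the selection circuit is said to select.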

[0033] The comparison circuit (200) determines whether the difference between two selected elements is greater than or equal to a preset threshold. As in the previous example, when the selected vector elements are 10 and 12, it determines whether the difference between 10 and 12, which is 2, is greater than or equal to a preset threshold.

[0034] If the difference between the two selected elements is greater than or equal to a preset threshold, the comparison circuit (200) outputs a flag that activates the one-hot encoding circuit (300), and if the difference between the two selected elements is less than or equal to a preset threshold, the comparison circuit (200) outputs a flag that activates the softmax operation circuit (400).

[0035] Since a comparison circuit that compares two specific values is a widely known circuit, a detailed description of its structure is omitted, and various known types of comparison circuits may be used as the comparison circuit (200) of the present invention.

[0036] If the difference between the two selected elements is greater than or equal to a preset threshold, the one-hot encoding circuit (300) performs one-hot encoding on the input vector and the softmax operation circuit (400) does not operate. If the difference between the two selected elements is less than or equal to a preset threshold, the one-hot encoding circuit (300) does not operate and the softmax operation circuit (400) performs a softmax operation on the input vector.

[0037] According to one embodiment of the present invention, the threshold value that is compared with the difference value between the two selected elements can be set adaptively. Specifically, the threshold value can be set adaptively according to the structure of the model or the dimension of the input data. Conventionally, a fixed threshold value was used without considering the structure of the model or the dimension of the input data; in the present invention, the threshold value is chosen according to the structure of the neural network and the structure of the input data, and, for example, a value from 1 to 3 may be selected and used as the threshold value.
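The comparison and path selection can be sketched as follows. This is an illustrative assumption, not the comparison circuit itself: the function name `choose_path` and the string flags are hypothetical, and because the text assigns the equality case to both paths, the sketch assumes that a difference equal to the threshold activates one-hot encoding, following the order of the check in the method flow.

```python
def choose_path(vector, threshold):
    """Return which circuit the comparison flag would activate.

    Assumption: when the difference equals the threshold, the one-hot
    path is taken (the 'greater than or equal' test is evaluated first).
    """
    ordered = sorted(vector, reverse=True)
    difference = ordered[0] - ordered[1]  # largest minus second largest
    return "one_hot" if difference >= threshold else "softmax"


scores = [5, 10, 12, 3, 1, 2, 9, 0]  # difference between 12 and 10 is 2
for threshold in (1, 2, 3):
    print(threshold, choose_path(scores, threshold))
```

For the example vector, thresholds 1 and 2 select the one-hot path and threshold 3 selects the softmax path, which shows how the choice of threshold shifts work between the two circuits.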

[0038] The one-hot encoding circuit (300) converts the input vector into a one-hot encoded vector. Specifically, the one-hot encoding circuit (300) performs one-hot encoding that converts the element with the largest value in the input vector to 1 and the remaining elements to 0. For example, if the input vector is (5, 10, 12, 3, 1, 2, 9, 0), the one-hot encoding circuit converts the element with the largest value, 12, to 1 and all remaining elements to 0. In short, the input vector (5, 10, 12, 3, 1, 2, 9, 0) is converted into (0, 0, 1, 0, 0, 0, 0, 0) by the one-hot encoding circuit (300). The vector thus one-hot encoded becomes the softmax vector.
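As a software illustration of this conversion, the following minimal sketch reproduces the example above. The function name `one_hot_of_max` is hypothetical, and the tie-breaking rule (the first occurrence of the maximum wins) is an assumption, since the text does not say how ties are handled.

```python
def one_hot_of_max(vector):
    """Set the position of the largest element to 1 and all others to 0.

    Assumption: if the maximum occurs more than once, the first
    occurrence is marked.
    """
    index_of_max = max(range(len(vector)), key=vector.__getitem__)
    return [1 if i == index_of_max else 0 for i in range(len(vector))]


print(one_hot_of_max([5, 10, 12, 3, 1, 2, 9, 0]))  # [0, 0, 1, 0, 0, 0, 0, 0]
```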

[0039] The one-hot encoding circuit (300) can be implemented using a multiplexer, pointer, summer, OR gate, etc., and since the one-hot encoding circuit implemented as a logic circuit is widely known, a detailed description thereof is omitted.

[0040] If the difference between the two elements with the largest values is greater than a preset threshold, it means that the value of one specific element is relatively large and the other values are small, so the other elements have minimal impact on the result. In this case, to accelerate the operation, the input vector is converted into a one-hot encoded vector in which only that element has a value of 1.

[0041] Meanwhile, if the difference between the two elements with the largest values is less than or equal to a preset threshold, the value of a specific element cannot be considered to represent the input vector. In this case, the softmax operation circuit (400) of the present invention is activated to perform a softmax operation, and one-hot encoding is not performed.

[0042] The following mathematical formula 1 represents a general softmax operation.

[0043] $$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

[0044] In the above mathematical formula 1, x_i represents an element of the input vector. A standard softmax operation uses the natural constant (Euler's number) e, and performing a softmax operation using the natural constant e results in high computational complexity and requires a processor.
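For reference, the general softmax operation can be written in a few lines of software. This is a minimal sketch for comparison purposes only; subtracting the maximum before exponentiation is a common numerical-stability practice and an assumption here, not something the text describes.

```python
import math


def softmax(x):
    """Standard softmax: exp(x_i) divided by the sum of exp(x_j).

    The maximum is subtracted first to avoid overflow; this does not
    change the result because the factor cancels in the ratio.
    """
    shift = max(x)
    exps = [math.exp(value - shift) for value in x]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([5, 10, 12, 3, 1, 2, 9, 0])
print(probabilities)
```

Each call evaluates one exponential per element, which is the cost the invention seeks to avoid.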

[0045] To solve this problem, the softmax operation circuit of the present invention approximates e^(xi) as shown in mathematical formula 2 below, and performs a softmax operation based on the approximated mathematical formula.

[0046] $$e^{S} \approx 2^{(1 + 0.5 - 0.0625)S} = 2^{1.4375S}$$

[0047] In the mathematical formula above, S is the input vector, and the approximation ultimately takes the form of 2 raised to the power 1.4375S. In other words, the present invention changes the base from the natural constant e to 2 and sets the exponent of 2 to 1.4375S. While it is preferable for the exponent to be 1.442S (log2 e ≈ 1.4427) when converting the base from the Euler constant to 2, the present invention performs the softmax operation so that the exponent of 2 becomes 1.4375S in order to implement a softmax operator using only logic circuits.
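The size of the gap between the ideal factor and the hardware-friendly factor can be checked numerically. The sketch below is illustrative only; the names `IDEAL_FACTOR`, `HARDWARE_FACTOR`, and `approx_exp` are hypothetical, and the decomposition 1 + 1/2 - 1/16 is taken from the shift-and-sum structure described in the text.

```python
import math

IDEAL_FACTOR = math.log2(math.e)      # about 1.4427, exact base change from e to 2
HARDWARE_FACTOR = 1 + 2**-1 - 2**-4   # 1.4375, built from shifts and sums only


def approx_exp(s):
    """Approximate e**s as 2**(1.4375 * s)."""
    return 2.0 ** (HARDWARE_FACTOR * s)


# Relative difference between the two factors (roughly 0.36 percent).
relative_gap = (IDEAL_FACTOR - HARDWARE_FACTOR) / IDEAL_FACTOR

# 2**(1.4375 * s) equals e**(k * s) with k = 1.4375 * ln(2), about 0.9964.
effective_scale = HARDWARE_FACTOR * math.log(2)

print(relative_gap, effective_scale)
```

Because 2^(1.4375S) equals e^(0.9964S), the approximated softmax behaves like an exact softmax applied to inputs scaled by about 0.9964, so the ordering of the output probabilities is preserved.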

[0048] Using the approximation as above, the softmax operation of the present invention is performed as shown in the following mathematical formula 3.

[0049] $$\mathrm{Softmax}(S_i) = \frac{2^{1.4375 S_i}}{\sum_{j} 2^{1.4375 S_j}}$$

[0050] FIG. 2 is a diagram illustrating the detailed structure of a softmax operation circuit according to one embodiment of the present invention.

[0051] Referring to FIG. 2, the softmax operation circuit includes an input vector register (410), a first shift register (420), a first summer (430), a second shift register (440), an inverter (450), a second summer (460), a round & bit shifter (470), a converted input vector register (480), an accumulator (490), and a divider (500).

[0052] The input vector to be subjected to the softmax operation is stored in the register (410). FIG. 2 illustrates a case where the number of elements in the input vector is 8 and accordingly the number of elements in the softmax output vector is also 8, but it will be obvious to those skilled in the art that the number of elements in the input vector and the number of elements in the softmax output vector (e.g., the number of classes) can be set in various ways by the designer.

[0053] The first shift register (420) shifts each element of the input vector to the right once, and the input vector element Si becomes 0.5Si due to the operation of the first shift register (420). The input vector element Si and the output of the first shift register (420), 0.5Si, are input to the first summer (430), and the first summer (430) outputs 1.5Si.

[0054] The second shift register (440) is a shift register formed by combining four shift registers. The element Si of the input vector is input to the second shift register (440), and 0.0625Si is output from the second shift register (440) through four right shifts. 0.0625 corresponds to 1/(2^4).

[0055] 0.0625Si is converted to -0.0625Si by the inverter (450), and 1.5Si, which is the output of the first summer, and -0.0625Si, which is the output of the inverter (450), are input to the second summer (460) and summed, and the second summer (460) outputs 1.4375Si.
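The shift-and-sum datapath that produces 1.4375Si can be mimicked with integer operations. The following is a minimal sketch under a stated assumption: the source does not specify a number format, so a hypothetical fixed-point representation with 4 fractional bits (`FRAC_BITS`) is used so that the four-bit right shift loses no precision for integer inputs. All function names are illustrative.

```python
FRAC_BITS = 4  # hypothetical fixed-point format: 4 fractional bits


def to_fixed(value):
    """Encode a number as a fixed-point integer."""
    return int(round(value * (1 << FRAC_BITS)))


def from_fixed(fixed):
    """Decode a fixed-point integer back to a float."""
    return fixed / (1 << FRAC_BITS)


def scale_by_1_4375(s_fixed):
    """Compute 1.4375 * Si using only shifts, sums, and a negation."""
    half = s_fixed >> 1              # first shift register: 0.5 * Si
    one_and_half = s_fixed + half    # first summer: 1.5 * Si
    sixteenth = s_fixed >> 4         # second shift register: 0.0625 * Si
    negated = -sixteenth             # inverter: -0.0625 * Si
    return one_and_half + negated    # second summer: 1.4375 * Si


for s in (12, 10, 5):
    print(s, from_fixed(scale_by_1_4375(to_fixed(s))))
```

For the inputs 12, 10, and 5 the sketch yields 17.25, 14.375, and 7.1875, which equal 1.4375 times each input.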

[0056] The round & bit shifter (470) converts 1.4375Si into 2^(1.4375Si). The round & bit shifter (470) includes a round circuit and a bit shifter.

[0057] 1.4375Si, which serves as the exponent of 2, is converted into an integer by the round circuit. The round circuit performs the integer conversion by removing the fractional part. Since round circuits that perform integer conversion are widely known, the detailed structure will not be described.

[0058] The bit shifter converts 1.4375Si into 2^(1.4375Si) through shifting. Specifically, the bit shifter (470) shifts to the left according to the value of 1.4375Si integerized by the round circuit, thereby producing 2^(1.4375Si).
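The rounding and left-shift step can be sketched as follows. This is an illustration under assumptions, not the circuit itself: the rounding rule is taken to be truncation of the fractional part, the power of two is formed by shifting the value 1 to the left, and negative exponents are not handled because the text does not discuss them. The function name `round_and_shift` is hypothetical.

```python
def round_and_shift(scaled_exponent):
    """Return 2**n, where n is the integer part of scaled_exponent.

    Assumptions: the round circuit drops the fractional part, and the
    bit shifter forms 2**n as 1 shifted left by n positions.
    """
    n = int(scaled_exponent)  # round circuit: keep the integer part only
    if n < 0:
        raise ValueError("this sketch only covers non-negative exponents")
    return 1 << n             # bit shifter: 1 shifted left n times is 2**n


print(round_and_shift(17.25))   # 131072
print(round_and_shift(14.375))  # 16384
```

Because the exponent is an integer after rounding, no multiplier or exponential unit is needed; a single shift produces the power of two.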

[0059] The calculated 2^(1.4375Si) is stored in the converted input vector register (480). The above operation is performed for each input vector element, and the value calculated for each input vector element is sequentially stored in the converted input vector register (480). For example, the softmax operation circuit of the present invention converts S0 into 2^(1.4375S0) and stores it as S'0 in the converted input vector register (480). Likewise, S1 is converted into 2^(1.4375S1) and stored as S'1 in the converted input vector register (480).

[0060] Ultimately, the converted input vector register (480) stores, element by element, the value 2^(1.4375Si) converted from each input vector element Si.

[0061] The elements of the converted input vector register (480) are summed by the accumulator (490). The value summed by the accumulator (490) corresponds to the denominator in the softmax operation of the present invention as in mathematical formula 3 above.

[0062] The divider (500) divides each converted input vector element 2^(1.4375Si) stored in the converted input vector register (480) by the output of the accumulator (490). Through this operation, a softmax vector with a total of 8 dimensions is output.
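The accumulate-and-divide stage can be illustrated in software. In this minimal sketch the function name `normalize` is hypothetical, and the converted values in the usage example assume that the fractional part of 1.4375Si is dropped before the power of two is formed.

```python
def normalize(converted):
    """Divide each converted element by the sum of all converted elements."""
    total = sum(converted)                  # accumulator (490)
    return [c / total for c in converted]   # divider (500)


# Converted elements for the input (5, 10, 12, 3, 1, 2, 9, 0), assuming the
# exponents 1.4375 * Si are truncated to 7, 14, 17, 4, 1, 2, 12, 0.
converted_register = [2**7, 2**14, 2**17, 2**4, 2**1, 2**2, 2**12, 2**0]
softmax_vector = normalize(converted_register)
print(softmax_vector)
```

The resulting eight values sum to 1, and the element for the input 12 receives the largest share.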

[0063] The softmax operation circuit of the present invention, as described above, performs softmax operations using a logic circuit without using a processor, thereby enabling softmax operations to be performed at high speed.

[0064] Meanwhile, although this embodiment was described using the case where the dimension of the input vector (the number of elements of the input vector) is 8 as an example, it will be obvious to those skilled in the art that the dimension of the input vector and the dimension of the softmax vector can be freely set by the designer.

[0065] In addition, if the size of the input vector register is large, the complexity of the operation may increase; to address this, the input vector may be divided, and then the transformation elements for the input vector may be sequentially processed and stored in the transformed input vector register.
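The divided processing described above might look like the following in software. This is a sketch under assumptions: the chunk size, the function names, and the truncation of the exponent are all illustrative choices not given in the text. The point it shows is that the conversion can proceed chunk by chunk, while the division must wait until every converted element has been accumulated.

```python
def convert_element(s):
    """Convert Si to 2**(1.4375 * Si), dropping the exponent's fraction."""
    scaled = s + s / 2 - s / 16   # 1.4375 * Si as Si + 0.5*Si - 0.0625*Si
    return 2.0 ** int(scaled)


def convert_in_chunks(vector, chunk_size):
    """Fill the converted input vector register one chunk at a time."""
    converted_register = []
    for start in range(0, len(vector), chunk_size):
        chunk = vector[start:start + chunk_size]
        converted_register.extend(convert_element(s) for s in chunk)
    return converted_register


def softmax_from_chunks(vector, chunk_size):
    """Normalize only after all chunks have been converted."""
    converted = convert_in_chunks(vector, chunk_size)
    total = sum(converted)
    return [c / total for c in converted]


scores = [5, 10, 12, 3, 1, 2, 9, 0]
print(softmax_from_chunks(scores, chunk_size=4))
```

The output does not depend on the chunk size, since chunking only changes the order in which the register is filled.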

[0066] FIG. 3 is a diagram showing pseudocode for controlling a softmax computing device according to one embodiment of the present invention, and FIG. 4 is a diagram showing the overall flow of a softmax computing method for low power and acceleration according to one embodiment of the present invention.

[0067] Referring to FIGS. 3 and 4, the largest value and the second largest value among the elements of the input vector are first selected (step 1000). For example, the input vector may be an attention score of a Large Language Model (LLM), but is not limited thereto.

[0068] When the largest value and the second largest value are selected, the difference between the largest value and the second largest value is calculated (step 1002).

[0069] When the difference value is calculated, it is determined whether the difference value is greater than or equal to a preset threshold (step 1004).

[0070] If the difference value is greater than or equal to the threshold, one-hot encoding is performed on the input vector by setting the element with the largest value of the input vector to 1 and the values of the remaining elements to 0 (step 1006). In this case, the one-hot encoded vector for the input vector becomes the softmax vector. Since only simple one-hot encoding is performed without performing a separate softmax operation when the difference value is greater than or equal to the threshold, it is possible to accelerate the softmax operation.

[0071] If the difference value is below the threshold value, each element Si of the input vector is converted to 1.4375Si using a shift register, a summer, and an inverter (step 1008).

[0072] 1.4375Si is converted into 2^(1.4375Si) through the bit shifter (step 1010). As previously explained, 1.4375Si can be converted into 2^(1.4375Si) after being integerized by the round circuit.

[0073] Each transformation element of the transformation input vector is divided by the sum of all transformation elements of the transformation input vector to output a softmax vector (step 1012).
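Steps 1000 through 1012 can be combined into one software sketch. This is an illustration under stated assumptions rather than the pseudocode of FIG. 3: the function name `approx_softmax` is hypothetical, a difference equal to the threshold is assumed to take the one-hot path, ties for the maximum mark the first occurrence, and the exponent is integerized by dropping its fractional part.

```python
def approx_softmax(vector, threshold):
    """Software sketch of the flow from step 1000 to step 1012."""
    # Step 1000: select the largest and second largest values.
    ordered = sorted(vector, reverse=True)
    largest, second = ordered[0], ordered[1]

    # Step 1002: difference between the two selected values.
    difference = largest - second

    # Steps 1004 and 1006: one-hot encoding when the gap is large enough.
    if difference >= threshold:
        index_of_max = vector.index(largest)
        return [1.0 if i == index_of_max else 0.0 for i in range(len(vector))]

    # Step 1008: 1.4375 * Si formed as Si + 0.5*Si - 0.0625*Si.
    scaled = [s + s / 2 - s / 16 for s in vector]

    # Step 1010: integerize the exponent, then form the power of two.
    converted = [2.0 ** int(x) for x in scaled]

    # Step 1012: divide each converted element by the sum of all of them.
    total = sum(converted)
    return [c / total for c in converted]


scores = [5, 10, 12, 3, 1, 2, 9, 0]
print(approx_softmax(scores, threshold=2))  # one-hot path
print(approx_softmax(scores, threshold=3))  # approximated softmax path
```

With a threshold of 2 the example vector is one-hot encoded, while a threshold of 3 sends it through the approximated softmax, where the element for the input 12 receives a probability of 131072/151703.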

[0074] The present invention has been described with reference to embodiments illustrated in the drawings, but this is merely illustrative, and those skilled in the art will understand that various modifications and equivalent alternative embodiments are possible therefrom. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

1. A softmax operation device comprising: a selection circuit that selects the largest value and the second largest value among the elements of an input vector; a comparison circuit that outputs a comparison result by comparing the difference value between the largest value and the second largest value with a preset threshold value; a one-hot encoding circuit that performs one-hot encoding by setting the element corresponding to the largest value of the input vector to 1 and the remaining elements to 0 when the difference value is greater than or equal to the preset threshold value; and a softmax operation circuit that performs a softmax operation on the input vector when the difference value is less than or equal to the preset threshold value, wherein the softmax operation circuit performs the softmax operation as shown in the following mathematical formula:
$$\mathrm{Softmax}(S_i) = \frac{2^{1.4375 S_i}}{\sum_{j} 2^{1.4375 S_j}}$$
In the above mathematical formula, Si represents an element of the input vector.

2. The softmax operation device of claim 1, wherein the softmax operation circuit comprises a first shift register that outputs 0.5Si by shifting each element of the input vector to the right, and a first summer that outputs 1.5Si by summing the element of the input vector and the output of the first shift register.

3. The softmax operation device of claim 2, wherein the softmax operation circuit further comprises a second shift register that outputs 0.0625Si by shifting the element of the input vector to the right four times, and an inverter that outputs -0.0625Si by inverting the output of the second shift register.

4. The softmax operation device of claim 3, wherein the softmax operation circuit further comprises a second summer that sums the output of the first summer and the output of the inverter to output 1.4375Si.

5. The softmax operation device of claim 4, wherein the softmax operation circuit further comprises a bit shifter that bit-shifts the output of the second summer to the left and outputs 2^(1.4375Si).

6. The softmax operation device of claim 1, wherein the softmax operation circuit further comprises a converted input vector register that stores, element by element, 2^(1.4375Si), the converted input vector element obtained from the output of the second summer.

7. The softmax operation device of claim 6, wherein the softmax operation circuit further comprises a divider that outputs each element of a softmax vector by dividing each converted element 2^(1.4375Si) stored in the converted input vector register by the sum of all converted elements.

8. The softmax operation device of claim 5, wherein the softmax operation circuit further comprises a round circuit that integerizes the 1.4375Si before it is input to the bit shifter.

9. A softmax operation device comprising: a selection circuit that selects the largest value and the second largest value among the elements of an input vector; a comparison circuit that outputs a comparison result by comparing the difference value between the largest value and the second largest value with a preset threshold value; a one-hot encoding circuit that performs one-hot encoding by setting the element corresponding to the largest value of the input vector to 1 and the remaining elements to 0 when the difference value is greater than or equal to the preset threshold value; and a softmax operation circuit that performs a softmax operation on the input vector when the difference value is less than or equal to the preset threshold value, wherein the softmax operation circuit comprises a first shift register that outputs 0.5Si by shifting each input vector element (Si) to the right, a first summer that outputs 1.5Si by summing the input vector element and the output of the first shift register, a second shift register that outputs 0.0625Si by shifting the input vector element to the right four times, and an inverter that outputs -0.0625Si by inverting the output of the second shift register.

10. The softmax operation device of claim 9, wherein the softmax operation circuit further comprises a second summer that sums the output of the first summer and the output of the inverter to output 1.4375Si.

11. The softmax operation device of claim 10, wherein the softmax operation circuit further comprises a bit shifter that bit-shifts the output of the second summer to the left and outputs 2^(1.4375Si).

12. The softmax operation device of claim 11, wherein the softmax operation circuit further comprises a converted input vector register that stores, element by element, 2^(1.4375Si), the converted input vector element obtained from the output of the second summer, and a divider that outputs each element of a softmax vector by dividing each converted element 2^(1.4375Si) stored in the converted input vector register by the sum of all converted elements.

13. The softmax operation device of claim 12, wherein the softmax operation circuit further comprises a round circuit that integerizes the 1.4375Si before it is input to the bit shifter.

14. The softmax operation device of claim 9, wherein the softmax operation circuit performs the softmax operation as shown in the following mathematical formula:
$$\mathrm{Softmax}(S_i) = \frac{2^{1.4375 S_i}}{\sum_{j} 2^{1.4375 S_j}}$$
In the above mathematical formula, Si represents an element of the input vector.