Neural network activation method and apparatus, NPU, device, and storage medium
By using a two-level lookup table structure for neural network activation calculation, the problem of large storage space occupied by LUT tables is solved, and efficient and accurate activation value calculation is achieved on edge devices.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 伟光有限公司(CN)
- Filing Date
- 2022-11-29
- Publication Date
- 2026-06-16
AI Technical Summary
Existing technologies using LUTs for neural network activation calculations consume a large amount of storage space, leading to reduced resource utilization and difficulty in guaranteeing calculation accuracy.
A two-level lookup table structure is adopted. The first-level lookup table determines the target first-level interval to which the input value belongs, and the fitting parameters of the target second-level interval are determined from the second-level lookup table based on the parameters of the target first-level interval. The activation value is calculated using the fitting parameters, which reduces storage requirements and improves accuracy.
While ensuring the accuracy of activation values, it reduces the computational load and storage space requirements for activation calculations, making it suitable for resource-constrained edge devices.

Figure CN115759217B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of artificial intelligence, and in particular to a method, apparatus, NPU, device, and storage medium for activating a neural network.
Background
[0002] With the development of deep learning, neural networks have been widely applied in many fields, and models have grown ever wider and deeper. To reduce the consumption of computing resources and broaden the scenarios in which neural networks can be deployed, related technologies use quantization to shrink model size and save storage space while preserving the model structure.
[0003] To perform neural network computation on NPUs (Neural-network Processing Units), many related technologies use LUTs (Look-Up Tables) to simplify the computation of activation functions. However, the LUTs for activation functions occupy a large amount of storage space, which reduces resource utilization.
Summary of the Invention
[0004] This application provides a method, apparatus, NPU, device, and storage medium for activating a neural network, which improves the accuracy of activation results while reducing data storage pressure. The technical solution is as follows:
[0005] On one hand, embodiments of this application provide an activation method for a neural network, the method comprising:
[0006] The target first-level interval to which the input value belongs is determined from the first-level lookup table. The first-level lookup table contains the correspondence between the first-level interval and the interval parameter. The first-level interval is obtained by dividing the range of the input value. The input value is a fixed-point number, and the range of the input value is a range of fixed-point numbers.
[0007] Based on the target interval parameters of the target first-level interval, the target fitting parameters of the target second-level interval to which the input value belongs are determined from the second-level lookup table. The target second-level interval belongs to the target first-level interval, and each first-level interval is divided into at least one second-level interval. The second-level lookup table contains the correspondence between the second-level interval and the fitting parameters. The fitting parameters are the parameters of the linear fit of the activation function corresponding to the second-level interval.
[0008] Based on the target fitting parameters and the input value, the activation value corresponding to the input value is determined, and the activation value is a fixed-point number.
[0009] On the other hand, embodiments of this application provide an activation device for a neural network, the device comprising:
[0010] The device comprises a first lookup table unit, a second lookup table unit, and a calculation unit, wherein the calculation unit is connected to both the first lookup table unit and the second lookup table unit.
[0011] The first lookup table unit is used to store a first-level lookup table, which contains the correspondence between first-level intervals and interval parameters. The first-level intervals are obtained by dividing the range of input values, where the input values are fixed-point numbers and the range of input values is a range of fixed-point numbers.
[0012] The second lookup table unit is used to store a second-level lookup table, which contains the correspondence between second-level intervals and fitting parameters. The fitting parameters are the parameters of the linear fit of the activation function over the corresponding second-level interval. Each first-level interval is divided into at least one second-level interval.
[0013] The calculation unit is configured to determine, from the first-level lookup table, the target first-level interval to which the input value belongs; based on the target interval parameters of the target first-level interval, determine, from the second-level lookup table, the target fitting parameters of the target second-level interval to which the input value belongs, wherein the target second-level interval belongs to the target first-level interval; and based on the target fitting parameters and the input value, determine the activation value corresponding to the input value, wherein the activation value is a fixed-point number.
[0014] On the other hand, embodiments of this application provide an NPU including programmable logic circuits and/or program instructions which, when the NPU runs, implement the neural network activation method described above.
[0015] On the other hand, embodiments of this application provide an NPU, which includes an activation device for a neural network as described above.
[0016] On the other hand, embodiments of this application provide a computer device including a processor and a memory, wherein the memory stores at least one program, which is loaded and executed by the processor to implement the neural network activation method as described above.
[0017] On the other hand, embodiments of this application provide a computer-readable storage medium storing at least one program that is loaded and executed by a processor to implement the neural network activation method as described above.
[0018] On the other hand, embodiments of this application provide a computer program product including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the neural network activation method described above.
[0019] In this embodiment, the computer device performs activation calculations by sequentially searching through a first-level lookup table and a second-level lookup table. The second-level intervals in the second-level lookup table are obtained by further subdividing the first-level intervals in the first-level lookup table, improving the accuracy of linear fitting of the activation function interval. The first-level lookup table is used to assist in reading the fitting parameters stored in the second-level lookup table, reducing the storage requirements of the lookup tables. Once the fitting parameters are found, the computer device performs activation calculations on the fixed-point input values based on these parameters, ensuring the accuracy of the activation values while reducing the computational load of the activation calculations.
Brief Description of the Drawings
[0020] To more clearly illustrate the technical solutions in the embodiments of this application, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0021] Figure 1 is a structural block diagram of a computer device provided by an exemplary embodiment of this application;
[0022] Figure 2 is a flowchart of a neural network activation method provided by an exemplary embodiment of this application;
[0023] Figure 3 is a schematic diagram of interval division provided by an exemplary embodiment of this application;
[0024] Figure 4 is a schematic diagram of a table lookup process provided by an exemplary embodiment of this application;
[0025] Figure 5 is a flowchart of a second-level lookup table lookup method provided by an exemplary embodiment of this application;
[0026] Figure 6 is a structural block diagram of a neural network activation apparatus provided by an exemplary embodiment of this application.
Detailed Description
[0027] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings.
[0028] For ease of understanding, the terms used in the embodiments of this application will be explained below.
[0029] Model quantization is a model compression method. To meet the accuracy requirements of various AI applications, the width, number of layers, depth, and parameter counts of deep neural networks have grown rapidly. As a result, deep learning models occupy more storage space and incur longer inference latency, which hinders industrial deployment. Models currently run on four types of chips: CPUs (Central Processing Units), GPUs (Graphics Processing Units), FPGAs (Field Programmable Gate Arrays), and ASICs (Application Specific Integrated Circuits). The computing power of these chips is limited relative to the demands of deep learning models. Chips on edge devices face particular constraints in storage, memory, power consumption, and latency, so inference efficiency is especially important there.
[0030] In the context of computer vision and deep learning, "model" specifically refers to a convolutional neural network. Quantization is the process of approximating the continuous values of a signal with a finite number of discrete values; it is a form of information compression. A model consists of weights and biases, both of which are stored as float32. A float32 value occupies 32 bits, whereas a fixed-point int8 value occupies only 8 bits. Model quantization is therefore a compression method that represents floating-point numbers with fixed-point numbers for computation.
[0031] A Look-Up Table (LUT) is essentially a block of RAM (Random Access Memory). Instead of computing an output with AND and OR logic gates, a LUT uses tables similar to truth tables: it stores the output for every possible combination of input variables. Once the data has been written into RAM in advance, applying an input signal is equivalent to supplying an address, and the value stored at that address is output as the result.
[0032] The Least Squares Method (LSM) is a commonly used method for solving unconstrained optimization problems. It finds the function that best matches the data by minimizing the sum of squared errors. In this embodiment, the LSM is applied to discrete sample values of the activation function to compute the best-fit line, yielding a linear expression with small error.
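The closed-form least-squares line fit described in [0032] can be sketched as follows. This is a minimal illustration in plain Python; the function name `least_squares_line` and the choice of sampling the sigmoid at 11 points on [0, 1] are assumptions made for the example, not details taken from this application.

```python
import math


def least_squares_line(xs, ys):
    """Fit y = A*x + C by minimizing the sum of squared errors (closed form)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    A = sxy / sxx               # slope
    C = mean_y - A * mean_x     # intercept: the best-fit line passes through the mean point
    return A, C


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Illustrative only: sample the sigmoid on [0, 1] and fit a single line to it.
xs = [i / 10 for i in range(11)]
ys = [sigmoid(x) for x in xs]
A, C = least_squares_line(xs, ys)
max_err = max(abs(A * x + C - y) for x, y in zip(xs, ys))
print(f"A={A:.4f} C={C:.4f} max_err={max_err:.5f}")
```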
[0033] When LUTs are used for activation calculation in neural networks, the input values are usually quantized. For inputs quantized to int8, a LUT that maps each input value one-to-one to its activation value can be constructed. However, int8 inputs cannot guarantee sufficient precision for neural network computation tasks. To preserve precision, floating-point inputs can instead be quantized to 16-bit fixed-point numbers. For 16-bit inputs, hardware storage limits make it impossible to build a one-to-one LUT for the activation calculation. Instead, the activation curve is typically fitted piecewise with straight lines, the fitting parameters of each line segment are stored in the LUT, and the activation values are computed from those parameters.
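To make the storage argument in [0033] concrete, the sketch below builds a one-to-one int8 sigmoid LUT and compares its entry count with what a one-to-one 16-bit table would need. The input scale (int8 code q represents x = q / 16, covering [-8, 8)), the output mapping to codes 0 to 127, the 2-byte entry size, and all names are illustrative assumptions.

```python
import math

IN_SCALE = 8 / 128   # assumed: int8 input code q represents x = q * IN_SCALE, zero point 0
OUT_LEVELS = 127     # assumed: output code = round(sigmoid(x) * OUT_LEVELS)


def build_int8_sigmoid_lut():
    """Precompute one activation value per possible int8 input (256 entries)."""
    lut = []
    for q in range(-128, 128):
        y = 1.0 / (1.0 + math.exp(-q * IN_SCALE))  # dequantize input, apply sigmoid
        lut.append(min(OUT_LEVELS, max(0, round(y * OUT_LEVELS))))  # requantize output
    return lut


LUT8 = build_int8_sigmoid_lut()


def sigmoid_int8(q):
    # The input code acts as the address; offset so that q = -128 maps to address 0.
    return LUT8[q + 128]


entries_int8 = len(LUT8)
entries_int16 = 2 ** 16
print(entries_int8, entries_int16, entries_int16 * 2, "bytes at 2 bytes per entry")
```

A one-to-one table grows with the number of representable inputs, so moving from 8-bit to 16-bit inputs multiplies its size by 256, which is why the 16-bit case falls back to storing fitting parameters per segment.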
[0034] When activation values are calculated from fitting parameters, their accuracy depends closely on the quality of the linear fit. For existing linear fitting methods, the fit improves as the granularity used to divide the curve into intervals becomes finer: a better fit requires a smaller granularity and therefore more curve intervals. When the fitting parameters are stored in a LUT, each curve interval corresponds to one entry. Therefore, to ensure the accuracy of the activation values, the hardware must provide sufficient storage space for the LUT, and when hardware space is limited, the accuracy of the activation values suffers. The embodiments of this application construct two LUTs, which reduces the storage space occupied by the LUTs while ensuring the accuracy of the activation values.
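The granularity and accuracy trade-off in [0034] can be checked numerically. The sketch below uses hypothetical helper names and the sigmoid on [-8, 8] purely as an example: it splits the range into equal segments, fits each segment with least squares, and reports the worst error. Each segment would cost one LUT entry.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Least-squares line y = a*x + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def max_piecewise_error(f, lo, hi, segments, samples=16):
    """Split [lo, hi] into equal segments, fit each one, and return the worst sampled error."""
    width = (hi - lo) / segments
    worst = 0.0
    for k in range(segments):
        x0 = lo + k * width
        xs = [x0 + width * i / (samples - 1) for i in range(samples)]
        ys = [f(x) for x in xs]
        a, c = fit_line(xs, ys)
        worst = max(worst, max(abs(a * x + c - y) for x, y in zip(xs, ys)))
    return worst


for n in (8, 32, 128):
    print(n, "segments -> max error", f"{max_piecewise_error(sigmoid, -8, 8, n):.2e}")
```

For a smooth curve the error of a linear fit shrinks roughly with the square of the segment width, while the number of entries grows linearly with the segment count, which is the storage pressure described above.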
[0035] Please refer to Figure 1, which shows a structural block diagram of a computer device provided by an exemplary embodiment of this application. The computer device 100 may include one or more components such as a processor 110 and a memory 120.
[0036] The processor 110 integrates an NPU (Neural Network Processing Unit) for performing neural network processing, executing neural network activation methods, and realizing artificial intelligence (AI) functions. The processor 110 may include one or more processing cores. The processor 110 connects to various parts within the computer device 100 using various interfaces and lines, and performs various functions and processes data of the computer device 100 by running or executing instructions, programs, code sets, or instruction sets stored in memory 120, and by calling data stored in memory 120. Optionally, the processor 110 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 110 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), Neural-network Processing Unit (NPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content displayed on the touchscreen; and the modem handles wireless communication. It is understood that the modem may not be integrated into the processor 110 and can be implemented separately by a dedicated processor.
[0037] The memory 120 may include random access memory (RAM) or read-only memory (ROM). Optionally, the memory 120 may include a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the various method embodiments described below, etc.; the data storage area may store data (such as audio data, telephone directory, etc.) created according to the use of the computer device 100.
[0038] In addition, those skilled in the art will understand that the structure of the computer device 100 shown in the above figures does not constitute a limitation on the terminal. The terminal may include more or fewer components than shown, or combine certain components, or have different component arrangements. For example, the computer device 100 may also include components such as an input unit, audio circuitry, a Wi-Fi module, a power supply, and a Bluetooth module, which will not be described in detail here.
[0039] Please refer to Figure 2, which shows a flowchart of a neural network activation method provided by an exemplary embodiment of this application. This embodiment takes the method being applied to a computer device as an example; the method includes the following steps:
[0040] Step 201: Determine the target first-level interval to which the input value belongs from the first-level lookup table. The first-level lookup table contains the correspondence between the first-level intervals and the interval parameters. The first-level intervals are obtained by dividing the range of the input value. The input value is a fixed-point number, and the range of the input value is a range of fixed-point numbers.
[0041] Given an input value, the computer device first searches the first-level lookup table. Because the first-level lookup table stores the correspondence between first-level intervals and interval parameters, the computer device determines the target first-level interval to which the input value belongs and then reads the target interval parameters.
[0042] In the embodiments of this application, the LUT is used to simplify the activation calculation of a quantized model. Therefore, during the calculation, the input value is a fixed-point number and the corresponding input value range is a fixed-point range.
[0043] In an illustrative example, as shown in Figure 3, the activation function intervals corresponding to the first-level intervals are obtained by dividing the activation function uniformly, and the division granularity is the first-level division granularity.
[0044] It should be noted that, in the embodiments of this application, the activation function used in the neural network may be the sigmoid function $f(x) = 1 / (1 + e^{-x})$, the Tanh function $f(x) = \tanh(x)$, the ReLU function $f(x) = \max(0, x)$, or others; this application does not limit the choice.
[0045] Step 202: Based on the target interval parameters of the target first-level interval, determine the target fitting parameters of the target second-level interval to which the input value belongs from the second-level lookup table. The target second-level interval belongs to the target first-level interval, and each first-level interval is divided into at least one second-level interval. The second-level lookup table contains the correspondence between the second-level intervals and the fitting parameters. The fitting parameters are the parameters of the linear fit of the activation function corresponding to the second-level interval.
[0046] Once the target first-level interval is determined, the corresponding target interval parameters can be read from the first-level lookup table. Based on these parameters, the computer device determines the target second-level interval to which the input value belongs. In one possible implementation, as shown in Figure 3, the activation function intervals corresponding to the second-level intervals are obtained by dividing the activation function non-uniformly: within a single first-level interval the second-level intervals are uniform, while the number of second-level intervals differs between first-level intervals.
[0047] The second-level lookup table stores the correspondence between second-level intervals and fitting parameters. That is, each second-level interval corresponds to the fitting parameters of the activation function in its interval. When performing linear fitting on the activation function curve corresponding to the second-level interval, the fitting line can be expressed as:
[0048] $y = A x + C$ (1)
[0049] Substituting the starting value of the second-level interval into Formula 1, we get:
[0050] $Y_n = A X_n + C$ (2)
[0051] where $(X_n, Y_n)$ is the starting point of the n-th second-level interval. Solving Formula 2 for C and substituting it into Formula 1 eliminates the intercept, so the fitted line can be expressed as:
[0052] $y = A x + Y_n - A X_n$ (3)
[0053] Furthermore, to reduce the size of the neural network model so that it can still be used in edge computing scenarios with limited computing resources, this embodiment uses model quantization for neural network computation. Accordingly, during the activation process through linear fitting, the fitted line represented by Equation 3 needs to be quantized. The formula based on linear quantization is shown below:
[0054] $Q = R / S + Z$
[0055] Where R is the floating-point number, i.e., the actual data before quantization; Q is the fixed-point number obtained after quantization of the actual floating-point data; S is the quantization scale when quantizing the floating-point number, which represents the proportional relationship between the floating-point number and the fixed-point number before and after quantization. Optionally, the scaling factor can be calculated using the following formula:
[0056] $S = (R_{max} - R_{min}) / (Q_{max} - Q_{min})$
[0057] where $R_{max}$ is the maximum floating-point value used in the quantization calculation, $R_{min}$ is the minimum floating-point value, $Q_{max}$ is the maximum fixed-point value obtained by quantization, and $Q_{min}$ is the minimum quantized fixed-point value; S therefore expresses the ratio between the floating-point range and the fixed-point range before and after quantization. Z is the quantization zero point, representing the data offset introduced during quantization. The zero point Z can be expressed as:
[0058]
[0059] Here, `round` indicates a function for rounding numerical values. It should be noted that the formulas used to calculate S or Z are for illustrative purposes only, and this application does not impose any limitations on them.
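The linear quantization relations in [0054] to [0056] can be sketched as below. Because the expression for Z in [0058] is not reproduced in this text, the sketch assumes one common choice, Z = round(Q_min - R_min / S), which maps R_min onto Q_min. The helper names and the int16 example range are also assumptions.

```python
def quant_params(r_min, r_max, q_min, q_max):
    """Scale S as in [0056]; zero point Z is an assumed choice mapping r_min to q_min."""
    s = (r_max - r_min) / (q_max - q_min)
    z = round(q_min - r_min / s)
    return s, z


def quantize(r, s, z, q_min, q_max):
    # Q = R / S + Z, rounded to an integer and clamped to the fixed-point range
    return max(q_min, min(q_max, round(r / s + z)))


def dequantize(q, s, z):
    # Inverse mapping: R = (Q - Z) * S
    return (q - z) * s


# Example: map floating-point values in [-8, 8] onto the int16 range.
S, Z = quant_params(-8.0, 8.0, -32768, 32767)
q = quantize(1.25, S, Z, -32768, 32767)
print(S, Z, q, dequantize(q, S, Z))
```

Because Z is an integer, a value inside the range survives a quantize and dequantize round trip with an error of at most S / 2.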
[0060] Substituting the linear quantization formula into Equation 3, we obtain the expression for the quantized fitted line, as follows:
[0061] $(Q_y - Z_y) S_y = A S_x (Q_x - Z_x) + Y_n - A X_n$ (4)
[0062] By performing a mathematical transformation on formula 4, we can obtain:
[0063]
[0064] Formula 5 expresses the fitted line in terms of the quantized activation value $Q_y$; its two parameters are the fitting parameters stored in the second-level lookup table.
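Formula 5 itself is not reproduced in this text. As a sketch, one rearrangement of Equation 4 that isolates $Q_y$ as a linear function of $Q_x$ is shown below. Naming the slope and intercept a and c follows the names used in the Figure 4 example, but the stored integers there (a = 31101, c = -10743) suggest an additional fixed-point scaling that this derivation does not show.

```latex
% Start from Equation 4 and divide both sides by S_y:
Q_y - Z_y = \frac{A S_x}{S_y}\,(Q_x - Z_x) + \frac{Y_n - A X_n}{S_y}

% Collect the term that multiplies Q_x and the constant terms:
Q_y = a\,Q_x + c, \qquad
a = \frac{A S_x}{S_y}, \qquad
c = Z_y - \frac{A S_x Z_x}{S_y} + \frac{Y_n - A X_n}{S_y}
```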
[0065] Furthermore, each second-level interval in the second-level lookup table stores its corresponding fitting parameters. Therefore, once the target second-level index is determined, the corresponding target second-level interval in the second-level lookup table can be located, and the stored target fitting parameters can be read and used to calculate the activation value.
[0066] Illustratively, as shown in Figure 4, once the target second-level interval is determined, the target fitting parameters a and c stored for that interval are read, giving a = 31101 and c = -10743.
[0067] Step 203: Based on the target fitting parameters and the input value, determine the activation value corresponding to the input value. The activation value is a fixed-point number.
[0068] After the target fitting parameters are read, they are substituted together with the input value into the quantized fitted-line expression (Formula 5) to evaluate the activation function and obtain the activation value. Formula 5 is obtained by quantizing the fitted-line expression (Formula 3), so both the input value and the activation value are quantized fixed-point numbers, which reduces the computational resources required.
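The following code sketches how Formula 5 might be evaluated, assuming it takes the unscaled form obtained by rearranging Equation 4, that is a = A·S_x / S_y and c = Z_y - A·S_x·Z_x / S_y + (Y_n - A·X_n) / S_y. The example line parameters (A = 0.2, X_n = 1.0, Y_n = 0.73) and quantization parameters are hypothetical, and a hardware implementation would additionally store a and c as scaled integers.

```python
def fitting_params(A, Xn, Yn, Sx, Zx, Sy, Zy):
    """Slope a and intercept c of the quantized line (assumed unscaled form of Formula 5)."""
    a = A * Sx / Sy
    c = Zy - A * Sx * Zx / Sy + (Yn - A * Xn) / Sy
    return a, c


def activation_fixed(Qx, a, c, q_min=-32768, q_max=32767):
    """Q_y = a * Q_x + c, rounded to the nearest code and clamped to the int16 range."""
    return max(q_min, min(q_max, round(a * Qx + c)))


# Hypothetical quantization parameters: x in roughly [-8, 8], y in [0, 1], both int16.
Sx, Zx = 16 / 65535, 0
Sy, Zy = 1 / 65535, -32768

# Hypothetical second-level interval starting at (X_n, Y_n) = (1.0, 0.73) with slope A = 0.2.
A, Xn, Yn = 0.2, 1.0, 0.73
a, c = fitting_params(A, Xn, Yn, Sx, Zx, Sy, Zy)

x = 1.25
Qx = round(x / Sx + Zx)              # quantize the input
Qy = activation_fixed(Qx, a, c)      # fixed-point activation from the fitted line
y_float = A * x + Yn - A * Xn        # Equation 3 evaluated in floating point
Qy_ref = round(y_float / Sy + Zy)    # quantized reference value
print(Qx, Qy, Qy_ref)
```

The fixed-point result differs from the quantized floating-point reference only by the rounding of Q_x and Q_y, here at most a couple of output codes.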
[0069] In summary, in this embodiment, the computer device first determines the target primary interval corresponding to the input value in the primary lookup table, and then determines the target secondary interval based on the target interval parameters stored in the primary interval. Once the target secondary interval is determined, the computer device reads the target fitting parameters stored in the secondary interval and then calculates the activation value using a fitted linear expression. In this embodiment, the computer device performs activation calculation by sequentially searching the primary and secondary lookup tables. The secondary intervals in the secondary lookup table are obtained by further subdividing the primary intervals in the primary lookup table, improving the accuracy of linear fitting of the activation function interval. Using the primary lookup table to assist in reading the fitting parameters stored in the secondary lookup table reduces the storage requirements of the lookup tables. After obtaining the fitting parameters, the computer device performs activation calculation on the fixed-point input value based on the fitting parameters, ensuring the accuracy of the activation value while reducing the computational load of the activation calculation.
[0070] In the activation calculation based on the lookup table, when an input value is obtained, the first-level interval in the first-level lookup table corresponding to that input value is first determined. Each entry in the first-level lookup table corresponds to one first-level interval. In one possible implementation, the first-level intervals are obtained by uniformly dividing the input value range based on the first-level partitioning granularity; that is, the number of first-level intervals can be calculated using the following formula:
[0071] $S_{num} = Q_{range} / \Delta S$
[0072] where $S_{num}$ is the number of first-level intervals in the first-level lookup table; $Q_{range}$ is the range of input values, stored as the maximum input value within that range, and represents the span of the quantized fixed-point values, $Q_{range} = Q_{max} - Q_{min}$; and $\Delta S$ is the first-level partitioning granularity. Because determining the first-level interval for an input value does not require high precision, in one possible implementation the input value range of the first-level intervals can be represented with 8-bit data, so the maximum input value is $Q_{range} = 255$. Illustratively, with an input value range of 0 to 255 and a first-level partitioning granularity of 8, the first-level lookup table includes 32 first-level intervals.
[0073] Furthermore, based on the input value, the minimum value within the input value range, and the first-level partition granularity, the target first-level index corresponding to the input value is determined. In one possible implementation, the target first-level index can be calculated using the following formula:
[0074] target first-level index $= \mathrm{floor}\left((Q_x - Q_{min}) / \Delta S\right)$ (6)
[0075] where $Q_x$ is the input value, $Q_{min}$ is the minimum input value, and floor() is the floor function.
[0076] Illustratively, as shown in Figure 4, with an input value of 123, a minimum input value of 0, and a first-level partitioning granularity of 8, Formula 6 gives a target first-level index of floor(123 / 8) = 15.
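Formula 6 and the interval count in [0072] reduce to a few integer operations. The sketch below uses hypothetical function names. Rounding S_num up is an assumption made here so that the example range of 0 to 255 with granularity 8 yields the 32 intervals stated above.

```python
import math


def num_first_level_intervals(q_range, delta_s):
    # S_num = Q_range / ΔS, rounded up (assumption) so the last partial interval is covered
    return math.ceil(q_range / delta_s)


def first_level_index(q_x, q_min, delta_s):
    # Formula 6: floor((Q_x - Q_min) / ΔS); integer floor division for fixed-point inputs
    return (q_x - q_min) // delta_s


print(num_first_level_intervals(255, 8), first_level_index(123, 0, 8))
```

Because the granularity 8 is a power of two, the floor division of the non-negative offset can be done in hardware as a right shift by 3 bits; for example, 123 >> 3 gives 15.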
[0077] Furthermore, based on the one-to-one correspondence between the target first-level index and each first-level interval, once the target first-level index is calculated, the first-level interval corresponding to the target first-level index in the first-level lookup table is determined as the target first-level interval.
[0078] Since LUT tables are essentially data stored in memory, the process of reading the corresponding entry content through the target first-level index is equivalent to reading the data information stored at the memory address. In this embodiment, the target first-level index is used as an address, and the interval parameters stored at that address are the target interval parameters. In one possible implementation, the target interval parameters include the starting second-level index of the second-level interval within the target first-level interval, the number of target second-level intervals within the target first-level interval, and the starting value of the interval within the target first-level interval. The second-level intervals within each first-level interval are uniformly divided; that is, the second-level intervals are obtained by uniformly dividing the curves within each first-level interval.
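One way to picture a first-level entry holding the three interval parameters listed in [0078] is sketched below. The class and function names are hypothetical. Laying out the second-level entries contiguously, so that each starting second-level index is a running sum of the counts, is an assumption about the table layout and is not specified in this application. The "starting value" is read here as the smallest input covered by the first-level interval.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FirstLevelEntry:
    start_index: int   # second-level index of the first sub-interval in this interval
    count: int         # number of second-level intervals P in this interval
    start_value: int   # smallest input value covered by this first-level interval


def build_first_level_table(counts: List[int], q_min: int, delta_s: int) -> List[FirstLevelEntry]:
    """Lay out second-level entries contiguously; start_index is a running prefix sum."""
    table, next_index = [], 0
    for k, p in enumerate(counts):
        if p < 1:
            raise ValueError("each first-level interval needs at least one second-level interval")
        table.append(FirstLevelEntry(next_index, p, q_min + k * delta_s))
        next_index += p
    return table


table = build_first_level_table([1, 3, 2], q_min=0, delta_s=8)
print(table)
```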
[0079] In one possible implementation, the target second-level interval is determined based on the target interval parameters. For details, please refer to Figure 5, which shows a flowchart of determining a target second-level interval provided by an exemplary embodiment of this application.
[0080] Step 501: Based on the first-level partitioning granularity and the number of target second-level intervals, determine the second-level partitioning granularity of the second-level intervals in the target first-level intervals.
[0081] In this embodiment, to save storage space while ensuring the accuracy of the activation calculation, a suitable granularity must be chosen for the second-level intervals. Curve regions where the slope of the activation function changes significantly use a smaller granularity to determine the fitting parameters; that is, first-level intervals with larger slope changes should contain more second-level intervals, so that the curve segment within each second-level interval is as close to a straight line as possible. In the second-level lookup table, the numerical ranges covered by the second-level intervals are non-uniform overall, although the second-level intervals within each first-level interval are uniform. Because the second-level granularity varies with the degree of slope change within each first-level interval, the second-level granularity must be determined first when locating the target second-level interval from the first-level interval.
[0082] Each entry in the first-level lookup table stores at least a first-level index and the number of corresponding second-level intervals within the first-level interval. The calculation method for determining the second-level partition granularity based on the first-level partition granularity and the number of second-level intervals can be expressed by the following formula:
[0083] $\Delta P = \Delta S / P$ (7)
[0084] Where ΔP is the secondary partitioning granularity, ΔS is the primary partitioning granularity, and P is the number of secondary intervals. It should be noted that the primary intervals are obtained by uniformly dividing the activation curve; for the same primary lookup table, the primary partitioning granularity ΔS is a fixed value.
[0085] Illustratively, as shown in Figure 4, when the first-level partitioning granularity is 8 and P = 3 is read from the first-level lookup table, Formula 7 gives a second-level partitioning granularity of 8/3 within the target first-level interval.
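The description of Figure 5 in this section ends at Step 501, so the following is only a hedged sketch of how the target second-level index might then be located. It assumes the offset within the target first-level interval is floor((Q_x - start value) / ΔP), added to the starting second-level index. All names are hypothetical.

```python
from fractions import Fraction


def second_level_granularity(delta_s, p):
    # Formula 7: ΔP = ΔS / P, kept exact because 8 / 3 is not an integer
    return Fraction(delta_s, p)


def second_level_index(q_x, start_value, start_index, p, delta_s):
    """Assumed lookup: which of the P equal sub-intervals holds q_x.

    floor((q_x - start_value) / ΔP) equals floor((q_x - start_value) * P / ΔS),
    so the integer form below never divides by a fractional granularity.
    """
    offset = ((q_x - start_value) * p) // delta_s
    return start_index + min(offset, p - 1)  # defensive clamp for inputs past the interval end


print(second_level_granularity(8, 3), second_level_index(123, 120, 40, 3, 8))
```

With the Figure 4 numbers (input 123, a first-level interval starting at 15 × 8 = 120, ΔS = 8, and P = 3), the offset is floor(3 × 3 / 8) = 1, that is, the second sub-interval. The starting second-level index 40 is an arbitrary placeholder.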
[0086] In the above process, the second-level partitioning granularity is determined from the number of second-level intervals stored for the target first-level interval. A more accurate activation value requires a better linear fit. In practice this means a finer second-level granularity, and therefore more second-level intervals, in first-level intervals where the curve slope changes sharply. In one possible implementation, the number of second-level intervals in each first-level interval is positively correlated with the rate of change of the second derivative of the activation function within that first-level interval.
[0087] Accordingly, when the secondary lookup table is constructed, i points on the activation function curve are selected as reference points in each primary interval, and the second derivative at each reference point is calculated. The i values in each primary interval are then summed. This sum characterizes how much the slope of the curve changes within the primary interval: the larger the sum of the second derivatives, the faster the slope of the activation curve changes in that primary interval. Given the predetermined total number of secondary intervals, the number of secondary intervals allotted to each primary interval is set in proportion to these sums. As a result, primary intervals with larger slope changes are divided into more secondary intervals and so receive a finer secondary granularity. Primary intervals with smaller slope changes are divided into fewer secondary intervals, which saves storage space in the secondary lookup table. This produces a better fit for a bounded total number of secondary intervals.
[0088] It should be noted that, due to limitations in hardware calculation methods, the total number of secondary intervals must be a power of 2.
[0089] Illustratively, when the secondary lookup table is constructed, the activation curve is divided into 32 primary intervals. Ten points are selected within each primary interval, and the second derivatives at these points are calculated and summed. With a total of 128 secondary intervals, the number of secondary intervals in each primary interval is set in proportion to the sum of the second derivatives in that primary interval. One way to allocate proportionally is to multiply the total number of secondary intervals by each primary interval's share and round the result. Another is to rank the primary intervals by their sums of second derivatives and assign secondary intervals according to predefined proportions. These methods are illustrative only and do not limit this application.
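The proportional allocation described in [0087] to [0089] can be sketched as follows, using the example values of 32 primary intervals, 10 reference points each, and 128 secondary intervals in total. Several details are our own assumptions and do not come from the source:

- sigmoid serves as the example activation function;
- the absolute value of the second derivative is summed, so regions of positive and negative curvature both count as slope change;
- reference points sit at the centres of equal sub-steps;
- every primary interval gets at least one secondary interval;
- a largest-remainder step, which is only one of many possible rounding rules, makes the counts add up exactly to the power-of-2 total.

```python
import math

def sigmoid_second_derivative(x: float) -> float:
    # Example activation (assumption): sigmoid, whose s'' = s(1 - s)(1 - 2s)
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s) * (1.0 - 2.0 * s)

def allocate_secondary_intervals(f2, lo, hi, num_primary=32, points_per_primary=10,
                                 total_secondary=128):
    """Split total_secondary intervals among num_primary uniform primary intervals
    in proportion to the summed |f''| at the reference points of each one."""
    if total_secondary <= 0 or total_secondary & (total_secondary - 1):
        raise ValueError("total number of secondary intervals must be a power of 2")
    if total_secondary < num_primary:
        raise ValueError("need at least one secondary interval per primary interval")
    width = (hi - lo) / num_primary
    step = width / points_per_primary
    weights = []
    for k in range(num_primary):
        start = lo + k * width
        # i reference points at the centres of equal sub-steps of this primary interval
        weights.append(sum(abs(f2(start + (j + 0.5) * step))
                           for j in range(points_per_primary)))
    total_w = sum(weights)
    if total_w == 0.0:  # a straight line: fall back to a uniform split
        weights, total_w = [1.0] * num_primary, float(num_primary)
    spare = total_secondary - num_primary  # one interval per primary is reserved
    raw = [spare * w / total_w for w in weights]
    counts = [1 + math.floor(r) for r in raw]
    # Largest-remainder rounding so that the counts sum exactly to total_secondary
    leftover = total_secondary - sum(counts)
    by_fraction = sorted(range(num_primary), key=lambda k: raw[k] - math.floor(raw[k]),
                         reverse=True)
    for k in by_fraction[:leftover]:
        counts[k] += 1
    return counts

counts = allocate_secondary_intervals(sigmoid_second_derivative, -8.0, 8.0)
print(sum(counts))  # 128
```

For sigmoid, the largest counts land in the primary intervals around |x| ≈ 1.3, where the curvature peaks. The flat tails keep only one or two secondary intervals.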
[0090] Step 502: Determine the index offset based on the input value, the starting value of the target first-level interval, and the second-level partitioning granularity.
[0091] Each entry in the first-level lookup table stores at least the first-level index, the starting value of the interval, the starting second-level index, and the number of second-level intervals. The starting value indicates the initial value of the input value range within the target first-level interval, and the second-level partitioning granularity indicates the granularity applicable when further dividing the target first-level interval into second-level intervals. The calculation process for determining the index offset based on the input value, the starting value of the interval, and the second-level partitioning granularity can be expressed by the following formula:
[0092] $\Delta E = \left\lfloor \dfrac{Q_x - Q_0}{\Delta P} \right\rfloor$ (8)
[0093] Where ΔE is the index offset, Q_x is the input value, Q_0 is the starting value of the target first-level interval, and ΔP is the second-level partitioning granularity within the target first-level interval. ⌊·⌋ denotes the floor (round-down) operation.
[0094] Illustratively, as shown in Figure 4, for an input value of 123 the target first-level index is determined to be 15. Then Q0 = 120 and P = 3 are read from the corresponding first-level interval. With a first-level partitioning granularity of 8, the second-level partitioning granularity ΔP is 8 / 3, and Formula 8 gives an index offset ΔE of 1.
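The index-offset step, ΔE = floor((Q_x − Q_0) / ΔP), can be checked against the Figure 4 numbers with the sketch below. It also shows an equivalent integer-only form. Because ΔP = ΔS / P, we have floor((Q_x − Q_0) / ΔP) = floor((Q_x − Q_0) · P / ΔS), which needs only a multiply and an integer division. That division becomes a right shift when ΔS is a power of 2, as 8 is here. This rewriting is our observation rather than something the source states, and both function names are hypothetical.

```python
import math
from fractions import Fraction

def index_offset(q_x: int, q_0: int, delta_p: Fraction) -> int:
    """ΔE = floor((Q_x - Q_0) / ΔP), evaluated with exact rationals."""
    return math.floor((q_x - q_0) / delta_p)

def index_offset_int(q_x: int, q_0: int, delta_s: int, p: int) -> int:
    """Same value without fractions: floor((Q_x - Q_0) * P / ΔS)."""
    return ((q_x - q_0) * p) // delta_s

# Figure 4: Q_x = 123, Q_0 = 120, ΔS = 8, P = 3 (so ΔP = 8/3)
print(index_offset(123, 120, Fraction(8, 3)))  # 1
print(index_offset_int(123, 120, 8, 3))        # 1
```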
[0095] Step 503: Determine the target secondary index based on the index offset and the starting secondary index.
[0096] Here, the index offset represents the deviation between the starting secondary index stored in the target primary interval and the target secondary index corresponding to the input value. Furthermore, the process of calculating and determining the target secondary index can be expressed by the following formula:
[0097] E=ΔE+E0 (9)
[0098] Where E is the target secondary index, E0 is the starting secondary index read from the target primary interval, and ΔE is the index offset calculated in step 502.
[0099] Illustratively, as shown in Figure 4, each first-level interval of the first-level lookup table stores the first-level index, the interval starting value, the starting second-level index, and the number of second-level intervals. For an input value of 123 and a first-level partitioning granularity of 8, the target first-level index is 15. The corresponding target first-level interval gives an interval starting value of 120, a starting second-level index of 60, and 3 second-level intervals. The index offset is calculated to be 1, so Formula 9 gives a target second-level index of 61.
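The first-level lookup and the offset and index formulas can be chained as in the sketch below, which reproduces the Figure 4 walk-through. The offset uses the integer form floor((Q_x − Q_0) · P / ΔS). This equals floor((Q_x − Q_0) / ΔP) because ΔP = ΔS / P. Some details are assumptions and not fixed by the source:

- the first-level entry is modelled as a hypothetical `PrimaryEntry` record that holds only the fields listed in [0091];
- the minimum of the input range is taken to be 0, which is consistent with index 15 starting at 120 when ΔS = 8.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PrimaryEntry:
    start: int      # interval starting value Q_0
    first_sec: int  # starting secondary index E_0
    count: int      # number of secondary intervals P

def target_secondary_index(q_x: int, q_min: int, delta_s: int, table: dict) -> int:
    k = (q_x - q_min) // delta_s                              # target first-level index
    entry = table[k]
    delta_e = ((q_x - entry.start) * entry.count) // delta_s  # index offset ΔE
    return entry.first_sec + delta_e                          # Formula 9: E = ΔE + E_0

# Only the Figure 4 entry (index 15) is filled in for this example
table = {15: PrimaryEntry(start=120, first_sec=60, count=3)}
print(target_secondary_index(123, 0, 8, table))  # 61
```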
[0100] Step 504: Determine the second-level interval corresponding to the target second-level index in the second-level lookup table as the target second-level interval.
[0101] Secondary indices correspond one-to-one to secondary intervals, and the secondary lookup table stores this mapping. Therefore, once the target secondary index has been determined from the target interval parameters, the secondary interval corresponding to that index in the secondary lookup table is taken as the target secondary interval. Like the primary lookup table, the secondary lookup table is essentially a RAM (Random Access Memory), so the target secondary index is in effect a lookup address. Given this address, the computer device can read the target fitting parameters stored in the target secondary interval from the secondary lookup table.
[0102] Step 505: Obtain the target fitting parameters corresponding to the target second-level interval from the second-level lookup table.
[0103] The second-level lookup table stores the fitting parameters obtained by linearly fitting the activation function in tabular form. The target second-level interval serves as the target interval in the second-level lookup table, storing the target fitting parameters used to determine the activation value. Furthermore, the computer device can read the target fitting parameters from the target second-level interval.
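The sketch below shows one possible layout of the two tables that turns Steps 504 and 505 into plain array reads. It assumes, without the source saying so, that the secondary intervals of consecutive primary intervals are stored back to back. Under that assumption, each starting secondary index E_0 is the running total of the counts before it. The per-primary counts and the `(slope, intercept)` strings are placeholders. With 4 intervals per primary interval ahead of index 15, E_0 for index 15 comes out as 60, which matches Figure 4. Function names are hypothetical.

```python
from itertools import accumulate

def build_primary_table(q_min, delta_s, counts):
    """Return one (Q_0, E_0, P) tuple per primary interval; E_0 is a prefix sum of the counts."""
    starts = [0, *accumulate(counts)][:-1]
    return [(q_min + k * delta_s, e0, p) for k, (e0, p) in enumerate(zip(starts, counts))]

def read_fitting_parameters(q_x, q_min, delta_s, primary, secondary):
    q_0, e_0, p = primary[(q_x - q_min) // delta_s]  # first-level lookup
    e = e_0 + ((q_x - q_0) * p) // delta_s           # index offset plus starting index
    return e, secondary[e]                           # Steps 504-505: read at address e

counts = [4] * 15 + [3] + [4] * 15 + [5]             # 128 secondary intervals in total
primary = build_primary_table(0, 8, counts)
secondary = [(f"a{e}", f"c{e}") for e in range(sum(counts))]  # placeholder parameters
print(primary[15])                                   # (120, 60, 3)
print(read_fitting_parameters(123, 0, 8, primary, secondary))  # (61, ('a61', 'c61'))
```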
[0104] In summary, in this embodiment the computer device first determines the target primary interval from the input value by searching a primary lookup table of uniform granularity. It then uses the target interval parameters stored in the primary lookup table to locate the target secondary interval in a secondary lookup table of non-uniform granularity, so the primary table assists the search. Even though the secondary intervals are non-uniform, the target secondary interval can be located quickly without additional hardware such as comparators. This saves storage space for the secondary intervals, reduces device power consumption, and makes searching and reading a non-uniform lookup table practical.
[0105] The aforementioned target secondary interval is used as a storage unit, which stores the target fitting parameters. As shown in step 203, the expression for the quantized fitting line is:
[0106]
[0107] Based on the above expression, the slope parameter a and intercept parameter c in the fitting parameters can be determined as follows:
[0108]
[0109] In the embodiments of this application, the lookup process is implemented in an NPU. The NPU can only store integer data, whereas the fitting parameters of the line obtained by fitting and quantizing the activation function are generally not integers. Therefore, in one possible implementation, the fitting parameters in the secondary lookup table undergo parameter amplification. That is, each fitting parameter is stored together with a corresponding scaling factor, which indicates the factor by which that parameter was amplified. The fitting parameters include a slope parameter and an intercept parameter: the slope parameter is amplified by a first scaling factor, and the intercept parameter by a second scaling factor. The amplified fitting parameters can be expressed as:
[0110]
[0111] Here, $2^{n_a}$ is the first scaling factor, corresponding to the slope parameter, and $2^{n_c}$ is the second scaling factor, corresponding to the intercept parameter. A larger scaling factor retains more precision in the fitting parameters. The computer device should therefore choose the largest scaling factor for which the amplified parameters still fit within the hardware data bit width.
[0112] Illustratively, as shown in Figure 4, when the fitting parameters are stored in the secondary lookup table, the slope parameter is multiplied by $2^{15}$ and the intercept parameter by $2^{23}$. That is, the first scaling factor is $2^{15}$ and the second scaling factor is $2^{23}$.
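The sketch below shows one way to apply the rule of choosing the largest scaling factor that still fits the data bit width. The following are assumptions made for illustration: signed bit widths, a single exponent per parameter type shared across the whole table, and round-to-nearest when converting to integers. The slope values and function names are made up.

```python
def largest_scale_exponent(values, bits, max_exp=62):
    """Largest n such that every round(v * 2**n) fits in a signed `bits`-bit integer."""
    limit = (1 << (bits - 1)) - 1
    best = None
    for n in range(max_exp + 1):
        if all(abs(round(v * (1 << n))) <= limit for v in values):
            best = n  # magnitudes grow with n, so the first failure ends the search
        else:
            break
    if best is None:
        raise ValueError("values do not fit even without amplification")
    return best

def amplify(values, n):
    """Store each fitting parameter as the integer round(v * 2**n)."""
    return [round(v * (1 << n)) for v in values]

slopes = [0.7, 0.25, -0.01]  # illustrative slope parameters
n_a = largest_scale_exponent(slopes, bits=16)
print(n_a, amplify(slopes, n_a))  # 15 [22938, 8192, -328]
```

The same procedure would then be run separately over the intercepts, with whatever width the hardware allots to them.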
[0113] Accordingly, regarding the method of determining activation values based on target fitting parameters, in one possible implementation, the computer device determines the activation value corresponding to the input value based on the target fitting parameters, the scaling factor corresponding to the target fitting parameters, and the input value.
[0114] Furthermore, the target fitting parameters include slope and intercept parameters, with the slope parameter obtained by scaling up using a first scaling factor and the intercept parameter obtained by scaling up using a second scaling factor. In determining the activation value, the computer device needs to determine the activation value corresponding to the input value based on the input value, slope parameter, first scaling factor, intercept parameter, and second scaling factor. That is, the activation calculation process performed by the computer device can be represented by the following formula:
[0115]
[0116] Here, $2^{n_a}$ is the first scaling factor and $2^{n_c}$ is the second scaling factor. The operations involving the first and second scaling factors in Formula 10 are implemented as right shifts. A right shift rounds the result down, which introduces an error compared with round-to-nearest. Adding 0.5 to the intercept parameter during the calculation offsets this error and improves the accuracy of the activation value.
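The exact form of Formula 10 is not given here, so the sketch below makes an assumption about it. It takes the formula to compute y = a·Q_x + c from the amplified integers a_s = a·2^{n_a} and c_s = c·2^{n_c}, plus the 0.5 correction. To keep the rounding exact, the sketch aligns the slope term to the intercept's scale, which assumes n_c ≥ n_a, and then performs a single right shift. This deliberately fuses the multiply, shift and add described in [0117] into one step. The function name and this ordering are our assumptions and not the source's. Adding 2^{n_c − 1} before the shift is the integer counterpart of adding 0.5.

```python
def lut_activation(q_x: int, a_s: int, n_a: int, c_s: int, n_c: int) -> int:
    """Round-half-up of a * q_x + c, where a = a_s / 2**n_a and c = c_s / 2**n_c."""
    if n_c < n_a or n_c < 1:
        raise ValueError("this sketch assumes n_c >= n_a and n_c >= 1")
    acc = ((a_s * q_x) << (n_c - n_a)) + c_s  # a*q_x + c, scaled by 2**n_c
    acc += 1 << (n_c - 1)                      # + 0.5 compensates the floor of >>
    return acc >> n_c                          # Python's >> is an arithmetic (floor) shift

# a = 0.5 and c = 0.25 stored with the Figure 4 factors 2**15 and 2**23
print(lut_activation(3, 1 << 14, 15, 1 << 21, 23))  # 2  (0.5*3 + 0.25 = 1.75)
```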
[0117] Optionally, the computer device can programmatically construct a new calculation module to complete the activation calculation based on the calculation process in Formula 10, which involves performing multiplication, then right shift, and finally addition. Alternatively, it can perform mathematical transformations on Formula 10 to reuse the existing quant module to complete the activation function calculation.
[0118] In summary, in this embodiment, the fitted straight line is obtained by quantizing the activation function. Calculations based on input values in fixed-point format yield activation values of the same fixed-point type, improving computational efficiency. During the quantization calculation process, based on the lookup table and the fitted straight line expression, the computer device can complete the activation calculation without additional quantization and dequantization processes, significantly saving computation time by utilizing NPU computing power.
[0119] Please refer to Figure 6, which shows a structural block diagram of an activation device for a neural network provided in an exemplary embodiment of this application. The device includes:
[0120] A first lookup table unit 601, a second lookup table unit 602, and a calculation unit 603, with the calculation unit 603 connected to both the first lookup table unit 601 and the second lookup table unit 602. The first lookup table unit 601, the second lookup table unit 602, and the calculation unit 603 are all hardware units within the NPU.
[0121] The first lookup table unit 601 is used to store a first-level lookup table. The first-level lookup table contains the correspondence between the first-level interval and the interval parameter. The first-level interval is obtained by dividing the range of input values. The input value is a fixed-point number, and the range of input values is a range of fixed-point numbers.
[0122] The first lookup table unit 601 can be built as a register file, and the first-level lookup table is essentially a RAM. The second-level lookup table stored in the second lookup table unit 602 is non-uniform. To avoid the extra power consumption of additional hardware such as comparison units, this embodiment stores a first-level lookup table of first-level intervals and interval parameters in the first lookup table unit 601. The interval parameters are used to determine the target second-level index corresponding to the input value. This helps the calculation unit 603 search the second-level lookup table in the second lookup table unit 602 based on the input value.
[0123] The second lookup table unit 602 is used to store a secondary lookup table, which contains the correspondence between secondary intervals and fitting parameters. The fitting parameters are the parameters of the linear fitting of the activation function corresponding to the secondary interval. Each primary interval is divided into at least one secondary interval.
[0124] The calculation unit 603 is used to determine the target first-level interval to which the input value belongs from the first-level lookup table; based on the target interval parameters of the target first-level interval, determine the target fitting parameters of the target second-level interval to which the input value belongs from the second-level lookup table, wherein the target second-level interval belongs to the target first-level interval; and based on the target fitting parameters and the input value, determine the activation value corresponding to the input value, wherein the activation value is a fixed point number.
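The division of labour among units 601, 602 and 603 can be modelled in software as below. This is only a behavioural sketch under stated assumptions:

- the class names are hypothetical;
- each table is a Python list standing in for RAM;
- first-level entries are (Q_0, E_0, P) tuples;
- secondary entries are amplified (slope, intercept) integer pairs;
- the calculation unit uses a single-shift round-half-up evaluation of a·Q_x + c with n_c ≥ n_a.

```python
class FirstLookupTableUnit:
    """Holds the first-level table: one (Q_0, E_0, P) entry per primary interval."""
    def __init__(self, entries):
        self.entries = list(entries)

class SecondLookupTableUnit:
    """Holds the second-level table: one amplified (a_s, c_s) pair per secondary interval."""
    def __init__(self, params):
        self.params = list(params)

class CalculationUnit:
    def __init__(self, lut1, lut2, q_min, delta_s, n_a, n_c):
        self.lut1, self.lut2 = lut1, lut2
        self.q_min, self.delta_s, self.n_a, self.n_c = q_min, delta_s, n_a, n_c

    def activate(self, q_x: int) -> int:
        # First-level lookup, then index offset plus starting secondary index
        q_0, e_0, p = self.lut1.entries[(q_x - self.q_min) // self.delta_s]
        a_s, c_s = self.lut2.params[e_0 + ((q_x - q_0) * p) // self.delta_s]
        # a*q_x + c + 0.5 at scale 2**n_c, then one arithmetic right shift
        acc = ((a_s * q_x) << (self.n_c - self.n_a)) + c_s + (1 << (self.n_c - 1))
        return acc >> self.n_c

# Four primary intervals of width 8, split into 1, 2, 1 and 4 secondary intervals
counts, entries, e_0 = [1, 2, 1, 4], [], 0
for k, p in enumerate(counts):
    entries.append((8 * k, e_0, p))
    e_0 += p
identity = [(1 << 15, 0)] * e_0  # every segment fits y = x
unit = CalculationUnit(FirstLookupTableUnit(entries), SecondLookupTableUnit(identity),
                       q_min=0, delta_s=8, n_a=15, n_c=23)
print([unit.activate(x) for x in (0, 9, 31)])  # [0, 9, 31]
```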
[0125] Optionally, the computing unit 603 is further configured to:
[0126] Based on the input value, the minimum value of the input value range, and the first-level partitioning granularity, determine the target first-level index corresponding to the input value;
[0127] The first-level interval corresponding to the target first-level index in the first-level lookup table is determined as the target first-level interval.
[0128] Optionally, the computing unit 603 is further configured to:
[0129] Based on the first-level partitioning granularity and the number of target second-level intervals, determine the second-level partitioning granularity of the second-level intervals within the target first-level intervals;
[0130] Based on the input value, the starting value of the target first-level interval, and the second-level partitioning granularity, the index offset is determined;
[0131] Based on the index offset and the starting secondary index, determine the target secondary index;
[0132] The second-level interval corresponding to the target second-level index in the second-level lookup table is determined as the target second-level interval;
[0133] Obtain the target fitting parameters corresponding to the target second-level interval from the second-level lookup table.
[0134] Optionally, the computing unit 603 is further configured to:
[0135] Based on the target fitting parameters, the scaling factor corresponding to the target fitting parameters, and the input value, the activation value corresponding to the input value is determined.
[0136] Optionally, the computing unit 603 is further configured to:
[0137] The activation value corresponding to the input value is determined based on the input value, the slope parameter, the first scaling factor, the intercept parameter, and the second scaling factor.
[0138] In summary, in this embodiment, upon obtaining an input value, the calculation unit computes the corresponding target primary index and uses it to determine the target primary interval in the first lookup table unit. The calculation unit then obtains the target interval parameters stored in the first lookup table unit and computes the target secondary index from them, thereby determining the target secondary interval. Finally, the calculation unit performs the activation calculation using the target fitting parameters read from the second lookup table unit to obtain the activation value. Because the first lookup table unit stores a primary lookup table that helps locate the target secondary interval, a non-uniform secondary lookup table can be used without increasing power consumption. Using the secondary lookup table for activation calculations also preserves accuracy while minimizing storage usage. In addition, it allows activation calculations on quantized data to be performed in the NPU, improving computational efficiency and resource utilization.
[0139] This application also provides an NPU. The NPU includes programmable logic circuitry and / or program instructions which, when the NPU runs, are used to implement the neural network activation method described in the above embodiments.
[0140] This application also provides a computer-readable storage medium storing at least one instruction, which is loaded and executed by a processor to implement the neural network activation method described in any of the above embodiments.
[0141] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.
[0142] The above description is merely an optional embodiment of this application and is not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.
Claims
1. An activation method for a neural network, characterized in that the method is used in a neural network processor (NPU) whose hardware units include a first lookup table unit, a second lookup table unit, and a calculation unit. The first lookup table unit stores a first-level lookup table, and the second lookup table unit stores a second-level lookup table. The method includes: the calculation unit determines the target first-level index corresponding to the input value based on the input value, the minimum value of the input value range, and the first-level partitioning granularity, and determines the first-level interval corresponding to the target first-level index in the first-level lookup table as the target first-level interval. The first-level lookup table contains the correspondence between first-level intervals and interval parameters. The first-level intervals are obtained by uniformly dividing the input value range based on the first-level partitioning granularity. The input value is a fixed-point number, and the input value range is a fixed-point number range. The calculation unit determines, from the secondary lookup table and based on the target interval parameters of the target primary interval, the target fitting parameters of the target secondary interval to which the input value belongs. The target secondary interval belongs to the target primary interval, and each primary interval is divided into at least one secondary interval. The secondary lookup table contains the correspondence between the secondary intervals and the fitting parameters. The fitting parameters are the parameters for linear fitting of the activation function corresponding to the secondary interval. The secondary intervals within each primary interval are evenly divided.
The target interval parameters include the starting secondary index of the secondary intervals in the target primary interval, the number of target secondary intervals in the target primary interval, and the interval starting value of the target primary interval. The number of secondary intervals in each primary interval is positively correlated with the rate of change of the second derivative of the activation function within the primary interval. The calculation unit determines the activation value corresponding to the input value based on the target fitting parameters and the input value, wherein the activation value is a fixed-point number.
2. The method according to claim 1, characterized in that the step of determining the target fitting parameters of the target second-level interval to which the input value belongs from the second-level lookup table based on the target interval parameters of the target first-level interval includes: determining, based on the first-level partitioning granularity and the number of target second-level intervals, the second-level partitioning granularity of the second-level intervals within the target first-level interval; determining the index offset based on the input value, the starting value of the target first-level interval, and the second-level partitioning granularity; determining the target secondary index based on the index offset and the starting secondary index; determining the second-level interval corresponding to the target second-level index in the second-level lookup table as the target second-level interval; and obtaining the target fitting parameters corresponding to the target second-level interval from the second-level lookup table.
3. The method according to claim 1, characterized in that the fitting parameters in the secondary lookup table have undergone parameter amplification processing, and the step of determining the activation value corresponding to the input value based on the target fitting parameters and the input value includes: determining the activation value corresponding to the input value based on the target fitting parameters, the scaling factor corresponding to the target fitting parameters, and the input value.
4. The method according to claim 3, characterized in that the target fitting parameters include a slope parameter and an intercept parameter, the slope parameter being obtained by scaling up based on a first scaling factor and the intercept parameter being obtained by scaling up based on a second scaling factor, and the step of determining the activation value corresponding to the input value based on the target fitting parameters, the scaling factor corresponding to the target fitting parameters, and the input value includes: determining the activation value corresponding to the input value based on the input value, the slope parameter, the first scaling factor, the intercept parameter, and the second scaling factor.
5. The method according to claim 1, characterized in that the fitting parameters are determined using the least squares method.
6. An activation device for a neural network, characterized in that the device comprises a first lookup table unit, a second lookup table unit, and a calculation unit, wherein the calculation unit is connected to the first lookup table unit and the second lookup table unit respectively, and the first lookup table unit, the second lookup table unit, and the calculation unit are all hardware units in a neural network processor (NPU). The first lookup table unit is used to store a first-level lookup table. The first-level lookup table contains the correspondence between first-level intervals and interval parameters. The first-level intervals are obtained by uniformly dividing the input value range based on the first-level partitioning granularity. The input value is a fixed-point number and the input value range is a fixed-point number range. The second lookup table unit is used to store a secondary lookup table, which contains the correspondence between secondary intervals and fitting parameters. The fitting parameters are the parameters for linear fitting of the activation function corresponding to the secondary interval. Each primary interval is divided into at least one secondary interval. The secondary intervals within each primary interval are evenly divided, and the number of secondary intervals within each primary interval is positively correlated with the rate of change of the second derivative of the activation function within the primary interval.
The calculation unit is configured to: determine the target first-level index corresponding to the input value based on the input value, the minimum value of the input value range, and the first-level partitioning granularity; determine the first-level interval corresponding to the target first-level index in the first-level lookup table as the target first-level interval; determine the target fitting parameters of the target second-level interval to which the input value belongs from the second-level lookup table based on the target interval parameters of the target first-level interval, wherein the target second-level interval belongs to the target first-level interval, and the target interval parameters include the starting second-level index of the second-level interval in the target first-level interval, the number of target second-level intervals in the target first-level interval, and the interval starting value of the target first-level interval; and determine the activation value corresponding to the input value based on the target fitting parameters and the input value, wherein the activation value is a fixed-point number.
7. The apparatus according to claim 6, characterized in that the calculation unit is further configured to: determine, based on the first-level partitioning granularity and the number of target second-level intervals, the second-level partitioning granularity of the second-level intervals within the target first-level interval; determine the index offset based on the input value, the starting value of the target first-level interval, and the second-level partitioning granularity; determine the target secondary index based on the index offset and the starting secondary index; determine the second-level interval corresponding to the target second-level index in the second-level lookup table as the target second-level interval; and obtain the target fitting parameters corresponding to the target second-level interval from the second-level lookup table.
8. The apparatus according to claim 6, characterized in that the fitting parameters in the secondary lookup table have undergone parameter amplification processing, and the calculation unit is further configured to determine the activation value corresponding to the input value based on the target fitting parameters, the scaling factor corresponding to the target fitting parameters, and the input value.
9. A neural network processor (NPU), characterized in that the NPU includes programmable logic circuits and / or program instructions which, when the NPU is running, are used to implement the activation method of the neural network as described in any one of claims 1 to 5.
10. A neural network processor (NPU), characterized in that the NPU includes an activation device for the neural network as described in any one of claims 6 to 8.
11. A computer device, characterized in that the computer device includes a processor and a memory, the processor being configured to load and execute a program stored in the memory to implement the neural network activation method as described in any one of claims 1 to 5.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one program, which is loaded and executed by a processor to implement the neural network activation method as described in any one of claims 1 to 5.
13. A computer program product, characterized in that the computer program product includes computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the neural network activation method as described in any one of claims 1 to 5.