Neural network operation method and apparatus, chip, electronic device, and storage medium
By splitting the convolutional kernel into sub-kernel groups and rearranging and accumulating the data, the data inflation problem caused by img2col is solved, improving the efficiency of neural network computation and reducing power consumption.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ZTE CORP
- Filing Date
- 2021-12-03
- Publication Date
- 2026-06-26
AI Technical Summary
When performing convolution operations, existing neural network accelerators using the img2col method cause input data inflation, increasing data access volume and dynamic power consumption, thus affecting computational efficiency.
The convolution kernel of the neural network is split into sub-convolution kernel groups, and convolution operations are performed by data rearrangement and accumulation, avoiding img2col conversion and performing calculations directly on the original input data.
This eliminates the hardware design overhead, the additional data access and memory traffic, and the extra dynamic power consumption caused by img2col, thereby improving computing efficiency.
[Figure CN116306840B_ABST: abstract drawing]
Description
Technical Field
[0001] This application relates to the field of data computing, and in particular to a neural network computing method, apparatus, chip, electronic device, and storage medium.
Background Technology
[0002] 90% of the computation in neural networks is in convolution and fully connected operations. Fully connected operations are essentially a special type of convolution operation. Convolution operations are currently basically converted into matrix operations, implemented through systolic arrays or General Matrix Multiplication (GEMM). Current research on neural networks mainly focuses on how to efficiently implement multiplication and addition operations in convolution, while ignoring the impact of data access on computational efficiency and the increased power consumption caused by memory access.
[0003] To facilitate scheduling, existing neural network accelerators typically use the `img2col` method to arrange weights and activation data. After both the weights and the input data undergo `img2col`, the two matrices are fed to the matrix operation unit, and the product of the two matrices is the output of the neural network convolution. Applying `img2col` to the weights does not increase the data size; it only requires data rearrangement. Furthermore, since weights can be arranged offline, `img2col` on the weights incurs no additional overhead. However, applying `img2col` to the input data significantly increases the input data volume because of the sliding window of the convolution. As shown in Figure 1, the original input is an image with W=10 and H=10, so the total data size is 10*10 = 100. After `img2col`, the data size is 64*9 = 576, which is nearly 6 times the original size. If the input size (W*H) is larger, the theoretical data inflation approaches the convolution kernel size Kw*Kh. The `img2col` method can be implemented in software or hardware, but either way it increases access to the input data, leading to increased dynamic power consumption. Furthermore, since neural network computations are inherently memory-constrained, this increase in data volume also degrades performance.
Summary of the Invention
[0004] The main objective of this application is to provide a neural network operation method, apparatus, electronic device, and storage medium. The aim is to eliminate the hardware design overhead, increased data access volume, and increased dynamic power consumption caused by img2col.
[0005] To achieve the above objectives, this application provides a neural network operation method, comprising: acquiring input data for a neural network operation and Wk*Hk sub-convolutional kernel groups, and proceeding to an operation step; wherein the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into the Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolutional kernels in each sub-convolutional kernel group correspond to the same portion of the input data; the operation step includes: rearranging the input data according to the data rearrangement method corresponding to each sub-convolutional kernel group to obtain rearranged input data corresponding to each sub-convolutional kernel group; convolving each sub-convolutional kernel group with its corresponding rearranged input data to obtain a convolution result corresponding to each sub-convolutional kernel group; accumulating the convolution results corresponding to the sub-convolutional kernel groups to obtain an accumulation result, and using the data located at the valid positions in the accumulation result as the output result of the neural network operation; wherein, in the rearranged input data corresponding to each sub-convolutional kernel group, the partial input data corresponding to that group occupies the same data positions for every group, and these shared data positions are the valid positions.
[0006] To achieve the above objectives, this application also provides a neural network operation method, including: acquiring input data for a neural network operation and Wk*Hk sub-convolutional kernel groups, and proceeding to an operation step; wherein the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into the Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels; N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolutional kernels in each sub-convolutional kernel group correspond to the same portion of the input data; the operation step includes: convolving each sub-convolutional kernel group with the input data to obtain the convolution result corresponding to each sub-convolutional kernel group; rearranging the convolution result corresponding to each sub-convolutional kernel group according to the data rearrangement method corresponding to that group to obtain the rearranged convolution result corresponding to each sub-convolutional kernel group; accumulating the rearranged convolution results corresponding to the sub-convolutional kernel groups to obtain an accumulation result, and using the data located at the valid positions in the accumulation result as the output result of the neural network operation; wherein convolving each sub-convolutional kernel group with the partial input data corresponding to that group yields the valid convolution result corresponding to that group, the valid convolution results in the rearranged convolution results corresponding to the sub-convolutional kernel groups occupy the same data positions, and these shared data positions are the valid positions.
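The convolve-then-rearrange variant described in [0006] can be illustrated with a minimal NumPy sketch. It assumes stride 1, no padding, an (H, W, C) input layout and an (N, Hk, Wk, C) kernel layout; the function name, the layouts, and the use of `np.roll` as the rearrangement are illustrative assumptions and are not taken from this application. Each 1*1*C group is convolved with the unmodified input, the result is shifted so that its valid results line up, and the shifted results are accumulated.

```python
import numpy as np

def conv_then_rearrange(x, k):
    """x: (H, W, C) input; k: (N, Hk, Wk, C) kernels; stride 1, no padding (assumed)."""
    h, w, _ = x.shape
    n, hk, wk, _ = k.shape
    acc = np.zeros((h, w, n))
    for r in range(hk):
        for c in range(wk):
            group = k[:, r, c, :]            # N sub-kernels of size 1*1*C
            result = x @ group.T             # 1*1 convolution over the whole input
            # Assumed rearrangement: move rows up by r and columns forward by c,
            # so the valid results of every group land on the same positions.
            acc += np.roll(result, (-r, -c), axis=(0, 1))
    # Valid positions: the top-left (H-Hk+1) x (W-Wk+1) region.
    return acc[:h - hk + 1, :w - wk + 1, :]

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10, 4))
k = rng.standard_normal((5, 3, 3, 4))
out = conv_then_rearrange(x, k)

# Crop-based reference for the same 3*3 convolution, used only as a check.
expected = sum(x[r:r + 8, c:c + 8] @ k[:, r, c].T for r in range(3) for c in range(3))
print(out.shape, np.allclose(out, expected))
```

Data that `np.roll` wraps around only reaches positions outside the valid region, which is why the final crop is sufficient in this sketch.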
[0007] To achieve the above objectives, this application also provides a neural network operation method, including: acquiring input data for a neural network operation and Wk*Hk sub-convolutional kernel groups, and proceeding to an operation step; wherein the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into the Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels; N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolutional kernels in each sub-convolutional kernel group correspond to the same portion of the input data; the operation step includes: convolving the i-th sub-convolutional kernel group with the input data to obtain the i-th convolution result; wherein convolving the i-th sub-convolutional kernel group with the partial input data corresponding to the i-th sub-convolutional kernel group yields the valid convolution result corresponding to the i-th sub-convolutional kernel group, and the i-th convolution result contains that valid convolution result; rearranging the (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result have the same data positions; adding the rearranged (i-1)-th accumulation result to the i-th convolution result to obtain the i-th accumulation result; if i is less than Wk*Hk, updating i to i+1 and executing the operation step again; if i is equal to Wk*Hk, using the valid convolution result in the i-th accumulation result as the output result of the neural network operation; wherein the initial value of i is 1, and when i=1, the 0th accumulation result is set to zero, and the valid convolution result in the rearranged 0th accumulation result and the valid convolution result in the 1st convolution result are assumed to have the same data positions.
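The iterative variant in [0007] keeps a single accumulation result and rearranges it before each addition. The sketch below is one possible reading, assuming stride 1, no padding, row-major ordering of the groups, and `np.roll` as the rearrangement; the function name `iterative_accumulate` and the data layouts are hypothetical.

```python
import numpy as np

def iterative_accumulate(x, k):
    """x: (H, W, C) input; k: (N, Hk, Wk, C) kernels; stride 1, no padding (assumed)."""
    h, w, _ = x.shape
    n, hk, wk, _ = k.shape
    acc = np.zeros((h, w, n))      # 0th accumulation result, set to zero
    prev_r, prev_c = 0, 0          # offset of the valid region inside acc
    for r in range(hk):
        for c in range(wk):
            # Rearrange the previous accumulation result so that its valid
            # region moves to the offset (r, c) used by the current group.
            acc = np.roll(acc, (r - prev_r, c - prev_c), axis=(0, 1))
            # i-th convolution result: 1*1 convolution with the unmodified input.
            acc += x @ k[:, r, c, :].T
            prev_r, prev_c = r, c
    # After the last group the valid region starts at (Hk-1, Wk-1).
    return acc[prev_r:prev_r + h - hk + 1, prev_c:prev_c + w - wk + 1, :]

rng = np.random.default_rng(1)
x = rng.standard_normal((10, 10, 4))
k = rng.standard_normal((5, 3, 3, 4))
out = iterative_accumulate(x, k)

# Crop-based reference for the same 3*3 convolution, used only as a check.
expected = sum(x[r:r + 8, c:c + 8] @ k[:, r, c].T for r in range(3) for c in range(3))
print(out.shape, np.allclose(out, expected))
```

In this reading only one accumulation buffer is live at any time, which matches the description of the i-th accumulation result replacing the (i-1)-th one.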
[0008] To achieve the above objectives, embodiments of this application also provide a neural network computing device, including: a first storage unit, a second storage unit, a control unit, a first data rearrangement unit, a convolution unit, and an addition unit; the first storage unit is used to store input data for a neural network operation, and the second storage unit is used to store Wk*Hk sub-convolutional kernel groups for the neural network operation; wherein the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into the Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolutional kernels in each sub-convolutional kernel group correspond to the same portion of the input data; the control unit is used to obtain the input data from the first storage unit and input the input data into the first data rearrangement unit, and the control unit is also used to send the data rearrangement method corresponding to each sub-convolutional kernel group to the first data rearrangement unit; the first data rearrangement unit is used to rearrange the input data according to the data rearrangement method corresponding to each sub-convolutional kernel group, to obtain the rearranged input data corresponding to each sub-convolutional kernel group, and to output the rearranged input data corresponding to each sub-convolutional kernel group to the convolution unit; wherein, in the rearranged input data corresponding to each sub-convolutional kernel group, the partial input data corresponding to that group occupies the same data positions for every group, and these shared data positions are the valid positions; the control unit is also used to obtain each sub-convolutional kernel group from the second storage unit and send each sub-convolutional kernel group to the convolution unit; the convolution unit is used to convolve each sub-convolutional kernel group with its corresponding rearranged input data to obtain the convolution result corresponding to each sub-convolutional kernel group, and to output the convolution result corresponding to each sub-convolutional kernel group to the addition unit; the addition unit is used to accumulate the convolution results corresponding to the sub-convolutional kernel groups to obtain an accumulation result, and to use the data located at the valid positions in the accumulation result as the output result of the neural network operation.
[0009] To achieve the above objectives, embodiments of this application also provide a neural network computing device, including: a first storage unit, a second storage unit, a control unit, a second data rearrangement unit, a convolution unit, and an addition unit; the first storage unit is used to store input data for a neural network operation, and the second storage unit is used to store Wk*Hk sub-convolutional kernel groups for the neural network operation; wherein the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into the Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolutional kernels in each sub-convolutional kernel group correspond to the same portion of the input data; the control unit is used to obtain the input data from the first storage unit and input the input data into the convolution unit, and the control unit is also used to obtain each sub-convolutional kernel group from the second storage unit and input each sub-convolutional kernel group into the convolution unit; the convolution unit is used to convolve each sub-convolutional kernel group with the input data to obtain the convolution result corresponding to each sub-convolutional kernel group, and to output the convolution result corresponding to each sub-convolutional kernel group to the second data rearrangement unit; the control unit is also used to send the data rearrangement method corresponding to each sub-convolutional kernel group to the second data rearrangement unit; the second data rearrangement unit is used to rearrange the convolution result corresponding to each sub-convolutional kernel group according to the data rearrangement method corresponding to that group, to obtain the rearranged convolution result corresponding to each sub-convolutional kernel group, and to output the rearranged convolution result corresponding to each sub-convolutional kernel group to the addition unit; wherein convolving each sub-convolutional kernel group with the partial input data corresponding to that group yields the valid convolution result corresponding to that group, the valid convolution results in the rearranged convolution results corresponding to the sub-convolutional kernel groups occupy the same data positions, and these shared data positions are the valid positions.
[0010] To achieve the above objectives, embodiments of this application also provide a neural network computing device, comprising: a first storage unit, a second storage unit, a third storage unit, a control unit, a third data rearrangement unit, a convolution unit, and an addition unit; the first storage unit is used to store input data for a neural network operation, and the second storage unit is used to store Wk*Hk sub-convolutional kernel groups for the neural network operation; wherein the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into the Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolutional kernels in each sub-convolutional kernel group correspond to the same portion of the input data; the control unit is used to obtain the input data from the first storage unit and input the input data into the convolution unit; the control unit is also used to obtain the i-th sub-convolutional kernel group from the second storage unit and input the i-th sub-convolutional kernel group into the convolution unit; the convolution unit is used to convolve the i-th sub-convolutional kernel group with the input data to obtain the i-th convolution result, and to output the i-th convolution result to the addition unit; wherein convolving the i-th sub-convolutional kernel group with the partial input data corresponding to the i-th sub-convolutional kernel group yields the valid convolution result corresponding to the i-th sub-convolutional kernel group, and the i-th convolution result contains that valid convolution result; the control unit is further used to obtain the (i-1)-th accumulation result from the third storage unit and send the (i-1)-th accumulation result to the third data rearrangement unit; the third data rearrangement unit is used to rearrange the (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result have the same data positions, and to output the rearranged (i-1)-th accumulation result to the addition unit; the addition unit is used to accumulate the rearranged (i-1)-th accumulation result with the i-th convolution result to obtain the i-th accumulation result, and to store the i-th accumulation result in the third storage unit, overwriting the (i-1)-th accumulation result; the control unit is also used to determine the value of i: if i is less than Wk*Hk, i is updated to i+1 and the operation step is executed again; if i is equal to Wk*Hk, the valid convolution result in the i-th accumulation result is used as the output result of the neural network operation; wherein the initial value of i is 1, and when i=1, the 0th accumulation result is set to zero, and the valid convolution result in the rearranged 0th accumulation result and the valid convolution result in the 1st convolution result are assumed to have the same data positions.
[0011] To achieve the above objectives, embodiments of this application also provide a chip, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the above-described neural network operation method.
[0012] To achieve the above objectives, embodiments of this application also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the above-described neural network operation method.
[0013] To achieve the above objectives, embodiments of this application also provide a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the aforementioned neural network operation method.
[0014] In the neural network operation method proposed in this application, the input data of the neural network operation and Wk*Hk sub-convolutional kernel groups are obtained, and the operation step is then entered. Specifically, the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels. These N*Wk*Hk 1*1*C sub-convolutional kernels are divided into Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolutional kernels in each sub-convolutional kernel group correspond to the same portion of the input data. The operation step includes: rearranging the input data according to the data rearrangement method corresponding to each sub-convolutional kernel group to obtain the rearranged input data corresponding to each sub-convolutional kernel group; convolving each sub-convolutional kernel group with its corresponding rearranged input data to obtain the convolution result corresponding to each sub-convolutional kernel group; accumulating the convolution results corresponding to the sub-convolutional kernel groups to obtain the accumulated result, and using the data located at the valid positions in the accumulated result as the output result of the neural network operation; wherein, in the rearranged input data corresponding to each sub-convolutional kernel group, the partial input data corresponding to that group occupies the same data positions for every group, and these shared data positions are the valid positions. By splitting the convolution in this way, the input data is reused, and there is no need to rearrange the data with img2col to meet scheduling and computation requirements. Since no img2col transformation is performed on the input data, computation is performed directly on the original input data, thereby eliminating the hardware design overhead, increased data access volume, and increased dynamic power consumption caused by img2col.
Attached Figure Description
[0015] Figure 1 is a schematic diagram of the img2col processing of input data in the prior art;
[0016] Figure 2 is a flowchart of the neural network operation method provided in the embodiments of this application;
[0017] Figure 3 is a schematic diagram of the input data provided in the embodiments of this application;
[0018] Figure 4 is a schematic diagram of the sub-convolution kernel group provided in the embodiments of this application;
[0019] Figure 5 is a schematic diagram of the valid position of the rearranged input data provided in the embodiments of this application;
[0020] Figure 6 is a schematic diagram of the split input data provided in the embodiments of this application;
[0021] Figure 7 is a schematic diagram of convolution of input data in existing technology;
[0022] Figure 8 is a schematic diagram illustrating the convolution of input data provided in an embodiment of this application;
[0023] Figure 9 is a flowchart of the neural network operation method provided in the embodiments of this application;
[0024] Figure 10 is a flowchart of the neural network operation method provided in the embodiments of this application;
[0025] Figure 11 is a flowchart of the neural network operation method provided in the embodiments of this application;
[0026] Figure 12 is a schematic diagram of the structure of the neural network computing device provided in the embodiments of this application;
[0027] Figure 13 is a schematic diagram of the structure of the neural network computing device provided in the embodiments of this application;
[0028] Figure 14 is a schematic diagram of the structure of the neural network computing device provided in the embodiments of this application;
[0029] Figure 15 is a schematic diagram of the chip structure provided in the embodiments of this application;
[0030] Figure 16 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0031] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the various embodiments of this application will be described in detail below with reference to the accompanying drawings. However, those skilled in the art will understand that many technical details have been provided in the various embodiments of this application to help readers better understand this application. However, the technical solutions claimed in this application can be implemented even without these technical details and various changes and modifications based on the following embodiments. The division of the various embodiments below is for the convenience of description and should not constitute any limitation on the specific implementation of this application. The various embodiments can be combined with and referenced by each other without contradiction.
[0032] One embodiment of this application relates to a neural network operation method which, as shown in Figure 2, includes:
[0033] Step 101: Obtain the input data for the neural network operation and Wk*Hk sub-convolutional kernel groups, and proceed to the operation step; wherein, the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, and the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the portion of the input data corresponding to the N sub-convolutional kernels in each sub-convolutional kernel group is the same.
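The kernel splitting in Step 101 can be sketched in a few lines of NumPy. The (N, Hk, Wk, C) storage layout and the name `split_into_groups` are assumptions made for illustration; the application does not specify how the kernels are stored.

```python
import numpy as np

def split_into_groups(kernels):
    """Split N kernels of size Wk*Hk*C into Wk*Hk groups of N 1*1*C sub-kernels.

    kernels: array of shape (N, Hk, Wk, C) (assumed layout).
    Returns a dict mapping the kernel position (row, col) to an (N, C) array.
    """
    n, hk, wk, c = kernels.shape
    return {(r, col): kernels[:, r, col, :] for r in range(hk) for col in range(wk)}

kernels = np.arange(4 * 3 * 3 * 2).reshape(4, 3, 3, 2)   # N=4, Hk=Wk=3, C=2
groups = split_into_groups(kernels)
print(len(groups), groups[(0, 0)].shape)                  # 9 (4, 2)
```

Each group collects the weights at one kernel position across all N kernels, so the total number of weights is unchanged by the split.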
[0034] In one example implementation, as shown in Figure 3, the obtained input data is W_input*H_input*C_input data. When C_input is 1, the input data is two-dimensional data; when C_input is greater than 1, the input data is three-dimensional data.
[0035] In one example implementation, as shown in Figure 4, the sub-convolutional kernel groups are obtained by splitting the convolutional kernels. The number of sub-convolutional kernels contained in a sub-convolutional kernel group is determined by the number of convolutional kernels that were split. For example, if 9 convolutional kernels were split, each resulting sub-convolutional kernel group contains 9 sub-convolutional kernels. The number of sub-convolutional kernel groups is determined by the width Wk and height Hk of the convolutional kernel. For example, if the convolutional kernels that were split are 3*3 in size, splitting yields 3*3=9 sub-convolutional kernel groups.
[0036] In one example implementation, each sub-convolutional kernel corresponds to a portion of the input data. For example, sub-convolutional kernel 00 in Figure 4 corresponds to input data 00 to 77 in Figure 3, sub-convolutional kernel 01 in Figure 4 corresponds to input data 01 to 78 in Figure 3, sub-convolutional kernel 02 in Figure 4 corresponds to input data 02 to 79 in Figure 3, ..., and sub-convolutional kernel 22 in Figure 4 corresponds to input data 22 to 99 in Figure 3. When a sub-convolutional kernel group contains two or more sub-convolutional kernels, the N sub-convolutional kernels in the group correspond to the same partial input data. For example, every sub-convolutional kernel 00 in the first sub-convolutional kernel group in Figure 4 corresponds to input data 00 to 77 in Figure 3, every sub-convolutional kernel 01 in the second sub-convolutional kernel group corresponds to input data 01 to 78, every sub-convolutional kernel 02 in the third sub-convolutional kernel group corresponds to input data 02 to 79, ..., and every sub-convolutional kernel 22 in the ninth sub-convolutional kernel group corresponds to input data 22 to 99.
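The correspondence between a sub-convolutional kernel and its partial input data follows directly from the kernel position. The short sketch below reproduces the ranges quoted above for a 10*10 input and a 3*3 kernel, assuming stride 1, no padding, and two-digit "row, column" labels as in Figure 3; the helper name `partial_input_range` is hypothetical.

```python
def partial_input_range(r, c, w=10, h=10, wk=3, hk=3):
    """Return the (first, last) input labels covered by sub-kernel (r, c).

    Labels are 'row' followed by 'column', which is only unambiguous for
    single-digit indices (an assumption matching the 10*10 example).
    """
    out_h = h - hk + 1          # number of output rows
    out_w = w - wk + 1          # number of output columns
    first = f"{r}{c}"
    last = f"{r + out_h - 1}{c + out_w - 1}"
    return first, last

print(partial_input_range(0, 0))   # ('00', '77')
print(partial_input_range(0, 1))   # ('01', '78')
print(partial_input_range(2, 2))   # ('22', '99')
```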
[0037] Step 102: Rearrange the input data according to the data rearrangement method corresponding to each sub-convolutional kernel group to obtain the rearranged input data corresponding to each sub-convolutional kernel group; convolve each sub-convolutional kernel group with its corresponding rearranged input data to obtain the convolution result corresponding to each sub-convolutional kernel group; accumulate the convolution results corresponding to the sub-convolutional kernel groups to obtain the accumulated result, and use the data located at the valid positions in the accumulated result as the output result of the neural network operation; wherein, in the rearranged input data corresponding to each sub-convolutional kernel group, the partial input data corresponding to that group occupies the same data positions for every group, and these shared data positions are the valid positions.
[0038] In one example implementation, the computation process for each sub-convolutional kernel group and the input data is completed in the matrix operation unit of the neural network. Each sub-convolutional kernel group has its corresponding data rearrangement method. Before convolving each sub-convolutional kernel group with the input data, it is necessary to obtain the data rearrangement method corresponding to each sub-convolutional kernel group. The input data is then rearranged according to the data rearrangement method corresponding to each sub-convolutional kernel group to obtain the rearranged input data corresponding to each sub-convolutional kernel group. Then, each sub-convolutional kernel group and its corresponding rearranged input data are convolved to obtain the convolution result corresponding to each sub-convolutional kernel group. Finally, the convolution results corresponding to each sub-convolutional kernel group are accumulated to obtain the accumulated result, and the data in the valid position in the accumulated result is used as the output result of the neural network operation.
[0039] In one example implementation, in the rearranged input data corresponding to each sub-convolutional kernel group, the partial input data corresponding to that group occupies the same data positions, and these shared data positions are the valid positions. In other words, after rearrangement, the partial input data of every sub-convolutional kernel group lands on the same positions. For example, the data rearrangement method corresponding to the first sub-convolutional kernel group 00 in Figure 4 leaves the position of each piece of input data unchanged, and the rearranged input data corresponding to the first sub-convolutional kernel group 00 is shown in Figure 5. The data rearrangement method corresponding to the second sub-convolutional kernel group 01 in Figure 4 shifts each column of the input data forward by one column, and the rearranged input data corresponding to the second sub-convolutional kernel group 01 is shown in Figure 5, and so on. The data rearrangement method corresponding to the Wk*Hk-th sub-convolutional kernel group 22 in Figure 4 shifts each column of the input data forward by two columns and each row upward by two rows, and the rearranged input data corresponding to the Wk*Hk-th sub-convolutional kernel group 22 is shown in Figure 5. The valid positions are the data enclosed by the solid line in Figure 5.
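Steps 101 and 102 together can be checked numerically. The following NumPy sketch rearranges the input for each group, applies the 1*1*C convolution, accumulates, and compares the valid region against a straightforward sliding-window convolution. Stride 1, no padding, the (H, W, C) and (N, Hk, Wk, C) layouts, and the use of `np.roll` for the shift are all assumptions of this sketch rather than details given in this application.

```python
import numpy as np

def direct_conv(x, k):
    """Reference sliding-window convolution; x is (H, W, C), k is (N, Hk, Wk, C)."""
    h, w, _ = x.shape
    n, hk, wk, _ = k.shape
    out = np.zeros((h - hk + 1, w - wk + 1, n))
    for i in range(h - hk + 1):
        for j in range(w - wk + 1):
            # Contract the Hk*Wk*C patch against all N kernels at once.
            out[i, j] = np.tensordot(k, x[i:i + hk, j:j + wk, :], axes=3)
    return out

def rearrange_then_conv(x, k):
    """Split-kernel convolution: shift the input, apply 1*1*C groups, accumulate."""
    h, w, _ = x.shape
    n, hk, wk, _ = k.shape
    acc = np.zeros((h, w, n))
    for r in range(hk):
        for c in range(wk):
            # Group (r, c): shift rows up by r and columns forward by c.
            # np.roll wraps data around; wrapped data only reaches invalid positions.
            rearranged = np.roll(x, (-r, -c), axis=(0, 1))
            acc += rearranged @ k[:, r, c, :].T
    # Keep only the valid positions (the solid-line region in the description).
    return acc[:h - hk + 1, :w - wk + 1, :]

rng = np.random.default_rng(2)
x = rng.standard_normal((10, 10, 4))      # W=H=10, C=4
k = rng.standard_normal((5, 3, 3, 4))     # N=5 kernels of size 3*3*4
out = rearrange_then_conv(x, k)
ref = direct_conv(x, k)
print(out.shape, np.allclose(out, ref))
```

Note that the input is never expanded: each group reads the same 10*10*C data, only at a different offset.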
[0040] In one example implementation, to ensure efficient operation of the matrix operation unit, the input data bit width for matrix operations needs to match the scale of the matrix operations. Assuming the matrix operation module can output an M*N matrix per cycle, the input data bandwidth is M*Wi, where Wi is the bit width of a single data item, for example Wi=8 at INT8 precision and Wi=16 at FP16 precision. Assume C0 is the depth of the input data participating in one matrix operation, that is, the granularity of the input data in the depth direction, with a minimum of 1. C1 is the number of groups obtained by dividing the total depth C of the input data by the granularity C0. The input data is stored in the buffer in C1HWC0 order, as shown in Figure 6. Data with a bit width of M*Wi is divided into M*Wi / C0 groups, and each group of data is stored in a memory block with a bit width of Wi*C0. Each group of data has its own address management, and Wi*C0 bits of data can be retrieved from any position in a single cycle. The data rearrangement module rearranges the data read from each buffer and then sends it to the matrix operation module for processing.
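The C1HWC0 storage order can be made concrete with a small address calculation. This is a sketch under stated assumptions: the function names are hypothetical, offsets are counted in data items rather than bits, and C1 is rounded up when C is not a multiple of C0 (the text does not say how a remainder is handled).

```python
def num_depth_groups(total_depth, c0):
    """C1: number of C0-sized groups covering the total depth C (rounded up, by assumption)."""
    return -(-total_depth // c0)


def c1hwc0_offset(c, h, w, height, width, c0):
    """Offset (in data items) of element (c, h, w) in a buffer stored in C1HWC0 order."""
    c1_index, c0_index = divmod(c, c0)
    return ((c1_index * height + h) * width + w) * c0 + c0_index


def block_bit_width(item_bits, c0):
    """Bit width Wi*C0 of one memory block holding C0 items of Wi bits each."""
    return item_bits * c0


# Example: 10*10 input with total depth C = 32 and granularity C0 = 16.
c1 = num_depth_groups(32, 16)
offset = c1hwc0_offset(c=17, h=0, w=1, height=10, width=10, c0=16)
```

In this layout the C0 items of one spatial position are contiguous, so a single block read of Wi*C0 bits returns one full depth slice for one (h, w) position.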
[0041] In one example implementation, when the input data is stored in memory blocks, the instruction input to the matrix operation unit only needs to specify the starting address of the input data, the starting address of the weight data, and the sizes m, n, and k of the two matrices involved in the matrix operation. Assuming that the minimum operation supported by the matrix unit is an M*K by K*N matrix operation, the control module automatically retrieves M*K data from storage unit 1 and K*N data from storage unit 2 each cycle and loads them into the matrix operation unit. For two data matrices of size m*k and k*n, the control unit automatically splits them and performs the calculations. Therefore, m, n, and k must be integer multiples of M, N, and K respectively; if they are not, the data must be padded at input time.
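The multiple-of-unit-size constraint amounts to rounding each dimension up. The sketch below uses hypothetical helper names, and the count of unit operations assumes that each operation handles exactly one M*K by K*N tile pair.

```python
def round_up(value, multiple):
    """Smallest integer multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def pad_gemm_dims(m, n, k, unit_m, unit_n, unit_k):
    """Pad (m, n, k) so each is an integer multiple of the unit sizes (M, N, K)."""
    return round_up(m, unit_m), round_up(n, unit_n), round_up(k, unit_k)


def count_unit_operations(m, n, k, unit_m, unit_n, unit_k):
    """Number of M*K by K*N unit operations after padding (one tile pair per operation)."""
    pm, pn, pk = pad_gemm_dims(m, n, k, unit_m, unit_n, unit_k)
    return (pm // unit_m) * (pn // unit_n) * (pk // unit_k)


# Example: a 100*16 by 16*20 product on a unit with M = N = K = 16.
padded = pad_gemm_dims(100, 20, 16, 16, 16, 16)
operations = count_unit_operations(100, 20, 16, 16, 16, 16)
```

Here m = 100 is padded to 112 and n = 20 to 32, so the padded rows and columns are computed but carry no useful data.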
[0042] In one example implementation, Figure 7 shows the convolution method used in the existing technology, and Figure 8 shows the convolution process used in this application, in which every sub-convolutional kernel group uses the same input source data, thus avoiding data inflation. The convolution calculation process is basically the same as the traditional convolution calculation process. First, the input data and weight data are loaded. For a single convolutional block, the input data is loaded all at once, while the weight data can either be split according to the method shown in Figure 8 and loaded in batches, or loaded into the cache unit all at once. For better parallelism, input data and weight data can be loaded into their respective storage units simultaneously. After the input data and weights are loaded, multiple matrix operations are performed according to the splitting rules shown in Figure 8 without changing the input data; only different starting positions need to be specified. The input data is highly reused, and each matrix operation uses a different part of the convolution weights.
[0043] In one example implementation, assume the depth C shown in Figure 8 is 16, so that the input data shown in Figure 8 is 10*10*16 and the convolution kernel shown in Figure 4 is 3*3*16 (split into the 1*1*16 sub-convolutional kernel groups shown in Figure 8). The split adds two columns of invalid computation to the output (shown by the dashed lines in Figure 5). Assuming the matrix computation itself is 100% efficient, the efficiency for 10*10*16 input data with a 3*3*16 convolution kernel is 8*8 / (8*10) = 80%. However, general-purpose neural network accelerators typically struggle to achieve efficiencies greater than 50% in single-image mode, so this computation mode does not significantly affect overall network efficiency, while the input data access volume is only (10*10) / (9*8*8), or about 17.4%, of that in img2col mode. The proportion of invalid computations is (Wk-1) / W, so it is even lower for inputs larger than 10*10. If the input size is too small, computation can be performed in batches or via img2col. Since the input data is usually not a bottleneck in this case, generating img2col data will not affect system performance.
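The two ratios in this example can be reproduced with a few lines of arithmetic. The function names below are hypothetical, and stride 1 without padding is assumed, as in the 10*10 input and 3*3 kernel example.

```python
def compute_efficiency(in_w, kernel_w):
    """Share of useful output columns: (W - Wk + 1) / W, i.e. 1 - (Wk - 1) / W."""
    out_w = in_w - kernel_w + 1
    return out_w / in_w


def input_access_ratio(in_h, in_w, kernel_h, kernel_w):
    """Input items read per depth slice, relative to img2col.

    Direct access reads each input position once (H*W); img2col reads
    Hk*Wk items for every output position (Hk*Wk*Ho*Wo).
    """
    out_h = in_h - kernel_h + 1
    out_w = in_w - kernel_w + 1
    return (in_h * in_w) / (kernel_h * kernel_w * out_h * out_w)


efficiency = compute_efficiency(10, 3)            # 8 / 10
access_ratio = input_access_ratio(10, 10, 3, 3)   # 100 / 576
```

For a wider input such as W = 100 with the same 3*3 kernel, `compute_efficiency(100, 3)` rises to 0.98, which matches the (Wk-1) / W expression for the invalid share.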
[0044] In one example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, each of the N sub-convolution kernels in the group is convolved with the rearranged input data corresponding to the group to obtain N sub-convolution results. The N sub-convolution results are then used as the N layers of data of the convolution result corresponding to the sub-convolution kernel group.
[0045] In this embodiment of the application, during the neural network operation, the input data and the Wk*Hk sub-convolutional kernel groups are obtained, and the operation step is entered. Specifically, the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels. These N*Wk*Hk 1*1*C sub-convolutional kernels are divided into Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the portion of the input data corresponding to the N sub-convolutional kernels in each sub-convolutional kernel group is the same. The operation step includes: rearranging the input data according to the data rearrangement method corresponding to each sub-convolutional kernel group to obtain the rearranged input data corresponding to each sub-convolutional kernel group; convolving each sub-convolutional kernel group with its corresponding rearranged input data to obtain the convolution result corresponding to each sub-convolutional kernel group; and accumulating the convolution results corresponding to each sub-convolutional kernel group to obtain the accumulated result, and using the data at the valid positions in the accumulated result as the output result of the neural network operation. The rearranged input data corresponding to each sub-convolutional kernel group has the same data positions as the portion of the input data corresponding to that sub-convolutional kernel group, and these shared data positions are the valid positions. By splitting the convolution, the input data is reused without needing to be rearranged by img2col to meet scheduling and computation requirements. Since no img2col conversion is performed on the input data, the computation is performed directly on the original input data, thereby eliminating the hardware design overhead, increased data access volume, and increased dynamic power consumption caused by img2col.
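The kernel split can be sketched as a simple regrouping of weights. The function name and the nested-list layout `kernels[n][kh][kw][c]` are assumptions made for illustration; the text does not prescribe a storage format for the weights.

```python
def split_into_sub_kernel_groups(kernels):
    """Regroup N kernels of size Hk*Wk*C into Hk*Wk groups of N 1*1*C sub-kernels.

    kernels[n][kh][kw] is the list of C weights of kernel n at position (kh, kw).
    The result maps each position (kh, kw) to a list of N weight vectors.
    """
    num_kernels = len(kernels)
    kernel_h, kernel_w = len(kernels[0]), len(kernels[0][0])
    groups = {}
    for kh in range(kernel_h):
        for kw in range(kernel_w):
            groups[(kh, kw)] = [list(kernels[n][kh][kw]) for n in range(num_kernels)]
    return groups


# Toy weights with N = 2, Hk = Wk = 3, C = 4; each weight encodes its own indices.
kernels = [[[[1000 * n + 100 * kh + 10 * kw + c for c in range(4)]
             for kw in range(3)] for kh in range(3)] for n in range(2)]
groups = split_into_sub_kernel_groups(kernels)
```

With these toy weights, `groups[(0, 1)]` holds two vectors of four weights each, one taken from position (0, 1) of each of the two original kernels.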
[0046] One embodiment of this application relates to the operation steps of a neural network operation method which, as shown in Figure 9, include:
[0047] Step 201: Rearrange the input data according to the data rearrangement method corresponding to the i-th sub-convolutional kernel group to obtain the i-th rearranged input data.
[0048] In one example implementation, after obtaining the input data, this application obtains the sub-convolutional kernel groups. However, it does not obtain all Wk*Hk sub-convolutional kernel groups at once; instead, it first obtains the first sub-convolutional kernel group, and after all the operation steps have been completed for the first sub-convolutional kernel group, it obtains the second sub-convolutional kernel group, and so on, until the Wk*Hk-th sub-convolutional kernel group is obtained.
[0049] In one example implementation, for each obtained i-th sub-convolutional kernel group, the i-th data rearrangement method corresponding to the i-th sub-convolutional kernel group is first determined, and the input data is rearranged according to the i-th data rearrangement method to obtain the i-th rearranged input data; where i takes values from 1 to Wk*Hk.
[0050] In one example implementation, the i-th sub-convolutional kernel group is loaded using a data overlay loading method. For example, when loading the second sub-convolutional kernel group, the second sub-convolutional kernel group is used to overlay the first sub-convolutional kernel group, thereby reducing the memory occupied by storing each sub-convolutional kernel group.
[0051] In one example implementation, when the value of i is 1, the data rearrangement method corresponding to the first sub-convolutional kernel group is set so that the positions of each part of the input data remain unchanged.
[0052] Step 202: Convolve the i-th sub-convolution kernel group and the i-th rearranged input data to obtain the i-th convolution result.
[0053] In one example implementation, after obtaining the i-th rearranged input data, the i-th sub-convolutional kernel group and the i-th rearranged input data are convolved to obtain the i-th convolution result.
[0054] In one example implementation, when the neural network contains X matrix operation units (taking X = 3 as an example), the nine sub-convolution kernel groups in Figure 4 can be divided into three batches. In the first operation, the first to third sub-convolution kernel groups are input into the three matrix operation units for operation. In the second operation, the fourth to sixth sub-convolution kernel groups are input into the three matrix operation units for operation. In the third operation, the seventh to ninth sub-convolution kernel groups are input into the three matrix operation units for operation, and the corresponding convolution results are obtained.
[0055] Step 203: Add the i-th convolution result and the (i-1)-th accumulation result to obtain the i-th accumulation result.
[0056] In one example implementation, the generated i-th convolution result is added to the previous (i-1)-th accumulation result to obtain the i-th accumulation result.
[0057] In one example implementation, when the value of i is 1, the 0th accumulation result corresponding to the 1st sub-convolutional kernel group is set to zero.
[0058] In one example implementation, when the neural network contains Y addition units (taking Y = 3 as an example), the nine convolution results corresponding to the nine sub-convolution kernel groups in Figure 4 can be divided into three groups. The first to third convolution results are input into the first addition unit to calculate their sum. The fourth to sixth convolution results are input into the second addition unit to calculate their sum. The seventh to ninth convolution results are input into the third addition unit to calculate their sum. Then, the three partial sums are input into any one of the addition units to obtain the accumulated result of the nine convolution results.
[0059] Step 204: If i is less than Wk*Hk, update i to i+1 and execute the operation step again; if i is equal to Wk*Hk, use the data at the valid position in the i-th accumulation result as the output result of the neural network operation.
[0060] In one example implementation, after the operation of the i-th sub-convolutional kernel group is completed, it is necessary to determine the value of i. When the value of i is less than Wk*Hk, the value of i is updated to i+1, and the (i+1)-th sub-convolutional kernel group is loaded, and the operation steps are executed again. When the value of i is equal to Wk*Hk, the operation steps end, and the data at the valid position in the i-th accumulation result is used as the output result of the neural network operation.
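Steps 201 to 204 can be sketched end to end and checked against an ordinary sliding-window convolution. This is a minimal illustration under stated assumptions, not the implementation of this application: all function names are hypothetical, stride 1 without padding is assumed, vacated positions are zero-filled, data is held in nested lists `x[h][w][c]` and `kernels[n][kh][kw][c]`, and the loop index runs from 0 rather than 1.

```python
def shift_input(x, dy, dx):
    """Rearrange x[h][w][c]: move data up by dy rows and forward by dx columns.

    Vacated positions are zero-filled (an assumption; they are never valid).
    """
    height, width, depth = len(x), len(x[0]), len(x[0][0])
    zero = [0] * depth
    return [[x[h + dy][w + dx] if h + dy < height and w + dx < width else zero
             for w in range(width)] for h in range(height)]


def pointwise_conv(x, group):
    """Convolve x with a group of N 1*1*C sub-kernels; the result is out[h][w][n]."""
    return [[[sum(a * b for a, b in zip(pixel, sub_kernel)) for sub_kernel in group]
             for pixel in row] for row in x]


def split_conv_serial(x, kernels):
    """Serial loop over the Wk*Hk sub-kernel groups (steps 201 to 204)."""
    height, width = len(x), len(x[0])
    n_kernels, kernel_h, kernel_w = len(kernels), len(kernels[0]), len(kernels[0][0])
    # The 0th accumulation result is zero.
    acc = [[[0] * n_kernels for _ in range(width)] for _ in range(height)]
    for i in range(kernel_h * kernel_w):
        dy, dx = divmod(i, kernel_w)
        # Overlay loading: `group` replaces the previously loaded group.
        group = [kernels[n][dy][dx] for n in range(n_kernels)]
        rearranged = shift_input(x, dy, dx)          # step 201
        partial = pointwise_conv(rearranged, group)  # step 202
        acc = [[[a + p for a, p in zip(acc_px, part_px)]  # step 203
                for acc_px, part_px in zip(acc_row, part_row)]
               for acc_row, part_row in zip(acc, partial)]
    out_h, out_w = height - kernel_h + 1, width - kernel_w + 1
    return [row[:out_w] for row in acc[:out_h]]      # step 204: valid positions only


def direct_conv(x, kernels):
    """Reference: ordinary sliding-window convolution (cross-correlation form)."""
    n_kernels, kernel_h, kernel_w = len(kernels), len(kernels[0]), len(kernels[0][0])
    out_h, out_w = len(x) - kernel_h + 1, len(x[0]) - kernel_w + 1
    return [[[sum(x[h + kh][w + kw][c] * kernels[n][kh][kw][c]
                  for kh in range(kernel_h) for kw in range(kernel_w)
                  for c in range(len(x[0][0])))
              for n in range(n_kernels)]
             for w in range(out_w)] for h in range(out_h)]


# Toy data: 5*6 input with C = 2, and N = 2 kernels of size 3*3*2.
x = [[[(7 * h + 3 * w + c) % 5 for c in range(2)] for w in range(6)] for h in range(5)]
kernels = [[[[(n + 2 * kh + kw + c) % 3 - 1 for c in range(2)]
             for kw in range(3)] for kh in range(3)] for n in range(2)]
result = split_conv_serial(x, kernels)
```

The accumulator keeps the full input extent (5*6 here), and only the top-left 3*4 region is read out as valid, which is where the two extra columns of invalid computation come from.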
[0061] In this embodiment of the application, the convolution of the sub-convolution kernel groups with the input data can be performed either serially or in parallel, so that the neural network operation method of this application can be applied to various types of neural networks.
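The parallel variant with several matrix operation units and addition units can be sketched as batching followed by a two-level sum. The helper names are hypothetical, and plain integers stand in for whole convolution results to keep the example short.

```python
def batches(items, num_units):
    """Split items into consecutive batches, one item per matrix operation unit."""
    return [items[i:i + num_units] for i in range(0, len(items), num_units)]


def grouped_sum(values, num_adders):
    """Two-level accumulation: each adder sums one contiguous group, then one adder sums the partials."""
    group_size = -(-len(values) // num_adders)  # ceiling division
    partials = [sum(values[i:i + group_size])
                for i in range(0, len(values), group_size)]
    return partials, sum(partials)


# Nine sub-kernel groups on X = 3 matrix operation units.
schedule = batches(list(range(1, 10)), 3)

# Nine convolution results (integers as stand-ins) on Y = 3 addition units.
partials, total = grouped_sum([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
```

Because addition is associative, the grouped result equals the serial running sum regardless of how the nine results are distributed over the addition units.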
[0062] One embodiment of this application relates to a neural network operation method which, as shown in Figure 10, includes:
[0063] Step 301: Obtain the input data for the neural network operation and Wk*Hk sub-convolutional kernel groups, and proceed to the operation step; wherein, the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, and the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the portion of the input data corresponding to the N sub-convolutional kernels in each sub-convolutional kernel group is the same.
[0064] In one example implementation, this step is largely the same as step 101 in the embodiments of this application, and will not be described in detail here.
[0065] Step 302: Convolve each sub-convolutional kernel group and the input data to obtain the convolution result corresponding to each sub-convolutional kernel group; rearrange the convolution results corresponding to each sub-convolutional kernel group according to the data rearrangement method corresponding to each sub-convolutional kernel group to obtain the rearranged convolution results corresponding to each sub-convolutional kernel group; accumulate the rearranged convolution results corresponding to each sub-convolutional kernel group to obtain the accumulated result, and use the data in the accumulated result at the valid position as the output result of the neural network operation; wherein, each sub-convolutional kernel group is convolved with a portion of the input data corresponding to each sub-convolutional kernel group to obtain the valid convolution result corresponding to each sub-convolutional kernel group, and the valid convolution results in the rearranged convolution results corresponding to each sub-convolutional kernel group have the same data position, and the same data position is a valid position.
[0066] In one example implementation, each sub-convolutional kernel group and the input data are first convolved to obtain the convolution result corresponding to each sub-convolutional kernel group. Then, the data rearrangement method corresponding to each sub-convolutional kernel group is obtained, and the convolution result corresponding to each sub-convolutional kernel group is rearranged according to the data rearrangement method to obtain the rearranged convolution result corresponding to each sub-convolutional kernel group. Then, the rearranged convolution results corresponding to each sub-convolutional kernel group are accumulated to obtain the accumulated result, and the data in the valid position in the accumulated result is used as the output result of the neural network operation.
[0067] In one example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, each of the N sub-convolution kernels in the group is convolved with the rearranged input data corresponding to the group to obtain N sub-convolution results. The N sub-convolution results are then used as the N layers of data of the convolution result corresponding to the sub-convolution kernel group.
[0068] In one example implementation, the data rearrangement method corresponding to each sub-convolution kernel group is roughly the same as the data rearrangement method mentioned in step 102 of the embodiment of this application, and will not be described in detail here.
[0069] In one example implementation, both the serial and the parallel operation mentioned in steps 201 to 204 can be applied to this embodiment. In the serial case, one sub-convolution kernel group is convolved, its result is rearranged and then accumulated, and this is repeated until the last sub-convolution kernel group has been convolved, rearranged, and accumulated, at which point the output result of the neural network operation is obtained. The operation steps can also be performed by multiple matrix operation units and / or multiple addition units.
[0070] In this embodiment of the application, each sub-convolution kernel group can first be convolved with the input data, and the convolution results of the sub-convolution kernel groups can then be rearranged and accumulated. This application is therefore not limited to a single order of convolution and rearrangement, which improves its applicability.
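The convolve-first ordering of this embodiment can be sketched in the same style. As before, this is an illustration under assumptions rather than the implementation of this application: the function names are hypothetical, stride 1 without padding is assumed, and positions vacated by the rearrangement are zero-filled.

```python
def pointwise_conv(x, group):
    """Convolve x[h][w][c] with a group of N 1*1*C sub-kernels; result is out[h][w][n]."""
    return [[[sum(a * b for a, b in zip(pixel, sub_kernel)) for sub_kernel in group]
             for pixel in row] for row in x]


def shift_result(y, dy, dx):
    """Rearrange a convolution result: up by dy rows, forward by dx columns, zero fill."""
    height, width, layers = len(y), len(y[0]), len(y[0][0])
    zero = [0] * layers
    return [[y[h + dy][w + dx] if h + dy < height and w + dx < width else zero
             for w in range(width)] for h in range(height)]


def add_results(a, b):
    """Element-wise sum of two results with identical shape."""
    return [[[p + q for p, q in zip(pa, pb)] for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def split_conv_rearrange_after(x, kernels):
    """Convolve each group with the unmodified input, then rearrange and accumulate."""
    height, width = len(x), len(x[0])
    n_kernels, kernel_h, kernel_w = len(kernels), len(kernels[0]), len(kernels[0][0])
    acc = [[[0] * n_kernels for _ in range(width)] for _ in range(height)]
    for dy in range(kernel_h):
        for dx in range(kernel_w):
            group = [kernels[n][dy][dx] for n in range(n_kernels)]
            partial = pointwise_conv(x, group)       # input data is used as-is
            shifted = shift_result(partial, dy, dx)  # rearrange the convolution result
            acc = add_results(acc, shifted)          # accumulate
    out_h, out_w = height - kernel_h + 1, width - kernel_w + 1
    return [row[:out_w] for row in acc[:out_h]]      # valid positions only


def direct_conv(x, kernels):
    """Reference: ordinary sliding-window convolution (cross-correlation form)."""
    n_kernels, kernel_h, kernel_w = len(kernels), len(kernels[0]), len(kernels[0][0])
    out_h, out_w = len(x) - kernel_h + 1, len(x[0]) - kernel_w + 1
    return [[[sum(x[h + kh][w + kw][c] * kernels[n][kh][kw][c]
                  for kh in range(kernel_h) for kw in range(kernel_w)
                  for c in range(len(x[0][0])))
              for n in range(n_kernels)]
             for w in range(out_w)] for h in range(out_h)]


# Toy data: 4*5 input with C = 3, and N = 2 kernels of size 2*3*3.
x = [[[(5 * h + 2 * w + c) % 7 for c in range(3)] for w in range(5)] for h in range(4)]
kernels = [[[[(n + kh + 2 * kw + c) % 4 - 2 for c in range(3)]
             for kw in range(3)] for kh in range(2)] for n in range(2)]
result = split_conv_rearrange_after(x, kernels)
```

Since a 1*1*C convolution acts on each spatial position independently, shifting its result gives the same valid data as shifting its input, which is why both orderings produce the same output.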
[0071] One embodiment of this application relates to a neural network operation method which, as shown in Figure 11, includes:
[0072] Step 401: Obtain the input data for the neural network operation and Wk*Hk sub-convolutional kernel groups, and proceed to the operation step; wherein, the N Wk*Hk*C convolutional kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolutional kernels, and the N*Wk*Hk 1*1*C sub-convolutional kernels are divided into Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the portion of the input data corresponding to the N sub-convolutional kernels in each sub-convolutional kernel group is the same.
[0073] In one example implementation, this step is largely the same as step 101 in the embodiments of this application, and will not be described in detail here.
[0074] Step 402: Convolve the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result; wherein, the i-th sub-convolution kernel group is convolved with the corresponding part of the input data to obtain the effective convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains the effective convolution result.
[0075] In one example implementation, the i-th sub-convolution kernel group is convolved with the input data to obtain the i-th convolution result; during the process of convolving the i-th sub-convolution kernel group with the input data, the convolution result obtained by convolving the i-th sub-convolution kernel group with the corresponding part of the input data is called the effective convolution result, that is, the i-th convolution result contains the effective convolution result.
[0076] In one example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, each of the N sub-convolution kernels in the group is convolved with the rearranged input data corresponding to the group to obtain N sub-convolution results. The N sub-convolution results are then used as the N layers of data of the convolution result corresponding to the sub-convolution kernel group.
[0077] In one example implementation, Wk*Hk sub-kernel groups can be loaded sequentially. When loading the i-th sub-kernel group, it is loaded in a data overlay manner, that is, the i-th sub-kernel group is used to overlay the (i-1)-th sub-kernel group.
[0078] Step 403: Rearrange the (i-1)th accumulation result so that the effective convolution result in the rearranged (i-1)th accumulation result and the effective convolution result in the i-th convolution result have the same data position.
[0079] In one example implementation, the (i-1)th accumulation result is rearranged according to the data rearrangement method corresponding to the i-th sub-convolution kernel group, so that the effective convolution result in the (i-1)th accumulation result after rearrangement has the same data position as the effective convolution result in the i-th convolution result.
[0080] Step 404: Add the rearranged (i-1)th cumulative result to the ith convolution result to obtain the ith cumulative result.
[0081] In one example implementation, the rearranged (i-1)th cumulative result is added to the ith convolution result to obtain the ith cumulative result.
[0082] In one example implementation, the initial value of i is 1, and when i = 1, the 0th accumulation result is set to zero, and the valid convolution results in the rearranged 0th accumulation result and the valid convolution results in the 1st convolution result are assumed to have the same data position.
[0083] Step 405: If i is less than Wk*Hk, update i to i+1 and execute the operation step again; if i is equal to Wk*Hk, use the valid convolution result in the i-th accumulation result as the output result of the neural network operation.
[0084] In one example implementation, after the operation of the i-th sub-convolutional kernel group is completed, it is necessary to determine the value of i. When the value of i is less than Wk*Hk, the value of i is updated to i+1, and the (i+1)-th sub-convolutional kernel group is loaded, and the operation steps are executed again. When the value of i is equal to Wk*Hk, the operation steps end, and the data at the valid position in the i-th accumulation result is used as the output result of the neural network operation.
[0085] In this embodiment of the application, the convolution of the sub-convolution kernel groups with the input data can be performed either serially or in parallel, so that the neural network operation method of this application can be applied to various types of neural networks.
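Steps 402 to 405 rearrange the running accumulation result instead of the input or the individual convolution results. The sketch below is one possible reading, not the implementation of this application: it assumes that the rearrangement moves the (i-1)-th accumulation result by the offset difference between consecutive sub-kernel groups, that groups are visited row by row, that vacated positions are zero-filled, and that stride 1 without padding is used. All function names are hypothetical and the loop index starts at 0.

```python
def pointwise_conv(x, group):
    """Convolve x[h][w][c] with a group of N 1*1*C sub-kernels; result is out[h][w][n]."""
    return [[[sum(a * b for a, b in zip(pixel, sub_kernel)) for sub_kernel in group]
             for pixel in row] for row in x]


def move(acc, ddy, ddx):
    """Move accumulator content down by ddy rows and right by ddx columns.

    Offsets may be negative; positions without a source element are zero-filled.
    """
    height, width, layers = len(acc), len(acc[0]), len(acc[0][0])
    zero = [0] * layers
    return [[acc[p - ddy][q - ddx]
             if 0 <= p - ddy < height and 0 <= q - ddx < width else zero
             for q in range(width)] for p in range(height)]


def split_conv_rearrange_accumulator(x, kernels):
    height, width = len(x), len(x[0])
    n_kernels, kernel_h, kernel_w = len(kernels), len(kernels[0]), len(kernels[0][0])
    acc = [[[0] * n_kernels for _ in range(width)] for _ in range(height)]  # 0th result
    prev_dy, prev_dx = 0, 0
    for i in range(kernel_h * kernel_w):
        dy, dx = divmod(i, kernel_w)
        group = [kernels[n][dy][dx] for n in range(n_kernels)]
        partial = pointwise_conv(x, group)               # step 402
        moved = move(acc, dy - prev_dy, dx - prev_dx)    # step 403
        acc = [[[a + p for a, p in zip(pa, pp)]          # step 404
                for pa, pp in zip(ra, rp)] for ra, rp in zip(moved, partial)]
        prev_dy, prev_dx = dy, dx
    out_h, out_w = height - kernel_h + 1, width - kernel_w + 1
    # Step 405: the valid results are aligned with the last group's offset.
    return [row[kernel_w - 1:kernel_w - 1 + out_w]
            for row in acc[kernel_h - 1:kernel_h - 1 + out_h]]


def direct_conv(x, kernels):
    """Reference: ordinary sliding-window convolution (cross-correlation form)."""
    n_kernels, kernel_h, kernel_w = len(kernels), len(kernels[0]), len(kernels[0][0])
    out_h, out_w = len(x) - kernel_h + 1, len(x[0]) - kernel_w + 1
    return [[[sum(x[h + kh][w + kw][c] * kernels[n][kh][kw][c]
                  for kh in range(kernel_h) for kw in range(kernel_w)
                  for c in range(len(x[0][0])))
              for n in range(n_kernels)]
             for w in range(out_w)] for h in range(out_h)]


# Toy data: 5*5 input with C = 2, and N = 2 kernels of size 3*3*2.
x = [[[(3 * h + 4 * w + c) % 6 for c in range(2)] for w in range(5)] for h in range(5)]
kernels = [[[[(2 * n + kh + kw + c) % 5 - 2 for c in range(2)]
             for kw in range(3)] for kh in range(3)] for n in range(2)]
result = split_conv_rearrange_accumulator(x, kernels)
```

Under this reading only one accumulation buffer is rearranged per iteration, and after the last group the valid results sit in the bottom-right region of that buffer rather than the top-left one.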
[0086] The steps of the various methods described above are only for clarity. In practice, they can be combined into one step or some steps can be split into multiple steps. As long as they include the same logical relationship, they are all within the scope of protection of this patent. Adding insignificant modifications or introducing insignificant designs to the algorithm or process, but without changing the core design of the algorithm and process, are also within the scope of protection of this patent.
[0087] Another embodiment of this application relates to a neural network computing device. The details of the neural network computing device of this embodiment are described below. The following content is merely for ease of understanding and is not essential for implementing this example. Figure 12 is a schematic diagram of the neural network computing device described in this embodiment, which includes: a first storage unit 1201, a second storage unit 1202, a control unit 1203, a first data rearrangement unit 1204, a convolution unit 1205, and an addition unit 1206.
[0088] The first storage unit is used to store the input data for neural network operations;
[0089] The second storage unit is used to store the Wk*Hk sub-convolutional kernel groups for neural network operations. The N Wk*Hk*C convolutional kernels for neural network operations are split into N*Wk*Hk 1*1*C sub-convolutional kernels. These N*Wk*Hk 1*1*C sub-convolutional kernels are divided into Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels. N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the portion of input data corresponding to the N sub-convolutional kernels in each sub-convolutional kernel group is the same.
[0090] The control unit is used to obtain input data from the first storage unit and input the input data into the first data rearrangement unit. The control unit is also used to send the data rearrangement method corresponding to each sub-convolution kernel group to the first data rearrangement unit.
[0091] The first data rearrangement unit is used to rearrange the input data according to the data rearrangement method corresponding to each sub-convolutional kernel group, to obtain the rearranged input data corresponding to each sub-convolutional kernel group, and output the rearranged input data corresponding to each sub-convolutional kernel group to the convolutional unit; wherein, the rearranged input data corresponding to each sub-convolutional kernel group has the same data position as the part of the input data corresponding to each sub-convolutional kernel group, and the same data position is a valid position.
[0092] The control unit is also used to retrieve each sub-convolution kernel group from the second storage unit and send each sub-convolution kernel group to the convolution unit;
[0093] The convolution unit is used to convolve each sub-convolution kernel group with the rearranged input data corresponding to that sub-convolution kernel group to obtain the convolution result corresponding to each sub-convolution kernel group, and output the convolution result corresponding to each sub-convolution kernel group to the addition unit.
[0094] The addition unit is used to sum the convolution results corresponding to each sub-convolution kernel group to obtain the summed result, and use the data in the valid position in the summed result as the output result of the neural network operation.
[0095] In one example implementation, the neural network computing apparatus provided in this application further includes a third storage unit which, when the neural network operation is performed serially, stores the result computed from the previous sub-convolution kernel group and the input data.
[0096] Another embodiment of this application relates to a neural network computing device. The details of the neural network computing device of this embodiment are described below. The following content is merely for ease of understanding and is not essential for implementing this example. Figure 13 is a schematic diagram of the neural network computing device described in this embodiment, which includes: a first storage unit 1301, a second storage unit 1302, a control unit 1303, a second data rearrangement unit 1304, a convolution unit 1305, and an addition unit 1306.
[0097] The first storage unit is used to store the input data for neural network operations.
[0098] The second storage unit is used to store the Wk*Hk sub-convolutional kernel groups for neural network operations. The N Wk*Hk*C convolutional kernels for neural network operations are split into N*Wk*Hk 1*1*C sub-convolutional kernels. These N*Wk*Hk 1*1*C sub-convolutional kernels are divided into Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels. N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the portion of input data corresponding to the N sub-convolutional kernels in each sub-convolutional kernel group is the same.
[0099] The control unit is used to obtain input data from the first storage unit and input the input data into the convolution unit. The control unit is also used to obtain each sub-convolution kernel group from the second storage unit and input each sub-convolution kernel group into the convolution unit.
[0100] The convolution unit is used to convolve each sub-convolution kernel group and the input data to obtain the convolution result corresponding to each sub-convolution kernel group, and output the convolution result corresponding to each sub-convolution kernel group to the second data rearrangement unit.
[0101] The control unit is also used to send the data rearrangement method corresponding to each sub-convolution kernel group to the second data rearrangement unit.
[0102] The second data rearrangement unit is used to rearrange the convolution results corresponding to each sub-convolution kernel group according to the data rearrangement method corresponding to each sub-convolution kernel group, to obtain the rearranged convolution results corresponding to each sub-convolution kernel group, and output the rearranged convolution results corresponding to each sub-convolution kernel group to the addition unit; wherein, each sub-convolution kernel group is convolved with the partial input data corresponding to each sub-convolution kernel group to obtain the effective convolution results corresponding to each sub-convolution kernel group, and the effective convolution results in the rearranged convolution results corresponding to each sub-convolution kernel group have the same data position, and the same data position is a valid position.
[0103] In one example implementation, the neural network computing apparatus provided in this application further includes a third storage unit which, when the neural network operation is performed serially, stores the result computed from the previous sub-convolution kernel group and the input data.
[0104] Another embodiment of this application relates to a neural network computing device. The details of the neural network computing device of this embodiment are described below. The following content is merely for ease of understanding and is not essential for implementing this example. Figure 14 is a schematic diagram of the neural network computing device described in this embodiment, which includes: a first storage unit 1401, a second storage unit 1402, a control unit 1403, a third data rearrangement unit 1404, a convolution unit 1405, and an addition unit 1406.
[0105] The first storage unit is used to store the input data for neural network operations.
[0106] The second storage unit is used to store the Wk*Hk sub-convolutional kernel groups for neural network operations. The N Wk*Hk*C convolutional kernels for neural network operations are split into N*Wk*Hk 1*1*C sub-convolutional kernels. These N*Wk*Hk 1*1*C sub-convolutional kernels are divided into Wk*Hk sub-convolutional kernel groups, and each sub-convolutional kernel group includes N 1*1*C sub-convolutional kernels. N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolutional kernel corresponds to a portion of the input data, and when N≥2, the portion of input data corresponding to the N sub-convolutional kernels in each sub-convolutional kernel group is the same.
[0107] The control unit is used to obtain input data from the first storage unit and input the input data into the convolution unit. The control unit is also used to obtain the i-th sub-convolution kernel group from the second storage unit and input the i-th sub-convolution kernel group into the convolution unit.
[0108] The convolution unit is used to convolve the i-th sub-convolution kernel group and the input data to obtain the i-th convolution result, and output the i-th convolution result to the addition unit; wherein, the i-th sub-convolution kernel group is convolved with the part of the input data corresponding to the i-th sub-convolution kernel group to obtain the effective convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains the effective convolution result.
[0109] The control unit is also used to obtain the (i-1)th accumulation result from the third storage unit and send the (i-1)th accumulation result to the third data rearrangement unit.
[0110] The third data rearrangement unit rearranges the (i-1)th accumulation result so that the effective convolution results in the rearranged (i-1)th accumulation result and the effective convolution results in the i-th convolution result have the same data position; and outputs the rearranged (i-1)th accumulation result to the addition unit; accumulates the rearranged (i-1)th accumulation result with the i-th convolution result to obtain the i-th accumulation result, and stores the i-th accumulation result in the third storage unit, overwriting the (i-1)th accumulation result.
[0111] The control unit is also used to determine the value of i. If i is less than Wk*Hk, i is updated to i+1 and the calculation steps are executed again. If i is equal to Wk*Hk, the effective convolution result in the i-th accumulation result is used as the output result of the neural network operation. The initial value of i is 1, and when i=1, the 0th accumulation result is set to zero, and the effective convolution result in the rearranged 0th accumulation result and the effective convolution result in the 1st convolution result are assumed to have the same data position.
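The loop described in paragraphs [0107] to [0111] can be sketched as follows. This is a minimal sketch under several assumptions that are not stated in this section: stride 1, no padding, groups visited in row-major order, and the rearrangement modeled as a cyclic shift of the accumulation result. All function names are illustrative. The result is checked against a directly computed convolution.

```python
def conv1x1(x, group):
    """Apply the N 1*1*C sub-kernels of one group at every pixel of x."""
    return [[[sum(wt * v for wt, v in zip(sub, pixel)) for pixel in row]
             for row in x] for sub in group]


def roll(grid, dy, dx):
    """Cyclically shift a 2-D grid so the element at (r, c) moves to (r+dy, c+dx)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def conv_by_accumulator_shift(x, kernels):
    n, hk, wk = len(kernels), len(kernels[0]), len(kernels[0][0])
    h, w = len(x), len(x[0])
    acc = [[[0] * w for _ in range(h)] for _ in range(n)]  # 0th accumulation result
    prev = (0, 0)
    for ky in range(hk):
        for kx in range(wk):
            group = [kernels[k][ky][kx] for k in range(n)]
            res = conv1x1(x, group)  # i-th convolution result, input left untouched
            # Rearrange the previous accumulation so its effective results line
            # up with the effective results of the current group.
            acc = [roll(layer, ky - prev[0], kx - prev[1]) for layer in acc]
            acc = [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(la, lb)]
                   for la, lb in zip(acc, res)]
            prev = (ky, kx)
    hout, wout = h - hk + 1, w - wk + 1
    # After the last group, the effective results sit at offset (Hk-1, Wk-1).
    return [[row[wk - 1:wk - 1 + wout] for row in layer[hk - 1:hk - 1 + hout]]
            for layer in acc]


def direct_conv(x, kernels):
    """Reference: ordinary stride-1, no-padding convolution."""
    n, hk, wk, c = len(kernels), len(kernels[0]), len(kernels[0][0]), len(x[0][0])
    hout, wout = len(x) - hk + 1, len(x[0]) - wk + 1
    return [[[sum(kernels[k][ky][kx][ch] * x[r + ky][col + kx][ch]
                  for ky in range(hk) for kx in range(wk) for ch in range(c))
              for col in range(wout)] for r in range(hout)] for k in range(n)]


x = [[[(3 * r + 5 * col + ch) % 7 for ch in range(2)] for col in range(5)]
     for r in range(4)]                                   # H=4, W=5, C=2
kernels = [[[[(k + 2 * ky + kx + ch) % 3 - 1 for ch in range(2)]
             for kx in range(3)] for ky in range(2)] for k in range(2)]  # N=2, Hk=2, Wk=3
print(conv_by_accumulator_shift(x, kernels) == direct_conv(x, kernels))
```

In this sketch only one accumulation buffer exists at any time, which mirrors the overwriting of the (i-1)th accumulation result in the third storage unit.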
[0112] It is not difficult to see that this embodiment is a system embodiment corresponding to the above method embodiments, and this embodiment can be implemented in conjunction with the above method embodiments. The relevant technical details and technical effects mentioned in the above embodiments remain valid in this embodiment and, for brevity, are not repeated here. Accordingly, the relevant technical details mentioned in this embodiment can also be applied to the above embodiments.
[0113] It is worth mentioning that all modules involved in this embodiment are logical modules. In practical applications, a logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units. Furthermore, to highlight the innovative aspects of this application, this embodiment does not introduce units that are not closely related to solving the technical problems proposed in this application; however, this does not mean that other units are absent in this embodiment.
[0114] Another embodiment of this application relates to a chip. As shown in Figure 6, the chip includes: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601; wherein the memory 602 stores instructions executable by the at least one processor 601, the instructions being executed by the at least one processor 601 to enable the at least one processor 601 to perform the neural network operation methods in the above embodiments.
[0115] The memory and processor are connected via a bus, which can include any number of interconnecting buses and bridges, connecting various circuits of one or more processors and memories. The bus can also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and will not be described further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver can be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. Data processed by the processor is transmitted over the wireless medium via an antenna, which further receives data and transmits it to the processor.
[0116] The processor manages the bus and general processing, and also provides various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. Memory is used to store data used by the processor during operation.
[0117] Another embodiment of this application relates to an electronic device. As shown in Figure 6, the electronic device includes: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601; wherein the memory 602 stores instructions executable by the at least one processor 601, the instructions being executed by the at least one processor 601 to enable the at least one processor 601 to perform the neural network operation methods in the above embodiments.
[0118] The memory and processor are connected via a bus, which can include any number of interconnecting buses and bridges, connecting various circuits of one or more processors and memories. The bus can also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and will not be described further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver can be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices over a transmission medium. Data processed by the processor is transmitted over the wireless medium via an antenna, which further receives data and transmits it to the processor.
[0119] The processor manages the bus and general processing, and also provides various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. Memory is used to store data used by the processor during operation.
[0120] Another embodiment of this application relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the method embodiments described above.
[0121] That is, those skilled in the art will understand that all or part of the steps in the methods of the above embodiments can be implemented by a program instructing related hardware. This program is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, chip, etc.) or processor to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
[0122] Those skilled in the art will understand that the above embodiments are specific implementations of this application, and in practical applications, various changes can be made in form and detail without departing from the spirit and scope of this application.
Claims
1. A neural network operation method, characterized in that it includes:
acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and then entering an operation step; wherein the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, and each sub-convolution kernel group includes N 1*1*C sub-convolution kernels; N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same portion of the input data;
the operation step includes: rearranging the input data according to the data rearrangement method corresponding to each sub-convolution kernel group to obtain the rearranged input data corresponding to each sub-convolution kernel group; convolving each sub-convolution kernel group with its corresponding rearranged input data to obtain the convolution result corresponding to each sub-convolution kernel group; accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and using the data at the valid position in the accumulation result as the output result of the neural network operation;
wherein, across the rearranged input data corresponding to the respective sub-convolution kernel groups, the portions of input data corresponding to the respective sub-convolution kernel groups have the same data position, and that same data position is the valid position.
2. The neural network operation method according to claim 1, characterized in that the operation step includes: rearranging the input data according to the data rearrangement method corresponding to the i-th sub-convolution kernel group to obtain the i-th rearranged input data; convolving the i-th sub-convolution kernel group with the i-th rearranged input data to obtain the i-th convolution result; accumulating the i-th convolution result with the (i-1)th accumulation result to obtain the i-th accumulation result; if i is less than Wk*Hk, updating i to i+1 and executing the operation step again; if i is equal to Wk*Hk, using the data at the valid position in the i-th accumulation result as the output result of the neural network operation; wherein the initial value of i is 1, and when i=1, the data rearrangement method corresponding to the 1st sub-convolution kernel group is set so that the position of each part of the input data remains unchanged, and the 0th accumulation result is set to zero.
3. The neural network operation method according to claim 2, characterized in that acquiring the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups includes: loading the input data; and, before rearranging the input data according to the data rearrangement method corresponding to the i-th sub-convolution kernel group to obtain the i-th rearranged input data, loading the i-th sub-convolution kernel group.
4. The neural network operation method according to claim 3, characterized in that loading the i-th sub-convolution kernel group specifically means loading the i-th sub-convolution kernel group in a data overlay manner.
5. The neural network operation method according to any one of claims 1 to 4, characterized in that, when N≥2, convolving each sub-convolution kernel group with its corresponding rearranged input data to obtain the convolution result corresponding to each sub-convolution kernel group specifically includes: for each sub-convolution kernel group, convolving each of the N sub-convolution kernels in the sub-convolution kernel group with the rearranged input data corresponding to the sub-convolution kernel group to obtain N sub-convolution results, and using the N sub-convolution results as the N layers of data in the convolution result corresponding to the sub-convolution kernel group.
6. A neural network operation method, characterized in that it includes:
acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and then entering an operation step; wherein the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, and each sub-convolution kernel group includes N 1*1*C sub-convolution kernels; N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same portion of the input data;
the operation step includes: convolving each sub-convolution kernel group with the input data to obtain the convolution result corresponding to each sub-convolution kernel group; rearranging the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement method corresponding to that sub-convolution kernel group to obtain the rearranged convolution result corresponding to each sub-convolution kernel group; accumulating the rearranged convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and using the data at the valid position in the accumulation result as the output result of the neural network operation;
wherein convolving each sub-convolution kernel group with its corresponding portion of the input data yields the effective convolution result corresponding to that sub-convolution kernel group; the effective convolution results in the rearranged convolution results corresponding to the respective sub-convolution kernel groups have the same data position, and that same data position is the valid position.
7. The neural network operation method according to claim 6, characterized in that the operation step includes: convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result; rearranging the i-th convolution result according to the data rearrangement method corresponding to the i-th convolution result to obtain the i-th rearranged convolution result; accumulating the i-th rearranged convolution result with the (i-1)th accumulation result to obtain the i-th accumulation result; if i is less than Wk*Hk, updating i to i+1 and executing the operation step again; if i is equal to Wk*Hk, using the data at the valid position in the i-th accumulation result as the output result of the neural network operation; wherein the initial value of i is 1, and when i=1, the data rearrangement method corresponding to the 1st convolution result is set so that the position of each part of the 1st convolution result remains unchanged, and the 0th accumulation result is set to zero.
8. The neural network operation method according to claim 7, characterized in that acquiring the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups includes: loading the input data; and, before convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, loading the i-th sub-convolution kernel group.
9. The neural network operation method according to claim 8, characterized in that loading the i-th sub-convolution kernel group specifically means loading the i-th sub-convolution kernel group in a data overlay manner.
10. The neural network operation method according to any one of claims 6 to 9, characterized in that, when N≥2, convolving each sub-convolution kernel group with the input data to obtain the convolution result corresponding to each sub-convolution kernel group specifically includes: for each sub-convolution kernel group, convolving each of the N sub-convolution kernels in the sub-convolution kernel group with the input data to obtain N sub-convolution results, and using the N sub-convolution results as the N layers of data in the convolution result corresponding to the sub-convolution kernel group.
11. A neural network operation method, characterized in that it includes:
acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and then entering an operation step; wherein the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, and each sub-convolution kernel group includes N 1*1*C sub-convolution kernels; N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same portion of the input data;
the operation step includes: convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, wherein convolving the i-th sub-convolution kernel group with the portion of input data corresponding to the i-th sub-convolution kernel group yields the effective convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result includes the effective convolution result; rearranging the (i-1)th accumulation result so that the effective convolution result in the rearranged (i-1)th accumulation result and the effective convolution result in the i-th convolution result have the same data position; accumulating the rearranged (i-1)th accumulation result with the i-th convolution result to obtain the i-th accumulation result; if i is less than Wk*Hk, updating i to i+1 and executing the operation step again; if i is equal to Wk*Hk, using the effective convolution result in the i-th accumulation result as the output result of the neural network operation;
wherein the initial value of i is 1, and when i=1, the 0th accumulation result is set to zero, and the effective convolution result in the rearranged 0th accumulation result and the effective convolution result in the 1st convolution result are assumed to have the same data position.
12. The neural network operation method according to claim 11, characterized in that acquiring the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups includes: loading the input data; and, before convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, loading the i-th sub-convolution kernel group.
13. The neural network operation method according to claim 12, characterized in that loading the i-th sub-convolution kernel group specifically means loading the i-th sub-convolution kernel group in a data overlay manner.
14. The neural network operation method according to any one of claims 11 to 13, characterized in that, when N≥2, convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result specifically includes: convolving each of the N sub-convolution kernels in the i-th sub-convolution kernel group with the input data to obtain N sub-convolution results, and using the N sub-convolution results as the N layers of data in the convolution result corresponding to the i-th sub-convolution kernel group.
15. A neural network computing device, characterized in that it includes: a first storage unit, a second storage unit, a control unit, a first data rearrangement unit, a convolution unit, and an addition unit;
the first storage unit is used to store the input data of the neural network operation, and the second storage unit is used to store the Wk*Hk sub-convolution kernel groups of the neural network operation; wherein the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, and each sub-convolution kernel group includes N 1*1*C sub-convolution kernels; N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same portion of the input data;
the control unit is used to obtain the input data from the first storage unit and input the input data into the first data rearrangement unit; the control unit is also used to send the data rearrangement method corresponding to each sub-convolution kernel group to the first data rearrangement unit;
the first data rearrangement unit is used to rearrange the input data according to the data rearrangement method corresponding to each sub-convolution kernel group to obtain the rearranged input data corresponding to each sub-convolution kernel group, and to output the rearranged input data corresponding to each sub-convolution kernel group to the convolution unit; wherein, across the rearranged input data corresponding to the respective sub-convolution kernel groups, the portions of input data corresponding to the respective sub-convolution kernel groups have the same data position, and that same data position is the valid position;
the control unit is further used to obtain each sub-convolution kernel group from the second storage unit and send each sub-convolution kernel group to the convolution unit;
the convolution unit is used to convolve each sub-convolution kernel group with its corresponding rearranged input data to obtain the convolution result corresponding to each sub-convolution kernel group, and to output the convolution result corresponding to each sub-convolution kernel group to the addition unit;
the addition unit is used to accumulate the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and to use the data at the valid position in the accumulation result as the output result of the neural network operation.
16. A neural network computing device, characterized in that it includes: a first storage unit, a second storage unit, a control unit, a second data rearrangement unit, a convolution unit, and an addition unit;
the first storage unit is used to store the input data of the neural network operation, and the second storage unit is used to store the Wk*Hk sub-convolution kernel groups of the neural network operation; wherein the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, and each sub-convolution kernel group includes N 1*1*C sub-convolution kernels; N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same portion of the input data;
the control unit is used to obtain the input data from the first storage unit and input the input data into the convolution unit; the control unit is also used to obtain each sub-convolution kernel group from the second storage unit and input each sub-convolution kernel group into the convolution unit;
the convolution unit is used to convolve each sub-convolution kernel group with the input data to obtain the convolution result corresponding to each sub-convolution kernel group, and to output the convolution result corresponding to each sub-convolution kernel group to the second data rearrangement unit;
the control unit is also used to send the data rearrangement method corresponding to each sub-convolution kernel group to the second data rearrangement unit;
the second data rearrangement unit is used to rearrange the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement method corresponding to that sub-convolution kernel group to obtain the rearranged convolution result corresponding to each sub-convolution kernel group, and to output the rearranged convolution result corresponding to each sub-convolution kernel group to the addition unit;
wherein convolving each sub-convolution kernel group with its corresponding portion of the input data yields the effective convolution result corresponding to that sub-convolution kernel group; the effective convolution results in the rearranged convolution results corresponding to the respective sub-convolution kernel groups have the same data position, and that same data position is the valid position.
17. A neural network computing device, characterized in that it includes: a first storage unit, a second storage unit, a third storage unit, a control unit, a third data rearrangement unit, a convolution unit, and an addition unit;
the first storage unit is used to store the input data of the neural network operation, and the second storage unit is used to store the Wk*Hk sub-convolution kernel groups of the neural network operation; wherein the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, and each sub-convolution kernel group includes N 1*1*C sub-convolution kernels; N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to a portion of the input data, and when N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same portion of the input data;
the control unit is used to obtain the input data from the first storage unit and input the input data into the convolution unit; the control unit is also used to obtain the i-th sub-convolution kernel group from the second storage unit and input the i-th sub-convolution kernel group into the convolution unit;
the convolution unit, the control unit, and the third data rearrangement unit are used to perform an operation step, which includes:
the convolution unit is used to convolve the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, and to output the i-th convolution result to the addition unit; wherein convolving the i-th sub-convolution kernel group with the portion of input data corresponding to the i-th sub-convolution kernel group yields the effective convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result includes the effective convolution result;
the control unit is further used to obtain the (i-1)th accumulation result from the third storage unit and send the (i-1)th accumulation result to the third data rearrangement unit;
the third data rearrangement unit is used to rearrange the (i-1)th accumulation result so that the effective convolution result in the rearranged (i-1)th accumulation result and the effective convolution result in the i-th convolution result have the same data position, and to output the rearranged (i-1)th accumulation result to the addition unit;
the rearranged (i-1)th accumulation result is accumulated with the i-th convolution result to obtain the i-th accumulation result, and the i-th accumulation result is stored in the third storage unit, overwriting the (i-1)th accumulation result;
the control unit is also used to determine the value of i: if i is less than Wk*Hk, i is updated to i+1 and the operation step is executed again; if i is equal to Wk*Hk, the effective convolution result in the i-th accumulation result is used as the output result of the neural network operation;
wherein the initial value of i is 1, and when i=1, the 0th accumulation result is set to zero, and the effective convolution result in the rearranged 0th accumulation result and the effective convolution result in the 1st convolution result are assumed to have the same data position.
18. A chip, characterized in that it includes: at least one processing module; and a storage module communicatively connected to the at least one processing module; wherein the storage module stores instructions executable by the at least one processing module, the instructions being executed by the at least one processing module to enable the at least one processing module to perform the method according to any one of claims 1 to 5, or the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
19. An electronic device, characterized in that it includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 5, or the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
20. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the method according to any one of claims 1 to 5, or the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
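For illustration only, the two other orderings recited in the claims (rearranging the input data before the 1*1 convolution, as in claims 1 to 5, and rearranging each convolution result before accumulation, as in claims 6 to 10) can be sketched as below. The sketch assumes stride 1, no padding, and a cyclic shift as the data rearrangement method, with the valid position taken as the top-left (H-Hk+1)*(W-Wk+1) window; none of these specifics, nor the function names, are stated in the claims themselves.

```python
def conv1x1(x, group):
    """Apply the N 1*1*C sub-kernels of one group at every pixel of x."""
    return [[[sum(wt * v for wt, v in zip(sub, pixel)) for pixel in row]
             for row in x] for sub in group]


def roll(grid, dy, dx):
    """Cyclically shift a 2-D grid so the element at (r, c) moves to (r+dy, c+dx)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def add_layers(a, b):
    """Element-wise sum of two N*H*W nested lists."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(la, lb)]
            for la, lb in zip(a, b)]


def conv_by_shift(x, kernels, shift_input):
    n, hk, wk = len(kernels), len(kernels[0]), len(kernels[0][0])
    h, w = len(x), len(x[0])
    acc = [[[0] * w for _ in range(h)] for _ in range(n)]  # 0th accumulation result
    for ky in range(hk):
        for kx in range(wk):
            group = [kernels[k][ky][kx] for k in range(n)]
            if shift_input:
                # Claims 1 to 5 style: move the input portion for this group
                # to the top-left corner, then convolve.
                res = conv1x1(roll(x, -ky, -kx), group)
            else:
                # Claims 6 to 10 style: convolve the untouched input, then move
                # the effective results to the top-left corner.
                res = [roll(layer, -ky, -kx) for layer in conv1x1(x, group)]
            acc = add_layers(acc, res)
    hout, wout = h - hk + 1, w - wk + 1
    return [[row[:wout] for row in layer[:hout]] for layer in acc]  # valid positions


def direct_conv(x, kernels):
    """Reference: ordinary stride-1, no-padding convolution."""
    n, hk, wk, c = len(kernels), len(kernels[0]), len(kernels[0][0]), len(x[0][0])
    hout, wout = len(x) - hk + 1, len(x[0]) - wk + 1
    return [[[sum(kernels[k][ky][kx][ch] * x[r + ky][col + kx][ch]
                  for ky in range(hk) for kx in range(wk) for ch in range(c))
              for col in range(wout)] for r in range(hout)] for k in range(n)]


x = [[[(3 * r + 5 * col + ch) % 7 for ch in range(2)] for col in range(5)]
     for r in range(4)]                                   # H=4, W=5, C=2
kernels = [[[[(k + 2 * ky + kx + ch) % 3 - 1 for ch in range(2)]
             for kx in range(3)] for ky in range(2)] for k in range(2)]  # N=2, Hk=2, Wk=3
expected = direct_conv(x, kernels)
print(conv_by_shift(x, kernels, True) == expected,
      conv_by_shift(x, kernels, False) == expected)
```

For the first group the shift is (0, 0), which matches the requirement in claims 2 and 7 that the first rearrangement leaves every position unchanged.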