Sparse data processing method and device of neural network processor
By combining the weight sub-vectors and applying shift-add calculation, the problem of high hardware cost in existing technologies is addressed, and variable-precision sparse data processing is achieved with reduced hardware requirements.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- AXERA TECH (BEIJING) CO LTD
- Filing Date
- 2022-11-23
- Publication Date
- 2026-06-23
AI Technical Summary
Existing neural network processors cannot effectively utilize the distribution of weights in sparse data processing, resulting in high hardware costs and an inability to support sparse computation with variable precision.
By obtaining multiple sets of weight sub-vectors, the feature vectors corresponding to the weight vectors to be calculated are determined, and the weight sub-vectors are combined to obtain a combined weight vector with the same information unit supported by the basic computing unit. Then, the basic computing unit is controlled to perform shift and addition calculations on the combined weight vector and the feature vector to achieve sparse data processing.
While ensuring the accuracy of sparse data processing, it effectively reduces hardware costs and supports sparse computation with variable precision.
Figure CN115809683B_ABST
Description
Technical Field
[0001] This disclosure relates to the field of deep learning technology, and in particular to a method and apparatus for sparse data processing of a neural network processor.
Background Technology
[0002] The core structure of deep learning neural network computation is matrix multiplication. Matrix computation generally consists of vector inner product calculations. The two inputs to matrix multiplication are features (or some integer operation on the features) and weights. On the inference side, quantization techniques are generally used to perform inference through integer calculations, such as INT8 and INT4.
[0003] In related technologies, the computation of deep learning neural networks generally exhibits sparse properties, with the weights of deep learning neural networks often following a normal distribution. For normally distributed data, the range of values varies considerably. For a set of weights, the value range is more often distributed within the INT2 / INT4 range, and less often within the INT6 / INT8 range.
[0004] As a result, existing neural network processors cannot effectively utilize the distribution of weights and cannot support sparse computation with variable precision, resulting in high hardware costs for sparse data processing.
Summary of the Invention
[0005] This disclosure aims to at least partially address one of the technical problems in the related art.
[0006] Therefore, the purpose of this disclosure is to propose a sparse data processing method, apparatus, neural network processor, electronic device, non-transitory computer-readable storage medium storing computer instructions, and computer program product for a neural network processor, which can fully utilize the distribution of weights, support sparse computation with variable precision, and effectively reduce the hardware cost required for sparse data processing while ensuring the accuracy of sparse data processing.
[0007] The first aspect of this disclosure discloses a method for sparse data processing of a neural network processor, the neural network processor comprising: a basic computing unit; the method comprising: acquiring multiple sets of weight sub-vectors, wherein the weight sub-vectors are obtained by sparse processing of a weight vector to be computed; determining a feature vector to be computed corresponding to the weight vector to be computed; combining the multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit supported by the basic computing unit; and controlling the basic computing unit to perform shift-add calculation on the combined weight vector and the feature vector to be computed to obtain a sparse data processing result.
[0008] The second aspect of this disclosure provides a sparse data processing apparatus for a neural network processor, the neural network processor comprising: a basic computing unit; the apparatus comprising: an acquisition module for acquiring multiple sets of weight sub-vectors, wherein the weight sub-vectors are obtained by sparsifying a weight vector to be computed; a determination module for determining a feature vector to be computed corresponding to the weight vector to be computed; a processing module for combining the multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit supported by the basic computing unit; and a control module for controlling the basic computing unit to perform shift-add calculations on the combined weight vector and the feature vector to be computed to obtain a sparse data processing result.
[0009] The neural network processor proposed in the third aspect of this disclosure includes a processing unit and a basic computing unit. The processing unit is configured to acquire multiple sets of weight sub-vectors, wherein the weight sub-vectors are obtained by sparsifying a weight vector to be computed; determine a feature vector to be computed corresponding to the weight vector to be computed; combine the multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit supported by the basic computing unit; and control the basic computing unit to perform shift-add calculations on the combined weight vector and the feature vector to be computed to obtain a sparsified data processing result.
[0010] An electronic device according to a fourth aspect embodiment of this disclosure includes: at least one neural network processor; and a memory communicatively connected to the at least one neural network processor; wherein the memory stores instructions executable by the at least one neural network processor, the instructions being executed by the at least one neural network processor to enable the at least one neural network processor to perform a sparse data processing method for a neural network processor as proposed in a first aspect embodiment of this disclosure.
[0011] The fifth aspect of this disclosure provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the sparse data processing method of the neural network processor as proposed in the first aspect of this disclosure.
[0012] A sixth aspect of this disclosure provides a computer program product in which, when instructions in the computer program product are executed by a processor, the sparse data processing method of a neural network processor as described in a first aspect of this disclosure is performed.
[0013] Additional aspects and advantages of this disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this disclosure.
Attached Figure Description
[0014] The above and / or additional aspects and advantages of this disclosure will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
[0015] Figure 1 is a schematic flowchart of a sparse data processing method for a neural network processor according to an embodiment of the present disclosure;
[0016] Figure 2 is a flowchart illustrating a sparse data processing method for a neural network processor according to another embodiment of this disclosure;
[0017] Figure 3a is an application diagram of one embodiment of the present disclosure;
[0018] Figure 3b is another application diagram of an embodiment of this disclosure;
[0019] Figure 4a is another application diagram of an embodiment of this disclosure;
[0020] Figure 4b is another application diagram of an embodiment of this disclosure;
[0021] Figure 5a is another application diagram of an embodiment of this disclosure;
[0022] Figure 5b is another application diagram of an embodiment of this disclosure;
[0023] Figure 6a is another application diagram of an embodiment of this disclosure;
[0024] Figure 6b is another application diagram of an embodiment of this disclosure;
[0025] Figure 7 is a schematic diagram of the structure of a sparse data processing device for a neural network processor according to an embodiment of the present disclosure;
[0026] Figure 8 is a schematic diagram of the structure of a neural network processor proposed in an embodiment of this disclosure;
[0027] Figure 9 is a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.
Detailed Implementation
[0028] Embodiments of this disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are used only to explain this disclosure, and should not be construed as limiting this disclosure. Rather, embodiments of this disclosure include all variations, modifications, and equivalents falling within the spirit and scope of the appended claims.
[0029] Figure 1 is a schematic flowchart of a sparse data processing method for a neural network processor proposed in an embodiment of this disclosure.
[0030] This embodiment is described using the example in which the sparse data processing method for a neural network processor is configured in a sparse data processing device for a neural network processor. The sparse data processing device can be located in a server or in an electronic device, and this embodiment places no limitation on this.
[0031] This embodiment uses the example of a sparse data processing method for a neural network processor being configured in an electronic device. The electronic device includes hardware devices with various operating systems, such as smartphones, tablets, personal digital assistants, and e-readers.
[0032] It should be noted that the execution entity of the embodiments disclosed herein may be, in hardware, a central processing unit (CPU) in a server or electronic device, and in software, a related background service in a server or electronic device, without limitation.
[0033] The neural network processor in this embodiment of the disclosure may be, for example, a processor that performs some deep learning calculations, such as a neural-network processing unit (NPU), and there is no limitation thereto.
[0034] The neural network processor in this embodiment includes a basic computing unit, which can be understood as a basic computing unit in a deep learning neural network. The basic computing unit can be used to process input data, such as a set of feature vectors and a set of weight vectors. The basic computing unit can then perform online inference based on the input set of feature vectors and the corresponding set of weight vectors.
[0035] As shown in Figure 1, the sparse data processing method of this neural network processor includes:
[0036] S101: Obtain multiple sets of weight sub-vectors, where the weight sub-vectors are obtained by sparsifying the weight vectors to be calculated.
[0037] The information unit (i.e., bit) supported by the basic computing unit in this embodiment can be, for example, 4 bits x 16 bits, 8 bits x 8 bits, or 4 bits x 8 bits, and there is no limitation thereto.
[0038] The weight vector to be calculated can be represented by W, and the corresponding feature vector to be calculated can be represented by F. The basic computing unit can then perform inference based on the weight vector W and the feature vector F to obtain the inference result, which is the sparse data processing result, and can be represented by Y.
[0039] A weight sub-vector is obtained by sparse splitting of the weight vector W to be calculated. When performing the sparse splitting, the weight vector W to be calculated can be sparsified according to a preset ratio pattern, without limitation.
[0040] The preset ratio pattern is, for example, 8:6:2, which means that for 8-bit vector data (feature vector or weight vector to be calculated), 6 numbers are selected for the low bits and 2 numbers are selected for the high bits. Another example is 8:4:4. No limitation is placed on this.
[0041] For example, under the 8:6:2 pattern, 6 numbers can be selected for the low bits of the weight vector W to be calculated (the selected numbers can be combined into one set of weight sub-vectors), and 2 numbers can be selected for the high bits (the selected numbers can be combined into another set of weight sub-vectors), without limitation.
[0042] In some embodiments of this disclosure, the weight sub-vector includes multiple weight sub-elements, which are obtained by sparsifying the weight elements in the weight vector to be calculated, thereby enabling efficient sparsification calculation of the weight vector to be calculated and facilitating full utilization of the weight distribution.
[0043] In this context, a weight element refers to one or more numerical values describing the weights contained in the weight vector to be calculated. For example, the numerical values describing the weights in the high and low bits of the weight vector to be calculated in the example above can be referred to as weight elements.
[0044] Then, the weight elements in the weight vector to be calculated can be sparsified. The sparsification method is to select numbers from the low bits and the high bits respectively. For example, the 6 numbers selected from the low bits and the 2 numbers selected from the high bits can be called weight sub-elements. The 6 numbers selected from the low bits can be combined into one weight sub-vector, and the 2 numbers selected from the high bits can be combined into another weight sub-vector.
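The split into low bits and high bits can be pictured with the following minimal sketch. It assumes, which the disclosure does not state, that each weight element is a signed 8-bit integer, that the low part is the unsigned lower 4 bits, and that the high part is the signed upper 4 bits, so that `w == (high << 4) + low`. The name `split_nibbles` and the sample weights are hypothetical.

```python
def split_nibbles(w):
    """Split a signed 8-bit integer into (low, high) 4-bit parts.

    Assumed convention (not taken from the disclosure):
    low is unsigned (0..15), high is signed (-8..7),
    and w == (high << 4) + low.
    """
    if not -128 <= w <= 127:
        raise ValueError("expected a signed 8-bit value")
    low = w & 0xF   # lower 4 bits; always non-negative in Python
    high = w >> 4   # arithmetic right shift keeps the sign
    return low, high


# Hypothetical INT8 weight vector with 8 weight elements.
weights = [3, -2, 0, 5, 17, 0, -7, 40]
low_parts = [split_nibbles(w)[0] for w in weights]
high_parts = [split_nibbles(w)[1] for w in weights]
```

Under this convention most small-magnitude positive weights have a zero high part, which is what makes the high-bit sub-vector sparse.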
[0045] S102: Determine the feature vector to be calculated corresponding to the weight vector to be calculated.
[0046] After obtaining multiple sets of weight sub-vectors by sparsifying the weight vector to be calculated, the feature vector to be calculated corresponding to the weight vector to be calculated can be determined, that is, the feature vector to be calculated corresponding to the weight vector W to be calculated can be determined.
[0047] S103: Combine multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit as the basic computing unit.
[0048] After obtaining multiple sets of weight sub-vectors by sparsifying the weight vector to be calculated and determining the feature vector to be calculated corresponding to the weight vector to be calculated, the multiple sets of weight sub-vectors can be combined, and the vector obtained by the combination process can be used as the combined weight vector.
[0049] For example, the 6 numbers selected from the low bits can be combined into one weight sub-vector, and the 2 numbers selected from the high bits can be combined into another weight sub-vector. The two weight sub-vectors can then be combined, and the resulting vector can be used as the combined weight vector.
[0050] S104: Control the basic computing unit to perform shift and addition calculations on the combined weight vector and the feature vector to be calculated to obtain the sparse data processing result.
[0051] The above process combines multiple sets of weight sub-vectors to obtain a combined weight vector. Since the combined weight vector has the same information unit supported by the basic computing unit, the basic computing unit can be controlled to perform shift and addition calculations on the combined weight vector and the feature vector to be calculated to obtain the sparse data processing result.
[0052] In some embodiments, the basic computing unit can be controlled to select feature elements from the feature vector to be calculated in the same way the weight sub-vectors were selected. For example, several feature elements can be selected as feature sub-elements. The selected feature sub-elements can be formed into feature sub-vectors, and multiple feature sub-vectors can be combined to obtain a combined feature vector. Then, shift and addition calculations are performed on the combined weight vector and the combined feature vector to obtain the sparse data processing result. There are no restrictions on this.
[0053] In this process, the combined feature vector can be shifted, and a vector inner product can then be computed between the combined weight vector and the shifted combined feature vector. The result can be called the sparse data processing result.
[0054] For example, in the offline computation phase, the weight vector W to be computed is preprocessed. Assuming the weight vector W includes weight elements Wi, and each weight element Wi is 8 bits, each element is split into two groups of 4-bit weight elements. Several weight elements can then be selected from each group of 4-bit weight elements (the selected weight elements can be called weight sub-elements; weight sub-elements belonging to the same group can form a weight sub-vector). A sparsity index is then constructed based on whether each 4-bit weight element is 0. That is, in the offline computation phase, the weight vector to be computed is sparsified to obtain multiple groups of weight sub-vectors, and the multiple groups of weight sub-vectors are combined to obtain a combined weight vector. In the online inference phase, numbers are selected from the feature vector to be computed based on the sparsity index (the selection method is described in subsequent embodiments), and the selected numbers are combined to form a combined feature vector. The basic computing unit is then controlled to perform shift-add calculations on the combined weight vector and the combined feature vector to obtain the sparse data processing result.
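The offline construction of a sparsity index "based on whether each 4-bit weight element is 0" might look like the sketch below. The function name `build_sparsity_index`, the returned pair, and the sample values are illustrative assumptions rather than part of the disclosure.

```python
def build_sparsity_index(parts):
    """Return (index, values) for the non-zero 4-bit weight elements.

    index  : positions of the non-zero elements in the original vector
    values : the non-zero elements themselves, in the same order
    """
    index = [i for i, p in enumerate(parts) if p != 0]
    values = [parts[i] for i in index]
    return index, values


# Hypothetical low and high 4-bit groups of an 8-element weight vector.
low_parts = [3, 5, 0, 5, 7, 0, 2, 9]
high_parts = [0, 1, 0, 2, 0, 0, 0, 0]

low_index, low_values = build_sparsity_index(low_parts)
high_index, high_values = build_sparsity_index(high_parts)
```

With these sample values the low group keeps six elements and the high group keeps two, which matches the 8:6:2 ratio pattern.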
[0055] In this embodiment of the present disclosure, since multiple sets of weight sub-vectors are obtained by sparsifying the weight vector to be calculated, and then the multiple sets of weight sub-vectors are combined to obtain a combined weight vector, the basic computing unit supports shifting and adding calculations based on the combined weight vector and the feature vector to be calculated. Since the consistency of the vector length between the combined weight vector and the feature vector to be calculated is guaranteed, it is not necessary to rely on the basic computing unit that supports different information units to achieve online inference. Under the premise of ensuring the accuracy of sparsified data processing, the hardware cost required for sparsified data processing is effectively reduced.
[0056] In this embodiment, multiple sets of weight sub-vectors are obtained, where the weight sub-vectors are obtained by sparsifying the weight vector to be calculated. The feature vector to be calculated corresponding to the weight vector to be calculated is determined, and the multiple sets of weight sub-vectors are combined to obtain a combined weight vector. The combined weight vector has the same information unit supported by the basic computing unit. The basic computing unit is controlled to perform shift and addition calculations on the combined weight vector and the feature vector to be calculated to obtain the sparsified data processing result. This can make full use of the weight distribution, support sparse calculation with variable precision, and effectively reduce the hardware cost required for sparse data processing while ensuring the accuracy of sparse data processing.
[0057] Figure 2 is a flowchart illustrating a sparse data processing method for a neural network processor according to another embodiment of this disclosure.
[0058] As shown in Figure 2, the sparse data processing method of this neural network processor includes:
[0059] S201: Determine multiple weight elements of first bits and / or multiple weight elements of second bits from the weight vector to be calculated, wherein the first bits are lower than the second bits.
[0060] In this embodiment, the weight vector W to be calculated is 8 bits. The weight element of the first bit can be, for example, the weight element of the lower four bits of the weight vector W to be calculated, and the weight element of the second bit can be, for example, the weight element of the higher four bits of the weight vector W to be calculated. That is, the first bit is lower than the second bit.
[0061] S202: Select a first number of weight elements from multiple weight elements of the first bit as the first weight sub-element, wherein each first weight sub-element has a corresponding first index.
[0062] In order to effectively achieve the sparsification of the weight vector to be computed, in this embodiment of the present disclosure, several weight elements can be selected from the weight elements of the first bit, and the selected weight elements can be called the first weight sub-elements.
[0063] For example, sparsification calculation is performed on the lower four bits of the weight vector W to be calculated. Six numbers, 0, 1, 3, 4, 6, and 7, are selected from the lower four bits of the weight elements as the first weight sub-elements, and the first index W[3:0] of the first weight sub-elements is formed. This first index can be used to index the positions of the first weight sub-elements in the weight vector W to be calculated.
[0064] S203: Select a second number of weight elements from multiple weight elements of the second bit as second weight sub-elements, wherein each second weight sub-element has a corresponding second index.
[0065] For example, sparsification calculation is performed on the higher four bits of the weight vector W to be calculated. Two numbers, 1 and 3, are selected from the higher four bits of the weight elements as the second weight sub-elements, and the second index W[7:4] of the second weight sub-elements is formed. This second index can be used to index the positions of the second weight sub-elements in the weight vector W to be calculated.
[0066] S204: The first weight sub-element and / or the second weight sub-element are used together as multiple weight sub-elements, wherein the first weight sub-element and the second weight sub-element belong to different groups of weight sub-vectors.
[0067] For example, the six numbers selected from the lower four bits (0, 1, 3, 4, 6, 7) can belong to one set of weight sub-vectors, while the two numbers selected from the higher four bits (1, 3) can belong to another set of weight sub-vectors.
[0068] S205: Determine the feature vector to be calculated corresponding to the weight vector to be calculated.
[0069] After obtaining multiple sets of weight sub-vectors by sparsifying the weight vector to be calculated, the feature vector to be calculated corresponding to the weight vector to be calculated can be determined, that is, the feature vector to be calculated corresponding to the weight vector W to be calculated can be determined.
[0070] S206: Combine multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit as the basic computing unit.
[0071] After obtaining multiple sets of weight sub-vectors by sparsifying the weight vector to be calculated and determining the feature vector to be calculated corresponding to the weight vector to be calculated, the multiple sets of weight sub-vectors can be combined, and the vector obtained by the combination process can be used as the combined weight vector.
[0072] For example, the 6 numbers selected from the low bits can be combined into one weight sub-vector, and the 2 numbers selected from the high bits can be combined into another weight sub-vector. The two weight sub-vectors can then be combined, and the resulting vector can be used as the combined weight vector.
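As a hedged illustration of S202 to S206, the sketch below gathers the first and second weight sub-elements by index and concatenates them into a combined weight vector. It assumes that the numbers 0, 1, 3, 4, 6, 7 and 1, 3 from the example denote positions in the weight vector, and that the basic computing unit processes 8 elements at a time. The helper `gather`, the constant `UNIT_LENGTH`, and the sample values are hypothetical.

```python
def gather(values, index):
    """Pick the elements of `values` at the given positions, in order."""
    return [values[i] for i in index]


# Hypothetical low and high 4-bit groups of an 8-element weight vector.
low_parts = [3, 5, 0, 5, 7, 0, 2, 9]
high_parts = [0, 1, 0, 2, 0, 0, 0, 0]

first_index = [0, 1, 3, 4, 6, 7]   # assumed positions kept from the low bits
second_index = [1, 3]              # assumed positions kept from the high bits

first_sub_vector = gather(low_parts, first_index)     # first weight sub-elements
second_sub_vector = gather(high_parts, second_index)  # second weight sub-elements

# Combined weight vector W': low-bit selections first, then high-bit ones.
combined_weights = first_sub_vector + second_sub_vector

UNIT_LENGTH = 8  # assumed number of elements handled by the basic computing unit
fits_unit = len(combined_weights) == UNIT_LENGTH
```

The combined vector has the same length regardless of how the 8 slots are divided between low-bit and high-bit selections, which is why a single basic computing unit can serve different ratio patterns.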
[0073] S207: Control the basic computing unit to select, from the feature vector to be calculated and according to the first index, the first feature sub-element corresponding to the first weight sub-element.
[0074] The first index can be used to index the position of the first weight sub-element in the weight vector W to be calculated. The feature sub-element selected from the feature vector to be calculated based on the first index, which has the same position as the first weight sub-element, can be called the first feature sub-element.
[0075] S208: Control the basic computing unit to select, from the first feature sub-element and according to the second index, the second feature sub-element corresponding to the second weight sub-element.
[0076] The second index can be used to index the position of the second weight sub-element in the weight vector W to be calculated. The feature sub-element selected from the first feature sub-element based on the second index, which has the same position as the second weight sub-element, can be called the second feature sub-element.
[0077] S209: Control the basic computing unit to perform combination processing on the first feature sub-element and / or the second feature sub-element to obtain a combined feature vector.
[0078] For example, if a first feature sub-element with the same position as the first weight sub-element is selected from the feature vector to be calculated based on the first index, and a second feature sub-element with the same position as the second weight sub-element is selected from the first feature sub-element based on the second index, then the first feature sub-element and the second feature sub-element can be directly combined to obtain a combined feature vector.
[0079] As another example, if first feature sub-elements with the same positions as the first weight sub-elements are selected from the feature vector to be calculated based only on the first index (in which case several first feature sub-elements are selected), then those first feature sub-elements can be combined to obtain a combined feature vector.
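Steps S207 to S209 can be sketched as a two-level selection. The sketch assumes that both indices hold positions in the original vector and that every position in the second index also appears in the first index, as with (1, 3) and (0, 1, 3, 4, 6, 7) in the example above. The function name `select_features` is hypothetical, and the case where the two selections do not correspond is not covered.

```python
def select_features(features, first_index, second_index=None):
    """Build a combined feature vector from index-based selections.

    First level : pick features at `first_index` from the full vector.
    Second level: pick features at `second_index` from the first-level
                  selection (assumes second_index is a subset of first_index).
    """
    first_level = {i: features[i] for i in first_index}
    combined = [first_level[i] for i in first_index]
    if second_index is not None:
        combined += [first_level[i] for i in second_index]
    return combined


# Hypothetical 8-element feature vector.
features = [1, 2, 3, 4, 5, 6, 7, 8]
combined_both = select_features(features, [0, 1, 3, 4, 6, 7], [1, 3])
combined_first_only = select_features(features, [0, 1, 3, 4, 6, 7])
```

A position that is missing from the first-level selection raises `KeyError` here, which makes the subset assumption explicit rather than silently reading from the full vector.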
[0080] S210: Control the basic computing unit to perform shift and addition calculations on the combined weight vector and the combined feature vector to obtain the sparse data processing result.
[0081] After the basic computing unit has been controlled to combine the first feature sub-element and / or the second feature sub-element into a combined feature vector, the combined feature vector can optionally be made to have the same information unit as that supported by the basic computing unit. The basic computing unit can then be controlled to perform shift and addition calculations on the combined weight vector and the combined feature vector to obtain the sparse data processing result.
[0082] In some embodiments of this disclosure, controlling the basic computing unit to perform shift-add calculations on the combined weight vector and the combined feature vector can proceed as follows: determine the ratio pattern used when sparsifying the weight vector to be calculated, wherein the ratio pattern has a corresponding displacement operation amount; control the basic computing unit to perform displacement operation processing on the combined feature vector according to the displacement operation amount to obtain the target feature vector; and control the basic computing unit to perform a vector inner product operation on the combined weight vector and the target feature vector to obtain the sparse data processing result.
[0083] For example, the ratio pattern is 8:6:2, which means that for 8-bit vector data (feature vector or weight vector to be calculated), 6 numbers are selected for the low bits and 2 numbers are selected for the high bits, or for example, 8:4:4, which is not restricted.
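The link between a ratio pattern and its displacement operation amounts can be sketched as follows. The sketch assumes a pattern string of the form `total:count0:count1...`, that the counts add up to the total, and that group k is shifted by k times the group width (0 for the low 4 bits and 4 for the high 4 bits, as in the 8:6:2 example). The function name `shifts_for_pattern` and the `group_bits` parameter are hypothetical.

```python
def shifts_for_pattern(pattern, group_bits=4):
    """Return the per-element shift amounts implied by a ratio pattern.

    "8:6:2" -> 6 elements from the low group (shift 0) followed by
               2 elements from the high group (shift 4).
    """
    total, *counts = (int(x) for x in pattern.split(":"))
    if sum(counts) != total:
        raise ValueError("group counts must add up to the vector length")
    shifts = []
    for group, count in enumerate(counts):
        # Elements from bit group k are weighted by 2 ** (k * group_bits).
        shifts.extend([group * group_bits] * count)
    return shifts


shift_862 = shifts_for_pattern("8:6:2")
shift_844 = shifts_for_pattern("8:4:4")
shift_880 = shifts_for_pattern("8:8:0")
```

Because only this shift table changes between patterns, the same multiply and accumulate hardware can in principle be reused for each of them.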
[0084] Specific calculation examples for the above steps can be illustrated as follows:
[0085] As shown in Figure 3a and Figure 3b, Figure 3a is an application schematic diagram in an embodiment of the present disclosure, and Figure 3b is another application schematic diagram in an embodiment of the present disclosure. Figure 3a describes the offline calculation process and Figure 3b describes the online calculation process, where W represents the weight vector to be calculated and F represents the feature vector to be calculated. Taking W and F as 8-bit for example, W can be split into 4-bit segments for sparsification processing.
[0086] In the offline calculation process:
[0087] Perform sparsification calculation on the lower 4 bits of W and select a total of 6 numbers 0, 1, 3, 4, 6, 7 (the first weight sub-elements); the shift amount (i.e., displacement operation amount) for each of these numbers is 0. When sparsification is performed on the lower and higher bits separately, the division between the lower and higher bits can be made through value range judgment, and it can be fine-tuned to maintain precision.
[0088] Perform sparsification calculation on the higher 4 bits of W, select 2 numbers (the second weight sub-elements), and the shift amount (i.e., displacement operation amount) for each of the aforementioned numbers is 4.
[0089] Store the selected 4-bit numbers and their indices (the first index and / or the second index).
[0090] Select a total of 6 numbers 0, 1, 3, 4, 6, 7 (the first weight sub-elements), and combine with the 2 numbers selected from the higher 4 bits of W (the second weight sub-elements) to obtain W' (referred to as the combined weight vector).
[0091] In the online calculation process:
[0092] Perform data selection on the 8-bit feature vector F to be calculated: select 6 numbers (the first feature sub-elements) according to the index (the first index) of the lower 4 bits of the weight vector W to be calculated, select 2 numbers (the second feature sub-elements) according to the index (the second index) of the higher 4 bits of the weight vector W to be calculated, and combine them to obtain F' (referred to as the combined feature vector). Then calculate the inner product with shift of the combined weight vector W' and the combined feature vector F'. That is, for i = 0, 1, 2, ..., 7, calculate (W'[i] * F'[i]) << Shift[i] in sequence, and sum the 8 calculated numbers to obtain the sparse data processing result.
[0093] The above scheme can be called 8:6:2, which means that for 8-bit vector data (feature vector or weight vector to be calculated), 6 numbers are selected for the low bits and 2 numbers are selected for the high bits.
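The 8:6:2 offline and online steps above can be put together in one small worked sketch. It assumes non-negative 8-bit weights, treats the selected numbers as positions in the vector, and picks sample data so that every non-zero 4-bit part is covered by the indices, in which case the shifted inner product equals the dense inner product. The function name `sparse_shift_add` and all sample values are hypothetical.

```python
def sparse_shift_add(weights, features, first_index, second_index, high_shift=4):
    """Shifted inner product of combined weight and feature vectors."""
    # Offline: split each weight into low and high 4-bit parts and select.
    low = [w & 0xF for w in weights]
    high = [w >> 4 for w in weights]
    w_comb = [low[i] for i in first_index] + [high[i] for i in second_index]

    # Online: select features with the same indices and combine them.
    f_comb = [features[i] for i in first_index] + [features[i] for i in second_index]

    # Shift is 0 for low-bit selections and `high_shift` for high-bit ones.
    shift = [0] * len(first_index) + [high_shift] * len(second_index)

    # Sum of (W'[i] * F'[i]) << Shift[i].
    return sum((w * f) << s for w, f, s in zip(w_comb, f_comb, shift))


# Hypothetical data: low parts are non-zero at 0, 1, 3, 4, 6, 7 and
# high parts are non-zero at 1 and 3.
W = [3, 21, 0, 37, 7, 0, 2, 9]
F = [1, 2, 3, 4, 5, 6, 7, 8]

sparse_result = sparse_shift_add(W, F, [0, 1, 3, 4, 6, 7], [1, 3])
dense_result = sum(w * f for w, f in zip(W, F))
```

For this data the six low-bit products sum to 154 and the two shifted high-bit products sum to 160, giving 314, the same value as the dense inner product. When the indices omit a non-zero part, the result is an approximation instead.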
[0094] Figure 4a and Figure 4b are further application illustrations of the embodiments of this disclosure. Figure 4a describes the offline calculation process and Figure 4b describes the online computation process, where W represents the weight vector to be calculated and F represents the feature vector to be calculated. In this example both W and F are 8 bits, and W can be split into 4-bit segments for sparsification. The figures show a special case, called 8:4:4, in which the second stage selects the same number of corresponding data; that is, the low-order bits and the high-order bits select the same number of corresponding data.
[0095] Figure 5a and Figure 5b are further application illustrations of the embodiments of this disclosure. Figure 5a describes the offline calculation process and Figure 5b describes the online computation process, where W represents the weight vector to be calculated and F represents the feature vector to be calculated. In this example both W and F are 8 bits, and W can be split into 4-bit segments for sparsification. The figures show a special case, called 8:8:0, in which no data is selected in the second stage: 8 numbers are selected from the low bits and none from the high bits.
[0096] Figure 6a and Figure 6b are further application illustrations of the embodiments of this disclosure. Figure 6a describes the offline calculation process and Figure 6b describes the online computation process, where W represents the weight vector to be calculated and F represents the feature vector to be calculated. In this example both W and F are 8 bits, and W can be split into 4-bit segments for sparsification. The figures show another 8:4:4 case, in which the low bits and the high bits select the same number of data, but at least some of the selected data do not correspond.
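The 8:6:2, 8:4:4, and 8:8:0 modes differ only in how many low-bit and high-bit sub-elements are kept, so they can be expressed as one routine with two counts. The following Python sketch is illustrative only. It assumes unsigned 8-bit weights, and the selection rule (keep the largest sub-elements, ties broken by lower index) is an assumption made for the example, since this section does not state how the kept positions are chosen. All names are hypothetical.

```python
def sparsify(w, n_low, n_high):
    """Keep n_low low-nibble and n_high high-nibble sub-elements of w.

    Selection rule (assumed, not from the source): keep the largest
    sub-elements, breaking ties by lower index.
    """
    low = [x & 0xF for x in w]
    high = [(x >> 4) & 0xF for x in w]

    def top_k(values, k):
        order = sorted(range(len(values)), key=lambda i: (-values[i], i))
        return sorted(order[:k])

    low_idx, high_idx = top_k(low, n_low), top_k(high, n_high)
    w_comb = [low[i] for i in low_idx] + [high[i] for i in high_idx]
    shifts = [0] * n_low + [4] * n_high
    return low_idx, high_idx, w_comb, shifts


def reconstruct(n, low_idx, high_idx, w_comb, shifts):
    """Rebuild the dense weights implied by the kept sub-elements."""
    out = [0] * n
    for idx, val, s in zip(low_idx + high_idx, w_comb, shifts):
        out[idx] += val << s
    return out


# Mode name -> (low-bit count, high-bit count); each pair fills 8 lanes.
PATTERNS = {"8:6:2": (6, 2), "8:4:4": (4, 4), "8:8:0": (8, 0)}


def pattern_errors(w):
    """Sum of absolute weight errors introduced by each mode."""
    errors = {}
    for name, (n_low, n_high) in PATTERNS.items():
        approx = reconstruct(len(w), *sparsify(w, n_low, n_high))
        errors[name] = sum(abs(a - b) for a, b in zip(w, approx))
    return errors


# Hypothetical weights whose high 4 bits are all zero.
errors = pattern_errors([3, 7, 1, 15, 0, 9, 2, 4])
```

For weights whose high 4 bits are all zero, as in this example, the 8:8:0 mode loses nothing, while modes that reserve lanes for high bits drop some low-bit sub-elements.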
[0097] This embodiment supports a hardware module design for reconfigurable mixed-precision sparse computation, enabling variable-precision sparse computation. It allows computation to be completed using fewer basic computing units while maintaining model accuracy. Reconfigurability means that the bit ratio pattern can be changed during computation for different bit counts without requiring different hardware designs. A more suitable bit ratio pattern can be selected offline for W to adapt to different weight distributions. Mixed-precision computation is supported; the bit counts of W and F do not need to be identical, such as 8 bits x 8 bits, 4 bits x 8 bits, or 4 bits x 16 bits. The bit decomposition method for W can be 8 bits divided into 4 bits + 4 bits, or 2 bits + 2 bits + 2 bits + 2 bits. Similarly, if W is 16 bits, it can be divided into 8 bits + 8 bits, or 4 + 4 + 4 + 4, or 8 groups of 2 bits.
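The bit decompositions listed above (8 bits as 4 + 4 or 2 + 2 + 2 + 2, and 16 bits as 8 + 8, 4 + 4 + 4 + 4, or 8 groups of 2 bits) all follow the same rule: each segment carries a shift equal to its bit offset, and shifting and adding the segments recovers the original value. A minimal Python sketch, assuming unsigned integers and using hypothetical function names:

```python
def split_bits(value, total_bits, seg_bits):
    """Split an unsigned integer into (segment, shift) pairs, low bits first."""
    if total_bits % seg_bits != 0:
        raise ValueError("seg_bits must divide total_bits")
    mask = (1 << seg_bits) - 1
    return [((value >> s) & mask, s) for s in range(0, total_bits, seg_bits)]


def recombine(segments):
    """Shift-and-add the segments back into the original value."""
    return sum(seg << shift for seg, shift in segments)


halves = split_bits(0xB6, 8, 4)        # 8 bits as 4 + 4
quarters = split_bits(0xB6, 8, 2)      # 8 bits as 2 + 2 + 2 + 2
bytes16 = split_bits(0xABCD, 16, 8)    # 16 bits as 8 + 8
pairs16 = split_bits(0xABCD, 16, 2)    # 16 bits as 8 groups of 2 bits
```

Because every segment width feeds the same shift-and-add step, only the shift amounts change between decompositions, which is consistent with the reconfigurability described above.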
[0098] In this embodiment, multiple sets of weight sub-vectors are obtained, where the weight sub-vectors are obtained by sparsifying the weight vector to be calculated. The feature vector to be calculated corresponding to the weight vector to be calculated is determined, and the multiple sets of weight sub-vectors are combined to obtain a combined weight vector. The combined weight vector has the same information unit supported by the basic computing unit. The basic computing unit is controlled to perform shift and addition calculations on the combined weight vector and the feature vector to be calculated to obtain the sparsified data processing result. This can make full use of the weight distribution, support sparse calculation with variable precision, and effectively reduce the hardware cost required for sparse data processing while ensuring the accuracy of sparse data processing.
[0099] Figure 7 is a schematic diagram of the structure of a sparse data processing device for a neural network processor according to an embodiment of the present disclosure.
[0100] The neural network processor includes a basic computing unit.
[0101] As shown in Figure 7, the sparse data processing device 70 of the neural network processor includes:
[0102] The acquisition module 701 is used to acquire multiple sets of weight sub-vectors, wherein the weight sub-vectors are obtained by sparsifying the weight vectors to be calculated.
[0103] The determination module 702 is used to determine the feature vector to be calculated corresponding to the weight vector to be calculated.
[0104] The processing module 703 is used to combine multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit supported by the basic computing unit.
[0105] The control module 704 is used to control the basic computing unit to perform shift and addition calculations on the combined weight vector and the feature vector to be calculated in order to obtain the sparse data processing result.
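As a rough software analogy, the four modules 701 to 704 can be mapped to four methods of one class. The sketch below is hypothetical throughout: the class name, method names, the representation of a weight sub-vector as an (indices, values, shift) tuple, the dictionary used to look up features, and the default of 8 lanes are all assumptions made for illustration, and weights are treated as unsigned.

```python
class SparseProcessingDevice:
    """Hypothetical software mirror of modules 701-704."""

    def __init__(self, lanes=8):
        # Number of information units one basic computing unit handles (assumed).
        self.lanes = lanes

    def acquire(self, sub_vectors):
        # Module 701: each entry is (indices, values, shift) for one weight sub-vector.
        return list(sub_vectors)

    def determine(self, feature_store, key):
        # Module 702: look up the feature vector paired with the weight vector.
        return feature_store[key]

    def process(self, sub_vectors):
        # Module 703: concatenate the sub-vectors into one combined weight vector.
        combined = [(i, v, s)
                    for idx, vals, s in sub_vectors
                    for i, v in zip(idx, vals)]
        if len(combined) != self.lanes:
            raise ValueError("combined weight vector must fill the basic computing unit")
        return combined

    def control(self, combined, features):
        # Module 704: shift-and-add over the combined weight vector.
        return sum((v * features[i]) << s for i, v, s in combined)


device = SparseProcessingDevice()
subs = device.acquire([
    ([0, 1, 3, 4, 6, 7], [3, 2, 5, 7, 4, 15], 0),  # low-bit sub-vector
    ([0, 4], [1, 2], 4),                           # high-bit sub-vector
])
features = device.determine({"w0": [1, 2, 3, 4, 5, 6, 7, 8]}, "w0")
output = device.control(device.process(subs), features)
```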
[0106] It should be noted that the foregoing explanation of the sparse data processing method for neural network processors also applies to the sparse data processing device for neural network processors in this embodiment, and will not be repeated here.
[0107] In this embodiment, multiple sets of weight sub-vectors are obtained, where the weight sub-vectors are obtained by sparsifying the weight vector to be calculated. The feature vector to be calculated corresponding to the weight vector to be calculated is determined, and the multiple sets of weight sub-vectors are combined to obtain a combined weight vector. The combined weight vector has the same information unit supported by the basic computing unit. The basic computing unit is controlled to perform shift and addition calculations on the combined weight vector and the feature vector to be calculated to obtain the sparsified data processing result. This can make full use of the weight distribution, support sparse calculation with variable precision, and effectively reduce the hardware cost required for sparse data processing while ensuring the accuracy of sparse data processing.
[0108] Figure 8 is a schematic diagram of the structure of a neural network processor proposed in an embodiment of this disclosure.
[0109] As shown in Figure 8, the neural network processor 80 includes a processing unit 801 and a basic computing unit 802, wherein:
[0110] The processing unit 801 is used to obtain multiple sets of weight sub-vectors, wherein the weight sub-vectors are obtained by sparsifying the weight vector to be calculated, determine the feature vector to be calculated corresponding to the weight vector to be calculated, combine the multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit supported by the basic calculation unit 802, and control the basic calculation unit 802 to perform shift and addition calculation on the combined weight vector and the feature vector to be calculated to obtain the sparsified data processing result.
[0111] Figure 9 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure. The electronic device 12 shown in Figure 9 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments disclosed herein.
[0112] As shown in Figure 9, the electronic device 12 is represented in the form of a general-purpose computing device. The components of the electronic device 12 may include, but are not limited to: one or more neural network processors 16, system memory 28, and bus 18 connecting different system components (including system memory 28 and neural network processors 16).
[0113] Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. Examples of these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
[0114] Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including volatile and non-volatile media, removable and non-removable media.
[0115] Memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and / or cache memory 32. Electronic device 12 may further include other removable / non-removable, volatile / non-volatile computer system storage media. By way of example only, storage system 34 may be used to read and write non-removable, non-volatile magnetic media (… not shown in Figure 9; usually referred to as a "hard drive").
[0116] Although not shown in Figure 9, a disk drive for reading and writing to a removable non-volatile disk (e.g., a "floppy disk") and an optical disc drive for reading and writing to a removable non-volatile optical disc (e.g., a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 via one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of this disclosure.
[0117] A program / utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each or some combination of these examples may include an implementation of a network environment. Program modules 42 typically perform the functions and / or methods described in the embodiments of this disclosure.
[0118] Electronic device 12 can also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), and with one or more devices that enable human interaction with electronic device 12, and / or with any device that enables electronic device 12 to communicate with one or more other computing devices (e.g., network card, modem, etc.). This communication can be performed via input / output (I / O) interface 22. Furthermore, electronic device 12 can also communicate with one or more networks (e.g., local area network (LAN), wide area network (WAN), and / or public networks, such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with other modules of electronic device 12 via bus 18. It should be understood that, although not shown in the figures, other hardware and / or software modules can be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
[0119] The neural network processor 16 executes various functional applications and data processing by running programs stored in the system memory 28, such as implementing the sparse data processing method of the neural network processor mentioned in the foregoing embodiments.
[0120] To implement the above embodiments, this disclosure also proposes a non-transitory computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the sparse data processing method of the neural network processor as proposed in the foregoing embodiments of this disclosure.
[0121] To implement the above embodiments, this disclosure also proposes a computer program product; when instructions in the computer program product are executed by a processor, the sparse data processing method of the neural network processor as proposed in the foregoing embodiments of this disclosure is performed.
[0122] It should be noted that in the description of this disclosure, the terms "first," "second," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance. Furthermore, in the description of this disclosure, unless otherwise stated, "a plurality of" means two or more.
[0123] Any process or method description in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing a particular logical function or process, and the scope of preferred embodiments of this disclosure includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the function involved, as will be understood by those skilled in the art to which embodiments of this disclosure pertain.
[0124] It should be understood that various parts of this disclosure can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0125] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
[0126] Furthermore, the functional units in the various embodiments of this disclosure can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
[0127] The storage media mentioned above can be read-only memory, disk, or optical disk, etc.
[0128] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this disclosure. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0129] Although embodiments of the present disclosure have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the present disclosure. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of the present disclosure.
Claims
1. A sparse data processing method for a neural network processor, characterized in that the neural network processor includes a basic computing unit, and the method includes: determining multiple weight elements of first bits and / or multiple weight elements of second bits from the weight vector to be calculated, wherein the first bits are lower than the second bits; selecting a first number of weight elements as first weight sub-elements from the multiple weight elements of the first bits, wherein each first weight sub-element has a corresponding first index; and / or selecting a second number of weight elements as second weight sub-elements from the multiple weight elements of the second bits, wherein each second weight sub-element has a corresponding second index; using the first weight sub-element and / or the second weight sub-element together as multiple weight sub-elements to obtain multiple sets of weight sub-vectors, wherein the first weight sub-element and the second weight sub-element belong to different sets of weight sub-vectors, and the weight sub-vectors are obtained by sparsifying the weight vector to be calculated;
determining the feature vector to be calculated corresponding to the weight vector to be calculated; combining the multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit supported by the basic computing unit; controlling the basic computing unit to select a first feature sub-element corresponding to the first weight sub-element from the feature vector to be calculated based on the first index; and / or controlling the basic computing unit to select a second feature sub-element corresponding to the second weight sub-element from the first feature sub-element according to the second index; controlling the basic computing unit to perform combination processing on the first feature sub-element and / or the second feature sub-element to obtain a combined feature vector; and controlling the basic computing unit to perform shift and addition calculations on the combined weight vector and the combined feature vector to obtain the sparse data processing result.
2. The method as described in claim 1, characterized in that the weight sub-vector includes multiple weight sub-elements, which are obtained by sparsifying the weight elements in the weight vector to be calculated.
3. The method as described in claim 1, characterized in that controlling the basic computing unit to perform shift and addition calculations on the combined weight vector and the combined feature vector to obtain the sparse data processing result includes: determining the matching mode used when sparsifying the weight vector to be calculated, wherein the matching mode has a corresponding displacement operation amount; controlling the basic computing unit to perform a displacement operation on the combined feature vector according to the displacement operation amount to obtain the target feature vector; and controlling the basic computing unit to perform a vector inner product operation on the combined weight vector and the target feature vector to obtain the sparse data processing result.
4. The method according to any one of claims 1-3, characterized in that there are multiple basic computing units, and different basic computing units support the same information units.
5. A sparse data processing device for a neural network processor, characterized in that the neural network processor includes a basic computing unit, the device being used to implement the sparse data processing method of the neural network processor as described in claim 1, the device comprising: an acquisition module, used to acquire multiple sets of weight sub-vectors, wherein the weight sub-vectors are obtained by sparsifying the weight vector to be calculated; a determination module, used to determine the feature vector to be calculated corresponding to the weight vector to be calculated; a processing module, used to combine the multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit supported by the basic computing unit; and a control module, used to control the basic computing unit to perform shift and addition calculations on the combined weight vector and the feature vector to be calculated, so as to obtain the sparse data processing result.
6. A neural network processor, characterized in that the neural network processor includes a processing unit and a basic computing unit, wherein the neural network processor is used to implement the sparse data processing method of the neural network processor as described in claim 1; wherein the processing unit is configured to acquire multiple sets of weight sub-vectors, wherein the weight sub-vectors are obtained by sparsifying the weight vector to be calculated, determine the feature vector to be calculated corresponding to the weight vector to be calculated, combine the multiple sets of weight sub-vectors to obtain a combined weight vector, wherein the combined weight vector has the same information unit supported by the basic computing unit, and control the basic computing unit to perform shift and addition calculations on the combined weight vector and the feature vector to be calculated to obtain the sparse data processing result.
7. An electronic device, characterized in that it comprises: at least one neural network processor; and a memory communicatively connected to the at least one neural network processor; wherein the memory stores instructions that can be executed by the at least one neural network processor to enable the at least one neural network processor to perform the method of any one of claims 1-4.
8. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause the computer to perform the method according to any one of claims 1-4.
9. A computer program product, characterized in that it comprises a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1-4.