A neural network pruning method and device, electronic equipment and storage medium

By detecting and removing silent channels in a neural network and replacing their outputs with constant values, the problem of long pruning and compression time and high cost in existing technologies is solved, achieving efficient neural network pruning and compression, and improving computational efficiency and accuracy.

CN116992938B · Active · Publication Date: 2026-06-19 · VERISILICON MICROELECTRONICS (SHANGHAI) CO LTD +5

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
VERISILICON MICROELECTRONICS (SHANGHAI) CO LTD
Filing Date
2023-08-08
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for pruning and compressing neural network models are time-consuming and labor-intensive, resulting in low computational efficiency.

Method used

By detecting and removing silent channels in the neural network (neurons that are not activated by data), replacing the output of silent channels with constant values, and deleting the entire operator and upstream operators when necessary, redundant computations are reduced.

Benefits of technology

It improves the computational efficiency and accuracy of neural networks, reduces the time and cost of pruning and compression, and eliminates the need to retrain the neural network.

Abstract

The application provides a neural network pruning method and device, electronic equipment and storage medium, and relates to the field of neural networks. The neural network pruning method comprises: detecting a silent channel in a neural network to be pruned, the silent channel being a neuron in the neural network that is not activated by data; and deleting the silent channel from the neural network. The above-mentioned neural network pruning method does not change the deployment and operation result of the neural network model, and does not require retraining. In the case of not affecting the operation accuracy of the neural network, the neural network is pruned and compressed, and the time consumption and cost of neural network pruning and compression are reduced.

Description

Technical Field

[0001] This application relates to the field of neural networks, and more specifically, to a neural network pruning method, apparatus, electronic device, and storage medium.

Background Technology

[0002] In neural network models, a large number of neurons are used to improve computational accuracy. However, in practical applications, some neurons may never be used. These unused neurons lead to redundancy in the neural network model, affecting its computational efficiency. Therefore, the computational performance of the neural network model can be improved by removing these redundant neurons, i.e., by pruning and compressing the neural network.

[0003] Currently, among the various methods for pruning and compressing neural network models, it is usually necessary to retrain the neural network model, or change the data processing method, or change the neural network deployment method. Such methods are time-consuming and labor-intensive for pruning and compressing neural network models.

Summary of the Invention

[0004] In view of this, this application aims to provide a neural network pruning method, apparatus, electronic device, and storage medium to improve the efficiency and reduce the cost of pruning and compressing neural network models.

[0005] In a first aspect, embodiments of this application provide a neural network pruning method, comprising: detecting silent channels in a neural network to be pruned, wherein the silent channels are neurons in the neural network that are not activated by data; and deleting the silent channels from the neural network.

[0006] In this embodiment, removing silent channels from the neural network reduces redundancy, decreases the computation of silent channels, and improves the network's computational power, thus achieving pruning and compression. Since silent channels are neurons in the neural network that are not activated by data, their removal does not affect the original deployment and computation of the neural network, has minimal impact on computational accuracy (even zero impact), and eliminates the need for retraining. Therefore, it effectively improves the efficiency and reduces the cost of pruning and compressing the neural network.

[0007] In one embodiment, detecting a silent channel in the neural network includes: obtaining the channel output range of the channel under test; the channel under test is any channel in the neural network; comparing the channel output range with a preset range; if the channel output range is outside the preset range or the channel output range is a fixed value, then determining that the channel under test is the silent channel.

[0008] In this embodiment, the output range of the silent channel is usually fixed. Therefore, the silent channel can be determined by determining the output range of the channel under test and comparing it with a preset range. This method of determining the channel output range is easy to implement, improves the efficiency of determining the silent channel, and reduces the time cost of neural network pruning and compression.

[0009] In one embodiment, the neural network includes a first operator and a second operator, wherein the first operator and the second operator are any two consecutive operators in the neural network, and the input data of the second operator is the output data of the first operator; the first operator includes a plurality of the tested channels; before comparing the channel output range with a preset range, the method further includes: detecting the operator input range of the second operator; and determining the operator input range of the second operator as the preset range corresponding to each tested channel in the first operator.

[0010] In this embodiment, regardless of the value of the silent channel output, after the output is passed to the next operator, if the next operator needs to perform normal calculation, the output should be within the input range of the next operator; otherwise, the next operator cannot perform normal calculation. Therefore, for two consecutive operators (such as the first operator and the second operator), the input range of the later operator is taken as the preset range corresponding to the earlier operator; that is, the input range of the second operator can be taken as the preset range corresponding to the first operator. This method is simple and easy to implement, reduces the difficulty of determining the preset range, and improves the efficiency of neural network pruning and compression.

[0011] In one embodiment, the neural network includes multiple operators, each operator including multiple channels, and the step of deleting the silent channel from the neural network includes: for any silent channel, determining, from the two endpoint values of the preset range corresponding to the silent channel, the target endpoint value that is closest to the channel output range of the silent channel; and configuring the silent channel as the target endpoint value in the operator where the silent channel is located.

[0012] In this embodiment, if the channel under test is a silent channel, the output of the channel will not change regardless of the data input. Neural networks have high requirements for computational precision, so a constant can stand in for a channel only when that channel's output remains completely unchanged. A constant value can therefore be used to replace the computation of the silent channel to achieve pruning and compression of the neural network. Replacing the silent channel with a constant value does not change the operation method of the operator containing the silent channel, and the deployment of the neural network remains unchanged. Therefore, this method eliminates the need to retrain the neural network when deleting the silent channel, effectively improving the efficiency of neural network pruning and compression. Furthermore, if the value used to replace the silent channel is unreasonable, it may cause abnormal operation of the operator. Using the target endpoint value as the constant ensures that the constant does not affect the normal operation of the neural network after replacing the silent channel.

[0013] In one embodiment, the neural network includes multiple operators, each operator including multiple channels, and the step of deleting the silent channel from the neural network includes: if all channels of any operator are silent channels, then the operator is deleted.

[0014] In this embodiment, when all channels of an operator are silent channels, the operations performed by the operator are meaningless. The operator can be deleted to reduce the amount of computation in the neural network and improve the computational performance of the neural network.

[0015] In one embodiment, deleting the operator includes: determining the target endpoint value corresponding to each of the silent channels in the operator, where the target endpoint value is, of the two endpoint values of the preset range corresponding to the silent channel, the one closest to the channel output range of the silent channel; and configuring the target endpoint value corresponding to each of the silent channels as the input value of the next operator of the operator.

[0016] In this embodiment of the application, when deleting an operator, the target endpoint value corresponding to each silent channel is used as the output of the operator. Compared with taking an arbitrary constant as the output of the operator, this keeps the configured values more consistent with the actual output result of the operator, thereby reducing the impact of operator deletion on the accuracy of the neural network.

[0017] In one embodiment, deleting the operator includes deleting the operator and all upstream operators of the operator, wherein the upstream operators are operators in the neural network that are computed before the operator.

[0018] In this embodiment of the application, if all channels of an operator are silent channels, the output result of the operator is fixed. In this case, the operations performed by the upstream operators of the operator are meaningless. Deleting the upstream operators removes their computation from the neural network, reducing the amount of computation and improving the computational efficiency of the neural network.

[0019] In a second aspect, embodiments of this application provide a neural network pruning device, comprising: a detection module for detecting silent channels in a neural network, wherein the silent channels are neurons in the neural network that are not activated by data; and a deletion module for deleting the silent channels from the neural network.

[0020] In a third aspect, embodiments of this application provide an electronic device, including a memory and a processor, wherein the memory stores computer-readable instructions, which are executed by the processor to perform the method as described in the first aspect.

[0021] In a fourth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method described in the first aspect.

Attached Figure Description

[0022] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a flowchart of a neural network pruning method provided in an embodiment of this application;

[0024] Figure 2 is a flowchart for determining a silent channel by backtracking, provided in an embodiment of this application;

[0025] Figure 3 is a schematic diagram of a neural network pruning device provided in an embodiment of this application;

[0026] Figure 4 is a schematic diagram of an electronic device provided in an embodiment of this application.

[0027] Reference numerals: neural network pruning device 200; detection module 210; deletion module 220; electronic device 300; processor 310; memory 320.

Detailed Implementation

[0028] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0029] First, the application scenarios of the neural network pruning method in this application are explained. In some devices that require neural network models, certain functions are typically implemented using these models. However, neural network models require a significant amount of computation. If the device's hardware performance is limited, the neural network model can affect the normal operation of the device, or even prevent the device or the neural network model from functioning properly. In fact, during the use of a neural network model, many neurons may never be used. Computing these neurons still consumes device hardware performance; therefore, pruning of the neural network model is usually necessary. The neural network pruning method provided in this application is used to prune the neural network model to reduce its size and computational load, thereby improving the performance of the device hosting the neural network model.

[0030] Referring to Figure 1, which is a flowchart of a neural network pruning method provided in an embodiment of this application, the neural network pruning method includes:

[0031] S110: detecting silent channels in the neural network to be pruned.

[0032] A neural network comprises a large number of neurons. When a neuron receives a stimulus signal, it transmits the signal to other neurons in the network. However, some neurons in a neural network may never be activated by a signal and therefore never pass a signal on. The computational effectiveness of these neurons is extremely low, and they do not improve the computational accuracy of the neural network. In fact, because these neurons still perform calculations every time the network runs, they may reduce the computational efficiency of the neural network and affect its performance. In this embodiment, these neurons that are not activated by data are defined as "dead neurons," or silent channels.

[0033] A neural network to be pruned will include multiple pre-trained and deployed operators. The operators in a neural network are used to perform pre-defined function operations on the data input to that operator. These operators include, but are not limited to, linear functions for linear operations, pooling layers, convolutional layers, or functions for normalization. There can be various types of operators in a neural network, which will not be elaborated upon here. The operators are interconnected. Each operator can acquire input data, perform its own designed operations, and output the result. For example, in a neural network with three operators X, Y, and Z connected sequentially, operator X performs an operation on the input data `data` to obtain the result `x`. Then, operator X can output the result `x` to operator Y, so that operator Y can perform an operation on `x` to obtain the result `y`. Next, operator Y can output the result `y` to operator Z for further operations, and so on.

[0034] Each operator includes multiple operation channels, each performing different operations. Finally, the results of each channel are multiplied by a certain weight and summed to obtain the result of the operator. For example, if operator M includes three operation channels, W1, W2, and W3, and the results of each channel are w1, w2, and w3 respectively, then the output of operator M is m = a × w1 + b × w2 + c × w3 + d, where a, b, and c represent different coefficients or weights, and d is a constant. In other words, the output data of the operator consists of the operation results of each channel.

[0035] In this embodiment, each channel in the aforementioned neural network operator can be detected to determine whether it is a silent channel. If a channel is not activated by data, then the channel is a silent channel.

[0036] Neural network models are typically built to meet specific needs. Training a neural network model enables it to accurately process input data and produce output data that meets those needs. Therefore, in a trained neural network, the output range of each operator is usually limited to a fixed range. If the output data of a certain channel is abnormal—for example, regardless of the input data, if the output data of a certain channel exceeds the preset output range of the operator to which that channel belongs or is a fixed value—it can be determined that the channel will not be activated by data; that is, the channel is a silent channel.

[0037] Based on this, in some optional embodiments of this application, detecting a silent channel in a neural network may include: obtaining the channel output range of the channel under test, comparing the channel output range with a preset range; if the channel output range is outside the preset range or the channel output range is a fixed value, then the channel under test can be determined to be a silent channel.

[0038] In this embodiment, when pruning and compressing the neural network, the silent channels in the neural network can be determined as comprehensively as possible. Therefore, the channel to be tested can be any channel in the neural network, and all channels are detected to determine whether each channel is a silent channel.

[0039] In this embodiment, obtaining the channel output range of the tested channel can be achieved by inputting multiple data values within the input data range to the channel and obtaining the output result of the tested channel to obtain the channel output range. For example, if the input data range of the operator to which a tested channel belongs is [-1, 5], then multiple data values in the range [-1, 5] can be input to the tested channel, such as -1, 1.5, 3, 5, etc. The input data can be integers or non-integers, as long as they are within the allowed input range. Then, the output data of the channel is obtained and combined to form the output range.
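The sampling procedure in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper `sampled_output_range`, the two example channels, and the choice of 101 evenly spaced probes are all assumptions made here, and sampling only estimates the true range.

```python
def sampled_output_range(channel_fn, input_low, input_high, num_samples=101):
    """Estimate a channel's output range by probing evenly spaced inputs."""
    span = input_high - input_low
    outputs = [
        channel_fn(input_low + span * i / (num_samples - 1))
        for i in range(num_samples)
    ]
    return min(outputs), max(outputs)


def relu6(value):
    """Clamp a scalar to [0, 6]."""
    return min(max(value, 0.0), 6.0)


def live_channel(x):
    # Hypothetical channel whose output follows its input.
    return relu6(x)


def stuck_channel(x):
    # Hypothetical channel whose pre-activation is always far below 0.
    return relu6(2.0 * x - 20.0)


# Probe both channels over the input range [-1, 5] used in the text.
live_range = sampled_output_range(live_channel, -1.0, 5.0)
stuck_range = sampled_output_range(stuck_channel, -1.0, 5.0)
```

Under these assumptions, `stuck_range` collapses to a single value, which is the "fixed value" condition the text uses to flag a silent channel, while `live_range` spans an interval.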

[0040] In some embodiments, the corresponding output range can also be determined according to the calculation formula corresponding to the channel being tested. For example, for the operator Convolution (convolutional layer), the corresponding range calculation formula can be:

[0041] $\mathrm{Max}(R^{(c)}) = \sum_{x>0,\,w>0} \mathrm{InputRange}^{(c)} \cdot w^{(c)} + \sum_{x<0,\,w<0} \mathrm{InputRange}^{(c)} \cdot w^{(c)} + b^{(c)}$

[0042] $\mathrm{Min}(R^{(c)}) = \sum_{x<0,\,w>0} \mathrm{InputRange}^{(c)} \cdot w^{(c)} + \sum_{x>0,\,w<0} \mathrm{InputRange}^{(c)} \cdot w^{(c)} + b^{(c)}$

[0043]

[0044]

[0045] Where $R$ represents the input range, $R'$ represents the preset range or the input range of the current operator, $c$ represents the channel in the operator Convolution, $x$ represents the input range, $\mathrm{InputRange}$ represents the input data of channel $c$, $\mathrm{Max}(R^{(c)})$ represents the maximum input range of channel $c$, $\mathrm{Min}(R^{(c)})$ represents the minimum input range of channel $c$, $w^{(c)}$ represents the weight parameter of channel $c$, and $b^{(c)}$ is the offset parameter. $\mathrm{OutputRange}_{\min}$ represents the minimum output value of channel $c$, and $\mathrm{OutputRange}_{\max}$ represents the maximum output value of channel $c$. $\mathrm{OutputRange}_{\min}$ and $\mathrm{OutputRange}_{\max}$ together constitute the output range.

[0046] In the above formula, the theoretical range of each output channel of the convolution operator is calculated first and then shrunk by the theoretical range of the input channel of the next operator. This shrinkage is not a simple intersection; there are three possible outcomes. First, when the output channel range of the convolution lies entirely at or beyond one end of the input channel range of the next operator, the shrunk channel range collapses to the corresponding extreme value of that input channel range (either its maximum or its minimum). In this case, the minimum of the channel range equals its maximum, so the range is a fixed value. If this fixed value is the extreme value of the input channel of the next operator, the channel is a silent channel. Second, when the minimum of the convolution output channel range already equals its maximum and lies within the input channel range of the next operator, the channel range remains unchanged; since the range is still a single fixed value, the channel is also a silent channel. Third, if the convolution output channel range and the input channel range of the next operator intersect, and the minimum of the shrunk channel range is less than its maximum, the channel is not a silent channel. Here, the output range of the current operator is the input range of the next operator.
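One way to picture the range calculation and the three shrinkage outcomes is the sketch below. It swaps in plain interval arithmetic for a single channel of the form sum(w_i * x_i) + b, which is an assumption standing in for the formula above rather than a transcription of it; the function names and the numeric examples are hypothetical.

```python
def conv_channel_range(weights, bias, input_ranges):
    """Theoretical output range of one channel computing sum(w_i * x_i) + bias,
    where each x_i is known to lie in input_ranges[i] = (low_i, high_i)."""
    out_min = out_max = bias
    for weight, (low, high) in zip(weights, input_ranges):
        products = (weight * low, weight * high)
        out_min += min(products)  # most negative contribution of this term
        out_max += max(products)  # most positive contribution of this term
    return out_min, out_max


def shrink(channel_range, next_input_range):
    """Clamp both endpoints of channel_range into the next operator's input range."""
    low, high = next_input_range
    return (
        min(max(channel_range[0], low), high),
        min(max(channel_range[1], low), high),
    )


def is_silent(channel_range, next_input_range):
    """A channel is treated as silent when its shrunk range is a single value."""
    shrunk_min, shrunk_max = shrink(channel_range, next_input_range)
    return shrunk_min == shrunk_max


relu6_input = (0.0, 6.0)
mixed = conv_channel_range([1.0, -2.0], 0.5, [(0.0, 6.0), (0.0, 6.0)])
outcome_beyond = is_silent((7.0, 8.0), relu6_input)   # collapses to the endpoint 6
outcome_fixed = is_silent((3.0, 3.0), relu6_input)    # already a single value
outcome_overlap = is_silent(mixed, relu6_input)       # overlaps with min < max
```

The three `outcome_*` values correspond, in order, to the first, second, and third cases described in the paragraph above.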

[0047] The above calculation formula is only an example and is not intended to limit this application. Different operators may have different operation methods. When calculating the output range of the channel, you can refer to the operation methods configured for different operators, which will not be elaborated here.

[0048] After obtaining the channel output range of the channel under test, the channel output range can be compared with the preset range to determine whether the channel under test is a silent channel.

[0049] In this embodiment, the preset range can be the normal output range of a channel. If the output range of the channel under test matches the preset range, the channel under test is a normal channel; otherwise, if the output range of the channel under test does not match the preset range, the channel under test is silent. For example, if the normal output range of a channel is [1, 10], and the output range of the channel under test is also [1, 10], then the channel under test is a normal channel. If the output range of the channel under test is outside [1, 10], such as [-1, 0], [11, 12], then the channel under test is a silent channel. The above are merely examples and are not intended to limit this application.

[0050] In one embodiment, the preset range can also be determined by: detecting the operator input range of the second operator; and determining the operator input range of the second operator as the preset range corresponding to each tested channel in the first operator.

[0051] In this embodiment, the first operator and the second operator are any two consecutive operators in the neural network, and the input data of the second operator is the output data of the first operator; the first operator may include one or more channels under test.

[0052] Since the output range of an operator is fixed, the output range of an operator will normally not exceed the input range of the next operator connected to it. For example, consider two connected operators, one including a channel X and the other including a channel Y, where the operator that includes channel Y is the next operator after the operator that includes channel X. If X is a normal channel, the output range [x1, x2] of channel X will not always exceed the input range [y1, y2] of Y; that is, [x1, x2] and [y1, y2] intersect. Conversely, if [x1, x2] and [y1, y2] have no intersection, or intersect only at an extreme value (endpoint), then channel X is a silent channel.

[0053] Therefore, in this embodiment, when there are two consecutive operators, and the output range of a certain channel of the first operator always reaches the extreme value of the operator input range of the second operator, then that channel can be determined as a silent channel. Correspondingly, the input range of the second operator can be used as a preset range for determining the silent channel in the first operator. Specifically, when detecting the operator input range of the second operator, if the second operator is an operator with a fixed range, the operator input range can be directly obtained. If the second operator is a specific type of operator, such as a linear scaling operator or a convolution operator, the input range can be derived in reverse from the output range. If the second operator is an operator whose range is neither derivable nor known, its input and output ranges can be set to extend from negative infinity to positive infinity.

[0054] For ease of understanding, an embodiment is provided herein for illustration, which is not intended to limit the scope of this application.

[0055] A neural network includes operators ConvA, ReLU6, and ConvB, where ConvA and ConvB are different convolutional layers, and ReLU6 is a variant of the rectified linear unit. The output of ConvB is the input value of ReLU6, and the output of ReLU6 is the input value of ConvA, which can be represented as ConvA(ReLU6(ConvB(data))), where data is the input data, and the input and output ranges of ReLU6 are both [0, 6].

[0056] If ConvB includes two channels, w1 and w2, and the output range of w1 is [-1, 6], while the output range of w2 is [7, 8], then the portion of w1's output range, [0, 6], falls within the input range of the ReLU6 operator. In this case, channel w1 is a normal channel. However, the output range of w2 is always outside the output range of the ReLU6 operator, so channel w2 can be determined as a silent channel. The above is merely an example and is not intended to limit this application.
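The ConvB example can be checked with a small interval test. The sketch below is illustrative only: `classify_channel` is a hypothetical helper that applies the rule stated earlier (a fixed-value range, no overlap, or an overlap reduced to a single endpoint marks a silent channel).

```python
def classify_channel(output_range, preset_range):
    """Return "silent" or "normal" for a channel, given its output range and
    the preset range taken from the next operator's input range."""
    out_low, out_high = output_range
    preset_low, preset_high = preset_range
    if out_low == out_high:
        return "silent"  # the output is a fixed value
    overlap_low = max(out_low, preset_low)
    overlap_high = min(out_high, preset_high)
    if overlap_low >= overlap_high:
        return "silent"  # no overlap, or the ranges touch only at an endpoint
    return "normal"


relu6_input_range = (0.0, 6.0)
w1_result = classify_channel((-1.0, 6.0), relu6_input_range)
w2_result = classify_channel((7.0, 8.0), relu6_input_range)
```

With the ranges from the example, `w1_result` is "normal" because [-1, 6] overlaps [0, 6] on a non-degenerate interval, and `w2_result` is "silent" because [7, 8] never enters [0, 6].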

[0057] The input range of an operator can be derived inversely from its output range. The calculation method is similar to that for the output range of a channel and will not be elaborated further here. For some operators, the output range is fixed. For example, the ReLU6 operator truncates the output of the activation function to 6 when it is greater than 6, meaning the output range of ReLU6 is [0, 6]. The TanH operator has an output range of [-1, 1] and an input range of [-3.2, 3.2]. The Sigmoid operator has an output range of [0, 1] and an input range of [-6.3, 6.3]. For operators with fixed output or input ranges, the corresponding input and output ranges can be directly obtained. The above are just examples; other operators may also have fixed output ranges, which will not be discussed further here.
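The fixed ranges listed here lend themselves to a simple lookup table. The following sketch is an assumption about how such a table might be organized; the dictionary layout and the `operator_input_range` helper are hypothetical, the numeric ranges are the ones quoted in the text, and the unbounded fallback follows the earlier statement about operators whose range is neither derivable nor known.

```python
import math

# Fixed input and output ranges quoted in the text for three operator types.
FIXED_RANGES = {
    "ReLU6": {"input": (0.0, 6.0), "output": (0.0, 6.0)},
    "TanH": {"input": (-3.2, 3.2), "output": (-1.0, 1.0)},
    "Sigmoid": {"input": (-6.3, 6.3), "output": (0.0, 1.0)},
}

# Fallback for operators whose range is neither known nor derivable.
UNBOUNDED = (-math.inf, math.inf)


def operator_input_range(operator_type):
    """Look up the fixed input range of an operator type, or fall back to
    an unbounded range when the type is not in the table."""
    entry = FIXED_RANGES.get(operator_type)
    return entry["input"] if entry is not None else UNBOUNDED


relu6_range = operator_input_range("ReLU6")
unknown_range = operator_input_range("SomeCustomOp")  # hypothetical operator name
```

An unbounded preset range never collapses a non-constant channel, so under this sketch an operator with unknown ranges cannot cause a varying upstream channel to be flagged as silent.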

[0058] In an alternative embodiment, the silent channel can also be determined by backtracking, as shown in Figure 2. The backtracking process includes: S112, determining the input range and initial output range of each operator; S113, for each operator, taking the input range of the next operator as the updated preset range of this operator; S114, determining the silent channel based on the updated preset range and the initial output range of each operator.

[0059] For example, consider the computation flow ReLU6(Concat(ConvA(data1), ConvB(data2))). Assume that during the first forward pass, the output range of ConvA is [-6, 1], the output range of ConvB is [-1, 8], the input range of Concat is [-6, 8], and the output and input ranges of ReLU6 are both [0, 6]. The output ranges of each operator during the first forward pass are the initial output ranges. During the backtracking process, the preset range of Concat can be updated to [0, 6], that of ConvA to [0, 1], and that of ConvB to [0, 6]. At this point, for ConvA, a channel whose output range is always less than or equal to 0 or greater than 1 is a silent channel; for ConvB, a channel whose output range is always less than or equal to 0 or greater than or equal to 6 is a silent channel. For example, if shrinking the output range of a ConvA channel by its preset range yields [0, 0], meaning the maximum and minimum values are equal to the fixed value 0, then this channel can also be identified as a silent channel. Similarly, if the output range of a ConvA channel is [6, 6], or the output range of a ConvB channel is [0, 0] or [6, 6], the channel can also be identified as a silent channel.
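The numbers in this backtracking example can be reproduced with a short sketch. The text gives only the resulting preset ranges, so the rule used here (intersect each producer's initial output range with the range its consumer accepts, and treat Concat as passing values through unchanged) is an assumption chosen because it matches those results; `intersect` and `backtrack_presets` are hypothetical names.

```python
def intersect(first, second):
    """Overlap of two closed intervals, or None when they do not overlap."""
    low, high = max(first[0], second[0]), min(first[1], second[1])
    return (low, high) if low <= high else None


def backtrack_presets(initial_output_ranges, consumer_input_range):
    """Push the consumer's accepted range back onto each producer feeding it."""
    return {
        name: intersect(output_range, consumer_input_range)
        for name, output_range in initial_output_ranges.items()
    }


# ReLU6 accepts [0, 6]; Concat only forwards values, so it inherits that range.
relu6_input_range = (0.0, 6.0)
concat_preset = relu6_input_range

# Initial output ranges observed during the first forward pass (from the text).
conv_presets = backtrack_presets(
    {"ConvA": (-6.0, 1.0), "ConvB": (-1.0, 8.0)},
    concat_preset,
)
```

Under this assumed rule, `conv_presets` holds [0, 1] for ConvA and [0, 6] for ConvB, matching the updated preset ranges stated in the example.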

[0060] By using backtracking, silent channels can be identified relatively quickly without having to repeat the identification process multiple times for each operator or channel.

[0061] S120: deleting the silent channels from the neural network.

[0062] In a neural network, if a silent channel exists, since the silent channel itself is not activated by data, the output of the channel will not change significantly regardless of the data input to the channel, nor will it affect the operation of the operator to which it belongs. Therefore, deleting the silent channel will not affect the computational accuracy of the neural network.

[0063] It is understandable that if a channel is a silent channel, its output range will always exceed the extreme value of the next operator's output range, such as being greater than the maximum extreme value of the next operator or less than the minimum extreme value of the next operator. When the output value of this channel is passed to the next operator for calculation, regardless of the input value, it will be limited to the output range of the next operator. Therefore, the calculation performed by the silent channel is meaningless, and its output result will not change the output result of subsequent operators. Therefore, when calculating the output result of the operator, the data of the silent channel can be configured as a constant, so that the silent channel does not perform calculations, reducing the amount of computation and improving the computational performance of the neural network.

[0064] Therefore, in one embodiment, deleting a silent channel may include: for any silent channel, determining the target endpoint value that is closest to the channel output range of the silent channel among the two endpoint values of the preset range corresponding to the silent channel; and configuring the silent channel as the target endpoint value in the operator where the silent channel is located.

[0065] For example, if the output range of a certain silent channel is [-2, -1], and the preset range is [0, 6], then the endpoint value of the preset range closest to the output range of the silent channel is 0, and 0 can be used as the target endpoint value. As another example, if the output range of a certain silent channel is [6, 8], and the preset range is [0, 6], then the target endpoint value is 6, and so on.
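The endpoint selection in these two examples can be written as a short helper. This is a sketch under the assumption that the silent channel's output range lies entirely at or beyond one end of the preset range; `target_endpoint` is a hypothetical name, and the error branch covers inputs the examples do not address.

```python
def target_endpoint(output_range, preset_range):
    """Pick the preset-range endpoint closest to a silent channel's output range."""
    out_low, out_high = output_range
    preset_low, preset_high = preset_range
    if out_high <= preset_low:
        return preset_low   # the channel always sits at or below the lower end
    if out_low >= preset_high:
        return preset_high  # the channel always sits at or above the upper end
    raise ValueError("output range overlaps the preset range; no single endpoint applies")


low_case = target_endpoint((-2.0, -1.0), (0.0, 6.0))
high_case = target_endpoint((6.0, 8.0), (0.0, 6.0))
```

The two calls mirror the examples in the text: an output range of [-2, -1] maps to 0, and an output range of [6, 8] maps to 6.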

[0066] The output range of a silent channel exceeds the input range of the next operator. During the operation of the next operator, the data transmitted by the silent channel is limited to its own input range. Therefore, any reasonable constant can be used to determine the result of the silent channel operation, as long as the output range of the operator containing the silent channel does not exceed the input range of the next operator. In this case, it is feasible to use the endpoint of the output range of the next operator as the output result of the silent channel of this operator. That is, the constant can be taken according to the extreme value of the input range of the next operator. If the result of the current operator operation is greater than the maximum endpoint value of the input range of the next operator, the maximum endpoint value is taken as the constant. If the result of the current operator operation is less than the minimum endpoint value of the input range of the next operator, the minimum endpoint value is taken as the constant. Neither of these will affect the normal output of the next operator.

[0067] After determining the target endpoint value, the silent channel can be configured as the target endpoint value in the operator containing the silent channel.

[0068] For example, if an operator M includes three channels, W1, W2, and W3, and the outputs of the channels are w1, w2, and w3 respectively, then the output of the operator is m = a × w1 + b × w2 + c × w3 + d, where a, b, c, and d are known constants. If channels W2 and W3 are silent channels and their corresponding target endpoint values are both 6, then after configuring the silent channels as the target endpoint values, the output of the operator is m = a × w1 + b × 6 + c × 6 + d, where b × 6 + c × 6 + d is a constant. Therefore, when using this operator for subsequent calculations, only the calculation of channel W1 needs to be considered, effectively reducing the amount of computation.
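
The folding of silent channels into the constant term can be sketched as follows, for an operator whose output is a weighted sum of its channel outputs plus a constant. The function names, the dictionary representation, and the numeric values chosen for a, b, c, and d are assumptions made for illustration; the patent does not prescribe them.

```python
def fold_silent_channels(coeffs, bias, silent_values):
    """Fold silent channels into the constant term of a weighted-sum operator.

    coeffs:        channel name -> coefficient (a, b, c in the text)
    bias:          constant term (d in the text)
    silent_values: silent channel name -> target endpoint value
    Returns (coefficients of the remaining live channels, folded constant).
    """
    live = {ch: k for ch, k in coeffs.items() if ch not in silent_values}
    folded = bias + sum(coeffs[ch] * v for ch, v in silent_values.items())
    return live, folded


def evaluate(coeffs, bias, outputs):
    """Weighted sum over the channels listed in coeffs, plus the constant term."""
    return sum(k * outputs[ch] for ch, k in coeffs.items()) + bias


coeffs = {"W1": 0.5, "W2": -1.0, "W3": 2.0}  # illustrative values for a, b, c
bias = 1.0                                   # illustrative value for d
live, folded = fold_silent_channels(coeffs, bias, {"W2": 6.0, "W3": 6.0})

# The pruned operator only evaluates W1 but yields the same result.
full = evaluate(coeffs, bias, {"W1": 3.0, "W2": 6.0, "W3": 6.0})
pruned = evaluate(live, folded, {"W1": 3.0})
print(live, folded, full == pruned)  # {'W1': 0.5} 7.0 True
```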

[0069] In one embodiment, if all channels of any operator are silent channels, then the operator is deleted.

[0070] In this embodiment, if all channels of an operator are silent channels, the output of every channel can be considered constant. A channel with a constant output does not require computational resources to perform calculations. Therefore, the calculations performed by this operator are meaningless and cannot improve the accuracy of the neural network model. Since an operator with a constant output does not require computational resources, this operator can be deleted in the same way as a silent channel, so that the data is not processed by this operator, reducing the computational load of the neural network and improving its computational efficiency.

[0071] As an optional implementation, the process of deleting the operator includes: determining the target endpoint value corresponding to each silent channel in the operator; and configuring the target endpoint value corresponding to each silent channel as the input value of the next operator of the operator.

[0072] The target endpoint value can be, among the two endpoint values of the preset range corresponding to the silent channel, the endpoint value that is closest to the channel output range of the silent channel. For details on determining the target endpoint value, refer to the foregoing embodiments; they are not repeated here.

[0073] Since all channels are silent channels, each silent channel can determine its corresponding target endpoint value, and the output of the operator can be configured with the target endpoint value corresponding to each silent channel, that is, the input value of the next operator can be configured.

[0074] For example, if an operator M includes three channels, W1, W2, and W3, and the outputs of the channels are w1, w2, and w3 respectively, then the output m of the operator is m = a × w1 + b × w2 + c × w3 + d, where a, b, c, and d are known constants. If all three channels are silent channels and their corresponding target endpoint values are all 6, then after configuring the silent channels as the target endpoint values, the output m of the operator is m = a × 6 + b × 6 + c × 6 + d. Therefore, the output m of this operator is a constant, which can be configured as the output of this operator and used as the input value of the next operator in calculations.
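
A minimal sketch of this case follows, again assuming a weighted-sum operator. The function name `constant_if_all_silent`, the use of `None` to mark a channel that is not silent, and the numeric coefficients are hypothetical choices for illustration.

```python
def constant_if_all_silent(coeffs, bias, endpoints):
    """Return the operator's constant output if every channel is silent, else None.

    endpoints[i] is the target endpoint value of channel i,
    or None if channel i is not a silent channel.
    """
    if any(e is None for e in endpoints):
        return None  # at least one live channel: the operator must be kept
    return sum(k * e for k, e in zip(coeffs, endpoints)) + bias


coeffs = [0.5, -1.0, 2.0]  # illustrative values for a, b, c
bias = 1.0                 # illustrative value for d

print(constant_if_all_silent(coeffs, bias, [6.0, 6.0, 6.0]))   # 10.0
print(constant_if_all_silent(coeffs, bias, [None, 6.0, 6.0]))  # None
```

When the function returns a number, that value can stand in for the whole operator as the input of the next operator; when it returns `None`, only the individual silent channels can be folded.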

[0075] By determining the target endpoint value corresponding to each silent channel and then configuring the target endpoint value as the output of the operator, the configured output result can be more consistent with the actual operation result of the operator, thereby reducing the impact of deleting the operator on the accuracy of the neural network.

[0076] In some embodiments, if the output range of the operator to be deleted is outside the input range of the next operator, the endpoint value closest to the output range of the operator to be deleted can be selected from the two endpoints of the input range of the next operator, and this endpoint value can be configured as the output result that replaces the deleted operator and is passed to the next operator for operation.

[0077] In one embodiment, when deleting an operator, the operator and all its upstream operators can be deleted.

[0078] In this embodiment, the upstream operator is the operator that is computed before the operator in the neural network. For example, for the operator combination ReLU6(Concat(ConvA(data1), ConvB(data2))), ConvA and ConvB are the upstream operators of Concat.

[0079] If all channels of Concat are silent, the output of Concat is a fixed value. Therefore, regardless of the output values of ConvA and ConvB, any calculation performed by ConvA and ConvB is meaningless. Thus, ConvA and ConvB can be deleted, and the combination Concat(ConvA(data1), ConvB(data2)) can be configured as a constant for subsequent calculations. The above is merely an example and is not intended to limit this application.
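
Deleting an operator together with its upstream operators can be sketched on a small graph. The dictionary representation (operator name mapped to the names of its inputs), the function names, and the placeholder name `Const` are assumptions for illustration. The sketch does not handle an upstream operator that also feeds another, non-deleted operator; such an operator would have to be kept.

```python
def upstream_operators(inputs, op):
    """Return every operator computed before `op`, following input edges transitively."""
    found = set()
    stack = list(inputs.get(op, []))
    while stack:
        cur = stack.pop()
        if cur in found or cur not in inputs:
            continue  # already visited, or a raw data input rather than an operator
        found.add(cur)
        stack.extend(inputs[cur])
    return found


def delete_with_upstream(inputs, op, const_name="Const"):
    """Remove `op` and its upstream operators; consumers of `op` read a constant instead."""
    doomed = upstream_operators(inputs, op) | {op}
    return {
        name: [const_name if src == op else src for src in srcs]
        for name, srcs in inputs.items()
        if name not in doomed
    }


# ReLU6(Concat(ConvA(data1), ConvB(data2))) from the example in the text.
graph = {
    "ConvA": ["data1"],
    "ConvB": ["data2"],
    "Concat": ["ConvA", "ConvB"],
    "ReLU6": ["Concat"],
}
print(delete_with_upstream(graph, "Concat"))  # {'ReLU6': ['Const']}
```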

[0080] Operations in neural networks are performed layer by layer. That is, after the current operator completes its operation, it outputs the result to the next operator for operation. If the upstream operator is not deleted, the upstream operator will still perform operations, but the operations performed will be meaningless, which may waste the computing power of the neural network. Therefore, deleting the upstream operator can improve the computing efficiency of the neural network without affecting the deployment, results, and accuracy of the neural network operations.

[0081] Furthermore, if the silent channels are determined by the backtracking method mentioned in the foregoing embodiments, the silent channels are judged in reverse order during backtracking, that is, downstream operators first and upstream operators afterward. In this case, once all channels of an operator are determined to be silent channels, there is no need to judge the channels of the operators that follow in the backtracking order (that is, its upstream operators), thereby improving the efficiency of neural network pruning and compression.
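
The early exit described here can be sketched as a reverse traversal. The details of the backtracking method are given in earlier embodiments and are not reproduced; the function `backtrack_detect`, the `all_silent` predicate, and the list and dictionary representations are hypothetical stand-ins for illustration.

```python
def backtrack_detect(order, inputs, all_silent):
    """Visit operators from downstream to upstream, skipping the upstream
    operators of any operator whose channels are all silent.

    order:      operator names in execution order
    inputs:     operator name -> names of its upstream operators
    all_silent: predicate telling whether all channels of an operator are silent
    Returns (operators actually checked, operators skipped).
    """
    skipped = set()
    checked = []
    for op in reversed(order):
        if op in skipped:
            continue  # an all-silent downstream operator makes this check unnecessary
        checked.append(op)
        if all_silent(op):
            stack = list(inputs.get(op, []))
            while stack:
                up = stack.pop()
                if up not in skipped:
                    skipped.add(up)
                    stack.extend(inputs.get(up, []))
    return checked, skipped


order = ["ConvA", "ConvB", "Concat", "ReLU6"]
inputs = {"Concat": ["ConvA", "ConvB"], "ReLU6": ["Concat"]}
checked, skipped = backtrack_detect(order, inputs, lambda op: op == "Concat")
print(checked)          # ['ReLU6', 'Concat']
print(sorted(skipped))  # ['ConvA', 'ConvB']
```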

[0082] In this embodiment, removing silent channels from the neural network reduces redundancy, eliminates the computation spent on silent channels, and improves the network's computational efficiency. Since silent channels are neurons in the neural network that are not activated by data, their removal does not affect the original deployment and computation results of the neural network. Furthermore, there is no need to retrain the neural network after deletion, which improves the efficiency and reduces the cost of pruning and compressing the neural network.

[0083] Furthermore, regarding the model described in the aforementioned neural network pruning method, in some embodiments, the model can be a model from different domains used to perform corresponding tasks. The model can be applied to multiple domains. These domains include, but are not limited to, image processing, speech processing, and NLP (Natural Language Processing), etc.

[0084] Accordingly, the input tensor of the model can be the data to be processed in the relevant domain. For example, the input tensor can include, but is not limited to, the image to be processed, the speech to be processed, the text to be processed, and so on.

[0085] Please refer to Figure 3, which is a schematic diagram of a neural network pruning device 200 provided in an embodiment of this application. The neural network pruning device 200 includes a detection module 210 and a deletion module 220.

[0086] The detection module 210 is used to detect silent channels in a neural network, which are neurons in the neural network that are not activated by data.

[0087] Deletion module 220 is used to remove silent channels from the neural network.

[0088] In one embodiment, the detection module 210 is further configured to: obtain the channel output range of a channel under test, the channel under test being any channel in the neural network; compare the channel output range with a preset range; and, if the channel output range is outside the preset range or the channel output range is a fixed value, determine the channel under test to be a silent channel.
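
The check performed by the detection module can be sketched as a single predicate. The function name `is_silent`, the `(low, high)` tuple representation, and the choice to treat a range that only touches an endpoint of the preset range as outside it are assumptions for illustration, not details stated in this application.

```python
def is_silent(channel_range, preset_range):
    """Decide whether a channel under test is a silent channel.

    A channel is treated as silent if its output is a fixed value, or if its
    output range lies entirely outside the preset range (touching allowed).
    """
    ch_lo, ch_hi = channel_range
    lo, hi = preset_range
    fixed_value = ch_lo == ch_hi
    outside = ch_hi <= lo or ch_lo >= hi
    return fixed_value or outside


preset = (0, 6)
print(is_silent((-2, -1), preset))  # True: entirely below the preset range
print(is_silent((6, 8), preset))    # True: entirely at or above the maximum
print(is_silent((3, 3), preset))    # True: fixed value
print(is_silent((1, 4), preset))    # False: varies inside the preset range
```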

[0089] In one embodiment, the neural network includes a first operator and a second operator, which are any two consecutive operators in the neural network, and the input data of the second operator is the output data of the first operator; the first operator includes multiple channels under test. The detection module 210 is further configured to detect the operator input range of the second operator, and to determine the operator input range of the second operator as the preset range corresponding to each channel under test in the first operator.

[0090] In one embodiment, the neural network includes multiple operators, each operator including multiple channels. The deletion module 220 is further configured to: for any silent channel, determine, among the two endpoint values of the preset range corresponding to the silent channel, the target endpoint value that is closest to the channel output range of the silent channel; and configure the silent channel as the target endpoint value in the operator where the silent channel is located.

[0091] In one embodiment, the deletion module 220 is further configured to delete an operator if all channels of any operator are silent channels.

[0092] In one embodiment, the deletion module 220 is further configured to: determine the target endpoint value corresponding to each silent channel in the operator, the target endpoint value being, among the two endpoint values of the preset range corresponding to the silent channel, the endpoint value that is closest to the channel output range of the silent channel; and configure the target endpoint value corresponding to each silent channel as the input value of the next operator of the operator.

[0093] In one embodiment, the deletion module 220 is further configured to delete the operator and all upstream operators of the operator, wherein the upstream operators are operators in the neural network that are computed before the operator.

[0094] The neural network pruning device 200 provided in this application has functions similar to those of the neural network pruning method provided in the foregoing embodiments. For the sake of brevity, they are not elaborated here; for the functions implemented by the neural network pruning device 200, refer to the foregoing neural network pruning method.

[0095] Please refer to Figure 4, which is a schematic diagram of an electronic device 300 provided in this application. The electronic device 300 can serve as the execution subject of the aforementioned neural network pruning method and includes a processor 310 and a memory 320, wherein the processor 310 and the memory 320 are communicatively connected.

[0096] The memory 320 stores computer-readable instructions that can be executed by the processor 310, enabling the processor 310 to perform the neural network pruning method in the foregoing embodiments.

[0097] The processor 310 and the memory 320 may be connected via, but are not limited to, a communication bus.

[0098] The processor 310 can be an integrated circuit chip with signal processing capabilities. The processor 310 can be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), etc.; it can also be a digital signal processor, an application-specific integrated circuit, a field-programmable gate array, or another programmable logic device, transistor logic device, or discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor.

[0099] The memory 320 may include, but is not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), PROM (Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), etc.

[0100] It is understood that the electronic device 300 may also include more general modules required by itself, which will not be described one by one in the embodiments of this application.

[0101] Based on the same inventive concept, embodiments of this application also provide a computer-readable storage medium storing a computer program thereon, which, when run on a computer, causes the computer to perform the methods provided in the above embodiments.

[0102] The computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs (digital video discs)), or semiconductor media (e.g., SSDs (solid state disks)).

[0103] If the neural network pruning method is implemented as a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, external hard drives, ROM, RAM, magnetic disks, or optical disks.

[0104] In the embodiments provided in this application, it should be understood that the disclosed methods and apparatus can also be implemented in other ways. The apparatus embodiments described above are merely illustrative. The functional modules in the various embodiments of this application can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.

[0105] The above embodiments can be freely combined without conflict, and the resulting embodiments are covered within the protection scope of this application.

[0106] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

[0107] It should be noted that, in this document, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

Claims

1. A method of neural network pruning, the method comprising: detecting silent channels in a neural network to be pruned, wherein the silent channels are neurons in the neural network that are not activated by data; and deleting the silent channels from the neural network; wherein the input tensor of the neural network includes one of the following: an image to be processed, speech to be processed, or text to be processed; and wherein detecting silent channels in the neural network to be pruned includes: obtaining the channel output range of a channel under test, the channel under test being any channel in the neural network; comparing the channel output range with a preset range; and, if the channel output range is outside the preset range or the channel output range is a fixed value, determining the channel under test to be the silent channel.

2. The neural network pruning method according to claim 1, characterized in that the neural network includes a first operator and a second operator, wherein the first operator and the second operator are any two consecutive operators in the neural network, and the input data of the second operator is the output data of the first operator; the first operator includes multiple channels under test; and before comparing the channel output range with the preset range, the method further includes: detecting the operator input range of the second operator; and determining the operator input range of the second operator as the preset range corresponding to each channel under test in the first operator.

3. The neural network pruning method according to claim 1 or 2, characterized in that the neural network includes multiple operators, each operator including multiple channels, and the step of deleting the silent channel from the neural network includes: for any of the silent channels, determining, among the two endpoint values of the preset range corresponding to the silent channel, the target endpoint value that is closest to the channel output range of the silent channel; and configuring the silent channel as the target endpoint value in the operator containing the silent channel.

4. The neural network pruning method according to claim 1 or 2, characterized in that the neural network includes multiple operators, each operator including multiple channels, and the step of deleting the silent channel from the neural network includes: if all channels of any given operator are the silent channels, deleting the given operator.

5. The neural network pruning method according to claim 4, characterized in that the deletion of the operator includes: determining the target endpoint value corresponding to each silent channel in the operator, the target endpoint value being, among the two endpoint values of the preset range corresponding to the silent channel, the endpoint value that is closest to the channel output range of the silent channel; and configuring the target endpoint value corresponding to each of the silent channels as the input value of the next operator of the operator.

6. The neural network pruning method according to claim 4, characterized in that the deletion of the operator includes: deleting the operator and all its upstream operators, where the upstream operators are operators in the neural network that are computed before the operator.

7. A neural network pruning apparatus, characterized by comprising: a detection module used to detect silent channels in a neural network, wherein the silent channels are neurons in the neural network that are not activated by data; and a deletion module used to delete the silent channels from the neural network; wherein the input tensor of the neural network includes one of the following: an image to be processed, speech to be processed, or text to be processed; and wherein the detection module is used to: obtain the channel output range of a channel under test, the channel under test being any channel in the neural network; compare the channel output range with a preset range; and, if the channel output range is outside the preset range or the channel output range is a fixed value, determine the channel under test to be the silent channel.

8. A computer-readable storage medium, characterized in that, The readable storage medium stores a computer program that, when run on a computer, causes the computer to perform the neural network pruning method as described in any one of claims 1-6.

9. An electronic device, characterized in that, The device includes a memory and a processor, wherein the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the neural network pruning method as described in any one of claims 1-6.