Method and apparatus for compressing artificial neural networks

An artificial neural network is compressed by measuring per-layer pruning sensitivity and determining percentile-based pruning thresholds, which addresses excessive network complexity and storage requirements while balancing performance and cost.

CN114386590B · Active · Publication Date: 2026-06-12 · SAMSUNG ELECTRONICS CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SAMSUNG ELECTRONICS CO LTD
Filing Date: 2021-03-05
Publication Date: 2026-06-12

Abstract

Methods and apparatuses for compressing artificial neural networks are disclosed. The method includes: acquiring a first input image and a second input image; pre-training a neural network based on the first input image to generate a pre-trained artificial neural network; acquiring weights corresponding to the pre-trained artificial neural network, wherein the artificial neural network comprises a plurality of layers; generating, based on the weights and the second input image, data for acquiring a change in behavior of the artificial neural network due to pruning the artificial neural network; determining, based on the change in behavior of the artificial neural network, a pruning threshold for pruning the artificial neural network; and compressing the artificial neural network based on the pruning threshold.

Description

[0001] This application claims the benefit of Korean Patent Application No. 10-2020-0128136, filed on October 5, 2020, with the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.

Technical Field

[0002] The following description relates to methods and apparatus for compressing artificial neural networks.

Background Technology

[0003] Artificial neural networks (ANNs) may require significant computation for complex input data. As the amount of data used to train an ANN increases, the connections between its layers become more complex. Furthermore, while the accuracy of predictions based on previously trained data increases with the amount of data used, overfitting can occur, leading to a deterioration in the reliability of predictions based on new input data. The increased complexity of ANNs also results in excessive memory allocation, causing miniaturization and commercialization challenges. Therefore, there is a desire for compression methods that reduce the system cost of implementing ANNs while maintaining their performance.

[0004] The above description was possessed or acquired by one or more inventors in the course of conceiving this disclosure, and is not necessarily technology known prior to the filing of this application.

Summary of the Invention

[0005] This summary is provided to introduce, in a simplified form, the selection of concepts further described in the following detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to help determine the scope of the claimed subject matter.

[0006] In one general aspect, a method for training an artificial neural network for image recognition is provided, the method comprising: acquiring a first input image and a second input image; pre-training a neural network based on the first input image to generate a pre-trained artificial neural network; acquiring weights corresponding to the pre-trained artificial neural network, wherein the artificial neural network includes multiple layers; generating data based on the weights and the second input image for acquiring changes in the behavior of the artificial neural network caused by pruning the artificial neural network; determining a pruning threshold for pruning the artificial neural network based on the changes in the behavior of the artificial neural network; and compressing the artificial neural network based on the pruning threshold.

[0007] The step of determining the pruning threshold may include: measuring a pruning sensitivity for each of the plurality of layers, the pruning sensitivity indicating the degree to which the behavior of the artificial neural network changes during the pruning of each layer of the artificial neural network; and determining the pruning threshold by performing pruning on each of the plurality of layers based on the pruning sensitivity measured for each of the plurality of layers.

[0008] The step of measuring pruning sensitivity may include: for each of the plurality of layers, in response to the behavior of the artificial neural network being maintained by pruning the corresponding layer, measuring pruning sensitivity by gradually increasing a percentile-based pruning threshold for the corresponding layer.
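The per-layer measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the three-layer NumPy network, the probe inputs, the 5-point step, the 95% limit, and the helper names `prune_layer`, `predict`, and `tolerated_percentile` are all hypothetical. "Behavior maintained" is taken here to mean that the top-1 class is unchanged on every probe, and sensitivity is taken as 100 minus the largest tolerated percentile; the source does not fix either choice.

```python
import numpy as np

def prune_layer(w, percentile):
    """Zero the weights whose magnitude is at or below the given percentile."""
    if percentile <= 0:
        return w.copy()
    cutoff = np.percentile(np.abs(w), percentile)
    return np.where(np.abs(w) > cutoff, w, 0.0)

def predict(layers, x):
    """Forward pass of a small ReLU network; returns the top-1 class per row."""
    h = x
    for i, w in enumerate(layers):
        h = h @ w
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h.argmax(axis=1)

def tolerated_percentile(layers, index, probes, step=5, limit=95):
    """Raise the percentile threshold of one layer until predictions change."""
    reference = predict(layers, probes)
    tolerated = 0
    for pct in range(step, limit + 1, step):
        trial = list(layers)
        trial[index] = prune_layer(layers[index], pct)  # prune only this layer
        if not np.array_equal(predict(trial, probes), reference):
            break
        tolerated = pct
    return tolerated

# Hypothetical stand-ins for pre-trained weights and generated probe data.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(16, 32)), rng.normal(size=(32, 32)), rng.normal(size=(32, 4))]
probes = rng.normal(size=(20, 16))

tolerated = [tolerated_percentile(layers, i, probes) for i in range(len(layers))]
sensitivity = [100 - t for t in tolerated]  # assumed definition: less tolerance, more sensitive
print(tolerated, sensitivity)
```

A layer that tolerates a high percentile before any prediction flips ends up with a low sensitivity value under this assumed definition.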

[0009] The pruning sensitivity can be determined based on at least one of the distribution of weights corresponding to the plurality of layers or the form of connections between the plurality of layers.

[0010] The step of determining the pruning threshold may include: selecting layers from the plurality of layers in ascending order of pruning sensitivity; determining a percentile-based pruning threshold corresponding to the selected layer such that the behavior of the artificial neural network is maintained by pruning the selected layer; and repeating the steps of selecting layers and determining the percentile-based pruning threshold for each of the remaining layers from the plurality of layers other than the selected layer.
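The selection loop in this paragraph can be sketched as control flow only. The sketch assumes that layers already visited stay pruned while later layers are processed, which the text does not state explicitly, and it replaces the real network check with a hypothetical callback `is_maintained`. The layer names, the toy budget rule, the 5-point step, and the 95% limit are illustrative assumptions.

```python
def determine_thresholds(sensitivity, is_maintained, step=5, limit=95):
    """Visit layers from least to most sensitive and raise each layer's
    percentile threshold while the behavior check keeps passing."""
    thresholds = {name: 0 for name in sensitivity}
    for name in sorted(sensitivity, key=sensitivity.get):  # ascending sensitivity
        pct = 0
        while pct + step <= limit:
            trial = {**thresholds, name: pct + step}
            if not is_maintained(trial):
                break
            pct += step
        thresholds[name] = pct
    return thresholds

# Hypothetical stand-in for "prune with these thresholds and compare outputs":
# each layer consumes a share of a fixed damage budget.
cost = {"conv1": 2.0, "conv2": 1.0, "fc": 0.5}

def toy_is_maintained(trial):
    return sum(cost[name] * pct for name, pct in trial.items()) <= 100

thresholds = determine_thresholds(cost, toy_is_maintained)
print(thresholds)  # {'conv1': 0, 'conv2': 50, 'fc': 95}
```

In a real setting the callback would prune a copy of the network with the trial thresholds and compare its outputs with those of the unpruned network on the generated data.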

[0011] The step of determining the pruning threshold may include: selecting layers from the plurality of layers in ascending order of pruning sensitivity; pruning the selected layers to determine a percentile-based pruning threshold; and determining the percentile-based pruning threshold in response to the top k classes predicted by the artificial neural network before pruning being included in the top p classes predicted by the artificial neural network after pruning, where k and p are natural numbers and k ≤ p.

[0012] The step of determining a percentile-based pruning threshold may include: in response to the behavior of the artificial neural network being maintained by pruning the selected layer, increasing the percentile-based pruning threshold by a set interval.

[0013] The change in the behavior of the artificial neural network can be measured based on whether the output before and after pruning is applied to the artificial neural network meets the decision criteria.

[0014] Decision criteria may include the condition that the top p classes predicted by the pruned artificial neural network include the top k classes predicted by the unpruned artificial neural network, where k and p are each natural numbers and k ≤ p. When the output satisfies the decision criteria, it can be measured that no change in the behavior of the artificial neural network has occurred.
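The decision criterion can be written as a short set-inclusion check. The function name `behavior_maintained` and the example score vectors are hypothetical; the rule itself (the top k classes before pruning must appear among the top p classes after pruning, with k ≤ p) follows the paragraph above.

```python
import numpy as np

def behavior_maintained(scores_before, scores_after, k=1, p=1):
    """True if the top-k classes of the unpruned network all appear
    among the top-p classes of the pruned network."""
    if not (1 <= k <= p):
        raise ValueError("k and p must be natural numbers with k <= p")
    top_k_before = set(np.argsort(scores_before)[::-1][:k].tolist())
    top_p_after = set(np.argsort(scores_after)[::-1][:p].tolist())
    return top_k_before.issubset(top_p_after)

before = np.array([0.70, 0.20, 0.06, 0.04])  # unpruned network: class 0 ranks first
after = np.array([0.30, 0.45, 0.15, 0.10])   # pruned network: class 1 ranks first

print(behavior_maintained(before, after, k=1, p=1))  # False
print(behavior_maintained(before, after, k=1, p=2))  # True
```

With k = p = 1 the check requires the same top-1 class; a larger p relaxes it so that small rank changes after pruning do not count as a change in behavior.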

[0015] The pruning threshold may include a percentile-based pruning threshold for each of the plurality of layers, and the step of compressing the artificial neural network may include: for each of the plurality of kernels in the plurality of layers, applying a magnitude-based pruning threshold to the corresponding kernel based on the percentile-based pruning threshold of the corresponding layer.

[0016] The step of compressing the artificial neural network may include removing a percentage of the weights of the artificial neural network, wherein the percentage of weights corresponds to the percentile-based pruning threshold.

[0017] The step of generating the data may include: generating the data based on the weights by repeatedly correcting the second input image by the artificial neural network until the class predicted in the artificial neural network is the target class from multiple classes.

[0018] The second input image may include a random noise image.

[0019] The step of generating the data may include: backpropagating the cross-entropy loss between the one-hot vector corresponding to the target class and the class predicted in the artificial neural network.

[0020] The weights can be fixed and remain unchanged during the backpropagation of the cross-entropy loss.

[0021] In one general aspect, a computer-implemented method for compressing an artificial neural network is provided, the method comprising: acquiring weights corresponding to a pre-trained artificial neural network, wherein the artificial neural network includes multiple layers; generating data based on the weights for acquiring changes in the behavior of the artificial neural network caused by pruning the artificial neural network; determining a pruning threshold for pruning the artificial neural network based on the changes in the behavior of the artificial neural network; and compressing the neural network based on the pruning threshold.

[0022] In another general aspect, an apparatus for compressing an artificial neural network is provided, the apparatus comprising: a communication interface configured to acquire weights corresponding to a pre-trained artificial neural network, wherein the artificial neural network includes multiple layers; and a processor configured to: generate data based on the weights for acquiring changes in the behavior of the artificial neural network caused by pruning the artificial neural network; determine a pruning threshold for pruning the artificial neural network based on the changes in the behavior of the artificial neural network; and compress the neural network based on the pruning threshold.

[0023] The processor may be configured to: measure a pruning sensitivity for each of the plurality of layers, the pruning sensitivity indicating the degree to which the behavior of the artificial neural network changes during the pruning of each layer of the artificial neural network; and determine a pruning threshold by performing pruning on each of the plurality of layers based on the pruning sensitivity measured for each of the plurality of layers.

[0024] The processor can be configured to measure pruning sensitivity for each of the plurality of layers, in response to the behavior of the artificial neural network being maintained by pruning the corresponding layer, by gradually increasing a percentile-based pruning threshold for the corresponding layer.

[0025] The pruning sensitivity can be determined based on at least one of the distribution of weights corresponding to the plurality of layers or the form of connections between the plurality of layers.

[0026] The processor can be configured to: select layers from the plurality of layers in ascending order of pruning sensitivity; determine a percentile-based pruning threshold corresponding to the selected layer, such that the behavior of the artificial neural network is maintained by pruning the selected layer; and repeatedly perform the layer selection process and the percentile-based pruning threshold determination process for each of the remaining layers other than the selected layer.

[0027] The change in the behavior of the artificial neural network can be measured based on whether the output before and after pruning is applied to the artificial neural network meets the decision criteria.

[0028] Decision criteria may include the condition that the top p classes predicted by the pruned artificial neural network include the top k classes predicted by the unpruned artificial neural network, where k and p are each natural numbers and k ≤ p.

[0029] The pruning threshold may include a percentile-based pruning threshold for each of the plurality of layers, and the processor may be configured to: apply a magnitude-based pruning threshold to each of the plurality of kernels in the plurality of layers, based on the percentile-based pruning threshold of the corresponding layer.

[0030] The processor can be configured to generate the data based on the weights by repeatedly correcting the input image by the artificial neural network until the class predicted in the artificial neural network is the target class from a plurality of classes.

[0031] The input image may include images with random noise.

[0032] The processor can be configured to backpropagate the cross-entropy loss between the one-hot vector corresponding to the target class and the class predicted in the artificial neural network.

[0033] The weights can be fixed and remain unchanged during the backpropagation of the cross-entropy loss.

[0034] The device may include at least one of the following: an advanced driver assistance system (ADAS), a head-up display (HUD), a three-dimensional (3D) digital information display (DID), a navigation device, a neuromorphic device, a 3D mobile device, a smartphone, a smart television (TV), a smart vehicle, an Internet of Things (IoT) device, a medical device, or a measuring device.

[0035] Other features and aspects will become clear from the following detailed description, the accompanying drawings, and the claims.

Attached Figure Description

[0036] Figure 1 is a diagram illustrating an example of an artificial neural network.

[0037] Figure 2 is a diagram showing an example of pruning.

[0038] Figure 3 is a diagram illustrating an example of a method for compressing artificial neural networks.

[0039] Figure 4 is a diagram illustrating an example of a method for compressing artificial neural networks.

[0040] Figure 5 is a diagram illustrating an example of a method for generating data.

[0041] Figure 6 is a diagram illustrating an example of a method for generating data.

[0042] Figure 7 is a diagram illustrating an example of a method for determining the pruning threshold.

[0043] Figure 8 is a diagram illustrating an example of a method for determining a pruning threshold by performing pruning on each layer.

[0044] Figure 9 is a diagram illustrating an example of a method for determining a percentile-based pruning threshold corresponding to a layer.

[0045] Figure 10 is a diagram illustrating another example of a method for determining the pruning threshold by performing pruning on each layer.

[0046] Figure 11 is a diagram illustrating an example of applying a pruning threshold on a kernel-by-kernel basis.

[0047] Figure 12 is a diagram illustrating an example of a device used to compress artificial neural networks.

[0048] Throughout the accompanying drawings and detailed embodiments, unless otherwise described or provided, the same reference numerals will be understood to denote the same elements, features, and structures. The drawings may not be to scale, and for clarity, illustration, and convenience, the relative dimensions, scale, and depiction of elements in the drawings may be exaggerated.

Detailed Implementation

[0049] The following detailed embodiments are provided to aid the reader in gaining a comprehensive understanding of the methods, apparatus, and / or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatus, and / or systems described herein will become apparent upon understanding this disclosure. For example, the order of operations described herein is merely illustrative and is not limited to those orders set forth herein, but may be changed as will become clear upon understanding this disclosure, except for operations that must occur in a specific order. Furthermore, for clarity and conciseness, descriptions of features known in the art may be omitted.

[0050] The features described herein may be implemented in different forms and should not be construed as limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many feasible ways of implementing the methods, apparatus, and / or systems described herein that will be clear upon understanding the disclosure of this application.

[0051] The terminology used herein is for the purpose of describing particular examples only and is not intended to be limiting. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. It will also be understood that when the terms “comprising” and / or “including” are used herein, they indicate the presence of the described features, wholes, steps, operations, elements, and / or components, but do not preclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and / or groups thereof.

[0052] Regarding the reference numerals assigned to elements in the accompanying drawings, it should be noted that wherever possible, even if the same element is shown in different drawings, the same element will be designated by the same reference numeral. Furthermore, in the description of the examples, where a detailed description of a well-known related structure or function would lead to a vague interpretation of this disclosure, such description will be omitted.

[0053] Additionally, terms such as first, second, A, B, (a), (b), etc., are used herein to describe components. Each of these terms is not intended to define the nature, order, or sequence of the corresponding component, but only to distinguish the corresponding component from one or more other components. It should be noted that if the specification describes a first component as "connected," "joined," or "engaged" to a second component, then although the first component may be directly connected, joined, or engaged to the second component, a third component may also be "connected," "joined," or "engaged" between the first and second components.

[0054] Components that share functionality with components included in one example are described using the same name in another example. Unless otherwise stated, descriptions in one example are applicable to another, and detailed descriptions within the same scope are omitted.

[0055] Figure 1 is a diagram illustrating an example of an artificial neural network. Figure 1 illustrates the configuration of a deep neural network (DNN) corresponding to an example of an artificial neural network. For brevity, the following description covers the structure of the DNN; this is merely an example, and various artificial neural network structures can be used.

[0056] A DNN is one way of implementing an artificial neural network and can include multiple layers. A DNN may include, for example, an input layer, an output layer, and multiple hidden layers. Input data is applied to the input layer, and the output layer outputs the result value obtained by performing a prediction based on the input data during training. Multiple hidden layers are located between the input layer and the output layer.

[0057] DNNs are categorized based on the algorithms used to process information, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). In the following text, following general practice in the field of artificial neural networks, the input layer is referred to as the lowest layer, the output layer as the highest layer, and the layers from the highest output layer to the lowest input layer can be ordered sequentially for naming purposes. For example, hidden layer 2 can be a higher layer than hidden layer 1 and the input layer, and a lower layer than the output layer.

[0058] For example, in a DNN, a higher-level layer can receive a value obtained by multiplying the output value of a lower-level layer by a weight and applying a bias to the result of the multiplication, thereby outputting a predetermined computation result. In this case, the output result can be applied to a higher-level layer adjacent to the corresponding layer in a similar manner.
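The computation described above (multiply the lower layer's outputs by weights, then apply a bias) can be shown in a few lines. The function name `dense_forward` and the numeric values are hypothetical, and no activation function is applied because the paragraph does not mention one.

```python
import numpy as np

def dense_forward(lower_output, weights, bias):
    """Value received by the higher layer: weights times lower outputs, plus bias.
    weights has shape (higher_nodes, lower_nodes)."""
    return weights @ lower_output + bias

lower_output = np.array([1.0, 2.0, 3.0, 4.0])  # outputs of four lower-layer nodes
weights = np.array([[0.5, 0.0, 0.25, 0.0],     # a zero weight means no connection
                    [0.0, 1.0, 0.0, 0.5]])
bias = np.array([0.1, -0.2])

higher_input = dense_forward(lower_output, weights, bias)
print(higher_input)  # [1.35 3.8 ]
```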

[0059] Methods for training artificial neural networks are known as, for example, deep learning. As mentioned above, various algorithms (such as convolutional neural network methods and recurrent neural network methods) can be used in deep learning.

[0060] In one example, “training an artificial neural network” can be understood in a broad sense as including determining and updating one or more weights and one or more biases between layers, and / or determining and updating one or more weights and one or more biases between neurons belonging to different adjacent layers.

[0061] For example, multiple layers, hierarchical structures, and the weights and biases of neurons can all be collectively expressed as the "connectivity" of an artificial neural network. Therefore, training an artificial neural network can also be understood as building and training connectivity.

[0062] In an artificial neural network, each of multiple layers may include multiple nodes. A node may correspond to a neuron in the artificial neural network. In the following description, the term "neuron" may be used interchangeably with the term "node".

[0063] In the DNN of Figure 1, it can be seen that connections are formed between combinations of nodes included in a single layer and combinations of nodes included in adjacent layers. Thus, the state in which combinations of all nodes in adjacent layers of an artificial neural network are connected can be called a "fully connected" state. For example, as shown in Figure 1, node 3-1 of hidden layer 2 can be connected to all nodes of hidden layer 1 (i.e., nodes 2-1 to 2-4) to receive the value obtained by multiplying the output value of each node by a predetermined weight.

[0064] Data input to the input layer can be processed through multiple hidden layers, resulting in an output value being output through the output layer. Here, the larger the weight multiplied by the output value of each node, the stronger the connection between the two corresponding nodes. Conversely, the smaller the weight, the weaker the connection between the two nodes. Weights can have values ranging from, for example, 0 to 1. If the weight is 0, then there is no connection between the two nodes.

[0065] Meanwhile, as the connectivity between nodes increases through weights, the connectivity of the artificial neural network strengthens, but so does its complexity. Therefore, the amount of memory allocated to store the weights increases. Furthermore, the speed at which the artificial neural network performs its overall task decreases. Thus, the efficiency of the artificial neural network decreases.

[0066] Figure 2 is a diagram illustrating an example of pruning. Referring to Figure 2, part 210 shows the structure of the artificial neural network obtained before pruning, and part 230 shows the structure of the artificial neural network obtained after pruning.

[0067] As can be seen in part 210, the combination of all nodes in two different adjacent layers of the artificial neural network is fully connected. When the artificial neural network is fully connected, the weights indicating the strength of the connection between two predetermined nodes belonging to different adjacent layers in the artificial neural network can have values greater than 0. As shown in part 210, when neurons in all adjacent layers of the artificial neural network are connected, the complexity of the entire artificial neural network increases. Furthermore, in this case, the accuracy and reliability of the prediction results of the artificial neural network may decrease due to overfitting.

[0068] To prevent this, as shown in part 230, pruning can be performed on portions of the artificial neural network. For example, in the artificial neural network obtained before pruning, as shown in part 210, the weight between node 1 and node 2-3 may be less than or equal to a predetermined threshold. In this example, to determine the portion of the artificial neural network to be pruned, the compression device can search the artificial neural network for portions with weights less than or equal to the threshold (e.g., between node 1 and node 2-3). The compression device can perform pruning to remove the connectivity of portions with weights less than or equal to the threshold. In this case, the compression device can prune the artificial neural network by reducing or removing a portion of the layers and / or a portion of the weights, which can substantially maintain the predictive (or inferential) accuracy of the artificial neural network.

[0069] In this way, pruning can be performed on layers of an artificial neural network that do not substantially affect the network's output. For example, pruning can be performed on one or more input feature maps of a layer where doing so does not substantially affect the output feature map generated by that layer.

[0070] A compression device can perform pruning by searching for connections between neurons whose weights have values less than a threshold. Connections corresponding to all weights identified as having values less than the threshold can be removed, scaled to zero, or otherwise ignored. For example, the compression device can select layers with weights less than the threshold as candidates to be pruned in an artificial neural network.
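Threshold pruning of this kind can be sketched with a boolean mask. The function name `prune_below_threshold` and the example values are hypothetical, and comparing absolute values is an assumption made so that the sketch also works for signed weights; the text itself speaks only of weights with values less than a threshold.

```python
import numpy as np

def prune_below_threshold(weights, threshold):
    """Set every weight whose magnitude is below the threshold to zero.
    Returns the pruned weights and the mask of surviving connections."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

w = np.array([[0.80, 0.05, 0.30],
              [0.02, 0.60, 0.10]])
pruned, mask = prune_below_threshold(w, threshold=0.1)
print(pruned)      # 0.05 and 0.02 become 0; 0.10 is kept because it is not below 0.1
print(mask.sum())  # 4
```

Setting a weight to zero corresponds to the "scaled to zero" option in the paragraph; a sparse storage format would be needed to turn the zeros into actual memory savings.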

[0071] Figure 3 is a diagram illustrating an example of a method for compressing artificial neural networks. Referring to Figure 3, the compression device can perform pruning via operations 310 to 330, thereby compressing the artificial neural network. In one example, the method / device for compressing the artificial neural network can be included in the method / device for training the artificial neural network. For example, before performing pruning via operations 310 to 330, the compression device can acquire training images and input images, and pre-train the artificial neural network based on the training images to generate a pre-trained artificial neural network.

[0072] In operation 310, the compression device can acquire weights corresponding to a pre-trained artificial neural network. In one example, the artificial neural network can build an inference model through training for a desired operation. Furthermore, the artificial neural network can output inferences about external input values based on the built inference model. The artificial neural network can be applied to, for example, facial recognition modules or software for smartphones, recognition / classification operations (such as object recognition, speech recognition, and image classification) for medical and diagnostic devices, and unmanned systems, and can be implemented as a dedicated processing device for processing image data to extract meaningful information.

[0073] In operation 320, the compression device generates data based on the weights received in operation 310. In one example, the compression device generates data based on the input image and the weights received in operation 310. The data may be, for example, data used to acquire changes in the behavior of the artificial neural network due to pruning. Operation 320 may correspond to processing for generating data optimized for acquiring the effect of pruning on the output value of the artificial neural network (e.g., changes in the behavior of the artificial neural network due to pruning). In operation 320, the compression device can generate data by repeatedly training the generated input image so that the artificial neural network predicts a predetermined class.

[0074] In operation 330, the compression device may use the data generated in operation 320 to generate a pruning threshold and prune layers with weights less than the determined or generated pruning threshold. Operation 330 may correspond to the process of determining a pruning threshold optimized for the corresponding artificial neural network using the data generated in operation 320.

[0075] Figure 4 is a diagram illustrating an example of a method for compressing artificial neural networks. The operations in Figure 4 may be performed in the order and manner shown, but the order of some operations may be changed, or some operations may be omitted, without departing from the spirit and scope of the illustrative examples described. Many of the operations shown in Figure 4 can be performed in parallel or simultaneously. One or more blocks of Figure 4, and combinations of the blocks, can be implemented by a special-purpose hardware-based computer (such as a processor) that performs the specified functions, or by a combination of special-purpose hardware and computer instructions. For example, a compression device can compress an artificial neural network via operations 410 to 440. In one example, a method / device for compressing an artificial neural network can be included in a method / device for training an artificial neural network. For example, before performing pruning via operations 410 to 440, the compression device can acquire training images and input images, and pre-train the artificial neural network based on the training images to generate a pre-trained artificial neural network. In addition to the description of Figure 4 below, the descriptions of Figures 1 to 3 also apply to Figure 4 and are incorporated herein by reference. Therefore, the above description need not be repeated here.

[0076] Referring to Figure 4, in operation 410, the compression device acquires the weights corresponding to the pre-trained artificial neural network. The artificial neural network consists of multiple layers.

[0077] In operation 420, the compression device generates data for acquiring changes in the behavior of the artificial neural network due to pruning, based on the weights obtained in operation 410. In one example, the compression device generates data for acquiring changes in the behavior of the artificial neural network due to pruning, based on the input image and the weights obtained in operation 410. The changes in the behavior of the artificial neural network can be measured, for example, based on whether the outputs obtained by applying the data to the artificial neural network before and after pruning meet decision criteria. Decision criteria can include conditions such as: the top p classes predicted by the pruned artificial neural network include the top k classes predicted by the unpruned artificial neural network, where k and p are each positive integers and k ≤ p.

[0078] In operation 420, the compression device generates data based on the weights acquired in operation 410 by repeatedly correcting the input image by an artificial neural network until the class predicted in the artificial neural network is the target class, which is one of a plurality of classes. The input image may include, for example, an image with random noise. The method by which the compression device generates data is described in more detail below with reference to Figures 5 and 6.
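The data-generation idea (start from random noise and correct the input, with the weights frozen, until the network predicts the target class) can be sketched on a deliberately tiny model. Everything concrete here is an assumption for illustration: a single linear layer followed by softmax stands in for the pre-trained network, and the names `generate_sample`, `W`, and `b`, the learning rate, and the step limit are hypothetical. The update uses the gradient of the cross-entropy loss between the one-hot target and the softmax output, taken with respect to the input only.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def generate_sample(W, b, target, start, lr=0.05, max_steps=10000):
    """Correct `start` by gradient descent on the input until the frozen
    classifier predicts `target`, or until max_steps is reached."""
    x = start.astype(float).copy()
    one_hot = np.zeros(W.shape[0])
    one_hot[target] = 1.0
    for _ in range(max_steps):
        probs = softmax(W @ x + b)
        if int(np.argmax(probs)) == target:
            break
        # d(cross-entropy)/d(logits) = probs - one_hot; the chain rule through
        # the linear layer gives the gradient with respect to the input.
        x -= lr * (W.T @ (probs - one_hot))  # W and b are never updated
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))  # stand-in for pre-trained weights (3 classes)
b = np.zeros(3)
noise = rng.normal(size=8)   # stand-in for the random noise image

samples = [generate_sample(W, b, target, noise) for target in range(3)]
print([int(np.argmax(W @ s + b)) for s in samples])
```

One sample per target class is generated from the same noise input; a convolutional network would follow the same pattern with automatic differentiation in place of the hand-written gradient.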

[0079] In operation 430, the compression device determines a pruning threshold for pruning the artificial neural network based on the change in the behavior of the artificial neural network acquired using the data. The pruning threshold may include, for example, percentile-based pruning thresholds for each of multiple layers. The method by which the compression device determines the pruning threshold is described in more detail below with reference to Figures 7 to 10.

[0080] In operation 440, the compression device compresses the artificial neural network based on a pruning threshold determined in operation 430. Here, "compression" means reducing the capacity of the artificial neural network. The compression device can compress the artificial neural network by performing pruning using the pruning threshold determined in operation 430. For example, if the pruning threshold is determined to be 40%, the compression device can compress the artificial neural network by performing pruning on the weights corresponding to the lower 40% of the weights of the artificial neural network. The compression device can reduce the capacity of the artificial neural network by removing weights with a smaller impact on the task accuracy of the artificial neural network from the weights of the artificial neural network based on the pruning threshold.
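The 40% example can be reproduced with a percentile cutoff on weight magnitudes. The function name `prune_by_percentile` and the ten example weights are hypothetical, and ranking weights by absolute value is an assumption; the text speaks only of "the lower 40% of the weights".

```python
import numpy as np

def prune_by_percentile(weights, percentile):
    """Zero the weights whose magnitude falls at or below the given percentile."""
    if percentile <= 0:
        return weights.copy(), 0.0
    magnitudes = np.abs(weights)
    cutoff = np.percentile(magnitudes, percentile)
    return np.where(magnitudes > cutoff, weights, 0.0), float(cutoff)

w = np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0])
pruned, cutoff = prune_by_percentile(w, 40)
print(pruned)                    # the four smallest magnitudes are zeroed
print(np.count_nonzero(pruned))  # 6
```

When several weights share the cutoff magnitude, the fraction removed can differ slightly from the nominal percentile.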

[0081] In some cases, each layer of the artificial neural network may include multiple kernels. In this case, for each of the kernels included in a layer, the compression device may apply a magnitude-based pruning threshold to the kernel, derived from the percentile-based pruning threshold of that layer. The method by which the compression device applies a magnitude-based pruning threshold to each kernel is described in more detail below with reference to Figure 11.

[0082] Figure 5 is a diagram illustrating an example of a method for generating data. The operations of Figure 5 may be performed in the order and manner shown, but the order of some operations may be changed, or some operations may be omitted, without departing from the spirit and scope of the illustrative examples described. Many of the operations shown in Figure 5 can be performed in parallel or simultaneously. One or more blocks of Figure 5, and combinations of the blocks, can be implemented by a special-purpose hardware-based computer (such as a processor) that performs the specified functions, or by a combination of special-purpose hardware and computer instructions. In addition to the description of Figure 5 below, the descriptions of Figures 1 to 4 also apply to Figure 5 and are incorporated herein by reference. Therefore, the above description is not repeated here.

[0083] Figure 6 is a diagram illustrating an example of a method for generating data. Figures 5 and 6 illustrate the process by which the compression device generates data through operations 510 to 540.

[0084] Referring to Figures 5 and 6, in operation 510, the compression device generates a random noise image 605 corresponding to the input image of the artificial neural network 610. The artificial neural network 610 may include, for example, convolutional (Conv) layers, batch normalization (BN) units, fully connected (FC) layers, and softmax layers. The softmax layer may output a one-hot vector 615 that classifies the input image into classes.

[0085] In operation 520, the compression device may use the weights corresponding to the pre-trained artificial neural network 610 (e.g., the weights acquired in operation 410) to acquire the result (e.g., class 901) that the artificial neural network 610 predicts for the random noise image 605 generated in operation 510. Class 901 may correspond to the Top-1 class predicted by the artificial neural network 610 (e.g., the class with the highest confidence score). The artificial neural network 610 may have been pre-trained to derive a target class (e.g., class i) during inference, so that the result obtained in operation 520 (e.g., class 901) is the class that the artificial neural network 610 predicts for the random noise image 605.

[0086] In operation 530, the compression device may determine whether the Top-1 class (e.g., class 901 (620)) predicted by the artificial neural network 610 in operation 520 is the same as the target class (e.g., class i). If it is determined in operation 530 that the predicted Top-1 class is the same as the target class, the compression device may terminate the operation.

[0087] In some cases, the class predicted by the artificial neural network 610 (e.g., class 901 (620)) may be determined in operation 530 to differ from the target class (e.g., class i). In this case, in operation 540, the compression device can backpropagate the cross-entropy loss between the class predicted by the artificial neural network 610 (e.g., class 901 (620)) and the one-hot vector 615 in which only the element corresponding to the target class is activated. During backpropagation of the cross-entropy loss, the weights of the artificial neural network 610 are held fixed: the loss is simply passed back through them without changing them. The compression device can use, for example, the one-hot vector 615 as a ground truth (GT) label (e.g., GT = class i) to obtain the cross-entropy loss, and can train or correct the random noise image 605 via backpropagation.

[0088] In operation 540, the compression device can backpropagate the cross-entropy loss so that the artificial neural network 610 corrects the input image 605, and can then predict the class of the corrected input image via operation 520. As indicated by reference numeral 650, the compression device can generate the data by repeating the backpropagation of the cross-entropy loss over steps 1, 2, ..., N until the class predicted by the artificial neural network (e.g., class 901) is the target class (e.g., class i). For example, the compression device can use the input image 605 or the corrected input image as the data generated in operation 420. In one example, as the backpropagation of the cross-entropy loss is repeated, the confidence score of the target class gradually increases. In other words, the compression device can repeatedly train or correct the input image 605 until the Top-1 class predicted by the artificial neural network 610 is the target class.

[0089] Through the processing described above, the compression device can generate an input image corresponding to each class (e.g., the target class) and use the generated input image corresponding to each class to determine the pruning threshold corresponding to each class, thereby pruning the artificial neural network without using separate training data.
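Operations 510 to 540 can be sketched as a gradient loop over the input image. The sketch below is not the network of Figure 6: as a loudly stated assumption, the pre-trained network is replaced by a hypothetical fixed linear layer followed by softmax (`WEIGHTS`), so that the gradient of the cross-entropy loss with respect to the image can be written by hand. The function names, learning rate, and step limit are likewise illustrative.

```python
import math
import random

# Hypothetical stand-in for the pre-trained network: one fixed linear layer
# followed by softmax. WEIGHTS is never modified; only the image is.
WEIGHTS = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, -0.5],
    [0.0, 0.0, 1.0, 0.0],
]


def predict_probs(image):
    """Forward pass: class confidence scores for a flattened image."""
    logits = [sum(w * x for w, x in zip(row, image)) for row in WEIGHTS]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1(probs):
    """Index of the class with the highest confidence score."""
    return max(range(len(probs)), key=lambda c: probs[c])


def generate_image_for_class(target, steps=1000, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Operation 510: start from a random noise image.
    image = [rng.gauss(0.0, 1.0) for _ in range(len(WEIGHTS[0]))]
    for _ in range(steps):
        probs = predict_probs(image)            # operation 520
        if top1(probs) == target:               # operation 530
            break
        # Operation 540: the gradient of the cross-entropy loss with respect
        # to the logits is (probs - one_hot(target)); it is passed back
        # through the fixed weights to the image.
        grad_logits = [p - (1.0 if c == target else 0.0)
                       for c, p in enumerate(probs)]
        grad_image = [sum(WEIGHTS[c][j] * grad_logits[c]
                          for c in range(len(WEIGHTS)))
                      for j in range(len(image))]
        image = [x - lr * g for x, g in zip(image, grad_image)]
    return image


images = {c: generate_image_for_class(c) for c in range(len(WEIGHTS))}
```

One image is generated per target class, matching the per-class use described above. In a deep network the hand-written gradient would be replaced by automatic differentiation, with the weights frozen in the same way.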

[0090] Figure 7 is a diagram illustrating an example of a method for determining the pruning threshold. The operations of Figure 7 may be performed in the order and manner shown, but the order of some operations may be changed, or some operations may be omitted, without departing from the spirit and scope of the illustrative examples described. Many of the operations shown in Figure 7 can be performed in parallel or simultaneously. One or more blocks of Figure 7, and combinations of the blocks, can be implemented by a special-purpose hardware-based computer (such as a processor) that performs the specified functions, or by a combination of special-purpose hardware and computer instructions. For example, the compression device can determine the pruning threshold via operations 710 and 720. In addition to the description of Figure 7 below, the descriptions of Figures 1 to 6 also apply to Figure 7 and are incorporated herein by reference. Therefore, the above description is not repeated here.

[0091] Referring to Figure 7, in operation 710, the compression device can measure a pruning sensitivity for each of the multiple layers. The pruning sensitivity indicates the degree to which the behavior of the artificial neural network changes when the layer is pruned.

[0092] For example, in operation 710, the compression device may measure the pruning sensitivity of each of the multiple layers by gradually increasing a percentile-based pruning threshold corresponding to the layer (e.g., increasing a predetermined initial percentile-based pruning threshold by 1% at a time) until pruning the layer no longer preserves the behavior of the artificial neural network. In one example, the pruning sensitivity may correspond to the extent to which the pruning threshold is increased, or may be inversely proportional to that extent. For example, the further the pruning threshold can be increased before the behavior of the artificial neural network for the data generated in operation 420 is no longer preserved, the lower the pruning sensitivity of the layer; the less the pruning threshold can be increased, the higher the pruning sensitivity. The pruning sensitivity may be determined based on at least one of the distribution of the weights corresponding to the layers and the form of the connections between layers. For example, in an image containing text, most of the text may be located in the center of the image while the edges of the image are blank. In this example, the weights may also be concentrated in the portion corresponding to the center of the image. Thus, the pruning sensitivity may vary with the distribution of the weights. In addition, the form of the connections between layers can include the positional relationship between a given layer and another layer.
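The measurement in operation 710 can be sketched as a per-layer search. This sketch makes several assumptions: the callback `behavior_preserved(layer, threshold)` is a hypothetical stand-in for pruning one layer and checking the network's outputs, the layer names and tolerances are invented for illustration, and the sensitivity is taken as `1 / (1 + increase)`, which is only one possible inverse relationship.

```python
def measure_pruning_sensitivity(layers, behavior_preserved,
                                initial=0, step=1, limit=100):
    """Return {layer: sensitivity} by raising each layer's threshold alone.

    behavior_preserved(layer, threshold) is a caller-supplied check
    (hypothetical here) that prunes only `layer` at `threshold` percent
    and reports whether the network still behaves as before.
    """
    sensitivity = {}
    for layer in layers:
        threshold = initial
        # Raise the threshold one step at a time while behavior is preserved.
        while (threshold + step <= limit
               and behavior_preserved(layer, threshold + step)):
            threshold += step
        # Assumed mapping: a larger tolerated increase gives a smaller value.
        sensitivity[layer] = 1.0 / (1 + threshold - initial)
    return sensitivity


# Toy check: each layer tolerates pruning up to a made-up percentage.
TOLERANCE = {"conv1": 10, "conv2": 35, "fc": 60}
sens = measure_pruning_sensitivity(
    TOLERANCE, lambda layer, t: t <= TOLERANCE[layer])
ranking = sorted(sens, key=sens.get)  # ascending sensitivity
print(ranking)
```

In the toy data the layer that tolerates the most pruning ends up first in the ranking, which is the order in which layers would later be selected.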

[0093] In operation 720, the compression device determines the pruning threshold by performing pruning on each of the multiple layers based on the pruning sensitivity measured for that layer. The measured pruning sensitivity may differ from layer to layer. For example, like the artificial neural network 610 described above, the artificial neural network may include multiple Conv layers, FC layers, and softmax layers. In this example, a different pruning sensitivity may be measured for each of the Conv layers, FC layers, and softmax layers. Because each layer is pruned according to its own pruning sensitivity, the reduction in accuracy due to compressing the artificial neural network can be minimized.

[0094] A pruning threshold is a reference value used to reduce or remove weights in an artificial neural network, and can be, for example, a percentile-based threshold. For instance, the pruning threshold may be fixed within a range having a minimum of 0 and a maximum of 1. Using a percentile-based pruning threshold can prevent pruning from distorting the distribution of the output feature map even when the distributions of the filters differ.

[0095] In operation 720, for example, if the Top-k classes predicted by the artificial neural network for the data generated in operation 420 before pruning and the Top-p classes predicted for that data after pruning satisfy a criterion (e.g., the k:p criterion), the compression device can identify that pruning has not significantly changed the behavior of the artificial neural network, repeat the pruning process, and determine the pruning threshold. The method by which the compression device determines the pruning threshold by performing pruning on each layer is described in more detail below with reference to Figures 8 and 10.

[0096] Figure 8 is a diagram illustrating an example of a method for determining a pruning threshold by performing pruning on each layer. The operations of Figure 8 may be performed in the order and manner shown, but the order of some operations may be changed, or some operations may be omitted, without departing from the spirit and scope of the illustrative examples described. Many of the operations shown in Figure 8 can be performed in parallel or simultaneously. One or more blocks of Figure 8, and combinations of the blocks, can be implemented by a special-purpose hardware-based computer (such as a processor) that performs the specified functions, or by a combination of special-purpose hardware and computer instructions. For example, Figure 8 illustrates the process by which the compression device determines a pruning threshold by performing pruning on each layer via operations 810 to 830. In addition to the description of Figure 8 below, the descriptions of Figures 1 to 7 also apply to Figure 8 and are incorporated herein by reference. Therefore, the above description is not repeated here.

[0097] Referring to Figure 8, in operation 810, the compression device can select a layer from the multiple layers in ascending order of pruning sensitivity. Here, a low pruning sensitivity indicates that pruning changes the behavior only slightly; in other words, the layer is robust to pruning. Therefore, "low pruning sensitivity" can be understood as equivalent to "robust to pruning".

[0098] In operation 820, the compression device can determine a percentile-based pruning threshold corresponding to the layer selected in operation 810, such that the behavior of the artificial neural network is preserved when the selected layer is pruned. The method for determining the percentile-based pruning threshold corresponding to the layer selected in operation 810 is described below with reference to Figure 9.

[0099] In operation 830, the compression device may repeat the layer selection process (e.g., operation 810) and the process of determining a percentile-based pruning threshold (e.g., operation 820) for each of the remaining layers other than the layer selected in operation 810.

[0100] Figure 9 is a diagram illustrating an example of a method for determining a percentile-based pruning threshold corresponding to a layer. The operations of Figure 9 may be performed in the order and manner shown, but the order of some operations may be changed, or some operations may be omitted, without departing from the spirit and scope of the illustrative examples described. Many of the operations shown in Figure 9 can be performed in parallel or simultaneously. One or more blocks of Figure 9, and combinations of the blocks, can be implemented by a special-purpose hardware-based computer (such as a processor) that performs the specified functions, or by a combination of special-purpose hardware and computer instructions. For example, Figure 9 illustrates the process by which the compression device determines a percentile-based pruning threshold for a layer of an artificial neural network through operations 910 to 940. In addition to the description of Figure 9 below, the descriptions of Figures 1 to 8 also apply to Figure 9 and are incorporated herein by reference. Therefore, the above description is not repeated here.

[0101] Referring to Figure 9, in operation 910, the compression device can search the multiple layers of the artificial neural network for a layer that is robust to pruning (i.e., a layer with low pruning sensitivity), based on the pruning sensitivity measured by the processing described above.

[0102] In operation 920, the compression device may prune the layer found in operation 910 and obtain the output corresponding to the inference result of the pruned artificial neural network. For example, the compression device may prune the layer found in operation 910 based on an initial pruning threshold corresponding to that layer, or an initial pruning threshold corresponding to the pruning sensitivity of that layer. The inference result of the pruned artificial neural network may correspond to, for example, the top p classes predicted by the artificial neural network for the data (e.g., the input image).

[0103] In operation 930, the compression device may determine whether the inference result of operation 920 satisfies the k:p decision criterion (hereinafter referred to as the "k:p criterion"). Here, the k:p criterion may correspond to a criterion used to obtain the degree of change in the behavior of the artificial neural network due to pruning. The k:p criterion may include a first condition that the Top-1 class predicted by the artificial neural network is the same before and after pruning, and a second condition that the combination of the top k classes predicted by the artificial neural network before pruning matches the combination of the top p classes predicted by the artificial neural network after pruning. For example, "the combination of the top k classes predicted by the artificial neural network before pruning matches the combination of the top p classes predicted by the artificial neural network after pruning" can be understood as the top k classes predicted before pruning being included in the top p classes predicted after pruning. Here, k and p may each be positive integers, and k ≤ p. For example, when it is determined in operation 930 that the inference result does not satisfy the k:p criterion, the compression device may terminate the operation.
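The two conditions of the k:p criterion translate directly into a set-inclusion check. The following is a minimal sketch; the function names and the confidence scores are illustrative assumptions, not values from the description.

```python
def top_classes(scores, n):
    """Indices of the n highest-scoring classes, best first."""
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return ranked[:n]


def satisfies_kp(scores_before, scores_after, k, p):
    """Check the k:p criterion on class scores before and after pruning."""
    if not 0 < k <= p:
        raise ValueError("k and p must be positive integers with k <= p")
    top_k_before = top_classes(scores_before, k)
    top_p_after = top_classes(scores_after, p)
    # First condition: the Top-1 class is unchanged by pruning.
    same_top1 = top_k_before[0] == top_p_after[0]
    # Second condition: the top k classes before pruning are all contained
    # in the top p classes after pruning.
    contained = set(top_k_before) <= set(top_p_after)
    return same_top1 and contained


before = [0.50, 0.30, 0.15, 0.05]          # made-up confidence scores
after_mild = [0.45, 0.20, 0.25, 0.10]      # classes 1 and 2 swap places
after_harsh = [0.30, 0.05, 0.40, 0.25]     # Top-1 class changes

print(satisfies_kp(before, after_mild, k=2, p=3))   # True
print(satisfies_kp(before, after_harsh, k=2, p=3))  # False
```

Choosing p larger than k tolerates small reorderings among the leading classes, as in the first call, while a change of the Top-1 class always fails the check.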

[0104] In some cases, the inference result may be determined in operation 930 to satisfy the k:p criterion. In this case, in operation 940, the compression device can identify that the behavior of the artificial neural network has not changed significantly (e.g., the compression device can measure that no change in the behavior has occurred) and increase the pruning threshold, for example, by 1%. The compression device can then re-perform the pruning and inference of operation 920 using the pruning threshold increased in operation 940.

[0105] The compression device determines the pruning threshold of a selected layer by performing pruning while gradually increasing the percentile-based pruning threshold corresponding to that layer, such that the behavior of the artificial neural network is preserved when the layer is pruned. Here, "the behavior of the artificial neural network is preserved" can be understood as the absence of a meaningful behavioral change, that is, a change that would alter the inference accuracy of the artificial neural network as a result of pruning.

[0106] Figure 10 is a diagram illustrating another example of a method for determining a pruning threshold by performing pruning on each layer. The operations of Figure 10 may be performed in the order and manner shown, but the order of some operations may be changed, or some operations may be omitted, without departing from the spirit and scope of the illustrative examples described. Many of the operations shown in Figure 10 can be performed in parallel or simultaneously. One or more blocks of Figure 10, and combinations of the blocks, can be implemented by a special-purpose hardware-based computer (such as a processor) that performs the specified functions, or by a combination of special-purpose hardware and computer instructions. For example, Figure 10 illustrates the process by which the compression device determines the pruning threshold for each layer through operations 1010 to 1060. In addition to the description of Figure 10 below, the descriptions of Figures 1 to 9 also apply to Figure 10 and are incorporated herein by reference. Therefore, the above description is not repeated here.

[0107] Referring to Figure 10, in operation 1010, the compression device may search the multiple layers of the artificial neural network for the layer with the lowest pruning sensitivity (e.g., layer A), based on the pruning sensitivity measured in the processing described above.

[0108] In operation 1020, the compression device may prune the layer found in operation 1010 (e.g., layer A), or a block corresponding to that layer, and obtain the inference result of the pruned artificial neural network. A layer may include multiple kernels, and may also be referred to as a block in the sense that it includes multiple kernels. The inference result of the pruned artificial neural network may correspond to, for example, the top p classes predicted by the artificial neural network for the input image.

[0109] In operation 1030, the compression device can determine whether the inference result of operation 1020 meets the k:p criterion.

[0110] For example, when the inference result is determined in operation 1030 to meet the k:p criterion, the behavior of the artificial neural network may be regarded as not having changed significantly. In this case, in operation 1040, the compression device may increase the pruning threshold of the corresponding layer (e.g., layer A). The compression device can then re-perform the pruning and inference of operation 1020 using the pruning threshold increased in operation 1040.

[0111] In some cases, during operation 1030, the inference result may be determined to not meet the k:p criterion. In this case, during operation 1050, the compression device may determine whether the pruning of all layers in the artificial neural network is complete. When it is determined in operation 1050 that the pruning of all layers is complete, the compression device may terminate the operation.

[0112] If it is determined in operation 1050 that the pruning of all layers is not complete, then in operation 1060, the compression device may search for the layer with the lowest pruning sensitivity among the remaining layers other than the layer found in operation 1010 (e.g., layer A).

[0113] The compression device can then perform the pruning and inference of operation 1020 on the layer found in operation 1060.

[0114] Through the operations described above, the compression device can sequentially select layers or blocks with low pruning sensitivity from the artificial neural network and, using the threshold newly set in operation 1040, continue pruning each selected layer or block for as long as the pruning produces no meaningful change in the behavior of the artificial neural network. The compression device can use the k:p criterion to identify the degree of behavioral change. When the pruning of one layer is terminated, the compression device can prune all layers of the artificial neural network by repeating the above-described process on the remaining layers.
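The loop of operations 1010 to 1060 can be sketched as follows. The sketch rests on assumptions that the description does not fix: `behavior_preserved(thresholds)` is a hypothetical check (for example, the k:p criterion evaluated on the generated data) applied to the network pruned with all current per-layer thresholds, layers already processed stay pruned while later layers are searched, and the last threshold that passed the check is the one kept. The toy model with `CAP`, `COST`, and `BUDGET` exists only to make the example executable.

```python
def determine_layer_thresholds(sensitivity, behavior_preserved,
                               step=1, limit=100):
    """Return {layer: percentile threshold}, visiting robust layers first."""
    thresholds = {layer: 0 for layer in sensitivity}
    # Operations 1010 and 1060: lowest pruning sensitivity first.
    for layer in sorted(sensitivity, key=sensitivity.get):
        while thresholds[layer] + step <= limit:
            trial = dict(thresholds)
            trial[layer] += step                 # operation 1040
            if not behavior_preserved(trial):    # operations 1020 and 1030
                break                            # keep the last passing value
            thresholds = trial
    return thresholds


# Toy behavior model (purely illustrative): each layer has its own cap, and
# the pruned layers jointly consume a shared "damage" budget.
CAP = {"conv1": 10, "conv2": 35, "fc": 60}
COST = {"conv1": 5, "conv2": 2, "fc": 1}
BUDGET = 120


def toy_behavior_preserved(thresholds):
    within_caps = all(thresholds[name] <= CAP[name] for name in thresholds)
    damage = sum(COST[name] * thresholds[name] for name in thresholds)
    return within_caps and damage <= BUDGET


sensitivity = {"conv1": 0.09, "conv2": 0.03, "fc": 0.02}
thresholds = determine_layer_thresholds(sensitivity, toy_behavior_preserved)
print(thresholds)
```

In the toy run the most robust layer receives the largest threshold, and the most sensitive layer is left unpruned because the earlier layers have already used up the shared budget. This illustrates why the visiting order matters when layers are pruned cumulatively.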

[0115] Figure 11 is a diagram illustrating an example of a process of applying a pruning threshold on a kernel-by-kernel basis. Figure 11 shows a method for determining the pruning threshold in the case where layer 1130 of an artificial neural network includes multiple kernels 1131 and 1133.

[0116] For example, the input image 1110 may have a size of 6×6×3, comprising three channels: red (R), green (G), and blue (B). Furthermore, kernels 1131 and 1133 may each have a size of 3×3×3. Kernels 1131 and 1133 can be filters used to find features of the input image 1110. In this case, the compression device can generate an output feature map of size 4×4×2 by performing a convolution operation with each kernel while moving kernels 1131 and 1133 over the input image 1110 at preset intervals.
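The 4×4×2 output size follows from the standard convolution size formula. The sketch below assumes a stride of 1 and no padding, which is consistent with the sizes given above but is not stated in the description; the function name is illustrative.

```python
def conv_output_shape(input_shape, kernel_shape, num_kernels, stride=1):
    """Output (height, width, channels) of an unpadded convolution."""
    in_h, in_w, in_c = input_shape
    k_h, k_w, k_c = kernel_shape
    if in_c != k_c:
        raise ValueError("kernel depth must match the input channel count")
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    # Each kernel produces one output channel.
    return (out_h, out_w, num_kernels)


shape = conv_output_shape((6, 6, 3), (3, 3, 3), num_kernels=2)
print(shape)  # (4, 4, 2)
```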

[0117] Layer 1130 of the artificial neural network may include, for example, multiple kernels 1131 and 1133, as shown in Figure 11. In this case, the kernels 1131 and 1133 included in the same layer 1130 can share a percentile-based pruning threshold. To compress the artificial neural network, the compression device can prune each of the kernels 1131 and 1133 included in layer 1130 by applying to each kernel a magnitude-based pruning threshold derived from the percentile-based pruning threshold of the layer.

[0118] For example, a pruning threshold of 30% can be shared within layer 1130 of the artificial neural network. In this example, kernels 1131 and 1133 of layer 1130 can be pruned using different thresholds that depend on the magnitudes of the weights included in each kernel. For example, the weights corresponding to the lower 30% of the weights included in the first kernel (e.g., kernel 1131) can be pruned, and the weights corresponding to the lower 30% of the weights included in the second kernel (e.g., kernel 1133) can also be pruned. In this example, the magnitude-based threshold that separates the lower 30% of the weights in the first kernel 1131 can differ from the magnitude-based threshold that separates the lower 30% of the weights in the second kernel 1133.
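The relationship between the shared 30% percentile and the per-kernel magnitude cutoffs can be sketched as below. The kernel contents are invented flat lists of ten weights (a real 3×3×3 kernel would hold 27), and the helper names are hypothetical.

```python
def magnitude_cutoff(kernel, layer_percentile):
    """Largest magnitude that falls inside the lowest `layer_percentile` percent.

    Returns None when the percentile selects no weight at all.
    """
    magnitudes = sorted(abs(w) for w in kernel)
    n_prune = int(len(magnitudes) * layer_percentile / 100)
    return magnitudes[n_prune - 1] if n_prune > 0 else None


def prune_kernel(kernel, layer_percentile):
    """Zero every weight whose magnitude is at or below the kernel's cutoff."""
    cutoff = magnitude_cutoff(kernel, layer_percentile)
    if cutoff is None:
        return list(kernel)
    return [0.0 if abs(w) <= cutoff else w for w in kernel]


# Made-up weights: the second kernel is roughly ten times smaller in scale.
kernel_1131 = [0.9, -0.1, 0.5, 0.2, -0.7, 0.3, 0.8, -0.4, 0.6, 1.0]
kernel_1133 = [0.09, -0.01, 0.05, 0.02, -0.07, 0.03, 0.08, -0.04, 0.06, 0.10]

LAYER_PERCENTILE = 30  # shared by every kernel in the layer
cutoff_1131 = magnitude_cutoff(kernel_1131, LAYER_PERCENTILE)
cutoff_1133 = magnitude_cutoff(kernel_1133, LAYER_PERCENTILE)
pruned_1131 = prune_kernel(kernel_1131, LAYER_PERCENTILE)
pruned_1133 = prune_kernel(kernel_1133, LAYER_PERCENTILE)
print(cutoff_1131, cutoff_1133)
```

Both kernels lose the same fraction of their weights even though their cutoffs differ by an order of magnitude. A single magnitude threshold applied to the whole layer would instead remove far more weights from the small-scale kernel than from the other.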

[0119] The compression method described with reference to Figures 1 to 11 can also be applied in substantially the same manner to cases where quantization is performed while maintaining the accuracy of the artificial neural network. For example, the degree of quantization for each layer can be controlled based on the number of quantization bits. Furthermore, quantization can be applied to layers in descending order of robustness to quantization. In this example, the k:p criterion can be used to measure the sensitivity of each layer to quantization.

[0120] The examples described with reference to Figures 1 to 11 employ a data distillation technique to generate the data for acquiring behavioral changes in an artificial neural network. In another example, a generative adversarial network (GAN) or an autoencoder could be used to generate the data for acquiring behavioral changes in an artificial neural network.

[0121] For example, an autoencoder may include an encoder and a decoder. The decoder of the autoencoder can be used to receive the output of the encoder and recover the input of the encoder. After applying a pre-trained artificial neural network to the encoder, the decoder of the autoencoder can be trained. Subsequently, using the decoder of the autoencoder, data for obtaining changes in the behavior of the artificial neural network can be generated. Furthermore, when training the decoder of the autoencoder, a discriminator included in the GAN structure can be attached.

[0122] Figure 12 is a diagram illustrating an example of a device 1200 for compressing artificial neural networks. Referring to Figure 12, the device for compressing artificial neural networks (hereinafter referred to as the "compression device") 1200 includes a communication interface 1210, a processor 1230, a memory 1250, and an output/input interface 1260. The communication interface 1210, processor 1230, memory 1250, and output/input interface 1260 can be connected via a communication bus 1205.

[0123] Communication interface 1210 acquires the weights corresponding to the pre-trained artificial neural network. The artificial neural network consists of multiple layers.

[0124] Processor 1230 generates data for acquiring changes in the behavior of the artificial neural network caused by pruning, based on weights obtained through communication interface 1210. Processor 1230 determines a pruning threshold for pruning the artificial neural network based on the behavioral changes of the artificial neural network acquired via the data. Processor 1230 compresses the neural network based on the pruning threshold.

[0125] Additionally, the processor 1230 can execute at least one of the methods described with reference to Figures 1 to 11, or an algorithm corresponding to at least one of these methods. The processor 1230 may be a hardware-implemented compression device having circuitry physically structured to perform desired operations. For example, the desired operations may include code or instructions included in a program. Hardware-implemented compression devices include, but are not limited to, microprocessors, central processing units (CPUs), graphics processing units (GPUs), processor cores, multi-core processors, multiprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and neural processing units (NPUs). Further details regarding the processor 1230 are provided below.

[0126] The processor 1230 can execute programs and control the compression device 1200. The code of the program executed by the processor 1230 is stored in the memory 1250.

[0127] The memory 1250 may store weights, for example, obtained through the communication interface 1210. In this example, the weights may correspond to the parameters of an artificial neural network comprising multiple layers.

[0128] Furthermore, memory 1250 may store data generated in processor 1230 and / or pruning thresholds determined in processor 1230. Additionally, memory 1250 may store information about the artificial neural network compressed by processor 1230. Information about the compressed artificial neural network may include, for example, information about pruned layers.

[0129] Thus, memory 1250 can store various information generated during the processing operations of processor 1230 as described above. Additionally, memory 1250 can store various data and programs. Memory 1250 may include volatile memory or non-volatile memory. Memory 1250 may include a high-capacity storage medium (such as a hard disk) to store various data.

[0130] The device 1200 for compressing an artificial neural network may further include an output/input interface 1260 configured to output the result of processing by the artificial neural network or to receive input. In one example, the output/input interface 1260 may receive user input. An input device may detect input from, for example, a keyboard, a mouse, a touchscreen, a microphone, or a user, and may include any other device configured to transmit the detected input. An output device may provide output to the user through a visual, auditory, or tactile channel. Output devices may include, for example, displays such as computer monitors, smartphones, smart televisions (TVs), tablet computers, head-up displays (HUDs), three-dimensional (3D) digital information displays (DIDs), 3D mobile devices, and smart vehicle displays, advanced driver assistance systems (ADAS), and eye glass displays (EGDs) that are operatively connected to the device 1200 for compressing an artificial neural network, any of which may be used without departing from the spirit and scope of the described illustrative examples.

[0131] The compression device 1200 can correspond to devices in various fields (such as advanced driver assistance systems (ADAS), head-up display (HUD) devices, three-dimensional (3D) digital information displays (DID), navigation devices, neuromorphic devices, 3D mobile devices, smartphones, smart TVs, smart vehicles, Internet of Things (IoT) devices, medical devices, and measuring devices). 3D mobile devices can be understood in the sense of display devices for, for example, augmented reality (AR), virtual reality (VR), and / or mixed reality (MR), head-mounted displays (HMDs), and face-mounted displays (FMDs).

[0132] The device 1200 for compressing artificial neural networks, as well as other devices, apparatuses, units, modules, and components described herein, are implemented by hardware components. Examples of hardware components that can be used to perform the operations described herein include, where appropriate, controllers, sensors, generators, drivers, memories, comparators, arithmetic logic units, adders, subtractors, multipliers, dividers, integrators, and any other electronic components configured to perform the operations described herein. In other examples, one or more of the hardware components performing the operations described herein are implemented by computing hardware (e.g., by one or more processors or computers). The processor or computer may be implemented by one or more processing elements, such as logic gate arrays, controllers and arithmetic logic units, digital signal processors, microcomputers, programmable logic controllers, field-programmable gate arrays, programmable logic arrays, microprocessors, or any other means or combination of means configured to respond to and execute instructions in a defined manner to achieve a desired result. In one example, the processor or computer includes or is connected to one or more memories storing instructions or software executed by the processor or computer. Hardware components implemented by a processor or computer can execute instructions or software (such as an operating system (OS) and one or more software applications running on the OS) for performing the operations described in this application. Hardware components can also access, manipulate, process, create, and store data in response to the execution of instructions or software. 
For simplicity, the singular terms "processor" or "computer" may be used in the description of the examples described in this application; however, in other examples, multiple processors or computers may be used, or a processor or computer may include multiple processing elements or multiple types of processing elements or both. For example, a single hardware component or two or more hardware components may be implemented by a single processor, or two or more processors, or a processor and a controller. One or more hardware components may be implemented by one or more processors, or a processor and a controller, and one or more other hardware components may be implemented by one or more other processors, or additional processors and additional controllers. One or more processors, or a processor and a controller, may implement a single hardware component or two or more hardware components. A hardware component may have any one or more of a number of different processing configurations. Examples of different processing configurations include: a single processor, independent processors, parallel processors, single instruction single data (SISD) multiprocessing, single instruction multiple data (SIMD) multiprocessing, multiple instruction single data (MISD) multiprocessing, multiple instruction multiple data (MIMD) multiprocessing, a controller and arithmetic logic unit (ALU), a digital signal processor (DSP), a microcomputer, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a programmable logic unit (PLU), a central processing unit (CPU), a graphics processing unit (GPU), a neural processing unit (NPU), or any other device capable of responding to and executing instructions in a defined manner.

[0133] The methods for performing the operations described in this application are executed by computing hardware (e.g., by one or more processors or a computer), which is implemented to execute instructions or software as described above to perform the operations performed by the methods described in this application. For example, a single operation or two or more operations may be performed by a single processor, or two or more processors, or a processor and a controller. One or more operations may be performed by one or more processors, or a processor and a controller, and one or more other operations may be performed by one or more other processors, or additional processors and additional controllers. One or more processors, or a processor and a controller, may perform a single operation or two or more operations.

[0134] Instructions or software for controlling a processor or computer to implement hardware components and perform the methods described above are written as computer programs, code segments, instructions, or any combination thereof to individually or collectively instruct or configure the processor or computer to operate as a machine or special-purpose computer that performs the operations performed by the hardware components and methods described above. In one example, the instructions or software include machine code (such as machine code generated by a compiler) that is directly executed by the processor or computer. In one example, the instructions or software include at least one of an applet, a dynamic link library (DLL), middleware, firmware, a device driver, or an application program storing the method for compressing an artificial neural network. In another example, the instructions or software include high-level code that is executed by the processor or computer using an interpreter. Programmers of ordinary skill in the art can readily write the instructions or software based on the block diagrams and flowcharts shown in the accompanying drawings and the corresponding descriptions in the specification, which disclose algorithms for performing the operations performed by the hardware components and methods described above.

[0135] Instructions or software for controlling a processor or computer to implement hardware components and perform the methods described above, along with any associated data, data files, and data structures, are recorded, stored, or fixed on one or more non-transitory computer-readable storage media. Examples of non-transitory computer-readable storage media include: read-only memory (ROM), random access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), flash memory, card storage (such as multimedia cards or microcards (e.g., Secure Digital (SD) or Extreme Digital (XD))), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid-state drive, and any other device configured to store instructions or software and any associated data, data files and data structures in a non-transitory manner and to provide instructions or software and any associated data, data files and data structures to a processor or computer such that the processor or computer can execute the instructions.

[0136] While this disclosure includes specific examples, it will be clear upon understanding this disclosure that various changes in form and detail may be made to these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein should be considered descriptive only and not for limiting purposes. The description of features or aspects in each example should be considered applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in the described system, architecture, apparatus, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.

[0137] Therefore, the scope of the disclosure is not limited by the specific embodiments, but by the claims and their equivalents, and all variations within the scope of the claims and their equivalents should be interpreted as included in the disclosure.

Claims

1. A method for training an artificial neural network for image recognition, the method comprising: obtaining a first input image and a second input image; pre-training a neural network based on the first input image to generate a pre-trained artificial neural network; obtaining weights corresponding to the pre-trained artificial neural network, wherein the artificial neural network comprises a plurality of layers; generating, based on the weights and the second input image, first data for obtaining a change in the behavior of the artificial neural network caused by pruning the artificial neural network; determining, using the first data and based on the change in the behavior of the artificial neural network, a pruning threshold for pruning the artificial neural network; and compressing the artificial neural network based on the pruning threshold, wherein the step of generating the first data comprises: obtaining, using the weights, a result of the artificial neural network predicting a class of the second input image; when the class predicted by the artificial neural network is a target class among a plurality of classes, determining the second input image to be the first data corresponding to the target class; and when the class predicted by the artificial neural network is not the target class, repeatedly correcting the second input image by the artificial neural network until the class predicted by the artificial neural network is the target class, and determining the corrected second input image to be the first data corresponding to the target class, wherein the step of generating the first data further comprises generating first data corresponding to each of the plurality of classes, and wherein the step of determining the pruning threshold comprises determining, using the first data corresponding to each of the plurality of classes, a pruning threshold corresponding to each of the plurality of classes.

2. The method according to claim 1, wherein the step of determining the pruning threshold further comprises: measuring, for each of the plurality of layers, a pruning sensitivity indicating the degree to which the behavior of the artificial neural network changes when the corresponding layer is pruned; and determining the pruning threshold by pruning each of the plurality of layers based on the pruning sensitivity measured for that layer.

3. The method according to claim 2, wherein the step of measuring the pruning sensitivity comprises: for each of the plurality of layers, in response to the behavior of the artificial neural network being maintained when the corresponding layer is pruned, measuring the pruning sensitivity by gradually increasing a percentile-based pruning threshold for the corresponding layer.
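The measurement recited in claim 3 can be illustrated with a minimal Python sketch. It assumes a toy single-layer linear classifier and a top-1 comparison as the behavior check; the names `prune_layer`, `predict`, and `max_tolerated_percentile`, the weight values, and the 25-point step are hypothetical and do not come from the specification. The sketch reports the largest percentile at which the prediction is unchanged, so a layer that tolerates only a small percentile would be treated as more sensitive.

```python
def prune_layer(layer, percentile):
    """Zero the smallest-magnitude `percentile` percent of a layer's weights."""
    flat = [w for row in layer for w in row]
    n_remove = int(len(flat) * percentile / 100)
    order = sorted(range(len(flat)), key=lambda i: abs(flat[i]))
    for i in order[:n_remove]:
        flat[i] = 0.0
    width = len(layer[0])
    return [flat[r * width:(r + 1) * width] for r in range(len(layer))]


def predict(layer, x):
    """Return the index of the highest-scoring class for input x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in layer]
    return max(range(len(scores)), key=lambda c: scores[c])


def max_tolerated_percentile(layer, x, step=25):
    """Raise the percentile step by step while the prediction is unchanged."""
    reference = predict(layer, x)
    tolerated = 0
    percentile = step
    while percentile <= 100 and predict(prune_layer(layer, percentile), x) == reference:
        tolerated = percentile
        percentile += step
    return tolerated


# Hypothetical toy weights (2 classes x 4 inputs) and a single test input.
LAYER = [[0.7, 0.01, 0.02, 0.03],
         [0.2, 0.25, 0.3, 0.15]]
SAMPLE = [1.0, 1.0, 1.0, 1.0]
print(max_tolerated_percentile(LAYER, SAMPLE))  # prints 50
```

In a network with several layers, the same loop would be run once per layer while the other layers are left unpruned.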

4. The method according to claim 2, wherein the pruning sensitivity is determined based on at least one of a distribution of the weights corresponding to the plurality of layers and a form of the connections between the plurality of layers.

5. The method according to claim 2, wherein the step of determining the pruning threshold further comprises: selecting a layer from the plurality of layers in ascending order of pruning sensitivity; determining a percentile-based pruning threshold corresponding to the selected layer such that the behavior of the artificial neural network is preserved when the selected layer is pruned; and repeating the steps of selecting a layer and determining the percentile-based pruning threshold for each of the remaining layers of the plurality of layers other than the selected layer.
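The layer-by-layer procedure of claim 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: `determine_thresholds`, the sensitivity values, the layer names, and the stand-in `toy_behavior_kept` check are all hypothetical, and a real check would run the pruned network on the first data and compare its output with that of the unpruned network.

```python
def determine_thresholds(sensitivity, behavior_kept, step=10, limit=100):
    """Visit layers from least to most sensitive and raise each layer's
    percentile threshold while the behavior check still passes."""
    thresholds = {layer: 0 for layer in sensitivity}
    for layer in sorted(sensitivity, key=sensitivity.get):
        while thresholds[layer] + step <= limit:
            trial = dict(thresholds)
            trial[layer] += step
            if not behavior_kept(trial):
                break
            thresholds = trial
    return thresholds


def toy_behavior_kept(thresholds):
    """Hypothetical stand-in for evaluating the pruned network."""
    return (thresholds["conv1"] <= 20
            and thresholds["fc"] <= 60
            and thresholds["conv1"] + thresholds["fc"] <= 70)


# Hypothetical measured sensitivities: "fc" is less sensitive than "conv1".
SENSITIVITY = {"conv1": 0.9, "fc": 0.2}
print(determine_thresholds(SENSITIVITY, toy_behavior_kept))
```

Because thresholds chosen for earlier layers stay in place while later layers are examined, the least sensitive layer receives the largest share of the pruning in this sketch.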

6. The method according to claim 2, wherein the step of determining the pruning threshold further comprises: selecting a layer from the plurality of layers in ascending order of pruning sensitivity; pruning the selected layer to determine a percentile-based pruning threshold; and determining the percentile-based pruning threshold in response to the top k classes predicted by the artificial neural network before pruning being included in the top p classes predicted by the artificial neural network after pruning, where k and p are each positive integers and k ≤ p.

7. The method according to claim 6, wherein the step of determining the percentile-based pruning threshold comprises: in response to the behavior of the artificial neural network being maintained when the selected layer is pruned, increasing the percentile-based pruning threshold by a set interval.

8. The method according to any one of claims 1 to 7, wherein the change in the behavior of the artificial neural network is measured based on whether the outputs of the artificial neural network before and after pruning satisfy a decision criterion.

9. The method according to claim 8, wherein the decision criterion comprises a condition that the top p classes predicted by the pruned artificial neural network include the top k classes predicted by the unpruned artificial neural network, where k and p are each positive integers and k ≤ p, and wherein, when the outputs satisfy the decision criterion, it is determined that no change in the behavior of the artificial neural network has occurred.
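The decision criterion of claim 9 can be expressed as a short Python sketch. The function names `top_classes` and `behavior_preserved` and the score vectors are hypothetical, chosen only to illustrate the top-k within top-p comparison.

```python
def top_classes(scores, n):
    """Return the indices of the n highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]


def behavior_preserved(scores_before, scores_after, k=1, p=1):
    """True when every top-k class before pruning is among the
    top-p classes after pruning (k <= p)."""
    if not 1 <= k <= p:
        raise ValueError("k and p must be positive integers with k <= p")
    before = set(top_classes(scores_before, k))
    after = set(top_classes(scores_after, p))
    return before <= after


# Hypothetical class scores for one input, before and after pruning.
unpruned = [0.1, 0.6, 0.3]
pruned_ok = [0.2, 0.5, 0.3]
pruned_bad = [0.5, 0.1, 0.4]
print(behavior_preserved(unpruned, pruned_ok, k=1, p=1))   # prints True
print(behavior_preserved(unpruned, pruned_bad, k=1, p=2))  # prints False
```

Choosing p larger than k loosens the criterion, since the original top classes only need to remain somewhere among the top p after pruning.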

10. The method according to any one of claims 1 to 7, wherein the pruning threshold comprises a percentile-based pruning threshold for each of the plurality of layers, and the step of compressing the artificial neural network comprises: applying, to each of a plurality of kernels in each of the plurality of layers, a size-based pruning threshold based on the percentile-based pruning threshold of the corresponding layer.

11. The method according to claim 10, wherein the step of compressing the artificial neural network comprises removing a percentage of the weights of the artificial neural network, the percentage corresponding to the percentile-based pruning threshold.
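One possible reading of claims 10 and 11 is sketched below in Python: each kernel in a layer loses the fraction of its smallest-magnitude weights given by that layer's percentile. This is an assumption made for illustration rather than the claimed implementation, and the names `prune_kernel`, `compress`, `LAYERS`, and `PERCENTILES`, as well as the weight values, are hypothetical.

```python
def prune_kernel(kernel, percentile):
    """Zero the smallest-magnitude `percentile` percent of one kernel."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    n_remove = int(len(kernel) * percentile / 100)
    order = sorted(range(len(kernel)), key=lambda i: abs(kernel[i]))
    pruned = list(kernel)
    for i in order[:n_remove]:
        pruned[i] = 0.0
    return pruned


def compress(layers, layer_percentiles):
    """Apply each layer's percentile to every kernel in that layer."""
    return {
        name: [prune_kernel(kernel, layer_percentiles[name]) for kernel in kernels]
        for name, kernels in layers.items()
    }


# Hypothetical layers, each a list of flattened kernels.
LAYERS = {
    "conv1": [[0.5, -0.1, 0.05, -0.8], [0.3, 0.02, -0.6, 0.01]],
    "fc": [[0.9, -0.2, 0.1, 0.4]],
}
PERCENTILES = {"conv1": 50, "fc": 25}
compressed = compress(LAYERS, PERCENTILES)
print(compressed["conv1"])  # prints [[0.5, 0.0, 0.0, -0.8], [0.3, 0.0, -0.6, 0.0]]
```

Pruning by rank, as done here, removes exactly the requested share of weights per kernel even when several weights have the same magnitude.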

12. The method according to any one of claims 1 to 7, wherein the second input image includes a random noise image.

13. The method according to any one of claims 1 to 7, wherein the step of generating the first data further comprises: backpropagating a cross-entropy loss between a one-hot vector corresponding to the target class and the class predicted by the artificial neural network.

14. The method according to claim 13, wherein the weights are fixed and do not change during the backpropagation of the cross-entropy loss.
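The data generation described in claims 1 and 12 to 14 can be illustrated with a minimal Python sketch. It assumes a toy linear softmax classifier in place of a real image recognition network, and the names `generate_class_data`, `WEIGHTS`, the learning rate, and the step limit are hypothetical. The weights are never updated; only the input is corrected along the gradient of the cross-entropy loss until the predicted class equals the target class.

```python
import math
import random


def logits_of(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    scores = logits_of(weights, x)
    return max(range(len(scores)), key=lambda c: scores[c])


def generate_class_data(weights, x, target, lr=0.5, max_steps=1000):
    """Correct the input (not the weights) until `target` is predicted."""
    x = list(x)
    for _ in range(max_steps):
        if predict(weights, x) == target:
            return x
        probs = softmax(logits_of(weights, x))
        # Gradient of the cross-entropy loss w.r.t. each logit: p_c - onehot_c.
        delta = [p - (1.0 if c == target else 0.0) for c, p in enumerate(probs)]
        # Backpropagate to the input: dL/dx_i = sum_c delta_c * W[c][i].
        grad = [sum(delta[c] * weights[c][i] for c in range(len(weights)))
                for i in range(len(x))]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    raise RuntimeError("target class was not reached within max_steps")


# Hypothetical fixed weights (3 classes x 3 inputs) and a random noise input.
WEIGHTS = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(3)]
per_class_data = {c: generate_class_data(WEIGHTS, noise, c) for c in range(3)}
print({c: predict(WEIGHTS, data) for c, data in per_class_data.items()})
```

The loop is repeated once per class, which yields one data item for each class as recited in claim 1.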

15. A method for compressing an artificial neural network for image recognition, implemented by a computer, the method comprising: obtaining an input image; obtaining weights corresponding to a pre-trained artificial neural network, wherein the artificial neural network comprises a plurality of layers; generating, based on the weights and the input image, first data for obtaining a change in the behavior of the artificial neural network caused by pruning the artificial neural network; determining, using the first data and based on the change in the behavior of the artificial neural network, a pruning threshold for pruning the artificial neural network; and compressing the artificial neural network based on the pruning threshold, wherein the step of generating the first data comprises: obtaining, using the weights, a result of the artificial neural network predicting a class of the input image; when the class predicted by the artificial neural network is a target class among a plurality of classes, determining the input image to be the first data corresponding to the target class; and when the class predicted by the artificial neural network is not the target class, repeatedly correcting the input image by the artificial neural network until the class predicted by the artificial neural network is the target class, and determining the corrected input image to be the first data corresponding to the target class, wherein the step of generating the first data further comprises generating first data corresponding to each of the plurality of classes, and wherein the step of determining the pruning threshold comprises determining, using the first data corresponding to each of the plurality of classes, a pruning threshold corresponding to each of the plurality of classes.

16. A non-transitory computer-readable storage medium storing instructions, which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 15.

17. An apparatus for compressing an artificial neural network for image recognition, the apparatus comprising: a communication interface configured to acquire an input image and to acquire weights corresponding to a pre-trained artificial neural network, wherein the artificial neural network comprises a plurality of layers; and a processor configured to: generate, based on the weights and the input image, first data for acquiring a change in the behavior of the artificial neural network caused by pruning; determine, using the first data and based on the change in the behavior of the artificial neural network, a pruning threshold for pruning the artificial neural network; and compress the artificial neural network based on the pruning threshold, wherein the processor is further configured to: obtain, using the weights, a result of the artificial neural network predicting a class of the input image; when the class predicted by the artificial neural network is a target class among a plurality of classes, determine the input image to be the first data corresponding to the target class; and when the class predicted by the artificial neural network is not the target class, repeatedly correct the input image by the artificial neural network until the class predicted by the artificial neural network is the target class, and determine the corrected input image to be the first data corresponding to the target class, and wherein the processor is further configured to: generate first data corresponding to each of the plurality of classes; and determine, using the first data corresponding to each of the plurality of classes, a pruning threshold corresponding to each of the plurality of classes.

18. The apparatus according to claim 17, wherein the processor is further configured to: measure, for each of the plurality of layers, a pruning sensitivity indicating the degree to which the behavior of the artificial neural network changes when the corresponding layer is pruned; and determine the pruning threshold by pruning each of the plurality of layers based on the pruning sensitivity measured for that layer.

19. The apparatus according to claim 18, wherein the processor is further configured to: for each of the plurality of layers, in response to the behavior of the artificial neural network being maintained when the corresponding layer is pruned, measure the pruning sensitivity by gradually increasing a percentile-based pruning threshold for the corresponding layer.

20. The apparatus according to claim 18, wherein the pruning sensitivity is determined based on at least one of a distribution of the weights corresponding to the plurality of layers and a form of the connections between the plurality of layers.

21. The apparatus according to claim 18, wherein the processor is further configured to: select a layer from the plurality of layers in ascending order of pruning sensitivity; determine a percentile-based pruning threshold corresponding to the selected layer such that the behavior of the artificial neural network is maintained when the selected layer is pruned; and repeat the layer selection and the percentile-based pruning threshold determination for each of the remaining layers of the plurality of layers other than the selected layer.

22. The apparatus according to any one of claims 17 to 21, wherein the change in the behavior of the artificial neural network is measured based on whether the outputs of the artificial neural network before and after pruning satisfy a decision criterion.

23. The apparatus according to claim 22, wherein the decision criterion comprises a condition that the top p classes predicted by the pruned artificial neural network include the top k classes predicted by the unpruned artificial neural network, where k and p are each positive integers and k ≤ p, and wherein, when the outputs satisfy the decision criterion, it is determined that no change in the behavior of the artificial neural network has occurred.

24. The apparatus according to any one of claims 17 to 21, wherein the pruning threshold comprises a percentile-based pruning threshold for each of the plurality of layers, and the processor is further configured to apply, to each of a plurality of kernels in each of the plurality of layers, a size-based pruning threshold based on the percentile-based pruning threshold of the corresponding layer.

25. The apparatus according to any one of claims 17 to 21, wherein the input image includes a random noise image.

26. The apparatus according to any one of claims 17 to 21, wherein the processor is further configured to backpropagate a cross-entropy loss between a one-hot vector corresponding to the target class and the class predicted by the artificial neural network.

27. The apparatus according to claim 26, wherein the weights are fixed and do not change during the backpropagation of the cross-entropy loss.

28. The apparatus according to claim 17, wherein the apparatus includes at least one of the following: an advanced driver assistance system, a head-up display, a 3D digital information display, a navigation device, a neuromorphic device, a 3D mobile device, a smartphone, a smart TV, a smart vehicle, an Internet of Things device, a medical device, and a measuring device.