Adaptive fine-tuning method, apparatus, device, and storage medium for convolutional neural networks
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- XINQIAO (BEIJING) SEMICONDUCTOR CO LTD
- Filing Date
- 2023-02-20
- Publication Date
- 2026-06-19
[Abstract drawing: Figure CN116108893B_ABST]
Abstract
Description
Technical Field
[0001] This invention relates to the field of model fine-tuning technology, and in particular to an adaptive fine-tuning method, apparatus, device, and storage medium for convolutional neural networks.
Background Technology
[0002] Model fine-tuning is a crucial step in the model deployment process. Neural network models trained on large amounts of data typically exhibit high accuracy. However, in many application scenarios, it is difficult to obtain sufficient data. Therefore, a common model deployment approach is to start with a model pre-trained on a large amount of public data (i.e., source data), and then collect a small amount of real-world application data (i.e., target data) to fine-tune the model parameters in order to quickly obtain a high-accuracy model.
[0003] For image classification tasks, existing convolutional neural network fine-tuning schemes typically fix the model parameters of the feature extraction layers and update only the parameters of the final fully connected layer. However, a growing body of research indicates that adjusting only the parameters of the fully connected layer cannot achieve high accuracy on target domain data.
[0004] To address this issue, the prior art offers a convolutional neural network fine-tuning scheme in which a convolutional neural network for image classification is pre-trained on a source domain image dataset; the target domain image dataset is input into the network; for each target domain image, a standard value is calculated for each layer; and the model parameters of each layer are fine-tuned based on that layer's standard value, so that the parameters of particular layers in the model can be adjusted in a targeted way.
[0005] However, existing technologies do not take into account the relationships between different layers in convolutional neural networks, which can lead to insufficient updates in some layers and affect the accuracy of convolutional neural network models.
Summary of the Invention
[0006] This invention provides an adaptive fine-tuning method, apparatus, device, and storage medium for convolutional neural networks, which addresses the shortcomings of existing technologies that do not consider the relationships between different layers in a convolutional neural network, leading to insufficient update amounts for some layers and affecting the model accuracy of the convolutional neural network. The invention aims to fully consider the relationships between different layers in a convolutional neural network and improve the model accuracy of the convolutional neural network.
[0007] This invention provides an adaptive fine-tuning method for convolutional neural networks, comprising:
[0008] Obtain a convolutional neural network for image classification;
[0009] Traverse the layers of the convolutional neural network in reverse order and divide them into multiple blocks, at least one of which comprises multiple adjacent related layers;
[0010] Input the target domain image dataset into the convolutional neural network; for each target domain image, calculate a standard value for each block, and fine-tune the model parameters of the convolutional neural network based on the standard value of each block.
[0011] According to the adaptive fine-tuning method for a convolutional neural network provided by the present invention, the step of inputting the target domain image dataset into the convolutional neural network, calculating a standard value of each block for each target domain image, and fine-tuning the model parameters of the convolutional neural network based on the standard value of each block comprises:
[0012] The target domain image dataset is input into the convolutional neural network;
[0013] For each target domain image, forward propagation and backward propagation are performed to obtain the gradients of the model parameters of each block of the convolutional neural network;
[0014] The standard value of each block is obtained by calculating the ratio between the L2 norm of the gradient of the model parameters of the block and the L2 norm of the model parameters of the block;
[0015] Based on the standard value of each block, adjust the gradient of the model parameters of the block;
[0016] The optimizer fine-tunes the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0017] According to the adaptive fine-tuning method for a convolutional neural network provided by the present invention, the step of inputting the target domain image dataset into the convolutional neural network, calculating a standard value of each block for each target domain image, and fine-tuning the model parameters of the convolutional neural network based on the standard value of each block comprises:
[0018] During the first training cycle, the target domain image dataset is input into the convolutional neural network;
[0019] For each target domain image, forward propagation and backward propagation are performed to obtain the gradients of the model parameters of each block of the convolutional neural network;
[0020] The standard value of each block is obtained by calculating the ratio between the L2 norm of the gradient of the model parameters of the block and the L2 norm of the model parameters of the block;
[0021] The standard values of each block obtained during the first training cycle are collected, and their average is calculated to obtain the average standard value of each block;
[0022] In subsequent training cycles, the target domain image dataset is input into the convolutional neural network again, and for each target domain image, the model parameters of the convolutional neural network are fine-tuned based on the average standard value of each block.
[0023] According to the adaptive fine-tuning method for a convolutional neural network provided by the present invention, in the first training cycle, after the standard value of each block is obtained and before the average of the standard values of each block obtained in the first training cycle is calculated, the method further includes:
[0024] Based on the standard value of each block, adjust the gradient of the model parameters of the block;
[0025] The optimizer fine-tunes the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0026] According to the adaptive fine-tuning method for a convolutional neural network provided by the present invention, the step of inputting the target domain image dataset into the convolutional neural network, calculating a standard value of each block for each target domain image, and fine-tuning the model parameters of the convolutional neural network based on the standard value of each block comprises:
[0027] The target domain image dataset is input into the convolutional neural network;
[0028] For each target domain image, forward propagation and backward propagation are performed to obtain the gradients of the model parameters of each block of the convolutional neural network;
[0029] Calculate the product of the L2 norm of the model parameters of each block and the historical value of the first variable, calculate the ratio of the L2 norm of the gradient of the model parameters of the block to this product, and take the minimum of this ratio and a preset value as the standard value of the block;
[0030] The current value of the second variable is updated to the maximum of the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block, and the historical value of the second variable;
[0031] Based on the standard value of each block, adjust the gradient of the model parameters of the block;
[0032] The optimizer is used to fine-tune the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0033] Update the current value of the first variable based on the current value of the second variable.
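The per-block computation in [0029] to [0033] can be illustrated with a minimal Python sketch. It assumes that the first and second variables are single scalars shared by all blocks, that [0033] simply copies the second variable into the first, and that the initial values (`init_first`, second variable starting at 0.0) and the cap `preset` are hypothetical choices; none of these details is fixed by the text above. Under these assumptions, each block's standard value depends only on values stored from earlier iterations, so a block's gradient can be scaled as soon as it is available.

```python
class HistoryNormalizer:
    """Hypothetical helper for the history normalization mode.

    first  -> the "first variable" (read when computing standard values)
    second -> the "second variable" (running maximum of the raw ratio)
    """

    def __init__(self, preset: float = 1.0, init_first: float = 1.0) -> None:
        self.preset = preset        # assumed cap on the standard value
        self.first = init_first     # assumed initial value of the first variable
        self.second = 0.0           # assumed initial value of the second variable

    def standard_value(self, weight_norm: float, grad_norm: float) -> float:
        """Standard value of one block from its two L2 norms ([0029]-[0030])."""
        eta = min(grad_norm / (weight_norm * self.first), self.preset)
        # Running maximum of the raw ratio ||g|| / ||theta||.
        self.second = max(grad_norm / weight_norm, self.second)
        return eta

    def end_iteration(self) -> None:
        """After the optimizer step: first variable <- second variable ([0033])."""
        self.first = self.second


hn = HistoryNormalizer()
# Two blocks with raw ratios 0.1 and 0.5, seen in two consecutive iterations.
step1 = [hn.standard_value(5.0, 0.5), hn.standard_value(1.0, 0.5)]  # about [0.1, 0.5]
hn.end_iteration()
step2 = [hn.standard_value(5.0, 0.5), hn.standard_value(1.0, 0.5)]  # about [0.2, 1.0]
```

In the second iteration the ratios are divided by the previous iteration's maximum (0.5), which plays the role of the baseline's $\eta_{\max}$ without waiting for all blocks of the current iteration.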
[0034] According to the adaptive fine-tuning method for a convolutional neural network provided by the present invention, the step of inputting the target domain image dataset into the convolutional neural network, calculating a standard value of each block for each target domain image, and fine-tuning the model parameters of the convolutional neural network based on the standard value of each block comprises:
[0035] During the first training cycle, the target domain image dataset is input into the convolutional neural network;
[0036] For each target domain image, forward propagation and backward propagation are performed to obtain the gradients of the model parameters of each block of the convolutional neural network;
[0037] Calculate the product of the L2 norm of the model parameters of each block and the historical value of the first variable, calculate the ratio of the L2 norm of the gradient of the model parameters of the block to this product, and take the minimum of this ratio and a preset value as the standard value of the block;
[0038] The current value of the second variable is updated to the maximum of the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block, and the historical value of the second variable;
[0039] The standard values of each block obtained during the first training cycle are collected, and their average is calculated to obtain the average standard value of each block;
[0040] In subsequent training cycles, the target domain image dataset is input into the convolutional neural network again, and for each target domain image, the model parameters of the convolutional neural network are fine-tuned based on the average standard value of each block.
[0041] According to the adaptive fine-tuning method for a convolutional neural network provided by the present invention, in the first training cycle, after the current value of the second variable is updated and before the average of the standard values of each block obtained in the first training cycle is calculated, the method further includes:
[0042] Based on the standard value of each block, adjust the gradient of the model parameters of the block;
[0043] The optimizer fine-tunes the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
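The first-cycle statistics of the preheating-history normalization mode ([0034] to [0040]) can be sketched as below. This is an illustration under stated assumptions, not the patented procedure: it takes the already computed per-block L2 norms as input (a hypothetical format of one list of `(weight_norm, grad_norm)` pairs per target domain image), treats both variables as shared scalars with hypothetical initial values, and assumes the first variable is refreshed from the second after every image as in [0033], which [0034] to [0040] do not state. The optional gradient adjustment of [0041] to [0043] is omitted.

```python
from typing import Iterable, List, Sequence, Tuple


def preheat_history_averages(
    norm_stream: Iterable[Sequence[Tuple[float, float]]],
    preset: float = 1.0,
    init_first: float = 1.0,
) -> List[float]:
    """Average per-block standard values over the first training cycle.

    Each item of norm_stream corresponds to one target domain image and holds
    one (weight_norm, grad_norm) pair per block (hypothetical input format).
    """
    first, second = init_first, 0.0
    totals: List[float] = []
    count = 0
    for per_block in norm_stream:
        etas = []
        for weight_norm, grad_norm in per_block:
            etas.append(min(grad_norm / (weight_norm * first), preset))  # [0037]
            second = max(grad_norm / weight_norm, second)                 # [0038]
        first = second  # assumed update of the first variable after each image
        totals = etas if not totals else [t + e for t, e in zip(totals, etas)]
        count += 1
    if count == 0:
        raise ValueError("the first training cycle produced no standard values")
    return [t / count for t in totals]                                    # [0039]


# Two images, two blocks each; raw ratios are 0.1 and 0.5 every time.
averages = preheat_history_averages([[(5.0, 0.5), (1.0, 0.5)],
                                     [(5.0, 0.5), (1.0, 0.5)]])
```

The returned averages would then be held fixed during the subsequent training cycles of [0040].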
[0044] According to the adaptive fine-tuning method for a convolutional neural network provided by the present invention, the blocks include fully connected blocks and convolutional blocks;
[0045] The step of dividing the layers of the convolutional neural network into multiple blocks includes:
[0046] Each fully connected layer of the convolutional neural network is assigned to a fully connected block;
[0047] Adjacent convolutional layers and batch normalization layers of the convolutional neural network are grouped into a convolutional block.
[0048] According to the adaptive fine-tuning method for a convolutional neural network provided by the present invention, the standard value of the fully connected block is the standard value of the weights of the fully connected layer, and the standard value of the convolutional block is the standard value of the weights of the convolutional layer.
[0049] The present invention also provides an adaptive fine-tuning device for a convolutional neural network, comprising:
[0050] The acquisition module is used to acquire the convolutional neural network for image classification;
[0051] A partitioning module is used to traverse the layers in the convolutional neural network in reverse order and divide the layers in the convolutional neural network into multiple blocks; at least one block includes: multiple adjacent related layers;
[0052] The fine-tuning module is used to input the target domain image dataset into the convolutional neural network, calculate the standard value of each block for each target domain image, and fine-tune the model parameters of the convolutional neural network based on the standard value of each block.
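To make the module boundaries of [0049] to [0052] concrete, the following Python sketch models each module as an injected callable. The class and field names (`AdaptiveFineTuningDevice`, `acquisition_module`, and so on) are hypothetical, and the lambdas in the usage lines are stand-ins rather than real models; the sketch only shows the data flow from acquisition to partitioning to fine-tuning.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class AdaptiveFineTuningDevice:
    """Hypothetical wiring of the three modules in [0049]-[0052]."""
    acquisition_module: Callable[[], Any]                     # returns the CNN
    partitioning_module: Callable[[Any], List[Any]]           # reverse traversal -> blocks
    fine_tuning_module: Callable[[Any, List[Any], Any], Any]  # per-block fine-tuning

    def run(self, target_dataset: Any) -> Any:
        model = self.acquisition_module()
        blocks = self.partitioning_module(model)
        return self.fine_tuning_module(model, blocks, target_dataset)


device = AdaptiveFineTuningDevice(
    acquisition_module=lambda: ["conv", "bn", "fc"],             # stand-in model
    partitioning_module=lambda model: [["fc"], ["bn", "conv"]],  # stand-in blocks
    fine_tuning_module=lambda model, blocks, data: (len(blocks), len(data)),
)
result = device.run([0, 1, 2])  # (2, 3): two blocks, three target images
```

Keeping the modules as separate callables mirrors the claim structure: each module can be replaced (for example, a different partitioning rule) without touching the others.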
[0053] The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the adaptive fine-tuning method of the convolutional neural network as described above.
[0054] The present invention also provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the adaptive fine-tuning method for a convolutional neural network as described above.
[0055] The present invention provides an adaptive fine-tuning method, apparatus, device, and storage medium for convolutional neural networks. First, a convolutional neural network for image classification is obtained. Then, the layers of the convolutional neural network are traversed in reverse order and divided into multiple blocks, at least one of which comprises multiple adjacent related layers. Because multiple adjacent related layers can be grouped into one block, the relationships between different layers of the convolutional neural network are fully considered. Finally, a target domain image dataset is input into the convolutional neural network; for each target domain image, a standard value is calculated for each block, and the model parameters of the convolutional neural network are fine-tuned based on the standard values of the blocks. Since fine-tuning is driven by per-block standard values and the block division fully considers the relationships between different layers of the convolutional neural network, the model accuracy of the convolutional neural network can be improved.
Description of the Drawings
[0056] To more clearly illustrate the technical solutions in this invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0057] Figure 1 is a schematic diagram illustrating the pre-training principle of a convolutional neural network in the prior art;
[0058] Figure 2 is a schematic diagram illustrating the fine-tuning principle of a convolutional neural network in the prior art;
[0059] Figure 3 is a schematic diagram of the test results of a convolutional neural network in the prior art under different shift conditions;
[0060] Figure 4 is a flowchart of the adaptive fine-tuning method for a convolutional neural network provided by the present invention;
[0061] Figure 5 is a schematic diagram illustrating the fine-tuning principle of the convolutional neural network provided by the present invention;
[0062] Figure 6 is a schematic diagram of the specific process of step 103 in the block mode provided by the present invention;
[0063] Figure 7 is a schematic diagram of the specific process of step 103 in the preheating mode provided by the present invention;
[0064] Figure 8 is a schematic diagram of the specific process of step 103 in the preheating fine-tuning mode provided by the present invention;
[0065] Figure 9 is a schematic diagram of the specific process of step 103 in the history normalization mode provided by the present invention;
[0066] Figure 10 is a schematic diagram of the specific process of step 103 in the preheating-history normalization mode provided by the present invention;
[0067] Figure 11 is a schematic diagram of the specific process of step 103 in the preheating fine-tuning-history normalization mode provided by the present invention;
[0068] Figure 12 is a schematic diagram of the model accuracy on the CIFAR-C dataset provided by the present invention;
[0069] Figure 13 is a schematic diagram of the model accuracy on the Living-17 dataset provided by the present invention;
[0070] Figure 14 is a schematic diagram of the model accuracy on the CIFAR-10F dataset provided by the present invention;
[0071] Figure 15 is a schematic structural diagram of the adaptive fine-tuning device for a convolutional neural network provided by the present invention;
[0072] Figure 16 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Implementation
[0073] To make the objectives, technical solutions, and advantages of this invention clearer, the technical solutions of this invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this invention. All other embodiments obtained by those skilled in the art based on the embodiments of this invention without creative effort are within the scope of protection of this invention.
[0074] In convolutional neural networks, layers of different depths are more efficient at handling different types of shifts between source and target domain data. For example, earlier layers are more efficient at handling shifts in the appearance of the input image (such as seasonal changes, deformation, or added noise in the same scene), later layers are more efficient at handling shifts in image annotation (such as changes in classification criteria or class numbers), and middle layers are more efficient at handling shifts in features (such as different subclasses of the same class, for example, source domain data all showing wooden furniture, while target domain data all showing plastic furniture).
[0075] Based on this, existing technologies provide a method for fine-tuning convolutional neural networks. Model fine-tuning is a crucial step in the model deployment process. The typical deployment pattern for neural network models is as follows: first, the neural network model is pre-trained using large-scale source domain data; then, target domain data is collected according to the specific application scenario, and the model parameters of the pre-trained neural network model are fine-tuned.
[0076] As shown in Figure 1 and Figure 2, the pre-training and fine-tuning of a convolutional neural network in the prior art may include the following steps:
[0077] Step 1: Input the source domain data into the convolutional neural network to obtain a pre-trained convolutional neural network, and use it as the initial model $\Phi_{src}$ with parameters $\Theta = \{\theta_0, \theta_1, \ldots, \theta_n\}$;
[0078] Step 2: Input the target domain dataset and perform forward and backward propagation through the model to obtain the gradients $g_0, g_1, \ldots, g_n$ of all parameters;
[0079] Step 3: Calculate the standard value of each layer, $\eta_i = \|g_i\|_2 / \|\theta_i\|_2$ for $i = 0, 1, \ldots, n$;
[0080] Step 4: Normalize the standard values by their maximum $\eta_{\max} = \max(\eta_0, \eta_1, \ldots, \eta_n)$, i.e., $\eta_i \leftarrow \eta_i / \eta_{\max}$;
[0081] Step 5: Adjust the gradient of each parameter using its standard value: for the i-th layer parameters, the gradient $g_i$ is adjusted to $\eta_i \times g_i$;
[0082] Step 6: Update the model parameters using the optimizer based on the adjusted gradient;
[0083] Step 7: Repeat steps 2 to 6 to fine-tune the model parameters by traversing the target domain data.
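The baseline Steps 2 to 6 can be sketched in plain Python as follows. This is an illustrative sketch, not the prior-art implementation: it assumes each layer's parameters and gradients are already flattened into lists, takes the per-layer standard value to be $\|g_i\|_2 / \|\theta_i\|_2$ (the same ratio this document later uses per block), adds a small hypothetical `eps` to avoid division by zero, and uses plain SGD with a hypothetical learning rate `lr` as the optimizer. The helper names `l2_norm`, `baseline_scaled_gradients`, and `sgd_step` are illustrative.

```python
import math
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """L2 (Euclidean) norm of a flattened parameter or gradient tensor."""
    return math.sqrt(sum(x * x for x in v))


def baseline_scaled_gradients(params: List[Vector], grads: List[Vector],
                              eps: float = 1e-12) -> List[Vector]:
    """Steps 3-5 of the prior-art (baseline) scheme, one entry per layer."""
    # Step 3: per-layer standard value eta_i = ||g_i|| / ||theta_i||.
    etas = [l2_norm(g) / (l2_norm(p) + eps) for p, g in zip(params, grads)]
    # Step 4: normalize by the maximum; this needs every layer's gradient first.
    eta_max = max(etas) or 1.0  # guard against all-zero gradients
    etas = [e / eta_max for e in etas]
    # Step 5: g_i <- eta_i * g_i.
    return [[e * x for x in g] for e, g in zip(etas, grads)]


def sgd_step(params: List[Vector], grads: List[Vector], lr: float) -> List[Vector]:
    """Step 6 with plain SGD standing in for the unspecified optimizer."""
    return [[p - lr * g for p, g in zip(ps, gs)] for ps, gs in zip(params, grads)]


params = [[3.0, 4.0], [1.0, 0.0]]  # two toy layers, ||theta|| = 5 and 1
grads = [[0.3, 0.4], [0.5, 0.0]]   # ||g|| = 0.5 for both layers
scaled = baseline_scaled_gradients(params, grads)  # etas 0.1, 0.5 -> 0.2, 1.0
params = sgd_step(params, scaled, lr=0.1)
```

Because Step 4 needs every layer's standard value before any gradient can be rescaled, the parameters are effectively traversed twice per update, which is the drawback listed as item 3) below.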
[0084] However, the adaptive fine-tuning method for convolutional neural networks (also known as baseline mode) provided by the above-mentioned existing technology has the following drawbacks:
[0085] 1) Existing technologies do not take into account the relationship between different layers in convolutional neural networks, which may lead to insufficient update of some layers and affect the model accuracy of convolutional neural networks;
[0086] 2) Existing technologies do not fully utilize the information between different data sets;
[0087] 3) Existing technologies require a second traversal of model parameters, which cannot be well integrated with asynchronous update technologies.
[0088] For example, in Figure 3 the vertical axis represents the relative accuracy of the model after fine-tuning: positive values indicate that adjusting only some of the model parameters gives better results, while negative values indicate that adjusting all parameters gives better results. The horizontal axis represents different datasets. The first group of layers (Block 1) contains the earliest layers, the second, third, and fourth groups (Block 2/3/4) contain the middle layers, and the last layer (Last Layer) is the final fully connected layer. The datasets with image appearance shifts (Input-level shifts) are the first dataset (CIFAR-C) and the second dataset (ImageNet-C). The datasets with feature shifts (Feature-level shifts) are the third dataset (Living-17) and the fourth dataset (Entity-30). The datasets with image annotation shifts (Output-level shifts) are the fifth dataset (CIFAR-Flip), the sixth dataset (Waterbirds), and the seventh dataset (CelebA). Here, a shift refers to the offset between the source dataset used for pre-training and the target dataset.
[0089] Figure 3 shows the test results of the convolutional neural network under different shift conditions. In each test, only the model parameters of one block or of the last layer are updated, different hyperparameters are tried, and the model accuracy is recorded. The results show that model accuracy is high when the updated model parameters correctly reflect the type of data shift.
[0090] However, this approach requires manually assessing the shift between the source and target domain data, and multiple tests are needed to determine which layers are best to update. In its embodiments, this invention instead automatically evaluates which model parameters need updating, using statistics collected during the model fine-tuning process.
[0091] The adaptive fine-tuning method for convolutional neural networks provided by this invention is described in detail below through some embodiments and application scenarios, with reference to Figures 4 to 14.
[0092] Please refer to Figure 4, which is a flowchart illustrating the adaptive fine-tuning method for convolutional neural networks provided by this invention. As shown in Figure 4, the method may include the following steps:
[0093] Step 101: Obtain the convolutional neural network for image classification;
[0094] Step 102: Traverse the layers of the convolutional neural network in reverse order and divide them into multiple blocks, at least one of which comprises multiple adjacent related layers;
[0095] Step 103: Input the target domain image dataset into the convolutional neural network; for each target domain image, calculate the standard value of each block and fine-tune the model parameters of the convolutional neural network based on the standard value of each block.
[0096] In step 101, for example, a convolutional neural network for image classification that has been pre-trained on a source domain image dataset is obtained as the initial model $\Phi_{src}$. The parameters of $\Phi_{src}$ are $\Theta = \{\theta_0, \theta_1, \ldots, \theta_n\}$, where the parameters of different layers are arranged in reverse order, so $\theta_0$ denotes the last parameter.
[0097] In step 102, the layers of the convolutional neural network are traversed in reverse order. A functionally independent layer is placed in a block of its own, while multiple adjacent, functionally related layers are grouped into a single block. This fully considers the relationships between different layers of the convolutional neural network.
[0098] For example, the blocks may include fully connected blocks and convolutional blocks: each fully connected layer of the convolutional neural network forms a fully connected block (also called an FC block), and adjacent convolutional layers and batch normalization layers are grouped into a convolutional block (also called a CONV block).
[0099] In the convolutional neural network used for image classification tasks, the layers containing parameters are fully connected layers, convolutional layers, and batch normalization layers. The parameters of each layer are divided into two types: weights and biases.
[0100] Convolutional layers are used to extract image features, while batch normalization layers primarily adjust the output distribution of adjacent convolutional layers to improve training efficiency. It can be seen that adjacent convolutional layers and batch normalization layers are functionally related; therefore, adjacent convolutional layers and batch normalization layers are grouped into a single block, known as a convolutional block.
[0101] Fully connected layers are mainly used to map image features to image classification results. Since the function of fully connected layers is independent, they are divided into a block from a functional perspective, namely a fully connected block.
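The reverse-order partitioning of step 102 can be sketched in Python as below. This is a minimal sketch under assumptions: layers are described by hypothetical `kind` tags (`"conv"`, `"bn"`, `"fc"`, anything else for parameter-free layers), a batch normalization layer is attached to the convolutional layer that precedes it in forward order, and a batch normalization layer with no adjacent convolutional layer is kept as its own block, which is a hypothetical fallback not described in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Layer:
    name: str
    kind: str  # hypothetical tags: "conv", "bn", "fc"; others are parameter-free


@dataclass
class Block:
    kind: str  # "FC", "CONV", or "BN" (the last is a hypothetical fallback)
    layers: List[Layer] = field(default_factory=list)


def partition_into_blocks(layers: List[Layer]) -> List[Block]:
    """Traverse layers from output to input and group them into blocks."""
    blocks: List[Block] = []
    pending_bn: List[Layer] = []  # BN layers waiting for their adjacent conv layer

    def flush_pending() -> None:
        # Hypothetical fallback: a BN layer with no adjacent conv layer stands alone.
        blocks.extend(Block("BN", [bn]) for bn in pending_bn)
        pending_bn.clear()

    for layer in reversed(layers):
        if layer.kind == "bn":
            pending_bn.append(layer)
        elif layer.kind == "conv":
            # In reverse order a conv layer comes right after its BN layer: group them.
            blocks.append(Block("CONV", [layer] + pending_bn))
            pending_bn.clear()
        elif layer.kind == "fc":
            flush_pending()
            blocks.append(Block("FC", [layer]))  # functionally independent layer
        # Parameter-free layers (activations, pooling) carry no weights; skip them.
    flush_pending()
    return blocks


net = [Layer("conv1", "conv"), Layer("bn1", "bn"), Layer("relu1", "relu"),
       Layer("conv2", "conv"), Layer("bn2", "bn"), Layer("pool", "pool"),
       Layer("fc", "fc")]
summary = [(b.kind, [layer.name for layer in b.layers]) for b in partition_into_blocks(net)]
# [('FC', ['fc']), ('CONV', ['conv2', 'bn2']), ('CONV', ['conv1', 'bn1'])]
```

Because the traversal runs from the output, the block list comes out in the same reverse order as the parameters $\theta_0, \theta_1, \ldots$ in [0096].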
[0102] In step 103, the target domain image dataset is input into the convolutional neural network. The network iterates through each target domain image data point in the dataset, calculating the standard value for each block for each target domain image data point, and fine-tuning the model parameters of the convolutional neural network. Since the model fine-tuning is based on the standard value of each block, and the block division fully considers the relationship between different layers of the convolutional neural network, it can improve the model accuracy of the convolutional neural network.
[0103] For example, for a fully connected block, the standard value of the fully connected block is the standard value of the weights of the fully connected layer. For a convolutional block, the standard value of the convolutional block is the standard value of the weights of the convolutional layer.
[0104] The adaptive fine-tuning method for convolutional neural networks provided in this embodiment first obtains a convolutional neural network for image classification. It then traverses the layers of the convolutional neural network in reverse order and divides them into multiple blocks, at least one of which comprises multiple adjacent related layers; grouping adjacent related layers into one block fully considers the relationships between different layers of the convolutional neural network. Finally, the target domain image dataset is input into the convolutional neural network; for each target domain image, a standard value is calculated for each block, and the model parameters of the convolutional neural network are fine-tuned based on these standard values. Because fine-tuning is based on per-block standard values and the block division fully considers the relationships between different layers, the model accuracy of the convolutional neural network can be improved.
[0105] In one embodiment, please refer to Figure 6, which is a schematic diagram illustrating the specific process of step 103 in the block mode provided by the present invention. As shown in Figure 5 and Figure 6, step 103 above may include:
[0106] Step 201: Input the target domain image dataset into the convolutional neural network;
[0107] Step 202: For each target domain image, perform forward propagation and backward propagation to obtain the gradients of the model parameters of each block of the convolutional neural network;
[0108] Step 203: Calculate the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block to obtain the standard value of each block;
[0109] Step 204: Adjust the gradient of the model parameters of each block based on the standard values of each block;
[0110] Step 205: Use the optimizer to fine-tune the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0111] In step 202, for each target domain image, forward propagation and backward propagation are performed to obtain the gradients of the model parameters of each block of the convolutional neural network, i.e., the gradients $g_0, g_1, \ldots, g_n$ of all parameters.
[0112] In step 203, the standard value of each block can be calculated using the following expression (1):
[0113] $$\eta = \frac{\|g\|_2}{\|\theta\|_2} \qquad (1)$$
[0114] Where η represents the standard value of each block, θ represents a model parameter of each block, and g represents the gradient of θ.
[0115] For example, for a fully connected block, the standard value of the fully connected block is the standard value of the weights of the fully connected layer. For a convolutional block, the standard value of the convolutional block is the standard value of the weights of the convolutional layer.
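Expression (1) restricted to the weight tensor, as the paragraph above states, can be checked with a short Python example. The tensor names (`"conv.weight"`, `"bn.weight"`, `"bn.bias"`) and the flattened-list representation are hypothetical; the point is that the batch normalization gradients in a CONV block do not enter its standard value.

```python
import math
from typing import Dict, List


def l2_norm(v: List[float]) -> float:
    """L2 (Euclidean) norm of a flattened tensor."""
    return math.sqrt(sum(x * x for x in v))


def block_standard_value(params: Dict[str, List[float]],
                         grads: Dict[str, List[float]],
                         weight_key: str, eps: float = 1e-12) -> float:
    """Expression (1) for one block, using only the block's weight tensor."""
    return l2_norm(grads[weight_key]) / (l2_norm(params[weight_key]) + eps)


# A toy CONV block: convolution weights plus batch normalization parameters.
conv_block = {"conv.weight": [3.0, 4.0], "bn.weight": [1.0], "bn.bias": [0.0]}
conv_grads = {"conv.weight": [0.06, 0.08], "bn.weight": [0.9], "bn.bias": [0.7]}
eta = block_standard_value(conv_block, conv_grads, "conv.weight")  # 0.1 / 5 = 0.02
```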
[0116] In step 204, since the weights in each layer are more sensitive to data information, the gradient of the model parameters of each block is adjusted based on the standard value of each block (usually the standard value of the weights).
[0117] For example, for the parameters of the i-th layer, the standard value used is that of the block to which the i-th layer belongs, and the gradient $g_i$ of the i-th layer parameters is adjusted to:
[0118] In this embodiment, model fine-tuning uses the adjusted gradients of the model parameters of each block, these gradients are adjusted according to each block's standard value, and the block division fully considers the relationships between different layers of the convolutional neural network. As a result, the model accuracy of the convolutional neural network can be improved under feature shifts and image annotation shifts.
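Steps 203 to 205 of the block mode can be sketched as one update for a single target domain image. The sketch assumes, as in the baseline's Step 5, that every gradient in a block (weights, biases, batch normalization parameters) is multiplied by that block's standard value; whether the block values are also normalized by their maximum is not stated in [0116] to [0117], so the sketch exposes this as a hypothetical `normalize_by_max` flag. Plain SGD with a hypothetical `lr` stands in for the optimizer.

```python
import math
from typing import Dict, List

Block = Dict[str, List[float]]


def l2_norm(v: List[float]) -> float:
    """L2 (Euclidean) norm of a flattened tensor."""
    return math.sqrt(sum(x * x for x in v))


def block_mode_step(params: List[Block], grads: List[Block], weight_keys: List[str],
                    lr: float = 0.01, normalize_by_max: bool = False,
                    eps: float = 1e-12) -> List[Block]:
    """One update of steps 203-205 for a single target domain image."""
    # Step 203: one standard value per block, taken from the block's weight tensor.
    etas = [l2_norm(g[k]) / (l2_norm(p[k]) + eps)
            for p, g, k in zip(params, grads, weight_keys)]
    if normalize_by_max:  # hypothetical option mirroring the baseline's Step 4
        top = max(etas) or 1.0
        etas = [e / top for e in etas]
    # Steps 204-205: scale every gradient in a block by that block's eta,
    # then apply plain SGD (an assumed optimizer).
    return [{name: [x - lr * eta * d for x, d in zip(p[name], g[name])] for name in p}
            for p, g, eta in zip(params, grads, etas)]


params = [{"w": [3.0, 4.0], "b": [1.0]}]
grads = [{"w": [0.3, 0.4], "b": [1.0]}]
updated = block_mode_step(params, grads, weight_keys=["w"], lr=1.0)  # eta = 0.1
```

Here the bias gradient is scaled by the weight-based standard value, matching the statement in [0116] that the block's adjustment is driven by the standard value of the weights.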
[0119] In one embodiment, please refer to Figure 7, which is a schematic diagram illustrating the specific process of step 103 in the preheating mode provided by the present invention. As shown in Figure 5 and Figure 7, step 103 above may include:
[0120] Step 301: During the first training cycle, input the target domain image dataset into the convolutional neural network;
[0121] Step 302: For each target domain image data, perform forward propagation and backward propagation calculations to obtain the gradients of the model parameters of each block of the convolutional neural network.
[0122] Step 303: Calculate the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block to obtain the standard value of each block;
[0123] Step 304: Collect the multiple standard values of each block obtained in the first training cycle, and calculate their average to obtain the average standard value of each block;
[0124] Step 305: In subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network. For each target domain image data, the model parameters of the convolutional neural network are fine-tuned based on the average standard value of each block.
[0125] Model fine-tuning typically requires multiple traversals of the target domain image dataset. In step 301, a complete traversal is called a training cycle.
[0126] In step 302, for each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters of each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n of all parameters.
[0127] In step 303, the standard value of each block can be calculated using the following expression (2):
[0128] $$\eta = \frac{\lVert g \rVert_2}{\lVert \theta \rVert_2} \qquad (2)$$
[0129] Where η represents the standard value of each block, θ represents the model parameters of the block, and g represents the gradient of θ.
[0130] The standard values of all blocks in the convolutional neural network, namely η_0, η_1, ..., η_b, can be calculated using the above expression (2).
[0131] In step 304, after the target domain image dataset has been traversed, the values η_0, η_1, ..., η_b obtained during the first training cycle are collected, and the average standard value of each block is calculated.
[0132] In step 305, during subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network. For each target domain image data point, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters for each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n of all parameters.
[0133] The gradient of the model parameters for each block is adjusted based on the average standard value of each block.
[0134] For example, for the parameters of the i-th layer, the gradient g_i is adjusted according to the average standard value of the block to which the i-th layer belongs.
[0135] The optimizer then fine-tunes the model parameters of the convolutional neural network based on the adjusted gradients of each block's model parameters.
[0136] In this embodiment, no model parameters are fine-tuned during the first training cycle; only the standard values of each block are collected. In subsequent training cycles, the standard values of the blocks no longer need to be calculated. Adjusting the gradients used for fine-tuning with the average standard value of each block fuses the standard values collected during the first training cycle, thereby integrating information from different data and improving the model accuracy of the convolutional neural network.
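The warm-up mode (steps 301 to 305) can be outlined as a two-phase loop. This is a sketch under stated assumptions: `grad_fn` is a hypothetical stand-in for forward and backward propagation, each block's standard value comes from its "weight" tensor, gradients are scaled as g ← η_avg·g, and plain SGD replaces the unspecified optimizer.

```python
import numpy as np


def ratio(theta, g):
    # Expression (2): ||g||_2 / ||theta||_2
    return float(np.linalg.norm(g.ravel()) / np.linalg.norm(theta.ravel()))


def warmup_mode(blocks, data, grad_fn, epochs=3, lr=0.1):
    """Warm-up mode sketch: cycle 1 only measures, later cycles fine-tune."""
    # First training cycle (steps 301-304): record standard values, no parameter update.
    history = {name: [] for name in blocks}
    for x in data:
        grads = grad_fn(blocks, x)
        for name in blocks:
            history[name].append(ratio(blocks[name]["weight"], grads[name]["weight"]))
    avg = {name: float(np.mean(vals)) for name, vals in history.items()}

    # Subsequent cycles (step 305): no new standard values, only the fixed averages.
    for _ in range(epochs - 1):
        for x in data:
            grads = grad_fn(blocks, x)
            for name, arrays in blocks.items():
                for key, theta in arrays.items():
                    theta -= lr * avg[name] * grads[name][key]  # assumed g <- eta_avg * g, SGD
    return avg


def toy_grad(blocks, x):
    # Gradient of 0.5 * ||theta - x||^2 for every array: simply theta - x.
    return {n: {k: v - x for k, v in arrs.items()} for n, arrs in blocks.items()}


blocks = {"fc": {"weight": np.array([3.0, 4.0])}}
avg = warmup_mode(blocks, [np.array([3.0, 2.0])], toy_grad, epochs=2)
```

Because the averages are frozen after the first cycle, later cycles skip the per-sample norm computations, in line with paragraph [0136].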
[0137] In one embodiment, refer to Figure 8, which is a schematic flowchart of step 103 in the warm-up fine-tuning mode provided by the present invention. As shown in Figures 5 and 8, step 103 above may include:
[0138] Step 401: During the first training cycle, input the target domain image dataset into the convolutional neural network;
[0139] Step 402: For each target domain image data, perform forward propagation and backward propagation calculations to obtain the gradient of the model parameters of each block of the convolutional neural network.
[0140] Step 403: Calculate the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block to obtain the standard value of each block;
[0141] Step 404: Adjust the gradient of the model parameters of each block based on the standard values of each block;
[0142] Step 405: Use the optimizer to fine-tune the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0143] Step 406: Collect multiple standard values for each block obtained in the first training cycle, and calculate the average of the multiple standard values for each block to obtain the average standard value for each block;
[0144] Step 407: In subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network. For each target domain image data, the model parameters of the convolutional neural network are fine-tuned based on the average standard value of each block.
[0145] In step 401, model fine-tuning typically requires multiple traversals of the target domain image dataset, with one complete traversal being called a training cycle.
[0146] In step 402, for each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters of each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n of all parameters.
[0147] In step 403, the standard value of each block can be calculated using the following expression (3):
[0148] $$\eta = \frac{\lVert g \rVert_2}{\lVert \theta \rVert_2} \qquad (3)$$
[0149] Where η represents the standard value of each block, θ represents the model parameters of the block, and g represents the gradient of θ.
[0150] In step 404, for example, for the parameters of the i-th layer, the gradient g_i is adjusted according to the standard value of the block to which the i-th layer belongs.
[0151] In step 405, the optimizer fine-tunes the model parameters of the convolutional neural network based on the gradients of the adjusted model parameters of each block. For the next target domain image data, the convolutional neural network used is the one fine-tuned by the optimizer.
[0152] In step 406, after the target domain image dataset has been traversed, the values η_0, η_1, ..., η_b obtained during the first training cycle are collected, and the average standard value of each block is calculated.
[0153] In step 407, during subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network. For each target domain image data point, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters for each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n corresponding to all parameters.
[0154] The gradient of the model parameters for each block is adjusted based on the average standard value of each block.
[0155] For example, for the parameters of the i-th layer, the gradient g_i is adjusted according to the average standard value of the block to which the i-th layer belongs.
[0156] The optimizer then fine-tunes the model parameters of the convolutional neural network based on the adjusted gradients of each block's model parameters.
[0157] In this embodiment, compared with the warm-up mode, the model parameters are already fine-tuned in the first training cycle, so no training cycle is wasted. In addition, the standard values of each block collected in the first training cycle are fused dynamically, that is, information from different data is fused dynamically, which can significantly improve the model accuracy of the convolutional neural network under image annotation shift.
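The only difference from the warm-up mode is that steps 401 to 405 also update the parameters with the per-sample standard values during the first cycle. The sketch below rests on hypothetical assumptions: `grad_fn` stands in for forward and backward propagation, η is read from each block's "weight" tensor, gradients are scaled as g ← η·g, and plain SGD is the optimizer.

```python
import numpy as np


def ratio(theta, g):
    # Expression (3): ||g||_2 / ||theta||_2
    return float(np.linalg.norm(g.ravel()) / np.linalg.norm(theta.ravel()))


def sgd_scaled(blocks, grads, scale, lr):
    # Assumed adjustment g <- eta * g followed by a plain SGD step, in place.
    for name, arrays in blocks.items():
        for key, theta in arrays.items():
            theta -= lr * scale[name] * grads[name][key]


def warmup_finetune_mode(blocks, data, grad_fn, epochs=3, lr=0.1):
    """Warm-up fine-tuning sketch: cycle 1 both fine-tunes and collects standard values."""
    history = {name: [] for name in blocks}
    for x in data:  # first training cycle, steps 401-405
        grads = grad_fn(blocks, x)
        etas = {n: ratio(blocks[n]["weight"], grads[n]["weight"]) for n in blocks}
        for n, eta in etas.items():
            history[n].append(eta)
        sgd_scaled(blocks, grads, etas, lr)  # the next sample sees the updated network
    avg = {n: float(np.mean(v)) for n, v in history.items()}  # step 406
    for _ in range(epochs - 1):  # step 407: later cycles use the fixed averages
        for x in data:
            sgd_scaled(blocks, grad_fn(blocks, x), avg, lr)
    return avg


def toy_grad(blocks, x):
    # Gradient of 0.5 * ||theta - x||^2 for every array: simply theta - x.
    return {n: {k: v - x for k, v in arrs.items()} for n, arrs in blocks.items()}


blocks = {"fc": {"weight": np.array([3.0, 4.0])}}
avg = warmup_finetune_mode(blocks, [np.array([3.0, 2.0])], toy_grad, epochs=1)
```

With a single cycle, the weights have already moved while the average is being collected, which is the point of not wasting the first training cycle.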
[0158] In one embodiment, refer to Figure 9, which is a schematic flowchart of step 103 in the history normalization mode provided by the present invention. As shown in Figures 5 and 9, step 103 above may include:
[0159] Step 501: Input the target domain image dataset into the convolutional neural network;
[0160] Step 502: For each target domain image data, perform forward propagation and backward propagation calculations to obtain the gradients of the model parameters of each block of the convolutional neural network.
[0161] Step 503: Calculate the product between the L2 norm of the model parameters of each block and the historical value of the first variable, calculate the ratio between the L2 norm of the gradient of the model parameters of the block and the product, calculate the minimum value between the ratio and the preset value, and obtain the standard value of each block.
[0162] Step 504: Update the current value of the second variable to the maximum of its historical value and the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block.
[0163] Step 505: Adjust the gradient of the model parameters of each block based on the standard values of each block;
[0164] Step 506: Use the optimizer to fine-tune the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0165] Step 507: Update the current value of the first variable based on the current value of the second variable.
[0166] Before step 501, the first variable current_max is used to record the maximum value used in the current normalization operation, and the second variable running_max is used to track the historical maximum value. These two variables are initialized to: running_max = 0.0, current_max = 1.0.
[0167] In step 502, for each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters of each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n of all parameters.
[0168] In step 503, the preset value can be 1.0, and the standard value of each block is calculated using the following expression (4):
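[0169] $$\eta = \min\left(\frac{\lVert g \rVert_2}{\lVert \theta \rVert_2 \cdot \mathrm{current\_max}},\ 1.0\right) \qquad (4)$$
[0170] Where η represents the standard value of each block, θ represents the model parameters of the block, g represents the gradient of θ, and current_max is the value of the first variable carried over from the previous iteration (initially 1.0).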
[0171] In step 504, running_max is updated using the following expression (5):
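[0172] $$\mathrm{running\_max} := \max\left(\mathrm{running\_max},\ \frac{\lVert g \rVert_2}{\lVert \theta \rVert_2}\right) \qquad (5)$$
[0173] That is, running_max is updated to the maximum of its previous value and the current ratio, so that it tracks the historical maximum.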
[0174] In step 507, update the current_max value: current_max := running_max.
[0175] In this embodiment, the block mode does not use normalization, which avoids a second traversal of the model parameters: normalizing by the maximum over all blocks of the current iteration would require first computing the ratio of every block and then traversing the parameters again. However, omitting normalization may make model training unstable. The history normalization mode normalizes with the historical maximum instead, which avoids the second traversal, reduces the number of traversals of the model parameters, and integrates better with techniques such as asynchronous updates, while also avoiding training instability.
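One iteration of the history normalization mode (steps 502 to 507) can be sketched as follows. The two state variables and their initial values come from paragraph [0166], and clipping at the preset value 1.0 follows step 503. The class name, the data layout, the "weight" key, the scaling g ← η·g and the plain SGD update are all assumptions made for illustration. Each block is visited once per iteration, because η is normalized with the maximum from earlier iterations rather than the maximum of the current one.

```python
import numpy as np


class HistoryNormalizer:
    """Hypothetical holder for the two state variables of the history normalization mode."""

    def __init__(self, preset=1.0):
        self.running_max = 0.0  # second variable: historical maximum of the raw ratio
        self.current_max = 1.0  # first variable: maximum used by the current normalization
        self.preset = preset

    def step(self, blocks, grads, lr=0.1):
        """One iteration (steps 503-507) in a single pass over the blocks."""
        etas = {}
        for name, arrays in blocks.items():
            g_norm = np.linalg.norm(grads[name]["weight"].ravel())
            w_norm = np.linalg.norm(arrays["weight"].ravel())
            # Expression (4): normalize with the previous maximum, clip at the preset value.
            etas[name] = float(min(g_norm / (w_norm * self.current_max), self.preset))
            # Expression (5): running_max := max(running_max, ||g|| / ||theta||).
            self.running_max = max(self.running_max, float(g_norm / w_norm))
            # Steps 505-506 (assumed form): scale this block's gradients, then plain SGD.
            for key, theta in arrays.items():
                theta -= lr * etas[name] * grads[name][key]
        # Step 507: current_max := running_max. All-zero gradients would leave it at 0,
        # which this sketch does not guard against.
        self.current_max = self.running_max
        return etas


normalizer = HistoryNormalizer()
blocks = {"a": {"weight": np.array([3.0, 4.0])}, "b": {"weight": np.array([1.0, 0.0])}}
grads = {"a": {"weight": np.array([0.0, 2.0])}, "b": {"weight": np.array([0.2, 0.0])}}
first = normalizer.step(blocks, grads)   # current_max was 1.0: eta_a ~ 0.4, eta_b ~ 0.2
second = normalizer.step(blocks, grads)  # current_max is now ~0.4: eta_a is clipped to 1.0
```

Because step 507 runs only after all blocks have been processed, every block in one iteration is normalized by the same current_max.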
[0176] In one embodiment, refer to Figure 10, which is a schematic flowchart of step 103 in the warm-up-history normalization mode provided by the present invention. As shown in Figure 10, step 103 above may include:
[0177] Step 601: During the first training cycle, input the target domain image dataset into the convolutional neural network;
[0178] Step 602: For each target domain image data, perform forward propagation and backward propagation calculations to obtain the gradients of the model parameters of each block of the convolutional neural network.
[0179] Step 603: Calculate the product between the L2 norm of the model parameters of each block and the historical value of the first variable, calculate the ratio between the L2 norm of the gradient of the model parameters of the block and the product, calculate the minimum value between the ratio and the preset value, and obtain the standard value of each block.
[0180] Step 604: Update the current value of the second variable to the maximum of its historical value and the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block.
[0181] Step 605: Collect multiple standard values for each block obtained in the first training cycle, and calculate the average of the multiple standard values for each block to obtain the average standard value for each block;
[0182] Step 606: In subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network. For each target domain image data, the model parameters of the convolutional neural network are fine-tuned based on the average standard value of each block.
[0183] Before step 601, during the first training cycle, the first variable current_max is used to record the maximum value used by the current normalization operation, and the second variable running_max is used to track the historical maximum value. These two variables are initialized as follows: running_max = 0.0, current_max = 1.0.
[0184] In step 602, for each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters of each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n corresponding to all parameters.
[0185] In step 603, the preset value can be 1.0, and the standard value of each block is calculated using the following expression (6):
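[0186] $$\eta = \min\left(\frac{\lVert g \rVert_2}{\lVert \theta \rVert_2 \cdot \mathrm{current\_max}},\ 1.0\right) \qquad (6)$$
[0187] Where η represents the standard value of each block, θ represents the model parameters of the block, g represents the gradient of θ, and current_max is the value of the first variable carried over from the previous iteration (initially 1.0).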
[0188] In step 604, running_max is updated using the following expression (7):
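[0189] $$\mathrm{running\_max} := \max\left(\mathrm{running\_max},\ \frac{\lVert g \rVert_2}{\lVert \theta \rVert_2}\right) \qquad (7)$$
[0190] That is, running_max is updated to the maximum of its previous value and the current ratio, so that it tracks the historical maximum.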
[0191] In step 605, after the target domain image dataset has been traversed, the values η_0, η_1, ..., η_b obtained during the first training cycle are collected, and the average standard value of each block is calculated.
[0192] In step 606, during subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network. For each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters for each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n corresponding to all parameters.
[0193] The gradient of the model parameters for each block is adjusted based on the average standard value of each block.
[0194] For example, for the parameters of the i-th layer, the gradient g_i is adjusted according to the average standard value of the block to which the i-th layer belongs.
[0195] The optimizer then fine-tunes the model parameters of the convolutional neural network based on the adjusted gradients of each block's model parameters.
[0196] In this embodiment, fine-tuning uses the warm-up-history normalization mode. This both fuses the standard values of each block collected in the first training cycle, that is, integrates information from different data, and avoids a second traversal of the model parameters, which reduces the number of traversals and works better with techniques such as asynchronous updates.
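The first training cycle of the warm-up-history normalization mode (steps 601 to 605) can be sketched as a statistics-only pass. Several assumptions apply. `grad_fn` is a hypothetical stand-in for forward and backward propagation, and η is read from each block's "weight" tensor. Steps 601 to 606 also do not state when current_max is refreshed, so the sketch sets it to running_max after every sample, as step 507 does in the history normalization mode.

```python
import numpy as np


def warmup_history_first_cycle(blocks, data, grad_fn, preset=1.0):
    """First training cycle of the warm-up-history normalization mode (steps 601-605).

    No parameters are updated; the history-normalized standard values are averaged.
    Assumption: current_max is refreshed from running_max after every sample, as in
    step 507 of the history normalization mode; steps 601-606 do not say when.
    """
    running_max, current_max = 0.0, 1.0  # initial values from paragraph [0183]
    history = {name: [] for name in blocks}
    for x in data:
        grads = grad_fn(blocks, x)
        for name in blocks:
            g_norm = np.linalg.norm(grads[name]["weight"].ravel())
            w_norm = np.linalg.norm(blocks[name]["weight"].ravel())
            history[name].append(float(min(g_norm / (w_norm * current_max), preset)))  # (6)
            running_max = max(running_max, float(g_norm / w_norm))                     # (7)
        current_max = running_max
    return {name: float(np.mean(v)) for name, v in history.items()}  # step 605


def toy_grad(blocks, x):
    # Gradient of 0.5 * ||theta - x||^2 for every array: simply theta - x.
    return {n: {k: v - x for k, v in arrs.items()} for n, arrs in blocks.items()}


blocks = {"fc": {"weight": np.array([3.0, 4.0])}}
data = [np.array([3.0, 2.0]), np.array([3.0, 3.0])]
avg = warmup_history_first_cycle(blocks, data, toy_grad)  # approximately {"fc": 0.45}
```

Step 606 would then scale the gradients of later cycles by these fixed averages, in the same way as the warm-up mode.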
[0197] In one embodiment, refer to Figure 11, which is a schematic flowchart of step 103 in the warm-up fine-tuning-history normalization mode provided by the present invention. As shown in Figure 11, step 103 above may include:
[0198] Step 701: During the first training cycle, input the target domain image dataset into the convolutional neural network;
[0199] Step 702: For each target domain image data, perform forward propagation and backward propagation calculations to obtain the gradient of the model parameters of each block of the convolutional neural network.
[0200] Step 703: Calculate the product between the L2 norm of the model parameters of each block and the historical value of the first variable, calculate the ratio between the L2 norm of the gradient of the model parameters of the block and the product, calculate the minimum value between the ratio and the preset value, and obtain the standard value of each block.
[0201] Step 704: Update the current value of the second variable to the maximum of its historical value and the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block.
[0202] Step 705: Adjust the gradient of the model parameters of each block based on the standard value of each block;
[0203] Step 706: Use the optimizer to fine-tune the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0204] Step 707: Collect multiple standard values for each block obtained in the first training cycle, and calculate the average of the multiple standard values for each block to obtain the average standard value for each block;
[0205] Step 708: In subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network. For each target domain image data, the model parameters of the convolutional neural network are fine-tuned based on the average standard value of each block.
[0206] Before step 701, during the first training cycle, the first variable current_max is used to record the maximum value used by the current normalization operation, and the second variable running_max is used to track the historical maximum value. These two variables are initialized as follows: running_max = 0.0, current_max = 1.0.
[0207] In step 702, for each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters of each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n of all parameters.
[0208] In step 703, the preset value can be 1.0, and the standard value of each block is calculated using the following expression (8):
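[0209] $$\eta = \min\left(\frac{\lVert g \rVert_2}{\lVert \theta \rVert_2 \cdot \mathrm{current\_max}},\ 1.0\right) \qquad (8)$$
[0210] Where η represents the standard value of each block, θ represents the model parameters of the block, g represents the gradient of θ, and current_max is the value of the first variable carried over from the previous iteration (initially 1.0).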
[0211] In step 704, running_max is updated using the following expression (9):
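[0212] $$\mathrm{running\_max} := \max\left(\mathrm{running\_max},\ \frac{\lVert g \rVert_2}{\lVert \theta \rVert_2}\right) \qquad (9)$$
[0213] That is, running_max is updated to the maximum of its previous value and the current ratio, so that it tracks the historical maximum.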
[0214] In step 707, after the target domain image dataset has been traversed, the values η_0, η_1, ..., η_b obtained in the first training cycle are collected, and the average standard value of each block is calculated.
[0215] In step 708, during subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network. For each target domain image data point, forward propagation and backward propagation calculations are performed to obtain the gradients of the model parameters for each block of the convolutional neural network, i.e., the gradients g_0, g_1, ..., g_n of all parameters.
[0216] The gradient of the model parameters for each block is adjusted based on the average standard value of each block.
[0217] For example, for the parameters of the i-th layer, the gradient g_i is adjusted according to the average standard value of the block to which the i-th layer belongs.
[0218] The optimizer then fine-tunes the model parameters of the convolutional neural network based on the adjusted gradients of each block's model parameters.
[0219] In this embodiment, the warm-up fine-tuning-history normalization mode offers the following advantages. First, compared with the warm-up-history normalization mode, it also fine-tunes the model parameters in the first training cycle, so no training cycle is wasted. Second, it dynamically fuses the standard values of each block collected in the first training cycle, that is, it dynamically fuses information from different data. Third, it avoids a second traversal of the model parameters, which reduces the number of traversals and works better with techniques such as asynchronous updates. Under feature shift and image annotation shift, it can significantly improve the model accuracy of the convolutional neural network.
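Steps 701 to 708 can be sketched as a single loop that uses expressions (8) and (9) to update the parameters during the first cycle and then switches to the fixed averages. The sketch relies on hypothetical assumptions: `grad_fn` replaces forward and backward propagation, η is read from each block's "weight" tensor, gradients are scaled as g ← η·g with plain SGD, and current_max is set to running_max after every sample, since steps 701 to 708 do not say when it is refreshed.

```python
import numpy as np


def sgd_scaled(blocks, grads, scale, lr):
    # Assumed adjustment g <- eta * g followed by a plain SGD step, in place.
    for name, arrays in blocks.items():
        for key, theta in arrays.items():
            theta -= lr * scale[name] * grads[name][key]


def warmup_finetune_history(blocks, data, grad_fn, epochs=3, lr=0.1, preset=1.0):
    """Warm-up fine-tuning-history normalization sketch (steps 701-708).

    Assumption: current_max <- running_max after every sample (not stated for 701-708).
    """
    running_max, current_max = 0.0, 1.0
    history = {name: [] for name in blocks}
    for x in data:  # first training cycle, steps 702-706
        grads = grad_fn(blocks, x)
        etas = {}
        for name in blocks:
            g_norm = np.linalg.norm(grads[name]["weight"].ravel())
            w_norm = np.linalg.norm(blocks[name]["weight"].ravel())
            etas[name] = float(min(g_norm / (w_norm * current_max), preset))  # (8)
            running_max = max(running_max, float(g_norm / w_norm))            # (9)
            history[name].append(etas[name])
        sgd_scaled(blocks, grads, etas, lr)  # the first cycle already fine-tunes
        current_max = running_max
    avg = {name: float(np.mean(v)) for name, v in history.items()}  # step 707
    for _ in range(epochs - 1):  # step 708: later cycles use the fixed averages
        for x in data:
            sgd_scaled(blocks, grad_fn(blocks, x), avg, lr)
    return avg


def toy_grad(blocks, x):
    # Gradient of 0.5 * ||theta - x||^2 for every array: simply theta - x.
    return {n: {k: v - x for k, v in arrs.items()} for n, arrs in blocks.items()}


blocks = {"fc": {"weight": np.array([3.0, 4.0])}}
avg = warmup_finetune_history(blocks, [np.array([3.0, 2.0])], toy_grad, epochs=2)
```

In this toy run the first cycle moves the weight to about [3.0, 3.92] with η = 0.4, and the second cycle applies the frozen average 0.4 to the new gradient.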
[0220] The methods of the above embodiments were tested as follows:
[0221] Figures 12-14 show the test results under image appearance shift (CIFAR-C dataset), feature shift (Living-17 dataset), and image annotation shift (CIFAR-10F dataset), respectively, for the existing technique (basic mode), block mode, warm-up mode, warm-up fine-tuning mode, history normalization mode, block-warm-up mode, block-warm-up fine-tuning mode, block-warm-up-history normalization mode, and block-warm-up-history normalization mode.
[0222] In these figures, the horizontal axis represents the fine-tuning modes and the vertical axis represents model accuracy. The test results are as follows:
[0223] 1) The block mode fully considers the relationships between different layers of the convolutional neural network. Under image appearance shift it performs on par with the basic mode, and under feature shift and image annotation shift it is slightly better than the basic mode.
[0224] 2) The warm-up mode integrates information from different data. Used alone, its effect is not obvious; combined with the block-history normalization mode, it is slightly better than the basic mode.
[0225] 3) The warm-up fine-tuning mode dynamically integrates information from different data. Used alone, it clearly outperforms the basic mode under image annotation shift; combined with the block-history normalization mode, it brings a significant improvement under both feature shift and image annotation shift.
[0226] 4) The history normalization mode reduces the number of times model parameters are traversed, which can better cooperate with techniques such as asynchronous updates. When used alone, it can maintain a similar accuracy to the global normalization mode of the basic mode. When combined with block and warm-up mode / warm-up fine-tuning mode, it can significantly improve the model accuracy.
[0227] The adaptive fine-tuning device for convolutional neural networks provided by the present invention will be described below. The adaptive fine-tuning device for convolutional neural networks described below can be referred to in correspondence with the adaptive fine-tuning method for convolutional neural networks described above.
[0228] Refer to Figure 15, which is a schematic structural diagram of the adaptive fine-tuning device for a convolutional neural network provided by the present invention. As shown in Figure 15, the device may include:
[0229] The acquisition module 10 is used to acquire a convolutional neural network for image classification;
[0230] The partitioning module 20 is used to traverse the layers in the convolutional neural network in reverse order and divide the layers in the convolutional neural network into multiple blocks; at least one block includes: multiple adjacent related layers;
[0231] The fine-tuning module 30 is used to input the target domain image dataset into the convolutional neural network, calculate the standard value of each block for each target domain image data, and fine-tune the model parameters of the convolutional neural network based on the standard value of each block.
[0232] Optionally, the fine-tuning module 30 is specifically used for:
[0233] The target domain image dataset is input into the convolutional neural network;
[0234] For each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradient of the model parameters of each block of the convolutional neural network;
[0235] The standard value of each block is obtained by calculating the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block.
[0236] Based on the standard value of each block, adjust the gradient of the model parameters of the block;
[0237] The optimizer fine-tunes the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0238] Optionally, the fine-tuning module 30 is specifically used for:
[0239] During the first training cycle, the target domain image dataset is input into the convolutional neural network;
[0240] For each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradient of the model parameters of each block of the convolutional neural network;
[0241] The standard value of each block is obtained by calculating the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block.
[0242] The average standard value of each block is obtained by statistically analyzing multiple standard values obtained during the first training cycle and calculating the average of the multiple standard values of each block.
[0243] In subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network, and for each target domain image data, the model parameters of the convolutional neural network are fine-tuned based on the average standard value of each block.
[0244] Optionally, the fine-tuning module 30 is also used for:
[0245] During the first training cycle, after obtaining the standard value of each block, and before statistically analyzing the multiple standard values of each block obtained during the first training cycle, the gradient of the model parameters of the block is adjusted based on the standard value of each block.
[0246] The optimizer fine-tunes the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0247] Optionally, the fine-tuning module 30 is specifically used for:
[0248] The target domain image dataset is input into the convolutional neural network;
[0249] For each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradient of the model parameters of each block of the convolutional neural network;
[0250] Calculate the product between the L2 norm of the model parameters of each block and the historical value of the first variable, calculate the ratio between the L2 norm of the gradient of the model parameters of the block and the product, calculate the minimum value between the ratio and the preset value, and obtain the standard value of each block;
[0251] The current value of the second variable is updated to the maximum of its historical value and the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block.
[0252] Based on the standard value of each block, adjust the gradient of the model parameters of the block;
[0253] The optimizer is used to fine-tune the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0254] Update the current value of the first variable based on the current value of the second variable.
[0255] Optionally, the fine-tuning module 30 is specifically used for:
[0256] During the first training cycle, the target domain image dataset is input into the convolutional neural network;
[0257] For each target domain image data, forward propagation and backward propagation calculations are performed to obtain the gradient of the model parameters of each block of the convolutional neural network;
[0258] Calculate the product between the L2 norm of the model parameters of each block and the historical value of the first variable, calculate the ratio between the L2 norm of the gradient of the model parameters of the block and the product, calculate the minimum value between the ratio and the preset value, and obtain the standard value of each block;
[0259] The current value of the second variable is updated to the maximum of its historical value and the ratio between the L2 norm of the gradient of the model parameters of each block and the L2 norm of the model parameters of the block.
[0260] The average standard value of each block is obtained by statistically analyzing multiple standard values obtained during the first training cycle and calculating the average of the multiple standard values of each block.
[0261] In subsequent training cycles, the target domain image dataset is re-input into the convolutional neural network, and for each target domain image data, the model parameters of the convolutional neural network are fine-tuned based on the average standard value of each block.
[0262] Optionally, the fine-tuning module 30 is also used for:
[0263] During the first training cycle, after updating the current value of the second variable and before statistically analyzing the multiple standard values of each block obtained during the first training cycle, the gradient of the model parameters of the block is adjusted based on the standard value of each block.
[0264] The optimizer fine-tunes the model parameters of the convolutional neural network based on the gradient of the adjusted model parameters of each block.
[0265] Optionally, the block includes fully connected blocks and convolutional blocks, and the partitioning module 20 is specifically used for:
[0266] The fully connected layer in the convolutional neural network is assigned to the fully connected block;
[0267] Adjacent convolutional layers and batch normalization layers in the convolutional neural network are grouped into the convolutional blocks.
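The reverse-order partitioning performed by partitioning module 20 can be sketched as below. The layer-list format and the handling of layers the text does not mention are assumptions. Parameter-free layers such as ReLU or pooling are skipped. A batch normalization layer that directly follows a convolutional layer is merged with it into one convolutional block. Any remaining convolutional, batch normalization, or fully connected layer forms a block of its own.

```python
def partition_blocks(layers):
    """Group layers into blocks while traversing them from the output backwards.

    layers: list of (name, kind) pairs in forward order; kind is e.g. "conv", "bn", "fc".
    Returns a list of blocks (lists of layer names), output-side blocks first.
    """
    blocks = []
    i = len(layers) - 1
    while i >= 0:
        name, kind = layers[i]
        if kind == "bn" and i > 0 and layers[i - 1][1] == "conv":
            # Adjacent convolution + batch normalization form one convolutional block.
            blocks.append([layers[i - 1][0], name])
            i -= 2
            continue
        if kind in ("fc", "conv", "bn"):
            # Fully connected block, or a conv/BN layer without an adjacent partner.
            blocks.append([name])
        # Any other kind (ReLU, pooling, ...) is treated as parameter-free and skipped.
        i -= 1
    return blocks


layers = [("conv1", "conv"), ("bn1", "bn"), ("relu1", "relu"),
          ("conv2", "conv"), ("bn2", "bn"), ("relu2", "relu"),
          ("pool", "pool"), ("fc", "fc")]
print(partition_blocks(layers))  # [['fc'], ['conv2', 'bn2'], ['conv1', 'bn1']]
```

Returning the output-side blocks first mirrors the reverse-order traversal. The order of the blocks does not affect their standard values.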
[0268] Optionally, the standard value of the fully connected block is the standard value of the weights of the fully connected layer, and the standard value of the convolutional block is the standard value of the weights of the convolutional layer.
[0269] Figure 16 is a schematic structural diagram of the electronic device provided by the present invention. As shown in Figure 16, the electronic device may include: a processor 810, a communication interface 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820, and the memory 830 communicate with each other via the communication bus 840. The processor 810 can call logical instructions in the memory 830 to execute an adaptive fine-tuning method for a convolutional neural network, the method including:
[0270] Obtain a convolutional neural network for image classification;
[0271] The layers in the convolutional neural network are traversed in reverse order, and each layer in the convolutional neural network is divided into multiple blocks; at least one block includes: multiple adjacent related layers;
[0272] The target domain image dataset is input into the convolutional neural network. For each target domain image data, a standard value for each block is calculated, and the model parameters of the convolutional neural network are fine-tuned based on the standard value of each block.
[0273] Furthermore, the logical instructions in the memory 830 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or part of the technical solution, may be embodied as a software product. This computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, server, network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a portable hard drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
[0274] In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to execute the adaptive fine-tuning method for a convolutional neural network provided above, the method comprising:
[0275] obtaining a convolutional neural network for image classification;
[0276] traversing the layers of the convolutional neural network in reverse order and dividing them into multiple blocks, at least one block including multiple adjacent related layers; and
[0277] inputting the target domain image dataset into the convolutional neural network, calculating a standard value for each block for each item of target domain image data, and fine-tuning the model parameters of the convolutional neural network based on the standard value of each block.
[0278] In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the adaptive fine-tuning method for a convolutional neural network provided above, the method comprising:
[0279] obtaining a convolutional neural network for image classification;
[0280] traversing the layers of the convolutional neural network in reverse order and dividing them into multiple blocks, at least one block including multiple adjacent related layers; and
[0281] inputting the target domain image dataset into the convolutional neural network, calculating a standard value for each block for each item of target domain image data, and fine-tuning the model parameters of the convolutional neural network based on the standard value of each block.
[0282] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0283] From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. On this understanding, the above technical solutions in essence, or the part contributing over the prior art, may be embodied as a software product. This computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, server, network device, etc.) to execute the methods described in the embodiments or in parts of the embodiments.
[0284] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. An adaptive fine-tuning method for a convolutional neural network, characterized by comprising: obtaining a convolutional neural network for image classification; traversing the layers of the convolutional neural network in reverse order and dividing them into multiple blocks, at least one block including multiple adjacent related layers; and inputting the target domain image dataset into the convolutional neural network, calculating the standard value of each block for each item of target domain image data, and fine-tuning the model parameters of the convolutional neural network based on the standard value of each block; wherein fine-tuning the model parameters of the convolutional neural network based on the standard value of each block comprises: performing forward propagation and backward propagation for each item of target domain image data to obtain the gradients of the model parameters of each block of the convolutional neural network; adjusting the gradients of the model parameters of each block based on the standard value of the block, or based on the average standard value of the block obtained by statistically analyzing multiple standard values; and fine-tuning, by an optimizer, the model parameters of the convolutional neural network based on the adjusted gradients of the model parameters of each block.
2. The adaptive fine-tuning method for a convolutional neural network according to claim 1, wherein calculating the standard value of each block comprises: obtaining the standard value of each block as the ratio of the L2 norm of the gradient of the model parameters of the block to the L2 norm of the model parameters of the block; and adjusting the gradient of the model parameters of the block based on the standard value of each block, or based on the average standard value of each block obtained by statistically analyzing multiple standard values, comprises: adjusting the gradient of the model parameters of each block based on the standard value of the block.
3. The adaptive fine-tuning method for a convolutional neural network according to claim 1, wherein calculating the standard value of each block comprises: during the first training cycle, inputting the target domain image dataset into the convolutional neural network and calculating the ratio of the L2 norm of the gradient of the model parameters of each block to the L2 norm of the model parameters of the block to obtain the standard value of each block; the method further comprises: statistically analyzing the multiple standard values of each block obtained during the first training cycle and averaging them to obtain the average standard value of each block; and adjusting the gradient of the model parameters of the block based on the standard value of each block, or based on the average standard value of each block obtained by statistically analyzing multiple standard values, comprises: in subsequent training cycles, re-inputting the target domain image dataset into the convolutional neural network and adjusting the gradient of the model parameters of each block based on the average standard value of the block.
4. The adaptive fine-tuning method for a convolutional neural network according to claim 3, wherein, during the first training cycle, after the standard value of each block is obtained and before the multiple standard values of each block obtained during the first training cycle are statistically analyzed, the method further comprises: adjusting the gradient of the model parameters of each block based on the standard value of the block; and fine-tuning, by the optimizer, the model parameters of the convolutional neural network based on the adjusted gradients of the model parameters of each block.
5. The adaptive fine-tuning method for a convolutional neural network according to claim 1, wherein calculating the standard value of each block comprises: calculating the product of the L2 norm of the model parameters of each block and the historical value of the first variable, calculating the ratio of the L2 norm of the gradient of the model parameters of the block to that product, and taking the minimum of this ratio and a preset value as the standard value of the block; the method further comprises: updating the current value of the second variable based on the ratio of the L2 norm of the gradient of the model parameters of each block to the L2 norm of the model parameters of the block and the maximum of the historical values of the second variable; and updating the current value of the first variable based on the current value of the second variable; wherein the first variable records the maximum value used in the current normalization operation, and the second variable tracks the historical maximum value; and adjusting the gradient of the model parameters of the block based on the standard value of each block, or based on the average standard value of each block obtained by statistically analyzing multiple standard values, comprises: adjusting the gradient of the model parameters of each block based on the standard value of the block.
6. The adaptive fine-tuning method for a convolutional neural network according to claim 1, wherein calculating the standard value of each block comprises: during the first training cycle, inputting the target domain image dataset into the convolutional neural network, calculating the product of the L2 norm of the model parameters of each block and the historical value of the first variable, calculating the ratio of the L2 norm of the gradient of the model parameters of the block to that product, and taking the minimum of this ratio and a preset value as the standard value of the block; the method further comprises: updating the current value of the second variable based on the ratio of the L2 norm of the gradient of the model parameters of each block to the L2 norm of the model parameters of the block and the maximum of the historical values of the second variable; obtaining the average standard value of each block by statistically analyzing the multiple standard values obtained during the first training cycle and averaging the multiple standard values of each block; and updating the current value of the first variable based on the current value of the second variable; wherein the first variable records the maximum value used in the current normalization operation, and the second variable tracks the historical maximum value; and adjusting the gradient of the model parameters of the block based on the standard value of each block, or based on the average standard value of each block obtained by statistically analyzing multiple standard values, comprises: in subsequent training cycles, re-inputting the target domain image dataset into the convolutional neural network and adjusting the gradient of the model parameters of each block based on the average standard value of the block.
7. The adaptive fine-tuning method for a convolutional neural network according to claim 6, wherein, during the first training cycle, after the current value of the second variable is updated and before the multiple standard values obtained for each block during the first training cycle are statistically analyzed, the method further comprises: adjusting the gradient of the model parameters of each block based on the standard value of the block; and fine-tuning, by the optimizer, the model parameters of the convolutional neural network based on the adjusted gradients of the model parameters of each block.
8. The adaptive fine-tuning method for a convolutional neural network according to any one of claims 1 to 7, wherein the blocks include fully connected blocks and convolutional blocks, and dividing the layers of the convolutional neural network into multiple blocks comprises: assigning the fully connected layer of the convolutional neural network to the fully connected block; and grouping adjacent convolutional layers and batch normalization layers of the convolutional neural network into convolutional blocks.
9. The adaptive fine-tuning method for a convolutional neural network according to claim 8, wherein the standard value of the fully connected block is the standard value of the weights of the fully connected layer, and the standard value of the convolutional block is the standard value of the weights of the convolutional layer.
10. An adaptive fine-tuning device for a convolutional neural network, characterized by comprising: an acquisition module configured to acquire a convolutional neural network for image classification; a partitioning module configured to traverse the layers of the convolutional neural network in reverse order and divide them into multiple blocks, at least one block including multiple adjacent related layers; and a fine-tuning module configured to input the target domain image dataset into the convolutional neural network, calculate the standard value of each block for each item of target domain image data, and fine-tune the model parameters of the convolutional neural network based on the standard value of each block; wherein the fine-tuning module is specifically configured to: perform forward propagation and backward propagation for each item of target domain image data to obtain the gradients of the model parameters of each block of the convolutional neural network; adjust the gradients of the model parameters of each block based on the standard value of the block, or based on the average standard value of the block obtained by statistically analyzing multiple standard values; and fine-tune the model parameters of the convolutional neural network with an optimizer based on the adjusted gradients of the model parameters of each block.
11. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when executing the program, the processor implements the steps of the adaptive fine-tuning method for a convolutional neural network according to any one of claims 1 to 9.
12. A non-transitory computer-readable storage medium having stored thereon a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the adaptive fine-tuning method for a convolutional neural network according to any one of claims 1 to 9.
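Claims 5 and 6 define a clipped variant of the standard value that tracks two per-block variables. The sketch below follows the order given in claim 5 (compute the value with the first variable's historical value, then update the second variable, then the first), but several details are assumptions the claims leave open: the first variable starts at 1.0 and the second at 0.0, the second variable becomes the maximum of the current raw ratio and its previous value, the first variable is then simply set equal to the second, and the preset clipping value defaults to 1.0. `BlockState` and `clipped_standard_value` are hypothetical names.

```python
import math
from dataclasses import dataclass


def l2(xs):
    return math.sqrt(sum(x * x for x in xs))


@dataclass
class BlockState:
    first_var: float = 1.0   # ASSUMED initial value; not given in the source
    second_var: float = 0.0  # ASSUMED initial value (no history yet)


def clipped_standard_value(weights, grads, state, preset=1.0, eps=1e-12):
    w_norm = l2(weights)
    g_norm = l2(grads)
    # Gradient L2 norm over (weight L2 norm x historical first variable),
    # clipped from above at the preset value
    s = min(g_norm / (w_norm * state.first_var + eps), preset)
    # Second variable: running maximum of the raw gradient-to-weight ratio
    raw_ratio = g_norm / (w_norm + eps)
    state.second_var = max(raw_ratio, state.second_var)
    # First variable updated from the second; ASSUMED to copy it directly
    state.first_var = state.second_var
    return s
```

Because the ratio is divided by the previous step's first variable, the value saturates at the preset bound whenever a block's gradient-to-weight ratio jumps well above its tracked historical maximum.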