A neural network model training method, device and electronic equipment

The front end of a large-capacity neural network model is pre-trained on feature vector labels extracted by a smaller-capacity model, and the whole model is then fine-tuned together with its fully connected layers. This addresses the training difficulty and low accuracy of large-capacity neural network models and makes training faster and more efficient.

CN116432728B · Active · Publication Date: 2026-06-19 · SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD
Filing Date: 2021-12-31
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Large-capacity neural network models are difficult to train and reach low training accuracy: they have many parameters and require a large number of training samples, making it difficult to achieve high accuracy standards.

Method used

The front end of a large-capacity neural network model (the model with its fully connected layers removed) is pre-trained first. A smaller-capacity neural network model is trained on a small amount of training data and then used to extract feature vector labels for that pre-training. The pre-trained front end is subsequently combined with the fully connected layers and fine-tuned.

Benefits of technology

It reduces the number of training parameters, lowers the training difficulty, improves training speed and accuracy, and solves the problem of training large-capacity neural network models.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention discloses a method, apparatus, and electronic device for training a neural network model. The method includes: pre-training the front-end portion of a target neural network model based on training data, wherein the front-end portion is the remaining part of the target neural network model after removing fully connected layers; and performing secondary training on the target neural network model after training the front-end portion based on the training data. The technical solution provided by this invention improves the training accuracy of large-capacity neural network models.

Description

Technical Field

[0001] This invention relates to the field of machine learning, and more specifically to a method, apparatus, and electronic device for training neural network models.

Background Technology

[0002] With the development of facial recognition and natural language processing technologies, applications of neural networks and deep learning are booming. Generally, for accurately classifying and predicting input data, the larger the capacity (number of neurons and layers) of a neural network model, the higher the accuracy it can reach through training. However, in practical applications, it has been found that directly training large-capacity neural network models using training data does not yield good results. This is because a large-capacity neural network model has too many parameters to adjust and requires a large number of training samples. If the number of training samples is small and the number of training iterations is insufficient, it is too difficult to ensure that all adjustable parameters simultaneously satisfy the expected outputs for the training data. Because so many parameters require adjustment, even numerous training iterations rarely achieve a high level of accuracy. Therefore, improving the training accuracy of large-capacity neural network models is an urgent problem to be solved.

Summary of the Invention

[0003] In view of this, embodiments of the present invention provide a neural network model training method, apparatus, and electronic device, thereby improving the training accuracy of large-capacity neural network models.

[0004] According to a first aspect, the present invention provides a neural network model training method, the method comprising: pre-training a front-end portion of a target neural network model based on training data, wherein the front-end portion is the remaining portion of the target neural network model after removing fully connected layers; and performing secondary training on the target neural network model after the front-end portion has been trained based on the training data.

[0005] Optionally, the pre-training of the front-end portion of the target neural network model based on training data includes: extracting a predetermined proportion of data from the training data to form a sample subset, wherein the training data includes input data for inputting into the model and a first data label for labeling the categories of the input data; inputting the input data in the sample subset into a predetermined first neural network model to obtain a data classification result, wherein the capacity of the first neural network model is smaller than that of the target neural network model; correcting the parameters of the first neural network model based on the error between the data classification result and the first data label in the sample subset; removing the fully connected layers of the corrected first neural network model to obtain a second neural network model; extracting feature vectors from the input data in the sample subset using the second neural network model, and using the obtained feature vectors as second data labels, which, together with the input data in the sample subset, form second training data; and training the front-end portion based on the second training data.

[0006] Optionally, performing secondary training on the target neural network model after the front-end portion has been trained, based on the training data, includes: extracting a second preset proportion of data from the training data to form a second sample subset; and training the target neural network model, after the front-end portion has been trained, based on the second sample subset.

[0007] Optionally, before performing secondary training on the target neural network model after the front-end portion has been trained based on the training data, the method further includes: pre-training the fully connected layers in the target neural network model based on the training data.

[0008] Optionally, the pre-training of the fully connected layer in the target neural network model based on the training data includes: extracting a second feature vector from the input data in the training data using the front-end portion; associating each of the second feature vectors with the corresponding first data labels based on the mapping relationship between the second feature vectors and the first data labels to form third training data; and training the fully connected layer in the target neural network model based on the third training data.

[0009] Optionally, the first neural network model is ResNet-101, and the target neural network model is a Transformer backbone network.

[0010] Optionally, the training data is facial image data, and the steps for generating the training data include: acquiring facial images and adding color noise to each facial image; flipping each noise-augmented facial image at multiple angles; stitching together the flipped facial images belonging to the same person; and associating each stitched facial image with the corresponding person information to generate the training data.

[0011] According to a second aspect, the present invention provides a neural network model training apparatus, the apparatus comprising: a pre-training module for pre-training a front-end portion of a target neural network model based on training data, the front-end portion being the remaining portion of the target neural network model after removing fully connected layers; and a full training module for performing secondary training on the target neural network model after the front-end portion has been trained based on the training data.

[0012] According to a third aspect, embodiments of the present invention provide an electronic device, including: a memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the computer instructions to perform the method described in the first aspect, or any optional embodiment of the first aspect.

[0013] According to a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing computer instructions for causing a computer to perform the method described in the first aspect, or any optional embodiment of the first aspect.

[0014] The technical solution provided in this application has the following advantages:

[0015] The technical solution provided in this application addresses the difficulty of training large-capacity neural network models. First, it pre-trains the front-end portion of the large-capacity neural network model, that is, the model excluding its fully connected layers, which ensures the accuracy of the front-end portion used to output data feature vectors. Because this training stage leaves out the large number of neurons in the fully connected layers, the number of model parameters that need to be trained is reduced. In other words, fewer model parameters need to simultaneously meet preset conditions, resulting in faster training speed and lower training difficulty. After the front-end portion is trained, it is combined with the fully connected layers for further training, so that only minor adjustments to the model parameters are needed in the subsequent training process. As the adjustment range of the model parameters decreases, the overall training difficulty of the model is significantly reduced, thereby improving the training accuracy of large-capacity neural network models.
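As a non-limiting illustration of how many parameters the front-end pre-training leaves out, the following minimal sketch counts the weights and biases of a single fully connected classification layer. The feature dimension and the number of identities are hypothetical values chosen only for the arithmetic; they are not taken from the embodiments.

```python
def fc_param_count(feature_dim, num_classes):
    """Weights plus biases of one fully connected classification layer."""
    return feature_dim * num_classes + num_classes


# Hypothetical sizes, chosen only for illustration.
feature_dim = 512
num_identities = 1_000_000

excluded = fc_param_count(feature_dim, num_identities)
print(f"parameters left out of front-end pre-training: {excluded:,}")
```

Under these assumed sizes the classification layer alone would hold several hundred million parameters, none of which have to be adjusted while the front end is pre-trained.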

[0016] Furthermore, the output of the front end of a neural network model is a feature vector, while the actual labels of the training data are not feature vectors. Therefore, standard feature vector labels are usually unknown. To address the issue of unknown training labels for the front end, this invention first trains a smaller, simpler first neural network model based on the training data. Due to the simplicity of the model, the required amount of training data is much smaller, making it easier to achieve good training results. Then, the trained, smaller-capacity neural network model is used to extract features from the training data, obtaining the feature vectors as they stand before computation through the fully connected layers. This solves the problem that data labels for training the front end are difficult to obtain. The obtained feature vectors are then used as labels and combined with the original input data to form new training data. This new training data is used to train the front end of the large-capacity neural network model, effectively realizing the technical solution of training one part of the large-capacity neural network model first and then training the other part. This improves the training accuracy of the large-capacity neural network model.

Attached Figure Description

[0017] The features and advantages of the invention will be more clearly understood by referring to the accompanying drawings, which are schematic and should not be construed as limiting the invention in any way. In the drawings:

[0018] Figure 1 illustrates the steps of a neural network model training method according to one embodiment of the present invention.

[0019] Figure 2 shows a flowchart of a neural network model training method according to one embodiment of the present invention.

[0020] Figure 3 shows a schematic diagram of a neural network model training device according to one embodiment of the present invention.

[0021] Figure 4 shows a schematic diagram of a neural network model training device according to one embodiment of the present invention.

Detailed Implementation

[0022] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0023] Referring to Figure 1 and Figure 2, in one embodiment, a neural network model training method specifically includes the following steps:

[0024] Step S101: Pre-train the front-end part of the target neural network model based on the training data. The front-end part is the remaining part of the target neural network model after removing the fully connected layers.

[0025] Step S102: Perform secondary training on the target neural network model after the front-end part has been trained based on the training data.

[0026] Specifically, to address the low training accuracy of large-capacity neural network models caused by the large number of parameters requiring training and the resulting training difficulty, this embodiment first splits the large-capacity neural network model into two parts: a front-end part, which excludes the fully connected layers, and the fully connected layers themselves. The front-end part transforms training data into feature vectors, while the fully connected layers are classification layers that take feature vectors as input and calculate the data category from them. The front-end part is pre-trained on its own. Its capacity is significantly smaller than that of the overall target neural network model, so fewer model parameters and fewer layers must simultaneously meet the training conditions, which reduces training difficulty and leads to a more accurate model. The initially acquired training data typically includes input data and its category labels. For example, in face recognition data, the input data consists of a large number of face images, and the category labels are the personnel information (including but not limited to identity IDs) corresponding to each face image. It should be noted that the labels used for pre-training the front-end part are not the personnel information from the initial training data, but standard feature vector labels (corresponding to the actual numerical form of the feature vectors output by the front-end part). In this embodiment, the standard feature vector labels are obtained through pre-training experiments with other models. After the front-end part is trained, the front-end part and the fully connected layers are combined for full training (it should be noted that the combined network is the complete target neural network model, and the labels of the full training data are the category labels of the initial input data, such as personnel information). As a result, the model parameters only need to be fine-tuned in the subsequent training process. Since the adjustment range of the model parameters is greatly reduced, the overall training difficulty of the model is greatly reduced, thereby improving the training accuracy of large-capacity neural network models.
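As a non-limiting illustration of the split described above, the following minimal sketch represents a model as an ordered list of layers and separates the trailing fully connected layers from the front-end part. The `Layer` class, the layer names, and the `"fc"` tag are hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kind: str  # e.g. "conv", "attention", or "fc" for fully connected


def split_model(layers):
    """Split a layer list into (front_end, fully_connected_layers).

    The trailing run of "fc" layers is treated as the classification part;
    everything before it is the front-end part that outputs feature vectors.
    """
    cut = len(layers)
    while cut > 0 and layers[cut - 1].kind == "fc":
        cut -= 1
    return layers[:cut], layers[cut:]


# Hypothetical target model with one classification layer at the end.
target_model = [
    Layer("patch_embed", "conv"),
    Layer("block_1", "attention"),
    Layer("block_2", "attention"),
    Layer("classifier", "fc"),
]
front_end, fc_layers = split_model(target_model)
print([layer.name for layer in front_end], [layer.name for layer in fc_layers])
```

In step S101 only the layers in `front_end` would be trained; in step S102 the two lists are joined again and trained as one model.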

[0027] Specifically, in one embodiment, step S101 above includes the following steps:

[0028] Step 1: Extract a predetermined proportion of data from the training data to form a sample subset. The training data includes input data for inputting into the model and a first data label for labeling the categories of the input data.

[0029] Step 2: Input the input data from the sample subset into the first neural network model to obtain the data classification result.

[0030] Step 3: Based on the error between the data classification result and the first data label in the sample subset, the parameters of the first neural network model are corrected. The capacity of the first neural network model is smaller than that of the target neural network model.

[0031] Step 4: Remove the fully connected layers of the corrected first neural network model to obtain the second neural network model.

[0032] Step 5: Use the second neural network model to extract the feature vectors of the input data in the sample subset, and use the obtained feature vectors as the second data labels to form the second training data with the input data in the sample subset.

[0033] Step 6: Train the front-end part based on the second training data.

[0034] Specifically, in this embodiment, training the front end of the target neural network model first requires obtaining training samples for the front end. For example, the input is facial image data, and the required output is feature vector labels. However, usually only the personnel information labels (i.e., the first data labels) are known, while the feature vector labels are usually unknown and difficult to obtain. Therefore, in this embodiment, a simpler neural network model (the first neural network model) with a smaller capacity than the target neural network model is first initialized. Training this model does not require a large amount of training data to achieve a good training effect. Therefore, a preset proportion of data is extracted from the training data to form a sample subset. In this embodiment, 1/10 of the original training data is used to form the sample subset, which greatly reduces the computational load of training and lowers the requirements for hardware resources. The first neural network model is then trained using the sample subset. For example, face image data is input into the first neural network model, which outputs predicted person information. The error between the predicted person information and the actual person information is then used to correct the model parameters of the first neural network model. It is important to note that the first neural network model in the above training process has fully connected layers and outputs the data category (i.e., person information). When training is complete, the fully connected layers of the first neural network model are removed to obtain the second neural network model. The second neural network model is then used to calculate the output for each face image. At this point, the output is a feature vector for each image rather than the final person information label. The feature vectors output by the simple model are then used as theoretically more standard feature vector labels. Finally, each output feature vector label is paired with the corresponding face image data to obtain the training samples (i.e., the second training data) used to train the front end of the large-capacity target neural network model.
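As a non-limiting illustration of Steps 1 to 6, the following minimal NumPy sketch builds the second training data from a small model and fits a front end to it. Everything concrete here is an assumption made for illustration: the toy data, the single `tanh` layer standing in for both networks, the mean-squared-error loss, and the learning rate. The classification training of the first model (Steps 2 and 3) is not shown, and `w_small` simply stands in for its already corrected parameters.

```python
import numpy as np


def sample_subset(inputs, labels, proportion, rng):
    """Step 1: draw a fixed proportion of (input, label) pairs without replacement."""
    size = max(1, round(len(inputs) * proportion))
    idx = rng.choice(len(inputs), size=size, replace=False)
    return inputs[idx], labels[idx]


def small_model_features(x, w_small):
    """Steps 4-5: the first model with its fully connected layers removed."""
    return np.tanh(x @ w_small)


def pretrain_front_end(x, feature_labels, w_front, lr=0.1, epochs=200):
    """Step 6: regress the front-end outputs onto the feature-vector labels."""
    w_front = w_front.copy()
    losses = []
    for _ in range(epochs):
        pred = np.tanh(x @ w_front)
        err = pred - feature_labels
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error through the tanh activation.
        grad = x.T @ (err * (1.0 - pred ** 2)) * (2.0 / err.size)
        w_front -= lr * grad
    return w_front, losses


rng = np.random.default_rng(0)
inputs = rng.normal(size=(200, 8))        # toy stand-in for face images
labels = rng.integers(0, 5, size=200)     # toy stand-in for personnel IDs
x_sub, y_sub = sample_subset(inputs, labels, 0.1, rng)

# Assumption: w_small holds parameters already corrected on (x_sub, y_sub).
w_small = 0.5 * rng.normal(size=(8, 4))
feature_labels = small_model_features(x_sub, w_small)  # second data labels
second_training_data = (x_sub, feature_labels)

w_front, losses = pretrain_front_end(x_sub, feature_labels, np.zeros((8, 4)))
print(f"front-end loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In this toy setting the front end has the same shape as the small model, whereas in the embodiments it has the larger capacity. The point of the sketch is only that the personnel labels are never used in Step 6; the regression targets are the feature vectors.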

[0035] In this embodiment, the training data used is facial image data. The first neural network model is the convolutional neural network (CNN) ResNet-101, but it can also be another residual network structure such as ResNet-152; this invention is not limited to these. ResNet-101 is a convolutional neural network with relatively small capacity, fast training speed, and high training accuracy, so feature vector labels produced with the ResNet-101 network structure are more accurate. Although the ResNet-101 model performs well, its capacity is limited, and its performance cannot be further improved by increasing the model capacity; its classification performance is not good enough for data sets of millions or tens of millions of samples. Therefore, the large-capacity target neural network model finally trained in this embodiment adopts the Transformer backbone network proposed in recent years. This neural network model has a large capacity and wide application. Compared with convolutional neural networks, the Transformer backbone network has a global receptive field and the ability to adaptively extract features, resulting in better recognition performance for large amounts of facial data. However, with existing techniques, the training effect for this network is limited by the amount of data and the number of training iterations, making training difficult and the training effect not good enough. By employing the steps described above, a small set of training samples (the second training data) for training the front end can be constructed using ResNet-101. This small set is then used to pre-train the front end of the target neural network model, giving excellent training performance with a significantly reduced computational load. Subsequently, another small amount of training data is used to train the pre-trained target neural network model, effectively fine-tuning its parameters. This not only greatly improves training accuracy but also significantly reduces the amount of training data compared to directly training a large-capacity target neural network model, thus lowering the training complexity of the target neural network model.

[0036] Specifically, in one embodiment, step S102 above includes the following steps:

[0037] Step 7: Extract a second preset proportion of data from the training data to form a second sample subset.

[0038] Step 8: Train the target neural network model after the front-end part is trained based on the second sample subset.

[0039] Specifically, once the front end of the target neural network model is trained, its model parameters are already in a relatively ideal state. In this embodiment, the front end and the fully connected layers are merged, and training data is then used to train the merged target neural network model. For example, facial image data is input to obtain predicted personnel information, and the target neural network model is corrected based on the error between the predicted and actual personnel information to complete the training. On the one hand, although the previous embodiment used a smaller-capacity first neural network model to obtain theoretically accurate feature vector labels and trained the front end on these labels, the feature vector labels are not absolutely accurate. Therefore, in this embodiment, the target neural network model is trained again using facial image data and personnel information, so that the front-end model parameters are fine-tuned once more, further improving the accuracy of the target neural network model. On the other hand, because the front end is already trained, this training step only needs to make major adjustments to the model parameters of the subsequent fully connected layers, so the training process focuses mainly on the fully connected layers. This not only makes the model parameters of the fully connected layers more accurate but also achieves good results without extensive training, further reducing hardware resource requirements and solving the problem that large-capacity neural network models are difficult to train with existing hardware. Therefore, a second preset proportion of data is extracted from the training data to form a second sample subset (in this embodiment, another part of the training data is extracted at a ratio of 1/10). Less training data results in less computation, lower training complexity, and lower training difficulty.
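As a non-limiting illustration of drawing the two sample subsets, the following minimal sketch selects two 1/10 portions of the training data by index. Making the two subsets disjoint is an assumed reading of "another part of the training data"; the function name, the seed, and the data set size are hypothetical.

```python
import random


def draw_sample_subsets(num_samples, proportions, seed=0):
    """Return one list of indices per proportion, with no index shared between lists."""
    order = list(range(num_samples))
    random.Random(seed).shuffle(order)
    subsets, start = [], 0
    for proportion in proportions:
        size = round(num_samples * proportion)
        subsets.append(order[start:start + size])
        start += size
    return subsets


# First subset: used to build the feature vector labels and pre-train the front end.
# Second subset: used for the secondary training of the merged model.
first_subset, second_subset = draw_sample_subsets(1000, [0.1, 0.1])
print(len(first_subset), len(second_subset))
```

Because both lists are slices of one shuffled order, the secondary training in this sketch sees samples that played no part in producing the feature vector labels.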

[0040] Specifically, in one embodiment, the training data is facial image data, the input data is multiple facial images, the first data label is the personnel information corresponding to each facial image, and the specific steps for generating the training data include:

[0041] Step 9: Acquire face images and add color noise to each face image.

[0042] Step 10: Flip each noise-augmented face image at multiple angles.

[0043] Step 11: Stitch together the flipped face images belonging to the same person.

[0044] Step 12: Associate each stitched face image with the corresponding personnel information to generate training data.

[0045] Specifically, in this embodiment, the training data used is facial image data. To further improve the accuracy of model training, data augmentation was performed on the training data. First, different color noise was added to each facial image, including but not limited to dividing the color image into three channels (RGB) and adding Gaussian noise to each channel. This Gaussian noise creates colored noise points on the image. Then, the color image was converted from RGB color mode to YCC color mode, and Gaussian noise was added to the brightness channel. The YCC color mode was then converted back to RGB mode. This way, the added Gaussian noise only manifests as noise points with varying brightness on the image and does not affect the color. Next, the facial image data with added color noise was flipped at different angles, and then images from multiple angles were stitched together, for example, using Mosaic data augmentation technology to stitch together four images from different facial angles. Finally, the processed facial images were associated with corresponding personnel information tags to generate training data. Through the above data augmentation methods, the trained model can perform accurate face recognition even under complex lighting conditions and with facial angle shifts, further improving the training accuracy of the target neural network.
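As a non-limiting illustration of Steps 9 to 11, the following minimal NumPy sketch adds per-channel Gaussian noise, adds brightness-only noise through a luma/chroma conversion, produces flipped views, and stitches four views into one mosaic. Several details are assumptions: "YCC" is read as YCbCr with BT.601 coefficients, images are floating-point arrays in [0, 1] with shape (height, width, 3), the flips are horizontal, vertical, and both combined, and the mosaic is a plain 2 x 2 grid of equally sized views.

```python
import numpy as np

# BT.601 RGB -> YCbCr matrix without offsets (an assumed reading of "YCC").
RGB_TO_YCC = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])
YCC_TO_RGB = np.linalg.inv(RGB_TO_YCC)


def add_channel_noise(img, sigma, rng):
    """Add independent Gaussian noise to the R, G and B channels (colored noise points)."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)


def add_brightness_noise(img, sigma, rng):
    """Add Gaussian noise to the brightness channel only, then convert back to RGB."""
    ycc = img @ RGB_TO_YCC.T
    ycc[..., 0] += rng.normal(0.0, sigma, size=img.shape[:2])
    return np.clip(ycc @ YCC_TO_RGB.T, 0.0, 1.0)


def flipped_views(img):
    """Original, horizontal flip, vertical flip, and both flips combined."""
    return [img, np.flip(img, axis=1), np.flip(img, axis=0), np.rot90(img, 2)]


def mosaic(views):
    """Stitch four equally sized views into a 2 x 2 grid."""
    top = np.concatenate(views[:2], axis=1)
    bottom = np.concatenate(views[2:4], axis=1)
    return np.concatenate([top, bottom], axis=0)


rng = np.random.default_rng(0)
face = rng.random((4, 6, 3))  # toy stand-in for one face image
noisy = add_brightness_noise(add_channel_noise(face, 0.05, rng), 0.05, rng)
stitched = mosaic(flipped_views(noisy))
print(stitched.shape)
```

Each stitched image would then be paired with the personnel information of the person it shows, as in Step 12.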

[0046] Specifically, in one embodiment, prior to step S102 above, a neural network model training method further includes the following steps:

[0047] Step 13: Pre-train the fully connected layers in the target neural network model based on the training data. Specifically, to further improve the training accuracy of the target neural network model, in addition to pre-training the front end, this embodiment also pre-trains the fully connected layers. In this embodiment, the specific training steps for the fully connected layers of the target neural network model are as follows:

[0048] 1. Use the front-end portion to extract the second feature vector of the input data in the training data.

[0049] 2. Based on the mapping relationship between the second feature vector and the first data label, each second feature vector is associated with its corresponding first data label to form the third training data.

[0050] 3. Train the fully connected layers in the target neural network model based on the third training data.

[0051] After the front-end is trained, it extracts feature vectors from the input data, such as feature vectors from face image data. Since the capacity of the front-end is larger than that of the first neural network model without the fully connected layer, the feature vectors extracted by the front-end (second feature vectors) are usually more accurate than those extracted by the first neural network model. Then, using the mapping relationship between each second feature vector and the first data label (e.g., personnel information label), each second feature vector is associated with its corresponding first data label (usually multiple feature vectors correspond to one data label). The second feature vectors obtained from the associated third training data are then used as input to the fully connected layer to calculate the predicted data label. Finally, the error between the predicted data label and the first data label is used to adjust the model parameters of the fully connected layer, thus completing the pre-training of the fully connected layer. This achieves the goal of requiring only fine-tuning in subsequent training steps, further reducing the training difficulty of the target neural network model and improving training accuracy. To further improve the accuracy of the pre-training of the fully connected layer, in this embodiment, the second feature vector is the average feature vector obtained from multiple image data of a person. For example, images of the target face are acquired from multiple angles, and then the feature vectors of each angle image are extracted using the front-end part. The average value of all the obtained feature vectors is then calculated, and the result is the second feature vector. Subsequently, the row of the fully connected layer used for target face recognition is pre-trained using the second feature vector and the target face ID, completing one parameter tuning operation of the fully connected layer.
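As a non-limiting illustration of the fully connected layer pre-training described above, the following minimal NumPy sketch averages the feature vectors of each person and fits one row of a classification layer per person. The toy features, the softmax cross-entropy loss, and the learning rate are assumptions made for illustration; the embodiment only states that each row is tuned using the averaged feature vector and the face ID.

```python
import numpy as np


def class_mean_features(features, person_ids, num_people):
    """Average all feature vectors that belong to the same person."""
    means = np.zeros((num_people, features.shape[1]))
    for pid in range(num_people):
        means[pid] = features[person_ids == pid].mean(axis=0)
    return means


def pretrain_fc(means, num_steps=300, lr=0.5):
    """Fit one weight row and bias per person on (mean feature, person ID) pairs."""
    num_people, dim = means.shape
    w = np.zeros((num_people, dim))
    b = np.zeros(num_people)
    targets = np.eye(num_people)
    for _ in range(num_steps):
        logits = means @ w.T + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad_logits = (probs - targets) / num_people
        w -= lr * grad_logits.T @ means
        b -= lr * grad_logits.sum(axis=0)
    return w, b


# Toy front-end outputs: 3 people, 5 images each, 4-dimensional features.
rng = np.random.default_rng(0)
person_ids = np.repeat(np.arange(3), 5)
features = np.eye(3, 4)[person_ids] + 0.05 * rng.normal(size=(15, 4))

means = class_mean_features(features, person_ids, 3)  # second feature vectors
w, b = pretrain_fc(means)
predicted = np.argmax(means @ w.T + b, axis=1)
print(predicted)
```

The front end is not updated in this sketch; only the classification layer parameters `w` and `b` change, which matches the idea of pre-training the fully connected layers separately before the full secondary training.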

[0052] Specifically, in one embodiment, after Step 13, a neural network model training method further includes the following steps:

[0053] Step 14: Extract the third feature vector of the input data using the target neural network model.

[0054] Step 15: Pre-train the front end of the second target neural network model based on the training data composed of the input data and the third feature vector. The capacity of the second target neural network model is greater than that of the target neural network model.

[0055] Step 16: If the number of training iterations does not reach the preset threshold, the trained second target neural network model will be used as the target neural network model, and the process will return to Step 14.

[0056] Specifically, if the neural network model to be trained has a very large capacity, two rounds of training may still not yield good results. In this embodiment, staged training can be performed. For example, a series of Transformer models T_0, T_1, T_2, ..., T_K is constructed with increasing model capacity, T_0 < T_1 < ... < T_K, where T_K is the desired final model. T_0, T_1, ..., T_K are then trained stage by stage. First, feature vector labels for training the front end of T_0 are obtained based on the ResNet-101 network, and T_0 is trained. Then T_0 is used to obtain feature vector labels for training the front end of T_1, and T_1 is trained. This process is repeated until K training iterations are completed, gradually obtaining T_K, which is the desired high-capacity neural network model based on Transformer. This can greatly improve the training accuracy of the high-capacity neural network model.
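As a non-limiting illustration of the staged training loop, the following minimal sketch passes the teacher role from one model to the next. The `ToyModel` class, the callback names, and the placeholder feature extraction are hypothetical; real training routines would replace the two no-op callbacks.

```python
class ToyModel:
    """Hypothetical stand-in for one model T_k in the series."""

    def __init__(self, name):
        self.name = name
        self.taught_by = None

    def extract_features(self, x):
        return [float(x)]  # placeholder feature vector


def staged_training(models, first_teacher_name, first_extract, inputs, labels,
                    pretrain_front_end, fine_tune):
    """Train models of increasing capacity, each taught by its predecessor."""
    teacher_name, extract = first_teacher_name, first_extract
    for model in models:
        # Feature vector labels come from the current teacher.
        feature_labels = [extract(x) for x in inputs]
        pretrain_front_end(model, inputs, feature_labels)
        fine_tune(model, inputs, labels)
        model.taught_by = teacher_name
        # The freshly trained model becomes the teacher for the next stage.
        teacher_name, extract = model.name, model.extract_features
    return models[-1]


def no_op(model, inputs, targets):
    """Placeholder for a real training routine."""


series = [ToyModel(f"T_{k}") for k in range(3)]
final_model = staged_training(
    series, "ResNet-101", lambda x: [float(x)], [1, 2, 3], [0, 1, 0], no_op, no_op
)
print([(m.name, m.taught_by) for m in series])
```

The loop makes the dependency chain explicit: only T_0 relies on the small convolutional model, and every later stage takes its feature vector labels from the model trained just before it.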

[0057] Through the above steps, the technical solution provided in this application addresses the difficulty of training large-capacity neural network models. First, it pre-trains the front-end portion of the large-capacity neural network model, that is, the model excluding its fully connected layers, which ensures the accuracy of the front-end portion used to output data feature vectors. Because this training stage leaves out the large number of neurons in the fully connected layers, the number of model parameters that need to be trained is significantly reduced. Far fewer model parameters therefore need to simultaneously meet preset conditions, leading to faster training speed and lower training difficulty. After the front-end portion is trained, it is combined with the fully connected layers for further training, so that only minor adjustments to the model parameters are needed in the subsequent training process. As the adjustment range of the model parameters decreases, the overall training difficulty of the model is significantly reduced, thereby improving the training accuracy of large-capacity neural network models.

[0058] Furthermore, the output data of the front-end of a neural network model is a feature vector, while the actual labels of the training data are not feature vectors. Therefore, standard feature vector labels are usually unknown. To address the issue of unknown training labels for the front-end of a neural network model, this invention first trains a smaller, simpler first neural network model based on the training data. Due to the simplicity of the model, the required amount of training data is much smaller, making it easier to achieve good training results. Then, the trained, smaller-capacity neural network model is used to extract features from the training data, obtaining the feature vectors before computation through the fully connected layers. This solves the problem of difficulty in obtaining the data labels for the front-end of the training data. The obtained feature vectors are then used as labels and combined with the original training data to form new training data. This new training data is used to train the front-end of a large-capacity neural network model, effectively realizing the technical solution of training one part of the large-capacity neural network model first, and then training the other part. This improves the training accuracy of the large-capacity neural network model.

[0059] As shown in Figure 3, this embodiment also provides a neural network model training device, which includes:

[0060] The pre-training module 101 is used to pre-train the front-end portion of the target neural network model based on training data. The front-end portion is the remaining part of the target neural network model after removing the fully connected layers. For details, please refer to the relevant description of step S101 in the above method embodiment, which will not be repeated here.

[0061] The full training module 102 is used to perform secondary training, based on the training data, on the target neural network model whose front-end portion has been trained. For details, please refer to the relevant description of step S102 in the above method embodiment, which will not be repeated here.

[0062] The neural network model training device provided by the present invention is used to execute the neural network model training method provided in the above embodiments, and its implementation and principles are the same as those of the method. For details, please refer to the relevant description of the above method embodiments, which will not be repeated here.
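The division of the device into two modules can be pictured as follows. This is only a structural sketch: the class names, the TargetModel placeholder and the stage strings are hypothetical, and the actual training logic is deliberately left out so that only the ordering of modules 101 and 102 is shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (input data, first data label); a hypothetical representation of one sample.
Sample = Tuple[list, int]


@dataclass
class TargetModel:
    """Placeholder for the target network; it only records which stages ran."""
    stages: List[str] = field(default_factory=list)


class PreTrainingModule:
    """Corresponds to pre-training module 101 (step S101)."""

    def run(self, model: TargetModel, training_data: List[Sample]) -> None:
        # A real implementation would train the front-end portion here.
        model.stages.append("front_end_pretrained")


class FullTrainingModule:
    """Corresponds to full training module 102 (step S102)."""

    def run(self, model: TargetModel, training_data: List[Sample]) -> None:
        if "front_end_pretrained" not in model.stages:
            raise RuntimeError("front-end portion must be pre-trained first")
        # A real implementation would train the front end together with the
        # fully connected layers here.
        model.stages.append("secondary_trained")


class TrainingDevice:
    """Runs module 101 and then module 102 on the same training data."""

    def __init__(self) -> None:
        self.pre_training = PreTrainingModule()
        self.full_training = FullTrainingModule()

    def train(self, model: TargetModel, training_data: List[Sample]) -> TargetModel:
        self.pre_training.run(model, training_data)
        self.full_training.run(model, training_data)
        return model


model = TrainingDevice().train(TargetModel(), [([0.1, 0.2], 1)])
print(model.stages)
```

The guard in FullTrainingModule makes the dependency explicit: secondary training is only meaningful once the front-end portion has been pre-trained.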

[0063] Through the cooperation of the above components, the technical solution provided in this application addresses the difficulty of training large-capacity neural network models. First, the front-end portion of the large-capacity neural network model, that is, the model without its fully connected layers, is pre-trained, which ensures the accuracy of the front-end portion that outputs the feature vectors of the data. Because this stage leaves out the large number of neurons in the fully connected layers, far fewer model parameters need to be trained, and therefore far fewer parameters need to satisfy the preset conditions at the same time, which makes training faster and easier. After the front-end portion has been trained, it is combined with the fully connected layers for further training, in which the model parameters need only minor adjustments. As the range over which the parameters must be adjusted shrinks, the overall training difficulty decreases significantly, which improves the training accuracy of large-capacity neural network models.

[0064] Furthermore, the output of the front-end portion of a neural network model is a feature vector, whereas the actual labels of the training data are not feature vectors, so reference feature-vector labels are usually unavailable. To address the lack of training labels for the front-end portion, this invention first trains a smaller, simpler first neural network model on the training data. Because the model is simple, it needs much less training data and good training results are easier to achieve. The trained smaller-capacity model is then used to extract features from the training data, yielding the feature vectors as they stand before the fully connected layers are applied. This resolves the difficulty of obtaining labels for training the front-end portion. The obtained feature vectors are used as labels and combined with the input data from the original training data to form new training data, which is used to train the front-end portion of the large-capacity neural network model. In this way, one part of the large-capacity model is trained first and the other part afterwards, which improves the training accuracy of the large-capacity neural network model.
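One possible way to write the two training stages as optimization objectives is given below. The loss functions are assumptions made for illustration, since the text does not name any: squared error for the pre-training stage and a generic classification loss \ell for the secondary training. Here S is the sample subset, D the full training data, g the small model with its fully connected layers removed, f_\theta the front-end portion of the target model, h_\phi its fully connected layers, and (x, y) an input and its first data label. It is also assumed that g and f_\theta output feature vectors of the same dimension.

```latex
% Stage 1: pre-train the front-end portion against feature-vector labels g(x)
\theta_0 = \arg\min_{\theta} \; \frac{1}{|S|} \sum_{(x, y) \in S}
           \left\lVert f_\theta(x) - g(x) \right\rVert_2^2

% Stage 2: secondary training of the whole model, with \theta initialized at \theta_0
(\theta^{*}, \phi^{*}) = \arg\min_{\theta, \phi} \; \frac{1}{|D|} \sum_{(x, y) \in D}
           \ell\bigl( h_\phi( f_\theta(x) ),\, y \bigr)
```

Because the second stage starts from \theta_0 rather than from a random initialization, only small changes to \theta are expected, which matches the "minor adjustments" described above.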

[0065] Figure 4 shows an electronic device according to an embodiment of the present invention. The device includes a processor 901 and a memory 902, which can be connected via a bus or by other means. Figure 4 takes connection via a bus as an example.

[0066] Processor 901 can be a Central Processing Unit (CPU). Processor 901 can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations of the above types of chips.

[0067] The memory 902, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions / modules corresponding to the methods in the above method embodiments. The processor 901 executes various functional applications and data processing of the processor by running the non-transitory software programs, instructions, and modules stored in the memory 902, thereby implementing the methods in the above method embodiments.

[0068] The memory 902 may include a program storage area and a data storage area. The program storage area may store the operating system and applications required for at least one function; the data storage area may store data created by the processor 901, etc. Furthermore, the memory 902 may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 902 may optionally include memory remotely located relative to the processor 901, and these remote memories may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0069] One or more modules are stored in memory 902 and, when executed by processor 901, perform the methods described in the above method embodiments.

[0070] The specific details of the aforementioned electronic device can be understood by referring to the relevant descriptions and effects in the above method embodiments, and will not be repeated here.

[0071] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The implemented program can be stored in a computer-readable storage medium. When executed, the program can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), random access memory (RAM), flash memory, hard disk drive (HDD), or solid-state drive (SSD), etc.; the storage medium can also include combinations of the above types of memory.

[0072] Although embodiments of the invention have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims

1. A method for training a neural network model, characterized in that the method includes: pre-training the front-end portion of a target neural network model based on training data, wherein the front-end portion is the remaining part of the target neural network model after removing fully connected layers; wherein the pre-training of the front-end portion of the target neural network model based on training data includes: extracting a predetermined proportion of data from the training data to form a sample subset, wherein the training data includes input data for inputting into the model and first data labels for labeling the categories of the input data; inputting the input data from the sample subset into a predetermined first neural network model to obtain a data classification result, wherein the capacity of the first neural network model is smaller than that of the target neural network model; correcting the parameters of the first neural network model based on the error between the data classification result and the first data labels in the sample subset; removing the fully connected layers of the corrected first neural network model to obtain a second neural network model; extracting feature vectors from the input data in the sample subset using the second neural network model, and using the obtained feature vectors as second data labels, which, together with the input data in the sample subset, form second training data; and training the front-end portion based on the second training data; and performing secondary training, based on the training data, on the target neural network model whose front-end portion has been trained.

2. The method of claim 1, wherein the secondary training, based on the training data, of the target neural network model whose front-end portion has been trained includes: extracting a second preset proportion of data from the training data to form a second sample subset; and training the target neural network model whose front-end portion has been trained based on the second sample subset.

3. The method of claim 1, wherein, before performing the secondary training, based on the training data, on the target neural network model whose front-end portion has been trained, the method further includes: pre-training the fully connected layers in the target neural network model based on the training data.

4. The method of claim 3, wherein the pre-training of the fully connected layers in the target neural network model based on the training data includes: extracting second feature vectors of the input data in the training data using the front-end portion; associating each second feature vector with its corresponding first data label, based on the mapping relationship between the second feature vectors and the first data labels, to form third training data; and training the fully connected layers in the target neural network model based on the third training data.

5. The method of claim 1, wherein the first neural network model is ResNet-101, and the target neural network model is a Transformer backbone network.

6. The method of claim 1, wherein the training data is facial image data, and the steps for generating the training data include: acquiring face images and adding color noise to each face image; flipping the face images from multiple angles after the color noise has been added; stitching together the flipped face images belonging to the same person; and associating the stitched face images with the corresponding person information to generate the training data.

7. A neural network model training apparatus, characterized in that the apparatus includes: a pre-training module, used to pre-train the front-end portion of a target neural network model based on training data, wherein the front-end portion is the remaining part of the target neural network model after removing fully connected layers; wherein the pre-training of the front-end portion of the target neural network model based on training data includes: extracting a predetermined proportion of data from the training data to form a sample subset, wherein the training data includes input data for inputting into the model and first data labels for labeling the categories of the input data; inputting the input data from the sample subset into a predetermined first neural network model to obtain a data classification result, wherein the capacity of the first neural network model is smaller than that of the target neural network model; correcting the parameters of the first neural network model based on the error between the data classification result and the first data labels in the sample subset; removing the fully connected layers of the corrected first neural network model to obtain a second neural network model; extracting feature vectors from the input data in the sample subset using the second neural network model, and using the obtained feature vectors as second data labels, which, together with the input data in the sample subset, form second training data; and training the front-end portion based on the second training data; and a full training module, used to perform secondary training, based on the training data, on the target neural network model whose front-end portion has been trained.

8. An electronic device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the method as described in any one of claims 1-6.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the method as described in any one of claims 1-6.
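As an illustration of the training-data generation steps recited in claim 6, the following sketch uses nested Python lists of RGB tuples as images. Several details are assumptions, since the claim does not fix them: color noise is modeled as a small random offset per channel, "multiple angles" is interpreted as horizontal, vertical and combined flips, and stitching is done side by side. All function names are hypothetical.

```python
import random
from collections import defaultdict


def add_color_noise(image, rng, strength=10):
    """Add a random offset to every channel of every pixel, clipped to 0..255."""
    return [[tuple(min(255, max(0, c + rng.randint(-strength, strength)))
                   for c in pixel)
             for pixel in row]
            for row in image]


def flips(image):
    """Return the image flipped horizontally, vertically, and in both directions."""
    horizontal = [row[::-1] for row in image]
    vertical = image[::-1]
    both = [row[::-1] for row in image[::-1]]
    return [horizontal, vertical, both]


def stitch(images):
    """Concatenate equally tall images side by side, row by row."""
    height = len(images[0])
    return [sum((img[r] for img in images), []) for r in range(height)]


def generate_training_data(face_images, rng):
    """face_images: list of (person_id, image) pairs.
    Returns a list of (stitched_image, person_id) pairs, one per person."""
    by_person = defaultdict(list)
    for person_id, image in face_images:
        noisy = add_color_noise(image, rng)
        by_person[person_id].extend(flips(noisy))
    return [(stitch(images), person_id) for person_id, images in by_person.items()]


rng = random.Random(0)
tiny = [[(10, 20, 30), (40, 50, 60)],
        [(70, 80, 90), (100, 110, 120)]]  # a 2 x 2 stand-in for a face image
faces = [("person_a", tiny), ("person_a", tiny), ("person_b", tiny)]
training_data = generate_training_data(faces, rng)
for stitched, person_id in training_data:
    print(person_id, len(stitched), "x", len(stitched[0]))
```

Each person ends up with one stitched image whose width grows with the number of source images, paired with that person's identifier as the label.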