Method, apparatus, electronic device and medium for image processing
Adversarial sample sets generated through contrastive learning address the poor performance of image classification models trained on imbalanced data, improving the recognition accuracy and generalization ability for tail categories.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- DOUYIN VISION CO LTD
- Filing Date
- 2022-08-26
- Publication Date
- 2026-06-19
Figure CN115565023B_ABST
Abstract
Description
Technical Field
[0001] Embodiments of this disclosure relate to the field of image processing technology, and more specifically, to methods, apparatus, computing devices, computer-readable storage media, and computer program products for image processing.
Background Technology
[0002] Deep learning-based image classification techniques have been widely applied in image analysis. However, collecting and labeling image training data is challenging in some domains, and the training data can be highly imbalanced: some categories have abundant training data, while others have very little, exhibiting a long-tail effect. This leads to suboptimal training results and makes it difficult to achieve the expected performance.
Summary of the Invention
[0003] In view of this, embodiments of the present disclosure propose a technical solution for handling the long-tail problem in image classification tasks based on contrastive learning.
[0004] According to a first aspect of this disclosure, a method for image processing is provided. The method includes generating first sample features from image samples in a training dataset based on a feature extraction network of an image classification model. The method further includes obtaining a set of category features from a classification network of the image classification model, wherein each category feature corresponds to a category associated with the image classification model. The method also includes generating an adversarial positive sample set and an adversarial negative sample set for the image samples based on the first sample features, the set of category features, and a set of reference sample features, wherein the reference sample features in the set of reference sample features are generated from multiple image samples in the training dataset by the feature extraction network. The method further includes updating the image classification model based on the set of adversarial positive samples, the set of adversarial negative samples, and the set of category features.
[0005] Here, the image classification model includes a feature extraction network and a classification network. The reference sample feature set consists of sample features extracted by the image classification model and represents the model's semantic understanding of the images. The category features in the category feature set represent the category prototypes of the image classification model; herein, the terms "category feature" and "prototype" are used interchangeably. A category feature can be understood as the "typical" feature of its category: the higher the similarity between a sample feature and a category feature, the greater the probability that the sample is predicted as that category. The image classification model classifies images based on the category features and the sample features of the images. Therefore, by effectively combining the category features with the reference sample features, contrastive learning samples that are "difficult" for the current image sample can be obtained, that is, samples that are easily confused and cause the image classification model to make incorrect predictions. In this way, adversarial positive and negative sample sets with discriminative difficulty can be constructed to correct the decision boundary of the image classification model for tail categories, enabling effective training of the model even when the original training data are imbalanced and long-tailed.
[0006] In some embodiments of the first aspect, the image classification model may further include at least one encoder connected to a feature extraction network, and generating first sample features from image samples includes generating first sample features from image samples based on the feature extraction network and at least one encoder. In this manner, the sample features extracted from the image classification model for training differ from the sample features used for prediction, thereby improving the accuracy of model training. In some embodiments, at least one encoder may include a first encoder used to generate the first sample features from current image samples. The first encoder may be updated online.
[0007] In some embodiments of the first aspect, the method may further include: generating sample features from image samples in a batch containing the image samples, based on a feature extraction network and at least one encoder; adding the generated sample features to a reference sample feature set; and removing the earliest batch of reference sample features from the reference sample feature set. In this manner, the reference sample feature set can be dynamically maintained and updated during training, for example, by maintaining the reference sample feature set in a first-in-first-out (FIFO) manner. In some embodiments, the at least one encoder may include a second encoder used to generate the sample features in the reference sample feature set. The first encoder and the second encoder may be different, and the second encoder may be updated offline, specifically, during the training of the image classification model, based on the first encoder, for example, by using a momentum mechanism.
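The momentum mechanism mentioned above can be sketched as an exponential moving average of the first encoder's parameters, in the style popularized by MoCo. This is a minimal illustration, not the disclosed implementation: it assumes each encoder's parameters are flattened into a list of floats, and the function name `momentum_update` and the default coefficient `m = 0.999` are hypothetical choices not given in this disclosure.

```python
from typing import List


def momentum_update(second: List[float], first: List[float], m: float = 0.999) -> List[float]:
    """Return updated parameters for the second (offline) encoder.

    Each parameter moves a small step toward the matching parameter of the
    first (online) encoder:  theta_2 <- m * theta_2 + (1 - m) * theta_1.
    The second encoder receives no gradients; it only tracks the first one.
    """
    if len(second) != len(first):
        raise ValueError("both encoders must have the same number of parameters")
    return [m * p2 + (1.0 - m) * p1 for p2, p1 in zip(second, first)]


if __name__ == "__main__":
    online = [1.0, 2.0]   # parameters of the first encoder after an optimizer step
    offline = [0.0, 0.0]  # current parameters of the second encoder
    for _ in range(3):    # one momentum step per training iteration
        offline = momentum_update(offline, online, m=0.5)
    print(offline)        # [0.875, 1.75]
```

A coefficient close to 1 makes the second encoder change slowly, which is generally intended to keep reference sample features produced in different batches comparable with one another.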
[0008] In some embodiments of the first aspect, the classification network may include a fully connected network, and obtaining the category feature set from the classification network includes determining the category feature set based on the weights of the fully connected network. In this manner, prototypes representing each category can be easily obtained; for example, the category of the prototype with the highest similarity to sample features of the image to be predicted is determined as the prediction result.
[0009] In some embodiments of the first aspect, the method may further include, for each category feature in the category feature set: generating sample features for at least one image sample in the batch containing the image sample, corresponding to the category feature, based on a feature extraction network; determining a local calibration factor for the category feature in that batch based on the sample features and category features of the at least one image sample; determining a global calibration factor for the category feature from the local calibration factor by a moving average across batches; and adjusting the category feature using the global calibration factor. In this manner, the problem of category features favoring head categories due to training data imbalance can be eliminated or mitigated.
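The calibration steps above can be sketched as follows. The disclosure does not give formulas for the local calibration factor or for how the global factor adjusts the category feature, so everything here is a hypothetical choice: the local factor is the mean cosine similarity between a category feature and that category's sample features in the batch, rescaled to [0, 1]; the global factor is an exponential moving average with an assumed `momentum`; and the adjustment divides the prototype by the global factor. The class name `PrototypeCalibrator` is illustrative.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class PrototypeCalibrator:
    """Hypothetical calibrator that keeps one global factor per category."""

    def __init__(self, num_classes: int, momentum: float = 0.9):
        self.momentum = momentum
        # Every global factor starts at 1.0, which leaves prototypes unchanged.
        self.global_factor = [1.0] * num_classes

    def update(self, prototypes: List[List[float]], feats: List[List[float]],
               labels: List[int]) -> None:
        """Fold one batch's local factors into the global factors (moving average)."""
        for c, proto in enumerate(prototypes):
            sims = [cosine(f, proto) for f, lab in zip(feats, labels) if lab == c]
            if not sims:
                continue  # category absent from this batch: keep its global factor
            # Local factor: mean cosine similarity mapped from [-1, 1] to [0, 1].
            local = (1.0 + sum(sims) / len(sims)) / 2.0
            self.global_factor[c] = (
                self.momentum * self.global_factor[c] + (1.0 - self.momentum) * local
            )

    def adjust(self, prototypes: List[List[float]], eps: float = 1e-6) -> List[List[float]]:
        """Divide each prototype by its global factor (one illustrative choice)."""
        return [
            [x / max(self.global_factor[c], eps) for x in proto]
            for c, proto in enumerate(prototypes)
        ]
```

Under these assumptions, `update` is called once per batch with that batch's sample features and labels, and `adjust` is applied to the raw prototypes before the loss is computed. A category whose prototype aligns poorly with its own samples (often a tail category pulled toward head categories) gets a smaller factor and therefore a larger prototype, while well-fitted categories stay nearly unchanged.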
[0010] In some embodiments of the first aspect, updating the image classification model may include updating the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the adjusted class features in the class feature set. In this manner, the trained image classification model exhibits better performance on tail categories.
[0011] In some embodiments of the first aspect, generating an adversarial positive sample set may include: determining reference sample features in the reference sample feature set whose categories are the same as that of the image sample, to generate a first candidate positive sample set; selecting, from the first candidate positive sample set, reference sample features whose categories are mispredicted; combining the reference sample features with mispredicted categories and the category features corresponding to the mispredicted categories to generate a second candidate positive sample set; and generating the adversarial positive sample set based on the second candidate positive sample set. In this manner, the generated adversarial positive samples have the same true category as the current image sample but possess sample features that the image classification model mispredicts, and are therefore challenging positive samples. Using such adversarial positive samples reduces the likelihood of false negatives for this category and improves the recognition accuracy of the image classification model.
[0012] In some embodiments of the first aspect, combining reference sample features with mispredicted categories and category features corresponding to those mispredicted categories to generate a second set of candidate positive samples may include: performing a weighted summation of the reference sample features with mispredicted categories and the category features corresponding to those mispredicted categories, wherein a first weight for the reference sample features and a second weight for the category features are randomized, and the first weight is greater than the second weight. Based on this approach, it is possible to increase resistance to the randomness of positive samples and improve the generalization ability of the image classification model.
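The positive-sample steps of paragraphs [0011] and [0012] can be sketched as follows. This is a minimal sketch, assuming features are plain lists of floats, that the classifier's current predictions for the reference features are already available in `ref_preds`, and that the two random weights sum to 1 with the first drawn from (0.5, 1.0]; the disclosure states only that both weights are randomized and that the first exceeds the second. The function names `mix` and `adversarial_positives` are hypothetical, and the second candidate set is used directly as the adversarial positive set.

```python
import random
from typing import List, Sequence


def mix(ref: Sequence[float], proto: Sequence[float], rng: random.Random) -> List[float]:
    """Return a * ref + (1 - a) * proto with a random weight a in (0.5, 1.0].

    Because a > 0.5, the reference-feature weight a is always larger than the
    category-feature weight (1 - a).
    """
    a = 1.0 - 0.5 * rng.random()  # rng.random() is in [0, 1), so a is in (0.5, 1.0]
    return [a * r + (1.0 - a) * p for r, p in zip(ref, proto)]


def adversarial_positives(
    y: int,
    ref_feats: List[List[float]],
    ref_labels: List[int],
    ref_preds: List[int],
    prototypes: List[List[float]],
    rng: random.Random,
) -> List[List[float]]:
    """Build hard positives for an image sample whose true category is y."""
    # First candidate positive set: reference features whose true category is y.
    first = [(f, p) for f, lab, p in zip(ref_feats, ref_labels, ref_preds) if lab == y]
    # Keep only the reference features the classifier currently mispredicts.
    wrong = [(f, p) for f, p in first if p != y]
    # Second candidate set: pull each one toward the prototype of its wrong category.
    return [mix(f, prototypes[p], rng) for f, p in wrong]


if __name__ == "__main__":
    rng = random.Random(0)
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    ref_feats = [[1.0, 0.0], [0.2, 0.9], [1.0, 1.0]]
    ref_labels = [0, 0, 1]
    ref_preds = [0, 1, 1]  # [0.2, 0.9] is a category-0 sample predicted as category 1
    print(adversarial_positives(0, ref_feats, ref_labels, ref_preds, prototypes, rng))
```

Mixing with the prototype of the wrongly predicted category moves each positive further toward the confusing category, which is what makes it "hard"; the random weight varies how far it moves from one draw to the next.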
[0013] In some embodiments of the first aspect, generating an adversarial negative sample set may include: determining reference sample features from a reference sample feature set whose categories differ from those of the image sample, to generate a first candidate negative sample set; selecting reference sample features from the first candidate negative sample set based on a comparison with the first sample features; combining the selected reference sample features with category features corresponding to the category of the image sample to generate a second candidate negative sample set; and generating an adversarial negative sample set based on the second candidate negative sample set. In this manner, the generated adversarial negative samples have true categories different from the current image sample and possess sample features that are easily confused with the current image, thus constituting challenging negative samples. Utilizing such adversarial negative samples reduces the likelihood of false positives for this category and improves the recognition accuracy of the image classification model.
[0014] In some embodiments of the first aspect, selecting reference sample features from the first candidate negative sample set based on a comparison with the first sample features may include: determining the distances between the reference sample features in the first candidate negative sample set and the first sample features; and selecting a number of reference sample features closest to the first sample features. In this manner, sample features more similar to the current image sample can be selected from the reference sample features, generating more challenging adversarial negative samples.
[0015] In some embodiments of the first aspect, combining the selected reference sample features and the category features corresponding to the category of the image sample to generate a second candidate negative sample set includes: performing a weighted summation of the selected reference sample features and the category features corresponding to the category of the image sample, wherein the first weight for the selected reference sample features and the second weight for the category features are randomized, and the first weight is greater than the second weight. Based on this approach, the difficulty of recognizing adversarial negative samples can be increased, improving the generalization ability of the image classification model.
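The negative-sample steps of paragraphs [0013] to [0015] can be sketched in the same way. This minimal sketch assumes Euclidean distance (the disclosure says only "distance"), a hypothetical number `k` of nearest candidates to keep, and the same assumed weighting as for positives: weights summing to 1 with the reference-feature weight drawn from (0.5, 1.0]. The function names are illustrative, and the second candidate set is used directly as the adversarial negative set.

```python
import math
import random
from typing import List, Sequence


def mix(ref: Sequence[float], proto: Sequence[float], rng: random.Random) -> List[float]:
    """Return a * ref + (1 - a) * proto with a random weight a in (0.5, 1.0]."""
    a = 1.0 - 0.5 * rng.random()  # rng.random() is in [0, 1), so a is in (0.5, 1.0]
    return [a * r + (1.0 - a) * p for r, p in zip(ref, proto)]


def adversarial_negatives(
    y: int,
    z1: Sequence[float],
    ref_feats: List[List[float]],
    ref_labels: List[int],
    prototypes: List[List[float]],
    k: int,
    rng: random.Random,
) -> List[List[float]]:
    """Build hard negatives for an image sample with category y and first sample feature z1."""
    # First candidate negative set: reference features of every other category.
    first = [f for f, lab in zip(ref_feats, ref_labels) if lab != y]
    # Keep the k candidates closest to z1 (Euclidean distance).
    nearest = sorted(first, key=lambda f: math.dist(f, z1))[:k]
    # Second candidate set: pull each one toward the prototype of category y.
    return [mix(f, prototypes[y], rng) for f in nearest]


if __name__ == "__main__":
    rng = random.Random(1)
    z1 = [1.0, 0.0]  # first sample feature of a category-0 image
    ref_feats = [[0.9, 0.1], [-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    ref_labels = [1, 1, 1, 0]
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    print(adversarial_negatives(0, z1, ref_feats, ref_labels, prototypes, k=2, rng=rng))
```

The nearest-neighbor filter and the pull toward the current sample's own prototype both push the negatives closer to the current sample, so they are harder to separate from it.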
[0016] In some embodiments of the first aspect, updating the image classification model based on an adversarial positive sample set, an adversarial negative sample set, and a class feature set may include: generating second sample features for the image sample based on a feature extraction network; determining a first loss for the image sample based on a comparison of the second sample features and class features in the class feature set whose class differs from that of the image sample, and a comparison of the first sample features and the adversarial negative sample set; determining a second loss for the image sample based on a comparison of the second sample features and class features in the class feature set whose class is the same as that of the image sample, and a comparison of the first sample features and the adversarial positive sample set; and updating the image classification model based on the first loss and the second loss. Here, the first loss represents the prediction loss of the image classification model, and the second loss represents the contrastive loss of the image classification model. In this manner, a unified loss function can be used to train the image classification model by combining the prediction loss and the contrastive loss, achieving better training results.
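One way to realize the loss of paragraph [0016] is an InfoNCE-style (supervised contrastive) objective in which the second sample feature `z2` is compared with the category features and the first sample feature `z1` is compared with the adversarial samples. The sketch below is an assumption, not the disclosed formula: it puts the same-category terms in the numerator and all terms in the denominator, uses cosine similarity with a hypothetical temperature `tau = 0.1`, and does not split the result into separate first and second losses, because the disclosure does not specify how those are computed or weighted.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sample_loss(
    z1: Sequence[float],
    z2: Sequence[float],
    y: int,
    prototypes: List[List[float]],
    positives: List[List[float]],
    negatives: List[List[float]],
    tau: float = 0.1,
) -> float:
    """Hypothetical unified loss for one image sample with true category y."""
    # Positive terms: z2 vs. its own category feature, z1 vs. adversarial positives.
    pos = [cosine(z2, prototypes[y]) / tau] + [cosine(z1, p) / tau for p in positives]
    # Negative terms: z2 vs. other category features, z1 vs. adversarial negatives.
    neg = [cosine(z2, w) / tau for c, w in enumerate(prototypes) if c != y]
    neg += [cosine(z1, n) / tau for n in negatives]
    # -log( sum(exp(pos)) / (sum(exp(pos)) + sum(exp(neg))) ), never negative.
    return logsumexp(pos + neg) - logsumexp(pos)
```

In a training loop, the per-sample losses of a batch would typically be averaged before back-propagation; the loss falls toward zero as both features move closer to their positives than to their negatives.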
[0017] According to a second aspect of this disclosure, an apparatus for image processing is provided. The apparatus includes: a first sample feature generation unit, a category feature acquisition unit, an adversarial example set generation unit, and a model update unit. The first sample feature generation unit is configured to generate first sample features of image samples in a training dataset based on a feature extraction network of an image classification model. The category feature acquisition unit is configured to acquire a category feature set from a classification network of the image classification model, wherein each category feature corresponds to a category associated with the image classification model. The adversarial example set generation unit is configured to generate an adversarial positive sample set and an adversarial negative sample set for the image samples based on the first sample features, the category feature set, and a reference sample feature set, wherein the reference sample features in the reference sample feature set are generated from multiple image samples in the training dataset based on the feature extraction network. The model update unit is configured to update the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the category feature set.
[0018] According to a third aspect of this disclosure, a computing device is provided. The computing device includes at least one processing unit and at least one memory, the at least one memory being coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions, when executed by the at least one processing unit, causing the computing device to perform the method according to the first aspect of this disclosure.
[0019] According to a fourth aspect of this disclosure, a computer-readable storage medium is provided, including machine-executable instructions that, when executed by a device, cause the device to perform the method according to the first aspect of this disclosure.
[0020] According to a fifth aspect of this disclosure, a computer program product is provided, including machine-executable instructions that, when executed by a device, cause the device to perform the method according to the first aspect of this disclosure.
[0021] This Summary is provided to introduce a selection of concepts in a simplified form that are further described in the detailed embodiments below. This Summary is not intended to identify key or essential features of this disclosure, nor is it intended to limit the scope of this disclosure.
Brief Description of the Drawings
[0022] The above and other objects, features and advantages of this disclosure will become more apparent from the accompanying drawings, in which like reference numerals generally denote like parts.
[0023] Figure 1 shows a schematic diagram of an example environment in which several embodiments of the present disclosure can be implemented;
[0024] Figure 2 shows an exemplary framework of an image classification model according to embodiments of the present disclosure;
[0025] Figure 3 shows an example flowchart of an image processing procedure according to an embodiment of the present disclosure;
[0026] Figure 4 shows a schematic conceptual diagram of generating adversarial positive samples and adversarial negative samples according to embodiments of the present disclosure;
[0027] Figure 5 shows an example flowchart of a process for generating adversarial positive samples according to embodiments of the present disclosure;
[0028] Figure 6 shows an example flowchart of a process for generating adversarial negative samples according to embodiments of the present disclosure;
[0029] Figure 7 shows an example flowchart of a process for determining the loss of an image sample according to an embodiment of the present disclosure;
[0030] Figure 8 shows an example block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
[0031] Figure 9 shows a schematic block diagram of an example device that can be used to implement embodiments of the present disclosure.
Detailed Description
[0032] It is understood that before using the technical solutions disclosed in the various embodiments of this disclosure, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this disclosure in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.
[0033] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose whether to provide personal information to the software or hardware, such as the electronic device, application, server, or storage medium performing the operations of this disclosed technical solution, based on the prompt message.
[0034] As an optional but non-limiting implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.
[0035] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.
[0036] Preferred embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the scope of the present disclosure to those skilled in the art.
[0037] The term "comprising" and its variations as used herein signify open inclusion, i.e., "including but not limited to". Unless otherwise stated, the term "or" means "and / or". The term "based on" means "at least partially based on". The terms "one example embodiment" and "one embodiment" mean "at least one example embodiment". The term "another embodiment" means "at least one additional embodiment". The terms "first", "second", etc., may refer to different or the same objects. Other explicit and implicit definitions may also be included below.
[0038] It should be noted that any numerical values or figures used in this disclosure are exemplary and are in no way intended to limit the scope of this disclosure.
[0039] In embodiments of this disclosure, the term "model" refers to an entity capable of processing inputs and providing corresponding outputs. Taking a neural network model as an example, it typically includes an input layer, an output layer, and one or more hidden layers between the input and output layers. Models used in deep learning applications (also called "deep learning models") typically include many hidden layers, which increases the depth of the network. The layers of a neural network model are connected in sequence such that the output of a previous layer is used as input to other layers or to itself; the input layer receives the input of the neural network model, and the output of the output layer serves as its final output. Each layer of a neural network model includes one or more nodes (also called processing nodes or neurons), each of which processes input from the preceding layer. The model has parameters that act within or between layers. In this document, the terms "neural network," "model," "network," and "neural network model" are used interchangeably. A model can be trained for various specific tasks, such as image analysis, text processing, and speech processing, and its parameters are updated during training.
[0040] As mentioned above, imbalanced training data negatively impact the training of image classification models in image classification tasks. For example, in medical image recognition, most of the training data belong to head categories (e.g., images indicating a healthy state), while tail categories (e.g., images showing symptoms) have very little training data. This makes it difficult for the image classification model to learn information about the tail categories and degrades its performance. Traditional approaches try to increase the influence of tail categories on the model parameters by expanding their training data or increasing their weights. However, such approaches merely replicate information already present in the training data, making it difficult for the model to learn new, discriminative information, so their performance remains unsatisfactory.
[0041] In view of this, embodiments of the present disclosure propose a contrastive learning-based scheme that can solve, or at least partially alleviate, the problem of imbalanced training data and improve the performance of image classification models on tail categories. In the image processing method according to embodiments of the present disclosure, first sample features are generated from image samples in the training dataset based on a feature extraction network of the image classification model. The method further includes obtaining a set of category features from a classification network of the image classification model, each category feature corresponding to a category of the image classification model. The classification network can compare the features of an image to be predicted with the category features (e.g., by computing similarities) and use the closest category as the prediction result. The method further includes providing a set of reference sample features, which are generated by the feature extraction network from multiple image samples in the training dataset, for example, image samples from previous batches; the set may also include sample features generated from image samples in the current batch. In the method, an adversarial positive sample set and an adversarial negative sample set are generated based on the first sample features of the current image sample, the set of category features, and the set of reference sample features. The method further includes updating the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the set of category features.
[0042] Implementation details of the embodiments of this disclosure are described below with reference to Figures 1 to 8.
[0043] Figure 1 shows a schematic diagram of an example environment 100 in which various embodiments of the present disclosure can be implemented. As shown in Figure 1, environment 100 includes computing device 101. Computing device 101 can be any device with computing capabilities, such as a personal computer, tablet computer, wearable device, cloud server, mainframe, or distributed computing system.
[0044] The computing device 101 can acquire an image classification model 110 and a training dataset 130, and use the training dataset 130 to train the image classification model 110 to generate a trained image classification model 140. The image classification model 110 can be implemented using any suitable network architecture, including but not limited to support vector machine (SVM) models, Bayesian models, random forest models, and various deep learning / neural network models such as convolutional neural networks (CNN), recurrent neural networks (RNN), residual networks (ResNet), deep neural networks (DNN), deep reinforcement learning networks (DQN), etc. The scope of this disclosure is not limited in this respect.
[0045] The training dataset 130 may include image samples and the category associated with each image sample. The categories in the training dataset 130 are considered to be the true category information of the image samples, which can be obtained through any suitable means, such as manual annotation or other methods. During training, the image samples and their categories in the training dataset 130 can be input into the computing device 101 in batches, and the computing device can update the parameters of the image classification model 110 according to the training method. When the training termination condition is met (e.g., after a predetermined time, using a predetermined amount of training data, or determining that the model has converged), the computing device 101 can terminate the training and output the trained image classification model 140.
[0046] The trained image classification model 140 can be deployed on any device, whether the same as or different from the computing device 101. When the trained image classification model 140 is deployed on a device (not shown) different from the computing device 101, the computing device 101 can transmit the structure and parameters of the trained image classification model to that device via a communication network. As shown, the trained image classification model 140 can receive the image 150 to be classified and generate an output as a prediction result 160.
[0047] An exemplary environment in which embodiments of this disclosure can be implemented has been described above with reference to Figure 1. It should be understood that Figure 1 is merely illustrative: the environment may include more modules or components, some modules or components may be omitted, or the modules or components shown may be recombined. Embodiments of this disclosure can also be implemented in environments different from that shown in Figure 1, and this disclosure is not limited in this respect.
[0048] Figure 2 shows an exemplary framework of an image classification model 200 according to an embodiment of the present disclosure. The image classification model 200 may be an exemplary implementation of the image classification model 110 in Figure 1. For ease of explanation, the image classification model 200 is described with reference to Figure 1.
[0049] As shown in the figure, the image classification model 200 includes a feature extraction network 210 and a classification network 220. The feature extraction network 210 extracts sample features from image samples 201 of the input training dataset 130. During training, image samples can be input into the feature extraction network 210 of the image classification model 200 in training batches. The feature extraction network 210 may include a backbone network 212 and an encoder 214. In some embodiments, the backbone network 212 may be implemented as a residual network, such as ResNet50. The encoder 214 encodes the output of the backbone network 212 and generates sample features 203 as input to the classification network 220. The encoder 214 may be implemented as a fully connected network including at least one layer. Herein, the encoder 214 may also be referred to as a projection network.
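A projection network such as encoder 214 (or encoder 230) can be sketched as fully connected layers applied to the backbone output. This sketch assumes two layers with a ReLU between them and an L2 normalization of the output, a common choice in contrastive learning that this disclosure does not prescribe; the function names and the toy weights in the usage example are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def linear(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Fully connected layer y = W x + b, with W stored row-major as (out x in)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def l2_normalize(x: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x] if n > 0 else list(x)


def projection_head(h: Sequence[float], w1: Matrix, b1: Sequence[float],
                    w2: Matrix, b2: Sequence[float]) -> List[float]:
    """Map a backbone output h to a unit-length sample feature."""
    return l2_normalize(linear(relu(linear(h, w1, b1)), w2, b2))


if __name__ == "__main__":
    h = [1.0, -1.0]  # toy backbone output
    feat = projection_head(h, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                           [[3.0, 0.0], [4.0, 0.0]], [0.0, 0.0])
    print(feat)
```

Normalizing the output to unit length makes the dot product with a category feature of unit length equal to their cosine similarity.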
[0050] The classification network 220 receives the sample features 203 of the image samples output by the feature extraction network 210 and, based on these sample features, predicts the categories of the image samples. The classification network 220 can maintain a set 221 of category features related to the categories of the image classification model 200, where each category feature 222 corresponds to one category. The classification network 220 compares the received sample feature of an image sample with each category feature 222 and determines the category whose category feature is closest to the sample feature as the prediction result for the image sample. For example, the classification network 220 can compute the dot product between the sample feature and each category feature 222 and normalize it to obtain cosine similarities. In some embodiments, the classification network 220 can be implemented to include a fully connected network that generates the prediction result based on the product of its weight matrix and the sample features. In this case, each row of the weight matrix of the fully connected network can be determined as the category feature 222 of the corresponding category. It should be noted that, after training, the feature extraction network 210 and the classification network 220 of the image classification model 200 are used to predict the categories of images, whereas the other components shown in Figure 2 are used during training but not for the prediction task after training. The components of the image classification model 200 that are used in the training task are described below.
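Prediction with category features taken from the rows of the fully connected weight matrix can be sketched as below. It is a minimal illustration assuming features and weights are plain Python lists; `predict` is a hypothetical helper that computes the normalized dot product (cosine similarity) between the sample feature and each row and returns the most similar category together with all similarities.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def predict(feature: Sequence[float], weight: List[List[float]]) -> Tuple[int, List[float]]:
    """Return (predicted category, cosine similarity with every category feature).

    Each row of `weight` (num_classes x dim) is treated as the category
    feature (prototype) of one category.
    """
    sims = [cosine(feature, row) for row in weight]
    best = max(range(len(sims)), key=lambda c: sims[c])
    return best, sims


if __name__ == "__main__":
    weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # three toy category features
    label, sims = predict([0.2, 0.9], weight)
    print(label)  # 1
```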
[0051] As shown in the figure, in the feature extraction network 210, the output of the backbone network 212 can also be provided to an encoder 230 for contrastive learning. The encoder 230 can be implemented as a fully connected network including one or more fully connected layers, and its output size can be, for example, 1024, 2048, or another suitable value. Herein, the encoder 230 may also be referred to as a projection network. The encoder 230 receives the output of the backbone network 212 and further encodes each image sample in the training batch of the training dataset 130 to obtain the corresponding sample features 202. The encoder 230 can also encode other samples in the training dataset 130 to generate multiple sample features 241 and store them as reference sample features; the multiple reference sample features form a reference sample feature set 240. In some embodiments, the reference sample features 241 may be generated from image samples in previous training batches of the training dataset 130 and, optionally, may also include features generated from image samples in the current training batch. The reference sample feature set 240 can be implemented as a fixed-size first-in-first-out (FIFO) queue. Thus, whenever the sample features of the current training batch are added to the reference sample feature set 240, the sample features of the earliest training batch are removed, thereby updating the reference sample feature set 240. The reference sample feature set 240 can be updated at the beginning or at the end of the current training batch.
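The fixed-size FIFO behavior of the reference sample feature set 240 can be sketched with Python's `collections.deque`, whose `maxlen` argument discards the oldest entry once the deque is full. This sketch assumes features are stored batch by batch together with their true categories (needed later to split candidates into positives and negatives) and that the capacity is counted in batches; with variable batch sizes, a per-feature capacity would be needed instead. The class name `ReferenceFeatureQueue` is hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

Feature = List[float]


class ReferenceFeatureQueue:
    """Hypothetical fixed-size FIFO of reference features, updated batch by batch."""

    def __init__(self, max_batches: int):
        # deque(maxlen=n) silently drops the oldest element once n elements are held.
        self._batches: Deque[List[Tuple[Feature, int]]] = deque(maxlen=max_batches)

    def push_batch(self, feats: List[Feature], labels: List[int]) -> None:
        """Enqueue one batch of (feature, true category) pairs, evicting the oldest if full."""
        self._batches.append(list(zip(feats, labels)))

    def features(self) -> List[Feature]:
        return [f for batch in self._batches for f, _ in batch]

    def labels(self) -> List[int]:
        return [lab for batch in self._batches for _, lab in batch]


if __name__ == "__main__":
    queue = ReferenceFeatureQueue(max_batches=2)
    queue.push_batch([[1.0]], [0])
    queue.push_batch([[2.0]], [1])
    queue.push_batch([[3.0], [4.0]], [0, 1])  # evicts the batch holding [1.0]
    print(queue.features(), queue.labels())   # [[2.0], [3.0], [4.0]] [1, 0, 1]
```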
[0052] To distinguish between the outputs of encoder 230 and encoder 214, the sample feature 202 generated by encoder 230 for contrastive learning will be referred to as the first sample feature, and the sample feature 203 generated by encoder 214 for classification will be referred to as the second sample feature. It should be understood that either or both of encoder 214 and encoder 230 may be omitted.
[0053] In some embodiments, an adversarial example set 250 for contrastive learning can be generated from three inputs: the first sample feature 202 of each image sample 201 in the training batch, the reference sample feature set 240, and the category feature set 221 from the classification network 220. The adversarial example set 250 includes an adversarial positive sample set 252 and an adversarial negative sample set 254 for the image sample. In contrastive learning, the sample features in the adversarial positive sample set 252 can be understood as similar to the first sample feature 202 of the current image sample and as having the same true category. The sample features in the adversarial negative sample set 254 can be understood as easily confused with the first sample feature 202 of the current image sample while having a different true category. The process of generating the adversarial example set 250 is described in detail below with reference to Figures 3 to 6.
[0054] The generated set of adversarial examples 250 and the class features 222 from the classification network 220 can be provided to the loss determination module 270. In a long-tail setting with imbalanced training data, the class features 222 tend to be trained toward the head classes that have more training data, which degrades performance on the tail classes. For this reason, the class features 222 may optionally be recalibrated by a calibrator 260. The recalibration applies a calibration factor to each class feature. The factor reflects the "difficulty" of the class and the "representativeness" of that class feature. The details are described below.
[0055] The processing of image samples from the training dataset 130 during training of the image classification model 200 is described below with reference to Figures 3 to 7. Generally, image samples from the training dataset 130 are input into the image classification model 110 in training batches, and each batch contains multiple image samples. The image classification model 110 is updated by calculating the loss for each training batch.
[0056] Figure 3 shows an example flowchart of an image processing procedure 300 according to an embodiment of the present disclosure. Procedure 300 can be implemented, for example, by the computing device 101 shown in Figure 1. It should be understood that procedure 300 may also include additional actions not shown and/or may omit actions shown; the scope of this disclosure is not limited in this respect. Procedure 300 is described in detail below in conjunction with Figures 1 and 2.
[0057] In box 310, computing device 101 generates first sample features 202 of an image sample 201 in training dataset 130 based on the feature extraction network 210 of image classification model 200. In this disclosure, training dataset 130 may include N training samples belonging to C categories. Each image sample 201 consists of an image together with the true category of that image. As mentioned above, image sample 201 can be encoded into first sample feature 202 via backbone network 212 and encoder 230. Image sample 201 can also be encoded into second sample feature 203 via backbone network 212 and encoder 214.
[0058] Encoder 230 may include at least one encoder, for example a first encoder and a second encoder. The first encoder generates the first sample features 202, and the second encoder generates the reference sample features 241. During training, the first encoder is updated online. The second encoder is updated offline and follows the first encoder according to a momentum mechanism.
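The disclosure names only "a momentum mechanism" for the second encoder. The sketch below assumes the common exponential-moving-average rule `key <- m * key + (1 - m) * query`, which is familiar from momentum-contrast training. For illustration, parameters are flattened to lists of floats. The function name `momentum_update` and the default `momentum=0.999` are hypothetical choices, not values taken from the source.

```python
from typing import List


def momentum_update(key_params: List[float], query_params: List[float],
                    momentum: float = 0.999) -> List[float]:
    """Return new parameters for the second (key) encoder.

    Each parameter moves a small step toward the corresponding parameter of
    the first (query) encoder, which is trained by gradient descent:
        key <- momentum * key + (1 - momentum) * query
    """
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]


second_encoder = [0.0, 1.0]
first_encoder = [1.0, 0.0]
second_encoder = momentum_update(second_encoder, first_encoder, momentum=0.9)
print(second_encoder)  # approximately [0.1, 0.9]
```

A momentum close to 1 makes the second encoder change slowly. This keeps the stored reference features in the queue consistent with one another across batches.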
[0059] In box 320, computing device 101 obtains a set of category features 221 from the classification network 220 of image classification model 200. Each category feature 222 in the set of category features 221 corresponds to a category associated with image classification model 200. In this document, each category feature in set 221 is a k-dimensional vector and is also known as a prototype.
[0060] In some embodiments, the classification network 220 may include a fully connected network with a weight matrix. The classification network 220 can multiply the weight matrix of the fully connected network with the second sample features 203 to obtain a vector representing the prediction result, and determine the category corresponding to the largest component of the prediction result as the classification result. Thus, the category feature set 221 can be determined based on the weight matrix. Specifically, a row of weights in the weight matrix can be determined as the category feature of the corresponding category. For example, the weights in the first row of the weight matrix can be determined as the category feature of the first category output by the classification network 220, the weights in the second row of the weight matrix can be determined as the category feature of the second category output by the classification network 220, and so on.
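Box 320 takes category features from a fully connected layer. A small sketch of that idea follows. It assumes a hypothetical C x k weight matrix stored as a list of rows. The sketch multiplies the matrix by a k-dimensional sample feature, picks the category with the largest component, and reads row c as the category feature of category c. All function names are illustrative.

```python
from typing import List, Sequence


def fc_logits(weight: List[Sequence[float]], feature: Sequence[float]) -> List[float]:
    """Multiply the C x k weight matrix by a k-dimensional sample feature."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]


def classify(weight: List[Sequence[float]], feature: Sequence[float]) -> int:
    """Return the category whose component of the prediction vector is largest."""
    logits = fc_logits(weight, feature)
    return max(range(len(logits)), key=logits.__getitem__)


def category_features_from_weights(weight: List[Sequence[float]]) -> List[List[float]]:
    """Row c of the weight matrix is taken as the category feature of category c."""
    return [list(row) for row in weight]


W = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]  # C = 3 categories, k = 3
z = [0.5, 0.4, 0.1]                                       # a second sample feature
print(classify(W, z))  # prints 1
```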
[0061] In box 330, computing device 101 generates an adversarial positive sample set 252 and an adversarial negative sample set 254 for image sample 201 from three inputs: first sample features 202, category feature set 221, and reference sample feature set 240. The reference sample features 241 in the reference sample feature set 240 are generated from multiple image samples in training dataset 130 based on feature extraction network 210. In some embodiments, computing device 101 generates the reference sample feature set 240 from previous batches of image samples using the backbone network 212 of feature extraction network 210 and encoder 230. The reference sample feature set 240 can take the form of a first-in, first-out queue, in which the reference sample features from the earliest training batch are deleted when sample features from a new training batch are added. The reference sample feature set 240 contains M reference sample features.
[0062] In some embodiments, the reference sample feature set 240 can be updated using the batch to which the current image sample 201 belongs. Specifically, the computing device 101 can generate sample features for the batch based on the feature extraction network 210 (using intermediate results from the backbone network 212) and the encoder 230, then add these sample features to the reference sample feature set 240, and remove the reference sample features from the earliest batch from the reference sample feature set 240. This update can be performed at the beginning of the training batch or after the image classification model 200 has been updated using the training batch.
[0063] As mentioned above, the sample features in the adversarial positive sample set 252 are understood to be similar to the first sample feature 202 and to have the same true class. The sample features in the adversarial negative sample set 254, by contrast, are easily confused with the first sample feature 202 and have different true classes. Constructing the adversarial positive sample set 252 and the adversarial negative sample set 254 therefore provides hard-to-discriminate learning samples that guide the learning direction of the model.
[0064] Continuing with Figure 3, in box 340, computing device 101 updates image classification model 200 based on the adversarial positive sample set 252, the adversarial negative sample set 254, and the category feature set 221.
[0065] Here, the adversarial positive sample set 252 and the adversarial negative sample set 254 are used to determine the contrastive learning loss. This loss includes the loss of the first sample feature 202 against the adversarial positive sample set 252 and its loss against the adversarial negative sample set 254. The category feature set 221 from the classification network 220 is used to determine the classification loss, that is, the difference between the predicted result and the true class. The classification loss includes the loss of the second sample feature 203 against the class feature of the same class in the class feature set 221 and its loss against the class features of different classes. This is described in detail below with reference to Figure 7.
[0066] Using the same method, the loss for each image sample in the training batch can be obtained, thus yielding the total loss for the entire batch. The image classification model can be updated using an optimizer such as stochastic gradient descent (SGD).
[0067] The foregoing describes an exemplary image processing procedure for training an image classification model according to embodiments of the present disclosure. With this approach, adversarial positive and negative sample sets that are hard to discriminate can be constructed to correct the decision boundary of the image classification model for tail categories. The model can thus be trained effectively in long-tail settings where the original training data is imbalanced. Further embodiments of this disclosure are described below with reference to Figures 4 to 7. They can be combined arbitrarily with the content described with reference to Figures 1 to 3, and this disclosure places no restriction on such combinations. Figures 4 to 6 illustrate the process of generating sample sets for contrastive learning according to embodiments of the present disclosure, and Figure 7 illustrates the process of determining the loss function.
[0068] Figure 4 is a schematic conceptual diagram illustrating the generation of adversarial positive and adversarial negative samples according to an embodiment of the present disclosure. As shown, positive and negative samples for contrastive learning are generated from three inputs: the first sample feature 202 of an image sample 201 in the training sample set, the category feature set 221, and the reference sample feature set 240. Here, C is the predefined total number of categories and M is the size of the reference sample feature set 240.
[0069] As shown in Figure 4, the reference sample feature set 240 is first divided into non-overlapping subsets 402 and 404 according to the true categories of its reference sample features. Set 402 includes the reference sample features whose true category is the same as that of image sample 201. Set 404 includes the reference sample features whose true category differs from that of image sample 201. Similarly, the category feature set 221 is divided into non-overlapping subsets 414 and 408. Set 414 is a single-element set containing the category feature corresponding to the true category of image sample 201. Set 408 includes all other category features in set 221. The sets 402, 404, 414, and 408 obtained from this partition are used to generate adversarial positive and adversarial negative samples for contrastive learning.
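The two partitions of Figure 4 can be sketched as follows. The sketch assumes a hypothetical layout in which the reference set is a list of `(feature, true_label)` pairs and the category features are indexed by category. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Feature = List[float]


def partition_reference(reference: Sequence[Tuple[Feature, int]], label: int):
    """Split reference features into same-category (set 402) and other-category (set 404) subsets."""
    same = [f for f, y in reference if y == label]
    different = [f for f, y in reference if y != label]
    return same, different


def partition_categories(category_features: Sequence[Feature], label: int):
    """Split category features into the true-category singleton (set 414) and the rest (set 408)."""
    true_category = [category_features[label]]
    others = [w for c, w in enumerate(category_features) if c != label]
    return true_category, others


reference = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([0.9, 0.1], 0)]
set_402, set_404 = partition_reference(reference, label=0)
categories = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
set_414, set_408 = partition_categories(categories, label=0)
print(len(set_402), len(set_404), len(set_414), len(set_408))  # prints 2 1 1 2
```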
[0070] Figure 5 shows an example flowchart of a process 500 for generating an adversarial positive sample set according to an embodiment of the present disclosure. Process 500 may be part of an exemplary implementation of box 330 in Figure 3. It can be implemented, for example, by the computing device 101 shown in Figure 1. It should be understood that process 500 may also include additional actions not shown and/or may omit actions shown; the scope of this disclosure is not limited in this respect. Process 500 is described in detail below in conjunction with Figures 1 to 4.
[0071] In box 510, computing device 101 determines the reference sample features in reference sample feature set 240 that have the same category as image sample 201, to generate a first candidate positive sample set 402.
[0072] In box 520, computing device 101 selects, from the first candidate positive sample set 402, the reference sample features whose categories were mispredicted. A misprediction indicates that the reference sample is challenging for the image classification model: its true category is the same as that of image sample 201, but the model previously predicted it incorrectly. To control the number of samples, at most γ such reference sample features may be selected to form set 406, where γ is a preset hyperparameter.
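Box 520 can be sketched under one assumption: each queued reference entry also records the category that the model predicted for it. This is a hypothetical `(feature, true_label, predicted_label)` layout. The disclosure does not say which mispredicted features to keep when there are more than γ, so this sketch simply takes them in queue order.

```python
from typing import List, Sequence, Tuple

# Hypothetical entry layout: (feature, true_label, predicted_label).
Entry = Tuple[List[float], int, int]


def select_mispredicted(candidates: Sequence[Entry], gamma: int) -> List[Entry]:
    """From set 402, keep at most gamma entries whose predicted category was wrong (set 406)."""
    wrong = [entry for entry in candidates if entry[2] != entry[1]]
    return wrong[:gamma]


set_402 = [([0.1], 0, 0), ([0.2], 0, 2), ([0.3], 0, 1), ([0.4], 0, 3)]
set_406 = select_mispredicted(set_402, gamma=2)
print([entry[2] for entry in set_406])  # prints [2, 1]
```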
[0073] In box 530, computing device 101 combines (409) the reference sample features with mispredicted categories and the category features corresponding to the mispredicted categories to generate a second candidate positive sample set 410. When a reference sample feature is mispredicted, the image classification model has incorrectly identified one of the category features in category feature set 221 as the most similar category feature. The image classification model can guide both the reference sample feature and the incorrect category feature toward the correct category. Therefore, each such reference sample feature and its incorrect category feature are combined (409) into an adversarial positive sample for the current image sample 201, forming the second candidate positive sample set 410.
[0074] In some embodiments, to generate more challenging adversarial positive samples, each reference sample feature in set 406 and its corresponding mispredicted category feature are combined by a weighted sum with randomized weights. For example, with normalized weights, the mispredicted category feature corresponding to the i-th reference sample feature in set 406 receives a weight drawn at random from (0, E). The reference sample feature receives one minus that weight. E is a hyperparameter set to a small value, such as 0.4 or another value. As a result, the adversarial positive samples in the second candidate positive sample set 410 derive mainly from the reference sample features in set 406. In some embodiments, the magnitude of each adversarial positive sample may also be normalized, for example by dividing it by its 2-norm.
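The random interpolation of [0074] can be sketched as follows. The category-feature weight is drawn uniformly from `[0, upper_e]`, and the reference feature receives `1 - weight`. The result is then divided by its 2-norm. The names `adversarial_positive` and `upper_e` are hypothetical. Note that `random.uniform` may return an endpoint, whereas the text specifies the open interval (0, E).

```python
import math
import random
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Divide a vector by its 2-norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def adversarial_positive(ref_feature: Sequence[float],
                         wrong_category_feature: Sequence[float],
                         upper_e: float = 0.4,
                         rng: random.Random = random.Random()) -> List[float]:
    """Mix a mispredicted reference feature (set 406) with the category feature it was
    mistaken for, producing one element of set 410."""
    lam = rng.uniform(0.0, upper_e)  # weight of the mispredicted category feature
    mixed = [(1.0 - lam) * r + lam * w
             for r, w in zip(ref_feature, wrong_category_feature)]
    return l2_normalize(mixed)


sample = adversarial_positive([1.0, 0.0], [0.0, 1.0], upper_e=0.4, rng=random.Random(0))
print(sample)  # first component dominates because the reference weight is at least 0.6
```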
[0075] In box 540, computing device 101 generates the adversarial positive sample set 252 based on the second candidate positive sample set 410. Computing device 101 can directly take the second candidate positive sample set 410 as the adversarial positive sample set 252 for contrastive learning. In some embodiments, the second candidate positive sample set 410 and the first candidate positive sample set 402 may instead be merged, so that the adversarial positive sample set 252 is the union of sets 410 and 402.
[0076] Figure 6 shows an example flowchart of a process 600 for generating an adversarial negative sample set according to an embodiment of the present disclosure. Process 600 may be part of an exemplary implementation of box 330 in Figure 3. It can be implemented, for example, by the computing device 101 shown in Figure 1. It should be understood that process 600 may also include additional actions not shown and/or may omit actions shown; the scope of this disclosure is not limited in this respect. Process 600 is described in detail below in conjunction with Figures 1 to 4.
[0077] In box 610, computing device 101 determines the reference sample features in reference sample feature set 240 whose category differs from that of image sample 201, to generate a first candidate negative sample set 404.
[0078] In box 620, computing device 101 selects reference sample features from the first candidate negative sample set 404 based on a comparison with the first sample feature 202. The comparison can use the distance between each feature and the first sample feature 202. For a feature in set 404, the distance is expressed as:
[0079] (1)
[0080] where the superscript T denotes the transpose operation and ‖·‖₂ denotes the 2-norm.
[0081] The reference sample features in the first candidate negative sample set 404 are then sorted in ascending order of their distance to the first sample feature 202, and the first γ reference sample features are selected. Here, γ is a preset hyperparameter that may be the same as or different from the γ used to select set 406. This yields set 412, as follows:
[0082] (2)
[0083] where the γ-th reference sample feature of the sorted set 404 serves as the cutoff.
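Box 620 can be sketched as a sort followed by a top-γ selection. Formula (1) is not reproduced in this text. Because it involves a transpose and a 2-norm, the sketch assumes a cosine distance `1 - zᵀr / (‖z‖₂ ‖r‖₂)`. Treat that choice, along with the function names, as a hypothetical stand-in.

```python
import math
from typing import List, Sequence


def cosine_distance(z: Sequence[float], r: Sequence[float]) -> float:
    """Assumed distance: 1 minus the cosine similarity of z and r."""
    dot = sum(a * b for a, b in zip(z, r))
    norms = math.sqrt(sum(a * a for a in z)) * math.sqrt(sum(b * b for b in r))
    return 1.0 - dot / norms


def select_hard_negatives(first_feature: Sequence[float],
                          candidates: Sequence[Sequence[float]],
                          gamma: int) -> List[Sequence[float]]:
    """Sort set 404 by ascending distance to the first sample feature; keep the first gamma (set 412)."""
    ranked = sorted(candidates, key=lambda r: cosine_distance(first_feature, r))
    return ranked[:gamma]


z1 = [1.0, 0.0]
set_404 = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [1.0, 1.0]]
print(select_hard_negatives(z1, set_404, gamma=2))  # prints [[1.0, 0.1], [1.0, 1.0]]
```

The nearest other-category features are the ones most easily confused with the current sample, which is why the sort is ascending.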
[0084] In box 630, the selected reference sample features (set 412) are combined (415) with the category feature corresponding to the category of image sample 201 (set 414) to generate a second candidate negative sample set 420. In some embodiments, to generate more challenging adversarial negative samples, the reference sample features and the category feature in set 414 are randomly weighted and summed (415), as follows:
[0085] (3)
[0086] where the interpolation coefficients are random for each image sample and bounded above by the hyperparameter E. The magnitude of each adversarial negative sample in set 420 is normalized by its 2-norm. To make the reference sample features contribute more, E takes a small value, so that the weight of the selected reference sample features is greater than the weight of the category feature.
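Box 630 can be sketched in the same way as the positive-side interpolation, but with a different partner: each hard negative from set 412 is mixed with the true-category feature from set 414. Formula (3) is not reproduced here, so this is an assumed form. The sketch draws one coefficient per negative from `[0, upper_e]`, gives it to the category feature, and gives `1 - coefficient` to the reference feature. It then normalizes each result by its 2-norm. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Divide a vector by its 2-norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def adversarial_negatives(hard_negatives: Sequence[Sequence[float]],
                          true_category_feature: Sequence[float],
                          upper_e: float = 0.4,
                          rng: random.Random = random.Random()) -> List[List[float]]:
    """Build set 420 by mixing each hard negative (set 412) with the true-category feature (set 414)."""
    out = []
    for r in hard_negatives:
        lam = rng.uniform(0.0, upper_e)  # category weight; below 1 - lam whenever upper_e < 0.5
        mixed = [(1.0 - lam) * a + lam * w for a, w in zip(r, true_category_feature)]
        out.append(l2_normalize(mixed))
    return out


set_420 = adversarial_negatives([[1.0, 0.1], [1.0, 1.0]], [0.6, 0.8], rng=random.Random(1))
print(len(set_420))  # prints 2
```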
[0087] Next, in box 640, computing device 101 generates the adversarial negative sample set 254 based on the second candidate negative sample set 420. Computing device 101 can directly take the second candidate negative sample set 420 as the adversarial negative sample set 254 for contrastive learning. In some embodiments, the second candidate negative sample set 420 and the first candidate negative sample set 404 may instead be merged, so that the adversarial negative sample set 254 is the union of sets 420 and 404.
[0088] The process of generating adversarial positive and adversarial negative samples has been described above with reference to Figures 4 to 6. The adversarial positive and adversarial negative samples, together with the category features in the category feature set 221, are used to determine the loss for image sample 201.
[0089] Because the training data is imbalanced, the class features 222 in the classification network 220 may be trained to favor head classes, resulting in poor performance on tail classes. Embodiments of this disclosure therefore also provide a mechanism for recalibrating each class feature 222. This mechanism can be implemented by the calibrator 260 shown in Figure 2. Formally, a calibration factor is provided for each category feature 222, as follows:
[0090] (4)
[0091] where the sample subset is the subset of samples related to category c, its size is the number of samples in that subset, and the feature of each drawn sample is the second sample feature 203 output by the feature extraction network 210 of the image classification model 200. In batch-based training, calibration is implemented end to end, and the global calibration factor for each category feature is obtained by a moving average. The local calibration factor in the current batch is calculated by formula (4) above, and the global calibration factor is updated as follows:
[0092] (5)
[0093] where the local calibration factor is computed from the samples with label c in each batch, and β is a hyperparameter that serves as a smoothing coefficient. The global calibration factor reflects the "difficulty" of each category and the "representativeness" of its category feature. Finally, the calibration factors are applied to the category features in the category feature set to obtain the calibrated category features, as follows:
[0094] (6)
[0095] In this manner, for each category feature, computing device 101 can use feature extraction network 210 to generate the sample features of at least one image sample, in the batch containing image sample 201, whose category corresponds to that category feature. Computing device 101 then determines the local calibration factor of the category feature for this batch based on the sample features of the at least one image sample and the category feature. Next, computing device 101 derives the global calibration factor of the category feature from the local calibration factor by a moving average across batches, for example according to formula (5). Finally, computing device 101 uses the global calibration factor to adjust the category feature. In some embodiments, computing device 101 uses the adjusted category features to determine the loss for image sample 201.
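Formulas (4) to (6) are not reproduced in this text, so the following covers only the bookkeeping of [0095] and rests on stated assumptions. The hypothetical `local_factor` stands in for formula (4) and uses the mean cosine similarity between the class-c second sample features in the batch and category feature c. The moving average uses the common form `beta * previous + (1 - beta) * local`. The global factor is initialized with the first local value. Only categories present in the batch are updated. How the global factor is then applied to the category feature (formula (6)) is not sketched.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def local_factor(class_features: Sequence[Sequence[float]],
                 category_feature: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in for formula (4): mean cosine similarity over class-c samples in the batch."""
    return sum(cosine(f, category_feature) for f in class_features) / len(class_features)


def update_global_factors(global_factors: Dict[int, float],
                          batch_features: Sequence[Sequence[float]],
                          batch_labels: Sequence[int],
                          category_features: Sequence[Sequence[float]],
                          beta: float = 0.9) -> Dict[int, float]:
    """Moving-average update of per-category global factors; absent categories are left unchanged."""
    by_class: Dict[int, List[Sequence[float]]] = {}
    for feature, label in zip(batch_features, batch_labels):
        by_class.setdefault(label, []).append(feature)
    for c, feats in by_class.items():
        s_local = local_factor(feats, category_features[c])
        previous = global_factors.get(c, s_local)  # first occurrence: start from the local value
        global_factors[c] = beta * previous + (1.0 - beta) * s_local
    return global_factors


categories = [[1.0, 0.0], [1.0, 1.0]]
factors: Dict[int, float] = {}
update_global_factors(factors, [[1.0, 0.0], [0.0, 1.0]], [0, 1], categories)
update_global_factors(factors, [[0.0, 1.0]], [0], categories)  # only category 0 changes
print(factors)
```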
[0096] Refer next to Figure 7, which shows an example flowchart of a process 700 for determining the loss of an image sample according to embodiments of the present disclosure. Process 700 may be an exemplary implementation of box 340 in Figure 3 and may be implemented by the loss determination module 270 shown in Figure 2. Process 700 can be implemented, for example, by the computing device 101 shown in Figure 1. It should be understood that process 700 may also include additional actions not shown and/or may omit actions shown; the scope of this disclosure is not limited in this respect. Process 700 is described in detail below in conjunction with Figures 1 to 4.
[0097] Overall, computing device 101 can determine the loss for image sample 201 based on the class feature set 221, the adversarial positive sample set 252, and the adversarial negative sample set 254. In the following exemplary description, the class feature set 221 includes the adjusted class features. The adversarial positive sample set 252 and the adversarial negative sample set 254 are the sets generated by processes 500 and 600, respectively. It should be understood that the category feature set 221, the adversarial positive sample set 252, and the adversarial negative sample set 254 used to determine the loss can also be different; this disclosure does not limit them.
[0098] In box 710, computing device 101 generates the second sample feature 203 of image sample 201 based on feature extraction network 210 (including encoder 214). As mentioned above, computing device 101 also generates the first sample feature 202 of image sample 201 based on feature extraction network 210, for example via backbone network 212 and encoder 230.
[0099] In box 720, computing device 101 determines a first loss for image sample 201 from two comparisons. The first compares the second sample feature 203 with the category features in category feature set 221 whose category differs from that of image sample 201. The second compares the first sample feature 202 with the adversarial negative sample set 254. Two sample features can be compared by calculating their dot product. In some embodiments, the first loss is obtained by summing the comparison results. The first loss is associated with the negative samples and includes a classification loss term from the second sample feature 203 and an adversarial loss term from the first sample feature 202.
[0100] In box 730, computing device 101 determines a second loss for image sample 201 from two comparisons. The first compares the second sample feature 203 with the category feature in category feature set 221 whose category is the same as that of image sample 201. The second compares the first sample feature 202 with the adversarial positive sample set 252. As before, the second loss is obtained by summing the comparison results. The second loss is associated with the positive samples and includes a classification loss term from the second sample feature 203 and an adversarial loss term from the first sample feature 202.
[0101] In box 740, computing device 101 updates the image classification model based on the first loss and the second loss. In some embodiments, the loss from an image sample 201 can be represented as follows:
[0102] (7)
[0103] Using the same method, the loss for each image sample in the training batch can be obtained, which yields the total loss for the entire batch. The losses of the image samples in a training batch can be computed in parallel on a parallel processing device (e.g., a GPU). An optimizer such as stochastic gradient descent is then used to update the image classification model 200.
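Formula (7) is not reproduced in this text, so the per-sample loss below is only a hypothetical sketch of how the first and second losses might be combined. It assumes an InfoNCE-style ratio with a temperature `tau`. The positive terms (second loss) are exponentiated dot products of the second sample feature `z2` with its own category feature, plus those of the first sample feature `z1` with the adversarial positives. The negative terms (first loss) do the same with the other category features and the adversarial negatives. Features are assumed to be L2-normalized. The name `sample_loss` and the value `tau=0.1` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def second_loss_terms(z1, z2, category_features, label, adv_pos, tau) -> float:
    """Positive side: same-category feature (classification) plus adversarial positives (contrastive)."""
    total = math.exp(dot(z2, category_features[label]) / tau)
    return total + sum(math.exp(dot(z1, p) / tau) for p in adv_pos)


def first_loss_terms(z1, z2, category_features, label, adv_neg, tau) -> float:
    """Negative side: other-category features (classification) plus adversarial negatives (contrastive)."""
    total = sum(math.exp(dot(z2, w) / tau)
                for c, w in enumerate(category_features) if c != label)
    return total + sum(math.exp(dot(z1, n) / tau) for n in adv_neg)


def sample_loss(z1, z2, category_features: List[Sequence[float]], label: int,
                adv_pos, adv_neg, tau: float = 0.1) -> float:
    """HYPOTHETICAL unified loss for one image sample: -log(pos / (pos + neg))."""
    pos = second_loss_terms(z1, z2, category_features, label, adv_pos, tau)
    neg = first_loss_terms(z1, z2, category_features, label, adv_neg, tau)
    return -math.log(pos / (pos + neg))


categories = [[1.0, 0.0], [0.0, 1.0]]
aligned = sample_loss([1.0, 0.0], [1.0, 0.0], categories, 0, [[1.0, 0.0]], [[0.0, 1.0]])
confused = sample_loss([0.0, 1.0], [0.0, 1.0], categories, 0, [[1.0, 0.0]], [[0.0, 1.0]])
print(aligned < confused)  # prints True
```

The batch loss would then be the sum or mean of `sample_loss` over the batch, minimized with an optimizer such as SGD.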
[0104] Image processing methods and processes according to embodiments of the present disclosure have been described above with reference to Figures 1 to 7. Compared with existing solutions, embodiments of the present disclosure can construct adversarial positive and negative sample sets that are hard to discriminate. These sets correct the decision boundaries of image classification models for tail categories, enabling effective training in long-tail settings where the original training data is imbalanced. In some embodiments, the classification network of the image classification model is calibrated to further eliminate or mitigate the bias of class features toward head categories caused by imbalanced training data. In some embodiments, a unified loss function that combines the prediction loss and the contrastive loss is also provided to train the image classification model, achieving better training results.
[0105] Embodiments of this disclosure also provide exemplary apparatus and devices. Figure 8 shows an example block diagram of an image processing apparatus 800 according to an embodiment of the present disclosure. The apparatus 800 can be implemented in the computing device 101 shown in Figure 1.
[0106] As shown in the figure, the apparatus 800 includes a first sample feature generation unit 810, a category feature acquisition unit 820, an adversarial sample set generation unit 830, and a model update unit 840.
[0107] The first sample feature generation unit 810 is configured to generate first sample features for image samples in the training dataset based on the feature extraction network of the image classification model. The category feature acquisition unit 820 is configured to acquire a category feature set from the classification network of the image classification model, where each category feature corresponds to a category associated with the image classification model. The adversarial example set generation unit 830 is configured to generate an adversarial positive sample set and an adversarial negative sample set for the image samples based on the first sample features, the category feature set, and the reference sample feature set, where the reference sample features in the reference sample feature set are generated from multiple image samples in the training dataset based on the feature extraction network. The model update unit 840 is configured to update the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the category feature set.
[0108] In some embodiments, the image classification model may further include at least one encoder connected to the feature extraction network. Generating first sample features from image samples may include: generating first sample features from image samples based on the feature extraction network and at least one encoder.
[0109] In some embodiments, the apparatus 800 may further include a reference sample feature update unit. The reference sample feature update unit is configured to generate sample features from image samples in a batch containing the image samples, based on a feature extraction network and at least one encoder; add the generated sample features to a reference sample feature set; and remove the reference sample features from the earliest batch from the reference sample feature set.
[0110] In some embodiments, the classification network includes a fully connected network, and the category feature acquisition unit 820 may also be configured to determine a set of category features based on the weights of the fully connected network.
[0111] In some embodiments, the apparatus 800 may further include a calibration unit. The calibration unit is configured to: for each category feature in the category feature set, generate sample features of at least one image sample in the batch containing the image sample, corresponding to the category feature, based on a feature extraction network; determine a local calibration factor for the category feature at the batch based on the sample features and category feature of the at least one image sample; determine a global calibration factor for the category feature from the local calibration factor by a moving average across batches; and adjust the category feature using the global calibration factor.
[0112] In some embodiments, the model update unit 840 may also be configured to update the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the adjusted class features of the class feature set.
[0113] In some embodiments, the adversarial sample set generation unit 830 may further be configured to: determine reference sample features in the reference sample feature set that have the same category as the image sample to generate a first candidate positive sample set; select reference sample features with mispredicted categories from the first candidate positive sample set; combine the reference sample features with mispredicted categories and category features corresponding to the mispredicted categories to generate a second candidate positive sample set; and generate an adversarial positive sample set based on the second candidate positive sample set.
[0114] In some embodiments, the adversarial sample set generation unit 830 may further be configured to: determine reference sample features from a reference sample feature set whose categories are different from those of the image samples, to generate a first candidate negative sample set; select reference sample features from the first candidate negative sample set based on a comparison with the first sample features; combine the selected reference sample features with category features corresponding to the category of the image samples to generate a second candidate negative sample set; and generate an adversarial negative sample set based on the second candidate negative sample set.
[0115] In some embodiments, the adversarial sample set generation unit 830 may also be configured to: determine the distance between the reference sample features in the first candidate negative sample set and the first sample features; and select a number of reference sample features that are closest to the first sample features.
[0116] In some embodiments, the adversarial example set generation unit 830 may also be configured to: perform a weighted summation of selected reference sample features and category features corresponding to the category of the image sample, wherein the first weight for the selected reference sample features and the second weight for the category features are randomized, and the first weight is greater than the second weight.
[0117] In some embodiments, the model update unit 840 may further be configured to: generate second sample features for the image sample based on the feature extraction network; determine a first loss for the image sample based on a comparison of the second sample features and category features in the category feature set whose category is different from that of the image sample, and a comparison of the first sample features and the adversarial negative sample set; determine a second loss for the image sample based on a comparison of the second sample features and category features in the category feature set whose category is the same as that of the image sample, and a comparison of the first sample features and the adversarial positive sample set; and update the image classification model based on the first loss and the second loss.
[0118] Figure 9 shows a schematic block diagram of an example device 900 that can be used to implement embodiments of the present disclosure. For example, the computing device 101 according to an embodiment of the present disclosure may be implemented by device 900. As shown, device 900 includes a central processing unit (CPU) or graphics processing unit (GPU) 901, which can perform various appropriate actions and processes according to computer program instructions stored in read-only memory (ROM) 902 or loaded from storage unit 908 into random access memory (RAM) 903. RAM 903 may also store various programs and data required for the operation of device 900. The CPU/GPU 901, ROM 902, and RAM 903 are interconnected via bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
[0119] Multiple components in device 900 are connected to the I/O interface 905, including: an input unit 906, such as a keyboard or a mouse; an output unit 907, such as various types of displays and speakers; a storage unit 908, such as a magnetic disk or an optical disc; and a communication unit 909, such as a network card, a modem, or a wireless transceiver. The communication unit 909 allows device 900 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
[0120] The various processes, procedures, models, or apparatuses described above may be executed or implemented by the CPU/GPU 901. For example, in some embodiments, the methods or processes 300, 500, and/or 600 may be implemented as computer software programs tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed on device 900 via ROM 902 and/or communication unit 909. When the computer program is loaded into RAM 903 and executed by the CPU/GPU 901, one or more actions of the methods or processes 300, 500, and/or 600 described above may be performed, or the image classification models 110 and 200 shown in Figure 2, the apparatus 800 shown in Figure 8, or implementations thereof may be realized.
[0121] This disclosure may be a method, apparatus, system, and/or computer program product. A computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for performing various aspects of this disclosure.
[0122] A computer-readable storage medium may be a tangible device capable of holding and storing instructions for use by an instruction execution device. The computer-readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium as used herein is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
[0123] The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
[0124] Computer program instructions for performing the operations of this disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized using state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement various aspects of this disclosure.
[0125] Various aspects of this disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It should be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
[0126] These computer-readable program instructions can be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium and can direct a computer, programmable data processing apparatus, and/or other device to operate in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions for implementing aspects of the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
[0127] Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
[0128] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in a different order than that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
[0129] Various embodiments of this disclosure have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies in the market, or to enable others skilled in the art to understand the embodiments disclosed herein.
Claims
1. A method for image processing, comprising: generating, based on a feature extraction network of an image classification model, first sample features of an image sample in a training dataset; obtaining a set of category features from a classification network of the image classification model, wherein each category feature corresponds to a category associated with the image classification model; generating an adversarial positive sample set and an adversarial negative sample set for the image sample based on the first sample features, the category feature set, and a reference sample feature set, wherein the reference sample features in the reference sample feature set are generated by the feature extraction network from multiple image samples in the training dataset; and updating the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the category feature set.
2. The method according to claim 1, wherein the image classification model further includes at least one encoder connected to the feature extraction network, and generating the first sample features of the image sample includes: generating the first sample features from the image sample based on the feature extraction network and the at least one encoder.
3. The method according to claim 2, further comprising: generating, based on the feature extraction network and the at least one encoder, sample features from the image samples in the batch containing the image sample; adding the generated sample features to the reference sample feature set; and removing the earliest batch of reference sample features from the reference sample feature set.
4. The method according to claim 1, wherein the classification network includes a fully connected network, and obtaining the category feature set from the classification network includes: determining the category feature set based on the weights of the fully connected network.
5. The method according to claim 1, further comprising, for each category feature in the category feature set: generating, based on the feature extraction network, sample features of at least one image sample in the batch to which the image sample belongs, the at least one image sample having the category corresponding to the category feature; determining, based on the sample features of the at least one image sample and the category feature, a local calibration factor for the category feature in the batch; determining a global calibration factor for the category feature from the local calibration factor using a moving average across batches; and adjusting the category feature using the global calibration factor.
6. The method according to claim 5, wherein updating the image classification model includes: updating the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the adjusted category features of the category feature set.
7. The method according to any one of claims 1 to 6, wherein generating the adversarial positive sample set includes: determining reference sample features in the reference sample feature set whose category is the same as that of the image sample, to generate a first candidate positive sample set; selecting, from the first candidate positive sample set, reference sample features whose category is incorrectly predicted; combining the reference sample features with the incorrectly predicted category and the category features corresponding to the incorrectly predicted category to generate a second candidate positive sample set; and generating the adversarial positive sample set based on the second candidate positive sample set.
8. The method according to any one of claims 1 to 6, wherein generating the adversarial negative sample set includes: determining, from the reference sample feature set, reference sample features whose category differs from that of the image sample, to generate a first candidate negative sample set; selecting reference sample features from the first candidate negative sample set based on a comparison with the first sample features; combining the selected reference sample features and the category feature corresponding to the category of the image sample to generate a second candidate negative sample set; and generating the adversarial negative sample set based on the second candidate negative sample set.
9. The method according to claim 8, wherein selecting reference sample features from the first candidate negative sample set based on the comparison with the first sample features includes: determining distances between the reference sample features in the first candidate negative sample set and the first sample features; and selecting a number of reference sample features closest to the first sample features.
10. The method according to claim 8, wherein combining the selected reference sample features with the category feature corresponding to the category of the image sample to generate the second candidate negative sample set includes: performing a weighted summation of the selected reference sample features and the category feature corresponding to the category of the image sample, wherein a first weight for the selected reference sample features and a second weight for the category feature are randomly determined, and the first weight is greater than the second weight.
11. The method according to any one of claims 1 to 6, wherein updating the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the category feature set includes: generating second sample features of the image sample based on the feature extraction network; determining a first loss for the image sample based on a comparison of the second sample features with the category features in the category feature set whose category differs from that of the image sample, and a comparison of the first sample features with the adversarial negative sample set; determining a second loss for the image sample based on a comparison of the second sample features with the category feature in the category feature set whose category is the same as that of the image sample, and a comparison of the first sample features with the adversarial positive sample set; and updating the image classification model based on the first loss and the second loss.
12. An apparatus for image processing, comprising: a first sample feature generation unit configured to generate, based on a feature extraction network of an image classification model, first sample features of an image sample in a training dataset; a category feature acquisition unit configured to acquire a set of category features from a classification network of the image classification model, wherein each category feature corresponds to a category associated with the image classification model; an adversarial sample set generation unit configured to generate an adversarial positive sample set and an adversarial negative sample set for the image sample based on the first sample features, the category feature set, and a reference sample feature set, wherein the reference sample features in the reference sample feature set are generated by the feature extraction network from multiple image samples in the training dataset; and a model update unit configured to update the image classification model based on the adversarial positive sample set, the adversarial negative sample set, and the category feature set.
13. A computing device, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the computing device to perform the method according to any one of claims 1 to 11.
14. A computer-readable storage medium comprising machine-executable instructions that, when executed by a device, cause the device to perform the method according to any one of claims 1 to 11.
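By way of illustration only, claims 3 and 7 can be sketched together in Python: the reference sample feature set is kept as a first-in, first-out store of whole batches, and adversarial positive samples are formed from same-category reference features that the model mispredicts. The queue capacity `max_batches`, the dot-product predictor `predict_by_dot`, and the weighted-sum combination used for positive samples (claim 7 does not state how the combination is performed) are all assumptions, and every name is hypothetical.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

Vector = List[float]


class ReferenceQueue:
    """Reference sample feature set stored as a FIFO of whole batches.

    Assumption: fixed capacity of `max_batches` batches; deque(maxlen=...)
    evicts the earliest batch automatically when a new one is appended.
    """

    def __init__(self, max_batches: int) -> None:
        self.batches: Deque[List[Tuple[Vector, int]]] = deque(maxlen=max_batches)

    def push_batch(self, features: Sequence[Sequence[float]], labels: Sequence[int]) -> None:
        # Add the current batch; once full, the oldest batch is removed.
        self.batches.append([(list(f), lab) for f, lab in zip(features, labels)])

    def items(self) -> List[Tuple[Vector, int]]:
        # Flatten all stored batches into (feature, label) pairs.
        return [item for batch in self.batches for item in batch]


def predict_by_dot(category_features: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], int]:
    """Hypothetical predictor: index of the category feature with the largest dot product."""
    def predict(feature: Sequence[float]) -> int:
        scores = [sum(f * c for f, c in zip(feature, cat)) for cat in category_features]
        return max(range(len(scores)), key=scores.__getitem__)
    return predict


def adversarial_positives(
    sample_label: int,
    queue: ReferenceQueue,
    predict: Callable[[Sequence[float]], int],
    category_features: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[Vector]:
    positives: List[Vector] = []
    for feature, label in queue.items():
        if label != sample_label:
            continue  # first candidate positive set: same category as the sample
        predicted = predict(feature)
        if predicted == label:
            continue  # keep only references whose category is mispredicted
        # Assumption: combine as in the negative case, reference weight dominating.
        w2 = rng.uniform(0.0, 0.5)
        w1 = 1.0 - w2
        wrong_category = category_features[predicted]
        positives.append([w1 * f + w2 * c for f, c in zip(feature, wrong_category)])
    return positives
```

Mixing a mispredicted same-category feature toward the wrongly predicted category produces a positive that is hard to recognize, which is the counterpart of the hard negatives in claim 8.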