Training methods, devices, electronic equipment, and storage media for image classification models

By using the vMF distribution to model image features in the image classification model and generating image feature comparison pairs, the problem of data imbalance in the long-tailed distribution is solved, and the generalization performance of the model is improved.

CN117372756B · Active · Publication Date: 2026-06-30 · TSINGHUA UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
TSINGHUA UNIVERSITY
Filing Date
2023-10-10
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing image classification models are prone to overfitting in long-tailed distributed data, leading to a decline in generalization performance. Supervised contrastive learning requires a large amount of data, which is difficult to meet the requirements of class coverage.

Method used

Image features are modeled using the vMF distribution. The model is trained by generating image feature contrast pairs, including positive and negative pairs. The model is optimized using both contrastive loss function and adjustment loss function. A closed-form expression is derived to address the data imbalance problem.

Benefits of technology

By balancing the sampling of image category features, the performance of the model is improved, the problem of data imbalance is solved, and efficient image classification is achieved.

✦ Generated by Eureka AI based on patent content.

Figure CN117372756B_ABST

Abstract

This disclosure provides a training method, apparatus, electronic device, and storage medium for an image classification model, relating to the field of image processing technology, and aiming to solve the problem of imbalanced class data. The method includes: inputting multiple image samples carrying image class labels into an image classification model to be trained, obtaining target image features for each of the multiple image samples; determining the vMF distribution of image features for each image class based on the target image features of image samples carrying the same image class label; sampling the vMF distribution of image features for each image class to obtain multiple sampled image features; generating multiple image feature comparison pairs based on each sampled image feature and each target image feature; and training the image classification model to be trained based on the multiple image feature comparison pairs to obtain a trained image classification model.

Description

Technical Field

[0001] This disclosure relates to the field of image processing technology, and in particular to a training method, apparatus, electronic device, and storage medium for an image classification model.

Background Technology

[0002] Long-tailed distributions are common in real-world data: a few categories account for most of the samples, while most categories contain only a small amount of data, resulting in an imbalanced data distribution. A model trained on imbalanced data learns the categories with many samples well and the categories with few samples poorly. Long-tailed data therefore significantly reduces the model's generalization performance and easily leads to overfitting.

[0003] Related research has shown that supervised contrastive learning (SCL) has great potential in mitigating data imbalance. However, supervised contrastive learning requires a sufficiently large amount of training data to construct contrastive pairs covering all classes, a requirement that is difficult to meet when class data is imbalanced.

Summary of the Invention

[0004] In view of the above problems, this disclosure provides a training method, apparatus, electronic device and storage medium for an image classification model, so as to overcome the above problems or at least partially solve the above problems.

[0005] A first aspect of this disclosure provides a method for training an image classification model, the method comprising:

[0006] Multiple image samples carrying image category labels are input into the image classification model to be trained to obtain the target image features of each of the multiple image samples;

[0007] Based on the target image features of image samples carrying the same image category label, determine the vMF distribution of image features for each image category;

[0008] The vMF distribution of image features for each image category is sampled to obtain multiple sampled image features;

[0009] Based on each of the sampled image features and each of the target image features, multiple image feature comparison pairs are generated. The image feature comparison pairs include the following two types: positive pairs consisting of two image features of the same image category, and negative pairs consisting of two image features of different image categories.

[0010] Based on the multiple image feature comparison pairs, the image classification model to be trained is trained to obtain a trained image classification model.

[0011] Optionally, training the image classification model to be trained based on the plurality of image feature comparison pairs to obtain a trained image classification model includes:

[0012] Based on the vMF distribution of image features for each image category, a closed-form expression for the contrastive loss function is derived for the case where an infinite number of image feature comparison pairs are sampled from the vMF distribution;

[0013] The contrastive loss function value is determined based on the image feature comparison pair, the image category to which the target image features in the image feature comparison pair belong, and the closed-form expression of the contrastive loss function;

[0014] Based on the contrastive loss function value, the image classification model to be trained is trained to obtain the image classification model.

[0015] Optionally, the step of inputting multiple image samples carrying image category labels into the image classification model to be trained to obtain the target image features of each of the multiple image samples includes:

[0016] Obtain multiple image samples from multiple training batches;

[0017] Multiple image samples from each training batch are input into the image classification model to be trained for that training batch to obtain the target image features of each multiple image sample from each training batch.

[0018] The step of determining the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label includes:

[0019] In each training batch, image features for each image category are obtained based on the target image features of each of the multiple image samples in the training batch and the target image features of each of the multiple image samples in each previous training batch.

[0020] Based on the image features of each image category, construct the probability density function of the vMF distribution of the image features for each image category;

[0021] Obtain statistical data on image features for each of the aforementioned image categories;

[0022] Based on the statistical data, the mean parameter and concentration parameter of the probability density function of the image feature vMF distribution for each image category are determined using the maximum likelihood estimation method;

[0023] The image feature vMF distribution for each image category is determined based on the mean parameter and concentration parameter of the probability density function of the image feature vMF distribution for each image category.

[0024] Optionally, after obtaining multiple sampled image features, the method further includes:

[0025] Obtain the image features of the image to be processed;

[0026] Based on the image features of the image to be processed and the image features of each image category, calculate the contrastive loss function value of the image to be processed for each image category;

[0027] The image category with the smallest contrastive loss function value is determined as the image category of the image to be processed;

[0028] Based on the image category of the image to be processed, a pseudo-label for the image category is generated, and the pseudo-label is used for semi-supervised training.

[0029] A second aspect of this disclosure provides a training apparatus for an image classification model, the apparatus comprising:

[0030] The input module is used to input multiple image samples carrying image category labels into the image classification model to be trained, and obtain the target image features of each of the multiple image samples;

[0031] The determination module is used to determine the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label.

[0032] The sampling module is used to sample the vMF distribution of image features for each image category to obtain multiple sampled image features;

[0033] The generation module is used to generate multiple image feature comparison pairs based on each of the sampled image features and each of the target image features. The image feature comparison pairs include the following two types: positive pairs composed of two image features of the same image category, and negative pairs composed of two image features of different image categories.

[0034] The training module is used to train the image classification model to be trained based on the multiple image feature comparison pairs, so as to obtain a trained image classification model.

[0035] Optionally, the training module is specifically used for:

[0036] Based on the vMF distribution of image features for each image category, a closed-form expression for the contrastive loss function is derived for the case where an infinite number of image feature comparison pairs are sampled from the vMF distribution;

[0037] The contrastive loss function value is determined based on the image feature comparison pair, the image category to which the target image features in the image feature comparison pair belong, and the closed-form expression of the contrastive loss function;

[0038] Based on the contrastive loss function value, the image classification model to be trained is trained to obtain the image classification model.

[0039] Optionally, the input module is specifically used for:

[0040] Obtain multiple image samples from multiple training batches;

[0041] Multiple image samples from each training batch are input into the image classification model to be trained for that training batch to obtain the target image features of each multiple image sample from each training batch.

[0042] The step of determining the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label includes:

[0043] In each training batch, image features for each image category are obtained based on the target image features of each of the multiple image samples in the training batch and the target image features of each of the multiple image samples in each previous training batch.

[0044] Based on the image features of each image category, construct the probability density function of the vMF distribution of the image features for each image category;

[0045] Obtain statistical data on image features for each of the aforementioned image categories;

[0046] Based on the statistical data, the mean parameter and concentration parameter of the probability density function of the image feature vMF distribution for each image category are determined using the maximum likelihood estimation method;

[0047] The image feature vMF distribution for each image category is determined based on the mean parameter and concentration parameter of the probability density function of the image feature vMF distribution for each image category.

[0048] Optionally, after obtaining multiple sampled image features, the apparatus further includes:

[0049] The acquisition module is used to acquire the image features of the image to be processed;

[0050] The calculation module is used to calculate the contrastive loss function value of the image to be processed for each image category based on the image features of the image to be processed and the image features of each image category.

[0051] The category determination module is used to determine the image category with the smallest contrastive loss function value as the image category of the image to be processed;

[0052] The pseudo-label generation module is used to generate pseudo-labels for the image category of the image to be processed based on the image category of the image to be processed. The pseudo-labels for the image category are used for semi-supervised training.

[0053] A third aspect of this disclosure provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute instructions to implement a training method for an image classification model as described in the first aspect.

[0054] A fourth aspect of this disclosure provides a computer-readable storage medium that, when instructions in the computer-readable storage medium are executed by a processor of an electronic device, enables the electronic device to perform a training method for an image classification model as described in the first aspect.

[0055] The embodiments disclosed herein have the following advantages:

[0056] In this embodiment, multiple image samples carrying image category labels are input into the image classification model to be trained to obtain the target image features of each of the multiple image samples. Based on the target image features of the image samples carrying the same image category label, the vMF distribution (von Mises-Fisher distribution, a distribution on a unit sphere) of the image features for each image category is determined. The vMF distribution of the image features for each image category is sampled to obtain multiple sampled image features. Multiple image feature comparison pairs are generated based on each sampled image feature and each target image feature. These image feature comparison pairs include two types: positive pairs consisting of two image features of the same image category, and negative pairs consisting of two image features of different image categories. Based on these multiple image feature comparison pairs, the image classification model to be trained is trained to obtain a trained image classification model. Thus, by balancing the number of times the vMF distribution of the image features for each image category is sampled, the problem of data imbalance between different image categories can be solved. Furthermore, by training the initial image classification model on image feature comparison pairs that are balanced across image categories, a high-performance image classification model can be obtained.

Attached Figure Description

[0057] To more clearly illustrate the technical solutions of the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments of this disclosure will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0058] Figure 1 is a flowchart illustrating the steps of a training method for an image classification model according to an embodiment of this disclosure.

[0059] Figure 2 is a schematic diagram of the structure of a training apparatus for an image classification model according to an embodiment of this disclosure.

Detailed Implementation

[0060] To make the above-mentioned objectives, features and advantages of this disclosure more apparent and understandable, the disclosure will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0061] Referring to Figure 1, a flowchart of the steps of a training method for an image classification model according to an embodiment of this disclosure is shown. As shown in Figure 1, the training method for the image classification model may specifically include steps S11 to S15.

[0062] In step S11: multiple image samples carrying image category labels are input into the image classification model to be trained to obtain the target image features of each of the multiple image samples.

[0063] Multiple image samples can come from different image datasets. The image datasets can be image datasets with a balanced distribution of image classes or image datasets with an unbalanced distribution of image classes. Each image sample carries an image class label that represents the image class to which the image sample belongs.

[0064] Each image sample is input into the image classification model to be trained. The feature extraction network included in the image classification model can extract the image features of each image sample, thus obtaining the target image features of each image sample. The structure of the feature extraction network can refer to relevant techniques.

[0065] Optionally, multiple first image samples carrying image category labels can be obtained first, and then various data augmentation processes can be applied to each first image sample to obtain a second image sample corresponding to each first image sample. The second image sample has the same image category label as the corresponding first image sample.

[0066] The data augmentation processing performed on the first image sample may include, but is not limited to: random cropping, flipping, rotating, scaling, translating, adding noise, blurring, masking, and color gamut changes.

[0067] After obtaining the first image sample and the second image sample, the first image sample and the second image sample are respectively input into the image classification model to obtain the target image features of each image sample.
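As an illustration of how a second image sample might be derived from a first image sample, the following minimal sketch applies two of the listed augmentations (random flipping and random cropping) with NumPy. The function name `augment`, the crop size, and the image layout (height, width, channels) are assumptions made for this example only.

```python
import numpy as np

def augment(image, rng, crop=24):
    """Random horizontal flip followed by a random crop.

    image: array of shape (H, W, C); crop: side length of the square crop
    (an illustrative value, not taken from the disclosure).
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]                 # flip left-right
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)        # random crop origin
    left = rng.integers(0, w - crop + 1)
    return image[top:top + crop, left:left + crop]

rng = np.random.default_rng(0)
first_sample = rng.random((32, 32, 3))         # stand-in for a labeled image
second_sample = augment(first_sample, rng)     # inherits the same category label
```

Both samples would then be passed through the model, and the second sample keeps the image category label of the first.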

[0068] In step S12: Based on the target image features of image samples carrying the same image category label, determine the vMF distribution of image features for each image category.

[0069] Image features contain rich semantic information, and statistics of image features can represent the intra-class and inter-class variations of images. From a data augmentation perspective, modeling unconstrained features with a normal distribution yields an upper bound on the expected cross-entropy loss. However, because features are normalized in contrastive learning, modeling them directly with a normal distribution is unreliable. Furthermore, for long-tailed data, the distributions of all categories cannot be estimated from a single small batch of data. Therefore, in this embodiment, a reasonable and simple vMF distribution is established on the unit hypersphere to model the image feature distribution of each image category. The vMF distribution is a probability distribution on the unit hypersphere controlled by the mean parameter μ and the concentration parameter κ.

[0070] Image samples carrying the same image category label are considered to belong to the same image category. The target image features of an image sample belonging to a given image category are defined as the image features of that category. Therefore, the vMF distribution of the image features for that image category can be determined based on the target image features of the image samples carrying the same image category label. The method for determining the vMF distribution will be described in detail later.
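For reference, the standard density of the vMF distribution for a unit feature vector z in p dimensions is given below, with mean direction μ and a scalar κ > 0 that controls how tightly features concentrate around μ. The symbols p (feature dimension), C_p (normalizing constant), and I_ν (modified Bessel function of the first kind) are notation introduced here for illustration.

```latex
f_p(z;\mu,\kappa) = C_p(\kappa)\,\exp\!\left(\kappa\,\mu^{\top} z\right),
\qquad
C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)},
\qquad
\lVert z\rVert_2 = \lVert \mu\rVert_2 = 1,\quad \kappa > 0 .
```

A larger κ gives a sharper peak around μ, and as κ approaches 0 the distribution approaches the uniform distribution on the hypersphere.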

[0071] In step S13: the vMF distribution of image features for each image category is sampled to obtain multiple sampled image features.

[0072] The vMF distribution of image features for each image category is sampled, and the resulting sampled image features are the image features of that image category.
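A minimal NumPy sketch of this sampling step follows, assuming Wood's rejection sampler, which is a standard way to draw from a vMF distribution; the disclosure does not state which sampler is used. The function name `sample_vmf` and its arguments are illustrative.

```python
import numpy as np

def sample_vmf(mu, kappa, n, rng):
    """Draw n unit vectors from vMF(mu, kappa) using Wood's rejection sampler.

    mu: unit vector of shape (p,); kappa: positive concentration.
    """
    p = mu.shape[0]
    # Step 1: sample w, the component of each draw along mu.
    b = (p - 1) / (2 * kappa + np.sqrt(4 * kappa**2 + (p - 1) ** 2))
    x0 = (1 - b) / (1 + b)
    c = kappa * x0 + (p - 1) * np.log(1 - x0**2)
    w = np.empty(n)
    for i in range(n):
        while True:
            z = rng.beta((p - 1) / 2, (p - 1) / 2)
            u = rng.uniform()
            cand = (1 - (1 + b) * z) / (1 - (1 - b) * z)
            if kappa * cand + (p - 1) * np.log(1 - x0 * cand) - c >= np.log(u):
                w[i] = cand
                break
    # Step 2: sample a uniform unit direction orthogonal to mu.
    v = rng.standard_normal((n, p))
    v -= (v @ mu)[:, None] * mu
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    # Step 3: combine the two parts into points on the unit hypersphere.
    return w[:, None] * mu + np.sqrt(1 - w**2)[:, None] * v

rng = np.random.default_rng(0)
mu = np.zeros(8)
mu[0] = 1.0
sampled_features = sample_vmf(mu, kappa=50.0, n=200, rng=rng)
```

Drawing the same number of samples for every image category, regardless of how many real images that category has, is what balances the categories with many samples against those with few.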

[0073] In step S14: Based on each sampled image feature and each target image feature, a plurality of image feature comparison pairs are generated. The image feature comparison pairs include the following two types: positive pairs composed of image features of the same image category, and negative pairs composed of image features of different image categories.

[0074] By combining each sampled image feature and each target image feature, multiple image feature comparison pairs can be obtained. Each image feature comparison pair includes one sampled image feature and one target image feature. If the sampled image feature and the target image feature included in an image feature comparison pair belong to the same image category, then the image feature comparison pair is a positive pair; if the sampled image feature and the target image feature included in an image feature comparison pair belong to different image categories, then the image feature comparison pair is a negative pair.
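The pairing rule described above can be sketched as follows: every target image feature is combined with every sampled image feature, and a pair is positive exactly when the two image categories match. The function name `build_pairs` and the index-based return format are assumptions made for this example.

```python
import numpy as np

def build_pairs(target_feats, target_labels, sampled_feats, sampled_labels):
    """Pair every target feature with every sampled feature.

    Returns the target index and sampled index of each pair, plus a
    boolean flag that is True for positive pairs and False for negative pairs.
    """
    t_idx, s_idx = np.meshgrid(
        np.arange(len(target_feats)), np.arange(len(sampled_feats)), indexing="ij"
    )
    is_positive = target_labels[:, None] == sampled_labels[None, :]
    return t_idx.ravel(), s_idx.ravel(), is_positive.ravel()

# Two target features (categories 0 and 1) and three sampled features.
t_idx, s_idx, is_positive = build_pairs(
    np.zeros((2, 4)), np.array([0, 1]), np.zeros((3, 4)), np.array([0, 0, 1])
)
```

With M target features and S sampled features this produces M × S pairs, which is one reason an explicit enumeration becomes expensive as the number of sampled features grows.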

[0075] In step S15: Based on the multiple image feature comparison pairs, the image classification model to be trained is trained to obtain a trained image classification model.

[0076] The image classification model to be trained performs contrastive learning on the image feature comparison pairs. Contrastive learning learns a mapping under which features of the same image category that are far apart in the high-dimensional space become closer after being mapped to the low-dimensional space, while features of different image categories are pushed farther apart after the mapping.

[0077] Optionally, training the image classification model to be trained based on the plurality of image feature comparison pairs to obtain a trained image classification model may include: deriving, based on the vMF distribution of image features for each image category, a closed-form expression for the contrastive loss function for the case where an infinite number of image feature comparison pairs are sampled from the vMF distribution; determining the contrastive loss function value based on the image feature comparison pairs, the image category to which the target image feature in each image feature comparison pair belongs, and the closed-form expression of the contrastive loss function; and training the image classification model to be trained based on the contrastive loss function value to obtain the image classification model.

[0078] The image classification model to be trained can predict whether each of the multiple image feature comparison pairs is positive or negative. The contrastive loss function value is determined from the prediction results and from whether each image feature comparison pair is actually positive or negative. Using the contrastive loss function value, the image classification model to be trained can be trained to obtain a trained image classification model.

[0079] Optionally, to improve efficiency, the number of sampled image feature comparison pairs can be extended to infinity by mathematical analysis, and a closed-form expression for the contrastive loss function can be rigorously derived, which removes the need to draw the samples explicitly. Specifically, based on the vMF distribution of image features for each image category, a closed-form expression for the contrastive loss function is derived for the case where an infinite number of image feature comparison pairs are sampled from the vMF distribution. The contrastive loss function value is determined based on the image feature comparison pairs, the image category to which the target image features in the image feature comparison pairs belong, and the closed-form expression of the contrastive loss function. Based on the contrastive loss function value, the initial image classification model is trained to obtain the image classification model.
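One way to see why a closed form is available is sketched below. It assumes the contrastive terms have the exponential form exp(z_i · z̃ / τ) with a temperature τ, as in the SCL loss; C_p denotes the normalizing constant of the vMF density in p dimensions, and the symbols z̃ and κ̃_j are notation introduced here. Averaging over infinitely many samples is the same as taking an expectation, and for the vMF distribution this expectation is a known identity (its moment-generating function):

```latex
\tilde z \sim \mathrm{vMF}(\mu_j, \kappa_j)
\;\Longrightarrow\;
\mathbb{E}\!\left[\exp\!\left(\frac{z_i^{\top}\tilde z}{\tau}\right)\right]
= \int_{\mathbb{S}^{p-1}} C_p(\kappa_j)\,
  \exp\!\left(\Big(\kappa_j\mu_j + \frac{z_i}{\tau}\Big)^{\!\top}\tilde z\right) d\tilde z
= \frac{C_p(\kappa_j)}{C_p(\tilde\kappa_j)},
\qquad
\tilde\kappa_j = \Big\lVert \kappa_j\mu_j + \frac{z_i}{\tau} \Big\rVert_2 .
```

The last step holds because the integrand is an unnormalized vMF density with concentration κ̃_j, whose integral over the hypersphere is 1 / C_p(κ̃_j). Each sum over sampled features of category j can therefore be replaced by a term that depends only on μ_j, κ_j, and the target feature z_i.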

[0080] An analytical solution is a rigorous formula that gives the dependent variable, which is the solution to the problem, for any value of the independent variable. The methods used to obtain analytical solutions are called analytical methods. Because an analytical solution is a closed-form function, any independent variable can be substituted into it to obtain the corresponding dependent variable; analytical solutions are therefore also called closed-form solutions. Accordingly, substituting the image feature comparison pairs and the image categories to which the target image features in the image feature comparison pairs belong into the closed-form expression of the contrastive loss function yields the contrastive loss function value. The initial image classification model can then be trained based on the contrastive loss function value to obtain the image classification model.

[0081] Building on the above technical solution, training the initial image classification model based solely on the contrastive loss function value only enables the model to determine whether two image features belong to the same image category, not which image category a single image feature belongs to. Therefore, an adjusted loss function value can additionally be used to train the initial image classification model, resulting in a better-trained image classification model.

[0082] Multiple image samples carrying image category labels are input into an initial image classification model, which predicts the category of each image sample. Based on the image category labels and predicted categories of the image samples, an adjusted loss function value is determined. Then, based on the adjusted loss function value and the contrastive loss function value, the initial image classification model is trained to obtain the final image classification model.

[0083] Optionally, based on the above technical solution, the step of inputting multiple image samples carrying image category labels into the image classification model to be trained to obtain the target image features of each of the multiple image samples may include: obtaining multiple image samples from multiple training batches; inputting the multiple image samples from each training batch into the image classification model to be trained for that training batch to obtain the target image features of each of the multiple image samples from each training batch.

[0084] Training of the image classification model can proceed over multiple training batches, with the model parameters differing from batch to batch. The image samples can be divided into multiple training batches, and the image samples of each batch can be input into the image classification model to be trained for that batch, thus obtaining the target image features for each batch.

[0085] The step of determining the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label may include: in each training batch, obtaining image features for each image category based on the target image features of multiple image samples in the training batch and the target image features of multiple image samples in previous training batches; constructing a probability density function of the vMF distribution of image features for each image category based on the image features of each image category; obtaining statistical data of image features for each image category; determining the mean parameter and concentration parameter of the probability density function of the vMF distribution of image features for each image category using the maximum likelihood estimation method based on the statistical data; and determining the vMF distribution of image features for each image category based on the mean parameter and concentration parameter of the probability density function of the vMF distribution of image features for each image category.

[0086] In each training batch, the target image features of each image sample extracted from previous training batches and the target image features of each image sample extracted from the current training batch can be combined to obtain the image features for each image category. Based on the image features of each image category, a probability density function of the vMF distribution of the image features for each image category is constructed.

[0087] Using the maximum likelihood estimation method to determine the mean parameter and concentration parameter of the probability density function of the vMF distribution of image features for each image category keeps the calculation relatively simple.
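A minimal sketch of such an estimate for one image category is given below. The mean direction is the normalized sample mean, which is the maximum likelihood estimate. For the second parameter the exact maximum likelihood equation involves a ratio of Bessel functions, so this sketch assumes the widely used closed-form approximation R(p − R²) / (1 − R²), where R is the length of the sample mean vector and p is the feature dimension; the disclosure does not state which formula it uses. The function name `fit_vmf` is illustrative.

```python
import numpy as np

def fit_vmf(features):
    """Estimate the vMF mean direction and concentration from unit-norm rows.

    features: array of shape (n, p) with L2-normalized rows.
    Assumes 0 < r_bar < 1, i.e. the features are neither perfectly
    cancelling nor all identical.
    """
    _, p = features.shape
    mean_vec = features.mean(axis=0)
    r_bar = np.linalg.norm(mean_vec)        # mean resultant length
    mu = mean_vec / r_bar                   # maximum likelihood mean direction
    # Closed-form approximation to the maximum likelihood concentration.
    kappa = r_bar * (p - r_bar**2) / (1 - r_bar**2)
    return mu, kappa

rng = np.random.default_rng(0)
true_mu = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
feats = true_mu + 0.1 * rng.standard_normal((500, 5))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
mu_hat, kappa_hat = fit_vmf(feats)
```

Only the sample mean vector is needed, so the statistics for each category can be kept as a running sum and a count.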

[0088] In each training batch, the more target image features are used when estimating the mean parameter and concentration parameter of the probability density function of the vMF distribution, the better the estimate. This embodiment of the disclosure therefore accumulates the samples from each training batch during training for use in the estimation, which increases the number of samples while requiring the mean parameter and concentration parameter to be estimated only once per training batch.
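The accumulation across training batches can be sketched as a small container that keeps, for each image category, the running sum of features and the number of features seen so far. The class name `ClassFeatureStats` and its methods are hypothetical and chosen for this example.

```python
import numpy as np

class ClassFeatureStats:
    """Running per-category feature sums and counts across training batches."""

    def __init__(self, num_classes, dim):
        self.sums = np.zeros((num_classes, dim))
        self.counts = np.zeros(num_classes, dtype=np.int64)

    def update(self, feats, labels):
        # np.add.at accumulates correctly when a label repeats within a batch.
        np.add.at(self.sums, labels, feats)
        np.add.at(self.counts, labels, 1)

    def mean_vectors(self):
        # Categories not seen yet keep a zero vector instead of dividing by zero.
        return self.sums / np.maximum(self.counts, 1)[:, None]

stats = ClassFeatureStats(num_classes=2, dim=3)
stats.update(np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]), np.array([0, 0]))
stats.update(np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]), np.array([0, 1]))
means = stats.mean_vectors()
```

Because a category with few samples may be absent from a given batch, keeping statistics from earlier batches lets its distribution still be estimated. Features from earlier batches were produced with older model parameters, so the accumulated statistics are an approximation.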

[0089] Optionally, the vMF distribution of image features for each image category can be sampled directly to obtain the image features for each image category. These image features are then input into an initial image classification model, which predicts the image category to which the image features belong. Based on the true image category corresponding to the image features and the predicted image category, the adjusted loss function value is determined.

[0090] By employing the technical solution of this disclosure, the problem of data imbalance between different image categories can be solved by balancing the number of times the vMF distribution of image features is sampled for each image category. Furthermore, by training the initial image classification model on image feature comparison pairs that are balanced across image categories, a high-performance image classification model can be obtained.

[0091] Optionally, based on the above technical solution, the contrastive loss function can be applied to semi-supervised learning to directly generate image category pseudo-labels for unlabeled images, which are then used in turn to estimate the distribution.

[0092] For example, the prediction on a weakly augmented image can be used to generate the image category pseudo-label for the corresponding strongly augmented image. By introducing the image feature distribution, the contrastive loss function value of the weakly augmented image for each image category can be calculated to represent the posterior probability, thereby generating the image category pseudo-label.

[0093] The vMF distribution of image features for each image category is sampled to obtain the image features for each image category; the image features of the image to be processed are obtained; the contrastive loss function value for each image category is calculated based on the image features of the image to be processed and the image features of each image category; the image category with the smallest contrastive loss function value is determined as the image category of the image to be processed; and a pseudo-label for the image category of the image to be processed is generated based on the image category of the image to be processed. The pseudo-label is used for semi-supervised training of the model.

[0094] The image to be processed can be any image. The image features of the image to be processed can be obtained as described above. Substituting the image features of the image to be processed and the image features of each image category into the closed-form expression of the contrastive loss function gives the contrastive loss function value between the image features of the image to be processed and the image features of each image category. The image category with the smallest contrastive loss function value is determined as the image category of the image to be processed. Based on the image category of the image to be processed, a pseudo-label for the image category can be generated. The image to be processed, carrying the pseudo-label for the image category, can be used as a training sample in semi-supervised training of the model.
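The selection rule can be sketched as follows. This example does not use the closed-form expression; it assumes a finite set of sampled features per image category and a loss of the form "negative log of the share of the softmax mass that falls on that category", with an illustrative temperature `tau`. The function names `class_losses` and `pseudo_label` are hypothetical.

```python
import numpy as np

def class_losses(z, class_feats, tau=0.1):
    """Contrastive loss of feature z with respect to each image category.

    z: unit vector of shape (d,);
    class_feats: array (K, m, d) holding m sampled unit features per category.
    """
    logits = class_feats @ z / tau                          # shape (K, m)
    per_class = np.exp(logits - logits.max()).sum(axis=1)   # shape (K,)
    return -(np.log(per_class) - np.log(per_class.sum()))

def pseudo_label(z, class_feats, tau=0.1):
    """Pick the image category with the smallest contrastive loss."""
    return int(np.argmin(class_losses(z, class_feats, tau)))

class_feats = np.array([[[1.0, 0.0], [1.0, 0.0]],    # category 0
                        [[0.0, 1.0], [0.0, 1.0]]])   # category 1
label = pseudo_label(np.array([1.0, 0.0]), class_feats)
```

Under this form, exp(−loss) sums to 1 over the image categories, so the smallest loss corresponds to the largest posterior-like score.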

[0095] This disclosure proposes a novel Probabilistic Contrastive Learning (ProCo) algorithm, which is an improvement on the SCL algorithm.

[0096] First, we introduce the SCL algorithm to lay the foundation for the ProCo algorithm. Taking image classification as an example, suppose an image sample set $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ is given, where $x_i$ represents the i-th image sample, $y_i$ represents the image category to which the i-th image sample belongs, and N is the number of image samples, i = 1, 2, ..., N. The SCL algorithm learns a mapping $\varphi: \mathcal{X} \to \mathcal{Y}$ from the image sample space $\mathcal{X}$ to the image category space $\mathcal{Y} = \{1, 2, \ldots, K\}$, where K is the number of image categories. The mapping function $\varphi$ is typically modeled as a neural network composed of a feature extractor $F: \mathcal{X} \to \mathcal{Z}$ and a linear classifier $G: \mathcal{Z} \to \mathcal{Y}$.

[0097] Logit adjustment (the logits being the prediction vector) is a loss margin modification method that uses the prior probabilities of the classes as margins during training and inference. The logit-adjusted loss function $\mathcal{L}_{LA}$ can be determined as:

[0098] $$\mathcal{L}_{LA}(x_i, y_i) = -\log \frac{\pi_{y_i} \exp\left(\varphi_{y_i}(x_i)\right)}{\sum_{y'} \pi_{y'} \exp\left(\varphi_{y'}(x_i)\right)}$$

[0099] Where y′ denotes a predicted image category of $x_i$, with the sum running over all image categories; $\pi_{y_i}$ is the class frequency of image category $y_i$ in the image sample set, and $\pi_{y'}$ is the class frequency of the predicted image category y′ in the image sample set; $\varphi_{y_i}(x_i)$ is the logit of image category $y_i$, and $\varphi_{y'}(x_i)$ is the logit of the predicted image category y′; exp denotes the exponential function with the natural constant e as the base. The meanings of the remaining symbols are as given above.
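As an illustration of the adjusted loss described in [0097] to [0099], the following Python sketch computes it for a single sample. It assumes the common form in which the log class frequency is added to each logit before a softmax cross-entropy is taken; the function name and the example values are hypothetical.

```python
import math
from typing import Sequence


def logit_adjusted_loss(logits: Sequence[float],
                        class_freqs: Sequence[float],
                        target: int) -> float:
    """-log( pi_y * exp(phi_y) / sum_y' pi_y' * exp(phi_y') ), in log space."""
    # Adding log(pi) to a logit is the same as multiplying exp(logit) by pi.
    adjusted = [phi + math.log(pi) for phi, pi in zip(logits, class_freqs)]
    top = max(adjusted)  # log-sum-exp trick for numerical stability
    log_norm = top + math.log(sum(math.exp(a - top) for a in adjusted))
    return log_norm - adjusted[target]


# Hypothetical two-class example with a head class (90%) and a tail class (10%).
head_loss = logit_adjusted_loss([0.0, 0.0], [0.9, 0.1], target=0)
tail_loss = logit_adjusted_loss([0.0, 0.0], [0.9, 0.1], target=1)
```

With equal logits, the tail-class sample incurs the larger loss, so the model is pushed to raise the tail-class logit by a larger margin.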

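[0100] SCL distinguishes positive pairs $(x_i, x_j)$, which have the same label $y_i = y_j$, from negative pairs $(x_i, x_j)$, which have different labels $y_i \neq y_j$. Given any batch of image sample and image category label pairs $B = \{(x_i, y_i)\}_{i=1}^{N^B}$ and a temperature parameter τ, two expressions for the SCL loss can be determined:

[0101] $$\mathcal{L}_{out}(x_i, y_i) = -\frac{1}{N^B_{y_i}} \sum_{p \in A(y_i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{j=1}^{K} \sum_{a \in A(j)} \exp(z_i \cdot z_a / \tau)}$$

[0102] $$\mathcal{L}_{in}(x_i, y_i) = -\log \left( \frac{1}{N^B_{y_i}} \sum_{p \in A(y_i)} \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{j=1}^{K} \sum_{a \in A(j)} \exp(z_i \cdot z_a / \tau)} \right)$$

[0103] Where $\mathcal{L}_{out}$ and $\mathcal{L}_{in}$ are the two types of SCL loss, which differ in the position of the logarithm relative to the sum over positive samples. A(j) represents the set of indices of the instances in the batch $B \setminus \{(x_i, y_i)\}$ that carry image category label j, and $N^B_j = |A(j)|$ is its cardinality. $z_i$, $z_p$, and $z_a$ are the normalized features of the image samples $x_i$, $x_p$, and $x_a$ extracted by the feature extractor. The meanings of the remaining symbols are as given above.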
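The two SCL loss variants can be sketched in Python for a single anchor as follows. The sketch assumes the usual supervised contrastive formulation, in which the anchor is compared with every other sample in the batch and the logarithm sits either outside or inside the average over positives; the function name and example features are hypothetical.

```python
import math
from typing import Sequence, Tuple


def scl_losses(feats: Sequence[Sequence[float]], labels: Sequence[int],
               i: int, tau: float) -> Tuple[float, float]:
    """Return (L_out, L_in) for anchor i, given unit-norm features."""
    # Scaled similarities between the anchor and every other sample in the batch.
    sims = {a: sum(u * v for u, v in zip(feats[i], feats[a])) / tau
            for a in range(len(feats)) if a != i}
    positives = [a for a in sims if labels[a] == labels[i]]
    if not positives:
        # A tail-class anchor may have no positive in a small batch.
        raise ValueError("anchor has no positive sample in this batch")
    # Log of the shared denominator, via the log-sum-exp trick.
    top = max(sims.values())
    log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
    log_ratios = [sims[a] - log_denom for a in positives]
    l_out = -sum(log_ratios) / len(positives)  # log outside the average
    l_in = -math.log(sum(math.exp(r) for r in log_ratios) / len(positives))  # log inside
    return l_out, l_in


feats = [(1.0, 0.0), (0.6, 0.8), (0.0, 1.0), (-1.0, 0.0)]
l_out, l_in = scl_losses(feats, labels=[0, 0, 0, 1], i=0, tau=0.5)
```

The `ValueError` branch makes the dependence on batch composition explicit: an anchor whose category does not appear elsewhere in the batch receives no supervision signal at all.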

[0104] For any image sample in a batch, SCL treats other image samples with the same image class label as positive samples and the rest as negative samples. Therefore, the batch must contain a sufficient number of image samples to ensure that each example receives an appropriate supervision signal. However, large batches of image samples often lead to significant computational and memory burdens. Furthermore, in real-world machine learning scenarios, data distributions typically exhibit a long-tail pattern, with tail classes rarely sampled in mini-batches. This particular characteristic necessitates further increasing the number of image samples to effectively supervise tail classes.

[0105] To address this issue, embodiments of this disclosure propose the ProCo algorithm, which constructs contrastive pairs by estimating the feature distribution and sampling from it. When features are unconstrained, they can be modeled with a normal distribution from a data augmentation perspective, which yields an upper bound on the expected loss for optimization. However, features in contrastive learning are confined to a unit hypersphere, making direct modeling with a normal distribution unsuitable. Furthermore, due to the imbalanced distribution of training data, the distribution parameters of all classes cannot be estimated within a single mini-batch. Therefore, we introduce a von Mises-Fisher (vMF) distribution defined on the hypersphere, whose parameters can be efficiently estimated across different batches using maximum likelihood estimation. Moreover, a closed form of the expected contrastive loss function, rather than an upper bound, is rigorously derived and applied to semi-supervised learning.

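[0106] First, assume that image features follow a mixture of vMF distributions. The probability density function of the vMF distribution of a random p-dimensional unit vector z is:

[0107] $$f_p(z; \mu, \kappa) = C_p(\kappa) \exp\left(\kappa \mu^\top z\right)$$

[0108] $$C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2} I_{p/2-1}(\kappa)}$$

[0109] Where κ ≥ 0, $\|\mu\|_2 = 1$, and $I_{p/2-1}$ is the modified Bessel function of the first kind of order p/2 − 1, defined as:

[0110] $$I_{p/2-1}(\kappa) = \sum_{k=0}^{\infty} \frac{1}{k!\,\Gamma(p/2-1+k+1)} \left(\frac{\kappa}{2}\right)^{2k+p/2-1}$$

[0111] Where μ is the mean direction (the mean, or average, parameter) and κ is the concentration parameter (elsewhere in this document rendered as the lumped parameter). As κ increases, the distribution becomes more concentrated around μ; when κ = 0, the distribution on the sphere is uniform.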
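For illustration, the vMF log-density can be evaluated with the Python standard library alone. The sketch below assumes the standard density C_p(κ)·exp(κ μᵀz) with C_p(κ) = κ^(p/2−1) / ((2π)^(p/2) I_(p/2−1)(κ)) and sums the Bessel power series in log space; the function names and the truncation at 200 terms are choices made here and are only adequate for moderate κ.

```python
import math
from typing import Sequence


def log_bessel_i(nu: float, x: float, terms: int = 200) -> float:
    """log I_nu(x) for x > 0, from the power series summed in log space."""
    logs = [(2 * k + nu) * math.log(x / 2.0)
            - math.lgamma(k + 1) - math.lgamma(k + nu + 1)
            for k in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def log_c(p: int, kappa: float) -> float:
    """log of the vMF normalizing constant C_p(kappa), for kappa > 0."""
    nu = p / 2.0 - 1.0
    return (nu * math.log(kappa) - (p / 2.0) * math.log(2.0 * math.pi)
            - log_bessel_i(nu, kappa))


def vmf_log_pdf(z: Sequence[float], mu: Sequence[float], kappa: float) -> float:
    """log f_p(z; mu, kappa) for unit vectors z and mu."""
    return log_c(len(z), kappa) + kappa * sum(a * b for a, b in zip(mu, z))


# The density is highest at z = mu and lowest at z = -mu.
mu = [0.0, 0.0, 1.0]
gap = vmf_log_pdf(mu, mu, 2.0) - vmf_log_pdf([0.0, 0.0, -1.0], mu, 2.0)
```

Working in log space avoids overflow in the Bessel function, which grows roughly like exp(κ).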

[0112] Under the above assumptions, a mixture of vMF distributions is used to model the image feature distribution:

[0113] $$P(z) = \sum_{y=1}^{K} P(y)\,P(z \mid y) = \sum_{y=1}^{K} \pi_y \frac{\kappa_y^{p/2-1}}{(2\pi)^{p/2} I_{p/2-1}(\kappa_y)} \exp\left(\kappa_y \mu_y^\top z\right)$$

[0114] Wherein the probability of image category y is estimated as $\pi_y$, which corresponds to the frequency of class y in the image sample set. The mean parameter $\mu_y$ and the concentration parameter $\kappa_y$ of the image feature distribution can be estimated by maximum likelihood estimation.

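[0115] Assume that a series of N independent unit vectors $\{z_i\}_{i=1}^{N}$ on the unit hypersphere $S^{p-1}$ is drawn from a vMF distribution. The maximum likelihood estimates of the mean parameter μ and the concentration parameter κ satisfy the following equations:

[0116] $$\mu = \frac{\bar{z}}{\bar{R}}$$

[0117] $$A_p(\kappa) = \frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)} = \bar{R}$$

[0118] Where $\bar{z} = \frac{1}{N} \sum_{i=1}^{N} z_i$ is the sample mean and $\bar{R} = \|\bar{z}\|_2$ is the length of the sample mean. An approximation $\hat{\kappa}$ of κ can be represented as:

[0119] $$\hat{\kappa} = \frac{\bar{R}\,(p - \bar{R}^2)}{1 - \bar{R}^2}$$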
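The estimation step can be sketched in Python as follows. The sketch assumes the widely used approximation κ ≈ R̄(p − R̄²)/(1 − R̄²), where R̄ is the length of the sample mean, and takes the normalized sample mean as the mean direction; the function name `fit_vmf` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_vmf(samples: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Estimate (mu, kappa) of a vMF distribution from unit-norm samples."""
    n, p = len(samples), len(samples[0])
    # First-order sample moment: the mean vector.
    mean = [sum(s[d] for s in samples) / n for d in range(p)]
    r_bar = math.sqrt(sum(m * m for m in mean))
    if r_bar == 0.0 or r_bar >= 1.0:
        # r_bar = 0: no preferred direction; r_bar = 1: all samples identical.
        raise ValueError("sample mean length must lie strictly between 0 and 1")
    mu = [m / r_bar for m in mean]
    kappa = r_bar * (p - r_bar ** 2) / (1.0 - r_bar ** 2)
    return mu, kappa


mu_hat, kappa_hat = fit_vmf([(1.0, 0.0), (0.0, 1.0)])
```

Only the sample mean is needed, which is what makes it practical to accumulate the estimate across batches.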

[0120] By aggregating the statistical data from the previous batch and the current batch, the sample mean for each image category is estimated online. Specifically, maximum likelihood estimation is performed using the estimated sample mean from the previous batch, while a new sample mean is maintained during the initialization of the current batch through online estimation.

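[0121] The online estimate can be written as $\bar{z}_j^{(t)} = \frac{n_j^{(t-1)} \bar{z}_j^{(t-1)} + m_j^{(t)} \bar{z}'^{(t)}_j}{n_j^{(t-1)} + m_j^{(t)}}$, where $\bar{z}_j^{(t)}$ is the estimated sample mean of class j at step t; $\bar{z}'^{(t)}_j$ is the mean of the class-j samples in the current batch; $n_j^{(t-1)}$ is the number of samples in the previous batch; and $m_j^{(t)}$ is the number of samples in the current batch.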
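A minimal Python sketch of this kind of online update is given below. It assumes a count-weighted average of the previously accumulated mean and the current batch mean; the function name and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def update_class_mean(prev_mean: Sequence[float], prev_count: int,
                      batch_mean: Sequence[float], batch_count: int
                      ) -> Tuple[List[float], int]:
    """Merge the running class mean with the mean of the current batch."""
    total = prev_count + batch_count
    if batch_count == 0:
        # The class is absent from this batch (common for tail classes).
        return list(prev_mean), prev_count
    merged = [(prev_count * a + batch_count * b) / total
              for a, b in zip(prev_mean, batch_mean)]
    return merged, total


# Three samples seen so far, one new sample of the same class in this batch.
mean_t, count_t = update_class_mean([1.0, 0.0], 3, [0.0, 1.0], 1)
```

Because a class that is missing from a batch simply keeps its previous statistics, tail classes do not need to appear in every mini-batch for their parameters to be estimated.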

[0122] Based on the estimated parameters, image feature contrastive pairs can be sampled directly from the mixture of vMF distributions. However, sampling a sufficient number of features from the vMF distribution in each training iteration is inefficient. Therefore, embodiments of this disclosure let the number of samples tend to infinity and rigorously derive a closed-form expression for the expected contrastive loss function. This closed-form expression can be one of the following two:

[0123]

[0124]

[0125] Where τ is a temperature parameter.

[0126] In this way, an infinite number of contrastive samples are implicitly achieved through surrogate loss, eliminating the need for complex sampling operations and enabling efficient optimization. This design addresses the issue of SCL's reliance on large amounts of data. Furthermore, the assumptions about image feature distribution and parameter estimation effectively capture the feature diversity across different image categories, resulting in superior performance.
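To make the idea of an infinite-sample surrogate concrete, the following Python sketch evaluates one closed form that follows from a standard property of the vMF distribution: for z′ drawn from vMF(μ_j, κ_j), the expectation of exp(z·z′/τ) equals C_p(κ_j)/C_p(κ̃_j), with κ̃_j = ‖κ_j μ_j + z/τ‖₂. The loss below places the logarithm inside the expectation and weights each category by its frequency π_j. This is an assumption-laden illustration, and it is not asserted to match the expressions of [0123] and [0124] term by term; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_bessel_i(nu: float, x: float, terms: int = 300) -> float:
    """log I_nu(x) for x > 0 (power series in log space; moderate x only)."""
    logs = [(2 * k + nu) * math.log(x / 2.0)
            - math.lgamma(k + 1) - math.lgamma(k + nu + 1)
            for k in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def log_c(p: int, kappa: float) -> float:
    """log of the vMF normalizing constant C_p(kappa)."""
    nu = p / 2.0 - 1.0
    return (nu * math.log(kappa) - (p / 2.0) * math.log(2.0 * math.pi)
            - log_bessel_i(nu, kappa))


def class_scores(z: Sequence[float], mus: Sequence[Sequence[float]],
                 kappas: Sequence[float], priors: Sequence[float],
                 tau: float) -> List[float]:
    """log pi_j + log E[exp(z . z' / tau)] with z' ~ vMF(mu_j, kappa_j)."""
    p = len(z)
    scores = []
    for mu, kappa, pi in zip(mus, kappas, priors):
        shifted = [kappa * m + zi / tau for m, zi in zip(mu, z)]
        kappa_tilde = math.sqrt(sum(s * s for s in shifted))
        scores.append(math.log(pi) + log_c(p, kappa) - log_c(p, kappa_tilde))
    return scores


def expected_contrastive_loss(z, y, mus, kappas, priors, tau) -> float:
    """Negative log of the normalized score of category y (no sampling needed)."""
    scores = class_scores(z, mus, kappas, priors, tau)
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[y]


# Hypothetical two-category example in two dimensions.
z = [1.0, 0.0]
mus, kappas, priors, tau = [[1.0, 0.0], [0.0, 1.0]], [10.0, 10.0], [0.5, 0.5], 0.5
loss0 = expected_contrastive_loss(z, 0, mus, kappas, priors, tau)
loss1 = expected_contrastive_loss(z, 1, mus, kappas, priors, tau)
```

The loss depends only on the estimated parameters of each category, so no contrastive samples are drawn during training. For the large concentration values that can occur in practice, the truncated series used here would need to be replaced by a more robust Bessel evaluation.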

[0127] Image classification models can employ a two-branch design, comprising a classification branch based on a linear classifier and a representation branch based on a projection head. The representation branch uses a multilayer perceptron to map the representation vector to another feature space, thus decoupling it from the classifier. Across the classification and representation branches, the total loss function is obtained as a weighted sum of the adjusted loss function and the contrastive loss function. The initial image classification model is then trained using this total loss function. The additional representation branch introduced during training can be optimized efficiently together with the classification branch using stochastic gradient descent, and it introduces no additional overhead during inference.

[0128] This disclosure uses the vMF distribution to model the feature distribution, which has the following advantages:

[0129] 1) The parameters of the vMF distribution can be obtained by maximum likelihood estimation using only the first-order sample moment, so they can be computed efficiently across different batches during training.

[0130] 2) Based on this formulation, the closed form of the expected loss can be rigorously derived as the number of samples approaches infinity. This avoids explicitly sampling a large number of contrastive pairs; instead, the surrogate loss function is minimized directly. Optimization is therefore efficient, and no additional overhead is introduced during inference.

[0131] Understandably, by using image samples of different types carrying different labels, image classification models suitable for different application scenarios can be trained. For example, if the image samples used for training are echocardiogram images (also known as cardiac ultrasound images), a target image classification model for classifying echocardiogram images can be obtained.

[0132] Echocardiography images can display the internal cross-sectional structures of the heart, such as the ventricles, atria, and arteries. Echocardiography images can be classified along several dimensions, such as by ultrasound technique, by section, by image quality, and by section integrity. Classification by ultrasound technique includes: two-dimensional echocardiography, spectral Doppler echocardiography, color Doppler echocardiography, and other techniques. Classification by section includes: parasternal long-axis view, apical four-chamber view, parasternal left ventricular short-axis view, and subxiphoid view, among others. Classification by image quality includes various image quality levels, such as progressively increasing levels I, II, III…N. Classification by section integrity includes various section integrity levels, such as progressively increasing levels I, II, III…N.

[0133] Under normal circumstances, the above-mentioned classification tasks are all achieved manually. Due to human factors, this results in neither high classification efficiency nor guaranteed accuracy. However, using the target image classification model trained according to the embodiments of this disclosure to classify cardiac ultrasound images can achieve efficient and accurate classification.

[0134] It should be noted that, for the sake of simplicity, the method embodiments are all described as a series of actions. However, those skilled in the art should understand that the embodiments of this disclosure are not limited to the described order of actions, because according to the embodiments of this disclosure, some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of this disclosure.

[0135] Figure 2 is a schematic diagram of the structure of a training device for an image classification model according to an embodiment of this disclosure. As shown in Figure 2, the device includes an input module, a determination module, a sampling module, a generation module, and a training module, wherein:

[0136] The input module is used to input multiple image samples carrying image category labels into the image classification model to be trained, and obtain the target image features of each of the multiple image samples;

[0137] The determination module is used to determine the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label.

[0138] The sampling module is used to sample the vMF distribution of image features for each image category to obtain multiple sampled image features;

[0139] The generation module is used to generate multiple image feature comparison pairs based on each of the sampled image features and each of the target image features. The image feature comparison pairs include the following two types: positive pairs composed of two image features of the same image category, and negative pairs composed of two image features of different image categories.

[0140] The training module is used to train the image classification model to be trained based on the multiple image feature comparison pairs, so as to obtain a trained image classification model.
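The sampling and generation modules can be illustrated with the Python sketch below. The disclosure does not name a sampling procedure, so the sketch substitutes a standard rejection sampler for the vMF distribution (Wood's algorithm) followed by a Householder reflection that moves the first coordinate axis onto μ. All function names and parameter values are hypothetical, and the feature dimension must be at least 2.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sample_vmf(mu: Sequence[float], kappa: float, rng: random.Random) -> Vector:
    """Draw one unit vector from vMF(mu, kappa) by rejection sampling."""
    p = len(mu)
    b = (p - 1) / (2.0 * kappa + math.sqrt(4.0 * kappa ** 2 + (p - 1) ** 2))
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (p - 1) * math.log(1.0 - x0 ** 2)
    while True:
        beta = rng.betavariate((p - 1) / 2.0, (p - 1) / 2.0)
        w = (1.0 - (1.0 + b) * beta) / (1.0 - (1.0 - b) * beta)
        u = 1.0 - rng.random()  # uniform on (0, 1], keeps log(u) finite
        if kappa * w + (p - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break
    # Uniform direction orthogonal to the first axis.
    v = [rng.gauss(0.0, 1.0) for _ in range(p - 1)]
    v_norm = math.sqrt(sum(t * t for t in v))
    scale = math.sqrt(max(0.0, 1.0 - w * w)) / v_norm
    x = [w] + [scale * t for t in v]
    # Householder reflection that maps the first axis e1 onto mu.
    h = [(1.0 if d == 0 else 0.0) - mu[d] for d in range(p)]
    h_norm = math.sqrt(sum(t * t for t in h))
    if h_norm < 1e-12:  # mu already equals e1
        return x
    h = [t / h_norm for t in h]
    proj = sum(a * b_ for a, b_ in zip(h, x))
    return [a - 2.0 * proj * b_ for a, b_ in zip(x, h)]


def make_pairs(targets: Sequence[Vector], target_labels: Sequence[int],
               sampled: Sequence[Vector], sampled_labels: Sequence[int]
               ) -> List[Tuple[Vector, Vector, bool]]:
    """Pair each target feature with each sampled feature; True marks a positive pair."""
    return [(t, s, ty == sy)
            for t, ty in zip(targets, target_labels)
            for s, sy in zip(sampled, sampled_labels)]


rng = random.Random(0)
class_params = {0: ([1.0, 0.0, 0.0], 50.0), 1: ([0.0, 1.0, 0.0], 50.0)}
sampled, sampled_labels = [], []
for label, (mu, kappa) in class_params.items():
    for _ in range(4):
        sampled.append(sample_vmf(mu, kappa, rng))
        sampled_labels.append(label)
targets, target_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0, 1]
pairs = make_pairs(targets, target_labels, sampled, sampled_labels)
```

Because every category contributes the same number of sampled features regardless of its frequency in the data, the resulting pairs are balanced across head and tail categories.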

[0141] Optionally, the training module is specifically used for:

[0142] Based on the vMF distribution of image features for each image category, a closed-form expression for the contrast loss function is derived when sampling an infinite number of image feature contrast pairs from the vMF distribution.

[0143] The contrast loss function value is determined based on the image feature comparison pair, the image category to which the target image features in the image feature comparison pair belong, and the closed expression of the contrast loss function;

[0144] Based on the contrastive loss function value, the image classification model to be trained is trained to obtain the image classification model.

[0145] Optionally, the input module is specifically used for:

[0146] Obtain multiple image samples from multiple training batches;

[0147] The multiple image samples of each training batch are input into the image classification model to be trained in that training batch to obtain the target image features of each of the multiple image samples of each training batch.

[0148] The step of determining the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label includes:

[0149] In each training batch, image features for each image category are obtained based on the target image features of each of the multiple image samples in the training batch and the target image features of each of the multiple image samples in each previous training batch.

[0150] Based on the image features of each image category, construct the probability density function of the vMF distribution of the image features for each image category;

[0151] Obtain statistical data on image features for each of the aforementioned image categories;

[0152] Based on the statistical data, the average parameters and lumped parameters of the probability density function of the image feature vMF distribution for each image category are determined using the maximum likelihood estimation method;

[0153] The image feature vMF distribution for each image category is determined based on the average and lumped parameters of the probability density function of the image feature vMF distribution for each image category.

[0154] Optionally, after obtaining multiple sampled image features, the apparatus further includes:

[0155] The acquisition module is used to acquire the image features of the image to be processed;

[0156] The calculation module is used to calculate the contrast loss function value of the image to be processed in each image category based on the image features of the image to be processed and the image features of each image category.

[0157] The category determination module is used to determine the image category with the smallest contrast loss function value as the image category of the image to be processed;

[0158] The pseudo-label generation module is used to generate pseudo-labels for the image category of the image to be processed based on the image category of the image to be processed. The pseudo-labels for the image category are used for semi-supervised training.

[0159] It should be noted that the device embodiments are similar to the method embodiments, so the description is relatively simple. For relevant details, please refer to the method embodiments.

[0160] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.

[0161] Those skilled in the art will understand that embodiments of this disclosure can be provided as methods, apparatus, or computer program products. Therefore, embodiments of this disclosure can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of this disclosure can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0162] Embodiments of this disclosure are described with reference to flowchart illustrations and / or block diagrams of methods, apparatus, electronic devices, and computer program products according to embodiments of this disclosure. It will be understood that each process and / or block of the flowchart illustrations and / or block diagrams, and combinations of processes and / or blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create a device for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0163] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0164] These computer program instructions can also be loaded onto a computer or other programmable data processing terminal equipment, causing a series of operational steps to be performed on the computer or other programmable terminal equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable terminal equipment provide steps for implementing the functions specified in one or more processes of the flowchart and / or one or more blocks of the block diagram.

[0165] While preferred embodiments of the present disclosure have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including both the preferred embodiments and all changes and modifications falling within the scope of the present disclosure.

[0166] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes the element.

[0167] The training method, apparatus, electronic device, and storage medium for an image classification model provided in this disclosure have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this disclosure. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of this disclosure. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this disclosure. Therefore, the content of this specification should not be construed as a limitation of this disclosure.

Claims

1. A method for training an image classification model, the method comprising: Multiple image samples carrying image category labels are input into the image classification model to be trained to obtain the target image features of each of the multiple image samples; Based on the target image features of image samples carrying the same image category label, determine the vMF distribution of image features for each image category; The vMF distribution of image features for each image category is sampled to obtain multiple sampled image features; Based on each of the sampled image features and each of the target image features, multiple image feature comparison pairs are generated. The image feature comparison pairs include the following two types: positive pairs consisting of two image features of the same image category, and negative pairs consisting of two image features of different image categories. Based on the multiple image feature comparison pairs, the image classification model to be trained is trained to obtain a trained image classification model.

2. The method according to claim 1, characterized in that, The step of training the image classification model to be trained based on the multiple image feature comparison pairs to obtain a trained image classification model includes: Based on the vMF distribution of image features for each image category, a closed-form expression for the contrast loss function is derived when sampling an infinite number of image feature contrast pairs from the vMF distribution. The contrast loss function value is determined based on the image feature comparison pair, the image category to which the target image features in the image feature comparison pair belong, and the closed expression of the contrast loss function; Based on the contrastive loss function value, the image classification model to be trained is trained to obtain the image classification model.

3. The method according to claim 1, characterized in that, The process of inputting multiple image samples carrying image category labels into the image classification model to be trained, and obtaining the target image features of each of the multiple image samples, includes: Obtain multiple image samples from multiple training batches; Multiple image samples from each training batch are input into the image classification model to be trained for that training batch to obtain the target image features of each multiple image sample from each training batch. The step of determining the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label includes: In each training batch, image features for each image category are obtained based on the target image features of each of the multiple image samples in the training batch and the target image features of each of the multiple image samples in each previous training batch. Based on the image features of each image category, construct the probability density function of the vMF distribution of the image features for each image category; Obtain statistical data on image features for each of the aforementioned image categories; Based on the statistical data, the average parameters and lumped parameters of the probability density function of the image feature vMF distribution for each image category are determined using the maximum likelihood estimation method; The image feature vMF distribution for each image category is determined based on the average and lumped parameters of the probability density function of the image feature vMF distribution for each image category.

4. The method according to claim 1, characterized in that, After obtaining multiple sampled image features, the method further includes: Obtain the image features of the image to be processed; Based on the image features of the image to be processed and the image features of each image category, calculate the contrast loss function value of the image to be processed in each image category; The image category with the smallest contrast loss function value is determined as the image category of the image to be processed; Based on the image category of the image to be processed, a pseudo-label for the image category is generated, and the pseudo-label is used for semi-supervised training.

5. A training device for an image classification model, characterized in that, The device includes: The input module is used to input multiple image samples carrying image category labels into the image classification model to be trained, and obtain the target image features of each of the multiple image samples; The determination module is used to determine the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label. The sampling module is used to sample the vMF distribution of image features for each image category to obtain multiple sampled image features; The generation module is used to generate multiple image feature comparison pairs based on each of the sampled image features and each of the target image features. The image feature comparison pairs include the following two types: positive pairs composed of two image features of the same image category, and negative pairs composed of two image features of different image categories. The training module is used to train the image classification model to be trained based on the multiple image feature comparison pairs, so as to obtain a trained image classification model.

6. The apparatus according to claim 5, characterized in that, The training module is specifically used for: Based on the vMF distribution of image features for each image category, a closed-form expression for the contrast loss function is derived when sampling an infinite number of image feature contrast pairs from the vMF distribution. The contrast loss function value is determined based on the image feature comparison pair, the image category to which the target image features in the image feature comparison pair belong, and the closed expression of the contrast loss function; Based on the contrastive loss function value, the image classification model to be trained is trained to obtain the image classification model.

7. The apparatus according to claim 5, characterized in that, The input module is specifically used for: Obtain multiple image samples from multiple training batches; Multiple image samples from each training batch are input into the image classification model to be trained for that training batch to obtain the target image features of each multiple image sample from each training batch. The step of determining the vMF distribution of image features for each image category based on the target image features of image samples carrying the same image category label includes: In each training batch, image features for each image category are obtained based on the target image features of each of the multiple image samples in the training batch and the target image features of each of the multiple image samples in each previous training batch. Based on the image features of each image category, construct the probability density function of the vMF distribution of the image features for each image category; Obtain statistical data on image features for each of the aforementioned image categories; Based on the statistical data, the average parameters and lumped parameters of the probability density function of the image feature vMF distribution for each image category are determined using the maximum likelihood estimation method; The image feature vMF distribution for each image category is determined based on the average and lumped parameters of the probability density function of the image feature vMF distribution for each image category.

8. The apparatus according to claim 5, characterized in that, After obtaining multiple sampled image features, the device further includes: The acquisition module is used to acquire the image features of the image to be processed; The calculation module is used to calculate the contrast loss function value of the image to be processed in each image category based on the image features of the image to be processed and the image features of each image category. The category determination module is used to determine the image category with the smallest contrast loss function value as the image category of the image to be processed; The pseudo-label generation module is used to generate pseudo-labels for the image category of the image to be processed based on the image category of the image to be processed. The pseudo-labels for the image category are used for semi-supervised training.

9. An electronic device, characterized in that it includes: a processor; and a memory used to store instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the training method of the image classification model as described in any one of claims 1 to 4.

10. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform a training method for an image classification model as described in any one of claims 1 to 4.