A training method and device of an image classification model, a computer device and a medium

By adjusting the loss weight parameters and optimizing the model loss function in the image classification model, the problem of low classification accuracy under imbalanced noisy image datasets is solved, achieving higher model accuracy and reliability.

CN116071613B · Active · Publication Date: 2026-06-23 · SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD
Filing Date: 2022-12-29
Publication Date: 2026-06-23


IPC: G06V10/774; G06V10/764

AI Tagging

Application Domain

Instruments

Technology Topics

Probability estimation; Radiology


AI Technical Summary

Technical Problem

Existing image classification models have low classification accuracy when faced with imbalanced and noisy image datasets. Existing regularization processes hinder effective learning when the label distribution is imbalanced, resulting in models failing to achieve high accuracy.

Method used

By determining the class probability estimation vector and class probability prediction vector of the image to be classified, calculating the number of images and a preset threshold to adjust the loss weight parameters, and combining real-time and historical information in iterative training, the model loss function is optimized to reduce the impact of label imbalance.

Benefits of technology

This improves the classification accuracy of image classification models on imbalanced and noisy datasets, reduces the negative impact of label imbalance on training, and enhances the reliability and accuracy of the model.


Abstract

The present application relates to the technical field of image classification, and particularly relates to a training method and device of an image classification model, a computer device and a medium. The method determines a class probability estimation vector and a class probability prediction vector of a to-be-classified image through an image classification model, determines a predicted image class of the to-be-classified image according to the class probability prediction vector, further determines the number of images of each image class, determines a first loss weight parameter of each to-be-classified image in combination with a preset image number threshold, determines a model loss of the image classification model according to the predicted image class, the first loss weight parameter, the class probability estimation vector and the class probability prediction vector of each to-be-classified image, and trains the image classification model. The first loss weight parameter is used as a weight basis of the similarity between the class probability estimation vector and the class probability prediction vector, the influence of the label imbalance problem on the training of the image classification model is reduced, and the accuracy of the image classification model is improved.


Description

Technical Field

[0001] This invention relates to the field of image classification technology, and in particular to a training method, apparatus, computer equipment, and medium for an image classification model.

Background Technology

[0002] In recent years, machine learning algorithms have achieved remarkable results in image classification by relying on high-quality and large-scale supervised image datasets. However, in real-world image classification scenarios, due to the widespread presence of label noise in image datasets, image classification models tend to learn from clean image samples first and then from noisy image samples during training. As the number of iterations increases, the image classification model gradually learns more and more noise information during training, resulting in the inability of the image classification model to achieve high accuracy on low-quality image datasets.

[0003] Existing image classification models can improve classification accuracy by using regularization to prevent the model from memorizing noise information during training. However, this method only applies to cases where the image data is evenly distributed. When the image dataset also suffers from imbalanced label distribution, regularization can hinder the effective learning of the images to be classified, failing to prevent the model from memorizing noise information during training and thus reducing the classification accuracy.

[0004] Therefore, how to improve the classification accuracy of image classification models on imbalanced noisy image datasets has become an urgent problem to be solved.

Summary of the Invention

[0005] In view of this, embodiments of the present invention provide a training method, apparatus, computer device and medium for an image classification model, in order to solve the problem that the classification accuracy of existing image classification model training methods is low when facing imbalanced noisy image datasets.

[0006] In a first aspect, Embodiment 1 of the present invention provides a method for training an image classification model, the method comprising:

[0007] Obtain the images to be classified in the training set, input the images to be classified into the image classification model, and determine the category probability estimation vector and category probability prediction vector for each image to be classified.

[0008] Based on the category probability prediction vector, the predicted image category of each image to be classified is determined; based on the predicted image category of each image to be classified, the number of images in each image category is determined; based on the number of images in each image category and a preset image number threshold, the first loss weight parameter of each image to be classified is determined.

[0009] The model loss of the image classification model is determined based on the predicted image category of each image to be classified, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector.

[0010] The image classification model is trained based on the model loss to obtain a trained image classification model.
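
The counting and weighting step in [0008] can be sketched as follows. This is a minimal illustration, not the patented formula: taking the predicted category as the index of the largest entry of the prediction vector is an assumption, and the rule in `first_loss_weights` (down-weighting images whose predicted category holds more images than the threshold) is a hypothetical placeholder, since this section does not state how the first loss weight parameter is computed from the counts and the threshold. All function names are illustrative.

```python
from collections import Counter


def predicted_categories(prediction_vectors):
    """Assumed rule: the predicted category is the index of the largest probability."""
    return [max(range(len(v)), key=v.__getitem__) for v in prediction_vectors]


def count_per_category(categories, num_categories):
    """Number of images assigned to each category index."""
    counts = Counter(categories)
    return [counts.get(c, 0) for c in range(num_categories)]


def first_loss_weights(categories, counts, threshold):
    """HYPOTHETICAL rule, not taken from the patent text: images in categories
    with more than `threshold` images receive a weight below 1."""
    return [min(1.0, threshold / counts[c]) for c in categories]


vectors = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.8, 0.1]]
cats = predicted_categories(vectors)
counts = count_per_category(cats, num_categories=3)
weights = first_loss_weights(cats, counts, threshold=2)
print(cats, counts, weights)
```

The threshold acts as the reference point for the per-category counts, which matches the role the text gives it; any monotone rule could be substituted for the placeholder.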

[0011] In a second aspect, Embodiment 2 of the present invention provides a training method for an image classification model, the method comprising:

[0012] Obtain the images to be classified in the training set, input the images to be classified into the image classification model, and determine the class probability estimation vector and class probability prediction vector for each image to be classified.

[0013] Based on the category probability prediction vector, the predicted image category of each image to be classified is determined. Based on the predicted image category of each image to be classified, the number of images in each image category is determined. Based on the number of images in each image category and a preset image number threshold, the first loss weight parameter of each image to be classified is determined.

[0014] Obtain the original image category for each image to be classified, and determine the number of images in each original image category based on the image to be classified and the original image category.

[0015] The second loss weight parameter for each image to be classified is determined based on the number of images in each original image category and a preset image number threshold.

[0016] The model loss of the image classification model is determined based on the predicted image category, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified.

[0017] The image classification model is trained based on the model loss to obtain a well-trained image classification model.

[0018] In a third aspect, Embodiment 3 of the present invention provides a training device for an image classification model, the training device comprising:

[0019] The probability prediction module is used to acquire the images to be classified in the training set, input the images to be classified into the image classification model, and determine the category probability estimation vector and category probability prediction vector for each image to be classified.

[0020] The parameter determination module is used to determine the predicted image category of each image to be classified based on the category probability prediction vector, determine the number of images in each image category based on the predicted image category of each image to be classified, and determine the first loss weight parameter of each image to be classified based on the number of images in each image category and a preset image number threshold.

[0021] The loss calculation module is used to determine the model loss of the image classification model based on the predicted image category of each image to be classified, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector.

[0022] The model training module is used to train the image classification model based on the model loss to obtain a trained image classification model.

[0023] Optionally, the training device for the image classification model further includes:

[0024] The image quantity determination module is used to obtain the original image category of each image to be classified, and determine the number of images in each original image category based on the image to be classified and the original image category;

[0025] The second loss weight parameter determination module is used to determine the second loss weight parameter for each image to be classified based on the number of images in each original image category and the preset image number threshold.

[0026] Correspondingly, the loss calculation module is used to determine the model loss of the image classification model based on the predicted image category of each image to be classified, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector.

[0027] In a fourth aspect, embodiments of the present invention provide a computer device, the computer device including a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement a training method for an image classification model as described in the first or second aspect.

[0028] In a fifth aspect, embodiments of the present invention provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements a training method for an image classification model as described in the first or second aspect.

[0029] The beneficial effects of Embodiment 1 of the present invention compared with the prior art are as follows: By inputting the image to be classified into the image classification model, the category probability estimation vector and category probability prediction vector of each image to be classified are determined. Based on the category probability prediction vector, the predicted image category of each image to be classified is determined. Based on the predicted image category of each image to be classified, the number of images in each image category is determined. Based on the number of images in each image category and a preset image number threshold, the first loss weight parameter of each image to be classified is determined. By using the preset image number threshold as a benchmark for the distribution of the number of images in each image category, the reliability and accuracy of the first loss weight parameter are improved. Based on the predicted image category, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector of each image to be classified, the model loss of the image classification model is determined. The image classification model is trained based on the model loss to obtain a trained image classification model. The first loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the category probability prediction vector to reduce the impact of the label imbalance problem on the training of the image classification model and improve the accuracy of the image classification model.

[0030] The beneficial effects of Embodiment 2 of the present invention compared with the prior art are as follows: By inputting the image to be classified into the image classification model, the category probability estimation vector and category probability prediction vector of each image to be classified are determined. Based on the category probability prediction vector, the predicted image category of each image to be classified is determined. Based on the predicted image category of each image to be classified, the number of images in each image category is determined. Based on the number of images in each image category and a preset image number threshold, the first loss weight parameter of each image to be classified is determined. By using the preset image number threshold as a benchmark for the distribution of the number of images in each image category, the reliability and accuracy of the first loss weight parameter are improved. The original image category of each image to be classified is obtained. Based on the image to be classified and the original image category, the number of images in each original image category is determined. Based on the number of images in each original image category and a preset image number threshold, the second loss weight parameter for each image to be classified is determined. By using the preset image number threshold as a benchmark for the distribution of the number of images in the original image categories, the reliability and accuracy of the second loss weight parameter are improved. Based on the predicted image category, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified, the model loss of the image classification model is determined. The image classification model is trained based on the model loss to obtain a trained image classification model. The second loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the corresponding predicted image category, and the first loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the category probability prediction vector. This reduces the impact of label imbalance on the training of the image classification model and improves the accuracy of the image classification model.

Attached Figure Description

[0031] To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0032] Figure 1 is a schematic diagram of an application environment for a training method for an image classification model provided in Embodiment 1 of the present invention;

[0033] Figure 2 is a flowchart illustrating a training method for an image classification model provided in Embodiment 1 of the present invention;

[0034] Figure 3 is a flowchart illustrating a training method for an image classification model provided in Embodiment 2 of the present invention;

[0035] Figure 4 is a schematic diagram of the structure of a training device for an image classification model provided in Embodiment 4 of the present invention;

[0036] Figure 5 is a schematic diagram of the structure of a computer device provided in Embodiment 5 of the present invention.

Detailed Implementation

[0037] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of the invention. However, those skilled in the art will understand that the invention can be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so as not to obscure the description of the invention with unnecessary detail.

[0038] It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.

[0039] It should also be understood that the term “and / or” as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0040] As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when," "once," "in response to determination," or "in response to detection." Similarly, the phrase "if determined" or "if [described condition or event] is detected" may be interpreted, depending on the context, as meaning "once determined," "in response to determination," "once [described condition or event] is detected," or "in response to detection of [described condition or event]."

[0041] Furthermore, in the description of this invention and the appended claims, the terms "first," "second," "third," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0042] References to "one embodiment" or "some embodiments" as described in this specification mean that one or more embodiments of the invention include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically emphasized.

[0043] The embodiments of this invention can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that utilize digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

[0044] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.

[0045] It should be understood that the sequence number of each step in the following embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0046] To illustrate the technical solution of the present invention, specific embodiments are described below.

[0047] The image classification model training method provided in Embodiment 1 of this invention can be applied in the application environment shown in Figure 1, in which the client communicates with the server. The client includes, but is not limited to, handheld computers, desktop computers, laptops, ultra-mobile personal computers (UMPCs), netbooks, cloud computing devices, and personal digital assistants (PDAs). The server can be a standalone server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms.

[0048] Figure 2 is a flowchart illustrating a training method for an image classification model provided in Embodiment 1 of the present invention. The method can be applied to the client in Figure 1 and may include the following steps:

[0049] Step S201: Obtain the images to be classified in the training set, input the images to be classified into the image classification model, and determine the category probability estimation vector and category probability prediction vector for each image to be classified.

[0050] In this embodiment, the training set is selected from image datasets that have label noise and label imbalance problems in actual image classification scenarios. Label noise means that the noise category labels of the images to be classified in the training set are inconsistent with the actual image categories of the corresponding images to be classified. Label imbalance means that the image categories of the images to be classified correspond to a skewed distribution, that is, some image categories contain a large number of images to be classified, while most image categories contain only a very small number of images to be classified.
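
The two dataset properties defined in [0050] can be made concrete with a small sketch. It assumes that the true categories are known for a toy sample, which is not the case in a real noisy training set; the helper names `noise_rate` and `imbalance_ratio` and the toy labels are illustrative and do not come from the patent text.

```python
from collections import Counter


def noise_rate(noisy_labels, true_labels):
    """Fraction of images whose annotated label differs from the actual category."""
    mismatches = sum(n != t for n, t in zip(noisy_labels, true_labels))
    return mismatches / len(true_labels)


def imbalance_ratio(labels):
    """Size of the largest category divided by the size of the smallest one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy sample: a skewed category distribution with one mislabeled image.
true_labels = ["walking"] * 8 + ["running"] * 2
noisy_labels = list(true_labels)
noisy_labels[0] = "running"  # a labeling error

print(noise_rate(noisy_labels, true_labels))
print(imbalance_ratio(true_labels))
```

A ratio well above 1 indicates the skewed distribution the paragraph describes, and a non-zero noise rate indicates label noise.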

[0051] During the training process of the image classification model, after acquiring the images to be classified in the training set, the images to be classified are input into the image classification model for feature extraction and feature analysis to determine the category probability estimation vector and category probability prediction vector for each image to be classified.

[0052] The category probability estimation vector consists of the probabilities that the image to be classified belongs to each image category, and can be used to predict the image category of the corresponding image to be classified.

[0053] The class probability prediction vector obtained in the k-th iteration training can be obtained by fusing the class probability estimation vector obtained in the k-th iteration training and the class probability prediction vector obtained in the (k-1)-th iteration training. This class probability prediction vector combines the real-time and historical class probability prediction information of the image classification model during iterative training, and can be used to predict the image category of the corresponding image to be classified. Here, k = 2, 3, ..., K, where K can be a preset target iteration number.

[0054] Furthermore, since the iterative training process can analyze and learn the images to be classified and their category labels, optimize the parameters of the image classification model, and improve the accuracy of the image classification model, the reliability of the image classification model is relatively low in the early stage of iterative training. Therefore, in this embodiment, when determining the category probability prediction vector in the k-th iteration training, as the number of iterations increases, the first fusion weight of the category probability estimation vector obtained in the k-th iteration training is gradually increased, and the second fusion weight of the category probability prediction vector obtained in the (k-1)-th iteration training is gradually decreased, which can improve the reliability and accuracy of the category probability prediction vector.

[0055] For example, let $\beta_{1k}$ be the first fusion weight and $\beta_{2k}$ be the second fusion weight in the k-th iteration of training. As the number of iterations increases, the first fusion weight gradually increases and the second fusion weight gradually decreases. The first fusion weight is:

[0056]

[0057] In the formula, $\beta_{1k}$ is the first fusion weight in the k-th iteration of training, x1 is the lower limit of the first fusion weight, x2 is the upper limit of the first fusion weight, K is the preset target iteration number, and k is the iteration number.

[0058] The second fusion weight is:

[0059]

[0060] In the formula, $\beta_{2k}$ is the second fusion weight in the k-th iteration of training, x1 is the lower limit of the first fusion weight, x2 is the upper limit of the first fusion weight, K is the preset target iteration number, and k is the iteration number.
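
The formulas for the two fusion weights did not survive in this text, so the sketch below only illustrates the behavior the paragraphs describe: a first fusion weight that rises with the iteration number between a lower and an upper limit, and a second fusion weight that falls. The linear ramp and the choice of making the two weights sum to 1 are assumptions, not the patent's formulas, and the function names are illustrative.

```python
def first_fusion_weight(k, K, lower, upper):
    """ASSUMED linear schedule: rises from near `lower` to `upper` as k goes 1..K."""
    if not 1 <= k <= K:
        raise ValueError("k must satisfy 1 <= k <= K")
    return lower + (upper - lower) * k / K


def second_fusion_weight(k, K, lower, upper):
    """ASSUMED complement of the first weight, so it decreases as k grows."""
    return 1.0 - first_fusion_weight(k, K, lower, upper)


K = 5
for k in range(1, K + 1):
    b1 = first_fusion_weight(k, K, lower=0.5, upper=0.9)
    b2 = second_fusion_weight(k, K, lower=0.5, upper=0.9)
    print(k, round(b1, 3), round(b2, 3))
```

Under this assumed schedule the model's current estimate is trusted more as training progresses, which is the rationale given in [0054].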

[0061] Let I denote the total number of images to be classified. For the i-th (i = 1, 2, ..., I) image to be classified, let the class probability estimation vector obtained in the k-th iteration of training be denoted as... Let the class probability prediction vector obtained in the (k-1)-th iteration of training be denoted as... The class probability prediction vector obtained in the k-th iteration of training is:

[0062]

[0063] In the formula, the terms are, in order: the class probability prediction vector of the i-th image to be classified obtained in the k-th iteration of training; the class probability estimation vector of the i-th image to be classified obtained in the k-th iteration of training; and the class probability prediction vector of the i-th image to be classified obtained in the (k-1)-th iteration of training. $\beta_{1k}$ is the first fusion weight and $\beta_{2k}$ is the second fusion weight.
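
A single fusion step can be sketched as below. The text says the current estimation vector is weighted by the first fusion weight and the previous prediction vector by the second; reading "fusing" as an element-wise weighted sum is an assumption, because the formula itself is missing here. The function name `fuse` and the sample numbers are illustrative.

```python
def fuse(estimate_k, prediction_prev, beta1, beta2):
    """ASSUMED fusion rule: element-wise weighted sum of the k-th estimation
    vector and the (k-1)-th prediction vector."""
    if len(estimate_k) != len(prediction_prev):
        raise ValueError("vectors must have the same number of categories")
    return [beta1 * e + beta2 * p for e, p in zip(estimate_k, prediction_prev)]


# Current estimate favors category 0; the history favors category 1.
prediction_k = fuse([0.6, 0.4], [0.2, 0.8], beta1=0.7, beta2=0.3)
print(prediction_k)
```

When the two weights sum to 1 and both inputs are probability vectors, the fused vector also sums to 1, so it can still be read as a probability distribution over categories.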

[0064] In this embodiment, the image classification task can be in scenarios such as gait analysis, video surveillance, and sports science, classifying human postures in images to be classified. Correspondingly, the category labels are various human posture categories such as stationary, walking, running, squatting, and jumping, and the image classification model is a human posture classification model. Alternatively, it can be in scenarios such as product audience analysis and population aging analysis, classifying facial attributes in images to be classified. Correspondingly, the category labels are various facial attribute categories such as male, female, child, youth, middle-aged, and elderly, and the image classification model is a facial attribute classification model. Alternatively, it can be in scenarios such as teaching evaluation, product sales, and personnel interviews, classifying human emotions in images to be classified. Correspondingly, the category labels are various human emotions such as happy, nervous, sad, disgusted, and bored, and the image classification model is a human emotion classification model. Alternatively, it can be in scenarios such as renting, decorating, and buying a house, classifying room styles in images to be classified. Correspondingly, the category labels are various room styles such as pastoral, minimalist, classical, new Chinese, Mediterranean, and Southeast Asian, and the image classification model is a room style classification model.

[0065] This embodiment takes the classification of human poses in images as an example. The images to be classified in the training set include human pose images of various postures, and the category labels are various human pose categories such as stationary, walking, running, squatting, and jumping. Since it is difficult to avoid labeling errors when annotating the category labels, some human pose images will be labeled as incorrect human pose categories, making the training set in this embodiment a noisy human pose image dataset.

[0066] During training, human pose images are acquired from the human pose image dataset. These images are then input into the human pose classification model for feature extraction and analysis to determine the category probability estimation vector and category probability prediction vector for each human pose image.

[0067] Optionally, the images to be classified are input into an image classification model, and the class probability estimation vector and class probability prediction vector for each image to be classified are determined by:

[0068] The images to be classified are input into the image classification model to determine the class probability estimate vector for each image in the first iteration.

[0069] Based on the preset category probability vector and the category probability estimation vector of each image to be classified in the first iteration, determine the category probability prediction vector of each image to be classified in the first iteration.

[0070] Determine the class probability estimation vector for each image to be classified in the k-th iteration and the class probability prediction vector in the (k-1)-th iteration, where k = 2, 3, ...;

[0071] Based on the estimated class probability vector of each image to be classified in the k-th iteration and the predicted class probability vector in the (k-1)-th iteration, the predicted class probability vector of each image to be classified in the k-th iteration is determined.

[0072] In this process, the image to be classified is input into the image classification model. After the first iteration of training, only the class probability estimation vector of the first iteration is available; because no iteration precedes the first, there is no vector from an earlier iteration to fuse it with.

[0073] To facilitate the unified calculation of the category probability prediction vector, this embodiment sets a preset category probability vector, which is fused with the category probability estimation vector obtained in the first iteration training to obtain the category probability prediction vector in the first iteration process.

[0074] In one embodiment, in order not to affect the calculation result of the category probability estimation vector, the preset category probability vector can be set as a zero vector.

[0075] Then, starting from the second iteration of training, the class probability prediction vector in the k-th iteration can be obtained by fusing the class probability estimation vector in the k-th iteration and the class probability prediction vector in the (k-1)-th iteration.

[0076] For example, for the i-th image to be classified, let $P_i^1$ be the class probability estimation vector obtained in the first iteration of training, and let the preset class probability vector be $P_0$. The class probability prediction vector obtained in the first iteration of training is then:

[0077]

[0078] In the formula, $T_i^1$ is the class probability prediction vector for the i-th image to be classified in the first iteration of training, $\beta_1^1$ is the first fusion weight in the first iteration of training, $\beta_2^1$ is the second fusion weight in the first iteration of training, $P_i^1$ is the class probability estimation vector for the i-th image to be classified in the first iteration of training, and $P_0$ is the preset class probability vector.

[0079] This embodiment takes into account the differences between the first iteration of training and other iterations, sets up a preset class probability and fuses it with the probability estimation vector in the first iteration of training, and performs adaptive fusion calculation on the class probability prediction vectors of different iterations, thereby improving the reliability and accuracy of the class probability prediction vector.
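The iteration-wise fusion described above can be illustrated with a minimal Python sketch. The function name `fuse_prediction` and the parameters `w_prev` and `w_est` are hypothetical; in particular, which fusion weight multiplies which vector is an assumption of this sketch, since the text only states that the two vectors are fused with two weights. The preset vector is taken to be the zero vector, as in the embodiment above.

```python
from typing import List, Optional


def fuse_prediction(estimate: List[float],
                    previous: Optional[List[float]],
                    w_prev: float,
                    w_est: float) -> List[float]:
    """Fuse the current estimation vector with the previous prediction vector.

    w_prev and w_est are hypothetical stand-ins for the two fusion weights;
    the pairing of weight and vector is an assumption of this sketch.
    """
    if previous is None:
        # First iteration: no history exists, so fall back to the preset
        # class probability vector (a zero vector here).
        previous = [0.0] * len(estimate)
    return [w_prev * t + w_est * p for t, p in zip(previous, estimate)]


# Iteration 1: fused with the zero preset vector, so only the scaled estimate remains.
t1 = fuse_prediction([0.2, 0.8], None, w_prev=0.7, w_est=0.3)
# Iteration 2: the new estimate is fused with the iteration-1 prediction vector.
t2 = fuse_prediction([0.4, 0.6], t1, w_prev=0.7, w_est=0.3)
```

Because the preset vector is zero, the first call reduces to scaling the first estimation vector, which is why the same function can serve every iteration.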

[0080] The steps described above—obtaining the images to be classified in the training set, inputting these images into the image classification model, and determining the category probability estimation vector and category probability prediction vector for each image—involve feature extraction and analysis based on an imbalanced and noisy training set. This process determines the probability estimation vector for each image to be classified, which is used to predict the image category. Furthermore, it combines real-time and historical category prediction information from iterative training to obtain the category probability prediction vector, which is used to predict the image category. Considering the relationship between the reliability of the image classification model and the number of iterations, the fusion ratio of real-time and historical category probability prediction information is adjusted, thereby improving the reliability and accuracy of the category probability prediction vector.

[0081] Step S202: Based on the category probability prediction vector, determine the predicted image category of each image to be classified; based on the predicted image category of each image to be classified, determine the number of images in each image category; based on the number of images in each image category and a preset image number threshold, determine the first loss weight parameter for each image to be classified.

[0082] The category probability prediction vector is obtained by fusing the probability estimation vector, and the category probability estimation vector is composed of the probability of the image to be classified belonging to each image category. Therefore, the predicted image category of each image to be classified can be predicted and determined based on the probability of each image category in the category probability prediction vector.

[0083] By statistically analyzing the predicted image categories of all images to be classified, the number of images belonging to each category can be determined. When the number of images in different categories differs significantly, the distribution of image numbers among categories becomes unbalanced, leading to an imbalanced label problem in the training set. Correspondingly, the image categories with a larger number of images are considered dominant categories, and the image categories with a smaller number of images are considered long-tail categories. During training, the image classification model first learns the images to be classified that belong to the dominant category, and then learns the images to be classified that belong to the long-tail category. This breaks the application condition of regularization-based image classification models, which first learn clean images to be classified and then learn noisy images to be classified, resulting in poor training performance of the image classification model.

[0084] Therefore, in order to solve the label imbalance problem, this embodiment adjusts the loss weight of each image category in the model loss according to the number of images in each image category. Correspondingly, the more images in an image category, the smaller the loss weight of the image to be classified in the model loss for that image category, thereby reducing the impact of the label imbalance problem on the training of the image classification model and improving the accuracy of the image classification model.

[0085] Specifically, in this embodiment, a preset image number threshold is set as a benchmark for measuring the image number distribution. The ratio of the preset image number threshold to the image number of each image category is calculated to measure the image number distribution of each image category based on the ratio. The ratio corresponding to each image category is determined as the first loss weight parameter for each image to be classified belonging to that image category.

[0086] In one embodiment, the ratio of the total number of images to be classified in the training set to the total number of image categories can be calculated. This ratio is the average number of images per image category. The average number of images per image category is then determined as a preset image number threshold. Using the average number of images per image category as a benchmark for the distribution of image number improves the reliability of the first loss weight parameter.

[0087] This embodiment takes the classification of human poses in images to be classified as an example. Based on the category probability prediction vector, the predicted human pose category of each human pose image is determined. Based on the predicted human pose category of each human pose image, the number of images for each human pose category is determined. A preset image number threshold is set as a benchmark for measuring the image number distribution. The ratio of the preset image number threshold to the number of images for each human pose category is calculated to measure the image number distribution of each human pose category. The ratio corresponding to each human pose category is determined as the first loss weight parameter for each human pose image belonging to that human pose category.

[0088] Optionally, based on the category probability prediction vector, the predicted image category for each image to be classified includes:

[0089] Based on the class probability prediction vector of each image to be classified, determine the value of the vector element at each position in the class probability prediction vector;

[0090] Determine the maximum value of the vector elements, and determine the image category corresponding to the maximum value as the predicted image category for each image to be classified.

[0091] In the category probability prediction vector, the value of each element represents the probability that the image to be classified belongs to each image category. Therefore, the image category corresponding to the maximum value in the vector element can be determined as the predicted image category of each image to be classified.

[0092] In this embodiment, the image category corresponding to the maximum value of the vector element in the category probability prediction vector is determined as the predicted image category for each image to be classified, thereby improving the accuracy of the predicted image category.
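As an illustration, selecting the predicted image category from a class probability prediction vector can be sketched in Python as follows. The function name `predicted_category` and the example vector are hypothetical; the category names are those of the human pose example in this description.

```python
from typing import Sequence


def predicted_category(prediction: Sequence[float]) -> int:
    """Return the position of the largest element of a prediction vector.

    If several elements share the maximum value, the first position wins;
    this tie-breaking rule is a choice of the sketch, not of the source.
    """
    return max(range(len(prediction)), key=lambda n: prediction[n])


CATEGORIES = ["stationary", "walking", "running", "squatting", "jumping"]
vector = [0.05, 0.10, 0.60, 0.15, 0.10]  # illustrative values only
label = CATEGORIES[predicted_category(vector)]
```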

[0093] Optionally, the first loss weight parameters for each image to be classified are determined based on the number of images in each image category and a preset image number threshold, including:

[0094] Calculate the ratio of a preset image number threshold to the number of images in each image category, and determine the first loss weight parameter for each image category based on this ratio;

[0095] The first loss weight parameter for each image to be classified is determined based on the predicted image category for each image and the first loss weight parameter for each image category.

[0096] The preset image number threshold serves as a benchmark for measuring the distribution of image numbers. The ratio of the preset image number threshold to the number of images in each image category is calculated, and the ratio is determined as the first loss weight parameter for each image category.

[0097] For example, let the preset image number threshold be $S_0$, and let the number of images of the j-th image category in the k-th iteration of training be denoted as $S_j^k$. Then, the first loss weight parameter for the j-th image category in the k-th iteration of training is:

[0098] $\dfrac{S_0}{S_j^k}$

[0099] In the formula, the ratio is the first loss weight parameter for the j-th image category in the k-th iteration of training, $S_0$ is the preset image number threshold, and $S_j^k$ is the number of images belonging to the j-th image category in the k-th iteration of training.

[0100] Based on the image category to which each image belongs, the first loss weight parameter for each image can be determined. Correspondingly, the first loss weight parameter for the i-th image to be classified in the k-th iteration of training is denoted as...

[0101] This embodiment uses a preset image quantity threshold as a benchmark for measuring the image quantity distribution, calculates the ratio of the preset image quantity threshold to the image quantity of each image category, determines the first loss weight parameter for each image category, and then determines the first loss weight parameter for each image to be classified based on the image category to which each image to be classified belongs, thereby improving the reliability and accuracy of the first loss weight parameter.
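The following Python sketch shows one way the first loss weight parameter could be computed, assuming the preset image number threshold is the average number of images per category, as in the embodiment described earlier. The function names are hypothetical, and skipping categories with no predicted images (to avoid division by zero) is a choice of this sketch that the text does not address.

```python
from collections import Counter
from typing import Dict, List


def category_weights(predicted: List[int], num_categories: int) -> Dict[int, float]:
    """Ratio of the threshold to the image count, per predicted category."""
    counts = Counter(predicted)
    # Threshold assumed to be the average number of images per category.
    threshold = len(predicted) / num_categories
    # Categories with no predicted images are skipped (sketch-only choice).
    return {j: threshold / counts[j] for j in range(num_categories) if counts[j] > 0}


def per_image_weights(predicted: List[int], num_categories: int) -> List[float]:
    """Look up each image's weight from the weight of its predicted category."""
    weights = category_weights(predicted, num_categories)
    return [weights[c] for c in predicted]


# Nine images: six predicted as category 0, two as category 1, one as category 2.
predicted = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = category_weights(predicted, num_categories=3)
image_weights = per_image_weights(predicted, num_categories=3)
```

In this example the threshold is 3 images per category, so the dominant category receives a weight below 1 and the long-tail categories receive weights above 1.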

[0102] The steps described above—determining the predicted image category for each image to be classified based on the category probability prediction vector, determining the number of images in each image category based on the predicted image category, and determining the first loss weight parameter for each image to be classified based on the number of images in each image category and a preset image number threshold—improve the accuracy of the predicted image category by determining the image category corresponding to the maximum value of the vector elements in the category probability prediction vector. Furthermore, using the preset image number threshold as a benchmark for measuring the image number distribution and determining the first loss weight parameter for each image to be classified by calculating the ratio of the preset image number threshold to the number of images in each image category improves the reliability and accuracy of the first loss weight parameter.

[0103] Step S203: Determine the model loss of the image classification model based on the predicted image category, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified.

[0104] In this context, the predicted image category represents the image category to which the corresponding image to be classified belongs, and the category probability estimation vector and the category probability prediction vector represent the probability that the corresponding image to be classified belongs to each image category. Then, based on the similarity between the category probability estimation vector and the corresponding predicted image category of each image to be classified, as well as the similarity between the category probability estimation vector and the category probability prediction vector, the first sub-loss and the second sub-loss of each image to be classified can be calculated to characterize the classification accuracy of each image to be classified.

[0105] Since the first loss weight parameter is obtained based on the class probability prediction vector, it is used as the loss weight of the second sub-loss. The first and second sub-losses of all images to be classified are fused to obtain the model loss of the image classification model, so as to reduce the impact of the label imbalance problem on the training of the image classification model and improve the accuracy of the image classification model.

[0106] For example, for the i-th image to be classified in the k-th iteration of training, the similarity between the estimated class probability vector and the predicted image class is calculated to obtain the first sub-loss. The second sub-loss is obtained by calculating the similarity between the estimated class probability vector and the predicted class probability vector. Then, the first loss weight parameter is used as the loss weight of the second sub-loss to calculate the model loss for the i-th image to be classified:

[0107]

[0108] In the formula, the quantities are the model loss for the i-th image to be classified in the k-th iteration of training, the first sub-loss for the i-th image to be classified in the k-th iteration of training, the second sub-loss for the i-th image to be classified in the k-th iteration of training, and the first loss weight parameter for the i-th image to be classified in the k-th iteration of training, which is applied as the weight of the second sub-loss.

[0109] Then, the model losses of all images to be classified in the k-th iteration of training are summed to obtain the model loss of the image classification model in the k-th iteration of training.

[0110] This embodiment takes the classification of human poses in images to be classified as an example. The predicted human pose category represents the human pose category to which the corresponding human pose image belongs. The category probability estimation vector and the category probability prediction vector represent the probability that the corresponding human pose image belongs to each human pose category. Then, based on the similarity between the category probability estimation vector and the corresponding predicted human pose category of each human pose image, as well as the similarity between the category probability estimation vector and the category probability prediction vector, the first sub-loss and the second sub-loss of each human pose image can be calculated to characterize the classification accuracy of each human pose image. The first loss weight parameter is then used as the loss weight of the second sub-loss. The first sub-loss and the second sub-loss of all human pose images are fused to obtain the model loss of the human pose classification model, so as to reduce the impact of the label imbalance problem on the training of the human pose classification model and improve the accuracy of the human pose classification model.

[0111] Optionally, based on the predicted image class, the first loss weight parameter, the class probability estimation vector, and the class probability prediction vector for each image to be classified, the model loss of the image classification model is determined, including:

[0112] Based on the category probability estimation vector and the predicted image category, determine the cross-entropy loss for each image to be classified;

[0113] Calculate the mean of the cross-entropy loss for all images to be classified, and determine the mean of the cross-entropy loss as the third model sub-loss of the image classification model;

[0114] Based on the class probability estimation vector and the class probability prediction vector, determine the regularization loss for each image to be classified;

[0115] Based on the first loss weight parameters and the regularization loss, determine the second sub-loss for each image to be classified;

[0116] Calculate the mean of the second sub-loss for all images to be classified, and determine the mean of the second sub-loss as the second model sub-loss of the image classification model;

[0117] The model loss of the image classification model is determined based on the third and second model sub-losses of the image classification model.

[0118] To facilitate similarity calculation, the corresponding category probability vector is first determined based on the predicted image category. Then, the cross-entropy loss between the category probability estimation vector and the category probability vector is calculated. The mean of the cross-entropy loss of all images to be classified is determined as the third model sub-loss of the image classification model.

[0119] The regularization loss between the category probability estimation vector and the category probability prediction vector is calculated. The first loss weight parameter is used as the weight of the regularization loss to obtain the second sub-loss for each image to be classified. Then, the mean of the second sub-losses of all images to be classified is determined as the second model sub-loss of the image classification model.

[0120] Furthermore, during iterative training, as the number of iterations increases, the reliability of the image classification model gradually increases, while the third model sub-loss gradually decreases. The third model sub-loss is calculated based on cross-entropy. Due to the inherent properties of logarithms, the third model sub-loss suffers from gradient problems. These gradient problems become increasingly severe with each iteration, leading to a gradual decrease in the accuracy of the third model sub-loss. Therefore, to improve the reliability and accuracy of the model loss, this embodiment reduces the proportion of the third model sub-loss in the overall model loss and increases the proportion of the second model sub-loss. In this embodiment, the proportion of the second model sub-loss is dynamically adjusted based on a preset loss adjustment parameter as its weight, resulting in the model loss of the image classification model.

[0121] For example, for the i-th image to be classified in the k-th iteration of training, let the class probability vector be denoted as $Y_i^k$. The cross-entropy loss between the category probability estimation vector $P_i^k$ and the category probability vector $Y_i^k$ is:

[0122] $-\sum_{n=1}^{N} Y_{i,n}^k \log P_{i,n}^k$

[0123] In the formula, the expression is the cross-entropy loss for the i-th image to be classified in the k-th iteration of training, N is the number of image categories, $Y_{i,n}^k$ is the value of the element at the n-th position in the class probability vector of the i-th image to be classified in the k-th iteration of training, and $P_{i,n}^k$ is the value of the element at the n-th position in the category probability estimation vector of the i-th image to be classified in the k-th iteration of training.

[0124] Then, the mean cross-entropy loss of the I images to be classified in the k-th iteration of training is calculated to obtain the third model sub-loss in the k-th iteration of training.
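A minimal Python sketch of this calculation follows, assuming the class probability vector is a one-hot vector built from the predicted image category. The function names are hypothetical, and the small `eps` clamp that guards against taking the logarithm of zero is an addition of this sketch, not part of the described method.

```python
import math
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """Class probability vector derived from a predicted category (assumed one-hot)."""
    return [1.0 if n == index else 0.0 for n in range(size)]


def cross_entropy(target: List[float], estimate: List[float], eps: float = 1e-12) -> float:
    """Cross-entropy between a target vector and an estimation vector."""
    # eps avoids log(0); it is a numerical safeguard added by this sketch.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(target, estimate))


def third_model_sub_loss(estimates: List[List[float]], predicted: List[int]) -> float:
    """Mean cross-entropy over all images in one training iteration."""
    losses = [cross_entropy(one_hot(c, len(p)), p) for p, c in zip(estimates, predicted)]
    return sum(losses) / len(losses)


estimates = [[0.5, 0.5], [0.25, 0.75]]  # illustrative estimation vectors
predicted = [0, 1]                      # predicted category of each image
loss = third_model_sub_loss(estimates, predicted)
```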

[0125] The regularization loss between the category probability estimation vector $P_i^k$ and the category probability prediction vector $T_i^k$ is:

[0126]

[0127] The formula gives the regularization loss for the i-th image to be classified in the k-th iteration of training. In the formula, $P_i^k$ is the class probability estimation vector of the i-th image to be classified in the k-th iteration of training, $T_i^k$ is the class probability prediction vector of the i-th image to be classified in the k-th iteration of training, and $\alpha$ is a preset value. In this embodiment, $\alpha = 1$.

[0128] The first loss weight parameter is used as the weight of the regularization loss to obtain the second sub-loss of each of the I images to be classified in the k-th training iteration. The mean of the second sub-losses of the I images to be classified in the k-th training iteration is then calculated, and the second model sub-loss in the k-th training iteration is obtained as:

[0129]

[0130] In the formula, the quantities are the second model sub-loss in the k-th iteration of training, the first loss weight parameter for the i-th image to be classified in the k-th iteration of training, and the regularization loss for the i-th image to be classified in the k-th iteration of training.
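The weighting and averaging step can be sketched in Python as below. The function name `second_model_sub_loss` is hypothetical, and the per-image regularization losses are taken as precomputed inputs, so the sketch makes no assumption about the form of the regularization loss itself. The numeric values are illustrative only.

```python
from typing import List


def second_model_sub_loss(weights: List[float], reg_losses: List[float]) -> float:
    """Mean of the weighted regularization losses over all images."""
    if len(weights) != len(reg_losses):
        raise ValueError("one weight is required per regularization loss")
    # Second sub-loss of each image: first loss weight parameter times regularization loss.
    second_sub_losses = [w * r for w, r in zip(weights, reg_losses)]
    return sum(second_sub_losses) / len(second_sub_losses)


# Three images with first loss weight parameters 0.5, 1.5 and 3.0.
value = second_model_sub_loss([0.5, 1.5, 3.0], [0.2, 0.4, 0.1])
```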

[0131] Then, the preset loss adjustment parameter in the k-th iteration of training is denoted as $\lambda^k$. Based on the third model sub-loss and the second model sub-loss in the k-th iteration of training, the model loss in the k-th iteration of training is obtained as follows:

[0132]

[0133] In the formula, $L^k$ is the model loss in the k-th iteration of training, the two loss terms are the third model sub-loss and the second model sub-loss in the k-th iteration of training, and $\lambda^k$ is the preset loss adjustment parameter in the k-th iteration of training.

[0134] This embodiment determines the cross-entropy loss of each image to be classified based on the category probability estimation vector and the predicted image category. The mean of the cross-entropy losses of all images to be classified is determined as the third model sub-loss of the image classification model. At the same time, the regularization loss between the category probability estimation vector and the category probability prediction vector is calculated. The first loss weight parameter is used as the weight of the regularization loss to obtain the second sub-loss of each image to be classified. The mean of the second sub-losses of all images to be classified is determined as the second model sub-loss of the image classification model. The proportion of the second sub-loss in the model loss is dynamically adjusted according to the preset loss adjustment parameter, thereby improving the accuracy of the model loss.
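A short Python sketch of this combination is given below, with the preset loss adjustment parameter acting as the weight of the second model sub-loss as described above. The function names are hypothetical, and the linear schedule for the adjustment parameter is purely an assumption chosen to show a weight that grows with the iteration number; the text only states that the parameter is preset for each iteration.

```python
def model_loss(third_sub_loss: float, second_sub_loss: float, lam: float) -> float:
    """Third model sub-loss plus the second model sub-loss weighted by lam."""
    return third_sub_loss + lam * second_sub_loss


def linear_lambda(k: int, total_iterations: int, lam_max: float = 1.0) -> float:
    """Hypothetical schedule: grows linearly with k and is capped at lam_max."""
    return lam_max * min(k, total_iterations) / total_iterations


lam = linear_lambda(k=5, total_iterations=10, lam_max=2.0)
loss = model_loss(third_sub_loss=0.5, second_sub_loss=0.25, lam=lam)
```

With such a schedule the share of the second model sub-loss in the model loss rises as training proceeds, which matches the stated intent of shifting weight away from the cross-entropy term in later iterations.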

[0135] The steps described above, which determine the model loss of the image classification model based on the predicted image category, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified, characterize the classification accuracy of the image classification model by the similarity between the category probability estimation vector and the corresponding predicted image category for each image to be classified, as well as the similarity between the category probability estimation vector and the category probability prediction vector. The first loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the category probability prediction vector to reduce the impact of label imbalance on the training of the image classification model and improve the accuracy of the image classification model.

[0136] Step S204: Train the image classification model based on the model loss to obtain a trained image classification model.

[0137] Here, the model loss measures the classification accuracy of the image classification model in the corresponding training iteration. The smaller the model loss, the higher the classification accuracy of the image classification model in that iteration. Therefore, during iterative training, the image classification model is trained based on the model loss until the model loss converges, and a trained image classification model is obtained.

[0138] This embodiment takes the classification of human pose in the image to be classified as an example. In the iterative training of the human pose classification model, the human pose classification model is trained based on the model loss until the model loss converges, and a trained human pose classification model is obtained.

[0139] The above steps involve training an image classification model based on model loss to obtain a trained image classification model. Model loss characterizes the classification accuracy of the image classification model in the corresponding iterative training. The image classification model is trained based on model loss until the model loss converges, thereby improving the classification accuracy of the image classification model.

[0140] This embodiment inputs the images to be classified into an image classification model, determines the category probability estimation vector and category probability prediction vector for each image, determines the predicted image category for each image based on the category probability prediction vector, determines the number of images in each category based on the predicted image category, and determines the first loss weight parameter for each image based on the number of images in each category and a preset image number threshold. By using the preset image number threshold as a benchmark for the distribution of image number in each category, the reliability and accuracy of the first loss weight parameter are improved. Based on the predicted image category, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image, the model loss of the image classification model is determined. The image classification model is trained based on the model loss to obtain a trained image classification model. The first loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the category probability prediction vector to reduce the impact of label imbalance on the training of the image classification model and improve the accuracy of the image classification model.

[0141] See Figure 3, which is a flowchart illustrating a training method for an image classification model provided in Embodiment 2 of the present invention. The above-described image classification model training method can be applied to the client in Figure 1. The training method for the image classification model may include the following steps:

[0142] Step S301: Obtain the images to be classified in the training set, input the images to be classified into the image classification model, and determine the category probability estimation vector and category probability prediction vector for each image to be classified.

[0143] Step S302: Based on the category probability prediction vector, determine the predicted image category of each image to be classified; based on the predicted image category of each image to be classified, determine the number of images in each image category; based on the number of images in each image category and a preset image number threshold, determine the first loss weight parameter for each image to be classified.

[0144] Step S303: Obtain the original image category for each image to be classified, and determine the number of images in each original image category based on the image to be classified and the original image category.

[0145] Step S304: Determine the second loss weight parameter for each image to be classified based on the number of images in each original image category and a preset image number threshold.

[0146] Step S305: Determine the model loss of the image classification model based on the predicted image category, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified.

[0147] Step S306: Train the image classification model based on the model loss to obtain the trained image classification model.

[0148] Steps S301, S302, and S306 are identical to steps S201, S202, and S204 in the training method for an image classification model provided in Embodiment 1 of the present invention, and will not be repeated here. Steps S303, S304, and S305 are as follows:

[0149] Step S303: Obtain the original image category for each image to be classified, and determine the number of images in each original image category based on the image to be classified and the original image category.

[0150] In this embodiment, the imbalanced noisy training set contains images to be classified and their category labels. The original image category of each image to be classified can be determined based on the category labels. Then, the number of images in each original image category can be counted based on the original image categories of all images to be classified.

[0151] This embodiment takes the classification of human poses in images to be classified as an example. The original human pose category of each human pose image can be determined according to the category label. Then, the number of images in each original human pose category can be counted based on the original human pose categories of all human pose images.

[0152] The steps described above—obtaining the original image category for each image to be classified, and determining the number of images in each original image category based on the image to be classified and the original image category—use the category labels in the training set to determine the number of images in each original image category, which can serve as the basis for calculating the distribution of the number of original image categories in the training set.

[0153] Step S304: Determine the second loss weight parameter for each image to be classified based on the number of images in each original image category and a preset image number threshold.

[0154] When the number of images in different original image categories differs significantly, the distribution of the number of images in the original image categories becomes unbalanced, resulting in an imbalanced label problem in the training set. This disrupts the application condition of image classification models based on regularization, which first learns clean images to be classified and then learns noisy images to be classified, leading to poor training performance of the image classification model.

[0155] Therefore, in this embodiment, the loss weight of each original image category in the model loss is adjusted according to the number of images in the original image category. Correspondingly, the more images in the original image category, the smaller the loss weight of the image to be classified corresponding to that original image category in the model loss is set, thereby reducing the impact of the original label imbalance problem on the training of the image classification model and improving the accuracy of the image classification model.

[0156] Specifically, the ratio of a preset image quantity threshold to the number of images in each original image category is calculated. The distribution of the number of images in each original image category is measured based on the ratio, and the ratio corresponding to each original image category is determined as the second loss weight parameter for each image to be classified belonging to that original image category.

[0157] For example, consider the number of images of the j-th original image category in the k-th iteration of training. The second loss weight parameter for the j-th original image category in the k-th iteration of training is then:

[0158]

[0159] In the formula, the second loss weight parameter for the j-th original image category in the k-th iteration of training is the ratio of $S_0$, the preset image number threshold, to the number of images belonging to the j-th original image category in the k-th iteration of training.

[0160] Based on the original image category to which each image to be classified belongs, the second loss weight parameter for each image to be classified can be determined. Correspondingly, the second loss weight parameter for the i-th image to be classified in the k-th iteration of training is the second loss weight parameter of the original image category to which that image belongs.

[0161] This embodiment takes the classification of human poses in images to be classified as an example. It calculates the ratio of a preset image quantity threshold to the number of images in each original human pose category. The ratio is used to measure the distribution of the number of images in each original human pose category. The ratio corresponding to each original human pose category is determined as the second loss weight parameter for each human pose image belonging to that original human pose category.

[0162] The above steps, which determine the second loss weight parameter for each image to be classified based on the number of images in each original image category and a preset image number threshold, use the preset image number threshold as a benchmark for measuring the distribution of the number of images. The second loss weight parameter for each image to be classified is determined by calculating the ratio of the preset image number threshold to the number of images in each original image category, thereby improving the reliability and accuracy of the second loss weight parameter.
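The ratio described in Step S304 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `second_loss_weights`, the integer label encoding, and the choice to give empty categories a weight of zero (to avoid division by zero) are all assumptions made for the example.

```python
import numpy as np

def second_loss_weights(original_labels, num_classes, s0):
    """Return (per-category weights, per-image weights) as S0 / category count."""
    labels = np.asarray(original_labels)
    # Number of images in each original image category.
    counts = np.bincount(labels, minlength=num_classes)
    class_weights = np.zeros(num_classes, dtype=float)
    present = counts > 0  # assumption: empty categories keep weight 0
    class_weights[present] = s0 / counts[present]
    # Each image inherits the weight of its original image category.
    return class_weights, class_weights[labels]

# Hypothetical training set: 4 images of category 0, 2 of category 1, 1 of category 2.
class_w, image_w = second_loss_weights([0, 0, 0, 0, 1, 1, 2], num_classes=3, s0=2)
```

With a threshold of 2, the largest category receives the smallest weight and the rarest category the largest, which matches the stated goal of down-weighting over-represented categories.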

[0163] Step S305: Determine the model loss of the image classification model based on the predicted image category, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified.

[0164] In this context, the predicted image category represents the image category to which the corresponding image to be classified belongs, and the category probability estimation vector and the category probability prediction vector represent the probability that the corresponding image to be classified belongs to each image category. Then, based on the similarity between the category probability estimation vector and the corresponding predicted image category of each image to be classified, as well as the similarity between the category probability estimation vector and the category probability prediction vector, the first sub-loss and the second sub-loss of each image to be classified can be calculated to characterize the classification accuracy of each image to be classified.

[0165] Since the first loss weight parameter is obtained based on the class probability prediction vector and the second loss weight parameter is calculated based on the original image class, and the class probability estimation vector and the class probability prediction vector do not affect the second loss weight parameter, the first loss weight parameter is used as the loss weight of the second sub-loss, and the second loss weight parameter is used as the loss weight of the first sub-loss. The first sub-loss and the second sub-loss are fused and calculated based on the first loss weight parameter and the second loss weight parameter to obtain the model loss of the image classification model. This reduces the impact of the label imbalance problem on the training of the image classification model and improves the accuracy of the image classification model.

[0166] This embodiment takes the classification of human poses in images as an example. The predicted human pose category represents the human pose category to which the corresponding human pose image belongs. The category probability estimation vector and the category probability prediction vector represent the probability that the corresponding human pose image belongs to each human pose category. Based on the similarity between the category probability estimation vector and the corresponding predicted human pose category, as well as the similarity between the category probability estimation vector and the category probability prediction vector, the first sub-loss and the second sub-loss of each human pose image are calculated to characterize the classification accuracy of each human pose image. Then, the first loss weight parameter is used as the loss weight of the second sub-loss, and the second loss weight parameter is used as the loss weight of the first sub-loss. The first and second sub-losses are fused and calculated based on the first and second loss weight parameters to obtain the model loss of the human pose classification model. This reduces the impact of label imbalance on the training of the human pose classification model and improves its accuracy.

[0167] Optionally, determining the model loss of the image classification model based on the predicted image category, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified includes:

[0168] Based on the category probability estimation vector and the predicted image category, determine the cross-entropy loss for each image to be classified;

[0169] Based on the second loss weight parameter and the cross-entropy loss, determine the first sub-loss for each image to be classified;

[0170] Calculate the mean of the first sub-loss for all images to be classified, and determine the mean of the first sub-loss as the first model sub-loss of the image classification model;

[0171] Based on the category probability estimation vector and the category probability prediction vector, determine the regularization loss for each image to be classified;

[0172] Based on the first loss weight parameter and the regularization loss, determine the second sub-loss for each image to be classified;

[0173] Calculate the mean of the second sub-loss for all images to be classified, and determine the mean of the second sub-loss as the second model sub-loss of the image classification model;

[0174] Based on the first model sub-loss and the second model sub-loss of the image classification model, determine the model loss of the image classification model.

[0175] To facilitate similarity calculation, the corresponding category probability vector is first determined based on the predicted image category. Then, the cross-entropy loss between the category probability estimation vector and the category probability vector is calculated. The second loss weight parameter is used as the weight of the cross-entropy loss to obtain the first sub-loss for each image to be classified. Finally, the mean of the first sub-losses of all images to be classified is determined as the first model sub-loss of the image classification model.

[0176] The regularization loss between the category probability estimation vector and the category probability prediction vector is calculated. The first loss weight parameter is used as the weight of the regularization loss to obtain the second sub-loss for each image to be classified. Then, the mean of the second sub-losses of all images to be classified is determined as the second model sub-loss of the image classification model.

[0177] Then, the preset loss adjustment parameters are used as the weights of the second model sub-loss, and the proportion of the second model sub-loss is dynamically adjusted. The model loss of the image classification model is determined based on the first model sub-loss and the second model sub-loss.

[0178] For example, for the i-th image to be classified in the k-th training iteration, let $\ell_i^k$ be the cross-entropy loss between the class probability estimation vector and the category probability vector $Y_i^k$. The second loss weight parameter $s_i^k$ of that image is used as the weight of the cross-entropy loss $\ell_i^k$ to obtain the first sub-loss of each of the $I$ images to be classified, and the mean of the first sub-losses of the $I$ images to be classified in the k-th iteration of training is then calculated. The first model sub-loss in the k-th iteration of training is thus obtained as:

[0179] $$L_1^k = \frac{1}{I} \sum_{i=1}^{I} s_i^k \, \ell_i^k$$

[0180] In the formula, $L_1^k$ is the first model sub-loss in the k-th iteration of training, $s_i^k$ is the second loss weight parameter for the i-th image to be classified in the k-th iteration of training, and $\ell_i^k$ is the cross-entropy loss of the i-th image to be classified in the k-th iteration of training.

[0181] Then, based on the preset loss adjustment parameter $\lambda^k$, the first model sub-loss $L_1^k$, and the second model sub-loss $L_2^k$ in the k-th iteration of training, the model loss in the k-th iteration of training is obtained as follows:

[0182] $$L^k = L_1^k + \lambda^k L_2^k$$

[0183] In the formula, $L^k$ is the model loss in the k-th iteration of training, $L_1^k$ is the first model sub-loss in the k-th iteration of training, $L_2^k$ is the second model sub-loss in the k-th iteration of training, and $\lambda^k$ is the preset loss adjustment parameter in the k-th iteration of training.
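The combination described above, a mean of weighted cross-entropy losses plus the preset loss adjustment parameter times a mean of weighted regularization losses, can be sketched as follows. Several points are assumptions made only for illustration: the function name `model_loss`, the one-hot encoding of the category probability vector built from the predicted image category, and in particular the form of the regularization loss, which this section does not specify. The sketch uses log(1 - <p, t>), a form known from early-learning regularization, as a stand-in.

```python
import numpy as np

def model_loss(p_est, y_onehot, t_pred, s, w, lam, eps=1e-12):
    """Weighted cross-entropy mean + lam * weighted regularization mean."""
    p = np.clip(np.asarray(p_est, dtype=float), eps, 1.0)
    # Per-image cross-entropy between the estimation vector and the
    # (assumed one-hot) category probability vector.
    ce = -np.sum(np.asarray(y_onehot) * np.log(p), axis=1)
    # First model sub-loss: second loss weight parameter weights the cross-entropy.
    first_model_sub_loss = np.mean(np.asarray(s) * ce)
    # ASSUMED regularizer: log(1 - <estimation, prediction>) per image.
    inner = np.sum(p * np.asarray(t_pred, dtype=float), axis=1)
    reg = np.log(np.clip(1.0 - inner, eps, 1.0))
    # Second model sub-loss: first loss weight parameter weights the regularizer.
    second_model_sub_loss = np.mean(np.asarray(w) * reg)
    return first_model_sub_loss + lam * second_model_sub_loss

# Hypothetical two-image, two-category batch.
p_est = np.array([[0.5, 0.5], [0.25, 0.75]])
y = np.array([[1, 0], [0, 1]])
t = y.astype(float)
loss = model_loss(p_est, y, t, s=[1.0, 2.0], w=[1.0, 1.0], lam=1.0)
```

Setting `lam` to zero leaves only the weighted cross-entropy term, which makes it easy to check each term separately.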

[0184] The steps described above, which determine the model loss of the image classification model based on the predicted image category, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified, characterize the classification accuracy of the image classification model by the similarity between the category probability estimation vector and the corresponding predicted image category for each image to be classified, as well as the similarity between the category probability estimation vector and the category probability prediction vector. The second loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the corresponding predicted image category, and the first loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the category probability prediction vector. This reduces the impact of label imbalance on the training of the image classification model and improves the accuracy of the image classification model.

[0185] This embodiment inputs the images to be classified into an image classification model to determine the category probability estimation vector and category probability prediction vector for each image. Based on the category probability prediction vector, the predicted image category for each image is determined. Based on the predicted image category, the number of images in each category is determined. Based on the number of images in each category and a preset image number threshold, a first loss weight parameter for each image is determined. Using the preset image number threshold as a benchmark for the distribution of the number of images across categories improves the reliability and accuracy of the first loss weight parameter. The original image category for each image to be classified is obtained, and the number of images in each original image category is determined based on the images to be classified and their original image categories. Based on the number of images in each original image category and the preset image number threshold, the second loss weight parameter for each image to be classified is determined. Using the preset image number threshold as a benchmark for the distribution of the number of images across the original image categories improves the reliability and accuracy of the second loss weight parameter. Based on the predicted image category, first loss weight parameter, second loss weight parameter, category probability estimation vector, and category probability prediction vector for each image to be classified, the model loss of the image classification model is determined. The image classification model is trained based on this model loss to obtain a trained image classification model. The second loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the corresponding predicted image category, while the first loss weight parameter is used as the weight basis for the similarity between the category probability estimation vector and the category probability prediction vector. This reduces the impact of label imbalance on the training of the image classification model and improves its accuracy.

[0186] Embodiment 3 of the present invention provides an image classification method. This image classification method uses the image classification model trained in Embodiment 1 or Embodiment 2 of the present invention to classify images, and may include the following steps:

[0187] The system acquires the image to be classified in the image classification task, inputs the image to be classified into the trained image classification model, outputs the class probability estimation vector of the image to be classified, and determines the image class of the image to be classified based on the class probability estimation vector of the image to be classified.

[0188] Image classification tasks can be implemented in various scenarios, such as gait analysis, video surveillance, and sports science, to classify human postures in images. The category labels are various human posture categories, such as stationary, walking, running, squatting, and jumping, and the image classification model is a human posture classification model. Alternatively, in scenarios like product audience analysis and population aging analysis, tasks can be implemented to classify facial attributes in images. The category labels are various facial attribute categories, such as male, female, child, youth, middle-aged, and elderly, and the image classification model is a facial attribute classification model. Furthermore, in scenarios like teaching evaluation, product sales, and personnel interviews, tasks can be implemented to classify human emotions in images. The category labels are various human emotions, such as happy, nervous, sad, disgusted, and bored, and the image classification model is a human emotion classification model. Finally, in scenarios like renting, decorating, and buying a house, tasks can be implemented to classify room styles in images. The category labels are various room styles, such as pastoral, minimalist, classical, new Chinese, Mediterranean, and Southeast Asian, and the image classification model is a room style classification model.

[0189] This embodiment takes the classification of human poses in an image as an example. After acquiring the human pose image to be classified, the image is input into a trained human pose classification model for feature extraction and analysis. The model outputs a category probability estimation vector, which represents the probability that the corresponding human pose image belongs to each human pose category. The human pose category corresponding to the highest probability value in the category probability estimation vector is then determined as the human pose category of the image to be classified, thus completing the human pose classification task.
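The final selection step in this example, taking the category with the highest probability in the category probability estimation vector, can be sketched as below. The function name `classify` and the sample vector are hypothetical; the category names are those listed for the human posture scenario, and the trained model itself is outside the sketch.

```python
import numpy as np

# Human posture categories named in the description.
POSE_CATEGORIES = ["stationary", "walking", "running", "squatting", "jumping"]

def classify(prob_estimation_vector, categories=POSE_CATEGORIES):
    """Return the category with the highest estimated probability."""
    idx = int(np.argmax(prob_estimation_vector))
    return categories[idx]

# Hypothetical model output for one human pose image.
label = classify([0.05, 0.10, 0.70, 0.10, 0.05])
```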

[0190] It is understood that in the specific embodiments of this application, data related to facial images, human body images, room images, etc. are involved. When the embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

[0191] This embodiment obtains the image classification model trained in Embodiment 1 or Embodiment 2 of the present invention, performs feature extraction and feature analysis on the image to be classified, outputs the category probability estimation vector of the image to be classified, determines the image category of the image to be classified, and improves the classification accuracy of the image to be classified.

[0192] Corresponding to the training method of the image classification model in the above embodiments, Figure 4 shows a structural block diagram of the training device for the image classification model provided in Embodiment 4 of the present invention. For ease of explanation, only the parts related to the embodiments of the present invention are shown.

[0193] Referring to Figure 4, the training device for the image classification model includes:

[0194] The probability prediction module 41 is used to acquire the images to be classified in the training set, input the images to be classified into the image classification model, and determine the category probability estimation vector and category probability prediction vector for each image to be classified.

[0195] The parameter determination module 42 is used to determine the predicted image category of each image to be classified based on the category probability prediction vector, determine the number of images in each image category based on the predicted image category of each image to be classified, and determine the first loss weight parameter of each image to be classified based on the number of images in each image category and a preset image number threshold.

[0196] The loss calculation module 43 is used to determine the model loss of the image classification model based on the predicted image category, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified.

[0197] The model training module 44 is used to train the image classification model based on the model loss to obtain a trained image classification model.

[0198] Optionally, the training device for the above image classification model also includes:

[0199] The image quantity determination module is used to obtain the original image category of each image to be classified, and determine the number of images in each original image category based on the image to be classified and the original image category;

[0200] The second loss weight parameter determination module is used to determine the second loss weight parameter for each image to be classified based on the number of images in each original image category and a preset image number threshold.

[0201] Correspondingly, the loss calculation module is used to determine the model loss of the image classification model based on the predicted image category, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector for each image to be classified.

[0202] Optionally, the above-mentioned loss calculation module includes:

[0203] The cross-entropy loss calculation submodule is used to determine the cross-entropy loss for each image to be classified based on the category probability estimation vector and the predicted image category.

[0204] The first sub-loss calculation submodule is used to determine the first sub-loss for each image to be classified based on the second loss weight parameters and the cross-entropy loss.

[0205] The first model sub-loss calculation submodule is used to calculate the mean of the first sub-loss of all images to be classified, and to determine the mean of the first sub-loss as the first model sub-loss of the image classification model.

[0206] The regularization loss calculation submodule is used to determine the regularization loss for each image to be classified based on the class probability estimation vector and the class probability prediction vector.

[0207] The second sub-loss calculation submodule is used to determine the second sub-loss for each image to be classified based on the first loss weight parameters and the regularization loss.

[0208] The second model sub-loss calculation submodule is used to calculate the mean of the second sub-loss of all images to be classified, and to determine the mean of the second sub-loss as the second model sub-loss of the image classification model.

[0209] The first model loss calculation submodule is used to determine the model loss of the image classification model based on the first model sub-loss and the second model sub-loss of the image classification model.

[0210] Optionally, the probability prediction module 41 mentioned above includes:

[0211] The first category probability prediction submodule is used to input the image to be classified into the image classification model and determine the category probability estimation vector of each image to be classified in the first iteration process;

[0212] The second category probability prediction submodule is used to determine the category probability prediction vector of each image to be classified in the first iteration process based on the preset category probability vector and the category probability estimation vector of each image to be classified in the first iteration process.

[0213] The third category probability prediction submodule is used to determine the category probability estimation vector of each image to be classified in the k-th iteration process and the category probability prediction vector in the (k-1)-th iteration process, where k = 2, 3, ...;

[0214] The fourth category probability prediction submodule is used to determine the category probability prediction vector of each image to be classified in the k-th iteration based on the category probability estimation vector of each image in the k-th iteration and the category probability prediction vector in the (k-1)-th iteration.
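The iterative relationship handled by these submodules, in which the prediction vector of iteration k is derived from the estimation vector of iteration k and the prediction vector of iteration k-1, can be sketched as follows. The section does not state the update rule, so the exponential moving average used here, the momentum value `beta`, the uniform preset category probability vector, and the function name `update_prediction` are all assumptions chosen for illustration.

```python
import numpy as np

def update_prediction(prev_prediction, estimation, beta=0.7):
    """ASSUMED rule: moving average of the previous prediction and the new estimation."""
    return beta * np.asarray(prev_prediction) + (1.0 - beta) * np.asarray(estimation)

# Assumed preset category probability vector (uniform over 3 categories),
# used in place of a previous prediction in the first iteration.
preset = np.full(3, 1.0 / 3.0)

# Hypothetical estimation vectors of one image in iterations 1 and 2.
estimations = [np.array([0.6, 0.3, 0.1]), np.array([0.8, 0.1, 0.1])]

prediction = preset
for p in estimations:
    # Iteration k combines the estimation of iteration k with the
    # prediction of iteration k-1 (the preset vector when k = 1).
    prediction = update_prediction(prediction, p)
```

Because each update is a convex combination of probability vectors, the result remains a valid probability vector under this assumed rule.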

[0215] Optionally, the parameter determination module 42 includes:

[0216] The vector element value determination submodule is used to determine the vector element value at each position in the class probability prediction vector based on the class probability prediction vector of each image to be classified.

[0217] The image category prediction submodule is used to determine the maximum value of the vector elements and determine the image category corresponding to the maximum value as the predicted image category for each image to be classified.

[0218] Optionally, the parameter determination module 42 includes:

[0219] The first parameter determination submodule is used to calculate the ratio of the preset image number threshold to the number of images in each image category, and to determine the first loss weight parameter for each image category.

[0220] The second parameter determination submodule is used to determine the first loss weight parameter for each image to be classified based on the predicted image category of each image to be classified and the first loss weight parameter for each image category.
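The submodules of the parameter determination module can be read together as one short computation: take the position of the largest element of each category probability prediction vector as the predicted image category, count the images per predicted category, and divide the preset image number threshold by each count. The sketch below illustrates this under assumptions: the function name `first_loss_weights` is hypothetical, and categories that receive no predictions are given a weight of zero to avoid division by zero.

```python
import numpy as np

def first_loss_weights(prediction_vectors, s0):
    """Return (predicted categories, per-image first loss weight parameters)."""
    t = np.asarray(prediction_vectors, dtype=float)
    # Predicted image category: position of the maximum vector element.
    predicted = np.argmax(t, axis=1)
    # Number of images in each predicted image category.
    counts = np.bincount(predicted, minlength=t.shape[1])
    class_weights = np.zeros(t.shape[1])
    present = counts > 0  # assumption: unused categories keep weight 0
    class_weights[present] = s0 / counts[present]
    # Each image takes the weight of its predicted image category.
    return predicted, class_weights[predicted]

# Hypothetical prediction vectors for three images and three categories.
t = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]
predicted, weights = first_loss_weights(t, s0=2)
```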

[0221] Optionally, the loss calculation module 43 mentioned above includes:

[0222] The cross-entropy loss calculation submodule is used to determine the cross-entropy loss for each image to be classified based on the category probability estimation vector and the predicted image category.

[0223] The third model sub-loss calculation submodule is used to calculate the mean of the cross-entropy loss of all images to be classified, and to determine the mean of the cross-entropy loss as the third model sub-loss of the image classification model.

[0224] The regularization loss calculation submodule is used to determine the regularization loss for each image to be classified based on the class probability estimation vector and the class probability prediction vector.

[0225] The second sub-loss calculation submodule is used to determine the second sub-loss for each image to be classified based on the first loss weight parameters and the regularization loss.

[0226] The second model sub-loss calculation submodule is used to calculate the mean of the second sub-loss of all images to be classified, and to determine the mean of the second sub-loss as the second model sub-loss of the image classification model.

[0227] The second model loss calculation submodule is used to determine the model loss of the image classification model based on the third model sub-loss and the second model sub-loss of the image classification model.

[0228] Figure 5 is a schematic diagram of the structure of a computer device provided in Embodiment 5 of the present invention. As shown in Figure 5, the computer device of this embodiment includes: at least one processor (only one is shown in Figure 5), a memory, and a computer program stored in the memory and capable of running on the at least one processor. When executing the computer program, the processor implements the steps in any of the above-described model training method embodiments.

[0229] This computer device may include, but is not limited to, a processor and memory. Those skilled in the art will understand that Figure 5 is merely an example of a computer device and does not constitute a limitation on computer devices. Computer devices may include more or fewer components than shown in the illustration, or combinations of certain components, or different components, such as network interfaces, displays, and input devices.

[0230] The processor referred to can be a CPU, but it can also be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor can be a microprocessor or any conventional processor.

[0231] Memory includes readable storage media, internal memory, etc., wherein internal memory can be the RAM of a computer device, providing an environment for the operation of the operating system and computer-readable instructions stored in the readable storage media. The readable storage media can be the hard drive of a computer device, or in other embodiments, it can be an external storage device of the computer device, such as a plug-in hard drive, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card. Furthermore, memory can include both internal storage units and external storage devices of a computer device. Memory is used to store the operating system, applications, bootloader, data, and other programs, such as program code for computer programs. Memory can also be used to temporarily store data that has been output or will be output.

[0232] Those skilled in the art will understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is used as an example. In practical applications, the functions described above can be assigned to different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit. Furthermore, the specific names of the functional units and modules are only for easy differentiation and are not intended to limit the scope of protection of this invention. The specific working process of the units and modules in the above device can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here. If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present invention can implement all or part of the processes in the methods of the above embodiments by instructing related hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of the above method embodiments. The computer program includes computer program code, which can be in the form of source code, object code, executable files, or certain intermediate forms. 
A computer-readable medium can include at least: any entity or device capable of carrying computer program code, a recording medium, a computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media. Examples include USB flash drives, portable hard drives, magnetic disks, or optical disks. In some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electrical carrier signals or telecommunication signals.

[0233] The present invention can implement all or part of the processes in the methods of the above embodiments, or it can be accomplished by a computer program product. When the computer program product is run on a computer device, the computer device executes the steps in the above method embodiments.

[0234] In the above embodiments, the descriptions of each embodiment have different focuses. For parts that are not described in detail or recorded in a certain embodiment, please refer to the relevant descriptions of other embodiments.

[0235] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0236] In the embodiments provided by this invention, it should be understood that the disclosed apparatus / computer devices and methods can be implemented in other ways. For example, the apparatus / computer device embodiments described above are merely illustrative. For instance, the division of modules or units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0237] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0238] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A training method for an image classification model, characterized in that the training method comprises:
obtaining images to be classified in a training set, inputting the images to be classified into the image classification model, and determining a category probability estimation vector and a category probability prediction vector for each image to be classified;
determining the predicted image category of each image to be classified based on the category probability prediction vector; determining the number of images in each predicted image category based on the predicted image category of each image to be classified; and determining a first loss weight parameter for each image to be classified based on the number of images in each predicted image category and a preset image number threshold;
determining the model loss of the image classification model based on the predicted image category of each image to be classified, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector; and
training the image classification model based on the model loss to obtain a trained image classification model;
wherein the step of inputting the images to be classified into the image classification model and determining the category probability estimation vector and the category probability prediction vector for each image to be classified comprises:
inputting the images to be classified into the image classification model to determine the category probability estimation vector of each image to be classified in the first iteration;
determining the category probability prediction vector of each image to be classified in the first iteration based on a preset category probability vector and the category probability estimation vector of each image to be classified in the first iteration;
determining the category probability estimation vector of each image to be classified in the k-th iteration and the category probability prediction vector in the (k-1)-th iteration, where k = 2, 3, ...; and
determining the category probability prediction vector of each image to be classified in the k-th iteration based on the category probability estimation vector of each image to be classified in the k-th iteration and the category probability prediction vector in the (k-1)-th iteration.
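The iterative step of claim 1 can be illustrated with a short sketch. The claim states only that the prediction vector in the k-th iteration is determined from the k-th estimation vector and the (k-1)-th prediction vector; the exact update rule is not given in this section. The sketch below assumes a momentum-weighted average, and the names `update_prediction`, `momentum`, and the uniform preset vector are hypothetical choices made for illustration.

```python
import numpy as np

def update_prediction(prev_prediction, estimation, momentum=0.9):
    """Blend the previous prediction vector with the current estimation vector.

    Assumption: a momentum-weighted average; the claim does not fix the rule.
    """
    return momentum * prev_prediction + (1.0 - momentum) * estimation

num_categories = 3
# Preset category probability vector for the first iteration (assumed uniform).
preset = np.full(num_categories, 1.0 / num_categories)

# Hypothetical estimation vectors output by the model for one image
# over three iterations.
estimations = [
    np.array([0.6, 0.3, 0.1]),
    np.array([0.7, 0.2, 0.1]),
    np.array([0.8, 0.1, 0.1]),
]

prediction = preset
history = []
for estimation in estimations:
    # Iteration 1 uses the preset vector; iteration k uses the (k-1)-th prediction.
    prediction = update_prediction(prediction, estimation)
    history.append(prediction)
```

Because each update is a convex combination of two probability vectors, every entry of `history` remains a valid probability vector under this assumed rule.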

2. The training method for the image classification model according to claim 1, characterized in that the training method further comprises:
obtaining the original image category of each image to be classified, and determining the number of images in each original image category based on the images to be classified and the original image categories; and
determining a second loss weight parameter for each image to be classified based on the number of images in each original image category and the preset image number threshold;
correspondingly, the step of determining the model loss of the image classification model based on the predicted image category of each image to be classified, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector comprises:
determining the model loss of the image classification model based on the predicted image category of each image to be classified, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector.

3. The training method for the image classification model according to claim 2, characterized in that the step of determining the model loss of the image classification model based on the predicted image category of each image to be classified, the first loss weight parameter, the second loss weight parameter, the category probability estimation vector, and the category probability prediction vector comprises:
determining a cross-entropy loss for each image to be classified based on the category probability estimation vector and the predicted image category;
determining a first sub-loss for each image to be classified based on the second loss weight parameter and the cross-entropy loss;
calculating the mean of the first sub-losses of all the images to be classified, and taking that mean as the first model sub-loss of the image classification model;
determining a regularization loss for each image to be classified based on the category probability estimation vector and the category probability prediction vector;
determining a second sub-loss for each image to be classified based on the first loss weight parameter and the regularization loss;
calculating the mean of the second sub-losses of all the images to be classified, and taking that mean as the second model sub-loss of the image classification model; and
determining the model loss of the image classification model based on the first model sub-loss and the second model sub-loss of the image classification model.
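The loss computation of claim 3 can be sketched as follows. Several details are assumptions, since this section does not state them: each weight is applied by multiplication, the regularization loss takes the form log(1 - <p, t>) for estimation vector p and prediction vector t, and the two model sub-losses are combined as a sum scaled by a hypothetical coefficient `reg_coeff`. The function name `model_loss` and the sample values are illustrative only.

```python
import numpy as np

def model_loss(estimation, prediction, first_weight, second_weight,
               reg_coeff=1.0, eps=1e-8):
    """Weighted cross-entropy plus weighted regularization, averaged over images.

    estimation, prediction: arrays of shape (num_images, num_categories).
    first_weight, second_weight: arrays of shape (num_images,).
    """
    # Predicted image category: position of the maximum of the prediction vector.
    pred_category = prediction.argmax(axis=1)
    rows = np.arange(len(estimation))

    # Cross-entropy of the estimation vector against the predicted category.
    cross_entropy = -np.log(estimation[rows, pred_category] + eps)
    # First sub-loss per image, then its mean (first model sub-loss).
    first_model_sub_loss = np.mean(second_weight * cross_entropy)

    # Assumed regularization form: log(1 - <estimation, prediction>).
    inner = np.sum(estimation * prediction, axis=1)
    regularization = np.log(1.0 - inner + eps)
    # Second sub-loss per image, then its mean (second model sub-loss).
    second_model_sub_loss = np.mean(first_weight * regularization)

    # Assumed combination: a sum with a scaling coefficient.
    return first_model_sub_loss + reg_coeff * second_model_sub_loss

estimation = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
prediction = np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]])
ones = np.ones(2)
loss = model_loss(estimation, prediction, first_weight=ones, second_weight=ones)
```

Under these assumptions, setting `second_weight` to all ones leaves the cross-entropy term as a plain mean over the images.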

4. The training method for the image classification model according to claim 1, characterized in that the step of determining the predicted image category of each image to be classified based on the category probability prediction vector comprises:
determining the vector element value at each position in the category probability prediction vector of each image to be classified; and
determining the maximum of the vector element values, and taking the image category corresponding to the maximum as the predicted image category of that image to be classified.
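The selection in claim 4 amounts to taking the position of the largest element of each prediction vector. A minimal sketch, assuming categories are indexed by position and using made-up sample vectors:

```python
import numpy as np

# Hypothetical category probability prediction vectors, one row per image.
prediction_vectors = np.array([
    [0.2, 0.5, 0.3],
    [0.7, 0.1, 0.2],
])

# Index of the maximum element in each row is the predicted image category.
predicted_categories = prediction_vectors.argmax(axis=1)
print(predicted_categories)  # [1 0]
```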

5. The training method for the image classification model according to claim 1, characterized in that the step of determining the first loss weight parameter for each image to be classified based on the number of images in each predicted image category and the preset image number threshold comprises:
calculating the ratio of the preset image number threshold to the number of images in each predicted image category to obtain the first loss weight parameter of each predicted image category; and
determining the first loss weight parameter of each image to be classified based on the predicted image category of that image and the first loss weight parameter of each predicted image category.
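The ratio in claim 5 can be sketched as below. The function name `first_loss_weights` and the sample values are hypothetical, and the guard against categories with no predicted images is an added assumption that the claim does not address.

```python
import numpy as np

def first_loss_weights(predicted_categories, num_categories, count_threshold):
    """Return one weight per image: threshold / (image count of its predicted category)."""
    # Number of images assigned to each predicted category.
    counts = np.bincount(predicted_categories, minlength=num_categories)
    # Assumption: clamp empty categories to 1 to avoid division by zero.
    safe_counts = np.maximum(counts, 1)
    per_category_weight = count_threshold / safe_counts
    # Look up each image's weight through its predicted category.
    return per_category_weight[predicted_categories]

predicted = np.array([0, 0, 0, 0, 1, 2])
weights = first_loss_weights(predicted, num_categories=3, count_threshold=2)
print(weights)  # [0.5 0.5 0.5 0.5 2.  2. ]
```

In this example the over-represented category 0 receives a weight below 1, while the rare categories 1 and 2 receive weights above 1.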

6. The training method for the image classification model according to claim 1, characterized in that the step of determining the model loss of the image classification model based on the predicted image category of each image to be classified, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector comprises:
determining a cross-entropy loss for each image to be classified based on the category probability estimation vector and the predicted image category;
calculating the mean of the cross-entropy losses of all the images to be classified, and taking that mean as the third model sub-loss of the image classification model;
determining a regularization loss for each image to be classified based on the category probability estimation vector and the category probability prediction vector;
determining a second sub-loss for each image to be classified based on the first loss weight parameter and the regularization loss;
calculating the mean of the second sub-losses of all the images to be classified, and taking that mean as the second model sub-loss of the image classification model; and
determining the model loss of the image classification model based on the third model sub-loss and the second model sub-loss of the image classification model.

7. A training device for an image classification model, characterized in that the training device comprises:
a probability prediction module, used to obtain images to be classified in a training set, input the images to be classified into the image classification model, and determine a category probability estimation vector and a category probability prediction vector for each image to be classified;
a parameter determination module, used to determine the predicted image category of each image to be classified based on the category probability prediction vector, determine the number of images in each predicted image category based on the predicted image category of each image to be classified, and determine a first loss weight parameter for each image to be classified based on the number of images in each predicted image category and a preset image number threshold;
a loss calculation module, used to determine the model loss of the image classification model based on the predicted image category of each image to be classified, the first loss weight parameter, the category probability estimation vector, and the category probability prediction vector; and
a model training module, used to train the image classification model based on the model loss to obtain a trained image classification model;
wherein the probability prediction module comprises:
a first category probability prediction submodule, used to input the images to be classified into the image classification model and determine the category probability estimation vector of each image to be classified in the first iteration;
a second category probability prediction submodule, used to determine the category probability prediction vector of each image to be classified in the first iteration based on a preset category probability vector and the category probability estimation vector of each image to be classified in the first iteration;
a third category probability prediction submodule, used to determine the category probability estimation vector of each image to be classified in the k-th iteration and the category probability prediction vector in the (k-1)-th iteration, where k = 2, 3, ...; and
a fourth category probability prediction submodule, used to determine the category probability prediction vector of each image to be classified in the k-th iteration based on the category probability estimation vector of each image to be classified in the k-th iteration and the category probability prediction vector in the (k-1)-th iteration.

8. A computer device, characterized in that the computer device comprises a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the training method for the image classification model according to any one of claims 1 to 6.

9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the training method for the image classification model according to any one of claims 1 to 6.

Citation Information

Patent Citations

Image segmentation model training method and device
CN111260665A
Classification model training method and device, image classification method and device, equipment and medium
CN113902010A
