Image classification method, device, apparatus and storage medium
By training a multi-class classification model, utilizing a bidirectional long short-term memory network to learn the correlation of image category features, and training branch models for correction when necessary, the problem of cumbersome model training and poor classification results caused by the increase in the number of image categories is solved, achieving efficient and accurate multi-class image classification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SO-YOUNG INT INC
- Filing Date
- 2021-09-30
- Publication Date
- 2026-06-23
AI Technical Summary
In existing technologies, the number of classification models grows linearly with the number of image categories. Model training therefore becomes cumbersome, redundant, and inefficient. Because the models are separate, they also cannot learn the correlations between different image categories, which leads to poor classification results.
By training a multi-class classification model, the correlation between features of different classes is learned using a bidirectional long short-term memory network. When the classification accuracy is low, a branch model is trained to correct the multi-class classification model. A combination of cross-entropy loss function and smoothing loss function is used for training.
This method enables the simultaneous classification of multiple image categories using a single neural network model, improving classification efficiency, enhancing model robustness, and addressing the issue of low classification accuracy for certain categories, resulting in more stable training performance.
Figure CN115937562B_ABST
Abstract
Description
Technical Field
[0001] This application belongs to the field of image processing technology, specifically relating to an image classification method, apparatus, device, and storage medium.
Background Technology
[0002] Currently, a massive number of images are generated online every day. Based on this vast number of images, services such as searching for images by text and searching for images by image can be provided to users. Before providing these services, the images must first be categorized.
[0003] In related technologies, a separate classification model is trained for each image category, and each category's images are identified from a massive dataset using the corresponding model. However, as the number of image categories increases, the number of classification models also increases linearly, requiring extensive model training. Furthermore, classifying images using multiple models is cumbersome, redundant, and inefficient. Moreover, using different models for classification prevents the models from learning the correlations between different image categories, resulting in poor classification performance.
Summary of the Invention
[0004] This application proposes an image classification method, apparatus, device, and storage medium. It trains a multi-class classification model based on images labeled with category tags corresponding to each image category, enabling the multi-class classification model to classify multiple image categories simultaneously. Only one model needs to be trained to classify multiple image categories, which is highly efficient and easy to operate.
[0005] The first aspect of this application proposes an image classification method, including:
[0006] Obtain a first training set, which includes multiple images, each of which is labeled with a category label corresponding to each image category;
[0007] Construct the structure of the first neural network model for multi-label classification;
[0008] The first neural network model is trained based on the first training set to obtain a multi-class classification model.
[0009] In some embodiments of this application, constructing the structure of the first neural network model for multi-label classification includes:
[0010] Connect the output of a preset high-efficiency network to the input of a bidirectional long short-term memory network to obtain the first neural network model for multi-label classification.
[0011] In some embodiments of this application, training the constructed first neural network model based on the first training set to obtain a multi-class classification model includes:
[0012] Multiple images are obtained from the first training set;
[0013] The acquired images are input into the first neural network model to obtain the classification results for each image.
[0014] Based on the classification result of each image, the loss value for the current training cycle is calculated using the cross-entropy loss function and the smoothing loss function.
[0015] In some embodiments of this application, after training the constructed first neural network model based on the first training set to obtain a multi-class classification model, the method further includes:
[0016] Obtain the image to be classified;
[0017] The image to be classified is classified using the multi-class classification model to obtain the probability that the image belongs to each image category.
[0018] In some embodiments of this application, after training the constructed first neural network model based on the first training set to obtain a multi-class classification model, the method further includes:
[0019] The accuracy of the multi-class classification model in classifying each image category is determined respectively;
[0020] If there is a first image category with an accuracy lower than a preset threshold, then train the branch model corresponding to the first image category;
[0021] The multi-class classification model is modified using the branch model to obtain the modified multi-class classification model.
[0022] In some embodiments of this application, training the branch model corresponding to the first image category includes:
[0023] Obtain a second training set corresponding to the first image category. The second training set includes multiple images, each of which is labeled with a category label corresponding to the first image category.
[0024] A structure is constructed for a second neural network model to classify the first image category separately, the second neural network model including a branch model corresponding to the first image category;
[0025] The second neural network model is trained based on the second training set to obtain the trained branch model.
[0026] In some embodiments of this application, constructing the structure of the second neural network model for separately classifying the first image category includes:
[0027] By sequentially connecting a predetermined number of fully connected layers, a branch model corresponding to the first image category is obtained;
[0028] Connect the output of the preset high-efficiency network in the multi-class classification model to the input of the branch model to obtain a second neural network model for separately classifying the first image category.
[0029] In some embodiments of this application, the step of modifying the multi-class classification model using the branch model to obtain a modified multi-class classification model includes:
[0030] Connect the input of the trained branch model to the output of the preset high-efficiency network in the multi-class classification model;
[0031] The output of the branch model is connected to the output of the bidirectional long short-term memory network in the multi-class classification model through the fusion module to obtain the modified multi-class classification model.
[0032] The fusion module is used to fuse the first classification result corresponding to the first image category output by the branch model and the second classification result corresponding to the first image category output by the bidirectional long short-term memory network.
[0033] In some embodiments of this application, the method further includes:
[0034] Obtain the image to be classified;
[0035] The image to be classified is classified using a modified multi-class classification model to obtain the probability that the image belongs to each image category.
[0036] In some embodiments of this application, the step of classifying the image to be classified using a modified multi-class classification model to obtain the probability that the image to be classified belongs to each image category includes:
[0037] The feature vector of the image to be classified is extracted by the preset high-efficiency network in the modified multi-class classification model.
[0038] The feature vector is classified by the branch model in the modified multi-class classification model to obtain the first probability that the image to be classified belongs to the first image category.
[0039] The feature vector is classified by the bidirectional long short-term memory network in the modified multi-class classification model to obtain the probability that the image to be classified belongs to each image category, including the second probability that the image to be classified belongs to the first image category.
[0040] The first probability and the second probability are fused to obtain the final probability that the image to be classified belongs to the first image category.
[0041] An embodiment of the second aspect of this application provides an image classification apparatus, comprising:
[0042] The acquisition module is used to acquire a first training set, which includes multiple images, each of which is labeled with a category label corresponding to each image category.
[0043] The construction module is used to construct the structure of the first neural network model for multi-label classification.
[0044] The training module is used to train the constructed first neural network model based on the first training set to obtain a multi-class classification model.
[0045] An embodiment of the third aspect of this application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the method described in the first aspect above.
[0046] An embodiment of the fourth aspect of this application provides a computer-readable storage medium having a computer program stored thereon, the program being executed by a processor to implement the method described in the first aspect above.
[0047] The technical solutions provided in this application embodiment have at least the following technical effects or advantages:
[0048] In this embodiment, a multi-class classification model is trained to simultaneously classify multiple image categories using a single neural network model. This multi-class classification model includes a bidirectional long short-term memory (LSTM) network, which learns the correlations between features of different categories, resulting in more robust features. During the classification process for each image category, if the classification accuracy for a particular image category is low, a branch model corresponding to that image category is trained. This branch model is then used to correct the multi-class classification model, addressing the issue of low classification accuracy for that image category. The model training process employs a combination of cross-entropy loss and smoothing loss functions, resulting in smoother training and better training performance.
[0049] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Description of the Drawings
[0050] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of this application. Furthermore, the same reference numerals denote the same parts throughout the drawings.
[0051] In the drawings:
[0052] Figure 1 shows a flowchart of an image classification method provided in an embodiment of this application;
[0053] Figure 2 shows a schematic diagram of the structure of a first neural network model provided in an embodiment of this application;
[0054] Figure 3 shows a schematic diagram of the structure of a second neural network model provided in an embodiment of this application;
[0055] Figure 4 shows a schematic diagram of the structure of a modified multi-class classification model provided in an embodiment of this application;
[0056] Figure 5 shows a schematic diagram illustrating the image classification effect of a multi-class classification model provided in an embodiment of this application;
[0057] Figure 6 shows a schematic diagram of the structure of an image classification apparatus provided in an embodiment of this application;
[0058] Figure 7 shows a schematic diagram of the structure of an electronic device provided in an embodiment of this application;
[0059] Figure 8 shows a schematic diagram of a storage medium provided in an embodiment of this application.
Detailed Implementation
[0060] Exemplary embodiments of this application will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of this application are shown in the drawings, it should be understood that this application may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided to enable a more thorough understanding of this application and to fully convey the scope of this application to those skilled in the art.
[0061] It should be noted that, unless otherwise stated, the technical or scientific terms used in this application shall have the ordinary meaning as understood by one of ordinary skill in the art to which this application pertains.
[0062] The following description, in conjunction with the accompanying drawings, describes an image classification method, apparatus, device, and storage medium according to embodiments of this application.
[0063] Currently, for image classification problems involving multiple image categories, related technologies train separate classification models for each category, using these models to identify images of each category from a massive dataset. However, as the number of image categories increases, the number of classification models also increases linearly, requiring extensive model training. Furthermore, classifying images using multiple models is cumbersome, redundant, and inefficient. Moreover, using different models for separate classification prevents the models from learning the correlations between different image categories, resulting in poor classification performance.
[0064] Based on this, embodiments of this application provide an image classification method. This method trains a multi-class classification model to achieve simultaneous classification of multiple image categories using a single neural network model. The multi-class classification model includes a bidirectional long short-term memory (LSTM) network, which learns the correlations between features of different categories, resulting in better robustness of the learned features. During the classification process for each image category, if the classification accuracy for a certain image category is low, a branch model corresponding to that image category is trained. This branch model is then used to correct the multi-class classification model, addressing the problem of low classification accuracy for that image category.
[0065] Referring to Figure 1, the method specifically includes the following steps:
[0066] Step 101: Obtain the first training set, which includes multiple images, each labeled with a category label corresponding to its respective category.
[0067] First, determine all image categories that need to be classified. Image categories can be determined based on business needs. For example, in fields such as medical and cosmetic surgery, image categories may include surgical sites, puzzles, surgical procedures, instruments and medicines, indoor environments, outdoor environments, people, and others. The "others" category may include other types of images uploaded by users, such as food photos, landscape photos, cartoon images, emoticons, advertising images, game screenshots, icons, WeChat chat screenshots, and any other images unrelated to medical or surgical content. This application does not limit the specific number of image categories or the way they are divided; in practical applications, both can be determined based on business needs.
[0068] A large number of images are acquired, and each acquired image is annotated with a category label for every image category. The category label can be a binary label. For example, an image showing a surgical site would have a category label of 1 for the surgical site category, while its category labels for the other image categories (puzzle, surgical procedure, instruments and medicines, indoor environment, outdoor environment, people, and others) would all be 0.
[0069] The first training set consists of multiple images labeled with category tags corresponding to each image category.
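The binary category labels described above can be pictured as one 0/1 value per image category. The following is a minimal sketch, assuming the eight example categories of this embodiment; the ordering of `CATEGORIES` and the helper name `encode_labels` are illustrative and do not come from the application.

```python
# Illustrative only: the category order and the helper name are assumptions.
CATEGORIES = [
    "surgical site", "puzzle", "surgical procedure",
    "instruments and medicines", "indoor environment",
    "outdoor environment", "people", "others",
]

def encode_labels(present):
    """Return one binary label per category: 1 if the image shows it, else 0."""
    unknown = set(present) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if name in present else 0 for name in CATEGORIES]

# An image showing a surgical site gets 1 for that category and 0 elsewhere.
labels = encode_labels({"surgical site"})
print(labels)
```

Because every image carries a label for every category, the same record can later be reduced to a single-category label when a branch model is trained.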
[0070] Step 102: Construct the structure of the first neural network model for multi-label classification.
[0071] By connecting the output of the preset high-efficiency network to the input of a Bi-directional Long Short-Term Memory (BiLSTM) network, a first neural network model for multi-label classification is obtained.
[0072] The preset high-efficiency network can be any one of the eight high-efficiency networks from B0 to B7, namely EfficientNet-B0 to EfficientNet-B7, or it can be any other neural network capable of image classification.
[0073] As shown in Figure 2, the structure of the first neural network model is illustrated using the EfficientNet-B0 network as an example. The arrows in Figure 2 indicate the direction of data transmission. The EfficientNet-B0 network consists of a first convolutional layer (conv), seven MBconv layers, and a second convolutional layer (conv) connected sequentially. The first convolutional layer (conv) has a 3x3 kernel size, the second convolutional layer (conv) has a 1x1 kernel size, and the seven MBconv layers have kernel sizes of 3x3, 3x3, 5x5, 3x3, 5x5, 5x5, and 3x3, respectively. The output of the second convolutional layer (conv) is connected to the input of the bidirectional long short-term memory network. The second convolutional layer (conv) can also be replaced with a fully connected layer.
[0074] This application does not limit the order in which steps 101 and 102 are executed. Step 101 can be executed first, followed by step 102. Alternatively, step 102 can be executed first, followed by step 101. Or, steps 101 and 102 can be executed simultaneously.
[0075] Step 103: Train the constructed first neural network model based on the first training set to obtain a multi-class classification model.
[0076] Multiple images are acquired from the first training set. The number of images acquired can be the number of images that the first neural network model can process in parallel; this number can be called the batch size. Each acquired image is input into the constructed first neural network model. In this embodiment, each image input into the first neural network model is scaled to a preset size, such as 224*224 or 226*226. The image of the preset size is input into the first neural network model, where a high-efficiency network extracts the feature vector of the image. The extracted feature vector is then input into a bidirectional long short-term memory network. The bidirectional long short-term memory network classifies the image based on the feature vector and outputs the classification result, i.e., the probability that the image belongs to each image category.
[0077] As shown in Figure 2, a 224*224 image is input into EfficientNet-B0. The first convolutional layer (conv) processes the image and outputs a 112*112 feature map. This 112*112 feature map is input into the first MBconv layer, which outputs a 112*112 feature map to the second MBconv layer. The second MBconv layer outputs a 56*56 feature map to the third MBconv layer. The third MBconv layer outputs a 28*28 feature map to the fourth MBconv layer. The fourth MBconv layer outputs a 14*14 feature map to the fifth MBconv layer. The fifth MBconv layer outputs a 14*14 feature map to the sixth MBconv layer. The sixth MBconv layer outputs a 7*7 feature map to the seventh MBconv layer, which in turn outputs a 7*7 feature map to the second convolutional layer (conv). The second convolutional layer (conv) outputs a 1*1280 feature vector to the bidirectional long short-term memory network. Assuming there are eight image categories (surgical site, puzzle, surgical procedure, instruments and medicines, indoor environment, outdoor environment, people, and others), the bidirectional long short-term memory network finally outputs a 1*8 classification result, which gives the probability that the image belongs to each of these eight categories.
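The feature-map sizes listed above follow from the stride of each stage. The sketch below traces the spatial size through the first convolutional layer and the seven MBconv layers. The stride values are taken from the publicly documented EfficientNet-B0 configuration and are an assumption here, since the application states only the resulting sizes; `STAGES` and `trace_sizes` are illustrative names.

```python
# Strides per stage (assumed from the public EfficientNet-B0 configuration).
STAGES = [
    ("conv 3x3", 2),
    ("MBconv 1 (3x3)", 1),
    ("MBconv 2 (3x3)", 2),
    ("MBconv 3 (5x5)", 2),
    ("MBconv 4 (3x3)", 2),
    ("MBconv 5 (5x5)", 1),
    ("MBconv 6 (5x5)", 2),
    ("MBconv 7 (3x3)", 1),
]

def trace_sizes(input_size):
    """Return (stage name, output side length) for each stage.

    With "same"-style padding, a stride-s layer maps n to ceil(n / s).
    """
    sizes = []
    size = input_size
    for name, stride in STAGES:
        size = -(-size // stride)  # ceiling division
        sizes.append((name, size))
    return sizes

for name, size in trace_sizes(224):
    print(f"{name}: {size}x{size}")
```

For a 224*224 input this reproduces the sequence 112, 112, 56, 28, 14, 14, 7, 7 given in the paragraph above.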
[0078] In this embodiment, a bidirectional long short-term memory network is introduced into the first neural network model, enabling the first neural network model to learn the forward and backward relationships between features of different categories. This allows the model to learn the correlation between features of different categories, enhances the robustness of the model, and improves the accuracy of the model in multi-class classification.
[0079] After obtaining the classification results of each image in the current training period using the above method, the loss value of the current training period is calculated based on the classification results of each image using the cross-entropy loss function and the smoothing loss function.
[0080] The cross-entropy loss function can be the sigmoid cross-entropy loss, and the smoothing loss function can be the smooth L1 loss. The cross-entropy loss function is shown in Equation (1), and the smoothing loss function is shown in Equation (2). After calculating the cross-entropy loss value using Equation (1) and the smoothing loss value using Equation (2), the loss value for the current training period is calculated using Equation (3).
[0081]
[0082]
[0083] Loss_all = 0.5 * Loss + 0.5 * smooth L1 … (3)
[0084] In formulas (1) to (3), Loss_all is the loss value for the current training period, Loss is the cross-entropy loss value, and smooth L1 is the smoothing loss value. M is the number of image categories, and N is the number of distinct values a category label can take; for binary labels, N is 2. The label term is the category label corresponding to the i-th image category, such as the label for surgical site or the label for puzzle. x is the probability of the i-th image category predicted by the first neural network model. The difference term is the absolute value of the difference between the predicted probability for the i-th image category and the value of the category label for the i-th image category in the image.
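Equations (1) and (2) are not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes the commonly used forms: binary cross-entropy averaged over the M category probabilities for Equation (1), and smooth L1 with a threshold of 1 on the absolute difference for Equation (2). Only the equal 0.5 weighting of Equation (3) is taken from the text; the function names are illustrative.

```python
import math

def cross_entropy(probs, labels, eps=1e-7):
    """Binary cross-entropy averaged over categories (assumed form of Eq. 1)."""
    total = 0.0
    for x, y in zip(probs, labels):
        x = min(max(x, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(x) + (1 - y) * math.log(1.0 - x))
    return total / len(probs)

def smooth_l1(probs, labels):
    """Smooth L1 averaged over categories (assumed form of Eq. 2)."""
    total = 0.0
    for x, y in zip(probs, labels):
        d = abs(x - y)  # absolute difference between prediction and label
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(probs)

def total_loss(probs, labels):
    """Equal-weight combination stated in Eq. (3)."""
    return 0.5 * cross_entropy(probs, labels) + 0.5 * smooth_l1(probs, labels)

probs = [0.9, 0.1, 0.2, 0.05, 0.1, 0.1, 0.3, 0.05]
labels = [1, 0, 0, 0, 0, 0, 0, 0]
print(round(total_loss(probs, labels), 4))
```

Since predictions and labels both lie in [0, 1], the absolute difference never exceeds 1, so under this assumed form the smooth L1 term is effectively a bounded quadratic penalty added to the cross-entropy.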
[0085] After calculating the loss value for the current training cycle using the above method, determine whether the number of training cycles has reached the preset number. If so, stop training, determine the training cycle with the smallest loss value from the trained cycles, and combine the model parameters trained in the training cycle with the smallest loss value with the structure of the first neural network model to obtain a trained multi-class classification model. If not, continue training using the above method until the preset number of training cycles is reached to obtain a trained multi-class classification model.
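The stopping rule above keeps the parameters from whichever training cycle produced the smallest loss value. A minimal sketch of that bookkeeping follows; `run_cycle` is a hypothetical callback that stands in for one training cycle and returns that cycle's parameters and loss value.

```python
def train(run_cycle, num_cycles):
    """Run a preset number of cycles and keep the lowest-loss parameters.

    run_cycle(cycle_index) is a hypothetical callback that performs one
    training cycle and returns (params, loss).
    """
    best_params, best_loss, best_cycle = None, float("inf"), -1
    for cycle in range(num_cycles):
        params, loss = run_cycle(cycle)
        if loss < best_loss:
            best_params, best_loss, best_cycle = params, loss, cycle
    return best_params, best_loss, best_cycle

# Toy stand-in: loss values recorded for five cycles.
fake_losses = [0.9, 0.5, 0.3, 0.4, 0.35]
params, loss, cycle = train(lambda i: ({"cycle": i}, fake_losses[i]), 5)
print(cycle, loss)
```

Training always runs for the full preset number of cycles; the loss is used only to pick which cycle's parameters are kept.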
[0086] After obtaining the multi-class classification model using the above method, the model can be used to simultaneously classify multiple image categories. Specifically, the image to be classified is acquired, scaled to a preset size, and then input into the trained multi-class classification model. The multi-class classification model classifies the image and outputs the probability that the image belongs to each image category.
[0087] A multi-class classification model is obtained by training on multi-label images. This model can classify multiple image categories simultaneously using only one model, improving the efficiency of multi-class classification. Furthermore, the bidirectional long short-term memory network included in the trained multi-class classification model allows it to learn the correlations between features of different image categories, resulting in more accurate classification and better robustness.
[0088] A multi-class classification model, which classifies multiple image categories simultaneously, inevitably leads to mutual influence between features of different image categories. Therefore, the classification accuracy for a few image categories may be low. To address this issue, after training the multi-class classification model as described above, the accuracy of the model for each image category can be determined separately.
[0089] Specifically, a test set can be obtained, which includes multiple images belonging to different image categories. The images in the test set are input into a trained multi-class classification model, and the probability of each image belonging to each image category is obtained. The image category with the highest probability is the predicted image category. For each image category, based on the predicted image category and the actual image category of each image, the number of images whose predicted and actual categories match is counted. The ratio between the number of matching images and the total number of images in the test set that actually belong to that image category is calculated, and this ratio is determined as the accuracy corresponding to that image category.
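The per-category accuracy described above can be sketched as follows. The data layout (one probability list per image and one actual category index per image) and the function name are assumptions made for illustration.

```python
def per_category_accuracy(probabilities, true_categories, num_categories):
    """Accuracy per category: matches / images that actually belong to it.

    probabilities: one list of per-category probabilities per test image.
    true_categories: the actual category index of each test image.
    """
    correct = [0] * num_categories
    totals = [0] * num_categories
    for probs, actual in zip(probabilities, true_categories):
        # The category with the highest probability is the predicted one.
        predicted = max(range(num_categories), key=lambda i: probs[i])
        totals[actual] += 1
        if predicted == actual:
            correct[actual] += 1
    # Categories with no test images get None rather than a made-up value.
    return [c / t if t else None for c, t in zip(correct, totals)]

probabilities = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],  # actually category 1, predicted as category 0
    [0.1, 0.2, 0.7],
]
true_categories = [0, 1, 1, 2]
print(per_category_accuracy(probabilities, true_categories, 3))
```

Note that the denominator is the number of test images that actually belong to the category, as the paragraph specifies, not the total size of the test set.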
[0090] The accuracy of each image category is compared with a preset threshold. If no image category has an accuracy lower than the preset threshold, it indicates that the trained multi-class classification model has high accuracy for each image category and no adjustment is needed. If a first image category has an accuracy lower than the preset threshold, a branch model corresponding to the first image category is trained. Then, this branch model is used to correct the multi-class classification model, resulting in a corrected multi-class classification model.
[0091] The aforementioned preset threshold can be 70% or 50%, etc. This application embodiment does not limit the specific value of the preset threshold; it can be determined according to requirements in practical applications.
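The threshold comparison can be sketched as a simple filter over the per-category accuracies. The default of 0.7 mirrors the 70% example above; the function name and the sample accuracies are illustrative.

```python
def categories_needing_branch(accuracies, threshold=0.7):
    """Return indices of categories whose accuracy is below the threshold."""
    return [i for i, acc in enumerate(accuracies)
            if acc is not None and acc < threshold]

accuracies = [0.92, 0.88, 0.61, 0.95]
print(categories_needing_branch(accuracies))
```

An empty result means no category falls below the threshold, in which case the multi-class classification model is used without adjustment.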
[0092] Specifically, the branch model corresponding to the first image category is trained through the following steps S1-S3:
[0093] S1: Obtain the second training set corresponding to the first image category. The second training set includes multiple images, each of which is labeled with the category label corresponding to the first image category.
[0094] The category labels for all image categories other than the first image category in each image in the first training set can be deleted, leaving only the category label for the first image category, to obtain the second training set.
[0095] Alternatively, a large number of images can be acquired again, and the category label corresponding to the first image category can be labeled in each acquired image to obtain a second training set.
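The first of the two options, deriving the second training set from the first, can be sketched as follows. The record layout (an image identifier paired with its list of binary labels) and the function name are assumptions for illustration.

```python
def make_second_training_set(first_training_set, category_index):
    """Keep only the label of one category for every image.

    first_training_set: list of (image_id, labels) pairs, where labels is
    the list of binary category labels (this record layout is assumed).
    """
    return [(image_id, labels[category_index])
            for image_id, labels in first_training_set]

first_training_set = [
    ("img_001", [1, 0, 0, 0, 0, 0, 0, 0]),
    ("img_002", [0, 0, 1, 0, 0, 0, 0, 0]),
]
# Index 2 stands for "surgical procedure" in this illustrative ordering.
second_training_set = make_second_training_set(first_training_set, 2)
print(second_training_set)
```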
[0096] S2: Construct the structure of a second neural network model for separately classifying the first image category. The second neural network model includes a branch model corresponding to the first image category.
[0097] A predetermined number of fully connected layers are sequentially connected in series to obtain the branch model corresponding to the first image category. The predetermined number can be 3 or 4, etc. The embodiments of this application do not limit the specific value of the predetermined number, and it can be set according to the needs in actual applications.
[0098] Connect the output of the preset high-efficiency network in the multi-class classification model trained above to the input of the branch model described above to obtain a second neural network model for separately classifying the first image category.
[0099] S3: Train the second neural network model based on the second training set to obtain the trained branch model.
[0100] Multiple images are acquired from the second training set, the number of which can be the number of images that the second neural network model can process in parallel. Each acquired image is input into the constructed second neural network model. In this embodiment, each image input into the second neural network model is also scaled to a preset size. The image of the preset size is input into the second neural network model, where the preset high-efficiency network extracts the feature vector of the image. The extracted feature vector is then input into the branch model corresponding to the first image category. The branch model classifies the image based on the feature vector and outputs the classification result, i.e., the probability that the image belongs to the first image category.
[0101] As shown in Figure 3, assuming the branch model consists of three sequentially connected fully connected layers, the output of the preset high-efficiency network, EfficientNet-B0, is connected to the first fully connected layer of the branch model. A 224*224 image is input into EfficientNet-B0, which outputs a 1*1280 feature vector, which is then fed into the first fully connected layer. The first fully connected layer outputs a 1*320 feature vector, which is fed into the second fully connected layer. The second fully connected layer outputs a 1*80 feature vector, which is fed into the third fully connected layer. The third fully connected layer outputs a 1*1 classification result. Assuming the first image category is the surgical procedure category, this classification result represents the probability that the image belongs to the surgical procedure category.
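The three fully connected layers of the Figure 3 example map a 1*1280 feature vector to 1*320, then 1*80, then a single value. The sketch below runs that shape progression in plain Python with small random weights. The ReLU activations between layers and the final sigmoid are assumptions, since the text does not name the activations, and the constant feature vector merely stands in for the output of the high-efficiency network.

```python
import math
import random

LAYER_SIZES = [1280, 320, 80, 1]  # sizes taken from the Figure 3 example

def init_layers(sizes, seed=0):
    """Create small random weights and zero biases for each dense layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, features):
    """Apply the dense layers; ReLU between layers and a final sigmoid
    are assumptions, since the text does not name the activations."""
    x = features
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return 1.0 / (1.0 + math.exp(-x[0]))

layers = init_layers(LAYER_SIZES)
feature_vector = [0.5] * 1280  # stand-in for the 1*1280 backbone output
probability = forward(layers, feature_vector)
print(probability)
```

With untrained random weights the printed value is meaningless as a prediction; the example only shows how the dimensions reduce to a single probability for the first image category.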
[0102] After obtaining the classification results of each image in the current training cycle through the above method, the loss value of the current training cycle is calculated based on the classification results of each image using the cross-entropy loss function of formula (1), the smoothing loss function of formula (2), and the total loss function of formula (3).
[0103] After calculating the loss value of the current training period using the above method, determine whether the number of training periods has reached the preset number of times. If so, stop training, determine the training period with the smallest loss value from the trained periods, and combine the model parameters and structure of the branch model trained in the training period with the smallest loss value to obtain the trained branch model. If not, continue training using the above method until the preset number of training periods is reached to obtain the trained branch model.
[0104] After training the branch model corresponding to the first image category using the above method, this branch model is used to correct the multi-class classification model trained in steps 101-103. Specifically, the input of the trained branch model is connected to the output of a pre-set efficient network in the multi-class classification model. The output of the branch model is then connected to the output of the bidirectional long short-term memory network in the multi-class classification model via a fusion module to obtain the corrected multi-class classification model. The fusion module is used to fuse the first classification result corresponding to the first image category output by the branch model and the second classification result corresponding to the first image category output by the bidirectional long short-term memory network.
[0105] As shown in Figure 4, the input of the first fully connected layer of the branch model is connected to the output of the EfficientNet-B0 network, and the output of the last fully connected layer of the branch model is connected to the fusion module. The output of the bidirectional long short-term memory network is also connected to the fusion module. The data output by the fusion module is the final probability corresponding to the first image category. Figure 4 takes eight image categories (surgical site, puzzle, surgical procedure, instruments and drugs, indoor environment, outdoor environment, people, and others) as an example, with a branch model trained for the surgical procedure category. The probability output by the branch model for the surgical procedure category is fused with the probability of the surgical procedure category output by the bidirectional long short-term memory network to obtain the final probability corresponding to the surgical procedure category.
[0106] The fusion module can perform a weighted fusion operation to weight and fuse the first probability corresponding to the first image category output by the branch model with the second probability corresponding to the first image category output by the bidirectional long short-term memory network. Specifically, the weighted fusion can be performed using the following formula (4).
[0107] feature = feature1*P + feature2*(1-P) … (4)
[0108] Where feature is the final probability corresponding to the first image category after fusion, feature1 is the first probability corresponding to the first image category output by the branch model, feature2 is the second probability corresponding to the first image category output by the bidirectional long short-term memory network, and P is the fusion factor.
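Formula (4) can be written as a one-line function. This is a minimal sketch: the function name `fuse`, the range check on the fusion factor, and the sample probabilities are assumptions added for illustration.

```python
def fuse(feature1, feature2, p):
    """Weighted fusion per formula (4): feature = feature1*P + feature2*(1-P)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("fusion factor must lie in [0, 1]")
    return feature1 * p + feature2 * (1.0 - p)

# Hypothetical probabilities: the branch model says 0.9, the BiLSTM says 0.5.
final = fuse(0.9, 0.5, 0.7)
print(round(final, 2))
```

With a fusion factor of 1 the result is the branch model's probability alone, and with 0 it is the bidirectional long short-term memory network's probability alone.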
[0109] To determine the fusion factor P, in this embodiment, 11 values (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0) can each be substituted into formula (4) above, and the final probability predicted for the first image category is recorded for each value of P. On the test set, the predicted final probability for the first image category is compared with the ground truth, and the prediction accuracy for each value of P is calculated. The value of P with the highest accuracy is selected, for example 0.7 or 0.8.
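The search over the 11 candidate values can be sketched as below. The description does not say how a fused probability is compared with the ground truth, so the 0.5 decision threshold is an assumption, as is breaking ties in favor of the smallest P; the sample tuples are made-up values, not data from the application.

```python
def fuse(feature1, feature2, p):
    """Weighted fusion per formula (4)."""
    return feature1 * p + feature2 * (1.0 - p)

def accuracy_for_p(samples, p, decision_threshold=0.5):
    """Fraction of samples whose fused probability agrees with the label.

    samples: iterable of (feature1, feature2, label) with label 0 or 1.
    decision_threshold: assumed cut-off for calling a sample positive.
    """
    correct = 0
    for feature1, feature2, label in samples:
        predicted = fuse(feature1, feature2, p) >= decision_threshold
        correct += int(predicted == bool(label))
    return correct / len(samples)

def choose_fusion_factor(samples):
    """Try P = 0, 0.1, ..., 1.0 and return the (P, accuracy) pair with the highest accuracy."""
    best_p, best_acc = None, -1.0
    for p in [i / 10 for i in range(11)]:
        acc = accuracy_for_p(samples, p)
        if acc > best_acc:  # strict comparison keeps the smallest P on ties
            best_p, best_acc = p, acc
    return best_p, best_acc

# Hypothetical test-set outputs: (branch probability, BiLSTM probability, true label).
samples = [(0.9, 0.3, 1), (0.8, 0.4, 1), (0.25, 0.7, 0), (0.1, 0.6, 0)]
best_p, best_acc = choose_fusion_factor(samples)
print(best_p, best_acc)
```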
[0110] In this embodiment, the number of first image categories that need to be corrected by a branch model can be one or more. Where there are multiple first image categories, the branch model corresponding to each first image category can be trained separately as described above. Then, each of these branch models is connected in parallel to the multi-class classification model trained in steps 101-103. The input of each branch model is connected to the output of the preset efficient network, and the output of each branch model is connected to the output of the bidirectional long short-term memory network through a fusion module.
[0111] After obtaining the corrected multi-class classification model through the above method, the corrected model can be used to classify images into multiple image categories. Specifically, the image to be classified is obtained, and the corrected multi-class classification model is used to classify it, obtaining the probability of the image belonging to each image category.
[0112] The image to be classified is scaled to a preset size and then input into the preset efficient network in the corrected multi-class classification model. The preset efficient network extracts a feature vector from the image, which is then input into the bidirectional long short-term memory network and into the branch model corresponding to each first image category. Each branch model classifies the feature vector to obtain the first probability that the image belongs to the corresponding first image category. The bidirectional long short-term memory network also classifies the feature vector to obtain the probability that the image belongs to each image category, including the second probability for each first image category. For each first image category, the fusion module performs weighted fusion of the first and second probabilities to obtain the final probability that the image belongs to that first image category.
[0113] After obtaining the final probability of the image to be classified belonging to each image category, the image category with the highest probability is determined as the image category to which the image to be classified belongs, thus realizing the classification of multiple image categories through a single model and identifying the image category to which the image to be classified belongs.
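The inference flow with one or more branch models can be sketched as follows. The function name, the dictionary-based interface, and all probability values are hypothetical; the category names follow the eight-category example, and the fusion step uses formula (4).

```python
def classify(bilstm_probs, branch_probs, fusion_factors):
    """Fuse branch outputs into the BiLSTM outputs and pick the most probable category.

    bilstm_probs:   {category: second probability} for every image category.
    branch_probs:   {category: first probability} for each corrected (first) image category.
    fusion_factors: {category: P} for each corrected category.
    """
    final = dict(bilstm_probs)  # categories without a branch model keep the BiLSTM probability
    for category, feature1 in branch_probs.items():
        p = fusion_factors[category]
        final[category] = feature1 * p + bilstm_probs[category] * (1.0 - p)  # formula (4)
    best_category = max(final, key=final.get)
    return best_category, final

# Hypothetical outputs for one image.
bilstm_probs = {
    "surgical site": 0.55, "puzzle": 0.05, "surgical procedure": 0.40,
    "instruments and drugs": 0.10, "indoor environment": 0.20,
    "outdoor environment": 0.02, "people": 0.30, "others": 0.01,
}
branch_probs = {"surgical procedure": 0.90}
fusion_factors = {"surgical procedure": 0.7}

best_category, final = classify(bilstm_probs, branch_probs, fusion_factors)
print(best_category, round(final["surgical procedure"], 2))
```

In this made-up case the bidirectional long short-term memory network alone would have chosen the surgical site category, while the fused probability moves the decision to the surgical procedure category.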
[0114] Taking eight image categories (surgical site, mosaic, surgical procedure, instruments and medications, indoor environment, outdoor environment, people, and others) as an example, the final image classification effect achieved by the classification method provided in this embodiment is shown in Figure 5.
[0115] In this embodiment, a multi-class classification model is trained to simultaneously classify multiple image categories using a single neural network model. This multi-class classification model includes a bidirectional long short-term memory (LSTM) network, which learns the correlations between features of different categories, resulting in more robust features. During the classification process for each image category, if the classification accuracy for a particular image category is low, a branch model corresponding to that image category is trained. This branch model is then used to correct the multi-class classification model, addressing the issue of low classification accuracy for that image category. The model training process employs a combination of cross-entropy loss and smoothing loss functions, resulting in smoother training and better training performance.
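The check that decides which categories receive a branch model can be sketched as below. This section does not define how per-category accuracy is computed, so treating it as the fraction of images whose predicted label for that category matches the ground-truth label is an assumption, and the 0.9 threshold and the label data are hypothetical.

```python
def per_category_accuracy(true_labels, predicted_labels, categories):
    """Accuracy per category: share of images whose predicted label matches the true label."""
    correct = {c: 0 for c in categories}
    for truth, pred in zip(true_labels, predicted_labels):
        for c in categories:
            correct[c] += int(truth[c] == pred[c])
    n = len(true_labels)
    return {c: correct[c] / n for c in categories}

def categories_needing_branch(accuracy, threshold):
    """Return the categories whose accuracy is below the preset threshold."""
    return [c for c, acc in accuracy.items() if acc < threshold]

categories = ["people", "surgical procedure"]
# Hypothetical multi-label ground truth and predictions for four images.
true_labels = [
    {"people": 1, "surgical procedure": 1},
    {"people": 0, "surgical procedure": 1},
    {"people": 1, "surgical procedure": 0},
    {"people": 0, "surgical procedure": 0},
]
predicted_labels = [
    {"people": 1, "surgical procedure": 0},
    {"people": 0, "surgical procedure": 1},
    {"people": 1, "surgical procedure": 0},
    {"people": 0, "surgical procedure": 1},
]

accuracy = per_category_accuracy(true_labels, predicted_labels, categories)
low = categories_needing_branch(accuracy, threshold=0.9)
print(accuracy, low)
```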
[0116] This application also provides an image classification apparatus for performing the image classification method provided in any of the above embodiments. As shown in Figure 6, the device includes:
[0117] The acquisition module 201 is used to acquire the first training set, which includes multiple images, each of which is labeled with the category label corresponding to each image category;
[0118] The building module 202 is used to construct the structure of the first neural network model for multi-label classification;
[0119] The training module 203 is used to train the constructed first neural network model based on the first training set to obtain a multi-class classification model.
[0120] The building module 202 is used to connect the output of the preset efficient network to the input of the bidirectional long short-term memory network to obtain the first neural network model for multi-label classification.
[0121] The training module 203 is used to acquire multiple images from the first training set; input the acquired images into the constructed first neural network model to obtain the classification result of each image; and calculate the loss value of the current training cycle based on the classification result of each image using the cross-entropy loss function and the smoothing loss function.
[0122] The device also includes a classification module, used to acquire the image to be classified and to classify it using the multi-class classification model to obtain the probability that the image to be classified belongs to each image category.
[0123] The device also includes an accuracy determination module, used to determine the accuracy of the multi-class classification model in classifying each image category;
[0124] The training module 203 is also used to train the branch model corresponding to the first image category if there is a first image category with an accuracy lower than a preset threshold.
[0125] The correction module is used to correct the multi-class classification model using the branching model, resulting in a corrected multi-class classification model.
[0126] The training module 203 is also used to obtain a second training set corresponding to the first image category, where the second training set includes multiple images, each of which is labeled with a category label corresponding to the first image category; to construct the structure of a second neural network model for classifying the first image category separately, where the second neural network model includes a branch model corresponding to the first image category; and to train the second neural network model according to the second training set to obtain the trained branch model.
[0127] The training module 203 is also used to sequentially connect a preset number of fully connected layers to obtain a branch model corresponding to the first image category; and to connect the output of the preset efficient network in the multi-class classification model with the input of the branch model to obtain a second neural network model for separately classifying the first image category.
[0128] The correction module is used to connect the input of the trained branch model to the output of the pre-set efficient network in the multi-class classification model; the output of the branch model is connected to the output of the bidirectional long short-term memory network in the multi-class classification model through the fusion module to obtain the corrected multi-class classification model; wherein, the fusion module is used to fuse the first classification result corresponding to the first image category output by the branch model and the second classification result corresponding to the first image category output by the bidirectional long short-term memory network.
[0129] The classification module is also used to acquire the image to be classified and to classify it using the corrected multi-class classification model to obtain the probability of the image to be classified belonging to each image category.
[0130] The classification module is also used to extract the feature vector of the image to be classified through the preset efficient network in the corrected multi-class classification model; classify the feature vector through the branch model in the corrected multi-class classification model to obtain a first probability that the image to be classified belongs to the first image category; classify the feature vector through the bidirectional long short-term memory network in the corrected multi-class classification model to obtain the probability that the image to be classified belongs to each image category, including a second probability that the image to be classified belongs to the first image category; and fuse the first probability and the second probability to obtain the final probability that the image to be classified belongs to the first image category.
[0131] The image classification device and the image classification method provided in the above embodiments of this application are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the applications stored therein.
[0132] This application also provides an electronic device for performing the image classification method described above. Please refer to Figure 7, which shows a schematic diagram of an electronic device provided by some embodiments of this application. As shown in Figure 7, the electronic device 8 includes: a processor 800, a memory 801, a bus 802, and a communication interface 803. The processor 800, the communication interface 803, and the memory 801 are connected via the bus 802. The memory 801 stores a computer program that can run on the processor 800. When the processor 800 runs the computer program, it executes the image classification method provided in any of the foregoing embodiments of this application.
[0133] The memory 801 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage device. Communication between the network element of this device and at least one other network element is achieved through at least one communication interface 803 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc.
[0134] Bus 802 can be an ISA bus, PCI bus, or EISA bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. The memory 801 is used to store programs. After receiving an execution instruction, the processor 800 executes the program. The image classification method disclosed in any of the foregoing embodiments of this application can be applied to the processor 800, or implemented by the processor 800.
[0135] The processor 800 may be an integrated circuit chip with signal processing capabilities. In implementation, each step of the above method can be completed by the integrated logic circuitry in the hardware of the processor 800 or by instructions in software form. The processor 800 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may reside in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media that are mature in the art. The storage medium is located in the memory 801. The processor 800 reads the information in the memory 801 and, in conjunction with its hardware, completes the steps of the above method.
[0136] The electronic device provided in this application embodiment and the image classification method provided in this application embodiment are based on the same inventive concept and have the same beneficial effects as the methods they adopt, operate or implement.
[0137] This application also provides a computer-readable storage medium corresponding to the image classification method provided in the foregoing embodiments. Please refer to Figure 8, in which the computer-readable storage medium shown is an optical disc 30, on which a computer program (i.e., a program product) is stored. When the computer program is run by a processor, it executes the image classification method provided in any of the foregoing embodiments.
[0138] It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other optical and magnetic storage media, which will not be elaborated here.
[0139] The computer-readable storage medium provided in the above embodiments of this application and the image classification method provided in the embodiments of this application are based on the same inventive concept and have the same beneficial effects as the methods adopted, run or implemented by the applications stored therein.
[0140] It should be noted that:
[0141] Numerous specific details are set forth in the specification provided herein. However, it will be understood that embodiments of this application may be practiced without these specific details. In some instances, well-known structures and techniques have not been shown in detail so as not to obscure the understanding of this specification.
[0142] Similarly, it should be understood that, to streamline this disclosure and to aid in understanding one or more of the various inventive aspects, in the above description of exemplary embodiments of this application, various features of this application are sometimes grouped together in a single embodiment, figure, or description thereof. However, this manner of disclosure should not be construed as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as reflected in the following claims, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into that detailed description, wherein each claim itself is a separate embodiment of this application.
[0143] Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features but not others included in other embodiments, combinations of features from different embodiments are intended to be within the scope of this application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
[0144] The above description is merely a preferred embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. An image classification method, characterized in that it comprises: obtaining a first training set, which includes multiple images, each of which is labeled with a category label corresponding to each image category; connecting the output of a preset efficient network to the input of a bidirectional long short-term memory network to obtain a first neural network model for multi-label classification; training the first neural network model based on the first training set to obtain a multi-class classification model; determining the accuracy of the multi-class classification model in classifying each image category respectively; if there is a first image category with an accuracy lower than a preset threshold, constructing a second neural network model for classifying the first image category separately, where the second neural network model includes a branch model corresponding to the first image category, and training the branch model corresponding to the first image category; connecting the input of the trained branch model to the output of the preset efficient network in the multi-class classification model; and connecting the output of the branch model to the output of the bidirectional long short-term memory network in the multi-class classification model through a fusion module to obtain the corrected multi-class classification model, where the fusion module is used to fuse the first classification result corresponding to the first image category output by the branch model and the second classification result corresponding to the first image category output by the bidirectional long short-term memory network.
2. The method according to claim 1, characterized in that, The step of training the constructed first neural network model based on the first training set to obtain a multi-class classification model includes: Multiple images are obtained from the first training set; The acquired images are input into the first neural network model to obtain the classification results for each image. Based on the classification result of each image, the loss value for the current training cycle is calculated using the cross-entropy loss function and the smoothing loss function.
3. The method according to claim 1 or 2, characterized in that, After training the constructed first neural network model based on the first training set to obtain a multi-class classification model, the method further includes: Obtain the image to be classified; The image to be classified is classified using the multi-class classification model to obtain the probability that the image belongs to each image category.
4. The method according to claim 1, characterized in that, The training of the branch model corresponding to the first image category includes: Obtain a second training set corresponding to the first image category. The second training set includes multiple images, each of which is labeled with a category label corresponding to the first image category. The second neural network model is trained based on the second training set to obtain the trained branch model.
5. The method according to claim 4, characterized in that, The constructing of a second neural network model for separately classifying the first image category includes: By sequentially connecting a predetermined number of fully connected layers, a branch model corresponding to the first image category is obtained; Connect the output of the preset efficient network in the multi-class classification model to the input of the branch model to obtain a second neural network model for separately classifying the first image category.
6. The method according to claim 1, characterized in that, The method further includes: Obtain the image to be classified; The image to be classified is classified using the corrected multi-class classification model to obtain the probability that the image belongs to each image category.
7. The method according to claim 6, characterized in that, The process of classifying the image to be classified using the corrected multi-class classification model to obtain the probability of the image belonging to each image category includes: The feature vector of the image to be classified is extracted by the preset efficient network in the corrected multi-class classification model. The feature vector is classified by the branch model in the corrected multi-class classification model to obtain the first probability that the image to be classified belongs to the first image category. The feature vector is classified by the bidirectional long short-term memory network in the corrected multi-class classification model to obtain the probability that the image to be classified belongs to each image category, including the second probability that the image to be classified belongs to the first image category. The first probability and the second probability are fused to obtain the final probability that the image to be classified belongs to the first image category.
8. An image classification device, characterized in that it comprises: The acquisition module is used to acquire a first training set, which includes multiple images, each of which is labeled with a category label corresponding to each image category; A building module is used to connect the output of a preset efficient network to the input of a bidirectional long short-term memory network to obtain a first neural network model for multi-label classification. The training module is used to train the constructed first neural network model based on the first training set to obtain a multi-class classification model. The accuracy determination module is used to determine the accuracy of the multi-class classification model in classifying each image category. The training module is also used, if there is a first image category with an accuracy lower than a preset threshold, to construct a second neural network model for classifying the first image category separately, where the second neural network model includes a branch model corresponding to the first image category, and to train the branch model corresponding to the first image category. The correction module is used to connect the input of the trained branch model to the output of the preset efficient network in the multi-class classification model; the output of the branch model is connected to the output of the bidirectional long short-term memory network in the multi-class classification model through the fusion module to obtain the corrected multi-class classification model; wherein, the fusion module is used to fuse the first classification result corresponding to the first image category output by the branch model and the second classification result corresponding to the first image category output by the bidirectional long short-term memory network.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, The processor executes the computer program to implement the method as described in any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that, The program is executed by a processor to implement the method as described in any one of claims 1-7.