Image classification methods and apparatus, electronic devices and storage media
By acquiring the domain, semantic, and instance features of medical images, and adjusting the image classification model using an encoder and a decoupling model, the problem of low accuracy in medical image classification in existing technologies is solved, achieving higher image classification accuracy and correct classification of abnormal images.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN LIFE INSURANCE CO LTD
- Filing Date
- 2023-07-07
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This application relates to the fields of artificial intelligence and digital medical technology, and in particular to an image classification method and apparatus, electronic device and storage medium.
Background Technology
[0002] With the development of intelligent technologies, these technologies are being applied in various fields. For example, in medicine, intelligent image classification is already assisting medical personnel in conducting medical research. Medical images of different internal tissues are complex, and the accuracy of medical image classification is crucial.
[0003] In related technologies, to improve the accuracy of medical image classification, label information from a large amount of labeled image data is transferred to unlabeled image data to determine the image category of the unlabeled image data. This involves calculating the image vector distance between the unlabeled and labeled image data to classify medical images and assist medical personnel in medical research. However, the accuracy of classifying unlabeled image data using image vector distance is low. Therefore, improving the accuracy of image data classification has become an urgent technical problem to be solved.
Summary of the Invention
[0004] The main objective of this application is to provide an image classification method, apparatus, electronic device, and storage medium, which aims to improve the accuracy of medical image classification.
[0005] To achieve the above objectives, a first aspect of this application proposes an image classification method, the method comprising:
[0006] Acquire training image features; wherein, the training image features include: image domain features, image semantic features, and image instance features;
[0007] The training image features are input into a preset original image classification model; wherein, the original image classification model includes: a domain encoder, a semantic encoder, an instance encoder, and an original decoder;
[0008] The image domain features are encoded by the domain encoder to obtain an image domain vector; the image semantic features are encoded by the semantic encoder to obtain an image semantic vector; and the image instance features are encoded by the instance encoder to obtain an image instance vector.
[0009] The image domain vector, the image semantic vector, and the image instance vector are decoded by the original decoder to obtain the image prediction category.
[0010] The image domain vector, the image semantic vector, and the image instance vector are input into a preset decoupling model for decoupling processing to obtain image decoupling information;
[0011] Loss calculation is performed based on the preset image reference category, the image prediction category, and the image decoupling information to obtain target loss data;
[0012] The parameters of the original image classification model are adjusted based on the target loss data to obtain the target image classification model.
[0013] The acquired target image features are input into the target image classification model for image classification processing to obtain the target image category.
[0014] In some embodiments, the decoupling model includes: a discriminator, a classifier, and a parser; the image decoupling information includes: image domain information, image semantic information, and image instance information; the step of inputting the image domain vector, the image semantic vector, and the image instance vector into a preset decoupling model for decoupling processing to obtain image decoupling information includes:
[0015] The image domain information is obtained by extracting domain information from the image domain vector using the discriminator.
[0016] The semantic information of the image is obtained by extracting semantic information from the semantic vector of the image using the classifier;
[0017] The parser extracts instance information from the image instance vector to obtain the image instance information.
[0018] In some embodiments, the classifier includes: a similarity calculation layer, a clustering layer, and a semantic extraction layer; the step of extracting semantic information from the image semantic vector using the classifier to obtain image semantic information includes:
[0019] Using the image domain information as a filtering condition, the image semantic vector is filtered through the similarity calculation layer to obtain a reference semantic vector;
[0020] The similarity calculation layer performs similarity measurement on the image semantic vector and the reference semantic vector to obtain similarity data.
[0021] The image semantic vectors are clustered through the clustering layer to obtain a target vector set; wherein the reference semantic vectors serve as cluster centers and the similarity data serve as clustering parameters.
[0022] The semantic extraction layer obtains the structural information of the target vector set, thereby obtaining the image semantic information.
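The four classifier steps above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration rather than the application's implementation: the domain information is taken to be a "source"/"target" tag per image, the filter keeps source-domain vectors as reference semantic vectors, similarity is cosine similarity, and the "structural information" is reduced to cluster membership. The function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_semantics(semantic_vectors, domain_info):
    """Toy similarity calculation, clustering, and semantic extraction layers.

    semantic_vectors: one vector per image.
    domain_info: one "source"/"target" tag per image (assumed representation).
    """
    # Filtering (assumption): source-domain vectors become reference semantic vectors.
    refs = [v for v, d in zip(semantic_vectors, domain_info) if d == "source"]
    if not refs:
        raise ValueError("no reference semantic vectors after filtering")
    # Similarity measurement: each semantic vector against each reference vector.
    sims = [[cosine(v, r) for r in refs] for v in semantic_vectors]
    # Clustering: references act as fixed cluster centers; the similarity
    # data decides which center each vector is assigned to.
    clusters = {k: [] for k in range(len(refs))}
    for i, row in enumerate(sims):
        best = max(range(len(refs)), key=lambda k: row[k])
        clusters[best].append(i)
    # Semantic extraction (assumption): report cluster membership as structure.
    return clusters

vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.2, 0.8]]
domains = ["source", "source", "target", "target"]
clusters = classify_semantics(vectors, domains)
print(clusters)
```

In this toy input, each target-domain vector joins the cluster of the source-domain reference it is most similar to.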
[0028] In some embodiments, the step of calculating the loss based on a preset image reference category, the image prediction category, and the image decoupling information to obtain target loss data specifically includes:
[0029] The loss is calculated based on the image reference category and the image prediction category to obtain classification loss data;
[0030] Loss calculation is performed on the image decoupling information to obtain decoupling loss data;
[0031] The classification loss data and the decoupling loss data are concatenated to obtain the target loss data.
[0032] In some embodiments, the step of calculating the loss on the image decoupling information to obtain decoupling loss data includes:
[0033] The image domain information is subjected to loss calculation to obtain the image domain loss value;
[0034] The semantic information of the image is subjected to loss calculation to obtain the image semantic loss value;
[0035] Loss calculation is performed on the image instance information to obtain the image instance loss value;
[0036] The image domain loss value, the image semantic loss value, and the image instance loss value are merged to obtain the decoupling loss data.
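The loss composition described above can be sketched as follows. The text says only that the component losses are "merged" and that the classification and decoupling losses are "concatenated"; the plain sum, the optional `weight`, and the use of negative log-likelihood for the classification loss are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math

def classification_loss(probs, reference_index):
    """Negative log-probability given to the image reference category."""
    return -math.log(max(probs[reference_index], 1e-12))

def decoupling_loss(domain_loss, semantic_loss, instance_loss):
    """Merge the three component losses (a plain sum is an assumption)."""
    return domain_loss + semantic_loss + instance_loss

def target_loss(class_loss, decouple_loss, weight=1.0):
    """Combine both parts; `weight` is a hypothetical balancing factor."""
    return class_loss + weight * decouple_loss

predicted = [0.7, 0.2, 0.1]   # image prediction category as probabilities
reference = 0                 # image reference category (label index)
l_cls = classification_loss(predicted, reference)
l_dec = decoupling_loss(0.3, 0.2, 0.1)   # placeholder component values
l_total = target_loss(l_cls, l_dec, weight=0.5)
print(round(l_total, 4))
```

A single scalar of this kind is what a parameter adjustment step would minimize; the component loss values here are placeholders, not outputs of a real decoupling model.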
[0037] In some embodiments, the target image features include target semantic features; the step of inputting the acquired target image features into the target image classification model for image classification processing to obtain the target image category includes:
[0038] The target image features are input into the target image classification model; wherein, the target image classification model includes: a target encoder and a target decoder;
[0039] The target semantic features are encoded by the target encoder to obtain the target semantic vector;
[0040] The target semantic vector is decoded by the target decoder to obtain the target image category.
[0041] To achieve the above objectives, a second aspect of this application provides an image classification apparatus, the apparatus comprising:
[0042] The feature acquisition module is used to acquire training image features; wherein, the training image features include: image domain features, image semantic features, and image instance features;
[0043] The feature input module is used to input the training image features into a preset original image classification model; wherein, the original image classification model includes: a domain encoder, a semantic encoder, an instance encoder, and a decoder;
[0044] The encoding module is used to encode the image domain features through the domain encoder to obtain an image domain vector, to encode the image semantic features through the semantic encoder to obtain an image semantic vector, and to encode the image instance features through the instance encoder to obtain an image instance vector.
[0045] The decoding module is used to decode the image domain vector, the image semantic vector, and the image instance vector through the decoder to obtain the image prediction category;
[0046] The decoupling module is used to input the image domain vector, the image semantic vector, and the image instance vector into a preset decoupling model for decoupling processing to obtain image decoupling information;
[0047] The loss calculation module is used to perform loss calculation based on the preset image reference category, the image prediction category and the image decoupling information to obtain target loss data;
[0048] The parameter adjustment module is used to adjust the parameters of the original image classification model according to the target loss data to obtain the target image classification model.
[0049] The image classification module is used to input the acquired target image features into the target image classification model for image classification processing to obtain the target image category.
[0050] To achieve the above objectives, a third aspect of this application provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the method described in the first aspect.
[0051] To achieve the above objectives, a fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in the first aspect.
[0052] The image classification method, apparatus, electronic device, and storage medium proposed in this application acquire image domain features, image semantic features, and image instance features. These features are then input into a preset original image classification model. A domain encoder encodes the image domain features to obtain an image domain vector, a semantic encoder encodes the image semantic features to obtain an image semantic vector, and an instance encoder encodes the image instance features to obtain an image instance vector. An original decoder decodes the image domain vector, image semantic vector, and image instance vector to obtain an image prediction category. These vectors are also input into a preset decoupling model for decoupling to obtain image decoupling information. Loss calculation is performed on the preset image reference category, the image prediction category, and the image decoupling information to obtain target loss data. Based on the target loss data, the parameters of the original image classification model are adjusted to construct a more accurate target image classification model. Finally, the acquired target image features are input into the target image classification model to obtain a more accurate target image category. Therefore, for complex medical images, classifying from three perspectives (image domain, image semantics, and image instance) can improve the accuracy of medical image classification, thus providing medical researchers with more accurate medical image classification results.
Attached Figure Description
[0053] Figure 1 is a flowchart of the image classification method provided in the embodiments of this application;
[0054] Figure 2 is a flowchart of an image classification method provided in another embodiment of this application;
[0055] Figure 3 is a flowchart of step S105 in Figure 1;
[0056] Figure 4 is a flowchart of step S302 in Figure 3;
[0057] Figure 5 is a flowchart of step S303 in Figure 3;
[0058] Figure 6 is a flowchart of step S106 in Figure 1;
[0059] Figure 7 is a flowchart of step S602 in Figure 6;
[0060] Figure 8 is a flowchart of step S108 in Figure 1;
[0061] Figure 9 is a schematic diagram of the structure of the image classification apparatus provided in the embodiments of this application;
[0062] Figure 10 is a schematic diagram of the hardware structure of the electronic device provided in the embodiments of this application.
Detailed Implementation
[0063] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0064] It should be noted that although functional modules are divided in the device schematic diagram and a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than the module division in the device or the order in the flowchart. The terms "first," "second," etc., in the specification, claims, and the aforementioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
[0065] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of this application only and is not intended to limit this application.
[0066] First, some of the terms used in this application are explained:
[0067] Artificial intelligence (AI) is a new branch of computer science that studies, develops, and applies theories, methods, technologies, and systems to simulate, extend, and expand human intelligence. It aims to understand the essence of intelligence and produce intelligent machines that can react in a way similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. AI can simulate the information processes of human consciousness and thought. Furthermore, AI utilizes digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceiving the environment, acquiring knowledge, and using that knowledge to achieve optimal results.
[0068] Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Most UDA methods focus on using convolutional neural network (CNN)-based frameworks to learn domain-invariant feature representations from the domain level or category level. A fundamental problem with category-level UDA is generating pseudo-labels for samples in the target domain, which is often too noisy to achieve accurate domain alignment, inevitably impacting UDA performance. With the success of Transformers in various tasks, we have found that cross-attention in Transformers is robust to noisy input pairs, enabling better feature alignment.
[0069] Decoupling: Coupling refers to the phenomenon where two or more systems or two forms of motion influence each other and even unite through interaction. In mathematics, decoupling means transforming a mathematical equation containing multiple variables into a system of equations that can be expressed by a single variable. That is, the variables no longer simultaneously and directly affect the result of a single equation, thereby simplifying analysis and calculation.
[0070] Image instance segmentation: Image instance segmentation is a further refinement of semantic segmentation, separating the foreground and background of objects to achieve pixel-level object separation. Furthermore, semantic segmentation and instance segmentation are two different concepts. Semantic segmentation only distinguishes between different categories of objects, while instance segmentation further segments different instances of the same object.
[0071] Cross entropy: Cross entropy is an important concept in Shannon's information theory, primarily used to measure the difference in information between two probability distributions. The performance of a language model is usually measured by cross entropy and perplexity. Cross entropy signifies the difficulty of text recognition using the model, or, from a compression perspective, how many bits are needed to encode each word on average.
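As a small numerical illustration of the definition above, the sketch below computes the cross entropy H(p, q) = -Σ p_i log2(q_i) in bits together with the corresponding perplexity 2^H. The function names and the example distributions are hypothetical, and the code assumes q_i > 0 wherever p_i > 0.

```python
import math

def cross_entropy_bits(p, q):
    """H(p, q) in bits; terms with p_i == 0 contribute nothing.

    Assumes q_i > 0 wherever p_i > 0 (otherwise math.log2 raises ValueError).
    """
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def perplexity(p, q):
    """Perplexity is 2 raised to the cross entropy measured in bits."""
    return 2 ** cross_entropy_bits(p, q)

p = [0.5, 0.5]      # reference distribution
q = [0.25, 0.75]    # model distribution
h_pp = cross_entropy_bits(p, p)   # equals the entropy of p
h_pq = cross_entropy_bits(p, q)   # larger, because q differs from p
print(h_pp, h_pq, perplexity(p, q))
```

The cross entropy is smallest when the model distribution matches the reference distribution, which is why it is a natural choice for a classification loss.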
[0072] The k-means clustering algorithm is an iterative clustering analysis algorithm. Its steps are as follows: first, the data is pre-divided into K groups; then, K objects are randomly selected as initial cluster centers; next, the distance between each object and each seed cluster center is calculated, and each object is assigned to the nearest cluster center. The cluster centers and the objects assigned to them represent a cluster.
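The procedure described above can be sketched in a few lines. This is a generic k-means illustration rather than code from the application; it also includes the standard step of recomputing each cluster center as the mean of its assigned objects, which the description above does not spell out. The function names, the fixed iteration count, and the seed are assumptions.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Group points by the index of their nearest cluster center."""
    clusters = [[] for _ in centers]
    for p in points:
        dists = [squared_distance(p, c) for c in centers]
        clusters[dists.index(min(dists))].append(p)
    return clusters

def kmeans(points, k, iterations=10, seed=0):
    rng = random.Random(seed)
    # Randomly select K objects as the initial cluster centers.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, assign(points, centers)

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, clusters = kmeans(points, k=2)
print(centers)
```

A fixed number of iterations keeps the sketch short; practical implementations usually stop once the assignments no longer change.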
[0073] With the development of intelligent technologies, these technologies are being applied to various fields. For example, in the medical field, intelligent image recognition is already assisting medical personnel in conducting medical research. Medical images of different internal tissues are complex, and the accuracy requirements for classifying these images are even higher.
[0074] In related technologies, to improve the simplicity and efficiency of medical image classification, image sample data containing abundant label information is designated as source domain sample data, while image sample data lacking label information is designated as target domain sample data. The label information from the source domain sample data is then transferred to the target domain sample data to achieve medical image classification, making the classification more efficient and simpler. However, traditionally, transferring label information from source domain sample data to target domain sample data mainly involves calculating the image vector distance between the source and target domain sample data. This distance is then used to determine which source domain sample data the target domain sample data is closest to, thus transferring its label information to the target domain sample data for image classification. However, semantic distance alone cannot accurately classify anomalous image data, and classifying target domain sample data solely based on image vector distance has low accuracy.
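The distance-based label transfer used in the related technologies can be pictured with a minimal sketch: each target domain sample receives the label of the nearest source domain sample. The vectors, the label names, and the function name are hypothetical, and Euclidean distance is an assumption.

```python
import math

def transfer_labels(source, target):
    """Give each target vector the label of its nearest source vector.

    source: list of (vector, label) pairs; target: list of vectors.
    """
    labels = []
    for t in target:
        nearest = min(source, key=lambda item: math.dist(item[0], t))
        labels.append(nearest[1])
    return labels

source = [([0.0, 0.0], "normal"), ([5.0, 5.0], "lesion")]
target = [[1.0, 0.0], [4.0, 6.0], [2.4, 2.4]]
labels = transfer_labels(source, target)
print(labels)
```

The third target vector lies almost halfway between the two source samples, so its label depends on a small difference in distance. This is the kind of case in which classification by vector distance alone becomes unreliable.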
[0075] Based on this, embodiments of this application provide an image classification method, apparatus, electronic device, and storage medium. The method inputs image domain features, image semantic features, and image instance features into an original image classification model. A domain encoder encodes the image domain features to obtain an image domain vector, a semantic encoder encodes the image semantic features to obtain an image semantic vector, and an instance encoder encodes the image instance features to obtain an image instance vector. An original decoder then decodes the image domain vector, image semantic vector, and image instance vector to obtain the image prediction category, so that multiple image features are combined for more accurate classification. At the same time, the three vectors are input into a decoupling model for decoupling to obtain image decoupling information, which allows the accuracy of the feature encoding process to be assessed. Finally, loss calculation is performed on the image reference category, the image prediction category, and the image decoupling information to obtain target loss data, and the parameters of the original image classification model are adjusted based on this target loss data to obtain the target image classification model. The resulting target image classification model classifies images more accurately and can also classify abnormal images; inputting the target image features into it yields more accurate target image categories, including for images that would otherwise be misclassified. Thus, for medical images that are prone to anomalies and cannot be classified by traditional models, this application extracts the image domain, image semantics, and image instances of medical images for classification, which not only enables the classification of abnormal medical images but also improves the accuracy of classifying normal medical images.
[0076] The image classification method, apparatus, electronic device, and storage medium provided in this application are specifically described through the following embodiments. First, the image classification method in this application is described.
[0077] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
[0078] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.
[0079] The image classification method provided in this application relates to the fields of artificial intelligence and digital medical technology. The image classification method provided in this application can be applied to a terminal, a server, or software running on either a terminal or a server. In some embodiments, the terminal can be a smartphone, tablet, laptop, desktop computer, etc.; the server can be configured as an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software can be an application implementing the image classification method, but is not limited to the above forms.
[0080] This application can be used in a wide variety of general-purpose or special-purpose computer system environments or configurations. Examples include: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.
[0081] It should be noted that in all specific embodiments of this application, when processing data related to user identity or characteristics, such as user information, user behavior data, user image data, user historical data, and user location information, user permission or consent is obtained first. Furthermore, the collection, use, and processing of this data comply with relevant laws, regulations, and standards. In addition, when embodiments of this application require access to sensitive personal information of users, separate permission or consent from the user is obtained through pop-ups or redirection to a confirmation page. Only after obtaining the user's separate permission or consent is the necessary user-related data required for the proper functioning of these embodiments obtained.
[0082] Figure 1 is an optional flowchart of the image classification method provided in the embodiments of this application. The method in Figure 1 may include, but is not limited to, steps S101 to S108.
[0083] Step S101: Obtain training image features; wherein, the training image features include: image domain features, image semantic features, and image instance features;
[0084] Step S102: Input the training image features into a preset original image classification model; wherein, the original image classification model includes: a domain encoder, a semantic encoder, an instance encoder, and an original decoder;
[0085] Step S103: The image domain features are encoded by the domain encoder to obtain the image domain vector; the image semantic features are encoded by the semantic encoder to obtain the image semantic vector; and the image instance features are encoded by the instance encoder to obtain the image instance vector.
[0086] Step S104: The image domain vector, image semantic vector, and image instance vector are decoded by the original decoder to obtain the image prediction category;
[0087] Step S105: Input the image domain vector, image semantic vector and image instance vector into the preset decoupling model for decoupling processing to obtain image decoupling information;
[0088] Step S106: Calculate the loss based on the preset image reference category, image prediction category, and image decoupling information to obtain the target loss data;
[0089] Step S107: Adjust the parameters of the original image classification model based on the target loss data to obtain the target image classification model;
[0090] Step S108: Input the acquired target image features into the target image classification model for image classification processing to obtain the target image category.
[0091] Steps S101 to S108 as shown in the embodiments of this application involve acquiring training image features composed of image domain features, image semantic features, and image instance features. These training image features are then input into a preset original image classification model. An image domain vector is obtained by encoding the image domain features using a domain encoder. An image semantic vector is obtained by encoding the image semantic features using a semantic encoder. An image instance vector is obtained by encoding the image instance features using an instance encoder. The image domain vector, image semantic vector, and image instance vector are then input into the original decoder and the decoupling model, respectively. The image domain vector, image semantic vector, and image instance vector are decoded by the original decoder to obtain the predicted image category. Simultaneously, the decoupling model decouples the image domain vector, image semantic vector, and image instance vector to obtain image decoupling information. This image decoupling information can provide feedback on whether the vectors formed by the image feature encoding are discriminative. Then, the image reference category, the image predicted category, and the image decoupling information are used to calculate the target loss data. Based on the target loss data, the parameters of the original image classification model are adjusted to construct a target image classification model capable of accurately classifying images. Since the target loss data includes loss data related to image decoupling information, adjusting the target image classification model to encode image features more accurately can reduce the impact of noise in the encoding process. Therefore, by inputting the target image features into the target image classification model for image classification processing to obtain the target image category, image classification becomes more accurate.
[0092] For example, in medical image classification, by constructing a classification system that extracts image semantics, image domain, and image instances from medical images, it is possible to consider instance-level image features, and abnormal medical features can also be accurately classified.
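The data flow of steps S103 and S104 can be sketched as a toy forward pass. The linear encoders, the concatenation of the three vectors before decoding, the softmax output, and all weights and feature values are assumptions made for illustration; the decoupling, loss calculation, and parameter adjustment of steps S105 to S107 are left out.

```python
import math

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class ToyClassifier:
    """Three encoders and one decoder, all plain linear maps (an assumption)."""

    def __init__(self, w_domain, w_semantic, w_instance, w_decoder):
        self.w_domain, self.w_semantic = w_domain, w_semantic
        self.w_instance, self.w_decoder = w_instance, w_decoder

    def encode(self, domain_feat, semantic_feat, instance_feat):
        # S103: each feature type passes through its own encoder.
        return (linear(self.w_domain, domain_feat),
                linear(self.w_semantic, semantic_feat),
                linear(self.w_instance, instance_feat))

    def decode(self, domain_vec, semantic_vec, instance_vec):
        # S104: the decoder sees all three vectors (concatenation is an assumption).
        joint = domain_vec + semantic_vec + instance_vec
        return softmax(linear(self.w_decoder, joint))

identity = [[1.0, 0.0], [0.0, 1.0]]
decoder = [[1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]]
model = ToyClassifier(identity, identity, identity, decoder)
vectors = model.encode([1.0, 0.0], [0.8, 0.2], [0.6, 0.4])
probs = model.decode(*vectors)
predicted_category = probs.index(max(probs))
print(predicted_category, probs)
```

In a trainable version, the same three vectors would also be passed to the decoupling model, and the resulting target loss would drive updates to the encoder and decoder weights.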
[0093] In step S101 of some embodiments, training image features are acquired; that is, training images are acquired and input into a feature extractor for feature extraction. The training images can be retrieved directly from a target database, or candidate images can be collected in real time from multiple internet platforms and training images selected from them according to image requirements. No specific restriction is placed on how training images are acquired. This embodiment is applied to medical image classification, where classification can be refined into disease-level classification for specific internal or external tissues. Internal tissues include the stomach, abdomen, heart, and brain, among others; external tissues include the arms, feet, hands, and head, among others. If the acquired training image is an electroencephalogram (EEG), brain disease level classification is performed based on the EEG. If the acquired training image is an ultrasound image of the heart, heart disease level classification is performed based on the ultrasound image. After the training image is acquired, it is input into the feature extractor, which extracts image domain features, image semantic features, and image instance features from it.
[0094] Specifically, the training images include source domain images and target domain images, and the target domain images do not contain label information. Suppose the source domain image data is ..., comprising a total of ... source domain images, where ... denotes the features of a source domain image and ... denotes its label information. The target domain image data is ..., comprising a total of ... target domain images. The training image features are then defined as: ... Since the training image features include image domain features, image semantic features, and image instance features, the training image is input into the feature extractor, which first determines the image domain to obtain the image domain features: it checks whether the training image carries label information and sets the image domain features accordingly. Image domain features comprise source domain image features, which indicate that label information is present, and target domain image features, which indicate that it is absent. The feature extractor also extracts semantic features from the training image using a multi-layer extraction mechanism. First, candidate image features are extracted from the training image, and object recognition is performed on these candidate features to obtain image object information. Then, semantic extraction is performed on the image object information to obtain image semantic features. Candidate image features include color features, texture features, and shape features. Object recognition on the candidate image features mainly involves similarity matching between the object model in the knowledge base and the extracted image reference features to determine the image object information for each candidate image feature. Semantic extraction from the image object information involves mapping the image object information to semantic reference features using the recognition rules and methods of the knowledge base to obtain the image semantic features. After semantic feature extraction is complete, the feature extractor obtains the image instance features of the training image, which represent the target object in the training image.
[0095] In step S102 of some embodiments, the original image classification model includes a domain encoder, a semantic encoder, an instance encoder, and an original decoder. Because the original image classification model combines a domain encoder, a semantic encoder, and an instance encoder, each type of training image feature is encoded by its corresponding encoder. The resulting training image vector integrates image domain features, image semantic features, and image instance features, enabling abnormal medical images to be classified correctly and improving the accuracy of medical image classification.
[0096] It should be noted that medical images, also known as medical imaging, are affected by the internal environment of the body when internal tissues are imaged. This can result in noise in the medical images and, in turn, abnormal medical images. Referring to Figure 2, the domain encoder, semantic encoder, and instance encoder are each connected to the feature extractor. The feature extractor inputs the image domain features into the domain encoder, the image semantic features into the semantic encoder, and the image instance features into the instance encoder, so that each type of training image feature is encoded separately. Therefore, by combining the image domain features, image semantic features, and image instance features of medical images, the influence of noise can be reduced and abnormal medical images can be classified correctly, making medical image classification more accurate.
[0097] In step S103 of some embodiments, referring to Figure 2, the feature extractor inputs the image domain features into the domain encoder for encoding. Since the image domain features include both source domain image features and target domain image features, the domain encoder encodes the image domain features and outputs an image domain vector whose value is either 1 or 0, with 1 representing source domain image features and 0 representing target domain image features. Simultaneously, the feature extractor inputs the image semantic features into the semantic encoder for encoding to obtain the image semantic vector; that is, semantic vectors are selected according to the image semantic features, and the selected semantic vectors are concatenated to obtain the image semantic vector. The feature extractor likewise inputs the image instance features into the instance encoder for encoding to obtain the image instance vector, which characterizes the image instance features. The original decoder can then decode the image domain vector, image semantic vector, and image instance vector to obtain the image prediction category more accurately.
[0098] It should be noted that the domain encoder, semantic encoder, and instance encoder are variational autoencoder (VAE) networks. A VAE network can map the variable Z in the low-dimensional space to the high-dimensional space, that is, map the low-dimensional training image features to a high-dimensional image vector. The high-dimensional image vector represents the training image features more richly, so the image prediction category obtained by decoding it is more accurate.
[0099] In step S104 of some embodiments, referring to Figure 2, the training image vector is obtained by concatenating the image domain vector, image semantic vector, and image instance vector. This training image vector is then input into the original decoder for decoding; that is, the image is classified based on the training image vector to obtain the image prediction category, making image category prediction for medical images more accurate.
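The flow of steps S103 and S104 can be illustrated with a minimal sketch. It is not the model of this application: the VAE encoders are replaced by plain linear maps, and all names (`linear`, `softmax`, `make_weights`, `classify`, `params`), the vector dimensions, and the random weights are hypothetical choices made only to show how three separately encoded vectors are concatenated and decoded into a predicted category.

```python
import math
import random

def linear(features, weights):
    """Multiply a feature list by a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_weights(rows, cols, rng):
    """Random weight matrix; stands in for trained parameters (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def classify(domain_feat, semantic_feat, instance_feat, params):
    # Step S103: each feature type goes through its own encoder
    # (simplified here to one linear map per encoder).
    z_domain = linear(domain_feat, params["domain_encoder"])
    z_semantic = linear(semantic_feat, params["semantic_encoder"])
    z_instance = linear(instance_feat, params["instance_encoder"])
    # Step S104: concatenate the three vectors into the training image vector.
    training_vector = z_domain + z_semantic + z_instance
    # The decoder maps the concatenated vector to category probabilities.
    probs = softmax(linear(training_vector, params["decoder"]))
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return training_vector, probs, predicted

rng = random.Random(0)
params = {
    "domain_encoder": make_weights(2, 1, rng),    # 1 input dim -> 2 dims
    "semantic_encoder": make_weights(4, 3, rng),  # 3 input dims -> 4 dims
    "instance_encoder": make_weights(4, 3, rng),  # 3 input dims -> 4 dims
    "decoder": make_weights(3, 10, rng),          # 10 dims -> 3 categories
}
vector, probs, predicted = classify([1.0], [0.2, 0.5, 0.1], [0.9, 0.3, 0.4], params)
```

Keeping the three encoders separate in the sketch mirrors the description: each latent vector can later be inspected on its own by the decoupling model, while the decoder only sees their concatenation.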
[0100] Referring to Figure 3, in some embodiments, the decoupling model includes: a discriminator, a classifier, and a parser; the image decoupling information includes: image domain information, image semantic information, and image instance information; step S105 may include, but is not limited to, steps S301 to S303:
[0101] Step S301: Extract domain information from the image domain vector using a discriminator to obtain image domain information;
[0102] Step S302: Extract semantic information from the image semantic vector using a classifier to obtain the image semantic information;
[0103] Step S303: Extract instance information from the image instance vector using a parser to obtain image instance information.
[0104] In step S301 of some embodiments, a decoupling model including a discriminator, a classifier, and a parser is constructed, and the image domain vector is input to the discriminator to extract domain information and obtain the image domain information. That is, the discriminator determines whether the image domain vector indicates that the training image is a source domain image or a target domain image.
[0105] In step S302 of some embodiments, semantic information is extracted from the image semantic vector by a classifier to obtain image semantic information. This image semantic information indicates whether images from different domains but of the same image category carry the same semantic information. It is therefore useful for adjusting the image semantic vectors of the same image category so that those of images from different domains become approximately similar, which makes decoding the image category from the image semantic vectors more accurate. To this end, the image semantic vectors of both the source domain images and the target domain images are input into the classifier to extract the image semantic information. In other words, the image semantic information shows whether the image semantic vectors output by the original image classification model for training images from different image domains are reasonable, so that the parameters of the semantic encoder of the original image classification model can be adjusted accordingly, resulting in more similar image semantic vectors for images of the same image category.
[0106] In step S303 of some embodiments, the parser extracts instance information from the image instance vector; that is, it separates the foreground from the background of the training image as represented by the image instance vector, achieving pixel-level object separation, so as to obtain image instance information that can characterize the image category of the training image.
[0107] In steps S301 to S303 of the embodiments of this application, the discriminator extracts image domain information from the image domain vector, representing which image domain the training image originates from. Simultaneously, the classifier extracts semantic information from the image semantic vector to obtain image semantic information, and based on this semantic information, it can be determined whether training images from different image domains belong to the same image category. The parser extracts instance information from the image instance vector to obtain image instance information, which is used to determine which target objects the training image includes. Wherein, if the training image is a medical image, then the image instance information represents that the medical image contains target objects representing each disease category.
[0108] Referring to Figure 4, in some embodiments, the classifier includes: a similarity calculation layer, a clustering layer, and a semantic extraction layer; step S302 may include, but is not limited to, steps S401 to S404:
[0109] Step S401: Using image domain information as a filtering condition, the image semantic vector is filtered through a similarity calculation layer to obtain a reference semantic vector;
[0110] Step S402: The similarity calculation layer is used to calculate the similarity measure between the image semantic vector and the reference semantic vector to obtain similarity data;
[0111] Step S403: Cluster the image semantic vectors through a clustering layer to obtain a target vector set; wherein, the reference semantic vector is used as the cluster center and the similarity data is used as the clustering parameter;
[0112] Step S404: Obtain the structural information of the target vector set through the semantic extraction layer to obtain the image semantic information.
[0113] In step S401 of some embodiments, image domain information is used as a filtering condition, and the image semantic vectors are filtered through a similarity calculation layer. That is, image semantic vectors that share the same image domain information are obtained as reference semantic vectors. For example, if the image domain information indicates that the training image is a source domain image, the image semantic vector of a source domain training image is obtained as a reference semantic vector. If the image domain information indicates that the training image is a target domain image, the image semantic vector of a target domain training image is obtained as a reference semantic vector.
[0114] In step S402 of some embodiments, after the reference semantic vectors for the different image domains are obtained, similarity data is obtained by calculating the similarity between each image semantic vector and the reference semantic vector. The similarity data may be any of the following: Manhattan distance, Euclidean distance, or cosine similarity, and there are no specific restrictions on the category of similarity data. If the similarity data is cosine similarity, the cosine similarity between the image semantic vector and the reference semantic vector is calculated. A larger cosine similarity indicates that the image semantic vector and the reference semantic vector are more similar, while a smaller cosine similarity indicates that they are less similar. Therefore, the cosine similarity between the image semantic vector and the reference semantic vector is calculated as similarity data to determine the vector similarity between them.
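The three candidate measures named in step S402 can be written down directly. The following is a minimal sketch with hypothetical function names and example vectors; note that the two distances decrease as vectors become more alike, whereas cosine similarity approaches 1 as the vectors point in more similar directions.

```python
import math

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (smaller means more similar)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (closer to 1 means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen in this sketch for zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical reference semantic vector and two image semantic vectors.
reference = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]
```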
[0115] In step S403 of some embodiments, a clustering layer uses the reference semantic vectors as cluster centers and the similarity data as the clustering parameter to cluster the image semantic vectors. The clustering layer uses the k-means clustering algorithm to cluster the image semantic vectors. Specifically, the similarity data is compared with a preset similarity threshold, and the similarity data exceeding the threshold is taken as target similarity data. The target similarity data is then sorted, and the image semantic vectors corresponding to the top k values form the target vector set. If fewer than k image semantic vectors have similarity data exceeding the preset similarity threshold, those image semantic vectors do not form a target vector set. Clustering the image semantic vectors by similarity data in this way makes vector classification simpler.
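The threshold-and-top-k selection described in step S403 can be sketched as follows. This covers only the selection rule stated in the paragraph, not a full k-means implementation; the function name, the example similarity values, and the choice to return `None` when fewer than k vectors qualify are assumptions of this sketch.

```python
def select_target_vector_set(similarities, threshold, k):
    """Return the indices of the k most similar vectors above the threshold.

    similarities[i] is the similarity between image semantic vector i and the
    reference semantic vector (the cluster center). Returns None when fewer
    than k vectors exceed the threshold, i.e. no target vector set is formed.
    """
    candidates = [(s, i) for i, s in enumerate(similarities) if s > threshold]
    if len(candidates) < k:
        return None
    # Sort by similarity, highest first, and keep the top k.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in candidates[:k]]

# Hypothetical cosine similarities of five vectors to one reference vector.
similarities = [0.95, 0.40, 0.88, 0.91, 0.10]
target_set = select_target_vector_set(similarities, threshold=0.8, k=2)
```

With these example values, three vectors pass the threshold and the two with the highest similarity are kept; asking for k = 4 would yield no target vector set.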
[0116] In step S404 of some embodiments, the structural information of the target vector set is obtained; that is, the semantic information of the reference semantic vector in the target vector set is obtained. Since images of the same category do not necessarily have similar image structures, all image semantic vectors in the same target vector set are assigned the same image semantic information, so that images of the same image category in the same image domain share the same image semantic information, which yields discriminative image semantic information. Based on the image semantic information, it is easier to determine whether training images in the same image domain belong to the same image category, and in turn whether the image semantic vectors generated for training images of the same category in different image domains are similar.
[0117] In steps S401 to S404 of the embodiments of this application, image domain information is used as a filtering condition, and a similarity calculation layer is used to filter the image semantic vectors to obtain reference semantic vectors. That is, image semantic vectors with the same image domain information are selected as reference semantic vectors, and the cosine angle between the reference semantic vector and the image semantic vector with the same image domain information is calculated as similarity data. The similarity data is used as clustering parameters, and the reference semantic vector is used as the cluster center. The image semantic vectors are clustered by the clustering layer to obtain a target vector set. Then, the semantic information of the reference semantic vector in the target vector set is obtained as image semantic information. This allows for the determination of whether the semantic information of images of the same category but different image domains is the same. The parameters of the semantic encoder can be adjusted according to the image semantic information so that the semantic encoder can output approximate and discriminative image semantic vectors for target images of different image domains but the same image category.
[0118] Referring to Figure 5, in some embodiments, step S303 may include, but is not limited to, steps S501 to S503:
[0119] Step S501: Perform image reconstruction processing on the image instance vector through the reconstruction layer to obtain the instance reference image;
[0120] Step S502: Target detection is performed on the instance reference image through the target detection layer to obtain image target information;
[0121] Step S503: Using the image target information as a segmentation parameter, the instance reference image is segmented through the instance segmentation layer to obtain the image instance information.
[0122] In step S501 of some embodiments, the image instance vector is reconstructed by a reconstruction layer to obtain an instance reference image, and the instance reference image is similar to the training image.
[0123] In step S502 of some embodiments, target detection is performed on the instance reference image by the target detection layer; that is, the target objects in the instance reference image are identified and marked, for example by establishing a recognition box around each target object or by representing each target object with a different color, to obtain image target information.
[0124] In step S503 of some embodiments, the image target information includes the location and identifier of each target object. This image target information is used as a segmentation parameter to perform image instance segmentation on the instance reference image through an instance segmentation layer. In other words, each target object is segmented from the background to obtain the image instance information. Separating out the image instance information in this way improves the accuracy of image classification.
[0125] In steps S501 to S503 of the embodiments of this application, the image instance vector is reconstructed by the reconstruction layer to obtain the instance reference image. Then, the instance reference image is detected by the target detection layer to detect the target object in the instance reference image and obtain the image target information. Based on the image target information, the instance reference image is segmented by the instance segmentation layer to obtain the image instance information. The instance loss data can be calculated based on the image instance information. Based on the instance loss data, it is determined whether the instance features extracted by the original image classification model have discriminative power, so that the original image classification model can be trained based on the instance loss data to accurately identify abnormal medical images.
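The detection and segmentation stages of steps S502 and S503 can be illustrated with a deliberately simple toy. It is not the parser of this application: the reconstruction layer is skipped (the instance reference image is given directly as a small grid of intensities), detection is reduced to finding the bounding box of pixels above a threshold, and segmentation to a binary foreground mask inside that box. All names and the threshold value are hypothetical.

```python
def detect_target(image, threshold):
    """Return the bounding box (top, left, bottom, right) of bright pixels.

    Stands in for the target detection layer; returns None if nothing is found.
    """
    coords = [(r, c) for r, row in enumerate(image)
              for c, value in enumerate(row) if value > threshold]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

def segment_instance(image, box, threshold):
    """Return a binary mask: 1 for foreground pixels inside the box, else 0.

    Stands in for the instance segmentation layer, with the detected box
    used as the segmentation parameter.
    """
    top, left, bottom, right = box
    return [[1 if top <= r <= bottom and left <= c <= right and value > threshold
             else 0
             for c, value in enumerate(row)]
            for r, row in enumerate(image)]

# Hypothetical 4x4 instance reference image with one bright object.
instance_reference_image = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.9, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]
box = detect_target(instance_reference_image, threshold=0.5)
mask = segment_instance(instance_reference_image, box, threshold=0.5)
```

The point of the toy is the data flow: the box produced by detection constrains where segmentation looks, and the resulting mask is the pixel-level separation of the target object from the background.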
[0126] Referring to Figure 6, in some embodiments, step S106 includes, but is not limited to, steps S601 to S603:
[0127] Step S601: Calculate the loss based on the image reference category and the image prediction category to obtain classification loss data;
[0128] Step S602: Calculate the loss of the image decoupling information to obtain decoupling loss data;
[0129] Step S603: The classification loss data and decoupling loss data are added together to obtain the target loss data.
[0130] In step S601 of some embodiments, the image reference category is the label information of the source domain image, while the target domain image has no label information. Therefore, the label information of the source domain image is used as the image reference category, and the image reference category and the image prediction category of the source domain image are used to calculate the classification loss data. The classification loss data can characterize the classification accuracy of the original image classification model for the source domain image.
[0131] It should be noted that the classification loss data is obtained by calculating the loss of the image reference category and the image prediction category through a preset loss function, and the loss function is the ELBO function. The calculation of the image reference category and the image prediction category by the ELBO function is shown in formula (1):
[0132]
[0133]
[0134] (1)
[0135]
[0136] In formula (1), the symbols denote, in order: the domain encoder, the semantic encoder, the instance encoder, the original decoder, the image domain vector, the image semantic vector, the image instance vector, the probability distribution of the image reference category, the probability distribution of the image prediction category, and the ELBO loss function.
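Since the expression of formula (1) is not reproduced here, the following sketch does not attempt to restate it. It only shows the commonly used negative-ELBO form for a VAE-style classifier, under stated assumptions: a cross-entropy term between the reference and predicted category distributions, plus one Kullback-Leibler term per encoder against a standard normal prior, with each latent vector described by a mean and a log-variance. The function names, the prior, and the equal weighting of the terms are assumptions of this sketch.

```python
import math

def cross_entropy(reference_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between the reference and predicted category distributions."""
    return -sum(p * math.log(q + eps)
                for p, q in zip(reference_probs, predicted_probs))

def gaussian_kl(mean, log_var):
    """KL( N(mean, exp(log_var)) || N(0, 1) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mean, log_var))

def negative_elbo(reference_probs, predicted_probs, latents):
    """Classification term plus one KL term per encoder.

    latents is a list of (mean, log_var) pairs, e.g. one each for the domain,
    semantic, and instance encoders (assumed parameterization).
    """
    classification_term = cross_entropy(reference_probs, predicted_probs)
    regularizer = sum(gaussian_kl(mean, log_var) for mean, log_var in latents)
    return classification_term + regularizer

# Hypothetical example: two categories, one latent that matches the prior.
loss = negative_elbo([1.0, 0.0], [0.5, 0.5], [([0.0], [0.0])])
```

Because only source domain images carry label information, a loss of this kind would be evaluated on source domain samples only, which is consistent with step S601.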
[0137] In step S602 of some embodiments, since the training images include source domain images and target domain images, the classification loss data alone cannot show whether the original image classification model can accurately classify target domain images, which have no label information. Furthermore, the subsequently constructed target image classification model is applied to unlabeled image data. Therefore, decoupling loss data is obtained by calculating the loss of the image decoupling information. Since the image decoupling information represents the image domain information, image semantic information, and image instance information contained in the image vector obtained by encoding the training image, the decoupling loss data is used to adjust the model parameters of the original image classification model so that it outputs discriminative image vectors after encoding the target image, thereby improving the accuracy of image classification.
[0138] In step S603 of some embodiments, target loss data is obtained by adding together the classification loss data and the decoupling loss data. The model parameters of the original image classification model are adjusted using the target loss data to construct a more accurate target image classification model.
[0139] In steps S601 to S603 of the embodiments of this application, classification loss data is obtained by calculating the loss based on the image reference category and the image prediction category. The classification loss data indicates how accurately the original image classification model classifies the source domain images. Then, the loss of the image decoupling information is calculated to obtain decoupling loss data. The decoupling loss data indicates whether the original image classification model extracts image vectors with strong discriminative power for images in different domains. The classification loss data and the decoupling loss data are then added together to obtain target loss data. The parameters of the original image classification model are adjusted based on the target loss data to construct a target image classification model with more accurate image classification.
[0140] Referring to Figure 7, in some embodiments, step S602 may include, but is not limited to, steps S701 to S704:
[0141] Step S701: Perform loss calculation on the image domain information to obtain the image domain loss value;
[0142] Step S702: Perform loss calculation on the image semantic information to obtain the image semantic loss value;
[0143] Step S703: Perform loss calculation on the image instance information to obtain the image instance loss value;
[0144] Step S704: The image domain loss value, image semantic loss value, and image instance loss value are merged to obtain decoupling loss data.
[0145] In step S701 of some embodiments, an image domain loss value is obtained by performing loss calculation on the image domain information. The accuracy of the image domain vector constructed by the domain encoder is judged based on the image domain loss value, so as to adjust the model parameters of the original image classification model according to the image domain loss value. This can not only make the image domain vector output by the domain encoder more accurate after adjusting the parameters, but also help the semantic encoder output the image semantic vector to be more discriminative, thereby improving the accuracy of image classification.
[0146] It should be noted that the image domain information is used to calculate the image domain loss value through the cross-entropy function, which simplifies the calculation of the image domain loss value.
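A minimal sketch of a cross-entropy domain loss follows. It assumes the discriminator outputs, for each training image, a probability that the image comes from the source domain, and that the domain label is 1 for source and 0 for target, matching the 1/0 convention of the image domain vector. The function name and the averaging over the batch are assumptions of this sketch.

```python
import math

def domain_loss(predicted_source_prob, domain_labels, eps=1e-12):
    """Mean binary cross-entropy between predicted and true image domains.

    predicted_source_prob[i]: discriminator's probability that image i is a
    source domain image; domain_labels[i]: 1 for source, 0 for target.
    """
    total = 0.0
    for p, y in zip(predicted_source_prob, domain_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(domain_labels)

# Hypothetical batch: one source image and one target image.
good = domain_loss([0.9, 0.1], [1, 0])  # discriminator mostly right
bad = domain_loss([0.1, 0.9], [1, 0])   # discriminator mostly wrong
```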
[0147] In step S702 of some embodiments, loss calculation is performed on the image semantic information. The image semantic information represents the structural information of a target vector set; that is, images in the same image domain and of the same image category share the same semantic information. Calculating the loss of the image semantic information therefore means calculating the difference between the image semantic information of different image domains for the same image category. The semantic encoder is then adjusted according to the image semantic loss value so that it outputs similar image semantic vectors for images of the same image category, thereby making the image classification more accurate.
[0148] Specifically, semantic information of images with the same image category but different image domains is obtained and defined as source domain semantic information and target domain semantic information, respectively. The cosine similarity between the image semantic vectors corresponding to the source domain semantic information and the target domain semantic information is calculated, that is, the cosine similarity between the cluster centers of the target domain images and the cluster centers of the source domain images of the same image category is calculated, and the cosine similarity is used as the image semantic loss value. This allows the semantic encoder in the original image classification model to be adjusted according to the image semantic loss value so that the semantic encoder encodes similar image semantic vectors for images of the same image category.
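The comparison of cluster centers across domains can be sketched as follows. Two points are assumptions of this sketch rather than statements of the application: the cluster center is taken as the mean of the vectors in a target vector set (the description uses the reference semantic vector as the center), and the quantity returned is 1 minus the cosine similarity, so that a smaller value means the source and target centers of the same category are more alike. All function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean, used here as the cluster center (assumption)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 for zero vectors by convention."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def semantic_loss(source_vectors, target_vectors):
    """1 - cosine similarity between source and target cluster centers.

    Both inputs hold image semantic vectors of the SAME image category,
    taken from the source domain and the target domain respectively.
    """
    source_center = mean_vector(source_vectors)
    target_center = mean_vector(target_vectors)
    return 1.0 - cosine_similarity(source_center, target_center)

# Hypothetical semantic vectors for one category in each domain.
aligned = semantic_loss([[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]])
misaligned = semantic_loss([[1.0, 0.0]], [[0.0, 1.0]])
```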
[0149] In step S703 of some embodiments, the image instance information is used to calculate the image instance loss value, and the image instance loss value represents the accuracy of the parser in segmenting the image instance. The original image classification model is then adjusted according to the image instance loss value, so the image instance vector output by the instance encoder of the original image classification model is more accurate. The image instance vector can effectively reduce the noise influence in the image classification process, thereby improving the classification accuracy of the image classification model.
[0150] It should be noted that the specific formula for calculating the loss using image instance information is shown in formula (2):
[0151] (2)
[0152] In formula (2), the symbols denote the image instance loss value and the set of latent variables representing all images in the source domain and target domain images. This loss helps the parser output discriminative image instance information, thereby enabling the separation of image instance information while ensuring that the intrinsic characteristics of each instance are preserved.
[0153] In step S704 of some embodiments, decoupling loss data is obtained by merging image domain loss value, image semantic loss value and image instance loss value, which makes the construction of decoupling loss data simple.
[0154] Specifically, the formula for calculating the decoupling loss data is shown in formula (3):
[0155] (3)
[0156] In formula (3), the terms are the image domain loss value, the image semantic loss value, and the image instance loss value.
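How the individual loss values might be merged can be sketched as below. The expression of formula (3) is not reproduced here, so the plain sum with optional weights is an assumption of this sketch, as are the function names and the example values. The second function follows the description of step S107, in which the classification loss data and the decoupling loss data are added together.

```python
def decoupling_loss(domain_loss, semantic_loss, instance_loss,
                    weights=(1.0, 1.0, 1.0)):
    """Merge the three loss values; equal weights are an assumption."""
    w_domain, w_semantic, w_instance = weights
    return (w_domain * domain_loss
            + w_semantic * semantic_loss
            + w_instance * instance_loss)

def target_loss(classification_loss, decoupling):
    """Target loss data: classification loss plus decoupling loss."""
    return classification_loss + decoupling

# Hypothetical loss values for one training batch.
decoupling = decoupling_loss(0.2, 0.3, 0.5)
total = target_loss(0.7, decoupling)
```

Exposing the weights makes it possible to emphasize one term, for example the semantic loss, without changing the rest of the training loop.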
[0157] In steps S701 to S704 of the embodiments of this application, an image domain loss value is obtained by performing loss calculation on the image domain information. The image domain loss value characterizes whether the image domain vector output by the domain encoder is accurate. An image semantic loss value is obtained by performing loss calculation on the image semantic information. The image semantic loss value can characterize whether the image semantic vectors encoded by the semantic encoder for different image domains but the same image category are the same, and can determine whether a discriminative image semantic vector is encoded. An image instance loss value is obtained by performing loss calculation on the image instance information. The image instance loss value can characterize whether the instance segmentation of the training image is accurate. Finally, the image domain loss value, the image semantic loss value, and the image instance loss value are merged to obtain decoupling loss data, so as to adjust the parameters of the original image classification model according to the decoupling loss data, so as to construct a target image classification model with more accurate image classification.
[0158] In step S107 of some embodiments, the classification loss data and decoupling loss data are added together to obtain the target loss data, and the parameters of the original image classification model are adjusted according to the target loss data to obtain the target image classification model. Specifically, the semantic encoder in the original image classification model is adjusted using the target loss data to obtain the target encoder, so that the target encoder can output a discriminative image semantic vector, thereby improving the accuracy of image classification. The target image classification model consists of a target encoder and a target decoder, and the target image input to it does not carry label information. It is therefore not necessary to extract the image domain features of the target image; only its image semantic features are extracted. Since the adjusted encoder can output a discriminative image semantic vector, the target image category obtained by the target decoder from the image semantic vector is also more accurate.
[0159] Referring to Figure 8, in some embodiments, the target image features include target semantic features, and step S108 may include, but is not limited to, steps S801 to S803:
[0160] Step S801: Input the target image features into the target image classification model; wherein, the target image classification model includes: a target encoder and a target decoder;
[0161] Step S802: The target semantic features are encoded by the target encoder to obtain the target semantic vector;
[0162] Step S803: The target semantic vector is decoded by the target decoder to obtain the target image category.
[0163] In step S801 of some embodiments, a target image classification model including a target encoder and a target decoder has been constructed, where the target encoder is the semantic encoder with adjusted parameters. The target image is input to a feature extractor to extract target semantic features, and the target semantic features are input to the target image classification model for image classification.
[0164] In step S802 of some embodiments, the target semantic features are encoded by the target encoder to obtain a target semantic vector. The target encoder is the semantic encoder with adjusted parameters, so it can encode the target semantic features into a discriminative target semantic vector. In addition, because the parameters were also adjusted according to the instance loss value of the image instance information during training of the semantic encoder, the target semantic vector it outputs is less affected by noise. Therefore, the constructed target semantic vector can improve the accuracy of image classification.
[0165] In step S803 of some embodiments, the target semantic vector is decoded by the target decoder, and the decoding operation of the target decoder is the same as that of the original decoder. That is, image category recognition is performed on the target semantic vector to obtain the target image category, so that the classification of the target image is more accurate.
[0166] In steps S801 to S803 of the embodiments of this application, the target semantic features are input into the target image classification model, and the target image classification model includes a target encoder and a target decoder. The target semantic features are encoded by the target encoder to obtain a discriminative target semantic vector, and the target semantic vector is decoded by the target decoder to obtain the target image category by image category recognition based on the target semantic vector, so that the target image classification is more accurate.
[0167] Referring to Figure 2, in this embodiment medical images from the medical field are used as training images. The training images are input to a feature extractor to extract image domain features, image semantic features, and image instance features. The image domain features are input to a domain encoder, the image semantic features to a semantic encoder, and the image instance features to an instance encoder. The domain encoder encodes the image domain features to obtain image domain vectors; the semantic encoder encodes the image semantic features to obtain image semantic vectors; and the instance encoder encodes the image instance features to obtain image instance vectors. The image domain vector, image semantic vector, and image instance vector are then integrated to obtain training image vectors, which are input into the original decoder and decoded to obtain the image prediction category. Simultaneously, the training image vectors are input into a decoupling model, where a discriminator extracts image domain information from the image domain vectors, indicating which image domain the training image originates from. This image domain information is used as a filtering condition, and a similarity calculation layer filters the image semantic vectors to obtain reference semantic vectors. The cosine of the angle between the reference semantic vector and each image semantic vector with the same image domain information is calculated as similarity data. Using the similarity data as clustering parameters and the reference semantic vector as a cluster center, a clustering layer clusters the image semantic vectors to obtain a target vector set. The semantic information of the reference semantic vectors in the target vector set is then obtained as image semantic information. A reconstruction layer reconstructs the image instance vectors to obtain instance reference images.
A target detection layer then detects the target objects in the instance reference images to obtain image target information. Based on this image target information, an instance segmentation layer performs image instance segmentation on the instance reference images to obtain image instance information. An image domain loss value is calculated from the image domain information, an image semantic loss value from the image semantic information, and an image instance loss value from the image instance information. These loss values are then merged to obtain decoupling loss data. Classification loss data is calculated from the image reference category and the image prediction category. Finally, the classification loss data and decoupling loss data are concatenated to obtain target loss data. The target loss data is used to adjust the parameters of the semantic encoder in the original image classification model, enabling it to output discriminative image semantic vectors. The target encoder encodes the target semantic features to obtain discriminative target semantic vectors, which are then decoded by the target decoder to determine the target image category, resulting in more accurate target image classification. Therefore, by constructing a target image classification model that combines image domain information, image semantic information, and image instance information with contrastive learning, clustering, and instance segmentation methods, it is possible to accurately classify abnormal medical images and more precisely classify normal medical images.
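The domain-filtered similarity measurement described in this paragraph can be sketched as follows. The function names, the string domain labels, and the choice to return index and similarity pairs are assumptions for illustration; the application states only that image domain information serves as the filtering condition and that the cosine between the reference semantic vector and same-domain semantic vectors is the similarity data.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarities_within_domain(
    reference: Sequence[float],
    reference_domain: str,
    semantic_vectors: Sequence[Sequence[float]],
    domains: Sequence[str],
) -> List[Tuple[int, float]]:
    """Compare the reference vector only with semantic vectors from the same domain.

    Returns (index, similarity) pairs; vectors from other domains are filtered out.
    """
    return [
        (index, cosine_similarity(reference, vector))
        for index, (vector, domain) in enumerate(zip(semantic_vectors, domains))
        if domain == reference_domain
    ]


# Illustrative data: two source-domain vectors and one target-domain vector.
pairs = similarities_within_domain(
    reference=[1.0, 0.0],
    reference_domain="source",
    semantic_vectors=[[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]],
    domains=["source", "target", "source"],
)
```

The resulting similarity values could then be passed to a clustering step as its parameters; that step is omitted here because the application does not name a specific clustering algorithm.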
[0168] Referring to Figure 9, this application also provides an image classification apparatus that can implement the above-described image classification method. The apparatus includes:
[0169] The feature acquisition module is used to acquire training image features; the training image features include: image domain features, image semantic features, and image instance features.
[0170] The feature input module is used to input the training image features into a preset original image classification model; wherein, the original image classification model includes: a domain encoder, a semantic encoder, an instance encoder, and a decoder;
[0171] The encoding module is used to encode the image domain features through a domain encoder to obtain an image domain vector, to encode the image semantic features through a semantic encoder to obtain an image semantic vector, and to encode the image instance features through an instance encoder to obtain an image instance vector.
[0172] The decoding module is used to decode the image domain vector, image semantic vector, and image instance vector through the decoder to obtain the image prediction category;
[0173] The decoupling module is used to input the image domain vector, image semantic vector and image instance vector into the preset decoupling model for decoupling processing to obtain image decoupling information;
[0174] The loss calculation module is used to calculate the target loss data based on the preset image reference category, image prediction category and image decoupling information.
[0175] The parameter adjustment module is used to adjust the parameters of the original image classification model based on the target loss data to obtain the target image classification model;
[0176] The image classification module is used to input the acquired target image features into the target image classification model for image classification processing to obtain the target image category.
[0177] The specific implementation of this image classification device is basically the same as the specific implementation of the image classification method described above, and will not be repeated here.
[0178] This application also provides an electronic device, which includes a memory and a processor. The memory stores a computer program, and the processor executes the computer program to implement the above-described image classification method. This electronic device can be any smart terminal, including tablet computers, in-vehicle computers, etc.
[0179] Referring to Figure 10, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes:
[0180] The processor 1001 can be implemented using a general-purpose CPU (Central Processing Unit), microprocessor, application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided in the embodiments of this application.
[0181] The memory 1002 can be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1002 can store the operating system and other application programs. When the technical solutions provided in the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1002 and is called by the processor 1001 to execute the image classification method of the embodiments of this application.
[0182] The input/output interface 1003 is used to implement information input and output;
[0183] The communication interface 1004 is used to enable communication and interaction between this device and other devices. Communication can be achieved through wired means (such as USB, network cable, etc.) or wireless means (such as mobile network, Wi-Fi, Bluetooth, etc.).
[0184] The bus 1005 transmits information between the various components of the device (e.g., processor 1001, memory 1002, input/output interface 1003, and communication interface 1004);
[0185] The processor 1001, memory 1002, input/output interface 1003, and communication interface 1004 are connected to each other within the device via the bus 1005.
[0186] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described image classification method.
[0187] Memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs and non-transitory computer-executable programs. Furthermore, memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory may optionally include memory remotely located relative to the processor, and these remote memories can be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
[0188] The image classification method, apparatus, electronic device, and storage medium provided in this application acquire image domain features, image semantic features, and image instance features. These features are then input into a preset original image classification model. A domain encoder encodes the image domain features to obtain an image domain vector, a semantic encoder encodes the image semantic features to obtain an image semantic vector, and an instance encoder encodes the image instance features to obtain an image instance vector. An original decoder decodes the image domain vector, image semantic vector, and image instance vector to obtain an image prediction category. The image domain vector, image semantic vector, and image instance vector are then input into a preset decoupling model for decoupling to obtain image decoupling information. Loss calculation is performed on the preset image reference category, the image prediction category, and the image decoupling information to obtain target loss data. The parameters of the original image classification model are adjusted based on the target loss data to construct a target image classification model that can classify images more accurately. The acquired target image features are then input into the target image classification model for image classification to obtain a more accurate target image category, thereby achieving accurate classification of medical images.
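How the target loss data might be assembled from its parts can be sketched as below. Several choices here are assumptions rather than statements of the application: cross-entropy is used as the classification loss, the domain, semantic, and instance loss values are merged by simple addition, and the classification and decoupling losses are combined by a weighted sum (the application describes them as being concatenated, without giving a formula). The names `cross_entropy`, `TargetLoss`, and `decoupling_weight` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def cross_entropy(probabilities: Sequence[float], reference_index: int) -> float:
    """Assumed classification loss: negative log-probability of the reference category."""
    return -math.log(max(probabilities[reference_index], 1e-12))


@dataclass(frozen=True)
class TargetLoss:
    """Holds the individual loss values so each component stays inspectable."""

    classification: float
    domain: float
    semantic: float
    instance: float
    decoupling_weight: float = 1.0  # hypothetical balancing factor

    @property
    def decoupling(self) -> float:
        # Assumed merge rule: sum of the three decoupling loss values.
        return self.domain + self.semantic + self.instance

    @property
    def total(self) -> float:
        # Assumed combination rule: weighted sum of classification and decoupling losses.
        return self.classification + self.decoupling_weight * self.decoupling


# Illustrative values only.
loss = TargetLoss(
    classification=cross_entropy([0.5, 0.5], reference_index=0),
    domain=0.1,
    semantic=0.2,
    instance=0.3,
)
```

Keeping the components in one record, rather than collapsing them immediately into a single number, stays close to the idea of concatenated loss data while still yielding a scalar (`loss.total`) that a parameter update could use.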
[0189] The embodiments described in this application are for the purpose of more clearly illustrating the technical solutions of the embodiments of this application, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. As those skilled in the art will know, with the evolution of technology and the emergence of new application scenarios, the technical solutions provided by the embodiments of this application are also applicable to similar technical problems.
[0190] Those skilled in the art will understand that the technical solutions shown in the figures do not constitute a limitation on the embodiments of this application, and may include more or fewer steps than shown, or combine certain steps, or include different steps.
[0191] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs.
[0192] Those skilled in the art will understand that all or some of the steps in the methods disclosed above, as well as the functional modules/units in the systems and devices, can be implemented as software, firmware, hardware, or suitable combinations thereof.
[0193] The terms “first,” “second,” “third,” “fourth,” etc. (if present) in the specification and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms “comprising” and “having,” and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0194] It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes the relationship between related objects and indicates that three relationships can exist. For example, "A and/or B" can represent three cases: only A exists, only B exists, and both A and B exist, where A and B can be singular or plural. The character "/" generally indicates that the preceding and following related objects are in an "or" relationship. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one (item) of a, b, or c can represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
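The seven cases listed for "at least one (item) of a, b, or c" are the non-empty combinations of the three items. As a small illustration (the helper name `at_least_one` is hypothetical), they can be enumerated with the standard library:

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def at_least_one(items: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every non-empty combination of the given items, smallest first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


cases = at_least_one(["a", "b", "c"])
# Seven cases: a, b, c, a and b, a and c, b and c, a and b and c.
```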
[0195] In the several embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative; for instance, the division of the units described above is only a logical functional division, and in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
[0196] The units described above as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0197] Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or as a software functional unit.
[0198] If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes multiple instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods of the various embodiments of this application. The aforementioned storage medium includes various media capable of storing programs, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0199] The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims of the present application. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and substance of the embodiments of the present application shall be within the scope of the claims of the present application.
Claims
1. An image classification method, characterized in that, The method includes: Acquiring training image features; wherein the training image features include: image domain features, image semantic features, and image instance features; wherein the image domain features include source domain image features and target domain image features, and the source domain image features represent the presence of label information, and the target domain image features represent the absence of label information; The training image features are input into a preset original image classification model; wherein, the original image classification model includes: a domain encoder, a semantic encoder, an instance encoder, and an original decoder; The image domain features are encoded by the domain encoder to obtain an image domain vector; the image semantic features are encoded by the semantic encoder to obtain an image semantic vector; and the image instance features are encoded by the instance encoder to obtain an image instance vector. The image domain vector, the image semantic vector, and the image instance vector are decoded by the original decoder to obtain the image prediction category; wherein, the image domain vector represents the source domain image feature or the target domain image feature, the image semantic vector is obtained by selecting semantic vectors according to the image semantic features and concatenating the selected semantic vectors, and the image instance vector includes the target object of the training image; The image domain vector, the image semantic vector, and the image instance vector are input into a preset decoupling model for decoupling processing to obtain image decoupling information. The image decoupling information includes: image domain information, image semantic information, and image instance information.
The image domain information represents whether the training image belongs to the source domain or the target domain. The image semantic information is used to determine whether the semantic information of images from different domains but with the same image category is the same. The image instance information represents the foreground and background in the training image. Loss calculation is performed based on the preset image reference category, the image prediction category, and the image decoupling information to obtain target loss data; The parameters of the original image classification model are adjusted based on the target loss data to obtain the target image classification model. The acquired target image features are input into the target image classification model for image classification processing to obtain the target image category; Loss calculation is performed based on the preset image reference category, the image prediction category, and the image decoupling information to obtain target loss data, including: The loss is calculated based on the image reference category and the image prediction category to obtain classification loss data; Loss calculation is performed on the image decoupling information to obtain decoupling loss data; The classification loss data and the decoupling loss data are concatenated to obtain the target loss data.
2. The method according to claim 1, characterized in that, The decoupling model includes a discriminator, a classifier, and a parser; the process of inputting the image domain vector, the image semantic vector, and the image instance vector into the preset decoupling model for decoupling processing to obtain image decoupling information includes: The image domain information is obtained by extracting domain information from the image domain vector using the discriminator. The semantic information of the image is obtained by extracting semantic information from the semantic vector of the image using the classifier; The parser extracts instance information from the image instance vector to obtain the image instance information.
3. The method according to claim 2, characterized in that, The classifier includes: a similarity calculation layer, a clustering layer, and a semantic extraction layer; the process of extracting semantic information from the image semantic vector using the classifier to obtain image semantic information includes: Using the image domain information as a filtering condition, the image semantic vector is filtered through the similarity calculation layer to obtain a reference semantic vector; The similarity calculation layer performs similarity measurement on the image semantic vector and the reference semantic vector to obtain similarity data. The image semantic vectors are clustered through the clustering layer to obtain a target vector set; wherein the reference semantic vectors serve as cluster centers and the similarity data serve as clustering parameters. The semantic extraction layer obtains the structural information of the target vector set, thereby obtaining the image semantic information.
4. The method according to claim 2, characterized in that, The parser includes a reconstruction layer, an object detection layer, and an instance segmentation layer; the step of extracting instance information from the image instance vector using the parser to obtain image instance information includes: The image instance vector is reconstructed using the reconstruction layer to obtain an instance reference image; The target detection layer is used to perform target detection on the instance reference image to obtain image target information; Using the image target information as segmentation parameters, the instance reference image is segmented through the instance segmentation layer to obtain image instance information.
5. The method according to claim 2, characterized in that, The step of performing loss calculation on the image decoupling information to obtain decoupling loss data includes: The image domain information is subjected to loss calculation to obtain the image domain loss value; The semantic information of the image is subjected to loss calculation to obtain the image semantic loss value; Loss calculation is performed on the image instance information to obtain the image instance loss value; The image domain loss value, the image semantic loss value, and the image instance loss value are merged to obtain the decoupling loss data; wherein, the image domain loss value is used to characterize the accuracy of the image domain vector constructed by the domain encoder, the image semantic loss value characterizes the difference between the image semantic information of different image domains of the same image category, and the image instance loss value characterizes the accuracy of the parser in performing image instance segmentation.
6. The method according to any one of claims 1 to 5, characterized in that, The target image features include target semantic features; The step of inputting the acquired target image features into the target image classification model for image classification processing to obtain the target image category includes: The target image features are input into the target image classification model; wherein, the target image classification model includes: a target encoder and a target decoder; The target semantic features are encoded by the target encoder to obtain the target semantic vector; The target semantic vector is decoded by the target decoder to obtain the target image category.
7. An image classification device, characterized in that, The device includes: The feature acquisition module is used to acquire training image features; wherein the training image features include: image domain features, image semantic features, and image instance features; wherein the image domain features include source domain image features and target domain image features, and the source domain image features represent the presence of label information, and the target domain image features represent the absence of label information; The feature input module is used to input the training image features into a preset original image classification model; wherein, the original image classification model includes: a domain encoder, a semantic encoder, an instance encoder, and a decoder; An encoding module is used to encode the image domain features using a domain encoder to obtain an image domain vector, to encode the image semantic features using a semantic encoder to obtain an image semantic vector, and to encode the image instance features using an instance encoder to obtain an image instance vector. The image domain vector represents the source domain image features or the target domain image features. The image semantic vector is obtained by selecting a semantic vector from the semantic vectors based on the image semantic features and concatenating the selected semantic vectors. The image instance vector includes the target object of the training image. 
The decoding module is used to decode the image domain vector, the image semantic vector, and the image instance vector through the decoder to obtain the image prediction category; A decoupling module is used to input the image domain vector, the image semantic vector, and the image instance vector into a preset decoupling model for decoupling processing to obtain image decoupling information; wherein, the image decoupling information includes: image domain information, image semantic information, and image instance information, the image domain information represents whether the training image belongs to the source domain or the target domain, the image semantic information is used to determine whether the semantic information of images from different domains but with the same image category is the same, and the image instance information represents the foreground and background in the training image; The loss calculation module is used to perform loss calculation based on the preset image reference category, the image prediction category and the image decoupling information to obtain target loss data; The parameter adjustment module is used to adjust the parameters of the original image classification model according to the target loss data to obtain the target image classification model.
The image classification module is used to input the acquired target image features into the target image classification model for image classification processing to obtain the target image category; Loss calculation is performed based on the preset image reference category, the image prediction category, and the image decoupling information to obtain target loss data, including: The loss is calculated based on the image reference category and the image prediction category to obtain classification loss data; Loss calculation is performed on the image decoupling information to obtain decoupling loss data; The classification loss data and the decoupling loss data are concatenated to obtain the target loss data.
8. An electronic device, characterized in that, The electronic device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the image classification method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by a processor, it implements the image classification method according to any one of claims 1 to 6.