A cross-domain few-shot classification method and system based on local target data augmentation
Local target data augmentation and contrastive learning are used to address the poor generalization of cross-domain few-shot classification models in the target domain and to improve the model's ability to discriminate target domain data.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- JIANGXI NORMAL UNIV
- Filing Date
- 2024-09-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing few-shot classification models generalize poorly during cross-domain learning, especially when there is a significant domain shift between the target domain and the training source domain.
A local target data augmentation method is adopted: image data is processed through cut-and-paste blending and random data augmentation, the relationship between the predictions for the augmented image and the original image is analyzed, and a contrastive learning method compares their local features in order to improve the model's ability to discriminate the target domain.
The method improves the generalization ability of cross-domain few-shot classification models and their ability to discriminate data in the target domain.
Figure CN119131506B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image recognition processing, and in particular to a cross-domain few-shot classification method and system based on local target data augmentation.
Background Technology
[0002] With the rapid development of computer vision, few-shot image classification has become an important technique for image classification tasks in new or specialized fields, because it can achieve good results from a small number of labeled samples when labeled information is limited.
[0003] In existing technologies, when few-shot image classification models are used for cross-domain learning, multiple source domains are usually introduced to optimize the model for a specific domain. However, when there is a significant domain shift between the target domain and the training source domain, model performance degrades substantially. Because of this domain gap, existing cross-domain few-shot models generalize poorly to the target domain.
[0004] Therefore, how to design a few-shot classification method that improves the generalization of few-shot classification models to the target domain during cross-domain learning is a key question.
Summary of the Invention
[0005] Based on this, the purpose of this invention is to provide a cross-domain few-shot classification method and system based on local target data augmentation. A cut-and-mix method performs local data augmentation on image data from the source domain and a small amount of image data from the target domain, so that the final few-shot classification model adapts better to the target domain. The two sets of query images (source domain and target domain) are also randomly augmented and then cut and mixed to obtain augmented query images. The relationship between the predictions for the augmented image and the original image is analyzed, and the relationship between their local features is compared using a contrastive learning method, which improves the final model's ability to discriminate target domain data and the generalization ability of cross-domain few-shot classification.
[0006] This invention proposes a cross-domain few-shot classification method based on local target data augmentation, comprising:
[0007] Obtain a query image dataset and a support image dataset. The query image dataset includes source domain query image data and target domain query image data. The support image dataset includes source domain support image data and target domain support image data. Perform random spatial data augmentation on the query image dataset to obtain an enhanced query image dataset. The enhanced query image dataset includes source domain enhanced query image data and target domain enhanced query image data.
[0008] The query image dataset and the enhanced query image dataset are input into the cropping and blending module. The query image dataset is cropped and blended to obtain the original image data of the query set. The enhanced query image dataset is cropped and blended to obtain the enhanced image data of the query set.
[0009] The original image data of the query set, the enhanced image data of the query set, and the supporting image dataset are input into the feature processing module to obtain domain-specific features and domain-independent features. Domain labels are obtained based on the domain-specific features and then input into the domain classifier to obtain domain classification scores.
[0010] The domain-independent features are input into the few-shot classification module for classification prediction, and local features are then optimized based on the local feature contrastive learning algorithm.
[0011] In summary, the cross-domain few-shot classification method based on local target data augmentation uses a cut-and-mix step to perform local data augmentation on image data from the source domain together with a small amount of image data from the target domain, so that the final few-shot classification model adapts better to the target domain. The two sets of query images are also randomly augmented and then cut and mixed to obtain augmented query images; the relationship between the predictions for the augmented and original images is analyzed, and a contrastive learning method compares their local features, which improves the model's ability to discriminate target domain data. Specifically, a query image dataset and a support image dataset are acquired. The query image dataset includes source domain query image data and target domain query image data, and the support image dataset includes source domain support image data and target domain support image data. Random spatial data augmentation is performed on the query image dataset to obtain an enhanced query image dataset, which includes source domain enhanced query image data and target domain enhanced query image data; this enriches the augmented views of each image without introducing significant semantic differences between the enhanced and original images. The query image dataset and the enhanced query image dataset are input into the cropping and blending module: cropping and blending the query image dataset yields the original image data of the query set, and cropping and blending the enhanced query image dataset yields the enhanced image data of the query set. The original image data of the query set, the enhanced image data of the query set, and the support image dataset are input into the feature processing module to obtain domain-specific features and domain-independent features. Domain labels are obtained based on the domain-specific features and input into the domain classifier to obtain a domain classification score. The domain-independent features are input into the few-shot classification module for classification prediction. Finally, local features are optimized with the local feature contrastive learning algorithm, which improves the generalization ability of cross-domain few-shot classification.
[0012] Furthermore, the steps of cropping and blending based on the query image dataset to obtain the original image data of the query set, and cropping and blending based on the enhanced query image dataset to obtain the enhanced image data of the query set, specifically include:
[0013] Random spatial data augmentation is performed on the query image dataset to obtain an enhanced query image dataset. The augmentation includes flipping or inverting the images in the query image dataset, either as a whole or within the image.
[0014] A single image region is selected from the source domain query image data within the query image dataset. This image region is then replaced with the corresponding image region from the target domain query image data within the query image dataset to generate the original query set image data through a cut-and-mix process. The formula for generating the original query set image data through the cut-and-mix process is as follows:
[0015]
[0016]
[0017] where the symbols denote, in order: the query set original image data; the ratio of the image side lengths; the source domain query image data; the target domain query image data; the label data of the query set original image data; the label data of the source domain query image data; and the label data of the target domain query image data;
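The formula bodies for [0015] and [0016] are missing from this text. As a hedged sketch only, a CutMix-style form consistent with the definitions in [0017] could look as follows. All symbols are assumptions introduced here, not the patent's notation: $x_q^{s}$ and $x_q^{t}$ are the source and target domain query images, $y_q^{s}$ and $y_q^{t}$ their labels, $M$ is a binary mask that is 0 inside the replaced region and 1 elsewhere, and $\lambda \in [0,1]$ is the weight of the source label, assumed to be derived from the ratio of the image side lengths.

```latex
\tilde{x}_q = M \odot x_q^{s} + (1 - M) \odot x_q^{t},
\qquad
\tilde{y}_q = \lambda\, y_q^{s} + (1 - \lambda)\, y_q^{t}
```

How the side-length ratio maps to the label weight cannot be recovered from the text, so the second equation should be read as one possible arrangement.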
[0018] The enhanced query image dataset is then cropped and blended to generate query set enhanced image data.
[0019] Furthermore, the step of inputting the original image data of the query set, the enhanced image data of the query set, and the supporting image dataset into the feature processing module to obtain domain-specific features and domain-independent features specifically includes:
[0020] The query set raw image data, query set enhanced image data, and supporting image dataset are input into the feature processing module, which includes a twin feature extraction block and a feature decoupling block.
[0021] The first feature extraction block of the twin feature extraction block extracts features from the original image data of the query set and the source domain support image data and target domain support image data in the support image dataset. Then, all the extracted image features are input into the feature decoupling block to decompose the image features into domain-specific features and domain-independent features. The domain-specific features include source domain support image-specific features, target domain support image-specific features, and query set original image-specific features. The domain-independent features include source domain support image-independent features, target domain support image-independent features, and query set original image-independent features.
[0022] The second feature extraction block of the twin feature extraction block extracts features from the query set enhanced image data, and all the extracted image features are then input into the feature decoupling block, which decomposes them to obtain the domain-independent features of the query set enhanced images. A feature perturbation algorithm is applied to mitigate overfitting, as follows:
[0023]
[0024] where the symbols denote, in order: the domain-independent features of the query set original images; the domain-independent features of the query set enhanced images; and the feature perturbation term, which includes noise sampled from a normal distribution and is used to perturb the domain-independent features of the query set enhanced images;
[0025] Domain labels are obtained based on the domain-specific features, and then input into a domain classifier to obtain a domain classification score.
[0026] The domain-independent features are then input into the few-shot classification module for classification prediction.
[0027] Furthermore, the step of obtaining domain labels based on domain-specific features and inputting them into a domain classifier to obtain domain classification scores specifically includes:
[0028] Obtain the domain label of the source domain support image-specific features and the domain label of the target domain support image-specific features, and from them calculate the domain label of the query set original image-specific features; the calculation uses the ratio of the image side lengths;
[0029] Input all domain-specific features and their corresponding domain labels into the domain classifier and calculate the domain classification score;
[0030] Loss optimization is performed based on a domain classification loss function, which is as follows:
[0031]
[0032]
[0033] where the symbols denote, in order: the domain classification loss; the cross-entropy loss function; the source domain support image-specific features; the target domain support image-specific features; the query set original image-specific features; and the domain classifier.
[0034] Furthermore, the step of inputting the domain-independent features into the few-shot classification module for classification prediction specifically includes:
[0035] Input the source domain support image-independent features, the target domain support image-independent features, and the query set original image-independent features into the few-shot classification module;
[0036] The query set original image-independent features are classified in a few-shot manner against the source domain support image-independent features and against the target domain support image-independent features, respectively.
[0037] The predicted distributions in the source domain and the target domain are calculated separately, and the loss is optimized according to the few-shot classification loss function, which is as follows:
[0038]
[0039] where the symbols denote, in order: the few-shot classification loss; the ratio of the image side lengths; the cross-entropy loss function; the predicted distribution of the query set original image-independent features over the source domain; the predicted distribution of the query set original image-independent features over the target domain; the few-shot classification labels of the source domain support images; and the few-shot classification labels of the target domain support images.
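The formula body for [0038] is missing from this text. A mixed-label cross-entropy is one form that would match the definitions in [0039]; the notation below is assumed here rather than taken from the patent. With $\lambda$ the ratio, $p^{s}$ and $p^{t}$ the predicted distributions of the query set original image-independent features over the source and target domains, and $y^{s}$, $y^{t}$ the corresponding few-shot classification labels:

```latex
\mathcal{L}_{\mathrm{fsl}}
  = \lambda\, \mathrm{CE}\!\left(p^{s}, y^{s}\right)
  + (1 - \lambda)\, \mathrm{CE}\!\left(p^{t}, y^{t}\right)
```

The weighting mirrors the way a cut-and-mixed image carries content from both domains; whether the patent weights the two terms exactly this way is not recoverable from the text.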
[0040] Furthermore, the few-shot classification module also includes:
[0041] A few-shot self-supervised learning block. The domain-independent features of the query set enhanced images are input into the few-shot self-supervised learning block, which performs self-supervised enhancement of few-shot classification based on the few-shot classification labels of the query set original images, so that different regions of each image remain semantically consistent. Loss optimization is then performed using the few-shot self-supervised learning loss function, which is as follows:
[0042]
[0043] where the symbols denote, in order: the few-shot self-supervised learning loss; the predicted distribution of the query set enhanced image-independent features over the source domain; and the predicted distribution of the query set enhanced image-independent features over the target domain.
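The formula body for [0042] is missing from this text. Because [0041] supervises the predictions for the enhanced images with the few-shot classification labels of the original query images, one plausible form is a cross-entropy on the enhanced-image predictions. The symbols are assumptions made here: $\hat{p}^{s}$ and $\hat{p}^{t}$ are the predicted distributions of the query set enhanced image-independent features over the source and target domains, $y^{s}$ and $y^{t}$ are the few-shot classification labels, and $\lambda$ is the mixing ratio.

```latex
\mathcal{L}_{\mathrm{ssl}}
  = \lambda\, \mathrm{CE}\!\left(\hat{p}^{s}, y^{s}\right)
  + (1 - \lambda)\, \mathrm{CE}\!\left(\hat{p}^{t}, y^{t}\right)
```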
[0044] Furthermore, the step of optimizing local features based on the local feature contrastive learning algorithm specifically includes:
[0045] Local slicing is performed on the original image data and the enhanced image data of the query set to obtain the original local image and the enhanced local image of the query set.
[0046] The local image features of the original local image and the enhanced local image of the query set are extracted according to the twin feature extraction block in the feature processing module, and then the local image features are decomposed according to the feature decoupling block to obtain the corresponding domain-independent features.
[0047] Randomly select any query set original local image as the anchor, and take all local feature samples of images of the same class among the query set enhanced local images as positive samples. Compute the cosine distance between the anchor and the local feature at each corresponding position of the positive samples, so as to compare how the anchor and the positive local features are distributed in feature space. At the same time, take all local feature samples of any image of a different class among the query set enhanced local images as negative samples, and compare the distance between the anchor and the negative local features in feature space.
[0048] The loss is optimized using a local feature contrastive learning loss function, which is as follows:
[0049]
[0050] where the symbols denote, in order: the local feature contrastive learning loss; the number of local features; the index of a local feature; the cosine distance; the local features of the anchor; the local features of the positive samples; the local features of the negative samples; the margin parameter; the temperature parameter; and the maximum function.
[0051] This invention proposes a cross-domain few-shot classification system based on local target data augmentation, comprising:
[0052] The image acquisition and processing module is used to acquire a query image dataset and a support image dataset. The query image dataset includes source domain query image data and target domain query image data. The support image dataset includes source domain support image data and target domain support image data. The module performs random spatial data augmentation processing on the query image dataset to obtain an enhanced query image dataset. The enhanced query image dataset includes source domain enhanced query image data and target domain enhanced query image data.
[0053] The cut-and-blend module is used to receive the query image dataset and the enhanced query image dataset, cut and blend the query image dataset to obtain the original image data of the query set, and cut and blend the enhanced query image dataset to obtain the enhanced image data of the query set;
[0054] The feature processing module is used to receive the original image data of the query set, the enhanced image data of the query set, and the support image dataset, obtain domain-specific features and domain-independent features, obtain domain labels based on the domain-specific features, and input them into the domain classifier to obtain domain classification scores;
[0055] The classification optimization module is used to input the domain-independent features into the few-shot classification module for classification prediction, and then optimize the local features based on the local feature contrastive learning algorithm.
[0056] The present invention also provides a storage medium that stores one or more programs which, when executed by a processor, implement the cross-domain few-shot classification method based on local target data augmentation as described above.
[0057] The present invention also provides a computer device, the computer device including a memory and a processor, wherein:
[0058] The memory is used to store computer programs;
[0059] When the processor executes the computer program stored in the memory, it implements the cross-domain few-shot classification method based on local target data augmentation as described above.
Attached Figure Description
[0060] Figure 1 is a flowchart of the cross-domain few-shot classification method based on local target data augmentation proposed in the first embodiment of the present invention;
[0061] Figure 2 is a flowchart of the cross-domain few-shot classification method based on local target data augmentation proposed in the second embodiment of the present invention;
[0062] Figure 3 is a schematic diagram of the cross-domain few-shot classification system based on local target data augmentation proposed in the third embodiment of the present invention.
[0063] The following detailed description, in conjunction with the accompanying drawings, will further illustrate the present invention.
Detailed Implementation
[0064] To facilitate understanding of the present invention, a more complete description will be given below with reference to the accompanying drawings. Several embodiments of the invention are illustrated in the drawings. However, the invention can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
[0065] It should be noted that when a component is said to be "fixed to" another component, it can be directly on the other component or there may be an intervening component. When a component is said to be "connected to" another component, it can be directly connected to the other component or there may be an intervening component. The terms "vertical," "horizontal," "left," "right," and similar expressions used in this document are for illustrative purposes only.
[0066] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and / or" as used herein includes any and all combinations of one or more of the associated listed items.
[0067] Referring to Figure 1, a flowchart of the cross-domain few-shot classification method based on local target data augmentation proposed in the first embodiment of the present invention is shown. The method includes steps S01 to S04, wherein:
[0068] Step S01: Obtain the query image dataset and the supporting image dataset. The query image dataset includes source domain query image data and target domain query image data. The supporting image dataset includes source domain supporting image data and target domain supporting image data. Perform random spatial data augmentation on the query image dataset to obtain the enhanced query image dataset. The enhanced query image dataset includes source domain enhanced query image data and target domain enhanced query image data.
[0069] It should be noted that in this embodiment, random spatial data augmentation is performed on the query image dataset to obtain an enhanced query image dataset. The augmentation includes flipping or inverting the images in the query image dataset, either as a whole or within the image.
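The random spatial augmentation described above can be illustrated with a minimal sketch. It assumes an image is a plain 2-D list of pixel values and shows only whole-image flips; the function name `random_flip` and the 0.5 flip probability are hypothetical choices, not taken from the patent.

```python
import random


def random_flip(image, rng):
    """Randomly flip a 2-D image (list of rows) horizontally and/or vertically.

    Assumption: only whole-image flips are shown; the patent also mentions
    flips applied inside the image, which are not sketched here.
    """
    out = [row[:] for row in image]          # copy so the input is untouched
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]     # horizontal flip (mirror columns)
    if rng.random() < 0.5:
        out = out[::-1]                      # vertical flip (mirror rows)
    return out


rng = random.Random(0)
image = [[1, 2, 3],
         [4, 5, 6]]
augmented = random_flip(image, rng)
```

A flip rearranges pixels without changing them, which is why the enhanced image keeps the semantics of the original.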
[0070] Step S02: Input the query image dataset and the enhanced query image dataset into the cut and blend module. Cut and blend the query image dataset to obtain the original image data of the query set. Cut and blend the enhanced query image dataset to obtain the enhanced image data of the query set.
[0071] It should be noted that in this embodiment, a single image region is selected from the source domain query image data within the query image dataset. This image region is then replaced with the corresponding image region in the target domain query image data within the query image dataset to generate the original query set image data through cropping and blending. The formula for generating the original query set image data through cropping and blending is as follows:
[0072]
[0073]
[0074] where the symbols denote, in order: the query set original image data; the ratio of the image side lengths; the source domain query image data; the target domain query image data; the label data of the query set original image data; the label data of the source domain query image data; and the label data of the target domain query image data;
[0075] The enhanced query image dataset is then cropped and blended to generate query set enhanced image data.
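The cut-and-blend step can be sketched as below. This is an illustration under stated assumptions, not the patent's formula: `ratio` is assumed to be the side-length fraction of the pasted rectangle, labels are assumed to be one-hot lists, and the mixed label is weighted by the pasted area. The name `cut_mix` is hypothetical.

```python
import random


def cut_mix(src, tgt, src_label, tgt_label, ratio, rng):
    """Replace a rectangular region of `src` with the same region of `tgt`.

    Assumption: `ratio` is the side-length fraction of the pasted region,
    and the label is mixed in proportion to the pasted area.
    """
    h, w = len(src), len(src[0])
    ch, cw = max(1, int(h * ratio)), max(1, int(w * ratio))
    top = rng.randint(0, h - ch)             # randint is inclusive on both ends
    left = rng.randint(0, w - cw)
    mixed = [row[:] for row in src]
    for r in range(top, top + ch):
        mixed[r][left:left + cw] = tgt[r][left:left + cw]
    tgt_weight = (ch * cw) / (h * w)         # fraction of pixels taken from tgt
    label = [(1 - tgt_weight) * s + tgt_weight * t
             for s, t in zip(src_label, tgt_label)]
    return mixed, label


rng = random.Random(0)
src = [[0] * 4 for _ in range(4)]            # stand-in source domain query image
tgt = [[1] * 4 for _ in range(4)]            # stand-in target domain query image
mixed, label = cut_mix(src, tgt, [1.0, 0.0], [0.0, 1.0], 0.5, rng)
```

The same function could be applied to the enhanced source and target query images to produce the query set enhanced image data.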
[0076] Step S03: Input the query set original image data, query set enhanced image data and supporting image dataset into the feature processing module to obtain domain-specific features and domain-independent features, obtain domain labels based on domain-specific features, and input them into the domain classifier to obtain domain classification scores;
[0077] It should be noted that in this embodiment, the query set original image data, query set enhanced image data, and supporting image dataset are input into the feature processing module, which includes a twin feature extraction block and a feature decoupling block;
[0078] The first feature extraction block of the twin feature extraction block extracts features from the original image data of the query set and the source domain support image data and target domain support image data in the support image dataset. Then, all the extracted image features are input into the feature decoupling block to decompose the image features into domain-specific features and domain-independent features. The domain-specific features include source domain support image-specific features, target domain support image-specific features, and query set original image-specific features. The domain-independent features include source domain support image-independent features, target domain support image-independent features, and query set original image-independent features.
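The text does not say how the feature decoupling block is built. As a minimal sketch, assuming it can be approximated by two linear projections of the same feature vector (a hypothetical design, with hand-picked weights for illustration), the split into domain-specific and domain-independent features might look like this:

```python
def linear(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def decouple(feature, w_specific, w_independent):
    """Split one feature vector into (domain-specific, domain-independent) parts.

    Assumption: two separate linear heads; in practice the weights would be
    learned, and the real block may be non-linear.
    """
    return linear(w_specific, feature), linear(w_independent, feature)


feature = [1.0, 2.0, 3.0, 4.0]               # stand-in extracted image feature
w_specific = [[1, 0, 0, 0], [0, 1, 0, 0]]    # illustrative, not learned
w_independent = [[0, 0, 1, 0], [0, 0, 0, 1]]
specific, independent = decouple(feature, w_specific, w_independent)
```

The domain-specific output would feed the domain classifier, and the domain-independent output would feed the few-shot classification module.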
[0079] The second feature extraction block of the twin feature extraction block extracts features from the query set enhanced image data, and all the extracted image features are then input into the feature decoupling block, which decomposes them to obtain the domain-independent features of the query set enhanced images. A feature perturbation algorithm is applied to mitigate overfitting, as follows:
[0080]
[0081] where the symbols denote, in order: the domain-independent features of the query set original images; the domain-independent features of the query set enhanced images; and the feature perturbation term, which includes noise sampled from a normal distribution and is used to perturb the domain-independent features of the query set enhanced images;
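The perturbation formula itself is missing from this text. A minimal sketch, assuming the perturbation is simple additive zero-mean Gaussian noise on each feature value (the standard deviation `sigma` and the name `perturb` are hypothetical), could be:

```python
import random


def perturb(features, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to each value.

    Assumption: purely additive noise; the patent's perturbation term may
    also depend on the original-image features.
    """
    return [f + rng.gauss(0.0, sigma) for f in features]


rng = random.Random(0)
enhanced_features = [0.2, -1.0, 0.7]         # stand-in domain-independent features
noisy = perturb(enhanced_features, 0.1, rng)
unchanged = perturb(enhanced_features, 0.0, rng)
```

With `sigma` set to zero the features pass through unchanged, which makes the noise scale easy to ablate.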
[0082] Domain labels are obtained based on the domain-specific features, and then input into a domain classifier to obtain a domain classification score.
[0083] The domain-independent features are then input into the few-shot classification module for classification prediction.
[0084] In this embodiment, the domain label of the source domain support image-specific features and the domain label of the target domain support image-specific features are obtained, and from them the domain label of the query set original image-specific features is calculated; the calculation uses the ratio of the image side lengths;
[0085] Input all domain-specific features and their corresponding domain labels into the domain classifier and calculate the domain classification score;
[0086] Loss optimization is performed based on a domain classification loss function, which is as follows:
[0087]
[0088]
[0089] where the symbols denote, in order: the domain classification loss; the cross-entropy loss function; the source domain support image-specific features; the target domain support image-specific features; the query set original image-specific features; and the domain classifier;
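The domain label calculation and the domain classification loss can be sketched together. The sketch assumes two domains with one-hot labels, a soft domain label for the mixed query image obtained by weighting the two, and a loss that sums three cross-entropy terms. All function names and the `src_weight` parameter are hypothetical; the patent's exact formulas are not available in this text.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """Cross-entropy between predicted probabilities and a (possibly soft) target."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)


def mixed_domain_label(src_label, tgt_label, src_weight):
    """Soft domain label for a cut-and-mixed image (assumed linear mix)."""
    return [src_weight * s + (1 - src_weight) * t
            for s, t in zip(src_label, tgt_label)]


def domain_loss(logits_src, logits_tgt, logits_query, src_weight):
    """Sum of cross-entropy terms for source, target and mixed query features."""
    d_src, d_tgt = [1.0, 0.0], [0.0, 1.0]
    d_query = mixed_domain_label(d_src, d_tgt, src_weight)
    return (cross_entropy(softmax(logits_src), d_src)
            + cross_entropy(softmax(logits_tgt), d_tgt)
            + cross_entropy(softmax(logits_query), d_query))


# Stand-in domain classifier outputs (logits) for the three feature groups.
loss = domain_loss([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], src_weight=0.75)
```

With uninformative logits every term equals log 2, so the loss starts at 3 log 2 and falls as the classifier learns to separate the domains.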
[0090] Input the source domain support image-independent features, the target domain support image-independent features, and the query set original image-independent features into the few-shot classification module;
[0091] The query set original image-independent features are classified in a few-shot manner against the source domain support image-independent features and against the target domain support image-independent features, respectively.
[0092] The predicted distributions in the source domain and the target domain are calculated separately, and the loss is optimized according to the few-shot classification loss function, which is as follows:
[0093]
[0094] where the symbols denote, in order: the few-shot classification loss; the ratio of the image side lengths; the cross-entropy loss function; the predicted distribution of the query set original image-independent features over the source domain; the predicted distribution of the query set original image-independent features over the target domain; the few-shot classification labels of the source domain support images; and the few-shot classification labels of the target domain support images;
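The few-shot prediction and loss can be illustrated with a small sketch. The patent does not name the classifier, so this assumes a prototype-based classifier (class means of the support features, softmax over negative squared distances) and a loss that weights the source and target cross-entropy terms by a mixing weight. Function names and `src_weight` are hypothetical.

```python
import math


def prototypes(support_feats, support_labels, n_classes):
    """Mean feature per class; assumes every class has at least one support sample."""
    dim = len(support_feats[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    counts = [0] * n_classes
    for feat, cls in zip(support_feats, support_labels):
        counts[cls] += 1
        for i, v in enumerate(feat):
            sums[cls][i] += v
    return [[v / counts[c] for v in sums[c]] for c in range(n_classes)]


def predict(query_feat, protos):
    """Softmax over negative squared Euclidean distances to each prototype."""
    logits = [-sum((q - p) ** 2 for q, p in zip(query_feat, proto))
              for proto in protos]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def few_shot_loss(p_src, y_src, p_tgt, y_tgt, src_weight):
    """Weighted sum of cross-entropy on the source and target predictions."""
    return (src_weight * -math.log(p_src[y_src])
            + (1 - src_weight) * -math.log(p_tgt[y_tgt]))


query = [0.1, 0.1]                           # stand-in query feature
p_src = predict(query, prototypes([[0.0, 0.0], [1.0, 1.0]], [0, 1], 2))
p_tgt = predict(query, prototypes([[0.0, 1.0], [1.0, 0.0]], [0, 1], 2))
loss = few_shot_loss(p_src, 0, p_tgt, 1, src_weight=0.75)
```

Here the query is classified once against the source domain support features and once against the target domain support features, matching the "respectively" in the description.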
[0095] In this embodiment, the few-shot classification module further includes:
[0096] A few-shot self-supervised learning block. The domain-independent features of the query set enhanced images are input into the few-shot self-supervised learning block, which performs self-supervised enhancement of few-shot classification based on the few-shot classification labels of the query set original images, so that different regions of each image remain semantically consistent. Loss optimization is then performed using the few-shot self-supervised learning loss function, which is as follows:
[0097]
[0098] where the symbols denote, in order: the few-shot self-supervised learning loss; the predicted distribution of the query set enhanced image-independent features over the source domain; and the predicted distribution of the query set enhanced image-independent features over the target domain.
[0099] Step S04: Input the domain-independent features into the few-shot classification module for classification prediction, and then optimize the local features based on the local feature contrastive learning algorithm;
[0100] It should be noted that in this embodiment, the original image data and the enhanced image data of the query set are locally sliced to obtain the original local image and the enhanced local image of the query set.
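Local slicing can be pictured as cutting each image into a grid of patches. The sketch below assumes non-overlapping patches on a 2-D list image; the patch size and the name `slice_patches` are hypothetical, since the text does not give the slicing scheme.

```python
def slice_patches(image, patch_h, patch_w):
    """Split a 2-D image into non-overlapping patches in row-major order.

    Assumption: rows or columns that do not fill a whole patch are dropped.
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch_h + 1, patch_h):
        for left in range(0, w - patch_w + 1, patch_w):
            patches.append([row[left:left + patch_w]
                            for row in image[top:top + patch_h]])
    return patches


image = [[r * 4 + c for c in range(4)] for r in range(4)]   # values 0..15
patches = slice_patches(image, 2, 2)
```

Applying the same slicing to an original image and its enhanced counterpart gives patch pairs at matching positions, which is what the contrastive step compares.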
[0101] The local image features of the original local image and the enhanced local image of the query set are extracted according to the twin feature extraction block in the feature processing module, and then the local image features are decomposed according to the feature decoupling block to obtain the corresponding domain-independent features.
[0102] Randomly select any query set original local image as the anchor, and take all local feature samples of images of the same class among the query set enhanced local images as positive samples. Compute the cosine distance between the anchor and the local feature at each corresponding position of the positive samples, so as to compare how the anchor and the positive local features are distributed in feature space. At the same time, take all local feature samples of any image of a different class among the query set enhanced local images as negative samples, and compare the distance between the anchor and the negative local features in feature space.
[0103] The loss is optimized using a local feature contrastive learning loss function, which is as follows:
[0104]
[0105] where the symbols denote, in order: the local feature contrastive learning loss; the number of local features; the index of a local feature; the cosine distance; the local features of the anchor; the local features of the positive samples; the local features of the negative samples; the margin parameter; the temperature parameter; and the maximum function.
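The loss formula is missing from this text, but its ingredients are listed: cosine distance, a margin, a temperature and a maximum function. One arrangement that uses exactly these pieces is a hinge on the temperature-scaled gap between positive and negative distances, averaged over local features. The sketch below is that assumed arrangement, not the patent's confirmed formula, and it requires non-zero feature vectors.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def local_contrastive_loss(anchor, positive, negative, margin, temperature):
    """Average hinge loss over local features at matching positions.

    Assumption: loss_i = max(0, (d(a_i, p_i) - d(a_i, n_i)) / temperature + margin).
    """
    total = 0.0
    for a, p, n in zip(anchor, positive, negative):
        gap = (cosine_distance(a, p) - cosine_distance(a, n)) / temperature
        total += max(0.0, gap + margin)
    return total / len(anchor)


anchor = [[1.0, 0.0], [0.0, 1.0]]            # stand-in anchor local features
same_class = [[1.0, 0.0], [0.0, 1.0]]        # aligned with the anchor
other_class = [[0.0, 1.0], [1.0, 0.0]]       # orthogonal to the anchor
good = local_contrastive_loss(anchor, same_class, other_class, 0.5, 0.5)
bad = local_contrastive_loss(anchor, other_class, same_class, 0.5, 0.5)
```

The loss is zero when positives are already closer than negatives by more than the margin, and grows when the roles are reversed, which pulls same-class local features together and pushes different-class ones apart.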
[0106] In summary, in the cross-domain few-shot classification method based on local target data augmentation described above, a cut-and-mix method performs local data augmentation on source domain image data together with a small amount of target domain image data, so that the final few-shot classification model adapts better to the target domain. The two sets of query images are then randomly augmented and cut-and-mixed to obtain enhanced query images; the relationship between the predictions for the enhanced images and for the original images is analyzed, and a contrastive learning method compares the relationship between their local features. This improves the ability of the final few-shot classification model to discriminate target domain data and significantly improves the generalization ability of cross-domain few-shot classification. Specifically, a query image dataset and a support image dataset are acquired; the query image dataset includes source domain query image data and target domain query image data, and the support image dataset includes source domain support image data and target domain support image data. Random spatial data augmentation is performed on the query image dataset to obtain an enhanced query image dataset, which includes source domain enhanced query image data and target domain enhanced query image data; this enriches the enhanced versions of each image without introducing significant semantic differences between the enhanced and original images. The query image dataset and the enhanced query image dataset are input into the cropping and blending module: cropping and blending the query image dataset yields the query set original image data, and cropping and blending the enhanced query image dataset yields the query set enhanced image data. The query set original image data, the query set enhanced image data, and the support image dataset are input into the feature processing module to obtain domain-specific features and domain-independent features. Domain labels are obtained based on the domain-specific features and input into the domain classifier to obtain a domain classification score. The domain-independent features are input into the few-shot classification module for classification prediction. Finally, the local features are optimized using the local feature contrastive learning algorithm.
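The random spatial augmentation and the cut-and-mix step summarized above can be illustrated with a short Python sketch. It is not the patent's formula, which is not reproduced in this text. The function names are hypothetical, images are plain nested lists, and weighting the mixed label by the pasted area is an assumption.

```python
def horizontal_flip(image):
    """One simple random spatial augmentation: mirror each row."""
    return [row[::-1] for row in image]


def cut_mix(source_img, target_img, source_label, target_label,
            ratio, top=0, left=0):
    """Replace one region of a source image with the same region of a target image.

    Images are H x W nested lists of equal size; labels are one-hot lists.
    `ratio` is the edge-length ratio of the pasted region, assumed in [0, 1].
    """
    height, width = len(source_img), len(source_img[0])
    box_h, box_w = int(round(ratio * height)), int(round(ratio * width))
    bottom, right = min(top + box_h, height), min(left + box_w, width)
    mixed = [row[:] for row in source_img]       # copy, leave the input intact
    for r in range(top, bottom):
        for c in range(left, right):
            mixed[r][c] = target_img[r][c]       # same position in the target
    # Assumption: label weights follow the fraction of pasted pixels.
    area = (bottom - top) * (right - left) / (height * width)
    label = [(1 - area) * s + area * t
             for s, t in zip(source_label, target_label)]
    return mixed, label
```

Calling `cut_mix` once on the original query images and once on their augmented versions (for example after `horizontal_flip`) gives the two inputs described above: the query set original image data and the query set enhanced image data.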
[0107] Please refer to Figure 2, which shows a flowchart of the cross-domain few-shot classification method based on local target data augmentation proposed in the second embodiment of the present invention. The method includes steps S11 to S16, wherein:
[0108] Step S11: Perform random spatial data augmentation on the query image dataset to obtain an enhanced query image dataset. Select a single image region in the source domain query image data of the query image dataset and replace it with the corresponding image region of the target domain query image data of the query image dataset, thereby generating the query set original image data by cut-and-mix. Then apply cut-and-mix to the enhanced query image dataset to generate the query set enhanced image data.
[0109] Step S12: Input the query set original image data, the query set enhanced image data, and the support image dataset into the feature processing module, which includes a twin feature extraction block and a feature decoupling block. The first feature extraction block of the twin feature extraction block extracts features from the query set original image data and from the source domain support image data and target domain support image data in the support image dataset; all extracted image features are then input into the feature decoupling block, which decomposes them into domain-specific features and domain-independent features. The second feature extraction block of the twin feature extraction block extracts features from the query set enhanced image data; all extracted image features are then input into the feature decoupling block, which decomposes them to obtain the query set enhanced image-independent features. Overfitting is countered by the feature perturbation algorithm. Domain labels are obtained based on the domain-specific features and input into the domain classifier to obtain domain classification scores. Finally, the domain-independent features are input into the few-shot classification module for classification prediction.
[0110] Step S13: Obtain the domain label of the source domain support image-specific features and the domain label of the target domain support image-specific features, and use them to calculate the domain label of the query set original image-specific features. Input all domain-specific features and their corresponding domain labels into the domain classifier, calculate the domain classification score, and optimize the loss according to the domain classification loss function.
[0111] Step S14: Input the source domain support image-independent features, the target domain support image-independent features, and the query set original image-independent features into the few-shot classification module, calculate the predicted distribution in the source domain and the predicted distribution in the target domain separately, and optimize the loss according to the few-shot classification loss function.
[0112] Step S15: Input the query set enhanced image-independent features into the few-shot self-supervised learning block. Based on the few-shot classification labels of the query set original images, the block performs self-supervised enhancement of few-shot classification so that the semantic information of different image regions is consistent across all images, and the loss is optimized according to the few-shot self-supervised learning loss function.
[0113] Step S16: Perform local slicing on the query set original image data and the query set enhanced image data to obtain the original local images and the enhanced local images of the query set. Extract local image features from both with the twin feature extraction block in the feature processing module, and then decompose the local image features with the feature decoupling block to obtain the corresponding domain-independent features. Randomly select any original local image of the query set as the anchor, and take all local feature samples of same-class images among the enhanced local images of the query set as positive samples. Calculate the cosine distance between the anchor and the local feature at each corresponding position of the positive samples, so as to compare the distribution distance in feature space between the anchor and the local features of the positive samples. At the same time, select all local feature samples of any different-class image among the enhanced local images of the query set as negative samples, compare the distribution distance in feature space between the anchor and the local features of the negative samples, and optimize the loss according to the local feature contrastive learning loss function.
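Step S12 above relies on a feature perturbation algorithm whose formula is not reproduced in this text. Claim 3 describes the perturbation term as noise sampled from a normal distribution of the query set enhanced image-independent features. A minimal Python sketch under one possible reading of that description follows. It assumes zero-mean Gaussian noise whose standard deviation is that of the features themselves; the function name and the `scale` and `seed` parameters are hypothetical.

```python
import random
import statistics


def perturb_features(features, scale=1.0, seed=None):
    """Add Gaussian noise whose spread follows the features' own statistics.

    features: flat list of floats (one feature vector).
    scale:    assumed strength of the perturbation; 0 disables it.
    """
    rng = random.Random(seed)
    sigma = statistics.pstdev(features)  # population standard deviation
    return [f + scale * rng.gauss(0.0, sigma) for f in features]


enhanced = [0.2, 1.5, -0.7, 0.9]
noisy = perturb_features(enhanced, scale=0.1, seed=0)
```

Tying the noise scale to the feature statistics keeps the perturbation proportionate across layers or images with very different feature magnitudes. Whether the invention does this is not recoverable from the text.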
[0114] It should be noted that a comparative experiment was conducted between the cross-domain few-shot classification model of the present invention and the few-shot classification model in the prior art, and the results are shown in Tables 1 and 2 below:
[0115] Table 1
[0116]
[0117] Table 2
[0118]
[0119] Here, dataset 1, dataset 2, dataset 3, and dataset 4 are CUB, Cars, Places, and Places, respectively. Table 1 uses the 5-way 1-shot setting and Table 2 uses the 5-way 5-shot setting. As Tables 1 and 2 show, the cross-domain few-shot classification model of the present invention generalizes considerably better than the existing few-shot classification models.
[0120] In summary, the method of the second embodiment provides the same advantages as summarized in paragraph [0106]: cut-and-mix performs local data augmentation on source domain image data and a small amount of target domain image data, so that the final few-shot classification model adapts better to the target domain; domain labels obtained from the domain-specific features are input into the domain classifier to obtain a domain classification score; the domain-independent features are input into the few-shot classification module for classification prediction; and local feature contrastive learning improves the model's ability to discriminate target domain data, which significantly improves the generalization ability of cross-domain few-shot classification.
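The domain classification of step S13 can be illustrated with a short Python sketch. The domain classification loss formula is not reproduced in this text, so the form below is an assumption: three cross-entropy terms are summed, and the cut-and-mix query image receives a soft domain label. All names are hypothetical, and `target_weight` stands in for whatever function of the edge-length ratio the invention actually uses.

```python
import math

SOURCE_LABEL = [1.0, 0.0]  # domain label of source support features
TARGET_LABEL = [0.0, 1.0]  # domain label of target support features


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, soft_label):
    """Cross-entropy between a (possibly soft) label and softmax(scores)."""
    probs = softmax(scores)
    return -sum(y * math.log(p) for y, p in zip(soft_label, probs) if y > 0)


def mixed_domain_label(target_weight):
    """Soft domain label of a cut-and-mix query image (assumed mixing rule)."""
    return [(1 - target_weight) * s + target_weight * t
            for s, t in zip(SOURCE_LABEL, TARGET_LABEL)]


def domain_classification_loss(source_scores, target_scores, query_scores,
                               target_weight):
    """Sum of the three cross-entropy terms (the plain sum is an assumption)."""
    return (cross_entropy(source_scores, SOURCE_LABEL)
            + cross_entropy(target_scores, TARGET_LABEL)
            + cross_entropy(query_scores, mixed_domain_label(target_weight)))
```

Here each `*_scores` argument is the two-element output of a domain classifier for one domain-specific feature vector. A real implementation would average these terms over a batch.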
[0121] Please refer to Figure 3, which shows a schematic diagram of the cross-domain few-shot classification system based on local target data augmentation proposed in the third embodiment of the present invention. The system includes:
[0122] The image acquisition and processing module 10, which is used to acquire a query image dataset and a support image dataset, wherein the query image dataset includes source domain query image data and target domain query image data, and the support image dataset includes source domain support image data and target domain support image data. The module performs random spatial data augmentation on the query image dataset to obtain an enhanced query image dataset, which includes source domain enhanced query image data and target domain enhanced query image data;
[0123] The cropping and blending module 20, which receives the query image dataset and the enhanced query image dataset, crops and blends the query image dataset to obtain the query set original image data, and crops and blends the enhanced query image dataset to obtain the query set enhanced image data;
[0124] The feature processing module 30, which receives the query set original image data, the query set enhanced image data, and the support image dataset to obtain domain-specific features and domain-independent features, and obtains domain labels based on the domain-specific features, which are input into the domain classifier to obtain a domain classification score;
[0125] The classification optimization module 40, which is used to input the domain-independent features into the few-shot classification module for classification prediction, and then to optimize the local features using the local feature contrastive learning algorithm.
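The few-shot classification performed by module 40 (step S14 of the method) computes one predicted distribution against the source domain support features and one against the target domain support features. The text does not say which few-shot classifier is used. The Python sketch below substitutes a nearest-prototype classifier purely as a stand-in, and the weighted loss is an assumed form. All names, including `target_weight`, are hypothetical.

```python
import math


def prototypes(support_features, support_labels, num_classes):
    """Mean feature vector per class; assumes every class has a support sample."""
    dims = len(support_features[0])
    sums = [[0.0] * dims for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, lab in zip(support_features, support_labels):
        counts[lab] += 1
        for d in range(dims):
            sums[lab][d] += feat[d]
    return [[v / counts[k] for v in sums[k]] for k in range(num_classes)]


def predict_distribution(query_feature, protos):
    """Softmax over negative squared Euclidean distances to the prototypes."""
    logits = [-sum((q - p) ** 2 for q, p in zip(query_feature, proto))
              for proto in protos]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_few_shot_loss(p_source, y_source, p_target, y_target, target_weight):
    """Weighted cross-entropy over the two predicted distributions (assumed form)."""
    ce_source = -math.log(p_source[y_source])
    ce_target = -math.log(p_target[y_target])
    return (1 - target_weight) * ce_source + target_weight * ce_target
```

In use, `predict_distribution` would be called twice for each cut-and-mix query feature, once with prototypes built from the source support set and once with prototypes built from the target support set, and the two results combined by `mixed_few_shot_loss`.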
[0126] The present invention also proposes a computer storage medium storing one or more programs that, when executed by a processor, implement the cross-domain few-shot classification method based on local target data augmentation described above.
[0127] The present invention also proposes a computer device including a memory and a processor, wherein the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory to implement the cross-domain few-shot classification method based on local target data augmentation described above.
[0128] Those skilled in the art will understand that the logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.
[0129] More specific examples of computer-readable media (a non-exhaustive list) include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CD-ROM). Furthermore, the computer-readable medium can even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in computer memory.
[0130] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0131] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0132] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent should be determined by the appended claims.
Claims
1. A cross-domain few-shot classification method based on local target data augmentation, characterized in that it comprises: obtaining a query image dataset and a support image dataset, wherein the query image dataset includes source domain query image data and target domain query image data, and the support image dataset includes source domain support image data and target domain support image data; performing random spatial data augmentation on the query image dataset to obtain an enhanced query image dataset, wherein the enhanced query image dataset includes source domain enhanced query image data and target domain enhanced query image data; inputting the query image dataset and the enhanced query image dataset into a cropping and blending module, cropping and blending the query image dataset to obtain query set original image data, and cropping and blending the enhanced query image dataset to obtain query set enhanced image data; inputting the query set original image data, the query set enhanced image data, and the support image dataset into a feature processing module to obtain domain-specific features and domain-independent features, obtaining domain labels based on the domain-specific features, and inputting them into a domain classifier to obtain domain classification scores; and inputting the domain-independent features into a few-shot classification module for classification prediction, and then optimizing local features based on a local feature contrastive learning algorithm.
2. The cross-domain few-shot classification method based on local target data augmentation according to claim 1, characterized in that the step of cropping and blending the query image dataset to obtain the query set original image data, and cropping and blending the enhanced query image dataset to obtain the query set enhanced image data, specifically includes: performing random spatial data augmentation on the query image dataset, wherein the random spatial data augmentation includes flipping or inverting the images in the query image dataset as a whole or internally, to obtain the enhanced query image dataset; selecting a single image region in the source domain query image data of the query image dataset and replacing it with the corresponding image region of the target domain query image data of the query image dataset, so as to generate the query set original image data by cut-and-mix, wherein the cut-and-mix formula (not reproduced in this text) is expressed in terms of the query set original image data, the edge-length ratio of the image, the source domain query image data, the target domain query image data, the label data of the query set original image data, the label data of the source domain query image data, and the label data of the target domain query image data; and cropping and blending the enhanced query image dataset to generate the query set enhanced image data.
3. The cross-domain few-shot classification method based on local target data augmentation according to claim 1, characterized in that the step of inputting the query set original image data, the query set enhanced image data, and the support image dataset into the feature processing module to obtain domain-specific features and domain-independent features specifically includes: inputting the query set original image data, the query set enhanced image data, and the support image dataset into the feature processing module, which includes a twin feature extraction block and a feature decoupling block; extracting, by the first feature extraction block of the twin feature extraction block, features from the query set original image data and from the source domain support image data and target domain support image data in the support image dataset, and then inputting all extracted image features into the feature decoupling block to decompose them into domain-specific features and domain-independent features, wherein the domain-specific features include source domain support image-specific features, target domain support image-specific features, and query set original image-specific features, and the domain-independent features include source domain support image-independent features, target domain support image-independent features, and query set original image-independent features; extracting, by the second feature extraction block of the twin feature extraction block, features from the query set enhanced image data, and then inputting all extracted image features into the feature decoupling block to decompose them and obtain query set enhanced image-independent features; eliminating overfitting according to a feature perturbation algorithm, wherein the algorithm's formula (not reproduced in this text) is expressed in terms of the query set original image-independent features, the query set enhanced image-independent features, and a feature perturbation term comprising noise sampled from a normal distribution of the query set enhanced image-independent features; obtaining domain labels based on the domain-specific features and inputting them into the domain classifier to obtain a domain classification score; and inputting the domain-independent features into the few-shot classification module for classification prediction.
4. The cross-domain few-shot classification method based on local target data augmentation according to claim 3, characterized in that the step of obtaining domain labels based on the domain-specific features and inputting them into the domain classifier to obtain a domain classification score specifically includes: obtaining the domain label of the source domain support image-specific features and the domain label of the target domain support image-specific features, and calculating from them the domain label of the query set original image-specific features using the edge-length ratio of the image; inputting all domain-specific features and their corresponding domain labels into the domain classifier and calculating the domain classification score; and optimizing the loss based on a domain classification loss function, wherein the function (formula not reproduced in this text) is expressed in terms of the domain classification loss, the cross-entropy loss function, the source domain support image-specific features, the target domain support image-specific features, the query set original image-specific features, and the domain classifier.
5. The cross-domain few-shot classification method based on local target data augmentation according to claim 3, characterized in that the step of inputting the domain-independent features into the few-shot classification module for classification prediction specifically includes: inputting the source domain support image-independent features, the target domain support image-independent features, and the query set original image-independent features into the few-shot classification module; performing few-shot classification of the query set original image-independent features against the source domain support image-independent features and against the target domain support image-independent features, respectively; and calculating the predicted distribution in the source domain and the predicted distribution in the target domain separately and optimizing the loss according to a few-shot classification loss function, wherein the function (formula not reproduced in this text) is expressed in terms of the few-shot classification loss, the edge-length ratio of the image, the cross-entropy loss function, the predicted distribution of the query set original image-independent features in the source domain, the predicted distribution of the query set original image-independent features in the target domain, the few-shot classification labels of the source domain support images, and the few-shot classification labels of the target domain support images.
6. The cross-domain few-shot classification method based on local target data augmentation according to claim 5, characterized in that the few-shot classification module further includes a few-shot self-supervised learning block; the query set enhanced image-independent features are input into the few-shot self-supervised learning block, which performs self-supervised enhancement of few-shot classification based on the few-shot classification labels of the query set original images, so that the semantic information of different image regions is consistent across all images; and the loss is optimized according to a few-shot self-supervised learning loss function, wherein the function (formula not reproduced in this text) is expressed in terms of the few-shot self-supervised learning loss, the predicted distribution of the query set enhanced image-independent features in the source domain, and the predicted distribution of the query set enhanced image-independent features in the target domain.
7. The cross-domain few-shot classification method based on local target data augmentation according to claim 1, characterized in that the step of optimizing local features based on the local feature contrastive learning algorithm specifically includes: performing local slicing on the query set original image data and the query set enhanced image data to obtain the original local images and the enhanced local images of the query set; extracting local image features from the original local images and the enhanced local images of the query set with the twin feature extraction block in the feature processing module, and then decomposing the local image features with the feature decoupling block to obtain the corresponding domain-independent features; randomly selecting any original local image of the query set as the anchor, and taking all local feature samples of same-class images among the enhanced local images of the query set as positive samples; calculating the cosine distance between the anchor and the local feature at each corresponding position of the positive samples, so as to compare the distribution distance in feature space between the anchor and the local features of the positive samples; at the same time, selecting all local feature samples of any different-class image among the enhanced local images of the query set as negative samples, and comparing the distribution distance in feature space between the anchor and the local features of the negative samples; and optimizing the loss using a local feature contrastive learning loss function, wherein the function (formula not reproduced in this text) is expressed in terms of the local feature contrastive learning loss, the number of local features, the index of the local feature, the cosine distance, the local features of the anchor, the local features of the positive samples, the local features of the negative samples, the margin parameter, the temperature parameter, and the maximum function.
8. A cross-domain few-shot classification system based on local target data augmentation, characterized in that it comprises: an image acquisition and processing module, used to acquire a query image dataset and a support image dataset, wherein the query image dataset includes source domain query image data and target domain query image data, and the support image dataset includes source domain support image data and target domain support image data, the module performing random spatial data augmentation on the query image dataset to obtain an enhanced query image dataset that includes source domain enhanced query image data and target domain enhanced query image data; a cropping and blending module, which receives the query image dataset and the enhanced query image dataset, crops and blends the query image dataset to obtain query set original image data, and crops and blends the enhanced query image dataset to obtain query set enhanced image data; a feature processing module, which receives the query set original image data, the query set enhanced image data, and the support image dataset to obtain domain-specific features and domain-independent features, and obtains domain labels based on the domain-specific features, which are input into a domain classifier to obtain domain classification scores; and a classification optimization module, used to input the domain-independent features into a few-shot classification module for classification prediction, and then to optimize local features based on a local feature contrastive learning algorithm.
9. A storage medium, characterized in that the storage medium stores one or more programs that, when executed by a processor, implement the cross-domain few-shot classification method based on local target data augmentation according to any one of claims 1-7.
10. A computer device, characterized in that the computer device includes a memory and a processor, wherein the memory is used to store a computer program, and the processor, when executing the computer program stored in the memory, implements the cross-domain few-shot classification method based on local target data augmentation according to any one of claims 1-7.