A semi-supervised domain generalization system based on causal inference guidance
The semi-supervised domain generalization system guided by causal inference solves the problems of cross-domain distribution changes, pseudo-label balance, and causal considerations of data augmentation in semi-supervised learning, and improves the performance and interpretability of the model in unknown target domains.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGXI UNIV
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing semi-supervised learning methods suffer from performance degradation, difficulty in balancing pseudo-label quality and utilization efficiency, and lack of causal consideration in data augmentation when faced with cross-domain distribution changes. This results in insufficient adaptability of the model in unknown target domains and vulnerability to spurious correlations.
A semi-supervised domain generalization system based on causal inference is adopted. Through a cross-domain prototype self-learning module and a causal guidance enhancement module, robust causal feature representations are learned. Cross-domain prototype learning and data augmentation techniques are used to improve the performance of the model in unknown target domains.
It significantly improves the model's classification accuracy in unknown target domains, achieves a balance between the quantity and quality of pseudo-labels, enhances the model's interpretability and robustness, and possesses good versatility and scalability.
Figure CN122244528A_ABST
Description
Technical Field
[0001] This invention relates to the field of image processing technology, and in particular to a semi-supervised domain generalization system based on causal inference guidance.
Background Technology
[0002] In recent years, semi-supervised learning and domain generalization techniques have shown great potential in addressing the problems of scarce labeled data and cross-domain distribution shifts. Existing semi-supervised domain generalization methods mainly follow two technical paths: one is to learn domain-invariant feature representations through domain alignment, and the other is to use data augmentation strategies to improve the model's robustness to distribution changes.
[0003] However, existing technical solutions still have obvious limitations and drawbacks:
[0004] 1. Vulnerability to spurious correlations: Traditional domain generalization (DG) methods typically focus on learning domain-invariant features. However, these features may capture only the statistical association between data and labels rather than the underlying causal mechanism. Models therefore tend to over-rely on spurious correlations in the training data that may not hold in new environments, so performance can drop sharply once the data distribution of the deployment environment changes.
[0005] 2. Insufficient adaptability to unknown target domains: Under the standard DG setting, the target domain is completely unseen during model training, and most existing methods learn only from the limited, known source-domain data. When the target domain differs significantly in distribution from the source domains, the model struggles to generalize effectively. In particular, there is no mechanism for effectively estimating and simulating the distribution of unknown target domains.
[0006] 3. The Challenge of Balancing Pseudo-Label Quality and Utilization Efficiency: In semi-supervised learning frameworks, the quality of pseudo-labels is crucial. While existing SSL methods (such as FixMatch and FreeMatch) filter pseudo-labels by setting fixed or dynamic thresholds, they often struggle to achieve a good balance between the "generation rate" (i.e., quantity) and "accuracy" (i.e., quality) of pseudo-labels when facing complex domain biases. Overly conservative strategies may result in a large amount of unlabeled data remaining unused, while overly aggressive strategies may introduce numerous incorrect labels, misleading model training.
[0007] 4. Lack of causal consideration in data augmentation strategies: Current data augmentation techniques (such as Mixup and CutMix) mostly apply random perturbation or mixing at the pixel or feature level. They lack explicit modeling of the causal structure of image generation (for example, core factors such as object semantics and domain style). As a result, augmentation may fail to simulate real-world domain changes effectively, or fail to specifically strengthen causal features and suppress non-causal confounding factors.
Summary of the Invention
[0008] To address the aforementioned issues, this invention provides a semi-supervised domain generalization system guided by causal inference, which can learn more robust causal feature representations and effectively utilize unlabeled data to improve the model's cross-domain performance.
[0009] To achieve the above objectives, the technical solution adopted by the present invention is as follows:
[0010] A semi-supervised domain generalization system guided by causal inference, comprising:
[0011] A feature extraction module, used for extracting primary features from images;
[0012] A cross-domain prototype self-learning module, which receives data from the feature extraction module, performs prototype contrastive learning on the primary features, and constructs a cross-domain discrimination loss, performing cross-domain feature alignment at both the instance level and the prototype level so that the model can distinguish different image instances;
[0013] A causal-guided enhancement module, which receives data from the feature extraction module and enables the model to obtain enhanced image samples through domain style feature mixing, key semantic feature suppression, and non-key region noise intervention; and
[0014] A classification decision module, used for combining the cross-domain prototype self-learning module and the causal-guided enhancement module to obtain pseudo-labels;
[0015] Furthermore, the cross-domain prototype self-learning module includes an in-domain instance learning submodule, an in-domain prototype learning submodule, and an execution submodule.
[0016] The in-domain instance learning submodule assigns a unique instance label to each image in the dataset, treats different augmented views of the same image as positive sample pairs, and obtains negative samples by sampling from a memory bank. Using an instance discrimination contrastive loss, it identifies each positive pair as belonging to the same instance label and the negative samples as different instances.
[0017] The in-domain prototype learning submodule extracts representative prototype features from the data in each domain and uses them as the core anchor points of that domain's data distribution, so as to reflect the typical patterns of different categories or semantic structures.
[0018] The execution submodule performs contrastive learning between the image instances in a domain and the representative prototype features of the other (target) domains. Using a modified contrastive loss, it pulls image instances closer to cross-domain prototypes with the same semantics and pushes them away from cross-domain prototypes with different semantics, thereby performing cross-domain feature alignment at both the instance level and the prototype level.
[0019] Furthermore, the in-domain instance learning submodule maps each sample to a unique location in the latent feature space by learning a feature extractor to capture its semantic information, and the in-domain instance learning submodule uses the contrastive InfoNCE loss as the instance discrimination target:
[0020] Formula (1)
[0021] where the terms denote, respectively, the instance discrimination objective, the feature extractor, the temperature parameter, the set of negative samples, and the unlabeled input samples drawn from the dataset of the corresponding domain;
[0022] The in-domain prototype learning submodule performs K-Means clustering on the feature representations within each domain to obtain a set of prototypes, each defined as a cluster center produced by the clustering process. The learning objective of the clustering can be formalized as iteratively minimizing the intra-cluster distance:
[0023] Formula (2)
[0024] For each sample from a source domain, the execution submodule computes its similarity to the prototypes of all other domains and constructs a cross-domain discriminative self-learning loss:
[0025] Formula (3)
[0026] According to formulas (1) and (3), the training objective of the cross-domain prototype self-learning module is:
[0027] Formula (4)
[0028] Furthermore, the causal guidance enhancement module includes a hybrid submodule, a suppression submodule, and a noise intervention submodule.
[0029] The hybrid submodule is used to simulate the intervention of domain style variables on features. It exposes high-level semantic features to changes in different domain styles in order to reduce spurious-correlation dependencies related to domain style.
[0030] The suppression submodule is used to suppress highly activated semantic features. The suppression submodule identifies the features in the image that have the most influence on classification decisions through Grad-CAM and masks the regions corresponding to these features so that the model can learn and reason based on other secondary features.
[0031] The noise intervention submodule is used to randomly occlude or perturb non-critical semantic regions in the sample to simulate intervention on non-causal factors.
[0032] Furthermore, the hybrid submodule takes the feature representation of the input image from the feature extractor and normalizes it. For two different samples, it computes the per-channel mean and standard deviation of their features, and constructs new style parameters by linearly interpolating the two samples' means and standard deviations. The features are then normalized and affinely transformed with the new style parameters, so that the feature extractor outputs a new feature representation with a mixed style;
[0033] For the feature map of each channel and its corresponding gradient, the suppression submodule obtains a weight coefficient per channel by average pooling the gradient over the spatial dimensions. The feature maps are combined, weighted by these coefficients, to form a heatmap. The heatmap is passed through a ReLU activation to suppress negative responses and preserve only the regions that contribute positively to the target class. It is then resized by interpolation to match the original image, aligning it in pixel space. Using this heatmap, the suppression submodule identifies the highly sensitive regions of the input image and applies a first binary mask to reduce the focus on these regions; the output features of the feature extractor are multiplied by the first binary mask to obtain the final features;
[0034] Given an input image and the gradient map computed from it, the noise intervention submodule defines a second binary mask, using a preset threshold to distinguish critical from non-critical regions of the image. It then intervenes in the non-critical regions through clustered random noise masking to obtain enhanced image samples.
[0035] Furthermore, in the hybrid submodule, the mean and standard deviation are calculated as follows:
[0036] Formula (5)
[0037] Formula (6)
[0038] where the encoder extracts the feature representation of the input image, whose three dimensions are the number of feature channels, the height, and the width, respectively.
[0039] The new style parameters are constructed as follows:
[0040] Formula (7)
[0041] Formula (8)
[0042] where the interpolation weight is sampled from a preset distribution;
[0043] The features are then normalized and affinely transformed:
[0044] Formula (10)
[0045] where the result is the new, style-mixed feature representation output by the feature extractor.
[0046] Furthermore, in the suppression submodule, the weight coefficient is calculated as follows:
[0047] Formula (11)
[0048] where the weight coefficient reflects how important each channel is for predicting the current category;
[0049] The heatmap is expressed as follows:
[0050] Formula (12)
[0051] The first binary mask is defined using a preset threshold:
[0052] Formula (13)
[0053] The method for calculating the final feature is as follows:
[0054] Formula (14)
[0055] where the result is the final feature, obtained from the output features of the feature extractor.
[0056] Furthermore, in the noise intervention submodule, the method for calculating the enhanced image samples is as follows:
[0057] Formula (15)
[0058] Furthermore, the classification decision module defines the supervised loss as:
[0059] Formula (16)
[0060] where the input-label pairs are the labeled data, and the remaining terms denote the feature extractor, the classifier, and the cross-entropy loss function, respectively;
[0061] The classification decision module defines unsupervised loss as:
[0062] Formula (17)
[0063] where the inputs are unlabeled data; the model's predictions on the augmented data serve as pseudo-labels for the samples; the feature-level and pixel-level augmented input representations are generated by the causal-guided enhancement module; the feature extractor is applied with style-feature mixing; two preset weighting coefficients balance the contributions of feature-level and pixel-level augmentation to the loss; and an indicator function filters out pseudo-labeled samples with low confidence.
[0064] The objective function of the classification decision module is:
[0065] Formula (18)
[0066] where the output is the set of pseudo-labels.
[0067] The beneficial effects of this invention are:
[0068] It significantly improves the model's classification accuracy in unknown target domains: it effectively mitigates spurious correlations through causal intervention mechanisms and expands the model's domain distribution cognition by utilizing cross-domain prototype learning, enabling the model to maintain excellent performance when facing unknown target domains with distributions significantly different from the training source domain.
[0069] A better balance between the quantity and quality of pseudo-labels is achieved: Through causal-guided data augmentation and more robust feature representations obtained by pre-training with cross-domain prototype self-learning modules, the pseudo-labels generated by this invention maintain a high generation rate while also having a high accuracy rate.
[0070] The interpretability and robustness of the model are enhanced: by visualizing the attention region through Grad-CAM, it is found that, compared with the existing technology, the method of the present invention can more accurately focus on the key semantic regions of objects that have a real causal relationship with the classification decision, rather than background or false features, in complex scenes and multi-object images.
[0071] It has good versatility and scalability: The cross-domain prototype self-learning module and causal guidance enhancement module proposed in this invention can be integrated into existing semi-supervised learning frameworks as plug-and-play components.
[0072] The computational complexity is controllable and has practical deployment potential: Although an additional computation module is introduced, the overall time complexity of this invention is on the same order of magnitude as that of mainstream SSL methods, maintaining linear growth and not bringing unacceptable computational burden, thus showing good prospects for engineering applications.
Attached Figure Description
[0073] Figure 1 is a block diagram of a preferred embodiment of the present invention.
[0074] Figure 2 is a schematic diagram of the cross-domain prototype self-learning module according to a preferred embodiment of the present invention.
[0075] Figure 3 is a schematic diagram of the causal guidance enhancement module according to a preferred embodiment of the present invention.
[0076] In the figures: 1-Feature extraction module, 2-Cross-domain prototype self-learning module, 21-In-domain instance learning sub-module, 22-In-domain prototype learning sub-module, 23-Execution sub-module, 3-Causal guidance enhancement module, 31-Hybrid sub-module, 32-Suppression sub-module, 33-Noise intervention sub-module, and 4-Classification decision module.
Detailed Implementation
[0077] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0078] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and / or" as used herein includes any and all combinations of one or more of the associated listed items.
[0079] Referring to Figures 1 to 3, a preferred embodiment of the present invention provides a semi-supervised domain generalization system based on causal inference guidance, comprising:
[0080] Feature extraction module 1 is used for primary feature extraction of images.
[0081] Cross-domain prototype self-learning module 2 receives data from feature extraction module 1, performs prototype contrastive learning on the primary features, and constructs a cross-domain discrimination loss. It performs cross-domain feature alignment at both the instance level and the prototype level so that the model can distinguish different image instances.
[0082] The cross-domain prototype self-learning module 2 includes an in-domain instance learning submodule 21, an in-domain prototype learning submodule 22, and an execution submodule 23.
[0083] The in-domain instance learning submodule 21 assigns a unique instance label to each image in the dataset, treats different augmented views of the same image as positive sample pairs, and obtains negative samples by sampling from a memory bank. Using an instance discrimination contrastive loss, the in-domain instance learning submodule 21 identifies each positive pair as belonging to the same instance label and the negative samples as different instances.
[0084] The in-domain instance learning submodule 21 learns a feature extractor to map each sample to a unique location in the latent feature space to capture its semantic information. Furthermore, the in-domain instance learning submodule 21 uses the InfoNCE loss from contrastive learning as the instance discrimination target.
[0085] Formula (1)
[0086] where the terms denote, respectively, the instance discrimination objective, the feature extractor, the temperature parameter, the set of negative samples, and the unlabeled input samples drawn from the dataset of the corresponding domain.
[0087] In this embodiment, instance discrimination learning is first performed within a single domain. Each image is assigned a unique identity, and the model is trained to recognize it as its own instance label while treating other images as negative samples. Under this setup, the positive samples for each image are its original input and its data-augmented variants, while the negative samples are drawn from a memory bank. The in-domain instance learning submodule 21 applies an instance discrimination contrastive loss that treats different augmented views of the same image as positive pairs and all other images in the memory bank as negatives. This step forces the model to learn the uniqueness of each instance and to capture fine-grained semantic information.
[0088] The in-domain prototype learning submodule 22 extracts representative prototype features from the data in each domain and uses them as the core anchor points of that domain's data distribution, so as to reflect the typical patterns of different categories or semantic structures.
[0089] The in-domain prototype learning submodule 22 performs K-Means clustering on the feature representations within each domain to obtain a set of prototypes, each defined as a cluster center produced by the clustering process. The learning objective of the clustering can be formalized as iteratively minimizing the intra-cluster distance:
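The instance discrimination step above can be sketched as follows. This is a minimal NumPy illustration only. It assumes the common InfoNCE form with L2-normalized features, cosine similarity, and a fixed memory bank of negatives. The function names and the default temperature `tau=0.1` are hypothetical, and memory-bank updates (for example, momentum encoding) are omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project row vectors onto the unit sphere so dot products are cosines."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def instance_infonce(anchor, positive, memory_bank, tau=0.1):
    """InfoNCE instance-discrimination loss (assumed common form).

    anchor, positive: (B, D) features of two augmented views of the same B images.
    memory_bank:      (M, D) features of other images, used as negatives.
    tau:              temperature (hypothetical default).
    """
    a, p, n = l2_normalize(anchor), l2_normalize(positive), l2_normalize(memory_bank)
    pos = np.sum(a * p, axis=1, keepdims=True) / tau      # (B, 1) positive logits
    neg = a @ n.T / tau                                   # (B, M) negative logits
    logits = np.concatenate([pos, neg], axis=1)           # positive sits in column 0
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob_pos.mean())


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 64))
bank = rng.normal(size=(32, 64))
aligned = instance_infonce(feats, feats + 0.01 * rng.normal(size=(4, 64)), bank)
unrelated = instance_infonce(feats, rng.normal(size=(4, 64)), bank)
print(aligned < unrelated)
```

The loss is small when each view is much closer to its own augmented counterpart than to the memory-bank negatives, which is the "uniqueness of each instance" property described above.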
[0090] Formula (2)
[0091] In this embodiment, the domains differ significantly in distribution, so performing cross-domain instance discrimination directly may map instances to unrelated instance categories in another domain and weaken the model's discriminative ability. The in-domain prototype learning submodule 22 therefore extracts representative prototype features within each domain. Unlike existing techniques that directly compare instances across domains, it first constructs representative prototypes within each domain to capture the core patterns of that domain's data distribution. This avoids the failure of direct cross-domain instance comparison when the distribution gap between domains is too large.
[0092] The execution submodule 23 performs contrastive learning between the image instances in a domain and the representative prototype features of the other (target) domains. Using a modified contrastive loss, it pulls image instances closer to cross-domain prototypes with the same semantics and pushes them away from cross-domain prototypes with different semantics, thereby performing cross-domain feature alignment at both the instance level and the prototype level.
[0093] For each sample from a source domain, execution submodule 23 computes its similarity to the prototypes of all other domains and constructs a cross-domain discriminative self-learning loss:
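A minimal sketch of the per-domain prototype step, assuming plain Lloyd's K-Means on extracted feature vectors. The helper names `kmeans` and `domain_prototypes`, the random initialization, the iteration count, and the toy "photo"/"sketch" data are illustrative choices, not details taken from the patent.

```python
import numpy as np


def kmeans(features, k, n_iter=50, seed=0):
    """Lloyd's K-Means: alternately assign each point to its nearest center and
    move each center to the mean of its cluster; neither step increases the
    total intra-cluster squared distance."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, k)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, assign


def domain_prototypes(features_by_domain, k):
    """Cluster each domain on its own; returns {domain: (k, D) prototype array}."""
    return {dom: kmeans(feats, k)[0] for dom, feats in features_by_domain.items()}


rng = np.random.default_rng(1)
photo = np.concatenate([rng.normal(0.0, 0.1, size=(20, 2)),
                        rng.normal(5.0, 0.1, size=(20, 2))])
sketch = photo + 1.0                      # same structure, shifted "style"
protos = domain_prototypes({"photo": photo, "sketch": sketch}, k=2)
print({dom: np.round(np.sort(p[:, 0]), 1) for dom, p in protos.items()})
```

Clustering each domain separately keeps the prototypes of one domain from being pulled toward the style of another, which is the motivation given in [0091].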
[0094] Formula (3)
[0095] In this embodiment, execution submodule 23 constructs a cross-domain discriminative loss, encouraging instance features to have a certain degree of distinguishability with prototypes outside their own domain. This achieves instance-prototype level cross-domain feature alignment, and the learned features can better generalize to unknown domains. Compared with instance-based alignment, execution submodule 23 learns features with stronger generalization ability by aligning with cross-domain prototypes.
[0096] According to formulas (1) and (3), the training objective of cross-domain prototype self-learning module 2 is:
[0097] Formula (4)
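Formula (3) is not recoverable from this text, so the following is only an assumed form of the cross-domain instance-prototype loss. For each instance, it takes a softmax over cosine similarities to the prototypes of every other domain and minimizes the entropy of that distribution. The instance is thereby pushed to commit to one prototype (pulled closer) and away from the rest. This mirrors entropy-based cross-domain matching used in some cross-domain self-supervised learning work. The function name, the temperature, and the averaging scheme are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def cross_domain_proto_loss(feats, domain_ids, prototypes, tau=0.1):
    """Assumed cross-domain instance-prototype loss (entropy minimization).

    feats:      (N, D) instance features.
    domain_ids: (N,) domain label of each instance.
    prototypes: {domain: (K, D) prototypes} from per-domain K-Means.
    For every domain, instances from *other* domains get a softmax over their
    cosine similarities to that domain's prototypes; the mean entropy of these
    distributions is the loss.
    """
    f = l2_normalize(feats)
    losses = []
    for dom, protos in prototypes.items():
        other = domain_ids != dom                 # only cross-domain pairs
        if not other.any():
            continue
        logits = f[other] @ l2_normalize(protos).T / tau
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p = p / p.sum(axis=1, keepdims=True)
        losses.append(-(p * np.log(p + 1e-12)).sum(axis=1).mean())
    return float(np.mean(losses)) if losses else 0.0


protos = {"sketch": np.array([[1.0, 0.0], [0.0, 1.0]])}
ids = np.array(["photo"])
matched = cross_domain_proto_loss(np.array([[1.0, 0.0]]), ids, protos)
ambiguous = cross_domain_proto_loss(np.array([[1.0, 1.0]]), ids, protos)
print(matched < ambiguous)
```

Under the same assumptions, the module objective of Formula (4) could be formed as a weighted sum such as `l_inst + lam * l_cross`, where `lam` is a hypothetical balancing weight.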
[0098] Figure 2 illustrates the workflow of the cross-domain prototype self-learning module 2.
[0099] The causal-guided enhancement module 3 receives data from the feature extraction module 1 and uses domain style feature mixing, key semantic feature suppression, and non-key region noise intervention to enable the model to obtain enhanced image samples.
[0100] The causal guidance enhancement module 3 includes a hybrid submodule 31, a suppression submodule 32, and a noise intervention submodule 33.
[0101] The hybrid submodule 31 is used to simulate the intervention of domain style variables on features. It exposes high-level semantic features to changes in different domain styles in order to reduce spurious-correlation dependencies related to domain style.
[0102] The hybrid submodule 31 takes the feature representation of the input image from the feature extractor and normalizes it. For two different samples, it computes the per-channel mean and standard deviation of their features, and constructs new style parameters by linearly interpolating the two samples' means and standard deviations. The features are then normalized and affinely transformed with the new style parameters, so that the feature extractor outputs a new feature representation with a mixed style.
[0103] In the hybrid submodule 31, the mean and standard deviation are calculated as follows:
[0104] Formula (5)
[0105] Formula (6)
[0106] where the encoder extracts the feature representation of the input image, whose three dimensions are the number of feature channels, the height, and the width, respectively.
[0107] The new style parameters are constructed as follows:
[0108] Formula (7)
[0109] Formula (8)
[0110] where the interpolation weight is sampled from a preset distribution;
[0111] The features are then normalized and affinely transformed:
[0112] Formula (10)
[0113] where the result is the new, style-mixed feature representation output by the feature extractor.
[0114] The hybrid submodule 31 assumes that the shallow features of deep neural networks are generally related to domain style, and uses the MixStyle technique to mix the domain styles of features from samples of different domains. This simulates the intervention of domain style on high-level semantic features, that is, it exposes high-level semantic features to different domain styles, thereby reducing spurious correlations with domain style.
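Because [0114] names MixStyle, the hybrid submodule can be sketched with the published MixStyle recipe. The sketch computes the per-channel mean and standard deviation over the spatial dimensions (as in Formulas (5) and (6)) and interpolates these statistics between each sample and a randomly chosen partner (Formulas (7) and (8)). It then re-styles the normalized features (Formula (10)). The Beta(α, α) mixing weight and `alpha=0.1` follow the MixStyle paper's defaults and are assumptions here. MixStyle is also usually applied only during training and only with some probability, which this sketch leaves to the caller.

```python
import numpy as np


def mixstyle(x, alpha=0.1, eps=1e-6, rng=None):
    """MixStyle on a batch of feature maps x with shape (B, C, H, W).

    Content (the normalized features) is kept; style (per-channel mean and
    standard deviation) is interpolated with a random partner in the batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    b = x.shape[0]
    mu = x.mean(axis=(2, 3), keepdims=True)                 # (B, C, 1, 1) channel means
    sig = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)  # (B, C, 1, 1) channel stds
    x_norm = (x - mu) / sig                                 # style-normalized content
    lam = rng.beta(alpha, alpha, size=(b, 1, 1, 1))         # assumed Beta mixing weight
    perm = rng.permutation(b)                               # partner sample per row
    mu_mix = lam * mu + (1 - lam) * mu[perm]                # interpolated mean
    sig_mix = lam * sig + (1 - lam) * sig[perm]             # interpolated std
    return sig_mix * x_norm + mu_mix                        # normalize + affine re-style


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3, 8, 8))                       # (batch, channels, height, width)
mixed = mixstyle(feats, rng=rng)
print(mixed.shape)
```

Since the normalized content has zero mean per channel, each output channel's mean equals the interpolated mean. The spatial layout (object content) is therefore preserved while the style moves toward the partner sample.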
[0115] The suppression submodule 32 is used to suppress highly activated semantic features. The suppression submodule 32 identifies the features in the image that have the greatest influence on classification decisions through Grad-CAM and masks the regions corresponding to these features so that the model can learn and reason based on other secondary features.
[0116] For the feature map of each channel and its corresponding gradient, the suppression submodule 32 obtains a weight coefficient per channel by average pooling the gradient over the spatial dimensions. The feature maps are combined, weighted by these coefficients, to form a heatmap. The heatmap is passed through a ReLU activation to suppress negative responses and preserve only the regions that contribute positively to the target class. It is then resized by interpolation to match the original image, aligning it in pixel space. Using this heatmap, the suppression submodule 32 identifies the highly sensitive regions of the input image and applies a first binary mask to reduce the focus on these regions; the output features of the feature extractor are multiplied by the first binary mask to obtain the final features.
[0117] In the suppression submodule 32, the weight coefficients are calculated as follows:
[0118] Formula (11)
[0119] where the weight coefficient reflects how important each channel is for predicting the current category;
[0120] The heatmap is expressed as follows:
[0121] Formula (12)
[0122] The first binary mask is defined using a preset threshold:
[0123] Formula (13)
[0124] The final feature is calculated as follows:
[0125] Formula (14)
[0126] where the result is the final feature, obtained from the output features of the feature extractor.
[0127] In this embodiment, the suppression submodule 32 utilizes Grad-CAM to identify the features in the image that have the greatest influence on classification decisions. These regions typically correspond to the core structures of objects in the image, such as an animal's face or the front of a vehicle, and often become the main basis for model discrimination during traditional training. By selectively masking these regions, the suppression submodule 32 encourages the model to no longer rely on a single cue for model discrimination, but instead prompts the model to learn and reason from other secondary features, thus preventing overfitting.
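The Grad-CAM steps of [0116] to [0126] (channel weights, heatmap, first binary mask, masked features) can be sketched as follows, under several assumptions. Activations and gradients are taken as given inputs; in practice they come from automatic differentiation of the target-class score. The heatmap is upsampled by nearest-neighbour repetition with an integer scale factor and scaled to [0, 1]. The threshold `thr=0.7` and all function names are hypothetical. Masking is shown on features with the same spatial size as the mask.

```python
import numpy as np


def gradcam_heatmap(acts, grads, out_hw):
    """Grad-CAM heatmap from activations and gradients of shape (C, h, w)."""
    alpha = grads.mean(axis=(1, 2))                                   # (C,) channel weights
    cam = np.maximum((alpha[:, None, None] * acts).sum(axis=0), 0.0)  # weighted sum + ReLU
    fy, fx = out_hw[0] // cam.shape[0], out_hw[1] // cam.shape[1]     # integer scale assumed
    cam = np.repeat(np.repeat(cam, fy, axis=0), fx, axis=1)           # resize to image size
    return cam / cam.max() if cam.max() > 0 else cam                  # scale to [0, 1]


def first_binary_mask(heatmap, thr=0.7):
    """1 = keep, 0 = suppress the highly sensitive (heatmap >= thr) region."""
    return (heatmap < thr).astype(np.float32)


def suppress(features, mask):
    """Multiply (C, H, W) features by an (H, W) mask, broadcasting over channels."""
    return features * mask[None, :, :]


acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0            # one strongly activated location
acts[1] = 0.1
grads = np.ones((2, 2, 2))     # both channels positively influence the class score
hm = gradcam_heatmap(acts, grads, (4, 4))
m = first_binary_mask(hm, thr=0.7)
masked = suppress(np.ones((3, 4, 4)), m)
print(m)
```

In this toy case, the top-left block (the most decision-relevant region) is zeroed out. Subsequent training must therefore rely on the remaining, secondary evidence, as [0127] describes.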
[0128] The noise intervention submodule 33 is used for random occlusion or perturbation of non-critical semantic regions in the sample to simulate intervention on non-causal factors.
[0129] Given an input image and the gradient map computed from it, the noise intervention submodule 33 defines a second binary mask, using a preset threshold to distinguish key from non-key regions of the image. It then intervenes in the non-key regions through clustered random noise masking to obtain enhanced image samples.
[0130] In noise intervention submodule 33, the method for calculating enhanced image samples is as follows:
[0131] Formula (15)
[0132] In this embodiment, the noise intervention submodule 33 applies controllable perturbations to non-causal backgrounds or irrelevant context to simulate intervention on non-causal factors. This further reduces the model's dependence on spurious features such as the background. The noise intervention submodule 33 uses the clustered random noise (CRN) technique, whose core idea is to randomly occlude or perturb non-critical semantic regions of the samples. CRN is a form of structured noise that targets the low-activation regions of the Grad-CAM activation map, concentrating the noise in the image background or in areas unrelated to the current task.
[0133] Figure 3 is a schematic diagram of the workflow of the causal guidance enhancement module 3.
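Formula (15) is not recoverable here, so the sketch below only illustrates the idea of [0129] to [0132] under assumptions. The second binary mask marks pixels whose gradient response falls below a threshold as non-critical. "Clustered" noise is approximated by uniform noise drawn on a coarse grid and upsampled into patches. The selected non-critical patches are then replaced by random values. All function names and the parameters `thr`, `cell` and `p` are hypothetical, and this is not necessarily how CRN is defined in the patent.

```python
import numpy as np


def second_binary_mask(grad_map, thr):
    """1 for non-critical pixels (low gradient response), 0 for critical ones."""
    return (grad_map < thr).astype(np.float64)


def clustered_noise_field(h, w, cell, rng):
    """Uniform noise on a coarse grid, upsampled so neighbouring pixels share a
    value and the noise forms blob-like clusters instead of i.i.d. speckle."""
    gh, gw = -(-h // cell), -(-w // cell)                   # ceil division
    coarse = rng.uniform(size=(gh, gw))
    return np.repeat(np.repeat(coarse, cell, axis=0), cell, axis=1)[:h, :w]


def noise_intervention(img, grad_map, thr=0.3, cell=4, p=0.5, rng=None):
    """Perturb clustered patches only inside the non-critical region.

    img: (C, H, W) image; grad_map: (H, W) gradient map scaled to [0, 1].
    p: fraction of coarse cells selected for perturbation.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = img.shape
    m2 = second_binary_mask(grad_map, thr)
    patches = (clustered_noise_field(h, w, cell, rng) < p).astype(np.float64)
    region = (m2 * patches)[None]                           # broadcast over channels
    noise = rng.uniform(img.min(), img.max(), size=img.shape)
    return img * (1 - region) + noise * region


rng = np.random.default_rng(0)
img = rng.uniform(size=(3, 8, 8))
grad = np.zeros((8, 8))
grad[:4, :4] = 1.0                                          # top-left quadrant is "critical"
out = noise_intervention(img, grad, thr=0.3, cell=2, p=1.0, rng=rng)
print(np.array_equal(out[:, :4, :4], img[:, :4, :4]))
```

Because the noise is gated by the second binary mask, the high-gradient (causal) region is never altered, and only background or context pixels are intervened on.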
[0134] Classification decision module 4 is used to combine cross-domain prototype self-learning module 2 and causal guidance enhancement module 3 to obtain pseudo-labels;
[0135] Classification Decision Module 4 defines the supervised loss as:
[0136] Formula (16)
[0137] where the input-label pairs are the labeled data, and the remaining terms denote the feature extractor, the classifier, and the cross-entropy loss function, respectively;
[0138] Classification Decision Module 4 defines unsupervised loss as:
[0139] Formula (17)
[0140] where the inputs are unlabeled data; the model's predictions on the augmented data serve as pseudo-labels for the samples; the feature-level and pixel-level augmented input representations are generated by the causal-guided enhancement module 3; the feature extractor is applied with style-feature mixing; two preset weighting coefficients balance the contributions of feature-level and pixel-level augmentation to the loss; and an indicator function filters out pseudo-labeled samples with low confidence.
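Formulas (16) and (17) are not recoverable from this text. The sketch below assumes a FixMatch-style form consistent with the descriptions in [0137] and [0140]. It uses cross-entropy on labeled data and hard pseudo-labels from the model's prediction on one view. An indicator keeps only samples whose confidence reaches a threshold, and weighted cross-entropy terms are applied to the feature-level and pixel-level augmented views. The threshold `tau=0.95` and the weights `lam_feat` and `lam_pix` are hypothetical names and values.

```python
import numpy as np


def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def cross_entropy(logits, labels):
    """Per-sample cross-entropy for integer class labels."""
    return -log_softmax(logits)[np.arange(len(labels)), labels]


def supervised_loss(logits_l, y_l):
    """Mean cross-entropy on labeled data."""
    return float(cross_entropy(logits_l, y_l).mean())


def unsupervised_loss(logits_weak, logits_feat, logits_pix,
                      tau=0.95, lam_feat=1.0, lam_pix=1.0):
    """Assumed pseudo-label consistency loss on the two causal-guided views.

    logits_weak: predictions used to form pseudo-labels.
    logits_feat / logits_pix: predictions on the feature-level and pixel-level
    augmented inputs.
    """
    probs = np.exp(log_softmax(logits_weak))
    pseudo = probs.argmax(axis=1)                       # hard pseudo-labels
    keep = (probs.max(axis=1) >= tau).astype(float)     # indicator: drop low confidence
    per_sample = (lam_feat * cross_entropy(logits_feat, pseudo)
                  + lam_pix * cross_entropy(logits_pix, pseudo))
    return float((keep * per_sample).mean()), pseudo, keep


weak = np.array([[10.0, 0.0], [0.1, 0.0]])              # confident, unconfident
strong = np.zeros((2, 2))
loss_u, pseudo, keep = unsupervised_loss(weak, strong, strong)
print(keep, loss_u)
```

Dividing by the full batch size, including masked samples, follows the FixMatch convention; the patent text does not state how the sum is normalized.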
[0141] The objective function of classification decision module 4 is:
[0142] Formula (18)
[0143] where the output is the set of pseudo-labels.
[0144] In this embodiment, on the PACS dataset (5 labeled samples per class), the method achieved an average accuracy of 83.60%, which is 6.71 percentage points higher than the second-best performing baseline method (such as DebiasMatch, 76.89%). On the more challenging MiniDomainNet dataset (5 labeled samples per class), the method achieved an average accuracy of 60.10%, significantly outperforming the mainstream baseline method FreeMatch (54.15%) by nearly 6 percentage points. Therefore, this embodiment significantly improves the model's classification accuracy in unknown target domains.
[0145] In this embodiment, the pseudo-label generation rate approaches 90%, while pseudo-label accuracy stabilizes at around 98% in the later stages of training. The harmonic mean of generation rate and accuracy reaches approximately 95%, well above FixMatch (75%), FreeMatch (75%), and SoftMatch (65%), showing that this embodiment uses unlabeled data more efficiently and reliably.
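The harmonic mean rewards a method only when both the quantity (generation rate) and the quality (accuracy) of pseudo-labels are high. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def harmonic_mean(rate, accuracy):
    """Harmonic mean of pseudo-label generation rate (quantity) and accuracy (quality)."""
    if rate + accuracy == 0:
        return 0.0
    return 2 * rate * accuracy / (rate + accuracy)

print(round(harmonic_mean(0.90, 0.98), 3))  # prints 0.938
```

With the rounded figures quoted above (0.90 and 0.98) this gives about 0.94; the exact value depends on the unrounded rates. By contrast, a method that labels every sample but is only 50% accurate scores about 0.67, which is why the harmonic mean is a fair single measure of the quantity-quality balance.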
[0146] In multi-object scenarios, the method of this embodiment activates the main features of all relevant objects, whereas the comparison methods often miss objects or activate incorrect regions. This indicates that the features learned in this embodiment are more causal and robust.
[0147] In this embodiment, ablation experiments show that introducing the PDS mechanism into baseline methods such as FixMatch, FreeMatch, and SoftMatch improves their performance by roughly 1% to 5%; for example, SoftMatch combined with PDS improves by 5.1%. This demonstrates the effectiveness and generality of the core mechanism of this embodiment.
[0148] By introducing causal inference, this embodiment systematically addresses key bottlenecks in semi-supervised domain generalization. It achieves leading performance on multiple benchmarks and provides an effective technical solution for deploying reliable visual models in real-world scenarios where labeled data are scarce and deployment environments vary widely.
Claims
1. A semi-supervised domain generalization system based on causal inference guidance, characterized in that it comprises: a feature extraction module, used for primary feature extraction from images; a cross-domain prototype self-learning module, which receives data from the feature extraction module, performs prototype contrastive learning on the primary features, and constructs a cross-domain discrimination loss, performing cross-domain feature alignment at the instance level and the prototype level so that the model can distinguish different image instances; a causal guidance enhancement module, which receives data from the feature extraction module and enables the model to obtain enhanced image samples through domain style feature mixing, key semantic feature suppression, and noise intervention in non-key regions; and a classification decision module, used to combine the cross-domain prototype self-learning module and the causal guidance enhancement module to obtain pseudo-labels.
2. The semi-supervised domain generalization system based on causal inference as described in claim 1, characterized in that: the cross-domain prototype self-learning module comprises an in-domain instance learning submodule, a domain-specific prototype learning submodule, and an execution submodule. The in-domain instance learning submodule assigns a unique instance label to each image in the dataset, constructs positive sample pairs from different augmented views of the same image, and obtains negative samples by sampling from a memory bank; using an instance-discrimination contrastive loss, it identifies each positive pair as belonging to the same instance label and the negative samples as different instances. The domain-specific prototype learning submodule extracts representative prototype features from the data of each domain and uses them as the core anchor points of that domain's data distribution, reflecting the typical patterns of different categories or semantic structures. The execution submodule performs contrastive learning between the image instances of a domain and the representative prototype features of the other domains; through a modified contrastive loss function, it pulls image instances closer to cross-domain prototype features with the same semantics and pushes them away from cross-domain prototype features with different semantics, thereby performing cross-domain feature alignment at the instance level and the prototype level.
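The instance-discrimination objective described in claim 2 is commonly implemented as an InfoNCE loss. The minimal sketch below does not claim to reproduce Formula (1); it assumes L2-normalised features compared by dot product, one positive view per anchor, and negatives already drawn from a memory bank. The function names and the temperature default are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE for a single anchor.

    anchor, positive: features of two augmented views of the same image
    negatives: features of other instances, e.g. drawn from a memory bank
    """
    q = l2_normalize(anchor)
    pos = math.exp(dot(q, l2_normalize(positive)) / tau)
    neg = sum(math.exp(dot(q, l2_normalize(k)) / tau) for k in negatives)
    return -math.log(pos / (pos + neg))
```

The loss is close to zero when the two views of an image agree and are far from the negatives, and grows large when a negative is more similar to the anchor than the positive view is.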
3. The semi-supervised domain generalization system based on causal inference as described in claim 2, characterized in that: the in-domain instance learning submodule learns a feature extractor that maps each sample to a unique location in the latent feature space so as to capture its semantic information, and uses the InfoNCE loss of contrastive learning as the instance discrimination objective: Formula (1), where $\mathcal{L}_{ins}$ is the instance discrimination objective, $F$ is the feature extractor, $\tau$ is the temperature parameter, $\mathcal{N}_i$ is the set of negative samples of $u_i$, and $u_i$ are the unlabeled input samples of the dataset $\mathcal{D}_d$ of the $d$-th domain; the domain-specific prototype learning submodule performs K-Means clustering on the feature representations $F(u_i)$ within each domain $\mathcal{D}_d$ to obtain $K$ prototypes $\{c_k^d\}_{k=1}^{K}$, each prototype $c_k^d$ being a cluster center, and the learning objective of the clustering can be formalized as iteratively minimizing the intra-cluster distance: $$\min_{\{c_k^d\}}\sum_{k=1}^{K}\sum_{u_i\in\mathcal{C}_k^d}\big\lVert F(u_i)-c_k^d\big\rVert_2^2 \qquad (2)$$ where $\mathcal{C}_k^d$ is the set of samples of $\mathcal{D}_d$ assigned to the $k$-th cluster; for each sample $u_i$ of the source domain $\mathcal{D}_d$, the execution submodule computes its similarity to the prototypes of all other domains $\mathcal{D}_{d'}$ ($d'\neq d$) and constructs a cross-domain discriminative self-learning loss: Formula (3); according to Formulas (1) and (3), the training objective of the cross-domain prototype self-learning module is: Formula (4).
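A minimal sketch of the prototype step and of one possible cross-domain loss follows. The K-Means part is plain Lloyd iteration for Formula (2). The loss is a hypothetical stand-in for Formula (3), which is not reproduced in this text: it assumes cosine similarity, a temperature `tau`, and that, within each other domain, the most similar prototype is treated as the semantic match (a self-labelling assumption). All function names are illustrative, and features are assumed to be non-zero vectors.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's K-Means: returns k cluster centres used as prototypes."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centres

def cross_domain_proto_loss(z, prototypes_by_domain, own_domain, tau=0.1):
    """Hypothetical cross-domain loss: for every other domain, the negative
    log-softmax (over cosine similarity / tau) of the best-matching prototype."""
    losses = []
    for domain, protos in prototypes_by_domain.items():
        if domain == own_domain:
            continue
        logits = [cosine(z, c) / tau for c in protos]
        m = max(logits)
        losses.append(math.log(sum(math.exp(l - m) for l in logits)))
    return sum(losses) / len(losses)
```

Minimising this stand-in loss sharpens each sample's assignment to a single prototype in every other domain, which is one way to pull same-semantics instances toward cross-domain prototypes and away from the rest.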
4. The semi-supervised domain generalization system based on causal inference as described in claim 3, characterized in that: the causal guidance enhancement module comprises a hybrid submodule, a suppression submodule, and a noise intervention submodule. The hybrid submodule simulates the intervention of domain style variables on features, exposing high-level semantic features to the style changes of different domains in order to reduce spurious correlations related to domain style. The suppression submodule suppresses highly activated semantic features: it uses Grad-CAM to identify the features in the image with the greatest influence on the classification decision and masks the corresponding regions, so that the model learns and reasons from other, secondary features. The noise intervention submodule randomly occludes or perturbs non-critical semantic regions of the sample to simulate intervention on non-causal factors.
5. The semi-supervised domain generalization system based on causal inference as described in claim 4, characterized in that: the hybrid submodule extracts the feature representation of the input image in the feature extractor and normalizes it: for the features $f_i$ and $f_j$ of two different samples, it calculates the mean $\mu$ and standard deviation $\sigma$ of each channel, constructs new style parameters $\gamma_{mix}$ and $\beta_{mix}$ by linear interpolation of the standard deviations and of the means, and normalizes and affine-transforms the features with the new style parameters, so that the feature extractor outputs a new feature representation with a mixed style; for the feature map $A^k$ of each channel $k$ and its gradient $\partial y^c/\partial A^k$ with respect to the score $y^c$ of class $c$, the suppression submodule obtains the weight coefficient $\alpha_k^c$ of each channel by average pooling over the spatial dimensions, weights the original feature maps by $\alpha_k^c$ and applies a ReLU activation to suppress negative responses, preserving the regions that contribute positively to the target class, and resizes the resulting heatmap $L^c$ by interpolation to the size of the original image, aligning it in pixel space to obtain a scaled heatmap $\tilde{L}^c$; the suppression submodule uses this heatmap to identify highly sensitive regions of the input image and applies a first binary mask $M_1$ to reduce attention on those regions, and multiplies the output of the feature extractor by the first binary mask to obtain the final feature; the noise intervention submodule takes an input image $x$ and the gradient map $G$ calculated from this image and defines a second binary mask $M_2$ using a preset threshold that distinguishes key regions from non-key regions of the image; the noise intervention submodule then intervenes in the non-key regions with random clustered noise masks to obtain the enhanced image sample $\tilde{x}$.
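6. The semi-supervised domain generalization system based on causal inference as described in claim 5, characterized in that: in the hybrid submodule, the mean $\mu$ and standard deviation $\sigma$ are calculated as follows: $$\mu(f)_k=\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}f_{k,h,w} \qquad (5)$$ $$\sigma(f)_k=\sqrt{\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\big(f_{k,h,w}-\mu(f)_k\big)^2} \qquad (6)$$ where the encoder $F$ extracts the feature representation of the input image, denoted $f\in\mathbb{R}^{C\times H\times W}$, with $C$, $H$, and $W$ the number of feature channels, the height, and the width, respectively, and $k=1,\dots,C$ indexes the channels; the new style parameters $\gamma_{mix}$ and $\beta_{mix}$ are constructed as follows: $$\gamma_{mix}=\lambda\,\sigma(f_i)+(1-\lambda)\,\sigma(f_j) \qquad (7)$$ $$\beta_{mix}=\lambda\,\mu(f_i)+(1-\lambda)\,\mu(f_j) \qquad (8)$$ where the mixing coefficient $\lambda$ is sampled from a preset distribution; the features $f_i$ are normalized and affine-transformed as follows: $$f_{mix}=\gamma_{mix}\odot\frac{f_i-\mu(f_i)}{\sigma(f_i)}+\beta_{mix} \qquad (10)$$ where $f_{mix}$ is the new mixed-style feature representation output by the feature extractor.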
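Formulas (5) to (10) follow the MixStyle pattern of re-normalising one sample's features with interpolated channel statistics. The sketch below assumes features are C x H x W nested lists, adds a small `eps` inside the square root for numerical stability (not part of Formula (6)), and draws λ from Beta(α, α) as in the original MixStyle work; the claim only says λ comes from a preset distribution, so that choice is an assumption. Function names are hypothetical.

```python
import math
import random

def channel_stats(feat, eps=1e-6):
    """Per-channel mean and standard deviation (Formulas (5) and (6)) of a C x H x W feature."""
    stats = []
    for channel in feat:
        values = [v for row in channel for v in row]
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        stats.append((mu, math.sqrt(var + eps)))  # eps: numerical-stability assumption
    return stats

def mix_style(f_i, f_j, lam, eps=1e-6):
    """Re-normalise f_i with style statistics interpolated between f_i and f_j."""
    mixed = []
    for channel, (mu_i, sd_i), (mu_j, sd_j) in zip(
            f_i, channel_stats(f_i, eps), channel_stats(f_j, eps)):
        gamma = lam * sd_i + (1 - lam) * sd_j  # Formula (7)
        beta = lam * mu_i + (1 - lam) * mu_j   # Formula (8)
        # Formula (10): normalise with f_i's own statistics, then apply the mixed style
        mixed.append([[gamma * (v - mu_i) / sd_i + beta for v in row] for row in channel])
    return mixed

def sample_lambda(alpha=0.1, rng=random):
    """Assumption: lambda ~ Beta(alpha, alpha), as in MixStyle."""
    return rng.betavariate(alpha, alpha)
```

With λ = 1 the features are returned essentially unchanged; with λ = 0 they take on the channel statistics of the other sample while keeping their own spatial content, which is the style intervention the hybrid submodule is meant to simulate.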
7. The semi-supervised domain generalization system based on causal inference as described in claim 5, characterized in that: in the suppression submodule, the weight coefficient is calculated as follows: $$\alpha_k^c=\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\frac{\partial y^c}{\partial A^k_{h,w}} \qquad (11)$$ where $\alpha_k^c$ reflects the importance of channel $k$ for predicting the current class $c$; the heatmap is expressed as: $$L^c=\mathrm{ReLU}\Big(\sum_{k}\alpha_k^c A^k\Big) \qquad (12)$$ the first binary mask $M_1$ is defined from a preset threshold $\delta$ as: $$M_1(h,w)=\begin{cases}0, & \tilde{L}^c(h,w)\ge\delta\\ 1, & \text{otherwise}\end{cases} \qquad (13)$$ and the final feature is calculated as: $$\hat{f}=f\odot M_1 \qquad (14)$$ where $\hat{f}$ is the final feature and $f$ is the output feature of the feature extractor.
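Formulas (11) to (14) can be sketched as follows. The sketch assumes the gradients ∂y^c/∂A^k are already available (no automatic differentiation is done here), thresholds the heatmap at feature-map resolution instead of after resizing it to the image, and assumes the features to be masked share that spatial grid. Function names are hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Heatmap from K feature maps A^k and gradients dy^c/dA^k (both K x H x W)."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Formula (11): spatial average of each channel's gradient
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    heat = [[0.0] * width for _ in range(height)]
    for alpha, fmap in zip(weights, feature_maps):
        for h in range(height):
            for w in range(width):
                heat[h][w] += alpha * fmap[h][w]
    # Formula (12): ReLU keeps only positive evidence for the target class
    return [[max(0.0, v) for v in row] for row in heat]

def first_binary_mask(heatmap, threshold):
    """Formula (13) as assumed here: 0 where the heatmap reaches the threshold, 1 elsewhere."""
    return [[0 if v >= threshold else 1 for v in row] for row in heatmap]

def suppress(features, mask):
    """Formula (14): multiply every channel element-wise by the mask."""
    return [[[v * m for v, m in zip(frow, mrow)] for frow, mrow in zip(channel, mask)]
            for channel in features]
```

Zeroing the most salient positions forces the classifier to rely on secondary evidence, which is the intended effect of suppressing highly activated semantic features.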
8. The semi-supervised domain generalization system based on causal inference as described in claim 5, characterized in that: in the noise intervention submodule, the enhanced image sample $\tilde{x}$ is calculated as follows: Formula (15).
9. The semi-supervised domain generalization system based on causal inference as described in claim 5, characterized in that: the classification decision module defines the supervised loss as: $$\mathcal{L}_s = \frac{1}{B}\sum_{i=1}^{B}\mathcal{H}\big(y_i,\ g(F(x_i))\big) \qquad (16)$$ where $(x_i, y_i)$ denotes the labeled data, $B$ is the number of labeled samples in a batch, $F$ is the feature extractor, $g$ is the classifier, and $\mathcal{H}$ is the cross-entropy loss function; the classification decision module defines the unsupervised loss as: Formula (17), where the loss is computed over the unlabeled data; the model prediction obtained from the augmented data serves as the sample's pseudo-label; the feature-level and pixel-level enhanced input representations are generated by the causal guidance enhancement module; the formula also uses a feature extractor that applies style-feature mixing; two preset weighting coefficients balance the contributions of feature enhancement and pixel enhancement to the loss; and an indicator function filters out pseudo-labeled samples with low confidence; the objective function of the classification decision module is: Formula (17), where $\hat{y}$ denotes the output pseudo-label.