A cross-domain medical image segmentation method based on frequency domain causal learning
By decoupling the appearance and content variables of medical images through frequency domain causal learning, and using frequency components for causal decoupling and counterfactual reasoning, the problem of insufficient model generalization ability in cross-domain medical image segmentation is solved, thereby improving segmentation accuracy and robustness.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HEFEI CITY COULD DATA CENT
- Filing Date
- 2026-02-28
- Publication Date
- 2026-06-19
AI Technical Summary
Existing cross-domain medical image segmentation methods suffer from problems such as significant cross-domain data distribution shifts, spurious correlations among tissue content variables, and insufficient generalization ability of segmentation models when faced with differences in imaging equipment, scanning protocols, and acquisition environments.
A frequency-domain causal learning-based approach is adopted, which transforms the image from the spatial domain to the frequency domain through discrete Fourier transform, decoupling appearance variables and content variables. The frequency component is used for causal decoupling and counterfactual reasoning, and combined with unsupervised clustering and feature fusion, to achieve cross-domain medical image segmentation.
It improves the model's structural segmentation accuracy and generalization performance in unseen medical image domains, reduces the interference of imaging differences and contextual confounding factors on model decisions, and enhances the robustness and consistency of segmentation results.
Figure CN122244068A_ABST
Description
Technical Field
[0001] This invention relates to the field of image data processing technology, specifically to a cross-domain medical image segmentation method based on frequency domain causal learning.
Background Technology
[0002] In recent years, big data and deep learning technologies have made remarkable progress in computer vision tasks such as image segmentation, object detection, and image captioning. However, most deep learning models strictly adhere to the "independent and identically distributed" assumption. In practical applications, external factors such as lighting, viewpoint, background, and scanning protocol can cause domain heterogeneity between training and testing data. This phenomenon, also known as domain shift, significantly reduces the robustness and generalization ability of the model.
[0003] While large-scale public datasets such as ImageNet and MS COCO offer millions of instance-level annotations to cover as many real-world scenarios as possible, challenges remain when migrating them to downstream tasks, including network parameter selection and high training and storage costs. Furthermore, in privacy-sensitive tasks, such as medical image segmentation, data sharing among multiple centers is difficult; and the annotation of medical images requires extensive involvement from experienced radiology experts, making it time-consuming and costly. Therefore, fully utilizing limited, accessible source domain data and adapting it to unseen target domains downstream is crucial for improving the accuracy and efficiency of cross-domain image segmentation tasks.
[0004] Domain generalization (DG) is characterized by fully supervised or semi-supervised training using only source domain data, with target domain data visible only at test time. Inspired by domain adaptation techniques, some DG methods learn domain-invariant features based on kernel methods or domain adversarial training. Li et al. (see Li D, Yang Y, Song YZ, et al. Learning to generalize: Meta-learning for domain generalization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1).) proposed a meta-learning method that follows the Model-Agnostic Meta-Learning (MAML) strategy by backpropagating, at each iteration, the second-order gradient computed on a random meta-test domain split off from the source domains. Subsequent meta-learning-based DG methods use similar strategies to meta-learn regularizers, feature-critic networks, or how to preserve semantic relationships. Another type of method is style augmentation, which aims to construct fictitious target samples by exchanging or augmenting the appearance attributes of samples from different source domains. Zhang et al. (see L. Zhang et al., “Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation,” IEEE Trans. Med. Imag., vol. 39, no. 7, pp. 2531–2540, Jul. 2020.) employed a series of stacked photometric and geometric transformations to improve robustness when training on source domain data. Chen et al. proposed AdvBias (see C. Chen et al., “Realistic adversarial data augmentation for MR image segmentation,” in Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent. (MICCAI). Cham, Switzerland: Springer, 2020, pp. 667–677.), which employs an adversarial augmentation technique based on a multiplicative bias field model to improve the accuracy of cross-center magnetic resonance segmentation.
[0005] Nevertheless, the aforementioned methods assume that data from multiple source domains are available. In practice, due to privacy or cost constraints, multi-source data are often unavailable, and training data often come from only a single source domain. Huang et al. proposed the Representation Self-Challenging (RSC) method (see Z. Huang, H. Wang, E. P. Xing, and D. Huang, “Self-challenging improves cross-domain generalization,” in Proc. Eur. Conf. Comput. Vis. (ECCV). Cham, Switzerland: Springer, 2020, pp. 124–140.), which achieves single-source domain generalization by suppressing the features that produce the largest loss gradients, on the assumption that these features are domain-dependent. Another mainstream approach combines multiple strategies, including data augmentation, adversarial training, and contrastive learning (see Z. Wang, Y. Luo, R. Qiu, Z. Huang, and M. Baktashmotlagh, “Learning to diversify for single domain generalization,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 834–843.), to simulate real-world imaging perturbations and fully exploit the domain-independent properties of the limited source samples.
[0006] In summary, single-source domain generalization (SDG) remains a challenge in cross-domain image segmentation tasks. Existing data augmentation and feature alignment methods still suffer from the following limitations:
1) Ambiguity of appearance variables: Traditional data augmentation methods typically apply imaging transformations (contrast, intensity, texture, etc.) and geometric transformations (rotation, shearing, scaling, etc.) globally to the image. On the one hand, the lack of standardized constraints on transformation parameters easily leads to overgeneralization of single-source data; on the other hand, the failure to decouple content variables results in semantic inconsistencies before and after sample augmentation.
2) Spurious associations among content variables: Content variables can be decoupled into foreground and background variables. However, due to confounding factors in complex backgrounds, objects in the background may be statistically rather than causally correlated with the object of interest, resulting in spurious associations among content variables. Examples include metal electrocardiogram electrodes in chest X-rays and abdominal tissue in different MRI sequences of the same subject. Such spurious associations ultimately affect the model's decision rules and blur segmentation boundaries.
3) Structural sensitivity: In medical image segmentation tasks, inter-domain differences often manifest not only as changes in appearance style but also directly affect how anatomical structures are imaged. For example, under different scanning devices, reconstruction algorithms, or imaging sequences, the same tissue boundary may exhibit different contrast and sharpness, making the model prone to misinterpreting imaging differences as structural differences. When appearance perturbations are coupled with actual anatomical changes, existing methods struggle to guarantee the consistency of structural semantics, leading to boundary drift and incorrect segmentation of organ morphology in cross-center or cross-modal data.
4) Time and space complexity: Most data augmentation methods require the additional training of generative models to simulate the style of unseen domains, which inevitably introduces extra computational and storage costs.
[0007] Therefore, how to overcome both the ambiguity of appearance variables and the spurious associations among content variables, and thereby achieve efficient and accurate cross-domain image segmentation, has become an urgent technical problem to be solved.
Summary of the Invention
[0008] The purpose of this invention is to address the problems in multi-center medical image segmentation tasks, such as significant cross-domain data distribution shifts, spurious correlations among tissue content variables, and insufficient generalization ability of segmentation models caused by differences in imaging equipment, scanning protocols, and acquisition environments. This invention proposes a cross-domain medical image segmentation method based on frequency domain causal learning to solve these problems.
[0009] To achieve the above objectives, the technical solution of the present invention is as follows:
[0010] A cross-domain medical image segmentation method based on frequency domain causal learning includes the following steps:
[0011] 11) Preprocessing of the medical image segmentation dataset: Randomly select one domain of the medical image dataset as the source domain and divide it into training and validation sets according to a given ratio. The remaining domains are used as target domains and are preprocessed in the same way.
[0012] 12) Causal decoupling of appearance variables: Based on the structural causal model and from the perspective of image generation, the discrete Fourier transform is used to transform the image from the spatial domain to the frequency domain, and the statistics of the multiple frequency components contained in the image are computed and stored.
[0013] 13) Causal decoupling of content variables: Based on the ground-truth mask, the background region of the image is extracted and its pixels are clustered in an unsupervised manner. Then, based on the feature associations between image patches, multiple semantic structures are separated to generate a contextual confounder observation set.
[0014] 14) Two-stage causal counterfactual reasoning: Quantitative interventions of different intensities are applied to the multiple appearance factors decoupled from each image, and each factor is adaptively assigned a weight according to the change in causal effect before and after the intervention. The two-stage counterfactual reasoning method comprises a counterfactual reasoning stage and a component-aware feature alignment stage.
[0015] 15) Backdoor adjustment of content variables: The pre-decoupled content variables are treated as a contextual confounder set, and the confounder observation set is weighted using the normalized weighted geometric mean algorithm. Each context factor, together with the frequency-intervened sample, is used for feature fusion at the encoder.
[0016] 16) Segmentation model training and result generation: The cross-domain medical image segmentation model is trained end-to-end by jointly optimizing the segmentation loss and causal constraint loss. After training, the medical image to be segmented is input into the cross-domain medical image segmentation model, and the corresponding pixel-level segmentation result is output.
[0017] The preprocessing of the medical image segmentation dataset includes the following steps:
[0018] 21) Randomly select one domain of the medical image dataset as the source domain $\mathcal{D}^s=\{(x_i^s,y_i^s)\}_{i=1}^{N_s}$, where the original images $x_i^s\in\mathbb{R}^{H\times W}$ serve as training samples and $y_i^s\in\{0,1\}^{C\times H\times W}$ are the pixel-level segmentation labels, in which $C$ is the total number of categories and $y_i^s(c,h,w)=1$ indicates that pixel $(h,w)$ belongs to the $c$-th class. The remaining domains are used as target domains for model testing. $N_s$ is the total number of samples in the source domain, $H$ is the image height, and $W$ is the image width.
[0019] 22) For the training samples, geometric augmentation and intensity augmentation are used to increase sample diversity, and the training set, validation set and test set are randomly divided.
[0020] The causal decoupling of appearance variables includes the following steps:
[0021] 31) Appearance factors are represented as the statistical characteristics of medical images on different frequency components in the frequency space, with different frequency components corresponding to different types of appearance change. Frequency component decomposition is introduced to map the appearance factors of medical images to low-frequency, mid-frequency, and high-frequency components. By adaptively learning the causal effect on the domain shift of the appearance factor corresponding to each frequency component, causal decoupling and domain generalization of appearance changes are achieved.
[0022] Perform a discrete Fourier transform on the training samples in the source domain:
[0023] $$\mathcal{F}(x_i^s)(u,v)=\sum_{h=0}^{H-1}\sum_{w=0}^{W-1}x_i^s(h,w)\,e^{-j2\pi\left(\frac{uh}{H}+\frac{vw}{W}\right)},$$
[0024] where $\mathcal{F}(\cdot)$ denotes the discrete Fourier transform operator; $(u,v)$ are the frequency-domain coordinates, with $u$ the frequency index in the vertical direction and $v$ the frequency index in the horizontal direction; $x_i^s$ denotes the $i$-th sample in the source domain; $x_i^s(h,w)$ is the pixel value of the image at spatial location $(h,w)$, with $h$ the row index and $w$ the column index; $H\times W$ is the spatial resolution of the image, where $H$ is the image height (number of rows) and $W$ is the image width (number of columns); the double sum runs over all spatial locations of the image; $e^{-j2\pi(uh/H+vw/W)}$ are the complex exponential basis functions used to decompose the spatial signal into different frequency components; and $j$ is the imaginary unit;
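The transform in step 31) can be illustrated with a minimal NumPy sketch. The function names and the random 192×192 test image are assumptions made for illustration; centring the spectrum with `fftshift` is a convenience for later band masking and is not stated in the text.

```python
import numpy as np

def to_frequency_domain(image: np.ndarray) -> np.ndarray:
    """Return the centred 2-D DFT of a single-channel image of shape (H, W)."""
    spectrum = np.fft.fft2(image)        # sum over (h, w) of x(h, w) * exp(-j2*pi*(uh/H + vw/W))
    return np.fft.fftshift(spectrum)     # move the zero-frequency term to the centre

def to_spatial_domain(spectrum: np.ndarray) -> np.ndarray:
    """Invert to_frequency_domain; the imaginary part is numerical noise for real inputs."""
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

rng = np.random.default_rng(0)
image = rng.random((192, 192))           # stand-in for one normalized MRI slice
spectrum = to_frequency_domain(image)
recovered = to_spatial_domain(spectrum)
```

The round trip recovers the input up to floating-point error, which is what allows interventions to be applied in the frequency domain and projected back.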
[0025] 32) Based on band-pass masks $M_n$, the spectrum is decomposed into $N$ frequency components of equal bandwidth. Each frequency component corresponds to an appearance sub-factor, which characterizes the specific appearance change pattern within that frequency band:
[0026] $$\{F_i^n\}_{n=1}^{N},\qquad F_i^n=\mathcal{F}(x_i^s)\odot M_n,$$
[0027] $$M_n(u,v)=\begin{cases}1, & r_{n-1}\le d(u,v)<r_n\\ 0, & \text{otherwise}\end{cases},$$
[0028] $$r_n=n\cdot\beta\cdot\min(H,W),\qquad r_0=0,$$
[0029] where $\beta$ is the scale factor of the mask, which controls the bandwidth; $N$ is the number of frequency components; $\{F_i^n\}_{n=1}^{N}$ is the set of frequency components obtained by band decomposition; $F_i^n$ is the frequency representation of the $n$-th sub-band; $M_n(u,v)$ is the value of mask $M_n$ at frequency coordinates $(u,v)$; $d(u,v)=\max\left(|u-H/2|,|v-W/2|\right)$ is the maximum distance from $(u,v)$ to the centre point of the image; $\min(H,W)$ is the minimum spatial dimension of the image; $r_n$ is the cutoff radius of the $n$-th band, which determines the size of the passband; and $\odot$ denotes element-wise multiplication;
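The band decomposition of step 32) can be sketched as follows. This is an illustrative sketch under stated assumptions: distance to the spectrum centre is measured as the larger of the vertical and horizontal offsets, and the band edges are spread evenly over the whole spectrum rather than derived from the mask scale factor, whose exact definition is not recoverable here. `band_masks` and `decompose` are hypothetical helper names.

```python
import numpy as np

def band_masks(height, width, num_bands):
    """Build `num_bands` binary ring masks of equal bandwidth on a centred spectrum."""
    u = np.arange(height)[:, None] - height // 2
    v = np.arange(width)[None, :] - width // 2
    dist = np.maximum(np.abs(u), np.abs(v))        # assumed distance to the centre
    edges = np.linspace(0, dist.max() + 1, num_bands + 1)
    # Half-open intervals [lo, hi) so every frequency falls in exactly one band.
    return [((dist >= lo) & (dist < hi)).astype(float)
            for lo, hi in zip(edges[:-1], edges[1:])]

def decompose(image, num_bands):
    """Split the centred spectrum of `image` into band-limited components."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    return [spectrum * mask for mask in band_masks(*image.shape, num_bands)]

rng = np.random.default_rng(0)
image = rng.random((192, 192))
components = decompose(image, num_bands=14)
full_spectrum = np.sum(components, axis=0)
rebuilt = np.real(np.fft.ifft2(np.fft.ifftshift(full_spectrum)))
```

Because the masks partition the spectrum, summing all components and inverting reproduces the original image; this property is what makes it safe to perturb one band while leaving the others untouched.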
[0030] 33) To characterize the intra-domain statistical properties of each appearance factor in the frequency space, a Gaussian distribution $\mathcal{N}(\mu_n,\sigma_n^2)$ of spectral information is constructed for each frequency component to describe the statistical distribution of that appearance factor and to support subsequent causal modeling:
[0031] $$\mu_n=\frac{1}{N_s}\sum_{i=1}^{N_s}F_i^n,\qquad \sigma_n^2=\frac{1}{N_s}\sum_{i=1}^{N_s}\left|F_i^n-\mu_n\right|^2,$$
[0032] where $\sigma_n^2$ is the variance of the frequency component distribution, whose magnitude represents how strongly that frequency component varies under a potential domain shift and thus allows quantitative interventions that simulate domain shift with differentiated noise; $n$ is the frequency component index; $i$ is the source domain sample index; $N_s$ is the number of source domain samples; $F_i^n$ is the $n$-th frequency component of the $i$-th source domain sample; and $\mu_n$ is the mean of the $n$-th frequency component over the source domain.
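A minimal sketch of the per-band statistics in step 33) is given below. It assumes, for illustration only, that each band of each sample is summarised by its mean spectral amplitude before the mean and variance are taken across samples; the text does not specify this reduction. `band_statistics` and the random demo array are hypothetical.

```python
import numpy as np

def band_statistics(components):
    """components: complex array of shape (num_samples, num_bands, H, W).

    Returns the per-band mean and variance (each of shape (num_bands,)) of the
    mean spectral amplitude, computed across the samples of one source domain.
    """
    amplitude = np.abs(components)
    energy = amplitude.mean(axis=(2, 3))      # one scalar per sample and band (assumed summary)
    return energy.mean(axis=0), energy.var(axis=0)

rng = np.random.default_rng(0)
demo = rng.normal(size=(8, 14, 16, 16)) + 1j * rng.normal(size=(8, 14, 16, 16))
mu, var = band_statistics(demo)
```

The resulting vectors play the role of the per-band mean and variance that later constrain the scale of the random interventions.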
[0033] The causal decoupling of content variables includes the following steps:
[0034] 41) Construct the context observation image set corresponding to the content variables:
[0035] By using the source domain segmentation mask to occlude the target object and remove pixel information that is semantically related to the target, content variable observation samples containing only contextual information are obtained, and a contextual image set $\mathcal{B}$ is constructed:
[0036] $$\mathcal{B}=\left\{\,b_i \mid b_i=x_i^s\odot(1-y_i^s),\ (x_i^s,y_i^s)\in\mathcal{D}^s\,\right\},$$
[0037] where $x_i^s$ and $y_i^s$ denote the original source domain image and its source domain label (used here as a foreground mask); $b_i$ is the extracted contextual image sample, used as the initial observation of the content variables; and $\mathcal{D}^s$ is the source domain dataset;
[0038] Based on the context image set $\mathcal{B}$, the K-means++ algorithm is used to cluster pixels so as to divide spatial regions with consistent contextual attributes, thereby constructing a set of image patches $\mathcal{R}$:
[0039] $$\mathcal{R}=\{R_1,R_2,\dots,R_N\},$$
[0040] where $N$ is a hyperparameter that controls the granularity of the content variable space partition, and $R_n$ denotes the geometric region of the image patch corresponding to the $n$-th pixel cluster;
[0041] 42) Extracting high-dimensional contextual feature representations of content variables:
[0042] The image patch set $\mathcal{R}$ is fed into a pre-trained convolutional neural network $\phi$ to extract the contextual feature set.
[0043] Given the $n$-th geometric region $R_n$, the corresponding high-dimensional convolutional feature extraction is expressed as:
[0044] $$f_i^n=\phi\left(p_i^n\right),$$
[0045] where $p_i^n$ denotes the image patch obtained by cropping source domain sample $x_i^s$ to the geometric region $R_n$, and $f_i^n$ is the corresponding high-dimensional context feature, which characterizes the observed form of the content variables in this spatial region;
[0046] 43) Generate confounder observations of the content variables and complete causal decoupling:
[0047] To obtain a stable representation of the content variables, K-means++ clustering is performed on the high-dimensional contextual features $f_i^n$ of all samples in region $R_n$:
[0048] $$\{C_1^n,\dots,C_K^n\}=\operatorname{KMeans{+}{+}}\left(\{f_i^n\}_{i=1}^{N_s}\right),$$
[0049] where $C_k^n$ denotes the $k$-th content variable feature cluster in region $R_n$;
[0050] Based on the clustering results, the image patches corresponding to the sample IDs within each cluster are extracted and averaged pixel-wise:
[0051] $$\bar{b}_k^n=\frac{1}{N_k}\sum_{i\in C_k^n}p_i^n,$$
[0052] where $p_i^n$ is the image patch of sample $i$ in region $R_n$ and $N_k$ is the total number of samples in the $k$-th cluster;
[0053] By averaging image patches within the same contextual pattern, discriminative differences related to the target structure in individual samples are eliminated and only stable contextual statistical properties are retained. The averaged patches are defined as the final observation form of the content variables, thereby achieving causal decoupling between content variables and target semantic variables.
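The masking and averaging operations of steps 41) to 43) can be sketched in a few lines. This is a simplified illustration: the cluster labels are supplied by hand here, whereas the method obtains them from K-means++ clustering of CNN features, and every cluster is assumed to be non-empty. `context_image` and `cluster_average` are hypothetical helper names.

```python
import numpy as np

def context_image(image, mask):
    """Zero out the target object so that only background context remains."""
    return image * (1 - mask)

def cluster_average(patches, labels, num_clusters):
    """Average the patches assigned to each cluster.

    patches: (M, h, w) array; labels: (M,) integer cluster ids.
    Assumes every cluster id in range(num_clusters) occurs at least once.
    """
    return np.stack([patches[labels == k].mean(axis=0) for k in range(num_clusters)])

image = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1                                   # toy foreground region
background = context_image(image, mask)

patches = np.stack([np.full((2, 2), v) for v in (1.0, 3.0, 10.0, 20.0)])
labels = np.array([0, 0, 1, 1])                      # stand-in for clustering output
prototypes = cluster_average(patches, labels, num_clusters=2)
```

Each row of `prototypes` corresponds to one averaged context pattern, i.e. one observation of a content variable.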
[0054] The two-stage counterfactual reasoning of causality includes the following steps:
[0055] 51) In order to adaptively learn the causal effects of different components, given a source domain image $x_i^s$, first randomly select $K$ frequency components and their corresponding intervention intensities $\lambda_n$, and perform a random intervention using the intra-domain statistics from step 33), namely the variance $\sigma_n^2$:
[0056] $$\tilde{F}_i^n=F_i^n+\lambda_n\,\varepsilon_n,\qquad \varepsilon_n\sim\mathcal{N}(0,\sigma_n^2),$$
[0057] where $F_i^n$ is the $n$-th frequency component of the $i$-th source domain image; $\lambda_n$ is the intervention intensity applied to the $n$-th frequency component; $\varepsilon_n$ is the perturbation value randomly drawn from the normal distribution; $\mathcal{N}(\cdot)$ denotes a normal distribution; and $\sigma_n^2$ is the variance of the $n$-th frequency component distribution obtained statistically over the source domain, which constrains the scale of the random intervention so that its intensity is consistent with the statistical characteristics of that frequency component.
[0058] Then, the $N$ frequency components are linearly combined and the frequency-domain sample is projected back into the spatial domain by the inverse Fourier transform, constructing a frequency-augmented synthetic sample that simulates the domain shift encountered in real-world scenarios:
[0059] $$\tilde{x}_i^s=\mathcal{F}^{-1}\Big(\sum_{n=1}^{N}\tilde{F}_i^n\Big),$$
[0060] where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform; $\tilde{F}_i^n$ is the $n$-th frequency component after intervention with intensity $\lambda_n$ (components that were not selected are left unchanged); and $\tilde{x}_i^s$ is the source domain image after frequency intervention;
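Step 51) can be illustrated with the sketch below. The exact form of the intervention is not recoverable from the text, so the code makes a labelled assumption: each selected band is rescaled by a random factor whose spread is set by the band's standard deviation and the chosen intensity. `intervene`, the two-band split, and the `sigma` values are hypothetical.

```python
import numpy as np

def intervene(components, sigma, strengths, rng):
    """Perturb selected frequency bands and return the synthetic spatial image.

    components: list of N centred complex band spectra that sum to the full spectrum.
    sigma: (N,) per-band standard deviations from the source-domain statistics.
    strengths: dict {band index: intervention intensity}; other bands are untouched.
    """
    perturbed = []
    for n, comp in enumerate(components):
        if n in strengths:
            # Assumed form: random rescaling of the band, scaled by its statistics.
            comp = comp * (1.0 + strengths[n] * rng.normal(0.0, sigma[n]))
        perturbed.append(comp)
    spectrum = np.sum(perturbed, axis=0)             # linear combination of the bands
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

rng = np.random.default_rng(0)
image = rng.random((32, 32))
spectrum = np.fft.fftshift(np.fft.fft2(image))
yy, xx = np.indices(image.shape)
low = np.maximum(np.abs(yy - 16), np.abs(xx - 16)) < 8   # toy low-frequency mask
components = [spectrum * low, spectrum * ~low]
sigma = np.array([0.1, 0.2])

unchanged = intervene(components, sigma, {}, rng)
augmented = intervene(components, sigma, {1: 1.0}, rng)
```

With no band selected the original image is returned, so the intervention only changes the parts of the spectrum that were explicitly chosen.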
[0061] 52) In order to quantitatively learn the causal influence of the $N$ frequency components on the segmentation results, frequency interventions of varying intensities are applied to each component on the basis of $\tilde{x}_i^s$. Following the causal model, the intervention is written with the do-operator $do(\cdot)$, which represents an intervention applied to a node in the causal graph. The segmentation result under intervention is therefore expressed as:
[0062] $$\hat{y}_i^n=D\big(E(\tilde{x}_i^{s,n})\big)\sim P\big(y\mid x_i^s,\ do(F_i^n=\tilde{F}_i^n)\big),\qquad \lambda_n\in\Lambda,$$
[0063] where $\Lambda$ is the set of intervention intensities; $D$ and $E$ are the decoder and encoder of the segmentation model, respectively; $\tilde{x}_i^{s,n}$ is the $i$-th source domain image after intervention on its frequency component $F_i^n$; $\hat{y}_i^n$ is the label map output by the segmentation model for that intervened image; and $P\big(y\mid x_i^s,\ do(F_i^n=\tilde{F}_i^n)\big)$ is the conditional probability distribution of the segmentation label $y$ given the medical image $x_i^s$ after intervention on its frequency component $F_i^n$, which characterizes the model's prediction uncertainty about the mapping between the intervened image and the label;
[0064] 53) The causal effect caused by the $n$-th frequency component is calculated as follows:
[0065] $$CE_n=\Big|\,P\big(y\mid x_i^s,\ do(F_i^n=\tilde{F}_i^n)\big)-P\big(y\mid x_i^s\big)\Big|,$$
[0066] where $n=1,\dots,N$; $CE_n$ characterizes the degree of influence of the $n$-th frequency component on the domain shift; and $P(y\mid x_i^s)$ is the predicted output distribution of the segmentation model on the input image $x_i^s$ when no causal intervention is applied to the frequency components. Based on this causal effect, a causal weight estimation network $G$ is further constructed, whose output serves as the feature weights:
[0067] $$w=\operatorname{softmax}\big(G(CE_1,\dots,CE_N)\big);$$
[0068] Here, softmax is the normalized exponential function used to convert the weight vector into a probability distribution; $w=(w_1,\dots,w_N)$ are the feature alignment weights of the frequency components; and $G(\cdot)$ is the prediction of the weight estimation network;
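The relationship between causal effects and alignment weights in step 53) can be sketched as below. Two simplifications are assumed and are not the method as claimed: the effect of each band is reduced to the mean absolute change in predicted probabilities, and the weight estimation network is replaced by a plain softmax over those effects. `causal_effects` and `softmax` are hypothetical helper names.

```python
import numpy as np

def causal_effects(p_base, p_intervened):
    """Mean absolute change in class probabilities caused by each band.

    p_base: (C, H, W) probabilities without intervention.
    p_intervened: (N, C, H, W) probabilities after intervening on each of N bands.
    """
    return np.abs(p_intervened - p_base[None]).mean(axis=(1, 2, 3))

def softmax(x):
    """Numerically stable normalized exponential."""
    z = np.exp(x - x.max())
    return z / z.sum()

p_base = np.full((2, 4, 4), 0.5)
shift = np.array([0.2, -0.2])[:, None, None]          # band 1 changes the prediction
p_intervened = np.stack([p_base, p_base + shift, p_base])
effects = causal_effects(p_base, p_intervened)
weights = softmax(effects)
```

Bands whose intervention changes the prediction more receive larger weights, which matches the stated intent of emphasising the components most responsible for domain shift.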
[0069] 54) Component-aware feature alignment stage: To alleviate the domain shift under different frequency components, $N$ feature mapping networks $\{T_n\}_{n=1}^{N}$ are constructed for the $N$ frequency components. They re-project the randomly augmented sample $\tilde{x}_i^s$ into the feature space of the original sample, and the frequency-component-aware alignment loss is then jointly optimized from the perspectives of feature distance and classification consistency.
[0070] The content variable backdoor adjustment includes the following steps:
[0071] 61) Based on the background context observation set, perform content variable backdoor adjustment on the background context features of the input medical image, including:
[0072] Based on the pre-built content variable observation set $\{\bar{b}_k\}_{k=1}^{K}$, the similarity between the background features of the input image and the $K$ content variables is calculated, and the observation set is weighted and aggregated according to the prior probability of each content variable to obtain the context-specific representation $B$.
[0073] 62) The context-specific representation is input into the encoder of the segmentation model, and the encoded features corresponding to the original medical image and the medical image after frequency domain intervention are fused. The fused features are used to calculate the final causal constraint loss.
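The weighted aggregation in step 61) can be sketched as follows. The weighting rule is an assumption: cosine similarity between the query feature and each content prototype is exponentiated, multiplied by the prior, and normalized, which is a softmax-style stand-in rather than the normalized weighted geometric mean named in step 15). `backdoor_context` and the toy vectors are hypothetical.

```python
import numpy as np

def backdoor_context(query, prototypes, prior):
    """Aggregate content prototypes into one context representation.

    query: (D,) background feature of the input image.
    prototypes: (K, D) features of the K content variable observations.
    prior: (K,) prior probability of each content variable.
    Returns (B, weights) with B = sum_k weights[k] * prototypes[k].
    """
    q = query / np.linalg.norm(query)
    z = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    scores = prior * np.exp(z @ q)            # assumed similarity-times-prior weighting
    weights = scores / scores.sum()
    return weights @ prototypes, weights

query = np.array([1.0, 0.0, 0.0])
prototypes = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
prior = np.array([0.5, 0.5])
B, weights = backdoor_context(query, prototypes, prior)
```

Summing over all content variables, rather than picking only the closest one, is what distinguishes a backdoor-style adjustment from a nearest-neighbour lookup.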
[0074] The segmentation model training and result generation includes the following steps:
[0075] 71) Jointly optimize the source domain segmentation loss and causal constraint loss to constrain the feature consistency of the segmentation model under different scenarios of intervention of appearance variables and adjustment of content variables;
[0076] 72) Iteratively execute model training until the model converges, fix the segmentation model parameters, and use the trained segmentation model to perform segmentation prediction on the target domain medical image, and output the segmentation result of the target domain medical image.
[0077] A computer-readable storage medium storing a computer program that, when executed by a processor, implements the cross-domain medical image segmentation method based on frequency domain causal learning.
[0078] A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, which, when executed by the processor, implements the cross-domain medical image segmentation method based on frequency domain causal learning.
[0079] Beneficial effects
[0080] This invention provides a cross-domain medical image segmentation method based on frequency domain causal learning. Compared with existing technologies, this method starts from the medical image generation mechanism, decouples the appearance variables and content variables causally, and combines counterfactual reasoning and backdoor adjustment strategies to reduce the interference of imaging differences and contextual confounding factors on model decision-making, thereby improving the model's structural segmentation accuracy and generalization performance in the unseen medical image domain.
[0081] This invention employs a frequency-space randomization strategy which, compared with traditional random transformations in the image spatial domain, reduces modifications to domain-invariant image structures, preserving semantic structure while enhancing the model's domain generalization ability. In medical imaging scenarios, this strategy effectively maintains key anatomical structural information such as organ boundaries and lesion morphology, mitigating imaging style shifts caused by variations in imaging equipment or scanning parameters.
[0082] This invention proposes a two-stage causal counterfactual reasoning method that adaptively estimates the causal effects of interventions by components at different frequencies, while implicitly guiding the encoder to learn domain-shared features through feature alignment. This method enables the model to focus on stable causal features related to anatomical semantics, reducing dependence on domain-specific imaging patterns, thereby improving segmentation robustness and decision consistency in unseen medical image domains.
[0083] This invention uses a content variable backdoor adjustment method that extracts the background context confounder observation set in an unsupervised manner and mixes it with the original semantic features in the feature space. This helps overcome spurious associations between foreground targets and background, further improving segmentation accuracy. In medical image segmentation tasks, this mechanism can reduce the interference of complex tissue backgrounds or spuriously correlated structures with the model's discrimination process, improving the accuracy and boundary consistency of target organ and lesion region identification.
Attached Figure Description
[0084] Figure 1 is a flow chart of the method of the present invention;
[0085] Figure 2 is a schematic diagram of the frequency component visualization analysis of the present invention;
[0086] Figure 3 shows the results of generating the context observation set of the present invention;
[0087] Figure 4 is a comparison of the domain generalization segmentation results of the present invention.
Detailed Implementation
[0088] To provide a better understanding of the structural features and effects achieved by the present invention, a detailed description is provided below, accompanied by preferred embodiments and accompanying drawings:
[0089] As shown in Figure 1, the cross-domain medical image segmentation method based on frequency domain causal learning described in this invention includes the following steps:
[0090] S1: Preprocessing of multicenter prostate image segmentation dataset.
[0091] This embodiment uses a publicly available dataset of T2-weighted prostate MRI images from six different medical centers; its statistics are shown in Table 1. Since MRI images are usually stored as 3D volumes, they are first converted into sequences of 2D slices along the axial plane, each slice is uniformly resized to a 192×192 grayscale image, and all slices are then normalized to zero mean and unit variance.
[0092] To enhance the diversity of training data, geometric augmentation and intensity augmentation are randomly applied each time data is loaded. In this embodiment, geometric augmentation includes affine and elastic transformations, while intensity augmentation includes brightness, contrast, and gamma transformations and additive Gaussian noise. In each training run, this embodiment selects one domain in turn as the source domain $\mathcal{D}^s$, and the remaining 5 domains serve as target domains $\mathcal{D}^t$, which are used to evaluate the model's domain generalization ability. The source domain dataset $\mathcal{D}^s$ is further randomly divided into training, validation, and test sets at a ratio of 70%-10%-20%.
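The normalization and 70%-10%-20% split of step S1 can be sketched as below. The helper names, the epsilon guard against zero variance, and the toy volume are assumptions for illustration; the augmentation pipeline is omitted.

```python
import numpy as np

def normalize(slices):
    """Zero-mean, unit-variance normalization of an array of slices."""
    return (slices - slices.mean()) / (slices.std() + 1e-8)   # epsilon avoids division by zero

def split_indices(num_samples, rng, ratios=(0.7, 0.1, 0.2)):
    """Randomly split sample indices into train / validation / test subsets."""
    order = rng.permutation(num_samples)
    n_train = int(round(ratios[0] * num_samples))
    n_val = int(round(ratios[1] * num_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

rng = np.random.default_rng(0)
slices = rng.normal(loc=100.0, scale=15.0, size=(100, 192, 192))  # toy source-domain slices
normalized = normalize(slices)
train_idx, val_idx, test_idx = split_indices(len(slices), rng)
```

Splitting by index rather than copying arrays keeps the three subsets disjoint and makes the split reproducible from the random seed.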
[0093] S2: Perform a discrete Fourier transform on the source domain samples from step S1. The transformation formula is as follows:
[0094] $$\mathcal{F}(x_i^s)(u,v)=\sum_{h=0}^{H-1}\sum_{w=0}^{W-1}x_i^s(h,w)\,e^{-j2\pi\left(\frac{uh}{H}+\frac{vw}{W}\right)},$$
[0095] Here, $x_i^s(h,w)$ is the pixel value of the $i$-th source domain image at row $h$ and column $w$, $(u,v)$ are the frequency coordinates, and $H$ and $W$ represent the original image size, i.e., $H=W=192$. Since MRI slices are grayscale images, only a single channel needs to be processed. The above transformation converts the original image from the spatial domain to the frequency domain, preserving the anatomical semantics (phase spectrum) while facilitating subsequent interventional analysis of the domain-varying components (amplitude spectrum).
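[0096] S3: The Fourier-transformed image is decomposed into frequency components using band-pass filters. The decomposition formula is as follows:
[0097] $$F_i^n=\mathcal{F}(x_i^s)\odot M_n,\qquad n=1,\dots,N,$$
[0098] $$M_n(u,v)=\begin{cases}1, & (n-1)\,\beta\min(H,W)\le d(u,v)<n\,\beta\min(H,W)\\ 0, & \text{otherwise}\end{cases},$$
[0099] where $\beta$ controls the size of the band-pass mask, $d(u,v)$ is the distance from frequency coordinates $(u,v)$ to the centre of the spectrum, and $\odot$ denotes element-wise multiplication. In this embodiment, $N=14$ frequency components of equal bandwidth are obtained, covering low-, mid-, and high-frequency spectral components, which facilitates the subsequent differentiated analysis of the causal effects of domain shift.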
[0100] S4: For each source domain, with a total of $N_s$ samples, the intra-domain distribution information is computed from the frequency components decomposed in step S3 and stored. The calculation formula is as follows:
[0101] $$\mu_n=\frac{1}{N_s}\sum_{i=1}^{N_s}F_i^n,\qquad \sigma_n^2=\frac{1}{N_s}\sum_{i=1}^{N_s}\left|F_i^n-\mu_n\right|^2,$$
[0102] where the magnitude of the variance $\sigma_n^2$ indicates how strongly the $n$-th frequency component varies under a potential domain shift, which facilitates subsequent quantitative intervention using differentiated noise to simulate domain shift, and $\mu_n$ is the expected value of the $n$-th frequency component over the samples within the domain. The final domain statistics are stored in .pkl format and loaded when needed.
[0103] Steps S2-S4 above constitute the causal decoupling process for appearance variables.
[0104] S5: To construct the background context observation set, this embodiment uses the K-means++ algorithm to cluster pixels and construct an image patch set $\mathcal{R}$, with a fixed number of cluster centroids. However, because the background anatomy has no explicit annotations, the image patch set must be further classified. This embodiment selects a ResNet-18 network pre-trained on ImageNet as the feature extractor $\phi$. The image patches are resized to a uniform 224×224 and fed into $\phi$ for forward propagation; the output features of the last convolutional layer are extracted and global average pooling is applied to obtain a 1×1×512-dimensional feature for each patch, giving the feature set $Z\in\mathbb{R}^{N_p\times C_f}$, where $N_p$ is the total number of image patch samples and $C_f=512$ is the number of channels of the last convolutional layer.
[0105] S6: Based on the high-dimensional convolutional features extracted in step S5, the K-means++ algorithm is used again for feature clustering. According to the clustering results, the image patches corresponding to the sample IDs within each cluster are extracted and averaged pixel-wise, $\bar{b}_k=\frac{1}{N_k}\sum_{i\in C_k}p_i$, where $C_k$ is the $k$-th feature cluster, $p_i$ are the image patches assigned to it, and $N_k$ is the total number of samples within the $k$-th feature cluster.
[0106] Steps S5-S6 above constitute the causal decoupling process for content variables.
[0107] S7: Given a source domain image, first randomly select a subset of its frequency components together with their corresponding interference intensities, and apply a random intervention using the intra-domain statistics from step S4. Its expression is:
[0108]
[0109] Then the intervened frequency components are linearly combined and, via the inverse Fourier transform, the frequency-domain sample is projected back into the spatial domain to construct a frequency-augmented synthetic sample that simulates domain shift in real-world scenarios. Its expression is:
[0110] .
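The intervention and back-projection in step S7 can be sketched as below. The exact intervention expression is not recoverable from the text, so this sketch assumes a multiplicative form: each selected band is scaled by one plus Gaussian noise whose standard deviation comes from the stored statistics. The names `ring_masks` and `intervene`, and the per-band scalar `band_std`, are hypothetical.

```python
import numpy as np

def ring_masks(shape, n_bands):
    """Binary ring masks that partition the centered spectrum into equal-width bands."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    edges = np.linspace(0.0, radius.max(), n_bands + 1)
    edges[-1] += 1.0  # let the last band include the largest radius
    return np.stack([(radius >= lo) & (radius < hi) for lo, hi in zip(edges[:-1], edges[1:])])

def intervene(image, band_std, n_selected, rng):
    """Perturb randomly chosen bands, recombine, and return to the spatial domain."""
    masks = ring_masks(image.shape, len(band_std))
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    gains = np.ones(len(band_std))
    chosen = rng.choice(len(band_std), size=n_selected, replace=False)
    gains[chosen] += rng.normal(0.0, band_std[chosen])  # assumed multiplicative noise
    mixed = (spectrum[None] * masks * gains[:, None, None]).sum(axis=0)
    return np.fft.ifft2(np.fft.ifftshift(mixed)).real, chosen

rng = np.random.default_rng(0)
image = rng.random((192, 192))
unchanged, _ = intervene(image, np.zeros(14), 3, rng)        # zero variance: no change
augmented, chosen = intervene(image, np.full(14, 0.5), 3, rng)
```

With zero standard deviation the gains stay at one, so the output equals the input; this makes the role of the stored variance as the intervention scale easy to verify.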
[0111] S8: Further, to quantitatively learn the causal effects of the 14 frequency components on the segmentation results, random interventions of varying intensities are applied to each frequency component based on the do operator. The intervention expression is as follows:
[0112] ,
[0113] where the expression involves the set of intervention intensities and the decoder and encoder of the segmentation model. In this embodiment, the intensity set is fixed to preset values, which are used to apply differentiated quantitative perturbations to the corresponding frequency components. For the segmentation model, this embodiment uses the U-Net network from the segmentation-model-pytorch image segmentation model library, with EfficientNet-b2 as the encoder backbone.
[0114] S9: The causal effect can be computed as the difference between segmentation results. The calculation formula is as follows:
[0115] ,
[0116] where the causal effect characterizes the influence of each frequency component on domain shift; the larger the value, the more severe the influence of the corresponding component. The output of the segmentation head is a torch tensor of shape 2×192×192, where 2 is the number of classes (0: background, 1: prostate) and 192 is the original image size. To facilitate the subsequent estimation of feature alignment weights and to reduce computational cost, global average pooling is also applied to the causal effect along the spatial dimensions, yielding a 2×1×1 tensor, that is, one value per class.
[0117] S10: The causal effect characterizes the impact of each frequency component on the segmentation results. To enable the subsequent weighted feature alignment, a causal weight estimation network is further constructed. In this embodiment, a two-layer fully connected network is used, with an [input channels, intermediate channels, output channels] structure of [2, 36, 1], where the output of the first layer uses ReLU activation to introduce nonlinearity. The final causal weights are calculated as follows:
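A minimal sketch of the pooled causal effect for one frequency component. It assumes the effect is the absolute difference between the class-probability maps with and without intervention; the source only states that the effect is a difference of segmentation results, so the absolute value and the function name are assumptions.

```python
import numpy as np

def causal_effect(pred_intervened, pred_original):
    """Inputs: class-probability maps of shape (2, H, W). Output shape: (2, 1, 1)."""
    diff = np.abs(pred_intervened - pred_original)   # assumed absolute difference
    return diff.mean(axis=(1, 2), keepdims=True)     # global average pooling

# Constant maps make the pooled value easy to check by hand.
pred_original = np.stack([np.full((192, 192), 0.7), np.full((192, 192), 0.3)])
pred_intervened = np.stack([np.full((192, 192), 0.6), np.full((192, 192), 0.4)])
effect = causal_effect(pred_intervened, pred_original)
```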
[0118] ,
[0119] where the result is a 14-dimensional normalized weight vector, each element of which is the feature alignment weight for one frequency component.
[0120] S11: Construct 14 feature mapping networks, one for each of the 14 frequency components. In this embodiment, each is a 4-layer fully connected network with an [input channels, intermediate channels, output channels] structure of [576, 256, 256, 256, 576], where the outputs of layers 1 to 3 use ReLU for non-linear activation. Shallow layers have high spatial resolution but weak semantics; conversely, deep layers have low spatial resolution but strong semantics. To combine the strengths of both, this embodiment applies global average pooling to the features of the last 5 convolutional blocks of EfficientNet-b2 and concatenates them along the channel dimension. The channel counts of these blocks are [32, 24, 48, 120, 352], so the concatenated feature has 576 channels, matching the input dimension of the feature mapping network. Further, based on this concatenated feature, the objective function is jointly optimized from the perspectives of feature distance and classification consistency:
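The weight estimation in step S10 can be sketched as a [2, 36, 1] fully connected network applied to each pooled causal effect, followed by a softmax over the 14 components. The weights below are random placeholders rather than trained parameters, and the function name is hypothetical.

```python
import numpy as np

def causal_weights(effects, w1, b1, w2, b2):
    """effects: (14, 2) pooled causal effects -> (14,) normalized weights."""
    hidden = np.maximum(effects @ w1 + b1, 0.0)   # layer 1 with ReLU, 2 -> 36
    scores = (hidden @ w2 + b2).ravel()           # layer 2, 36 -> 1
    scores = scores - scores.max()                # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()                        # softmax over components

rng = np.random.default_rng(0)
effects = rng.random((14, 2))
w1, b1 = rng.normal(size=(2, 36)), np.zeros(36)   # placeholder parameters
w2, b2 = rng.normal(size=(36, 1)), np.zeros(1)
weights = causal_weights(effects, w1, b1, w2, b2)
```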
[0121] ,
[0122] ,
[0123] where the symbols denote the global average pooling operation, the mean squared error loss, and the classification cross-entropy loss, respectively. The first term of the objective function measures the difference in feature distribution between the original samples and the randomly augmented samples. The second term encourages the randomly augmented samples to belong to the same class as the original samples, thus maintaining the consistency of the segmentation results.
[0124] Steps S7-S11 above constitute the two-stage causal counterfactual reasoning process: the counterfactual reasoning stage (S7-S10) and the component-aware feature alignment stage (S11).
[0125] S12: The spurious association between the foreground and background in an image arises because the model fits a specific combination of their representations. To break this spurious association, contextual feature intervention is performed based on the background context observation set obtained in step S6. Combined with causal graph analysis, the backdoor adjustment of the image background can be expressed over the set of context confounders. If each specific representation in the set is taken in turn as a context template and there are c context confounding factors, then c forward propagations are required, which is computationally expensive. Therefore, this embodiment uses the NWGM (normalized weighted geometric mean) algorithm to move the combination over samples to the feature dimension, so that only one forward propagation is needed. The context-specific representation is calculated as:
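The two-term objective can be sketched for a single frequency component as below. The exact formula is not recoverable from the text, so this assumes the causal weight multiplies the sum of a mean squared error term and a cross-entropy term; all function names are hypothetical.

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

def cross_entropy(probs, labels):
    """probs: (num_pixels, num_classes) probabilities; labels: (num_pixels,) class ids."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))

def alignment_loss(feat_orig, feat_aug_mapped, probs_aug, labels, weight):
    """Assumed form: weight * (feature distance + classification consistency)."""
    return weight * (mse(feat_aug_mapped, feat_orig) + cross_entropy(probs_aug, labels))

labels = np.array([0, 1])
# Identical 576-dimensional features and confident correct predictions: loss near zero.
perfect = alignment_loss(np.ones(576), np.ones(576), np.eye(2), labels, 0.5)
# Mismatched features and uniform predictions: 0.5 * (1 + ln 2).
poor = alignment_loss(np.ones(576), np.zeros(576), np.full((2, 2), 0.5), labels, 0.5)
```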
[0126] ,
[0127] where the first factor is the prior probability distribution, set in this embodiment to the proportion of the samples in each feature cluster relative to the total number of samples, and the second factor is a normalized similarity measure that evaluates the similarity between the i-th element of the context confounder observation set and the input background. It is calculated using the following formula:
[0128] ,
[0129] where the learnable projection matrices project the two inputs into a unified joint space, and the scaling factor is used for feature normalization.
[0130] S13: The context-specific representation is fed into the encoder and fused with the encoded outputs of the original image and the intervened image. Taking the encoding of the original image as an example, features are first extracted for both inputs; the output feature map of the last convolutional block has size [6×6×352]. Concatenating the two along the channel dimension yields a feature map of [6×6×704]. Next, to preserve the spatial structure, this embodiment uses 1-D convolution to compress the concatenated features back to the original number of channels. Finally, ReLU is used for non-linear activation. The expression for the above operation is as follows:
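A minimal sketch of the context-specific representation, assuming a scaled dot-product similarity between the projected input background and each projected context template, normalized with a softmax and combined with the cluster priors. This form is a common way to realize an NWGM approximation and is an assumption here; the function and parameter names are hypothetical.

```python
import numpy as np

def context_representation(background, templates, prior, w_q, w_k):
    """background: (d,), templates: (c, d), prior: (c,), w_q and w_k: (d, m)."""
    query = background @ w_q                        # project the input background
    keys = templates @ w_k                          # project each context template
    scores = keys @ query / np.sqrt(w_q.shape[1])   # scaled similarity
    scores = scores - scores.max()
    alpha = np.exp(scores) / np.exp(scores).sum()   # normalized similarity
    return (alpha * prior) @ templates              # single weighted aggregation

rng = np.random.default_rng(0)
templates = rng.random((10, 16))                    # 10 synthetic context templates
prior = np.full(10, 0.1)                            # equal cluster proportions
context = context_representation(rng.random(16), templates, prior,
                                 rng.normal(size=(16, 4)), rng.normal(size=(16, 4)))
```

Because the aggregation happens once at the feature level, the model needs a single forward pass instead of one pass per confounder.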
[0131] ,
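The fusion in step S13 can be sketched as channel concatenation followed by a channel-mixing convolution and ReLU. The sketch assumes a kernel size of 1, so the convolution acts independently at each of the 6×6 spatial locations; the source says only "1-D convolution", and the identity-style weight below is a placeholder for learned parameters.

```python
import numpy as np

def fuse(image_feat, context_feat, weight, bias):
    """image_feat, context_feat: (352, 6, 6); weight: (352, 704); bias: (352,)."""
    stacked = np.concatenate([image_feat, context_feat], axis=0)       # (704, 6, 6)
    mixed = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    return np.maximum(mixed, 0.0)                                       # ReLU

rng = np.random.default_rng(0)
image_feat = rng.random((352, 6, 6))
context_feat = rng.random((352, 6, 6))
# Placeholder weight that passes the image half through and ignores the context half.
weight = np.hstack([np.eye(352), np.zeros((352, 352))])
fused = fuse(image_feat, context_feat, weight, np.zeros(352))
```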
[0132] Introducing the aforementioned backdoor adjustment steps into step S11, we further obtain the objective function:
[0133]
[0134]
[0135] Steps S12-S13 above constitute the backdoor adjustment process for content variables.
[0136] S14: Initialize hyperparameters. In this embodiment, the initial learning rate is... ; the learning rate decay strategy is CosineAnnealing, the optimizer is SGD, the number of iterations is 20K, and the batch size is 6.
[0137] S15: Jointly optimize the source domain segmentation loss and the domain alignment loss, where the segmentation loss is calculated as:
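For reference, a cosine annealing schedule over the 20K iterations can be sketched as follows. The initial learning rate of 0.01 is a hypothetical value chosen for illustration, since the source does not preserve the actual setting, and the function name is also hypothetical.

```python
import math

def cosine_annealing_lr(step, total_steps, initial_lr, min_lr=0.0):
    """Learning rate that decays from initial_lr to min_lr along a half cosine."""
    cos_term = 1.0 + math.cos(math.pi * step / total_steps)
    return min_lr + 0.5 * (initial_lr - min_lr) * cos_term

# Hypothetical initial learning rate of 0.01, sampled at the start, middle, and end.
schedule = [cosine_annealing_lr(s, 20_000, 0.01) for s in (0, 10_000, 20_000)]
```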
[0138]
[0139] where k=0,1 denotes the background and foreground (prostate) in the segmentation mask, n is the number of pixels in the corresponding category, and the remaining two symbols are the ground-truth label and the predicted label, respectively. Therefore, the overall optimization objective is:
[0140] ,
[0141] S16: Iterate through step S15 until the model converges, freeze the network parameters, and make predictions for the target domain. In this embodiment, the evaluation metrics are Dice score (%) and ASD (mm).
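The Dice score used for evaluation can be computed as in the sketch below for binary masks. The small `eps` term, which avoids division by zero for empty masks, and the function name are assumptions of this sketch; ASD is not covered.

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient in percent between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 100.0 * (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((4, 4), dtype=int)
pred[0:2, 0:2] = 1            # 4 predicted foreground pixels
target = np.zeros((4, 4), dtype=int)
target[0:2, 0:4] = 1          # 8 ground-truth foreground pixels
score = dice_score(pred, target)  # 2 * 4 / (4 + 8), about 66.67 percent
```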
[0142] Steps S14-S16 above constitute the model optimization and result generation process.
[0143] To verify the effectiveness of the method of this invention, experiments are conducted on a public dataset. Experimental environment: Ubuntu 18.04 operating system, Intel(R) Xeon(R) CPU E5-2620v4 @2.10GHz×32, 128GB memory; NVIDIA TITAN X (12GB). The algorithm code is implemented in Python with the PyTorch framework.
[0144] Experimental data: The experiment used prostate MRI image data collected from six different medical data centers, each with pixel-level labels of the prostate. Owing to differences in external parameters such as acquisition field strength, equipment vendor, and scanning protocol, the samples from different centers (domains) exhibit varying degrees of domain shift, which meets the experimental requirements for domain generalization segmentation. Specific details of the datasets from each center are shown in Table 1.
[0145] Table 1. Statistical Table of Multicenter MRI Imaging Information
[0146]
[0147] Figure 2 shows the domain information contained in different frequency components. For ease of visualization and analysis, this embodiment randomly selects one MRI slice from domain A and one from domain C. By adjusting the bandwidth coefficient of the band-pass filter, the corresponding frequency components are decomposed and inversely transformed back to the image spatial domain. Comparison reveals that the low and high frequencies (columns 2 and 5 of Figure 2) contain domain-variant information, such as grayscale changes and the appearance style of the image, while the mid-frequency components (columns 3 and 4 of Figure 2) mostly carry domain-invariant information, such as the anatomical structure and shape variations of the image. These results indicate that the method of the present invention can exploit the causal influence of different frequency components on domain shift and achieve domain-invariant feature alignment by applying quantitative intervention.
[0148] In this embodiment, based on the anatomical structure composition in the MRI image, K-means++ pixel clustering and ResNet-18 deep feature clustering were performed with cluster center numbers n=4 and 10, respectively. The results are shown in Figure 3. The method of this invention automatically extracts similar anatomical structures in an unsupervised manner, greatly reducing the cost of manual annotation. The final context confounder observation set is shown in the lower part of Figure 3. Different confounding factors have different contextual structures, which demonstrates the effectiveness of the method of the present invention in decoupling content variables and lays the data foundation for the subsequent backdoor adjustment process.
[0149] Figure 4 compares the domain generalization segmentation results of the method of this invention with those of Cutout (based on image patch removal), RSC (based on non-robust feature removal), and AdvBias (based on adversarial perturbation image enhancement). The horizontal axis represents the source domain used for training (the remaining five domains are used as target domains for testing), and the vertical axis represents the Dice score (higher scores indicate better performance). As Figure 4 shows, the method of this invention outperforms the other three comparative methods on most domain generalization tasks. Specifically, when the source domains are A, E, and F, the method of this invention improves the Dice score by 4.84%, 6.99%, and 14.14%, respectively, compared to the second-best performing algorithm. These results demonstrate that the method of this invention can use appearance variable intervention and content variable backdoor adjustment to improve the model's cross-domain image segmentation performance.
[0150] In cross-domain image segmentation scenarios, this invention first preprocesses the image segmentation dataset, then performs multi-frequency component counterfactual reasoning and contextual semantic backdoor adjustment based on the decoupled appearance and content variables, and finally completes parameter optimization and result generation in an end-to-end manner. Experiments conducted on a publicly available multi-center prostate MRI dataset verify the feasibility and superiority of the method presented in this invention.
[0151] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification merely illustrate the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.
Claims
1. A cross-domain medical image segmentation method based on frequency domain causal learning, characterized in that it includes the following steps:
11) Preprocessing of the medical image segmentation dataset: randomly select one domain of medical images as the source domain and divide it into training and validation sets according to a given ratio; the remaining domains are used as target domains and preprocessed.
12) Causal decoupling of appearance variables: based on the structural causal model and from the perspective of image generation, the discrete Fourier transform is used to transform the image from the spatial domain to the frequency domain, and the multiple frequency components contained in the image are statistically analyzed and stored.
13) Causal decoupling of content variables: based on the ground-truth mask, the background region of the image is extracted and its pixels are clustered in an unsupervised manner; then, based on the feature associations between pixel blocks, multiple semantic structures are divided to generate a context confounder observation set.
14) Two-stage causal counterfactual reasoning: based on the multiple appearance factors decoupled from each image, quantitative interventions of different intensities are applied, and each factor is adaptively assigned a corresponding weight according to the change in causal effect before and after the intervention. The two-stage counterfactual reasoning includes a counterfactual reasoning stage and a component-aware feature alignment stage.
15) Content variable backdoor adjustment: treat the pre-decoupled content variables as a context confounder set, and weight the confounder observation set with the normalized weighted geometric mean algorithm; each context factor, together with the frequency-intervened sample, is used for feature fusion at the encoder.
16) Segmentation model training and result generation: the cross-domain medical image segmentation model is trained end-to-end by jointly optimizing the segmentation loss and the causal constraint loss. After training, the medical image to be segmented is input into the cross-domain medical image segmentation model, and the corresponding pixel-level segmentation result is output.
2. The cross-domain medical image segmentation method based on frequency domain causal learning according to claim 1, characterized in that the preprocessing of the medical image segmentation dataset includes the following steps:
21) Randomly select one domain of medical images as the source domain, whose original images serve as training samples with pixel-level segmentation labels, where the labels cover the total number of categories and each label entry indicates the class to which a pixel belongs; the remaining domains are used as target domains for model testing. The source domain contains a given total number of samples, H is the image height, and W is the image width.
22) For the training samples, geometric augmentation and intensity augmentation are used to increase sample diversity, and the data are randomly divided into a training set, a validation set, and a test set.
3. The cross-domain medical image segmentation method based on frequency domain causal learning according to claim 1, characterized in that the causal decoupling of appearance variables includes the following steps:
31) Appearance factors are represented as the statistical characteristics of medical images on different frequency components in the frequency space, with different frequency components corresponding to different types of appearance change. Frequency component decomposition is introduced to map the appearance factors of medical images to low-frequency, mid-frequency, and high-frequency components in the frequency space. By adaptively learning the causal effect on domain shift of the appearance factor corresponding to each frequency component, causal decoupling and domain generalization of appearance changes are achieved. A discrete Fourier transform is performed on the training samples in the source domain:
$$\mathcal{F}(x_i)(u,v)=\sum_{h=0}^{H-1}\sum_{w=0}^{W-1} x_i(h,w)\,e^{-j2\pi\left(\frac{uh}{H}+\frac{vw}{W}\right)}$$
where $\mathcal{F}$ denotes the discrete Fourier transform operator; $(u,v)$ are the frequency-domain coordinates, with $u$ the frequency index in the vertical direction and $v$ the frequency index in the horizontal direction; $x_i$ is the i-th sample in the source domain; $x_i(h,w)$ is the pixel value of the image at spatial location $(h,w)$, with $h$ the row index and $w$ the column index; $H\times W$ is the spatial resolution of the image, with $H$ the image height (number of rows) and $W$ the image width (number of columns); the double sum runs over all spatial locations of the image; the complex exponentials are the basis functions used to decompose the spatial signal into different frequency components; and $j$ is the imaginary unit.
32) Based on a band-pass mask, the spectrum is decomposed into N frequency components of equal bandwidth. Each frequency component corresponds to an appearance sub-factor that characterizes the specific appearance change pattern within that frequency band. In the defining expressions, the scale factor of the mask controls the bandwidth; N is the number of frequency components; the band decomposition yields a set of frequency components, each being the frequency representation of one frequency sub-band; the mask takes a value at each frequency coordinate; the distance term is the maximum distance from a frequency coordinate to the center point of the image; the size term is the minimum spatial dimension of the image; the cutoff radius delimits the frequency band and determines the size of the passband; and the product symbol denotes element-wise multiplication.
33) To characterize the intra-domain statistical properties of each appearance factor in the frequency space, a Gaussian distribution of spectral information is constructed for each frequency component to describe the statistical distribution of that appearance factor and support subsequent causal modeling. The magnitude of the variance of the frequency component distribution indicates how strongly the frequency component varies under latent domain shift, which facilitates quantitative intervention that simulates domain shift with differentiated noise. Here n is the frequency component index and i is the source domain sample index; the statistics are computed from the n-th frequency component of the i-th source domain sample, and the mean is the mean of the n-th frequency component in the source domain.
4. The cross-domain medical image segmentation method based on frequency domain causal learning according to claim 1, characterized in that the causal decoupling of content variables includes the following steps:
41) Construct the context observation image set corresponding to the content variables: the source domain segmentation mask is used to occlude the target object and remove pixel information that is semantically related to the target, yielding content variable observation samples that contain only contextual information, from which a context image set is constructed. Each context image sample is extracted from an original source domain image and its source domain label in the source domain dataset, and serves as the initial observation of the content variables. Based on this context image set, the K-means++ algorithm is used to cluster pixels and divide spatial regions with consistent contextual attributes, thereby constructing a set of image patches, where N is a hyperparameter that controls the granularity of the content variable space partitioning and each element of the set is the geometric region of the image block corresponding to one pixel cluster.
42) Extract high-dimensional contextual feature representations of the content variables: the image patch set is fed into a pre-trained convolutional neural network to extract a contextual feature set. For a given geometric region of image patches, the image block is obtained by cropping the source domain sample to that region, and the resulting high-dimensional context features characterize the observed form of the content variables in that spatial region.
43) Generate confounder observations of the content variables and complete the causal decoupling: to obtain a stable representation of the content variables, K-means++ clustering is performed on the high-dimensional contextual features of all samples in each region, producing the k-th content variable feature cluster of that region. Based on the clustering results, the image patches corresponding to the sample IDs within each cluster are extracted and averaged pixel-wise, normalized by the total number of samples in the k-th cluster. Averaging image patches within the same contextual pattern eliminates the discriminative differences related to the target structure in individual samples, so that only stable contextual statistical properties are retained. These are defined as the final observation form of the content variables, thereby achieving causal decoupling between the content variables and the target semantic variables.
5. The cross-domain medical image segmentation method based on frequency domain causal learning according to claim 1, characterized in that the two-stage causal counterfactual reasoning includes the following steps:
51) To adaptively learn the causal effects of different components, given a source domain image, first randomly select a subset of its frequency components together with their corresponding interference intensities, and apply a random intervention using the intra-domain statistics from step 33), namely the variance. In the intervention expression, the intervened quantity is the n-th frequency component of the i-th source domain image; the interference intensity applied to the n-th frequency component is a value randomly drawn from a normal distribution; and the variance of the frequency component distribution obtained statistically in the source domain constrains the scale of the random intervention, so that the intervention intensity is consistent with the statistical characteristics of that frequency component. Then the frequency components are linearly combined and, via the inverse Fourier transform, the frequency-domain sample is projected back into the spatial domain to construct a frequency-augmented synthetic sample that simulates domain shift in real-world scenarios; the result is the source domain image after frequency intervention.
52) To quantitatively learn the causal influence of the N frequency components on the segmentation results, frequency interventions of varying intensities are applied to each component on this basis. The intervention is defined through the do operator of the causal model, which represents an intervention applied to a node in the causal graph. The intervention-based segmentation result is therefore expressed in terms of the set of intervention intensities and the decoder and encoder of the segmentation model; it is the label output by the segmentation model for the i-th source domain image after its frequency component has been intervened on. The conditional probability distribution of the segmentation label, given a medical image whose frequency component has been intervened on, characterizes the model's prediction uncertainty about the mapping between the intervened image and the label.
53) The causal effect caused by each frequency component is calculated relative to the predicted output distribution that the segmentation model produces on the input image when no causal intervention is applied to that frequency component; it characterizes the degree of influence of each frequency component on domain shift. Based on this causal effect, a causal weight estimation network is further constructed, and its output serves as the feature weights. Here, softmax is the normalized exponential function, which converts the weight vector into a probability distribution; the resulting values are the feature alignment weights of the frequency components, computed from the predictions of the weight estimation network.
54) Component-aware feature alignment stage: to alleviate domain shift under different frequency components, N feature mapping networks are constructed for the N frequency components. They reproject the randomly augmented samples into the feature space of the original samples, and the frequency-component-aware alignment loss is then jointly optimized from the perspectives of feature distance and classification consistency.
6. The cross-domain medical image segmentation method based on frequency domain causal learning according to claim 1, characterized in that the content variable backdoor adjustment includes the following steps:
61) Based on the background context observation set, perform content variable backdoor adjustment on the background context features of the input medical image, including: based on the pre-built set of content variable observations, the similarity between the background features of the input image and the k content variables is calculated, and the content variable observation set is weighted and aggregated according to the prior probability of each content variable to obtain the context-specific representation B.
62) The context-specific representation is input into the encoder of the segmentation model and fused with the encoded features of the original medical image and of the medical image after frequency domain intervention. The fused features are used to calculate the final causal constraint loss.
7. The cross-domain medical image segmentation method based on frequency domain causal learning according to claim 1, characterized in that the segmentation model training and result generation includes the following steps:
71) Jointly optimize the source domain segmentation loss and the causal constraint loss to constrain the feature consistency of the segmentation model under appearance variable intervention and content variable adjustment.
72) Iteratively execute model training until the model converges, fix the segmentation model parameters, use the trained segmentation model to perform segmentation prediction on the target domain medical image, and output the segmentation result of the target domain medical image.
8. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the cross-domain medical image segmentation method based on frequency domain causal learning as described in any one of claims 1-7.
9. A computer device, characterized in that it includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the cross-domain medical image segmentation method based on frequency domain causal learning as described in any one of claims 1-7.