A domain-robust medical image segmentation method combining structure-preserving Fourier-domain augmentation and meta-learning

Combining structure-preserving Fourier-domain augmentation with a meta-learning training strategy addresses the insufficient cross-domain generalization of deep learning models in medical image segmentation caused by domain shift, achieving higher boundary segmentation accuracy and robustness.

CN122244067A · Pending · Publication Date: 2026-06-19 · JIANGSU OCEAN UNIV +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
JIANGSU OCEAN UNIV
Filing Date
2025-12-27
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing deep learning models generalize poorly across domains in medical image segmentation because of domain shift. In particular, they struggle to jointly optimize fine segmentation of complex boundary regions and stable cross-domain generalization.

Method used

The method combines a structure-preserving, learnable Fourier domain augmentation module with an improved meta-learning training strategy. A learnable adaptive augmentation module in the frequency domain explicitly simulates style differences in unseen domains, dilated convolutions in the skip connection module enhance the capture of boundary features, and the MLDG meta-learning strategy optimizes the model's cross-domain generalization performance.

Benefits of technology

It significantly improves the model's generalization robustness and boundary segmentation accuracy in unseen domains, with stable cross-domain performance and good practical value.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention proposes a domain-robust medical image segmentation framework combining Structure-Preserving Fourier Domain Enhancement (SFE) and Meta-Learning (MLDG), aiming to address the severe domain shift problem caused by differences in imaging equipment, institutions, and acquisition methods in fundus image segmentation, and improve the model's generalization ability to unseen domains. A Learnable Frequency Domain Enhancer (LFDA) applies a controllable and learnable perturbation to the amplitude spectrum in the frequency domain while explicitly preserving phase information to ensure anatomical invariance. This enhancer generates challenging style perturbation samples through adversarial optimization and injects them into the query set for meta-learning training, simulating the distribution of unseen domains. Training employs an MLDG meta-learning strategy: the inner loop updates the model on the support set, while the outer loop evaluates and optimizes generalization ability on the frequency-enhanced query set. Experiments show that this method outperforms several baselines and state-of-the-art methods in unseen domains, demonstrating excellent robustness and cross-domain generalization ability.

Description

Technical fields:

[0001] This invention belongs to the fields of artificial intelligence and medical image processing, and specifically relates to a domain generalization (DG) method based on deep learning, which is used to improve the segmentation accuracy and robustness of medical images (such as fundus images) in unseen domains with domain shift.

Background technology:

[0002] With the rapid development of deep learning technology, end-to-end segmentation methods based on deep convolutional neural networks (such as U-Net, nnUNet, etc.) have achieved remarkable results in the field of medical image analysis and are gradually becoming the mainstream technical route in this field.

[0003] These methods are typically based on a key assumption: the training and test sets are independent and identically distributed. Under this ideal assumption, the model can learn a stable mapping from images to segmentation masks from a large amount of labeled data and perform well in the testing phase. However, real-world medical imaging environments are highly heterogeneous and complex, so this assumption rarely holds. Specifically, different medical institutions use significantly different imaging equipment. For example, commonly used fundus cameras include models such as the Nidek AFC-210, Zeiss Visucam 500, and Canon CR-2. Factors such as shooting parameter settings, lighting conditions, image post-processing workflows, and patient group characteristics also vary, so the acquired images differ significantly between domains in color style, contrast distribution, noise patterns, and spatial resolution.

[0004] This domain shift problem, caused by the diversity of data sources, is essentially an out-of-distribution generalization challenge. It severely weakens the generalization ability and segmentation accuracy of deep learning models in unseen domains, greatly limiting their reliable deployment and large-scale application in real clinical environments.

[0005] To address the significant challenges posed by domain shift, domain generalization techniques have become a crucial research area in medical image analysis in recent years. Existing methods primarily explore two main technical paths: one is data augmentation-based strategies, aiming to enhance model robustness by expanding the diversity of training data. Early work focused on traditional augmentation methods in the image space, such as CutMix and MixUp, which construct synthetic samples through linear interpolation or region blending, enhancing the model's adaptability to local feature changes to some extent. In recent years, more and more research has begun to explore the possibilities of frequency domain augmentation. Fourier domain adaptation methods, for example, simulate inter-domain style differences by swapping the amplitude spectrum components of different images, providing a new approach for cross-domain generalization. However, these methods are typically extremely sensitive to hyperparameters (such as the frequency replacement ratio), and directly replacing low-frequency components may destroy important information closely related to anatomical structures in the original image, thus affecting the accuracy of segmentation boundaries. The other important research direction is meta-learning-based training strategies. Methods such as MLDG construct a "meta-training-meta-testing" task simulation mechanism to explicitly optimize the model's generalization ability in unknown domains during training. These methods can learn domain-invariant feature representations across different source domains, but their effectiveness largely depends on the rationality and diversity of the meta-task partitioning.

[0006] While existing research has made initial progress in improving the cross-domain performance of models, current methods still have significant shortcomings when facing two interrelated core challenges in medical image segmentation: the refined segmentation of complex boundary regions and the joint optimization of stable cross-domain generalization ability. In particular, how to effectively preserve key anatomical structural information while enhancing data diversity, and how to better simulate the complex and variable domain shift scenarios of real clinical settings within a meta-learning framework, remain weak links in current technical approaches. To address these issues, this invention proposes a novel framework that integrates a structure-preserving frequency domain enhancement mechanism with a bi-level optimization meta-learning strategy. By introducing a learnable adaptive enhancement module in the frequency domain, more natural style perturbations are achieved while preserving the integrity of image structure. At the same time, an improved meta-learning training mechanism strengthens the model's ability to extract boundary features and its cross-domain stability, thereby systematically improving the robustness and accuracy of fundus image segmentation methods in real multi-domain environments.

Summary of the Invention:

[0007] This invention proposes a domain-robust medical image segmentation method that combines structure-preserving Fourier domain enhancement with a meta-learning training strategy. It aims to significantly improve the model's generalization performance on unseen domains with domain shift by explicitly simulating style differences in unseen domains and enhancing the ability to capture boundary features.

[0008] The technical solution of this invention consists of the following core components:

[0009] S1: A learnable Fourier domain enhancement module (LFDA) incorporating structure preservation: This module achieves adaptive, fine-grained fusion of style features from the source and target domains by applying a learnable attention weight matrix M to the amplitude spectrum. The source domain phase spectrum P_src is explicitly fixed to ensure that the enhanced sample f_enhanced diversifies in style (amplitude spectrum) while its anatomical topology (phase spectrum) remains consistent, avoiding the destruction of structural information. The generated challenging style perturbation samples are used in the meta-learning query set to more effectively model unseen domain distributions.

[0010] S2: A segmentation network based on an improved U-Net structure: dilated convolution is introduced into the skip connection module. This design expands the network's receptive field while preserving shallow high-resolution spatial details, specifically designed to enhance the capture and representation of fine-grained edge details of anatomical structures such as the optic disc / optic cup, thereby improving the accuracy of segmentation boundaries.

[0011] S3: MLDG Meta-Learning Training Strategy Based on LFDA-Enhanced Query Set: The MLDG double-loop optimization paradigm is adopted. The inner loop adapts to the known domains on the support set; the outer loop performs generalization evaluation on the LFDA-enhanced query set. The loss calculated in the outer loop drives the model to learn parameters θ that are robust to style perturbations, while the enhancer parameters φ are optimized simultaneously to achieve dynamic, adversarial style enhancement. A boundary-aware loss is introduced during training to further constrain the boundary alignment accuracy of the model.

[0012] The present invention achieves the following beneficial effects through the above technical solution:

[0013] 1. Improved generalization robustness: LFDA introduces learnable style diversity while maintaining structural invariance. Combined with the MLDG strategy, it exposes the model to simulated domain shift during the training phase, which significantly improves robustness to the style differences of external devices.

[0014] 2. Improved boundary segmentation accuracy: The introduction of dilated convolution at skip connections enhances the ability to extract edge features, effectively solving the core challenge of insufficient boundary refinement in medical image segmentation tasks.

[0015] 3. Significant performance advantages: Under stringent cross-domain testing protocols, this method outperforms existing state-of-the-art methods in terms of average Dice coefficient, mIoU, and boundary metrics 95% HD and ASD in unseen domains.

[0016] It exhibits minimal performance fluctuation across domains, demonstrating excellent practical value.

Description of the drawings:

[0017] Figure 1 is a schematic diagram of the overall framework of the method of the present invention;

[0018] Figure 2 is a schematic diagram of style fusion using the Learnable Fourier Domain Enhancer (LFDA) described in this invention;

[0019] Figure 3 is a schematic diagram comparing the optic cup and optic disc segmentation results of the method of this invention with those of other models;

[0020] Figure 4 is a visualization of the multi-domain t-SNE distribution.

Detailed implementation method:

[0021] The present invention will be further described below with reference to the accompanying drawings and embodiments.

[0022] This embodiment describes the specific implementation steps of the present invention.

[0023] S1: Data Preparation and Network Initialization

[0024] Collect fundus image datasets from different imaging devices and medical institutions as the multi-source domain training set. Initialize the backbone parameters θ of the segmentation network and the parameters φ of the learnable frequency domain enhancer (LFDA). The segmentation network adopts a U-Net structure that introduces dilated convolutions at skip connections.

[0025] S2: Meta-task Construction

[0026] In each training iteration, samples are drawn from the training domain set (e.g., the three domains that remain after excluding the held-out test domain) and divided into a support set, whose original samples are used for inner loop training, and an original query set, whose original samples are used for outer loop generalization evaluation.
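As an illustration of the meta-task construction in S2, the following minimal Python sketch pools the samples of the training domains and draws a disjoint support set and original query set. The function name `build_meta_task`, the sample-level random split, and the set sizes are assumptions made for illustration; the text above does not specify how the split is performed.

```python
import random


def build_meta_task(domains, n_support, n_query, seed=None):
    """Draw a support set and an original query set from the pooled training domains.

    `domains` maps a domain name to a list of sample identifiers.
    The sample-level random split is an assumption, not the patent's stated rule.
    """
    rng = random.Random(seed)
    # Pool (domain_name, sample) pairs from every training domain.
    pool = [(name, s) for name, samples in domains.items() for s in samples]
    if n_support + n_query > len(pool):
        raise ValueError("not enough samples for the requested split")
    # Sampling without replacement keeps the two sets disjoint.
    drawn = rng.sample(pool, n_support + n_query)
    return drawn[:n_support], drawn[n_support:]


if __name__ == "__main__":
    # Hypothetical sample identifiers for three training domains.
    train_domains = {
        "RIM-ONE-r3": ["r0", "r1", "r2", "r3"],
        "REFUGE (train)": ["t0", "t1", "t2", "t3"],
        "REFUGE (val)": ["v0", "v1", "v2", "v3"],
    }
    support, query = build_meta_task(train_domains, n_support=4, n_query=4, seed=0)
    print("support:", support)
    print("query:  ", query)
```

In this sketch the query samples are still unmodified; they would be passed through the enhancer in S3 before the outer loop uses them.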

[0027] S3: Frequency Domain Enhancement (LFDA). The original query set samples are input into the LFDA enhancer for processing to generate an enhanced query set.

[0028] The specific enhancement process follows the steps described in claim 2: a Fourier transform decouples the source domain image into its amplitude spectrum A_src and phase spectrum P_src; the frequency domain attention weight matrix M is learned through a lightweight CNN network; A_src and the target domain amplitude spectrum A_tar are then adaptively fused using M to obtain A_fused:

A_fused = M ⊙ A_tar + (1 - M) ⊙ A_src

[0029] Finally, an inverse Fourier transform is performed on A_fused together with the unchanged P_src to obtain enhanced samples with style perturbations but consistent structure.
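The amplitude fusion with a fixed source phase can be sketched in NumPy as follows. This is a minimal illustration for a single-channel image, not the patented implementation: the weight matrix `mask` stands in for the output M of the lightweight CNN (which is not modeled here), and clipping it to [0, 1] and taking the real part of the inverse transform are assumptions.

```python
import numpy as np


def fourier_style_fusion(f_src, f_tar, mask):
    """Fuse amplitude spectra with weights `mask` while keeping the source phase.

    f_src, f_tar: 2-D arrays of the same shape (single-channel images).
    mask: 2-D array of fusion weights, a stand-in for the learned matrix M.
    """
    F_src = np.fft.fft2(f_src)
    F_tar = np.fft.fft2(f_tar)
    # Decouple amplitude and phase of the source; only the amplitude of the target is used.
    A_src, P_src = np.abs(F_src), np.angle(F_src)
    A_tar = np.abs(F_tar)
    # Assumption: weights lie in [0, 1], e.g. as the output of a sigmoid.
    M = np.clip(mask, 0.0, 1.0)
    # A_fused = M ⊙ A_tar + (1 - M) ⊙ A_src
    A_fused = M * A_tar + (1.0 - M) * A_src
    # Recombine the fused amplitude with the unchanged source phase.
    F_enh = A_fused * np.exp(1j * P_src)
    # A non-symmetric M leaves a small imaginary part; it is discarded here.
    return np.real(np.fft.ifft2(F_enh))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.random((16, 16))
    tar = rng.random((16, 16))
    # A constant weight of 0.5 blends the two styles equally.
    enhanced = fourier_style_fusion(src, tar, np.full((16, 16), 0.5))
    print(enhanced.shape)
```

With M equal to zero everywhere the function returns the source image unchanged, and with M equal to one everywhere the result carries the target amplitude spectrum on the source phase, which are the two extremes of the fusion formula. For color images the same operation could be applied per channel.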

[0030] S4: Inner Loop Adaptation

[0031] The model performs forward propagation on the support set to calculate the boundary-aware support loss.

[0032]

[0033] Then, the virtual parameters are updated (inner loop):

[0034] S5: Outer loop generalization and S6: Parameter update

[0035] The adapted model θ′ is evaluated on the enhanced query set, and the query loss is calculated to construct the target loss.

[0036] By minimizing the target loss, the segmentation network's main parameters θ and the enhancer parameters φ are updated simultaneously using the outer loop learning rate γ:

[0037]

[0038] S7: Iterative Loop

[0039] Repeat steps S2 through S6 until the model converges to obtain the final domain robust segmentation model.
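The double-loop update of steps S4 to S6 can be sketched on a toy problem. The example below is a hedged illustration only: it replaces the segmentation network with a linear model under squared-error loss so that the gradient through the virtual update can be written exactly, it assumes the standard MLDG form of the target loss (support loss plus β times the query loss at θ′, with β a hypothetical weight), and it omits the enhancer parameters φ and the boundary-aware loss. The query data is assumed to be already enhanced.

```python
import numpy as np


def mse_loss(theta, X, y):
    """Mean squared error of the linear model X @ theta."""
    r = X @ theta - y
    return float(r @ r) / len(y)


def mse_grad(theta, X, y):
    """Gradient of mse_loss with respect to theta."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)


def mldg_objective(theta, support, query, eta, beta=1.0):
    """Assumed target loss: L_support(theta) + beta * L_query(theta')."""
    Xs, ys = support
    Xq, yq = query
    theta_prime = theta - eta * mse_grad(theta, Xs, ys)  # virtual (inner loop) update
    return mse_loss(theta, Xs, ys) + beta * mse_loss(theta_prime, Xq, yq)


def mldg_gradient(theta, support, query, eta, beta=1.0):
    """Exact gradient of mldg_objective for the linear squared-error model."""
    Xs, ys = support
    Xq, yq = query
    g_s = mse_grad(theta, Xs, ys)
    theta_prime = theta - eta * g_s
    g_q = mse_grad(theta_prime, Xq, yq)
    # d(theta')/d(theta) = I - eta * H_s, with H_s the Hessian of the support loss.
    H_s = 2.0 * Xs.T @ Xs / len(ys)
    return g_s + beta * (g_q - eta * (H_s @ g_q))


def mldg_step(theta, support, query, eta, gamma, beta=1.0):
    """Outer loop update of theta with learning rate gamma."""
    return theta - gamma * mldg_gradient(theta, support, query, eta, beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_theta = np.array([1.0, -2.0, 0.5])
    Xs, Xq = rng.normal(size=(32, 3)), rng.normal(size=(32, 3))
    support = (Xs, Xs @ true_theta)
    query = (1.5 * Xq, 1.5 * Xq @ true_theta)  # rescaled inputs mimic a style shift
    theta = np.zeros(3)
    for _ in range(200):
        theta = mldg_step(theta, support, query, eta=0.05, gamma=0.05)
    print(mldg_objective(theta, support, query, eta=0.05))
```

In a deep network the same structure would normally rely on automatic differentiation instead of the closed-form Hessian used here.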

[0040] This embodiment describes the details of the improved segmentation network structure.

[0041] Backbone architecture: the segmentation network is based on the standard U-Net encoder-decoder structure.

[0042] Dilated Convolution Skip Connections: Traditional U-Net uses simple feature concatenation as skip connections between corresponding layers in the encoder and decoder. This invention introduces dilated convolutional layers into this skip connection module to process the shallow features output by the encoder. Dilated convolution expands the receptive field by introducing a dilation rate, enabling the acquisition of a wider range of contextual information without adding kernel parameters compared with a standard convolution of the same kernel size. This design specifically addresses the problem of blurred optic disc / cup edges and difficulty in capturing details in medical image segmentation. By fusing shallow information with a larger receptive field, it effectively enhances the model's ability to capture fine-grained edge information of anatomical boundaries.
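The effect of the dilation rate on the receptive field can be made concrete with a short calculation. The sketch below uses the standard relation k_eff = k + (k - 1)(d - 1) for the span of a dilated kernel along one axis; the kernel sizes and dilation rates in the example are hypothetical, since the text does not state the rates used in the skip connections.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span, in input pixels, covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel_size(k, d) - 1
    return rf


if __name__ == "__main__":
    # A 3x3 kernel keeps 9 weights per channel pair at any dilation rate.
    for d in (1, 2, 4):
        print(f"dilation={d}: spans {effective_kernel_size(3, d)} pixels per axis")
    # Hypothetical stack of three 3x3 layers with dilation rates 1, 2 and 4.
    print("stacked receptive field:", receptive_field([(3, 1), (3, 2), (3, 4)]))
```

A 3x3 kernel with dilation 2 spans 5 pixels per axis and with dilation 4 spans 9 pixels, while the number of weights stays at nine, which is why the receptive field grows without extra kernel parameters.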

[0043] This embodiment describes the experimental environment and performance verification of the present invention.

[0044] Experimental environment: Processor: Intel Xeon(R) Platinum 8352V CPU; Graphics card: RTX 4090D GPU (24 GB); Development environment: Python 3.9, PyTorch 2.7.1, CUDA 11.8. The inner loop learning rate is η = 1 × 10⁻⁴ and the outer loop learning rate is γ = 4 × 10⁻³. The total number of training epochs is 50, and the batch size is 8.

[0045] Datasets: Four publicly available optic disc / cup segmentation datasets: Drishti-GS, RIM-ONE-r3, REFUGE (train), and REFUGE (val).

[0046] Evaluation protocol: A leave-one-domain-out testing protocol is adopted. In each round, one domain is held out as the unseen target domain for testing, and the remaining three domains are used as the multi-source domains for training.
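The evaluation protocol can be enumerated with a few lines of Python. This is a minimal sketch; the generator name `leave_one_domain_out` is an assumption, and only the four dataset names come from the text.

```python
DOMAINS = ["Drishti-GS", "RIM-ONE-r3", "REFUGE (train)", "REFUGE (val)"]


def leave_one_domain_out(domains):
    """Yield (training_domains, held_out_domain) with each domain held out once."""
    for i, target in enumerate(domains):
        yield domains[:i] + domains[i + 1:], target


if __name__ == "__main__":
    for train, target in leave_one_domain_out(DOMAINS):
        print(f"train on {train}, test on unseen domain {target!r}")
```

Each of the four rounds trains a separate model, and the reported averages are taken over the four held-out domains.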

[0047] Evaluation indicators:

[0048] Region overlap accuracy: Dice similarity coefficient (DSC), mean intersection over union (mIoU).

[0049] Boundary alignment accuracy: 95% Hausdorff distance (95% HD), average symmetric surface distance (ASD).
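The two region overlap metrics can be computed from binary masks as in the following NumPy sketch. The smoothing constant `eps` and the convention that two empty masks score 1.0 are assumptions; the boundary metrics (95% HD and ASD) are not covered here.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))


def iou(pred, target, eps=1e-7):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((inter + eps) / (union + eps))


def mean_iou(preds, targets):
    """Average IoU over per-class binary masks, e.g. optic disc and optic cup."""
    return float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print("DSC:", dice_coefficient(pred, target))
    print("IoU:", iou(pred, target))
```

In the small example one pixel overlaps, the masks contain two and one foreground pixels, and the union has two pixels, so the Dice coefficient is about 2/3 and the IoU about 1/2.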

[0050] Experimental results:

[0051] In the unseen domains, the proposed method achieved the best overall performance. The average DSC reached 0.8736, a significant improvement over both the baseline U-Net (0.8111) and the state-of-the-art DOFE method (0.8586). The average 95% HD was reduced to 8.91 pixels, better than DOFE's 9.37 pixels, indicating a significant advantage in boundary accuracy. Furthermore, the proposed method had a DSC standard deviation of only 0.038 across the four test domains, the lowest among all compared methods, validating the model's good cross-domain stability and robustness.
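The size of the reported gains can be checked directly from the figures above: the average DSC rises by 0.0625 over the baseline U-Net and by 0.0150 over DOFE, and the average 95% HD falls by 0.46 pixels relative to DOFE. The short sketch below reproduces this arithmetic and also prints the relative change; the helper name `improvement` is an assumption.

```python
def improvement(ours, reference, higher_is_better=True):
    """Return (absolute, relative) improvement of `ours` over `reference`."""
    delta = (ours - reference) if higher_is_better else (reference - ours)
    return delta, delta / reference


if __name__ == "__main__":
    reported = [
        ("DSC vs. U-Net", 0.8736, 0.8111, True),
        ("DSC vs. DOFE", 0.8736, 0.8586, True),
        ("95% HD vs. DOFE (pixels)", 8.91, 9.37, False),
    ]
    for label, ours, ref, higher in reported:
        delta, rel = improvement(ours, ref, higher)
        print(f"{label}: {delta:.4f} absolute, {rel:.1%} relative")
```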

[0052] Comparison experiment for optic cup / optic disc segmentation

[0053]

[0054] Summary of beneficial effects:

[0055] This invention enhances the segmentation boundary modeling capability by introducing dilated convolution in skip connections, introduces structurally consistent cross-domain style diversity through learnable frequency domain perturbations, and explicitly optimizes cross-domain generalization performance during training by combining the MLDG meta-learning strategy, ultimately forming a robust medical image segmentation model for clinical applications.

[0056] The above embodiments merely illustrate several implementation methods of the present invention, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention.


Claims

1. A domain-robust medical image segmentation method combining structure-preserving Fourier domain enhancement and meta-learning, characterized in that it includes the following steps:
S1: Data preparation: collect a multi-source domain training set, and initialize the segmentation network parameters θ and the learnable frequency domain enhancer (LFDA) parameters φ;
S2: Meta-task construction: in each iteration, sample from the training set and divide the samples into a support set and an original query set;
S3: Frequency domain enhancement: input the original query set samples into the LFDA enhancer to generate an enhanced query set with style perturbations but consistent structure;
S4: Inner loop adaptation: compute the model's support loss on the support set, then perform a virtual parameter update to obtain the adapted parameters θ′;
S5: Outer loop generalization: using θ′, compute the query loss on the enhanced query set and construct the target loss;
S6: Parameter update: by minimizing the target loss, simultaneously update the segmentation network parameters θ and the LFDA enhancer parameters φ;
S7: Iterative loop: repeat steps S2 to S6 until the maximum number of iterations is reached or convergence is achieved, to obtain the final domain-robust segmentation model.

2. The method according to claim 1, characterized in that the Learnable Frequency Domain Enhancer (LFDA) in S3 generates each sample f_enhanced(x,y) of the enhanced query set in the following manner:
S3-1: perform a Fourier transform on the source domain image f_src(x,y) and decouple the result into the source domain amplitude spectrum A_src and the source domain phase spectrum P_src;
S3-2: taking the amplitude spectrum A_tar of the target domain image as input, learn and generate a frequency domain attention weight matrix M through a lightweight convolutional neural network (CNN);
S3-3: adaptively fuse A_src and A_tar to obtain a new fused amplitude spectrum A_fused: A_fused = M ⊙ A_tar + (1 - M) ⊙ A_src, where ⊙ represents element-wise multiplication;
S3-4: perform an inverse Fourier transform on A_fused together with the source domain phase spectrum P_src to obtain f_enhanced(x,y).

3. The method according to claim 1, characterized in that the segmentation network in S4 adopts an encoder-decoder structure based on U-Net, and dilated convolution is introduced in the skip connection module of U-Net. The dilated convolution is used to expand the network's receptive field without losing high-resolution spatial details, and to enhance the ability to capture and represent image edge details (such as the boundary between the optic disc and surrounding retinal tissue).

4. The method according to claim 1, characterized in that the support loss in S4 and the query loss in S5 both use a boundary-aware loss function, which is a weighted combination of segmentation loss, classification cross-entropy loss, and boundary loss:

5. The method according to claim 1, characterized in that the parameter update in S6 is based on the following formula: where γ is the outer loop learning rate.