A morphology-adaptive and feature-decoupled network method for medical image segmentation
By jointly designing the morphologically adaptive ResNeXt module and the context-details decoupling module, the method addresses feature submersion and boundary inaccuracy in the segmentation of small lesions in medical images, achieving high-precision, robust segmentation suitable for clinical screening of small lesions.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA THREE GORGES UNIV
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-26
AI Technical Summary
Existing medical image segmentation models cannot simultaneously solve the dual dilemma of feature submersion and inaccurate boundary localization when dealing with medical micro-lesions whose pixel ratio is below 2%. As a result, segmentation recall and precision fall short of clinical requirements.
We employ a morphological adaptive and feature decoupling network approach. By combining an encoder-decoder architecture with a morphological adaptive ResNeXt module and a context-details decoupling module, we achieve explicit decoupling of semantic context and high-frequency details. By combining large kernel spatial gating and global residual connections, we suppress background noise and prevent weak features from being buried, which significantly improves the detection rate and segmentation accuracy of small lesions.
It achieves high-precision and robust segmentation of small medical lesions, overcomes the bias of large-target dominance, improves the segmentation recall and precision of small lesions, and supports real-time clinical response.
Description
Technical Field
[0001] This invention relates to the field of medical image processing technology, and in particular to a morphological adaptation and feature decoupling network method for medical image segmentation.
Background Technology
[0002] Medical image segmentation is one of the core technologies of computer-aided clinical diagnosis. Most existing mainstream segmentation models are based on an encoder-decoder architecture or a vision Transformer architecture, and they achieve good results on conventional organs and large-scale lesions. However, when segmenting medical micro-lesions, which usually account for less than 2% of the pixel area, they share a core technical problem: existing models cannot simultaneously solve the dual dilemma of feature submersion and inaccurate boundary localization. On the one hand, continuous downsampling and global context aggregation easily cause the weak features of micro-lesions to be submerged by the background mean. At the same time, shallow detail features are mixed with a lot of noise, which easily leads to missed detections and false positives. On the other hand, the rigid receptive field of standard convolution cannot dynamically fit the non-rigid, complex geometric boundaries formed by the infiltrative growth of micro-lesions. Moreover, deformable convolution is easily disturbed by noise in low-contrast scenes, resulting in deformation drift, which leads to jagged segmentation edges and boundary overflow. In addition, model optimization is easily affected by the bias of large-target dominance, sacrificing the segmentation accuracy of micro-lesions to fit large-scale lesions. Ultimately, the recall and precision of micro-lesion segmentation fall short of clinical requirements.
Summary of the Invention
[0003] To address the problems existing in the background art, this invention provides a morphological adaptation and feature decoupling network method for medical image segmentation, the specific technical solution of which is as follows. A morphological adaptation and feature decoupling network method for medical image segmentation includes the following steps:
Step 1: Using an encoder-decoder architecture, the input medical image is subjected to four downsampling and feature encoding operations, then four upsampling and cascaded decoding operations, and the decoded features are subjected to one independent context-details decoupling process. The pixel-level segmentation result is then output through a 1×1 convolution.
Step 2: Configure the morphologically adaptive ResNeXt module and the context-details decoupling module in the encoder. The context-details decoupling module is deployed at the end of each downsampling and before the decoding output. The encoder is also configured with an efficient small-channel attention module to filter redundant channels.
[0004] Furthermore, the morphologically adaptive ResNeXt module is built on the ResNeXt multi-branch topology and deformable convolution. The feature cardinality G of the ResNeXt bottleneck layer and the number of deformable groups in the deformable convolution are set to the same value, so that each independent semantic subspace is assigned its own two-dimensional spatial offsets and modulation scalars. The module first applies a 1×1 convolution to reduce the channel dimension of the input features to C/2. An offset-generation branch then produces parameters with 3K²G output channels, where K = 3. These parameters are split along the channel dimension: the first 2/3 serve as spatial offset parameters, and the last 1/3, after Sigmoid activation, serve as modulation mask parameters.
[0005] Furthermore, in the morphologically adaptive ResNeXt module, the output $y_g(p_0)$ of the g-th feature subspace at any position $p_0$ on the feature map is defined as follows: $$y_g(p_0) = \sum_{p_k \in \mathcal{R}} w_g(p_k)\, x(p_0 + p_k + \Delta p_{g,k})\, \Delta m_{g,k}$$ where $w_g$ denotes the kernel weights of the g-th group; $x$ denotes the input features; $\mathcal{R}$ is the regular sampling grid of a standard 3×3 convolution, and $p_k$ enumerates its positions. The key improvement is that the two-dimensional spatial offset $\Delta p_{g,k}$ and the modulation scalar $\Delta m_{g,k}$ are both explicitly bound to the g-th semantic subspace.
[0006] Furthermore, the context-details decoupling module constructs a dual-stream feature architecture with heterogeneous receptive fields, consisting of a moderate field-of-view context branch and a raw high-frequency detail branch. The input feature tensor is fed to the two branches in parallel. Both branches reduce the input feature channels to C/2 through a 1×1 convolution. After channel dimension alignment, the output features of the two branches are concatenated along the channel dimension to form a joint feature.
[0007] Furthermore, the moderate field-of-view context branch adopts a hybrid dilated convolution strategy: the dimensionality-reduced features pass sequentially through 3×3 convolutional blocks with dilation rates d=1 and d=3, which expands the effective receptive field to 9×9 pixels. The raw high-frequency detail branch, after dimensionality reduction, completes nonlinear feature mapping through a 1×1 convolution. Because it combines features only along the channel dimension, it keeps a 1×1 spatial receptive field.
[0008] Furthermore, the concatenated joint features in the context-details decoupling module pass through a 7×7 large-kernel convolution to generate a single-channel spatial attention map. After Sigmoid activation, the map is multiplied element-wise (Hadamard product) with the joint features, and a 1×1 convolution then completes channel dimensionality reduction and feature fusion. A global residual connection adds the input feature tensor to the fused features to obtain the output features of the context-details decoupling module.
[0009] The above technical solution has the following beneficial effects: This invention addresses the technical challenges of segmenting small medical lesions, achieving high-precision and robust segmentation through the collaborative design of a morphologically adaptive ResNeXt module and a context-details decoupling module. The heterogeneous dual-stream architecture of the context-details decoupling module explicitly decouples semantic context from high-frequency details. Combined with large-kernel spatial gating and global residual connections, it filters background noise and prevents weak features from being submerged, significantly improving the detection rate of small lesions. The full-group geometric alignment strategy of the morphologically adaptive ResNeXt module assigns dedicated offsets and modulation scalars to each semantic subspace, suppressing deformation drift and allowing the receptive field to adaptively fit the complex, non-rigid boundaries of the lesion, which eliminates jagged edges and improves geometric segmentation accuracy. The complementary collaboration of these two modules enables the model to overcome the bias of large-target dominance, achieving sharp, scale-invariant segmentation of small lesions without sacrificing performance on large lesions. Key metrics significantly outperform mainstream methods. At the same time, the efficient small-channel attention module simplifies features and, with hardware acceleration, enables real-time clinical response, providing reliable computer-aided support for clinical screening of small medical lesions.
Attached Figure Description
[0010] Figure 1 shows the visualization results of different methods on small lesions in the BUSI dataset for the morphological adaptation and feature decoupling network method for medical image segmentation of the present invention. The rectangular areas in the figure mark the magnified regions.
[0011] Figure 2 shows the visualization results of different methods on the BUSI dataset for the morphological adaptation and feature decoupling network method for medical image segmentation of this invention. Rectangular areas mark the magnified regions, and elliptical areas mark false positives.
[0012] Figure 3 shows the visualization results of different methods on the ISIC-2017 dataset for the morphological adaptation and feature decoupling network method for medical image segmentation of this invention. The elliptical boxes in the image mark false positives.
Detailed Implementation
[0013] The invention will now be further described with reference to the accompanying drawings.
[0014] Example 1: As shown in Figures 1-3, the morphological adaptation and feature decoupling network method for medical image segmentation includes the following steps:
Step 1: Using an encoder-decoder architecture, the input medical image is subjected to four downsampling and feature encoding operations, then four upsampling and cascaded decoding operations, and the decoded features are subjected to one independent context-details decoupling process. The pixel-level segmentation result is then output through a 1×1 convolution.
Step 2: Configure the morphologically adaptive ResNeXt module and the context-details decoupling module in the encoder. The context-details decoupling module is deployed at the end of each downsampling and before the decoding output. The encoder is also configured with an efficient small-channel attention module to filter redundant channels.
[0015] Example 2: Based on Example 1 and as shown in Figures 1-3, the morphologically adaptive ResNeXt module is built on the ResNeXt multi-branch topology and deformable convolution. The feature cardinality G of the ResNeXt bottleneck layer and the number of deformable groups of the deformable convolution are set to the same value, so that each independent semantic subspace is assigned its own two-dimensional spatial offsets and modulation scalars. The module first applies a 1×1 convolution to reduce the channel dimension of the input features to C/2. An offset-generation branch then produces parameters with 3K²G output channels, where K = 3. The generated parameters are split along the channel dimension: the first 2/3 are used as spatial offset parameters, and the last 1/3, after Sigmoid activation, are used as modulation mask parameters.
[0016] In the morphologically adaptive ResNeXt module, the output $y_g(p_0)$ of the g-th feature subspace at any position $p_0$ on the feature map is defined as follows: $$y_g(p_0) = \sum_{p_k \in \mathcal{R}} w_g(p_k)\, x(p_0 + p_k + \Delta p_{g,k})\, \Delta m_{g,k}$$ where $w_g$ denotes the kernel weights of the g-th group; $x$ denotes the input features; $\mathcal{R}$ is the regular sampling grid of a standard 3×3 convolution, and $p_k$ enumerates its positions. The key improvement is that the two-dimensional spatial offset $\Delta p_{g,k}$ and the modulation scalar $\Delta m_{g,k}$ are both explicitly bound to the g-th semantic subspace.
[0017] The context-details decoupling module constructs a dual-stream feature architecture with heterogeneous receptive fields, consisting of a moderate field-of-view context branch and a raw high-frequency detail branch. The input feature tensor is fed to the two branches in parallel. Both branches reduce the input feature channels to C/2 through a 1×1 convolution. After channel dimension alignment, the output features of the two branches are concatenated along the channel dimension to form a joint feature.
[0018] The moderate field-of-view context branch adopts a hybrid dilated convolution strategy: the dimensionality-reduced features pass sequentially through 3×3 convolutional blocks with dilation rates d=1 and d=3, which expands the effective receptive field to 9×9 pixels. The raw high-frequency detail branch, after dimensionality reduction, completes nonlinear feature mapping through a 1×1 convolution. Because it combines features only along the channel dimension, it keeps a 1×1 spatial receptive field.
[0019] The concatenated joint features in the context-details decoupling module pass through a 7×7 large-kernel convolution to generate a single-channel spatial attention map. After Sigmoid activation, the map is multiplied element-wise (Hadamard product) with the joint features, and a 1×1 convolution then completes channel dimensionality reduction and feature fusion. A global residual connection adds the input feature tensor to the fused features to obtain the output features of the context-details decoupling module.
[0020] The image data input is sequentially downsampled and feature encoded 4 times, then upsampled and cascaded decoded 4 times to restore the original image resolution, then undergoes one independent context-detail decoupling, and finally outputs pixel-level segmentation results using 1×1 convolution.
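The resolution flow of this four-stage encoder-decoder can be traced with a few lines of Python. This is only an illustrative sketch: it assumes each downsampling halves the spatial size and each upsampling doubles it, which the text does not state explicitly, and the function name `resolution_trace` is hypothetical. The 512×512 input size is the one used in the experiments.

```python
def resolution_trace(size, num_stages=4):
    """Return encoder and decoder spatial sizes for a symmetric network.

    Assumption (not stated in the source): every downsampling step halves
    the resolution and every upsampling step doubles it.
    """
    encoder = [size]
    for _ in range(num_stages):
        encoder.append(encoder[-1] // 2)
    # The decoder walks back up through the same sizes, ending at the input size.
    decoder = encoder[-2::-1]
    return encoder, decoder


encoder_sizes, decoder_sizes = resolution_trace(512)
print(encoder_sizes)  # [512, 256, 128, 64, 32]
print(decoder_sizes)  # [64, 128, 256, 512]
```

Under this assumption the deepest feature map is 32×32, so a lesion covering less than 2% of a 512×512 image spans only a handful of positions at the bottleneck.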
[0021] The innovations of this invention mainly concern the geometric modeling and feature decoupling mechanisms of the encoding modules. In the encoding stage, this invention designs a Morphological Adaptive ResNeXt module (MARB) and a Context-Details Decoupling module (CDDB). MARB gives the network the ability to accurately fit the physical boundaries of irregular, small lesions, while CDDB is deployed at the end of each downsampling step and before the decoding output to forcibly separate the semantic stream from high-frequency details, fundamentally preventing the loss of weak signals.
Small lesions in medical images (such as early-stage breast tumors and micromelanomas) not only have extremely low pixel proportions but also often exhibit irregular, invasive growth at the edges (such as a star-shaped or spiky appearance). Traditional standard convolution, with its fixed, rigid grid sampling, struggles to fit the complex physical boundaries of such lesions.
[0022] While deformable convolutions impart dynamic receptive fields to networks by introducing spatial offsets, they often fall into the trap of "deformation drift" when processing tiny lesions with extremely low contrast. Because tiny target signals are easily interfered with by vast high-frequency background noise, the spatial offsets shared across channels in traditional DCNs are easily dominated by background features, causing sampling points to deviate from the lesion itself and fall into irrelevant background regions. To overcome this challenge, this invention integrates and improves the deformable convolution strategy based on the multi-branch topology of ResNeXt, proposing a morphologically adaptive ResNeXt module (MARB).
[0023] The core innovation of MARB lies in proposing a fully group-aligned architecture paradigm for small targets.
[0024] In the bottleneck layer of standard ResNeXt, features are divided into G independent semantic subspaces (i.e., cardinality, which is set to G=32 in this invention). Traditional DCNs typically share the same set (or very few sets) of spatial offset fields across all channels, which can easily lead to global sampling collapse when background noise dominates. In contrast, MARB forces the number of deformable groups in deformable convolutions to be strictly consistent with the feature cardinality G of ResNeXt. This means that the network is no longer constrained by a single global deformation trend, but rather endows each independent semantic subspace with its own geometric dynamic sampling capability, thereby maximizing the preservation of lesion morphology features from multiple perspectives in complex backgrounds.
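[0025] From a mathematical perspective, for the g-th feature subspace ($g \in \{1, \dots, G\}$), the output $y_g(p_0)$ at any position $p_0$ on the feature map is defined as follows: $$y_g(p_0) = \sum_{p_k \in \mathcal{R}} w_g(p_k)\, x(p_0 + p_k + \Delta p_{g,k})\, \Delta m_{g,k} \quad (1)$$ where $w_g$ denotes the kernel weights of the g-th group; $x$ denotes the input features; $\mathcal{R}$ is the regular sampling grid of a standard 3×3 convolution, and $p_k$ enumerates its positions. The key improvement is that the two-dimensional spatial offset $\Delta p_{g,k}$ and the modulation scalar $\Delta m_{g,k}$ are both explicitly bound to the g-th semantic subspace.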
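To make the per-group sampling rule concrete, the following pure-Python sketch evaluates the output of one subspace at one position on a single-channel toy feature map. It assumes the usual modulated deformable convolution form (kernel weight × bilinearly sampled input at the offset position × modulation scalar) and zero padding outside the map. The names `bilinear` and `modulated_deform_output` are illustrative and not taken from the invention.

```python
import math

# Regular 3x3 sampling grid (row, column displacements), i.e. K = 3.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(x, r, c):
    """Bilinearly interpolate the 2-D list x at fractional (r, c); zero outside."""
    h, w = len(x), len(x[0])
    r0, c0 = math.floor(r), math.floor(c)
    value = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < h and 0 <= cc < w:
                value += (1 - abs(r - rr)) * (1 - abs(c - cc)) * x[rr][cc]
    return value


def modulated_deform_output(x, weights, p0, offsets, masks):
    """Output of one subspace at position p0.

    weights: 9 kernel weights of this group
    offsets: 9 (dy, dx) learned offsets owned by this group
    masks:   9 modulation scalars in [0, 1] owned by this group
    """
    total = 0.0
    for k, (dy, dx) in enumerate(GRID):
        oy, ox = offsets[k]
        sample = bilinear(x, p0[0] + dy + oy, p0[1] + dx + ox)
        total += weights[k] * sample * masks[k]
    return total


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
zero_offsets = [(0.0, 0.0)] * 9

# Zero offsets and unit masks reduce to an ordinary 3x3 convolution.
print(modulated_deform_output(x, [1.0] * 9, (1, 1), zero_offsets, [1.0] * 9))  # 45.0

# Masks driven to zero gate every sampled value out.
print(modulated_deform_output(x, [1.0] * 9, (1, 1), zero_offsets, [0.0] * 9))  # 0.0
```

In the full module each of the G subspaces would call this rule with its own `offsets` and `masks`, which is what keeps a drifting subspace from dragging the others along.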
[0026] In the specific network implementation, the MARB module first uses a 1×1 convolution at the feature input stage to reduce the input feature tensor to C/2 channels. The number of output channels of the offset-generation branch is then set to exactly 3K²G (K = 3). The generated parameters are split along the channel dimension: the first 2/3 are used as spatial offsets, and the last 1/3, after Sigmoid activation, are used as the modulation mask. With this multi-path parallel and geometrically independent sampling strategy, MARB mitigates noise interference at the edges of small lesions and adaptively wraps non-rigid anatomical boundaries.
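The channel bookkeeping of the offset-generation branch is easy to check numerically. The sketch below uses K = 3 and G = 32 as given in the description; the helper name `offset_branch_channels` is hypothetical.

```python
def offset_branch_channels(kernel_size, groups):
    """Split the 3*K^2*G output channels into offset and mask parts."""
    k2 = kernel_size * kernel_size
    total = 3 * k2 * groups
    n_offset = 2 * k2 * groups  # first 2/3: one (dy, dx) pair per sampling point per group
    n_mask = k2 * groups        # last 1/3: one modulation scalar per sampling point per group
    return total, n_offset, n_mask


total, n_offset, n_mask = offset_branch_channels(kernel_size=3, groups=32)
print(total, n_offset, n_mask)  # 864 576 288
```

With G = 32, each subspace therefore owns 18 offset channels and 9 mask channels of its own.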
[0027] By binding ResNeXt's multi-branch features with DCN's adaptive deformation depth, MARB exhibits the following three targeted advantages in the small lesion segmentation task: Suppressing deformation drift: By decoupling global deformation into 32 local geometric probes, even if sampling points in the subspace where high-frequency noise is partially extracted experience background drift, it will not interfere with the deformation trajectories of other subspaces focused on the core texture of the lesion. This fault-tolerant mechanism significantly enhances the localization robustness under low signal-to-noise ratio conditions.
[0028] Subspace-specific receptive fields: MARB endows different semantic channels with differentiated morphological capture capabilities. For example, the receptive fields of some feature subspaces can adaptively remain compact to lock onto the dense core of small tumors; while the receptive fields of other subspaces undergo dynamic irregular stretching to precisely wrap highly invasive star-shaped spurs or irregularly spreading edges, thereby eliminating the edge jaggedness caused by traditional convolution.
[0029] Background noise gating suppression: The independently bound modulation scalar $\Delta m_{g,k}$ within the module acts as a pixel-level adaptive spatial gate. When a sampling point falls into an irrelevant background area or a noisy region due to excessive deformation, the network can learn to drive the $\Delta m_{g,k}$ at that location toward zero, so that high-frequency interference is filtered out during feature aggregation, reducing misjudgments caused by incorrectly introduced background features.
In medical micro-lesion segmentation, standard deep convolutional networks often face an inherent contradiction: to fully identify lesions, the network needs to expand the receptive field to capture the surrounding tissue context; however, as the receptive field expands, successive spatial convolution and downsampling operations act as a "low-pass filter". This not only irreversibly erases the extremely fragile high-frequency edge details of micro-lesions, but also tends to "average" the weak high-dimensional response of the target with the vast background region, causing the target features to be completely submerged in the deep network.
[0030] To overcome this inherent contradiction, this invention proposes a context-detail decoupling module (CDDB) based on the spatial frequency band decoupling concept. This module parallelizes computational paths with different receptive field spans at the same network layer, achieving decoupling between high-dimensional semantic context and local high-frequency details. This "divide and conquer" strategy not only effectively prevents weak targets from being diluted by a large area of background during context aggregation, but also preserves edge cues that have not been low-pass smoothed to the greatest extent possible.
[0031] Let the input feature tensor be $X \in \mathbb{R}^{C \times H \times W}$. CDDB designs two paths with completely different receptive fields and computational logic.
Moderate field-of-view context branch: This branch aims to capture the symbiotic relationship between lesions and surrounding normal tissue. For small targets, directly using excessively large convolutional kernels or pooling with large strides easily introduces significant background noise, which averages the features out. Therefore, this branch employs a Hybrid Dilated Convolution (HDC) strategy to moderately and smoothly expand the effective receptive field (ERF) while maintaining high spatial resolution: $$F_{ctx} = \mathrm{Conv}_{3\times3}^{d=3}\big(\mathrm{Conv}_{3\times3}^{d=1}(\mathrm{Conv}_{1\times1}(X))\big) \quad (2)$$ where $\mathrm{Conv}_{1\times1}$ is the dimension-reducing convolution (C→C/2) and $\mathrm{Conv}_{3\times3}^{d}$ denotes a 3×3 convolutional block with dilation rate d. By the receptive field derivation, cascading d=1 and d=3 expands the effective receptive field of this branch to exactly 9×9 pixels. This moderate field of view avoids gridding effects while covering small lesions (typically smaller than 10×10 pixels) and their nearby boundary extravasation areas, achieving dilution-resistant contextual aggregation.
[0032] Raw high-frequency detail branch: To combat the low-pass smoothing effect caused by convolution operations, this branch employs a strict identity mapping strategy in the spatial dimension: $$F_{det} = \mathrm{Conv}_{1\times1}\big(\mathrm{Conv}_{1\times1}(X)\big) \quad (3)$$ The first 1×1 convolutional layer performs channel dimensionality reduction (C→C/2) to align with the context branch. The second 1×1 convolutional layer adds non-linear feature mapping. Since a 1×1 convolution only combines features along the channel dimension, its spatial receptive field always remains 1×1. The physical significance of this property is that no information is mixed between neighboring pixels, so high-frequency edge gradients are retained without any spatial smoothing, preserving the original spatial resolution and edge sharpness of small targets as far as possible.
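The receptive fields quoted for the two branches follow from the standard rule for stacked stride-1 convolutions, where each layer adds (kernel − 1) × dilation to the receptive field. A small sketch, with the helper name `receptive_field` chosen for illustration:

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs, applied in order.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


# Context branch: 1x1 reduction, then 3x3 with d=1, then 3x3 with d=3.
print(receptive_field([(1, 1), (3, 1), (3, 3)]))  # 9

# Detail branch: two 1x1 convolutions never look beyond the centre pixel.
print(receptive_field([(1, 1), (1, 1)]))  # 1
```

The same function shows why a 7×7 gate alone gives a 7-pixel field, slightly narrower than the 9-pixel context branch it operates on.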
[0033] After obtaining the decoupled dual-frequency features, simple feature concatenation or element-wise addition cannot adaptively filter out invalid background noise. To address this, CDDB introduces a large-kernel spatial gating mechanism based on field-of-view matching. The context features $F_{ctx}$ and the detail features $F_{det}$ are concatenated along the channel dimension to obtain the joint features $F_{cat}$ in the original dimension. A 7×7 large-kernel convolution then generates a single-channel spatial attention map $A$: $$A = \sigma\big(\mathrm{Conv}_{7\times7}(F_{cat})\big) \quad (4)$$ where $\sigma$ is the Sigmoid activation function. The motivation for using a 7×7 kernel is to establish a mesoscale spatial receptive field, which, compared to a 3×3 convolution, makes better use of the surrounding context to determine the saliency of the center pixel. Finally, the output $Y$ is generated through gated weighting and a residual connection: $$Y = \mathrm{Conv}_{1\times1}(A \odot F_{cat}) + X \quad (5)$$ where $\odot$ denotes the Hadamard product, which performs a soft-thresholding operation that adaptively suppresses high-frequency artifacts in the background region, and $\mathrm{Conv}_{1\times1}$ is a 1×1 convolution used for channel dimensionality reduction and feature fusion. The global residual connection ensures that the extremely weak gradient signals from small lesions do not vanish in deep networks.
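The dual-branch structure, spatial gate, and residual path described above can be sketched as a small PyTorch module. This is a minimal sketch rather than the patented implementation: the placement of ReLU activations, the absence of normalization layers, the padding choices, and the class name `CDDBSketch` are all assumptions, and `channels` is assumed to be even.

```python
import torch
import torch.nn as nn


class CDDBSketch(nn.Module):
    """Illustrative context-details decoupling block (assumed layer details)."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        # Context branch: 1x1 reduction, then 3x3 convs with dilation 1 and 3.
        self.context = nn.Sequential(
            nn.Conv2d(channels, half, kernel_size=1),
            nn.Conv2d(half, half, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(half, half, kernel_size=3, padding=3, dilation=3),
            nn.ReLU(inplace=True),
        )
        # Detail branch: only 1x1 convs, so no spatial mixing takes place.
        self.detail = nn.Sequential(
            nn.Conv2d(channels, half, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(half, half, kernel_size=1),
        )
        # 7x7 gate producing a single-channel spatial attention map.
        self.gate = nn.Conv2d(2 * half, 1, kernel_size=7, padding=3)
        # 1x1 fusion back to the input channel count for the residual sum.
        self.fuse = nn.Conv2d(2 * half, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([self.context(x), self.detail(x)], dim=1)
        attention = torch.sigmoid(self.gate(joint))
        return self.fuse(attention * joint) + x


block = CDDBSketch(channels=64)
features = torch.randn(2, 64, 32, 32)
print(block(features).shape)  # torch.Size([2, 64, 32, 32])
```

Because every convolution preserves the spatial size, the block can be dropped in after any encoder stage without changing tensor shapes.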
[0034] Tiny lesions in medical images are often accompanied by extreme class imbalance, with foreground target pixels typically accounting for less than 2% of the entire image. To maximize the detection rate of these tiny targets while maintaining training stability, this invention does not introduce complex loss function designs. Instead, it adopts a robust baseline configuration widely validated in the field of medical image segmentation: a joint objective function combining binary cross-entropy loss (BCE Loss) and Dice similarity coefficient loss (Dice Loss), defined as follows: $$\mathcal{L} = \lambda\, \mathcal{L}_{BCE} + (1 - \lambda)\, \mathcal{L}_{Dice} \quad (6)$$ where $\mathcal{L}_{BCE}$ focuses on pixel-level classification and discrimination, and $\mathcal{L}_{Dice}$ focuses on spatial overlap at the region level. Given the characteristics of the micro-lesions described in this invention, this standard configuration provides the following synergistic effects: $\mathcal{L}_{BCE}$ provides a smooth and consistent pixel-level cross-entropy penalty, anchoring the optimization trajectory in the early stages of training and preventing the gradient oscillations that Dice Loss alone would cause when the target is extremely small; $\mathcal{L}_{Dice}$, being naturally insensitive to the foreground-to-background ratio, forces the model to focus on optimizing the overlap of extremely small areas in the later stages of training, overcoming the "averaging" interference caused by large background areas.
[0035] In all experimental configurations of this study, the weight coefficient λ was uniformly set to 0.5, based on considerations of experimental verification and multi-objective balance. This fundamental and efficient optimization strategy provides a stable gradient environment for MACD-Net to focus on the morphological adaptive feature learning of the architecture itself.
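A joint BCE and Dice objective of this kind can be sketched in plain Python on flattened probability maps. The exact way λ enters the objective is an assumption here (a convex combination, λ·BCE + (1 − λ)·Dice), as are the smoothing constant and the function names.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss; `smooth` (assumed value) avoids division by zero."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def joint_loss(probs, targets, lam=0.5):
    """Assumed form: lam * BCE + (1 - lam) * Dice, with lam = 0.5."""
    return lam * bce_loss(probs, targets) + (1.0 - lam) * dice_loss(probs, targets)


targets = [1, 0, 0, 0, 0, 0, 0, 0]         # one foreground pixel out of eight
good = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
missed = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
print(joint_loss(good, targets) < joint_loss(missed, targets))  # True
```

In the toy example the only difference between the two predictions is the single foreground pixel, yet the Dice term changes sharply, which illustrates why it helps when the target is a tiny fraction of the image.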
[0036] This invention uses the public BUSI and ISIC-2017 datasets to evaluate the network's segmentation performance. During preprocessing, all input images are resized to a resolution of 512×512. Based on a statistical analysis of the datasets, samples in which the target region occupies 2% or less of the image pixels are defined as micro-lesions. This threshold makes it possible to evaluate the model's robustness in both "high-frequency routine" and "large-span generalization" scenarios.
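The micro-lesion criterion amounts to a threshold on the foreground pixel ratio of a ground-truth mask. A short sketch, with hypothetical helper names and synthetic square lesions used purely for illustration:

```python
def foreground_ratio(mask):
    """Fraction of non-zero pixels in a 2-D binary mask (list of rows)."""
    total = sum(len(row) for row in mask)
    foreground = sum(1 for row in mask for value in row if value)
    return foreground / total


def is_micro_lesion(mask, threshold=0.02):
    """True when the lesion covers at most `threshold` of the image."""
    return foreground_ratio(mask) <= threshold


def square_mask(size, side):
    """Synthetic mask: a side x side lesion in the top-left corner."""
    return [[1 if r < side and c < side else 0 for c in range(size)]
            for r in range(size)]


print(is_micro_lesion(square_mask(512, 60)))  # True  (3600 / 262144 is about 1.37%)
print(is_micro_lesion(square_mask(512, 80)))  # False (6400 / 262144 is about 2.44%)
```

At 512×512, the 2% threshold corresponds to roughly 5,243 foreground pixels, or a square of about 72×72 pixels.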
[0037] The BUSI dataset contains grayscale breast ultrasound images in which lesion boundaries are blurred and severely affected by ultrasound speckle noise. After removing normal samples, 630 cases of benign and malignant masses were selected as the experimental subset. Stratified random sampling based on lesion scale, with a fixed random seed, was used to divide the samples into an independent training set (504 cases) and test set (126 cases) in an 8:2 ratio. Statistical analysis identified 142 cases (approximately 22.5%) that meet the definition of small lesions and form the hard subset. In medical image segmentation, accuracy on large and medium-sized lesions often approaches saturation, so these 22.5% of low-contrast samples directly determine the upper limit of the model's overall performance. This dataset is primarily used to validate the model's resistance to feature submersion under extremely blurred conditions.
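A seeded, stratified 8:2 split of this kind can be sketched with the standard library. The two strata ("micro" and "other"), the seed value, and the function name are assumptions made for illustration; the source only states that stratification is by lesion scale.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=42):
    """Split sample indices so each stratum keeps the same train/test ratio."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_stratum = {}
    for index, label in enumerate(labels):
        by_stratum.setdefault(label, []).append(index)

    train, test = [], []
    for label in sorted(by_stratum):
        indices = by_stratum[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# 142 micro-lesion cases and 488 others, as reported for the BUSI subset.
labels = ["micro"] * 142 + ["other"] * 488
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 504 126
```

Splitting per stratum keeps the share of micro-lesions close to 22.5% in both the training and test sets, so test metrics are not skewed by an unlucky draw.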
[0038] The ISIC-2017 dataset, unlike the BUSI dataset which focuses on small lesion populations, is characterized by its extreme range of target scales. The dataset contains 2000 training images, 150 validation images, and 600 test images. The percentage of lesion pixels ranges from an extreme 0.3% to 93%. Statistical scale distribution shows that medium-sized lesions (approximately 10%-20%) constitute the majority of the data, while 314 samples (11.4%) meet the criteria for ≤ 2% micromelanoma. The core motivation for introducing this dataset is to verify whether the model can successfully overcome the "large target dominance bias" mentioned in the introduction and maintain generalization stability under highly imbalanced mixed scales.
[0039] The experimental environment was built with Python 3.8, the PyTorch 1.13.1 deep learning framework, and CUDA 11.6. Training and testing ran on Ubuntu 20.04 with an Nvidia RTX 4090 GPU with 24 GB of memory. To prevent overfitting, online data augmentation strategies such as random flipping and rotation were used during training. The optimizer was AdamW with an initial learning rate of 1×10⁻⁴, the batch size was set to 4, and training ran for a total of 100 epochs.
[0040] To comprehensively and objectively evaluate segmentation performance, this invention employs five mainstream evaluation metrics: mean Dice similarity coefficient (mDice), mean intersection over union (mIoU), Jaccard coefficient, recall, and precision.
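For reference, the overlap metrics can be computed from true-positive, false-positive, and false-negative pixel counts. The sketch below works on flattened binary masks for a single image; averaging over images to obtain the mean variants, the convention of returning 1.0 when a denominator is zero, and the function names are assumptions.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    return tp, fp, fn


def _ratio(numerator, denominator):
    # Assumed convention: an empty denominator counts as perfect agreement.
    return numerator / denominator if denominator else 1.0


def segmentation_metrics(pred, target):
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "iou": _ratio(tp, tp + fp + fn),  # per image, identical to the Jaccard index
        "recall": _ratio(tp, tp + fn),
        "precision": _ratio(tp, tp + fp),
    }


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
metrics = segmentation_metrics(pred, target)
print(metrics["iou"])  # 0.5
```

Recall penalizes missed lesion pixels while precision penalizes false alarms, which is why the two are reported side by side in the comparisons that follow.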
[0041] The experiments compare the proposed MACD-Net with mainstream segmentation methods including UM-Net, U-Net, UNet++, DeepLabV3, DenseASPP, R2U-net, EH-former, CaraNet, M²SNet, and STS-Net.
[0042] As shown in Table 1, MACD-Net demonstrates a comprehensive performance advantage on the BUSI dataset, ranking first in both mDice (86.96%) and mIoU (83.86%). Current mainstream medical segmentation models have generally reached saturation in segmentation accuracy when dealing with the majority of large and medium-sized conventional lesions, with extremely small performance differences between different models on large targets. Therefore, while mainstream baseline models can segment large-scale lesions well, they are prone to missing small lesions, severely dragging down the overall performance. The significant leap in MACD-Net's overall performance essentially reflects its superior ability to break through the limitations of small and extremely low-contrast lesions.
[0043] The trade-off between the model's detection rate and false alarm rate deserves closer examination. M²SNet achieved an extremely high recall (74.52%), but its precision dropped to only 71.32%. This indicates that such models fall into an overly permissive pixel classification strategy when dealing with small, low-contrast lesions: their high detection rate comes at the cost of specificity, generating a large number of false positives in surrounding healthy tissue. In real-world clinical auxiliary diagnosis, this high false positive rate greatly increases the cognitive burden on doctors during secondary review. In contrast, the method of this invention maintains a high recall (72.63%) while achieving the best precision across the entire dataset (76.45%).
[0044] To provide an intuitive subset verification of this core argument, Figure 1 and Figure 2 specifically analyze the subset of tiny lesions. For extremely small lesions with low contrast, DeepLabV3 and other algorithms suffer from severe undersegmentation or even blank predictions because of the information loss caused by downsampling. For lesions with irregular shapes (such as star-shaped edges), STS-Net over-smooths the predicted contours. The magnified areas in the figures confirm that MACD-Net's overall superior performance does not stem from blindly inflating the edges of conventional lesions, but from the strong background denoising of the CDDB module and the morphological adaptation mechanism of MARB. This allows it to detect tiny targets that other models miss completely, achieving a genuine balance between high recall and high precision in low-contrast scenarios.
[0045] Table 1. Comparison of segmentation results using different methods on the BUSI dataset.
[0046]
[0047] As shown in Table 2, on the ISIC-2017 dataset with its extreme scale range, MACD-Net achieved the best results in both mDice (90.21%) and Recall (82.98%). Further analysis reveals that the reason traditional models like STS-Net and U-Net can barely maintain an overall evaluation score of around 81%–89% is almost entirely due to a "scale-driven bias" during optimization. They sacrifice the extremely small melanomas (which account for 11.3% of the dataset, resulting in significant missed detections) to accommodate the large-scale lesions that constitute the majority (88.7%). This directly leads to a bottleneck in the Recall metric that existing models struggle to overcome.
[0048] Conversely, MACD-Net achieves superior performance because it overcomes the "scale-driven bias." While maintaining the integrity of large lesion segmentation, MACD-Net accurately captures the irregular edges of small lesions using its MARB module. MACD-Net's high score does not solely rely on the contribution of large targets, but rather stems from its successful rescue of a large number of small and difficult-to-detect lesions that were missed by other models.
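The "scale-driven bias" argument can be illustrated with a short arithmetic sketch. The 11.3% / 88.7% split comes from the text; the per-group Dice values below are hypothetical and chosen only to show how weakly the small-lesion group moves a sample-weighted mean.

```python
def dataset_mean(score_small, score_large, frac_small=0.113):
    """Sample-weighted mean score for a dataset with two lesion-size groups."""
    return frac_small * score_small + (1.0 - frac_small) * score_large


# Hypothetical per-group Dice scores (not taken from the source tables).
biased = dataset_mean(score_small=0.40, score_large=0.93)
balanced = dataset_mean(score_small=0.80, score_large=0.93)
gain = balanced - biased   # effect of doubling the small-lesion Dice
```

Under these assumed numbers, doubling the small-lesion Dice from 0.40 to 0.80 raises the dataset mean by only about 4.5 points, which is why a model can neglect small lesions and still report a respectable overall score.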
[0049] Visual analysis (Figure 3) shows that for background interference samples containing complex artifacts such as marker lines and hair, STS-Net, M²SNet, and DeepLabV3 all generated false positives in non-lesion areas, demonstrating the limitations of conventional attention mechanisms in distinguishing lesion texture from background noise. In contrast, the mask generated by MACD-Net has a clean background, indicating that CDDB effectively suppresses high-frequency artifact interference. When dealing with lesions with blurred boundaries, MACD-Net, thanks to the dynamic sampling capability of MARB, achieved a high degree of geometric alignment with the ground truth, regardless of whether the lesion was elongated or had blurred edges.
[0050] Table 2. Comparison of segmentation results using different methods on the ISIC-2017 dataset.
[0051]
[0052] This section focuses on the ablation experiments for the two modules proposed in this invention: the Morphological Adaptive ResNeXt module (MARB) and the Context-Details Decoupling module (CDDB). In the experimental setup, a strong baseline network (denoted as Baseline in Table 3) was constructed. This baseline network consists of a ResNeXt backbone, an ESCA attention module inherited from STS-Net, and a decoder integrating ASPP. Because the BUSI breast ultrasound dataset contains purely small-lesion samples with the lowest contrast and most blurred edges, it is much more sensitive to subtle features than the mixed-scale ISIC-2017 dataset. Therefore, the BUSI dataset was selected for the ablation experiments, and the proposed modules were inserted one at a time.
[0053] As shown in Table 3, after introducing the MARB proposed in this invention, the model's mDice improved by 0.38%, and Precision increased by 0.49%. This indicates that MARB effectively enhances the model's geometric modeling ability for irregular lesions through an adaptive sampling mechanism, significantly reducing false alarms caused by boundary fitting bias.
[0054] After introducing CDDB separately, mDice improved by 0.46%, and the recall metric increased significantly by 0.91% (from 71.65% to 72.56%). This confirms that CDDB's heterogeneous dual-stream architecture can effectively alleviate the feature dilution problem, enhance the network's sensitivity to weak target signals, and thus significantly reduce the false negative rate.
[0055] The complete MACD-Net integrating both modules achieved state-of-the-art performance. The model simultaneously improved both recall and precision, demonstrating a high degree of positive complementarity between the two modules: CDDB addresses the "invisible" challenges at the feature level (improving detection rate), while MARB addresses the "misalignment" dilemma at the spatial level (improving geometric accuracy). This organic synergy fundamentally enhances the robustness of small lesion segmentation tasks.
[0056]
[0057] Table 3. Ablation experimental results on the BUSI dataset.
The basic principles and main features of the present invention have been described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are only illustrative of the principles of the present invention. Various changes and modifications can be made to the present invention without departing from the spirit and scope of the present invention. All such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the invention is defined by the appended claims and their equivalents.
Claims
1. A morphological adaptation and feature decoupling network method for medical image segmentation, characterized in that it includes the following steps:
Step 1: Using an encoder-decoder architecture, the input medical image is subjected to four downsampling and feature encoding operations and four upsampling and cascaded decoding operations, and the decoded features are subjected to one independent context-details decoupling process. Then, the pixel-level segmentation result is output through a 1×1 convolution.
Step 2: The morphological adaptive ResNeXt module and the context-details decoupling module are configured in the encoder. The context-details decoupling module is deployed at the end of each downsampling and before the decoding output. The encoder is also configured with an efficient small-channel attention module to filter out redundant channels.
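To show why four downsampling stages are hard on small lesions, the following sketch traces spatial resolution through the encoder and decoder. The downsampling factor of 2 per stage and the 256×256 input are assumptions, since the claim does not state them. The 2% area ratio is the micro-lesion threshold mentioned in the technical summary.

```python
def trace_resolutions(height, width, stages=4, factor=2):
    """Return the spatial size after each encoder and each decoder stage."""
    encoder = [(height, width)]
    for _ in range(stages):
        h, w = encoder[-1]
        encoder.append((h // factor, w // factor))
    decoder = [encoder[-1]]
    for _ in range(stages):
        h, w = decoder[-1]
        decoder.append((h * factor, w * factor))
    return encoder, decoder


def lesion_cells(height, width, area_ratio, total_stride):
    """Approximate number of feature-map cells a lesion covers at a given stride."""
    return area_ratio * height * width / (total_stride ** 2)


# Assumed 256x256 input with stride 2 per stage (total stride 16 at the bottleneck).
enc, dec = trace_resolutions(256, 256)
cells_at_bottleneck = lesion_cells(256, 256, area_ratio=0.02, total_stride=16)
```

Under these assumptions a lesion covering 2% of the image shrinks to roughly five cells of the 16×16 bottleneck map, which is the regime where weak features are easily buried.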
2. The morphological adaptation and feature decoupling network method for medical image segmentation according to claim 1, characterized in that the morphologically adaptive ResNeXt module is built on the ResNeXt multi-branch topology and deformable convolution. It sets the feature cardinality G of the ResNeXt bottleneck layer and the number of deformation groups in the deformable convolution to the same value, and configures a dedicated two-dimensional spatial offset and modulation scalar for each independent semantic subspace. The module first reduces the channel dimension of the input features to C/2 using a 1×1 convolution, and then uses an offset-generation branch to output parameters with $3K^2G$ channels, with K set to 3. These parameters are divided along the channel dimension: the first 2/3 are used as spatial offset parameters, and the last 1/3 are activated by Sigmoid and used as modulation mask parameters.
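The channel split described in this claim can be sketched for a single pixel as follows, assuming the offset-generation branch emits 3·K²·G values per pixel. The helper name `split_offset_mask` and the choice G = 4 are hypothetical, and how the two offset components are ordered within the first two thirds is not specified by the claim.

```python
import math


def split_offset_mask(params, groups, kernel_size=3):
    """Split one pixel's 3*K*K*G parameter vector into offsets and masks."""
    k2 = kernel_size * kernel_size
    expected = 3 * k2 * groups
    if len(params) != expected:
        raise ValueError(f"expected {expected} channels, got {len(params)}")
    n_offset = 2 * k2 * groups
    # First 2/3: two spatial offset components per sampling point and group.
    offsets = params[:n_offset]
    # Last 1/3: one modulation scalar per sampling point, squashed into (0, 1).
    masks = [1.0 / (1.0 + math.exp(-v)) for v in params[n_offset:]]
    return offsets, masks


# Assumed G = 4 with K = 3: 3 * 9 * 4 = 108 channels per pixel.
offsets, masks = split_offset_mask([0.0] * 108, groups=4)
```

With all-zero parameters the module degenerates to a regular grouped convolution scaled by 0.5, since every offset is zero and every Sigmoid output is 0.5.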
3. The morphological adaptation and feature decoupling network method for medical image segmentation according to claim 2, characterized in that, in the morphologically adaptive ResNeXt module, the output $y_g(p_0)$ of the g-th feature subspace at any position $p_0$ on the feature map is defined as follows: $$y_g(p_0) = \sum_{p_k \in \mathcal{R}} w_g(p_k) \cdot x_g(p_0 + p_k + \Delta p_{g,k}) \cdot \Delta m_{g,k}$$ where $w_g$ represents the kernel weights of the g-th group; $x_g$ is the input feature; $p_k$ enumerates the positions of the standard 3×3 convolution regular grid $\mathcal{R}$ in two-dimensional space; the key improvement is that the spatial offset $\Delta p_{g,k}$ and the modulation scalar $\Delta m_{g,k}$ are both explicitly bound to the g-th semantic subspace.
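A minimal sketch of the per-position computation in this claim, assuming the usual modulated deformable form: each point of the 3×3 grid contributes kernel weight × bilinearly sampled input × modulation scalar. For brevity the group is reduced to a single channel, and out-of-bounds samples are treated as zero. Both simplifications, and the names `bilinear` and `deform_point`, are assumptions made here.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly sample a 2-D list at fractional (y, x); outside counts as zero."""
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * feature[yy][xx]
    return value


# Regular 3x3 sampling grid, enumerated row by row.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def deform_point(feature, p0, weights, offsets, masks):
    """Sum over the grid of weight * sampled input at shifted position * mask."""
    y0, x0 = p0
    total = 0.0
    for (dy, dx), w, (oy, ox), m in zip(GRID, weights, offsets, masks):
        total += w * bilinear(feature, y0 + dy + oy, x0 + dx + ox) * m
    return total


feature = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
zero_offsets = [(0.0, 0.0)] * 9

# No offsets, full modulation: reduces to an ordinary 3x3 convolution.
plain = deform_point(feature, (1, 1), [1.0] * 9, zero_offsets, [1.0] * 9)

# Only the centre tap is active and is shifted half a pixel to the right.
centre_only = [0.0] * 4 + [1.0] + [0.0] * 4
shifted_offsets = [(0.0, 0.0)] * 4 + [(0.0, 0.5)] + [(0.0, 0.0)] * 4
shifted = deform_point(feature, (1, 1), centre_only, shifted_offsets, [1.0] * 9)
```

In the claimed module each of the G groups would carry its own offsets and masks, so this function would be evaluated once per group with group-specific parameters.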
4. The morphological adaptation and feature decoupling network method for medical image segmentation according to claim 1, characterized in that the context-details decoupling module constructs a dual-stream feature architecture with heterogeneous receptive fields, comprising an appropriate field-of-view context branch and a raw high-frequency detail branch, and feeds the input feature tensor into the two branches in parallel. Both branches reduce the input feature channels to C/2 through a 1×1 convolution. After channel dimension alignment, the output features of the two branches are concatenated along the channel dimension to form a joint feature.
5. The morphological adaptation and feature decoupling network method for medical image segmentation according to claim 4, characterized in that the appropriate field-of-view context branch adopts a hybrid dilated convolution strategy, which sequentially passes the dimensionality-reduced features through 3×3 convolutional blocks with dilation rates d=1 and d=3, thereby expanding the effective receptive field to 9×9 pixels; the raw high-frequency detail branch completes nonlinear feature mapping through a 1×1 convolution after dimensionality reduction, performing linear combination only in the channel dimension to maintain a 1×1 spatial receptive field.
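The 9×9 and 1×1 receptive fields stated in this claim follow from the standard formula for stacked convolutions. A small sketch, assuming every layer uses stride 1 (the claim does not mention stride):

```python
def receptive_field(layers):
    """Receptive field of stride-1 convolutions applied in sequence.

    layers is a list of (kernel_size, dilation) pairs; each layer widens
    the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


# Context branch: 1x1 reduction, then 3x3 (d=1), then 3x3 (d=3).
context_rf = receptive_field([(1, 1), (3, 1), (3, 3)])
# Detail branch: 1x1 reduction followed by another 1x1 convolution.
detail_rf = receptive_field([(1, 1), (1, 1)])
```

The context branch grows as 1 + 2·1 + 2·3 = 9, while 1×1 convolutions never widen the field, so the detail branch stays at a single pixel.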
6. The morphological adaptation and feature decoupling network method for medical image segmentation according to claim 4, characterized in that the concatenated joint features obtained in the context-details decoupling module are passed through a 7×7 large-kernel convolution to generate a single-channel spatial attention map. After the Sigmoid activation function is applied, the map is multiplied element-wise (Hadamard product) with the joint features, and channel dimensionality reduction and feature fusion are then completed by a 1×1 convolution. At the same time, a global residual connection is introduced to add the input feature tensor to the fused features, yielding the output features of the context-details decoupling module.
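The gating and residual path in this claim can be sketched in plain Python on nested lists. This is a simplified illustration under stated assumptions, not the patented implementation: biases, normalization and activations other than the Sigmoid are omitted, zero padding is assumed for the 7×7 convolution, and the function names are hypothetical.

```python
import math


def conv7x7_to_single_channel(joint, kernel):
    """Zero-padded 7x7 convolution collapsing all channels into one map.

    joint is [C][H][W]; kernel is [C][7][7].
    """
    channels, height, width = len(joint), len(joint[0]), len(joint[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for c in range(channels):
                for ky in range(7):
                    for kx in range(7):
                        yy, xx = y + ky - 3, x + kx - 3
                        if 0 <= yy < height and 0 <= xx < width:
                            acc += kernel[c][ky][kx] * joint[c][yy][xx]
            out[y][x] = acc
    return out


def gate_and_fuse(x, joint, kernel, fuse_weights):
    """Spatial gating, 1x1 fusion and global residual.

    x is the module input [C_out][H][W]; fuse_weights is the 1x1
    convolution as a [C_out][C_joint] matrix.
    """
    logits = conv7x7_to_single_channel(joint, kernel)
    height, width = len(logits), len(logits[0])
    out = []
    for o, row_w in enumerate(fuse_weights):
        plane = []
        for y in range(height):
            row = []
            for xi in range(width):
                gate = 1.0 / (1.0 + math.exp(-logits[y][xi]))  # Sigmoid attention
                fused = sum(w * gate * joint[c][y][xi] for c, w in enumerate(row_w))
                row.append(x[o][y][xi] + fused)  # global residual connection
            plane.append(row)
        out.append(plane)
    return out


# Tiny hypothetical example: 2 channels on a 2x2 map.
x_in = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]
joint = [[[2.0, 2.0], [2.0, 2.0]] for _ in range(2)]
zero_kernel = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]
identity_fuse = [[1.0, 0.0], [0.0, 1.0]]
result = gate_and_fuse(x_in, joint, zero_kernel, identity_fuse)
```

With a zero kernel the gate is 0.5 everywhere, so each output value is the input (1.0) plus half of the joint feature (1.0). The residual term guarantees that the input signal survives even where the gate suppresses the joint features.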