A medical image segmentation method and system based on local-global feature modeling
By employing a local-global feature modeling approach, combined with multi-scale feature extraction, bilateral feature enhancement, and multi-scale fusion, the shortcomings of existing medical image segmentation models in global modeling and boundary refinement are addressed, achieving high-precision lesion region segmentation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- HUBEI UNIV OF TECH
- Filing Date
- 2026-03-03
- Publication Date
- 2026-06-19
AI Technical Summary
Existing medical image segmentation models have shortcomings in global modeling, local detail capture, multi-scale fusion, and refined boundary modeling, making it difficult to simultaneously achieve accuracy, efficiency, and boundary quality.
We adopt a local-global feature modeling approach, which combines multi-scale feature extraction, bilateral feature enhancement, multi-scale parallel fusion and decoding modules, and utilizes the Mamba state space model and depthwise separable convolution to improve feature representation and edge perception capabilities.
It improves the segmentation accuracy and boundary recognition capability of skin lesion areas, significantly enhances the overall performance of medical image segmentation, and performs particularly well in complex lesion and high-resolution image processing.
Figure CN122244434A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of medical image processing technology, and in particular to a medical image segmentation method, system, storage medium, and electronic device based on local-global feature modeling.
Background Technology
[0002] Skin diseases are among the most common diseases worldwide, including melanoma, psoriasis, and eczema. Malignant melanoma, in particular, often presents with subtle early symptoms and, if left untreated, is prone to metastasis, posing a serious threat to life. Clinical diagnosis typically relies on dermoscopic images for lesion identification; however, manual interpretation is limited by experience and subjective factors, resulting in limited efficiency and accuracy. Therefore, the automated analysis of skin disease images using computer vision and artificial intelligence technologies has become an important research direction. Image segmentation is a crucial step, aiming to accurately extract lesion regions from dermoscopic images, providing fundamental support for subsequent classification and diagnosis, and improving diagnostic efficiency and reliability.
[0003] With the development of imaging technology and deep learning, medical image segmentation algorithms have evolved from traditional image processing methods (such as thresholding, edge detection, clustering, and graph theory-based methods) to deep learning-based models. Convolutional Neural Networks (CNNs), represented by UNet, achieve multi-layer semantic feature extraction through an encoder-decoder structure and have produced good results in medical image segmentation. However, because convolution operations are limited to a local receptive field, they capture long-range dependencies poorly. In recent years, the Transformer architecture has been introduced into medical image segmentation tasks. Its self-attention mechanism can model the relationships between pixels globally, but its computational complexity grows quadratically with input size, which hinders the processing of high-resolution medical images. To balance global modeling capability and computational efficiency, State Space Models (SSMs) have gradually become a research hotspot. The Mamba model, a typical state space modeling method, can achieve long-range dependency modeling with linear complexity and shows potential in visual tasks. However, existing medical segmentation models based on CNNs, Transformers, or SSMs still suffer from an imbalance between local and global feature modeling, insufficient multi-scale semantic fusion, and easy loss of boundary details.
Summary of the Invention
[0004] This invention provides a medical image segmentation method, system, storage medium, and electronic device based on local-global feature modeling, which can improve feature expression and edge perception capabilities, thereby improving the segmentation accuracy of skin lesion areas.
[0005] This invention provides a medical image segmentation method based on local-global feature modeling, comprising: acquiring a medical image; inputting the medical image into a multi-scale feature extraction module to obtain local semantic features; inputting the local semantic features into a bilateral feature enhancement module to obtain local enhancement features; inputting the local enhancement features into a multi-scale parallel fusion module to obtain deep enhancement features; and inputting the local semantic features, the local enhancement features, and the deep enhancement features into a decoding module to obtain the medical image segmentation result.
[0006] Furthermore, according to the above-mentioned medical image segmentation method based on local-global feature modeling, the multi-scale feature extraction module includes multiple convolutional layers; The processing steps of the multi-scale feature extraction module include: The medical image is input into the multi-scale feature extraction module, which outputs the first local semantic feature, the second local semantic feature, and the third local semantic feature in sequence through three convolutional layers.
[0007] Furthermore, according to the above-described medical image segmentation method based on local-global feature modeling, the bilateral feature enhancement module includes three sequentially connected BFE-Mamba processing blocks. Each BFE-Mamba processing block includes a local enhancement branch, a global modeling branch, and a dynamic weight fusion module. The local enhancement branch includes a left branch and a right branch. The global modeling branch includes multiple Mamba modules. The processing procedure of the bilateral feature enhancement module includes: the third local semantic feature is input, and the first, second, and third local enhancement features are output by the three BFE-Mamba processing blocks respectively. The processing procedure of the BFE-Mamba processing block includes: the third local semantic feature is input into the left branch of the local enhancement branch, where a channel attention mechanism and two asymmetric convolutions are used to obtain directional features; the third local semantic feature is input into the right branch of the local enhancement branch, where a spatial attention mechanism and two asymmetric convolutions are used to obtain structural features; the directional and structural features are fused using residual summation and a channel shuffling mechanism to obtain directional-structural features; in the global modeling branch, the third local semantic feature is layer-normalized and divided into four sub-features, each sub-feature is input into the corresponding Mamba module, and the outputs of all Mamba modules are concatenated to obtain the Mamba output feature; in the dynamic weight fusion module, the directional-structural features and the Mamba output feature are fused to obtain the local enhancement features.
[0008] Furthermore, according to the above-described medical image segmentation method based on local-global feature modeling, the Mamba module includes a left branch and a right branch. The processing procedure of the right branch of the Mamba module is represented by the following formula:
[0009] where the terms denote, in order, the output of the right branch of the Mamba module, the selective state space operation, a nonlinear activation function, a linear transformation, and a convolution operation. The processing procedure of the left branch of the Mamba module is represented by the following formula:
[0010] where the term denotes the output of the left branch of the Mamba module. The output feature of the Mamba module is obtained by combining the outputs of the right and left branches, and can be expressed by the following formula:
[0011] where the operator is the tensor product.
[0012] Furthermore, according to the above-described medical image segmentation method based on local-global feature modeling, the multi-scale parallel fusion module includes multiple depthwise separable convolutional branches; the processing procedure of the multi-scale parallel fusion module includes: The third local enhancement feature is processed by convolution and normalization and then input into multiple depthwise separable convolution branches to obtain the outputs of multiple depthwise separable convolution branches. The outputs of multiple depthwise separable convolution branches are concatenated and then residually connected with the third local enhancement feature to obtain the deep enhancement feature.
[0013] Furthermore, according to the above-described medical image segmentation method based on local-global feature modeling, the decoding module includes multiple VM sub-modules and DEA sub-modules; the processing procedure of the decoding module includes: The deep enhancement features are input into the first VM submodule to obtain the first decoding features; The first decoding feature and the second local enhancement feature are concatenated and input into the first DEA submodule to obtain the first attention feature; The first attention feature is input into the second VM submodule to obtain the second decoding feature; The second decoding feature and the first local enhancement feature are concatenated and input into the second DEA submodule to obtain the second attention feature; The second attention feature is input into the third VM submodule to obtain the third decoding feature; The third decoding feature and the third local semantic feature are concatenated and input into the third DEA submodule to obtain the third attention feature; The third attention feature is input into the convolutional layer to obtain the fourth decoding feature; The fourth decoding feature and the second local semantic feature are concatenated and input into the fourth DEA submodule to obtain the fourth attention feature; The fourth attention feature is input into the convolutional layer to obtain the fifth decoding feature; The fifth decoding feature and the first local semantic feature are concatenated and input into the fifth DEA submodule to obtain the fifth attention feature; A 1×1 convolution operation is performed on the fifth attention feature to obtain the medical image segmentation result.
[0014] Furthermore, according to the above-described medical image segmentation method based on local-global feature modeling, the processing procedure of the DEA submodule includes: Multi-scale feature extraction is performed on the input features by downsampling operations at different scales, generating low-resolution feature maps at two scales. The edge response features are obtained by upsampling the low-resolution feature maps of the two scales back to their original size and then performing a difference operation. The edge response features and the input features are weighted and fused to obtain the fused features; Attention features are obtained by processing the fused features using channel attention and spatial attention mechanisms.
[0015] This invention also provides a medical image segmentation system based on local-global feature modeling, comprising: The acquisition module is used to acquire medical images; A multi-scale feature extraction module is used to process the medical image to obtain local semantic features; The bilateral feature enhancement module is used to process the local semantic features to obtain local enhanced features; A multi-scale parallel fusion module is used to process the local enhancement features to obtain deep enhancement features; The decoding module is used to process the local semantic features, the local enhancement features, and the deep enhancement features to obtain the medical image segmentation result.
[0016] The present invention also provides a computer-readable storage medium storing a plurality of instructions adapted for loading by a processor to execute any of the above-described medical image segmentation methods based on local-global feature modeling.
[0017] The present invention also provides an electronic device, including a processor and a memory, wherein the processor is electrically connected to the memory, the memory is used to store instructions and data, and the processor is used to execute the steps of any of the above-described medical image segmentation methods based on local-global feature modeling.
[0018] This invention provides a medical image segmentation method, system, storage medium, and electronic device based on local-global feature modeling. In the encoding stage, the invention proposes a bilateral feature enhancement module that deeply integrates the global dependency modeling capability of the Mamba state-space model with the local enhancement mechanism of channel-spatial attention. Through linear projection and dual-branch feature rearrangement, it achieves a joint representation of global context and local texture, which is one of the key technical points for improving feature representation capability. In the network bottleneck stage, a multi-scale parallel fusion module is designed to fuse, in parallel, multi-scale convolutional kernels, local semantic features from different receptive fields, and global Mamba features, improving the semantic stability and scale adaptability of complex lesion regions. In the decoding stage, a DEA differential edge attention module is constructed, which generates edge response maps from multi-scale differential features and uses a learnable edge fusion mechanism to strengthen the structural information transmitted in skip connections, thereby addressing the problem of blurred lesion contours. This is an important means of improving boundary recovery capability. Furthermore, this invention introduces a residual deep coupling mechanism of local-global features into the overall network structure, maintaining the linear complexity advantage and lightweight structure of Mamba, so that the network remains efficient on high-resolution medical images.
Attached Figure Description
[0019] The technical solution and other beneficial effects of the present invention will become apparent from the following detailed description of specific embodiments of the invention, in conjunction with the accompanying drawings.
[0020] Figure 1 is a flowchart of a medical image segmentation method based on local-global feature modeling provided in an embodiment of the present invention.
[0021] Figure 2 is a structural diagram of the BFE-Mamba processing block provided in an embodiment of the present invention.
[0022] Figure 3 is a schematic diagram of the structure of the Mamba module provided in an embodiment of the present invention.
[0023] Figure 4 is a schematic diagram of the structure of the multi-scale parallel fusion module provided in an embodiment of the present invention.
[0024] Figure 5 is a schematic diagram of the structure of the DEA submodule provided in an embodiment of the present invention.
[0025] Figure 6 is a schematic diagram of the structure of a medical image segmentation system based on local-global feature modeling provided in an embodiment of the present invention.
[0026] Figure 7 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present invention.
Detailed Implementation
[0027] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0028] Traditional machine learning methods have limited scalability: handcrafted features are difficult to adapt to different imaging modes and complex lesion morphologies, are very sensitive to noise, shadows and blurred boundaries, have insufficient generalization ability, and cannot meet the needs of refined medical diagnosis.
[0029] CNN-based methods have insufficient ability to model global information: convolution operations are essentially local computations with limited receptive fields, causing the network to rely mainly on local texture and edge information. Even by increasing the number of network layers or expanding the receptive field by dilating convolutions, it is difficult to establish stable global context relationships, which can easily lead to missegmentation when processing images with complex structures or irregular lesion distributions.
[0030] The Transformer method has high training and deployment costs: The computational complexity of the Transformer's self-attention mechanism increases quadratically with the input size, resulting in a large number of model parameters and high computational costs, which is not conducive to deployment in real-world clinical scenarios, especially for applications with high real-time requirements or limited hardware resources.
[0031] Existing Mamba-based methods suffer from insufficient local feature modeling: While existing models using Mamba for medical image segmentation can effectively capture global dependencies, they generally lack fine-grained modeling mechanisms for local edges and texture details. Relying solely on global features may result in weak responses in boundary regions and unclear segmentation contours; small targets or slender structures are easily overlooked; and the local information passed through skip connections is not fully utilized, leading to insufficient matching of semantic and structural details.
[0032] The decoding stage lacks an effective boundary enhancement mechanism: most existing methods cannot fully focus on the edge region of the lesion during the upsampling and recovery of spatial details, which easily leads to blurred boundaries and loss of details, resulting in insufficient morphological accuracy of the final segmentation result.
[0033] In summary, existing technologies still have room for improvement in global modeling, local detail capture, multi-scale fusion, and refined boundary modeling, and it is difficult to simultaneously achieve accuracy, efficiency, and boundary quality.
[0034] To address the aforementioned problems, embodiments of the present invention provide a medical image segmentation method, system, storage medium, and electronic device based on local-global feature modeling. The medical image segmentation system based on local-global feature modeling provided by embodiments of the present invention can be integrated into an electronic device, which can be a terminal, server, or other device. The terminal can include a tablet computer, laptop computer, personal computer (PC), microprocessor box, or other devices.
[0035] Considering that lesion areas in medical images typically possess complex and variable structural features, global information is crucial for understanding their shape and overall contour, while local information helps capture edge details and texture variations, thereby achieving more accurate lesion boundary segmentation. Based on this, this invention provides a medical image segmentation method based on local-global feature modeling. Figure 1 is a flowchart of the medical image segmentation method based on local-global feature modeling provided in this embodiment of the invention. As shown in Figure 1, the method is applied in an electronic device and includes the following steps: S1, acquire medical images.
[0036] S2, input the medical image into the multi-scale feature extraction module to obtain local semantic features.
[0037] In one embodiment, the multi-scale feature extraction module includes multiple convolutional layers, specifically three sequentially connected 3×3 convolutional layers (Conv3×3).
[0038] The processing steps of the multi-scale feature extraction module include: The medical image is input into the multi-scale feature extraction module, which outputs the first local semantic feature, the second local semantic feature, and the third local semantic feature in sequence through three convolutional layers.
[0039] S3, input the local semantic features into the bilateral feature enhancement module to obtain the local enhancement features.
[0040] In one embodiment, the bilateral feature enhancement module includes three sequentially connected BFE-Mamba (bilateral feature enhancement) processing blocks. Each BFE-Mamba processing block includes a local enhancement branch, a global modeling branch, and a dynamic weight fusion module. The local enhancement branch includes a left branch and a right branch; the global modeling branch includes multiple Mamba modules. The processing procedure of the bilateral feature enhancement module includes: the third local semantic feature is input, and the first, second, and third local enhancement features are output by the three BFE-Mamba processing blocks respectively. Figure 2 is a structural diagram of the BFE-Mamba processing block provided in an embodiment of the present invention. As shown in Figure 2, the processing procedure of the BFE-Mamba processing block includes: S31, the third local semantic feature is input into the left branch of the local enhancement branch, where a channel attention mechanism and two asymmetric convolutions are used to obtain directional features.
[0041] This can be expressed by the following formula:
[0042]
[0043] where the terms denote, in order: the input to the left branch of the local enhancement branch, which is the third local semantic feature; the channel attention mechanism; element-wise multiplication; the asymmetric convolution operations in the horizontal and vertical directions, used to enhance orientation awareness; the intermediate feature; and the directional feature.
[0044] S32, the third local semantic feature is input into the right branch of the local enhancement branch, where a spatial attention mechanism and two asymmetric convolutions are used to obtain structural features.
[0045] The right branch employs a spatial attention mechanism to capture structural and contextual information at different locations. Its calculation formula is as follows:
[0046]
[0047] where the terms denote, in order: the input to the right branch of the local enhancement branch, which is the third local semantic feature; the spatial attention mechanism; the intermediate feature; and the structural feature.
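The paired horizontal and vertical convolutions used in both branches can be illustrated as a 1×k convolution followed by a k×1 convolution. The sketch below is only an illustration under assumptions: the kernel length, the weights, the zero padding, and the function names are all hypothetical, and the attention mechanisms of the two branches are omitted.

```python
def conv_horizontal(img, kernel):
    """Apply a 1 x k kernel along each row, zero-padded to keep the width.

    Assumes an odd kernel length so the output has the same size as the input.
    """
    pad = len(kernel) // 2
    out = []
    for row in img:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([
            sum(kernel[t] * padded[j + t] for t in range(len(kernel)))
            for j in range(len(row))
        ])
    return out


def transpose(img):
    """Swap rows and columns of a 2-D list."""
    return [list(col) for col in zip(*img)]


def conv_vertical(img, kernel):
    """Apply a k x 1 kernel along each column by reusing the row-wise routine."""
    return transpose(conv_horizontal(transpose(img), kernel))


def asymmetric_conv(img, kernel):
    """Horizontal pass followed by vertical pass (the order is an assumption)."""
    return conv_vertical(conv_horizontal(img, kernel), kernel)


# A single active pixel spreads over a full 3 x 3 neighborhood.
impulse = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
response = asymmetric_conv(impulse, kernel=[1, 1, 1])
print(response)
```

With a kernel of length 3, the two passes together cover a 3×3 receptive field using 6 weights instead of the 9 of a full 3×3 kernel, which is the usual motivation for this kind of factorization.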
[0048] S33, fuse the directional and structural features using residual summation and a channel shuffling mechanism to obtain directional-structural features.
[0049] Through the aforementioned dual-path design, both the channel features and spatial layout of the image can be considered simultaneously, resulting in a more comprehensive feature representation. Finally, to enhance information interaction between the two branches, channel information from the left branch is transmitted to the right branch, and spatial information from the right branch is transmitted to the left branch. Fusion is achieved through residual summation and a channel shuffling mechanism (which includes data reshaping: reshaping one-dimensional channel data into two-dimensional data; transpose operation: transposing the reshaped data; and reshaping again: reshaping the transposed data back into one-dimensional data, completing the channel shuffling operation). The fusion formula is as follows:
[0050]
[0051]
[0052] where the terms denote, in order, the results of residual summation for the directional and structural features, the concatenation operation, the channel shuffling mechanism, and the directional-structural feature.
[0053] The channel shuffling mechanism effectively breaks the redundancy between channels and improves the diversity of feature representation and discrimination ability.
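The three steps of the channel shuffling mechanism (reshape, transpose, reshape back) can be sketched on a flat list of channel labels. This is a minimal illustration: the function name, the group count of two, and the toy channel labels are assumptions, not values given in the text.

```python
def channel_shuffle(channels, groups):
    """Shuffle a flat channel list: reshape -> transpose -> flatten."""
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Reshape: 1-D channel list -> (groups, per_group) grid.
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Transpose: (groups, per_group) -> (per_group, groups).
    transposed = [list(col) for col in zip(*grid)]
    # Reshape again: flatten back to a 1-D channel list.
    return [c for row in transposed for c in row]


# Concatenated directional (d*) and structural (s*) channels, two groups.
directional = ["d0", "d1", "d2"]
structural = ["s0", "s1", "s2"]
fused = channel_shuffle(directional + structural, groups=2)
print(fused)  # ['d0', 's0', 'd1', 's1', 'd2', 's2']
```

After the shuffle, channels from the two branches are interleaved, so any later grouped operation sees both directional and structural information in every group.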
[0054] S34, in the global modeling branch, the third local semantic feature is layer-normalized and divided into four sub-features. Each sub-feature is input into the corresponding Mamba module, and the outputs of all Mamba modules are concatenated to obtain the Mamba output feature.
[0055] The global modeling branch is based on the Mamba State Space Model (SSM) to capture long-range dependencies and global context information in the input features.
[0056] First, the input feature (i.e., the third local semantic feature) is layer-normalized and then divided into four equal parts along the channel dimension:
[0057] where the terms denote, in order, the split operation, which divides a feature into equal parts along the channel dimension; layer normalization; and the four resulting sub-features.
[0058] Each sub-feature is input into a separate Mamba module for global modeling. For each branch, the output feature is defined as:
[0059] where the term is the residual adjustment factor.
[0060] Finally, the outputs of the four sub-branches are concatenated and projected to obtain the final features:
[0061]
[0062] where the terms denote the feature obtained by concatenating the outputs of the four branches and the Mamba output feature.
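The flow of the global modeling branch (layer normalization, four-way channel split, per-group Mamba modeling with a residual term, concatenation) can be sketched as follows. Several things here are assumptions made for illustration: each channel is reduced to a single scalar, `mamba_fn` is a hypothetical stand-in for a Mamba module, the residual is assumed to take the form Mamba(x) + θ·x with `theta` as the residual adjustment factor, and the final projection is left out.

```python
import math


def layer_norm(values, eps=1e-5):
    """Normalize a list of channel values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def split_channels(channels, parts=4):
    """Divide a channel list into `parts` equal, contiguous groups."""
    if len(channels) % parts != 0:
        raise ValueError("channel count must be divisible by parts")
    size = len(channels) // parts
    return [channels[i * size:(i + 1) * size] for i in range(parts)]


def global_branch(channels, mamba_fn, theta=1.0):
    """LayerNorm -> split into 4 -> per-group model + residual -> concatenate."""
    outputs = []
    for sub in split_channels(layer_norm(channels), parts=4):
        modeled = mamba_fn(sub)  # hypothetical stand-in for a Mamba module
        # Assumed residual form: Mamba(x_i) + theta * x_i.
        outputs.extend(m + theta * x for m, x in zip(modeled, sub))
    return outputs


# Toy usage: eight scalar "channels" and a stand-in that returns zeros.
features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
out = global_branch(features, mamba_fn=lambda sub: [0.0] * len(sub), theta=2.0)
print(len(out))  # 8
```

Splitting the channels before the Mamba modules means each module works on a quarter of the channels, which keeps the per-module cost low while the concatenation restores the full channel count.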
[0063] Figure 3 is a schematic diagram of the structure of the Mamba module provided in an embodiment of the present invention. As shown in Figure 3, the Mamba module (a deep modeling structure based on a state-space model, with efficient sequence modeling and long-range dependency capture capabilities, used here to enhance global feature modeling) adopts a left-right dual-branch structure: the left branch enhances feature expression through linear transformation and activation, while the right branch combines convolution and the state-space model to achieve sequence modeling.
[0064] The processing of the right branch of a Mamba module is represented by the following formula:
[0065] where the terms denote, in order: the output of the right branch of the Mamba module; the selective state space operation, which captures global features of the sequence across time steps and thereby extracts long-range dependency information efficiently; a nonlinear activation function; a linear transformation; and a convolution operation. The processing of the left branch of a Mamba module is represented by the following formula:
[0066] where the term denotes the output of the left branch of the Mamba module. The output feature of the Mamba module is obtained by combining the outputs of the right and left branches, and can be expressed by the following formula:
[0067] where the operator is the tensor product.
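Based on the term definitions above, one plausible form of the three Mamba module formulas is sketched here. The symbols x, y_r, y_l, and y are introduced only for this illustration, and the order of operations in the right branch (linear transformation, convolution, activation, state space model) is an assumption.

```latex
\begin{aligned}
y_{\mathrm{r}} &= \mathrm{SSM}\bigl(\sigma\bigl(\mathrm{Conv}(\mathrm{Linear}(x))\bigr)\bigr) \\
y_{\mathrm{l}} &= \sigma\bigl(\mathrm{Linear}(x)\bigr) \\
y &= y_{\mathrm{l}} \otimes y_{\mathrm{r}}
\end{aligned}
```

Here x is the input sub-feature, σ the nonlinear activation function, and ⊗ the product named in the text. Under this reading the left branch acts as a gate on the sequence-modeled right branch.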
[0068] S35, in the dynamic weight fusion module, the directional-structural features and the Mamba output feature are fused to obtain the local enhancement features.
[0069] The outputs of the local enhancement branch and the global modeling branch are fused through dynamic parameter adjustment and attention mechanism to achieve mutual learning and enhancement of information, improve the overall feature extraction capability of the module, and take into account both local edge details and global context information.
[0070] S4, input the local enhancement features into the multi-scale parallel fusion module to obtain deep enhancement features.
[0071] To address the issue of insufficient utilization of deep semantic features in the bottleneck stage of traditional encoder-decoder architectures, this invention proposes a multi-scale parallel fusion module (MSR-Mamba). This module enhances the ability to express deep semantics and lesion edge features while maintaining lightweight design, providing more accurate feature support for the decoder.
[0072] Figure 4 is a schematic diagram of the structure of the multi-scale parallel fusion module provided in an embodiment of the present invention. As shown in Figure 4, the multi-scale parallel fusion module includes multiple depthwise separable convolutional branches. The processing steps of the multi-scale parallel fusion module include: the third local enhancement feature is processed by convolution and normalization and then input into the multiple depthwise separable convolutional branches to obtain their outputs; the outputs of the branches are concatenated and then residually connected with the third local enhancement feature to obtain the deep enhancement feature.
[0073] Specifically, the input feature (that is, the third local enhancement feature) first undergoes channel expansion through a 1×1 convolution and is then normalized using LayerNorm to obtain a standardized feature representation Y. The normalized feature Y is then input into multiple depthwise separable convolutional branches with different receptive fields, each of which extracts semantic information and edge details at a different scale. Each branch first performs local feature extraction through depthwise separable convolution (DWConv), then adjusts the number of channels through a 1×1 convolution, and finally passes the result to a Mamba module for global long-range dependency modeling.
[0074] In the branch formula, the terms denote, in order, the depthwise separable convolution operation, the kernel size, the 1×1 convolution operation, and the branches corresponding to different scales.
[0075] Finally, the outputs of each depthwise separable convolutional branch are fused proportionally and residually connected with the input feature Y to achieve a comprehensive representation of multi-scale information, resulting in the final output feature (i.e., deep enhancement feature):
[0076] where the terms are learnable scale weight parameters used to dynamically adjust the contribution of each branch.
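The processing of the multi-scale parallel fusion module described above can be summarized in the following sketch. The symbols X, B_k, α_k, and the kernel-size set K are introduced only for this illustration, and the fusion is assumed to be a weighted sum with a residual connection to Y; the kernel sizes themselves are not stated in the text.

```latex
\begin{aligned}
Y &= \mathrm{LN}\bigl(\mathrm{Conv}_{1\times 1}(X)\bigr) \\
B_k &= \mathrm{Mamba}\bigl(\mathrm{Conv}_{1\times 1}(\mathrm{DWConv}_{k}(Y))\bigr), \qquad k \in K \\
F_{\mathrm{deep}} &= Y + \sum_{k \in K} \alpha_k \, B_k
\end{aligned}
```

Here X is the third local enhancement feature, B_k the output of the branch with kernel size k, α_k the learnable scale weight of that branch, and F_deep the deep enhancement feature.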
[0077] Through the above design, the MSR-Mamba module of this invention can simultaneously capture deep semantic information and multi-scale edge features, achieving efficient enhancement of features at the bottleneck stage. This module has a lightweight structure, high computational efficiency, and effectively improves the network's ability to reconstruct lesion regions during the decoding stage, thereby enhancing the accuracy of medical image segmentation and boundary recognition capabilities.
[0078] S5, input the local semantic features, local enhancement features, and deep enhancement features into the decoding module to obtain the medical image segmentation result.
[0079] In one embodiment, the decoding module includes multiple VM submodules (Vision Mamba, a decoder based on the Mamba architecture) and multiple DEA submodules (Differential Edge Attention submodules). Referring again to Figure 1, the processing procedure of the decoding module includes: S51, input the deep enhancement features into the first VM submodule to obtain the first decoding feature; S52, concatenate the first decoding feature and the second local enhancement feature and input them into the first DEA submodule to obtain the first attention feature; S53, input the first attention feature into the second VM submodule to obtain the second decoding feature; S54, concatenate the second decoding feature and the first local enhancement feature and input them into the second DEA submodule to obtain the second attention feature; S55, input the second attention feature into the third VM submodule to obtain the third decoding feature; S56, concatenate the third decoding feature and the third local semantic feature and input them into the third DEA submodule to obtain the third attention feature; S57, input the third attention feature into a convolutional layer to obtain the fourth decoding feature; S58, concatenate the fourth decoding feature and the second local semantic feature and input them into the fourth DEA submodule to obtain the fourth attention feature; S59, input the fourth attention feature into a convolutional layer to obtain the fifth decoding feature; S5A, concatenate the fifth decoding feature and the first local semantic feature and input them into the fifth DEA submodule to obtain the fifth attention feature; S5B, perform a 1×1 convolution operation on the fifth attention feature to obtain the medical image segmentation result.
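The order in which the decoder consumes the skip features in steps S51 to S5B can be traced with a small wiring sketch. Everything callable here is a hypothetical stand-in that only records the call order: in a real network each VM, DEA, and convolution call would be a separate module with its own weights, and the concatenation would act on tensors.

```python
def decode(deep, enh, sem, vm, dea, conv, head):
    """Wire the decoder steps S51-S5B.

    enh = (E1, E2, E3): first, second, third local enhancement features.
    sem = (S1, S2, S3): first, second, third local semantic features.
    """
    e1, e2, _e3 = enh  # E3 is not a skip input; it feeds the fusion module
    s1, s2, s3 = sem
    d1 = vm(deep)       # S51
    a1 = dea(d1, e2)    # S52
    d2 = vm(a1)         # S53
    a2 = dea(d2, e1)    # S54
    d3 = vm(a2)         # S55
    a3 = dea(d3, s3)    # S56
    d4 = conv(a3)       # S57
    a4 = dea(d4, s2)    # S58
    d5 = conv(a4)       # S59
    a5 = dea(d5, s1)    # S5A
    return head(a5)     # S5B


trace = []


def vm(x):
    trace.append("VM")
    return x


def dea(x, skip):
    trace.append("DEA+" + skip)
    return x


def conv(x):
    trace.append("Conv")
    return x


def head(x):
    trace.append("Conv1x1")
    return x


decode("deep", ("E1", "E2", "E3"), ("S1", "S2", "S3"), vm, dea, conv, head)
print(trace)
```

The trace shows that skip features are consumed from deepest to shallowest: the enhancement features E2 and E1 first, then the semantic features S3, S2, and S1.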
[0080] In medical images, the boundaries of lesion areas are often blurred and smoothly transition to the background. Traditional segmentation networks are prone to blurring or discontinuity during edge reconstruction, resulting in unclear lesion contours. To address this, this invention proposes a DEA module to enhance edge feature modeling capabilities during the decoding stage, thereby improving the boundary accuracy of the segmentation results.
[0081] Figure 5 is a schematic diagram of the structure of the DEA submodule provided in an embodiment of the present invention. The processing procedure of the DEA submodule includes:
S521, multi-scale feature extraction is performed on the input features through downsampling operations at different scales, generating low-resolution feature maps at two scales;
S522, the low-resolution feature maps of the two scales are upsampled back to the original size, and a difference operation is performed on them to obtain the edge response features;
S523, the edge response features and the input features are weighted and fused to obtain the fused features;
S524, the fused features are processed through a channel attention mechanism and a spatial attention mechanism to obtain the attention features.
[0082] Specifically, two downsampling ratios $r_1$ and $r_2$ are used to generate low-resolution feature maps, which are then upsampled back to the original size and subjected to a difference operation to obtain the edge response features $E$:
$$E = \mathrm{Up}\big(\mathrm{Down}_{r_1}(X)\big) - \mathrm{Up}\big(\mathrm{Down}_{r_2}(X)\big)$$
[0083] where $\mathrm{Down}_{r}(\cdot)$ indicates that the input features are downsampled by a ratio $r$, $\mathrm{Up}(\cdot)$ indicates an upsampling operation back to the original size, and $X$ represents the input feature of the DEA submodule.
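A minimal NumPy sketch of the difference operation described above follows. The text does not specify the downsampling or upsampling operators or the two ratios, so average pooling, nearest-neighbour upsampling, and the ratios 2 and 4 are assumptions made for illustration; the function names are hypothetical, and the height and width are assumed to be divisible by both ratios.

```python
import numpy as np

def avg_pool(x, r):
    # Downsample a (C, H, W) map by ratio r using non-overlapping r x r means
    # (assumed operator; H and W must be divisible by r).
    c, h, w = x.shape
    return x.reshape(c, h // r, r, w // r, r).mean(axis=(2, 4))

def upsample_nearest(x, r):
    # Upsample back by repeating each value r times along height and width.
    return x.repeat(r, axis=1).repeat(r, axis=2)

def edge_response(x, r1=2, r2=4):
    # Difference between two smoothed copies of x taken at different scales.
    low1 = upsample_nearest(avg_pool(x, r1), r1)
    low2 = upsample_nearest(avg_pool(x, r2), r2)
    return low1 - low2

# A vertical step edge between columns 2 and 3 of an 8 x 8 map.
x = np.zeros((1, 8, 8))
x[:, :, 3:] = 1.0
e = edge_response(x)
print(e[0, 0])  # non-zero only in the columns around the step
```

In flat regions both smoothed copies agree and the response vanishes; near an intensity step they disagree, which is why the difference acts as an edge indicator.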
[0084] The obtained edge response features effectively characterize the differences in feature responses at different scales, thereby highlighting abrupt edge changes in the image. To strengthen the representation of edge features within the overall features, this module uses learnable fusion coefficients to weight the differential features and fuse them with the original feature map, obtaining the fused features. This operation achieves a dynamic balance between edge features and global context features, helping the model focus on boundary-sensitive regions in subsequent stages. To further enhance the model's attention to edge regions and key structures, this invention applies channel attention and spatial attention mechanisms to the fused edge features, constructing a dual-attention guided process. Through the above design, the DEA submodule of this invention can extract edge mutation information from multi-scale differences and achieve adaptive enhancement of edge features through the attention mechanism. Compared with traditional decoder structures, this module improves the network's ability to perceive blurred boundaries and fine-grained structures while keeping the model lightweight, thereby improving the reconstruction quality and segmentation accuracy of lesion boundaries in medical images.
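The weighted fusion and dual-attention steps might look roughly like the sketch below. The exact fusion formula and attention designs are not given in the text, so everything here is an assumption: the additive form `x + alpha * edge`, the fixed scalar `alpha` (learnable in a real network), and the parameter-free attention gates built from global averages. The function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_and_attend(x, edge, alpha=0.5):
    """x, edge: (C, H, W) arrays. alpha stands in for a learnable fusion coefficient."""
    # Weighted fusion of edge response and input features (assumed additive form).
    fused = x + alpha * edge
    # Channel attention: one gate per channel from its global average (simplified).
    channel_gate = sigmoid(fused.mean(axis=(1, 2), keepdims=True))   # (C, 1, 1)
    fused = fused * channel_gate
    # Spatial attention: one gate per pixel from the channel-wise mean (simplified).
    spatial_gate = sigmoid(fused.mean(axis=0, keepdims=True))        # (1, H, W)
    return fused * spatial_gate

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
edge = rng.standard_normal((4, 8, 8))
out = fuse_and_attend(x, edge)
print(out.shape)  # (4, 8, 8)
```

Practical channel and spatial attention modules usually contain learned layers; the gates above only show where each mechanism acts on the fused tensor.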
[0085] In one embodiment, the VM submodule is a state-space modeling unit used to perform global context modeling of features at different scales, thereby enhancing semantic consistency and long-range dependency representation capabilities during the decoding stage.
[0086] In one embodiment of the present invention, the internal processing flow of the VM submodule is consistent with the global modeling branch in the bilateral feature enhancement module (BFE-Mamba), both of which perform global feature modeling based on the Mamba state space model.
[0087] Specifically, the VM submodule first performs layer normalization on the input features and divides them into sub-features along the channel dimension. Each sub-feature is then input into the corresponding Mamba state space modeling unit to capture long-range dependencies in the features. The output of each branch is fused through a residual connection, the branch outputs are concatenated, and the result is linearly projected to obtain the output features after global modeling.
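The normalize, split, per-branch modeling, residual, concatenate, and project sequence can be sketched as below. The Mamba state space unit is replaced by a toy linear recurrence (`toy_scan`), which is only a stand-in and not a selective state space model. The number of groups (4, chosen to match the four sub-features described for the global modeling branch), the token layout, and all function names are assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over its channel dimension (no learned scale or shift).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def toy_scan(x, decay=0.9):
    # Stand-in for a Mamba unit: h_t = decay * h_{t-1} + (1 - decay) * x_t.
    # It only illustrates sequence-wise state propagation.
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[t]
        out[t] = h
    return out

def vm_block(tokens, w_out, groups=4):
    """tokens: (L, C) float array of flattened spatial positions; w_out: (C, C)."""
    normed = layer_norm(tokens)
    parts = np.split(normed, groups, axis=-1)     # split along channels
    mixed = [toy_scan(p) + p for p in parts]      # per-branch modeling + residual
    return np.concatenate(mixed, axis=-1) @ w_out  # concatenate, then project

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # 16 positions, 8 channels
y = vm_block(tokens, np.eye(8))
print(y.shape)  # (16, 8)
```

Splitting the channels before sequence modeling keeps each unit small, which is one plausible reason for the lightweight behaviour described above.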
[0088] The above method can be applied to the medical image segmentation network (Local-Global Mamba-enhanced ModuleNet, LGMM-Net), which is trained based on a loss function.
[0089] Regarding the setting of the loss function: the boundaries of the lesion area are usually blurred and transition smoothly into the background. If only the traditional binary cross-entropy (BCE) loss and Dice loss are used for training, the model mainly focuses on the accuracy of region overlap and tends to ignore the detailed information of the boundary structure, resulting in blurry or discontinuous segmentation results at the edges.
[0090] To address this, this invention introduces an edge constraint mechanism based on the Laplacian operator in the loss function design. The Laplacian operator is a second-order derivative filter that responds strongly at locations where pixel intensity changes sharply, i.e., edge regions, so it can capture the edge information of an image and reflect the consistency between the predicted boundary and the ground truth boundary. If the edges of the predicted image and the ground truth label are consistent, the difference between their Laplacian responses is small; otherwise, it is large. By constraining this difference, the model is guided to learn clearer boundary representations, thereby improving the overall segmentation accuracy. Specifically, this invention uses a standard 3×3 Laplacian kernel. Let the predicted medical segmentation result image be $P$ and the ground truth label be $G$. First, a Laplacian convolution operation is applied to both to obtain the edge response images $E_P$ and $E_G$:
[0091] $$E_P = K_{lap} * P, \qquad E_G = K_{lap} * G$$
where $K_{lap}$ is the 3×3 Laplacian kernel and $*$ denotes convolution.
[0092] The mean squared error (MSE) between the two is then calculated as the edge regularization loss term:
$$L_{edge} = \mathrm{MSE}(E_P, E_G)$$
Ultimately, the overall loss function of this invention consists of three parts:
$$L_{total} = \lambda_1 L_{BCE} + \lambda_2 L_{Dice} + \lambda_3 L_{edge}$$
[0093] where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weighting coefficients for the three loss terms, used to balance classification accuracy against the edge constraint; $L_{BCE}$ is the binary cross-entropy loss; $L_{Dice}$ is the Dice loss, which measures the similarity between the predicted medical segmentation map and the corresponding ground truth label; and $L_{edge}$ is the edge regularization term.
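The three-part loss described above can be illustrated with the NumPy sketch below. Several details are assumptions, since the text does not fix them: the 4-neighbour form of the 3×3 Laplacian kernel, edge-replicating padding at the image border, unit default weights, and the small smoothing constant `eps`. The function names are hypothetical.

```python
import numpy as np

# 4-neighbour 3x3 Laplacian kernel (one common "standard" choice; an assumption).
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

def laplacian(img):
    # Filter a 2-D map with the kernel; the kernel is symmetric, so
    # correlation and convolution coincide. Borders are edge-padded.
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def bce_loss(p, g, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return float(-(g * np.log(p) + (1.0 - g) * np.log(1.0 - p)).mean())

def dice_loss(p, g, eps=1e-7):
    inter = (p * g).sum()
    return float(1.0 - (2.0 * inter + eps) / (p.sum() + g.sum() + eps))

def edge_loss(p, g):
    # Mean squared error between the Laplacian responses of prediction and label.
    return float(((laplacian(p) - laplacian(g)) ** 2).mean())

def total_loss(p, g, w_bce=1.0, w_dice=1.0, w_edge=1.0):
    return w_bce * bce_loss(p, g) + w_dice * dice_loss(p, g) + w_edge * edge_loss(p, g)

g = np.zeros((8, 8))
g[2:6, 2:6] = 1.0                              # square "lesion" label
blurry = np.clip(g * 0.6 + 0.2, 0.0, 1.0)      # low-contrast prediction
print(total_loss(g, g) < total_loss(blurry, g))  # True
```

A low-contrast prediction keeps a reasonable region overlap but has much weaker Laplacian responses than the label, so the edge term penalizes it in addition to the BCE and Dice terms.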
[0094] By introducing this edge regularization term, the loss function of this invention can significantly enhance the model's ability to perceive lesion boundaries while maintaining overall segmentation performance, effectively reducing edge blurring and thus achieving more accurate lesion region segmentation.
[0095] The method provided in this invention achieves performance improvements on multiple publicly available medical image segmentation datasets. As shown in Tables 1-3, on the ISIC 2017 dataset the mIoU reaches 79.49% and the Dice reaches 88.57%, which is 2% to 6% higher than mainstream methods such as U-Net, Attention U-Net, and META-UNet. On the ISIC 2018 dataset, the mIoU and Dice of this invention reach 80.27% and 89.05% respectively, maintaining a leading position in complex boundary scenarios. On the PH2 dataset, which has a limited number of samples, this invention still achieves the highest mIoU of 82.50% and Dice of 90.41%, demonstrating good generalization ability. The above experimental verification shows that the technical solution of this invention achieves improvements in global modeling, local detail capture, boundary restoration accuracy, and model efficiency, and therefore has technical advancement and application value.
[0096] Table 1. Experimental results of LGMM-Net on the ISIC 2017 dataset.
[0097] Table 2. Experimental results of LGMM-Net on the ISIC 2018 dataset.
[0098] Table 3. Experimental results of LGMM-Net on the PH2 dataset.
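For reference, the two reported metrics can be computed from binary masks as sketched below. How the reported mIoU is averaged (over classes or over images) is not stated in the text, so this sketch only computes the foreground IoU and Dice of a single mask pair; the function name is hypothetical. For a single pair the two metrics are linked by Dice = 2·IoU / (1 + IoU), and the reported figures are consistent with that relation (for example, an IoU of 0.7949 corresponds to a Dice of about 0.8857).

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Foreground Dice and IoU of two binary masks of equal shape."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Two empty masks are treated as a perfect match.
    iou = inter / union if union else 1.0
    dice = 2.0 * inter / total if total else 1.0
    return float(dice), float(iou)

pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0]])
truth = np.array([[0, 1, 1, 0],
                  [0, 1, 1, 0]])
dice, iou = dice_and_iou(pred, truth)
print(dice, iou)  # 0.5 0.3333333333333333
```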
[0099] Based on the method described in the above embodiments, this embodiment will further describe it from the perspective of a medical image segmentation system based on local-global feature modeling. This medical image segmentation system based on local-global feature modeling can be implemented as an independent entity or integrated into an electronic device. The electronic device can be a terminal, server, or other device. The terminal can include a tablet computer, a laptop computer, a personal computer (PC), a microprocessor box, or other devices.
[0100] Referring to Figure 6, which shows a medical image segmentation system based on local-global feature modeling provided in an embodiment of the present invention and applied in electronic devices, the system may include:
an acquisition module, used to acquire medical images;
a multi-scale feature extraction module, used to process the medical image to obtain local semantic features;
a bilateral feature enhancement module, used to process the local semantic features to obtain local enhancement features;
a multi-scale parallel fusion module, used to process the local enhancement features to obtain deep enhancement features;
a decoding module, used to process the local semantic features, the local enhancement features, and the deep enhancement features to obtain the medical image segmentation result.
[0101] In specific implementation, the above modules and / or units can be implemented as independent entities, or they can be arbitrarily combined and implemented as the same or several entities. For the specific implementation of the above modules and / or units, please refer to the previous method embodiments. For the specific beneficial effects that can be achieved, please also refer to the beneficial effects in the previous method embodiments, which will not be repeated here.
[0102] In addition, this embodiment of the invention also provides an electronic device, which may be a computer, tablet computer, or other similar device. This electronic device can implement the steps of any embodiment of the medical image segmentation method based on local-global feature modeling provided in this embodiment of the invention. Therefore, it can achieve the beneficial effects achievable by any medical image segmentation method based on local-global feature modeling provided in this embodiment of the invention, as detailed in the preceding embodiments, and will not be repeated here.
[0103] Figure 7 shows a specific structural block diagram of an electronic device provided in an embodiment of the present invention. This electronic device can be used to implement the medical image segmentation method based on local-global feature modeling provided in the above embodiments. The electronic device 500 can be a terminal, server, or other device. The terminal can include a tablet computer, laptop computer, personal computer (PC), microprocessor box, or other devices.
[0104] The memory 520 can be used to store software programs and modules, such as the program instructions / modules corresponding to those in the above embodiments. The processor 580 executes various functional applications and data processing by running the software programs and modules stored in the memory 520. The memory 520 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 520 may further include memory remotely located relative to the processor 580, and these remote memories can be connected to the electronic device 500 via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0105] The input unit 530 can be used to receive input numeric or character information, and to generate keyboard and mouse signal inputs related to user settings and function control. The display unit 540 can be used to display information input by the user or information provided to the user, as well as various graphical user interfaces, which can be composed of graphics, text, icons, video, and any combination thereof. The display unit 540 may include a display panel 541, which may optionally be configured in the form of an LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other similar forms.
[0106] Electronic device 500, through transmission module 570 (e.g., Wi-Fi module), can help users receive requests, send information, etc., providing users with wireless broadband internet access. Although transmission module 570 is shown in the figure, it is understood that it is not an essential component of electronic device 500 and can be omitted as needed without changing the essence of the invention.
[0107] The processor 580 is the control center of the electronic device 500. It connects to the various parts of the electronic device via various interfaces and lines, and performs the various functions of the electronic device 500 and processes its data by running or executing software programs and / or modules stored in the memory 520 and by calling data stored in the memory 520, thereby providing overall monitoring of the electronic device. Optionally, the processor 580 may include one or more processing cores; in some embodiments, the processor 580 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 580.
[0108] Electronic device 500 also includes a power supply 590 (such as a battery) that supplies power to various components. In some embodiments, the power supply may be logically connected to processor 580 through a power management system, thereby enabling functions such as charging, discharging, and power consumption management through the power management system. The power supply 590 may also include one or more DC or AC power supplies, recharging systems, power fault detection circuits, power converters or inverters, power status indicators, and other arbitrary components.
[0109] Although not shown, the electronic device 500 may also include cameras (such as front-facing cameras and rear-facing cameras), Bluetooth modules, etc., which will not be described in detail here. Specifically, in this embodiment, the display unit of the electronic device is a touch screen display, and the electronic device also includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the following operations: acquiring a medical image; inputting the medical image into a multi-scale feature extraction module to obtain local semantic features; inputting the local semantic features into the bilateral feature enhancement module to obtain the local enhancement features; inputting the local enhancement features into a multi-scale parallel fusion module to obtain deep enhancement features; and inputting the local semantic features, the local enhancement features, and the deep enhancement features into the decoding module to obtain the medical image segmentation result.
[0110] In practice, the above modules can be implemented as independent entities or combined in any way to be implemented as the same or several entities. For the specific implementation of the above modules, please refer to the previous method implementation examples, which will not be repeated here.
[0111] Those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by instructions, or by instructions controlling related hardware. These instructions can be stored in a computer-readable storage medium and loaded and executed by a processor. Therefore, embodiments of the present invention provide a storage medium storing multiple instructions that can be loaded by a processor to execute the steps of any embodiment of the medical image segmentation method based on local-global feature modeling provided by the present invention.
[0112] The computer-readable storage medium may include: read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.
[0113] Since the instructions stored in the storage medium can execute the steps in any embodiment of the medical image segmentation method based on local-global feature modeling provided in the embodiments of the present invention, the beneficial effects that any medical image segmentation method based on local-global feature modeling provided in the embodiments of the present invention can achieve can be realized, as detailed in the preceding embodiments, and will not be repeated here.
[0114] The foregoing has provided a detailed description of a medical image segmentation method, system, storage medium, and electronic device based on local-global feature modeling provided by embodiments of the present invention. Specific examples have been used to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of the present invention. Therefore, the content of this specification should not be construed as a limitation of the present invention.
Claims
1. A medical image segmentation method based on local-global feature modeling, characterized in that the method includes: acquiring a medical image; inputting the medical image into a multi-scale feature extraction module to obtain local semantic features; inputting the local semantic features into a bilateral feature enhancement module to obtain local enhancement features; inputting the local enhancement features into a multi-scale parallel fusion module to obtain deep enhancement features; and inputting the local semantic features, the local enhancement features, and the deep enhancement features into a decoding module to obtain the medical image segmentation result.
2. The medical image segmentation method based on local-global feature modeling according to claim 1, characterized in that the multi-scale feature extraction module includes multiple convolutional layers; the processing procedure of the multi-scale feature extraction module includes: inputting the medical image into the multi-scale feature extraction module, which outputs the first local semantic feature, the second local semantic feature, and the third local semantic feature in sequence through three convolutional layers.
3. The medical image segmentation method based on local-global feature modeling according to claim 2, characterized in that the bilateral feature enhancement module includes three sequentially connected BFE-Mamba processing blocks, each of which includes a local enhancement branch, a global modeling branch, and a dynamic weight fusion module; the local enhancement branch includes a left branch and a right branch; the global modeling branch includes multiple Mamba modules; the processing procedure of the bilateral feature enhancement module includes: inputting the third local semantic feature, and outputting the first local enhancement feature, the second local enhancement feature, and the third local enhancement feature respectively through the three BFE-Mamba processing blocks; the processing procedure of the BFE-Mamba processing block includes: inputting the third local semantic feature into the left branch of the local enhancement branch, and obtaining directional features through two asynchronous convolutions using a channel attention mechanism; inputting the third local semantic feature into the right branch of the local enhancement branch, and obtaining structural features through two asynchronous convolutions using a spatial attention mechanism; fusing the directional features and the structural features using residual summation and a channel shuffling mechanism to obtain directional-structural features; in the global modeling branch, dividing the third local semantic feature into four sub-features after layer normalization, inputting each sub-feature into the corresponding Mamba module, and concatenating the outputs of all Mamba modules to obtain the Mamba output feature; and in the dynamic weight fusion module, fusing the directional-structural features and the Mamba output feature to obtain the local enhancement feature.
4. The medical image segmentation method based on local-global feature modeling according to claim 3, characterized in that the Mamba module includes a left branch and a right branch; the processing procedure of the right branch of the Mamba module is represented by a formula whose terms are the output of the right branch of the Mamba module, a selective state space operation, a non-linear activation function, a linear projection, and a convolution operation; the processing procedure of the left branch of the Mamba module is represented by a formula whose result is the output of the left branch of the Mamba module; and the output features of the Mamba module are obtained by combining the outputs of the right branch and the left branch through a formula that uses the tensor product.
5. The medical image segmentation method based on local-global feature modeling according to claim 3, characterized in that the multi-scale parallel fusion module includes multiple depthwise separable convolution branches; the processing procedure of the multi-scale parallel fusion module includes: processing the third local enhancement feature by convolution and normalization and then inputting it into the multiple depthwise separable convolution branches to obtain the outputs of the multiple depthwise separable convolution branches; and concatenating the outputs of the multiple depthwise separable convolution branches and then residually connecting the result with the third local enhancement feature to obtain the deep enhancement feature.
6. The medical image segmentation method based on local-global feature modeling according to claim 5, characterized in that the decoding module includes multiple VM submodules and multiple DEA submodules; the processing procedure of the decoding module includes: inputting the deep enhancement features into the first VM submodule to obtain the first decoding feature; concatenating the first decoding feature and the second local enhancement feature and inputting them into the first DEA submodule to obtain the first attention feature; inputting the first attention feature into the second VM submodule to obtain the second decoding feature; concatenating the second decoding feature and the first local enhancement feature and inputting them into the second DEA submodule to obtain the second attention feature; inputting the second attention feature into the third VM submodule to obtain the third decoding feature; concatenating the third decoding feature and the third local semantic feature and inputting them into the third DEA submodule to obtain the third attention feature; inputting the third attention feature into a convolutional layer to obtain the fourth decoding feature; concatenating the fourth decoding feature and the second local semantic feature and inputting them into the fourth DEA submodule to obtain the fourth attention feature; inputting the fourth attention feature into a convolutional layer to obtain the fifth decoding feature; concatenating the fifth decoding feature and the first local semantic feature and inputting them into the fifth DEA submodule to obtain the fifth attention feature; and performing a 1×1 convolution operation on the fifth attention feature to obtain the medical image segmentation result.
7. The medical image segmentation method based on local-global feature modeling according to claim 6, characterized in that the processing procedure of the DEA submodule includes: performing multi-scale feature extraction on the input features through downsampling operations at different scales, generating low-resolution feature maps at two scales; upsampling the low-resolution feature maps of the two scales back to the original size and then performing a difference operation to obtain the edge response features; weighting and fusing the edge response features and the input features to obtain the fused features; and processing the fused features using a channel attention mechanism and a spatial attention mechanism to obtain the attention features.
8. A medical image segmentation system based on local-global feature modeling, characterized in that the system includes: an acquisition module, used to acquire medical images; a multi-scale feature extraction module, used to process the medical image to obtain local semantic features; a bilateral feature enhancement module, used to process the local semantic features to obtain local enhancement features; a multi-scale parallel fusion module, used to process the local enhancement features to obtain deep enhancement features; and a decoding module, used to process the local semantic features, the local enhancement features, and the deep enhancement features to obtain the medical image segmentation result.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a plurality of instructions adapted to be loaded by a processor to execute the medical image segmentation method based on local-global feature modeling as described in any one of claims 1 to 7.
10. An electronic device, characterized in that the device includes a processor and a memory, the processor being electrically connected to the memory, the memory being used to store instructions and data, and the processor being used to execute the steps in the medical image segmentation method based on local-global feature modeling as described in any one of claims 1 to 7.