A medical image segmentation method based on information guidance and boundary perception
By employing information-guided and boundary-aware medical image segmentation methods, and utilizing dataset expansion and multi-module feature processing techniques, the problem of difficult lesion boundary localization is solved, achieving accurate lesion boundary segmentation and improving segmentation accuracy. This approach is applicable to the field of medical image segmentation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUILIN UNIVERSITY OF TECHNOLOGY
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-26
AI Technical Summary
Existing deep learning-based medical image segmentation methods struggle to accurately locate lesion boundaries, often resulting in missed or over-segmentation, especially when lesion boundaries are blurry or have complex contours, thus affecting segmentation accuracy.
A medical image segmentation method based on information guidance and boundary awareness is adopted. Through dataset expansion and enhancement, using a Transformer encoder, information fusion module, boundary awareness module, global/local feature extraction module and channel multi-scale module, combined with feature processing techniques of multiple modules, the accurate localization and segmentation of lesion boundaries can be achieved.
This method can effectively perceive lesion boundaries in complex environments, achieve accurate boundary segmentation, improve the segmentation accuracy and generalization performance of the model, and provide important clinical diagnostic references.
Figure CN122289682A_ABST
Description
Technical Field
[0001] This invention relates to the field of medical image segmentation technology, and specifically to a medical image segmentation method based on information guidance and boundary awareness.
Background Technology
[0002] In actual medical images, the areas where lesions are located within the human body are often quite complex: the boundary regions of lesions are very similar to normal tissue, with very blurred boundaries, making them difficult to distinguish effectively; the boundary contours of lesions are often complex and highly irregular, making it difficult to effectively perceive specific trends of change, which greatly affects doctors' accurate diagnosis of patients' conditions. Traditional segmentation methods, which use algorithms based on watersheds, gray-scale thresholds, and clustering, rely too heavily on relatively fixed features extracted manually, such as color, shape, and contour. Their modeling capabilities are limited, and they can only extract some typical regions. However, the morphology, texture, and contours of lesions themselves are complex and diverse, making traditional segmentation methods limited and unable to effectively cope with this diversity, affecting segmentation accuracy and generalization ability.
[0003] In recent years, deep learning technology has shown significant development potential in the field of medical image segmentation. This type of method eliminates the reliance on manually extracted fixed features. Through learning from and continuously optimizing training datasets, it can quickly and effectively perceive the complex features of lesions, achieving more accurate medical image segmentation performance and effectively assisting doctors in diagnosing conditions in complex clinical environments. Currently, deep learning-based medical image segmentation methods mainly employ two technical routes: ① convolutional neural network (CNN)-based methods; ② Transformer-based methods. CNN-based segmentation methods are currently the most widely used. They typically use CNNs as the backbone, employing encoder-decoder structures, such as fully convolutional neural networks, U-Net models, and their derivatives. These segmentation algorithms mainly utilize the advantages of convolutional operations in capturing local image details, constructing multi-scale feature extraction modules, attention mechanisms, residual modules, and reconstructing skip connection structures to fully extract local details from medical images. However, due to the relatively small receptive field of convolutional operations, CNN-based models struggle to effectively model images from a global perspective and extract global semantic features, limiting further improvements in model segmentation performance. Recently, many innovative Transformer-based methods have been applied to medical imaging tasks with good results. Transformers possess a receptive field that covers the entire image, exhibiting excellent long-range modeling capabilities and effectively uncovering long-range dependencies between elements to extract global features. These network models typically incorporate self-attention mechanisms to model the global image. 
Compared to convolutional neural networks, Transformers can model the global image more efficiently and capture global dependencies. However, they also have limitations: Transformers are insufficient in modeling local image details, making it difficult to effectively extract key local features and resulting in predictions lacking detailed information.
[0004] Furthermore, the lesions themselves present several challenges for accurate image segmentation: Complete lesion segmentation is crucial in medical image segmentation to effectively prevent disease recurrence. However, in practice, the boundary regions of lesions are often complex, specifically: ① Lesions are very similar to normal tissue in texture and color, leading to blurred boundaries between them and surrounding normal tissue; ② The boundary contours of lesions are complex and variable, making it difficult for most models to effectively perceive their changing trends. These issues make it difficult for existing deep learning-based models to accurately locate lesion boundaries, easily resulting in missed or over-segmentation, thus affecting segmentation accuracy. Therefore, developing a method capable of accurately perceiving the location of lesion boundaries and precisely segmenting them in complex environments is crucial for advancing this field.
Summary of the Invention
[0005] To overcome the problems existing in the above-mentioned medical image segmentation, to perform complete medical image segmentation, to accurately locate lesion boundaries, to improve the segmentation accuracy of the model, and to assist doctors in diagnosing patients' conditions in clinical settings, this invention provides a medical image segmentation method based on information guidance and boundary awareness.
[0006] In embodiments of the present invention, the medical image segmentation method based on information guidance and boundary awareness includes the following technical solutions, with specific steps as follows:
[0007] 1. A medical image segmentation method based on information guidance and boundary awareness, characterized by comprising the following steps:
[0008] Step 1: Dataset expansion and enhancement. This operation only applies to the training dataset and is used to increase the size of the training dataset.
[0009] Step 2: Use the Transformer encoder to extract features from the image;
[0010] Step 3: Use the Information Fusion Module (AGG) to process the features output from deeper stages of the encoder. This module runs in parallel with the decoder and acts as a target position guide.
[0011] Step 4: Use the boundary perception module (BP) to accurately locate the target boundary. This module uses the location information output by the information fusion module (AGG) as guidance and shallow stage features as details to perceive the boundary of the target lesion.
[0012] Step 5: Use the Global / Local Feature Extraction (GL) module to enhance and expand the feature information extracted during the encoding stage. This module is responsible for processing the two intermediate layers of features and mining the local and global features.
[0013] Step 6: Use the channel multi-scale module (MS) to process the deepest features and fully explore the global semantic features at multiple scales;
[0014] Step 7: Use the Region Fusion Module (RFM) to fuse the features output by each module, enhance uncertain regions, and continuously improve the segmentation results of the model.
[0015] 2. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the specific methods for data augmentation and enhancement of the training dataset include operations such as Gaussian blur, color transformation, and flipping. Specifically, the Gaussian blur and color transformation operations are applied only to the image itself, while the remaining enhancement operations are consistently applied to the image and its corresponding labels.
[0016] 3. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the encoder extracts features from the image in the following specific manner: a Transformer network encoder is used to simultaneously extract features from the target region of the image. Specifically, Pyramid Vision Transformer v2 (PVTv2) is used as the encoder to progressively extract multi-scale features of the image.
[0017] 4. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the information fusion module (AGG) processes the features output from deeper stages in the following specific way: the module continuously fuses features from adjacent stages from deep to shallow, uses multiplication operations to amplify key features, and uses addition operations to retain and supplement the original features, progressively enhancing the perception ability of the target location.
[0018] 5. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the boundary awareness module (BP) uses feature maps to locate the target boundary in the following specific ways: ① Information guidance: using the position information output by the information fusion module (AGG), the main position of the target is quickly found, and combined with shallow detail features, it helps to mine boundary information around the target; ② Boundary mining: in medical images, the boundary area of lesions is relatively blurry and the outline is complex. After refinement, the boundary presents a complex strip shape. In this stage, multi-scale depthwise separable strip convolution, whose shape resembles the strip-like boundary, is used to efficiently model and mine the boundary, capture the differences between regions, and quickly determine the boundary position of the target; ③ Boundary refinement: in this stage, the idea of reverse attention is used to eliminate the internal information of the target and leave only the boundary information, which is used to further refine the boundary outline and output the boundary prediction result.
[0019] 6. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the global / local feature extraction module (GL) strengthens and expands the features of the intermediate two layers in the following specific ways: First, a parallel axial attention mechanism and a multi-scale feature extraction module are used to extract the global semantic features and local detail features of the image, respectively. The former uses an axial self-attention mechanism to mine the long-range dependencies between regions while reducing computational complexity; the latter uses multiple parallel branches superimposed with dilated convolutions to give it diverse receptive fields and obtain multi-scale local information. Second, a residual-based dual attention module is used to filter irrelevant features, strengthen key useful features, and focus the model's attention on the lesion region.
[0020] 7. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the channel multi-scale module (MS) processes the deepest features, and the specific way to mine multi-scale global semantic features at the channel level is as follows: First, three parallel convolutional branches are used to mine global features of the feature map with different channel change rates. The channel change rates of each branch are: 512-32, 512-320-32, and 512-320-128-32. Each change can extract different feature information. The diversity of changes can fully extract semantic information and explore the relationship between channels. Second, two parallel operations are used to process the output of each branch: ① Convolution compresses the channel dimension to promote the fusion of features of each branch; ② Channel shuffling is first used to make the branch features cross-arranged at the channel level, and then convolution operation is used to compress the channel dimension to promote the flow and interaction of information and enhance the integrability between channels. Finally, an efficient channel attention mechanism is used to strengthen and filter features.
[0021] 8. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the specific method by which the Region Fusion Module (RFM) fuses the outputs of each module, strengthens uncertain regions, and continuously improves the segmentation results is as follows: The input information includes boundary information, current stage features, and the output of the previous Region Fusion Module (RFM); ① First, the current stage features and the previous stage results are strengthened and key features are retained through tensor multiplication and addition operations; ② Second, the output results interact with the boundary information to increase the importance of boundary features; ③ Then, the model's attention to difficult-to-identify regions is increased through uncertain region strengthening operations; ④ Finally, the outputs in ② and ③ interact to retain identified features, strengthen uncertain regions, and output prediction results.
[0022] Compared with existing technologies, the advantages of this invention are: regardless of whether the lesion region has blurred boundaries or complex and varied boundary contours, this model can effectively perceive its boundaries and achieve precise and detailed boundary segmentation; at the same time, it can accurately locate small lesions and has advanced learning capabilities and generalization performance. The superior performance of this model can provide important reference value for doctors to accurately diagnose diseases in clinical settings.
Attached Figure Description
[0023] Figure 1 is a flowchart illustrating the overall structure and training process of the medical image segmentation method based on information guidance and boundary awareness. In Figure 1, AGG represents the information fusion module, BP represents the boundary awareness module, GL represents the global / local feature extraction module, MS represents the channel multi-scale module, and RFM represents the region fusion module. During the testing phase, the test set is tested directly, without the data augmentation operation in step S1.
[0024] Figure 2 details the structure of the information fusion module (AGG) and its information processing flow.
[0025] Figure 3 details the structure of the boundary perception module (BP) and its information processing flow.
[0026] Figure 4 details the structure of the global / local feature extraction module (GL) and its information processing flow.
[0027] Figure 5 details the structure of the channel multi-scale module (MS) and its information processing flow.
[0028] Figure 6 details the structure of the region fusion module (RFM) and its information processing flow.
Detailed Implementation
[0029] Specific embodiments of the present invention will now be described. Examples of exemplary embodiments are shown in the accompanying drawings, which will help those skilled in the art to further understand the invention. It is worth noting that the following embodiments are only used to illustrate the technical solutions of the present invention, and not to limit it. The scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art without departing from the concept of the present invention are within the scope of protection of the present invention.
[0030] Referring to Figures 1 to 6, this invention provides a medical image segmentation method based on information guidance and boundary awareness, comprising the following steps:
[0031] Step 1: Dataset augmentation operations, applied only to the training set, to expand the dataset size. These mainly include the following operations: Gaussian blur; color transformation; horizontal and vertical flip; affine transformation. Gaussian blur and color transformation are applied only to the images themselves, while the remaining augmentation operations are consistently applied to the images and their corresponding labels.
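The rule in Step 1 (photometric operations touch only the image, geometric operations are applied identically to image and label) can be illustrated with a minimal sketch. It is not the implementation used in the invention: the function names are hypothetical, images are single-channel lists of rows, and a simple brightness scaling stands in for the Gaussian blur and color transformation named above.

```python
import random


def hflip(grid):
    """Mirror a 2D grid (list of rows) left to right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror a 2D grid top to bottom."""
    return grid[::-1]


def brightness(image, factor):
    """Photometric change (stand-in for color transformation): scale and clamp."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, label, rng):
    """Return one augmented (image, label) pair.

    Geometric operations are applied to both image and label so that they stay
    aligned; the photometric operation is applied to the image only.
    """
    if rng.random() < 0.5:
        image, label = hflip(image), hflip(label)
    if rng.random() < 0.5:
        image, label = vflip(image), vflip(label)
    image = brightness(image, rng.uniform(0.8, 1.2))
    return image, label


if __name__ == "__main__":
    img = [[10, 200], [10, 10]]   # one bright "lesion" pixel
    lab = [[0, 1], [0, 0]]        # matching binary label
    print(augment(img, lab, random.Random(0)))
```

Because the flips are shared, the bright pixel and its label always move together, while the brightness change leaves the label untouched.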
[0032] Step 2: Extract multi-scale features of the image using the Pyramid Vision Transformer v2 (PVTv2) encoder. The input image size is 352×352×3. The PVTv2 encoder uses a spatial reduction attention mechanism to model the global dependencies of the image. The encoder has four processing stages that progressively extract high-level semantic features of the image. The output feature map sizes of the four stages are 88×88×64, 44×44×128, 22×22×256, and 11×11×512, respectively.
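The stage sizes listed above follow from the cumulative downsampling factors of a four-stage pyramid encoder. The short sketch below reproduces that arithmetic; the strides 4, 8, 16, and 32 are assumed from the usual PVTv2 layout, the channel widths are taken from the text, and the function name is hypothetical.

```python
def stage_shapes(height, width, strides=(4, 8, 16, 32), channels=(64, 128, 256, 512)):
    """Return the (H, W, C) shape of each encoder stage output."""
    shapes = []
    for stride, ch in zip(strides, channels):
        if height % stride or width % stride:
            raise ValueError(f"input {height}x{width} is not divisible by stride {stride}")
        shapes.append((height // stride, width // stride, ch))
    return shapes


print(stage_shapes(352, 352))
# [(88, 88, 64), (44, 44, 128), (22, 22, 256), (11, 11, 512)]
```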
[0033] Step 3: As shown in Figure 1, this invention constructs an Information Fusion Module (AGG) to aggregate and process the deeper-level features. The details of this module are shown in Figure 2. Its main function is to aggregate features from different stages to locate the main region where the lesion lies, guiding the boundary perception module (BP) to focus its attention on the target region and thereby perceive the lesion boundary. Compared with shallow features, deep features contribute more global semantic information, which benefits lesion localization. Furthermore, shallow features have higher spatial resolution and therefore relatively higher computational cost. For both reasons, this module uses the deeper features for information fusion. Specifically, as can be seen from Figure 2, the module mainly consists of convolution operations and element-wise multiplication and addition operations, arranged as a progressive fusion process. Multiplication amplifies key features and further raises the weights of the target region, while addition reintroduces the original features to prevent information loss caused by improper fusion. Convolution operations are used to fuse the features. Through this progressive fusion, the location information of the lesion is gradually refined, enhancing the model's ability to perceive targets.
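The deep-to-shallow "multiply, then add" pattern described above can be sketched as follows. This is only an illustration under simplifying assumptions: feature maps are single-channel lists of rows, nearest-neighbour upsampling aligns resolutions, the convolutions of the real module are omitted, and the shallower map is taken as the "original feature" that is added back. The exact wiring of AGG is given in Figure 2, not here, and all function names are hypothetical.

```python
def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2D grid (list of rows)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(deep, shallow):
    """Multiply to amplify positions active in both maps, then add `shallow` back."""
    up = upsample2x(deep)
    return [[u * s + s for u, s in zip(up_row, sh_row)]
            for up_row, sh_row in zip(up, shallow)]


def aggregate(features_deep_to_shallow):
    """Progressively fuse adjacent stages, starting from the deepest map."""
    fused = features_deep_to_shallow[0]
    for shallower in features_deep_to_shallow[1:]:
        fused = fuse(fused, shallower)
    return fused


if __name__ == "__main__":
    deepest = [[1.0]]
    middle = [[0.0, 1.0], [1.0, 0.0]]
    print(aggregate([deepest, middle]))  # [[0.0, 2.0], [2.0, 0.0]]
```

Positions that are active in both the deep and the shallower map end up with the largest response, while the addition keeps the shallower map's own values from being zeroed out by the product.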
[0034] Step 4: As shown in Figure 1, this invention constructs a boundary perception module (BP), which perceives the boundary of the lesion area by processing shallow information together with the positional information output by the information fusion module (AGG). The details of this module are shown in Figure 3. It mainly includes three stages. ① Information guidance, marked by the green dashed line in Figure 3. The inputs are the shallow detail information (F1) and the positioning information output by the information fusion module (AGG). They contain rich detail information and rich semantic information, respectively, and the two are aligned in the number of channels. The specific information processing flow at this stage is as follows:
[0035]
[0036] Where cat represents the channel concatenation operation; RA represents the residual-based attention mechanism, whose structure is shown on the far left of Figure 3; and SA represents the spatial attention mechanism, which processes the shallow feature map F1, effectively enhancing the key local detail features in F1 and filtering out useless feature information. Formula (1) first concatenates channels and then applies residual connections so that the features interact repeatedly, gradually focusing the model's attention on the target region while preserving the integrity of the information. ② Boundary mining, marked by the red dashed line in Figure 3. At this stage, the invention uses parallel multi-scale depthwise separable strip convolutions to specifically mine the boundary information of lesions. Because lesions resemble the surrounding normal tissue and their contours are complex and varied, the boundary region appears as a blurred, complex strip. The invention uses strip convolutions of a similar shape to model this region efficiently, and each parallel branch has a different receptive field, allowing the model to capture subtle differences between regions and thereby determine the boundary location. ③ Boundary refinement, marked by the blue dashed line in Figure 3. Using the idea of the reverse attention mechanism, the boundary region is refined to enhance the model's boundary perception ability. The specific process is shown in formula (2):
[0037]
[0038] Where Y is the feature map output from stage ②, which contains both the feature information that accurately locates the lesion boundary and the internal feature information of the lesion. Multiplying Y by the feature map S5 output by the reverse operation eliminates the internal feature information and leaves only the boundary information of the lesion. This focuses attention on the boundary region, which helps to further refine the boundary contour of the lesion. Finally, the boundary prediction map S_b of the target is output.
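The boundary refinement idea can be illustrated with the usual form of reverse attention, in which features are weighted by one minus the sigmoid of a coarse prediction. Formula (2) is not reproduced in this text, so the sketch below is an assumption about its general shape rather than the formula itself; the function names and the toy values are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def reverse_attention(features, coarse_logits):
    """Weight each feature by (1 - sigmoid(logit)).

    Positions the coarse prediction marks confidently as lesion interior
    (large positive logit) are suppressed; ambiguous positions near the
    boundary (logit close to 0) keep about half of their response.
    """
    return [[f * (1.0 - sigmoid(z)) for f, z in zip(f_row, z_row)]
            for f_row, z_row in zip(features, coarse_logits)]


if __name__ == "__main__":
    # One row crossing a lesion edge: background, boundary, interior.
    y = [[0.0, 1.0, 1.0]]
    logits = [[-6.0, 0.0, 6.0]]
    print(reverse_attention(y, logits))
```

In this toy row the interior response drops to almost zero while the boundary position keeps a weight of 0.5, which matches the stated goal of removing internal information and keeping boundary information.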
[0039] Step 5: Enhance and expand feature information using the Global / Local Feature Extraction (GL) module and the Channel Multiscale (MS) module.
[0040] As shown in Figure 1, the Global / Local Feature Extraction (GL) module processes feature maps F2 and F3, which come from the intermediate levels and contain both detail features and global features. To fully utilize these two types of features, this invention constructs the GL module, whose structure is detailed in Figure 4. First, two consecutive convolutional layers perform preliminary processing on the input feature map: they reduce the channel dimension and use the local receptive field of convolution to promote interaction between adjacent pixels, enhancing their correlation and preventing the model's attention from being scattered. Second, a parallel axial attention mechanism and a multi-scale feature extraction module model the feature map from the global and local perspectives, respectively, extracting the semantic features and detail features of the image. The axial attention mechanism is an improvement on the self-attention mechanism. It uses the query (Q), key (K), and value (V) projections of the input sequence as calculation elements to mine the long-range dependencies between regions in the image. The specific calculation process is shown in formula (3):
[0041] $$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{D}}\right)V \tag{3}$$
[0042] In the formula, Q, K, and V represent the projections of the input feature sequence, D represents the dimension of Q and K, and Softmax represents the normalizing activation function. Unlike the self-attention mechanism, the axial attention mechanism applies attention separately along the width and height directions. Specifically, in the self-attention mechanism, the elements Q and K involved in the calculation have size HW×C, while in the axial attention mechanism they have size H×CW or W×CH, which greatly reduces the complexity of the attention calculation. In addition, the axial attention mechanism explores long-range dependencies along the width and height directions separately, thereby modeling global image features efficiently. To fully extract the detailed information of the image and strengthen the feature representation, this invention also uses a multi-scale feature extraction module in parallel with the axial attention mechanism to model the detailed features of the lesion. The specific process is shown in formula (4):
[0043]
[0044] In the formula, cat represents the channel concatenation operation. ① First, the module uses four parallel branches and enlarges the receptive field of each branch by stacking dilated convolutions from bottom to top, so that the receptive field of the module ranges from 3×3 to 33×33. ② Second, each upper branch also takes in the output of the branch below it. ③ Finally, the outputs of all branches are concatenated and fused. In step ①, the branches have different receptive fields, so they explore image details over different ranges and obtain multi-scale local information. Dilated convolution is used because, compared with ordinary convolution, it can flexibly expand the receptive field without increasing computational complexity. In step ②, feeding the lower branch's features into the upper branch promotes feature fusion between adjacent branches, enhances the correlation between branch features, and avoids losing local information to the differences among branches when the four branches are fused at once in step ③. The outputs of the parallel axial attention mechanism and the multi-scale feature extraction module are then concatenated and fused to obtain a feature map that combines global semantic information and local detail information. Residual connections further enhance the feature representation and prevent model performance degradation. The enhanced features are then input into a residual-based dual attention module (RDA), whose structure is shown in the lower right section of Figure 4. In RDA, SA represents the spatial attention mechanism, which enhances the model's attention to important spatial details and suppresses irrelevant information. ECA is the efficient channel attention mechanism, which enhances the importance of key channels and suppresses irrelevant channels by exploiting the interaction and fusion between adjacent channels. This module also uses residual connections to supplement feature information and avoid degrading model performance. The global / local feature extraction module (GL) thus strengthens and expands global and local features and filters irrelevant information, allowing the model's attention to focus on the region where the lesion is located.
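Two quantitative statements in the GL description can be checked with simple arithmetic: the cost saving of axial attention and the 3×3 to 33×33 receptive field range. The sketch below is illustrative only. The count of query-key pairs is a standard way to compare attention costs, and the dilation rates (1, 3, 5, 7) with 3×3 kernels are an assumption chosen because a cascade with these rates reproduces the stated range; the text does not give the actual rates. Function names are hypothetical.

```python
def attention_pairs(height, width):
    """Count query-key pairs for full self-attention and for axial attention."""
    tokens = height * width
    full = tokens * tokens             # every position attends to every position
    axial = tokens * (height + width)  # each position attends along its column, then its row
    return full, axial


def cascaded_receptive_fields(dilations, kernel=3):
    """Receptive field after each convolution in a stride-1 cascade of dilated convolutions."""
    fields, rf = [], 1
    for d in dilations:
        rf += (kernel - 1) * d
        fields.append(rf)
    return fields


print(attention_pairs(22, 22))                  # (234256, 21296)
print(cascaded_receptive_fields((1, 3, 5, 7)))  # [3, 9, 19, 33]
```

For a 22×22 feature map the axial form needs 11 times fewer query-key pairs than full self-attention, and the saving grows with resolution.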
[0045] As can be seen from Figure 1, the channel multi-scale module (MS) processes the deepest feature map F4, which has a size of 512×11×11. This feature map primarily contains global semantic features, and these are carried mainly at the channel level (C=512). Therefore, this invention uses the channel multi-scale module (MS) to model the feature map at the channel level and mine the multi-scale global semantic features of the lesions. Its structure is shown in Figure 5. First, three parallel convolutional branches apply different channel change rates to mine global features from the feature map. The channel changes from bottom to top are: 512-32, 512-320-32, and 512-320-128-32. This diversity allows semantic information to be extracted thoroughly and the relationships between channels to be explored effectively. Second, the outputs of the branches are concatenated to enrich the feature information. Then, two parallel operations process the concatenated result: ① a convolution compresses the channel dimension, promoting the fusion of branch features; ② channel shuffling first interleaves the features of different branches at the channel level, and a convolution then compresses the channel dimension, promoting information flow and interaction, enhancing the integration between channels, and deeply fusing the features. Operation ① functions similarly to a residual connection, supplementing information and preventing performance degradation due to improper feature fusion in ②. Finally, the features output from ① and ② are concatenated and fused through a convolution, and the efficient channel attention mechanism (ECA) is applied for feature enhancement and filtering. In summary, this module can effectively explore the global features of an image at the channel level, helping the model to identify and locate targets.
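The channel shuffling in operation ② can be illustrated on a list of channel labels. This sketch shows the standard shuffle (view the channels as a groups × per-group grid, transpose, flatten); whether the invention uses exactly this ordering is an assumption, and the function name is hypothetical.

```python
def channel_shuffle(channels, groups):
    """Interleave channels that were concatenated group by group.

    Equivalent to reshaping to (groups, per_group), transposing, and flattening.
    """
    if len(channels) % groups:
        raise ValueError("channel count must be divisible by the number of groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + k]
            for k in range(per_group)
            for g in range(groups)]


# Three branch outputs (a, b, c) with two channels each, concatenated in order.
print(channel_shuffle(["a0", "a1", "b0", "b1", "c0", "c1"], groups=3))
# ['a0', 'b0', 'c0', 'a1', 'b1', 'c1']
```

After the shuffle, neighbouring channels come from different branches, so a following convolution that compresses the channel dimension mixes information across branches.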
[0046] Step 6: Use the Region Fusion Module (RFM) to fuse multiple features, retaining identified features, enhancing uncertain features, and improving the segmentation results. The module structure is shown in Figure 6. As can be seen from Figure 1, the inputs are: the boundary information output by the boundary perception module (BP); the feature map output by the module M_i of the current stage (i = 2, 3, 4); and the segmentation result S_{i+1} output by the previous region fusion module (i = 2, 3; S4 is the segmentation result output by the Information Fusion Module (AGG)). Its main function is to continuously fuse and supplement information from deep to shallow layers, thereby progressively refining the output result. The information processing flow of this module is as follows:
[0047]
[0048]
[0049]
[0050]
[0051] In formula (5), tensor multiplication first reinforces the important features common to both inputs and raises the weights of important lesion features that are neglected in F_{i.2} but emphasized in S_{i+1}; tensor addition then reintroduces F_{i.2} to prevent the multiplication from weakening the features unique to F_{i.2}. In formula (6), the feature map with rich boundary information output by the BP module has the same channel dimension as X. Tensor multiplication increases the weight of the boundary features but causes features other than the boundary to be ignored, so tensor addition reintroduces X to make up for the lost information. In formula (7), S_b represents the boundary segmentation map output by the BP module. First, tensor multiplication and addition complementarily fuse S_b and S_{i+1}, the information already identified by the model, to obtain a feature map with complete information, containing both boundary and internal information. Then, the uncertain-region enhancement operation E(x) increases the model's attention to difficult-to-identify regions. The specific operation of E(x) is shown in formula (9):
[0052]
[0053] In the formula, sigmoid() represents the activation function. Uncertain regions are commonly found in blurry areas near boundaries or in areas that are difficult for the model to identify because of external factors such as lighting. For such regions, the model's prediction value is usually around 0.5, meaning the model cannot decide whether they belong to lesions or to normal tissue. Therefore, T = 0.5 is used here, which maximizes the weight of difficult-to-identify regions and enhances the model's ability to identify them. Finally, in formula (8), the multiplication operation strengthens the uncertain regions, and the addition operation prevents the loss of identified feature regions. After prediction by a convolutional layer, the final lesion segmentation result S_i is output.
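The behaviour described for E(x), a weight that is largest where the prediction is closest to T = 0.5, can be illustrated with one plausible weighting function. Formula (9) is not reproduced in this text, so the expression below is a hypothetical stand-in with the stated property, not the formula of the invention; the "multiply, then add" step mirrors the description of formula (8) in the same hedged way.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def uncertainty_weight(logit, t=0.5):
    """Hypothetical E(x): 1 at sigmoid(x) == t, falling linearly to 0 at the far extreme."""
    p = sigmoid(logit)
    return 1.0 - abs(p - t) / max(t, 1.0 - t)


def enhance(feature, logit, t=0.5):
    """Multiply by the weight to stress uncertain positions, then add the feature back."""
    return feature * uncertainty_weight(logit, t) + feature


if __name__ == "__main__":
    for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(z, round(uncertainty_weight(z), 4))
```

Confident predictions (large positive or negative logits) receive a weight near 0 and pass through almost unchanged, while positions with a prediction near 0.5 are roughly doubled.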
[0054] In summary, this invention addresses two boundary problems in medical image segmentation: ① lesion boundaries closely resemble normal tissue, so the boundary region is blurred and hard to separate from other tissue; ② lesion boundary contours are complex, making their trends of change hard to perceive. The invention provides a medical image segmentation method based on information guidance and boundary awareness. It uses an information fusion module (AGG) to fuse the deeper feature maps and mine global clues that aid target localization. A boundary awareness module (BP), guided by the localization information output by the AGG module, then models the lesion boundary specifically to achieve accurate boundary segmentation. To enhance and expand the feature information, a global/local feature extraction module (GL) and a channel multi-scale module (MS) are proposed: the former models the intermediate-layer features, mining global semantic features and local detail features, while the latter targets the deepest features, mining multi-scale global information at the channel level. Finally, a region fusion module (RFM) explores the relationship between the boundary and the target region, strengthens uncertain regions, and progressively refines the prediction.
[0055] The method of this invention achieves better segmentation performance than many existing strong models. It accurately locates the boundaries of complex lesions and effectively segments highly variable targets, alleviating the difficulty that existing medical image segmentation models have with boundary regions.
[0056] The above description discloses only one preferred embodiment of the present invention, and should not be construed as limiting the scope of the present invention. Those skilled in the art will understand that all or part of the processes of the above embodiments can be implemented, and equivalent changes made in accordance with the claims of the present invention are still within the scope of the invention.
Claims
1. A medical image segmentation method based on information guidance and boundary awareness, characterized in that it includes the following steps:
Step 1: Expand and enhance the dataset. This operation applies only to the training dataset and is used to increase its size.
Step 2: Use the Transformer encoder to extract features from the image.
Step 3: Use the information fusion module to process the features output from the deep stage of the encoder. This module runs in parallel with the decoder and acts as a guide for the target position.
Step 4: Use the boundary awareness module to locate the target boundary. This module uses the position information output by the information fusion module as guidance and the shallow-stage features as details to perceive the target boundary.
Step 5: Use the global/local feature extraction module to enhance and expand the feature information extracted during the encoding stage. This module processes the features of the two middle layers of the encoder and mines their local and global features.
Step 6: Use the channel multi-scale module to process the deepest features and fully explore the global semantic features at multiple scales.
Step 7: Use the region fusion module to fuse the features output by each module, strengthen uncertain regions, and continuously improve the segmentation results of the model.
2. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the specific methods for augmenting the training dataset include Gaussian blur, color transformation, and flipping operations, wherein Gaussian blur and color transformation are applied only to the image itself, while the other operations are applied consistently to the image and its corresponding label.
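The pairing rule in claim 2 can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes a single-channel image stored as nested lists with values in [0, 1], uses a fixed 3x3 kernel as a stand-in for Gaussian blur and a brightness scale as a stand-in for color transformation, and the probabilities (0.5) and brightness range (0.8 to 1.2) are arbitrary choices not taken from the source.

```python
import random


def hflip(grid):
    """Horizontal flip; valid for both images and label masks."""
    return [row[::-1] for row in grid]


def blur3(image):
    """3x3 Gaussian-like blur ([1, 2, 1] / 4 per axis) with clamped edges."""
    h, w = len(image), len(image[0])
    k = (0.25, 0.5, 0.25)

    def at(r, c):
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [
        [
            sum(k[dr + 1] * k[dc + 1] * at(r + dr, c + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            for c in range(w)
        ]
        for r in range(h)
    ]


def scale_brightness(image, factor):
    """Stand-in for a color transformation: scale and clamp to [0, 1]."""
    return [[min(max(v * factor, 0.0), 1.0) for v in row] for row in image]


def augment(image, label, rng):
    """Photometric changes touch the image only; flips touch both."""
    if rng.random() < 0.5:
        image = blur3(image)
    if rng.random() < 0.5:
        image = scale_brightness(image, rng.uniform(0.8, 1.2))
    if rng.random() < 0.5:
        image, label = hflip(image), hflip(label)
    return image, label


rng = random.Random(0)
image = [[0.0, 1.0], [0.5, 0.25]]
label = [[0, 1], [1, 0]]
new_image, new_label = augment(image, label, rng)
```

Keeping the label untouched by blur and color changes preserves exact mask values, while applying the flip to both keeps the mask aligned with the image.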
3. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the encoder extracts features from the image as follows: a Transformer network encoder is used to extract features from the target region of the image simultaneously; specifically, Pyramid Vision Transformer v2 is used as the encoder to extract multi-scale features from the image.
4. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the information fusion module processes the features output from the deep stage as follows: the module continuously fuses features from adjacent stages from deep to shallow, uses multiplication operations to amplify key features, and uses addition operations to retain and supplement the original features, thus progressively enhancing the ability to perceive the target location.
5. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the boundary awareness module uses feature maps to locate target boundaries as follows: ① Information guidance: using the location information output by the information fusion module, the main position of the target is quickly found; combined with shallow detail features, this helps to mine boundary information around the target. ② Boundary mining: in medical images, the boundary region of a lesion is blurred and its outline is complex; after refinement, the boundary appears as a strip. In this stage, multi-scale depthwise separable strip convolutions that fit this shape are used to efficiently model and mine the boundary, capture the differences between regions, and quickly determine the boundary position of the target. ③ Boundary refinement: in this stage, the idea of reverse attention is used to eliminate the internal information of the target and leave only the boundary information; the boundary outline is further refined, and the predicted boundary result is output.
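The reverse-attention idea named in claim 5 can be illustrated with its most common generic form, in which features are weighted by 1 - sigmoid(coarse prediction). This is a hedged sketch of that generic weighting only; the source does not give the module's actual formula, and the function name `reverse_attention` and the nested-list maps are assumptions made for illustration.

```python
import math


def reverse_attention(features, coarse_logits):
    """Weight each feature by 1 - sigmoid(coarse prediction).

    Positions the coarse prediction already marks confidently as target
    are suppressed, so what remains is dominated by the boundary band
    and the surroundings.
    """
    return [
        [f * (1.0 - 1.0 / (1.0 + math.exp(-s))) for f, s in zip(f_row, s_row)]
        for f_row, s_row in zip(features, coarse_logits)
    ]


# One row crossing a lesion edge: interior, boundary, outside.
coarse_logits = [[8.0, 0.0, -8.0]]
features = [[1.0, 1.0, 1.0]]
erased = reverse_attention(features, coarse_logits)
print(erased)  # interior is close to 0, boundary is 0.5, outside is close to 1
```

In practice such a weight map would be applied to multi-channel feature tensors and followed by further refinement; the sketch only shows how the interior response is erased.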
6. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the global/local feature extraction module strengthens and expands the features of the two middle layers as follows: First, the global semantic features and local detail features of the image are extracted in parallel by an axial self-attention mechanism and a multi-scale feature extraction module, respectively. Specifically, the axial self-attention mechanism explores the long-range dependencies between regions while reducing computational complexity, and multiple parallel branches with stacked dilated convolutions provide diverse receptive fields and capture multi-scale local information. Second, a residual-based dual attention module is used to filter irrelevant features, strengthen key useful features, and focus the model's attention on the lesion area.
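How dilation widens the receptive field of parallel branches can be shown with a one-dimensional example. The kernel size (3) and dilation rates (1, 2, 3) below are assumptions for illustration; the source does not state the rates used by the module.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Valid (unpadded) 1D dilated correlation."""
    span = effective_kernel(len(kernel), dilation)
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


# Three parallel branches with the same 3-tap kernel but different dilation.
for d in (1, 2, 3):
    print(d, effective_kernel(3, d))  # spans 3, 5, 7

signal = [1, 2, 3, 4, 5, 6, 7]
print(dilated_conv1d(signal, [1, 1, 1], 2))  # [9, 12, 15]
```

Each branch uses the same number of weights, so the wider context comes at no extra parameter cost, which is the usual reason for combining several dilation rates in parallel.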
7. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the channel multi-scale module processes the deepest features and mines multi-scale global semantic features at the channel level as follows: First, three parallel convolutional branches are used to mine global features of the feature map with different channel change rates, namely 512-32, 512-320-32, and 512-320-128-32. Second, two parallel operations are used to process the outputs of the branches: ① convolution compresses the channel dimension to promote the fusion of features from each branch; ② channel shuffling is first used to interleave the branch features at the channel level, and then a convolution operation compresses the channel dimension to promote the flow and interaction of information and enhance the integration between channels. Finally, an efficient channel attention mechanism is used to strengthen and filter the features.
8. The medical image segmentation method based on information guidance and boundary awareness according to claim 1, characterized in that the region fusion module integrates the outputs of the modules, strengthens uncertain regions, and continuously improves the segmentation results as follows: the input information includes boundary information, the current-stage features, and the output of the previous region fusion module. ① First, tensor multiplication and addition strengthen the current-stage features and the previous-stage result while preserving key features. ② Second, the output interacts with the boundary information to increase the importance of boundary features. ③ Then, the uncertain-region strengthening operation increases the model's attention to areas that are difficult to identify. ④ Finally, the outputs of ② and ③ interact to preserve identified features, strengthen uncertain regions, and output the prediction results.
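The channel shuffling step named in claim 7 can be sketched on channel indices alone. The example assumes that the three 32-channel branch outputs are concatenated into 96 channels before shuffling; the concatenation and the function name `channel_shuffle` are assumptions, since the source only states that the branch features are interleaved at the channel level.

```python
def channel_shuffle(channels, groups):
    """Interleave equally sized channel groups.

    Equivalent to reshaping to (groups, per_group), transposing, and
    flattening, so consecutive output channels come from different groups.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group) for g in range(groups)]


# Small case: three branches with two channels each.
print(channel_shuffle(["a0", "a1", "b0", "b1", "c0", "c1"], 3))
# ['a0', 'b0', 'c0', 'a1', 'b1', 'c1']

# Assumed case: three 32-channel branch outputs concatenated to 96 channels.
shuffled = channel_shuffle(list(range(96)), 3)
print(shuffled[:6])  # [0, 32, 64, 1, 33, 65]
```

After shuffling, any convolution that compresses neighbouring channels sees features from all three branches, which is how the step promotes information flow between them.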