Mamba-based adrenal gland segmentation method, model training method, apparatus, device, and medium
By using a Mamba-based adrenal gland segmentation model, combined with a three-way Mamba module and attention mechanism, the problems of time-consuming, labor-intensive, and inaccurate segmentation of adrenal glands and adenomas are solved. This enables efficient and accurate adrenal gland volume measurement and automatic calculation of multiple parameters, thereby improving the accuracy and efficiency of clinical diagnosis and treatment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- TIANJIN MEDICAL UNIVERSITY GENERAL HOSPITAL
- Filing Date
- 2026-02-06
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for the automatic segmentation and volume measurement of adrenal glands and adenomas are time-consuming and labor-intensive, prone to large subjective errors, and insufficiently accurate. In particular, they struggle to handle complex boundaries and low contrast when segmenting small adrenal glands and adenomas, and cannot meet the high-precision requirements of clinical practice.
An adrenal gland segmentation model based on Mamba is adopted, which combines a three-way Mamba module and an attention mechanism module. The model parameters are optimized by combining loss functions to achieve efficient and accurate identification and segmentation of the adrenal gland region. This includes gray-level normalization, window level and width adjustment and resampling preprocessing. Dice loss, boundary loss and small lesion focusing loss are used to improve segmentation accuracy.
It achieves high-precision automatic segmentation and volume measurement of adrenal glands and adenomas, reduces subjective errors, improves segmentation stability and robustness, supports automatic calculation of multidimensional physiological parameters, and enhances the accuracy and efficiency of diagnosis and treatment.
[Figure: abstract drawing, CN122244431A_ABST]
Description
Technical Field
[0001] This invention relates to the field of artificial intelligence technology, and more specifically, to a method, apparatus, device, and medium for adrenal gland segmentation based on Mamba and its model training.
Background Technology
[0002] Currently, the clinical diagnosis of adrenal diseases mainly relies on imaging techniques such as CT and MRI to locate and measure adrenal glands and adenomas. Automated segmentation and volume measurement of adrenal glands and adenomas are important research directions in the field of medical image analysis, particularly valuable in the diagnosis and treatment monitoring of adrenal diseases such as Cushing's syndrome and pheochromocytoma. Accurately identifying glands and adenomas and measuring their volume, morphology, and density are crucial for determining the nature of lesions (e.g., benign or malignant), guiding treatment strategies, and improving the accuracy of clinical diagnosis and treatment. It also represents a key breakthrough in reducing the workload of radiologists and improving medical efficiency.
[0003] There are currently three main solutions for identifying and measuring adrenal glands and adenomas, all of which have significant technical drawbacks. First, the mainstream clinical method is manual processing, where radiologists manually delineate the target area slice by slice in medical images, and then use image workstation software to perform two-dimensional or three-dimensional parameter measurements. For example, the length, width, and height of the largest cross-section of the adrenal gland are manually marked on CT or MRI images, and the volume is estimated using the formula (length × width × height × π / 6), assuming the adrenal gland is an ellipsoid. This method is not only time-consuming and labor-intensive, but also highly susceptible to subjective factors, resulting in significant inter-observer differences. Furthermore, because the actual shape of the adrenal gland is irregular, this simplification can lead to a volume estimation error of approximately 20% to 30%, which is more pronounced when adrenal gland boundaries are blurred or adenoma morphology is atypical. Second, traditional machine-aided segmentation methods, including atlas-based segmentation and region growing / threshold-based segmentation, have their own limitations. Atlas-based segmentation relies on predefined anatomical atlases registered to patient images, but the small size and significant morphological variations of adrenal glands and adenomas limit registration effectiveness. Region growing / threshold-based segmentation is sensitive to image contrast and easily affected by surrounding tissues such as the kidneys and liver. Third, with recent advances in deep learning, researchers have explored convolutional neural network (CNN) architectures; U-Net, 3D U-Net, nnU-Net, and other algorithms have been widely used for automatic segmentation of multiple organs and lesions, including the segmentation of adrenal glands and adenomas. 
Although some progress has been made, most current research is limited to coarse-grained segmentation or classification tasks and has not yet fully solved the problems of fine segmentation and accurate quantification. Moreover, existing algorithms often lack sufficient segmentation accuracy for small adrenal glands and adenomas, and struggle to handle the low contrast and complex boundaries between the adrenal gland and surrounding tissues, thus failing to meet the actual clinical needs for high-precision automatic segmentation and measurement.
Summary of the Invention
[0004] To address the aforementioned issues, this invention proposes a method, apparatus, device, and medium for adrenal gland segmentation and model training based on Mamba. This method can automatically and efficiently identify and segment adrenal tissue and adenoma regions from medical images, and achieve rapid and accurate measurement of key parameters such as the volume, area, and CT density of the gland and adenoma.
[0005] In a first aspect, embodiments of the present invention provide a training method for an adrenal gland segmentation model based on Mamba, the method comprising:
[0006] Obtain CT images of the adrenal gland region and their annotation results;
[0007] The CT image of the adrenal gland region is input into the initial segmentation model to obtain the predicted segmentation result. The initial segmentation model includes a three-way Mamba module and an attention mechanism module.
[0008] The loss between the annotation result and the predicted segmentation result is calculated using a combined loss function;
[0009] The parameters of the initial segmentation model are adjusted using the loss to obtain a Mamba-based segmentation model.
[0010] As one possible implementation, the step of inputting the CT image of the adrenal gland region into the initial segmentation model to obtain the predicted segmentation result includes:
[0011] The backbone layer uses depthwise convolution to extract the initial features of the CT image of the adrenal gland region and outputs a 48-channel feature map.
[0012] The ToM layer flattens the 48-channel feature map into three sequences f, r, and s, and models the global dependency through the Mamba layer to output the ToM features.
[0013] The SE layer performs channel weight optimization on the ToM features to obtain attention features;
[0014] The downsampling layer reduces the feature map size of the attention features through strided convolution to obtain downsampling features;
[0015] The upsampling layer recovers the feature map size of the downsampling features through transposed convolution to obtain the upsampling features;
[0016] The output layer generates a predicted segmentation result based on the upsampled features.
[0017] As one possible implementation, the method further includes:
[0018] The mean-variance standardization method is used to normalize the CT images of the adrenal gland region;
[0019] And / or, grayscale truncation of the CT image of the adrenal gland region is performed using a preset window width and window level;
[0020] And / or, the CT images of the adrenal gland region are resampled to a uniform isotropic voxel resolution using nearest neighbor interpolation.
[0021] As one possible implementation, the combined loss function is a loss function that combines Dice loss, boundary loss, and small lesion focusing loss.
[0022] As one possible implementation, the annotation result is obtained through a two-stage annotation operation, which includes:
[0023] Obtain the initial labeled dataset, which consists of CT images of the adrenal gland region in which the glands are labeled on transverse, sagittal, or coronal sections;
[0024] The initial labeled data is input into the initial segmentation model to obtain auxiliary segmentation results;
[0025] Obtain the annotation result set, wherein the annotation result is obtained by manual annotation based on the auxiliary segmentation results.
[0026] In a second aspect, embodiments of the present invention provide an adrenal gland segmentation method based on Mamba, the method comprising:
[0027] The target CT image is input into a Mamba-based segmentation model to obtain the target prediction result. The Mamba-based segmentation model is trained by the method described in the first aspect.
[0028] In a third aspect, embodiments of the present invention provide a training device for an adrenal gland segmentation model based on Mamba, the device comprising:
[0029] The acquisition module is used to acquire CT images of the adrenal gland region and their annotation results;
[0030] The prediction module is used to input the CT image of the adrenal region into the initial segmentation model to obtain the predicted segmentation result. The initial segmentation model includes a three-way Mamba module and an attention mechanism module.
[0031] The loss calculation module is used to calculate the loss between the labeled results and the predicted segmentation results using the combined loss function;
[0032] The parameter tuning module is used to adjust the parameters of the initial segmentation model using the loss to obtain a Mamba-based segmentation model.
[0033] In a fourth aspect, embodiments of the present invention provide an adrenal gland segmentation device based on Mamba, characterized in that the device comprises:
[0034] The target prediction module is used to input the target CT image into a Mamba-based segmentation model to obtain the target prediction result. The Mamba-based segmentation model is trained by the method described in the first aspect.
[0035] In a fifth aspect, embodiments of the present invention provide an electronic device, including: one or more processors; and a storage device having one or more programs stored thereon, which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first or second aspect.
[0036] In a sixth aspect, embodiments of the present invention provide a readable storage medium having executable instructions stored thereon, wherein the executable instructions, when executed by a processor, implement the method as described in any of the implementations of the first or second aspect.
[0037] The embodiments of the present invention provide a method, apparatus, device, and medium for adrenal gland segmentation and model training based on Mamba, comprising: acquiring CT images of the adrenal gland region and their annotation results; inputting the CT images of the adrenal gland region into an initial segmentation model to obtain a predicted segmentation result, wherein the initial segmentation model includes a three-way Mamba module and an attention mechanism module; calculating the loss between the annotation result and the predicted segmentation result using a combined loss function; and adjusting the parameters of the initial segmentation model using the loss to obtain a Mamba-based segmentation model.
[0038] Thus, by introducing a three-way Mamba module and an attention mechanism module, complex features in adrenal gland region CT images can be effectively captured, improving segmentation accuracy and robustness. The design of the combined loss function further optimizes the model's training process, enabling it to better adapt to segmentation needs under different conditions. The Mamba-based segmentation model performs excellently in adrenal gland segmentation tasks, providing a new and effective solution for the field of medical image processing.
Attached Figure Description
[0039] Figure 1 is an exemplary architecture diagram in which an embodiment of the present invention can be applied;
[0040] Figure 2 is a flowchart of an embodiment of the adrenal gland segmentation model training method based on Mamba provided in this invention;
[0041] Figure 3 is a TSMamba segmentation network structure diagram of an embodiment of the adrenal gland segmentation model training method based on Mamba provided in this invention;
[0042] Figure 4 is a flowchart of an embodiment of the adrenal gland segmentation method based on Mamba provided in this invention;
[0043] Figure 5 is a schematic diagram of adenoma segmentation results from an embodiment of the adrenal gland segmentation method based on Mamba provided in this invention;
[0044] Figure 6 is a schematic diagram of adenoma segmentation results from an embodiment of the adrenal gland segmentation method based on Mamba provided in this invention;
[0045] Figure 7 is a structural diagram of one embodiment of the Mamba-based adrenal gland segmentation model training device provided in this invention;
[0046] Figure 8 is a structural diagram of one embodiment of the Mamba-based adrenal gland segmentation device provided in this invention;
[0047] Figure 9 is a schematic diagram of the structure of a computer suitable for implementing embodiments of the present disclosure.
Detailed Implementation
[0048] The present invention will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.
[0049] It should be noted that, unless otherwise specified, the embodiments and features described in the present invention can be combined with each other. The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0050] Figure 1 shows an exemplary system architecture 100 in which embodiments of the Mamba-based adrenal gland segmentation and model training methods, apparatus, electronic devices, and storage media of this disclosure can be applied.
[0051] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
[0052] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as image processing applications, voice recognition applications, short video social applications, audio and video conferencing applications, live video streaming applications, document editing applications, input method applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, etc.
[0053] Terminal devices 101, 102, and 103 can be either hardware or software. When terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with displays, including but not limited to smartphones, tablets, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptops, and desktop computers, etc. When terminal devices 101, 102, and 103 are software, they can be installed on the terminal devices listed above. They can be implemented as multiple software programs or software modules (e.g., used to provide image processing services) or as a single software program or software module. No specific limitations are imposed here.
[0054] In some cases, the Mamba-based adrenal gland segmentation and model training method provided in this disclosure can be executed by terminal devices 101, 102, and 103. Accordingly, the Mamba-based adrenal gland segmentation and model training device can be set in terminal devices 101, 102, and 103. In this case, the system architecture 100 may not include server 105.
[0055] In some cases, the Mamba-based adrenal gland segmentation and model training method provided in this disclosure can be jointly executed by terminal devices 101, 102, and 103 and server 105. For example, the step of "acquiring CT images of the adrenal gland region and their annotation results" can be executed by terminal devices 101, 102, and 103, and the steps of "inputting the CT images of the adrenal gland region into the initial segmentation model to obtain the predicted segmentation results" can be executed by server 105. This disclosure does not limit this. Correspondingly, the Mamba-based adrenal gland segmentation and model training device can also be respectively set in terminal devices 101, 102, and 103 and server 105.
[0056] In some cases, the adrenal gland segmentation and model training method based on Mamba provided in this disclosure can be executed by server 105. Correspondingly, the adrenal gland segmentation and model training device based on Mamba can also be set in server 105. In this case, the system architecture 100 may not include terminal devices 101, 102, and 103.
[0057] It should be noted that server 105 can be either hardware or software. When server 105 is hardware, it can be implemented as a distributed server cluster consisting of multiple servers, or as a single server. When server 105 is software, it can be implemented as multiple software programs or software modules (for example, used to provide distributed services), or as a single software program or software module. No specific limitations are made here.
[0058] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0059] With continued reference to Figure 2, a flowchart 200 of an embodiment of the Mamba-based adrenal gland segmentation model training method of the present invention is shown, which includes steps 201 to 204:
[0060] Step 201: Obtain CT images of the adrenal gland region and their annotation results.
[0061] The adrenal glands are small, poorly defined, and morphologically variable organs, often obscured anatomically by surrounding organs such as the kidneys, liver, spleen, and pancreas; adenoma lesions are even more heterogeneous and small in size. Therefore, when automatically segmenting them, relying solely on local contextual features often fails to yield accurate contours, making the efficient capture of global spatial dependencies a key challenge. CT images of the adrenal gland region were acquired, with scan sequences including one or more of the following: plain CT, arterial phase, and venous phase images. The dataset encompasses normal adrenal glands, hyperplastic glands, and adrenal adenoma cases, with adenoma data further including pre- and post-treatment imaging.
[0062] The CT images of the adrenal gland region are labeled. In the labeling of the Region of Interest (ROI) of the adrenal gland and adenoma, this embodiment employs a "two-stage labeling method" to improve labeling efficiency and quality:
[0063] Phase 1: Coarse Segmentation Network Training
[0064] The initial annotation data consisted of CT images of the adrenal gland region annotated with transverse, sagittal, or coronal sections. First, radiologists with extensive clinical experience manually annotated a small sample of glands / adenomas based on transverse, sagittal, and coronal sections, constructing the initial annotation dataset. A preliminary coarse segmentation network model was then trained based on this dataset.
[0065] Phase Two: Iterative Optimization and Fine-Grained Annotation
[0066] The initial labeled data is input into the initial segmentation model to obtain auxiliary segmentation results. The trained coarse segmentation model is then used to perform inference and prediction on the unlabeled images, and the prediction results are fed back to the doctor. The doctor then performs corrections and refinements based on these results, thereby generating high-quality labeled results. This method allows for the efficient construction of large-scale, accurately labeled adrenal gland and adenoma datasets for subsequent training and optimization of deep segmentation models.
[0067] As one possible implementation method, to ensure the stability and generalization ability of the model training, all CT images undergo systematic multi-step preprocessing, the specific process of which is as follows:
[0068] 1) Gray-scale normalization:
[0069] To eliminate differences in grayscale distribution caused by different CT scanning equipment or imaging protocols, the mean-variance normalization method (Z-score normalization) is used to normalize the images. The calculation formula is as follows:
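[0070] $$x' = \frac{x - \mu}{\sigma}$$
[0071] Where x is the original voxel intensity, x' is the normalized intensity, and μ and σ represent the mean and standard deviation of the image, respectively.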
[0072] 2) Window width and window level adjustment:
[0073] To highlight the adrenal glands and their lesion areas while suppressing the effects of background noise and other non-target structures, an appropriate window width (WW) and window level (WL) are selected for grayscale truncation of the image. For example, a soft tissue window (WW: 400 HU, WL: 40 HU) or an adrenal gland-specific window (WW: 300 HU, WL: 50 HU) can be used to enhance the contrast and visibility of the region of interest.
[0074] 3) Resampling:
[0075] Raw CT images often exhibit anisotropy, which can affect the accuracy of 3D feature extraction. To address this, nearest neighbor interpolation is used to resample the images to a uniform isotropic voxel resolution (1 × 1 × 1 mm³), thereby eliminating the bias caused by spatial anisotropy and ensuring the consistent modeling capability of subsequent segmentation models in 3D space.
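The three preprocessing steps above can be sketched as follows. This is a minimal, dependency-free illustration rather than the embodiment's implementation: the function names, the nested-list volume layout indexed [z][y][x], the use of the population standard deviation, and the centre-aligned index mapping are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def window_clip(values, width, level):
    """Grayscale truncation: clip HU values to [level - width/2, level + width/2]."""
    low, high = level - width / 2, level + width / 2
    return [min(max(v, low), high) for v in values]


def zscore(values):
    """Z-score normalization x' = (x - mu) / sigma (population sigma assumed)."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant image: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def nearest_indices(size, spacing, new_spacing):
    """For one axis, map every output index to its nearest source index."""
    new_size = max(1, round(size * spacing / new_spacing))
    return [
        min(size - 1, int((i + 0.5) * new_spacing / spacing))
        for i in range(new_size)
    ]


def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Nearest-neighbour resampling of a nested-list volume indexed [z][y][x]."""
    zi = nearest_indices(len(volume), spacing[0], new_spacing[0])
    yi = nearest_indices(len(volume[0]), spacing[1], new_spacing[1])
    xi = nearest_indices(len(volume[0][0]), spacing[2], new_spacing[2])
    return [[[volume[z][y][x] for x in xi] for y in yi] for z in zi]


# Hypothetical 2 x 1 x 2 volume with 2 mm slice spacing, resampled to 1 mm.
demo = resample_nearest([[[0, 100]], [[200, 300]]], spacing=(2.0, 1.0, 1.0))
# Soft tissue window from the text: WW 400 HU, WL 40 HU.
clipped = window_clip([-1000, 40, 500], width=400, level=40)
```

With the soft tissue window, the clipping range works out to [-160 HU, 240 HU]. The order in which the three functions are chained in a real pipeline is not specified here and would need to follow the embodiment.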
[0076] Step 202: Input the CT image of the adrenal gland region into the initial segmentation model to obtain the predicted segmentation result.
[0077] During the model prediction phase, the trained model can quickly process newly input CT images and generate accurate segmentation results. The initial segmentation model introduces Mamba for the first time into the automatic segmentation of adrenal glands and adenomas, designing a Tri-directional Oriented Mamba (ToM) module to enhance the sequential modeling capability of volume data in three orthogonal directions: sagittal, coronal, and axial planes. Secondly, to improve the model's ability to model small targets and heterogeneous structures, this approach introduces a squeeze-and-excitation (SE) module.
[0078] With reference to Figure 3(a), in terms of overall architecture, the initial segmentation model adopts a U-Net-style encoder-decoder structure as the backbone network. This framework integrates the backbone layer, downsampling, upsampling, and squeeze-and-excitation modules to achieve feature extraction and reconstruction of 3D input data. The feature encoding part consists of a depthwise convolutional backbone layer and multiple ToM plus downsampling layers, aiming to efficiently learn a deep representation of the input data.
[0079] Specifically, the backbone layer uses a depthwise convolution with a kernel size of 7×7×7, padding of 3×3×3, and a stride of 2×2×2. For a given 3D input volume... (Where C represents the number of input channels), after processing by the backbone layer, the first-scale features are generated. Subsequently, these features are passed step-by-step to each ToM block and its corresponding downsampling layer for further processing. To enhance the model's ability to learn features of small targets, the SE module is embedded in different layers of the network. Finally, the decoding stage is implemented through multiple upsampling layers, leveraging cross-scale fusion to aggregate global and local feature information to obtain more accurate segmentation results. This design can more fully express the continuity and morphological consistency of adrenal glands and adenomas in space, significantly improving segmentation accuracy and robustness.
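As a small worked example of the backbone layer's geometry, the standard convolution output-size formula, floor((n + 2p − k) / s) + 1, can be applied to the stated kernel size of 7, padding of 3, and stride of 2. The input size used below (64 × 128 × 128) is a hypothetical value chosen for illustration and does not come from the embodiment.

```python
def conv_output_size(n, kernel=7, padding=3, stride=2):
    """Output length along one axis for a strided convolution."""
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical input volume of depth 64, height 128, width 128.
input_shape = (64, 128, 128)
output_shape = tuple(conv_output_size(n) for n in input_shape)
```

For even input sizes this configuration halves each spatial dimension, so the hypothetical input above would yield a 32 × 64 × 64 feature map per output channel.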
[0080] The tri-directional Mamba (ToM) module aims to model long-distance dependencies in the axial, coronal, and sagittal directions of volume data, achieving global spatial awareness. This allows local details and global dependency modeling to complement each other, effectively capturing the multi-scale structural features of organs and lesions, which is particularly important in segmentation tasks of small-volume heterogeneous structures such as adrenal glands and adenomas. Mamba originates from the state-space model and models long-distance dependencies through selection mechanisms and hardware-aware algorithms, simultaneously improving training and inference efficiency. Its structure is shown in Figure 3(b). The input sequence is processed through two linear projection branches: one branch performs convolutional transformation and state space modeling, combined with a nonlinear activation function, to extract rich local features and long-range dependencies; the other branch utilizes residual connections to ensure effective transmission of information flow, ultimately outputting the state representation of the sequence. This method can efficiently capture dependencies in long sequences. However, the original Mamba block only models global dependencies in a single direction, which is insufficient to handle the complexity of high-dimensional medical images. Therefore, to effectively capture global information in three-dimensional features, this embodiment introduces a three-way Mamba module to calculate feature dependencies from three different directions. Specifically, as shown in Figure 3(c), the 3D input features are flattened into three sequences, and corresponding feature interactions are performed to obtain fused 3D features. The calculation method can be summarized by the following formula:
[0081]
[0082] The Mamba layer is used to model global information within a sequence, with the symbols f, r, and s representing the forward, backward, and inter-slice unfolding operations, respectively. This three-way Mamba module design not only enhances the modeling ability of global dependencies in 3D medical images but also improves the recognition accuracy of complex structural features, especially for small-volume heterogeneous structures such as the adrenal gland and its adenoma.
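To make the three unfolding orders concrete, the sketch below flattens a small volume in forward, backward, and inter-slice order, passes each sequence through a sequence layer, and maps the results back to voxel positions. Several parts are assumptions for illustration only: `running_mean` is a simple causal scan standing in for a real Mamba layer, the element-wise sum used to fuse the three outputs is assumed rather than taken from the embodiment, and the volume is a single-channel nested list indexed [d][h][w].

```python
from itertools import product


def flatten_orders(depth, height, width):
    """Return the three visiting orders of voxel coordinates (d, h, w)."""
    forward = list(product(range(depth), range(height), range(width)))
    backward = forward[::-1]
    # Inter-slice: the slice index d varies fastest, so neighbours in the
    # sequence are the same (h, w) position on adjacent slices.
    inter_slice = [
        (d, h, w)
        for h, w, d in product(range(height), range(width), range(depth))
    ]
    return {"f": forward, "r": backward, "s": inter_slice}


def running_mean(seq):
    """Stand-in for a Mamba layer: output at step t depends on steps 0..t."""
    out, total = [], 0.0
    for t, value in enumerate(seq, start=1):
        total += value
        out.append(total / t)
    return out


def tom_block(volume, sequence_layer=running_mean):
    """Scan the volume in three orders and fuse the results voxel by voxel."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    fused = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    for order in flatten_orders(depth, height, width).values():
        seq = [volume[d][h][w] for d, h, w in order]
        for (d, h, w), value in zip(order, sequence_layer(seq)):
            fused[d][h][w] += value  # assumed fusion: element-wise sum
    return fused


# Hypothetical 2 x 1 x 2 single-channel volume.
fused = tom_block([[[1, 2]], [[3, 4]]])
```

Because each scan is causal along its own order, a voxel that appears late in the forward sequence appears early in the backward one, which is how scanning in several directions gives every voxel access to context from the whole volume.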
[0083] To further enhance the ability to identify and characterize small lesions such as adrenal adenomas, this embodiment introduces the Squeeze-and-Excitation (SE) attention mechanism. As shown in Figure 3(d), this module adaptively assigns different weights to each channel by explicitly modeling the channel dimension, thereby enhancing the responsiveness to key feature channels and suppressing irrelevant or redundant channels. Specifically, the SE module consists of two stages: the Squeeze stage compresses the spatial dimension information of each channel through global average pooling to extract global contextual information; the Excitation stage uses a two-layer fully connected network with non-linear activation functions such as ReLU and Sigmoid to learn the dependencies between channels and generate weight coefficients. These weights are then used to rescale the channels of the original feature map, achieving a more sensitive response and more prominent feature representation for small lesion regions, thereby enhancing the model's detection and segmentation performance for small lesions with inconspicuous morphological changes.
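The Squeeze and Excitation stages described above can be illustrated with a short sketch. It assumes each channel is given as a flattened list of values, omits bias terms for brevity, and uses hypothetical weight matrices `w1` and `w2`; in a trained network these would be learned parameters.

```python
import math


def se_reweight(channels, w1, w2):
    """Squeeze-and-excitation over a list of C flattened channels.

    w1 has shape (C/r) x C and w2 has shape C x (C/r), where r is the
    reduction ratio. Returns the rescaled channels and the channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in channels]
    # Excitation: fully connected -> ReLU -> fully connected -> Sigmoid.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1
    ]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value of a channel by that channel's weight.
    scaled = [[v * g for v in ch] for ch, g in zip(channels, weights)]
    return scaled, weights


# Hypothetical two-channel input with a reduction to one hidden unit.
scaled, weights = se_reweight(
    channels=[[1.0, 3.0], [0.0, 0.0]],
    w1=[[1.0, 1.0]],
    w2=[[1.0], [-1.0]],
)
```

In this toy configuration the first channel receives a weight above 0.5 and the second a weight below 0.5, showing how the module can emphasise one channel while suppressing another.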
[0084] During the calculation, the backbone layer uses depthwise convolution to extract the initial features of the CT image of the adrenal gland region, outputting a 48-channel feature map. The ToM layer flattens the 48-channel feature map into three sequences f, r, and s, and models global dependencies through Mamba layers to output ToM features. The SE layer optimizes the channel weights of the ToM features to obtain attention features. The downsampling layer reduces the feature map size of the attention features through strided convolution to obtain downsampled features. The upsampling layer restores the feature map size of the downsampled features through transposed convolution to obtain upsampled features. The output layer generates the predicted segmentation result based on the upsampled features.
[0085] In some embodiments, feature extractors can be designed separately for multiple channels of each lead. These extractors include various time-series signal feature extraction networks, including but not limited to one or more combinations of algorithms such as Transformer and its variants, Usleep networks, Utime networks, and ResNet-LSTM networks. In the multi-feature fusion stage, different methods such as cascaded fusion, weighted fusion, attention fusion, or graph convolution fusion can be selected according to the relationship between the task and the data.
[0086] Step 203: Calculate the loss between the labeled results and the predicted segmentation results using the combined loss function.
[0087] In medical image segmentation tasks, especially for scenarios with blurred boundaries, numerous small lesions, and imbalanced categories, such as adrenal glands and adenomas, a single loss function often struggles to simultaneously learn both global and local features.
[0088] Therefore, this invention combines Dice Loss, Boundary Loss, and Focal Loss to improve the model's overall accuracy, boundary detail, and sensitivity to small targets. Dice Loss measures the overlap between the predicted and ground truth regions; Boundary Loss effectively improves the detail segmentation of tumor or organ contours; and Focal Loss addresses class imbalance and the difficulty in learning small lesions. The total loss can be defined as:
[0089]
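A weighted sum of the three terms might be sketched as follows. Several parts are assumptions, not details from the embodiment: the weights in `weights` are hypothetical placeholders, the boundary term follows one common formulation (mean of predicted probability times a precomputed signed distance to the ground-truth boundary, negative inside the target), and predictions are given as a flat list of foreground probabilities.

```python
import math


def dice_loss(probs, target, eps=1e-6):
    """1 - Dice coefficient between soft predictions and a binary target."""
    intersection = sum(p * t for p, t in zip(probs, target))
    return 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(target) + eps)


def focal_loss(probs, target, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights voxels that are already well classified."""
    total = 0.0
    for p, t in zip(probs, target):
        pt = p if t == 1 else 1.0 - p      # probability of the true class
        pt = min(max(pt, eps), 1.0)        # clamp to avoid log(0)
        total += -((1.0 - pt) ** gamma) * math.log(pt)
    return total / len(probs)


def boundary_loss(probs, signed_distance):
    """Mean of probability x signed distance (negative inside the target)."""
    return sum(p * d for p, d in zip(probs, signed_distance)) / len(probs)


def combined_loss(probs, target, signed_distance, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of Dice, boundary, and focal terms (weights hypothetical)."""
    w_dice, w_boundary, w_focal = weights
    return (
        w_dice * dice_loss(probs, target)
        + w_boundary * boundary_loss(probs, signed_distance)
        + w_focal * focal_loss(probs, target)
    )


# Hypothetical two-voxel example: first voxel is foreground.
target = [1, 0]
distance = [-1.0, 1.0]
good = combined_loss([1.0, 0.0], target, distance)
bad = combined_loss([0.0, 1.0], target, distance)
```

In this toy case the correct prediction yields a lower combined value than the inverted one, which is the behaviour a training loop would rely on when adjusting the model parameters in Step 204.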
[0090] Step 204: Adjust the parameters of the initial segmentation model using the loss to obtain a Mamba-based segmentation model.
[0091] Based on actual clinical needs and the shortcomings of current technology, this embodiment proposes an automated measurement system for adrenal glands and adenomas based on deep learning. By deeply integrating deep learning with automated measurement technology, this system achieves efficient and accurate identification and quantitative analysis of adrenal gland structures and their lesion areas. The technical process is as follows: First, the raw CT image data is preprocessed, and the Region of Interest (ROI) is labeled to provide a high-quality data foundation for subsequent model training. Then, an end-to-end deep learning model is designed and constructed. By automatically learning multi-level deep features and combining them with an optimized loss function, high-precision segmentation and identification of adrenal glands and adenomas are achieved.
[0092] In this way, by continuously iterating and optimizing model parameters, the system can gradually improve its ability to analyze complex medical images, ensuring stable segmentation performance under different scanning conditions and case characteristics. The final optimized model not only supports automated measurement of basic parameters such as adrenal gland volume and morphology, but also generates visual reports through 3D reconstruction technology, providing intuitive quantitative evidence for clinical diagnosis and treatment. Compared with traditional methods, it significantly improves the detection rate of small lesions and the accuracy of boundary localization, especially demonstrating important clinical value in the early screening and postoperative follow-up of adrenal adenomas.
[0093] Referring further to Figure 4, a flowchart 400 of an embodiment of the Mamba-based adrenal gland segmentation method of the present invention is shown, which includes steps 401 to 402:
[0094] Step 401: Input the target CT image into the Mamba-based segmentation model to obtain the target prediction result.
[0095] The Mamba-based segmentation model is the model trained in the previous embodiment. During the model inference phase, this invention uses the trained gland and adenoma segmentation model to perform high-precision predictions on new 3D medical image data.
[0096] Specifically, the model parameters that perform best on the test dataset are saved and used for inference on new samples. After the model runs, it outputs the Region of Interest (ROI) of the target area (the gland and any adenoma within it), thereby localizing the lesion structure and delineating its boundary. The two embodiments shown in Figure 5 and Figure 6 demonstrate that the model can stably and accurately identify adrenal glands and their lesion areas (adenomas) of different shapes and locations, supporting its generalization ability and clinical applicability.
[0097] Step 402: Based on the target prediction results, extract various physiological parameters from the target CT image.
[0098] After obtaining the ROI segmentation results of glands and adenomas in CT images, an automatic calculation module for multiple physiological parameters can be integrated to achieve quantitative assessment of lesions. Key extracted indicators include volume, maximum cross-sectional area, and mean CT value. These parameters provide physicians with objective and quantitative imaging evidence, aiding in diagnosis and efficacy evaluation. For example, volume directly reflects changes in the size of the adrenal gland or adenoma, which is important for judging tumor growth trends or postoperative recurrence; maximum cross-sectional area helps assess the extent of lesion expansion in a specific plane; mean CT value provides information on tissue density, helping to differentiate between benign and malignant lesions or monitor tissue changes during treatment. By integrating these multi-dimensional physiological parameters, physicians can gain a more comprehensive understanding of lesion characteristics and develop more precise treatment plans. Furthermore, the system supports exporting extracted parameters in a structured format, facilitating integration with electronic medical record systems and enabling seamless flow and sharing of clinical data.
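The three indicators named above can be computed directly from a binary segmentation mask. The following is a minimal sketch under stated assumptions: the function name `measure_roi`, the `(z, y, x)` array layout, and the choice of the axial plane for the cross-sectional area are illustrative and are not specified by this embodiment.

```python
import numpy as np

def measure_roi(ct_values, mask, spacing_mm):
    """Compute volume, maximum axial cross-sectional area, and mean CT value.

    ct_values:  3D array of CT values, assumed to be indexed (z, y, x)
    mask:       3D binary array of the same shape (1 = gland or adenoma)
    spacing_mm: voxel spacing (dz, dy, dx) in millimetres
    """
    mask = mask.astype(bool)
    dz, dy, dx = spacing_mm

    # Volume = number of ROI voxels times the volume of one voxel.
    volume_mm3 = float(mask.sum() * dz * dy * dx)

    # Area of the ROI on each axial slice; keep the largest one.
    voxels_per_slice = mask.sum(axis=(1, 2))
    max_area_mm2 = float(voxels_per_slice.max() * dy * dx)

    # Mean CT value over ROI voxels only (NaN if the mask is empty).
    mean_ct = float(ct_values[mask].mean()) if mask.any() else float("nan")

    return {"volume_mm3": volume_mm3,
            "max_area_mm2": max_area_mm2,
            "mean_ct": mean_ct}
```

Returning a plain dictionary keeps the result easy to serialize to a structured format such as JSON for export to a medical record system.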
[0099] In the experiment, the system automatically measured 365 clinical cases, and the results were compared with the gold standard (manual annotation). The statistical results are shown in Table 1.
[0100] Table 1
[0101]
Method | Volume (mm³) - L | Volume (mm³) - R | Maximum cross-sectional area (mm²) - L | Maximum cross-sectional area (mm²) - R | CT value - L | CT value - R
Manual annotation | 5071.373 | 4167.762 | 236.674 | 231.707 | 58.310 | 61.256
AI results | 5159.396 | 4167.821 | 237.396 | 231.553 | 56.954 | 60.513
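The agreement reported in Table 1 can be expressed as a relative difference between the AI result and the manual annotation for each parameter. The sketch below uses the six pairs of values from Table 1; the dictionary keys and the helper name `relative_diff_percent` are illustrative.

```python
# Values copied from Table 1 (manual annotation vs. AI results).
manual = {"volume_L": 5071.373, "volume_R": 4167.762,
          "area_L": 236.674, "area_R": 231.707,
          "ct_L": 58.310, "ct_R": 61.256}
ai = {"volume_L": 5159.396, "volume_R": 4167.821,
      "area_L": 237.396, "area_R": 231.553,
      "ct_L": 56.954, "ct_R": 60.513}

def relative_diff_percent(reference, measured):
    """Signed difference of `measured` from `reference`, in percent."""
    return 100.0 * (measured - reference) / reference

diffs = {key: relative_diff_percent(manual[key], ai[key]) for key in manual}

for key, value in diffs.items():
    print(f"{key}: {value:+.2f}%")
```

A signed difference is used so that systematic over- or under-estimation by the model remains visible.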
[0102] Experiments show that this method has high accuracy and consistency in the measurement of key parameters. Taking adrenal gland volume as an example, its degree of hyperplasia is one of the important criteria for judging functional disorders (such as Cushing's syndrome or primary aldosteronism). Furthermore, in this embodiment, the system automatically measured adenoma data before and after treatment in 18 cases, and compared the two results. The statistical results are shown in Table 2.
[0103] Table 2
[0104]
Method | Volume (mm³) - L | Volume (mm³) - R | Maximum cross-sectional area (mm²) - L | Maximum cross-sectional area (mm²) - R | CT value - L | CT value - R
Manual annotation | 5071.373 | 4167.762 | 236.674 | 231.707 | 58.310 | 61.256
AI results | 5159.396 | 4167.821 | 237.396 | 231.553 | 56.954 | 60.513
[0105] Therefore, the experimental results show that this method has significant advantages and clinical application value in evaluating treatment efficacy.
[0106] In summary, this method not only significantly improves the accuracy and stability of adrenal gland and adenoma segmentation, but also provides reliable data support for clinical diagnosis, efficacy evaluation, and personalized treatment. Furthermore, in the multi-parameter automated quantification stage, the system automatically generates key physiological parameters, including volume, maximum cross-sectional area, and mean CT value, based on the segmentation results, achieving comprehensive digital assessment of the lesions. By providing standardized and automated morphological quantification data, the method improves the accuracy and objectivity of endocrine system disease diagnosis and treatment. It is of particular clinical value in the differential diagnosis of functional and non-functional lesions and in the dynamic monitoring of lesion changes before and after treatment. At the same time, this automated measurement method improves diagnostic efficiency, reduces the subjective errors of traditional manual measurement, and enhances data consistency and reproducibility. In the future, with the accumulation of more clinical data and further algorithm optimization, this system is expected to play a greater role in the precision diagnosis and treatment of endocrine diseases and related fields, further promoting the integration of radiomics and artificial intelligence technologies in clinical practice.
[0107] Referring to Figure 7, the Mamba-based adrenal gland segmentation model training device 700 of this embodiment includes: an acquisition module 701, a prediction module 702, a loss calculation module 703, and a parameter tuning module 704.
[0108] Among them, the acquisition module 701 is used to acquire CT images of the adrenal gland region and their annotation results;
[0109] Prediction module 702 is used to input the CT image of the adrenal region into the initial segmentation model to obtain the predicted segmentation result. The initial segmentation model includes a three-way Mamba module and an attention mechanism module.
[0110] The loss calculation module 703 is used to calculate the loss between the labeled results and the predicted segmentation results using the combined loss function;
[0111] The parameter tuning module 704 is used to adjust the parameters of the initial segmentation model using the loss to obtain a Mamba-based segmentation model.
[0112] As one possible implementation, the prediction module 702 includes:
[0113] A convolution unit, configured to extract the initial features of the CT image of the adrenal gland region through backbone-layer convolution and depthwise convolution, and to output a 48-channel feature map;
[0114] A ToM unit, configured to flatten, in the ToM layer, the 48-channel feature map into three sequences f, r, and s, and to model global dependencies over these sequences through the Mamba layer to output ToM features;
[0115] An SE unit, configured to optimize, in the SE layer, the channel weights of the ToM features to obtain attention features;
[0116] A downsampling unit, configured to reduce, in the downsampling layer, the feature map size of the attention features through strided convolution to obtain downsampled features;
[0117] An upsampling unit, configured to restore, in the upsampling layer, the feature map size of the downsampled features through transposed convolution to obtain upsampled features;
[0118] An output unit, configured to generate, in the output layer, the predicted segmentation result based on the upsampled features.
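The flattening and channel-reweighting steps in the units above can be sketched as follows. This is an illustration only. It assumes that the three sequences f, r, and s correspond to a forward scan, a reversed scan, and an inter-slice scan, as in published tri-orientated Mamba designs. This text does not define the three orderings. The Mamba layer itself is omitted, and the weight matrices `w1` and `w2` stand in for learned SE parameters.

```python
import numpy as np

def tri_orientated_sequences(feat):
    """Flatten a (C, D, H, W) feature map into three (L, C) token sequences.

    Assumed orderings (not specified in the source text):
      forward     - tokens ordered slice by slice (D, then H, then W)
      reverse     - the forward sequence read backwards
      inter_slice - the slice index D varies fastest, so neighbouring
                    tokens lie on adjacent slices
    """
    c = feat.shape[0]
    forward = feat.reshape(c, -1).T
    reverse = forward[::-1]
    inter_slice = feat.transpose(0, 2, 3, 1).reshape(c, -1).T
    return forward, reverse, inter_slice

def squeeze_excite(feat, w1, w2):
    """Squeeze-and-excitation channel reweighting on a (C, D, H, W) map.

    w1: (C // r, C) and w2: (C, C // r) are placeholder weight matrices.
    """
    squeezed = feat.mean(axis=(1, 2, 3))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)       # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate in (0, 1)
    return feat * gate[:, None, None, None]
```

Each of the three sequences would be passed through a sequence model and the outputs merged back into a volume before the SE step; that merging rule is likewise not given here.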
[0119] As one possible implementation, the Mamba-based adrenal gland segmentation model training device 700 further includes:
[0120] The normalization module is used to normalize the CT images of the adrenal region using the mean-variance standardization method;
[0121] And / or, a window width and window level adjustment module, used to perform grayscale truncation on the CT image of the adrenal gland region using a preset window width and window level;
[0122] And / or, a resampling module for resampling the CT image of the adrenal region to a uniform isovoxel resolution using nearest neighbor interpolation.
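The three preprocessing modules above can be sketched with NumPy alone. The function names, the target spacing of 1 mm, and the example window (level 40, width 400) used in the usage note are illustrative assumptions; this embodiment only states that a preset window and a uniform isotropic voxel resolution are used.

```python
import numpy as np

def apply_window(ct, level, width):
    """Grayscale truncation: clip CT values to [level - width/2, level + width/2]."""
    low, high = level - width / 2.0, level + width / 2.0
    return np.clip(ct, low, high)

def zscore_normalize(ct, eps=1e-8):
    """Mean-variance standardization to zero mean and unit variance."""
    return (ct - ct.mean()) / (ct.std() + eps)

def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Nearest-neighbour resampling of a 3D volume to `new_spacing` (mm)."""
    new_shape = [max(1, int(round(n * s / t)))
                 for n, s, t in zip(volume.shape, spacing, new_spacing)]
    # For each output voxel centre, pick the source voxel that contains it.
    index = [np.clip(np.floor((np.arange(m) + 0.5) * t / s).astype(int), 0, n - 1)
             for m, n, s, t in zip(new_shape, volume.shape, spacing, new_spacing)]
    return volume[np.ix_(*index)]
```

A possible order of use is `zscore_normalize(apply_window(resample_nearest(ct, spacing), 40, 400))`, so that normalization statistics are computed on the windowed, isotropic volume; the order is a design choice rather than something fixed by the text.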
[0123] As one possible implementation, the combined loss function is a loss function that combines Dice loss, boundary loss, and small lesion focusing loss.
[0124] As one possible implementation, the annotation result is obtained through a two-stage annotation operation, which includes:
[0125] Obtaining an initial labeled dataset, which consists of CT images of the adrenal gland region in which the glands are labeled in the transverse, sagittal, or coronal plane;
[0126] Inputting the initial labeled dataset into the initial segmentation model to obtain auxiliary segmentation results;
[0127] Obtaining the annotation result set, wherein each annotation result is obtained by manual annotation based on the auxiliary segmentation result.
[0128] Referring to Figure 8, the Mamba-based adrenal gland segmentation device 800 of this embodiment includes:
[0129] The target prediction module 801 is used to input the target CT image into a Mamba-based segmentation model to obtain the target prediction result. The Mamba-based segmentation model is trained by the method described in claims 1-5.
[0130] Referring now to Figure 9, a schematic structural diagram of a computer 900 suitable for implementing the electronic device of the present invention is shown. The computer 900 shown in Figure 9 is merely an example and should not be construed as limiting the functionality and scope of the embodiments of the present invention.
[0131] As shown in Figure 9, the computer 900 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 901, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the computer 900. The processing device 901, ROM 902, and RAM 903 are interconnected via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
[0132] Typically, the following devices can be connected to the I/O interface 905: input devices 906 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, etc.; output devices 907 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 908 including, for example, magnetic tapes, hard disks, etc.; and communication devices 909. The communication device 909 allows the computer 900 to communicate with other devices, wirelessly or by wire, to exchange data. Although Figure 9 shows a computer 900 with various devices, it should be understood that not all of the devices shown need to be implemented or present. More or fewer devices may alternatively be implemented or present.
[0133] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device 909, or installed from a storage device 908, or installed from a ROM 902. When the computer program is executed by a processing device 901, it performs the functions defined in the methods of the embodiments of the present invention.
[0134] It should be noted that the computer-readable medium described above in this invention can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this invention, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In this invention, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wires, optical fibers, RF (radio frequency), etc., or any suitable combination thereof.
[0135] The aforementioned computer-readable medium may be included in the aforementioned electronic device, or it may exist independently without being assembled into the electronic device.
[0136] The aforementioned computer-readable medium carries one or more programs, which, when executed by the electronic device, cause the electronic device to implement the methods shown in the above embodiments and their optional implementations.
[0137] Computer program code for performing the operations of this invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or cloud server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0138] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, may be implemented using dedicated hardware-based implementations that perform the specified functions or operations, or using a combination of dedicated hardware and computer instructions.
[0139] The units or modules described in the embodiments of the present invention can be implemented in software or hardware. In some cases, the name of a unit or module does not constitute a limitation on the unit or module itself.
[0140] The above description is merely a preferred embodiment of the present invention and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of disclosure in this invention is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the above-disclosed concept. For example, it also covers technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) this invention.
Claims
1. A training method for a Mamba-based adrenal gland segmentation model, characterized in that the method includes: obtaining CT images of the adrenal gland region and their annotation results; inputting the CT image of the adrenal gland region into an initial segmentation model to obtain a predicted segmentation result, wherein the initial segmentation model includes a three-way Mamba module and an attention mechanism module; calculating the loss between the annotation result and the predicted segmentation result using a combined loss function; and adjusting the parameters of the initial segmentation model using the loss to obtain a Mamba-based segmentation model.
2. The method according to claim 1, characterized in that the step of inputting the CT image of the adrenal gland region into the initial segmentation model to obtain the predicted segmentation result includes: the backbone layer uses depthwise convolution to extract the initial features of the CT image of the adrenal gland region and outputs a 48-channel feature map; the ToM layer flattens the 48-channel feature map into three sequences f, r, and s, and models global dependencies through the Mamba layer to output ToM features; the SE layer performs channel weight optimization on the ToM features to obtain attention features; the downsampling layer reduces the feature map size of the attention features through strided convolution to obtain downsampled features; the upsampling layer restores the feature map size of the downsampled features through transposed convolution to obtain upsampled features; and the output layer generates the predicted segmentation result based on the upsampled features.
3. The method according to claim 1, characterized in that the method further includes: normalizing the CT images of the adrenal gland region using the mean-variance standardization method; and/or performing grayscale truncation on the CT image of the adrenal gland region using a preset window width and window level; and/or resampling the CT images of the adrenal gland region to a uniform isotropic voxel resolution using nearest neighbor interpolation.
4. The method according to claim 1, characterized in that the combined loss function is a loss function that combines Dice loss, boundary loss, and small lesion focusing loss.
5. The method according to claim 1, characterized in that the annotation results are obtained through a two-stage annotation operation, which includes: obtaining an initial labeled dataset, which consists of CT images of the adrenal gland region in which the glands are labeled in the transverse, sagittal, or coronal plane; inputting the initial labeled dataset into the initial segmentation model to obtain auxiliary segmentation results; and obtaining the annotation result set, wherein each annotation result is obtained by manual annotation based on the auxiliary segmentation result.
6. A Mamba-based adrenal gland segmentation method, characterized in that the method includes: inputting the target CT image into a Mamba-based segmentation model to obtain the target prediction result, wherein the Mamba-based segmentation model is trained by the method described in any one of claims 1-5.
7. A training device for a Mamba-based adrenal gland segmentation model, characterized in that the device includes: an acquisition module, used to acquire CT images of the adrenal gland region and their annotation results; a prediction module, used to input the CT image of the adrenal gland region into the initial segmentation model to obtain the predicted segmentation result, wherein the initial segmentation model includes a three-way Mamba module and an attention mechanism module; a loss calculation module, used to calculate the loss between the annotation results and the predicted segmentation results using the combined loss function; and a parameter tuning module, used to adjust the parameters of the initial segmentation model using the loss to obtain a Mamba-based segmentation model.
8. A Mamba-based adrenal gland segmentation device, characterized in that the device includes: a target prediction module, used to input the target CT image into a Mamba-based segmentation model to obtain the target prediction result, wherein the Mamba-based segmentation model is trained by the method described in any one of claims 1-5.
9. An electronic device, characterized in that it includes: one or more processors; and a storage device on which one or more programs are stored, wherein, when the one or more programs are executed by the one or more processors, the one or more programs cause the one or more processors to implement the method as described in any one of claims 1-5.
10. A readable storage medium having executable instructions stored thereon, characterized in that, when the executable instructions are executed by a processor, they implement the method as described in any one of claims 1-5.