Rectal tumor segmentation method based on interpretability

By using Grad-CAM and a sliding window strategy to select valuable slices and retrain the model, we have solved the problem of the lack of interpretability of deep learning models in rectal cancer diagnosis, improved the accuracy and interpretability of rectal cancer tumor segmentation, and achieved more efficient medical image segmentation.

CN118397019B · Active · Publication Date: 2026-06-19 · UNIV OF ELECTRONICS SCI & TECH OF CHINA

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Filing Date
2024-05-11
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing deep learning medical image segmentation models lack interpretability in rectal cancer diagnosis, making it difficult to provide detailed explanations and improvements. They also have insufficient segmentation accuracy, especially in images with significant morphological variations.

Method used

We employ an interpretability analysis method based on Grad-CAM, using a sliding window strategy and a threshold combining specificity and the Dice coefficient to select valuable image slices. A structural similarity index is then used to reduce redundancy among the selected slices. Finally, the model is retrained on these slices, so that interpretability methods combined with custom slice selection rules improve the model's segmentation performance for rectal cancer tumors.

Benefits of technology

While maintaining the model structure, the accuracy and interpretability of rectal cancer tumor segmentation were significantly improved, the model's ability to segment regions of less interest was enhanced, and the overall segmentation accuracy and interpretability analysis capabilities were improved.



Abstract

This invention discloses an interpretability-based method for rectal cancer tumor segmentation, belonging to the field of medical image processing technology. The invention uses a sliding window method based on weak feature slicing to extract regions that receive insufficient attention during the information flow within the model. It then uses a threshold based on a combination of specificity and the Dice coefficient to filter the obtained slices, and finally uses a structural similarity index to further reduce the number of slices. Retraining with these slices together with the images from the original dataset is equivalent to using a magnifying glass to enlarge the details of poorly segmented areas, thereby strengthening the areas that were weak in the initial training and improving the model's segmentation accuracy by means of an interpretability method. This invention provides interpretability analysis for deep segmentation models and uses that analysis to enhance the segmentation capability of the original model while preserving the original model structure.

Description

Technical Field

[0001] This invention belongs to the field of medical image processing technology, specifically relating to a method for rectal cancer tumor segmentation and interpretability analysis based on convolutional semantic segmentation networks.

Background Technology

[0002] According to the latest global cancer statistics report, colorectal cancer has become the third most common cancer worldwide. China is one of the countries with a high incidence of colorectal cancer: according to data from the National Cancer Center, it recorded 550,000 new cases in 2020, ranking second among malignant tumors in terms of diagnosed cases, behind only lung cancer. Notably, the total number of deaths from colorectal cancer reached 280,000, ranking fifth among malignant tumor deaths.

[0003] Rectal cancer is clinically defined as a malignant tumor extending from the dentate line to the rectosigmoid junction. Its unique characteristic is its long progression period; many patients are diagnosed at an advanced stage, resulting in a staggering mortality rate of up to 86%. Besides genetic factors, the causes of rectal cancer are closely related to controllable factors in modern life, such as irregular eating habits and lack of exercise.

[0004] However, faced with this threat, modern society generally suffers from insufficient awareness of cancer prevention and detection. Coupled with the uneven distribution of medical resources, this results in most patients being diagnosed at middle or late stages, missing the optimal window for early treatment. Therefore, utilizing relevant medical imaging processing technologies for early diagnosis and screening of rectal cancer is particularly urgent and of significant practical importance.

[0005] Traditional medical image segmentation typically relies on manual methods, involving manually delineating lesion areas to obtain key features such as shape, size, location, and distribution. This helps distinguish lesion areas from normal organs and tissues, providing crucial information for medical diagnosis and establishing a diagnostic basis. However, this manual process is time-consuming, labor-intensive, highly dependent on physician experience, and inherently uncertain, leading to differing segmentation decisions for the same lesion area among different physicians.

[0006] To overcome this challenge, researchers have introduced artificial intelligence (AI) technology into the field of medical image segmentation, conducting more in-depth and extensive research. This primarily includes traditional machine learning-based and deep learning-based medical image segmentation methods. For rectal cancer images with significant morphological variations, deep learning technology can directly extract feature information from the input image, learning and understanding complex features in medical images, thereby achieving efficient and accurate segmentation. Therefore, compared to traditional methods, deep learning-based medical image segmentation methods are considered a superior choice.

[0007] In recent years, deep learning-based image processing methods have made significant progress in multiple fields. However, the modeling process of a deep learning model is generally considered a "black box": even when its predictions are correct, it is difficult to explain in detail how they were reached. Consequently, when the model exhibits prediction bias, it is difficult to analyze and improve it systematically. In the field of image segmentation, the processing flow of deep neural networks typically includes a series of convolutions, pooling operations, activations, and fully connected mappings. Because the extracted features are high-dimensional, questions about the meaning of deep features, the weights of operators, and the reasons behind pixel classification are still being explored. Especially in fields with low fault tolerance and high risk, such as medical decision-making and autonomous driving, the black-box nature of deep neural networks makes them difficult to apply directly. Overall, using deep neural networks to identify and analyze colorectal pathology images while providing interpretable features to explain the prediction results can not only fundamentally improve the classification performance of deep convolutional neural networks (CNNs), but also provide internal explanations for the model, improve its generalization ability, and enable artificial intelligence to better serve society.

[0008] However, current development and application of deep learning models in medical image processing primarily focus on improving the accuracy of output results, while the clinical and knowledge value of model interpretability has not received sufficient attention. Constrained by factors such as patient privacy, ethnicity, and individual differences, medical images lack authoritative, unified, and sufficient sample databases. This makes the development of interpretable deep learning models crucial for correcting systematic biases in sample selection. Simultaneously, interpretable deep models can extract features in high-dimensional medical images that are easily overlooked by humans, providing valuable knowledge feedback for medical imaging. This not only helps improve the reliability and generalization ability of models but also promotes a deeper understanding of medical images. In deep learning medical image processing, emphasizing model interpretability plays a positive role in advancing the field.

Summary of the Invention

[0009] This invention discloses an interpretability-based method for rectal cancer tumor segmentation, enabling interpretable analysis of medical segmentation models, improving the model's segmentation accuracy for rectal cancer tumor lesion regions, and thus enhancing the overall segmentation accuracy of the model.

[0010] The technical solution adopted in this invention is as follows:

[0011] An interpretability-based method for rectal cancer tumor segmentation, comprising the following steps:

[0012] Step 1: Perform initial training of the rectal cancer tumor segmentation network model based on the training dataset to obtain the initial model;

[0013] The training dataset or a portion of the training dataset is used for forward inference on the initial model, saving the output feature map of each sample in the corresponding dataset at a specified layer of the initial model, as well as the segmentation result output by the initial model;

[0014] Step 2: Based on the set feature map weights, perform weighted fusion on all output feature maps of the specified layer to obtain a class activation map. This class activation map is then processed by an activation function to obtain an attention heatmap (i.e., an interpretable heatmap). This attention heatmap can represent the model's attention to each region (pixel). Different colors can be used to distinguish the level of attention, for example, red represents that the model pays more attention to the region, while blue represents that the model pays less attention to the region. By using color, the model's attention to each region for different categories can be visualized, providing a qualitative and interpretable decision-making process.

[0015] Step 3: Perform slice filtering on the attention heatmap;

[0016] A sliding window strategy is adopted in which adjacent windows partially overlap as the window moves. At each window position, the specificity and the Dice coefficient between the interpretability heatmap and the corresponding label within the current window are calculated, and their weighted sum is used as the metric of the candidate slice corresponding to that window. Candidate slices with a metric value greater than or equal to the screening threshold are retained. Redundancy removal is then performed on the candidate slices based on the structural similarity between them to obtain the slice screening results.
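The per-window metric described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the equal weights `w_spec` and `w_dice`, and the binarization threshold `bin_thr` used to turn the heatmap into a mask are all assumptions, since the text does not give these values.

```python
import numpy as np

def specificity(pred: np.ndarray, label: np.ndarray) -> float:
    """TN / (TN + FP) for two boolean masks."""
    tn = np.sum(~pred & ~label)
    fp = np.sum(pred & ~label)
    return float(tn / (tn + fp)) if (tn + fp) > 0 else 1.0

def dice(pred: np.ndarray, label: np.ndarray) -> float:
    """2 * |A ∩ B| / (|A| + |B|) for two boolean masks."""
    inter = np.sum(pred & label)
    total = pred.sum() + label.sum()
    return float(2 * inter / total) if total > 0 else 1.0

def window_metric(heat_win, label_win, w_spec=0.5, w_dice=0.5, bin_thr=0.5):
    """Weighted sum of specificity and Dice inside one window.

    heat_win is a heatmap crop scaled to [0, 1]; it is binarized at
    bin_thr (an assumed value) before being compared with the label crop.
    """
    pred = np.asarray(heat_win) >= bin_thr
    lab = np.asarray(label_win).astype(bool)
    return w_spec * specificity(pred, lab) + w_dice * dice(pred, lab)
```

A window whose metric is greater than or equal to the screening threshold would then be kept as a candidate slice.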

[0017] In step 3 of this invention, the partial overlap between adjacent windows is retained to obtain richer contextual information for each image patch. Candidates selected based on specificity and the Dice coefficient are further evaluated using a structural similarity index to reduce redundancy between slices, so that slices with low redundancy and high value are selected. Step 3 thus filters valuable slices from the interpretable attention heatmap according to predefined custom rules.

[0018] Step 4: Adjust the image size of the selected slices to match the image size of the model input, that is, adjust the slice size to the image size in the training dataset; construct a new training dataset based on all the adjusted slices and the initial training dataset, and retrain the initial model based on the new training dataset to obtain the trained rectal cancer tumor segmentation model.

[0019] After image slices with low repetition and high value are obtained in step 3, their image sizes are adjusted to match the original training set images. These slices are then combined with the original dataset to form a new training set, on which the initially trained model is retrained. This improves the model's segmentation accuracy for regions that received insufficient attention while preserving the model structure, and provides an interpretable way to explain the model's decision-making process. During retraining, this invention fine-tunes the initial model, so a smaller learning rate and fewer training epochs can be used, and no additional data augmentation is required. Retraining ends when the model's loss has converged. At this point, because the model has been trained on both the full images from the original training dataset and the resized image slices that show finer detail, it exhibits good segmentation performance at both the overall and the detailed level. While maintaining the original model's overall network structure, the retraining strategy, which combines interpretability methods with custom slice selection rules, improves the model's segmentation performance for rectal cancer tumors.
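The construction of the new training set in step 4 might look like the sketch below. The names `resize_nearest` and `build_retraining_set` are hypothetical, and nearest-neighbor interpolation is an assumption chosen because it keeps mask labels binary; the text does not state which interpolation is used.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return img[rows[:, None], cols]

def build_retraining_set(originals, slices, out_h, out_w):
    """Combine original (image, mask) pairs with resized slice pairs.

    originals are assumed to already have size (out_h, out_w);
    each slice and its mask are enlarged to that size.
    """
    resized = [
        (resize_nearest(img, out_h, out_w), resize_nearest(mask, out_h, out_w))
        for img, mask in slices
    ]
    return list(originals) + resized
```

Enlarging a half-size slice to the full input size is what produces the "magnifying glass" effect: the same tumor region occupies more pixels in the retraining sample.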

[0020] Furthermore, the present invention also includes step 5, which involves testing the rectal cancer tumor segmentation model (final model) trained in step 4 using a proprietary dataset:

[0021] Convert the data format of the proprietary dataset to be the same as that of the training dataset;

[0022] The proprietary dataset differs from the training data used during training in terms of rectal cancer morphology and background features. By using the final model on the proprietary dataset for inference and comparing it with the corresponding labels and the segmentation results output by the initial model, it was found that the segmentation model combining the interpretable method and the custom slice selection rules with the retraining strategy has a better segmentation effect on rectal cancer tumors, and is more accurate in segmenting the shape and edge of rectal cancer tumors. This shows that the method proposed in this invention has good performance in rectal cancer tumor segmentation.

[0023] Further, in step 1, the training dataset is obtained by: acquiring initial training data from a public dataset, performing data augmentation on the data, and obtaining the initial training dataset based on the training data before and after data augmentation.

[0024] In this invention, the rectal cancer tumor segmentation model is a segmentation network based on a convolutional neural network, and is not limited to the U-Net model; other network structures are also acceptable. The selection of the designated layer in the rectal cancer tumor segmentation model depends primarily on its location; typically, the middle layer of the model containing the richest semantic information is selected. For example, for the U-Net model, the transition unit between the encoder and decoder can be selected as the designated layer for interpretability analysis. By obtaining the feature map output by the designated layer, a final class activation map is obtained by performing a weighted summation of the feature maps. Finally, an activation function (such as ReLU) is used to segment the regions that have a positive impact on the specified class. This region can effectively improve the output probability of the final fully connected layer for the specified class. Then, the attention heatmap is upsampled and normalized to obtain a semantically coarser heatmap. Coloring this heatmap allows visualization of the model's attention level to each region for different classes, providing a qualitative and interpretable decision-making process.

[0025] In step 3, when the sliding window strategy is used to obtain candidate slices, the region of interest is delineated first. An all-zero matrix K with the same height and width as the image is created. For pixels labeled as rectal cancer tumor, K is assigned the corresponding grayscale values from the interpretable heatmap; these grayscale values represent each pixel's contribution to rectal cancer tumor segmentation. Pixels in non-tumor regions remain zero in K. With K as the reference matrix, a sliding window strategy is then applied. The step size of the sliding window is chosen so that adjacent windows overlap, which ensures richer contextual information in each image slice. The specificity and Dice coefficient between the attention heatmap of matrix K within the window region and the corresponding label are calculated, and a threshold based on the sum of the specificity and the Dice coefficient is set for filtering: image slices and their corresponding segmentation masks are retained when the sum of their specificity and Dice coefficient exceeds the sum of the average specificity and average Dice coefficient on the training dataset. Finally, to further eliminate duplicate slices, a structural similarity index is calculated between each pair of image slices. The threshold is set to the average of all pairwise structural similarities to filter out half of the redundant slices, thereby retaining image slices with less repetition and greater value.
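The final redundancy-removal step can be illustrated with the sketch below. Two points are assumptions rather than details from the text: the structural similarity is computed here as a single global SSIM over the whole slice (library implementations usually average over local windows), and redundant slices are dropped with a simple greedy rule that keeps a slice only if its similarity to every slice already kept does not exceed the mean pairwise similarity.

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM between two equally sized grayscale images."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def drop_redundant(slices):
    """Return indices of slices kept after SSIM-based de-duplication."""
    n = len(slices)
    if n < 2:
        return list(range(n))
    sim = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = global_ssim(slices[i], slices[j])
    # Threshold: mean of all pairwise similarities (upper triangle only).
    thr = sim[np.triu_indices(n, k=1)].mean()
    kept = []
    for i in range(n):
        if all(sim[i, j] <= thr for j in kept):
            kept.append(i)
    return kept
```

With this rule, an exact duplicate of an already kept slice is always discarded, because its similarity of 1 lies above the mean whenever any other pair is less than identical.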

[0026] Furthermore, in step 2, the feature map weights are set as follows:

[0027] Define $\varphi^k_{ij}$ as the pixel value at pixel position (i,j) of the output feature map corresponding to the k-th channel of the specified layer. The size of the output feature map for each channel is H×W;

[0028] Feature map weights for each channel's output feature map are obtained using gradient-weighted class activation mapping:

[0029] $$\alpha_k^c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial p^c}{\partial \varphi^k_{ij}}$$

[0030] where $\alpha_k^c$ represents the feature map weight of the k-th channel under category c, and $p^c$ represents the probability score, output by the initial model, of the current sample image being classified into category c. In this invention, the classification categories include two types: background and rectal cancer tumor. Here, category c refers specifically to the rectal cancer tumor class.

[0031] The technical solution provided by this invention brings at least the following beneficial effects:

[0032] This invention uses the Grad-CAM (Gradient-weighted Class Activation Mapping) method to analyze the information flow within the model, obtaining a heatmap that represents regional attention. A sliding window strategy is then used to process the heatmap and obtain multiple candidate image slices, and a threshold based on specificity and the Dice coefficient is used to filter the candidates. Finally, a structural similarity index is used to ensure that the retained slices differ significantly from one another, so that only a small number of valuable slices is kept. To improve the segmentation performance of the deep segmentation model using interpretability theory, the slices obtained through the above process are mixed with the original data for retraining. This improves the model's segmentation accuracy for rectal cancer tumor lesions while preserving the original model structure, thereby enhancing the overall segmentation accuracy of the model.

Description of the Drawings

[0033] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0034] Figure 1 is a schematic diagram of the overall architecture for rectal cancer tumor segmentation according to an embodiment of the present invention;

[0035] Figure 2 shows example images of rectal cancer tumors with various morphologies in the dataset;

[0036] Figure 3 is an example diagram illustrating the use of the automatic data augmentation strategy;

[0037] Figure 4 shows example test results of the initial UNet model on the Teddy Cup dataset;

[0038] Figure 5 is an example diagram of an interpretable heatmap obtained through the interpretability method;

[0039] Figure 6 shows example image slices and their mask labels obtained after filtering with the custom valuable-slice strategy;

[0040] Figure 7 is an example diagram comparing the test results of the retrained model and the initial model on the Teddy Cup dataset;

[0041] Figure 8 is an example of the change in the model's attention to the rectal cancer tumor target region after retraining;

[0042] Figure 9 shows example images from part of the proprietary dataset;

[0043] Figure 10 is an example diagram comparing the test results of the retrained model and the initial model on the proprietary dataset.

Detailed Implementation

[0044] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be described in detail and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application, and not all of them. Generally, the components of the embodiments of the present invention described and shown in the accompanying drawings can be arranged and designed using different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the present invention.

[0045] Gradient-weighted class activation mapping (Grad-CAM) is a classic post-hoc interpretation method in deep learning-based medical image processing. It generates interpretable heatmaps by combining individual samples with their resulting feature maps. Although the generated heatmaps are coarse-grained, Grad-CAM can be combined with backpropagation-based visualization interpretation methods to produce high-resolution, pixel-level interpretable maps with clear semantics. This method is simple and intuitive, and is suitable for models in various tasks, including medical image classification, image understanding, and visual question answering.

[0046] To make use of the information provided by the Grad-CAM interpretability approach, this embodiment of the invention employs a sliding window method based on weak feature slicing to extract regions that receive insufficient attention during the information flow within the model. The obtained slices are further filtered using a threshold based on a combination of specificity and the Dice coefficient. Finally, a structural similarity index is used to further reduce the number of slices. Retraining with the slices and the images from the original dataset is akin to using a magnifying glass to enlarge the details of poorly segmented areas, allowing the model to strengthen the areas where the initial training was weak. In this way, interpretability methods are used to improve the model's segmentation accuracy. Compared to traditional deep learning-based methods, the interpretability-based rectal cancer tumor segmentation method proposed in this embodiment provides interpretability analysis of deep segmentation models and enhances the original model's segmentation capability while preserving its structure.

[0047] The interpretability-based rectal cancer tumor segmentation method provided in this invention is mainly applicable to deep learning segmentation networks based on convolutional neural networks, and is not limited to the UNet model. For ease of description, this invention uses the classic medical image segmentation model UNet as an example. Referring to Figure 1, the segmentation method provided in this embodiment of the invention first divides the original dataset into training, validation, and test sets, for example in a 7:1:2 ratio, and then trains an initial CNN-based deep segmentation network to obtain an initial UNet segmentation model. In Figure 1, $x_{EN}$ and $x_{DE}$ represent the output features of the encoder and decoder, respectively, with superscripts used to distinguish the layer numbers. Next, the interpretability stage begins, using the Grad-CAM method to analyze the information flow within the model. This involves selecting the layer in the network's encoder that contains the richest semantic information (generally the last convolutional layer in the encoder), computing a weighted sum of its feature maps, and then applying an activation function to obtain an attention heatmap. This heatmap represents the model's attention to each region (pixel). In the colored heatmap, red indicates high model attention to a region, while blue indicates low attention. After the attention heatmap is obtained, the slice selection stage begins, focusing on selecting valuable slices. First, a sliding window strategy is used, in which adjacent windows overlap as the window moves so that each image patch carries richer contextual information. The sum of the specificity and the Dice coefficient between the interpretability heatmap and the corresponding label within the window is calculated, and the number of slices is controlled by a threshold on this combined value. Then, a structural similarity index is used to reduce the redundancy between slices, selecting slices with low redundancy and high value. Finally, the slices are resized and combined with the original training set to form a new training set, on which the CNN-based deep segmentation network is retrained. This improves the model's segmentation accuracy for regions that received insufficient attention while preserving the model structure. At the same time, the interpretability method exposes the model's decision-making process.
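As an illustration of the 7:1:2 partition mentioned above, a minimal sketch is given below. The function name `split_dataset`, the fixed random seed, and the choice to shuffle individual items are assumptions; the text does not say whether the split is made per image or per case.

```python
import random

def split_dataset(items, ratios=(7, 1, 2), seed=0):
    """Shuffle items and split them into train / validation / test parts.

    ratios are integer proportions; any remainder from integer division
    goes to the test part.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

If several images come from the same patient, splitting by case identifier instead of by image would keep one patient's images out of both the training and test parts; that variant only requires passing case identifiers as `items`.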

[0048] As one possible implementation, the specific steps of an interpretable rectal cancer tumor segmentation method provided in this embodiment of the invention include:

[0049] Step 1: Dataset processing, partitioning, and initial training of the UNet model

[0050] This embodiment collects publicly available datasets related to rectal cancer tumor segmentation from the internet. Specifically, it collects the original files of the 7th Teddy Cup dataset and combines them with a proprietary dataset to test the effectiveness of the rectal tumor segmentation method of this invention. Since the Teddy Cup dataset collected from the internet is in DICOM format with corresponding PNG labels, and this embodiment primarily performs 2D medical image segmentation, it is necessary to convert the CT images and corresponding labels to PNG format for easier subsequent operations. The converted images differ from patient to patient, and the rectal cancer tumors vary in morphology, as shown in Figure 2. Furthermore, the Teddy Cup dataset contains 107 cases with 3029 images and their corresponding labels, which is relatively small overall, and many images have empty label masks. Because the original dataset is not partitioned, it needs to be partitioned for neural network training. In this embodiment, the dataset is divided into a training set, validation set, and test set in a 7:1:2 ratio. Since the original dataset has an insufficient number of images, data augmentation is needed to expand it. This embodiment uses common medical image data augmentation strategies to process the images. The overall strategy uses automatic data augmentation techniques, which combine and arrange several data augmentation operations. This embodiment uses four strategies, each randomly selecting operations from the pixel domain and the spatial domain for data augmentation. Pixel domain augmentation includes operations such as brightness, contrast, posterization, sharpening, Gaussian blur, and Gaussian noise. Spatial domain augmentation includes operations such as random rotation, horizontal flip, vertical flip, random scaling, translation, and shearing transformation. Examples of image augmentation are shown in Figure 3. The image in the first row, first column is the original image. The image in the first row, second column has undergone random rotation and brightness transformation. In the second row, the image in the first column has undergone horizontal flipping, translation, and Gaussian noise transformation, and the image in the second column has undergone random rotation and Gaussian noise transformation. It can be seen that this automatic data augmentation strategy enhances data diversity without affecting the normal segmentation of rectal cancer tumors. After these steps, the augmented images are combined with the original dataset to obtain the expanded dataset. The hardware used in this embodiment mainly included an Intel 13700F CPU, an NVIDIA GeForce 4090 GPU with 24 GB of memory, and 32 GB of RAM. The software configuration was as follows: Ubuntu 22.04, Python 3.8.17, and the deep learning framework PyTorch 2.0. No pre-trained models were loaded during model training. The network model optimizer was Adam, with an initial learning rate of 5e-4, a learning rate update strategy of ReduceLROnPlateau, a batch size of 8, and 100 training epochs. Gradient clipping was used to prevent gradient explosion, and the model's loss function was a combination of binary cross-entropy and Dice loss. An initial UNet model was obtained by training in the above environment. An example of using the initial UNet model for rectal cancer tumor segmentation is shown in Figure 4. The first row is the original image, the second row is the result obtained from testing the initial UNet model, and the third row is the mask label corresponding to the original image.
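The loss described above, a combination of binary cross-entropy and Dice loss, can be written out as in the following sketch. It is given in NumPy for clarity rather than in the PyTorch framework used in the embodiment, and the equal weights `w_bce` and `w_dice` and the smoothing constant `eps` are assumptions, since the text only states that the two terms are combined.

```python
import numpy as np

def bce_dice_loss(prob, target, w_bce=0.5, w_dice=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    prob holds predicted tumor probabilities in [0, 1];
    target holds the binary mask label of the same shape.
    """
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    # Pixel-wise binary cross-entropy, averaged over the image.
    bce = -np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    # Soft Dice coefficient; eps keeps the ratio defined for empty masks.
    inter = np.sum(prob * target)
    dice = (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)
    return float(w_bce * bce + w_dice * (1.0 - dice))
```

The cross-entropy term penalizes every pixel equally, while the Dice term depends on the overlap with the tumor region, which matters here because many images have small or empty masks.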

[0051] Step 2: Perform interpretability analysis on the initial UNet model.

[0052] After the trained UNet model is obtained, the transition unit between the encoder and the decoder of UNet is selected as the designated layer for interpretability analysis. For other CNN-based deep learning segmentation networks, the designated layer is likewise selected mainly by its position (e.g., the fourth downsampling layer of the encoder for UNet3plus, and the fourth layer of the backbone network for Deeplabv3). In this embodiment, the general selection principle is to choose the layer between the encoder and the decoder, because this position is the last link in the information flow of the network's feature extraction process, which allows a more accurate analysis of the information flow within the network. The feature maps output by the designated layer of the model are then obtained. Let $p^c$ denote the probability score with which the network output is classified into class c, and let $\varphi^k_{ij}$ denote the pixel value at position (i, j) of the k-th feature map output by the last convolutional layer of the designated layer; each feature map has size H×W. Grad-CAM backpropagates the gradient to compute the partial derivative of $p^c$ with respect to each pixel of the feature map, in order to identify the pixels in the feature map that have the greatest impact on the final classification decision.

[0053] After the above steps, the gradient at each pixel of the feature map is obtained. The gradients over all pixels of the k-th feature map are then averaged to obtain the weight $\alpha_k^c$ of that feature map with respect to class c. The calculation is shown in the following formula:

[0054] $$\alpha_k^c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial p^c}{\partial \varphi^k_{ij}}$$

[0055] The obtained feature map weights are then used to compute a weighted sum of the feature maps, which yields the class activation map.

[0056] Finally, the ReLU activation function is applied to retain the regions that have a positive impact on the specified category and to suppress the regions that have a negative impact on the classification result, giving the final interpretability heatmap $L^c$ (the final class activation map). The calculation formula is as follows:

[0057] $$L^c = \mathrm{ReLU}\left(\sum_{k} \alpha_k^c \, \varphi^k\right)$$

[0058] where $\varphi^k$ denotes the output feature map of the k-th channel of the designated layer.
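The weighting and ReLU steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the embodiment's implementation: it assumes the feature maps of the designated layer and the gradients of the class score with respect to them have already been extracted as arrays of shape (K, H, W), and the function name `grad_cam` is hypothetical.

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Return the ReLU-rectified class activation map.

    feature_maps: (K, H, W) activations of the designated layer.
    gradients:    (K, H, W) partial derivatives of the class score with
                  respect to each activation (assumed to be precomputed).
    """
    if feature_maps.shape != gradients.shape:
        raise ValueError("feature_maps and gradients must have the same shape")
    # Channel weight: mean of the gradients over the H x W positions.
    weights = gradients.mean(axis=(1, 2))               # shape (K,)
    # Weighted sum over channels gives the raw class activation map.
    cam = np.tensordot(weights, feature_maps, axes=1)   # shape (H, W)
    # ReLU keeps only regions with a positive influence on the class.
    return np.maximum(cam, 0.0)
```

In a real pipeline the two input arrays would come from the deep learning framework (for example via forward and backward hooks on the designated layer); that part is framework-specific and is deliberately left out of the sketch.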

[0059] Because the ReLU activation function retains only the regions that positively influence the classification, these are the regions that effectively increase the output probability of the final fully connected layer for the given category. The heatmap is then upsampled and normalized, which yields a semantically coarser heatmap. Coloring this heatmap visualizes the degree of attention the model pays to each region for the different categories, providing a qualitative picture of the model's decision-making process. Figure 5 shows an example, in which red indicates areas that have a positive impact on the classification of rectal cancer tumors and blue indicates areas that have a negative impact.
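The upsampling, normalization, and coloring described in this paragraph might look as follows. The text does not state the interpolation method or the colormap, so nearest-neighbour upsampling by an integer factor, min-max normalization, and a simple blue-to-red ramp are all assumptions of this sketch, as are the function names.

```python
import numpy as np

def upsample_and_normalize(cam: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour upsample by an integer factor, then min-max scale to [0, 1]."""
    # Repeat every pixel `scale` times along both image axes.
    up = np.repeat(np.repeat(cam, scale, axis=0), scale, axis=1)
    lo, hi = up.min(), up.max()
    if hi == lo:
        # Constant map: return zeros instead of dividing by zero.
        return np.zeros(up.shape, dtype=float)
    return (up - lo) / (hi - lo)

def to_red_blue(heat: np.ndarray) -> np.ndarray:
    """Map values in [0, 1] to RGB, with 0 shown as blue and 1 as red."""
    rgb = np.zeros(heat.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = np.round(255 * heat).astype(np.uint8)          # red grows with attention
    rgb[..., 2] = np.round(255 * (1.0 - heat)).astype(np.uint8)  # blue fades with attention
    return rgb
```

A smoother interpolation (bilinear, for instance) or a perceptual colormap could be substituted without changing the rest of the procedure.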

[0060] Step 3: Filter valuable slices based on custom rules set by the interpretability attention heatmap.

[0061] After the interpretable attention heatmap is obtained in Step 2, custom rules are needed to filter the slices so as to obtain valuable image slices. In this embodiment, the slice filtering process is as follows. First, the region of interest is delineated. An all-zero matrix K with the same size as the original image is created. For pixels labeled as rectal cancer tumor in the image, K is assigned the grayscale value of the interpretable heatmap, which represents the pixel's contribution to the segmentation of rectal cancer tumors. Non-tumor regions remain zero in K, which determines the extent of the region of interest. Then, with K as the reference matrix, a sliding window strategy is employed. In this embodiment, the height and width of the sliding window are set to half the height and width of the original image. Adjacent windows must overlap, so the step size of the sliding window is set to 0.15 times the height and width of the original image, which ensures richer contextual information in each image slice. This choice of window size and step size yields a moderate number of image patches while guaranteeing sufficient contextual information within each patch. As the window slides, the sensitivity and the Dice coefficient of the interpretability attention heatmap of matrix K within the window region are calculated. A threshold on the sum of the sensitivity and the Dice coefficient is then used for filtering: an image slice and its corresponding segmentation mask are retained if the sum of its sensitivity and Dice coefficient exceeds the sum of the average sensitivity and the average Dice coefficient on the training set. A specific example is shown in Figure 6, in which the first two rows are the small slice images and the last two rows are the corresponding mask labels. The sensitivity (sen) and the Dice coefficient are calculated as follows:
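The construction of the reference matrix K and the window geometry described above can be sketched as follows. The window is half the image size and the stride is 0.15 times the image size, as in the embodiment; rounding the stride to the nearest integer and skipping windows that would cross the image border are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np

def reference_matrix(heatmap: np.ndarray, tumor_mask: np.ndarray) -> np.ndarray:
    """K holds the heatmap value on tumor pixels and zero everywhere else."""
    K = np.zeros_like(heatmap)
    inside = tumor_mask > 0
    K[inside] = heatmap[inside]
    return K

def window_origins(height: int, width: int,
                   win_frac: float = 0.5, step_frac: float = 0.15):
    """Yield (top, left, win_h, win_w) for every window that fits in the image."""
    win_h, win_w = int(height * win_frac), int(width * win_frac)
    # Assumption: the fractional stride is rounded to an integer of at least 1.
    step_h = max(1, round(height * step_frac))
    step_w = max(1, round(width * step_frac))
    for top in range(0, height - win_h + 1, step_h):
        for left in range(0, width - win_w + 1, step_w):
            yield top, left, win_h, win_w
```

Under these assumptions a 512×512 image gives 256×256 windows with a stride of 77 pixels, that is 4 positions per axis and 16 candidate windows per image, each overlapping its neighbour by roughly 70%.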

[0062] $$\mathrm{sen} = \frac{TP}{TP + FN}$$

[0063] $$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|}$$

[0064] Where TP represents positive samples correctly predicted as positive by the model. FN represents positive samples predicted as negative by the model. |A∩B| represents the cardinality (number of elements) of the intersection of sets A and B. |A| represents the cardinality of set A. |B| represents the cardinality of set B.
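As an illustration of these definitions, the sen measure, TP / (TP + FN), and the Dice coefficient, 2|A∩B| / (|A| + |B|), can be computed on a pair of binary masks as below. How the grayscale heatmap inside a window is binarized before comparison with the label is not specified in the text, so the sketch simply takes two binary arrays as input; returning 0.0 when a denominator is zero is also an assumption.

```python
import numpy as np

def sen(pred: np.ndarray, label: np.ndarray) -> float:
    """TP / (TP + FN) for two binary masks of the same shape."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.count_nonzero(pred & label)    # positives predicted as positive
    fn = np.count_nonzero(~pred & label)   # positives predicted as negative
    # Assumption: a window without any positive label scores 0.
    return tp / (tp + fn) if (tp + fn) else 0.0

def dice(pred: np.ndarray, label: np.ndarray) -> float:
    """2 * |A intersect B| / (|A| + |B|) for two binary masks of the same shape."""
    pred, label = pred.astype(bool), label.astype(bool)
    inter = np.count_nonzero(pred & label)
    total = np.count_nonzero(pred) + np.count_nonzero(label)
    # Assumption: two empty masks score 0 so that empty windows are not selected.
    return 2.0 * inter / total if total else 0.0
```

For example, a prediction that covers one of two labeled pixels plus one extra pixel gives sen = 0.5 and Dice = 2·1 / (2 + 2) = 0.5.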

[0065] Screening with a threshold based on the sensitivity and the Dice coefficient can effectively improve the recall of rectal cancer tumors while also improving the model's segmentation accuracy for rectal cancer tumors.
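The screening rule can be sketched as follows, assuming the sen and Dice values of all candidate windows from the training set have already been collected as pairs. The embodiment speaks of values that "exceed" the threshold, so the sketch uses a strict comparison; the function name `select_slices` is hypothetical.

```python
from statistics import mean

def select_slices(scores):
    """Return the indices of candidates whose sen + Dice exceeds the mean sum.

    scores: list of (sen, dice) pairs, one per candidate window.
    """
    if not scores:
        return []
    totals = [s + d for s, d in scores]
    # Threshold: average of sen + Dice over all candidates.
    threshold = mean(totals)
    return [i for i, t in enumerate(totals) if t > threshold]
```

Because the mean of the sums equals the sum of the means, this threshold is the same as "average sen plus average Dice" computed over the same set of candidates.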

[0066] Finally, to further eliminate duplicate slices, the structural similarity index (SSIM) is computed between every pair of image patches. The calculation formula is as follows:

[0067] $$\mathrm{SSIM}(x, y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1} \cdot \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2} \cdot \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3}$$

[0068] where $\mu_x$ and $\mu_y$ are the means of x and y, respectively; $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, respectively; and $\sigma_{xy}$ is the covariance between x and y. The preset constants $c_1$, $c_2$, and $c_3$ prevent division by zero during the calculation.

[0069] The threshold is set to the average of all pairwise structural similarities so as to filter out half of the redundant slices, which reduces the cost of retraining in Step 4 while retaining image slices with less repetition and greater value.
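A possible realization of this redundancy removal is sketched below. It computes a single global SSIM per pair of patches as the product of luminance, contrast, and structure terms with three stabilizing constants, and then keeps slices greedily. Several points are assumptions rather than details taken from the text: the constant values (chosen for intensities scaled to [0, 1]), the use of one global window per patch, the greedy order, and the rule that a slice counts as a duplicate when its SSIM to an already kept slice is above the mean pairwise SSIM.

```python
import numpy as np

def ssim(x: np.ndarray, y: np.ndarray,
         c1: float = 1e-4, c2: float = 9e-4, c3: float = 4.5e-4) -> float:
    """Global SSIM of two equally sized patches (luminance * contrast * structure)."""
    x, y = x.astype(float), y.astype(float)
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    lum = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    con = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    struct = (cov + c3) / (sd_x * sd_y + c3)
    return float(lum * con * struct)

def drop_redundant(patches):
    """Return indices of patches kept after greedy SSIM-based de-duplication."""
    n = len(patches)
    if n < 2:
        return list(range(n))
    sims = {(i, j): ssim(patches[i], patches[j])
            for i in range(n) for j in range(i + 1, n)}
    # Threshold: mean of all pairwise similarities.
    threshold = sum(sims.values()) / len(sims)
    kept = []
    for i in range(n):
        # Keep patch i only if it is not too similar to any patch kept so far.
        if all(sims[(k, i)] <= threshold for k in kept):
            kept.append(i)
    return kept
```

The pairwise table grows quadratically with the number of candidates, so for large candidate sets the comparison would probably need to be restricted, for example to windows taken from the same source image.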

[0070] Step 4: Combine the valuable slices with the original data to form a new training set and retrain the model.

[0071] After image slices with low repetition and high value are obtained, they are resized to match the images of the original training set so that they can be trained together. The resized slices are then combined with the original dataset to form a new training set, on which the original CNN-based deep learning segmentation network is retrained. Taking UNet as an example, the initial UNet model is fine-tuned with a smaller learning rate and fewer training epochs (here, the learning rate is set to 1/10 of the initial learning rate, and the number of training epochs is 50). Unlike in the initial training, no additional data augmentation is required. Retraining is complete when the model's loss converges. During training, the weights with the highest Dice coefficient on the validation set are saved every 5 epochs, which yields the optimal weight file after training. During retraining, because the model receives both coarse and detailed images, from the original dataset and from the resized image slices, it achieves good segmentation performance both overall and in detail. While the overall network structure of the original model is kept unchanged, the retraining strategy, combined with the interpretability method and the custom slice selection rules, improves the model's segmentation performance. Moreover, since the model has already undergone initial training, the computational cost of fine-tuning during retraining is relatively low, so improved segmentation of rectal cancer tumors is obtained at low computational cost. The results are compared with the initial UNet segmentation and the corresponding mask labels, and the retraining is also evaluated with UNetplusplus and UNet3plus, both CNN-based deep learning segmentation networks, to demonstrate the generality of the proposed method. Specific examples are shown in Figure 7: the first column of each row is the original image, the second column is the test result of the model after initial training, the third column is the test result of the model obtained with interpretability analysis, custom slice selection rules, and the retraining strategy, and the fourth column is the mask label corresponding to the original image. It is evident that interpretability analysis combined with custom slice selection rules and the retraining strategy improves the model's segmentation performance for rectal cancer tumors. Furthermore, the selected valuable slices enhance the model's focus on lesions. A specific example is shown in Figure 8, in which the first and third columns of each row are interpretable attention heatmaps of the original image, while the second and fourth columns are interpretable attention heatmaps of the model retrained with the method proposed in this invention. After the proposed method is applied, the heatmap color of the rectal cancer tumor region changes from blue to red, indicating that the model's attention to the rectal cancer tumor target region has improved after retraining.
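The data preparation and fine-tuning settings of Step 4 can be sketched independently of any training framework. The sketch below resizes slices with nearest-neighbour indexing, concatenates them with the original pairs, and records the hyperparameters named in the embodiment (learning rate 1/10 of the initial one, 50 epochs, a check every 5 epochs). The helper names, the choice of nearest-neighbour resizing, and the reading of the checkpoint rule as "validate every 5 epochs and keep the best" are assumptions.

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D (or H x W x C) array."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols]

def build_retraining_set(originals, slices):
    """Append resized (image, mask) slices to the original (image, mask) pairs."""
    out_h, out_w = originals[0][0].shape[:2]
    resized = [(resize_nearest(im, out_h, out_w), resize_nearest(mk, out_h, out_w))
               for im, mk in slices]
    return list(originals) + resized

def finetune_config(initial_lr: float, epochs: int = 50, checkpoint_every: int = 5) -> dict:
    """Fine-tuning settings: one tenth of the initial learning rate, 50 epochs."""
    return {"lr": initial_lr / 10, "epochs": epochs, "checkpoint_every": checkpoint_every}

def best_checkpoint_epoch(val_dice, every: int = 5):
    """Epoch (1-based) with the highest validation Dice among every `every`-th epoch."""
    checked = [(val_dice[e - 1], e) for e in range(every, len(val_dice) + 1, every)]
    return max(checked)[1] if checked else None
```

Nearest-neighbour resizing keeps mask labels binary; for the images themselves a smoother interpolation may be preferable, which is a design choice the text leaves open.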

[0072] Step 5: Validate the performance of the final deep learning segmentation model obtained above on a heterogeneous dataset.

[0073] After the retraining strategy of Step 4, the retrained UNet deep learning segmentation model achieves high segmentation accuracy for rectal cancer tumors. To evaluate the effectiveness and generalization of the proposed method, the UNet weights are fixed and the model is tested on a proprietary dataset. An example of the proprietary dataset is shown in Figure 9. Unlike the Teddy Cup data used during model training, the proprietary dataset consists of MRI images, whose imaging principles differ significantly from those of CT images; the images are stored in NII format and therefore need to be converted to PNG format. The proprietary dataset differs significantly from the Teddy Cup training data in both rectal cancer morphology and background features, so it is a more reasonable choice for verifying the generalization of the proposed method. The final UNet model is used for inference and label matching on the proprietary dataset, and the results are compared with the initial UNet segmentation. The effectiveness of the proposed method is also tested with AttentionUNet and Deeplabv3, both CNN-based deep learning segmentation models. Examples are shown in Figure 10: the first column of each row is the original image, the second column is the test result of the model after initial training, the third column is the test result of the model obtained with interpretability analysis, custom slice selection rules, and the retraining strategy, and the fourth column is the mask label corresponding to the original image. It can be observed that the UNet, AttentionUNet, and Deeplabv3 models that combine the interpretability method with custom slice selection rules and the retraining strategy achieve better segmentation results for rectal cancer tumors, a significant improvement over the segmentation results after initial training. They also segment the shape and edges of rectal cancer tumors more accurately, which illustrates the superior performance of the method proposed in this embodiment for rectal cancer tumor segmentation.
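The validation procedure of Step 5 can be outlined as below. Reading NII files requires a dedicated library and is omitted; the sketch starts from a 3-D intensity array that is assumed to have been loaded already, rescales each slice to 8-bit values suitable for writing as PNG, and compares two models by their mean Dice on the same images. The per-slice min-max scaling, the use of Dice as the comparison score, and the representation of each model as a plain callable that returns a binary mask are all assumptions.

```python
import numpy as np

def to_uint8_slices(volume: np.ndarray, axis: int = 2):
    """Split a 3-D volume into 2-D slices rescaled to the range 0..255."""
    slices = []
    for sl in np.moveaxis(volume, axis, 0):
        sl = sl.astype(float)
        span = sl.max() - sl.min()
        # Constant slices map to all zeros instead of dividing by zero.
        scaled = (sl - sl.min()) / span if span > 0 else np.zeros_like(sl)
        slices.append(np.round(scaled * 255).astype(np.uint8))
    return slices

def mean_dice(predict, images, masks) -> float:
    """Average Dice of `predict(image)` against the label masks."""
    scores = []
    for img, mask in zip(images, masks):
        pred = np.asarray(predict(img)).astype(bool)
        gt = np.asarray(mask).astype(bool)
        total = pred.sum() + gt.sum()
        # Assumption: two empty masks count as perfect agreement here.
        scores.append(2.0 * (pred & gt).sum() / total if total else 1.0)
    return float(np.mean(scores))
```

Calling `mean_dice` once with the initially trained model and once with the retrained model on the same converted slices gives the two numbers to compare.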

[0074] The interpretability-based rectal cancer tumor segmentation method provided in this invention first analyzes the information flow within the model using a Grad-CAM-based method to obtain a heatmap representing regional attention. A sliding-window method is then applied to the heatmap to obtain multiple image slices. Subsequently, a threshold based on the combination of the sensitivity and the Dice coefficient is used to further filter the obtained slices. Finally, a structural similarity index is used to ensure that the retained slices differ significantly from one another while keeping the number of valuable slices as small as possible. To improve the segmentation performance of the deep segmentation model using interpretability theory, the slices obtained through the above process are mixed with the original data for retraining. This improves the model's segmentation accuracy for rectal cancer tumor lesions while preserving the original model structure, thereby enhancing the overall segmentation accuracy of the model.

[0075] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

[0076] The above descriptions are merely some embodiments of the present invention. Those skilled in the art can make various modifications and improvements without departing from the inventive concept of the present invention, and these all fall within the scope of protection of the present invention.

Claims

1. An interpretability-based rectal tumor segmentation method, characterized in that it comprises the following steps: Step 1: perform initial training of the rectal cancer tumor segmentation network model on the training dataset to obtain the initial model; perform forward inference with the initial model on the training dataset or a portion of the training dataset, and save, for each sample in the corresponding dataset, the output feature maps at a specified layer of the initial model as well as the segmentation result output by the initial model; Step 2: based on the set feature map weights, perform weighted fusion of all output feature maps of the specified layer to obtain a class activation map, and process this class activation map with an activation function to obtain an attention heatmap; Step 3: perform slice filtering on the attention heatmap: a sliding window strategy is adopted in which successive windows partially overlap; at each window position, the sensitivity and the Dice coefficient between the attention heatmap and the corresponding label within the current window are calculated, and the weighted sum of the sensitivity and the Dice coefficient is used as the metric of the candidate slice corresponding to the current window; candidate slices with a metric value greater than or equal to the screening threshold are retained; redundancy removal is performed on the candidate slices based on the structural similarity between them to obtain the slice selection result; Step 4: resize the selected slices to match the image size of the model input, construct a new training dataset from all the resized slices and the initial training dataset, and retrain the initial model on this new training dataset to obtain the trained rectal cancer tumor segmentation model.

2. The method of claim 1, wherein in step 1, the training dataset is obtained by acquiring initial training data from a public dataset, performing data augmentation on that data, and forming the initial training dataset from the training data before and after data augmentation.

3. The method of claim 1, wherein step 2 further comprises upsampling and normalizing the obtained attention heatmap.

4. The method of claim 1, wherein step 2 further comprises coloring the obtained attention heatmap to visualize the degree of attention that the rectal cancer tumor segmentation model pays to each region for different categories.

5. The method of claim 1, wherein in step 2, the feature map weights are set as follows: let $\varphi^k_{ij}$ denote the pixel value at pixel position (i, j) of the output feature map of the k-th channel of the specified layer, the output feature map of each channel having size H×W; the feature map weight of each channel's output feature map is obtained by gradient-weighted class activation mapping: $$\alpha_k^c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial p^c}{\partial \varphi^k_{ij}}$$ where $\alpha_k^c$ represents the feature map weight of the k-th channel under category c, and $p^c$ represents the probability score with which the current sample image is classified into category c by the initial model, category c being rectal cancer tumor.

6. The method of claim 1, wherein in step 3, obtaining candidate slices with the sliding window strategy further comprises: defining an all-zero matrix K with the same size as the images in the training dataset; assigning values to matrix K based on the segmentation results output by the rectal cancer tumor segmentation model during the initial training process: for pixels labeled as rectal cancer tumor in the segmented image, the gray value of the attention heatmap, which represents the degree of contribution of the pixel to the segmentation of rectal cancer tumors, is assigned to matrix K, while the entries of matrix K for pixels in non-tumor regions remain unchanged; and, with K as the reference matrix, moving a sliding window across the attention heatmap and calculating the sensitivity and the Dice coefficient between the attention heatmap of the reference matrix and the corresponding label within the current window area.

7. The method of claim 1, wherein in step 3, the metric of a candidate slice is the sum of its sensitivity and Dice coefficient, and the screening threshold is the average metric of the candidate slices obtained from the training dataset.

8. The method of claim 1, wherein in step 3, when redundancy removal is performed on the candidate slices based on the structural similarity between them, the mean of the structural similarity between all candidate slices is used as the similarity threshold; for any two candidate slices, if their structural similarity does not exceed the similarity threshold, one of the candidate slices is removed, until the structural similarity between any two candidate slices is greater than or equal to the similarity threshold.

9. The method of claim 1, wherein in step 4, the learning rate set for retraining is smaller than that of the initial training, and the number of training epochs set for retraining is smaller than that of the initial training.

10. The method of claim 1, further comprising step 5, in which the rectal cancer tumor segmentation model trained in step 4 is tested on a proprietary dataset: the data format of the proprietary dataset is converted to that of the training dataset; forward inference is performed on the proprietary dataset with both the trained rectal cancer tumor segmentation model and the initial model, the segmentation loss between each output segmentation result and the corresponding label being taken as the corresponding test result; and the test results of the two models are compared and visualized.