Colorectal cancer MRI image segmentation method and system based on multi-dimensional feature fusion

By integrating image and clinical data through a multi-dimensional feature fusion method, the problems of low early lesion recognition rate and poor adaptability to individual differences in colorectal cancer MRI image segmentation were solved, achieving more accurate lesion segmentation and improving the accuracy of early diagnosis of colorectal cancer.

CN121280359B · Active · Publication Date: 2026-06-19 · CHUZHOU CITY VOCATIONAL COLLEGE

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHUZHOU CITY VOCATIONAL COLLEGE
Filing Date
2025-09-28
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing MRI image segmentation methods for colorectal cancer suffer from low early-stage small lesion recognition rates, poor adaptability to individual differences, and insufficient utilization of functional metabolic information, resulting in unstable segmentation performance and high misdiagnosis rates.

Method used

By integrating image features with clinically relevant data and employing a multi-dimensional feature fusion approach, including multimodal data preprocessing, multi-scale feature extraction, feature fusion, spatial and channel attention mechanisms, U-Net network decoding, and CRF post-processing, the accuracy of lesion identification and segmentation is improved.

Benefits of technology

It improves the recognition rate of early small lesions, reduces false positives outside the intestinal wall, enhances the ability to adapt to individual differences, and provides more accurate MRI image segmentation results for colorectal cancer, supporting early diagnosis and staging.

✦ Generated by Eureka AI based on patent content.

Abstract

This invention relates to a method and system for colorectal cancer MRI image segmentation based on multi-dimensional feature fusion, belonging to the field of medical image processing technology. The invention first collects multi-source data from colorectal cancer patients and preprocesses the data, constructing a multi-scale feature extraction module to obtain image feature data. Then, through a multi-dimensional feature fusion module, it integrates seven dimensions of features: texture, shape, grayscale, modal correlation, lifestyle, functional metabolism, and tissue specificity, combining a spatial-channel attention mechanism to achieve feature selection and weight allocation. Finally, it decodes features using a U-Net network and optimizes the segmentation boundary using a fully connected conditional random field (CRF) with an adaptive potential function. This invention improves the accuracy and robustness of colorectal cancer MRI image segmentation through multi-modal data integration, multi-scale feature extraction, multi-dimensional feature fusion, attention mechanism empowerment, and the combination of U-Net network decoding and CRF post-processing.

Description

Technical Field

[0001] This invention relates to medical image processing technology, and more particularly to a method and system for segmenting colorectal cancer MRI images based on multi-dimensional feature fusion.

Background Technology

[0002] Colorectal cancer is the third most commonly diagnosed cancer and the second leading cause of cancer death worldwide. According to the 2023 Global Cancer Statistics Report, there are over 2 million new cases of colorectal cancer and over 1 million deaths annually. The 5-year survival rate for patients diagnosed at an early stage (Stage I) can reach over 90%, while the survival rate for patients diagnosed at an advanced stage (Stage IV) is less than 10%. Therefore, accurate lesion segmentation is a crucial prerequisite for early diagnosis and staging of colorectal cancer.

[0003] MRI, due to its radiation-free nature and high soft tissue resolution, has become the gold standard for imaging examinations of colorectal cancer. Commonly used multimodal sequences (T1, T2, DWI, DCE) can provide multidimensional information such as anatomical structure, water molecule diffusion, and tissue blood supply. However, current MRI image segmentation methods for colorectal cancer still have the following core limitations:

[0004] Low recognition rate for early small lesions: Early tumors with a diameter of <5mm show only weak signal differences in MRI images. Because existing methods rely on image features alone, such lesions are easily misclassified as normal tissue or inflammation.

[0005] Poor adaptability to individual differences: Different patients’ lifestyle habits (such as smoking, high-fat diet) and physical condition (such as obesity, bowel preparation) will affect MRI signal characteristics. Existing methods do not take these individual factors into account, resulting in large fluctuations in segmentation performance (DSC difference can reach 0.15).

[0006] Insufficient utilization of functional metabolic information: The ADC values derived from DWI and the pharmacokinetic parameters derived from DCE directly reflect the activity of the lesion, but existing methods mostly treat them as ordinary image inputs and neither quantify nor explore their correlation with the degree of lesion malignancy.

[0007] Therefore, there is an urgent need to develop a segmentation method that integrates "image-clinical multi-source features" to overcome existing technological bottlenecks.

Summary of the Invention

[0008] To address the aforementioned problems in existing technologies, this invention provides a method and system for colorectal cancer MRI image segmentation based on multi-dimensional feature fusion. By integrating image features with clinically relevant data, it improves the early detection rate of small lesions, reduces false positives outside the intestinal wall, and enhances the ability to adapt to individual differences, thus providing more accurate segmentation results for clinical practice.

[0009] To achieve the above objectives, firstly, the technical solution adopted by the present invention is: a colorectal cancer MRI image segmentation method based on multi-dimensional feature fusion, comprising the following steps:

[0010] S1: Obtain multimodal data of colorectal cancer patients, the multimodal data including:

[0011] (1) T1-weighted images, T2-weighted images, diffusion-weighted imaging (DWI) images, and dynamic contrast-enhanced (DCE) MRI images;

[0012] (2) Clinically relevant data, including lifestyle data and functional metabolic data. The lifestyle data include smoking history, drinking history, diet, exercise habits and constipation history. The functional metabolic data include the apparent diffusion coefficient (ADC) map corresponding to the DWI image and the pharmacokinetic parameters corresponding to the DCE image.

[0013] S2: Preprocess the multimodal data;

[0014] S3: Construct a multi-scale feature extraction module, input the preprocessed multimodal data into the multi-scale feature extraction module, and extract the feature data of the image under the receptive fields of 1×1, 3×3, 5×5 and 7×7 respectively;

[0015] S4: Establish a multi-dimensional feature fusion module, which includes a texture feature extraction unit, a shape feature extraction unit, a grayscale feature optimization unit, a modal association feature mining unit, a living habit feature extraction unit, a functional metabolism feature extraction unit, and a tissue-specific feature extraction unit. The features extracted in step S3 are processed respectively to obtain a multi-dimensional feature set.

[0016] S5: Introduce spatial attention mechanism and channel attention mechanism into the multi-dimensional feature fusion module, perform weight allocation and feature filtering on the multi-dimensional feature set, and output the fused feature map;

[0017] S6: Input the fused feature map into the U-Net network for feature decoding to obtain preliminary segmentation results;

[0018] S7: The preliminary segmentation results are post-processed using a fully connected conditional random field (CRF) to optimize the segmentation accuracy of the boundary region and obtain the final segmentation results of the colorectal cancer MRI image.

[0019] The core technical solution of this invention is a five-stage process of "multi-source data input → multi-dimensional feature fusion → attention filtering → U-Net decoding → CRF post-processing". The beneficial effects of this invention are as follows:

[0020] (1) Multimodal data integration, more comprehensive information: It covers images such as T1-weighted images and T2-weighted images, as well as clinical data such as lifestyle habits, disease function and metabolism, breaking through the limitations of single image data, providing a rich data foundation for subsequent accurate segmentation, and can more comprehensively reflect the relevant information of colorectal cancer lesions;

[0021] (2) Multi-scale feature extraction, taking into account both details and the overall picture: Seven-dimensional feature extraction units are constructed to achieve seamless integration of clinical data and image features. Feature data under the 1×1, 3×3, 5×5 and 7×7 receptive fields are extracted to avoid missing key lesion information due to focusing only on features of a certain scale, and to improve the ability to identify lesions of different sizes and shapes.

[0022] (3) Multi-dimensional feature fusion and high feature utilization: By processing features through multiple feature extraction units, a multi-dimensional feature set is formed, which fully explores features such as texture, shape, and gray level, so that the model can understand the lesion features from different angles and reduce the segmentation deviation caused by single features.

[0023] (4) Attention mechanism empowers precise feature selection: The introduction of spatial attention mechanism and channel attention mechanism can perform weight allocation and feature selection on multi-dimensional feature sets, so that the model focuses on regions and channels that are important for segmentation, reduces interference from irrelevant features, and improves feature utilization efficiency.

[0024] (5) The combination of U-Net network decoding and CRF post-processing results in high segmentation accuracy: U-Net network is good at medical image segmentation and can effectively decode and fuse features to obtain preliminary segmentation results; fully connected conditional random field (CRF) further optimizes the segmentation accuracy of boundary regions, reduces boundary ambiguity problems, and ultimately significantly improves the accuracy of colorectal cancer MRI image segmentation, providing reliable support for early diagnosis and staging of colorectal cancer.

[0025] Optionally, the preprocessing of the multimodal data in step S2 includes:

[0026] (1) Image preprocessing: Using the T2-weighted image as the reference image, the rigid registration algorithm based on mutual information is used to register other modal images. The registration process reduces the computational complexity by using Gaussian pyramid downsampling (512×512→64×64), and the registration error is controlled within 0.5 pixels. Adaptive Wiener filtering is used to remove Gaussian noise from the image, and the filter window size is set to 3×3. Gray-level normalization is achieved by Z-score standardization (mean 0, standard deviation 1). Regions of interest (ROI) containing lesion areas are marked on the T2-weighted image and cropped into 256×256 pixel image blocks.

[0027] (2) Clinical data preprocessing: The lifestyle data are quantified and encoded, with ordinal variables mapped to continuous values in the range 0-1, to generate a 1×5 lifestyle quantification vector. For the functional metabolic data, the ADC map is calculated from the DWI images acquired at two b values (b=0 and b=1000 s/mm²), and the ADC values are divided into 5 intervals (<0.6, 0.6-1.0, 1.0-1.5, 1.5-2.0, >2.0 ×10⁻³ mm²/s) to generate ADC interval features. The time-signal intensity curve of the DCE image is fitted using the Tofts pharmacokinetic model, and three core parameters are extracted: the contrast agent volume transfer constant (Ktrans), the extravascular extracellular space volume fraction (Ve), and the contrast agent reflux rate constant (Kep). The core parameters are mapped to the range 0 to 1 using Min-Max normalization, thereby obtaining the pharmacokinetic parameter vector.
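The functional metabolic preprocessing described above can be sketched as follows. This is a minimal illustration, assuming the usual mono-exponential diffusion model ADC = ln(S0 / Sb) / b; the function names, the handling of interval boundaries (lower bound inclusive), and the `lo` / `hi` bounds passed to the Min-Max step are assumptions for illustration and are not specified in the text.

```python
import numpy as np

# Interval edges in units of 1e-3 mm^2/s, taken from the text:
# <0.6, 0.6-1.0, 1.0-1.5, 1.5-2.0, >2.0
ADC_EDGES = np.array([0.6, 1.0, 1.5, 2.0])

def adc_map(s0, s_b, b=1000.0, eps=1e-6):
    """ADC in mm^2/s from the b=0 and b=1000 s/mm^2 signals (assumed model)."""
    s0 = np.maximum(np.asarray(s0, dtype=float), eps)
    s_b = np.maximum(np.asarray(s_b, dtype=float), eps)
    return np.log(s0 / s_b) / b

def adc_interval_maps(adc):
    """One binary map per ADC interval, stacked on the last axis: (H, W, 5)."""
    idx = np.digitize(adc * 1e3, ADC_EDGES)  # interval index 0..4 per pixel
    return np.eye(5)[idx]

def min_max(values, lo, hi):
    """Min-Max normalization to [0, 1]; lo and hi are hypothetical bounds."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

# Synthetic example: a uniform region whose true ADC is 0.8e-3 mm^2/s.
s0 = np.full((2, 2), 1000.0)
s_b = s0 * np.exp(-1000.0 * 0.8e-3)
adc = adc_map(s0, s_b)
interval_maps = adc_interval_maps(adc)
```

Each pixel ends up with exactly one active interval map, which matches the "1 map for each interval" layout used later by the functional metabolic feature extraction unit.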

[0028] As described above, registering the other modal images to the T2-weighted reference image, combined with Gaussian pyramid downsampling to reduce computational complexity and a registration error controlled within 0.5 pixels, ensures spatial consistency of the multimodal images, facilitates subsequent feature fusion, improves computational efficiency, and reduces model training and inference time. Adaptive Wiener filtering removes Gaussian noise and improves image clarity; Z-score standardization normalizes the gray levels and eliminates interference from grayscale differences between images, allowing the model to focus on learning lesion features and reducing segmentation errors caused by inconsistent image quality and gray levels. Marking regions of interest on the T2-weighted image and cropping them into fixed-size image blocks focuses the input on the lesion region, reduces interference from irrelevant background, lowers the computational load, and improves how well the model learns lesions. Mapping ordinal variables to continuous values in the range 0-1 converts non-numerical lifestyle data into quantized vectors that the model can process, so that segmentation can take patient lifestyle information into account, improving adaptability to individual differences and reducing the fluctuations in segmentation performance that they cause. Calculating the ADC map and generating interval features, and extracting and normalizing the DCE pharmacokinetic parameters, converts the functional metabolic data into feature vectors that capture their correlation with lesion malignancy. The model can then use this functional information to identify lesions, improving the recognition rate of early small lesions and reducing the false positive rate in non-intestinal wall areas.
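The gray-level normalization and ROI cropping steps can be illustrated with a short sketch. It assumes the ROI is given as a center coordinate and that the image is at least 256×256; the function names and the clamping behavior at the image border are illustrative assumptions, not details taken from the text.

```python
import numpy as np

def z_score(image, eps=1e-8):
    """Z-score standardization: zero mean, unit standard deviation."""
    image = np.asarray(image, dtype=float)
    return (image - image.mean()) / (image.std() + eps)

def crop_roi(image, center, size=256):
    """Crop a size x size block around center (row, col).

    The window is shifted back inside the image when the center lies
    near a border (an assumed policy).
    """
    h, w = image.shape[:2]
    r0 = min(max(center[0] - size // 2, 0), h - size)
    c0 = min(max(center[1] - size // 2, 0), w - size)
    return image[r0:r0 + size, c0:c0 + size]

rng = np.random.default_rng(0)
t2 = rng.uniform(0, 4095, size=(512, 512))   # synthetic stand-in for a T2 slice
normalized = z_score(t2)
patch = crop_roi(normalized, center=(10, 500))  # center close to the border
```

Normalizing before cropping uses whole-slice statistics; normalizing each patch separately would be an equally plausible reading of the text.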

[0029] Optionally, S3 includes:

[0030] A multi-scale feature extraction module is constructed, which contains four parallel convolutional branches. Each branch consists of a convolutional layer, a batch normalization layer, and a ReLU activation function layer in sequence.

[0031] (1) 1×1 convolutional branch: 64 convolutional kernels, stride 1, used to extract image detail features;

[0032] (2) 3×3 convolution branch: 64 convolution kernels, stride 1, dilation rate 1, used to extract local features of the lesion area;

[0033] (3) 5×5 convolutional branch: 64 convolutional kernels, stride 1, dilation rate 2, used to extract medium-scale lesion features;

[0034] (4) 7×7 convolutional branch: 64 convolutional kernels, stride 1, dilation rate 3, used to extract global features of the lesion area;

[0035] The 256×256×64 feature maps output from the four branches are concatenated through channels to obtain a 256×256×256 multi-scale feature map.
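The shape bookkeeping for the four branches can be checked with a few lines of arithmetic. The sketch below assumes the standard definition of a dilated convolution, in which a kernel of size k with dilation rate d spans k + (k - 1)(d - 1) pixels, and assumes zero padding chosen to keep the 256×256 size; the dilation rate of 1 for the 1×1 branch is also an assumption, since the text gives none.

```python
def effective_kernel(kernel, dilation):
    """Span in pixels covered by one dilated convolution kernel."""
    return kernel + (kernel - 1) * (dilation - 1)

def same_padding(kernel, dilation):
    """Per-side zero padding that keeps H and W unchanged at stride 1."""
    return dilation * (kernel - 1) // 2

def output_size(size, kernel, dilation, stride=1):
    """Spatial output size of a padded, dilated convolution."""
    pad = same_padding(kernel, dilation)
    return (size + 2 * pad - effective_kernel(kernel, dilation)) // stride + 1

# (kernel size, dilation rate) for the 1x1, 3x3, 5x5 and 7x7 branches.
BRANCHES = [(1, 1), (3, 1), (5, 2), (7, 3)]

spans = [effective_kernel(k, d) for k, d in BRANCHES]   # [1, 3, 9, 19]
sizes = [output_size(256, k, d) for k, d in BRANCHES]   # [256, 256, 256, 256]
channels = 64 * len(BRANCHES)                           # 256 after concatenation
```

Under these assumptions the dilated 5×5 and 7×7 branches cover 9×9 and 19×19 pixel neighborhoods, which is how they reach medium-scale and near-global context without adding parameters.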

[0036] As can be seen from the above description, the four parallel convolutional branches correspond to different receptive fields. The 1×1 convolutional branch focuses on extracting image detail features and can capture the subtle structure of lesions. The 3×3 convolutional branch extracts local features of the lesion area, which helps to locate the local range of the lesion. The 5×5 and 7×7 convolutional branches extract medium-scale and global lesion features respectively, which can grasp the overall shape and positional relationship of the lesion. Each branch performs its own function and comprehensively obtains lesion-related features at different scales.

[0037] In addition, the feature maps output from the four branches are concatenated through channels to obtain a multi-scale feature map with higher dimensions. This integrates feature information from different scales, making the feature expression richer and more comprehensive. This lays a solid foundation for subsequent multi-dimensional feature fusion and accurate segmentation, reduces the loss of lesion information due to insufficient feature dimensions, and improves the model's ability to identify and segment complex lesions.

[0038] Optionally, in step S4, the features extracted in step S3 are processed to obtain a multi-dimensional feature set. The processing steps include:

[0039] (1) Texture feature extraction unit: Using the gray-level co-occurrence matrix (GLCM) at distance d=1 and angles θ=0°, 45°, 90°, and 135°, four texture parameters (contrast, correlation, energy, and entropy) are calculated to generate four 256×256×1 GLCM feature maps. Combined with the local binary pattern (LBP), a local texture code is generated by 3×3 neighborhood comparison, and the code histogram is computed to obtain one 256×256×1 LBP feature map. After concatenation, a 256×256×5 texture feature set is formed;

[0040] (2) Shape feature extraction unit: The Canny edge detection algorithm (high threshold 0.7, low threshold 0.3) is used to obtain the lesion edge information and generate a 256×256×1 edge feature map; 7 Hu invariant moments are calculated for the lesion area in the edge feature map, and each moment parameter is extended to a 256×256×1 feature map through bilinear interpolation. After splicing, a 256×256×7 shape feature set is formed.

[0041] (3) Gray-scale feature optimization unit: The contrast-limited adaptive histogram equalization (CLAHE) is adopted to divide the gray-scale range into 64 intervals. The contrast limit factor is set to 2.0 to generate an enhanced gray-scale feature map of 256×256×256. The Sobel operator is used to calculate the gray-scale gradient in the x and y directions respectively. The gradient magnitude is obtained by taking the square root of the sum of squares to generate a gray-scale gradient feature map of 256×256×1. After stitching, a gray-scale feature set of 256×256×257 is formed.

[0042] (4) Modal correlation feature mining unit: calculate the Pearson correlation coefficients of corresponding pixels in four modal images: T1, T2, DWI, and DCE, and construct a 4×4 modal correlation matrix; expand the 16 correlation coefficients in the matrix into 16 256×256×1 feature maps through bilinear interpolation, and stitch them together to form a 256×256×16 modal correlation feature set;

[0043] (5) Living Habits Feature Extraction Unit: Construct a 2-layer fully connected network (1×5 input layer, 32 ReLU neurons in the hidden layer, and 64 ReLU neurons in the output layer) to encode the living habits quantization vector into a 1×64 feature vector; generate a 256×256×64 living habits feature map through a spatial broadcasting strategy;

[0044] (6) Functional metabolic feature extraction unit: The ADC interval features are mapped into 5 256×256×1 feature maps (1 map for each interval); the normalized Ktrans, Ve, and Kep parameters are spatially broadcast to generate 3 256×256×1 feature maps respectively; after splicing, a 256×256×8 functional metabolic feature set is formed.

[0045] (7) Tissue-specific feature extraction unit: The colorectal anatomical parts (rectum, sigmoid colon, descending colon, transverse colon, ascending colon) are encoded using one-hot encoding to generate 5 256×256×1 site feature maps; the intestinal wall is segmented using the U-Net++ model to obtain a binary mask (1 represents the intestinal wall, 0 represents the non-intestinal wall), and the shortest distance from each pixel to the intestinal wall is calculated using the Euclidean distance transform algorithm. The distance constraint feature map is generated by assigning values according to "≤3mm=1, 3-10mm=0.5, >10mm=0"; after splicing, a 256×256×6 tissue-specific feature set is formed.

[0046] By concatenating the above seven feature sets through channels, a multi-dimensional feature pool of 256×256×350 is obtained.
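Two operations recur across the units above: spatially broadcasting a per-patient vector to a feature map, and concatenating feature sets along the channel axis. A minimal sketch of both follows; the function names are hypothetical, and the example values for Ktrans, Ve, and Kep are placeholders rather than data from the text.

```python
import numpy as np

def broadcast_vector(vec, height=256, width=256):
    """Repeat a length-C vector at every pixel, giving an (H, W, C) map."""
    vec = np.asarray(vec, dtype=float)
    return np.broadcast_to(vec, (height, width, vec.size)).copy()

def concat_channels(feature_sets):
    """Concatenate (H, W, C_i) feature sets along the channel axis."""
    return np.concatenate(feature_sets, axis=-1)

# Functional metabolic set: 5 ADC interval maps plus 3 broadcast parameters.
adc_maps = np.zeros((256, 256, 5))                 # placeholder interval maps
pk_maps = broadcast_vector([0.2, 0.4, 0.6])        # placeholder Ktrans, Ve, Kep
functional_set = concat_channels([adc_maps, pk_maps])

# Lifestyle set: a 1x64 encoded vector broadcast to every pixel.
lifestyle_set = broadcast_vector(np.linspace(0.0, 1.0, 64))
```

Because a broadcast map is constant over the image, it carries patient-level context only; any spatial selectivity has to come from the image-derived channels it is concatenated with.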

[0047] As described above, by comprehensively covering multiple dimensions of features, lesions can be accurately depicted:

[0048] Texture and shape features: The texture feature extraction unit obtains texture information through GLCM and LBP, which can distinguish the texture differences between lesions and normal tissues; the shape feature extraction unit captures the edge and shape features of lesions with the help of Canny edge detection and Hu invariant moments, which helps to determine the outline of the lesion. The combination of the two can accurately identify lesions from appearance features.
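As an illustration of the GLCM step, the sketch below computes a co-occurrence matrix and the four texture parameters for a single image window. It assumes a non-symmetric GLCM, energy defined as the sum of squared entries, and base-2 entropy; the text does not state these conventions, and producing a full 256×256×1 map would require repeating the computation in a sliding window, which is not shown.

```python
import numpy as np

# Pixel offsets (row, col) for distance d=1 at 0, 45, 90 and 135 degrees.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, offset, levels):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    dr, dc = offset
    h, w = image.shape
    rows = np.arange(max(0, -dr), min(h, h - dr))
    cols = np.arange(max(0, -dc), min(w, w - dc))
    src = image[np.ix_(rows, cols)]            # reference pixels
    dst = image[np.ix_(rows + dr, cols + dc)]  # neighbors at the offset
    counts = np.zeros((levels, levels))
    np.add.at(counts, (src.ravel(), dst.ravel()), 1)
    return counts / counts.sum()

def texture_params(p):
    """Contrast, correlation, energy and entropy of a normalized GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    nonzero = p[p > 0]
    return {
        "contrast": (((i - j) ** 2) * p).sum(),
        "correlation": (((i - mu_i) * (j - mu_j)) * p).sum() / (sd_i * sd_j),
        "energy": (p ** 2).sum(),
        "entropy": -(nonzero * np.log2(nonzero)).sum(),
    }

# Small 4-level test window (integer gray levels are required for indexing).
window = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 2, 2, 2],
                   [2, 2, 3, 3]])
p0 = glcm(window, OFFSETS[0], levels=4)
params = texture_params(p0)
```

The correlation term is undefined for a perfectly uniform window (zero standard deviation), so a practical implementation would need to guard that case.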

[0049] Gray-scale feature optimization: CLAHE enhances gray-scale features, and the Sobel operator calculates gray-scale gradients, which strengthens the gray-scale contrast between lesions and surrounding tissues, making it easier for the model to identify lesion areas and reducing missed or misdiagnosed cases caused by insignificant gray-scale differences.

[0050] Modal association features: By calculating the Pearson correlation coefficient of pixels in multimodal images, the association information between different modalities can be mined. The complementarity between modalities can be used to improve the reliability of lesion recognition and avoid segmentation bias caused by insufficient information from a single modality.
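The modal correlation step can be sketched with NumPy's Pearson correlation routine. The sketch assumes the four modalities are already co-registered arrays of equal size and treats the expansion of each coefficient to a feature map as a constant fill; the text describes that expansion as bilinear interpolation, so the constant fill is an interpretation, and the function name is hypothetical.

```python
import numpy as np

def modal_correlation_maps(modalities, height=256, width=256):
    """Return the 4x4 Pearson matrix and its 16 constant feature maps.

    modalities: co-registered (H, W) arrays in the order T1, T2, DWI, DCE.
    """
    flat = np.stack([np.asarray(m, dtype=float).ravel() for m in modalities])
    corr = np.corrcoef(flat)  # rows are variables, so the result is (4, 4)
    maps = np.broadcast_to(corr.ravel(), (height, width, corr.size)).copy()
    return corr, maps

# Synthetic modalities: "T2" is a linear function of "T1", the others are noise.
rng = np.random.default_rng(1)
t1 = rng.normal(size=(32, 32))
t2 = 2.0 * t1 + 1.0
dwi = rng.normal(size=(32, 32))
dce = rng.normal(size=(32, 32))
corr, maps = modal_correlation_maps([t1, t2, dwi, dce], height=32, width=32)
```

In the synthetic example the T1-T2 coefficient is 1 because one image is a linear function of the other, while the independent noise images correlate only weakly with everything else.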

[0051] Lifestyle Habits and Functional Metabolic Characteristics: The lifestyle habit feature extraction unit encodes lifestyle habits into feature maps, enabling the model to judge the likelihood of lesions by combining the patient's lifestyle habits; the functional metabolic feature extraction unit processes ADC interval features and pharmacokinetic parameters, making full use of functional metabolic information to reflect lesion activity, improving the ability to identify early and small lesions, and reducing the impact of individual differences on segmentation performance.

[0052] Tissue-specific features: The one-hot encoded anatomical location features and the constraint features based on intestinal wall distance provide anatomical location references for segmentation, reducing false positives in non-intestinal wall areas caused by ascites, adipose tissue, etc., and improving the accuracy and specificity of segmentation.
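The distance constraint feature can be illustrated as follows. This sketch replaces the Euclidean distance transform with a brute-force nearest-wall search that is only practical for small arrays, and it assumes an isotropic pixel spacing given in millimeters; both choices, and the function names, are illustrative assumptions.

```python
import numpy as np

def distance_to_wall_mm(wall_mask, spacing_mm=1.0):
    """Euclidean distance (mm) from every pixel to the nearest wall pixel.

    Brute force over all pixel pairs, so suitable for small demos only.
    """
    wall = np.argwhere(wall_mask)                          # (K, 2) wall coords
    grid = np.indices(wall_mask.shape).reshape(2, -1).T    # (N, 2) all coords
    diff = grid[:, None, :] - wall[None, :, :]             # (N, K, 2)
    dist = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return dist.reshape(wall_mask.shape) * spacing_mm

def distance_constraint(dist_mm):
    """Map distances to weights: <=3 mm -> 1, 3-10 mm -> 0.5, >10 mm -> 0."""
    return np.where(dist_mm <= 3, 1.0, np.where(dist_mm <= 10, 0.5, 0.0))

# One row of 20 pixels with the intestinal wall at the first pixel.
mask = np.zeros((1, 20), dtype=bool)
mask[0, 0] = True
dist = distance_to_wall_mm(mask)
constraint = distance_constraint(dist)
```

Pixels farther than 10 mm from the wall receive a weight of 0, which is how this feature discourages false positives in regions such as ascites or adipose tissue.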

[0053] In addition, splicing the seven feature sets into a multi-dimensional feature pool integrates multi-source information such as images, clinical data, and functional metabolism, achieving deep fusion of the multi-dimensional features. This enables the model to judge the lesion area from multiple perspectives, improves the comprehensiveness and effectiveness of the features, and provides sufficient, high-quality feature support for subsequent attention-based screening and accurate segmentation.

[0054] Optionally, the processing step S5 includes:

[0055] (1) Spatial attention mechanism: Global average pooling is performed on the multi-dimensional feature pool to obtain a 1×1×350 feature vector; a spatial attention weight matrix (256×256×350) is generated through a 2-layer fully connected network (128 ReLU neurons in the hidden layer and 350 Sigmoid neurons in the output layer); for the region of "high risk of lifestyle habits (quantitative value ≥0.8) + abnormal functional metabolism (ADC <0.6×10⁻³mm² / s or Ktrans ≥0.3min⁻¹)", the spatial weight is increased by 1.2-1.5 times;

[0056] (2) Channel attention mechanism: The squeeze-and-excitation (SE) module performs global average pooling on the multi-dimensional feature pool to obtain a 1×1×350 channel feature vector; a channel attention weight vector (1×1×350) is generated through a 2-layer fully connected network (32 ReLU neurons in the hidden layer and 350 Sigmoid neurons in the output layer); the channel weights are increased by a factor of 1.3 for channels with high ADC activity (<0.6×10⁻³ mm²/s), high Ktrans values, and intestinal wall distance ≤3mm.

[0057] (3) Feature fusion: The spatial attention weight matrix and the channel attention weight vector are fused element-wise to obtain the final feature weight matrix; the multidimensional feature pool and the feature weight matrix are fused element-wise to output a fused feature map of 256×256×350.
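The channel attention path and the element-wise fusion can be sketched in NumPy. The layer sizes (32 hidden ReLU units, 350 Sigmoid outputs) and the 1.3 boost factor come from the text; the random weights stand in for trained parameters, the boosted channel indices are hypothetical, and a small spatial size is used to keep the example light.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_channel_weights(pool, w1, b1, w2, b2):
    """Squeeze (global average pooling) then excite (FC-ReLU-FC-Sigmoid)."""
    squeezed = pool.mean(axis=(0, 1))               # (C,)
    hidden = np.maximum(squeezed @ w1 + b1, 0.0)    # (32,)
    return sigmoid(hidden @ w2 + b2)                # (C,), each in (0, 1)

def boost(weights, channel_ids, factor=1.3):
    """Scale the weights of selected channels by a fixed factor."""
    out = weights.copy()
    out[channel_ids] *= factor
    return out

rng = np.random.default_rng(0)
H, W, C, HIDDEN = 16, 16, 350, 32                   # H, W reduced for the demo
pool = rng.normal(size=(H, W, C))
w1, b1 = rng.normal(scale=0.1, size=(C, HIDDEN)), np.zeros(HIDDEN)
w2, b2 = rng.normal(scale=0.1, size=(HIDDEN, C)), np.zeros(C)

weights = se_channel_weights(pool, w1, b1, w2, b2)
boosted = boost(weights, channel_ids=[0, 1, 2])     # hypothetical channel ids
fused = pool * boosted                              # broadcasts over H and W
```

After the boost a weight can exceed 1, so the boosted channels are amplified relative to the Sigmoid range rather than merely preserved.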

[0058] As described above, by generating a spatial attention weight matrix through global average pooling and a fully connected network, the model can automatically identify spatial regions important for segmentation, increase the weights of these regions, reduce interference from irrelevant regions, and allow the model to focus more on areas where lesions may exist. Furthermore, by increasing the spatial weights of regions with high-risk lifestyle habits and abnormal metabolic function, the model can specifically focus on areas with a high probability of lesions, further improving the model's sensitivity in identifying lesions in high-risk areas, reducing missed diagnoses of early, small lesions, and improving the accuracy and specificity of segmentation. Moreover, the SE module generates channel attention weight vectors, highlighting the role of important feature channels (such as high ADC activity regions, high Ktrans values, and intestinal wall distance ≤3mm), suppressing irrelevant or interfering channels, and improving the utilization rate of effective feature channels. This allows the model to prioritize channel features more valuable for lesion identification and reduce the impact of ineffective features on the segmentation results. Simultaneously, the fusion of spatial and channel attention weights yields the final feature weight matrix, which is then multiplied by a multi-dimensional feature pool, achieving dual optimization of spatial regions and feature channels. This filters out more critical and effective features, significantly improving the quality of fused features and providing high-quality feature input for subsequent accurate decoding and segmentation by the U-Net network, further improving segmentation accuracy.

[0059] Optionally, the U-Net network uses an improved U-Net network for feature decoding, the improved U-Net network including an encoder and a decoder:

[0060] (1) Encoder: It contains 4 convolutional blocks. Each convolutional block consists of two 3×3 convolutional layers (the number of output channels is 64, 128, 256, and 512 respectively), a batch normalization layer, and a ReLU activation function layer. Adjacent convolutional blocks are downsampled through a 2×2 max pooling layer (stride 2). Each convolutional block introduces a residual connection (adding the input and output features of the convolutional block) to solve the gradient vanishing problem in deep networks.

[0061] (2) Decoder: It contains 4 deconvolution blocks. Each deconvolution block consists of a 2×2 transposed convolutional layer (the number of input channels is 512, 256, 128, and 64 respectively), a batch normalization layer, and a ReLU activation function layer. Upsampling is achieved through transposed convolution. Skip connections are used to fuse the high-resolution features (including edge details) of the corresponding layer of the encoder with the low-resolution features of the decoder.

[0062] (3) Output layer: The number of channels is reduced to 1 by a 1×1 convolutional layer, and a 256×256×1 probability map is output by combining the Sigmoid activation function (pixel value 0-1, the closer the value is to 1, the more likely it is to be a lesion area); the area with pixel value ≥0.5 in the probability map is marked as a lesion, and the preliminary segmentation result is obtained.
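The feature map sizes through the encoder and the thresholding at the output layer can be traced with a short sketch. It assumes the 3×3 convolutions are padded so that only the 2×2 max pooling between adjacent blocks changes the spatial size; the function names are hypothetical.

```python
import numpy as np

def encoder_shapes(size=256, channels=(64, 128, 256, 512)):
    """(height, width, channels) after each encoder block.

    A 2x2 max pooling with stride 2 is applied between adjacent blocks,
    halving the spatial size each time.
    """
    shapes = []
    for index, ch in enumerate(channels):
        if index > 0:
            size //= 2
        shapes.append((size, size, ch))
    return shapes

def to_mask(logits, threshold=0.5):
    """Sigmoid probability map and the binary lesion mask (prob >= threshold)."""
    prob = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return prob, (prob >= threshold).astype(np.uint8)

shapes = encoder_shapes()
prob, mask = to_mask([[-2.0, 0.0], [0.1, 3.0]])
```

A residual addition needs matching input and output channel counts, so a projection such as a 1×1 convolution would presumably be required wherever a block changes the number of channels; the text does not specify this detail.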

[0063] As can be seen from the above description: ① Encoder optimization addresses the vanishing gradient problem

[0064] Convolutional blocks and residual connections: The encoder's convolutional blocks consist of convolutional layers, batch normalization layers, and ReLU activation function layers, which can effectively extract features. Residual connections add the input and output features of the convolutional blocks, solving the gradient vanishing problem in deep network training. This enables the model to deeply mine feature information, improve the ability to extract complex lesion features, and avoid insufficient feature extraction caused by gradient vanishing in deep networks.

[0065] Downsampling and increased channel count: 2×2 max pooling layer downsampling reduces feature map size while increasing the number of output channels. This reduces computational cost while improving the abstraction of features, enabling the model to capture higher-level lesion features and provide more representative feature information for subsequent decoding.

[0066] ② Decoder optimization, incorporating high-resolution features

[0067] Deconvolution block upsampling: The decoder's deconvolution block achieves upsampling through transposed convolution, restoring the feature map size and gradually restoring the spatial location information of the lesion, providing a spatial basis for accurate segmentation of the lesion region.

[0068] Skip connections: Fusing high-resolution features from the corresponding layer of the encoder with low-resolution features from the decoder, supplementing the edge details lost during the decoding process, enabling the segmentation results to more accurately restore the edges of lesions, reduce edge blurring, and improve the precision of segmentation.

[0069] ③ The output layer accurately identifies the lesion area: The 1×1 convolutional layer reduces the dimension and combines it with the Sigmoid activation function to output a probability map. By setting a threshold to mark the lesion area, the lesion and normal tissue can be clearly distinguished, and an accurate preliminary segmentation result can be obtained. This lays a good foundation for subsequent CRF post-processing to optimize the boundary and improves the accuracy and reliability of the overall segmentation.

[0070] Optionally, the fully connected conditional random field (CRF) in S7 is an improved fully connected CRF that uses an extended adaptive potential function. [The expression and its symbols are not recoverable from the source text; the surviving definitions follow.] In the expression: p and q are any two pixels in the image; the spatial term uses the spatial coordinates of p and q, with a spatial-distance standard deviation of 10; the grayscale term uses the grayscale values of p and q, with a grayscale-difference standard deviation of 5;

[0071] Fp and Fq are the multimodal feature values of pixels p and q, with a modal-feature standard deviation of 3; Hp represents the lifestyle risk coefficient of pixel p; Dp represents the tissue distance constraint coefficient of pixel p; the terms are combined using weighting coefficients, one of which is 0.5. By iteratively optimizing the adaptive potential function, the pixel classification results of the blurred boundary region are corrected, and the final colorectal cancer MRI image segmentation result is obtained.

[0072] As can be seen from the above description, ① the improved potential function introduces key constraint information: the extended adaptive potential function introduces the risk coefficient of living habits (Hp) and the tissue distance constraint coefficient (Dp), and combined with spatial distance, gray-level difference and modal feature information, it can more comprehensively consider the factors affecting pixel classification. In particular, by using living habits and tissue distance constraints, it can reduce false positive segmentation of non-intestinal wall areas, improve the accuracy of lesion area judgment, and avoid segmentation deviation caused by relying solely on image information.

[0073] ② Iterative optimization to correct boundaries and improve segmentation accuracy: By iteratively optimizing the adaptive potential function, the pixel classification results of blurred boundary areas are corrected, which solves the problem of unclear boundaries in the initial segmentation results, making the lesion boundary segmentation more accurate. This further improves the overall accuracy of colorectal cancer MRI image segmentation, providing clearer and more accurate lesion segmentation results for clinical diagnosis, and helping doctors to more accurately determine the lesion range and stage.

[0074] Optionally, the fully connected network training of the lifestyle feature extraction unit in step S4 should use "lifestyle quantification vector - lesion label (0=normal, 1=lesion)" as training data, adopt the cross-entropy loss function and Adam optimizer (learning rate 1e-4, β1=0.9, β2=0.999), train for 50 rounds, and adopt an early stopping strategy (if the validation set loss does not decrease for 5 consecutive rounds, training is stopped).

[0075] As can be seen from the above description, ① targeted training data improves network adaptability: using "lifestyle quantification vector - lesion label" as training data makes the training of the fully connected network more targeted, enabling it to better learn the relationship between lifestyle and lesions, and allowing the features output by the lifestyle feature extraction unit to more accurately reflect the impact of lifestyle on lesions, thereby improving the model's ability to segment by combining lifestyle information.

[0076] ② Reasonable loss function and optimizer to ensure training effect: The cross-entropy loss function is suitable for classification tasks and can effectively measure the difference between the network prediction value and the true label; the Adam optimizer (with appropriate learning rate and parameters) can efficiently optimize network parameters, speed up training convergence, and ensure that the network fully learns the association between living habits and lesions within 50 training rounds.

[0077] ③ Early stopping strategy prevents overfitting and improves generalization ability: If the validation set loss does not decrease for 5 consecutive rounds, training is stopped to avoid overfitting caused by overtraining of the network. This allows the trained fully connected network to maintain good performance on new data, improves the stability and reliability of life habit feature extraction, and reduces the impact of individual differences on the overall segmentation performance.
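The early stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the training code of the invention: the function name `early_stopping_run` and the list of per-round validation losses are hypothetical, and only the 50-round cap and the patience of 5 rounds come from the text.

```python
def early_stopping_run(val_losses, max_rounds=50, patience=5):
    """Return (round at which training stops, round with the best validation loss)."""
    best_loss = float("inf")
    best_round = 0
    stale = 0  # consecutive rounds without a decrease in validation loss
    for rnd, loss in enumerate(val_losses[:max_rounds], start=1):
        if loss < best_loss:
            best_loss, best_round, stale = loss, rnd, 0
        else:
            stale += 1
            if stale >= patience:
                return rnd, best_round
    return min(len(val_losses), max_rounds), best_round

# Hypothetical validation curve: improves for 3 rounds, then stagnates.
losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.58, 0.61, 0.63] + [0.65] * 42
stop_round, best_round = early_stopping_run(losses)
```

In practice the weights from `best_round` would be the ones kept, since the rounds after it did not lower the validation loss.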

[0078] Optionally, the iterative optimization of the improved fully connected CRF in step S7 adopts the mean field approximation algorithm, and the computational complexity of each iteration is controlled to O(N²) (N is the number of image pixels). The post-processing time of a single image is controlled to within 500ms by GPU acceleration.

[0079] As can be seen from the above description, ① the iterative optimization algorithm ensures computational efficiency: the mean field approximation algorithm controls the computational complexity of the improved fully connected CRF iterative optimization to O(N²) (N is the number of image pixels), avoiding the problem of slow processing speed caused by excessive computational complexity, and ensuring that the iterative optimization process can be completed efficiently when processing images.

[0080] ② GPU acceleration improves processing speed: GPU acceleration keeps the post-processing time of a single image within 500ms, significantly shortening the post-processing time and improving the efficiency of the entire segmentation process. It can quickly output the final segmentation results, meet the timeliness requirements of clinical diagnosis, and avoid affecting the doctor's diagnostic efficiency due to excessive processing time.

[0081] Secondly, the technical solution adopted by the present invention is: a colorectal cancer MRI image segmentation system based on multi-dimensional feature fusion, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the colorectal cancer MRI image segmentation method based on multi-dimensional feature fusion described above.

[0082] For the technical effects of the colorectal cancer MRI image segmentation system based on multi-dimensional feature fusion provided in the second aspect, refer to the description of the colorectal cancer MRI image segmentation method based on multi-dimensional feature fusion provided in the first aspect.

Attached Figure Description

[0083] Figure 1 is a flowchart of the colorectal cancer MRI image segmentation method based on multi-dimensional feature fusion according to the present invention.

[0084] Figure 2 is a schematic structural diagram of the colorectal cancer MRI image segmentation system based on multi-dimensional feature fusion of the present invention.

[0085] Explanation of reference numerals in the attached figures

[0086] 1. Colorectal cancer MRI image segmentation system based on multi-dimensional feature fusion; 2. Memory; 3. Processor.

Detailed Implementation

[0087] To better explain and facilitate understanding of the present invention, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. Although exemplary embodiments of the present invention are shown in the accompanying drawings, it should be understood that the present invention can be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present invention can be understood more clearly and thoroughly, and that the scope of the present invention can be fully conveyed to those skilled in the art.

[0088] Example 1

[0089] Referring to Figure 1, the colorectal cancer MRI image segmentation method based on multi-dimensional feature fusion includes the following steps:

[0090] S1: Obtain multimodal data of colorectal cancer patients, the multimodal data including:

[0091] (1) T1-weighted images, T2-weighted images, diffusion-weighted imaging (DWI) images and dynamic contrast-enhanced (DCE) MRI images, with an image resolution of 512×512 pixels and a slice thickness of 3-5 mm;

[0092] (2) Clinically relevant data, including at least lifestyle data and functional metabolic data. The lifestyle data include smoking history (none / <10 years / ≥10 years), drinking history (none / 1-2 times per week / ≥3 times per week), diet (low-fat high-fiber / normal / high-fat low-fiber), exercise habits (≥150 minutes per week / 30-149 minutes per week / <30 minutes per week), and constipation history (none / occasional / long-term). The functional metabolic data include the apparent diffusion coefficient (ADC) map corresponding to the DWI image and the pharmacokinetic parameters corresponding to the DCE image.

[0093] Among them, lifestyle data reflects the basic risk of disease, functional metabolic data reflects the activity of disease, and tissue-specific data provides anatomical constraints.

[0094] The acquisition of multimodal data must meet the following standards: T1-weighted images use spin echo sequences (TR=500-800ms, TE=10-20ms); T2-weighted images use fast spin echo sequences (TR=2000-4000ms, TE=80-120ms); DWI uses single-shot spin-echo echo-planar imaging sequences (TR=3000-5000ms, TE=60-100ms, b-values of 0 and 1000 s/mm²); and DCE-MRI uses three-dimensional gradient echo sequences (TR=3-5ms, TE=1-2ms, flip angle 10°-15°; the contrast agent is gadopentetate dimeglumine at a dose of 0.1 mmol/kg).

[0095] S2: Preprocessing the multimodal data; including:

[0096] (1) Image preprocessing: Using the T2-weighted image as the reference image, the rigid registration algorithm based on mutual information is used to register other modal images. The registration process reduces the computational complexity by using Gaussian pyramid downsampling (512×512→64×64), and the registration error is controlled within 0.5 pixels. Adaptive Wiener filtering is used to remove Gaussian noise from the image, and the filter window size is set to 3×3. Gray-level normalization is achieved by Z-score standardization (mean 0, standard deviation 1). Regions of interest (ROI) containing lesion areas are marked on the T2-weighted image and cropped into 256×256 pixel image blocks.

[0097] (2) Clinical data preprocessing: The lifestyle data are quantified and encoded, with the ordinal variables mapped to continuous values in the range 0-1, to generate a 1×5 lifestyle quantification vector. For the functional metabolic data, the ADC map is calculated from the two b-values of the DWI image (b=0 and b=1000 s/mm²), and the ADC values are divided into 5 intervals (<0.6, 0.6-1.0, 1.0-1.5, 1.5-2.0, and >2.0, all in units of 10⁻³ mm²/s) to generate the ADC interval features. The time-signal intensity curve of the DCE image is fitted using the Tofts pharmacokinetic model, and three core parameters are extracted: the volume transfer constant (Ktrans), the extravascular extracellular space volume fraction (Ve), and the reflux rate constant (Kep). These parameters are mapped to the range 0-1 by Min-Max normalization to obtain the pharmacokinetic parameter vector;
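The functional metabolic preprocessing in step S2 can be illustrated with a short NumPy sketch. It assumes the usual mono-exponential diffusion model S_b = S_0·exp(−b·ADC) for the two b-values, assigns a value lying exactly on an interval edge to the upper interval, and clips normalized values to the 0-1 range; the text does not specify any of these three points. The parameter ranges are the training-set ranges reported in Example 2, and all function names are hypothetical.

```python
import numpy as np

B_VALUE = 1000.0                               # s/mm^2 (DWI acquired at b=0 and b=1000)
ADC_EDGES = np.array([0.6, 1.0, 1.5, 2.0])     # interval edges, in 1e-3 mm^2/s
# Training-set ranges reported in Example 2; Ktrans and Kep in 1/min.
PK_RANGES = {"Ktrans": (0.05, 0.6), "Ve": (0.1, 0.5), "Kep": (0.1, 0.8)}

def adc_map(s_b0, s_b1000, eps=1e-6):
    """ADC in units of 1e-3 mm^2/s, from S_b = S_0 * exp(-b * ADC)."""
    s_b0 = np.maximum(np.asarray(s_b0, dtype=float), eps)
    s_b1000 = np.maximum(np.asarray(s_b1000, dtype=float), eps)
    return np.log(s_b0 / s_b1000) / B_VALUE * 1e3

def adc_interval_maps(adc):
    """One-hot interval membership, shape (H, W, 5); edge values go to the upper bin."""
    return np.eye(5)[np.digitize(adc, ADC_EDGES)]

def normalize_pk(params):
    """Min-Max normalise Ktrans, Ve and Kep; clipping to [0, 1] is an assumption."""
    out = []
    for name, (lo, hi) in PK_RANGES.items():
        out.append(min(max((params[name] - lo) / (hi - lo), 0.0), 1.0))
    return np.array(out)

# Synthetic two-pixel example with known ADC values of 0.55 and 1.2 (x 1e-3 mm^2/s).
s_b0 = np.full((1, 2), 1000.0)
s_b1000 = s_b0 * np.exp(-np.array([[0.55, 1.2]]))
adc = adc_map(s_b0, s_b1000)
intervals = adc_interval_maps(adc)
pk_vector = normalize_pk({"Ktrans": 0.32, "Ve": 0.3, "Kep": 0.45})
```

The one-hot layout yields one binary map per ADC interval, which matches the five ADC feature maps used later by the functional metabolic feature extraction unit.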

[0098] The lifestyle data to be quantified must be collected through the hospital's electronic medical record system or standardized questionnaires. The collection process must be approved by the hospital's ethics committee, and the patient must sign an informed consent form. Missing values in the quantified data are filled using the "mean of the same type" strategy (e.g., a missing smoking history value is filled with the mean smoking history value of patients in the same age group).
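The quantification and missing-value strategy for lifestyle data can be sketched as follows. This is a dependency-free illustration under stated assumptions: the smoking thresholds follow the mapping given later in Example 2 (≥10 years to 1.0, 5-10 years to 0.5, otherwise 0.0), the 10-year age bands also follow Example 2, and the function and field names are hypothetical.

```python
def quantify_smoking(years):
    """Map smoking duration in years to 0.0 / 0.5 / 1.0; None means missing."""
    if years is None:
        return None
    if years >= 10:
        return 1.0
    if years >= 5:
        return 0.5
    return 0.0

def age_band(age, width=10):
    """Lower bound of the age band, e.g. 52 -> 50."""
    return age // width * width

def fill_by_age_band(records, field):
    """Replace missing values of `field` with the mean over the same age band."""
    sums, counts = {}, {}
    for r in records:
        if r[field] is not None:
            band = age_band(r["age"])
            sums[band] = sums.get(band, 0.0) + r[field]
            counts[band] = counts.get(band, 0) + 1
    filled = []
    for r in records:
        value = r[field]
        if value is None:
            band = age_band(r["age"])
            value = sums[band] / counts[band]  # KeyError if the band has no observed value
        filled.append({**r, field: value})
    return filled

# Hypothetical patients; the third one has no recorded smoking history.
raw = [
    {"age": 52, "smoking_years": 15},
    {"age": 55, "smoking_years": 6},
    {"age": 58, "smoking_years": None},
    {"age": 63, "smoking_years": 0},
]
records = [{"age": r["age"], "smoking": quantify_smoking(r["smoking_years"])} for r in raw]
filled = fill_by_age_band(records, "smoking")
```

The same two helpers would be applied to each of the five lifestyle fields to build the 1×5 quantification vector.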

[0099] S3: Construct a multi-scale feature extraction module, input the preprocessed multimodal data into the multi-scale feature extraction module, and extract the feature data of the image under the receptive fields of 1×1, 3×3, 5×5 and 7×7 respectively;

[0100] Specifically, the module contains four parallel convolutional branches, each consisting of a convolutional layer, a batch normalization layer, and a ReLU activation function layer in sequence:

[0101] (1) 1×1 convolutional branch: 64 convolutional kernels, stride 1, used to extract image detail features;

[0102] (2) 3×3 convolution branch: 64 convolution kernels, stride 1, dilation rate 1, used to extract local features of the lesion area;

[0103] (3) 5×5 convolutional branch: 64 convolutional kernels, stride 1, dilation rate 2, used to extract medium-scale lesion features;

[0104] (4) 7×7 convolutional branch: 64 convolutional kernels, stride 1, dilation rate 3, used to extract global features of the lesion area;

[0105] The 256×256×64 feature maps output from the four branches are concatenated through channels to obtain a 256×256×256 multi-scale feature map.
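The branch geometry can be checked with standard convolution arithmetic. The sketch below assumes "same" padding and a dilation rate of 1 for the 1×1 branch, neither of which is stated in the text. Under the usual formula, a k×k kernel with dilation d spans d·(k−1)+1 pixels, so the 5×5 and 7×7 branches with dilation rates 2 and 3 would cover 9×9 and 19×19 neighborhoods, which is larger than the nominal receptive fields named in S3.

```python
# (kernel size, dilation rate, output channels) for the four parallel branches
BRANCHES = [
    (1, 1, 64),  # dilation 1 assumed; the text gives none for the 1x1 branch
    (3, 1, 64),
    (5, 2, 64),
    (7, 3, 64),
]

def effective_kernel(k, d):
    """Spatial extent covered by a k x k kernel with dilation d."""
    return d * (k - 1) + 1

def same_padding(k, d):
    """Padding per side that keeps the output size equal to the input size at stride 1."""
    return d * (k - 1) // 2

def out_size(n, k, d, pad, stride=1):
    """Standard convolution output-size formula."""
    return (n + 2 * pad - d * (k - 1) - 1) // stride + 1

extents = [effective_kernel(k, d) for k, d, _ in BRANCHES]
paddings = [same_padding(k, d) for k, d, _ in BRANCHES]
sizes = [out_size(256, k, d, same_padding(k, d)) for k, d, _ in BRANCHES]
total_channels = sum(c for _, _, c in BRANCHES)
```

With these paddings every branch keeps the 256×256 resolution, so the four 64-channel outputs can be concatenated into the 256-channel map described above.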

[0106] S4: Establish a multi-dimensional feature fusion module, which includes a texture feature extraction unit, a shape feature extraction unit, a grayscale feature optimization unit, a modal correlation feature mining unit, a lifestyle feature extraction unit, a functional metabolic feature extraction unit, and a tissue-specific feature extraction unit. These units process the features extracted in step S3 to obtain a multi-dimensional feature set.

[0107] Its processing includes:

[0108] (1) Texture feature extraction unit: Using the gray-level co-occurrence matrix (GLCM) at distance d=1 and angles θ=0° / 45° / 90° / 135°, four texture parameters (contrast, correlation, energy, and entropy) are calculated to generate four 256×256×1 GLCM feature maps. In addition, the local binary pattern (LBP) is used: local texture codes are generated by 3×3 neighborhood comparison, and the code histogram is computed to obtain one 256×256×1 LBP feature map. After concatenation, a 256×256×5 texture feature set is formed;

[0109] (2) Shape feature extraction unit: The Canny edge detection algorithm (high threshold 0.7, low threshold 0.3) is used to obtain the lesion edge information and generate a 256×256×1 edge feature map; 7 Hu invariant moments (with translation, rotation and scaling invariance) are calculated for the lesion area in the edge feature map. Each moment parameter is extended to a 256×256×1 feature map through bilinear interpolation. After being stitched together, a 256×256×7 shape feature set is formed.

[0110] (3) Gray-scale feature optimization unit: The contrast-limited adaptive histogram equalization (CLAHE) is adopted to divide the gray-scale range into 64 intervals. The contrast limit factor is set to 2.0 to generate an enhanced gray-scale feature map of 256×256×256. The Sobel operator is used to calculate the gray-scale gradient in the x and y directions respectively. The gradient magnitude is obtained by taking the square root of the sum of squares to generate a gray-scale gradient feature map of 256×256×1. After stitching, a gray-scale feature set of 256×256×257 is formed.

[0111] (4) Modal correlation feature mining unit: calculate the Pearson correlation coefficients of corresponding pixels in four modal images: T1, T2, DWI, and DCE, and construct a 4×4 modal correlation matrix; expand the 16 correlation coefficients in the matrix into 16 256×256×1 feature maps through bilinear interpolation, and stitch them together to form a 256×256×16 modal correlation feature set;

[0112] (5) Lifestyle feature extraction unit: A 2-layer fully connected network (1×5 input layer, 32 ReLU neurons in the hidden layer, and 64 ReLU neurons in the output layer) is constructed to encode the lifestyle quantification vector into a 1×64 feature vector; a 256×256×64 lifestyle feature map is then generated through a spatial broadcasting strategy;

[0113] (6) Functional metabolic feature extraction unit: The ADC interval features are mapped into 5 256×256×1 feature maps (1 map for each interval); the normalized Ktrans, Ve, and Kep parameters are spatially broadcast to generate 3 256×256×1 feature maps respectively; after splicing, a 256×256×8 functional metabolic feature set is formed.

[0114] (7) Tissue-specific feature extraction unit: The colorectal anatomical parts (rectum, sigmoid colon, descending colon, transverse colon, ascending colon) are encoded using one-hot encoding to generate 5 256×256×1 site feature maps; the intestinal wall is segmented using the U-Net++ model to obtain a binary mask (1 represents the intestinal wall, 0 represents the non-intestinal wall), and the shortest distance from each pixel to the intestinal wall is calculated using the Euclidean distance transform algorithm. The distance constraint feature map is generated by assigning values according to "≤3mm=1, 3-10mm=0.5, >10mm=0"; after splicing, a 256×256×6 tissue-specific feature set is formed.
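The distance constraint mapping in unit (7) can be sketched as below. For clarity the sketch computes distances by brute force on a tiny mask; the pixel spacing of 1.0 mm and the function name are assumptions, and a production pipeline would more likely use a dedicated Euclidean distance transform.

```python
import numpy as np

def distance_constraint_map(wall_mask, spacing_mm=1.0):
    """Map each pixel's distance to the nearest wall pixel to 1.0 / 0.5 / 0.0."""
    wall_mask = np.asarray(wall_mask, dtype=bool)
    wall_rc = np.argwhere(wall_mask)                          # (K, 2) wall coordinates
    if wall_rc.size == 0:
        raise ValueError("wall_mask contains no intestinal-wall pixels")
    rows, cols = np.indices(wall_mask.shape)
    all_rc = np.stack([rows.ravel(), cols.ravel()], axis=1)   # (N, 2) all coordinates
    diff = all_rc[:, None, :] - wall_rc[None, :, :]           # (N, K, 2) pairwise offsets
    dist_mm = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1) * spacing_mm
    dist_mm = dist_mm.reshape(wall_mask.shape)
    d_map = np.zeros(wall_mask.shape)                         # > 10 mm -> 0
    d_map[dist_mm <= 10.0] = 0.5                              # 3-10 mm -> 0.5
    d_map[dist_mm <= 3.0] = 1.0                               # <= 3 mm -> 1
    return d_map, dist_mm

# One wall pixel at the left end of a 1 x 16 strip: distances are 0..15 mm.
mask = np.zeros((1, 16), dtype=int)
mask[0, 0] = 1
d_map, dist_mm = distance_constraint_map(mask, spacing_mm=1.0)
```

The brute-force version costs O(N·K) memory and time, so it is only suitable for small illustrations like this one.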

[0115] By concatenating the above seven feature sets through channels, a multi-dimensional feature pool of 256×256×350 is obtained;
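A small bookkeeping sketch helps track which channels of the pool belong to which unit. It uses the per-unit channel counts exactly as listed in units (1) to (7); note that these counts sum to 363, whereas the pool is stated as 256×256×350, and the text does not indicate which figure is intended. The dictionary keys are hypothetical names.

```python
# Channel counts per feature set, as listed in units (1)-(7)
FEATURE_SETS = {
    "texture": 5,
    "shape": 7,
    "grayscale": 257,
    "modal_correlation": 16,
    "lifestyle": 64,
    "functional_metabolism": 8,
    "tissue_specific": 6,
}
STATED_POOL_CHANNELS = 350  # channel count given for the concatenated pool

# Half-open channel ranges [start, stop) of each set inside the concatenated pool
offsets = {}
start = 0
for name, n_channels in FEATURE_SETS.items():
    offsets[name] = (start, start + n_channels)
    start += n_channels

total_channels = start
difference = total_channels - STATED_POOL_CHANNELS
```

Keeping such an offset table makes it straightforward to address "key channels" (for example the ADC interval or distance constraint channels) in the attention step that follows.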

[0116] The training of the fully connected network for the lifestyle feature extraction unit requires "lifestyle quantification vector - lesion label (0=normal, 1=lesion)" as training data, using the cross-entropy loss function and Adam optimizer (learning rate 1e-4, β1=0.9, β2=0.999), 50 training rounds, and an early stopping strategy (if the validation set loss does not decrease for 5 consecutive rounds, training is stopped).

[0117] The ADC map used by the functional metabolic feature extraction unit must be calculated with the post-processing software supplied with the MRI equipment (such as GE AW 4.7 or Siemens Syngo), and motion artifact regions must be excluded during the calculation; the pharmacokinetic parameters are fitted with the PKIN toolkit of MATLAB R2022b, and motion correction must be performed on the DCE time series images before fitting.

[0118] S5: Introduce spatial attention and channel attention mechanisms into the multi-dimensional feature fusion module, perform weight allocation and feature filtering on the multi-dimensional feature set, and output the fused feature map, including the following steps:

[0119] (1) Spatial attention mechanism: Global average pooling is performed on the multi-dimensional feature pool to obtain a 1×1×350 feature vector; a spatial attention weight matrix (256×256×350) is generated through a 2-layer fully connected network (128 ReLU neurons in the hidden layer and 350 Sigmoid neurons in the output layer); for the region of "high risk of lifestyle habits (quantitative value ≥0.8) + abnormal functional metabolism (ADC <0.6×10⁻³mm² / s or Ktrans ≥0.3min⁻¹)", the spatial weight is increased by 1.2-1.5 times;

[0120] (2) Channel attention mechanism: The squeeze-and-excitation (SE) module is used to perform global average pooling on the multi-dimensional feature pool to obtain a 1×1×350 channel feature vector; a channel attention weight vector (1×1×350) is generated through a 2-layer fully connected network (32 ReLU neurons in the hidden layer and 350 Sigmoid neurons in the output layer); the channel weights are increased by 1.3 times for key channels such as the high ADC activity region (<0.6×10⁻³mm² / s), high Ktrans value, and intestinal wall distance ≤3mm;

[0121] (3) Feature fusion: The spatial attention weight matrix and the channel attention weight vector are fused element-wise to obtain the final feature weight matrix; the multi-dimensional feature pool is fused element-wise with the feature weight matrix to output a 256×256×350 fused feature map;

[0122] The weight adjustment of the spatial attention mechanism needs to be optimized on the validation set: on a validation set containing 200 samples, the Dice similarity coefficient (DSC) is calculated under different weight coefficients, and the weight range corresponding to the maximum DSC value (1.2-1.5 times) is selected. The key channels of the channel attention mechanism are determined through feature importance analysis (such as permutation importance), which identifies the channels whose contribution to the segmentation results ranks in the top 20%.
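The channel attention branch of step S5 can be sketched in NumPy as a squeeze-and-excitation style reweighting. This is an illustration with random, untrained weights on a small 8×8 map; it reads "increased by 1.3 times" as multiplication by 1.3, which is an assumption, and the key channel indices 0 and 5 are arbitrary placeholders rather than the channels selected by the importance analysis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_channel_attention(features, w1, b1, w2, b2, key_channels=(), boost=1.3):
    """features: (H, W, C). Returns (reweighted features, per-channel weights)."""
    squeezed = features.mean(axis=(0, 1))            # global average pooling -> (C,)
    hidden = np.maximum(squeezed @ w1 + b1, 0.0)     # ReLU hidden layer (32 units)
    weights = sigmoid(hidden @ w2 + b2)              # Sigmoid output, one weight per channel
    idx = np.asarray(key_channels, dtype=int)
    weights[idx] = weights[idx] * boost              # boost the designated key channels
    return features * weights, weights               # weights broadcast over H and W

rng = np.random.default_rng(0)
H, W, C, HIDDEN = 8, 8, 350, 32
x = rng.normal(size=(H, W, C))
w1, b1 = rng.normal(scale=0.05, size=(C, HIDDEN)), np.zeros(HIDDEN)
w2, b2 = rng.normal(scale=0.05, size=(HIDDEN, C)), np.zeros(C)

base_out, base_w = se_channel_attention(x, w1, b1, w2, b2)
out, w = se_channel_attention(x, w1, b1, w2, b2, key_channels=[0, 5])
```

Because the boost is applied after the Sigmoid, a boosted weight can exceed 1, which is consistent with the idea of amplifying key channels rather than merely gating them.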

[0123] S6: Input the fused feature map into the U-Net network for feature decoding to obtain preliminary segmentation results;

[0124] The U-Net network used here is an improved U-Net network for feature decoding. The training parameters of the improved U-Net network are set as follows: batch size 8, initial learning rate 1e-4, cosine annealing learning rate scheduling (decaying to 0.8 times the previous round every 10 rounds), 100 training rounds, and early stopping strategy (training stops if the DSC on the validation set does not improve for 8 consecutive rounds).
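The learning rate schedule can be made concrete with a short sketch. The parenthetical rule (multiply by 0.8 every 10 rounds) describes a stepwise decay, while "cosine annealing" conventionally denotes a smooth cosine curve, so both are shown side by side; which one was actually used is not determinable from the text. The function names, and the choice of a minimum rate of 0 for the cosine curve, are assumptions.

```python
import math

BASE_LR = 1e-4   # initial learning rate from the text
ROUNDS = 100     # total training rounds from the text

def step_decay_lr(rnd, base_lr=BASE_LR, factor=0.8, every=10):
    """Stepwise rule from the parenthetical: multiply by 0.8 every 10 rounds."""
    return base_lr * factor ** (rnd // every)

def cosine_annealing_lr(rnd, base_lr=BASE_LR, total=ROUNDS, min_lr=0.0):
    """Conventional cosine annealing from base_lr down to min_lr over `total` rounds."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * rnd / total))

schedule = [(rnd, step_decay_lr(rnd), cosine_annealing_lr(rnd)) for rnd in range(ROUNDS)]
```

Under the stepwise rule the rate after 99 rounds is still about 13% of its initial value, whereas the cosine curve approaches its minimum by the end of training.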

[0125] The improved U-Net network includes an encoder and a decoder:

[0126] (1) Encoder: It contains 4 convolutional blocks. Each convolutional block consists of two 3×3 convolutional layers (the number of output channels is 64, 128, 256, and 512 respectively), a batch normalization layer, and a ReLU activation function layer. Adjacent convolutional blocks are downsampled through a 2×2 max pooling layer (stride 2). Each convolutional block introduces a residual connection (adding the input and output features of the convolutional block) to solve the gradient vanishing problem in deep networks.

[0127] (2) Decoder: It contains 4 deconvolution blocks. Each deconvolution block consists of a 2×2 transposed convolutional layer (the number of input channels is 512, 256, 128, and 64 respectively), a batch normalization layer, and a ReLU activation function layer. Upsampling is achieved through transposed convolution. Skip connections are used to fuse the high-resolution features (including edge details) of the corresponding layer of the encoder with the low-resolution features of the decoder.

[0128] (3) Output layer: The number of channels is reduced to 1 by a 1×1 convolutional layer, and a 256×256×1 probability map is output by combining the Sigmoid activation function (pixel value 0-1, the closer the value is to 1, the more likely it is to be a lesion area); the area with pixel value ≥0.5 in the probability map is marked as a lesion, and the preliminary segmentation result is obtained;

[0129] S7: The preliminary segmentation results are post-processed using a fully connected conditional random field (CRF) to optimize the segmentation accuracy of the boundary region and obtain the final segmentation results of the colorectal cancer MRI image.

[0130] The fully connected conditional random field (CRF) used here is an improved fully connected CRF that uses an extended adaptive potential function defined over any two pixels p and q in the image. Its inputs are the spatial coordinates of the two pixels, with a spatial-distance standard deviation of 10; their grayscale values, with a grayscale-difference standard deviation of 5; the multimodal feature values Fp and Fq of pixels p and q, with a modal-feature standard deviation of 3; Hp, the lifestyle risk coefficient of pixel p; and Dp, the tissue distance constraint coefficient of pixel p. The terms are combined through weighting coefficients, one of which is 0.5. By iteratively optimizing the adaptive potential function, the pixel classification results in blurred boundary regions are corrected, and the final colorectal cancer MRI image segmentation result is obtained.
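The explicit expression of the potential function is not reproduced above, so its exact form cannot be recovered from this text. As a hedged illustration only, one form consistent with the listed quantities would be the following. The symbols x, I, σ_α, σ_β, σ_γ, w_1 and w_2, the grouping of the Gaussian terms, and the additive way in which H_p and D_p enter are all assumptions; only the three standard deviations (10, 5 and 3), the names F_p, F_q, H_p and D_p, and the term w3×Hp+w4×Dp used in Example 3 come from the document.

```latex
\psi(p,q) =
  w_1 \exp\!\left(
      -\frac{\lVert x_p - x_q \rVert^2}{2\sigma_\alpha^2}
      -\frac{(I_p - I_q)^2}{2\sigma_\beta^2}
  \right)
  + w_2 \exp\!\left(
      -\frac{\lVert F_p - F_q \rVert^2}{2\sigma_\gamma^2}
  \right)
  + w_3 H_p + w_4 D_p,
\qquad \sigma_\alpha = 10,\; \sigma_\beta = 5,\; \sigma_\gamma = 3
```

Here x_p and I_p would denote the spatial coordinates and grayscale value of pixel p. In this reading the first two terms reward label agreement between nearby, similar pixels, and the last two terms shift the potential according to lifestyle risk and distance to the intestinal wall.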

[0131] Among them, the iterative optimization of the improved fully connected CRF adopts the mean field approximation algorithm, and the computational complexity of each iteration is controlled at O(N²) (N is the number of image pixels). Through GPU acceleration, the post-processing time of a single image is controlled within 500ms.
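A naive mean-field update for a binary, fully connected CRF can be sketched as follows to show where the O(N²) cost per iteration comes from: every pixel exchanges a message with every other pixel through an N×N kernel. This is a simplified illustration, not the improved CRF of the invention: it uses only the spatial and grayscale Gaussian terms (standard deviations 10 and 5), a Potts-style label compatibility, and an illustrative weight of 0.5; the modal feature, Hp and Dp terms are omitted, and all function names are hypothetical.

```python
import numpy as np

def dense_kernel(coords, gray, sigma_xy=10.0, sigma_gray=5.0):
    """Gaussian affinity between all pixel pairs: an (N, N) matrix."""
    d_xy = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    d_gray = (gray[:, None] - gray[None, :]) ** 2
    k = np.exp(-d_xy / (2 * sigma_xy ** 2) - d_gray / (2 * sigma_gray ** 2))
    np.fill_diagonal(k, 0.0)                      # a pixel sends no message to itself
    return k

def mean_field(prob_lesion, gray, weight=0.5, iters=5, eps=1e-6):
    """Refine a lesion probability map with naive O(N^2) mean-field iterations."""
    h, w = prob_lesion.shape
    rows, cols = np.indices((h, w))
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    k = dense_kernel(coords, gray.ravel().astype(float))
    p = np.clip(prob_lesion.ravel(), eps, 1 - eps)
    q = np.stack([1 - p, p], axis=1)              # (N, 2): label 0 = normal, 1 = lesion
    unary = -np.log(q)                            # unary cost from the network output
    for _ in range(iters):
        msg = k @ q                               # support each label gets from all pixels
        energy = unary + weight * msg[:, ::-1]    # Potts: pay for support of the other label
        q = np.exp(-(energy - energy.min(axis=1, keepdims=True)))
        q /= q.sum(axis=1, keepdims=True)
    return q[:, 1].reshape(h, w)

# Toy 5x5 image: bright right-hand region is the lesion, with one uncertain pixel.
gray = np.zeros((5, 5))
gray[:, 3:] = 100.0
prob = np.full((5, 5), 0.1)
prob[:, 3:] = 0.9
prob[2, 3] = 0.4                                  # below the 0.5 threshold before refinement
refined = mean_field(prob, gray)
```

In the toy example the uncertain pixel shares its grayscale value with confidently labeled lesion neighbors, so the refinement pulls it above the 0.5 threshold.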

[0132] The method also includes model training and performance verification steps, for example:

[0133] (1) Dataset construction: Collect multi-source data of at least 800 patients with pathologically confirmed colorectal cancer, including 450-500 male patients and 300-350 female patients, aged 40-75 years, with tumor stages I-IV; each case must include complete multimodal MRI images, clinically relevant data, and the gold standard segmentation jointly annotated by two radiologists with the title of associate chief physician or above (in case of inconsistent annotation, a chief physician arbitrates);

[0134] (2) Data set partitioning: The dataset was randomly divided into a training set (560-640 cases), a validation set (160-180 cases), and a test set (80-100 cases) in a ratio of 7:2:1. Stratified sampling was used in the partitioning process to ensure that the tumor stage, gender, and age distribution in each set were consistent with the total dataset.

[0135] (3) Data augmentation: Random flipping (horizontal / vertical flipping, probability 0.5), random rotation (-15°~15°, probability 0.5), random scaling (0.8~1.2 times, probability 0.5), and Gaussian noise addition (standard deviation 0.01~0.03, probability 0.3) were used to enhance the diversity of multimodal MRI images in the training set; random perturbation (within ±10%, probability 0.4) was used on the lifestyle quantification vector in the clinical association data to simulate the heterogeneity of real data;

[0136] (4) Model Training: A joint training strategy is adopted, using the multi-scale feature extraction module, the multi-dimensional feature fusion module (including living habits / functional metabolism / tissue-specific units), and the improved U-Net network as the overall model. The total loss function is the weighted sum of the cross-entropy loss function and the Dice loss function, expressed as:

[0137] Loss=a×CrossEntropyLoss+(1–a)×DiceLoss

[0138] Here, a is the weight coefficient, with a value of 0.4 (determined through validation set tuning); CrossEntropyLoss measures the difference between the predicted probability distribution and the true labels, and DiceLoss measures the overlap between the predicted segmentation region and the gold standard. The Adam optimizer is used (initial learning rate 1e-4, β1=0.9, β2=0.999, weight decay 1e-5), the batch size is set to 8, training runs for 100 epochs, and an early stopping strategy is employed (training stops if the Dice similarity coefficient on the validation set does not improve for 8 consecutive epochs). The model weights with the best performance on the validation set are saved;

(5) Performance verification: The model performance is verified on the test set using four evaluation indicators: Dice similarity coefficient (DSC, target ≥0.90), intersection over union (IoU, target ≥0.85), sensitivity (target ≥0.88, measuring the recall rate of the lesion area), and specificity (target ≥0.95, measuring the accuracy on the normal area). The verification process also records the segmentation time of a single image (target ≤1.5s, including preprocessing and postprocessing) to ensure that the model meets clinical real-time requirements.
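The combined loss and the four evaluation indicators can be written out in NumPy as a minimal sketch. It assumes binary cross-entropy on a single-channel probability map and a soft Dice loss with a small smoothing constant; the exact Dice formulation and the function names are not given in the text and are assumptions here.

```python
import numpy as np

def combined_loss(prob, target, a=0.4, eps=1e-7):
    """Loss = a * CrossEntropyLoss + (1 - a) * DiceLoss for a binary probability map."""
    prob = np.clip(prob, eps, 1 - eps)
    ce = -np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob))
    inter = np.sum(prob * target)
    dice_loss = 1.0 - (2 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)
    return a * ce + (1 - a) * dice_loss

def segmentation_metrics(pred, target):
    """DSC, IoU, sensitivity and specificity from binary masks.

    Assumes both lesion and normal pixels are present (no zero denominators).
    """
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    tn = np.sum(~pred & ~target)
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "IoU": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy 4x4 case: a 2x2 lesion; the prediction hits 3 lesion pixels and adds 1 false positive.
target = np.zeros((4, 4), dtype=int)
target[1:3, 1:3] = 1
pred = target.copy()
pred[2, 2] = 0   # one missed lesion pixel (false negative)
pred[0, 0] = 1   # one spurious pixel (false positive)

scores = segmentation_metrics(pred, target)
loss_imperfect = combined_loss(pred.astype(float), target)
loss_perfect = combined_loss(target.astype(float), target)
```

In the toy case there are 3 true positives, 1 false positive, 1 false negative and 11 true negatives, which gives a DSC of 0.75 and an IoU of 0.6.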

[0139] In summary, the technical effects of the present invention are as follows:

[0140] The DSC on the test set reached 0.92±0.03, which is 15% higher than the traditional U-Net and 8% higher than the attention-based U-Net.

[0141] The false positive rate in non-intestinal wall areas was reduced to 3.2%, a 70% reduction compared to existing methods;

[0142] The early small lesion (3-5 mm in diameter) identification rate reached 89%, which is 30% higher than existing methods;

[0143] The segmentation time for a single image is 1.2 ± 0.2 s, which meets the clinical real-time requirements (≤ 1.5 s).

[0144] For patients with different lifestyles and physical conditions, the segmentation performance fluctuation (DSC difference) was controlled within 0.05, and the adaptability was significantly enhanced.

[0145] Example 2

[0146] Based on Example 1, the practical steps, model training, and deployment examples for multi-source data acquisition and preprocessing in the colorectal cancer MRI image segmentation method based on multi-dimensional feature fusion are as follows:

[0147] 1.1 Multimodal MRI Image Acquisition:

[0148] Equipment selection: A 3.0T superconducting MRI device (such as Siemens Prisma 3.0T) equipped with an 18-channel abdominal phased array coil was used;

[0149] Sequence parameter settings:

[0150] T1 weighted image: spin echo sequence, TR=650ms, TE=15ms, slice thickness 4mm, inter-slice spacing 1mm, FOV=380×380mm², matrix 320×320;

[0151] T2 weighted image: Fast spin echo sequence, TR=3000ms, TE=100ms, slice thickness 4mm, inter-slice spacing 1mm, FOV=380×380mm², matrix 320×320;

[0152] DWI: Single-shot spin-echo echo-planar imaging sequence, TR=4000ms, TE=80ms, b-values of 0 and 1000 s/mm², slice thickness 4mm, inter-slice spacing 1mm, FOV=380×380mm², matrix 128×128;

[0153] DCE-MRI: Three-dimensional gradient echo sequence, TR=4ms, TE=1.5ms, flip angle 12°, slice thickness 2mm, no inter-slice spacing, FOV=380×380mm², matrix 256×256; the contrast agent used was gadopentetate dimeglumine (Gd-DTPA), dose 0.1mmol / kg, injected via bolus injection through the antecubital vein (rate 2mL / s), followed by 30 consecutive scans (total scan time 6min).

[0154] 1.2 Clinical-related data collection:

[0155] Lifestyle data: The hospital's electronic medical record system was used to extract the patient's smoking history and drinking history over the past 5 or 10 years (the period being chosen as needed). Combined with the "Colorectal Cancer Lifestyle Questionnaire", dietary structure (daily fat intake, dietary fiber intake), exercise habits (type and duration of exercise per week), and constipation history (number of bowel movements per week, frequency of difficulty in defecation) were collected.

[0156] Functional metabolic data: The ADC map was calculated using Siemens Syngo post-processing software. The "Diffusion-Weighted Imaging - ADC Mapping" module was selected to automatically generate the ADC value distribution map, and motion artifact regions were manually excluded (marked as "invalid regions"). Pharmacokinetic parameters were fitted using the PKIN toolkit in MATLAB R2022b. The DCE time series images (DICOM format) were imported, the "Tofts two-compartment model" was selected, the arterial input function (AIF) was set to "PopulationAIF", and the Ktrans, Ve, and Kep parameter maps were output.

[0157] 1.3 Preprocessing Practice:

[0158] Image registration: Using ITK-SNAP 4.0 software, the T2-weighted image was used as the fixed image and the other modalities as moving images. The "mutual information" similarity metric was selected, and the "rigid transformation + Gaussian pyramid downsampling" registration strategy was adopted. The number of iterations was 500, and the root mean square error (RMSE) after registration was <0.3 pixels.

[0159] Lifestyle quantification: Using the Python Pandas library, "smoking history ≥ 10 years" was mapped to 1.0, "5-10 years" to 0.5, and "< 5 years or none" to 0.0. Other habits were quantified using the same logic. Missing values were filled with the mean of the same age group (10-year bands).

[0160] Functional metabolic parameter normalization: Based on the parameter ranges of the 560 training samples (Ktrans: 0.05-0.6 min⁻¹, Ve: 0.1-0.5, Kep: 0.1-0.8 min⁻¹), the Min-Max formula x_norm = (x − x_min) / (x_max − x_min) was used to normalize these parameters to the 0-1 range.

2. Model Training and Deployment Practice

[0161] 2.1 Training environment configuration:

[0162] Hardware: CPU Intel Xeon Gold 6348 (24 cores), GPU NVIDIA A100 (40GB VRAM), Memory 128GB DDR4, Hard Drive 2TB SSD;

[0163] Software: Operating system Ubuntu 20.04 LTS, deep learning framework PyTorch 1.12.1, CUDA 11.6, cuDNN 8.4.1, image processing library OpenCV 4.6.0, medical image library SimpleITK 2.2.1, visualization tool TensorBoard 2.11.0.

[0164] 2.2 Model Training Steps:

[0165] Data loading: A custom Dataset class is used to load data in the format of "MRI image (4 channels) + lifestyle characteristics (1×5) + functional metabolic characteristics (8 channels) + tissue-specific characteristics (6 channels) + segmentation label". The batch size is 8, and multi-threading (num_workers=8) is used to accelerate the process.

[0166] Loss function calculation: PyTorch is used to implement the cross-entropy loss and the Dice loss with weight coefficient a=0.4, and the parameters of the multi-scale feature module, attention module and U-Net network are updated synchronously through backpropagation;

[0167] Early stopping strategy: When the DSC on the validation set improves by less than 0.01 for 8 consecutive rounds, save the current model weights (file name: "best_model.pth") and stop training; during training, evaluate on the validation set every 5 rounds and record the DSC, IoU and loss value.

[0168] 2.3 Clinical Deployment:

[0169] Software Development: Develop a Windows desktop application, "Colorectal Cancer MRI Segmentation System," based on PyQt5. This system supports DICOM image import (compatible with GE, Siemens, and Philips device formats), segmentation result visualization (pseudo-color annotation: red for tumor areas, transparent for normal tissue), and result export (JSON format including lesion area, maximum diameter, and DSC value).

[0170] Performance testing: In a hospital clinical environment (Windows 10 system, CPU Intel i7-12700H, GPU NVIDIA RTX 3070), the segmentation time for a single 256×256 image is 1.1±0.1s, which meets the real-time clinical requirements.

[0171] Example 3

[0172] Based on Example 1, this example is applied to a clinical setting. It briefly describes the segmentation process for a case of early-stage small lesions (stage I tumor, 4mm in diameter), as follows:

[0173] Patient information: Male, 52 years old, smoking for 15 years (quantitative value 1.0), high-fat diet (1.0), 20 minutes of exercise per week (0.1), occasional constipation (0.3); BMI=24, no other underlying diseases;

[0174] MRI features: localized mild thickening of the rectal wall on T2-weighted images (signal difference <8%), localized high signal on DWI, ADC=0.55×10⁻³ mm²/s (high activity, quantification value 1.0), DCE-Ktrans=0.32 min⁻¹ (abnormal, 0.8).

[0175] Segmentation process:

[0176] a. Multidimensional feature fusion: The lifestyle unit outputs a high-risk code (channel value 0.91), the functional metabolism unit marks high ADC activity (channel 1 value 1.0) and abnormal Ktrans (channel 6 value 0.8), and the tissue-specific unit confirms that the lesion is located in the rectal wall (distance = 1.8 mm, Dp = 1.0).

[0177] b. Attention mechanism: The spatial weight of the above-mentioned regions is increased by 1.4 times, and the channel weight is increased by 1.3 times, thus strengthening weak signals;

[0178] c. CRF post-processing: Hp=0.82 (lifestyle risk coefficient), Dp=1.0, in the potential function:

[0179] w3×Hp + w4×Dp = 0.15×0.82 + 0.05×1.0 = 0.173, strengthening the association with the lesion.
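The arithmetic of this clinical prior term can be checked with a few lines of Python. The function name is hypothetical; the weights 0.15 and 0.05 and the Hp and Dp values are those given in Examples 3 and 4.

```python
W3, W4 = 0.15, 0.05  # weights of the lifestyle and distance terms given in the text


def clinical_prior_term(hp, dp, w3=W3, w4=W4):
    """Return w3*Hp + w4*Dp, the clinical-prior part of the potential function."""
    return w3 * hp + w4 * dp


example3 = clinical_prior_term(hp=0.82, dp=1.0)  # early small lesion case
example4 = clinical_prior_term(hp=0.08, dp=0.0)  # ascites case outside the wall
```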

[0180] Results: The segmentation DSC was 0.91, and the lesion boundary showed 93% agreement with the pathological section (postoperatively confirmed as stage I adenocarcinoma); the traditional U-Net failed to identify the lesion (DSC=0.32), and the attention U-Net identified it only partially (DSC=0.65).

[0181] Example 4

[0182] Based on Example 1, this example is applied to a clinical setting. It briefly describes the segmentation process for cases with non-intestinal wall false-positive filtration (interference from ascites), as follows:

[0183] Patient information: Female, 68 years old, no history of smoking or drinking (0.0), low-fat diet (0.0), 150 minutes of exercise per week (1.0), no constipation (0.0); BMI=26, history of cirrhosis (small amount of ascites).

[0184] MRI features: T2-weighted images show patchy high signal in the abdominal cavity (easily misdiagnosed as a tumor), DWI shows low signal (ADC=2.4×10⁻³ mm²/s, normal, 0.0), DCE-Ktrans=0.11 min⁻¹ (normal, 0.1), intestinal wall distance=16 mm (Dp=0.0).

[0185] Segmentation process:

[0186] a. Tissue-specific unit output Dp=0.0 (distance>10mm);

[0187] b. CRF post-processing: w4×Dp=0, and Hp=0.08 (low risk), so w3×Hp=0.012; the probability of lesions in this area drops below 0.05;
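The distance rule behind Dp (≤3 mm gives 1, 3-10 mm gives 0.5, >10 mm gives 0) can be written as a small lookup. The function name is hypothetical, and assigning exactly 10 mm to the 0.5 band is an assumption, since the rule as stated leaves that boundary open.

```python
def distance_constraint(distance_mm):
    """Map distance to the intestinal wall (mm) onto the coefficient Dp."""
    if distance_mm <= 3.0:
        return 1.0   # on or next to the intestinal wall
    if distance_mm <= 10.0:
        return 0.5   # transition band (3-10 mm)
    return 0.0       # treated as outside the intestinal wall


dp_example3 = distance_constraint(1.8)   # rectal-wall lesion in Example 3
dp_example4 = distance_constraint(16.0)  # ascites region in Example 4
```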

[0188] Results: The area was classified as background (specificity = 0.98) with no false positives; traditional U-Net misclassified it as a lesion (false positive rate 18%), and U-Net++ misclassified it as a lesion (false positive rate 12%).

[0189] Example 5

[0190] Based on Example 1, this example is applied to a clinical setting. It briefly describes the segmentation process for obese patients (with fat signal interference), as follows:

[0191] Patient information: Male, 48 years old, smoked for 12 years (0.6), high-fat diet (1.0), exercised for 30 minutes per week (0.2), chronic constipation (0.8); BMI=31 (obese, significant fat signal).

[0192] MRI features: T2-weighted images showed the intestinal wall being masked by high signal intensity from fat (signal-to-noise ratio <15); DWI showed a local ADC of 0.62×10⁻³ mm²/s (low activity, 0.8) and DCE-Ktrans of 0.25 min⁻¹ (mild abnormality, 0.6).

[0193] Segmentation process:

[0194] a. The risk code output by the lifestyle unit is 0.75, and the functional metabolism unit marks mild ADC abnormality (channel 2 value 0.8).

[0195] b. Attention mechanism: The spatial weight of the "high-risk + metabolic abnormality" region is increased by 1.2 times and the channel weight is increased by 1.3 times, which counteracts the interference of fat signaling;

[0196] Results: The segmentation DSC was 0.90, and the tumor invasion depth (T2 stage) was consistent with postoperative pathology; existing methods were affected by fat interference (traditional U-Net DSC=0.72, attention U-Net DSC=0.78).

[0197] Example 6

[0198] Based on Example 1, this example is applied to a clinical setting. It briefly describes the segmentation process for cases of lesion segmentation with blurred boundaries (tumor and inflammation being confused), as follows:

[0199] Patient information: Female, 55 years old, smoked for 8 years (0.5), normal diet (0.5), exercised 60 minutes per week (0.4), occasional constipation (0.3); after chemotherapy following colon cancer surgery, the boundary between the tumor and inflammation is blurred;

[0200] MRI features: Local thickening of the intestinal wall on T2-weighted images (overlapping signals between the tumor and inflammation, with unclear boundaries); DWI showed ADC=0.58×10⁻³ mm²/s (1.0) in the tumor area and ADC=0.85×10⁻³ mm²/s (0.5) in the inflammation area; DCE-Ktrans was 0.35 min⁻¹ (0.9) in the tumor area and 0.18 min⁻¹ (0.3) in the inflammation area.

[0201] Segmentation process:

[0202] a. The functional metabolic unit distinguishes the differences in ADC / Ktrans between tumors and inflammation, generating differential feature maps;

[0203] b. Attention mechanism: The weight of regions with ADC < 0.6×10⁻³ mm²/s and Ktrans ≥ 0.3 min⁻¹ is increased by 1.5 times;
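The weighting rule of this example can be sketched as a per-region check. The function name and the base weight of 0.4 are hypothetical; the two thresholds, the factor 1.5 and the ADC and Ktrans values of the tumor and inflammation regions come from this example.

```python
ADC_THRESHOLD = 0.6e-3   # mm^2/s, from the text
KTRANS_THRESHOLD = 0.3   # 1/min, from the text
BOOST = 1.5              # multiplier used in this example


def boosted_weight(base_weight, adc, ktrans):
    """Scale an attention weight where both metabolic criteria are met."""
    if adc < ADC_THRESHOLD and ktrans >= KTRANS_THRESHOLD:
        return base_weight * BOOST
    return base_weight


tumor = boosted_weight(0.4, adc=0.58e-3, ktrans=0.35)
inflammation = boosted_weight(0.4, adc=0.85e-3, ktrans=0.18)
```

Only the tumor region satisfies both criteria, so its weight rises while the inflammation region keeps its base weight.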

[0204] Results: The tumor region segmentation DSC=0.92, and the inflammatory region was not misclassified (specificity=0.96); the traditional U-Net misclassified inflammation as tumor (DSC=0.75), and the U-Net++ misclassification rate was 30% (DSC=0.78).

[0205] Example 7

[0206] Referring to Figure 2, a colorectal cancer MRI image segmentation system 1 based on multi-dimensional feature fusion includes a memory 2, a processor 3, and a computer program stored on the memory 2 and executable on the processor 3. When the processor 3 executes the computer program, it implements the steps in Embodiment 1.

[0207] In summary, the colorectal cancer MRI image segmentation method and system based on multi-dimensional feature fusion provided by this invention has the following technical effects:

[0208] 1. Clinical application value

[0209] Facilitating early diagnosis: The early small lesion recognition rate is increased to 89%, which can increase the diagnosis rate of stage I colorectal cancer by 23%, increase the 5-year survival rate of patients from the current 68% to more than 85%, and reduce the incidence of late-stage disease;

[0210] Reduce over-medicalization: The non-intestinal wall false positive rate has been reduced to 3.2%, which can reduce unnecessary colonoscopies by about 23% (each colonoscopy costs about 3,000 yuan, saving more than 10 million yuan in medical costs annually).

[0211] Optimize treatment plans: Precise segmentation of tumor boundaries and invasion depth can provide quantitative basis for surgical plans (such as laparoscopic vs. open surgery) and chemotherapy dosage adjustments, reducing the postoperative recurrence rate (from 15% to 8%).

[0212] Adaptable to complex clinical scenarios: For obese patients, patients with poor bowel preparation, and patients requiring postoperative follow-up, the segmentation performance fluctuates little (DSC difference < 0.05), and its applicability covers more than 95% of clinical cases.

[0213] 2. Value of Technological Innovation

[0214] Multi-source feature fusion paradigm: It is the first to propose a feature fusion framework driven jointly by image data and clinical data, providing a new paradigm for medical image segmentation that can be extended to MRI segmentation of other tumors such as lung cancer and liver cancer;

Attention mechanism optimization: It proposes an attention strategy that links lifestyle, functional metabolism and image features to solve the problem of weak signal capture and to provide a technical reference for early lesion segmentation;

[0215] CRF post-processing extension: The potential function with added clinical prior constraints breaks through the limitations of traditional pure image CRF and provides a new method for segmentation of scenes with blurred boundaries and artifact interference;

[0216] Engineering implementation capability: The model is lightweight (weight file < 200MB) and has strong real-time performance (single image processing time < 1.5s), which can be directly integrated into the hospital PACS system without additional hardware upgrades, resulting in low implementation costs.

[0217] 3. Socioeconomic benefits

[0218] Health benefits: Through early diagnosis and precision treatment, colorectal cancer deaths can be reduced by approximately 15% annually, and the average survival time of patients can be extended by 3-5 years;

[0219] Economic cost savings: Reducing over-treatment and postoperative recurrence treatment can save the country more than 500 million yuan in medical expenses annually;

[0220] Technology promotion value: The method can be transferred to primary hospitals (hospitals equipped with 3.0T MRI can deploy it), promote the downward flow of high-quality medical resources, and narrow the gap in medical standards between urban and rural areas.

[0221] Since the system / device described in the above embodiments of the present invention is a system / device used to implement the method of the above embodiments of the present invention, those skilled in the art can understand the specific structure and modifications of the system / device based on the method described in the above embodiments of the present invention. Therefore, it will not be repeated here. All systems / devices used in the method of the above embodiments of the present invention are within the scope of protection of the present invention.

[0222] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, apparatus, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0223] It should be noted that, in the description of this specification, the terms "one embodiment," "some embodiments," "embodiment," "example," "specific example," or "some examples," etc., refer to the specific features, structures, materials, or characteristics described in connection with that embodiment or example that are included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Furthermore, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification and the features of the different embodiments or examples.

[0224] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning of the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the claims should be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of the invention.

[0225] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from the spirit and scope of this invention. Therefore, if these modifications and variations of this invention fall within the scope of the claims of this invention and their equivalents, then this invention should also include these modifications and variations.

Claims

1. A colorectal cancer MRI image segmentation method based on multi-dimensional feature fusion, characterized in that it includes the following steps:
S1: Obtain multimodal data of colorectal cancer patients, the multimodal data including: (1) T1-weighted images, T2-weighted images, diffusion-weighted imaging (DWI) images, and dynamic contrast-enhanced MRI (DCE-MRI) images; (2) clinically relevant data, including lifestyle data and functional metabolic data. The lifestyle data includes smoking history, drinking history, diet, exercise habits and constipation history. The functional metabolic data includes the apparent diffusion coefficient (ADC) map corresponding to the DWI image and the pharmacokinetic parameters corresponding to the DCE image.
S2: Preprocess the multimodal data;
S3: Construct a multi-scale feature extraction module, input the preprocessed multimodal data into the multi-scale feature extraction module, and extract the feature data of the image under the receptive fields of 1×1, 3×3, 5×5 and 7×7 respectively;
S4: Establish a multi-dimensional feature fusion module, which includes a texture feature extraction unit, a shape feature extraction unit, a grayscale feature optimization unit, a modal association feature mining unit, a lifestyle feature extraction unit, a functional metabolic feature extraction unit, and a tissue-specific feature extraction unit. Specifically, the tissue-specific feature extraction unit performs one-hot encoding on the anatomical parts of the colon and rectum, including the rectum, sigmoid colon, descending colon, transverse colon, and ascending colon, generating five 256×256×1 site feature maps. A U-Net++ model is used to segment the intestinal wall to obtain a binary mask, where 1 represents the intestinal wall and 0 represents non-intestinal wall. The shortest distance from each pixel to the intestinal wall is calculated using the Euclidean distance transform algorithm, and values are assigned according to the rules ≤3mm=1, 3-10mm=0.5, and >10mm=0, generating one 256×256×1 distance-constrained feature map. These are then concatenated to form a 256×256×6 tissue-specific feature set. The features extracted in step S3 are then processed to obtain the multi-dimensional feature set.
S5: Introduce a spatial attention mechanism and a channel attention mechanism into the multi-dimensional feature fusion module, perform weight allocation and feature filtering on the multi-dimensional feature set, and output the fused feature map;
S6: Input the fused feature map into the U-Net network for feature decoding to obtain preliminary segmentation results;
S7: Post-process the preliminary segmentation results using an improved fully connected conditional random field (CRF) to optimize the segmentation accuracy of the boundary region and obtain the final colorectal cancer MRI image segmentation results. The improved fully connected CRF employs an extended adaptive potential function, expressed as: [formula not reproduced in this text]; where: p and q are any two pixels in the image; the spatial coordinates of pixels p and q are used with a spatial distance standard deviation of 10; the gray values of pixels p and q are used with a gray difference standard deviation of 5; Fp and Fq are the multi-modal feature values of pixels p and q, with a modal feature standard deviation of 3; Hp is the lifestyle risk coefficient of pixel p; Dp is the tissue distance constraint coefficient of pixel p; w1 = 0.5, w2 = 0.3, w3 = 0.15 and w4 = 0.05 are the weight coefficients. By iteratively optimizing the adaptive potential function, the pixel classification results of the blurred boundary region are corrected, and the final segmentation result of the colorectal cancer MRI image is obtained.

2. The multi-dimensional feature fusion based colorectal cancer MRI image segmentation method according to claim 1, characterized in that the preprocessing of the multimodal data in step S2 includes:
(1) Image preprocessing: Using the T2-weighted image as the reference image, a rigid registration algorithm based on mutual information is used to register the other modal images. During registration, the images are downsampled from 512×512 to 64×64 by a Gaussian pyramid to reduce the computational complexity, and the registration error is controlled within 0.5 pixels. Adaptive Wiener filtering with a 3×3 filter window is used to remove Gaussian noise from the image. Z-score normalization is used to bring the data to a mean of 0 and a standard deviation of 1, achieving grayscale normalization. The region of interest (ROI) containing the lesion area is marked on the T2-weighted image and cropped into an image block of 256×256 pixels.
(2) Clinical data preprocessing: The lifestyle data are quantified and encoded, and the ordinal variables are mapped to continuous values of 0-1 to generate a 1×5 lifestyle quantification vector. For the functional metabolic data, the ADC map is calculated from the b=0 and b=1000 s/mm² sequences of the DWI image, and the ADC values are divided into 5 intervals, namely <0.6, 0.6-1.0, 1.0-1.5, 1.5-2.0, and >2.0×10⁻³ mm²/s, to generate ADC interval features. The time-signal intensity curve of the DCE image is fitted using the Tofts pharmacokinetic model, and three core parameters are extracted: the contrast agent transport constant Ktrans, the extracellular space volume ratio Ve, and the contrast agent reflux rate Kep. Min-Max normalization is used to map the core parameters to the range of 0 to 1, thereby obtaining the pharmacokinetic parameter vector.
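The ADC calculation and interval coding in item (2) can be sketched as follows. The sketch assumes the common two-point mono-exponential estimate ADC = ln(S0/Sb)/b, which the claim does not spell out, and assigns values lying exactly on an interval edge to the upper interval; function names and signal intensities are hypothetical.

```python
import math

# Interval edges in units of 1e-3 mm^2/s, taken from the claim.
ADC_EDGES = [0.6, 1.0, 1.5, 2.0]


def adc_from_dwi(s0, s1000, b=1000.0):
    """Two-point mono-exponential ADC estimate in mm^2/s: ln(S0 / Sb) / b."""
    return math.log(s0 / s1000) / b


def adc_interval_one_hot(adc):
    """One-hot code over the five ADC intervals (<0.6, 0.6-1.0, ..., >2.0)."""
    value = adc * 1e3  # convert to units of 1e-3 mm^2/s
    index = sum(value >= edge for edge in ADC_EDGES)  # edge values go to the upper bin
    return [1 if i == index else 0 for i in range(len(ADC_EDGES) + 1)]


adc = adc_from_dwi(s0=1000.0, s1000=577.0)  # hypothetical signal intensities
code = adc_interval_one_hot(adc)
```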
3. The method of claim 1, wherein step S3 includes: A multi-scale feature extraction module is constructed, which contains four parallel convolutional branches. Each branch consists of a convolutional layer, a batch normalization layer, and a ReLU activation function layer in sequence.
(1) 1×1 convolutional branch: 64 convolutional kernels, stride 1, used to extract image detail features;
(2) 3×3 convolutional branch: 64 convolutional kernels, stride 1, dilation rate 1, used to extract local features of the lesion area;
(3) 5×5 convolutional branch: 64 convolutional kernels, stride 1, dilation rate 2, used to extract medium-scale lesion features;
(4) 7×7 convolutional branch: 64 convolutional kernels, stride 1, dilation rate 3, used to extract global features of the lesion area;
The 256×256×64 feature maps output from the four branches are concatenated along the channel dimension to obtain a 256×256×256 multi-scale feature map.
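The branch geometry can be checked with the standard dilated-convolution formulas. The claim does not state the padding, so the sketch assumes "same" padding, which is what keeps every branch at 256×256 so that the outputs can be concatenated; a dilation rate of 1 is also assumed for the 1×1 branch, and the function names are hypothetical.

```python
BRANCHES = [  # (kernel size, dilation rate, output channels) from the claim
    (1, 1, 64),
    (3, 1, 64),
    (5, 2, 64),
    (7, 3, 64),
]


def effective_kernel(kernel, dilation):
    """Span of a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def same_padding(kernel, dilation):
    """Padding per side that keeps the spatial size unchanged at stride 1."""
    return dilation * (kernel - 1) // 2


def output_size(size, kernel, dilation, padding, stride=1):
    """Standard convolution output-size formula."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# One tuple per branch: (effective kernel span, padding, output size for 256 input).
summary = [
    (effective_kernel(k, d), same_padding(k, d), output_size(256, k, d, same_padding(k, d)))
    for k, d, _ in BRANCHES
]
total_channels = sum(c for _, _, c in BRANCHES)
```

Under these assumptions the dilated 5×5 and 7×7 branches cover 9×9 and 19×19 pixel neighbourhoods respectively while keeping the parameter count of the undilated kernels.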

4. The multi-dimensional feature fusion based colorectal cancer MRI image segmentation method according to claim 1, characterized in that, in step S4, the features extracted in step S3 are processed to obtain a multi-dimensional feature set, and the processing steps include:
(1) Texture feature extraction unit: Using the gray-level co-occurrence matrix (GLCM), under the conditions of distance d=1 and angle θ=0°/45°/90°/135°, four texture parameters, namely contrast, correlation, energy and entropy, are calculated to generate four 256×256×1 GLCM feature maps; using the local binary pattern (LBP), a local texture coding is generated by 3×3 neighborhood comparison, and the coding histogram is statistically analyzed to obtain one 256×256×1 LBP feature map; after concatenation, a 256×256×5 texture feature set is formed.
(2) Shape feature extraction unit: The Canny edge detection algorithm is used, with a high threshold of 0.7 and a low threshold of 0.3, to obtain lesion edge information and generate a 256×256×1 edge feature map; 7 Hu invariant moments are calculated for the lesion area in the edge feature map, and each moment parameter is extended to a 256×256×1 feature map through bilinear interpolation. After concatenation, a 256×256×7 shape feature set is formed.
(3) Grayscale feature optimization unit: Contrast-limited adaptive histogram equalization (CLAHE) is adopted, with the grayscale range divided into 64 intervals and the contrast limit factor set to 2.0, to generate an enhanced grayscale feature map of 256×256×256. The Sobel operator is used to calculate the grayscale gradient in the x and y directions respectively, and the gradient magnitude is obtained by taking the square root of the sum of squares, generating a grayscale gradient feature map of 256×256×1. After concatenation, a grayscale feature set of 256×256×257 is formed.
(4) Modal correlation feature mining unit: The Pearson correlation coefficients of corresponding pixels in the four modal images T1, T2, DWI, and DCE are calculated, and a 4×4 modal correlation matrix is constructed; the 16 correlation coefficients in the matrix are expanded into 16 feature maps of 256×256×1 through bilinear interpolation and concatenated to form a 256×256×16 modal correlation feature set;
(5) Lifestyle feature extraction unit: A two-layer fully connected network is constructed with an input layer of 1×5, a hidden layer of 32 ReLU neurons, and an output layer of 64 ReLU neurons. The lifestyle quantification vector is encoded into a 1×64 feature vector, and a 256×256×64 lifestyle feature map is generated through a spatial broadcasting strategy.
(6) Functional metabolic feature extraction unit: The ADC interval features are mapped into 5 feature maps of 256×256×1, with each interval corresponding to 1 feature map; the normalized Ktrans, Ve, and Kep parameters are expanded into 3 feature maps of 256×256×1 through spatial broadcasting; after concatenation, a 256×256×8 functional metabolic feature set is formed.
(7) Tissue-specific feature extraction unit: One-hot encoding is performed on the anatomical parts of the colorectal region, including the rectum, sigmoid colon, descending colon, transverse colon, and ascending colon, to generate 5 site feature maps of 256×256×1; the intestinal wall is segmented using the U-Net++ model to obtain a binary mask, where 1 represents the intestinal wall and 0 represents the non-intestinal wall. The shortest distance from each pixel to the intestinal wall is calculated using the Euclidean distance transform algorithm, and values are assigned according to the rules of ≤3mm=1, 3-10mm=0.5, and >10mm=0 to generate 1 distance constraint feature map of 256×256×1; after concatenation, a 256×256×6 tissue-specific feature set is formed.
By concatenating the above seven feature sets along the channel dimension, a multi-dimensional feature pool of 256×256×350 is obtained.

5. The multi-dimensional feature fusion based colorectal cancer MRI image segmentation method according to claim 1, characterized in that the processing steps of S5 include:
(1) Global average pooling is performed on the multi-dimensional feature pool to obtain a 1×1×350 feature vector; a spatial attention weight matrix of size 256×256×350 is generated by a two-layer fully connected network, with 128 ReLU neurons in the hidden layer and 350 Sigmoid neurons in the output layer; for regions with high lifestyle risk (quantification value ≥0.8) and abnormal functional metabolism (ADC <0.6×10⁻³ mm²/s or Ktrans ≥0.3 min⁻¹), the spatial weight is increased by 1.2-1.5 times;
(2) The squeeze-and-excitation (SE) module is used to perform global average pooling on the multi-dimensional feature pool to obtain a 1×1×350 channel feature vector; this vector is passed through a two-layer fully connected network, in which the hidden layer contains 32 ReLU neurons and the output layer contains 350 Sigmoid neurons, to produce a channel attention weight vector of size 1×1×350; for the channels corresponding to the high-activity ADC range (<0.6×10⁻³ mm²/s), high Ktrans values, and intestinal wall distance ≤3mm, the channel weight is increased by 1.3 times;
(3) The spatial attention weight matrix and the channel attention weight vector are fused element-wise to obtain the final feature weight matrix; the multi-dimensional feature pool and the feature weight matrix are fused element-wise to output a fused feature map of 256×256×350.

6. The method of claim 1, wherein the U-Net network is an improved U-Net network used for feature decoding, and the improved U-Net network includes an encoder and a decoder.
(1) Encoder: It contains 4 convolutional blocks. Each convolutional block consists of 2 3×3 convolutional layers, a batch normalization layer and a ReLU activation function layer. The numbers of output channels of the convolutional layers are 64, 128, 256 and 512 respectively. Adjacent convolutional blocks are downsampled through 2×2 max pooling layers with a pooling stride of 2. Residual connections are introduced in each convolutional block.
(2) Decoder: It contains 4 deconvolution blocks. Each deconvolution block consists of a 2×2 transposed convolutional layer, a batch normalization layer and a ReLU activation function layer. The numbers of input channels of the transposed convolutional layers are 512, 256, 128 and 64 respectively. Upsampling is achieved through transposed convolution. Skip connections are used to fuse the high-resolution features of the corresponding encoder layer, which contain edge details, with the low-resolution features of the decoder.
(3) Output layer: The number of channels is reduced to 1 by a 1×1 convolutional layer, and a 256×256×1 probability map is output through the Sigmoid activation function. The pixel value range is 0-1; the closer the value is to 1, the more likely the pixel belongs to a lesion area. Regions with pixel values ≥ 0.5 in the probability map are marked as lesions, and the preliminary segmentation results are obtained.

7. The multi-dimensional feature fusion based colorectal cancer MRI image segmentation method according to claim 1, characterized in that, in step S4, the fully connected network of the lifestyle feature extraction unit is trained using pairs of lifestyle quantification vector and lesion label as training data, where 0 represents normal and 1 represents lesion. The cross-entropy loss function and the Adam optimizer are used, with a learning rate of 1e-4, β1=0.9, β2=0.999, and 50 training rounds. An early stopping strategy is adopted, and training is stopped if the validation set loss does not decrease for 5 consecutive rounds.

8. The method of claim 1, wherein the iterative optimization of the improved fully connected CRF in step S7 adopts the mean field approximation algorithm, and the computational complexity of each iteration is controlled at O(N²), where N is the number of image pixels. The post-processing time of a single image is controlled within 500ms by GPU acceleration.

9. A system for colorectal cancer MRI image segmentation based on multi-dimensional feature fusion, comprising a memory, a processor and a computer program stored in the memory and capable of running on the processor, characterized in that, when the processor executes the computer program, it implements the colorectal cancer MRI image segmentation method based on multi-dimensional feature fusion according to any one of claims 1 to 8.