Medical image segmentation method, medical model training method, device and storage medium
By stitching together two-dimensional and three-dimensional features, the problem of low segmentation accuracy and high resource consumption of the prostate organ region in the existing technology is solved, and efficient medical image segmentation and prostate region recognition are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 上海介航机器人有限公司
- Filing Date
- 2022-12-12
- Publication Date
- 2026-06-16
AI Technical Summary
Existing deep learning-based prostate organ region segmentation methods suffer from low segmentation accuracy when using 3D convolution, and adding input feature terms leads to excessive computational resource consumption.
The target two-dimensional features and the initial three-dimensional features are concatenated. A two-dimensional convolutional feature extraction module, a first three-dimensional convolutional feature extraction module, a concatenation module, and a second three-dimensional convolutional feature extraction module together improve the accuracy of image segmentation and reduce resource consumption.
This improves the overall accuracy of medical image segmentation, reduces the consumption of computing resources, and achieves precise identification of the prostate region, thus providing a foundation for automated treatment of prostate cancer.
Figure CN115861248B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of image processing technology, and in particular to a medical image segmentation method, a medical model training method, an apparatus, a computer device, a storage medium, and a computer program product.
Background Technology
[0002] Currently, most deep learning-based prostate organ region segmentation methods simply use 3D convolution to take the sequence of MRI images as three-dimensional data input, stack 3D convolutions to form an end-to-end convolutional neural network model, and delineate the prostate region on the cross-section of the MRI image, or improve the delineation accuracy on the overall sequence by increasing the number of input feature terms.
[0003] However, because the appearance of the prostate varies from slice to slice in the T2W sequence of MRI images, 3D convolution can capture the relationships between slices but does not improve the ability to capture the information within each slice. Using 3D convolution alone therefore causes the model to capture the information of some slices incompletely, resulting in low overall segmentation accuracy. On the other hand, adding input feature terms requires a large amount of computing resources and image data.
Summary of the Invention
[0004] Therefore, it is necessary to provide a medical image segmentation method, medical model training method, device, computer equipment, computer-readable storage medium, and computer program product that can reduce resource consumption while improving image segmentation accuracy, in order to address the above-mentioned technical problems.
[0005] In a first aspect, this application provides a medical image segmentation method, the method comprising:
[0006] Acquire the medical image to be segmented;
[0007] The target two-dimensional features are obtained by extracting features from the medical image to be segmented;
[0008] The initial three-dimensional features are obtained by extracting features from the medical image to be segmented;
[0009] The target two-dimensional features and the initial three-dimensional features are concatenated;
[0010] Feature extraction is performed on the stitched features to obtain the target three-dimensional features, and the segmentation result of the medical image is obtained based on the target three-dimensional features.
[0011] In one embodiment, the step of extracting target two-dimensional features from the medical image to be segmented includes:
[0012] Extract current two-dimensional features with different receptive field sizes from the current input features respectively, wherein the current input features of the first input are generated based on the medical image to be segmented;
[0013] The current two-dimensional features with different receptive field sizes are concatenated to obtain the next input feature, and the next input feature is used as the current input feature. Then, the current two-dimensional features with different receptive field sizes are extracted from the current input feature, and the final output next input feature is used as the target two-dimensional feature.
[0014] In one embodiment, the segmentation result of the medical image is predicted by a pre-trained medical model, which includes a two-dimensional convolutional feature extraction module, a first three-dimensional convolutional feature extraction module, a stitching module, and a second three-dimensional convolutional feature extraction module.
[0015] The step of extracting target two-dimensional features from the medical image to be segmented includes:
[0016] The target two-dimensional features are obtained by extracting features from the medical image to be segmented using the two-dimensional convolution feature extraction module.
[0017] The initial three-dimensional features are obtained by extracting features from the medical image to be segmented, including:
[0018] The first three-dimensional convolutional feature extraction module extracts features from the medical image to be segmented to obtain initial three-dimensional features.
[0019] The step of concatenating the target two-dimensional features and the initial three-dimensional features includes:
[0020] The stitching module stitches together the target two-dimensional features and the initial three-dimensional features.
[0021] The step of extracting features from the stitched features to obtain target 3D features, and obtaining the segmentation result of the medical image based on the target 3D features, includes:
[0022] The second three-dimensional convolutional feature extraction module extracts features from the stitched features to obtain the target three-dimensional features, and the segmentation result of the medical image is obtained based on the target three-dimensional features.
[0023] In one embodiment, the two-dimensional convolutional feature extraction module includes a multi-branch convolutional feature extraction unit and a splicing unit;
[0024] The step of extracting current two-dimensional features with different receptive field sizes from the current input features includes:
[0025] The multi-branch convolutional feature extraction unit extracts current two-dimensional features with different receptive field sizes from the current input features.
[0026] The step of concatenating the current two-dimensional features with different receptive field sizes to obtain the next input feature includes:
[0027] The stitching unit stitches together the current two-dimensional features with different receptive field sizes to obtain the next input feature.
[0028] In one embodiment, the step of concatenating the target two-dimensional feature and the initial three-dimensional feature includes:
[0029] According to the slice index order in the medical image to be segmented, the target two-dimensional features are sorted to obtain the fused three-dimensional features;
[0030] The fused 3D features and the initial 3D features are then stitched together.
[0031] In one embodiment, after acquiring the medical image to be segmented, the method further includes:
[0032] Format conversion is performed on the medical image to be segmented to obtain a three-dimensional medical image and two-dimensional slice images arranged in sequence;
[0033] The step of extracting target two-dimensional features from the medical image to be segmented includes:
[0034] The target two-dimensional features are obtained by performing feature extraction on each of the two-dimensional slice images;
[0035] The initial three-dimensional features are obtained by extracting features from the medical image to be segmented, including:
[0036] Initial three-dimensional features are obtained by feature extraction from the three-dimensional medical image.
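The format conversion described above can be sketched as follows. This is a minimal illustration only: the source does not specify the input file format, so the function name `to_volume_and_slices` and the `(index, array)` input layout are assumptions made for the example.

```python
import numpy as np

def to_volume_and_slices(raw_slices):
    """Build a 3D volume and an ordered list of 2D slices.

    raw_slices: list of (slice_index, 2D array) pairs in arbitrary order
    (a hypothetical input layout; the real acquisition format is not given).
    """
    # Arrange the slices by their index so that both outputs share one order.
    ordered = sorted(raw_slices, key=lambda item: item[0])
    slices_2d = [np.asarray(img, dtype=np.float32) for _, img in ordered]
    # Stacking along a new leading axis yields a (N, H, W) volume.
    volume_3d = np.stack(slices_2d, axis=0)
    return volume_3d, slices_2d

raw = [(1, np.ones((2, 2))), (0, np.zeros((2, 2)))]
volume, slices = to_volume_and_slices(raw)
```

The two-dimensional slices would feed the two-dimensional feature extraction and the volume would feed the three-dimensional feature extraction.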
[0037] In a second aspect, this application also provides a medical model training method, the medical model training method comprising:
[0038] Acquire medical sample data, which includes sample medical images and corresponding target labels;
[0039] Extract sample two-dimensional convolutional features of the sample medical image through the two-dimensional convolutional feature extraction module of the medical model;
[0040] Extract first sample three-dimensional convolutional features of the sample medical image through the first three-dimensional convolutional feature extraction module of the medical model;
[0041] Concatenate the sample two-dimensional convolutional features and the first sample three-dimensional convolutional features to obtain second sample three-dimensional convolutional features;
[0042] Perform feature extraction on the second sample three-dimensional convolutional features, and obtain the model output based on the feature extraction result;
[0043] Generate a first loss function value based on the sample two-dimensional convolutional features and the target label; generate a second loss function value based on the first sample three-dimensional convolutional features and the target label; and generate a third loss function value based on the model output and the target label;
[0044] The medical model is optimized based on the first loss function value, the second loss function value, and the third loss function value to obtain a trained medical model.
[0045] In one embodiment, generating a first loss function value based on the sample two-dimensional convolutional features and the target label includes:
[0046] The first loss function value is generated from the sample two-dimensional convolutional features and the target label using a first loss function.
[0047] The step of generating a second loss function value based on the first sample three-dimensional convolutional features and the target label, and generating a third loss function value based on the model output and the target label, includes:
[0048] The second loss function value is generated from the first sample three-dimensional convolutional features and the target label using a second loss function; the third loss function value is generated from the model output and the target label using the second loss function.
[0049] In one embodiment, optimizing the medical model based on the first loss function value, the second loss function value, and the third loss function value to obtain a trained medical model includes:
[0050] The two-dimensional convolutional feature extraction module of the medical model is optimized based on the first loss function value, the first three-dimensional convolutional feature extraction module of the medical model is optimized based on the second loss function value, and the second three-dimensional convolutional feature extraction module of the medical model is optimized based on the third loss function value, thereby obtaining the trained medical model.
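The three loss values can be illustrated with a short sketch. The text states only that a first loss function is used for the two-dimensional branch and a second loss function for both the first three-dimensional branch and the model output; the concrete choices here (binary cross-entropy as the first loss, Dice loss as the second) and all function names are assumptions made for illustration, not the method disclosed in the source.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy (assumed stand-in for the first loss function)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def dice_loss(pred, target, eps=1e-7):
    """Dice loss (assumed stand-in for the second loss function)."""
    intersection = np.sum(pred * target)
    return float(1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps))

def three_losses(pred_2d, pred_3d_first, model_output, label):
    loss_1 = bce_loss(pred_2d, label)         # drives the 2D feature extraction module
    loss_2 = dice_loss(pred_3d_first, label)  # drives the first 3D feature extraction module
    loss_3 = dice_loss(model_output, label)   # drives the second 3D feature extraction module
    return loss_1, loss_2, loss_3

label = np.array([[1.0, 0.0], [0.0, 1.0]])
# Perfect predictions for the first two branches, a fully wrong final output.
l1, l2, l3 = three_losses(label, label, 1.0 - label, label)
```

In an actual training loop each value would be backpropagated into its corresponding module; the sketch only shows how the three values are formed.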
[0051] In one embodiment, the step of extracting sample two-dimensional convolutional features of the sample medical image through the two-dimensional convolutional feature extraction module of the medical model includes:
[0052] Current sample two-dimensional features with different receptive field sizes are extracted from the current sample input features, wherein the current sample input features of the first input are generated based on the sample medical image;
[0053] The current sample two-dimensional features with different receptive field sizes are concatenated to obtain the next sample input features, which are then used as the current sample input features. Extraction of current sample two-dimensional features with different receptive field sizes from the current sample input features is then repeated, and the next sample input features output last are used as the sample two-dimensional convolutional features.
[0054] In a third aspect, this application also provides a medical image segmentation apparatus, the apparatus comprising:
[0055] The medical image acquisition module is used to acquire the medical image to be segmented.
[0056] The target two-dimensional feature extraction module is used to extract target two-dimensional features from the medical image to be segmented;
[0057] An initial three-dimensional feature extraction module is used to extract features from the medical image to be segmented to obtain initial three-dimensional features;
[0058] The first stitching module is used to stitch the target two-dimensional features and the initial three-dimensional features together;
[0059] The segmentation module is used to extract features from the stitched features to obtain the target three-dimensional features, and to obtain the segmentation result of the medical image based on the target three-dimensional features.
[0060] In a fourth aspect, this application also provides a medical model training device, the medical model training device comprising:
[0061] A medical sample data acquisition module is used to acquire medical sample data, which includes sample medical images and corresponding target labels;
[0062] The sample two-dimensional convolutional feature extraction module is used to extract sample two-dimensional convolutional features of the sample medical image;
[0063] The sample three-dimensional convolutional feature extraction module is used to extract first sample three-dimensional convolutional features of the sample medical image;
[0064] The second splicing module is used to splice the sample two-dimensional convolutional features and the first sample three-dimensional convolutional features to obtain second sample three-dimensional convolutional features;
[0065] The model processing module is used to perform feature extraction on the second sample three-dimensional convolutional features and obtain the model output based on the feature extraction result;
[0066] The network parameter update module is used to generate a first loss function value based on the sample two-dimensional convolutional features and the target label, generate a second loss function value based on the first sample three-dimensional convolutional features and the target label, and generate a third loss function value based on the model output and the target label; and to optimize the medical model according to the first loss function value, the second loss function value, and the third loss function value to obtain the trained medical model.
[0067] In a fifth aspect, this application also provides a computer device, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method described in any of the above embodiments.
[0068] In a sixth aspect, this application also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the method described in any of the above embodiments.
[0069] In a seventh aspect, this application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of the method described in any of the above embodiments.
[0070] The aforementioned medical image segmentation method, medical model training method, device, computer equipment, storage medium, and computer program product extract target two-dimensional features and initial three-dimensional features from the medical image to be segmented, and stitch the two together for image segmentation. The target two-dimensional features add to the features of individual slices in the initial three-dimensional features, thereby improving the overall segmentation accuracy. Furthermore, no additional input feature terms are required, which reduces resource consumption while ensuring accuracy.
Attached Figure Description
[0071] Figure 1 is an application environment diagram of a medical image segmentation method in one embodiment;
[0072] Figure 2 is a flowchart illustrating a medical image segmentation method in one embodiment;
[0073] Figure 3 is a schematic diagram of the principle of two-dimensional convolution processing in one embodiment;
[0074] Figure 4 is a schematic diagram of the target two-dimensional features in one embodiment;
[0075] Figure 5 is a schematic diagram of the principle of three-dimensional convolution processing in one embodiment;
[0076] Figure 6 is a schematic diagram of the splicing and fusion process in one embodiment;
[0077] Figure 7 is a schematic diagram showing the result of the contour line coordinate calculation step in one embodiment;
[0078] Figure 8 is a schematic diagram of the contour lines in one embodiment;
[0079] Figure 9 is a schematic diagram of dilated convolution in one embodiment;
[0080] Figure 10 is a schematic diagram of the structure of a two-dimensional convolutional feature extraction module in one embodiment;
[0081] Figure 11 is a schematic diagram of the structure of a convolutional block in one embodiment;
[0082] Figure 12 is a schematic diagram of the structure of a medical model in one embodiment;
[0083] Figure 13 is a schematic diagram of the structure of a three-dimensional convolutional feature extraction module in one embodiment;
[0084] Figure 14 is a schematic diagram of the format processing steps of a medical image to be segmented in one embodiment;
[0085] Figure 15 is a schematic diagram of a prostate MRI image segmentation process in one embodiment;
[0086] Figure 16 is a flowchart illustrating a medical model training method in one embodiment;
[0087] Figure 17 is a schematic diagram illustrating the calculation of the first loss function in one embodiment;
[0088] Figure 18 is a schematic diagram illustrating the calculation of the second loss function in one embodiment;
[0089] Figure 19 is a schematic diagram of the training process in one embodiment;
[0090] Figure 20 is a structural block diagram of a medical image segmentation device in one embodiment;
[0091] Figure 21 is a structural block diagram of a medical model training device in one embodiment;
[0092] Figure 22 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0093] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0094] The medical image segmentation method and medical model training method provided in this application can be applied in the application environment shown in Figure 1, in which terminal 102 communicates with medical imaging device 104 via a network. Terminal 102 can receive the medical image to be segmented scanned by medical imaging device 104; extract target two-dimensional features from the medical image to be segmented; extract initial three-dimensional features from the medical image to be segmented; stitch the target two-dimensional features and the initial three-dimensional features together; extract features from the stitched features to obtain target three-dimensional features; and obtain the segmentation result of the medical image based on the target three-dimensional features. Because the target two-dimensional features and the initial three-dimensional features are stitched together before image segmentation, the target two-dimensional features add to the features of individual slices in the initial three-dimensional features, thereby improving the overall segmentation accuracy. Furthermore, no additional input feature terms are required, which reduces resource consumption while ensuring accuracy.
[0095] The terminal 102 can be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The medical imaging device 104 includes, but is not limited to, various imaging devices, such as CT imaging devices (CT: Computed Tomography, which uses precisely collimated X-ray beams and highly sensitive detectors to perform a series of cross-sectional scans around a part of the human body, and can reconstruct precise three-dimensional images of tumors, etc.), magnetic resonance imaging devices (a type of tomographic imaging that uses magnetic resonance to obtain electromagnetic signals from the human body and reconstruct images of the human body), positron emission tomography (PET) devices, positron emission tomography / magnetic resonance imaging (PET / MR) systems, etc. The medical images to be segmented can be image sequences acquired by the medical imaging device 104, such as T2W sequences of MRI images.
[0096] In one embodiment, as shown in Figure 2, a medical image segmentation method is provided. Taking the application of the method to the terminal in Figure 1 as an example, the method includes the following steps:
[0097] S202: Obtain the medical image to be segmented.
[0098] Specifically, the medical image to be segmented is an image that requires target segmentation. It can be an image sequence acquired by a medical imaging device, such as a T2W sequence of MRI images. For example, when identifying the prostate, a prostate medical image is acquired by a medical imaging device. In other embodiments, the medical image to be segmented can be a medical image corresponding to other tissues or organs. The corresponding medical target, i.e., tissue or organ, can be segmented from the medical image to be segmented through the processing described below. This allows for subsequent surgical planning based on the segmentation results, thereby improving surgical efficiency.
[0099] S204: Extract the target two-dimensional features from the medical image to be segmented.
[0100] Specifically, the target two-dimensional features are extracted for each slice of the medical image to be segmented. For example, each slice is treated as a two-dimensional image, and then two-dimensional convolution is performed on the two-dimensional image to extract two-dimensional features. The terminal extracts features from each slice of the medical image to be segmented, resulting in the target two-dimensional features for each slice.
[0101] Specifically, Figure 3 is a schematic diagram of the principle of two-dimensional convolution processing in one embodiment. In this embodiment, each slice is regarded as a two-dimensional image, and a convolution operation is performed on it using a convolution kernel of size 3*3 to extract features.
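The per-slice operation can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the kernel weights are a placeholder averaging filter rather than learned parameters, zero padding is assumed so that the output keeps the slice size, and the function name `conv2d_same` is hypothetical. As in deep learning frameworks, the kernel is applied without flipping.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide `kernel` over a 2D `image` with zero padding (output size = input size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum of the neighbourhood centred on (i, j).
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

slice_2d = np.arange(25, dtype=float).reshape(5, 5)  # stand-in for one MRI slice
mean_kernel = np.full((3, 3), 1.0 / 9.0)             # placeholder weights, not learned
features = conv2d_same(slice_2d, mean_kernel)
```

Each output value depends only on a 3*3 neighbourhood within the same slice, which is why this branch captures in-slice information but nothing across slices.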
[0102] In addition, Figure 4 is a schematic diagram of the target two-dimensional features in one embodiment, for which the 11th slice image of a patient is selected. Here, "image" is the original slice image, "label" is the labeled image, and "output" is the predicted output image, i.e., the target two-dimensional features.
[0103] S206: Extract features from the medical image to be segmented to obtain initial three-dimensional features.
[0104] Specifically, the initial three-dimensional features are extracted from all slices of the medical image to be segmented, that is, the medical image to be segmented is a three-dimensional image. For example, the T2W sequence of MRI images is regarded as a three-dimensional image, and then the three-dimensional image is subjected to three-dimensional convolution processing to obtain the initial three-dimensional features.
[0105] Specifically, Figure 5 is a schematic diagram of the principle of three-dimensional convolution processing in one embodiment. In this embodiment, a three-dimensional convolution operation with a kernel size of 3*3 is performed on the three-dimensional image to be segmented to extract three-dimensional features.
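A three-dimensional convolution over the whole slice stack can be sketched as follows. The kernel shape 3x3x3 is an assumption (the text states only a kernel size of 3*3), the weights are placeholders, and the function name `conv3d_same` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_same(volume, kernel):
    """Apply a 3D kernel to a (D, H, W) volume with zero padding (output size = input size)."""
    kd, kh, kw = kernel.shape
    pad = [(kd // 2, kd // 2), (kh // 2, kh // 2), (kw // 2, kw // 2)]
    padded = np.pad(volume, pad, mode="constant")
    # windows has shape (D, H, W, kd, kh, kw): one kernel-sized cube per voxel.
    windows = sliding_window_view(padded, kernel.shape)
    return np.einsum("dhwijk,ijk->dhw", windows, kernel)

volume = np.ones((4, 5, 5))   # stand-in for a stack of 4 slices
kernel = np.ones((3, 3, 3))   # placeholder weights, not learned
out_3d = conv3d_same(volume, kernel)
```

Unlike the two-dimensional case, every output voxel here mixes values from the neighbouring slices, which is how the relationship between slices is captured.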
[0106] S208: Concatenate the target two-dimensional features and the initial three-dimensional features.
[0107] Specifically, stitching refers to combining the target two-dimensional features and the initial three-dimensional features. To do this, the target two-dimensional features are first converted into fused three-dimensional features, and the fused three-dimensional features are then stitched with the initial three-dimensional features, for example by concatenation. Taking prostate MRI image data as an example, when prostate MRI image data are limited and computing resources are low, stitching the target two-dimensional features and the initial three-dimensional features enriches the output prostate region features: target two-dimensional features are added at the same slice positions, which improves the model's prostate segmentation accuracy.
[0108] Specifically, Figure 6 is a schematic diagram of the stitching and fusion process in one embodiment. In this embodiment, the target two-dimensional features are first converted into fused three-dimensional features, and the fused three-dimensional features are then stitched with the initial three-dimensional features. The fused three-dimensional features are obtained by sorting and stacking: the target two-dimensional features corresponding to each slice image are sorted according to the index order of the slice images and stacked, which adds a dimension. In other words, if the number of slice images is N, there are N target two-dimensional features, and stacking these N features yields a feature whose dimensions are consistent with the initial three-dimensional features. Stitching the two together gives 2N stitched features.
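The sort, stack, and stitch steps can be sketched as follows. The function name `fuse_2d_with_3d` is hypothetical, and concatenating along the leading axis (so that N + N = 2N feature maps result) is an assumption about how the stitched features are laid out.

```python
import numpy as np

def fuse_2d_with_3d(slice_features, slice_indices, initial_3d):
    """Stack per-slice 2D features in slice order, then concatenate with the 3D features."""
    order = np.argsort(slice_indices)
    # Stacking N maps of shape (H, W) adds a dimension: (N, H, W).
    fused_3d = np.stack([slice_features[i] for i in order], axis=0)
    if fused_3d.shape != initial_3d.shape:
        raise ValueError("fused 2D features must match the initial 3D features in shape")
    # Concatenation yields 2N feature maps in total.
    return np.concatenate([fused_3d, initial_3d], axis=0)

indices = [2, 0, 1]                                        # slices arrive out of order
feats_2d = [np.full((4, 4), float(i)) for i in indices]    # each map tagged with its index
feats_3d = np.zeros((3, 4, 4))
stitched = fuse_2d_with_3d(feats_2d, indices, feats_3d)
```

With N = 3 slices the result holds 6 feature maps, the first three being the two-dimensional features restored to slice order.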
[0109] S210: Extract features from the stitched features to obtain the target three-dimensional features, and obtain the segmentation result of the medical image based on the target three-dimensional features.
[0110] The target 3D feature is the result of performing 3D convolution on the stitched features. This 3D convolution process may be the same as or different from the 3D convolution process of the initial 3D feature.
[0111] Finally, the segmentation result of the medical image is obtained based on the extracted target 3D features. Specifically, the medical image segmentation result can be obtained by classifying based on the target 3D features, for example, by processing through a classification network. This classification network can include convolutional layers and sigmoid layers. The convolutional layer is a 1*1 convolution with the number of convolutional channels set to 1, and the sigmoid layer normalizes the output probability.
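The classification step can be illustrated with a short sketch. A 1*1 convolution with a single output channel is equivalent to a per-voxel weighted sum over the feature channels; the function name `segmentation_head`, the channel-first layout, and the weight values are assumptions made for the example.

```python
import numpy as np

def segmentation_head(features, weights, bias=0.0):
    """1*1 convolution to one channel followed by a sigmoid.

    features: (C, D, H, W) target 3D features; weights: (C,) one weight per channel.
    """
    # Per-voxel weighted sum over channels, giving logits of shape (D, H, W).
    logits = np.tensordot(weights, features, axes=([0], [0])) + bias
    # The sigmoid maps each logit to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-logits))

probs = segmentation_head(np.zeros((2, 3, 4, 4)), np.array([0.5, -0.5]))
```

Each output value can be read as the probability that the corresponding voxel belongs to the target organ.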
[0112] Specifically, Figure 7 is a schematic diagram showing the result of the contour line coordinate calculation step in one embodiment. In this embodiment, the obtained segmentation result of the medical image, i.e., the white area in Figure 7, which is the prostate organ, is treated as a binary image. A threshold segmentation algorithm is used to find the coordinate points on the outer boundary of the region, and the coordinate points of the outer contour of the prostate are saved.
[0113] Figure 8 is a schematic diagram of the contour lines in one embodiment. In this embodiment, for the binary image output by the medical model, the coordinates of the outer contour points are first found, and the contour point coordinates are then written onto the MRI image to display an image with contour lines.
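One simple way to obtain contour coordinates from a binary mask and draw them on an image is sketched below. The source does not detail its algorithm, so the neighbour test used here is an assumption, and the function names `boundary_points` and `draw_contour` are hypothetical. For a filled region without holes, the boundary pixels found this way coincide with the outer contour.

```python
import numpy as np

def boundary_points(mask):
    """Return (row, col) coordinates of foreground pixels that touch the background."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # A foreground pixel is interior when all four direct neighbours are foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def draw_contour(image, points, value):
    """Write the contour coordinates onto a copy of the image."""
    out = image.copy()
    out[points[:, 0], points[:, 1]] = value
    return out

mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:4, 1:4] = 1                                   # a 3x3 square "organ"
pts = boundary_points(mask)
overlay = draw_contour(np.zeros((5, 5)), pts, 255.0)
```

For the 3x3 square, every pixel except the centre lies on the contour.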
[0114] The aforementioned medical image segmentation method extracts target two-dimensional features and initial three-dimensional features from the medical image to be segmented and concatenates them before image segmentation. The target two-dimensional features add to the features of individual slices in the initial three-dimensional features, which improves the overall segmentation accuracy. Because no additional input feature terms are needed, resource consumption is reduced while accuracy is maintained. Furthermore, accurate identification of the prostate region is fundamental to determining prostate cancer areas, laying the foundation for automated prostate cancer treatment and further advancing the application of deep learning in the medical industry.
[0115] In one embodiment, feature extraction of the medical image to be segmented to obtain target two-dimensional features includes: extracting current two-dimensional features with different receptive field sizes from the current input features, wherein the current input features of the first input are generated based on the medical image to be segmented; concatenating the current two-dimensional features with different receptive field sizes to obtain the next input feature; using the next input feature as the current input feature and continuing to extract current two-dimensional features with different receptive field sizes from it; and finally using the next input feature output last as the target two-dimensional feature.
[0116] The receptive field refers to the input region "seen" by a neuron in a neural network. In a convolutional neural network, the value of an element on a feature map is influenced by a certain region of the input image; this region is the receptive field of that element. In this embodiment, dilated convolution is used to extract current two-dimensional features with different receptive field sizes. Specifically, Figure 9 is a schematic diagram of dilated convolution in one embodiment. As shown in Figure 3, a regular convolution with a kernel size of 3×3 covers only a 3×3 region of the feature map, whereas a dilated convolution with a kernel size of 3×3 and a dilation factor of 1 (one gap inserted between adjacent kernel elements) covers a 5×5 region. A larger range of features can therefore be captured without increasing the amount of computation.
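The relationship between kernel size, dilation, and covered region can be checked with a small calculation. The function name `effective_kernel_size` and the parameter name `gaps` are illustrative; `gaps` counts the positions skipped between adjacent kernel elements, matching the convention in which a factor of 1 turns a 3×3 kernel into 5×5 coverage.

```python
def effective_kernel_size(kernel_size, gaps):
    """Side length of the region covered by a dilated kernel.

    gaps: number of skipped positions between adjacent kernel elements
    (0 for a regular convolution).
    """
    return kernel_size + (kernel_size - 1) * gaps

regular = effective_kernel_size(3, 0)   # regular 3x3 convolution
dilated = effective_kernel_size(3, 1)   # 3x3 kernel with one gap between elements
weights_per_kernel = 3 * 3              # unchanged by dilation
```

The number of weights, and therefore the number of multiplications per output element, stays at nine in both cases, which is why the larger coverage comes at no extra computational cost.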
[0117] Specifically, Figure 10 is a schematic diagram of the structure of a two-dimensional convolutional feature extraction module in one embodiment, and Figure 11 is a schematic diagram of the structure of a convolutional block in one embodiment. The two-dimensional convolutional feature extraction module includes a multi-branch convolutional feature extraction unit and a splicing unit.
[0118] In one optional embodiment, extracting current two-dimensional features of different receptive field sizes from the current input features includes: extracting current two-dimensional features of different receptive field sizes from the current input features using a multi-branch convolutional feature extraction unit; and concatenating the current two-dimensional features of different receptive field sizes to obtain the next input feature includes: concatenating the current two-dimensional features of different receptive field sizes using a concatenation unit to obtain the next input feature.
[0119] As shown in Figure 11, features acquired with different receptive field sizes are combined in a branching manner, so that different outputs are obtained for the same input. Concatenating these outputs adds different representations of the same location. Furthermore, because learning proceeds without changing the image size, some pixel information is preserved. Figure 11 shows two branches, but other embodiments may use other numbers of branches, which is not specifically limited here. As shown in Figure 10, the convolutional block includes a 3x3 ordinary convolution branch and a dilated convolution branch with a dilation factor of 1. Each branch is followed by a 1x1 ordinary convolution to reduce the number of feature maps. Finally, the feature maps are concatenated by the concatenation unit to obtain the output of the convolutional block, which is the next input feature.
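A two-branch block of this kind can be sketched in NumPy as follows. This is an illustration under stated assumptions: the weights are random placeholders, activation and normalization layers are omitted because the text does not mention them, and the names `conv2d` and `multi_branch_block` are hypothetical. The `dilation` argument here is the spacing between kernel taps, so `dilation=2` corresponds to the "dilation factor of 1" (one gap) in the text.

```python
import numpy as np

def conv2d(x, w, dilation=1):
    """x: (C_in, H, W); w: (C_out, C_in, k, k). Zero padding keeps H and W unchanged."""
    c_out, c_in, k, _ = w.shape
    pad = dilation * (k // 2)
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Shifted view of the input seen by kernel tap (i, j).
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def multi_branch_block(x, w_plain, w_dilated, w_reduce_a, w_reduce_b):
    a = conv2d(x, w_plain, dilation=1)     # ordinary 3x3 branch, covers 3x3
    b = conv2d(x, w_dilated, dilation=2)   # dilated 3x3 branch, covers 5x5
    a = conv2d(a, w_reduce_a)              # 1x1 convolution reduces the feature maps
    b = conv2d(b, w_reduce_b)
    return np.concatenate([a, b], axis=0)  # concatenation unit: next input feature

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))
block_out = multi_branch_block(
    x,
    rng.normal(size=(8, 4, 3, 3)), rng.normal(size=(8, 4, 3, 3)),
    rng.normal(size=(2, 8, 1, 1)), rng.normal(size=(2, 8, 1, 1)),
)

# Response of the dilated branch to a single impulse: nine taps spread over 5x5.
impulse = np.zeros((1, 7, 7))
impulse[0, 3, 3] = 1.0
spread = conv2d(impulse, np.ones((1, 1, 3, 3)), dilation=2)
```

Because both branches keep the spatial size, their outputs can be concatenated channel-wise and passed directly to the next block.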
[0120] First, the first current input feature is generated from the medical image to be segmented and input into the first convolutional block. The first convolutional block performs feature extraction to obtain the next input feature, which is then taken as the current input feature and input into the next convolutional block, and so on until the next input feature output by the last convolutional block is obtained. This last output is used as the target two-dimensional feature. Then, in the output layer, an activation function converts the predicted feature values into a probability matrix, and values greater than a threshold of 0.6 are retained, yielding a binary image in which organs and tissues are 1 and the rest are 0. This binary image is the result of the two-dimensional convolution.
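The output-layer step described above can be sketched as follows, purely as an example. The activation function is not named in this description, so the sigmoid used here is an assumption, as is the function name `logits_to_binary_mask`; the 0.6 threshold is taken from the text.

```python
import numpy as np


def logits_to_binary_mask(feature_values, threshold=0.6):
    """Convert predicted feature values to a probability matrix, then to a 0/1 mask."""
    probabilities = 1.0 / (1.0 + np.exp(-feature_values))  # sigmoid (assumed activation)
    mask = (probabilities > threshold).astype(np.uint8)    # 1 = organ/tissue, 0 = rest
    return mask, probabilities


feature_values = np.array([[-2.0, 0.0],
                           [0.5, 3.0]])
mask, probabilities = logits_to_binary_mask(feature_values)
print(mask)
```

In this example a feature value of 0.0 maps to a probability of 0.5 and is therefore set to 0, while a value of 0.5 maps to roughly 0.62 and is retained as 1.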
[0121] In this embodiment, convolutional blocks designed to enhance the capture of feature information further improve the segmentation of tissues and organs by two-dimensional convolution. The slices of each sequence are converted into input features one by one according to their index and fed to the input layer, and the number of feature maps for the two-dimensional convolution is transformed. The features of tissues and organs are then extracted through the multiple designed convolutional blocks. In the output layer, an activation function converts the predicted feature values into a probability matrix, and values greater than 0.6 are retained to obtain a binary map in which tissues and organs are 1 and the rest are 0.
[0122] In the above embodiments, designing convolutional blocks with different receptive field feature extraction can enrich the spatial information of organ tissues by enhancing the features of different ranges at the same location without increasing the amount of data, thereby improving the feature extraction capability of the two-dimensional convolutional structure for the slice layer and obtaining better two-dimensional segmentation results.
[0123] In one embodiment, the segmentation result of the medical image is predicted by a pre-trained medical model, which includes a two-dimensional convolutional feature extraction module, a first three-dimensional convolutional feature extraction module, a stitching module, and a second three-dimensional convolutional feature extraction module. The process involves: extracting target two-dimensional features from the medical image to be segmented using the two-dimensional convolutional feature extraction module; extracting initial three-dimensional features from the medical image to be segmented using the first three-dimensional convolutional feature extraction module; stitching the target two-dimensional features and the initial three-dimensional features together using the stitching module; extracting target three-dimensional features from the stitched features; and obtaining the segmentation result of the medical image based on the target three-dimensional features using the second three-dimensional convolutional feature extraction module.
[0124] As shown in Figure 12, which is a schematic diagram of the structure of a medical model in one embodiment, the medical model in this embodiment includes a two-dimensional convolutional feature extraction module, a first three-dimensional convolutional feature extraction module, a stitching module, and a second three-dimensional convolutional feature extraction module.
[0125] The two-dimensional convolutional feature extraction module is used to extract target two-dimensional features from the medical image to be segmented. Specifically, the module reads the entire sequence of the medical image to be segmented and converts each slice image, in sequence index order, into its corresponding first current input feature. The slice images are then processed one by one according to their indices. Taking the first slice image as an example, it is processed by each convolutional block in turn to obtain the target two-dimensional feature, which can be a binary image: in the output layer, an activation function converts the predicted feature values into a probability matrix, and values greater than 0.6 are retained to obtain a binary image in which organs and tissues are 1 and the rest are 0. The corresponding binary images are obtained for the other slice images in the same way.
[0126] Both the first and second three-dimensional convolutional feature extraction modules are used to extract three-dimensional features. The first module processes the medical image to be segmented to extract initial three-dimensional features, while the second module processes the stitched features to further extract target three-dimensional features. Specifically, Figure 13 is a schematic diagram of the structure of a three-dimensional convolutional feature extraction module in one embodiment, and both the first and the second three-dimensional convolutional feature extraction modules can adopt the structure shown in Figure 13. In other embodiments, the two modules may adopt other structures, and their structures may be the same or different, without specific limitation.
[0127] As shown in Figure 13, the 3D convolutional feature extraction module uses a basic 3D U-Net structure to extract features from the medical image to be segmented and from the stitched features. The structure includes an input layer, convolutional layers, upsampling layers, downsampling layers, and a final output layer that transforms the number of channels of its input features. Downsampling reduces the image size while increasing the number of channels at each step; upsampling does the opposite. By varying the scale, upsampling and downsampling increase the number of channels to represent the features more clearly while still making use of shallow feature information. Feature map stitching combines the output of the first layer with the upsampled output of the penultimate layer, enabling feature reuse.
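To illustrate only the shape bookkeeping of such a structure, one downsampling step, one upsampling step, and the feature map stitching can be sketched as below. This is a simplified stand-in, not the structure of Figure 13: the 1×1×1 channel-mixing convolutions, max pooling, nearest-neighbour upsampling, channel counts, and all function names are assumptions chosen to keep the example short.

```python
import numpy as np


def downsample2(x):
    """2x max pooling over (D, H, W) for x of shape (C, D, H, W); sizes must be even."""
    c, d, h, w = x.shape
    return x.reshape(c, d // 2, 2, h // 2, 2, w // 2, 2).max(axis=(2, 4, 6))


def upsample2(x):
    """Nearest-neighbour 2x upsampling over (D, H, W)."""
    for axis in (1, 2, 3):
        x = np.repeat(x, 2, axis=axis)
    return x


def pointwise_conv(x, weight):
    """1x1x1 convolution that only mixes channels; weight has shape (C_out, C_in)."""
    return np.tensordot(weight, x, axes=([1], [0]))


rng = np.random.default_rng(0)
volume = rng.standard_normal((1, 8, 16, 16))                          # (C, D, H, W)
enc = pointwise_conv(volume, rng.standard_normal((4, 1)))             # input layer: 1 -> 4 channels
deep = pointwise_conv(downsample2(enc), rng.standard_normal((8, 4)))  # half size, 4 -> 8 channels
up = upsample2(deep)                                                  # back to the original size
merged = np.concatenate([enc, up], axis=0)                            # stitching: shallow + deep features
out = pointwise_conv(merged, rng.standard_normal((1, 12)))            # output layer: 12 -> 1 channel
print(enc.shape, deep.shape, merged.shape, out.shape)
```

The concatenation along the channel axis is what lets the shallow first-layer output be reused alongside the upsampled deeper features.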
[0128] For the stitching module, refer to Figure 6. In one embodiment, stitching the target two-dimensional features and the initial three-dimensional features together includes: ordering the target two-dimensional features according to the slice index order in the medical image to be segmented to obtain fused three-dimensional features; and stitching the fused three-dimensional features and the initial three-dimensional features together.
[0129] Specifically, the target two-dimensional features are stacked according to the slice index order in the medical image to be segmented to obtain the fused three-dimensional feature. For example, if the number of slices is N, there are N target two-dimensional features. Stacking these N target two-dimensional features yields a feature whose dimensions are consistent with those of the initial three-dimensional feature. The two are then stitched together to obtain a stitched feature of size 2N.
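The stacking and stitching just described can be sketched as follows. The axis along which the two features are joined is not specified in this description; concatenation along the slice axis, which yields 2N slice-sized maps, is an assumption here (placing the two features on a separate channel axis would be another reading). The function name `fuse_2d_with_3d` and the dictionary input are hypothetical.

```python
import numpy as np


def fuse_2d_with_3d(slice_features, initial_3d):
    """Stack per-slice 2D features by slice index, then stitch with the 3D feature.

    slice_features: dict mapping slice index -> (H, W) array.
    initial_3d: (N, H, W) array from the first 3D module.
    Returns the fused 3D feature (N, H, W) and the stitched feature (2N, H, W).
    """
    ordered = [slice_features[index] for index in sorted(slice_features)]
    fused_3d = np.stack(ordered, axis=0)
    if fused_3d.shape != initial_3d.shape:
        raise ValueError("fused 3D feature must match the initial 3D feature shape")
    stitched = np.concatenate([fused_3d, initial_3d], axis=0)
    return fused_3d, stitched


n_slices, height, width = 3, 4, 4
# Per-slice results arriving out of order; each one is filled with its slice index.
slice_features = {i: np.full((height, width), float(i)) for i in (2, 0, 1)}
initial_3d = np.zeros((n_slices, height, width))
fused_3d, stitched = fuse_2d_with_3d(slice_features, initial_3d)
print(fused_3d.shape, stitched.shape)  # (3, 4, 4) (6, 4, 4)
```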
[0130] Finally, the stitched features are input into the second 3D convolutional feature extraction module. After 3D convolutional feature extraction, the segmentation result of the medical image to be segmented is obtained.
[0131] Compensating the 3D convolution output with the 2D convolution output in this way increases the image information of each layer and achieves accurate segmentation of the prostate without requiring a large amount of patient image data. The 3D structure can be accurately reconstructed from the segmented regions, reducing the time doctors spend manually identifying the prostate and improving work efficiency. Furthermore, stitching the 2D convolution output with the 3D convolution output, rather than directly with the original image data, reduces the weakening of the 2D extracted features and thus minimizes the impact on the utilization of the 2D convolution output features.
[0132] In one embodiment, after acquiring the medical image to be segmented, the method further includes: converting the format of the medical image to be segmented to obtain a three-dimensional medical image and two-dimensional slice images arranged in sequence; extracting features from the medical image to be segmented to obtain target two-dimensional features, including: extracting features from each two-dimensional slice image to obtain target two-dimensional features; and extracting features from the medical image to be segmented to obtain initial three-dimensional features, including: extracting features from the three-dimensional medical image to obtain initial three-dimensional features.
[0133] As shown in Figure 14, which is a schematic diagram of the format processing steps of a medical image to be segmented in one embodiment, the format of the medical image to be segmented in this embodiment is DICOM (Digital Imaging and Communications in Medicine), in which the image is stored as a series of two-dimensional slice images.
[0134] To obtain a 3D medical image from the medical image to be segmented, the format can be converted, for example from DICOM to NIfTI, which stores the individual slice images as 3D image data. In other words, converting the DICOM images to NIfTI format allows 3D convolution to extract prostate features, while reading the DICOM images directly in slice order allows 2D convolution to extract features.
[0135] Specifically, as shown in Figure 15, which is a schematic diagram of a prostate MRI image segmentation process in one embodiment, raw image data is acquired and processed into two-dimensional (2D) and three-dimensional (3D) images. The 2D images are input to the 2D convolutional feature extraction module to obtain target 2D features, and the 3D image is input to the first 3D convolutional feature extraction module to obtain initial 3D features. The target 2D features are the segmentation results of the individual slice layers; they are first stacked according to their indices and converted into fused 3D features. The fused 3D features and the initial 3D features are then concatenated and used as input features for the second 3D convolutional feature extraction module, which further extracts prostate information from the concatenated features to obtain target 3D features and outputs the 3D segmentation result of the prostate.
[0136] In the above embodiments, compensating the three-dimensional convolution output with the two-dimensional convolution output increases the image information of each layer, so that accurate segmentation of the prostate can be achieved without a large amount of patient image data. The three-dimensional structure can be accurately reconstructed from the segmented region, reducing the time doctors spend manually identifying the prostate and improving work efficiency.
[0137] In one embodiment, as shown in Figure 16, a medical model training method is provided. Taking the application of the method to the terminal in Figure 1 as an example, the method includes the following steps:
[0138] S1602: Obtain medical sample data, which includes sample medical images and corresponding target labels.
[0139] S1604: Extract sample two-dimensional convolutional features from sample medical images.
[0140] S1606: Extract first sample three-dimensional convolutional features from the sample medical image.
[0141] S1608: Concatenate the sample two-dimensional convolutional features and the first sample three-dimensional convolutional features to obtain second sample three-dimensional convolutional features.
[0142] S1610: Extract features from the second sample three-dimensional convolutional features, and obtain the model output based on the feature extraction results.
[0143] In one optional embodiment, the sample two-dimensional convolutional features of the sample medical image are extracted by the two-dimensional convolutional feature extraction module of the medical model, including: extracting two-dimensional features of the current sample with different receptive field sizes from the current sample input features, wherein the first input current sample input features are generated based on the sample medical image; concatenating the two-dimensional features of the current sample with different receptive field sizes to obtain the next sample input features, and using the next sample input features as the current sample input features, and continuing to extract two-dimensional features of the current sample with different receptive field sizes from the current sample input features, and finally using the last output next sample input features as the sample two-dimensional convolutional features.
[0144] Specifically, for the definition of the sample medical images, refer to that of the medical image to be segmented above; for the extraction of the sample two-dimensional convolutional features, refer to that of the target two-dimensional features above; for the extraction of the first sample three-dimensional convolutional features, refer to that of the initial three-dimensional features above; for the extraction of the second sample three-dimensional convolutional features, refer to that of the target three-dimensional features above; and for the definition of the model output, refer to that of the medical image segmentation result above. These are not repeated here.
[0145] The target label is the region of the target in the sample medical image. Specifically, for convenience, the target label is the region of the target in each slice image, so the target label includes not only the region of the two-dimensional target, but also the region of the three-dimensional target.
[0146] S1612: Generate a first loss function value based on the two-dimensional convolutional features of the sample and the target label; generate a second loss function value based on the three-dimensional convolutional features of the first sample and the target label; generate a third loss function value based on the model output and the target label; optimize the medical model based on the first loss function value, the second loss function value and the third loss function value to obtain the trained medical model.
[0147] In one embodiment, optimizing the medical model based on the first loss function value, the second loss function value, and the third loss function value to obtain a trained medical model includes: optimizing the two-dimensional convolutional feature extraction module of the medical model based on the first loss function value, optimizing the first three-dimensional convolutional feature extraction module of the medical model based on the second loss function value, and optimizing the second three-dimensional convolutional feature extraction module of the medical model based on the third loss function value to obtain a trained medical model.
[0148] Specifically, as shown in Figure 12, for the tissue and organ results extracted from the 2D and 3D images, different loss functions are used to calculate the losses of the parallel 2D and 3D convolutional channels, respectively. Dual-loss parallel backpropagation is implemented, and gradient updates are performed on both convolutional channels simultaneously, supervising the learning of each channel. Subsequently, loss values are calculated for the second 3D convolutional feature extraction module to optimize it.
[0149] In this process, 2D and 3D convolutions are performed in parallel, with each channel operating independently and simultaneously. The 2D convolution output uses cross-entropy to calculate the loss (Loss1), while the 3D convolution output uses Dice Loss to calculate the loss (Loss2). Loss1 is then passed back to the 2D channel for gradient updates, and Loss2 is passed back to the 3D channel for gradient updates. Subsequently, Loss3 is passed back to the second 3D convolution feature extraction module for gradient updates.
[0150] In the above embodiments, the losses of the parallel two-dimensional convolution and three-dimensional convolution channels are calculated using different loss functions, and dual-loss parallel backpropagation is implemented. Gradient updates are performed on the two convolution channels at the same time, and the learning of the two convolution channels is supervised separately. This can reduce the influence between the two-dimensional convolution and the three-dimensional convolution, thereby improving accuracy.
[0151] In one embodiment, generating the first loss function value based on the sample two-dimensional convolutional features and the target label includes: generating the first loss function value based on the sample two-dimensional convolutional features and the target label using a first loss function. Generating the second loss function value based on the first sample three-dimensional convolutional features and the target label, and generating the third loss function value based on the model output and the target label, include: generating the second loss function value based on the first sample three-dimensional convolutional features and the target label using a second loss function; and generating the third loss function value based on the model output and the target label using the second loss function.
[0152] Specifically, as shown in Figure 17, which is a schematic diagram illustrating the calculation of the first loss function in one embodiment, the first loss function in one embodiment is the cross-entropy loss function, wherein the cross-entropy calculation formula is:
[0153] L=-[pt*log(pr)+(1-pt)*log(1-pr)]
[0154] Where L is the first loss function value, pt is the target label, and pr is the segmentation result corresponding to the target 2D features. In other words, the first loss function value is generated based on the target label and the segmentation result corresponding to the target 2D features. Therefore, the formula for calculating the cross-entropy of the entire image is:
[0155] L=∑-[pt*log(pr)+(1-pt)*log(1-pr)]
[0156] The total loss is obtained by summing the cross-entropy calculated over the entire image. This application uses the average value, that is, the summed loss divided by the number of elements in the image, as the loss value calculated for this output. Other methods may be used in other embodiments, and no specific limitation is made here.
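For illustration, the averaged cross-entropy above can be computed as in the following sketch. The clipping constant `eps`, added to avoid taking the logarithm of zero, and the function name are assumptions not stated in this description.

```python
import numpy as np


def mean_cross_entropy(pt, pr, eps=1e-7):
    """Average of -[pt*log(pr) + (1-pt)*log(1-pr)] over the whole image.

    pt: target label (0 or 1 per pixel); pr: predicted probability per pixel.
    """
    pr = np.clip(pr, eps, 1.0 - eps)  # assumed safeguard against log(0)
    per_pixel = -(pt * np.log(pr) + (1.0 - pt) * np.log(1.0 - pr))
    return per_pixel.mean()


pt = np.array([[1.0, 0.0],
               [1.0, 0.0]])
pr = np.array([[0.9, 0.1],
               [0.8, 0.4]])
loss1 = mean_cross_entropy(pt, pr)
print(loss1)
```

The loss approaches 0 as the predicted probabilities approach the labels, and grows as confident predictions disagree with the labels.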
[0157] Specifically, as shown in Figure 18, which is a schematic diagram illustrating the calculation of the second loss function in one embodiment, the second loss function in one embodiment is the Dice loss function, where Dice is used to obtain the similarity between two images, and its calculation method is as follows:
[0158] Dice = 2 * ∑(T * P) / ∑(T + P)
[0159] DiceLoss = 1 - Dice
[0160] Where DiceLoss is the second or third loss function value, T is the target label, and P is the segmentation result generated from the initial 3D convolutional features or the target 3D convolutional features. That is, the second loss function value is obtained from the target label and the segmentation result generated from the initial 3D convolutional features, and the third loss function value is obtained from the target label and the segmentation result generated from the target 3D convolutional features. Because T is zero outside the target region, the numerator retains only the predicted values within the target region. The smaller DiceLoss is, the higher the Dice value is, and the closer the two images are.
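A minimal sketch of the Dice loss above is given below, assuming that the sum in the numerator runs over the element-wise product of T and P. The smoothing constant `eps`, which avoids division by zero when both images are empty, and the function name are assumptions not stated in this description.

```python
import numpy as np


def dice_loss(t, p, eps=1e-7):
    """DiceLoss = 1 - 2*sum(T*P) / sum(T + P), with a small smoothing term."""
    intersection = np.sum(t * p)  # keeps only predictions inside the target region
    dice = (2.0 * intersection + eps) / (np.sum(t) + np.sum(p) + eps)
    return 1.0 - dice


target = np.array([1.0, 1.0, 0.0, 0.0])
prediction = np.array([1.0, 0.0, 0.0, 0.0])
loss2 = dice_loss(target, prediction)
print(loss2)
```

In this example the overlap is one element and the two sums total three, so Dice is 2/3 and the loss is about 1/3; identical images give a loss near 0 and non-overlapping images give a loss near 1.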
[0161] In the above embodiments, different loss functions are used to calculate the loss values of the two convolutional channels, and the network parameters are updated separately for each channel. This reduces the influence between the two (because the segmentation results are different) and further improves the acquisition of the target 2D features and the initial 3D features. The deviations between the target 2D features and the initial 3D features and the real prostate region are calculated using Cross Entropy Loss and Dice Loss, respectively, and fed back to the different convolutional channels in the parallel part. Gradient updates are performed on the different convolutional channels simultaneously to standardize the deep learning model's learning of the prostate region from the overall T2W sequence of the MRI image.
[0162] Specifically, as shown in Figure 19, which is a schematic diagram of the training process in one embodiment, a dataset is acquired and the data is then processed and transformed. Two-dimensional and three-dimensional training are performed separately using the dual convolutional channels. Loss values are calculated from the two-dimensional and three-dimensional results using different loss functions, and each loss value is used to update the parameters of the corresponding convolutional channel. In addition, the two-dimensional and three-dimensional segmentation results are stitched together and fused, and the three-dimensional convolution then further extracts features from the stitched and fused features and outputs the three-dimensional segmentation result. After multiple rounds of training, the optimal result is selected and saved.
[0163] In the above embodiments, the loss values of the two convolutional channels are calculated using different loss functions, and the network parameters are updated separately for each channel. This can reduce the influence between the two channels and further improve the acquisition of the target two-dimensional features and the initial three-dimensional features.
[0164] It should be understood that although the steps in the flowcharts of the above embodiments are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the above embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0165] Based on the same inventive concept, this application also provides a medical image segmentation apparatus and a medical model training apparatus for implementing the aforementioned medical image segmentation method and medical model training method. The solution provided by this apparatus is similar to the implementation scheme described in the above-described method. Therefore, the specific limitations of one or more embodiments of the medical image segmentation apparatus and medical model training apparatus provided below can be found in the limitations of the medical image segmentation method and medical model training method described above, and will not be repeated here.
[0166] In one embodiment, as shown in Figure 20, a medical image segmentation apparatus is provided, comprising:
[0167] The medical image acquisition module is used to acquire the medical image to be segmented.
[0168] The target two-dimensional feature extraction module is used to extract target two-dimensional features from the medical image to be segmented.
[0169] The initial 3D feature extraction module is used to extract initial 3D features from the medical image to be segmented.
[0170] The first stitching module is used to stitch together the target's two-dimensional features and the initial three-dimensional features;
[0171] The segmentation module is used to extract features from the stitched features to obtain the target's three-dimensional features, and to obtain the segmentation result of the medical image based on the target's three-dimensional features.
[0172] In one embodiment, the target two-dimensional feature extraction module includes:
[0173] The extraction unit is used to extract current two-dimensional features with different receptive field sizes from the current input features, wherein the current input features of the first input are generated based on the medical image to be segmented;
[0174] The concatenation unit is used to concatenate the current two-dimensional features with different receptive field sizes to obtain the next input feature, and use the next input feature as the current input feature. It then continues to extract the current two-dimensional features with different receptive field sizes from the current input feature, and finally outputs the next input feature as the target two-dimensional feature.
[0175] In one embodiment, the segmentation result of the medical image is predicted by a pre-trained medical model, which includes a two-dimensional convolutional feature extraction module, a first three-dimensional convolutional feature extraction module, a stitching module, and a second three-dimensional convolutional feature extraction module.
[0176] The aforementioned target two-dimensional feature extraction module is also used to extract target two-dimensional features from the medical image to be segmented through the two-dimensional convolution feature extraction module;
[0177] The aforementioned initial three-dimensional feature extraction module is also used to extract initial three-dimensional features from the medical image to be segmented through the first three-dimensional convolutional feature extraction module;
[0178] The first stitching module mentioned above is also used to stitch together the target two-dimensional features and the initial three-dimensional features through the stitching module;
[0179] The segmentation module is also used to extract target three-dimensional features from the stitched features through the second three-dimensional convolutional feature extraction module, and to obtain the segmentation result of the medical image based on the target three-dimensional features.
[0180] In one embodiment, the two-dimensional convolutional feature extraction module includes a multi-branch convolutional feature extraction unit and a splicing unit;
[0181] The aforementioned extraction unit is also used to extract current two-dimensional features with different receptive field sizes from the current input features through the multi-branch convolutional feature extraction unit;
[0182] The aforementioned splicing unit is also used to splice together the current two-dimensional features with different receptive field sizes to obtain the next input feature.
[0183] In one embodiment, the first stitching module is further configured to sort the target two-dimensional features according to the slice index order in the medical image to be segmented to obtain fused three-dimensional features; and stitch the fused three-dimensional features and the initial three-dimensional features together.
[0184] In one embodiment, the above-described apparatus further includes a preprocessing module, which is used to perform format conversion on the medical image to be segmented to obtain a three-dimensional medical image and a two-dimensional slice image arranged in sequence.
[0185] The target two-dimensional feature extraction module is also used to extract features from each two-dimensional slice image to obtain the target two-dimensional features;
[0186] The initial 3D feature extraction module is also used to extract features from 3D medical images to obtain initial 3D features.
[0187] In one embodiment, as shown in Figure 21, a medical model training device is provided, comprising:
[0188] The medical sample data acquisition module is used to acquire medical sample data, which includes medical images of the samples and corresponding target labels.
[0189] The sample 2D convolution feature extraction module is used to extract the sample 2D convolution features of the sample medical image.
[0190] The sample 3D convolution feature extraction module is used to extract the first sample 3D convolution features of the sample medical image.
[0191] The second splicing module is used to splice the two-dimensional convolutional features of the sample and the three-dimensional convolutional features of the first sample to obtain the three-dimensional convolutional features of the second sample.
[0192] The model processing module is used to extract features from the three-dimensional convolutional features of the second sample and obtain the model output based on the feature extraction results.
[0193] The network parameter update module is used to generate a first loss function value based on the two-dimensional convolutional features of the samples and the target label, a second loss function value based on the three-dimensional convolutional features of the first samples and the target label, and a third loss function value based on the model output and the target label. The medical model is then optimized based on the first, second, and third loss function values to obtain the trained medical model.
[0194] In one embodiment, the training module is further configured to generate a first loss function value based on the two-dimensional convolutional features of the sample and the target label using a first loss function; generate a second loss function value based on the three-dimensional convolutional features of the first sample and the target label using a second loss function; and generate a third loss function value based on the model output and the target label using the second loss function.
[0195] In one embodiment, the training module is further configured to extract two-dimensional features of the current sample with different receptive field sizes from the current sample input features, wherein the first input current sample input features are generated based on the sample medical image; the two-dimensional features of the current sample with different receptive field sizes are concatenated to obtain the next sample input features, and the next sample input features are used as the current sample input features, and the two-dimensional features of the current sample with different receptive field sizes are extracted from the current sample input features, and the last output next sample input features are used as the sample two-dimensional convolution features.
[0196] Each module in the aforementioned medical image segmentation device and medical model training device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the corresponding operations of each module.
[0197] In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as follows: Figure 22As shown, the computer device includes a processor, memory, input / output interface, communication interface, display unit, and input device. The processor, memory, and input / output interface are connected via a system bus, and the communication interface, display unit, and input device are also connected to the system bus via the input / output interface. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system and computer programs. The internal memory provides an environment for the operation of the operating system and computer programs stored in the non-volatile storage media. The input / output interface is used for exchanging information between the processor and external devices. The communication interface is used for wired or wireless communication with external terminals; wireless communication can be achieved through Wi-Fi, mobile cellular networks, NFC (Near Field Communication), or other technologies. When executed by the processor, the computer program implements a medical image segmentation method and a medical model training method. The display unit is used to form a visually visible image and can be a display screen, projection device, or virtual reality imaging device. The display screen can be an LCD screen or an e-ink screen. The input device of the computer device can be a touch layer covering the display screen, or buttons, trackballs, or touchpads set on the casing of the computer device, or external keyboards, touchpads, or mice, etc.
[0198] Those skilled in the art will understand that the structure shown in Figure 22 is merely a block diagram of the portion of the structure relevant to the present application and does not limit the computer device to which the present application is applied. A specific computer device may include more or fewer components than those shown in the figure, combine certain components, or arrange the components differently.
[0199] In one embodiment, a computer device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps in the above method embodiments.
[0200] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon that, when executed by a processor, implements the steps in the above method embodiments.
[0201] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps in the above method embodiments.
[0202] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, databases, or other media in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided in this application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be, but are not limited to, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, or quantum computing-based data processing logic devices.
[0203] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0204] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. A medical image segmentation method, characterized in that the method includes: acquiring a medical image to be segmented; performing feature extraction on the medical image to be segmented to obtain target two-dimensional features; performing feature extraction on the medical image to be segmented to obtain initial three-dimensional features; concatenating the target two-dimensional features and the initial three-dimensional features; and performing feature extraction on the concatenated features to obtain target three-dimensional features, and obtaining a segmentation result of the medical image based on the target three-dimensional features; wherein performing feature extraction on the medical image to be segmented to obtain the target two-dimensional features includes: extracting, from current input features, current two-dimensional features with different receptive field sizes, wherein the current input features input for the first time are generated based on the medical image to be segmented; and concatenating the current two-dimensional features with different receptive field sizes to obtain next input features, using the next input features as the current input features, continuing to extract current two-dimensional features with different receptive field sizes from the current input features, and using the next input features output last as the target two-dimensional features.
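The iterative two-dimensional extraction recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented network: fixed 3x3 averaging kernels with different dilations stand in for learned convolution branches, and the names `dilated_mean_filter`, `extract_target_2d_features`, `dilations`, and `num_iterations` are hypothetical.

```python
import numpy as np

def dilated_mean_filter(x, dilation):
    """3x3 mean filter with the given dilation, zero-padded to keep H and W.

    x has shape (C, H, W). A fixed averaging kernel stands in for learned weights.
    """
    pad = dilation
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros_like(x, dtype=float)
    for dy in (-dilation, 0, dilation):
        for dx in (-dilation, 0, dilation):
            out += padded[:, pad + dy:pad + dy + h, pad + dx:pad + dx + w]
    return out / 9.0

def extract_target_2d_features(slice_2d, dilations=(1, 2, 3), num_iterations=2):
    # First input: features generated from the image itself, shape (1, H, W)
    current = slice_2d[np.newaxis].astype(float)
    for _ in range(num_iterations):
        # One branch per receptive field size
        branches = [dilated_mean_filter(current, d) for d in dilations]
        # Concatenate branch outputs along channels to get the next input features
        current = np.concatenate(branches, axis=0)
    # The next input features produced last serve as the target 2D features
    return current

features = extract_target_2d_features(np.ones((8, 8)))
print(features.shape)  # (9, 8, 8)
```

With three branches and two iterations the channel count grows from 1 to 3 to 9; a learned network would normally control this growth with its own channel configuration.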
2. The method according to claim 1, characterized in that the segmentation result of the medical image is predicted by a pre-trained medical model, which includes a two-dimensional convolutional feature extraction module, a first three-dimensional convolutional feature extraction module, a concatenation module, and a second three-dimensional convolutional feature extraction module; performing feature extraction on the medical image to be segmented to obtain the target two-dimensional features includes: performing feature extraction on the medical image to be segmented using the two-dimensional convolutional feature extraction module to obtain the target two-dimensional features; performing feature extraction on the medical image to be segmented to obtain the initial three-dimensional features includes: performing feature extraction on the medical image to be segmented using the first three-dimensional convolutional feature extraction module to obtain the initial three-dimensional features; concatenating the target two-dimensional features and the initial three-dimensional features includes: concatenating the target two-dimensional features and the initial three-dimensional features using the concatenation module; and performing feature extraction on the concatenated features to obtain the target three-dimensional features, and obtaining the segmentation result of the medical image based on the target three-dimensional features, includes: performing feature extraction on the concatenated features using the second three-dimensional convolutional feature extraction module to obtain the target three-dimensional features, and obtaining the segmentation result of the medical image based on the target three-dimensional features.
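The data flow through the four modules named in claim 2 can be sketched at the level of array shapes. Every function below is a hypothetical placeholder with fixed arithmetic in place of learned convolutions, and the channel-first layout (C, D, H, W) is an assumption made for illustration.

```python
import numpy as np

def conv2d_module(slice_2d, channels=4):
    """Placeholder for the 2D convolutional module: (H, W) -> (C, H, W)."""
    return np.stack([slice_2d * (c + 1) for c in range(channels)], axis=0)

def first_conv3d_module(volume, channels=4):
    """Placeholder for the first 3D convolutional module: (D, H, W) -> (C, D, H, W)."""
    return np.stack([volume / (c + 1) for c in range(channels)], axis=0)

def concatenation_module(feats_2d, feats_3d):
    """Channel-wise concatenation of two (C, D, H, W) feature volumes."""
    return np.concatenate([feats_2d, feats_3d], axis=0)

def second_conv3d_module(features):
    """Placeholder for the second 3D module: collapse channels, then threshold."""
    score = features.mean(axis=0)      # (D, H, W)
    return score > score.mean()        # boolean segmentation mask

def segment(volume):
    # Per-slice 2D features stacked back along the depth axis -> (C, D, H, W)
    feats_2d = np.stack([conv2d_module(s) for s in volume], axis=1)
    feats_3d = first_conv3d_module(volume)
    stitched = concatenation_module(feats_2d, feats_3d)
    return stitched, second_conv3d_module(stitched)

volume = np.random.default_rng(0).random((6, 16, 16))   # (D, H, W)
stitched, mask = segment(volume)
print(stitched.shape, mask.shape)  # (8, 6, 16, 16) (6, 16, 16)
```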
3. The method according to claim 2, characterized in that the two-dimensional convolutional feature extraction module includes a multi-branch convolutional feature extraction unit and a concatenation unit, wherein at least one branch of the multi-branch convolutional feature extraction unit is implemented through dilated convolution; extracting, from the current input features, the current two-dimensional features with different receptive field sizes includes: extracting, by the multi-branch convolutional feature extraction unit, the current two-dimensional features with different receptive field sizes from the current input features; and concatenating the current two-dimensional features with different receptive field sizes to obtain the next input features includes: concatenating, by the concatenation unit, the current two-dimensional features with different receptive field sizes to obtain the next input features.
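To make concrete how dilation gives branches different receptive field sizes, the following sketch applies the standard arithmetic for dilated kernels. The kernel size of 3 and the dilation rates 1, 2, and 3 are illustrative assumptions; the claim does not specify them.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def stacked_receptive_field(layers):
    """Receptive field of stride-1 convolutions applied in sequence.

    layers is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Three hypothetical 3x3 branches with dilation 1, 2, and 3
branch_sizes = [effective_kernel_size(3, d) for d in (1, 2, 3)]
print(branch_sizes)                                  # [3, 5, 7]
print(stacked_receptive_field([(3, 1), (3, 2)]))     # 7
```

A dilated branch covers a wider area with the same nine weights per 3x3 kernel, which is consistent with the stated aim of limiting resource consumption.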
4. The method according to claim 1, characterized in that concatenating the target two-dimensional features and the initial three-dimensional features includes: sorting the target two-dimensional features according to the slice index order in the medical image to be segmented to obtain fused three-dimensional features; and concatenating the fused three-dimensional features and the initial three-dimensional features.
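The slice-ordered fusion in claim 4 can be sketched as follows, assuming each per-slice feature map is paired with its slice index and that features use a channel-first layout. The function name `fuse_slice_features` and the toy data are hypothetical.

```python
import numpy as np

def fuse_slice_features(indexed_features):
    """Stack per-slice features (C, H, W) into (C, D, H, W), ordered by slice index."""
    ordered = sorted(indexed_features, key=lambda pair: pair[0])
    return np.stack([feat for _, feat in ordered], axis=1)

# Hypothetical per-slice features arriving out of order; slice i is filled with i
unordered = [(i, np.full((2, 4, 4), float(i))) for i in (2, 0, 3, 1)]
fused_3d = fuse_slice_features(unordered)        # (2, 4, 4, 4)
initial_3d = np.zeros((3, 4, 4, 4))              # stand-in initial 3D features
stitched = np.concatenate([fused_3d, initial_3d], axis=0)

print(fused_3d[0, :, 0, 0])  # [0. 1. 2. 3.]
print(stitched.shape)        # (5, 4, 4, 4)
```

Sorting before stacking keeps the depth axis of the fused features aligned with the depth axis of the initial three-dimensional features, so the channel-wise concatenation pairs matching voxels.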
5. The method according to claim 1, characterized in that, after acquiring the medical image to be segmented, the method further includes: performing format conversion on the medical image to be segmented to obtain a three-dimensional medical image and two-dimensional slice images arranged in sequence; performing feature extraction on the medical image to be segmented to obtain the target two-dimensional features includes: performing feature extraction on each of the two-dimensional slice images to obtain the target two-dimensional features; and performing feature extraction on the medical image to be segmented to obtain the initial three-dimensional features includes: performing feature extraction on the three-dimensional medical image to obtain the initial three-dimensional features.
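As a rough illustration of the format conversion in claim 5, the sketch below assumes the image has already been decoded into a list of 2D arrays in acquisition order. File parsing is out of scope, and the helper name `to_volume_and_slices` is hypothetical.

```python
import numpy as np

def to_volume_and_slices(raw_slices):
    """Build a (D, H, W) volume and an ordered list of (index, 2D slice) pairs."""
    volume = np.stack(raw_slices, axis=0).astype(np.float32)
    slices = [(index, volume[index]) for index in range(volume.shape[0])]
    return volume, slices

# Toy input: three 4x4 slices with constant intensities
raw = [np.full((4, 4), value) for value in (0, 5, 10)]
volume, slices = to_volume_and_slices(raw)
print(volume.shape, len(slices), slices[1][0])  # (3, 4, 4) 3 1
```

The volume would feed the three-dimensional branch, while the indexed slices would feed the two-dimensional branch one at a time.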
6. A medical model training method, characterized in that the medical model training method includes: acquiring medical sample data, which includes sample medical images and corresponding target labels; extracting sample two-dimensional convolutional features of the sample medical image; extracting first sample three-dimensional convolutional features of the sample medical image; concatenating the sample two-dimensional convolutional features and the first sample three-dimensional convolutional features to obtain second sample three-dimensional convolutional features; performing feature extraction on the second sample three-dimensional convolutional features, and obtaining a model output based on the feature extraction result; generating a first loss function value based on the sample two-dimensional convolutional features and the target label, generating a second loss function value based on the first sample three-dimensional convolutional features and the target label, and generating a third loss function value based on the model output and the target label; and optimizing the medical model based on the first loss function value, the second loss function value, and the third loss function value to obtain a trained medical model; wherein extracting the sample two-dimensional convolutional features of the sample medical image includes: extracting, from current sample input features, current sample two-dimensional features with different receptive field sizes, wherein the current sample input features input for the first time are generated based on the sample medical image; and concatenating the current sample two-dimensional features with different receptive field sizes to obtain next sample input features, using the next sample input features as the current sample input features, continuing to extract current sample two-dimensional features with different receptive field sizes from the current sample input features, and using the next sample input features output last as the sample two-dimensional convolutional features.
7. The medical model training method according to claim 6, characterized in that generating the first loss function value based on the sample two-dimensional convolutional features and the target label includes: generating, using a first loss function, the first loss function value based on the sample two-dimensional convolutional features and the target label; and generating the second loss function value based on the first sample three-dimensional convolutional features and the target label, and generating the third loss function value based on the model output and the target label, includes: generating, using a second loss function, the second loss function value based on the first sample three-dimensional convolutional features and the target label; and generating, using the second loss function, the third loss function value based on the model output and the target label.
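The three loss values of claims 6 and 7 can be sketched as below. The claims do not name the loss functions, so binary cross-entropy and soft Dice are assumptions chosen only for illustration. The sketch also assumes that the two-dimensional and three-dimensional features have already been mapped to per-voxel probabilities with the same shape as the label.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy (assumed stand-in for the first loss function)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss (assumed stand-in for the second loss function)."""
    intersection = np.sum(pred * target)
    return float(1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps))

def three_loss_values(pred_2d, pred_3d, model_output, target):
    first = bce_loss(pred_2d, target)          # 2D branch vs. label
    second = dice_loss(pred_3d, target)        # first 3D branch vs. label
    third = dice_loss(model_output, target)    # final output vs. label
    return first, second, third

target = np.zeros((4, 8, 8))
target[:, 2:6, 2:6] = 1.0
noisy = target * 0.8 + 0.1      # imperfect probabilities: 0.9 inside, 0.1 outside
perfect = target.copy()
first, second, third = three_loss_values(noisy, noisy, perfect, target)
print(first > 0, second > 0, third < 1e-6)  # True True True
```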
8. The medical model training method according to claim 6, characterized in that optimizing the medical model based on the first loss function value, the second loss function value, and the third loss function value to obtain the trained medical model includes: optimizing the two-dimensional convolutional feature extraction module of the medical model based on the first loss function value, optimizing the first three-dimensional convolutional feature extraction module of the medical model based on the second loss function value, and optimizing the second three-dimensional convolutional feature extraction module of the medical model based on the third loss function value, to obtain the trained medical model.
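The per-module optimization in claim 8 can be illustrated with a deliberately tiny scalar model. Each "module" is a single hypothetical weight, squared error replaces the real loss functions, and a sum stands in for concatenation. The point is only that each weight is updated by the gradient of its own loss.

```python
def train_toy_model(x=1.0, y=1.0, lr=0.1, steps=200):
    """Toy scalar model: each 'module' is one weight updated only by its own loss."""
    w_2d, w_3d_first, w_3d_second = 0.0, 0.0, 0.0
    for _ in range(steps):
        f_2d = w_2d * x                  # stand-in for sample 2D features
        f_3d = w_3d_first * x            # stand-in for first sample 3D features
        fused = f_2d + f_3d              # sum used in place of concatenation
        output = w_3d_second * fused     # stand-in for the model output

        # Gradients of squared-error losses w.r.t. the owning weight only
        grad_2d = 2.0 * (f_2d - y) * x                # d(loss1)/d(w_2d)
        grad_3d_first = 2.0 * (f_3d - y) * x          # d(loss2)/d(w_3d_first)
        grad_3d_second = 2.0 * (output - y) * fused   # d(loss3)/d(w_3d_second)

        w_2d -= lr * grad_2d
        w_3d_first -= lr * grad_3d_first
        w_3d_second -= lr * grad_3d_second
    return w_2d, w_3d_first, w_3d_second

weights = train_toy_model()
print([round(w, 3) for w in weights])  # [1.0, 1.0, 0.5]
```

In this sketch the third loss does not update the two upstream weights, which mirrors the module-by-module assignment in the claim. Whether gradients are actually blocked between modules in the patented model is not stated in the claim.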
9. A medical image segmentation device, characterized in that the device includes: a medical image acquisition module, used to acquire a medical image to be segmented; a target two-dimensional feature extraction module, used to perform feature extraction on the medical image to be segmented to obtain target two-dimensional features; an initial three-dimensional feature extraction module, used to perform feature extraction on the medical image to be segmented to obtain initial three-dimensional features; a first concatenation module, used to concatenate the target two-dimensional features and the initial three-dimensional features; and a segmentation module, used to perform feature extraction on the concatenated features to obtain target three-dimensional features, and to obtain a segmentation result of the medical image based on the target three-dimensional features; wherein the target two-dimensional feature extraction module includes: an extraction unit, used to extract, from current input features, current two-dimensional features with different receptive field sizes, wherein the current input features input for the first time are generated based on the medical image to be segmented; and a concatenation unit, used to concatenate the current two-dimensional features with different receptive field sizes to obtain next input features, use the next input features as the current input features, continue to extract current two-dimensional features with different receptive field sizes from the current input features, and use the next input features output last as the target two-dimensional features.
10. A medical model training device, characterized in that the medical model training device includes: a medical sample data acquisition module, used to acquire medical sample data, which includes sample medical images and corresponding target labels; a sample two-dimensional convolutional feature extraction module, used to extract sample two-dimensional convolutional features of the sample medical image; a sample three-dimensional convolutional feature extraction module, used to extract first sample three-dimensional convolutional features of the sample medical image; a second concatenation module, used to concatenate the sample two-dimensional convolutional features and the first sample three-dimensional convolutional features to obtain second sample three-dimensional convolutional features; a model processing module, used to perform feature extraction on the second sample three-dimensional convolutional features and obtain a model output based on the feature extraction result; and a network parameter update module, used to generate a first loss function value based on the sample two-dimensional convolutional features and the target label, generate a second loss function value based on the first sample three-dimensional convolutional features and the target label, and generate a third loss function value based on the model output and the target label, and to optimize the medical model according to the first loss function value, the second loss function value, and the third loss function value to obtain a trained medical model; wherein the medical model training device is further configured to extract, from current sample input features, current sample two-dimensional features with different receptive field sizes, wherein the current sample input features input for the first time are generated based on the sample medical image; and to concatenate the current sample two-dimensional features with different receptive field sizes to obtain next sample input features, use the next sample input features as the current sample input features, continue to extract current sample two-dimensional features with different receptive field sizes from the current sample input features, and use the next sample input features output last as the sample two-dimensional convolutional features.
11. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 5 or claims 6 to 8.