Ultrasound image processing method and apparatus, computer device, and storage medium
Multi-class semantic segmentation and diameter measurement of ultrasound images address the inaccuracy of traditional ultrasound image processing and improve the accuracy and consistency of examination results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- REPRODUCTIVE & GENETIC HOSPITAL OF CITIC XIANGYA CO LTD
- Filing Date
- 2025-03-17
- Publication Date
- 2026-06-26
AI Technical Summary
Traditional ultrasound image processing methods are not accurate enough, resulting in low consistency of examination results.
A set of ultrasound images is acquired, multi-class semantic segmentation is performed on it at different preset segmentation scales using a trained multi-class semantic segmentation model, and the diameter of each category of target is then measured.
It improves the accuracy of ultrasound image processing and analysis, reduces errors in doctors' subjective judgment, and enhances the objectivity of examination results.
Figure CN120339371B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of image processing technology, and in particular to an ultrasound image processing method, apparatus, computer device, storage medium, and computer program product.
Background Technology
[0002] With the development of ultrasound technology, it has been gradually introduced into clinical applications. The results of ultrasound images often serve as important auxiliary information for doctors in diagnosis and treatment. For example, ultrasound technology is used to perform prenatal examinations on pregnant women, and ultrasound images can be used to assess fetal growth and development.
[0003] However, traditional methods of processing and analyzing ultrasound images may not be accurate enough, which can also lead to low consistency of examination results.
Summary of the Invention
[0004] Therefore, in response to the above technical problems, it is necessary to provide an ultrasound image processing method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the accuracy of ultrasound image processing.
[0005] In a first aspect, this application provides an ultrasound image processing method. The method includes:
[0006] Acquiring an ultrasound image set, wherein each ultrasound image in the set contains multiple categories of recognition targets;
[0007] Performing multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales to obtain a segmentation result set for each category of recognition target;
[0008] For the segmentation result set of each category of recognition target, measuring the diameter of the recognition target in each segmentation result of the set, and determining the largest measured diameter as the target diameter of that category of recognition target.
[0009] In a second aspect, this application also provides an ultrasound image processing apparatus. The apparatus includes:
[0010] A data acquisition module, used to acquire an ultrasound image set, wherein each ultrasound image in the set contains multiple categories of recognition targets;
[0011] A semantic segmentation module, used to perform multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales to obtain a segmentation result set for each category of recognition target;
[0012] A data measurement module, used to measure, for the segmentation result set of each category of recognition target, the diameter of the recognition target in each segmentation result of the set, and to determine the largest measured diameter as the target diameter of that category of recognition target.
[0013] In a third aspect, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the steps of the ultrasound image processing method embodiments above.
[0014] In a fourth aspect, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the ultrasound image processing method embodiments above.
[0015] In a fifth aspect, this application also provides a computer program product. The computer program product includes a computer program which, when executed by a processor, implements the steps of the ultrasound image processing method embodiments above.
[0016] With the aforementioned ultrasound image processing method, apparatus, computer device, storage medium, and computer program product, an ultrasound image set is acquired in which each ultrasound image contains multiple categories of recognition targets. Multi-class semantic segmentation is then performed on each ultrasound image at different preset segmentation scales. Larger segmentation scales capture the global information of the ultrasound image, while smaller segmentation scales capture detailed information, so each category of recognition target is identified more comprehensively and accurately, which improves the accuracy of ultrasound image processing and analysis. Furthermore, for each category of recognition target, the diameter of the recognition target in each segmentation result of the segmentation result set is measured, and the largest diameter is determined as the target diameter of that category. This quantitative measurement is based on objective data calculation, which reduces errors that may arise from the doctor's subjective judgment and further improves the accuracy of the ultrasound image analysis results.
Attached Figure Description
[0017] Figure 1 is an application environment diagram of an ultrasound image processing method in one embodiment;
[0018] Figure 2 is a flowchart illustrating an ultrasound image processing method in one embodiment;
[0019] Figure 3 is a flowchart illustrating the ultrasound image processing method in another embodiment;
[0020] Figure 4 is a flowchart illustrating the semantic segmentation steps in one embodiment;
[0021] Figure 5 is a flowchart illustrating the category existence probability prediction step in one embodiment;
[0022] Figure 6 is a flowchart illustrating an ultrasound image processing method in a detailed embodiment;
[0023] Figure 7 is a schematic diagram of the structure of multi-class semantic segmentation in one embodiment;
[0024] Figure 8 is a structural block diagram of an ultrasound image processing apparatus in one embodiment;
[0025] Figure 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Implementation
[0026] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.
[0027] The ultrasound image processing method provided in the embodiments of this application can be applied, for example, in the application environment shown in Figure 1, in which terminal 102 communicates with server 104 via a network. A data storage system can store the data that server 104 needs to process. The data storage system can be integrated into server 104, or it can be located in the cloud or on another network server.
[0028] Specifically, an operator can acquire an ultrasound image set using an ultrasound acquisition device. Each ultrasound image in the set contains multiple categories of recognition targets. The operator can upload the ultrasound image set to server 104 via terminal 102. Server 104 performs multi-class semantic segmentation on each ultrasound image in the set at different preset segmentation scales to obtain a segmentation result set for each category of recognition target. Furthermore, for the segmentation result set of each category of recognition target, server 104 measures the diameter of the recognition target in each segmentation result of the set and determines the largest measured diameter as the target diameter of that category of recognition target.
[0029] The terminal 102 can be, but is not limited to, various personal computers, laptops, smartphones, tablets, IoT devices, and portable wearable devices. IoT devices can include smart speakers, smart TVs, smart air conditioners, and smart in-vehicle systems. Portable wearable devices can include smartwatches, smart bracelets, and head-mounted devices. The server 104 can be implemented using a standalone server or a server cluster consisting of multiple servers.
[0030] In one embodiment, as shown in Figure 2, an ultrasound image processing method is provided. Taking the application of the method to server 104 in Figure 1 as an example, the method includes the following steps:
[0031] S100: Acquire an ultrasound image set.
[0032] Each ultrasound image in the ultrasound image set contains multiple categories of recognition targets. The ultrasound image set includes multiple ultrasound images, which are pictures obtained by using ultrasound waves to image internal organs or tissues of the human body. In this embodiment, taking the application of ultrasound technology to early pregnancy embryo examination as an example, each ultrasound image may include multiple categories of recognition targets, such as the gestational sac, yolk sac, and embryo.
[0033] Specifically, the fetus in the pregnant woman's womb can be scanned manually using an ultrasound imaging device, thereby obtaining multiple ultrasound images that form an ultrasound image set. For example, in an early pregnancy examination, the ultrasound imaging device may acquire 10 ultrasound images, each containing multiple categories of recognition targets such as the gestational sac, yolk sac, and embryo.
[0034] S200: Perform multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales to obtain a segmentation result set for each category of recognition target.
[0035] Semantic segmentation aims to classify every pixel in an ultrasound image into a specific category. Multi-class semantic segmentation, on the other hand, categorizes pixels in an ultrasound image into different categories, such as classifying pixels corresponding to the gestational sac, yolk sac, and embryo into their respective categories. In multi-class semantic segmentation, different segmentation scales can be pre-set to analyze the ultrasound image from different levels of detail. A larger segmentation scale means a greater focus on the macroscopic, overall perspective, while a smaller scale means a greater focus on the details of the ultrasound image, enabling the discovery of minute features. For example, in this embodiment, four different preset segmentation scales can be set to achieve semantic segmentation from microscopic to macroscopic levels.
[0036] The segmentation result set refers to the collection of all segmentation results obtained for one category after multi-class semantic segmentation has been performed on each ultrasound image in the ultrasound image set. For example, for the gestational sac category, semantic segmentation is performed on the ultrasound images in the ultrasound image set, and the final segmentation result set of the gestational sac includes each ultrasound image in which the gestational sac is marked.
[0037] For example, multi-class semantic segmentation is performed on each ultrasound image in the ultrasound image set at different preset segmentation scales. For instance, four segmentation scales, from smallest to largest, can be preset. At each segmentation scale, the server assigns the pixels in the ultrasound image to the corresponding categories, such as gestational sac, yolk sac, embryo, and background. It should be noted that in multi-class semantic segmentation, the segmentation result for each ultrasound image can label all categories of recognition targets simultaneously, or the different categories can be labeled separately to obtain different segmentation result sets. Furthermore, the different segmentation result sets can be output through different independent output channels. The segmentation result set of each category contains the segmentation results of multiple ultrasound images at different segmentation scales for that category. Since ultrasound images may be scaled during semantic segmentation, and the degree of scaling depends on the segmentation scale, the segmentation results can further be upsampled to restore them to the size of the input ultrasound image. The segmentation results at different segmentation scales are then averaged, for example by averaging the corresponding pixel values across scales, to obtain the segmentation result set of each category of recognition target. Taking four preset segmentation scales as an example, one ultrasound image corresponds to four segmentation results before averaging and to one segmentation result after averaging, and the segmentation result set includes the segmentation result of each of the multiple ultrasound images.
[0038] S300: For the segmentation result set of each category of recognition target, measure the diameter of the recognition target in each segmentation result of the set, and determine the largest diameter as the target diameter of that category of recognition target.
[0039] The diameter refers to the longest straight-line extent of the recognition target in the segmentation result. For example, if the segmentation result includes a gestational sac, the length measured along the longest direction of the gestational sac is the diameter of the gestational sac (the recognition target).
[0040] For example, the segmentation result set of each category of recognition target includes multiple segmentation results. Taking the gestational sac as the recognition target, for each segmentation result (each processed ultrasound image), the length of the gestational sac along its longest axis in that segmentation result is determined. This length is the diameter of the gestational sac. Each segmentation result corresponds to one diameter, which ultimately yields a set of diameters. The largest diameter in this set is then determined as the target diameter of that category of recognition target. For example, the diameters of the gestational sac measured on each ultrasound image are compared, and the largest is selected as the target diameter of the gestational sac. This is because the largest diameter best reflects the maximum size of the recognition target in its natural state, which is of great significance for subsequent clinical analysis and evaluation.
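The selection described in S300 can be sketched as follows. This is a minimal illustration only: the function name `select_target_diameters`, the category keys, and the per-image diameter values are hypothetical and are not taken from this application.

```python
from typing import Dict, List


def select_target_diameters(per_image_diameters: Dict[str, List[float]]) -> Dict[str, float]:
    """For each category, keep the largest diameter measured across all images.

    Categories with no measurement (for example, because every segmentation
    result was filtered out) are omitted from the output.
    """
    return {
        category: max(diameters)
        for category, diameters in per_image_diameters.items()
        if diameters
    }


# Hypothetical diameters (in pixels) measured on three ultrasound images.
measured = {
    "gestational_sac": [41.2, 44.8, 43.0],
    "yolk_sac": [6.1, 5.9, 6.4],
    "embryo": [],
}
target_diameters = select_target_diameters(measured)
```

Leaving out categories without measurements is one possible design choice; an implementation could equally report them explicitly as absent.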
[0041] The aforementioned ultrasound image processing method differs from traditional manual analysis. An ultrasound image set is acquired in which each ultrasound image contains multiple categories of recognition targets. Multi-class semantic segmentation is then performed on each ultrasound image at different preset segmentation scales. Larger segmentation scales capture global information, while smaller scales capture detailed information, so each category of recognition target is identified more comprehensively and accurately, which improves the accuracy of ultrasound image processing and analysis. Furthermore, for each category of recognition target, the diameter of the recognition target in each segmentation result is measured, and the largest diameter is determined as the target diameter of that category. This quantitative measurement, based on objective data calculation, reduces errors that may arise from the doctor's subjective judgment and further improves the accuracy of ultrasound image analysis.
[0042] In one embodiment, as shown in Figure 3, S200 includes:
[0043] S210: Take the ultrasound image set as input and call a trained multi-class semantic segmentation model to obtain a segmentation result set for each category of recognition target.
[0044] The trained multi-class semantic segmentation model includes multiple independent output channels, each of which outputs the segmentation result set of a single category of recognition target. The model is trained on a set of historical ultrasound images carrying category labels. Built using deep learning technology, it analyzes the input ultrasound images and classifies each pixel into a category, thereby achieving semantic segmentation of the different targets within the ultrasound image. For example, in this embodiment, the multi-class semantic segmentation model can distinguish between different categories of recognition targets such as the gestational sac, yolk sac, and embryo. Furthermore, to facilitate subsequent analysis of the segmentation results of the different categories, the model in this embodiment is designed so that each independent output channel is specifically responsible for one category. For example, the segmentation results for the gestational sac, yolk sac, and embryo are output through different independent output channels.
[0045] For example, the multi-class semantic segmentation model in this application can be constructed and trained in the following way. First, a multi-class semantic segmentation model to be trained can be constructed based on a deep learning network. The multi-class semantic segmentation model to be trained includes a Backbone part, a Neck part, and a Prediction part. Among them, the Backbone part is a combination of various convolutional modules and residual modules, which is used to extract high-dimensional features (form a feature map) in ultrasound images. Different convolutional modules have different parameters such as kernel size and stride, which can extract features at different scales and levels. The residual module allows the deep learning network to directly learn the residual between the input and output, reducing the training difficulty of the deep learning network. The Neck part is used to perform multiple upsampling and downsampling operations on the high-dimensional features in the ultrasound image. Upsampling refers to increasing the resolution of the feature map so that the feature map can contain more detailed information, while downsampling reduces the resolution of the feature map, reduces the amount of data, and extracts more abstract features. Through repeated upsampling and downsampling, the Neck part can fuse features of multiple granularities. The Prediction part outputs the class probability of each pixel based on the Softmax function. For each pixel in the ultrasound image, after processing by the Backbone and Neck parts, a feature representation of that pixel is obtained. These feature representations are then input into the Softmax function, which calculates the probability of the pixel belonging to each class. For example, in ultrasound image segmentation, after processing by the Softmax function, each pixel will have a probability value belonging to different classes such as gestational sac, yolk sac, embryo, and background.
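The Softmax step of the Prediction part can be illustrated with a small NumPy sketch. It assumes the Backbone and Neck have already produced one score per class for every pixel, stored as an array of shape (H, W, K); the function name `pixel_softmax` and the random scores are hypothetical.

```python
import numpy as np


def pixel_softmax(scores: np.ndarray) -> np.ndarray:
    """Convert per-pixel class scores of shape (H, W, K) into probabilities."""
    # Subtracting the per-pixel maximum improves numerical stability and
    # does not change the Softmax result.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical scores for a 2x2 image and four classes
# (gestational sac, yolk sac, embryo, background).
rng = np.random.default_rng(0)
scores = rng.normal(size=(2, 2, 4))
probs = pixel_softmax(scores)
```

Each pixel's four probabilities are non-negative and sum to 1, so they can be read directly as the class probabilities described above.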
[0046] After the multi-class semantic segmentation model to be trained has been constructed, appropriate training data should be selected. For example, the training data could be 1000 historical ultrasound images containing normal early pregnancy samples, or 500 historical ultrasound images for each type of early pregnancy abnormality (500 ultrasound images containing abnormal gestational sacs, 500 containing abnormal yolk sacs, 500 containing abnormal embryos, etc.). Each historical ultrasound image carries category labels, meaning that the gestational sac, yolk sac, and embryo are labeled in it. Because the training data contains a large number of ultrasound images with corresponding pixel-level category labels, during training the model gradually learns the features and patterns of the different categories of recognition targets from the input ultrasound images and their labels. This improves the accuracy and performance of multi-class semantic segmentation of ultrasound images and ultimately yields the trained multi-class semantic segmentation model.
[0047] Furthermore, the acquired ultrasound image set is fed as input into the trained multi-class semantic segmentation model. Based on the knowledge learned during training, the model classifies each pixel in the ultrasound image and determines which category it belongs to. Since the trained model has multiple independent output channels, each corresponding to one category of recognition target, each independent output channel outputs the segmentation result of the category it is responsible for after the ultrasound image has been processed.
[0048] In this embodiment, the trained multi-class semantic segmentation model is trained based on a set of historical ultrasound images carrying category labels. Therefore, the trained multi-class semantic segmentation model can capture the feature information of different categories of recognition targets. Each independent output channel of the model is responsible for outputting the segmentation result of a category of recognition target, which allows the model to focus on feature learning and segmentation of each category, thereby improving the accuracy of segmentation results for each category.
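One way to picture the independent output channels is to split a per-pixel probability map into one binary mask per category. The sketch below assumes, as an illustration only, that each pixel is assigned to its most probable class; the application does not state this decision rule, and the names `CATEGORIES` and `split_into_channels` are hypothetical.

```python
import numpy as np

# Hypothetical channel order; the last entry is the background class.
CATEGORIES = ("gestational_sac", "yolk_sac", "embryo", "background")


def split_into_channels(probs: np.ndarray) -> dict:
    """Turn an (H, W, K) probability map into one boolean mask per category."""
    labels = probs.argmax(axis=-1)  # assumed decision rule: most probable class
    return {
        name: labels == index
        for index, name in enumerate(CATEGORIES)
        if name != "background"
    }


probs = np.array([
    [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]],
    [[0.2, 0.5, 0.2, 0.1], [0.1, 0.2, 0.6, 0.1]],
])
channels = split_into_channels(probs)
```

Each mask can then be analyzed on its own, for example for diameter measurement, without interference from the other categories.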
[0049] In one embodiment, as shown in Figure 4, when the trained multi-class semantic segmentation model is invoked, the following steps are performed:
[0050] S220: Perform multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales to obtain the segmentation result sets corresponding to the different preset segmentation scales.
[0051] S230: Perform an upsampling operation on the different segmentation result sets to obtain the segmentation result sets corresponding to the different preset segmentation scales at the original size.
[0052] S240: Average the segmentation result sets corresponding to the different preset segmentation scales at the original size to obtain the segmentation result set of each category of recognition target.
[0053] Here, the original size refers to the size of the ultrasound images in the ultrasound image set. When the trained multi-class semantic segmentation model is invoked, the model will perform multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales. Different preset segmentation scales mean that the model will analyze the ultrasound images from the perspective of multiple resolutions or receptive fields.
[0054] Specifically, four different segmentation scales can be set. Larger segmentation scales can quickly capture the overall structure and general outline information in ultrasound images, while smaller segmentation scales can extract detailed features from ultrasound images, providing more accurate segmentation of the edges and fine structures of the target. For example, in early pregnancy ultrasound images, the gestational sac is much larger than the yolk sac and embryo. Single-scale semantic segmentation often tends to focus on predicting the segmentation results of the gestational sac while ignoring the yolk sac and embryo. Therefore, by enabling the multi-class semantic segmentation model to perform semantic segmentation of ultrasound images at different segmentation scales, the accuracy of the segmentation results can be improved.
[0055] Furthermore, after obtaining the set of segmentation results corresponding to different preset segmentation scales, since the size of the segmentation result is usually different from the size of the original ultrasound image when performing semantic segmentation at different scales, it is necessary to perform an upsampling operation to restore the size of the segmentation result to the size of the input ultrasound image. This allows the segmentation results at different scales to be subjected to subsequent comprehensive analysis on the same size basis.
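The upsampling in S230 can be sketched with nearest-neighbor interpolation. This is an assumption made for brevity: the application does not name an interpolation method, and the sketch further assumes the original size is an integer multiple of the coarse size. The function name `upsample_nearest` is hypothetical.

```python
import numpy as np


def upsample_nearest(prob_map: np.ndarray, factor: int) -> np.ndarray:
    """Enlarge an (h, w, K) map to (h * factor, w * factor, K).

    Each coarse pixel is repeated `factor` times along both spatial axes.
    """
    return prob_map.repeat(factor, axis=0).repeat(factor, axis=1)


# Hypothetical coarse result of size 4x4 with four classes,
# restored to an original size of 8x8.
coarse = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
restored = upsample_nearest(coarse, factor=2)
```

Once every scale has been restored to the same height and width, the results can be combined pixel by pixel.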
[0056] After obtaining the set of segmentation results corresponding to different preset segmentation scales at the original size, an averaging method can be used to synthesize the segmentation information at different scales. For example, the set of segmentation results corresponding to different preset segmentation scales includes the probability of each pixel in each ultrasound image belonging to each category, such as the probability of each pixel belonging to the gestational sac, yolk sac, and embryo, respectively. Averaging means that for each ultrasound image, the probability of each pixel belonging to each category obtained from the analysis at different segmentation scales is averaged to obtain the final segmentation result. Specifically, assuming four segmentation scales are preset, for a certain pixel in a certain ultrasound image, the probabilities of it belonging to the gestational sac, yolk sac, embryo, and background under these four segmentation scales are (0.1, 0.5, 0.4, 0), (0.2, 0.6, 0.2, 0), (0.1, 0.6, 0.3, 0), and (0.1, 0.4, 0.5, 0), respectively. Then, after averaging, the final probability of this pixel belonging to the gestational sac, yolk sac, embryo, and background is (0.125, 0.525, 0.35, 0).
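The averaging in this example can be reproduced with a few lines of NumPy. The probability values are those given above; the helper `average_scales` and the use of the most probable class as the final label are illustrative assumptions, not steps stated in this application.

```python
import numpy as np

# Probabilities of one pixel at the four segmentation scales, in the order
# (gestational sac, yolk sac, embryo, background).
per_scale_probs = np.array([
    [0.1, 0.5, 0.4, 0.0],
    [0.2, 0.6, 0.2, 0.0],
    [0.1, 0.6, 0.3, 0.0],
    [0.1, 0.4, 0.5, 0.0],
])

# Average over the scale axis to fuse the four predictions.
fused = per_scale_probs.mean(axis=0)

# Assumed decision rule: pick the most probable class (index 1 = yolk sac).
predicted_class = int(fused.argmax())


def average_scales(prob_maps: list) -> np.ndarray:
    """Average several (H, W, K) maps that share the original image size."""
    return np.mean(np.stack(prob_maps, axis=0), axis=0)
```

The same `average_scales` helper applies to whole images once all scales have been upsampled to the original size.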
[0057] In this embodiment, by performing semantic segmentation at different preset segmentation scales and integrating information from different segmentation scales through averaging, the limitations of a single segmentation scale can be effectively reduced, misjudgments can be decreased, and the accuracy of semantic segmentation for various types of recognition targets can be improved.
[0058] In one embodiment, as shown in Figure 5, the trained multi-class semantic segmentation model includes a multilayer perceptron. When the trained multi-class semantic segmentation model is invoked, the following steps are also performed:
[0059] S250: Perform feature extraction and pooling operations on each ultrasound image to obtain the feature vectors of the different categories of recognition targets at the different preset segmentation scales, and concatenate the feature vectors of the same category of recognition target at the different preset segmentation scales to obtain the vector concatenation result of each category of recognition target.
[0060] S260: Take the vector concatenation result of each category of recognition target as input and call the multilayer perceptron to obtain the category existence probability of each category of recognition target in each ultrasound image.
[0061] S270: If the category existence probability of a recognition target is less than the preset category probability threshold, filter out the segmentation result corresponding to that recognition target.
[0062] Here, the multilayer perceptron is a network structure composed of multiple layers of neurons, which can be used to predict the probability that each category of recognition target is present in an ultrasound image based on the input feature vector. Feature extraction refers to extracting information that represents the image features from ultrasound images. Pooling is a downsampling operation that aggregates local regions of the feature map, reducing the amount of data and computational complexity while retaining the main feature information.
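A multilayer perceptron of this kind can be sketched as a single hidden layer followed by one output per category. Everything specific here is an assumption for illustration: the layer sizes, the ReLU and sigmoid activations, the random (untrained) weights, and the function name `mlp_presence_probabilities` are not taken from this application.

```python
import numpy as np


def mlp_presence_probabilities(x, w1, b1, w2, b2):
    """Map a (1, D) feature vector to (1, 3) presence probabilities."""
    hidden = np.maximum(0.0, x @ w1 + b1)  # hidden layer with ReLU (assumed)
    logits = hidden @ w2 + b2
    # A sigmoid per output lets each category be present or absent
    # independently of the others (assumed).
    return 1.0 / (1.0 + np.exp(-logits))


# Hypothetical sizes: 120 input features, 32 hidden units, 3 categories
# (gestational sac, yolk sac, embryo). Weights are random placeholders.
rng = np.random.default_rng(0)
in_dim, hidden_dim, num_categories = 120, 32, 3
x = rng.normal(size=(1, in_dim))
w1 = rng.normal(scale=0.1, size=(in_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(scale=0.1, size=(hidden_dim, num_categories))
b2 = np.zeros(num_categories)
presence = mlp_presence_probabilities(x, w1, b1, w2, b2)
```

In a real system the weights would be learned together with the segmentation model rather than drawn at random.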
[0063] For example, for each ultrasound image in the ultrasound image set, feature extraction and pooling operations are first performed. For instance, the convolutional module in a multi-class semantic segmentation model encodes the input ultrasound image (feature extraction), converting the image information into a feature vector. Then, at different preset segmentation scales, the feature vector is decoded, restoring the encoded feature vector to a feature representation related to the image semantics, resulting in a decoded feature vector. Besides being fed into the convolutional head to output the segmentation result, the decoded feature vector is also input into a global average pooling layer. This pooling layer will further process the decoded feature vector... A global average pooling operation is performed. Assuming the input feature map size is H*W*C (height*width*number of channels), after the global average pooling layer, each channel will obtain an average value, and the final output is a 1*C feature vector. Assuming the preset number of segmentation scales is 4, each ultrasound image can obtain four sets of feature vectors after the above processing. Each set of feature vectors includes feature vectors of different categories of recognition targets. For the same category of recognition targets, the feature vectors under different preset segmentation scales are concatenated to obtain a 1*(C1+C2+C3+C4) vector concatenation result.
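The shapes in this paragraph can be checked with a short NumPy sketch of global average pooling followed by concatenation. The channel counts C1 to C4, the spatial sizes, and the random feature maps are hypothetical, and the sketch handles a single category for simplicity.

```python
import numpy as np


def global_average_pool(feature_map: np.ndarray) -> np.ndarray:
    """Reduce an (H, W, C) feature map to a length-C vector of channel means."""
    return feature_map.mean(axis=(0, 1))


# Hypothetical decoded feature maps at four segmentation scales.
channel_counts = [8, 16, 32, 64]                       # C1, C2, C3, C4
spatial_sizes = [(64, 64), (32, 32), (16, 16), (8, 8)]
rng = np.random.default_rng(0)
feature_maps = [
    rng.normal(size=(h, w, c))
    for (h, w), c in zip(spatial_sizes, channel_counts)
]

# One pooled vector per scale, then one concatenated row vector
# of shape (1, C1 + C2 + C3 + C4).
pooled = [global_average_pool(f) for f in feature_maps]
concatenated = np.concatenate(pooled)[np.newaxis, :]
```

Because pooling removes the spatial dimensions, feature maps of different resolutions can be concatenated without any resizing.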
[0064] Furthermore, the vector concatenation result obtained above is input into a multilayer perceptron (MLP). The MLP can perform nonlinear transformation and learning on the input vector concatenation result to predict the probability of the presence of a gestational sac, yolk sac, and embryo in the ultrasound image. For example, if the concatenated 1*(C1+C2+C3+C4) vector is input into the MLP, the MLP will output a 1*3 feature vector. The three values in this feature vector represent the probability of the presence of a gestational sac, yolk sac, and embryo in that frame of the ultrasound image, respectively. If the probability of the presence of a certain category of target in the ultrasound image is less than a preset category probability threshold, for example, the probability of the presence of a gestational sac is less than 0.5, then it can be considered that a gestational sac does not actually exist in the ultrasound image. Regardless of the segmentation result corresponding to the gestational sac, this segmentation result should be filtered, meaning that the segmentation result should be suppressed, and the segmentation result of this category of target should not be used as the final valid output. This is because when a gestational sac does not exist in the ultrasound image, the segmentation output head may still predict the segmentation result of the gestational sac, which will lead to serious measurement errors. Therefore, suppressing the segmentation result of the gestational sac can reduce possible misidentification. It should be noted that since the input ultrasound images are multiple frames, there may be some ultrasound images where the probability of the gestational sac is less than 0.5, while in other ultrasound images the probability of the gestational sac is greater than or equal to 0.5. In this case, the decision on whether to suppress the segmentation of the gestational sac can be made by following the majority rule.
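The threshold and majority rule can be sketched as follows. The 0.5 threshold comes from the example above; the function name `categories_to_suppress`, the per-frame probabilities, and the handling of ties (a tie does not suppress) are assumptions, since the application does not specify them.

```python
import numpy as np

CATEGORIES = ("gestational_sac", "yolk_sac", "embryo")


def categories_to_suppress(presence_probs, threshold: float = 0.5) -> list:
    """Return the categories whose segmentation results should be suppressed.

    presence_probs has shape (num_frames, 3): one row of existence
    probabilities per ultrasound image.
    """
    presence_probs = np.asarray(presence_probs, dtype=float)
    num_frames = presence_probs.shape[0]
    # Count, per category, the frames that vote "absent".
    absent_votes = (presence_probs < threshold).sum(axis=0)
    # Suppress only when a strict majority of frames votes "absent"
    # (tie handling is an assumption).
    return [
        name
        for name, votes in zip(CATEGORIES, absent_votes)
        if votes > num_frames / 2
    ]


# Hypothetical existence probabilities for three frames.
frame_probs = [
    [0.9, 0.2, 0.7],
    [0.4, 0.3, 0.8],
    [0.8, 0.6, 0.6],
]
suppressed = categories_to_suppress(frame_probs)
```

In this example only the yolk sac is suppressed: two of three frames fall below the threshold for it, whereas the gestational sac falls below the threshold in a single frame.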
[0065] In this embodiment, a global average pooling layer is used to process the feature vector, which reduces redundant information in the features. Furthermore, by concatenating feature vectors at multiple different segmentation scales, the ability to identify gestational sacs, yolk sacs, and embryos of different sizes and shapes can be enhanced. Subsequently, based on the probability of the existence of the identified target output by the multilayer perceptron, a suppression operation is performed on the segmentation result, which can effectively reduce subsequent misjudgments and improve the efficiency and accuracy of ultrasound image processing.
[0066] In one embodiment, S300 includes: for each segmentation result, extracting the contour pixels of the target in the segmentation result, determining the straight-line distance between every pair of contour pixels, and determining the longest straight-line distance as the diameter length of the target.
[0067] Following the above embodiments, after obtaining the segmentation results, it is first necessary to extract the contour pixels of the target from the segmentation results, for example, through edge detection algorithms, to accurately find the boundary of the target. Then, the straight-line distance between any two pixels in these contour pixels is calculated, and the largest straight-line distance is determined as the diameter of the target. The diameter reflects the maximum size of the target in a certain direction.
[0068] In this embodiment, the pixel-level distance calculation provides a more objective and accurate diameter length than estimating the size of the target by visual observation alone, thus providing a reliable basis for subsequent analysis.
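A minimal sketch of this measurement is given below, assuming the segmentation result is a binary mask stored as nested lists. Contour pixels are found here with a simple 4-neighbourhood test rather than a specific edge detection algorithm, and the result is in pixel units; converting it to a physical length would require the pixel spacing, which is not covered here. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Mask = List[List[int]]  # 1 = target pixel, 0 = background


def contour_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Target pixels that touch the background or the image border (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])
    contour = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < h and 0 <= nc < w)
                if outside or not mask[nr][nc]:
                    contour.append((r, c))
                    break
    return contour


def diameter_length(mask: Mask) -> float:
    """Longest straight-line distance between any two contour pixels, in pixels."""
    pts = contour_pixels(mask)
    if len(pts) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(pts, 2))


# Toy 3 x 4 rectangular target inside a 5 x 6 image.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(diameter_length(mask), 4))  # 3.6056, the diagonal from (1, 1) to (3, 4)
```

The brute-force pairwise search is quadratic in the number of contour pixels, which is usually acceptable because only boundary pixels are compared.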
[0069] To explain the ultrasound image processing method provided in this application more clearly, a detailed embodiment is described below with reference to Figure 6. It includes the following steps:
[0070] S601: Acquire a set of ultrasound images, and perform multi-class semantic segmentation on each ultrasound image in the set at different preset segmentation scales to obtain segmentation result sets corresponding to the different preset segmentation scales.
[0071] S602: Perform an upsampling operation on the different segmentation result sets to obtain segmentation result sets corresponding to the different preset segmentation scales at the original size, where the original size is the size of the ultrasound images in the ultrasound image set.
[0072] S603: Average the segmentation result sets corresponding to the different preset segmentation scales at the original size to obtain the segmentation result set of each category of identified target.
[0073] S604: For each ultrasound image, perform feature extraction and pooling operations on the ultrasound image to obtain feature vectors of different categories of identified targets under the different preset segmentation scales, and concatenate the feature vectors of the same category of identified target under the different preset segmentation scales to obtain the vector concatenation result of each category of identified target.
[0074] S605: Using the vector concatenation result of each category of identified target as input, invoke the multilayer perceptron to obtain the category existence probability of each category of identified target in each ultrasound image.
[0075] S606: If the category existence probability of an identified target is less than the preset category probability threshold, filter out the segmentation result of that identified target.
[0076] S607: For the segmentation result set of each category of identified target, extract the contour pixels of the identified target from each segmentation result, and determine the straight-line distance between every pair of contour pixels.
[0077] S608: Determine the longest straight-line distance as the diameter length of the identified target, and determine the largest measured diameter length as the target diameter length of the identified target of that category.
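Steps S602, S603, and the final selection in S608 can be sketched as follows. This is an illustration under several assumptions: each segmentation result is treated as a per-pixel probability map for one category, nearest-neighbour interpolation is used for upsampling (the interpolation method is not specified in the text), and the per-frame diameter lengths are invented values.

```python
from typing import List

ProbMap = List[List[float]]  # per-pixel probabilities for one category


def upsample_nearest(prob_map: ProbMap, out_h: int, out_w: int) -> ProbMap:
    """Resize a map to out_h x out_w by nearest-neighbour lookup (S602)."""
    in_h, in_w = len(prob_map), len(prob_map[0])
    return [
        [prob_map[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def average_maps(maps: List[ProbMap]) -> ProbMap:
    """Element-wise mean of equally sized maps (S603)."""
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[r][c] for m in maps) / len(maps) for c in range(w)]
        for r in range(h)
    ]


# Two toy scales for a 2 x 4 original image.
coarse = [[0.0, 1.0]]                      # 1 x 2 output of a coarse scale
fine = [[0.0, 0.0, 1.0, 1.0],
        [0.0, 0.5, 1.0, 1.0]]              # already at the original size

fused = average_maps([upsample_nearest(coarse, 2, 4), fine])
print(fused)  # [[0.0, 0.0, 1.0, 1.0], [0.0, 0.25, 1.0, 1.0]]

# S608: the largest per-frame diameter length becomes the category's target value.
per_frame_lengths = [12.0, 15.5, 14.2]     # hypothetical measurements, in pixels
target_length = max(per_frame_lengths)
print(target_length)  # 15.5
```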
[0078] In the detailed embodiment, a pre-trained multi-class semantic segmentation model is invoked, and the model architecture is shown in Figure 7. The input image on the left is an ultrasound image, which serves as the input data for the entire model. Features are extracted from the ultrasound image through multiple stages, each with a different segmentation scale. Each stage of feature extraction can be implemented using multiple convolutional modules, such as deformable convolutional modules, to capture features more accurately. The features extracted at each stage are then processed by a convolutional head and an average pooling layer. The convolutional head outputs segmentation results corresponding to the different preset segmentation scales, while the average pooling layer outputs pooled feature vectors. These feature vectors, corresponding to the different preset segmentation scales, are concatenated and input into a multilayer perceptron to predict the probability that each category of target exists in each ultrasound image.
[0079] It should be noted that when a multi-class semantic segmentation model is used for early pregnancy ultrasound image processing, artifacts may exist in the originally acquired ultrasound images, such as artifacts between the embryo and the gestational sac, which may lead to inaccurate measurement results. Therefore, before the ultrasound image set is input into the multi-class semantic segmentation model, it needs to be cleaned to filter out ultrasound images of poor quality. Furthermore, because the yolk sac and embryo may adhere to each other in early pregnancy ultrasound images, the multi-class semantic segmentation model may segment the yolk sac as part of the embryo, resulting in an excessively large embryo diameter length in subsequent measurements. Therefore, ultrasound images showing this condition can be added to the training data when training the multi-class semantic segmentation model, or ultrasound images with abrupt changes in embryo diameter length can be discarded or suppressed when applying the trained multi-class semantic segmentation model.
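One possible way to discard frames with abrupt changes in embryo diameter length is sketched below. The text does not define what counts as an abrupt change, so the criterion used here (deviation from the median of all frames by more than a relative tolerance), the tolerance value, and the function name are all assumptions made for illustration.

```python
from statistics import median
from typing import List


def drop_abrupt_changes(
    lengths: List[float], max_relative_jump: float = 0.3
) -> List[float]:
    """Keep only measurements within a relative tolerance of the median.

    The median-based criterion and the 0.3 tolerance are assumptions of this
    sketch, not values taken from the application.
    """
    if not lengths:
        return []
    ref = median(lengths)
    return [x for x in lengths if abs(x - ref) <= max_relative_jump * ref]


# Hypothetical per-frame embryo diameter lengths; the third frame may include
# an adhered yolk sac and is therefore much larger than its neighbours.
lengths = [10.0, 10.4, 16.0, 10.2, 9.9]
kept = drop_abrupt_changes(lengths)
print(kept)  # [10.0, 10.4, 10.2, 9.9]
```

Filtering of this kind matters because the target diameter length is taken as the maximum over frames, so a single over-segmented frame would otherwise determine the result.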
[0080] Furthermore, early pregnancy ultrasound images may include embryos with lower limbs, which could cause the embryo's diameter measurement to be misplaced onto the lower limbs, leading to an overestimation. Therefore, ultrasound images containing embryos with lower limbs can be added to the training data of the multi-class semantic segmentation model, and the lower limbs of the embryo can be removed during the annotation of these ultrasound images. Alternatively, the lower limbs of the embryo can be manually removed from the segmentation results when applying the trained multi-class semantic segmentation model.
[0081] When the ultrasound image processing method provided in this application is applied to the segmentation and measurement of early pregnancy ultrasound images, multiple experimental verifications show that the multi-class semantic segmentation model achieves accuracy coefficients of 0.972, 0.908, and 0.826 for segmenting the gestational sac, yolk sac, and embryo, respectively, with an average segmentation accuracy coefficient of 0.902. Furthermore, compared to existing models, the multi-class semantic segmentation model of this application is lighter, with a parameter count of only 9.03, and its processing speed can reach 12 milliseconds per frame, which effectively improves the segmentation accuracy and efficiency for the three anatomical structures: gestational sac, yolk sac, and embryo.
[0082] It should be understood that although the steps in the flowcharts of the above embodiments are shown sequentially according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the above embodiments may include multiple steps or multiple stages. These steps or stages are not necessarily completed at the same time, but can be executed at different times. The execution order of these steps or stages is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the steps or stages of other steps.
[0083] Based on the same inventive concept, this application also provides an ultrasound image processing apparatus for implementing the ultrasound image processing method described above. The solution provided by this apparatus is similar to the solution described in the above method; therefore, the specific limitations in one or more ultrasound image processing apparatus embodiments provided below can be found in the limitations of the ultrasound image processing method described above, and will not be repeated here.
[0084] In one embodiment, as shown in Figure 8, an ultrasound image processing device 800 is provided, including: a data acquisition module 810, a semantic segmentation module 820, and a data measurement module 830, wherein:
[0085] The data acquisition module 810 is used to acquire a set of ultrasound images, each of which contains multiple categories of target objects.
[0086] The semantic segmentation module 820 is used to perform multi-category semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales to obtain a set of segmentation results for each category of the identified target.
[0087] The data measurement module 830 is used to measure, for the segmentation result set of each category of identified target, the diameter length of the identified target in each segmentation result, and to determine the largest diameter length as the target diameter length of the identified target of that category.
[0088] In one embodiment, the semantic segmentation module 820 is further configured to take the ultrasound image set as input, call the trained multi-class semantic segmentation model, and obtain the segmentation result set of each category of the identified target. The trained multi-class semantic segmentation model includes multiple independent output channels, each of which is used to output the segmentation result set of the identified target of one category. The trained multi-class semantic segmentation model is trained based on the historical ultrasound image set carrying category labels.
[0089] In one embodiment, the semantic segmentation module 820 is further configured to perform multi-category semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales to obtain segmentation result sets corresponding to different preset segmentation scales, perform upsampling operations on different segmentation result sets to obtain segmentation result sets corresponding to different preset segmentation scales at the original size, where the original size is the size of the ultrasound images in the ultrasound image set, and calculate the average of the segmentation result sets corresponding to different preset segmentation scales at the original size to obtain segmentation result sets for each category of recognition targets.
[0090] In one embodiment, the trained multi-class semantic segmentation model includes a multilayer perceptron. The semantic segmentation module 820 is further configured to perform feature extraction and pooling operations on each ultrasound image to obtain feature vectors of different categories of recognition targets under different preset segmentation scales. The feature vectors of the same category of recognition targets under different preset segmentation scales are concatenated to obtain the vector concatenation results of each category of recognition targets. Using the vector concatenation results of each category of recognition targets as input, the multilayer perceptron is invoked to obtain the category existence probability of different categories of recognition targets in each ultrasound image. If the category probability of the recognition target is less than a preset category probability threshold, the segmentation result of the recognition target is filtered out.
[0091] In one embodiment, the data measurement module 830 is further configured to extract the contour pixels of the target in each segmentation result, determine the straight-line distance between every pair of contour pixels, and determine the longest straight-line distance as the diameter length of the target.
[0092] Each module in the aforementioned ultrasound image processing device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the corresponding operations of each module.
[0093] In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in Figure 9. The computer device includes a processor, memory, input/output (I/O) interfaces, and a communication interface. The processor, memory, and I/O interfaces are connected via a system bus, and the communication interface is also connected to the system bus via the I/O interfaces. The processor provides computational and control capabilities. The memory includes non-volatile storage media and internal memory. The non-volatile storage media stores the operating system, computer programs, and a database. The internal memory provides the environment for running the operating system and computer programs stored in the non-volatile storage media. The database stores data such as sets of ultrasound images. The I/O interfaces are used for exchanging information between the processor and external devices. The communication interface is used for communication with external terminals via a network connection. When the computer program is executed by the processor, it implements an ultrasound image processing method.
[0094] Those skilled in the art will understand that the structure shown in Figure 9 is merely a block diagram of a portion of the structure related to the present application and does not constitute a limitation on the computer device to which the present application is applied. Specific computer devices may include more or fewer components than those shown in the figure, or combine certain components, or have different component arrangements.
[0095] In one embodiment, a computer device is provided, including a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps described in the ultrasound image processing embodiment.
[0096] In one embodiment, a computer-readable storage medium is provided having a computer program stored thereon, which, when executed by a processor, implements the steps described in the ultrasound image processing embodiment.
[0097] In one embodiment, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps described in the ultrasound image processing embodiment.
[0098] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
[0099] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.
[0100] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0101] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
Claims
1. An ultrasound image processing method, characterized in that the method includes: acquiring a set of ultrasound images, wherein each ultrasound image in the set contains multiple categories of target objects; using the ultrasound image set as input, invoking a trained multi-class semantic segmentation model to perform multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales, to obtain segmentation result sets corresponding to the different preset segmentation scales; performing an upsampling operation on the different segmentation result sets to obtain segmentation result sets corresponding to the different preset segmentation scales at the original size, where the original size is the size of the ultrasound images in the ultrasound image set; averaging the segmentation result sets corresponding to the different preset segmentation scales at the original size to obtain the segmentation result set of each category of identified target; for the segmentation result set of each category of identified target, extracting the contour pixels of the target from each segmentation result; determining the straight-line distance between every pair of contour pixels; determining the longest straight-line distance as the diameter length of the target; and determining the largest measured diameter length as the target diameter length of the target of that category.
2. The method according to claim 1, characterized in that the trained multi-class semantic segmentation model includes multiple independent output channels, each of which is used to output the segmentation result set of a single category of identified target, and the trained multi-class semantic segmentation model is trained based on a set of historical ultrasound images carrying category labels.
3. The method according to claim 1, characterized in that the trained multi-class semantic segmentation model includes a multilayer perceptron, and when the trained multi-class semantic segmentation model is invoked, the following steps are also performed: for each ultrasound image, performing feature extraction and pooling operations on the ultrasound image to obtain feature vectors of different categories of recognition targets under different preset segmentation scales; concatenating the feature vectors of the same category of recognition targets under the different preset segmentation scales to obtain the vector concatenation result of each category of recognition target; using the vector concatenation result of each category of recognition target as input, invoking the multilayer perceptron to obtain the category existence probability of each category of recognition target in each ultrasound image; and if the category existence probability of a recognition target is less than a preset category probability threshold, filtering out the segmentation result of that recognition target.
4. An ultrasound image processing apparatus, characterized in that the apparatus includes: a data acquisition module, used to acquire a set of ultrasound images, wherein each ultrasound image in the set contains multiple categories of identified targets; a semantic segmentation module, used to take the ultrasound image set as input, invoke a trained multi-class semantic segmentation model, and perform multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales to obtain segmentation result sets corresponding to the different preset segmentation scales, to perform an upsampling operation on the different segmentation result sets to obtain segmentation result sets corresponding to the different preset segmentation scales at the original size, where the original size is the size of the ultrasound images in the ultrasound image set, and to average the segmentation result sets corresponding to the different preset segmentation scales at the original size to obtain the segmentation result set of each category of identified target; and a data measurement module, used to extract, for the segmentation result set of each category of identified target, the contour pixels of the identified target from each segmentation result, determine the straight-line distance between every pair of contour pixels, determine the longest straight-line distance as the diameter length of the identified target, and determine the largest measured diameter length as the target diameter length of the identified target of that category.
5. The apparatus according to claim 4, characterized in that the trained multi-class semantic segmentation model includes multiple independent output channels, each of which is used to output the segmentation result set of a single category of identified target, and the trained multi-class semantic segmentation model is trained based on a set of historical ultrasound images carrying category labels.
6. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that when the processor executes the computer program, it performs the following steps: acquiring a set of ultrasound images, wherein each ultrasound image in the set contains multiple categories of target objects; using the ultrasound image set as input, invoking a trained multi-class semantic segmentation model to perform multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales, to obtain segmentation result sets corresponding to the different preset segmentation scales; performing an upsampling operation on the different segmentation result sets to obtain segmentation result sets corresponding to the different preset segmentation scales at the original size, where the original size is the size of the ultrasound images in the ultrasound image set; averaging the segmentation result sets corresponding to the different preset segmentation scales at the original size to obtain the segmentation result set of each category of identified target; for the segmentation result set of each category of identified target, extracting the contour pixels of the target from each segmentation result; determining the straight-line distance between every pair of contour pixels; determining the longest straight-line distance as the diameter length of the target; and determining the largest measured diameter length as the target diameter length of the target of that category.
7. The computer device according to claim 6, characterized in that the trained multi-class semantic segmentation model includes a multilayer perceptron, and the processor, when executing the computer program, further implements the following steps: for each ultrasound image, performing feature extraction and pooling operations on the ultrasound image to obtain feature vectors of different categories of recognition targets under different preset segmentation scales; concatenating the feature vectors of the same category of recognition targets under the different preset segmentation scales to obtain the vector concatenation result of each category of recognition target; using the vector concatenation result of each category of recognition target as input, invoking the multilayer perceptron to obtain the category existence probability of each category of recognition target in each ultrasound image; and if the category existence probability of a recognition target is less than a preset category probability threshold, filtering out the segmentation result of that recognition target.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, it performs the following steps: acquiring a set of ultrasound images, wherein each ultrasound image in the set contains multiple categories of target objects; using the ultrasound image set as input, invoking a trained multi-class semantic segmentation model to perform multi-class semantic segmentation on each ultrasound image in the ultrasound image set at different preset segmentation scales, to obtain segmentation result sets corresponding to the different preset segmentation scales; performing an upsampling operation on the different segmentation result sets to obtain segmentation result sets corresponding to the different preset segmentation scales at the original size, where the original size is the size of the ultrasound images in the ultrasound image set; averaging the segmentation result sets corresponding to the different preset segmentation scales at the original size to obtain the segmentation result set of each category of identified target; for the segmentation result set of each category of identified target, extracting the contour pixels of the target from each segmentation result; determining the straight-line distance between every pair of contour pixels; determining the longest straight-line distance as the diameter length of the target; and determining the largest measured diameter length as the target diameter length of the target of that category.
9. The computer-readable storage medium according to claim 8, characterized in that the trained multi-class semantic segmentation model includes a multilayer perceptron, and the computer program, when executed by a processor, further performs the following steps: for each ultrasound image, performing feature extraction and pooling operations on the ultrasound image to obtain feature vectors of different categories of recognition targets under different preset segmentation scales; concatenating the feature vectors of the same category of recognition targets under the different preset segmentation scales to obtain the vector concatenation result of each category of recognition target; using the vector concatenation result of each category of recognition target as input, invoking the multilayer perceptron to obtain the category existence probability of each category of recognition target in each ultrasound image; and if the category existence probability of a recognition target is less than a preset category probability threshold, filtering out the segmentation result of that recognition target.
10. A computer program product, comprising a computer program, characterized in that when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 3.