Training methods, related methods and devices for ultrasound image processing models of breast lesions
By combining multimodal ultrasound imaging and training methods for ultrasound image processing models of breast lesions from different scanning angles, the problem of low accuracy in breast lesion diagnosis was solved, and more efficient and accurate prediction of breast lesion types was achieved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: TSINGHUA UNIVERSITY
- Filing Date: 2023-07-07
- Publication Date: 2026-06-30
AI Technical Summary
Existing ultrasound image analysis methods for breast lesions suffer from low diagnostic accuracy and are highly dependent on the clinical experience of ultrasound physicians, leading to inconsistent diagnostic results between different physicians and even among the same physician in different scenarios.
Using the ResNet+SENet framework, multimodal ultrasound images are fused and combined with transverse and longitudinal scanning perspectives. The benign and malignant lesions are predicted by training a breast lesion ultrasound image processing model. The Swin-Unet model is used for lesion segmentation, and the model parameters are optimized through multiple attention mechanisms and loss functions to improve diagnostic accuracy.
It improves the accuracy of breast lesion diagnosis, reduces reliance on the experience of ultrasound physicians, lowers the false positive rate, and improves the consistency and efficiency of examination results.
Figure CN116862872B_ABST
Description
Technical Field
[0001] This invention relates to the field of image processing technology, specifically to a training method, related methods, and apparatus for a breast lesion ultrasound image processing model.
Background Technology
[0002] Currently, breast ultrasound has become the primary method for breast cancer screening. During a breast ultrasound examination, the sonographer needs to simultaneously scan and diagnose breast lesions, making the results highly dependent on the sonographer's clinical experience. Furthermore, different sonographers, and even the same sonographer in different situations, can easily arrive at different diagnoses for the same breast lesion. Therefore, to improve the accuracy of examination results and reduce the workload of sonographers, artificial intelligence (AI) technology is used to assist sonographers in analyzing ultrasound images of breast lesions and obtaining diagnostic results.
[0003] However, current methods for analyzing ultrasound images of breast lesions suffer from low accuracy in diagnosing breast lesions.
Summary of the Invention
[0004] To address the aforementioned technical problems, this invention provides a method, related methods, and apparatus for training a breast lesion ultrasound image processing model.
[0005] The technical solution of the present invention is as follows:
[0006] This invention provides a method for training a breast lesion ultrasound image processing model, comprising:
[0007] Acquire first model training data; the first model training data includes a set of ultrasound video frames of the first target breast lesion acquired from different scanning perspectives;
[0008] Perform data preprocessing on the training data of the first model;
[0009] The model is trained using the preprocessed first model training data to obtain a breast lesion ultrasound image processing model; the breast lesion ultrasound image processing model is used to predict the type of breast lesion corresponding to the set of ultrasound video frames of the same breast lesion obtained from different scanning perspectives; the breast lesion type includes benign and malignant.
[0010] The present invention also provides a method for processing ultrasound images of breast lesions, comprising:
[0011] Acquire ultrasound video frame data of the breast lesion to be processed; the ultrasound video frame data of the breast lesion to be processed includes a set of ultrasound video frames of the second target breast lesion acquired from different scanning angles;
[0012] Perform data preprocessing on the ultrasound video frame data of the breast lesion to be processed;
[0013] The preprocessed ultrasound video frame data of the breast lesion to be processed is input into the ultrasound image processing model of the breast lesion to obtain the prediction result of the breast lesion type corresponding to the ultrasound video frame data of the breast lesion to be processed; the breast lesion type includes benign and malignant; the ultrasound image processing model of the breast lesion is trained using the method described above.
[0014] The present invention also provides a training device for a breast lesion ultrasound image processing model, comprising:
[0015] The module for acquiring first model training data is used to acquire first model training data; the first model training data includes a set of ultrasound video frames of the first target breast lesion acquired from different scanning perspectives;
[0016] The first data preprocessing module is used to preprocess the training data of the first model.
[0017] The training model module is used to train the model using the preprocessed first model training data to obtain a breast lesion ultrasound image processing model; the breast lesion ultrasound image processing model is used to predict the type of breast lesion corresponding to the set of ultrasound video frames of the same breast lesion obtained from different scanning perspectives; the breast lesion type includes benign and malignant.
[0018] The present invention also provides an ultrasound image processing device for breast lesions, comprising:
[0019] A module for acquiring ultrasound video frame data of the breast lesion to be processed is used to acquire the ultrasound video frame data of the breast lesion to be processed; the ultrasound video frame data of the breast lesion to be processed includes a set of ultrasound video frames of the second target breast lesion acquired from different scanning angles.
[0020] The second data preprocessing module is used to preprocess the ultrasound video frame data of the breast lesion to be processed.
[0021] The module for determining the prediction result of breast lesion type is used to input the preprocessed ultrasound video frame data of the breast lesion to be processed into the ultrasound image processing model of the breast lesion to obtain the prediction result of the breast lesion type corresponding to the ultrasound video frame data of the breast lesion to be processed; the breast lesion type includes benign and malignant; the ultrasound image processing model of the breast lesion is trained using the method described above.
[0022] This invention employs the aforementioned technical solution to obtain first model training data. The first model training data includes a set of ultrasound video frames of a first target breast lesion acquired from different scanning perspectives. The first model training data undergoes data preprocessing. The preprocessed first model training data is used for model training to obtain a breast lesion ultrasound image processing model. This model is used to predict the type of breast lesion corresponding to a set of ultrasound video frames of the same breast lesion acquired from different scanning perspectives. The breast lesion type includes benign and malignant. Because the model is trained on a set of ultrasound video frames of the first target breast lesion acquired from different scanning perspectives, this invention can combine ultrasound video frames of the same breast lesion acquired from different scanning perspectives to predict the type of breast lesion corresponding to that set of ultrasound video frames, making the breast lesion type prediction result of this invention more accurate.
Attached Figure Description
[0023] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0024] Figure 1 is a flowchart illustrating a method for training a breast lesion segmentation model according to an embodiment of the present invention;
[0025] Figure 2 is a flowchart illustrating a method for training a breast lesion ultrasound image processing model according to an embodiment of the present invention;
[0026] Figure 3 is a flowchart illustrating a method for processing ultrasound images of breast lesions provided in an embodiment of the present invention;
[0027] Figure 4 is a schematic diagram of the structure of a breast lesion ultrasound image processing model training device provided in an embodiment of the present invention;
[0028] Figure 5 is a schematic diagram of the structure of an ultrasound image processing device for breast lesions provided in an embodiment of the present invention.
Detailed Implementation
[0029] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0030] Breast cancer is a common malignant tumor. Recent studies show that the incidence and mortality rates of breast cancer in Chinese women are on the rise, and the average age of onset is gradually decreasing. Early detection, early diagnosis, and early treatment can effectively improve the cure rate of breast cancer. Methods for breast cancer screening include mammography and breast ultrasound. Due to its affordability, safety, and accuracy, breast ultrasound has become the primary method for breast cancer screening. During a breast ultrasound examination, the sonographer needs to simultaneously scan and diagnose breast lesions, making the results highly dependent on the sonographer's clinical experience. Furthermore, different sonographers, and even the same sonographer in different situations, can easily arrive at different diagnoses for the same breast lesion. Therefore, to improve the accuracy of examination results and reduce the workload of sonographers, artificial intelligence (AI) technology is used to assist sonographers in analyzing ultrasound images of breast lesions and obtaining diagnostic results.
[0031] In the process of using artificial intelligence (AI) technology to analyze ultrasound images of breast lesions and obtain diagnostic results, a ResNet+SENet framework is employed to fuse multimodal ultrasound images (B-mode ultrasound, color Doppler ultrasound, and elastography static images). Both transverse and longitudinal scanning perspectives are used to predict the benignity or malignancy of the lesions. In this prediction process, the average of the prediction results from different scanning perspectives is taken to obtain the final diagnostic result. This technical approach suffers from low accuracy in diagnosing breast lesions.
[0032] To address the aforementioned technical problems, this invention provides a method, related methods, and apparatus for training a breast lesion ultrasound image processing model. The technical solution of this invention will be described in detail below with reference to the accompanying drawings.
[0033] Figure 1 is a schematic flowchart of a breast lesion segmentation model training method provided in an embodiment of the present invention. As shown in Figure 1, this process includes:
[0034] Step 101: Obtain the second model training data; the second model training data contains multiple static ultrasound images of breast lesions, each of which carries lesion contour annotation information; the multiple static ultrasound images of breast lesions include static ultrasound images of breast lesions of at least one type.
[0035] In the embodiments of this specification, the static ultrasound image of a breast lesion can be a static ultrasound image of any type of breast lesion (e.g., nodules and lymph nodes). Furthermore, the static ultrasound image of a breast lesion can be a static ultrasound image of a breast lesion containing typical lesion features, manually selected from publicly available medical image data. The selected static ultrasound image of a breast lesion may contain lesion contour annotation information; in this case, it is not necessary to annotate the image with lesion contour annotation information. Conversely, the selected static ultrasound image of a breast lesion may not contain lesion contour annotation information; in this case, it is necessary to manually annotate the image with lesion contour annotation information.
[0036] Furthermore, the ratio of static ultrasound images of breast lesions corresponding to benign lesions to static ultrasound images of breast lesions corresponding to malignant lesions in the training data of the second model can be 1:1.
[0037] Step 102: Use the training data of the second model to pre-train the breast lesion segmentation model and obtain the pre-trained model parameters.
[0038] In this embodiment, the Swin-Unet algorithm is used to construct a Swin-Unet model. Then, based on 5-fold cross-validation and the training data of the second model, the Swin-Unet model is trained to obtain pre-trained model parameters. Furthermore, the model trained in this step is used to output an image with lesion segmentation results after a breast lesion ultrasound image is input.
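The 5-fold cross-validation mentioned above can be sketched as an index split. This is a minimal illustration only: the function name `k_fold_indices`, the shuffling seed, and the sample count of 20 are assumptions made for the example and do not come from the patent, which does not describe how the folds are formed.

```python
import random

def k_fold_indices(num_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) for each of k folds.

    Hypothetical helper: the patent only states that 5-fold
    cross-validation is used, not how samples are assigned to folds.
    """
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)          # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

# 20 hypothetical static ultrasound images split into 5 folds
splits = list(k_fold_indices(num_samples=20, k=5))
for fold_id, (train, val) in enumerate(splits):
    print(fold_id, len(train), len(val))
```

Each image serves as validation data in exactly one fold and as training data in the other four.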
[0039] Swin-Unet, a medical image segmentation algorithm announced in 2021, is similar to a Unet structure, but replaces the encoder and decoder parts with Swin Transformer blocks. This can improve the multi-scale long-distance dependency problem in medical image segmentation to a certain extent, enabling the model to better distinguish lesion pixels from designated pixels, where designated pixels are pixels around the lesion that are similar in appearance to the lesion.
[0040] The Swin-Unet model consists of an encoder, a bottleneck, a decoder, and skip connections. Since the Swin-Unet model is existing technology, a brief explanation of the model is provided below.
[0041] The encoder is used to convert the input static ultrasound image of breast lesions into sequence embeddings (i.e., feature extraction from the static ultrasound image of breast lesions). Specifically, the input static ultrasound image of breast lesions is first segmented into non-overlapping image patches of a preset size, where the preset size can be 4*4. Then, the patches are input into the encoder. In the encoder, the patches are fed into two consecutive Swin Transformer blocks for representation learning, while the feature dimension and resolution of the patches remain unchanged. Afterward, the patches pass through a patch merging layer, which reduces the number of tokens (2×downsampling) and lowers the image resolution. This process is repeated three times in the encoder.
[0042] Furthermore, to address the difficulty of transformer convergence, the features output by the encoder are passed through a bottleneck constructed from two consecutive Swin Transformer blocks to further learn deep features.
[0043] The decoder transforms the feature representation output by the bottleneck into the model's prediction target. Unlike the patch merging layer used in the encoder, the decoder uses a patch expansion layer to upsample the previously extracted deep features. The patch expansion layer reshapes the feature maps of adjacent dimensions into higher-resolution feature maps (2× upsampling) and accordingly halves the feature dimension.
[0044] Furthermore, there are skip connections between the encoder and decoder, which can fuse features from the encoder with upsampled features from the decoder. This connects shallow and deep features together to reduce spatial information loss caused by downsampling.
[0045] Furthermore, the encoder's output connection has a linear layer, and the size of the connection feature remains the same as the size of the upsampled feature, so that the image in the final output lesion segmentation result is the same size as the static ultrasound image of the breast lesion input to the encoder.
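The resolution changes described for the encoder and decoder can be followed with simple shape arithmetic. The sketch below is an illustration under stated assumptions: the 224×224 input size and the embedding dimension of 96 are not given in the patent, and the doubling of the feature dimension at each patch merging step follows the usual Swin Transformer design rather than anything stated in the text above. The function names are hypothetical.

```python
def encoder_stage_shapes(height, width, patch=4, embed_dim=96, merges=3):
    """Return (tokens_h, tokens_w, channels) after patch partition
    and after each patch merging layer."""
    h, w, c = height // patch, width // patch, embed_dim   # 4x4 non-overlapping patches
    shapes = [(h, w, c)]
    for _ in range(merges):
        # 2x downsampling; channel doubling is an assumption (Swin Transformer convention)
        h, w, c = h // 2, w // 2, c * 2
        shapes.append((h, w, c))
    return shapes

def decoder_stage_shapes(bottleneck_shape, expansions=3):
    """Return shapes after each patch expansion layer
    (2x upsampling, feature dimension halved)."""
    h, w, c = bottleneck_shape
    shapes = []
    for _ in range(expansions):
        h, w, c = h * 2, w * 2, c // 2
        shapes.append((h, w, c))
    return shapes

enc = encoder_stage_shapes(224, 224)   # assumed input size
dec = decoder_stage_shapes(enc[-1])
print(enc)
print(dec)
```

Under these assumptions each decoder stage has the same shape as the encoder stage it is paired with, which is what allows the skip connections to fuse shallow and deep features.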
[0046] Step 103: Obtain the third model training data; the third model training data includes a first number of breast lesion keyframes and a second number of non-breast lesion keyframes; the breast lesion keyframes are frames manually selected from ultrasound videos of breast lesions of a specified type of breast lesion that contain breast lesions with typical breast lesion characteristics; the breast lesion keyframes carry breast lesion contour annotation information; the non-breast lesion keyframes are frames manually selected from ultrasound videos of breast lesions of the specified type of breast lesion that do not contain breast lesions but contain typical breast contours; the non-breast lesion keyframes are negative samples.
[0047] In the embodiments of this specification, the ultrasound video of the breast lesion of the specified type of breast lesion can be the ultrasound video of a breast sample (hereinafter referred to as the breast sample to be studied) studied by relevant researchers, and the lesion type of the breast sample is the specified type of breast lesion.
[0048] Furthermore, the first number can be greater than the second number. The third model training data can include frame sets from multiple ultrasound videos of breast lesions in the breast sample under study; each frame set contains breast lesion keyframes and non-breast lesion keyframes, and the numbers of breast lesion keyframes and of non-breast lesion keyframes can be the same from one frame set to the next. In a specific example, for each ultrasound video of a breast lesion in the breast sample under study, 8 breast lesion keyframes and 4 non-breast lesion keyframes are manually selected from that ultrasound video.
[0049] In this embodiment, because the appearance of breast contours and breast lesions is similar, the model may easily mistake the breast contour for a breast lesion contour and segment it accordingly. Therefore, to avoid the model mistaking the breast contour for a breast lesion and to improve the accuracy of the model's lesion segmentation results, non-breast lesion keyframes are used as negative samples in this embodiment.
[0050] It should be noted that a positive sample is the target sample that the model aims to detect, i.e., a sample containing breast lesions. A negative sample does not require the model to segment for breast lesions; it is considered not to contain breast lesions.
[0051] Step 104: Based on the pre-trained model parameters, use the third model training data to train the breast lesion segmentation model to obtain the breast lesion segmentation model.
[0052] In this embodiment of the specification, the model used for training the breast lesion segmentation model is still the Swin-Unet model. The initial parameters of the Swin-Unet model are the parameters of the pre-trained model. The training framework is similar to that in step 102. The parameters of the Swin-Unet model are fine-tuned using the third model training data to obtain the final breast lesion segmentation model. Compared with the model trained in step 102, this breast lesion segmentation model has higher accuracy in lesion segmentation results because its model parameters are fine-tuned using the third model training data.
[0053] It should be noted that before using the training data for the second and third models, the ultrasound images in each training dataset underwent data preprocessing. Preprocessing for any training dataset included: effective region segmentation and standardization. Effective region segmentation involved removing background UI interfaces of different types of ultrasound instruments, patient information, and sampling dates from the original ultrasound images in the training data to obtain desensitized ultrasound images of uniform size. Standardization involved denoising and enhancing the original ultrasound images in the training data. Image enhancement methods included horizontal flipping and rotation of the images.
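The effective region segmentation, horizontal flipping, and rotation steps can be illustrated on a tiny image stored as nested lists. This is a sketch under assumptions: the crop coordinates are supplied by hand, the rotation angle is fixed at 90 degrees (the text above does not state an angle), and the function names are hypothetical. Denoising is omitted.

```python
def crop_region(image, top, left, height, width):
    """Keep only the effective imaging region, dropping UI and text borders."""
    return [row[left:left + width] for row in image[top:top + height]]

def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise (angle is an assumption)."""
    return [list(row) for row in zip(*image[::-1])]

# Toy 4x4 frame: the zero border stands in for instrument UI and patient info
frame = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
roi = crop_region(frame, top=1, left=1, height=2, width=2)
flipped = horizontal_flip(roi)
rotated = rotate_90_clockwise(roi)
print(roi, flipped, rotated)
```

When an image carries lesion contour annotations, the same geometric transform would have to be applied to the annotation mask so that image and label stay aligned.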
[0054] This embodiment employs the above-described technical solution, training the breast lesion segmentation model in two stages: Stage 1: Using second model training data (which can include static ultrasound images of lesions corresponding to any type of breast lesion), the breast lesion segmentation model is pre-trained to obtain pre-trained model parameters. Stage 2: Based on the pre-trained model parameters obtained in Stage 1, the breast lesion segmentation model is trained using the third model training data (containing non-breast lesion keyframes, and these non-breast lesion keyframes are negative samples), resulting in the breast lesion segmentation model. Therefore, the breast lesion segmentation model of this embodiment can effectively avoid mistaking breast contours for breast lesions to a certain extent, thereby improving the accuracy of the lesion segmentation results and reducing the false positive rate of breast lesion detection results.
[0055] The present invention also provides a method for training a breast lesion ultrasound image processing model. Figure 2 is a schematic flowchart of a method for training a breast lesion ultrasound image processing model according to an embodiment of the present invention. As shown in Figure 2, this process includes:
[0056] Step 201: Obtain the first model training data; the first model training data includes a set of ultrasound video frames of the first target breast lesion obtained from different scanning perspectives.
[0057] In this embodiment of the specification, step 201: obtaining the first model training data may specifically include:
[0058] First, ultrasound videos of the first target breast lesion are acquired from different scanning angles.
[0059] Then, for the ultrasound video from each scanning perspective, the lesion segmentation result of each video frame in the ultrasound video is obtained. Specifically, this may include: using the breast lesion segmentation model trained in the above embodiment to obtain the lesion segmentation result of each video frame in the ultrasound video.
[0060] Next, based on the lesion segmentation results, the video frame with the largest lesion area in the ultrasound video is determined.
[0061] Finally, using the video frame with the largest lesion area as the anchor point, a preset number of video frames are selected from the ultrasound video to obtain the ultrasound video frame set.
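Finding the anchor frame from the segmentation results can be sketched as follows. The example assumes that each lesion segmentation result is a binary mask in which non-zero pixels belong to the lesion, and that lesion area is measured as a pixel count; neither detail is specified above, and the function names are hypothetical.

```python
def lesion_area(mask):
    """Count lesion pixels in a binary mask (non-zero means lesion)."""
    return sum(1 for row in mask for pixel in row if pixel)

def anchor_frame_index(masks):
    """Return the index of the frame whose mask has the largest lesion area.

    If several frames tie, the earliest one is returned.
    """
    areas = [lesion_area(mask) for mask in masks]
    return max(range(len(areas)), key=areas.__getitem__)

# Three toy 2x3 masks, one per video frame
masks = [
    [[0, 0, 0], [0, 1, 0]],   # area 1
    [[0, 1, 1], [1, 1, 0]],   # area 4
    [[0, 1, 0], [0, 1, 0]],   # area 2
]
anchor = anchor_frame_index(masks)
print(anchor)
```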
[0062] In this embodiment of the specification, the video frame with the largest lesion area is used as the anchor point, and a preset number of video frames are selected from the ultrasound video to obtain the ultrasound video frame set, which may specifically include:
[0063] A target video segment is selected from the ultrasound video; the target video segment contains the video frame with the largest lesion area, and the number of video frames in the target video segment is the preset number;
[0064] The set of video frames in the target video segment is defined as the ultrasound video frame set.
[0065] In a specific example, suppose the ultrasound video is $V_{original}$, the length of the ultrasound video is $N_0$, the position of the video frame with the largest lesion area in the ultrasound video is $n_{max}$ (i.e., the $n_{max}$-th frame of the ultrasound video is the video frame with the largest lesion area), the preset number is $z$, and the video frames in the ultrasound video frame set are $V_{filter}$. The strategy for selecting a preset number of video frames from the ultrasound video to obtain the ultrasound video frame set is as follows:
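[0066]
$$V_{filter} = \begin{cases} V_{original}[1 : z], & n_{max} < z/2 \\ V_{original}[n_{max} - z/2 : n_{max} + z/2], & z/2 \le n_{max} \le N_0 - z/2 \\ V_{original}[N_0 - z : N_0], & n_{max} > N_0 - z/2 \end{cases} \tag{1}$$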
[0067] Formula (1) can be understood as follows: if the anchor point (i.e., the video frame with the largest lesion area) is located within the first $z/2$ video frames of the ultrasound video $V_{original}$, the first $z$ video frames of $V_{original}$ are taken as the ultrasound video frame set; if the anchor point is located between the $(z/2)$-th video frame and the $(N_0 - z/2)$-th video frame of $V_{original}$, the video frames between the $(n_{max} - z/2)$-th and the $(n_{max} + z/2)$-th video frames of $V_{original}$ are taken as the ultrasound video frame set; if the anchor point is located after the $(N_0 - z/2)$-th video frame of $V_{original}$, the video frames between the $(N_0 - z)$-th and the $N_0$-th video frames of $V_{original}$ are taken as the ultrasound video frame set.
[0068] In the embodiments of this specification, the ultrasound video frame set is selected from the ultrasound video $V_{original}$ using the video frame with the largest lesion area as the anchor point. Therefore, the embodiments of this specification can automatically remove redundant information from the ultrasound video, thereby improving the efficiency of ultrasound image processing of breast lesions. Furthermore, a larger lesion area is more conducive to accurately analyzing the type of lesion. Selecting the ultrasound video frame set from $V_{original}$ around the video frame with the largest lesion area therefore improves the accuracy of ultrasound image processing of breast lesions.
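The three-case selection strategy described for formula (1) can be sketched as a window clamped to the ends of the video. The code below is an interpretation, not the patent's exact indexing: it uses 0-based, half-open `(start, end)` indices so that the window always holds exactly `z` frames, and it uses integer division for `z / 2`. The function name `select_frame_window` is hypothetical.

```python
def select_frame_window(num_frames, anchor, z):
    """Return (start, end) such that frames[start:end] has z frames
    and contains the anchor frame.

    num_frames: length of the ultrasound video (N0)
    anchor:     0-based index of the frame with the largest lesion area
    z:          preset number of frames to keep
    """
    if not 0 < z <= num_frames:
        raise ValueError("z must be between 1 and num_frames")
    if not 0 <= anchor < num_frames:
        raise ValueError("anchor must be a valid frame index")
    half = z // 2
    if anchor < half:                          # anchor near the start: first z frames
        start = 0
    elif anchor > num_frames - (z - half):     # anchor near the end: last z frames
        start = num_frames - z
    else:                                      # otherwise: window centred on the anchor
        start = anchor - half
    return start, start + z

print(select_frame_window(100, 3, 16))    # anchor near the start
print(select_frame_window(100, 50, 16))   # anchor in the middle
print(select_frame_window(100, 97, 16))   # anchor near the end
```

Clamping at both ends keeps the number of selected frames constant, which matters because the later feature sets are defined over exactly $z$ frames per scanning perspective.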
[0069] Step 202: Perform data preprocessing on the training data of the first model.
[0070] The data preprocessing methods in the embodiments of this specification are the same in each embodiment, and will not be repeated here. For details, please refer to the relevant content in the first embodiment mentioned above.
[0071] Step 203: Use the preprocessed first model training data to train the model and obtain the ultrasound image processing model for breast lesions; the ultrasound image processing model for breast lesions is used to predict the type of breast lesion corresponding to the set of ultrasound video frames of the same breast lesion obtained from different scanning perspectives; the type of breast lesion includes benign and malignant.
[0072] In this embodiment of the specification, step 203: using the preprocessed first model training data to train the model and obtain a breast lesion ultrasound image processing model, may specifically include:
[0073] First, for the set of ultrasound video frames of the first target breast lesion acquired from each scanning view, a first vector is determined based on a multiple attention mechanism; the first vector is used to represent the trend of lesion size change among the various video frames in the ultrasound video frame set.
[0074] Then, based on the lesion segmentation results of the ultrasound video frame set, a second vector of the ultrasound video frame set is determined; the second vector is used to represent the trend of lesion size change among the various video frames in the ultrasound video frame set.
[0075] Next, based on the first vector and the second vector, the first loss function corresponding to the set of ultrasound video frames is calculated.
[0076] Next, the predicted type of breast lesion is obtained from the set of ultrasound video frames of the first target breast lesion obtained from each scanning view, as output by the training model.
[0077] Next, based on the predicted type of breast lesion and the actual type of the first target breast lesion, a second loss function is calculated.
[0078] Finally, the model parameters of the trained model are modified according to each of the first loss function and the second loss function.
[0079] The following example, using ultrasound video frame sets from two scanning perspectives (a first scanning perspective and a second scanning perspective), illustrates the process of obtaining the ultrasound image processing model for breast lesions. The first scanning perspective can be a transverse section of the first target breast lesion, and the second scanning perspective can be a longitudinal section of the first target breast lesion. Assume the first ultrasound video frame set under the first scanning perspective is $U_1 = \{u_{11}, u_{12}, \ldots, u_{1z}\}$ and the second ultrasound video frame set under the second scanning perspective is $U_2 = \{u_{21}, u_{22}, \ldots, u_{2z}\}$, where $u_{1i}$ ($i = 1, 2, 3, \ldots, z$) represents the $i$-th video frame in the first ultrasound video frame set, and $u_{2i}$ is defined similarly. The process of obtaining the ultrasound image processing model for breast lesions may include:
[0080] First, obtain the first set of ultrasound video frames and the second set of ultrasound video frames.
[0081] Then, feature extraction is performed on the first ultrasound video frame set to obtain the first feature representation $F_1 = \{f_{11}, f_{12}, \ldots, f_{1z}\}$, and feature extraction is performed on the second ultrasound video frame set to obtain the second feature representation $F_2 = \{f_{21}, f_{22}, \ldots, f_{2z}\}$, where $f_{1i}$ ($i = 1, 2, 3, \ldots, z$) represents the feature representation of the $i$-th video frame in the first ultrasound video frame set, and $f_{2i}$ is defined similarly.
[0082] Next, for the first ultrasound video frame set, a first vector $V_1$ is determined based on a multiple attention mechanism and the first feature representation; $V_1$ identifies the trend of breast lesion size change across the video frames of the first ultrasound video frame set. Specifically, average pooling is performed on $F_1$ to obtain a score for each video frame in the first ultrasound video frame set: $S_1 = \{s_{11}, s_{12}, \ldots, s_{1z}\}$. Then, starting from the first element of $S_1$, each pair of adjacent elements (i.e., scores) in $S_1$ is compared in sequence: if $s_{1i} > s_{1(i+1)}$, the comparison result is 1; otherwise, it is 0. The comparison results (0 or 1), taken in order, form the first vector $V_1$, which is therefore a $(z-1)$-dimensional vector. Furthermore, based on the lesion segmentation results of each video frame in the first ultrasound video frame set, starting from the first element of $F_1$, the lesion sizes of adjacent video frames in $F_1$ are compared in sequence: if $f_{1i} > f_{1(i+1)}$ in terms of lesion size, the comparison result is 1; otherwise, it is 0. The comparison results (0 or 1), taken in order, constitute the second vector $G_1$, which is therefore also a $(z-1)$-dimensional vector. Next, based on the first vector $V_1$ and the second vector $G_1$, the first loss function of the first ultrasound video frame set is calculated using formula (2):
[0083]
[0084] Where $m$ is the total number of samples used for model training, i.e., the total number of video frames; $V_{1,j}$ represents the $j$-th element in $V_1$; and $G_{1,j}$ represents the j-th element in $G_1$.
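The construction of the two trend vectors can be sketched as pairwise comparisons of adjacent frames. The per-frame features and lesion areas below are made-up numbers, the function names are hypothetical, and the final disagreement count is only an illustrative stand-in: formula (2) is not reproduced in this text, so the code does not attempt to implement the first loss function itself.

```python
def frame_scores(features):
    """Average-pool each frame's feature vector into one score per frame."""
    return [sum(f) / len(f) for f in features]

def trend_vector(values):
    """Compare adjacent elements: 1 if the earlier value is larger, else 0.

    For z input values the result has z - 1 elements.
    """
    return [1 if a > b else 0 for a, b in zip(values, values[1:])]

# Hypothetical features for z = 5 frames (2 values per frame)
features = [[0.1, 0.3], [0.4, 0.6], [0.8, 1.0], [0.6, 0.8], [0.3, 0.5]]
# Hypothetical lesion areas (pixel counts) from the segmentation results
areas = [120, 340, 510, 530, 300]

v1 = trend_vector(frame_scores(features))   # first vector, from model scores
g1 = trend_vector(areas)                    # second vector, from segmentation
disagreements = sum(v != g for v, g in zip(v1, g1))
print(v1, g1, disagreements)
```

A first loss function built on these two vectors would penalise positions where the score-based trend and the segmentation-based trend differ, which is the single position counted in this toy example.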
[0085] Next, the first loss function for the second set of ultrasound video frames is obtained in the same manner.
[0086] Next, the predicted types of breast lesions corresponding to the ultrasound video frame sets from the two scanning perspectives are obtained. Specifically, for each feature representation ($F_1$ and $F_2$), each element is positionally encoded according to the position of its video frame in the video frame set, and the encoding is written into the feature representation. The positionally encoded feature representation $F_1$ is then passed through a self-attention layer to learn the correlation between its video frames, yielding the feature vector set $F_1' = \{f_{11}', f_{12}', \ldots, f_{1z}'\}$. Similarly, passing the positionally encoded feature representation $F_2$ through the self-attention layer yields the feature vector set $F_2' = \{f_{21}', f_{22}', \ldots, f_{2z}'\}$. The feature vector sets $F_1'$ and $F_2'$ are then concatenated to obtain the fused feature vector $F' = \{f_{11}', f_{12}', \ldots, f_{1z}', f_{21}', f_{22}', \ldots, f_{2z}'\}$. The fused feature vector is input into a fully connected layer, which outputs the predicted type of breast lesion. Based on the predicted type of breast lesion and the actual type of the first target breast lesion, a cross-entropy loss function (i.e., the second loss function) is calculated.
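The positional encoding, self-attention, concatenation, and fully connected steps can be traced end to end in a small pure-Python sketch. Several simplifications are assumptions rather than details from the patent: the positional encoding is the common sinusoidal scheme, the attention is single-head with identity query, key, and value projections, and the fully connected weights are arbitrary constants instead of trained parameters. All function names are hypothetical.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding (assumed; the scheme is not named in the text)."""
    return [math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(position / 10000 ** ((i - 1) / dim))
            for i in range(dim)]

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(frames):
    """Single-head scaled dot-product attention with identity projections."""
    dim = len(frames[0])
    out = []
    for query in frames:
        weights = softmax([sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                           for key in frames])
        out.append([sum(w * value[d] for w, value in zip(weights, frames))
                    for d in range(dim)])
    return out

def encode_view(features):
    """Add position information to each frame, then apply self-attention."""
    dim = len(features[0])
    with_pos = [[x + p for x, p in zip(f, positional_encoding(i, dim))]
                for i, f in enumerate(features)]
    return self_attention(with_pos)

def predict(f1, f2, weights, bias):
    """Concatenate both views, apply one linear layer, return class probabilities."""
    fused = [x for frame in encode_view(f1) + encode_view(f2) for x in frame]
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Two views, z = 3 frames each, 2 features per frame (made-up numbers)
f1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
f2 = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5]]
weights = [[0.01] * 12, [-0.01] * 12]   # 2 classes x 12 fused features
probs = predict(f1, f2, weights, bias=[0.0, 0.0])
print(probs)   # [P(class 0), P(class 1)]
```

Fusing the two views before the classifier lets a single layer weigh evidence from both sections jointly, in contrast to averaging two separate per-view predictions.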
[0087] Finally, the parameters of the model being trained are modified according to its loss function L (i.e., L = the sum of the first loss functions + the second loss function).
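As a small numeric illustration of how the overall loss is assembled, the following Python sketch sums one first loss per scanning perspective with the cross-entropy classification loss. The function names and all numeric values are hypothetical, and the first-loss values are simply given rather than computed.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], true_class: int) -> float:
    """Cross-entropy for one sample, given predicted class probabilities."""
    return -math.log(max(probs[true_class], 1e-12))


def total_loss(first_losses: Sequence[float], second_loss: float) -> float:
    """L = sum of the per-view first losses + the second (classification) loss."""
    return sum(first_losses) + second_loss


first_losses = [0.25, 0.40]                       # hypothetical values, one per view
second = cross_entropy([0.2, 0.8], true_class=1)  # hypothetical prediction and label
L = total_loss(first_losses, second)
```

Adding a further scanning perspective only appends one more entry to `first_losses`; the classification term is unchanged.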
[0088] It should be noted that this example can be extended to scenarios with more scanning perspectives and is not limited to the two scanning perspectives mentioned above.
[0089] By adopting the above technical solution and fusing ultrasound images from different scanning perspectives at the feature map level, this embodiment can more fully aggregate information from different scanning sections. Furthermore, by automatically selecting video frames with larger lesion areas and determining the breast lesion type based on these frames, this embodiment can automatically remove redundant information from ultrasound images, reducing manual annotation costs and improving the accuracy and efficiency of breast lesion type determination. Additionally, because each first loss function is incorporated into the loss function L of the training model, the model can focus more on video frames with larger breast lesion areas during iteration, thereby improving the accuracy of the model's breast lesion type predictions.
[0090] Based on a general inventive concept, the present invention also provides a method for processing ultrasound images of breast lesions. Figure 3 is a schematic flowchart of a breast lesion ultrasound image processing method provided in an embodiment of the present invention. As shown in Figure 3, the method includes:
[0091] Step 301: Acquire ultrasound video frame data of the breast lesion to be processed; the ultrasound video frame data of the breast lesion to be processed includes a set of ultrasound video frames of the second target breast lesion acquired from different scanning angles.
[0092] Step 302: Perform data preprocessing on the ultrasound video frame data of the breast lesion to be processed.
[0093] Step 303: Input the preprocessed ultrasound video frame data of the breast lesion to be processed into the ultrasound image processing model of the breast lesion to obtain the prediction result of the breast lesion type corresponding to the ultrasound video frame data of the breast lesion to be processed; the breast lesion type includes benign and malignant; the ultrasound image processing model of the breast lesion is trained using the method described above.
[0094] Optionally, step 301: acquiring ultrasound video frame data of the breast lesion to be processed, which may specifically include:
[0095] The ultrasound video of the second target breast lesion under different scanning angles is acquired; for each ultrasound video under each scanning angle, the lesion segmentation result of each video frame in the ultrasound video is acquired; based on the lesion segmentation result, the video frame with the largest lesion area in the ultrasound video is determined; using the video frame with the largest lesion area as the anchor point, a preset number of video frames are selected in the ultrasound video to obtain the ultrasound video frame set.
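The anchor-based frame selection can be sketched as follows. The text only requires that the selected frames include the frame with the largest lesion area and that their number equal the preset number; centring a contiguous window on the anchor and shifting it at the video boundaries is an assumption of this sketch, and the function name `select_frame_window` is hypothetical.

```python
from typing import List, Sequence


def select_frame_window(areas: Sequence[float], preset: int) -> List[int]:
    """Return indices of `preset` consecutive frames that contain the anchor frame.

    `areas[i]` is the lesion area of frame i, taken from its segmentation result.
    """
    n = len(areas)
    if not 0 < preset <= n:
        raise ValueError("preset must be between 1 and the number of frames")
    # Anchor: the frame with the largest lesion area (first one on ties).
    anchor = max(range(n), key=lambda i: areas[i])
    # Centre the window on the anchor, then clamp it to the video boundaries.
    start = anchor - preset // 2
    start = max(0, min(start, n - preset))
    return list(range(start, start + preset))


# Hypothetical lesion areas for a 7-frame video.
window = select_frame_window([10, 40, 95, 130, 120, 60, 20], preset=3)
```

When the anchor lies near the start or end of the video, the window is shifted rather than shortened, so the frame set always has the preset size.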
[0096] It should be noted that the breast lesion ultrasound image processing method in this embodiment and the breast lesion ultrasound image processing model training method in the above embodiment are based on the same inventive concept and have the same or corresponding execution process. For the specific execution process, please refer to the above embodiment.
[0097] Based on a general inventive concept, the present invention also provides a training device for a breast lesion ultrasound image processing model. Figure 4 is a schematic diagram of the structure of a breast lesion ultrasound image processing model training device provided in an embodiment of the present invention. As shown in Figure 4, the device includes:
[0098] The module 41 for acquiring first model training data is used to acquire first model training data; the first model training data includes a set of ultrasound video frames of the first target breast lesion acquired from different scanning perspectives.
[0099] The first data preprocessing module 42 is used to preprocess the training data of the first model.
[0100] The training model module 43 is used to train the model using the preprocessed first model training data to obtain a breast lesion ultrasound image processing model; the breast lesion ultrasound image processing model is used to predict the type of breast lesion corresponding to the set of ultrasound video frames of the same breast lesion obtained from different scanning perspectives; the type of breast lesion includes benign and malignant.
[0101] Optionally, the module 41 for obtaining the first model training data may specifically include:
[0102] An ultrasound video acquisition submodule is used to acquire ultrasound videos of the first target breast lesion under different scanning angles.
[0103] The lesion segmentation result acquisition submodule is used to acquire the lesion segmentation result of each video frame in the ultrasound video for each scanning view.
[0104] The ultrasound video frame set acquisition submodule is used to determine the video frame with the largest lesion area in the ultrasound video based on the lesion segmentation result; and to select a preset number of video frames in the ultrasound video using the video frame with the largest lesion area as the anchor point to obtain the ultrasound video frame set.
[0105] Optionally, the submodule for obtaining lesion segmentation results can be used for:
[0106] Using the breast lesion segmentation model trained in the above embodiments, the lesion segmentation results of each video frame in the ultrasound video are obtained.
[0107] The submodule for obtaining the ultrasound video frame set can be used specifically for:
[0108] A target video segment is selected from the ultrasound video; the target video segment contains the video frame with the largest lesion area, and the number of video frames in the target video segment is the preset number; the set of each video frame in the target video segment is determined as the ultrasound video frame set.
[0109] Training model module 43 can be used specifically for:
[0110] First, for the set of ultrasound video frames of the first target breast lesion acquired from each scanning view, a first vector is determined based on a multiple attention mechanism; the first vector is used to represent the trend of lesion size change among the various video frames in the ultrasound video frame set.
[0111] Then, based on the lesion segmentation results of the ultrasound video frame set, a second vector of the ultrasound video frame set is determined; the second vector is used to represent the trend of lesion size change among the various video frames in the ultrasound video frame set.
[0112] Next, based on the first vector and the second vector, the first loss function corresponding to the set of ultrasound video frames is calculated.
[0113] Next, the predicted type of breast lesion is obtained from the set of ultrasound video frames of the first target breast lesion obtained from each scanning view, as output by the training model.
[0114] Next, based on the predicted type of breast lesion and the actual type of the first target breast lesion, a second loss function is calculated.
[0115] Finally, the parameters of the model being trained are modified according to each first loss function and the second loss function.
[0116] Based on the same inventive concept, the present invention also provides a breast lesion ultrasound image processing device. Figure 5 is a schematic diagram of the structure of a breast lesion ultrasound image processing device provided in an embodiment of the present invention. As shown in Figure 5, the device includes:
[0117] The module 51 for acquiring ultrasound video frame data of the breast lesion to be processed is used to acquire ultrasound video frame data of the breast lesion to be processed; the ultrasound video frame data of the breast lesion to be processed includes a set of ultrasound video frames of the second target breast lesion acquired from different scanning angles.
[0118] The second data preprocessing module 52 is used to preprocess the ultrasound video frame data of the breast lesion to be processed.
[0119] The module 53 for determining the prediction result of breast lesion type is used to input the preprocessed ultrasound video frame data of the breast lesion to be processed into the ultrasound image processing model of the breast lesion to obtain the prediction result of the breast lesion type corresponding to the ultrasound video frame data of the breast lesion to be processed; the breast lesion type includes benign and malignant; the ultrasound image processing model of the breast lesion is trained using the method described above.
[0120] Optionally, module 51 for acquiring ultrasound video frame data of the breast lesion to be processed can be used for:
[0121] The ultrasound video of the second target breast lesion under different scanning angles is acquired; for each ultrasound video under each scanning angle, the lesion segmentation result of each video frame in the ultrasound video is acquired; based on the lesion segmentation result, the video frame with the largest lesion area in the ultrasound video is determined; using the video frame with the largest lesion area as the anchor point, a preset number of video frames are selected in the ultrasound video to obtain the ultrasound video frame set.
[0122] For the foregoing method embodiments, in order to simplify the description, they are all described as a series of actions. However, those skilled in the art should understand that the present invention is not limited to the described order of actions, because according to the present invention, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to the present invention.
[0123] It should be noted that the various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments. For parts that are the same or similar across embodiments, the embodiments may be referred to one another. Since the apparatus embodiments are basically similar to the method embodiments, their description is relatively brief; for the relevant parts, refer to the descriptions of the method embodiments.
[0124] The steps in the methods of the various embodiments of the present invention can be adjusted, merged, or deleted in order according to actual needs, and the technical features described in the various embodiments can be replaced or combined.
[0125] The modules and sub-modules in the various embodiments of the present invention can be merged, divided, and deleted according to actual needs.
[0126] In the embodiments provided by this invention, it should be understood that the disclosed terminals, devices, and methods can be implemented in other ways. For example, the terminal embodiments described above are merely illustrative. For instance, the division of modules or sub-modules is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple sub-modules or modules may be combined or integrated into another module, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or other forms.
[0127] The modules or submodules described as separate components may or may not be physically separate. The components that constitute a module or submodule may or may not be physical modules or submodules; that is, they may be located in one place or distributed across multiple network modules or submodules. Some or all of the modules or submodules can be selected to achieve the purpose of this embodiment's solution, depending on actual needs.
[0128] Furthermore, the functional modules or sub-modules in the various embodiments of the present invention can be integrated into one processing module, or each module or sub-module can exist physically separately, or two or more modules or sub-modules can be integrated into one module. The integrated modules or sub-modules described above can be implemented in hardware or in the form of software functional modules or sub-modules.
[0129] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.
[0130] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software unit executed by a processor, or a combination of both. The software unit can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.
[0131] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0132] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A method for training a breast lesion ultrasound image processing model, characterized in that it comprises: acquiring first model training data; the first model training data includes a set of ultrasound video frames of the first target breast lesion acquired from different scanning perspectives; wherein acquiring the first model training data specifically includes: acquiring ultrasound videos of the first target breast lesion from the different scanning perspectives; for the ultrasound video from each scanning perspective, acquiring the lesion segmentation results of each video frame in the ultrasound video; based on the lesion segmentation results, determining the video frame with the largest lesion area in the ultrasound video; using the video frame with the largest lesion area as an anchor point, selecting a preset number of video frames in the ultrasound video to obtain the set of ultrasound video frames; wherein acquiring the lesion segmentation results of each video frame in the ultrasound video specifically includes: using a breast lesion segmentation model to acquire the lesion segmentation results of each video frame in the ultrasound video; the training process of the breast lesion segmentation model includes: acquiring second model training data; the second model training data includes multiple static ultrasound images of breast lesions, each carrying lesion contour annotation information; the multiple static ultrasound images of breast lesions include static ultrasound images of breast lesions of at least one type of breast lesion; using the second model training data to pre-train a breast lesion segmentation model to obtain pre-trained model parameters; the third model training data includes a first number of breast lesion keyframes; the breast lesion keyframes are frames manually selected from ultrasound videos of breast lesions of a specified type of breast lesion that contain breast lesions with typical breast lesion characteristics; the breast lesion keyframes carry breast lesion contour annotation information; based on the pre-trained model parameters, using the third model training data to train a breast lesion segmentation model to obtain the breast lesion segmentation model; performing data preprocessing on the first model training data; training a model using the preprocessed first model training data to obtain a breast lesion ultrasound image processing model; the breast lesion ultrasound image processing model is used to predict the type of breast lesion corresponding to the sets of ultrasound video frames of the same breast lesion obtained from different scanning perspectives; the breast lesion type includes benign and malignant; wherein training the model using the preprocessed first model training data to obtain the breast lesion ultrasound image processing model specifically includes: for the set of ultrasound video frames of the first target breast lesion acquired from each scanning perspective, determining a first vector based on a multiple attention mechanism; the first vector is used to represent the trend of lesion size change among the video frames in the ultrasound video frame set; based on the lesion segmentation results of the ultrasound video frame set, determining a second vector of the ultrasound video frame set; the second vector is used to represent the trend of lesion size change among the video frames in the ultrasound video frame set; calculating the first loss function corresponding to the ultrasound video frame set based on the first vector and the second vector; obtaining the predicted type of breast lesion output by the training model for the sets of ultrasound video frames of the first target breast lesion acquired from the scanning perspectives; calculating the second loss function based on the predicted type of breast lesion and the actual type of the first target breast lesion; and modifying the model parameters of the training model according to each first loss function and the second loss function.
2. The method according to claim 1, characterized in that the third model training data also includes a second number of non-breast-lesion keyframes; the non-breast-lesion keyframes are frames manually selected from the ultrasound videos of breast lesions of the specified type of breast lesion that do not contain breast lesions but contain typical breast contours; the non-breast-lesion keyframes are negative samples.
3. The method according to claim 1, characterized in that using the video frame with the largest lesion area as the anchor point and selecting a preset number of video frames in the ultrasound video to obtain the ultrasound video frame set specifically includes: selecting a target video segment from the ultrasound video; the target video segment contains the video frame with the largest lesion area, and the number of video frames in the target video segment is the preset number; determining the set of video frames in the target video segment as the ultrasound video frame set.
4. A method for processing ultrasound images of breast lesions, characterized in that it comprises: acquiring ultrasound video frame data of the breast lesion to be processed; the ultrasound video frame data of the breast lesion to be processed includes a set of ultrasound video frames of the second target breast lesion acquired from different scanning angles; performing data preprocessing on the ultrasound video frame data of the breast lesion to be processed; inputting the preprocessed ultrasound video frame data of the breast lesion to be processed into the breast lesion ultrasound image processing model to obtain the prediction result of the breast lesion type corresponding to the ultrasound video frame data of the breast lesion to be processed; the breast lesion type includes benign and malignant; the breast lesion ultrasound image processing model is trained by the method described in any one of claims 1-3.
5. The method according to claim 4, characterized in that acquiring ultrasound video frame data of the breast lesion to be processed specifically includes: acquiring ultrasound videos of the second target breast lesion from different scanning angles; for the ultrasound video from each scanning angle, acquiring the lesion segmentation results of each video frame in the ultrasound video; based on the lesion segmentation results, determining the video frame with the largest lesion area in the ultrasound video; using the video frame with the largest lesion area as the anchor point, selecting a preset number of video frames in the ultrasound video to obtain the ultrasound video frame set.
6. A training device for a breast lesion ultrasound image processing model, characterized in that it comprises: a module for acquiring first model training data, used to acquire the first model training data; the first model training data includes a set of ultrasound video frames of the first target breast lesion acquired from different scanning perspectives; wherein the module for acquiring the first model training data is specifically used for: acquiring ultrasound videos of the first target breast lesion from the different scanning perspectives; for the ultrasound video from each scanning perspective, acquiring the lesion segmentation results of each video frame in the ultrasound video; determining the video frame with the largest lesion area in the ultrasound video based on the lesion segmentation results; using the video frame with the largest lesion area as an anchor point, selecting a preset number of video frames in the ultrasound video to obtain the set of ultrasound video frames; wherein acquiring the lesion segmentation results of each video frame in the ultrasound video specifically includes: using a breast lesion segmentation model to acquire the lesion segmentation results of each video frame in the ultrasound video; the training process of the breast lesion segmentation model includes: acquiring second model training data; the second model training data contains multiple static ultrasound images of breast lesions, each of which carries lesion contour annotation information; the multiple static ultrasound images of breast lesions include static ultrasound images of breast lesions of at least one type of breast lesion; using the second model training data to pre-train a breast lesion segmentation model to obtain pre-trained model parameters; acquiring third model training data; the third model training data contains a first number of breast lesion keyframes; the breast lesion keyframes are frames manually selected from ultrasound videos of breast lesions of a specified type of breast lesion that contain breast lesions with typical breast lesion characteristics; the breast lesion keyframes carry breast lesion contour annotation information; based on the pre-trained model parameters, using the third model training data to train a breast lesion segmentation model to obtain the breast lesion segmentation model; a first data preprocessing module, used to preprocess the first model training data; a training model module, used to train a model using the preprocessed first model training data to obtain a breast lesion ultrasound image processing model; the breast lesion ultrasound image processing model is used to predict the type of breast lesion corresponding to the sets of ultrasound video frames of the same breast lesion obtained from different scanning perspectives; the breast lesion type includes benign and malignant; wherein the training model module is specifically used for: for the set of ultrasound video frames of the first target breast lesion acquired from each scanning perspective, determining a first vector based on a multiple attention mechanism; the first vector is used to represent the trend of lesion size change among the video frames in the ultrasound video frame set; based on the lesion segmentation results of the ultrasound video frame set, determining a second vector of the ultrasound video frame set; the second vector is used to represent the trend of lesion size change among the video frames in the ultrasound video frame set; calculating the first loss function corresponding to the ultrasound video frame set based on the first vector and the second vector; obtaining the predicted type of breast lesion output by the training model for the sets of ultrasound video frames of the first target breast lesion acquired from the scanning perspectives; calculating the second loss function based on the predicted type of breast lesion and the actual type of the first target breast lesion; and modifying the model parameters of the training model according to each first loss function and the second loss function.
7. A breast lesion ultrasound image processing device, characterized in that it comprises: a module for acquiring ultrasound video frame data of the breast lesion to be processed, used to acquire ultrasound video frame data of the breast lesion to be processed; the ultrasound video frame data of the breast lesion to be processed includes a set of ultrasound video frames of the second target breast lesion acquired from different scanning angles; a second data preprocessing module, used to preprocess the ultrasound video frame data of the breast lesion to be processed; a module for determining the prediction result of breast lesion type, used to input the preprocessed ultrasound video frame data of the breast lesion to be processed into the breast lesion ultrasound image processing model to obtain the prediction result of the breast lesion type corresponding to the ultrasound video frame data of the breast lesion to be processed; the breast lesion type includes benign and malignant; the breast lesion ultrasound image processing model is trained by the method described in any one of claims 1-3.