A method for quickly and intelligently identifying ABUS image artifacts
By using the YOLO model to quickly and intelligently identify ABUS image artifacts, the problem of time-consuming and labor-intensive ABUS image artifact identification is solved, achieving efficient artifact recognition and quality control, and improving the accuracy of breast cancer screening.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- 广州格希丽医疗科技有限公司
- Filing Date
- 2026-03-18
- Publication Date
- 2026-06-19
AI Technical Summary
Current methods for identifying artifacts in ABUS images rely on manual image reading, which is time-consuming and labor-intensive, lacks objective evaluation standards, and seriously affects the accuracy of breast cancer screening.
An object detection method based on the YOLO model is adopted; through feature extraction and a weight optimization function, it identifies ABUS image artifacts quickly and intelligently and generates a quality assessment visualization map.
It enables accurate identification of air leakage artifacts and reverberation artifacts in ABUS images, significantly reducing the workload of physicians in interpreting images and improving the efficiency of breast cancer screening and image quality control.
Figure CN122243929A_ABST
Description
Technical Field
[0001] This invention relates to the field of image artifact technology, and specifically to a rapid intelligent identification method for ABUS image artifacts.
Background Technology
[0002] Breast cancer is the most common cancer among women, with approximately 420,000 women diagnosed annually in China, and a mortality rate as high as 35%. Conventional two-dimensional ultrasound imaging is currently the most widely used screening tool; however, early breast cancer (such as ductal carcinoma in situ, DCIS) often presents as clusters of tiny calcifications. Two-dimensional ultrasound has insufficient resolution, providing limited information, and calcifications are extremely difficult to distinguish from strong echoes in glandular tissue, leading to missed diagnoses and misdiagnoses in breast cancer screening. Automated breast ultrasound (ABUS) is a novel high-resolution three-dimensional ultrasound imaging technology for the breast, offering superior discrimination of lesion location, morphology, and texture compared to traditional two-dimensional ultrasound imaging, and holds great potential for improving the detection rate of early breast cancer screening.
[0003] ABUS imaging is achieved by continuously sliding a wide linear-array ultrasound transducer across one breast. A single scan acquires approximately 500 cross-sectional images covering most of the breast structure. These images are then stacked to obtain a three-dimensional anatomical image of the entire breast, viewable in the cross-sectional, sagittal, and coronal planes. The scan views for one breast include three standard views, anteroposterior (AP), lateral (LAT), and medial (MED), and four additional views: inferior (INF), superior (SUP), upper outer quadrant (UOQ), and external (X). However, in actual imaging, factors such as improper parameter settings, inappropriate depth selection, and loose probe contact often lead to excessively high or low image gain, glandular tissue positioned too deep or too shallow, and reverberation and air leakage artifacts, all of which severely interfere with the accuracy of subsequent screening. Loose probe contact is the most common of these factors: it produces reverberation artifacts, which appear as alternating light and dark stripes near the skin surface, and air leakage artifacts, which appear as large dark shadows deep within the tissue; both severely obscure lesion features and cause missed diagnoses. Because ABUS acquires a large number of frames, relying solely on physicians' visual inspection to determine whether artifacts are present, where they are located, and how severe they are is time-consuming and labor-intensive, severely impacting screening efficiency.
[0004] The causes of ABUS image artifacts are clearly identifiable: most originate from the transducer failing to achieve perfect fit with the skin in the area around the nipple, resulting in reverberation artifacts behind the nipple in the ultrasound image; in addition, improper pressure applied by the operator when using the probe (too much or too little) can lead to poor coupling between the probe and the skin, thereby producing air leakage artifacts in the breast area.
[0005] Currently, artifact identification in ABUS images mainly relies on manual image reading by physicians. This 3D image reading process is labor-intensive, time-consuming, and lacks objective evaluation standards. The rapid development of deep learning has provided a new path for ABUS image quality control. For example, some scholars have proposed using bilinear convolutional neural networks for global quality evaluation of ultrasound images. However, this type of method suffers from a lack of quantifiable indicators for ultrasound image grading standards. Therefore, there is an urgent need to develop a rapid and intelligent artifact identification system for ABUS images, establish an ABUS image quality assessment system, achieve real-time quantification of artifact proportions, and introduce an intelligent early warning mechanism. If the image quality does not meet the standards, the system can promptly remind physicians to rescan to ensure image quality, thereby improving screening efficiency and effectiveness.
Summary of the Invention
[0006] The purpose of this invention is to provide a rapid and intelligent method for identifying ABUS image artifacts. Based on target detection using the YOLO model, it achieves accurate identification of air leakage artifacts and reverberation artifacts in ultrasound images, thereby enabling quality control of 3D breast ultrasound images and solving the problems mentioned in the background art.
[0007] To achieve the above objectives, the present invention provides the following technical solution:
[0008] A fast and intelligent method for identifying ABUS image artifacts includes:
[0009] S1: Acquire the 3D ABUS image data to be processed, preprocess the 3D ABUS image data, and generate a 2D image and structured label data containing category and location;
[0010] S2: Feature extraction is performed by the backbone network, and multi-level features are efficiently fused by the feature transfer network to generate lightweight enhanced features with rich semantics and accurate spatial information.
[0011] S3: A weight optimization function based on multi-component collaboration comprehensively evaluates the model's performance in classification, localization, and distribution prediction, and the network parameters are iteratively updated through the backpropagation algorithm until convergence.
[0012] S4: Verify model performance and implement quantitative evaluation of image quality, generating a quality evaluation visualization.
[0013] Preferably, in step S1, the three-dimensional ABUS image data is preprocessed by performing the following operations:
[0014] The 3D ABUS image data is parsed and sliced along its depth dimension to generate a set of 2D images, while the spatial mapping between each slice and the original 3D data is maintained; all 2D images are then uniformly resized to the preset target resolution.
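A minimal sketch of this slicing step, assuming the volume has already been decoded into a nested list indexed as (depth, height, width) and that nearest-neighbour resampling is acceptable. The function names are hypothetical, and because the source does not state the target resolution, it is left as a parameter.

```python
from typing import List, Tuple

Image = List[List[float]]


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize of a 2D image stored as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def slice_volume(volume: List[Image], target: Tuple[int, int]) -> List[Tuple[int, Image]]:
    """Slice a (depth, height, width) volume along depth.

    Each slice is returned together with its depth index z, which preserves
    the mapping back to the original 3D data, and is resized to target.
    """
    out_h, out_w = target
    return [(z, resize_nearest(s, out_h, out_w)) for z, s in enumerate(volume)]
```

Keeping the index `z` with every slice is what allows per-slice detections to be placed back into the volume later; a production pipeline would more likely use an array library and an interpolating resize.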
[0015] The preprocessed 2D image is manually annotated to generate a segmentation mask for distinguishing different target regions. The segmentation mask is then converted into training labels required by the target detection model through a label generation process.
[0016] Specifically, connected component analysis is performed on each category region in the segmentation mask to determine each independent target instance, and the corresponding bounding box information is calculated for each target instance; the coordinates of the bounding box are normalized and converted into relative coordinate representations independent of image size; the category identifier of each target is integrated with the normalized position information to generate structured label data containing category and position, which is input into the model along with the corresponding two-dimensional image to guide the supervised learning process of the model.
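The label generation process described above can be sketched as follows. It assumes the mask stores 0 for background and k ≥ 1 for class k - 1, uses 4-connectivity for connected components, and follows the common YOLO text-label convention (class, x-center, y-center, width, height, all normalized to [0, 1]); these conventions and the function name are illustrative assumptions.

```python
from collections import deque
from typing import List, Tuple


def mask_to_yolo_labels(mask: List[List[int]]) -> List[Tuple[int, float, float, float, float]]:
    """Convert a class-index segmentation mask into normalized box labels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    labels = []
    for y in range(h):
        for x in range(w):
            cls = mask[y][x]
            if cls == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected component of this class.
            queue = deque([(y, x)])
            seen[y][x] = True
            y_min, y_max, x_min, x_max = y, y, x, x
            while queue:
                py, px = queue.popleft()
                y_min, y_max = min(y_min, py), max(y_max, py)
                x_min, x_max = min(x_min, px), max(x_max, px)
                for ny, nx in ((py + 1, px), (py - 1, px), (py, px + 1), (py, px - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and mask[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Inclusive pixel extents -> box size, then normalize by image size.
            bw, bh = x_max - x_min + 1, y_max - y_min + 1
            labels.append((cls - 1, (x_min + bw / 2) / w, (y_min + bh / 2) / h, bw / w, bh / h))
    return labels
```

Each returned tuple would typically be written as one whitespace-separated line in a label file paired with the corresponding 2D slice.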
[0017] Preferably, data augmentation processing is performed on minority class samples with a small number of samples, including but not limited to geometric transformation operations to change the spatial layout of the image and photometric transformation operations to adjust the visual attributes of the image, and the annotation information associated with the image is updated synchronously when the transformation is performed, so as to keep the augmented image consistent with the label.
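A sketch of two label-synchronized transforms, one geometric and one photometric, assuming normalized (class, cx, cy, w, h) labels and 8-bit intensities; the function names are hypothetical. A horizontal flip must mirror each box centre, while a brightness shift leaves the boxes unchanged.

```python
from typing import List, Tuple

Image = List[List[float]]
Label = Tuple[int, float, float, float, float]  # (class_id, cx, cy, w, h), normalized


def hflip(img: Image, labels: List[Label]) -> Tuple[Image, List[Label]]:
    """Geometric transform: mirror left-right and move each box centre to 1 - cx."""
    flipped = [row[::-1] for row in img]
    new_labels = [(c, 1.0 - cx, cy, w, h) for c, cx, cy, w, h in labels]
    return flipped, new_labels


def adjust_brightness(img: Image, delta: float, lo: float = 0.0, hi: float = 255.0) -> Image:
    """Photometric transform: shift intensities and clip; box labels need no update."""
    return [[min(hi, max(lo, v + delta)) for v in row] for row in img]
```

In a balancing strategy, transforms like these would be applied only to slices that contain the under-represented artifact class.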
[0018] Preferably, in step S2, feature extraction is performed by the backbone network through the following operations:
[0019] The multi-stage feature extraction architecture enables efficient hierarchical feature learning of the input data. This architecture begins with an initial downsampling unit, which gradually reduces the spatial dimension of the input image through a series of convolutional operations to extract basic low-level features.
[0020] After each downsampling stage, a feature enhancement module is configured. This module adopts a branching processing path: one path directly passes the input features, while the other path performs complex nonlinear transformations through one or more deep feature extraction units. The outputs of the two paths are then fused to generate enhanced features with rich details and high-level semantics. At the highest level of feature blocks, a multi-scale pooling module is applied to aggregate global context information and enhance the representational power of the features.
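The source does not name the multi-scale pooling module. A common choice in YOLO backbones is SPPF (fast spatial pyramid pooling), which applies one stride-1 max pool several times in sequence; the pure-Python sketch below assumes that design and works on a single-channel map, with hypothetical function names.

```python
from typing import List

Map = List[List[float]]


def max_pool_same(x: Map, k: int = 5) -> Map:
    """Stride-1 max pooling with 'same' padding (window clipped at the borders)."""
    h, w, r = len(x), len(x[0]), k // 2
    return [
        [
            max(
                x[i][j]
                for i in range(max(0, y - r), min(h, y + r + 1))
                for j in range(max(0, c - r), min(w, c + r + 1))
            )
            for c in range(w)
        ]
        for y in range(h)
    ]


def sppf(x: Map, k: int = 5) -> List[Map]:
    """Apply the same pooling three times in sequence and keep every stage.

    Successive stages see windows of k, 2k - 1 and 3k - 2 pixels, so stacking
    them with the input aggregates context at several scales.
    """
    p1 = max_pool_same(x, k)
    p2 = max_pool_same(p1, k)
    p3 = max_pool_same(p2, k)
    return [x, p1, p2, p3]  # a real network concatenates these along channels
```

Reusing one small pooling kernel sequentially yields the same multi-scale receptive fields as parallel large kernels at lower cost, which matches the lightweight intent stated in the text.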
[0021] Preferably, in step S2, based on the efficient fusion of multi-level features using a feature transfer network, the following operations are performed:
[0022] The detection head used for small artifact detection incorporates... The module and its formula are shown below:
[0023] (1)
[0024] where the terms denote, in order: the input feature block; the feature block at the i-th scale; the number of scales; the output information layer; the dynamic interpolation fusion module; the multi-path coordinate adaptive weighting mechanism; and the multi-feature fusion module.
[0025] The dynamic interpolation fusion module is implemented as follows:
[0026] (2)
[0027] where the terms denote, in order: the first feature block; the second feature block; a function that adjusts the number of channels of a feature block; and a function that adjusts the feature block size.
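The text describes the dynamic interpolation fusion module only through its parts: a channel-adjustment function and a size-adjustment function applied to two feature blocks. The sketch below is a hypothetical reading under three assumptions: size adjustment is nearest-neighbour interpolation, channel adjustment is a 1×1 convolution (a per-pixel linear mix of channels), and the two blocks are fused by element-wise addition. Tensors are nested lists shaped (channels, height, width).

```python
from typing import List

Tensor = List[List[List[float]]]  # (channels, height, width)


def resize_nearest(fm: List[List[float]], out_h: int, out_w: int) -> List[List[float]]:
    """Size adjustment (assumed): nearest-neighbour interpolation of one channel."""
    in_h, in_w = len(fm), len(fm[0])
    return [[fm[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)] for r in range(out_h)]


def conv1x1(x: Tensor, weight: List[List[float]]) -> Tensor:
    """Channel adjustment (assumed): out[o] = sum_i weight[o][i] * x[i] at every pixel."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wo[i] * x[i][r][c] for i in range(len(x))) for c in range(w)] for r in range(h)]
        for wo in weight
    ]


def dynamic_interp_fuse(f1: Tensor, f2: Tensor, weight: List[List[float]]) -> Tensor:
    """Resize f2 to f1's spatial size, project it to f1's channel count, then add."""
    h, w = len(f1[0]), len(f1[0][0])
    f2 = conv1x1([resize_nearest(ch, h, w) for ch in f2], weight)
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)] for ca, cb in zip(f1, f2)]
```

In a trained network the 1×1 weights are learned, and a "dynamic" variant might learn the interpolation offsets as well; neither detail is specified in the source.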
[0028] The multi-path coordinate adaptive weighting mechanism is implemented as follows:
[0029] (3)
[0030] where the terms denote, in order: the input feature block; the horizontal weight map; the vertical weight map; the channel weight map; the conversion function; and the transpose of the vertical weights.
[0031] The horizontal weight map is calculated as follows:
[0032] (4)
[0033] where the terms denote horizontal adaptive pooling and the convolutional layer that outputs the horizontal weights.
[0034] The vertical weight map is calculated as follows:
[0035] (5)
[0036] where the terms denote vertical adaptive average pooling and the convolutional layer that outputs the vertical weights.
[0037] The channel weights are calculated as follows:
[0038] (6)
[0039] where the terms denote global adaptive average pooling and the convolutional layer that outputs the channel weights.
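A minimal sketch of the weighting idea behind Eqs. (3) to (6): pool the feature block along each spatial direction and globally, turn each pooled vector into gates, and multiply the gates back onto the input. Two assumptions stand in for details the text does not give: a per-channel scalar replaces each learned convolutional layer, and the sigmoid function serves as the conversion function. The function names are hypothetical.

```python
import math
from typing import List

Tensor = List[List[List[float]]]  # (channels, height, width)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def coord_channel_weighting(x: Tensor, s_row: List[float], s_col: List[float], s_ch: List[float]) -> Tensor:
    """Reweight x by one gate per row, one gate per column and one gate per channel."""
    out = []
    for c, fm in enumerate(x):
        h, w = len(fm), len(fm[0])
        row_mean = [sum(row) / w for row in fm]                             # pool across width
        col_mean = [sum(fm[y][j] for y in range(h)) / h for j in range(w)]  # pool across height
        ch_mean = sum(row_mean) / h                                         # global average pool
        a_row = [sigmoid(s_row[c] * v) for v in row_mean]
        a_col = [sigmoid(s_col[c] * v) for v in col_mean]
        a_ch = sigmoid(s_ch[c] * ch_mean)
        # Broadcasting a_row down columns and a_col across rows plays the role
        # of the transpose in Eq. (3).
        out.append([[fm[y][j] * a_row[y] * a_col[j] * a_ch for j in range(w)] for y in range(h)])
    return out
```

Because the row and column gates keep positional information that a single global gate discards, this kind of weighting can emphasize where an artifact lies, which is the motivation for using it in a small-artifact detection head.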
[0040] The multi-feature fusion module is implemented as follows:
[0041] (7)
[0042] where the terms denote, in order: the low-resolution feature block; the high-resolution feature block; a convolutional layer; and an interpolation function.
[0043] This network achieves feature interaction through a bidirectional information path: top-down, high-level semantic features are upsampled and then fused step by step with mid- and low-level features; bottom-up, the fused features are downsampled to strengthen high-level semantics. In this process, the feature blocks first undergo multi-scale feature extraction, as shown in the following formula:
[0044] (8)
[0045]
[0046] Next, feature fusion is performed, and the specific formula is as follows:
[0047] (9)
[0048] where the multi-feature fusion module is used to fuse features from adjacent scales.
[0049] Then, feature alignment is performed using the following formulas:
[0050] (10)
[0051] (11)
[0052] where the interpolation function is used to resize all feature blocks to the same size.
[0053] Finally, a dynamic feature fusion module is introduced. This module not only integrates multi-scale information through an adaptive weighting mechanism but also uses efficient computational units internally to balance performance and efficiency. The dynamic feature fusion module is given by the following formula:
[0054] (12)
[0055] where the dynamic interpolation fusion module fuses the features enhanced by the adaptive weighting mechanism with the original input features.
[0056] Finally, a set of lightweight enhanced features with rich semantics and accurate spatial information is output, providing high-quality input for subsequent detection tasks.
[0057] Preferably, in step S3, the performance of the model in classification, localization, and distribution prediction is comprehensively evaluated based on a multi-component collaborative weight optimization function, and the following operations are performed:
[0058] The model predictions are matched with the ground truth labels to determine the positive and negative samples used for supervised learning. This weight optimization function must include at least the following three core evaluation terms:
[0059] The first is the classification error term, which measures the deviation between the predicted class confidence and the target class;
[0060] The second is the localization error term, which is calculated with a composite measurement strategy. This strategy not only integrates multiple spatial-relationship metrics but also models the bounding box coordinates as a probability distribution, optimizing localization accuracy by measuring the difference between the predicted distribution and the true distribution;
[0061] The third is the distribution consistency evaluation term, which is used to ensure that the discrete probability distribution predicted by the model can accurately fit the actual target location distribution;
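The distribution consistency term is not named in the text. In YOLO-family detectors a common choice is Distribution Focal Loss (DFL), which treats each box side as a discrete probability distribution over integer bins. The sketch below assumes that interpretation; the function names are hypothetical, and `probs` must be strictly positive (for example, a softmax output).

```python
import math
from typing import List


def distribution_focal_loss(probs: List[float], target: float) -> float:
    """DFL for one box side.

    probs: predicted distribution over integer bins 0..n-1 (sums to 1).
    target: continuous regression target with 0 <= target <= n - 1.
    Probability mass is pushed onto the two bins bracketing the target,
    each weighted by how close the target is to it.
    """
    left = min(int(math.floor(target)), len(probs) - 2)
    right = left + 1
    w_left, w_right = right - target, target - left
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


def expected_value(probs: List[float]) -> float:
    """Decode the distribution back to a coordinate as its expectation."""
    return sum(i * p for i, p in enumerate(probs))
```

Decoding by expectation means the predicted coordinate can fall between bins, while the loss keeps the distribution sharp around the true location.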
[0062] An innovative weight optimization function is adopted that combines the CIoU loss and the NWD loss. CIoU is calculated as follows:
[0063] $$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^{2}(b, b^{gt})}{c^{2}} - \alpha v \tag{13}$$
[0064] where $\mathrm{IoU}$ denotes the intersection over union of the predicted and ground-truth bounding boxes; $\rho(b, b^{gt})$ denotes the Euclidean distance between the center point $b$ of the predicted bounding box and the center point $b^{gt}$ of the ground-truth bounding box; $c$ denotes the diagonal length of the smallest rectangle enclosing both bounding boxes; $\alpha = \frac{v}{(1-\mathrm{IoU}) + v}$ is the weighting coefficient; $v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$ is the aspect-ratio consistency index; $w$ and $h$ denote the width and height of the predicted bounding box; and $w^{gt}$ and $h^{gt}$ denote the width and height of the ground-truth bounding box.
[0065] The corresponding CIoU loss is:
[0066] $$L_{CIoU} = 1 - \mathrm{CIoU} \tag{14}$$
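The CIoU loss can be sketched directly from its definition: one minus IoU, plus the normalized squared centre distance, plus the aspect-ratio penalty. The sketch assumes boxes in corner format (x1, y1, x2, y2) with positive width and height; the small `eps` is an added assumption for numerical stability, and the function name is hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), x2 > x1, y2 > y1


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """L_CIoU = 1 - IoU + rho^2 / c^2 + alpha * v."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    wp, hp, wg, hg = px2 - px1, py2 - py1, gx2 - gx1, gy2 - gy1
    iou = inter / (wp * hp + wg * hg - inter + eps)
    # Squared centre distance and squared diagonal of the smallest enclosing box.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```

For two disjoint unit boxes whose centres are 2 apart inside a 3 × 1 enclosing box, IoU is 0 and the distance term is 4 / 10, so the loss is 1.4; plain IoU loss would give 1.0 regardless of how far apart the boxes are.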
[0067] The NWD (Normalized Wasserstein Distance) loss is a bounding box loss based on the Wasserstein distance that is better suited to small and occluded targets. It is calculated as follows:
[0068] (15)
[0069] where the terms denote, respectively: the predicted and ground-truth bounding boxes; the set of all joint probability distributions between them; and the Euclidean distance.
[0070] The corresponding NWD loss is:
[0071] (16)
[0072] A hybrid loss function is formed by combining CIoU loss and NWD loss:
[0073] (17)
[0074] where the weighting coefficient is set to 0.7 in this invention, giving:
[0075] (18)
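A sketch of the NWD loss and the hybrid box loss, following the published NWD formulation: each box is modelled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), for which the squared 2-Wasserstein distance reduces to a Euclidean distance between (cx, cy, w/2, h/2) vectors. The constant `c` is a dataset-dependent normalizer whose value here is a placeholder. The text gives the weighting coefficient (0.7) but not which term it multiplies; applying it to the CIoU term is an assumption. Function names are hypothetical.

```python
import math
from typing import Tuple

CBox = Tuple[float, float, float, float]  # (cx, cy, w, h)


def nwd(pred: CBox, gt: CBox, c: float = 12.8) -> float:
    """Normalized Gaussian Wasserstein distance, in (0, 1]; 1 means identical boxes."""
    a = (pred[0], pred[1], pred[2] / 2, pred[3] / 2)
    b = (gt[0], gt[1], gt[2] / 2, gt[3] / 2)
    w2_squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(pred: CBox, gt: CBox, c: float = 12.8) -> float:
    return 1.0 - nwd(pred, gt, c)


def hybrid_box_loss(l_ciou: float, l_nwd: float, lam: float = 0.7) -> float:
    """Assumed form: lam weights the CIoU loss and (1 - lam) weights the NWD loss."""
    return lam * l_ciou + (1.0 - lam) * l_nwd
```

Unlike IoU, NWD stays smooth and non-zero when small boxes do not overlap, which is why it helps on small artifacts; the CIoU term still dominates the total under the assumed weighting.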
[0076] The above evaluation items are weighted and aggregated according to preset weight coefficients to form the overall optimization objective. Based on this overall optimization objective, the model iteratively updates the network parameters through the backpropagation algorithm until convergence.
[0077] Preferably, in step S3, the collaborative optimization of classification and localization based on the task-aligned sample allocation strategy involves the following operations:
[0078] Candidate positive samples are selected based on spatial proximity. Each candidate sample is scored using a comprehensive evaluation index that integrates classification confidence and localization accuracy. Based on this index, the optimal candidate sample is selected as the positive sample for each real target. Conflict resolution rules are applied to handle cases where a sample matches multiple targets, ensuring the uniqueness of the assignment and guiding the model to prioritize learning samples that perform well in both classification and localization, thus enhancing the consistency between the two core tasks.
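The assignment strategy above matches the task-aligned assigner used in several recent detectors, where the alignment metric is t = s^α · u^β (s: predicted score for the target's class, u: IoU with the target). The sketch below assumes that metric, uses "anchor centre inside the ground-truth box" as the spatial-proximity rule, and resolves conflicts by keeping the target with the highest IoU; `topk`, `alpha`, and `beta` defaults are illustrative assumptions, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def task_aligned_assign(
    anchors: List[Tuple[float, float]],                 # anchor centre points (x, y)
    scores: List[List[float]],                          # scores[a][g]: predicted prob of gt g's class
    ious: List[List[float]],                            # ious[a][g]: IoU of anchor a's box with gt g
    gt_boxes: List[Tuple[float, float, float, float]],  # (x1, y1, x2, y2)
    topk: int = 1,
    alpha: float = 1.0,
    beta: float = 6.0,
) -> Dict[int, int]:
    """Return {anchor_index: gt_index} for the positive samples."""
    claims: Dict[int, List[int]] = {}
    for g, (x1, y1, x2, y2) in enumerate(gt_boxes):
        # Spatial proximity: only anchors whose centre lies inside the gt box are candidates.
        cand = [a for a, (x, y) in enumerate(anchors) if x1 <= x <= x2 and y1 <= y <= y2]
        # Rank candidates by the joint classification/localization alignment metric.
        cand.sort(key=lambda a: scores[a][g] ** alpha * ious[a][g] ** beta, reverse=True)
        for a in cand[:topk]:
            claims.setdefault(a, []).append(g)
    # Conflict resolution: an anchor claimed by several gts keeps the one it overlaps most.
    return {a: max(gs, key=lambda g: ious[a][g]) for a, gs in claims.items()}
```

Because the metric multiplies the two qualities, an anchor with a confident class score but a poor box (or the reverse) ranks low, which is how the strategy pushes classification and localization toward consistency.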
[0079] Preferably, in step S4, the model performance is verified and the image quality is quantitatively evaluated, a quality evaluation visualization is generated, and the following operations are performed:
[0080] The model is tested for performance using a model validation module and an independent validation dataset. This test calculates the spatial overlap between the predicted results and the ground truth labels and generates a quantitative evaluation result based on one or more evaluation thresholds.
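A minimal sketch of threshold-based validation for one artifact class: predictions are matched greedily to ground-truth boxes by IoU in descending confidence order, and precision, recall, and F1 are computed at a given IoU threshold. The greedy matching rule, corner-format boxes, and function names are assumptions; mAP-style metrics would repeat this over several thresholds and confidence cut-offs.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds: List[Tuple[float, Box]], gts: List[Box], thr: float = 0.5) -> Tuple[float, float, float]:
    """preds are (confidence, box); each gt can be matched at most once."""
    matched = set()
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            overlap = iou(box, g)
            if i not in matched and overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            matched.add(best)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    p = tp / (tp + fp) if preds else 0.0
    r = tp / (tp + fn) if gts else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```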
[0081] The trained model is applied to the medical images to be evaluated to perform image quality level quantification. This process generates a quality assessment visualization by identifying and quantifying the spatial distribution and density of various predetermined abnormal regions in the image.
[0082] The visualization uses differentiated visual identifiers to distinguish different types of anomalies, intuitively presenting the location, type, and distribution intensity of abnormal areas in the image, providing a basis for decision-making in image quality control.
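One way to turn per-slice detections into the quantitative evaluation described above is to rasterize boxes onto a coarse grid: counting, per artifact class, how many slices cover each cell gives a heatmap, and the covered fraction of all cells gives an overall artifact proportion that can drive a rescan warning. The grid size, the warning threshold, and the function names below are hypothetical choices, not values from the source.

```python
from typing import Dict, List, Set, Tuple

Det = Tuple[int, float, float, float, float]  # (class_id, cx, cy, w, h), normalized


def rasterize(dets: List[Det], grid: int = 32) -> Dict[int, Set[Tuple[int, int]]]:
    """Per class, the grid cells whose centre falls inside any detected box."""
    cover: Dict[int, Set[Tuple[int, int]]] = {}
    for cls, cx, cy, w, h in dets:
        cells = cover.setdefault(cls, set())
        for r in range(grid):
            for c in range(grid):
                x, y = (c + 0.5) / grid, (r + 0.5) / grid
                if abs(x - cx) <= w / 2 and abs(y - cy) <= h / 2:
                    cells.add((r, c))
    return cover


def volume_quality(slices: List[List[Det]], grid: int = 32, warn_ratio: float = 0.2):
    """Return (per-class heatmaps, overall artifact proportion, rescan warning flag)."""
    heat: Dict[int, List[List[int]]] = {}
    covered_total = 0
    for dets in slices:
        cover = rasterize(dets, grid)
        covered_total += len(set().union(*cover.values()))  # overlapping classes counted once
        for cls, cells in cover.items():
            hm = heat.setdefault(cls, [[0] * grid for _ in range(grid)])
            for r, c in cells:
                hm[r][c] += 1
    ratio = covered_total / (len(slices) * grid * grid) if slices else 0.0
    return heat, ratio, ratio > warn_ratio
```

Each class heatmap can then be rendered with its own colour map, which corresponds to the differentiated visual identifiers mentioned in the text.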
[0083] Compared with the prior art, the beneficial effects of the present invention are:
[0084] This invention uses the YOLO model to detect and statistically analyze artifacts in ABUS images. The YOLO architecture, comprising backbone, neck, and head networks, learns artifact features effectively and achieves high-precision artifact detection. Finally, a heatmap presents the proportion of artifacts in the 3D image, helping physicians assess ABUS imaging quality. This significantly reduces physicians' workload and image-reading time and achieves accurate identification of air leakage and reverberation artifacts in ultrasound images, thereby enabling quality control of 3D breast ultrasound images.
Attached Figure Description
[0085] Figure 1 is a schematic diagram of the breast artifact imaging method of the present invention;
[0086] Figure 2 is a visualization of the model training labels of the present invention;
[0087] Figure 3 is a flowchart of the ABUS image artifact rapid intelligent identification method of the present invention;
[0088] Figure 4 is a heatmap showing the artifact distribution of the present invention.
Detailed Implementation
[0089] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0090] To address the difficulty of reasonably quantifying artifact volume in existing breast ultrasound imaging techniques, which makes artifact detection in critical image regions difficult and quality control time-consuming, and with reference to Figures 1-4, this embodiment provides the following technical solution:
[0091] A fast and intelligent method for identifying ABUS image artifacts includes:
[0092] S1: Preprocessing of ABUS three-dimensional breast ultrasound images;
[0093] The process involves acquiring 3D ABUS image data to be processed and converting it into a series of 2D image sequences through a data preprocessing workflow. This workflow first parses the 3D ABUS image data and slices it along its depth dimension to generate a set of 2D images while maintaining the spatial mapping relationship between each slice and the original 3D data. Then, all 2D images are uniformly adjusted to a preset target resolution.
[0094] To address the problem of uneven distribution of samples from different classes in the training data, this invention also includes a data balancing strategy. This strategy performs data augmentation processing on minority class samples with a small number of samples. The data augmentation processing comprehensively utilizes a variety of image transformation methods, including but not limited to geometric transformation operations (such as flipping or rotating) for changing the spatial layout of the image and photometric transformation operations (such as brightness or contrast adjustment) for adjusting the visual attributes of the image. Furthermore, the annotation information associated with the image is updated synchronously during the transformation to ensure that the augmented image is consistent with the label.
[0095] The preprocessed 2D images are manually annotated to generate a segmentation mask that distinguishes different target regions. Subsequently, a label generation process converts the segmentation mask into training labels required by the target detection model. This process first performs connected component analysis on each category region in the segmentation mask to identify individual target instances; then, it calculates the corresponding bounding box information for each target instance; next, it normalizes the coordinates of the bounding boxes, converting them into relative coordinate representations independent of image size; finally, it integrates the category identifier of each target with the normalized location information to generate structured label data containing category and location. This structured label data, along with the corresponding 2D image, is input into the model to guide the supervised learning process. The model's label information is shown in Figure 2.
[0096] It should be noted that the object detection model is a real-time object detection algorithm based on deep learning. Its core concept is to transform the object detection task into a single regression problem, directly predicting the object's category and bounding box from the input image.
[0097] It should be noted that the core function of the neuron signal nonlinear transformation unit is to perform nonlinear transformation processing on the aggregated signal received by the neuron and output it to subsequent network layers, providing the network with nonlinear modeling capabilities to adapt to the learning of complex input-output mapping relationships. In object detection tasks, this transformation unit can adopt an appropriate signal normalization transformation strategy according to the category determination requirements: for binary category determination scenarios, a monotonic signal mapping strategy adapted to dual-class output is adopted; for multi-category determination scenarios, a normalized signal transformation strategy adapted to multi-category distribution is adopted. All of the above strategies achieve the quantitative expression of category attribution relationships through signal modulation, rather than being used as error measurement functions.
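Read in conventional terms, the two strategies in [0097] appear to correspond to the sigmoid activation (binary case) and the softmax activation (multi-class case); this mapping is our interpretation, not something the text states. A minimal sketch with hypothetical function names:

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Binary case: monotonic map of one logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits: List[float]) -> List[float]:
    """Multi-class case: normalized distribution over classes.

    Subtracting the maximum logit first avoids overflow without changing the result.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

As the paragraph notes, both only express class membership as probabilities; the error measurement itself is done separately by the loss terms of step S3.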
[0098] It should be noted that a mask is a binary image used to identify different objects or regions in an image. In digital image processing, a mask is usually an array or matrix of the same size as the original image, in which selected regions are labeled with specific values (such as 1 or True), while the remaining regions are labeled with other values (such as 0 or False).
[0099] It should be noted that identification refers to automatically assigning input data samples to categories using artificial intelligence algorithms and techniques. If the data is divided into two categories, the task is called binary classification; the data can also be divided into more categories (such as three, four, five, or nine). The goal of this type of task is to train an artificial intelligence model that accurately identifies input data and classifies it into predefined categories. Specifically, artificial intelligence identification tasks typically involve the following aspects:
[0100] 1) Data input: The input for the recognition task can be various types of data; in this invention, it specifically refers to two-dimensional breast ultrasound data. These data need to be properly preprocessed so that the artificial intelligence model can accept and process them.
[0101] 2) Category definition: In the identification task, the set of categories must be clearly defined. These categories are the targets predicted by the model. In this task, it specifically refers to the four types of artifacts in ultrasound breast images.
[0102] 3) Model training: The core of artificial intelligence recognition tasks is to train a model that can automatically map data to the correct category. This is usually done through supervised learning, that is, using a labeled dataset to train the model. During the training process, the model continuously adjusts its internal parameters through backpropagation and optimization algorithms to minimize the difference between the predicted category and the actual category.
[0103] 4) Model Evaluation: After training, an independent test set is needed to evaluate the model's performance. Evaluation metrics typically include accuracy, precision, recall, F1 score, etc. These metrics help to understand the model's performance on the recognition task.
[0104] 5) Model application: Once the model has been trained and validated, it can be applied to new unknown data for automatic identification, which can be applied to the quality control of three-dimensional breast volume ultrasound images.
[0105] S2: Forward propagation process;
[0106] Feature extraction in the backbone network: The backbone network of this invention aims to achieve efficient hierarchical feature learning of input data through a multi-stage feature extraction architecture. This architecture begins with an initial downsampling unit, which gradually reduces the spatial dimension of the input image through a series of convolutional operations to extract basic low-level features. After each downsampling stage, a feature enhancement module is configured. This module adopts a branching processing path: one path directly passes the input features, while the other path performs complex nonlinear transformations through one or more deep feature extraction units. Subsequently, the outputs of the two paths are fused to generate enhanced features with rich details and high-level semantics. This design promotes the effective reuse of features and the smooth propagation of gradients. Through the above progressive processing, the network constructs a feature pyramid structure containing multiple levels. The feature blocks at different levels in this structure have different spatial resolutions and semantic depths, thereby capturing multi-scale visual information from fine-grained to coarse-grained. Finally, a multi-scale pooling module is applied to the highest-level feature blocks to aggregate global contextual information and further enhance the representational power of the features.
[0107] It should be noted that the feature extraction module is designed as a hierarchical information abstraction process, aiming to separate and encode the most discriminative information for subsequent analysis tasks from the received multi-dimensional raw data signals. To achieve this, the core of the module contains one or more cascaded hierarchical processing units. Each processing unit applies a set of learnable local feature extraction operators. These operators, as parameterized feature detectors, perform weighted transformation operations on local data regions by sliding across the entire dimension of the input data to respond to specific local patterns. Through this hierarchical processing, the raw data is progressively transformed into a series of intermediate feature representations. The semantic information they contain also evolves from low-level, concrete patterns to high-level, abstract patterns as the processing level deepens. Ultimately, the module outputs a set of high-level features containing rich contextual information, providing a solid foundation for subsequent information fusion and decision generation.
[0108] The feature transfer network aims to efficiently fuse multi-level features. Its design follows a lightweight principle to reduce computational overhead while ensuring fusion effectiveness. In this invention, a feature transfer network is incorporated into the detection head for small artifact detection. The module is shown in the following formula:
[0109] (1)
[0110] where the terms denote, in order: the input feature block; the feature block at the i-th scale; the number of scales; the output information layer; the dynamic interpolation fusion module; the multi-path coordinate adaptive weighting mechanism; and the multi-feature fusion module.
[0111] The dynamic interpolation fusion module is implemented as follows:
[0112] (2)
[0113] where the terms denote, in order: the first feature block; the second feature block; a function that adjusts the number of channels of a feature block; and a function that adjusts the feature block size.
[0114] The multi-path coordinate adaptive weighting mechanism is implemented as follows:
[0115] (3)
[0116] where the terms denote, in order: the input feature block; the horizontal weight map; the vertical weight map; the channel weight map; the conversion function; and the transpose of the vertical weights.
[0117] The horizontal weight map is calculated as follows:
[0118] (4)
[0119] where the terms denote horizontal adaptive pooling and the convolutional layer that outputs the horizontal weights.
[0120] The vertical weight map is calculated as follows:
[0121] (5)
[0122] where the terms denote vertical adaptive average pooling and the convolutional layer that outputs the vertical weights.
[0123] The channel weights are calculated as follows:
[0124] (6)
[0125] where the terms denote global adaptive average pooling and the convolutional layer that outputs the channel weights.
[0126] The multi-feature fusion module is implemented as follows:
[0127] (7)
[0128] where the terms denote, in order: the low-resolution feature block; the high-resolution feature block; a convolutional layer; and an interpolation function.
[0129] This network achieves feature interaction through a two-way information transmission path: from top to bottom, high-level semantic features are upscaled and then fused with mid- and low-level features step by step; from bottom to top, the fused features are downsampled to enhance high-level semantics. In this process, the feature blocks first undergo multi-scale feature extraction, as shown in the following formula:
[0130] (8)
[0131]
[0132] Next, feature fusion is performed as follows:
[0133] (9)
[0134] where the operator denotes the multi-feature fusion module, which fuses features from adjacent scales.
[0135] Then, feature alignment is performed as follows:
[0136] (10)
[0137] (11)
[0138] where the operator denotes an interpolation function that resizes all feature blocks to the same size.
[0139] Finally, a dynamic feature fusion module is introduced. It integrates multi-scale information through an adaptive weighting mechanism and uses efficient computational units internally to balance performance and efficiency. The dynamic feature fusion module is given by:
[0140] (12)
[0141] where the operator denotes the dynamic interpolation fusion module, which fuses the features enhanced by the adaptive weighting mechanism with the original input features.
[0142] The network thus outputs a set of lightweight enhanced features with rich semantics and accurate spatial information, which serve as high-quality input for subsequent detection tasks.
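The bidirectional path described in [0129] can be illustrated with a minimal sketch on single-channel maps. It assumes nearest-neighbour upsampling on the top-down pass, 2x2 average pooling on the bottom-up pass, and fusion by addition; the real network uses learned fusion modules, and every name here is hypothetical.

```python
from typing import List

# ASSUMPTION: nearest upsampling, average-pool downsampling, fusion by addition.
Map = List[List[float]]  # single-channel feature map [row][column]


def up2(m: Map) -> Map:
    """Nearest-neighbour 2x upsampling."""
    return [[v for v in row for _ in range(2)] for row in m for _ in range(2)]


def down2(m: Map) -> Map:
    """2x2 average pooling with stride 2 (even sizes assumed)."""
    return [[(m[i][j] + m[i][j + 1] + m[i + 1][j] + m[i + 1][j + 1]) / 4
             for j in range(0, len(m[0]), 2)] for i in range(0, len(m), 2)]


def add(a: Map, b: Map) -> Map:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_fuse(levels: List[Map]) -> List[Map]:
    """levels[0] is the finest map; each following level is half the size."""
    # Top-down: upsample coarser, semantically richer maps into finer ones.
    top_down = list(levels)
    for k in range(len(levels) - 2, -1, -1):
        top_down[k] = add(levels[k], up2(top_down[k + 1]))
    # Bottom-up: downsample the fused detail back into coarser levels.
    bottom_up = [top_down[0]]
    for k in range(1, len(levels)):
        bottom_up.append(add(top_down[k], down2(bottom_up[-1])))
    return bottom_up


levels = [[[0.0] * 4 for _ in range(4)], [[1.0, 1.0], [1.0, 1.0]], [[2.0]]]
out = bidirectional_fuse(levels)
print(out[1], out[2])  # [[6.0, 6.0], [6.0, 6.0]] [[8.0]]
```

The coarsest output (8.0) accumulates information from every level, which is the effect the two-pass design is meant to achieve.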
[0143] It should be noted that the core configuration of the central feature extraction network includes basic processing modules such as the feature mapping and transformation unit and the feature sampling and condensation unit. It mainly undertakes the functions of multi-stage feature parsing and hierarchical progressive feature extraction of input data. Among them, the feature mapping and transformation unit realizes the initial feature extraction through spatial mapping and nonlinear transformation of multi-dimensional data, while the feature sampling and condensation unit optimizes the feature density through information aggregation and scale adjustment. Through the collaborative work of multiple modules, the transformation process from raw data to high-order abstract features is gradually completed.
[0144] It should be noted that the core configuration of the feature transfer processing network includes a feature dimension adaptation unit and a feature enhancement and integration unit. It mainly performs secondary processing of the output of the backbone feature extraction network and multi-source feature fusion. The feature dimension adaptation unit refines features by dynamically adjusting the dimensionality and information density of the feature data, while the feature enhancement and integration unit combines a multi-level feature aggregation strategy with a key-feature attention adaptation mechanism to integrate features from different levels in a correlated manner, ultimately enhancing and optimizing the representational power of the features.
[0145] S3: Backpropagation process;
[0146] The backpropagation process of this invention relies on a multi-component collaborative weight optimization function, which aims to comprehensively evaluate the model's performance in classification, localization, and distribution prediction. First, the model predictions are matched with the ground truth labels to determine the positive and negative samples used for supervised learning.
[0147] The weight optimization function includes at least the following three core evaluation terms: First, a classification error evaluation term, which measures the deviation between the predicted class confidence and the target class; second, a localization error evaluation term, which is calculated through a composite metric strategy. This strategy not only integrates multiple spatial relationship metrics but also models the coordinate representation of the bounding box as a probability distribution and optimizes localization accuracy by calculating the difference between the predicted distribution and the true distribution; and third, a distribution consistency evaluation term, which ensures that the discrete probability distribution predicted by the model can accurately fit the actual target location distribution.
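The third term, which makes a predicted discrete distribution fit the true location, resembles the Distribution Focal Loss (DFL) used in recent YOLO versions. The following is a minimal sketch under that assumption, for a single box edge. The predicted offset is a softmax over integer bins, and the loss pushes probability mass onto the two bins that bracket the continuous target. The text does not say that DFL is the exact term used here.

```python
import math
from typing import List

# ASSUMPTION: the distribution consistency term is a DFL-style loss.


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distribution_focal_loss(logits: List[float], target: float) -> float:
    """DFL for one box edge.

    `logits` score the integer bins 0..n-1 and `target` is the continuous edge
    offset, with 0 <= target < n - 1. The two bins bracketing the target are
    weighted by how close each one is to it.
    """
    probs = softmax(logits)
    left = int(math.floor(target))
    right = left + 1
    return -((right - target) * math.log(probs[left]) + (target - left) * math.log(probs[right]))


def expected_offset(logits: List[float]) -> float:
    """Decode the discrete distribution into a continuous offset (its mean)."""
    return sum(i * p for i, p in enumerate(softmax(logits)))


print(distribution_focal_loss([0.0, 0.0, 0.0, 0.0], 1.5))   # ln 4, about 1.386
print(distribution_focal_loss([0.0, 10.0, 10.0, 0.0], 1.5))  # close to ln 2, about 0.693
```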
[0148] This invention combines the CIoU loss and the NWD loss in the weight optimization function. CIoU is calculated as follows:
[0149] $$\mathrm{CIoU} = \mathrm{IoU} - \frac{\rho^{2}(b, b^{gt})}{c^{2}} - \alpha v \qquad (13)$$
[0150] where $\mathrm{IoU}$ is the intersection over union of the predicted and ground-truth bounding boxes; $\rho(b, b^{gt})$ is the Euclidean distance between the center points of the predicted and ground-truth bounding boxes; $c$ is the diagonal length of the smallest rectangle enclosing both boxes; $\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}$ is a weighting coefficient; $v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$ measures the consistency of the aspect ratios; $b$ and $b^{gt}$ are the center-point coordinates of the predicted and ground-truth bounding boxes; $w$ and $h$ are the width and height of the predicted bounding box; and $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth bounding box.
[0151] The weight optimization function for the CIoU loss is:
[0152] (14)
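A minimal Python sketch of the standard CIoU definition, returned as the loss 1 - CIoU. It assumes that formula (13) is the standard CIoU and that the loss takes the usual 1 - CIoU form. Boxes are (center x, center y, width, height), and a small `eps` guards against division by zero.

```python
import math
from typing import Tuple

# ASSUMPTION: standard CIoU, with the loss taken as 1 - CIoU.
Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """Return 1 - CIoU for two boxes given as (center x, center y, width, height)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # Corner coordinates of both boxes.
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g1x, g1y, g2x, g2y = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    inter = (max(0.0, min(p2x, g2x) - max(p1x, g1x))
             * max(0.0, min(p2y, g2y) - max(p1y, g1y)))
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Squared center distance over squared diagonal of the enclosing box.
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    c2 = (max(p2x, g2x) - min(p1x, g1x)) ** 2 + (max(p2y, g2y) - min(p1y, g1y)) ** 2 + eps
    # Aspect-ratio consistency term v and its weighting coefficient alpha.
    v = 4 / math.pi ** 2 * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1.0 - (iou - rho2 / c2 - alpha * v)


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # about 0 for identical boxes
print(ciou_loss((0, 0, 2, 2), (4, 0, 2, 2)))  # about 1.4: IoU = 0, distance penalty 16 / 40
```

The second case shows why CIoU helps non-overlapping boxes: plain IoU loss would be 1 no matter how far apart the boxes are, whereas the center-distance penalty still provides a gradient.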
[0153] The NWD (Normalized Wasserstein Distance) loss is a bounding box loss based on the Wasserstein distance and is better suited to small and occluded targets. It is calculated as follows:
[0154] (15)
[0155] where the first two terms denote the predicted and ground-truth bounding boxes, respectively; the next term denotes the set of all joint probability distributions between them; and the last denotes the Euclidean distance.
[0156] The weight optimization function for the NWD loss is:
[0157] (16)
[0158] In this invention, CIoU loss and NWD loss are combined to form a hybrid loss function:
[0159] (17)
[0160] where the coefficient is a weighting factor, set to 0.7 in this invention, i.e.:
[0161] (18)
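A minimal sketch of the NWD loss and the hybrid combination. It assumes the usual NWD construction: each box is modelled as a 2-D Gaussian N([cx, cy], diag(w²/4, h²/4)), which gives the squared 2-Wasserstein distance a closed form, and NWD = exp(-W2 / C) for a dataset-dependent constant C. Formulas (16) to (18) are not shown here, so treating the NWD loss as 1 - NWD and applying the weight 0.7 to the CIoU term and 0.3 to the NWD term are both assumptions.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h)


def nwd_loss(pred: Box, gt: Box, c: float) -> float:
    """1 - NWD, modelling each box as a 2-D Gaussian N([cx, cy], diag(w^2/4, h^2/4)).

    For such Gaussians the squared 2-Wasserstein distance is
    ||(cx1, cy1, w1/2, h1/2) - (cx2, cy2, w2/2, h2/2)||^2.
    `c` is a dataset-dependent normalising constant (hypothetical value in the usage below).
    """
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    w2_sq = (px - gx) ** 2 + (py - gy) ** 2 + ((pw - gw) / 2) ** 2 + ((ph - gh) / 2) ** 2
    return 1.0 - math.exp(-math.sqrt(w2_sq) / c)


def hybrid_box_loss(l_ciou: float, l_nwd: float, lam: float = 0.7) -> float:
    """ASSUMPTION: lam weights the CIoU term and (1 - lam) the NWD term."""
    return lam * l_ciou + (1.0 - lam) * l_nwd


# Two 2x2 boxes 3 px apart do not overlap (IoU = 0), yet the NWD loss still varies smoothly.
print(nwd_loss((0, 0, 2, 2), (3, 0, 2, 2), c=4.0))  # 1 - exp(-0.75), about 0.528
```

This smooth behaviour for tiny, non-overlapping boxes is the stated reason for adding NWD alongside CIoU for small artifacts.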
[0162] Finally, the above evaluation terms are weighted and aggregated according to preset weight coefficients to form the overall optimization objective, based on which the model iteratively updates the network parameters through the backpropagation algorithm until convergence.
[0163] This invention employs a task-aligned sample allocation strategy to achieve collaborative optimization of classification and localization. This strategy first selects candidate positive samples based on spatial proximity, then scores each candidate sample using a comprehensive evaluation metric that integrates classification confidence and localization accuracy. Based on this metric, the optimal candidate sample is selected as the positive sample for each real target, and conflict resolution rules are applied to handle cases where a sample matches multiple targets, ensuring the uniqueness of the allocation. In this way, the model is guided to prioritize learning from samples that perform well in both classification and localization, thereby enhancing the consistency between the two core tasks.
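The three steps of the task-aligned allocation strategy in [0163] can be sketched as follows. The sketch assumes a TOOD-style alignment metric t = s^α · u^β (classification score s, IoU u, defaults α = 1 and β = 6), uses "anchor point inside the target box" as the spatial-proximity criterion, and resolves conflicts by keeping the target with the highest IoU. These choices and all names are assumptions, not necessarily the exact rules used in this invention.

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: TOOD-style metric, candidates = anchor points inside the target box,
# conflicts resolved by highest IoU.
XYXY = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def task_aligned_assign(anchors: Sequence[Tuple[float, float]],
                        gt_boxes: Sequence[XYXY],
                        scores: Sequence[Sequence[float]],
                        ious: Sequence[Sequence[float]],
                        top_k: int = 10, alpha: float = 1.0, beta: float = 6.0) -> List[int]:
    """Return the target index assigned to each anchor point, or -1 for a negative.

    scores[a][g]: predicted confidence for target g's class at anchor a.
    ious[a][g]:   IoU between the box predicted at anchor a and target g.
    """
    assigned = [-1] * len(anchors)
    best_iou = [-1.0] * len(anchors)
    for g, (x1, y1, x2, y2) in enumerate(gt_boxes):
        # 1) Spatial proximity: candidates are anchor points lying inside the target box.
        cands = [a for a, (x, y) in enumerate(anchors) if x1 <= x <= x2 and y1 <= y <= y2]
        # 2) Alignment score rewards anchors that are both confident and well localised.
        cands.sort(key=lambda a: scores[a][g] ** alpha * ious[a][g] ** beta, reverse=True)
        for a in cands[:top_k]:
            # 3) Conflict resolution: an anchor claimed twice keeps the target it overlaps most.
            if ious[a][g] > best_iou[a]:
                assigned[a], best_iou[a] = g, ious[a][g]
    return assigned


anchors = [(1, 1), (2, 2), (8, 8)]
gts = [(0, 0, 3, 3), (1.5, 1.5, 9, 9)]
scores = [[0.9, 0.1], [0.5, 0.8], [0.2, 0.9]]
ious = [[0.8, 0.0], [0.6, 0.7], [0.0, 0.9]]
print(task_aligned_assign(anchors, gts, scores, ious, top_k=2))  # [0, 1, 1]
```

Anchor 1 lies inside both targets. It ends up with target 1 because its IoU with that target (0.7) is higher than with target 0 (0.6), so each anchor is assigned to exactly one target.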
[0164] It should be noted that the parameter calibration function is used to optimize and adjust the model. Its core function is to characterize the degree of deviation between the model's prediction results and the real target information. In the target detection task, this parameter calibration function can be divided into a category determination error measurement unit and a location matching error measurement unit. The category determination error measurement unit adopts an error quantization form adapted to the category differentiation task, while the location matching error measurement unit adopts a composite error quantization form that integrates spatial overlap and distribution distance features.
[0165] S4: To verify model performance and quantitatively evaluate image quality, this invention constructs a comprehensive evaluation system. First, a model validation module tests the model on an independent validation dataset: it calculates the spatial overlap between the predictions and the ground-truth annotations and generates quantitative evaluation results, such as a comprehensive accuracy index, at one or more evaluation thresholds. The trained model is then applied to the medical images to be evaluated to quantify their image quality level. This process identifies the predetermined classes of abnormal regions in each image, quantifies their spatial distribution and density, and generates a quality evaluation visualization that marks each class of abnormality with a distinct visual identifier, so that the location, type, and distribution intensity of the abnormal regions are presented intuitively. The resulting heat map is shown in Figure 4 and provides a basis for decision-making in image quality control.
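The validation step in S4 (overlap between predictions and annotations, evaluated at one or more thresholds) can be sketched minimally as follows. The sketch assumes a single class, greedy matching in descending confidence order, and precision and recall standing in for the unspecified "comprehensive accuracy index". Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: greedy single-class matching; precision/recall stand in for the accuracy index.
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: Sequence[Tuple[Box, float]], gts: Sequence[Box],
                     thr: float) -> Tuple[float, float]:
    """Each prediction (highest confidence first) claims the unmatched ground-truth
    box with the largest IoU, provided that IoU is at least `thr`."""
    matched: List[int] = []
    tp = 0
    for box, _conf in sorted(preds, key=lambda p: p[1], reverse=True):
        best_g, best_iou = -1, thr
        for g, gt in enumerate(gts):
            o = iou(box, gt)
            if g not in matched and o >= best_iou:
                best_g, best_iou = g, o
        if best_g >= 0:
            matched.append(best_g)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((0, 0, 10, 8), 0.9), ((50, 50, 60, 60), 0.8)]
print({t: precision_recall(preds, gts, t) for t in (0.5, 0.75, 0.85)})
# {0.5: (0.5, 0.5), 0.75: (0.5, 0.5), 0.85: (0.0, 0.0)}
```

Sweeping several thresholds, as in the usage line, shows how a single prediction with IoU 0.8 counts as correct at loose thresholds and as a miss at strict ones.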
[0166] It should be noted that breast artifacts are classified into four classes, Class 0, Class 1, Class 2, and Class 3, which are explained in detail below:
[0167] Class 0: air leak artifacts, caused by incomplete contact between the transducer and the skin during the initial acquisition of breast volume ultrasound images. Air leak artifacts are visualized as shown in Figure 1(a);
[0168] Class 1: reverberation artifacts, caused by air leakage where the transducer does not fully adhere to the skin, which prevents sound waves from penetrating beyond the skin. Reverberation artifacts are visualized as shown in Figure 1(a);
[0169] Class 2: small air leak artifacts, which originate from uneven application of the coupling agent and possible air bubbles during acquisition. Small air leak artifacts are visualized as shown in Figure 1(b);
[0170] Class 3: small-target reverberation artifacts. Because the acoustic signal from the transducer in the area surrounding the nipple cannot penetrate the nipple, a small-target leakage artifact is generated in the area behind the nipple. Small-target reverberation artifacts are visualized as shown in Figure 1(b).
[0171] In summary, this invention, based on the YOLO model, achieves the goal of accurately identifying air leakage artifacts and reverberation artifacts in ultrasound images, thereby enabling quality control of 3D breast ultrasound images.
[0172] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.
[0173] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A fast and intelligent method for identifying ABUS image artifacts, characterized in that it comprises: S1: acquiring the 3D ABUS image data to be processed and preprocessing it to generate 2D images and structured label data containing category and location; S2: performing feature extraction based on the central network and efficiently fusing multi-level features based on the feature transfer network to generate lightweight enhanced features with rich semantics and accurate spatial information; S3: comprehensively evaluating the model's performance in classification, localization, and distribution prediction with a multi-component collaborative weight optimization function, and iteratively updating the network parameters through the backpropagation algorithm until convergence; S4: verifying model performance and quantitatively evaluating image quality to generate a quality evaluation visualization.
2. The method for rapid intelligent identification of ABUS image artifacts according to claim 1, characterized in that, In step S1, the 3D ABUS image data is preprocessed by performing the following operations: The 3D ABUS image data is analyzed, sliced along its depth dimension, and a set of 2D images is generated. The spatial mapping relationship between each slice and the original 3D data is maintained, and all 2D images are uniformly adjusted to the preset target resolution. The preprocessed 2D image is manually annotated to generate a segmentation mask for distinguishing different target regions. The segmentation mask is then converted into training labels required by the target detection model through a label generation process. Specifically, connected component analysis is performed on each category region in the segmentation mask to determine each independent target instance, and the corresponding bounding box information is calculated for each target instance; the coordinates of the bounding box are normalized and converted into relative coordinate representations independent of image size; the category identifier of each target is integrated with the normalized position information to generate structured label data containing category and position, which is input into the model along with the corresponding two-dimensional image to guide the supervised learning process of the model.
3. The method for rapid intelligent identification of ABUS image artifacts according to claim 2, characterized in that data augmentation is performed on minority-class samples, including but not limited to geometric transformations that change the spatial layout of the image and photometric transformations that adjust its visual attributes. The annotation information associated with each image is updated synchronously during the transformation so that the augmented image remains consistent with its labels.
4. The method for rapid intelligent identification of ABUS image artifacts according to claim 3, characterized in that, In step S2, feature extraction is performed based on the central network, and the following operations are performed: The multi-stage feature extraction architecture enables efficient hierarchical feature learning of the input data. This architecture begins with an initial downsampling unit, which gradually reduces the spatial dimension of the input image through a series of convolutional operations to extract basic low-level features. After each downsampling stage, a feature enhancement module is configured. This module adopts a branching processing path: one path directly passes the input features, while the other path performs complex nonlinear transformations through one or more deep feature extraction units. The outputs of the two paths are then fused to generate enhanced features with rich details and high-level semantics. At the highest level of feature blocks, a multi-scale pooling module is applied to aggregate global context information and enhance the representational power of the features.
5. The method for rapid intelligent identification of ABUS image artifacts according to claim 4, characterized in that, in step S2, multi-level features are efficiently fused based on the feature transfer network by performing the following operations: the detection head used for small artifact detection incorporates the … module, whose formula is shown below: (1); where the terms denote, in order, the input feature block, the feature block at the i-th scale, the number of scales, the output information layer, the dynamic interpolation fusion module, the multi-path coordinate adaptive weighting mechanism, and the multi-feature fusion module; the dynamic interpolation fusion module is implemented as follows: (2); where the terms denote, in order, the first feature block, the second feature block, a function that adjusts the number of channels of a feature block, and a function that adjusts the spatial size of a feature block; the multi-path coordinate adaptive weighting mechanism is implemented as follows: (3); where the terms denote, in order, the input feature block, the horizontal weight map, the vertical weight map, the channel weight map, a transformation function, and the transpose of the vertical weight map; the horizontal weight map is calculated as follows: (4); where the terms denote horizontal adaptive pooling and the convolutional layer that outputs the horizontal weights, respectively; the vertical weight map is calculated as follows: (5); where the terms denote vertical adaptive average pooling and the convolutional layer that outputs the vertical weights, respectively; the channel weight map is calculated as follows: (6); where the terms denote global adaptive average pooling and the convolutional layer that outputs the channel weights, respectively; the multi-feature fusion module is implemented as follows: (7); where the terms denote, in order, the low-resolution feature block, the high-resolution feature block, a convolutional layer, and an interpolation function; the network achieves feature interaction through a bidirectional information path: from top to bottom, high-level semantic features are upsampled and fused step by step with mid- and low-level features; from bottom to top, the fused features are downsampled to strengthen high-level semantics; in this process, the feature blocks first undergo multi-scale feature extraction, as shown in the following formula: (8); next, feature fusion is performed as follows: (9); where the operator denotes the multi-feature fusion module, which fuses features from adjacent scales; then, feature alignment is performed as follows: (10); (11); where the operator denotes an interpolation function that resizes all feature blocks to the same size; finally, a dynamic feature fusion module is introduced, which integrates multi-scale information through an adaptive weighting mechanism and uses efficient computational units internally to balance performance and efficiency, and which is given by: (12); where the operator denotes the dynamic interpolation fusion module, which fuses the features enhanced by the adaptive weighting mechanism with the original input features; the network thus outputs a set of lightweight enhanced features with rich semantics and accurate spatial information, which serve as high-quality input for subsequent detection tasks.
6. The method for rapid intelligent identification of ABUS image artifacts according to claim 5, characterized in that, in step S3, the performance of the model in classification, localization, and distribution prediction is comprehensively evaluated based on the multi-component collaborative weight optimization function by performing the following operations: the model predictions are matched with the ground-truth labels to determine the positive and negative samples used for supervised learning; the weight optimization function includes at least the following three core evaluation terms: first, a classification error evaluation term, which measures the deviation between the predicted class confidence and the target class; second, a localization error evaluation term, which is calculated through a composite metric strategy that not only integrates multiple spatial relationship metrics but also models the coordinate representation of the bounding box as a probability distribution and optimizes localization accuracy by calculating the difference between the predicted distribution and the true distribution; and third, a distribution consistency evaluation term, which ensures that the discrete probability distribution predicted by the model accurately fits the actual target location distribution; the weight optimization function combines the CIoU loss and the NWD loss, where CIoU is calculated as follows: (13); where the terms denote, in order, the intersection over union; the Euclidean distance between the center points of the predicted and ground-truth bounding boxes; the diagonal length of the smallest rectangle enclosing both boxes; a weighting coefficient; an aspect-ratio consistency index; the center-point coordinates of the predicted and ground-truth bounding boxes; the width and height of the predicted bounding box; and the width and height of the ground-truth bounding box; the weight optimization function for the CIoU loss is: (14); the NWD (Normalized Wasserstein Distance) loss is a bounding box loss based on the Wasserstein distance and is better suited to small and occluded targets, and it is calculated as follows: (15); where the first two terms denote the predicted and ground-truth bounding boxes, respectively, the next term denotes the set of all joint probability distributions between them, and the last denotes the Euclidean distance; the weight optimization function for the NWD loss is: (16); the CIoU loss and the NWD loss are combined to form a hybrid loss function: (17); where the coefficient is a weighting factor, set to 0.7 in this invention, i.e.: (18); the above evaluation terms are weighted and aggregated according to preset weight coefficients to form the overall optimization objective, based on which the model iteratively updates the network parameters through the backpropagation algorithm until convergence.
7. The method for rapid intelligent identification of ABUS image artifacts according to claim 6, characterized in that, In step S3, a collaborative optimization of classification and localization is performed based on a task-aligned sample allocation strategy, and the following operations are performed: Candidate positive samples are selected based on spatial proximity. Each candidate sample is scored using a comprehensive evaluation index that integrates classification confidence and localization accuracy. Based on this index, the optimal candidate sample is selected as the positive sample for each real target. Conflict resolution rules are applied to handle cases where a sample matches multiple targets, ensuring the uniqueness of the assignment and guiding the model to prioritize learning samples that perform well in both classification and localization, thus enhancing the consistency between the two core tasks.
8. The method for rapid intelligent identification of ABUS image artifacts according to claim 7, characterized in that, In step S4, the model performance is verified and the image quality is quantitatively evaluated, a quality evaluation visualization is generated, and the following operations are performed: The model is tested for performance using a model validation module and an independent validation dataset. This test calculates the spatial overlap between the predicted results and the ground truth labels and generates a quantitative evaluation result based on one or more evaluation thresholds. The trained model is applied to the medical images to be evaluated to perform image quality level quantification. This process generates a quality assessment visualization by identifying and quantifying the spatial distribution and density of various predetermined abnormal regions in the image. The visualization uses differentiated visual identifiers to distinguish different types of anomalies, intuitively presenting the location, type, and distribution intensity of abnormal areas in the image, providing a basis for decision-making in image quality control.
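Claims 2 and 3 describe two steps: converting annotated segmentation masks into normalised box labels through connected-component analysis, and augmenting minority-class samples while keeping their labels in sync. A minimal Python sketch of both, under several assumptions: 4-connectivity, a hypothetical mask convention in which 0 is background and a value k ≥ 1 marks artifact class k − 1 (Class 0 to Class 3), a YOLO-style (class, cx, cy, w, h) label layout, and a horizontal flip as the example geometric transform. All function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

# ASSUMPTIONS: 4-connectivity; mask value k >= 1 means class k - 1; YOLO-style labels.
Label = Tuple[int, float, float, float, float]  # (class_id, cx, cy, w, h), all normalised


def mask_to_labels(mask: List[List[int]]) -> List[Label]:
    """Produce one normalised bounding-box label per connected instance in the mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    labels: List[Label] = []
    for i in range(h):
        for j in range(w):
            cls = mask[i][j]
            if cls == 0 or seen[i][j]:
                continue
            # Breadth-first search over one connected instance of this class.
            queue = deque([(i, j)])
            seen[i][j] = True
            y0 = y1 = i
            x0 = x1 = j
            while queue:
                y, x = queue.popleft()
                y0, y1, x0, x1 = min(y0, y), max(y1, y), min(x0, x), max(x1, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and mask[ny][nx] == cls:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Pixel-inclusive box -> centre and size relative to the image size.
            labels.append((cls - 1, (x0 + x1 + 1) / (2 * w), (y0 + y1 + 1) / (2 * h),
                           (x1 - x0 + 1) / w, (y1 - y0 + 1) / h))
    return labels


def hflip(image: List[List[float]], labels: List[Label]) -> Tuple[List[List[float]], List[Label]]:
    """Geometric augmentation: mirror left-right and update each box centre to match."""
    return [row[::-1] for row in image], [(c, 1.0 - cx, cy, bw, bh) for c, cx, cy, bw, bh in labels]


mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 2],
        [0, 0, 0, 2]]
labels = mask_to_labels(mask)
print(labels)  # [(0, 0.25, 0.25, 0.5, 0.5), (1, 0.875, 0.75, 0.25, 0.5)]
print(hflip([[0.0] * 4 for _ in range(4)], labels)[1][1])  # (1, 0.125, 0.75, 0.25, 0.5)
```

A photometric transform such as a brightness change would leave the labels untouched. Geometric transforms such as the flip must rewrite the box coordinates, which is the synchronisation that claim 3 requires.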