Breast Tumor AI-Assisted Diagnosis Method and System

By combining multimodal ultrasound image acquisition and an AI-assisted diagnostic system with deep learning models and pathological knowledge, the invention addresses subjective variability in breast tumor diagnosis, aiming at objective, accurate, and efficient diagnosis and supporting self-optimization of the system.

CN122290948A · Pending · Publication Date: 2026-06-26 · HANGZHOU BURAN TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
HANGZHOU BURAN TECH CO LTD
Filing Date
2026-03-10
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Current breast tumor diagnosis relies on doctors' subjective experience, which leads to discrepancies and inconsistencies in interpretation. The lack of objective and quantitative judgment criteria results in high false positive and false negative rates and low diagnostic efficiency.

Method used

The system uses a multimodal ultrasound host to acquire B-mode, elastography, and Doppler images. It then uses an AI inference engine to perform image registration and feature extraction. Combined with a deep learning model optimized based on pathological knowledge, it provides objective diagnostic suggestions and supports interactive corrections by doctors, establishing a data loop for model optimization.

Benefits of technology

It has achieved standardization and consistency in the diagnosis of breast tumors, improved diagnostic accuracy and efficiency, reduced false positive and false negative rates, supported continuous learning and optimization, and improved diagnostic reliability through human-machine collaboration.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122290948A_ABST

Abstract

This invention provides an AI-assisted diagnostic method and system for breast tumors. Through an integrated hardware and software design, a multimodal-fusion AI algorithm, and an optimization strategy based on pathological knowledge, it addresses the core shortcomings of existing techniques, namely strong subjectivity, low efficiency, and poor consistency, and provides a complete technical path toward accurate, efficient, and standardized ultrasound diagnosis of breast tumors.

Description

Technical Field

[0001] This invention relates to the field of intelligent medical technology, and in particular to an AI-assisted diagnostic method and system for breast tumors, an electronic device, and a computer-readable storage medium.

Background Technology

[0002] Breast cancer is one of the most common malignant tumors in women worldwide, and its early and accurate diagnosis is crucial for treatment planning and patient prognosis. Currently, clinical diagnosis of breast cancer relies mainly on breast ultrasound imaging. The standard diagnostic procedure typically includes the following three key parts: 1. B-mode ultrasound imaging: provides morphological information about the tumor, such as whether the margins are smooth or irregular (e.g., spiculated or lobulated), whether the internal echoes are homogeneous, whether the aspect ratio (depth to width) is greater than 1, and whether there are microcalcifications. Doctors use these characteristics to subjectively assign the tumor a BI-RADS category (0-6), which is the core basis for assessing the risk of malignancy.

[0003] 2. Elastography analysis: this includes strain elastography and shear wave elastography. Pressure or acoustic radiation force is applied to the tissue, and the resulting deformation or shear wave speed is measured, which indirectly reflects tissue stiffness. Malignant tumors are typically harder, exhibiting a high elasticity score (e.g., 4-5 on the Tsukuba 5-point scale) or an increased shear wave speed / Young's modulus. The procedure consists of applying appropriate pressure, detecting elasticity, and feeding back the deformation; the stiffness is then overlaid onto the B-mode image as a color map (a stiffness color scale of varying shades).

[0004] 3. Color Doppler flow imaging: Detects blood flow signals inside and around the tumor. Malignant masses are usually metabolically active, which stimulates the formation of new blood vessels, resulting in "rich blood flow signals" and a potentially high blood flow resistance index (RI).
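The elastography and Doppler descriptions above rest on two standard conversions that the text does not spell out. As a minimal sketch (not the patented implementation), shear wave speed c can be converted to Young's modulus with E ≈ 3ρc², which assumes nearly incompressible soft tissue with an assumed density ρ ≈ 1000 kg/m³, and the Doppler resistance index is the Pourcelot ratio RI = (PSV - EDV) / PSV of peak systolic and end-diastolic velocities. Function names and example values are illustrative assumptions.

```python
# Sketch of two standard quantitative indicators (not taken from the patent).
# Assumption: soft tissue is nearly incompressible, so E ≈ 3 * rho * c_s^2.

TISSUE_DENSITY_KG_M3 = 1000.0  # assumed typical soft-tissue density


def youngs_modulus_kpa(shear_wave_speed_m_s: float,
                       density_kg_m3: float = TISSUE_DENSITY_KG_M3) -> float:
    """Convert shear wave speed (m/s) to Young's modulus (kPa)."""
    if shear_wave_speed_m_s < 0:
        raise ValueError("shear wave speed must be non-negative")
    shear_modulus_pa = density_kg_m3 * shear_wave_speed_m_s ** 2  # mu = rho * c^2
    return 3.0 * shear_modulus_pa / 1000.0  # E ≈ 3 * mu, converted from Pa to kPa


def resistance_index(peak_systolic: float, end_diastolic: float) -> float:
    """Pourcelot resistance index RI = (PSV - EDV) / PSV."""
    if peak_systolic <= 0:
        raise ValueError("peak systolic velocity must be positive")
    return (peak_systolic - end_diastolic) / peak_systolic


if __name__ == "__main__":
    print(youngs_modulus_kpa(5.0))                   # 75.0 (kPa) for c_s = 5 m/s
    print(round(resistance_index(0.30, 0.066), 2))   # 0.78
```

A shear wave speed of 5 m/s thus corresponds to roughly 75 kPa under these assumptions; scanners may report either quantity.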

[0005] The aforementioned existing technologies suffer from the following core pain points: Current diagnostic procedures heavily rely on the subjective experience of ultrasound physicians for image interpretation and comprehensive scoring. Interpretations can vary between physicians (inter-observer variability), and even the same physician may exhibit inconsistencies at different times (intra-observer variability). Manually measuring tumor size, assessing elasticity, and delineating blood flow areas is not only inefficient and prolongs examination time for individual patients, but more importantly, it lacks standardized, objective, and quantifiable judgment criteria. Although the BI-RADS classification provides a framework, subjective gray areas remain regarding specific aspects such as the degree of edge spiculation, the definition of elasticity color intensity, and the threshold for "rich" blood flow. This subjectivity can lead to unnecessary biopsies (false positives) or delayed diagnosis of malignant lesions (false negatives).

[0006] Therefore, there is an urgent need for an auxiliary tool that can integrate multimodal ultrasound information and provide objective, accurate, and efficient interpretation in order to standardize the diagnostic process and improve the consistency and accuracy of diagnosis.

Summary of the Invention

[0007] To address the technical problems in the prior art, the present invention provides the following technical solution. In one aspect, an AI-assisted diagnostic system for breast tumors is provided, including: a multimodal ultrasound host, used to acquire B-mode images, elastography images, and Doppler blood flow images of the breast; a central control and preprocessing server, used to register and preprocess the multimodal images; an AI inference engine, used to perform tumor segmentation, feature extraction, and classification prediction on the registered multimodal images based on a trained deep learning model; a physician diagnostic terminal, used to display AI analysis results and support interactive corrections by doctors; a central control and data management server, used to manage patient information, task scheduling, and report generation; and a cloud-based model management platform, used for model version management, data anonymization and aggregation, and model iteration and optimization.

[0008] Preferably, the multimodal ultrasound host supports shear wave elastography and color Doppler imaging, and can output raw radio frequency data or baseband IQ data.

[0009] Preferably, the central control and preprocessing server includes a GPU computing unit for performing non-rigid registration of multimodal images and AI model inference.

[0010] Preferably, the AI inference engine includes a dual-stream fusion network comprising a shared encoder, independent encoders, a cross-modal attention fusion layer, a segmentation head, and a classification head.

[0011] Preferably, the system further includes a pathological knowledge optimization module, which integrates pathological slide features into the ultrasound AI model through knowledge distillation to improve the model's recognition accuracy.

[0012] In another aspect, an AI-assisted diagnostic method for breast tumors is provided, implemented based on the system described above and including the following steps: acquiring multimodal ultrasound images of the breast, including B-mode images, elastography images, and Doppler flow images; registering and preprocessing the multimodal images to align them spatially; inputting the registered images into a trained AI model for tumor segmentation, feature extraction, and classification prediction; presenting the AI analysis results to doctors in a visual format and supporting interactive corrections; and generating a structured diagnostic report based on the doctor's confirmation.

[0013] Preferably, the AI model is a two-stream fusion network, including a shared encoder, independent encoders, a cross-modal attention fusion layer, a segmentation decoder, and a classification head.

[0014] Preferably, the method further includes optimizing the AI model using pathological slide images and constraining the alignment of ultrasound features with pathological features through a knowledge distillation loss function.
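The description does not give the form of the knowledge distillation loss. One common choice for feature-level distillation, assumed here purely for illustration, penalizes 1 minus the cosine similarity between the ultrasound (student) feature vector and the feature vector of a frozen pathology-slide (teacher) model, and adds it to the task loss with a weight λ. The sketch assumes both vectors have already been projected to the same dimension; `kd_weight` is a hypothetical hyperparameter.

```python
import math
from typing import Sequence


def cosine_alignment_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """1 - cosine similarity between an ultrasound (student) feature vector
    and a frozen pathology (teacher) feature vector of the same length."""
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    if norm_s == 0 or norm_t == 0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_s * norm_t)


def total_loss(task_loss: float, student: Sequence[float],
               teacher: Sequence[float], kd_weight: float = 0.5) -> float:
    """Task loss (e.g. segmentation + classification) plus the weighted
    distillation term; kd_weight is a hypothetical hyperparameter."""
    return task_loss + kd_weight * cosine_alignment_loss(student, teacher)
```

Cosine alignment ignores feature magnitude; a mean-squared-error penalty on the projected features would be an equally plausible alternative.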

[0015] Preferably, the multimodal image registration adopts a B-spline-based free-form deformation (FFD) model and uses normalized mutual information as the similarity measure.
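Normalized mutual information, in the form widely used for medical image registration, is NMI(A, B) = (H(A) + H(B)) / H(A, B), where H is the Shannon entropy of the intensity histogram. It ranges from 1 (statistically independent images) to 2 (identical images). A minimal pure-Python sketch over flattened, already-resampled images follows; the equal-width binning and the default of 32 bins are assumptions, not values from the description.

```python
import math
from collections import Counter
from typing import List, Sequence


def _bin(values: Sequence[float], bins: int) -> List[int]:
    """Quantize intensities into equal-width bins over their own range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    scale = bins / (hi - lo)
    return [min(int((v - lo) * scale), bins - 1) for v in values]


def _entropy(counts: Counter, n: int) -> float:
    """Shannon entropy in bits from a histogram of counts."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def normalized_mutual_information(a: Sequence[float], b: Sequence[float],
                                  bins: int = 32) -> float:
    """NMI = (H(A) + H(B)) / H(A, B) for two flattened images of equal size."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and have the same number of pixels")
    qa, qb = _bin(a, bins), _bin(b, bins)
    n = len(qa)
    h_a = _entropy(Counter(qa), n)
    h_b = _entropy(Counter(qb), n)
    h_ab = _entropy(Counter(zip(qa, qb)), n)  # joint histogram of intensity pairs
    if h_ab == 0:
        return 2.0  # both images constant: treated as perfectly aligned (assumption)
    return (h_a + h_b) / h_ab
```

In B-spline FFD registration, an optimizer would adjust the control-point displacements to maximize this score between the B-mode image and the warped elastography or Doppler image.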

[0016] Preferably, the method further includes establishing an "image-pathology" data closed loop, continuously collecting clinical pathology data and using iterative optimization of the model to achieve self-evolution of the diagnostic system.

[0017] In a further aspect, an electronic device is provided, comprising: a processor; and a memory storing computer-readable instructions, which, when executed by the processor, implement the method described above.

[0018] In a further aspect, a computer-readable storage medium is provided, wherein at least one instruction is stored therein, the at least one instruction being loaded and executed by a processor to implement the above method.

[0019] The beneficial effects of the technical solutions provided in the embodiments of the present invention include at least the following: 1. Objectification and standardization to reduce subjective differences: this solution provides quantitative morphological, elastic, and blood flow feature values and probabilistic diagnostic suggestions through AI models. It transforms traditional descriptions that rely on doctors' subjective experience (such as "slightly rough edges" or "rich blood flow") into objective numerical indicators (such as "spiculation index = 1.8" or "blood flow area ratio = 35%"), which greatly reduces interpretation differences between doctors and between readings by the same doctor at different times, and promotes the standardization of breast ultrasound diagnosis.

[0020] 2. Multimodal Deep Information Fusion for Enhanced Diagnostic Accuracy: Traditional methods rely on doctors to synthesize information from three images in their minds, which can easily lead to overlooking certain aspects. This solution utilizes a deep learning model to achieve automated and intelligent deep fusion of ultrasound, elasticity, and blood flow information at the pixel level, enabling the discovery of cross-modal correlation patterns that are difficult for the human eye to detect. Combined with optimization based on pathological knowledge, the model learns features that more closely approximate the essence of the disease, significantly improving the ability to identify early-stage and atypical malignant tumors, and potentially reducing false negative and false positive rates.

[0021] 3. Significantly improve diagnostic efficiency and optimize workflow: AI models automatically complete tumor localization, segmentation, measurement, and feature analysis within seconds, freeing doctors from tedious manual measurements, image editing, and descriptive text input. One-click generation of structured reports reduces report writing time from minutes to tens of seconds, allowing doctors to focus more on image interpretation and patient communication, thereby improving the overall throughput of departmental diagnosis and treatment.

[0022] 4. Achieving Evolvable Intelligent Diagnosis: Unlike traditional static diagnostic rules or scales, this solution establishes a closed-loop "clinical-pathological" data system, enabling the system to continuously optimize itself using the latest diagnostic gold standards. This means that the system's diagnostic capabilities can continuously grow with the accumulation of hospital cases, always keeping pace with the latest clinical practice and pathological understanding, and possessing long-term viability.

[0023] 5. Human-Machine Collaboration (Empowering, Not Replacing, Doctors): The system design is always centered on the doctor. AI provides rapid, objective "second opinions" and quantitative evidence, but the final diagnostic authority remains with the doctor. The interactive interface allows doctors to easily correct AI results and make decisions based on more comprehensive clinical information. This collaborative model leverages the computational advantages of AI while respecting and integrating the clinical wisdom and responsibility of doctors, making it more easily accepted and promoted in clinical practice.

Attached Figure Description

[0024] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0025] Figure 1 is a hardware system composition diagram provided in an embodiment of the present invention; Figure 2 is a schematic diagram of the software service architecture provided in an embodiment of the present invention; Figure 3 is a schematic diagram of the mechanism of the AI-assisted diagnosis model for breast tumors provided in an embodiment of the present invention; Figure 4 is a schematic diagram of the method flow provided in an embodiment of the present invention.

Detailed Implementation

[0026] The technical solution of the present invention will now be described with reference to the accompanying drawings.

[0027] In embodiments of the present invention, words such as "exemplarily," "for example," etc., are used to indicate that something is an example, illustration, or description. Any embodiment or design described as "exemplary" in the present invention should not be construed as being more preferred or advantageous than other embodiments or designs. Specifically, the use of the word "exemplary" is intended to present the concept in a concrete manner. Furthermore, in embodiments of the present invention, the meaning expressed by "and / or" can be both, or either one.

[0028] In the embodiments of this invention, the terms "image" and "picture" may sometimes be used interchangeably. It should be noted that, without emphasizing the distinction between them, they convey the same meaning. Similarly, the terms "of," "corresponding (relevant)," and "corresponding" may sometimes be used interchangeably. It should be noted that, without emphasizing the distinction between them, they convey the same meaning.

[0029] In the embodiments of the invention, a subscripted symbol such as W₁ may sometimes be written in non-subscript form such as W1. When the difference is not emphasized, the two have the same meaning.

[0030] To make the technical problems, technical solutions and advantages of the present invention clearer, a detailed description will be given below in conjunction with the accompanying drawings and specific embodiments.

[0031] This solution aims to build an artificial intelligence-based breast tumor auxiliary diagnostic system to address the aforementioned issues.

[0032] I. Overall System Architecture Introduction
This system, the "Multimodal AI-Assisted Diagnostic System for Breast Tumors," is a deeply integrated hardware and software solution designed to automate and integrate the entire process of B-mode ultrasound, elastography, and Doppler data acquisition, AI analysis, and structured report generation.

[0033] (I) System Design Objectives
1. Multimodal data fusion: seamlessly acquire and synchronize B-mode images, elasticity maps, color Doppler images, and raw RF / baseband IQ data.

[0034] 2. End-to-end AI analysis: develop deep learning models that automatically identify tumor regions and extract quantitative indicators integrating morphological, stiffness, and blood flow characteristics.

[0035] 3. Objective grading and risk assessment: based on the features extracted by AI, output objective BI-RADS category suggestions, the probability of malignancy, and descriptions of key features.

[0036] 4. Human-machine collaborative workflow: Present AI analysis results to doctors in an intuitive and interactive way to assist them in making final decisions and generate structured reports with one click.

[0037] 5. Continuous learning and optimization: The system supports iterative optimization of the AI model using subsequently acquired pathological gold-standard data to form a diagnostic closed loop.
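Objective 3 leaves open how a malignancy probability becomes a BI-RADS suggestion. One plausible mapping, assumed here and not stated in the description, reuses the likelihood-of-malignancy ranges of the ACR BI-RADS Atlas (5th edition): category 3 at ≤2%, 4A above 2% up to 10%, 4B above 10% up to 50%, 4C above 50% and below 95%, and 5 at ≥95%. Categories 0, 2, and 6 depend on clinical context and are left to the physician in this sketch; in practice the model's probabilities would also need calibration before such thresholds are meaningful.

```python
def birads_suggestion(p_malignant: float) -> str:
    """Map a malignancy probability (0-1) to a suggested BI-RADS category using
    the ACR BI-RADS 5th-edition likelihood ranges (an assumed mapping).
    Categories 0, 2 and 6 are not derived from a probability here."""
    if not 0.0 <= p_malignant <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p_malignant <= 0.02:
        return "3"
    if p_malignant <= 0.10:
        return "4A"
    if p_malignant <= 0.50:
        return "4B"
    if p_malignant < 0.95:
        return "4C"
    return "5"
```

The returned string is only a suggestion displayed next to the probability; the final category is confirmed by the physician, consistent with objective 4.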

[0038] (II) Hardware System Composition and Data Communication
The core of the hardware system is an enhanced breast ultrasound diagnostic workstation, whose architecture is shown in Figure 1.

[0039] Detailed introduction of the hardware system: 1. Multimodal ultrasound host: Core component: a high-end ultrasound host that supports B-mode imaging, shear wave elastography, and color / power Doppler imaging.

[0040] Key components: Ultrasound probes: covering a frequency range (e.g., 5-14 MHz) suitable for high-quality B-mode and elastography imaging.

[0041] High-performance beamformers and digital signal processors enable fast, high-resolution image reconstruction, which is especially important for shear wave elastography because it requires multiple transmissions in rapid succession.

[0042] Data Interface: Provides output ports for raw RF data and demodulated baseband IQ data, which forms the basis for subsequent advanced AI feature extraction, superior to using only compressed display images.

[0043] Communication method: The ultrasound probe is connected to the host via a dedicated coaxial cable. Inside the host, each processing module communicates with the central processing unit via a high-speed backplane bus (such as PCIe). The host sends synchronized, timestamp-aligned B-mode, elasticity (Young's modulus map), and Doppler (blood flow velocity / energy map) data streams to the computing unit via a 10 Gigabit fiber optic network card or a dedicated PCIe expansion card.

[0044] 2. Central control and preprocessing server: Core: A server equipped with a high-performance GPU (such as NVIDIA RTX A6000) is deployed locally in the ultrasound room.

[0045] Function: Data reception and buffering: Real-time reception of multimodal data streams from the ultrasound host.

[0046] Data preprocessing: image standardization (grayscale normalization), noise reduction, and, crucially, multimodal image registration. Because B-mode, elastography, and Doppler images may be shifted relative to one another at the pixel level by respiration or slight movement, precise alignment using an affine transformation or an elastic (non-rigid) registration algorithm is essential to ensure that the three types of information correspond accurately at the same anatomical location.

[0047] AI model inference: Run the trained AI-assisted diagnosis model for breast tumors to perform real-time or near-real-time analysis on the preprocessed image sequences.

[0048] Communication method: It exchanges data at high speed with its own CPU / memory via the PCIe bus. It also communicates with the central server and diagnostic terminal via a network.

[0049] 3. Central control and data management server: Core: Servers deployed on the departmental network run centralized software services.

[0050] Function: Patient information management: It interfaces with the hospital information system (HIS) and the picture archiving system (PACS) to obtain basic patient information and historical records.

[0051] Task scheduling: assign examination tasks to the corresponding ultrasound host-AI computing unit pairs.

[0052] Results storage and forwarding: Receive the preliminary analysis results from the AI unit and store them in the PACS along with the original images, while simultaneously pushing them to the physician's diagnostic terminal.

[0053] Workflow Engine: Manages the entire process from application to report issuance.

[0054] Communication method: Interact with all terminals and devices via the hospital's local area network.

[0055] 4. Physician Diagnostic Terminal: Core: A computer workstation equipped with a high-resolution medical monitor and dedicated diagnostic software.

[0056] Functions: Allows ultrasound doctors to operate ultrasound equipment, view AI analysis results, make interactive corrections, and generate and issue reports.

[0057] Communication method: Tasks and results are obtained from the central server via a local area network.

[0058] 5. Cloud-based model management and training platform (optional but important): Core: A service platform deployed outside the hospital firewall or on the hospital's private cloud.

[0059] Function: Model version management: Store and manage different versions of AI models.

[0060] Secure data anonymization and aggregation: with patients' informed consent and ethics committee approval, diagnostic data and the corresponding pathology results are de-identified and then collected.

[0061] Model retraining and optimization: Using newly collected pathological annotation data, the model is iteratively optimized periodically or triggered to generate an improved version, which is then securely distributed to the edge AI unit for updates.

[0062] Communication method: limited, secure data synchronization and model updates with the hospital's internal systems via an encrypted VPN or dedicated line.

[0063] (III) Software System Architecture and Interaction
As shown in Figure 2, the software system adopts a microservice architecture, and its main modules include: 1. Device Driver and Data Acquisition Service: controls the ultrasound host to synchronously acquire multimodal raw data. Deployment location: an embedded system built into the multimodal ultrasound host, or an external control unit; it directly controls the scanning parameters of the ultrasound probe (frequency, depth, etc.), synchronously acquires raw B-mode, elastography, and Doppler data, and transmits them to the preprocessing server via high-speed interfaces (such as PCIe or 10 Gigabit fiber).

[0064] 2. Data Preprocessing and Registration Service: Implements the image standardization and registration algorithms described above. Deployment Location: Central control and preprocessing server (local server equipped with GPU), utilizing server GPU resources for image standardization, noise reduction, and non-rigid registration calculations to ensure spatial alignment of multimodal images.

[0065] 3. AI Inference Engine Service: This service encapsulates deep learning models and provides RESTful APIs or gRPC interfaces for invocation. It receives registered multimodal image patches and returns tumor segmentation masks, feature vectors, and classification results. Deployment location: GPU-accelerated computing unit (connected to the preprocessing server via a PCIe bus); it loads deep learning models (such as dual-stream fusion networks), performs real-time inference on the registered images, and outputs segmentation masks and classification results.

[0066] 4. Feature Analysis and Report Generation Service: this service converts the feature vectors output by the AI into clinically readable descriptions (e.g., "spiculated edges, elasticity score of 5, rich blood flow RI=0.78") and inserts them into a structured report template. Deployment location: central control and data management server (department-level server). It receives the AI inference results, converts the feature vectors into clinical descriptions, and calls the hospital's PACS / HIS interface to generate a structured report.

[0067] 5. Human-Computer Interaction Client Software: provides a graphical interface that displays B-mode, elastography, and Doppler images side by side or fused.

[0068] The tumor region is automatically segmented by AI and displayed by overlaying heat maps or outlines.

[0069] The sidebar displays the quantitative features extracted by AI, BI-RADS classification suggestions, and the probability of malignancy (0-100%).

[0070] Doctors can drag sliders to adjust confidence thresholds, manually modify segmentation contours, and select different diagnostic opinions.

[0071] Generate reports with one click and send them to PACS / HIS.

[0072] Deployment location: Physician diagnostic terminal (high-resolution medical monitor workstation); Interaction method: Receives AI analysis results pushed by the central server via local area network, and provides a visual interface for doctors to adjust and confirm diagnostic conclusions.

[0073] 6. Model Management Platform Service: Responsible for model deployment, version control, performance monitoring, and synchronization with the cloud platform. Deployment locations can be as follows: Local end: Central control server (responsible for model deployment, version switching and performance monitoring); Cloud: Hospital private cloud or third-party secure cloud platform (responsible for model training, optimization and version management).
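Modules 4 and 5 turn AI outputs into report text, and paragraph [0019] quotes indicators such as "blood flow area ratio = 35%" without defining them. The sketch below assumes the ratio is the fraction of tumor-mask pixels that carry a Doppler flow signal, treats the spiculation index as a value already produced by the model, and uses hypothetical cut-offs; none of these definitions or thresholds come from the description.

```python
from typing import Dict, List

Mask = List[List[int]]  # 2-D binary mask, 1 = pixel belongs to the region


def flow_area_ratio(tumor_mask: Mask, flow_mask: Mask) -> float:
    """Fraction of tumor pixels that also show a Doppler flow signal.
    Both masks are assumed to be registered and of the same shape."""
    tumor_pixels = flow_pixels = 0
    for tumor_row, flow_row in zip(tumor_mask, flow_mask):
        for in_tumor, has_flow in zip(tumor_row, flow_row):
            if in_tumor:
                tumor_pixels += 1
                if has_flow:
                    flow_pixels += 1
    if tumor_pixels == 0:
        raise ValueError("tumor mask is empty")
    return flow_pixels / tumor_pixels


# Hypothetical cut-offs used only for this illustration.
SPICULATION_CUTOFF = 1.5
RICH_FLOW_CUTOFF = 0.30


def describe(features: Dict[str, float]) -> str:
    """Convert numeric features into a report phrase."""
    margin = ("spiculated edges" if features["spiculation_index"] > SPICULATION_CUTOFF
              else "smooth edges")
    flow = "rich" if features["flow_area_ratio"] >= RICH_FLOW_CUTOFF else "sparse"
    return (f"{margin}, elasticity score of {features['elasticity_score']:.0f}, "
            f"{flow} blood flow ({features['flow_area_ratio']:.0%}), RI={features['ri']:.2f}")
```

For example, features {spiculation_index: 1.8, elasticity_score: 5, flow_area_ratio: 0.35, ri: 0.78} render as "spiculated edges, elasticity score of 5, rich blood flow (35%), RI=0.78".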

[0074] The software's control logic over the hardware is as follows: 1. The doctor selects a patient and begins the examination at the terminal; 2. The terminal software sends instructions to the data acquisition service through the central server to start the B-mode, elastography, and Doppler scanning sequences of the designated ultrasound host; 3. The ultrasound host begins data acquisition and streams the raw data to the preprocessing service via the data acquisition service; 4. After the preprocessing service completes the registration, it calls the AI inference engine service; 5. The AI inference engine service loads the model, performs inference on the GPU computing unit, and returns the results; 6. The feature analysis service processes the results and pushes them to the physician's terminal software for real-time display via the central server; 7. After the doctor confirms or modifies the report, the terminal software calls the report generation service, and the final report is stored and sent.
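The seven-step control logic above is essentially an ordered chain of service calls. A minimal sketch, assuming each service is exposed as a plain Python callable (in the described system these would be REST or gRPC calls), times each stage and checks the total against the 2-second acquisition-to-feedback target given for the system. The stage names and stub services are hypothetical.

```python
import time
from typing import Any, Callable, List, Tuple

LATENCY_BUDGET_S = 2.0  # acquisition-to-feedback target stated in the description

Stage = Tuple[str, Callable[[Any], Any]]


def run_examination(stages: List[Stage], payload: Any = None):
    """Run the service calls in order, feeding each output into the next stage,
    and report per-stage wall-clock time and whether the total meets the budget."""
    timings = []
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings.append((name, time.perf_counter() - start))
    total = sum(seconds for _, seconds in timings)
    return payload, timings, total <= LATENCY_BUDGET_S


# Hypothetical stand-ins for acquisition, registration, GPU inference,
# and feature analysis; each just passes a dict along.
DEMO_STAGES: List[Stage] = [
    ("acquire", lambda _: {"frames": 3}),
    ("preprocess_and_register", lambda d: {**d, "registered": True}),
    ("infer", lambda d: {**d, "p_malignant": 0.62}),
    ("analyze_features", lambda d: {**d, "summary": "spiculated edges"}),
]

if __name__ == "__main__":
    result, timings, within_budget = run_examination(DEMO_STAGES)
    print(result["p_malignant"], within_budget)  # 0.62 True
```

Per-stage timings make it easy to see which service dominates the latency budget when tuning the streaming and parallel computing setup.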

[0075] Specifically:
(1) Data acquisition and transmission process
Triggering and initiation: The doctor selects the patient and initiates the examination on the physician diagnostic terminal. The client software sends instructions to the device driver service through the central server to start the multimodal scanning sequence (B-mode, elastography, Doppler) of the ultrasound host.

[0076] Raw data acquisition and synchronization: The ultrasound host controls the probe to acquire data, generates time-stamped B-mode images, elasticity maps (Young's modulus values), and Doppler blood flow maps, and pushes them to the preprocessing server in real time via 10 Gigabit fiber / PCIe.
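The description says the streams are time-stamped but not how frames from different modalities are paired. One simple approach, assumed here, attaches to each B-mode frame the nearest elastography and Doppler frames within a tolerance and drops B-mode frames that have no partner. The 20 ms tolerance and the (timestamp, frame id) tuple format are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Frame = Tuple[float, str]  # (timestamp in seconds, frame id); lists sorted by time


def nearest_frame(frames: List[Frame], t: float, tolerance: float) -> Optional[str]:
    """Id of the frame whose timestamp is closest to t, or None if none is within tolerance."""
    times = [ts for ts, _ in frames]
    i = bisect.bisect_left(times, t)
    best: Optional[Tuple[float, str]] = None
    for j in (i - 1, i):  # the closest frame is either just before or just after t
        if 0 <= j < len(frames):
            dt = abs(frames[j][0] - t)
            if dt <= tolerance and (best is None or dt < best[0]):
                best = (dt, frames[j][1])
    return None if best is None else best[1]


def pair_with_b_mode(b_mode: List[Frame], elasto: List[Frame], doppler: List[Frame],
                     tolerance: float = 0.02) -> List[Dict[str, str]]:
    """Attach the nearest elastography and Doppler frames to each B-mode frame."""
    triples = []
    for t, b_id in b_mode:
        e_id = nearest_frame(elasto, t, tolerance)
        d_id = nearest_frame(doppler, t, tolerance)
        if e_id is not None and d_id is not None:  # drop incomplete triples
            triples.append({"b_mode": b_id, "elasto": e_id, "doppler": d_id})
    return triples
```

Elastography usually runs at a lower frame rate than B-mode, so dropping unpaired B-mode frames is a deliberate trade-off in this sketch rather than an error.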

[0077] (2) Data processing and AI inference process
Preprocessing and registration: The data preprocessing service performs grayscale normalization and noise reduction on the received multimodal data, and aligns the elasticity map and Doppler map to the anatomical coordinate system of the B-mode image through a B-spline-based non-rigid registration algorithm.

[0078] AI model inference: Registered image patches are transmitted to the GPU-accelerated computing unit via the PCIe bus. The AI inference engine service loads the model and performs forward propagation. The shared encoder extracts low-level features, while the independent encoders generate modality-specific high-level features (such as ultrasound morphology features and elastic stiffness features). The cross-modal attention layer dynamically weights and fuses these features, and the fused features are used to output the tumor segmentation probability map and the classification results (BI-RADS grade and malignancy probability).
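The description does not specify the attention mechanism. As a simplified illustration only, the sketch below uses scaled dot-product attention in which a query vector (learned, in a real network) scores each modality's pooled feature vector, a softmax turns the scores into per-modality weights, and the fused feature is the weighted sum. A real implementation would operate on spatial feature maps with learned projections inside a deep learning framework; the vector sizes and names here are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query: Sequence[float],
                   modality_features: Dict[str, Vector]) -> Tuple[Vector, Dict[str, float]]:
    """Scaled dot-product attention over modality feature vectors.
    Returns the fused vector and the per-modality attention weights."""
    names = list(modality_features)
    d = len(query)
    if any(len(modality_features[n]) != d for n in names):
        raise ValueError("all feature vectors must match the query dimension")
    scores = [sum(q * k for q, k in zip(query, modality_features[n])) / math.sqrt(d)
              for n in names]
    weights = softmax(scores)
    fused = [sum(w * modality_features[n][i] for w, n in zip(weights, names))
             for i in range(d)]
    return fused, dict(zip(names, weights))
```

Returning the weights alongside the fused vector is a design choice that lets the interface show how much each modality contributed to a given result.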

[0079] (3) Results feedback and diagnostic closed loop
Results visualization and interaction: AI analysis results (segmentation contours, quantified features, and grading suggestions) are pushed to the physician's diagnostic terminal via the central server. The client software displays the results as heatmap overlays and sidebar values. Physicians can correct the segmentation contours or adjust feature weights through the interface, and the system updates the diagnostic results in real time.

[0080] Report generation and data closed loop: After the doctor confirms the diagnosis, the report generation service automatically fills in the structured template and sends it to the PACS / HIS system via the hospital's local area network. Pathology results are anonymized and uploaded to the cloud-based model management platform for model iteration and optimization. New versions are distributed to the local AI unit via an encrypted channel.
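Uploading pathology results for retraining requires stripping identifiers while keeping records from the same patient linkable. A minimal sketch, assuming records arrive as flat dictionaries with the hypothetical field names shown, drops direct identifiers and replaces the patient ID with an HMAC-SHA256 pseudonym keyed by a secret that stays inside the hospital. This is pseudonymization rather than full anonymization, and identifiers embedded in DICOM headers or burned into images are not handled.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "patient_id", "phone", "address"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace the patient ID with a keyed hash,
    so records from the same patient can still be linked on the cloud side."""
    token = hmac.new(secret_key, record["patient_id"].encode("utf-8"),
                     hashlib.sha256).hexdigest()
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["pseudo_id"] = token
    return clean
```

Using a keyed hash instead of a plain SHA-256 of the ID prevents anyone without the hospital's key from re-identifying patients by hashing candidate IDs.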

[0081] The central server allocates resources to the ultrasound host and GPU computing units through a task scheduling module to avoid conflicts. A streaming data transmission and parallel computing architecture is adopted to ensure that the latency from data acquisition to AI result feedback is controlled within 2 seconds (meeting clinical real-time requirements). Through the above deployment and interaction design, the software system achieves fully automated processing of multimodal data, while simultaneously ensuring data security and continuous model evolution through a local-cloud collaborative architecture.

[0082] (IV) Core AI Model: Technical Principles and Algorithm Details
As shown in Figure 3, the core of this system is a deep learning model that fuses multimodal features (the AI-assisted diagnosis model for breast tumors).

[0083] 1. Overall Model Architecture The AI-assisted diagnostic model for breast tumors in this invention is a two-stream fusion network. Its basic process is: input -> single-modal feature extraction -> cross-modal attention feature fusion -> joint feature learning and output.

[0084] (1) The model adopts an architecture with a shared-independent encoder and a fused decoder. Assume the input is a registered image triple: , respectively representing ultrasound image, elastography, and Doppler image.

[0085] (1). Shared Feature Encoder (Low-Level Feature Extraction): A shallow convolutional neural network (CNN) used to extract low-level, common image features (such as edges, textures, etc.) from all modalities. Let its function be... ,but .

[0086] : The low-level general feature tensor output by the shared feature encoder contains low-level visual common information of multimodal images (such as edge contours and basic textures). The shared feature encoder function, implemented by a shallow CNN, is responsible for extracting general low-level features from multimodal inputs.

[0087] A shallow convolutional neural network (CNN) is used as the shared feature encoder. The input registered image triple I is used to extract features and output the underlying common image features (such as edges and textures) common to all modalities, providing basic feature support for subsequent independent encoders.

[0088] (2). Independent Feature Encoders (Advanced Semantic Feature Extraction): Three parallel, deeper CNN encoders with potentially different weights. , , They receive ( ). ) as well as The corresponding part learns modality-specific high-level semantic features.

[0089] (2.1) For the B-mode image: $F_B = E_B(I_B, F_{low}^{B})$, where $F_B$ is the high-level semantic feature tensor of the B-mode image, containing modality-specific information such as tumor morphology, margins and internal echoes.

[0090] $E_B$: the B-mode independent encoder, a deeper CNN (whose weights may differ from those of the other modality encoders) that focuses on learning features from B-mode ultrasound images.

[0091] $I_B$: the input B-mode ultrasound image, which provides basic morphological information about the tumor.

[0092] $F_{low}^{B}$: the part of the shared low-level features corresponding to the B-mode image.

[0093] The independent encoder $E_B$ receives the B-mode image $I_B$ and the B-mode part of the shared low-level features, $F_{low}^{B}$, and learns high-level semantic features specific to B-mode through a deeper CNN structure, capturing the morphological information of the tumor (such as whether the margins are smooth and whether calcifications are present).

[0094] (2.2) For the elastogram: $F_E = E_E(I_E, F_{low}^{E})$, where $F_E$ is the high-level semantic feature tensor of the elastogram, containing modality-specific information such as tissue stiffness distribution and stiffness heterogeneity.

[0095] $E_E$: the elastogram independent encoder, a deep CNN with independent weights that focuses on learning elastography features.

[0096] $I_E$: the input elastogram (such as a Young's modulus map), which reflects tissue stiffness.

[0097] $F_{low}^{E}$: the part of the shared low-level features corresponding to the elastogram.

[0098] The independent encoder $E_E$ receives the elastogram $I_E$ and the elastography part of the shared low-level features, $F_{low}^{E}$, and learns high-level semantic features unique to elastograms through a deep CNN, focusing on the stiffness distribution and heterogeneity of the tumor tissue (such as stiffness values and regional differences).

[0099] (2.3) For the Doppler image: $F_D = E_D(I_D, F_{low}^{D})$, where $F_D$ is the high-level semantic feature tensor of the Doppler image, containing modality-specific information such as blood flow signal distribution, vascular morphology and blood flow richness.

[0100] $E_D$: the Doppler independent encoder, a deep CNN with independent weights that focuses on learning blood flow imaging features.

[0101] $I_D$: the input color Doppler flow image, which reflects blood flow in the tumor region.

[0102] $F_{low}^{D}$: the part of the shared low-level features corresponding to the Doppler image.

[0103] The independent encoder $E_D$ receives the Doppler image $I_D$ and the Doppler part of the shared low-level features, $F_{low}^{D}$, and learns high-level semantic features unique to Doppler images through a deep CNN, capturing information such as the morphology and richness of blood flow and the course of vessels inside and around the tumor.
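The following is a minimal NumPy sketch of the data flow in [0084]-[0103], not the patented network. Each CNN is replaced by a single hypothetical 1×1 convolution with ReLU and random weights, there is no downsampling, and "the part of the shared low-level features corresponding to modality m" is assumed to be the shared encoder's output on that modality's image. Only the tensor shapes and the shared-versus-independent weight pattern are meant to be illustrative; `conv1x1_relu`, `C_LOW` and the 64×64 size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1_relu(x, weight):
    """Hypothetical stand-in for a CNN block: a 1x1 convolution (per-pixel
    channel mixing) followed by ReLU. x: (N, C_in, H, W); weight: (C_out, C_in)."""
    return np.maximum(np.einsum("oc,nchw->nohw", weight, x), 0.0)

H = W = 64                               # illustrative; the text uses e.g. 256x256
C_LOW = 16                               # hypothetical channels of F_low per modality
C_OUT = {"B": 256, "E": 128, "D": 128}   # channel counts quoted in [0105]

# Registered triple I = (I_B, I_E, I_D): one single-channel image per modality
I = {m: rng.standard_normal((1, 1, H, W)) for m in "BED"}

# Shared encoder E_shared: ONE set of weights applied to every modality,
# giving the modality-specific parts F_low^B, F_low^E, F_low^D
w_shared = rng.standard_normal((C_LOW, 1))
F_low = {m: conv1x1_relu(I[m], w_shared) for m in "BED"}

# Independent encoders E_B, E_E, E_D: separate weights per modality; each sees
# its own image plus its part of F_low, stacked along the channel axis
w_ind = {m: rng.standard_normal((C_OUT[m], 1 + C_LOW)) for m in "BED"}
F = {m: conv1x1_relu(np.concatenate([I[m], F_low[m]], axis=1), w_ind[m])
     for m in "BED"}

for m in "BED":
    print(m, F[m].shape)
```

The point of the split is that `w_shared` is reused across modalities (edges and textures look alike everywhere), while `w_ind` holds one independent weight set per modality.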

[0104] 2. Channel-dimension concatenation of the three-modal feature tensors. 1) Prerequisites for concatenation: feature alignment and dimensional consistency. Before concatenation, the high-level feature tensors of the B-mode, elastography and Doppler modalities, $F_B$, $F_E$ and $F_D$, must satisfy the following conditions. Consistent spatial dimensions: because the input images have been aligned at the pixel level by non-rigid registration (such as a B-spline-based free-form deformation model), the three feature tensors have the same height (H) and width (W), both equal to the model input size (e.g., 256×256).

[0105] Matching feature structure: the feature tensor of each modality is a 4-dimensional tensor of shape $(1, C_i, H, W)$, where $C_i$ is the number of channels (e.g., the B-mode features $F_B$ contain 256 channels, and the elastogram features $F_E$ and the Doppler features $F_D$ contain 128 channels each).

[0106] 2) Concatenation operation: stacking along the channel dimension. The concatenation is performed along the channel dimension of the feature tensors, as follows. Input features: the high-level B-mode features $F_B$, of shape $(1, C_B, H, W)$, which include semantic features such as tumor morphology and margins; the high-level elastogram features $F_E$, of shape $(1, C_E, H, W)$, which include features such as stiffness distribution and heterogeneity; and the high-level Doppler features $F_D$, of shape $(1, C_D, H, W)$, which include features such as blood flow signal distribution and vascular morphology.

[0107] Concatenation method: a tensor concatenation function (such as torch.cat in PyTorch) merges the three tensors along the channel dimension (the second dimension): $F_{cat} = \mathrm{Concat}(F_B, F_E, F_D)$. Output: the concatenated feature tensor $F_{cat}$ has shape $(1, C_B + C_E + C_D, H, W)$, i.e., its channel count is the sum of the channel counts of the three modalities (e.g., 256 + 128 + 128 = 512 channels).

[0108] 3) Purpose of concatenation and subsequent use. Multimodal information integration: channel concatenation encodes three complementary types of information, namely morphology (B-mode), stiffness (elastography) and blood flow (Doppler), into the same feature space, providing complete input for cross-modal attention fusion.

[0109] Attention weight calculation: the spatial dimensions of the concatenated $F_{cat}$ are compressed by global average pooling (GAP), and a neural network then computes the attention map A, which dynamically weights the features of different modalities and channels.

[0110] Basis for feature fusion: the weighted fused features $F_{weighted}$ are further passed through a convolution to generate the unified features $F_{fused}$, which are used for tumor segmentation and benign/malignant classification.

[0111] 4) Key technical details. Channel number design: the number of channels for each modality can be adjusted according to the complexity of its information (e.g., B-mode images contain more morphological detail and are therefore allocated more channels), so that the multimodal features contribute in a balanced way.

[0112] Data type consistency: before concatenation, ensure that all feature tensors share the same data type (e.g., float32) and numerical range (e.g., normalized to [-1, 1]), so that scale differences do not distort the subsequent attention calculation.

[0113] Through the above steps, the three-modal features are integrated along the channel dimension, laying the foundation for the AI model to capture cross-modal correlation patterns (such as the malignant feature combination "spiculated margins + high stiffness + abundant blood flow").
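The channel concatenation of [0106]-[0107], together with the consistency checks of [0104] and [0112], can be sketched in NumPy as follows. Here `np.concatenate(..., axis=1)` plays the role of `torch.cat(..., dim=1)`; the tensors are named F_B, F_E and F_D for illustration, and their constant values are placeholders chosen so that the channel layout of the result is easy to check.

```python
import numpy as np

H = W = 32                                             # illustrative spatial size
F_B = np.zeros((1, 256, H, W), dtype=np.float32)       # B-mode high-level features
F_E = np.ones((1, 128, H, W), dtype=np.float32)        # elastogram features
F_D = np.full((1, 128, H, W), 2.0, dtype=np.float32)   # Doppler features

# Preconditions: identical batch/spatial dimensions and identical dtype
assert F_B.shape[0] == F_E.shape[0] == F_D.shape[0]
assert F_B.shape[2:] == F_E.shape[2:] == F_D.shape[2:]
assert F_B.dtype == F_E.dtype == F_D.dtype

# Concatenate along the channel axis (axis=1 in NCHW layout)
F_cat = np.concatenate([F_B, F_E, F_D], axis=1)
print(F_cat.shape)  # (1, 512, 32, 32)
```

Channels 0-255 of `F_cat` come from the B-mode branch, 256-383 from the elastogram and 384-511 from Doppler, which is the layout the attention layer then weights.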

[0114] 3. Cross-modal attention feature fusion layer: this is the key innovation of the model. This invention introduces an attention mechanism that allows the model, when forming its overall judgment, to decide dynamically how much information to draw from each spatial location and feature channel of each modality.

[0115] 3.1 Calculating a spatial-channel joint attention map A. The formula is $A = \sigma\left(W_2 \cdot \mathrm{ReLU}\left(W_1 \cdot \mathrm{GAP}(F_{cat})\right)\right)$, where GAP denotes global average pooling, which compresses the spatial dimensions of each feature map to 1×1 while preserving channel-level global information; $W_1$ and $W_2$ are learnable weight matrices used for feature transformation; $F_{cat}$ is the result of concatenating the three modal feature tensors $\{F_B, F_E, F_D\}$ (B-mode, elastography and Doppler) along the channel dimension; ReLU is an activation function, $\mathrm{ReLU}(x) = \max(0, x)$, used to introduce nonlinearity; and σ is the Sigmoid activation function, $\sigma(x) = 1/(1 + e^{-x})$, used to compress the output to the range 0-1 so that it can serve as attention weights. A has the same size as $F_{cat}$, and each element lies between 0 and 1, representing the importance weight of the corresponding position and channel.

[0116] In other words, the formula compresses the spatial dimensions of the concatenated multimodal features $F_{cat}$ by global average pooling (GAP), passes the result through a two-layer neural network (weights $W_1$ and $W_2$) with ReLU and Sigmoid activations, and obtains an attention weight map A with the same dimensions as the input features, dynamically weighting the features of different modalities and spatial locations.
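A minimal NumPy sketch of the attention formula, under two stated assumptions: W_1 reduces the channel count by a hypothetical ratio r and W_2 restores it (the squeeze-and-excitation pattern), and, because GAP leaves a single value per channel, the resulting per-channel weights are broadcast over H×W so that A has the same shape as F_cat. A genuinely spatial component of A would need an additional branch that the text does not specify.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_map(F_cat, W1, W2):
    """A = sigmoid(W2 ReLU(W1 GAP(F_cat))), broadcast to F_cat's shape.
    F_cat: (N, C, H, W); W1: (C // r, C); W2: (C, C // r)."""
    z = F_cat.mean(axis=(2, 3))              # GAP over H and W -> (N, C)
    h = np.maximum(z @ W1.T, 0.0)            # ReLU(W1 z)       -> (N, C // r)
    a = sigmoid(h @ W2.T)                    # sigmoid(W2 h)    -> (N, C), in (0, 1)
    return np.broadcast_to(a[:, :, None, None], F_cat.shape)

rng = np.random.default_rng(0)
C, r = 512, 16                               # r: hypothetical reduction ratio
F_cat = rng.standard_normal((1, C, 16, 16))
W1 = 0.05 * rng.standard_normal((C // r, C))
W2 = 0.05 * rng.standard_normal((C, C // r))
A = attention_map(F_cat, W1, W2)
print(A.shape)  # (1, 512, 16, 16)
```

`np.broadcast_to` returns a read-only view, which is enough here because A is only used as a multiplicative weight.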

[0117] 3.2 Weighted fusion: $F_{weighted} = A \odot F_{cat}$, where ⊙ denotes element-wise multiplication.

[0118] The present invention then applies a further convolution to $F_{weighted}$ to obtain the fused, unified feature representation $F_{fused} = \mathrm{Conv}(F_{weighted})$. Specifically, the attention weight map A and the concatenated features $F_{cat}$ are multiplied element-wise (Hadamard product) to weight the multimodal features dynamically. Regions with higher weight values (between 0 and 1) contribute more to the fusion result, so that key features are emphasized and redundant information is suppressed.

[0119] $F_{weighted}$ is the intermediate feature tensor after weighted fusion, which incorporates the key multimodal information selected by the attention mechanism. A convolution operation (Conv) is applied to $F_{weighted}$; through the learned convolution kernels, the weighted multimodal features are further integrated into a unified, globally consistent feature representation, which provides the input for the subsequent segmentation and classification tasks.

[0120] $F_{fused}$ is the final unified feature tensor, which contains collaborative multimodal information and is used for tumor segmentation and benign/malignant classification.
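The weighted fusion of 3.2 can be sketched as follows, assuming that the unspecified Conv layer is a single 3×3 convolution with zero "same" padding. The batch axis is dropped, the sizes are toy values so that the naive loop stays fast, and the random map `A` merely stands in for the attention weights; `conv2d_same` is a hypothetical helper, not a library call.

```python
import numpy as np

def conv2d_same(x, k, b):
    """Naive 'same'-padded 2-D convolution (cross-correlation, as in CNN layers).
    x: (C_in, H, W); k: (C_out, C_in, kh, kw) with odd kh, kw; b: (C_out,)."""
    c_out, _, kh, kw = k.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))  # zero padding
    H, W = x.shape[1:]
    y = np.empty((c_out, H, W))
    for i in range(H):
        for j in range(W):
            # contract (C_in, kh, kw) of every kernel with the local patch
            y[:, i, j] = np.tensordot(k, xp[:, i:i + kh, j:j + kw], axes=3) + b
    return y

rng = np.random.default_rng(1)
C, H, W = 8, 6, 6                          # toy sizes
F_cat = rng.standard_normal((C, H, W))
A = rng.uniform(0.0, 1.0, size=(C, H, W))  # stand-in for the attention map A
F_weighted = A * F_cat                     # Hadamard product A ⊙ F_cat
kernel = 0.1 * rng.standard_normal((4, C, 3, 3))
F_fused = conv2d_same(F_weighted, kernel, np.zeros(4))
print(F_fused.shape)  # (4, 6, 6)
```

Because A lies in (0, 1), the Hadamard product can only shrink features, so channels and positions with small attention weights contribute less to every output of the convolution.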

[0121] 4. Joint feature learning and output: (1) Segmentation head: a decoder consisting of deconvolution and convolutional layers takes $F_{fused}$ as input, upsamples it and outputs a tumor segmentation probability map of the same size as the input image.

[0122] $P_{seg} = \mathrm{Decoder}(F_{fused})$, where $P_{seg}$ is a probability map of the same size as the input image, each pixel value representing the probability that the location belongs to the tumor, and Decoder(·) is a decoder consisting of deconvolution (upsampling) and convolutional layers that restores spatial resolution from the high-dimensional features. It progressively upsamples the feature map to the original image size through deconvolution, and the convolutional layers then output the probability (between 0 and 1) that each pixel belongs to the tumor, forming the segmentation probability map.
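A toy NumPy version of the segmentation head. It assumes that F_fused sits at a hypothetical 1/16 of the input resolution, a common encoder design; if F_fused is already at full resolution, as the registration discussion in [0104] suggests, the upsampling stages would simply be skipped. Nearest-neighbour repetition stands in for the learnable deconvolution layers, and a 1×1 projection followed by a Sigmoid produces the probability map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map; a stand-in for a
    learnable deconvolution (transposed convolution) layer."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def segmentation_head(F_fused, w_out, n_up):
    """Toy decoder: n_up upsampling stages, a 1x1 convolution to one channel,
    then a sigmoid giving the per-pixel tumour probability map P_seg."""
    x = F_fused
    for _ in range(n_up):
        x = upsample2x(x)
    logits = np.einsum("c,chw->hw", w_out, x)
    return sigmoid(logits)

rng = np.random.default_rng(0)
F_fused = rng.standard_normal((32, 16, 16))   # hypothetical 1/16-resolution features
P_seg = segmentation_head(F_fused, 0.1 * rng.standard_normal(32), n_up=4)
print(P_seg.shape)  # (256, 256)
```

Four 2× stages restore 16×16 to 256×256; a real decoder would interleave learned convolutions between the upsampling steps.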

[0123] (2) Classification head: global average pooling is applied to $F_{fused}$, followed by fully connected layers that output two tasks. (2.1) BI-RADS classification prediction: $P_{\text{BI-RADS}} = FC_{\text{BI-RADS}}(\mathrm{GAP}(F_{fused}))$, where GAP(·) is global average pooling, which compresses the feature map into a 1×1×C vector (C being the number of channels), and $FC_{\text{BI-RADS}}$ is a fully connected layer for BI-RADS grading with an output dimension of 5.

[0124] The fused features $F_{fused}$ are compressed spatially by GAP into a fixed-length feature vector, which is passed through the fully connected layer to output a probability distribution over five classes (corresponding to BI-RADS 2, 3, 4A, 4B and 4C/5; categories 1 and 6 are handled separately).

[0125] Through joint feature learning, the model outputs both the precise spatial location of the tumor and a qualitative diagnosis, avoiding the one-sidedness of a single task, improving the recognition of complex cases (such as early occult tumors and atypical lesions), and providing more comprehensive supporting evidence for clinical diagnosis.

[0126] (2.2) Benign/malignant classification prediction: $P_{mal} = \sigma\left(FC_{mal}(\mathrm{GAP}(F_{fused}))\right)$, a binary classification output giving the probability of malignancy (between 0 and 1). The feature vector produced by the GAP shared with the BI-RADS branch is processed by an independent fully connected layer, and its output, after Sigmoid activation, gives the probability (0-1) that the tumor is malignant. $FC_{mal}$ is the fully connected layer for benign/malignant classification, with an output dimension of 1, followed by a Sigmoid activation function.

[0127] Both the segmentation head and the classification head take $F_{fused}$ as input, achieving joint "segmentation-classification" multi-task learning. Use of spatial information: the segmentation head preserves spatial detail through upsampling to achieve pixel-level tumor localization. Use of global features: the classification head aggregates global features through GAP to produce the overall benign/malignant and grading judgments. The two heads complement each other and make the diagnosis more complete.
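The two classification branches share one GAP vector and differ only in their fully connected layers. This sketch assumes a softmax over the five BI-RADS classes (the text states only that a probability distribution is output) and uses random, untrained weights, so the printed prediction is meaningless; it illustrates shapes and normalization only. `classification_heads` is a hypothetical name.

```python
import numpy as np

BIRADS = ["2", "3", "4A", "4B", "4C/5"]      # five output classes listed in [0124]

def softmax(z):
    e = np.exp(z - z.max())                  # shift for numerical stability
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classification_heads(F_fused, W_bi, b_bi, w_mal, b_mal):
    """Shared GAP, then two independent fully connected layers.
    F_fused: (C, H, W); W_bi: (5, C); b_bi: (5,); w_mal: (C,); b_mal: scalar."""
    v = F_fused.mean(axis=(1, 2))            # GAP -> fixed-length vector (C,)
    p_birads = softmax(W_bi @ v + b_bi)      # FC with output dimension 5
    p_mal = float(sigmoid(w_mal @ v + b_mal))  # FC with output dimension 1
    return p_birads, p_mal

rng = np.random.default_rng(0)
C = 64
F_fused = rng.standard_normal((C, 16, 16))
p_birads, p_mal = classification_heads(
    F_fused, rng.standard_normal((5, C)), np.zeros(5), rng.standard_normal(C), 0.0)
print(BIRADS[int(np.argmax(p_birads))], round(p_mal, 3))
```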

[0128] (v) Model optimization strategies based on pathological knowledge. The initial model is trained using image features and clinically annotated BI-RADS grades. To further improve the model's ability to recognize the essential characteristics of malignant tumors, this invention introduces pathological slide images for knowledge distillation or joint training.

[0129] Method 1: Feature space alignment (knowledge distillation). 1. Data preparation: collect a set of confirmed cases, each comprising an "ultrasound image triple" and the corresponding "digitized whole-slide pathology images (WSIs)".

[0130] 2. Pathological feature extraction: an independent model (such as ResNet), pre-trained on a large pathology image dataset, serves as the "teacher network" and extracts a feature vector $F_{path}$ from each pathology WSI. This feature encodes gold-standard information such as cellular atypia, mitotic figures and tissue architecture.

[0131] 3. Joint training: keeping the architecture of the ultrasound AI model (the student network) unchanged, a projection head is added that maps the fused features $F_{fused}$ into a space of the same dimension as $F_{path}$, giving $F_{proj}$.

[0132] Loss function: on the basis of the original segmentation loss $L_{seg}$ (such as Dice loss) and classification loss $L_{cls}$ (cross-entropy), a knowledge distillation loss $L_{KD}$ is added: $L_{total} = \alpha L_{seg} + \beta L_{cls} + \gamma L_{KD}$, where $L_{total}$ is the total training loss, covering the segmentation, classification and knowledge distillation tasks; α, β and γ are adjustable weighting coefficients that control the relative contributions of the segmentation, classification and knowledge distillation losses, respectively; $L_{seg}$ (such as Dice loss) optimizes the pixel-level segmentation accuracy of the tumor region; $L_{cls}$ (such as cross-entropy loss) optimizes the accuracy of BI-RADS grading and benign/malignant classification; and $L_{KD}$ constrains the consistency between the ultrasound model's features and the pathological features. Operating mechanism: with the distillation loss added to the segmentation and classification losses and the three contributions balanced by α, β and γ, the ultrasound AI model learns pathological gold-standard features while optimizing the segmentation and classification tasks.

[0133] Here $L_{KD}$ can be a mean squared error (MSE) or a cosine similarity loss. The MSE-based form is $L_{KD} = \lVert F_{proj} - F_{path} \rVert_2^2$, a distillation loss that reflects the numerical difference between the feature vectors, where $F_{proj}$ is the feature vector obtained by mapping the fused features $F_{fused}$ of the ultrasound AI model (student network) through the projection head, with the same dimension as $F_{path}$; $F_{path}$ is the feature vector extracted from the whole-slide image (WSI) by the pathology teacher network, encoding gold-standard information such as cellular atypia and tissue architecture; and $\lVert \cdot \rVert_2^2$ is the squared L2 norm (squared Euclidean distance), which quantifies the difference between the two feature vectors.

[0134] Computing the squared L2 norm between the projected ultrasound features $F_{proj}$ and the pathological features $F_{path}$ measures their (squared) Euclidean distance in feature space. Minimizing this loss drives the ultrasound features toward the pathological gold-standard features.

[0135] The cosine-similarity-based form is $L_{KD} = 1 - \frac{F_{proj} \cdot F_{path}}{\lVert F_{proj} \rVert_2 \, \lVert F_{path} \rVert_2}$, a distillation loss that reflects the directional difference between the feature vectors. Cosine similarity ranges over [-1, 1] and equals 1 when the two vectors point in the same direction, in which case $L_{KD} = 0$. The loss is obtained by subtracting from 1 the cosine similarity of the two feature vectors, which measures their directional consistency; minimizing it aligns the ultrasound features with the pathological features in direction, capturing deep semantic associations.

[0136] Through this loss design, the model "distills" pathological gold-standard knowledge into the ultrasound AI model, enabling it to learn deep imaging patterns related to tumor malignancy and improving diagnostic accuracy. Technical effect: minimizing $L_{KD}$ forces the feature representation $F_{proj}$ learned by the ultrasound AI model to be as close as possible to the pathological gold-standard features $F_{path}$. As a result, when interpreting ultrasound images the model implicitly learns deeper imaging patterns associated with pathological malignancy, rather than only surface morphological features, which improves its ability to identify occult malignant features and its diagnostic confidence.
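The loss terms of Method 1 can be written out in NumPy as below. Soft Dice and cross-entropy are the examples the text names; summing the BI-RADS cross-entropy and the malignancy binary cross-entropy into L_cls, and the default α, β, γ values, are assumptions. Note that `kd_mse` implements the squared L2 norm exactly as written in [0133]; a mean-reduced MSE would differ by a factor of 1/d, where d is the feature dimension.

```python
import numpy as np

def dice_loss(p, g, eps=1e-6):
    """Soft Dice loss between a probability map p and a binary mask g."""
    inter = np.sum(p * g)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(g) + eps))

def cross_entropy(probs, label):
    """Cross-entropy of one sample given class probabilities (e.g. BI-RADS)."""
    return float(-np.log(probs[label]))

def binary_cross_entropy(p, y):
    """Binary cross-entropy for malignancy probability p and label y in {0, 1}."""
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

def kd_mse(f_proj, f_path):
    """L_KD = ||F_proj - F_path||_2^2 (squared Euclidean distance)."""
    d = f_proj - f_path
    return float(np.dot(d, d))

def kd_cosine(f_proj, f_path, eps=1e-12):
    """L_KD = 1 - cos(F_proj, F_path); 0 when the vectors point the same way."""
    cos = np.dot(f_proj, f_path) / (
        np.linalg.norm(f_proj) * np.linalg.norm(f_path) + eps)
    return float(1.0 - cos)

def total_loss(l_seg, l_cls, l_kd, alpha=1.0, beta=1.0, gamma=0.5):
    """L_total = alpha*L_seg + beta*L_cls + gamma*L_KD (weights are hypothetical)."""
    return alpha * l_seg + beta * l_cls + gamma * l_kd

# Toy example: perfect segmentation, fairly confident correct classification
g = np.zeros((8, 8))
g[2:6, 2:6] = 1.0
l_seg = dice_loss(g, g)
l_cls = (cross_entropy(np.array([0.1, 0.1, 0.6, 0.1, 0.1]), 2)
         + binary_cross_entropy(0.8, 1))
f_path = np.array([1.0, 2.0, 2.0])           # teacher (pathology) feature
f_proj = np.array([0.5, 1.0, 1.0])           # same direction, half the length
print(kd_mse(f_proj, f_path), kd_cosine(f_proj, f_path))
print(total_loss(l_seg, l_cls, kd_mse(f_proj, f_path)))
```

The toy vectors show the difference between the two L_KD variants: `f_proj` points exactly along `f_path`, so the cosine loss is essentially zero while the squared-distance loss is still 2.25.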

[0137] Method 2: Pathology-guided attention mechanism. In addition, pathological diagnoses (benign/malignant) can be used as a strong supervisory signal in the cross-modal attention fusion layer described above. Specifically, during training, this invention encourages the model to align the attention weights of the feature regions that contribute strongly to the final classification with the key regions of the pathologically confirmed tumor. This can be achieved by adding a correlation constraint to the attention loss, guiding the model to focus on the imaging regions associated with malignant pathological results.

[0138] II. Method Implementation Steps (as shown in Figure 4). Step 1: Multimodal data acquisition and synchronization. Implementation process: 1. The doctor scans the patient's breasts with the ultrasound equipment.

[0139] 2. When a suspected mass is detected, the operator switches the device to the multimodal combined scanning mode. While the probe is held in a stable position, the system controls it to acquire the following data sequentially and rapidly, automatically or semi-automatically: a. a high-resolution B-mode image;

[0140] b. shear wave elastography data, from which a Young's modulus map is generated;

[0141] c. a color Doppler energy (power Doppler) map, optionally with pulsed-wave Doppler spectra for calculating the resistive index (RI).

[0142] 3. The hardware system assigns the same timestamp and spatial location code to these three data sets and sends them to the AI computing unit over a high-speed network.

[0143] Technical principle: hardware synchronization and software triggering ensure that the three images come from the same anatomical section at almost the same moment, minimizing misalignment caused by breathing, heartbeat or probe movement.

[0144] Technical effect: It provides a high-quality, spatiotemporally aligned input data foundation for subsequent accurate registration and fusion analysis.
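One way to represent the synchronized output of Step 1 is a small record type that carries the shared timestamp and spatial location code with each image, so that later stages can reject mismatched triples. The class names, field names, microsecond units and the location-code format are hypothetical; the text specifies only that the three data sets share a timestamp and a location code.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class Frame:
    modality: str        # "B" (B-mode), "E" (Young's modulus map) or "D" (Doppler)
    image: np.ndarray
    timestamp_us: int    # hypothetical: acquisition time in microseconds
    location_code: str   # hypothetical: identifier of probe position / section

@dataclass(frozen=True)
class SyncedTriple:
    b_mode: Frame
    elasto: Frame
    doppler: Frame

def make_triple(b, e, d, timestamp_us, location_code):
    """Stamp the three acquisitions with the SAME timestamp and location code."""
    frames = [Frame(m, img, timestamp_us, location_code)
              for m, img in zip("BED", (b, e, d))]
    return SyncedTriple(*frames)

def is_consistent(t):
    """True if all three frames carry one timestamp and one location code."""
    fs = (t.b_mode, t.elasto, t.doppler)
    return (len({f.timestamp_us for f in fs}) == 1
            and len({f.location_code for f in fs}) == 1)

triple = make_triple(np.zeros((4, 4)), np.zeros((4, 4)), np.zeros((4, 4)),
                     timestamp_us=1_000_000, location_code="L-10h-2cm")
print(is_consistent(triple))  # True
```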

[0145] Step 2: Data preprocessing and image registration. Implementation process: 1. Standardization: dynamic range compression and grayscale normalization are applied to the B-mode image; the Young's modulus values (kPa) of the elastogram are mapped to a standard color scale; and color separation and brightness normalization are applied to the Doppler energy map.

[0146] 2. Key step (registration): because the B-mode image is the basic anatomical reference, the elastogram and the color Doppler image are non-rigidly registered to the B-mode image.

[0147] The algorithm can employ a B-spline-based free-form deformation (FFD) model. Its principle is to optimize a spatial transformation function T so that the similarity metric between the transformed floating image F (e.g., the elastogram) and the reference image R (the B-mode image) is maximized.

[0148] Similarity is commonly measured by normalized mutual information (NMI): $\mathrm{NMI}(R, F_T) = \frac{2\,[H(R) + H(F_T) - H(R, F_T)]}{H(R) + H(F_T)}$, where: NMI measures the similarity between the two images, takes values in [0, 1], and indicates better alignment the closer it is to 1; R is the reference image (the B-mode ultrasound image), which serves as the anatomical benchmark for registration; F is the floating image (the elastogram or Doppler image) to be registered to the reference image; T is the spatial transformation that maps the pixel coordinates of the floating image into the coordinate system of the reference image, its parameters (e.g., the B-spline control points for the elastogram) being the quantities to be optimized; $F_T$ is the floating image after transformation by T; and H(·) is the entropy, which measures the uncertainty of the image information and is computed as $H(X) = -\sum_{x} p(x) \log p(x)$, where p(x) is the probability distribution of the image grey values (the joint entropy $H(R, F_T)$ is computed in the same way from the joint grey-value distribution of the two images). Technical significance: being grounded in information theory, NMI avoids the sensitivity of conventional intensity-based similarity measures (such as mean squared error) to modality differences. It copes well with the differing grey-level characteristics of the ultrasound modalities (B-mode, elastography and Doppler), ensuring that after registration the same pixel corresponds to the morphology, stiffness and blood flow attributes of the same tissue, and it provides a precise spatial alignment basis for the feature fusion performed by the subsequent AI model.
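A minimal NumPy sketch of NMI computed from a joint grey-level histogram. It uses the [0, 1]-ranged variant 2·I(R;F)/(H(R)+H(F)), which matches the stated range (Studholme's (H(R)+H(F))/H(R,F) instead ranges over [1, 2]); the 32-bin histogram is an arbitrary choice. For brevity, an exhaustive search over horizontal integer shifts stands in for optimizing B-spline FFD control points, so this demonstrates the similarity measure, not non-rigid registration itself.

```python
import numpy as np

def entropy(p):
    """Shannon entropy -sum p log p of a probability array (zeros ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def nmi(ref, flt, bins=32):
    """NMI = 2 * I(R; F) / (H(R) + H(F)), in [0, 1], from a joint histogram."""
    joint, _, _ = np.histogram2d(ref.ravel(), flt.ravel(), bins=bins)
    p_rf = joint / joint.sum()
    h_r = entropy(p_rf.sum(axis=1))      # marginal entropy of the reference image
    h_f = entropy(p_rf.sum(axis=0))      # marginal entropy of the floating image
    h_rf = entropy(p_rf.ravel())         # joint entropy H(R, F)
    return 2.0 * (h_r + h_f - h_rf) / (h_r + h_f)

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
# Floating image: inverted contrast (a different "modality") plus a 3-pixel shift
flt = np.roll(1.0 - ref, shift=3, axis=1)

scores = {s: nmi(ref, np.roll(flt, -s, axis=1)) for s in range(-5, 6)}
best = max(scores, key=scores.get)
print(best)  # 3
```

The inverted contrast is why an information-theoretic measure is used: mean squared error would rank the correctly aligned, contrast-inverted image as a poor match, whereas NMI only asks whether one image's grey levels predict the other's.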

[0149] Technical principle: using an information-theoretic metric, the method finds the spatial transformation that maximizes the information shared between images of different modalities, eliminating the geometric distortion introduced during acquisition.

[0150] Technical effect: pixel-level multimodal alignment ensures that each pixel analyzed by the subsequent AI model corresponds strictly to the three attributes (morphology, stiffness and blood flow) of the same tissue site, greatly improving the accuracy and reliability of feature fusion.
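For the standardization in item 1 of Step 2, the following minimal sketch brings all three modalities onto the [-1, 1] range mentioned in [0112]. The 60 dB dynamic range and the 0-180 kPa elasticity range are hypothetical parameters, and, as a simplification, the elastogram is kept as a scalar stiffness map here instead of being mapped to a color scale.

```python
import numpy as np

def normalize_bmode(envelope, dynamic_range_db=60.0):
    """Log-compress an envelope image to a fixed dB dynamic range, then map
    [-dynamic_range_db, 0] dB linearly onto [-1, 1]."""
    db = 20.0 * np.log10(envelope / envelope.max() + 1e-12)  # 0 dB at the maximum
    db = np.clip(db, -dynamic_range_db, 0.0)
    return db / dynamic_range_db * 2.0 + 1.0

def normalize_elasticity(kpa, kpa_max=180.0):
    """Clip Young's modulus (kPa) to [0, kpa_max] and scale it to [-1, 1]."""
    return np.clip(kpa, 0.0, kpa_max) / kpa_max * 2.0 - 1.0

def normalize_doppler(power):
    """Min-max normalise Doppler power (brightness) to [-1, 1]."""
    lo, hi = power.min(), power.max()
    return (power - lo) / (hi - lo + 1e-12) * 2.0 - 1.0

print(normalize_elasticity(np.array([0.0, 90.0, 300.0])))  # [-1.  0.  1.]
```

Clipping the elastogram to a fixed kPa range, rather than min-max scaling each image, keeps absolute stiffness comparable across patients, which matters because high stiffness is itself a malignancy cue.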

[0151] Step 3: AI model inference and feature extraction. Implementation process: 1. The registered image triple is cropped or scaled to the preset model input size (e.g., 256×256) and fed into the loaded AI-assisted breast tumor diagnosis model.

[0152] 2. Model forward propagation: a. the shared and independent encoders perform layer-by-layer convolution and pooling to extract multi-level features;

[0153] b. the attention fusion layer dynamically computes the weights A and applies them to the features to generate the fused features $F_{fused}$;

[0154] c. the segmentation decoder upsamples and convolves $F_{fused}$ to output the segmentation heatmap (tumor probability map) $P_{seg}$;

[0155] d. the classification head applies pooling and fully connected layers to $F_{fused}$ and outputs the BI-RADS class probabilities and the malignancy probability.

[0156] 3. Post-processing: the segmentation map $P_{seg}$ is binarized with a threshold (e.g., 0.5) to obtain a precise tumor contour mask, and quantitative features are extracted from the original images within this mask. Morphological features: area, perimeter, aspect ratio, roundness and margin spiculation index (calculated from contour Fourier descriptors).

[0157] Elastic features: mean, maximum and standard deviation of the Young's modulus within the mask (the standard deviation reflecting stiffness heterogeneity).

[0158] Blood flow features: proportion of the lesion area containing blood flow signal, mean blood flow intensity and vascular morphological complexity.
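A simplified NumPy sketch of the post-processing in item 3: P_seg is thresholded at 0.5, then morphological, elastic and blood flow features are computed inside the mask. The perimeter is a crude count of boundary pixels, the aspect ratio is taken as bounding-box height over width, and the roundness formula 4πA/P² and the flow-detection threshold are assumptions. The Fourier-descriptor spiculation index and the vascular complexity measure are omitted because the text does not define them.

```python
import numpy as np

def shape_features(mask):
    """Area, boundary-pixel perimeter, bounding-box aspect ratio (height/width)
    and roundness 4*pi*area/perimeter**2 of a boolean mask."""
    area = int(mask.sum())
    p = np.pad(mask, 1)                              # pad with False
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])        # all 4 neighbours inside
    perimeter = int((mask & ~interior).sum())
    rows, cols = np.nonzero(mask)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return {"area": area, "perimeter": perimeter,
            "aspect_ratio": float(height / width),
            "roundness": float(4.0 * np.pi * area / perimeter ** 2)}

def elastic_features(kpa, mask):
    """Mean, maximum and standard deviation of Young's modulus inside the mask."""
    v = kpa[mask]
    return {"mean_kpa": float(v.mean()), "max_kpa": float(v.max()),
            "std_kpa": float(v.std())}

def flow_features(power, mask, flow_threshold=0.1):
    """Fraction of mask pixels with flow signal and their mean intensity.
    flow_threshold is a hypothetical detection level on normalised power."""
    v = power[mask]
    flow = v > flow_threshold
    mean_int = float(v[flow].mean()) if flow.any() else 0.0
    return {"flow_fraction": float(flow.mean()), "mean_flow_intensity": mean_int}

# Toy inputs: a 10 x 20 rectangular "tumour" in a 32 x 32 probability map
P_seg = np.zeros((32, 32))
P_seg[8:18, 6:26] = 0.9
mask = P_seg > 0.5
kpa = np.where(mask, 100.0, 20.0)
power = np.zeros((32, 32))
power[8:13, 6:26] = 0.5                              # flow in the upper half only

print(shape_features(mask))
print(elastic_features(kpa, mask))
print(flow_features(power, mask))
```

Because every feature is a deterministic function of the mask and the registered images, editing the contour (as in Step 4) is enough to recompute all of them.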

[0159] Technical principle: through nonlinear transformations with tens of millions of parameters, the deep learning model learns, directly from raw pixels, complex patterns that are highly correlated with whether a tumor is benign or malignant. The attention mechanism mimics the way a doctor "focuses on" particular areas.

[0160] Technical benefits: within seconds, the system automatically localizes and segments the tumor and outputs a set of objective, repeatable quantitative indicators together with preliminary diagnostic suggestions, completely replacing manual measurement and subjective scoring.

[0161] Step 4: Human-machine collaborative diagnosis and report generation. Implementation process: 1. Result visualization: the physician terminal software displays the original images and the AI analysis results side by side. The AI segmentation contour is overlaid on the ultrasound image as a semi-transparent highlight, and the quantitative features, the suggested BI-RADS category (e.g., "4B") and the malignancy probability (e.g., "72%") are listed in the sidebar.

[0162] 2. Interaction and correction by the doctor. Adjusting or confirming the contour: if the automatic segmentation is unsatisfactory, the doctor can refine it with a brush tool, and the system recalculates all quantitative features and the malignancy probability in real time from the new contour.

[0163] Adjusting diagnostic weights: the software provides sliders that let the doctor manually fine-tune the weights of the three dimensions (morphology, elasticity and blood flow) in the final judgment; the model supplies default weights, and the sliders serve as the interface for manual intervention.

[0164] Selecting the diagnostic conclusion: referring to the AI suggestions and drawing on clinical experience, the doctor selects the final BI-RADS category and diagnostic opinion (probably benign, suspicious for malignancy, biopsy recommended, etc.) from structured options provided by the software.

[0165] 3. Report generation: when the doctor clicks "Generate Report", the system automatically fills the patient information, images, AI-extracted quantitative features and the interactively confirmed diagnostic conclusion into a standardized structured report template, and can automatically compare the findings with previous examinations. The report can be sent to the PACS and HIS with one click.

[0166] Technical principle: the objective computing power of AI is combined with the high-level clinical decision-making ability of doctors. The AI acts as a "super assistant" that handles tedious observation and measurement, while the doctor remains the decision-maker who makes the comprehensive judgment and bears responsibility for it.

[0167] Technical benefits: the time doctors spend writing reports is greatly reduced (from several minutes to tens of seconds), the diagnostic process becomes more standardized, and doctors retain leadership and flexibility in key decisions, maximizing the efficiency of human-machine collaboration.

[0168] Step 5: Continuous model optimization and closed loop. Implementation process: 1. For cases that undergo biopsy or surgery, the pathological diagnosis (gold standard) is entered into the system.

[0169] 2. In compliance with ethical and security regulations, the system packages the de-identified multimodal ultrasound data, AI predictions and pathological results of these cases.

[0170] 3. Periodically, or once the accumulated data reach a set threshold, the packaged data are securely transmitted to the cloud-based model management platform.

[0171] 4. The cloud platform uses the new "image-pathology" paired data to incrementally train or fine-tune the existing model, with particular emphasis on optimizing the knowledge distillation loss $L_{KD}$.

[0172] 5. Once the performance improvement of the new model has been verified, it is securely distributed as a new version to the edge AI computing units of the participating hospitals for updating.

[0173] Technical principle: through the closed loop "data -> model -> application -> new data", the AI system continuously learns from clinical practice, and in particular from the definitive answers provided by the pathological gold standard.

[0174] Technical effect: the diagnostic system evolves on its own, so its diagnostic accuracy keeps improving as usage time and the number of accumulated cases grow. This solves the problem of a model being "trained once and then permanently outdated", ensuring the long-term vitality and clinical applicability of the system.

[0175] In summary, the "AI-assisted diagnostic method and system for breast tumors" systematically addresses the core pain points of existing technologies, such as high subjectivity, low efficiency, and poor consistency, through innovative hardware and software integration design, advanced multimodal fusion AI algorithms, and unique optimization strategies based on pathological knowledge. It provides a complete and advanced technical path for achieving accurate, efficient, and standardized ultrasound diagnosis of breast tumors.

[0176] The above embodiments may be implemented wholly or partly by software, hardware (such as circuits), firmware or any combination thereof. When implemented in software, the above embodiments may be implemented wholly or partly as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD) or a semiconductor medium; a semiconductor medium may be a solid-state drive.

[0177] It should be understood that the term "and/or" in this document merely describes an association between related objects and indicates that three relationships may exist. For example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. In addition, the character "/" in this document generally indicates an "or" relationship between the preceding and following objects, but it may also represent an "and/or" relationship; refer to the context for the precise meaning.

[0178] In this invention, "at least one" means one or more, and "a plurality of" means two or more. "At least one of the following" or similar expressions refer to any combination of these items, including any single item or any combination of plural items. For example, at least one of a, b, or c may represent: a, b, c, a and b, a and c, b and c, or a, b and c, where each of a, b and c may be singular or plural.

[0179] It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

[0180] Those skilled in the art will recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0181] Those skilled in the art will clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the devices, apparatuses and units described above; details are not repeated here.

[0182] In the several embodiments provided by this invention, it should be understood that the disclosed devices, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.

[0183] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0184] In addition, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.

[0185] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. On this understanding, the technical solution of this invention in essence, or the part of it that contributes over the prior art, or a part of the technical solution, can be embodied as a software product. This computer software product is stored in a storage medium and includes several instructions that cause a computer device (such as a personal computer, server, or network device) to execute all or part of the steps of the methods described in the various embodiments of this invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, portable hard drive, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc.

[0186] The above description is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A breast tumor AI-assisted diagnostic system, characterized in that it comprises: a multimodal ultrasound host, configured to acquire B-mode images, elastography images, and Doppler blood flow images of the breast; a central control and preprocessing server, configured to register and preprocess the multimodal images; an AI inference engine, configured to perform tumor segmentation, feature extraction, and classification prediction on the registered multimodal images based on a trained deep learning model; a physician diagnostic terminal, configured to display the AI analysis results and support interactive corrections by the physician; a central control and data management server, configured to manage patient information, task scheduling, and report generation; and a cloud-based model management platform, configured for model version management, data anonymization and aggregation, and model iteration and optimization.
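Claim 1 assigns data anonymization and aggregation to the cloud platform but does not say how records are de-identified. The sketch below is a minimal illustration. It assumes a flat dictionary record with hypothetical field names (`patient_id`, `name`, `phone`, and so on) and uses keyed-hash pseudonymization: direct identifiers are dropped, and the patient ID is replaced by an HMAC-SHA256 token. Records from the same patient can therefore still be linked across hospitals without exposing the original ID. Strictly speaking this is pseudonymization rather than full anonymization, and it is not presented as the applicant's method.

```python
import hashlib
import hmac

# Hypothetical schema: the patent does not define which fields a record holds.
DIRECT_IDENTIFIERS = {"name", "phone", "address", "id_number"}

def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Copy `record`, drop direct identifiers, and replace patient_id with a keyed hash."""
    token = hmac.new(secret_key, record["patient_id"].encode("utf-8"),
                     hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS and k != "patient_id"}
    cleaned["pseudo_id"] = token  # same patient + same key -> same token
    return cleaned

def aggregate_by_site(records_by_site: dict, secret_key: bytes) -> list:
    """Pseudonymize records from several hospitals and pool them into one list."""
    pooled = []
    for site, records in records_by_site.items():
        for record in records:
            item = pseudonymize(record, secret_key)
            item["site"] = site
            pooled.append(item)
    return pooled
```

The secret key would have to stay on the platform side. Anyone holding it could recompute tokens from known patient IDs.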

2. The system according to claim 1, characterized in that the multimodal ultrasound host supports shear wave elastography and color Doppler imaging and can output raw radio frequency (RF) data or baseband IQ data.
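Claim 2 states that the host can output either raw RF data or baseband IQ data. The conventional way to get from one to the other is quadrature demodulation. The RF line is mixed down by the center frequency, a low-pass filter removes the image at twice the center frequency, and the complex envelope is kept. The sketch below makes three assumptions: a single real-valued RF line, a known center frequency `f0`, and a Hamming-windowed sinc low-pass filter chosen here for illustration. Decimation, which a real system would normally apply afterwards, is omitted.

```python
import numpy as np

def rf_to_iq(rf: np.ndarray, fs: float, f0: float, num_taps: int = 63) -> np.ndarray:
    """Quadrature-demodulate one real RF line to complex baseband IQ samples.

    rf: real RF samples along depth; fs: sampling rate (Hz);
    f0: assumed transducer center frequency (Hz); num_taps: odd FIR length.
    """
    t = np.arange(rf.size) / fs
    # Mix down: the band around +f0 moves to 0 Hz, the band around -f0 to -2*f0.
    mixed = rf * np.exp(-2j * np.pi * f0 * t)
    # Hamming-windowed sinc low-pass, cutoff f0, to suppress the -2*f0 image.
    cutoff = f0 / fs                                  # cycles per sample
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    h /= h.sum()                                      # unity gain at 0 Hz
    # x2 restores the amplitude split between the two sidebands by mixing.
    return 2 * np.convolve(mixed, h, mode="same")
```

The magnitude of the result is the echo envelope used for B-mode display. Doppler processing works from the phase changes between successive acquisitions.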

3. The system according to claim 1, characterized in that the central control and preprocessing server includes GPU computing units configured to perform non-rigid registration of the multimodal images and AI model inference.

4. The system according to claim 1, characterized in that the AI inference engine includes a dual-stream fusion network comprising a shared encoder, independent encoders, a cross-modal attention fusion layer, and segmentation and classification heads.
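Claim 4 names a cross-modal attention fusion layer but does not specify its form. One common construction is single-head scaled dot-product cross-attention, sketched here as an assumption rather than the applicant's design. Tokens from one stream (for example, the B-mode encoder) act as queries. Tokens from the other stream (for example, elastography or Doppler features) supply keys and values. The attended result is added back to the query tokens as a residual. The projection matrices `w_q`, `w_k`, and `w_v` stand in for learned weights. They are square so that the residual sum is well defined.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(f_query, f_context, w_q, w_k, w_v):
    """Fuse two modalities with single-head scaled dot-product cross-attention.

    f_query:   (n_q, d) tokens from one stream (e.g. B-mode encoder output)
    f_context: (n_c, d) tokens from the other stream (e.g. elastography/Doppler)
    w_q, w_k, w_v: (d, d) projections (learned in a real model)
    Returns fused (n_q, d) features and the (n_q, n_c) attention map.
    """
    q = f_query @ w_q
    k = f_context @ w_k
    v = f_context @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # rows sum to 1
    fused = f_query + attn @ v  # residual keeps the query stream's own features
    return fused, attn
```

The attention map also gives the physician terminal something to visualize: each row shows which regions of the second modality one B-mode location drew on.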

5. The system according to claim 1, characterized in that the system further includes a pathological knowledge optimization module, which integrates pathological slide features into the ultrasound AI model through knowledge distillation to improve the model's recognition accuracy.

6. A method for AI-assisted diagnosis of breast tumors, implemented based on the system according to any one of claims 1-5, characterized in that it comprises the following steps: acquiring multimodal ultrasound images of the breast, including B-mode images, elastography images, and Doppler flow images; registering and preprocessing the multimodal images to align them spatially; inputting the registered images into a trained AI model for tumor segmentation, feature extraction, and classification prediction; presenting the AI analysis results to the physician in a visual format and supporting interactive corrections; and generating a structured diagnostic report based on the physician's confirmation.
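The last two steps of claim 6 imply two rules: the physician's corrections take precedence over the AI output, and no report is issued before confirmation. The sketch below follows that reading. The class names, fields, and category labels are hypothetical. Keeping both the AI category and the final category for each lesion is a design choice made here, so that disagreements could later feed the data closed loop of claim 10.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class AIResult:
    lesion_id: str
    malignancy_prob: float  # classification-head output in [0, 1]
    category: str           # hypothetical label set, e.g. "benign" / "malignant"
    size_mm: float          # e.g. longest diameter measured from the segmentation mask

@dataclass
class Correction:
    """Physician edits; None means the AI value is accepted unchanged."""
    category: Optional[str] = None
    size_mm: Optional[float] = None

def build_report(results: List[AIResult], corrections: Dict[str, Correction],
                 confirmed: bool) -> dict:
    if not confirmed:
        raise ValueError("a report is only generated after physician confirmation")
    findings = []
    for r in results:
        c = corrections.get(r.lesion_id, Correction())
        findings.append({
            "lesion_id": r.lesion_id,
            "ai_category": r.category,
            "final_category": c.category if c.category is not None else r.category,
            "size_mm": c.size_mm if c.size_mm is not None else r.size_mm,
            "ai_malignancy_prob": r.malignancy_prob,
            "modified_by_physician": c.category is not None or c.size_mm is not None,
        })
    return {"status": "confirmed", "findings": findings}
```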

7. The method according to claim 6, characterized in that the AI model is a dual-stream fusion network comprising a shared encoder, independent encoders, a cross-modal attention fusion layer, a segmentation decoder, and a classification head.

8. The method according to claim 6, characterized in that the method further includes optimizing the AI model using pathological slide images and constraining the alignment of ultrasound features with pathological features through a knowledge distillation loss function.
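Claims 5 and 8 name a knowledge distillation loss that aligns ultrasound features with pathological features, but they do not give its form. The sketch below assumes a common three-term formulation, L = α·CE(y, p_s) + β·T²·KL(p_t^T ‖ p_s^T) + γ·(1 − cos(f_s, f_t)):

- cross-entropy against the pathology-confirmed label;
- Hinton-style temperature-scaled KL divergence between the teacher's (pathology model's) softened predictions and the student's (ultrasound model's);
- a cosine feature-alignment term.

The weights `alpha`, `beta`, and `gamma` and the temperature are illustrative choices, not values from the patent. The sketch also assumes that both embeddings have already been projected to the same dimension.

```python
import numpy as np

def log_softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # stabilize before exponentiating
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def distillation_loss(s_logits, t_logits, labels, s_feat, t_feat,
                      temperature=4.0, alpha=0.5, beta=0.5, gamma=1.0):
    """Student = ultrasound model, teacher = pathology model (assumed roles).

    s_logits, t_logits: (n, classes); labels: (n,) int class indices
    s_feat, t_feat: (n, d) embeddings, assumed already projected to a shared d
    """
    n = s_logits.shape[0]
    # Hard-label cross-entropy against the pathology-confirmed diagnosis.
    ce = -log_softmax(s_logits)[np.arange(n), labels].mean()
    # KL(teacher || student) on temperature-softened outputs, scaled by T^2.
    log_ps = log_softmax(s_logits / temperature)
    log_pt = log_softmax(t_logits / temperature)
    kl = (np.exp(log_pt) * (log_pt - log_ps)).sum(axis=1).mean() * temperature ** 2
    # Feature alignment: mean (1 - cosine similarity) between embeddings.
    cos = (s_feat * t_feat).sum(axis=1) / (
        np.linalg.norm(s_feat, axis=1) * np.linalg.norm(t_feat, axis=1) + 1e-12)
    align = (1.0 - cos).mean()
    total = alpha * ce + beta * kl + gamma * align
    return total, {"ce": ce, "kl": kl, "align": align}
```

In this arrangement only the student branch runs at inference, so pathology slides would be needed during training only.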

9. The method according to claim 6, characterized in that the multimodal image registration adopts a B-spline-based free-form deformation (FFD) model and uses normalized mutual information as the similarity measure.
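Claim 9 combines two standard registration components, and the sketch below shows both under stated assumptions. The first is Studholme's normalized mutual information, NMI(A, B) = (H(A) + H(B)) / H(A, B), estimated from a joint intensity histogram; the bin count is chosen here for illustration. The second is the displacement of a 2-D uniform cubic B-spline free-form deformation, evaluated from a padded grid of control-point displacements. The optimizer that adjusts the control points to maximize NMI is omitted, as is any GPU acceleration of the kind described in claim 3.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    p = p[p > 0]  # 0 * log 0 is taken as 0
    return float(-np.sum(p * np.log(p)))

def normalized_mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """NMI = (H(A) + H(B)) / H(A, B): range [1, 2], near 1 if independent, 2 if identical."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    return (entropy(p_ab.sum(axis=1)) + entropy(p_ab.sum(axis=0))) / entropy(p_ab.ravel())

def bspline_basis(u: float) -> np.ndarray:
    """Uniform cubic B-spline basis functions B0..B3 at fractional offset u in [0, 1)."""
    return np.array([(1 - u) ** 3 / 6,
                     (3 * u**3 - 6 * u**2 + 4) / 6,
                     (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
                     u**3 / 6])

def ffd_displacement(x: float, y: float, phi: np.ndarray, spacing: float) -> np.ndarray:
    """2-D displacement at (x, y); phi[i, j] is the displacement vector of the
    control point located at ((i - 1) * spacing, (j - 1) * spacing)."""
    ix, iy = int(np.floor(x / spacing)), int(np.floor(y / spacing))
    u, v = x / spacing - ix, y / spacing - iy
    patch = phi[ix:ix + 4, iy:iy + 4]  # 4 x 4 neighbouring control points, shape (4, 4, 2)
    return np.einsum("l,m,lmc->c", bspline_basis(u), bspline_basis(v), patch)
```

NMI depends only on the statistical co-occurrence of intensities, not on their values being equal. This is why it suits B-mode, elastography, and Doppler images, whose intensities mean different things. Because each point depends on only 4 × 4 control points, the FFD keeps deformations smooth and local.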

10. The method according to claim 6, characterized in that the method further includes establishing an "image-pathology" data closed loop that continuously collects clinical pathology data and iteratively optimizes the model, thereby achieving self-evolution of the diagnostic system.
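Claim 10 describes the "image-pathology" closed loop only in terms of its goal. The sketch below shows one possible control logic. It assumes retraining is triggered by a count of newly pathology-confirmed cases, and that a candidate model is promoted only if it beats the deployed one, tied to the model version management of claim 1. `train_and_validate` is a hypothetical callback standing in for the actual training and validation pipeline, and the threshold and metric are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Case = Tuple[str, str]  # (anonymized case ID, pathology-confirmed label)

@dataclass
class ClosedLoopManager:
    """Hypothetical 'image-pathology' loop: accumulate confirmed cases, retrain
    once enough new ones arrive, and promote the candidate only if it improves."""
    retrain_threshold: int
    best_score: float = 0.0  # validation metric of the deployed model
    version: int = 1
    training_pool: List[Case] = field(default_factory=list)
    new_since_last: int = 0

    def add_case(self, case_id: str, pathology_label: str) -> None:
        self.training_pool.append((case_id, pathology_label))
        self.new_since_last += 1

    def maybe_retrain(self, train_and_validate: Callable[[List[Case]], float]) -> bool:
        """Return True if a new model version was promoted."""
        if self.new_since_last < self.retrain_threshold:
            return False
        self.new_since_last = 0
        score = train_and_validate(self.training_pool)  # trains on all cases so far
        if score > self.best_score:
            self.version += 1
            self.best_score = score
            return True
        return False  # keep the current version; the data stays in the pool
```

Gating promotion on a held-out validation score is one way to keep "self-evolution" from silently degrading a deployed model. The patent does not say which safeguard, if any, it uses.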