Method and system for judging diagnosis coincidence degree of medical technology report and pathology report based on large language model, and storage medium

By combining structured feature extraction and mapping with a large language model, retrieval-augmented generation, and model fine-tuning, the method addresses the efficiency and accuracy problems in determining the diagnostic consistency between medical technology reports and pathology reports, achieving efficient and reliable automated quality control.

CN122245810A | Pending | Publication Date: 2026-06-19 | NINGBO TECH PARK MINGTIAN YIWANG TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NINGBO TECH PARK MINGTIAN YIWANG TECH CO LTD
Filing Date
2026-03-13
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing methods for determining the consistency of diagnoses between medical technology reports and pathology reports are time-consuming and labor-intensive and struggle to cover all cases, while automated systems have difficulty accurately understanding medical logical relationships, leading to frequent missed diagnoses and misdiagnoses.

Method used

A large language model-based approach combines structured feature extraction, vertical-domain structured prompts, retrieval-augmented generation, and model fine-tuning with a confidence assessment based on probability distribution vectors, achieving automated mapping and quality control of imaging features and pathological diagnostic features.

Benefits of technology

The method improves the accuracy and efficiency of judging the diagnostic consistency between medical technology reports and pathology reports, ensures the objectivity and reliability of quality control conclusions, and reduces the occurrence of missed diagnoses and misdiagnoses.

✦ Generated by Eureka AI based on patent content.


Abstract

This application relates to a method, system, and storage medium for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model. The method automatically parses the logical association between medical technology and pathology reports through structured feature extraction and vertical-domain prompts. By combining retrieval-augmented generation and model fine-tuning, it improves the accuracy with which medical terminology and hierarchical logic are understood. It also introduces a confidence assessment and triage mechanism based on probability distribution vectors to identify and isolate low-confidence predictions, ensuring the objectivity of quality control conclusions.

Description

Technical Field

[0001] This application relates to the field of medical information technology, and in particular to a method, system, and storage medium for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model.

Background Technology

[0002] In clinical practice, medical examination reports, such as those from radiology, ultrasound, and endoscopy, along with pathological examination reports, are the two pillars of disease diagnosis. Typically, medical examinations provide a preliminary assessment of the nature of a lesion by observing its morphology, blood flow, and anatomical features, while pathological diagnosis, through microscopic observation of tissue or cell morphology, is widely recognized as the gold standard for clinical diagnosis.

[0003] For the same lesion in the same patient, the diagnostic conclusions in the medical technology report and the final conclusions in the pathology report should maintain a high degree of consistency. Conducting verification of the consistency between medical technology and pathology diagnoses is not only a core requirement for quality control in medical institutions, but also an important means to assess the accuracy of clinical diagnoses, reduce misdiagnosis and missed diagnosis, and improve the quality of medical care.

[0004] However, in existing medical quality control practices, determining diagnostic concordance faces the following main challenges: First, with the number of reports generated by large medical institutions increasing exponentially every day, the manual review method that relies on professional physicians to read and compare each report is not only time-consuming and labor-intensive, but also often only allows for quality control through random sampling, making it difficult to achieve comprehensive coverage of all cases. This may result in missed or misdiagnosed cases with significant clinical importance.

[0005] Secondly, existing information technology methods mostly rely on keyword matching or simple natural language processing (NLP) rules. Reports contain a large number of synonyms, abbreviations, and uncertainty modifiers such as "considered as," "not excluded," and "consistent with," and different departments use different reporting language systems. As a result, traditional automated systems have difficulty deeply understanding the medical logical relationships in the reports and cannot accurately determine the mapping relationship between imaging grades and pathological histological characteristics, often producing a large number of false alarms.

Summary of the Invention

[0006] To address the aforementioned issues, this application provides a highly reliable method, system, and storage medium for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model.

[0007] To achieve the above objectives, in a first aspect, embodiments of this application provide a method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model, comprising the following steps: acquiring the medical technology report text and pathology report text to be processed, and extracting the imaging feature set from the medical technology report and the pathological diagnosis feature set from the pathology report through a preset structured feature extraction model; constructing vertical-domain structured prompt instructions, which include at least preset role constraints, a task definition protocol, a structured output template, few-shot learning examples, and compliance constraints; mapping the imaging feature set and the pathological diagnosis feature set into the vertical-domain structured prompt instructions and inputting them into a preset large language model, whose inference engine generates diagnostic consistency prediction data; extracting the probability distribution vector of the output layer of the large language model, calculating the confidence index value of the prediction conclusion based on the probability distribution vector, and triaging the prediction conclusion according to the mapping relationship between the confidence index value and a preset threshold; and, based on the triage result, outputting a structured diagnostic compliance report that includes the compliance level, the judgment logic basis, and the key difference dimensions.

[0008] Preferably, the steps for extracting the imaging feature set and the pathological diagnostic feature set include: invoking a preset medical terminology standard alignment algorithm to map the terms in the medical technology report and pathology report to a standard medical terminology space. The specific execution steps of the medical terminology standard alignment algorithm include: inputting the medical technology report and pathology report into a named entity recognition model built on BiLSTM-CRF, which identifies medical entity boundaries in the text in order to extract candidate medical terms; traversing a pre-built standardized medical terminology mapping dictionary and performing exact-match retrieval on the candidate medical terms, where the dictionary includes at least a thesaurus index that points semantically equivalent terms to the same standard term and an abbreviation expansion table that maps abbreviations to their full names; in response to a candidate medical term not finding an exact match in the dictionary, calculating the character edit distance between the candidate medical term and the target standard term, as well as the length of their topological association path in the medical ontology knowledge graph; and, if the character edit distance is less than a preset threshold and the topological association path length is less than 3, establishing a standard mapping relationship between the candidate medical term and the target standard term.

[0009] Preferably, the task definition protocol is configured with four levels of logical decision rules: Compliant: the imaging feature set and the pathological diagnostic feature set are logically consistent in the preset malignancy assessment dimension; Basically compliant: the imaging feature set and the pathological diagnostic feature set are logically consistent in the main diagnostic dimension but differ in non-core medical description dimensions; Non-compliant: the imaging feature set and the pathological diagnostic feature set are logically mutually exclusive in the diagnostic conclusion dimension; Irrelevant: the anatomical targets corresponding to the imaging feature set and the pathological diagnostic feature set do not overlap.

[0010] Preferably, the large language model performs reasoning assisted by retrieval-augmented generation, and the specific execution steps include: The imaging feature set is converted into a vector to be retrieved, and the K medical knowledge entries with the highest matching degree are retrieved from a preset medical knowledge vector database. The medical knowledge vector database is constructed by vectorizing medical guideline documents, disease classification standards, and a standard medical terminology database. The fusion weight $w_i$ between each medical knowledge entry and the vector to be retrieved is calculated using the following formula: $w_i = \frac{\exp(sim_i / \tau)}{\sum_{j=1}^{K} \exp(sim_j / \tau)}$. In the formula, $sim_i$ is the cosine similarity between the i-th medical knowledge entry vector and the vector to be retrieved, and $\tau$ is the preset temperature coefficient. The medical knowledge entries are then weighted according to the fusion weight $w_i$, and the weighted information is embedded into the context of the vertical-domain structured prompt instructions.

[0011] Preferably, the large language model is fine-tuned based on the LoRA algorithm, and the specific execution steps include: The rank of the low-rank decomposition matrix of the LoRA algorithm is set to an integer value between 8 and 64; the target weight parameters for fine-tuning are fixed, covering at least the query matrix $W_q$, key matrix $W_k$, value matrix $W_v$, and output matrix $W_o$ in the attention layers of the large language model, and the up-projection matrix $W_{up}$ and down-projection matrix $W_{down}$ in the feedforward neural network layers; the cross-entropy loss function is used as the target loss function in the fine-tuning process, and the calculation formula is as follows: $L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log(p_{i,c})$. In the formula, $N$ is the number of training samples; $C$ is the total number of categories for diagnostic conformity classification; $y_{i,c}$ is the true label indicating whether the i-th sample belongs to the c-th class; and $p_{i,c}$ is the probability predicted by the model that the i-th sample belongs to the c-th class.

[0012] Preferably, the processing steps for calculating the confidence index value of the prediction conclusion are as follows: Obtain the Logits vector of the output layer when the large language model generates the prediction conclusion, and calculate the probability distribution entropy $H$ using the following formula: $H = -\sum_{i=1}^{V} p_i \log p_i$. In the formula, $V$ is the dimension of the output probability distribution vector, and $p_i$ is the predicted probability of the i-th component. $p_i$ is obtained by performing Softmax normalization on the Logits vector of the model output layer, and the calculation formula is: $p_i = \frac{\exp(z_i)}{\sum_{j=1}^{V}\exp(z_j)}$, where $z_i$ is the i-th component value of the Logits vector. If the calculated probability distribution entropy $H$ satisfies the preset dispersion condition, the uncertainty of the model prediction is determined to be too high, an anomaly identification signal is added to the prediction data, and the prediction conclusion is redirected to the secondary verification queue.

[0013] Preferably, the structured diagnostic compliance report further includes correction guidance data for medical technology reports, the generation steps of which include: Qualitative features are extracted from the pathological diagnostic feature set and compared with the imaging feature set in multiple dimensions. Local feature descriptions in the medical technology report that contradict the qualitative features are identified, and corresponding correction guidance data is matched from a preset medical image description strategy library.

[0014] Preferably, the method further includes a model iteration step based on feedback data: The report feature pairs calibrated through secondary verification are obtained and stored in the incremental training library as supervised training samples. Based on the GPPO algorithm, the parameters of the reward model of the large language model are updated, and the updated reward model is used to guide the policy optimization of the large language model. The reward value $R(y|x)$ used to update the parameters is calculated as: $R(y|x) = \alpha \cdot R_{acc}(y, y^{*}) + \beta \cdot R_{format}(y) + \gamma \cdot R_{expert}(y)$. In the formula, $x$ represents the input report features, $y$ represents the model output, and $y^{*}$ represents the standard answer annotated by experts; $R_{acc}(y, y^{*})$ is the accuracy reward term, which takes the first preset value when the judgment result of $y$ is consistent with $y^{*}$ and 0 otherwise; $R_{format}(y)$ is the format compliance reward term, which takes the second preset value when $y$ conforms to the preset structured JSON format specification and 0 otherwise; $R_{expert}(y)$ is the expert ranking preference reward, calculated based on the Bradley-Terry model: $R_{expert}(y) = \sigma(r_w - r_l)$, where $r_w$ and $r_l$ are the latent reward scores of the preferred and non-preferred outputs in the expert annotations, respectively, and $\sigma$ is the Sigmoid function; $\alpha$, $\beta$, and $\gamma$ are the corresponding weight coefficients.
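As an illustration of how the three reward terms described above might be combined, the following minimal Python sketch assumes a weighted sum of the terms and a sigmoid of the score difference for the Bradley-Terry preference term. The function name `reward`, the JSON key `level`, and all default weights and preset values are hypothetical and are not specified in this application.

```python
import json
import math


def sigmoid(x: float) -> float:
    """Logistic function used for the Bradley-Terry preference probability."""
    return 1.0 / (1.0 + math.exp(-x))


def reward(y: str, y_star: str, r_w: float, r_l: float,
           alpha: float = 1.0, beta: float = 0.5, gamma: float = 0.5,
           acc_value: float = 1.0, fmt_value: float = 1.0) -> float:
    """Weighted sum of accuracy, format, and expert-preference rewards.

    y       : raw model output (expected to be a JSON object string)
    y_star  : expert-annotated compliance level
    r_w/r_l : latent scores of the preferred / non-preferred outputs
    All default coefficients are illustrative assumptions.
    """
    try:
        parsed = json.loads(y)
        fmt_ok = isinstance(parsed, dict) and "level" in parsed
    except json.JSONDecodeError:
        parsed, fmt_ok = None, False

    r_acc = acc_value if fmt_ok and parsed["level"] == y_star else 0.0
    r_fmt = fmt_value if fmt_ok else 0.0
    r_expert = sigmoid(r_w - r_l)  # assumed Bradley-Terry form
    return alpha * r_acc + beta * r_fmt + gamma * r_expert


good = reward('{"level": "compliant"}', "compliant", r_w=0.0, r_l=0.0)
bad = reward("not json", "compliant", r_w=0.0, r_l=0.0)
```

With equal latent scores the preference term contributes 0.5 before weighting, so a well-formed, correct output scores strictly higher than a malformed one under these assumed coefficients.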

[0015] In a second aspect, embodiments of this application provide a system for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model, including: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any embodiment of the first aspect.

[0016] In a third aspect, embodiments of this application provide a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the method described in any embodiment of the first aspect.

[0017] This application presents a method, system, and storage medium for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model. Through structured feature extraction and vertical-domain prompts, it automatically parses the logical connections between medical technology and pathology reports. By combining retrieval-augmented generation and model fine-tuning, it improves the accuracy with which medical terminology and hierarchical logic are understood. It also introduces a confidence assessment and triage mechanism based on probability distribution vectors to identify and isolate low-confidence predictions, ensuring the objectivity of quality control conclusions.

Attached Figure Description

[0018] Figure 1 is a flowchart of the method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model, provided in an embodiment of this application.

[0019] Figure 2 is a localized deployment architecture diagram of the system for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model, provided in an embodiment of this application.

Detailed Implementation

[0020] The preferred embodiments of this application are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein are for illustration and explanation only and are not intended to limit this application.

[0021] In a first aspect, embodiments of this application provide a method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model. In this embodiment, the method is executed by a computing device equipped with a processor and memory. Specifically, as shown in Figure 1, the method includes the following steps: Step S1: Obtain the report text to be processed and extract its features.

[0022] Specifically, the process first acquires the texts of medical technology reports (such as radiology reports, ultrasound reports, or endoscopy reports) and pathology reports to be processed. Then, using a pre-defined structured feature extraction model, it extracts the imaging feature set from the medical technology report and the pathological diagnostic feature set from the pathology report, respectively. In this embodiment, the imaging feature set includes, but is not limited to, examination site, imaging manifestations, and initial diagnostic grading such as BI-RADS / TI-RADS; the pathological diagnostic feature set includes, but is not limited to, the location of the submitted tissue, pathological type, and degree of differentiation.

[0023] Preferably, the feature extraction process also includes specific steps for extracting the imaging feature set and the pathological diagnosis feature set, that is, by calling a preset medical terminology standard alignment algorithm, the terms in the medical technology report and pathology report are mapped to the standard medical terminology space to eliminate the semantic gap caused by different doctors' writing habits and different departmental discourse systems.

[0024] Specifically, the execution steps of the medical terminology standard alignment algorithm include: First, unstructured medical and pathology reports are input into a pre-trained Named Entity Recognition (NER) model. In this embodiment, the model employs a BiLSTM-CRF (Bidirectional Long Short-Term Memory Network-Conditional Random Field) architecture. The BiLSTM layer captures the contextual semantic features of each character or word in the report text, addressing long-distance dependencies in medical descriptions, such as "left upper lobe...sees a nodule." The CRF layer handles the constraints of the label sequence; for example, the label "I-Disease" must immediately follow "B-Disease," thus accurately identifying the start and end boundaries of medical entities. This model can extract a series of candidate medical terms from continuous text.
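The label constraint described above (an "I-" label may only continue a span opened by a matching "B-" label) can be illustrated by the decoding step that turns a tagged sequence into candidate terms. The sketch below is a minimal illustration, not the BiLSTM-CRF model itself: the function `decode_bio`, the tag names, and the example tokens are assumptions, and the tags are taken as already predicted.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) spans from a BIO-tagged sequence."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def close_span() -> None:
        # Flush the open span, if any, into the result list.
        nonlocal current, current_type
        if current and current_type is not None:
            entities.append((" ".join(current), current_type))
        current, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            close_span()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)  # valid continuation of the open span
        else:
            close_span()  # "O", or an I- tag with no matching B- tag

    close_span()
    return entities


# Hypothetical tagger output for a short report fragment.
tokens = ["left", "upper", "lobe", "shows", "a", "nodule"]
tags = ["B-Site", "I-Site", "I-Site", "O", "O", "B-Disease"]
entities = decode_bio(tokens, tags)
```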

[0025] Subsequently, the pre-built standardized mapping dictionary of medical terms is traversed, and key-value matching is performed on the extracted candidate medical terms. The dictionary contains two core sub-tables: Thesaurus: Stores semantically equivalent pairs of terms. For example, it unifies "malignant lesion", "space-occupying lesion (highly suspicious)" and "Ca" to the standard term "malignant tumor".

[0026] Abbreviation Expansion Table: Stores the correspondence between commonly used clinical abbreviations and their full names. For example, "NHL" is mapped to "non-Hodgkin's lymphoma", and "SCC" is mapped to "squamous cell carcinoma".

[0027] If a candidate term is found in the dictionary, the corresponding standard term is output directly, and the process ends.

[0028] Next, in response to a candidate medical term not finding an exact match in the aforementioned dictionary (for example, due to a typo, a non-standard combination of words, or an extremely rare abbreviation), the semantic disambiguation engine, based on a hybrid strategy, performs the following calculations: Calculate the character edit distance: The Levenshtein distance algorithm is used to calculate the character differences between a candidate term (such as a misspelled variant of "invasive ductal carcinoma") and the target standard term (such as "invasive ductal carcinoma").

[0029] Calculate the path length of topological association: Locate the core root of the candidate term and the node position of the target standard term in the medical ontology knowledge graph (such as UMLS or SNOMED CT), and calculate the shortest path length between them in the graph.

[0030] Finally, if the character edit distance is less than a preset threshold (for example, fewer than 2 character differences) and the topological association path length between the two terms in the knowledge graph is less than 3, the two are determined to have a substantive semantic affiliation, and a standard mapping is established.

[0031] For example: If the report contains "papillary thyroid Ca", edit distance analysis shows that "Ca" and "Carcinoma" are spelled very differently, but "Ca" matches an entry in the abbreviation table. In the knowledge graph, the "papillary thyroid carcinoma" node points to the parent node "thyroid cancer" with a path length of 1 (direct subclass). This satisfies the condition "path length < 3", so the term is successfully standardized as "papillary thyroid carcinoma", avoiding missed detection due to non-standard writing.
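The alignment steps above (dictionary lookup first, then the edit-distance and path-length fallback) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the toy dictionaries and graph, the function names, and the `candidate_root` argument (the graph node of the candidate's core root, assumed to be resolved beforehand) are all hypothetical, and a real system would query an ontology such as UMLS or SNOMED CT instead.

```python
from collections import deque
from typing import Dict, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def path_length(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Shortest hop count between two nodes (breadth-first search)."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def align_term(candidate: str, synonyms: Dict[str, str], abbreviations: Dict[str, str],
               graph: Dict[str, List[str]], candidate_root: str,
               max_edit: int = 2, max_path: int = 3) -> Optional[str]:
    """Exact dictionary match first; otherwise edit distance plus graph distance."""
    key = candidate.lower()
    if key in synonyms:
        return synonyms[key]
    if key in abbreviations:
        return abbreviations[key]
    for target in graph:  # standard terms are the graph nodes in this toy setup
        hops = path_length(graph, candidate_root, target)
        if levenshtein(key, target) < max_edit and hops is not None and hops < max_path:
            return target
    return None


# Toy data, for illustration only.
synonyms = {"malignant lesion": "malignant tumor"}
abbreviations = {"nhl": "non-Hodgkin's lymphoma", "scc": "squamous cell carcinoma"}
graph = {
    "ductal carcinoma": ["invasive ductal carcinoma"],
    "invasive ductal carcinoma": ["ductal carcinoma"],
}
mapped = align_term("invasive ductal carcinma", synonyms, abbreviations,
                    graph, candidate_root="ductal carcinoma")
```

Here the misspelled candidate differs from the standard term by one character and sits one hop from it in the toy graph, so both thresholds from the text are met.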

[0032] Step S2: Construct structured prompt words for the vertical domain.

[0033] Specifically, the constructed vertical-domain structured prompt instructions include at least preset role constraints (such as defining the model as a senior medical quality control expert), a task definition protocol, a structured output template (such as a JSON format specification), few-shot learning examples (including positive and negative examples), and compliance constraints (such as privacy protection and medical ethics requirements).

[0034] The task definition protocol is configured with four levels of logical judgment rules to define the specific categories of compliance: Compliant: the imaging feature set and the pathological diagnostic feature set are logically consistent in the preset malignancy assessment dimension; for example, the imaging is classified as category 5, and the pathology confirms cancer.

[0035] Basically compliant: the imaging feature set and the pathological diagnostic feature set are logically consistent in the main diagnostic dimension but differ in non-core medical descriptive dimensions; for example, the degree of malignancy is consistent, but there are slight deviations in the detailed description of the specific degree of differentiation.

[0036] Non-compliant: the imaging feature set and the pathological diagnostic feature set are logically mutually exclusive in the diagnostic conclusion dimension; for example, the imaging description is benign, but the pathological conclusion is malignant.

[0037] Irrelevant: the anatomical targets corresponding to the imaging feature set and the pathological diagnostic feature set do not overlap; for example, the two reports target completely different lesion sites.
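The five prompt components and the four compliance levels described in step S2 can be assembled as in the minimal Python sketch below. The section tags, JSON keys, wording of each section, and the few-shot example are illustrative assumptions rather than the prompt used in this application.

```python
import json
from enum import Enum
from typing import Dict, List


class ComplianceLevel(str, Enum):
    COMPLIANT = "compliant"
    BASICALLY_COMPLIANT = "basically compliant"
    NON_COMPLIANT = "non-compliant"
    IRRELEVANT = "irrelevant"


def build_prompt(imaging: Dict[str, str], pathology: Dict[str, str],
                 few_shot: List[dict]) -> str:
    """Assemble role, task, output template, examples, and constraints."""
    levels = ", ".join(level.value for level in ComplianceLevel)
    sections = [
        "[Role] You are a senior medical quality control expert.",
        "[Task] Classify the agreement between the imaging features and the "
        f"pathology features as exactly one of: {levels}.",
        "[Output] Return a JSON object with keys: level, basis, key_differences.",
        "[Examples]\n" + "\n".join(json.dumps(ex, ensure_ascii=False) for ex in few_shot),
        "[Constraints] Do not output patient-identifying information.",
        "[Imaging features] " + json.dumps(imaging, ensure_ascii=False),
        "[Pathology features] " + json.dumps(pathology, ensure_ascii=False),
    ]
    return "\n\n".join(sections)


# Hypothetical feature sets and one hypothetical positive example.
prompt = build_prompt(
    imaging={"site": "left breast", "grading": "BI-RADS 5"},
    pathology={"site": "left breast", "type": "invasive ductal carcinoma"},
    few_shot=[{"imaging": "BI-RADS 5", "pathology": "carcinoma", "level": "compliant"}],
)
```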

[0038] Step S3: Model reasoning and knowledge enhancement.

[0039] Specifically, the imaging feature set and the pathological diagnosis feature set are mapped to the vertical domain structured prompt word instructions, and input into a preset large language model, so as to generate diagnostic conformity prediction data through the inference engine of the large language model.

[0040] In this embodiment, the large language model uses retrieval-augmented generation to assist reasoning and reduce the incidence of logical errors. Specific execution steps include: First, a highly authoritative preset medical knowledge vector database is constructed. In this embodiment, the preferred data sources for this database include: ICD-11 disease classification standard documents, SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms), and the latest version of the WHO tumor classification guidelines.

[0041] Subsequently, pre-trained text embedding models in the medical field, such as fine-tuned versions of PubMedBERT or BioBERT, are used to split the unstructured documents into chunks and vectorize them. Each knowledge entry contains the original text block and its corresponding d-dimensional feature vector.

[0042] During the inference phase, the imaging feature set extracted in step S1 is converted into a retrieval vector $q$, the cosine similarity between $q$ and the vector $k_j$ of each entry in the knowledge base is calculated, and the K medical knowledge entries with the highest matching degree are selected based on the similarity ranking. In this embodiment, in order to balance the length limit of the context window and the richness of information, the value range of K is preferably set to 3 to 10 (e.g., K=5).

[0043] Meanwhile, to prevent low-relevance items (noise) from interfering with the model's reasoning logic, this embodiment does not directly concatenate the search results. Instead, a temperature coefficient-based weighted fusion algorithm is used to calculate the fusion weight $w_i$ for each knowledge entry. The calculation formula is as follows: $w_i = \frac{\exp(sim_i / \tau)}{\sum_{j=1}^{K} \exp(sim_j / \tau)}$. In the formula, $sim_i$ is the cosine similarity between the i-th medical knowledge entry vector and the vector to be retrieved. The preset temperature coefficient $\tau$ is used to adjust the sharpness of the weight distribution. In this embodiment, the temperature coefficient $\tau$ is preferably set to a range of 0.05 to 0.2.

[0044] After calculating the weights, the medical knowledge entries are weighted according to the fusion weight $w_i$, and the enhanced knowledge information is embedded into the context of the vertical-domain structured prompt instructions, for example using the tag format "[Reference Knowledge] {Integrated Knowledge Content}".
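The retrieval and weighting steps above can be sketched with the Python standard library alone. The sketch assumes a temperature-scaled softmax over the top-K cosine similarities; how the weights are then applied to the text is not detailed here, so `format_reference` simply orders entries by weight and drops those below a hypothetical `min_weight`. All function names and the toy vectors are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_weighted(query: Sequence[float],
                      entries: List[Tuple[str, Sequence[float]]],
                      k: int = 5, tau: float = 0.1) -> List[Tuple[str, float]]:
    """Return the top-k entries with temperature-softmax fusion weights."""
    scored = sorted(((cosine(query, vec), text) for text, vec in entries),
                    reverse=True)[:k]
    top = scored[0][0]
    # Subtracting the maximum keeps exp() numerically stable at small tau.
    exps = [math.exp((sim - top) / tau) for sim, _ in scored]
    total = sum(exps)
    return [(text, e / total) for (_, text), e in zip(scored, exps)]


def format_reference(weighted: List[Tuple[str, float]], min_weight: float = 0.05) -> str:
    """Embed the retained entries using the tag format named in the text."""
    kept = [f"{text} (weight={w:.2f})" for text, w in weighted if w >= min_weight]
    return "[Reference Knowledge] " + " | ".join(kept)


# Toy 2-dimensional embeddings, for illustration only.
entries = [("entry A", [1.0, 0.0]), ("entry B", [1.0, 1.0]), ("entry C", [0.0, 1.0])]
weighted = retrieve_weighted([1.0, 0.0], entries, k=2, tau=0.1)
context = format_reference(weighted)
```

A smaller `tau` concentrates nearly all weight on the closest entry, while a larger `tau` spreads it more evenly, which is the sharpness adjustment the temperature coefficient provides.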

[0045] In some embodiments, a low-rank adaptation (LoRA) algorithm is used to perform instruction tuning on the model, enabling the general-purpose large language model to adapt to the complex logical mapping relationship between medical technology reports and pathology reports. Specific execution steps include: (1) Hyperparameter setting: The rank R of the low-rank decomposition matrix of the LoRA algorithm is set to an integer value between 8 and 64. In this embodiment, R is preferably set to 16.

[0046] (2) Target Module Selection: This embodiment comprehensively fine-tunes the core weights in the Transformer architecture, so as to retain the general language capabilities of the large model while injecting medical domain knowledge; the specific fine-tuning objects include: Attention layer: query matrix $W_q$, key matrix $W_k$, value matrix $W_v$, and output matrix $W_o$.

[0047] Feedforward network layer (FFN): up-projection matrix $W_{up}$ and down-projection matrix $W_{down}$.

[0048] (3) Loss Function Calculation: The fine-tuning process adopts a fully supervised learning mode, aiming to minimize the difference between the predicted probability distribution and the expert-annotated true distribution. In this embodiment, the cross-entropy loss function is used as the target loss function for the fine-tuning process, and the calculation formula is as follows: $L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log(p_{i,c})$. In the formula, $N$ is the number of samples in the current training batch; $C$ is the total number of categories for diagnostic compliance classification, and in this embodiment $C = 4$, corresponding to "compliant", "basically compliant", "non-compliant", and "irrelevant" respectively; $y_{i,c}$ is the true label indicating whether the i-th sample belongs to the c-th class, represented using one-hot encoding (1 if it belongs to the class, 0 otherwise); and $p_{i,c}$ is the probability predicted by the model that the i-th sample belongs to the c-th class, that is, the output after Softmax normalization.

[0049] By backpropagating based on this loss function, the parameters of the LoRA matrix are continuously updated until the model converges on the validation set, thereby obtaining a dedicated model with medical quality control capabilities.
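To make the loss in step (3) concrete, the following minimal Python sketch evaluates a batch-averaged cross-entropy over the four compliance categories. Averaging over the batch and the example probabilities are assumptions made for illustration; in practice the loss would be computed by the training framework on the model's Softmax outputs.

```python
import math
from typing import List

LABELS = ["compliant", "basically compliant", "non-compliant", "irrelevant"]


def cross_entropy(probs: List[List[float]], targets: List[List[int]]) -> float:
    """Mean cross-entropy between predicted distributions and one-hot labels."""
    total = 0.0
    for p_row, y_row in zip(probs, targets):
        # Only the true class contributes, because the labels are one-hot.
        total -= sum(y * math.log(p) for y, p in zip(y_row, p_row) if y > 0)
    return total / len(probs)


# Hypothetical batch of two samples (N = 2, C = 4).
probs = [
    [0.70, 0.20, 0.05, 0.05],  # true class: "compliant"
    [0.10, 0.10, 0.60, 0.20],  # true class: "non-compliant"
]
targets = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
loss = cross_entropy(probs, targets)
```

For this batch the loss equals the average of -ln(0.70) and -ln(0.60), so raising the probability assigned to the true class lowers the loss.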

[0050] Step S4: Confidence assessment and triage.

[0051] In this step, the uncertainty of the model output is quantified to automatically triage the prediction conclusions, thereby avoiding the output of erroneous diagnostic conclusions. The specific processing flow is as follows: Extract the probability distribution vector (Logits) from the output layer of the large language model and calculate the confidence index value of the prediction conclusion. The specific processing steps are as follows: S41: Obtain Logits and perform Softmax normalization.

[0052] When the large language model completes inference, the last token position of its output layer produces a raw numerical vector called the Logits vector (Z). The values in this vector usually lie in the range (−∞, +∞) and have no direct probabilistic meaning.

[0053] First, the Logits vector is subjected to Softmax normalization, mapping it to a probability distribution P. The calculation formula is as follows: $p_i = \frac{\exp(z_i)}{\sum_{j=1}^{V}\exp(z_j)}$. In the formula, $z_i$ represents the raw value of the i-th dimension of the Logits vector, and $V$ represents the dimension of the output probability distribution vector. In this embodiment, $V$ corresponds to the total number of categories for diagnostic compliance classification. For example, with the four-level classification of "compliant", "basically compliant", "non-compliant", and "irrelevant", V=4.

[0054] $p_i$ is the normalized probability predicted by the model that the current case belongs to the i-th category, and satisfies $\sum_{i=1}^{V} p_i = 1$.

[0055] S42: Calculate the entropy of the probability distribution.

[0056] In this embodiment, the information entropy H of the probability distribution is calculated as the confidence index: $H = -\sum_{i=1}^{V} p_i \log p_i$. This avoids the problem that the maximum probability value alone is insufficient to measure uncertainty.

[0057] When H approaches 0, it indicates that the probability distribution is very sharp, for example, P=[0.99,0.01,0,0], which shows that the model is very confident in the judgment result and has no hesitation; while when H is large, it indicates that the probability distribution tends to be flat, for example, P=[0.25,0.25,0.25,0.25], which shows that the model thinks that all four conclusions are possible and is in a state of confusion.
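The two example distributions above can be worked through numerically. The base of the logarithm is not stated here, so the values below assume the natural logarithm and the usual convention that a zero-probability term contributes nothing to the sum:

```latex
\begin{aligned}
H\big([0.99,\ 0.01,\ 0,\ 0]\big)
  &= -\left(0.99\ln 0.99 + 0.01\ln 0.01\right)
   \approx 0.0099 + 0.0461 = 0.056, \\
H\big([0.25,\ 0.25,\ 0.25,\ 0.25]\big)
  &= -4 \times 0.25\ln 0.25 = \ln 4 \approx 1.386 .
\end{aligned}
```

Under this assumption the entropy for V=4 ranges from 0 to ln 4, so the sharp distribution sits near the bottom of the range and the flat distribution sits at its maximum.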

[0058] S43: Automated traffic splitting based on discreteness conditions.

[0059] In this embodiment, a discreteness determination threshold is preset, which is preferably set to 0.7, and the following splitting logic is executed: Scenario 1 (Abnormal Interception): If the calculated H > 0.7, it is determined that the uncertainty of the model prediction is too high. An abnormal flag signal with status: "REVIEW_REQUIRED" can be added to the generated JSON data, and the prediction data can be redirected to the secondary verification queue to force manual intervention for review.

[0060] Scenario 2 (fast track): If H≤0.7, the model prediction confidence level is considered to be up to standard. The category with the highest probability can be directly extracted as the final conclusion, and a final report can be generated.
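The splitting logic of Scenario 1 and Scenario 2 could be sketched as follows. This is an illustration only: the 0.7 threshold and the "REVIEW_REQUIRED" status come from the text above, while the natural-log entropy, the label strings, and the `route` field values are assumptions made for the example.

```python
import math

ENTROPY_THRESHOLD = 0.7  # preferred value in this embodiment
LABELS = ["conforms", "largely conforms", "does not conform", "irrelevant"]

def triage(probs, threshold=ENTROPY_THRESHOLD):
    """Route a prediction by the entropy of its probability distribution."""
    h = -sum(p * math.log(p) for p in probs if p > 0)  # natural log is an assumption
    top = max(range(len(probs)), key=probs.__getitem__)
    record = {"conclusion": LABELS[top], "entropy": round(h, 4)}
    if h > threshold:
        # Scenario 1: uncertainty too high, force manual review
        record["status"] = "REVIEW_REQUIRED"
        record["route"] = "secondary_verification_queue"  # hypothetical queue name
    else:
        # Scenario 2: fast track, highest-probability class is the final conclusion
        record["route"] = "final_report"  # hypothetical route name
    return record

print(triage([0.97, 0.01, 0.01, 0.01]))
print(triage([0.40, 0.30, 0.20, 0.10]))
```

The first call stays on the fast track, while the second is flagged for secondary verification because its distribution is too flat.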

[0061] Through the automated data splitting process based on the probability distribution vector mapping relationship described above, the data flow direction can be dynamically adjusted according to the confidence index value, thereby maximizing the processing efficiency of full-volume quality control while ensuring the reliability of the quality control conclusions.

[0062] Step S5: Result output and correction guidance.

[0063] Specifically, based on the results of the traffic splitting process, a structured diagnostic compliance report is output, which includes compliance level, judgment logic basis, and key difference dimensions.
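One possible shape for such a structured report is sketched below. The three fields mirror the items listed above; the field names, the `ComplianceReport` class, and the sample values are hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ComplianceReport:
    """Hypothetical container for the three report items named in the text."""
    compliance_level: str
    judgment_basis: str
    key_difference_dimensions: list = field(default_factory=list)

report = ComplianceReport(
    compliance_level="does not conform",
    judgment_basis="Imaging grade suggests benign; pathology confirms malignancy",
    key_difference_dimensions=["malignancy assessment"],
)

# Serialize to JSON so that downstream systems can parse the result
print(json.dumps(asdict(report), ensure_ascii=False, indent=2))
```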

[0064] In this embodiment, the structured diagnostic compliance report also includes correction guidance data for medical technology reports, and its generation steps include: Qualitative features are extracted from the pathological diagnostic feature set and compared with the imaging feature set in multiple dimensions. Local feature descriptions in the medical technology report that contradict the qualitative features are identified, and corresponding correction guidance data is matched from a preset medical image description strategy library. For example, radiologists are prompted to pay attention to specific imaging signs to reduce missed diagnoses.
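The matching step against the strategy library might look like the sketch below. The text does not specify how the library is keyed, so the dictionary layout, the key strings, and the function name `match_guidance` are assumptions; the sample entry is likewise only illustrative.

```python
# Hypothetical strategy library:
# (qualitative pathology feature, contradicting imaging description) -> guidance text
STRATEGY_LIBRARY = {
    ("malignant", "clear borders"):
        "Review ultrasound images, paying attention to microcalcifications "
        "and aspect ratio characteristics",
}

def match_guidance(pathology_nature, imaging_descriptions):
    """Collect guidance for each imaging description that contradicts the pathology."""
    hits = []
    for phrase in imaging_descriptions:
        guidance = STRATEGY_LIBRARY.get((pathology_nature, phrase))
        if guidance is not None:
            hits.append({"contradicting_description": phrase, "guidance": guidance})
    return hits

print(match_guidance("malignant", ["clear borders", "solid nodule"]))
```

A plain lookup keeps the example short; a real library would presumably need fuzzier matching between report wording and library keys.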

[0065] Step S6: Feedback and iterative optimization.

[0066] In some embodiments, the method further includes a model iteration step based on feedback data: (1) Feedback data acquisition and sample library construction: The system receives report feature pairs in real time after calibration by a secondary verification queue. Specifically, it structures the correction opinions, correct diagnostic answers, and discussion conclusions for controversial cases entered by quality control experts, extracts the original error prediction-human correction label feature pairs, and stores them as high-value supervised samples in the incremental training database.

[0067] (2) Iterative optimization of the prompt word instruction set: Based on the error cases in the incremental training database, semantic bias analysis is performed to identify ambiguous expressions or logical blind spots in the original structured prompt instructions: Example Supplement: Based on the identified bias patterns, representative cases are selected from the incremental library as new few-shot learning examples to replace the original low-contribution examples.

[0068] Rule Refinement: The terminology standardization rule base in the medical terminology standard alignment algorithm is updated synchronously to include newly emerging medical abbreviations and department-specific titles into the standard mapping relationship.

[0069] (3) Periodic evolution of model parameters and strategies: Configured with a periodic retraining engine (e.g., triggered quarterly or when a preset sample size threshold is reached): Incremental fine-tuning: The large language model is incrementally fine-tuned using an incremental training database, with targeted feature enhancement specifically for weak specialties (such as ophthalmology and dentistry).

[0070] Reinforcement learning optimization: Obtain the report feature pairs after secondary verification and calibration, and store them as supervised training samples in the incremental training library; based on the GPPO algorithm, update the parameters of the reward model of the large language model, and use the updated reward model to guide the policy optimization of the large language model.

[0071] Specifically, cases that have been corrected by quality control experts are periodically extracted from the secondary verification queue to construct triplet data $(x, y_{\text{model}}, y_{\text{expert}})$, where $x$ is the original input (the features of the medical technology and pathology reports), $y_{\text{model}}$ is the model's raw output (which may contain errors or flaws), and $y_{\text{expert}}$ is the standard answer revised by experts. This data is stored in the incremental training library for subsequent training of the reward model and optimization of the policy model.

[0072] To guide the model to converge to the optimal policy, the reward value R(y|x) used to update the parameters is calculated as:

$$R(y \mid x) = \alpha \cdot R_{\text{acc}}(y, y^{*}) + \beta \cdot R_{\text{format}}(y) + \gamma \cdot R_{\text{expert}}(y)$$

In the formula, $x$ represents the input report features, $y$ represents the model output, and $y^{*}$ represents the standard answer marked by experts.

[0073] The specific definitions and parameter configurations for each reward item are as follows: Accuracy reward item $R_{\text{acc}}(y, y^{*})$: determined by checking whether the conclusion field (such as "compliance level") in the model output $y$ is consistent with the expert standard answer $y^{*}$. In this embodiment, the first preset value is preferably set to 1.0; that is, if the conclusions are consistent, the reward is +1.0; otherwise, the reward is 0.

[0074] Format compliance reward item $R_{\text{format}}(y)$: The output $y$ is syntax-validated by calling a JSON parser to ensure that downstream information systems (such as quality control dashboards and HIS systems) can parse the data directly. In this embodiment, the second preset value is preferably set to 0.5; that is, if the output conforms to the preset JSON Schema specification, a reward of +0.5 is given; otherwise, the reward is 0. This effectively penalizes the model for generating unstructured, redundant text.

[0075] Expert ranking preference reward item $R_{\text{expert}}(y)$: Used to capture qualities that cannot be quantified by rules, such as the fluency of a doctor's language or the logic of an explanation. Specifically, pairwise data is scored based on the Bradley-Terry model: for the same input, experts rank two different outputs (the preferred output $y_w$ and the inferior output $y_l$), and a reward model is trained to predict the latent scores $r_w$ and $r_l$ of the two outputs.

[0076] Specifically, the calculation is based on the Bradley-Terry preference probability $\sigma(r_w - r_l)$, where σ is the Sigmoid function. This reward aims to maximize the score difference between the preferred and inferior answers, making the model's output style closer to the expression habits of human experts.
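As an illustration of the Bradley-Terry model mentioned above, the sketch below computes the probability that the preferred output outranks the inferior one from their latent scores. The negative log-likelihood shown is the loss commonly used to train reward models on ranked pairs; it is included as an assumption, since the exact expression used in this application is not reproduced here. Function names are illustrative.

```python
import math

def sigmoid(x):
    """Logistic Sigmoid function."""
    return 1.0 / (1.0 + math.exp(-x))

def preference_probability(r_w, r_l):
    """Bradley-Terry probability that the preferred output beats the inferior one."""
    return sigmoid(r_w - r_l)

def pairwise_loss(r_w, r_l):
    """Negative log-likelihood of the expert ranking (a common training loss, assumed here)."""
    return -math.log(preference_probability(r_w, r_l))

print(preference_probability(0.0, 0.0))          # 0.5: equal scores, no preference
print(pairwise_loss(2.0, 0.0) < pairwise_loss(0.0, 0.0))  # True: a larger gap lowers the loss
```

Widening the gap between $r_w$ and $r_l$ raises the preference probability, which matches the stated aim of maximizing the score difference.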

[0077] Furthermore, to balance the above three objectives, the weighting coefficients α, β, and γ must satisfy the normalization condition α + β + γ = 1. In a preferred embodiment of this application, the weights are configured as follows: α = 0.6, β = 0.2, γ = 0.2.

[0078] In other words, accuracy (α) is given the highest weight of 60%, ensuring that safety is paramount; format (β) and expert preference (γ) are each given a weight of 20%, which aims to help the model become more standardized and human-like on the basis of correctness.
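The weighted combination of the three reward items can be sketched as follows. The weights (0.6, 0.2, 0.2) and the preset values (1.0 and 0.5) come from the text; the required-key check is a simplified, hypothetical stand-in for JSON Schema validation, the field name `compliance_level` is an assumption, and the expert preference term is passed in as a precomputed score rather than derived here.

```python
import json

ALPHA, BETA, GAMMA = 0.6, 0.2, 0.2   # preferred weights, alpha + beta + gamma = 1
R_ACC_VALUE = 1.0                    # first preset value
R_FORMAT_VALUE = 0.5                 # second preset value
REQUIRED_KEYS = {"compliance_level"}  # hypothetical stand-in for the JSON Schema

def reward(output_text, expert_level, r_expert):
    """Weighted sum of the accuracy, format, and expert preference reward items."""
    try:
        parsed = json.loads(output_text)
    except json.JSONDecodeError:
        parsed = None
    well_formed = isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed)
    r_format = R_FORMAT_VALUE if well_formed else 0.0
    # The conclusion can only be compared if the output could be parsed
    consistent = well_formed and parsed["compliance_level"] == expert_level
    r_acc = R_ACC_VALUE if consistent else 0.0
    return ALPHA * r_acc + BETA * r_format + GAMMA * r_expert

print(reward('{"compliance_level": "conforms"}', "conforms", r_expert=0.0))
print(reward("free text without structure", "conforms", r_expert=0.0))
```

With these settings a correct, well-formed output earns 0.6 × 1.0 + 0.2 × 0.5 = 0.7 before the expert term is added, while unparseable text earns nothing from the first two items.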

[0079] (4) Dynamic synchronization of external knowledge bases: Execute the knowledge base dynamic update logic to support the continued effectiveness of the Retrieval Enhancement Generation (RAG) module: Guideline synchronization: Monitor and import the latest medical diagnostic guidelines, WHO tumor classification standards and hospital-specific diagnostic standards, and update them to the medical knowledge vector database after vectorization.

[0080] Case database expansion: Difficult cases and complex conformity determination cases reviewed by experts are included in the search scope, so that the model can obtain more valuable contextual enhancement information when processing similar cases in the future.

[0081] Secondly, embodiments of this application provide a system for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model. The system includes one or more processors and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any embodiment of the first aspect.

[0082] In practice, the system is preferably deployed on local computing nodes within the institution's private network environment. As shown in Figure 2, the system interfaces with the hospital's existing IT infrastructure via a predefined API or the hospital's private network. This infrastructure includes, but is not limited to: HIS (Hospital Information System) / LIS (Laboratory Information System), used to obtain basic patient diagnosis and treatment information and pathology examination request data; and PACS (Picture Archiving and Communication System), used to retrieve the original imaging report text generated by medical technology examinations such as radiology, ultrasound, and endoscopy.

[0083] At the physical level, the system consists of one or more high-performance computing servers, and at the logical level, it is divided into the following four modules: The prompt word engine module is responsible for dynamically constructing structured instructions containing templates, examples, and rules based on different specialty characteristics.

[0084] The medical knowledge base module stores vectorized medical guidelines, standard terminology ontology, and historical precedents, providing support for RAG retrieval.

[0085] Locally deployed large model module: Configured with pre-trained large language models (such as the Qwen series) that have been privately deployed and fine-tuned with LoRA, used to perform core semantic reasoning.

[0086] Security and Compliance Module: Responsible for performing data anonymization, access control, and audit logging to ensure that the processing complies with medical regulatory requirements.

[0087] In addition, to meet the compliance requirements for medical data, the system is configured so that all data flows stay within the hospital premises, and all feature extraction and model inference are completed on local computing nodes. Before executing step S1, the system uses the configured automatic desensitization engine to identify and remove patient privacy information from the report text, including unique identifiers such as name, ID number, and contact information, retaining only the clinical descriptive features related to diagnosis. At the same time, operation permissions are assigned based on a role-based access control (RBAC) mechanism, and an immutable operation audit log is generated for each compliance judgment.
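A rule-based desensitization pass could be sketched as below. The application does not describe how the desensitization engine detects identifiers, so the regular expressions (an 18-character ID number, an 11-digit mobile number, and a labeled name field) and the placeholder tokens are assumptions for illustration; a production engine would need far broader coverage.

```python
import re

# Hypothetical patterns; order matters so the longer ID number is masked first
PATTERNS = [
    (re.compile(r"\b\d{17}[\dXx]\b"), "[ID]"),     # 18-character ID number
    (re.compile(r"\b1\d{10}\b"), "[PHONE]"),       # 11-digit mobile number
    (re.compile(r"(Name:\s*)\S+"), r"\1[NAME]"),   # value after a "Name:" label
]

def desensitize(text):
    """Replace matched identifiers with placeholders, leaving clinical text intact."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

raw = "Name: Zhang ID 11010119900101123X tel 13800138000; BI-RADS 5"
print(desensitize(raw))
# Name: [NAME] ID [ID] tel [PHONE]; BI-RADS 5
```

Placeholders are used instead of outright deletion so that the sentence structure of the report is preserved for downstream feature extraction; this is a design choice of the sketch, not something stated in the text.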

[0088] To further illustrate the logical processing of the method described in this application, the following explanation is provided in conjunction with a specific clinical application scenario: Scenario 1: Determining the accuracy of breast cancer diagnosis.

[0089] In this example, the system receives the following input data: Imaging features: Irregular hypoechoic nodule in the upper outer quadrant of the right breast, BI-RADS 5, suggesting a high suspicion of malignancy.

[0090] Pathological features: Invasive ductal carcinoma of the right breast, grade II, with immunohistochemical markers of ER(+) and PR(+).

[0091] The system executes the processing logic of steps S1 to S3 as follows: First, the Embedding model identifies that both "BI-RADS 5" and "invasive ductal carcinoma" belong to the high-malignancy category in the semantic space. Then, the Retrieval Enhancement Generation (RAG) module retrieves relevant breast cancer diagnostic guidelines for auxiliary reasoning. Finally, the system outputs a "consistent" conclusion with a confidence level of 0.95, accurately extracting the key consistency point: "lesion location: upper outer quadrant of the right breast." This example verifies that the proposed solution can accurately achieve a logical closed loop of cross-disciplinary semantics when handling high-risk cases.

[0092] Scenario 2: Case of thyroid nodule diagnosis not meeting the criteria.

[0093] In this example, the system obtained the following conflict data: Imaging features: A solid nodule in the middle of the left thyroid gland, with clear borders and homogeneous echoes, TI-RADS category 2, considered benign.

[0094] Pathological features: Papillary thyroid carcinoma on the left side, microcarcinoma (<1cm).

[0095] After the system executes the processing logic, it identifies a significant logical exclusion between the imaging grade (TI-RADS 2, benign risk <2%) and the pathological gold standard (papillary carcinoma, malignant).

[0096] Risk identification: The system determines the compliance level as "non-compliant" and calculates a confidence level of 0.92.

[0097] Correction guidance generation: Based on the logic of step S5, the system identifies that the description of "clear boundaries and uniform echo" in the image report may mask the subtle features of microcarcinomas.

[0098] Improvement suggestions output: The system matches the strategy library, and the output includes improvement suggestions such as "Review ultrasound images, and pay attention to microcalcifications and aspect ratio characteristics".

[0099] This example demonstrates that the proposed solution can not only detect diagnostic biases, but also provide clinically valuable feedback data through difference dimension analysis, effectively compensating for the technical shortcomings of traditional quality control methods in handling subtle missed diagnosis risks.

[0100] Thirdly, embodiments of this application provide a computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the method described in any embodiment of the first aspect.

[0101] The method, system, and storage medium for judging the diagnostic consistency of medical technology reports and pathology reports based on a large language model provided in this application automatically parses the logical association between medical technology and pathology reports through structured feature extraction and vertical domain prompts. Simultaneously, by combining enhanced retrieval generation and model fine-tuning, the semantic understanding accuracy of medical terminology and hierarchical logic is improved. Furthermore, a confidence assessment and triage mechanism based on probability distribution vectors is introduced to identify and isolate low-confidence predictions, ensuring the objectivity of quality control conclusions.

[0102] In the description of this application, it should be noted that the terms "vertical", "up", "down", "horizontal", etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this application.

[0103] In the description of this application, it should also be noted that, unless otherwise expressly specified and limited, the terms "set," "install," "connect," and "link" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this application according to the specific circumstances.

[0104] Finally, it should be noted that the above descriptions are merely preferred embodiments of this application and are not intended to limit this application. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the protection scope of this application.

Claims

1. A method for judging the diagnostic concordance between medical technology reports and pathology reports based on a large language model, characterized in that, Includes the following steps: The system acquires the medical technology report text and pathology report text to be processed, and extracts the imaging feature set from the medical technology report and the pathological diagnosis feature set from the pathology report through a preset structured feature extraction model. Construct structured prompt words for vertical domains. The prompts include at least preset role constraints, task definition protocols, structured output templates, few-shot learning examples, and compliance constraints. The imaging feature set and the pathological diagnosis feature set are mapped to the vertical domain structured prompt word instructions, and input into the preset large language model. The inference engine of the large language model generates diagnostic consistency prediction data. Extract the probability distribution vector of the output layer of the large language model, calculate the confidence index value of the prediction conclusion based on the probability distribution vector, and perform split processing on the prediction conclusion according to the mapping relationship between the confidence index value and the preset threshold. Based on the results of the traffic splitting process, a structured diagnostic compliance report is output, which includes compliance level, judgment logic basis, and key difference dimensions.

2. The method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model according to claim 1, characterized in that, The steps for extracting the imaging feature set and the pathological diagnostic feature set include: A preset medical terminology standard alignment algorithm is invoked to map the terms in the medical technology report and pathology report to a standard medical terminology space. The specific execution steps of the medical terminology standard alignment algorithm include: The medical technology report and pathology report are input into a named entity recognition model built on BiLSTM-CRF to identify medical entity boundaries in the text in order to extract candidate medical terms; Traverse the pre-built standardized medical terminology mapping dictionary and perform precise matching retrieval on the candidate medical terms. The dictionary includes at least a thesaurus index that points semantically equivalent terms to the same standard term, and an abbreviation expansion table that maps abbreviations to their full names. In response to the candidate medical term not finding an exact match in the standardized medical terminology dictionary, the character edit distance between the candidate medical term and the target standard term, as well as the length of their topological association path in the medical ontology knowledge graph, are calculated. If the character editing distance is less than a preset threshold and the topological association path length is less than 3, then a standard mapping relationship between the candidate medical term and the target standard term is established.

3. The method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model according to claim 1, characterized in that, the task definition protocol is configured with four levels of logical decision rules: Conforms: the imaging feature set and the pathological diagnostic feature set are logically consistent in the preset malignancy assessment dimension; Largely conforms: the imaging feature set and the pathological diagnostic feature set are logically consistent in the main diagnostic dimension, but differ in non-core medical description dimensions; Does not conform: the imaging feature set and the pathological diagnostic feature set are logically exclusive in the diagnostic conclusion dimension; Irrelevant: the anatomical targets corresponding to the imaging feature set and the pathological diagnostic feature set do not overlap.

4. The method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model according to claim 1, characterized in that, the large language model performs auxiliary reasoning in combination with retrieval enhancement generation, and the specific execution steps include: The imaging feature set is converted into a vector to be retrieved, and the K medical knowledge entries with the highest matching degree are retrieved from a preset medical knowledge vector database, the medical knowledge vector database being constructed by vectorizing medical guideline documents, disease classification standards and a standard medical terminology database; The fusion weight $w_i$ between each medical knowledge entry and the vector to be retrieved is calculated using the following formula:

$$w_i = \frac{\exp(\text{sim}_i / \tau)}{\sum_{j=1}^{K} \exp(\text{sim}_j / \tau)}$$

In the formula, $\text{sim}_i$ is the cosine similarity between the i-th medical knowledge entry vector and the vector to be retrieved, and $\tau$ is the preset temperature coefficient; And, the medical knowledge entries are weighted according to the fusion weight $w_i$, and the weighted enhanced knowledge information is embedded into the context of the vertical domain structured prompt word instruction.

5. The method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model according to claim 1, characterized in that, the large language model is fine-tuned based on the LoRA algorithm, and the specific execution steps include: The rank of the low-rank decomposition matrix of the LoRA algorithm is set to an integer value between 8 and 64; The target weight parameters for fine-tuning are locked, which at least cover the query matrix $W_q$, key matrix $W_k$, value matrix $W_v$ and output matrix $W_o$ in the attention mechanism layer of the large language model, and the up-projection matrix $W_{\text{up}}$ and the down-projection matrix $W_{\text{down}}$ in the feedforward neural network layer; The cross-entropy loss function is used as the target loss function in the fine-tuning process, and the calculation formula is as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c} \log p_{i,c}$$

In the formula, $N$ is the number of training samples, $C$ is the total number of categories for diagnostic conformity classification, $y_{i,c}$ is the true label indicating whether the i-th sample belongs to the c-th class, and $p_{i,c}$ is the probability predicted by the model that the i-th sample belongs to the c-th class.

6. The method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model according to claim 1, characterized in that, the steps for calculating the confidence index value of the prediction conclusion are as follows: Obtain the Logits vector sequence of the output layer when the large language model generates prediction conclusions, and calculate the probability distribution entropy H of the Logits vector sequence using the following formula:

$$H = -\sum_{i=1}^{V} p_i \log p_i$$

In the formula, V is the dimension of the output probability distribution vector, and $p_i$ is the prediction probability of the i-th component; $p_i$ is obtained by performing Softmax normalization on the Logits vector of the model output layer, and the calculation formula is:

$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{V} e^{z_j}}$$

where $z_i$ is the value of the i-th component of the Logits vector; If the calculated probability distribution entropy H satisfies the preset dispersion condition, it is determined that the uncertainty of the model prediction is too high, an anomaly identification signal is added to the prediction data, and the prediction conclusion is redirected to the secondary verification queue.

7. The method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model according to claim 1, characterized in that, The structured diagnostic compliance report also includes guidance data for correcting medical technology reports, and its generation steps include: Qualitative features are extracted from the pathological diagnostic feature set and compared with the imaging feature set in multiple dimensions. Local feature descriptions in the medical technology report that contradict the qualitative features are identified, and corresponding correction guidance data is matched from a preset medical image description strategy library.

8. The method for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model according to claim 1, characterized in that, the method also includes a model iteration step based on feedback data: The report feature pairs after secondary verification and calibration are obtained and stored in the incremental training library as supervised training samples; Based on the GPPO algorithm, the parameters of the reward model of the large language model are updated, and the updated reward model is used to guide the policy optimization of the large language model; The function for calculating the reward value R(y|x) used to update the parameters is:

$$R(y \mid x) = \alpha \cdot R_{\text{acc}}(y, y^{*}) + \beta \cdot R_{\text{format}}(y) + \gamma \cdot R_{\text{expert}}(y)$$

In the formula, $x$ represents the input report features, $y$ represents the model output, and $y^{*}$ represents the standard answer marked by experts; $R_{\text{acc}}(y, y^{*})$ is the accuracy reward item, which takes the first preset value when the judgment result of $y$ is consistent with $y^{*}$, and 0 otherwise; $R_{\text{format}}(y)$ is the format compliance reward item, which takes the second preset value when $y$ conforms to the preset structured JSON format specification, and 0 otherwise; $R_{\text{expert}}(y)$ is the expert ranking preference reward item, calculated based on the Bradley-Terry model from $\sigma(r_w - r_l)$, where $r_w$ and $r_l$ represent the latent reward scores of the preferred and inferior outputs in the expert annotations, respectively, and σ is the Sigmoid function; α, β, and γ are the corresponding weight coefficients.

9. A system for judging the diagnostic consistency between medical technology reports and pathology reports based on a large language model, characterized in that, the system comprises: One or more processors; A storage device for storing one or more programs; When the one or more programs are executed by the one or more processors, the one or more processors implement the method of any one of claims 1 to 8.

10. A computer-readable storage medium having computer program instructions stored thereon, characterized in that, When the computer program instructions are executed by the processor, they implement the method of any one of claims 1 to 8.