A medical structured data extraction method, device, equipment and medium

By processing medical record text with a pre-trained medical large language model and matching the extracted information with medical form data elements, the method addresses the inefficient extraction of unstructured medical record data and enables efficient, automated generation and display of structured data.

CN122240741A · Pending · Publication Date: 2026-06-19 · WINNING HEALTH TECHNOLOGY GROUP CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
WINNING HEALTH TECHNOLOGY GROUP CO LTD
Filing Date
2024-12-18
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, medical record data exists in the form of unstructured text, which leads to low efficiency and error-proneness in automated processing and structured extraction.

Method used

The original medical record text is processed using a pre-trained medical large language model to extract key information, which is matched with the standard data elements in the target medical form to generate a structured data file. The file is then rendered and displayed on the medical form interface.

Benefits of technology

It enables automated and efficient extraction of key information from medical records, improves the efficiency of generating target medical forms, and allows medical staff to intuitively view key information points.

✦ Generated by Eureka AI based on patent content.


Abstract

This application provides a method, apparatus, device, and medium for extracting structured medical data, relating to the field of medical data processing technology. The method includes: obtaining the original medical record text of the target patient based on a creation trigger operation of a target medical form input through a preset medical form interface; processing the original medical record text using a pre-trained medical language model to obtain key information about the medical record; matching multiple key information points with predefined standard data elements in the target medical form to obtain a target structured data file corresponding to the target medical form; and rendering and displaying the target medical form on the preset medical form interface using the target structured data file. By processing the original medical record text through a pre-trained medical language model, key information about the medical record is accurately extracted and displayed in the target medical form, allowing medical staff to intuitively view the key information points corresponding to the target patient.

Description

Technical Field

[0001] This invention relates to the field of medical data processing technology, and more specifically, to a method, apparatus, device, and medium for extracting structured medical data.

Background Technology

[0002] With the rapid development of medical informatics and artificial intelligence technologies, the processing and analysis of medical data has become a crucial support for scientific research and clinical decision-making. However, most current medical record data exists in unstructured text format, making the automated processing and structured extraction of this data a key factor restricting the efficiency of medical research. Currently, most medical institutions rely on manual methods or simple keyword matching techniques to extract medical record data, which are inefficient and prone to errors.

Summary of the Invention

[0003] The purpose of this invention is to address the shortcomings of the prior art by providing a method, apparatus, device, and medium for extracting structured medical data. This allows for the processing of original medical record text using a pre-trained medical language model to accurately extract key information from the medical record. The key information is then displayed in a target medical form, enabling medical staff to intuitively view the key information points in the original medical record text corresponding to the target patient.

[0004] To achieve the above objectives, the technical solutions adopted in the embodiments of this application are as follows:

[0005] In a first aspect, embodiments of this application provide a method for extracting structured medical data, including:

[0006] Based on the creation trigger operation of the target medical form for the target patient entered through the preset medical form interface, the original medical record text of the target patient is obtained;

[0007] A pre-trained medical large language model is used to process the original medical record text to obtain key information of the medical record, which includes multiple key information points;

[0008] The multiple key information points are matched with multiple predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form;

[0009] Using the target structured data file, the target medical form is rendered and displayed on the preset medical form interface.

[0010] In an optional implementation, the step of matching the plurality of key information points with predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form includes:

[0011] Based on the name of each standard data element in the plurality of standard data elements, the names of the plurality of key information points are matched to obtain the key information points corresponding to each standard data element;

[0012] The content parameters of the corresponding key information points are determined as the data parameters of each standard data element;

[0013] The names of the multiple standard data elements and the data parameters of the multiple standard data elements are respectively used as multiple structured data, each of which includes: the name of a standard data element and its corresponding data parameters;

[0014] The multiple pieces of structured data are assembled to generate the target structured data file.

[0015] In an optional implementation, the step of matching the name of each standard data element in the plurality of standard data elements with the names of the plurality of key information points to obtain the key information point corresponding to each standard data element includes:

[0016] Based on the name of each standard data element and the names of the multiple key information points, the key information points with the same name are determined as the key information points corresponding to each standard data element;

[0017] Alternatively, based on the name of each standard data element and the names of the multiple key information points, the key information points whose names satisfy a preset correspondence are determined as the key information points corresponding to each standard data element.

[0018] In an optional implementation, the key information in the medical record further includes: source information of the plurality of key information points, wherein the source information of each key information point is used to indicate the source position of each key information point in the original medical record text, and the original medical record text at the source position;

[0019] The method further includes:

[0020] The source information of the corresponding key information point is used as the source information of each standard data element;

[0021] The source information of the multiple standard data elements is rendered and displayed in the preset medical form interface.

[0022] In an optional implementation, the method further includes:

[0023] Based on the source selection operation for the target standard data element, a feedback control for the target standard data element is displayed;

[0024] Based on the traceability feedback trigger operation for the target standard data element input through the feedback control, a traceability feedback window for the target standard data element is displayed;

[0025] Based on the information input through the traceability feedback window, the feedback information of the target standard data element is determined;

[0026] Based on the feedback information, the medical large language model is fine-tuned.

[0027] In an optional implementation, the pre-trained medical large language model is used to process the original medical record text to obtain key medical record information, including:

[0028] The original medical record text is sent to the server through the preset interface of the medical large language model, so that the server uses the medical large language model to process the original medical record text and obtain the key information of the medical record;

[0029] Receive the key medical record information returned by the server;

[0030] The step of fine-tuning the medical language model based on the feedback information includes:

[0031] The feedback information is sent to the server through the preset interface of the medical language model, so that the server can fine-tune the medical language model based on the feedback information.
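The feedback step can be pictured as packaging what the reviewer enters in the traceability feedback window into a record that is sent to the server. The following is only a sketch under stated assumptions: the field names (`element_name`, `extracted_value`, `is_correct`, `corrected_value`, `comment`) and the helper `build_feedback_payload` are hypothetical, since the text does not specify the content of the feedback information or the format of the preset interface.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TraceabilityFeedback:
    # All field names are hypothetical; the source does not define them.
    element_name: str                      # name of the target standard data element
    extracted_value: str                   # data parameter currently shown in the form
    is_correct: bool                       # reviewer's verdict on the extraction
    corrected_value: Optional[str] = None  # reviewer's correction, if any
    comment: str = ""


def build_feedback_payload(feedback: TraceabilityFeedback) -> str:
    """Serialize one feedback record as JSON text for an assumed server interface."""
    if not feedback.is_correct and feedback.corrected_value is None:
        raise ValueError("a corrected value is expected when the extraction is marked wrong")
    return json.dumps(asdict(feedback), ensure_ascii=False)


feedback = TraceabilityFeedback(
    element_name="Clinical Staging",
    extracted_value="Not Mentioned",
    is_correct=False,
    corrected_value="example corrected value",  # placeholder, not real patient data
)
payload = build_feedback_payload(feedback)
```

On the server side, records of this kind could be accumulated and converted into additional fine-tuning samples; how that is done is not described in the text.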

[0032] In an optional implementation, before processing the original medical record text using a pre-trained medical large language model to obtain key medical record information, the method further includes:

[0033] Acquire medical sample data for a preset key information extraction task. The medical sample data includes: sample medical record text, and sample medical record key information corresponding to the sample medical record text.

[0034] The medical sample data is used to fine-tune the preset medical domain model to obtain the medical large language model.
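One plausible way to organize the medical sample data is as prompt/completion pairs, one per sample medical record text. The sketch below assumes a JSON Lines layout and a hypothetical instruction string `INSTRUCTION`; neither the prompt wording nor the storage format is given in the text, and no particular fine-tuning framework is implied.

```python
import json
from typing import Dict, List

# Hypothetical instruction text; the source does not give a prompt.
INSTRUCTION = "Extract the key information points from the medical record below as JSON.\n\n"


def to_training_example(sample_text: str, sample_key_info: Dict[str, str]) -> Dict[str, str]:
    """Pair a sample medical record text with its sample key information."""
    return {
        "prompt": INSTRUCTION + sample_text,
        "completion": json.dumps(sample_key_info, ensure_ascii=False, sort_keys=True),
    }


def to_jsonl(examples: List[Dict[str, str]]) -> str:
    """Write one JSON object per line, a common layout for fine-tuning data."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


example = to_training_example(
    "Discharge diagnosis: acute lymphoblastic leukemia. Donor gender: male.",
    {"Donor Gender": "Male", "Diagnosis of Primary Disease": "Acute lymphoblastic leukemia"},
)
```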

[0035] In a second aspect, embodiments of this application also provide a medical structured data extraction device, comprising:

[0036] The acquisition module is used to acquire the original medical record text of the target patient based on the creation trigger operation of the target medical form for the target patient entered through the preset medical form interface;

[0037] The processing module is used to process the original medical record text using a pre-trained medical large language model to obtain key information of the medical record, which includes multiple key information points;

[0038] The matching module is used to match the multiple key information points with multiple predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form;

[0039] The display module is used to render and display the target medical form on the preset medical form interface using the target structured data file.

[0040] In a third aspect, embodiments of this application also provide a computer device, including: a processor, a storage medium, and a bus, wherein the storage medium stores program instructions executable by the processor, and when the computer device is running, the processor communicates with the storage medium via the bus, and the processor executes the program instructions to perform the steps of the medical structured data extraction method as described in any implementation of the first aspect.

[0041] In a fourth aspect, embodiments of this application also provide a computer-readable storage medium storing a computer program, which, when executed by a processor, performs the steps of the medical structured data extraction method as described in any implementation of the first aspect.

[0042] The beneficial effects of this application are:

[0043] This application provides a method, apparatus, device, and medium for extracting structured medical data. The method includes: obtaining the original medical record text of the target patient based on a creation trigger operation of a target medical form input through a preset medical form interface; processing the original medical record text using a pre-trained medical language model to obtain key information of the medical record, including multiple key information points; matching these multiple key information points with predefined standard data elements in the target medical form to obtain a target structured data file corresponding to the target medical form; and finally rendering and displaying the target medical form on the preset medical form interface using the target structured data file. By processing the original medical record text through a pre-trained medical language model, the key information of the medical record is accurately extracted, and then displayed in the target medical form. This allows medical staff to intuitively view the key information points in the original medical record text corresponding to the target patient, and enables automated and efficient extraction of key information, improving the efficiency of generating target medical forms.

Attached Figure Description

[0044] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0045] Figure 1 is a first schematic flowchart of a medical structured data extraction method provided in an embodiment of this application;

[0046] Figure 2 is a schematic diagram of a target medical form provided in an embodiment of this application;

[0047] Figure 3 is a second schematic flowchart of a medical structured data extraction method provided in an embodiment of this application;

[0048] Figure 4 is a third schematic flowchart of a medical structured data extraction method provided in an embodiment of this application;

[0049] Figure 5 is a fourth schematic flowchart of a medical structured data extraction method provided in an embodiment of this application;

[0050] Figure 6 is a schematic diagram of a traceability feedback window provided in an embodiment of this application;

[0051] Figure 7 is a fifth schematic flowchart of a medical structured data extraction method provided in an embodiment of this application;

[0052] Figure 8 is a sixth schematic flowchart of a medical structured data extraction method provided in an embodiment of this application;

[0053] Figure 9 is a functional module diagram of a medical structured data extraction device provided in an embodiment of this application;

[0054] Figure 10 is a schematic diagram of a computer device provided in an embodiment of this application.

Detailed Implementation

[0055] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are some embodiments of the present invention, but not all embodiments.

[0056] Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.

[0057] In the description of this application, it should be noted that if the terms "upper", "lower", etc. appear to indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, or the orientation or positional relationship that the product of this application is usually placed in, it is only for the convenience of describing this application and simplifying the description, and does not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this application.

[0058] Furthermore, the terms "first," "second," etc., used in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Additionally, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0059] It should be noted that, where there is no conflict, the features in the embodiments of this application can be combined with each other.

[0060] To accurately extract key information points from the original medical record text of a target patient and display them in a medical form, allowing medical staff to intuitively view these key information points, this application provides a method for extracting structured medical data. Based on the creation trigger operation of a target medical form for the target patient input through a preset medical form interface, the original medical record text of the target patient is obtained. A pre-trained medical language model is used to process the original medical record text to obtain key information, including multiple key information points. These key information points are then matched with predefined standard data elements in the target medical form to obtain a target structured data file corresponding to the target medical form. Finally, the target structured data file is used to render and display the target medical form in the preset medical form interface. By processing the original medical record text through a pre-trained medical language model, key information is accurately extracted and displayed in the target medical form, allowing medical staff to intuitively view the key information points in the original medical record text of the target patient.

[0061] The medical structured data extraction method provided in this application is explained in detail below with reference to the accompanying drawings and specific examples. The method can be implemented by a computer device on which a medical structured data extraction algorithm or software is pre-installed, by running that algorithm or software. The computer device can be, for example, a server or a terminal, and the terminal can be a user computer. Figure 1 is a first schematic flowchart of a medical structured data extraction method provided in an embodiment of this application, and Figure 2 is a schematic diagram of a target medical form provided in an embodiment of this application. As shown in Figure 1, the method includes:

[0062] S101. Based on the creation trigger operation of the target medical form for the target patient entered through the preset medical form interface, obtain the original medical record text of the target patient.

[0063] In this embodiment, a creation control for the target medical form is displayed on a preset medical form interface. By selecting the creation control and entering the target patient's medical record identifier, the original medical record text of the target patient can be obtained. The target patient's medical record identifier can be the target patient's medical number, ID number, hospitalization number, etc.

[0064] S102. A pre-trained medical large language model is used to process the original medical record text to obtain key information from the medical record.

[0065] The key information in the medical record includes several key information points.

[0066] Specifically, after the original medical record text of the target patient is obtained, it is cleaned to remove irrelevant information such as headers, footers, and signatures, and is then word-segmented and tagged with parts of speech to obtain the processed original medical record text.
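The cleaning step might look roughly like the following. This is a minimal sketch, not the application's actual preprocessing: the regular expressions in `NOISE_PATTERNS` are hypothetical examples of header, footer, and signature lines, and the naive `segment_text` helper only splits the text into sentence-like units. Word segmentation and part-of-speech tagging would normally rely on a dedicated NLP library and are omitted here.

```python
import re
from typing import List

# Hypothetical patterns for irrelevant lines; real records need site-specific rules.
NOISE_PATTERNS = [
    re.compile(r"^Page \d+( of \d+)?$", re.IGNORECASE),  # page footer
    re.compile(r"^Signature\s*:.*$", re.IGNORECASE),     # signature line
    re.compile(r"^-{3,}$"),                              # separator rule
]


def clean_record_text(raw_text: str) -> str:
    """Drop blank lines and lines matching a noise pattern; trim the rest."""
    kept = []
    for line in raw_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if any(pattern.match(stripped) for pattern in NOISE_PATTERNS):
            continue
        kept.append(stripped)
    return "\n".join(kept)


def segment_text(cleaned: str) -> List[str]:
    """Naively split on whitespace after '.' or ';' and on newlines.

    This will also split after abbreviations such as 'Dr.'; it is illustrative only.
    """
    parts = re.split(r"(?<=[.;])\s+|\n", cleaned)
    return [part for part in parts if part]


raw = "Discharge Summary\nPage 1 of 2\nDiagnosis: acute lymphoblastic leukemia.\n\nSignature: (signed)\n"
cleaned = clean_record_text(raw)
```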

[0067] The pre-trained medical large language model performs deep understanding and analysis on the processed original medical record text and outputs the key information of the medical record. The key information covers multiple attributes, and each attribute includes multiple key information points. For example, the attributes can be: patient basic information, diagnosis information, treatment process, medication use, test results, etc. The patient basic information includes key information points such as the donor-recipient relationship and the donor-recipient blood type.
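The attribute/key-information-point hierarchy described above can be represented with a small data structure. The sketch below assumes the model replies with JSON of the form `{attribute: {name: content}}`; that reply format, the `KeyInfoPoint` class, and the `parse_model_output` helper are illustrative assumptions, not something the text specifies.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class KeyInfoPoint:
    attribute: str  # e.g. "patient basic information"
    name: str       # e.g. "donor-recipient blood type"
    content: str    # the extracted content parameter


def parse_model_output(output_text: str) -> List[KeyInfoPoint]:
    """Flatten an assumed {attribute: {name: content}} JSON reply into key information points."""
    data = json.loads(output_text)
    points = []
    for attribute, items in data.items():
        for name, content in items.items():
            points.append(KeyInfoPoint(attribute, name, str(content)))
    return points


reply = json.dumps({
    "patient basic information": {
        "donor-recipient relationship": "brother donates to brother",
        "donor gender": "male",
    },
    "diagnosis information": {
        "diagnosis of the primary disease": "acute lymphoblastic leukemia",
    },
})
points = parse_model_output(reply)
```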

[0068] S103. Match multiple key information points with multiple predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form.

[0069] S104. Using the target structured data file, render and display the target medical form in the preset medical form interface.

[0070] Specifically, different medical forms have different predefined standard data elements. By matching multiple key information points with the predefined standard data elements in the target medical form, the target structured data file corresponding to the target medical form is obtained. The target structured data file can be a JSON format structured data file.

[0071] The content of the target medical form is rendered based on the data content of the target structured data file, thereby displaying the specific content of the target medical form on the preset medical form interface. For example, as shown in Figure 2, the basic information form for hematopoietic stem cell transplant patients is displayed on the preset medical form interface.

[0072] In summary, this application provides a method for extracting structured medical data. The method includes: obtaining the original medical record text of the target patient based on a creation trigger operation of a target medical form input through a preset medical form interface; processing the original medical record text using a pre-trained medical language model to obtain key information, including multiple key information points; matching these key information points with predefined standard data elements in the target medical form to obtain a target structured data file corresponding to the target medical form; and finally rendering and displaying the target medical form on the preset medical form interface using the target structured data file. By processing the original medical record text using a pre-trained medical language model, key information is accurately extracted and displayed in the target medical form, allowing medical staff to intuitively view the key information points in the original medical record text corresponding to the target patient. This method automates and efficiently extracts key information from medical records, eliminating the need for manual extraction by medical staff and improving the efficiency of generating target medical forms.

[0073] Based on the above embodiments, this application also provides another possible implementation of the medical structured data extraction method. Figure 3 is a second schematic flowchart of a medical structured data extraction method provided in an embodiment of this application. As shown in Figure 3, matching the multiple key information points with the multiple predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form includes:

[0074] S201. Match the name of each standard data element in the multiple standard data elements with the names of multiple key information points to obtain the key information points corresponding to each standard data element.

[0075] In this embodiment, each key information point includes the name and content parameters of the key information point, and each standard data element includes the name and data parameters of each standard data element. By matching the name of each standard data element with the names of multiple key information points, the key information point corresponding to each standard data element is determined.

[0076] Optionally, based on the name of each standard data element and the names of multiple key information points, key information points with the same name are identified as the key information points corresponding to each standard data element.

[0077] For example, if the first key information point among the multiple key information points is named "gene type", and the first standard data element among the multiple standard data elements is also named "gene type", then the two names are determined to be the same, and the first key information point is therefore determined to correspond to the first standard data element.

[0078] Alternatively, based on the name of each standard data element and the names of multiple key information points, the key information points whose names satisfy a preset correspondence are determined as the key information points corresponding to each standard data element.

[0079] For example, if among multiple key information points, there is a second key information point whose name is "diagnosis of the primary disease," and among multiple standard data elements, there is a second standard data element whose name is "Western medicine diagnosis name," then it is determined that the names of the second key information point and the second standard data element satisfy a preset correspondence, and thus the second key information point and the second standard data element are determined to correspond.
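The two matching rules of S201 can be sketched as a lookup. The text presents identical-name matching and preset-correspondence matching as alternatives; the sketch below combines them, trying the identical name first and then falling back to the correspondence table, which is a design choice made here for illustration. The `PRESET_CORRESPONDENCE` table and the `match_key_point` helper are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical preset correspondence:
# standard data element name -> key information point names accepted for it.
PRESET_CORRESPONDENCE: Dict[str, List[str]] = {
    "Western medicine diagnosis name": ["diagnosis of the primary disease"],
}


def match_key_point(
    element_name: str,
    key_point_names: List[str],
    correspondence: Dict[str, List[str]] = PRESET_CORRESPONDENCE,
) -> Optional[str]:
    """Return the name of the key information point matched to a standard data element."""
    # Rule 1: a key information point with the same name.
    if element_name in key_point_names:
        return element_name
    # Rule 2: a key information point whose name satisfies the preset correspondence.
    for alias in correspondence.get(element_name, []):
        if alias in key_point_names:
            return alias
    return None  # no key information point corresponds to this element


names = ["gene type", "diagnosis of the primary disease"]
same_name = match_key_point("gene type", names)
by_table = match_key_point("Western medicine diagnosis name", names)
unmatched = match_key_point("clinical staging", names)
```

In this sketch an unmatched element yields `None`; how unmatched elements are handled (for example, filled with "Not Mentioned") is left open.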

[0080] S202. Determine the content parameters of the corresponding key information points as the data parameters of each standard data element.

[0081] For example, the name of the first key information point is: gene type, and the content parameter of the first key information point is: ["NSD2,TET1,BCR:ABL,IKZF1,CSRP2"]. If it is determined that the first key information point corresponds to the first standard data element, then the content parameter of the first key information point is determined to be the data parameter of the first standard data element, and the data parameter of the first standard data element is determined to be: ["NSD2,TET1,BCR:ABL,IKZF1,CSRP2"].

[0082] S203. The names of multiple standard data elements and the data parameters of multiple standard data elements are respectively treated as multiple structured data.

[0083] Each piece of structured data includes: the name of a standard data element and its corresponding data parameters.

[0084] S204. Assemble multiple pieces of structured data to generate a target structured data file.

[0085] Specifically, each matched standard data element is treated as a key-value pair and assembled to generate the target structured data file. For example, if the target patient is a hematopoietic stem cell transplant patient, the target structured data file corresponding to the target medical form will be a JSON-formatted structured data file. The specific content of the target structured data file is as follows:

[0086] json

[0087] {

[0088] Name: "Donor-Recipient Relationship", Data Parameter: "Brother Donates to Brother";

[0089] Name: "HLA matching", Data Parameter: "HLA 10/10";

[0090] Name: "Donor / Recipient Blood Type", Data Parameter: "O+ Donor O+";

[0091] Name: "Donor Gender", Data Parameter: "Male";

[0092] Name: "Diagnosis of Primary Disease", Data Parameter: "Acute Lymphoblastic Leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+)";

[0093] Name: "Clinical Staging", Data Parameter: "Not Mentioned";

[0094] Name: "Gene Type", Data Parameter: "NSD2,TET1,BCR:ABL,IKZF1,CSRP2";

[0095] Name: "Genetic level at initial diagnosis", Data Parameter: "NSD2 and TET1 mutations";

[0096] Name: "Pre-transplant genetic level", Data Parameter: "negative";

[0097] }

[0098] In the method provided in this application embodiment, the name of each standard data element in a plurality of standard data elements is matched with the names of multiple key information points to obtain the key information points corresponding to each standard data element. The content parameters of the corresponding key information points are determined as the data parameters of each standard data element. The names and data parameters of the plurality of standard data elements are respectively used as multiple pieces of structured data. Each piece of structured data includes: the name of a standard data element and its corresponding data parameters. The multiple pieces of structured data are assembled to generate a target structured data file. By matching multiple key information points with predefined multiple standard data elements in the target medical form, a target structured data file corresponding to the target medical form is obtained, which is used to render and display the target medical form.
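Steps S202 to S204 amount to copying each matched content parameter into its standard data element and serializing the result. The following sketch is one possible reading: the key names `name` and `data_parameter` and the list-of-objects layout are assumptions chosen so that the output is valid JSON, and both helper functions are hypothetical.

```python
import json
from typing import Dict, List


def build_structured_data(
    element_to_point: Dict[str, str],  # standard data element name -> matched key point name
    point_contents: Dict[str, str],    # key point name -> content parameter
) -> List[Dict[str, str]]:
    """S202/S203: use each matched content parameter as the element's data parameter."""
    records = []
    for element_name, point_name in element_to_point.items():
        records.append({
            "name": element_name,
            "data_parameter": point_contents[point_name],
        })
    return records


def to_json_text(records: List[Dict[str, str]]) -> str:
    """S204: assemble the pieces of structured data into JSON text."""
    return json.dumps(records, ensure_ascii=False, indent=2)


records = build_structured_data(
    {"Gene Type": "gene type", "Donor Gender": "donor gender"},
    {"gene type": "NSD2,TET1,BCR:ABL,IKZF1,CSRP2", "donor gender": "Male"},
)
file_text = to_json_text(records)
```

Writing `file_text` to disk, or passing `records` directly to the form renderer, would complete the hand-off to S104.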

[0099] Based on the above embodiments, this application also provides another possible implementation of the medical structured data extraction method. The key information in the medical record further includes source information for the multiple key information points. The source information of each key information point indicates its source location in the original medical record text, as well as the original medical record text at that source location. Figure 4 is a third schematic flowchart of a medical structured data extraction method provided in an embodiment of this application. As shown in Figure 4, the method further includes:

[0100] S301. Use the source information of the corresponding key information point as the source information of each standard data element.

[0101] S302. Render and display the source information of the multiple standard data elements in the preset medical form interface.

[0102] In this embodiment, the pre-trained medical large language model processes the original medical record text and determines not only the names and content parameters of the multiple key information points, but also their source information.

[0103] For example, the source information for the first key information point includes: Source location: discharge summary, original medical record: 1. Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+); 2. Allogeneic hematopoietic stem cell transplantation (planned, brother donates to brother, HLA 10/10); 3. Bone marrow suppression (grade IV); 4. Pancytopenia; 5. Agranulocytosis; 6. Infectious fever; 7. Perianal infection; 8. Hyponatremia; 9. Liver damage; 10. Pain.

[0104] If the first key information point is determined to correspond to the first standard data element, then the source information of the first key information point is used as the source information of the first standard data element. The source information of the first standard data element is therefore: Source location: discharge summary, original medical record: 1. Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+); 2. Allogeneic hematopoietic stem cell transplantation (planned, brother donates to brother, HLA 10/10); 3. Bone marrow suppression (grade IV); 4. Pancytopenia; 5. Agranulocytosis; 6. Infectious fever; 7. Perianal infection; 8. Hyponatremia; 9. Liver damage; 10. Pain.
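Carrying source information from a key information point over to its matched standard data element (S301) can be sketched as follows. The class and field names (`SourceInfo`, `location`, `original_text`, `source_location`) are hypothetical, and the record layout is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SourceInfo:
    location: str       # where in the record the point came from, e.g. "Discharge summary"
    original_text: str  # the original medical record text at that location


@dataclass
class KeyInfoPoint:
    name: str
    content: str
    source: SourceInfo


def to_element_record(element_name: str, point: KeyInfoPoint) -> Dict[str, str]:
    """Copy the content parameter and the source information to the standard data element."""
    return {
        "name": element_name,
        "data_parameter": point.content,
        "source_location": point.source.location,
        "original_text": point.source.original_text,
    }


point = KeyInfoPoint(
    name="donor gender",
    content="Male",
    source=SourceInfo("Discharge summary", "planned, brother donates to brother"),
)
record = to_element_record("Donor Gender", point)
```

A form renderer could then show `source_location` and `original_text` next to each field so that staff can check an extracted value against the passage it came from.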

[0105] For example, if the target patient is a hematopoietic stem cell transplant recipient, the specific content of the target structured data file is as follows:

[0106] Json

[0107] {

[0108] Name: "Donor-Recipient Relationship", Data Parameter: "Brother donates to brother", Source Location: "Discharge Summary", Original Medical Record: "Discharge Diagnosis: 1. Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+); 2. Allogeneic hematopoietic stem cell transplantation (planned, brother donates to brother, HLA 10/10); 3. Bone marrow suppression (Grade IV); 4. Pancytopenia; 5. Agranulocytosis; 6. Infectious fever; 7. Perianal infection; 8. Hyponatremia; 9. Liver damage; 10. Pain";

[0109] Name: "HLA Typing", Data Parameter: "HLA 10/10", Source Location: "Discharge Summary", Original Medical Record: "Discharge Diagnosis: 1. Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+); 2. Allogeneic hematopoietic stem cell transplantation (planned, brother donates to brother, HLA 10/10); 3. Bone marrow suppression (Grade IV); 4. Pancytopenia; 5. Agranulocytosis; 6. Infectious fever; 7. Perianal infection; 8. Hyponatremia; 9. Liver damage; 10. Pain";

[0110] Name: "Donor / Recipient Blood Type", Data Parameter: "O+ Donor O+", Source Location: "Discharge Summary", Original Medical Record: "Discharge Diagnosis: 1. Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+); 2. Allogeneic hematopoietic stem cell transplantation (planned, brother donates to brother, HLA 10/10); 3. Bone marrow suppression (Grade IV); 4. Pancytopenia; 5. Agranulocytosis; 6. Infectious fever; 7. Perianal infection; 8. Hyponatremia; 9. Liver damage; 10. Pain";

[0111] Name: "Donor Gender", Data Parameter: "Male", Source Location: "Discharge Summary", Original Medical Record: "Discharge Diagnosis: 1. Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+); 2. Allogeneic hematopoietic stem cell transplantation (planned, brother donates to brother, HLA 10/10); 3. Bone marrow suppression (Grade IV); 4. Pancytopenia; 5. Agranulocytosis; 6. Infectious fever; 7. Perianal infection; 8. Hyponatremia; 9. Liver damage; 10. Pain";

[0112] Name: "Diagnosis of Primary Disease", Data Parameter: "Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+)", Source Location: "Discharge Summary", Original Medical Record: "Discharge Diagnosis: 1. Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+); 2. Allogeneic hematopoietic stem cell transplantation (planned, brother donates to brother, HLA 10/10); 3. Bone marrow suppression (Grade IV); 4. Pancytopenia; 5. Agranulocytosis; 6. Infectious fever; 7. Perianal infection; 8. Hyponatremia; 9. Liver damage; 10. Pain";

[0113] Name: "Clinical Staging", Data Parameter: "Not Mentioned", Source Location: "Discharge Summary", Original Medical Record: "Discharge Diagnosis: 1. Acute lymphoblastic leukemia (Ph positive, BCR/ABL P190, CSRP2+, ZKF1+); 2. Allogeneic hematopoietic stem cell transplantation (planned, brother donates to brother, HLA 10/10); 3. Bone marrow suppression (Grade IV); 4. Pancytopenia; 5. Agranulocytosis; 6. Infectious fever; 7. Perianal infection; 8. Hyponatremia; 9. Liver damage; 10. Pain";

[0114] Name: "Gene Type", Data Parameter: "NSD2, TET1, BCR:ABL, IKZF1, CSRP2", Source Location: "Discharge Summary", Original Medical Record: "P190 type BCR:ABL was detected in the above B-ALL related fusion genes, and WT1 expression was normal. Recommended follow-up items: BCR:ABL (P190) mRNA level. NGS: Class III mutations: NSD2, TET1 mutations. Chromosome: 46,XY,dup(1)(q11q44)[3]/46,XY[1]. A total of 4 metaphases were analyzed, of which 3 karyotypes showed duplication of the segment between 1q11 and 1q44; the remaining 1 was a normal male karyotype. Based on the patient's MICM typing and next-generation sequencing results, acute B lymphocytic leukemia (BCR::ABL(P190)+, CSRP2+, IKZF1+) was diagnosed";

[0115] Name: "Genetic Level at Initial Diagnosis", Data Parameter: "NSD2, TET1 Class III mutations", Source Location: "Discharge Summary", Original Medical Record: "AML/ALL/MDS gene panel (including P53 mutation): The specimen internal control was good. P190 type BCR:ABL was detected in the above B-ALL related fusion genes, and WT1 expression was normal. Recommended follow-up items: BCR:ABL (P190) mRNA level. NGS: Class III mutations: NSD2, TET1 mutations.";

[0116] Name: "Pre-transplant Genetic Level", Data Parameter: "Negative", Source Location: "Discharge Summary", Original Medical Record: "Bone marrow aspiration on May 7, 2024 showed grade III myelodysplasia, with 3% of the original blood cells. Immune residual + BCL-2 showed: a total of 930,000 nucleated cells were detected, with CD19+ cells accounting for 1.26% of the nucleated cells, representing B cells at different developmental stages, and expressing BCL2. IKZF1 deletion mutation was negative."

[0117] }

[0118] Then, based on the target structured data file, the traceability information of multiple standard data elements is rendered and displayed in the preset medical form interface.

[0119] In the method provided in this application embodiment, the traceability information of the corresponding key information point is used as the traceability information of each standard data element, and the traceability information of the multiple standard data elements is rendered and displayed in the preset medical form interface. Medical staff can therefore view the traceability information of each standard data element and conveniently check whether the data parameter displayed for it is correct.
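The example file above is written informally. The following Python sketch shows one way such a file could be serialized as well-formed JSON and turned into text for display (step S302). The key names (`name`, `data_parameter`, `source_location`, `source_text`), the top-level `elements` key, and both function names are assumptions; the source excerpts are shortened from the example.

```python
import json

# Two entries taken from the example above, with shortened excerpts.
records = [
    {
        "name": "Donor-Recipient Relationship",
        "data_parameter": "Brother donates to brother",
        "source_location": "Discharge summary",
        "source_text": "Allogeneic hematopoietic stem cell transplantation "
                       "(planned, brother donates to brother, HLA 10/10)",
    },
    {
        "name": "HLA typing",
        "data_parameter": "HLA 10/10",
        "source_location": "Discharge summary",
        "source_text": "Allogeneic hematopoietic stem cell transplantation "
                       "(planned, brother donates to brother, HLA 10/10)",
    },
]


def dump_structured_file(items):
    """Serialize the structured data (hypothetical layout) as valid JSON."""
    return json.dumps({"elements": items}, ensure_ascii=False, indent=2)


def render_traceability(items):
    """Build one display line per element plus one line for its source."""
    lines = []
    for item in items:
        lines.append(f'{item["name"]}: {item["data_parameter"]}')
        lines.append(f'  source: {item["source_location"]} | "{item["source_text"]}"')
    return "\n".join(lines)


file_text = dump_structured_file(records)
rendered = render_traceability(records)
```

`ensure_ascii=False` keeps any non-ASCII medical text readable in the file, which matters if the original records are in Chinese.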

[0120] Based on the above embodiments, this application also provides another possible implementation of the medical structured data extraction method. Figure 5 is the fourth flowchart of a medical structured data extraction method provided in an embodiment of this application. Figure 6 is a schematic diagram of a traceability feedback window provided in an embodiment of this application. As shown in Figure 5, the method further includes:

[0121] S401. Based on the traceability selection operation for the target standard data element, display the feedback control for the target standard data element.

[0122] S402. Based on the traceability feedback trigger operation for the target standard data element input through the feedback control, display the traceability feedback window for the target standard data element.

[0123] S403. Based on the information input through the traceability feedback window, determine the feedback information of the target standard data element.

[0124] S404. Based on the feedback information, fine-tune the medical language model.

[0125] In this embodiment, the target medical form and the traceability information of the multiple standard data elements are displayed on the preset medical form interface. The traceability selection operation for the target standard data element consists of moving the cursor to, or clicking, the area where the traceability information of the target standard data element is located. In response, a feedback control is displayed in that area.

[0126] The traceability feedback trigger operation for the target standard data element, input through the feedback control, is a click on the feedback control; it causes the traceability feedback window for the target standard data element to be displayed. As shown in Figure 6, clicking the feedback control in the area where the traceability information of the donor-recipient relationship (the target standard data element) is located displays the traceability feedback window in the preset medical form interface.

[0127] If medical staff believe that the traceability information of the donor-recipient relationship is incorrect, they can click the feedback control in the area where the traceability information of the donor-recipient relationship is located, and enter the correct traceability information of the donor-recipient relationship in the traceability feedback window, including the source location of the correct traceability information in the original medical record text, and the original medical record text at the source location. By entering the information in the traceability feedback window, the feedback information of the target standard data element can be determined.

[0128] By using feedback information, the medical big language model is fine-tuned so that it can accurately output the source information of each key information point.
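Steps S403 and S404 can be sketched as follows. The application does not define the shape of the feedback information or of a fine-tuning sample, so the `TraceabilityFeedback` fields, the prompt wording, and the corrected values in the example (an "Admission record" location and its text) are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class TraceabilityFeedback:
    """Information entered in the traceability feedback window (assumed fields)."""
    element_name: str
    reported_location: str   # what the model originally returned
    reported_text: str
    corrected_location: str  # what the medical staff entered
    corrected_text: str


def to_training_example(feedback: TraceabilityFeedback, record_text: str) -> dict:
    """S403/S404: turn one correction into a prompt/completion pair."""
    prompt = (
        f"Medical record:\n{record_text}\n\n"
        f"Locate the source of: {feedback.element_name}"
    )
    completion = json.dumps(
        {
            "source_location": feedback.corrected_location,
            "source_text": feedback.corrected_text,
        },
        ensure_ascii=False,
    )
    return {"prompt": prompt, "completion": completion}


# Hypothetical correction: the corrected location and text are invented here.
feedback = TraceabilityFeedback(
    element_name="Donor-Recipient Relationship",
    reported_location="Discharge summary",
    reported_text="Discharge diagnosis: ...",
    corrected_location="Admission record",
    corrected_text="Donor: the patient's brother",
)
example = to_training_example(feedback, "(full original medical record text)")
```

Keeping the originally reported values alongside the correction is optional; it would allow a later comparison of what the model got wrong.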

[0129] In the method provided in this application embodiment, a feedback control for the target standard data element is displayed based on the traceability selection operation for the target standard data element. A traceability feedback window for the target standard data element is displayed based on the traceability feedback trigger operation input through the feedback control. The feedback information for the target standard data element is determined based on the information input through the traceability feedback window, and the medical large language model is fine-tuned based on the feedback information. By displaying the traceability feedback window and accepting feedback information, erroneous traceability information can be corrected, and the medical large language model can be fine-tuned to determine the traceability information of each key information point more accurately.

[0130] Based on the above embodiments, this application also provides another possible implementation of the medical structured data extraction method. Figure 7 is the fifth flowchart of a medical structured data extraction method provided in an embodiment of this application. As shown in Figure 7, processing the original medical record text using a pre-trained medical large language model to obtain the key information of the medical record includes:

[0131] S501. The original medical record text is sent to the server through the preset interface of the medical big language model, so that the server uses the medical big language model to process the original medical record text and obtain the key information of the medical record.

[0132] S502. Receive the key information of the medical record returned by the server.

[0133] In this embodiment, after obtaining the original medical record text of the target patient by triggering the creation of the target medical form input through the preset medical form interface, the original medical record text is sent to the server through the preset interface of the medical big language model, so that the server uses the medical big language model to process the original medical record text and obtain the key information of the medical record.

[0134] Multiple key information points are matched with predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form. The target structured data file is then returned to the preset medical form interface through the preset interface of the medical big language model.

[0135] Fine-tuning the medical large language model based on the feedback information, as described above, includes:

[0136] S503. Send the feedback information to the server through the preset interface of the medical large language model, so that the server fine-tunes the medical large language model based on the feedback information.

[0137] Specifically, the feedback information is sent to the server so that the server can retrain the medical large language model and adjust the model parameters based on the correctly labeled data, thereby improving the accuracy and robustness of traceability information extraction.

[0138] In the method provided in this application embodiment, the original medical record text is sent to the server through a preset interface of the medical big language model, so that the server uses the medical big language model to process the original medical record text and obtain key information of the medical record; the key information of the medical record returned by the server is received, and feedback information is sent to the server through the preset interface of the medical big language model, so that the server can fine-tune the medical big language model based on the feedback information, so as to realize the retraining of the medical big language model and improve the accuracy and robustness of the source information extraction.
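The client side of steps S501 to S503 can be sketched in Python as below. The application only mentions a "preset interface", so the endpoint paths `/extract` and `/feedback`, the payload keys, and the `ModelClient` class are assumptions. The transport is passed in as a function so that the sketch runs without a real server; `fake_transport` stands in for the network call.

```python
from typing import Any, Callable, Dict, List

# A transport takes an endpoint path and a payload and returns the response body.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class ModelClient:
    """Client for the model server's preset interface (hypothetical API)."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def extract(self, record_text: str) -> List[Dict[str, str]]:
        """S501 + S502: send the record text, receive the key information."""
        response = self._transport("/extract", {"record_text": record_text})
        return response["key_info_points"]

    def send_feedback(self, feedback: Dict[str, str]) -> bool:
        """S503: forward feedback so the server can fine-tune the model."""
        response = self._transport("/feedback", {"feedback": feedback})
        return bool(response.get("accepted", False))


def fake_transport(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for an HTTP call; returns canned responses for the sketch."""
    if path == "/extract":
        return {"key_info_points": [{"name": "HLA typing", "content": "HLA 10/10"}]}
    if path == "/feedback":
        return {"accepted": True}
    raise ValueError(f"unknown endpoint: {path}")


client = ModelClient(fake_transport)
points = client.extract("Discharge diagnosis: ... HLA 10/10 ...")
accepted = client.send_feedback({"element_name": "HLA typing"})
```

In a deployment the transport would wrap an actual HTTP or RPC client; injecting it keeps the form interface code independent of how the server is reached.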

[0139] This application also provides another possible implementation of the medical structured data extraction method. Figure 8 is the sixth flowchart of a medical structured data extraction method provided in an embodiment of this application. As shown in Figure 8, before processing the original medical record text using a pre-trained medical large language model to obtain the key information of the medical record, the method further includes:

[0140] S601. Obtain medical sample data for the preset key information extraction task.

[0141] The medical sample data includes: sample medical record text, and key information of the sample medical record corresponding to the sample medical record text.

[0142] S602. Fine-tune the preset medical domain model using medical sample data to obtain a medical big language model.

[0143] In this embodiment, the sample medical record text is drawn from sources such as books, clinical guidelines, drug instructions, medical records, and test reports. A Transformer structure is used, with root mean square layer normalization (RMSNorm), the Swish-gated linear unit (SwiGLU) as the nonlinear activation function, and rotary position embeddings, and the preset medical domain model is fine-tuned to obtain the medical large language model.
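The three components named above have commonly used definitions, which the following pure-Python sketch reproduces on plain lists of floats. The application gives no formulas or hyperparameters, so the epsilon value, the rotary base of 10000, and the function names are assumptions, and the input projections that normally precede SwiGLU are left out.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root mean square of x, then scale by a gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def swish(v):
    """Swish (SiLU) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate_proj, value_proj):
    """SwiGLU gating: Swish of one projection times the other, elementwise."""
    return [swish(a) * b for a, b in zip(gate_proj, value_proj)]


def rope(x, position, base=10000.0):
    """Rotary position embedding: rotate each (even, odd) pair of x by an
    angle that depends on the token position and the pair index."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("rope expects an even-length vector")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out


normed = rms_norm([3.0, 4.0], [1.0, 1.0])
gated = swiglu([0.0, 1.0], [5.0, 2.0])
rotated = rope([1.0, 2.0, 3.0, 4.0], position=5)
```

Because the rotation acts on pairs of coordinates, `rope` leaves the vector's length unchanged and only encodes position in its direction.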

[0144] The method provided in this application embodiment obtains medical sample data for a preset key information extraction task, the medical sample data including sample medical record text and the sample medical record key information corresponding to it, and uses the medical sample data to fine-tune a preset medical domain model to obtain the medical large language model. Fine-tuning enables the medical large language model to deeply understand the semantic information of medical record text and to accurately identify and extract key information, which improves the accuracy of information extraction. Because the medical large language model does not rely on preset rules or templates, it can effectively handle professional terms and abbreviations specific to the medical domain, which improves the system's generalization ability.

[0145] The following describes the medical structured data extraction device and the computer device used to carry out the method provided in any of the above embodiments of this application. Their specific implementation process and technical effects are the same as those of the corresponding method embodiments. For brevity, refer to the corresponding content in the method embodiments for any parts not mentioned here.
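The medical sample data of S601 pairs each sample medical record text with its key information. A minimal sketch of preparing such pairs for fine-tuning (S602) is given below; the `MedicalSample` class, the instruction wording, and the prompt/completion layout are assumptions, and the training step itself is not shown because the application does not describe it.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MedicalSample:
    """One labeled sample: record text plus its key information (assumed shape)."""
    record_text: str
    key_info: List[Dict[str, str]]


def build_sft_examples(
    samples: List[MedicalSample],
    instruction: str = "Extract the key information points from the medical record as JSON.",
) -> List[Dict[str, str]]:
    """Turn labeled samples into prompt/completion pairs for fine-tuning."""
    examples = []
    for sample in samples:
        examples.append(
            {
                "prompt": f"{instruction}\n\n{sample.record_text}",
                "completion": json.dumps(sample.key_info, ensure_ascii=False),
            }
        )
    return examples


samples = [
    MedicalSample(
        record_text="Allogeneic hematopoietic stem cell transplantation "
                    "(planned, brother donates to brother, HLA 10/10)",
        key_info=[{"name": "HLA typing", "content": "HLA 10/10"}],
    )
]
sft_examples = build_sft_examples(samples)
```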

[0146] Figure 9 is a schematic diagram of the functional modules of a medical structured data extraction device provided in an embodiment of this application. As shown in Figure 9, the medical structured data extraction device 100 includes:

[0147] The acquisition module 110 is used to acquire the original medical record text of the target patient based on the creation trigger operation of the target medical form for the target patient entered through the preset medical form interface;

[0148] The processing module 120 is used to process the original medical record text using a pre-trained medical big language model to obtain key information of the medical record, which includes multiple key information points.

[0149] The matching module 130 is used to match multiple key information points with multiple predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form;

[0150] Display module 140 is used to render and display the target medical form on a preset medical form interface using the target structured data file.
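The four modules can be read as stages of a pipeline. The sketch below wires them together in Python with stub implementations. The class `ExtractionDevice`, the method `on_form_created`, and all four stub functions are hypothetical; in particular, the `process` stub stands in for the call to the medical large language model.

```python
from typing import Callable, Dict, List

KeyPoint = Dict[str, str]
Row = Dict[str, str]


class ExtractionDevice:
    """Chains acquisition (110), processing (120), matching (130), display (140)."""

    def __init__(
        self,
        acquire: Callable[[str], str],
        process: Callable[[str], List[KeyPoint]],
        match: Callable[[List[KeyPoint], List[str]], List[Row]],
        display: Callable[[List[Row]], str],
    ) -> None:
        self.acquire = acquire
        self.process = process
        self.match = match
        self.display = display

    def on_form_created(self, patient_id: str, form_fields: List[str]) -> str:
        """Run the whole flow in response to a form creation trigger."""
        text = self.acquire(patient_id)
        points = self.process(text)
        rows = self.match(points, form_fields)
        return self.display(rows)


# Stub modules, for illustration only.
def acquire(patient_id: str) -> str:
    return "Discharge diagnosis: ... (planned, brother donates to brother, HLA 10/10)"


def process(text: str) -> List[KeyPoint]:
    return [{"name": "HLA typing", "content": "HLA 10/10"}] if "HLA 10/10" in text else []


def match(points: List[KeyPoint], fields: List[str]) -> List[Row]:
    by_name = {p["name"]: p["content"] for p in points}
    return [{"name": f, "data_parameter": by_name.get(f, "")} for f in fields]


def display(rows: List[Row]) -> str:
    return "\n".join(f'{r["name"]}: {r["data_parameter"]}' for r in rows)


device = ExtractionDevice(acquire, process, match, display)
form_text = device.on_form_created("patient-001", ["HLA typing"])
```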

[0151] Optionally, the matching module 130 is further configured to match the name of each standard data element in the multiple standard data elements with the names of multiple key information points to obtain the key information points corresponding to each standard data element; determine the content parameters of the corresponding key information points as the data parameters of each standard data element; take the names of the multiple standard data elements and the data parameters of the multiple standard data elements as multiple pieces of structured data, each piece of structured data including: the name of a standard data element and the corresponding data parameters; assemble the multiple pieces of structured data to generate a target structured data file.

[0152] Optionally, the matching module 130 is further configured to determine, based on the name of each standard data element and the names of the multiple key information points, the key information point with the same name as the key information point corresponding to each standard data element; or, to determine the key information point whose name satisfies a preset correspondence with the name of the standard data element as the key information point corresponding to each standard data element.
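The two matching strategies and the assembly of the structured data can be sketched as follows. The function name `match_elements`, the `aliases` dictionary used to represent the preset correspondence, the alias "Donor relationship", and the use of "Not Mentioned" for unmatched elements (borrowed from the example file in the description) are assumptions.

```python
import json
from typing import Dict, List, Optional


def match_elements(
    element_names: List[str],
    key_points: List[Dict[str, str]],
    aliases: Optional[Dict[str, List[str]]] = None,
    missing: str = "Not Mentioned",
) -> List[Dict[str, str]]:
    """Match each standard data element to a key information point.

    Exact name equality is tried first; otherwise the preset correspondence
    (modeled here as a list of alternative names per element) is consulted.
    """
    aliases = aliases or {}
    by_name = {point["name"]: point for point in key_points}
    structured = []
    for name in element_names:
        point = by_name.get(name)
        if point is None:
            for alternative in aliases.get(name, []):
                if alternative in by_name:
                    point = by_name[alternative]
                    break
        value = point["content"] if point is not None else missing
        structured.append({"name": name, "data_parameter": value})
    return structured


key_points = [
    {"name": "Donor relationship", "content": "Brother donates to brother"},
    {"name": "HLA typing", "content": "HLA 10/10"},
]
structured = match_elements(
    ["Donor-Recipient Relationship", "HLA typing", "Clinical Staging"],
    key_points,
    aliases={"Donor-Recipient Relationship": ["Donor relationship"]},
)
# Assemble the pieces of structured data into one file body.
target_file = json.dumps(structured, ensure_ascii=False, indent=2)
```

Iterating over the form's standard data elements, rather than over the model output, keeps one entry per form field even when the model returns nothing for it.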

[0153] Optionally, the key information in the medical record also includes: source information for multiple key information points, wherein the source information for each key information point is used to indicate the source location of each key information point in the original medical record text, and the original medical record text at the source location;

[0154] The device also includes:

[0155] The display module 140 is also used to use the traceability data of the corresponding key information points as the traceability information of each standard data element; and to render and display the traceability information of multiple standard data elements in the preset medical form interface.

[0156] Optionally, the device further includes:

[0157] The display module 140 is also used to display a feedback control for the target standard data element based on the traceability selection operation for the target standard data element; and to display a traceability feedback window for the target standard data element based on the traceability feedback trigger operation for the target standard data element input through the feedback control.

[0158] The determination module is used to determine the feedback information of the target standard data element based on the input operation of the information entered through the traceability feedback window;

[0159] The fine-tuning module is used to fine-tune the medical large language model based on feedback information.

[0160] Optionally, the processing module 120 is used to send the original medical record text to the server through a preset interface of the medical big language model, so that the server uses the medical big language model to process the original medical record text to obtain key information of the medical record; receive the key information of the medical record returned by the server; and send feedback information to the server through the preset interface of the medical big language model, so that the server can fine-tune the medical big language model based on the feedback information.

[0161] Optionally, the fine-tuning module is also used to obtain medical sample data for a preset key information extraction task. The medical sample data includes: sample medical record text and sample medical record key information corresponding to the sample medical record text; the medical sample data is used to fine-tune the preset medical domain model to obtain a medical big language model.

[0162] The above-described device is used to execute the method provided in the foregoing embodiments, and its implementation principle and technical effect are similar, so they will not be described again here.

[0163] These modules can be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more microprocessors, or one or more Field Programmable Gate Arrays (FPGAs). Alternatively, when a module is implemented in the form of program code scheduled by a processing element, the processing element can be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. Furthermore, these modules can be integrated together as a system-on-a-chip (SoC).

[0164] Figure 10 is a schematic diagram of a computer device provided in an embodiment of this application, which can be used for medical structured data extraction. As shown in Figure 10, the computer device 200 includes: a processor 210, a storage medium 220, and a bus 230.

[0165] Storage medium 220 stores machine-readable instructions executable by processor 210. When the computer device is running, processor 210 communicates with storage medium 220 via bus 230, and processor 210 executes the machine-readable instructions to perform the steps of the above method embodiment. The specific implementation and technical effects are similar, and will not be described again here.

[0166] Optionally, this application also provides a storage medium 220, on which a computer program is stored. When the computer program is run by a processor, it executes the steps of the above-described method embodiments. The specific implementation and technical effects are similar, and will not be repeated here.

[0167] In the several embodiments provided by this invention, it should be understood that the disclosed apparatus and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the coupling or direct coupling or communication connection shown or discussed may be through some interfaces; the indirect coupling or communication connection between apparatuses or units may be electrical, mechanical, or other forms.

[0168] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0169] Furthermore, the functional units in the various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated unit can be implemented in hardware or in the form of hardware plus software functional units.

[0170] The integrated units implemented as software functional units described above can be stored in a computer-readable storage medium. These software functional units, stored in a storage medium, include several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) or processor to execute some steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0171] The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A method for extracting structured medical data, characterized in that it includes: obtaining the original medical record text of a target patient based on a creation trigger operation, input through a preset medical form interface, for a target medical form of the target patient; processing the original medical record text using a pre-trained medical big language model to obtain key information of the medical record, which includes multiple key information points; matching the multiple key information points with multiple predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form; and rendering and displaying the target medical form on the preset medical form interface using the target structured data file.

2. The method as described in claim 1, characterized in that, The step of matching the multiple key information points with predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form includes: Based on the name of each standard data element in the plurality of standard data elements, the names of the plurality of key information points are matched to obtain the key information points corresponding to each standard data element; The content parameters of the corresponding key information points are determined as the data parameters of each standard data element; The names of the multiple standard data elements and the data parameters of the multiple standard data elements are respectively used as multiple structured data, each of which includes: the name of a standard data element and its corresponding data parameters; The multiple pieces of structured data are assembled to generate the target structured data file.

3. The method as described in claim 2, characterized in that the step of matching, based on the name of each standard data element in the plurality of standard data elements, the names of the plurality of key information points to obtain the key information point corresponding to each standard data element includes: determining, based on the name of each standard data element and the names of the plurality of key information points, the key information point with the same name as the key information point corresponding to each standard data element; or determining, based on the name of each standard data element and the names of the plurality of key information points, the key information point whose name satisfies a preset correspondence as the key information point corresponding to each standard data element.

4. The method as described in claim 2, characterized in that, The key information in the medical record also includes: source information of the multiple key information points, wherein the source information of each key information point is used to indicate the source position of each key information point in the original medical record text, and the original medical record text at the source position; The method further includes: The source data of the corresponding key information points shall be used as the source information of each standard data element; The source information of the multiple standard data elements is rendered and displayed in the preset medical form interface.

5. The method as described in claim 4, characterized in that, The method further includes: Based on the source selection operation for the target standard data element, a feedback control for the target standard data element is displayed; Based on the traceability feedback trigger operation for the target standard data element input through the feedback control, a traceability feedback window for the target standard data element is displayed; Based on the information input through the traceability feedback window, the feedback information of the target standard data element is determined; Based on the feedback information, the medical big language model is fine-tuned.

6. The method as described in claim 5, characterized in that, The pre-trained medical language model is used to process the original medical record text to obtain key medical record information, including: The original medical record text is sent to the server through the preset interface of the medical big language model, so that the server uses the medical big language model to process the original medical record text and obtain the key information of the medical record. Receive the key medical record information returned by the server; The step of fine-tuning the medical language model based on the feedback information includes: The feedback information is sent to the server through the preset interface of the medical language model, so that the server can fine-tune the medical language model based on the feedback information.

7. The method as described in claim 1, characterized in that, Before processing the original medical record text using a pre-trained medical big language model to obtain key information about the medical record, the method further includes: Acquire medical sample data for a preset key information extraction task. The medical sample data includes: sample medical record text, and sample medical record key information corresponding to the sample medical record text. The medical sample data is used to fine-tune the preset medical domain model to obtain the medical big language model.

8. A medical structured data extraction device, characterized in that it includes: an acquisition module, used to acquire the original medical record text of a target patient based on a creation trigger operation, input through a preset medical form interface, for a target medical form of the target patient; a processing module, used to process the original medical record text using a pre-trained medical big language model to obtain key information of the medical record, which includes multiple key information points; a matching module, used to match the multiple key information points with multiple predefined standard data elements in the target medical form to obtain the target structured data file corresponding to the target medical form; and a display module, used to render and display the target medical form on the preset medical form interface using the target structured data file.

9. A computer device, characterized in that it includes: a processor, a storage medium, and a bus, wherein the storage medium stores program instructions executable by the processor, and when the computer device is running, the processor communicates with the storage medium via the bus, and the processor executes the program instructions to perform the steps of the medical structured data extraction method as described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that, The storage medium stores a computer program, which, when executed by a processor, performs the steps of the medical structured data extraction method as described in any one of claims 1 to 7.