A case data extraction method under multiple medical institutions and a related device thereof
By constructing a machine learning model and UIE framework, and using the TF-IDF algorithm to identify tags, generate prompt templates and output templates, the problem of excessive data transmission and inconsistent information formats in the acquisition of case data between multiple medical institutions in different regions is solved, and efficient and intelligent information extraction is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHINA PING AN PROPERTY INSURANCE CO LTD
- Filing Date
- 2023-06-20
- Publication Date
- 2026-06-23
AI Technical Summary
In the acquisition of case data across multiple medical institutions in different regions, there are too many data transmission interactions between the receiving end and the collected end, and the acquired information is complex and lacks a unified format.
We employ machine learning model training and prompt learning methods. By constructing an information extraction model, using the TF-IDF algorithm to identify tag words, and generating prompt templates and output templates, we combine these with the UIE framework to achieve unified-format extraction of case data.
It reduces the number of data transmission interactions, improves the intelligence of information extraction, ensures that the output information has a uniform format, and saves extraction time.
Smart Images

Figure CN116756224B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of smart healthcare technology, and in particular to a method for extracting case data across multiple medical institutions and related equipment.
Background Technology
[0002] Smart healthcare, abbreviated as WITMED, is a recently emerging specialized medical term. It involves creating regional healthcare information platforms with health records, utilizing cutting-edge Internet of Things (IoT) technology to enable interaction between patients, medical staff, medical institutions, and medical equipment, gradually achieving informatization. Due to the imperfections of China's public healthcare management system, high medical costs, limited access, and low coverage are problems plaguing the public.
[0003] Therefore, a smart medical information network platform system needs to be established so that patients can enjoy safe, convenient, and high-quality medical services with shorter waiting times and basic medical expenses. In the current upgrade from traditional healthcare to smart healthcare, improving comprehensive medical records and cross-regional medical cooperation has become a hot topic. In practice, however, and especially when acquiring case data across multiple medical institutions in different regions, there are often too many data transmission interactions between the receiving end and the collected end, and the acquired information is complex and lacks a unified format. There is therefore an urgent need for a method and related equipment for extracting case data across multiple medical institutions in different regions that solves both the excessive data transmission interactions between the receiving end and the collected end and the lack of a unified format for the acquired information.
Summary of the Invention
[0004] The purpose of this application is to propose a method and related equipment for extracting case data across multiple medical institutions, in order to solve two problems in the existing technology for obtaining case data across multiple medical institutions in different regions: excessive data transmission interactions between the receiving end and the collected end, and acquired information that is complex and lacks a unified format.
[0005] To address the aforementioned technical problems, this application provides a method for extracting case data from multiple medical institutions, employing the following technical solution:
[0006] A method for extracting case data from multiple medical institutions includes the following steps:
[0007] Obtain N medical institution objects to be extracted, where N is a positive integer;
[0008] Select any one of the N medical institution objects as the target medical institution, and connect to the target data cache library of the target medical institution, wherein the data cache library stores the complete historical case data;
[0009] Sample case data are obtained from the data cache library, wherein the sample case data consists of a number of selected case data from the full historical case data;
[0010] The sample case data is input into a pre-built machine learning model, and the machine learning model is trained using a prompt learning method to obtain a trained target machine learning model;
[0011] Obtain the pre-set target information to be extracted and the output template corresponding to the target information;
[0012] Extract a prompt template for the target information according to the trained machine learning model;
[0013] Based on the prompt template, the output template, and the preset UIE framework, a model building operation is performed to obtain the information extraction model corresponding to the target medical institution;
[0014] After completing the model building operation for the N medical institution objects, N information extraction models corresponding to the N medical institution objects are obtained;
[0015] Data extraction operations are performed on the data cache corresponding to the N medical institution objects according to the N information extraction models to obtain the data extraction results;
[0016] The data extraction results are output to the preset target receiving end.
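The overall flow of steps [0007] to [0016] can be sketched in Python as below. This is a minimal skeleton under stated assumptions, not the claimed implementation: the per-institution model building (sampling, prompt learning, UIE assembly) is abstracted into a hypothetical `build_model_for` callable, a "model" is taken to be any function mapping one raw case record to a dict of extracted fields, and the receiving end is a hypothetical `receiver` callable.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical types: a "model" maps one raw case record to a dict of extracted
# target fields; a "cache" is any iterable of raw case records.
ExtractionModel = Callable[[str], dict]


def build_models(
    institutions: List[str],
    build_model_for: Callable[[str], ExtractionModel],
) -> Dict[str, ExtractionModel]:
    """Steps [0008]-[0014]: take each institution in turn as the target
    institution and build one information extraction model for it."""
    return {inst: build_model_for(inst) for inst in institutions}


def extract_all(
    models: Dict[str, ExtractionModel],
    caches: Dict[str, Iterable[str]],
) -> Dict[str, List[dict]]:
    """Step [0015]: each institution's model runs only over that institution's cache."""
    return {inst: [model(rec) for rec in caches[inst]] for inst, model in models.items()}


def send_to_receiver(
    results: Dict[str, List[dict]],
    receiver: Callable[[dict], None],
) -> None:
    """Step [0016]: deliver every result, tagged with its source institution."""
    for inst, rows in results.items():
        for row in rows:
            receiver({"institution": inst, **row})
```

Keeping one model per institution means that differences between institutions' record templates stay isolated: a model is only ever applied to the cache it was built from.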
[0017] Furthermore, before performing the step of inputting the sample case data into a pre-built machine learning model and training the machine learning model using a prompt learning method to obtain a trained target machine learning model, the method further includes:
[0018] Extract patient information, lesion information, diagnosis information, physician information, and treatment information from each historical case data entry within the sample case data, and obtain the extraction results;
[0019] Based on the extraction results, subsets of patient information, lesion information, diagnosis information, physician information, and treatment information are constructed.
[0020] Based on the subset of patient information and a preset word frequency analysis model, tag words for identifying patient information from the target information are obtained;
[0021] Similarly, the tag words for identifying lesion information, diagnosis information, physician information, and treatment information from the target information are obtained respectively;
[0022] The tag words for identifying patient information, lesion information, diagnosis information, physician information, and treatment information are organized to construct a tag word set.
[0023] Furthermore, the step of obtaining tag words for identifying patient information from the target information based on the subset of patient information and a preset word frequency analysis model specifically includes:
[0024] The subset of patient information is input into the preset word frequency analysis model, wherein the preset word frequency analysis model is a word frequency analysis model based on the TF-IDF algorithm;
[0025] Each piece of patient information in the patient information subset is segmented into words to obtain the segmentation results for each piece of patient information.
[0026] Based on the word segmentation results and the TF-IDF algorithm, calculate the TF-IDF value of each word in each piece of data to be analyzed;
[0027] All word segments in the patient information subset are sorted in descending order of TF-IDF value, and the top N word segments are selected as tag words for identifying patient information from the target information, where N is a positive integer.
[0028] Furthermore, the step of calculating the TF-IDF value of each word in each piece of data to be analyzed based on the word segmentation results and the TF-IDF algorithm specifically includes:
[0029] Obtain the word segmentation results of the current data to be analyzed;
[0030] Based on the word segmentation results, the frequency of the current word segment in the current data to be analyzed is calculated.
[0031] Count the number of all data entries to be analyzed in the patient information subset, and count the number of data entries to be analyzed that contain the current word segment;
[0032] Calculate the inverse document frequency of the current word segment based on the number of all data entries to be analyzed in the patient information subset and the number of all data entries to be analyzed that contain the current word segment.
[0033] The TF-IDF value of the current word segment is obtained by performing a multiplication operation based on the word frequency and the inverse document frequency.
[0034] Furthermore, the step of organizing the tag words for identifying patient information, lesion information, diagnosis information, physician information, and treatment information to construct a tag word set specifically includes:
[0035] Based on the different categories of identified information, assign different major category distinction numbers to the tag words of different information categories, where the information categories include the patient information category, lesion information category, diagnosis information category, physician information category, and treatment information category;
[0036] Set different minor category distinction numbers for the different tag words of the same information category, where the different tag words of the same category are the top N word segments selected when identifying that category;
[0037] Create a triplet from the current tag word, the major category distinction number of the current tag word, and the minor category distinction number of the current tag word;
[0038] Obtain the triplet corresponding to each tag word;
[0039] Add the triplet corresponding to each tag word to a pre-built set to complete the construction of the tag word set.
[0040] Furthermore, the machine learning model includes a T5 model, and the step of extracting the prompt template of the target information based on the trained machine learning model specifically includes:
[0041] The sample case data and the tag word set are pre-input into the T5 model as training corpus and recognition reference corpus;
[0042] The target information is set to the output information of the T5 model;
[0043] Based on the tag word set, identify all the tag words contained in the input information, together with the major category distinction number and minor category distinction number corresponding to each of them;
[0044] Each historical case data is sequentially obtained from the training corpus, and the position information of all the tag words contained in the input information in each historical case data is identified.
[0045] Based on the position information of all the tag words contained in the input information in each historical case data, replace each tag word at its corresponding position in each historical case data with its major category distinction number and minor category distinction number, and obtain each case data after the replacement is completed;
[0046] Each case data entry after the replacement is completed is set as a prompt template for extracting the target information.
[0047] Furthermore, the step of performing model building operations based on the prompt template, the output template, and the preset UIE framework to obtain the information extraction model corresponding to the target medical institution specifically includes:
[0048] The prompt template is deployed at the input interface of the UIE framework, and the output template is deployed at the output interface of the UIE framework to complete the construction of the information extraction model for the target medical institution.
[0049] Before executing the step of batch retrieving the target information from the data cache of the corresponding medical institution and sending it to the preset target receiving end through the information extraction model corresponding to each medical institution, the method further includes:
[0050] The information extraction model corresponding to each medical institution is deployed between the target receiving end and the data cache of the corresponding medical institution.
[0051] The step of batch retrieving the target information from the data cache of the corresponding medical institution and sending it to the preset target receiving end through the information extraction model corresponding to each medical institution specifically includes:
[0052] Retrieve historical case data in batches from the data cache of each medical institution;
[0053] The acquired historical case data is input into the input interface of the corresponding information extraction model.
[0054] Using the prompt templates pre-deployed at the input interface, together with the major category distinction numbers and minor category distinction numbers corresponding to the tag words in the prompt templates, identify the target information to be extracted contained in each historical case data;
[0055] The target information is sent to the output interface, and the target information is output in a standardized format according to the output template at the output interface.
[0056] To address the aforementioned technical problems, this application also provides a computer device that employs the following technical solution:
[0057] A computer device includes a memory and a processor, wherein the memory stores computer-readable instructions, and the processor executes the computer-readable instructions to implement the steps of the case data extraction method for multiple medical institutions described above.
[0058] To address the aforementioned technical problems, this application also provides a computer-readable storage medium, employing the technical solution described below:
[0059] A computer-readable storage medium storing computer-readable instructions, which, when executed by a processor, implement the steps of the case data extraction method for multiple medical institutions as described above.
[0060] Compared with the prior art, the embodiments of this application have the following main advantages:
[0061] The case data extraction method for multiple medical institutions described in this application involves: connecting to the data cache of the target medical institution; acquiring sample case data; acquiring the pre-set target information to be extracted and the corresponding output template; inputting the sample case data into a pre-built machine learning model and training it using a prompt learning method to obtain a prompt template for extracting the target information; constructing an information extraction model corresponding to the target medical institution based on the prompt template, the output template, and a preset UIE framework; setting different medical institutions as the target medical institution in turn and repeating the above steps to obtain an information extraction model for each medical institution; and using the information extraction model of each medical institution to batch extract target information from that institution's data cache and send it to the preset target receiving end. Because an information extraction model is constructed for each medical institution, the digital medical cloud platform can extract target information from multiple target sources, i.e., different medical institutions, using only a small amount of training data, and can complete the extraction of target information in one pass, eliminating the need for multiple extractions based on different keywords. This method is more intelligent, saves extraction time, reduces the number of data transmission interactions, and ensures that the output target information has a unified format.
Attached Figure Description
[0062] To more clearly illustrate the solutions in this application, the accompanying drawings used in the description of the embodiments of this application will be briefly introduced below. Obviously, the accompanying drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0063] Figure 1 is an exemplary system architecture diagram to which this application can be applied;
[0064] Figure 2 is a flowchart of an embodiment of the case data extraction method for multiple medical institutions according to this application;
[0065] Figure 3 is a flowchart of a specific embodiment of the tag word set construction method in the case data extraction method for multiple medical institutions of this application;
[0066] Figure 4 is a flowchart of a specific embodiment of step 303 shown in Figure 3;
[0067] Figure 5 is a flowchart of a specific embodiment of step 403 shown in Figure 4;
[0068] Figure 6 is a flowchart of a specific embodiment of step 305 shown in Figure 3;
[0069] Figure 7 is a flowchart of a specific embodiment of step 206 shown in Figure 2;
[0070] Figure 8 is a flowchart of a specific embodiment of step 209 shown in Figure 2;
[0071] Figure 9 is a schematic structural diagram of an embodiment of the case data extraction device for multiple medical institutions according to this application;
[0072] Figure 10 is a schematic structural diagram of an embodiment of the computer device according to this application.
Detailed Implementation
[0073] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application pertains; the terminology used herein in the specification of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having," and any variations thereof, in the specification, claims, and foregoing drawings of this application, are intended to cover non-exclusive inclusion. The terms "first," "second," etc., in the specification, claims, or foregoing drawings of this application are used to distinguish different objects, not to describe a particular order.
[0074] In this document, the term "embodiment" means that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment of this application. The appearance of this phrase in various places throughout the specification does not necessarily refer to the same embodiment, nor is it a separate or alternative embodiment mutually exclusive with other embodiments. It will be explicitly and implicitly understood by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0075] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0076] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
[0077] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages, etc. Various communication client applications can be installed on terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social media platform software, etc.
[0078] Terminal devices 101, 102, and 103 can be various electronic devices with displays and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptops, and desktop computers, etc.
[0079] Server 105 can be a server that provides various services, such as a backend server that supports the pages displayed on terminal devices 101, 102, and 103.
[0080] It should be noted that the case data extraction method under multiple medical institutions provided in this application embodiment is generally executed by a server / terminal device, and correspondingly, the case data extraction device under multiple medical institutions is generally set in the server / terminal device.
[0081] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Depending on implementation needs, any number of terminal devices, networks, and servers can be included.
[0082] Continuing to refer to Figure 2, which shows a flowchart of an embodiment of the case data extraction method for multiple medical institutions according to this application, the method includes the following steps:
[0083] Step 201: Obtain N medical institution objects to be extracted, where N is a positive integer.
[0084] Step 202: Select any one of the N medical institution objects as the target medical institution, and connect to the target data cache library of the target medical institution.
[0085] In this embodiment, the data cache library stores all historical case data, wherein the all historical case data refers to all historical case data within the target medical institution.
[0086] In this embodiment, the data cache library can be a file-based data cache library, such as the Hive library, or a relational data cache library, such as the MySQL or Oracle library.
[0087] Step 203: Obtain sample case data from the data cache library, wherein the sample case data consists of a number of selected case data from the full historical case data.
[0088] In this embodiment, each historical case data includes patient information, lesion information, diagnostic information, physician information, and treatment information.
[0089] In this embodiment, the step of obtaining sample case data from the data cache library specifically includes: if the data cache library is a file-based data cache library, then downloading a preset number of historical case data files from the data cache library in a file download manner; or, if the data cache library is a relational data cache library, then obtaining a preset number of historical case data entries from the data cache library in an Ajax manner.
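The relational branch of this step can be sketched as follows. The sketch assumes Python's built-in `sqlite3` as a stand-in for MySQL or Oracle, a hypothetical table `historical_cases(id, record)`, and random selection via `ORDER BY RANDOM() LIMIT ?`; the text does not say how the entries are chosen, and the Ajax transport is not reproduced.

```python
import sqlite3
from typing import List, Tuple


def sample_cases(conn: sqlite3.Connection, preset_count: int) -> List[Tuple[int, str]]:
    """Return `preset_count` randomly chosen rows from the hypothetical
    historical_cases table (fewer if the table is smaller).

    ORDER BY RANDOM() is SQLite syntax; MySQL would use ORDER BY RAND().
    """
    cursor = conn.execute(
        "SELECT id, record FROM historical_cases ORDER BY RANDOM() LIMIT ?",
        (preset_count,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # In-memory stand-in for an institution's relational data cache library.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE historical_cases (id INTEGER PRIMARY KEY, record TEXT)")
    conn.executemany(
        "INSERT INTO historical_cases (record) VALUES (?)",
        [(f"case {i}",) for i in range(100)],
    )
    print(len(sample_cases(conn, 10)))  # prints 10
```

Pushing the sampling into the query means only the preset number of entries crosses the network, which is in the spirit of reducing transmission between the receiving end and the collected end.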
[0090] Step 204: Input the sample case data into the pre-built machine learning model and train the machine learning model using a prompt learning method to obtain the trained target machine learning model.
[0091] In this embodiment, before the step of inputting the sample case data into a pre-built machine learning model and training it using a prompt learning method to obtain a trained target machine learning model is performed, the method further includes a tag word set construction method.
[0092] Continuing to refer to Figure 3, which is a flowchart of a specific embodiment of the tag word set construction method in the case data extraction method for multiple medical institutions of this application, the tag word set construction method includes:
[0093] Step 301: Extract patient information, lesion information, diagnostic information, physician information, and treatment information from each historical case data in the sample case data, and obtain the extraction results;
[0094] Step 302: Based on the extraction results, construct patient information subsets, lesion information subsets, diagnostic information subsets, physician information subsets, and treatment information subsets;
[0095] Step 303: Based on the subset of patient information and the preset word frequency analysis model, obtain the tag words for identifying patient information from the target information;
[0096] Continuing to refer to Figure 4, which is a flowchart of a specific embodiment of step 303 shown in Figure 3, step 303 includes:
[0097] Step 401: Input the subset of patient information into the preset word frequency analysis model, wherein the preset word frequency analysis model is a word frequency analysis model based on the TF-IDF algorithm;
[0098] Step 402: Perform word segmentation on each piece of patient information in the patient information subset to obtain the word segmentation results contained in each piece of patient information.
[0099] Step 403: Based on the word segmentation results and the TF-IDF algorithm, calculate the TF-IDF value of each word in each piece of data to be analyzed;
[0100] Continuing to refer to Figure 5, which is a flowchart of a specific embodiment of step 403 shown in Figure 4, step 403 includes:
[0101] Step 501: Obtain the word segmentation results of the current data to be analyzed;
[0102] Step 502: Based on the word segmentation processing results, count the word frequency of the current word segment in the current data to be analyzed. Specifically, obtain the total number of words in the current data to be analyzed after word segmentation processing, identify the number of times the current word segment appears in all words, and calculate the ratio of the number of times to the total number of words as the word frequency of the current word segment in the current data to be analyzed.
[0103] Step 503: Count the number of all data entries to be analyzed in the patient information subset, and count the number of data entries to be analyzed that contain the current word segment;
[0104] Step 504: Based on the number of all data entries to be analyzed in the patient information subset and the number of data entries to be analyzed that contain the current word segment, calculate the inverse document frequency (IDF) of the current word segment. Specifically, the inverse document frequency is calculated with the formula $IDF = \log\frac{D}{d+1}$, where IDF represents the inverse document frequency, D represents the number of all data entries to be analyzed in the patient information subset, d represents the number of data entries to be analyzed that contain the current word segment, and 1 is added to d to avoid a zero denominator;
[0105] Step 505: Perform a multiplication operation based on the word frequency and the inverse document frequency to obtain the TF-IDF value of the current word segment.
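Steps 501 to 505 can be expressed as a short Python sketch. It assumes each entry has already been segmented into a list of word segments (for Chinese text this would normally be done by a dedicated tokenizer, which is not shown) and uses the natural logarithm in the IDF formula of Step 504, whose base the text does not state. The function name `tf_idf` is illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(segmented_entries: List[List[str]]) -> List[Dict[str, float]]:
    """Steps 501-505: TF-IDF value of every word segment in every entry.

    segmented_entries holds the word segmentation result of each entry in one
    information subset (e.g. the patient information subset).
    """
    D = len(segmented_entries)  # number of entries in the subset
    # d for each word: number of entries containing it (each entry counted once)
    doc_freq = Counter(word for entry in segmented_entries for word in set(entry))
    results = []
    for entry in segmented_entries:
        counts = Counter(entry)
        total = len(entry)
        scores = {}
        for word, n in counts.items():
            tf = n / total                            # Step 502: occurrences / total words
            idf = math.log(D / (doc_freq[word] + 1))  # Step 504: IDF = log(D / (d + 1))
            scores[word] = tf * idf                   # Step 505: TF x IDF
        results.append(scores)
    return results
```

With the +1 in the denominator, a word that appears in every entry gets log(D/(D+1)) < 0, so ubiquitous words fall to the bottom of the ranking used in Step 404.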
[0106] Step 404: Sort all word segments in the patient information subset according to the TF-IDF value from largest to smallest, and select the top N word segments as tag words to identify patient information from the target information, where N is a positive integer.
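Step 404 leaves open how a word that occurs in several entries, and therefore has several TF-IDF values, is ranked. The sketch below assumes its highest value across entries is used; `top_n_tag_words` is a hypothetical name and `n` plays the role of N in the text.

```python
from typing import Dict, List


def top_n_tag_words(per_entry_scores: List[Dict[str, float]], n: int) -> List[str]:
    """Step 404: rank word segments by TF-IDF value (largest first), keep the top n.

    Assumption: a word's score for ranking is its maximum TF-IDF over all entries.
    Ties are broken alphabetically so the result is deterministic.
    """
    best: Dict[str, float] = {}
    for scores in per_entry_scores:
        for word, value in scores.items():
            if value > best.get(word, float("-inf")):
                best[word] = value
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]
```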
[0107] Step 304, similarly, obtain the tag words for identifying lesion information from the target information, the tag words for identifying diagnostic information from the target information, the tag words for identifying physician information from the target information, and the tag words for identifying treatment information from the target information;
[0108] By constructing subsets of patient information, lesion information, diagnosis information, physician information, and treatment information and applying the preset word frequency analysis model, the tag words corresponding to each information category in historical case data can be obtained. When a subsequent program encounters a tag word, it can directly identify the category of information to be obtained and quickly retrieve the target information based on that category.
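As a minimal sketch of Step 302, assuming each extraction result from Step 301 is a dict keyed by hypothetical category names, the five subsets can be built by regrouping the per-case values:

```python
from typing import Dict, List

# Hypothetical field names for the five information categories of a case record.
CATEGORIES = ("patient", "lesion", "diagnosis", "physician", "treatment")


def build_subsets(extraction_results: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Step 302: regroup per-case extraction results into one subset per category,
    skipping categories that a given case does not contain."""
    subsets: Dict[str, List[str]] = {c: [] for c in CATEGORIES}
    for case in extraction_results:
        for category in CATEGORIES:
            value = case.get(category)
            if value:
                subsets[category].append(value)
    return subsets
```

Each resulting subset is what Step 303 (and its counterparts in Step 304) feeds into the word frequency analysis model.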
[0109] Step 305: Organize the tag words for identifying patient information, lesion information, diagnosis information, physician information, and treatment information, and construct a tag word set.
[0110] Constructing a tag word set provides sufficient data labels for subsequent prompt learning and ensures that these labels are derived from the sample case data. This matters especially when different medical institutions use different case data storage templates: sample case data is collected separately from each medical institution, and a tag word set is constructed for each institution from its own sample case data, which guarantees the correspondence between each tag word set and its medical institution.
[0111] Continuing to refer to Figure 6, which is a flowchart of a specific embodiment of step 305 shown in Figure 3, step 305 includes:
[0112] Step 601: Based on the different categories of identified information, set different major category distinction numbers for the tag words of different information categories;
[0113] In this embodiment, the different information categories include patient information category, lesion information category, diagnosis information category, physician information category, and treatment information category;
[0114] Step 602: Set different minor category distinction numbers for the different tag words of the same information category, where the different tag words of the same category are the top N word segments selected when identifying that category;
[0115] Step 603: Create a triplet from the current tag word, the major category distinction number of the current tag word, and the minor category distinction number of the current tag word;
[0116] Step 604: Obtain the triplet corresponding to each tag word;
[0117] Step 605: Add the triplet corresponding to each tag word to the pre-built set to complete the construction of the tag word set.
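Steps 601 to 605 amount to numbering categories and tag words and collecting triplets. The sketch below assumes numbering starts at 1 in the order the categories and ranked tag words are supplied; the text only requires the numbers to be distinct. `build_tag_word_set` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

Triplet = Tuple[str, int, int]  # (tag word, major category number, minor category number)


def build_tag_word_set(tags_by_category: Dict[str, List[str]]) -> Set[Triplet]:
    """Steps 601-605: number each category (major) and each tag word within its
    category (minor), then collect the (word, major, minor) triplets into a set."""
    tag_word_set: Set[Triplet] = set()
    for major, words in enumerate(tags_by_category.values(), start=1):   # Step 601
        for minor, word in enumerate(words, start=1):                    # Step 602
            tag_word_set.add((word, major, minor))                       # Steps 603-605
    return tag_word_set
```

Because a triplet carries both numbers, the same word in two categories yields two distinct triplets rather than a collision.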
[0118] Assigning major and minor category distinction numbers to tag words avoids the confusion that identical tag words in different information categories could otherwise cause during identification. Other methods can also be used to mitigate this confusion: specifically, the tag words of different information categories can be compared, tag words belonging to multiple categories eliminated, and only those unique to a single category retained. A distinguishing number can then be assigned to each retained tag word.
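The alternative described above, which keeps only tag words unique to one category and gives each a single distinguishing number, might look like this. Sequential numbering and the name `unique_tag_numbers` are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List


def unique_tag_numbers(tags_by_category: Dict[str, List[str]]) -> Dict[str, int]:
    """Keep only tag words that occur in exactly one information category,
    then assign each retained word its own distinguishing number (1, 2, ...)."""
    # Count in how many categories each word appears (once per category).
    occurrences = Counter(w for words in tags_by_category.values() for w in set(words))
    numbers: Dict[str, int] = {}
    for words in tags_by_category.values():
        for word in words:
            if occurrences[word] == 1 and word not in numbers:
                numbers[word] = len(numbers) + 1
    return numbers
```

The trade-off is that shared words are discarded entirely, whereas the major/minor numbering keeps them and disambiguates by category.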
[0119] Step 205: Obtain the pre-set target information to be extracted and the output template corresponding to the target information, wherein the target information includes at least one of the patient information, lesion information, diagnostic information, physician information and treatment information.
[0120] In this embodiment, the output template corresponding to the target information is generally a pre-set template with a fixed and standardized output format, such as an output template in JSON data format.
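A fixed-format JSON output template could be applied as in the sketch below. The key names in `OUTPUT_TEMPLATE` are hypothetical, and filling absent fields with `null` is one assumed way of keeping the format uniform across institutions.

```python
import json
from typing import Dict, List

# Hypothetical output template: fixed key order, one key per target category.
OUTPUT_TEMPLATE: List[str] = ["patient", "lesion", "diagnosis", "physician", "treatment"]


def render_output(extracted: Dict[str, str], template: List[str] = OUTPUT_TEMPLATE) -> str:
    """Emit extracted target information in a fixed, standardized JSON layout:
    every template key is always present (null when nothing was extracted)
    and fields outside the template are dropped."""
    record = {key: extracted.get(key) for key in template}
    return json.dumps(record, ensure_ascii=False)
```

`ensure_ascii=False` keeps Chinese field values readable in the output instead of escaping them.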
[0121] In this embodiment, the target information can be a single category of information (patient information, lesion information, diagnostic information, physician information, or treatment information), or it can be combined information consisting of two or more of these categories.
[0122] In this embodiment, the target information can be freely set according to the needs of the receiving end. The receiving end can be a cross-platform online cloud medical record management platform, or it can be other authorized medical institutions that remotely obtain medical case data from the collected end.
[0123] Step 206: Extract the prompt template of the target information according to the trained machine learning model.
[0124] In this embodiment, the preset machine learning model is the T5 model.
[0125] Referring further to Figure 7, Figure 7 is a flowchart of a specific embodiment of step 206 shown in Figure 2, which includes:
[0126] Step 701: Pre-input the sample case data into the T5 model as the training corpus and the tag word set as the recognition reference corpus;
[0127] Step 702: Set the target information as the output information of the T5 model;
[0128] Step 703: Based on the tag word set, identify all the tag words contained in the input information, together with the major category distinction number and minor category distinction number corresponding to each of these tag words;
[0129] Correspondingly, if each tag word in the tag word set is a unique tag word, step 703 instead identifies, based on the tag word set, all the tag words contained in the input information and the distinguishing number of each of these tag words.
[0130] Step 704: Sequentially obtain each historical case data entry from the training corpus, and identify the position information, within each historical case data entry, of all the tag words contained in the input information;
[0131] Step 705: Based on the position information of all the tag words contained in the input information in each historical case data entry, replace each tag word at its position with the major category distinction number and minor category distinction number corresponding to that tag word, and obtain each case data entry after the replacement is completed;
[0132] Correspondingly, if each tag word in the tag word set is a unique tag word, step 705 instead replaces, based on the same position information, each tag word in each historical case data entry with its distinguishing number, and obtains each case data entry after the replacement is completed.
[0133] Step 706: Set each case data entry obtained after the replacement as a prompt template for extracting the target information.
[0134] Generating the prompt template by replacing the tag words with their assigned distinction numbers preserves the original text structure of the corresponding historical case data and also makes it easier to identify the target information directly from the distinction numbers in the prompt template.
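Steps 704 and 705 can be pictured with the following minimal Python sketch, which locates every tag word in a historical case record and replaces it in place with its distinction numbers. Several details are assumptions not stated in the source: the `[major-minor]` marker format, longest-match-first handling of overlapping tag words, and a mapping that gives each tag word exactly one (major, minor) pair. The T5 training of steps 701 and 702 is not shown.

```python
import re
from typing import Dict, List, Tuple


def find_tag_positions(text: str, numbers: Dict[str, Tuple[int, int]]) -> List[Tuple[int, int, str]]:
    """Step 704: (start, end, tag) of every non-overlapping tag word occurrence."""
    if not numbers:
        return []
    # Longer tag words are tried first, so "headache" wins over "ache" at the same position.
    alternatives = sorted(numbers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(tag) for tag in alternatives))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


def build_prompt_template(text: str, numbers: Dict[str, Tuple[int, int]]) -> str:
    """Step 705: replace each tag word at its position with a "[major-minor]" marker."""
    pieces: List[str] = []
    cursor = 0
    for start, end, tag in find_tag_positions(text, numbers):
        major, minor = numbers[tag]
        pieces.append(text[cursor:start])  # untouched text keeps the record's structure
        pieces.append(f"[{major}-{minor}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


print(build_prompt_template("Name: Li Si. Diagnosis: gastritis.", {"Name": (1, 1), "Diagnosis": (3, 1)}))
# [1-1]: Li Si. [3-1]: gastritis.
```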
[0135] Step 207: Perform model building operations based on the prompt template, the output template, and the preset UIE framework to obtain the information extraction model corresponding to the target medical institution.
[0136] In this embodiment, the step of performing model building operations based on the prompt template, the output template, and the preset UIE framework to obtain the information extraction model corresponding to the target medical institution specifically includes: deploying the prompt template at the input interface of the UIE framework, deploying the output template at the output interface of the UIE framework, and completing the construction of the information extraction model for the target medical institution.
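The deployment in [0136] can be sketched as a small wrapper that holds the prompt template on the input side and the output template on the output side. This is a hypothetical illustration: `InformationExtractionModel`, the `Extractor` call signature, and `toy_extractor` are invented for this example and do not reproduce the API of any UIE implementation; the toy extractor exists only so that the example runs.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Stand-in for the UIE-based extractor: (prompt template, case text) -> {field: value}.
# This call signature is an assumption for illustration, not the UIE library's API.
Extractor = Callable[[str, str], Dict[str, Optional[str]]]


@dataclass
class InformationExtractionModel:
    """Hypothetical wrapper for one institution's information extraction model."""
    prompt_template: str        # deployed at the input interface
    output_fields: List[str]    # output template deployed at the output interface
    extractor: Extractor        # the UIE-based model itself

    def input_interface(self, case_text: str) -> Dict[str, Optional[str]]:
        # A single call per record: the prompt template covers every target field at once.
        return self.extractor(self.prompt_template, case_text)

    def output_interface(self, raw: Dict[str, Optional[str]]) -> str:
        # Every template field appears in template order; missing ones become null.
        return json.dumps({f: raw.get(f) for f in self.output_fields}, ensure_ascii=False)

    def extract(self, case_text: str) -> str:
        return self.output_interface(self.input_interface(case_text))


def toy_extractor(prompt: str, text: str) -> Dict[str, Optional[str]]:
    # Trivial placeholder so the example runs; a real system would call the UIE model.
    marker = "Diagnosis: "
    return {"diagnosis_info": text.split(marker, 1)[1].rstrip(".")} if marker in text else {}


model = InformationExtractionModel("[3-1]: ...", ["patient_info", "diagnosis_info"], toy_extractor)
print(model.extract("Name: Li Si. Diagnosis: gastritis."))
# {"patient_info": null, "diagnosis_info": "gastritis"}
```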
[0137] In this embodiment, the UIE framework is used to construct the information extraction model, which allows the target information to be extracted in one go through the prompt template with only one data comparison, eliminating the need for multiple information extractions based on different keywords. This makes the process more intelligent, saves extraction time, and reduces the number of data transmission interactions.
[0138] Step 208: After completing the model building operation for the N medical institution objects, N information extraction models corresponding to the N medical institution objects are obtained.
[0139] Step 209: Perform data extraction operations on the data cache corresponding to the N medical institution objects according to the N information extraction models to obtain the data extraction results.
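Steps 208 and 209 amount to building one model per institution and then applying each model only to its own institution's data cache. A minimal sketch follows, with model building, cache access, and per-record extraction passed in as hypothetical callables so that no particular framework or storage API is assumed; the institution names in the usage are placeholders.

```python
from typing import Callable, Dict, Iterable, List, TypeVar

M = TypeVar("M")  # whatever object represents one institution's extraction model


def extract_across_institutions(
    institutions: List[str],
    build_model: Callable[[str], M],
    load_cache: Callable[[str], Iterable[str]],
    run_model: Callable[[M, str], str],
) -> Dict[str, List[str]]:
    """Steps 208-209: one model per institution, each applied only to its own data cache."""
    # Step 208: repeat the model building operation for all N institution objects.
    models: Dict[str, M] = {inst: build_model(inst) for inst in institutions}
    # Step 209: every cache is processed by the model built for that same institution.
    return {
        inst: [run_model(models[inst], record) for record in load_cache(inst)]
        for inst in institutions
    }


caches = {"hospital_a": ["a1", "a2"], "hospital_b": ["b1"]}
results = extract_across_institutions(
    list(caches),
    build_model=lambda inst: inst.upper(),  # placeholder "model"
    load_cache=caches.__getitem__,
    run_model=lambda model, record: f"{model}:{record}",
)
print(results)
# {'hospital_a': ['HOSPITAL_A:a1', 'HOSPITAL_A:a2'], 'hospital_b': ['HOSPITAL_B:b1']}
```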
[0140] In this embodiment, before executing the step of batch retrieving the target information from the data cache of the corresponding medical institution and transmitting it to the preset target receiving end using the information extraction model corresponding to each medical institution, the method further includes: deploying the information extraction model corresponding to each medical institution between the target receiving end and the data cache of the corresponding medical institution. This enables the digital healthcare cloud platform to extract target information from multiple target sources, i.e., different medical institutions, requiring only a small amount of data for training to complete the one-time extraction of target information, eliminating the need for multiple information extractions based on different keywords. This is more intelligent, saves extraction time, and reduces the number of data transmission interactions.
[0141] Referring further to Figure 8, Figure 8 is a flowchart of a specific embodiment of step 209 shown in Figure 2, which includes:
[0142] Step 801: Batch retrieve historical case data from the data cache of each medical institution;
[0143] Step 802: Input the batch-acquired historical case data into the input interface of the corresponding information extraction model;
[0144] Step 803: Identify the target information to be extracted contained in each historical case data entry by using the prompt template pre-deployed at the input interface and the major category distinction number and minor category distinction number corresponding to the tag words in the prompt template;
[0145] Correspondingly, if each tag word in the tag word set is a unique tag word, step 803 instead identifies the target information to be extracted contained in each historical case data entry by using the prompt template pre-deployed at the input interface and the distinguishing number corresponding to each tag word in the prompt template;
[0146] Step 804: Send the target information to the output interface, and output the target information in a standardized format according to the output template at the output interface.
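Steps 801 to 804 can be sketched as reading each cache in fixed-size batches and passing every record through the model's input and output interfaces. The batch size of 32 and the function names are assumptions for illustration; `extract_one` stands for the complete per-record path of steps 802 to 804.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Step 801: read the cached historical case data in fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    while batch := list(islice(it, batch_size)):
        yield batch


def run_batch_extraction(
    records: Iterable[str], extract_one: Callable[[str], str], batch_size: int = 32
) -> List[str]:
    """Steps 802-804: pass each batch through the model and collect formatted output."""
    outputs: List[str] = []
    for batch in batches(records, batch_size):
        # extract_one stands for input interface -> identification -> output interface.
        outputs.extend(extract_one(record) for record in batch)
    return outputs


print(list(batches(["r1", "r2", "r3"], 2)))                        # [['r1', 'r2'], ['r3']]
print(run_batch_extraction(["r1", "r2", "r3"], str.upper, batch_size=2))  # ['R1', 'R2', 'R3']
```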
[0147] Step 210: Output the data extraction result to the preset target receiving end.
[0148] By constructing information extraction models corresponding to each medical institution, the digital medical cloud platform can extract target information from multiple target sources, i.e., different medical institutions. Only a small amount of data is needed for training, and the target information can be extracted in one go. There is no need to extract information multiple times based on different keywords. This is more intelligent, saves extraction time, and reduces the number of data transmission interactions.
[0149] This application connects to the data cache of the target medical institution; obtains sample case data; obtains the pre-set target information to be extracted and the corresponding output template; inputs the sample case data into a pre-built machine learning model and trains it using a prompt learning method to obtain a prompt template for extracting the target information; constructs an information extraction model corresponding to the target medical institution based on the prompt template, the output template, and a pre-set UIE framework; sets each medical institution as the target medical institution in turn and repeats the above steps to obtain an information extraction model corresponding to each medical institution; and uses the information extraction model corresponding to each medical institution to batch extract target information from the data cache of that medical institution and send it to the pre-set target receiving end. By constructing an information extraction model corresponding to each medical institution, the digital medical cloud platform can extract target information from multiple target sources, i.e., different medical institutions, with only a small amount of training data, and can complete the extraction of target information in one pass, eliminating the need for multiple information extractions based on different keywords. This is more intelligent, saves extraction time, and reduces the number of data transmission interactions. At the same time, it ensures that the output target information has a unified output format.
[0150] The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) refers to the theories, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
[0151] Foundational technologies for artificial intelligence generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating / interactive systems, and mechatronics. AI software technologies mainly encompass computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning / deep learning.
[0152] In this embodiment, by constructing an information extraction model corresponding to each medical institution, the digital medical cloud platform can extract target information from multiple target sources, i.e., different medical institutions. Only a small amount of data is needed for training, allowing for a one-time extraction of target information without the need for multiple extractions based on different keywords. This is more intelligent, saves extraction time, and reduces the number of data transmission interactions. Simultaneously, it ensures that the output target information has a unified output format.
[0153] Referring further to Figure 9, as an implementation of the method shown in Figure 2, this application provides an embodiment of a case data extraction device for multiple medical institutions. This device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied to various electronic devices.
[0154] As shown in Figure 9, the case data extraction device 900 for multiple medical institutions described in this embodiment includes: a medical institution object acquisition module 901, a target object connection module 902, a sample case data acquisition module 903, a learning model training module 904, a target information setting module 905, a prompt template extraction module 906, an information extraction model construction module 907, a multiple information extraction model acquisition module 908, an extraction result acquisition module 909, and an extraction result transmission module 910, wherein:
[0155] The medical institution object acquisition module 901 is used to acquire N medical institution objects to be extracted, where N is a positive integer;
[0156] The target object connection module 902 is used to extract any one of the N medical institution objects as the target medical institution and connect to the target data cache library of the target medical institution, wherein the data cache library stores the full amount of historical case data.
[0157] The sample case data acquisition module 903 is used to acquire sample case data from the data cache library, wherein the sample case data consists of a number of case data entries selected from the full historical case data;
[0158] The learning model training module 904 is used to input the sample case data into a pre-constructed machine learning model and train the machine learning model using a prompt learning method to obtain a trained target machine learning model.
[0159] The target information setting module 905 is used to obtain the pre-set target information to be extracted and the output template corresponding to the target information;
[0160] The prompt template extraction module 906 is used to extract prompt templates for the target information based on the trained machine learning model.
[0161] The information extraction model construction module 907 is used to perform model construction operations based on the prompt template, the output template and the preset UIE framework to obtain the information extraction model corresponding to the target medical institution.
[0162] The multiple information extraction model acquisition module 908 is used to obtain N information extraction models corresponding to the N medical institution objects after the model construction operation of the N medical institution objects is completed.
[0163] The extraction result acquisition module 909 is used to perform data extraction operations on the data cache corresponding to the N medical institution objects according to the N information extraction models, and obtain the data extraction results.
[0164] The extraction result transmission module 910 is used to output the data extraction result to a preset target receiving end.
[0165] This application connects to the data cache of the target medical institution; obtains sample case data; obtains the pre-set target information to be extracted and the corresponding output template; inputs the sample case data into a pre-built machine learning model and trains it using a prompt learning method to obtain a prompt template for extracting the target information; constructs an information extraction model corresponding to the target medical institution based on the prompt template, the output template, and a pre-set UIE framework; sets each medical institution as the target medical institution in turn and repeats the above steps to obtain an information extraction model corresponding to each medical institution; and uses the information extraction model corresponding to each medical institution to batch extract target information from the data cache of that medical institution and send it to the pre-set target receiving end. By constructing an information extraction model corresponding to each medical institution, the digital medical cloud platform can extract target information from multiple target sources, i.e., different medical institutions, with only a small amount of training data, and can complete the extraction of target information in one pass, eliminating the need for multiple information extractions based on different keywords. This is more intelligent, saves extraction time, and reduces the number of data transmission interactions. At the same time, it ensures that the output target information has a unified output format.
[0166] Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions directing the relevant hardware. These computer-readable instructions can be stored in a computer-readable storage medium and, when executed, can perform the processes of the method embodiments described above. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or it can be a random access memory (RAM).
[0167] It should be understood that although the steps in the flowcharts of the accompanying figures are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some steps in the flowcharts of the accompanying figures may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same time, but can be executed at different times, and their execution order is not necessarily sequential, but can be performed alternately or in turn with other steps or at least some of the sub-steps or stages of other steps.
[0168] To address the aforementioned technical problems, embodiments of this application also provide a computer device. Please refer to Figure 10, which is a block diagram of the basic structure of the computer device in this embodiment.
[0169] The computer device 10 includes a memory 10a, a processor 10b, and a network interface 10c that are interconnected via a system bus. It should be noted that only the computer device 10 with components 10a-10c is shown in the figure; however, it should be understood that it is not required to implement all the shown components, and more or fewer components can be implemented alternatively. Those skilled in the art will understand that the computer device described here is a device capable of automatically performing numerical calculations and / or information processing according to pre-set or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
[0170] The computer device can be a desktop computer, laptop, handheld computer, or cloud server, etc. The computer device can interact with the user via a keyboard, mouse, remote control, touchpad, or voice control.
[0171] The memory 10a includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 10a may be an internal storage unit of the computer device 10, such as the hard disk or memory of the computer device 10. In other embodiments, the memory 10a may also be an external storage device of the computer device 10, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc., equipped on the computer device 10. Of course, the memory 10a may also include both the internal storage unit and the external storage device of the computer device 10. In this embodiment, the memory 10a is typically used to store the operating system and various application software installed on the computer device 10, such as computer-readable instructions for a method of extracting case data from multiple medical institutions. Furthermore, the memory 10a can also be used to temporarily store various types of data that have already been output or will be output.
[0172] In some embodiments, the processor 10b may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 10b is typically used to control the overall operation of the computer device 10. In this embodiment, the processor 10b is used to execute computer-readable instructions stored in the memory 10a or to process data, for example, to execute computer-readable instructions for the multi-medical-institution case data extraction method.
[0173] The network interface 10c may include a wireless network interface or a wired network interface, which is typically used to establish communication connections between the computer device 10 and other electronic devices.
[0174] The computer device proposed in this embodiment belongs to the field of smart healthcare technology. This application connects to the data cache of a target medical institution; acquires sample case data; acquires the pre-set target information to be extracted and the corresponding output template; inputs the sample case data into a pre-built machine learning model and trains it using a prompt learning method to obtain a prompt template for extracting the target information; constructs an information extraction model corresponding to the target medical institution based on the prompt template, the output template, and a preset UIE framework; sets each medical institution as the target medical institution in turn and repeats the above steps to obtain an information extraction model corresponding to each medical institution; and uses the information extraction model corresponding to each medical institution to batch extract target information from the data cache of that medical institution and send it to a preset target receiving end. By constructing an information extraction model corresponding to each medical institution, the digital healthcare cloud platform can extract target information from multiple target sources, i.e., different medical institutions, and requires only a small amount of training data to complete the extraction of target information in one pass. This eliminates the need for multiple information extractions based on different keywords, which is more intelligent, saves extraction time, and reduces the number of data transmission interactions. At the same time, it ensures that the output target information has a unified output format.
[0175] This application also provides another embodiment, namely, providing a computer-readable storage medium storing computer-readable instructions that can be executed by a processor to cause the processor to perform the steps of the case data extraction method under multiple medical institutions as described above.
[0176] The computer-readable storage medium proposed in this embodiment belongs to the field of smart healthcare technology. This application connects to the data cache of a target medical institution; acquires sample case data; acquires the pre-set target information to be extracted and the corresponding output template; inputs the sample case data into a pre-built machine learning model and trains it using a prompt learning method to obtain a prompt template for extracting the target information; constructs an information extraction model corresponding to the target medical institution based on the prompt template, the output template, and a preset UIE framework; sets each medical institution as the target medical institution in turn and repeats the above steps to obtain an information extraction model corresponding to each medical institution; and uses the information extraction model corresponding to each medical institution to batch extract target information from the data cache of that medical institution and send it to a preset target receiving end. By constructing an information extraction model corresponding to each medical institution, the digital healthcare cloud platform can extract target information from multiple target sources, i.e., different medical institutions, and requires only a small amount of training data to complete the extraction of target information in one pass. This eliminates the need for multiple information extractions based on different keywords, which is more intelligent, saves extraction time, and reduces the number of data transmission interactions. At the same time, it ensures that the output target information has a unified output format.
[0177] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk), and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0178] Obviously, the embodiments described above are only some of the embodiments of this application, not all of them. The accompanying drawings show preferred embodiments of this application but do not limit its patent scope. This application can be implemented in many different forms; these embodiments are provided so that the disclosure of this application is understood more thoroughly and comprehensively. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent substitutions for some of their technical features. Any equivalent structure made using the content of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.
Claims
1. A method for extracting case data across multiple medical institutions, characterized in that it includes the following steps: Obtain N medical institution objects to be extracted, where N is a positive integer; Extract any one of the N medical institution objects as the target medical institution, and connect to the target data cache library of the target medical institution, wherein the data cache library stores the full amount of historical case data; Obtain sample case data from the data cache library, wherein the sample case data consists of a number of case data entries selected from the full historical case data; Input the sample case data into a pre-constructed machine learning model, and train the machine learning model using a prompt learning method to obtain a trained target machine learning model, wherein the machine learning model includes a T5 model; Obtain the pre-set target information to be extracted and the output template corresponding to the target information; Extract the prompt template of the target information based on the trained machine learning model, which specifically includes: Pre-input the sample case data into the T5 model as the training corpus and the tag word set as the recognition reference corpus; Set the target information as the output information of the T5 model; Based on the tag word set, identify all the tag words contained in the input information, as well as the major category distinction number and minor category distinction number corresponding to each of these tag words; Sequentially obtain each historical case data entry from the training corpus, and identify the position information, within each historical case data entry, of all the tag words contained in the input information; 
Based on the position information of all the tag words contained in the input information in each historical case data entry, replace each tag word at its corresponding position with the major category distinction number and minor category distinction number corresponding to that tag word, and obtain each case data entry after the replacement is completed; Set each case data entry obtained after the replacement as a prompt template for extracting the target information; Perform a model building operation based on the prompt template, the output template, and the preset UIE framework to obtain the information extraction model corresponding to the target medical institution; After completing the model building operation for the N medical institution objects, obtain N information extraction models corresponding to the N medical institution objects; Perform data extraction operations on the data caches corresponding to the N medical institution objects according to the N information extraction models to obtain the data extraction results; Output the data extraction results to the preset target receiving end.
2. The method for extracting case data across multiple medical institutions according to claim 1, characterized in that, before performing the step of inputting the sample case data into a pre-built machine learning model and training the machine learning model using a prompt learning method to obtain a trained target machine learning model, the method further includes: Extract patient information, lesion information, diagnosis information, physician information, and treatment information from each historical case data entry within the sample case data, and obtain the extraction results; Based on the extraction results, construct a patient information subset, a lesion information subset, a diagnosis information subset, a physician information subset, and a treatment information subset; Based on the patient information subset and a preset word frequency analysis model, obtain tag words for identifying patient information from the target information; Similarly, obtain tag words for identifying lesion information, diagnostic information, physician information, and treatment information from the target information, respectively; Organize the tag words for identifying patient information, lesion information, diagnosis information, physician information, and treatment information, and construct a tag word set.
3. The method for extracting case data across multiple medical institutions according to claim 2, characterized in that the step of obtaining tag words for identifying patient information from the target information based on the patient information subset and a preset word frequency analysis model specifically includes: Input the patient information subset into the preset word frequency analysis model, wherein the preset word frequency analysis model is a word frequency analysis model based on the TF-IDF algorithm; Segment each piece of patient information in the patient information subset into words to obtain the word segmentation result for each piece of patient information; Based on the word segmentation results and the TF-IDF algorithm, calculate the TF-IDF value of each word segment in each piece of data to be analyzed; Sort all word segments in the patient information subset in descending order of TF-IDF value, and select the top M word segments as tag words for identifying patient information from the target information, where M is a positive integer.
4. The method for extracting case data across multiple medical institutions according to claim 3, characterized in that the step of calculating the TF-IDF value of each word segment in each piece of data to be analyzed based on the word segmentation results and the TF-IDF algorithm specifically includes: Obtain the word segmentation result of the current data to be analyzed; Based on the word segmentation result, calculate the frequency of the current word segment in the current data to be analyzed; Count the number of all data entries to be analyzed in the patient information subset, and count the number of data entries to be analyzed that contain the current word segment; Calculate the inverse document frequency of the current word segment based on the number of all data entries to be analyzed in the patient information subset and the number of data entries to be analyzed that contain the current word segment; Multiply the word frequency by the inverse document frequency to obtain the TF-IDF value of the current word segment.
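Claims 3 and 4 can be illustrated with the Python sketch below. It follows the stated structure, TF-IDF(w, d) = TF(w, d) × IDF(w), but the exact forms are assumptions because the claims do not fix them: TF is taken as the count of w in entry d divided by the number of word segments in d, IDF as log(N / df(w)) with N entries and df(w) entries containing w, and a word's ranking score as its maximum TF-IDF over all entries. Word segmentation (for Chinese text, typically done with a dedicated segmenter) is assumed to have happened already, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_per_entry(entries: List[List[str]]) -> List[Dict[str, float]]:
    """Claim 4: TF-IDF value of every word segment in every entry of one subset."""
    n_entries = len(entries)
    # Document frequency: how many entries contain each word segment at least once.
    doc_freq = Counter(word for tokens in entries for word in set(tokens))
    scores: List[Dict[str, float]] = []
    for tokens in entries:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Normalized term frequency times log(N / df); both forms are assumptions.
            word: (count / total) * math.log(n_entries / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_m_tag_words(entries: List[List[str]], m: int) -> List[str]:
    """Claim 3: rank word segments by TF-IDF and keep the top M as tag words."""
    best: Dict[str, float] = {}
    for entry_scores in tf_idf_per_entry(entries):
        for word, score in entry_scores.items():
            # A word's ranking score is its highest TF-IDF over all entries (assumption).
            best[word] = max(score, best.get(word, float("-inf")))
    # Descending score; ties broken alphabetically so the result is deterministic.
    return sorted(best, key=lambda w: (-best[w], w))[:m]


patient_subset = [["name", "age", "name"], ["name", "sex"], ["name", "age"]]
print(top_m_tag_words(patient_subset, 2))  # ['sex', 'age']
```

A word segment present in every entry (here "name") gets an IDF of zero under this form, so it ranks below more distinctive segments even when it is frequent.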
5. The method for extracting case data across multiple medical institutions according to claim 3, characterized in that the step of organizing the tag words for identifying patient information, lesion information, diagnosis information, physician information, and treatment information and constructing a tag word set specifically includes: Based on the different information categories, assign different major category distinction numbers to the tag words of different information categories, wherein the different information categories include the patient information category, lesion information category, diagnosis information category, physician information category, and treatment information category; Set different minor category distinction numbers for different tag words of the same information category, wherein the different tag words of the same information category refer to the top M word segments selected when identifying that category of information; Create a triple consisting of the current tag word, the major category distinction number of the current tag word, and the minor category distinction number of the current tag word; Obtain the triple corresponding to each tag word; Add the triple corresponding to each tag word to a pre-built set to complete the construction of the tag word set.
6. The method for extracting case data across multiple medical institutions according to claim 1, characterized in that the step of performing a model building operation based on the prompt template, the output template, and the preset UIE framework to obtain the information extraction model corresponding to the target medical institution specifically includes: deploying the prompt template at the input interface of the UIE framework and the output template at the output interface of the UIE framework, to complete construction of the information extraction model for the target medical institution. Before the step of performing data extraction on the data caches corresponding to the N medical institution objects according to the N information extraction models to obtain the data extraction results, the method further includes: deploying the information extraction model corresponding to each medical institution between the target receiving end and the data cache of that medical institution. The step of performing data extraction on the data caches corresponding to the N medical institution objects according to the N information extraction models to obtain the data extraction results specifically includes: retrieving historical case data in batches from the data cache of each medical institution; inputting the retrieved historical case data into the input interface of the corresponding information extraction model; identifying the target information to be extracted contained in each piece of historical case data by means of the prompt template pre-deployed at the input interface and the major category distinction numbers and minor category distinction numbers corresponding to the tag words in the prompt template; and sending the target information to the output interface, where it is output in a standardized format according to the output template.
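One way to picture claim 6 in code: the prompt template sits at the model's input interface, the output template at its output interface, and the (major, minor) category distinction numbers route what is extracted into fixed output fields, while case records are pulled from the cache in batches. In this sketch `toy_backbone` is a trivial keyword matcher standing in for the UIE framework, not UIE itself, and `InformationExtractionModel`, `batches`, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Stand-in for the UIE backbone: (prompt, case text) -> extracted spans.
Backbone = Callable[[str, str], List[str]]
Key = Tuple[int, int]  # (major category number, minor category number)


def toy_backbone(prompt: str, text: str) -> List[str]:
    """Toy matcher, NOT UIE: returns values written as 'prompt: value'."""
    spans = []
    for segment in text.split(","):
        label, sep, value = segment.partition(":")
        if sep and label.strip() == prompt:
            spans.append(value.strip())
    return spans


@dataclass
class InformationExtractionModel:
    prompt_template: Dict[Key, str]   # input interface: key -> prompt text
    output_template: Dict[Key, str]   # output interface: key -> output field name
    backbone: Backbone

    def run(self, case_text: str) -> Dict[str, str]:
        # Input interface: apply every prompt in the template to the case text.
        found = {key: self.backbone(prompt, case_text)
                 for key, prompt in self.prompt_template.items()}
        # Output interface: emit every template field, "" when nothing matched.
        return {field: "; ".join(found.get(key, []))
                for key, field in self.output_template.items()}


def batches(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield cached historical case records in batches of at most `size`."""
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

A caller would loop `for batch in batches(cache, 32)` and call `model.run` on each record. Because every field of the output template is emitted even when nothing is found, records from different institutions come out in the same shape, which is the uniform output format the method aims for.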
7. A device for extracting case data across multiple medical institutions, characterized in that the multi-medical-institution case data extraction device is used to implement the steps of the multi-medical-institution case data extraction method according to any one of claims 1 to 6, and comprises: a medical institution object acquisition module, used to acquire N medical institution objects to be extracted, where N is a positive integer; a target object connection module, used to select any one of the N medical institution objects as the target medical institution and connect to the data cache library of the target medical institution, wherein the data cache library stores the full set of historical case data; a sample case data acquisition module, used to acquire sample case data from the data cache library, wherein the sample case data are case data entries selected from the full set of historical case data; a learning model training module, used to input the sample case data into a pre-built machine learning model and train the machine learning model using a prompt learning method to obtain a trained target machine learning model; a target information setting module, used to obtain the preset target information to be extracted and the output template corresponding to the target information; a prompt template extraction module, used to extract prompt templates for the target information based on the trained machine learning model; an information extraction model construction module, used to perform a model building operation based on the prompt template, the output template, and the preset UIE framework to obtain the information extraction model corresponding to the target medical institution; a multiple information extraction model acquisition module, used to obtain the N information extraction models corresponding to the N medical institution objects once the model building operation has been completed for all N medical institution objects; an extraction result acquisition module, used to perform data extraction on the data caches corresponding to the N medical institution objects according to the N information extraction models to obtain the data extraction results; and an extraction result transmission module, used to output the data extraction results to a preset target receiving end.
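The module chain of claim 7 amounts to a two-phase loop: build one information extraction model per institution, then, once all N models exist, run each model over its own cache and forward only the extracted records to the receiving end. The sketch assumes each cache is an in-memory list of case texts; `extract_across_institutions`, `build_model`, `send`, and the take-the-first-entries sampling rule are hypothetical placeholders, not interfaces defined by the patent.

```python
from typing import Callable, Dict, List

# Hypothetical types: a built model maps one case text to a standardized record.
Record = Dict[str, str]
Model = Callable[[str], Record]


def extract_across_institutions(
    caches: Dict[str, List[str]],
    build_model: Callable[[str, List[str]], Model],
    send: Callable[[str, List[Record]], None],
    sample_size: int = 100,
) -> Dict[str, List[Record]]:
    """Build one model per institution, then extract from every cache."""
    # Phase 1: build a model for each institution from a sample of its own
    # historical case data (here, hypothetically, the first sample_size entries).
    models: Dict[str, Model] = {}
    for name, cache in caches.items():
        models[name] = build_model(name, cache[:sample_size])

    # Phase 2: only after all N models exist, run each model over its own
    # cache and forward just the extracted records to the receiving end.
    results: Dict[str, List[Record]] = {}
    for name, cache in caches.items():
        results[name] = [models[name](case) for case in cache]
        send(name, results[name])
    return results
```

Running extraction next to each cache means the receiving end gets only standardized records rather than raw case files, in line with the stated goal of reducing data transmission between the receiving end and the collected end.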
8. A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the steps of the multi-medical-institution case data extraction method as described in any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the multi-medical-institution case data extraction method as described in any one of claims 1 to 6.