Disease name code matching method and device, computer equipment and storage medium
A disease name code matching technology, applied to devices, computer equipment and storage media, that addresses the low accuracy of disease name code matching and achieves the effects of reducing the amount of calculation and improving accuracy.
Pending Publication Date: 2020-09-22
PING AN TECH (SHENZHEN) CO LTD
AI-Extracted Technical Summary
Problems solved by technology
[0004] The purpose of the embodiment of the present application is to propose a disease name code matching method...
Method used
In this embodiment, HardVoting fusion or SoftVoting fusion is performed on the similarities calculated by the fuzzy matching sub-models, so that the result of every fuzzy matching sub-model is taken into account when generating the second code matching result, which improves the accuracy of the second code matching result.
In this embodiment, each disease name is input into the exact matching sub-models according to the order of the four exact matching sub-models in the exact matching model: the complete matching sub-model, the stop word removal sub-model, the primary and secondary separation sub-model, and the synonym recognition sub-model, in that order. The four exact matching sub-models match disease names in different ways, which improves the accuracy of disease name matching.
In this embodiment, each disease name in the deduplicated disease name list is input into the exact matching sub-models in their arrangement order. If a sub-model matches the name, the first code matching result is generated; if not, the name is passed to the next exact matching sub-model. Because the exact matching sub-models differ from one another, disease names can be exactly matched from multiple dimensions, which improves the accuracy of disease name code matching.
In this embodiment, each candidate coded disease name is input into every fuzzy matching sub-model in the fuzzy matching model. Each fuzzy matching sub-model uses a different method to calculate the similarity between the candidate coded disease name and each standard disease name, and the similarities calculated by all fuzzy matching sub-models are combined to generate the second code matching result, which improves the accuracy of code matching for candidate coded disease names.
In this embodiment, the disease name list is first deduplicated to reduce the amount of calculation. The deduplicated disease name list is input into the exact matching model for exact matching to obtain the first code matching result, and the disease names that cannot be exactly matched are input into the fuzzy matching model as candidate coded disease names for fuzzy matching to obtain the second code matching result. Both rounds of code matching are performed against the standard disease classification table. Finally, the disease name code matching list is generated from the first and second code matching results. Matching disease names in multiple dimensions and modes, through exact matching and fuzzy matching, improves the accuracy of disease name code matching.
Abstract
The embodiment of the invention belongs to the field of artificial intelligence, and relates to a disease name code matching method and device, computer equipment and a storage medium. The method comprises the steps of performing duplicate removal processing on repeated disease names in the disease name list to obtain a duplicate-removed disease name list; inputting the deduplicated disease name list into an accurate matching model, and performing code matching according to a standard disease classification table to obtain a first code matching result and candidate code matching disease names; inputting the obtained candidate code matching disease names into a fuzzy matching model, and performing code matching according to the standard disease classification table to obtain a second code matching result; and generating a disease name code matching list according to the first code matching result and the second code matching result. The disease names are subjected to multi-dimensional and multi-mode code matching, the accuracy of disease name code matching is improved, and the disease name list can also be stored in the blockchain to improve the privacy and security of data.
Application Domain
Relational databases; Natural language data processing
Technology Topic
Blockchain; Computer equipment
Example Embodiment
[0048] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the application. The terms used in the description of the application are only for the purpose of describing specific embodiments and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the specification, the claims and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the specification, the claims or the above drawings are used to distinguish different objects, rather than to describe a specific order.
[0049] Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.
[0050] The blockchain referred to in the present invention is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to each other using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
[0051] In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings.
[0052] As shown in Figure 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
[0053] Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
[0054] The terminal devices 101, 102, 103 can be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
[0055] The server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal devices 101, 102, 103.
[0056] It should be noted that the disease name coding method provided in the embodiment of the present application is generally executed by a server, and correspondingly, the disease name coding device is generally set in the server.
[0057] It should be understood that the numbers of terminal devices, networks and servers in Figure 1 are merely illustrative. There can be any number of terminal devices, networks and servers, according to implementation needs.
[0058] With continued reference to Figure 2, a flow chart of an embodiment of the disease name code matching method according to the present application is shown. The disease name code matching method comprises the following steps:
[0059] Step 201, obtain a list of disease names from electronic medical records.
[0060] In this embodiment, the electronic device (such as the server shown in Figure 1) can communicate with terminals or other servers through a wired or wireless connection. It should be pointed out that the wireless connection methods may include, but are not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra wideband), and other wireless connection methods now known or developed in the future.
[0061] Wherein, the disease name list may be a list composed of disease names recorded in the electronic medical record.
[0062] Specifically, the information recorded in the electronic medical record is structured, for example, the electronic medical record contains disease name information, symptom record information, and diagnosis and treatment information. The server reads a large number of disease names from the structured electronic medical records to obtain a list of disease names. In the disease name list, the electronic medical record identifier is stored correspondingly to the disease name. The electronic medical record identifier is the identifier of the electronic medical record, and the electronic medical record identifier can be a character string combining letters, numbers, special symbols, etc.
[0063] In one embodiment, the electronic medical records read by the server may come from various terminals or from a preset database.
[0064] In one embodiment, the server can set a timed task to perform disease name code matching on a regular basis, for example once a month or once a quarter, with the matching task started at a specific time in each month or quarter. The server can use Cron (a timed task facility) in Linux to trigger the instruction that starts the task; Cron executes specified tasks at the scheduled time.
[0065] Step 202, deduplication processing is performed on duplicated disease names in the disease name list to obtain a deduplicated disease name list.
[0066] Specifically, there may be a large number of identical disease names in the disease name list. For example, during a period of high influenza incidence, many patients with influenza go to the hospital for treatment, so the electronic medical records obtained by the hospital contain the disease name "influenza" many times. Matching every occurrence separately would increase the amount of calculation and reduce the efficiency of code matching.
[0067] The server first identifies the disease names that appear repeatedly in the disease name list, and then performs deduplication processing to obtain the deduplication disease name list.
[0068] For each group of duplicate disease names, only one can be retained and the rest deleted. The electronic medical record identifiers corresponding to the deleted disease names are stored in association with the electronic medical record identifier corresponding to the retained disease name, so that the code matching results of all disease names can be restored at the end. The retained disease name can keep its original electronic medical record identifier, or a new electronic medical record identifier can be assigned to it.
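The deduplication with identifier association described above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function name `deduplicate`, the `(record_id, disease_name)` tuple layout and the sample identifiers are all assumptions made for the example.

```python
def deduplicate(records):
    """Collapse repeated disease names while remembering every record identifier.

    `records` is an iterable of (record_id, disease_name) pairs (hypothetical layout).
    Returns the unique names in first-seen order and a mapping from each
    name to all record identifiers that carried it.
    """
    ids_by_name = {}
    for record_id, name in records:
        # Keep one entry per name; associate every identifier with it.
        ids_by_name.setdefault(name, []).append(record_id)
    unique_names = list(ids_by_name)  # dicts preserve insertion order
    return unique_names, ids_by_name


# Hypothetical sample data: identifiers and names are illustrative only.
sample = [("EMR-1", "influenza"), ("EMR-2", "influenza"), ("EMR-3", "diabetes")]
names, ids = deduplicate(sample)
print(names)
print(ids["influenza"])
```

Only the unique names need to pass through the matching models; the mapping is kept so that each result can later be copied back to every associated record.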
[0069] Step 203, input the list of deduplicated disease names into the exact matching model, perform code matching according to the standard disease classification table, and obtain the first code matching result and candidate matching disease names.
[0070] Specifically, the exact matching model performs exact matching on the disease name from the text level, and the exact matching model codes the disease name according to the standard disease classification table, that is, exactly matches the disease name with the standard disease name in the standard disease classification table. Standard disease names and disease codes corresponding to the standard disease names are stored in the standard disease classification table, which may be the tenth version of the International Classification of Diseases (ICD): ICD-10.
[0071] When the disease name can be exactly matched with a certain standard disease name, the standard disease name and the disease code corresponding to the standard disease name are taken as the first pairing result. The disease names that cannot be matched by the exact matching model will be used as candidate coded disease names for the second round of code matching.
[0072] Step 204, input the obtained candidate coded disease names into the fuzzy matching model, perform code matching according to the standard disease classification table, and obtain a second code matching result.
[0073] Specifically, the fuzzy matching model performs fuzzy matching on the candidate coded disease name through similarity calculation, and takes the standard disease name that fuzzily matches the candidate coded disease name, together with the disease code corresponding to that standard disease name, as the second code matching result.
[0074] In one embodiment, the server can use different types of fuzzy matching methods to calculate the similarity between the candidate coded disease name and the standard disease names in the standard disease classification table, and combine the similarities calculated by the different methods to determine the standard disease name that fuzzily matches the candidate coded disease name. That standard disease name and its corresponding disease code are used as the second code matching result.
[0075] Step 205, generate a disease name code matching list according to the first code matching result and the second code matching result.
[0076] Specifically, the server merges the first code matching result and the second code matching result into one list. In the new list, the electronic medical record identifier, the disease name, the standard disease name matching the disease name, and the disease code corresponding to the standard disease name are stored in association. For a disease name that was deleted during deduplication, the server uses the first or second code matching result of the retained disease name associated with it as the code matching result of the deleted disease name, so as to obtain a complete disease name code matching list.
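The merge in step 205 can be sketched as below, under stated assumptions: the function name `build_code_list`, the dictionary layouts, and the placeholder codes such as "X01" are hypothetical and are not real classification codes.

```python
def build_code_list(ids_by_name, first_results, second_results):
    """Merge exact and fuzzy results and expand them back to every record.

    ids_by_name:    {disease_name: [record_id, ...]} kept from deduplication
    first_results:  {disease_name: (standard_name, code)} from exact matching
    second_results: {disease_name: (standard_name, code)} from fuzzy matching
    """
    rows = []
    for name, record_ids in ids_by_name.items():
        # Prefer the exact result; fall back to the fuzzy result.
        match = first_results.get(name) or second_results.get(name)
        if match is None:
            continue  # no result in either round
        standard_name, code = match
        for record_id in record_ids:
            # Records removed as duplicates inherit the retained name's result.
            rows.append((record_id, name, standard_name, code))
    return rows


# Placeholder data; "X01" and "X02" are not real disease codes.
ids_by_name = {"influenza": ["EMR-1", "EMR-2"], "sugar diabetes": ["EMR-3"]}
first = {"influenza": ("influenza", "X01")}
second = {"sugar diabetes": ("diabetes", "X02")}
rows = build_code_list(ids_by_name, first, second)
print(rows)
```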
[0077] In one embodiment, the server can also upload the generated disease name code list to the block chain, so as to improve the privacy and security of the disease name code list.
[0078] In this embodiment, the disease name list is first deduplicated to reduce the amount of calculation. The deduplicated disease name list is input into the exact matching model for exact matching to obtain the first code matching result, and the disease names that cannot be exactly matched are input into the fuzzy matching model as candidate coded disease names for fuzzy matching to obtain the second code matching result. Both rounds of code matching are performed against the standard disease classification table. Finally, the disease name code matching list is generated from the first and second code matching results. Matching disease names in multiple dimensions and modes, through exact matching and fuzzy matching, improves the accuracy of disease name code matching.
[0079] Further, the exact matching model is composed of several ordered exact matching sub-models. As shown in Figure 3, the above step 203 may include:
[0080] Step 2031, input each disease name in the deduplicated disease name list into the exact matching sub-models according to the arrangement order of the exact matching sub-models in the exact matching model.
[0081] Specifically, the exact matching model can be composed of several different exact matching sub-models arranged in an orderly manner, and the exact matching sub-model can first perform simple text-level preprocessing on the input disease name, and then perform exact matching. Preprocessing can be to process the characters or phrases in the disease name, such as correcting typos, removing repeated characters or phrases, converting synonyms, and removing meaningless characters, etc. Different exact matching sub-models can perform different text-level preprocessing on disease names; it can be understood that there may also be exact matching sub-models that do not preprocess disease names.
[0082] According to the arrangement order of the exact matching sub-models in the exact matching model, the server first inputs the disease names in the deduplicated disease name list into the first exact matching sub-model.
[0083] Step 2032, through the current exact matching sub-model, query the standard disease name matching the input disease name in the standard disease classification table.
[0084] Specifically, the current exact matching sub-model preprocesses the input disease name according to its preprocessing procedure, then retrieves the standard disease classification table and compares the preprocessed disease name with each standard disease name in the table one by one, to find a standard disease name that matches.
[0085] Step 2033, when a matching standard disease name is found, use the searched standard disease name and the disease code corresponding to the standard disease name as the first code pairing result of the disease name.
[0086] Specifically, when a standard disease name matching the disease name is found, the matching standard disease name and the disease code corresponding to the standard disease name are used as the first code pairing result of the disease name.
[0087] After the exact matching sub-model completes the coding of a disease name, it starts to process the next input disease name. When the disease name can be matched by an exact matching sub-model, the processing of the disease name is ended, and the disease name is no longer matched by the remaining exact matching sub-models.
[0088] Step 2034, when the current exact matching sub-model does not find a matching standard disease name, input the disease name into the next exact matching sub-model to continue matching.
[0089] Specifically, if the current exact matching sub-model fails to find a standard disease name that matches the disease name in the standard disease classification table, the disease name is input into the next exact matching sub-model, according to the order of the exact matching sub-models, to continue matching.
[0090] Step 2035, if the disease name has not been matched by each exact matching sub-model, mark the disease name as a candidate matching disease name.
[0091] Specifically, when the exact matching sub-model cannot match the disease name, the disease name is input into the next exact matching sub-model for matching. When none of the exact matching sub-models can match the disease name, the disease name is marked as a candidate paired disease name.
[0092] In this embodiment, each disease name in the deduplicated disease name list is input into the exact matching sub-models in their arrangement order. If a sub-model matches the name, the first code matching result is generated; if not, the name is passed to the next exact matching sub-model. Because the exact matching sub-models differ from one another, disease names can be exactly matched from multiple dimensions, which improves the accuracy of disease name code matching.
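Steps 2031 to 2035 amount to a cascade in which each sub-model preprocesses the name and then looks it up. The following is a minimal sketch, assuming each sub-model can be represented as a preprocessing function and the standard disease classification table as a dictionary; the function names and the placeholder codes "X01" and "X02" are hypothetical.

```python
def exact_match_cascade(name, sub_models, standard_table):
    """Try each exact matching sub-model in order; stop at the first hit.

    Returns (standard_name, code) on success, or None when no sub-model
    matches, in which case the name becomes a candidate for fuzzy matching.
    """
    for preprocess in sub_models:
        candidate = preprocess(name)
        if candidate in standard_table:
            return candidate, standard_table[candidate]
    return None


def identity(name):
    # Sub-model with no preprocessing (complete match).
    return name


def drop_side_words(name):
    # Toy preprocessing step: drop "left"/"right" before the lookup.
    return " ".join(w for w in name.split() if w not in {"left", "right"})


# Placeholder table; the codes are not real classification codes.
STANDARD_TABLE = {"influenza": "X01", "metatarsal fracture": "X02"}
SUB_MODELS = [identity, drop_side_words]

print(exact_match_cascade("left metatarsal fracture", SUB_MODELS, STANDARD_TABLE))
```

A dictionary lookup replaces the one-by-one comparison described in the text; the result is the same for exact string equality, and the choice is made here only for brevity.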
[0093] In one embodiment, the above step 203 may specifically include: inputting each disease name in the deduplicated disease name list into the exact matching sub-models according to the arrangement order of the four exact matching sub-models in the exact matching model. The four exact matching sub-models are the complete match sub-model, the stop word removal sub-model, the primary and secondary separation sub-model, and the synonym recognition sub-model.
[0094] Specifically, there are four exact matching sub-models in the exact matching model, arranged in sequence: the complete match sub-model, the stop word removal sub-model, the primary and secondary separation sub-model, and the synonym recognition sub-model. The server first inputs each disease name in the deduplicated disease name list into the complete match sub-model.
[0095] Complete match sub-model: used to match the disease name in full. The input disease name is compared in turn with the standard disease names in the standard disease classification table; if the disease name is completely identical to a standard disease name, the two are determined to be a complete match. The complete match sub-model takes the matched standard disease name and the disease code corresponding to it as the first code matching result. Disease names that the complete match sub-model fails to match are input into the stop word removal sub-model.
[0096] Stop word removal sub-model: the disease name is preprocessed to remove stop words before matching. First, meaningless punctuation marks in the disease name (such as "?@%¥#,;/") are removed. Then the sub-model accesses a pre-built stop word lexicon specific to medical diseases, which records numbers, location words, and some specific terms, and uses it to remove the stop words in the disease name (for example, if the disease name is "left metatarsal fracture", "left" is removed). The disease name with stop words removed is matched in turn against the standard disease names in the standard disease classification table, and the matched standard disease name and its corresponding disease code are used as the first code matching result. Unmatched disease names are input into the primary and secondary separation sub-model.
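The stop word preprocessing can be sketched as follows. This is a minimal sketch under assumptions: the stop word set is a hypothetical two-entry stand-in for the medical stop word lexicon, and whitespace tokenization is used because the example is in English; names written without spaces between words would need a word segmenter instead.

```python
import re

# Hypothetical stand-in for the medical stop word lexicon.
STOP_WORDS = {"left", "right"}

# Punctuation listed in the text as meaningless.
PUNCTUATION = re.compile(r"[?@%¥#,;/]")


def remove_stop_words(name, stop_words=STOP_WORDS):
    """Strip the listed punctuation, then drop stop words from the name."""
    cleaned = PUNCTUATION.sub(" ", name)
    kept = [token for token in cleaned.split() if token.lower() not in stop_words]
    return " ".join(kept)


print(remove_stop_words("left metatarsal fracture?"))
```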
[0097] Primary and secondary separation sub-model: the disease name is preprocessed by separating primary and secondary diseases before matching. A disease name may consist of several disease names joined together, and the primary and secondary separation sub-model extracts the primary and secondary disease names from it (for example, if the disease name is "1. diabetes 2. hypertension", the primary disease name "diabetes" and the secondary disease name "hypertension" are extracted). The primary disease name and the secondary disease name are matched in turn against the standard disease names in the standard disease classification table to obtain the first code matching result. The primary disease name may be the disease name recognized first, and the secondary disease name may be a disease name recognized later. If the disease name consists of several joined disease names, several code matching results are obtained: the primary disease name corresponds to the primary code matching result, and each secondary disease name corresponds to a secondary code matching result. Disease names that the primary and secondary separation sub-model fails to match are input into the synonym recognition sub-model.
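A minimal sketch of the primary and secondary separation step is given below. It assumes that the joined names are introduced by numeric markers such as "1." and "2.", as in the example above; other joining conventions are not handled, and the function name is hypothetical.

```python
import re

# A numeric list marker such as "1." or "2)" (assumed separator format).
MARKER = re.compile(r"\d+\s*[.)]")


def split_primary_secondary(name):
    """Return (primary_name, [secondary_names]) for a numbered compound name."""
    parts = [part.strip(" .;,") for part in MARKER.split(name)]
    parts = [part for part in parts if part]
    if not parts:
        return None, []
    # The name recognized first is treated as primary, the rest as secondary.
    return parts[0], parts[1:]


print(split_primary_secondary("1. diabetes 2. hypertension"))
```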
[0098] Synonym recognition sub-model: the disease name is preprocessed by synonym conversion before matching. The synonym recognition sub-model accesses a pre-built synonymous disease lexicon, which records different expressions for the same body part, different expressions for the same symptom, different expressions for the same disease, and so on. The lexicon is used to replace synonyms in disease names, for example replacing "malignant tumor" with "cancer", or replacing an alternative expression for hyperthyroidism with the standard one. The disease names after synonym replacement are then matched in turn against the standard disease names in the standard disease classification table to obtain the first code matching result.
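The synonym conversion can be sketched as a dictionary-driven replacement. The one-entry lexicon below is a hypothetical stand-in for the synonymous disease lexicon, and plain substring replacement is an assumption chosen for brevity.

```python
# Hypothetical stand-in for the synonymous disease lexicon: variant -> standard form.
SYNONYMS = {"malignant tumor": "cancer"}


def normalize_synonyms(name, synonyms=SYNONYMS):
    """Replace every known variant in the name with its standard form."""
    # Longer variants first, so a short variant cannot split a longer one.
    for variant in sorted(synonyms, key=len, reverse=True):
        name = name.replace(variant, synonyms[variant])
    return name


print(normalize_synonyms("lung malignant tumor"))
```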
[0099] Disease names that were not matched by any of the four exact-matching submodels were marked as candidate paired disease names.
[0100] It can be understood that the above four exact matching sub-models can also be arranged in any order.
[0101] In this embodiment, each disease name is input into the exact matching sub-models according to the arrangement order of the four exact matching sub-models in the exact matching model: the complete match sub-model, the stop word removal sub-model, the primary and secondary separation sub-model, and the synonym recognition sub-model. The four exact matching sub-models match disease names in different ways, which improves the accuracy of disease name matching.
[0102] Further, the fuzzy matching model is composed of several fuzzy matching sub-models. As shown in Figure 4, the above step 204 may specifically include:
[0103] Step 2041, input the obtained candidate coded disease names into each fuzzy matching sub-model in the fuzzy matching model.
[0104] Specifically, the fuzzy matching model can be composed of several fuzzy matching sub-models. The candidate coded disease names are input into each fuzzy matching sub-model in the fuzzy matching model, and each fuzzy matching sub-model matches the candidate coded disease names using a different fuzzy matching method.
[0105] In one embodiment, the step of inputting the obtained candidate coded disease names into each fuzzy matching sub-model in the fuzzy matching model specifically includes: inputting the obtained candidate coded disease names into the four fuzzy matching sub-models in the fuzzy matching model, the four fuzzy matching sub-models being the word frequency matching sub-model, the N-Gram sub-model, the edit distance sub-model and the cosine calculation sub-model.
[0106] Specifically, the fuzzy matching model consists of four fuzzy matching sub-models, and the four fuzzy matching sub-models include word frequency matching sub-model, N-Gram sub-model, edit distance sub-model and cosine calculation sub-model. Each candidate coded disease name will be input into four fuzzy matching sub-models for different fuzzy matching.
[0107] Word frequency matching sub-model: parses the candidate coded disease name and each standard disease name in the standard disease classification table into a set of single characters (for example, the Chinese name for "diabetes" is parsed as {"sugar", "urine", "disease"}, one element per character). The Jaccard coefficient (also known as the Jaccard index or Jaccard similarity coefficient, used to compare the similarity and difference between finite sample sets; the larger the value, the higher the similarity) is used as the similarity between the character set of the candidate coded disease name and the character set of each standard disease name, with control parameters added to the calculation for smoothing.
[0108] For example, the candidate coded disease name is "diabetes", and the word frequency matching sub-model calculates the Jaccard coefficients of "diabetes" and 26,000 standard disease names in ICD-10 one by one. The calculation formula is as follows:
[0109]
[0110] Here, A is the character set of the candidate coded disease name, and B is the character set of the standard disease name; Jaccard(A, B) is the similarity between A and B; lenA is the length of set A, that is, the number of characters in set A; lenB is the length of set B, that is, the number of characters in set B; len(A∩B) is the number of characters common to set A and set B; α and β are control parameters, which are set manually, for example α = 1 and β = 0.5.
[0111] The Jaccard coefficient of the candidate coded disease name "diabetes" and the standard disease name "diabetic foot" is then calculated as:
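The character-set Jaccard calculation can be illustrated with a short sketch. The formula itself is not reproduced in this text, so the way α and β enter the calculation below (added to the numerator and the denominator respectively) is purely an assumption; with both left at 0 the function reduces to the standard Jaccard coefficient len(A∩B) / (lenA + lenB − len(A∩B)). The Chinese strings are the names for "diabetes" and "diabetic foot", used because the sub-model works on single characters.

```python
def char_jaccard(candidate, standard, alpha=0.0, beta=0.0):
    """Jaccard similarity between the character sets of two names.

    alpha and beta are smoothing parameters; their placement here is a
    hypothetical choice, not the formula of the application.
    """
    a, b = set(candidate), set(standard)
    intersection = len(a & b)
    union = len(a) + len(b) - intersection
    denominator = union + beta
    if denominator == 0:
        return 0.0  # both names empty and no smoothing
    return (intersection + alpha) / denominator


# A = {糖, 尿, 病}, B = {糖, 尿, 病, 足}: 3 shared characters out of 4 distinct.
print(char_jaccard("糖尿病", "糖尿病足"))
```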
[0112] N-Gram sub-model: N-Grams are often used in natural language processing. The N-Grams of a text are the fragments obtained by segmenting the text into overlapping pieces of length N, where N is generally 2 or 3. The N-Gram sub-model parses the candidate coded disease name and the standard disease names into sets of fragments; for example, with N = 2 the three-character Chinese name for "diabetes" is parsed as {"$sugar", "sugar urine", "urine disease", "disease$"}, where $ is a padding character. The similarity between the fragment set of the candidate coded disease name and the fragment set of each standard disease name is then calculated with the following formula:
[0113]
[0114] Here, M is the fragment set of the candidate coded disease name, and N is the fragment set of the standard disease name; Jaccard(M, N) is the similarity between M and N; lenM is the length of set M, that is, the number of fragments in set M; lenN is the length of set N, that is, the number of fragments in set N; len(M∩N) is the number of fragments common to set M and set N; δ and ε are control parameters, which are set manually.
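The N-Gram parsing and set comparison can be sketched as follows. Two points are assumptions: the text pads with "$" for N = 2, and the sketch generalizes this to N − 1 padding characters on each side; and since the formula is not reproduced in this text, the control parameters δ and ε are omitted and the plain Jaccard coefficient is used.

```python
def ngrams(text, n=2, pad="$"):
    """Split text into overlapping fragments of length n, padded at both ends."""
    padded = pad * (n - 1) + text + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_jaccard(candidate, standard, n=2):
    """Plain Jaccard similarity between the n-gram sets of two names."""
    m, s = set(ngrams(candidate, n)), set(ngrams(standard, n))
    union = m | s
    return len(m & s) / len(union) if union else 0.0


# The three-character Chinese name for "diabetes" yields four bigrams.
print(ngrams("糖尿病"))
print(ngram_jaccard("糖尿病", "糖尿病足"))
```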
[0115] Edit distance sub-model: used to calculate the Levenshtein distance between candidate coded disease names and standard disease names, the smaller the distance, the higher the similarity.
[0116] Levenshtein distance (also known as text edit distance) refers to the minimum number of operations required to convert a string into another string, and the conversion operations include insertion, deletion, and replacement.
[0117] Example: To convert "eeba" to "abac":
[0118] eeba (delete the first e) → eba
[0119] eba (replace the remaining e with a) → aba
[0120] aba (insert c at the end) → abac
[0121] Then the Levenshtein distance between "eeba" and "abac" is 3.
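The distance in this example can be checked with the standard dynamic-programming algorithm. The sketch below is a generic textbook implementation, not code from the embodiment; the function name `levenshtein` is illustrative.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and replacements turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # replace (or keep) ca
        prev = curr
    return prev[-1]


print(levenshtein("eeba", "abac"))  # 3
```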
[0122] Cosine calculation sub-model: the cosine calculation sub-model must be trained first. Medical data is crawled from the Internet to build a medical corpus (for example, from Wikipedia, Baidu Encyclopedia and Medical Encyclopedia), and the crawled data is used to train a Word2Vec model, which is a model for generating word vectors. The cosine calculation sub-model first segments the candidate coded disease name into words, converts each word into a word vector with the trained Word2Vec model, and takes the average of the word vectors as the disease name vector. For example, the disease name "upper respiratory tract infection" is segmented into "upper", "respiratory tract" and "infection", and its disease name vector is the average of the word vectors of these three words. The disease name vector of each standard disease name is computed in the same way. The disease name vectors are then reduced in dimension by a PCA model and translated to a region centered on the origin, which increases the difference between the vectors. Finally, the cosine similarity between the PCA-corrected disease name vector of the candidate coded disease name and that of each standard disease name is calculated. PCA (principal component analysis) is mainly used for data dimensionality reduction.
[0123] In this embodiment, the candidate coded disease names are input into the four fuzzy matching sub-models of the fuzzy matching model: the word frequency matching sub-model, the N-Gram sub-model, the edit distance sub-model and the cosine calculation sub-model. Each fuzzy matching sub-model matches the candidate coded disease names, which ensures the accuracy of code matching for the candidate coded disease names.
[0124] It can be understood that the standard disease classification table is stored in the database of the server, and the storage address of the standard disease classification table is pre-stored in each exact matching sub-model and each fuzzy matching sub-model. Each matching sub-model obtains the standard disease classification table according to the storage address and performs matching against it.
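The averaging and cosine steps can be sketched as follows. The three-dimensional vectors in `TOY_VECTORS` are made-up stand-ins for the output of a trained Word2Vec model, the PCA correction is omitted, and the function names are illustrative; this is only a sketch of the arithmetic.

```python
import math

# Hypothetical word vectors; a real system would look these up in Word2Vec.
TOY_VECTORS = {
    "upper": [0.9, 0.1, 0.0],
    "respiratory tract": [0.2, 0.8, 0.1],
    "infection": [0.1, 0.3, 0.9],
    "inflammation": [0.1, 0.4, 0.8],
}


def name_vector(words):
    """Average the word vectors of a segmented disease name."""
    vectors = [TOY_VECTORS[w] for w in words]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


candidate = name_vector(["upper", "respiratory tract", "infection"])
standard = name_vector(["upper", "respiratory tract", "inflammation"])
print(round(cosine(candidate, standard), 4))
```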
[0125] Step 2042, based on each fuzzy matching sub-model, calculate the similarity between the candidate coded disease name and each standard disease name in the standard disease classification table.
[0126] Specifically, for each fuzzy matching sub-model, calculate the similarity between the input candidate coded disease name and each standard disease name in the standard disease classification table.
[0127] In one embodiment, when the fuzzy matching sub-model is the edit distance sub-model, the step of calculating the similarity between the candidate coded disease name and each standard disease name in the standard disease classification table specifically includes: calculating the text edit distance between the candidate coded disease name and each standard disease name in the standard disease classification table; and normalizing each text edit distance and using the normalized text edit distance as the similarity between the candidate coded disease name and each standard disease name.
[0128] Specifically, for each candidate coded disease name, the edit distance sub-model calculates the text edit distance between the candidate coded disease name and each standard disease name. The text edit distance is an integer, and the smaller the text edit distance, the higher the similarity. So that it can be combined with the similarities calculated by the other fuzzy matching sub-models, the text edit distance is normalized, that is, its value is compressed into the interval [0,1], and the normalized text edit distance is used as the similarity between the candidate coded disease name and each standard disease name.
[0129] The edit distance sub-model can normalize the text edit distance by methods such as linear normalization, standard-deviation normalization and nonlinear normalization.
[0130] In this embodiment, the edit distance sub-model calculates the text edit distance between the candidate coded disease name and each standard disease name, and uses the normalized text edit distance as the similarity between the candidate coded disease name and the standard disease name, so that it can be combined with the similarities calculated by the other fuzzy matching sub-models to generate the second code pairing result.
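One possible linear normalization is sketched below. It assumes that the distance is divided by the length of the longer name and subtracted from 1, so that a distance of 0 maps to a similarity of 1; this particular formula and the name `edit_similarity` are assumptions, not taken from the embodiment.

```python
def edit_similarity(distance, len_a, len_b):
    """Map an edit distance to [0, 1]; identical names score 1.0."""
    longest = max(len_a, len_b)
    if longest == 0:
        return 1.0          # two empty names are treated as identical
    return 1.0 - distance / longest


# "eeba" and "abac" have edit distance 3 and length 4 each.
print(edit_similarity(3, 4, 4))  # 0.25
```

Because the edit distance never exceeds the length of the longer string, the result stays within [0, 1].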
[0131] Step 2043: Generate a second code pairing result according to the similarities calculated by each fuzzy matching sub-model.
[0132] Specifically, from the similarities calculated by a fuzzy matching sub-model, the server can select the standard disease name with the highest similarity, together with its disease code, as the sub-code pairing result of that fuzzy matching sub-model. Among the sub-code pairing results of all the fuzzy matching sub-models, the one that occurs most often is taken as the second code pairing result.
[0133] In one embodiment, each fuzzy matching sub-model is preset with a corresponding weight. After the sub-code pairing results are obtained, the weight share of each sub-code pairing result is calculated from the weights of the fuzzy matching sub-models that produced it, and the sub-code pairing result with the highest weight share is selected as the second code pairing result. For example, assume there are four fuzzy matching sub-models, two of which give the sub-code pairing result X and two of which give the sub-code pairing result Y. If the two sub-models giving X each have a weight of 0.2 and the two sub-models giving Y each have a weight of 0.3, then the weight share of Y (0.6) is greater than the weight share of X (0.4), and Y is taken as the second code pairing result.
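The weighted selection in this example can be sketched as follows. The sub-model labels `m1` to `m4` and the function name `weighted_vote` are placeholders introduced for illustration.

```python
from collections import defaultdict


def weighted_vote(sub_results, weights):
    """Sum the weights of the sub-models behind each result; return the winner."""
    totals = defaultdict(float)
    for model, result in sub_results.items():
        totals[result] += weights[model]
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


sub_results = {"m1": "X", "m2": "X", "m3": "Y", "m4": "Y"}
weights = {"m1": 0.2, "m2": 0.2, "m3": 0.3, "m4": 0.3}
winner, totals = weighted_vote(sub_results, weights)
print(winner, totals)
```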
[0134] In one embodiment, the step of generating the second code pairing result according to the similarities calculated by each fuzzy matching sub-model specifically includes: for each candidate coded disease name, selecting, from the similarities calculated by each fuzzy matching sub-model, the standard disease name and disease code corresponding to the maximum similarity, and performing HardVoting fusion to obtain the second code pairing result; or performing SoftVoting fusion according to the similarities calculated by each fuzzy matching sub-model to obtain the second code pairing result.
[0135] In HardVoting fusion, the standard disease name with the highest similarity and its disease code are selected from the similarities calculated by each fuzzy matching sub-model, and the second code pairing result is determined by majority rule. In SoftVoting fusion, the similarities between the candidate coded disease name and each standard disease name output by the fuzzy matching sub-models are averaged, and the standard disease name with the highest average similarity, together with its disease code, is selected as the second code pairing result.
[0136] When HardVoting fusion is used, for each candidate coded disease name, the standard disease name with the highest similarity calculated by each fuzzy matching sub-model and its corresponding disease code are first taken to obtain several groups of sub-code pairing results. The standard disease name that occurs most often among these groups, together with its corresponding disease code, is then taken as the second code pairing result.
[0137] For example, in the results of the word frequency matching sub-model, the candidate coded disease name has the highest similarity with "peripheral neuropathy", at 90%; for the N-Gram sub-model it is "peripheral neuropathy", at 80%; for the edit distance sub-model it is "peripheral neuropathy", at 100%; and for the cosine calculation sub-model it is "peripheral neuritis", at 85%. Among the four groups of results, "peripheral neuropathy" appears 3 times and "peripheral neuritis" appears once. Since "peripheral neuropathy" appears more often than "peripheral neuritis", "peripheral neuropathy" and its disease code are taken as the second code pairing result.
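A minimal sketch of HardVoting fusion on this example follows. Each sub-model contributes only its top-scoring standard disease name; how ties are broken is not specified in the text, so the sketch simply keeps the name that `Counter.most_common` returns first. The dictionary keys and the name `hard_vote` are illustrative.

```python
from collections import Counter


def hard_vote(top_matches):
    """Majority vote over each sub-model's best (name, similarity) pair."""
    counts = Counter(name for name, _similarity in top_matches.values())
    return counts.most_common(1)[0][0]


top_matches = {
    "word_frequency": ("peripheral neuropathy", 0.90),
    "n_gram": ("peripheral neuropathy", 0.80),
    "edit_distance": ("peripheral neuropathy", 1.00),
    "cosine": ("peripheral neuritis", 0.85),
}
print(hard_vote(top_matches))  # peripheral neuropathy
```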
[0138] When SoftVoting fusion is used, for each candidate coded disease name, the similarities calculated by each fuzzy matching sub-model against all standard disease names are obtained. With four fuzzy matching sub-models, a total of 4*26000 similarities are obtained. The results of the fuzzy matching sub-models are then combined to calculate, for each standard disease name, the weighted average of its similarity with the candidate coded disease name, and the standard disease name with the highest average similarity, together with its disease code, is taken as the second code pairing result.
[0139] For example (only two standard disease names are listed for each fuzzy matching sub-model):
[0140] Word frequency matching sub-model: peripheral neuropathy - similarity 99%; peripheral neuritis - similarity 1%;
[0141] N-Gram sub-model: peripheral neuropathy - similarity 49%; peripheral neuritis - similarity 51%;
[0142] Edit distance sub-model: peripheral neuropathy - similarity 40%; peripheral neuritis - similarity 60%;
[0143] Cosine calculation sub-model: peripheral neuropathy - similarity 90%; peripheral neuritis - similarity 10%;
[0144] When the fuzzy matching sub-models all have the same weight:
[0145] The weighted average similarity of "peripheral neuropathy": (99%+49%+40%+90%)÷4=69.5%;
[0146] The weighted average similarity of "peripheral neuritis": (1%+51%+60%+10%)÷4=30.5%;
[0147] Since the weighted average similarity of "peripheral neuropathy" is greater than that of "peripheral neuritis", "peripheral neuropathy" and its corresponding disease code are taken as the second code pairing result.
[0148] In this embodiment, HardVoting fusion or SoftVoting fusion is performed according to the similarities calculated by each fuzzy matching sub-model, so that the results of all the fuzzy matching sub-models are taken into account when generating the second code pairing result, which improves the accuracy of the second code pairing result.
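The SoftVoting arithmetic can be sketched as below, using the four sub-model outputs listed in this example. The dictionary layout, the optional `weights` argument and the name `soft_vote` are illustrative; equal weights are assumed when none are given.

```python
def soft_vote(similarities, weights=None):
    """Weighted average of per-model similarities; return the best name."""
    models = list(similarities)
    if weights is None:
        weights = {m: 1.0 for m in models}       # equal weights by default
    total_weight = sum(weights[m] for m in models)
    names = set().union(*(similarities[m] for m in models))
    averages = {
        name: sum(weights[m] * similarities[m].get(name, 0.0) for m in models)
        / total_weight
        for name in names
    }
    return max(averages, key=averages.get), averages


similarities = {
    "word_frequency": {"peripheral neuropathy": 0.99, "peripheral neuritis": 0.01},
    "n_gram": {"peripheral neuropathy": 0.49, "peripheral neuritis": 0.51},
    "edit_distance": {"peripheral neuropathy": 0.40, "peripheral neuritis": 0.60},
    "cosine": {"peripheral neuropathy": 0.90, "peripheral neuritis": 0.10},
}
best, averages = soft_vote(similarities)
print(best, {name: round(avg, 3) for name, avg in averages.items()})
```

Although two of the four sub-models rank "peripheral neuritis" first, the averages are 69.5% against 30.5%, so "peripheral neuropathy" is selected.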
[0149] In this embodiment, the candidate coded disease name is input into each fuzzy matching sub-model of the fuzzy matching model, each fuzzy matching sub-model uses a different method to calculate the similarity between the candidate coded disease name and each standard disease name, and the similarities calculated by the fuzzy matching sub-models are then combined to generate the second code pairing result, which improves the accuracy of code matching for candidate coded disease names.
[0150] Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
[0151] It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless otherwise specified herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time and may be executed at different times, and they are not necessarily executed sequentially: they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
[0152] With further reference to Figure 4, as an implementation of the method shown in Figure 2, the present application provides an embodiment of a disease name code matching device. The device embodiment corresponds to the method embodiment shown in Figure 2, and the device can be applied to various electronic devices.
[0153] As shown in Figure 4, the disease name code matching device 300 of this embodiment includes: a list acquisition module 301, a list deduplication module 302, an exact matching module 303, a fuzzy matching module 304 and a list generation module 305, wherein:
[0154] The list acquisition module 301 is configured to acquire a list of disease names from electronic medical records.
[0155] The list deduplication module 302 is configured to deduplicate the duplicated disease names in the disease name list to obtain a deduplicated disease name list.
[0156] The exact matching module 303 is configured to input the list of deduplicated disease names into the exact matching model, perform code matching according to the standard disease classification table, and obtain the first code matching result and candidate code matching disease names.
[0157] The fuzzy matching module 304 is used for inputting the obtained candidate coded disease names into the fuzzy matching model, performing code matching according to the standard disease classification table, and obtaining a second code matching result.
[0158] A list generating module 305, configured to generate a disease name paired list according to the first paired result and the second paired result.
[0159] In this embodiment, the disease name list is first deduplicated to reduce the amount of calculation. The deduplicated disease name list is input into the exact matching model for exact matching to obtain the first code pairing result, and the disease names that cannot be exactly matched are input into the fuzzy matching model as candidate coded disease names for fuzzy matching to obtain the second code pairing result. Both rounds of code matching are performed against the standard disease classification table. Finally, a disease name code pairing list is generated from the first code pairing result and the second code pairing result. Matching disease names in multiple dimensions and modes through exact matching and fuzzy matching improves the accuracy of disease name code matching.
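The overall flow of modules 301 to 305 can be sketched as below. Everything concrete here is a stand-in: `STANDARD_TABLE` holds placeholder codes, `exact_match` is a plain dictionary lookup, and `fuzzy_match` uses Python's `difflib.get_close_matches` in place of the four fuzzy matching sub-models. Names are also processed one at a time rather than in two separate passes, which yields the same list.

```python
import difflib

# Hypothetical standard disease classification table with placeholder codes.
STANDARD_TABLE = {
    "diabetic foot": "CODE-001",
    "peripheral neuropathy": "CODE-002",
}


def exact_match(name):
    """Stand-in exact matching model: verbatim lookup only."""
    code = STANDARD_TABLE.get(name)
    return (name, code) if code is not None else None


def fuzzy_match(name):
    """Stand-in fuzzy matching model: closest standard name by difflib ratio."""
    best = difflib.get_close_matches(name, list(STANDARD_TABLE), n=1, cutoff=0.0)[0]
    return best, STANDARD_TABLE[best]


def build_code_list(disease_names):
    """Deduplicate, try exact matching, fall back to fuzzy matching."""
    unique_names = list(dict.fromkeys(disease_names))   # keeps first-seen order
    return {name: exact_match(name) or fuzzy_match(name) for name in unique_names}


code_list = build_code_list(
    ["diabetic foot", "periferal neuropathy", "diabetic foot"]
)
print(code_list)
```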
[0160] In some optional implementations of this embodiment, the exact matching module 303 includes: a name input submodule, a name query submodule, a first generation submodule, and a name tag submodule, wherein:
[0161] The name input sub-module is used for inputting each disease name in the deduplicated disease name list into the exact match sub-model according to the arrangement order of the exact match sub-model in the exact match model.
[0162] The name query submodule is used to query the standard disease name matching the input disease name in the standard disease classification table through the current exact matching submodel.
[0163] The first generation sub-module is used to use the queried standard disease name and the disease code corresponding to the standard disease name as the first code pair result of the disease name when a matching standard disease name is found.
[0164] The name input sub-module is also used for inputting the disease name into the next exact matching sub-model to continue matching when the current exact matching sub-model does not find a matching standard disease name.
[0165] The name marking sub-module is used to mark the disease name as a candidate matching disease name if the disease name is not matched by each exact matching sub-model.
[0166] In this embodiment, each disease name in the deduplicated disease name list is input into the exact matching sub-models for matching according to their arrangement order. If a match is found, the first code pairing result is generated; if not, the disease name is input into the next exact matching sub-model to continue matching. Because the exact matching sub-models differ from one another, the disease name can be exactly matched from multiple dimensions, which improves the accuracy of disease name code matching.
[0167] In some optional implementations of this embodiment, the above name input sub-module is also used to input each disease name in the deduplicated disease name list into the exact matching sub-models according to the arrangement order of the four exact matching sub-models in the exact matching model; the four exact matching sub-models include a complete matching sub-model, a stop-word removal sub-model, a primary-secondary separation sub-model and a synonym recognition sub-model.
[0168] In this embodiment, the disease name is input into the exact matching sub-models according to the arrangement order of the four exact matching sub-models in the exact matching model, which are, in order, the complete matching sub-model, the stop-word removal sub-model, the primary-secondary separation sub-model and the synonym recognition sub-model. With four exact matching sub-models, different methods can be used to match disease names, which improves the accuracy of disease name matching.
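The cascade described here can be sketched as a loop over an ordered list of matching functions. Only two toy sub-models are shown; the table contents, the stop-word list and all function names are hypothetical, since the text does not give them.

```python
# Hypothetical standard disease classification table with placeholder codes.
STANDARD_TABLE = {
    "diabetic foot": "CODE-001",
    "peripheral neuropathy": "CODE-002",
}

# Hypothetical stop words; the actual list is not given in the text.
STOP_WORDS = ("suspected", "probable")


def complete_match(name, table):
    """Return the name itself when it appears verbatim in the table."""
    return name if name in table else None


def stop_word_match(name, table):
    """Drop stop words, then look the remaining text up in the table."""
    cleaned = " ".join(w for w in name.split() if w not in STOP_WORDS)
    return cleaned if cleaned in table else None


def exact_cascade(name, sub_models, table):
    """Try each sub-model in order and stop at the first match."""
    for sub_model in sub_models:
        standard_name = sub_model(name, table)
        if standard_name is not None:
            return standard_name, table[standard_name]
    return None  # unmatched: the name becomes a candidate coded disease name


SUB_MODELS = [complete_match, stop_word_match]
matched = exact_cascade("suspected diabetic foot", SUB_MODELS, STANDARD_TABLE)
unmatched = exact_cascade("periferal neuropathy", SUB_MODELS, STANDARD_TABLE)
print(matched, unmatched)
```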
[0169] In some optional implementations of this embodiment, the fuzzy matching module 304 includes: an input submodule, a calculation submodule, and a second generation submodule, wherein:
[0170] The input sub-module is used for inputting the obtained candidate coded disease names into each fuzzy matching sub-model in the fuzzy matching model.
[0171] The calculation sub-module is used to calculate the similarity between the candidate coded disease name and each standard disease name in the standard disease classification table based on each fuzzy matching sub-model.
[0172] The second generation sub-module is used to generate a second code pairing result according to the similarity calculated by each fuzzy matching sub-model.
[0173] In this embodiment, the candidate coded disease name is input into each fuzzy matching sub-model of the fuzzy matching model, each fuzzy matching sub-model uses a different method to calculate the similarity between the candidate coded disease name and each standard disease name, and the similarities calculated by the fuzzy matching sub-models are then combined to generate the second code pairing result, which improves the accuracy of code matching for candidate coded disease names.
[0174] In some optional implementations of this embodiment, the above-mentioned input submodule is also used to: input the obtained candidate coded disease names into four fuzzy matching sub-models in the fuzzy matching model, and the four fuzzy matching sub-models include Word frequency matching sub-model, N-Gram sub-model, edit distance sub-model and cosine calculation sub-model.
[0175] In this embodiment, the candidate coded disease names are input into the four fuzzy matching sub-models of the fuzzy matching model: the word frequency matching sub-model, the N-Gram sub-model, the edit distance sub-model and the cosine calculation sub-model. Each fuzzy matching sub-model matches the candidate coded disease names, which ensures the accuracy of code matching for the candidate coded disease names.
[0176] In some optional implementations of this embodiment, when the fuzzy matching sub-model is an edit distance sub-model, the calculation sub-module includes: a distance calculation unit and a distance normalization unit, wherein:
[0177] A distance calculation unit, used to calculate the text editing distance between the candidate coded disease name and each standard disease name in the standard disease classification table;
[0178] The distance normalization unit is used to normalize each text edit distance, and use each text edit distance after normalization as the similarity between the candidate coded disease name and each standard disease name.
[0179] In this embodiment, the edit distance sub-model calculates the text edit distance between the candidate coded disease name and each standard disease name, and uses the normalized text edit distance as the similarity between the candidate coded disease name and the standard disease name, so that it can be combined with the similarities calculated by the other fuzzy matching sub-models to generate the second code pairing result.
[0180] In some optional implementation manners of this embodiment, the second generating submodule includes: a HardVoting unit or a SoftVoting unit, wherein:
[0181] HardVoting unit, for each candidate coded disease name, from the similarities calculated by each fuzzy matching sub-model, screen the standard disease name and disease code corresponding to the maximum similarity to perform HardVoting fusion, and obtain the second coded result .
[0182] The SoftVoting unit is configured to perform SoftVoting fusion according to the similarities calculated by each fuzzy matching sub-model to obtain a second code pairing result.
[0183] In this embodiment, HardVoting fusion or SoftVoting fusion is performed according to the similarities calculated by each fuzzy matching sub-model, so that the results of all the fuzzy matching sub-models are taken into account when generating the second code pairing result, which improves the accuracy of the second code pairing result.
[0184] To solve the above technical problems, an embodiment of the present application further provides a computer device. For details, refer to Figure 4, which is a basic structural block diagram of the computer device of this embodiment.
[0185] The computer device 4 includes a memory 41, a processor 42 and a network interface 43 connected to each other through a system bus. It should be noted that the figure shows only the computer device 4 with components 41-43, but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices and the like.
[0186] The computer equipment may be computing equipment such as a desktop computer, a notebook, a palmtop computer, and a cloud server. The computer device can perform human-computer interaction with the user through keyboard, mouse, remote controller, touch panel or voice control device.
[0187] The memory 41 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) fitted to the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 4, such as the program code of the disease name code matching method. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or are to be output.
[0188] In some embodiments, the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the program code stored in the memory 41 or to process data, for example to run the program code of the disease name code matching method.
[0189] The network interface 43 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 4 and other electronic devices.
[0190] The computer device provided in this embodiment can execute the steps of the above disease name code matching method, which may be the steps of the disease name code matching method in the above embodiments.
[0191] In this embodiment, the disease name list is first deduplicated to reduce the amount of calculation. The deduplicated disease name list is input into the exact matching model for exact matching to obtain the first code pairing result, and the disease names that cannot be exactly matched are input into the fuzzy matching model as candidate coded disease names for fuzzy matching to obtain the second code pairing result. Both rounds of code matching are performed against the standard disease classification table. Finally, a disease name code pairing list is generated from the first code pairing result and the second code pairing result. Matching disease names in multiple dimensions and modes through exact matching and fuzzy matching improves the accuracy of disease name code matching.
[0192] The present application also provides another embodiment, namely a computer-readable storage medium storing a disease name code matching program. The program can be executed by at least one processor so that the at least one processor performs the steps of the disease name code matching method described above.
[0193] In this embodiment, the disease name list is first deduplicated to reduce the amount of calculation. The deduplicated disease name list is input into the exact matching model for exact matching to obtain the first code pairing result, and the disease names that cannot be exactly matched are input into the fuzzy matching model as candidate coded disease names for fuzzy matching to obtain the second code pairing result. Both rounds of code matching are performed against the standard disease classification table. Finally, a disease name code pairing list is generated from the first code pairing result and the second code pairing result. Matching disease names in multiple dimensions and modes through exact matching and fuzzy matching improves the accuracy of disease name code matching.
[0194] From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disk) and contains several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present application.
[0195] Obviously, the embodiments described above are only some of the embodiments of the present application, not all of them. The drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and comprehensively. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or replace some of their technical features with equivalents. All equivalent structures made using the contents of the description and drawings of this application, whether used directly or indirectly in other related technical fields, also fall within the scope of protection of this application.