A disease name trained model using a medical word semantic representation learning method, an interpretable disease name estimation system using the same, and the estimation method thereof.

The medical word semantic expression learning method segments and vectorizes electronic medical records to create a disease name-learned model, addressing the lack of interpretable disease name estimation in ICD10 coding by quantifying records and providing interpretable disease name estimation.

JP7873840B2 · Active · Publication Date: 2026-06-15 · KANAI EDUCATIONAL INSTITUTION

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Patents
Current Assignee / Owner
KANAI EDUCATIONAL INSTITUTION
Filing Date
2022-03-30
Publication Date
2026-06-15

AI Technical Summary

Technical Problem

Existing methods lack an effective and interpretable approach to estimate disease names from electronic medical records, particularly in the context of ICD10 coding, as they do not adequately utilize natural language processing to extract and analyze feature amounts for disease names.

Method used

A medical word semantic expression learning method that segments electronic medical records, selects N feature words from a disease name thesaurus, creates a word semantic vector dictionary, and assigns seed vectors to represent words, enabling the creation of a disease name-learned model through machine learning to estimate disease names with interpretability.

Benefits of technology

This method allows for the quantification of electronic medical records into N-dimensional vectors, facilitating the extraction of higher-level disease names and providing interpretability in disease name estimation, overcoming the black boxing of machine learning in this domain.


Abstract

To provide a method that selects feature words from a disease name thesaurus and automatically acquires vector values representing a medical document by the numerical information of those feature words.
SOLUTION: The method comprises a step (A) of word-segmenting the progress summaries of electronic medical records (separating the words with spaces), a step (B) of generating a word semantic vector dictionary in which disease names are represented by N feature words, and a step (C) of expressing the electronic medical records by the N feature words to obtain their weight vectors. In step (B), the N feature words positioned as higher-level concepts are selected from a disease name thesaurus, and the word semantic vector dictionary is generated so that all disease names registered in the disease name thesaurus are represented by the feature words. In step (C), the word semantic vector dictionary is used to represent all words appearing in the progress summaries of the electronic medical records by the N feature words, N-dimensional seed vectors are assigned to the words, and N kinds of vector values are learned for each progress summary to obtain an N-dimensional weight vector for each electronic medical record.
SELECTED DRAWING: Figure 2
Description

【Technical Field】
【0001】 The present invention relates to a method for creating a disease name learned model using a medical word semantic expression learning method, an interpretable disease name estimation method using the same, and an interpretable disease name estimation system using the disease name learned model.
【Background Art】
【0002】 In the United States, tools that assist the ICD10 coding work performed by medical record information managers have already been commercialized and are becoming widespread. These tools are thought to derive ICD10 codes by applying natural language processing to medical documents, but their algorithms have not been made public. In Japan, no comparable support tool based on natural language processing has been developed.
【0003】 The invention described in Non-Patent Document 1 uses a vector space model of words. The invention described in Non-Patent Document 2 attempts to extract entities from the "chief complaint" and "past history" sections of the discharge summary and to structure the characteristic words for each diagnostic group classification.
【0004】 However, estimating a disease name from the progress summary of a discharge summary requires extracting and analyzing feature amounts for each disease name. No established method for doing so is known, and the problem remains at the research and development stage.
【0005】
【Non-Patent Document 1】 The 39th Joint Conference on Medical Informatics, November 24, 2019, 4-F-3-04
【Non-Patent Document 2】 The 23rd Annual Meeting of the Japanese Society for Medical Informatics, Spring 2019, June 7, OB-1
【Non-Patent Document 3】 https://iss.ndl.go.jp/books/R000000004-I7980256-00
【Summary of the Invention】
【Problems to be Solved by the Invention】
【0006】 The present invention provides a learning method that automatically selects 264 disease names from a disease name thesaurus as feature words and obtains vector values that represent a medical document by the numerical information of these feature words.
【Means for Solving the Problems】
【0007】 (1) Interpretable disease name estimation method
(1-1) Medical word semantic expression learning method
The medical word semantic expression learning method according to the present invention is an interpretable medical word semantic expression learning method comprising: (A) a step of processing data of a target text; (B) a step of creating a medical word semantic vector dictionary (hereinafter also referred to as the "word semantic vector dictionary") in which disease names are represented by N feature words; and (C) a step of representing the target text by the N feature words and obtaining its weight vector. In step (A), the progress summary of an electronic medical record acquired into memory is word-segmented as the target text. In step (B), N feature words positioned as higher-level concepts are selected from a disease name thesaurus, and a word semantic vector dictionary is created in which the disease names registered in the disease name thesaurus are represented by these feature words. In step (C), the word semantic vector dictionary is used to represent all words appearing in the progress summary of the electronic medical record by the N feature words, and a seed vector consisting of N kinds of vector values is assigned to each word.
By learning the N kinds of vector values for each word-segmented progress summary of the electronic medical record, an N-dimensional weight vector is obtained for each electronic medical record.
【0008】 The medical word semantic expression learning method according to the present invention is an interpretable medical word semantic expression learning method executed by a computer equipped with a processor and memory, and includes:
(A) a step in which the processor processes data of a target text;
(B) a step in which the processor creates a medical word semantic vector dictionary in which disease names are represented by N feature words; and
(C) a step in which the processor represents the target text by the N feature words and obtains its weight vector.
Step (A) includes:
(A-1) a step in which the processor acquires into memory an electronic medical record that includes the patient's "gender," "age," "department," and the like, together with a discharge summary containing the diagnosed disease name, represented by a disease name code, and a progress summary;
(A-2) a step in which the processor selects the progress summary as the target text; and
(A-3) a step in which the processor obtains a word segmentation of the progress summary using natural language processing.
Step (B) includes:
(B-1) a step in which the processor acquires into memory a disease name thesaurus in which disease names are classified and systematized according to their hierarchical relationships and their synonymous and related relationships;
(B-2) a step in which the processor selects N feature words from the disease name thesaurus; and
(B-3) a step in which the processor obtains the word semantic vector dictionary by listing, for each disease name registered in the disease name thesaurus, the feature words corresponding to the hierarchical relationships and synonymous / related relationships of that disease name.
Step (C) includes:
(C-1) a step in which the processor uses the word semantic vector dictionary to represent all words appearing in the word-segmented progress summary by the feature words and assigns each word a seed vector consisting of N kinds of vector values; and
(C-2) a step in which the processor learns the N kinds of vector values for each progress summary of the electronic medical record and obtains the weight vector of that progress summary.
【0009】 In the medical word semantic expression learning method according to the present invention, step (C-1), in which the processor assigns seed vectors to all words, includes (C-1-1) a step in which the processor recursively expands, using the word semantic vector dictionary, all the words represented by the N feature words.
【0010】 In the medical word semantic expression learning method according to the present invention, step (C-2), in which the processor obtains the weight vector of the progress summary, includes (C-2-1) a step in which the processor uses the seed vectors to obtain, for the progress summary of each electronic medical record, a paragraph vector represented by N vector values.
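The recursive expansion of step (C-1-1) can be illustrated with a minimal Python sketch. It is not the patented implementation: the feature words, the dictionary entries, the helper names `expand` and `seed_vector`, and the choice to normalise the counts so that they sum to 1 are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical feature words (the N axes); real ones come from the thesaurus.
FEATURE_WORDS = ["infectious disease", "respiratory disease", "circulatory disease"]

# Hypothetical word semantic vector dictionary: each entry word is described
# by feature words and/or by other entry words.
SEMANTIC_DICT = {
    "pneumonia": ["respiratory disease", "infectious disease"],
    "bacterial pneumonia": ["pneumonia", "infectious disease"],
    "heart failure": ["circulatory disease"],
}


def expand(word, dictionary, features, _seen=None):
    """Recursively expand a word until only feature words remain."""
    seen = set() if _seen is None else _seen
    if word in features:
        return Counter({word: 1})
    if word in seen or word not in dictionary:
        return Counter()  # unknown word or cycle: contributes nothing
    seen = seen | {word}
    total = Counter()
    for described_by in dictionary[word]:
        total += expand(described_by, dictionary, features, seen)
    return total


def seed_vector(word, dictionary=SEMANTIC_DICT, features=FEATURE_WORDS):
    """Return an N-dimensional seed vector, normalised to sum to 1 (assumption)."""
    counts = expand(word, dictionary, features)
    norm = sum(counts.values())
    return [counts[f] / norm if norm else 0.0 for f in features]


print(seed_vector("heart failure"))  # [0.0, 0.0, 1.0]
```

Here "bacterial pneumonia" expands through "pneumonia" to the feature words, so its seed vector puts weight on both "infectious disease" and "respiratory disease", while a word absent from the dictionary receives an all-zero vector. The learning of step (C-2), which turns seed vectors into a paragraph vector, is not shown.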
【0011】 In the medical word semantic expression learning method according to the present invention, the disease name code of the electronic medical record that the processor acquires into memory in step (A-1) may be a code of the International Classification of Diseases, 10th Revision (ICD10).
(1-2) Method for creating a disease name learned model
【0012】 The method for creating a disease name learned model according to the present invention is executed by a computer equipped with a processor and memory. In the above medical word semantic expression learning method, step (A-1), in which the processor acquires the electronic medical record into memory, includes (A-1-1) a step in which the processor acquires L1 training electronic medical records into memory (L1 ≥ 2, where L1 is a natural number), and step (C-2), in which the processor obtains the weight vector of the progress summary, includes (C-2-2) a step in which the processor obtains the weight vectors of the progress summaries of the training electronic medical records. The method further includes (D) a step in which the processor performs machine learning using, as explanatory variables, information including the weight vectors of the progress summaries of the training electronic medical records obtained in step (C-2-2). Executing these steps associates the weight vectors of the explanatory variables with the disease name codes of the diagnosed disease names in the training electronic medical records.
【0013】 In the method for creating a disease name learned model according to the present invention, the explanatory variables may also include information such as "gender," "age," and "department" contained in the electronic medical records acquired into memory.
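Paragraph 【0013】 allows the explanatory variable to carry patient attributes alongside the weight vector. One way this could be assembled is sketched below in Python. The encoding is purely an assumption (a binary gender flag, age divided by 100, a one-hot department vector), and the function name `explanatory_variable` is hypothetical; the text does not specify how these attributes are encoded.

```python
def explanatory_variable(weight_vector, gender, age, department, departments):
    """Concatenate the N-dimensional weight vector with encoded patient attributes.

    All encodings here are illustrative assumptions, not taken from the patent.
    """
    gender_code = [1.0 if gender == "female" else 0.0]
    age_scaled = [age / 100.0]  # crude scaling so age is comparable to the weights
    dept_one_hot = [1.0 if department == d else 0.0 for d in departments]
    return list(weight_vector) + gender_code + age_scaled + dept_one_hot


x = explanatory_variable([0.2, 0.8], "female", 50, "cardiology",
                         ["cardiology", "respiratory medicine"])
print(x)  # [0.2, 0.8, 1.0, 0.5, 1.0, 0.0]
```

Keeping the weight vector as the leading N components means the feature-word axes stay identifiable inside the explanatory variable, which matters later when large weights are read back as an interpretation.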
【0014】 In the method for creating a disease name learned model according to the present invention, the machine learning step (D) includes:
(D-1) a step in which the processor selects M diagnosed disease names from the L1 training electronic medical records acquired into memory (L1 ≥ M);
(D-2) a step in which the processor inputs the explanatory variables of all the selected training electronic medical records into a support vector machine (SVM); and
(D-3) a step in which the processor obtains from the SVM, as the objective variable, the disease name codes of the diagnosed disease names of all the selected training electronic medical records.
The processor thereby obtains, for all the selected training electronic medical records, a disease name learned model that represents the correspondence between the explanatory variables and the objective variable.
(1-3) Interpretable disease name estimation method
【0015】 The present invention provides an interpretable disease name estimation method built on an interpretable medical word semantic expression learning method that obtains an N-dimensional weight vector for each electronic medical record through the following steps: (A1) acquiring electronic medical records into memory and word-segmenting their progress summaries; (B1) selecting N feature words positioned as higher-level concepts from a disease name thesaurus and creating a word semantic vector dictionary that represents the disease names registered in this disease name thesaurus by these feature words; and (C1) using the word semantic vector dictionary to represent all words appearing in the progress summary of the electronic medical record by the N feature words, assigning a seed vector consisting of N kinds of vector values, and learning the N kinds of vector values for each word-segmented progress summary of the electronic medical record. In step (A1), an evaluation electronic medical record and multiple training electronic medical records are acquired into memory and each progress summary is word-segmented, and in step (C1), the weight vectors of the progress summaries of the evaluation electronic medical record and the training electronic medical records are obtained. Machine learning is performed using, as explanatory variables, information including the weight vectors of the progress summaries of the training electronic medical records obtained in step (C1), which creates a disease name learned model in which the weight vectors of the explanatory variables correspond to the disease name codes of the diagnosed disease names in the training electronic medical records, which are the objective variable. An explanatory variable is created from the weight vector of the progress summary of the evaluation electronic medical record obtained in step (C1), and the disease name code, which is the objective variable, is obtained by referring to the disease name learned model. The disease name (feature word) that is a higher-level concept of the disease name code is obtained by selecting feature words with large weights from the weight vector of the evaluation electronic medical record.
【0016】 The interpretable disease name estimation method according to the present invention is a method in which a computer equipped with a processor and memory executes steps (A-1) to (A-3), (B-1) to (B-3), and (C-1) to (C-2), where these steps are:
(A-1) a step in which the processor acquires into memory an electronic medical record that includes the patient's "gender," "age," "department," and the like, together with a discharge summary containing the diagnosed disease name, represented by a disease name code, and a progress summary;
(A-2) a step in which the processor selects the progress summary as the target text;
(A-3) a step in which the processor obtains a word segmentation of the progress summary using natural language processing;
(B-1) a step in which the processor acquires into memory a disease name thesaurus in which disease names are classified and systematized according to their hierarchical relationships and their synonymous and related relationships;
(B-2) a step in which the processor selects N feature words from the disease name thesaurus;
(B-3) a step in which the processor obtains a word semantic vector dictionary by listing, for each disease name registered in the disease name thesaurus, the feature words corresponding to the hierarchical relationships and synonymous / related relationships of that disease name;
(C-1) a step in which the processor uses the word semantic vector dictionary to represent all words appearing in the word-segmented progress summary by the feature words and assigns each word a seed vector consisting of N kinds of vector values; and
(C-2) a step in which the processor learns the N kinds of vector values for each progress summary of the electronic medical record and obtains the weight vector of that progress summary.
In this medical word semantic expression learning method, step (A-1), in which the processor acquires the electronic medical record into memory, includes:
(A-1-1) a step in which the processor acquires L1 training electronic medical records into memory (L1 ≥ 2, where L1 is a natural number); and
(A-1-2) a step in which the processor acquires an evaluation electronic medical record into memory.
Step (C-2), in which the processor obtains the weight vector of the progress summary, includes:
(C-2-2) a step in which the processor obtains the weight vectors of the progress summaries of the training electronic medical records; and
(C-2-3) a step in which the processor obtains the weight vector of the progress summary of the evaluation electronic medical record.
The processor performs machine learning using, as explanatory variables, information including the weight vectors of the progress summaries of the training electronic medical records obtained in step (C-2-2), thereby creating a disease name learned model in which the weight vectors of the explanatory variables correspond to the disease name codes of the diagnosed disease names in the training electronic medical records, which are the objective variable.
The processor then creates an explanatory variable from the weight vector of the progress summary of the evaluation electronic medical record obtained in step (C-2-3) and estimates the disease name corresponding to that explanatory variable by referring to the disease name learned model, through:
(E-1) a step in which the processor refers the explanatory variable of the evaluation electronic medical record to the disease name learned model;
(E-2) a step in which the processor obtains the disease name code, which is the objective variable, from the disease name learned model; and
(E-3) a step in which the processor selects feature words with large weights from the weight vector of the evaluation electronic medical record.
This yields, together with the disease name code obtained from the disease name learned model, the disease name of the feature word that is a higher-level concept of that disease name code.
(2) Interpretable disease name estimation system
【0017】 (Word semantic vector dictionary) The medical word semantic vector dictionary according to the present invention is a dictionary in which the disease names registered in a disease name thesaurus, which classifies and systematizes disease names by their hierarchical relationships, synonymous relationships, and the like, are described using N feature words selected from that thesaurus. For each disease name registered in the thesaurus, the dictionary lists the feature words corresponding to the hierarchical and synonymous relationships of that disease name.
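Step (E-3) of paragraph 【0016】, selecting the feature words with large weights, amounts to ranking the components of the weight vector. A minimal Python sketch follows. The function name `top_feature_words`, the cut-off `k`, and the sample feature words are illustrative assumptions; the text does not say how many feature words are reported or whether a threshold is used.

```python
def top_feature_words(weight_vector, feature_words, k=3):
    """Return the k (feature word, weight) pairs with the largest weights."""
    if len(weight_vector) != len(feature_words):
        raise ValueError("weight vector and feature words must have equal length")
    ranked = sorted(zip(feature_words, weight_vector),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical feature words and weights for one evaluation record.
features = ["infectious disease", "respiratory disease", "circulatory disease"]
print(top_feature_words([0.1, 0.7, 0.2], features, k=2))
# [('respiratory disease', 0.7), ('circulatory disease', 0.2)]
```

Because each axis of the weight vector is a named higher-level disease concept, the ranked pairs can be shown next to the estimated disease name code as the basis for the estimate.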
【0018】 (Medical word semantic expression learning program) The medical word semantic expression learning program according to the present invention causes a computer to execute steps (F), (G), (H), and (I) using a word semantic vector dictionary created by selecting N feature words from a disease name thesaurus and listing, for each disease name registered in the disease name thesaurus, the feature words corresponding to its hierarchical relationships and synonymous / related relationships:
(F) a step of inputting into the computer, from an electronic medical record that includes a discharge summary containing a progress summary, the patient's "gender," "age," "department," and the like, and the progress summary;
(G) a step of obtaining a word segmentation of the progress summary using natural language processing;
(H) a step of using the word semantic vector dictionary to represent all words appearing in the word-segmented progress summary by the feature words and assigning a seed vector consisting of N kinds of vector values; and
(I) a step of learning the N kinds of vector values for each progress summary of the electronic medical record and obtaining the weight vector of that progress summary.
【0019】 (Disease name learned model) The interpretable disease name learned model according to the present invention is created as follows. N feature words are selected from a disease name thesaurus, and a word semantic vector dictionary is created that lists the feature words corresponding to the hierarchical relationships and synonymous / related relationships of the disease names registered in the disease name thesaurus.
Using this word semantic vector dictionary, all words appearing in the word-segmented progress summaries of multiple training electronic medical records are represented by the feature words, a seed vector consisting of N kinds of vector values is assigned, and the weight vector of each progress summary of the training electronic medical records is obtained. Machine learning is then performed using information including the obtained weight vectors as explanatory variables to determine the disease name code, which is the objective variable. For all training electronic medical records, the objective variable is determined for each explanatory variable, and the correspondence between the explanatory variables and the disease name codes is classified.
【0020】 (Interpretable disease name estimation system) The interpretable disease name estimation system according to the present invention is a system in which the following can be connected via a network line: device A of medical institution a, which uses a plurality of accumulated training electronic medical records to create a disease name learned model representing the correspondence between the explanatory variables, which include the weight vector of each training electronic medical record, and the disease name code of each diagnosed disease name, which is the objective variable; database B, which stores the disease name learned model created by device A; and device C of medical institution c, which holds an evaluation electronic medical record.
Device A operates as follows. Using a word semantic vector dictionary in which disease names are represented by N feature words selected from a disease name thesaurus, all words appearing in the word-segmented progress summaries of the multiple training electronic medical records are represented by the feature words.
A seed vector consisting of N kinds of vector values is assigned to each word, the weight vector of the progress summary is obtained for each discharge summary in the training electronic medical records, and machine learning is performed using information including the obtained weight vectors as explanatory variables. Device A then sends the disease name learned model created in this way to database B.
Database B stores in memory the disease name learned model transmitted from device A.
Device C uses the word semantic vector dictionary created by device A to represent all words appearing in the word-segmented progress summary of the evaluation electronic medical record by the feature words, and assigns seed vectors consisting of N kinds of vector values to obtain the weight vector of the progress summary of the evaluation electronic medical record. Using information including the obtained weight vector as explanatory variables, device C refers to the disease name learned model in database B and obtains, as the objective variable, the disease name code corresponding to the explanatory variables. By selecting a feature word with a large weight from the weight vector of the evaluation electronic medical record, device C of medical institution c can also obtain the disease name of the feature word that is a higher-level concept of the disease name code obtained from the disease name learned model.
【Effects of the Invention】
【0021】 (Medical word semantic expression learning method) With the medical word semantic expression learning method according to the present invention, for each progress summary of the electronic medical records acquired into memory, the vector values in the N-dimensional space spanned by the axes of the N feature words selected from the disease name thesaurus can be learned to obtain the weight vector of that progress summary.
In other words, N feature words positioned as higher-level concepts are selected from the disease name thesaurus, and a word semantic vector dictionary is created in which the disease names registered in this thesaurus are represented by these feature words. Using this dictionary, all words appearing in the progress summary of an electronic medical record acquired into memory are represented by the corresponding feature words and assigned a seed vector consisting of N kinds of vector values, and the N kinds of vector values are learned for each progress summary, so that an N-dimensional weight vector is obtained for each electronic medical record.
【0022】 Therefore, the medical word semantic expression learning method of the present invention can quantify the progress summary of an electronic medical record along each of the N feature words positioned as higher-level concepts of disease names, which makes it possible to obtain the higher-level disease names (feature words) that are closely related to the description in the progress summary. Furthermore, since the progress summary of an electronic medical record can be quantified as an N-dimensional vector, the progress summary is obtained in a format suitable for machine learning.
【0023】 (Disease name learned model) In the method for creating a disease name learned model according to the present invention, weight vectors are obtained by the above medical word semantic expression learning method from the progress summaries of the many training electronic medical records acquired into memory, and machine learning is performed using information including these weight vectors as explanatory variables, so that the weight vectors of the explanatory variables can be associated with the disease name codes of the diagnosed disease names in the training electronic medical records.
Therefore, the method for creating a disease name learned model of the present invention yields, for all selected training electronic medical records, a disease name learned model that represents the correspondence between the explanatory variables, which contain the weight vectors of the progress summaries, and the disease name codes of the diagnosed disease names, which are the objective variable.
【0024】 (Interpretable disease name estimation method) The disease name estimation method according to the present invention performs word semantic expression learning on an evaluation electronic medical record, obtains the weight vector of its progress summary and, from it, the disease name (feature word) that is a higher-level concept, and then estimates the disease name code for that evaluation electronic medical record by referring the weight vector to the disease name learned model built from the training electronic medical records.
【0025】 Therefore, the disease name estimation method of the present invention obtains from the evaluation electronic medical record both an estimated disease name and a disease name (feature word) that is a higher-level concept of that estimated disease name, which gives the estimate interpretability. In other words, a weight vector is obtained from the progress summary of the evaluation electronic medical record, which was written on the basis of the diagnosis, and the estimated disease name and the higher-level disease name (feature word) with the largest weight are obtained simultaneously from that weight vector. This overcomes the black-box problem of estimation reasons in machine learning and allows the basis for the disease name estimation to be explained to the patient.
【0026】 (Interpretable disease name estimation system) The interpretable disease name estimation system according to the present invention is a network system to which the following can be connected: device A of medical institution a, which creates a disease name learned model from a large number of training electronic medical records; database B, which stores the disease name learned model created by device A; and device C of medical institution c, which holds an evaluation electronic medical record. Device C can obtain the weight vector of the progress summary of the evaluation electronic medical record using the medical word semantic expression learning method described above, and can refer to the disease name learned model in database B, using information including the obtained weight vector as an explanatory variable, to obtain the disease name code corresponding to that explanatory variable. By using the disease name estimation method of the present invention described above, device C of medical institution c can obtain, along with the disease name code, the disease name of the feature word that is a higher-level concept of the disease name code.
【0027】 Therefore, medical institution c, which has written the progress summary of the evaluation electronic medical record on the basis of the diagnosis, can obtain the weight vector of that progress summary using the above medical word semantic expression learning method and, by referring to database B from device C, can obtain the estimated disease name from the disease name learned model and simultaneously obtain the higher-level disease name (feature word) with the largest weight. In other words, medical institution c can inform the patient of the estimated disease name based on the progress summary it has written and, at the same time, explain the basis for that estimation.
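The division of roles among device A, database B, and device C described in 【0026】 and 【0027】 can be sketched in Python as follows. This is an illustrative sketch only: a nearest-centroid classifier is swapped in for the SVM named in the text so that the example needs no third-party library, networking is omitted, and the class `ModelDatabase`, the functions `train_model` and `estimate`, the feature words, and the sample records are all hypothetical.

```python
import math


class ModelDatabase:
    """Stands in for database B: stores the learned model sent by device A."""

    def __init__(self):
        self.model = None

    def store(self, model):
        self.model = model


def train_model(records):
    """Device A: map each disease name code to the mean weight vector of its records.

    Nearest-centroid stand-in for the SVM named in the text.
    """
    sums, counts = {}, {}
    for code, vec in records:
        acc = sums.setdefault(code, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[code] = counts.get(code, 0) + 1
    return {code: [s / counts[code] for s in acc] for code, acc in sums.items()}


def estimate(model, feature_words, weight_vector):
    """Device C: return (disease name code, feature word with the largest weight)."""
    code = min(model, key=lambda c: math.dist(model[c], weight_vector))
    top = max(range(len(weight_vector)), key=weight_vector.__getitem__)
    return code, feature_words[top]


# Hypothetical feature words and (code, weight vector) training pairs.
features = ["respiratory disease", "circulatory disease"]
training = [("J18", [0.9, 0.1]), ("J18", [0.7, 0.3]), ("I50", [0.2, 0.8])]

db = ModelDatabase()
db.store(train_model(training))                   # device A -> database B
print(estimate(db.model, features, [0.8, 0.2]))   # device C
# ('J18', 'respiratory disease')
```

Device C returns the estimated code and the dominant feature word in one call, mirroring the simultaneous output of the estimate and its basis described in 【0027】.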
【Brief Description of the Drawings】
【0028】
[Figure 1] A flowchart illustrating the interpretable disease name estimation method according to the present invention.
[Figure 2] A flowchart of the medical word semantic expression learning method according to the present invention.
[Figure 3] A schematic diagram of the interpretable disease name estimation system according to the present invention.
[Figure 4] An explanatory diagram of the medical word semantic vector dictionary.
[Figure 5] A flowchart of the word semantic expression learning method according to the example.
[Figure 6] A flowchart of the machine learning process in the example.
[Figure 7] A flowchart illustrating the interpretable disease name estimation method for training and evaluation electronic medical records with different distributions, as described in the example.
[Figure 8] An explanatory diagram showing the entry structure of the disease name thesaurus dictionary (T dictionary) according to the example.
[Figure 9] An explanatory diagram illustrating the structure of disease name codes in the disease name thesaurus.
【Modes for Carrying Out the Invention】
【0029】 Hereinafter, embodiments and examples of the interpretable disease name estimation method and interpretable disease name estimation system according to the present invention will be described with reference to the drawings.
(1) Interpretable disease name estimation method
【0030】 As shown in Figure 1, the disease name estimation method according to the present invention obtains a disease name learned model by applying "(1-1) Medical word semantic expression learning method" (steps (A) to (C)) and "(1-2) Method for creating a disease name learned model" (step (D)) to the accumulated training electronic medical records, and obtains the disease name code together with the disease name (feature word) that is its higher-level concept by applying "(1-3) Interpretable disease name estimation method" to the evaluation electronic medical record.
【0031】 In other words, "(1-3) Interpretable disease name estimation method" obtains the weight vector of the evaluation electronic medical record by applying "(1-1) Medical word semantic expression learning method" to it (steps (A) to (C)), obtains a disease name code by referring this weight vector to the disease name learned model (step (E)), and at the same time obtains the disease name (feature word) that is a higher-level concept of the disease name code by selecting feature words with large weights from the weight vector. Accordingly, for the disease name code obtained by referring to the disease name learned model, the interpretable disease name estimation method according to the present invention can simultaneously obtain, from the weight vector of the evaluation electronic medical record, the higher-level concept of that code, which serves as the interpretation of the disease name.
【0032】 This Chapter (1) explains in order the components that constitute the interpretable disease name estimation method according to the present invention: (1-1) the (medical) word semantic expression learning method, (1-2) the method for creating a disease name learned model, and (1-3) the interpretable disease name estimation method.
Furthermore, the methods described in (1-1) to (1-3) of this Chapter (1) are methods of information processing using computer software. The steps (A) to (E) that constitute these methods, and their respective substeps (A-1) to (A-3), ..., (E-1) to (E-3), etc., are all executed by a computer equipped with a processor and memory. That is, the processor performs the calculations for each step and substep, accesses the memory, and stores the resulting output (dictionary, model, etc.) in the memory.
(1-1) Medical word semantic expression learning method
【0033】 As shown in Figure 2, the medical word semantic expression learning method according to the present invention consists of:
(A) a step of processing the data of the target text,
(B) a step of creating a word semantic vector dictionary in which disease names are represented by N feature words, and
(C) a step of representing the target text with the N feature words and obtaining its weight vector.
【0034】 Step (A) is performed by acquiring the progress summary of the electronic medical record into memory as the target text and segmenting it into words. Step (B) is performed by selecting N feature words that are higher-level concepts from the disease name thesaurus and creating a word semantic vector dictionary in which the disease names registered in the disease name thesaurus are represented by these feature words. Step (C) is performed by using this word semantic vector dictionary to represent all words appearing in the progress summary with the N feature words, assigning a seed vector consisting of N types of vector values, and learning the N types of vector values for each segmented progress summary.
【0035】 Through the above steps (A) to (C), the medical word semantic expression learning method of the present invention can obtain an N-dimensional weight vector for each electronic medical record.
The details of these steps (A) to (C) are as follows.
【0036】 First, (A) the step of processing the data of the target text includes:
(A-1) a step of acquiring into memory an electronic medical record that contains the patient's "gender," "age," "department," etc., and a discharge summary that includes the diagnosed disease name, represented by a disease name code, and a progress summary,
(A-2) a step of selecting the progress summary as the target text, and
(A-3) a step of obtaining a word segmentation of the progress summary using natural language processing.
The discharge summary in the electronic medical record summarizes the inpatient's chief complaint, medical history, physical findings, laboratory findings, and the medical treatment received during hospitalization. The disease name code may be a code from the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (hereinafter referred to as "ICD10"). If the disease name code in the electronic medical record is not based on ICD10, it is preferable to convert the disease names that appear to the standard ICD10 disease names using publicly known techniques when obtaining the word segmentation of the progress summary in step (A-3).
【0037】 Next, (B) the step of creating a word semantic vector dictionary includes:
(B-1) a step of acquiring into memory a disease name thesaurus in which disease names are classified and systematized according to their hierarchical (broader/narrower) relationships, synonymous relationships, and related relationships,
(B-2) a step of selecting N feature words from the disease name thesaurus, and
(B-3) a step of obtaining a word semantic vector dictionary by listing, for each disease name registered in the disease name thesaurus, the feature words corresponding to the hierarchical relationships and synonymous/related relationships of that disease name.
【0038】 In step (B-1), the disease name thesaurus used is not particularly limited, but for example, approximately 36,000 disease names are classified and systematized (see Figure 8), and some of these disease names are assigned ICD-10 codes. In step (B-2), when selecting N feature words, it is preferable to choose disease names from the disease name thesaurus used that have as many branches connected by higher-level concepts as possible. The number of feature words N to be selected is not particularly limited, but for example, it is 264. 【0039】 (Word meaning vector dictionary) Thus, the word semantic vector dictionary according to the present invention is a dictionary in which disease names registered in a disease name thesaurus are described using N characteristic words selected from the thesaurus, which is a systematized classification of disease names based on their hierarchical relationships, synonymous relationships, etc. For each disease name registered in the thesaurus, the characteristic words corresponding to its hierarchical relationship and synonymous / related relationship are listed. The method for selecting disease names registered in the disease name thesaurus to be included in this word semantic vector dictionary is not particularly limited. 【0040】 Furthermore, the step of (C) obtaining the weight vector of the target text is: (C-1) Using the above word semantic vector dictionary, represent all words appearing in the above segmented progress summary with the above feature words, and assign a seed vector consisting of N types of vector values. (C-2) A step of learning N types of vector values ​​for each progress summary of the electronic medical record and obtaining the weight vector of the progress summary. Includes. 
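As an illustration of step (B-3), the following Python sketch builds a toy word semantic vector dictionary and the corresponding 0/1 vectors. The thesaurus contents, the `parents` link table, and all function names are assumptions made for this example and are not taken from the T dictionary or from the specification. The sketch only assumes that each term can reach its broader, synonymous, and related terms through such links.

```python
from typing import Dict, List, Set


def linked_terms(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every term reachable from `term` through broader/synonym/related links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def build_semantic_dictionary(
    parents: Dict[str, List[str]], feature_words: List[str]
) -> Dict[str, List[str]]:
    """(B-3) For each registered term, list the feature words it is linked to."""
    dictionary: Dict[str, List[str]] = {}
    for term in parents:
        linked = linked_terms(term, parents) | {term}
        dictionary[term] = [w for w in feature_words if w in linked]
    return dictionary


def as_binary_vector(
    term: str, dictionary: Dict[str, List[str]], feature_words: List[str]
) -> List[int]:
    """Express a dictionary entry as N ones and zeros over the feature-word axes."""
    listed = set(dictionary.get(term, []))
    return [1 if w in listed else 0 for w in feature_words]


# Toy thesaurus (hypothetical): term -> directly linked broader/related terms.
parents = {
    "disease": [],
    "heart disease": ["disease"],
    "infection": ["disease"],
    "myocarditis": ["heart disease", "infection"],
    "viral myocarditis": ["myocarditis"],
}
feature_words = ["heart disease", "infection", "lung disease"]  # N = 3 here

dictionary = build_semantic_dictionary(parents, feature_words)
vector = as_binary_vector("viral myocarditis", dictionary, feature_words)
print(dictionary["viral myocarditis"])  # ['heart disease', 'infection']
print(vector)                           # [1, 1, 0]
```

With N = 264 feature words, as in the example given for N, each entry would become a 264-dimensional vector of the same kind.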
【0041】 In the medical word semantic expression learning method according to the present invention, (C-1) The step of assigning a seed vector to all words is: (C-1-1) This step includes recursively expanding all of the above words represented by the above N feature words using the above word semantic vector dictionary. 【0042】 In the medical word semantic expression learning method according to the present invention, (C-2) The step of finding the weight vector for the progress summary is: (C-2-1) Using the above seed vector, the step of obtaining a paragraph vector represented by N types of vector values ​​of the progress summary for each electronic medical record. 【0043】 (Medical vocabulary and meaning learning program) The medical word semantic expression learning program according to the present invention, which causes a computer to execute such a medical word semantic expression learning method, is a program that causes a computer to execute steps (F), (G), (H), and (I) using a word semantic vector dictionary created by selecting N characteristic words from a disease name thesaurus and listing the characteristic words corresponding to the hierarchical relationship and synonymous / related relationship of the disease name registered in the disease name thesaurus, (F) A step of inputting the patient's "gender," "age," "department," etc., and the progress summary from the electronic medical record, which includes a discharge summary containing a summary of the progress, into the computer. (G) A step of obtaining word segmentation of the summary of events using natural language processing, (H) Using the word semantic vector dictionary, represent all words appearing in the segmented progress summary with the feature words, and assign a seed vector consisting of N types of vector values. (I) A step of learning N types of vector values ​​for each progress summary of the electronic medical record and obtaining a weight vector for the progress summary, Includes. 
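The recursive expansion of step (C-1-1) can be pictured with the small Python sketch below. It assumes binary seed values and a dictionary given as plain Python mappings; both are simplifications chosen for this illustration, and the names `expand_seed_vectors`, `dictionary`, and `feature_words` are hypothetical. Whenever a feature word that appears in a word's vector has a dictionary entry of its own, that entry is substituted into the word's vector, and the process repeats until no vector changes.

```python
from typing import Dict, List


def expand_seed_vectors(
    dictionary: Dict[str, List[str]], feature_words: List[str]
) -> Dict[str, List[int]]:
    """Assign each word a seed vector and expand it recursively to a fixed point."""
    index = {w: i for i, w in enumerate(feature_words)}
    seeds: Dict[str, List[int]] = {}
    for word, listed in dictionary.items():
        vec = [0] * len(feature_words)
        for feature in listed:
            vec[index[feature]] = 1
        seeds[word] = vec

    changed = True
    while changed:  # terminates because components only ever switch from 0 to 1
        changed = False
        for word, vec in seeds.items():
            for i, feature in enumerate(feature_words):
                if vec[i] and feature in seeds and feature != word:
                    for j, value in enumerate(seeds[feature]):
                        if value and not vec[j]:
                            vec[j] = 1
                            changed = True
    return seeds


feature_words = ["respiratory disease", "lung disease", "infection"]
dictionary = {
    "pneumonia": ["lung disease", "infection"],
    "lung disease": ["respiratory disease"],  # a feature word with its own entry
    "cough": ["respiratory disease"],
}

seeds = expand_seed_vectors(dictionary, feature_words)
print(seeds["pneumonia"])     # [1, 1, 1]
print(seeds["lung disease"])  # [1, 0, 0]
```

Here "pneumonia" inherits "respiratory disease" through "lung disease", which is the kind of substitution the recursive expansion describes.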
【0044】 As explained above, the medical word semantic expression learning method according to the present invention uses a word semantic vector dictionary created from a disease name thesaurus to obtain weight vectors for the progress summaries of electronic medical records. Each weight vector is expressed by N types of vector values in the space spanned by the N feature words, which are higher-level concepts selected from the disease name thesaurus.
(1-2) Method for creating a disease name-learned model
【0045】 This section describes how weight vectors are obtained from the progress summaries of the training electronic medical records using the medical word semantic expression learning method of Section (1-1), and how machine learning is performed on them to create a disease name-learned model (see Figure 1).
【0046】 In the method for creating a disease name-learned model according to the present invention, the step (A-1) of acquiring an electronic medical record into memory in the medical word semantic expression learning method includes:
(A-1-1) a step of acquiring into memory L1 training electronic medical records (L1≧2, where L1 is a natural number).
Furthermore, the step (C-2) of obtaining the weight vector of the progress summary may include:
(C-2-2) a step of obtaining the weight vector of the progress summary of each training electronic medical record.
【0047】 Furthermore, in the method for creating a disease name-learned model according to the present invention, machine learning is performed using information including the weight vectors of the progress summaries of the training electronic medical records obtained in step (C-2-2) as explanatory variables, so that the vector of explanatory variables is associated with the disease name code of the diagnosed disease name in the training electronic medical record, which is the target variable.
【0048】 Here, the explanatory variables are, for example, information such as "gender," "age," and "department name" included in the electronic medical record, but the items are not particularly limited as long as they are information included in the electronic medical record. In this way, if, for example, three pieces of information have N1 components and are added to the weight vector of the progress summary of the training electronic medical record as explanatory variables, the weight vector of the explanatory variables becomes an (N+N1) dimensional vector. 【0049】 Next, we will explain machine learning. In the method for creating a disease name trained model according to the present invention, (D) The steps for performing machine learning are: (D-1) The step of selecting M diagnostic disease names from the above L1 training electronic medical records (L1≧M), (D-2) The step of inputting all selected explanatory variables from the training electronic medical records into a support vector machine (SVM), (D-3) A step of obtaining the disease name code of the diagnostic diagnoses of all the selected training electronic medical records as the target variable from a Support Vector Machine (SVM), Includes. 【0050】 In step (D-1), there is no limit to the number of M diagnostic diagnoses selected from L1 training electronic medical records. Since the frequency of occurrence of diagnostic diagnoses is known to have a long tail, it is desirable from the standpoint of computational economy and efficiency to limit the number of diagnostic diagnoses by, for example, M=20. However, M may be set to an unlimited number, representing the number of diagnostic diagnoses that appear in L1 training electronic medical records. 【0051】 In step (D-2), Support Vector Machines (SVMs) are a type of machine learning technique, and the machine learning techniques used are not particularly limited. 
Also, in step (D-3), the disease name codes obtained as the target variable are the disease name codes representing the diagnosed disease names included in the training electronic medical records, and ICD-10 codes are preferred.
【0052】 As described above, the method for creating a disease name-learned model according to the present invention performs machine learning on explanatory vectors, which include the weight vectors of the progress summaries of the training electronic medical records, and obtains the disease name codes of the diagnosed disease names corresponding to those explanatory vectors as the target variable. In other words, for all training electronic medical records used in machine learning, a disease name-learned model is obtained that represents the correspondence between the explanatory variables, which include the weight vector of the progress summary, and the target variable, which is the disease name code of the diagnosed disease name.
(1-3) Interpretable disease name estimation method
【0053】 The disease name estimation method according to the present invention estimates disease name codes for evaluation electronic medical records that have undergone the word semantic expression learning of Section (1-1) by referring to the disease name-learned model obtained from the training electronic medical records in Section (1-2), and also obtains the disease names (feature words) that are their higher-level concepts.
That is, as shown in Figure 1, the disease name estimation method of the present invention performs steps (A) to (C) on both the evaluation electronic medical records and the training electronic medical records to obtain the weight vectors of their respective explanatory variables. It then obtains a disease name-learned model from the explanatory variables of the training electronic medical records by step (D), and performs step (E) on the explanatory variables of the evaluation electronic medical record, referring to that disease name-learned model, to estimate the disease name code and obtain the disease name (feature word) that is its higher-level concept.
【0054】 Therefore, the disease name estimation method according to the present invention is structured as follows:
(Step 1) For the evaluation electronic medical record acquired into memory, perform steps (A) to (C) and obtain the weight vector of the explanatory variables.
(Step 2) For the training electronic medical records acquired into memory, perform steps (A) to (C) and obtain the weight vectors of the explanatory variables.
(Step 3) Perform step (D) on the explanatory variables of the training electronic medical records to obtain a disease name-learned model.
(Step 4) Perform step (E) on the explanatory variables of the evaluation electronic medical record: estimate the disease name code by referring to the disease name-learned model obtained in (Step 3), and obtain the higher-level disease name (feature word) from the weight vector of the explanatory variables.
(Step 1) through (Step 4) are explained below.
【0055】 (Step 1) and (Step 2) are as defined for the medical word semantic expression learning method in Section (1-1) above, namely Step (A) ((A-1) to (A-3)), Step (B) ((B-1) to (B-3)), and Step (C) ((C-1), (C-2)).
Here, the step (A-1) of acquiring an electronic medical record into memory includes:
(A-1-1) a step of acquiring into memory L1 training electronic medical records (L1≧2, where L1 is a natural number), and
(A-1-2) a step of acquiring into memory an evaluation electronic medical record,
and the step (C-2) of obtaining the weight vector of the progress summary includes:
(C-2-2) a step of obtaining the weight vector of the progress summary of each training electronic medical record, and
(C-2-3) a step of obtaining the weight vector of the progress summary of the evaluation electronic medical record.
【0056】 Next, (Step 3) performs machine learning using information including the weight vectors of the progress summaries of the training electronic medical records obtained in step (C-2-2) above as explanatory variables, thereby creating a disease name-learned model in which the vectors of explanatory variables correspond to the disease name codes of the diagnosed disease names in the training electronic medical records, which are the target variables. Here, (D) the step of performing machine learning includes:
(D-1) a step of selecting M diagnosed disease names from the L1 training electronic medical records (L1≧M),
(D-2) a step of inputting the explanatory variables of all selected training electronic medical records into a support vector machine (SVM), and
(D-3) a step of obtaining, from the support vector machine (SVM), the disease name codes of the diagnosed disease names of all selected training electronic medical records as the target variable.
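Steps (D-1) to (D-3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the examples: the toy records, the two-dimensional vectors, and the hyperparameters are invented, and the classifier is a simplified one-vs-rest linear SVM trained by subgradient descent on the hinge loss, written out only so that the sketch needs no external library. In practice an established SVM implementation would normally be used. The ICD-10 codes appear purely as labels.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Tuple[List[float], str]  # (explanatory vector, disease name code)


def select_top_diagnoses(codes: Sequence[str], m: int) -> List[str]:
    """(D-1) Keep the M most frequent disease name codes (long-tailed distribution)."""
    return [code for code, _ in Counter(codes).most_common(m)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(
    X: List[List[float]], y: List[str],
    epochs: int = 200, lr: float = 0.1, lam: float = 0.01,
) -> Dict[str, List[float]]:
    """(D-2)/(D-3) One-vs-rest linear SVM via hinge-loss subgradient descent."""
    classes = sorted(set(y))
    weights = {c: [0.0] * (len(X[0]) + 1) for c in classes}  # last entry = bias
    for _ in range(epochs):
        for x, label in zip(X, y):
            xb = list(x) + [1.0]
            for c in classes:
                sign = 1.0 if c == label else -1.0
                w = weights[c]
                violated = sign * dot(w, xb) < 1.0  # inside margin or misclassified
                for i in range(len(w)):
                    w[i] *= 1.0 - lr * lam          # L2 shrink (bias included, for brevity)
                    if violated:
                        w[i] += lr * sign * xb[i]   # hinge-loss subgradient step
    return weights


def predict(weights: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the disease name code whose one-vs-rest score is largest."""
    xb = list(x) + [1.0]
    return max(weights, key=lambda c: dot(weights[c], xb))


# Toy training records (hypothetical values).
records: List[Record] = [
    ([1.0, 0.0], "I50"), ([0.9, 0.1], "I50"),
    ([0.0, 1.0], "J18"), ([0.1, 0.9], "J18"),
    ([0.5, 0.5], "K35"),  # rare diagnosis, dropped when M = 2
]

top_codes = select_top_diagnoses([code for _, code in records], m=2)
selected = [(x, code) for x, code in records if code in top_codes]
model = train_linear_svm([x for x, _ in selected], [code for _, code in selected])

print(top_codes)                   # ['I50', 'J18']
print(predict(model, [0.8, 0.2]))  # I50
```

Restricting training to the M most frequent diagnoses mirrors the computational-economy argument made for the long-tailed frequency of diagnosed disease names.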
【0057】 (Step 4) creates the vector of explanatory variables from the weight vector of the progress summary of the evaluation electronic medical record obtained in step (C-2-3) above, and estimates the disease name corresponding to those explanatory variables by referring to the disease name-learned model. It includes:
(E-1) a step of looking up the explanatory variables of the evaluation electronic medical record in the disease name-learned model,
(E-2) a step of obtaining the disease name code, which is the target variable, from the disease name-learned model, and
(E-3) a step of selecting feature words with large weights from the weight vector of the evaluation electronic medical record.
【0058】 By performing (Step 1) to (Step 4) above, the interpretable disease name estimation method according to the present invention can obtain the disease name, expressed as a feature word, that is a higher-level concept of the disease name code obtained from the disease name-learned model. In other words, by selecting feature words with large weights from the weight vector of the progress summary of the evaluation electronic medical record, it is possible to find the disease name that is a higher-level concept of the disease name (disease name code) estimated from the disease name-learned model, and thereby to see the basis for that estimate.
(2) Interpretable disease name estimation system
【0059】 Chapter (2) explains, with reference to Figure 3, the system that realizes the interpretable disease name estimation method according to the present invention described in Chapter (1).
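Looking back at step (E-3) of Chapter (1), the selection of heavily weighted feature words can be sketched in a few lines of Python. The feature words, the weights, the estimated code, and the function names `top_feature_words` and `explain_estimate` are all hypothetical; the number k of feature words to report is likewise an assumption, since the text only speaks of feature words "with large weights".

```python
from typing import Dict, List, Sequence, Tuple


def top_feature_words(
    weight_vector: Sequence[float], feature_words: Sequence[str], k: int = 3
) -> List[Tuple[str, float]]:
    """(E-3) Return the k feature words with the largest weights, largest first."""
    ranked = sorted(
        zip(feature_words, weight_vector), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


def explain_estimate(
    estimated_code: str,
    weight_vector: Sequence[float],
    feature_words: Sequence[str],
    k: int = 3,
) -> Dict[str, object]:
    """Pair an estimated disease name code with its supporting feature words."""
    return {
        "disease_name_code": estimated_code,
        "higher_level_concepts": top_feature_words(weight_vector, feature_words, k),
    }


feature_words = ["heart disease", "infection", "lung disease", "diabetes"]
weights = [0.12, 0.55, 0.31, 0.02]  # weight vector of one evaluation record (toy values)

result = explain_estimate("J18", weights, feature_words, k=2)
print(result["higher_level_concepts"])
# [('infection', 0.55), ('lung disease', 0.31)]
```

The estimated code would come from steps (E-1) and (E-2); it is passed in here as a plain string to keep the sketch independent of any particular model.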
【0060】 (An interpretable disease name estimation system) The interpretable disease name estimation system according to the present invention is a disease name estimation system in which a device A at medical institution a creates a disease name learned model that represents the correspondence between explanatory variables, including the weight vectors of each of the accumulated training electronic medical records, and the disease name code of each diagnostic disease name, which is the objective variable, using a plurality of accumulated training electronic medical records; a database B that stores the disease name learned model created by device A; and a device C at medical institution c that has an evaluation electronic medical record, all of which are connected via a network line. 【0061】 Device A uses a word semantic vector dictionary, in which disease names are represented by N feature words selected from a disease name thesaurus, to represent all words appearing in the segmented progress summaries of the multiple training electronic medical records collected above using the above feature words, and assigns a seed vector consisting of N types of vector values. This is done for each discharge summary of the training electronic medical record, and the weight vector of the progress summary of each training electronic medical record is obtained. Then, machine learning is performed using the information including the obtained weight vector as explanatory variables, and the disease name trained model created in this way is sent to database B. 【0062】 Database B stores the disease name trained model transmitted from device A in its memory. 
【0063】 Device C uses the word semantic vector dictionary created by Device A to represent all words appearing in the segmented progress summary of the evaluation electronic medical record using the feature words, assigns a seed vector consisting of N types of vector values, and obtains a weight vector for the progress summary of the evaluation electronic medical record. Then, using the information including the obtained weight vector as an explanatory variable, it refers to the disease name trained model in database B and obtains the disease name code corresponding to the explanatory variable as the target variable. 【0064】 Thus, in the interpretable disease name estimation system according to the present invention, device C of medical institution c can obtain the disease name of a feature word that is a higher-level concept of the disease name code obtained from the disease name learned model by selecting a feature word with a large weight from the weight vector contained in the evaluation electronic medical record. 【0065】 (Disease name trained model) In the interpretable disease name estimation system according to the present invention described above, the previously mentioned disease name learned model is created by device A (medical institution a) as shown in Figure 3, and transmitted to database B for storage. 【0066】 First, device A selects N characteristic words from a disease name thesaurus and creates a word semantic vector dictionary that lists the characteristic words corresponding to the hierarchical relationship and synonymous / related relationships of disease names registered in the disease name thesaurus. 
【0067】 Then, using this word semantic vector dictionary, all words appearing in the segmented progress summaries of the multiple training electronic medical records are represented by the feature words, a seed vector consisting of N types of vector values is assigned, and the N types of vector values are learned for each progress summary of the training electronic medical records to obtain the weight vector of that progress summary. Then, by performing machine learning with the information including the obtained weight vectors as explanatory variables, the disease name code, which is the target variable, is obtained.
【0068】 The disease name-learned model according to the present invention is the data of a correspondence table obtained by determining the target variable for the explanatory variables of every training electronic medical record collected by device A (medical institution a), and classifying the correspondence between the explanatory variables and the disease name codes (target variables).
(3) Examples of interpretable disease name estimation
【0069】 Chapter (3) further explains the interpretable disease name estimation method according to the present invention described above with reference to examples.
(3-1) Examples of word semantic expression learning
[Examples] 【0070】 (Step of acquiring electronic medical records into memory) This embodiment corresponds, within step (A) of processing the data of the target text, to
(A-1) the step of acquiring into memory an electronic medical record that contains the patient's "gender," "age," "department," etc., and a discharge summary that includes the diagnosed disease name, represented by a disease name code, and a progress summary,
and includes:
(A-1-1) a step of acquiring into memory L1 training electronic medical records (L1≧2, where L1 is a natural number), and
(A-1-2) a step of acquiring into memory an evaluation electronic medical record.
【0071】 In this embodiment, the training electronic medical records and the evaluation electronic medical records were discharge summaries covering a 16-year period from a hospital that changed its electronic medical record system during that period. The training electronic medical records came from the old electronic medical record system, with L1=73,150 records and 3,204 diagnosed disease names in total after data cleansing. The evaluation electronic medical records came from the new electronic medical record system, with L2=48,911 records and 2,849 diagnosed disease names in total. The data cleansing conditions were as follows.
• Records with missing values were deleted.
• Fields not used as explanatory variables were removed.
• Rare disease names accounting for less than 0.02% of the total number of records were removed.
• Records whose progress summary contains fewer than 50 characters were deleted.
[Examples] 【0072】 (Step of acquiring the disease name thesaurus into memory) This embodiment corresponds, within step (B) of creating a word semantic vector dictionary, to step (B-1): a step of acquiring into memory a disease name thesaurus in which disease names are classified and systematized according to their hierarchical (broader/narrower) relationships, synonymous relationships, and related relationships.
【0073】 (Disease name thesaurus) The disease name thesaurus used is the "T dictionary", whose entry structure is shown in Figure 8 (see Non-Patent Literature 3). In Figure 8, the top row item "Code" has the form "(Category Code) + up to 14 digits (7 levels)", as shown in the example in Figure 9, and the second row item "Classification" is the classification of the term (Classification 1: priority word; Classifications 2-7: synonyms).
In the T dictionary, each term is linked, through the "Other Superordinate Code" item (7th item), to the codes of broader terms other than the broader term embedded in its own code, and, through the "Related Term Code" item (8th item), to the group of codes of its related terms.
[Examples] 【0074】 (Step of selecting feature words) This embodiment corresponds to step (B-2), selecting N feature words from the disease name thesaurus, within step (B) of creating a word semantic vector dictionary.
【0075】 (N feature words) In the "Example of Word Semantic Representation Learning" section (3-1), three sets of feature words were used: 264 words taken from a general conceptual classification, 264 disease names selected from the disease name thesaurus, and 458 disease names selected from the same thesaurus. The "T dictionary" mentioned above was used as the disease name thesaurus. Examples of general feature words and disease name feature words are shown in [Table 1].
【0076】 [Table 1]
【0077】 The 264 disease name feature words selected from the T dictionary (disease name thesaurus) were priority words of five characters or fewer taken from all 7 levels, chosen so that higher-level concepts with more related words (synonyms and lower-level words) came first. Similarly, the 458 disease name feature words selected from the same T dictionary were priority words of five characters or fewer taken from levels 1 to 6.
[Examples] 【0078】 (Step of selecting feature words) This embodiment corresponds, within step (B) of creating a word semantic vector dictionary, to step (B-3): a step of obtaining a word semantic vector dictionary by listing the feature words corresponding to the hierarchical relationships and synonymous/related relationships of each disease name registered in the disease name thesaurus.
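The selection rule of paragraph 【0077】 might be sketched as below. This is one possible reading only: the `ThesaurusEntry` fields, the precomputed `related_count`, and the ranking by that count are assumptions introduced for illustration, and the toy entries are not taken from the T dictionary. The sketch keeps priority words (Classification 1) of at most five characters within a level range and ranks them by how many related words they have.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThesaurusEntry:
    term: str
    classification: int  # 1 = priority word, 2-7 = synonyms
    level: int           # hierarchy level, 1 (top) to 7
    related_count: int   # synonyms plus lower-level words (assumed precomputed)


def select_feature_words(
    entries: List[ThesaurusEntry], n: int, max_level: int = 7, max_chars: int = 5
) -> List[str]:
    """Pick up to n short priority words, those with more related words first."""
    candidates = [
        e for e in entries
        if e.classification == 1 and e.level <= max_level and len(e.term) <= max_chars
    ]
    candidates.sort(key=lambda e: e.related_count, reverse=True)
    return [e.term for e in candidates[:n]]


# Toy entries (hypothetical counts); Japanese terms are used so that the
# five-character limit is meaningful.
entries = [
    ThesaurusEntry("心疾患", 1, 2, 120),        # heart disease
    ThesaurusEntry("感染症", 1, 1, 300),        # infectious disease
    ThesaurusEntry("肺炎", 1, 3, 45),           # pneumonia
    ThesaurusEntry("急性心筋梗塞", 1, 4, 500),  # six characters, so excluded
    ThesaurusEntry("心臓病", 2, 2, 120),        # synonym, so excluded
]

selected_words = select_feature_words(entries, n=2)
print(selected_words)  # ['感染症', '心疾患']
```

Calling the function with `max_level=6` would correspond to the variant that draws from levels 1 to 6.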
【0079】 (Word meaning vector dictionary) In this embodiment, a medical word semantic vector dictionary was created using disease names from the T dictionary, as shown in Figure 4. During this process, conversion to the ICD-10 standard form resulted in the duplication of eight feature words, reducing the number of disease names (disease names in the medical word semantic vector dictionary) from 36,768 to 31,033. Note that when creating this medical word semantic vector dictionary, the conversion of disease names from the T dictionary to ICD-10 requires that the disease names appearing in the word segmentation obtained in the progression summary described in (A-3) above be converted to ICD-10 beforehand. 【0080】 The medical word semantic vector dictionary shown in Figure 4 may also be quantified and represented as an N-dimensional vector. If the disease names in dictionary T are represented by vectors in an N-dimensional space spanned by axes of N feature words, then (disease names in dictionary T) can be represented by N 1s and 0s, such as (1,0,0,····,1,0). [Examples] 【0081】 (Step to determine the weight vector for the progress summary) In this embodiment, in step (C), the step of finding the weight vector of the target text, (C-1) Using the above word semantic vector dictionary, represent all words appearing in the above segmented progress summary with the above feature words, and assign a seed vector consisting of N types of vector values, (C-2) This corresponds to the step of learning N types of vector values ​​for each progress summary of the electronic medical record and determining the weight vector of the progress summary. 【0082】 In semantic representation learning, seed vectors were assigned using a medical word semantic vector dictionary, and the weights of feature words for the course summary were learned. 
Figure 5 shows an example of seed vector assignment using general feature words as an example (top) and an example of course summary weights using disease name feature words as an example (bottom). 【0083】 (Assigning a seed vector) In the "Example of Word Semantic Representation Learning" section (3-1), the step of assigning seed vectors to all words (C-1) includes the step of recursively expanding all words represented by the N feature words using the word semantic vector dictionary (C-1-1). 【0084】 Specifically, in the upper part of Figure 5, all words appearing in all progress summaries of the training electronic medical record are listed on the left, and each word is represented by a vector in an N-dimensional space of feature words using a medical word semantic vector dictionary. Since feature words also appear among the words listed on the left of the upper part of Figure 5, the seed vector changes by substituting them into the vector components of other words. By recursively repeating this operation, a constant seed vector can be assigned to all the words listed on the left of the upper part of Figure 5. 【0085】 Furthermore, the step of (C-2) obtaining the weight vector of the progress summary includes (C-2-1) the step of using the seed vector to obtain a paragraph vector represented by N types of vector values ​​of the progress summary of each electronic medical record. The paragraph vector is an N-dimensional vector that represents the progress summary itself with N types of vector values ​​of characteristic words, and an example of the weight of a progress summary using characteristic words for disease names is shown in the lower part of Figure 5. 
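To make the idea of a progress-summary weight vector concrete, the sketch below turns the seed vectors of the words in one segmented progress summary into an N-dimensional vector. It is deliberately not the learning procedure of step (C-2): the text describes learning the N vector values as a paragraph vector, whereas this stand-in simply sums the seed vectors of the words and normalises the result. The seed values, tokens, and function name are hypothetical.

```python
from typing import Dict, List, Sequence


def summary_weight_vector(
    tokens: Sequence[str], seeds: Dict[str, List[float]], n: int
) -> List[float]:
    """Stand-in for (C-2): normalised sum of the seed vectors of the summary's words."""
    total = [0.0] * n
    for token in tokens:
        seed = seeds.get(token)
        if seed is None:  # word without a dictionary-derived seed vector
            continue
        for i, value in enumerate(seed):
            total[i] += value
    norm = sum(total)
    return [value / norm for value in total] if norm else total


feature_words = ["respiratory disease", "lung disease", "infection"]
seeds = {
    "pneumonia": [1.0, 1.0, 1.0],
    "cough": [1.0, 0.0, 0.0],
    "fever": [0.0, 0.0, 1.0],
}
tokens = ["patient", "cough", "fever", "pneumonia", "cough"]  # segmented summary

weights = summary_weight_vector(tokens, seeds, n=len(feature_words))
for word, weight in zip(feature_words, weights):
    print(f"{word}: {weight:.3f}")
```

Whatever procedure produces the values, the result has the same shape as in the lower part of Figure 5: one weight per feature word for each progress summary.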
(3-2) Examples of disease name-learned models
【0086】 In this section (3-2), information including the weight vectors of the progress summaries of the training electronic medical records obtained in section (3-1) above is used as explanatory variables, and machine learning is performed to create a disease name-learned model in which the vectors of explanatory variables correspond to the disease name codes of the diagnosed disease names in the training electronic medical records.
[Examples] 【0087】 (Disease name-learned model) This embodiment corresponds, within step (D) of performing machine learning, to:
(D-1) the step of selecting M diagnosed disease names from the L1 training electronic medical records (L1≧M),
(D-2) the step of inputting the explanatory variables of all selected training electronic medical records into a support vector machine (SVM), and
(D-3) the step of obtaining, from the support vector machine (SVM), the disease name codes of the diagnosed disease names of all selected training electronic medical records as the target variable.
【0088】 In this embodiment, N=264 feature words were selected from a general-purpose dictionary and from the T dictionary, respectively, and M=20 diagnosed disease names were selected from the L1=73,150 training electronic medical records (step (D-1)).
【0089】 Next, the weight vector of the progress summary obtained in section (3-1) above was extended with 24 additional explanatory-variable dimensions: age (1 dimension, a real number obtained by dividing the age by 100), gender (2 dimensions: male, female), and medical department (21 dimensions, for 21 medical departments). This (N+24)-dimensional explanatory variable vector was then input into a support vector machine (SVM) (step (D-2)).
【0090】 Then, the support vector machine (SVM) was given, as the target variable, the disease name codes (20 types) of the diagnosed disease names of the 11,839 training electronic medical records that carry the M=20 selected disease names, out of the L1=73,150 training records (step (D-3)).
As described above, a disease name-trained model representing the correspondence between the explanatory variables and the target variable was obtained for the 11,839 selected training electronic medical records.
(3-3) Examples of interpretable disease name estimation
【0091】 From Examples 1 to 6 described above, a disease name-learned model was obtained for the 11,839 selected training electronic medical records. The model is a correspondence table between the explanatory variables (the weight vector of the progress summary and the age, sex, and clinical department vectors) and the target variable (the disease name code of the diagnosed disease). These steps are shown in the upper half of Figure 1, and the resulting disease name-learned model is shown in the middle right of Figure 1.
【0092】 In this section (3-3), interpretable disease name estimation according to the present invention is explained using examples, with reference to the disease name-learned model obtained up to section (3-2) above, and the F-score obtained from this disease name estimation is evaluated.
[Examples]
【0093】 (Machine learning) As shown in the flowchart in Figure 7, semantic representation learning as described in section (3-1) above was performed on the evaluation electronic medical records (new electronic medical records) and the training electronic medical records (old electronic medical records), and weight vectors were obtained for the progress summaries of all electronic medical records in both sets. Training data was then created for the M = 20 disease names (11,839 cases) among the L1 = 73,150 training electronic medical records (old electronic medical records), and machine learning as described in section (3-2) was performed (left column of Figure 7) to obtain a disease name-learned model (bottom center of Figure 7).
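The (N+24)-dimensional explanatory variable vector of step (D-2) can be sketched in Python as follows. The description gives the dimensions (age as one real value divided by 100, gender as 2 dimensions, medical department as 21 dimensions) but not the encoding, so the one-hot encoding used here is an assumption, and the function name and department labels are hypothetical placeholders.

```python
# Hypothetical labels for the 21 medical departments; the real names are
# not given in the text.
DEPARTMENTS = [f"dept_{i}" for i in range(21)]
GENDERS = ("male", "female")


def explanatory_vector(weight_vector, age, gender, department,
                       departments=DEPARTMENTS):
    """Append the 24 extra explanatory variables to an N-dimensional
    progress summary weight vector.

    Assumption: gender and department are one-hot encoded.
    """
    age_part = [age / 100.0]                                            # 1 dimension
    gender_part = [1.0 if gender == g else 0.0 for g in GENDERS]        # 2 dimensions
    dept_part = [1.0 if department == d else 0.0 for d in departments]  # 21 dimensions
    return list(weight_vector) + age_part + gender_part + dept_part


if __name__ == "__main__":
    n = 264  # number of feature words in the embodiment
    x = explanatory_vector([0.0] * n, age=45, gender="female", department="dept_3")
    print(len(x))  # N + 24 = 288
```

Vectors built this way, paired with the disease name codes as labels, could then be passed to any SVM implementation, for example scikit-learn's `LinearSVC` for the linear case. That library choice is an illustration and is not named in the source.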
【0094】 (Accuracy of disease name estimation (F-score)) Next, as shown in the flowchart in Figure 6, test data was created for the top 20 diagnosed disease names (11,931 cases) from the L2 = 48,911 evaluation electronic medical records (new electronic medical records). Disease name estimation was performed by referring to the disease name-learned model described above (right column of Figure 7), and its accuracy (F-score) was evaluated. Specifically, the explanatory variables age, sex, and medical department were added to the weight vector of the progress summary, the F-score of disease name estimation was calculated with both a linear SVM and a nonlinear SVM, and the two results were compared (see Figure 6).
【0095】 (Evaluation results for the F-score of disease name estimation) The results showed that selecting 264 disease names from a disease name thesaurus as feature words gave a higher evaluation in word semantic representation learning than selecting the 264 feature words from a general-purpose dictionary. Similar results were obtained when 458 disease names were used as feature words. The highest F-score was obtained when machine learning was performed using a linear SVM with the 264 disease names as feature words.
[Examples]
【0096】 (Interpretability assessment) In this embodiment 8, the interpretability of semantic representation learning using the 264 disease names as feature words was investigated. Specifically, for each of the top M = 20 disease name codes in the evaluation electronic medical records (new electronic medical records), the feature word with the largest average weight in the 264-dimensional vectors obtained by semantic representation learning of the progress summaries was examined.
As a result, the six disease names (feature words) with the largest weights were sensory organ disorders, neonatal disorders, digestive disorders, circulatory disorders, liver disorders, and hematological disorders.
[Examples]
【0097】 (Visualization of weights in the progress summary) In this embodiment 9, the progress summary weights were visualized for the top three of the six disease names (feature words) with the largest weights shown in embodiment 8: sensory organ disorders, neonatal disorders, and digestive disorders.
【0098】 As a result, the feature word "sensory organ disorder" had a particularly high weight for the estimated disease name codes H251 "senile nuclear cataract," H330 "retinal detachment, retinal tear," and H353 "macular and posterior pole degeneration." The feature word "neonatal disorder" had a particularly high weight for the estimated disease name code P034 "fetus and newborn affected by cesarean section." The feature word "digestive disorder" had a particularly high weight for the estimated disease name codes C151 "esophageal cancer," C162 "gastric body cancer," C20 "rectal cancer," and C250 "pancreatic cancer."
【0099】 This interpretability evaluation showed that learning the semantic representation of the progress summary (unsupervised learning) makes it possible to present disease names (feature words) that are higher-level concepts of the estimated disease names and that serve as the basis for the disease name estimation.
[Examples]
【0100】 (Medical word semantic expression learning application) In Figure 3, it would be convenient if medical institutions such as c1 and c2 could use their respective devices C1 and C2 to refer to the word semantic vector dictionary described above, execute the medical word semantic expression learning program, and carry out the interpretable disease name estimation method according to the present invention.
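The interpretability check of embodiments 8 and 9, finding the feature word with the largest average weight for each disease name code, is also the kind of result a device such as C1 or C2 would present as the basis of an estimate. A minimal Python sketch follows, assuming the weight vectors are already available as plain lists. The function name and the sample weights are hypothetical and are not results from the source.

```python
def top_feature_word_per_code(records, feature_words):
    """For each disease name code, average the progress summary weight
    vectors and return the feature word with the largest average weight.

    `records` is an iterable of (disease_name_code, weight_vector) pairs,
    where each weight_vector has one value per entry of `feature_words`.
    """
    sums = {}
    counts = {}
    for code, weights in records:
        acc = sums.setdefault(code, [0.0] * len(feature_words))
        for i, w in enumerate(weights):
            acc[i] += w
        counts[code] = counts.get(code, 0) + 1

    result = {}
    for code, acc in sums.items():
        means = [s / counts[code] for s in acc]
        best = max(range(len(means)), key=means.__getitem__)
        result[code] = (feature_words[best], means[best])
    return result


if __name__ == "__main__":
    # Made-up weights for illustration; real vectors would have N = 264 values.
    features = ["sensory organ disorder", "digestive disorder"]
    records = [
        ("H251", [0.75, 0.25]),
        ("H251", [0.25, 0.25]),
        ("C20", [0.0, 1.0]),
    ]
    for code, (word, weight) in top_feature_word_per_code(records, features).items():
        print(code, word, weight)
```

Returning the average weight alongside the feature word makes it possible to show how strongly the higher-level concept supports the estimate, not only which concept it is.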
【0101】 In this embodiment 10, medical institution a or its affiliated institutions may create the word semantic vector dictionary and the medical word semantic expression learning program, and upload a medical word semantic expression learning application implementing them to database B or the like. Devices C1 and C2 can download this application and use it as a word semantic expression learning engine to determine the weight vectors of the progress summaries of the evaluation electronic medical records. Alternatively, the medical word semantic expression learning application may be made executable via an API.
【0102】 With such a word semantic expression learning engine, each medical institution c can input into its device C the progress summary of an evaluation electronic medical record created based on the diagnosis, and thereby obtain a weight vector. From the weight vector of the progress summary, the disease name (feature word) of the higher-level concept with the largest weight can be obtained at the same time. Furthermore, device C can run the medical word semantic expression learning application to query database B with the weight vector of the progress summary and obtain an estimated disease name from the disease name-learned model described above.
【0103】 As described above, medical institution c, having downloaded the medical word semantic expression learning application to device C, can inform the patient of the estimated disease name based on the created progress summary and, at the same time, explain the basis for that estimation.
【0104】 The interpretable disease name estimation method and the interpretable disease name estimation system according to the present invention have been described above using embodiments and examples, but the present invention is not limited to these embodiments and examples.
【0105】 Furthermore, the present invention can be implemented in various forms with improvements, modifications, and changes based on the knowledge of those skilled in the art, without departing from its spirit.
[Industrial applicability]
【0106】 The interpretable disease name estimation method according to the present invention makes it possible, for example, to use a disease name-learned model obtained from the electronic medical records of a university hospital in a system used by a local clinic.

Claims

[Claim 1] An interpretable medical word semantic expression learning method executed by a computer equipped with a processor and memory, the method including:
(A) a step in which the processor processes the data of a target document;
(B) a step in which the processor creates a medical word semantic vector dictionary in which disease names are represented by N feature words; and
(C) a step in which the processor represents the target document using the N feature words and obtains its weight vector,
wherein step (A), in which the processor processes the data of the target document, includes:
(A-1) the processor acquires into memory an electronic medical record that contains the patient's "gender," "age," and "department name," and a discharge summary that includes the diagnosed disease name represented by a disease name code and a progress summary;
(A-2) the processor selects the progress summary as the target document; and
(A-3) the processor obtains a word segmentation of the progress summary using natural language processing,
wherein step (B), in which the processor creates the word semantic vector dictionary, includes:
(B-1) the processor acquires into memory a disease name thesaurus in which disease names are classified and systematized according to their superordinate/subordinate relationships and synonymous/related relationships;
(B-2) the processor selects N feature words from the disease name thesaurus; and
(B-3) the processor obtains the word semantic vector dictionary by listing the feature words corresponding to the hierarchical relationships and synonymous/related relationships of the disease names registered in the disease name thesaurus,
and wherein step (C), in which the processor obtains the weight vector of the target document, includes:
(C-1) the processor uses the word semantic vector dictionary to represent all words appearing in the segmented progress summary by the feature words and assigns a seed vector consisting of N vector values; and
(C-2) the processor learns the N vector values for each progress summary of the electronic medical record and obtains a weight vector for the progress summary.
[Claim 2] The medical word semantic expression learning method according to claim 1, wherein step (C-1), in which the processor assigns a seed vector to all words, includes (C-1-1) a step in which the processor recursively expands all the words represented by the N feature words using the word semantic vector dictionary.
[Claim 3] The medical word semantic expression learning method according to claim 1, wherein step (C-2), in which the processor obtains the weight vector of the progress summary, includes (C-2-1) a step in which the processor uses the seed vectors to obtain a paragraph vector, represented by N vector values, of the progress summary of each electronic medical record.
[Claim 4] The medical word semantic expression learning method according to claim 1, wherein, in step (A-1), in which the processor acquires the electronic medical record into memory, the disease name code of the electronic medical record acquired into memory is a code of the International Classification of Diseases, 10th Revision (ICD10).
[Claim 5] A method for creating a disease name-learned model, executed by a computer having a processor and memory using the medical word semantic expression learning method according to claim 1, wherein:
step (A-1), in which the processor acquires the electronic medical record into memory, includes (A-1-1) a step in which the processor acquires L1 training electronic medical records (L1 ≥ 2, where L1 is a natural number) into memory;
step (C-2), in which the processor obtains the weight vector of the progress summary, includes (C-2-2) a step in which the processor obtains the weight vectors of the progress summaries of the training electronic medical records; and
the method performs (D) a step in which the processor performs machine learning using, as explanatory variables, information including the weight vectors of the progress summaries of the training electronic medical records obtained in step (C-2-2),
thereby creating a disease name-learned model in which the weight vectors of the explanatory variables are associated with the disease name codes of the diagnosed diseases in the training electronic medical records.
[Claim 6] The method for creating a disease name-learned model according to claim 5, wherein the explanatory variables include the "gender," "age," and "medical department" information contained in the electronic medical record acquired into memory.
[Claim 7] The method for creating a disease name-learned model according to claim 5, wherein step (D), in which the processor performs the machine learning, includes:
(D-1) a step in which the processor selects M diagnosed disease names from the L1 training electronic medical records in memory (L1 ≥ M);
(D-2) a step in which the processor inputs the explanatory variables of all selected training electronic medical records into a support vector machine (SVM); and
(D-3) a step in which the processor obtains the disease name codes of the diagnosed diseases of all selected training electronic medical records from the support vector machine (SVM) as the target variable,
and wherein the processor obtains a disease name-learned model representing the correspondence between the explanatory variables and the target variable for all selected training electronic medical records.
[Claim 8] A computer equipped with a processor and memory performs the following steps (A-1) to (A-3), (B-1) to (B-3), and (C-1) to (C-2), each of these steps being:
(A-1) the processor acquires into memory an electronic medical record that contains the patient's "gender," "age," and "department name," and a discharge summary that includes the diagnosed disease name represented by a disease name code and a progress summary;
(A-2) the processor selects the progress summary as the target document;
(A-3) the processor obtains a word segmentation of the progress summary using natural language processing;
(B-1) the processor acquires into memory a disease name thesaurus in which disease names are classified and systematized according to their superordinate/subordinate relationships and synonymous/related relationships;
(B-2) the processor selects N feature words from the disease name thesaurus;
(B-3) the processor obtains a word semantic vector dictionary by listing the feature words corresponding to the hierarchical relationships and synonymous/related relationships of the disease names registered in the disease name thesaurus;
(C-1) the processor uses the word semantic vector dictionary to represent all words appearing in the segmented progress summary by the feature words and assigns a seed vector consisting of N vector values; and
(C-2) the processor learns the N vector values for each progress summary of the electronic medical record and obtains a weight vector for the progress summary,
in a medical word semantic expression learning method in which:
step (A-1), in which the processor acquires the electronic medical record into memory, includes:
(A-1-1) a step in which the processor acquires L1 training electronic medical records (L1 ≥ 2, where L1 is a natural number) into memory; and
(A-1-2) a step in which the processor acquires an evaluation electronic medical record into memory,
step (C-2), in which the processor obtains the weight vector of the progress summary, includes:
(C-2-2) a step in which the processor obtains the weight vectors of the progress summaries of the training electronic medical records; and
(C-2-3) a step in which the processor obtains the weight vector of the progress summary of the evaluation electronic medical record,
the processor performs machine learning using, as explanatory variables, information including the weight vectors of the progress summaries of the training electronic medical records obtained in step (C-2-2), thereby creating a disease name-learned model in which the weight vectors of the explanatory variables correspond to the disease name codes of the diagnosed diseases in the training electronic medical records, which are the objective variable, and
the processor creates a weight vector of explanatory variables from the weight vector of the progress summary of the evaluation electronic medical record obtained in step (C-2-3) and refers to the disease name-learned model to estimate the disease name corresponding to the explanatory variables, through the following steps:
(E-1) the processor checks the explanatory variables of the evaluation electronic medical record against the disease name-learned model;
(E-2) the processor obtains the disease name code, which is the target variable, from the disease name-learned model; and
(E-3) the processor selects feature words with large weights from the weight vector of the evaluation electronic medical record,
the method being an interpretable disease name estimation method capable of obtaining the disease name of the feature word that is a higher-level concept of the disease name code obtained from the disease name-learned model.
[Claim 9] A medical word semantic expression learning program that causes a computer to execute the following steps (F), (G), (H), and (I) using a word semantic vector dictionary created by selecting N feature words from a disease name thesaurus and listing the feature words corresponding to the hierarchical relationships and synonymous/related relationships of the disease names registered in the thesaurus:
(F) a step of inputting into the computer the patient's "gender," "age," "department," and the progress summary from an electronic medical record containing the progress summary;
(G) a step of obtaining a word segmentation of the progress summary using natural language processing;
(H) a step of using the word semantic vector dictionary to represent all words appearing in the segmented progress summary by the feature words and assigning a seed vector consisting of N vector values; and
(I) a step of learning the N vector values for each progress summary of the electronic medical record and obtaining the weight vector of the progress summary.
[Claim 10] An interpretable disease name estimation system in which
device A of medical institution a, which uses multiple accumulated training electronic medical records to create a disease name-learned model representing the correspondence between explanatory variables, which include the weight vector of each of the training electronic medical records, and the disease name code of each diagnosed disease, which is the objective variable,
database B, which stores the disease name-learned model created by device A, and
device C of medical institution c, which has an evaluation electronic medical record,
are connectable via a network line, wherein:
device A, using a word semantic vector dictionary in which disease names are represented by N feature words selected from a disease name thesaurus, represents all words appearing in the segmented progress summaries of the multiple training electronic medical records by the feature words, assigns seed vectors consisting of N vector values, obtains a weight vector of the progress summary for each discharge summary in the training electronic medical records, performs machine learning using information including the obtained weight vectors as explanatory variables, and sends the disease name-learned model thus created to database B;
database B stores in memory the disease name-learned model transmitted from device A;
device C, using the word semantic vector dictionary created by device A, represents all words appearing in the segmented progress summary of the evaluation electronic medical record by the feature words, assigns seed vectors consisting of N vector values to obtain the weight vector of the progress summary of the evaluation electronic medical record, uses information including the obtained weight vector as explanatory variables, refers to the disease name-learned model in database B, and obtains the disease name code corresponding to the explanatory variables as the target variable; and
device C of medical institution c can obtain the disease name of a feature word that is a higher-level concept of the disease name code obtained from the disease name-learned model by selecting a feature word with a large weight from the weight vector of the evaluation electronic medical record.