Medical terminology standardization method and system based on probability transfer matrix
A probability transition matrix technique, applied in the field of machine learning, addresses problems such as poor term-mapping results and the inability to handle abbreviations, and improves accuracy.
Examples
Embodiment 1
[0032] A preferred embodiment of a medical term standardization method based on a probability transition matrix of the present invention, comprising:
[0033] Construction of the medical terminology database: Disease names have a large number of aliases, and personnel without a medical background cannot recognize medical synonyms from their literal meaning. Aliases and abbreviations of medical terms are therefore collected based on the ICD10 standard, and the correspondence between terms and ICD10 codes is established, as shown in the sample table below:
[0034]
term set                   | ICD10 disease name | ICD10 code
Hyperthyroidism            | hyperthyroidism    | E05.901
hyperthyroidism            | hyperthyroidism    | E05.901
type 1 diabetes            | type 1 diabetes    | E10.900
insulin-dependent diabetes | type 1 diabetes    | E10.900
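The term-to-code correspondence in the sample table can be pictured as a simple lookup structure. The following is a minimal sketch only: the dictionary layout, the `Icd10Entry` type, and the `lookup` helper are illustrative assumptions and are not taken from the patent text.

```python
from typing import NamedTuple, Optional


class Icd10Entry(NamedTuple):
    """One standard entry: ICD10 disease name plus ICD10 code."""
    disease_name: str
    code: str


# Rows copied from the sample table; each alias points at its standard entry.
TERM_TO_ICD10 = {
    "Hyperthyroidism": Icd10Entry("hyperthyroidism", "E05.901"),
    "hyperthyroidism": Icd10Entry("hyperthyroidism", "E05.901"),
    "type 1 diabetes": Icd10Entry("type 1 diabetes", "E10.900"),
    "insulin-dependent diabetes": Icd10Entry("type 1 diabetes", "E10.900"),
}


def lookup(term: str) -> Optional[Icd10Entry]:
    """Exact-match lookup; returns None when the term is not in the database."""
    return TERM_TO_ICD10.get(term)


entry = lookup("insulin-dependent diabetes")
print(entry.disease_name, entry.code)
```

An exact-match table like this only resolves aliases that were collected in advance, which is why the method goes on to segment terms and build a matrix over words.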
[0035] Carry out word segmentation and part-of-speech tagging for medical terms in the medical terminology database;
[0036] Construct an m×n matrix H, and the column names of the matrix represent the ...
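The segmentation step and the matrix frame can be sketched as follows. This is a rough illustration under stated assumptions: whitespace splitting stands in for the actual word segmentation (part-of-speech tagging is omitted), the function names are hypothetical, and the matrix elements are left at zero because the definition of the elements of H is truncated in this text.

```python
from typing import List, Tuple


def segment(term: str) -> List[str]:
    """Stand-in segmenter (assumption): lowercase and split on whitespace."""
    return term.lower().split()


def build_frame(terms: List[str]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (row labels, column labels, zero-filled m x n matrix H).

    Rows are the terms of the database (m of them); columns are the
    deduplicated words obtained after segmentation (n of them).
    """
    segmented = [segment(t) for t in terms]
    vocabulary = sorted({word for words in segmented for word in words})
    # Element values are intentionally not filled in: the source text
    # breaks off before defining them.
    H = [[0.0] * len(vocabulary) for _ in terms]
    return list(terms), vocabulary, H


rows, columns, H = build_frame(
    ["hyperthyroidism", "type 1 diabetes", "insulin-dependent diabetes"]
)
print(len(H), "x", len(H[0]))  # m x n
print(columns)
```

Sorting the vocabulary is a design choice made here only to give the columns a stable order; the patent text does not specify a column ordering.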
Embodiment 2
[0048] This embodiment is a medical term standardization method based on a probability transition matrix that builds on Embodiment 1. Since most ICD10 standard disease names take the form of phrases, they can be segmented more finely; for example, 'thyroid hyperfunction' can be further divided into three words {thyroid, function, hyperfunction}. Fine-grained word segmentation can greatly increase the tolerance of the model to writing errors. For example, when a term contains only one typo ('zhuang'), a computer that treats the term as a whole will consider the misspelled term and the correctly written 'hyperthyroidism' to be completely different terms; if similarity is instead compared after word segmentation, the two still have a similarity of 66% from the perspective of word repetition (two of the three words match), which greatly improves the tolerance of the model to typos. In order to further improve the tolerance of the model, we introduce the method of cutting charac...
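The 66% figure can be reproduced with a small word-overlap calculation. This is a sketch under assumptions: the paragraph does not state the exact similarity formula, so the share of the standard term's words that reappear in the candidate is used here as one reading that yields two out of three; the misspelling `hyperfunctoin` is an invented English stand-in for the typo described above.

```python
from typing import Iterable


def word_overlap(candidate_words: Iterable[str], standard_words: Iterable[str]) -> float:
    """Fraction of the standard term's words that also occur in the candidate."""
    standard = set(standard_words)
    if not standard:
        return 0.0
    return len(standard & set(candidate_words)) / len(standard)


standard = ["thyroid", "function", "hyperfunction"]
with_typo = ["thyroid", "function", "hyperfunctoin"]  # one misspelled word

# Compared as whole strings, the two terms simply differ.
whole_term_match = " ".join(with_typo) == " ".join(standard)

# Compared word by word, two of the three words still match.
similarity = word_overlap(with_typo, standard)

print(whole_term_match)
print(round(similarity, 2))
```

A symmetric measure such as Jaccard similarity would give a different number for the same pair (2 shared words out of 4 distinct words), so the choice of denominator matters when comparing against the figure quoted in the text.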
Embodiment 3
[0052] This embodiment is a medical term standardization system based on a probability transition matrix, used to implement the method of Embodiment 1 or 2, comprising:
[0053] The medical terminology database stores the aliases and abbreviations of medical terms based on the ICD10 standard, and records the correspondence between terms and ICD10 codes;
[0054] The medical word segmentation and part-of-speech tagging unit is used for word segmentation and part-of-speech tagging of medical terms in the medical terminology database;
[0055] The probability transition matrix frame construction unit is used to construct the m×n matrix H. The column names of the matrix represent the complete set N of words, where n is the total number of words in the medical terminology database after word segmentation and deduplication; each row represents a term in the medical terminology database (the term set M), where m is the number of terms in the medical terminology database; matrix element H ...