Medical terminology standardization method and system based on probability transfer matrix

A probability transition matrix technique, applied in the field of machine learning, addresses problems such as poor mapping performance and the inability to handle abbreviations, and improves mapping accuracy.

Active Publication Date: 2018-12-18
上海金仕达卫宁软件科技有限公司

AI Technical Summary

Problems solved by technology

However, this method incurs high labor costs in the early stage, and it maps medical texts that are not included in the term base very poorly.
[0004] Experts and scholars have also tried to improve coding efficiency through automatic coding. For example, Bao Qingsheng, Cheng Shaoyin and Jiang Fan proposed a vocabulary-based text similarity coding method. It attempts to map medical diseases to suborders of the ICD10 coding and achieved 79% suborder accuracy, but it cannot handle common abbreviations, common medical terms, and the like.



Examples


Embodiment 1

[0032] A preferred embodiment of the medical term standardization method based on a probability transition matrix of the present invention comprises:

[0033] Construction of the medical terminology database: Because a large number of disease names have aliases, personnel without a medical background cannot identify medical synonyms from the literal meaning alone. Aliases and abbreviations of medical terms are therefore stored, and the correspondence between terms and ICD10 codes is formed, as shown in the sample table below:

[0034]

Term set                      ICD10 disease name    ICD10 code
Hyperthyroidism               hyperthyroidism       E05.901
hyperthyroidism               hyperthyroidism       E05.901
type 1 diabetes               type 1 diabetes       E10.900
insulin-dependent diabetes    type 1 diabetes       E10.900
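The correspondence between terms and ICD10 codes in the sample table can be pictured as a simple lookup structure. The following is a minimal sketch, not the patented implementation; the names `TERM_BASE` and `lookup` are hypothetical, and lower-casing the input is an assumption made only for this illustration.

```python
# Hypothetical in-memory term base built from the sample table.
# Keys are aliases or abbreviations (lower-cased, an assumption of this sketch);
# values are (ICD10 disease name, ICD10 code).
TERM_BASE = {
    "hyperthyroidism": ("hyperthyroidism", "E05.901"),
    "type 1 diabetes": ("type 1 diabetes", "E10.900"),
    "insulin-dependent diabetes": ("type 1 diabetes", "E10.900"),
}


def lookup(term):
    """Return (standard name, ICD10 code) for an exact alias match, else None."""
    return TERM_BASE.get(term.strip().lower())


result = lookup("Insulin-dependent diabetes")
```

An exact lookup of this kind only covers terms already present in the term base, which is the limitation the probability matrix is meant to address.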

[0035] Carry out word segmentation and part-of-speech tagging for medical terms in the medical terminology database;

[0036] Construct an m×n matrix H, and the column names of the matrix represent the ...

Embodiment 2

[0048] This embodiment builds on the medical term standardization method based on the probability transfer matrix described in Embodiment 1. Since most ICD10 standard disease names exist in the form of phrases, finer segmentation can be performed; for example, 'thyroid hyperfunction' can be further divided into three words {thyroid, function, hyperfunction}. Fine-grained word segmentation can greatly increase the tolerance of the model to writing errors. Take a misspelled form of 'thyroid hyperfunction' that contains only one typo ('zhuang'): if the term is considered as a whole, the computer treats the misspelled term and the correct term as completely different terms; if the similarity is compared after word segmentation, the two still have a similarity of 66% in terms of word repetition, which greatly improves the tolerance of the model to typos. In order to further improve the tolerance of the model, we introduce the method of cutting charac...
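The 66% figure can be reproduced with a small word-overlap calculation. This is a sketch under stated assumptions: the function name `word_overlap`, the choice of "shared words divided by words in the standard term" as the similarity measure, and the misspelled token `thyroyd` are all hypothetical stand-ins, since the source does not give the exact formula or the original typo.

```python
def word_overlap(candidate_words, standard_words):
    """Fraction of the standard term's words that also occur in the candidate.

    ASSUMPTION: this overlap ratio is one plausible reading of
    'similarity from the perspective of word repetition'.
    """
    standard = set(standard_words)
    if not standard:
        return 0.0
    return len(standard & set(candidate_words)) / len(standard)


standard = ["thyroid", "function", "hyperfunction"]
misspelled = ["thyroyd", "function", "hyperfunction"]  # hypothetical one-word typo

# Compared as whole strings, the two terms simply do not match.
whole_term_match = " ".join(misspelled) == " ".join(standard)

# Compared word by word, two of the three words still agree.
similarity = word_overlap(misspelled, standard)
```

Here `whole_term_match` is false while `similarity` is two thirds, in line with the roughly 66% quoted above.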

Embodiment 3

[0052] This embodiment is a medical term standardization system based on a probability transition matrix, used to implement the method of Embodiment 1 or 2 above, and comprises:

[0053] The medical term base stores the aliases and abbreviations of medical terms based on the ICD10 standard, and forms the corresponding relationship between terms and ICD10 codes;

[0054] The medical word cutting and part-of-speech tagging unit is used for word cutting and part-of-speech tagging of medical terms in the medical terminology database;

[0055] The probability transfer matrix frame construction unit is used to construct the m×n matrix H. The column names of the matrix represent the complete set N of words, where n is the total number of words in the medical terminology database after word cutting and deduplication; each row represents a term in the set M of terms in the medical terminology database, where m is the number of terms in the medical terminology database; matrix element H ...
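The row and column layout of H described in [0055] can be sketched as follows. The definition of the matrix element is truncated in the source, so the 0/1 indicator used here is purely an assumption for illustration, as are the function name `build_matrix_frame` and the hand-made segmentation of the two sample terms.

```python
def build_matrix_frame(segmented_terms):
    """Build the m x n frame: one row per term, one column per distinct word.

    segmented_terms: dict mapping each term to its list of words
    (assumed to be already segmented).
    """
    terms = list(segmented_terms)  # rows, m = len(terms)
    # Columns: all words after segmentation and deduplication, n = len(vocab).
    vocab = sorted({w for words in segmented_terms.values() for w in words})
    col = {w: j for j, w in enumerate(vocab)}
    H = [[0] * len(vocab) for _ in terms]
    for i, term in enumerate(terms):
        for w in segmented_terms[term]:
            # ASSUMPTION: 1 marks that the word occurs in the term; the
            # source's actual element definition is not available.
            H[i][col[w]] = 1
    return terms, vocab, H


# Hypothetical segmentation of two terms from the sample table.
segmented = {
    "type 1 diabetes": ["type", "1", "diabetes"],
    "insulin-dependent diabetes": ["insulin-dependent", "diabetes"],
}
terms, vocab, H = build_matrix_frame(segmented)
```

With these two terms, m is 2 and n is 4, and the shared word "diabetes" is the only column set in both rows.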


Abstract

The invention discloses a medical terminology standardization method and system based on a probability transfer matrix, designed to map general short text in the medical field (abbreviations, misspellings, everyday expressions, etc.) to standard medical terminology. The medical terminology standardization method based on the probability transfer matrix comprises the following steps: constructing a medical terminology database; performing medical word segmentation and part-of-speech tagging; constructing a term-based probability transfer matrix framework; constructing a word vector model; calculating a probability matrix; and calculating the probability of the terms to be matched. The invention enables quick, efficient and accurate mapping of the various diseases in the medical field to the corresponding ICD10 standard codes.

Description

Technical field

[0001] The invention relates to the field of machine learning, in particular to a medical term standardization method and system based on a probability transfer matrix.

Background technique

[0002] Clinical medical terminology is an important part of medical data, and the standardization and interchangeability of terminology is the key to the exchange and sharing of medical data. Medical terminology comes from many sources and is written in different ways, and the same concept is expressed differently in different systems. Even within the same medical institution's system, different medical personnel, or the same medical personnel on different occasions and at different times, express the same concept differently. Therefore, in order to facilitate the subsequent structured processing of medical texts, information extraction, statistical analysis and knowledge mining, as well as the sharing and exchange of medical data, accurate mapping between various expressions and stand...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G16H70/00
CPC: G16H70/00, G06F40/289
Inventor: 赵孟海, 严志华
Owner: 上海金仕达卫宁软件科技有限公司