
A medical record search method based on language model

A language-model-based search technique, applied to medical case search, that addresses the problems of increased training time, inaccurate probability estimation, and lack of model improvement, and that achieves the effect of simplifying user operations and improving relevance.

Active Publication Date: 2019-03-26
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

[0004] In practice, the training data or test data will contain words that do not exist in the model dictionary. This is very common, because the dictionary cannot, and need not, include every word. In the n-gram model, each word is represented by a vector whose dimension equals the size of the dictionary. If the dictionary is very large, the word vectors are very high-dimensional and the model must do more computation during training, which increases the training time.
Moreover, a larger dictionary does not improve the model's performance. Some uncommon words may appear only once or twice in the entire training data set, so probability estimates based on those one or two occurrences are very inaccurate and lead to overfitting.
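The estimation problem described above can be illustrated with a minimal sketch. It assumes a plain maximum likelihood unigram estimate over a toy token list; the function name `mle_unigram` and the sample words are hypothetical and not taken from the patent.

```python
from collections import Counter

def mle_unigram(tokens):
    """Maximum likelihood unigram estimate: count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}

# Hypothetical toy training data, already segmented into words.
train = "fever cough fever headache cough fever rash".split()
model = mle_unigram(train)

# "rash" occurs once, so its estimate rests on a single observation.
print(model["rash"])               # 1/7, about 0.143
# "vomiting" never occurs, so the unsmoothed model gives it zero probability.
print(model.get("vomiting", 0.0))  # 0.0
```

A word seen once gets an estimate backed by one observation, and an unseen word gets zero, which is why an unsmoothed model is unreliable for rare and out-of-dictionary words.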



Examples


Embodiment

[0059] Name of medical case: Treating infantile hemiplegia by nourishing yin, clearing heat, removing blood stasis, and inducing resuscitation | Patient: Wang, female, two and a half years old. | First visit: June 16, 1983. Chief complaint and medical history (reported by her father): afternoon fever, impaired movement of the right limbs, and exotropia of the left eye for 40 days. The illness began in January 1983 with a continuous high fever for one week, a body temperature of 39-40°C, loss of appetite, and weight loss. A chest X-ray at a hospital led to a diagnosis of right bronchial lymph node tuberculosis. She was hospitalized and treated with streptomycin, Rimifon (isoniazid), and other drugs. After two months her body temperature gradually returned to normal and the chest X-ray findings improved. In May 1983, however, the high fever returned, with a body temperature above 40°C, accompanied by lethargy, projectile vomiting, confusion, and convulsions. On examination: the pupils are equal in size ...


Abstract

The invention discloses a medical case search method based on a language model. The method comprises the following steps: 1, extracting individual structured medical cases from medical case books through OCR and text structuring; 2, preprocessing all the medical cases with a Chinese word segmentation tool, wherein the preprocessing includes word segmentation and stop word removal; 3, obtaining the unigram language model of each medical case by maximum likelihood estimation; 4, for each medical case, counting the number of words at each word-frequency level and fitting a curve to these statistics; 5, smoothing the unigram language model of each medical case by the Good-Turing estimation method; 6, building a language model that takes all the medical cases as a whole, wherein this language model is used to modify the unigram language model of each single medical case; 7, performing medical case search using the modified language model. The method thereby realizes information retrieval based on a language model: an N-gram language model is built for each medical case, and the probability that the language model generates the text is used as the basis for ranking search results.
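Steps 3 to 7 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the cases are assumed to be already segmented into words, the fitted curve is assumed to be a log-linear fit of the frequency-of-frequency counts, and the whole-collection model is assumed to modify each case model by linear interpolation with a clamped weight. All function names, the clamp bounds, and the toy data are hypothetical.

```python
import math
from collections import Counter

def fit_log_linear(count_of_counts):
    """Least-squares fit of log N_r = a + b * log r; returns the fitted curve S(r)."""
    xs = [math.log(r) for r in count_of_counts]
    ys = [math.log(n) for n in count_of_counts.values()]
    n = len(xs)
    if n < 2:  # too few points for a line: fall back to a constant curve
        const = math.exp(ys[0]) if ys else 1.0
        return lambda r: const
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    a = mean_y - b * mean_x
    return lambda r: math.exp(a + b * math.log(r))

def good_turing_unigram(tokens):
    """Return (normalized Good-Turing distribution over seen words, unseen mass N_1 / N)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    count_of_counts = Counter(counts.values())  # N_r: number of words seen r times
    curve = fit_log_linear(count_of_counts)
    p_unseen = count_of_counts.get(1, 0) / total
    # Adjusted count r* = (r + 1) * S(r + 1) / S(r), using the fitted curve S.
    adjusted = {w: (r + 1) * curve(r + 1) / curve(r) for w, r in counts.items()}
    norm = sum(adjusted.values())
    return {w: c / norm for w, c in adjusted.items()}, p_unseen

def score(query, case_model, collection, floor=0.1, ceil=0.9):
    """Log-probability of the query under the case model mixed with the collection model."""
    seen, p_unseen = case_model
    lam = min(max(p_unseen, floor), ceil)  # assumed clamp so neither model is ignored
    logp = 0.0
    for word in query:
        p_collection = collection.get(word)
        if p_collection is None:  # word absent from every case: cannot discriminate
            continue
        logp += math.log((1 - lam) * seen.get(word, 0.0) + lam * p_collection)
    return logp

def search(query, cases):
    """Rank case names by the probability that their model generates the query."""
    all_tokens = [w for tokens in cases.values() for w in tokens]
    collection = {w: c / len(all_tokens) for w, c in Counter(all_tokens).items()}
    models = {name: good_turing_unigram(tokens) for name, tokens in cases.items()}
    return sorted(cases, key=lambda name: score(query, models[name], collection),
                  reverse=True)

# Hypothetical toy cases, already segmented and with stop words removed.
cases = {
    "case_a": "fever fever cough cough cough headache rash".split(),
    "case_b": "hemiplegia hemiplegia convulsion convulsion convulsion fever vomiting".split(),
}
ranking = search(["hemiplegia", "convulsion"], cases)
print(ranking)
```

Using the Good-Turing unseen mass as the interpolation weight keeps each mixed model normalized; other ways of combining the case model with the whole-collection model would fit the abstract's wording equally well.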

Description

Technical field

[0001] The invention relates to the field of information retrieval, in particular to a method for searching medical cases based on a language model.

Background technique

[0002] A language model is a model that generates text based on probabilities. Given a sentence, that is, a sequence of words $w_1, \dots, w_n$, the language model can compute the probability $p(w_1, \dots, w_n)$ of this sequence. It has many application scenarios, such as speech recognition, machine translation, POS tagging, handwriting recognition, and information retrieval.

[0003] The N-gram model is a language model that is fast to train and efficient at computing the probability of generating a text, which makes it suitable for information retrieval. A typical N-gram model is the unigram model. For a sentence, that is, a sequence of words $w_1, \dots, w_n$, the probability $p(w_1, \dots, w_n)$ should, according to the chain rule, be equal to $p(w_1) \times p(w_2 \mid w_1) \times \dots \times p(w_n \mid w_1, \dots, w_{n-1})$. If one makes the simp...
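The source text is cut off before it states the simplification. Assuming it refers to the standard unigram assumption, under which each word is generated independently of the words before it, the chain rule and its unigram approximation can be written as:

```latex
\begin{align}
p(w_1, \dots, w_n)
  &= \prod_{i=1}^{n} p(w_i \mid w_1, \dots, w_{i-1}) && \text{(chain rule)} \\
  &\approx \prod_{i=1}^{n} p(w_i) && \text{(unigram independence assumption)}
\end{align}
```

Under this assumption the probability of a text depends only on the individual word probabilities, which is why per-case word counts are enough to build each model.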


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F17/27
CPC: G06F16/3334, G06F16/3346, G06F40/279
Inventors: 张引, 姜利成
Owner: ZHEJIANG UNIV