Electronic medical record text named entity recognition method based on pre-trained language model

A named entity recognition and language model technology, applied in the field of medical information data processing, that addresses the limited accuracy and recall of existing methods and the tedious, complicated nature of manual annotation, and achieves convenient investigation, comprehensive key information, and a good semantic compression effect.

Pending Publication Date: 2020-01-17
SUZHOU INST OF BIOMEDICAL ENG & TECH CHINESE ACADEMY OF SCI
Cites: 4; Cited by: 59

AI Technical Summary

Problems solved by technology

[0005] Therefore, the technical problem to be solved by the present invention is to overcome the limited accuracy and recall of existing named entity recognition methods and t...


Examples

Embodiment 1

[0031] This embodiment provides a method for named entity recognition of electronic medical record text based on a pre-trained language model. As shown in Figure 1, the method includes the following steps:

[0032] Step 1: Collect electronic medical record text from a public data set as the original text, and perform data preprocessing on the original text;

[0033] Specifically, in this embodiment, the corpus used as the original text comes from electronic medical record text collected from a public data set. All words appearing in the original text of the data set are counted, and stop words, irrelevant symbols, etc. are removed to generate a dictionary document.
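The preprocessing in Step 1 could be sketched roughly as follows. This is a minimal illustration, not the procedure used in the patent: the stop-word list, the sample records, and the function name `build_dictionary` are all hypothetical, and the regular-expression tokenizer assumes space-delimited English text (Chinese records would first need a word segmenter, which the source does not name).

```python
import re
from collections import Counter

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "of", "and", "a", "was", "with"}

def build_dictionary(texts):
    """Count the words in all texts, dropping stop words and non-word symbols."""
    counts = Counter()
    for text in texts:
        # \w+ keeps runs of letters, digits, and underscores; punctuation is discarded.
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    # Most frequent words first; ties are broken alphabetically for a stable order.
    return sorted(counts, key=lambda w: (-counts[w], w))

# Made-up sample records, for illustration only.
records = [
    "Patient presented with chest pain and fever.",
    "Chest X-ray was normal; fever resolved.",
]
vocab = build_dictionary(records)
```

The resulting list can be written out one word per line to serve as the dictionary document.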

[0034] Step 2: Based on the standardized medical terminology, perform entity annotation on the original text preprocessed in step 1 to obtain the annotated text;

[0035] Specifically, based on the SNOMED CT medical terminology set and using the BIO annotation mode, the corpus in the original text preprocessed...
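To illustrate the BIO annotation mode mentioned above, the following sketch labels tokens by greedy longest-match lookup against a term set. The paragraph is truncated in the source, so the actual annotation procedure is not known; the lookup strategy, the `TERMS` entries, the entity type names, and the function `bio_annotate` are assumptions made for illustration, and the terms are not taken from SNOMED CT.

```python
# Hypothetical term set mapping token tuples to an entity type.
# These entries are illustrative only and are not taken from SNOMED CT.
TERMS = {
    ("chest", "pain"): "SYMPTOM",
    ("fever",): "SYMPTOM",
    ("aspirin",): "DRUG",
}
MAX_TERM_LEN = max(len(term) for term in TERMS)

def bio_annotate(tokens):
    """Label tokens with B-/I-/O tags using greedy longest-match lookup."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            span = tuple(tok.lower() for tok in tokens[i:i + n])
            if span in TERMS:
                etype = TERMS[span]
                labels[i] = "B-" + etype          # first token of the entity
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + etype      # tokens inside the entity
                i += n
                matched = True
                break
        if not matched:
            i += 1                                # token stays "O" (outside)
    return labels

tokens = ["Patient", "reports", "chest", "pain", "and", "took", "aspirin"]
labels = bio_annotate(tokens)
```

In this scheme `B-` marks the first token of an entity, `I-` marks a continuation token, and `O` marks tokens outside any entity.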

Abstract

The invention belongs to the technical field of medical information data processing, and particularly relates to an electronic medical record text named entity recognition method based on a pre-trained language model, which comprises the following steps: collecting electronic medical record text from a public data set as an original text, and preprocessing the original text; annotating entities in the preprocessed original text based on a standard medical term set to obtain an annotated text; inputting the annotated text into a pre-trained language model to obtain a training text represented by word vectors; constructing a BiLSTM-CRF sequence labeling model, and learning from the training text to obtain a trained labeling model; and taking the trained labeling model as an entity recognition model, and inputting a test text to output a sequence of category labels. In this method, the deep language model acquires text features and semantic information through training on a very large Chinese corpus and can therefore provide a better semantic compression effect. The method avoids the tedium and complexity of manual annotation, does not depend on dictionaries or rules, and improves the recall and accuracy of named entity recognition.
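As an illustration of the final step, a CRF layer on top of a BiLSTM is commonly decoded with the Viterbi algorithm, which picks the label sequence with the highest combined emission and transition score. The sketch below is plain Python under stated assumptions: the emission scores are hand-made stand-ins for BiLSTM outputs, the transition scores and label names are hypothetical, and none of the values come from the patent.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][k]:   score of label k at position t (stand-in for BiLSTM output).
    transitions[j][k]: score of moving from label j to label k.
    """
    n_labels = len(labels)
    score = list(emissions[0])
    backptr = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for k in range(n_labels):
            # Best previous label for arriving at label k.
            best_j = max(range(n_labels), key=lambda j: score[j] + transitions[j][k])
            ptr.append(best_j)
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[t][k])
        score = new_score
        backptr.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda k: score[k])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return [labels[k] for k in path]

LABELS = ["O", "B-SYM", "I-SYM"]   # hypothetical label set
FORBIDDEN = -1e4                   # large penalty for an invalid BIO transition
TRANSITIONS = [
    [0.0, 0.0, FORBIDDEN],  # from O: an entity cannot start with I-
    [0.0, 0.0, 0.0],        # from B-SYM
    [0.0, 0.0, 0.0],        # from I-SYM
]
EMISSIONS = [               # hand-made scores for three tokens
    [2.0, 0.5, 0.0],
    [0.0, 1.0, 1.5],
    [0.0, 0.2, 2.0],
]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)
```

With these made-up scores, choosing the best label per token independently would tag the second token `I-SYM` directly after `O`, which is not a valid BIO sequence; the transition penalty steers the decoder to `B-SYM` instead. In a trained model the transition scores are learned rather than set by hand.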

Description

Technical field

[0001] The invention belongs to the technical field of medical information data processing, and in particular relates to a named entity recognition method for electronic medical record text based on a pre-trained language model.

Background technique

[0002] A medical record is the record kept by medical personnel of medical activities such as examination, diagnosis, and treatment, covering the occurrence, development, and outcome of a patient's disease. A written patient medical record is required. With the development of computer and Internet technology, most hospitals have digitized their clinical medical records. Electronic medical records use electronic equipment to record, save, manage, transmit, and reproduce digital medical records, and offer advantages such as safety, reliability, and convenient recording, storage, and sharing. The application of electronic medical records can not only provide the most practical and abundant data for health management...

Application Information

IPC(8): G06F40/295, G06F40/30, G16H10/60, G06K9/62, G06N3/04, G06N3/08
CPC: G16H10/60, G06N3/08, G06N3/044, G06N3/045, G06F18/24
Inventor: 戴亚康, 戴斌, 耿辰, 周志勇, 胡冀苏
Owner SUZHOU INST OF BIOMEDICAL ENG & TECH CHINESE ACADEMY OF SCI