Named entity recognition method for Chinese medical records

A named entity recognition technology for medical records, applied in neural learning methods, medical data mining, biological neural network models, and related fields. It addresses problems such as time-consuming training, an unstable network structure, and inaccurate label output, with the effects of reducing training time, strengthening long-distance dependencies, and improving recognition accuracy.

Inactive Publication Date: 2020-10-16
UNIV OF ELECTRONIC SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0005] 1. The word segmentation information differs from sentence to sentence, so the network structure is not fixed across training samples and batch training cannot be performed. Training therefore takes a long time.
[0006] 2. In Chinese medical records, labels often overlap. For example, "cardia and gastric fundus cancer" is a disease entity, while "cardia" and "stomach" within it are anatomical entities. The prior method does not take this situation into account, so part of the label output is inaccurate.
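
The overlap described in problem 2 can be illustrated with a small sketch. It is not the method of the invention: it only shows why a single flat label sequence cannot hold both the disease entity and the anatomical entities inside it. The function name `split_into_layers`, the span format, the category names, and the use of English words in place of Chinese characters are all assumptions made for illustration.

```python
from typing import List, Tuple

# (start, end_exclusive, category) over token indices. Hypothetical format.
Span = Tuple[int, int, str]


def split_into_layers(spans: List[Span]) -> List[List[Span]]:
    """Greedily place spans into layers so that no two spans in a layer overlap.

    Longer spans are placed first, so an outer entity lands in layer 0 and
    entities nested inside it fall through to a later layer.
    """
    layers: List[List[Span]] = []
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        for layer in layers:
            # A span fits a layer if it is disjoint from every span already there.
            if all(span[1] <= other[0] or span[0] >= other[1] for other in layer):
                layer.append(span)
                break
        else:
            layers.append([span])
    return [sorted(layer) for layer in layers]


# English words stand in for the characters of the Chinese phrase.
tokens = ["cardia", "and", "gastric", "fundus", "cancer"]
spans = [
    (0, 5, "disease"),  # the whole phrase is one disease entity
    (0, 1, "anatomy"),  # "cardia"
    (2, 3, "anatomy"),  # "gastric" (stomach)
]
layers = split_into_layers(spans)
for depth, layer in enumerate(layers):
    print(depth, [(" ".join(tokens[s:e]), cat) for s, e, cat in layer])
```

Each resulting layer is free of overlaps and could be labeled independently, which is the general idea behind labeling at more than one granularity.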

Examples

Embodiment 1

[0040] A named entity recognition method for Chinese medical records, comprising the following steps:

[0041] S1: Preprocess the data text to obtain the vocabulary W, which mainly includes:

[0042] S11: Build a Chinese medical record corpus by annotating a large number of Chinese medical record texts in a standardized way with the BIO notation. In the BIO notation, for each named entity, "B" marks the beginning of the entity, "I" marks its continuation, and "O" marks a character that belongs to no entity type. In this example, "D" stands for "disease and diagnosis", "C" stands for "examination", "E" stands for "laboratory test", "M" stands for "medicine", "B" stands for "anatomical part", and "P" stands for "surgery". Labeling a character requires both its entity category and its position within the entity, so each label takes the form "entity category-position information".
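
The labeling scheme in S11 can be sketched as follows. This is a minimal illustration, assuming entity spans are already known as character offsets. The function name `bio_tags`, the span format, and the sample sentence are hypothetical and do not come from the patent. Labels follow the "entity category-position information" order given in the text, for example "M-B" and "M-I".

```python
from typing import List, Tuple

# (start, end_exclusive, category code) in character offsets. Hypothetical format.
Span = Tuple[int, int, str]


def bio_tags(length: int, spans: List[Span]) -> List[str]:
    """Return one "category-position" label per character, or "O" outside entities."""
    tags = ["O"] * length
    for start, end, category in spans:
        if not (0 <= start < end <= length):
            raise ValueError(f"span out of range: {(start, end, category)}")
        if any(tag != "O" for tag in tags[start:end]):
            # A single flat sequence cannot represent overlapping entities.
            raise ValueError(f"span overlaps an earlier one: {(start, end, category)}")
        tags[start] = f"{category}-B"      # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"{category}-I"      # remaining characters of the entity
    return tags


# Made-up sentence: "the patient takes aspirin"; the last four characters are a medicine.
text = "患者服用阿司匹林"
tags = bio_tags(len(text), [(4, 8, "M")])
for char, tag in zip(text, tags):
    print(char, tag)
```

Because the anatomical-part category also uses the letter "B", its labels read "B-B" and "B-I" under this scheme. The category always comes first and the position second.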

[0043] S12: Use Baidu and Harbin Institute of Technology ...

Abstract

The invention discloses a named entity recognition method for Chinese medical records and relates to the technical field of natural language processing. The method comprises the steps of: preprocessing the data text; pre-training a character vector model and a word vector model; converting the text sequence into a character-based vector representation, introducing word vectors to obtain a mixed vector, and encoding the mixed vector to obtain an encoded text feature vector sequence; inputting the encoded text feature vector sequence and obtaining an attention vector sequence by using a multi-head attention mechanism; inputting the attention vector sequence into each of two layers of CRF classifiers to obtain the final label output; updating the model until it converges; and performing named entity recognition on Chinese medical records. The method introduces a WC-LSTM network structure with a self-attention mechanism and determines the final output label by two-layer CRF classification based on labels of different granularities, so that training time is shortened while named entity recognition accuracy in the field of Chinese medical records is improved.
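
The CRF classification step mentioned in the abstract selects a whole label sequence rather than one label per character. A minimal sketch of the standard Viterbi decoding used with linear-chain CRFs is given below. It does not reproduce the two-layer arrangement or the training procedure of the invention. The function name `viterbi` and all scores are invented for illustration.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag path under emission plus transition scores."""
    # best[t][tag] is the best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["O", "M-B", "M-I"]
# Invented scores: the only penalized move is O -> M-I (an entity cannot start with "I").
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["M-I"] = -100.0
emissions = [
    {"O": 2.0, "M-B": 0.5, "M-I": 0.1},
    {"O": 0.2, "M-B": 1.0, "M-I": 1.5},
    {"O": 0.1, "M-B": 0.3, "M-I": 2.0},
]
path = viterbi(emissions, transitions, tags)
print(path)
```

Picking the best label independently at each position would give O, M-I, M-I here, which is not a valid entity. The transition penalty makes the decoder prefer O, M-B, M-I instead.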

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a named entity recognition method for Chinese medical records.

Background art

[0002] With the development of artificial intelligence technology, especially natural language processing technology, Chinese electronic medical records, which have accumulated in large volume and contain a great deal of latent information, have begun to attract attention. Named entity recognition refers to locating the named entities mentioned in unstructured text and classifying them into predefined categories. In the field of Chinese medical records addressed by the present invention, these categories are generally defined as: disease and diagnosis, examination, laboratory test, surgery, medicine, and anatomical part. Named entity recognition is the cornerstone of many natural language processing technologies, such as knowledge graphs and relationship extraction, and is the first step in using artificial ...

Application Information

IPC(8): G06F40/295, G06F40/242, G16H10/60, G16H50/70, G06N3/04, G06N3/08
CPC: G06F40/295, G06F40/242, G16H10/60, G16H50/70, G06N3/084, G06N3/044, G06N3/045
Inventor: 田玲, 卢国明, 秦科, 罗光春, 江涛, 母翀
Owner UNIV OF ELECTRONIC SCI & TECH OF CHINA