Medical field oriented named entity recognition method based on deep learning

A named entity recognition technology based on deep learning, applied in the medical field. It addresses two problems: the CRF model does not take semantic information into account, and its labeling results contain many meaningless labels.

Status: Inactive. Publication date: 2018-06-15
Harbin Fuman Technology Co., Ltd. (哈尔滨福满科技有限责任公司)
Cites: 4; cited by: 8

AI Technical Summary

Problems solved by technology

[0009] The purpose of the present invention is to solve the following problem: because the CRF model does not consider semantic information, a large number of meaningless labels appear in the labeling results when the training corpus is extremely scarce. Drawing on a large-scale news-domain corpus, the invention proposes a deep-learning-based named entity recognition method for the medical field.



Examples


Specific Embodiment 1

[0089] Specific embodiment 1: This embodiment is described with reference to Figure 1. The deep-learning-based named entity recognition method for the medical field in this embodiment comprises the following steps:

[0090] Step 1. Train word vectors $\mathrm{vec}_i$ on the unlabeled medical corpus to obtain the vocabulary voc of the supplementary medical-field corpus and the word vectors vec corresponding to voc, where $\mathrm{vec} = [\mathrm{vec}_1, \mathrm{vec}_2, \ldots, \mathrm{vec}_n]$, $\mathrm{voc} = [\mathrm{voc}_1, \mathrm{voc}_2, \ldots, \mathrm{voc}_n]$, $i = 1, 2, \ldots, n$, and $n$ is the total number of word types in the unlabeled corpus.
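Step 1 does not name the embedding trainer, so the sketch below only illustrates the bookkeeping around it: collecting the vocabulary voc (whose length is n) and reading the trained vectors vec back in. It assumes the corpus is already word-segmented and that the vectors are exported in the plain-text word2vec format (a header line `n dim`, then one `word v1 ... v_dim` line per word). The helper names `build_vocabulary` and `parse_text_vectors` are hypothetical.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Return voc, the word types of a word-segmented corpus.

    Ordered by descending frequency, ties broken alphabetically,
    so len(voc) is n, the number of word types.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    return sorted(counts, key=lambda word: (-counts[word], word))


def parse_text_vectors(lines):
    """Parse vectors in word2vec text format into a {word: vector} dict."""
    header, *rows = lines
    n, dim = map(int, header.split())
    vec = {}
    for row in rows:
        word, *values = row.split()
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vec[word] = [float(v) for v in values]
    if len(vec) != n:
        raise ValueError(f"header declares {n} words but {len(vec)} were read")
    return vec


unlabeled = [["patient", "has", "fever"], ["patient", "has", "cough"]]
voc = build_vocabulary(unlabeled)
vec = parse_text_vectors(["2 2", "patient 0.1 0.2", "fever 0.3 0.4"])
```

Sorting by frequency is only a convention that keeps indices stable between runs; any fixed ordering of the n word types would serve.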

[0091] Step 2. Train a long short-term memory (LSTM) network on the training corpus of the labeled news-domain corpus, using the word vectors vec from Step 1 as the pre-trained vectors for this LSTM training; utilize LS...
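Using vec as pre-trained vectors for the news-domain LSTM requires an embedding matrix indexed by the training vocabulary. A minimal sketch, assuming that words without a pre-trained vector receive small random values and that unseen words at test time share a single `<unk>` row; neither choice is stated in the patent, and `build_embedding_matrix` and `encode` are hypothetical helpers.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Build the LSTM embedding matrix from pre-trained word vectors.

    Returns (word_to_index, matrix, hits). Row 0 is a shared <unk> row;
    words found in `pretrained` copy their vector, the rest get small
    random values. `hits` counts words covered by `pretrained`.
    """
    rng = random.Random(seed)

    def random_row():
        return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

    word_to_index = {"<unk>": 0}
    matrix = [random_row()]
    hits = 0
    for word in vocab:
        if word in word_to_index:
            continue  # skip duplicates
        word_to_index[word] = len(matrix)
        if word in pretrained:
            matrix.append(list(pretrained[word]))
            hits += 1
        else:
            matrix.append(random_row())
    return word_to_index, matrix, hits


def encode(sentence, word_to_index):
    """Map a tokenised sentence to row indices; unknown words map to <unk>."""
    return [word_to_index.get(word, 0) for word in sentence]


news_vocab = ["patient", "fever", "Beijing"]
vec = {"patient": [0.1, 0.2], "fever": [0.3, 0.4]}
word_to_index, matrix, hits = build_embedding_matrix(news_vocab, vec, dim=2)
```

The `hits` count gives a quick view of how much of the news-domain vocabulary the medical-domain vectors actually cover.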

Specific Embodiment 2

[0112] Specific embodiment 2: This embodiment differs from specific embodiment 1 as follows:

[0113] Step 21. The vocabulary voc and its corresponding word vectors vec are pre-trained; use $x_k$ and the word vectors vec obtained in Step 1 to compute the input X of the LSTM neural network. Two methods are used to compute X: in method one, the word vectors vec serve as the initial values of the LSTM model's input X; in method two, the word vectors vec are used directly as the input of the LSTM neural network.
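One plausible reading of the two methods in Step 21 is the difference between a trainable and a frozen embedding table; the text does not spell this out, so treat it as an interpretation. In the sketch, `trainable=True` (method one) means vec only initialises X and is then updated by training, while `trainable=False` (method two) passes vec to the LSTM unchanged. The `Embedding` class and its plain-SGD update are illustrative assumptions.

```python
import copy


class Embedding:
    """Embedding table initialised from the pre-trained word vectors vec.

    trainable=True  -> method one: vec is only the initial value of X and
                       is updated during training.
    trainable=False -> method two: vec is fed to the LSTM unchanged.
    (This mapping of the two methods is an interpretation.)
    """

    def __init__(self, matrix, trainable):
        self.weights = copy.deepcopy(matrix)  # keep the caller's vec intact
        self.trainable = trainable

    def lookup(self, indices):
        """Return the input vectors X for a sequence of row indices."""
        return [self.weights[i] for i in indices]

    def apply_gradient(self, indices, grads, lr):
        """Plain SGD on the looked-up rows; does nothing when frozen."""
        if not self.trainable:
            return
        for i, grad in zip(indices, grads):
            self.weights[i] = [w - lr * g for w, g in zip(self.weights[i], grad)]


vec_matrix = [[0.0, 0.0], [1.0, 2.0]]
method_one = Embedding(vec_matrix, trainable=True)
method_two = Embedding(vec_matrix, trainable=False)
for emb in (method_one, method_two):
    emb.apply_gradient([1], [[0.5, 0.5]], lr=0.1)
```

After one identical update, only the method-one table has moved away from vec; the method-two table still equals the pre-trained vectors.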

[0114] Step 22. Use the input $X_t$, the hidden layer $h_{t-1}$ obtained in the (t-1)-th computation and the memory cell $c_{t-1}$ obtained in the (t-1)-th computation to compute the input gate $\mathrm{in}_t$, the output gate $o_t$ and the forget gate f ...

Specific Embodiment 3

[0117] Specific embodiment 3: This embodiment differs from specific embodiment 1 as follows:

[0118] Step 31. The vocabulary voc and its corresponding word vectors vec are pre-trained; use $x_k$ and the word vectors vec obtained in Step 1 to compute the input X of the LSTM neural network. Two methods are used to compute X: in method one, the word vectors vec serve as the initial values of the LSTM model's input X; in method two, the word vectors vec are used directly as the input of the LSTM neural network.

[0119] Step 32. Load the model parameters $\theta_n$ obtained from LSTM training in the news domain; with the parameters $\theta_n$, use the input $X_t$, the hidden layer $h_{t-1}$ obtained in the (t-1)-th computation and the memory cell $c_{t-1}$ obtained in the (t-1)-th computation to compute the input gat...
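Step 32 starts medical-domain training from the news-domain parameters $\theta_n$. A minimal sketch of that hand-over, assuming parameters are stored as a dict of flat float lists and updated with plain SGD; `init_medical_model`, `sgd_update` and the optional `freeze` set are hypothetical, and the text does not say whether any parameters are held fixed.

```python
import copy


def init_medical_model(theta_news, freeze=()):
    """Initialise medical-domain parameters from the news-domain θ_n.

    Returns a deep copy (θ_n itself is left untouched) and the set of
    parameter names that will be updated during medical-domain training.
    """
    theta_med = copy.deepcopy(theta_news)
    trainable = {name for name in theta_med if name not in freeze}
    return theta_med, trainable


def sgd_update(theta, grads, trainable, lr):
    """Plain SGD on the trainable parameters; each value is a flat list of floats."""
    for name, grad in grads.items():
        if name in trainable:
            theta[name] = [p - lr * g for p, g in zip(theta[name], grad)]


theta_n = {"embedding": [1.0], "lstm": [0.2, -0.1]}
theta_med, trainable = init_medical_model(theta_n, freeze={"embedding"})
sgd_update(theta_med, {"embedding": [1.0], "lstm": [1.0, 1.0]}, trainable, lr=0.1)
```

Copying rather than aliasing $\theta_n$ keeps the news-domain model available, for example to compare F values with and without the medical-domain update.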


Abstract

The invention provides a deep-learning-based named entity recognition method for the medical field. The method comprises the following steps: S1, training a long short-term memory (LSTM) network on the training corpus in the labeled medical-field corpus; S2, searching for the labeling-result path according to the neural network parameters θ updated in S1, thereby obtaining labeling results for the labeled corpus, and evaluating the labeling results of the test corpus in the labeled corpus with the F value, the evaluation criterion for named entity recognition; and S3, within the training process of S1, first training the LSTM on the labeled news-domain corpus, then training the medical-domain model from the trained model and the labeled medical-field corpus, and evaluating the labeling results of the test corpus in the labeled corpus with the F value. The method is applied to named entity recognition.
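S2 searches for the labeling-result path and scores it with the F value. A minimal sketch of both parts, assuming Viterbi decoding over per-token tag scores plus a tag-transition score matrix, and entity-level exact-match F1 over BIO tags. The text shown here specifies neither the decoding algorithm nor the tag scheme, so the transition matrix and tags such as `B-DIS` are hypothetical.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag path.

    emissions[t][j]   score of tag j at token t (e.g. LSTM output layer)
    transitions[i][j] score of tag i followed by tag j (assumed extra parameter)
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for em in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + em[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return [tags[j] for j in reversed(path)]


def entity_spans(tags):
    """Extract (start, end, type) spans from a BIO tag sequence."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last entity
        if tag.startswith("I-") and start is not None and tag[2:] == kind:
            continue  # current entity goes on
        if start is not None:
            spans.add((start, i, kind))
        start, kind = (i, tag[2:]) if tag != "O" else (None, None)
    return spans


def f_value(gold_seqs, pred_seqs):
    """Entity-level F1: a predicted entity counts only on an exact span and type match."""
    correct = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = entity_spans(gold), entity_spans(pred)
        correct += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tags = ["O", "B-DIS", "I-DIS"]
zero_transitions = [[0.0] * 3 for _ in tags]
path = viterbi([[2, 0, 0], [0, 2, 0], [0, 0, 2]], zero_transitions, tags)
```

With all-zero transitions the search reduces to a per-token argmax; a strongly negative score for `O` followed by `I-DIS` is one way such a matrix can rule out paths that would otherwise produce meaningless labels.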

Description

Technical field

[0001] The present invention relates to a named entity recognition method, and in particular to a deep-learning-based named entity recognition method for the medical field.

Background art

[0002] As one of the basic tasks of information extraction, named entity recognition has important applications in question answering systems, syntactic analysis, machine translation and other fields. Medical entities differ greatly from ordinary entities, so annotated corpora of open-domain entities contribute little to labeling medical entities, and entity labeling in the medical field is costly. How to achieve better labeling in the medical field with only a small annotated corpus is therefore a very important problem.

[0003] Deep learning has made significant progress in recent years and has been shown to discover complex structures in high-dimensional data. At present, in the f...

Claims


Application Information

Patent type and authority: Application (China)
IPC(8): G06F17/27; G06N3/08
CPC: G06F40/295; G06N3/08
Inventors: Zhu Conghui (朱聪慧), Zhao Tiejun (赵铁军), Guan Yi (关毅), Li Yue (李岳)
Owner: Harbin Fuman Technology Co., Ltd. (哈尔滨福满科技有限责任公司)