Medical field-oriented named entity identifying method based on deep learning

A deep-learning-based named entity recognition technology for the medical field. It addresses the problems that the CRF model does not consider semantic information and therefore produces meaningless annotation results, and it reduces the consumption of human and material resources while improving performance.

Active Publication Date: 2016-12-07
NAT INST OF ADVANCED MEDICAL DEVICES SHENZHEN

AI Technical Summary

Problems solved by technology

[0009] The purpose of the present invention is to solve the problem that, because the CRF model does not consider semantic information, a large number of meaningless labeling results appear when the training corpus is extremely scarce. To this end, it proposes a deep-learning-based named entity recognition method for the medical field.

Examples

Specific Embodiment 1

[0038] Specific Embodiment 1: With reference to Figure 1, the deep-learning-based named entity recognition method for the medical field of this embodiment is carried out according to the following steps:

[0039] Step 1. Train word vectors vec_i on an unlabeled corpus (using the word2vec toolkit) to obtain a vocabulary voc that supplements the medical-field corpus (this vocabulary contains more words than the medical-field corpus) and the word vectors vec corresponding to voc, where i = 1, 2, 3, ..., n; vec = vec_1, vec_2, ..., vec_i, ..., vec_n; voc = voc_1, voc_2, ..., voc_i, ..., voc_n; and n is the total number of distinct words in the unlabeled corpus;
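As an illustration of how the output of Step 1 might be consumed, the following is a minimal sketch that parses the plain-text vector format written by the word2vec toolkit (a header line with the word count and dimension, then one word and its values per line) into voc and vec. The function name `load_word2vec_text` and the sample words and values are hypothetical and do not come from the patent.

```python
from typing import Dict, Iterable, List, Tuple


def load_word2vec_text(lines: Iterable[str]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse word2vec text output into (voc, vec). Hypothetical helper."""
    it = iter(lines)
    header = next(it).split()
    n, dim = int(header[0]), int(header[1])  # n words, each of dimension dim
    voc: List[str] = []
    vec: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        word = parts[0]
        voc.append(word)
        vec[word] = [float(v) for v in parts[1:dim + 1]]
    if len(voc) != n:
        raise ValueError(f"header declares {n} words, found {len(voc)}")
    return voc, vec


# Made-up sample standing in for a real word2vec output file.
sample = [
    "3 2",
    "fever 0.1 -0.2",
    "aspirin 0.3 0.4",
    "cough -0.5 0.6",
]
voc, vec = load_word2vec_text(sample)
print(len(voc), vec["aspirin"])  # prints: 3 [0.3, 0.4]
```

Here `len(voc)` plays the role of n, and `vec[voc[i]]` plays the role of vec_i.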

[0040] Step 2. Train the long short-term memory (LSTM) network on the training corpus from the labeled corpus; use the word vectors vec obtained in Step 1 as pre-trained vectors, and use the LSTM method according to the pre-trained vectors, x_k and y_k, to calculate the o...

Specific Embodiment 2

[0065] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that the input sequence X of the LSTM neural network is calculated using Method 1 described in Step 2.1. The specific process is:

[0066] Build the vocabulary voc' of the training corpus in the labeled corpus, and merge voc' and voc into the vocabulary VOC; VOC = VOC_1, VOC_2, VOC_3, ..., VOC_N;

[0067] Randomly initialize the vector matrix word_emb corresponding to the vocabulary VOC so that its dimension is the same as that of the word vectors vec, and assign values according to formula (1):

[0068] $word\_emb_i = vec_i, \quad \forall i \in voc \qquad (1)$ ...
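The initialization in paragraphs [0066] to [0068] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_word_emb`, the uniform range of the random initialization, and the sample words are assumptions.

```python
import random
from typing import Dict, List, Tuple


def build_word_emb(
    vec: Dict[str, List[float]], voc_prime: List[str], seed: int = 0
) -> Tuple[List[str], List[List[float]]]:
    """Return (VOC, word_emb) for the merged vocabulary. Hypothetical helper."""
    rng = random.Random(seed)
    dim = len(next(iter(vec.values())))
    # Merge voc (keys of vec) and voc' into VOC, keeping order and dropping duplicates.
    VOC = list(dict.fromkeys(list(vec) + voc_prime))
    # Random initialization with the same dimension as vec
    # (the range [-0.1, 0.1] is an assumption).
    word_emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in VOC]
    # Formula (1): rows for words that appear in voc take the pre-trained vectors.
    for row, word in enumerate(VOC):
        if word in vec:
            word_emb[row] = list(vec[word])
    return VOC, word_emb


# Made-up example: "dyspnea" occurs only in the training vocabulary voc'.
vec = {"fever": [0.1, -0.2], "aspirin": [0.3, 0.4]}
voc_prime = ["aspirin", "dyspnea"]
VOC, word_emb = build_word_emb(vec, voc_prime)
```

Words found only in voc' keep their random rows, while every word in voc starts from its pre-trained vector.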

Specific Embodiment 3

[0073] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 or 2 in that the input sequence X of the LSTM neural network described in Step 2.1 is calculated as follows:

[0074] Randomly initialize the vector matrix word_emb corresponding to the vocabulary VOC, assign values according to formula (1), and keep the assigned vectors word_emb_i unchanged, that is, they are not updated as parameters; then randomly initialize a vector matrix word_emb_para corresponding to the words in the vocabulary VOC, and calculate the input sequence X of the LSTM neural network:

[0075] X = (x_k[k_1, k_2] · w...
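The distinction in paragraph [0074] between a fixed matrix and a trainable one can be sketched as a parameter update that touches only word_emb_para. The gradient values, the learning rate, and the names `TRAINABLE` and `sgd_step` are placeholders for illustration. How the two matrices combine into X is not reproduced here, because the source formula is truncated.

```python
from typing import Dict, List

Matrix = List[List[float]]

# Only word_emb_para is treated as a parameter; word_emb stays frozen.
TRAINABLE = {"word_emb_para"}


def sgd_step(params: Dict[str, Matrix], grads: Dict[str, Matrix], lr: float) -> None:
    """Apply one gradient step in place to the trainable matrices only."""
    for name in TRAINABLE:
        for row, grad_row in zip(params[name], grads[name]):
            for j, g in enumerate(grad_row):
                row[j] -= lr * g


params = {
    "word_emb": [[0.3, 0.4]],       # pre-trained values, kept unchanged
    "word_emb_para": [[0.0, 0.0]],  # updated during training
}
# Placeholder gradients; the gradient for word_emb is ignored by design.
grads = {"word_emb": [[1.0, 1.0]], "word_emb_para": [[1.0, -1.0]]}
sgd_step(params, grads, lr=0.5)
```

After the step, `params["word_emb"]` still holds the pre-trained values, and only `params["word_emb_para"]` has moved.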

Abstract

The invention provides a deep-learning-based named entity recognition method for the medical field and relates to named entity recognition methods. It mainly addresses the problem that, because the CRF model does not consider semantic information, a large number of meaningless labeling results appear when the training corpus is extremely scarce. The method is carried out in the following steps: first, obtain a vocabulary voc that supplements the medical-field corpus and the word vectors vec corresponding to voc; second, train a long short-term memory (LSTM) network on the training corpus from the labeled corpus; third, perform path finding for the labeling results according to the neural network parameter theta updated in step two, thereby obtaining the labeling results for the labeled corpus, and evaluate the labeling results of the test corpus in the labeled corpus using the named entity recognition evaluation metric, the F value. The method is applied in the field of named entity recognition.
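For readers unfamiliar with the F value, the following is a minimal sketch of entity-level evaluation. It assumes the balanced F-measure (the harmonic mean of precision and recall) and exact matching of entity spans and types, which the abstract does not specify. The entity tuples and type labels are made up.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start token, end token, entity type)


def f_value(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) using exact-match entity comparison."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Hypothetical gold and predicted entities for one test sentence.
gold = {(0, 2, "DISEASE"), (5, 6, "DRUG"), (8, 10, "SYMPTOM"), (12, 13, "DRUG")}
pred = {(0, 2, "DISEASE"), (5, 6, "DRUG"), (8, 9, "SYMPTOM"), (14, 15, "DRUG")}
precision, recall, f = f_value(gold, pred)
print(precision, recall, f)  # prints: 0.5 0.5 0.5
```

Two of the four predicted entities match the gold annotation exactly, so precision, recall and F are all 0.5 in this example.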

Description

Technical field

[0001] The present invention relates to a named entity recognition method, in particular to a deep-learning-based named entity recognition method for the medical field.

Background

[0002] As one of the basic tasks of information extraction, named entity recognition has important applications in question answering systems, syntactic analysis, machine translation and other fields. Medical entities differ greatly from ordinary entities, so annotated corpora for open-domain entities are of little help in annotating medical entities. Entity labeling in the medical field is also costly. Therefore, how to achieve better annotation in the medical field with only a small amount of annotated corpus is very important.

[0003] Deep learning has made significant progress in recent years and has been shown to discover complex structures in high-dimensional data. At present, in the f...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F19/00
CPC: G06F19/326, G06F40/295
Inventor: 朱聪慧, 赵铁军, 杨沐昀, 徐冰, 曹海龙, 郑德权
Owner: NAT INST OF ADVANCED MEDICAL DEVICES SHENZHEN