Medical field-oriented named entity identifying method based on deep learning

A deep-learning-based named entity recognition method for the medical field. It addresses the CRF model's failure to consider semantic information and the meaningless annotation results that follow, reducing the consumption of human and material resources and improving performance.

Active Publication Date: 2016-12-07

NAT INST OF ADVANCED MEDICAL DEVICES SHENZHEN

Cites: 7; Cited by: 54


AI Technical Summary

Problems solved by technology

[0009] The purpose of the present invention is to solve the problem that, because the CRF model does not consider semantic information, a large number of meaningless labeling results appear when the training corpus is extremely scarce. To that end, it proposes a deep-learning-based named entity recognition method for the medical field.

Examples


Specific Embodiment 1

[0038] Specific Embodiment 1: With reference to Figure 1, the deep-learning-based named entity recognition method for the medical field of this embodiment is carried out according to the following steps:

[0039] Step 1. Train word vectors vec_i on the unlabeled corpus (using the word2vec toolkit) to obtain the vocabulary voc that supplements the medical-field corpus (this vocabulary contains more words than the medical-field corpus) and the word vectors vec corresponding to voc, where i = 1, 2, 3, ..., n; vec = vec_1, vec_2, ..., vec_i, ..., vec_n; voc = voc_1, voc_2, ..., voc_i, ..., voc_n; and n is the total number of distinct words in the unlabeled corpus;

[0040] Step 2. Train the long short-term memory (LSTM) network on the training corpus in the labeled corpus; use the word vectors vec obtained in Step 1 as pre-trained vectors and, using the LSTM method with the pre-trained vectors, x_k and y_k, calculate the o...
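Because the paragraph above is truncated, the patent's exact LSTM equations are not visible here. As a rough illustration only, the following sketch runs one forward pass of a standard LSTM cell over a short sequence of word vectors. The gate layout, the parameter names (`init_params`, `lstm_step`), the dimensions, and the toy input vectors are all assumptions made for this example, not details taken from the patent.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights and zero biases per gate."""
    rng = random.Random(seed)

    def gate():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
             for _ in range(hidden_dim)]
        b = [0.0] * hidden_dim
        return W, b

    return {name: gate() for name in ("input", "forget", "output", "cell")}


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell (assumed form, plain Python lists)."""
    concat = list(x) + list(h_prev)

    def affine(name):
        W, b = params[name]
        return [sum(w * v for w, v in zip(row, concat)) + bias
                for row, bias in zip(W, b)]

    i = [sigmoid(z) for z in affine("input")]     # input gate
    f = [sigmoid(z) for z in affine("forget")]    # forget gate
    o = [sigmoid(z) for z in affine("output")]    # output gate
    g = [math.tanh(z) for z in affine("cell")]    # candidate cell state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


# Toy input: two 3-dimensional word vectors standing in for a sentence.
params = init_params(input_dim=3, hidden_dim=2)
h, c = [0.0, 0.0], [0.0, 0.0]
sentence = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
states = []
for x in sentence:
    h, c = lstm_step(x, h, c, params)
    states.append(h)
```

In a full tagger, each hidden state in `states` would feed a layer that scores the candidate labels for that word; training would then update the parameters, which this sketch does not attempt.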

Specific Embodiment 2

[0065] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that the input sequence X of the LSTM neural network is calculated using method one described in Step 2.1. The specific process is:

[0066] Build the vocabulary voc' of the training corpus in the labeled corpus, and merge voc' and voc into the vocabulary VOC, where VOC = VOC_1, VOC_2, VOC_3, ..., VOC_N;

[0067] Randomly initialize the vector matrix word_emb corresponding to the vocabulary VOC so that word_emb has the same dimension as the word vectors vec, and assign values according to formula (1):

[0068] $\mathrm{word\_emb}_i = \mathrm{vec}_i, \quad \forall i \in \mathrm{voc} \qquad (1)$ ...
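The merge-and-assign procedure in paragraphs [0066] to [0068] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function name `build_word_emb`, the uniform initialization range, and the toy vocabularies are assumptions introduced for the example.

```python
import random


def build_word_emb(voc, vec, voc_train, seed=0, scale=0.1):
    """Merge voc and voc' into VOC, then initialise word_emb per formula (1).

    voc       -- words from the unlabeled corpus (Step 1)
    vec       -- pre-trained vectors aligned with voc
    voc_train -- words from the labeled training corpus (voc')
    scale     -- range of the random initialisation (an assumption)
    """
    rng = random.Random(seed)
    dim = len(vec[0])
    pretrained = dict(zip(voc, vec))

    # VOC: union of voc and voc', keeping first-seen order.
    VOC = list(dict.fromkeys(list(voc) + list(voc_train)))

    # Random initialisation with the same dimension as vec.
    word_emb = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in VOC]

    # Formula (1): word_emb_i = vec_i for every i in voc.
    for i, word in enumerate(VOC):
        if word in pretrained:
            word_emb[i] = list(pretrained[word])
    return VOC, word_emb


# Toy data, purely illustrative.
voc = ["fever", "cough"]
vec = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
voc_train = ["cough", "aspirin"]
VOC, word_emb = build_word_emb(voc, vec, voc_train)
```

Here "aspirin" appears only in the training corpus, so its row keeps the random initialization, while "fever" and "cough" receive their pre-trained vectors.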

Specific Embodiment 3

[0073] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 or 2 in that the input sequence X of the LSTM neural network described in Step 2.1 is calculated as follows:

[0074] Randomly initialize the vector matrix word_emb corresponding to the vocabulary VOC, and keep the vectors word_emb_i assigned according to formula (1) unchanged, that is, they are not updated as parameters. Then randomly initialize another vector matrix corresponding to the vocabulary VOC as word_emb_para, and calculate the input sequence X of the LSTM neural network:

[0075] X = (x_k[k_1, k_2] · w...


Abstract

The invention provides a deep-learning-based named entity recognition method for the medical field and relates to named entity recognition methods. It mainly aims to solve the problem that, because the CRF model does not consider semantic information, a large number of meaningless labelling results appear when the training corpus is extremely scarce. The method is realized by the following steps: first, obtaining a vocabulary voc that supplements the medical-field corpus and the word vectors vec corresponding to voc; second, training a long short-term memory (LSTM) network on the training corpus in the labelled corpus; and third, performing path finding over the labelling results according to the neural network parameters theta updated in step two, thereby obtaining the labelling results, and evaluating the labelling results of the test corpus in the labelled corpus using the named entity recognition evaluation criterion, the F value. The method is applied to the field of named entity recognition.
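The F value mentioned above is commonly computed at the entity level as the harmonic mean of precision and recall. The sketch below illustrates that calculation under stated assumptions: it assumes a BIO tagging scheme and exact-match scoring of entity spans, neither of which is specified in the text shown here, and the function names and toy tags are hypothetical.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence (assumed scheme)."""
    spans = []
    start, etype = None, None
    # A trailing "O" closes any entity that runs to the end of the sequence.
    for idx, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.append((start, idx, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = idx, tag[2:]
    return set(spans)


def f_score(gold_tags, pred_tags):
    """Entity-level precision, recall, and F value with exact span matching."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: the prediction finds the disease but misses the drug.
gold_tags = ["B-DIS", "I-DIS", "O", "B-DRUG"]
pred_tags = ["B-DIS", "I-DIS", "O", "O"]
precision, recall, f_value = f_score(gold_tags, pred_tags)
```

In this toy case precision is 1.0 and recall is 0.5, so the F value is 2/3.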

Description

Technical field

[0001] The present invention relates to a named entity recognition method, in particular to a deep-learning-based named entity recognition method for the medical field.

Background art

[0002] As one of the basic tasks of information extraction, named entity recognition has important applications in question answering systems, syntactic analysis, machine translation and other fields. Medical entities differ greatly from ordinary entities, and annotated open-domain entity corpora are of little help in annotating medical entities. The cost of entity labeling in the medical field is reduced. Therefore, how to use a small amount of annotated corpus to achieve better annotation in the medical field is very important.

[0003] Deep learning has made significant progress in recent years, and it has been proven able to discover complex structures in high-dimensional data for learning. At present, in the f...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/27, G06F19/00

CPC: G06F19/326, G06F40/295

Inventor: 朱聪慧, 赵铁军, 杨沐昀, 徐冰, 曹海龙, 郑德权

Owner: NAT INST OF ADVANCED MEDICAL DEVICES SHENZHEN
