Multi-task named entity recognition and adversarial training method for the medical field

A named entity recognition and training method, applied in neural learning methods, character and pattern recognition, instruments, etc., that addresses problems such as poor neural network model performance

Inactive Publication Date: 2018-06-29
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

Because training data in the biomedical field is scarce, neural network models often perform poorly.

Examples

Embodiment

[0134] Taking the Ex-PTM and BioNLP11EPI data sets as examples, the implementation steps of the present invention are as follows:

[0135] 1. Download the BioNLP11EPI data set from http://2011.bionlp-st.org and the Ex-PTM data set from http://www.geniaproject.org/, then use the standoff2conll tool to convert them so that each row consists of a word and a label.
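The word-and-label row format described above can be read back with a few lines of Python. This is a minimal sketch, not the patent's own code: it assumes whitespace-separated columns with the word first and the label last, and a blank line between sentences. The function name `read_conll` and the sample labels are hypothetical.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(text: str) -> List[Sentence]:
    """Parse CoNLL-style text: one token per row, blank line between sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        # Assumption: first column is the word, last column is the label.
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample only; these rows are not taken from either data set.
sample = (
    "Histone\tB-Protein\n"
    "H3\tI-Protein\n"
    "acetylation\tO\n"
    "\n"
    "Methylation\tO\n"
    "of\tO\n"
    "p53\tB-Protein\n"
)
sentences = read_conll(sample)
```

Taking the first and last columns keeps the reader tolerant of converters that emit extra columns (such as offsets) between the word and the label.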

[0136] If a small data set has many labels, training results will be poor. For data sets whose data volume is below the threshold, replace the multiple labels with a single label: for example, the labels of the AnatEM data set are uniformly mapped to B-Anatomy or I-Anatomy.
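The label-merging step can be sketched as follows. The source does not say how data volume is measured or what the threshold is, so this sketch assumes a sentence count and a caller-supplied threshold. The helper names and the fine-grained labels in the example (`B-Organ`, `B-Cell`) are hypothetical.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]


def collapse_labels(sentence: Sentence, entity_type: str) -> Sentence:
    """Map every B-*/I-* label to B-<entity_type>/I-<entity_type>; keep O as-is."""
    collapsed: Sentence = []
    for word, label in sentence:
        if label == "O":
            collapsed.append((word, label))
        else:
            prefix = label.split("-", 1)[0]  # "B" or "I"
            collapsed.append((word, f"{prefix}-{entity_type}"))
    return collapsed


def maybe_collapse(sentences: List[Sentence], entity_type: str,
                   threshold: int) -> List[Sentence]:
    """Collapse labels only when the data set is smaller than the threshold.

    Assumption: data volume is measured as the number of sentences.
    """
    if len(sentences) >= threshold:
        return sentences
    return [collapse_labels(s, entity_type) for s in sentences]


# Illustrative fine-grained labels, merged into a single Anatomy type.
example = [("left", "B-Organ"), ("lung", "I-Organ"), ("and", "O"), ("neuron", "B-Cell")]
merged = collapse_labels(example, "Anatomy")
```

Keeping the B/I prefix preserves entity boundaries while discarding the fine-grained type, which is the only information the merge is meant to drop.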

[0137] 2. Compile statistics on sentences, words, and labels to form a sentence table, a vocabulary table $V$, and a label table; compile statistics on the characters in the words to form a character table $V^{chr}$. Let $d^{chr}$ be the dimensionality of each character vector; the matrix of character vectors is then:

[0138]

[0139] in for dime...
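Building the tables in step 2 and allocating a character-vector matrix might look like the sketch below. It makes several assumptions the source does not confirm: indices are assigned in sorted order, the matrix stores one row of length `d_chr` per character, and entries are initialised uniformly at random. The function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]


def build_tables(sentences: List[Sentence]) -> Tuple[Dict[str, int], Dict[str, int], Dict[str, int]]:
    """Return (vocabulary table, label table, character table) as item -> index maps."""
    words, labels, chars = set(), set(), set()
    for sentence in sentences:
        for word, label in sentence:
            words.add(word)
            labels.add(label)
            chars.update(word)  # every character occurring in the word

    def index(items) -> Dict[str, int]:
        # Sorted order makes the indices reproducible across runs.
        return {item: i for i, item in enumerate(sorted(items))}

    return index(words), index(labels), index(chars)


def init_char_embeddings(char_table: Dict[str, int], d_chr: int,
                         seed: int = 0) -> List[List[float]]:
    """Allocate one d_chr-dimensional vector per character (row layout is an assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(d_chr)] for _ in char_table]


data = [[("p53", "B-Protein"), ("binds", "O")]]
word_table, label_table, char_table = build_tables(data)
char_matrix = init_char_embeddings(char_table, d_chr=4)
```

In a real model these vectors would be trainable parameters consumed by the character-level convolutional encoder; here they are plain lists so the shapes are easy to inspect.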

Abstract

The invention discloses a multi-task named entity recognition and adversarial training method for the medical field. The method includes the following steps: (1) collecting and processing data sets so that each row consists of a word and a label; (2) using a convolutional neural network to encode the character-level information of each word into a character vector, then concatenating it with the word vector to form the input feature vector; (3) constructing a shared layer, in which a bidirectional long short-term memory (BiLSTM) network models the input feature vectors of each word in a sentence to learn the features common to all tasks; (4) constructing a task layer, in which a BiLSTM network models the input feature vectors together with the output of step (3) to learn the private features of each task; (5) using conditional random fields to decode labels from the outputs of steps (3) and (4); (6) using the shared-layer information to train an adversarial network that reduces the private features mixed into the shared layer. The method performs multi-task learning on data sets from multiple disease domains and introduces adversarial training to make the shared-layer and task-layer features more independent, so that multiple named entity recognition tasks in a specific domain can be trained simultaneously, quickly and efficiently.
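Step (5) decodes labels with conditional random fields. For a linear-chain CRF, the highest-scoring label sequence is usually found with the Viterbi algorithm. The sketch below illustrates that general technique and is not the patent's implementation: the emission scores stand in for the outputs of the shared and task layers, and all scores, label names, and the function name `viterbi_decode` are hypothetical.

```python
from typing import Dict, List, Tuple


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float],
                   labels: List[str]) -> List[str]:
    """Return the label sequence maximising emission plus transition scores.

    emissions[t][label] is the score of `label` at position t;
    transitions[(prev, cur)] is the score of moving from prev to cur
    (missing pairs default to 0.0).
    """
    if not emissions:
        return []
    best = [{lab: emissions[0][lab] for lab in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            # Best predecessor for `cur` given the scores accumulated so far.
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


labels = ["O", "B-Protein", "I-Protein"]
# Penalise the invalid transition O -> I-Protein.
transitions = {("O", "I-Protein"): -10.0}
emissions = [
    {"O": 1.0, "B-Protein": 0.8, "I-Protein": 0.0},
    {"O": 0.1, "B-Protein": 0.2, "I-Protein": 1.0},
]
decoded = viterbi_decode(emissions, transitions, labels)
```

Picking the best label at each position independently would give `O` followed by `I-Protein`, which is not a valid tag sequence. The transition penalty makes the decoder prefer `B-Protein` followed by `I-Protein`, which is the kind of label dependency a CRF layer is meant to capture.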

Description

Technical field

[0001] The invention relates to natural language processing, in particular to a multi-task named entity recognition and adversarial training method for the medical field.

Background

[0002] Natural Language Processing (NLP) is an interdisciplinary subject integrating linguistics and computer science. Named Entity Recognition (NER) is a basic task in natural language processing that aims to identify and classify proper nouns and meaningful quantitative phrases in natural language text. With the rise of information extraction and big data, named entity recognition has attracted increasing attention and has become an important component of natural language processing applications such as public opinion analysis, information retrieval, automatic question answering, and machine translation. How to automatically, accurately and quickly identify named entities from massive Internet text information has gradually become a hot topic in ac...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06N3/04, G06N3/08
CPC: G06N3/084, G06N3/045, G06F18/2411, G06F18/214
Inventors: 汤斯亮, 王凯, 张宁, 吴飞, 庄越挺
Owner: ZHEJIANG UNIV