Text named entity recognition method based on Bi-LSTM, CNN and CRF

A named entity recognition technology for text, applied in the fields of neural learning methods, special data processing applications, instruments, etc. It addresses problems such as computing power limitations and can be used in a wide range of application scenarios.

Status: Inactive; Publication Date: 2017-04-19
ZHEJIANG UNIV
Cites: 3; Cited by: 283

AI Technical Summary

Problems solved by technology

[0006] For sequence tagging tasks such as named entity recognition and speech recognition, bidirectional LSTM (Bi-LSTM) neural networks are efficient at problems where the relevant context has an uncertain length or is limited. However, when LSTMs are used in named entity recognition tasks, computational power constraints and the quality of word embeddings limit their efficiency when learning from past information in...

Method used



Examples


Embodiment

[0099] Taking a New York Times English news document as an example, the above method is applied to the document to perform text named entity recognition. The specific parameters and operations in each step are as follows:

[0100] 1. Use natural language processing tools to segment the document into sentences and words, so that each word in the document occupies its own line and sentences are separated by blank lines;

[0101] 2. Compile statistics on the sentences, words and labels from step 1 to form a sentence table, a vocabulary table and a label table, respectively. The labels in the training document fall into five categories: "PER (person name)", "LOC (place name)", "ORG (organization)", "FAC (facility)" and "GPE (geopolitical entity)"; the labels in the test document are all "*". According to the statistics, the document from step 1 contains 17 sentences and 466 words;

[0102] 3. Compile character statistics over the words from step 1 to form the character table C;

[0103] 4. Use the trained 600 million Stanford...
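Steps 1 to 3 can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the patent's implementation: the regex tokenizer stands in for the unnamed "natural language processing tools", and the "word label" line format, the "O" non-entity tag in the sample, the reserved padding/unknown indices and all function names are assumptions not stated in the source.

```python
import re
from collections import Counter

# Hypothetical stand-in for the unspecified NLP tools:
# a regex sentence splitter and a regex word tokenizer.
SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def to_one_word_per_line(text):
    """Step 1: one token per line, sentences separated by a blank line."""
    blocks = []
    for sentence in SENT_SPLIT.split(text.strip()):
        tokens = TOKEN.findall(sentence)
        if tokens:
            blocks.append("\n".join(tokens))
    return "\n\n".join(blocks)


def build_tables(tagged_text):
    """Step 2: build the sentence, vocabulary and label tables.

    Assumed input format: each non-blank line is "word label"
    (test documents would use "*" as the label); blank lines end sentences.
    """
    sentences, current = [], []
    for line in tagged_text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        word, label = line.split()
        current.append((word, label))
    if current:
        sentences.append(current)

    counts = Counter(word for sent in sentences for word, _ in sent)
    # Index 0 is reserved for padding/unknown words (an assumption).
    vocab = {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}
    label_names = sorted({lab for sent in sentences for _, lab in sent})
    labels = {lab: i for i, lab in enumerate(label_names)}
    return sentences, vocab, labels


def build_char_table(words):
    """Step 3: character table C; 0 = padding, 1 = unknown (assumed)."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i + 2 for i, ch in enumerate(chars)}


if __name__ == "__main__":
    print(to_one_word_per_line("Obama visited Paris. He met Macron!"))
    # "O" marks non-entity words in this hypothetical sample.
    sample = "Obama PER\nvisited O\nParis GPE\n. O\n\nHe O\nvisited O\n. O"
    sentences, vocab, labels = build_tables(sample)
    print(len(sentences), sum(len(s) for s in sentences))  # 2 7
    print(labels)  # {'GPE': 0, 'O': 1, 'PER': 2}
    print(len(build_char_table(vocab)))
```

Ordering the vocabulary by frequency is only a convenience; any fixed ordering works as long as the same tables are reused for training and test documents.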



Abstract

The invention discloses a text named entity recognition method based on Bi-LSTM, CNN and CRF. The method includes the following steps: (1) using a convolutional neural network to encode the character-level information of each word in the text into a character vector; (2) combining the character vector with the word vector and feeding the combination as input to a bidirectional LSTM neural network, which models the contextual information of each word; and (3) at the output of the LSTM neural network, using a conditional random field (CRF) to decode the label sequence of the whole sentence jointly and mark the entities in the sentence. The invention is an end-to-end model that requires no data preprocessing of the unlabeled corpus apart from pre-trained word vectors; it can therefore be widely applied to sentence labeling in different languages and domains.
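Step (3) decodes the labels of a whole sentence jointly instead of choosing each word's label independently. Assuming a standard linear-chain CRF, this decoding is the Viterbi algorithm. The pure-Python sketch below takes per-word label scores (in the described model these would come from the Bi-LSTM output layer) and a label-to-label transition score matrix; the function name `viterbi_decode` and the list-of-lists input layout are illustrative assumptions, and start/end transition scores are omitted for brevity.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence for one sentence.

    emissions[t][j]: score of label j at word t (e.g. Bi-LSTM output).
    transitions[i][j]: score of label i being followed by label j (CRF).
    """
    n_labels = len(transitions)
    # Best score of any path ending in each label at position 0.
    score = list(emissions[0])
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_labels):
            # Best previous label i for current label j.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)

    # Trace back from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    path.reverse()
    return path


if __name__ == "__main__":
    # A large penalty on the 0 -> 1 transition changes the decision.
    print(viterbi_decode([[1.0, 0.0], [0.0, 2.0]],
                         [[0.0, -10.0], [0.0, 0.0]]))  # [1, 1]
```

Choosing the best label per word from the emission scores alone would give `[0, 1]` in the usage example; the transition penalty makes the jointly decoded sequence `[1, 1]` score higher, which is the kind of sentence-level consistency the CRF layer provides.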

Description

Technical field

[0001] The invention relates to natural language processing, in particular to a text named entity recognition method based on a bidirectional LSTM neural network, a convolutional neural network and a conditional random field (CRF).

Background art

[0002] Natural Language Processing (NLP) is an interdisciplinary subject integrating linguistics and computer science. Named Entity Recognition (NER) is a basic task in natural language processing that aims to identify proper nouns and meaningful quantity phrases in natural language text and classify them. With the rise of information extraction and big data, named entity recognition has attracted increasing attention and has become an important component of natural language processing applications such as public opinion analysis, information retrieval, automatic question answering and machine translation. How to automatically, accurately and quickly identify named entities from massive Internet text i...

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27; G06N3/08
CPC: G06F40/295; G06N3/08
Inventors: 汤斯亮 (Tang Siliang), 吴飞 (Wu Fei), 张宁 (Zhang Ning), 戴洪良 (Dai Hongliang), 庄越挺 (Zhuang Yueting), 张寅 (Zhang Yin)
Owner: ZHEJIANG UNIV