Named entity recognition method based on bidirectional LSTM and CRF

A named entity recognition technology using bidirectional (two-way) networks, applied in the fields of neural learning methods, special data processing applications, instruments, etc. It addresses the low efficiency and low accuracy of named entity recognition, with the effects of reducing workload and simplifying information processing.

Inactive. Publication date: 2018-01-30
南京安链数据科技有限公司 (Nanjing Anlian Data Technology Co., Ltd.)
Cites: 3; Cited by: 96

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a named entity recognition method based on bidirectional LSTM and CRF, which ...

Examples

Embodiment

[0023] This embodiment provides a named entity recognition method based on bidirectional LSTM and CRF. The flow chart of the method is shown in Figure 1; the method includes the following steps:

[0024] Step 1: Use open-source tools to segment the text into phrases (word tokens), and decompose each phrase into its individual characters. Count and number the characters, words and tags to construct a character table and a phrase table. Manually annotate the text, and construct a label table by counting the labels in the text.
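A minimal sketch of the table construction in Step 1, assuming the text has already been segmented by an open-source tool and annotated by hand. The helper `build_table`, the reserved `<PAD>`/`<UNK>` entries, the frequency-then-alphabetical numbering order and the toy corpus are illustrative assumptions; the source does not say how indices are assigned.

```python
from collections import Counter

def build_table(items, reserved=("<PAD>", "<UNK>")):
    """Hypothetical helper: number items by descending frequency (ties broken
    alphabetically) after the reserved symbols; return an item -> index dict."""
    counts = Counter(items)
    table = {tok: i for i, tok in enumerate(reserved)}
    for tok in sorted(counts, key=lambda k: (-counts[k], k)):
        if tok not in table:              # never renumber a reserved symbol
            table[tok] = len(table)
    return table

# Toy pre-segmented, manually annotated corpus: (phrase, tag) pairs per sentence.
corpus = [
    [("John", "B-PER"), ("lives", "O"), ("in", "O"), ("Paris", "B-LOC")],
    [("Paris", "B-LOC"), ("airport", "O")],
]

phrases = [w for sent in corpus for w, _ in sent]
chars = [ch for w in phrases for ch in w]        # decompose phrases into characters
labels = [t for sent in corpus for _, t in sent]

phrase_table = build_table(phrases)
char_table = build_table(chars)
label_table = build_table(labels, reserved=())   # labels get no PAD/UNK entries

print(label_table)  # {'O': 0, 'B-LOC': 1, 'B-PER': 2}
```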

[0025] In this step, we annotate the text corpus with the BIO tag set, that is, the tag set is {B, I, O}, where B marks the beginning of a named entity, I marks the remaining (inside) part of a named entity, and O marks tokens that do not belong to any entity. For entity types we use PER for person, LOC for location, and FAC for facility.
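The BIO scheme can be illustrated by converting hand-marked entity spans into per-token tags. This sketch assumes the common convention of attaching the entity type to the B/I prefix (B-PER, I-PER, and so on); the source lists the prefixes and types separately, so that combination, the helper `spans_to_bio` and the example sentence are assumptions.

```python
def spans_to_bio(tokens, spans):
    """Hypothetical helper: turn entity spans (start, end_exclusive, type)
    over `tokens` into BIO tags; tokens outside every span stay 'O'."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"            # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"            # remaining tokens of the entity
    return tags

tokens = ["Li", "Ming", "visited", "Nanjing", "Museum", "in", "Nanjing"]
spans = [(0, 2, "PER"), (3, 5, "FAC"), (6, 7, "LOC")]
print(spans_to_bio(tokens, spans))
# ['B-PER', 'I-PER', 'O', 'B-FAC', 'I-FAC', 'O', 'B-LOC']
```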

[0026] Step 2: Represent the character features from Step 1 as vectors: initialize the character table C and determine the dimension d1 of each ...
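Because paragraph [0026] is truncated, the following NumPy sketch shows only one plausible reading, combined with step (2) of the Abstract (encoding character information with a bidirectional LSTM). Each character's row is looked up in the character table C of width d1, a forward and a backward LSTM run over the characters of one word, and their final hidden states are concatenated into a character vector. The uniform initialisation, the sizes `d1 = 8` and `d_h = 5`, the gate ordering and all function names are assumptions; in practice C and the LSTM weights would be trained with the rest of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_char_table(num_chars, d1, scale=0.1):
    """Character table C: one d1-dimensional row per numbered character.
    Uniform random initialisation is an assumption, not from the source."""
    return rng.uniform(-scale, scale, size=(num_chars, d1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(d_in, d_h):
    """Stacked weights W (4*d_h, d_in), U (4*d_h, d_h) and bias b (4*d_h,)."""
    return (rng.normal(0, 0.1, (4 * d_h, d_in)),
            rng.normal(0, 0.1, (4 * d_h, d_h)),
            np.zeros(4 * d_h))

def lstm_last_state(x, W, U, b):
    """Run one LSTM direction over x (T, d_in) and return the final hidden
    state. Gate order in the stacked weights: input, forget, output, candidate."""
    h = np.zeros(U.shape[1])
    c = np.zeros(U.shape[1])
    for x_t in x:
        i, f, o, g = np.split(W @ x_t + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def char_vector(char_ids, C, fwd, bwd):
    """Look up the characters in C, run the forward LSTM left-to-right and the
    backward LSTM right-to-left, and concatenate their final states."""
    x = C[char_ids]                                  # (T, d1)
    return np.concatenate([lstm_last_state(x, *fwd),
                           lstm_last_state(x[::-1], *bwd)])

d1, d_h = 8, 5                                       # illustrative sizes
C = init_char_table(num_chars=30, d1=d1)
fwd, bwd = init_lstm(d1, d_h), init_lstm(d1, d_h)
vec = char_vector([4, 7, 2], C, fwd, bwd)            # one word of three characters
print(vec.shape)  # (10,)
```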

Abstract

The invention discloses a named entity recognition method based on bidirectional LSTM and CRF, which improves and optimizes the traditional named entity recognition algorithms of the prior art. The method comprises the following steps: (1) preprocessing a text to extract its phrase information and character information; (2) encoding the character information with a bidirectional LSTM neural network to convert it into character vectors; (3) encoding the phrase information into word vectors with the GloVe model; (4) combining the character vectors and the word vectors into a context information vector and feeding it into a bidirectional LSTM neural network; and (5) decoding the output of the bidirectional LSTM with a linear-chain conditional random field to obtain the annotated entities of the text. Because the invention extracts text features with a deep neural network and decodes them with a conditional random field, it can extract text feature information effectively and performs well on entity recognition tasks in different languages.
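Steps (4) and (5) can be sketched in NumPy as follows, assuming the bidirectional LSTM output has already been projected to one score per tag (the emission scores) and that the CRF transition, start and end scores are already learned. `context_vectors`, `viterbi_decode`, the three-tag set and all numbers are hypothetical, and CRF training (the forward algorithm and log-likelihood) is omitted. The decoder is the standard Viterbi algorithm for a linear-chain CRF.

```python
import numpy as np

def context_vectors(char_vecs, word_vecs):
    """Step (4): join each token's character vector (T, dc) with its word
    vector (T, dw), e.g. a GloVe embedding, giving (T, dc + dw)."""
    return np.concatenate([char_vecs, word_vecs], axis=1)

def viterbi_decode(emissions, transitions, start, end):
    """Step (5): highest-scoring tag sequence under a linear-chain CRF.
    emissions (T, K): per-token tag scores; transitions[i, j]: score of tag i
    followed by tag j; start / end (K,): scores for the first / last tag."""
    T, K = emissions.shape
    score = start + emissions[0]                 # best score ending in each tag
    backpointers = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best path ending in i at t-1, then i -> j, then emit j
        cand = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    score = score + end
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):                # follow backpointers to t = 0
        path.append(int(backpointers[t, path[-1]]))
    path.reverse()
    return path, float(score.max())

ctx = context_vectors(np.ones((2, 10)), np.zeros((2, 50)))
print(ctx.shape)  # (2, 60)

tags = ["O", "B-PER", "I-PER"]
transitions = np.zeros((3, 3))
transitions[0, 2] = -1e4                         # forbid O -> I-PER
start = np.array([0.0, 0.0, -1e4])               # forbid starting with I-PER
end = np.zeros(3)
emissions = np.array([[2.0, 1.0, 0.0],
                      [0.0, 1.0, 1.5]])
path, total = viterbi_decode(emissions, transitions, start, end)
print([tags[k] for k in path], total)  # ['O', 'B-PER'] 3.0
```

Because O followed by I-PER is penalised, the decoder returns O, B-PER even though I-PER has the higher emission score at the second position; a per-token argmax would produce the invalid pair O, I-PER.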

Description

Technical field

[0001] The invention relates to a named entity recognition method, in particular to a named entity recognition method based on bidirectional LSTM and CRF.

Background art

[0002] Named Entity Recognition (NER) is a basic task of Natural Language Processing (NLP). Its purpose is to identify named entities, such as the names of people, places and organizations, in the input text.

[0003] In the field of named entity recognition, existing technologies can be divided into two categories. The first is based on dictionaries and rules: a phrase dictionary of high-frequency words is constructed according to how often phrases occur, and any word that can be found in the dictionary is directly identified as a named entity; alternatively, a phrase is marked directly when it satisfies the corresponding composition rules (for example, the name of an institution usually includes location and function information). The other ...
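The dictionary-based prior art in [0003] can be illustrated with a simple lookup. The sketch is hypothetical: `dictionary_ner`, the toy dictionary and the left-to-right longest-match strategy are assumptions made for illustration, since the source only says that phrases found in the dictionary are directly identified as named entities.

```python
def dictionary_ner(tokens, entity_dict, max_len=4):
    """Hypothetical dictionary lookup: scan left to right and, at each position,
    take the longest span (up to max_len tokens) found in entity_dict; matched
    spans are labelled directly with the dictionary's entity type."""
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in entity_dict:
                entities.append((phrase, entity_dict[phrase]))
                i += n
                break
        else:                                 # no dictionary match at position i
            i += 1
    return entities

entity_dict = {"Nanjing": "LOC", "Nanjing Museum": "FAC"}
result = dictionary_ner(["I", "visited", "Nanjing", "Museum", "in", "Nanjing"], entity_dict)
print(result)  # [('Nanjing Museum', 'FAC'), ('Nanjing', 'LOC')]
```

A lookup of this kind only finds phrases that are already in the dictionary.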

Application Information

IPC (8): G06F17/27; G06F17/21; G06N3/08
Inventors: 薛涵凛 (Xue Hanlin), 顾孙炎 (Gu Sunyan)
Owner: 南京安链数据科技有限公司 (Nanjing Anlian Data Technology Co., Ltd.)