Bi-LSTM-based named entity identification method

A technique combining named entity recognition with a gradient descent algorithm, applied in the information field. It addresses problems such as shallow networks with few layers, a low recognition rate for unregistered words, and final named entity recognition results that show no clear advantage, and it achieves improved accuracy.

Status: Inactive; Publication Date: 2018-04-13
北京知道未来信息技术有限公司

AI Technical Summary

Problems solved by technology

[0010] Dictionary-based named entity recognition relies heavily on the dictionary and cannot identify unregistered (out-of-vocabulary) words.
HMM (Hidden Markov Model) and CRF (Conditional Random Field) methods based on word-frequency statistics can only relate the current word to the semantics of the previous word, so their recognition accuracy is limited, and their recognition rate for unregistered words is especially low.
Methods based on artificial neural network models suffer from vanishing gradients during training, so in practice the networks have few layers and the final named entity recognition results show no clear advantage.
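The first limitation listed here can be illustrated with a minimal sketch of dictionary-based recognition. The gazetteer entries, the labels, and the function name `dictionary_ner` are hypothetical and chosen only for illustration; the source does not describe any particular dictionary or matching strategy. The sketch uses greedy longest-match lookup, one common way to apply a dictionary.

```python
# Hypothetical toy gazetteer; a real system would load a far larger dictionary.
GAZETTEER = {
    ("Beijing",): "LOC",
    ("New", "York"): "LOC",
    ("Alan", "Turing"): "PER",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def dictionary_ner(tokens):
    """Greedy longest-match lookup; returns (start, end, label) spans."""
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so ("New", "York") is tested as a unit.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + length])
            if candidate in GAZETTEER:
                match = (i, i + length, GAZETTEER[candidate])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched entity
        else:
            i += 1
    return spans


# Every entity in this sentence is registered in the dictionary.
known = dictionary_ner("Alan Turing visited New York".split())
# None of these entities is registered, so nothing can be found.
unknown = dictionary_ner("Grace Hopper visited Shenzhen".split())
```

In this sketch the second sentence yields no entities at all, because a lookup can only return what the dictionary already contains.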

Examples

Embodiment Construction

[0031] To make the above objects, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below through specific embodiments and with reference to the accompanying drawings.

[0032] The invention discloses a Bi-LSTM-based named entity recognition method, for example for recognizing person names, place names, organization names, brand names, and company names in unstructured text. The invention addresses two core problems: 1. using an LSTM-CRF model to improve the precision of named entity recognition; 2. adding the character vectors of each word as a feature, so that named entities that are unregistered words (Out of Vocabulary, OOV) can be recognized.
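The second point can be sketched as follows: a word that never appeared in training has no word-level entry, but its characters usually do. The vocabulary layout, the `<UNK>` token, the helper names, and the English example tokens are all assumptions made for illustration; the source does not specify how words and characters are indexed before being converted to vectors.

```python
UNK = "<UNK>"  # hypothetical placeholder for anything unseen in training


def build_vocab(items):
    """Map each distinct item to an integer id; id 0 is reserved for UNK."""
    vocab = {UNK: 0}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab


def encode(tokens, word_vocab, char_vocab):
    """Return one (word_id, [char_ids]) pair per token."""
    encoded = []
    for tok in tokens:
        word_id = word_vocab.get(tok, word_vocab[UNK])
        char_ids = [char_vocab.get(ch, char_vocab[UNK]) for ch in tok]
        encoded.append((word_id, char_ids))
    return encoded


# Toy training corpus (assumed example data).
train_tokens = ["Zhang", "Wei", "works", "at", "Huawei"]
word_vocab = build_vocab(train_tokens)
char_vocab = build_vocab(ch for tok in train_tokens for ch in tok)

# "Zhangwei" was never seen as a word, but all of its characters were.
word_id, char_ids = encode(["Zhangwei"], word_vocab, char_vocab)[0]
```

Here the word-level id falls back to the unknown id, while every character id is a known one, so a character-level feature layer still receives usable input for the unregistered word.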

[0033] To improve the accuracy of named entity recognition, Bi-LSTM feature layers over the word and character vectors are added on top of the traditional CRF model. The detailed structure is as follow...
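How a CRF layer on top of per-token scores picks a label sequence can be sketched with Viterbi decoding, the standard decoding algorithm for linear-chain CRFs. The source does not state which decoder the invention uses, so this is an illustration only. The label set, the emission scores (which in an LSTM-CRF model would be produced by the Bi-LSTM layers), and the transition scores are made-up numbers.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]: score of label j at position t (made-up numbers here).
    transitions[i][j]: score of moving from label i to label j.
    """
    n = len(labels)
    score = list(emissions[0])
    backptr = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous label for arriving at label j.
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backptr.append(pointers)
    # Follow the back-pointers from the best final label.
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backptr):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [labels[k] for k in path]


labels = ["O", "B-PER", "I-PER"]
# Assumed scores for a three-token sentence.
emissions = [
    [1.0, 0.9, 0.0],
    [0.2, 0.1, 1.5],
    [2.0, 0.0, 0.0],
]
# All transitions are neutral except O -> I-PER, which is heavily penalized.
transitions = [
    [0.0, 0.0, -10.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

# Picking the best label per token independently ignores the transitions.
greedy = [labels[max(range(len(labels)), key=lambda j: row[j])] for row in emissions]
decoded = viterbi(emissions, transitions, labels)
```

With these numbers the per-token choice starts an entity with an inside tag directly after an outside tag, while the sequence-level decoder returns a consistent begin/inside/outside sequence. That is the kind of constraint a CRF layer contributes.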

Abstract

The invention relates to a Bi-LSTM-based named entity recognition method. The method comprises the following steps: 1, a training corpus for named entity recognition is tagged to form a tagged corpus; 2, the words and characters in the tagged corpus are converted into vectors; 3, a Bi-LSTM-based named entity recognition model is built from the word and character vectors, and the parameters of the model are trained; and 4, the trained model is used to predict named entities in the data to be predicted. Because the method uses both word and character vectors, it captures the features of characters and words at the same time and avoids the unregistered-word problem. In addition, compared with a traditional pure CRF model, the bidirectional long short-term memory (Bi-LSTM) neural network can absorb more character and word features, which improves the precision of entity recognition.
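Steps 1 and 4 of the abstract can be sketched with a small tagging example. The BIO scheme (B- for the first token of an entity, I- for its continuation, O for everything else) is a widely used convention, but the source does not say which tagging scheme the invention uses, so it is an assumption here, as are the function names and the example sentence.

```python
def spans_to_bio(num_tokens, spans):
    """Step 1 sketch: turn annotated (start, end, label) spans into per-token tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for k in range(start + 1, end):
            tags[k] = "I-" + label
    return tags


def bio_to_spans(tags):
    """Step 4 sketch: recover entity spans from a predicted tag sequence."""
    spans = []
    start = label = None
    for k, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((start, k, label))
            start = label = None
        if tag.startswith("B-"):
            start, label = k, tag[2:]
    return spans


tokens = ["Alan", "Turing", "visited", "Beijing"]
annotated = [(0, 2, "PER"), (3, 4, "LOC")]  # assumed manual annotation

tags = spans_to_bio(len(tokens), annotated)
recovered = bio_to_spans(tags)
entities = [(" ".join(tokens[s:e]), label) for s, e, label in recovered]
```

The per-token tags are what a sequence model would be trained to predict, and the same conversion in reverse turns predicted tags back into entity mentions.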

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a Bi-LSTM-based named entity recognition method.

Background

[0002] Named Entity Recognition (NER for short) refers to the recognition of entities with specific meanings in a text, mainly including names of people, places, and institutions, and other proper nouns.

[0003] Practical scenarios for named entity recognition include:

[0004] Scenario 1: Event detection. Place, time, and person are basic components of an event. When constructing an event summary, the relevant persons, places, organizations, and so on can be highlighted. In an event search system, the related people, times, and places can be used as index keywords. The relationships among the components of an event describe the event in more detail at the semantic level.

[0005] Scenario 2: Information retrieval. Named entities can be used to enhance and improve the effect of the retrieval s...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/295
Inventor: 岳永鹏, 唐华阳
Owner: 北京知道未来信息技术有限公司