Mixed corpus named entity recognition method based on LSTM-CNN

A named entity recognition and corpus technology, applied in the information field. It addresses problems such as vanishing gradients, a low recognition rate for unregistered (out-of-vocabulary) words, and named entity recognition results that show no significant advantage, and it achieves the effects of improving accuracy and avoiding the unregistered-word problem.

Inactive Publication Date: 2018-05-01
北京知道未来信息技术有限公司
Cites: 4; Cited by: 9

AI Technical Summary

Problems solved by technology

[0012] Disadvantage 1: The detection granularity for multiple languages is hard to define, and word segmentation accuracy is lost when one of the languages goes undetected.
[0013] Disadvantage 2: Methods based on word-frequency statistics, such as the HMM (Hidden Markov Model) and the CRF (Conditional Random Field), can relate the current word only to the semantics of the preceding word, so recognition accuracy is not high enough; in particular, the recognition rate for unregistered (out-of-vocabulary) words is low.
[0014] Disadvantage 3: Methods based on artificial neural network models suffer from vanishing gradients during training, so in practice the networks have few layers, and the final named entity recognition results show no clear advantage.
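
The vanishing-gradient issue named in Disadvantage 3 can be illustrated numerically. The sketch below is not taken from the patent: it assumes a plain chain of sigmoid activations with all weights fixed at 1 (a simplifying assumption), and the function names are hypothetical. Because the sigmoid derivative never exceeds 0.25, the backpropagated factor shrinks geometrically with depth, which is one reason such networks are kept shallow.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def backprop_factor(depth: int, pre_activation: float = 0.0) -> float:
    """Product of sigmoid derivatives across `depth` layers.

    Assumption for illustration: every layer sees the same pre-activation
    and every weight equals 1, so the factor is simply grad ** depth.
    """
    return sigmoid_grad(pre_activation) ** depth


# Best case (pre-activation 0): the factor still collapses with depth.
for depth in (1, 5, 10, 20):
    print(depth, backprop_factor(depth))
```

Even in this best case the factor at depth 10 is already below one millionth, so the earliest layers receive almost no training signal.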

Examples

Detailed Description of Embodiments

[0044] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0045] The invention discloses a mixed corpus named entity recognition method based on LSTM-CNN, which, for example, identifies named entities such as person names, place names, and organization names in corpus data that mixes multiple languages. The invention addresses three core problems: (1) the efficiency of mixed corpus recognition, (2) the precision of named entity recognition, and (3) the recognition precision for unregistered words.
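
To make the task in [0045] concrete, the sketch below decodes character-level tags into entity spans for a sentence that mixes English and Chinese. The BIO tagging scheme, the tag names (PER, LOC), the example sentence, and the function `decode_entities` are all assumptions chosen for illustration; the source text shown here does not state which tagging scheme the invention uses.

```python
from typing import List, Optional, Tuple


def decode_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-character BIO tags into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current.append(ch)
        else:
            # "O" or an inconsistent "I-" tag: close the open entity, if any.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None

    if current:
        entities.append(("".join(current), current_type))
    return entities


# Hypothetical mixed-language sentence: "Tom goes to Beijing".
chars = list("Tom去北京")
tags = ["B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(decode_entities(chars, tags))  # [('Tom', 'PER'), ('北京', 'LOC')]
```

Because the tags are assigned per character, the same decoding loop handles the English and the Chinese span without knowing which language each character belongs to.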

[0046] To solve the problem of unregistered words, the present invention abandons the traditional vocabulary-based method and instead adopts vector representations, using character vectors rather than word vectors. In order to solve the problem of low precision of the traditional ...
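
The character-vector idea in [0046] can be illustrated with a small lookup sketch. It is not the patent's implementation: the toy training words, the `<UNK>` token, and the helpers `build_vocab` and `encode` are hypothetical. It shows that a word never seen during training maps to the unknown index in a word-level vocabulary, while its characters can still be indexed individually.

```python
from typing import Dict, Iterable, List

UNK = "<UNK>"  # hypothetical placeholder for unseen tokens


def build_vocab(tokens: Iterable[str]) -> Dict[str, int]:
    """Assign consecutive integer ids to tokens, reserving id 0 for UNK."""
    vocab = {UNK: 0}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, falling back to the UNK id for unseen tokens."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


# Toy training data (assumed): three segmented words.
train_words = ["北京", "大学", "上海"]
word_vocab = build_vocab(train_words)
char_vocab = build_vocab(ch for word in train_words for ch in word)

# "上海大学" never appears as a word in training, but its characters do.
new_word = "上海大学"
word_ids = encode([new_word], word_vocab)
char_ids = encode(list(new_word), char_vocab)

print(word_ids)  # [0]
print(char_ids)  # [5, 6, 3, 4]
```

A character that never occurred in training would still map to `<UNK>`, so the character-level approach shrinks the unknown-token problem rather than removing it in every case; the index sequences would then be fed to an embedding layer to obtain the character vectors.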

Abstract

The invention relates to a mixed corpus named entity recognition method based on LSTM-CNN. In the training stage, labeled mixed corpus training data is converted into character-level mixed corpus data, and a deep learning model based on LSTM-CNN is then trained on it. In the prediction stage, unlabeled mixed corpus test data is converted into character-level mixed corpus data, and the model trained in the training stage is used to perform prediction. Because the method uses character-level rather than word-level vectors, it avoids the influence of word segmentation precision and, at the same time, the problem of unknown words. By combining a long short-term memory (LSTM) neural network with a convolutional neural network (CNN), it improves precision considerably compared with traditional algorithms. The mixed corpus is used directly for model training, so there is no need to detect and separate each language in the mixed corpus, and the goal of recognizing entities in the mixed corpus is achieved.
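
The conversion step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: the input is assumed to be a list of (token, label) pairs with BIO labels, and the function names `to_char_level` and `to_chars` are hypothetical.

```python
from typing import List, Tuple


def to_char_level(tagged_tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (token, BIO label) pairs into (character, BIO label) pairs.

    Only the first character of a "B-" token keeps the "B-" prefix; the
    remaining characters continue the entity with the matching "I-" label.
    """
    out: List[Tuple[str, str]] = []
    for token, label in tagged_tokens:
        for i, ch in enumerate(token):
            if label.startswith("B-") and i > 0:
                out.append((ch, "I-" + label[2:]))
            else:
                # "O", "I-" labels, and the first character of a "B-" token.
                out.append((ch, label))
    return out


def to_chars(text: str) -> List[str]:
    """Prediction-stage conversion: unlabeled text to characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


# Training stage: labeled, mixed English/Chinese tokens (toy example).
labeled = [("Obama", "B-PER"), ("访问", "O"), ("北京", "B-LOC")]
for ch, label in to_char_level(labeled):
    print(ch, label)

# Prediction stage: unlabeled text, no language detection step.
print(to_chars("Obama 访问北京"))
```

Both stages treat every character the same way regardless of language, which mirrors the abstract's point that the languages in the mixed corpus do not need to be detected and separated first.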

Description

Technical Field

[0001] The invention belongs to the field of information technology, and in particular relates to a mixed corpus named entity recognition method based on LSTM-CNN.

Background

[0002] Named entity recognition refers to the process of identifying specified entity nouns with specific meanings in a given dataset. Practical scenarios for named entity recognition include:

[0003] Scenario 1: Event detection. Place, time, and person are basic components of an event. When constructing an event summary, the relevant persons, places, units, etc. can be highlighted. In an event search system, the related people, times, and places can be used as index keywords. The relationships among the components of an event describe the event in more detail at the semantic level.

[0004] Scenario 2: Information retrieval. Named entities can be used to enhance and improve the effectiveness of a retrieval system. When the user enters "major", it can be fou...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06N3/04
CPC: G06F40/295, G06N3/045
Inventor: 唐华阳, 岳永鹏, 刘林峰
Owner: 北京知道未来信息技术有限公司