Entity identification method, terminal equipment and storage medium

An entity recognition technology, applied in the field of text recognition, that addresses three problems: recognition accuracy that falls short of expectations, insufficient accuracy in determining entity boundaries, and a lack of high-quality labeled entities. Its intended effect is to reduce the impact of inaccurate entity boundary determination on the accuracy of the results.

Active Publication Date: 2020-12-18
厦门渊亭信息科技有限公司

AI Technical Summary

Problems solved by technology

[0009] 1. Existing entity recognition models based on recurrent neural networks require a large amount of high-quality labeled training data, but specialized professional domains usually lack high-quality labeled entities, which makes training existing models very challenging.
[0010] 2. Existing entity recognition models cannot handle nested entities well (for example, in the entity "Xiamen Jimei Software Park", "Xiamen", "Jimei", "Software Park" and "Jimei Software Park" are all independent entities). The usual practice is to take the outermost (longest) entity and ignore the other entities inside it.
[0011] 3. The bottleneck of existing entity recognition technology lies in the insufficient accuracy of entity boundary determination. Sparse boundary labels and fuzzy matching usually prevent the accuracy of the recognition results from meeting expectations.
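
The nested-entity problem in item 2 can be illustrated with a minimal sketch. It assumes a hypothetical toy dictionary built from the "Xiamen Jimei Software Park" example and uses simple dictionary lookup, not the patent's model; the function names are illustrative only. It contrasts enumerating every nested entity with keeping only the outermost one.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def find_all_entities(tokens: List[str], dictionary: Set[Tuple[str, ...]]) -> List[Span]:
    """Return every span whose tokens form a dictionary entity (nested ones included)."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            if tuple(tokens[start:end]) in dictionary:
                spans.append((start, end))
    return spans


def outermost_only(spans: List[Span]) -> List[Span]:
    """Keep only spans that are not contained in another span (the 'usual practice')."""
    return [
        s for s in spans
        if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)
    ]


# Hypothetical toy data based on the example in paragraph [0010].
tokens = ["Xiamen", "Jimei", "Software", "Park"]
dictionary = {
    ("Xiamen",),
    ("Jimei",),
    ("Software", "Park"),
    ("Jimei", "Software", "Park"),
    ("Xiamen", "Jimei", "Software", "Park"),
}

all_spans = find_all_entities(tokens, dictionary)
kept = outermost_only(all_spans)
```

Here `all_spans` contains five spans while `kept` contains only the full four-token span, which shows how much entity information the longest-match practice discards.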

Examples

Embodiment 1

[0041] This embodiment of the present invention provides an entity identification method. As shown in Figure 1 and Figure 2, the method includes the following steps:

[0042] S1: Construct a word graph containing the domain entities corresponding to the text to be recognized.

[0043] In this embodiment, a word graph is constructed for the dictionary of each domain. All the words (characters) contained in the dictionary form the vertex set of the word graph. If two characters can form an entity, the two characters are connected by a straight line (edge) representing an undirected relationship.
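
The graph construction in [0043] can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that "two characters can form an entity" means the two characters are adjacent inside some dictionary entity, and the function name `build_word_graph` and the toy dictionary are hypothetical.

```python
from typing import Dict, Iterable, Set


def build_word_graph(entities: Iterable[str]) -> Dict[str, Set[str]]:
    """Build an undirected character graph as an adjacency map.

    Vertices: every character that occurs in a dictionary entity.
    Edges (assumption): characters adjacent within the same entity.
    """
    graph: Dict[str, Set[str]] = {}
    for entity in entities:
        # Register every character as a vertex, even in single-character entities.
        for ch in entity:
            graph.setdefault(ch, set())
        # Connect neighbouring characters in both directions (undirected edge).
        for left, right in zip(entity, entity[1:]):
            graph[left].add(right)
            graph[right].add(left)
    return graph


# Toy domain dictionary: Xiamen, Jimei, Software Park.
domain_dictionary = ["厦门", "集美", "软件园"]
word_graph = build_word_graph(domain_dictionary)
```

An adjacency map keeps vertex lookup and neighbour enumeration simple; a graph neural network library would typically expect the same information as an edge index or adjacency matrix instead.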

[0044] Usually an entity is composed of multiple characters, and the positions of the different characters within an entity are distinct and fixed. To capture the positional relationship of the characters in an entity, this embodiment sets the mark set BIESO to record, for each vertex, which of the following five situations the corresponding d...
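
Because the enumeration of the five situations is cut off above, the following sketch assumes the conventional reading of the BIESO tag set (B = begin, I = inside, E = end, S = single-character entity, O = outside any entity). That reading, the function names, and the sample text are assumptions for illustration; the patent's exact definitions may differ.

```python
from typing import List, Tuple


def position_tags(entity: str) -> List[str]:
    """Tag each character of one entity by its position (assumed BIES convention)."""
    if not entity:
        return []
    if len(entity) == 1:
        return ["S"]
    return ["B"] + ["I"] * (len(entity) - 2) + ["E"]


def tag_text(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Tag a whole text; characters outside every entity span get 'O'.

    `spans` are half-open character ranges [start, end) of known entities.
    """
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start:end] = position_tags(text[start:end])
    return tags


# "去厦门" ("go to Xiamen"): only the last two characters form an entity.
example_tags = tag_text("去厦门", [(1, 3)])
```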

Embodiment 2

[0069] The present invention also provides an entity recognition terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the method of Embodiment 1 of the present invention.

Abstract

The invention relates to an entity recognition method, terminal equipment and a storage medium. The method comprises the steps of: S1, constructing a word graph containing the domain entities corresponding to a to-be-recognized text; S2, expressing each word in the to-be-recognized text as a vocabulary tensor through a word vector embedding layer; S3, extracting candidate entities corresponding to the to-be-recognized text from the constructed word graph through a graph neural network module according to all vocabulary tensors of the to-be-recognized text, wherein the graph neural network module comprises a graph attention network layer and a bidirectional graph convolutional network layer; S4, converting the vocabulary tensors and the candidate entities of the to-be-recognized text into an intermediate calculation tensor containing context information through a bidirectional recurrent neural network layer; and S5, inputting the intermediate calculation tensor into a CRF decoding layer for decoding to obtain the entities finally recognized in the to-be-recognized text. According to the method, the secondary graph structure of the entity boundary is modeled, and the relationship between the entity boundary and the graph neural network is analyzed, so that the influence of insufficient entity boundary determination on the accuracy of the results is reduced.
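
Step S5 decodes with a CRF layer. The abstract does not say how decoding is done; linear-chain CRF layers commonly pick the highest-scoring label sequence with the Viterbi algorithm, so the following is a minimal sketch under that assumption. The label set, the emission scores (standing in for the output of the recurrent layer in S4) and the transition scores are all made-up illustrative values.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the label sequence maximizing emission + transition scores."""
    if not emissions:
        return []
    # best[t][y]: score of the best path that ends with label y at position t.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["B", "E", "O"]
# Hypothetical per-character scores for a three-character text.
emissions = [
    {"B": 2.0, "E": 0.0, "O": 1.0},
    {"B": 0.5, "E": 1.0, "O": 1.2},
    {"B": 0.0, "E": 0.0, "O": 2.0},
]
# Hypothetical transition scores: "B" should be followed by "E",
# and "E" may not follow "E" or "O".
transitions = {
    "B": {"B": -5.0, "E": 1.0, "O": -5.0},
    "E": {"B": 0.0, "E": -5.0, "O": 0.0},
    "O": {"B": 0.0, "E": -5.0, "O": 0.0},
}
decoded = viterbi_decode(emissions, transitions, labels)
```

Picking the best label independently at each position would tag the second character "O" and leave a dangling "B"; the transition scores let the decoder prefer a sequence with consistent entity boundaries.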

Description

Technical field

[0001] The invention relates to the field of text recognition, in particular to an entity recognition method, terminal equipment and storage medium.

Background art

[0002] Named Entity Recognition (NER), also known as "proper name recognition", refers to the recognition of entities with specific meanings in text, including names of persons, places and organizations, proper nouns, and so on. Simply put, it identifies the boundaries and categories of entity mentions in natural text. Current entity recognition methods include:

[0003] 1. Supervised learning methods: This type of algorithm needs a large-scale labeled corpus to train the model parameters. Commonly used models or methods include the hidden Markov model (HMM), language models, the maximum entropy model, the support vector machine (SVM), decision trees (DT) and the conditional random field (CRF). The current method based on conditional random field is the most successful method in named entity recognition...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/284, G06F40/126, G06F40/242, G06F16/901, G06N3/04
CPC: G06F40/295, G06F40/284, G06F40/126, G06F40/242, G06F16/9024, G06N3/045
Inventors: 洪万福, 钱智毅, 刘剑涵
Owner: 厦门渊亭信息科技有限公司