Entity identification method and device, electronic equipment and storage medium

A technique for entity recognition and entity grouping, applied in neural learning methods, electrical digital data processing, instruments, etc. It addresses problems such as poor portability, cumbersome feature selection, and the need for large amounts of labeled data, and achieves improved representation ability and high labeling efficiency.

Pending Publication Date: 2020-11-24
杭州远传新业科技股份有限公司

AI Technical Summary

Problems solved by technology

Unsupervised models have disadvantages such as cumbersome feature selection and poor portability, while the most immediate disadvantage of supervised models is that they require a large amount of labeled corpus. Labeling this data typically demands considerable manpower, and the quality of the labeling strongly affects the recognition accuracy of the model.

Method used

Figure 1 is a flow chart of the entity recognition method of Embodiment 1; Figure 2 is a flow chart of the text-vector generation steps of Embodiment 2; Figure 3 is a structural diagram of the entity recognition device of Embodiment 3.

Examples


Embodiment 1

[0048] Embodiment 1 provides an entity recognition method; referring to Figure 1, it comprises the following steps:

[0049] S110. Learn the text to be labeled with the BERT model to obtain a word vector for each word in the text, and form a text vector from the word vectors of those words.
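A minimal sketch of S110, assuming the open-source HuggingFace transformers library and the bert-base-chinese checkpoint (the patent specifies only "the BERT model", so both are illustrative choices):

```python
# Sketch of S110: one word vector per character from a BERT encoder;
# stacked together they form the text vector. Library and checkpoint
# are assumptions, not named in the patent.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

text = "张三在杭州工作"  # hypothetical text to be labeled
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape (1, sequence_length, 768): the per-word vectors that together
# form the text vector fed to the downstream models.
text_vector = outputs.last_hidden_state
```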

[0050] The BERT (Bidirectional Encoder Representations from Transformers) model is a deep bidirectional pre-trained language understanding model that uses the Transformer model as its feature extractor. Essentially, it learns good feature representations for words by running a self-supervised learning method over a massive corpus; self-supervised learning refers to supervised learning that operates on data that has not been manually labeled. The Transformer model is a classic NLP model proposed by the Google team. It models a piece of text with the attention mechanism described by the following formula, can be trained in parallel, and captures global information....
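The formula itself did not survive extraction; what is presumably meant is the scaled dot-product attention of the original Transformer paper:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices projected from the token representations, and $d_k$ is the key dimension.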

Embodiment 2

[0075] Embodiment 2 is an improvement on the basis of Embodiment 1. Referring to Figure 2, learning the text to be labeled with the BERT model to obtain the word vector of each word and forming the text vector from those word vectors comprises the following steps:

[0076] S210. Place a sentence-start tag, a sentence-end tag, and a segmentation tag, respectively, at the beginning of a sentence, at the end of a sentence, and between two sentences of the text to be labeled, to obtain an intermediate text. Usually the sentence-start tag, sentence-end tag, and segmentation tag are the [CLS], [SEP], and [SEP] labels respectively, which makes it convenient to obtain the context information of each word in the text when learning with the BERT model.
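As a concrete illustration of S210 (the helper name is hypothetical, not from the patent), the intermediate text for a two-sentence input would look like this:

```python
def build_intermediate_text(sentences):
    """Sketch of S210: [CLS] at the start of the text, [SEP] between
    sentences, and [SEP] at the end. Helper name is illustrative."""
    return "[CLS]" + "[SEP]".join(sentences) + "[SEP]"

print(build_intermediate_text(["今天天气很好", "我们去西湖"]))
# [CLS]今天天气很好[SEP]我们去西湖[SEP]
```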

[0077] S220. Segment the intermediate text at the character level to obtain a plurality of words, randomly select several wo...
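S220 is truncated above; judging from the visible text ("randomly select several wo..."), it presumably continues with random selection of words for masking, in line with BERT's masked-language-model pre-training. The sketch below assumes a standard 15% ratio and the [MASK] token, neither of which is confirmed by this excerpt:

```python
import random

def char_segment(text):
    """Character-level segmentation: each character is one word (S220)."""
    return list(text)

def random_mask(words, ratio=0.15, mask_token="[MASK]"):
    """Randomly select several words and replace them; the ratio and
    mask token are assumptions borrowed from standard BERT practice."""
    masked = words.copy()
    chosen = random.sample(range(len(masked)), max(1, int(len(masked) * ratio)))
    for i in chosen:
        masked[i] = mask_token
    return masked

print(random_mask(char_segment("今天天气很好")))
```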

Embodiment 3

[0093] Embodiment 3 discloses an entity recognition device corresponding to the above embodiments; it is the virtual device structure of the above embodiments. Referring to Figure 3, it comprises:

[0094] The text vector calculation module 410, configured to learn the text to be labeled with the BERT model to obtain the word vector of each word in the text, and to form the text vector from the word vectors of those words;

[0095] The model set and unlabeled corpus acquisition module 420, configured to acquire a model set comprising N preliminarily trained neural network models and an unlabeled corpus comprising a plurality of unlabeled texts, the N preliminarily trained neural network models being denoted as mi, i = 1, ..., N, N > 2;

[0096] The collaborative training module 430, configured to, for each preliminarily trained neural network model mi, identify each of the unlabeled texts based on the other N-1 preliminarily trained neural ...
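Taken together, modules 420 and 430 describe a co-training loop over the model set. The sketch below is one plausible reading; predict_labels and retrain stand in for operations the excerpt describes only abstractly:

```python
from typing import Callable, List, Sequence, Tuple

Model = object          # placeholder for a neural network model
LabelSeq = List[str]    # one entity label sequence, e.g. BIO tags

def co_train(
    models: Sequence[Model],
    unlabeled_texts: Sequence[str],
    predict_labels: Callable[[Model, str], LabelSeq],
    retrain: Callable[[Model, List[Tuple[str, List[LabelSeq]]]], Model],
) -> List[Model]:
    """Sketch of module 430: for each preliminarily trained model m_i,
    label every unlabeled text with the other N-1 models, then train
    m_i on the texts plus those N-1 label-sequence groups to get M_i.
    predict_labels and retrain are hypothetical helpers."""
    trained = []
    for i, m_i in enumerate(models):
        others = [m for j, m in enumerate(models) if j != i]
        training_data = []
        for text in unlabeled_texts:
            # N-1 groups of entity label sequences for this text
            label_groups = [predict_labels(m, text) for m in others]
            training_data.append((text, label_groups))
        trained.append(retrain(m_i, training_data))  # -> M_i
    return trained
```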



Abstract

The invention discloses an entity recognition method and device, electronic equipment and a storage medium, relates to the field of natural language processing, and addresses the problem that entity recognition requires large-scale labeling of corpus samples. The method comprises the following steps: learning a text to be labeled based on a BERT model to obtain a text vector; preliminarily training each neural network model with labeled text; obtaining N-1 groups of entity label sequences for each unlabeled text from the other N-1 preliminarily trained neural network models in the model set, and training each preliminarily trained neural network model mi on each unlabeled text and its N-1 groups of entity label sequences to obtain a cooperatively trained neural network model Mi; computing over the text vector with the plurality of cooperatively trained neural network models and CRF models to obtain a plurality of candidate labeling sequences; and selecting one group of candidate labeling sequences as the labeling result of the text to be labeled according to voting rules.
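The "voting rules" are not spelled out in this excerpt; per-position majority voting over the candidate labeling sequences, shown below, is one common choice and is offered only as an assumption:

```python
from collections import Counter

def vote(candidate_sequences):
    """Per-position majority vote over equal-length candidate label
    sequences. The specific rule is an assumption; the abstract says
    only 'voting rules'."""
    return [
        Counter(labels).most_common(1)[0][0]
        for labels in zip(*candidate_sequences)
    ]

print(vote([
    ["B-PER", "I-PER", "O"],
    ["B-PER", "O",     "O"],
    ["B-PER", "I-PER", "O"],
]))
# ['B-PER', 'I-PER', 'O']
```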

Description

Technical field

[0001] The present invention relates to the field of natural language processing, and in particular to an entity recognition method and device, electronic equipment, and a storage medium.

Background technique

[0002] Named Entity Recognition (NER) is one of the most widely used and practical key technologies in the field of natural language processing, and it underpins knowledge graphs, machine translation, question answering systems, and other fields. It refers to identifying entities with specific meaning or strong reference in text and classifying them; the types of these entities mainly include person names, organization names, places, and some other proper nouns.

[0003] The training methods of entity recognition models are generally divided into two types: supervised and unsupervised. CRF and HMM are commonly used unsupervised models, while neural network models are the main representatives of supervised models. The unsupervised model has disad...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F40/295; G06F40/216; G06N3/04; G06N3/08
CPC: G06F40/295; G06F40/216; G06N3/08; G06N3/088; G06N3/045
Inventors: 嵇望, 朱鹏飞, 王伟凯, 钱艳, 安毫亿, 梁青, 陈默
Owner: 杭州远传新业科技股份有限公司