
Incremental named entity recognition method based on pseudo sample replay

An incremental named entity recognition technology, applied in character and pattern recognition, neural learning methods, instruments, etc., that addresses problems such as the difficulty of distilling old knowledge.

Pending Publication Date: 2022-05-17
PEKING UNIV

AI Technical Summary

Problems solved by technology

Despite the initial success of the above approach, it suffers from the following drawback: the distillation-based approach relies on the number of old-type entities in the training dataset. If the training dataset contains no old-type entities, it is difficult for the teacher model to distill old knowledge into the student model.



Embodiment Construction

[0025] The present invention includes a main model (M) for named entity recognition and a generator (G) for generating pseudo-samples.

[0026] Main model: Named entity recognition is usually modeled as a sequence labeling task, i.e., assigning a label to each word. The main model of the present invention consists of a feature extractor and a classification layer. The feature extractor uses the pre-trained language model BERT-base, and the classification layer uses a linear layer with softmax. Given a word sequence of length $L$, $[x_1, x_2, \ldots, x_L]$, and the labels of each word, $[y_1, y_2, \ldots, y_L]$, the feature extractor first produces a hidden vector for each word, $[h_1, h_2, \ldots, h_L]$; the hidden vectors are then mapped to the label space, $[z_1, z_2, \ldots, z_L]$, and softmax gives the probability of each word over all types, $[p_1, p_2, \ldots, p_L]$:

[0027] $z_i = W h_i + b$

[0028] $p_i = \mathrm{softmax}(z_i)$

[0029] where $d$ is the hidden vector size of the pre-trained language model...
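The classification layer described above (a linear map followed by softmax, applied to each hidden vector) can be sketched as follows. This is a minimal illustration only: the hidden vectors are toy values standing in for BERT-base outputs, and the function names `linear`, `softmax`, and `classify`, as well as the sizes and weights, are assumptions made for the example rather than anything specified in the patent.

```python
import math

def linear(h, W, b):
    """Compute z = W h + b for one hidden vector h of length d."""
    return [sum(w * x for w, x in zip(row, h)) + b_j
            for row, b_j in zip(W, b)]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def classify(hidden_vectors, W, b):
    """Map each hidden vector h_i to a distribution p_i over labels."""
    return [softmax(linear(h, W, b)) for h in hidden_vectors]

# Toy input (hypothetical values): L = 2 words, d = 3, 2 labels.
H = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # one row per label, d columns
b = [0.0, 0.0]
P = classify(H, W, b)  # P[i] is the label distribution for word i
```

In a real system the hidden vectors would come from the pre-trained encoder and `W` and `b` would be learned; the sketch only shows how the per-word probabilities are derived from them.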


Abstract

The invention discloses an incremental named entity recognition method based on pseudo-sample replay. Named entity recognition underlies knowledge graph construction, and the method belongs to the technical field of information extraction in natural language processing. In the learning stage, given a training set containing only new entity types, the old model serves as a teacher and a new student model is trained with a knowledge distillation loss added to the conventional cross-entropy loss. In the review stage, pseudo-samples of the old types are generated as review materials, and old knowledge is further distilled on these materials and integrated with the new knowledge. In the method, old-type pseudo-samples are used to provide new-type supervision signals on the review materials, and the teacher provides the old-type supervision signals; once both kinds of signal are available, they are used to constrain the output of the new student model on the review materials.
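As a rough illustration of the learning-stage objective (cross-entropy plus a knowledge distillation term), the per-word loss might be sketched as below. The abstract does not give the exact formula, so several details are assumptions: the distillation term is taken to be a KL divergence from the teacher's distribution to the student's distribution renormalized over the old labels, the old labels are assumed to occupy the first positions of the student's output, and the weight `alpha` and all function names are hypothetical.

```python
import math

def cross_entropy(p_student, gold_index):
    """Conventional cross-entropy for one word with a gold label."""
    return -math.log(p_student[gold_index])

def kd_loss(p_teacher, p_student_old):
    """KL(teacher || student) over the old label set (assumed form)."""
    return sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student_old) if t > 0)

def learning_stage_loss(p_student, p_teacher, gold_index, alpha=1.0):
    """Cross-entropy plus a weighted distillation term for one word.

    Assumes the first len(p_teacher) entries of p_student are the
    old labels, in the same order as the teacher's outputs.
    """
    n_old = len(p_teacher)
    old_mass = sum(p_student[:n_old])
    # Renormalize the student's probabilities over the old labels so
    # they can be compared with the teacher's distribution.
    p_student_old = [p / old_mass for p in p_student[:n_old]]
    return (cross_entropy(p_student, gold_index)
            + alpha * kd_loss(p_teacher, p_student_old))

# Toy example (hypothetical values): 2 old labels, 1 new label.
teacher = [0.5, 0.5]
student = [0.25, 0.25, 0.5]
loss = learning_stage_loss(student, teacher, gold_index=2)
```

Here the student agrees with the teacher on the old labels, so only the cross-entropy term contributes; a student that drifts away from the teacher on the old labels would incur an additional positive penalty.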

Description

Technical Field

[0001] The invention provides an incremental named entity recognition technology, and specifically relates to a named entity recognition method based on pseudo-sample replay. Named entity recognition is the basis of knowledge graph construction technology, and the method belongs to the technical field of information extraction in natural language processing.

Background Art

[0002] Traditional named entity recognition [1] refers to extracting entities of specified categories (such as person names, place names, and institution names) from unstructured text, and is one of the important steps in information extraction. Traditional methods are limited to extracting predefined types of entities. In practice, however, the set of entity types to be extracted tends to expand dynamically with demand, which requires the model to recognize a dynamically expanding set of entity types. To adapt to this scenario, a simple method is to label a data set ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/242, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/295, G06F40/242, G06N3/08, G06N3/044, G06F18/2415
Inventors: 夏宇, 李素建
Owner: PEKING UNIV