Adversarial interpolation sequence-based annotation data enhancement method and device, equipment and medium

A sequence labeling and interpolation technology, applied in electronic digital data processing, text database querying, and special data processing applications. It addresses the shortage of sample data for sequence labeling models, which degrades model training, and improves the performance of sequence models in low-resource settings.

Pending Publication Date: 2021-08-24
CHINA PING AN LIFE INSURANCE CO LTD

AI Technical Summary

Problems solved by technology

Addresses the technical problem that sequence labeling models have too little sample data, which degrades model training.



Examples


Detailed Description of the Embodiments

[0050] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0051] It should be understood that the terms "first", "second" and the like, as used in this application, may describe various elements, but the elements are not limited by these terms; the terms only distinguish one element from another. For example, without departing from the scope of the present application, the first field and algorithm determination module could be called the second field and algorithm determination module, and similarly, the second field and algorithm determination module could be called the first field and algorithm...


Abstract

The invention discloses a method, apparatus, device and medium for enhancing sequence labeling data based on adversarial interpolation. The method comprises the following steps: acquiring first sample data containing sequence labels; inputting the first sample data into a preset language model, outputting candidate word vectors that conform to contextual semantic constraints, and forming enhanced second sample data from the candidate word vectors; and interpolating between the first sample data and the second sample data using an adversarial interpolation method to obtain interpolated, enhanced sample data. In the sequence labeling data enhancement method provided by the embodiments of the invention, the language model supplies candidate word vectors that satisfy the context constraints, and the adversarial interpolation takes the characteristics of the task into account. Together they generate harder samples, that is, samples on which a machine learning algorithm is likely to make mistakes, which improves the performance of the sequence model in low-resource settings. This addresses the problem that a shortage of labeled data reduces model accuracy.
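The abstract does not specify how the adversarial interpolation is computed, so the following is only a minimal sketch under stated assumptions, not the patented procedure. It assumes that the candidate word vectors from the language model are already available as an array (`x_cand`, simulated here with noise), that the tagger is a linear softmax classifier (`weights`), and that "adversarial" means choosing the mixing coefficient that maximizes the tagger's loss over a small grid. The names `adversarial_interpolate`, `lam_max` and `steps` are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean token-level cross-entropy using a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def adversarial_interpolate(x_orig, x_cand, labels, weights, lam_max=0.5, steps=11):
    """Mix original and candidate token vectors, picking the hardest mix.

    Assumption: the interpolation is linear, x = (1 - lam) * x_orig + lam * x_cand,
    and lam is chosen on a grid in [0, lam_max] to maximize the tagger loss.
    The sequence labels are kept unchanged.
    """
    best_lam, best_loss = 0.0, -np.inf
    for lam in np.linspace(0.0, lam_max, steps):
        x_mix = (1.0 - lam) * x_orig + lam * x_cand
        loss = cross_entropy(x_mix @ weights, labels)
        if loss > best_loss:
            best_lam, best_loss = float(lam), float(loss)
    x_mix = (1.0 - best_lam) * x_orig + best_lam * x_cand
    return x_mix, best_lam, best_loss

# Toy data: 6 tokens, 8-dimensional vectors, 3 label classes.
rng = np.random.default_rng(0)
T, d, K = 6, 8, 3
x_orig = rng.normal(size=(T, d))                  # first sample data (token vectors)
x_cand = x_orig + 0.3 * rng.normal(size=(T, d))   # stand-in for language-model candidates
labels = rng.integers(0, K, size=T)               # sequence labels, one per token
weights = rng.normal(size=(d, K))                 # stand-in linear tagger

x_aug, lam, loss = adversarial_interpolate(x_orig, x_cand, labels, weights)
print("chosen lam:", lam, "loss on augmented sample:", loss)
```

Capping the coefficient at `lam_max` keeps the augmented sample close to the original, which is one plausible way to justify reusing the original labels; a gradient-based search over the coefficient would be an alternative to the grid.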

Description

Technical Field

[0001] The present invention relates to the technical field of sequence labeling, and in particular to a method, apparatus, device and medium for enhancing sequence labeling data based on adversarial interpolation.

Background

[0002] Sequence models are widely used in Chinese word segmentation, named entity recognition, entity and relation extraction, and similar tasks. Applying sequence labeling in online scenarios runs into the problem of scarce labeled data (low resources). In low-resource conditions, for example when each label has only a small number of samples, the model may overfit and its performance will fall short of expectations. The overfitting becomes more pronounced as data grows scarcer, as in the extreme case of only 5 samples per class. For low-resource application scenarios where labeled data is scarce, data augmentation is an effective technique, which can use a very small amount of labeled corpus to obtain a basic mode...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F40/289, G06F40/30
CPC: G06F16/3344, G06F16/3346, G06F40/289, G06F40/30
Inventor: 刘广
Owner: CHINA PING AN LIFE INSURANCE CO LTD