Sequence annotation data augmentation method, device, equipment and medium based on adversarial interpolation

A sequence labeling and interpolation technology applied in electronic digital data processing, text database querying, special-purpose data processing applications, and similar fields. It addresses the shortage of sample data for sequence labeling models, which degrades model training, and improves training effectiveness.
CN113297355A · Pending · Publication Date: 2021-08-24 · CHINA PING AN LIFE INSURANCE CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
CHINA PING AN LIFE INSURANCE CO LTD
Publication Date
2021-08-24

Figures: Figure 1, Figure 2, Figure 3 (images not reproduced)

Abstract

The invention discloses a sequence annotation data augmentation method, device, equipment and medium based on adversarial interpolation. The method comprises the following steps: acquiring first sample data containing sequence labels; inputting the first sample data into a preset language model, which outputs candidate word vectors that satisfy contextual semantic constraints, and forming augmented second sample data from the candidate word vectors; and interpolating between the first sample data and the second sample data by an adversarial interpolation method to obtain interpolated augmented sample data. In the sequence annotation data augmentation method provided by the embodiments of the invention, the language model supplies candidate word vectors that fit the context, and adversarial interpolation takes the characteristics of the task into account, so that harder samples, on which a machine learning algorithm is more likely to make misjudgments, are generated. This improves the performance of the sequence model in low-resource settings and addresses the problem of scarce annotation data reducing model accuracy.
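The abstract names the three steps but does not say how the interpolation weight is chosen. The following is a minimal Python sketch of the third step only, under loudly stated assumptions: each sample is an aligned list of per-token vectors; the second sample is the language-model substitution of the first, so both share the same tag sequence; "adversarial" is read as choosing, from a hypothetical grid of mixing weights, the one that maximizes the current model's loss; and `token_cross_entropy` is a hypothetical linear tagger standing in for the real sequence model. The names `interpolate`, `adversarial_interpolate`, and `token_cross_entropy` are illustrative and do not come from the patent.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical sketch: one reading of "adversarial interpolation", not the patented implementation.
Vector = List[float]
Seq = List[Vector]  # one vector per token


def interpolate(first: Sequence[Vector], second: Sequence[Vector], lam: float) -> Seq:
    """Token-wise convex combination: lam * first + (1 - lam) * second."""
    if len(first) != len(second):
        raise ValueError("samples must be aligned token by token")
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]
        for u, v in zip(first, second)
    ]


def token_cross_entropy(weights: Sequence[Vector], seq: Sequence[Vector],
                        labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy of a hypothetical linear tagger (one weight row per label)."""
    total = 0.0
    for vec, label in zip(seq, labels):
        logits = [sum(w * x for w, x in zip(row, vec)) for row in weights]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[label]
    return total / len(seq)


def adversarial_interpolate(
    first: Sequence[Vector],
    second: Sequence[Vector],
    labels: Sequence[int],
    loss_fn: Callable[[Seq, Sequence[int]], float],
    grid: Optional[Sequence[float]] = None,
) -> Tuple[Seq, float]:
    """Return the mix whose weight maximizes the model's loss (the hardest sample)."""
    if grid is None:
        grid = [i / 10 for i in range(1, 10)]  # assumed interior weights 0.1 ... 0.9
    best_lam, best_loss = grid[0], -math.inf
    for lam in grid:
        loss = loss_fn(interpolate(first, second, lam), labels)
        if loss > best_loss:
            best_lam, best_loss = lam, loss
    return interpolate(first, second, best_lam), best_lam


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]            # toy 2-label tagger
    original = [[1.0, 0.0], [0.0, 1.0]]     # token vectors of the first sample
    substituted = [[0.6, 0.4], [0.3, 0.7]]  # stand-in for LM candidate vectors
    tags = [0, 1]                           # shared by both samples (assumption)
    mixed, lam = adversarial_interpolate(
        original, substituted, tags, lambda s, y: token_cross_entropy(W, s, y))
    print(lam, mixed)
```

A grid search is used here only because it is easy to check; a gradient-based search over the weight, or separate weights per token, would be equally plausible readings of the abstract. Keeping the grid to interior values means every output is a genuine mix of both samples rather than a copy of one of them.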

Description

Technical Field

[0001] The present invention relates to the technical field of sequence labeling, and in particular to a method, device, equipment and medium for augmenting sequence labeling data based on adversarial interpolation.

Background

[0002] Sequence models are widely used in Chinese word segmentation, named entity recognition, entity and relation extraction, and similar tasks. When sequence labeling is deployed in online scenarios, it often faces scarce annotation data (low resources). Under low-resource conditions, for example when each label has only a few samples, the model may overfit and fail to perform as expected. Overfitting becomes more pronounced as data grows scarcer, as in the extreme case of only 5 samples per class. In low-resource application scenarios where labeled data is scarce, data augmentation is an effective technique, which can use a very small amount of labeled corpus to obtain a basic mode...

Claims
