Small sample text data hybrid enhancement method

A small-sample text data technology, applied in the field of comprehensive text data enhancement, that addresses the problem of incomplete text enhancement methods. It improves adaptability, achieves satisfactory enhancement results, and makes training easier.

Active Publication Date: 2021-12-10
10TH RES INST OF CETC
Cites: 16; cited by: 3

AI Technical Summary

Problems solved by technology

[0005] To further improve the quantity and quality of small-sample text data, the present invention addresses the incompleteness of the text enhancement methods used in existing applications. It provides a simple, complete, self-adaptive, relatively stable and effective hybrid enhancement method for small-sample text data, which benefits subsequent downstream tasks such as text classification.

Embodiment Construction

[0020] Refer to Figure 1. According to the present invention, based on the goal of text data enhancement, the original text is first divided into long text data and short text data, which are separated and distinguished automatically. The long text data is enhanced by synonym replacement, random insertion, random swapping and random deletion, with the operations adapting automatically to the text length, while the short text data is enhanced by back-translation. The length distribution of the text data samples is then analyzed statistically, and the distribution is subdivided into finer-grained groups for mask prediction or pre-training; each text data sample is assigned to one of these groups. Different masking probabilities are set for the samples of different groups, and mask prediction is performed through a denoising autoencoding process, which enhances the text data a second time. The text data is generated according to the smal...
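The first stage described in [0020] (length-based separation, the four word-level operations for long texts, and back-translation for short texts) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the source gives no length threshold, edit ratio, synonym source or translation system, so `LONG_TEXT_THRESHOLD`, the edit ratio `alpha`, the synonym table and the `translate(text, target_lang)` callable are all hypothetical. "Adapting to text length" is modeled here by making the number of edits proportional to the token count, as in the EDA technique these four operations come from.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical cut-off (not from the patent): more tokens than this means "long".
LONG_TEXT_THRESHOLD = 20
Synonyms = Dict[str, List[str]]  # hypothetical word -> synonyms table


def split_by_length(texts: List[str], threshold: int = LONG_TEXT_THRESHOLD) -> Tuple[List[str], List[str]]:
    """Separate texts into (long, short) lists by whitespace-token count."""
    long_texts: List[str] = []
    short_texts: List[str] = []
    for text in texts:
        (long_texts if len(text.split()) > threshold else short_texts).append(text)
    return long_texts, short_texts


def synonym_replace(words: List[str], n: int, synonyms: Synonyms, rng: random.Random) -> List[str]:
    """Replace up to n words that have a non-empty synonym entry."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if synonyms.get(w)]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(synonyms[out[i]])
    return out


def random_insert(words: List[str], n: int, synonyms: Synonyms, rng: random.Random) -> List[str]:
    """Insert n synonyms of words already in the text at random positions."""
    out = list(words)
    pool = [s for w in words for s in synonyms.get(w, [])]
    if not pool:
        return out
    for _ in range(n):
        out.insert(rng.randint(0, len(out)), rng.choice(pool))
    return out


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    """Swap the positions of two randomly chosen words, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    return kept or [rng.choice(words)]


def eda_augment(text: str, synonyms: Synonyms, alpha: float = 0.1, num_aug: int = 4, seed: int = 0) -> List[str]:
    """Cycle through the four operations; the edit count scales with text length."""
    rng = random.Random(seed)
    words = text.split()
    n = max(1, int(alpha * len(words)))  # length-adaptive number of edits
    ops = [
        lambda w: synonym_replace(w, n, synonyms, rng),
        lambda w: random_insert(w, n, synonyms, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_delete(w, alpha, rng),
    ]
    return [" ".join(ops[k % len(ops)](words)) for k in range(num_aug)]


def back_translate(text: str, translate: Callable[[str, str], str], pivot: str = "fr", source: str = "en") -> str:
    """Round-trip text through a pivot language using a hypothetical translator."""
    return translate(translate(text, pivot), source)


def augment_stage_one(texts: List[str], synonyms: Synonyms, translate: Callable[[str, str], str],
                      threshold: int = LONG_TEXT_THRESHOLD) -> List[str]:
    """Originals, plus EDA variants of long texts, plus back-translations of short texts."""
    long_texts, short_texts = split_by_length(texts, threshold)
    out = list(texts)
    for text in long_texts:
        out.extend(eda_augment(text, synonyms))
    for text in short_texts:
        out.append(back_translate(text, translate))
    return out
```

For example, `augment_stage_one(corpus, {"quick": ["fast"]}, my_translator)` would return the original texts followed by their variants, where `my_translator` is any function with the assumed `(text, target_lang) -> str` shape. Keeping the translator as a plain callable avoids tying the sketch to a particular translation service.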

Abstract

The disclosed hybrid enhancement method for small-sample text data is simple, complete and highly self-adaptive. It is realized through the following technical scheme: based on a text data enhancement target, the original text is divided into long text data and short text data, which are separated and distinguished automatically; synonym replacement, random insertion, random swapping and random deletion are applied to the long text data, adapting automatically to texts of different lengths; back-translation enhancement is applied to the short text data; the length distribution of the text data samples is analyzed statistically and subdivided into finer-grained groups for mask prediction or pre-training; each text data sample is assigned to a group, different masking probabilities are set for the samples of different groups, and mask prediction is performed through a denoising autoencoding process to achieve a secondary enhancement of the text data; finally, batches of enhanced texts are generated according to the number of small samples, completing the hybrid enhancement. The method increases the quantity of enhanced text while ensuring enhancement quality.
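The secondary enhancement in the abstract (grouping samples by length, assigning each group its own masking probability, and filling the masks in a denoising autoencoding step) can be sketched as follows. Everything concrete here is an assumption, since the source does not say how groups, probabilities or batch sizes are chosen: quartile cut points via `statistics.quantiles`, the linear `group_mask_prob` schedule, the literal `[MASK]` token, the `predict` callable (standing in for a masked language model that fills masked positions) and the `copies_per_sample` batch rule are all hypothetical.

```python
import math
import random
from bisect import bisect_right
from statistics import quantiles
from typing import Callable, List

MASK = "[MASK]"  # placeholder token; a real model would use its own mask token


def length_group_edges(texts: List[str], num_groups: int = 4) -> List[float]:
    """Cut points splitting the token-length distribution into num_groups groups."""
    lengths = [len(t.split()) for t in texts]
    return quantiles(lengths, n=num_groups)  # returns num_groups - 1 cut points


def assign_group(text: str, edges: List[float]) -> int:
    """Group index in 0..len(edges) for a text, based on its token length."""
    return bisect_right(edges, len(text.split()))


def group_mask_prob(group: int, num_groups: int, p_min: float = 0.10, p_max: float = 0.25) -> float:
    """Hypothetical linear schedule: longer-text groups get a higher masking probability."""
    if num_groups <= 1:
        return p_min
    return p_min + (p_max - p_min) * group / (num_groups - 1)


def mask_tokens(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Corrupt the input by masking each token with probability p (at least one token)."""
    masked = [MASK if rng.random() < p else w for w in words]
    if words and MASK not in masked:
        masked[rng.randrange(len(words))] = MASK
    return masked


def denoise_augment(texts: List[str], predict: Callable[[List[str]], List[str]],
                    num_groups: int = 4, seed: int = 0) -> List[str]:
    """Mask each text with its group's probability, then let predict() fill the masks."""
    rng = random.Random(seed)
    edges = length_group_edges(texts, num_groups)
    out = []
    for text in texts:
        p = group_mask_prob(assign_group(text, edges), num_groups)
        out.append(" ".join(predict(mask_tokens(text.split(), p, rng))))
    return out


def copies_per_sample(num_samples: int, target_size: int = 1000) -> int:
    """Hypothetical batch rule: enough variants per sample to reach target_size in total."""
    return max(1, math.ceil(target_size / max(1, num_samples)))
```

In practice `predict` might wrap a pretrained masked language model; in this sketch any function that maps a token list containing `[MASK]` entries to a filled token list of the same length will do.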

Description

Technical field
[0001] The present invention relates to information processing fields such as artificial intelligence and natural language processing. It is mainly used for data enhancement in text classification, and particularly relates to comprehensive text data enhancement technology.
Background art
[0002] Data augmentation, the artificial creation of training data for machine learning through transformations, is widely studied across machine learning disciplines. It not only improves the generalization ability of models but also addresses many other challenges, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used in order to preserve privacy. Data augmentation refers to augmenting the data or enhancing the features of an original small-sample data set with the help of auxiliary data or auxiliary information. Data augmentation is to add new data to the...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F40/205, G06F40/211, G06F40/247, G06F40/30
CPC: G06F40/205, G06F40/247, G06F40/211, G06F40/30, Y02D10/00
Inventors: 代翔 (Dai Xiang), 廖泓舟 (Liao Hongzhou), 潘磊 (Pan Lei)
Owner: 10TH RES INST OF CETC