Text data enhancement method, device and apparatus and computer readable storage medium

A text data and training data technology, applied in the field of computer processing of natural language, that addresses problems such as the inability to guarantee sentence diversity in translation results, with the effects of keeping the syntactic structure unchanged, reducing noise interference, and increasing the quantity of training data.

Pending Publication Date: 2021-08-13
HUBEI NORMAL UNIV
Cites: 9; Cited by: 2

AI Technical Summary

Problems solved by technology

However, this method is often constrained by the call-frequency limits of the translation interface and cannot guarantee diversity in the sentence structure of the translated results.
The generative-model method mainly uses pre-trained language models such as BERT to predict and generate the words replaced by [MASK] in sentences; combined with sequence labels, the model fills in the masked words to produce new sentences. This method ensures diversity in how the augmented sentences are expressed, but it often requires extensive hyperparameter optimization and a high-specification computing environment to keep its predictions accurate.
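As a rough illustration of the masked-prediction idea described above, the sketch below masks one token at a time and asks a predictor for a replacement. The `fill_mask` callable is a hypothetical stand-in for a pre-trained masked language model such as BERT (no real model is loaded). Whitespace tokenization is a simplifying assumption, and the sequence-label conditioning mentioned in the text is not modeled.

```python
import random
from typing import Callable, List


def mask_augment(
    sentence: str,
    fill_mask: Callable[[List[str], int], str],
    n_variants: int = 3,
    mask_token: str = "[MASK]",
    seed: int = 0,
) -> List[str]:
    """Create new sentences by masking one token at a time and filling it in.

    `fill_mask` is a hypothetical stand-in for a masked language model: it
    receives the token list containing `mask_token` and the masked position,
    and returns a predicted replacement word.
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    positions = list(range(len(tokens)))
    rng.shuffle(positions)  # choose random, distinct positions to mask

    variants: List[str] = []
    for pos in positions[:n_variants]:
        masked = tokens.copy()
        masked[pos] = mask_token
        predicted = fill_mask(masked, pos)
        if predicted != tokens[pos]:  # skip predictions that reproduce the original
            new_tokens = tokens.copy()
            new_tokens[pos] = predicted
            variants.append(" ".join(new_tokens))
    return variants


# Toy predictor for demonstration only: it always proposes "great".
print(mask_augment("the food was good", lambda toks, pos: "great"))
```

In a real system the predictor would return the model's most probable token for the masked position; the loop structure stays the same.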

Method used

The structure of the environmentally friendly knitted fabric provided by the present invention; Figure 2 is a flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; Figure 3 is the parameter map of the yarn covering machine.

Embodiment Construction

[0040] In order to facilitate the understanding of the present invention, the present invention will be described more fully below with reference to the associated drawings. Possible embodiments of the invention are shown in the drawings. However, the present invention can be implemented in many different forms and is not limited to the embodiments which have been described herein with reference to the drawings. The embodiments described by referring to the accompanying drawings are exemplary for making the disclosure of the present invention more thorough and comprehensive, and should not be construed as limiting the present invention. Furthermore, if detailed descriptions of known technologies are not technically essential to the illustrated features of the invention, such technical details may be omitted.

[0041] Those skilled in the relevant art can understand that, unless otherwise defined, all terms (including technical terms and scientific terms) used herein have the ...

Abstract

The invention relates to a text data enhancement method, device, apparatus, and computer-readable storage medium. The method comprises the following steps: dividing training data into a training set and a verification set according to a preset proportion, grouping the training set according to the polarity of the aspect words in its sentences, and counting the number of sentences of each polarity in each group; according to a preset enhancement rate, generating polarity sentences so that each group contains a similar or equal number of sentences; and combining the enhanced polarity sentences with their corresponding parent sentences, then shuffling, sorting, and storing them. With the text data enhancement method provided by the invention, training data of a preset scale is classified, and the classified sentences are enhanced according to the polarity of the aspect words they contain. This increases the quantity of training data while keeping the syntactic structure unchanged and reducing the introduction of noise, thereby improving the accuracy of natural language processing.
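The steps in the abstract can be sketched in Python as follows. This is a minimal sketch under several assumptions the abstract does not state: each sample is a hypothetical `(sentence, aspect, polarity)` tuple, the enhancement rate scales the size of the largest polarity group to give a per-group target, and `augment_sentence` is a hypothetical callable standing in for whatever structure-preserving rewrite the invention actually uses. The abstract does not say which key the final sort uses, so sorting is omitted and only the shuffle is shown.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical sample format: (sentence, aspect term, polarity label)
Sample = Tuple[str, str, str]


def split_data(data: List[Sample], train_ratio: float = 0.8, seed: int = 0):
    """Divide the data into a training set and a verification set at a preset ratio."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def group_by_polarity(train: List[Sample]) -> Dict[str, List[Sample]]:
    """Group training samples by the polarity of their aspect term."""
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for sample in train:
        groups[sample[2]].append(sample)
    return dict(groups)


def enhance(
    train: List[Sample],
    augment_sentence: Callable[[str, str], str],
    rate: float = 1.0,
    seed: int = 0,
) -> List[Sample]:
    """Top up each polarity group with generated sentences, then merge and shuffle.

    Assumption: the per-group target is the largest group's size times `rate`.
    """
    groups = group_by_polarity(train)
    counts = {polarity: len(group) for polarity, group in groups.items()}
    target = int(max(counts.values()) * rate)

    merged: List[Sample] = []
    for polarity, group in groups.items():
        merged.extend(group)  # keep the parent sentences
        needed = max(0, target - len(group))
        for i in range(needed):
            # Cycle through the parent sentences when more new ones are needed.
            sentence, aspect, _ = group[i % len(group)]
            merged.append((augment_sentence(sentence, aspect), aspect, polarity))

    random.Random(seed).shuffle(merged)  # shuffle the combined set
    return merged


train = (
    [(f"the food was very good {i}", "food", "positive") for i in range(6)]
    + [(f"the service was very slow {i}", "service", "negative") for i in range(3)]
)
enhanced = enhance(train, lambda s, a: s.replace("very", "really"))
print(len(enhanced))
```

Topping every group up to a common target is one simple way to obtain groups of similar or equal size; the parent sentences are always kept, so each generated sentence sits in the output alongside the sentence it was derived from.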

Description

Technical field

[0001] The invention relates to the field of computer processing of natural language, and in particular to a text data enhancement method, device, equipment, and computer-readable storage medium.

Background art

[0002] In supervised text classification tasks, deep learning models are trained on data with associated class labels. Models tend to perform better when the data contain sufficient, representative samples of each class label. In natural language processing, the main challenge is the scarcity of labeled samples, especially for low-resource language data such as certain dialects or minority languages. Manually collecting and labeling additional data is time-consuming and inefficient. Text data enhancement methods are therefore commonly used in such classification tasks.

[0003] Existing text data enhancement methods mainly operate on sentences, through methods such as synonym replacement, back tr...
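Synonym replacement, the first of the existing sentence-level methods listed above, can be sketched as below. The tiny synonym dictionary and the `protected` parameter (words, such as aspect terms, that must never be replaced) are illustrative assumptions, not part of the patent text, and whitespace tokenization is a simplification.

```python
import random
from typing import Dict, Iterable, List


def synonym_replace(
    sentence: str,
    synonyms: Dict[str, List[str]],
    n: int = 1,
    protected: Iterable[str] = (),
    seed: int = 0,
) -> str:
    """Replace up to `n` words that have entries in `synonyms`.

    Words listed in `protected` (for example, aspect terms) are never replaced.
    """
    rng = random.Random(seed)
    protected_set = set(protected)
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    candidates = [
        i for i, tok in enumerate(tokens)
        if tok in synonyms and tok not in protected_set
    ]
    rng.shuffle(candidates)  # pick which eligible positions get replaced
    for i in candidates[:n]:
        tokens[i] = rng.choice(synonyms[tokens[i]])
    return " ".join(tokens)


toy_synonyms = {"good": ["great", "fine"], "food": ["meal"]}  # illustrative only
print(synonym_replace("the food was good", toy_synonyms, n=2, protected=["food"]))
```

Because only individual words are swapped, the sentence keeps its syntactic structure; protecting the aspect term keeps the polarity label attached to the same target word.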

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30; G06F40/284; G06N3/04
CPC: G06F40/30; G06F40/284; G06N3/044
Inventors: 李光敏, 丁毅, 杨杏本
Owner: HUBEI NORMAL UNIV