Text data enhancement method, device and apparatus and computer readable storage medium

A text data enhancement technology applied in the field of computer processing of natural language. It addresses problems such as the inability to guarantee the diversity of clauses in translation results, and achieves the effects of keeping the syntactic structure unchanged, reducing noise interference, and increasing the amount of training data.
CN113255365A · Pending · Publication Date: 2021-08-13 · HUBEI NORMAL UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUBEI NORMAL UNIV
Publication Date
2021-08-13


Abstract

The invention relates to a text data enhancement method, device and apparatus, and a computer-readable storage medium. The method comprises the following steps: dividing the training data into a training set and a validation set according to a preset proportion; grouping the training set by the polarity of the aspect words in its sentences and counting the number of polarity sentences in each group; generating, according to a preset enhancement rate, polarity sentences so that the groups contain similar or equal numbers of sentences; and combining the enhanced polarity sentences with their corresponding parent sentences, shuffling their order, and storing the result. Because the method classifies training data of a preset scale and enhances the classified sentences according to the polarity of their aspect words, it increases the quantity of training data while keeping the syntactic structure unchanged and limiting the noise introduced, which improves the accuracy of natural language processing.
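The steps listed in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `split_data`, `group_by_polarity`, and `enhance` are hypothetical, the `augment` callable stands in for the sentence-generation step (which the abstract does not specify), and reading the enhancement rate as a multiplier on the largest group's size is only one possible interpretation.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A sample is (sentence, polarity of its aspect word); this layout is assumed.
Sample = Tuple[str, str]


def split_data(samples: List[Sample], train_ratio: float = 0.8):
    """Divide the data into a training set and a validation set by a preset ratio."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


def group_by_polarity(train: List[Sample]) -> Dict[str, List[str]]:
    """Group training sentences by the polarity of their aspect words."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sentence, polarity in train:
        groups[polarity].append(sentence)
    return dict(groups)


def enhance(
    train: List[Sample],
    augment: Callable[[str, random.Random], str],
    rate: float = 1.0,
    seed: int = 0,
) -> List[Sample]:
    """Generate new sentences per group so that all groups reach the same size.

    Assumption: the target size is rate * (size of the largest group).
    `augment` is a hypothetical generator that derives a new sentence
    from a parent sentence.
    """
    groups = group_by_polarity(train)
    if not groups:
        return []
    rng = random.Random(seed)
    counts = {polarity: len(sents) for polarity, sents in groups.items()}
    target = int(max(counts.values()) * rate)

    combined = list(train)  # keep the parent sentences
    for polarity, sentences in groups.items():
        for _ in range(target - len(sentences)):
            parent = rng.choice(sentences)
            combined.append((augment(parent, rng), polarity))

    rng.shuffle(combined)  # shuffle parents and generated sentences together
    return combined
```

A caller would supply its own `augment` function; storing the result (for example, writing one sentence and label per line) is left out because the abstract does not describe a storage format.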

Description

Technical field

[0001] The invention relates to the field of computer processing of natural language, in particular to a text data enhancement method, device, equipment and computer-readable storage medium.

Background technique

[0002] In supervised text classification tasks, deep learning models are trained on data with associated class labels. Models tend to perform better when the data contains sufficient, representative samples of each class label. In natural language processing, the main challenge is that labeled samples are scarce, especially for low-resource languages such as certain dialects or minority languages. Manually collecting and labeling additional data is time-consuming and inefficient. Therefore, text data enhancement methods are commonly used in such classification tasks.

[0003] Existing text data enhancement methods are mainly aimed at sentences, through methods such as synonym replacement, back tr...
