Text corpus expansion method

A text processing technology applied to text corpus expansion

Active Publication Date: 2021-03-05
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to address the problems of the prior art described above by providing a text corpus expansion method. The corpus generated by this method balances sentence-pattern diversity with semantic fidelity, and the method is especially suited to text corpus expansion when the initial corpus is insufficient.



Examples


Embodiment Construction

[0068] Specific embodiments of the present invention are described below so that those skilled in the art can understand it. The invention is not, however, limited to the scope of these embodiments. To those of ordinary skill in the art, variations that fall within the spirit and scope of the invention as defined by the appended claims are obvious, and all inventions and creations that use the concept of the present invention fall within its scope of protection.

[0069] This embodiment provides a text corpus expansion method. As shown in Figure 1, the process comprises two stages: index table creation and corpus expansion. The embodiment describes the method as applied to expanding the training corpus of a bank's intelligent customer service robot in the financial field. The specific steps are as follows:

[0070] First, the creation of the index table includes the foll...


Abstract

The invention belongs to the technical field of natural language processing and provides a text corpus expansion method that overcomes the shortcomings of existing text enhancement methods in the sentence-pattern diversity and semantic fidelity of the generated corpus. The method combines word replacement with translation and addresses the defects that arise when the two are used together. First, the synonym hash tables, input sentences, and similar data are standardized. During word replacement, a one-way strategy is adopted: replacement proceeds only in the direction in which the number of words is not reduced, which lessens the influence of misspelled and colloquial words on the back-translation result. In addition, masked translation is introduced to mitigate the inaccurate translation of professional terms by the translation method. The corpus generated by the method balances sentence-pattern diversity with semantic fidelity, and the method is particularly suited to text corpus expansion when the initial corpus is insufficient.
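The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `__TERMn__` placeholder format, the use of character count as the length measure, and the stub translator callables are all assumptions introduced here for illustration. A real system would plug in an actual translation service, which may not preserve placeholders unchanged.

```python
import re
from typing import Callable, Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a stand-in for the standardization step)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def one_way_variants(sentence: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Generate variants by replacing one token at a time with a synonym.

    Assumption: "one-way" is modeled by allowing a replacement only when the
    candidate is at least as long (in characters) as the original token.
    """
    tokens = normalize(sentence).split(" ")
    variants = []
    for i, tok in enumerate(tokens):
        for cand in synonyms.get(tok, []):
            cand = normalize(cand)
            if cand != tok and len(cand) >= len(tok):
                variants.append(" ".join(tokens[:i] + [cand] + tokens[i + 1:]))
    return variants


def mask_terms(sentence: str, terms: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace professional terms with placeholders before translation."""
    mapping: Dict[str, str] = {}
    # Longest terms first, so a short term never splits a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        if term in sentence:
            placeholder = f"__TERM{i}__"  # hypothetical placeholder format
            sentence = sentence.replace(term, placeholder)
            mapping[placeholder] = term
    return sentence, mapping


def unmask_terms(sentence: str, mapping: Dict[str, str]) -> str:
    """Restore the original terms after translation."""
    for placeholder, term in mapping.items():
        sentence = sentence.replace(placeholder, term)
    return sentence


def masked_back_translate(
    sentence: str,
    terms: List[str],
    to_pivot: Callable[[str], str],
    from_pivot: Callable[[str], str],
) -> str:
    """Mask terms, translate to a pivot language and back, then unmask."""
    masked, mapping = mask_terms(normalize(sentence), terms)
    return unmask_terms(from_pivot(to_pivot(masked)), mapping)


# Toy usage with stub translators (not a real translation API).
synonyms = {"check": ["see", "inquire"], "card": ["bank card"]}
variants = one_way_variants("How do I check my card balance", synonyms)

paraphrase = masked_back_translate(
    "How do I activate my credit card",
    terms=["credit card"],
    to_pivot=lambda s: s,                                   # stub: identity
    from_pivot=lambda s: s.replace("how do i", "how can i"),  # stub paraphrase
)
```

In this sketch the shorter synonym "see" is rejected for "check", while "inquire" and "bank card" are accepted, so replacement never shortens a token. Masking keeps "credit card" out of the translator's reach, which is the role the abstract assigns to masked translation.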

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a text corpus expansion method.

Background

[0002] When deep learning is applied to natural language processing, it often runs into insufficient or imbalanced text training data. Solving the shortage of text corpora has therefore become a focus of research in the field. Text enhancement is a common way to enlarge a text training corpus: additional text data is constructed by processing the existing text data. Unlike image data, text is discrete, so data cannot be added by linear interpolation, rotation, and similar operations, and the semantic information and grammatical structure of sentences must be taken into account.

[0003] Text enhancement methods mainly include manual annotation, word replacement, back translation, neural network and other methods. M...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/289, G06F40/30, G06N3/04, G06N3/08
CPC: G06F40/289, G06F40/30, G06N3/08, G06N3/045
Inventor: 甘涛, 成鑫, 何艳敏, 王志阳
Owner UNIV OF ELECTRONICS SCI & TECH OF CHINA