Corpus expansion method and device, computer equipment and storage medium

A corpus expansion technology in the computer field. It addresses the inability of unsupervised methods to learn decoupled style and content representations, poor sentence generation, and failure to generate sentences in a specified style, and it ensures the quality of corpus generation.

Pending Publication Date: 2021-10-22
NANJING UNIV
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by the technology

[0004] However, in the above methods, unsupervised learning cannot learn fully decoupled style and content representations; that is, the content representation always retains part of the style information. As a result, the generator sometimes fails to generate sentences in the specified style, and the quality of sentence generation is poor.

Embodiment Construction

[0062] Various exemplary embodiments, features, and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures indicate functionally identical or similar elements. While various aspects of the embodiments are shown in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

[0063] The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior or better than other embodiments.

[0064] In addition, to better illustrate the present disclosure, numerous specific details are given in the following detailed description. Those skilled in the art will understand that the present disclosure can be practiced without certain of these specific details. In some instances, methods, means, componen...

Abstract

The invention relates to the field of computer technology, and in particular to a corpus expansion method and device, computer equipment, and a storage medium. The method comprises: obtaining a parallel seed set, a first corpus, and a second corpus; training a selection model on the parallel seed set, the first corpus, and the second corpus; for each first text in the first corpus, using the trained selection model to determine a matching second text from the second corpus; forming multiple groups of pseudo-parallel text pairs from the first texts and their respective matched second texts; and screening the pseudo-parallel text pairs and adding the retained pairs to the parallel seed set. Because the selection model learns the mapping between the first texts and the second texts, the embodiments avoid the style-transfer failures that arise in related technologies when content and style cannot be fully decoupled, and thereby ensure the quality of subsequent corpus generation.
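The abstract's select, pair, screen, and add loop can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the excerpt does not describe the selection model's architecture or how it is trained, so a hypothetical, untrained token-overlap (Jaccard) scorer stands in for the trained model, and the screening rule (a fixed score `threshold` of 0.3) is likewise an assumption. The names `jaccard` and `expand_corpus` are illustrative only.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity: a hypothetical stand-in for the trained selection model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def expand_corpus(
    seed_set: List[Pair],
    first_corpus: List[str],
    second_corpus: List[str],
    score: Callable[[str, str], float] = jaccard,
    threshold: float = 0.3,  # hypothetical screening cut-off, not from the source
) -> List[Pair]:
    """Match each first text to a second text, screen the pairs, and grow the seed set."""
    if not second_corpus:
        return list(seed_set)

    # Selection: pair each first text with the second text the scorer rates highest.
    pseudo_pairs = []
    for first in first_corpus:
        best = max(second_corpus, key=lambda second: score(first, second))
        pseudo_pairs.append((first, best, score(first, best)))

    # Screening: keep only pseudo-parallel pairs whose score clears the threshold.
    screened = [(f, s) for f, s, sc in pseudo_pairs if sc >= threshold]

    # Expansion: add screened pairs to a copy of the seed set, skipping duplicates.
    expanded = list(seed_set)
    for pair in screened:
        if pair not in expanded:
            expanded.append(pair)
    return expanded
```

With a first corpus `["the service was slow", "i hated the view", "parking was free"]` and a second corpus `["the service was quick", "i loved the view", "great parking"]`, the first two texts are matched at a score of 0.6 and kept, while `"parking was free"` is best matched to `"great parking"` at only 0.25 and is screened out. Passing a different callable as `score` is where a trained selection model would plug in.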

Description

Technical Field

[0001] The present disclosure relates to the field of computer technology, and in particular to a corpus expansion method and device, computer equipment, and a storage medium.

Background

[0002] Text style transfer is a technique that automatically converts the expression style of a text while preserving its content.

[0003] In related technologies, because large parallel corpora with similar content but different styles are lacking, mainstream text style transfer relies on unsupervised learning. Typically, a style-independent content representation vector is learned in the training stage, and the input sentence is reconstructed by combining this vector with the style representation of the original sentence. In the inference stage, a sentence with the target style and the specified content is generated from the style-independent content representation vector and the target style representation. During training, GANs are often use...
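The training and inference stages described in [0003], and the style leakage described in [0004], can be illustrated with a toy linear model. Everything here is an assumption for illustration, not the method of the related art: sentences are vectors formed as content plus style, the two styles are orthogonal unit vectors, the hypothetical `leak` parameter sets how much source style survives in the content vector, `fitted_decoder_weight` is the style weight a reconstruction objective would settle on in this toy, and `target_style_share` is an invented measure of how much of the output's style signal points at the target style. Real systems use learned neural encoders and decoders.

```python
from typing import List

Vec = List[float]


def add(a: Vec, b: Vec, scale: float = 1.0) -> Vec:
    """Element-wise a + scale * b."""
    return [x + scale * y for x, y in zip(a, b)]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def encode_content(sentence: Vec, src_style: Vec, leak: float) -> Vec:
    """Toy content encoder for sentence = content + src_style.

    Subtracts the source style but lets a fraction `leak` of it survive;
    leak = 0.0 models perfect disentanglement.
    """
    return add(sentence, src_style, scale=leak - 1.0)


def fitted_decoder_weight(leak: float) -> float:
    """Style weight w that makes reconstruction exact: leak + w == 1.

    A reconstruction loss is therefore satisfied even when leak > 0,
    so the leakage goes unnoticed during training.
    """
    return 1.0 - leak


def decode(content: Vec, style: Vec, weight: float) -> Vec:
    """Toy decoder: recombine a content vector with a weighted style vector."""
    return add(content, style, scale=weight)


def target_style_share(output: Vec, src_style: Vec, tgt_style: Vec) -> float:
    """Fraction of the output's style signal that points at the target style."""
    s_src, s_tgt = dot(output, src_style), dot(output, tgt_style)
    total = s_src + s_tgt
    return s_tgt / total if total else 0.0


if __name__ == "__main__":
    content = [1.0, 0.0, 0.0]
    src, tgt = [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    sentence = add(content, src)
    for leak in (0.0, 0.25, 0.75):
        z = encode_content(sentence, src, leak)
        w = fitted_decoder_weight(leak)
        # Training stage: reconstruct the input with its own (source) style.
        exact = decode(z, src, w) == sentence
        # Inference stage: swap in the target style.
        share = target_style_share(decode(z, tgt, w), src, tgt)
        print(f"leak={leak}: exact reconstruction={exact}, target style share={share:.2f}")
```

In this toy, reconstruction is exact for every `leak`, yet the target style share falls from 1.00 to 0.75 and then 0.25; at `leak = 0.75` the output carries more source style than target style, which mirrors the failure to produce the specified style described in [0004].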

Application Information

Patent type & authority: Applications (China)
IPC (8): G06F16/36; G06F40/205; G06F40/30; G06F40/151; G06K9/62
CPC: G06F16/367; G06F18/22
Inventors: 黄书剑 (Huang Shujian), 蒋庆男 (Jiang Qingnan), 何亮 (He Liang), 张建兵 (Zhang Jianbing), 陈家骏 (Chen Jiajun)
Owner: NANJING UNIV