Generation method and device of synonymous sentence generation model and medium

A technology for generation models and generation devices, applied in semantic analysis, instruments, electrical digital data processing, etc., that addresses the high manual labeling cost and the scarcity of parallel data databases

Active Publication Date: 2020-02-07
BEIJING XIAOMI INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0011] The biggest disadvantage of this method is that it requires a large amount of parallel data (that is, sentence pairs, each consisting of two synonymous sentences) for training. In practical applications, few parallel data databases exist, and obtaining a large amount of parallel data incurs a high manual labeling cost.



Examples


Example 1

[0106] Collect N single sentences from the second set. For M of the N single sentences, use the generative model to generate a synonymous sentence for each, obtaining M positive sample sentence pairs. Then use those M single sentences together with the remaining N-M single sentences to form M negative sample sentence pairs, where M is smaller than N. When N-M is greater than M, pair the M single sentences with M single sentences selected from the N-M. When N-M is less than M, the same single sentence may take part in more than one correspondence, so that M negative sample sentence pairs are still formed. That is to say, N-M single sentences can be reused to form M negative sample sentence pair...
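The sampling scheme in this example can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function name `build_example1_pairs`, the `generate` callable standing in for the generative model, the choice of the first M sentences as the M selected ones, and the cyclic reuse of the N-M sentences are all hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]


def build_example1_pairs(
    sentences: List[str],
    m: int,
    generate: Callable[[str], str],
    rng: Optional[random.Random] = None,
) -> Tuple[List[Pair], List[Pair]]:
    """Build M positive and M negative sentence pairs from N single sentences.

    `generate` is a hypothetical stand-in for the generative model G.
    """
    n = len(sentences)
    if not 0 < m < n:
        raise ValueError("M must satisfy 0 < M < N")
    rng = rng or random.Random(0)

    # Assumption: the first M sentences play the role of the M selected ones.
    selected = sentences[:m]
    rest = sentences[m:]  # the other N-M single sentences

    # Positive pairs: each selected sentence with its generated synonym.
    positives = [(s, generate(s)) for s in selected]

    if len(rest) >= m:
        # Enough remaining sentences: pick M of them without repetition.
        partners = rng.sample(rest, m)
    else:
        # Fewer than M remaining sentences: reuse them (cyclic reuse is an assumption).
        partners = [rest[i % len(rest)] for i in range(m)]

    # Negative pairs: a selected sentence with an unrelated remaining sentence.
    negatives = list(zip(selected, partners))
    return positives, negatives
```

For instance, with five sentences and M=4, only one sentence remains, so it is reused in all four negative pairs.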

Example 2

[0108] Collect 2N single sentences from the second set, use the generative model to generate synonymous sentences corresponding to each single sentence for the first N single sentences in the 2N single sentences, and obtain N positive sample sentence pairs; use the last N single sentences in the 2N single sentences to form N negative sample sentence pairs.

[0109] Method 2: collect a fifth preset number of single sentences from the second set and use the generative model to generate a synonymous sentence for each of them, obtaining a fifth preset number of positive sample sentence pairs. Then use the fifth preset number of single sentences, together with a sixth preset number of single sentences in the second set other than those, to form a seventh preset number of negative sample sentence pairs.

[0110] For example:

[0111] Collect X single sentences from the second set, use the generative model ...

Specific Embodiment

[0125] Step 1, data preparation process:

[0126] A large number of synonymous sentence groups are determined by manual labeling to form a first set S. The first set S includes multiple synonymous sentence groups, and each synonymous sentence group includes two or more synonymous sentences.

[0127] Single sentences on the order of millions are collected over the network from Chinese websites, either randomly or according to preset domain branches, and the collected single sentences form the second set C.

[0128] Step 2, pre-training process:

[0129] Step 2.1: pre-train on the first set S to obtain a generative model G, expressed as Y=G(X), where X and Y are synonymous sentences. The first set S comprises a plurality of synonymous sentence groups. When a synonymous sentence group comprises two synonymous sentences, these two sentences are used for training. When a synonymous sentence group comprises more than two synonymous sentences, any tw...
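The source text is cut off at this point, so how groups with more than two sentences are handled is not fully stated. As a rough sketch only, assuming that every unordered pair of sentences within a group becomes one (X, Y) training pair, the first set S could be expanded as below. The function name `expand_groups_to_pairs` and the all-pairs rule are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def expand_groups_to_pairs(groups: Sequence[Sequence[str]]) -> List[Tuple[str, str]]:
    """Turn synonymous sentence groups into (X, Y) pairs for pre-training G.

    Assumption: each unordered pair inside a group yields one training pair.
    """
    pairs: List[Tuple[str, str]] = []
    for group in groups:
        if len(group) < 2:
            raise ValueError("each synonymous sentence group needs at least two sentences")
        # A group of two yields one pair; a group of k yields k*(k-1)/2 pairs.
        pairs.extend(combinations(group, 2))
    return pairs
```

Whether each pair is also used in the reverse direction (Y as input, X as target) is not covered by the visible text, so the sketch leaves that choice open.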



Abstract

The invention discloses a generation method and device of a synonymous sentence generation model, and a medium. The method comprises: performing training by using a first set to obtain a generation model and a discrimination model, wherein the first set comprises a plurality of synonymous sentence groups and each synonymous sentence group comprises at least two synonymous sentences; and performing iterative processing on the generation model and the discrimination model until the generation model converges. A large number of single sentences are used, taking full advantage of the fact that single sentences are cheap to collect and need no manual annotation. In the training process of the model, a large number of single sentences are combined with reinforcement learning: the single sentences greatly improve semantic richness, and reinforcement learning lets the model be optimized continuously across iterations. A high-quality synonymous sentence generation model can therefore be trained without depending on a large parallel corpus.

Description

Technical field

[0001] This disclosure relates to the technical field of mobile terminal data processing, in particular to a method, device and medium for generating a synonymous sentence generation model.

Background technique

[0002] Synonymous sentence generation is the process of using a generative model to produce, for an arbitrary sentence X, a sentence Y that has the same meaning as X but whose wording is not exactly the same. Synonymous sentence generation can be used to improve the robustness of a system and has a wide range of practical applications. It can also be applied in any field that requires data expansion with synonymous sentences, such as dialogue system corpus expansion, sentiment classification corpus expansion, similar question generation, etc.

[0003] The earliest synonym generation methods were usually rule-based. For example: first, the keywords in the sentence X are mined, and the sy...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/205, G06F40/30
Inventor: 李京蔚, 崔志, 崔建伟
Owner: BEIJING XIAOMI INTELLIGENT TECH CO LTD