Method and device for expanding training samples

A technology involving training samples and sentences, applied in the field of expanding training samples, that addresses problems such as data imbalance and achieves data equalization.

Active Publication Date: 2022-04-12
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
Cites: 9; Cited by: 0

AI Technical Summary

Problems solved by technology

Data imbalance biases the classification model toward assigning a dialogue group to the categories that have many training samples, so solving the data imbalance problem has become the core issue in improving classification accuracy.
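To make the bias concrete, the sketch below scores a degenerate model that always predicts the most frequent label. The label counts are hypothetical and purely illustrative (they are not from the patent). The point is that high overall accuracy can coexist with zero recall on a rare standard question.

```python
from collections import Counter


def majority_baseline_report(labels):
    """Accuracy and per-class recall of a model that always predicts the most frequent label."""
    counts = Counter(labels)
    majority, majority_count = counts.most_common(1)[0]
    accuracy = majority_count / len(labels)
    # Every sample is predicted as `majority`, so only that class is ever recalled.
    recall = {label: (1.0 if label == majority else 0.0) for label in counts}
    return majority, accuracy, recall


# Hypothetical label distribution over standard-question IDs (illustrative only).
labels = ["Q1"] * 950 + ["Q2"] * 50
majority, acc, recall = majority_baseline_report(labels)
print(majority, acc, recall)  # Q1 0.95 {'Q1': 1.0, 'Q2': 0.0}
```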


Embodiment Construction

[0044] The solutions provided in this specification will be described below in conjunction with the accompanying drawings.

[0045] Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. This implementation scenario involves expanding training samples. Specifically, before a classification model is used to determine the standard question corresponding to a dialogue group, the model must be trained on training samples. Because the number of training samples corresponding to each standard question is unbalanced, the training samples of standard questions that have only a small number of samples need to be expanded. Here, a dialogue group consists of the machine sentences and user sentences in a dialogue between a user and the robot. Referring to Figure 1, in the first stage, the dialogue between the user and the machine is carried out, that is to say, the r...
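The passage above says that standard questions with few training samples need expansion but does not say how "few" is decided. The sketch below assumes one possible rule, which is not taken from the patent: a label needs expansion when its count is below `ratio` times the count of the most frequent label, and it is topped up to that level. The function name `expansion_plan` and the `ratio` parameter are hypothetical.

```python
from collections import Counter


def expansion_plan(labels, ratio=0.5):
    """Return {label: number_of_samples_to_add} for under-represented labels.

    Assumption (not from the patent): a label needs expansion when its count is
    below `ratio` times the count of the most frequent label, and it is topped
    up to exactly that level.
    """
    counts = Counter(labels)
    target = int(max(counts.values()) * ratio)
    return {label: target - n for label, n in counts.items() if n < target}


# Hypothetical per-standard-question sample counts.
labels = ["Q1"] * 400 + ["Q2"] * 120 + ["Q3"] * 30
print(expansion_plan(labels))  # {'Q2': 80, 'Q3': 170}
```

A ratio-based target is only one choice; a fixed minimum count per standard question would work the same way.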


Abstract

The embodiments of this specification provide a method and device for expanding training samples. The method includes: obtaining an initial training sample group to be expanded, which contains a first number of training samples, where each training sample includes the original machine sentences and original user sentences of a historical dialogue group together with the category label corresponding to that dialogue group, and every training sample in the initial training sample group has a first category label; obtaining a second number of training samples from the initial training sample group; for each dialogue group in the second number of training samples, inputting a first machine sentence associated with the original machine sentence of that dialogue group into a pre-trained dialogue generation model for the first category label, to generate a first user sentence corresponding to the first machine sentence; and adding the first machine sentence and the corresponding first user sentence of each dialogue group to the initial training sample group as an expanded training sample, to obtain an expanded training sample group. The method achieves data equalization across the training samples.
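The steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not say how the associated first machine sentence is obtained or how the second number of samples is chosen, so `related_machine_sentence` and `generate_user_sentence` are hypothetical callables (stand-ins for that lookup and for the label-specific dialogue generation model), and the selection is assumed to be random sampling without replacement.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrainingSample:
    machine_sentences: List[str]  # robot turns of the historical dialogue group
    user_sentences: List[str]     # user turns of the same dialogue group
    label: str                    # category label, e.g. a standard-question ID


def expand_group(
    initial_group: List[TrainingSample],
    second_number: int,
    related_machine_sentence: Callable[[str], str],  # hypothetical lookup
    generate_user_sentence: Callable[[str], str],    # hypothetical generation model
    seed: int = 0,
) -> List[TrainingSample]:
    """Sketch of the abstract's steps; both callables are illustrative stand-ins."""
    first_label = initial_group[0].label
    # Every sample in the initial group carries the same first category label.
    assert all(s.label == first_label for s in initial_group)
    rng = random.Random(seed)
    # Obtain the second number of samples (assumed: sampling without replacement).
    picked = rng.sample(initial_group, second_number)
    expanded = list(initial_group)
    for sample in picked:
        # First machine sentences associated with each original machine sentence.
        first_machine = [related_machine_sentence(m) for m in sample.machine_sentences]
        # The label-specific generation model produces the matching user sentences.
        first_user = [generate_user_sentence(m) for m in first_machine]
        expanded.append(TrainingSample(first_machine, first_user, first_label))
    return expanded


# Toy usage with placeholder callables (not real models).
group = [
    TrainingSample(["How can I help?"], ["I forgot my password"], "Q7"),
    TrainingSample(["What is the issue?"], ["Cannot log in"], "Q7"),
    TrainingSample(["Anything else?"], ["Reset my password"], "Q7"),
]
expanded = expand_group(
    group,
    second_number=2,
    related_machine_sentence=lambda m: m.replace("?", ", please?"),  # toy rewrite
    generate_user_sentence=lambda m: "I need to reset my password",  # toy generator
)
print(len(expanded))  # 5
```

Passing the generator in as a callable keeps the expansion loop independent of whichever sequence-to-sequence model is actually trained per category label.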

Description

Technical Field

[0001] One or more embodiments of this specification relate to the field of computers, and in particular to methods and devices for expanding training samples.

Background

[0002] When a robot answers a user's question, the dialogue group formed by the machine sentences and user sentences of their dialogue is classified after the dialogue, and the user's request is determined from the classification result. The classification may include determining the standard question sentence corresponding to the dialogue group, so that the robot can provide the answer corresponding to that standard question sentence. Standard question sentences, also referred to as standard questions, are pre-compiled questions that users are likely to ask, and each has a question ID.

[0003] In the prior art, the classification model is usually trained by using historical conversation groups as training samples, and then the current conversation gro...
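The flow described in the background (classify a dialogue group, map it to a standard question ID, return that question's answer) can be sketched as below. The standard-question table is hypothetical, and the classifier is a deliberately simple word-overlap stand-in swapped in for the trained classification model the text refers to.

```python
from typing import Dict, List, Tuple

# Hypothetical standard-question table: question ID -> (standard question, answer).
STANDARD_QUESTIONS: Dict[str, Tuple[str, str]] = {
    "Q1": ("How do I reset my password?", "Tap 'Forgot password' on the login page."),
    "Q2": ("How do I check my bill?", "Open 'Bills' from the home screen."),
}


def classify(dialogue_group: List[str]) -> str:
    """Toy stand-in for the trained classification model: pick the standard
    question whose words overlap most with the whole dialogue group."""
    words = set(" ".join(dialogue_group).lower().replace("?", "").split())

    def overlap(qid: str) -> int:
        question = STANDARD_QUESTIONS[qid][0].lower().replace("?", "")
        return len(words & set(question.split()))

    return max(STANDARD_QUESTIONS, key=overlap)


def answer(dialogue_group: List[str]) -> str:
    """Return the answer attached to the predicted standard question."""
    return STANDARD_QUESTIONS[classify(dialogue_group)][1]


# Machine and user sentences of one dialogue, flattened in turn order.
group = ["What can I help you with?", "I can't log in, need to reset my password"]
print(classify(group))  # Q1
```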


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/332; G06F16/35
CPC: G06F16/3329; G06F16/355
Inventors: 王雅芳 (Wang Yafang), 龙翀 (Long Chong), 张晓彤 (Zhang Xiaotong), 张杰 (Zhang Jie)
Owner: ALIPAY (HANGZHOU) INFORMATION TECH CO LTD