Method and device for expanding training sample

A technique involving training samples and dialogue sentences, applied in the field of expanding training samples, that addresses problems such as data imbalance.

Active Publication Date: 2020-01-17
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD
9 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

Data imbalance biases the classification model toward assigning a dialogue group to the categories that have a large number of training samples, so resolving data imbalance has become a core issue in improving classification accuracy.
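
To make the imbalance concrete, the following minimal sketch counts training samples per category label and reports how many extra samples each category would need. The function name `expansion_targets` is hypothetical, and balancing every category up to the size of the largest one is an assumption made for illustration; the source does not state a target size.

```python
from collections import Counter
from typing import Dict, List


def expansion_targets(labels: List[str]) -> Dict[str, int]:
    """Return, per category label, how many samples are missing
    relative to the largest category (assumed balancing target)."""
    counts = Counter(labels)
    largest = max(counts.values(), default=0)
    return {label: largest - n for label, n in counts.items()}


if __name__ == "__main__":
    # Toy label list: category "Q1" dominates, "Q2" is under-represented.
    toy_labels = ["Q1"] * 5 + ["Q2"] * 2
    print(expansion_targets(toy_labels))
```

A category with a deficit of zero needs no expansion; the others are candidates for the expansion procedure this patent describes.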



Embodiment Construction

[0044] The solutions provided in this specification will be described below in conjunction with the accompanying drawings.

[0045] Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. This implementation scenario involves expanding the training samples. Specifically, before the classification model is used to determine the standard question corresponding to a dialogue group, the model must be trained with training samples. Because the number of training samples corresponding to each standard question is unbalanced, the training samples must be expanded for standard questions that have few training samples. Here, a dialogue group includes the machine sentences and user sentences in the dialogue between the user and the robot. Referring to Figure 1, in the first stage, the dialogue between the user and the machine is carried out, that is to say, t...


Abstract

The embodiment of the invention provides a method and a device for expanding training samples. The method comprises: acquiring an initial training sample group to be expanded, the group comprising a first number of training samples, each training sample comprising the original machine statements and original user statements of a historical dialogue group together with the category label corresponding to that dialogue group, and each training sample in the initial training sample group having a first category label; obtaining a second number of training samples from the initial training sample group; for each dialogue group in the second number of training samples, inputting a first machine statement related to the original machine statement in that dialogue group into a pre-trained dialogue generation model of the first category label, and generating a first user statement corresponding to the first machine statement; and adding the first machine statement and the corresponding first user statement of each dialogue group to the initial training sample group as extended training samples to obtain an extended training sample group. In this way, the training samples can be balanced across categories.
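
The steps in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patent's implementation: `Sample`, `expand_samples`, and `toy_generator` are hypothetical names; the `generate_user_statement` callable stands in for the pre-trained dialogue generation model of the first category label; the second number of samples is drawn at random because the abstract does not say how they are chosen; and the "first machine statement related to the original machine statement" is assumed to be the original machine statement itself.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Sample:
    machine_statement: str
    user_statement: str
    label: str


def expand_samples(
    initial_group: List[Sample],
    second_number: int,
    generate_user_statement: Callable[[str], str],
    seed: int = 0,
) -> List[Sample]:
    """Return the initial group plus generated samples for one category."""
    if not initial_group:
        return []
    label = initial_group[0].label
    if any(s.label != label for s in initial_group):
        raise ValueError("all samples in the group must share one category label")

    # Assumption: pick the second number of samples uniformly at random.
    rng = random.Random(seed)
    k = min(second_number, len(initial_group))
    chosen = rng.sample(initial_group, k)

    # Assumption: reuse the original machine statement as the first machine
    # statement, and let the (stand-in) model generate the user statement.
    extended = [
        Sample(s.machine_statement, generate_user_statement(s.machine_statement), label)
        for s in chosen
    ]
    return initial_group + extended


def toy_generator(machine_statement: str) -> str:
    """Hypothetical stand-in for a trained dialogue generation model."""
    return f"(generated reply to) {machine_statement}"


if __name__ == "__main__":
    group = [Sample(f"machine {i}", f"user {i}", "Q1") for i in range(3)]
    for sample in expand_samples(group, 2, toy_generator):
        print(sample)
```

Passing the generator as a callable keeps the sketch independent of any particular model; a real system would substitute a model trained on dialogues of the same category label.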

Description

Technical field

[0001] One or more embodiments of this specification relate to the field of computers, and in particular to methods and devices for expanding training samples.

Background

[0002] When a robot answers a user's question, the dialogue group containing the machine sentences and user sentences is classified after the dialogue between the robot and the user, and the user's request is determined from the classification result. This classification may include determining the standard question sentence corresponding to the dialogue group, so that the robot provides the answer corresponding to that standard question sentence. Standard question sentences, also referred to as standard questions, are curated questions that users may ask. Each question has a question ID.

[0003] In the prior art, the classification model is usually trained by using historical conversation groups as training samples, and then the current conversation gro...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/332, G06F16/35
CPC: G06F16/3329, G06F16/355
Inventor: 王雅芳, 龙翀, 张晓彤, 张杰
Owner: ALIPAY (HANGZHOU) INFORMATION TECH CO LTD