A method for generating high-quality simulated experiences for dialogue policy learning

A technique for dialogue policy learning with high-quality simulated experience, applied in the field of machine learning. It addresses the weakening of the Dyna-Q framework's advantages and the low efficiency of DDQ, and thereby avoids poor learning results.

Active Publication Date: 2021-08-10

NANHU LAB

Cites: 0, Cited by: 2


Problems solved by technology

World models implemented with data-hungry models such as deep neural networks (DNNs) weaken the advantages of the Dyna-Q framework and make DDQ very inefficient in practice.


Examples


Embodiment 1

[0054] As shown in Figure 1, this scheme proposes a GP-based deep Dyna-Q method for dialogue policy learning. The basic procedure is consistent with the prior art: human conversation data is used to initialize the dialogue policy model and the world model, and dialogue policy learning then begins. Dialogue policy learning for the dialogue policy model mainly consists of two parts: direct reinforcement learning and indirect reinforcement learning (also called planning). In direct reinforcement learning, a Deep Q-Network (DQN) is used to improve the dialogue policy based on real experience. The dialogue policy model interacts with the user; at each step, it selects the action a to perform by maximizing the value function Q given the observed dialogue state s. The dialogue policy model then receives the reward r and the real user's action a_r^u, updates the current state to s', and the real experience (s, a, r, a_r^u, t) is stored...
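The direct reinforcement learning step described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: a plain list of Q values stands in for the DQN output, the names `Experience`, `ReplayBuffer`, and `greedy_action` are hypothetical, and `t` is assumed here to be a dialogue-termination flag.

```python
from collections import deque, namedtuple

# Field names mirror the tuple (s, a, r, a_u, t) from the text.
# Treating t as a termination flag is an assumption of this sketch.
Experience = namedtuple("Experience", ["s", "a", "r", "a_u", "t"])


def greedy_action(q_values):
    """Return the index of the action with the largest Q value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


class ReplayBuffer:
    """Fixed-capacity store for experiences (hypothetical helper)."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def store(self, experience):
        self.items.append(experience)

    def __len__(self):
        return len(self.items)


# Hypothetical Q values for three dialogue actions in state "s0".
q_values = [0.1, 0.9, 0.3]
action = greedy_action(q_values)

# Store one real experience after observing the reward and the user's reply.
real_buffer = ReplayBuffer(capacity=1000)
real_buffer.store(Experience(s="s0", a=action, r=-1.0, a_u="inform", t=False))
```

In a real system the Q values would come from the trained network and the state would be a feature vector rather than a string label.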

Embodiment 2

[0086] As shown in Figure 9, this embodiment is similar to Embodiment 1. The difference is that, before simulated experience is stored in the buffer, a quality detector inspects it, and only experience that passes the inspection is stored in the buffer.

[0087] Specifically, the quality detector separately checks the quality of the upper-bound simulated experience e_l, the lower-bound simulated experience e_b, and the meta simulated experience e_i. The quality detector may be a conventional GAN (generative adversarial network) quality detector, or the KL divergence (Kullback-Leibler divergence) quality detector developed independently by the applicant.
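The gating described in this embodiment can be pictured as a filter in front of the buffer. The sketch below is only illustrative: the detector is an arbitrary callable, and the `score` field and the 0.5 cutoff are hypothetical placeholders rather than anything specified in the patent.

```python
def store_if_passes(candidates, passes_quality, buffer):
    """Append only the candidates the detector accepts; return how many were kept."""
    kept = [e for e in candidates if passes_quality(e)]
    buffer.extend(kept)
    return len(kept)


# Toy stand-ins for e_l, e_b and e_i, each with a hypothetical quality score.
e_l = {"name": "e_l", "score": 0.2}
e_b = {"name": "e_b", "score": 0.7}
e_i = {"name": "e_i", "score": 0.9}


def toy_detector(experience):
    """Hypothetical detector: accept experience whose score reaches 0.5."""
    return experience["score"] >= 0.5


sim_buffer = []
kept_count = store_if_passes([e_l, e_b, e_i], toy_detector, sim_buffer)
```

Because the detector is passed in as a function, a GAN-based or KL-divergence-based check could be swapped in without changing the storage logic.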

[0088] The KL divergence quality detector is briefly introduced below. As shown in Figure 4, the quality inspection of the simulated experience is mainly carried out by comparing the simulated experience w...
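The passage is truncated, so exactly what the detector compares is not stated here. As a rough illustration of the underlying measure only, the sketch below computes the KL divergence between two discrete distributions and applies a threshold. The function names, the use of count vectors, and the threshold value are all assumptions of this sketch.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; q is floored at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def passes_kl_check(sim_counts, real_counts, threshold):
    """Hypothetical gate: accept when the two distributions are close enough."""
    return kl_divergence(normalize(sim_counts), normalize(real_counts)) <= threshold


# Identical distributions give zero divergence; dissimilar ones give a larger value.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
different = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

A smaller divergence means the two distributions are more alike, which is why a gate of this kind accepts values below the threshold.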


Abstract

The invention provides a method for generating high-quality simulated experience for dialogue policy learning, belonging to the technical field of machine learning, comprising the following steps: S1. generating simulated experience by prediction with a GP-based world model; S2. storing the simulated experience in a buffer for training the dialogue policy model. Because the world model in this scheme is based on a Gaussian process, it avoids the problem that the quality of simulated experience generated by a traditional DNN model depends on the amount of training data, where too little data leads to poor learning results and low learning efficiency.
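Steps S1 and S2 can be illustrated with a toy Gaussian process regressor. The sketch below is an assumption-laden illustration, not the patent's world model: it uses a scalar input, a squared-exponential kernel with fixed hyperparameters, and a pure-Python linear solver, and the names `gp_predict` and `sim_buffer` are hypothetical.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel for scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(xs, ys, x_new, noise=1e-6):
    """Return the GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)        # (K + noise*I)^-1 y
    v = solve(K, k_star)        # (K + noise*I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


# S1: predict from a few toy observations; S2: store the prediction in a buffer.
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]
mean_near, std_near = gp_predict(xs, ys, 1.0)
mean_far, std_far = gp_predict(xs, ys, 100.0)
sim_buffer = [(1.0, mean_near, std_near)]
```

The predictive standard deviation is small near observed data and grows far from it, which gives a GP an explicit uncertainty estimate even when only a few training points are available.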

Description

Technical field

[0001] The invention belongs to the technical field of machine learning, and in particular relates to a method for generating high-quality simulated experience for dialogue policy learning.

Background

[0002] Task-completion dialogue policy learning aims to build a task-oriented dialogue system that helps users complete a specific single task or multi-domain tasks through several rounds of natural language interaction. It has been widely used in chatbots and personal voice assistants such as Apple's Siri and Microsoft's Cortana.

[0003] In recent years, reinforcement learning has gradually become the mainstream method for dialogue policy learning. With reinforcement learning, the dialogue system can gradually adjust and optimize its policy through natural language interaction with the user to improve performance. However, the original reinforcement learning method requires a lot of human-computer dialogue interactions befo...


Application Information


Patent Type & Authority: Patents (China)

IPC (8): G06F16/332, G06N3/00, G06N20/00

CPC: G06N3/008, G06F16/3329, G06N20/00

Inventor: 平洋, 曹江, 方文其, 吴冠霖, 栾绍童, 闫顼

Owner: NANHU LAB
