A deep GP-based Dyna-Q method for dialogue policy learning

What is AI technical title?
AI technical title is built by Patsnap AI team. It summarizes the technical point description of the patent document.
A policy learning and in-depth technology, applied in the field of machine learning, can solve problems such as instability, highly sensitive hyperparameter selection, performance constraints of dialogue learning, etc., and achieve easy evaluation and analysis effects

Active Publication Date: 2022-02-11

NANHU LAB

View PDF3 Cites 2 Cited by

Summary
Abstract
Description
Claims
Application Information

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

However, there is a huge instability problem in the training of GAN, which will lead to non-convergence in dialogue policy learning with a high probability, and is highly sensitive to the selection of hyperparameters, which seriously restricts the performance of dialogue learning

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Image

Smart Image Click on the blue labels to locate them in the text.

Viewing Examples

Smart Image

Examples

Experimental program

Comparison scheme

Effect test

Embodiment Construction

[0065] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0066] Such as figure 1 As shown, this scheme proposes a GP-based deep Dyna-Q method for dialogue policy learning. The basic method is consistent with the existing technology, such as using human conversation data to initialize the dialogue policy model and world model, and then Start dialogue policy learning. The dialogue policy learning of the dialogue policy model mainly includes two parts: direct reinforcement learning and indirect reinforcement learning (also called planning). Direct reinforcement learning, using Deep Q-Network (DQN) to improve the dialogue policy based on real experience, the dialogue policy model interacts with the user User, in each step, the dialogue policy model maximizes the value function Q according to the observed dialogue state s, Select the action a to perform. Then, the dialog policy model receives...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Login to View More

PUM

Login to View More

Abstract

The present invention provides a GP-based deep Dyna-Q method for dialogue policy learning, comprising the following steps: S1. Generate simulation experience from a GP-based world model; S2. Perform quality inspection on the simulation experience described above; S3. Use the simulation experience that passes the quality inspection to train the dialogue policy model. The world model of the present invention abandons the traditional DNN model, but constructs the world model into a Gaussian process model, which has the advantage of being easy to analyze; and the quality detector based on KL divergence can effectively control the simulation experience quality, by introducing KL divergence To examine the distribution of experience, no extra work is required to design and train complex quality detectors, which makes it easier to evaluate the quality of simulated experience, and greatly improves computational efficiency while ensuring the robustness and effectiveness of dialogue policies.

Description

technical field [0001] The invention belongs to the technical field of machine learning, and in particular relates to a GP-based deep Dyna-Q method for dialogue strategy learning. Background technique [0002] Task-completion dialogue policy learning aims to build a task-completion-oriented dialogue system that can help users complete a specific single task or multi-domain tasks through several rounds of natural language interaction. It has been widely used in chatbots and personal voice assistants such as Apple's Siri and Microsoft's Cortana. [0003] In recent years, reinforcement learning has gradually become the mainstream method for dialogue policy learning. Based on reinforcement learning, the dialogue system can gradually adjust and optimize the strategy through natural language interaction with the user to improve performance. However, the original reinforcement learning method requires a lot of human-computer dialogue interactions before obtaining a usable dialogu...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine

Login to View More

Application Information

Patent Timeline

Login to View More

Patent Type & AuthorityPatents(China)

IPC IPC(8): G06N3/04G06N3/08G06F16/332

CPCG06N3/04G06N3/08G06F16/3329

Inventor方文其曹江吴冠霖平洋栾绍童闫顼

OwnerNANHU LAB

A deep GP-based Dyna-Q method for dialogue policy learning

AI Technical Summary This helps you quickly interpret patents by identifying the three key elements: Problems solved by technologyMethod usedBenefits of technology

Problems solved by technology

Method used

Image

Examples

Embodiment Construction

PUM

Abstract

Description

Claims

Application Information

AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology