A GP-based deep Dyna-Q method for dialogue policy learning

A GP-based deep Dyna-Q technique for dialogue policy learning, applied in the field of machine learning. It addresses problems such as instability, high sensitivity to hyperparameter selection, and the limited performance of dialogue policy learning, and it makes the learned model easy to evaluate and analyze.

CN113392956B | Active | Publication Date: 2022-02-11 | NANHU LAB

Embodiment Construction

[0065] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0066] As shown in Figure 1, this scheme proposes a GP-based deep Dyna-Q method for dialogue policy learning. The basic procedure is consistent with the prior art: human conversation data is used to initialize the dialogue policy model and the world model, and dialogue policy learning then begins. Dialogue policy learning consists mainly of two parts: direct reinforcement learning and indirect reinforcement learning (also called planning). In direct reinforcement learning, a Deep Q-Network (DQN) is used to improve the dialogue policy from real experience. The dialogue policy model interacts with the user, and at each step it selects the action a that maximizes the value function Q given the observed dialogue state s. Then, the dialogue policy model receives...
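
The action-selection step of direct reinforcement learning described above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the function name `select_action`, the representation of Q(s, a) as a plain list indexed by action, and the optional `epsilon` exploration rate are all introduced here. The text itself only states that the action maximizing Q for the observed state is selected.

```python
import random


def select_action(q_values, epsilon=0.0, rng=random):
    """Pick an action index for one dialogue turn.

    q_values: list of Q(s, a) estimates for the observed state s,
              one entry per candidate action (hypothetical layout).
    epsilon:  probability of taking a random action instead of the
              greedy one (an assumption; not stated in the text).
    """
    if rng.random() < epsilon:
        # Exploration branch: uniform random action.
        return rng.randrange(len(q_values))
    # Greedy branch: the action with the largest Q(s, a).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Example: with epsilon = 0 the choice is purely greedy.
chosen = select_action([0.1, 0.7, 0.2])
```

With `epsilon` left at 0 the function reduces to the greedy rule in the paragraph above; a nonzero value is a common way to keep exploring during training, but whether the patented method uses it is not stated in this excerpt.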

Abstract

The present invention provides a GP-based deep Dyna-Q method for dialogue policy learning, comprising the following steps: S1. generating simulated experience from a GP-based world model; S2. performing a quality check on the simulated experience; S3. training the dialogue policy model with the simulated experience that passes the quality check. The world model of the present invention abandons the traditional DNN model and is instead constructed as a Gaussian process model, which has the advantage of being easy to analyze. A quality detector based on KL divergence effectively controls the quality of the simulated experience: because KL divergence is used to examine the distribution of experience, no extra work is required to design and train a complex quality detector. This makes the quality of simulated experience easier to evaluate and greatly improves computational efficiency while ensuring the robustness and effectiveness of the dialogue policy.
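
The quality check in step S2 can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the abstract does not say which quantity's distribution is compared, in which direction the KL divergence is taken, or what threshold is used. The sketch compares the empirical distribution of a discrete label (for example, a dialogue act) in a simulated batch against a real batch, and the names `to_distribution`, `kl_divergence`, `passes_quality_check`, and the `threshold` and `eps` parameters are hypothetical.

```python
import math
from collections import Counter


def to_distribution(samples, support, eps=1e-6):
    """Empirical distribution over `support`, smoothed by `eps`
    so that every outcome has nonzero probability."""
    counts = Counter(samples)
    total = len(samples) + eps * len(support)
    return {k: (counts[k] + eps) / total for k in support}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same keys."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)


def passes_quality_check(simulated, real, threshold=0.5):
    """Accept the simulated batch if its label distribution is
    close to the real one (threshold is an assumed value)."""
    support = set(simulated) | set(real)
    p = to_distribution(simulated, support)
    q = to_distribution(real, support)
    return kl_divergence(p, q) <= threshold


real_batch = ["inform", "request", "inform", "confirm"]
good_batch = ["inform", "request", "inform", "confirm"]
bad_batch = ["bye", "bye", "bye", "bye"]

# S3 (sketch): only batches that pass the check would be used for training.
accepted = [b for b in (good_batch, bad_batch)
            if passes_quality_check(b, real_batch)]
```

The point of the design described in the abstract is that a check of this kind needs no trained model of its own: it is a closed-form comparison of two distributions, unlike a separately trained discriminator.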

Description

Technical Field

[0001] The invention belongs to the technical field of machine learning, and in particular relates to a GP-based deep Dyna-Q method for dialogue policy learning.

Background

[0002] Task-completion dialogue policy learning aims to build a task-oriented dialogue system that helps users complete a specific single task or multi-domain tasks through several rounds of natural language interaction. It has been widely used in chatbots and personal voice assistants such as Apple's Siri and Microsoft's Cortana.

[0003] In recent years, reinforcement learning has gradually become the mainstream method for dialogue policy learning. With reinforcement learning, the dialogue system can gradually adjust and optimize its policy through natural language interaction with the user to improve performance. However, the original reinforcement learning method requires a large number of human-computer dialogue interactions before obtaining a usable dialogu...

Claims

Application Information

Patent Timeline
11 Feb 2022: Publication of CN113392956B
IPC: G06N3/04; G06N3/08; G06F16/332
CPC: G06N3/04; G06N3/08; G06F16/3329
Inventors: 方文其; 曹江