A deep GP-based Dyna-Q method for dialogue policy learning

A dialogue policy learning technique using deep models, applied in the field of machine learning. It addresses problems such as training instability, high sensitivity to hyperparameter selection, and the resulting limits on dialogue learning performance, and makes the method easy to evaluate and analyze.

Active Publication Date: 2022-02-11
NANHU LAB
Cites: 3 · Cited by: 2

AI Technical Summary

Problems solved by technology

However, GAN training suffers from severe instability, which with high probability causes dialogue policy learning to fail to converge and makes it highly sensitive to the choice of hyperparameters; this seriously limits the performance of dialogue learning.




Detailed Description of Embodiments

[0065] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0066] As shown in Figure 1, this scheme proposes a GP-based deep Dyna-Q method for dialogue policy learning. The basic procedure is consistent with the existing technology: human conversation data is used to initialize the dialogue policy model and the world model, after which dialogue policy learning begins. Dialogue policy learning consists of two parts: direct reinforcement learning and indirect reinforcement learning (also called planning). In direct reinforcement learning, a Deep Q-Network (DQN) improves the dialogue policy based on real experience: the dialogue policy model interacts with the user, and at each step it observes the dialogue state s and selects the action a that maximizes the value function Q. Then, the dialogue policy model receives...
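The interplay of direct reinforcement learning and planning can be illustrated with classic tabular Dyna-Q. This is a minimal sketch under stated assumptions, not the patented method: a dictionary Q-table stands in for the DQN, a lookup table that memorizes real transitions stands in for the (GP-based) world model, and `toy_env_step` is a hypothetical four-state chain standing in for a real user. All function names here are illustrative.

```python
import random
from collections import defaultdict

def greedy_action(q, state, actions):
    """Select the action a that maximizes Q(s, a) for the observed state s."""
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, s, a, r, s_next, done, actions, alpha=0.1, gamma=0.9):
    """One-step Q-learning update toward r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

def dyna_q_step(q, model, env_step, s, actions, rng, planning_steps=5, epsilon=0.1):
    # Direct RL: act on the real environment (epsilon-greedy) and learn from real experience
    a = rng.choice(actions) if rng.random() < epsilon else greedy_action(q, s, actions)
    r, s_next, done = env_step(s, a)
    q_update(q, s, a, r, s_next, done, actions)
    # World-model learning: a lookup table stands in for the learned world model
    model[(s, a)] = (r, s_next, done)
    # Indirect RL (planning): extra Q updates from experience replayed by the model
    for _ in range(planning_steps):
        (ps, pa), (pr, ps_next, pdone) = rng.choice(list(model.items()))
        q_update(q, ps, pa, pr, ps_next, pdone, actions)
    return s_next, done

def toy_env_step(s, a):
    # Hypothetical 4-state chain standing in for a user; reaching state 3 completes the task
    s_next = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    done = s_next == 3
    return (1.0 if done else 0.0), s_next, done

rng = random.Random(0)
q, model, actions = defaultdict(float), {}, ["left", "right"]
for _ in range(200):
    s, done = 0, False
    while not done:
        s, done = dyna_q_step(q, model, toy_env_step, s, actions, rng)
print([greedy_action(q, s, actions) for s in range(3)])
```

Each real interaction is followed by several planning updates, so the policy extracts more learning from each expensive real dialogue; this is the motivation for Dyna-style dialogue policy learning.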



Abstract

The present invention provides a GP-based deep Dyna-Q method for dialogue policy learning, comprising the following steps: S1. generate simulated experience from a GP-based world model; S2. perform a quality check on the simulated experience; S3. use the simulated experience that passes the quality check to train the dialogue policy model. Instead of the conventional DNN model, the world model of the present invention is constructed as a Gaussian process (GP) model, which has the advantage of being easy to analyze. The KL-divergence-based quality detector effectively controls the quality of simulated experience: because KL divergence is used to examine the distribution of experience, no extra work is needed to design and train a complex quality detector. This makes the quality of simulated experience easier to evaluate and greatly improves computational efficiency while ensuring the robustness and effectiveness of the dialogue policy.
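Steps S1 to S3 can be sketched end to end. This is a minimal sketch under stated assumptions, not the patent's implementation: the text does not specify the GP inputs, kernel, or how the KL divergence is computed, so here (hypothetically) each state-action pair is encoded as one scalar, the GP predicts only the reward, the quality check fits a univariate Gaussian to real and simulated rewards and compares them with the closed-form Gaussian KL divergence, and the threshold of 0.5 is arbitrary. `GPWorldModel`, `passes_quality_check`, and the other names are illustrative.

```python
import math
import random

def rbf(a, b, length_scale=1.0):
    # Squared-exponential kernel on scalar inputs (hypothetical kernel choice)
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

class GPWorldModel:
    """S1 (sketch): GP regression from a scalar state-action encoding to reward."""
    def __init__(self, xs, ys, noise=1e-2):
        self.xs = list(xs)
        self.k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(self.xs)]
                  for i, a in enumerate(self.xs)]
        self.alpha = solve(self.k, list(ys))  # (K + noise * I)^-1 y

    def predict(self, x):
        # GP posterior mean and variance at input x
        k_star = [rbf(x, a) for a in self.xs]
        mean = sum(k * a for k, a in zip(k_star, self.alpha))
        var = rbf(x, x) - sum(k * v for k, v in zip(k_star, solve(self.k, k_star)))
        return mean, max(var, 1e-9)

    def simulate(self, x, rng):
        # Sample a simulated reward from the GP predictive distribution
        mean, var = self.predict(x)
        return rng.gauss(mean, math.sqrt(var))

def gaussian_kl(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians, v = variance."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def fit_gaussian(values):
    # Maximum-likelihood mean and variance of a batch of scalars
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, max(v, 1e-9)

def passes_quality_check(simulated, real, threshold=0.5):
    """S2 (sketch): accept simulated experience whose distribution is close to the real one."""
    return gaussian_kl(*fit_gaussian(simulated), *fit_gaussian(real)) < threshold

rng = random.Random(0)
xs = [i / 4 for i in range(9)]            # hypothetical encoded (state, action) pairs
real_rewards = [math.sin(x) for x in xs]  # hypothetical rewards observed with real users
world_model = GPWorldModel(xs, real_rewards)
simulated = [world_model.simulate(x, rng) for x in xs]
# S3 (sketch): only a batch that passes the check would be used to train the policy
print(passes_quality_check(simulated, real_rewards))
```

Because a GP returns a predictive variance along with its mean, the world model's uncertainty is directly available (near 0 at observed inputs, near the prior variance far from them), which is one plausible reading of why a GP world model is "easy to analyze" compared with a DNN.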

Description

Technical Field

[0001] The invention belongs to the technical field of machine learning, and in particular relates to a GP-based deep Dyna-Q method for dialogue policy learning.

Background

[0002] Task-completion dialogue policy learning aims to build task-oriented dialogue systems that help users complete a specific single-domain task or multi-domain tasks through several rounds of natural language interaction. Such systems are widely used in chatbots and personal voice assistants such as Apple's Siri and Microsoft's Cortana.

[0003] In recent years, reinforcement learning has gradually become the mainstream method for dialogue policy learning. With reinforcement learning, the dialogue system can gradually adjust and optimize its policy through natural language interaction with the user to improve performance. However, the original reinforcement learning method requires a large number of human-machine dialogue interactions before obtaining a usable dialogu...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06N3/04; G06N3/08; G06F16/332
CPC: G06N3/04; G06N3/08; G06F16/3329
Inventors: 方文其 (Fang Wenqi), 曹江 (Cao Jiang), 吴冠霖 (Wu Guanlin), 平洋 (Ping Yang), 栾绍童 (Luan Shaotong), 闫顼 (Yan Xu)
Owner: NANHU LAB