
Method and system for detecting quality of simulated user experience in dialogue strategy learning

A technology for simulated users and policy learning, applied in the field of machine learning. It addresses problems such as high sensitivity to hyperparameter selection, training instability, and constrained dialogue learning performance, and achieves effective control over the quality of simulated experience.

Active Publication Date: 2021-08-10
NANHU LAB

AI Technical Summary

Problems solved by technology

GAN training suffers from severe instability, which makes non-convergence in dialogue policy learning highly likely, and it is highly sensitive to the choice of hyperparameters. Both issues seriously restrict dialogue learning performance.


Examples


Embodiment 1

[0029] As shown in Figure 1, this scheme proposes a method for detecting the quality of simulated user experience in dialogue policy learning. Dialogue policy learning for the dialogue policy model mainly includes two parts: direct reinforcement learning and indirect reinforcement learning (also called planning). Direct reinforcement learning uses a Deep Q-Network (DQN) to improve the dialogue policy based on real experience. The dialogue policy model interacts with the user (User); at each step, it selects the action a to perform by maximizing the value function Q given the observed dialogue state s. The dialogue policy model then receives the reward r and the real user's action a_r^u, and updates the current state to s'. The real experience (s, a, r, a_r^u, t) is then stored in the real user experience database, where t indicates whether the dialogue has terminated.
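The direct reinforcement learning step described above can be sketched roughly as follows. This is a minimal illustration rather than the patent's implementation: `toy_q` is a hypothetical lookup table standing in for the Deep Q-Network, the state and action names are invented, and the buffer size is an assumption.

```python
from collections import deque
from typing import Callable, NamedTuple, Sequence


class Experience(NamedTuple):
    """One real-experience tuple (s, a, r, a_r^u, t)."""
    state: tuple
    action: str
    reward: float
    user_action: str
    terminal: bool


def greedy_action(q_value: Callable[[tuple, str], float],
                  state: tuple, actions: Sequence[str]) -> str:
    """Pick the action with the highest Q(s, a) for the observed state."""
    return max(actions, key=lambda a: q_value(state, a))


def toy_q(state: tuple, action: str) -> float:
    """Hypothetical Q-function; a trained DQN would take its place."""
    table = {("greet",): {"request_date": 0.7, "inform_price": 0.2}}
    return table.get(state, {}).get(action, 0.0)


# Real user experience database; the capacity is an assumption.
real_experience_db = deque(maxlen=5000)

state = ("greet",)
action = greedy_action(toy_q, state, ["request_date", "inform_price"])

# The reward and the user's reply would come from the real user;
# fixed placeholder values are used here.
real_experience_db.append(
    Experience(state, action, -1.0, "inform_date", False)
)
```

In practice an exploration scheme such as epsilon-greedy is commonly layered on top of the greedy choice; the source text only describes maximizing Q, so the sketch does the same.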

[0030] Maximize the value functi...

Embodiment 2

[0045] This embodiment is similar to Embodiment 1. The difference is that this embodiment takes into account that, in the initial stage, the dictionary world-dict contains only a limited number of actions (behaviors), so the dictionary same-dict is also very short. To warm up the world model, the simulated experience is preferably regarded as qualified while the length of same-dict is less than a constant C. The constant C is determined by those skilled in the art according to specific conditions and is not limited here.

[0046] In this case, only once the length of the dictionary same-dict reaches a certain value, that is, when it is greater than or equal to the constant C, is the predefined variable KL_pre used to track the KL divergence between the dictionaries real-dict and world-dict for similarity measurement.
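A rough sketch of this warm-up rule combined with a KL-divergence check is given below. Several details are assumptions that the text above does not spell out: `same_dict` is taken to map each shared key to a pair of positive counts `(real_count, world_count)`, the counts are normalized over the shared keys, and experience is accepted when the new divergence does not exceed `kl_pre`, which is then updated. The function names are hypothetical.

```python
import math


def kl_divergence(same_dict):
    """KL(P_real || P_world) over the keys shared by both dictionaries.

    same_dict maps key -> (real_count, world_count); counts must be > 0.
    """
    real_total = sum(real for real, _ in same_dict.values())
    world_total = sum(world for _, world in same_dict.values())
    kl = 0.0
    for real_count, world_count in same_dict.values():
        p = real_count / real_total
        q = world_count / world_total
        kl += p * math.log(p / q)
    return kl


def is_qualified(same_dict, kl_pre, c):
    """Return (qualified, updated_kl_pre).

    Warm-up: while same_dict has fewer than c entries, accept everything.
    Afterwards (assumed rule): accept only if the divergence has not
    grown beyond kl_pre, and track the new value in kl_pre.
    """
    if len(same_dict) < c:
        return True, kl_pre
    kl = kl_divergence(same_dict)
    if kl <= kl_pre:
        return True, kl
    return False, kl_pre
```

For example, with `c = 3` a single-entry `same_dict` is accepted without computing any divergence, while a three-entry dictionary whose real and simulated frequencies diverge strongly is rejected and leaves `kl_pre` unchanged.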

Embodiment 3

[0048] This embodiment provides a system for detecting the quality of simulated user experience in dialogue policy learning, which is used to implement the method of Embodiment 1 or Embodiment 2. The system comprises a quality detector connected to the world model, the real user experience database, and the dialogue policy model. The quality detector includes a KL divergence detector, which checks the quality of the simulated experience generated by the world model against the real experience generated by real users.

[0049] Specifically, the quality detector includes a dictionary real-dict for storing real experience, a dictionary world-dict for storing simulated experience, and a dictionary same-dict that stores, for each primary key in the intersection of real-dict and world-dict, its frequency values in the two dictionaries.
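The three dictionaries can be illustrated with a small sketch. It assumes that the primary keys are user actions and that each dictionary stores occurrence counts; the action names and the helper `build_same_dict` are hypothetical and serve only to show the data structure.

```python
from collections import Counter


def build_same_dict(real_dict, world_dict):
    """Map each key present in both dictionaries to (real_freq, world_freq)."""
    shared = real_dict.keys() & world_dict.keys()
    return {key: (real_dict[key], world_dict[key]) for key in sorted(shared)}


# real-dict: frequencies of user actions observed in real experience.
real_dict = Counter(["inform_date", "inform_date", "request_price", "bye"])

# world-dict: frequencies of user actions produced by the world model.
world_dict = Counter(["inform_date", "request_price", "request_price", "thanks"])

# same-dict: only the keys in the intersection, with both frequency values.
same_dict = build_same_dict(real_dict, world_dict)
```

Keys that occur in only one dictionary (`bye` and `thanks` here) are left out of `same_dict`, so any divergence computed from it compares the two sources on their common support only.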


Abstract

The present invention provides a method and system for detecting the quality of simulated user experience in dialogue policy learning. The method comprises the following steps: S1. generating simulated experience with a world model; S2. performing a quality check on the simulated experience; S3. saving the simulated experience that passes the quality check for use in training the dialogue policy model. The scheme introduces a quality detector based on KL divergence, which evaluates the quality of simulated experience more easily and effectively, greatly improves computational efficiency while preserving the robustness and effectiveness of the dialogue policy, and thereby achieves effective control over the quality of simulated experience.
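Steps S1 to S3 amount to a generate, check, and keep loop, which might be sketched as follows. The callables `generate` and `passes_quality_check` are hypothetical stand-ins for the world model and the quality detector; the reward-based predicate in the usage lines is only a placeholder and is not the KL-divergence check.

```python
def collect_simulated_experience(generate, passes_quality_check, n_samples):
    """Generate n_samples experiences and keep only those that pass the check."""
    accepted = []
    for _ in range(n_samples):
        experience = generate()                # S1: world model output
        if passes_quality_check(experience):   # S2: quality check
            accepted.append(experience)        # S3: keep for policy training
    return accepted


# Placeholder data: (state, action, reward) tuples from a fake world model.
samples = iter([("s0", "a0", 1.0), ("s1", "a1", -1.0), ("s2", "a2", 0.5)])

buffer = collect_simulated_experience(
    generate=lambda: next(samples),
    passes_quality_check=lambda exp: exp[2] >= 0,  # placeholder predicate
    n_samples=3,
)
```

Passing the check in as a callable keeps the loop independent of how quality is measured, so a KL-divergence detector could be substituted without changing the loop itself.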

Description

Technical field

[0001] The invention belongs to the technical field of machine learning, and in particular relates to a method and system for detecting the quality of simulated user experience in dialogue policy learning.

Background

[0002] Task-completion dialogue policy learning aims to build a task-oriented dialogue system that helps users complete a specific single task or multi-domain tasks through several rounds of natural language interaction. It has been widely used in chatbots and personal voice assistants such as Apple's Siri and Microsoft's Cortana.

[0003] In recent years, reinforcement learning has gradually become the mainstream method for dialogue policy learning. With reinforcement learning, the dialogue system can gradually adjust and optimize its policy through natural language interaction with the user to improve performance. However, the original reinforcement learning method requires a lot of human-computer dialogue in...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/332, G06F16/36, G06N3/00, G06N20/00
CPC: G06N3/008, G06F16/3329, G06F16/374, G06N20/00
Inventor: 曹江, 吴冠霖, 方文其, 平洋, 栾绍童, 闫顼
Owner: NANHU LAB