Man-machine interaction training method and device based on reinforcement learning strategy

A reinforcement learning technique applied in the field of human-computer interaction. It addresses the problems that existing chatbot models are not well applicable to goal-directed dialogue and cannot improve the rate at which users go on to entrust.

Pending Publication Date: 2022-04-29

贝壳找房网(北京)信息技术有限公司


Problems solved by technology

In the real estate field, however, a chatbot must be able to guide users to entrust. The pipeline model is not trained for a specific purpose, so it cannot increase the probability that users entrust.

The task-based pipeline model is therefore poorly suited to real estate chatbots that must achieve a specific purpose.


Examples


Example 1

[0075] In Example 1, the above-mentioned second model may be a ranking model.

[0076] Exemplarily, the above step 103 may include the following steps 103a1 to 103a4:

[0077] Step 103a1: During the simulated instant messaging interaction between the ranking model and the first model, the ranking model selects, from the set of candidate replies related to the first content output by the first model, the first reply content with the highest contextual relevance.

[0078] Step 103a2: Select from the retrieval library the second interactive content whose similarity to the first interactive content of the current interaction process satisfies a preset similarity.

[0079] Wherein, the first interactive content includes the first reply content.

[0080] Step 103a3, performing feature extraction on each interactive content in the third interactive content, and concatenating the obtained feature vectors of each interactive co...
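
Steps 103a1 and 103a2 can be illustrated with a minimal sketch. The listing does not say how relevance or similarity is computed, so the token-overlap score `jaccard` below is purely an assumption standing in for whatever learned score the ranking model uses. The function names `select_reply` and `retrieve_similar`, the sample sentences, and the threshold value are likewise hypothetical. Step 103a3 onward is truncated in this listing and is not sketched.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; an assumed stand-in for a learned score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_reply(first_content: str, candidates: list[str]) -> str:
    """Step 103a1 (sketch): pick the candidate most relevant to the first model's output."""
    return max(candidates, key=lambda c: jaccard(first_content, c))


def retrieve_similar(current: str, library: list[str], threshold: float) -> list[str]:
    """Step 103a2 (sketch): keep library entries whose similarity meets the preset value."""
    return [entry for entry in library if jaccard(current, entry) >= threshold]


# Hypothetical data: one turn from the first model and two candidate replies.
first_content = "what is the price of this apartment"
candidates = [
    "the price of this apartment is 500,000 yuan",
    "would you like to schedule a viewing",
]
reply = select_reply(first_content, candidates)

# Hypothetical retrieval library of past interactive content.
library = [
    "the price of this apartment is 480,000 yuan",
    "the weather is nice today",
]
similar = retrieve_similar(reply, library, threshold=0.5)
```

Here `reply` is the first candidate, and only the first library entry passes the 0.5 threshold. In practice the score would presumably come from the ranking model itself rather than from token overlap.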

Example 2

[0095] In Example 2, the above-mentioned second model may be a generative model.

[0096] Specifically, the step of constructing the second model in the above step 102 may include the following step 102b:

[0097] Step 102b: Pre-train the second GPT model using the target sample set as training samples to obtain the second model.

[0098] Each training sample of the second GPT model includes first object information and scene information. The first object information indicates the first object corresponding to the interactive content of the sample; the scene information indicates the application scenario to which the interactive content of the sample belongs.

[0099] Exemplarily, the above-mentioned second GPT model can generate reply content in the following format:

[0100] [CLS][city_110000][agentId_12095][action_price] This is 500,000 yuan.
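
The reply format in [0100] can be built and parsed with a short sketch. Reading the bracketed tags as `key_value` pairs (`city`, `agentId`, `action`) is an assumption drawn only from this single example; the listing does not define the tag grammar, and the helper names `build_reply` and `parse_reply` are hypothetical.

```python
import re

# One bracketed tag such as [CLS] or [city_110000].
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def build_reply(city: str, agent_id: str, action: str, text: str) -> str:
    """Assemble a line in the format shown in [0100]."""
    return f"[CLS][city_{city}][agentId_{agent_id}][action_{action}] {text}"


def parse_reply(line: str) -> tuple[dict[str, str], str]:
    """Split the leading tags from the free text; key_value tags go into a dict."""
    fields: dict[str, str] = {}
    pos = 0
    while True:
        m = TAG_RE.match(line, pos)
        if m is None:
            break
        tag = m.group(1)
        if "_" in tag:  # tags without an underscore, such as CLS, carry no value
            key, value = tag.split("_", 1)
            fields[key] = value
        pos = m.end()
    return fields, line[pos:].strip()


line = build_reply("110000", "12095", "price", "This is 500,000 yuan.")
fields, text = parse_reply(line)
```

Keeping the control tags in front of the text lets a single generative model condition on them as ordinary prefix tokens, which appears to be the intent of the format.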


Abstract

The invention provides a man-machine interaction training method and device based on a reinforcement learning strategy. The method comprises: acquiring a first model trained with a target sample set as training samples, where the target sample set comprises the interaction contents of a plurality of interaction processes; constructing a second model and simulating an instant messaging interaction process between the second model and the first model; during that interaction, having the second model output reply content and adjusting the parameters of the second model based on the degree to which that reply content influences an evaluation index of the interaction process; and determining the parameter-optimized second model as the target model. The evaluation index indicates the probability that the interaction process achieves a preset target.
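
The parameter adjustment described in the abstract can be sketched as a policy-gradient loop. The abstract does not name an update rule, so REINFORCE over a softmax policy is swapped in here as an illustration, not as the patent's method. The table `INDEX_DELTA` is a hypothetical stand-in for the influence of each candidate reply on the evaluation index (the change in the estimated probability of reaching the preset target); the learning rate and step count are also assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, lr=0.5):
    """One REINFORCE update: grad of log pi(action) is onehot(action) - probs."""
    probs = softmax(logits)
    return [
        w + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (w, p) in enumerate(zip(logits, probs))
    ]


# Hypothetical influence of each of three candidate replies on the evaluation
# index. In the described method this would come from the simulated
# interaction with the first model, not from a fixed table.
INDEX_DELTA = [0.30, -0.10, 0.05]

rng = random.Random(0)
logits = [0.0, 0.0, 0.0]  # parameters of the toy "second model"
for _ in range(500):
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs)[0]
    logits = reinforce_step(logits, action, INDEX_DELTA[action])

final_probs = softmax(logits)
```

After training, the policy should put most of its probability on the reply with the largest positive influence. A real second model would condition on the dialogue context and have far more parameters, but the reward signal plays the same role.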

Description

Technical field

[0001] The present application relates to the field of human-computer interaction, in particular to a human-computer interaction training method and device based on a reinforcement learning strategy.

Background technique

[0002] In order to improve the quality of service to users and reduce the cost of manual services, the platform has set up chatbots before providing manual services to users. Chatbots can provide users with the necessary basic services and solve some of their problems. When chatbots cannot solve the problems raised by users, or have completed the current stage of communication and need to switch to the next stage of communication, they will turn to manual services.

[0003] In related technologies, most chatbots use a task-based pipeline model. The pipeline model can solve a problem raised by a user and ask the user about the problem to obtain necessary information for solving the problem. But for the real estate field, chatbots are requir...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F16/9032, G06F16/906, G06N20/00

CPC: G06F16/90332, G06F16/906, G06N20/00

Inventors: 王文彬, 冯伟

Owner: 贝壳找房网(北京)信息技术有限公司
