Man-machine interaction training method and device based on reinforcement learning strategy
A technique combining reinforcement learning and human-computer interaction, applied in the field of human-computer interaction. It addresses problems such as existing approaches not being well applicable and not being able to improve user transfer entrustment.
Examples
Example 1
[0075] In Example 1, the above-mentioned second model may be a ranking model.
[0076] Exemplarily, the above step 103 may include the following steps 103a1 to 103a4:
[0077] Step 103a1: During the simulated instant messaging interaction between the ranking model and the first model, the ranking model selects, from the set of candidate replies related to the first content output by the first model, the first reply content with the highest context relevance.
[0078] Step 103a2: Retrieve, from the retrieval library, the second interactive content whose similarity to the first interactive content corresponding to the current interactive process satisfies a preset similarity.
[0079] Wherein, the first interactive content includes the first reply content.
[0080] Step 103a3: Perform feature extraction on each interactive content in the third interactive content, and concatenate the obtained feature vectors of each interactive co...
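Steps 103a1 and 103a2 can be illustrated with a minimal Python sketch. The source does not say how context relevance or similarity is computed, so token-overlap (Jaccard) similarity is used here purely as a stand-in assumption. The function names `select_reply` and `retrieve_similar` and the parameter `min_similarity` are hypothetical and do not come from the patent text. The truncated step 103a3 is not covered.

```python
import re


def tokens(text):
    """Lowercased word tokens; a crude stand-in for a learned encoder (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; assumed here, not specified by the source."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def select_reply(first_content, candidate_replies):
    """Sketch of step 103a1: pick the candidate reply most relevant to the first content."""
    return max(candidate_replies, key=lambda reply: jaccard(first_content, reply))


def retrieve_similar(first_interactive_content, retrieval_library, min_similarity):
    """Sketch of step 103a2: keep library entries whose similarity meets the preset threshold."""
    return [
        entry
        for entry in retrieval_library
        if jaccard(first_interactive_content, entry) >= min_similarity
    ]
```

In this sketch the first interactive content would be formed from the current exchange, including the reply chosen by `select_reply`, before being passed to `retrieve_similar`. A real system would likely replace `jaccard` with a learned relevance score.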
Example 2
[0095] In Example 2, the above-mentioned second model may be a generative model.
[0096] Specifically, the step of constructing the second model in the above step 102 may include the following step 102b:
[0097] Step 102b: Pre-train the second GPT model using the target sample set as training samples to obtain the second model.
[0098] Wherein, each sample in the training samples of the second GPT model includes first object information and scene information; the first object information is used to indicate the first object corresponding to the interactive content of the sample; the scene information is used to indicate the application scenario to which the interactive content of the sample belongs.
[0099] Exemplarily, the above-mentioned second GPT model can generate reply content in the following format:
[0100] [CLS][city_110000][agentId_12095][action_price] This is 500,000 yuan.
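The reply format above, a run of bracketed control tokens followed by free text, can be built and parsed with a short Python sketch. The field names `city`, `agentId`, and `action` are read directly off the example tokens. How they map onto the "first object information" and "scene information" of paragraph [0098] is not stated in the source and is left open here. The helpers `build_sample` and `parse_sample` are hypothetical names.

```python
import re

# Matches one bracketed control token such as [CLS] or [city_110000].
CONTROL_TOKEN = re.compile(r"\[([^\[\]]+)\]")


def build_sample(city, agent_id, action, text):
    """Assemble a sample in the format shown in paragraph [0100]."""
    return f"[CLS][city_{city}][agentId_{agent_id}][action_{action}] {text}"


def parse_sample(sample):
    """Split the leading control tokens from the reply text.

    Returns (fields, text). Tokens of the form key_value become fields[key] = value;
    tokens without an underscore (such as CLS) map to None.
    """
    fields = {}
    pos = 0
    while True:
        match = CONTROL_TOKEN.match(sample, pos)
        if not match:
            break
        key, sep, value = match.group(1).partition("_")
        fields[key] = value if sep else None
        pos = match.end()
    return fields, sample[pos:].strip()
```

Parsing stops at the first position that does not start a bracketed token, so brackets appearing later inside the reply text are left untouched.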



