Training method and device of intelligent agent, equipment, medium and program product

CN122197949A · Pending · Publication Date: 2026-06-12 · SHANGHAI TASHI ZHIHANG TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHANGHAI TASHI ZHIHANG TECHNOLOGY CO LTD
Filing Date: 2026-05-14
Publication Date: 2026-06-12

AI Technical Summary

⚠Technical Problem

In existing approaches, training an agent from teaching (demonstration) trajectory information is easily affected by differences in teaching data quality, state distribution, and sequence length, which lowers training efficiency and degrades results. Stable operating strategies are especially hard to learn in long-horizon, multi-stage tasks such as folding flexible objects.

⚗Method used

Teaching trajectory information is acquired, and a value evaluation model scores each frame of teaching image to produce a corresponding value score. The trajectory information and the value scores together serve as training data. The agent is then trained with a loss function so that learning of key steps is strengthened and interference from redundant information is weakened, improving training stability and efficiency.
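The summary does not say how the value scores enter the loss function. As a minimal sketch of one possible reading, the Python function below weights each frame's imitation error by its normalized value score, so that high-scoring frames contribute more to the loss. The function name `value_weighted_loss`, the squared-error term, and the sum-to-one normalization are all assumptions made for illustration and are not taken from the patent.

```python
from typing import Sequence


def value_weighted_loss(
    predicted_actions: Sequence[Sequence[float]],
    demo_actions: Sequence[Sequence[float]],
    value_scores: Sequence[float],
) -> float:
    """Imitation loss in which each frame is weighted by its value score.

    Hypothetical illustration only: the weighting scheme is an assumption,
    not the loss described in the patent.
    """
    if not (len(predicted_actions) == len(demo_actions) == len(value_scores)):
        raise ValueError("one action pair and one value score per frame is required")

    total_score = sum(value_scores)
    if total_score <= 0:
        raise ValueError("value scores must have a positive sum")

    loss = 0.0
    for pred, demo, score in zip(predicted_actions, demo_actions, value_scores):
        # Squared error for this frame, averaged over the action dimensions.
        frame_error = sum((p - d) ** 2 for p, d in zip(pred, demo)) / len(demo)
        # Assumed weighting: the frame's share of the total value score.
        loss += (score / total_score) * frame_error
    return loss
```

Under this assumed scheme, an error on a frame with a low score is penalized less than the same error on a frame with a high score, which is one way to strengthen key steps relative to redundant frames.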

🎯Benefits of technology

The method improves the agent's training efficiency and policy stability, and enhances execution accuracy and generalization on complex tasks. It addresses the large variation in teaching data and the resulting difficulty of model learning, especially in the manipulation of flexible objects.

✦ Generated by Eureka AI based on patent content.

Figure CN122197949A_ABST

Abstract

The application relates to the technical field of artificial intelligence, and in particular to a method and device for training an agent, a medium, and a program product. The method comprises: obtaining teaching trajectory information of a first task, the teaching trajectory information comprising multiple teaching images; inputting the teaching trajectory information into a value evaluation model to obtain a value score corresponding to each teaching image; and using the teaching trajectory information and the value scores as training data to train a first agent to be trained, thereby obtaining a second agent capable of executing the first task. The value evaluation model is trained on sample teaching trajectory information and label value scores related to remaining task progress. Through frame-by-frame value quantification, the application makes differentiated use of teaching data, reduces the influence of teaching data differences on agent training, enables the agent to learn a more stable operation strategy, and improves the training efficiency and training effect of the agent.
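The abstract states only that the label value scores are "related to" remaining task progress and gives no formula. The sketch below shows one simple way such labels could be built for a sample trajectory, assuming the label for a frame is one minus the fraction of frames still remaining. The function name `progress_labels` and the linear mapping are hypothetical choices, not the patent's definition.

```python
from typing import List


def progress_labels(num_frames: int) -> List[float]:
    """Per-frame label scores derived from remaining task progress.

    Hypothetical illustration: label = 1 - remaining_frames / (num_frames - 1),
    so the first frame gets 0.0 and the final frame gets 1.0.
    """
    if num_frames <= 0:
        raise ValueError("a trajectory needs at least one frame")
    if num_frames == 1:
        # A single-frame trajectory has no remaining progress.
        return [1.0]

    labels = []
    for t in range(num_frames):
        remaining = num_frames - 1 - t  # frames left after frame t
        labels.append(1.0 - remaining / (num_frames - 1))
    return labels
```

For example, `progress_labels(5)` gives `[0.0, 0.25, 0.5, 0.75, 1.0]`. Because the labels are normalized by trajectory length, demonstrations of different durations map onto the same 0 to 1 scale under this assumption.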