Training method and apparatus for an intelligent agent, device, medium, and program product
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI TASHI ZHIHANG TECHNOLOGY CO LTD
- Filing Date
- 2026-05-14
- Publication Date
- 2026-06-12
AI Technical Summary
In existing approaches, training an agent from teaching (demonstration) trajectory information is easily degraded by differences in teaching data quality, state distribution, and sequence length, which leads to low training efficiency and poor results. Stable manipulation policies are especially hard to learn for long-horizon, multi-stage tasks such as folding flexible objects.
Teaching trajectory information is acquired, and a value assessment model evaluates each frame of the teaching images to produce a corresponding value score. The frames and their value scores are used as training data, and the agent is trained on them in combination with a loss function. This strengthens learning of key steps, weakens interference from redundant information, and improves training stability and efficiency.
The method significantly improves the agent's training efficiency and policy stability, improves execution accuracy and generalization on complex tasks, and addresses the learning difficulty caused by large differences among teaching data, particularly in the manipulation of flexible objects.
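The summary does not say how the per-frame value scores enter the loss function. As a minimal sketch of one plausible reading, the Python below treats each score as a weight on that frame's imitation error, so high-value frames (key steps) dominate the loss and low-value frames (redundant motion) contribute little. The function name `value_weighted_bc_loss`, the mean-squared-error imitation term, and the clip-and-normalize weighting are all assumptions for illustration, not details taken from the patent.

```python
import numpy as np


def value_weighted_bc_loss(pred_actions, demo_actions, value_scores, eps=1e-8):
    """Value-weighted imitation loss for one teaching trajectory.

    pred_actions, demo_actions: arrays of shape (T, D), one action per frame.
    value_scores: array of shape (T,), one score per teaching frame, assumed
    to come from some value assessment model (hypothetical here).
    """
    pred = np.asarray(pred_actions, dtype=float)
    demo = np.asarray(demo_actions, dtype=float)
    scores = np.asarray(value_scores, dtype=float)
    if not (len(pred) == len(demo) == len(scores)):
        raise ValueError("all inputs must cover the same number of frames")

    # Assumption: negative scores are clipped to zero and the remaining
    # scores are normalized to sum to 1, so they act as per-frame weights.
    weights = np.clip(scores, 0.0, None)
    weights = weights / (weights.sum() + eps)

    # Per-frame mean squared error between predicted and demonstrated actions.
    per_frame_error = ((pred - demo) ** 2).mean(axis=1)

    # Frames with higher value scores contribute more to the total loss.
    return float((weights * per_frame_error).sum())


if __name__ == "__main__":
    pred = np.zeros((3, 2))
    demo = np.array([[1.0, 1.0], [0.0, 0.0], [2.0, 2.0]])
    print(value_weighted_bc_loss(pred, demo, value_scores=[1.0, 0.0, 1.0]))
```

Under this weighting, a frame scored zero drops out of the loss entirely, while uniform scores reduce the expression to an ordinary unweighted mean over frames. A real implementation would likely compute the same quantity inside an automatic-differentiation framework so that gradients reach the policy network.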
Smart Images

Figure CN122197949A_ABST