Reinforcement learning method for an educational human-computer interaction collaborative robot
A reinforcement learning and human-computer interaction technology, applied to manipulators, program-controlled manipulators, manufacturing tools, etc. It addresses the problems of long learning cycles, a learning process that is difficult to automate, and robots that are difficult to use directly, achieving the effect of improving motion precision and practical value without increasing hardware cost, which facilitates promotion.
Examples
Embodiment 1
[0031] A reinforcement learning method for an educational human-computer interaction collaborative robot; the learning method is as follows:
[0032] S1: Let $X_k$ denote the joint angle and angular velocity state of the robot system, $u_k$ the control input, and $r_k$ the reference trajectory. The control law is set as follows:

[0033] $u_k = K_b X_k + K_f r_k$,

[0034] where $K_b$ denotes the feedback control gain, $K_f$ denotes the feedforward control gain, and $K = [K_b \; K_f]$ is the collection of $K_b$ and $K_f$.
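As an illustration only (not part of the patent text), a minimal NumPy sketch of the feedback-plus-feedforward control law of S1; the gain values and the two-joint state layout are hypothetical:

```python
import numpy as np

def control_input(K_b, K_f, X_k, r_k):
    """u_k = K_b X_k + K_f r_k: state feedback plus reference feedforward,
    with K = [K_b, K_f] as in paragraph [0034]."""
    return K_b @ X_k + K_f @ r_k

# Hypothetical 2-joint arm: X_k = [q1, q2, dq1, dq2], one reference per joint.
K_b = -0.5 * np.ones((2, 4))    # placeholder feedback gain
K_f = 0.8 * np.eye(2)           # placeholder feedforward gain
u_k = control_input(K_b, K_f, X_k=np.zeros(4), r_k=np.array([0.3, -0.1]))
```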
[0035] S2: According to Bellman's principle of optimality, the value function V is written recursively as:

[0036] $V(X_k, r_k) = c_k + V(X_{k+1}, r_{k+1})$,

where $c_k$ is the stage cost at step $k$.

[0037] A Q-function (action-value function) is defined from the value function as follows:

[0038] $Q(X_k, r_k, u_k) = c_k + V(X_{k+1}, r_{k+1})$.
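As a numerical illustration (not from the patent), the recursion in [0036] can be unrolled backwards from a zero terminal value. The quadratic tracking cost below is an assumed example, since the patent does not specify $c_k$:

```python
import numpy as np

def stage_cost(X_k, r_k, u_k, Q_w, R_w):
    """Assumed quadratic tracking cost c_k (the patent leaves c_k unspecified)."""
    e = X_k - r_k
    return float(e @ Q_w @ e + u_k @ R_w @ u_k)

def values_from_costs(costs):
    """Unroll V(X_k, r_k) = c_k + V(X_{k+1}, r_{k+1}) with V = 0 at the horizon.
    The Q-function of [0038] is then Q(X_k, r_k, u_k) = c_k + V(X_{k+1}, r_{k+1})."""
    V, values = 0.0, []
    for c in reversed(costs):
        V = c + V
        values.append(V)
    return values[::-1]

# e.g. stage costs [1.0, 0.5, 0.2] give values [1.7, 0.7, 0.2]
```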
[0039] S3: Derive the reinforcement learning method. First step: initialize δ, H, and K, and load the training scene. Second step: apply the Actor-Critic algorithm. Third step: repeat the loop calculation on the training ... (a structural sketch of this loop follows below).
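For orientation only, a skeleton of the three-step procedure in S3. Here rollout is a hypothetical stand-in for executing the current policy in the training scene, and critic_update / actor_update correspond to the sketches given under Embodiments 2 and 3 below; none of these names come from the patent.

```python
import numpy as np

def learn(rollout, critic_update, actor_update, n, m, p,
          max_iters=10, delta=1e-3):
    """Step 1: initialize delta, H, K and the training scene.
    Steps 2-3: repeat Actor-Critic updates until the gain K converges."""
    H = np.eye(n + m + p)                 # initial Q-kernel guess
    K = np.zeros((p, n + m))              # initial gain collection [K_b, K_f]
    for _ in range(max_iters):
        Z, costs = rollout(K)             # trajectory in the training scene
        H = critic_update(Z, costs)       # policy evaluation (Embodiment 2)
        K_new = actor_update(H, K, n, m)  # policy improvement (Embodiment 3)
        if np.linalg.norm(K_new - K) < delta:
            break                         # gains converged to tolerance delta
        K = K_new
    return K
```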
Embodiment 2
[0041] In the learning method of S3:
[0042] The Actor-Critic algorithm observes the system state of the robot and uses the Critic update to update the control system algorithm, where the recursive equation is as follows:
[0043] $Q(X_k, r_k, u_k) - Q_v(X_{k+1}, r_{k+1}, u_{k+1}) = c_k + V(X_{k+1}, r_{k+1}) - [c_{k+1} + V(X_{k+2}, r_{k+2})]$.
[0044] After the Critic update, the control system parameters are updated to:

[0045] $[H_{ux} \; H_{ur}] + H_{ru} K_2$,
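On-policy, the right-hand side of [0043] collapses to $c_k$, since $V(X_{k+1}, r_{k+1}) = c_{k+1} + V(X_{k+2}, r_{k+2})$ by [0036]. Below is a least-squares sketch of a Critic update built on that simplification, under the assumed quadratic parameterization $Q_v(z) = z^\top H z$ with $z = [X_k, r_k, u_k]$ (a common choice for linear-quadratic tracking; the patent does not fix the parameterization):

```python
import numpy as np

def quad_features(z):
    """Monomials z_i * z_j for i <= j, so that z^T H z = theta @ quad_features(z)
    once H is rebuilt from theta as in critic_update below."""
    return np.outer(z, z)[np.triu_indices(len(z))]

def critic_update(Z, costs):
    """Least-squares Critic fit from the on-policy recursion
    Q_v(z_k) - Q_v(z_{k+1}) = c_k.
    Z: stacked vectors z_k = [X_k, r_k, u_k], shape (T, d); costs: shape (T-1,).
    Returns the symmetric kernel H with Q_v(z) = z^T H z."""
    Phi = np.array([quad_features(z) for z in Z])
    A = Phi[:-1] - Phi[1:]                        # feature differences
    theta, *_ = np.linalg.lstsq(A, np.asarray(costs), rcond=None)
    d = Z.shape[1]
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return (H + H.T) / 2.0
```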
Embodiment 3
[0047] After the Critic update has updated the control system algorithm, the Actor is updated. The Actor update direction is the maximization of the Critic function:
[0048] $K = \arg\max_u Q_v(X_k, r_k, u_k)$,
[0049] Following the idea of the deterministic policy gradient algorithm, the policy improvement algorithm is expressed as:
[0050] $K \leftarrow K + a \, \mathbb{E}\!\left[\nabla_K Q_v(X_k, r_k, u_k)\right]$,

[0051] where $\nabla_K Q_v$ is the gradient of the Critic with respect to the control policy, $\mathbb{E}[\cdot]$ is the expectation of that gradient, and $a$ is the learning rate;
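A gradient-ascent sketch of this Actor step for the quadratic Critic assumed above, with policy $u = K\,[X; r]$. The simplifying assumption $\mathbb{E}[s s^\top] = I$ for the stacked state-reference vector $s$ is ours, not the patent's; the closed-form maximizer of [0048] is included for comparison:

```python
import numpy as np

def actor_update(H, K, n, m, a=0.1):
    """One DPG-style ascent step on the gain K for Q_v(z) = z^T H z,
    z = [X, r, u], u = K @ s with s = [X, r]. Under the illustrative
    assumption E[s s^T] = I, E[dQ_v/dK] = 2 (H_us + H_uu K); 'a' is the
    learning rate of paragraph [0051]. The fixed point of this update
    matches the closed form below."""
    H_us = H[n + m:, :n + m]            # [H_uX  H_ur] block
    H_uu = H[n + m:, n + m:]            # H_uu block
    return K + a * 2.0 * (H_us + H_uu @ K)

def actor_closed_form(H, n, m):
    """Exact maximizer of [0048]: K = -inv(H_uu) @ [H_uX  H_ur]
    (requires H_uu invertible and negative definite for a maximum)."""
    H_us = H[n + m:, :n + m]
    H_uu = H[n + m:, n + m:]
    return -np.linalg.solve(H_uu, H_us)
```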
[0052] The historical and final trajectories of the end effector during the robot's learning process show that, as learning proceeds, the motion performance of the robot is continuously optimized. Finally, after 10 learning iterations and 5 minutes of learning time, the robot can complete the learning task, and its motion accuracy can be increased by 50%, improving the performance of the robot without increasing the hardware cost.