Reinforcement learning method for an educational human-computer interaction collaborative robot
A reinforcement learning and human-computer interaction technology, applied to manipulators, program-controlled manipulators, manufacturing tools, etc. It addresses the problems of long learning cycles, a learning process that is difficult to automate, and robots that are difficult to use directly, achieving the effect of improving motion precision and practical value without increasing hardware cost, which facilitates promotion.
Examples
Embodiment 1
[0031] A reinforcement learning method for an educational human-computer interaction collaborative robot; the learning method is as follows:
[0032] S1: Let $X_k$ denote the joint angle and angular velocity state of the robot system, $u_k$ the control input, and $r_k$ the reference trajectory. The control law is set as follows:

[0033] $u_k = K_b X_k + K_f r_k$,

[0034] where $K_b$ denotes the feedback control gain, $K_f$ denotes the feedforward control gain, and $K = [K_b \; K_f]$ is the collection of $K_b$ and $K_f$.
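As an illustration only (not part of the patent text), a minimal NumPy sketch of the feedback-plus-feedforward control law of S1; the gain values and the two-joint state layout are hypothetical:

```python
import numpy as np

def control_input(K_b, K_f, X_k, r_k):
    """u_k = K_b X_k + K_f r_k: state feedback plus reference feedforward,
    with K = [K_b, K_f] as in paragraph [0034]."""
    return K_b @ X_k + K_f @ r_k

# Hypothetical 2-joint arm: X_k = [q1, q2, dq1, dq2], one reference per joint.
K_b = -0.5 * np.ones((2, 4))    # placeholder feedback gain
K_f = 0.8 * np.eye(2)           # placeholder feedforward gain
u_k = control_input(K_b, K_f, X_k=np.zeros(4), r_k=np.array([0.3, -0.1]))
```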
[0035] S2: According to Bellman's principle of optimality, the value function V is written recursively as:

[0036] $V(X_k, r_k) = c_k + V(X_{k+1}, r_{k+1})$,

where $c_k$ is the stage cost at step $k$.

[0037] A Q-function (action-value function) is defined from the value function as follows:

[0038] $Q(X_k, r_k, u_k) = c_k + V(X_{k+1}, r_{k+1})$.
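As a numerical illustration (not from the patent), the recursion in [0036] can be unrolled backwards from a zero terminal value. The quadratic tracking cost below is an assumed example, since the patent does not specify $c_k$:

```python
import numpy as np

def stage_cost(X_k, r_k, u_k, Q_w, R_w):
    """Assumed quadratic tracking cost c_k (the patent leaves c_k unspecified)."""
    e = X_k - r_k
    return float(e @ Q_w @ e + u_k @ R_w @ u_k)

def values_from_costs(costs):
    """Unroll V(X_k, r_k) = c_k + V(X_{k+1}, r_{k+1}) with V = 0 at the horizon.
    The Q-function of [0038] is then Q(X_k, r_k, u_k) = c_k + V(X_{k+1}, r_{k+1})."""
    V, values = 0.0, []
    for c in reversed(costs):
        V = c + V
        values.append(V)
    return values[::-1]

# e.g. stage costs [1.0, 0.5, 0.2] give values [1.7, 0.7, 0.2]
```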
[0039] S3: Derive the reinforcement learning method. First step: initialize δ, H, and K, and load the training scene. Second step: apply the Actor-Critic algorithm. Third step: repeat the loop calculation on the training ... (a structural sketch of this loop follows below).
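For orientation only, a skeleton of the three-step procedure in S3. Here rollout is a hypothetical stand-in for executing the current policy in the training scene, and critic_update / actor_update correspond to the sketches given under Embodiments 2 and 3 below; none of these names come from the patent.

```python
import numpy as np

def learn(rollout, critic_update, actor_update, n, m, p,
          max_iters=10, delta=1e-3):
    """Step 1: initialize delta, H, K and the training scene.
    Steps 2-3: repeat Actor-Critic updates until the gain K converges."""
    H = np.eye(n + m + p)                 # initial Q-kernel guess
    K = np.zeros((p, n + m))              # initial gain collection [K_b, K_f]
    for _ in range(max_iters):
        Z, costs = rollout(K)             # trajectory in the training scene
        H = critic_update(Z, costs)       # policy evaluation (Embodiment 2)
        K_new = actor_update(H, K, n, m)  # policy improvement (Embodiment 3)
        if np.linalg.norm(K_new - K) < delta:
            break                         # gains converged to tolerance delta
        K = K_new
    return K
```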
Embodiment 2
[0041] In the learning method of S3:
[0042] The Actor-Critic algorithm observes the system state of the robot and uses the Critic update to update the control system algorithm, where the recursive equation is as follows:
[0043] $Q(X_k, r_k, u_k) - Q_v(X_{k+1}, r_{k+1}, u_{k+1}) = c_k + V(X_{k+1}, r_{k+1}) - [c_{k+1} + V(X_{k+2}, r_{k+2})]$.
[0044] After the Critic update, the control system parameters are updated to:

[0045] $[H_{ux} \; H_{ur}] + H_{ru} K_2$,
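On-policy, the right-hand side of [0043] collapses to $c_k$, since $V(X_{k+1}, r_{k+1}) = c_{k+1} + V(X_{k+2}, r_{k+2})$ by [0036]. Below is a least-squares sketch of a Critic update built on that simplification, under the assumed quadratic parameterization $Q_v(z) = z^\top H z$ with $z = [X_k, r_k, u_k]$ (a common choice for linear-quadratic tracking; the patent does not fix the parameterization):

```python
import numpy as np

def quad_features(z):
    """Monomials z_i * z_j for i <= j, so that z^T H z = theta @ quad_features(z)
    once H is rebuilt from theta as in critic_update below."""
    return np.outer(z, z)[np.triu_indices(len(z))]

def critic_update(Z, costs):
    """Least-squares Critic fit from the on-policy recursion
    Q_v(z_k) - Q_v(z_{k+1}) = c_k.
    Z: stacked vectors z_k = [X_k, r_k, u_k], shape (T, d); costs: shape (T-1,).
    Returns the symmetric kernel H with Q_v(z) = z^T H z."""
    Phi = np.array([quad_features(z) for z in Z])
    A = Phi[:-1] - Phi[1:]                        # feature differences
    theta, *_ = np.linalg.lstsq(A, np.asarray(costs), rcond=None)
    d = Z.shape[1]
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return (H + H.T) / 2.0
```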
Embodiment 3
[0047] After the Critic update has updated the control system algorithm, the Actor is updated. The Actor update direction is the maximization of the Critic function:
[0048] $K = \arg\max_u Q_v(X_k, r_k, u_k)$,
[0049] Following the idea of the deterministic policy gradient algorithm, the policy improvement algorithm is expressed as:
[0050] $K \leftarrow K + a \, \mathbb{E}\!\left[\nabla_K Q_v(X_k, r_k, u_k)\right]$,

[0051] where $\nabla_K Q_v$ is the gradient of the Critic with respect to the control policy, $\mathbb{E}[\cdot]$ is the expectation of that gradient, and $a$ is the learning rate;
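A gradient-ascent sketch of this Actor step for the quadratic Critic assumed above, with policy $u = K\,[X; r]$. The simplifying assumption $\mathbb{E}[s s^\top] = I$ for the stacked state-reference vector $s$ is ours, not the patent's; the closed-form maximizer of [0048] is included for comparison:

```python
import numpy as np

def actor_update(H, K, n, m, a=0.1):
    """One DPG-style ascent step on the gain K for Q_v(z) = z^T H z,
    z = [X, r, u], u = K @ s with s = [X, r]. Under the illustrative
    assumption E[s s^T] = I, E[dQ_v/dK] = 2 (H_us + H_uu K); 'a' is the
    learning rate of paragraph [0051]. The fixed point of this update
    matches the closed form below."""
    H_us = H[n + m:, :n + m]            # [H_uX  H_ur] block
    H_uu = H[n + m:, n + m:]            # H_uu block
    return K + a * 2.0 * (H_us + H_uu @ K)

def actor_closed_form(H, n, m):
    """Exact maximizer of [0048]: K = -inv(H_uu) @ [H_uX  H_ur]
    (requires H_uu invertible and negative definite for a maximum)."""
    H_us = H[n + m:, :n + m]
    H_uu = H[n + m:, n + m:]
    return -np.linalg.solve(H_uu, H_us)
```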
[0052] The historical and final trajectories of the end effector during the robot's learning process show that, as learning proceeds, the motion performance of the robot is continuously optimized. Finally, after 10 learning iterations and 5 minutes of learning time, the robot can complete the learning task, and its motion accuracy can be increased by 50%, improving the performance of the robot without increasing the hardware cost.