Optimal control method of gait of humanoid robot based on deep Q network

Relates to humanoid robots and network technology and is applied in adaptive control, general control systems, and control/regulation systems. It addresses the difficulty of realizing gait walking and other complex movements, and achieves fast, stable walking with increased walking speed.

Status: Inactive. Publication date: 2020-02-07
HOHAI UNIV
Cites: 3; cited by: 18

AI Technical Summary

Problems solved by technology

The current deep reinforcement learning method can make the robot realize some



Examples


Embodiment 1

[0042] A gait control method for a humanoid robot based on a deep Q network, comprising:

[0043] Construct a gait model of the humanoid robot that realizes omnidirectional walking;

[0044] Obtain the interaction data between the humanoid robot and the environment during walking, store it in the memory data pool, and use it to provide training samples. The interaction data is a quadruple (s, a, r, s′), where s represents the state parameters of the humanoid robot, a represents the action parameters executed by the humanoid robot in state s, r represents the feedback reward value obtained by the humanoid robot when performing action a in state s, and s′ represents the next state reached by the humanoid robot after performing action a in state s;
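The memory data pool in [0044] plays the role of an experience replay buffer. The sketch below stores (s, a, r, s′) quadruples and samples training minibatches from them. The fixed capacity with oldest-first eviction, the uniform random sampling, and the names `MemoryPool` and `Transition` are assumptions for illustration; the text does not specify how the pool is sized or sampled.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One interaction quadruple (s, a, r, s')."""
    state: Sequence[float]       # s: state parameters of the robot
    action: int                  # a: index of the action executed in s
    reward: float                # r: feedback reward for executing a in s
    next_state: Sequence[float]  # s': state reached after executing a in s


class MemoryPool:
    """Hypothetical fixed-capacity memory data pool (assumed oldest-first eviction)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, s: Sequence[float], a: int, r: float, s_next: Sequence[float]) -> None:
        # When the deque is full, appending silently drops the oldest transition.
        self._buffer.append(Transition(tuple(s), a, r, tuple(s_next)))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement (an assumption) decorrelates
        # consecutive walking steps in a training minibatch.
        return self._rng.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)
```

For example, storing five transitions into a pool of capacity 3 keeps only the last three, and `pool.sample(2)` then returns two of those.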

[0045] Build a deep Q-network learning architecture, train the deep Q-network on the training samples in the memory data pool, and obtain the state-action strategy deep Q-network model of the humanoid ...
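Step [0045] trains the deep Q-network on samples from the memory data pool. A minimal sketch of one standard DQN training step in NumPy follows: the regression target is y = r + γ·max over a′ of Q_target(s′, a′), and the network is moved by gradient descent on the mean squared error between Q(s, a) and y. The one-hidden-layer architecture, the separate frozen target network, γ = 0.9, the learning rate, and the names `TinyQNetwork` and `train_step` are all assumptions; the text does not give the network structure or hyperparameters.

```python
import copy

import numpy as np


class TinyQNetwork:
    """Hypothetical one-hidden-layer Q-network: state in, one Q-value per action out."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, states: np.ndarray):
        """Return Q-values of shape (batch, n_actions) plus cached activations."""
        z = states @ self.W1 + self.b1
        h = np.maximum(z, 0.0)  # ReLU hidden layer
        return h @ self.W2 + self.b2, (states, z, h)

    def clone(self) -> "TinyQNetwork":
        """Independent snapshot, used here as a frozen target network."""
        return copy.deepcopy(self)


def train_step(q_net, target_net, batch, gamma=0.9, lr=0.01):
    """One gradient step on the TD loss for a minibatch of (s, a, r, s') tuples."""
    s = np.array([t[0] for t in batch], dtype=float)
    a = np.array([t[1] for t in batch], dtype=int)
    r = np.array([t[2] for t in batch], dtype=float)
    s_next = np.array([t[3] for t in batch], dtype=float)

    # Regression target y = r + gamma * max_a' Q_target(s', a'), held fixed.
    next_q, _ = target_net.forward(s_next)
    y = r + gamma * next_q.max(axis=1)

    q, (x, z, h) = q_net.forward(s)
    idx = np.arange(len(a))
    err = q[idx, a] - y
    loss = float(np.mean(err ** 2))

    # Backpropagate the mean squared TD error through the taken actions only.
    dq = np.zeros_like(q)
    dq[idx, a] = 2.0 * err / len(a)
    dW2, db2 = h.T @ dq, dq.sum(axis=0)
    dz = (dq @ q_net.W2.T) * (z > 0)
    dW1, db1 = x.T @ dz, dz.sum(axis=0)

    # Plain gradient descent on the online network; the target network is untouched.
    q_net.W1 -= lr * dW1
    q_net.b1 -= lr * db1
    q_net.W2 -= lr * dW2
    q_net.b2 -= lr * db2
    return loss
```

Repeatedly calling `train_step(online, target, batch)` on a fixed minibatch should drive the returned loss down. In a full training loop the target network would be refreshed with `online.clone()` every few hundred steps, which keeps the regression targets stable between refreshes.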

Embodiment 1-1

[0051] On the basis of Embodiment 1, this embodiment uses a simulation platform to illustrate the gait control and optimization process of the humanoid robot: the NAO simulated robot is the experimental object, and the RoboCup 3D simulation platform is the experimental environment. During training, the gait model parameters and state parameters can be captured directly from the platform and used to fit the state-action value function generated by the robot's walking. The gait action performed by the robot is selected through the action selection strategy, and the reward function is generated to update the DQN. This reduces the risk of the optimization falling into a local optimum because of the large number of robot parameters, improves the walking speed of the robot, and realizes fast and stable walking of the humanoid robot.
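Paragraph [0051] names an action selection strategy and a reward function without defining either. The sketch below uses epsilon-greedy selection, a common choice for DQN, and a hypothetical reward that pays the average forward speed of a walking trial and applies a fixed penalty if the robot falls, reflecting the stated goals of fast and stable walking. Both functions, their names, and the penalty value of 10.0 are assumptions, not the patent's definitions.

```python
import random
from typing import Sequence


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy: with probability epsilon pick a random action, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def walking_reward(forward_distance: float, elapsed: float, fell: bool,
                   fall_penalty: float = 10.0) -> float:
    """Hypothetical reward: average forward speed, or a fixed penalty on a fall."""
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    if fell:
        return -fall_penalty
    return forward_distance / elapsed
```

With `epsilon = 0.0` the selection is purely greedy; in practice epsilon is usually decayed from near 1.0 toward a small value so that early training explores many gait actions.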

[0052] Referring to figure 1 and figure 2, this embodiment also includes...



Abstract

The invention discloses an optimal control method for the gait of a humanoid robot based on a deep Q network. The method comprises the steps of: constructing a gait model, and obtaining interaction data between the humanoid robot and the environment during walking to provide training samples; learning and training the deep Q network on the training samples of a memory data pool to obtain a state-action strategy deep Q network model of the humanoid robot; obtaining state parameters of the humanoid robot in the action environment as the input of the deep Q network model, and obtaining the action parameters output by the deep Q network model under the current state-action strategy; performing gait control on the humanoid robot with the constructed gait model according to the action parameters output by the deep Q network model; and updating the deep Q network during training of the deep Q network model by generating a reward function. With the disclosed optimal control method, the walking speed of the humanoid robot can be increased, and fast and stable walking of the humanoid robot can be realized.
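The abstract has the deep Q network output action parameters that the gait model then executes. One plausible way to connect a discrete DQN output to a continuous gait model is to let each action index nudge the gait parameters by a fixed increment, clipped to safe bounds. The sketch below assumes exactly that; the parameters `step_length` and `step_period`, the increments, and the bounds are all hypothetical, since the text does not list which gait parameters are optimized.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GaitParams:
    step_length: float  # metres advanced per step (hypothetical parameter)
    step_period: float  # seconds per step (hypothetical parameter)


# Hypothetical discrete action set: (delta step_length, delta step_period).
ACTIONS = [
    (0.0, 0.0),     # keep the current gait
    (+0.005, 0.0),  # lengthen the step
    (-0.005, 0.0),  # shorten the step
    (0.0, -0.01),   # step faster
    (0.0, +0.01),   # step slower
]

# Hypothetical safety bounds for each parameter.
BOUNDS = {"step_length": (0.02, 0.08), "step_period": (0.20, 0.50)}


def clip(value: float, lo: float, hi: float) -> float:
    return min(max(value, lo), hi)


def apply_action(params: GaitParams, action: int) -> GaitParams:
    """Map a DQN action index to new, bounded gait-model parameters."""
    d_len, d_per = ACTIONS[action]
    return replace(
        params,
        step_length=clip(params.step_length + d_len, *BOUNDS["step_length"]),
        step_period=clip(params.step_period + d_per, *BOUNDS["step_period"]),
    )


def nominal_speed(params: GaitParams) -> float:
    """Forward speed implied by the gait: one step length per step period."""
    return params.step_length / params.step_period
```

Clipping keeps exploration from driving the gait model into parameter regions where the robot is certain to fall, which would otherwise waste many training episodes.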

Description

Technical field
[0001] The invention relates to a gait optimization control method for a humanoid robot based on a deep Q network, and belongs to the technical field of humanoid robots.
Background art
[0002] As an important branch of mobile robots, humanoid robots are the most suitable general-purpose mobile and manipulation platforms for working alongside humans. Of all the human behaviors a robot imitates, walking is the most important capability it must have.
[0003] A humanoid robot has many degrees of freedom, and its mechanical structure changes continuously while walking. Realizing fast walking with the 3D Linear Inverted Pendulum Model (3D-LIPM) gait model requires tuning a large number of gait parameters. However, traditional manual parameter tuning takes a lot of time and may fail to find the optimal values. Currently, genetic algorithms, particle swarm optimization, and reinforcement learning can all optimi...
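The 3D-LIPM mentioned in [0003] treats the robot as a point mass on a massless telescopic leg whose centre of mass (CoM) stays at a constant height z_c. With the support foot at the origin, each horizontal axis then decouples into ẍ = (g / z_c)·x, whose closed-form solution is x(t) = x₀·cosh(t/T_c) + T_c·v₀·sinh(t/T_c) with T_c = √(z_c / g). The sketch below evaluates that solution for one axis; the function name `lipm_state` is illustrative, and the patent does not state which form of the model its gait generator uses.

```python
import math
from typing import Tuple

G = 9.81  # gravitational acceleration, m/s^2


def lipm_state(x0: float, v0: float, t: float, z_c: float) -> Tuple[float, float]:
    """Closed-form CoM position and velocity along one horizontal axis of the 3D-LIPM.

    x0, v0: CoM position and velocity relative to the support foot at t = 0.
    z_c: constant CoM height of the pendulum (metres).
    """
    tc = math.sqrt(z_c / G)  # time constant T_c = sqrt(z_c / g)
    c, s = math.cosh(t / tc), math.sinh(t / tc)
    x = x0 * c + tc * v0 * s
    v = (x0 / tc) * s + v0 * c
    return x, v
```

The same function applies independently to the y axis. Along each axis the orbital energy v²/2 − (g / 2z_c)·x² is conserved, which is a convenient sanity check, and gait parameters such as step length and step period determine where the support point moves to at each foot switch.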


Application Information

IPC (8): G05B13/04
CPC: G05B13/042
Inventors: Liu Huiyi (刘惠义), Yuan Wen (袁雯), Tao Ying (陶莹), Liu Xiaoyun (刘晓芸)
Owner: HOHAI UNIV