A quadruped robot three-legged walking gait motion control method

By combining teacher and student strategy networks, the robot autonomously identifies the damaged leg and optimizes joint control, solving the problem of stable tripedal motion in a quadruped robot when one leg is damaged, and achieving stable tripedal motion control under special working conditions.

CN120363183B · Active · Publication Date: 2026-06-26 · TONGJI UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: TONGJI UNIV
Filing Date: 2025-04-10
Publication Date: 2026-06-26

Application Information


IPC: B25J9/16; B62D57/032; G06N3/045; G06N3/0442; G06N3/0499; G06N3/084; G06N3/096; G06N3/092; G06N3/09

AI Tagging

Technology Topics

Physical medicine and rehabilitation; Simulation

Technical Efficacy Phrases

Easy to control; Good prior trajectory


AI Technical Summary

Technical Problem

Existing quadruped robots struggle to achieve stable tripedal movement when one leg is damaged. Current methods suffer from high modeling complexity and poor adaptability. Learning-based methods rely on large amounts of data and do not fully consider mechanical principles, resulting in limited stability and efficiency.

Method used

By employing a framework of teacher and student policy networks, combined with prior knowledge of mechanics and reinforcement learning, the system autonomously identifies the damaged leg, generates a reference foot position, and optimizes joint control to achieve stable tripedal movement.

Benefits of technology

In the event of a single leg injury, the quadruped robot can autonomously identify the damaged leg and achieve stable tripedal movement, improving the stability and adaptability of movement, reducing data dependence, and enhancing the real-time performance and effectiveness of control.


Abstract

The present application belongs to the technical field of robot control, and particularly relates to a quadruped robot three-foot walking gait motion control method, steps of which comprise: in a single-leg damage scene, acquiring gait information of three-foot walking of the quadruped robot; designing a reference foot end position generator to generate a reference foot end position, and performing real-time optimization on the reference foot end position through a teacher strategy network; controlling the quadruped robot based on the optimized foot end reference position, and performing interactive training on the teacher strategy network by using motion data of the quadruped robot under different three-foot walking gaits; training a student strategy network by using a supervised learning method to approximate the teacher strategy network; and loading the student strategy network on a physical robot to output corresponding joint control commands of the quadruped robot. Compared with the prior art, the present application can realize stable three-foot motion of the robot under any single-leg damage condition, and provides key technical support for three-foot motion control of the quadruped robot under special working conditions.


Description

Technical Field

[0001] This invention belongs to the field of robot control technology, and in particular relates to a method for generating motion in a quadruped robot that can autonomously identify a damaged leg and achieve stable tripedal movement.

Background Technology

[0002] Quadruped robots have garnered widespread attention for their excellent terrain adaptability. However, quadruped robots require special gait patterns in specific working conditions. For example, in both military and civilian applications, quadruped robots may face situations where one leg is damaged, and manual intervention and repair may not be timely or effective. In such cases, the quadruped robot needs to utilize its remaining three legs to complete the task. Therefore, the tripedal locomotion of quadruped robots is a significant challenge in the field of quadruped robot motion control.

[0003] Existing quadruped robot motion control methods mainly include model-based and learning-based methods. In cases of single-leg damage, model-based methods typically model each damage scenario separately to achieve tripedal motion control. This approach is complex to model and adapts poorly, making it difficult to generalize across damage scenarios. Learning-based methods instead train control strategies from data; for example, patents CN202410971117.8 and CN202411048913.0 disclose learning-based tripedal motion control methods. Existing learning methods often rely on large amounts of training data and do not fully consider the mechanical principles of tripedal motion, which limits their stability and efficiency in practical applications. Therefore, there is an urgent need for a quadruped robot control method that reduces data dependence, integrates prior mechanical knowledge, and adaptively achieves stable tripedal motion.

Summary of the Invention

[0004] This invention aims to address the challenge of achieving stable tripedal movement in quadruped robots when one leg is damaged in the prior art. It provides a method that integrates prior mechanical knowledge and learning, enabling quadruped robots to autonomously identify the damaged leg and achieve stable tripedal movement.

[0005] The objective of this invention can be achieved through the following technical solutions:

[0006] A method for controlling the tripedal walking gait motion of a quadruped robot, comprising the following steps:

[0007] Obtaining gait information of the quadruped robot walking on three legs when one leg is damaged;

[0008] Generating a reference foot position using a foot reference position generator, and optimizing the reference foot position in real time using a teacher policy network;

[0009] Controlling the quadruped robot based on the optimized foot reference position, and interactively training the teacher policy network with a reinforcement learning algorithm using the motion data of the quadruped robot under different tripedal walking gaits;

[0010] Training a student policy network by supervised learning to approximate the teacher policy network;

[0011] Deploying the pre-trained student policy network on a physical robot, collecting historical state information of the quadruped robot, inputting it into the student policy network, and outputting the corresponding joint control commands for the quadruped robot.

[0012] As a preferred technical solution, the input of the teacher strategy network is the robot's state information at multiple time points, and the output is a foot reference position compensation command; the input of the student strategy network is the robot's state information at multiple time points, and the output is a joint motor position command.

[0013] As a preferred technical solution, the robot's state information input by the teacher strategy network and the student strategy network at each moment includes: robot speed, pitch, roll, yaw angle and angular velocity, contact state of the four feet with the ground, and angle and angular acceleration of the robot's joint motors.

[0014] As a preferred technical solution, the teacher policy network and the student policy network adopt the same recurrent neural network structure. The recurrent neural network adopts a combined network structure of GRU and MLP. The GRU network contains hidden layers for extracting temporal features in the observation sequence. The MLP part consists of three fully connected layers and outputs control commands for each joint of the quadruped robot.

[0015] As a preferred technical solution, the specific process of interactively training the teacher policy network by reinforcement learning is as follows:

[0016] Gait information of the quadruped robot walking on three legs with one leg damaged is obtained.

[0017] The joint angle information of the quadruped robot in the tripedal gait at the beginning of each phase is input into the foot reference position generation module. The foot reference position in the future phase time is output by the foot reference position generation formulas of the support phase and the swing phase.

[0018] The robot's state information at multiple moments is input into the teacher policy network, which outputs the foot reference position compensation command.

[0019] The output of the teacher policy network is used to compensate the foot reference position, and the tripedal movement of the quadruped robot is controlled based on the compensated foot coordinates.

[0020] The teacher policy network is interactively trained using motion data from the quadruped robot under different terrains and different tripedal walking gaits.
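The compensation step in the list above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the example coordinates, and the gains `kp` and `kd` are all hypothetical. The embodiment also mentions inverse kinematics and a PD controller; the inverse kinematics step is omitted here, and the PD law shown is the generic textbook form with zero desired joint velocity.

```python
from typing import List, Sequence


def compensate(reference: Sequence[float], compensation: Sequence[float]) -> List[float]:
    """Final foot coordinates = reference foot position + teacher compensation."""
    if len(reference) != len(compensation):
        raise ValueError("reference and compensation must have the same length")
    return [r + c for r, c in zip(reference, compensation)]


def pd_torque(q_des: Sequence[float], q: Sequence[float], qd: Sequence[float],
              kp: float = 40.0, kd: float = 1.0) -> List[float]:
    """Generic PD law tracking joint targets; kp and kd are hypothetical gains."""
    return [kp * (target - pos) - kd * vel for target, pos, vel in zip(q_des, q, qd)]


# Hypothetical numbers for one leg: (x, y, z) foot reference plus a small correction.
final_foot = compensate([0.18, -0.13, -0.28], [0.01, 0.0, 0.02])
# Joint targets would come from inverse kinematics of final_foot (not shown).
torques = pd_torque([0.1, 0.8, -1.5], [0.0, 0.8, -1.4], [0.0, 0.5, 0.0])
```

Keeping the learned output as an additive correction means the hand-designed reference still drives the motion when the correction is small.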

[0021] As a preferred technical solution, the specific process by which the foot reference position generation module generates the foot reference position in the future phase time is as follows:

[0022] For the three supporting legs of a quadruped robot in a triped gait, at the beginning of each phase time, it is set that two supporting legs will be in the supporting phase and one supporting leg will be in the swinging phase in the future phase time; at the end of the future phase time, the supporting triangle is formed by the projection points of the foot ends of the three supporting legs onto the horizontal plane.

[0023] The movement of the supporting leg in the supporting phase is set to the backward movement of the foot end. At the end of the future phase time, the coordinates of the two reference positions of the foot end of the supporting leg in the supporting phase projected onto the horizontal plane are set to fixed values based on engineering experience.

[0024] With the objective of maximizing the minimum distance from the projection of the quadruped robot's center of gravity onto the horizontal plane to the three sides of the supporting triangle, the optimal footing point is solved; this point is the foot reference position, at the end of the phase time, of the supporting leg that is in the swing phase. The objective function is as follows:

[0025] $$\max_{(x_3,\, y_3)} \; \min(d_1, d_2, d_3)$$

[0026] In the formula, d1, d2, and d3 are the distances from the centroid projection point to the three sides of the supporting triangle at the end of the future phase time, respectively.

[0027] The reference positions of the three supporting leg feet at the end of the future phase time are obtained by using an optimization algorithm.

[0028] Based on the foot reference position at the end of the future phase time, a trajectory is generated within the corresponding phase time. The foot trajectory of the supporting leg in the support phase is a straight line, and the foot trajectory of the supporting leg in the swing phase is a sine trajectory. Points are taken on the reference trajectory at a set frequency to obtain a series of optimal foot reference positions in the future phase time.
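The trajectory sampling described above can be sketched as follows. The exact form of the sine trajectory is not given in the text, so a half-sine lift added to a straight-line advance is assumed here. The phase duration `PHASE_TIME` and the example coordinates are hypothetical; the 3 cm lift height and 50 Hz sampling rate are the values quoted in the embodiment.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def stance_trajectory(start: Point, end: Point, n: int) -> List[Point]:
    """Straight-line foot path from start to end, sampled at n points after start."""
    return [tuple(s + (e - s) * k / n for s, e in zip(start, end))
            for k in range(1, n + 1)]


def swing_trajectory(start: Point, end: Point, n: int, lift: float) -> List[Point]:
    """Straight advance in x, y with an assumed half-sine lift in z (peak mid-swing)."""
    points = []
    for k in range(1, n + 1):
        t = k / n
        x = start[0] + (end[0] - start[0]) * t
        y = start[1] + (end[1] - start[1]) * t
        z = start[2] + (end[2] - start[2]) * t + lift * math.sin(math.pi * t)
        points.append((x, y, z))
    return points


PHASE_TIME = 0.4   # seconds, hypothetical
SAMPLE_HZ = 50     # sampling frequency from the embodiment
n = round(PHASE_TIME * SAMPLE_HZ)

stance = stance_trajectory((0.05, -0.13, -0.28), (-0.05, -0.13, -0.28), n)
swing = swing_trajectory((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), n, lift=0.03)
```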

[0029] As a preferred technical solution, when solving for the optimal footing point of the swing phase support leg, the following constraints must be met:

[0030] The quadruped robot's center of gravity projection point $(x_g, y_g)$ must lie within the support triangle formed by the projection points (x1, y1), (x2, y2), and (x3, y3) of the foot ends of the three supporting legs:

[0031]

[0032] $\omega = 1 - u - v$

[0033] $u > 0,\ v > 0,\ \omega > 0$

[0034] Based on the mechanical structure parameters of the quadruped robot, the range of optimal footholds of the legs at the end of the phase time during the swing phase is defined.
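The inside-the-triangle constraint above has the shape of a standard barycentric-coordinate test, in which three coefficients that sum to one must all be positive. The sketch below uses the textbook barycentric formula; the equation that defines u and v in the source is not reproduced in this text, so the assignment of each coefficient to a particular vertex is an assumption, and the function names are hypothetical.

```python
from typing import Tuple

Point2 = Tuple[float, float]


def barycentric(p: Point2, a: Point2, b: Point2, c: Point2) -> Tuple[float, float, float]:
    """Return (u, v, w) such that p = u*a + v*b + w*c and u + v + w = 1."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        raise ValueError("degenerate triangle")
    u = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    v = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return u, v, 1.0 - u - v


def inside_triangle(p: Point2, a: Point2, b: Point2, c: Point2) -> bool:
    """Strictly inside when all three coefficients are positive."""
    return all(coef > 0 for coef in barycentric(p, a, b, c))


u, v, w = barycentric((0.25, 0.25), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```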

[0035] As a preferred technical solution, during the training process of the teacher strategy network: when resetting the robot state in each training round, the quadruped robot raises any one leg to a height exceeding a specified height to simulate the triped gait of the quadruped robot; using the damaged leg and the effective supporting leg as known prior information, the joint data of the three supporting legs and the joint data of the damaged leg in the input joint position information are determined, and the foot reference position is generated using the joint data of the effective supporting leg.

[0036] As a preferred technical solution, the reinforcement learning algorithm is the proximal policy optimization (PPO) algorithm, and the reward function is set to include: a reward for the quadruped robot's forward movement, a reward for the quadruped robot's stability, a reward for the quadruped robot maintaining its direction of movement, and a reward for tripedal walking stability.

[0037] As a preferred technical solution, the reward function is set as follows:

[0038]

[0039] In the formula, ω1, ω2, ω3, ω4, ω5, ω6, and ω7 are the coefficients of each reward item, with ω1, ω2, and ω7 being positive and ω3, ω4, ω5, and ω6 being negative; $v_x$ is the robot's forward velocity, $v_y$ is the robot's lateral velocity, $\omega_z$ is the robot's yaw rate, and $v_z$ is the vertical vibration velocity of the body; $h$ is the body height; $g_z$ is the gravity vector projected onto the robot's body coordinate system; $\omega_x$ is the robot's roll angular velocity; $\omega_y$ is the robot's pitch angular velocity; and $d_{min}$ is the minimum distance from the center-of-gravity projection point to the three sides of the supporting triangle. $d_{min}$ is positive when the projection point is inside the supporting triangle and negative when it is outside. The remaining symbols denote the expected values of $v_x$, $v_y$, $\omega_z$, $v_z$, and $h$.

[0040] Compared with the prior art, the present invention has the following beneficial effects:

[0041] 1) This invention incorporates the mechanical principles of stable tripedal motion to design a reference foot position generator, which generates a series of reference foot positions. Reinforcement learning is then introduced into the control framework to optimize these reference foot positions in real time. The optimized foot positions are converted into joint motor angles via inverse kinematics, enabling control of the quadruped robot. A teacher-student policy network structure based on recurrent neural networks allows the quadruped robot to implicitly and autonomously identify damaged legs and implement corresponding quadruped robot control. This enables the robot to achieve stable tripedal motion even with damage to any single leg, providing key technical support for tripedal motion control of quadruped robots under special working conditions.

[0042] 2) This invention combines the mechanical principles of tripedal stability motion with optimization algorithms to design a formula for generating reference foot positions for the tripedal motion of a quadruped robot. At the beginning of each phase time, two legs are in the support phase and one leg is in the swing phase during the future phase time. By setting the reference foot position for the support phase and optimizing the solution for the optimal foothold of the support leg in the swing phase, and considering the mechanical principles of tripedal stability motion during the solution process, the robot's center of gravity projection point is ensured to be within the support triangle, and the minimum distance from the center of gravity projection point to the three sides is maximized. This yields a set of foot reference positions that fully consider tripedal stability, providing a good prior trajectory for the entire training framework and reducing training time.

[0043] 3) The reinforcement learning reward function set in this invention includes a forward movement reward, a body stability reward, and a movement direction maintenance reward. In addition, a stability reward term for the tripedal walking gait of the quadruped robot is designed based on the mechanical principle of stable tripedal movement. This term encourages the center of gravity projection point on the horizontal plane to stay within the support triangle formed by the projection points of the three feet, which keeps the robot body stable while walking forward and improves the stability of tripedal movement.

Attached Figure Description

[0044] Figure 1 is an overall flowchart of the method provided by the present invention for generating the tripedal walking gait motion of a quadruped robot.

Detailed Implementation

[0045] The present invention will now be described in detail with reference to the accompanying drawings and specific embodiments. These embodiments are based on the technical solution of the present invention and provide detailed implementation methods and specific operating procedures. However, the scope of protection of the present invention is not limited to the following embodiments.

[0046] Example 1

[0047] This invention proposes a motion control method for achieving stable tripedal walking in quadruped robots when one leg is damaged. Combining the mechanical principles of stable tripedal motion and optimization algorithms, this invention designs a foot reference position generator to generate a series of foot reference positions for the tripedal walking gait. Reinforcement learning is introduced into the control framework to optimize the foot reference positions in real time. The optimized foot positions are converted into joint motor angles through inverse kinematics, thereby controlling the quadruped robot. This invention employs a teacher-student policy framework. The student policy approximates the teacher policy network through supervised learning, enabling the deployment of the policy network on a real quadruped robot. Both the teacher and student policies use a policy network based on a recurrent neural network (GRU), which implicitly and autonomously identifies the damaged leg, achieving stable tripedal movement in a real quadruped robot even with any leg damaged. This invention enables stable tripedal movement in robots with any single leg damaged, providing key technical support for tripedal motion control of quadruped robots under special working conditions.

[0048] As shown in Figure 1, the method proposed in this invention specifically includes the following steps:

[0049] S1. Build a quadruped robot model and terrain environment in the simulation environment.

[0050] Specifically, in this embodiment, the simulation environment is PyBullet, and the quadruped robot model is Unitree A1. The robot has a total of 12 joint motors, three for each leg, corresponding to the body joint, hip joint, and knee joint respectively. To improve the robot's adaptability to real ground, the terrain environment mainly includes flat ground and slightly rugged terrain.

[0051] S2. Design a simulation mechanism for the tripod gait of a quadruped robot when one leg is damaged: raise any one leg of the quadruped robot above a specified height to simulate the tripod gait of the quadruped robot. Use this mechanism to train the quadruped robot's tripod movement in the simulation environment.

[0052] Specifically, in this embodiment, when resetting the robot's state in each round, one leg is randomly raised to a height exceeding 8cm to simulate tripedal movement under single-leg injury conditions; the robot's initial standing height is 28cm; raising any leg in the simulation is known prior information, therefore, the three effective supporting legs are known during the teacher policy network training process. During teacher policy network training, the injured leg and the effective supporting legs, as known prior information, are used to determine the joint data of the three supporting legs and the joint data of the injured leg in the input 12-dimensional joint position information; the nine joint data of the effective supporting legs are used to generate the foot reference position.
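The bookkeeping in this step, choosing a leg at random on reset and splitting the 12 joint values into nine supporting-leg values and three damaged-leg values, can be sketched as below. The leg ordering in `LEG_NAMES` and the assumption that each leg's three joints are stored contiguously are hypothetical; the actual joint layout depends on the robot model and simulator.

```python
import random
from typing import List, Sequence, Tuple

LEG_NAMES = ("FR", "FL", "RR", "RL")  # hypothetical leg ordering
JOINTS_PER_LEG = 3
MIN_LIFT_HEIGHT = 0.08  # metres, the 8 cm threshold quoted in the embodiment


def sample_damaged_leg(rng: random.Random) -> int:
    """Pick the index of the leg to lift when an episode is reset."""
    return rng.randrange(len(LEG_NAMES))


def split_joints(joint_positions: Sequence[float],
                 damaged_leg: int) -> Tuple[List[float], List[float]]:
    """Split 12 joint values into (9 supporting-leg values, 3 damaged-leg values)."""
    if len(joint_positions) != JOINTS_PER_LEG * len(LEG_NAMES):
        raise ValueError("expected 12 joint positions")
    start = damaged_leg * JOINTS_PER_LEG
    end = start + JOINTS_PER_LEG
    damaged = list(joint_positions[start:end])
    support = list(joint_positions[:start]) + list(joint_positions[end:])
    return support, damaged


rng = random.Random(0)
leg = sample_damaged_leg(rng)
support, damaged = split_joints(list(range(12)), damaged_leg=2)
```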

[0053] S3. A control framework for the tripedal walking gait motion of a quadruped robot is designed based on a reinforcement learning algorithm. The control framework mainly includes the following modules: a foot reference position generation module, a teacher policy based on a recurrent neural network, and a student policy based on a recurrent neural network. The input of the foot reference position generation module is the angle information of the 12 joints of the quadruped robot at the beginning of each phase, and the output is the foot reference position in the next phase. The input of the teacher policy based on the recurrent neural network is the state information of the robot at 10 time points, and the output is a 12-dimensional foot reference position compensation command. The input of the student policy based on the recurrent neural network is the state information of the robot at 10 time points, and the output is the position command of the 12 joint motors. The state information of the robot at each time point includes: robot speed, pitch, roll, yaw angle and angular velocity, contact state of the four feet with the ground, and angle and angular acceleration of the 12 joint motors of the robot.
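Feeding the policy networks with state information from 10 time points implies some form of rolling history. One simple way to keep it is a fixed-length buffer, sketched below. The class name and the per-step observation layout are hypothetical, and the text does not say how the history is initialised before 10 steps have been collected.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationHistory:
    """Keeps the most recent `horizon` observations, oldest first."""

    def __init__(self, horizon: int = 10) -> None:
        self.frames: Deque[List[float]] = deque(maxlen=horizon)

    def push(self, observation: Sequence[float]) -> None:
        """Append one time step; the oldest step is dropped once the buffer is full."""
        self.frames.append(list(observation))

    def ready(self) -> bool:
        """True once a full window of `horizon` steps is available."""
        return len(self.frames) == self.frames.maxlen

    def stacked(self) -> List[List[float]]:
        """Return the window as a list of per-step observation vectors."""
        return [list(frame) for frame in self.frames]


history = ObservationHistory(horizon=10)
for step in range(12):
    history.push([float(step)])
window = history.stacked()
```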

[0054] The recurrent neural network adopts a combined network structure of GRU and MLP. The GRU network contains one hidden layer with 128 hidden units, which is used to extract temporal features in the observation sequence. The MLP part consists of three fully connected layers with an inter-layer dimension of 128*128*128, and the output is the position command of the 12 joint motors of the robot's four legs.
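For reference, one common formulation of the GRU update that such a network applies at each time step is shown below. The source does not give these equations; the weight symbols $W$, $U$, $b$ and the input $x_t$ (the state vector at time $t$) are notation assumed here, and some libraries swap the roles of $z_t$ and $1 - z_t$ in the last GRU line. The final line reflects the dimensions stated in the embodiment: a 128-unit hidden state mapped by the MLP to a 12-dimensional output.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \\
a_t &= \mathrm{MLP}(h_t), \qquad h_t \in \mathbb{R}^{128},\; a_t \in \mathbb{R}^{12}
\end{aligned}
```

Because $h_t$ carries information across time steps, the network can in principle infer which leg is not making contact from the observation sequence, which is how the text describes the implicit identification of the damaged leg.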

[0055] S4. Design the foot reference position generation formulas for the support phase and the swing phase in the foot reference position generator, and solve the future foot reference position by gradient descent method.

[0056] Specifically: the three-legged gait in this invention is a walking gait. At the beginning of each phase time, it is set that two legs will be in the support phase and one leg in the swing phase during the future phase time. The foot reference positions of the three supporting legs (three support points) at the end of the future phase time are projected onto the horizontal plane to obtain three points with coordinates (x1, y1), (x2, y2), and (x3, y3). The coordinates of the projection of the center of gravity onto the horizontal plane are $(x_g, y_g)$. The support-phase motion is a backward movement of the foot. At the end of the phase time, the foot reference positions of the two support-phase legs are (x1, y1) and (x2, y2); these can be fixed based on engineering experience. The optimal footing point (x3, y3) of the swing-phase leg at the end of the phase time is then solved. It must satisfy two conditions: the center of gravity projection point $(x_g, y_g)$ lies within the supporting triangle formed by the three foot projection points (x1, y1), (x2, y2), and (x3, y3), and the minimum distance from the projection point to the three sides of the supporting triangle is maximized. The solution is obtained using an optimization algorithm.

[0057] The objective function is:

[0058] $$\max_{(x_3,\, y_3)} \; \min(d_1, d_2, d_3)$$

[0059] The coordinates of the centroid projection point are solved with reference to the robot state $robot_{state}$ after the phase time ends; $robot_{state}$ can be obtained from the four foot positions. The formula is as follows:

[0060] $(x_g, y_g) = f(robot_{state})$

[0061] The distances from the centroid projection point $(x_g, y_g)$ to the three sides of the supporting triangle are calculated using the following formula:

[0062]

[0063] The constraints are as follows:

[0064] Constraint 1: Ensure the centroid projection point $(x_g, y_g)$ lies within the supporting triangle formed by the three foot projection points (x1, y1), (x2, y2), and (x3, y3). The formula is as follows:

[0065]

[0066] $\omega = 1 - u - v$

[0067] $u > 0,\ v > 0,\ \omega > 0$

[0068] Constraint 2: The range of values for (x3, y3) is limited according to the mechanical structural parameters of the quadruped robot, as shown in the following formula:

[0069] $x_{low} \le x_3 \le x_{high}$

[0070] $y_{low} \le y_3 \le y_{high}$

[0071] According to the formula, an optimization algorithm can be used to obtain the reference positions of the three supporting legs at the end of the future phase time. Based on these reference positions, a trajectory is generated for this phase time, where the supporting phase trajectory is a straight line and the swing phase trajectory is a sinusoidal trajectory with a height of 3cm. Points are taken on the reference trajectory at a frequency of 50Hz to obtain a series of optimal foot reference positions for the next phase time.
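The foothold search can be illustrated with a small sketch. The text names gradient descent as the solver; a brute-force grid search is substituted here purely because it is short and easy to check, so this is not the patent's algorithm. The function names, grid resolution, and example coordinates are hypothetical, and the inside-triangle check uses an edge sign test rather than the coefficient form of Constraint 1.

```python
import math
from typing import Optional, Tuple

Point2 = Tuple[float, float]


def cross(o: Point2, s: Point2, t: Point2) -> float:
    """z-component of (s - o) x (t - o)."""
    return (s[0] - o[0]) * (t[1] - o[1]) - (s[1] - o[1]) * (t[0] - o[0])


def strictly_inside(p: Point2, a: Point2, b: Point2, c: Point2) -> bool:
    """p is inside triangle abc when it lies on the same side of all three edges."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def edge_distance(p: Point2, a: Point2, b: Point2) -> float:
    """Perpendicular distance from p to the line through a and b."""
    return abs(cross(a, b, p)) / math.hypot(b[0] - a[0], b[1] - a[1])


def best_foothold(cog: Point2, p1: Point2, p2: Point2,
                  x_range: Point2, y_range: Point2,
                  steps: int = 40) -> Tuple[Optional[Point2], float]:
    """Grid search for (x3, y3) in the box that maximises min(d1, d2, d3)."""
    best, best_margin = None, -math.inf
    for i in range(steps + 1):
        x3 = x_range[0] + (x_range[1] - x_range[0]) * i / steps
        for j in range(steps + 1):
            y3 = y_range[0] + (y_range[1] - y_range[0]) * j / steps
            p3 = (x3, y3)
            if not strictly_inside(cog, p1, p2, p3):
                continue  # Constraint 1 violated (also skips degenerate triangles)
            margin = min(edge_distance(cog, p1, p2),
                         edge_distance(cog, p2, p3),
                         edge_distance(cog, p3, p1))
            if margin > best_margin:
                best, best_margin = p3, margin
    return best, best_margin


# Hypothetical numbers: two fixed support feet and a box for the swing foot.
cog, p1, p2 = (0.0, 0.0), (0.2, 0.1), (-0.2, 0.1)
foothold, margin = best_foothold(cog, p1, p2, (-0.1, 0.1), (-0.25, -0.05))
```

In this example the margin is capped by the edge between the two fixed feet, so several footholds tie for the optimum and the search returns the first one it meets.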

[0072] S5. The teacher policy network is trained through reinforcement learning. The output of the teacher policy network compensates for the foot reference position. The output of the teacher policy network plus the foot reference position coordinates is the final foot coordinate position. The final foot coordinate position is converted into the joint motor angle through inverse kinematics. Finally, a PD controller is used to realize the tripedal movement of the quadruped robot. Lifting different legs of the quadruped robot will result in different tripedal gaits. The motion data of the quadruped robot under different terrains and different tripedal walking gaits are collected for interactive training to improve the stability of the quadruped robot's tripedal gait. A reward function for tripedal movement is designed to strengthen the robot stability term in the reward function. The reward setting should ensure that the robot moves forward while maintaining body stability. The reward function is set as follows:

[0073]

[0074] In the formula, $r_t$ is the total reward. Its four component terms are, respectively, the reward for the quadruped robot's forward movement, the reward for the quadruped robot's stability, the reward for keeping the quadruped robot on its direction of movement, and the reward for tripedal walking stability. The last term encourages the center of gravity projection point $(x_g, y_g)$ on the horizontal plane to lie within the supporting triangle formed by the three foot projection points (x1, y1), (x2, y2), and (x3, y3), and the minimum distance from the projection point to the three sides of the supporting triangle to be as large as possible.

[0075]

[0076] In the formula, ω1, ω2, ω3, ω4, ω5, ω6, and ω7 are the coefficients of each reward item, with ω1, ω2, and ω7 being positive and ω3, ω4, ω5, and ω6 being negative. $v_x$ is the robot's forward velocity, $v_y$ is the robot's lateral velocity, $\omega_z$ is the robot's yaw rate, and $v_z$ is the vertical vibration velocity of the body; $h$ is the body height; $g_z$ is the gravity vector projected onto the robot's body coordinate system; $\omega_x$ is the robot's roll angular velocity; $\omega_y$ is the robot's pitch angular velocity; and $d_{min}$ is the minimum distance from the centroid projection point to the three sides of the supporting triangle. $d_{min}$ is positive when the projection point is inside the supporting triangle and negative when it is outside. The remaining symbols denote the expected values of $v_x$, $v_y$, $\omega_z$, $v_z$, and $h$.
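The signed quantity $d_{min}$ described above can be computed as in the sketch below. It follows one literal reading of the text: the magnitude is the smallest perpendicular distance to the three side lines, and the sign is positive inside the triangle and negative outside. Whether the original measures distance to the infinite side lines or to the side segments is not stated, so that choice, like the function name, is an assumption.

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]


def signed_d_min(p: Point2, a: Point2, b: Point2, c: Point2) -> float:
    """Smallest distance from p to the side lines of abc; negative if p is outside."""
    def cross(o: Point2, s: Point2, t: Point2) -> float:
        return (s[0] - o[0]) * (t[1] - o[1]) - (s[1] - o[1]) * (t[0] - o[0])

    sides = ((a, b), (b, c), (c, a))
    signs = [cross(s, t, p) for s, t in sides]
    inside = all(v > 0 for v in signs) or all(v < 0 for v in signs)
    distance = min(abs(v) / math.hypot(t[0] - s[0], t[1] - s[1])
                   for v, (s, t) in zip(signs, sides))
    return distance if inside else -distance


# 3-4-5 right triangle: the point (1, 1) is its incenter, at distance 1 from each side.
inside_value = signed_d_min((1.0, 1.0), (0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
outside_value = signed_d_min((-1.0, 1.0), (0.0, 0.0), (4.0, 0.0), (0.0, 3.0))
```

With a positive coefficient on this term, as the text specifies for ω7, the reward grows as the projection point moves deeper into the triangle and turns into a penalty once it leaves.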

[0077] During training, domain randomization is applied to the observation space and some physical parameters, and the trained network becomes the teacher policy network. The reinforcement learning algorithm is the proximal policy optimization (PPO) algorithm. To reduce the error between the simulation environment and the real environment, the friction coefficient is randomized in the simulation. To reduce the error between the robot model in the simulation and the real robot, the robot's center of gravity position is randomized in the simulation. To reduce the error caused by the robot's sensor noise, the observation space of the policy network is randomized in the simulation.
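A minimal sketch of the three randomizations named above (friction coefficient, center of gravity position, observation noise) follows. Every numeric range here is hypothetical, as are the names; the text does not state which distributions or magnitudes are used, and applying the sampled values to a simulator is left out.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class EpisodeParams:
    friction: float
    com_offset: Tuple[float, float, float]  # metres, added to the nominal centre of gravity


def sample_episode_params(rng: random.Random,
                          friction_range: Tuple[float, float] = (0.5, 1.25),
                          com_offset_max: float = 0.02) -> EpisodeParams:
    """Draw per-episode physical parameters (all ranges are hypothetical)."""
    friction = rng.uniform(*friction_range)
    com_offset = tuple(rng.uniform(-com_offset_max, com_offset_max) for _ in range(3))
    return EpisodeParams(friction, com_offset)


def noisy_observation(obs: Sequence[float], rng: random.Random,
                      noise_std: float = 0.01) -> List[float]:
    """Add zero-mean Gaussian noise to each observation entry to mimic sensor noise."""
    return [value + rng.gauss(0.0, noise_std) for value in obs]


rng = random.Random(0)
params = sample_episode_params(rng)
obs = noisy_observation([0.0, 1.0, -1.0], rng)
```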

[0078] S6. The student network is trained to approximate the teacher policy network using supervised learning. The trained student network is the final network deployed on the robot. This invention uses imitation learning to train the student policy network. The GRU network of the student policy can autonomously identify the damaged leg and realize the corresponding tripedal movement based on the temporal information.
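The idea of fitting a student to a teacher by supervised learning can be shown with a deliberately tiny example. Everything here is a stand-in: the teacher is a fixed, hypothetical linear function rather than a trained GRU policy, the student is a two-weight linear model, and the loss is a squared error minimised by plain stochastic gradient descent. The text does not specify the loss, the optimiser, or how training targets are derived from the teacher's output.

```python
import random
from typing import List, Sequence, Tuple


def teacher_policy(obs: Sequence[float]) -> float:
    """Stand-in for the trained teacher: a fixed, hypothetical linear map."""
    return 0.5 * obs[0] - 0.2 * obs[1]


def train_student(samples: Sequence[Tuple[float, float]],
                  lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Fit student weights to the teacher's outputs by SGD on squared error."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for obs in samples:
            target = teacher_policy(obs)             # label provided by the teacher
            prediction = w[0] * obs[0] + w[1] * obs[1]
            error = prediction - target
            w[0] -= lr * error * obs[0]              # gradient of 0.5 * error**2
            w[1] -= lr * error * obs[1]
    return w


rng = random.Random(0)
samples = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(50)]
student_weights = train_student(samples)
```

In the method described here the student is a recurrent network that sees only the observation history, so the same regression idea would be applied over sequences rather than single observations.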

[0079] S7. Deploy a student policy network on the physical robot, collect historical state information of the quadruped robot, input it into the student policy, and based on the historical state information, the student policy network can autonomously identify the faulty leg and output the corresponding joint control commands of the quadruped robot to achieve stable control of the quadruped robot's tripod gait.

[0080] This invention proposes a motion control method for achieving stable tripedal walking in quadruped robots when one leg is damaged. Combining the mechanical principles of stable tripedal motion and optimization algorithms, this invention designs a foot reference position generator to generate a series of foot reference positions. Reinforcement learning is introduced into the control framework to optimize these foot reference positions in real time. The optimized foot positions are then converted into joint motor angles through inverse kinematics, enabling control of the quadruped robot. This invention employs a teacher-student policy framework. The student policy network approximates the teacher policy network through supervised learning, enabling deployment of the policy network on a real quadruped robot. Both the teacher and student policies utilize a recurrent neural network (GRU)-based policy network, which implicitly and autonomously identifies the damaged leg, achieving stable tripedal movement even with any leg damage on a real quadruped robot. This invention enables stable tripedal movement in robots with any single leg failure, providing crucial technical support for tripedal motion control of quadruped robots under special working conditions.

[0081] The preferred embodiments of the present invention have been described in detail above. It should be understood that those skilled in the art can make numerous modifications and variations based on the concept of the present invention without creative effort. Therefore, all technical solutions that can be obtained by those skilled in the art based on the concept of the present invention through logical analysis, reasoning, or limited experimentation on the basis of existing technology should be within the scope of protection defined by the claims.

Claims

1. A method for controlling the tripedal walking gait of a quadruped robot, characterized by comprising the following steps: obtaining gait information of the quadruped robot walking on three legs when one leg is damaged; generating foot reference positions with a foot reference position generator, optimizing the foot reference positions in real time with a teacher policy network, and controlling the quadruped robot based on the optimized foot reference positions, wherein the teacher policy network is interactively trained by a reinforcement learning algorithm using motion data of the quadruped robot under different tripedal walking gaits; training a student policy network by supervised learning to approximate the teacher policy network; and deploying the pre-trained student policy network on a physical robot, collecting historical state information of the quadruped robot, inputting it into the student policy network, and outputting the corresponding joint control commands for the quadruped robot.

2. The method for controlling the gait of a quadruped robot as described in claim 1, characterized in that the teacher policy network takes the robot's state information at multiple moments as input and outputs a foot reference position compensation command; and the student policy network takes the robot's state information at multiple moments as input and outputs a joint motor position command.

3. The method for controlling the gait of a quadruped robot as described in claim 2, characterized in that the robot's state information input into the teacher policy network and the student policy network at each moment includes: the robot's velocity; its pitch, roll, and yaw angles and angular velocities; the contact state of the four feet with the ground; and the angle and angular acceleration of the robot's joint motors.

4. The method for controlling the gait of a quadruped robot as described in claim 2, characterized in that the teacher policy network and the student policy network adopt the same recurrent neural network structure, which combines a GRU with an MLP. The GRU part contains hidden layers that extract temporal features from the observation sequence; the MLP part consists of three fully connected layers that output control commands for each joint of the quadruped robot.

5. The method for controlling the gait of a quadruped robot as described in claim 1, characterized in that the specific process of interactively training the teacher policy network by reinforcement learning is as follows: obtaining gait information of the quadruped robot walking on three legs when one leg is damaged; at the beginning of each phase, inputting the joint angle information of the quadruped robot in the tripedal gait into the foot reference position generator, which outputs the foot reference positions for the future phase time through the foot reference position generation formulas of the support phase and the swing phase; inputting the robot's state information at multiple moments into the teacher policy network, which outputs a foot reference position compensation command; compensating the foot reference positions with the output of the teacher policy network and controlling the tripedal movement of the quadruped robot based on the compensated foot coordinates; and interactively training the teacher policy network using motion data of the quadruped robot under different terrains and different tripedal walking gaits.

6. The method for controlling the gait of a quadruped robot as described in claim 5, characterized in that the specific process by which the foot reference position generator generates the foot reference positions for the future phase time is as follows: For the three supporting legs of the quadruped robot in the tripedal gait, at the beginning of each phase time, two supporting legs are set to be in the support phase and one supporting leg in the swing phase during the future phase time; at the end of the future phase time, the support triangle is formed by the projections of the foot ends of the three supporting legs onto the horizontal plane. The motion of a supporting leg in the support phase is set as a backward movement of the foot end. The horizontal-plane coordinates of the foot reference positions of the two support-phase legs at the end of the future phase time are set to fixed values based on engineering experience. With the objective of maximizing the minimum distance from the projection of the quadruped robot's center of gravity onto the horizontal plane to the three sides of the support triangle, the optimal foothold is solved for; this is the foot reference position of the swing-phase leg at the end of the phase time. The objective function maximizes the minimum of the distances from the center of gravity projection point to the three sides of the support triangle at the end of the future phase time. The reference positions of the three supporting-leg feet at the end of the future phase time are obtained with an optimization algorithm. Based on the foot reference positions at the end of the future phase time, a reference trajectory is generated within the corresponding phase time: the foot trajectory of a supporting leg in the support phase is a straight line, and the foot trajectory of the supporting leg in the swing phase is a sine trajectory. Points are taken on the reference trajectory at a set frequency to obtain a series of optimal foot reference positions in the future phase time.

7. The method for controlling the gait of a quadruped robot as described in claim 6, characterized in that, when determining the optimal foothold of the supporting leg in the swing phase, the following constraints must be satisfied: the projection point of the quadruped robot's center of gravity lies within the support triangle formed by the foot projection points of the three supporting legs; and the range of the optimal foothold of the swing-phase leg at the end of the phase time is bounded according to the mechanical structure parameters of the quadruped robot.

8. The method for controlling the gait of a quadruped robot as described in claim 5, characterized in that, during training of the teacher policy network: when the robot state is reset in each training round, the quadruped robot raises any one leg above a specified height to simulate the tripedal gait of the quadruped robot; and, with the identity of the damaged leg and of the effective supporting legs taken as known prior information, the joint data of the three supporting legs and the joint data of the damaged leg are distinguished within the input joint position information, and the foot reference positions are generated using the joint data of the effective supporting legs.

9. The method for controlling the gait of a quadruped robot as described in claim 5, characterized in that the reinforcement learning algorithm is the proximal policy optimization (PPO) algorithm, and the reward function is set to include: a reward for the quadruped robot's forward movement, a reward for the quadruped robot's stability, a reward for the quadruped robot maintaining its direction of motion, and a reward for tripedal walking stability.

10. The method for controlling the gait of a quadruped robot as described in claim 9, characterized in that the reward function is set as follows: In the formula, the coefficients of the reward terms are positive or negative depending on the term; the remaining quantities are the robot's forward velocity, the robot's lateral velocity, the robot's yaw rate, the vertical oscillation velocity of the body, the body height, the gravity vector projected onto the robot's body coordinate system, the robot's roll angular velocity, the robot's pitch angular velocity, and the minimum distance from the center of gravity projection point to the three sides of the support triangle, which is positive when the projection point is inside the support triangle and negative when it is outside; a further symbol denotes the expected value.
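As an illustrative, non-limiting sketch of the reference trajectories recited in claim 6, a support-phase foot can be moved along a straight line and a swing-phase foot along a sine-shaped arc, with points sampled at a set frequency. The function names, the half-sine height profile, the `lift` parameter, and the numeric values are assumptions made for this example; the claims state only that the swing trajectory is a sine trajectory and do not give its parameters.

```python
import math


def sample_count(phase_time, frequency):
    """Number of sampling intervals in one phase at the given frequency (at least 1)."""
    return max(1, round(phase_time * frequency))


def stance_trajectory(start, end, n):
    """n + 1 points on the straight line from start to end (3D tuples)."""
    return [
        tuple(s + (e - s) * k / n for s, e in zip(start, end))
        for k in range(n + 1)
    ]


def swing_trajectory(start, end, lift, n):
    """n + 1 points from start to end with an assumed half-sine height profile."""
    points = []
    for k in range(n + 1):
        s = k / n  # normalized phase progress in [0, 1]
        x = start[0] + (end[0] - start[0]) * s
        y = start[1] + (end[1] - start[1]) * s
        # Height rises to `lift` at mid-swing and returns to the end height.
        z = start[2] + (end[2] - start[2]) * s + lift * math.sin(math.pi * s)
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    n = sample_count(phase_time=0.4, frequency=50.0)  # hypothetical values
    stance = stance_trajectory((0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), n)
    swing = swing_trajectory((-0.1, 0.0, 0.0), (0.1, 0.0, 0.0), lift=0.05, n=n)
    print(len(stance), len(swing), max(p[2] for p in swing))
```

In the claimed method the end points would come from the foot reference position generator, and the sampled points would then be compensated by the teacher policy network before inverse kinematics.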