A joint-aware attention mechanism-based damage control method for a hexapod robot
By employing a deep reinforcement learning method based on joint-sensory attention mechanisms, the adaptive motion control problem of hexapod robots under damage conditions is solved, improving training efficiency and task completion rate. This method is applicable to various hexapod robot platforms, especially in disaster relief and high-risk environments.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI
- Filing Date
- 2026-04-08
- Publication Date
- 2026-06-19
Figure CN122239752A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of robot control technology, and in particular to a damage control method for a hexapod robot based on a joint-aware attention mechanism.
Background Technology
[0002] The technical problem this invention aims to solve is that hexapod robots are often used in complex and inaccessible extreme environments such as disaster relief and space exploration. However, in extreme environments, robot actuators are prone to locking up and failure, causing traditional control methods to fail. A control method that maintains effective mobility even when part of the robot's actuators is damaged is called damage control.
[0003] Hexapod robots offer greater stability and are widely used in disaster relief and space exploration. However, complex and unpredictable environments increase the likelihood of drive system damage. Damage sustained in these inaccessible environments cannot be repaired immediately, so a drive system failure that hinders normal movement raises the risk of mission failure and can end the robot's service life.
[0004] Control methods that enable movement under damaged conditions are called robot damage control. The challenge of robot damage control lies in the fact that damage conditions are not fixed, and the number of strategies increases exponentially with the number of damaged joints. Traditional solutions involve manually designing gait libraries for specific damage conditions; however, this requires enumerating all damage conditions for a single control strategy, resulting in high time and manpower costs. Therefore, in recent years, reinforcement learning-based damage control methods have been developed, possessing multi-form generalized control capabilities and improving strategies through environmental interaction to adapt to various damage scenarios.
[0005] Current reinforcement learning-based damage control primarily focuses on quadruped robots, presenting challenges when transferring it to hexapods. Firstly, there is relatively little research on hexapods, resulting in a performance gap compared to quadrupeds. Secondly, the significantly larger motion space of hexapods greatly increases the problem size, hindering algorithm training and convergence. Some methods employ hierarchical control logic to reduce the problem size. Zhan et al. designed independent control strategies for each leg and used a multi-agent collaborative control approach at the upper level, generating well-coordinated periodic gaits and significantly improving the algorithm's convergence speed. Xu et al. reduced the solution search space using a periodic gait generator and employed meta-reinforcement learning to adjust the gait generator parameters to adapt to different damage scenarios, but this limited the possibility of achieving higher performance through non-periodic gaits. Other methods, inspired by general control strategies, represent morphological context through network structures (such as neural networks and Transformers), improving the effectiveness of algorithm training based on prior morphological information. However, these methods focus on general performance in morphological space, and information loss occurs during domain transformation in damaged space, which impairs their potential to achieve higher performance in damaged space.
Summary of the Invention
[0006] To address the problems existing in current technologies, the present invention aims to provide a damage control method for hexapod robots based on a joint-aware attention mechanism. For the issue of hexapod robots failing in complex environments such as disaster relief and space exploration due to actuator damage, an end-to-end control scheme based on deep reinforcement learning is proposed. By employing a joint-dimensional cross-attention mechanism, fusing temporal and spatial information of the joints, and combining it with a specially designed reinforcement learning framework, adaptive motion control of the hexapod robot is achieved even when up to three joints on a single leg are damaged.
[0007] To achieve the above objectives, this invention provides a damage control method for a hexapod robot based on a joint-aware attention mechanism, which mainly includes the following core components:
1. The temporal information module in the damage control reinforcement learning model obtains the joint angles and platform posture of the hexapod robot at the current moment, together with the historical joint angles and historical actions at historical moments.
2. The damage recognition module uses the current joint angles and platform posture of the hexapod robot to analyze, joint by joint, the error between the robot's actual position and its expected position. If the error exceeds a set threshold, the joint is judged to be damaged.
3. The spatial information module determines the robot's morphological context and damage context based on the connection relationships and damage types of the damaged joints, and forms a damage mask describing the robot's damage relationships.
4. The morphological context of each joint and the damage context of the robot are input into the attention module. The attention module dynamically perceives the joint states, dynamically allocates the influence weight of each joint along the joint dimension, and outputs the damage control strategy.
5. The motion interpolation module generates control signals based on the damage control strategy to control the movement of the damaged hexapod robot.
[0008] In actual use, the algorithm is divided into a training phase S1 and a deployment phase S2. The training phase S1 precedes the deployment phase S2. First, a morphology-damage dataset for training the control model is constructed, and the policy is then trained on this dataset. The deployment phase S2 mainly includes power-on self-check, control policy inference, and action interpolation inference.
[0009] During the online training phase of the damage control reinforcement learning model, robot configurations are sampled with replacement and equal probability from the morphological set and simulated training is performed. The morphological context and damage context, representing spatial information, are determined simultaneously, while historical state-action pairs obtained from the environment are used as policy inputs.
[0010] The historical states include: platform position, platform speed, joint position, and joint speed; the action is the target joint position.
[0011] During the deployment phase, the damage identification module analyzes the error between the robot's actual position and the desired position joint by joint to form a damage context. The policy network obtained during the training phase receives these inputs and generates motion commands, thereby enabling control of the damaged robot.
[0012] In creating the reinforcement learning model for damage control of hexapod robots, the damage control problem is modeled as a Markov decision process with a damage scenario context, represented by a six-tuple (C, S, A, Pr, γ, R). Here, C represents the context information, which comprises a morphological context, used to represent the joint connection relationships, and a damage context, used to indicate the actuator damage conditions, as shown in Figure 2; S represents the state space, including the platform state space and the joint state space; A represents the action space, Pr represents the state transition probability, γ represents the discount factor, and R represents the reward.
[0013] The reward R is the sum of discounted rewards accumulated over a range of time steps, where H is the time window. The damage control problem is defined as follows: given an initial state and the context information, optimize a policy that maximizes the expected reward over the morphology set.
[0014] The damage control reinforcement learning model covers five damage scenarios: hip joint damage, femoral joint damage, tibial joint damage, femoral and tibial joint damage, and total leg damage. Actuator damage types include actuator locking and actuator failure. When an actuator locks, the joint position is fixed and the joint cannot rotate. When an actuator fails, the joint motor outputs no torque, and the leg joint degenerates into a passive joint.
[0015] The policy network of the joint-aware attention module uses a Transformer encoder-only architecture. Computing joint-aware attention requires the query q, value v, and key k embedding matrices. The attention module is a cross-attention layer within the encoder-only architecture that attends over the joint dimension, which matches the number of actuators.
[0016] During damage identification, the error estimation of the damage identification module is implemented as follows: maintain a state buffer with a length of H time steps, receive the sequence of actual positions reported by the servo sensors over the past H time steps, and compare each one with the policy output position of the preceding time step. A threshold ε is set; if the error at every time step is greater than ε, the output of the policy network has not been executed correctly, and the joint is judged to be damaged.
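The buffer-and-threshold check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `DamageRecognizer`, its method names, and the choice to compare each sensor reading with the target issued one step earlier are all hypothetical. The mask convention (0 for a damaged joint, 1 for a healthy one) follows the damage mask described elsewhere in this document.

```python
from collections import deque


class DamageRecognizer:
    """Hypothetical sketch of per-joint damage detection over H steps."""

    def __init__(self, num_joints, horizon, epsilon):
        self.num_joints = num_joints
        self.epsilon = epsilon                 # empirical error threshold
        self.errors = deque(maxlen=horizon)    # last H per-joint error vectors
        self.prev_target = None                # policy output of previous step

    def update(self, actual_positions, new_target):
        """Store |actual - previous target| per joint, then keep the new target."""
        if self.prev_target is not None:
            self.errors.append(
                [abs(a - t) for a, t in zip(actual_positions, self.prev_target)]
            )
        self.prev_target = list(new_target)

    def damage_mask(self):
        """Return 0 for joints whose error exceeded epsilon at every buffered step."""
        if len(self.errors) < self.errors.maxlen:
            return [1] * self.num_joints       # not enough evidence yet
        return [
            0 if all(step[j] > self.epsilon for step in self.errors) else 1
            for j in range(self.num_joints)
        ]


recognizer = DamageRecognizer(num_joints=2, horizon=3, epsilon=0.1)
target = [0.5, 0.5]
recognizer.update([0.0, 0.0], target)          # first call only stores the target
for _ in range(3):
    recognizer.update([0.5, 0.0], target)      # joint 1 never reaches its target
print(recognizer.damage_mask())                # [1, 0]
```

Requiring the error to exceed ε at every buffered step, rather than at any single step, keeps one noisy reading from flagging a healthy joint.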
[0017] The beneficial effects of this invention are as follows. The invention uses a reinforcement learning framework for hexapod robots. It improves policy training efficiency in actuator damage scenarios through a carefully designed damage space representation, enhances the potential performance of non-periodic gaits through the attention allocation mechanism along the joint dimension, and preserves the trained performance on the physical robot through the transfer module. It enables a physical hexapod robot to move when up to three actuators on a single leg are damaged, thus solving the damage control problem of hexapod robots.
[0018] This invention overcomes the shortcomings of traditional methods that rely on precise modeling and are limited by specific scenarios. It is applicable to different hexapod robot platforms and has stronger versatility and robustness. Its potential applications include disaster relief and high-risk environment operations, significantly improving the robot's task completion rate and survivability in unpredictable environments.
Attached Figure Description
[0019] Figure 1 is a schematic diagram of the hexapod robot structure and damage conditions used in this invention; Figure 2 is a schematic diagram of the joint-aware attention module architecture; Figure 3 is a schematic diagram of the steps of the damage control method for a hexapod robot based on a joint-aware attention mechanism according to the present invention; Figure 4 is a schematic diagram of the joint-aware attention layer in Figure 2.
Detailed Implementation
[0020] The technical solution of the present invention will now be clearly and completely described with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0021] In the description of this invention, it should be noted that the terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," and "outer," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are used only for the convenience of describing the invention and for simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on the invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.
[0022] In the description of this invention, it should be noted that, unless otherwise explicitly specified and limited, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection of two components. Those skilled in the art can understand the specific meaning of the above terms in this invention according to the specific circumstances.
[0023] Specific embodiments of the present invention are described in detail below with reference to Figures 1-4. It should be understood that the specific embodiments described herein are for illustrative and explanatory purposes only and are not intended to limit the present invention.
[0024] This invention proposes a reinforcement learning framework for hexapod robots to address damage control requirements. By using a cross-attention mechanism at the joint dimension to perceive joint information under damage conditions, it achieves adaptive motion control for hexapod robots when one leg is randomly damaged and at most three joints on that leg are damaged.
[0025] This invention relates to a damage control method for a hexapod robot based on a joint-aware attention mechanism. The hexapod robot includes a robot platform and six legs connected to the platform. Each leg includes a hip bone, a femur, and a tibia. A hip joint is located between the hip bone and the platform, a femoral joint between the hip bone and the femur, and a tibial joint between the femur and the tibia. Each joint is equipped with a drive motor, so each leg has three drive motors and the hexapod robot has 18 drive motors in total. As shown in Figure 1, joint damage includes two types, joint locking and joint failure, both of which occur on a single leg. Robot joint damage can be categorized by the number of damaged joints: single-drive damage, dual-drive damage, and triple-drive damage.
[0026] The workflow is shown in Figure 3. A reinforcement learning model for damage control of a hexapod robot is created based on the robot's structure, motion characteristics, and damage scenarios. This model is then deployed on the hexapod robot's edge device. The model includes a damage recognition module, an attention module, a temporal information module, a spatial information module, and an action interpolation module. In actual use, the algorithm is divided into a training phase (S1) and a deployment phase (S2). The training phase (S1) precedes the deployment phase (S2). First, a morphology-damage dataset for training the control model is constructed. Then, this dataset is used to complete policy training. The deployment phase (S2) mainly includes power-on self-check, control policy inference, and action interpolation inference.
[0027] For hexapod robots equipped with the aforementioned reinforcement learning model for hexapod robot damage control, this invention provides a hexapod robot damage control method based on a joint-aware attention mechanism, comprising the following steps:
S1 Training phase: training the robot control strategy in a simulation environment, including the following sub-steps:
S1.1 Construct a morphology-damage dataset for training the control model;
S1.2 Time information embedding;
S1.3 Spatial information embedding;
S1.4 Attention calculation.
[0028] The S2 deployment phase includes the following sub-steps:
S2.1 The robot performs a self-test upon startup, checking that the robot's joint angle and platform posture readings are correct;
S2.2 Time information embedding: the time information module in the damage control reinforcement learning model obtains the joint angles and platform posture of the hexapod robot at the current moment, together with the historical joint angles and historical actions at historical moments, and outputs latent time information variables through the embedding layer, which serve as the query embedding q and value embedding v of the attention module;
S2.3 Spatial information embedding: first, the damage recognition module uses the joint angles and platform posture of the hexapod robot at the current moment to analyze, joint by joint, the error between the robot's actual position and the expected position, compares the error with a set threshold to determine the joint damage result, and outputs the damage mask; the spatial information module then takes the robot morphological context, which is predefined according to the robot type, combines it joint by joint with the damage mask, and outputs the spatial information latent variables through the embedding layer as the key embedding k of the attention module;
S2.4 Attention calculation: the query embedding q, value embedding v, and key embedding k are input into the attention module, which calculates the attention, dynamically perceives the joint states, dynamically allocates the influence weight of each joint along the joint dimension, and outputs the action;
S2.5 Action interpolation: the motion interpolation module generates motions based on the damage control strategy, and the robot's low-level controller converts them into actuator control signals to control the movement of the damaged hexapod robot.
[0029] The damage control reinforcement learning model must be trained before practical application. This invention adopts an end-to-end training-deployment framework, covering damage dataset generation, policy training, sim-to-real policy transfer, and physical deployment. A damage dataset is generated specifically for the damage problem, this dataset is used to train the hexapod robot's policy, and the policy is then transferred to a physical prototype. This invention overcomes the shortcomings of traditional methods that rely on precise modeling and are limited by specific scenarios. It is applicable to different hexapod robot platforms and has stronger versatility and robustness. Potential applications include disaster relief and high-risk environment operations, significantly improving the robot's task completion rate and survivability in unpredictable environments.
[0030] The goal of the online training phase of the damage control reinforcement learning model is to train a policy network capable of controlling a hexapod robot with joint damage. Robot configurations are sampled with replacement and with equal probability from the morphology set and trained in a simulation environment. The inputs to the policy network include a morphological context generated from the robot configuration, a damage context, and historical state-action pairs obtained from the simulation environment. Historical states include platform position, platform velocity, joint position, and joint velocity. Actions are the outputs of the policy network at the previous time step. The output of the policy network is the target joint angle that each robot joint should reach in the current time step, which a low-level controller based on robot inverse kinematics converts into servo motion signals. The goal of the deployment phase is to achieve motion control for a hexapod robot with joint actuator damage. The policy network is deployed in the robot controller, with inputs including a predefined morphological context, a damage context, and historical state-action pairs obtained from sensors. The damage context is obtained by a damage recognition module (DRM), which analyzes the error between the robot's actual position and the desired position joint by joint. The remaining steps are consistent with the training phase.
[0031] Specifically, the reinforcement learning model for damage control of the hexapod robot used in this invention is created as follows. The damage control problem of hexapod robots can be modeled as a Markov decision process with a damage scenario context, represented by a six-tuple (C, S, A, Pr, γ, R) defined over the set N (referred to as the morphology set), which comprises all possible damage patterns of a hexapod robot. C represents the context information, comprising a morphological context, used to represent the joint connection relationships, and a damage context, used to indicate the actuator damage conditions. S represents the state space, comprising the platform state space, which includes the robot platform's position information, and the joint state space, which includes the joint angles of each joint of the robot. A represents the action space. Pr and γ represent the state transition probability and the discount factor, respectively, which are fixed parameters for reinforcement learning. R represents the reward, defined as the sum of discounted rewards accumulated over the time window H. The damage control problem is therefore defined as follows: given an initial state in the state space and the context information, optimize a policy that maximizes the expected reward over the morphology set. Table 1 lists the relevant variable symbols used in this invention.
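For orientation, one conventional way to write the windowed reward and the optimization objective described above is sketched below. The symbols π (policy), r_k (per-step reward), s_0 (initial state), and c (a context drawn from the morphology set N), as well as the exact summation limits, are assumptions introduced here for illustration.

```latex
R_t = \sum_{k=t}^{t+H} \gamma^{\,k-t}\, r_k ,
\qquad
\pi^{*} = \arg\max_{\pi}\;
\mathbb{E}_{c \sim \mathcal{U}(N)}
\bigl[\, R_0 \mid s_0,\; c,\; \pi \,\bigr]
```

Here U(N) denotes uniform sampling from the morphology set, matching the equal-probability sampling with replacement used during training.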
[0032]
[0033] Table 1. Summary Table of Symbols
The training process of the damage control reinforcement learning model is shown on the left of Figure 3. During the online training phase, robot configurations are sampled with replacement and with equal probability from the morphology set and trained in simulation. The morphological context and damage context, representing spatial information, are determined at the same time, while historical state-action pairs obtained from the environment serve as policy inputs. Historical states include platform position, platform velocity, joint position, and joint velocity. Actions are the target joint positions. The output is the target position of each robot joint, i.e., the action. During the deployment phase, the damage recognition module (DRM) analyzes the error between the robot's actual position and the desired position joint by joint to form a damage context. The policy network obtained during the training phase receives these inputs and generates motion commands to control the damaged robot. The training process is as follows:
S1.1 Morphology Set Construction: The morphology set consists of robot samples with up to three damaged actuators. Based on experimental data and analysis of damage experience, damage locations are constructed according to two rules: spatial locality, where multiple actuator failures are confined to a single leg; and distal joint priority, where dual damage only occurs in adjacent distal joints. These rules yield five possible damage scenarios: hip joint damage, femoral joint damage, tibial joint damage, femoral and tibial joint damage, and complete leg damage. Actuator damage covers two common modes: actuator locking, in which the joint position is fixed; and actuator failure, in which the joint torque drops to zero and the joint degenerates into a passive joint.
Combining these damage rules and modes gives 10 damaged robot configuration samples for each leg (five scenarios times two modes), for a total of 60 different damaged robot configurations across the six legs for reinforcement learning training.
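The count of 60 configurations can be checked with a short enumeration. The sketch below is illustrative only: the dictionary layout, the names `SCENARIOS`, `MODES`, `build_morphology_set`, and `sample_configuration`, and the zero-based leg indexing are hypothetical. It also assumes that every damaged joint in a multi-joint scenario shares one damage mode, which is what yields 10 samples per leg.

```python
import random
from itertools import product

LEGS = range(6)  # six legs, indexed 0..5 (indexing is an assumption)

# The five damage scenarios, each listed as the affected joints on one leg.
SCENARIOS = {
    "hip": ("hip",),
    "femoral": ("femoral",),
    "tibial": ("tibial",),
    "femoral+tibial": ("femoral", "tibial"),
    "whole_leg": ("hip", "femoral", "tibial"),
}

# The two actuator damage modes.
MODES = ("locked", "failed")


def build_morphology_set():
    """Enumerate every (leg, scenario, mode) combination."""
    return [
        {"leg": leg, "scenario": name, "joints": joints, "mode": mode}
        for leg, (name, joints), mode in product(LEGS, SCENARIOS.items(), MODES)
    ]


def sample_configuration(morphology_set, rng=random):
    """Draw one configuration with equal probability, with replacement."""
    return rng.choice(morphology_set)


morphology_set = build_morphology_set()
print(len(morphology_set))  # 6 legs * 5 scenarios * 2 modes = 60
```

Calling `sample_configuration(morphology_set)` once per training episode mirrors the equal-probability sampling with replacement described for the online training phase.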
[0034] S1.2 Time Information Embedding: In this step, the query embedding q and value embedding v are obtained through the time information module. The time information module processes the robot's time-related input information, which yields two parts: the query embedding q and the value embedding v. The robot's joint angles and platform posture at the current moment pass through the observation embedding layer to give the query embedding q. The historical joint angles, historical actions, and historical platform posture at historical moments are merged with the observation S at the current moment and pass through the time embedding layer to give the value embedding v.
[0035] S1.3 Spatial Information Embedding: In this step, the key embedding k is obtained through the spatial information module. The spatial information module processes the robot's spatially relevant input information, which is then used as the key embedding k. The spatial information consists of two parts. The first is the robot's morphological context, which describes the morphological and structural relationships of the robot. The robot's connection relationships are modeled as a tree structure, where the platform is the root node, the other links are child nodes, and the joints are edges. The connection relationships can then be represented by an adjacency matrix with self-loops. The second is the robot's damage context, which describes the robot's actuator damage. When an actuator is damaged, the corresponding position is set to 0, and the remaining positions are set to 1, forming a damage mask that represents the robot's damage relationships. The spatial information of the robot is obtained by multiplying the self-loop adjacency matrix (representing morphological information) by the damage mask (representing damage information) along the diagonal elements.
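One possible reading of this step is sketched below; it is an illustration under assumptions rather than the patented construction. The matrix is indexed by the 18 joints, two joints are treated as adjacent when they share a link, and the damage mask scales only the diagonal (self-loop) entries. The function names and the flat indexing `leg * 3 + k` are hypothetical.

```python
NUM_LEGS, JOINTS_PER_LEG = 6, 3
NUM_JOINTS = NUM_LEGS * JOINTS_PER_LEG  # 18 actuated joints


def joint_index(leg, k):
    """Flat index of joint k (0=hip, 1=femoral, 2=tibial) on a leg."""
    return leg * JOINTS_PER_LEG + k


def morphology_matrix():
    """Self-loop adjacency matrix over joints.

    Assumption: two joints are adjacent when they share a link, so
    consecutive joints on a leg are adjacent, and all hip joints are
    adjacent to each other through the platform (the root link).
    """
    adj = [[0] * NUM_JOINTS for _ in range(NUM_JOINTS)]
    for i in range(NUM_JOINTS):
        adj[i][i] = 1  # self-loop
    for leg in range(NUM_LEGS):
        for k in range(JOINTS_PER_LEG - 1):
            a, b = joint_index(leg, k), joint_index(leg, k + 1)
            adj[a][b] = adj[b][a] = 1
        for other in range(leg + 1, NUM_LEGS):
            a, b = joint_index(leg, 0), joint_index(other, 0)
            adj[a][b] = adj[b][a] = 1
    return adj


def damage_mask(damaged_joints):
    """1 for a healthy actuator, 0 for a damaged one."""
    damaged = set(damaged_joints)
    return [0 if j in damaged else 1 for j in range(NUM_JOINTS)]


def spatial_information(adj, mask):
    """Scale each diagonal (self-loop) entry by that joint's mask value."""
    out = [row[:] for row in adj]
    for j, m in enumerate(mask):
        out[j][j] *= m
    return out


# Example: femoral and tibial joints damaged on leg 2 (flat indices 7 and 8).
adj = morphology_matrix()
mask = damage_mask([joint_index(2, 1), joint_index(2, 2)])
spatial = spatial_information(adj, mask)
```

Under this reading, a damaged joint loses its self-loop but keeps its structural links, so the key embedding still encodes where the joint sits in the kinematic tree.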
[0036] S1.4 Attention Calculation: The joint-aware attention module works as shown in Figure 2. The robot's policy network uses a Transformer encoder-only architecture. Computing joint-aware attention requires the query q, value v, and key k embeddings. The attention module is designed as a cross-attention layer within the encoder-only architecture that attends over the joint dimension, which matches the number of actuators.
[0037] Let the embedding layer weight matrices for the query, value, and key be given. The query matrix, value matrix, and key matrix are then defined by applying the corresponding embedding layer to the query, value, and key inputs, respectively. The morphological information C is represented in matrix form, with one entry for each pair of the i-th and j-th joints. As shown in Figure 4, the query matrix, value matrix, and key matrix are processed through a joint-aware attention layer to perform attention operations, resulting in the action output of the policy. The joint-aware attention layer consists of multiple identical attention modules connected in series, in which the multi-head attention layer performs the attention operation, the normalization layer performs batch normalization, and the feedforward layer performs fully connected operations.
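To make the attention operation over the joint dimension concrete, the sketch below uses standard single-head scaled dot-product attention with one row per joint. This is an assumption for illustration: the patent's exact attention formula, the number of heads, and the learned embedding weights are not reproduced here, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(q, k, v):
    """Single-head scaled dot-product attention over joints.

    q, k, v are lists of per-joint vectors (one row per joint).
    Returns the attended outputs and the joint-to-joint weight matrix.
    """
    d = len(q[0])
    outputs, weights = [], []
    for qi in q:
        # Similarity of this joint's query to every joint's key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value rows.
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


# Two joints, identical keys: every joint attends uniformly to both values.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[0.0, 0.0], [0.0, 0.0]]
v = [[2.0, 0.0], [0.0, 4.0]]
out, w = joint_attention(q, k, v)
```

Each row of the weight matrix sums to 1, so it can be read as the influence that every joint has on the joint in that row, with no fixed grouping of joints by leg.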
[0038] The robot deployment process is as follows:
S2.1 The robot performs a self-test upon startup, checking that the robot's joint angle and platform posture readings are correct;
S2.2 Time information embedding, same as step S1.2;
S2.3 Spatial information embedding, same as step S1.3. During the training phase, the damage context is treated as privileged information, but it is unobservable in real-world deployment environments. Therefore, damage needs to be identified by comparing the error between the target position and the true position. Error estimation is implemented by maintaining a state buffer with a length of H time steps, receiving the sequence of true positions reported by the servo sensors over the past H time steps, and comparing each one with the policy output position of the preceding time step. An empirical threshold ε is set; if the error at every time step is greater than ε, the policy network's output has not been executed correctly, i.e., the joint is damaged.
[0039] S2.4 Attention calculation, same as step S1.4;
S2.5 Action Interpolation: Due to limitations in computing resources and communication latency, computationally intensive attention operations are difficult to run at high frequency on robot edge devices. To ensure the real-time output of robot control commands, a low-frequency policy is combined with an action interpolation module. In the intervals between policy outputs, the action interpolation module generates control signals, so that a single policy network output corresponds to multiple actions.
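The idea of one low-frequency policy output producing several control commands can be sketched as follows. The interpolation scheme is not specified in the text, so linear interpolation between consecutive policy targets is used here purely as an assumption, and the function name `interpolate_actions` is hypothetical.

```python
def interpolate_actions(prev_target, next_target, steps):
    """Return `steps` intermediate joint targets, ending at next_target.

    Linear interpolation is an assumption; the source does not state
    which interpolation scheme the module uses.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [
        [p + (n - p) * (i / steps) for p, n in zip(prev_target, next_target)]
        for i in range(1, steps + 1)
    ]


# One policy output expanded into four control commands for two joints.
commands = interpolate_actions([0.0, 1.0], [1.0, 0.0], steps=4)
for command in commands:
    print(command)
```

With a policy running at a low rate and `steps` commands sent per policy output, the low-level controller still receives targets at `steps` times the policy rate.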
[0040] The key technical points of this invention are: 1. Joint-Dimensional Transformer: This includes the network structure of the joint-dimensional Transformer, the joint-dimensional attention mechanism, and the spatiotemporal embedding module. This network solves the damage control problem for hexapod robots with up to three joints damaged.
[0041] 2. An end-to-end training and deployment framework for damage control of a hexapod robot based on deep reinforcement learning. This includes hexapod robot morphology set generation, joint-dimensional Transformer training, DRM module design, and policy network deployment on a physical prototype. This framework was used to train the damage control policy for the hexapod robot and deploy it on a physical prototype.
[0042] Existing technologies often employ hierarchical controllers to simplify the problem by reducing the search space of solutions, but this limits the possibility of achieving higher performance through non-periodic gaits. Furthermore, the underlying periodic gait generators (such as central pattern generators, gait libraries, and gait patterns) all require joints to be coupled and grouped by leg, which easily causes training to converge to local optima. Such methods also rely on an accurate robot motion model, so they are only applicable to specific hexapod robots.
[0043] This invention uses a joint-dimensional attention mechanism to address the inefficient policy training caused by the larger number of joints in hexapod robots. The search space is the full motion space, preserving the possibility of generating non-periodic gaits. The attention mechanism allows any two joints to attend to each other without coupling or grouping, so training tends to explore toward the global optimum. At the same time, this end-to-end method does not depend on a motion model and is applicable to different hexapod robots.
[0044] Any process or method described in the flowcharts of this invention or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing a particular logical function or process, which can be implemented in any computer-readable medium for use by an instruction execution system, apparatus, or device. The computer-readable medium can be any medium containing a program for storage, communication, propagation, or transmission for use by the execution system, apparatus, or device, including read-only memory, magnetic disks, or optical disks.
[0045] In the description of this specification, references to terms such as "embodiment," "example," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, those skilled in the art can combine the different embodiments or examples described in this specification, and the features therein, provided that doing so causes no contradiction.
[0046] While embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention. Those skilled in the art can make changes, modifications, substitutions, and alterations to the above embodiments within the scope of the present invention.
Claims
1. A damage control method for a hexapod robot based on a joint-sensing attention mechanism, characterized in that, The method is based on a damage control reinforcement learning model and includes: first, acquiring the joint angles, platform posture, and historical motion information of the hexapod robot at the current and historical moments; second, performing joint-by-joint damage identification based on the error between the actual and expected joint positions to obtain the location and type of damaged joints; third, constructing morphological context, damage context, and damage mask based on the robot's connectivity and damage results; fourth, inputting temporal and spatial information into the joint dimension attention module, dynamically allocating the influence weights of each joint, and outputting the damage control strategy; and finally, generating continuous control signals based on the motion interpolation module and driving the robot's motion.
2. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 1, characterized in that, The damage control reinforcement learning model must be trained before practical application. An end-to-end training-deployment framework is adopted, covering damage dataset generation, policy training, and deployment on the hexapod robot: a damage dataset is generated for the damage problem, the damage dataset is used to train the hexapod robot's policy, and the trained policy is transferred to the hexapod robot's edge device.
3. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 1, characterized in that, During the online training phase of the damage control reinforcement learning model, robot configurations are sampled with replacement and equal probability from the morphological set and simulated training is performed. The morphological context and damage context, representing spatial information, are determined simultaneously, while historical state-action pairs obtained from the environment are used as policy inputs.
4. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 3, characterized in that, The historical state includes, for example, platform position, platform velocity, joint position, and joint velocity; the action refers to the target joint position.
5. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 3, characterized in that, During the deployment phase, the damage identification module of the damage control reinforcement learning model analyzes, joint by joint, the error between the robot's actual position and the desired position to form the damage context. The policy network obtained during the training phase receives this input and generates motion commands, thereby realizing control of the damaged robot.
6. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 1, characterized in that, In constructing the reinforcement learning model for damage control of hexapod robots, the damage control problem is modeled as a Markov decision process with damage scenario context, defined by a six-tuple consisting of the context information, the state space, the action space A, the state transition probability Pr, the discount factor γ, and the reward R. The context information comprises a morphological context, which represents the joint connection relationships, and a damage context, which indicates actuator damage. The state space comprises the platform state space and the joint state space.
7. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 6, characterized in that, The reward R is the sum of the cumulative discounted rewards over the time steps spanned by the time window H; the damage control problem is defined as follows: given an initial state and the context information, optimize a policy to maximize the expected reward over the morphology set.
8. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 1, characterized in that, The damage control reinforcement learning model includes five damage location combinations: hip joint damage, femoral joint damage, tibial joint damage, hip and tibial joint damage, and whole leg damage. The actuator damage types include actuator locking and actuator failure. Actuator locking fixes the joint position so that the joint cannot rotate, while actuator failure leaves the joint motor unable to output torque, so that the leg joint degenerates into a passive joint.
9. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 1, characterized in that, In the joint-aware attention module of the damage control reinforcement learning model, the robot's policy network is an encoder-only Transformer architecture. The computation of joint-aware attention involves the query q, key k, and value v embedding matrices. The attention module has only a cross-attention layer within the encoder architecture, and the joint dimension ... should match the number of actuators.
10. The damage control method for a hexapod robot based on a joint-sensing attention mechanism according to claim 1, characterized in that, In the damage identification module of the damage control reinforcement learning model, the error estimation is implemented as follows: maintain a state buffer pool with a length of H time steps, receive the sequence of actual positions from the servo sensors since the past time t-H, and compare it step by step with the policy output positions since time t-H-1; set a threshold ε, and if the error at every time step is greater than ε, the output of the policy network has not been executed correctly and the joint is judged to be damaged.
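The threshold test in claim 10 can be sketched as follows. This is an illustrative reading under stated assumptions rather than the claimed implementation: the class name `DamageIdentifier`, the absolute-error metric, and the numeric values of H and ε are hypothetical. Each measured joint position is compared with the target position the policy issued one step earlier, and a joint is flagged only when its error exceeds ε at every step in a full buffer of H steps. The sketch reports only which joints are damaged; distinguishing actuator locking from actuator failure is not attempted.

```python
from collections import deque

import numpy as np

class DamageIdentifier:
    """Flags joints whose tracking error exceeds eps over a full window (sketch)."""

    def __init__(self, n_joints, horizon, eps):
        self.n_joints = n_joints
        self.horizon = horizon               # buffer length H, in time steps
        self.eps = eps                       # error threshold (epsilon)
        self.errors = deque(maxlen=horizon)  # most recent H per-joint errors
        self.prev_target = None              # policy output from the previous step

    def update(self, measured_pos, target_pos):
        """Record one time step and return boolean damage flags per joint."""
        measured_pos = np.asarray(measured_pos, dtype=float)
        if self.prev_target is not None:
            # Compare the current measurement with the previous step's target.
            self.errors.append(np.abs(measured_pos - self.prev_target))
        self.prev_target = np.asarray(target_pos, dtype=float)
        if len(self.errors) < self.horizon:
            # Not enough history yet: report no damage.
            return np.zeros(self.n_joints, dtype=bool)
        # Damaged only if the error exceeded eps at every buffered step.
        return np.all(np.stack(self.errors) > self.eps, axis=0)

# Hypothetical scenario: three joints, joint 1 stuck at 0.0 while commanded to 0.5.
ident = DamageIdentifier(n_joints=3, horizon=4, eps=0.1)
target = np.array([0.5, 0.5, 0.5])
measured = np.array([0.5, 0.0, 0.5])
for _ in range(5):
    flags = ident.update(measured, target)
```

Requiring the error to exceed ε at every buffered step, rather than at any single step, keeps a transient tracking error from being reported as damage.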