A safety interaction control method for a robot arm
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- GUANGZHOU HUARUI ZHIPU INTELLIGENT ROBOT CO LTD
- Filing Date
- 2026-03-27
- Publication Date
- 2026-06-12
AI Technical Summary
Existing robotic arms suffer from insufficient safety, poor environmental adaptability, and poor energy constraint adaptability in dynamic human-computer interaction scenarios.
A dual-modal energy tank constraint mechanism is adopted. By distinguishing the contact state of the robotic arm, the energy release coefficient is dynamically switched. Combined with the DQN-PPO hybrid reinforcement learning optimizer, a multi-objective reward function is set to optimize the joint dynamic parameters in real time and solve the total joint control torque.
It achieves high safety and high precision control of the robotic arm in dynamic environments, with an end-effector trajectory tracking error of ≤0.05mm and system energy strictly constrained within a safe range, effectively solving the problem of impact force after contact loss.
Figure CN122185199A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of robotic arm control technology, and in particular to a safe interactive control method for robotic arms, applicable to seven-degree-of-freedom or multi-degree-of-freedom robotic arm applications such as industrial assembly and human-machine collaborative operations, which have high requirements for contact safety and dynamic adaptability.
Background Technology
[0002] In human-robot collaboration, the suppression of impact force after contact loss and adaptive control in complex dynamic environments are core technical challenges for robotic arms. Existing technologies often employ traditional energy tank methods with a single energy constraint mechanism, which cannot dynamically adjust the constraint strength based on the robotic arm's contact state, making it prone to energy runaway and impact force issues after contact loss. Variable impedance control only achieves compliance adjustment through fixed parameter optimization, lacking adaptability to dynamic interaction scenarios. Reinforcement learning applications in robot control primarily focus on trajectory planning, failing to deeply integrate with energy constraint mechanisms to address safety control issues in human-robot interaction.
[0003] Existing related patents have obvious technical limitations, as follows:
[0004] Chinese invention patent CN119748438B discloses a stability assessment method for a robot system using a variable augmented matrix with a variable impedance model, enabling offline verification of robot stability when stiffness changes. Impedance control is a general force control scheme applicable to various robotic arm tasks, one application being human-robot collaboration: the robotic arm performs force-position hybrid control on an object according to an impedance control law, while the operator applies control forces to the object or robot to achieve human-robot collaboration, completing tasks such as assembly and handling. Depending on the robotic arm task, variable stiffness parameters can be used to reflect compliance and adaptability. It can not only control the dynamic relationship between external forces and robot motion, but also flexibly change these dynamic behaviors continuously during the task. Impedance control can be implemented in joint coordinates, Cartesian coordinates, or other task-specific coordinates. Impedance targets can be achieved in different ways.
[0005] However, this stability assessment method for the variable impedance model using a variable augmented matrix focuses only on stability assessment. It provides neither an energy constraint strategy nor reinforcement learning parameter optimization, and it cannot solve the problem of impact force after contact loss.
[0006] Chinese invention patent CN117921666B discloses a variable impedance safety interaction method for a robotic arm. This method involves acquiring external torque and position information through the robotic arm's sensors; establishing a dynamic equation for the robotic arm; compensating the variable impedance controller based on the external torque information using the dynamic equation to build an impedance model; calculating the position error based on the position information; calculating the stiffness using the impedance model based on the position error and a preset variable stiffness strategy to obtain the joint torque; and inputting the joint torque to the robotic arm for control. This approach ensures the impedance characteristics of the robotic arm while preventing position errors from exceeding safety limits.
[0007] However, this variable impedance safety interaction method improves interaction compliance only by adjusting the variable impedance parameters. It has no quantitative energy tank constraint mechanism and cannot adapt to dynamic environments.
[0008] Chinese invention patent CN119734261B discloses an adaptive force-position hybrid control method based on an energy tank, comprising: redesigning the update rate of the energy tank state based on a barrier function to compensate for the deficiencies of the force-position hybrid control, and ensuring that the energy tank can adjust the system's passivity according to changes in impedance parameters and null space projection. Furthermore, based on a hierarchical decoupling model, an adaptive force-position hybrid controller is designed, ensuring the passivity of the closed-loop system. By dynamically optimizing the accuracy of force and position control according to the real-time state of the robot system, the method can achieve independent...
[0009] It meets high-precision requirements during operation tasks, enhances compliance when human-robot interaction or contact with a surface is unexpectedly interrupted, and ensures interaction safety and adaptability.
[0010] However, the aforementioned energy-tank-based adaptive force-position hybrid control method uses only a single energy tank, adjusting the system's passivity according to changes in impedance parameters and null-space projection. It cannot apply differentiated energy constraint strategies to the robotic arm, resulting in poor energy constraint adaptability.
[0011] Therefore, the robotic arms in the aforementioned existing technologies still suffer from technical defects such as insufficient safety, poor environmental adaptability, and poor energy constraint adaptability in dynamic human-computer interaction scenarios.
Summary of the Invention
[0012] Therefore, the purpose of this invention is to provide a safe interactive control method for robotic arms, which can improve the safety of robotic arms in dynamic human-computer interaction scenarios, has the ability to adapt to dynamic environments, and can design differentiated energy constraint strategies to solve the problem of poor adaptability of single energy constraints.
[0013] To achieve the aforementioned objectives, the technical solution adopted by the robotic arm safety interactive control method of this application is as follows:
[0014] A method for safe interactive control of a robotic arm, comprising:
[0015] Collect the joint dynamics parameters of the robotic arm;
[0016] In the dual-modal energy tank constraint mechanism, the contact state of the robotic arm is determined from the joint dynamic parameters, and the bimodal energy release coefficient is dynamically switched according to that contact state to achieve differentiated energy constraints.
[0017] The joint dynamics parameters are optimized in real time by setting a target reward function using a DQN-PPO hybrid reinforcement learning optimizer.
[0018] The constraint terms of the dual-modal energy tank constraint mechanism and the optimized joint dynamic parameters are input into the dynamic equation of the robotic arm to solve for the energy-constrained total joint control torque of the robotic arm.
[0019] The robotic arm is driven to move according to the total control torque of the joint.
[0020] Furthermore, the joint dynamic parameters include the end-effector contact force of the robotic arm and a contact state recognition threshold, wherein the contact state includes a stable contact state and a contact loss state, and the bimodal energy release coefficient is defined as:
[0021] ;
[0022] In the formula, the terms denote, respectively: the bimodal energy release coefficient; the end-effector contact force; the system energy; the system energy safety threshold; and the contact state recognition threshold.
[0023] Furthermore, the joint dynamic parameters include the joint pose, joint velocity, end-effector contact force, system energy, trajectory tracking error, force control PID parameters, impedance stiffness, damping parameters, and contact state recognition threshold of the robotic arm; the joint dynamic parameters are optimized in real time using a DQN-PPO hybrid reinforcement learning optimizer, by setting a target reward function, including:
[0024] Construct a DQN-PPO hybrid reinforcement learning optimizer;
[0025] Using the joint pose, joint velocity, end contact force, system energy, and trajectory tracking error as the state space, and the force control PID parameters, impedance stiffness, damping parameters, and contact state recognition threshold as the action space, the multi-objective reward function is designed to optimize the joint dynamic parameters.
[0026] Preferably, the multi-objective reward function is defined as:
[0027] In the formula, R is the multi-objective reward function, and the remaining terms denote, respectively: the reference contact force; the end-effector contact force; the contact force tracking error; the end-effector trajectory tracking error; the system energy; the system energy safety threshold; the first, second, third, and fourth weighting coefficients; the bimodal energy release coefficient at the current time step; the bimodal energy release coefficient at the previous time step; and the deviation between these two coefficient values.
[0028] Furthermore, the optimized joint dynamic parameters include the optimized end-effector contact force, system energy, and contact state recognition threshold of the robotic arm. The constraint terms of the dual-modal energy tank constraint mechanism and the optimized joint dynamic parameters are input into the robotic arm dynamic equations to solve for the total energy-constrained joint control torque of the robotic arm, including:
[0029] The contact state of the robotic arm is determined based on the end-effector contact force and the contact state recognition threshold.
[0030] Based on the contact state and system energy, the force-impedance coupling weight of the robotic arm is calculated;
[0031] The total control torque of the joint is calculated based on the force-impedance coupling weight.
[0032] Preferably, the formula for calculating the dynamic equation of the robotic arm is:
[0033] ;
[0034] In the formula, ξ is the displacement vector, and the remaining terms denote, respectively: the velocity vector; the estimate of the velocity vector; the estimate of the acceleration vector; the mass matrix; the damping matrix; the stiffness matrix; the friction damping matrix; the friction state variable; the rate of change of the friction state; the energy state variable; the bimodal energy release coefficient; a very small positive constant that keeps denominators from becoming zero; the energy decay coefficient; the energy regulation term; the end-effector contact force; the rate of change of the energy state; and the contact state recognition threshold.
[0035] Preferably, the total joint control torque is calculated by a formula whose terms denote, respectively: the total joint control torque; the force control torque; the impedance control torque; and the force-impedance coupling weight.
[0036] Preferably, the contact state includes a stable contact state and a loss contact state; when the contact state is a stable contact state, the force-impedance coupling weight is 0.7, and when the contact state is a loss contact state, the force-impedance coupling weight is 0.3.
[0037] Furthermore, while or after the robotic arm is driven according to the total joint control torque, the method may further include:
[0038] Returning to the step of collecting the joint dynamic parameters of the robotic arm.
[0039] Furthermore, prior to the step of acquiring the joint dynamic parameters of the robotic arm, the following steps are also included:
[0040] Based on the DH parameter method and the Pinocchio library, the dynamic equations of the robotic arm are constructed, and the Jacobian matrix and joint dynamic parameters are solved.
[0041] Compared with the prior art, this application has at least the following technical effects:
[0042] The robotic arm safety interaction control method of this application adopts a dual-modal energy tank constraint mechanism. By distinguishing the contact state of the robotic arm, the dual-modal energy release coefficient is dynamically switched to achieve differentiated energy constraints, thereby solving the technical problem of poor adaptability caused by single energy constraints. By setting a multi-objective reward function through a DQN-PPO hybrid reinforcement learning optimizer, the joint dynamic parameters are optimized in real time, thereby realizing real-time adjustment of control parameters and adapting to different human-machine interaction scenarios. This enables the robotic arm's end-effector trajectory tracking error to be ≤ 0.05mm, demonstrating strong adaptability to dynamic environments. By using the constraint terms of the dual-modal energy tank constraint mechanism and the optimized joint dynamic parameters to calculate the total joint control torque, the dual-modal energy constraint and adaptive coupling strategy are coordinated. This not only strictly constrains the system energy within a safe range but also ensures high-precision control of the end-effector contact force and trajectory, effectively solving the problem of downward impact force after contact loss and improving the safety of the robotic arm in dynamic human-machine interaction scenarios.
Attached Figure Description
[0043] Figure 1 is a flowchart of a preferred embodiment of the robotic arm safety interactive control method of this application;
[0044] Figure 2 is a schematic diagram of a preferred embodiment of the robotic arm safety interactive control device of this application;
[0045] Figure 3 is a schematic diagram of a preferred embodiment of an electronic device according to this application.
Detailed Implementation
[0046] The present application will now be described in detail with reference to the accompanying drawings and specific embodiments. The illustrative embodiments and descriptions used herein are intended to explain the present application, but are not intended to limit the scope of the application. It should be noted that descriptions involving "first," "second," etc., in the embodiments of this application are for descriptive purposes only and should not be construed as indicating or implying their relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one of those features.
[0047] Furthermore, the technical solutions of the various embodiments can be combined with each other, but only if they are based on the ability of those skilled in the art to implement them. When the combination of technical solutions is contradictory or cannot be implemented, it should be considered that such combination of technical solutions does not exist and is not within the scope of protection claimed in this application.
[0048] The present application will be further described in detail below with reference to the accompanying drawings.
[0049] As shown in Figure 1, the robotic arm safety interactive control method provided by this application comprises:
[0050] Collect the joint dynamics parameters of the robotic arm;
[0051] In the dual-modal energy tank constraint mechanism, the contact state of the robotic arm is determined from the joint dynamic parameters, and the bimodal energy release coefficient is dynamically switched according to that contact state to achieve differentiated energy constraints.
[0052] The joint dynamics parameters are optimized in real time by setting a target reward function using a DQN-PPO hybrid reinforcement learning optimizer.
[0053] The constraint terms of the dual-modal energy tank constraint mechanism and the optimized joint dynamic parameters are input into the dynamic equation of the robotic arm to solve for the energy-constrained total joint control torque of the robotic arm.
[0054] The robotic arm is driven to move according to the total control torque of the joint.
[0055] The robotic arm safety interaction control method of this application adopts a dual-modal energy tank constraint mechanism. By distinguishing the contact state of the robotic arm, the dual-modal energy release coefficient is dynamically switched to achieve differentiated energy constraints, thereby solving the technical problem of poor adaptability caused by single energy constraints. By setting a multi-objective reward function through a DQN-PPO hybrid reinforcement learning optimizer, the joint dynamic parameters are optimized in real time, thereby realizing real-time adjustment of control parameters and adapting to different human-machine interaction scenarios. This enables the robotic arm's end-effector trajectory tracking error to be ≤ 0.05mm, demonstrating strong adaptability to dynamic environments. By using the constraint terms of the dual-modal energy tank constraint mechanism and the optimized joint dynamic parameters to calculate the total joint control torque, the dual-modal energy constraint and adaptive coupling strategy are coordinated. This not only strictly constrains the system energy within a safe range but also ensures high-precision control of the end-effector contact force and trajectory, effectively solving the problem of downward impact force after contact loss and improving the safety of the robotic arm in dynamic human-machine interaction scenarios.
[0056] In some embodiments, the joint dynamic parameters include the end-effector contact force and the contact state recognition threshold of the robotic arm, wherein the contact state includes a stable contact state and a contact loss state, and the bimodal energy release coefficient is defined as:
[0057] ;
[0058] In the formula, the terms denote, respectively: the bimodal energy release coefficient; the end-effector contact force; the system energy; the system energy safety threshold; and the contact state recognition threshold.
[0059] The contact state of the robotic arm is determined from the end-effector contact force signal. In the stable contact state, a relaxed energy constraint is applied (i.e., the bimodal energy release coefficient equals 1), which preserves control response speed; in the contact loss state, the method switches to a strict energy constraint (i.e., the coefficient takes its contact-loss value from the formula in [0057]) to suppress the downward impact force.
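A minimal Python sketch of the two-mode switching described in [0059] follows. It assumes that contact is classified as stable when the magnitude of the end-effector contact force reaches the contact state recognition threshold. Because the contact-loss expression of the coefficient is not reproduced in this text, it also assumes a hypothetical linear taper (E_max - E) / E_max clipped to [0, 1]. The names `classify_contact` and `release_coefficient` are illustrative only.

```python
from enum import Enum


class ContactState(Enum):
    STABLE = "stable"  # stable contact state
    LOST = "lost"      # contact loss state


def classify_contact(contact_force: float, threshold: float) -> ContactState:
    """Hypothetical rule: contact is stable while |F| reaches the recognition threshold."""
    return ContactState.STABLE if abs(contact_force) >= threshold else ContactState.LOST


def release_coefficient(contact_force: float, energy: float,
                        energy_max: float, threshold: float) -> float:
    """Sketch of the bimodal energy release coefficient.

    Stable contact: relaxed constraint, coefficient = 1 (as stated in the text).
    Contact loss:   strict constraint. The exact expression is not given here, so this
                    ASSUMES a linear taper (E_max - E) / E_max clipped to [0, 1].
    """
    if classify_contact(contact_force, threshold) is ContactState.STABLE:
        return 1.0
    remaining = (energy_max - energy) / energy_max
    return min(1.0, max(0.0, remaining))
```

With an 8 J threshold, for example, losing contact while 6 J is stored would release energy at a quarter of the relaxed rate under this assumed taper, and not at all once the stored energy exceeds the threshold.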
[0060] In some embodiments, the joint dynamic parameters include the joint pose, joint velocity, end-effector contact force, system energy, trajectory tracking error, force control PID parameters, impedance stiffness, damping parameters, and contact state recognition threshold of the robotic arm; the joint dynamic parameters are optimized in real time by setting a target reward function through a DQN-PPO hybrid reinforcement learning optimizer, including:
[0061] Construct a DQN-PPO hybrid reinforcement learning optimizer;
[0062] Using the joint pose, joint velocity, end contact force, system energy, and trajectory tracking error as the state space, and the force control PID parameters, impedance stiffness, damping parameters, and contact state recognition threshold as the action space, the multi-objective reward function is designed to optimize the joint dynamic parameters.
[0063] By constructing a DQN-PPO hybrid reinforcement learning optimizer, which simultaneously optimizes the joint dynamics parameters (e.g., contact state recognition threshold, force control PID parameters, etc.), the limitations of traditional reinforcement learning in optimizing only a single target are overcome, and dual intelligent optimization of "state recognition-parameter adjustment" is achieved.
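The state and action spaces listed in [0062] can be laid out as flat vectors, as in the sketch below. The action ordering, the bounds in `ACTION_LOW` and `ACTION_HIGH`, and the mapping from a [-1, 1] policy output are hypothetical. The text does not state how the DQN and PPO components divide the action space, so this sketch only shows the data layout that a policy would read and write.

```python
import numpy as np

# HYPOTHETICAL action layout and bounds; the text gives no ranges.
ACTION_NAMES = ("kp", "ki", "kd", "stiffness", "damping", "contact_threshold")
ACTION_LOW = np.array([0.0, 0.0, 0.0, 100.0, 5.0, 0.5])
ACTION_HIGH = np.array([5.0, 1.0, 0.5, 2000.0, 100.0, 5.0])


def build_state(q, dq, contact_force, energy, tracking_error) -> np.ndarray:
    """Flatten the state-space quantities named in the text into one observation vector:
    joint pose, joint velocity, end-effector contact force, system energy, tracking error."""
    parts = [np.ravel(q), np.ravel(dq), np.ravel(contact_force),
             np.ravel(energy), np.ravel(tracking_error)]
    return np.concatenate([np.asarray(p, dtype=float) for p in parts])


def decode_action(raw) -> dict:
    """Map a policy output in [-1, 1]^6 to physical parameter values (clipped to bounds)."""
    raw = np.clip(np.asarray(raw, dtype=float), -1.0, 1.0)
    values = ACTION_LOW + 0.5 * (raw + 1.0) * (ACTION_HIGH - ACTION_LOW)
    return dict(zip(ACTION_NAMES, values.tolist()))
```

For the seven-DOF arm with a six-axis force sensor and a 3-D position error, the state vector has 7 + 7 + 6 + 1 + 3 = 24 entries.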
[0064] In some embodiments, the multi-objective reward function is defined as:
[0065] In the formula, R is the multi-objective reward function, and the remaining terms denote, respectively: the reference contact force; the end-effector contact force; the contact force tracking error; the end-effector trajectory tracking error; the system energy; the system energy safety threshold; the first, second, third, and fourth weighting coefficients; the bimodal energy release coefficient at the current time step; the bimodal energy release coefficient at the previous time step; and the deviation between these two coefficient values.
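The reward formula itself is not reproduced in the text. The following sketch therefore assumes one plausible reading of the listed terms: a negative weighted sum of the contact force tracking error, the trajectory tracking error, the energy in excess of the safety threshold, and the change in the bimodal energy release coefficient between time steps. The function name and default weights are hypothetical.

```python
def multi_objective_reward(f_ref, f_end, traj_error, energy, energy_max,
                           lam_now, lam_prev, w=(1.0, 1.0, 1.0, 1.0)) -> float:
    """ASSUMED weighted-penalty form of the multi-objective reward (not the patent's formula)."""
    w1, w2, w3, w4 = w                              # first to fourth weighting coefficients
    force_error = abs(f_ref - f_end)                # contact force tracking error
    energy_excess = max(0.0, energy - energy_max)   # penalise only energy above the threshold
    lam_deviation = abs(lam_now - lam_prev)         # discourage chattering between the two modes
    return -(w1 * force_error + w2 * abs(traj_error)
             + w3 * energy_excess + w4 * lam_deviation)
```

The last term penalizes rapid toggling of the release coefficient, which is one way to read why the text includes the deviation between consecutive coefficient values.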
[0066] In some embodiments, the optimized joint dynamic parameters include the optimized end-effector contact force, system energy, and contact state recognition threshold of the robotic arm. The constraint terms of the dual-modal energy tank constraint mechanism and the optimized joint dynamic parameters are input into the robotic arm dynamic equations to solve for the total energy-constrained joint control torque of the robotic arm, including:
[0067] The contact state of the robotic arm is determined based on the end-effector contact force and the contact state recognition threshold.
[0068] Based on the contact state and system energy, the force-impedance coupling weight of the robotic arm is calculated;
[0069] The total control torque of the joint is calculated based on the force-impedance coupling weight.
[0070] Specifically, the calculation formula for the robotic arm's dynamic equation is as follows:
[0071] ;
[0072] In the formula, ξ is the displacement vector, and the remaining terms denote, respectively: the velocity vector; the estimate of the velocity vector; the estimate of the acceleration vector; the mass matrix; the damping matrix; the stiffness matrix; the friction damping matrix; the friction state variable; the rate of change of the friction state; the energy state variable; the bimodal energy release coefficient; a very small positive constant that keeps denominators from becoming zero; the energy decay coefficient; the energy regulation term; the end-effector contact force; the rate of change of the energy state; and the contact state recognition threshold.
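To illustrate how an energy state variable, a release coefficient, a decay coefficient, an energy regulator, and a small positive constant in a denominator can fit together, the sketch below implements a generic energy tank of the kind used in passivity-based control. It is not the patent's own equation, which is not reproduced here. It assumes stored energy T = x^2 / 2, that energy dissipated by impedance damping refills the tank, that the regulator stops refilling once T reaches the safety threshold, and that the release coefficient caps how much of the force controller's requested power may be drawn. All names are hypothetical.

```python
def tank_step(x, p_diss, p_req, lam, dt, energy_max, gamma=0.0, eps=1e-6):
    """One explicit-Euler step of a generic energy tank (ASSUMED form, not the patent's).

    x          : tank state; stored energy is T = 0.5 * x**2
    p_diss     : power dissipated by impedance damping (>= 0), stored back in the tank
    p_req      : power the active (force) controller wants to inject
    lam        : bimodal energy release coefficient in [0, 1]
    gamma      : energy decay coefficient
    eps        : small positive constant keeping the denominator away from zero
    Returns (new_x, scale), where scale in [0, 1] multiplies the active control torque.
    """
    energy = 0.5 * x * x
    sigma = 1.0 if energy < energy_max else 0.0   # regulator: stop refilling at the threshold
    p_allowed = lam * max(p_req, 0.0)             # release limited by the coefficient
    p_allowed = min(p_allowed, energy / dt)       # never request more than is stored
    scale = 1.0 if p_req <= 0.0 else p_allowed / p_req
    # T' = x * x' is approximately sigma * p_diss - p_allowed - gamma * x**2
    x_dot = (sigma * p_diss - p_allowed) / (x + eps) - gamma * x
    return max(x + dt * x_dot, 0.0), scale
```

In the contact loss state, a smaller coefficient directly shrinks `scale`, so the force controller is throttled before the tank can pour stored energy into a sudden re-contact.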
[0073] Specifically, the total joint control torque is calculated by a formula whose terms denote, respectively: the total joint control torque; the force control torque; the impedance control torque; and the force-impedance coupling weight.
[0074] Specifically, the contact state includes a stable contact state and a contact loss state. In this embodiment, the force-impedance coupling weight is preferably 0.7 in the stable contact state and 0.3 in the contact loss state. In other embodiments, the weight may be set to 0.8 in the stable contact state and 0.2 in the contact loss state. More generally, users may set the weights for the two contact states according to the actual usage scenario of the robotic arm; as those skilled in the art will understand, any setting in which the weights for the stable and lost contact states sum to 1 falls within the protection scope of the robotic arm safety interaction control method of this application.
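The formula for the total joint control torque is not reproduced in the text. The sketch below assumes a convex blend, tau = a * tau_force + (1 - a) * tau_imp, where a is the force-impedance coupling weight selected by contact state (0.7 or 0.3, as in this embodiment). It further assumes that the force control torque comes from a PID-corrected force command with reference feedforward and that the impedance torque follows a standard Cartesian impedance law. `ForcePID`, `impedance_torque`, and `total_torque` are hypothetical names.

```python
import numpy as np

# Preferred-embodiment weights from the text; the blend form below is an ASSUMPTION.
COUPLING_WEIGHTS = {"stable": 0.7, "lost": 0.3}


class ForcePID:
    """Force PID with reference feedforward (hypothetical structure)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # skip the derivative term on the first call (no kick)

    def wrench(self, f_ref, f_meas):
        """Return the commanded Cartesian force: F_ref + Kp*e + Ki*int(e) + Kd*de/dt."""
        f_ref = np.asarray(f_ref, dtype=float)
        error = f_ref - np.asarray(f_meas, dtype=float)
        self.integral = self.integral + error * self.dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return f_ref + self.kp * error + self.ki * self.integral + self.kd * deriv


def impedance_torque(J, K, D, x_err, v_err):
    """Cartesian impedance law tau_imp = J^T (K e_x + D e_v)."""
    return J.T @ (K @ np.asarray(x_err, dtype=float) + D @ np.asarray(v_err, dtype=float))


def total_torque(tau_force, tau_imp, contact_state, weights=COUPLING_WEIGHTS):
    """Convex blend tau = a * tau_force + (1 - a) * tau_imp, with a chosen by contact state."""
    a = weights[contact_state]
    return a * np.asarray(tau_force, dtype=float) + (1.0 - a) * np.asarray(tau_imp, dtype=float)
```

Under this assumption the force controller would be mapped to joint space in the same transpose-Jacobian way, `tau_force = J.T @ pid.wrench(f_ref, f_meas)`, so losing contact shifts authority from force tracking toward the compliant impedance behavior.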
[0075] In some embodiments, while or after the robotic arm is driven according to the total joint control torque, the method may further include:
[0076] Returning to the step of collecting the joint dynamic parameters of the robotic arm.
[0077] By returning to the step of collecting the joint dynamic parameters of the robotic arm, real-time status signals are fed back, forming an automatic closed-loop control.
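The closed loop described here amounts to repeating acquire, optimize, compute, and actuate on every cycle. The skeleton below shows that ordering with injected callables. The function name, the callables, and the toy one-dimensional plant used to exercise it are hypothetical stand-ins for the sensor, optimizer, torque, and actuator stages.

```python
def run_control_loop(read_sensors, optimize, compute_torque, send_torque, n_cycles):
    """Hypothetical closed loop: acquire -> optimize -> compute torque -> actuate -> repeat."""
    history = []
    for _ in range(n_cycles):
        state = read_sensors()               # collect the joint dynamic parameters
        params = optimize(state)             # RL optimizer outputs updated parameters
        tau = compute_torque(state, params)  # energy-constrained total joint torque
        send_torque(tau)                     # drive the arm; the next cycle re-reads sensors
        history.append(tau)
    return history


# Toy usage: a 1-D "plant" whose position is driven towards zero.
plant = {"pos": 1.0}
taus = run_control_loop(
    read_sensors=lambda: plant["pos"],
    optimize=lambda s: 0.5,                  # fixed gain stands in for the RL optimizer
    compute_torque=lambda s, k: -k * s,
    send_torque=lambda tau: plant.__setitem__("pos", plant["pos"] + tau),
    n_cycles=3,
)
```

Passing the stages in as callables keeps the loop order fixed while allowing simulation stubs to be swapped for real sensor and EtherCAT interfaces.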
[0078] In some embodiments, prior to the step of acquiring the joint dynamics parameters of the robotic arm, the method further includes:
[0079] Based on the DH parameter method and the Pinocchio library, the dynamic equations of the robotic arm are constructed, and the Jacobian matrix and joint dynamic parameters are solved, thus providing a foundation for the control algorithm of the robotic arm safety interactive control method of this application.
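The text builds the kinematic model with DH parameters and the Pinocchio library. As a library-free illustration of the same DH-based step, the sketch below uses plain NumPy with the classic (distal) DH convention to compute the end-effector pose and the 6xN geometric Jacobian of an all-revolute arm. The DH table of the seven-DOF arm is not given, so a planar two-link arm serves as the example. The function names are hypothetical, and this is not Pinocchio's API.

```python
import numpy as np


def dh_transform(theta, d, a, alpha):
    """Classic (distal) DH homogeneous transform: Rz(theta) Tz(d) Tx(a) Rx(alpha)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca, st * sa, a * ct],
                     [st, ct * ca, -ct * sa, a * st],
                     [0.0, sa, ca, d],
                     [0.0, 0.0, 0.0, 1.0]])


def geometric_jacobian(q, dh_table):
    """Return (J, T): 6xN geometric Jacobian and end-effector pose for revolute joints.

    dh_table rows are (d, a, alpha); joint angle q[i] is the DH theta of row i.
    """
    T = np.eye(4)
    origins, axes = [T[:3, 3].copy()], [T[:3, 2].copy()]
    for qi, (d, a, alpha) in zip(q, dh_table):
        T = T @ dh_transform(qi, d, a, alpha)
        origins.append(T[:3, 3].copy())
        axes.append(T[:3, 2].copy())
    p_end = origins[-1]
    J = np.zeros((6, len(q)))
    for i in range(len(q)):
        J[:3, i] = np.cross(axes[i], p_end - origins[i])  # linear velocity part
        J[3:, i] = axes[i]                                 # angular velocity part
    return J, T


# Example: planar two-link arm with unit link lengths.
planar = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
J0, T0 = geometric_jacobian([0.0, 0.0], planar)
```

For the stretched-out pose the end effector sits at (2, 0, 0), and the linear Jacobian columns are (0, 2, 0) and (0, 1, 0), matching the hand calculation r x z for each joint.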
[0080] As shown in Figure 2, the robotic arm safety interactive control device of this application comprises:
[0081] The parameter acquisition module is used to collect the joint dynamic parameters of the robotic arm;
[0082] The differentiated energy constraint module is used to determine, within the dual-modal energy tank constraint mechanism, the contact state of the robotic arm from the joint dynamic parameters, and to dynamically switch the bimodal energy release coefficient according to that contact state, thereby realizing differentiated energy constraints.
[0083] The parameter optimization module is used to set the target reward function and optimize the joint dynamics parameters in real time using the DQN-PPO hybrid reinforcement learning optimizer.
[0084] The torque calculation module is used to input the constraint terms of the dual-mode energy tank constraint mechanism and the optimized joint dynamic parameters into the dynamic equation of the robotic arm to calculate the total joint control torque of the robotic arm with energy constraints.
[0085] An execution module is used to drive the robotic arm to move according to the total control torque of the joint.
[0086] In some embodiments, the robotic arm safety interactive control device of this application further includes a feedback module, which is used to connect to the parameter acquisition module to realize real-time status signal feedback and form automatic closed-loop control.
[0087] In some embodiments, the robotic arm safety interactive control device of this application further includes a modeling module, which is used to construct the robotic arm dynamic equations based on the DH parameter method and the Pinocchio library, and solve the Jacobian matrix and joint dynamic parameters, thereby providing a basis for the control algorithm of the robotic arm safety interactive control device of this application.
[0088] The various modules of the aforementioned robotic arm safety interactive control device can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device in hardware form, or stored in the memory of a computer device in software form, so that the processor can call and execute the operations corresponding to each module. Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the above-described division of functional units and modules is merely an example. In practical applications, the above functions can be assigned to different functional units and modules as needed, that is, the internal structure of the device described in this application can be divided into different functional units or modules to complete all or part of the functions described above.
[0089] This application also provides a robotic arm safety interactive control system, including a perception layer, a state recognition layer, a hybrid reinforcement learning optimization layer, a control layer, and an execution layer; the perception layer is used to collect contact state signals of the robotic arm, the state recognition layer is used to determine the contact state and monitor system energy, the hybrid reinforcement learning optimization layer is used to output real-time optimized joint dynamic parameters, the control layer is used to integrate a dual-modal energy tank module and a force-impedance adaptive coupling controller, and the execution layer is used to drive the robotic arm to move and feed back state signals; the system is used to implement the steps of the robotic arm safety interactive control method described in the above embodiments.
[0090] The following details the specific implementation process of the robotic arm safety interactive control system of this application:
[0091] (I) Hardware Setup
[0092] Robotic arm: A seven-degree-of-freedom serial industrial robotic arm is adopted, equipped with joint photoelectric encoders (for collecting joint pose and joint speed information) and a six-dimensional force sensor at the end (for collecting end contact force of the robotic arm).
[0093] Control hardware: An industrial PC running Linux+ROS (CPU clock frequency ≥ 3.0 GHz, memory ≥ 16 GB) serves as the control core and performs the algorithm computations;
[0094] Software tools: Matlab or Simulink are used to build simulation models, the Pinocchio library is used for joint dynamic parameters identification, and the PyTorch framework is used to train and deploy reinforcement learning models;
[0095] Communication method: The industrial control computer and the robotic arm servo driver communicate via EtherCAT bus with a communication frequency of ≥1kHz to ensure real-time control.
[0096] (II) Parameter Initialization
[0097] Impedance parameters: Impedance stiffness Damping parameters ;
[0098] Force control PID parameters: proportional coefficient Differential coefficients Integral coefficient Force feedforward compensation term ;
[0099] Energy tank parameters: Energy safety threshold Energy attenuation coefficient Minimal positive number ;
[0100] Reinforcement learning parameters: learning rate Discount factor First weighting coefficient Second weighting coefficient Third weighting coefficient Fourth weighting coefficient ;
[0101] Coupling weights: force-impedance coupling weight 0.7 in the stable contact state and 0.3 in the contact loss state.
[0102] (III) Control Process
[0103] 1. Simulation training: Build a seven-DOF robotic arm simulation model in Matlab or Simulink, set different contact forces and external disturbance scenarios, and train the DQN-PPO hybrid reinforcement learning optimizer until the multi-objective reward function converges;
[0104] 2. Parameter Deployment: Deploy the trained DQN-PPO hybrid reinforcement learning optimizer to a Linux+ROS industrial control computer and initialize the relevant parameters of impedance, force control, and energy tank;
[0105] 3. Signal Acquisition: Acquire the joint dynamic parameters of the robotic arm in real time via sensors, such as joint pose and velocity, end-effector contact force, and trajectory tracking error, and transmit them to the industrial control computer;
[0106] 4. Status Recognition: Determine the contact state of the robotic arm based on the end-effector contact force signal, calculate the system energy in real time, and compare it with the system energy safety threshold;
[0107] 5. Parameter optimization: The DQN-PPO hybrid reinforcement learning optimizer outputs real-time optimized joint dynamic parameters based on the acquired signals;
[0108] 6. Torque calculation: The force-impedance coupling torque is calculated based on the optimized joint dynamic parameters, and the total joint control torque is obtained by combining the dual-modal energy constraint term;
[0109] 7. Closed-loop operation: The industrial control computer sends the control torque to the servo driver through the EtherCAT bus to drive the robotic arm to move. At the same time, the sensors provide real-time feedback of status signals to achieve closed-loop control.
[0110] (IV) Experimental Results
[0111] The robotic arm safety interaction control method of this application was applied to a human-machine collaborative assembly scenario with a seven-degree-of-freedom industrial robotic arm. The experimental results show the following:
[0112] When the robotic arm tracks a sinusoidal trajectory, the end-effector position error is ≤ 0.05 mm and the contact force tracking error is ≤ 0.3 N;
[0113] When contact is re-established after contact loss, the peak end-effector contact force is ≤ 5 N, and no downward impact force occurs;
[0114] The system energy always remains within the 8 J safety threshold, and no energy runaway occurs;
[0115] Control performance shows no significant degradation under external force disturbances of ±5 N and 10% perturbations of the robot arm parameters, demonstrating good robustness.
[0116] As shown in Figure 3, this application provides an electronic device, which may be a server. The electronic device includes a processor, a memory, and a communication interface connected via a system bus. The processor provides computing and control capabilities. The memory may be implemented by any type of volatile or non-volatile storage device or a combination thereof, including but not limited to magnetic disks, optical disks, EEPROM, EPROM, SRAM, ROM, magnetic memory, flash memory, and PROM. The memory provides an environment for running the operating system and the computer programs stored in it. The communication interface is a network interface used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements the steps of the robotic arm safety interactive control method described in the above embodiments.
[0117] This application also provides a computer-readable storage medium storing a computer program or instructions that, when executed by a processor, implement the steps of the robotic arm safety interactive control method described in the above embodiments. The computer-readable storage medium includes, but is not limited to, ROM, RAM, CD-ROM, magnetic disks, and floppy disks.
[0118] This application also provides a computer program product, including a computer program or instructions that, when executed by a processor, implement the steps of the robotic arm safety interactive control method described in the above embodiments.
[0119] The embodiments of this application have been described above with reference to the accompanying drawings. However, this application is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Guided by this application, those skilled in the art can devise many other forms without departing from the spirit of this application and the scope of the claims, and all such forms fall within the protection scope of this application.
Claims
1. A safe interactive control method for a robotic arm, characterized by comprising: collecting the joint dynamic parameters of the robotic arm; dynamically switching the dual-modal energy release coefficient according to the contact state of the robotic arm in a dual-modal energy tank constraint mechanism, the contact state being determined from the joint dynamic parameters; optimizing the joint dynamic parameters in real time with a DQN-PPO hybrid reinforcement learning optimizer by setting a target reward function; inputting the constraint terms of the dual-modal energy tank constraint mechanism and the optimized joint dynamic parameters into the dynamic equation of the robotic arm to solve for the energy-constrained total joint control torque of the robotic arm; and driving the robotic arm to move according to the total joint control torque.
2. The robotic arm safety interactive control method according to claim 1, characterized in that the joint dynamic parameters include the end-effector contact force and the contact state recognition threshold of the robotic arm; the contact state includes a stable contact state and a contact loss state; and the dual-modal energy release coefficient is defined by a formula relating it to the end-effector contact force, the system energy, the system energy safety threshold, and the contact state recognition threshold.
3. The robotic arm safety interactive control method according to claim 1, characterized in that the joint dynamic parameters include the joint pose, joint velocity, end-effector contact force, system energy, trajectory tracking error, force control PID parameters, impedance stiffness, damping parameters, and contact state recognition threshold of the robotic arm. Optimizing the joint dynamic parameters in real time by setting a target reward function with the DQN-PPO hybrid reinforcement learning optimizer comprises: constructing the DQN-PPO hybrid reinforcement learning optimizer; and designing a multi-objective reward function to optimize the joint dynamic parameters, taking the joint pose, joint velocity, end-effector contact force, system energy, and trajectory tracking error as the state space, and the force control PID parameters, impedance stiffness, damping parameters, and contact state recognition threshold as the action space.
4. The robotic arm safety interactive control method according to claim 3, characterized in that the multi-objective reward function R is defined by two formulas whose variables are: the reference contact force; the end-effector contact force; the contact force tracking error; the end-effector trajectory tracking error; the system energy; the system energy safety threshold; the first, second, third, and fourth weighting coefficients; the dual-modal energy release coefficient at the current moment; the dual-modal energy release coefficient at the previous moment; and the deviation of the dual-modal energy release coefficient between these two moments.
5. The robotic arm safety interactive control method according to claim 1, characterized in that the optimized joint dynamic parameters include the optimized end-effector contact force, system energy, and contact state recognition threshold of the robotic arm. Inputting the constraint terms of the dual-modal energy tank constraint mechanism and the optimized joint dynamic parameters into the dynamic equation of the robotic arm to solve for the energy-constrained total joint control torque comprises: determining the contact state of the robotic arm based on the end-effector contact force and the contact state recognition threshold; calculating the force-impedance coupling weight of the robotic arm based on the contact state and the system energy; and calculating the total joint control torque based on the force-impedance coupling weight.
6. The robotic arm safety interactive control method according to claim 5, characterized in that the dynamic equation of the robotic arm is given by a formula in which ξ is the displacement vector. The remaining terms are the velocity vector, the estimated velocity vector, the estimated acceleration vector, the mass matrix, the damping matrix, the stiffness matrix, the friction damping matrix, the friction state variable and its rate of change, the energy state variable and its rate of change, the dual-modal energy release coefficient, an extremely small positive number that prevents denominators from being zero, the energy decay coefficient, the energy regulator, the end-effector contact force, and the contact state recognition threshold.
7. The robotic arm safety interactive control method according to claim 5, characterized in that the total joint control torque is calculated by a formula whose terms are the total joint control torque, the force control torque, the impedance control torque, and the force-impedance coupling weight.
8. The robotic arm safety interactive control method according to claim 7, characterized in that the contact state includes a stable contact state and a contact loss state; the force-impedance coupling weight is 0.7 in the stable contact state and 0.3 in the contact loss state.
9. The robotic arm safety interactive control method according to any one of claims 1-8, characterized in that, after the step of driving the robotic arm to move according to the total joint control torque, the method further comprises: returning to the step of collecting the joint dynamic parameters of the robotic arm.
10. The robotic arm safety interactive control method according to any one of claims 1-8, characterized in that, before the step of collecting the joint dynamic parameters of the robotic arm, the method further comprises: constructing the dynamic equations of the robotic arm based on the DH parameter method and the Pinocchio library, and solving for the Jacobian matrix and the joint dynamic parameters.
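Claim 2 names the quantities that define the dual-modal energy release coefficient but does not reproduce its formula. The rule below is only an illustrative switching rule built from the same quantities. It is not the patented formula, and both branches are assumptions chosen to show the switching idea. In stable contact, the tank releases freely while the system energy is below the safety threshold. After contact loss, the release coefficient tapers linearly as the stored energy approaches that threshold.

```python
import math
from typing import Sequence


def release_coefficient(contact_force: Sequence[float], force_threshold: float,
                        system_energy: float, energy_threshold: float) -> float:
    """Illustrative dual-mode switching rule (assumed, not the claimed formula)."""
    if energy_threshold <= 0:
        raise ValueError("energy threshold must be positive")
    force_norm = math.hypot(*contact_force)
    if force_norm >= force_threshold:
        # Stable-contact mode: release freely until the safety threshold is reached.
        return 1.0 if system_energy < energy_threshold else 0.0
    # Contact-loss mode: taper release linearly as stored energy approaches the threshold.
    return min(1.0, max(0.0, 1.0 - system_energy / energy_threshold))
```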
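Claim 3 fixes which quantities form the reinforcement learning state space and which form the action space. The encoding below is a minimal sketch. It assumes scalar impedance stiffness and damping and a flat vector layout chosen for illustration; the class names and the ordering are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RLState:
    """State space of claim 3 (layout is an illustrative assumption)."""
    joint_pose: Tuple[float, ...]
    joint_velocity: Tuple[float, ...]
    contact_force: Tuple[float, ...]
    system_energy: float
    tracking_error: Tuple[float, ...]

    def to_vector(self) -> List[float]:
        # Flatten in a fixed order so the policy always sees the same layout.
        return [*self.joint_pose, *self.joint_velocity, *self.contact_force,
                self.system_energy, *self.tracking_error]


@dataclass
class RLAction:
    """Action space of claim 3, assuming scalar stiffness and damping."""
    kp: float                 # force control PID gains
    ki: float
    kd: float
    stiffness: float          # impedance stiffness
    damping: float            # impedance damping
    contact_threshold: float  # contact state recognition threshold

    @classmethod
    def from_vector(cls, v: Sequence[float]) -> "RLAction":
        if len(v) != 6:
            raise ValueError("expected 6 action components")
        return cls(*v)
```

For a seven-joint arm with a three-component force and a three-component tracking error, this layout gives a 21-element state vector.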
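Claim 4 lists the terms of the multi-objective reward R but not how they are combined. The sketch assumes a common pattern, a negative weighted sum of four penalties, rather than taking the combination from the patent. The four penalties cover:
- the contact force tracking error;
- the trajectory tracking error;
- any energy above the safety threshold;
- the change in the release coefficient between consecutive moments, which discourages rapid mode switching.

The function name is hypothetical.

```python
from typing import Sequence


def multi_objective_reward(f_ref: float, f_e: float, traj_error: float,
                           energy: float, energy_threshold: float,
                           alpha_now: float, alpha_prev: float,
                           weights: Sequence[float]) -> float:
    """Assumed form: R = -(w1|e_F| + w2|e_p| + w3*max(0, E - E_max) + w4|delta_alpha|)."""
    w1, w2, w3, w4 = weights
    e_force = f_ref - f_e                        # contact force tracking error
    energy_excess = max(0.0, energy - energy_threshold)
    delta_alpha = alpha_now - alpha_prev         # deviation of the release coefficient
    return -(w1 * abs(e_force) + w2 * abs(traj_error)
             + w3 * energy_excess + w4 * abs(delta_alpha))
```

With all four weights set to 1, a 1 N force error, a 0.5 mm trajectory error, 1 J of excess energy and a 0.25 change in the coefficient give R = -2.75.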
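The energy-state variables in claim 6 resemble the energy tanks used in passivity-based control, where a tank state x stores the energy T = x²/2. The update below is a hypothetical sketch, not the patented equation. It makes four assumptions:
- the release coefficient scales the power the tank hands to the controller;
- a linear decay term is included;
- the small positive number guards the division by x;
- the stored energy is clipped at the safety threshold.

The function and argument names are illustrative.

```python
import math
from typing import Tuple


def energy_tank_step(x: float, power_in: float, power_out: float, alpha: float,
                     decay: float, eps: float, energy_max: float,
                     dt: float) -> Tuple[float, float]:
    """One explicit-Euler step of a hypothetical energy tank with stored energy T = x**2 / 2.

    power_in:  power recharged into the tank (e.g. damping dissipation), W
    power_out: power the controller requests from the tank, W
    alpha:     release coefficient scaling how much of power_out is released
    """
    if eps <= 0:
        raise ValueError("eps must be a small positive number")
    # dT/dt = x * dx/dt, so dividing the net power by x (guarded by eps) gives dx/dt.
    x_dot = (power_in - alpha * power_out) / (x + eps) - decay * x
    x_new = max(0.0, x + dt * x_dot)                  # the tank cannot hold negative energy
    x_new = min(x_new, math.sqrt(2.0 * energy_max))  # keep T within the safety threshold
    return x_new, 0.5 * x_new ** 2
```

Setting alpha to a small value after contact loss would throttle the energy the tank supplies. That is one way to limit the impact when contact is re-established.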
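Claim 10 builds the arm model from DH parameters using the Pinocchio library. The sketch below does not call Pinocchio. As a dependency-free illustration, it assumes standard (not modified) DH parameters and all-revolute joints. It chains the per-link transforms and then assembles the 6×n geometric Jacobian column by column, stacking z_{i-1} × (p_e − p_{i-1}) over z_{i-1}. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dh_transform(theta: float, d: float, a: float, alpha: float) -> Matrix:
    """Standard DH link transform: Rot_z(theta) * Trans_z(d) * Trans_x(a) * Rot_x(alpha)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [[ct, -st * ca, st * sa, a * ct],
            [st, ct * ca, -ct * sa, a * st],
            [0.0, sa, ca, d],
            [0.0, 0.0, 0.0, 1.0]]


def matmul4(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def forward_kinematics(dh_rows: Sequence[Tuple[float, float, float]],
                       q: Sequence[float]) -> List[Matrix]:
    """Return base-frame transforms of frames 0..n; dh_rows holds (d, a, alpha) per joint."""
    if len(dh_rows) != len(q):
        raise ValueError("one DH row per joint is required")
    frames = [[[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]]
    for (d, a, alpha), theta in zip(dh_rows, q):
        frames.append(matmul4(frames[-1], dh_transform(theta, d, a, alpha)))
    return frames


def cross(u: Sequence[float], v: Sequence[float]) -> List[float]:
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]


def geometric_jacobian(dh_rows: Sequence[Tuple[float, float, float]],
                       q: Sequence[float]) -> Matrix:
    """6 x n geometric Jacobian (linear rows first) for an all-revolute arm."""
    frames = forward_kinematics(dh_rows, q)
    p_e = [frames[-1][r][3] for r in range(3)]      # end-effector position
    n = len(q)
    J = [[0.0] * n for _ in range(6)]
    for i in range(n):
        z = [frames[i][r][2] for r in range(3)]     # axis of joint i+1 (z of frame i)
        p = [frames[i][r][3] for r in range(3)]     # origin of frame i
        jv = cross(z, [p_e[r] - p[r] for r in range(3)])
        for r in range(3):
            J[r][i] = jv[r]
            J[r + 3][i] = z[r]
    return J
```

For a planar two-link arm with unit link lengths at q = (0, 0), the end effector sits at (2, 0, 0). The y-velocity entries of the Jacobian are then 2 and 1. In practice, Pinocchio would compute the same quantities from a robot description, together with the dynamic terms.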