A robot shaft hole assembly method and system based on guided teaching reinforcement learning

CN118528255B · Active · Publication Date: 2026-06-23 · QINGDAO UNIV OF TECH +3


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: QINGDAO UNIV OF TECH
Filing Date: 2024-05-11
Publication Date: 2026-06-23


IPC: B25J9/16

AI Tagging

Application Domain

Programme-controlled manipulator

Technology Topics

Simulation; Myoelectric arm


Abstract

The application discloses a robot shaft hole assembly method and system based on guided teaching reinforcement learning. The method comprises the following steps: a robot performs an assembly operation using a reinforcement learning method while a human expert wearing an electromyography (EMG) armband monitors the assembly process and progress; a control center receives sEMG signals in real time and compares them with a set threshold; when the sEMG signal is not less than the set threshold, the control center controls the robot to perform the assembly task using a guided teaching method, and when the sEMG signal is lower than the set threshold, the control center controls the robot to perform the assembly task using the reinforcement learning method; the control center integrates the decision processes and outputs of the guided teaching method and the reinforcement learning method into a unified assembly skill framework, which selects the reinforcement learning method or the guided teaching method according to input parameters. The application can effectively reduce the training cost of reinforcement learning through guided teaching, and gives the robot better adjustment and adaptation capabilities.


Description

Technical Field

[0001] This invention belongs to the field of robot assembly technology, and specifically relates to a robot shaft hole assembly method and system based on guided teaching reinforcement learning.

Background Technology

[0002] Shaft and hole assembly is a classic task for robots. Based on whether the shaft and hole are in contact during the assembly process, this task can be divided into non-contact and contact methods. Contact methods often employ tactile sensing control, based on force feedback control. Force sensors are installed to acquire force and torque information from the robot's end effector, and feedback control is used to control the robot's movement, thus achieving shaft and hole assembly. This method suffers from poor safety performance, complex assembly process, and low work efficiency. Non-contact methods generally use vision alignment to complete shaft and hole assembly. A camera at the robot's end effector acquires the positional information of the shaft, hole, and workpiece, thus achieving shaft and hole assembly. However, the accuracy is limited by factors such as camera precision and ambient light.

[0003] To address the safety and versatility issues in contact and non-contact shaft-hole assembly, Chinese invention patent application CN117086877A discloses a robot shaft-hole assembly method, apparatus, and device based on deep reinforcement learning. The method includes the following steps: constructing a robot state set and action set; inputting the state set and action set into a trained deep reinforcement learning-based shaft-hole assembly algorithm model, outputting a selected assembly action; executing the assembly action to complete the shaft-hole assembly, recording the state at the completion of the shaft-hole assembly, using this state as input for the next state, and selecting the next assembly action.

[0004] Reinforcement learning (RL) has enormous potential for application in robotics. Robot assembly strategies based on reinforcement learning offer an automated and autonomous solution for assembly tasks. Reinforcement learning is an intelligent algorithm that employs autonomous learning, learning the appropriate actions to take in various states through interaction with the environment to maximize cumulative rewards. Through training with reinforcement learning algorithms, robots can learn and master efficient assembly skills over time, thereby completing accurate shaft and hole assembly tasks. This approach provides a new technological pathway for production line automation and can significantly reduce the prior knowledge and human intervention required for assembly tasks. In assembly tasks, robots acquire independent exploration and perception capabilities through reinforcement learning, enabling them to determine the position of the workpiece, the robot's posture, and other state information, and select appropriate actions based on this information to complete the assembly task, such as moving or rotating the end effector and the workpiece.

[0005] However, training physical robots using reinforcement learning in real-world environments suffers from inefficiency; to achieve acceptable results, the robot agent requires significant training time and resources. Furthermore, as the state space increases in size and complexity, the training cost for finding the optimal policy also rises. Additionally, traditional autonomous robot operating systems cannot meet the needs of all production tasks. In reinforcement learning models, the robot agent forms a closed loop with the environment, relying entirely on sparse rewards for decision-making and action selection. This can lead to task failures under changing conditions, particularly in tasks with environmental and state uncertainty. Human experience and knowledge are crucial; human involvement and guidance enable the robot to adapt and adjust better.

Summary of the Invention

[0006] This invention provides a robot shaft hole assembly method and system based on guided teaching reinforcement learning. It obtains human intention, variable impedance strategy and assembly state evaluation index from the sEMG signal generated during the human teaching assembly process, and integrates guided teaching strategy and reinforcement learning strategy. It aims to solve the problems of high training cost and insufficient adjustment and adaptation ability in the prior art.

[0007] To solve the above-mentioned technical problems, the shaft hole assembly method proposed in this invention includes the following steps:

[0008] S1: The robot uses reinforcement learning to perform assembly operations, while human experts wear electromyography armbands to monitor the assembly process and progress.

[0009] S2: The control center receives the sEMG signal in real time and compares it with the set threshold. When the sEMG signal is not less than the set threshold, the control center controls the robot to perform the assembly task using the guided teaching method. When the sEMG signal is lower than the set threshold, the control center controls the robot to perform the assembly task using the reinforcement learning method.

[0010] S3: The control center integrates the decision-making process and output of the two methods, guided teaching and reinforcement learning, into a unified assembly skill framework. The assembly skill framework selects to execute either the reinforcement learning method or the guided teaching method based on the input parameters.

[0011] The guided teaching method is specifically as follows:

[0012] S21: Human experts wearing electromyography (EMG) armbands drag the robot's end effector to perform the teaching assembly process. The EMG armbands send the collected sEMG signal data and IMU Euler angle data to the control center. The robot sends the assembly contact force, torque data and robot end effector position data to the control center.

[0013] S22: The control center preprocesses the received sEMG signal data to generate smoothed sEMG data;

[0014] S23: Construct an assembly direction intention recognition model based on sEMG signals, and identify the motion direction generated by human experts performing assembly actions according to the smoothed sEMG data and state parameters;

[0015] S24: The control center calculates the damping matrix through the inverse impedance control model based on the assembly contact force, torque data and motion direction. The robot controls the movement and rotation of the end effector in the X, Y and Z axis directions based on the six-dimensional diagonal vector in the damping matrix.

[0016] Preferably, the preprocessing method is the moving average method, and the window radius of the moving average method is set to 5.

[0017] Preferably, step S23 specifically includes:

[0018] Calculate the position difference of the robot's end effector at two consecutive moments along one direction of the end effector space:

[0019] $$\Delta p = p_{t+1} - p_t$$

[0020] In the formula, $\Delta p$ is the position difference, and $p_t$ and $p_{t+1}$ are the position data of the robot's end effector at times $t$ and $t+1$, respectively;

[0021] Based on the positional changes of the end effector on the X and Y axes in the robot's base coordinate system, the magnitude and sign of the positional difference of the robot in the XY plane are used to determine the speed and direction of the robot's end effector, and eight motion direction labels are determined.

[0022] A motion direction recognition model is constructed using the support vector machine method. The model is based on sEMG signal data and IMU Euler angle data as input features and outputs the corresponding motion direction.

[0023] Preferably, the calculation method for the eight directions of motion is as follows:

[0024] $\Delta x = 0$ and $\Delta y > 0$: the direction of movement is upward;

[0025] $\Delta x > 0$ and $\Delta y > 0$: the direction of movement is to the upper right;

[0026] $\Delta x > 0$ and $\Delta y = 0$: the direction of movement is to the right;

[0027] $\Delta x > 0$ and $\Delta y < 0$: the direction of movement is to the lower right;

[0028] $\Delta x = 0$ and $\Delta y < 0$: the direction of movement is downward;

[0029] $\Delta x < 0$ and $\Delta y < 0$: the direction of movement is to the lower left;

[0030] $\Delta x < 0$ and $\Delta y = 0$: the direction of movement is to the left;

[0031] $\Delta x < 0$ and $\Delta y > 0$: the direction of movement is to the upper left;

[0032] where $\Delta x$ is the position difference in the X direction and $\Delta y$ is the position difference in the Y direction.

[0033] Preferably, the calculation process of the inverse impedance control model is as follows:

[0034] Calculate the difference between the current torque reading and the target torque to obtain the force error, and then calculate the transpose of the force error;

[0035] Calculate the damping matrix and assign the damping parameters in the damping matrix to the robot's impedance controller;

[0036] A damping parameter strategy model based on sEMG signals is constructed. The damping parameter strategy model is trained using smoothed sEMG data and IMU Euler angle data, and the trained model is saved.

[0037] Preferably, the damping matrix is calculated as follows:

[0038]

[0039] where the intermediate transformation matrix is calculated as follows:

[0040]

[0041] where the response matrix is calculated as follows:

[0042]

[0043] In the above formulas, the remaining quantities are the time control constant in the controller, the transpose of the force error, the inertia matrix of the impedance controller, and the velocity difference, where the velocity is obtained from the robot's position difference.

[0044] Preferably, the threshold set in step S2 is 1000μV.

[0045] Preferably, the inverse impedance control model is constructed using a logistic regression machine learning method.

[0046] Accordingly, this invention also proposes a robot shaft-hole assembly system based on guided teaching reinforcement learning, including a control center and a robot that communicates bidirectionally with the control center. The robot's end effector is equipped with a six-dimensional force / torque sensor for collecting assembly contact force and torque data and sending it to the control center. The system is characterized by further including an electromyography (EMG) armband, which collects sEMG signals and IMU Euler angle data from human experts and sends them to the control center. The control center also receives position data from the robot's end effector, executes the aforementioned assembly method, and controls the robot to switch between guided teaching and reinforcement learning methods to perform the shaft-hole assembly task.

[0047] Preferably, the communication method between the control center and the EMG armband is one of RS-232, RS-485, Ethernet, Bluetooth, ZigBee and WiFi.

[0048] Compared with the prior art, the present invention has the following technical effects:

[0049] 1. The assembly method proposed in this invention incorporates the experience and knowledge of human experts, identifies the motion direction intention and variable impedance strategy from the sEMG signals generated during the assembly process taught by human experts, and sends the motion direction intention and variable impedance strategy to the robot to guide the assembly movement of the taught robot; it makes up for the potential limitations of reinforcement learning models and improves the adaptability and stability of the robot in complex assembly tasks.

[0050] 2. The assembly method proposed in this invention combines manual guided teaching and reinforcement learning to control the robot in shaft hole assembly. The introduced manual guided teaching method can reduce the large amount of training time and resources required by the robot agent in reinforcement learning, thereby reducing training costs.

Attached Figure Description

[0051] Figure 1 is a flowchart of the assembly method described in this invention;

[0052] Figure 2 is a schematic diagram of the guided teaching and reinforcement learning method according to an embodiment of the present invention;

[0053] Figure 3 is a schematic diagram of the movement direction according to an embodiment of the present invention;

[0054] Figure 4 is a schematic diagram of the system setup according to an embodiment of the present invention.

[0055] The reference numerals in the attached figures are: 10, control center; 20, robot; 30, hole part; 40, shaft part; 50, EMG armband; 60, six-dimensional force / torque sensor.

Detailed Implementation

[0056] To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below in conjunction with specific embodiments of the present application and with reference to the accompanying drawings.

[0057] Example 1

[0058] A robot shaft hole assembly method based on guided teaching reinforcement learning, as shown in Figures 1 and 2, includes the following steps:

[0059] S1: The robot uses reinforcement learning to perform assembly operations, while human experts wear electromyography armbands to monitor the assembly process and progress.

[0060] S2: The control center receives sEMG (surface electromyography) signals in real time and compares them with a set threshold. The control center uses a strategy switching mechanism to monitor the sEMG signal strength from the EMG armband. When this strategy switching mechanism detects that the sEMG signal is not less than the set threshold, the control center controls the robot to perform the assembly task using the guided teaching method. When the sEMG signal is lower than the set threshold, the control center controls the robot to perform the assembly task using the reinforcement learning method.

[0061] In this embodiment, the threshold for comparison with real-time sEMG data is set to 1000μV. When the real-time sEMG data is detected to be not lower than 1000μV, the robot is controlled to perform the assembly task using a guided teaching method; however, when the real-time sEMG data is detected to be lower than 1000μV, the robot is controlled to perform the assembly task using a reinforcement learning method.
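The switching rule in this embodiment can be illustrated with a minimal Python sketch. It assumes the eight-channel sEMG stream has already been reduced to a single amplitude in microvolts; the names `Policy`, `select_policy` and `SEMG_THRESHOLD_UV` are hypothetical and do not come from the patent.

```python
from enum import IntEnum

# Threshold stated in this embodiment, in microvolts.
SEMG_THRESHOLD_UV = 1000.0


class Policy(IntEnum):
    """Hypothetical labels: 0 for reinforcement learning, 1 for guided teaching."""
    REINFORCEMENT_LEARNING = 0
    GUIDED_TEACHING = 1


def select_policy(semg_uv: float, threshold_uv: float = SEMG_THRESHOLD_UV) -> Policy:
    """Pick guided teaching when the sEMG amplitude is not less than the threshold.

    Assumption: semg_uv is a single scalar amplitude already derived from
    the armband channels; the patent does not fix this reduction here.
    """
    if semg_uv >= threshold_uv:
        return Policy.GUIDED_TEACHING
    return Policy.REINFORCEMENT_LEARNING
```

A reading of exactly 1000 μV selects guided teaching, because the condition is "not lower than" the threshold.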

[0062] The guided teaching method is specifically as follows:

[0063] S21: Human experts wearing electromyography (EMG) armbands drag the robot's end effector to perform the teaching assembly process. The EMG armbands send the collected sEMG signal data and IMU (Inertial Measurement Unit) Euler angle data to the control center. The robot sends the assembly contact force, torque data, and robot end effector position data to the control center.

[0064] S22: The control center preprocesses the received sEMG signal data to generate smoothed sEMG data.

[0065] This embodiment uses a moving average method for preprocessing, with a window radius of 5. Specifically, the window size is first determined; a window radius of 5 means a window size of 11 (the 5 preceding points, the current point, and the 5 following points). Then, the window is slid along the signal, starting from the beginning and moving one point at a time. Next, the average value is calculated for all points within the window; this average replaces the original value at the window's center point. For boundary handling, where there are not enough points at the beginning and end of the signal, boundary value duplication can be used to fill in the missing data.

[0066] Using the moving average method to process sEMG signals can make the processed sEMG signals smoother, and high-frequency noise and random fluctuations can be effectively suppressed.
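The smoothing step can be sketched in Python with NumPy as follows. This is a minimal illustration for a single non-empty one-dimensional channel; the function name `moving_average` is hypothetical, and boundary value duplication is implemented here with `np.pad(..., mode="edge")`.

```python
import numpy as np


def moving_average(signal, radius=5):
    """Centered moving average with window size 2 * radius + 1.

    The signal is padded by repeating its first and last values, so the
    output has the same length as the input. Expects a non-empty 1-D signal.
    """
    x = np.asarray(signal, dtype=float)
    window = 2 * radius + 1
    # Boundary value duplication: repeat the edge samples `radius` times.
    padded = np.pad(x, radius, mode="edge")
    kernel = np.full(window, 1.0 / window)
    # "valid" keeps only positions where the window fits entirely.
    return np.convolve(padded, kernel, mode="valid")
```

For multi-channel sEMG data, the same function could be applied to each channel separately.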

[0067] S23: Construct an assembly direction intention recognition model based on sEMG signals to identify the motion direction generated by the assembly actions performed by a human expert based on the smoothed sEMG data and state parameters. This model utilizes sEMG signals, IMU Euler angle data, and the position data of the robot's end effector. Using a position difference method, it determines the motion trend of the robot's end effector based on its six-dimensional pose change information, thereby estimating the velocity direction of the robot's assembly motion during the teaching process, and marking the assembly direction during the teaching process with motion direction labels.

[0068] Specifically, along one direction of the robot's end effector space, the position difference of the end effector at two consecutive moments is calculated:

[0069] $$\Delta p = p_{t+1} - p_t$$

[0070] In the formula, $\Delta p$ is the position difference, and $p_t$ and $p_{t+1}$ are the position data of the robot's end effector at times $t$ and $t+1$, respectively. The sign of $\Delta p$ determines the robot's direction of motion along that direction. If $\Delta p$ is positive, the robot moved forward along that direction during the observation time interval; if $\Delta p$ is negative, the robot moved backward along that direction; if $\Delta p$ is zero, the robot's position along that direction did not change.

[0071] Based on the positional changes of the end effector on the X and Y axes in the robot's base coordinate system, the magnitude and sign of the positional difference of the robot in the XY plane are used to determine the speed and direction of the robot's end effector. Eight motion direction labels are determined using the common positional changes in the X and Y directions.

[0072] As shown in Figure 3, the calculation method for the eight directions of motion is as follows:

[0073] $\Delta x = 0$ and $\Delta y > 0$: the direction of movement is upward;

[0074] $\Delta x > 0$ and $\Delta y > 0$: the direction of movement is to the upper right;

[0075] $\Delta x > 0$ and $\Delta y = 0$: the direction of movement is to the right;

[0076] $\Delta x > 0$ and $\Delta y < 0$: the direction of movement is to the lower right;

[0077] $\Delta x = 0$ and $\Delta y < 0$: the direction of movement is downward;

[0078] $\Delta x < 0$ and $\Delta y < 0$: the direction of movement is to the lower left;

[0079] $\Delta x < 0$ and $\Delta y = 0$: the direction of movement is to the left;

[0080] $\Delta x < 0$ and $\Delta y > 0$: the direction of movement is to the upper left;

[0081] where $\Delta x$ is the position difference in the X direction and $\Delta y$ is the position difference in the Y direction.
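The labeling step can be sketched in Python as below. The sketch assumes that the eight labels follow from the signs of the X and Y position differences, with positive X read as "right" and positive Y as "up". The tolerance `eps`, the label strings and all function names are hypothetical additions for illustration.

```python
def position_difference(p_prev, p_next):
    """Position difference between two consecutive moments (later minus earlier)."""
    return p_next - p_prev


def _sign(value, eps):
    """Return +1, -1 or 0, treating |value| <= eps as no motion (assumed tolerance)."""
    if value > eps:
        return 1
    if value < -eps:
        return -1
    return 0


# Assumed mapping from (sign of X difference, sign of Y difference) to a label.
DIRECTION_LABELS = {
    (0, 1): "up",
    (1, 1): "upper right",
    (1, 0): "right",
    (1, -1): "lower right",
    (0, -1): "down",
    (-1, -1): "lower left",
    (-1, 0): "left",
    (-1, 1): "upper left",
}


def direction_label(dx, dy, eps=1e-6):
    """Return one of the eight direction labels, or None when there is no motion."""
    return DIRECTION_LABELS.get((_sign(dx, eps), _sign(dy, eps)))
```

A small tolerance is used instead of an exact comparison with zero because measured positions rarely repeat exactly; the patent text itself does not specify such a tolerance.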

[0082] A motion direction recognition model is constructed using the support vector machine method. The model is based on sEMG signal data and IMU Euler angle data as input features and outputs the corresponding motion direction.
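A minimal sketch of such a classifier using scikit-learn's `SVC` is shown below. The data are synthetic placeholders, not measurements. The 11-dimensional feature layout (eight sEMG channels followed by three Euler angles), the RBF kernel, the feature scaling step and the integer encoding of the eight direction labels are all assumptions made for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples = 400

# Synthetic stand-in for 8 smoothed sEMG channels + 3 IMU Euler angles.
features = rng.normal(size=(n_samples, 11))
# Direction labels encoded as integers 0..7 (assumed encoding).
labels = rng.integers(0, 8, size=n_samples)
# Make the placeholder data weakly informative so the fit is not pure noise.
features[:, 0] += labels * 3.0

# Scale the features, then fit a support vector classifier.
direction_model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
direction_model.fit(features, labels)

# Predict motion direction labels for a few samples.
predicted = direction_model.predict(features[:5])
```

In practice the labels would come from the position difference method described above, and the model would be evaluated on held-out data.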

[0083] S24: An inverse impedance control model is constructed using the Logistic Regression machine learning method. It takes the human expert's sEMG signal data as input and outputs the corresponding guided teaching strategy. First, the dataset is split into a feature set and a label set. The data collected by the EMG armband serves as the feature set, containing eight-dimensional sEMG signal data and three-dimensional IMU Euler angle data. The output strategy serves as the label set, containing three-dimensional damping coefficients in the Y, Rx, and Ry directions. Then, the dataset is randomly divided into a training set and a test set, with the test set comprising 20% of the total dataset, so that the model's generalization ability can be checked on held-out data and overfitting can be detected. Next, a logistic regression model is created using the Logistic Regression class, and the model is trained using the training set. Finally, the model is saved for use by the control center when switching to the guided teaching method.
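The training procedure in S24 can be sketched with scikit-learn as follows. Logistic regression is a classifier, and the text does not say how the damping coefficients are encoded as labels. This sketch therefore assumes each of the three coefficients (Y, Rx, Ry) is discretized into a few levels and fits one classifier per coefficient via `MultiOutputClassifier`. The data are synthetic placeholders, and the file name is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)

# Feature set: 8 sEMG channels + 3 IMU Euler angles (synthetic placeholder).
X = rng.normal(size=(300, 11))
# Label set: damping level per axis (Y, Rx, Ry), assumed discretized to 0, 1, 2.
y = rng.integers(0, 3, size=(300, 3))

# Random split with 20% of the data held out as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One logistic regression classifier per damping coefficient.
model = MultiOutputClassifier(LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Save the trained model, then reload it as the control center would.
model_path = os.path.join(tempfile.gettempdir(), "damping_policy.joblib")
joblib.dump(model, model_path)
restored = joblib.load(model_path)
test_predictions = restored.predict(X_test)
```

If the damping coefficients are kept continuous instead, a regression model would be the more natural choice; that would be a departure from the method named in the text.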

[0084] Based on the assembly contact force, torque data, and direction of motion, the control center calculates the damping matrix using an inverse impedance control model. The robot then controls the movement and rotation of the end effector in the X, Y, and Z axes based on the six-dimensional diagonal vectors in the damping matrix.

[0085] When the robot performs an assembly task using the guided teaching method, it uses the assembly direction intention recognition model and the inverse impedance control model of the guided teaching method. The assembly direction intention is obtained from the sEMG signal of the human expert, and a 6×6 diagonal positive definite damping matrix is calculated using the inverse impedance control model. The assembly direction intention and the damping matrix are then sent to the robot, which performs the assembly task under the guided teaching strategy. Simultaneously, the robot acquires the contact force and end-effector position information as its current state. A mapping between state inputs and the method to be used is established with a dictionary of key-value pairs, in which each state input is assigned a policy label: Policy = 1 for guided teaching and Policy = 0 for reinforcement learning. In the 6×6 diagonal positive definite damping matrix, the six diagonal elements define the damping parameters of the robot's impedance control, which govern the translation of the robot's end effector along, and its rotation about, the X, Y, and Z axes.
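The two data structures in this paragraph, the diagonal damping matrix and the state-to-policy dictionary, can be sketched in Python as follows. The axis order (X, Y, Z, Rx, Ry, Rz), the rounding of states to form dictionary keys, the numeric values and all names are assumptions made for illustration.

```python
import numpy as np

# Policy labels as given in the text.
GUIDED_TEACHING = 1
REINFORCEMENT_LEARNING = 0


def build_damping_matrix(diagonal):
    """Build a 6x6 diagonal positive definite damping matrix.

    Assumed order of the six parameters: X, Y, Z, Rx, Ry, Rz.
    """
    d = np.asarray(diagonal, dtype=float)
    if d.shape != (6,):
        raise ValueError("expected exactly six damping parameters")
    if np.any(d <= 0.0):
        raise ValueError("damping parameters must be positive")
    return np.diag(d)


def state_key(contact_force, position, decimals=3):
    """Turn a (contact force, position) state into a hashable dictionary key."""
    return (
        tuple(np.round(contact_force, decimals).tolist()),
        tuple(np.round(position, decimals).tolist()),
    )


# Dictionary mapping each observed state to the policy label used for it.
policy_by_state = {}

# Illustrative values only; units and magnitudes are not from the patent.
damping = build_damping_matrix([50.0, 20.0, 50.0, 5.0, 2.0, 5.0])
key = state_key([0.0, 1.2, -3.4], [0.40, 0.10, 0.25])
policy_by_state[key] = GUIDED_TEACHING
```

Requiring strictly positive diagonal entries is what makes a diagonal matrix positive definite, which is why the constructor rejects zero or negative parameters.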

[0086] Once human experts monitoring the assembly process observe that the assembly task is proceeding smoothly, manual guidance can be terminated. The control center's strategy switching mechanism detects that the sEMG signal peak value is below a set threshold of 1000μV, determines that the assembly environment is suitable for continuing to use the reinforcement learning strategy, and switches to the reinforcement learning method to continue executing the assembly task.

[0087] By incorporating the experience and knowledge of human experts, the intention of motion direction and the variable impedance strategy are identified from the sEMG signals generated during the assembly process taught by human experts. The intention of motion direction and the variable impedance strategy are then sent to the robot to guide the assembly motion of the taught robot. This approach compensates for the potential limitations of reinforcement learning models and improves the adaptability and stability of the robot in complex assembly tasks.

[0088] The calculation process of the inverse impedance control model is as follows:

[0089] Calculate the difference between the current torque reading and the target torque to obtain the force error, and then calculate the transpose of the force error. The calculation method is as follows:

[0090]

[0091]

[0092] Calculate the damping matrix and assign the damping parameters in the damping matrix to the robot's impedance controller;

[0093] A damping parameter strategy model based on sEMG signals is constructed. The damping parameter strategy model is trained using smoothed sEMG data and IMU Euler angle data, and the trained model is saved.

[0094] The damping matrix is a diagonal positive definite matrix, and its six diagonal elements are damping parameters. The obtained damping parameters are assigned to the robot impedance controller, and the calculation method is as follows:

[0095]

[0096] where the intermediate transformation matrix is a key matrix in the impedance control algorithm, reflecting the relationship between the current state and the target state. The calculation method is as follows:

[0097]

[0098] where the response matrix represents the difference between the current state and the desired state of the system, reflecting the rate of change in the system's dynamic response. The calculation method is as follows:

[0099]

[0100] In the above formulas, the remaining quantities are the time control constant in the robot controller, the transpose of the force error, the inertia matrix of the impedance controller, and the velocity difference, where the velocity is obtained from the robot's position difference.

[0101] S3: The control center integrates the decision-making process and output of both guided teaching and reinforcement learning methods into a unified assembly skill framework, which selects to execute either the reinforcement learning method or the guided teaching method based on the input parameters.

[0102] Example 2

[0103] As shown in Figure 4, a robot shaft-hole assembly system based on guided teaching reinforcement learning includes a control center 10 and a robot 20 that communicates bidirectionally with the control center 10. The end effector of the robot 20 is equipped with a six-dimensional force / torque sensor 60 for collecting assembly contact force and torque data and sending it to the control center 10. The system is characterized by further including an electromyography (EMG) armband 50, which collects sEMG signals and IMU Euler angle data from human experts and sends them to the control center 10. The control center 10 also receives position data from the end effector of the robot 20 and executes the assembly method as described in Example 1, controlling the robot 20 to switch between guided teaching and reinforcement learning methods to perform the shaft-hole assembly task.

[0104] In this embodiment, a computer is used as the control center 10. In other embodiments of the present invention, a microcontroller may also be used as the control center 10. The EMG armband 50 in this embodiment is a MYO armband.

[0105] During assembly, the shaft part 40 is clamped at the end of the robot 20. Human experts can control the robot 20 to insert the shaft part 40 into the hole part 30, or the robot 20 can use a reinforcement learning strategy to insert the shaft part 40 into the hole part 30. The six-dimensional force / torque sensor 60 records the assembly contact force and torque data and transmits it to the control center 10. The control center 10 processes the contact force, torque data, sEMG signals, and robot position information through a Python interface.

[0106] The communication method between the control center 10 and the EMG armband 50 is one of RS-232, RS-485, Ethernet, Bluetooth, ZigBee and WiFi.

[0107] The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and improvements without departing from the inventive concept of the present invention, and these all fall within the protection scope of the present invention.

Claims

1. A robot shaft hole assembly method based on guided teaching reinforcement learning, characterized in that it includes the following steps: S1: The robot uses reinforcement learning to perform assembly operations, while human experts wear electromyography (EMG) armbands to monitor the assembly process and progress. S2: The control center receives the sEMG signal in real time and compares it with the set threshold. When the sEMG signal is not less than the set threshold, the control center controls the robot to perform the assembly task using the guided teaching method. When the sEMG signal is lower than the set threshold, the control center controls the robot to perform the assembly task using the reinforcement learning method. S3: The control center integrates the decision-making process and output of the two methods, guided teaching and reinforcement learning, into a unified assembly skill framework. The assembly skill framework selects to execute either the reinforcement learning method or the guided teaching method based on the input parameters. The guided teaching method is specifically as follows: S21: Human experts wearing EMG armbands drag the robot's end effector to perform the teaching assembly process. The EMG armbands send the collected sEMG signal data and IMU Euler angle data to the control center. The robot sends the assembly contact force, torque data and robot end effector position data to the control center. S22: The control center preprocesses the received sEMG signal data to generate smoothed sEMG data; S23: Construct an assembly direction intention recognition model based on sEMG signals, and identify the motion direction generated by human experts performing assembly actions according to the smoothed sEMG data and state parameters; S24: The control center calculates the damping matrix through the inverse impedance control model based on the assembly contact force, torque data and motion direction. The robot controls the movement and rotation of the end effector in the X, Y and Z axis directions based on the six-dimensional diagonal vector in the damping matrix.

2. The robot shaft hole assembly method based on guided teaching reinforcement learning according to claim 1, characterized in that the preprocessing method is the moving average method, and the window radius of the moving average method is set to 5.

3. The robot shaft hole assembly method based on guided teaching reinforcement learning according to claim 1, characterized in that step S23 specifically involves: calculating the position difference of the robot's end effector at two consecutive moments along one direction of the end effector space, $\Delta p = p_{t+1} - p_t$, where $\Delta p$ is the position difference, and $p_t$ and $p_{t+1}$ are the position data of the robot's end effector at times $t$ and $t+1$, respectively; based on the positional changes of the end effector on the X and Y axes in the robot's base coordinate system, the magnitude and sign of the positional difference of the robot in the XY plane are used to determine the speed and direction of the robot's end effector, and eight motion direction labels are determined; a motion direction recognition model is constructed using the support vector machine method, which takes sEMG signal data and IMU Euler angle data as input features and outputs the corresponding motion direction.

4. The robot shaft hole assembly method based on guided teaching reinforcement learning according to claim 3, characterized in that the eight directions of motion are calculated as follows: $\Delta x = 0$ and $\Delta y > 0$: the direction of movement is upward; $\Delta x > 0$ and $\Delta y > 0$: the direction of movement is to the upper right; $\Delta x > 0$ and $\Delta y = 0$: the direction of movement is to the right; $\Delta x > 0$ and $\Delta y < 0$: the direction of movement is to the lower right; $\Delta x = 0$ and $\Delta y < 0$: the direction of movement is downward; $\Delta x < 0$ and $\Delta y < 0$: the direction of movement is to the lower left; $\Delta x < 0$ and $\Delta y = 0$: the direction of movement is to the left; $\Delta x < 0$ and $\Delta y > 0$: the direction of movement is to the upper left; where $\Delta x$ is the position difference in the X direction and $\Delta y$ is the position difference in the Y direction.

5. The robot shaft hole assembly method based on guided teaching reinforcement learning according to claim 1, characterized in that the calculation process of the inverse impedance control model is as follows: calculate the difference between the current torque reading and the target torque to obtain the force error, and then calculate the transpose of the force error; calculate the damping matrix and assign the damping parameters in the damping matrix to the robot's impedance controller; a damping parameter strategy model based on sEMG signals is constructed, trained using smoothed sEMG data and IMU Euler angle data, and the trained model is saved.

6. The robot shaft hole assembly method based on guided teaching reinforcement learning according to claim 5, characterized in that the damping matrix is calculated by a formula involving an intermediate transformation matrix, which is itself calculated from a response matrix; the remaining quantities in these formulas are the time control constant in the controller, the transpose of the force error, the inertia matrix of the impedance controller, and the velocity difference, where the velocity is obtained from the robot's position difference.

7. The robot shaft hole assembly method based on guided teaching reinforcement learning according to claim 1, characterized in that the threshold set in step S2 is 1000μV.

8. The robot shaft hole assembly method based on guided teaching reinforcement learning according to claim 1, characterized in that the inverse impedance control model is constructed using a logistic regression machine learning method.

9. A robot shaft hole assembly system based on guided teaching reinforcement learning, comprising a control center (10) and a robot (20) communicating bidirectionally with the control center (10), wherein the end of the robot (20) is provided with a six-dimensional force / torque sensor (60) for collecting assembly contact force and torque data and sending them to the control center (10); characterized in that it also includes an electromyography (EMG) armband (50), which is used to collect sEMG signals and IMU Euler angle data from human experts and send them to the control center (10). The control center (10) also receives position data of the end effector of the robot (20), executes the assembly method as described in any one of claims 1-8, and controls the robot (20) to switch between the guided teaching method and the reinforcement learning method to perform the shaft hole assembly task.

10. A robot shaft hole assembly system based on guided teaching reinforcement learning according to claim 9, characterized in that the communication method between the control center (10) and the EMG armband (50) is one of RS-232, RS-485, Ethernet, Bluetooth, ZigBee and WiFi.