Methods, devices, equipment, and media for acquiring teaching data for robotic arm imitation learning.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2024-05-14
- Publication Date
- 2026-06-30
Figure CN118269062B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of data tracking and acquisition technology, and in particular to a method, apparatus, equipment and medium for acquiring teaching data for robotic arm imitation learning.
Background Technology
[0002] Visual servoing is a control system in which a camera or other visual sensor is used to acquire information from the environment in real time. This information is then used to guide and adjust the movement of a mechanical system. In robotic arm applications, visual servoing adjusts the position, orientation, or posture of the robotic arm in real time by analyzing visual input to achieve precise target tracking and manipulation. Robotic arm imitation learning is a method of training a robotic arm to perform tasks by observing and imitating demonstration (teaching) data. By collecting and learning from teaching data, the robotic arm can learn to perform specific tasks without explicit programming. Deep learning models are often used to learn the movement patterns of a robotic arm from teaching data.
[0003] Therefore, visual servoing and robotic arm imitation learning can be combined to form a comprehensive control strategy. Visual servoing provides real-time environmental perception, enabling the robotic arm to dynamically adjust according to the current visual input. Robotic arm imitation learning learns motion patterns from teaching data, enabling the robotic arm to exhibit motion behaviors similar to those of the teacher in specific tasks. Combining the two can achieve more flexible and intelligent robotic arm control, especially suitable for complex and uncertain environments.
[0004] In this process, the teaching data tracking method determines the effectiveness of combining visual servoing and robotic arm imitation learning. Generally, traditional teaching data tracking methods are only applicable to specific scenarios and conditions, lack the ability to generalize to changes and new environments, and mostly require manually designed features to represent the state of the robotic arm, which becomes very difficult in complex tasks. As the environment changes, it is often difficult to use large-scale teaching data for learning, resulting in limited model learning ability and relatively poor adaptability, making it difficult to cope with the challenges of different working scenarios.
[0005] In summary, existing data tracking methods struggle to learn from large-scale teaching data, limiting the model's learning ability and significantly impacting its generalization and environmental adaptability, which urgently needs to be addressed.
Summary of the Invention
[0006] This application provides a method, apparatus, device, and medium for acquiring teaching data for robotic arm imitation learning, in order to solve the problems that existing data tracking methods are difficult to use for learning with large-scale teaching data, resulting in limited model learning ability and greatly affecting the model's generalization ability and environmental adaptability.
[0007] The first aspect of this application provides a method for acquiring teaching data for robotic arm imitation learning, comprising the following steps: acquiring image data of a target object collected in real time by a camera at the end of the robotic arm, and preprocessing the image data to generate standard image data that meets preset conditions; inputting the standard image data into a preset target detection model to generate image features corresponding to the standard image data and a first coordinate value of the target object in the camera coordinate system; calculating a second coordinate value of the target object in the camera coordinate system at the end of the robotic arm based on the first coordinate value and a preset real-time joint position of the robotic arm, and performing a world coordinate system transformation operation on the second coordinate value to generate trajectory data of the target object in the world coordinate system; obtaining robotic arm control command data based on the trajectory data, and establishing a training dataset based on the robotic arm control command data and the image features, so as to train a preset multilayer perceptron through the training dataset to generate a behavior cloning model, and integrating the behavior cloning model and a preset deep deterministic policy gradient model to construct a reinforcement learning model, so as to use the reinforcement learning model to acquire teaching data for robotic arm imitation learning.
[0008] Optionally, in one embodiment of this application, before acquiring image data of the target object in real time through a preset robotic arm end-effector camera, the method further includes: building a target monocular vision servo system based on a preset ROS environment; and obtaining the real-time joint position of the robotic arm according to the target monocular vision servo system.
[0009] Optionally, in one embodiment of this application, the step of generating image features corresponding to the image standard data and the first coordinate value of the target object in the camera coordinate system based on the image standard data and a preset target detection model includes: extracting initial image features from the image standard data; performing feature point detection on the initial image features to obtain multiple key points corresponding to the initial image features, and generating a feature descriptor for each of the multiple key points; performing feature matching on the feature descriptors to obtain feature matching results, and generating the image features according to the feature matching results and a preset filtering strategy; and generating position information and bounding box information of the robotic arm end-effector camera based on the target detection model and the image standard data, so as to determine the first coordinate value of the target object in the camera coordinate system according to the position information and the bounding box information.
[0010] Optionally, in one embodiment of this application, the step of calculating the second coordinate value of the target object in the coordinate system of the robot arm's end-effector camera based on the first coordinate value and a preset real-time joint position of the robot arm, and performing a world coordinate system transformation operation on the second coordinate value to generate trajectory data of the target object in the world coordinate system, includes: determining the correspondence between the first coordinate value and the real-time joint position of the robot arm, and obtaining the second coordinate value of the target object based on the correspondence and the real-time joint position of the robot arm; determining the static transformation relationship between the camera coordinate system and the robot arm's end-effector camera coordinate system, establishing the world coordinate system based on the static transformation relationship, and transforming the second coordinate value to the world coordinate system to generate trajectory data of the target object in the world coordinate system.
[0011] Optionally, in one embodiment of this application, the step of training a preset multilayer perceptron using the training dataset to generate a behavior cloning model, and integrating the behavior cloning model and a preset deep deterministic policy gradient model to construct a reinforcement learning model, so as to use the reinforcement learning model to obtain teaching data for robotic arm imitation learning, includes: training the multilayer perceptron based on a target loss function, a preset supervised learning strategy, and the training dataset to obtain the behavior cloning model; constructing the deep deterministic policy gradient model according to a preset state space, action space, and reward function, and establishing the reinforcement learning model based on the behavior cloning model and the deep deterministic policy gradient model; performing online control of the reinforcement learning model through a preset model predictive control strategy to obtain model predictive control results, and generating motion commands for the robotic arm end-effector camera based on the model predictive control results, so as to use the motion commands to control the robotic arm end-effector camera to obtain teaching data for robotic arm imitation learning.
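The supervised behavior cloning step described above can be illustrated with a minimal sketch. It assumes that the image features and the robotic arm control commands have already been reduced to short numeric vectors, and that the target loss function is the mean squared error; neither assumption is stated in this application. The class `TinyMLP`, the functions `mse` and `behavior_clone`, and all hyperparameters are hypothetical names and values chosen for illustration only.

```python
import math
import random


class TinyMLP:
    """One hidden tanh layer with a linear output, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return the hidden activations and the output vector."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        """One gradient step on the loss 0.5 * ||y - target||^2."""
        h, y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]
        # Back-propagate through the output layer before its weights change.
        grad_h = [sum(err[o] * self.w2[o][j] for o in range(len(err))) * (1.0 - h[j] ** 2)
                  for j in range(len(h))]
        for o, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[o][j] -= lr * e * hj
            self.b2[o] -= lr * e
        for j, g in enumerate(grad_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


def mse(model, dataset):
    """Mean squared error between predicted and demonstrated commands."""
    total, count = 0.0, 0
    for x, target in dataset:
        y = model.predict(x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, target))
        count += len(target)
    return total / count


def behavior_clone(dataset, n_hidden=8, epochs=200, lr=0.05, seed=0):
    """Fit a TinyMLP to (feature vector, command vector) demonstration pairs."""
    n_in, n_out = len(dataset[0][0]), len(dataset[0][1])
    model = TinyMLP(n_in, n_hidden, n_out, seed=seed)
    for _ in range(epochs):
        for x, target in dataset:
            model.train_step(x, target, lr)
    return model
```

In this sketch the cloned policy is only the supervised starting point; how it is combined with the deep deterministic policy gradient model is not shown here.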
[0012] A second aspect of this application provides a device for acquiring teaching data for robotic arm imitation learning, comprising: an acquisition module for acquiring image data of a target object captured in real time by a camera at the end of the robotic arm, and preprocessing the image data to generate standard image data that meets preset conditions; a conversion module for inputting the standard image data into a preset target detection model to generate image features corresponding to the standard image data and a first coordinate value of the target object in the camera coordinate system, calculating a second coordinate value of the target object in the camera coordinate system at the end of the robotic arm based on the first coordinate value and a preset real-time joint position of the robotic arm, and performing a world coordinate system transformation operation on the second coordinate value to generate trajectory data of the target object in the world coordinate system; and an acquisition module for obtaining robotic arm control command data based on the trajectory data, and establishing a training dataset based on the robotic arm control command data and the image features, training a preset multilayer perceptron through the training dataset to generate a behavior cloning model, and integrating the behavior cloning model and a preset deep deterministic policy gradient model to construct a reinforcement learning model, so as to acquire teaching data for robotic arm imitation learning using the reinforcement learning model.
[0013] Optionally, in one embodiment of this application, it further includes: a setup module, used to set up a target monocular vision servo system based on a preset ROS environment before acquiring image data of the target object in real time through a preset robotic arm end-effector camera; and a position module, used to obtain the real-time joint position of the robotic arm according to the target monocular vision servo system.
[0014] Optionally, in one embodiment of this application, the conversion module includes: an extraction unit, used to extract initial image features from the image standard data; a detection unit, used to perform feature point detection on the initial image features to obtain multiple key points corresponding to the initial image features, and generate a feature descriptor for each of the multiple key points; a matching unit, used to perform feature matching on the feature descriptor to obtain a feature matching result, and generate the image features according to the feature matching result and a preset filtering strategy; and a first determination unit, used to generate position information and bounding box information of the robotic arm end-effector camera based on the target detection model and the image standard data, so as to determine the first coordinate value of the target object in the camera coordinate system according to the position information and the bounding box information.
[0015] Optionally, in one embodiment of this application, the conversion module further includes: a second determining unit, configured to determine the correspondence between the first coordinate value and the real-time joint position of the robotic arm, and obtain the second coordinate value of the target object based on the correspondence and the real-time joint position of the robotic arm; and a third determining unit, configured to determine the static transformation relationship between the camera coordinate system and the coordinate system of the robotic arm end-effector camera, establish the world coordinate system according to the static transformation relationship, and convert the second coordinate value to the world coordinate system to generate trajectory data of the target object in the world coordinate system.
[0016] Optionally, in one embodiment of this application, the acquisition module includes: a training unit, configured to train the multilayer perceptron based on a target loss function, a preset supervised learning strategy, and the training dataset to obtain the behavior cloning model; an establishment unit, configured to construct the deep deterministic policy gradient model according to a preset state space, action space, and reward function, and establish the reinforcement learning model based on the behavior cloning model and the deep deterministic policy gradient model; and a control unit, configured to perform online control of the reinforcement learning model through a preset model predictive control strategy to obtain model predictive control results, and generate motion commands for the robotic arm end-effector camera based on the model predictive control results, so as to use the motion commands to control the robotic arm end-effector camera to acquire teaching data for robotic arm imitation learning.
[0017] A third aspect of this application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for acquiring robotic arm imitation learning teaching data as described in the above embodiments.
[0018] A fourth aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method for acquiring teaching data for robotic arm imitation learning.
[0019] Therefore, the embodiments of this application have the following beneficial effects:
[0020] The embodiments of this application acquire real-time image data of a target object from a camera at the end effector of a robotic arm, preprocess the image data to generate standard image data that meets preset conditions, and input the standard image data into a preset target detection model to generate image features corresponding to the standard image data and the first coordinate value of the target object in the camera coordinate system. Based on the first coordinate value and the preset real-time joint position of the robotic arm, the second coordinate value of the target object in the camera coordinate system at the end effector of the robotic arm is calculated, and the second coordinate value is transformed into the world coordinate system to generate trajectory data of the target object in the world coordinate system. Based on the trajectory data, robotic arm control command data is obtained, and a training dataset is established based on the robotic arm control command data and image features. A preset multilayer perceptron is trained using the training dataset to generate a behavior cloning model. The behavior cloning model and a preset deep deterministic policy gradient model are integrated to construct a reinforcement learning model. The reinforcement learning model is used to obtain teaching data for robotic arm imitation learning, thereby greatly improving the model's flexibility and environmental adaptability, and improving the model's generalization performance. Thus, it solves the problems of existing data tracking methods that are difficult to learn using large-scale teaching data, resulting in limited model learning ability and greatly affecting the model's generalization ability and environmental adaptability.
[0021] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Attached Figure Description
[0022] The above and / or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
[0023] Figure 1 is a flowchart illustrating a method for acquiring teaching data for robotic arm imitation learning according to an embodiment of this application;
[0024] Figure 2 is a schematic diagram of the execution logic of a method for acquiring teaching data for robotic arm imitation learning according to an embodiment of this application;
[0025] Figure 3 is an example diagram of a device for acquiring teaching data for robotic arm imitation learning according to an embodiment of this application;
[0026] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
[0027] In the figures: 10-acquisition device for robotic arm imitation learning teaching data; 100-acquisition module; 200-conversion module; 300-acquisition module; 401-memory; 402-processor; 403-communication interface.
Detailed Implementation
[0028] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.
[0029] The following description, with reference to the accompanying drawings, describes a method, apparatus, device, and medium for acquiring robotic arm imitation learning teaching data according to embodiments of this application. To address the problems mentioned in the background art, this application provides a method for acquiring teaching data for robotic arm imitation learning. In this method, image data of a target object is acquired in real-time by a camera at the end of the robotic arm, and the image data is preprocessed to generate standard image data that meets preset conditions. The standard image data is input into a preset target detection model to generate image features corresponding to the standard image data and the first coordinate value of the target object in the camera coordinate system. Based on the first coordinate value and the preset real-time joint position of the robotic arm, the second coordinate value of the target object in the camera coordinate system at the end of the robotic arm is calculated, and a world coordinate system transformation operation is performed on the second coordinate value to generate trajectory data of the target object in the world coordinate system. Robotic arm control command data is obtained based on the trajectory data, and a training dataset is established based on the robotic arm control command data and image features. A preset multilayer perceptron is trained using the training dataset to generate a behavior cloning model. Furthermore, the behavior cloning model and a preset deep deterministic policy gradient model are integrated to construct a reinforcement learning model. This reinforcement learning model is used to acquire teaching data for robotic arm imitation learning, thereby greatly improving the model's flexibility and environmental adaptability, and enhancing its generalization performance. 
This solves the problems of existing data tracking methods, such as the difficulty in using large-scale teaching data for learning, which limits the model's learning ability and greatly affects its generalization ability and environmental adaptability.
[0030] Specifically, Figure 1 is a flowchart illustrating a method for acquiring teaching data for robotic arm imitation learning, provided in an embodiment of this application.
[0031] As shown in Figure 1, the method for acquiring teaching data for robotic arm imitation learning includes the following steps:
[0032] In step S101, image data of the target object collected in real time by the end-effector camera of the robotic arm is acquired, and the image data is preprocessed to generate image standard data that meets preset conditions.
[0033] The embodiments of this application first load the weights and structure of the pre-trained model so that the model can recognize the end effector of the robotic arm (i.e., the end camera). Next, a node can be written in ROS that subscribes to the camera image topic in real time to obtain image data of the target object collected by the end camera of the robotic arm. The embodiments of this application can then perform the necessary preprocessing operations on the acquired image data, such as image scaling, normalization, or color space conversion, to meet the input requirements of the Faster R-CNN model, which is used to detect and track objects in real time.
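The preprocessing operations named above (color space conversion, scaling, normalization) can be sketched in a few lines. This is an illustrative example only: it represents an image as nested Python lists of 8-bit (B, G, R) pixels so that it has no dependencies, and the function names and the nearest-neighbor resizing choice are assumptions rather than details given in this application. A real pipeline would more likely use OpenCV or a similar library.

```python
def bgr_to_rgb(image):
    """Swap the first and third channel of every (B, G, R) pixel."""
    return [[(r, g, b) for (b, g, r) in row] for row in image]


def resize_nearest(image, out_h, out_w):
    """Resize by nearest-neighbor sampling (no interpolation)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize(image):
    """Map 8-bit channel values to floats in the range [0, 1]."""
    return [[tuple(c / 255.0 for c in px) for px in row] for row in image]


def preprocess(image, out_h, out_w):
    """Color conversion, then scaling, then normalization."""
    return normalize(resize_nearest(bgr_to_rgb(image), out_h, out_w))
```

For example, `preprocess(frame, 4, 4)` applied to a 2x2 frame returns a 4x4 grid of RGB tuples with values between 0 and 1.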
[0034] Optionally, in one embodiment of this application, before acquiring image data of the target object in real time through a preset robotic arm end-effector camera, the method further includes: building a target monocular vision servo system based on a preset ROS environment; and obtaining the real-time joint position of the robotic arm according to the target monocular vision servo system.
[0035] It should be noted that before acquiring image data of the target object in real time through a preset robotic arm end-effector camera, the embodiments of this application also require the construction of a standard monocular vision servo system, as shown in Figure 2. The specific process is as follows:
[0036] 1. Camera selection and installation: Select a suitable monocular camera and install and adjust it according to the camera manufacturer's recommendations to ensure that the camera can acquire clear images;
[0037] 2. ROS Environment Setup: Setting up the ROS (Robot Operating System) environment, including installing ROS and configuring the ROS workspace;
[0038] 3. Camera driver and ROS node configuration: Install the camera's ROS driver and configure the ROS node to receive camera image data;
[0039] It is important to note that, after the camera driver and ROS node are configured, the ROS node must be set up to receive image data from the monocular (or multi-view) camera in preparation for subsequent visual servoing tasks. This step involves camera calibration, image acquisition settings, and camera driver configuration.
[0040] 4. OpenCV Integration and Image Processing: Integrate the OpenCV library into ROS and write code to implement image processing functions such as image preprocessing and feature extraction. For example, the embodiments of this application can perform image preprocessing according to task requirements, such as cropping, scaling, and color space conversion, to extract the required image features.
[0041] 5. TF Library Usage: Use the TF library in ROS for coordinate system transformation in order to process the position information of the robotic arm's end effector;
[0042] 6. Camera Calibration: Use the calibration board tool to calibrate the camera to obtain its intrinsic and extrinsic parameters for subsequent coordinate transformation and attitude estimation.
[0043] 7. Real-time image display: Write ROS node code to display the real-time acquired image data in the graphical interface so that users can observe and debug in real time;
[0044] 8. Data Transmission and Recording: Ensure the transmission and recording of camera images and other relevant data; this includes the following two steps:
[0045] (1) Create a ROS Bag file to record nodes: Use the ROS Bag tool to create a sample Bag file to record the data in the nodes; ensure that the Bag file contains timestamps for subsequent data synchronization;
[0046] (2) Data transmission: Joint position information and camera image data are transmitted through ROS topics, and these data are recorded in Bag files.
[0047] Furthermore, embodiments of this application may choose how to record joint angles or end-effector pose trajectories, recording either the angles of all joints or only the pose of the end effector, depending on the robotic arm and the task requirements. A ROS node is written that subscribes to the robotic arm's joint angle or end-effector pose topics and saves this data to a ROS Bag file. Within the node, the ROS Bag API is used to create a Bag file instance for recording data, ensuring that the Bag file contains timestamps for subsequent data synchronization.
[0048] It should be noted that, within the node, embodiments of this application can write the joint angle or end-effector pose data of each time step into the ROS Bag file and, at the same time, subscribe to the camera image topic and write the image data into the same Bag file.
[0049] It should also be noted that, before the joint angle or end-effector pose data is written into the ROS Bag file, the embodiments of this application first need to add the acquisition of the teaching tracking data required for visual servoing. Subscribing to the camera image topic and writing the image data into the Bag file keeps the robotic arm's motion trajectory and the camera images synchronized in time for subsequent model training. Start and stop logic is also written for the node so that data acquisition starts when needed and stops when it is finished.
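The role of the recorded timestamps can be illustrated with a small sketch of offline synchronization. It assumes that the image and joint messages have already been read out of the Bag file as two lists of timestamps in seconds, sorted in ascending order and non-empty; the function names and the `max_gap` tolerance are hypothetical and not taken from this application.

```python
from bisect import bisect_left


def nearest_index(stamps, t):
    """Index of the entry in the sorted, non-empty list `stamps` closest to `t`."""
    i = bisect_left(stamps, t)
    if i == 0:
        return 0
    if i == len(stamps):
        return len(stamps) - 1
    # Choose between the neighbors on either side of the insertion point.
    return i if stamps[i] - t < t - stamps[i - 1] else i - 1


def pair_by_time(image_stamps, joint_stamps, max_gap):
    """Pair each image with its nearest joint sample.

    Pairs whose timestamps differ by more than `max_gap` seconds are dropped,
    so that an image is never matched to a stale joint reading.
    """
    pairs = []
    for img_idx, t in enumerate(image_stamps):
        j = nearest_index(joint_stamps, t)
        if abs(joint_stamps[j] - t) <= max_gap:
            pairs.append((img_idx, j))
    return pairs
```

Each returned pair holds an image index and a joint-sample index, which is the form needed to build (image feature, joint position) training samples.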
[0050] Subsequently, embodiments of this application can use the built-in joint encoder to write ROS nodes, subscribe to the robotic arm joint angle topic, obtain joint angle information, and convert the joint angles into joint position information using information provided by the robotic arm specifications and documentation. In addition, embodiments of this application also need to write ROS nodes to obtain position information provided by the robotic arm from internal or external sensors. If external sensors are used, necessary coordinate system transformations need to be performed to ensure that the sensor and the robotic arm coordinate system are correctly aligned, and the obtained position information is used for subsequent data processing. Furthermore, ROS nodes are written to subscribe to the joint position information topic, and the joint position information is saved to a ROS Bag file or used for subsequent data processing within the node.
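Converting joint angles into joint positions is a forward kinematics calculation that depends on the geometry of the specific robotic arm. As a simplified illustration, the sketch below assumes a planar serial arm whose link lengths are known from its specification; the function name and the planar model are assumptions made for this example, and a real arm would use its full three-dimensional kinematic model.

```python
import math


def planar_joint_positions(joint_angles, link_lengths):
    """Return the (x, y) position of the far end of each link.

    `joint_angles` are relative angles in radians, one per joint, and
    `link_lengths` are the corresponding link lengths in meters. The last
    returned position is the position of the end effector.
    """
    x = y = heading = 0.0
    positions = []
    for angle, length in zip(joint_angles, link_lengths):
        heading += angle  # Relative joint angles accumulate along the chain.
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions
```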
[0051] In step S102, the image standard data is input into the preset target detection model to generate the image features corresponding to the image standard data and the first coordinate value of the target object in the camera coordinate system. Based on the first coordinate value and the preset real-time joint position of the robotic arm, the second coordinate value of the target object in the camera coordinate system at the end of the robotic arm is calculated, and the second coordinate value is transformed into the world coordinate system to generate the trajectory data of the target object in the world coordinate system.
[0052] Furthermore, embodiments of this application can input preprocessed image data into a Faster R-CNN model to obtain the real-time position of the robotic arm end effector (i.e., the first coordinate value of the target object in the camera coordinate system) and bounding box information, and use the information provided by the model to achieve real-time tracking of the robotic arm end effector; secondly, embodiments of this application can use preset real-time joint positions of the robotic arm to correspond the position of the robotic arm end effector in the camera coordinate system with the joint position information, so as to calculate the second coordinate value of the target object in the camera coordinate system of the robotic arm end effector, and perform a world coordinate system transformation operation on the second coordinate value to generate trajectory data of the target object in the world coordinate system.
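This application does not specify how the bounding box is turned into a coordinate value in the camera coordinate system. One common approach with a monocular camera, shown here purely as an assumption, is to back-project the bounding box center through the pinhole model, estimating depth from the known physical width of the detected object. The names `Intrinsics` and `bbox_to_camera_point` and the parameter `object_width_m` are hypothetical; the intrinsic parameters would come from camera calibration.

```python
from typing import NamedTuple


class Intrinsics(NamedTuple):
    """Pinhole camera intrinsics in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def bbox_to_camera_point(bbox, intr, object_width_m):
    """Estimate the 3D position of a detection in the camera frame.

    `bbox` is (x_min, y_min, x_max, y_max) in pixels. Depth is estimated from
    the apparent width of an object whose real width is `object_width_m`.
    """
    x_min, y_min, x_max, y_max = bbox
    width_px = x_max - x_min
    if width_px <= 0:
        raise ValueError("bounding box must have positive width")
    u = 0.5 * (x_min + x_max)  # Bounding box center, horizontal pixel.
    v = 0.5 * (y_min + y_max)  # Bounding box center, vertical pixel.
    z = intr.fx * object_width_m / width_px
    x = (u - intr.cx) * z / intr.fx
    y = (v - intr.cy) * z / intr.fy
    return (x, y, z)
```

With intrinsics `Intrinsics(600, 600, 320, 240)` and a 60-pixel-wide box around an object 0.05 m wide, the estimated depth is about 0.5 m.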
[0053] Subsequently, embodiments of this application can associate the position information of the robotic arm end effector acquired in real time with the corresponding image features and store the associated data. For example, the data can be saved to a ROS Bag file for subsequent imitation learning. In addition, embodiments of this application also need to ensure that the position information of the robotic arm end effector and the image feature data are synchronized in time, and verify the accuracy of the association algorithm to ensure that the correspondence between the joint position and the image data is correct.
[0054] Optionally, in one embodiment of this application, generating image features corresponding to the image standard data and the first coordinate value of the target object in the camera coordinate system based on image standard data and a preset target detection model includes: extracting initial image features from the image standard data; performing feature point detection on the initial image features to obtain multiple key points corresponding to the initial image features, and generating a feature descriptor for each key point among the multiple key points; performing feature matching on the feature descriptors to obtain feature matching results, and generating image features based on the feature matching results and a preset filtering strategy; and generating position information and bounding box information of the robotic arm end-effector camera based on the target detection model and image standard data, so as to determine the first coordinate value of the target object in the camera coordinate system based on the position information and bounding box information.
[0055] It should be noted that the specific process by which this application embodiment performs feature analysis on the image to determine the first coordinate value of the target object in the camera coordinate system is as follows:
[0056] 1. Deep learning feature extraction:
[0057] (1) Perform image feature extraction using a pre-trained VGG16 model;
[0058] (2) Load the weights and structure of the pre-trained model and remove the fully connected layers of the model to preserve the spatial information of the image;
[0059] (3) Input the image into the model and obtain the output of the convolutional layer as the feature representation of the image;
[0060] 2. SIFT Feature Point Detection:
[0061] (1) Apply the SIFT algorithm to the original image to detect feature points;
[0062] (2) SIFT (scale-invariant feature transform) is an image feature detection algorithm that extracts stable and discriminative key points;
[0063] (3) For each key point, extract its location, scale and orientation information;
[0064] 3. Feature descriptor generation:
[0065] (1) No additional descriptor generation is needed for the features extracted by CNN, because the convolutional neural network has already extracted the high-level feature representation of the image;
[0066] (2) For the keypoints detected by SIFT, generate descriptors around each keypoint;
[0067] (3) A descriptor is a numerical description of the image information of the area surrounding a key point, which is used for subsequent feature matching;
[0068] 4. Feature matching:
[0069] It is important to note that before feature matching, image feature analysis is performed first. Important features in the image are extracted using a convolutional neural network. Then, the SIFT algorithm is used for feature point detection and descriptor generation, followed by feature matching and result filtering.
[0070] (1) Use the generated feature descriptors for feature matching;
[0071] (2) For each feature extracted by the CNN, the Euclidean nearest neighbor search method is used for matching;
[0072] (3) For the feature descriptors generated by SIFT, nearest-neighbor matching based on descriptor similarity is used;
[0073] 5. Matching result filtering:
[0074] (1) For feature matching results, some filtering mechanisms are applied to remove incorrect matches;
[0075] (2) Perform distance threshold filtering and nearest neighbor and second nearest neighbor distance ratio filtering to generate image features;
[0076] 6. The embodiments of this application can generate position information and bounding box information of the end-effector camera of the robotic arm based on the target detection model and image standard data, so as to determine the first coordinate value of the target object in the camera coordinate system according to the position information and bounding box information.
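The matching and filtering of steps 4 and 5 can be illustrated with a minimal sketch. It assumes descriptors are plain lists of floats and applies Euclidean nearest-neighbor search, a distance threshold, and a nearest to second-nearest distance ratio test. The function names, the threshold `max_distance`, the ratio `ratio`, and the sample descriptors are illustrative assumptions and are not values given in this application; a practical implementation would more likely rely on an optimized library matcher.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(query, train, max_distance=0.7, ratio=0.75):
    """Return (query_idx, train_idx, distance) for matches passing both filters.

    max_distance and ratio are assumed values chosen for illustration only.
    """
    matches = []
    for qi, q in enumerate(query):
        # Rank all candidates so the nearest and second-nearest come first.
        ranked = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue
        (d1, best), (d2, _) = ranked[0], ranked[1]
        if d1 > max_distance:
            continue  # distance threshold filter
        if d1 >= ratio * d2:
            continue  # ratio filter: the best match is not distinctive enough
        matches.append((qi, best, d1))
    return matches

# Hypothetical 3-D descriptors; real SIFT descriptors have 128 dimensions.
query = [[0.0, 0.0, 1.0], [0.5, 0.5, 0.5]]
train = [[0.0, 0.1, 1.0], [1.0, 0.0, 0.0],
         [0.5, 0.45, 0.55], [0.5, 0.55, 0.45]]
matches = match_descriptors(query, train)
```

In this example the first query descriptor has one clearly closest candidate and is kept, while the second has two equally close candidates and is rejected by the ratio filter as ambiguous.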
[0077] Optionally, in one embodiment of this application, the second coordinate value of the target object in the camera coordinate system at the end of the robotic arm is calculated based on the first coordinate value and the preset real-time joint position of the robotic arm, and a world coordinate system transformation operation is performed on the second coordinate value to generate trajectory data of the target object in the world coordinate system. This includes: determining the correspondence between the first coordinate value and the real-time joint position of the robotic arm, and obtaining the second coordinate value of the target object based on the correspondence and the real-time joint position of the robotic arm; determining the static transformation relationship between the camera coordinate system and the camera coordinate system at the end of the robotic arm, establishing a world coordinate system based on the static transformation relationship, and transforming the second coordinate value to the world coordinate system to generate trajectory data of the target object in the world coordinate system.
[0078] In actual implementation, embodiments of this application can use the TF library in ROS to ensure the correct transformation between the joint coordinate system and the camera coordinate system of the robotic arm, and publish the static transformation relationship between the joint coordinate system and the camera coordinate system in ROS.
[0079] Furthermore, embodiments of this application also require writing ROS nodes, subscribing to topics related to camera images and joint position information, converting joint position information into coordinates in the camera coordinate system, and associating it with camera images; within the node, the associated joint positions and camera images, etc., are saved to a ROS Bag file; simultaneously, embodiments of this application require ensuring that joint position information and camera image data are synchronized in time to prevent data inconsistency, and verifying the accuracy of coordinate system transformation to ensure that the transformed data corresponds correctly in the same coordinate system.
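The time synchronization requirement described above can be sketched as follows, assuming that image timestamps and joint position samples have already been read from their topics into Python lists. Each image is paired with the joint sample nearest in time, and pairs whose offset exceeds a tolerance are discarded. The function name `associate_by_time`, the tolerance `max_offset`, and the sample data are hypothetical; in a ROS system, an approximate-time message filter would typically perform this role.

```python
from bisect import bisect_left

def associate_by_time(image_stamps, joint_samples, max_offset=0.02):
    """Pair each image timestamp with the nearest joint sample in time.

    joint_samples is a list of (stamp, joint_positions) sorted by stamp.
    max_offset (seconds) is an assumed tolerance, not a value from the source.
    """
    joint_stamps = [stamp for stamp, _ in joint_samples]
    pairs = []
    for t in image_stamps:
        i = bisect_left(joint_stamps, t)
        # Only the neighbors just before and just after t can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(joint_stamps)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(joint_stamps[k] - t))
        if abs(joint_stamps[j] - t) <= max_offset:
            pairs.append((t, joint_samples[j][1]))
    return pairs

# Hypothetical samples: joint states at about 100 Hz with a gap before 0.20 s.
joint_samples = [(0.00, [0.0, 0.10]), (0.01, [0.0, 0.11]),
                 (0.02, [0.0, 0.12]), (0.03, [0.0, 0.13]),
                 (0.20, [0.0, 0.30])]
image_stamps = [0.011, 0.029, 0.100]
pairs = associate_by_time(image_stamps, joint_samples)
```

Here the image at 0.100 s has no joint sample within the tolerance and is dropped rather than paired with stale data, which is one way to prevent the data inconsistency mentioned above.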
[0080] Subsequently, embodiments of this application also require the establishment of a world coordinate system to obtain trajectory data of the target object in the world coordinate system. Specifically, the process of establishing a world coordinate system in embodiments of this application is as follows:
[0081] 1. Prepare the calibration plate:
[0082] (1) Use a calibration plate of known size, with squares or dots of known size on the plane;
[0083] (2) Place the calibration plate in the working area of the robotic arm and the camera to ensure that the calibration plate can be seen by the camera throughout the scene;
[0084] 2. Acquire calibration images:
[0085] (1) Images of the calibration board are captured by a camera at different positions and orientations;
[0086] (2) Ensure that the calibration board occupies enough pixels in the image and that the feature points on the calibration board can be accurately detected;
[0087] 3. Feature point extraction:
[0088] (1) Use corner detection image processing algorithm to extract feature points in calibration board image;
[0089] (2) The coordinates of the feature points should correspond to the actual positions of the feature points on the calibration plate;
[0090] 4. Camera calibration:
[0091] (1) Use the actual size information of the calibration plate and the feature point information collected by the camera to calibrate the camera's intrinsic parameters and distortion parameters;
[0092] (2) Use the calibrateCamera function in OpenCV to calibrate the camera;
[0093] 5. End-effector calibration:
[0094] (1) Place the calibration plate in the working area of the end effector of the robotic arm;
[0095] (2) Record the joint angles of the robotic arm or the pose of the end effector in different positions and postures;
[0096] 6. Camera and robotic arm linkage:
[0097] (1) Using the results of camera calibration and robotic arm end effector calibration, establish the transformation relationship between the camera coordinate system and the robotic arm end effector coordinate system;
[0098] (2) It is represented by a coordinate transformation matrix, which includes translation and rotation information;
[0099] 7. Publish the world coordinate system. In ROS, use the tf library to publish the static transformation relationship between the camera coordinate system and the robot arm end effector coordinate system to establish the world coordinate system.
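The coordinate transformation matrix of step 6 and its use for mapping a camera-frame coordinate into the world coordinate system can be sketched as below. The sketch assumes 4x4 homogeneous transforms, restricts rotation to the z axis for brevity, and uses hypothetical numeric values for the camera mounting offset and the end-effector pose; none of these values come from this application.

```python
import math

def make_transform(yaw, translation):
    """Homogeneous 4x4 transform: rotation about z by yaw, plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b: apply b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(t, point):
    """Transform a 3-D point with a homogeneous transform."""
    p = (point[0], point[1], point[2], 1.0)
    return tuple(sum(t[i][k] * p[k] for k in range(4)) for i in range(3))

# Static camera-to-end-effector transform (assumed: 5 cm offset along z).
T_ee_cam = make_transform(0.0, (0.0, 0.0, 0.05))
# End-effector pose in the world frame (assumed, e.g. from joint positions).
T_world_ee = make_transform(math.pi / 2, (0.4, 0.0, 0.3))
T_world_cam = compose(T_world_ee, T_ee_cam)

p_cam = (0.1, 0.0, 0.5)              # target coordinate in the camera frame
p_world = apply(T_world_cam, p_cam)  # approximately (0.4, 0.1, 0.85)
```

Repeating the last step for each synchronized frame would yield a sequence of world-frame points, which corresponds to the trajectory data of the target object in the world coordinate system.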
[0100] In step S103, control command data for the robotic arm is obtained based on the trajectory data. A training dataset is established based on the control command data and image features. A preset multilayer perceptron is trained using the training dataset to generate a behavior cloning model. The behavior cloning model and a preset deep deterministic policy gradient model are integrated to construct a reinforcement learning model. The reinforcement learning model is then used to obtain teaching data for the robotic arm's imitation learning.
[0101] Furthermore, embodiments of this application can obtain robotic arm control command data based on the trajectory data and construct a training dataset by combining it with the image features. The training dataset is used to train a multilayer perceptron and generate a behavior cloning model. The behavior cloning model and the deep deterministic policy gradient model are then integrated to construct a reinforcement learning model, which is used to obtain teaching data for robotic arm imitation learning.
[0102] Therefore, the embodiments of this application combine behavioral cloning and reinforcement learning, making full use of the advantages of teaching data and reinforcement learning, making the model more adaptable and capable of learning. Using the MPC method for online control and optimization, the control strategy of the robotic arm can be dynamically adjusted in a real-time environment, which can adapt well to environmental changes and uncertainties.
[0103] Optionally, in one embodiment of this application, a pre-defined multilayer perceptron is trained using a training dataset to generate a behavior cloning model. The behavior cloning model and a pre-defined deep deterministic policy gradient model are then integrated to construct a reinforcement learning model. This reinforcement learning model is used to acquire teaching data for robotic arm imitation learning. The process includes: training the multilayer perceptron based on a target loss function, a pre-defined supervised learning strategy, and a training dataset to obtain the behavior cloning model; constructing a deep deterministic policy gradient model based on a pre-defined state space, action space, and reward function; establishing a reinforcement learning model based on the behavior cloning model and the deep deterministic policy gradient model; controlling the reinforcement learning model online using a pre-defined model predictive control strategy to obtain model predictive control results; and generating motion commands for the robotic arm's end-effector camera based on the model predictive control results. These motion commands are then used to control the robotic arm's end-effector camera to acquire teaching data for robotic arm imitation learning.
[0104] In the specific implementation process, the embodiments of this application need to train a behavior cloning model and perform reinforcement learning integration to obtain teaching data for the robotic arm's imitation learning. The specific behavior cloning model training process is as follows:
[0105] 1. Data preparation:
[0106] (1) Using the results obtained from the teaching data tracking, obtain image features and corresponding robotic arm control command data;
[0107] (2) Ensure that the dataset contains diverse scene and motion examples to improve the generalization of the model;
[0108] 2. Data preprocessing:
[0109] (1) Perform necessary preprocessing on image features, including normalization, scaling or color space conversion, to meet the input requirements of the model;
[0110] (2) Normalize the control command data of the robotic arm to better adapt to the neural network model;
[0111] 3. Model Architecture Design:
[0112] (1) Construct a multilayer perceptron neural network model and define an appropriate number of hidden layers and neurons;
[0113] (2) Take the image features as input and output the control commands for the robotic arm;
[0114] (3) Select network parameters such as activation function and weight initialization method;
[0115] 4. Loss function selection:
[0116] (1) Mean Square Error (MSE) is used as the loss function to measure the difference between the model output and the actual robotic arm motion in the teaching data;
[0117] (2) The MSE loss function calculates the squared error between the model output and the target value in order to perform supervised learning;
[0118] 5. Model compilation:
[0119] (1) Compile the neural network model, select the Adam optimizer, and set the loss function to MSE;
[0120] (2) Specify root mean square error (RMSE) as the evaluation metric to monitor model performance;
[0121] 6. Model training:
[0122] (1) Using the prepared teaching dataset, train the neural network model through supervised learning;
[0123] (2) In each training batch, the image features are used as input and compared with the real robotic arm control commands. The model parameters are updated through backpropagation.
[0124] 7. Model evaluation:
[0125] (1) Use the validation set to evaluate the performance of the model and monitor the downward trend of the loss function;
[0126] (2) The effectiveness of the model is further verified by visualizing the prediction results of the model and the actual movement trajectory of the robotic arm;
[0127] 8. After achieving satisfactory performance, save the trained behavior cloning model for future application in real-world environments.
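The training procedure above can be illustrated with a minimal sketch of a one-hidden-layer perceptron trained by supervised learning on an MSE loss, with RMSE reported as the evaluation metric. To stay self-contained, the sketch substitutes plain per-sample gradient descent for the Adam optimizer named in step 5, and it uses a synthetic dataset in place of real (image feature, control command) pairs. The layer sizes, learning rate, epoch count, and the synthetic mapping are all assumptions made for illustration.

```python
import math
import random

def init_mlp(n_in, n_hidden, n_out, rng):
    """Create a one-hidden-layer perceptron with small random weights."""
    def layer(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        return [[rng.uniform(-scale, scale) for _ in range(cols)]
                for _ in range(rows)]
    return {"W1": layer(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": layer(n_out, n_hidden), "b2": [0.0] * n_out}

def forward(p, x):
    """Return the tanh hidden activations and the linear output (the command)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y = [sum(w * hi for w, hi in zip(row, h)) + b
         for row, b in zip(p["W2"], p["b2"])]
    return h, y

def mse(p, data):
    """Mean squared error between predicted and demonstrated commands."""
    total, count = 0.0, 0
    for x, target in data:
        _, y = forward(p, x)
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, target))
        count += len(target)
    return total / count

def train_epoch(p, data, lr=0.05):
    """One pass of per-sample gradient descent on the MSE loss."""
    for x, target in data:
        h, y = forward(p, x)
        dy = [2.0 * (yi - ti) / len(target) for yi, ti in zip(y, target)]
        # Backpropagate through the output layer and the tanh nonlinearity
        # before any weights are modified.
        dh = [sum(dy[o] * p["W2"][o][j] for o in range(len(dy)))
              * (1.0 - h[j] ** 2) for j in range(len(h))]
        for o, g in enumerate(dy):
            p["W2"][o] = [w - lr * g * hj for w, hj in zip(p["W2"][o], h)]
            p["b2"][o] -= lr * g
        for j, g in enumerate(dh):
            p["W1"][j] = [w - lr * g * xi for w, xi in zip(p["W1"][j], x)]
            p["b1"][j] -= lr * g

rng = random.Random(0)
# Synthetic stand-in for normalized (image features, control command) pairs.
data = []
for _ in range(40):
    x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    data.append((x, [0.5 * x[0] - 0.3 * x[1]]))

params = init_mlp(n_in=2, n_hidden=8, n_out=1, rng=rng)
initial_mse = mse(params, data)
for _ in range(200):
    train_epoch(params, data)
final_mse = mse(params, data)
rmse = math.sqrt(final_mse)  # evaluation metric from step 5
```

A real training run would additionally hold out a validation set, as described in step 7, instead of evaluating on the training data as this sketch does.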
[0128] Furthermore, the specific process of reinforcement learning integration in the embodiments of this application is as follows:
[0129] 1. DDPG Deep Reinforcement Learning Algorithm:
[0130] (1) Implement the Deep Deterministic Policy Gradient (DDPG) algorithm, which is suitable for reinforcement learning problems with continuous action spaces;
[0131] (2) Define the state space, action space, reward function, and other task-related parameters;
[0132] 2. Behavioral cloning model:
[0133] (1) Based on the teaching data obtained by tracking, a behavior cloning model is established to imitate the teaching data. This model is a deep learning model for learning the mapping from state to action.
[0134] 3. Integration of DDPG and behavioral cloning:
[0135] (1) Integrate the DDPG algorithm with the behavior cloning model to form a hybrid strategy;
[0136] (2) During training, DDPG is responsible for exploring the environment, and the behavior cloning model is responsible for imitating the teaching data. The two work together to update the control strategy.
[0137] 4. Online control and optimization using the MPC method:
[0138] (1) At each time step, online control and optimization are performed using the Model Predictive Control (MPC) method;
[0139] (2) Based on the current state and the control strategies provided by the DDPG model and the behavior cloning model, MPC predicts a series of future action sequences and selects the optimal sequence;
[0140] 5. Control command generation:
[0141] (1) Based on the results of MPC, generate actual control commands as motion commands for the end effector of the robotic arm;
[0142] (2) Convert the motion sequence output by MPC into actual joint angles or end effector poses;
[0143] 6. Adjust control strategies online:
[0144] (1) During the training of DDPG and the behavior cloning model, the control strategy is adjusted in real time to adapt to environmental changes and uncertainties;
[0145] (2) Use an experience replay mechanism to learn from historical experience in order to improve the robustness of the control strategy;
[0146] 7. Training and optimization:
[0147] (1) During the training process, continuously optimize DDPG and the behavior cloning model to improve the performance of reinforcement learning;
[0148] (2) Use the new experience from online data collection to train the model and update the model parameters.
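The online MPC step of items 4 and 5 can be sketched with a simple sampling-based scheme: candidate action sequences are generated by perturbing the actions proposed by a policy, each sequence is rolled out through a model over a short horizon, the lowest-cost sequence is selected, and only its first action is executed before replanning. The one-dimensional dynamics, the cost function, and the proportional `policy` that stands in for the combined DDPG and behavior cloning strategy are all hypothetical placeholders, and the sampling scheme is one possible choice rather than the optimizer used in this application.

```python
import random

def dynamics(state, action):
    """Hypothetical 1-D model: the end-effector position moves by the action."""
    return state + action

def stage_cost(state, action, target):
    """Tracking error plus a small penalty on control effort (assumed weights)."""
    return (state - target) ** 2 + 0.01 * action ** 2

def policy(state, target, limit=0.2):
    """Stand-in for the learned hybrid policy: a clipped proportional step."""
    return max(-limit, min(limit, 0.5 * (target - state)))

def mpc_step(state, target, rng, horizon=5, n_candidates=64,
             noise=0.05, limit=0.2):
    """Return the lowest-cost action sequence among sampled candidates."""
    best_cost, best_seq = float("inf"), []
    for k in range(n_candidates):
        s, total, seq = state, 0.0, []
        for _ in range(horizon):
            a = policy(s, target, limit)
            if k > 0:  # candidate 0 is the unperturbed policy rollout
                a = max(-limit, min(limit, a + rng.gauss(0.0, noise)))
            s = dynamics(s, a)
            total += stage_cost(s, a, target)
            seq.append(a)
        if total < best_cost:
            best_cost, best_seq = total, seq
    return best_seq

rng = random.Random(0)
state, target = 0.0, 1.0
for _ in range(40):
    plan = mpc_step(state, target, rng)
    state = dynamics(state, plan[0])  # execute the first action, then replan
```

Because the unperturbed policy rollout is always among the candidates, the selected sequence is never worse, under the assumed model and cost, than following the policy alone.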
[0149] Furthermore, it should be noted that, in the actual process of acquiring and tracking teaching data, the embodiments of this application may also be implemented based on the following steps:
[0150] 1. Online estimation of image Jacobian matrix based on improved Kalman filter:
[0151] (1) Use Kalman filtering to estimate the noise covariance matrix of the state model and dynamically adjust the model parameters in a recursive manner to improve tracking performance;
[0152] (2) By selecting the learning statistics as the adaptive factor, the filtering gain can be dynamically adjusted according to environmental noise and external interference, thereby improving the robustness of the system to complex scenarios.
[0153] (3) Combine adaptive robust Kalman filtering with Kalman filtering based on the maximum correntropy criterion to meet the needs of image feature tracking under different noise conditions and improve the accuracy and stability of imitation learning.
[0154] 2. Hybrid localization control combining a smooth variable structure filter and a bidirectional extreme learning machine:
[0155] (1) Improve the system’s robustness to noise and interference by using a smooth variable structure filter algorithm to ensure that the acquired teaching data is of high quality and reliability;
[0156] (2) The bidirectional extreme learning machine algorithm is used to estimate the nonlinear mapping function between image features and interaction matrix, which further improves the accuracy and real-time performance of data tracking;
[0157] 3. Solve image occlusion and feature interference problems:
[0158] (1) A dual adaptive strong tracking Kalman filter is introduced. By adjusting the image observation data, the visual state of occlusion / interference and the image Jacobian matrix are effectively estimated, thereby improving the reliability and accuracy of the teaching data.
[0159] (2) Design a dual closed-loop image tracking control method based on occlusion / interference, combining an outer loop velocity controller based on proportional-derivative and sliding mode algorithms and an inner loop joint controller based on adaptive sliding mode algorithm to realize dynamic adjustment of the robotic arm motion and ensure the real-time visibility of image features during the motion process.
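As background for the filtering steps above, the following minimal sketch shows how a standard Kalman filter can estimate one row of the image Jacobian online. It assumes the row is modeled as a slowly drifting state and that each measurement is a feature change `ds` produced by a known joint change `dq`, so that `ds` is approximately the dot product of the row and `dq`. The adaptive, robust, and strong tracking variants described in this application are not reproduced here; the noise parameters `R` and `Q`, the simulated Jacobian row, and the function name are illustrative assumptions.

```python
import random

def kalman_update(x, P, dq, ds, R=1e-4, Q=1e-8):
    """One Kalman step for a Jacobian row x given joint change dq and feature change ds.

    R (measurement noise variance) and Q (process noise variance) are assumed.
    """
    n = len(x)
    Ph = [sum(P[i][k] * dq[k] for k in range(n)) for i in range(n)]  # P h
    S = sum(dq[i] * Ph[i] for i in range(n)) + R   # innovation variance
    K = [v / S for v in Ph]                        # Kalman gain
    innovation = ds - sum(dq[i] * x[i] for i in range(n))
    x = [x[i] + K[i] * innovation for i in range(n)]
    # Covariance update (P stays symmetric), then add process noise so the
    # result is already the predicted covariance for the next step.
    P = [[P[i][j] - K[i] * Ph[j] + (Q if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return x, P

rng = random.Random(0)
true_row = [0.5, -1.2, 2.0]   # hypothetical Jacobian row for a 3-joint arm
x = [0.0, 0.0, 0.0]           # initial estimate
P = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

for _ in range(200):
    dq = [rng.uniform(-0.1, 0.1) for _ in range(3)]   # small joint motion
    ds = sum(j * d for j, d in zip(true_row, dq)) + rng.gauss(0.0, 0.001)
    x, P = kalman_update(x, P, dq, ds)
```

An adaptive variant would additionally adjust `R`, `Q`, or the gain from the observed innovations, which is the role attributed above to the adaptive factor.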
[0160] Therefore, the embodiments of this application are based on image tracking control using an adaptive dynamic programming algorithm. The adaptive dynamic programming algorithm optimizes the energy consumption of the robotic arm's vision servo system, ensuring the stability and efficiency of the system during the acquisition of teaching data.
[0161] The execution logic of the robotic arm imitation learning teaching data acquisition method of this application is described below through a specific embodiment.
[0162] The specific execution logic of the robotic arm imitation learning teaching data acquisition method in this application is as follows:
[0163] 1. Determine the range of teaching movements and camera settings:
[0164] Before starting the demonstration, determine the range of human movement and set up a camera to ensure that the entire movement process can be captured;
[0165] 2. Obtain teaching data:
[0166] The video of human demonstration movements is captured by a camera and converted into an image sequence;
[0167] 3. Preprocessing image data:
[0168] The captured image sequences are preprocessed, including denoising, image enhancement, and edge detection, to improve the accuracy and stability of tracking.
[0169] 4. Online estimation of image Jacobian matrix based on improved Kalman filter:
[0170] During the teaching data tracking process, an improved Kalman filter algorithm is used to dynamically estimate the noise covariance matrix of the state model in order to improve tracking performance;
[0171] 5. Hybrid localization control algorithm combining smooth variable structure filter and bidirectional extreme learning machine:
[0172] By combining a smooth variable structure filter and a bidirectional extreme learning machine, image features are localized and controlled to improve the system's robustness to noise and ensure the real-time visibility of image features.
[0173] 6. Handling image occlusion and feature interference issues:
[0174] The dual adaptive strong tracking Kalman filter and the dual closed-loop image tracking control method based on occlusion / interference are used to solve the problems of image occlusion and feature interference, thereby improving tracking accuracy and robustness.
[0175] 7. Image tracking control based on adaptive dynamic programming:
[0176] An algorithm based on adaptive dynamic programming is applied to optimize the energy consumption of the robotic arm's vision servo system, ensuring the system's stability and efficiency during the tracking process;
[0177] 8. Record teaching data:
[0178] The teaching data obtained through tracking is recorded and used as input for the robotic arm's imitation learning, so as to facilitate subsequent learning and imitation processes.
[0179] In summary, the embodiments of this application achieve end-to-end learning through deep learning models, such as convolutional neural networks and multilayer perceptrons, and directly map image features to control commands for the robotic arm, reducing the complexity of manually designed features. Furthermore, by combining visual information with joint position information, this application can more comprehensively understand the robotic arm's motion state in three-dimensional space, improving the accuracy of understanding and controlling the robotic arm's motion. Moreover, by using the Faster R-CNN algorithm for real-time detection and tracking, this application can capture changes in the robotic arm's end effector under different scenarios, increasing the diversity of teaching data and improving the model's generalization ability.
[0180] It is understood that this application acquires teaching data through a visual servoing system, which can provide high-precision position information of the robotic arm's end effector, thereby generating high-quality and highly realistic teaching data. Furthermore, by employing image feature analysis and visual detection algorithms, it can perceive the position of the robotic arm's end effector in the environment in real time, enabling real-time control of the robotic arm. This ensures that the acquisition of teaching data is real-time and can be quickly adjusted according to changes in the environment. Therefore, by combining real-time perception and learning capabilities, this application is more adaptable, generalizable, and flexible than traditional teaching data tracking methods.
[0181] According to the method for acquiring teaching data for robotic arm imitation learning proposed in this application, image data of the target object collected in real time by the end-effector camera of the robotic arm is acquired and preprocessed to generate standard image data that meets preset conditions. The standard image data is input into a preset target detection model to generate image features corresponding to the standard image data and the first coordinate value of the target object in the camera coordinate system. The second coordinate value of the target object in the camera coordinate system of the end-effector of the robotic arm is calculated based on the first coordinate value and the preset real-time joint position of the robotic arm. The second coordinate value is then transformed into the world coordinate system to generate trajectory data of the target object in the world coordinate system. The robotic arm control command data is obtained based on the trajectory data. Based on the robotic arm control command data and image features, a training dataset is established to train a preset multilayer perceptron, generate a behavior cloning model, and integrate the behavior cloning model and a preset deep deterministic policy gradient model to construct a reinforcement learning model. The reinforcement learning model is then used to acquire teaching data for robotic arm imitation learning, thereby greatly improving the model's flexibility and environmental adaptability, and improving the model's generalization performance.
[0182] Secondly, the apparatus for acquiring robotic arm imitation learning teaching data according to an embodiment of this application is described with reference to the accompanying drawings.
[0183] Figure 3 is a block diagram of a device for acquiring teaching data for robotic arm imitation learning according to an embodiment of this application.
[0184] As shown in Figure 3, the robotic arm imitation learning teaching data acquisition device 10 includes: a data acquisition module 100, a conversion module 200, and an acquisition module 300.
[0185] The data acquisition module 100 is used to acquire image data of the target object in real time from the end-effector camera of the robotic arm, and to preprocess the image data to generate image standard data that meets preset conditions.
[0186] The conversion module 200 is used to input image standard data into a preset target detection model to generate image features corresponding to the image standard data and the first coordinate value of the target object in the camera coordinate system. Based on the first coordinate value and the preset real-time joint position of the robotic arm, the second coordinate value of the target object in the camera coordinate system at the end of the robotic arm is calculated, and the second coordinate value is converted to the world coordinate system to generate trajectory data of the target object in the world coordinate system.
[0187] The acquisition module 300 is used to obtain robotic arm control command data based on trajectory data, and to establish a training dataset based on the robotic arm control command data and image features. The training dataset is used to train a preset multilayer perceptron to generate a behavior cloning model. The behavior cloning model and a preset deep deterministic policy gradient model are integrated to construct a reinforcement learning model, so as to obtain teaching data for robotic arm imitation learning using the reinforcement learning model.
[0188] Optionally, in one embodiment of this application, the robotic arm imitation learning teaching data acquisition device 10 of this application embodiment further includes: a construction module and a position module.
[0189] The construction module is used to build a target monocular vision servo system based on a preset ROS environment before acquiring image data of the target object in real time through a preset robotic arm end-effector camera.
[0190] The position module is used to obtain the real-time joint position of the robotic arm based on the target monocular vision servo system.
[0191] Optionally, in one embodiment of this application, the conversion module 200 includes: an extraction unit, a detection unit, a matching unit, and a first determination unit.
[0192] The extraction unit is used to extract the initial image features from the standard image data.
[0193] The detection unit is used to perform feature point detection on the initial image features to obtain multiple key points corresponding to the initial image features, and to generate a feature descriptor for each key point among the multiple key points.
[0194] The matching unit is used to perform feature matching operations on the feature descriptors, obtain the feature matching results, and generate image features based on the feature matching results and a preset filtering strategy.
[0195] The first determining unit is used to generate position information and bounding box information of the end-effector camera of the robotic arm based on the target detection model and image standard data, so as to determine the first coordinate value of the target object in the camera coordinate system according to the position information and bounding box information.
[0196] Optionally, in one embodiment of this application, the conversion module 200 further includes a second determining unit and a third determining unit.
[0197] The second determining unit is used to determine the correspondence between the first coordinate value and the real-time joint position of the robotic arm, and to obtain the second coordinate value of the target object based on the correspondence and the real-time joint position of the robotic arm.
[0198] The third determining unit is used to determine the static transformation relationship between the camera coordinate system and the end-effector camera coordinate system of the robotic arm, establish a world coordinate system based on the static transformation relationship, and transform the second coordinate values into the world coordinate system to generate trajectory data of the target object in the world coordinate system.
[0199] Optionally, in one embodiment of this application, the acquisition module 300 includes: a training unit, a setup unit, and a control unit.
[0200] The training unit is used to train a multilayer perceptron based on the target loss function, a pre-defined supervised learning strategy, and a training dataset to obtain a behavior cloning model.
[0201] The setup unit is used to construct a deep deterministic policy gradient model based on a preset state space, action space, and reward function, and to build a reinforcement learning model based on the behavior cloning model and the deep deterministic policy gradient model.
[0202] The control unit is used to control the reinforcement learning model online through a preset model predictive control strategy, obtain the model predictive control result, and generate motion commands for the robotic arm end-effector camera based on the model predictive control result, so as to use the motion commands to control the robotic arm end-effector camera to obtain teaching data for the robotic arm's imitation learning.
[0203] It should be noted that the foregoing explanation of the method for acquiring teaching data for robotic arm imitation learning also applies to the device for acquiring teaching data for robotic arm imitation learning in this embodiment, and will not be repeated here.
[0204] The device for acquiring teaching data for robotic arm imitation learning according to the embodiments of this application includes an acquisition module for acquiring image data of a target object collected in real time by a camera at the end of the robotic arm, and preprocessing the image data to generate standard image data that meets preset conditions; a conversion module for inputting the standard image data into a preset target detection model to generate image features corresponding to the standard image data and the first coordinate value of the target object in the camera coordinate system, calculating the second coordinate value of the target object in the camera coordinate system at the end of the robotic arm based on the first coordinate value and the preset real-time joint position of the robotic arm, and performing a world coordinate system transformation operation on the second coordinate value to generate trajectory data of the target object in the world coordinate system; and an acquisition module for obtaining robotic arm control command data based on the trajectory data, and establishing a training dataset based on the robotic arm control command data and image features, so as to train a preset multilayer perceptron through the training dataset, generate a behavior cloning model, and integrate the behavior cloning model and a preset deep deterministic policy gradient model to construct a reinforcement learning model, so as to use the reinforcement learning model to acquire teaching data for robotic arm imitation learning, thereby greatly improving the flexibility and environmental adaptability of the model and improving the generalization performance of the model.
[0205] Figure 4 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. The electronic device may include:
[0206] A memory 401, a processor 402, and a computer program stored on the memory 401 and capable of running on the processor 402.
[0207] When the processor 402 executes the program, it implements the method for acquiring robotic arm imitation learning teaching data provided in the above embodiments.
[0208] Furthermore, the electronic device also includes:
[0209] A communication interface 403, used for communication between the memory 401 and the processor 402.
[0210] The memory 401 is used to store computer programs that can run on the processor 402.
[0211] The memory 401 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage device.
[0212] If the memory 401, processor 402, and communication interface 403 are implemented independently, then the communication interface 403, memory 401, and processor 402 can be interconnected via a bus to complete communication between them. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be divided into address buses, data buses, control buses, etc. For ease of representation, the bus is represented by a single thick line in Figure 4, but this does not mean that there is only one bus or one type of bus.
[0213] Optionally, in a specific implementation, if the memory 401, processor 402, and communication interface 403 are integrated on a single chip, then the memory 401, processor 402, and communication interface 403 can communicate with each other through an internal interface.
[0214] Processor 402 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application.
[0215] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the above-described method for acquiring teaching data for robotic arm imitation learning.
[0216] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0217] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "N" means at least two, such as two, three, etc., unless otherwise explicitly specified.
[0218] Any process or method described in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or N executable instructions for implementing custom logic functions or processes, and the scope of the preferred embodiments of this application includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which embodiments of this application pertain.
[0219] The logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any apparatus that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Alternatively, the computer-readable medium may be paper or another suitable medium on which the program is printed, since the program can be obtained electronically by optically scanning the paper or other medium, followed by editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
[0220] It should be understood that the various parts of this application can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0221] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program performs one or a combination of the steps of the method embodiments.
[0222] Furthermore, the functional units in the various embodiments of this application can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
[0223] The storage medium mentioned above can be a read-only memory, a disk, or an optical disk, etc. Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of this application.
Claims
1. A method for acquiring teaching data for robotic arm imitation learning, characterized by comprising the following steps:
acquiring image data of a target object in real time through an end-effector camera of the robotic arm, and preprocessing the image data to generate image standard data that meets preset conditions;
inputting the image standard data into a preset target detection model to generate image features corresponding to the image standard data and a first coordinate value of the target object in the camera coordinate system;
calculating a second coordinate value of the target object in the end-effector camera coordinate system of the robotic arm based on the first coordinate value and the real-time joint position of the robotic arm, and transforming the second coordinate value into the world coordinate system to generate trajectory data of the target object in the world coordinate system, wherein the trajectory data includes the results obtained by tracking using teaching data;
obtaining control command data for the robotic arm based on the trajectory data, establishing a training dataset based on the control command data and the image features, training a preset multilayer perceptron with the training dataset to generate a behavior cloning model, integrating the behavior cloning model with a preset deep deterministic policy gradient model to construct a reinforcement learning model, and using the reinforcement learning model to obtain teaching data for robotic arm imitation learning;
wherein training the preset multilayer perceptron with the training dataset to generate the behavior cloning model, integrating the behavior cloning model with the preset deep deterministic policy gradient model to construct the reinforcement learning model, and using the reinforcement learning model to obtain teaching data for robotic arm imitation learning comprises:
training the multilayer perceptron based on a target loss function, a preset supervised learning strategy, and the training dataset to obtain the behavior cloning model;
constructing the deep deterministic policy gradient model based on a preset state space, action space, and reward function, and establishing the reinforcement learning model based on the behavior cloning model and the deep deterministic policy gradient model;
controlling the reinforcement learning model online using a preset model predictive control strategy to obtain a model predictive control result, and generating, based on the model predictive control result, motion commands for the end-effector camera of the robotic arm, so as to acquire teaching data for robotic arm imitation learning.
2. The method of claim 1, wherein, before acquiring the image data of the target object in real time through the end-effector camera of the robotic arm, the method further comprises:
building a target monocular vision servoing system based on a preset ROS environment;
obtaining the real-time joint position of the robotic arm based on the target monocular vision servoing system.
3. The method of claim 2, wherein the step of generating the image features corresponding to the image standard data and the first coordinate value of the target object in the camera coordinate system based on the image standard data and the preset target detection model comprises:
extracting initial image features from the image standard data;
performing a feature point detection operation on the initial image features to obtain multiple key points corresponding to the initial image features, and generating a feature descriptor for each of the multiple key points;
performing a feature matching operation on the feature descriptors to obtain a feature matching result, and generating the image features based on the feature matching result and a preset filtering strategy;
generating, based on the target detection model and the image standard data, position information and bounding box information of the end-effector camera of the robotic arm, so as to determine the first coordinate value of the target object in the camera coordinate system according to the position information and the bounding box information.
4. The method of claim 3, wherein the step of calculating the second coordinate value of the target object in the end-effector camera coordinate system based on the first coordinate value and the real-time joint position of the robotic arm, and transforming the second coordinate value into the world coordinate system to generate trajectory data of the target object in the world coordinate system, comprises:
determining the correspondence between the first coordinate value and the real-time joint position of the robotic arm, and obtaining the second coordinate value of the target object based on the correspondence and the real-time joint position of the robotic arm;
determining the static transformation relationship between the camera coordinate system and the end-effector camera coordinate system of the robotic arm, establishing the world coordinate system based on the static transformation relationship, and transforming the second coordinate value into the world coordinate system to generate trajectory data of the target object in the world coordinate system.
5. A device for acquiring teaching data for robotic arm imitation learning, characterized by comprising:
an acquisition module, used to acquire image data of a target object in real time through an end-effector camera of the robotic arm, and to preprocess the image data to generate image standard data that meets preset conditions;
a conversion module, used to input the image standard data into a preset target detection model to generate image features corresponding to the image standard data and a first coordinate value of the target object in the camera coordinate system, and, based on the first coordinate value and the real-time joint position of the robotic arm, to calculate a second coordinate value of the target object in the end-effector camera coordinate system of the robotic arm and transform the second coordinate value into the world coordinate system to generate trajectory data of the target object in the world coordinate system, wherein the trajectory data includes the results obtained by tracking using teaching data;
an acquisition module, used to obtain robotic arm control command data based on the trajectory data, and to establish a training dataset based on the robotic arm control command data and the image features, so as to train a preset multilayer perceptron with the training dataset, generate a behavior cloning model, and integrate the behavior cloning model with a preset deep deterministic policy gradient model to construct a reinforcement learning model, so as to use the reinforcement learning model to obtain teaching data for robotic arm imitation learning;
wherein the acquisition module used to obtain the robotic arm control command data includes:
a training unit, used to train the multilayer perceptron based on a target loss function, a preset supervised learning strategy, and the training dataset to obtain the behavior cloning model;
an establishment unit, used to construct the deep deterministic policy gradient model according to a preset state space, action space, and reward function, and to establish the reinforcement learning model based on the behavior cloning model and the deep deterministic policy gradient model;
a control unit, used to control the reinforcement learning model online through a preset model predictive control strategy, obtain a model predictive control result, and generate motion commands for the end-effector camera of the robotic arm based on the model predictive control result, so as to use the motion commands to control the end-effector camera of the robotic arm to acquire teaching data for robotic arm imitation learning.
6. The apparatus of claim 5, further comprising:
a module used to build a target monocular vision servoing system based on a preset ROS environment before the image data of the target object is acquired in real time through the end-effector camera of the robotic arm;
a position module, used to obtain the real-time joint position of the robotic arm according to the target monocular vision servoing system.
7. The apparatus according to claim 6, characterized in that the conversion module includes:
an extraction unit, used to extract initial image features from the image standard data;
a detection unit, used to perform feature point detection on the initial image features to obtain multiple key points corresponding to the initial image features, and to generate a feature descriptor for each of the multiple key points;
a matching unit, used to perform a feature matching operation on the feature descriptors, obtain a feature matching result, and generate the image features based on the feature matching result and a preset filtering strategy;
a first determining unit, used to generate position information and bounding box information of the end-effector camera of the robotic arm based on the target detection model and the image standard data, so as to determine the first coordinate value of the target object in the camera coordinate system according to the position information and the bounding box information.
8. The apparatus according to claim 7, characterized in that the conversion module further includes:
a second determining unit, used to determine the correspondence between the first coordinate value and the real-time joint position of the robotic arm, and to obtain the second coordinate value of the target object based on the correspondence and the real-time joint position of the robotic arm;
a third determining unit, used to determine the static transformation relationship between the camera coordinate system and the end-effector camera coordinate system of the robotic arm, establish the world coordinate system according to the static transformation relationship, and transform the second coordinate value into the world coordinate system to generate trajectory data of the target object in the world coordinate system.
9. An electronic device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for acquiring teaching data for robotic arm imitation learning as described in any one of claims 1-4.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method for acquiring teaching data for robotic arm imitation learning as described in any one of claims 1-4.
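For illustration only, the coordinate chain recited in claims 4 and 8 (first coordinate value in the camera coordinate system, second coordinate value in the end-effector camera coordinate system, then the world coordinate system) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the world coordinate system is assumed to coincide with the base of the robotic arm, the arm is assumed to be a toy planar chain of revolute joints about the z axis, and every function and parameter name (rot_z, matmul, apply, base_to_end, camera_point_to_world, track, t_end_cam, link_lengths) is hypothetical.

```python
import math


def rot_z(theta, tx=0.0, ty=0.0, tz=0.0):
    """4x4 homogeneous transform: rotation about z by theta plus a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(t, p):
    """Apply homogeneous transform t to the 3D point p."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(t[i][k] * v[k] for k in range(4)) for i in range(3))


def base_to_end(joint_angles, link_lengths):
    """Toy forward kinematics (assumption): planar revolute chain about z."""
    t = rot_z(0.0)  # identity
    for theta, length in zip(joint_angles, link_lengths):
        # Rotate by the joint angle, then move along the rotated x axis.
        t = matmul(t, matmul(rot_z(theta), rot_z(0.0, tx=length)))
    return t


def camera_point_to_world(p_cam, t_end_cam, joint_angles, link_lengths):
    """Map a camera-frame point to the world (assumed: arm base) frame."""
    # Static camera-to-end-effector transform gives the second coordinate value.
    p_end = apply(t_end_cam, p_cam)
    # Real-time joint positions give the end-effector pose in the world frame.
    return apply(base_to_end(joint_angles, link_lengths), p_end)


def track(detections, t_end_cam, link_lengths):
    """Build a world-frame trajectory from (camera point, joint angles) pairs."""
    return [camera_point_to_world(p, t_end_cam, q, link_lengths)
            for p, q in detections]
```

In this sketch, t_end_cam plays the role of the static transformation relationship, and the joint angles stand in for the real-time joint position of the robotic arm. A real arm would replace base_to_end with its own forward kinematics.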