A robot free-grasping control method fusing object detection and a robot
By generating temporary stopping points using the robot's camera module and prior knowledge base, and combining RGB and depth images to determine candidate grasping postures, the problem of inaccurate target object grasping in cluttered scenes is solved, achieving fast and accurate target object grasping.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- CHONGQING UNIV
- Filing Date
- 2024-05-15
- Publication Date
- 2026-06-26
AI Technical Summary
In cluttered environments, existing technologies cannot achieve accurate and rapid grasping of target objects, especially since the depth camera on the robot can only obtain target information from a single view, resulting in a low matching success rate.
The robot collects images using its camera module, combines them with a prior knowledge base and an object detection model, generates temporary docking points, adjusts the robot's position to keep the target object within its grasping range, uses RGB and depth images to determine candidate grasping postures, and selects a target grasping posture that is reachable by the robotic claw and will not result in a collision.
It achieves accurate and rapid object grasping in cluttered scenes, using RGB images to identify targets and depth images to determine pose, reducing computational load and improving grasping accuracy and efficiency.
Figure CN118418128B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of robot control technology, and more specifically, to a robot free grasping control method and robot that integrates target detection.
Background Technology
[0002] Current grasping detection or 6DoF (six degrees of freedom) pose estimation methods fall mainly into two categories: those that do not rely on an experience base and directly output candidate grasping poses for unknown objects, and those that rely on an experience base of object grasps and obtain the grasping pose after identifying and matching the target object. The experience-base-free method generates candidate grasping positions at all suitable locations within the field of view. It is suitable for unordered grasping within a fixed 3D region, but cannot meet the requirement of grasping a specific target in cluttered scenes in arbitrary areas. The experience-base-dependent method must, after detection, match the target pose against known 3D models in the experience base to obtain the grasping pose. However, target objects in real-world scenes are complex and varied, and the depth camera on the robot usually obtains target information from only a single view, resulting in a low matching success rate. As a result, existing methods cannot grasp target objects accurately and quickly in cluttered scenes.
Summary of the Invention
[0003] In view of this, the purpose of this application is to provide a robot free grasping control method and robot that integrates target detection, which can alleviate the problem that target objects cannot be grasped accurately and quickly in cluttered scenes.
[0004] To achieve the above technical objectives, the technical solution adopted in this application is as follows:
[0005] In a first aspect, embodiments of this application provide a robot free grasping control method that integrates target detection, the method comprising:
[0006] S110, upon receiving an instruction to grasp a target object, a temporary stopping point is generated for the robot to approach the target object based on the first image set captured by the camera module on the robot and the robot's prior knowledge base;
[0007] S120, when the robot navigates to the temporary stopping point, it takes pictures of the target object through the camera module to obtain a second image set;
[0008] S130, based on the RGB image and depth image in the second image set, determine whether the target object is within the grasping range of the robot;
[0009] S140, when the target object is not within the grasping range of the robot, the position of the robot is adjusted based on the position difference between the target object and the grasping range of the robot, so that the target object is within the grasping range of the robot after the adjustment.
[0010] S150, when the target object is within the grasping range of the robot, at least one set of candidate grasping postures is determined based on the point cloud data of the target object in the depth image;
[0011] S160, determine a target grasping posture from at least one set of candidate grasping postures, wherein the target grasping posture is a candidate grasping posture that is reachable by the robot's mechanical gripper without collision.
[0012] S170, control the robot to grasp the target object in the target grasping posture.
[0013] In conjunction with the first aspect, in some alternative implementations, step S110 includes:
[0014] S111, query whether the prior knowledge base records the storage location of the target object;
[0015] S112, when the storage location of the target object does not exist in the prior knowledge base, environmental information is collected through the camera module to obtain the first image set, wherein the first image set includes an aligned first RGB image and a first depth image;
[0016] S113, using a target detection model, detect whether the target object exists in the first RGB image;
[0017] S114, when the target object is not present in the first RGB image, the pitch angle of the camera module is adjusted stepwise within the adjustment range of the pitch angle adjustment mechanism of the camera module, and based on the adjusted pitch angle, steps S112 to S113 are repeated until the adjustment range is exhausted or the target object is determined to exist in the first RGB image.
[0018] S115, when the pitch angle has been traversed and the target object is not found in any of the first RGB images collected, control the robot to rotate in place, and rotate by a specified angle each time. Based on the viewpoint after each rotation, repeat steps S112 to S114 until the robot has rotated one full circle or it is determined that the target object exists in the first RGB image.
[0019] S116, when the target object exists in the first RGB image, based on the positioning box of the target object in the first RGB image, an image region representing the location of the target object is determined from the first depth image corresponding to the first RGB image, and used as the first region of interest;
[0020] S117, based on the point cloud data of the first region of interest, generate a temporary stopping point for the robot to approach the target object;
[0021] S118, or, when the storage location of the target object exists in the prior knowledge base, a temporary stopping point is generated for the robot to approach the target object based on the storage location of the target object.
[0022] In conjunction with the first aspect, in some alternative implementations, step S130 includes:
[0023] The first image region where the target object is located is determined from the RGB images of the second image set using an object detection model;
[0024] Based on the position information of the first image region in the RGB image, an image region representing the location of the target object is determined from the depth image of the second image set, and used as the second image region;
[0025] Based on the second image region, determine the position of the center point of the target object's surface;
[0026] If the center point of the target object's surface is within the grasping range, then the target object is determined to be within the robot's grasping range.
[0027] Alternatively, if the center point of the target object's surface is not within the grasping range, then the target object is determined to be outside the robot's grasping range.
[0028] In conjunction with the first aspect, in some alternative implementations, step S140 includes:
[0029] Based on the positional difference between the center point of the target object's surface and the robot's grasping range, an obstacle avoidance algorithm is used to plan a fine-tuned path for the robot to travel to the final stopping point.
[0030] Based on the fine-tuned path, the robot is controlled to travel to the final stopping point. When the robot is at the final stopping point, the target object is within the field of view of the camera module and within the robot's grasping range.
[0031] In conjunction with the first aspect, in some alternative implementations, step S150 includes:
[0032] A first positioning frame containing the target object is determined from the depth image, wherein the size and position of the first positioning frame are the same as the size and position of the positioning frame of the target object in the RGB image;
[0033] The first positioning frame is enlarged about its center by a first specified factor to obtain a second positioning frame;
[0034] The region of the second positioning frame in the depth image is taken as the second region of interest, and the point cloud data of the second region of interest is filtered to obtain the first point cloud dataset;
[0035] The first point cloud dataset is voxel-downsampled, and the voxel size is dynamically adjusted based on the number of points in the downsampled dataset so that this number falls within a preset range, thus obtaining the point cloud of interest P_f;
[0036] Based on the center of the first positioning frame, the first positioning frame is reduced by a second specified factor to obtain the third positioning frame;
[0037] The points of P_f located within the third positioning frame are taken as the sampling-region point cloud P_s;
[0038] Create the pose model of the robot's mechanical gripper, expressed as g = (q, v_x, v_y), where g represents the pose of the mechanical gripper, q is the midpoint of the bottom of the mechanical gripper, v_x is the direction of travel of the mechanical gripper, and v_y is the direction that passes through point q and is perpendicular to v_x;
[0039] N grasping points p are selected from the sampling-region point cloud P_s by farthest point sampling. For each point p, based on the normal information of point p and the point cloud around point p, the normal direction and the principal curvature directions of the surface at point p are determined. The reversed normal direction is used as the initial v_x of the mechanical gripper's movement, and the reversed direction of maximum principal curvature is used as the initial v_y;
[0040] The pose of the mechanical gripper is adjusted by varying the initial v_y and the position q of the bottom midpoint of the mechanical gripper, and the collision situation between the mechanical gripper and the point cloud of interest P_f is calculated. Based on the collision situation, at least one set of collision-free candidate grasping postures is obtained.
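The adaptive voxel downsampling and farthest point sampling steps above can be illustrated with a minimal NumPy sketch. The function names, the starting voxel size, the growth and shrink factors, and the iteration cap are all assumptions made for illustration; the source does not specify how the voxel size is adjusted.

```python
import numpy as np


def voxel_downsample(points, voxel):
    """Replace all points that fall in the same voxel by their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]


def adaptive_voxel_downsample(points, n_min, n_max, voxel=0.005, max_iter=30):
    """Adjust the voxel size until the point count lies in [n_min, n_max].

    The factors 1.2 and 0.9 are illustrative assumptions. The loop gives up
    after max_iter tries and returns the last result.
    """
    down = voxel_downsample(points, voxel)
    for _ in range(max_iter):
        if len(down) > n_max:
            voxel *= 1.2   # too many points: coarser voxels
        elif len(down) < n_min:
            voxel *= 0.9   # too few points: finer voxels
        else:
            break
        down = voxel_downsample(points, voxel)
    return down, voxel


def farthest_point_sampling(points, n):
    """Pick n points, each one the farthest from those already chosen."""
    n = min(n, len(points))
    if n == 0:
        return points[:0]
    chosen = [0]  # start from the first point (arbitrary choice)
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n - 1):
        idx = int(np.argmax(dist))
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return points[chosen]
```

In this sketch, P_f would correspond to the output of `adaptive_voxel_downsample`, and the N grasping points to the output of `farthest_point_sampling` applied to P_s.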
[0041] In conjunction with the first aspect, in some alternative implementations, step S160 includes:
[0042] S161, sort the at least one set of candidate grasping postures by the number of points of the point cloud enclosed by the mechanical gripper, and select the candidate grasping posture that is reachable by the mechanical gripper and encloses the largest number of points as the target grasping posture;
[0043] S162, or, input the point cloud data corresponding to at least one set of the candidate grasping postures into a trained convolutional neural network for grasping posture scoring, obtain the scores of at least one set of the candidate grasping postures, and select the candidate grasping posture that is reachable by the mechanical claw and has the highest score as the target grasping posture.
[0044] In conjunction with the first aspect, in some alternative implementations, step S161 includes:
[0045] S1611, sort the at least one set of candidate grasping postures by the number of points enclosed by the mechanical gripper to obtain a posture sequence;
[0046] S1612, take the candidate grasping posture with the largest number of points in the posture sequence as the first grasping posture;
[0047] S1613, calculate the pre-grasp pose by retracting the pose of the first grasping posture;
[0048] S1614, based on the pre-grasp pose and the environmental information corresponding to the second region of interest after voxel downsampling, perform obstacle avoidance path planning for the mechanical gripper in Cartesian space;
[0049] S1615, if no path is successfully planned, delete the first grasping posture from the posture sequence and repeat steps S1612 to S1614 on the remaining posture sequence until a path is successfully planned or all candidate grasping postures in the posture sequence have been traversed;
[0050] S1616, if a path is successfully planned, take the first grasping posture as the target grasping posture.
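The selection loop of steps S1611 to S1616 can be sketched as follows. This is only an illustration: the `Grasp` class, the retraction of 0.10 m along v_x, and the `plan_path` callback (standing in for the Cartesian obstacle avoidance planner) are hypothetical and not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

import numpy as np


@dataclass
class Grasp:
    q: np.ndarray      # midpoint of the gripper bottom
    v_x: np.ndarray    # unit vector, direction of travel of the gripper
    n_points: int      # number of points enclosed by the gripper


def pre_grasp(grasp: Grasp, retreat: float = 0.10) -> Grasp:
    """Assumed retraction: move q backwards along v_x by `retreat` metres."""
    return Grasp(grasp.q - retreat * grasp.v_x, grasp.v_x, grasp.n_points)


def select_target_grasp(
    candidates: List[Grasp],
    plan_path: Callable[[Grasp], bool],
) -> Optional[Grasp]:
    """Try candidates in descending point count until a path can be planned."""
    sequence = sorted(candidates, key=lambda g: g.n_points, reverse=True)
    while sequence:
        first = sequence[0]
        if plan_path(pre_grasp(first)):
            return first          # a path was planned: target posture found
        sequence.pop(0)           # no path: drop it and try the next one
    return None                   # every candidate was tried without success
```

Sorting by a different key, for example a score assigned to each candidate, would leave the rest of the loop unchanged.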
[0051] In conjunction with the first aspect, in some alternative implementations, step S162 includes:
[0052] S1621, input the point cloud data corresponding to at least one set of candidate grasping postures into a trained convolutional neural network for grasping posture scoring, obtain the scores of at least one set of candidate grasping postures, and sort the candidate grasping postures based on the scores to obtain a posture sequence.
[0053] S1622, take the candidate grasping posture with the highest score in the posture sequence as the second grasping posture;
[0054] S1623, calculate the pre-grasp pose by retracting the pose of the second grasping posture;
[0055] S1624, based on the pre-grasp pose and the environmental information corresponding to the second region of interest after voxel downsampling, perform obstacle avoidance path planning for the mechanical gripper in Cartesian space;
[0056] S1625, if no path is successfully planned, delete the second grasping posture from the posture sequence and repeat steps S1622 to S1624 on the remaining posture sequence until a path is successfully planned or all candidate grasping postures in the posture sequence have been traversed;
[0057] S1626, if a path is successfully planned, take the second grasping posture as the target grasping posture.
[0058] In conjunction with the first aspect, in some alternative implementations, step S170 includes:
[0059] The control parameters of the robot's robotic arm are determined based on the target grasping posture;
[0060] Based on the control parameters, the robotic arm is controlled to move the robotic claw to the part of the target object corresponding to the target grasping posture, and to perform a grasping action to grasp the target object.
[0061] In a second aspect, this application also provides a robot, which includes a camera module, a pitch angle adjustment mechanism, a mobile chassis, a processor, and a memory. One end of the pitch angle adjustment mechanism is connected to the camera module, and the other end of the pitch angle adjustment mechanism is connected to the mobile chassis. The memory stores a computer program, which, when executed by the processor, causes the robot to perform the above-described method.
[0062] The invention employing the above technical solution has the following advantages:
[0063] In the technical solution provided in this application, upon receiving an instruction to grasp a target object, a temporary stopping point for the robot to approach the target object is generated based on the first image set acquired by the camera module on the robot and the robot's prior knowledge base. This helps the robot locate the target object rapidly in various environments, while the prior knowledge base helps reduce computational load. When the robot navigates to the temporary stopping point, it photographs the target object through the camera module to obtain a second image set. Based on the RGB image and depth image in the second image set, it is determined whether the target object is within the robot's grasping range. If the target object is not within the robot's grasping range, the robot's position is adjusted based on the positional difference between the target object and the robot's grasping range so that the target object is within the robot's grasping range after the adjustment. If the target object is within the robot's grasping range, at least one set of candidate grasping postures is determined based on the point cloud data of the target object in the depth image. A candidate grasping posture that is reachable by the robot's mechanical claw and does not result in a collision is selected as the target grasping posture. The robot is then controlled to grasp the target object in the target grasping posture. Thus, RGB images can be used to accurately identify target objects, and depth images can be used to quickly locate target objects, which helps robots grasp target objects accurately and quickly.
Attached Figure Description
[0064] This application can be further illustrated by the non-limiting embodiments given in the accompanying drawings. It should be understood that the following drawings only illustrate some embodiments of this application and should not be considered as limiting the scope. For those skilled in the art, other related drawings can be obtained from these drawings without any inventive effort.
[0065] Figure 1 is a flowchart illustrating a robot free grasping control method that integrates target detection, as provided in an embodiment of this application.
[0066] Figure 2 is a schematic diagram illustrating the process of a robot performing a grasping task, as provided in an embodiment of this application.
[0067] Figure 3 is a schematic diagram illustrating the principle of robot obstacle avoidance provided in an embodiment of this application.
[0068] Figure 4 is a schematic diagram of the filtering process for a depth image provided in an embodiment of this application.
[0069] Figure 5 is a schematic diagram illustrating the principle of the parameterized representation of the grasping posture of the mechanical claw provided in the embodiments of this application.
[0070] Figure 6 is a schematic diagram of the candidate grasping posture generation process provided in an embodiment of this application.
[0071] Figure 7 is a schematic diagram of the grasp execution process provided in an embodiment of this application.
Detailed Implementation
[0072] The present application will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that similar or identical parts are referred to by the same reference numerals in the drawings or description. Implementations not shown or described in the drawings are forms known to those skilled in the art. In the description of this application, terms such as "first" and "second" are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.
[0073] Referring to Figure 1, this application provides a robot free grasping control method that integrates target detection. The method can be applied to robots to realize free grasping of target objects. The target object can be flexibly defined according to the actual situation, for example a water cup or another object.
[0074] Understandably, the robot may include a camera module, a pitch adjustment mechanism, a mobile chassis, a processor, and a memory. One end of the pitch adjustment mechanism is connected to the camera module, and the other end is connected to the mobile chassis. The pitch adjustment mechanism can be used to adjust the pitch angle of the camera module. The memory stores a computer program, which, when executed by the processor, causes the robot to perform the steps of the following method.
[0075] Robots can also include robotic arms and grippers for performing grasping tasks. The models of robotic arms and grippers can be flexibly selected according to the actual situation; for example, a six-degree-of-freedom robotic arm can be used, and the gripper can be a two-finger parallel claw.
[0076] It should be noted that, for convenience and brevity, the specific working process of the robot described above is given by the corresponding steps of the method below and is not elaborated further here.
[0077] The robot free grasping control method that integrates target detection can include the following steps:
[0078] S110, upon receiving an instruction to grasp a target object, a temporary stopping point is generated for the robot to approach the target object based on the first image set captured by the camera module on the robot and the robot's prior knowledge base;
[0079] S120, when the robot navigates to the temporary stopping point, it takes pictures of the target object through the camera module to obtain a second image set;
[0080] S130, based on the RGB image and depth image in the second image set, determine whether the target object is within the grasping range of the robot;
[0081] S140, when the target object is not within the grasping range of the robot, the position of the robot is adjusted based on the position difference between the target object and the grasping range of the robot, so that the target object is within the grasping range of the robot after the adjustment.
[0082] S150, when the target object is within the grasping range of the robot, at least one set of candidate grasping postures is determined based on the point cloud data of the target object in the depth image;
[0083] S160, determine a target grasping posture from at least one set of candidate grasping postures, wherein the target grasping posture is a candidate grasping posture that is reachable by the robot's mechanical gripper without collision.
[0084] S170, control the robot to grasp the target object in the target grasping posture.
[0085] The steps of the robot free grasping control method based on fused target detection will be described in detail below:
[0086] In this embodiment, step S110 may include:
[0087] S111, query whether the storage location of the target object is recorded in the prior knowledge base. If the storage location of the target object is not found in the prior knowledge base, proceed to step S112; if the storage location of the target object is found in the prior knowledge base, proceed to step S118.
[0088] S112, environmental information is collected through the camera module to obtain the first image set, wherein the first image set includes an aligned first RGB image and a first depth image;
[0089] S113, using the target detection model, detect whether the target object exists in the first RGB image. If the target object does not exist in the first RGB image, proceed to step S114; if the target object exists in the first RGB image, proceed to step S116.
[0090] S114, within the adjustment range of the pitch angle adjustment mechanism of the camera module, the pitch angle of the camera module is adjusted in steps, and at each adjusted pitch angle, steps S112 to S113 are repeated until the adjustment range is exhausted or the target object is determined to exist in the first RGB image. In the latter case, proceed to step S116.
[0091] S115, when the pitch angle range has been traversed and the target object is not found in any of the first RGB images collected, control the robot to rotate in place by a specified angle each time. At the viewpoint after each rotation, repeat steps S112 to S114 until the robot has rotated one full circle or the target object is found in the first RGB image. In the latter case, proceed to step S116.
[0092] S116, based on the bounding box of the target object in the first RGB image, determine the image region representing the location of the target object from the first depth image corresponding to the first RGB image, and use it as the first region of interest;
[0093] S117, based on the point cloud data of the first region of interest, generate a temporary stopping point for the robot to approach the target object;
[0094] S118, when the storage location of the target object exists in the prior knowledge base, a temporary stopping point is generated for the robot to approach the target object based on the storage location of the target object.
[0095] In this embodiment, the prior knowledge base typically records the storage locations of corresponding objects (such as water cups) in advance. In step S110, the prior knowledge base is first queried to see if it records the storage location of the target object. If the storage location of the target object is not found in the prior knowledge base, the target detection model is then used to detect whether the target object exists in the environment surrounding the robot based on machine vision. This facilitates the rapid detection of whether the target object exists in the current environment. If the prior knowledge base records the storage location of the target object, it helps to reduce the amount of computation.
[0096] In this embodiment, the camera module can be a depth camera, which can acquire RGB images and depth images, and align the RGB images and depth images to obtain aligned RGB images and depth images. The image alignment method is conventional and not specifically limited here. The RGB image is a color image, which can be used for target recognition using machine vision algorithms. The depth image is a point cloud image acquired by the depth camera. The point cloud data in the depth image reflects the distance (or depth) between the object's surface and the depth camera, and can be used to locate the target object.
[0097] In step S113, the target detection model can be a conventional neural network model based on deep learning, such as a YOLO series neural network model, without any specific limitation.
[0098] In step S114, the pitch angle adjustment mechanism is a conventional mechanism for adjusting the pitch angle. In other embodiments, the pitch angle adjustment mechanism can be a three-degree-of-freedom rotation mechanism. Adjusting the pitch angle of the camera module in steps can be understood as starting with the minimum pitch angle and then increasing it by a preset angle. For example, if the adjustment range of the pitch angle adjustment mechanism is 0° to 60°, the camera module can use 30° as the preset angle, allowing it to capture environmental images at three pitch angles: 0°, 30°, and 60°. If, during the traversal, a target object is found in the first RGB image, proceed to step S116. If no target object is found in any of the first RGB images captured after traversing the adjustment range, proceed to step S115.
[0099] In step S115, the specified angle for each rotation can be flexibly determined according to the actual situation, for example, 90°. If the target object exists in the first RGB image captured by the camera module during one rotation, the process proceeds to step S116; if the target object does not exist in any of the first RGB images captured by the camera module during one rotation, the process ends, indicating that there is no target object in the environment, and the robot returns to the starting point or other default position.
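The pitch sweep of step S114 and the in-place rotation of step S115 can be sketched as two nested loops. The callbacks `detect`, `set_pitch`, and `rotate` are hypothetical placeholders for the robot's detection model and actuators; the default values reuse the examples from the text (0° to 60° in 30° steps, 90° per rotation).

```python
def pitch_angles(lo, hi, step):
    """Pitch angles from lo to hi inclusive, e.g. 0, 30, 60."""
    angles = []
    angle = lo
    while angle <= hi:
        angles.append(angle)
        angle += step
    return angles


def search_target(detect, set_pitch, rotate,
                  pitch_range=(0, 60), pitch_step=30, yaw_step=90):
    """Sweep the pitch range at each heading until the target is detected.

    detect() returns a bounding box or None, set_pitch(deg) tilts the camera,
    rotate(deg) turns the chassis in place. All three are assumed interfaces.
    Returns the bounding box, or None after one full circle without success.
    """
    for _ in range(int(360 // yaw_step)):
        for pitch in pitch_angles(pitch_range[0], pitch_range[1], pitch_step):
            set_pitch(pitch)
            box = detect()
            if box is not None:
                return box        # target found: continue with step S116
        rotate(yaw_step)          # pitch range exhausted: try the next heading
    return None                   # no target in the environment
```

With the default values, the sketch captures at most 12 images (three pitch angles at each of four headings) before reporting that no target object is present.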
[0100] The object detection model has the function of generating bounding boxes for target objects in RGB images. In step S116, the bounding box is typically the smallest rectangular box that encloses the target object. In the aligned first RGB image and first depth image, pixels at the same position in both images represent the same point in the environment. Therefore, the region occupied by the bounding box in the first RGB image can be used as the region of the target object in the first depth image, and this region is taken as the first region of interest.
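Because the two images are aligned, extracting the region of interest reduces to slicing the depth image with the pixel coordinates of the bounding box. The sketch below assumes a box given as (x_min, y_min, x_max, y_max) in pixels and a depth image stored as a 2D NumPy array; both conventions are assumptions, not details from the source.

```python
import numpy as np


def depth_roi(depth, box):
    """Crop the aligned depth image to the RGB bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels, clipped to the image.
    """
    x_min, y_min, x_max, y_max = box
    height, width = depth.shape
    x0, x1 = max(0, x_min), min(width, x_max)
    y0, y1 = max(0, y_min), min(height, y_max)
    return depth[y0:y1, x0:x1]
```

Note that rows are indexed by y and columns by x, so the slice order is the reverse of the box order.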
[0101] In step S117, the point cloud data in the first region of interest carries depth data, which reflects the distance and relative orientation between the camera module and the target object. If this distance exceeds a preset distance, a temporary stopping point can be generated between the target object and the robot. The horizontal distance between this temporary stopping point and the target object is less than or equal to the preset distance, and the point is reachable by the robot. The preset distance can be flexibly determined according to the actual situation, for example, 1 meter. If this distance does not exceed the preset distance, the robot's current position can be used as the temporary stopping point.
[0102] In step S118, the storage location of the target object, or the location closest to the storage location and accessible to the robot, can serve as the temporary stopping point.
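One simple way to realize the rule in step S117 is to place the stopping point on the straight line from the robot to the target, exactly the preset distance short of the target. This geometric choice is an assumption for illustration; the source only requires that the point lie within the preset distance of the target and be reachable, and reachability is not checked here.

```python
import math


def temporary_stopping_point(target_xy, preset_distance=1.0):
    """Stopping point in the robot frame (robot at the origin, metres).

    target_xy is the horizontal position of the target relative to the robot.
    """
    tx, ty = target_xy
    distance = math.hypot(tx, ty)
    if distance <= preset_distance:
        return (0.0, 0.0)  # already close enough: stay at the current position
    scale = (distance - preset_distance) / distance
    return (tx * scale, ty * scale)
```

For a target at (3, 4) meters and a preset distance of 1 meter, the robot is 5 meters away, so the stopping point lies 4 meters along the line toward the target.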
[0103] In step S120, the second image set includes aligned RGB images and depth images, which can be referred to as the second RGB image and the second depth image, respectively.
[0104] In this embodiment, step S130 may include:
[0105] S131, the first image region where the target object is located is determined from the RGB image of the second image set by the target detection model. The first image region can be the positioning box for selecting the target object output by the target detection model in the RGB image.
[0106] S132, based on the position information of the first image region in the RGB image, determine the image region representing the location of the target object from the depth image of the second image set, and use it as the second image region. The position of the second image region in the depth image is the same as the position of the first image region (or positioning box) in the RGB image.
[0107] S133, based on the second image region, determine the position of the center point of the target object's surface. For example, the center point of the second image region can be used as the center point of the target object's surface.
[0108] S134, If the position of the center point of the target object's surface is within the grasping range, then the target object is determined to be within the grasping range of the robot;
[0109] S135, or, if the position of the center point of the target object's surface is not within the grasping range, then it is determined that the target object is not within the robot's grasping range.
[0110] In step S130, the center point of the target object's surface is used as the target object's position reference point. By detecting the depth data of the target object's surface center point in the depth image, it is possible to detect whether the target object is within the robot's grasping range. This simplifies the calculation of "determining whether the target object is within the grasping range" and reduces the computational load.
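A minimal sketch of this check is given below. It assumes a pinhole camera model with intrinsics fx, fy, cx, cy, a known 4x4 transform from the camera frame to the arm base frame, and a grasping range modelled as a spherical shell around the arm base. None of these specifics come from the source, which leaves the form of the grasping range open.

```python
import numpy as np


def surface_center_in_camera(depth_m, box, fx, fy, cx, cy):
    """Back-project the centre pixel of the box to a 3D point (camera frame).

    depth_m holds depth in metres; returns None where depth is missing.
    """
    x_min, y_min, x_max, y_max = box
    u = (x_min + x_max) // 2
    v = (y_min + y_max) // 2
    z = float(depth_m[v, u])
    if z <= 0.0:
        return None
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])


def within_grasp_range(point_cam, T_base_cam, r_min, r_max):
    """Check the point against an assumed shell-shaped range of the arm."""
    point_base = T_base_cam @ np.append(point_cam, 1.0)
    distance = float(np.linalg.norm(point_base[:3]))
    return r_min <= distance <= r_max
```

Only one pixel is back-projected, which is what keeps the check cheap compared with processing the whole point cloud.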
[0111] In this embodiment, step S140 may include:
[0112] S141, Based on the position difference between the center point of the target object's surface and the robot's grasping range, an obstacle avoidance algorithm is used to plan a fine-tuned path for the robot to travel to the final stopping point;
[0113] S142, based on the fine-tuned path, control the robot to travel to the final stopping point, wherein when the robot is at the final stopping point, the target object is within the field of view of the camera module and the target object is within the robot's grasping range.
[0114] Understandably, when the robot is at the temporary stopping point, its position may not be suitable for the robotic arm to perform grasping. That is, errors in chassis movement and in long-distance camera vision may leave the target object outside the working range of the robotic arm after the robot approaches. Therefore, the robot's pose needs to be fine-tuned so that its grasping range covers the position of the target object.
[0115] In this embodiment, the spatial position of the geometric center point of the target object's surface relative to the robot can be obtained by a coordinate transformation that combines the target object's location information output by target detection with the depth image output by the camera. Then, based on the position of the surface center point and the robotic arm's grasping range, it is determined whether the target object is within the robotic arm's grasping range. If it is not, the robot is controlled to fine-tune its pose using visual guidance combined with LiDAR obstacle avoidance. The robot makes motion control decisions based on the visual feedback of the target object's position and the robotic arm's grasping range, and these decisions guide the movement of the robot chassis. To avoid collisions with the environment during pose fine-tuning, the positions of obstacles around the robot acquired by the LiDAR are also considered in the motion control. As shown in Figure 3, four fine-tuning directions (front, back, left, and right) are set for the robot. In Figure 2, N represents the number of LiDAR lines (e.g., N=1 indicates a single-line LiDAR). The obstacle information acquired by the LiDAR is restricted to a ring-shaped obstacle-sensitive region, which is further divided into four sub-regions: front, back, left, and right. Within this sensitive region, if the number of obstacle points in a certain direction exceeds a set threshold, it is determined that a near-range obstacle exists in that direction, and the vision-guided robot is prevented from moving in that direction. If the robot chassis supports omnidirectional motion, the fine-tuning direction is not limited to these four directions; it can be subdivided further, or even made continuous, to achieve more flexible and precise pose fine-tuning.
After pose fine-tuning, the target object is within the camera's field of view and within the working range of the robotic arm, which provides accurate target object positioning information for subsequent processes.
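The direction gating described above can be sketched as follows. This is a minimal illustration, assuming 2D LiDAR points expressed in the robot frame with x pointing forward and y pointing left, and 90-degree sectors centered on each direction. The function names, the sector boundaries, and the rule for choosing a direction are assumptions made for illustration.

```python
import math

def blocked_directions(points, r_min, r_max, threshold):
    """Count LiDAR points per sector inside the ring r_min <= r <= r_max.

    Returns the set of directions whose point count exceeds the threshold.
    """
    counts = {"front": 0, "left": 0, "back": 0, "right": 0}
    for x, y in points:
        r = math.hypot(x, y)
        if not (r_min <= r <= r_max):
            continue  # outside the ring-shaped sensitive region
        angle = math.degrees(math.atan2(y, x))
        if -45.0 <= angle < 45.0:
            counts["front"] += 1
        elif 45.0 <= angle < 135.0:
            counts["left"] += 1
        elif -135.0 <= angle < -45.0:
            counts["right"] += 1
        else:
            counts["back"] += 1
    return {name for name, n in counts.items() if n > threshold}

def pick_direction(dx, dy, blocked):
    """Choose the unblocked axis-aligned move that reduces the larger offset."""
    options = []
    if dx != 0:
        options.append((abs(dx), "front" if dx > 0 else "back"))
    if dy != 0:
        options.append((abs(dy), "left" if dy > 0 else "right"))
    for _, direction in sorted(options, reverse=True):
        if direction not in blocked:
            return direction
    return None  # every useful direction is blocked

# Five returns straight ahead, two to the right, all inside the ring.
scan = [(0.3, 0.0)] * 5 + [(0.0, -0.3)] * 2
blocked = blocked_directions(scan, r_min=0.2, r_max=0.5, threshold=3)
move = pick_direction(0.2, -0.05, blocked)
print(blocked, move)
```

For an omnidirectional chassis, the same counting could be applied to a larger number of narrower sectors.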
[0116] In this embodiment, step S150 may include:
[0117] S151, determine the first positioning box (which can be denoted as positioning box K) where the target object is located from the depth image, wherein the size and position of the first positioning box are the same as the size and position of the positioning box of the target object in the RGB image;
[0118] S152, based on the center of the first positioning box, enlarge the first positioning box by a first specified factor to obtain a second positioning box (which can be denoted as positioning box K_f). The first specified factor can be flexibly determined according to the actual situation, for example, 1.5 times;
[0119] S153, take the region of the second positioning box in the depth image as the second region of interest, and filter the point cloud data of the second region of interest to obtain the first point cloud dataset;
[0120] S154, perform voxel downsampling on the first point cloud dataset, and dynamically adjust the voxel size according to the number of points after downsampling, so that the number of points after downsampling is within a preset range, to obtain the interest point cloud set P_f;
[0121] S155, based on the center of the first positioning box, reduce the first positioning box by a second specified factor to obtain a third positioning box (which can be denoted as positioning box K_s). The second specified factor can be flexibly determined according to the actual situation, for example, 0.75 times;
[0122] S156, take the points of the interest point cloud set P_f that lie within the third positioning box as the sampling-area point cloud P_s;
[0123] S157, create the pose model of the robot's mechanical gripper, represented as g = (q, v_x, v_y), where g represents the pose of the mechanical gripper, q is the midpoint position of the bottom of the mechanical gripper, v_x is the direction of travel of the mechanical gripper, and v_y is the direction passing through point q and perpendicular to v_x;
[0124] S158, select N grasping points p from the sampling-area point cloud P_s by farthest point sampling. For each point p, based on the normal information of point p and the point cloud around point p, determine the normal curvature direction and principal curvature direction of the surface corresponding to point p. The reverse of the normal curvature direction is used as the initial v_x of the mechanical gripper, and the reverse of the maximum principal curvature direction is used as the initial v_y of the mechanical gripper;
[0125] S159, adjust the grasping posture of the mechanical gripper by adjusting the initial v_y and the position q of the bottom midpoint of the mechanical gripper, and calculate the collisions between the mechanical gripper and the interest point cloud set P_f; based on the collision results, obtain at least one set of collision-free candidate grasping postures.
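The box scaling used in steps S152 and S155 can be sketched as follows. The sketch assumes boxes stored as (u_min, v_min, u_max, v_max) in pixel coordinates; the helper names scale_box and clip_box are hypothetical, and clipping to the image bounds is an added assumption for boxes that grow past the image edge.

```python
def scale_box(box, factor):
    """Scale a box about its own center; factor > 1 enlarges, < 1 shrinks."""
    u_min, v_min, u_max, v_max = box
    cu, cv = (u_min + u_max) / 2.0, (v_min + v_max) / 2.0
    half_w = (u_max - u_min) * factor / 2.0
    half_h = (v_max - v_min) * factor / 2.0
    return (cu - half_w, cv - half_h, cu + half_w, cv + half_h)

def clip_box(box, width, height):
    """Clamp a box to the valid pixel range of a width x height image."""
    u_min, v_min, u_max, v_max = box
    return (max(0.0, u_min), max(0.0, v_min),
            min(width - 1.0, u_max), min(height - 1.0, v_max))

K = (100, 100, 200, 180)    # first positioning box (from detection)
K_f = scale_box(K, 1.5)     # second positioning box: target plus surroundings
K_s = scale_box(K, 0.75)    # third positioning box: central region only
print(K_f, K_s)
```

Both derived boxes share the center of K, so the sampling region K_s always lies inside the collision-checking region K_f.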
[0126] Referring to Figure 4, Figure 5, and Figure 6, as an example, the mechanical gripper can be a two-finger parallel gripper, and step S150 can be implemented as follows:
[0127] In the depth image, the point cloud can be represented as a set of three-degree-of-freedom points P = {p_i | i = 1, ..., n}. Here, each point p_i = (x, y, z) carries only position information and does not include information such as color or normals. After pose fine-tuning in step S140, the target object point cloud data is segmented based on the target object localization information output by target detection to obtain the interest point cloud set P_f.
[0128] To avoid collisions between the mechanical gripper and objects surrounding the target, or the need for secondary collision detection, the target detection bounding box K is enlarged by a certain ratio to obtain a new bounding box K_f, which is used as a filtering mask. The bounding box K_f then contains not only the target object but also its surrounding environment, so collisions between a generated grasping pose and both the environment and the target object are taken into account. Next, the point cloud data of the image area of bounding box K_f in the depth image is channel-filtered. The x and y channels are filtered according to the position of bounding box K_f. Filtering the depth image channels with the target detection result from the RGB image allows fast coarse segmentation of the target object. Since the depth information of the target object obtained from a single view of the camera module is limited, the z-channel filter takes the depth of the target object's surface center point as the midpoint and extends a certain distance (e.g., 4 cm) in front of and behind it to form the lower and upper limits of the filter. The purpose of z-channel filtering is to remove the foreground and background contained in the detection bounding box. This simple depth channel filtering greatly reduces the interference of non-target point clouds in the area of interest and speeds up subsequent processing. Here, x- and y-channel filtering refers to filtering in the horizontal and vertical directions of the image area of bounding box K_f, respectively, and z-channel filtering refers to filtering in the depth direction within bounding box K_f.
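The channel filtering can be illustrated with the following sketch. It assumes the depth image is a row-major list of rows holding depths in metres, and that the box K_f is given in pixel coordinates; the function name channel_filter and the tiny 4x4 image are hypothetical.

```python
def channel_filter(depth, box, center_depth, margin=0.04):
    """Keep pixels inside `box` whose depth is within +/- margin of center_depth.

    The box test is the x/y-channel filter; the depth test is the z-channel
    filter that removes foreground and background inside the box.
    """
    u_min, v_min, u_max, v_max = box
    kept = []
    for v, row in enumerate(depth):
        if not (v_min <= v <= v_max):
            continue
        for u, z in enumerate(row):
            if u_min <= u <= u_max and abs(z - center_depth) <= margin:
                kept.append((u, v, z))
    return kept

# Target surface near 0.60 m, a foreground object at 0.30 m, wall at 1.20 m.
depth = [
    [1.20, 1.20, 1.20, 1.20],
    [1.20, 0.60, 0.62, 1.20],
    [1.20, 0.61, 0.30, 1.20],
    [1.20, 1.20, 1.20, 1.20],
]
kept = channel_filter(depth, box=(1, 1, 2, 2), center_depth=0.60)
print(kept)
```

In this example the 0.30 m foreground pixel falls inside the box but is rejected by the z-channel test, which matches the stated purpose of the depth filter.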
[0129] To further reduce the computational load after channel filtering, the robot can perform voxel downsampling on the point cloud data. The voxel size is dynamically adjusted based on the number of points remaining after downsampling. If too many points remain, the computational load increases; if too few remain, many surface features are lost, which is detrimental to subsequent processing. Adjusting the voxel size appropriately therefore yields the interest point cloud set P_f after voxel downsampling. P_f is used to calculate collisions between the gripper and objects, a calculation that is repeated many times in the subsequent generation of candidate grasping postures. Compared with calculating gripper collisions on global point cloud data, calculating them within the area of interest makes this invention more efficient. As an example, the specific process is shown in Figure 4.
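One possible form of the dynamic adjustment is sketched below. The embodiment does not specify the update rule, so the multiplicative step, the starting voxel size, and the iteration cap are assumptions, as are the function names. Each occupied voxel is replaced by the centroid of its points.

```python
import math

def voxel_downsample(points, voxel):
    """Replace all points that fall in the same voxel by their centroid."""
    cells = {}
    for x, y, z in points:
        key = (math.floor(x / voxel), math.floor(y / voxel), math.floor(z / voxel))
        cells.setdefault(key, []).append((x, y, z))
    return [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in cells.values()]

def adaptive_downsample(points, n_min, n_max, voxel=0.005, step=1.25, max_iter=30):
    """Grow or shrink the voxel until the point count lies in [n_min, n_max].

    Stops after max_iter adjustments, so the range is targeted, not guaranteed.
    """
    out = voxel_downsample(points, voxel)
    for _ in range(max_iter):
        if len(out) > n_max:
            voxel *= step          # too many points: coarser voxels
        elif len(out) < n_min and len(points) >= n_min:
            voxel /= step          # too few points: finer voxels
        else:
            break
        out = voxel_downsample(points, voxel)
    return out, voxel

# A flat 20 x 20 patch with 1 mm spacing at a depth of 0.5 m.
cloud = [((i + 0.5) * 0.001, (j + 0.5) * 0.001, 0.5)
         for i in range(20) for j in range(20)]
P_f, used_voxel = adaptive_downsample(cloud, n_min=20, n_max=60)
print(len(P_f), used_voxel)
```

Here the initial 5 mm voxel leaves 16 points, below the lower bound, so the voxel is shrunk once before the count falls inside the preset range.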
[0130] The robot can reduce the bounding box K obtained from target detection by a certain ratio to obtain a new bounding box K_s. The bounding box K_s then contains only the central region of the target object. The interest point cloud set P_f is cropped in the x and y channels according to the position of bounding box K_s to obtain the sampling-area point cloud P_s. By analyzing the neighborhood relationships within P_s, the surface normal direction at each point is estimated, and the normals of the point cloud are uniformly oriented opposite to the viewing direction, i.e., toward the camera.
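The normal orientation step can be sketched as follows, assuming the normals have already been estimated by some neighborhood method (not shown) and that points are expressed in the camera frame, so the camera sits at the origin. The function name is hypothetical.

```python
def orient_normals_to_camera(points, normals, camera=(0.0, 0.0, 0.0)):
    """Flip each normal, if needed, so that it points toward the camera."""
    oriented = []
    for p, n in zip(points, normals):
        to_camera = tuple(c - pc for c, pc in zip(camera, p))
        facing = sum(a * b for a, b in zip(n, to_camera))
        oriented.append(n if facing >= 0 else tuple(-a for a in n))
    return oriented

# Two surface points 0.5 m in front of the camera with opposite raw normals.
pts = [(0.0, 0.0, 0.5), (0.1, 0.0, 0.5)]
raw = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
oriented = orient_normals_to_camera(pts, raw)
print(oriented)
```

After this step the reverse of each normal points away from the camera and into the surface, which is why it can serve as an initial approach direction for the gripper.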
[0131] Referring to Figure 5, in this embodiment the grasping pose of the robot's mechanical gripper can be represented as shown in Figure 5. That is, two intersecting straight lines on the mechanical gripper define a unique plane that determines the gripper's orientation, and the intersection of the two lines determines the gripper's position, which yields a unique grasping pose. As shown in Figure 5, fixing the forward extension direction and the parallel direction of the gripper yields a unique grasping pose (the two-line representation). In other embodiments, the grasping pose can also be represented in the following ways. In the three-point representation, three non-collinear points on the gripper define a unique plane that determines the gripper's orientation, and any one of the points determines its position; for example, fixing the left and right gripping points and the midpoint of the gripper's bottom yields a unique grasping pose. In the one-point-one-line representation, a straight line on the gripper and a point off that line define a unique plane that determines the gripper's orientation, and the point determines its position; for example, fixing either the left or the right gripping point together with the forward extension direction of the gripper yields a unique grasping pose.
[0132] Based on information such as the grasping point, the object's surface normal, and the directions of maximum and minimum curvature, this invention uses the two-line representation to describe the pose of the mechanical gripper, g = (q, v_x, v_y), where q is the intersection of the two lines and represents the midpoint of the bottom of the two-finger parallel gripper, v_x is the forward direction of the two-finger parallel gripper, and v_y is the parallel direction of the two-finger parallel gripper, perpendicular to v_x, as shown in Figure 5. The midpoint position q provides the three positional degrees of freedom (x, y, z) of the mechanical gripper. Since the forward direction v_x and the parallel direction v_y are perpendicular to each other, the orthogonal direction v_z of the two-finger parallel gripper can be determined using the right-hand rule. Together, v_x, v_y, and v_z provide the three rotational degrees of freedom of the mechanical gripper's grasp.
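The two-line pose model can be turned into a full gripper frame as sketched below. The sketch assumes v_x and v_y are given as 3D vectors that may be slightly non-perpendicular after estimation, so v_y is re-orthogonalized against v_x before the cross product; that extra step and the function names are assumptions for illustration.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def cross(a, b):
    """Right-handed cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def gripper_frame(q, v_x, v_y):
    """Build (q, (x, y, z)) from the pose model g = (q, v_x, v_y)."""
    x = normalize(v_x)
    along = sum(a * b for a, b in zip(v_y, x))
    # Remove any component of v_y along x so the two axes are perpendicular.
    y = normalize(tuple(b - along * a for a, b in zip(x, v_y)))
    z = cross(x, y)  # right-hand rule: v_z = v_x x v_y
    return q, (x, y, z)

# Gripper approaching along the camera's optical axis, fingers along image x.
q, axes = gripper_frame((0.0, 0.0, 0.5), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
print(q, axes)
```

The position q and the three unit axes together describe the six degrees of freedom of the grasp.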
[0133] N grasping points p are selected from the sampling-area point cloud P_s by farthest point sampling. The normal curvature direction and principal curvature direction of the surface at point p are estimated by combining the normal information of point p with that of the points in its local neighborhood. The reverse of the normal curvature direction is used as the initial v_x, and the reverse of the maximum principal curvature direction is used as the initial v_y. Within a certain range, v_y and p are fine-tuned to adjust the gripper posture, and collisions with the interest point cloud set P_f are calculated. Finally, a set of candidate grasping poses that collide with neither the target object nor its surroundings is obtained. As an example, the specific process is shown in Figure 6.
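Farthest point sampling, used above to pick the N grasping points, can be sketched as follows. The choice of the starting index is an assumption (the embodiment does not say how the first point is chosen), and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_sampling(points, n, start=0):
    """Return indices of n points, each maximizing distance to those chosen."""
    n = min(n, len(points))
    if n == 0:
        return []
    chosen = [start]
    # dist[i] = squared distance from point i to its nearest chosen point.
    dist = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < n:
        idx = max(range(len(points)), key=dist.__getitem__)
        chosen.append(idx)
        dist = [min(d, sq_dist(p, points[idx])) for d, p in zip(dist, points)]
    return chosen

P_s = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.5, 0.0, 0.0)]
picked = farthest_point_sampling(P_s, 3)
print(picked)
```

Compared with random sampling, this spreads the grasping points evenly over the sampling region, so fewer samples are needed to cover the object's central surface.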
[0134] In this embodiment, step S160 may include:
[0135] S161, sort the number of point clouds in the mechanical gripper corresponding to at least one group of candidate grasping postures, and select the candidate grasping posture that is reachable by the mechanical gripper and has the largest number of point clouds as the target grasping posture.
[0136] S162, or, input the point cloud data corresponding to at least one set of the candidate grasping postures into a trained convolutional neural network for grasping posture scoring, obtain the scores of at least one set of the candidate grasping postures, and select the candidate grasping posture that is reachable by the mechanical claw and has the highest score as the target grasping posture.
[0137] In this embodiment, the generated candidate grasping postures are sorted. This can be done by training a convolutional neural network based on deep learning to score the grasping posture, or by simply sorting them according to the number of point clouds within the gripper. Here, we are only sorting the candidate grasping postures, not rigorously selecting the best one. This is because the generated candidate grasping postures only consider the collision situation of objects near the target (object) with the gripper, and do not consider the reachability of the robotic arm in the overall environment. The reachability of the robotic arm is a key indicator of successful target grasping; therefore, the selection of the final actual grasping posture is based on the reachability determination of the robotic arm. The initial sorting only determines the order in which the grasping postures are queried.
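The simpler of the two ranking options, ordering candidates by the number of points inside the gripper, can be sketched as follows. The sketch assumes each candidate is a pair (q, (v_x, v_y, v_z)) of unit axes and models the region between the fingers as a box with hypothetical dimensions (finger_len, opening, finger_h); these names and sizes are not taken from the embodiment.

```python
def to_gripper_frame(p, q, axes):
    """Express point p in the gripper frame with origin q and unit axes."""
    d = tuple(a - b for a, b in zip(p, q))
    return tuple(sum(di * ai for di, ai in zip(d, axis)) for axis in axes)

def points_in_gripper(cloud, grasp, finger_len=0.04, opening=0.08, finger_h=0.02):
    """Count cloud points inside the box-shaped closing region of the gripper."""
    q, axes = grasp
    count = 0
    for p in cloud:
        x, y, z = to_gripper_frame(p, q, axes)
        if 0.0 <= x <= finger_len and abs(y) <= opening / 2 and abs(z) <= finger_h / 2:
            count += 1
    return count

def rank_candidates(cloud, grasps):
    """Sort candidates so those enclosing the most points are queried first."""
    return sorted(grasps, key=lambda g: points_in_gripper(cloud, g), reverse=True)

axes = ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
grasp_a = ((0.0, 0.0, 0.48), axes)   # centered on the object
grasp_b = ((0.2, 0.0, 0.48), axes)   # offset to the side, encloses nothing
cloud = [(0.0, 0.0, 0.5), (0.01, 0.0, 0.5), (0.03, 0.0, 0.51)]
ranked = rank_candidates(cloud, [grasp_b, grasp_a])
print([points_in_gripper(cloud, g) for g in ranked])
```

As the paragraph above notes, this ordering only sets the query sequence; the final choice still depends on the reachability test.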
[0138] In this embodiment, step S161 may include:
[0139] S1611, Sort the number of point clouds in the mechanical gripper corresponding to at least one set of candidate grasping postures to obtain a posture sequence;
[0140] S1612, take the candidate grasping posture corresponding to the maximum number of point clouds as the first grasping posture;
[0141] S1613, calculate the pre-grasp pose by retracting from the first grasping posture;
[0142] S1614, based on the pre-grasp pose and the environmental information corresponding to the second region of interest after voxel downsampling, perform obstacle avoidance path planning for the mechanical gripper in Cartesian space;
[0143] S1615, if a path is not successfully planned, delete the first grasping posture from the posture sequence, and repeat steps S1612 to S1614 based on the posture sequence after the deletion until a path is successfully planned or all candidate grasping postures in the posture sequence have been traversed;
[0144] S1616, If a path is successfully planned, the first grasping posture is taken as the target grasping posture.
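The loop formed by steps S1612 to S1616 can be sketched as follows. The planner is represented by a hypothetical callback plan_path that returns a path, or None when planning fails; in practice it would wrap the pre-grasp computation and a Cartesian obstacle-avoidance planner, neither of which is modeled here.

```python
def select_target_grasp(pose_sequence, plan_path):
    """Return (grasp, path) for the first candidate that can be planned.

    pose_sequence: candidates sorted best-first (output of S1611).
    plan_path: hypothetical callback, returns a path or None on failure.
    """
    remaining = list(pose_sequence)
    while remaining:
        first = remaining[0]        # S1612: best remaining candidate
        path = plan_path(first)     # S1613 and S1614: pre-grasp and planning
        if path is not None:
            return first, path      # S1616: reachable, use as target grasp
        remaining.pop(0)            # S1615: discard and try the next one
    return None, None               # every candidate was unreachable

# Toy planner: only candidate "g3" is reachable.
def toy_planner(grasp):
    return ["pre-grasp", grasp] if grasp == "g3" else None

target, path = select_target_grasp(["g1", "g2", "g3", "g4"], toy_planner)
print(target, path)
```

The same loop applies to the score-based sequence of step S162, with only the ordering of pose_sequence changed.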
[0145] In this embodiment, step S162 may include:
[0146] S1621, input the point cloud data corresponding to at least one set of candidate grasping postures into a trained convolutional neural network for grasping posture scoring, obtain the scores of at least one set of candidate grasping postures, and sort the candidate grasping postures based on the scores to obtain a posture sequence.
[0147] S1622, take the candidate grasping posture corresponding to the maximum score as the second grasping posture;
[0148] S1623, calculate the pre-grasp pose by retracting from the second grasping posture;
[0149] S1624, based on the pre-grasp pose and the environmental information corresponding to the second region of interest after voxel downsampling, perform obstacle avoidance path planning for the mechanical gripper in Cartesian space;
[0150] S1625, if a path is not successfully planned, delete the second grasping posture from the posture sequence, and repeat steps S1622 to S1624 based on the posture sequence after the deletion until a path is successfully planned or all candidate grasping postures in the posture sequence have been traversed;
[0151] S1626, if a path is successfully planned, the second grasping posture is taken as the target grasping posture.
[0152] In this embodiment, the robot can set the candidate grasping poses of the target object as the goal in sequence and calculate the pre-grasp pose by retracting from the grasping pose. The point cloud data of the image area of bounding box K_f in the depth image, after depth channel filtering and voxel downsampling, is padded by square dilation, which filters out clutter from the camera and reduces the computational load of subsequent collision detection. Based on the pre-grasp pose and the processed environmental obstacle information, a path is planned in Cartesian space. If a path is planned successfully, grasping is performed directly. If not, the target pose is unreachable because of environmental obstacles or limits on the robotic arm's range of motion. In that case, the current target pose is discarded, the next candidate grasping pose is selected, and reachability is tested again, until planning succeeds or all candidate poses have been tested. As an example, the specific process is shown in Figure 7.
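The pre-grasp retraction and a greatly simplified reachability test can be sketched as follows. The retraction distance, the cube half-size used to dilate each obstacle point, and the straight-line check are all assumptions; the straight-line check is only a stand-in for a real Cartesian path planner and ignores the arm's kinematics.

```python
def pre_grasp_position(q, v_x, retract=0.10):
    """Back the gripper off from q along its approach direction v_x."""
    return tuple(qi - retract * vi for qi, vi in zip(q, v_x))

def segment_is_free(start, end, obstacles, half_size=0.01, steps=20):
    """Check a straight segment against obstacle points dilated into cubes.

    A sample on the segment collides if it lies within half_size of an
    obstacle point along every axis (Chebyshev distance).
    """
    for k in range(steps + 1):
        t = k / steps
        p = tuple(s + t * (e - s) for s, e in zip(start, end))
        for o in obstacles:
            if max(abs(a - b) for a, b in zip(p, o)) <= half_size:
                return False
    return True

q = (0.0, 0.0, 0.5)
v_x = (0.0, 0.0, 1.0)
pre = pre_grasp_position(q, v_x)
blocked = not segment_is_free(pre, q, obstacles=[(0.0, 0.0, 0.45)])
clear = segment_is_free(pre, q, obstacles=[(0.05, 0.0, 0.45)])
print(pre, blocked, clear)
```

Dilating each obstacle point into a small cube lets a sparse, downsampled cloud still block the gaps between neighboring points.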
[0153] In this embodiment, step S170 may include:
[0154] The control parameters of the robot's robotic arm are determined based on the target grasping posture;
[0155] Based on the control parameters, the robotic arm is controlled to move the robotic claw to the part of the target object corresponding to the target grasping posture, and to perform a grasping action to grasp the target object.
[0156] Based on the above design, the method provided by this invention only uses the RGB image during target detection. Subsequent sampling uses the detection results to directly crop the point cloud data of the depth map to the key regions of interest. Calculations and estimations are performed on the point cloud of the regions of interest, thereby reducing the computational load. Furthermore, this invention crops the point cloud in the depth image and combines it with farthest point sampling, enabling the robot to obtain collision-free grasping poses more efficiently and quickly. This invention generates grasping poses by directly sampling the central region of the target object, which has advantages in computational power consumption and running speed compared to methods that rely on prior knowledge bases to search and match the entire target object point cloud. This invention only focuses on two divided rectangular regions (i.e., the second and third bounding boxes), and does not need to obtain accurate edge information of the target object. Therefore, target detection alone is sufficient, and further semantic segmentation or instance segmentation is not required.
[0157] The sampling region for candidate grasping postures in this invention is based on the central region of the object after the target position has been scaled down. The postures generated by sampling from this region do not consider the object's edges for grasping, which helps reduce the possibility of the object slipping after being grasped by the robotic gripper. In actual grasping, the point cloud information acquired by the camera is the point cloud under a single view, rather than the complete point cloud data under the 3D model. The point cloud under a single view is usually represented as a slice of the object. If the grasping posture is sampled from the edge of the point cloud, it is easy to generate grasping postures that pass through the interior of the object or some edge grasping postures that do not consider the weight of the object. By using the central region of the target object as the sampling region for the grasping center point, this problem can be avoided.
[0158] Compared to other grasping methods that require detecting the placement of objects on a desktop, detecting planes, and filtering out point clouds of specific colors, this invention integrates a 6DoF task grasping method with object detection. It does not require limiting the grasping scene; as long as the target object can be identified, grasping pose generation can be performed in any scene.
[0159] Based on the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented by hardware or by using software plus necessary general-purpose hardware platforms. Based on this understanding, the technical solution of this application can be embodied in the form of a software product. This software product can be stored in a non-volatile storage medium (such as CD-ROM, USB flash drive, mobile hard drive, etc.) and includes several instructions to cause a computer device (such as a personal computer, robot, or network device, etc.) to execute the methods described in the various implementation scenarios of this application.
[0160] In the embodiments provided in this application, it should be understood that the disclosed robots and methods can also be implemented in other ways. The robot and method embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of robots, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code, which includes one or more executable instructions for implementing a specified logical function. It should also be noted that each block in a block diagram and / or flowchart, and combinations of blocks in block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or action, or using a combination of dedicated hardware and computer instructions. Furthermore, the functional modules in the various embodiments of this application can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.
[0161] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A robot free grasping control method integrating target detection, characterized in that, The method includes: S110, upon receiving an instruction to grasp a target object, a temporary stopping point is generated for the robot to approach the target object based on the first image set captured by the camera module on the robot and the robot's prior knowledge base; S120, when the robot navigates to the temporary stopping point, it takes pictures of the target object through the camera module to obtain a second image set; S130, based on the RGB image and depth image in the second image set, determine whether the target object is within the grasping range of the robot; S140, when the target object is not within the grasping range of the robot, the position of the robot is adjusted based on the position difference between the target object and the grasping range of the robot, so that the target object is within the grasping range of the robot after the adjustment. S150, when the target object is within the grasping range of the robot, at least one set of candidate grasping postures is determined based on the point cloud data of the target object in the depth image; S160, determine a target grasping posture from at least one set of candidate grasping postures, wherein the target grasping posture is a candidate grasping posture that is reachable by the robot's mechanical gripper without collision. 
S170, control the robot to grasp the target object in the target grasping posture; Step S150 includes: A first positioning box containing the target object is determined from the depth image, wherein the size and position of the first positioning box are the same as the size and position of the positioning box of the target object in the RGB image; Based on the center of the first positioning box, the first positioning box is enlarged by a first specified factor to obtain a second positioning box; The region of the second positioning box in the depth image is taken as the second region of interest, and the point cloud data of the second region of interest is filtered to obtain the first point cloud dataset; Voxel downsampling is performed on the first point cloud dataset, and the voxel size is dynamically adjusted based on the number of points after downsampling so that the number of points after downsampling is within a preset range, thus obtaining the interest point cloud set P_f; Based on the center of the first positioning box, the first positioning box is reduced by a second specified factor to obtain a third positioning box; The points of the interest point cloud set P_f located in the third positioning box are used as the sampling-area point cloud P_s; A pose model of the robot's mechanical gripper is created, represented as g = (q, v_x, v_y), where g represents the pose of the mechanical gripper, q is the midpoint position of the bottom of the mechanical gripper, v_x is the direction of travel of the mechanical gripper, and v_y is the direction passing through point q and perpendicular to v_x; N grasping points p are selected from the sampling-area point cloud P_s by farthest point sampling. For each point p, based on the normal information of point p and the point cloud around point p, the normal curvature direction and principal curvature direction of the surface corresponding to point p are determined. 
The reverse of the normal curvature direction is used as the initial v_x of the mechanical gripper, and the reverse of the maximum principal curvature direction is used as the initial v_y of the mechanical gripper; By adjusting the initial v_y and the midpoint position q of the bottom of the mechanical gripper, the grasping posture of the mechanical gripper is adjusted, and the collisions between the mechanical gripper and the interest point cloud set P_f are calculated; based on the collision results, at least one set of collision-free candidate grasping postures is obtained.
2. The method according to claim 1, characterized in that, Step S110 includes: S111, query whether the prior knowledge base records the storage location of the target object; S112, when the storage location of the target object does not exist in the prior knowledge base, environmental information is collected through the camera module to obtain the first image set, wherein the first image set includes an aligned first RGB image and a first depth image; S113, using a target detection model, detect whether the target object exists in the first RGB image; S114, when the target object is not present in the first RGB image, the pitch angle of the camera module is adjusted stepwise within the adjustment range of the pitch angle adjustment mechanism of the camera module, and based on the adjusted pitch angle, steps S112 to S113 are repeated until the adjustment range is traversed, or it is determined that the target object exists in the first RGB image. S115, when the pitch angle has been traversed and the target object is not found in any of the first RGB images collected, control the robot to rotate in place, and rotate by a specified angle each time. Based on the viewpoint after each rotation, repeat steps S112 to S114 until the robot has rotated one full circle or it is determined that the target object exists in the first RGB image. 
S116, when the target object exists in the first RGB image, based on the positioning box of the target object in the first RGB image, a map region representing the location of the target object is determined from the first depth image corresponding to the first RGB image, and used as the first interest map region; S117, Based on the point cloud data of the first attention map region, generate a temporary docking point for the robot to approach the target object; S118, or, when the storage location of the target object exists in the prior knowledge base, a temporary docking point is generated for the robot to approach the target object based on the storage location of the target object.
3. The method according to claim 1, characterized in that, Step S130 includes: The first image region where the target object is located is determined from the RGB images of the second image set using an object detection model; Based on the position information of the first image region in the RGB image, an image region representing the location of the target object is determined from the depth image of the second image set, and used as the second image region; Based on the second map area, determine the position of the center point of the target object's surface; If the center point of the target object's surface is within the grasping range, then the target object is determined to be within the robot's grasping range. Alternatively, if the center point of the target object's surface is not within the grasping range, then the target object is determined to be outside the robot's grasping range.
4. The method according to claim 3, characterized in that, Step S140 includes: Based on the positional difference between the center point of the target object's surface and the robot's grasping range, an obstacle avoidance algorithm is used to plan a fine-tuned path for the robot to travel to the final stopping point. Based on the fine-tuned path, the robot is controlled to travel to the final stopping point. When the robot is at the final stopping point, the target object is within the field of view of the camera module and within the robot's grasping range.
5. The method according to claim 1, characterized in that, Step S160 includes: S161, sort the number of point clouds in the mechanical gripper corresponding to at least one group of candidate grasping postures, and select the candidate grasping posture that is reachable by the mechanical gripper and has the largest number of point clouds as the target grasping posture. S162, or, input the point cloud data corresponding to at least one set of the candidate grasping postures into a trained convolutional neural network for grasping posture scoring, obtain the scores of at least one set of the candidate grasping postures, and select the candidate grasping posture that is reachable by the mechanical claw and has the highest score as the target grasping posture.
6. The method according to claim 5, characterized in that, Step S161 includes: S1611, Sort the number of point clouds in the mechanical gripper corresponding to at least one set of candidate grasping postures to obtain a posture sequence; S1612, take the candidate grasping posture corresponding to the maximum number of point clouds as the first grasping posture; S1613, calculate the pre-grabbing pose based on the pose retraction of the first grasping posture; S1614, based on the pre-grasp pose and the environmental information corresponding to the second interest region after voxel downsampling, performs obstacle avoidance path planning for the robotic claw in Cartesian space; S1615 If a path is not successfully planned, the first grasping posture is deleted from the posture sequence, and steps S1612 to S1614 are repeated based on the posture sequence after deleting the first grasping posture until a path is successfully planned or all candidate grasping postures in the posture sequence are traversed. S1616 If a path is successfully planned, the first grasping posture is taken as the target grasping posture.
7. The method according to claim 5, characterized in that step S162 includes: S1621, inputting the point cloud data corresponding to the at least one group of candidate grasping postures into a trained convolutional neural network for grasping posture scoring, obtaining a score for each candidate grasping posture, and sorting the candidate grasping postures by score to obtain a posture sequence; S1622, taking the candidate grasping posture with the highest score in the posture sequence as the second grasping posture; S1623, calculating a pre-grasp pose by retracting the pose of the second grasping posture; S1624, based on the pre-grasp pose and the environmental information corresponding to the second interest region after voxel downsampling, performing obstacle avoidance path planning for the robotic claw in Cartesian space; S1625, if no path is successfully planned, deleting the second grasping posture from the posture sequence and repeating steps S1622 to S1624 on the remaining posture sequence until a path is successfully planned or all candidate grasping postures in the posture sequence have been traversed; S1626, if a path is successfully planned, taking the second grasping posture as the target grasping posture.
8. The method according to claim 1, characterized in that step S170 includes: determining control parameters of the robot's robotic arm based on the target grasping posture; and, based on the control parameters, controlling the robotic arm to move the robotic claw to the part of the target object corresponding to the target grasping posture and to perform a grasping action to grasp the target object.
9. A robot, characterized in that the robot includes a camera module, a pitch angle adjustment mechanism, a mobile chassis, a processor, and a memory; one end of the pitch angle adjustment mechanism is connected to the camera module, and the other end of the pitch angle adjustment mechanism is connected to the mobile chassis; and the memory stores a computer program which, when executed by the processor, causes the robot to perform the method according to any one of claims 1-8.
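
For illustration only, the selection loop shared by claims 6 and 7 (rank the candidates, derive a pre-grasp pose by retraction, attempt obstacle avoidance planning, and fall back to the next candidate on failure) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Candidate` type, the representation of a posture as a position plus a unit approach vector, the 0.10 m retraction distance, and the `plan_path` callback standing in for the Cartesian-space planner are all hypothetical names and values that do not appear in the claims.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate grasping posture (not defined in the claims)."""
    position: Vec3   # claw position at the grasp
    approach: Vec3   # assumed unit vector along which the claw approaches
    score: float     # point count inside the claw (S161) or CNN score (S162)


def pre_grasp_position(c: Candidate, retreat: float) -> Vec3:
    """Retract the grasp position against the approach direction (S1613/S1623)."""
    return tuple(p - retreat * a for p, a in zip(c.position, c.approach))


def select_target_grasp(
    candidates: Sequence[Candidate],
    plan_path: Callable[[Vec3], bool],  # stand-in for the obstacle avoidance planner
    retreat: float = 0.10,              # assumed retraction distance in metres
) -> Optional[Candidate]:
    """Return the best-ranked candidate whose pre-grasp pose can be reached."""
    # S1611/S1621: sort into a posture sequence, best candidate first.
    sequence = sorted(candidates, key=lambda c: c.score, reverse=True)
    while sequence:
        best = sequence[0]                      # S1612/S1622
        if plan_path(pre_grasp_position(best, retreat)):
            return best                         # S1616/S1626: path found
        sequence.pop(0)                         # S1615/S1625: drop and retry
    return None                                 # every candidate was traversed


if __name__ == "__main__":
    top_down = Candidate((0.5, 0.0, 0.2), (0.0, 0.0, -1.0), score=120.0)
    side_on = Candidate((0.4, 0.1, 0.2), (1.0, 0.0, 0.0), score=80.0)
    # Toy planner: pretend anything above z = 0.25 m collides with a shelf.
    chosen = select_target_grasp([top_down, side_on], lambda p: p[2] <= 0.25)
    print(chosen)
```

In this toy run the higher-ranked top-down candidate is rejected because its pre-grasp position lies above the assumed shelf height, so the loop falls through to the side-on candidate. Only the ranking key differs between the two branches: a point count for step S161 and a network score for step S162.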