Robot and control method, device and readable storage medium thereof
Point cloud images are acquired and a preset grasping network predicts the fingertip contact points; a rotation matrix is then constructed to derive the joint coordinates and control the robot's grasping posture. This addresses the high cost and low quality of grasping unknown objects in a cluttered environment with a three-finger dexterous hand, and enables efficient grasping.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- MIDEA GRP (SHANGHAI) CO LTD
- Filing Date
- 2023-01-19
- Publication Date
- 2026-06-12
AI Technical Summary
When a three-finger dexterous hand is used to grasp unknown objects in a cluttered environment, existing technologies require significant manual labor and produce poor grasping quality. This is mainly because the grasp data is large in volume, and the restrictions imposed to reduce it limit the available grasping postures.
Point cloud images of the object to be grasped are acquired, and a preset grasping network performs classification and prediction on them to determine the contact point between the fingertip and the object. A rotation matrix is constructed for the coordinate system whose origin is the contact point, relative to the world coordinate system. The coordinate information of the joints is then derived, and the robot is controlled according to the grasping posture corresponding to the resulting transfer matrix.
This reduces the dimensionality of data collection, decreases the amount of data processed and labor costs, improves the efficiency and quality of robot grasping, and avoids the problem of poor grasping quality caused by data limitations.
Figure CN116229156B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of control technology, and more specifically, to a robot and its control method, apparatus, and readable storage medium.
Background Technology
[0002] When a three-finger dexterous hand grasps unknown objects in a cluttered environment, the grasp data is typically represented with 10 dimensions, which results in a large amount of data and requires significant manual labor to construct the dataset.
[0003] To reduce the amount of data during construction, several of these dimensions are restricted. These restrictions limit the available grasping approaches and result in poor grasping quality.
Summary of the Invention
[0004] The present invention aims to solve at least one of the technical problems existing in the prior art or related art.
[0005] Therefore, a first aspect of the present invention provides a method for controlling a robot.
[0006] A second aspect of the present invention provides a control device for a robot.
[0007] A third aspect of the present invention provides another control device for a robot.
[0008] A fourth aspect of the present invention provides a readable storage medium.
[0009] A fifth aspect of the present invention provides a robot.
[0010] A sixth aspect of the present invention provides another robot.
[0011] In view of the above, according to a first aspect of the present invention, the present invention provides a robot control method, the robot including a mechanical gripper having joints and fingertips, the control method comprising: acquiring a point cloud image containing an object to be grasped; inputting the point cloud image into a preset grasping network for the preset grasping network to perform classification and prediction based on the point cloud image to obtain the contact point between the fingertips and the object to be grasped; constructing a first rotation matrix relative to the world coordinate system with the contact point as the origin based on the coordinate information of the contact point; determining the coordinate information of the joints based on the coordinate information of the contact point, the vector between the joints and the fingertips, and the first rotation matrix; determining a transfer matrix relative to the world coordinate system with the joints as the origin based on the coordinate information of the joints; and controlling the robot to operate according to the grasping posture corresponding to the transfer matrix.
[0012] According to a second aspect of the present invention, the present invention provides a control device for a robot. The robot includes a mechanical gripper having joints and fingertips. The control device includes: an acquisition unit for acquiring a point cloud image containing an object to be grasped; a prediction unit for inputting the point cloud image into a preset grasping network for the preset grasping network to perform classification and prediction based on the point cloud image to obtain the contact point between the fingertips and the object to be grasped; a construction unit for constructing a first rotation matrix relative to the world coordinate system with the contact point as the origin based on the coordinate information of the contact point; a determination unit for determining the coordinate information of the joints based on the coordinate information of the contact point, the vector between the joints and the fingertips, and the first rotation matrix; a processing unit for determining a transfer matrix relative to the world coordinate system with the joint as the origin based on the coordinate information of the joints; and a control unit for controlling the robot to operate according to the grasping posture corresponding to the transfer matrix.
[0013] According to a third aspect of the present invention, another control device for a robot is provided, comprising: a controller and a memory, wherein the memory stores a program or instructions, and the controller, when executing the program or instructions in the memory, implements the steps of any of the methods described above.
[0014] According to a fourth aspect of the present invention, a readable storage medium is provided on which a program or instructions are stored, which, when executed by a processor, implement the steps of any of the methods described above.
[0015] According to a fifth aspect of the present invention, a robot is provided, comprising: a robot control device as described in any of the above; and/or a readable storage medium as described above.
[0016] According to a sixth aspect of the invention, another robot is provided, comprising: a mechanical gripper having joints and fingertips; and a processor that implements the steps of any of the methods described above.
[0017] In the above technical solution, the coordinate information of the joint can be derived from the coordinate information of the contact point, thereby expressing the grasping posture. While controlling the robot, this reduces the dimensionality of the acquired data, thereby reducing the amount of data to be processed and the labor cost of robot control.
[0018] Furthermore, the above control process does not require data restrictions, thus avoiding the problem of poor robot grasping quality caused by data restrictions, thereby improving the robot's grasping efficiency.
[0019] Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Brief Description of the Drawings
[0020] The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the description of the embodiments taken in conjunction with the following drawings, in which:
[0021] Figure 1 is a flowchart of the robot control method according to an embodiment of the present invention;
[0022] Figure 2 is a schematic diagram of a robot grasping an object according to an embodiment of the present invention;
[0023] Figure 3 is a schematic diagram of the mechanical gripper according to an embodiment of the present invention;
[0024] Figure 4 is a schematic diagram of the preset grasping network according to an embodiment of the present invention;
[0025] Figure 5 is a first schematic block diagram of the robot control device according to an embodiment of the present invention;
[0026] Figure 6 is a second schematic block diagram of the robot control device according to an embodiment of the present invention.
Detailed Description
[0027] To better understand the above aspects, features, and advantages of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in these embodiments can be combined with each other.
[0028] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and therefore the scope of protection of the invention is not limited to the specific embodiments disclosed below.
[0029] In one embodiment of this application, as shown in Figure 1, a robot control method is provided. The robot includes a mechanical gripper with joints and fingertips. The control method includes:
[0030] Step 102: Acquire a point cloud image containing the object to be grasped;
[0031] Step 104: Input the point cloud image into the preset grasping network, so that the preset grasping network performs classification and prediction based on the point cloud image to obtain the contact point between the fingertip and the object to be grasped;
[0032] Step 106: Based on the coordinate information of the contact point, construct a first rotation matrix of the coordinate system with the contact point as the origin relative to the world coordinate system;
[0033] Step 108: Determine the coordinate information of the joint based on the coordinate information of the contact point, the vector between the joint and the fingertip, and the first rotation matrix;
[0034] Step 110: Based on the coordinate information of the joint, determine the transfer matrix of the coordinate system with the joint as the origin relative to the world coordinate system;
[0035] Step 112: Control the robot to operate according to the grasping posture corresponding to the transfer matrix.
[0036] The embodiments of this application propose a robot control method. With this method, the coordinate information of the joints can be derived from the coordinate information of the contact point, thereby expressing the grasping posture. While controlling the robot, the method reduces the dimensionality of the acquired data, thereby reducing the amount of data to be processed and the labor cost of robot control.
[0037] Furthermore, the above control process does not require data restrictions, thus avoiding the problem of poor robot grasping quality caused by data restrictions, thereby improving the robot's grasping efficiency.
[0038] In the above embodiments, by acquiring point cloud data, the preset grasping network can judge each point in the point cloud data and determine whether it is a contact point, thereby predicting the position where the fingertip contacts the grasped object, that is, the contact point mentioned above.
[0039] In one embodiment, the point cloud data may be the point cloud data when the robotic gripper grasps the object being grasped.
[0040] In the above embodiments, since the coordinate information of the joint can be derived from the coordinate information of the contact point, the preset grasping network only needs to represent the contact point with fewer dimensions when predicting it, which improves the prediction efficiency of the preset grasping network.
[0041] In the above embodiments, by controlling the robot to work according to the grasping posture corresponding to the transfer matrix, the dimension of the grasping data is transformed, thereby transforming it into an expression with the same number of dimensions as in the related embodiments, so that the above embodiments can be adapted to the related embodiments, expanding the scope of application of the embodiments.
[0042] In the above embodiments, determining the transfer matrix relative to the world coordinate system with the joint as the origin based on the joint coordinate information specifically includes: constructing a second rotation matrix relative to the world coordinate system with the joint as the origin based on the joint coordinate information; and determining the transfer matrix relative to the world coordinate system with the joint as the origin based on the joint coordinate information and the second rotation matrix.
[0043] In this embodiment, a second rotation matrix is constructed so that a vector of fixed length can be expressed using the contact points, thereby expressing the robot's pose.
[0044] In the above embodiments, based on the coordinate information of the contact point, a first rotation matrix is constructed relative to the world coordinate system with the contact point as the origin. Specifically, this includes: constructing a first coordinate system based on the coordinate information of the contact point, wherein the direction of a certain coordinate axis of the first coordinate system is the same as the normal vector of the fingertip; and determining the first rotation matrix based on the first coordinate system.
[0045] In this embodiment, a first coordinate system is constructed, and the direction of a certain coordinate axis of the first coordinate system is defined to be the same as the normal vector, so that the coordinates of the contact point can be expressed in the coordinate system on the robot, thereby limiting the control of the robot.
[0046] In one embodiment, as shown in Figure 2, "Grasp" denotes a grasp and "Grasp Contact" denotes the grasp contact point, that is, the position where the fingertip contacts the object being grasped, which is defined as $t_{contact}$. Let $n$ be the normal vector of the fingertip surface in contact with the object. The fingertip area is fitted with a circle whose center is defined as $ft$ and whose radius is $r$. The coordinates of the fingertip center, $t_{ft}$, are then:
[0047] $t_{ft} = t_{contact} + r\,n$;
[0048] Construct a coordinate system $T_{ft}$ at point $ft$, whose z-axis has the same direction as the normal vector $n$. To distinguish it from $n$, this direction is denoted $v$ and defined as follows:
[0049] $v = [v_1, v_2, v_3]^T$;
[0050] where $v_1$, $v_2$ and $v_3$ correspond to the x-axis, y-axis and z-axis of coordinate system $T_{ft}$ in the figure, respectively.
[0051] Based on the above, the rotation matrix $R_{ft}$ between the coordinate system with point $ft$ as the origin and the world coordinate system is expressed as follows:
[0052] $R_{ft} = \left[R_1,\ [0, -v_3, v_2]^T,\ v\right]$;
[0053] where $R_1 = [0, -v_3, v_2]^T \times v$.
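The fingertip center and the frame construction in [0046] to [0053] can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the applicant's implementation: the function names are hypothetical, the auxiliary axis $[0, -v_3, v_2]^T$ is normalized (the source does not state this), and a fallback axis is chosen for the degenerate case in which $v$ is parallel to the world x-axis, where that auxiliary vector vanishes.

```python
import numpy as np


def fingertip_center(t_contact, n, r):
    """Center of the circle fitted to the fingertip: t_ft = t_contact + r * n."""
    return np.asarray(t_contact, dtype=float) + r * np.asarray(n, dtype=float)


def fingertip_frame(n):
    """Rotation R_ft = [R1, a, v] whose z-axis v is the fingertip normal n.

    a = [0, -v3, v2]^T is perpendicular to v by construction and R1 = a x v,
    so the three columns form a right-handed orthonormal frame.
    """
    v = np.asarray(n, dtype=float)
    v = v / np.linalg.norm(v)                  # unit z-axis of T_ft
    a = np.array([0.0, -v[2], v[1]])           # y-axis candidate, a . v = 0
    norm_a = np.linalg.norm(a)
    if norm_a < 1e-9:
        # Assumption: v is parallel to world x, so a vanishes; use world y instead.
        a = np.array([0.0, 1.0, 0.0])
    else:
        a = a / norm_a                         # assumption: normalize to unit length
    r1 = np.cross(a, v)                        # x-axis R1 = a x v
    return np.column_stack([r1, a, v])


R_ft = fingertip_frame([0.0, 0.0, 1.0])
t_ft = fingertip_center([0.1, 0.0, 0.2], [0.0, 0.0, 1.0], 0.01)
```

Because $a \perp v$ and both are unit vectors, $a \times v$ is also a unit vector, so no further normalization of $R_1$ is needed.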
[0054] Second, the finger vector is mapped onto two coordinates $x \in [-1, 1]$ and $y \in [-1, 1]$, from which a fixed-length finger vector $v_{finger}$ is calculated, and $t_{oj}$ is calculated based on $v_{finger}$:
[0055] $t_{oj} = \lVert v_{finger} \rVert \left(R_{ft} \cdot [x, y, z]^T\right) + t_{ft}$;
[0056] where $t_{oj}$ represents the coordinate information of the joint, $\lVert v_{finger} \rVert$ is the fixed length of the finger, $[x, y, z]^T$ is a unit vector in $T_{ft}$, and $t_{ft}$ contains the coordinates of the contact point.
[0057] Here, x, y and z are coordinates in the world coordinate system.
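The joint position in [0055] can be sketched as follows. This is an illustrative sketch, not the applicant's code: the function name is hypothetical, and because the source does not say how $z$ is obtained, it is assumed here to be the non-negative value that makes $[x, y, z]^T$ a unit vector, which requires $x^2 + y^2 \le 1$.

```python
import numpy as np


def joint_position(x, y, R_ft, t_ft, finger_length):
    """t_oj = ||v_finger|| * (R_ft @ [x, y, z]^T) + t_ft.

    x, y are the projections of the finger direction on the x- and y-axes of
    T_ft; z is recovered so that [x, y, z] is a unit vector (assumed z >= 0).
    """
    if x * x + y * y > 1.0:
        raise ValueError("(x, y) must lie inside the unit disc")
    z = np.sqrt(1.0 - x * x - y * y)
    u_local = np.array([x, y, z])              # unit direction expressed in T_ft
    u_world = np.asarray(R_ft, dtype=float) @ u_local
    return finger_length * u_world + np.asarray(t_ft, dtype=float)


t_oj = joint_position(0.6, 0.0, np.eye(3), [1.0, 2.0, 3.0], 0.05)
```

With the identity frame, (x, y) = (0.6, 0) gives z = 0.8, so the joint lies 0.05 away from $t_{ft}$ along [0.6, 0, 0.8], at roughly [1.03, 2.0, 3.04].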
[0058] Third, the rotation matrix $R_{oj}$ is constructed at $t_{oj}$:
[0059] $R_{oj} = \left[v_{finger},\ R_2,\ v_{finger} \times v_z\right] \cdot R_0$;
[0060] where $R_2 = v_{finger} \times (v_{finger} \times v_z)$, $v_z$ is a vector along the z-axis, and $R_0$ is a fixed rotation matrix. The transformation $T_{oj}$ is generated from $R_{oj}$ and $t_{oj}$ as follows:
$T_{oj} = \begin{bmatrix} R_{oj} & t_{oj} \\ 0 & 1 \end{bmatrix}$
[0061] Then, from the relative relationship between $T_{oj}$ and the final grasping posture $T_{pose}$, the following is obtained:
[0062] $T_{pose} = T_{oj} \cdot T^{-1}$
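The joint frame and the final grasp pose in [0058] to [0062] can be sketched with homogeneous 4x4 transforms. This is a hedged sketch: the names are hypothetical, $v_z$ is assumed to be the world z-axis, the columns are normalized, $R_0$ (unspecified in the source) defaults to the identity as a placeholder, and the gripper-specific constant $T$ is given a made-up offset. Note that with the column order as written, $[v_{finger}, R_2, v_{finger} \times v_z]$ is orthonormal but left-handed (determinant -1), so any handedness correction would have to come from $R_0$.

```python
import numpy as np


def joint_rotation(v_finger, v_z=(0.0, 0.0, 1.0), R0=None):
    """R_oj = [v_finger, R2, v_finger x v_z] @ R0, R2 = v_finger x (v_finger x v_z).

    Columns are normalized (assumption). In this column order the basis is
    left-handed (det = -1); R0 defaults to identity as a placeholder.
    """
    f = np.asarray(v_finger, dtype=float)
    f = f / np.linalg.norm(f)
    c3 = np.cross(f, np.asarray(v_z, dtype=float))
    if np.linalg.norm(c3) < 1e-9:
        raise ValueError("v_finger must not be parallel to v_z")
    c3 = c3 / np.linalg.norm(c3)
    c2 = np.cross(f, c3)          # same direction as v_finger x (v_finger x v_z)
    R0 = np.eye(3) if R0 is None else np.asarray(R0, dtype=float)
    return np.column_stack([f, c2, c3]) @ R0


def homogeneous(R, t):
    """Pack a 3x3 rotation and a translation into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def grasp_pose(T_oj, T_fixed):
    """T_pose = T_oj @ T^-1, with T a gripper-specific constant transform."""
    return T_oj @ np.linalg.inv(T_fixed)


R_oj = joint_rotation([1.0, 0.0, 0.0])
T_oj = homogeneous(R_oj, [1.03, 2.0, 3.04])
T_fixed = homogeneous(np.eye(3), [0.0, 0.0, 0.1])   # hypothetical gripper offset
T_pose = grasp_pose(T_oj, T_fixed)
```

Right-multiplying by $T^{-1}$ means that $T_{pose} \cdot T$ recovers $T_{oj}$, which is a convenient consistency check.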
[0063] Based on the above, the representation is reduced from a 10-dimensional grasp space to a 6-dimensional grasp space.
[0064] In a related embodiment, taking a three-fingered gripper as an example (that is, the robot has three fingers), a high-quality grasp is represented as G = {p, q}, where p is the robot's grasping posture, with 6 dimensions, and q represents the robot's joint degrees of freedom. As shown in Figure 3, q = {θ0, θ1, θ2, θ3}, which has four dimensions: θ0 is the degree of freedom of the translational joint, and θ1, θ2 and θ3 are the degrees of freedom of the finger joints. θ11, θ21 and θ31 are the degrees of freedom of the joints connected to the fingers and are coupled to θ1, θ2 and θ3, respectively.
[0065] Through the transformation of the grasp representation described above in this application, the high-quality grasp G is reduced from 10 dimensions to 6 dimensions, and its grasp space is expressed as follows:
[0066] $G = \{x, y, \theta_{ms}, \theta_m, \theta_{s1}, \theta_{s2}\}$
[0067] where x and y are the mappings of the finger vector in the first coordinate system; $\theta_{ms}$ and $\theta_m$ are the degrees of freedom of the master joints, namely the translational joint angle and the joint angle of the finger in contact with a specified point on the object; and $\theta_{s1}$ and $\theta_{s2}$ are the degrees of freedom of the non-master joints, namely the joint angles of the other two fingers.
[0068] Here θ11, θ21 and θ31 are the angles of the fingertip joints and are coupled to θ1, θ2 and θ3. Specifically, when finger 3 (one of the three fingers) contacts the object to be grasped, $\theta_m$ corresponds to θ3 in Figure 3, $\theta_{ms}$ corresponds to θ0 in Figure 3, and $\theta_{s1}$ and $\theta_{s2}$ correspond to θ1 of finger 1 and θ2 of finger 2, respectively.
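The correspondence between the joint part of the 6-dimensional grasp and q = {θ0, θ1, θ2, θ3} can be sketched as a small function. The function name and the master_finger argument are hypothetical, and the assignment of $\theta_{s1}$ and $\theta_{s2}$ to the two non-master fingers in ascending finger order is an assumption; the source only states that they are the joint angles of the other two fingers.

```python
def expand_joint_parameters(theta_ms, theta_m, theta_s1, theta_s2, master_finger):
    """Map the joint part of the 6-D grasp back to q = (θ0, θ1, θ2, θ3).

    master_finger is the index (1, 2 or 3) of the finger whose angle is θ_m.
    Assumption: θ_s1 and θ_s2 go to the other two fingers in ascending order.
    The coupled joints θ11, θ21, θ31 follow θ1, θ2, θ3 and are not separate outputs.
    """
    if master_finger not in (1, 2, 3):
        raise ValueError("master_finger must be 1, 2 or 3")
    angles = {master_finger: theta_m}
    others = [i for i in (1, 2, 3) if i != master_finger]
    angles[others[0]] = theta_s1
    angles[others[1]] = theta_s2
    # θ0 is the translational joint, i.e. the master-joint degree of freedom θ_ms
    return (theta_ms, angles[1], angles[2], angles[3])


q = expand_joint_parameters(0.5, 1.0, 0.2, 0.3, master_finger=3)
```

For the case in [0068], where finger 3 is the master finger, this returns (0.5, 0.2, 0.3, 1.0). The remaining parameters x and y do not enter q; they determine the joint position through $t_{oj}$.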
[0069] In the above embodiments, T is determined based on the mechanical gripper's own parameters, which will not be elaborated further here.
[0070] In the above embodiments, the method further includes: constructing a training dataset; and training the grasping network on the training dataset until the loss function of the grasping network converges, thereby obtaining the preset grasping network.
[0071] In this embodiment, because the contact point coordinates output by the preset grasping network are expressed with 6-dimensional data, the network must be trained to ensure the accuracy of the output contact points.
[0072] In one embodiment, the coordinate information of the contact point is expressed as follows:
[0073] $G = \{x, y, \theta_{ms}, \theta_m, \theta_{s1}, \theta_{s2}\}$
[0074] where G represents the coordinate information of the contact point.
[0075] The loss function of the grasping network is expressed as follows:
[0076]
[0077] where:
[0078]
[0079] In these formulas, the terms denote, respectively: the loss function of the grasping network; the loss function for point classification prediction; the loss function for x and y; the loss function for $\theta_m$; the loss function for $\theta_{ms}$; and the loss function for $\theta_{s1}$ and $\theta_{s2}$. One of the terms is the weighted cumulative sum of two others, and α, β, γ, γ1, γ2 and γ3 are constants.
[0080] The loss function of the grasping network is obtained by superimposing multiple loss functions. Specifically, as shown in Figure 4, the grasping network uses the Grasp Point Segmentation module to classify and predict whether each point in the point cloud image is a grasping point.
[0081] Accordingly, the loss formula for the Grasp Point Segmentation module is expressed as follows:
[0082]
[0083] where the loss is the cross-entropy loss, p is the prediction of the PointNet++ segmentation head for each grasp sample, and $c_g$ is its corresponding label.
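A per-point cross-entropy of this kind can be sketched in NumPy. The source states only that a cross-entropy loss is used; treating it as a binary (grasp point / not grasp point) loss on sigmoid logits with mean reduction is an assumption, and the function name is hypothetical.

```python
import numpy as np


def segmentation_loss(logits, labels):
    """Mean binary cross-entropy between per-point logits and grasp-point labels c_g.

    Uses the numerically stable form max(s, 0) - s * c + log(1 + exp(-|s|)).
    """
    s = np.asarray(logits, dtype=float)
    c = np.asarray(labels, dtype=float)
    per_point = np.maximum(s, 0.0) - s * c + np.log1p(np.exp(-np.abs(s)))
    return float(per_point.mean())


loss = segmentation_loss([0.0, 0.0], [1, 0])
```

A logit of 0 means a predicted probability of 0.5 for either label, so the loss in this example is log 2 per point.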
[0084] For x, y, $\theta_{ms}$ and $\theta_m$ above, the Main Joint Prediction module predicts x, y, $\theta_{ms}$ and $\theta_m$ to obtain the initial grasping posture. The loss formula for x and y is expressed as follows:
[0085]
[0086] where $N_c$ is the total number of contact points, the remaining two terms are the finger projection value predicted at each contact point and the actual finger projection value at that contact point, and $\delta_c$ is 1 when x and y correspond to a contact point and 0 otherwise.
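A masked projection loss of this shape can be sketched as follows. The source does not give the per-point error, so the squared Euclidean distance between predicted and actual (x, y) is an assumption, as is the function name; the mask plays the role of $\delta_c$ and the sum is divided by $N_c$.

```python
import numpy as np


def projection_loss(pred_xy, true_xy, is_contact):
    """Masked loss on finger projections (x, y), averaged over N_c contact points.

    is_contact plays the role of delta_c (1 for contact points, 0 otherwise).
    Assumption: squared Euclidean error per point.
    """
    pred = np.asarray(pred_xy, dtype=float)
    true = np.asarray(true_xy, dtype=float)
    delta = np.asarray(is_contact, dtype=float)
    n_c = delta.sum()
    if n_c == 0:
        return 0.0                               # no contact points, nothing to fit
    per_point = np.sum((pred - true) ** 2, axis=1)
    return float(np.sum(delta * per_point) / n_c)


loss = projection_loss([[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [5.0, 5.0]], [1, 0])
```

Only the first point is a contact point here, so the large error on the second point is masked out and the loss is 1.0.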
[0087] The loss formula for $\theta_m$ is:
[0088]
[0089] where the first pair of terms are the true category and residual of $\theta_m$ for a given grasp p, and the second pair are the corresponding predicted values.
[0090] The loss formula for $\theta_{ms}$ is:
[0091]
[0092] where the first pair of terms are the true category and residual of $\theta_{ms}$ for a given grasp p, and the second pair are the corresponding predicted values.
[0093] Furthermore, the Supporting Joint Prediction module predicts the remaining grasp joints $\theta_{s1}$ and $\theta_{s2}$; the corresponding loss formula is expressed as follows:
[0094]
[0095] where the first pair of terms are the true category and residual of $\theta_{s1}$ and $\theta_{s2}$ for a given grasp p, and the second pair are the corresponding predicted values.
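The "category plus residual" losses for $\theta_m$, $\theta_{ms}$, $\theta_{s1}$ and $\theta_{s2}$ can be illustrated with a common bin-classification-plus-residual formulation. Everything specific here is an assumption rather than the source's formula: the angle range and number of bins, the use of cross-entropy for the category, a smooth-L1 penalty on the residual of the true bin, and all names.

```python
import numpy as np


def encode_angle(theta, lo, hi, num_bins):
    """Split an angle into a bin index (category) and a residual from the bin centre."""
    width = (hi - lo) / num_bins
    idx = int(np.clip((theta - lo) // width, 0, num_bins - 1))
    centre = lo + (idx + 0.5) * width
    return idx, theta - centre


def smooth_l1(d):
    """Smooth-L1 (Huber with threshold 1) penalty on a scalar error."""
    d = abs(d)
    return 0.5 * d * d if d < 1.0 else d - 0.5


def bin_residual_loss(bin_logits, pred_residuals, theta, lo, hi, num_bins):
    """Cross-entropy on the true bin plus smooth-L1 on that bin's residual."""
    idx, res = encode_angle(theta, lo, hi, num_bins)
    z = np.asarray(bin_logits, dtype=float)
    log_probs = z - (z.max() + np.log(np.exp(z - z.max()).sum()))   # log-softmax
    return float(-log_probs[idx] + smooth_l1(pred_residuals[idx] - res))


idx, res = encode_angle(0.25, 0.0, 1.0, 4)
loss = bin_residual_loss([0.0] * 4, [0.0, -0.125, 0.0, 0.0], 0.25, 0.0, 1.0, 4)
```

Splitting an angle this way lets the network first pick a coarse range and then refine within it; in the example the residual is predicted exactly, so only the cross-entropy of uniform logits (log 4) remains.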
[0096] Based on the above, the loss function of the grasping network is expressed as follows:
[0097]
[0098] where:
[0099]
[0100] In practice, α, β, γ, γ1, γ2 and γ3 are set as needed. In the embodiments of this application, α = 1, β = γ = 5, and γ1 = γ2 = γ3 = 1.
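Because the combined formula is not reproduced in the source, the following is only a sketch of one plausible weighted sum using the constants from [0100]. The grouping, in which γ scales a joint term that is itself the γ1, γ2, γ3 weighted sum of the three angle losses, is an assumption, as are all names.

```python
def total_loss(l_seg, l_xy, l_theta_m, l_theta_ms, l_theta_s,
               alpha=1.0, beta=5.0, gamma=5.0, g1=1.0, g2=1.0, g3=1.0):
    """Weighted sum of the sub-losses; the nesting of the weights is assumed."""
    l_joint = g1 * l_theta_m + g2 * l_theta_ms + g3 * l_theta_s   # assumed grouping
    return alpha * l_seg + beta * l_xy + gamma * l_joint


loss = total_loss(1.0, 1.0, 1.0, 1.0, 1.0)
```

With every sub-loss equal to 1, the defaults give 1 + 5 + 5 * 3 = 21, which shows how strongly these weights emphasize the projection and joint terms relative to segmentation.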
[0101] A loss value is computed with the loss function of the grasping network, and the parameters of the Grasp Point Segmentation, Main Joint Prediction and Supporting Joint Prediction modules described above are updated based on the loss value to obtain the preset grasping network.
[0102] In the above embodiments, constructing a training dataset specifically includes: determining the model of the object to be grasped; generating grasping postures and corresponding joint parameters based on the coordinate system under the model of the object to be grasped; determining grasping evaluation values based on grasping postures and corresponding joint parameters; and recording grasping postures and corresponding joint parameters whose grasping evaluation values are greater than preset evaluation values.
[0103] In this embodiment, models of the objects to be grasped can be obtained from an open-source object database. The objects should be of suitable size and varied shapes; for example, 80 everyday objects can be used.
[0104] To generate grasping postures and corresponding joint parameters for each object in its own object coordinate system, the mesh model of the object is first downsampled so that the object's point cloud is uniformly distributed, and the points in the object point cloud are then uniformly sampled. For each sampling point, a three-dimensional space S1×S2×S3 is searched, where S1 is the depth of the gripper (i.e., the mechanical gripper), S2 is the rotation angle of the gripper about the normal direction, and S3 is the angle of the translational joint; this achieves uniform sampling over each object.
[0105] These three quantities are randomly sampled and combined within a reasonable range, and the finger joints of each generated grasp are closed automatically, stopping when each finger collides with the object. The grasp evaluation value can be computed with an existing tool such as GraspIt!, a grasp simulator that can evaluate grasp quality.
[0106] By setting preset evaluation values, samples with higher quality can be selected using these preset evaluation values as filtering criteria.
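The sampling of (S1, S2, S3) triples and the threshold filtering in [0104] to [0106] can be sketched with the standard library. The ranges in the example are hypothetical, and `evaluate` is a placeholder for a grasp-quality score such as one computed in a simulator like GraspIt!; it is not a GraspIt! API. The finger-closing simulation itself is omitted.

```python
import random


def sample_grasp_candidates(num_samples, depth_range, rot_range, trans_range, seed=0):
    """Draw (S1 depth, S2 rotation about the normal, S3 translational-joint angle) triples."""
    rng = random.Random(seed)
    return [
        (rng.uniform(*depth_range), rng.uniform(*rot_range), rng.uniform(*trans_range))
        for _ in range(num_samples)
    ]


def keep_good_grasps(candidates, evaluate, threshold):
    """Keep only candidates whose evaluation value exceeds the preset threshold."""
    return [c for c in candidates if evaluate(c) > threshold]


# Hypothetical ranges: 0 to 5 cm depth, a full turn about the normal, 0 to 90 degrees.
cands = sample_grasp_candidates(100, (0.0, 0.05), (0.0, 6.283), (0.0, 1.571))
```

Seeding the generator makes the candidate set reproducible, which helps when the same object is re-processed while tuning the evaluation threshold.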
[0107] After selecting high-quality samples, parameters such as the contact point and normal vector between the mechanical gripper and the object to be gripped are calculated or obtained, thereby constructing the training dataset.
[0108] In one embodiment, the fingertip has a contact surface that contacts the object being grasped, and the contact surface is clustered based on a clustering algorithm to obtain contact points.
[0109] In one embodiment, because the actual contact is a surface rather than a single point, the k-means algorithm is used to cluster the contact surface into a single contact point when calculating the contact point, which represents the contact between the object to be grasped and the robotic gripper.
[0110] Here, the k-means algorithm is a commonly used clustering algorithm based on Euclidean distance.
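A plain Lloyd's k-means over the contact-surface points can be sketched in NumPy. The function name, iteration limit and seed are hypothetical. With k = 1, as needed to obtain a single contact point, the result reduces to the centroid of the contact surface.

```python
import numpy as np


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on Euclidean distance; returns (centres, labels)."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centres = pts[rng.choice(len(pts), size=k, replace=False)]
    labels = np.zeros(len(pts), dtype=int)
    for _ in range(iters):
        # assign every point of the contact surface to its nearest centre
        dists = np.linalg.norm(pts[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centre to the mean of its members (keep it if it has none)
        new = np.array([
            pts[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return centres, labels


surface = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
centres, labels = kmeans(surface, k=1)
contact_point = centres[0]
```

For the four corners of a unit square, the single cluster centre is [0.5, 0.5, 0.0].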
[0111] Furthermore, a random number of these objects are placed on a table with random orientations to construct a scene, and each previously generated grasping pose $G_o$ in the object coordinate system is transformed into the scene's world coordinate system as $G_w$:
[0112] $G_w = P \cdot G_o$
[0113] Here, P represents the object's 6D pose in the world coordinate system. The entire grasping process is then simulated in a pybullet-based simulation environment, and all collision-free, valid grasps are selected and recorded to construct a scene grasping dataset. Finally, the scene is captured from an arbitrary viewpoint above the tabletop, the scene grasps $G_w$ are transformed into grasps $G_{cam}$ in the camera coordinate system and recorded, and the corresponding point cloud image is obtained.
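The frame changes $G_w = P \cdot G_o$ and the subsequent transfer into the camera frame can be sketched with 4x4 homogeneous transforms. This sketch assumes the pose part of a grasp is stored as a 4x4 matrix; the joint values q are angles and do not change with the reference frame. The camera pose and all names are hypothetical.

```python
import numpy as np


def rigid_inverse(T):
    """Inverse of a 4x4 rigid transform [R t; 0 1] -> [R^T  -R^T t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    inv = np.eye(4)
    inv[:3, :3] = R.T
    inv[:3, 3] = -R.T @ t
    return inv


def translation(x, y, z):
    """Pure-translation 4x4 transform."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


P = translation(1.0, 0.0, 0.0)            # object 6D pose in the world frame
G_o = translation(0.0, 0.0, 0.1)          # grasp pose in the object frame
G_w = P @ G_o                             # G_w = P . G_o
T_world_cam = translation(1.0, 0.0, 1.0)  # hypothetical camera pose in the world
G_cam = rigid_inverse(T_world_cam) @ G_w  # same grasp expressed in the camera frame
```

Using the closed-form rigid inverse avoids a general matrix inversion and keeps the rotation block exactly orthonormal.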
[0114] The embodiments proposed in this application are applicable to grasping-posture prediction for commercially available three-finger grippers with a translational joint. For a given three-finger gripper, the structural dimensions and kinematic parameters of the gripper are first obtained, and the center point ft and radius r of the circles fitted to the three fingertips are calculated. These are then substituted into the formulas above to obtain a new solution formula for three-finger grasping, whose parameters are the six dimensions $G = \{x, y, \theta_{ms}, \theta_m, \theta_{s1}, \theta_{s2}\}$. Next, if a Barrett hand or a three-finger gripper of similar specifications is used, the provided dataset can be used directly; if the specifications differ significantly, the dataset must be generated according to the specifications of the given gripper. Finally, the generated dataset is used to train CMG-Net, which takes single-view point cloud data as input and predicts the grasping pose and gripper joint angles in the camera coordinate system. After CMG-Net is trained, graspable objects can be placed randomly in the experimental scene, the scene is captured with a calibrated RGB-D camera, the camera pose is recorded, and the captured depth image is converted into point cloud data. It is recommended to filter background points from the point cloud so that grasp-point sampling concentrates on the objects and a large number of invalid grasping points is avoided. The converted point cloud is then input into the trained CMG-Net to obtain k grasping points in the camera coordinate system, where k can be set manually. One grasp $G_{cam}$ is selected and first transformed into the world coordinate system as $G_w = \{p, q\}$; the end effector of the robotic arm (i.e., the gripper mounting position) is moved to the predicted grasping posture, and the gripper joints are moved to q = {θ0, θ1, θ2, θ3}. It is recommended to move the translational joint first and then the finger joints.
If the object tends to slip out after the gripper closes, an angular compensation can be applied to the finger joints so that the gripper grasps the object more firmly.
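The conversion of the captured depth image into a point cloud can be sketched with the standard pinhole back-projection. The source does not specify the conversion or the background filter, so the intrinsics (fx, fy, cx, cy) are hypothetical parameters, depth is assumed to be in metres, and a simple depth-range cut stands in for background filtering; a real setup might instead remove the table plane.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy, z_min=0.1, z_max=1.5):
    """Back-project a depth image (metres) into camera-frame 3D points.

    Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Points outside (z_min, z_max) are dropped as a crude background filter.
    """
    z = np.asarray(depth, dtype=float)
    v, u = np.indices(z.shape)                 # pixel row (v) and column (u) indices
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    keep = (pts[:, 2] > z_min) & (pts[:, 2] < z_max)
    return pts[keep]


cloud = depth_to_points([[1.0, 1.0], [0.0, 1.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

The pixel with zero depth (no return) is discarded, leaving three camera-frame points that could then be passed to the trained network.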
[0115] This solution exploits the physical characteristics of three-finger grasping to reduce the size of the grasp space that must be searched during grasp prediction, enabling the algorithm to obtain high-quality grasping postures faster and more reliably.
[0116] Furthermore, by constructing a large-scale three-finger grasping dataset for different types of objects and making predictions on unknown types of objects in reality, we ensured the feasibility of our grasping algorithm in real-world applications.
[0117] In some embodiments of this application, as shown in Figure 5, the present invention provides a robot control device 500. The robot includes a mechanical gripper with joints and fingertips. The control device includes: an acquisition unit 502 for acquiring a point cloud image containing an object to be grasped; a prediction unit 504 for inputting the point cloud image into a preset grasping network, so that the preset grasping network performs classification and prediction based on the point cloud image to obtain the contact point between the fingertips and the object to be grasped; a construction unit 506 for constructing a first rotation matrix relative to the world coordinate system with the contact point as the origin based on the coordinate information of the contact point; a determination unit 508 for determining the coordinate information of the joints based on the coordinate information of the contact point, the vector between the joints and the fingertips, and the first rotation matrix; a processing unit 510 for determining a transfer matrix relative to the world coordinate system with the joint as the origin based on the coordinate information of the joints; and a control unit 512 for controlling the robot to operate according to the grasping posture corresponding to the transfer matrix.
[0118] The embodiments of this application propose a robot control device 500, which can derive the coordinate information of the joint from the coordinate information of the contact point, thereby expressing the grasping posture. While controlling the robot, the device reduces the dimensionality of the acquired data, thereby reducing the amount of data to be processed and the labor cost of robot control.
[0119] Furthermore, the above control process does not require data restrictions, thus avoiding the problem of poor robot grasping quality caused by data restrictions, thereby improving the robot's grasping efficiency.
[0120] In the above embodiments, by acquiring point cloud data, the preset grasping network can judge each point in the point cloud data and determine whether it is a contact point, thereby predicting the position where the fingertip contacts the grasped object, that is, the contact point mentioned above.
[0121] In the above embodiments, since the coordinate information of the joint can be derived from the coordinate information of the contact point, the preset grasping network only needs to represent the contact point with fewer dimensions when predicting it, which improves the prediction efficiency of the preset grasping network.
[0122] In the above embodiments, by controlling the robot to work according to the grasping posture corresponding to the transfer matrix, the dimension of the grasping data is transformed, thereby transforming it into an expression with the same number of dimensions as in the related embodiments, so that the above embodiments can be adapted to the related embodiments, expanding the scope of application of the embodiments.
[0123] In the above embodiment, the processing unit 510 is specifically used to: construct a second rotation matrix relative to the world coordinate system with the joint as the origin based on the joint coordinate information; and determine a transfer matrix relative to the world coordinate system with the joint as the origin based on the joint coordinate information and the second rotation matrix.
[0124] In this embodiment, a second rotation matrix is constructed so that a vector of fixed length can be expressed using the contact points, thereby expressing the robot's pose.
[0125] In the above embodiment, the construction unit 506 is specifically used to: construct a first coordinate system based on the coordinate information of the contact point, wherein the direction of a certain coordinate axis of the first coordinate system is the same as the normal vector of the fingertip; and determine a first rotation matrix based on the first coordinate system.
[0126] In this embodiment, a first coordinate system is constructed, and the direction of a certain coordinate axis of the first coordinate system is defined to be the same as the normal vector, so that the coordinates of the contact point can be expressed in the coordinate system on the robot, thereby limiting the control of the robot.
[0127] In one embodiment, the contact position between the fingertip and the object being grasped is first defined as $t_{contact}$. The normal vector of the fingertip surface that touches the object is defined as $n$. The fingertip area is fitted with a circle whose center is defined as $ft$ and whose radius is $r$. The coordinates $t_{ft}$ of the fingertip center are then:
[0128] $t_{ft} = t_{contact} + r\,n$;
[0129] Construct a coordinate system T at point ft. ft , among which, T ft The z-axis direction is the same as the normal vector n. To distinguish them, let's denote them as v, defined as follows:
[0130] v = [v1, v2, v3] T ;
[0131] Where v1, v2, and v3 correspond to coordinate system T respectively. ft The x-axis, y-axis, and z-axis in the diagram.
[0132] On this basis:
[0133] $R_{ft} = \left[R_1,\ [0, -v_3, v_2]^T,\ v\right]$;
[0134] where $R_1 = [0, -v_3, v_2]^T \times v$.
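The fingertip-frame construction in [0128] to [0134] can be sketched in plain Python as below. This is a minimal illustration under stated assumptions and is not the applicant's implementation. The function names are hypothetical. The normal $n$ is normalized before use, and every column of $R_{ft}$ is normalized so that the matrix is orthonormal; the text does not say whether this is done. The column $[0, -v_3, v_2]^T$ is orthogonal to $v$ by construction, and with $R_1 = [0, -v_3, v_2]^T \times v$ as the first column the triad is right-handed. The construction degenerates when $v_2 = v_3 = 0$.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    """Right-handed cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    norm = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    if norm < 1e-9:
        raise ValueError("cannot normalize a (near) zero vector")
    return (a[0] / norm, a[1] / norm, a[2] / norm)


def fingertip_frame(t_contact: Vec, n: Vec, r: float) -> Tuple[Vec, List[Vec]]:
    """Return t_ft and the three columns of R_ft, following [0128]-[0134].

    Assumptions: n is normalized before use and every column is normalized.
    The column [0, -v3, v2]^T vanishes when v2 = v3 = 0 (normal along the
    world x-axis); normalize() then raises ValueError.
    """
    v = normalize(n)                    # z-axis of T_ft, same direction as n
    t_ft = (t_contact[0] + r * v[0],    # t_ft = t_contact + r * n
            t_contact[1] + r * v[1],
            t_contact[2] + r * v[2])
    a = normalize((0.0, -v[2], v[1]))   # second column; a . v = 0 by construction
    r_1 = normalize(cross(a, v))        # first column, R_1 = a x v
    return t_ft, [r_1, a, v]
```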
[0135] Secondly, the finger vector is mapped to $x \in [-1, 1]$ and $y \in [-1, 1]$. A fixed-length vector $v_{finger}$ is computed from this mapping, and $t_{oj}$ is calculated from $v_{finger}$:
[0136] $t_{oj} = \lVert v_{finger} \rVert \left(R_{ft} \cdot [x, y, z]^T\right) + t_{ft}$;
[0137] where $t_{oj}$ represents the coordinate information of the joint, $\lVert v_{finger} \rVert$ is the fixed finger length, $[x, y, z]^T$ is a unit vector in $T_{ft}$, and $t_{ft}$ contains the coordinates of the contact point.
[0138] Here, x, y, and z are coordinates in the world coordinate system.
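The joint position in [0136] can be sketched as below, with $R_{ft}$ passed in as its three columns. The text does not say how $z$ is obtained from $x$ and $y$. The sketch therefore makes a hypothetical choice: $z = \sqrt{1 - x^2 - y^2} \ge 0$, with $(x, y)$ rescaled onto the unit circle when $x^2 + y^2 > 1$. Under that choice the joint always lies at distance $\lVert v_{finger} \rVert$ from $t_{ft}$. The function names are illustrative only.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def unit_from_projection(x: float, y: float) -> Vec:
    """Complete (x, y) to a unit vector [x, y, z]^T in T_ft.

    Hypothetical choice: z = sqrt(1 - x^2 - y^2) >= 0, with (x, y) rescaled
    onto the unit circle when x^2 + y^2 > 1. The text only states that
    x and y lie in [-1, 1].
    """
    s = x * x + y * y
    if s > 1.0:
        k = 1.0 / math.sqrt(s)
        return (x * k, y * k, 0.0)
    return (x, y, math.sqrt(1.0 - s))


def joint_position(finger_len: float, r_ft_cols: Sequence[Vec],
                   t_ft: Vec, x: float, y: float) -> Vec:
    """t_oj = ||v_finger|| * (R_ft . [x, y, z]^T) + t_ft  ([0136])."""
    u = unit_from_projection(x, y)
    # R_ft . u equals u[0]*col0 + u[1]*col1 + u[2]*col2 (columns of R_ft)
    rotated = [sum(u[j] * r_ft_cols[j][i] for j in range(3)) for i in range(3)]
    return (finger_len * rotated[0] + t_ft[0],
            finger_len * rotated[1] + t_ft[1],
            finger_len * rotated[2] + t_ft[2])
```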
[0139] Again, built on t oj rotation matrix R oj .
[0140] R oj =[v finger ,R 2 ,v finger ×v z ]·R0;
[0141] Among them, R 2 =v finger ×[v finger ×v z ], v z It is a vector along the z-axis, and R0 is a fixed rotation matrix. oj It can be generated by R oj and t oj Specifically, the following is obtained:
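A sketch of [0140] and [0141] that assembles $T_{oj}$ as a 4x4 homogeneous matrix is shown below. It assumes that the columns are normalized and that $R_0$ is the identity; the actual $R_0$ is not given. Take the columns exactly in the printed order $[v_{finger}, R_2, v_{finger} \times v_z]$. They are mutually orthogonal, but they form a left-handed triad with determinant -1. A proper rotation therefore presumably relies on $R_0$ or on the sign convention of $R_2$. The sketch only reports the determinant and does not change the formula.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    norm = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    if norm < 1e-9:
        raise ValueError("zero vector (e.g. v_finger parallel to v_z)")
    return (a[0] / norm, a[1] / norm, a[2] / norm)


def joint_transform(v_finger: Vec, t_oj: Vec,
                    v_z: Vec = (0.0, 0.0, 1.0)) -> List[List[float]]:
    """Build T_oj = [[R_oj, t_oj], [0, 1]] from [0140]-[0141].

    R_oj = [v_finger, R_2, v_finger x v_z] . R_0 with
    R_2 = v_finger x (v_finger x v_z). Assumptions: columns are normalized
    and R_0 is the identity.
    """
    c0 = normalize(v_finger)
    c1 = normalize(cross(v_finger, cross(v_finger, v_z)))  # R_2
    c2 = normalize(cross(v_finger, v_z))
    cols = (c0, c1, c2)
    rows = [[cols[0][i], cols[1][i], cols[2][i], t_oj[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def det3(c0: Vec, c1: Vec, c2: Vec) -> float:
    """Determinant of the 3x3 matrix with columns c0, c1, c2: c0 . (c1 x c2)."""
    k = cross(c1, c2)
    return c0[0] * k[0] + c0[1] * k[1] + c0[2] * k[2]


T = joint_transform((1.0, 0.0, 0.0), (0.1, 0.2, 0.3))
rot_cols = [(T[0][j], T[1][j], T[2][j]) for j in range(3)]
print(round(det3(*rot_cols), 6))  # -1.0 with the column order as written
```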
[0142] The final grasping posture $T_{pose}$ is then obtained from its relationship to $T_{oj}$:
[0143] $T_{pose} = T_{oj} \cdot T^{-1}$
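The relation in [0143] needs only a 4x4 product and the inverse of a rigid transform. For a rigid transform, $[[R, p], [0, 1]]^{-1} = [[R^T, -R^T p], [0, 1]]$. Paragraph [0149] says $T$ comes from the gripper's own parameters. The 5 cm offset along z in the example is therefore a hypothetical stand-in, and the helper names are illustrative.

```python
from typing import List

Mat4 = List[List[float]]


def translation(x: float, y: float, z: float) -> Mat4:
    """Pure translation as a 4x4 homogeneous transform (example helper)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def matmul4(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t: Mat4) -> Mat4:
    """Inverse of a rigid transform [[R, p], [0, 1]] is [[R^T, -R^T p], [0, 1]]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [q[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def grasp_pose(t_oj: Mat4, t_gripper: Mat4) -> Mat4:
    """T_pose = T_oj . T^{-1} ([0143]); t_gripper stands for the fixed T."""
    return matmul4(t_oj, invert_rigid(t_gripper))


# Hypothetical gripper constant: joint frame 5 cm along z from the pose frame.
pose = grasp_pose(translation(0.1, 0.2, 0.3), translation(0.0, 0.0, 0.05))
```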
[0144] In this way, the grasp representation is reduced from a 10-dimensional grasping space to a 6-dimensional grasping space.
[0145] In one of the related embodiments, a three-fingered robot is taken as an example, and a high-quality grasp is represented as $G = \{p, q\}$. Here $p$ is the robot's grasping posture, with 6 dimensions. $q$ represents the robot's joint degrees of freedom, $q = \{\theta_0, \theta_1, \theta_2, \theta_3\}$, with 4 dimensions: $\theta_0$ is the translational joint degree of freedom, and $\theta_1$, $\theta_2$, $\theta_3$ are the finger joint degrees of freedom. $\theta_{11}$, $\theta_{21}$, and $\theta_{31}$ are the degrees of freedom of the joints connected to the fingers and are the same as $\theta_1$, $\theta_2$, and $\theta_3$.
[0146] Through the transformation of the grasp representation described above in this application, the high-quality grasp $G$ can be reduced from 10 dimensions to 6 dimensions. Its grasping space is expressed as follows:
[0147] $G = \{x, y, \theta_{ms}, \theta_m, \theta_{s1}, \theta_{s2}\}$
[0148] where x and y are the mappings of the finger vector in the first coordinate system. $\theta_{ms}$ and $\theta_m$ are the main joint degrees of freedom, namely the translational joint angle and the joint angle of the finger that contacts a specified point on the object. $\theta_{s1}$ and $\theta_{s2}$ are the non-main joint degrees of freedom, namely the joint angles of the other two fingers.
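The two grasp representations in [0145] to [0148] can be written down as small data structures, which makes the 10-to-6 dimension count explicit. The class and field names are hypothetical. The layout of the 6-D pose $p$ is not specified in the text and is left as an opaque 6-tuple.

```python
from dataclasses import astuple, dataclass, fields
from typing import Tuple


@dataclass
class Grasp10D:
    """Related-embodiment grasp G = {p, q}: 6-D pose p plus 4 joint values q."""
    pose: Tuple[float, float, float, float, float, float]  # p (layout unspecified)
    q: Tuple[float, float, float, float]                   # theta_0 .. theta_3

    def dim(self) -> int:
        return len(self.pose) + len(self.q)


@dataclass
class Grasp6D:
    """Reduced grasp G = {x, y, theta_ms, theta_m, theta_s1, theta_s2} ([0147])."""
    x: float         # finger-vector mapping in the first coordinate system
    y: float
    theta_ms: float  # main joint: translational joint
    theta_m: float   # main joint: finger touching the specified contact point
    theta_s1: float  # joint angles of the other two fingers
    theta_s2: float

    def as_vector(self) -> Tuple[float, ...]:
        return astuple(self)

    def dim(self) -> int:
        return len(fields(self))
```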
[0149] In the above embodiments, T is determined based on the mechanical gripper's own parameters, which will not be elaborated further here.
[0150] In the above embodiments, the prediction unit 504 is further configured to: construct a training dataset; and train the grasping network on the training dataset until the loss function corresponding to the grasping network converges, thereby obtaining the preset grasping network.
[0151] In this embodiment, the contact point information output by the preset grasping network is expressed with 6-dimensional data. The network must therefore be trained to ensure the accuracy of the output contact points.
[0152] In one embodiment, the coordinate information of the contact point is expressed as follows:
[0153] $G = \{x, y, \theta_{ms}, \theta_m, \theta_{s1}, \theta_{s2}\}$
[0154] Where G represents the coordinate information of the contact point.
[0155] The loss function of the grasping network is expressed as follows:
[0156]
[0157] where:
[0158]
[0159] where the terms denote, in order: the loss function of the grasping network; the loss function for grasp-point classification prediction; the loss function for x and y; the loss function for $\theta_m$; the loss function for $\theta_{ms}$; and the loss function for $\theta_{s1}$ and $\theta_{s2}$. One term is the weighted cumulative sum of two of the others, and α, β, γ, γ1, γ2, and γ3 are constants.
[0160] The loss function of the grasping network is obtained by summing several loss terms. Specifically, the grasping network uses the Grasp Point Segmentation module to classify each point in the point cloud image as a grasping point or not.
[0161] Based on this, the loss formula corresponding to the Grasp Point Segmentation module is expressed as follows:
[0162]
[0163] where the loss is the cross-entropy loss, $c_p$ is the prediction of the segmentation head, obtained from PointNet++, for each grasp sample, and $c_g$ is the corresponding label.
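The segmentation loss in [0163] is a cross-entropy between the predicted grasp-point score $c_p$ and the label $c_g$. A minimal per-point sketch follows. It assumes binary labels (grasp point or not), probabilities rather than logits, and a mean over points. None of these details is specified in the text, and the function name is hypothetical.

```python
import math
from typing import Sequence


def grasp_point_seg_loss(c_p: Sequence[float], c_g: Sequence[int],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between grasp probabilities c_p and labels c_g."""
    if len(c_p) != len(c_g) or not c_p:
        raise ValueError("c_p and c_g must be non-empty and equally long")
    total = 0.0
    for p, g in zip(c_p, c_g):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(p) + (1 - g) * math.log(1.0 - p))
    return total / len(c_p)
```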
[0164] For $x$, $y$, $\theta_{ms}$, and $\theta_m$ above, the Main Joint Prediction module predicts $x$, $y$, $\theta_{ms}$, and $\theta_m$ to obtain the initial grasping posture. The loss for x and y is expressed as follows:
[0165]
[0166] where $N_c$ is the total number of contact points. The predicted term is the finger projection value predicted at the contact point, and the ground-truth term is the actual finger projection value at the contact point. $\delta_c$ is 1 when x and y correspond to a contact point and 0 otherwise.
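The masked x, y loss in [0166] averages a per-point error over the $N_c$ contact points selected by $\delta_c$. The equation itself is not reproduced here. The sketch below assumes a squared Euclidean error, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def xy_loss(pred: Sequence[Tuple[float, float]],
            target: Sequence[Tuple[float, float]],
            delta: Sequence[int]) -> float:
    """Masked regression loss for the finger projection (x, y).

    Sums delta_c * ||pred_c - target_c||^2 and divides by N_c, the number of
    contact points (the sum of delta). The squared error is an assumption.
    """
    n_c = sum(delta)
    if n_c == 0:
        return 0.0
    total = 0.0
    for (px, py), (tx, ty), d in zip(pred, target, delta):
        total += d * ((px - tx) ** 2 + (py - ty) ** 2)
    return total / n_c
```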
[0167] The loss for $\theta_m$ is:
[0168]
[0169] where the ground-truth terms are the true category and residual of $\theta_m$ for a given grasp $p$, and the remaining terms are the corresponding predicted values.
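The "category and residual" wording in [0169] describes bin-based angle prediction: the angle range is split into bins, the network classifies the bin, and it regresses a residual inside that bin. The sketch below shows one common way to do this. The bin count, angle range, residual normalization by bin width, and smooth-L1 residual loss are all assumptions. The same form would apply to $\theta_{ms}$, $\theta_{s1}$, and $\theta_{s2}$.

```python
import math
from typing import List, Tuple


def encode_angle(theta: float, lo: float, hi: float, n_bins: int) -> Tuple[int, float]:
    """Map theta in [lo, hi] to (bin index, residual from bin centre / bin width)."""
    width = (hi - lo) / n_bins
    k = max(0, min(int((theta - lo) // width), n_bins - 1))
    centre = lo + (k + 0.5) * width
    return k, (theta - centre) / width


def decode_angle(k: int, residual: float, lo: float, hi: float, n_bins: int) -> float:
    width = (hi - lo) / n_bins
    return lo + (k + 0.5) * width + residual * width


def smooth_l1(x: float, beta: float = 1.0) -> float:
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def bin_residual_loss(logits: List[float], res_pred: List[float],
                      k_true: int, r_true: float) -> float:
    """Cross-entropy over bins plus smooth-L1 on the true bin's residual."""
    m = max(logits)  # log-sum-exp shift for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    ce = log_z - logits[k_true]
    return ce + smooth_l1(res_pred[k_true] - r_true)
```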
[0170] The loss for $\theta_{ms}$ is:
[0171]
[0172] where the ground-truth terms are the true category and residual of $\theta_{ms}$ for a given grasp $p$, and the remaining terms are the corresponding predicted values.
[0173] Furthermore, the Supporting Joint Prediction module predicts the remaining grasping joints $\theta_{s1}$ and $\theta_{s2}$. The corresponding loss is expressed as follows:
[0174]
[0175] where the ground-truth terms are the true categories and residuals of $\theta_{s1}$ and $\theta_{s2}$ for a given grasp $p$, and the remaining terms are the corresponding predicted values.
[0176] Based on the above, the loss function of the grasping network is expressed as follows:
[0177]
[0178] where:
[0179]
[0180] In practice, α, β, γ, γ1, γ2, and γ3 are chosen as needed. In the embodiments of this application, α = 1, β = γ = 5, and γ1 = γ2 = γ3 = 1.
[0181] A loss value is computed with the loss function of the grasping network. The parameters of the Grasp Point Segmentation, Main Joint Prediction, and Supporting Joint Prediction modules described above are then updated based on the loss value to obtain the preset grasping network.
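A weighted total loss using the constants stated in [0180] can be sketched as below. The combined equation is not reproduced in this text, so the grouping is an assumption. The three joint losses are first combined with γ1, γ2, γ3 into one joint term. That joint term is then weighted by γ, alongside α for segmentation and β for the x, y loss.

```python
def total_grasp_loss(l_seg: float, l_xy: float, l_theta_m: float,
                     l_theta_ms: float, l_s: float,
                     alpha: float = 1.0, beta: float = 5.0, gamma: float = 5.0,
                     gamma1: float = 1.0, gamma2: float = 1.0,
                     gamma3: float = 1.0) -> float:
    """Assumed grouping: alpha*L_seg + beta*L_xy + gamma*(g1*L_m + g2*L_ms + g3*L_s)."""
    l_joint = gamma1 * l_theta_m + gamma2 * l_theta_ms + gamma3 * l_s
    return alpha * l_seg + beta * l_xy + gamma * l_joint
```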
[0182] In the above embodiments, the prediction unit 504 is further configured to: determine the model of the object to be grasped; generate grasping postures and corresponding joint parameters in the coordinate system of the object model; determine a grasp evaluation value from each grasping posture and its joint parameters; and record the grasping postures and joint parameters whose grasp evaluation value is greater than a preset evaluation value.
[0183] In this embodiment, the models of the objects to be grasped can be obtained from an open-source object database. The objects can be of suitable size and varied shape, and can be selected from 80 everyday objects.
[0184] Grasping postures and corresponding joint parameters are generated for each object in its own object coordinate system. The mesh model of the object is first downsampled so that the object point cloud is uniformly distributed, and the points of the object point cloud are then sampled uniformly. For each sampled point, a three-dimensional search over S1×S2×S3 is performed. S1 is the approach depth of the gripper (i.e., the mechanical gripper), S2 is the rotation angle of the gripper about the normal direction, and S3 is the angle of the translational joint. Each object is thereby sampled uniformly.
[0185] These three quantities are randomly sampled and combined within reasonable ranges. The finger joints of each generated grasp then close automatically and stop when each finger collides with the object. The grasp evaluation value can be computed with an existing tool such as GraspIt!, a grasp simulator that evaluates grasp quality.
[0186] Using the preset evaluation value as a filtering criterion, higher-quality samples can be selected.
[0187] After the high-quality samples are selected, parameters such as the contact points and normal vectors between the mechanical gripper and the object are calculated or obtained, and the training dataset is constructed from them.
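The sampling-and-filtering procedure in [0184] to [0186] can be sketched as follows. For each sampled surface point, random (S1, S2, S3) combinations are drawn, and only candidates whose evaluation value exceeds a preset threshold are kept. The value ranges, class names, and the `score` callable are hypothetical. In the described pipeline the score would come from a grasp simulator such as GraspIt! after the fingers close, and that step is not modelled here.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class GraspCandidate:
    point_index: int  # index of the sampled surface point
    depth: float      # S1: approach depth of the gripper
    rotation: float   # S2: rotation about the surface normal (rad)
    slide: float      # S3: translational-joint value


def sample_candidates(n_points: int, per_point: int,
                      depth_range: Tuple[float, float] = (0.0, 0.04),
                      slide_range: Tuple[float, float] = (0.0, 1.0),
                      seed: int = 0) -> List[GraspCandidate]:
    """Randomly sample (S1, S2, S3) combinations for every sampled surface point."""
    rng = random.Random(seed)
    out = []
    for i in range(n_points):
        for _ in range(per_point):
            out.append(GraspCandidate(i,
                                      rng.uniform(*depth_range),
                                      rng.uniform(0.0, 2.0 * math.pi),
                                      rng.uniform(*slide_range)))
    return out


def keep_high_quality(candidates: List[GraspCandidate],
                      score: Callable[[GraspCandidate], float],
                      threshold: float) -> List[Tuple[GraspCandidate, float]]:
    """Record only grasps whose evaluation value exceeds the preset threshold."""
    kept = []
    for c in candidates:
        s = score(c)
        if s > threshold:
            kept.append((c, s))
    return kept
```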
[0188] In one embodiment, the fingertip has a contact surface that contacts the object being grasped, and the contact surface is clustered based on a clustering algorithm to obtain contact points.
[0189] In one embodiment, the actual contact is a contact surface rather than a single point. When the contact point is calculated, the k-means algorithm is therefore used to cluster the contact surface into a single contact point, which represents the contact between the object to be grasped and the robotic gripper.
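Clustering the contact surface with k-means, as in [0189], can be sketched with plain Lloyd iterations. With a single cluster (k = 1), k-means reduces to the centroid of the contact patch. The function name, seeding, and iteration cap are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def kmeans(points: Sequence[Point], k: int = 1, iters: int = 20,
           seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means; with k = 1 it returns the centroid of the patch."""
    if k < 1 or len(points) < k:
        raise ValueError("need 1 <= k <= number of points")
    rng = random.Random(seed)
    centres = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # assign each surface point to its nearest centre
            j = min(range(k),
                    key=lambda c: sum((p[d] - centres[c][d]) ** 2 for d in range(3)))
            groups[j].append(p)
        new = []
        for j, g in enumerate(groups):
            if g:
                new.append(tuple(sum(q[d] for q in g) / len(g) for d in range(3)))
            else:
                new.append(centres[j])  # keep the old centre of an empty cluster
        if new == centres:
            break
        centres = new
    return centres
```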
[0190] Furthermore, a random number of these objects are placed on a table in random orientations to construct a scene. Each previously generated grasp $G_o$ in the object coordinate system is transformed into the scene's world coordinate system as $G_w$:
[0191] $G_w = P \cdot G_o$
[0192] Here, P represents the object's 6D pose in the world coordinate system. The entire grasping process is then simulated in a pybullet-based simulation environment, and all collision-free, valid grasps are selected and recorded to construct a scene grasping dataset. Finally, the scene is photographed from an arbitrary viewpoint above the tabletop. The grasps $G_w$ from the scene dataset are converted into grasps $G_{cam}$ in the camera coordinate system and recorded, which yields the point cloud image.
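The frame changes in [0190] to [0192] are products of 4x4 homogeneous transforms. A sketch follows. The camera convention is an assumption: $T_{cam}$ is taken as the camera pose in the world frame, so $G_{cam} = T_{cam}^{-1} \cdot G_w$. The function names are illustrative.

```python
from typing import List

Mat4 = List[List[float]]


def translation(x: float, y: float, z: float) -> Mat4:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def matmul4(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(t: Mat4) -> Mat4:
    """[[R, p], [0, 1]]^-1 = [[R^T, -R^T p], [0, 1]]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    q = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [q[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def to_world(p_obj: Mat4, g_obj: Mat4) -> Mat4:
    """G_w = P . G_o, with P the object's 6D pose in the world ([0191])."""
    return matmul4(p_obj, g_obj)


def to_camera(t_world_cam: Mat4, g_world: Mat4) -> Mat4:
    """G_cam = T_cam^-1 . G_w (assumed convention: T_cam is the camera pose)."""
    return matmul4(invert_rigid(t_world_cam), g_world)
```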
[0193] In one embodiment, as shown in Figure 6, the present invention provides another robot control device 600, including a controller 602 and a memory 604. The memory 604 stores a program or instructions, and the controller 602 implements the steps of any of the methods described above when executing the program or instructions in the memory 604.
[0194] The memory 604 can be used to store software programs and various data. The memory may primarily include a first storage area for storing programs or instructions and a second storage area for storing data. The first storage area may store the operating system, application programs or instructions required for at least one function (such as sound playback, image playback, etc.). Furthermore, the memory may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. Volatile memory can be random access memory (RAM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), or direct Rambus random access memory (DRRAM). The memory in the embodiments of this application includes, but is not limited to, these and any other suitable types of memory.
[0195] In one embodiment, the present invention provides a readable storage medium on which a program or instructions are stored, which, when executed by a processor, implement the steps of any of the methods described above.
[0196] In one embodiment, the present invention provides a robot, including: a robot control device as described in any of the above; and / or a readable storage medium as described above.
[0197] In one embodiment, the present invention provides another robot comprising: a mechanical claw having joints and fingertips; and a processor that implements the steps of any of the methods described above.
[0198] In the above embodiments, there are multiple joints and multiple fingertips, with each joint corresponding to one of the multiple fingertips.
[0199] In the above embodiment, the number of fingertips is three.
[0200] The terms "first" and "second" in the specification and claims of this application may explicitly or implicitly include one or more of the features. In the textual description of this invention, unless otherwise stated, "a plurality of" means two or more. Furthermore, in the specification and claims, "and / or" indicates at least one of the connected objects, and the character " / " generally indicates that the preceding and following objects are in an "or" relationship.
[0201] In the textual description of this invention, it is understood that the terms "center," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," and "circumferential" indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are only for the convenience of describing and simplifying the embodiments of this invention, and do not indicate or imply that the structures, devices, or elements referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, these descriptions should not be construed as limiting the invention.
[0202] In the textual description of this invention, it is understood that, unless explicitly specified and limited, the terms "installation," "connection," and "linking" should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal communication between two components. Those skilled in the art will understand the specific meaning of the above terms in this invention based on the specific circumstances.
[0203] In the claims, description, and accompanying drawings of this invention, the term "plural" refers to two or more. Unless otherwise explicitly defined, the terms "upper," "lower," etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, and are only for the convenience of describing the invention and simplifying the description process, not to indicate or imply that the device or element referred to must have the described specific orientation, or be constructed and operated in a specific orientation. Therefore, these descriptions should not be construed as limiting the invention. The terms "connect," "install," "fix," etc., should be interpreted broadly. For example, "connect" can be a fixed connection between multiple objects, a detachable connection between multiple objects, or an integral connection; it can be a direct connection between multiple objects or an indirect connection between multiple objects through an intermediate medium. For those skilled in the art, the specific meaning of the above terms in this invention can be understood based on the specific circumstances described above.
[0204] In the claims, description, and accompanying drawings of this invention, the terms "one embodiment," "some embodiments," "specific embodiment," etc., refer to a specific feature, structure, material, or characteristic described in connection with that embodiment or example, which is included in at least one embodiment or example of the invention. In the claims, description, and accompanying drawings of this invention, illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0205] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for controlling a robot, characterized in that, The robot includes a mechanical gripper having joints and fingertips, and the control method includes: Obtain a point cloud image containing the object to be grasped; Input the point cloud image into a preset grasping network, so that the preset grasping network performs classification and prediction based on the point cloud image to obtain the contact point between the fingertip and the grasped object; Based on the coordinate information of the contact point, construct a first rotation matrix of the coordinate system with the contact point as the origin relative to the world coordinate system; Determine the coordinate information of the joint based on the coordinate information of the contact point, the vector between the joint and the fingertip, and the first rotation matrix; Based on the coordinate information of the joint, determine the transition matrix between the coordinate system with the joint as the origin and the world coordinate system; Control the robot to work according to the grasping posture corresponding to the transition matrix; wherein the preset grasping network classifies and predicts whether each point in the point cloud image is a grasping point.
2. The robot control method according to claim 1, characterized in that, The fingertip has a contact surface that contacts the object being grasped. The contact surface is clustered based on a clustering algorithm to obtain the contact point.
3. The robot control method according to claim 1, characterized in that, The step of determining the transition matrix between the coordinate system with the joint as the origin and the world coordinate system based on the joint's coordinate information specifically includes: Based on the coordinate information of the joint, a second rotation matrix is constructed with the coordinate system of the joint as the origin relative to the world coordinate system; Based on the coordinate information of the joint and the second rotation matrix, determine the transition matrix between the coordinate system with the joint as the origin and the world coordinate system.
4. The robot control method according to claim 1, characterized in that, The step of constructing a first rotation matrix relative to the world coordinate system with the contact point as the origin, based on the coordinate information of the contact point, specifically includes: Based on the coordinate information of the contact point, a first coordinate system is constructed, wherein the direction of a certain coordinate axis of the first coordinate system is the same as the normal vector of the fingertip; The first rotation matrix is determined based on the first coordinate system.
5. The robot control method according to any one of claims 1 to 4, characterized in that, further including: Build the training dataset; Train the grasping network based on the training dataset until the loss function corresponding to the grasping network converges, thus obtaining the preset grasping network.
6. The robot control method according to claim 5, characterized in that, The construction of the training dataset specifically includes: Determine the model of the object to be grasped; Based on the coordinate system of the model of the object to be grasped, the grasping posture and corresponding joint parameters are generated; The grasping evaluation value is determined based on the grasping posture and the corresponding joint parameters; Record the grasping posture and corresponding joint parameters when the grasping evaluation value is greater than the preset evaluation value.
7. A control device for a robot, characterized in that, The robot includes a mechanical claw having joints and fingertips, and the control device includes: an acquisition unit, configured to acquire a point cloud image containing the object to be grasped; a prediction unit, configured to input the point cloud image into a preset grasping network, so that the preset grasping network performs classification and prediction based on the point cloud image to obtain the contact point between the fingertip and the grasped object; a construction unit, configured to construct, based on the coordinate information of the contact point, a first rotation matrix of the coordinate system with the contact point as the origin relative to the world coordinate system; a determining unit, configured to determine the coordinate information of the joint based on the coordinate information of the contact point, the vector between the joint and the fingertip, and the first rotation matrix; a processing unit, configured to determine, based on the coordinate information of the joint, the transition matrix between the coordinate system with the joint as the origin and the world coordinate system; and a control unit, configured to control the robot to work according to the grasping posture corresponding to the transition matrix; wherein the preset grasping network classifies and predicts whether each point in the point cloud image is a grasping point.
8. A readable storage medium, characterized in that, The readable storage medium stores a program or instructions that, when executed by a processor, implement the steps of the method as described in any one of claims 1 to 6.
9. A robot, characterized in that, including: The robot control device as described in claim 7; and / or The readable storage medium as described in claim 8.
10. A robot, characterized in that, including: A mechanical gripper having joints and fingertips; A processor that implements the steps of the method as described in any one of claims 1 to 6.
11. The robot according to claim 10, characterized in that, There are multiple joints and multiple fingertips, with each of the multiple joints corresponding to one of the multiple fingertips.
12. The robot according to claim 11, characterized in that, The number of fingertips is three.