Object pose estimation method and apparatus, storage medium, and electronic device

By performing rotation sampling on the 3D object model and target object category matching, the problems of low recall and low efficiency in object pose recognition in existing technologies are solved, and efficient pose estimation in complex environments is achieved.

CN122199657A · Pending · Publication Date: 2026-06-12 · GUANGZHOU SHIYUAN INNOVATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
GUANGZHOU SHIYUAN INNOVATION TECH CO LTD
Filing Date
2024-12-10
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies have low recall and efficiency in object pose recognition in complex environments, especially in scenarios with object occlusion, stacking, and reflection, making accurate grasping difficult.

Method used

3D object models of various categories are established and rotationally sampled to obtain sub-model data. The category of the target object is identified so that pose estimation matching is restricted to the sub-model data of that category, which reduces the number of model matches. Pose estimation is then performed using the anisotropic data obtained from rotational sampling.

Benefits of technology

It improves recall and pose estimation efficiency in complex environments, reduces the workload of model matching, and improves recognition accuracy in complex environments such as stacking.

✦ Generated by Eureka AI based on patent content.


Abstract

Embodiments of the present application provide an object pose estimation method and apparatus, a storage medium, and an electronic device. The method includes: performing rotation sampling on the three-dimensional object model of each category to obtain several sub-model data for each category, where the sub-model data includes sampled point cloud data and a sampled rotation angle; determining, from a target image of a target object, the target point cloud data of the target object, the initial position of the target object, and the target category; obtaining several target sub-model data corresponding to the target category; matching the sampled point cloud data in the several target sub-model data with the target point cloud data to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data; and obtaining a pose estimation result according to the transformation relationship, the initial position of the target object, and the sampled rotation angle corresponding to the matched sampled point cloud data. The embodiments of the present application can improve the efficiency of pose estimation and improve the recall rate.

Description

Technical Field

[0001] This application relates to the field of pose calculation, and in particular to a method, apparatus, storage medium, and electronic device for estimating the pose of an object.

Background Technology

[0002] With the rapid development of industrial automation, robotics technology has been widely applied in various fields such as grasping, assembly, packaging, processing, and logistics sorting. However, a common challenge in these applications is how to accurately grasp objects in complex environments. Especially in scenarios with disordered stacking, robots need to be able to identify different categories of objects and calculate their positions and orientations, which is crucial for improving operational efficiency and accuracy.

[0003] In related technologies, 2D or 3D vision systems are used for pose recognition. 2D vision systems perform well in object recognition tasks and can effectively solve the recognition problem, but they cannot provide object pose information and cannot accurately grasp objects. While 3D vision systems can recognize objects and estimate their poses, providing more information than 2D vision, their recall and efficiency cannot be guaranteed in complex scenarios such as object occlusion, stacking, and reflections.

Summary of the Invention

[0004] To overcome the problems existing in related technologies, this application provides an object pose estimation method, apparatus, storage medium, and electronic device, which can improve recall and estimation efficiency.

[0005] According to a first aspect of the embodiments of this application, an object pose estimation method is provided, comprising the following steps:

[0006] Acquire three-dimensional object models of several categories; perform rotation sampling on the three-dimensional object models of each category to obtain several sub-model data of each category; the sub-model data includes sampled point cloud data and sampled rotation angle;

[0007] Acquire a target image of the target object; based on the target image, determine the target point cloud data of the target object, the initial position of the target object, and the target category;

[0008] Based on the target category, obtain corresponding target sub-model data;

[0009] The target point cloud data is matched with sampled point cloud data in several target sub-model data to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data.

[0010] Based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampling point cloud data, the pose estimation result is obtained.

[0011] According to a second aspect of the embodiments of this application, an object pose estimation apparatus is provided, comprising:

[0012] The sub-model data acquisition module is used to acquire three-dimensional object models of several categories; to perform rotation sampling on the three-dimensional object models of each category to obtain several sub-model data of each category; the sub-model data includes sampled point cloud data and sampled rotation angle;

[0013] The target image acquisition module is used to acquire a target image of a target object; and based on the target image, to determine the target point cloud data of the target object, the initial position of the target object, and the target category.

[0014] The target data acquisition module is used to obtain several target sub-model data according to the target category;

[0015] The transformation relationship determination module is used to match the target point cloud data with sampled point cloud data in several target sub-model data, and determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data.

[0016] The pose estimation module is used to obtain the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampling point cloud data.

[0017] According to a third aspect of the embodiments of this application, an electronic device is provided, including a processor and a memory; the memory stores a computer program adapted to be loaded by the processor to execute the object pose estimation method described above.

[0018] According to a fourth aspect of the embodiments of this application, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the object pose estimation method as described above.

[0019] This application embodiment acquires several categories of 3D object models; performs rotation sampling on the 3D object models of each category to obtain several sub-model data for each category; the sub-model data includes sampled point cloud data and sampling rotation angle; acquires a target image of the target object; determines the target point cloud data, the initial position of the target object, and the target category based on the target image; obtains several corresponding target sub-model data based on the target category; matches the target point cloud data with the sampled point cloud data in the several target sub-model data to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data; obtains a pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampled point cloud data. Furthermore, by identifying the category of the target object and performing pose estimation matching on the sub-model data of the 3D object model of the target object category after rotation sampling, the number of model matching requirements is reduced, the workload of model matching is decreased, and the efficiency of pose estimation is improved. Simultaneously, performing pose estimation based on sub-model data with anisotropic data after rotation sampling can improve the recall rate in complex environments such as stacking.

[0020] It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this application.

[0021] To better understand and implement this invention, the following detailed description is provided in conjunction with the accompanying drawings.

Attached Figure Description

[0022] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0023] Figure 1 is a flowchart illustrating an object pose estimation method according to one embodiment of this application;

[0024] Figure 2 is a flowchart illustrating a method for rotational sampling of a three-dimensional object model according to one embodiment of this application;

[0025] Figure 3 is a flowchart illustrating a method for determining the target point cloud data, the initial position of the target object, and the target category according to one embodiment of this application;

[0026] Figure 4 is a flowchart illustrating a method for determining a matching transformation relationship according to one embodiment of this application;

[0027] Figure 5 is a flowchart illustrating a method for determining the transformation relationship of the initial registration according to one embodiment of this application;

[0028] Figure 6 is a flowchart illustrating a method for determining the pose estimation result according to one embodiment of this application;

[0029] Figure 7 is a schematic block diagram of an object pose estimation device according to one embodiment of this application;

[0030] Figure 8 is a schematic diagram of the structure of an electronic device according to one embodiment of this application.

Detailed Implementation

[0031] To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application will be described in further detail below with reference to the accompanying drawings. Wherein, when the following description relates to the drawings, unless otherwise indicated, the same numbers in different drawings represent the same or similar elements.

[0032] It should be understood that the embodiments described below do not represent all embodiments consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without inventive effort are within the scope of protection of this application.

[0033] The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The singular forms "a" and "the" as used herein are also intended to include the plural forms unless the context clearly indicates otherwise. Furthermore, in the description of this application, unless otherwise stated, "a plurality of" means two or more. It should also be understood that the term "and / or" as used herein refers to and includes any or all possible combinations of one or more associated listed items, for example, A and / or B, which can represent: A alone, A and B together, and B alone; the character " / " generally indicates that the preceding and following objects are in an "or" relationship.

[0034] It should be understood that although the terms first, second, third, etc., may be used in this application to describe various information, this information should not be limited to these terms. These terms are only used to distinguish similar objects, are not necessarily used to describe a specific order or sequence, and should not be construed as indicating or implying relative importance. Those skilled in the art can understand the specific meaning of the above terms in this application according to the specific circumstances. Depending on the context, the word "if" as used in this application can be interpreted as "upon," "when," or "in response to determining."

[0035] In scenarios with disordered stacking, robots need to be able to identify different categories of objects and calculate their positions and poses, which is crucial for improving operational efficiency and accuracy. Related technologies employ either 2D or 3D vision systems for pose recognition. 2D vision systems perform well in object recognition tasks, effectively solving the recognition problem, but they cannot provide object pose information, let alone accurately grasp objects. While 3D vision systems can identify objects and estimate their poses, providing more information than 2D vision, their recall and efficiency are relatively low in complex scenarios such as object occlusion, stacking, and reflections.

[0036] This application establishes 3D object models for various categories, rotates and samples the 3D object models to obtain sub-model data, and then identifies the category of the target object, performs pose estimation matching on the sub-model data corresponding to the category of the target object, reducing the number of model matching and improving the efficiency of pose estimation. At the same time, since the sub-model data after rotation sampling is anisotropic, using it for pose estimation can improve the recall rate in complex environments such as stacking.

[0037] The application scenarios of this application include robots; the robot includes a camera component, a motion component, a robotic arm assembly, and a host. The object pose estimation method in this application embodiment is executed by the host, which can be built into the robot or placed externally, connected to the camera component, motion component, and robotic arm assembly via wired or wireless means. Specifically, the camera component captures an image of the target object and sends the image to the host; the host performs recognition and pose estimation on the target object image to obtain a pose estimation result; based on the pose estimation result, the host controls the motion component to move the robotic arm assembly to an appropriate position so that the robotic arm assembly can reach the target object, thereby controlling the movement of the robotic arm assembly to grasp or manipulate the target object.

[0038] The object pose estimation method provided in the embodiments of this application is described in detail below with reference to Figures 1 to 6.

[0039] Referring to Figure 1, the object pose estimation method provided in this application includes the following steps:

[0040] Step S101: Obtain three-dimensional object models of several categories; perform rotation sampling on the three-dimensional object models of each category to obtain several sub-model data of each category; the sub-model data includes sampled point cloud data and sampled rotation angle.

[0041] In one embodiment, three-dimensional object models of several categories are created using 3D modeling software. In another embodiment, ready-made three-dimensional object models of several categories are downloaded from an online database or model library. In yet another embodiment, three-dimensional object models are obtained by scanning objects of several categories from the real world using 3D scanning technology.

[0042] In one embodiment, a three-dimensional object model is placed in a virtual modeling environment. A spatial rectangular coordinate system is established with the center of the three-dimensional object model as the origin. The three-dimensional object model is rotated and sampled with the X, Y, or Z axis as the rotation axis and the preset rotation angle as the sampling step size. After each rotation, the point cloud data of the three-dimensional object model under the current view is obtained as the sampled point cloud data, and the corresponding rotation angle is used as the sampled rotation angle.
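
The rotation sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the model is assumed to be already available as an N x 3 NumPy point array, the function names `rotation_matrix` and `rotation_sample` and the dictionary keys are hypothetical, and the sketch keeps every point after each rotation instead of retaining only the surface visible under the current view.

```python
import numpy as np

def rotation_matrix(axis: str, angle_deg: float) -> np.ndarray:
    """Return the 3x3 matrix for a rotation about the X, Y, or Z axis."""
    t = np.radians(angle_deg)
    c, s = np.cos(t), np.sin(t)
    if axis == "x":
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == "y":
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    if axis == "z":
        return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
    raise ValueError(f"unknown axis: {axis!r}")

def rotation_sample(model_points: np.ndarray, axis: str = "z", step_deg: float = 30.0):
    """Rotate a model point cloud in fixed angular steps and collect sub-model data."""
    # Place the origin of the coordinate system at the center of the model.
    centered = model_points - model_points.mean(axis=0)
    sub_models = []
    for angle in np.arange(0.0, 360.0, step_deg):
        rotated = centered @ rotation_matrix(axis, angle).T
        # Each sub-model pairs the sampled points with the sampled rotation angle.
        sub_models.append({"points": rotated, "angle_deg": float(angle), "axis": axis})
    return sub_models
```

With a 30 degree step this yields 12 sub-models per axis; a smaller step gives finer angular resolution at the cost of more candidates to match later.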

[0043] In another embodiment, a three-dimensional object model is placed in a virtual modeling environment, and a spatial rectangular coordinate system is established with the center of the three-dimensional object model as the origin. A virtual observation camera is set up, and the virtual observation camera rotates around the X, Y, or Z axis with a preset rotation step size. After each rotation, the point cloud data of the three-dimensional object model as observed by the virtual observation camera is used as the sampled point cloud data, and the angle between the virtual observation camera and the three-dimensional object model is used as the sampled rotation angle.

[0044] In one embodiment, the robot executes step S101 each time it performs object pose estimation to obtain several sub-model data for each category. In another embodiment, to improve the efficiency of object pose estimation, when the robot performs object pose estimation for the first time, it executes step S101 to obtain several sub-model data for each category, and then stores the several sub-model data for each category. In subsequent object pose estimations, step S101 is not executed again; instead, the stored sub-model data is directly retrieved.

[0045] Step S102: Obtain the target image of the target object; based on the target image, determine the target point cloud data of the target object, the initial position of the target object, and the target category.

[0046] In one embodiment, a target image of the target object is obtained by taking a picture of the target object using a camera component on the robot.

[0047] In one embodiment, point cloud data of the target object is extracted from the target image using 3D reconstruction technology, and the initial position of the target object is obtained by locating the target object; the target image is then input into a classification model to obtain the target category to which the object belongs.

[0048] Step S103: Obtain several target sub-model data according to the target category.

[0049] It is understood that the mapping relationship between each category and the corresponding sub-model data has been stored in step S101. Therefore, in one embodiment, several sub-model data corresponding to the target category where the target object is located are used as several target sub-model data.
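
The stored mapping between categories and sub-model data, together with the option of executing step S101 only once, can be sketched as a small lookup structure. The class name `SubModelStore`, its methods, and the `sampler` callback are hypothetical names introduced for illustration; how the sub-model data is actually persisted is not specified in this application.

```python
from typing import Any, Callable, Dict, List, Optional

class SubModelStore:
    """Builds sub-model data once per run and serves it by category afterwards."""

    def __init__(self, sampler: Callable[[Any], List[Any]]):
        self._sampler = sampler  # e.g. a rotation sampling function (assumed)
        self._by_category: Optional[Dict[str, List[Any]]] = None

    def ensure_built(self, models: Dict[str, Any]) -> None:
        """Run the sampling step only if it has not been run before."""
        if self._by_category is None:
            self._by_category = {cat: self._sampler(m) for cat, m in models.items()}

    def target_sub_models(self, category: str) -> List[Any]:
        """Return the sub-model data stored for the recognized target category."""
        if self._by_category is None:
            raise RuntimeError("call ensure_built() before looking up a category")
        return self._by_category[category]
```

Only the sub-models of the recognized category are returned, so the later matching step never touches the sub-models of other categories.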

[0050] Step S104: Match the target point cloud data with the sampled point cloud data in several target sub-model data to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data.

[0051] In one embodiment, feature extraction is performed on the target point cloud data and the sampled point cloud data in several target sub-model data respectively. The extracted features are compared to determine the sampled point cloud data with the highest matching degree with the target point cloud data, which is then used as the matched sampled point cloud data.

[0052] In one embodiment, the rotation and translation matrix between the matched sampled point cloud data and the target point cloud data is used as the transformation relationship between the matched sampled point cloud data and the target point cloud data.

[0053] Step S105: Based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampling point cloud data, obtain the pose estimation result.

[0054] Understandably, pose estimation results include the estimated position and estimated angle of the target object. The estimated position is obtained by transforming the initial position of the target object according to the transformation relationship; the estimated angle is obtained by the sampling rotation angle corresponding to the matched sampled point cloud data.
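
As a rough sketch of this step, assume the transformation relationship is expressed as a 4 x 4 homogeneous rotation and translation matrix. The helper names `make_transform` and `estimate_pose` and the returned dictionary layout are assumptions made for illustration only; the embodiment described with Figure 6 may combine these quantities in more detail.

```python
import numpy as np

def make_transform(rotation, translation) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(rotation, dtype=float)
    T[:3, 3] = np.asarray(translation, dtype=float)
    return T

def estimate_pose(transform: np.ndarray, initial_position, sampling_angle_deg: float):
    """Transform the initial position and attach the matched sampling rotation angle."""
    p = np.append(np.asarray(initial_position, dtype=float), 1.0)  # homogeneous point
    position = (transform @ p)[:3]
    return {"position": position, "angle_deg": float(sampling_angle_deg)}
```

For example, `estimate_pose(make_transform(np.eye(3), [1, 2, 3]), [0, 0, 0], 30.0)` shifts the initial position by the translation and reports the 30 degree sampling angle as the estimated angle.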

[0055] This application embodiment acquires several categories of 3D object models; performs rotation sampling on each category of 3D object models to obtain several sub-model data for each category; the sub-model data includes sampled point cloud data and sampling rotation angle; acquires a target image of the target object; determines the target point cloud data, initial position, and target category of the target object based on the target image; obtains several corresponding target sub-model data based on the target category; matches the target point cloud data with the sampled point cloud data in the several target sub-model data to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data; obtains the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampled point cloud data. Furthermore, by identifying the category of the target object and performing pose estimation matching on the sub-model data of the 3D object model of the target object category after rotation sampling, the number of model matchings is reduced, the workload of model matching is decreased, and the efficiency of pose estimation is improved. Simultaneously, performing pose estimation based on sub-model data with anisotropic data after rotation sampling can improve the recall rate in complex environments such as stacking.

[0056] Referring to Figure 2, in one embodiment, step S101, which involves performing rotation sampling on the 3D object model to obtain several sub-model data, includes:

[0057] Step S1011: Establish a spatial rectangular coordinate system with the center of the three-dimensional object model as the origin, and establish an isosphere with the center of the three-dimensional object model as the center of the sphere and the preset length as the radius of the sphere.

[0058] In one embodiment, a spatial rectangular coordinate system is established with the center of the 3D object model as the origin, the horizontal direction of the 3D object model as the X-axis, the direction perpendicular to the screen as the Y-axis, and the vertical direction as the Z-axis.

[0059] Understandably, the preset length is generally much greater than the maximum distance between the center and the surface of the 3D object model, so that the isosphere is located outside the 3D object model and wraps around it.

[0060] Step S1012: Set up a virtual observation camera facing the center of the sphere, and, with the X, Y, or Z axis of the spatial rectangular coordinate system as the rotation axis and the preset rotation angle as the sampling step size, sample uniformly along the isosphere to obtain several sampled point cloud data; at each sampling, the angle of the virtual observation camera relative to the three-dimensional object model is used as the corresponding sampled rotation angle.

[0061] Considering that 3D object models can be asymmetric or symmetric, different sampling methods can be adopted for different models.

[0062] For asymmetric models, since each change in angle produces a significantly different appearance, the purpose of rotational sampling is to capture all possible viewpoints of the object to facilitate subsequent feature extraction and recognition. For such objects, sampling with a uniform rotation step size can be used to obtain the corresponding sub-model data.

[0063] For symmetrical models, since rotation along the axis of symmetry does not change their appearance, the sampling angular range can be reduced. Uniform sampling can be performed only within one or a few quadrants of the axis of symmetry, and then replicated to other quadrants using symmetry. To ensure sampling accuracy, the virtual observation camera can optionally also rotate around the origin to obtain sub-model data in different directions.
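
The viewpoint layout of steps S1011 and S1012 can be sketched as follows, assuming rotation about the Z axis only. The function names, the `scale` factor of 5 used to make the radius much larger than the model, and the `max_angle_deg` parameter used to shorten the sampled range for symmetric models are all assumptions; rendering the point cloud seen from each viewpoint is outside the scope of this sketch.

```python
import numpy as np

def sphere_radius(model_points: np.ndarray, scale: float = 5.0) -> float:
    """Radius chosen as a multiple of the largest center-to-surface distance."""
    centered = model_points - model_points.mean(axis=0)
    return float(scale * np.linalg.norm(centered, axis=1).max())

def camera_viewpoints(radius: float, step_deg: float, max_angle_deg: float = 360.0):
    """Camera positions around the Z axis on the sphere, each facing the sphere center."""
    views = []
    for angle in np.arange(0.0, max_angle_deg, step_deg):
        t = np.radians(angle)
        position = radius * np.array([np.cos(t), np.sin(t), 0.0])
        look_dir = -position / radius  # unit vector pointing at the origin
        views.append({"angle_deg": float(angle), "position": position, "look_dir": look_dir})
    return views
```

For an asymmetric model the full range is sampled, for example `camera_viewpoints(r, 45.0)` gives 8 viewpoints. For a model that is symmetric about the Z axis, a call such as `camera_viewpoints(r, 45.0, max_angle_deg=90.0)` samples one quadrant only, and the remaining quadrants follow from symmetry.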

[0064] In this embodiment, the X, Y, or Z axis of the spatial rectangular coordinate system is used as the rotation axis and the preset rotation angle is used as the sampling step size. Uniform sampling is performed along the isosphere, which yields anisotropic data of the three-dimensional object model. This reduces the amount of data sampling and data dispersion, and improves the recall rate and pose estimation accuracy in complex environments such as stacking.

[0065] Referring to Figure 3, in an optional embodiment, step S102, which involves determining the target point cloud data of the target object, the initial position of the target object, and the target category based on the target image, includes:

[0066] Step S1021: Identify the target image and determine the range of the target object.

[0067] Optionally, the target image is input into an object detection model for recognition to obtain the range of the target object. The object detection model can be a model trained using a deep learning algorithm. The deep learning algorithm can be the YOLOv7 (You Only Look Once, version 7) object detection network or a similar algorithm; this application is not limited to any particular algorithm.

[0068] Optionally, the target image is identified using an object detection algorithm to obtain the range of the target object. The object detection algorithm can be any existing algorithm capable of object recognition, and this application does not impose any limitations on it.

[0069] The target object range can be an area that completely covers the target object, and in terms of implementation, it can be a rectangular envelope, a circular envelope, etc. that encloses the target object.

[0070] Step S1022: Detect and segment the target object range to obtain the initial position of the target object, the mask of the target object, and the target category.

[0071] Optionally, the target object range is input into an object segmentation and recognition model for segmentation and category identification to obtain the initial position of the target object, the mask of the target object, and the target category. The object segmentation and recognition model can be a model such as SAM (Segment Anything Model), and this application is not limited thereto. It is understood that the target object range can also be input into an object segmentation model for segmentation to obtain the segmented target object, the corresponding initial position of the target object, and the corresponding mask of the target object, and then the segmented target object can be input into a category recognition model to obtain the target category.

[0072] Optionally, the target object area is segmented and its category identified using an object segmentation algorithm to obtain the initial position of the target object, the mask of the target object, and the target category. The object segmentation algorithm can be any existing algorithm capable of object segmentation, and this application does not impose any restrictions.

[0073] The mask of the target object refers to a matrix with the same size as the target object image, where the target object region is marked as 1 (or any non-zero value), while the background region is marked as 0.

[0074] Step S1023: Extract point cloud data based on the mask of the target object to obtain target point cloud data.

[0075] It is understandable that the corresponding point cloud data can be obtained based on the target image of the target object, and the mask of the target object can be applied to the point cloud data to obtain the target point cloud data of the target object.
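
Applying the mask to the point cloud can be sketched as below. The sketch assumes an organized point cloud, that is, an H x W x 3 array aligned pixel by pixel with the target image (as a depth camera typically provides), which this application does not state explicitly. The function name `extract_target_points` is hypothetical.

```python
import numpy as np

def extract_target_points(organized_cloud: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the 3D points whose pixels are marked non-zero in the target mask."""
    if organized_cloud.shape[:2] != mask.shape:
        raise ValueError("mask and point cloud must share the same height and width")
    selected = organized_cloud[mask != 0]       # (K, 3) points inside the mask
    valid = np.isfinite(selected).all(axis=1)   # drop pixels without a depth reading
    return selected[valid]
```

Any non-zero mask value counts as the target object region, matching the mask definition given earlier, and background pixels marked 0 are discarded.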

[0076] The embodiments of this application improve the accuracy of target object recognition by first detecting the range where the target object is located, and then segmenting the range where the target object is located to obtain a mask of the target object.

[0077] Referring to Figure 4, in one embodiment, step S104, which involves matching the target point cloud data with the sampled point cloud data in several target sub-model data to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data, includes:

[0078] Step S1041: Obtain the first fast point feature histogram of the target point cloud data and the second fast point feature histogram of the sampled point cloud data in several target sub-model data.

[0079] Optionally, the step of obtaining the first fast point feature histogram of the target point cloud data includes: calculating the normal vector and curvature of all points in the neighborhood relative to each point in the target point cloud data; quantizing the normal vector and curvature of all points in the neighborhood relative to the point to obtain the histogram of each point; and normalizing the histogram of each point to obtain the first fast point feature histogram.
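
The normal vector and curvature inputs, and the histogram normalization, can be illustrated with the sketch below. It is not a complete fast point feature histogram: a full FPFH also encodes the angular relations between pairs of neighboring points and weights the neighbors' histograms, which is omitted here. The sketch assumes a common PCA-based estimate, taking the normal as the eigenvector of the neighborhood covariance with the smallest eigenvalue and the curvature as that eigenvalue divided by the eigenvalue sum. The function names and the neighborhood size `k` are hypothetical.

```python
import numpy as np

def normals_and_curvature(points: np.ndarray, k: int = 8):
    """Estimate a normal and a curvature value per point from its k nearest neighbors."""
    n = len(points)
    normals = np.zeros((n, 3))
    curvature = np.zeros(n)
    # Brute-force pairwise squared distances; adequate for a small illustrative cloud.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    for i in range(n):
        idx = np.argsort(d2[i])[:k]                 # neighborhood, including the point itself
        cov = np.cov(points[idx].T)                 # 3x3 covariance of the neighborhood
        eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]                  # direction of least variance
        total = eigvals.sum()
        curvature[i] = max(eigvals[0], 0.0) / total if total > 0 else 0.0
    return normals, curvature

def normalized_histogram(values: np.ndarray, bins: int, value_range):
    """Quantize values into bins and normalize the counts so that they sum to 1."""
    hist, _ = np.histogram(values, bins=bins, range=value_range)
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)
```

On a flat patch the estimated curvature is close to zero and the normals are perpendicular to the patch, while the sign of each normal is still arbitrary at this stage.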

[0080] After calculating the normal vectors of all points in the neighborhood relative to the given point, the directions of all normal vectors are adjusted so that they point from the inside of the object outward, which makes the resulting fast point feature histogram more realistic. Before calculating the normal vectors and curvature of all points in the neighborhood relative to the given point, voxel filtering can be performed on the target point cloud data to remove impurities and improve registration accuracy.

[0081] Optionally, the step of obtaining the second fast point feature histogram of several target sub-model data includes: calculating the normal vector and curvature of all points in the neighborhood relative to each point in the sampled point cloud data of the target sub-model data; quantizing the normal vector and curvature of all points in the neighborhood relative to the point to obtain the histogram of each point; and normalizing the histogram of each point to obtain the second fast point feature histogram.

[0082] After calculating the normal vectors of all points in the neighborhood relative to the given point, the directions of all normal vectors are adjusted so that they point from the inside of the object outward, which makes the resulting fast point feature histogram more realistic. Furthermore, before calculating the normal vectors and curvature of all points in the neighborhood relative to the given point, voxel filtering can be performed on the sampled point cloud data to remove impurities and improve registration accuracy.
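
The two preprocessing steps mentioned for both point clouds, voxel filtering and outward orientation of the normals, can be sketched as follows. Two assumptions are made: voxel filtering is taken to mean replacing the points in each voxel by their centroid, and the centroid of the cloud is used as a stand-in for the inside of the object, which is a heuristic that suits roughly convex objects. The function names are hypothetical.

```python
import numpy as np

def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Replace all points that fall into the same voxel by their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)   # integer voxel index per point
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)                         # accumulate points per voxel
    return sums / counts[:, None]

def orient_normals_outward(points: np.ndarray, normals: np.ndarray) -> np.ndarray:
    """Flip normals that point toward the cloud centroid so that all point outward."""
    outward = points - points.mean(axis=0)
    flip = np.einsum("ij,ij->i", normals, outward) < 0       # negative dot product: inward
    oriented = normals.copy()
    oriented[flip] *= -1.0
    return oriented
```

A larger `voxel_size` removes more points and speeds up the later feature computation, at the cost of geometric detail.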

[0083] Understandably, the robot can acquire the second fast point feature histograms of several target sub-model data each time it performs pose estimation; alternatively, the robot can calculate histograms of the sub-model data of each category during the first pose estimation, obtain the second fast point feature histograms of each sub-model data, and store the second fast point feature histograms of each sub-model data. In subsequent pose estimations, the robot can directly acquire the stored second fast point feature histograms of the target sub-model data.

[0084] Step S1042: Based on the first fast point feature histogram and the second fast point feature histogram, register the target point cloud data with the sampled point cloud data in several target sub-model data to obtain the transformation relationship between the initially registered sampled point cloud data and the target point cloud data.

[0085] Step S1043: Based on the transformation relationship of the initial registration, perform iterative closest point (ICP) calculation between the target point cloud data and the several initially registered sampled point cloud data to determine the matching sampled point cloud data and the transformation relationship between the matching sampled point cloud data and the target point cloud data.

[0086] This application embodiment uses the first fast point feature histogram and the second fast point feature histogram to obtain the initial-registration transformation relationship between the several sampled point cloud data and the target point cloud data. Based on the transformation relationship of the initial registration, iterative closest point calculation is performed between the target point cloud data and the several initially registered sampled point cloud data to determine the matching sampled point cloud data and the transformation relationship between the matching sampled point cloud data and the target point cloud data. By iteratively refining the transformation relationship, a precise transformation relationship between the sampled point cloud data and the target point cloud data can be obtained, thereby improving the accuracy of pose estimation.

[0087] Please see Figure 5. In one embodiment, step S1041, which involves registering the target point cloud data with sampled point cloud data from several target sub-model data based on the first fast point feature histogram and the second fast point feature histogram, to obtain the transformation relationship between the initially registered sampled point cloud data and the target point cloud data, includes:

[0088] Step S1411: Select several sampling points in the target point cloud data.

[0089] Step S1412: For each sampling point, find several similar points in the sampled point cloud data of the target sub-model data whose fast point feature histograms are similar to that of the sampling point; randomly select one of the similar points as the point in the sampled point cloud data that corresponds one-to-one with the sampling point in the target point cloud, and calculate the rotation and translation matrix between the corresponding points; transform the target point cloud based on the rotation and translation matrix, and obtain the sum of distance errors between the transformed target point cloud and the sampled point cloud data in the target sub-model data, thereby obtaining the sum of distance errors corresponding to each sampling point;

[0090] Step S1413: Based on the rotation and translation matrix corresponding to the minimum sum of distance errors, determine the transformation relationship between the initially registered sampled point cloud data and the target point cloud data.

[0091] This application embodiment determines similar points based on the feature histograms and the sampled point cloud data in the target sub-model data. Based on these similar points, it calculates the rotation and translation matrix between the target point cloud and the sampled point cloud data in the target sub-model data. It then transforms the target point cloud with this matrix and obtains the sum of distance errors between the transformed target point cloud and the sampled point cloud data, yielding the sum of distance errors corresponding to each sampling point. Based on the rotation and translation matrix corresponding to the minimum sum of distance errors, it determines the transformation relationship between the sampled point cloud data in the target sub-model data and the target point cloud data, thereby improving the efficiency of determining the transformation relationship.
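Steps S1411 to S1413 can be sketched as a sample-consensus style loop. The code below is a minimal illustration assuming NumPy and per-point feature vectors already computed for both clouds. The function names, the number of trials, and the use of an SVD-based least-squares fit (the Kabsch method) to obtain the rotation and translation matrix are assumptions; the text does not specify how the matrix is computed.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t is close to dst[i]."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def sum_nearest_distances(moved, model):
    """Sum over moved points of the distance to the nearest model point."""
    d = np.linalg.norm(moved[:, None, :] - model[None, :, :], axis=2)
    return d.min(axis=1).sum()

def coarse_register(target, target_feat, model, model_feat,
                    trials=50, n_samples=3, n_similar=2, seed=0):
    """Return (error, R, t) with the smallest summed distance error."""
    rng = np.random.default_rng(seed)
    best = (np.inf, np.eye(3), np.zeros(3))
    for _ in range(trials):
        # S1411: select several sampling points in the target cloud.
        idx = rng.choice(len(target), size=n_samples, replace=False)
        # S1412: for each, find model points with similar feature histograms
        # and randomly keep one of them as the corresponding point.
        fd = np.linalg.norm(
            target_feat[idx][:, None, :] - model_feat[None, :, :], axis=2
        )
        similar = np.argsort(fd, axis=1)[:, :n_similar]
        choice = rng.integers(0, n_similar, size=n_samples)
        picks = similar[np.arange(n_samples), choice]
        R, t = rigid_transform(target[idx], model[picks])
        err = sum_nearest_distances(target @ R.T + t, model)
        # S1413: keep the transform with the minimum summed distance error.
        if err < best[0]:
            best = (err, R, t)
    return best
```

Running `coarse_register` once per target sub-model gives one initial transformation per sub-model, which is what the iterative refinement in step S1043 starts from.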

[0092] In an optional embodiment, step S1043, which involves iteratively calculating the nearest point between the target point cloud data and several sampled point cloud data from the initial registration based on the transformation relationship of the initial registration, to determine the matching sampled point cloud data and the transformation relationship between the matching sampled point cloud data and the target point cloud data, includes:

[0093] Step S1431: Based on the transformation relationship of the initial registration, transform the target point cloud data to obtain the first point cloud data, and perform the following matching steps:

[0094] Step S1432: Based on each point in the first point cloud data, find the nearest corresponding point in the initially registered sampled point cloud data to form an initial corresponding point pair;

[0095] Step S1433: Based on the initial corresponding point pairs, obtain the first transformation relationship; according to the first transformation relationship, transform the first point cloud data to obtain the second point cloud data;

[0096] Step S1434: Obtain the distance error between the second point cloud data and several sampled point cloud data from the initial registration;

[0097] Step S1435: When the distance error is greater than the preset threshold and the number of iterations is less than the preset number, take the second point cloud data as the new first point cloud data and continue to execute the matching steps until the distance error is less than the preset threshold or the number of iterations is greater than the preset number. Obtain, for each initially registered sampled point cloud data, the distance error and the corresponding first transformation relationship. Use the initially registered sampled point cloud data with the smallest distance error as the matched sampled point cloud data, and use the corresponding first transformation relationship as the transformation relationship between the matched sampled point cloud data and the target point cloud data.

[0098] The embodiments of this application use iterative closest point calculation to determine the matching sampled point cloud data and the transformation relationship between the matching sampled point cloud data and the target point cloud data, which can improve the accuracy of matching.
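The refinement loop of steps S1431 to S1435 can be sketched as follows, assuming NumPy. The names `icp` and `best_match`, the use of the mean nearest-point distance as the distance error, and the choice to return the transformation composed with the initial registration are assumptions made for illustration.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t is close to dst[i]."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - src_c).T @ (dst - dst_c))
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def icp(target, model, R0, t0, max_iters=30, tol=1e-6):
    """Refine an initial transform; return (distance_error, R, t)."""
    R, t = R0, t0
    moved = target @ R0.T + t0                       # S1431: first point cloud
    error = np.inf
    for _ in range(max_iters):                       # iteration limit
        d = np.linalg.norm(moved[:, None, :] - model[None, :, :], axis=2)
        nearest = model[d.argmin(axis=1)]            # S1432: nearest points
        dR, dt = rigid_transform(moved, nearest)     # S1433: first transform
        moved = moved @ dR.T + dt                    # second point cloud
        R, t = dR @ R, dR @ t + dt                   # compose with history
        d = np.linalg.norm(moved[:, None, :] - model[None, :, :], axis=2)
        error = d.min(axis=1).mean()                 # S1434: distance error
        if error < tol:                              # S1435: threshold reached
            break
    return error, R, t

def best_match(target, candidates):
    """candidates: list of (model_points, R0, t0). Returns (index, error, R, t)."""
    results = [icp(target, m, R0, t0) for m, R0, t0 in candidates]
    i = int(np.argmin([r[0] for r in results]))
    return (i, *results[i])
```

The index returned by `best_match` identifies the matched sampled point cloud, so the sampling rotation angle stored with that sub-model can be looked up directly.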

[0099] Please see Figure 6. In one embodiment, step S105, which involves obtaining the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampling point cloud data, includes:

[0100] Step S1051: Based on the transformation relationship, transform the initial position of the target object to obtain the estimated position of the target object;

[0101] Step S1052: Use the sampling rotation angle corresponding to the matched sampling point cloud data as the estimated angle of the target object;

[0102] Step S1053: Obtain the pose estimation result based on the estimated position and estimated angle.

[0103] The embodiments of this application obtain the estimated position of the target object based on the transformation relationship, and determine the estimated angle of the target object based on the sampling rotation angle corresponding to the matched sampling point cloud data, which can improve the efficiency of obtaining the pose estimation result.
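Steps S1051 to S1053 amount to a small assembly step, sketched below under stated assumptions: the transformation relationship is represented as a rotation matrix `R` and translation `t`, the sampling rotation angle is a tuple of degrees about the X, Y, and Z axes, and the names `Pose` and `estimate_pose` are hypothetical. NumPy is assumed.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np

@dataclass
class Pose:
    position: np.ndarray               # estimated position (S1051)
    rotation_deg: Tuple[float, ...]    # estimated angle (S1052)

def estimate_pose(R, t, initial_position, sampling_rotation_deg) -> Pose:
    """Combine the transformed position with the matched sub-model's angle."""
    # S1051: transform the initial position with the matched transformation.
    position = R @ np.asarray(initial_position, dtype=float) + t
    # S1052: the angle is read off the matched sub-model, not re-estimated.
    angle = tuple(float(v) for v in sampling_rotation_deg)
    # S1053: position and angle together form the pose estimation result.
    return Pose(position=position, rotation_deg=angle)
```

Taking the angle from the matched sub-model is what lets the method avoid decomposing a rotation matrix at run time: the orientation was fixed when the sub-model was sampled.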

[0104] In one embodiment, the sub-model data further includes the matching probability of the target object matching the sub-model; step S103, which involves obtaining several corresponding target sub-model data according to the target category, includes:

[0105] Step S1031: Arrange the matching probabilities of several sub-model data corresponding to the target category in descending order, and select the sub-model data that is arranged before the preset position as the target sub-model data.

[0106] Compared to using all sub-model data for pose estimation, the embodiments of this application use sub-model data with higher matching probability for pose estimation, which can improve the efficiency of pose estimation.
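Step S1031 is a sort-and-truncate operation. A minimal sketch, assuming each sub-model is a dictionary with a `"probability"` key and that the preset position is passed as `k` (both are illustrative choices, not part of the original text):

```python
def select_target_sub_models(sub_models, k):
    """Return the k sub-models with the highest matching probability."""
    ranked = sorted(sub_models, key=lambda m: m["probability"], reverse=True)
    return ranked[:k]

# Usage with made-up probabilities for one category.
candidates = [
    {"angle": (0, 0, 0), "probability": 0.10},
    {"angle": (0, 90, 0), "probability": 0.60},
    {"angle": (90, 0, 0), "probability": 0.30},
]
selected = select_target_sub_models(candidates, k=2)
```

Only the selected sub-models are passed on to registration, so the matching cost scales with `k` rather than with the total number of sampled rotations.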

[0107] In one embodiment, after step S105, which involves obtaining the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampling point cloud data, the following step is included:

[0108] Step S106: Update the matching probability of each sub-model data according to the number of sub-model data corresponding to the target category, the historical number of successful matching of the target category, and the number of successful matching of each sub-model data.

[0109] Assuming that the number of sub-model data corresponding to the target category is *a*, the number of historical successful matches of the target category is *b*, and the number of successful matches registered to the *i*-th sub-model is *c*, the matching probability of the *i*-th sub-model can be updated as $P_i = (1 + c) / (a + b)$.

[0110] This application embodiment updates the matching probability of sub-model data by statistically analyzing the poses of objects that have been successfully matched in each instance. Then, it performs matching based on sub-model data with higher probabilities, eliminating some poses that cannot be stably placed in the real world, reducing the number of sub-model data matches, thereby greatly improving the recall and efficiency of pose estimation.
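The update rule P_i = (1 + c) / (a + b) can be sketched directly. The function name is hypothetical, and letting *b* default to the sum of the per-sub-model counts is an assumption; under that assumption the probabilities sum to 1, because the numerators add up to a + b.

```python
def update_probabilities(success_counts, b=None):
    """Apply P_i = (1 + c_i) / (a + b) to every sub-model of one category."""
    a = len(success_counts)                       # number of sub-models
    if b is None:
        b = sum(success_counts)                   # assumed category total
    return [(1 + c) / (a + b) for c in success_counts]

# Usage: three sub-models with 3, 0, and 1 successful matches.
probabilities = update_probabilities([3, 0, 1])
```

The added 1 in the numerator keeps a sub-model that has never matched at a small non-zero probability, so it is ranked low rather than excluded outright.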

[0111] Please see Figure 7, which is a schematic diagram of the object pose estimation device provided in the second embodiment of this application. The device 200 includes:

[0112] The sub-model data acquisition module 201 is used to acquire three-dimensional object models of several categories; to perform rotation sampling on the three-dimensional object models of each category to obtain several sub-model data of each category; the sub-model data includes sampled point cloud data and sampled rotation angle;

[0113] The target image acquisition module 202 is used to acquire the target image of the target object; and based on the target image, to determine the target point cloud data of the target object, the initial position of the target object, and the target category.

[0114] The target data acquisition module 203 is used to obtain several target sub-model data according to the target category;

[0115] The transformation relationship determination module 204 is used to match the target point cloud data with the sampled point cloud data in several target sub-model data, and determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data.

[0116] The pose estimation module 205 is used to obtain the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampling point cloud data.

[0117] It should be noted that the object pose estimation device provided in the second embodiment of this application is only illustrated by the above-described division of functional modules when executing the object pose estimation method. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the object pose estimation device provided in the second embodiment of this application and the object pose estimation method in the first embodiment of this application belong to the same concept, and its implementation process is detailed in the method embodiment, which will not be repeated here.
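One way to picture the division into modules 201 to 205 is as five interchangeable callables wired into a pipeline. The sketch below is purely illustrative: the class name, field names, and data passed between stages are hypothetical, and, as noted above, a real device may divide the functions differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple

@dataclass
class PoseEstimationDevice:
    sample_sub_models: Callable[[], dict]                    # module 201
    analyze_image: Callable[[Any], Tuple[Any, Any, str]]     # module 202
    select_targets: Callable[[dict, str], Sequence[Any]]     # module 203
    match: Callable[[Any, Sequence[Any]], Tuple[Any, Any]]   # module 204
    assemble_pose: Callable[[Any, Any, Any], Any]            # module 205

    def run(self, image: Any) -> Any:
        library = self.sample_sub_models()
        cloud, position, category = self.analyze_image(image)
        targets = self.select_targets(library, category)
        matched, transform = self.match(cloud, targets)
        return self.assemble_pose(transform, position, matched)
```

Keeping each module behind a plain callable makes it straightforward to swap, for example, the matching stage without touching the rest of the pipeline.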

[0118] The object pose estimation device of the second embodiment of this application can be applied to a computer device. This device embodiment can be implemented by software, hardware, or a combination of software and hardware. Taking software implementation as an example, the device, as a logical device, is formed when the processor of the computer device in which it resides reads the corresponding computer program instructions from the memory and executes them. From a hardware perspective, the computer device in which it resides may include a processor and a memory, which are interconnected via a data bus or other known means.

[0119] Please see Figure 8, which is a schematic diagram of the structure of the electronic device provided in the third embodiment of this application. As shown in Figure 8, the electronic device 300 can specifically be a computer, mobile phone, tablet computer, interactive flat panel, etc. In an exemplary embodiment of this application, the electronic device 300 may include: at least one processor 310, at least one memory 320, at least one display 330, at least one network interface 340, a user interface 350, and at least one communication bus 360.

[0120] The communication bus 360 is used to enable communication between these components.

[0121] The user interface 350 may include a display screen and a camera; the user interface 350 may also include standard wired and wireless interfaces.

[0122] The network interface 340 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).

[0123] The processor 310 may include one or more processing cores. The processor 310 connects to various parts within the electronic device 300 using various interfaces and lines, and performs various functions and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 320, and by calling data stored in the memory 320. Optionally, the processor 310 may be implemented using at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 310 may integrate one or a combination of several of the following: Central Processing Unit (CPU), Graphics Processing Unit (GPU), and modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content required for display; and the modem handles wireless communication. It is understood that the modem may also be implemented as a separate chip without being integrated into the processor 310.

[0124] The memory 320 may include random access memory (RAM) or read-only memory. Optionally, the memory 320 may include a non-transitory computer-readable storage medium. The memory 320 can be used to store instructions, programs, code, code sets, or instruction sets. The memory 320 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch function, sound playback function, image playback function, etc.), instructions for implementing the above-described method embodiments, etc.; the data storage area may store data involved in the above-described method embodiments, etc. Optionally, the memory 320 may also be at least one storage device located remotely from the aforementioned processor 310. As shown in Figure 8, the memory 320, which serves as a computer storage medium, may include an operating system, a network communication module, and a user.

[0125] In the electronic device 300 shown in Figure 8, the user interface 350 is mainly used to provide an input interface for the user and to obtain the data input by the user, while the processor 310 can be used to call an application program stored in the memory 320, such as an application program for object pose estimation, and to execute the relevant operations of any object pose estimation method in the above embodiments, with the corresponding functions and beneficial effects.

[0126] The fourth embodiment of this application also provides a computer-readable storage medium on which a computer program is stored. The computer program is adapted to be loaded by a processor to execute the steps of the object pose estimation method described above. For the specific execution process, please refer to the detailed description in the method embodiments, which will not be repeated here. The device containing the storage medium can be an electronic device such as a personal computer, laptop computer, smartphone, or tablet computer.

[0127] For the device embodiments, since they basically correspond to the method embodiments, the relevant parts can be referred to in the description of the method embodiments. The device embodiments described above are merely illustrative, wherein the components described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this application according to actual needs. Those skilled in the art can understand and implement this without creative effort.

[0128] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0129] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each process and/or block of the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram. The computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0130] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0131] In a typical configuration, a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.

[0132] Memory may include non-persistent memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.

[0133] Computer-readable media includes both permanent and non-permanent, removable and non-removable media that can store information using any method or technology. Information can be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0134] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element.

[0135] The above are merely embodiments of this application and are not intended to limit the scope of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of the claims of this application.

Claims

1. A method for estimating the pose of an object, characterized in that it includes the following steps: acquiring three-dimensional object models of several categories; performing rotation sampling on the three-dimensional object model of each category to obtain several sub-model data of each category, wherein the sub-model data includes sampled point cloud data and a sampling rotation angle; acquiring a target image of a target object; determining, based on the target image, the target point cloud data of the target object, the initial position of the target object, and the target category; obtaining corresponding target sub-model data based on the target category; matching the target point cloud data with the sampled point cloud data in several target sub-model data to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data; and obtaining the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampled point cloud data.

2. The object pose estimation method according to claim 1, characterized in that the step of performing rotation sampling on the three-dimensional object model to obtain several sub-model data includes: establishing a spatial rectangular coordinate system with the center of the three-dimensional object model as the origin, and establishing a sphere with the center of the three-dimensional object model as the center of the sphere and a preset length as the radius of the sphere; setting up a virtual observation camera facing the center of the sphere; and, with the X, Y, or Z axis of the spatial rectangular coordinate system as the rotation axis and a preset rotation angle as the sampling step size, performing uniform sampling with the virtual observation camera along the spherical surface to obtain several sampled point cloud data, wherein at each sampling the angle of the virtual observation camera relative to the three-dimensional object model is used as the corresponding sampling rotation angle.

3. The object pose estimation method according to claim 1, characterized in that: The sub-model data also includes the matching probability of the target object matching the sub-model; The step of obtaining corresponding target sub-model data based on the target category includes: The matching probabilities of several sub-model data corresponding to the target category are arranged from largest to smallest, and the sub-model data arranged before the preset position is selected as the target sub-model data.

4. The object pose estimation method according to claim 3, characterized in that: After the step of obtaining the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampling point cloud data, the following steps are included: The matching probability of each sub-model data is updated based on the number of sub-model data corresponding to the target category, the historical number of successful matches of the target category, and the number of successful matches of each sub-model data.

5. The object pose estimation method according to claim 1, characterized in that: The step of determining the target point cloud data of the target object, the initial position of the target object, and the target category based on the target image includes: The target image is identified to determine the range of the target object; The target object range is detected and segmented to obtain the initial position of the target object, the mask of the target object, and the target category; Point cloud extraction is performed based on the mask of the target object to obtain the target point cloud data of the target object.

6. The object pose estimation method according to claim 1, characterized in that: The step of obtaining the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampling point cloud data includes: Based on the transformation relationship, the initial position of the target object is transformed to obtain the estimated position of the target object; The sampling rotation angle corresponding to the matched sampling point cloud data is used as the estimated angle of the target object; Based on the estimated position and the estimated angle, the pose estimation result is obtained.

7. The object pose estimation method according to any one of claims 1 to 6, characterized in that the step of matching the target point cloud data with sampled point cloud data from several target sub-model data to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data includes: obtaining the first fast point feature histogram of the target point cloud data and the second fast point feature histogram of the sampled point cloud data in several target sub-model data; based on the first fast point feature histogram and the second fast point feature histogram, registering the target point cloud data with the sampled point cloud data in several target sub-model data to obtain the transformation relationship between the initially registered sampled point cloud data and the target point cloud data; and, based on the transformation relationship of the initial registration, performing iterative closest point calculation between the target point cloud data and the several initially registered sampled point cloud data to determine the matching sampled point cloud data and the transformation relationship between the matching sampled point cloud data and the target point cloud data.

8. The object pose estimation method according to claim 7, characterized in that the step of registering the target point cloud data with sampled point cloud data from several target sub-model data based on the first fast point feature histogram and the second fast point feature histogram to obtain the transformation relationship between the initially registered sampled point cloud data and the target point cloud data includes: selecting several sampling points from the target point cloud data; for each sampling point, finding several similar points with similar fast point feature histograms in the sampled point cloud data of the target sub-model data, randomly selecting one of these similar points as the point in the sampled point cloud data of the target sub-model data that corresponds one-to-one with the target point cloud, calculating the rotation and translation matrix between the corresponding points, and, based on the rotation and translation matrix, transforming the target point cloud and obtaining the sum of distance errors between the transformed target point cloud and the sampled point cloud data in the target sub-model data, thus obtaining the sum of distance errors corresponding to each sampling point; and, based on the rotation and translation matrix corresponding to the minimum distance error, determining the transformation relationship between the initially registered sampled point cloud data and the target point cloud data.

9. The object pose estimation method according to claim 7, characterized in that the step of performing iterative closest point calculation between the target point cloud data and several initially registered sampled point cloud data, based on the transformation relationship of the initial registration, to determine the matching sampled point cloud data and the transformation relationship between the matching sampled point cloud data and the target point cloud data, includes: based on the transformation relationship of the initial registration, transforming the target point cloud data to obtain the first point cloud data, and performing the following matching steps: for each point in the first point cloud data, finding the nearest corresponding point in the initially registered sampled point cloud data to form an initial corresponding point pair; obtaining a first transformation relationship based on the initial corresponding point pairs, and transforming the first point cloud data according to the first transformation relationship to obtain the second point cloud data; obtaining the distance error between the second point cloud data and the several initially registered sampled point cloud data; when the distance error is greater than a preset threshold and the number of iterations is less than a preset number, taking the second point cloud data as the new first point cloud data and continuing to execute the matching steps until the distance error is less than the preset threshold or the number of iterations is greater than the preset number; obtaining, for each initially registered sampled point cloud data, the distance error and the corresponding first transformation relationship; and using the initially registered sampled point cloud data with the smallest distance error as the matched sampled point cloud data, and using the corresponding first transformation relationship as the transformation relationship between the matched sampled point cloud data and the target point cloud data.

10. An object pose estimation device, characterized in that it includes: a sub-model data acquisition module, used to acquire three-dimensional object models of several categories and to perform rotation sampling on the three-dimensional object model of each category to obtain several sub-model data of each category, wherein the sub-model data includes sampled point cloud data and a sampling rotation angle; a target image acquisition module, used to acquire a target image of a target object and, based on the target image, to determine the target point cloud data of the target object, the initial position of the target object, and the target category; a target data acquisition module, used to obtain several target sub-model data according to the target category; a transformation relationship determination module, used to match the target point cloud data with sampled point cloud data in several target sub-model data, and to determine the matched sampled point cloud data and the transformation relationship between the matched sampled point cloud data and the target point cloud data; and a pose estimation module, used to obtain the pose estimation result based on the transformation relationship, the initial position of the target object, and the sampling rotation angle corresponding to the matched sampled point cloud data.

11. An electronic device comprising a processor and a memory, characterized in that the memory stores a computer program adapted to be loaded by the processor to execute the object pose estimation method as described in any one of claims 1 to 9.

12. A computer-readable storage medium having a computer program stored thereon, characterized in that, When executed by a processor, the computer program implements the object pose estimation method as described in any one of claims 1 to 9.