Method, apparatus, device, medium and product for pose determination

The method simplifies the hand-eye calibration process for robots and unfixed cameras by determining the relative pose of the camera and robot without requiring a calibration board, enhancing user experience and reducing technical requirements, especially for mobile and handheld setups.

WO2026137358A1 · PCT designated stage · Publication Date: 2026-07-02 · ABB (SCHWEIZ) AG +3

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: ABB (SCHWEIZ) AG
Filing Date: 2024-12-26
Publication Date: 2026-07-02

IPC: G06T7/73

AI Tagging

Technology Topics

Computer graphics (images); Engineering

AI Technical Summary

Technical Problem

Existing hand-eye calibration methods for robots are cumbersome and require significant human effort, and they are limited to scenarios where the relative pose of the camera is fixed during work.

Method used

A method for pose determination in which an electronic device determines a first set of keypoints of a robot based on visual information of the robot obtained via a camera, and a second set of keypoints of the robot based on joint angle information obtained from the robot, wherein the joint angle information is aligned with the visual information (in some embodiments, within a tolerance threshold). The electronic device then determines a pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints. A corresponding apparatus comprises a first, a second and a third determining module configured to perform these respective determinations.

Benefits of technology

The proposed solution simplifies the hand-eye calibration process and improves user experience by eliminating the need for a calibration board. It also allows for unfixed cameras and/or robots, such as a fixed robot with a handheld camera, a mobile robot with a fixed camera, and a mobile robot with a handheld camera.

Abstract

Embodiments of present disclosure provide a method, an apparatus, an electronic device, a computer-readable storage device, and a computer program product for pose determination. The method comprises determining, by an electronic device, a first set of keypoints of a robot based on visual information of the robot obtained via a camera. The method further comprises determining, by the electronic device, a second set of keypoints of the robot based on joint angle information obtained from the robot, wherein the joint angle information is aligned with the visual information. The method further comprises determining, by the electronic device, a pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints. In this way, the user experience for performing a hand-eye calibration related to a robot and a camera can be improved.

Description

METHOD, APPARATUS, DEVICE, MEDIUM AND PRODUCT FOR POSE DETERMINATION

FIELD

[0001] Embodiments of the present disclosure generally relate to the field of computer technology and, in particular, to a method, an apparatus, an electronic device, a computer-readable medium and a computer program product for pose determination.

BACKGROUND

[0002] Robots are intelligent machines that can work semi-autonomously or fully autonomously. Robots can perform tasks such as operations or movements through programming and automatic control. Robots can assist or even replace human beings in performing dangerous, heavy, and complex tasks, improving work efficiency and quality, serving human life, and expanding or extending human activities and capabilities. Robots can be aware of their surrounding environment through sensors. Hand-eye calibration is usually the first step in building a robot vision system, and determines the relative pose relationship between the camera and the robot.

SUMMARY

[0003] In general, various example embodiments of the present disclosure provide a method, an apparatus, an electronic device, a computer-readable storage device, and a computer program product for pose determination.

[0004] In a first aspect, there is provided a method for pose determination. The method comprises determining, by an electronic device, a first set of keypoints of a robot based on visual information of the robot obtained via a camera. The method further comprises determining, by the electronic device, a second set of keypoints of the robot based on joint angle information obtained from the robot, wherein the joint angle information is aligned with the visual information. The method further comprises determining, by the electronic device, a pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints.

[0005] In a second aspect, there is provided an apparatus for pose determination. The apparatus comprises a first determining module configured to determine, by an electronic device, a first set of keypoints of a robot based on visual information of the robot obtained via a camera. The apparatus further comprises a second determining module configured to determine, by the electronic device, a second set of keypoints of the robot based on joint angle information obtained from the robot, wherein the joint angle information is aligned with the visual information. The apparatus further comprises a third determining module configured to determine, by the electronic device, a pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints.

[0006] In a third aspect, there is provided an electronic device. The electronic device comprises a processor; and a memory coupled to the processor, wherein the memory has instructions stored therein, and the instructions, when executed by the processor, cause the device to execute actions of the first aspect.

[0007] In a fourth aspect, there is provided a computer-readable medium. The computer-readable medium comprises instructions stored therein, which when executed by a processor, cause the processor to perform methods of the first aspect.

[0008] In a fifth aspect, there is provided a computer program product. The computer program product comprises instructions stored therein, which when executed by a processor, cause the processor to perform methods of the first aspect.

[0009] It is to be understood that the Summary is not intended to identify key or essential features of embodiments of the present disclosure, nor is it intended to be used to limit the scope of the present disclosure. Other features of the present disclosure will become readily comprehensible through the description below.

DESCRIPTION OF DRAWINGS

[0010] Through the following detailed descriptions with reference to the accompanying drawings, the above and other objectives, features and advantages of the example embodiments disclosed herein will become more comprehensible. In the drawings, several example embodiments disclosed herein will be illustrated in an example and in a non-limiting manner, wherein:

[0011] FIG. 1 illustrates a schematic diagram of an example environment in which a plurality of embodiments of the present disclosure can be implemented;

[0012] FIG. 2 illustrates an example scenario in which some embodiments of the present disclosure can be implemented;

[0013] FIG. 3 illustrates a flowchart of an example process for a hand-eye calibration in accordance with some embodiments of the present disclosure;

[0014] FIG. 4 illustrates an example block diagram of a hand-eye calibration with three-dimensional (3D) keypoint detection in accordance with some embodiments of the present disclosure;

[0015] FIG. 5 illustrates an example block diagram of an AI model for detecting 3D keypoints of a robot in accordance with some embodiments of the present disclosure;

[0016] FIG. 6 illustrates a flowchart of an example method for pose determination in accordance with some embodiments of the present disclosure;

[0017] FIG. 7 illustrates a block diagram of an example apparatus for pose determination in accordance with some embodiments of the present disclosure; and

[0018] FIG. 8 illustrates a block diagram illustrating an electronic device in accordance with some embodiments of the present disclosure.

[0019] Throughout all the drawings, the same or similar reference numerals represent the same or similar elements.

DETAILED DESCRIPTION OF EMBODIMENTS

[0020] Principles of the present disclosure will now be described with reference to several example embodiments shown in the drawings. Though example embodiments of the present disclosure are illustrated in the drawings, it is to be understood that the embodiments are described only to facilitate those skilled in the art in better understanding and thereby achieving the present disclosure, rather than to limit the scope of the disclosure in any manner.

[0021] The terms "comprises" or "includes" and their variants are to be read as open terms that mean "includes, but is not limited to". The term "or" is to be read as "and/or" unless the context clearly indicates otherwise. The term "based on" is to be read as "based at least in part on". The term "being operable to" is to mean a function, an action, a motion or a state can be achieved by an operation induced by a user or an external mechanism. The terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment". The term "another embodiment" is to be read as "at least one other embodiment". The terms "first", "second", and the like may refer to different or same objects. Other definitions, explicit and implicit, may be included below. A definition of a term is consistent throughout the description unless the context clearly indicates otherwise.

[0022] The functions or algorithms described herein may be implemented in software in one embodiment. The software may consist of computer executable instructions stored on computer readable media or computer readable storage device such as one or more non-transitory memories or other type of hardware-based storage devices, either local or networked. Further, such functions correspond to modules, which may be software, hardware, firmware or any combination thereof. Multiple functions may be performed in one or more modules as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, or other type of processor operating on a computer system, such as a personal computer, server or other computer system, turning such computer system into a specifically programmed machine.

[0023] The functionality can be configured to perform an operation using, for instance, software, hardware, firmware, or the like. For example, the phrase "configured to" can refer to a logic circuit structure of a hardware element that is to implement the associated functionality. The phrase "configured to" can also refer to a logic circuit structure of a hardware element that is to implement the coding design of associated functionality of firmware or software. The term "module" refers to a structural element that can be implemented using any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any combination of hardware, software, and firmware. The term "logic" encompasses any functionality for performing a task. For instance, each operation illustrated in the flowcharts corresponds to logic for performing that operation. An operation can be performed using software, hardware, firmware, or the like. The terms "component", "system", and the like may refer to computer-related entities, hardware, and software in execution, firmware, or a combination thereof. A component may be a process running on a processor, an object, an executable, a program, a function, a subroutine, a computer, or a combination of software and hardware. The term "processor" may refer to a hardware component, such as a processing unit of a computer system.

[0024] The terms "a" or "an" as used herein, are defined as one or more than one. Also, the use of introductory phrases such as "at least one" and "one or more" in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim element to disclosures containing only one such element, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an". The same holds true for the use of definite articles.

[0025] Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computing device to implement the disclosed subject matter. Computer-readable storage media can include, but are not limited to, magnetic storage devices, e.g., hard disk, floppy disk, magnetic strips, optical disk, compact disk (CD), digital versatile disk (DVD), smart cards, flash memory devices, among others. In contrast, computer-readable media, i.e., not storage media, may additionally include communication media such as transmission media for wireless signals and the like.

[0026] As discussed above, hand-eye calibration is usually the first step in building the robot vision system, and determines the relative pose relationship between the camera and the robot. In the traditional hand-eye calibration method, the robot holds a calibration board in different fields of view of the camera at different poses, and the computing device records the poses of the robot and the poses of the calibration board in the camera's coordinate system. The computing device then uses these data to obtain the hand-eye relationship between the robot and the camera.

[0027] It can be seen that the traditional hand-eye calibration method requires a lot of human effort and experience. Further, the traditional hand-eye calibration method is limited to a scenario in which the relative pose of the robot and the camera is fixed during work. There is therefore a need for a solution that handles the case where the robot and the camera are not fixed and that avoids using the calibration board. The present disclosure proposes a new solution for pose determination. The proposed solution does not require the calibration board, and can improve the user experience for performing the hand-eye calibration, especially for the unfixed camera and/or the unfixed robot.

[0028] FIG. 1 illustrates a schematic diagram of an example environment 100 in which a plurality of embodiments of the present disclosure can be implemented. The example environment 100 is only illustrative and is not intended to suggest any limitations as to scope of use or functionality of embodiments of the disclosure described herein.

[0029] As shown, the example environment 100 comprises an electronic device 102, a robot 104 and a camera 106. An example of the electronic device 102 may be a server or a computer in a remote cloud. Another example of the electronic device 102 may be a mobile phone, a smart phone or a tablet. In some example embodiments, the camera 106 may be separated from the electronic device 102 and the camera 106 and the electronic device 102 can communicate with each other via wired or wireless connections. In some example embodiments, the camera 106 may be integrated in the electronic device 102.

[0030] The camera 106 can move; for example, the camera 106 may move to the place shown by camera 108. A user may hold the camera 106 and move it. In some cases, the user may use a phone or tablet with one or more cameras, and in this case, the camera 106 and the electronic device 102 can be considered as a whole.

[0031] The camera 106 may capture one or more images of the robot 104 and send the one or more images to the electronic device 102. In some example embodiments, the camera 106 may capture one or more videos of the robot 104 and send the one or more videos to the electronic device 102. Therefore, the one or more images and the one or more videos may be collectively referred to as visual information 112.

[0032] The electronic device 102 may receive the visual information 112 from the camera 106. In the case that the camera 106 is integrated in the electronic device 102, a module or program may be designed to receive the visual information 112 via the internal bus. The electronic device 102 may also receive the joint angle information 110 from the robot. The joint angle information 110 may refer to the angles of joints of the robot 104.

[0033] The electronic device 102 may align the visual information 112 and the joint angle information 110. For example, each frame of the visual information 112 may have a time stamp. The time stamp may indicate the time at which the frame is taken. The joint angle information 110 may also have a similar time stamp indicating an angle of a joint and its corresponding time. The electronic device 102 may align the visual information 112 and the joint angle information 110 based on the time stamps. It is to be understood that there may be other implementations to align the visual information 112 and the joint angle information 110.

[0034] The electronic device 102 may use the visual information 112 and the joint angle information 110 to determine the relative pose between the camera 106 and the robot 104. For example, the electronic device 102 may determine a pose of the camera 106 relative to the robot 104. In some example embodiments, if the camera moves, the pose of the camera 106 may be updated accordingly.

[0035] FIG. 2 illustrates an example scenario 200 in which some embodiments of the present disclosure can be implemented. An electronic device 202 in FIG. 2 may correspond to the electronic device 102 in FIG. 1. A robot 204 in FIG. 2 may correspond to the robot 104 in FIG. 1. A camera 206 in FIG. 2 may correspond to the camera 106 in FIG. 1.

[0036] As shown in FIG. 2, the camera 206 may have a field of view between a dashed line 210 and a dashed line 212. The robot 204 is in the field of view of the camera 206. Thus, the camera 206 can see the whole outline of the robot 204. The camera 206 can send the visual information of the robot 204 to the electronic device 202. The electronic device 202 may use the visual information to determine the joints of the robot 204. The determined positions of joints of the robot 204 may be referred to as a first set of keypoints. For example, the position coordinates of the joints 220, 222 and 224 in the camera base may be referred to as a first set of keypoints.

[0037] The electronic device 202 may further obtain angles of the joints 220, 222 and 224 from the robot 204. The electronic device 202 may determine the time at which the visual information is captured. The electronic device 202 may retrieve the corresponding angles of the joints 220, 222 and 224 at that time. For example, if an image of the robot 204 is taken by the camera 206 at 15:00, then the values of the angles of the joints 220, 222 and 224 should be the values of the angles of the joints 220, 222 and 224 at 15:00.

[0038] In some example embodiments, there may be a tolerance threshold. For example, the tolerance threshold may be 100 ms. That is, if an image of the robot 204 is taken by the camera 206 at 15:00, then the values of the angles of the joints 220, 222 and 224 should be the values of the angles of the joints 220, 222 and 224 between 15:00 - 100 ms and 15:00 + 100 ms.

[0039] FIG. 3 illustrates a flowchart of an example process 300 for a hand-eye calibration in accordance with some embodiments of the present disclosure. For the purpose of better description, FIG. 3 will be described with reference to FIG. 1.
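The time-stamp alignment with a tolerance threshold described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names `nearest_joint_sample` and `align`, the use of seconds as the time unit, and the choice of the nearest sample inside the tolerance window are all assumptions made for the example.

```python
from bisect import bisect_left

def nearest_joint_sample(frame_time, joint_times, joint_angles, tolerance=0.100):
    """Return the joint-angle sample closest in time to frame_time.

    joint_times must be sorted ascending and joint_angles[i] belongs to
    joint_times[i]. Returns None when no sample lies within +/- tolerance
    seconds of the frame (hypothetical policy, chosen for this sketch).
    """
    if not joint_times:
        return None
    i = bisect_left(joint_times, frame_time)
    # Only the sample just before and the sample just after can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(joint_times)]
    best = min(candidates, key=lambda j: abs(joint_times[j] - frame_time))
    if abs(joint_times[best] - frame_time) > tolerance:
        return None
    return joint_angles[best]

def align(frame_times, joint_times, joint_angles, tolerance=0.100):
    """Pair each frame index with its matching joint angles, skipping
    frames that have no joint sample inside the tolerance window."""
    pairs = []
    for idx, t in enumerate(frame_times):
        angles = nearest_joint_sample(t, joint_times, joint_angles, tolerance)
        if angles is not None:
            pairs.append((idx, angles))
    return pairs
```

Frames without a matching joint sample are dropped here rather than interpolated; interpolating between the two neighbouring samples would be another reasonable choice.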

[0040] At 304, the electronic device 102 may synchronize the time in the robot 104 and the time in the camera 106, such that the time in the robot 104, the camera 106, and the electronic device 102 is uniform. At 302, the electronic device 102 may obtain the images and / or the videos of the robot from the camera 106. At 306, the electronic device 102 may obtain the angle values of the joints from the robot 104.

[0041] At 308, the electronic device 102 may reconstruct 3D keypoints of the robot 104 in the camera coordinate system. For example, the electronic device 102 may use an AI model to determine the joints of the robot 104, and the electronic device 102 may determine the position coordinates of the joints in the camera coordinate system as the first set of the keypoints.

[0042] At 310, the electronic device 102 may calculate 3D keypoints of the robot 104 in the robot coordinate system based on joint angles and a model of the robot. For example, the electronic device 102 may use joint angles and the robot Denavit-Hartenberg (DH) model to determine the position coordinates of the joints in the robot coordinate system as the second set of the keypoints.
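As an illustration of step 310, the following sketch computes keypoints in the robot coordinate system from joint angles and DH parameters. It assumes revolute joints and the standard (distal) DH convention, and it takes the origin of each DH link frame as a keypoint; the function names and the parameter layout `(d, a, alpha)` are hypothetical and not taken from the disclosure.

```python
import math

def dh_matrix(theta, d, a, alpha):
    """Homogeneous transform of one link in the standard DH convention:
    Rot_z(theta) * Trans_z(d) * Trans_x(a) * Rot_x(alpha)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(m1, m2):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(m1[i][k] * m2[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def keypoints_in_robot_frame(joint_angles, dh_params):
    """Return the origin of every DH link frame, expressed in the robot base.

    dh_params is a list of (d, a, alpha) tuples, one per revolute joint
    (assumed layout for this sketch).
    """
    transform = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    points = []
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        transform = matmul4(transform, dh_matrix(theta, d, a, alpha))
        points.append((transform[0][3], transform[1][3], transform[2][3]))
    return points
```

For a planar two-link arm with unit link lengths, `keypoints_in_robot_frame([math.pi / 2, -math.pi / 2], [(0, 1, 0), (0, 1, 0)])` gives points close to (0, 1, 0) and (1, 1, 0). Whether DH frame origins coincide with the joint centers detected in the images depends on how the keypoints are defined for a given robot.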

[0043] At 312, the electronic device 102 may establish equations based on rigid body kinematics and solve the equations. At 314, the electronic device 102 may get the relative pose between the camera 106 and the robot 104 by solving the equations and further may send the relative pose to the robot 104.

[0044] By implementing the process 300, auxiliary tools for calibration such as the calibration board can be avoided. It can save cost for users and can be used directly with two-dimensional (2D) cameras and smart phones with cameras. The user does not need to know the internal parameters of the cameras. The process 300 does not need to jog the robot to calibration poses, which simplifies the hand-eye calibration process and reduces the technical requirement for the user, such that even people without experience in robotics and computer vision can complete the hand-eye calibration. Therefore, when the pose of the robot and/or the camera is not fixed, such as a mobile robot with a fixed camera, a fixed robot with a handheld camera and a mobile robot with a handheld camera, the process 300 is still applicable.

[0045] FIG. 4 illustrates an example block diagram 400 of a hand-eye calibration with 3D keypoint detection in accordance with some embodiments of the present disclosure. FIG. 4 may be a more detailed example of FIG. 3. In general, the hand-eye calibration with 3D keypoint detection may detect and reconstruct the 3D keypoints of the robot. The 3D keypoints of the robot may be a series of feature points defined on the robot body, such as the center of the rotation axis (i.e., a joint of the robot). The 3D keypoint reconstruction in the camera coordinate system may refer to the analysis of robot images or videos by an AI model.

[0046] On the other hand, the 3D keypoints in the robot coordinate system may be calculated by robot forward kinematics with joint angles of the robot and the DH model. When the positions of the 3D keypoints of the robot in both the camera coordinate system and the robot coordinate system are known, the equations based on rigid body kinematics may be established and solved, for example, when the number of nonlinear keypoints is equal to or greater than four.

[0047] A camera may capture images and / or videos 404 of a robot 402 and may send the images and / or videos 404 to an electronic device. The electronic device may obtain the joint angles 412 of the robot 402. The electronic device may use a neural network 406 to extract joints of the robot 402 as the 3D keypoints 408 of the robot 402 in the camera base. The electronic device may use robot forward kinematics 416 and robot DH model 414 to determine the 3D keypoints 418 of the robot 402 in the robot base.

[0048] The electronic device may use both the 3D keypoints 408 of the robot 402 in the camera base and the 3D keypoints 418 of the robot 402 in the robot base to establish rigid transform function 410, which may be written as equation (1):

$$A \times P_{\text{cam}} = P_{\text{rob}} \qquad (1)$$

wherein $A$ represents the pose of the camera in the robot coordinate system; $P_{\text{cam}}$ represents the 3D keypoints of the robot in the camera coordinate system; and $P_{\text{rob}}$ represents the 3D keypoints of the robot in the robot coordinate system.

[0049] By solving equation (1), the pose of the camera in the robot coordinate system can be obtained. That is, the hand-eye relationship 420 can be determined. In this way, a calibration board is no longer required in the hand-eye calibration process, which reduces dependency on auxiliary tools and saves cost for users. Further, the proposed method is suitable for 2D cameras, and the internal parameters of the camera are not required. Moreover, there is no need to jog the robot to special poses for calibration. The calibration can be implemented while the robot is working on other tasks. A user without experience in robotics and computer vision can complete the hand-eye calibration, and thus user experience can be improved.
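The text does not name a particular solver for equation (1). One common way to fit a rigid transform to two sets of corresponding 3D points is the SVD-based Kabsch method, sketched below with NumPy. This is a swapped-in, illustrative technique rather than the disclosed one; the function name `solve_rigid_transform` is hypothetical, and the sketch assumes both keypoint sets are in the same metric units, in matching order, and not all collinear.

```python
import numpy as np

def solve_rigid_transform(p_cam, p_rob):
    """Least-squares 4x4 transform A with A @ [p_cam; 1] ~= [p_rob; 1].

    p_cam, p_rob: arrays of shape (N, 3) holding corresponding keypoints
    in the camera and robot coordinate systems (Kabsch / SVD method).
    """
    p_cam = np.asarray(p_cam, dtype=float)
    p_rob = np.asarray(p_rob, dtype=float)
    c_cam = p_cam.mean(axis=0)
    c_rob = p_rob.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (p_cam - c_cam).T @ (p_rob - c_rob)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = c_rob - rotation @ c_cam
    a = np.eye(4)
    a[:3, :3] = rotation
    a[:3, 3] = translation
    return a
```

The returned matrix stacks the rotation and translation of the camera frame expressed in the robot frame, which corresponds to $A$ in equation (1) when the keypoints are written in homogeneous coordinates.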

[0050] FIG. 5 illustrates an example block diagram of an AI model 500 for detecting 3D keypoints of a robot in accordance with some embodiments of the present disclosure. The architecture of the AI model 500 may comprise three components. The first component may be the backbone. The backbone may be an efficient neural network structure designed for mobile and embedded devices. The backbone can enhance performance through inverted residual structures and lightweight depth-wise separable convolutions, considering memory efficiency and computational costs.

[0051] The second component may be the header. The header may receive feature maps from the backbone as input and output feature maps of different dimensions through their respective convolutional operations. In an example, the header may comprise the keypoint regression. The keypoint regression may output a tensor of shape [N, 2K, H, W], with K key points each represented by x, y coordinates; hence there are 2K data points, representing the offsets of the key points from the center point.

[0052] The third component may be the post process. The post process may be responsible for converting the output of the header into the final key point coordinates and scores. It may include decoding of the keypoint regression, filtering and sorting of key points, and ultimately may output the coordinates and confidence scores of the key points.
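A post process of this kind could look roughly like the sketch below, which decodes one image's regression output into scored key point coordinates. Several details are assumptions made for illustration and are not stated in the text: the channel layout (channel 2k holds the x offset and channel 2k+1 the y offset of key point k), the availability of a per-key-point score array, a known center cell, and the `stride` that maps feature-map cells back to image pixels.

```python
import numpy as np

def decode_keypoints(offsets, scores, center, stride=1.0, score_threshold=0.5):
    """Decode centre-relative offsets into key point coordinates.

    offsets: array of shape [2K, H, W] for one image (assumed layout).
    scores:  array of shape [K] with one confidence per key point (assumed).
    center:  (cx, cy) integer cell of the object centre in the feature map.
    Returns (index, x, y, score) tuples, filtered by score_threshold and
    sorted by descending score.
    """
    offsets = np.asarray(offsets, dtype=float)
    scores = np.asarray(scores, dtype=float)
    cx, cy = center
    num_keypoints = offsets.shape[0] // 2
    # Read the 2K offsets stored at the centre cell: rows are (dx, dy).
    at_center = offsets[:, cy, cx].reshape(num_keypoints, 2)
    xy = (np.array([cx, cy], dtype=float) + at_center) * stride
    keep = np.flatnonzero(scores >= score_threshold)
    order = keep[np.argsort(-scores[keep])]
    return [(int(i), float(xy[i, 0]), float(xy[i, 1]), float(scores[i]))
            for i in order]
```

The threshold keeps only confident key points, which matches the filtering and sorting role described for the post process; a real model would also need a step that locates the center point, which is omitted here.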

[0053] As shown in FIG. 5, an image 502 may be input into the backbone 520. The image 502 may be adjusted to the expected resolution of the model and undergo necessary scaling and padding in a preprocessing module, which is not shown. The backbone 520 may extract features and use residual networks 504, 506 and 508 to extract shallow features in different sizes and fuse the features and the shallow features to obtain the fused features.

[0054] The fused features may be down-sampled to reduce their size. For example, block 512 shows an N×N feature, which is smaller than the M×M feature in block 510. The down-sampled features may be input to the keypoint regression 514.

[0055] Then, the AI model 500 may output a tensor representing the predicted key point coordinates and scores. In some cases, the AI model 500 may select keypoints with high confidence scores based on key point score thresholds and calculate the connections between keypoints for visualization. In some cases, keypoints and connections may be visualized on the input image to display the predicted results.

[0056] FIG. 6 illustrates a flowchart of an example method 600 for pose determination in accordance with some embodiments of the present disclosure. For the purpose of better description, FIG. 6 will be described with reference to FIG. 1.

[0057] At block 602, the electronic device 102 determines a first set of keypoints of a robot based on visual information of the robot obtained via a camera. For example, the electronic device 102 may determine one or more coordinates representing the one or more joints in a camera coordinate system as the first set of keypoints.

[0058] At block 604, the electronic device 102 determines a second set of keypoints of the robot based on joint angle information obtained from the robot. For example, the electronic device 102 may determine one or more coordinates representing the one or more joints in a robot coordinate system as the second set of keypoints. The joint angle information is aligned with the visual information. For example, the joint angle information is synchronized with the visual information in time domain.

[0059] At block 606, the electronic device 102 determines a pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints. By implementing the embodiments of the method 600, no calibration board is required in calibration, and thus the user experience for performing a hand-eye calibration related to a robot and a camera can be improved.

[0060] Reference is made to FIG. 7, which illustrates a block diagram of an example apparatus 700 for pose determination in accordance with some embodiments of the present disclosure. The apparatus 700 comprises a first determining module 702 configured to determine, by an electronic device, a first set of keypoints of a robot based on visual information of the robot obtained via a camera. The apparatus further comprises a second determining module 704 configured to determine, by the electronic device, a second set of keypoints of the robot based on joint angle information obtained from the robot, wherein the joint angle information is aligned with the visual information. The apparatus further comprises a third determining module 706 configured to determine, by the electronic device, a pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints.

[0061] In some example embodiments, the visual information of the robot may be obtained by obtaining one or more images comprising the robot captured by the camera as the visual information. In some example embodiments, the visual information of the robot may be obtained by obtaining one or more videos comprising the robot captured by the camera as the visual information. In some example embodiments, the visual information of the robot may be obtained by both the above items.

[0062] In some example embodiments, the one or more images may comprise one or more joints of the robot captured by the camera from a plurality of fields of view. In some example embodiments, the one or more videos may comprise one or more joints of the robot captured by the camera from a plurality of fields of view. In some example embodiments, both of the above may apply.

[0063] In some example embodiments, the apparatus 700 may comprise a first module configured to align the joint angle information with the visual information. The first module may comprise a second module configured to initiate a connection among the camera, the robot and the electronic device; a third module configured to synchronize the camera and the robot in time domain via the connection; and a fourth module configured to align the joint angle information and the visual information based on the synchronized time domain.

[0064] In some example embodiments, the apparatus 700 may further comprise a fifth module configured to obtain one or more angles of the one or more joints as the joint angle information, wherein one or more timepoints at which the one or more angles are obtained correspond to one or more timepoints at which the one or more images or the one or more videos are obtained.

[0065] In some example embodiments, the fourth module may comprise a sixth module configured to obtain a first set of timepoints of the one or more angles of the one or more joints; obtain a second set of timepoints of the one or more images or the one or more videos; and map the first set of timepoints and the second set of timepoints based on the synchronized time domain.
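
The mapping between the two sets of timepoints can be illustrated with a minimal sketch. It assumes that both sets are already expressed on the synchronized clock and sorted in ascending order, and it pairs each frame with the nearest joint-angle sample. The function name `map_timepoints` and the `max_gap` tolerance are hypothetical and are not taken from the disclosure.

```python
from bisect import bisect_left


def map_timepoints(joint_times, frame_times, max_gap=0.02):
    """Pair each frame timepoint with the nearest joint-angle timepoint.

    Both lists are assumed to be sorted and expressed in seconds on the
    same synchronized clock. Returns, for each frame, the index of the
    matched joint-angle sample, or None when the nearest sample is more
    than max_gap seconds away (max_gap is an illustrative assumption).
    """
    mapping = []
    for t in frame_times:
        i = bisect_left(joint_times, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(joint_times)]
        best = min(candidates, key=lambda j: abs(joint_times[j] - t), default=None)
        if best is not None and abs(joint_times[best] - t) > max_gap:
            best = None
        mapping.append(best)
    return mapping
```

Frames that receive `None` would simply be excluded from the keypoint sets in such a sketch, so that only time-aligned pairs contribute to the pose estimate.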

[0066] In some example embodiments, the first determining module 702 may comprise a seventh module configured to determine the one or more joints based on the visual information using an AI model; and determine one or more coordinates representing the one or more joints in a camera coordinate system as the first set of keypoints based on the visual information.

[0067] In some example embodiments, the second determining module 704 may comprise an eighth module configured to determine the one or more joints based on the joint angle information and a kinematics model of the robot; and determine one or more coordinates representing the one or more joints in a robot coordinate system as the second set of keypoints based on the joint angle information and the robot kinematics model.
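
As an illustration of how joint coordinates in the robot coordinate system may be derived from joint angles and a kinematics model, the following sketch applies forward kinematics to a simplified planar serial arm. The planar model, the function name `joint_keypoints`, and the `link_lengths` parameter are assumptions made for brevity; a real robot would use its own kinematic parameters.

```python
import numpy as np


def joint_keypoints(joint_angles, link_lengths):
    """Return joint positions in the robot base frame as an (N+1, 3) array.

    Simplified kinematics model (an assumption): every joint rotates about
    its local z axis and is followed by a rigid link along the local x axis.
    """
    transform = np.eye(4)
    points = [transform[:3, 3].copy()]  # base joint at the origin
    for theta, length in zip(joint_angles, link_lengths):
        c, s = np.cos(theta), np.sin(theta)
        rot_z = np.array([[c, -s, 0.0, 0.0],
                          [s, c, 0.0, 0.0],
                          [0.0, 0.0, 1.0, 0.0],
                          [0.0, 0.0, 0.0, 1.0]])
        trans_x = np.eye(4)
        trans_x[0, 3] = length
        # Chain the joint rotation and the link offset onto the running pose.
        transform = transform @ rot_z @ trans_x
        points.append(transform[:3, 3].copy())
    return np.array(points)
```

For example, `joint_keypoints([np.pi / 2, -np.pi / 2], [1.0, 0.5])` places the second joint at approximately (0, 1, 0) and the end of the second link at approximately (0.5, 1, 0).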

[0068] In some example embodiments, the third determining module 706 may comprise a ninth module configured to determine a transform coordinate from the first set of keypoints to the second set of keypoints based on a kinematics model of a rigid body; and determine the transform coordinate as the pose of the camera relative to the robot.
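
One common way to compute a rigid-body transform between two sets of corresponding 3D points is the SVD-based least-squares method often called the Kabsch algorithm. The disclosure does not name a specific algorithm, so the sketch below is an assumption, and the function name `estimate_rigid_transform` is hypothetical. It assumes that the i-th camera-frame keypoint and the i-th robot-frame keypoint describe the same joint at the same aligned timepoint, and that at least three non-collinear points are available.

```python
import numpy as np


def estimate_rigid_transform(cam_points, robot_points):
    """Least-squares rotation R and translation t with robot ~= R @ cam + t.

    cam_points and robot_points are (N, 3) arrays of corresponding
    keypoints in the camera and robot coordinate systems.
    """
    P = np.asarray(cam_points, dtype=float)
    Q = np.asarray(robot_points, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

Under these assumptions, `R` and `t` express the camera's orientation and origin in the robot coordinate system, which is one way to represent the pose of the camera relative to the robot.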

[0069] In some example embodiments, the electronic device may comprise at least one of a mobile phone or a tablet, and the camera may be integrated in the electronic device. In some example embodiments, the camera may be separated from the electronic device.

[0070] In some example embodiments, the apparatus 700 may further comprise a tenth module to update the pose of the camera relative to the robot in response to determining that the camera or the robot moved a distance.
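
A minimal sketch of such an update trigger is given below. The disclosure does not specify how the movement is detected or how large the distance must be, so the position inputs, the function name `pose_update_needed`, and the 0.05 m default threshold are all illustrative assumptions.

```python
import math


def pose_update_needed(previous_position, current_position, min_distance=0.05):
    """Return True when the tracked position has moved at least min_distance.

    Positions are (x, y, z) tuples in metres in any common frame; the
    threshold value is an assumption, not a value from the disclosure.
    """
    displacement = math.dist(previous_position, current_position)
    return displacement >= min_distance
```

When the check returns `True`, the keypoint determination and pose estimation would be repeated with fresh visual information and joint angle information.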

[0071] By implementing the example embodiments of FIG. 7, similar to FIG. 6, no calibration board is required in calibration, and thus the user experience for performing a hand-eye calibration related to a robot and a camera can be improved. In some example embodiments, an AI-based 3D keypoint reconstruction technology is used to realize robot hand-eye calibration. An AI model that has high accuracy and is suitable for potential calibration views can be used to meet the needs of hand-eye calibration with unfixed cameras and/or robots, including high accuracy demands.

[0072] FIG. 8 illustrates a block diagram illustrating an electronic device 800 in accordance with some embodiments of the present disclosure. As indicated, the device 800 includes a central processing unit (CPU) 801, which can execute various appropriate actions and processing based on the computer program instructions stored in a read-only memory (ROM) 802 or the computer program instructions loaded into a random access memory (RAM) 803 from a storage unit 808. The RAM 803 also stores all kinds of programs and data required by operating the electronic device 800. CPU 801, ROM 802 and RAM 803 are connected to each other via a bus 804, to which an input / output (I / O) interface 805 is also connected.

[0073] A plurality of components in the device 800 are connected to the I / O interface 805, comprising: an input unit 806, such as a keyboard, a mouse and the like; an output unit 807, such as various types of displays, loudspeakers and the like; a storage unit 808, such as a storage disk, an optical disk and the like; and a communication unit 809, such as a network card, a modem, a wireless communication transceiver and the like. The communication unit 809 allows the device 800 to exchange information / data with other devices through computer networks such as Internet and / or various telecommunication networks.

[0074] Each procedure and processing described above, such as the method 600, can be executed by a processing unit 801. For example, in some embodiments, the method 600 can be implemented as computer software programs, which are tangibly included in a machine-readable medium, such as a storage unit 808. In some embodiments, the computer-readable medium is a non-transitory computer-readable medium. In some embodiments, the computer program can be partially or completely loaded and / or installed to the device 800 via the ROM 802 and / or the communication unit 809. When the computer program is loaded to the RAM 803 and executed by the CPU 801, one or more steps of the above described method 600 are implemented. Alternatively, in other embodiments, the CPU 801 may also be configured in any proper manner to implement the above process / method.

[0075] The present disclosure may be a method, a device, a system and / or a computer program product. The computer program product can include a computer-readable storage medium loaded with computer-readable program instructions thereon for executing various aspects of the present disclosure.

[0076] The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include: a portable computer diskette, a hard disk, a random access memory (RAM) , a read-only memory (ROM) , an erasable programmable read-only memory (EPROM or Flash memory) , a static random access memory (SRAM) , a portable compact disc read-only memory (CD-ROM) , a digital versatile disk (DVD) , a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination thereof. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable) , or electrical signals transmitted through a wire.

[0077] Computer readable program instructions described herein can be downloaded to respective computing / processing devices from a computer readable storage medium, or downloaded to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and / or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and / or edge servers. A network adapter card or network interface in each computing / processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing / processing device.

[0078] Computer readable program instructions for carrying out operations of the present disclosure may be assembly instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) . In some embodiments, by means of state information of the computer readable program instructions, an electronic circuitry including, for example, programmable logic circuitry (PLC) , field-programmable gate arrays (FPGA) , or programmable logic arrays (PLA) can be personalized to execute the computer readable program instructions, thereby implementing various aspects of the present disclosure.

[0079] Aspects of the present disclosure are described herein with reference to flowchart and / or block diagrams of methods, apparatus (systems) , and computer program products according to embodiments of the present disclosure. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer readable program instructions.

[0080] These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which are executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions / acts specified in the flowchart and / or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and / or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function / act specified in the flowchart and / or block diagram block or blocks.

[0081] The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which are executed on the computer, other programmable apparatus, or other device implement the functions / acts specified in the flowchart and / or block diagram block or blocks.

[0082] The flowchart and block diagrams illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, snippet, or portion of codes, which comprises one or more executable instructions for implementing the specified logical function (s) . In some alternative implementations, the functions noted in the block may be implemented in an order different from those illustrated in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and / or flowchart illustration, and combinations of blocks in the block diagrams and / or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

[0083] Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the present disclosure, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.

[0084] A person of ordinary skill in the art may be aware that, in combination with the examples described in the embodiments disclosed in this specification, units and algorithm steps can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.

[0085] It may be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiment. Details are not described herein again.

[0086] In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiment is merely an example. For example, the unit division is merely logical function division and may be other division in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.

[0087] The units described as separate parts may be or may not be physically separate, and parts displayed as units may be or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of the embodiments.

[0088] In addition, functional units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.

[0089] When the functions are implemented in the form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer readable storage medium. Based on such an understanding, the technical solutions in this application essentially, or the part contributing to the prior art, or some of the technical solutions may be implemented in the form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

[0090] The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

1. A method for pose determination, comprising:
determining, by an electronic device, a first set of keypoints of a robot based on visual information of the robot obtained via a camera;
determining, by the electronic device, a second set of keypoints of the robot based on joint angle information obtained from the robot, wherein the joint angle information is aligned with the visual information; and
determining, by the electronic device, a pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints.

2. The method of claim 1, wherein the visual information of the robot is obtained by at least one of the following:
obtaining one or more images comprising the robot captured by the camera as the visual information; or
obtaining one or more videos comprising the robot captured by the camera as the visual information.

3. The method of claim 2, wherein at least one of the following:
the one or more images comprise one or more joints of the robot captured by the camera from a plurality of fields of view; or
the one or more videos comprise one or more joints of the robot captured by the camera from a plurality of fields of view.

4. The method of claim 3, wherein the joint angle information is aligned with the visual information by the following:
initiating a connection among the camera, the robot and the electronic device;
synchronizing, by the electronic device, the camera and the robot in time domain via the connection; and
aligning, by the electronic device, the joint angle information and the visual information based on the synchronized time domain.

5. The method of claim 4, wherein the joint angle information is obtained from the robot by the following:
obtaining one or more angles of the one or more joints as the joint angle information, wherein one or more timepoints at which the one or more angles are obtained correspond to one or more timepoints at which the one or more images or the one or more videos are obtained.

6. The method of claim 5, wherein aligning, by the electronic device, the joint angle information and the visual information based on the synchronized time domain comprises:
obtaining a first set of timepoints of the one or more angles of the one or more joints;
obtaining a second set of timepoints of the one or more images or the one or more videos; and
mapping the first set of timepoints and the second set of timepoints based on the synchronized time domain.

7. The method of claim 3, wherein determining, by the electronic device, the first set of keypoints of the robot based on the visual information of the robot obtained via the camera comprises:
determining the one or more joints based on the visual information using an artificial intelligence (AI) model; and
determining one or more coordinates representing the one or more joints in a camera coordinate system as the first set of keypoints based on the visual information.

8. The method of claim 3, wherein determining, by the electronic device, the second set of keypoints of the robot based on the joint angle information obtained from the robot comprises:
determining the one or more joints based on the joint angle information and a kinematics model of the robot; and
determining one or more coordinates representing the one or more joints in a robot coordinate system as the second set of keypoints based on the joint angle information and the robot kinematics model.

9. The method of claim 1, wherein determining the pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints comprises:
determining a transform coordinate from the first set of keypoints to the second set of keypoints based on a kinematics model of rigid body; and
determining the transform coordinate as the pose of the camera relative to the robot.

10. The method of claim 1, wherein at least one of the following:
the electronic device comprises at least one of a mobile phone or a tablet, and the camera is integrated in the electronic device; or
the camera is separated from the electronic device.

11. The method of claim 1, further comprising:
in response to determining that the camera or the robot moved a distance, updating, by the electronic device, the pose of the camera relative to the robot.

12. An apparatus for pose determination, comprising:
a first determining module configured to determine, by an electronic device, a first set of keypoints of a robot based on visual information of the robot obtained via a camera;
a second determining module configured to determine, by the electronic device, a second set of keypoints of the robot based on joint angle information obtained from the robot, wherein the joint angle information is aligned with the visual information; and
a third determining module configured to determine, by the electronic device, a pose of the camera relative to the robot based on the first set of keypoints and the second set of keypoints.

13. An electronic device, comprising:
a processor; and
a memory coupled to the processor, wherein the memory has instructions stored therein, and the instructions, when executed by the processor, cause the device to execute actions of any of claims 1-11.

14. A computer-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform a method of any of claims 1-11.

15. A computer program product having instructions stored therein, which when executed by a processor, cause the processor to perform a method of any of claims 1-11.