Method and apparatus for annotating images of objects recorded by means of a camera

CN117136384B · Active · Publication Date: 2026-06-30 · VOLKSWAGEN AG

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: VOLKSWAGEN AG
Filing Date: 2022-03-24
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

In the prior art, images recorded by a camera are annotated manually, which introduces human error, or markers are placed on the object, which alters the object and thus changes the initial conditions.

Method used

A camera is mounted on a robot arm. The position and orientation of the camera relative to the end effector are determined through hand-eye calibration, the key points are located in world coordinates by guiding the end effector to a first key point on the object, and each recorded image is annotated by projecting the key points into the image using projective geometry, thus avoiding manual annotation.

Benefits of technology

It automates image annotation, reduces human error, and can efficiently process large numbers of images, making it suitable for training neural networks.

✦ Generated by Eureka AI based on patent content.

Abstract

The present invention relates to a method (100) for annotating an image (31) of an object (30) recorded by means of a camera (14), wherein the camera (14) is arranged on a robotic arm (12), and wherein the robotic arm (12) includes an end effector (13). The method (100) includes the following steps: determining (101) the position and orientation of the camera (14) relative to the end effector (13); selecting (103) a first key point (20) on the object (30); providing (104) information on additional key points in the object coordinates relative to the first key point (20); guiding (108) the end effector (13) to the position of the first key point (20); determining (111) the position and orientation of the end effector (13) at the first key point (20) in world coordinates; determining (112) the position of the first key point (20) in world coordinates; determining (113) the positions of the other key points in world coordinates; changing (114) the position and / or orientation of the end effector (13) and recording an image (31) by means of the camera (14); determining (115) the camera coordinates of the key points by transforming the world coordinates of the key points to the camera coordinate system of the camera (14); and determining (116) the camera image coordinates of all key points from the camera coordinates of all key points by means of projection.

Description

Technical Field

[0001] The present invention relates to a method and an apparatus for annotating images recorded by means of a camera, according to the independent claims.

Background Technology

[0002] It is known in the prior art to manually annotate images of objects recorded by a camera. Specifically, this involves a person determining and labeling the coordinates of multiple predetermined points on the object in the image. Alternatively, markers can be placed at the corresponding points on the object so that these points can be identified and labeled in the image. However, this results in a change to the initial conditions, as the object is altered by the placement of the markers.

Summary of the Invention

[0003] The present invention is based on the objective of automating a method for annotating images recorded by means of a camera.

[0004] This objective is achieved by a method for annotating images of an object recorded by a camera, wherein the camera is mounted on a robotic arm, and wherein the robotic arm includes an end effector (sometimes also called a terminal actuator).

[0005] The method includes determining the position and orientation of the camera relative to the end effector. The camera in particular has a camera coordinate system. The robotic arm, which is preferably arranged on a mobile robot, has an end effector. The end effector is, in particular, the free end of the robotic arm, in other words its last arm segment, which may in particular include a gripper arm. In humanoid robots, the end effector may be, for example, a hand. The world coordinates of the end effector are known, and determining the position and orientation of the camera relative to the end effector establishes the relationship between the world coordinates of the end effector and the camera. In other words, an association is established between the world coordinate system and the camera coordinate system.

[0006] Determining the position and orientation of the camera relative to the end effector can, in particular, include hand-eye calibration. A calibration object, such as an image with a checkerboard pattern, is used for this purpose in particular. The camera records the calibration object from different positions and orientations, and the recorded images are stored along with those positions and orientations. Since the construction of the calibration object is known (e.g., in the case of a checkerboard pattern, the precise pattern and the dimensions of each square), the relationship between the position and orientation of the end effector and the camera perspective (in other words, the camera coordinate system) can be determined. The world coordinates of the camera can therefore be derived, which amounts to an extrinsic calibration of the camera. In particular, optical errors of the camera can also be determined in this manner. For example, lens distortion can be detected from the distortion of the calibration object in the recorded image. Since imaging depends on such optical errors (e.g., lens distortion), an intrinsic calibration of the camera can also be performed.
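To illustrate how the result of the hand-eye calibration is used, the following minimal Python sketch chains the end-effector pose in world coordinates with the camera pose relative to the end effector to obtain the camera pose in world coordinates. The 4x4 homogeneous-matrix representation, the names `matmul4`, `T_world_ee` and `T_ee_cam`, and all numeric values are assumptions chosen for illustration; none of them is taken from the patent.

```python
# Sketch only: pose chaining with 4x4 homogeneous transforms (pure Python).
# All names and values below are hypothetical examples.

def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# End-effector pose in world coordinates, as a robot controller might report it:
# rotated 90 degrees about the z axis and located at (1.0, 2.0, 0.5).
T_world_ee = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 0.5],
    [0.0,  0.0, 0.0, 1.0],
]

# Camera pose relative to the end effector (the hand-eye calibration result):
# same orientation, shifted 0.1 along the end effector's x axis.
T_ee_cam = [
    [1.0, 0.0, 0.0, 0.1],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Camera pose in world coordinates: world <- end effector <- camera.
T_world_cam = matmul4(T_world_ee, T_ee_cam)
camera_position_world = [T_world_cam[i][3] for i in range(3)]
print(camera_position_world)
```

Because the camera is rigidly attached to the end effector, `T_ee_cam` only has to be determined once; `T_world_cam` can then be recomputed from the current end-effector pose after every movement.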

[0007] The method includes selecting a first keypoint on the object. Here, the first keypoint is freely chosen. Furthermore, the method includes providing information about additional keypoints relative to the first keypoint in the object coordinate system. The keypoints are thus predefined points on the object.

[0008] Providing this information may in particular include measuring the positions of the additional key points on the object relative to the first key point, for example manually. Furthermore, the method may include utilizing existing information about the dimensions of the object. The relative positions can then be determined from this existing information, such as dimensions from technical drawings or DIN standards.

[0009] The method involves guiding the end effector to the location of the first key point. In other words, the end effector is brought to the precise location of the first key point.

[0010] Guiding to the location of the first key point includes, in particular, manually moving the end effector to the first key point.

[0011] The end effector can be, in particular, a plug, and the object can be a socket. Here, the plug can be inserted into the socket, whereby the end effector is guided to the first key point. The robot can, in particular, be a mobile charging robot.

[0012] The method involves determining the position and orientation of the end effector at the first keypoint in world coordinates. The world coordinates of the first keypoint can then be deduced. Since the position and orientation of the end effector in world coordinates are known, or can be determined in a simple manner after the robot and / or robot arm has moved, the position of the first keypoint can be read out. In particular, it has been determined beforehand exactly which part of the end effector will be guided to the first keypoint. The location of that part on the end effector is known, and thus its world coordinates are always known, or can be derived after the corresponding movement of the robot arm or robot. Therefore, the world coordinates of that part correspond to the world coordinates of the first keypoint.
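The readout described above can be sketched in a few lines of Python: the known location of the relevant part on the end effector is transformed with the end-effector pose into world coordinates. The function name `transform_point`, the variable `part_in_ee`, and all numbers are hypothetical and serve only as an illustration under the assumption that poses are given as 4x4 homogeneous matrices.

```python
# Sketch only: world coordinates of the end-effector part that touches the
# first keypoint. All names and values are hypothetical examples.

def transform_point(T, p):
    """Apply a 4x4 homogeneous transform T (nested lists) to a 3D point p."""
    return [sum(T[i][k] * p[k] for k in range(3)) + T[i][3] for i in range(3)]

# End-effector pose in world coordinates while it rests at the first keypoint.
T_world_ee = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 0.5],
    [0.0,  0.0, 0.0, 1.0],
]

# Known location of the contacting part, expressed in end-effector coordinates
# (for example a point 5 cm along the end effector's z axis).
part_in_ee = [0.0, 0.0, 0.05]

# The world coordinates of that part are the world coordinates of the keypoint.
first_keypoint_world = transform_point(T_world_ee, part_in_ee)
print(first_keypoint_world)
```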

[0013] Since the positions of the other keypoints relative to the first keypoint are available in object coordinates, the positions of the other keypoints in world coordinates can be determined based on the position of the first keypoint in world coordinates.
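A minimal sketch of this step is given below. It assumes that the orientation of the object coordinate system in the world coordinate system is available as a rotation matrix `R_world_obj` (for example derived from the end-effector orientation while it rests at the first keypoint); that assumption, the function names, and the numeric values are illustrative and not stated in this form in the patent.

```python
# Sketch only: world coordinates of the other keypoints from the first keypoint
# and offsets given in object coordinates. Names and values are hypothetical.

def rotate(R, v):
    """Apply a 3x3 rotation matrix R (nested lists) to a 3D vector v."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def keypoints_in_world(first_kp_world, R_world_obj, offsets_obj):
    """Return world positions for offsets defined relative to the first keypoint."""
    result = []
    for offset in offsets_obj:
        rotated = rotate(R_world_obj, offset)
        result.append([first_kp_world[i] + rotated[i] for i in range(3)])
    return result

first_kp_world = [1.0, 2.0, 0.55]

# Assumed orientation of the object frame in the world frame (90 degrees about z).
R_world_obj = [
    [0.0, -1.0, 0.0],
    [1.0,  0.0, 0.0],
    [0.0,  0.0, 1.0],
]

# Offsets of two further keypoints relative to the first one, in object
# coordinates, e.g. measured by hand or taken from a technical drawing.
offsets_obj = [[0.0, -0.02, 0.0], [0.01, 0.0, 0.0]]

other_kps_world = keypoints_in_world(first_kp_world, R_world_obj, offsets_obj)
print(other_kps_world)
```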

[0014] In the next step, the method includes changing the position and / or orientation of the end effector and recording an image using the camera. This image is two-dimensional and has a 2D camera image coordinate system. The method may include transforming the world coordinates of the keypoints to the camera coordinate system. In this way, the camera coordinates of the keypoints (in other words, 3D camera coordinates) are determined, that is, the position of the keypoints relative to the camera. This step is performed using the previously determined association between the world coordinate system and the camera coordinate system.
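The transformation from world coordinates to camera coordinates can be sketched as follows, assuming the camera pose in world coordinates is available as a 4x4 homogeneous matrix `T_world_cam` with rotation part R and translation part t, so that a world point p maps to R^T (p - t). The function name and the example values are hypothetical.

```python
# Sketch only: express a world point in the camera coordinate system.
# T_world_cam and the sample point are hypothetical example values.

def world_to_camera(T_world_cam, p_world):
    """Return p_world in camera coordinates, i.e. R^T (p_world - t)."""
    d = [p_world[i] - T_world_cam[i][3] for i in range(3)]
    # Multiplying by the transpose of R inverts the rotation.
    return [sum(T_world_cam[k][i] * d[k] for k in range(3)) for i in range(3)]

# Camera located at (1, 2, 0) in the world, rotated 90 degrees about z.
T_world_cam = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0,  0.0, 0.0, 2.0],
    [0.0,  0.0, 1.0, 0.0],
    [0.0,  0.0, 0.0, 1.0],
]

keypoint_world = [1.0, 2.5, 2.0]
keypoint_cam = world_to_camera(T_world_cam, keypoint_world)
print(keypoint_cam)
```

In this example the camera's x axis points along the world y axis, so a keypoint 0.5 further along world y appears 0.5 along the camera's x axis.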

[0015] In another step, the method includes determining the camera image coordinates (in other words, 2D camera image coordinates) of all keypoints from their camera coordinates using projection. In other words, the camera image coordinates of all keypoints in the recorded image are determined from the 3D camera coordinates using projective geometry. In a further step, the determined locations can be labeled on the image. In this way, the recorded image is annotated.
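The patent does not specify the projection model. As one common possibility, the sketch below uses an ideal pinhole camera without lens distortion, where a point (X, Y, Z) in camera coordinates maps to the pixel (fx·X/Z + cx, fy·Y/Z + cy). The intrinsic parameters `fx`, `fy`, `cx`, `cy` and the sample point are assumed values for illustration.

```python
# Sketch only: ideal pinhole projection, assumed here because the patent does
# not name a specific camera model. Intrinsics are hypothetical.

def project(p_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to 2D pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)

# Keypoint 2 m in front of the camera and 0.5 m to the side.
keypoint_cam = [0.5, 0.0, 2.0]

# Hypothetical intrinsics: focal lengths in pixels and the principal point.
pixel = project(keypoint_cam, fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(pixel)
```

If lens distortion was determined during the intrinsic calibration, a distortion model would have to be applied in addition; this sketch omits it.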

[0016] The term "annotation" should be understood in particular as identifying predefined points (in other words, keypoints) in the recorded image. The method provides an automated keypoint annotation approach that eliminates the need to identify keypoints manually on the image. The method therefore does not include manually identifying keypoints on the recorded image, where manual identification should be understood in particular as manually annotating the image. Compared with manual annotation, the method has significant advantages, namely avoiding human error and enabling a large number of images to be annotated very efficiently. In particular, the method also does not involve applying markers to the object to identify keypoints on the recorded image.

[0017] The method may include storing the determined camera image coordinates of all key points. In particular, the steps of changing the position and / or orientation of the end effector, recording an image by means of the camera, and determining the camera image coordinates of all key points (in particular including storing the determined camera image coordinates) are repeated to generate multiple annotated images. In this way, training data for artificial networks can be created, where such training data consists of annotated images. For example, training data for a mobile charging robot (which, for example, is to autonomously charge a vehicle) can be created to train a neural network such that the mobile charging robot can fully automatically guide the end effector (in other words, the plug) into the vehicle's socket.
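How the stored coordinates might be organized is not prescribed by the patent. The following sketch shows one possible layout, assuming a JSON structure with one record per image; the field names, file names, and pixel values are hypothetical.

```python
# Sketch only: one possible way to store keypoint annotations per image.
# The JSON layout, field names, and values are assumptions for illustration.
import json

def make_record(image_name, keypoints_px):
    """Bundle an image name with the pixel coordinates of its keypoints."""
    return {
        "image": image_name,
        "keypoints": [{"id": i, "u": u, "v": v}
                      for i, (u, v) in enumerate(keypoints_px)],
    }

records = [
    make_record("image_0000.png", [(520.0, 240.0), (500.5, 260.25)]),
    make_record("image_0001.png", [(410.0, 198.5), (395.75, 221.0)]),
]

# Serialize the annotations; the text could be written to a file next to the images.
serialized = json.dumps(records, indent=2)

# Reading the text back yields the same records.
restored = json.loads(serialized)
print(restored[0]["keypoints"][0])
```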

[0018] In another aspect, an apparatus is provided for annotating images of an object recorded by a camera, wherein the apparatus is configured to implement the aforementioned method. For this purpose, the apparatus particularly includes an evaluation unit and a control unit. The control unit is used to manipulate a robotic arm and / or a robot and / or a camera. The evaluation unit is specifically used to determine the position and orientation of the camera relative to an end effector, select a first keypoint on the object, evaluate information on the relative positions of other keypoints relative to the first keypoint, determine the position and orientation of the end effector at the first keypoint in world coordinates, determine the positions of the first keypoint and other keypoints in world coordinates, determine the camera coordinates of the keypoints by transforming the world coordinates of the keypoints to the camera coordinate system of the camera, and determine and annotate the camera image coordinates of all keypoints from the camera coordinates of all keypoints by means of projection. Furthermore, the apparatus includes a storage unit for storing the images and the determined camera image coordinates of the keypoints.

[0019] In particular, the device includes a robotic arm (which in turn includes an end effector) and a camera. Furthermore, the device can have a robot including the robotic arm.

Attached Figure Description

[0020] The accompanying figures are shown in a purely schematic manner, wherein:

[0021] Figure 1 shows a schematic diagram of the method according to the present invention; and

[0022] Figures 2 to 8 show different method steps according to the present invention.

Detailed Implementation

[0023] Figure 1 shows a flow diagram of method 100 according to the present invention. As a first step, the method comprises determining 101 the position and orientation of camera 14 relative to end effector 13. This step may in particular include hand-eye calibration 102.

[0024] Furthermore, the method includes selecting 103 a first keypoint 20 on the object 30 of which the image is to be recorded. Additionally, method 100 includes providing 104 information on additional keypoints relative to the first keypoint 20 in the object coordinate system. Providing 104 may include measuring 105 the position of the additional keypoints on the object 30 relative to the first keypoint 20, or utilizing 106 existing information about the dimensions of the object 30.

[0025] Method 100 includes guiding 108 the end effector 13 to the location of the first key point 20. In particular, this may include manually moving 109 the end effector 13 to the first key point 20.

[0026] Method 100 includes determining 111 the position and orientation of the end effector 13 at the first key point 20 in world coordinates, determining 112 the position of the first key point 20 in world coordinates, and determining 113 the positions of the other key points in world coordinates.

[0027] Method 100 includes changing 114 the position and / or orientation of the end effector 13 and recording an image 31 using the camera 14. The camera coordinates of the keypoints are determined 115 by transforming the world coordinates of the keypoints to the camera coordinate system of the camera 14. The camera image coordinates of all keypoints are determined 116 from the camera coordinates by projection and annotated on the image. Preferably, the camera image coordinates of all keypoints are stored 117. Steps 114 to 117 are in particular repeated to create 118 multiple annotated images as training data for a suitable network.
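The repetition of steps 114 to 117 can be sketched as a loop over end-effector poses. The code below is a simplified illustration under several assumptions that are not part of the patent: poses are 4x4 homogeneous matrices, the camera follows an ideal pinhole model, the end-effector poses are pure translations, and no real image is recorded (only a hypothetical file name is stored). All names and values are invented for the example.

```python
# Sketch only: repeating steps 114 to 117 for several end-effector poses.
# Names, poses, intrinsics, and file names are hypothetical.

def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def world_to_camera(T_world_cam, p_world):
    d = [p_world[i] - T_world_cam[i][3] for i in range(3)]
    return [sum(T_world_cam[k][i] * d[k] for k in range(3)) for i in range(3)]

def project(p_cam, fx, fy, cx, cy):
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)

def pose_at(x, y, z):
    """Pose with identity orientation located at (x, y, z)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def annotate_pose(T_world_ee, T_ee_cam, keypoints_world, intrinsics):
    """Steps 115 and 116 for one image: pixel coordinates of all keypoints."""
    T_world_cam = matmul4(T_world_ee, T_ee_cam)
    fx, fy, cx, cy = intrinsics
    return [project(world_to_camera(T_world_cam, p), fx, fy, cx, cy)
            for p in keypoints_world]

T_ee_cam = pose_at(0.0, 0.0, 0.0)            # assumed hand-eye result
keypoints_world = [[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]]
intrinsics = (800.0, 800.0, 320.0, 240.0)    # fx, fy, cx, cy
ee_poses = [pose_at(0.0, 0.0, 0.0), pose_at(0.1, 0.0, 0.0)]

dataset = []
for index, T_world_ee in enumerate(ee_poses):   # step 114: new pose, new image
    pixels = annotate_pose(T_world_ee, T_ee_cam, keypoints_world, intrinsics)
    dataset.append({"image": f"image_{index:04d}.png",   # step 117: store
                    "keypoints": pixels})

for record in dataset:
    print(record)
```

Moving the end effector by 0.1 along x shifts both keypoints by 80 pixels in the opposite direction in the image, while the world coordinates of the keypoints stay fixed.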

[0028] Figure 2 shows the step of determining the position and orientation of the camera 14 of the device 10 according to the invention relative to the end effector 13. The end effector 13 is arranged on the robotic arm 12 of the robot 11.

[0029] Figure 3 illustrates how the first keypoint 20 is selected on object 30 in object coordinate system 21. In this example, the first keypoint 20 is the center of the upper opening of the socket (which is object 30).

[0030] Figure 4 illustrates how the end effector 13 is guided to the position of the first keypoint 20 by progressively moving 109 the robotic arm 12 toward the object 30 and the first keypoint 20. In this example, the end effector 13 is configured as a plug with two contact pins, wherein, in Figure 4, the upper contact pin is inserted into the upper opening of the socket. The center of the fixed end of the upper contact pin is the part of the end effector that is located at the first key point when the plug is inserted into the socket. Since its world coordinates are known, the world coordinates of the first key point 20 are determined in this way.

[0031] Figure 5 illustrates how the positions of the other keypoints in world coordinates are determined 113. A second keypoint 23 is shown as an example, together with how it is transformed into world coordinate system 22 based on information about its position relative to the first keypoint 20.

[0032] Figure 6 shows how the position and orientation of the end effector 13 are changed 114 and how an image of object 30 is recorded 114 by means of the camera.

[0033] Figure 7 illustrates how the camera coordinates of the keypoints are determined 115, in other words, how the position of the keypoints relative to camera 14 is determined.

[0034] Figure 8 illustrates how the camera image coordinates of the keypoints are determined 116 from the camera coordinates of the keypoints using projection. It also schematically illustrates how the camera image coordinates of the keypoints in image 31 recorded by camera 14 are determined from the camera coordinates of the keypoints (shown on the right) using projective geometry.

[0035] List of reference numerals

[0036] 100 Method

[0037] 101 Determining the position and orientation of the camera relative to the end effector

[0038] 102 Hand-eye calibration

[0039] 103 Selecting the first key point on the object

[0040] 104 Providing information on additional key points relative to the first key point in the object coordinate system

[0041] 105 Measuring the positions of the other key points relative to the first key point

[0042] 106 Utilizing existing information about the dimensions of the object

[0043] 108 Guiding the end effector to the location of the first key point

[0044] 109 Manually moving the end effector to the first key point

[0045]-[0046] 111 Determining the position and orientation of the end effector at the first key point in world coordinates

[0047] 112 Determining the position of the first key point in world coordinates

[0048] 113 Determining the positions of the other key points in world coordinates

[0049]-[0050] 114 Changing the position and / or orientation of the end effector and recording an image by means of the camera

[0051]-[0052] 115 Determining the camera coordinates of the key points by transforming the world coordinates of the key points to the camera coordinate system

[0053]-[0054] 116 Determining the camera image coordinates of all key points from the camera coordinates of all key points by means of projection, and labeling these camera image coordinates

[0055] 117 Storing the camera image coordinates of all key points

[0056] 118 Creating training data for artificial networks in the form of annotated images

[0057] 10 Device

[0058] 11 Robot

[0059] 12 Robotic arm

[0060] 13 End effector

[0061] 14 Camera

[0062] 20 First key point

[0063] 21 Object coordinate system

[0064] 22 World coordinate system

[0065] 23 Second key point

[0066] 30 Object

[0067] 31 Image

Claims

1. A method (100) for annotating an image (31) of an object (30) recorded by means of a camera (14), wherein the camera (14) is mounted on a robotic arm (12), and wherein the robotic arm (12) includes an end effector (13), characterized in that the method (100) includes the following steps:
a) determining (101) the position and orientation of the camera (14) relative to the end effector (13);
b) selecting (103) a first key point (20) on the object (30);
c) providing (104) information on the other key points in the object coordinates relative to the first key point (20);
d) guiding (108) the end effector (13) to the location of the first key point (20);
e) determining (111) the position and orientation of the end effector (13) at the first key point (20) in world coordinates;
f) determining (112) the position of the first key point (20) in world coordinates by means of the determined position and orientation of the end effector (13) in world coordinates;
g) determining (113) the positions of the other key points in world coordinates based on the determined position of the first key point (20) in world coordinates and the relative positions of the other key points with respect to the first key point in object coordinates;
h) changing (114) the position and / or orientation of the end effector (13) and recording an image (31) by means of the camera (14);
i) determining (115) the camera coordinates of the key points by transforming the world coordinates of the key points to the camera coordinate system of the camera (14);
j) determining (116) the camera image coordinates of all key points from the camera coordinates of all key points by means of projection.

2. The method (100) according to claim 1, characterized in that the method (100) does not include manually identifying key points on the image (31) recorded by means of the camera (14).

3. The method (100) according to claim 1 or 2, characterized in that the method (100) does not include applying markers to the object (30) to identify key points on the recorded image (31).

4. The method (100) according to claim 1 or 2, characterized in that determining (101) the position and orientation of the camera (14) relative to the end effector (13) includes hand-eye calibration (102).

5. The method (100) according to claim 1 or 2, characterized in that providing (104) information on additional key points includes measuring the corresponding positions of the additional key points on the object relative to the first key point (20).

6. The method (100) according to claim 1 or 2, characterized in that providing (104) information on additional key points includes utilizing existing information about the dimensions of the object (30).

7. The method (100) according to claim 1 or 2, characterized in that guiding (108) includes manually moving (109) the end effector (13) to the first key point (20).

8. The method (100) according to claim 1 or 2, characterized in that the method (100) includes storing (117) the camera image coordinates of all key points.

9. The method (100) according to claim 1 or 2, characterized in that the method (100) includes repeating steps h) to j) to generate multiple annotated images.

10. The method (100) according to claim 9, characterized in that training data for artificial networks is created (118) using the method (100), wherein the training data consists of annotated images.

11. An apparatus (10) for annotating an image (31) of an object (30) recorded by means of a camera (14), characterized in that the apparatus (10) is configured to carry out the method according to any one of claims 1 to 10.

12. The apparatus (10) according to claim 11, characterized in that the apparatus (10) includes a robotic arm (12) containing an end effector (13) and a camera (14).