Method and apparatus for determining reference data for camera images of a person

By arranging markers of a motion capture system on a person's head so that they are not visible in the camera image, and combining the motion capture system with a camera, the person's head posture and gaze information are recorded together with the camera images. This addresses the difficulty of obtaining reference data for camera images in the prior art and enables efficient, interference-free data acquisition and model training.

CN122199659A · Pending · Publication Date: 2026-06-12 · ROBERT BOSCH GMBH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ROBERT BOSCH GMBH
Filing Date
2025-12-12
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies struggle to efficiently acquire reference data for camera images of a person, particularly head posture and gaze information. Furthermore, the "antler" devices used in traditional methods are visible in the images and impair their suitability for training models.

Method used

Markers that are not visible in the camera image are placed on the person's head and combined with a motion capture system and a camera to record and determine head posture and gaze information. A gaze target device, whose position is determined automatically by the motion capture system, is used to instruct the gaze direction and simplifies extrinsic calibration and synchronization.

Benefits of technology

It enables efficient acquisition of interference-free real data for model training, improves data quality and training efficiency, and simplifies gaze direction instruction and data synchronization.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122199659A_ABST

Abstract

The invention relates to a method and a device for determining reference data of a camera image of a person. The invention relates to a method, for example a computer-implemented method, for determining reference data of a camera image of a person, for example a driver of a vehicle, wherein the reference data comprises information about a head pose of the person and / or information about a gaze direction of the person, wherein the method comprises: recording a camera image of the person by means of a first camera device; determining information about a head pose of the person by means of a motion capture system, wherein at least one marker device of the motion capture system is arranged at the head of the person, for example such that the marker device is not visible on the camera image for at least one predefinable angular range of the head, for example with respect to the first camera device; and optionally determining information about a gaze direction of the person.

Description

Technical Field

[0001] This disclosure relates to a method for determining reference data for camera images of a person.

[0002] This disclosure also relates to a device for determining reference data for camera images of a person.

Background Technology

[0003] The detection of gaze direction based on camera images of a person is a current research problem, particularly relevant to driver monitoring systems in vehicles, for example to identify driver distraction or to assess the driver's readiness to take over in autonomous vehicles. Acquiring ground-truth data (images accompanied, for example, by precise information about head posture and gaze direction) is difficult and, in some conventional systems, is achieved through a forward-protruding frame ("antlers", German "Hirschgeweih") with an integrated camera that is worn on the subject's head.

Summary of the Invention

[0004] Some examples relate to methods, such as computer-implemented methods, for determining reference data for camera images of a person, such as a driver of a vehicle, wherein the reference data includes information about the person's head posture and / or information about the person's gaze direction. The method includes: recording camera images of the person using a first camera device; determining information about the person's head posture using a motion capture system, wherein at least one marker of the motion capture system is positioned at the person's head, for example such that the marker is not visible in the camera image for at least one predefinable angular range of the head, for example relative to the first camera device; and optionally determining information about the person's gaze direction. In some examples, this allows the aforementioned information to be determined efficiently together with the camera images, and further allows the use of a first camera device positioned away from the person, such as a test subject.

[0005] For example, the first camera device is a video camera that provides a sequence, such as a stream, of individual camera images (e.g., frames). This facilitates the allocation of each camera image to a time base (Zeitbezug), such as the timestamp of the corresponding frame.

[0006] In some examples, the method includes using information about the person's head pose and / or information about the person's gaze direction as reference data for a camera image of the person. This allows for the efficient specification or provision of reference data, for example, for training a model.

[0007] For example, using information about a person's head pose and / or gaze direction as reference data for camera images of the person can include using the reference data as real data (ground truth), for example to train at least one model based on artificial intelligence, such as machine learning. The model can thus be trained efficiently. Furthermore, the real data and the associated camera images do not contain undesired elements, for example structures used for determining the real data, as is the case in some conventional schemes (e.g., "antlers"). This makes particularly high-quality training data and particularly efficient training possible.

[0008] For example, the method includes at least one of the following elements: a) placing the at least one marking device at the back of the person's head; or b) placing the at least one marking device in the region of at least one ear of the person, for example behind at least one ear, or, for example, behind both ears relative to the first camera device; or c) placing the at least one marking device on the person's head; or d) placing the at least one marking device below the person's chin, wherein, for example, the placement includes gluing or adhesive bonding. This allows the at least one marking device to be placed on the person's head in a simple manner, so that information can be determined reliably by means of a motion capture system while the marking device placed on the person is not visible in the camera image, and valuable training data can thus be obtained.

[0009] In some examples, the method includes: moving a gaze target device for the person in space; determining the position of the gaze target device by means of the motion capture system; and optionally, determining information about the person's gaze direction based at least on the position of the gaze target device. In some examples, this enables the efficient and automatic (e.g., without human interaction) determination of the position of the gaze target device by the motion capture system, and in some examples simplifies gaze direction instruction, i.e., the command to the person: in which direction the person should direct their gaze.

[0010] For example, moving the line-of-sight target device can be done manually, such as by hand, or by means of a robot or drone. This makes it possible to move the line-of-sight target device particularly efficiently and in a situation-appropriate manner, such as automatically, which in some examples can further simplify the generation of reference data.

[0011] In some examples, the method includes an extrinsic calibration of the first camera device using the line-of-sight target device. Thus, in some examples, a separate calibration procedure or calibration device for the extrinsic calibration of the first camera device can be dispensed with.

[0012] For example, the method includes at least one of the following elements: a) performing a synchronization, for example a temporal synchronization, between the camera image (or a frame of the video data stream) and information determined by means of the motion capture system, such as information about the head posture of the person; or b) calibrating at least one component, such as the motion capture system, for example repeatedly (e.g., at the beginning and at the end of recording the camera images); or c) filtering the camera images, for example removing camera images on which at least one marking device of the motion capture system is visible. According to some examples, this provides further advantageous aspects for accurately determining reference data.
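
The temporal synchronization in element a) could, for instance, be approximated by assigning each camera frame the motion capture sample closest in time. The following is only a minimal sketch under stated assumptions: both systems are assumed to share a common clock, and the function names, sampling rates, and the `max_gap` tolerance are hypothetical and not taken from this disclosure.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def nearest_sample_index(
    sample_times: Sequence[float], frame_time: float, max_gap: float
) -> Optional[int]:
    """Return the index of the sample closest in time to frame_time.

    sample_times must be sorted in ascending order. Returns None when the
    closest sample is further away than max_gap (same unit as the timestamps).
    """
    if not sample_times:
        return None
    pos = bisect_left(sample_times, frame_time)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sample_times)]
    best = min(candidates, key=lambda i: abs(sample_times[i] - frame_time))
    if abs(sample_times[best] - frame_time) > max_gap:
        return None
    return best


def synchronize(
    frame_times: Sequence[float], sample_times: Sequence[float], max_gap: float
) -> List[Optional[int]]:
    """Map every camera frame to a motion capture sample index (or None)."""
    return [nearest_sample_index(sample_times, t, max_gap) for t in frame_times]


# Hypothetical timestamps in seconds (assumed values, not from the disclosure).
frame_times = [0.000, 0.033, 0.067, 0.500]
sample_times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07]
matches = synchronize(frame_times, sample_times, max_gap=0.01)
print(matches)  # [0, 3, 7, None]
```

Frames without a sufficiently close sample are mapped to `None` here so that they can be excluded rather than annotated with stale data; whether this is desirable depends on the recording setup.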

[0013] In some examples, the method includes at least one of the following elements: a) determining, based at least on the information about the head pose, the position and / or rotation of the head, for example in a predefinable coordinate system, for example for several or all of the camera images; or b) determining the position of the gaze target device, for example in a or the predefinable coordinate system, for example for one or more of the camera images; or c) determining a parameter characterizing the gaze direction, for example a gaze vector, for example in a or the predefinable coordinate system, for example for one or more of the camera images; or d) determining a parameter characterizing the angle between the gaze direction and the head direction, for example in a or the predefinable coordinate system, for example for one or more of the camera images. According to some examples, this provides further advantageous aspects for accurately determining reference data.
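
As an illustration of element d), the angle between a gaze direction and a head direction can be computed from the dot product of the two direction vectors. The sketch below assumes that both directions are available as 3D vectors in the same coordinate system; the function name is hypothetical, and the disclosure does not prescribe this particular formula.

```python
import math
from typing import Sequence


def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


# Assumed example: head pointing along z, gaze turned halfway towards x.
head_direction = (0.0, 0.0, 1.0)
gaze_direction = (1.0, 0.0, 1.0)
print(round(angle_between(gaze_direction, head_direction), 3))  # 45.0
```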

[0014] Other examples relate to an apparatus for performing methods according to this disclosure.

[0015] Some examples relate to a mobile line-of-sight target device, for example, for use in a method according to this disclosure, the mobile line-of-sight target device including at least a manipulation and / or securing section, for example for manually guiding the line-of-sight target device by a person, and / or for securing the line-of-sight target device to another device, such as a robot or a drone; and a marking area having at least one marking device for a motion capture system or the motion capture system, wherein optionally a connecting element configured, for example as a rod, is provided between the manipulation and / or securing section and the marking area.

[0016] Another example relates to a computer-readable storage medium including instructions that, when executed by a computer, cause the computer to perform a method according to the present disclosure.

[0017] In some examples, a computer program is provided that includes instructions that, when executed by a computer, cause the computer to perform a method according to this disclosure.

[0018] Some examples relate to a data carrier signal that transmits and / or characterizes a computer program according to this disclosure.

[0019] Some examples relate to the use of methods and / or devices and / or gaze-targeting devices and / or computer-readable storage media and / or computer programs and / or data carrier signals according to this disclosure for at least one of the following elements: a) determining reference data for a camera image of the person; or b) acquiring, for example, real data on the head posture and / or gaze direction of the person, together with the associated camera image, when there are no visible recording elements in the camera image, for example, that might affect the suitability of the camera image for training a model; or c) providing settings for acquiring real data on the head posture and / or gaze direction of the person; or d) using the relative motion between the person and the first camera device; or e) simplifying gaze direction indication; or f) simplifying the synchronization between the camera image and information determined by means of the motion capture system.

[0020] Other features, applications, and advantages of the invention will become apparent from the following description of the examples shown in the accompanying drawings. All features described or illustrated herein, either alone or in any combination, constitute the subject matter of this disclosure, regardless of their generalization in the claims or their reference thereto, or their representation or indication in the specification or drawings.

Attached Figure Description

[0021] In the attached drawings:
Figure 1 schematically shows a flowchart according to some examples;
Figure 2 schematically shows an apparatus according to some examples;
Figure 3 schematically shows a flowchart according to some examples;
Figure 4A schematically shows a side view according to some examples;
Figure 4B schematically shows a rear view according to some examples;
Figure 5 schematically shows a flowchart according to some examples;
Figure 6 schematically shows a flowchart according to some examples;
Figure 7 schematically shows a flowchart according to some examples;
Figure 8 schematically shows a block diagram according to some examples;
Figure 9 schematically shows a line-of-sight target device according to some examples;
Figure 10 schematically shows a block diagram according to some examples;
Figure 11 schematically shows aspects of use according to some examples.

Detailed Implementation

[0022] Some examples (see, for example, Figures 1 and 2) relate to a method, such as a computer-implemented method, for determining reference data REF for a camera image KB of a person P, such as a driver of a vehicle 1 (Figure 10), wherein the reference data REF has information IK about the head posture (e.g., head position and / or head orientation KR) of the person P and / or information I-BR about the gaze direction BR of the person P, wherein the method includes: recording 100 (Figure 1) the camera image KB of the person P using a first camera device 10; determining 102 information IK about the head posture of the person P by means of a motion capture system 12, wherein at least one marker device 12a of the motion capture system 12 is arranged at the head K (here, for example, at the back of the head HK) of the person P, for example such that the marker device is not visible in the camera image KB for at least one predefinable angular range of the head (e.g., relative to the first camera device 10); and optionally determining information I-BR about the gaze direction of the person P. In some examples, this makes it possible to determine the above information IK and I-BR efficiently together with the camera image KB, and further makes it possible to use a first camera device 10 that is located away from the person P, for example a test subject.

[0023] For example (Figure 2), the first camera device 10 is a video camera that provides a sequence, or stream, of individual camera images (e.g., frames). This facilitates the allocation of each camera image KB to a time reference, such as the timestamp of the corresponding frame.

[0024] In some examples (Figure 1), the method includes using information IK about a person's head pose and / or information I-BR about a person's gaze direction as reference data REF for a camera image KB of the person. This reference data REF can thus be efficiently specified or provided, for example as training data TD for training an AI model MOD (Figure 2).

[0025] For example (Figure 1), using information about a person's head pose and / or gaze direction as reference data for camera images of the person includes, for example, using 106a the reference data REF as ground truth data GT, for example to train at least one model MOD based on artificial intelligence, such as machine learning. Thus, the model MOD can be trained efficiently. For example, the ground truth data GT contains multiple camera images, each annotated, for example, with information IK and / or I-BR corresponding to the respective head pose of the person in that camera image.
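
To illustrate what "annotated" camera images might look like as a data structure, the following sketch pairs each frame with head pose information (IK) and, optionally, gaze information (I-BR). All names, the Euler-angle representation of the rotation, and the numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class ReferenceSample:
    """One camera image annotated with reference data (names hypothetical)."""
    frame_id: int
    head_position: Vec3          # part of IK
    head_rotation: Vec3          # part of IK, assumed Euler angles in degrees
    gaze_vector: Optional[Vec3]  # I-BR, optional as in the text


def build_ground_truth(
    frame_ids: List[int],
    head_poses: Dict[int, Tuple[Vec3, Vec3]],
    gaze_vectors: Dict[int, Vec3],
) -> List[ReferenceSample]:
    """Annotate each frame that has a head pose; other frames are skipped."""
    samples = []
    for frame_id in frame_ids:
        if frame_id not in head_poses:
            continue  # no motion capture data for this frame
        position, rotation = head_poses[frame_id]
        samples.append(
            ReferenceSample(frame_id, position, rotation, gaze_vectors.get(frame_id))
        )
    return samples


# Assumed example values.
head_poses = {
    0: ((0.0, 0.0, 1.2), (0.0, 10.0, 0.0)),
    1: ((0.0, 0.0, 1.2), (0.0, 15.0, 0.0)),
}
gaze_vectors = {0: (0.0, 0.0, 1.0)}
ground_truth = build_ground_truth([0, 1, 2], head_poses, gaze_vectors)
print(len(ground_truth))  # 2
```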

[0026] Furthermore, the real data GT and the associated camera image KB do not contain undesired elements, for example structures used for determining the real data, as is the case in some conventional schemes (e.g., "antlers"). This makes particularly high-quality training data TD (e.g., camera image KB and real data GT) and particularly efficient training of the AI model possible.

[0027] In some examples (Figure 3), the method includes at least one of the following elements: a) placing 110 the at least one marking device 12a at the back of the person's head; or b) placing 112 the at least one marking device 12a in the region of at least one ear O of the person, for example behind at least one ear, for example behind both ears relative to the first camera device 10; or c) placing 114 the at least one marking device 12a on the head of the person; or d) placing 116 at least one marking device 12b below the chin KI of the person (see Figure 4A), wherein, for example, the placing 110, 112, 114, 116 includes gluing or adhesive bonding. This allows at least one marking device 12a, 12b to be placed easily at the head of a person, thereby enabling reliable determination of information by means of the motion capture system 12, wherein the marking devices 12a, 12b placed at the person are not visible in the camera image KB, so that valuable training data without interfering information in the camera image KB can be obtained.

[0028] Figure 4A shows a schematic side view of a person's head. As an example, a first marking device 12a is positioned at the back of the head HK, and a second marking device 12b is positioned below the chin KI. In some examples, such as when person P is at least approximately looking towards the area of the first camera device 10 or turning their head towards it, both marking devices 12a, 12b are advantageously not visible in the camera image KB of the first camera device 10. In other words, in some examples, at least one marking device 12a, 12b can be positioned at the head K of person P such that the marking device is not visible in the camera image KB for at least one predefinable angular range (or solid angle range) of the head K (e.g., relative to the first camera device 10).

[0029] Figure 4B shows a schematic rear view, in which the first marking device 12a can be seen positioned at the back of the head HK.

[0030] In some examples (Figures 4A and 4B), the first marking device 12a may also be referred to as a "rear head device (rig)". For example, the "rear head device" is rigidly connected to the head, such as the skull, of the wearer (person), for example so that the head orientation KR can be tracked. In some examples, the "rear head device" is designed such that, on the one hand, sufficient spatial coverage is achieved by means of multiple motion capture balls, for example for robust tracking (e.g., in some examples, at least three balls should be visible simultaneously in at least some, such as multiple, motion capture cameras 12'), while on the other hand the rear head device 12a is obscured by the head of the person P over the largest possible angular range when recording from the front using the camera 10.
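
The visibility condition mentioned above (at least three balls visible simultaneously in multiple motion capture cameras) can be expressed as a simple check. This is a sketch only: the per-camera visibility sets are assumed to come from the motion capture system, and the default of two cameras is an assumption, since the text only says "multiple".

```python
from typing import Dict, Set


def is_trackable(
    visible_markers: Dict[str, Set[str]],
    min_markers: int = 3,
    min_cameras: int = 2,
) -> bool:
    """Return True if enough cameras each see enough marker balls at once."""
    cameras_ok = sum(
        1 for markers in visible_markers.values() if len(markers) >= min_markers
    )
    return cameras_ok >= min_cameras


# Hypothetical observation: camera name -> set of marker balls it currently sees.
observation = {
    "cam_a": {"m1", "m2", "m3"},
    "cam_b": {"m1", "m2", "m3", "m4"},
    "cam_c": {"m2"},
}
print(is_trackable(observation))  # True
```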

[0031] In some examples (Figure 5), the method includes: moving 120 a gaze target device 14 for a person P in space (see Figures 2 and 9); determining 122 (Figure 5) the position 14-POS of the gaze target device 14 by means of the motion capture system 12 (Figure 2); and optionally determining information I-BR regarding the gaze direction BR of the person P based at least on the position 14-POS of the gaze target device 14. In some examples, this enables the position 14-POS of the gaze target device 14 to be determined efficiently and automatically (e.g., without human interaction) by the motion capture system 12, and in some examples simplifies gaze direction instruction, i.e., the instruction to the person as to the direction in which the person should direct their gaze. For example, the gaze target device 14 can be moved within a predefinable spatial range, and the person P can be instructed to follow the gaze target device 14 with their gaze, so that multiple camera images KB of the person, such as of the head K or face of the person P, can be obtained, each camera image being associated with a different gaze direction BR and / or head direction KR.
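
One conceivable way to derive gaze information from the target position is to take the normalized vector from an eye (or head) reference point to the tracked target position. The sketch below assumes that such a reference point is known in the same coordinate system as the target position; the choice of origin and all names are assumptions, as the text only states that the gaze direction is based at least on the target position.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def gaze_vector(eye_position: Vec3, target_position: Vec3) -> Vec3:
    """Unit vector pointing from the eye position towards the gaze target."""
    diff = tuple(t - e for e, t in zip(eye_position, target_position))
    norm = math.sqrt(sum(c * c for c in diff))
    if norm == 0.0:
        raise ValueError("eye and target positions coincide")
    return tuple(c / norm for c in diff)


# Assumed positions in metres within one common coordinate system.
eye = (1.0, 1.0, 1.0)
target = (4.0, 5.0, 1.0)
print(gaze_vector(eye, target))
```

For the assumed positions, the difference vector is (3, 4, 0) with length 5, so the result is approximately (0.6, 0.8, 0.0).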

[0032] In some examples (Figure 5), moving 120 the line-of-sight target device 14 includes moving 120a it manually, for example by hand, for example by another person P' (Figure 2), or moving 120b it by means of a robot 50 or a drone 52. This makes it possible to move the line-of-sight target device 14 particularly efficiently and in a way that is adapted to the situation, such as automatically, which in some examples can further simplify the generation of reference data REF.

[0033] For example (Figure 5), the method includes performing an extrinsic calibration 126 of the first camera device 10 using the line-of-sight target device 14. In some examples, this may make a separate calibration procedure or calibration device for the extrinsic calibration of the first camera device 10 unnecessary. The extrinsic calibration 126 may be performed, for example, before or during processes 120, 122, 124 and / or at other points in time.

[0034] In some examples (Figure 6), the method includes at least one of the following elements: a) performing 130 a synchronization SYNC, for example a temporal synchronization, between a camera image KB (e.g., a frame of a video data stream) and information IK determined by means of the motion capture system 12, such as information about a person's head posture (e.g., head orientation KR); or b) calibrating 132 at least one component, such as the motion capture system 12, for example repeatedly calibrating 132a (e.g., at the beginning and at the end of recording the camera images); or c) filtering the camera images KB, for example removing 134a those camera images on which at least one marker device of the motion capture system 12 is visible (e.g., when the maximum head rotation to be used has been unintentionally exceeded). According to some examples, this provides further advantageous aspects for accurately determining the reference data REF.
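
The filtering in element c) could, for example, use the head rotation reported by the motion capture system as a proxy for marker visibility: frames in which the head is turned beyond a maximum angle are discarded. This is a sketch under that assumption; the yaw-only criterion, the threshold, and the names are hypothetical, and an actual setup might instead detect markers in the image directly.

```python
from typing import List, Tuple


def filter_frames(frames: List[Tuple[int, float]], max_yaw_deg: float) -> List[int]:
    """Keep the ids of frames whose absolute head yaw is within max_yaw_deg."""
    return [frame_id for frame_id, yaw_deg in frames if abs(yaw_deg) <= max_yaw_deg]


# Hypothetical (frame id, head yaw in degrees relative to the camera) pairs.
frames = [(0, 5.0), (1, -32.0), (2, 61.5), (3, -75.0), (4, 40.0)]
kept = filter_frames(frames, max_yaw_deg=45.0)
print(kept)  # [0, 1, 4]
```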

[0035] As another example (Figure 7), the method includes at least one of the following elements: a) determining, based at least on the information IK about the head pose, the position K-POS and / or rotation K-ROT of the head K, for example in a predefinable coordinate system KS1, KS2 (Figure 2), for example for camera images KB, such as several or all of the camera images KB; or b) determining the position 14-POS of the line-of-sight target device 14, for example in a or the predefinable coordinate system KS1, KS2, for example for camera images KB, such as one or more of the camera images KB; or c) determining 144 a parameter characterizing the gaze direction BR, such as the gaze vector BRV, for example in a or the predefinable coordinate system KS1, KS2, for example for camera images KB, such as one or more of the camera images KB; or d) determining 146 a parameter GR-α characterizing the angle α between the gaze direction BR and the head direction KR, for example in a or the predefinable coordinate system KS1, KS2, for example for camera images KB, such as one or more of the camera images KB. According to some examples, this provides further advantageous aspects for accurately determining the reference data REF.
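
Expressing a position or direction in a different coordinate system amounts to a rigid transformation. The sketch below assumes that a rotation matrix and a translation vector mapping coordinates from one system into the other are already known, for example from an extrinsic calibration; the names and the example values are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def rotate(rotation: Mat3, v: Sequence[float]) -> Vec3:
    """Apply a 3x3 rotation matrix to a vector (used for directions)."""
    return tuple(sum(rotation[i][j] * v[j] for j in range(3)) for i in range(3))


def transform_point(rotation: Mat3, translation: Sequence[float],
                    point: Sequence[float]) -> Vec3:
    """Map a point into the target system: rotation * point + translation."""
    rotated = rotate(rotation, point)
    return tuple(r + t for r, t in zip(rotated, translation))


# Assumed extrinsics: 90 degree rotation about the z axis plus a shift along x.
rotation_z90 = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
translation = (1.0, 0.0, 0.0)

target_position = transform_point(rotation_z90, translation, (1.0, 0.0, 0.0))
gaze_direction = rotate(rotation_z90, (1.0, 0.0, 0.0))
print(target_position, gaze_direction)
```

Positions are rotated and translated, whereas direction vectors such as a gaze vector are only rotated, which is why the two cases are kept as separate functions.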

[0036] In Figure 2, reference numeral KS1 symbolically denotes a first coordinate system, which in some examples may be used to represent information according to this disclosure, such as the coordinate system of the motion capture system 12.

[0037] In Figure 2, reference numeral KS2 symbolically denotes a second coordinate system. In some examples, the second coordinate system may be used to represent information according to this disclosure, for example as an alternative to or in addition to the first coordinate system KS1, such as the coordinate system of the first camera device 10.

[0038] Some examples (Figure 8) relate to an apparatus 200 for performing the methods according to the present disclosure. For example, the apparatus 200 may be configured to perform at least one aspect of the methods according to the present disclosure, including but not limited to: controlling a first camera device 10, controlling a motion capture system 12, moving a line-of-sight target device 14 (e.g., by controlling a robot 50 or a drone 52), processing information IK, I-BR, REF, KB, TD according to the present disclosure, training an AI model MOD, etc.

[0039] In some examples (Figure 8), the device 200 includes: a computing device ("computer") 202 having at least one computing core 202a, and a storage device 204 assigned to the computing device 202 for at least temporarily storing at least one of the following elements: a) data DAT (e.g., representing at least some of the information IK, I-BR, REF, KB, TD according to this disclosure), b) a computer program PRG, for example for performing a method according to an embodiment.

[0040] For example (Figure 8), the storage device 204 has volatile memory 204a (e.g., working memory (RAM)) and / or non-volatile memory (NVM) 204b (e.g., flash EEPROM), or combinations thereof or with other memory types not explicitly mentioned.

[0041] Other examples (Figure 8) relate to a computer-readable storage medium SM, which includes instructions PRG that, when executed by a computer 202, cause the computer to perform a method according to an embodiment.

[0042] Other examples (Figure 8) relate to a computer program PRG, which includes instructions that, when executed by a computer 202, cause the computer to perform a method according to an embodiment.

[0043] Other examples (Figure 8) relate to a data carrier signal DCS that represents and / or transmits a computer program PRG according to an embodiment. The data carrier signal DCS can be exchanged, for example, via an optional data interface 206 of device 200.

[0044] In some examples (Figure 8), the functionality of device 200 can also be achieved by means of, for example, pure hardware circuitry.

[0045] Some examples (Figure 9) relate to a mobile line-of-sight target device 14 (also referred to as a "line-of-sight targeting apparatus"), for example for use in a method according to this disclosure, said mobile line-of-sight target device comprising at least a handling and / or securing section 14a, for example for manually guiding the line-of-sight target device 14 by a person P' (Figure 2) and / or for securing the line-of-sight target device 14 to another device, such as a robot 50 or a drone 52; and a marking area 14b having at least one marking device 14b' for a or the motion capture system 12, wherein optionally a connecting element 14c, configured for example as a rod, is provided between the handling and / or securing section 14a and the marking area 14b.

[0046] In some examples (Figure 9), the gaze target device 14 is equipped with a motion capture ball as a marker device 14b' and is followed by the subject P with his / her eyes during recording, for example so that a target point for the gaze direction BR can be obtained. In some examples, the gaze target device 14 can also be used as a calibration target, for example to determine the position and orientation of the camera 10 relative to the motion capture system 12, and / or to synchronize systems 10 and 12 in time.

[0047] Figure 10 shows, as an example, a vehicle 1 that has a device 2, such as an electronic control device, configured to execute a model MOD, which can be trained, for example, based on reference data REF according to this disclosure (Figure 2). In some examples, the model MOD can be used to determine the driver's head pose and / or gaze based on one or more camera images of the driver of vehicle 1.

[0048] Some examples (Figure 11) relate to the use 300 of the method and / or device 200 and / or gaze target device 14 and / or computer-readable storage medium SM and / or computer program PRG and / or data carrier signal DCS according to the present disclosure for at least one of the following elements: a) determining 301 reference data REF for a camera image KB of person P; or b) acquiring 302 real data GT on the person's head posture and / or gaze direction, together with the associated camera image KB, for example without visible recording elements in the camera image that may affect the suitability of the camera image for training a model; or c) providing 303 a setup SET (Figure 2) for acquiring real data on the person's head posture and / or gaze direction; or d) using 304 the relative motion between the person and the first camera device; or e) simplifying 305 the gaze direction instruction; or f) simplifying 306 the synchronization between the camera image and the information determined by means of the motion capture system.

[0049] Other aspects and examples are described below. In some examples, these other aspects and examples may be combined with at least one of the aspects or examples described above, either individually or in combination with each other.

[0050] In some examples, the principles of this disclosure enable the collection of real data GT (ground truth), for example real head pose and gaze data, along with the associated image KB (Figure 2), without visible recording elements in the image that would limit its use, for example, for training the AI model MOD.

[0051] In some examples, the principles of this disclosure make it possible, for example by using an IR-based motion capture system 12 in combination with at least one fixed camera 12' (e.g., an NIR camera, i.e., a camera operating in the near-infrared spectral range) and at least one marking device 12a positioned, for example, at the back of the head HK of the subject P, to provide a customized setup SET (Figure 2), for example to enable the collection of real data GT on head posture and gaze direction without human intervention.

[0052] In some examples, the relative motion of person P with respect to cameras 10, 12' is advantageously used, which can, for example, significantly simplify the direction indication of person P's gaze.

[0053] Compared to some conventional solutions, some advantages of the principles of this disclosure are that the recording device (e.g., a camera and / or a stand, such as the "antlers" mentioned above) is not visible on the image KB, which in some conventional solutions might interfere with its use for training the model MOD. Furthermore, in some examples, a dedicated device for directing the gaze of a person P is not required.

[0054] Another improvement, according to some examples, could be the recording of video by means of a first camera device 10 instead of a single image, which brings in particular the following advantages: simple time synchronization between the images (“frames”) of the video sequence and the motion capture system 12; detection of blinks and, for example, involuntary eye movements; and better scaling.

[0055] In some examples (Figure 2), the device according to Figure 2 can be provided, for example, by means of the following components: an IR-based, for example off-the-shelf (i.e., commercially available), marker-based motion capture system; conventional camera calibration and synchronization equipment (not shown; optional, for example for the initial calibration of camera device 10); a rear-head "head orientation equipment," for example having or representing at least one marker device 12a; and a "line-of-sight target equipment," for example having or representing a line-of-sight target device 14, see, for example, Figure 9.

[0056] In some examples (Figure 4A), the "head orientation equipment" or marking device 12a is rigidly connected to the head K (Figure 2) and arranged, for example, such that it is as invisible as possible in the camera image KB (e.g., obscured by the head K) but can still be recognized by the motion capture system 12. Alternative positioning of the marker devices according to other examples is conceivable. For example, instead of or in addition to the arrangement at the back of the head HK, one or more markers, such as marker devices, may also be fastened (e.g., glued) behind the ear O, on the head, or below the chin KI (Figure 4A), for example, if these are obscured from the corresponding camera's perspective.

[0057] In some examples (Figure 2), the following process is proposed: The cameras 10 and / or 12' of the motion capture system are arranged, for example, in fixed positions. Person P is instrumented with the head orientation equipment 12a at the back of the head (Figure 4A) and calibrated. Video recording (or recording of multiple camera images KB) is started by means of camera 10 and the motion capture system 12. Optionally, the calibration equipment (not shown) is moved through the images of, for example, both cameras 10 and 12'. Alternatively, in some examples, the gaze target device 14 may be used for the extrinsic calibration of cameras 10 and 12'. In some examples, it is assumed that cameras 10 and 12' have already been intrinsically calibrated using a standard method (e.g., a checkerboard calibration pattern), so that only an extrinsic calibration is performed, for example, in the motion capture coordinate system KS2. In some examples, an intrinsic calibration may optionally also be performed. Person P moves freely in space and, for example, always fixates a line-of-sight target (e.g., a visual marker) optionally provided on the gaze target device 14; in some examples, in addition to moving in space, person P may change the head pose in order, for example, to cover the full range of gaze and head orientations to be detected. In some examples, the line-of-sight target can also be moved in space if necessary, whether hand-held by another person P' or, for example, by a drone 52 or a robot 50. Care should be taken that the line-of-sight target remains within the detection range of the motion capture system 12 so that it can be tracked and its 3D position in space can be determined. At the end of the recording, the calibration equipment can optionally be moved through the images again. The extrinsic parameters of the camera (i.e., its position and orientation relative to the motion capture coordinate system KS2) are calculated from the motion capture data of the calibration equipment.
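The calculation of extrinsic parameters from motion capture data can be illustrated with a minimal sketch. It assumes that a calibration marker has been detected as pixel coordinates in the images of camera 10, that the 3D positions of the same marker in KS2 are available from the motion capture system for the same instants, and that the intrinsic matrix is already known. The function name `estimate_extrinsics`, the use of a direct linear transform (DLT), and the synthetic data are illustrative assumptions and not a procedure prescribed by this disclosure; a practical setup would typically also handle lens distortion and refine the result nonlinearly.

```python
import numpy as np

def estimate_extrinsics(points_ks2, pixels, K):
    """Estimate R, t with x_cam = R @ x_ks2 + t from 3D-2D correspondences.

    Uses a plain DLT; needs at least 6 non-coplanar points.
    """
    points_ks2 = np.asarray(points_ks2, dtype=float)
    pixels = np.asarray(pixels, dtype=float)
    # Remove the known intrinsics: pixels -> normalized image coordinates.
    pix_h = np.column_stack([pixels, np.ones(len(pixels))])
    norm = (np.linalg.inv(K) @ pix_h.T).T
    X_h = np.column_stack([points_ks2, np.ones(len(points_ks2))])
    rows = []
    for (x, y, _), X in zip(norm, X_h):
        rows.append(np.concatenate([X, np.zeros(4), -x * X]))
        rows.append(np.concatenate([np.zeros(4), X, -y * X]))
    # The null vector of the stacked system is the 3x4 matrix s * [R | t].
    _, _, vt = np.linalg.svd(np.asarray(rows))
    P = vt[-1].reshape(3, 4)
    if np.linalg.det(P[:, :3]) < 0:  # resolve the sign ambiguity
        P = -P
    # Project the left 3x3 block onto the nearest rotation and undo the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    return R, t

# Synthetic example (all values hypothetical).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.3
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.1, -0.2, 3.0])
marker_ks2 = np.random.default_rng(0).uniform(-0.5, 0.5, size=(12, 3))
cam = (R_true @ marker_ks2.T).T + t_true
proj = (K @ cam.T).T
marker_px = proj[:, :2] / proj[:, 2:]

R_est, t_est = estimate_extrinsics(marker_ks2, marker_px, K)
```

With noise-free correspondences the estimate reproduces the synthetic pose; with real detections, more correspondences spread over the image and the recording volume generally give a more stable result.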

[0058] In some examples, time synchronization between video or image data, such as camera images KB, and motion capture data can be treated as an optimization problem over the time difference (time delta) between the calibration marks of the calibration equipment or gaze target equipment as detected in the camera image and the motion capture positions of the same equipment projected into camera space. For example, by repeating the calibration at the end of the recording, a time desynchronization that arises during recording can be identified and corrected.
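The optimization over the time delta can be sketched as a simple grid search. The sketch assumes that the mark has already been detected in the camera images and that the motion capture positions have already been projected into pixel coordinates of the same camera. The function name `estimate_time_delta`, the sign convention, the linear interpolation, and the synthetic trajectory are assumptions made for illustration only.

```python
import numpy as np

def estimate_time_delta(cam_t, cam_xy, mocap_t, mocap_xy, candidates):
    """Return (delta, cost) minimizing the mean squared pixel distance.

    Assumed convention: camera timestamp + delta = motion capture timestamp.
    """
    cam_t = np.asarray(cam_t, dtype=float)
    cam_xy = np.asarray(cam_xy, dtype=float)
    mocap_t = np.asarray(mocap_t, dtype=float)
    mocap_xy = np.asarray(mocap_xy, dtype=float)
    best_delta, best_cost = None, np.inf
    for delta in candidates:
        # Interpolate the projected mocap track at the shifted camera times.
        x = np.interp(cam_t + delta, mocap_t, mocap_xy[:, 0])
        y = np.interp(cam_t + delta, mocap_t, mocap_xy[:, 1])
        cost = np.mean((x - cam_xy[:, 0]) ** 2 + (y - cam_xy[:, 1]) ** 2)
        if cost < best_cost:
            best_delta, best_cost = float(delta), float(cost)
    return best_delta, best_cost

def track(t):
    """Hypothetical pixel trajectory of the calibration mark."""
    return np.column_stack([320 + 100 * np.sin(t), 240 + 50 * np.cos(2 * t)])

mocap_t = np.arange(0.0, 10.0, 0.01)    # 100 Hz motion capture samples
mocap_xy = track(mocap_t)               # mocap positions projected to pixels
cam_t = np.arange(1.0, 9.0, 1.0 / 30)   # 30 Hz camera frames
cam_xy = track(cam_t + 0.12)            # camera clock lags by 0.12 s
delta, cost = estimate_time_delta(
    cam_t, cam_xy, mocap_t, mocap_xy, np.arange(-0.5, 0.5001, 0.01)
)
```

Running the same search on the calibration pass at the start and on the repeated pass at the end of a recording would yield two deltas; a difference between them would indicate drift during the recording.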

[0059] In some examples, the gaze vector BRV and the rotation between the gaze vector BRV and the head orientation KR can be determined directly, for example for all camera images, from the known position and rotation of the head (e.g., obtained from motion capture data of the rear-head device 12a) and the position of the gaze target device 14 (e.g., obtained from motion capture data of the gaze target device 14); see, for example, angle α in Figure 2.
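A minimal sketch of this computation follows. It assumes that the tracked head position approximates the gaze origin (a real setup would likely apply an offset from the rear-head device to the eyes) and that the local +z axis of the head is the head direction KR. The function name `gaze_reference` and the `forward` parameter are hypothetical.

```python
import numpy as np

def gaze_reference(head_pos, head_rot, target_pos, forward=(0.0, 0.0, 1.0)):
    """Return gaze vector BRV (KS2), gaze in head coordinates, and angle alpha.

    head_rot is a 3x3 rotation mapping head coordinates to KS2 (assumption).
    """
    head_pos = np.asarray(head_pos, dtype=float)
    head_rot = np.asarray(head_rot, dtype=float)
    target_pos = np.asarray(target_pos, dtype=float)
    # Gaze vector: unit vector from the head towards the tracked gaze target.
    to_target = target_pos - head_pos
    gaze_vec = to_target / np.linalg.norm(to_target)
    # Head direction KR: the assumed forward axis rotated into KS2.
    head_dir = head_rot @ np.asarray(forward, dtype=float)
    head_dir = head_dir / np.linalg.norm(head_dir)
    # Angle alpha between gaze and head direction, in radians.
    cos_alpha = np.clip(np.dot(gaze_vec, head_dir), -1.0, 1.0)
    alpha = float(np.arccos(cos_alpha))
    # Gaze direction expressed relative to the head pose.
    gaze_in_head = head_rot.T @ gaze_vec
    return gaze_vec, gaze_in_head, alpha

# Head at the origin looking along +z, target 45 degrees to the side.
gaze_vec, gaze_in_head, alpha = gaze_reference(
    [0.0, 0.0, 0.0], np.eye(3), [1.0, 0.0, 1.0]
)
```

Applied per frame, the three return values could serve as gaze-related reference data REF for the corresponding camera image KB.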

[0060] In some examples (Figure 2), this method can directly determine not only the overall gaze direction but also the instantaneous gaze direction relative to the head posture. If, in an example, the head posture and the line-of-sight target are known by tracking the corresponding equipment, the gaze direction relative to the head posture can also be calculated directly. By tracking the camera equipment, these parameters can also be calculated directly, for example, relative to the camera.
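Expressing the parameters relative to the camera amounts to applying the camera's extrinsic transform. The sketch below assumes extrinsics `R`, `t` with x_cam = R x_KS2 + t are known from tracking or calibration; the function name `to_camera_frame` and the example values are hypothetical.

```python
import numpy as np

def to_camera_frame(R, t, head_pos, head_rot, gaze_vec):
    """Transform head position, head rotation, and gaze vector from KS2
    into camera coordinates, given x_cam = R @ x_ks2 + t."""
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)
    head_pos_cam = R @ np.asarray(head_pos, dtype=float) + t  # points: rotate and translate
    head_rot_cam = R @ np.asarray(head_rot, dtype=float)      # rotations: compose
    gaze_cam = R @ np.asarray(gaze_vec, dtype=float)          # directions: rotate only
    return head_pos_cam, head_rot_cam, gaze_cam

# Camera frame rotated 90 degrees about the y axis relative to KS2.
R = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
t = np.array([0.0, 0.0, 5.0])
head_pos_cam, head_rot_cam, gaze_cam = to_camera_frame(
    R, t, [1.0, 2.0, 3.0], np.eye(3), [0.0, 0.0, 1.0]
)
```

Directions are rotated but not translated, which is why the gaze vector keeps unit length under this transform.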

Claims

1. A method, such as a computer-implemented method, for determining reference data (REF) of a camera image (KB) of a person (P), for example, a driver of a vehicle (1), wherein the reference data (REF) includes information (IK) about the head pose of the person (P) and / or information (I-BR) about the gaze direction (BR) of the person (P), wherein the method comprises: recording (100) a camera image (KB) of the person (P) by means of a first camera device (10); determining (102) information (IK) about the head pose of the person (P) by means of a motion capture system (12), wherein at least one marking device (12a, 12b) of the motion capture system (12) is arranged at the head (K) of the person, in particular such that the marking device is not visible in the camera image (KB) for at least one predefinable angular range of the head (K), for example, relative to the first camera device (10); and optionally determining (104) information (I-BR) about the gaze direction of the person (P).

2. The method according to claim 1, comprising: using (106) the information (IK) about the head pose of the person (P) and / or the information (I-BR) about the gaze direction of the person (P) as reference data (REF) for the camera image (KB) of the person (P).

3. The method according to at least one of the preceding claims, comprising at least one of the following steps: a) placing (110) the at least one marking device (12a) at the back (H) of the head of the person (P); or b) placing (112) the at least one marking device (12a) in the region of at least one ear (O) of the person (P), for example, behind the at least one ear (O), for example, relative to the first camera device (10), for example, behind both ears (O); or c) placing (114) the at least one marking device (12a) on the head (K) of the person (P); or d) placing (116) the at least one marking device (12b) below the chin (KI) of the person (P), wherein, for example, the placing (110, 112, 114, 116) includes gluing or adhesive bonding.

4. The method according to at least one of the preceding claims, comprising: moving (120) a line-of-sight target device (14) for the person (P) in space; determining (122) the position (14-POS) of the line-of-sight target device (14) by means of the motion capture system (12); and optionally determining (124) information (I-BR) about the gaze direction of the person (P) based at least on the position (14-POS) of the line-of-sight target device (14).

5. The method of claim 4, wherein moving (120) the line-of-sight target device (14) includes moving it manually, for example by hand (120a), or by means of a robot or drone (120b).

6. The method according to at least one of claims 4 to 5, comprising: performing (126) an extrinsic calibration of the first camera device (10) using the line-of-sight target device (14).

7. The method according to at least one of claims 4 to 5, comprising at least one of the following steps: a) performing (130) a synchronization, for example, a temporal synchronization, between the camera image (KB) and information determined by means of the motion capture system (12), such as the information (IK) about the head pose of the person (P); or b) calibrating (132), for example, repeatedly calibrating, at least one component, such as the motion capture system (12); or c) filtering (134) the camera images (KB), for example, removing those camera images (KB) in which at least one marking device (12a, 12b) of the motion capture system (12) is visible.

8. The method according to at least one of the preceding claims, comprising at least one of the following elements: a) determining (140) the position (K-POS) and / or rotation (K-ROT) of the head (K) at least based on the information (IK) about the head pose, for example, in a predefinable coordinate system (KS1, KS2), for example, for the camera image (KB), for example, for multiple or all camera images (KB); or b) determining a position (14-POS) or the position (14-POS) of the gaze target device (14), for example, in a predefinable coordinate system (KS1, KS2) or the predefinable coordinate system (KS1, KS2), for example, for the camera image (KB), for example, for one or more of all camera images (KB); or c) determining (146) a parameter (GR-α) representing the angle (α) between the gaze direction (BR) and the head direction (KR), for example, in a predefinable coordinate system (KS1, KS2) or the predefinable coordinate system (KS1, KS2), for example, for the camera image (KB), for example, for one or more of all camera images (KB).

9. An apparatus (200) for performing the method according to at least one of the preceding claims.

10. A mobile line-of-sight target device (14), for example for use in the method according to at least one of claims 1 to 8, the mobile line-of-sight target device comprising at least a manipulation and / or fastening section (14a), for example for manually guiding the line-of-sight target device (14) by a person (P'), and / or for fastening the line-of-sight target device (14) to another device, such as a robot (50) or a drone (52); and a marking area (14b) having at least one marking device (14b') for a motion capture system (12) or the motion capture system (12), wherein optionally a connecting element (14c) configured as a rod is provided between the manipulation and / or fastening section (14a) and the marking area (14b).

11. A computer-readable storage medium (SM) comprising instructions (PRG) that, when executed by a computer (202), cause the computer to perform the method according to at least one of claims 1 to 8.

12. A computer program (PRG) comprising instructions that, when executed by a computer (202), cause the computer to perform the method of at least one of claims 1 to 8.

13. A data carrier signal (DCS) that transmits and / or characterizes the computer program (PRG) according to claim 12.

14. Use (300) of the method according to at least one of claims 1 to 8 and / or of an apparatus (200) according to claim 9 and / or of a line-of-sight target device (14) according to claim 10 and / or of a computer-readable storage medium (SM) according to claim 11 and / or of a computer program (PRG) according to claim 12 and / or of a data carrier signal (DCS) according to claim 13, for at least one of the following elements: a) determining (301) reference data (REF) of a camera image (KB) of the person (P); or b) acquiring (302) real data of the head pose and / or gaze direction (BR) of the person (P) together with the associated camera image (KB), for example, when there are no visible recording elements in the camera image (KB) that could affect the suitability of the camera image (KB) for training a model (MOD); or c) providing (303) a setup (SET) for acquiring real data of the head pose and / or gaze direction (BR) of the person (P); or d) using (304) the relative motion between the person (P) and the first camera device (10); or e) simplifying (305) the gaze direction indication; or f) simplifying (306) the synchronization (SYNC) between the camera image (KB) and the information determined by means of the motion capture system (12).