Power constrained mode for gaze-and-gesture tracking applications

The gaze-and-gesture tracking application in XR devices switches between display-based and display-less modes using EOG and EMG technologies to address energy efficiency and reliability challenges, ensuring reliable user interaction in low power conditions.

WO2026130696A1 · PCT designated stage · Publication Date: 2026-06-25 · TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Filing Date
2024-12-19
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

Existing user interface systems for XR devices face challenges in balancing energy efficiency and operational reliability, particularly in critical low power modes, due to limitations in existing eye tracking and gesture recognition technologies like EOG and EMG, which are not suited for XR glasses and other XR devices.

Method used

A gaze-and-gesture tracking application that switches between display-based and display-less modes using EOG and EMG technologies, combined with camera-based systems, to optimize power consumption and maintain functionality in low power conditions.

Benefits of technology

Enables efficient power management by minimizing battery drain while maintaining reliable user interaction through adaptive UI modes, allowing XR devices to operate in critical low power conditions.

✦ Generated by Eureka AI based on patent content.


Abstract

There are provided techniques for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection. The gaze-and-gesture tracking application is to be run in an XR device that comprises a display. The method is performed by a controller device. The method comprises activating a first UI input mode for the gaze-and-gesture tracking application based on display-based eye tracking. The display-based eye tracking is performed for detecting user selection of objects as rendered on the display. The method comprises, responsive to detecting a critical low power condition, deactivating the display. The method comprises activating a second UI input mode for the gaze-and-gesture tracking application based on display-less eye tracking and / or gesture tracking. The display-less eye tracking and / or gesture tracking is performed for detecting user selection of objects without any visual feedback.

Description

[0001] POWER CONSTRAINED MODE FOR GAZE-AND-GESTURE TRACKING APPLICATIONS

[0002] TECHNICAL FIELD

[0003] Embodiments presented herein relate to a method, a controller device, a computer program, and a computer program product for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection.

[0004] BACKGROUND

[0005] Recent advancements in user interface (UI) technology have introduced new means for users to interact with electronic devices. One example involves integration of gesture recognition and eye-tracking technologies, allowing users to control applications without traditional input devices, such as hand controllers, touchscreens, or keyboards. Such systems operate by detecting eye gaze and gestures, enabling intuitive control through a combination of visual confirmations and motion-based inputs.

[0006] Currently, implementations of eye tracking in virtual reality (VR) systems predominantly rely on optical or camera-based technologies, commonly referred to as video-oculography (VOG). VOG provides high-resolution tracking with high accuracy but is associated with significant energy consumption.

[0007] Some mixed-reality headsets, and other types of user devices, are based on combining two or more sensor technologies for gesture recognition and eye-tracking. Such user devices often incorporate high-performance displays, processors, and multiple sensors, including externally facing cameras for gesture recognition and optical sensors for eye tracking. While such systems can accommodate the high energy demands of these components due to their size and external power sources, they are not suited for smaller devices, such as extended reality (XR) glasses, which require more energy-efficient solutions. AR glasses typically operate in low-power modes and are designed for intermittent use, making the implementation of energy-intensive UIs impractical.

[0008] An alternative approach to camera-based eye tracking is electrooculography (EOG), offering markedly lower power requirements. However, EOG has inherent limitations, including lower resolution and a dependency on calibration that is specific to individual users and environmental conditions. Electromyography (EMG) represents another technological approach that can support gesture recognition. EMG sensors capture the electrical activity generated by skeletal muscles, enabling the detection of finger movements and gestures. EMG sensors can be placed on various parts of the body, such as the wrist or forearm, to monitor muscle activation. However, despite its energy efficiency compared to camera-based gesture recognition, EMG-based systems face challenges related to accuracy, calibration, and potential retraining requirements, especially in scenarios where the sensors are frequently removed and reapplied.

[0009] Existing technologies for low-power eye tracking, such as those combining EOG and VOG, have sought to balance the trade-offs between energy efficiency and accuracy. Similarly, EMG-based systems have been explored as a lower-power alternative to camera-based gesture recognition. However, standalone implementations of these technologies are often limited by their individual disadvantages, including calibration requirements and restricted applicability across various usage scenarios.

[0010] Further, the aforementioned technologies fail to support critical low power modes of XR glasses and other types of XR devices.

[0011] Hence, a challenge remains in developing UI systems that combine the above-mentioned technologies to deliver both energy efficiency and operational reliability, particularly for critical low power modes.

[0012] SUMMARY

[0013] An object of embodiments herein is to address the above challenges.

[0014] A particular object is to enable a gaze-and-gesture tracking application to be run in an XR device that is in a critical low power mode.

[0015] According to a first aspect there is presented a controller device for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection. The gaze-and-gesture tracking application is to be run in an XR device. The XR device comprises a display. The controller device comprises processing circuitry. The processing circuitry is configured to cause the controller device to activate a first UI input mode for the gaze-and-gesture tracking application based on display-based eye tracking. The display-based eye tracking is performed for detecting user selection of objects as rendered on the display. The processing circuitry is configured to cause the controller device to, responsive to detecting a critical low power condition, deactivate the display. The processing circuitry is configured to cause the controller device to activate a second UI input mode for the gaze-and-gesture tracking application based on display-less eye tracking and / or gesture tracking. The display-less eye tracking and / or gesture tracking is performed for detecting user selection of objects without any visual feedback.

[0016] According to a second aspect there is presented a system. The system comprises the controller device according to the first aspect, and electrodes and sensors to be placed on a user and configured to perform electrooculography-based eye tracking and electromyography-based gesture recognition.

[0017] According to a third aspect there is presented a method for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection. The gaze-and-gesture tracking application is to be run in an XR device. The XR device comprises a display. The method is performed by a controller device. The method comprises activating a first UI input mode for the gaze-and-gesture tracking application based on display-based eye tracking. The display-based eye tracking is performed for detecting user selection of objects as rendered on the display. The method comprises, responsive to detecting a critical low power condition, deactivating the display. The method comprises activating a second UI input mode for the gaze-and-gesture tracking application based on display-less eye tracking and / or gesture tracking. The display-less eye tracking and / or gesture tracking is performed for detecting user selection of objects without any visual feedback.

[0018] According to a fourth aspect there is presented a computer program for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection. The gaze-and-gesture tracking application is to be run in an XR device. The XR device comprises a display. The computer program comprises computer code which, when run on processing circuitry of a controller device, causes the controller device to perform actions. One action comprises the controller device to activate a first UI input mode for the gaze-and-gesture tracking application based on display-based eye tracking. The display-based eye tracking is performed for detecting user selection of objects as rendered on the display. One action comprises the controller device to, responsive to detecting a critical low power condition, deactivate the display. One action comprises the controller device to activate a second UI input mode for the gaze-and-gesture tracking application based on display-less eye tracking and / or gesture tracking. The display-less eye tracking and / or gesture tracking is performed for detecting user selection of objects without any visual feedback.

[0019] According to a fifth aspect there is presented a computer program product comprising a computer program according to the fourth aspect and a computer readable storage medium on which the computer program is stored. The computer readable storage medium could be a non-transitory computer readable storage medium.

[0020] Advantageously, these aspects enable the gaze-and-gesture tracking application to be run in an XR device that is in critical low power mode.

[0021] Advantageously, these aspects enable display-based eye tracking and gesture recognition to be used only when necessary from an eye tracking and gesture recognition performance perspective.

[0022] Advantageously, these aspects enable an eye tracking and gesture recognition combination to be selected that yields substantial power consumption benefits compared to traditional head-mounted displays.

[0023] Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings.

[0024] Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a / an / the element, apparatus, component, means, module, step, etc." are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, module, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] The inventive concept is now described, by way of example, with reference to the accompanying drawings, in which:

[0026] Fig. 1 is a schematic diagram illustrating a gaze-and-gesture tracking system according to embodiments;

[0027] Figs. 2 and 3 are block diagrams of devices according to embodiments;

[0028] Fig. 4 shows different power modes in a gaze-and-gesture tracking application according to embodiments;

[0029] Figs. 5, 6, and 7 are flowcharts of methods according to embodiments;

[0030] Fig. 8 is a schematic diagram showing structural units of a controller device according to an embodiment; and

[0031] Fig. 9 shows one example of a computer program product comprising computer readable storage medium according to an embodiment.

[0032] DETAILED DESCRIPTION

[0033] The inventive concept will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the inventive concept are shown. This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. Like numbers refer to like elements throughout the description. Any step or feature illustrated by dashed lines should be regarded as optional.

[0034] As disclosed above, challenges remain in developing UI systems that combine the above-mentioned technologies to deliver both energy efficiency and operational reliability, particularly for critical low power modes.

[0035] Fig. 1 is a schematic diagram illustrating XR devices 110, represented by a pair of smart glasses, or XR glasses, for gaze-and-gesture tracking according to embodiments. The XR device 110 comprises a display 112. The gaze-and-gesture tracking might be performed for a gaze-and-gesture tracking application pertaining to gaming, navigation, or tracking user behaviour in an XR environment. Further, the eye-tracking application might be performed for an application pertaining to an online meeting, an online learning event, etc. In these examples, there may be different UI input modes for the gaze-and-gesture tracking application. Gaze-tracking may be performed using either electrooculography-based eye-tracking or camera-based eye-tracking. The electrooculography-based eye-tracking is based on signals obtained from electrodes 120. In more detail, Fig. 1 schematically illustrates different types of implementations of electrooculography-based eye-tracking, and particularly the placements of the electrodes 120. In Fig. 1(a) six electrodes 120 (denoted HR (horizontal right), VL (vertical lower), HL (horizontal left), VU (vertical upper), REF (reference), GND (ground)) are fixed to the skin of a user 150 wearing the user device 110. In Fig. 1(b) the electrodes (as represented by single electrode 120) are part of the user device 110 itself. In Fig. 1(c) the electrodes (as represented by single electrode 120) are placed on an in-ear headset 140. In Fig. 1(c) the electrodes might be integrated with audio and microphone circuitry in the in-ear headset 140. The camera-based eye-tracking is based on signals obtained from an inward-facing camera 130 (i.e., a camera arranged to face the user as the user wears the XR device 110). Further, gesture tracking may be performed using either electromyography-based gesture tracking or camera-based gesture tracking. 
The camera-based gesture tracking is performed using one or more front-facing cameras 130 (i.e., a camera arranged to face away from the user as the user wears the XR device 110). One or more biosensors 170 can be put on fingers, on the forearm, or on the wrist of the user in order to detect finger movements and gestures. In general terms, the biosensor may be arranged to sense different types of biosignals, such as electroencephalography (EEG) signals, EMG signals, electrooculography (EOG) signals, electrocardiogram (ECG) signals, galvanic skin response (GSR) signals, blood volume pulse (BVP) signals, etc. These biosignals will hereinafter be exemplified by EMG signals. For example, EMG-based gesture tracking can be performed based on recordings of the electrical activity produced by skeletal muscles of the user. The biosensor 170 may then be configured to detect the electric potential generated by muscle cells when these cells are electrically or neurologically activated. In the illustrative example of Fig. 1, a biosensor 170 is provided in a smart wearable 160 to be worn by the user. The wearable electronic device 160 may be a smartwatch, or other type of device, such as a smart wristband. The user might wear two such wearable electronic devices 160, e.g., one per wrist, or only one. In case of multiple wearable electronic devices 160 being worn by the user, these wearable electronic devices 160 do not necessarily have to be of the same type. The system 100 further comprises a controller device. In some examples, the controller device is integrated with the XR device 110 or the wearable electronic device 160. In other examples, the controller device is provided in a user equipment. The controller device is configured to control which UI input mode is used for the gaze-and-gesture tracking application, as will be further disclosed hereinafter.

[0036] Whereas both EOG-based eye-tracking and EMG-based gesture recognition technologies are known per se, the herein disclosed embodiments are based on using them jointly and / or in combination with either camera-based eye-tracking or camera-based gesture recognition for enabling low power consumption in XR devices 110, and particularly for enabling the gaze-and-gesture tracking application to utilize a critical low power mode. In further detail, there may be situations where basic operation of the XR device is required when its battery status is very low. In such situations, the XR device must have a robust UI, even though EMG and / or EOG are not perfectly calibrated. This is because turning on camera-based technologies would drain the battery too fast. Block diagrams of an XR device 210, a wearable electronic device 220, and an EOG device 230 will be described next with reference to Fig. 2.

[0037] The XR device 210 comprises at least one camera 210a for eye-tracking (inwards-facing camera(s) plus infrared light-emitting diodes) and at least one camera 210b for gesture recognition (outwards / downwards facing cameras). Further, the XR device 210 comprises a display 210c for rendering a visual user interface on which various selectable objects, or icons, can be rendered in an XR environment. The general operation of the XR device 210 is controlled by a controller 210d, comprising processing circuitry, power and system control, local communication within the XR device 210, etc. Instructions and other types of data can be stored in a memory 210e. A communication interface 210f is provided for communication to external devices, such as a controller device, wearable electronic devices, EOG devices, wireless headphones, and the like. The communication interface 210f might implement any, or any combination of, a Bluetooth interface, an IEEE 802.11 interface, a third-generation partnership project (3GPP) side-link interface, or any other local communication interface, or even a cellular communication interface. The XR device 210 may comprise further modules, such as an EOG-based system for low-power low-accuracy eye-tracking.

[0038] The wearable electronic device 220 comprises at least one first sensor 220a for gesture recognition based on EMG signals, thus configured for sensing movements of muscles etc. of the arm and hand. The wearable electronic device 220 comprises at least one second sensor 220b for force, angular rate, and / or body orientation of the user, such as an inertial measurement unit (IMU). Further, the wearable electronic device 220 comprises a feedback module 220c for providing feedback (tactile, audible, or visual) to a user. The general operation of the wearable electronic device 220 is controlled by a controller 220d, comprising processing circuitry, power and system control, local communication within the wearable electronic device 220, etc. Instructions and other types of data can be stored in a memory 220e. A communication interface 220f is provided for communication to external devices, such as a controller device, XR devices, EOG devices, and the like. The communication interface 220f might implement any, or any combination of, a Bluetooth interface, an IEEE 802.11 interface, a 3GPP side-link interface, or any other local communication interface, or even a cellular communication interface.

[0039] The EOG device 230 comprises at least one first sensor 230a, or electrode, arranged to be fixed to the skin of a user for gesture recognition based on EOG signals. Further, the EOG device 230 comprises a feedback module 230b for providing feedback (tactile, audible, or visual) to a user. The general operation of the EOG device 230 is controlled by a controller 230c, comprising processing circuitry, power and system control, local communication within the EOG device 230, etc. Instructions and other types of data can be stored in a memory 230d. A communication interface 230e is provided for communication to external devices, such as a controller device, XR devices, wearable electronic devices, and the like. The communication interface 230e might implement any, or any combination of, a Bluetooth interface, an IEEE 802.11 interface, a 3GPP side-link interface, or any other local communication interface, or even a cellular communication interface.

[0040] As disclosed above, the controller device is configured to control which UI input mode is used for the gaze-and-gesture tracking application. This is illustrated in the block diagram 300 of Fig. 3. The block diagram 300 may be implemented by the controller device. The block diagram 300 comprises a UI control module 310 configured to control which UI input mode is to be used for the gaze-and-gesture tracking application. Since different combinations of eye-tracking and gesture tracking will be used for different UI input modes, the block diagram 300 comprises a gaze and gesture control module 320 configured to provide instructions for activation, deactivation, calibration, and other types of instructions to the XR device, the wearable electronic device, and the EOG device. For this purpose, the block diagram 300 comprises a camera (CAM) control and calibration module 330a that interfaces the XR device, an EMG control and calibration module 330b that interfaces the wearable electronic device, and an EOG control and calibration module 330c that interfaces the EOG device.

[0041] As will be disclosed in more detail hereinafter, there are disclosed methods for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection, making use of a combination of EOG and camera-based gaze tracking as well as EMG and camera-based gesture recognition, as well as other combinations. In this way, most of the time, the cameras can be in a sleep mode, which will minimize the power consumption of the XR device, and in particular the use of a critical low power mode. This is made possible through switching between the camera-based techniques and EOG as well as EMG, and the adaptation of the UI in order to maximize their usage and still allow the eye tracking and gesture recognition to be used for the gaze-and-gesture tracking application when the XR device is in a critical low power mode. Reference is here made to Fig. 4, which shows a scheme 400 comprising different power modes 410:460 that can be used in a gaze-and-gesture tracking application. Bi-directional arrows indicate some possible switching between the different power modes 410:460. Generally, the power consumption increases from left to right. I.e., the only EOG power mode 410 and / or the only EMG power mode 420 require(s) the least amount of power and the CAM+CAM power mode 460 requires the most power.
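
As an illustration only, the power modes of scheme 400 can be written down as a small lookup table. The sketch below is not taken from the disclosure: the class names, field names, and the `camera_count` helper are assumptions, and the per-mode technology assignment follows the way the CAM+EOG 440 and CAM+EMG 450 modes are characterised in the examples of the first and second UI modes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    CAMERA = "camera"
    EOG = "EOG"
    EMG = "EMG"


@dataclass(frozen=True)
class PowerMode:
    ref: int                    # reference numeral from Fig. 4
    name: str
    eye: Optional[Source]       # eye-tracking technology, None if unused
    gesture: Optional[Source]   # gesture technology, None if unused


# Left-to-right order of scheme 400 (hypothetical encoding).
POWER_MODES = [
    PowerMode(410, "only EOG", Source.EOG, None),
    PowerMode(420, "only EMG", None, Source.EMG),
    PowerMode(430, "EMG+EOG", Source.EOG, Source.EMG),
    PowerMode(440, "CAM+EOG", Source.EOG, Source.CAMERA),
    PowerMode(450, "CAM+EMG", Source.CAMERA, Source.EMG),
    PowerMode(460, "CAM+CAM", Source.CAMERA, Source.CAMERA),
]


def camera_count(mode: PowerMode) -> int:
    """Number of camera subsystems the mode keeps awake."""
    return sum(s is Source.CAMERA for s in (mode.eye, mode.gesture))
```

Counting active cameras per mode gives a rough proxy for the left-to-right increase in power consumption; actual consumption figures are not given in the text.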

[0042] Furthermore, as will be disclosed in more detail below, in a critical low power mode the eye tracking and / or the gesture tracking needs to be performed without having access to a display for displaying information to the user, due to limited power supply in the XR device causing the display to be deactivated in order to save power and prolong the battery life of the XR device. This, for example, implies that eye tracking based on identifying which icon or object the user is gazing at on a display cannot be used in the critical low power mode. This will below be referred to as display-less eye tracking and / or gesture tracking. In this critical low power mode, only calibration-robust gestures and gazes can be used, including specific combinations. Icons or selectable objects are separated in an extreme way, e.g. left, right, upper (top), lower ends (bottom), corners, and middle with respect to the field of view in which the user can gaze. In a similar way, gestures are limited to a small set of distinct gestures. Combinations of gaze and gesture can be further used for some basic operations, such as “home screen”, “exit”, “back”, and other common actions for the gaze-and-gesture tracking application at hand. Other common operations under the critical low power mode include other display-less activities, such as audio-based or even tactile-based status feedback to the user. The critical low power mode is thus even more restricted than a normal low-power mode.
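
The coarse separation of gaze targets described above can be sketched as follows. This is a minimal illustration under stated assumptions: the gaze estimate is taken to be normalised to [-1, 1] on both axes, the 0.5 threshold is arbitrary, and the gesture names and the gaze-plus-gesture action table are hypothetical (only the action names “home screen”, “exit”, and “back” come from the text).

```python
from typing import Optional


def coarse_gaze_region(x: float, y: float, threshold: float = 0.5) -> str:
    """Map a normalised gaze estimate to one of nine coarse regions."""
    horizontal = "left" if x < -threshold else "right" if x > threshold else ""
    vertical = "lower" if y < -threshold else "upper" if y > threshold else ""
    if horizontal and vertical:
        return f"{vertical}-{horizontal}"  # one of the four corners
    return horizontal or vertical or "middle"


# Hypothetical mapping from (gaze region, gesture) to a basic operation.
ACTIONS = {
    ("middle", "pinch"): "home screen",
    ("left", "pinch"): "back",
    ("upper-right", "fist"): "exit",
}


def resolve_action(region: str, gesture: str) -> Optional[str]:
    """Return the basic operation for a combination, or None if undefined."""
    return ACTIONS.get((region, gesture))
```

Because only nine widely separated regions are distinguished, a poorly calibrated gaze estimate can still land in the intended region.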

[0043] After start-up of the gaze-and-gesture tracking application, the first power mode to be used is the only EOG power mode 410 in which only EOG-based eye tracking is activated. This is sufficient for basic operation of the XR device until more advanced or higher resolution input or control is required. This is a power mode which the system software (e.g. “home screen” or basic control setup of the XR environment) and applications have access to, aware of the constraints. This allows maximizing the usage in low-power UI mode to minimize power consumption. In the only EOG power mode 410 icons as displayed on a visual UI of the XR device are separated based on the resolution and accuracy of the EOG-based eye tracking. The only EOG power mode 410 can also be used for display-less eye tracking, where the display thus is deactivated.

[0044] The only EMG power mode 420 is restricted to perform gesture recognition based on one or more EMG sensors placed on one or more wrists of the user, whereas in the EMG+EOG power mode 430 and / or the CAM+EMG power mode 450 it could be possible to perform gesture recognition based on one or more EMG sensors placed on either a single wrist or both wrists of the user. Hence, in the only EMG power mode 420 the UI should be adapted to avoid the need for dual-hand gestures by avoiding those options in menus etc. Further, since the EMG-based gesture recognition may be run in the background (e.g., for calibration purposes) when camera-based gesture recognition is used, a combination of EMG and camera-based gesture recognition may be used in the CAM+EOG power mode 440 and in the CAM+CAM power mode 460. The only EMG power mode 420 can be used for display-less gesture recognition.

[0045] In the EMG+EOG power mode 430 only gestures that are accurately recognizable by the EMG-based gesture recognition are used. This can be secured by EMG calibration and feedback to the user about which gestures are possible and guiding the user how these gestures can be better performed (if needed). The EMG+EOG power mode 430 can be used for display-less eye tracking and gesture tracking.

[0046] Since EOG has lower resolution and larger inaccuracy (more margin needed), multiple objects might fall into the detected gaze. If this occurs, not only one, but several icons may be visually marked, e.g. by gracefully indicating a region rather than a single icon on the visual UI of the XR device, and an “activate” gesture would not function unless more accurate selection has been performed. Instead of enabling the “activate” gesture, e.g. “pinch” gestures or “adjust” gestures could be enabled, allowing the user with simple EMG-detectable gestures to fine-tune the selection before any “activate” gesture can be accepted. In many cases, icons and other objects have been spread out sufficiently so that in-region fine-tuning with EMG gestures should not be needed. When the gaze-and-gesture tracking application requires a more advanced UI, e.g. more advanced gestures or higher-resolution gaze detection and when the remaining power of the XR device allows this, either the CAM+EOG power mode 440 or the CAM+EMG power mode 450 will be enabled. This occurs automatically without any need for activation from the user. The gesture-based and gaze-based camera systems are enabled independently, depending on the respective needs, in order to minimize their energy consumption. In this way, either the CAM+EOG power mode 440 or the CAM+EMG power mode 450 can be enabled, depending on if either camera-based eye tracking or camera-based gesture recognition is required by the gaze-and-gesture tracking application. The CAM+EOG power mode 440 and the CAM+EMG power mode 450 may, or may not, use display-less eye tracking and gesture tracking, depending on the implementation.
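
The handling of an ambiguous EOG gaze estimate described in the first part of the paragraph above might look as follows. This is a sketch under assumptions: icon positions and the gaze estimate share one arbitrary 2D coordinate system, the EOG inaccuracy is modelled as a circular margin, and the function names are hypothetical.

```python
import math


def candidates_in_gaze(icons: dict, gaze: tuple, margin: float) -> list:
    """Return the names of all icons within the EOG error margin of the gaze."""
    gx, gy = gaze
    return sorted(
        name
        for name, (x, y) in icons.items()
        if math.hypot(x - gx, y - gy) <= margin
    )


def enabled_gestures(candidates: list) -> set:
    """Decide which gestures are accepted for the current set of candidates."""
    if len(candidates) == 1:
        return {"activate"}          # unambiguous selection
    if len(candidates) > 1:
        return {"pinch", "adjust"}   # fine-tune inside the marked region first
    return set()                     # nothing under the gaze
```

With icons spread further apart than the margin, `candidates_in_gaze` returns at most one name and the fine-tuning step is skipped.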

[0047] Further, when the gaze-and-gesture tracking application requires an even more advanced UI, e.g., even more advanced gestures or even higher-resolution gaze detection, the CAM+CAM power mode 460 will be enabled.
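
The automatic escalation between power modes described in the two preceding paragraphs can be summarised as a small decision function. This is a sketch, not the disclosed control logic: the function name and parameters are hypothetical, `cameras_allowed` stands in for "the remaining power of the XR device allows this", and falling back to the EMG+EOG power mode 430 when no camera is needed is an assumption.

```python
def select_power_mode(
    needs_camera_gaze: bool,
    needs_camera_gesture: bool,
    cameras_allowed: bool,
) -> str:
    """Pick the lowest-power mode that satisfies the application's needs."""
    if not cameras_allowed:
        return "EMG+EOG 430"   # assumed fallback when power is too low for cameras
    if needs_camera_gaze and needs_camera_gesture:
        return "CAM+CAM 460"
    if needs_camera_gaze:
        return "CAM+EMG 450"   # camera-based eye tracking, EMG-based gestures
    if needs_camera_gesture:
        return "CAM+EOG 440"   # EOG-based eye tracking, camera-based gestures
    return "EMG+EOG 430"       # assumed default when no camera is required
```

Enabling the two camera systems independently means that each camera is woken only for the requirement that actually needs it.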

[0048] In some examples, whenever camera-based eye tracking and / or camera-based gesture recognition is activated, also EOG and EMG technologies are active in parallel for training purposes, such as fine-tuning and calibration. This is possible due to the comparatively low power consumption of both EOG-based eye tracking and EMG-based gesture tracking.
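
One way such parallel operation could support calibration is to treat the camera-based gaze angle as a reference and fit the EOG signal to it. The sketch below assumes, purely for illustration, a linear relation between EOG voltage and gaze angle and uses an ordinary least-squares fit; the text does not specify a calibration model, and the function names are hypothetical.

```python
def fit_linear_calibration(eog_samples, camera_angles):
    """Least-squares fit of angle = gain * eog + offset from paired samples."""
    n = len(eog_samples)
    mean_x = sum(eog_samples) / n
    mean_y = sum(camera_angles) / n
    sxx = sum((x - mean_x) ** 2 for x in eog_samples)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(eog_samples, camera_angles)
    )
    gain = sxy / sxx  # requires at least two distinct EOG values
    offset = mean_y - gain * mean_x
    return gain, offset


def eog_to_angle(eog_value, gain, offset):
    """Apply the fitted calibration once the camera is back in sleep mode."""
    return gain * eog_value + offset
```

The fitted gain and offset can then be reused after the camera has been switched off again.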

[0049] Fig. 5 is a flowchart illustrating embodiments of methods for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection. The gaze-and-gesture tracking application is to be run in an XR device 110. The XR device 110 comprises a display. The methods are performed by the controller device 800. The methods are advantageously provided as computer programs.

[0050] S102: The controller device activates a first UI input mode for the gaze-and-gesture tracking application. The first UI input mode uses display-based eye tracking. The display-based eye tracking is performed for detecting user selection of objects as rendered on the display of the XR device 110. In other words, by display-based eye tracking is meant that the first UI input mode is based on that icons, or objects, the user is selecting from are rendered on the display in the XR device 110.

[0051] S104: The controller device, responsive to detecting a critical low power condition, deactivates the display.

[0052] S106: The controller device (upon having deactivated the display) activates a second UI input mode for the gaze-and-gesture tracking application. The second UI input mode uses display-less eye tracking and / or gesture tracking. The display-less eye tracking and / or gesture tracking is performed for detecting user selection of objects without any visual feedback. In other words, by display-less eye tracking is meant that the second UI input mode is based on that icons, or objects, the user is selecting from are not rendered on the display in the XR device 110, since the display itself has been deactivated.

[0053] In this way, after the critical low power condition has been detected (e.g., by the battery level of the XR device being below some threshold level, such as 10% battery life remaining or even 5% battery life remaining), the display is switched off, so that no icons, virtual objects, etc. are shown to the user. However, basic eye tracking (e.g., the user gazing to the left to select a first action, the user gazing to the right to select a second action, etc.) and basic gesture tracking can still be performed, but without providing any visual feedback to the user.
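Steps S102 to S106 can be illustrated by a minimal sketch. The class name, the enum, and the 10% default threshold are assumptions made for illustration only; the text mentions 10% and 5% merely as example threshold levels.

```python
from dataclasses import dataclass
from enum import Enum


class UiInputMode(Enum):
    DISPLAY_BASED = "display-based"   # first UI input mode (S102)
    DISPLAY_LESS = "display-less"     # second UI input mode (S106)


@dataclass
class Controller:
    # Hypothetical threshold, expressed as a fraction of full battery.
    critical_threshold: float = 0.10
    display_on: bool = True
    mode: UiInputMode = UiInputMode.DISPLAY_BASED

    def on_battery_level(self, level: float) -> UiInputMode:
        """Apply S104 and S106 once the critical low power condition holds."""
        if level < self.critical_threshold and self.mode is UiInputMode.DISPLAY_BASED:
            self.display_on = False                # S104: deactivate the display
            self.mode = UiInputMode.DISPLAY_LESS   # S106: activate the second mode
        return self.mode


controller = Controller()
controller.on_battery_level(0.50)   # remains in the display-based mode
controller.on_battery_level(0.08)   # switches to the display-less mode
```

In this sketch the switch is one-way; returning to the first UI input mode after recharging is not modelled.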

[0054] Embodiments relating to further details of eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection as performed by the controller device will now be disclosed with continued reference to Fig. 5.

[0055] In general terms, the first UI mode is based on a first combination of camera-based eye tracking or EOG-based eye tracking and camera-based gesture tracking or EMG-based gesture tracking, and the second UI mode is based on a second combination of camera-based eye tracking or EOG-based eye tracking and camera-based gesture tracking or EMG-based gesture tracking, different from the first combination.

[0056] In a first example, a switch is made from CAM+CAM 460 directly to EMG+EOG 430. Thus, in a first embodiment, the first UI mode is based on camera-based eye tracking and camera-based gesture recognition, and the second UI mode is based on EOG-based eye tracking and EMG-based gesture recognition.

[0057] In a second example, a switch is made from CAM+EOG 440 to CAM+EMG 450. Thus, in a second embodiment, the first UI mode is based on EOG-based eye tracking and camera-based gesture recognition, and the second UI mode is based on camera-based eye tracking and EMG-based gesture recognition.

[0058] In a third example, a switch is made from CAM+EOG 440 to EMG+EOG 430. Thus, in a third embodiment, the first UI mode is based on EOG-based eye tracking and camera-based gesture recognition, and the second UI mode is based on EOG-based eye tracking and EMG-based gesture recognition.

[0059] In a fourth example, a switch is made from CAM+EMG 450 to EMG+EOG 430. Thus, in a fourth embodiment, the first UI mode is based on camera-based eye tracking and EMG-based gesture recognition, and the second UI mode is based on EOG-based eye tracking and EMG-based gesture recognition.

[0060] In a fifth example, a switch is made from CAM+CAM 460 to CAM+EOG 440. Thus, in a fifth embodiment, the first UI mode is based on camera-based eye tracking and camera-based gesture recognition, and the second UI mode is based on EOG-based eye tracking and camera-based gesture recognition.

[0061] In a sixth example, a switch is made from CAM+CAM 460 to CAM+EMG 450. Thus, in a sixth embodiment, the first UI mode is based on camera-based eye tracking and camera-based gesture recognition, and the second UI mode is based on camera-based eye tracking and EMG-based gesture recognition.
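The six example switches can be summarized in a small data structure. The following is a sketch only; the tuple layout and the identifier names are assumptions, while the reference numerals and the eye / gesture assignments follow the examples above.

```python
from typing import NamedTuple


class Combination(NamedTuple):
    eye: str       # "camera" or "EOG"
    gesture: str   # "camera" or "EMG"


# Reference numerals follow the text; the identifier names are hypothetical.
CAM_CAM_460 = Combination(eye="camera", gesture="camera")
CAM_EOG_440 = Combination(eye="EOG", gesture="camera")
CAM_EMG_450 = Combination(eye="camera", gesture="EMG")
EMG_EOG_430 = Combination(eye="EOG", gesture="EMG")

# (first UI mode, second UI mode) for the first to sixth examples, in order.
EXAMPLE_SWITCHES = [
    (CAM_CAM_460, EMG_EOG_430),
    (CAM_EOG_440, CAM_EMG_450),
    (CAM_EOG_440, EMG_EOG_430),
    (CAM_EMG_450, EMG_EOG_430),
    (CAM_CAM_460, CAM_EOG_440),
    (CAM_CAM_460, CAM_EMG_450),
]


def is_valid_switch(first: Combination, second: Combination) -> bool:
    """The second combination must differ from the first combination."""
    return first != second
```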

[0062] As already mentioned, the critical low power condition may be detected e.g., by the battery level of the XR device being below some threshold level, such as 10% battery life remaining or even 5% battery life remaining. Hence, in some embodiments, the critical low power condition is fulfilled when a power supply for the gaze-and-gesture tracking application falls below a power supply threshold level.

[0063] As already mentioned, in the critical low power mode, icons or selectable objects are separated in an extreme way, e.g. at the left, right, upper (top), and lower (bottom) ends, the corners, and the middle of the field of view in which the user can gaze. Therefore, in some embodiments, the second UI input mode is associated with a lower resolution gaze detection requirement than the first UI input mode by being associated with fewer and / or more widely spaced selectable objects than the first UI input mode. In some examples, only calibration-robust gestures and gazes (and specific combinations thereof) can be used in the second UI input mode. In a similar way, gestures may be limited to a small set of distinct gestures. Combinations of gaze and gesture can further be used for some basic operations, such as “home screen”, “exit”, “back”, and other common actions for the gaze-and-gesture tracking application at hand. Other common operations in the second UI input mode may pertain to other display-less activities, such as audio-based or even tactile-based status feedback to the user.
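The widely spaced selection regions can be illustrated as follows. This is a sketch under stated assumptions: the gaze estimate is taken to be normalized to [-1, 1] on both axes with the origin at the center of the field of view, and the margin value is hypothetical.

```python
def coarse_gaze_region(x: float, y: float, margin: float = 0.33) -> str:
    """Map a normalized gaze estimate to one of nine coarse regions.

    x and y are assumed to lie in [-1, 1], with (0, 0) at the center of the
    field of view, positive x to the right and positive y upward.
    The margin value is a hypothetical choice, not taken from the text.
    """
    horizontal = "left" if x < -margin else "right" if x > margin else ""
    vertical = "bottom" if y < -margin else "top" if y > margin else ""
    if horizontal and vertical:
        return f"{vertical}-{horizontal}"   # one of the four corners
    return horizontal or vertical or "middle"
```

A larger margin widens the middle region and pushes the remaining regions further toward the edges, which lowers the gaze resolution that the eye tracking has to deliver.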

[0064] Then, in case multiple selections fall into the gaze area (which can be fairly large in the second UI input mode), this region and / or a recommended object may be indicated to the user. Therefore, in some embodiments, the controller device is configured to, responsive to the display-less eye tracking associated with the second UI input mode indicating selection of at least two selectable objects in the display-less eye tracking, perform step S108.

[0065] S108: The controller device provides audible or tactile feedback to the user of the XR device 110. The feedback is indicative of a region comprising the at least two selectable objects and / or is indicative of one of the at least two selectable objects for selection.

[0066] Further, the feedback may indicate one of the selectable objects. Further, providing the feedback may comprise temporarily activating the display to highlight this one of the selectable objects. This indicated object may be the most probable object for the user to select. In one example, this object is chosen because it is placed in the center of the area at which the user is gazing. In one example, this object is given by the context of the gaze-and-gesture tracking application. In one example, this object is given by having previously been selected by the user. In particular, in some embodiments, which one of the at least two selectable objects the feedback is to indicate for selection is based on a probability criterion. The probability criterion pertains to at least one of: location of this one of the at least two selectable objects in the region, context of the gaze-and-gesture tracking application, amount of previous selections of said one of the at least two selectable objects.
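One possible way to combine the three factors of the probability criterion is sketched below. The linear weighting, the context score field, and all names are assumptions for illustration; the text does not prescribe how the factors are combined.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    x: float                   # position within the gaze region
    y: float
    context_score: float       # hypothetical application-context relevance in [0, 1]
    previous_selections: int   # how often the user selected this object before


def most_probable(candidates, gaze_x, gaze_y, weights=(1.0, 1.0, 1.0)):
    """Return the candidate with the highest combined score.

    The linear weighting of location, context, and history is an
    illustrative assumption.
    """
    w_loc, w_ctx, w_hist = weights
    total = sum(c.previous_selections for c in candidates) or 1

    def score(c):
        closeness = 1.0 / (1.0 + math.hypot(c.x - gaze_x, c.y - gaze_y))
        history = c.previous_selections / total
        return w_loc * closeness + w_ctx * c.context_score + w_hist * history

    return max(candidates, key=score)
```

Setting a weight to zero removes the corresponding factor, so the same function covers criteria based on only one or two of the three factors.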

[0067] In some aspects, for example in case the indicated object is not selected by the user, the selection is fine-tuned. Hence, in some embodiments, the controller device is configured to, responsive to this one of the at least two selectable objects not being selected, perform (optional) step S110.

[0068] S110: The controller device repeatedly provides audible or tactile feedback indicative of a respective one of the at least two selectable objects for selection until the object as indicated is selected.
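The repeated indication of step S110 can be sketched as a loop over the candidate objects. The two callbacks and the bound on the number of rounds are hypothetical; the bound is an added safeguard that is not described in the text.

```python
from typing import Callable, Optional, Sequence


def cycle_until_selected(
    candidates: Sequence[str],
    announce: Callable[[str], None],
    is_confirmed: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[str]:
    """Indicate each candidate in turn until the user confirms one.

    announce stands in for audible or tactile feedback; is_confirmed stands in
    for the user's confirming gaze or gesture. Both are hypothetical callbacks.
    """
    for _ in range(max_rounds):
        for candidate in candidates:
            announce(candidate)
            if is_confirmed(candidate):
                return candidate
    return None   # nothing was confirmed within max_rounds


heard = []
chosen = cycle_until_selected(
    ["home", "back", "exit"], heard.append, lambda name: name == "back"
)
```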

[0069] The audible feedback can be provided as a sound, for example as one or more beeps, or tones. Different patterns of beeps, or tones, can be used to provide different types of feedback information. This may be achieved by providing the XR device with one or more speakers. The tactile feedback can be provided as one or more vibrations. Different patterns of vibrations can be used to provide different types of feedback information. This may be achieved by providing the XR device with one or more vibrators. Further, feedback indicative of a respective one of the at least two selectable objects for selection can also be provided by means of a blinking light. This may be achieved by providing the XR device with one or more light sources.

[0070] This procedure can be used to update the accuracy estimation of the EOG for that region, as well as to update the EOG calibration. In particular, in some embodiments, the controller device is configured to, responsive to the object that is being indicated being selected, perform (optional) step S112.

[0071] S112: The controller device updates an accuracy estimation and / or calibration of eye tracking and gesture recognition of the second UI input mode at least for this region.

[0072] In some examples, the accuracy estimation and / or calibration is updated based on a difference in location between this one of the at least two selectable objects and the object that is selected.

[0073] This accuracy estimation can be used to determine the size of the gaze-region for future usages. That is, the accuracy estimation may define the size of a gaze region for the eye tracking.
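As a sketch of how an accuracy estimation might be updated from the difference in location and then define the size of the gaze region, an exponentially smoothed error estimate can be used. The smoothing approach, the default values, and the names are assumptions, not taken from the text.

```python
import math


class RegionAccuracy:
    """Running error estimate for one gaze region (illustrative only)."""

    def __init__(self, initial_error: float = 0.2, smoothing: float = 0.3,
                 scale: float = 2.0):
        # All three defaults are hypothetical values.
        self.error = initial_error
        self.smoothing = smoothing
        self.scale = scale

    def update(self, indicated_xy, selected_xy) -> float:
        """Blend in the distance between the indicated and the selected object."""
        distance = math.dist(indicated_xy, selected_xy)
        self.error = (1 - self.smoothing) * self.error + self.smoothing * distance
        return self.error

    def gaze_region_radius(self) -> float:
        """A larger estimated error yields a larger gaze region."""
        return self.scale * self.error
```

When the indicated object and the selected object coincide, the distance is zero and the estimated error shrinks, so the gaze region for future usages becomes smaller.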

[0074] For gestures where the classification confidence is lower than a certain threshold, alternative gestures which can be selected based on EOG-robust gaze support can be indicated to the user. Hence, in some embodiments, the controller device is configured to perform (optional) steps S114 and S116.

S114: The controller device estimates a classification confidence for the second UI input mode.

[0075] S116: The controller device, responsive to detecting a low classification confidence condition for the classification confidence, updates the second UI input mode to be associated with even lower resolution gaze detection requirement, and to be associated with even fewer gestures.
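Steps S114 and S116 can be sketched as follows. The use of the mean as the overall classification confidence, the threshold value, the fallback to left and right regions, and the number of retained gestures are all assumptions made for illustration.

```python
from statistics import mean


def estimate_confidence(per_gesture_confidence: dict) -> float:
    """S114: summarize classifier confidence (using the mean is an assumption)."""
    return mean(per_gesture_confidence.values())


def update_second_mode(regions, per_gesture_confidence,
                       low_threshold=0.6, keep_gestures=2):
    """S116: coarsen the second UI input mode when the confidence is low.

    The threshold, the left/right fallback, and keep_gestures are hypothetical.
    """
    gestures = sorted(per_gesture_confidence,
                      key=per_gesture_confidence.get, reverse=True)
    if estimate_confidence(per_gesture_confidence) < low_threshold:
        regions = [r for r in regions if r in ("left", "right")]   # fewer regions
        gestures = gestures[:keep_gestures]                        # fewer gestures
    return regions, gestures
```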

[0076] Further, basic functions (for example to start a phone call, activate voice control, turn on status indication on the display, etc.) can be activated by a combination of gaze control and gesture control with known high accuracy. That is, in some embodiments, the display-less eye tracking is updated to be associated only with selectable objects associated with gestures for which the classification confidence satisfies a high classification confidence condition. Further, audio feedback or haptic feedback can be provided to the user upon selection of some basic commands. Hence, in some embodiments, the controller device is configured to, responsive to having deactivated the display, perform (optional) step S118.

[0077] S118: The controller device activates audible or haptic feedback to be provided upon selection of a selectable object.

[0078] One particular embodiment for critical low power mode eye tracking and gesture recognition in a gaze-and-gesture tracking application based on at least some of the above disclosed embodiments will now be disclosed in detail with reference to the flowchart of Fig. 6.

[0079] S201: It is detected that the battery level of the XR device is below some threshold level, such as 10% battery life remaining or even 5% battery life remaining.

[0080] S202: The XR device, responsive to the critical low power condition being detected, enters a critical low power mode where, for example, the display is deactivated.

[0081] S203: A UI input mode, based on display-less eye tracking and / or gesture tracking, for critical low power operation of the gaze-and-gesture tracking application is activated.

[0082] S204: The display-less eye tracking and / or gesture tracking is ongoing. It is checked whether a combined gaze and gesture is detected. If yes, step S205 is entered. If no, step S207 is entered.

[0083] S205: It is checked whether the detected combined gaze and gesture corresponds to a valid gaze and gesture command in the gaze-and-gesture tracking application. If yes, step S206 is entered. If no, step S207 is entered.

[0084] S206: The gaze and gesture command is executed.

[0085] S207: It is checked whether the object selection based on the display-less eye tracking and / or gesture tracking is ambiguous or not. If yes, step S208 is entered. If no, step S204 can be entered again. The object selection based on the display-less eye tracking and / or gesture tracking is ambiguous in case the gaze input cannot clearly be distinguished, e.g., when there are two or more objects that are close in probability for selection. This could be the case where EOG has low accuracy and multiple options might be possible. The command in step S206 could then be indicated to the user with a most probable choice of the two or more objects, and the user can, by a simple gesture, update that choice as part of a re-training procedure (as in step S208).

[0086] S208: A fine-tuning, or re-training, of the display-less eye tracking and / or gesture tracking is performed. This can be done by temporarily activating the display and highlighting the two or more possible options with one being selected, and the user can either confirm or move the selection to any other highlighted object and then confirm. Step S204 can then be entered again. Thus, when there is an ambiguity and the user corrects the selection, the system gets feedback that out of the two or more likely objects, the system was wrong in what was the most probable object, and the system can thus adjust its calibration accordingly.
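The branching of steps S204 to S208 can be sketched as a single decision function. The function name, its parameters, and the returned labels are hypothetical; the branch order follows the flowchart description above.

```python
from typing import Optional


def handle_input(command: Optional[str], valid_commands: set,
                 ambiguous: bool) -> str:
    """Decide the next action for one pass through steps S204 to S208.

    command is the detected combined gaze and gesture, or None when nothing
    was detected. The returned labels are hypothetical names for the branches.
    """
    if command is not None and command in valid_commands:   # S204 and S205
        return "execute"                                     # S206
    if ambiguous:                                            # S207
        return "fine_tune"                                   # S208
    return "keep_tracking"                                   # back to S204
```

Calling this function repeatedly while the display-less tracking is ongoing reproduces the loop back to step S204.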

[0087] One particular embodiment for switching between different power modes during eye tracking and gesture recognition in a gaze-and-gesture tracking application based on at least some of the above disclosed embodiments will now be disclosed in detail with reference to the flowchart of Fig. 7. The XR device is operated in a first UI input mode for the gaze-and-gesture tracking application, as in step S301. The first UI input mode is based on a first combination of camera-based eye tracking or EOG-based eye tracking and camera-based gesture tracking or EMG-based gesture tracking. The first UI input mode is based on display-based eye tracking. Operation may remain in the first UI input mode until the remaining power goes below 20% battery life remaining, as checked in step S302. When the remaining power goes below 20% battery life remaining, the XR device is operated in a second UI input mode for the gaze-and-gesture tracking application, as in step S303. The second UI input mode is based on a second combination of camera-based eye tracking or EOG-based eye tracking and camera-based gesture tracking or EMG-based gesture tracking. The second UI input mode is a low power mode and consumes less power than the first UI input mode. In the second UI input mode, the gaze-and-gesture tracking application thus runs in low power mode. The display can be selectively activated and deactivated. Operation may remain in the second UI input mode until the remaining power goes below 5% battery life remaining, as checked in step S304. When the remaining power goes below 5% battery life remaining, the XR device is operated in a third UI input mode for the gaze-and-gesture tracking application, as in step S305. The third UI input mode is based on a third combination of camera-based eye tracking or EOG-based eye tracking and camera-based gesture tracking or EMG-based gesture tracking. The third UI input mode is based on display-less eye tracking. The third UI input mode is a critical low power mode and consumes less power than the second UI input mode. Optionally, when operating in the third UI input mode, gesture and gaze control is moved to a companion device, as in step S306. The user might still wear the XR device and use combinations of gazes and gestures, but in fact most of the execution and control is performed by the companion device, such as a smart wearable device for gesture detection, and an EOG device for gaze detection. Although the flowchart in Fig. 7 shows operation going from a first UI input mode to a second UI input mode and then to a third UI input mode, this does not exclude the possibility of the operation going from the first UI input mode directly to the third UI input mode, e.g., when the remaining power goes below 20% battery life remaining in the XR device. Further, although the flowchart in Fig. 7 shows operation in three different UI input modes, this does not exclude the possibility of further UI input modes, with respective threshold values.
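The threshold checks of steps S302 and S304 can be sketched as a simple selection function. The function name and the string labels are hypothetical; the 20% and 5% defaults are the example values given for Fig. 7.

```python
def select_ui_input_mode(battery_fraction: float,
                         low_threshold: float = 0.20,
                         critical_threshold: float = 0.05) -> str:
    """Pick a UI input mode from the remaining battery (steps S302 and S304)."""
    if battery_fraction < critical_threshold:
        return "third"    # critical low power mode, display-less (S305)
    if battery_fraction < low_threshold:
        return "second"   # low power mode, display selectively activated (S303)
    return "first"        # display-based (S301)
```

Further UI input modes with respective threshold values could be added as additional comparisons in the same manner.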

[0088] Fig. 8 schematically illustrates, in terms of a number of structural units, the components of a controller device 800 according to an embodiment. Processing circuitry 810 is provided using any combination of one or more of a suitable central processing unit (CPU), multiprocessor, microcontroller, digital signal processor (DSP), etc., capable of executing software instructions stored in a computer program product 910 (as in Fig. 9), e.g. in the form of a storage medium 830. The processing circuitry 810 may further be provided as at least one application specific integrated circuit (ASIC), or field programmable gate array (FPGA).

[0089] Particularly, the processing circuitry 810 is configured to cause the controller device 800 to perform a set of operations, or steps, as disclosed above. For example, the storage medium 830 may store the set of operations, and the processing circuitry 810 may be configured to retrieve the set of operations from the storage medium 830 to cause the controller device 800 to perform the set of operations. The set of operations may be provided as a set of executable instructions.

[0090] Thus, the processing circuitry 810 is thereby arranged to execute methods as herein disclosed. The storage medium 830 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory. The controller device 800 may further comprise a communications (comm.) interface 820 at least configured for communications with other entities, functions, nodes, and devices, such as the XR device, the wearable electronic device, and the EOG device. As such the communications interface 820 may comprise one or more transmitters and receivers, comprising analogue and digital components. The processing circuitry 810 controls the general operation of the controller device 800 e.g. by sending data and control signals to the communications interface 820 and the storage medium 830, by receiving data and reports from the communications interface 820, and by retrieving data and instructions from the storage medium 830. Other components, as well as the related functionality, of the controller device 800 are omitted in order not to obscure the concepts presented herein.

[0091] The controller device 800 may be provided as a standalone device or as a part of at least one further device. Thus, a first portion of the instructions performed by the controller device 800 may be executed in a first device, and a second portion of the instructions performed by the controller device 800 may be executed in a second device; the herein disclosed embodiments are not limited to any particular number of devices on which the instructions performed by the controller device 800 may be executed. Hence, the methods according to the herein disclosed embodiments are suitable to be performed by a controller device 800 residing in a cloud computational environment. Therefore, although a single processing circuitry 810 is illustrated in Fig. 8, the processing circuitry 810 may be distributed among a plurality of devices, or nodes. The same applies to the computer program 920 of Fig. 9.

[0092] Fig. 9 shows one example of a computer program product 910 comprising a computer readable storage medium 930. On this computer readable storage medium 930, a computer program 920 can be stored, which computer program 920 can cause the processing circuitry 810 and thereto operatively coupled entities and devices, such as the communications interface 820 and the storage medium 830, to execute methods according to embodiments described herein. The computer program 920 and / or computer program product 910 may thus provide means for performing any steps as herein disclosed.

[0093] In the example of Fig. 9, the computer program product 910 is illustrated as an optical disc, such as a CD (compact disc) or a DVD (digital versatile disc) or a Blu-Ray disc. The computer program product 910 could also be embodied as a memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM) and more particularly as a non-volatile storage medium of a device in an external memory such as a USB (Universal Serial Bus) memory or a Flash memory, such as a compact Flash memory. Thus, while the computer program 920 is here schematically shown as a track on the depicted optical disc, the computer program 920 can be stored in any way which is suitable for the computer program product 910.

[0094] The inventive concept has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the inventive concept, as defined by the appended patent claims.

Claims

1. A controller device (800) for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection, wherein the gaze-and-gesture tracking application is to be run in an extended reality, XR, device (110), wherein the XR device (110) comprises a display (112, 210c), the controller device (800) comprising processing circuitry (810) configured to: activate a first user interface, UI, input mode for the gaze-and-gesture tracking application based on display-based eye tracking, wherein the display-based eye tracking is performed for detecting user selection of objects as rendered on the display; and responsive to detecting a critical low power condition: deactivate the display; and activate a second UI input mode for the gaze-and-gesture tracking application based on display-less eye tracking and / or gesture tracking, wherein the display-less eye tracking and / or gesture tracking is performed for detecting user selection of objects without any visual feedback.

2. The controller device (800) according to claim 1, wherein the first UI mode is based on a first combination of camera-based or electrooculography, EOG, based eye tracking and camera-based or electromyography, EMG, based gesture tracking, and wherein the second UI mode is based on a second combination of camera-based or EOG-based eye tracking and camera-based or EMG-based gesture tracking, different from the first combination.

3. The controller device (800) according to claim 2, wherein the first UI mode is based on camera-based eye tracking and camera-based gesture recognition, and the second UI mode is based on EOG-based eye tracking and EMG-based gesture recognition.

4. The controller device (800) according to claim 2, wherein the first UI mode is based on EOG-based eye tracking and camera-based gesture recognition, and the second UI mode is based on camera-based eye tracking and EMG-based gesture recognition.

5. The controller device (800) according to any preceding claim, wherein the critical low power condition is fulfilled when a power supply for the gaze-and-gesture tracking application falls below a power supply threshold level.

6. The controller device (800) according to any preceding claim, wherein the second UI input mode is associated with a lower resolution gaze detection requirement than the first UI input mode by being associated with fewer and / or more widely spaced selectable objects than the first UI input mode.

7. The controller device (800) according to claim 6, wherein the processing circuitry (810) is configured to, responsive to the display-less eye tracking associated with the second UI input mode indicating selection of at least two selectable objects in the display-less eye tracking: provide audible or tactile feedback to a user of the XR device (110), wherein the feedback is indicative of a region comprising the at least two selectable objects and / or is indicative of one of the at least two selectable objects for selection.

8. The controller device (800) according to claim 7, wherein that the feedback is to be indicative of said one of the at least two selectable objects being selected is based on a probability criterion, the probability criterion pertaining to at least one of: location of said one of the at least two selectable objects in the region, context of the gaze-and-gesture tracking application, amount of previous selections of said one of the at least two selectable objects.

9. The controller device (800) according to claim 7 or 8, wherein the processing circuitry (810) is configured to, responsive to said one of the at least two selectable objects not being selected: repeatedly provide audible or tactile feedback indicative of a respective one of the at least two selectable objects for selection until the object as indicated is selected.

10. The controller device (800) according to any of claims 7 to 9, wherein the processing circuitry (810) is configured to, responsive to the object that is being indicated is selected: update an accuracy estimation and / or calibration of eye tracking and gesture recognition of the second UI input mode at least for said region.

11. The controller device (800) according to claim 10, wherein the accuracy estimation and / or calibration is updated based on a difference in location between said one of the at least two selectable objects and the object that is selected.

12. The controller device (800) according to claim 10 or 11, wherein the accuracy estimation defines size of a gaze region for the eye tracking.

13. The controller device (800) according to claim 6, wherein the processing circuitry (810) is configured to: estimate a classification confidence for the second UI input mode; and responsive to detecting a low classification confidence condition for the classification confidence: update the second UI input mode to be associated with even lower resolution gaze detection requirement, and to be associated with even fewer gestures.

14. The controller device (800) according to claim 13, wherein the display-less eye tracking is updated to be associated only with selectable objects associated with gestures for which the classification confidence satisfies a high classification confidence condition.

15. The controller device (800) according to any preceding claim, wherein the processing circuitry (810) is configured to, responsive to having deactivated the display: activate audible or haptic feedback to be provided upon selection of a selectable object.

16. A system, the system comprising the controller device (800) according to any preceding claim, and electrodes (120) and sensors (160) to be placed on a user (150) and configured to perform electrooculography-based eye tracking and electromyography-based gesture recognition.

17. The system according to claim 16, wherein the system further comprises a camera (130) configured to perform camera-based eye tracking and gesture recognition.

18. A method for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection, wherein the gaze-and-gesture tracking application is to be run in an extended reality, XR, device (110), wherein the XR device (110) comprises a display (112, 210c), the method being performed by a controller device (800), the method comprising: activating (S102) a first user interface, UI, input mode for the gaze-and-gesture tracking application based on display-based eye tracking, wherein the display-based eye tracking is performed for detecting user selection of objects as rendered on the display; and responsive to detecting a critical low power condition: deactivating (S104) the display; and activating (S106) a second UI input mode for the gaze-and-gesture tracking application based on display-less eye tracking and / or gesture tracking, wherein the display-less eye tracking and / or gesture tracking is performed for detecting user selection of objects without any visual feedback.

19. A computer program (920) for eye tracking and gesture recognition in a gaze-and-gesture tracking application that combines eye tracking for object selection and gesture recognition for action selection, wherein the gaze-and-gesture tracking application is to be run in an extended reality, XR, device (110), wherein the XR device (110) comprises a display (112, 210c), the computer program comprising computer code which, when run on processing circuitry (810) of a controller device (800), causes the controller device (800) to: activate (S102) a first user interface, UI, input mode for the gaze-and-gesture tracking application based on display-based eye tracking, wherein the display-based eye tracking is performed for detecting user selection of objects as rendered on the display; and responsive to detecting a critical low power condition: deactivate (S104) the display; and activate (S106) a second UI input mode for the gaze-and-gesture tracking application based on display-less eye tracking and / or gesture tracking, wherein the display-less eye tracking and / or gesture tracking is performed for detecting user selection of objects without any visual feedback.

20. A computer program product (910) comprising a computer program (920) according to claim 19, and a computer readable storage medium (930) on which the computer program is stored.