Method and system for eye field user interaction

By displaying small light markers within the user's eye movement field and combining this with machine learning analysis of gaze vectors, the problems of interaction latency and dependence on physical actions in traditional eye-tracking technologies are solved, enabling fast and reliable user interaction.

CN122308693A · Pending · Publication Date: 2026-06-30 · UNITY TECHNOLOGIES SAN FRANCISCO INC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: UNITY TECHNOLOGIES SAN FRANCISCO INC
Filing Date: 2025-12-29
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Traditional eye-tracking technology suffers from interaction delays and reliance on physical movements in virtual and mixed reality environments, making it difficult to achieve efficient and reliable user interaction without requiring the user's hands to move.

Method used

By displaying small light markers within the user's eye-tracking field and combining machine learning analysis of gaze vectors, a dynamic movement mode is used to confirm the user's intent, reducing interaction time and improving accuracy.

Benefits of technology

It enables fast and reliable user interaction, reduces interaction latency, and is suitable for various virtual and mixed reality environments, especially where traditional physical input is impractical.

✦ Generated by Eureka AI based on patent content.

Figure CN122308693A_ABST

Abstract

This disclosure relates to methods and systems for eye-tracking field user interaction. A method for implementing eye-tracking field user interaction is disclosed. One or more light markers are displayed within an eye-tracking field region of a head-mounted display device. The eye-tracking field region is positioned within a configurable distance of the user's eyes and at a distance at which the user's eyes can focus on one or more light markers. It is detected that the user is focusing on a first light marker among the one or more light markers. In response to detecting that the user is focusing on the first light marker, the first light marker is moved in a confirmation mode. A virtual input event is triggered when the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy for a configurable duration.

Description

Technical Field

[0001] The present invention generally relates to eye-tracking-based user interaction systems in virtual and mixed reality environments, and in a particular embodiment, to a system for detecting user intent by eye gaze tracking of small light markers located in the user's eye field.

Background Technology

[0002] Eye-tracking technology has been used in virtual and mixed reality environments for user interaction. Traditional methods rely on dwell time systems (where users must maintain their gaze on a target for an extended period) or composite systems (which require additional physical input such as button presses or gestures to confirm selections). While these conventional methods are available, they introduce inherent latency during interaction or require additional physical actions, which may not be ideal for all use cases.

Summary of the Invention

[0003] One aspect of the present invention relates to a non-transitory computer-readable storage medium storing an instruction set that, when executed by one or more computer processors, causes the one or more computer processors to perform operations including: displaying one or more light markers within an eye-tracking field region of a head-mounted display device, the eye-tracking field region being located within a configurable distance of a user's eyes and at a distance at which the user's eyes can focus on the one or more light markers; detecting that the user is focusing on a first light marker among the one or more light markers; moving the first light marker in a confirmation mode in response to detecting that the user is focusing on the first light marker; and triggering a virtual input event based on determining that the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy.

[0004] Another aspect of the invention relates to a system comprising: one or more computer processors; one or more computer memories; and an instruction set stored in the one or more computer memories, the instruction set configuring the one or more computer processors to perform operations including: displaying one or more light markers within an eye-tracking field region of a head-mounted display device, the eye-tracking field region being positioned within a configurable distance of a user's eyes and at a distance at which the user's eyes can focus on the one or more light markers; detecting that the user is focusing on a first light marker among the one or more light markers; moving the first light marker in a confirmation mode in response to detecting that the user is focusing on the first light marker; and triggering a virtual input event based on determining that the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy.

[0005] Another aspect of the invention relates to a method comprising: displaying one or more light markers within an eye-tracking field region of a head-mounted display device, the eye-tracking field region being positioned within a configurable distance of a user's eyes and at a distance at which the user's eyes can focus on the one or more light markers; detecting that the user is focusing on a first light marker among the one or more light markers; moving the first light marker in a confirmation mode in response to detecting that the user is focusing on the first light marker; and triggering a virtual input event based on determining that the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy.

Attached Figure Description

[0006] The features and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:

[0007] Figure 1 is a schematic diagram illustrating an example head-mounted display;

[0008] Figure 2 is a schematic diagram illustrating an example Mixed Reality User Interface (MR-UI) system;

[0009] Figure 3 is a flowchart illustrating an example method for implementing eye-tracking field user interaction;

[0010] Figure 4 is a flowchart illustrating an example method for implementing eye-tracking field user interaction;

[0011] Figure 5 is a block diagram illustrating example software architectures that can be used in conjunction with the various hardware architectures described herein; and

[0012] Figure 6 is a block diagram illustrating example components of a machine configured to read instructions (e.g., from a machine-readable storage medium) and perform any one or more of the operations or methods discussed herein.

Detailed Implementation

[0013] The following description illustrates systems, methods, techniques, instruction sequences, and computer program products that constitute illustrative embodiments of the subject matter of this disclosure. In this description, numerous specific details are set forth for illustrative purposes to provide an understanding of various embodiments of the subject matter of the invention. However, it will be apparent to those skilled in the art that embodiments of the subject matter of the invention can be practiced without these specific details.

[0014] Traditional eye-tracking systems face fundamental challenges due to their reliance on extended dwell times, typically requiring users to hold their gaze on the target (e.g., for one to two seconds) to trigger an action. This introduces significant latency, which degrades the user experience and limits the practical application of the technology. Furthermore, existing systems may require supplemental physical input such as finger taps or button presses to confirm selections, which limits accessibility and usability when the user's hands are occupied or physical input is impractical.

[0015] A technical limitation of conventional systems is that, at typical interaction distances, they cannot reliably distinguish between intentional selection and accidental gaze intersections. When markers or interface elements are positioned at standard distances (e.g., about 1 meter from the user), existing systems may misinterpret a user viewing background objects as intentional interaction.

[0016] The described implementation provides a novel technical solution by, for example, positioning small light markers within the user's eye movement field (e.g., within centimeters of the user's eyes while both eyes can still focus). This precise positioning creates an unprecedented signal-to-noise ratio for gaze detection because there are typically no other objects competing for attention in this space.

[0017] In an example implementation, the light marker may include a shape (e.g., an orb or sphere), a single point of light, or a set color, displayed on one or more 2D stereoscopic displays and appearing at a fixed position in the user-perceived 3D space.

[0018] While light markers can have defined shapes, unlike traditional user interface elements such as pictographs or icons, light markers do not need to include iconography. Light markers are simpler and therefore can appear much smaller to the user while still retaining their functionality.

[0019] For certain applications, such as training new users or users with impaired vision, the light markers can be configured to be larger than normal, or icons can be added to them, but icons are not required.

[0020] The described implementation also advances the technology by innovatively using a confirmation mode instead of static dwell time. When a user’s focus on a marker is detected, the system (e.g., an eye-tracking field interaction system) initiates a dynamic movement mode (e.g., by moving the marker in a direction away from the user, including optionally keeping other markers in their original positions).

[0021] This method integrates traditional dwell time into the motion vector, allowing for much faster interaction times and reducing the required confirmation period (e.g., from about one second to a quarter of a second) while maintaining high reliability.

[0022] In the example implementation, the system incorporates machine learning capabilities to continuously improve accuracy and / or reduce required interaction time. The ML model analyzes the time series of gaze vectors, rather than just instantaneous intersections, learning to distinguish patterns that indicate intentional user interaction from incidental gaze intersections. This enables the dynamic adjustment of activation thresholds based on detected user behavior patterns, representing a significant improvement over traditional systems that rely solely on static geometric calculations.
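As a rough illustration of classifying a gaze time series rather than a single intersection, the sketch below trains a one-feature logistic regression in plain Python. Everything in it is an assumption made for illustration: the feature (mean angular error between gaze and marker over a window), the synthetic training windows, and the function names. The disclosure does not specify a model type or feature set.

```python
import math

def window_feature(errors_deg):
    """Summarize a time series of gaze-to-marker angular errors (degrees) by its mean."""
    return sum(errors_deg) / len(errors_deg)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(features, labels, lr=0.1, epochs=5000):
    """Fit weight w and bias b by batch gradient descent on the logistic loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def intent_probability(errors_deg, model):
    """Estimated probability that a window of samples reflects intentional following."""
    w, b = model
    return sigmoid(w * window_feature(errors_deg) + b)

# Synthetic, hypothetical training windows: small errors when the user follows
# the marker on purpose, large errors when the gaze merely passes by.
intentional = [[0.4, 0.5, 0.6], [0.9, 1.0, 1.1], [1.4, 1.5, 1.6]]
incidental = [[3.0, 4.0, 5.0], [5.0, 6.0, 7.0], [7.0, 8.0, 9.0]]
features = [window_feature(w) for w in intentional + incidental]
labels = [1] * len(intentional) + [0] * len(incidental)
model = train(features, labels)
```

A production model would presumably use richer temporal features (for example, how gaze velocity tracks marker velocity) and real recorded data; the single feature here only keeps the example small.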

[0023] This implementation is designed for maximum flexibility and efficiency, supporting both high-level application integration (e.g., via scripts, such as C# scripts) and low-level firmware implementations to enhance performance.

[0024] In the example implementation, the system directly accesses the eye-tracking camera at the operating system level or even lower (e.g., firmware level) to create a 3D model of each eye and determine precise eye orientation and pointing. This allows for faster processing of gaze vectors compared to relying on higher-level APIs.

[0025] This implementation includes a calibration routine that measures the pixel error rate during an initial calibration phase when the user focuses on a known reference point. These error measurements can be used to define a precise confidence region for gaze intersection detection at the hardware level. The operating system or firmware can also dynamically adjust these confidence regions based on data from environmental sensors (e.g., illumination sensors) from the HMD.
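One way to turn calibration errors into a confidence region is sketched below. It assumes the "pixel error" is the Euclidean distance between the reported gaze point and the known reference point, that the region is a circle of radius mean plus k standard deviations, and that dim lighting widens the region by a simple ratio. The factor k, the nominal lux value, and the scaling rule are hypothetical choices, not taken from the disclosure.

```python
import math
import statistics

def confidence_radius_px(reported_px, reference_px, k=2.0):
    """Radius of a circular confidence region: mean error plus k standard deviations.

    reported_px and reference_px are equal-length lists of (x, y) pixel pairs
    collected while the user looks at known reference points.
    """
    errors = [math.dist(r, ref) for r, ref in zip(reported_px, reference_px)]
    mean_error = statistics.fmean(errors)
    spread = statistics.pstdev(errors) if len(errors) > 1 else 0.0
    return mean_error + k * spread

def adjust_for_illumination(radius_px, lux, nominal_lux=300.0, max_scale=2.0):
    """Widen the region in dim light (hypothetical rule), capped at max_scale."""
    if lux >= nominal_lux:
        return radius_px
    scale = min(max_scale, nominal_lux / max(lux, 1e-6))
    return radius_px * scale
```

For example, errors of 5 and 10 pixels give a mean of 7.5 and a population standard deviation of 2.5, so the radius with k = 2 is 12.5 pixels; at half the nominal illumination the sketch doubles it to 25 pixels.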

[0026] Each lower-level implementation can retain the same core functionality as the application-level version but offers improved performance because it eliminates several layers of software abstraction. In the example implementation, this approach is adapted to different hardware implementations to address variations in lens distortion and / or other hardware-specific characteristics.

[0027] In an example implementation, the system includes advanced calibration capabilities for measuring and / or resolving camera device defects, as well as dynamic confidence zone adjustment based on environmental conditions and / or head movement. The markers themselves can be programmed to represent various functions, similar to physical controller buttons, and can be remapped based on application requirements.

[0028] This technological solution not only addresses the core limitations of existing eye-tracking interfaces but also enables new use cases, such as benefiting users with disabilities or those for whom traditional physical input is impractical. The system's optional integration with XR input systems positions it to serve the growing VR / AR application market, while its minimal computational overhead and robust error handling make it suitable for widespread deployment.

[0029] In an example implementation, a system and method for eye-tracking-based user interaction in virtual and mixed reality environments are disclosed, which significantly reduces interaction latency compared to conventional dwell time methods.

[0030] In an example implementation, the system displays one or more small light markers (e.g., dots) positioned within the user's eye-tracking field, which may be defined as very close to the user's eyes, but at a distance where both eyes can still focus (e.g., based on one or more configurable eye-tracking field parameters). In the example implementation, the light markers do not need to be fully focused and can be activated when the user's eyes are not focused (e.g., within a configurable focus threshold).
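The configurable eye-tracking field parameters could be represented as in the following sketch. The distance limits are placeholder values chosen only for illustration (the disclosure says "within centimeters" without giving numbers), and the names are hypothetical. The vergence helper uses the standard geometric relation between interpupillary distance and fixation distance.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class EyeFieldParams:
    # Hypothetical limits: nearest distance at which both eyes can still
    # focus, and the outer edge of the eye-tracking field region.
    min_distance_m: float = 0.07
    max_distance_m: float = 0.25

def in_eye_field(distance_m, params=EyeFieldParams()):
    """True if a marker at this distance lies inside the configured region."""
    return params.min_distance_m <= distance_m <= params.max_distance_m

def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two eyes' lines of sight when fixating at distance_m."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))
```

The vergence angle grows quickly as the marker approaches the eyes, which is one reason a marker in this region is easy to separate from objects at ordinary interaction distances of about one meter.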

[0031] These points can be colored to stand out from or blend into the environment (e.g., depending on user preferences and experience levels).

[0032] In an example implementation (e.g., during an idle state), one or more light spots are positioned within the user's eye-tracking field as potential interaction points.

[0033] In the example implementation, the system detects a specific point using eye-tracking vectors from one or both eyes when the user focuses on it. Because the points are close to the user's eyes, the system achieves high confidence in the user's intent, as there are typically no other objects in the eye-tracking field space.

[0034] In an example implementation, upon detecting user focus, the system moves the selected point in a predetermined pattern (e.g., a pattern of moving away from the user). The pattern can be a simple linear movement, or one or more complex patterns such as a spiral. The pattern can include movement at a fixed distance within a plane, and / or the pattern can include variations in depth.
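The linear and spiral patterns described above can be sketched as functions that return a list of marker positions over the confirmation period. The coordinate convention (+z pointing away from the user), the speeds, radii, and sampling rate are all assumptions made for this example.

```python
import math

def linear_pattern(start, duration_s=0.25, speed_m_s=0.2, rate_hz=100):
    """Positions for a marker moving straight away from the user along +z."""
    n = max(1, round(duration_s * rate_hz))
    x, y, z = start
    return [(x, y, z + speed_m_s * duration_s * i / n) for i in range(n + 1)]

def spiral_pattern(start, duration_s=0.25, turns=1.0, max_radius_m=0.01,
                   depth_gain_m=0.0, rate_hz=100):
    """Positions for an outward spiral in the x-y plane with optional depth change."""
    n = max(1, round(duration_s * rate_hz))
    x, y, z = start
    points = []
    for i in range(n + 1):
        t = i / n  # normalized progress from 0 to 1
        angle = 2.0 * math.pi * turns * t
        radius = max_radius_m * t
        points.append((x + radius * math.cos(angle),
                       y + radius * math.sin(angle),
                       z + depth_gain_m * t))
    return points
```

With depth_gain_m left at zero the spiral stays in a plane at a fixed distance; a non-zero value adds the variation in depth mentioned above.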

[0035] In the example implementation, the system tracks whether the user's gaze follows the moving point. If the user maintains focus through the pattern with sufficient accuracy (e.g., above an 80% confidence threshold), the system triggers a virtual button press event.
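A minimal sketch of the follow check, under the assumption that "accuracy" means the fraction of gaze samples that point at the moving marker within an angular tolerance. The 2-degree tolerance, the single eye position, and the function names are hypothetical; the 0.8 threshold mirrors the 80% example above.

```python
import math

def angle_between(u, v):
    """Angle in radians between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_angle)

def follow_score(gaze_dirs, marker_positions, eye_pos=(0.0, 0.0, 0.0),
                 tolerance_deg=2.0):
    """Fraction of samples in which the gaze points at the marker within tolerance."""
    if not gaze_dirs:
        return 0.0
    hits = 0
    for gaze, marker in zip(gaze_dirs, marker_positions):
        to_marker = tuple(m - e for m, e in zip(marker, eye_pos))
        if math.degrees(angle_between(gaze, to_marker)) <= tolerance_deg:
            hits += 1
    return hits / len(gaze_dirs)

def should_trigger(gaze_dirs, marker_positions, threshold=0.8, **kwargs):
    """True if the gaze followed the marker closely enough to fire the event."""
    return follow_score(gaze_dirs, marker_positions, **kwargs) >= threshold
```

Scoring the whole trajectory rather than requiring every sample to hit makes the check tolerant of blinks and brief tracking dropouts.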

[0036] In the example implementation, the computational impact is minimal because the main processing overhead comes from analyzing the feed from the existing eye-tracking camera device.

[0037] In the example implementation, points can be programmed to represent various functions, similar to physical controller buttons, and can be remapped based on application requirements.

[0038] In the example implementation, the system is particularly suitable for media applications, messaging and / or scenarios where users need rapid response capabilities but their hands are occupied.

[0039] In an example implementation, machine learning can be incorporated to improve the accuracy of intent detection by analyzing gaze patterns over time, potentially allowing for even shorter confirmation times.

[0040] In the example implementation, the model is trained on various eye movements and lens deformations to ensure robust performance across different hardware implementations.

[0041] In the example implementation, the system integrates with an XR package (such as Unity's XR package) as a virtual controller, allowing developers to easily map points to specific functions within their applications.
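The idea of points acting as remappable virtual buttons can be illustrated with a small registry. This is a generic sketch with hypothetical names; it does not use or represent the API of Unity's XR packages.

```python
class VirtualController:
    """Maps marker identifiers to callables, like remappable controller buttons."""

    def __init__(self):
        self._bindings = {}

    def bind(self, marker_id, action):
        """Assign (or reassign) the function a marker triggers."""
        self._bindings[marker_id] = action

    def trigger(self, marker_id):
        """Run the bound action after a successful confirmation, if any."""
        action = self._bindings.get(marker_id)
        return action() if action is not None else None


controller = VirtualController()
controller.bind("dot_left", lambda: "play")
controller.bind("dot_right", lambda: "next_track")
# An application can remap a dot at any time by binding it again.
controller.bind("dot_left", lambda: "pause")
```

Calling controller.trigger("dot_left") after the remap returns "pause", and triggering an unbound marker returns None rather than raising.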

[0042] In the example implementation, this standardized interface works across various VR / AR platforms that support eye tracking, making it a universal solution for hands-free interaction in virtual environments.

[0043] A method for implementing eye-tracking field user interaction is disclosed. One or more light markers are displayed within the eye-tracking field region of a head-mounted display device. The eye-tracking field region is positioned within a configurable distance of the user's eyes and at a distance where the user's eyes can focus on one or more light markers. It is detected that the user is focusing on a first light marker among the one or more light markers. In response to detecting that the user is focusing on the first light marker, the first light marker is moved in a confirmation mode (e.g., while keeping other light markers in their original positions). A virtual input event is triggered when the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy (e.g., for a configurable duration).
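The sequence of displaying, detecting focus, confirming, and triggering can be viewed as a small state machine. The sketch below assumes the caller has already decided, per gaze sample, whether the gaze is on the marker; the two-state design, the sample count, and the names are illustrative assumptions rather than the disclosed implementation.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # marker displayed, waiting for focus
    CONFIRMING = auto()  # marker moving, gaze being scored

class MarkerInteraction:
    def __init__(self, required_samples=25, threshold=0.8):
        self.required_samples = required_samples
        self.threshold = threshold
        self.state = State.IDLE
        self._seen = 0
        self._hits = 0

    def update(self, on_marker):
        """Feed one gaze sample; returns True when the virtual input event fires."""
        if self.state is State.IDLE:
            if on_marker:
                # Focus detected: start moving the marker and begin scoring.
                self.state = State.CONFIRMING
                self._seen = 0
                self._hits = 0
            return False
        self._seen += 1
        self._hits += int(on_marker)
        if self._seen < self.required_samples:
            return False
        fired = self._hits / self._seen >= self.threshold
        self.state = State.IDLE  # marker returns to its original position
        return fired
```

At a hypothetical 100 Hz sampling rate, 25 required samples correspond to the quarter-second confirmation period mentioned earlier in the description.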

[0044] Throughout this description, the term mixed reality or mixed reality environment (MR environment) should be understood to encompass all combined environments in the range between reality and virtual reality, including virtual reality, augmented reality, and augmented virtual reality.

[0045] This disclosure includes apparatus (including data processing systems that perform the operations or methods disclosed herein) for performing the operations or methods, and a computer-readable medium including instructions that, when executed by one or more processors of one or more data processing systems, cause one or more data processing systems to perform the operations or methods.

[0046] Figure 1 is a diagram of an example head-mounted display (HMD, also referred to herein as an HMD device) 102 worn by a user (or “wearer”) 100. In the example implementation, the user 100 (e.g., a game developer) experiences a virtual reality (VR) or augmented reality (AR) environment while wearing the HMD 102. The HMD device 102 includes a transparent or semi-transparent visor (or “one or more lenses”) 108 through which the wearer 100 views their surrounding environment (also referred to herein as the “real world”). In other implementations, the HMD device 102 may include an opaque visor 108 that may obstruct the wearer 100’s view of the real world and display the full virtual environment on the visor 108.

[0047] In the example implementation, HMD 102 also includes a display device 118 that renders graphics (e.g., virtual objects) onto the visor 108. Therefore, the visor 108 acts as a “screen” or surface on which the output of the display device 118 appears, and the wearer 100 experiences the virtual content through this “screen” or surface. The display device 118 is driven or controlled by one or more graphics processing units (GPUs) 106. The GPUs 106 process aspects of the graphics output that contribute to accelerating the rendering of the output through the display device 118.

[0048] In an example implementation, HMD device 102 also includes a central processing unit (CPU) 104 that can perform some of the operations and methods described herein. HMD device 102 also includes an audio device 112 (e.g., a speaker) configured to present audio output to wearer 100. Although not shown separately, HMD device 102 also includes a wired or wireless network adapter (e.g., Wi-Fi, Bluetooth, cellular) that facilitates communication between the HMD and other computing devices described herein.

[0049] In some embodiments, HMD device 102 includes a digital camera device 110. The digital camera device (or simply "camera device") 110 is a forward-facing video input device oriented to capture at least a portion of the wearer's field of view (FOV). In other words, the camera device 110 captures or "sees" a real-world perspective (e.g., similar to what the wearer 100 sees in their FOV when viewing through the visor 108) based on the orientation of HMD device 102. The camera device 110 can be configured to capture real-world digital video of the user 100's surroundings (e.g., the field of view, peripheral view, or 360° view around the wearer 100). In some embodiments, the output from the digital camera device 110 can be projected onto the visor 108 (e.g., in an opaque visor embodiment) and may also include additional virtual content (e.g., added to the camera device output). In some implementations, a depth camera may also be present on the HMD 102.

[0050] In some embodiments, HMD device 102 may include one or more sensors (not shown separately), or may be coupled to sensors via wired or wireless communication. For example, HMD 102 may include motion or position sensors configured to determine the position or orientation of HMD 102. In some embodiments, HMD device 102 may include a microphone for capturing audio input, such as the voice of user 100.

[0051] In some implementations, HMD 102 can be similar to a virtual reality HMD, such as the Oculus Rift™, HTC Vive™, or PlayStation VR™. In some implementations, HMD 102 can be similar to an augmented reality HMD, such as the Microsoft HoloLens™ or Meta™ HMD. In some implementations, user 100 may hold one or more hand-tracking devices (“handheld devices”) (not shown separately in Figure 1) (e.g., one for each hand). The handheld device provides information about the absolute or relative position and orientation of the user's hand, and is therefore able to capture hand gesture information. The handheld device can be configured to operate directly with the HMD 102 (e.g., via wired or wireless communication). In some implementations, the handheld device may be an Oculus Touch™ hand controller, HTC Vive™ hand tracker, or PlayStation VR™ hand controller. The handheld device may also include one or more buttons or joysticks built into the handheld device. In other embodiments, user 100 may wear one or more wearable hand tracking devices (e.g., motion tracking gloves, not shown), such as those commercially available from Manus VR (Netherlands). In other embodiments, user 100's hand movements may be tracked without a handheld device or wearable hand tracking device, or in addition to one, via hand position sensors (not shown; e.g., using optical methods to track the position and orientation of the user's hand), such as those commercially available from Leap Motion (California). Such hand tracking devices (e.g., handheld devices) track the position of one or more of the user's hands during operation.

[0052] During operation, in the example implementation, the HMD 102 is mounted on the head of the wearer 100 and over the wearer's eyes, as shown in Figure 1. A virtual environment can be presented to the wearer 100, which can be viewed and edited via the HMD 102 and handheld devices, as described herein.

[0053] In the example implementation, the eye-tracking field interaction system is implemented using several components of the head-mounted display (HMD) 102 shown in Figure 1. The display device 118 renders small light markers (dots) within centimeters of the user's eyes via the (e.g., transparent or semi-transparent) visor 108, positioning them within the user's eye movement field where both eyes can still focus. A digital camera device 110 provides eye-tracking capabilities, capturing precise gaze vectors from each eye to enable high-confidence detection of when the user is focused on a particular marker. A CPU 104 processes the eye-tracking data in real time to determine gaze intersections and confidence levels, while also controlling the movement patterns of the markers during the confirmation sequence. A GPU 106 handles the rendering of the markers, ensuring they maintain correct positioning and visibility while performing movement patterns. In the example implementation, an audio device 112 may provide optional feedback during interaction.

[0054] In the example implementation, the firmware implementation utilizes the hardware components of the HMD shown in Figure 1, including direct access to the digital camera device 110 for eye tracking, direct access to the CPU 104 for real-time vector processing, and direct access to the display device 118 for rendering light markers. By operating at the firmware level, the system can optimize communication between these hardware components to reduce latency.

[0055] Figure 2 is a component diagram of a Mixed Reality User Interface System 200 (or MR-UI system), which includes components similar to the HMD 102 and handheld devices discussed with respect to Figure 1. In an example implementation, the MR-UI system 200 includes an MR-UI device 202, an MR display device 204, and one or more MR input devices (or MR hardware) 206. In some implementations, the MR display device 204 may be similar to the visor 108, and the one or more MR input devices 206 may be similar to the handheld devices or other tracking devices described above with respect to Figure 1.

[0056] In an example implementation, MR-UI device 202 includes memory 220, one or more CPUs 222, and one or more GPUs 224. In some implementations, CPU 222 may be similar to CPU 104, GPU 224 may be similar to GPU 106, and MR-UI device 202 may be at least a portion of HMD 102. In some implementations, the MR-UI system 200 and various associated hardware and software components described herein may provide AR content in place of VR content, or may provide AR content in addition to VR content (e.g., in a mixed reality (MR) environment). It should be understood that the systems and methods described herein can be performed with AR content, and therefore, the scope of this disclosure covers both AR and VR applications.

[0057] In an example implementation, the MR-UI device 202 includes an MR engine (or MR software) 212 (e.g., mixed reality software) executed by a CPU 222 and / or a GPU 224 to provide an MR environment to an MR display device 204 (e.g., to user 100). The MR engine 212 includes an MR-UI module 210 that implements various aspects of mixed reality user interface actions for user 100 within the MR environment, as described herein. Throughout the description, the MR environment includes a coordinate system referred to as world coordinates. The MR-UI module 210 may be implemented within or communicate with a larger, more general MR software application such as the MR engine 212 (e.g., a mixed reality editing application).

[0058] The MR-UI module 210 and MR engine 212 include computer-executable instructions residing in memory 220, which are executed during operation by the CPU 222 and optionally in conjunction with the GPU 224. The MR engine 212 communicates with the MR display device 204 (e.g., HMD 102) and also with other MR hardware, such as one or more MR input devices 206 (e.g., motion capture devices such as handheld devices). The MR-UI module 210 may be directly integrated within the MR engine 212 or may be implemented as an external part of the software (e.g., a plugin).

[0059] In the example implementation, the system architecture shown in Figure 2 illustrates how the components in the MR-UI system 200 can work together. The MR-UI module 210, operating within the MR software 212, contains the core logic for managing the eye-tracking field interaction system. This includes displaying and controlling light markers, processing eye-tracking data from the camera device, detecting the user's focus on a specific marker, implementing a confirmation mode, and / or triggering virtual button events upon successful confirmation. The memory 220 stores necessary configuration data, including marker color, position, movement mode, and / or a mapping of markers to virtual button functions.

[0060] The CPU 222 and GPU 224 can work together to process eye-tracking vectors in real time, calculate gaze intersections, generate marker movement patterns, and / or maintain high-confidence thresholds for interaction detection. The architecture supports both high-level implementations (e.g., via C# scripts) and low-level implementations (e.g., firmware or OS-level) to enhance performance. Computational impact is kept to a minimum because the primary processing overhead comes from analyzing the feeds from existing eye-tracking camera devices.

[0061] The eye-tracking field interaction system can be implemented using the example method shown in Figure 3, which provides a flowchart 300 of the operational steps.

[0062] At operation 302, a request to display UI elements (e.g., light markers or dots) is received on the HMD device.

[0063] At operation 302, a request to display UI elements can be triggered in one or more ways based on the system's implementation. When integrated as part of an XR input system, a request can be generated when application developers map specific functions to virtual buttons represented by dots. For example, a request can be initiated when an application needs to display media playback controls, messaging interactions, and / or quick confirmation messages.

[0064] In the example implementation, while UI elements can be shown and then hidden based on context, they can also be persistently retained. This can be useful for system context items (e.g., "Return to Home," "Back," or "Reset") and for user function shortcuts (e.g., "Take a Photo" or "Turn the Flash On / Off"). This can be similar to buttons (e.g., Android buttons) that typically run along the bottom of the display and have configuration options to show or hide system UI buttons. Such buttons can be configured to be persistently displayed, context-dependent, or gesture-based.

[0065] Requests can also originate from application needs to provide actionable gesture input for heads-up displays (HUDs) or for interacting with objects in a mixed reality environment. The system can generate requests to display multiple points simultaneously, each mapped to a different function, similar to physical controller buttons that can be reassigned based on the application.

[0066] For accessibility use cases, a request can be triggered when the system detects that traditional input methods are unavailable, or when the user has configured the system to prefer eye-tracking-based interaction. This may be particularly relevant for users who may have temporary or long-term physical movement limitations.

[0067] The request may include one or more configuration parameters that specify, for example, the color of the dot (e.g., a high-contrast color can be set for new users, or a subtle / blended color can be set for experienced users), the position of the dot within the eye-tracking field, the mapping of the dot to a specific virtual button function, and / or the confirmation mode for that specific interaction.
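A display request carrying these configuration parameters might be modeled as a small immutable record, as sketched below. The field names, the pattern names, and the specific color values are hypothetical; only the kinds of parameters (color, position, button mapping, confirmation mode) come from the description.

```python
from dataclasses import dataclass
from enum import Enum

class ConfirmationPattern(Enum):
    LINEAR_AWAY = "linear_away"
    SPIRAL = "spiral"

@dataclass(frozen=True)
class MarkerRequest:
    color_rgb: tuple          # (r, g, b), each 0-255
    position_m: tuple         # (x, y, z) within the eye-tracking field, in meters
    button: str               # virtual button function this dot maps to
    pattern: ConfirmationPattern = ConfirmationPattern.LINEAR_AWAY

def default_color(experienced_user):
    """High-contrast color for new users, a subtle one for experienced users."""
    return (128, 128, 128) if experienced_user else (255, 255, 0)

request = MarkerRequest(
    color_rgb=default_color(experienced_user=False),
    position_m=(0.02, -0.01, 0.08),
    button="media_play_pause",
)
```

Making the record frozen means a request cannot be altered after it is issued; an application that wants different settings would issue a new request.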

[0068] At operation 304, the system can analyze sensor data from the HMD to determine the device's position, orientation, velocity, and / or the user's visual cone. In an example implementation, the position of the light marker can be determined regardless of the HMD's position or orientation (e.g., based on where the user's eyes are looking).

[0069] In an example implementation, sensor data from the HMD device can be analyzed to determine the positioning parameters for achieving correct eye-tracking field interaction. The system can use eye-tracking data from one or more camera devices to determine the precise gaze vector for each eye. Sensor analysis determines the HMD's position, orientation, velocity, and / or the user's visual cone. This sensor data analysis is useful for establishing optimal localization of light markers within the user's eye-tracking field.

[0070] In an example implementation, the system processes the sensor data to calculate the intersection between two eye-tracking vectors, the intersection having a defined confidence region to address camera device defects.

[0071] For optimal performance, the system can incorporate machine learning to analyze gaze patterns over time, helping to improve the accuracy of position and orientation detection. This analysis can help ensure that the light markers are positioned at a distance where both eyes can focus, which can be useful for system reliability. Sensor data analysis also helps prevent misinterpretation of user intent by ensuring that the markers are positioned close enough to avoid confusion with background objects.

[0072] The operating system polling mentioned in Operation 304 provides an alternative or supplementary method for obtaining this localization data when direct sensor access is not optimal. This two-path approach ensures robust tracking of HMD spatial parameters, which may help maintain accurate eye-tracking field interactions.

[0073] At operation 306, based on display style and / or position instructions, one or more UI elements are displayed via the HMD display device at a carefully controlled distance within the user's eye-tracking field, at which both eyes can focus.

[0074] In the example implementation, one or more UI elements are displayed on the HMD display device based on display style instructions and position instructions.

[0075] The Display Style directive controls parameters such as the dot's color and opacity, allowing it to be configured for high contrast to help new users, or for a softer / blended look for experienced users. The Position directive ensures the dot is placed within centimeters of the user's eyes in the eye-tracking field, at a distance where both eyes can still focus.

[0076] In the example implementation, the color can be changed to reflect a notification from the program, for example, changing from the default green to notification red.

[0077] In an example implementation, the opacity or intensity can be changed (e.g., pulsating repeatedly between a bright red and a pale red) to attract the user's attention.

[0078] One or more UI elements are displayed through transparent or semi-transparent goggles of the HMD to make them visible to the user in a mixed reality environment. The system carefully controls positioning to maintain high confidence in gaze detection by ensuring that markers are held at an optimal distance to prevent confusion with background objects. This positioning allows the system to achieve a high signal-to-noise ratio when detecting user intent.

[0079] Display operations can take into account previously analyzed sensor data regarding HMD position, orientation, and / or the user's visual cone to ensure optimal placement of UI elements. The system can display multiple points simultaneously, each mapped to a different function, similar to physical controller buttons that can be reassigned based on the application. These points serve as potential interaction points, which the user can activate through gaze tracking to achieve functions such as media playback control, message sending and receiving interactions, or quick message confirmation.

[0080] At operation 308, the HMD distance and / or angle relative to a predetermined threshold are continuously monitored. This monitoring ensures optimal positioning of the light marker for reliable eye tracking and user comfort.

[0081] In the example implementation, distance and angle comparisons are performed to ensure optimal interaction conditions. The system compares the HMD distance to a predetermined threshold distance and the HMD angle to a threshold angle. This comparison can be useful for maintaining a high signal-to-noise ratio, which makes the eye-tracking field interaction system effective.

[0082] Distance thresholds can be used to ensure that the light marker remains within centimeters of the user's eye, within the eye-tracking field where both eyes can still focus. This close positioning helps prevent confusion between the marker and background objects that might be presented at a greater distance. For example, if the marker is presented 1 meter away from the user, the system is very likely to misinterpret the user viewing a glass of water at the same distance as viewing the marker.

[0083] Angle threshold comparison helps ensure that the user's head orientation allows for comfortable and accurate eye tracking. In the example implementation, the system relies on accurate gaze vectors from each eye to determine the user's intent. Angle comparison helps maintain optimal conditions for the eye-tracking camera to capture accurate gaze data and for the system to calculate reliable intersections between two eye-tracking vectors.

[0084] These threshold comparisons work in conjunction with the system's ability to process eye-tracking data in real time, allowing for rapid adjustments to maintain interaction accuracy while preventing eye fatigue and accidental activation. The thresholds can be configured based on application requirements and user comfort needs.

[0085] At operation 310, the system determines whether the HMD distance exceeds the threshold distance or whether the HMD angle exceeds the threshold angle.

[0086] In the example implementation, when the HMD distance exceeds a distance threshold or the HMD angle exceeds a threshold angle, the system proceeds to operation 312 to stop displaying UI elements. In the example implementation, the eye-tracking field interaction system needs to precisely position the light markers within centimeters of the user's eyes to maintain the high signal-to-noise ratio required for accurate gaze detection.

[0087] The threshold evaluated in Operation 310 can be configured based on application requirements and user comfort needs. If the evaluation determines that the threshold has been exceeded, the system transitions to Operations 312 and 314 to manage UI elements and begin monitoring HMD speed. This ensures that the system maintains optimal interaction conditions before re-enabling the display of UI elements.

[0088] At operation 312, if the HMD distance exceeds the threshold distance or the HMD angle exceeds the threshold angle, the system stops displaying UI elements. This helps prevent eye strain by ensuring that markers remain within the optimal eye-tracking field distance, and it maintains high confidence in gaze detection.

[0089] This operation can be useful for maintaining a high signal-to-noise ratio for the system by ensuring that optical markers are displayed only when they can be reliably detected and tracked. In an example implementation, if a marker remains visible outside its optimal positioning parameters, the system removes the visual marker to prevent potential eye strain and unreliable interaction.

[0090] This operation works in conjunction with the system's ability to process eye-tracking data in real time, allowing for an immediate response when positioning conditions become suboptimal. This rapid response helps maintain system reliability and user comfort by preventing situations where the user might attempt to interact with markers that are not in the ideal position for accurate gaze detection.

[0091] After ceasing the display of UI elements, the system transitions to operation 314, in which it begins monitoring the HMD velocity to determine when conditions may again become favorable for displaying the marker. This creates a smooth transition between the active and inactive states of the eye-tracking field interaction system, ensuring that the marker reappears only when the system can maintain its high confidence threshold for gaze detection.

[0092] At operation 314, the system deletes or ignores any previous anchor locations and begins monitoring the HMD velocity over time.

[0093] In the example implementation, this operation is triggered after the system stops displaying UI elements due to exceeding a distance or angle threshold. Deleting the previous anchor point position helps prevent any residual reference points that might interfere with future interaction attempts.

[0094] Speed monitoring helps determine when conditions may become favorable again for displaying UI elements. The system tracks the movement speed of the HMD to ensure stable conditions for accurate eye tracking and reliable user interaction. This speed data is used in subsequent operation 316 to assess whether the HMD movement has become stable enough to potentially resume displaying UI elements.

[0095] This operation utilizes the motion tracking capabilities of the MR-UI system via the MR hardware components shown in Figure 2 (including motion sensors that can detect the position, orientation, and velocity of the HMD device). This monitoring process is crucial for maintaining the high signal-to-noise ratio required for accurate eye tracking and gaze detection, as rapid head movements can impair the system's ability to accurately track eye movement vectors.

[0096] In the example implementation, the HMD senses both relative and absolute motion. For example, if a user is on a train or airplane, the absolute motion vector can be very high, while the local / relative motion vector can still be small enough to enable user interaction.
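The distinction between absolute and relative motion can be sketched as a simple vector subtraction. The following Python example is a minimal illustration and assumes that an estimate of the velocity of the user's reference frame (e.g., the train) is available as an input; the function name `relative_speed`, the units, and the sample values are hypothetical and not part of the disclosure.

```python
import math
from typing import Sequence


def relative_speed(hmd_velocity: Sequence[float],
                   frame_velocity: Sequence[float]) -> float:
    """Speed of the HMD relative to its local reference frame.

    Both inputs are velocity vectors in the same units (m/s here).
    """
    return math.sqrt(sum((h - f) ** 2
                         for h, f in zip(hmd_velocity, frame_velocity)))


# Assumed sample: a train moving at 30 m/s with a nearly still head.
train = (30.0, 0.0, 0.0)
hmd = (30.02, 0.0, 0.01)

absolute = relative_speed(hmd, (0.0, 0.0, 0.0))  # speed relative to the ground
local = relative_speed(hmd, train)               # speed relative to the train
```

In this sketch the absolute speed is about 30 m/s while the local speed is only a few centimeters per second, so a speed check based on the local value would still allow interaction.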

[0097] In the example implementation, the user can adjust / switch the motion detection threshold. For example, a user who is frequently running or climbing (e.g., a public safety official) might choose a higher setting at the expense of accuracy.

[0098] This operation helps ensure the reliability of the eye-tracking field interaction system by attempting to display UI elements only when the user's head movement is sufficiently stable.

[0099] At operation 316, the system checks if the HMD speed is below a threshold. This speed monitoring helps ensure stable conditions for eye tracking and marker display.

[0100] In the example implementation, operation 316 represents a speed check that determines whether conditions are suitable for re-enabling UI elements. The system assesses whether the HMD speed has dropped below a predetermined threshold. This speed check is useful because rapid head movements can impair the system's ability to accurately track eye movement vectors and maintain the high signal-to-noise ratio required for reliable gaze detection.

[0101] The speed threshold serves as a stability indicator; for example, when the HMD moves slowly enough, it indicates that the user has achieved a relatively stable head position suitable for eye-tracking interaction. If the speed remains above the threshold, the system proceeds along the "No" path to operation 318, continuing to suppress the display of UI elements.

[0102] This operation, combined with a machine learning component that can analyze gaze patterns over time, helps determine the optimal conditions for re-enabling UI elements. A speed check helps ensure that when the system does resume displaying the light markers, it maintains its ability to detect the precise intersection between two eye-tracking vectors with high confidence.

[0103] This operation is useful for maintaining system reliability and preventing accidental activation that may occur when displaying UI elements during periods of significant head movement.

[0104] At operation 318, if the speed remains above a threshold, the system suppresses the display of UI elements. This prevents interactions attempted during rapid head movements that could compromise accurate eye tracking.

[0105] In the example implementation, this operation serves as a safeguard against attempted interactions during periods of significant head movement that could compromise accurate eye tracking. Working in conjunction with machine learning components capable of analyzing gaze patterns over time, this helps ensure that UI elements only appear when the system can maintain reliable eye vector tracking. This is useful for maintaining a high signal-to-noise ratio, which is essential for effective eye-field interaction systems.

[0106] By suppressing the display of UI elements during high-speed periods, the system helps prevent accidental activation and maintain user comfort. This can be useful because the system relies on precisely positioning the light markers within centimeters of the user's eyes (where both eyes can focus). This operation continues until the speed drops below the threshold, at which point the system can resume normal operation via the feedback loop shown in Figure 3.

[0107] This operation helps ensure the reliability of the eye-tracking field interaction system by displaying UI elements only when stable conditions for accurate gaze detection are present. In the example implementation, a fallback mode can optionally be engaged in which larger UI elements and / or slower confirmation movements are used to increase the signal-to-noise ratio.
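The flow of operations 308 through 318 can be sketched as a small two-state controller. The Python example below is a minimal sketch under stated assumptions: the class and field names, the units, and the choice to re-enable the display on the speed check alone (with distance and angle re-checked on the next update) are illustrative and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Thresholds:
    max_distance_cm: float
    max_angle_deg: float
    max_speed_cm_s: float


class MarkerVisibility:
    """Hypothetical controller for showing or suppressing UI elements."""

    def __init__(self, thresholds: Thresholds) -> None:
        self.thresholds = thresholds
        self.visible = True
        self.anchor: Optional[Tuple[float, float, float]] = None

    def update(self, distance_cm: float, angle_deg: float,
               speed_cm_s: float) -> bool:
        t = self.thresholds
        if self.visible:
            # Operations 308 and 310: compare distance and angle to thresholds.
            if distance_cm > t.max_distance_cm or angle_deg > t.max_angle_deg:
                self.visible = False  # operation 312: stop displaying
                self.anchor = None    # operation 314: drop previous anchor
        elif speed_cm_s < t.max_speed_cm_s:
            # Operation 316, "yes" path: head is stable, resume display.
            self.visible = True
        # Otherwise operation 318: keep the UI elements suppressed.
        return self.visible
```

A caller would invoke `update` once per sensor sample and render the markers only while it returns `True`.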

[0108] In an example implementation, the method utilizes the MR-UI module 210 within MR software 212, thereby utilizing both CPU 222 and GPU 224 to process eye-tracking vectors and render markers.

[0109] Optional machine learning components can be integrated to analyze gaze patterns over time, potentially reducing confirmation time while maintaining an accuracy threshold. The entire process operates with minimal computational overhead, as the primary processing cost comes from the analysis of feeds from existing eye-tracking camera devices.

[0110] In the example implementation, the system employs machine learning techniques to improve eye-tracking accuracy and reduce the required dwell time while maintaining high detection confidence. Specifically, the machine learning model analyzes historical gaze vector data across multiple time slices to identify patterns that distinguish intentional gaze from unintentional gaze intersections.

[0111] The machine learning system is trained on a dataset containing eye-tracking data with labeled examples of both intentional selection and accidental gaze intersections. This allows the model to learn nuanced patterns of how a user's eyes move when intentionally focusing on and following a target (e.g., a light marker or confirmation pattern) versus when their gaze happens to cross the target area. By analyzing time series of gaze vectors, rather than just instantaneous intersections, the system can achieve greater accuracy in distinguishing intentional selection. One or more models can be trained to detect focus and / or whether a gaze follows a confirmation pattern with a certain level of confidence.

[0112] The technological improvement achieved through machine learning methods lies in the ability to dynamically adjust the activation threshold based on detected user behavior patterns. Instead of using fixed geometric thresholds around the target point, the system learns the optimal threshold that maintains high detection confidence while minimizing the required dwell time. This represents a technological advancement compared to traditional eye-tracking systems that rely solely on static geometric calculations.

[0113] Machine learning models analyze one or more features, such as time series of gaze vector intersections over multiple time slices; eye movement speed and / or acceleration patterns; correlation between eye movement and / or target motion during confirmation patterns; and / or historical accuracy of different types of activation patterns.
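To make the feature list concrete, the following Python sketch derives three such features from a time series of gaze points and marker positions: mean eye movement speed, mean absolute acceleration, and the correlation between gaze and target motion along the depth axis. The function names, the choice of the z axis as depth, and the use of a Pearson correlation are assumptions for illustration; the disclosure does not specify how the features are computed.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def step_speeds(points: Sequence[Point], dt: float) -> List[float]:
    """Speed between consecutive samples taken dt seconds apart."""
    return [math.dist(a, b) / dt for a, b in zip(points, points[1:])]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def extract_features(gaze: Sequence[Point], target: Sequence[Point],
                     dt: float) -> Dict[str, float]:
    """Feature vector for one confirmation attempt (needs >= 3 samples)."""
    speeds = step_speeds(gaze, dt)
    accels = [(b - a) / dt for a, b in zip(speeds, speeds[1:])]
    return {
        "mean_speed": sum(speeds) / len(speeds),
        "mean_abs_accel": sum(abs(a) for a in accels) / len(accels),
        "depth_correlation": pearson([p[2] for p in gaze],
                                     [p[2] for p in target]),
    }
```

A feature dictionary of this kind could be passed to any classifier trained on labeled intentional and accidental examples.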

[0114] This allows the system to achieve specific improvements in eye-tracking technology by reducing the required dwell time while maintaining reliable detection of user intent. The machine learning system continues to adjust and improve its detection accuracy over time based on ongoing analysis of successful and unsuccessful activation attempts.

[0115] This implementation leverages the existing eye-tracking camera and processing capabilities of MR-UI systems, where a machine learning model operates at the application level or lower to enhance performance. Computational overhead is minimal because the model analyzes the same gaze vector data already collected for basic eye-tracking functionality.

[0116] This machine learning method offers a technical solution to the specific problem of distinguishing between intentional eye-tracking-based choices and accidental gaze intersections, representing an improvement over eye-tracking computer interface technology that enables faster and more reliable interaction. The system demonstrates a practical application of machine learning to enhance the core functionality of eye-tracking interfaces, rather than merely realizing abstract concepts.

[0117] In an example implementation, a method for implementing eye-tracking field user interaction is disclosed. One or more light markers are displayed within an eye-tracking field region of a head-mounted display device. The eye-tracking field region is located within a configurable distance of the user's eyes and at a distance where the user's eyes can focus on one or more light markers. It is detected (e.g., using one or more eye-tracking cameras of the head-mounted display device) that the user is focusing on a first light marker among the one or more light markers (e.g., by determining the intersection between gaze vectors from each of the user's eyes). In response to detecting that the user is focusing on the first light marker, the first light marker is moved in a confirmation mode (e.g., while keeping other light markers in their original positions). In an example implementation, the confirmation mode includes moving the first light marker away from the user. A virtual input event is triggered when the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy for a configurable duration (e.g., less than one second).

[0118] In an example implementation, displaying one or more light markers includes displaying light markers with configurable visual attributes, including at least one of color and opacity, wherein the color and opacity can be configured between a high contrast setting for new users and a low contrast setting for experienced users.

[0119] In an example implementation, displaying one or more light markers includes positioning the light markers at a distance from the user's eyes that is greater than the near point where the eyes can focus and closer than the distance at which background objects might interfere with gaze detection.

[0120] In an example implementation, displaying one or more light markers includes displaying a variable number of markers with configurable positions, wherein each marker is mapped to a different function that can be reassigned based on the active application.

[0121] In an example implementation, displaying one or more light markers includes temporarily disabling the display of the markers when an unknown object is detected within the eye-tracking field region, until the object is cleared.

[0122] In an example implementation, detecting that a user is focusing on a first light marker includes: generating a first gaze vector of the user's first eye and a second gaze vector of the user's second eye based on eye-tracking camera device data; calculating the intersection point between the first gaze vector and the second gaze vector within a confidence region, wherein the confidence region addresses camera device defects; and / or determining that the intersection point intersects with the first light marker within the confidence region.

[0123] In an example implementation, calculating the intersection point includes using mathematical formulas to find the intersection point between two gaze vectors and / or defining a confidence region based on the accuracy capabilities of the tracking camera device.

[0124] In an example implementation, generating a gaze vector includes: creating a 3D model of each eye; determining the orientation and pointing direction of each eye based on the 3D model; and / or generating a gaze vector based on the determined orientation and pointing direction.

[0125] In the example implementation, machine learning is applied to analyze gaze vectors across multiple time slices to improve the accuracy of intersection detection, and one or more activation thresholds are adjusted based on the machine learning analysis (e.g., to reduce the required dwell time while maintaining detection confidence).

[0126] In an example implementation, calculating the intersection point includes calibrating the eye-tracking camera device (e.g., by measuring the pixel error rate during a calibration phase when the user is focused on a known reference point and / or defining a confidence region based on the measured pixel error rate).
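One way to picture the calibration measurement is as the average distance, in pixels, between the measured gaze samples and the known reference point. The Python sketch below assumes that interpretation of "pixel error rate"; the function name and the sample coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[float, float]


def mean_pixel_error(samples: Sequence[Pixel], reference: Pixel) -> float:
    """Average distance in pixels between gaze samples and the reference."""
    if not samples:
        raise ValueError("at least one calibration sample is required")
    return sum(math.dist(s, reference) for s in samples) / len(samples)


# Assumed calibration samples while the user fixates pixel (100, 100).
error_px = mean_pixel_error(
    [(103.0, 100.0), (100.0, 104.0), (97.0, 100.0), (100.0, 96.0)],
    (100.0, 100.0),
)
```

With these sample values the individual errors are 3, 4, 3, and 4 pixels, so `error_px` is 3.5.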

[0127] In an example implementation, defining the confidence region includes: determining a minimum confidence region size based on the measured pixel error rate, the minimum confidence region size maintaining the target signal-to-noise ratio for gaze detection; and / or adjusting the confidence region size based on ambient lighting conditions detected by the illumination sensor of the head-mounted display device.

[0128] In the example implementation, machine learning is applied to analyze historical intersection detection accuracy data to dynamically adjust the confidence region size while maintaining the target signal-to-noise ratio and / or reducing the required dwell time as intersection detection accuracy improves.

[0129] In an example implementation, calculating intersections includes filtering out intersections that occur when an unknown object is detected within the eye-tracking field and / or temporarily expanding the confidence region size when the head movement speed exceeds a threshold.

[0130] In an example implementation, the system calculates the intersection point between two gaze vectors (e.g., one for each eye) by finding where the vectors converge in 3D space. This may involve creating a 3D model of each eye and determining the orientation and pointing direction of each eyeball.
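Because two measured gaze rays rarely intersect exactly, one common way to find where they "converge" is to compute the point of closest approach between the two lines and take the midpoint. The Python sketch below uses that standard technique; it is an assumption about how the intersection could be computed, not the method stated in the disclosure, and the function names, coordinate convention, and sample values are illustrative.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _along(p: Vec, d: Vec, s: float) -> Vec:
    return (p[0] + s * d[0], p[1] + s * d[1], p[2] + s * d[2])


def gaze_convergence(eye1: Vec, dir1: Vec, eye2: Vec, dir2: Vec,
                     eps: float = 1e-9) -> Optional[Tuple[Vec, float]]:
    """Midpoint of closest approach of two gaze rays, plus the gap size.

    Returns None when the rays are parallel or converge behind the eyes.
    """
    w0 = _sub(eye1, eye2)
    a, b, c = _dot(dir1, dir1), _dot(dir1, dir2), _dot(dir2, dir2)
    d, e = _dot(dir1, w0), _dot(dir2, w0)
    denom = a * c - b * b
    if denom < eps:
        return None  # parallel gaze directions
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    if s <= 0 or t <= 0:
        return None  # closest approach lies behind at least one eye
    p1, p2 = _along(eye1, dir1, s), _along(eye2, dir2, t)
    mid = ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)
    return mid, math.dist(p1, p2)


def is_focused_on(marker: Vec, eye1: Vec, dir1: Vec, eye2: Vec, dir2: Vec,
                  radius_cm: float) -> bool:
    """True when the convergence point lies within radius_cm of the marker."""
    result = gaze_convergence(eye1, dir1, eye2, dir2)
    return result is not None and math.dist(result[0], marker) <= radius_cm
```

For eyes at (-3, 0, 0) and (3, 0, 0) looking along (3, 0, 5) and (-3, 0, 5), the rays meet at (0, 0, 5), so a marker at that point is reported as focused for any positive radius.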

[0131] In an example implementation, the system addresses camera device defects by establishing a confidence region around the calculated intersection point. The size of this confidence region can be determined by the pixel error rate measured during calibration when the user is focusing on a known reference point, ambient lighting conditions detected by the HMD's illumination sensor, and / or a head movement speed threshold.
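The dependence of the confidence region on calibration error, lighting, and head movement can be sketched as a base radius that is enlarged under adverse conditions. The Python example below is purely illustrative: the conversion factor `cm_per_px`, the scale factors, and the threshold defaults are assumed values chosen for the example, and the disclosure does not give a formula.

```python
def confidence_radius_cm(pixel_error_px: float, cm_per_px: float,
                         ambient_lux: float, head_speed_cm_s: float,
                         low_light_lux: float = 50.0,
                         speed_threshold_cm_s: float = 5.0,
                         low_light_scale: float = 1.5,
                         fast_head_scale: float = 2.0) -> float:
    """Hypothetical confidence-region radius around the gaze intersection."""
    # Base size from the pixel error measured during calibration.
    radius = pixel_error_px * cm_per_px
    # Dim lighting is assumed to degrade camera accuracy.
    if ambient_lux < low_light_lux:
        radius *= low_light_scale
    # Temporarily expand the region while the head moves quickly.
    if head_speed_cm_s > speed_threshold_cm_s:
        radius *= fast_head_scale
    return radius
```

A multiplicative form is used here only because it keeps each factor independent; an additive or learned adjustment would fit the description equally well.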

[0132] In an example implementation, the machine learning model analyzes the time series of intersections across multiple time slices to improve accuracy. This may involve: weighting corresponding data points by their effect size; using multiplication to calculate the weighted values; and / or using addition to sum the weighted values.
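The weighting, multiplication, and addition described here amount to a weighted sum over the data points. The short Python sketch below shows that operation; interpreting the data points as per-time-slice scores and the weights as their relative importance is an assumption for illustration.

```python
from typing import Sequence


def weighted_score(values: Sequence[float], weights: Sequence[float]) -> float:
    """Multiply each data point by its weight and sum the products."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights))


# Assumed example: three time slices, the first weighted most heavily.
score = weighted_score([1.0, 0.0, 1.0], [0.5, 0.3, 0.2])
```

Here the first and third slices contribute 0.5 and 0.2, giving a score of about 0.7.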

[0133] In an example implementation, at a lower level (e.g., firmware or operating system level), the system performs direct vector computation using a 3D eye model to determine the precise gaze intersection with minimal latency. This can involve processing eye-tracking vectors in real time while maintaining a high confidence threshold for interaction detection.

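[0134] This method replaces the traditional dwell-time confirmation, which is measured along a time dimension (holding a fixed gaze), with a confirmation measured along a motion dimension (following the moving marker), allowing for much faster interaction times while maintaining accuracy.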

[0135] In an example implementation, a lower-level implementation (e.g., a firmware implementation) creates a 3D model of each eye by directly accessing the eye-tracking camera, allowing the system to determine precise eye orientation and pointing direction with minimal latency. The system processes this raw camera data to calculate the intersection between two eye-tracking vectors, establishing a confidence region that addresses limitations of the camera.

[0136] In the example implementation, the machine learning model analyzes the time series of gaze vectors across multiple time slices, learning to distinguish patterns that indicate intentional selection from patterns that indicate accidental gaze intersections. The model is trained on a labeled dataset containing examples of both intentional selections and accidental gaze intersections. This allows the system to dynamically adjust activation thresholds based on detected user behavior patterns while maintaining high detection confidence.

[0137] In an example implementation, the system includes a calibration routine that measures the pixel error rate during the initial phase when the user focuses on a known reference point. These error measurements define a precise confidence region for gaze intersection detection at the hardware level. The confidence region is dynamically adjusted based on ambient lighting conditions from the HMD's illumination sensor, head movement speed thresholds, and / or historical accuracy of different activation modes.

[0138] In the example implementation, this approach optimizes communication between HMD components through direct firmware access to the eye-tracking camera device, real-time vector processing on the CPU, hardware-level confidence region calculation, and / or dynamic threshold adjustment.

[0139] This technological solution enables the system to reduce the required dwell time while maintaining reliable detection of user intent. The machine learning system continuously adjusts and improves its detection accuracy over time based on ongoing analysis of successful and unsuccessful activation attempts.

[0140] Figure 4 is a flowchart illustrating a method 400 for implementing eye-tracking field user interaction according to some embodiments. This method can be executed by the aforementioned MR-UI system 200.

[0141] At operation 402, one or more light markers are displayed within the eye-tracking field area of the head-mounted display device. The eye-tracking field area is positioned within a configurable distance of the user's eyes, and at a distance where the user's eyes can focus on one or more light markers. The light markers may have configurable visual attributes, including color and opacity, which can be set between high contrast settings for new users and low contrast settings for experienced users.

[0142] At operation 404, the system detects that the user is focusing on a first light marker among one or more light markers. This detection can be performed using one or more eye-tracking cameras in a head-mounted display device by determining the intersection between gaze vectors from each of the user's eyes. The system can generate a first gaze vector for the user's first eye and a second gaze vector for the user's second eye based on eye-tracking camera data, calculate the intersection between the first and second gaze vectors within a confidence region that addresses camera device defects, and determine that this intersection intersects with the first light marker within the confidence region.

[0143] At operation 406, in response to detecting that a user is focusing on a first light marker, the system moves the first light marker in a confirmation mode (e.g., while keeping other light markers in their original positions). The confirmation mode may include moving the first light marker away from the user in a linear path or a more complex pattern such as a spiral.
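The two confirmation patterns mentioned here, a linear path and a spiral, can be sketched as sequences of marker positions. The Python example below assumes a coordinate system in which +z points away from the user and positions are in centimeters; the function names and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def linear_path(start: Point, distance_cm: float, steps: int) -> List[Point]:
    """Positions moving straight away from the user along +z."""
    x, y, z = start
    return [(x, y, z + distance_cm * i / steps) for i in range(steps + 1)]


def spiral_path(start: Point, distance_cm: float, radius_cm: float,
                turns: float, steps: int) -> List[Point]:
    """Positions receding along +z while spiraling outward to radius_cm."""
    x, y, z = start
    points = []
    for i in range(steps + 1):
        f = i / steps                      # progress from 0.0 to 1.0
        angle = 2 * math.pi * turns * f
        r = radius_cm * f                  # radius grows with progress
        points.append((x + r * math.cos(angle),
                       y + r * math.sin(angle),
                       z + distance_cm * f))
    return points
```

Both functions start at the marker's original position, so the user's gaze does not need to jump when the confirmation movement begins.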

[0144] At operation 408, the system triggers a virtual input event based on determining that the user's gaze follows the first light marker through the confirmation pattern with at least a threshold level of accuracy (e.g., for a configurable duration). The configurable duration can be less than one second, representing a significant improvement over traditional dwell-time systems. The threshold level of accuracy can be dynamically adjusted based on machine learning analysis of historical gaze tracking data while maintaining high detection confidence.
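One simple way to express "follows the marker with at least a threshold level of accuracy" is the fraction of gaze samples that fall within a tolerance of the marker's position at the same instant. The Python sketch below uses that measure as an assumption; the disclosure does not define how accuracy is computed, and the function name and sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def follows_pattern(gaze: Sequence[Point], marker: Sequence[Point],
                    tolerance_cm: float, accuracy_threshold: float) -> bool:
    """True when enough gaze samples stay within tolerance of the marker.

    gaze[i] and marker[i] are assumed to be sampled at the same instant
    over the configurable confirmation duration.
    """
    if not gaze or len(gaze) != len(marker):
        raise ValueError("gaze and marker need equal, non-zero lengths")
    hits = sum(1 for g, m in zip(gaze, marker)
               if math.dist(g, m) <= tolerance_cm)
    return hits / len(gaze) >= accuracy_threshold


# Assumed sample: the marker recedes 0.5 cm per sample; one gaze sample strays.
marker_path = [(0.0, 0.0, 5.0 + 0.5 * i) for i in range(10)]
gaze_path = [(0.1, 0.0, z) for (_, _, z) in marker_path]
gaze_path[3] = (3.0, 0.0, 6.5)

trigger = follows_pattern(gaze_path, marker_path, 0.3, 0.8)
```

Nine of the ten samples are within tolerance, so an accuracy threshold of 0.8 triggers the event while a threshold of 0.95 would not.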

[0145] This method provides an effective and reliable way to achieve eye-tracking-based interaction in virtual and mixed reality environments by using small light markers located in the user's eye-tracking field in combination with confirmation patterns, achieving faster interaction times compared to conventional dwell-based methods.

[0146] Figure 5 is a block diagram 1500 illustrating a representative software architecture 1502 that can be used in conjunction with various hardware architectures described herein to provide the MR tools described herein. Figure 5 is merely a non-limiting example of a software architecture, and it should be understood that many other architectures can be implemented to facilitate the functionality described herein. Software architecture 1502 can run on hardware such as the machine 1600 of Figure 6, which includes a processor 1610, a memory 1630, and I / O components 1650, etc. A representative hardware layer 1504 is shown and can represent, for example, the machine 1600 of Figure 6. The representative hardware layer 1504 includes one or more processing units 1506 having associated executable instructions 1508. The executable instructions 1508 represent executable instructions of the software architecture 1502, including implementations of the methods, modules, etc., described herein. Hardware layer 1504 also includes a memory or storage module 1510, which also has executable instructions 1508. Hardware layer 1504 may also include other hardware, as indicated by other hardware 1512, which represents any other hardware of hardware layer 1504, such as the other hardware shown as part of machine 1600.

[0147] In the example architecture of Figure 5, software 1502 can be conceptualized as a stack of layers, where each layer provides specific functionality. For example, software 1502 may include layers such as an operating system 1514, libraries 1516, frameworks / middleware 1518, applications 1520, and a presentation layer 1544. Operationally, applications 1520 or other components within each layer can invoke application programming interface (API) calls 1524 through the software stack and receive responses, return values, etc., shown as messages 1526, in response to API calls 1524. The layers shown are representative in nature, and not all software architectures have all layers. For example, some mobile or dedicated operating systems may not provide a framework / middleware layer 1518, while others may provide such a layer. Other software architectures may include additional or different layers.

[0148] Operating system 1514 can manage hardware resources and provide common services. For example, operating system 1514 may include kernel 1528, services 1530, and drivers 1532. Kernel 1528 can serve as an abstraction layer between hardware and other software layers. For example, kernel 1528 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, etc. Services 1530 can provide other common services to other software layers. Drivers 1532 can be responsible for controlling or interfacing with the underlying hardware. For example, depending on the hardware configuration, drivers 1532 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, etc.

[0149] Libraries 1516 provide common infrastructure that can be utilized by applications 1520 or other components or layers. Compared to directly interfacing with the underlying operating system 1514 functions (e.g., kernel 1528, services 1530, or drivers 1532), libraries 1516 typically provide functionality that allows other software modules to perform tasks in a more convenient way. Libraries 1516 may include system libraries 1534 (e.g., the C standard library), which provide functions such as memory allocation, string manipulation, and mathematical functions. Additionally, libraries 1516 may include API libraries 1536, such as media libraries (e.g., libraries supporting the rendering and manipulation of various media formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG), graphics libraries (e.g., OpenGL frameworks for rendering 2D and 3D content on a display), database libraries (e.g., SQLite providing various relational database functions), and network libraries (e.g., WebKit providing web browsing functionality). Libraries 1516 may also include a wide variety of other libraries 1538 to provide many other APIs to applications 1520 and other software components / modules.

[0150] Framework 1518 (sometimes also called middleware) provides a higher level of common infrastructure that can be utilized by Application 1520 or other software components / modules. For example, Framework 1518 can provide various graphical user interface (GUI) functions, advanced resource management, advanced location services, and more. Framework 1518 can provide a wide range of other APIs that can be utilized by Application 1520 or other software components / modules, some of which may be specific to a particular operating system or platform.

[0151] Applications 1520 include built-in applications 1540 or third-party applications 1542. Examples of representative built-in applications 1540 may include, but are not limited to: contacts applications, browser applications, book reader applications, location applications, media applications, messaging applications, VR engine 1401, or game applications. Third-party applications 1542 may include any built-in application as well as a wide variety of other applications. In a specific example, third-party application 1542 (e.g., an application developed using the Android™ or iOS™ Software Development Kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or another mobile operating system. In this example, third-party application 1542 may invoke API calls 1524 provided by a mobile operating system such as operating system 1514 to facilitate the functionality described herein.

[0152] Application 1520 can utilize built-in operating system functions (e.g., kernel 1528, service 1530, or driver 1532), libraries (e.g., system 1534, API 1536, and other libraries 1538), and frameworks / middleware 1518 to create a user interface for interacting with the system. Alternatively or additionally, in some systems, interaction with the user can occur through a presentation layer, such as presentation layer 1544. In these systems, the application / module "logic" can be decoupled from the aspects of the application / module that interact with the user.

[0153] Some software architectures utilize virtual machines. In the example of Figure 5, this is shown as virtual machine 1548. The virtual machine creates a software environment in which applications / modules can execute as if they were running on a hardware machine. The virtual machine is hosted by a host operating system (operating system 1514 in Figure 5) and typically, but not always, has a virtual machine monitor 1546 that manages the operation of the virtual machine and its interface with the host operating system (i.e., operating system 1514). A software architecture executes within the virtual machine, such as an operating system 1550, libraries 1552, frameworks / middleware 1554, applications 1556, or a presentation layer 1558. These layers of the software architecture executing within the virtual machine 1548 may be the same as or different from the corresponding layers previously described.

[0154] In the example implementation, VR engine 1401 operates as an application within application layer 1520. However, in some implementations, VR engine 1401 may operate in other software layers, or in multiple software layers (e.g., framework 1518 and application 1520), or in any architecture implementing the systems and methods described herein. VR engine 1401 may be similar to VR engine 112.

[0155] Figure 6 is a block diagram illustrating components of a machine 1600, according to some example embodiments, capable of reading instructions from a machine-readable medium 1638 (e.g., a machine-readable storage medium) and executing any one or more of the VR methods discussed herein. Specifically, Figure 6 shows a graphical representation of machine 1600 in the example form of a computer system, within which instructions 1616 (e.g., software, programs, applications, applets, apps, or other executable code) can be executed to cause the machine 1600 to perform any one or more of the methods discussed herein. For example, the instructions can cause the machine to perform any of the operations described herein. The instructions transform a general, unprogrammed machine into a specific machine programmed to perform the described and illustrated functions in the described manner. In alternative embodiments, machine 1600 operates as a standalone device or can be coupled (e.g., networked) to other machines. In a networked deployment, machine 1600 can operate as a server machine or client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. Machine 1600 may include, but is not limited to: server computers, client computers, personal computers (PCs), tablet computers, laptop computers, netbooks, set-top boxes (STBs), personal digital assistants (PDAs), entertainment media systems, cellular phones, smartphones, mobile devices, wearable devices (e.g., smartwatches), smart home devices (e.g., smart appliances), other smart devices, web appliances, network routers, network switches, network bridges, or any machine capable of sequentially or otherwise executing instructions 1616 specifying actions to be taken by machine 1600. Furthermore, although only a single machine 1600 is shown, the term "machine" should also be considered to include a collection of machines 1600 that individually or jointly execute instructions 1616 to perform any one or more of the methods discussed herein.

[0156] Machine 1600 may include processor 1610, memory 1630, and I / O components 1650 that can be configured to communicate with each other, for example, via bus 1602. In an example embodiment, processor 1610 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a radio frequency integrated circuit (RFIC), other processors, or any suitable combination thereof) may include, for example, processors 1612 and 1614 capable of executing instructions 1616. The term "processor" is intended to include multi-core processors, which may include two or more independent processors (sometimes referred to as "cores") capable of executing instructions simultaneously. Although Figure 6 shows multiple processors, machine 1600 may include a single processor with a single core, a single processor with multiple cores (e.g., multi-core processing), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.

[0157] Memory / storage device 1630 may include memory 1632, such as main memory or other memory storage devices, and storage unit 1636, both of which are accessible by processor 1610, for example, via bus 1602. Storage unit 1636 and memory 1632 store instructions 1616 embodying any one or more of the methods or functions described herein. During execution by machine 1600, instructions 1616 may also reside wholly or partially in memory 1632, in storage unit 1636, in at least one of the processors 1610 (e.g., in the processor's cache memory), or in any suitable combination thereof. Therefore, memory 1632, storage unit 1636, and the memory of processor 1610 are examples of machine-readable media.

[0158] As used herein, "machine-readable medium" means a device capable of temporarily or permanently storing instructions and data, and may include, but is not limited to: random access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage devices (e.g., electrically erasable programmable read-only memory (EEPROM)), or any suitable combination thereof. The term "machine-readable medium" should be considered to include a single medium or multiple media capable of storing instructions 1616 (e.g., a centralized or distributed database or associated cache and server). The term "machine-readable medium" should also be considered to include any medium, or combination of multiple media, capable of storing instructions (e.g., instructions 1616) for execution by a machine (e.g., machine 1600), such that the instructions, when executed by one or more processors of machine 1600 (e.g., processor 1610), cause machine 1600 to perform any one or more of the methods described herein. Therefore, "machine-readable medium" refers to a single storage device or apparatus, as well as to a "cloud-based" storage system or storage network comprising multiple storage devices or apparatuses. The term "machine-readable medium" does not include transitory signals per se.

[0159] I / O components 1650 may include a wide variety of components for receiving input, providing output, generating output, transmitting information, exchanging information, capturing measurement results, etc. The specific I / O components 1650 included in a particular machine will depend on the machine type. For example, a portable machine such as a mobile phone will likely include a touch input device or other such input mechanism, while a headless server machine will likely not include such a touch input device. It should be understood that I / O components 1650 may include many other components not shown in Figure 6. The grouping of I / O components 1650 according to function is only for the sake of simplifying the following discussion, and this grouping is by no means limiting. In various example embodiments, I / O components 1650 may include output components 1652 and input components 1654. Output components 1652 may include visual components (e.g., displays such as plasma display panels (PDPs), light-emitting diode (LED) displays, liquid crystal displays (LCDs), projectors, cathode ray tubes (CRTs), or wearable devices such as head-mounted display (HMD) devices), acoustic components (e.g., speakers), tactile components (e.g., vibration motors, resistance mechanisms), other signal generators, etc. Input components 1654 may include alphanumeric input components (e.g., a keyboard, a touchscreen configured to receive alphanumeric input, an optical keyboard, or other alphanumeric input components), pointing-based input components (e.g., a mouse, touchpad, trackball, joystick, motion sensor, or other pointing instrument), tactile input components (e.g., a physical button, a touchscreen or other tactile input component that provides the position or force of a touch or touch gesture), motion-sensing input components (e.g., a hand controller), audio input components (e.g., a microphone), and the like.

[0160] In other example implementations, I / O component 1650 may include biometric component 1656, motion component 1658, environmental component 1660 or position component 1662, and various other components. For example, biometric component 1656 may include components for detecting expressions (e.g., hand expressions, facial expressions, voice expressions, body posture, or eye tracking), measuring biosignals (e.g., blood pressure, heart rate, body temperature, sweating, or brain waves), and identifying people (e.g., voice recognition, retinal recognition, facial recognition, fingerprint recognition, or EEG-based recognition). Motion component 1658 may include accelerometer components (e.g., accelerometers), gravity sensor components, rotation sensor components (e.g., gyroscopes), position sensing components, etc. Environmental component 1660 may include, for example, a lighting sensor component (e.g., a photometer), a temperature sensor component (e.g., one or more thermometers that detect ambient temperature), a humidity sensor component, a pressure sensor component (e.g., a barometer), an acoustic sensor component (e.g., one or more microphones that detect background noise), a proximity sensor component (e.g., an infrared sensor that detects nearby objects), a gas sensor (e.g., a gas detection sensor that detects the concentration of hazardous gases to ensure safety or to measure pollutants in the atmosphere), or other components that can provide indications, measurements, or signals corresponding to the surrounding physical environment. Position component 1662 may include a positioning sensor component (e.g., a Global Positioning System (GPS) receiver component), an altitude sensor component (e.g., an altimeter or barometer from which altitude can be derived), an orientation sensor component (e.g., a magnetometer), etc.

[0161] Communication can be implemented using a wide variety of technologies. I / O components 1650 may include communication components 1664, which are operable to couple machine 1600 to network 1680 or device 1670 via coupling 1682 and coupling 1672, respectively. For example, communication components 1664 may include a network interface component or another suitable device to interface with network 1680. In other examples, communication components 1664 may include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components that provide communication via other modalities. Device 1670 may be another machine or any of a variety of peripheral devices (e.g., a peripheral device coupled via Universal Serial Bus (USB)).

[0162] In various example implementations, one or more portions of network 1680 may be an ad hoc network, intranet, extranet, virtual private network (VPN), local area network (LAN), wireless LAN (WLAN), wide area network (WAN), wireless WAN (WWAN), metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, network 1680 or a portion of network 1680 may include a wireless or cellular network, and coupling 1682 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile Communications (GSM) connection, or other types of cellular or wireless coupling. In this example, the coupling 1682 can implement any of a variety of data transmission technologies, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, the 3rd Generation Partnership Project (3GPP) including 3G, fourth-generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High-Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), the Long Term Evolution (LTE) standard, other standards defined by various standards setting organizations, other telematics protocols, or other data transmission technologies.

[0163] Instructions 1616 can be sent or received over network 1680 via a network interface device (e.g., a network interface component included in communication component 1664), using a transmission medium and utilizing any of a number of well-known transmission protocols (e.g., Hypertext Transfer Protocol (HTTP)). Similarly, instructions 1616 can be sent or received via a transmission medium through a coupling 1672 to device 1670 (e.g., a peer-to-peer coupling). The term "transmission medium" should be considered to include any intangible medium capable of storing, encoding, or carrying instructions 1616 for execution by machine 1600, and includes digital or analog communication signals or other intangible media to facilitate communication of such software.

[0164] Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although the individual operations of one or more methods are shown and described as separate operations, one or more of the operations can be performed simultaneously, and they do not need to be performed in the order shown. Structures and functions presented as separate components in the example configuration can be implemented as combined structures or components. Similarly, structures and functions presented as single components can be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of this document.

[0165] Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes can be made to these embodiments without departing from the broader scope of the embodiments of this disclosure. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term "invention" for convenience only, and the term is not intended to voluntarily limit the scope of this application to any single disclosure or inventive concept if more than one is, in fact, disclosed.

[0166] The embodiments illustrated herein have been described in sufficient detail to enable those skilled in the art to practice the disclosed teachings. Other embodiments may be used, and other embodiments may be derived therefrom, allowing for structural and logical substitutions and changes without departing from the scope of this disclosure. Therefore, the specific embodiments should not be considered limiting, and the scope of the various embodiments is defined only by the appended claims and the full scope of their equivalents.

[0167] As used herein, the term "or" can be interpreted as inclusive or exclusive. Furthermore, multiple instances may be provided for a resource, operation, or structure described herein as a single instance. Additionally, the boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and specific operations are shown in the context of a particular illustrative configuration. Other allocations of functionality are contemplated, and they may fall within the scope of various embodiments of this disclosure. Generally, structures and functions presented as separate resources in the example configuration may be implemented as combined structures or resources. Similarly, structures and functions presented as single resources may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within the scope of embodiments of this disclosure as expressed by the appended claims. Therefore, the specification and drawings are to be considered illustrative rather than restrictive.

Claims

1. A non-transitory computer-readable storage medium storing an instruction set that, when executed by one or more computer processors, causes the one or more computer processors to perform operations, the operations including: displaying one or more light markers in an eye-tracking field area of a head-mounted display device, the eye-tracking field area being located within a configurable distance of a user's eyes and at a distance at which the user's eyes can focus on the one or more light markers; detecting that the user is focusing on a first light marker among the one or more light markers; in response to detecting that the user is focusing on the first light marker, moving the first light marker in a confirmation mode; and triggering a virtual input event based on determining that the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy.
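As an illustration only, the final operation of claim 1 (deciding whether the gaze followed the moving marker with at least a threshold level of accuracy) might be sketched as follows. The circular path, the function names `confirmation_path` and `gaze_follows_marker`, and all numeric defaults are assumptions made for this sketch and do not come from the disclosure; the duration check reflects the configurable duration mentioned in the abstract.

```python
import math

def confirmation_path(t, radius=0.02, period=1.0):
    """Hypothetical confirmation pattern: the marker's (x, y) offset, in metres,
    from its rest position at time t seconds, tracing a circle."""
    angle = 2.0 * math.pi * t / period
    return (radius * math.cos(angle), radius * math.sin(angle))

def gaze_follows_marker(gaze_samples, tolerance=0.005,
                        min_fraction=0.9, min_duration=0.5):
    """gaze_samples: list of (t, x, y) gaze offsets from the marker rest position.

    Returns True when the samples span at least min_duration seconds and at
    least min_fraction of them lie within tolerance of the moving marker.
    All thresholds are illustrative assumptions.
    """
    if len(gaze_samples) < 2:
        return False
    duration = gaze_samples[-1][0] - gaze_samples[0][0]
    if duration < min_duration:
        return False
    hits = 0
    for t, gx, gy in gaze_samples:
        mx, my = confirmation_path(t)
        if math.hypot(gx - mx, gy - my) <= tolerance:
            hits += 1
    return hits / len(gaze_samples) >= min_fraction
```

In this sketch a gaze that stays fixed on one point fails the check because the marker moves away from it, which is the property that separates a deliberate follow from an incidental glance.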

2. The non-transitory computer-readable storage medium according to claim 1, wherein the confirmation mode includes changing the display depth of the light marker.

3. The non-transitory computer-readable storage medium according to claim 1, wherein displaying the one or more light markers includes displaying light markers with configurable visual attributes, including at least one of color and opacity, wherein the color and opacity can be configured between a high contrast setting for new users and a low contrast setting for experienced users.

4. The non-transitory computer-readable storage medium according to claim 1, wherein displaying the one or more light markers includes positioning the one or more light markers at a distance from the user's eyes that is greater than the near point at which the eyes can focus, and closer than the distance at which background objects might interfere with gaze detection.

5. The non-transitory computer-readable storage medium according to claim 1, wherein displaying the one or more light markers includes displaying a variable number of markers with configurable positions, wherein each of the variable number of markers is mapped to a different function that can be configured based on the active application.

6. The non-transitory computer-readable storage medium according to claim 1, wherein displaying the one or more light markers includes: when an unknown object is detected in the eye-tracking field area, temporarily disabling the display of at least one of the one or more markers until the unknown object leaves the eye-tracking field area.

7. The non-transitory computer-readable storage medium according to claim 1, wherein detecting that the user is focusing on the first light marker includes: generating a first gaze vector of the user's first eye and a second gaze vector of the user's second eye; calculating an intersection point between the first gaze vector and the second gaze vector within a confidence region; and determining that the intersection point intersects with the first light marker within the confidence region.
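One possible way to compute such an intersection is sketched below. It treats each gaze vector as a 3D line from an eye position and, because measured lines rarely meet exactly, takes the midpoint of the shortest segment between them as the estimated intersection. This closest-point formula is a standard geometric technique swapped in for illustration; the claims do not specify which formula is used, and the function names and the spherical confidence region are assumptions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _along(origin, direction, s):
    return tuple(o + s * d for o, d in zip(origin, direction))

def gaze_intersection(p1, d1, p2, d2, eps=1e-9):
    """Estimate where two gaze lines meet.

    p1, p2: eye positions; d1, d2: gaze directions (3-tuples).
    Returns the midpoint of the shortest segment between the two lines,
    or None when the lines are (nearly) parallel. Points behind the eyes
    are not filtered out in this sketch.
    """
    w = _sub(p1, p2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = _along(p1, d1, s)
    q2 = _along(p2, d2, t)
    return tuple((x + y) / 2.0 for x, y in zip(q1, q2))

def is_focused_on(marker, point, confidence_radius):
    """True when the estimated intersection lies within a spherical
    confidence region (an assumed shape) around the marker position."""
    if point is None:
        return False
    return math.dist(marker, point) <= confidence_radius
```

For example, with eyes at (-0.03, 0, 0) and (0.03, 0, 0) both aimed at a marker half a metre ahead, the estimated intersection coincides with the marker position and `is_focused_on` accepts it for any positive radius.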

8. The non-transitory computer-readable storage medium according to claim 7, wherein calculating the intersection point includes: using mathematical formulas to find one or more intersections between the two gaze vectors, and defining the confidence region based on the accuracy capabilities of the tracking camera device.

9. The non-transitory computer-readable storage medium according to claim 7, wherein generating the first gaze vector and the second gaze vector includes: creating a 3D model of the user's eyes; determining the orientation and pointing direction of each eye based on the 3D model; and generating the first gaze vector and the second gaze vector based on the determined orientation and pointing direction.

10. The non-transitory computer-readable storage medium of claim 7, further comprising applying machine learning to analyze the first gaze vector and the second gaze vector within a plurality of time slices to improve the accuracy of intersection detection, wherein the machine learning is applied to adjust one or more activation thresholds.

11. The non-transitory computer-readable storage medium according to claim 7, wherein calculating the intersection point includes: calibrating one or more eye-tracking cameras by measuring a pixel error rate during a calibration phase in which the user focuses on a known reference point; and defining the confidence region based on the measured pixel error rate.

12. The non-transitory computer-readable storage medium according to claim 11, wherein defining the confidence region includes: determining a minimum confidence region size based on the measured pixel error rate, the minimum confidence region size maintaining a target signal-to-noise ratio for gaze detection; and adjusting the confidence region size based on ambient lighting conditions detected by a lighting sensor of the head-mounted display device.
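The sizing described in claims 11 and 12 could look roughly like the sketch below. It makes several assumptions that are not in the disclosure: the "pixel error rate" is taken to be the mean pixel distance between measured gaze points and the known reference point, the minimum radius is the target signal-to-noise ratio multiplied by that error, and lighting outside an assumed usable lux range widens the region by a fixed factor.

```python
import math

def mean_pixel_error(measured_points, reference_point):
    """Mean pixel distance between calibration gaze samples and the known
    reference point the user was asked to fixate (assumed error metric)."""
    total = sum(math.dist(m, reference_point) for m in measured_points)
    return total / len(measured_points)

def confidence_radius(pixel_error, target_snr=3.0, ambient_lux=300.0,
                      lux_range=(50.0, 1000.0), lighting_penalty=1.5):
    """Confidence region radius in pixels.

    The minimum radius is target_snr * pixel_error (an assumed relationship).
    When ambient light falls outside lux_range, the radius is widened by
    lighting_penalty. All default values are illustrative.
    """
    min_radius = target_snr * pixel_error
    low, high = lux_range
    if ambient_lux < low or ambient_lux > high:
        return min_radius * lighting_penalty
    return min_radius
```

A design note on this sketch: keeping the lighting adjustment multiplicative means the region never shrinks below the calibrated minimum, so the target signal-to-noise ratio is preserved under all lighting conditions.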

13. The non-transitory computer-readable storage medium of claim 7, further comprising applying machine learning to analyze historical intersection detection accuracy data to dynamically adjust the size of the confidence region while maintaining the target signal-to-noise ratio and reducing the required dwell time as the intersection detection accuracy improves.

14. The non-transitory computer-readable storage medium according to claim 7, wherein calculating the intersection point includes expanding the size of the confidence region when a head movement speed exceeds a threshold.
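A minimal sketch of the expansion in claim 14 follows, assuming head speed is reported in degrees per second and that the region grows linearly above the threshold up to a cap. The function name, units, gain, and cap are all hypothetical choices for illustration.

```python
def expand_for_head_motion(base_radius, head_speed_deg_s,
                           speed_threshold=30.0, gain=0.01, max_scale=2.0):
    """Return the confidence region radius adjusted for head movement.

    At or below speed_threshold the radius is unchanged. Above it, the radius
    grows linearly with the excess speed, capped at max_scale times the base.
    """
    if head_speed_deg_s <= speed_threshold:
        return base_radius
    scale = min(1.0 + gain * (head_speed_deg_s - speed_threshold), max_scale)
    return base_radius * scale
```

The cap is a design choice in this sketch: without it, a very fast head turn could grow the region until neighboring markers overlap.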

15. The non-transitory computer-readable storage medium of claim 7, further comprising addressing camera device defects by establishing a confidence region around the calculated intersection point.

16. The non-transitory computer-readable storage medium of claim 7, wherein the operations further comprise training a machine learning model to analyze a time series of intersections within multiple time slices to improve accuracy.

17. The non-transitory computer-readable storage medium according to claim 16, wherein the training uses a labeled dataset containing examples of both intentional and accidental gaze intersections to allow for dynamic adjustment of activation thresholds based on detected user behavior patterns while maintaining high detection confidence.
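To make the idea of adjusting an activation threshold from labeled examples concrete, the sketch below fits a single dwell-time threshold that best separates intentional from accidental intersections. This is a deliberately simple stand-in, not the machine learning model of the claims; the use of dwell time as the only feature and the function name `fit_dwell_threshold` are assumptions.

```python
def fit_dwell_threshold(examples):
    """Pick the dwell-time activation threshold with the highest accuracy.

    examples: list of (dwell_seconds, is_intentional) pairs.
    An intersection is predicted intentional when dwell >= threshold.
    Returns the best threshold among the observed dwell times (the smallest
    one on ties), or None when no examples are given.
    """
    candidates = sorted({dwell for dwell, _ in examples})
    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum((dwell >= threshold) == label
                      for dwell, label in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold
```

Re-running the fit as new labeled examples arrive would let the threshold track a user's behavior over time, which is one way to read the "dynamic adjustment" in claim 17.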

18. The non-transitory computer-readable storage medium of claim 7, further comprising using a 3D model of each eye to guide vector calculation at the firmware level, thereby determining the precise gaze intersection with minimal delay.

19. The non-transitory computer-readable storage medium of claim 18, further comprising creating a 3D model of each eye by directly accessing one or more eye-tracking camera devices, thereby determining precise eye orientation and pointing direction with minimal delay.

20. A system comprising: one or more computer processors; one or more computer memory units; and an instruction set, stored in the one or more computer memory units, configuring the one or more computer processors to perform operations including: displaying one or more light markers in an eye-tracking field area of a head-mounted display device, the eye-tracking field area being located within a configurable distance of a user's eyes and at a distance at which the user's eyes can focus on the one or more light markers; detecting that the user is focusing on a first light marker among the one or more light markers; in response to detecting that the user is focusing on the first light marker, moving the first light marker in a confirmation mode; and triggering a virtual input event based on determining that the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy.

21. A method comprising: displaying one or more light markers in an eye-tracking field area of a head-mounted display device, the eye-tracking field area being located within a configurable distance of a user's eyes and at a distance at which the user's eyes can focus on the one or more light markers; detecting that the user is focusing on a first light marker among the one or more light markers; in response to detecting that the user is focusing on the first light marker, moving the first light marker in a confirmation mode; and triggering a virtual input event based on determining that the user's gaze follows the first light marker through the confirmation mode with at least a threshold level of accuracy.