Visual vergence determination

By combining multiple inputs in an artificial reality system, the disclosure addresses the vergence-accommodation conflict and improves the user experience. In particular, when the eye-tracking system malfunctions, the system estimates the user's vergence distance from other inputs and adjusts the configuration of the head-mounted display accordingly, improving the robustness and accuracy of the system.

CN117406863B · Active · Publication Date: 2026-06-26 · CTRL-LABS CORP

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: CTRL-LABS CORP
Filing Date: 2018-12-28
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

In existing artificial reality systems, the vergence-accommodation conflict degrades the user experience. In particular, when the eye-tracking system malfunctions, the user's vergence distance cannot be determined accurately, which leads to visual fatigue and discomfort.

Method used

Eye-tracking-based, body-based, and content-based approaches are combined, and a fusion algorithm weights the multiple inputs. When an eye-tracking malfunction is detected, the user's vergence distance is estimated from the remaining inputs, and the adjustable configuration of the head-mounted display (such as the rendered image and the positions of the display screen and optical block) is adjusted to match it, thereby eliminating or reducing the vergence-accommodation conflict.

Benefits of technology

It improves the user experience of artificial reality systems in situations of visual convergence-accommodation conflict, reduces visual fatigue and discomfort, and enhances the robustness and accuracy of the system.

✦ Generated by Eureka AI based on patent content.


Abstract

Visual vergence determination is disclosed. In one embodiment, an artificial reality system determines that a performance metric of an eye tracking system is below a first performance threshold. The eye tracking system is associated with a head-mounted display worn by a user. The artificial reality system receives a first input associated with a body of the user and, based on the received first input, determines an area that the user is looking at within a field of view of the head-mounted display. The system determines a visual vergence distance of the user based on at least the first input associated with the body of the user, the area that the user is looking at, and a location of one or more objects in a scene displayed by the head-mounted display. The system adjusts one or more configurations of the head-mounted display based on the determined visual vergence distance of the user.

Description

[0001] This application is a divisional application of the application filed on December 28, 2018, with application number 201811622371.8 and invention title "Visual Convergence Determination".

Technical Field

[0002] This disclosure generally relates to artificial reality, such as virtual reality and augmented reality.

Background Technology

[0003] Artificial reality is a form of reality that has been adjusted in some way before being presented to a user, and may include, for example, virtual reality (VR), augmented reality (AR), mixed reality (MR), hybrid reality, or some combination and / or derivative thereof. Artificial reality content may include fully generated content or generated content combined with captured content (e.g., real-world photographs). Artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of these may be presented in a single channel or multiple channels (e.g., stereoscopic video that produces a three-dimensional effect for the viewer). Artificial reality may be associated with applications, products, accessories, services, or some combination thereof, which are used, for example, to create content in artificial reality and / or used in artificial reality (e.g., to perform activities therein). Artificial reality systems that provide artificial reality content can be implemented on a variety of platforms, including head-mounted displays (HMDs) connected to a host computer system, standalone HMDs, mobile devices or computing systems, or any other hardware platform capable of providing artificial reality content to one or more viewers.

Summary of the Invention

[0004] The specific embodiments described herein relate to a method for determining a user's visual convergence using a combination of eye-tracking-based methods (e.g., 3D eye tracking, machine learning-based eye tracking), body-based methods (e.g., head position / motion, hand position / motion, body position / motion), and content-based methods (e.g., Z-buffer, face detection, information provided by the application developer). Specific embodiments detect malfunctions in the eye-tracking system (e.g., data out of range or no data at all), and when a malfunction is detected, use this combination of methods to approximate the user's visual convergence. In specific embodiments, a fusion algorithm weights the inputs from all these methods and determines the location the user is likely looking at (e.g., using segmented comparison). For example, when the helmet detects that the user's hand has picked up a virtual object and is moving towards their face, the fusion algorithm can infer that the user is looking at the virtual object in their hand. When a virtual object is identified as a potential object that the user is looking at, the system can determine the appropriate Z depth for the display screen and adjust the configuration of the artificial reality system accordingly (e.g., change the rendered image, move the display screen, move the optical blocks) to eliminate or improve the negative effects caused by the vergence accommodation conflict.

[0005] The embodiments disclosed herein are merely examples, and the scope of this disclosure is not limited to these embodiments. Specific embodiments may include all, some, or none of the components, elements, features, functions, operations, or steps of the embodiments disclosed above. Embodiments of the invention disclosed in the appended claims specifically relate to a method, storage medium, system, and computer program product, wherein any feature mentioned in one claim class (e.g., method) may also be claimed in another claim class (e.g., system). Dependencies or references in the appended claims are chosen solely for formal reasons. However, any subject matter arising from an intentional reference to any prior claim (in particular multiple dependencies) may also be claimed, thereby disclosing any combination of claims and their features, and may be claimed regardless of the dependencies chosen in the appended claims. The subject matter that may be claimed includes not only combinations of features set forth in the appended claims but also any other combination of features in the claims, wherein each feature mentioned in the claims may be combined with any other feature or combination of features in the claims. Furthermore, any embodiments and features described or depicted herein may be claimed in individual claims and / or in any combination of any embodiments or features described or depicted herein or with any features of the appended claims.

[0006] This application provides the following:

[0007] 1) A method comprising:

[0008] The computing system determines that the performance metric of the eye-tracking system is below a first performance threshold, wherein the eye-tracking system is associated with a head-mounted display worn by the user;

[0009] The computing system receives one or more first inputs associated with the user's body;

[0010] The computing system estimates, based on one or more of the received first inputs associated with the user's body, the area within the field of view of the head-mounted display that the user is looking at;

[0011] The computing system determines the user's vergence distance based at least on one or more of the first inputs associated with the user's body, the estimated area the user is looking at, and the positions of one or more objects in the scene displayed on the head-mounted display; and

[0012] The computing system adjusts one or more configurations of the head-mounted display based on the determined visual convergence distance of the user.

[0013] 2) The method according to 1), wherein one or more configurations of the head-mounted display include one or more of the following:

[0014] Rendered image;

[0015] Display screen position; or

[0016] The position of the optical block.

[0017] 3) The method according to 2) further includes:

[0018] The performance metric of the eye-tracking system is determined to be higher than a second performance threshold;

[0019] Receive eye-tracking data from the eye-tracking system; and

[0020] Based on the eye-tracking data and one or more first inputs associated with the user's body, the user's visual convergence distance is determined.

[0021] 4) The method according to 3) further includes:

[0022] Receive one or more second inputs associated with one or more display elements in the scene displayed on the head-mounted display; and

[0023] The user's visual convergence distance is determined based at least on the eye-tracking data, one or more first inputs associated with the user's body, and one or more second inputs associated with one or more display elements of the scene.

[0024] 5) The method according to 4) further includes:

[0025] One or more first inputs associated with the user's body are fed into a fusion algorithm, wherein the fusion algorithm assigns a weighted score to each of the one or more first inputs; and

[0026] A confidence score and the Z depth of the display screen are determined based on one or more of the first inputs associated with the user's body.

[0027] 6) The method according to 5) further includes:

[0028] The confidence score is compared with the confidence threshold;

[0029] When the confidence score is lower than the confidence threshold, one or more second inputs associated with one or more of the display elements of the scene are fed into the fusion algorithm; and

[0030] The Z depth of the display screen is determined using the fusion algorithm based on one or more first inputs associated with the user's body and one or more second inputs associated with one or more display elements of the scene.

[0031] 7) The method according to 6) further includes:

[0032] The fusion algorithm compares confidence scores determined based on multiple input combinations; and

[0033] The Z depth of the display screen is determined using the fusion algorithm based on the combination of inputs with the highest confidence score.

[0034] 8) The method according to 5), wherein the Z depth and the confidence score are determined by the fusion algorithm using a segmented comparison of one or more of the first inputs with one or more of the second inputs.

[0035] 9) The method according to 5), wherein the Z depth and the confidence score are determined based on the correlation between two or more inputs of one or more of the first inputs and one or more of the second inputs.

[0036] 10) The method according to 5), wherein the fusion algorithm includes a machine learning (ML) algorithm, wherein the machine learning (ML) algorithm determines the input combination.

[0037] 11) The method according to 4), wherein one or more of the first inputs associated with the user's body include one or more of the following:

[0038] Hand position;

[0039] The direction of the hand;

[0040] Hand movements;

[0041] Gesture;

[0042] Head position;

[0043] Head direction;

[0044] Head movements;

[0045] Head posture;

[0046] Body posture;

[0047] Body posture;

[0048] Physical movement;

[0049] The user's behavior; or

[0050] A weighted combination of one or more relevant parameters.

[0051] 12) The method according to 11), wherein one or more of the first inputs associated with the user's body are received from one or more of the following:

[0052] Controller;

[0053] Sensor;

[0054] Camera;

[0055] Microphone;

[0056] Accelerometer;

[0057] The helmet worn by the user; or

[0058] Mobile device.

[0059] 13) The method according to 4), wherein one or more of the second inputs associated with one or more display elements include one or more of the following:

[0060] The Z-buffer value associated with the display element;

[0061] Display elements marked by the developer;

[0062] Image analysis results;

[0063] The shape of the display element;

[0064] Face recognition results;

[0065] Object recognition results;

[0066] People identified in the displayed content;

[0067] Objects identified in the displayed content;

[0068] The correlation between two or more display elements; or

[0069] A weighted combination of one or more of the second inputs.

[0070] 14) The method according to 1) further includes:

[0071] The performance metric of the eye-tracking system is determined to be below a second performance threshold;

[0072] Receive one or more second inputs associated with one or more display elements in the scene displayed on the head-mounted display; and

[0073] The user's visual convergence distance is determined based at least on one or more of the first inputs associated with the user's body and one or more of the second inputs associated with one or more of the display elements.

[0074] 15) The method according to 14), wherein determining that the performance metric of the eye-tracking system is lower than the second performance threshold includes: determining that the eye-tracking system does not have eye-tracking data or that the eye-tracking system cannot provide eye-tracking data.

[0075] 16) The method according to 1), wherein the performance metric of the eye-tracking system includes one or more of the following:

[0076] The accuracy of the parameters from the eye-tracking system;

[0077] The precision of the parameters from the eye-tracking system;

[0078] Parameter values from the eye-tracking system;

[0079] Detectability of the pupil;

[0080] A metric based on one or more parameters associated with the user;

[0081] Parameter changes;

[0082] Parameter change trend;

[0083] Data availability; or

[0084] A weighted combination of one or more performance-related parameters.

[0085] 17) The method according to 16), wherein one or more parameters associated with the user include one or more of the following:

[0086] The user's eye distance;

[0087] Pupil position;

[0088] Pupil status;

[0089] The correlation between the user's two pupils;

[0090] The user's head size;

[0091] The location where the user wears the helmet;

[0092] The angle at which the user wears the helmet;

[0093] The direction in which the user wears the helmet;

[0094] The alignment of the user's eyes; or

[0095] A weighted combination of one or more relevant parameters associated with the user.

[0096] 18) The method according to 1), wherein the first performance threshold includes one or more of the following:

[0097] Preset value;

[0098] Preset range;

[0099] The status of the data;

[0100] The rate of change of the data; or

[0101] Trends in data changes.

[0102] 19) One or more non-transitory computer-readable storage media containing software, which, when executed by a server computing device, is operable to:

[0103] The performance metric of the eye-tracking system is determined to be below a first performance threshold, wherein the eye-tracking system is associated with a head-mounted display worn by the user;

[0104] Receive one or more first inputs associated with the user's body;

[0105] Based on one or more of the first inputs received in association with the user's body, the area within the field of view of the head-mounted display that the user is looking at is estimated;

[0106] The user's vergence distance is determined based at least on one or more of the first inputs associated with the user's body, the estimated area the user is looking at, and the positions of one or more objects in the scene displayed on the head-mounted display; and

[0107] Based on the determined visual convergence distance of the user, one or more configurations of the head-mounted display are adjusted.

[0108] 20) A system comprising:

[0109] One or more sensors;

[0110] Head-mounted displays;

[0111] One or more non-transitory computer-readable storage media containing instructions;

[0112] One or more processors, coupled to the storage medium and operable to execute the instructions, to:

[0113] The performance metric of the eye-tracking system is determined to be below a first performance threshold, wherein the eye-tracking system is associated with the head-mounted display worn by the user;

[0114] Receive one or more first inputs associated with the user's body;

[0115] Based on one or more of the first inputs received in association with the user's body, the area within the field of view of the head-mounted display that the user is looking at is estimated;

[0116] The user's vergence distance is determined based at least on one or more of the first inputs associated with the user's body, the estimated area the user is looking at, and the positions of one or more objects in the scene displayed on the head-mounted display; and

[0117] Based on the determined visual convergence distance of the user, one or more configurations of the head-mounted display are adjusted.

Attached Figure Description

[0118] Figure 1 shows an example network environment associated with a social networking system.

[0119] Figure 2 shows an example artificial reality system.

[0120] Figure 3 shows an example of a vergence-accommodation conflict in a head-mounted display.

[0121] Figure 4 shows an example 3D eye-tracking system.

[0122] Figure 5 shows an example head-mounted display with an adjustable display screen.

[0123] Figure 6 shows example performance evaluation charts for different combinations of body-based and content-based inputs.

[0124] Figure 7 shows an example scene in the field of view of a user wearing an artificial reality helmet.

[0125] Figure 8A shows an example fusion algorithm for determining the Z depth of a display screen and a confidence score.

[0126] Figure 8B shows an example fusion algorithm using segmented comparisons of the inputs.

[0127] Figure 9 shows an example method for determining a user's visual vergence distance based on a combination of inputs.

[0128] Figure 10 shows an example computer system.

Detailed Implementation

[0129] Figure 1 shows an example network environment 100 associated with a social networking system. Network environment 100 includes users 101, client systems 130, social networking systems 160, and third-party systems 170 connected to each other via network 110. Although Figure 1 shows a specific arrangement of users 101, client systems 130, social networking systems 160, third-party systems 170, and network 110, this disclosure contemplates any suitable arrangement of users 101, client systems 130, social networking systems 160, third-party systems 170, and network 110. As an example, and not by way of limitation, two or more of client systems 130, social networking systems 160, and third-party systems 170 may be directly connected to each other, bypassing network 110. As another example, two or more of client systems 130, social networking systems 160, and third-party systems 170 may be physically or logically co-located with each other, in whole or in part. Moreover, although Figure 1 shows a specific number of users 101, client systems 130, social networking systems 160, third-party systems 170, and networks 110, this disclosure contemplates any suitable number of users 101, client systems 130, social networking systems 160, third-party systems 170, and networks 110. As an example, and not by way of limitation, network environment 100 may include multiple users 101, client systems 130, social networking systems 160, third-party systems 170, and networks 110.

[0130] In a specific embodiment, user 101 may be an individual (individual user), entity (e.g., enterprise, business, or third-party application), or group (e.g., a group of individuals or entities) that interacts or communicates with or through social networking system 160. In a specific embodiment, social networking system 160 may be a network-addressable computing system hosting an online social network. Social networking system 160 may generate, store, receive, and transmit social network data, such as user profile data, concept profile data, social graph information, or other suitable data related to the online social network. Social networking system 160 may be directly accessed by other components of network environment 100 or accessed via network 110. In a specific embodiment, social networking system 160 may include an authorization server (or other suitable component) that allows user 101 to opt in to or out of having their actions recorded by social networking system 160 or shared with other systems (e.g., third-party system 170), for example, by setting appropriate privacy settings. A user's privacy settings can determine what information associated with the user can be recorded, how such information can be recorded, when such information can be recorded, who can record such information, with whom such information can be shared, and for what purpose such information can be recorded or shared. An authorization server can be used to enforce one or more privacy settings of users of the social networking system 160 through blocking, data hashing, anonymization, or other suitable techniques. In a specific embodiment, the third-party system 170 may be a network-addressable computing system. The third-party system 170 may be accessed directly or via network 110 by other components of the network environment 100. In a specific embodiment, one or more users 101 may use one or more client systems 130 to access the social networking system 160 or the third-party system 170, send data to the social networking system 160 or the third-party system 170, and receive data from the social networking system 160 or the third-party system 170. The client system 130 may access the social networking system 160 or the third-party system 170 directly, via network 110, or via a third-party system. As an example, and not by way of limitation, client system 130 can access third-party system 170 via social networking system 160. Client system 130 can be any suitable computing device, such as a personal computer, laptop computer, cellular phone, smartphone, tablet computer, or augmented / virtual reality device.

[0131] This disclosure envisions any suitable network 110. As an example, and not by way of limitation, one or more portions of network 110 may include an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, or a combination of two or more thereof. Network 110 may include one or more networks 110.

[0132] Links 150 connect client system 130, social networking system 160, and third-party system 170 to communication network 110 or to each other. This disclosure contemplates any suitable link 150. In specific embodiments, one or more links 150 include one or more wired (e.g., Digital Subscriber Line (DSL) or Data Over Cable Service Interface Specification (DOCSIS)), wireless (e.g., Wi-Fi or Worldwide Interoperability for Microwave Access (WiMAX)), or optical (e.g., Synchronous Optical Network (SONET) or Synchronous Digital Hierarchy (SDH)) links. In specific embodiments, one or more links 150 may include ad hoc networks, intranets, extranets, VPNs, LANs, WLANs, WANs, WWANs, MANs, portions of the Internet, portions of the PSTN, cellular-based networks, satellite-based networks, another link 150, or combinations of two or more such links 150. Links 150 need not be identical throughout network environment 100. One or more first links 150 may differ from one or more second links 150 in one or more aspects.

[0133] Figure 2 illustrates an example artificial reality system 200. In a specific embodiment, the artificial reality system 200 may include a helmet 204 (e.g., a head-mounted display (HMD)), a controller 206, and a computing system 208. A user 202 may wear the helmet 204, which may display visual artificial reality content to the user 202. The helmet 204 may include an audio device that can provide audio artificial reality content to the user 202. The helmet 204 may include one or more cameras that can capture images and video of the environment. The helmet 204 may include an eye-tracking system to determine the visual convergence of the user 202. The helmet 204 may include one or more display screens for rendering artificial reality content. The controller 206 may include a touchpad and one or more buttons. The controller 206 may receive input from the user 202 and relay that input to the computing system 208. The controller 206 may also provide haptic feedback to the user 202. The computing system 208 may be connected to the helmet 204 and the controller 206 via a cable or wireless connection. The computing system 208 can control the helmet 204 and controller 206 to provide artificial reality content to the user 202 and receive input from the user 202. The computing system 208 can be a standalone host computer system, an onboard computer system integrated with the helmet 204, a mobile device, or any other hardware platform capable of providing artificial reality content to the user 202 and receiving input from the user 202. In this disclosure, the terms "helmet" and "head-mounted display" are used interchangeably to refer to a head-mounted device used in an artificial reality system.

[0134] Vergence distance can be the distance from the user's eyes to the object the user's eyes converge on (e.g., a real-world object or a virtual object in virtual space). Focal length can be the distance from the user's eyes to the object the user's eyes are accommodated to. In the real world, when a user's two eyes are focused on a real-world object, both eyes converge on and accommodate to that object, so the vergence distance and focal length of the two eyes match. In artificial reality, a user can focus on a virtual object presented on a head-mounted display. The user's two eyes can converge on the virtual object, which can be relatively far from the user in virtual space, while accommodating to the display screen of the head-mounted display, which is relatively close to the user's eyes. A mismatch between vergence and accommodation of the user's eyes can lead to a vergence-accommodation conflict, which can negatively impact the artificial reality experience. For example, a vergence-accommodation conflict can lead to eye strain or VR sickness over time.

[0135] Figure 3 illustrates an example of a vergence-accommodation conflict in a head-mounted display 300. The head-mounted display 300 may have a display screen 320 for displaying content to a user's eyes 302 and 304. The display screen 320 may present a virtual object 322 to the user. The user's two eyes 302 and 304 may gaze at the virtual object 322. In this case, the vergence distance 342, or gaze depth, of the user's two eyes corresponds to the virtual distance between the eyes (302, 304) and the virtual object 322. However, the two eyes 302 and 304 may have a focal length 340 set by the display screen 320, because they are accommodating to the display screen 320, which is the actual light source for the virtual object 322. The mismatch between the focal length 340 and the vergence distance 342 causes a vergence-accommodation conflict, which can negatively impact the artificial reality experience provided by the head-mounted display 300. Specific embodiments address the vergence-accommodation conflict problem and improve the user experience of artificial reality.

[0136] In a specific embodiment, the artificial reality helmet system may include an eye-tracking system for tracking the user's eyes in real time. The eye-tracking system may be a 3D eye-tracking system that tracks the user's eye movements (e.g., gaze direction, gaze angle, gaze depth, convergence) and determines the location the user is looking at (e.g., visual convergence distance or gaze point). Figure 4 illustrates an example 3D eye-tracking system 400. The 3D eye-tracking system 400 can track three-dimensional eye movements to determine a user's visual convergence distance or fixation point. The eye-tracking system 400 may include a lens 410, multiple infrared light sources (e.g., 412A to 412H), a hot mirror 420, and an infrared camera 440. The light sources 412A to 412H may be infrared LEDs mounted on the lens 410. The hot mirror 420 may be a dichroic filter that reflects infrared light while allowing visible light to pass through. Infrared light (e.g., 414) emitted by one or more light sources 412A to 412H can reach and be reflected from the eye 450. The reflected light 416 may be further reflected by the hot mirror 420 and reach the infrared camera 440, which uses the reflected infrared light to capture an image of the eye 450. The eye-tracking system 400 can capture images of the user's eyes (e.g., pupils) and process the images using computer vision techniques. The eye-tracking system 400 can measure the angle between the two eyes and use geometric relationships to determine the user's visual convergence distance and fixation point. The 3D eye-tracking system 400 can, for example, measure the user's eye angle with an accuracy of 1°. Visible light 432 from the display screen 430 can reach the eye 450 through the hot mirror 420 and the lens 410, allowing the user to see the content presented on the display screen 430.
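The "geometric relationships" mentioned above are not spelled out in the text. As a minimal sketch, assuming symmetric fixation straight ahead and a hypothetical interpupillary distance of 63 mm (neither is stated in the source), the vergence distance can be derived from the angle between the two lines of sight as follows. The function names are illustrative only.

```python
import math

def vergence_distance_m(vergence_angle_deg: float, ipd_m: float = 0.063) -> float:
    """Distance to the fixation point, assuming symmetric fixation straight ahead.

    vergence_angle_deg: angle between the two lines of sight, in degrees.
    ipd_m: interpupillary distance in metres (0.063 is an assumed value).
    """
    if vergence_angle_deg <= 0.0:
        return math.inf  # parallel lines of sight: fixation at infinity
    half_angle = math.radians(vergence_angle_deg) / 2.0
    # Each eye sits ipd/2 from the midline and rotates inward by half the angle.
    return (ipd_m / 2.0) / math.tan(half_angle)

def distance_bounds_m(vergence_angle_deg: float, error_deg: float = 1.0,
                      ipd_m: float = 0.063) -> tuple:
    """Nearest and farthest distances consistent with a given angular error."""
    near = vergence_distance_m(vergence_angle_deg + error_deg, ipd_m)
    far = vergence_distance_m(vergence_angle_deg - error_deg, ipd_m)
    return near, far
```

Because the distance depends on the tangent of a small angle, a fixed angular error translates into a much wider distance interval for far targets than for near ones, which is one reason a fallback estimate can be useful.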

[0137] In a specific embodiment, the helmet system may use a machine learning (ML)-based approach for eye tracking. The helmet system may capture image sequences of the user's eyes while wearing the helmet (e.g., using a 3D eye tracking system) and use machine learning (ML) algorithms to process the images and output visual convergence information. For example, the machine learning (ML) algorithm may include an inference model to determine the user's visual convergence distance and fixation point. In a specific embodiment, the helmet system may include a hybrid approach combining 3D eye tracking and ML-based eye tracking.

[0138] However, eye-tracking systems may not always function optimally. For example, if the user wears the helmet incorrectly, the eye-tracking system may fail to detect pupils. As another example, eye-tracking systems may have lower accuracy and precision due to malfunctions or user error. As yet another example, eye-tracking data may be out of range, or there may be no data from the eye-tracking system at all. Furthermore, some artificial reality helmet systems may not include any eye-tracking system at all. Without reliable eye-tracking information, the ability of artificial reality helmet systems to reduce the vergence-accommodation conflict will be compromised.
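The failure modes listed above (no data, undetected pupils, out-of-range values) could be checked per sample. The following is only a sketch: the `EyeSample` type, its fields, and the plausible-range constants are hypothetical and not taken from the source.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EyeSample:
    """One hypothetical eye-tracker reading; the fields are assumptions."""
    pupil_detected: bool
    vergence_distance_m: Optional[float]  # None when the tracker reports nothing

# Assumed plausible range for a reported vergence distance, in metres.
MIN_DISTANCE_M = 0.1
MAX_DISTANCE_M = 100.0

def eye_tracking_problem(sample: Optional[EyeSample]) -> Optional[str]:
    """Return a short reason if the sample is unusable, otherwise None."""
    if sample is None or sample.vergence_distance_m is None:
        return "no data"
    if not sample.pupil_detected:
        return "pupil not detected"
    # NaN fails both comparisons, so it is also reported as out of range.
    if not (MIN_DISTANCE_M <= sample.vergence_distance_m <= MAX_DISTANCE_M):
        return "data out of range"
    return None
```

A real system would more likely aggregate such checks over a time window into a performance metric rather than react to single samples.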

[0139] In a specific embodiment, the helmet system can detect malfunctions in the eye-tracking system. Upon detecting a malfunction, the helmet system can switch states to receive one or more inputs and use combinations of these inputs to determine the user's vergence distance or gaze point. These inputs can be based on various methods, including, but not limited to, eye-tracking-based methods (e.g., 3D eye tracking, ML-based eye tracking), body-based methods (e.g., head position / motion, hand position / motion, body position / motion), and content-based methods (e.g., Z-buffer, face / object recognition, developer-provided information). Specific embodiments can use combinations of various methods to provide more robust eye tracking. The fusion algorithm can weight the inputs based on all these methods and determine the location the user is likely looking at, the Z-depth of the display screen, and a confidence score. In a specific embodiment, the fusion algorithm can determine the correlation between one or more inputs and determine the location the user is likely looking at based on the correlation of the inputs. For example, when the helmet system detects that the user's hand has picked up a virtual object and is moving towards their face, the fusion algorithm can infer that the user is looking at the virtual object in their hand. Upon identifying the virtual object as a possible object the user is looking at, the helmet system can determine the appropriate Z-depth of the display screen. The helmet system can then physically move the display screen associated with the zoom system to a position corresponding to the Z depth to resolve the visual convergence-accommodation conflict.
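The text does not specify how the fusion algorithm weights its inputs or computes the confidence score. As one possible illustration, the sketch below takes a weighted average of per-input depth estimates and defines confidence as the weighted share of inputs that agree with the fused depth to within 20 %. The `DepthEstimate` type, the source labels, the weights, and the agreement tolerance are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DepthEstimate:
    source: str     # illustrative label, e.g. "hand", "head", "z_buffer"
    depth_m: float  # depth of the object this input suggests the user is viewing
    weight: float   # assumed non-negative weight assigned by the fusion step

def fuse(estimates: List[DepthEstimate], tolerance: float = 0.2) -> Tuple[float, float]:
    """Return (fused_depth_m, confidence), with confidence in [0, 1]."""
    total = sum(e.weight for e in estimates)
    if total <= 0.0:
        raise ValueError("need at least one estimate with positive weight")
    fused = sum(e.weight * e.depth_m for e in estimates) / total
    # Assumed confidence measure: weighted share of inputs close to the fused depth.
    agreeing = sum(e.weight for e in estimates
                   if abs(e.depth_m - fused) <= tolerance * fused)
    return fused, agreeing / total
```

For the hand example above, a hand input and a Z-buffer input that both point at an object about 0.4 m away would yield a fused depth near 0.4 m with high confidence, whereas strongly conflicting inputs would yield low confidence. A low score could then prompt the system to bring in further inputs, as the numbered statements on the confidence threshold describe.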

[0140] Figure 5 shows an example head-mounted display 500 with an adjustable display screen 502. The head-mounted display 500 may have a display screen 502 and a lens 504. In a specific embodiment, the display screen 502 may move along axis 506 toward or away from the lens 504 within a movable range 520 (e.g., 1 cm) between positions 512 and 514. The distance between the display screen 502 and the lens 504 may be referred to as the Z-distance or Z-depth 530. The Z-depth 530 may affect the focal distance of the user's eyes. Position 512 of the display screen 502 may correspond to a situation where the user is viewing a virtual object at a visual convergence distance of 25 cm. Position 514 may correspond to a situation where the user is viewing a virtual object at an infinite visual convergence distance. When adjusting the display screen 502, the lens 504 or other parts of the head-mounted display 500 may be used as a reference. In a specific embodiment, the adjustable display screen may be associated with a zoom system of the head-mounted display 500. The zoom system can use the Z-depth 530 of the display screen to bring the user's focal distance into agreement with the vergence distance and thereby mitigate the vergence-accommodation conflict. In a specific embodiment, the head-mounted display 500 can move the optical block associated with the lens 504 to adjust the Z-depth 530. In a specific embodiment, the head-mounted display 500 can move both the display screen 502 and the optical block associated with the lens 504 to adjust the Z-depth 530. In a specific embodiment, the helmet can present different images to the user based on the user's vergence distance or fixation point to eliminate or mitigate the vergence-accommodation conflict.
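The relationship between screen position and viewing distance can be illustrated with the ideal thin-lens equation. The following is only a minimal sketch under stated assumptions: the lens is treated as an ideal thin lens, the eye is assumed to sit at the lens, and the focal length of 55 mm is a hypothetical value chosen for illustration. None of these values, nor the function name, comes from the disclosure.

```python
import math


def screen_distance_for_vergence(focal_length_m: float,
                                 vergence_distance_m: float) -> float:
    """Return the screen-to-lens distance that places the virtual image
    at the requested distance (ideal thin lens, eye assumed at the lens)."""
    if math.isinf(vergence_distance_m):
        # Image at infinity: the screen sits in the focal plane of the lens.
        return focal_length_m
    # Thin-lens relation for a virtual image: 1/d_screen = 1/f + 1/d_image
    return 1.0 / (1.0 / focal_length_m + 1.0 / vergence_distance_m)


FOCAL_LENGTH_M = 0.055  # hypothetical focal length, not from the source

# Screen distance for an infinitely distant object (cf. position 514).
far = screen_distance_for_vergence(FOCAL_LENGTH_M, math.inf)
# Screen distance for an object at 25 cm (cf. position 512).
near = screen_distance_for_vergence(FOCAL_LENGTH_M, 0.25)
travel_mm = (far - near) * 1000.0
```

With the assumed focal length, the screen travel between the two cases is close to 1 cm, the same order of magnitude as the example movable range 520.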

[0141] In a specific embodiment, the helmet system may determine one or more performance metrics and compare those metrics with one or more performance thresholds to evaluate the performance of the eye-tracking system and, accordingly, determine which combination of methods to use. Figure 6 shows an example performance evaluation graph with different combinations of eye-tracking-based input, body-based input, and content-based input. The horizontal axis 602 can correspond to the performance metric level of the eye-tracking system. The vertical axis 604 can correspond to different inputs and / or methods under different performance conditions. The performance metric can be compared to a first threshold 610 and a second threshold 620. When the performance metric is above the first threshold 610, the eye-tracking system performs as expected, and its performance can be identified as good. In this case, the helmet system can continue to use the eye-tracking data from the eye-tracking system to determine the user's visual convergence distance and gaze point without requiring additional data or input.

[0142] Performance can be identified as poor when the performance metric is below a first threshold 610 and above a second threshold 620. In this case, the eye-tracking system partially works, but has some malfunctions that negatively impact its performance (e.g., reduced confidence scores, reduced accuracy and / or precision in determining vergence distance and Z-depth). When the eye-tracking system performs poorly, the helmet system can determine a combination of inputs to improve the quality and confidence scores for determining the user's vergence distance and gaze point. This combination can include eye-tracking data, body-based inputs, or content-based inputs. For example, the combination can include one or more body-based inputs. As another example, the combination can include one or more content-based inputs. As yet another example, the combination can include one or more body-based and content-based inputs, along with eye-tracking data.

[0143] When the performance metric falls below the second threshold 620, the eye-tracking system can be identified as non-functional. In this case, the helmet system may not have available eye-tracking data because it lacks an eye-tracking system or the eye-tracking system is not functioning. When the eye-tracking system is non-functional, the helmet system can use a combination of inputs to determine the user's likely visual convergence distance and gaze point. This combination may include one or more body-based or content-based inputs.
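The two-threshold scheme described above can be sketched as follows. This is an illustrative sketch only: the names `TrackerState`, `classify_tracker`, and `inputs_for_state` are hypothetical, the metric is assumed to be a single number where higher is better, and the treatment of values exactly equal to a threshold is an assumption, since the text does not specify it.

```python
from enum import Enum


class TrackerState(Enum):
    GOOD = "good"
    POOR = "poor"
    NON_FUNCTIONAL = "non-functional"


def classify_tracker(metric: float, first_threshold: float,
                     second_threshold: float) -> TrackerState:
    """Classify eye-tracking performance against the two thresholds."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must lie below the first")
    # Assumption: a metric exactly on a threshold counts as the better state.
    if metric >= first_threshold:
        return TrackerState.GOOD
    if metric >= second_threshold:
        return TrackerState.POOR
    return TrackerState.NON_FUNCTIONAL


def inputs_for_state(state: TrackerState) -> frozenset:
    """Return the input families that may be combined in each state."""
    if state is TrackerState.GOOD:
        return frozenset({"eye_tracking"})
    if state is TrackerState.POOR:
        return frozenset({"eye_tracking", "body", "content"})
    # Non-functional: no eye-tracking data is available.
    return frozenset({"body", "content"})
```

In the poor state the set lists the families that may be combined; the text allows any subset of them to be used.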

[0144] In specific embodiments, performance metrics may include, but are not limited to, parameter accuracy of the eye-tracking system, parameter values of the eye-tracking system, pupil detectability, metrics based on one or more parameters associated with the user, parameter changes, parameter change trends, data availability, and weighted combinations of one or more performance metrics or related parameters. Thresholds for performance metrics may include, but are not limited to, predetermined values, predetermined ranges, data status, data change rate, and data change trends. In specific embodiments, the threshold may be predetermined by the developer. In specific embodiments, the threshold may be determined by input from a user using the helmet, or it may be adaptively determined using machine learning or deep learning algorithms that use current or historical data from the helmet. In specific embodiments, the helmet system may use performance metrics to detect one or more malfunctions in the eye-tracking system. In specific embodiments, the helmet system may detect malfunctions in the eye-tracking system by comparing two or more parameters of the eye-tracking data (e.g., information from different sensing channels) and determining whether these parameters are consistent with each other.

[0145] As an example, and not by way of limitation, a helmet system can compare eye-tracking data parameter values (e.g., Z-depth) with predetermined values or ranges (e.g., the Z-depth range specified in the helmet's specifications or manual) and determine whether the parameter values are within the predetermined range. When parameter values are outside the range, the eye-tracking system may be identified as malfunctioning. As another example, the helmet system can determine the trend of change in the eye-tracking data parameters and determine that the parameter values are drifting and that the deviation is beyond acceptable limits. As another example, the helmet system may not receive data from the eye-tracking system and can determine that the helmet does not include an eye-tracking system or that the eye-tracking system is not functioning. As another example, the eye-tracking system may fail to detect the user's pupil when the user blinks or the pupil is otherwise obstructed. As yet another example, the helmet system may detect that the user's eyes have problems (e.g., ocular rheology or inability of the two eyes to converge) that hinder the proper functioning of the eye-tracking system.
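Three of the checks above (missing data, out-of-range values, and drift) can be sketched in a few lines. This is a simplified illustration, not the disclosed implementation: the function name, the reason strings, and the drift estimate (difference between the means of the first and last halves of a sample window) are all assumptions. A practical system would also need to separate drift from legitimate changes in where the user is looking.

```python
from typing import Optional, Sequence, Tuple


def detect_malfunction(z_depths: Sequence[float],
                       valid_range: Tuple[float, float],
                       max_drift: float) -> Optional[str]:
    """Return a reason string if the samples suggest a malfunction, else None."""
    if not z_depths:
        return "no data"
    low, high = valid_range
    if any(z < low or z > high for z in z_depths):
        return "out of range"
    # Crude drift estimate: compare the mean of the last half of the window
    # with the mean of the first half (hypothetical heuristic).
    half = len(z_depths) // 2
    if half > 0:
        first_mean = sum(z_depths[:half]) / half
        last_mean = sum(z_depths[-half:]) / half
        if abs(last_mean - first_mean) > max_drift:
            return "drift"
    return None
```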

[0146] In a specific embodiment, the helmet system can determine one or more parameters associated with the user wearing the helmet and determine that the performance of the eye-tracking system may be negatively affected. User-related parameters may include, but are not limited to, the distance between the user's two eyes (e.g., pupils), pupil position, pupil state, correlation between the user's two pupils, user's head size, the position of the user wearing the helmet, the angle at which the user wears the helmet, the direction in which the user wears the helmet, the alignment of the user's eyes, the alignment of the helmet with the user's eyes, and a weighted combination of one or more related parameters associated with the user. The helmet can compare the user-related parameters to one or more criteria, which may be predetermined by the developers or adaptively determined by the user or the algorithm. When the user-related parameters do not meet the criteria, the helmet system can determine that the eye-tracking system does not work well or cannot work in these conditions. As an example, and not by limitation, the helmet system can detect that the user is wearing the helmet incorrectly (e.g., incorrect orientation, posture, or alignment), and that eye-tracking data is unavailable or inaccurate. As another example, the helmet system may be unable to detect the user's pupils and determine that the eye-tracking system cannot track the current user's eyes. As another example, a helmet system may fail to properly detect or track a user's eyes because the user is wearing prescription lenses or contact lenses that are beyond the system's support range. As yet another example, a helmet system may determine that a user has a larger interpupillary distance or a larger head size than the system is designed for. In such cases, the eye-tracking system may fail to detect the pupil or may fail to properly track the user's gaze.

[0147] In a specific embodiment, the helmet system can determine a confidence score for the determined user vergence distance or fixation point and the Z-depth of the display screen. The helmet system can compare the confidence score to a confidence threshold to determine whether the determined vergence distance or fixation point meets predetermined requirements (e.g., accuracy, precision, update rate, stability). In a specific embodiment, the helmet system can use the confidence score to continuously evaluate the quality of the determined vergence distance or fixation point to determine whether further data is needed to improve the determination quality. For example, the helmet system can determine the vergence distance and fixation point based on body-based input alone and obtain a confidence score higher than the confidence threshold. In this case, no other data besides body-based input is needed. As another example, the helmet system can determine that the confidence score of the determined vergence distance or fixation point does not meet predetermined requirements, and that the helmet system needs further data (e.g., more body-based input, eye-tracking data, or content-based input) to improve the determination quality and confidence score.

[0148] When performance metrics fall below a first threshold, the eye-tracking system may perform poorly or fail to function. The helmet system can receive one or more first inputs associated with the body of the user wearing the helmet. The helmet system can determine the area the user is looking at within the field of view of the head-mounted display on the helmet. The area the user is looking at can be determined based on the received one or more first inputs associated with the user's body. The helmet system can compare the area the user is looking at with the positions of one or more objects in a scene displayed on the head-mounted display to determine which objects in the scene fall within that area. The helmet system can then determine the user's possible visual convergence distance or fixation point based on the one or more first inputs associated with the user's body, the area the user is looking at, and / or the displayed objects in the scene that fall within that area. Visual convergence distance can be the distance from the user's eyes to a virtual object, assuming the user is looking at a virtual object. Fixation point can be a point in the virtual space the user is looking at. The helmet system can adjust the position of the head-mounted display screen based on the determined user visual convergence distance. In specific embodiments, the helmet system can determine the vergence distance or fixation point based on one or more content-based inputs in addition to body-based inputs. In specific embodiments, the helmet system can determine the vergence distance or fixation point based on both body-based and content-based inputs. In specific embodiments, the helmet system can determine the vergence distance or fixation point based on eye-tracking data, body-based inputs, and content-based inputs. 
In specific embodiments, the helmet system can adjust one or more configurations of the head-mounted display based on the determined user vergence distance or fixation point to eliminate or improve vergence-accommodation conflict. The helmet system can configure the head-mounted display by presenting different images to the user, adjusting the position of the display screen, or adjusting optical blocks based on the determined user vergence distance or fixation point.
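The step of comparing the area the user is looking at with the positions of objects in the scene can be sketched as below. This is a minimal sketch under assumptions: objects are reduced to a single normalized position in the field of view plus a depth, the region is an axis-aligned rectangle, and the nearest object in the region is taken as the likely target. All names and numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SceneObject:
    name: str
    x: float        # normalized horizontal position in the field of view (0..1)
    y: float        # normalized vertical position in the field of view (0..1)
    depth_m: float  # distance from the user's eyes, in metres

# Region the user is estimated to be looking at: (x_min, y_min, x_max, y_max)
Region = Tuple[float, float, float, float]


def objects_in_region(objects: List[SceneObject],
                      region: Region) -> List[SceneObject]:
    """Return the objects whose position falls inside the region."""
    x0, y0, x1, y1 = region
    return [o for o in objects if x0 <= o.x <= x1 and y0 <= o.y <= y1]


def estimate_vergence_distance(objects: List[SceneObject],
                               region: Region) -> Optional[float]:
    """Estimate the vergence distance from the objects inside the region."""
    candidates = objects_in_region(objects, region)
    if not candidates:
        return None
    # Assumption: the nearest candidate is the most likely gaze target.
    return min(c.depth_m for c in candidates)


scene = [
    SceneObject("person", 0.3, 0.5, 2.0),
    SceneObject("house", 0.5, 0.5, 15.0),
    SceneObject("mountain", 0.5, 0.2, 1000.0),
]
distance = estimate_vergence_distance(scene, (0.2, 0.3, 0.6, 0.7))
```

Here the region contains the person and the house, and the sketch returns the person's depth because it is the nearer of the two.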

[0149] In specific embodiments, the first input associated with the user's body may include, but is not limited to, hand position, hand orientation, hand movement, gesture, head position, head gaze, head orientation, head movement, head posture, body posture, body pose, body movement, user behavior, or a weighted combination of one or more related parameters. In specific embodiments, body-based input may include the position, movement, or state of any other body part of the user besides the eyes. In specific embodiments, the helmet system may include one or more user input devices or sensing devices, including but not limited to controllers, one or more sensors, cameras, microphones, accelerometers, mobile devices, or other user input devices. The user input devices or sensing devices may be associated with the user wearing the helmet and may communicate with the helmet system via a wireless or wired connection. The user input devices or sensing devices may track the user's movement or state and send data to the helmet system. The helmet system may receive one or more first inputs associated with the user from one or more user input devices or sensing devices. The user input devices or sensing devices may be separate from the helmet or may be integrated into the helmet.

[0150] As an example, and not by way of limitation, a user wearing a helmet may hold the controller with one or both hands. The user may use the controller to select or interact with one or more objects in the field of view of the helmet's head-mounted display. These objects may be virtual objects presented by the head-mounted helmet, real-world objects captured by one or more cameras and displayed on the head-mounted display, or real-world objects seen by the user through an augmented reality helmet. The interaction between the user and the object may be tracked by the controller and transmitted to the helmet system. As another example, the user's hands holding the controller may move in three-dimensional space, and the controller may track motion (e.g., velocity, direction, acceleration, trajectory, pattern, angle, gesture, position, correlation or coordination between the two hands). As another example, the helmet may include one or more sensors to measure head orientation, gaze angle, head movement, head posture, etc. As another example, the helmet system may include one or more sensors mounted on the user's body, and these sensors may measure the user's body motion (e.g., velocity, direction, acceleration, trajectory, pattern, angle, posture, position, correlation or coordination between multiple body parts or multiple users), body posture, or body position. As another example, a helmet system may include cameras that monitor user behavior and movements. The cameras may be integrated into the helmet or mounted in the user's environment, communicating with the helmet via wireless or wired connections.

[0151] In a specific embodiment, when the helmet system identifies that the eye-tracking system is performing poorly (i.e., partially working but imperfect), the helmet system can continue to receive eye-tracking data from the eye-tracking system. The system can determine the user's visual convergence distance or fixation point based on the data from the eye-tracking system. The helmet system can receive one or more first inputs associated with the user's body. The helmet system can determine the user's visual convergence distance and fixation point based on the eye-tracking data and one or more first inputs associated with the user's body. In a specific embodiment, the helmet system can first use the eye-tracking data to determine the visual convergence distance or fixation point and determine that the confidence score is below a confidence threshold. Then, the helmet system can use body-based inputs to improve the quality (e.g., accuracy) of the determined visual convergence distance and fixation point, and increase the determined confidence score.

[0152] In a specific embodiment, when the performance of the eye-tracking system is identified as poor, the helmet system can continue to receive eye-tracking data from the eye-tracking system and determine the visual convergence distance and gaze point based on the eye-tracking data. The helmet system can also receive one or more first inputs associated with the user's body and one or more second inputs associated with one or more display elements of the displayed content of the scene presented by the head-mounted display. The helmet system can determine the user's visual convergence distance and gaze point based on a combination of eye-tracking data, one or more first inputs associated with the user's body, or one or more second inputs associated with the displayed content.

[0153] In specific embodiments, one or more second inputs associated with one or more display elements in the scene's display content may include, for example, but not limited to: Z-buffer values associated with display elements, display elements marked by the developer, image analysis results, the shape of display elements, facial recognition results, object recognition results, people identified in the display content, objects identified in the display content, the correlation of two or more display elements, or a weighted combination of one or more content-based inputs. In specific embodiments, content-based inputs may include one or more parameters generated by computer vision algorithms, including, for example, but not limited to, facial recognition, object recognition, machine learning, deep learning, background-foreground analysis, image analysis, and other computer vision algorithms. In specific embodiments, display elements associated with content-based inputs may be associated with virtual objects presented in virtual space via a head-mounted display or real-world objects in the field of view of a user wearing an augmented reality headset. Display elements may include, for example, but not limited to: objects (e.g., trees, buildings), computer-generated content (e.g., text, icons, graphics, illustrations), people, or background views.

[0154] Figure 7 shows an example scene 700 in the field of view of a user wearing an augmented reality headset. Scene 700 may include a person 702, a house 704, and a background mountain 706. In a specific embodiment, scene 700 may be a virtual reality scene presented by the headset in virtual space and in the field of view of the user wearing the headset. In a specific embodiment, scene 700 may be a real-world scene in the field of view of a user wearing an augmented reality headset. In a specific embodiment, the headset system may determine the user's vergence distance or fixation point with a confidence score (e.g., above a predetermined threshold) based on one or more second inputs associated with the displayed content of the scene (e.g., person 702, house 704, mountain 706). By way of example and not by limitation, the headset system may determine that person 702 has been marked as the focus of scene 700 at this time by the developer of the displayed content (e.g., virtual reality game or application), or implicitly marked as the focus by rendering person 702 in focus while the rest of the scene is out of focus. The helmet system can determine the Z-depth of the display screen based on the Z-buffer value associated with one or more pixels of person 702, and adjust the display screen to the determined Z-depth so that person 702 is in focus. As another example, game developers can directly program the helmet system to move the display screen, thereby directing the user to look at a part of the scene by making that part more sharply focused than others. As another example, the helmet system can use facial recognition to detect person 702 and use motion detection on the scene to determine that person 702 is moving toward the user.
The helmet can infer that person 702 should be the user's focus in the scene and determine the Z-depth of the screen based on the Z-buffer value associated with person 702's pixels, and move the display screen accordingly. As another example, the helmet system can determine that the user is looking at person 702 based on head gaze information and use the Z-buffer associated with person 702's pixels to determine the Z-depth of the display screen. As another example, the helmet system can determine (e.g., using object recognition) that there are two objects in the scene and determine that the user may be looking at the object closer to the user in visual space. As another example, a helmet system can determine that a user has already been chasing a virtual object in a previous scene (e.g., in a game) and that the user may still be looking at that virtual object in the current scene. As yet another example, a helmet system can predict the trajectory of a user's gaze based on the tracking direction and speed of the user's gaze point in a previous scene and determine the object the user is looking at.

[0155] In a specific embodiment, the helmet system can determine the Z-depth of the display screen based on a weighted average of Z-buffer values associated with pixels of multiple displayed contents (e.g., person 702, house 704, mountain 706), and move the display screen accordingly to allow the user a better view of the scene 700 as a whole. As an example, and not by way of limitation, the helmet system can use a weighted combination or average of Z-buffer values from 81 points (e.g., a 9×9 grid) of the scene to determine the Z-depth of the display screen, allowing areas associated with these points to become the user's focus. The helmet system can assign different weighted scores to different points based on the relative importance of corresponding areas in the scene.
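The weighted average over a 9×9 grid can be illustrated as follows. This is a sketch with made-up numbers: the Z-buffer samples are treated as depths in arbitrary linear units, the central 3×3 block is assumed to be more important and receives a higher weight, and the function name is hypothetical.

```python
from typing import Sequence


def weighted_z_depth(z_values: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Return the weighted average of sampled Z-buffer values."""
    if len(z_values) != len(weights):
        raise ValueError("z_values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(z * w for z, w in zip(z_values, weights)) / total_weight


GRID = 9  # 9 x 9 = 81 sample points, as in the example above

# Hypothetical scene: background at depth 10, a central object at depth 2.
z_grid = [[10.0] * GRID for _ in range(GRID)]
w_grid = [[1.0] * GRID for _ in range(GRID)]
for row in range(3, 6):
    for col in range(3, 6):
        z_grid[row][col] = 2.0
        w_grid[row][col] = 8.0  # assumed higher importance for the centre

flat_z = [z for row in z_grid for z in row]
flat_w = [w for row in w_grid for w in row]
depth = weighted_z_depth(flat_z, flat_w)  # 6.0 with these numbers
```

With uniform weights the same grid would average to about 9.1, so the centre weighting pulls the result toward the central object.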

[0156] In a specific embodiment, the helmet system can (e.g., based on body-based input or eye-tracking data) determine a specific area of the scene the user is looking at, and use computer vision algorithms (e.g., face recognition, object recognition, background-foreground segmentation) to identify objects displayed in that area, and further determine the Z-depth of the display screen based on Z-buffer values associated with the pixels of the identified objects in that area. As an example, and not by way of limitation, the helmet system can use body-based input to determine that the user is looking at the middle portion of scene 700. The helmet system can use object recognition to detect house 704 in the middle portion of scene 700, and use Z-buffer values associated with the pixels of house 704 to determine the Z-depth of the display screen, and move the display screen accordingly.

[0157] In a specific embodiment, when the eye-tracking system is identified as non-functional, the helmet system may receive one or more first inputs associated with the user's body and one or more second inputs associated with one or more display elements of the scene presented by the head-mounted display. The helmet system may determine the user's visual convergence distance and fixation point based at least on a combination of the one or more first inputs associated with the user's body and the one or more second inputs associated with the display content.

[0158] Figure 8A illustrates an example fusion algorithm 800A for determining the Z-depth of a display screen and a confidence score. The inputs to the fusion algorithm 800A may include, for example, but are not limited to: 3D eye-tracking data 802, ML-based eye-tracking data 804, head position 806, hand position 808, gaze angle 810, Z-buffer 812, developer-provided information, etc. The fusion algorithm 800A can weight all inputs to determine the appropriate Z-depth 830 and confidence score 832 for the display screen. The fusion algorithm 800A can continuously monitor all or some of the inputs and assign a weighted score to each input based on its quality (e.g., accuracy, precision, availability, data rate) or importance. In a specific embodiment, the fusion algorithm may assign a higher weighted score to a particular input when it has higher quality than other inputs. For example, when 3D eye-tracking data is available and relatively accurate, the fusion algorithm may assign a higher weighted score to the eye-tracking data than to other inputs. As another example, when the performance of a 3D eye-tracking system is poor, the fusion algorithm can increase the weighted score of other inputs and decrease the weighted score of the 3D eye-tracking data.
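A quality-weighted fusion of this kind can be sketched as below. The sketch is not the disclosed algorithm: each input is assumed to have already been reduced to a Z-depth estimate and a quality score between 0 and 1, and the confidence formula (the quality-weighted mean of the quality scores) is a placeholder chosen for illustration. The names `InputEstimate` and `fuse` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InputEstimate:
    source: str     # e.g. "eye_3d", "head", "z_buffer"
    z_depth: float  # Z-depth suggested by this input
    quality: float  # hypothetical quality score in [0, 1], used as the weight


def fuse(estimates: List[InputEstimate]) -> Tuple[float, float]:
    """Return (fused Z-depth, confidence) from quality-weighted inputs."""
    usable = [e for e in estimates if e.quality > 0]
    if not usable:
        raise ValueError("no usable inputs")
    total = sum(e.quality for e in usable)
    z_depth = sum(e.z_depth * e.quality for e in usable) / total
    # Placeholder confidence: quality-weighted mean of the quality scores.
    confidence = sum(e.quality * e.quality for e in usable) / total
    return z_depth, confidence


# Eye tracking working well: it dominates the fused result.
z_good, c_good = fuse([
    InputEstimate("eye_3d", 2.0, 0.9),
    InputEstimate("head", 2.4, 0.3),
    InputEstimate("z_buffer", 2.2, 0.3),
])

# Eye tracking degraded: its weight drops and the other inputs take over.
z_poor, c_poor = fuse([
    InputEstimate("eye_3d", 2.0, 0.1),
    InputEstimate("head", 2.4, 0.3),
    InputEstimate("z_buffer", 2.2, 0.3),
])
```

Lowering the quality score of the eye-tracking input shifts the fused depth toward the body-based and content-based estimates and lowers the placeholder confidence.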

[0159] Figure 8B shows an example fusion algorithm 800B that uses segmented comparisons on the inputs. For simplicity of description, only four inputs of the fusion algorithm 800B are shown in Figure 8B. However, the fusion algorithm 800B can include all possible inputs from eye-tracking data (e.g., 3D eye tracking, ML-based eye tracking), body-based inputs (e.g., head position, hand position, gaze angle), and content-based inputs (e.g., Z-buffer, face / object recognition, developer-provided information). In a specific embodiment, the fusion algorithm 800B can use segmented comparisons (e.g., 840, 841, 842, 843, 844, 845) to compare and analyze each pair of inputs to determine correlations between them. In a specific embodiment, the fusion algorithm can compare multiple inputs to determine their correlation. In a specific embodiment, the fusion algorithm can use multi-level comparisons and analyses to determine the correlation of inputs. The fusion algorithm 800B can determine the user's gaze location based on the correlation of the inputs, and from that determine the Z-depth of the display screen and a confidence score.
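Comparing each pair of four inputs yields six comparisons, which matches the six comparison elements 840 to 845. A minimal sketch of such pairwise comparisons follows, assuming each input has been reduced to a depth estimate and that two inputs "agree" when their estimates differ by no more than a tolerance. The agreement test, the tolerance, and all names and values are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_agreement(estimates: Dict[str, float],
                       tolerance: float) -> Dict[Tuple[str, str], bool]:
    """Compare every pair of inputs; True means the two estimates agree."""
    return {
        (a, b): abs(estimates[a] - estimates[b]) <= tolerance
        for a, b in combinations(sorted(estimates), 2)
    }


def support_counts(agreement: Dict[Tuple[str, str], bool]) -> Dict[str, int]:
    """Count, for each input, how many other inputs agree with it."""
    counts: Dict[str, int] = {}
    for (a, b), agrees in agreement.items():
        counts.setdefault(a, 0)
        counts.setdefault(b, 0)
        if agrees:
            counts[a] += 1
            counts[b] += 1
    return counts


# Hypothetical depth estimates (metres) from four inputs.
estimates = {"eye_3d": 2.0, "eye_ml": 2.1, "head": 2.05, "hand": 5.0}
agreement = pairwise_agreement(estimates, tolerance=0.2)
support = support_counts(agreement)
```

In this example the two eye-tracking inputs and the head input support one another, while the hand input agrees with none of them and could be down-weighted.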

[0160] As an example, and not by way of limitation, the fusion algorithm can determine that a user's hand is moving towards their head while holding a virtual object. The fusion algorithm can determine that the user is likely looking at the virtual object in their hand. Based on the virtual object in the user's hand, the fusion algorithm can determine the Z-depth of the display screen with a confidence score of 0.6 (i.e., 60% confidence). The fusion algorithm can further analyze the user's head gaze direction and angle, and determine that the user is looking at the area containing the moving hand. The fusion algorithm can then determine with a confidence score of 0.9 (i.e., 90% confidence) that the user is looking at the virtual object in their hand. The user's hand position can be tracked by the controller held by the user. The helmet system can dynamically actuate the zoom system based on the user's head-hand position to keep the virtual object in focus for the user.

[0161] As another example, the fusion algorithm can identify a person in the user's field of view who is moving towards the user at a certain speed. The fusion algorithm can determine that the user is looking at the moving person in the scene with a confidence score of 0.8 (i.e., 80% confidence). The fusion algorithm can further improve the confidence score to 0.9 (i.e., 90% confidence) using information that the user has been looking at the moving person in previous scenes. The fusion algorithm can determine the Z-distance based on the moving person. As another example, the fusion algorithm can determine the user's head movement, corresponding to the movement of a virtual object (e.g., head movement to look up and down or left and right in sync with the movement of the virtual object). The fusion algorithm can determine that the user is looking at the moving virtual object with a confidence score of 0.9 (i.e., 90% confidence).
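One way to test whether head movement is in sync with the movement of a virtual object is to correlate the two motion traces. The sketch below uses the Pearson correlation coefficient on sampled yaw angles. The choice of Pearson correlation, the function name, and the sample values are assumptions made for illustration; the text does not say how the correspondence is measured.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant trace carries no motion to correlate
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yaw angles (degrees) sampled at the same instants.
object_yaw = [0.0, 5.0, 10.0, 5.0, 0.0, -5.0]
head_yaw = [0.0, 4.0, 9.0, 5.0, 1.0, -4.0]
sync = pearson(object_yaw, head_yaw)
```

A coefficient close to 1 would be consistent with the head following the object and could feed into the confidence score. How the coefficient maps to a score such as 0.9 is not specified in the text.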

[0162] In a specific embodiment, the fusion algorithm can use a specific combination of inputs to determine the Z-depth and confidence score. The fusion algorithm can compare the determined confidence score with a confidence score threshold to determine if the combination provides a result that meets quality requirements. When the confidence score is higher than the threshold, the fusion algorithm can accept the Z-depth result. When the confidence score is lower than the threshold, the fusion algorithm can search for and try other combinations of inputs. In a specific embodiment, the fusion algorithm can use different combinations of inputs to determine the Z-depth and confidence score. The fusion algorithm can rank the combinations based on the determined confidence scores and select the combination with the highest confidence score. The fusion algorithm can construct an N-dimensional matrix for exhaustive segmental comparison of all inputs and determine the relevance that leads to the highest confidence score in the Z-depth result.
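Ranking input combinations by confidence can be sketched as an exhaustive search over all non-empty subsets of the inputs. The scoring function below is a toy stand-in invented for this example (mean reliability, plus a small bonus per extra input, minus a penalty for disagreement). It is not the confidence computation of the disclosure, and all names and values are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Each input: (Z-depth estimate, reliability in [0, 1]); hypothetical values.
Inputs = Dict[str, Tuple[float, float]]


def evaluate(combo: Tuple[str, ...], inputs: Inputs) -> Tuple[float, float]:
    """Toy scoring: return (Z-depth, confidence) for one input combination."""
    depths = [inputs[name][0] for name in combo]
    rels = [inputs[name][1] for name in combo]
    z_depth = sum(z * r for z, r in zip(depths, rels)) / sum(rels)
    spread = max(depths) - min(depths)
    confidence = sum(rels) / len(rels) + 0.1 * (len(combo) - 1) - 0.1 * spread
    return z_depth, max(0.0, min(1.0, confidence))


def rank_combinations(inputs: Inputs) -> List[Tuple[float, Tuple[str, ...], float]]:
    """Evaluate every non-empty combination, highest confidence first."""
    names = list(inputs)
    results = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            z_depth, confidence = evaluate(combo, inputs)
            results.append((confidence, combo, z_depth))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


inputs = {"head": (2.0, 0.6), "hand": (2.1, 0.7), "z_buffer": (6.0, 0.5)}
ranking = rank_combinations(inputs)
best_confidence, best_combo, best_z = ranking[0]
CONFIDENCE_THRESHOLD = 0.7  # hypothetical threshold
accepted = best_confidence >= CONFIDENCE_THRESHOLD
```

With these numbers the head and hand inputs agree and form the top-ranked combination, while any combination containing the outlying Z-buffer estimate scores lower. The number of subsets doubles with each added input, so exhaustive search only suits a small number of inputs.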

[0163] In a specific embodiment, the fusion algorithm may include a machine learning or deep learning algorithm to determine a combination of inputs. The machine learning model can be trained using data from different inputs and determine which input combination leads to the highest confidence score for determining the Z-depth. In a specific embodiment, the machine learning algorithm may further determine the Z-depth and confidence score based on the selected input combination. In a specific embodiment, the fusion algorithm may evaluate each input in parallel to accelerate the computation process.

[0164] Figure 9 illustrates an example method 900 for determining a user's visual convergence distance based on a combination of inputs. In step 910, the helmet system may use a 3D eye-tracking system to track the user's visual convergence. In step 920, while tracking the user's visual convergence, the helmet system may evaluate the performance of the eye-tracking system (e.g., good, poor, non-functional). In a specific embodiment, the helmet system may continuously evaluate the eye-tracking system performance at a predetermined or adaptively determined frequency. The helmet system may calculate one or more performance metrics (e.g., accuracy, precision, data availability) based on data from the eye-tracking system. In a specific embodiment, the helmet system may use a specific performance metric, multiple performance metrics, or a weighted combination of multiple performance metrics to evaluate the performance of the eye-tracking system. In step 930, the helmet system may compare the performance metric to a first threshold. When the performance metric is above the first threshold, the eye-tracking system's performance can be considered good. The helmet system can continue to use the eye-tracking system to track the user's visual convergence with very high accuracy and precision, without requiring additional data. When the performance metric is below the first threshold, the eye-tracking system's performance may be poor or the system may be non-functional. In step 940, the helmet system can compare the performance metric with a second threshold.

[0165] The performance of the eye-tracking system may be considered poor when the performance metric is above a second threshold and below a first threshold. In this case, the eye-tracking system may be partially working, but with some malfunctions. In step 950, the helmet system may receive eye-tracking data, body-based input, or content-based input, and use a fusion algorithm to determine a combination of inputs (e.g., based on the availability and quality of the inputs or an obtained confidence score). In step 952, the helmet system may estimate the region within the field of view of the head-mounted display that the user is looking at based on the combination of inputs. In step 954, the fusion algorithm may weight all inputs and determine the user's vergence distance or fixation point based at least on a combination of the received inputs, the estimated region the user is looking at, and the positions of one or more objects in the scene displayed on the head-mounted display. The helmet system may use the fusion algorithm to determine the Z-depth of the display screen (and a confidence score) based on the user's vergence distance or fixation point. In a specific embodiment, the combination of inputs may include one or more body-based inputs. In a specific embodiment, the combination of inputs may include one or more content-based inputs. In a specific embodiment, the combination of inputs may include one or more inputs of eye-tracking data, body-based inputs, and content-based inputs.

[0166] When the performance metric falls below a second threshold, the eye-tracking system is inoperable, and eye-tracking data is unavailable. In step 960, the helmet system may receive body-based or content-based input and use a fusion algorithm to determine a combination of inputs. In step 962, the helmet system may estimate the region within the field of view of the head-mounted display that the user is looking at based on the combination of inputs. In step 964, the fusion algorithm may weight all inputs and determine the user's vergence distance or fixation point based at least on a combination of the received inputs, the estimated region the user is looking at, and the positions of one or more objects in the scene displayed on the head-mounted display. The helmet system may use the fusion algorithm to determine the Z-depth of the display screen (and a confidence score) based on the user's vergence distance or fixation point. In a specific embodiment, the combination of inputs may include one or more body-based inputs. In a specific embodiment, the combination of inputs may include one or more content-based inputs. In a specific embodiment, the combination of inputs may include one or more of body-based and content-based inputs.
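
By way of example, and not by way of limitation, the weighting performed in steps 954 and 964 might be sketched as a confidence-weighted average of per-source distance estimates. This disclosure does not specify the fusion algorithm; the `DepthEstimate` type, the `select_inputs` and `fuse` functions, and the use of a simple weighted mean are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthEstimate:
    source: str        # "eye", "body", or "content" (hypothetical labels)
    distance_m: float  # estimated vergence distance in meters
    confidence: float  # assumed to lie in [0, 1]


def select_inputs(estimates, tracker_nonfunctional):
    """Drop eye-tracking estimates when the tracker provides no usable data."""
    if tracker_nonfunctional:
        return [e for e in estimates if e.source != "eye"]
    return list(estimates)


def fuse(estimates):
    """Return (distance_m, confidence), or None if nothing is usable.

    The fused distance is the confidence-weighted mean of the inputs; the
    fused confidence is the plain mean of the input confidences.
    """
    usable = [e for e in estimates if e.confidence > 0 and e.distance_m > 0]
    if not usable:
        return None
    total = sum(e.confidence for e in usable)
    distance = sum(e.distance_m * e.confidence for e in usable) / total
    return distance, total / len(usable)


# Hypothetical estimates, chosen only for illustration.
estimates = [
    DepthEstimate("eye", 1.0, 0.2),      # degraded eye tracking
    DepthEstimate("body", 0.5, 0.6),     # e.g., hand holding a virtual object
    DepthEstimate("content", 0.6, 0.2),  # e.g., Z-buffer value of a nearby object
]
poor_case = fuse(select_inputs(estimates, tracker_nonfunctional=False))
failed_case = fuse(select_inputs(estimates, tracker_nonfunctional=True))
```

With these hypothetical values, the poor-performance case fuses all three sources, while the nonfunctional case fuses only the body-based and content-based estimates.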

[0167] In step 970, the helmet system can reconfigure the head-mounted display based on the determined vergence distance or fixation point of the user to eliminate or reduce vergence-accommodation conflict. The helmet system can adjust the configuration of the head-mounted display by presenting different images to the user, adjusting the position of the display screen, or adjusting the position of the optical blocks based on the determined vergence distance or fixation point of the user. Where appropriate, specific embodiments can repeat one or more steps of the method of Figure 9. Although this disclosure describes and shows the specific steps of the method of Figure 9 as occurring in a specific order, this disclosure contemplates any appropriate steps of the method of Figure 9 occurring in any suitable order. Furthermore, although this disclosure describes and illustrates an example method for determining a user's visual convergence distance based on a combination of inputs that includes the specific steps of the method of Figure 9, this disclosure contemplates any suitable method for determining a user's visual convergence distance based on a combination of inputs that includes any suitable steps, which may include all, some, or none of the steps of the method of Figure 9, where appropriate. Furthermore, although this disclosure describes and illustrates specific components, apparatus, or systems carrying out specific steps of the method of Figure 9, this disclosure contemplates any appropriate combination of any appropriate components, devices, or systems carrying out any appropriate steps of the method of Figure 9.
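
By way of example, and not by way of limitation, one way to reason about the adjustment of step 970 is to express distances in diopters (the reciprocal of the distance in meters) and to select, from a set of focal distances the display can reach, the one that leaves the smallest mismatch with the determined vergence distance. This disclosure does not state that the adjustment is computed this way; the discrete set of focal distances and the function names below are hypothetical.

```python
def diopters(distance_m):
    """Convert a distance in meters to diopters (1 / meters)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance_m


def residual_conflict(vergence_distance_m, focal_distance_m):
    """Mismatch, in diopters, between vergence distance and focal distance."""
    return abs(diopters(vergence_distance_m) - diopters(focal_distance_m))


def pick_focal_distance(vergence_distance_m, reachable_focal_distances_m):
    """Choose the reachable focal distance with the smallest residual conflict."""
    return min(
        reachable_focal_distances_m,
        key=lambda d: residual_conflict(vergence_distance_m, d),
    )


# Hypothetical focal distances the display or optical block can be moved to.
reachable = [0.25, 0.5, 1.0, 2.0]
chosen = pick_focal_distance(0.6, reachable)
```

Comparing in diopters rather than meters reflects that a given error in meters matters more at near distances than at far ones; a design that moves the display screen continuously could use the same mismatch measure without the discrete selection.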

[0168] Figure 10 illustrates an example computer system 1000. In specific embodiments, one or more computer systems 1000 perform one or more steps of one or more methods described or shown herein. In specific embodiments, one or more computer systems 1000 provide the functionality described or shown herein. In specific embodiments, software running on one or more computer systems 1000 performs one or more steps of one or more methods described or shown herein, or provides the functionality described or shown herein. Specific embodiments include one or more portions of one or more computer systems 1000. Throughout this document, references to computer systems may include computing devices and vice versa, where appropriate. Furthermore, references to computer systems may include one or more computer systems, where appropriate.

[0169] This disclosure contemplates any suitable number of computer systems 1000. This disclosure contemplates computer systems 1000 employing any suitable physical form. By way of example, and not by limitation, computer systems 1000 may be embedded computer systems, system-on-a-chip (SOC), single-board computer systems (SBC) (e.g., computer modules (COM) or system modules (SOM)), desktop computer systems, laptop or notebook computer systems, interactive self-service machines, mainframes, grids of computer systems, mobile phones, personal digital assistants (PDAs), servers, tablet systems, augmented / virtual reality devices, or combinations of two or more of these. Where appropriate, computer systems 1000 may include one or more computer systems 1000; may be singular or distributed; span multiple locations; span multiple machines; or be located in the cloud, which may include one or more cloud elements in one or more networks. Where appropriate, one or more computer systems 1000 may perform one or more steps of one or more methods described or shown herein without significant space or time limitations. By way of example, and not by way of limitation, one or more computer systems 1000 may execute one or more steps of one or more methods described or shown herein in real time or in batches. Where appropriate, one or more computer systems 1000 may execute one or more steps of one or more methods described or shown herein at different times or at different locations.

[0170] In a specific embodiment, computer system 1000 includes a processor 1002, a memory 1004, a storage 1006, an input/output (I/O) interface 1008, a communication interface 1010, and a bus 1012. While this disclosure describes and illustrates a particular computer system having a particular number of particular elements in a particular setup, this disclosure contemplates any suitable computer system having any suitable number of any suitable elements in any suitable setup.

[0171] In a specific embodiment, processor 1002 includes hardware for executing instructions, such as those that constitute a computer program. By way of example, and not by way of limitation, to execute instructions, processor 1002 may retrieve (or fetch) the instructions from internal registers, internal caches, memory 1004, or storage 1006; decode and execute those instructions; and then write one or more results to internal registers, internal caches, memory 1004, or storage 1006. In a specific embodiment, processor 1002 may include one or more internal caches for data, instructions, or addresses. This disclosure contemplates that processor 1002 may include any suitable number of suitable internal caches where appropriate. By way of example, and not by way of limitation, processor 1002 may include one or more instruction caches, one or more data caches, and one or more translation lookaside buffers (TLBs). Instructions in the instruction cache may be copies of instructions in memory 1004 or storage 1006, and the instruction cache may accelerate the retrieval of those instructions by processor 1002. The data in the data cache may be a copy of data in memory 1004 or storage 1006 on which instructions executing on processor 1002 operate; the results of previous instructions executed on processor 1002, held for access by subsequent instructions executing on processor 1002 or for writing to memory 1004 or storage 1006; or other suitable data. The data cache can accelerate read or write operations by processor 1002. The TLB can accelerate virtual address translation for processor 1002. In a specific embodiment, processor 1002 may include one or more internal registers for data, instructions, or addresses. Where appropriate, this disclosure contemplates that processor 1002 may include any suitable number of suitable internal registers. Where appropriate, processor 1002 may include one or more arithmetic logic units (ALUs); be a multi-core processor; or include one or more processors 1002. While this disclosure describes and shows particular processors, this disclosure contemplates any suitable processor.

[0172] In a specific embodiment, memory 1004 includes main memory for storing instructions to be executed by processor 1002 or data for processor 1002 to operate on. As an example, and not by way of limitation, computer system 1000 may load instructions from storage 1006 or another source (e.g., another computer system 1000) into memory 1004. Processor 1002 may then load the instructions from memory 1004 into internal registers or an internal cache. To execute these instructions, processor 1002 may retrieve the instructions from the internal registers or internal cache and decode them. During or after instruction execution, processor 1002 may write one or more results (which may be intermediate or final) into the internal registers or internal cache. Processor 1002 may then write one or more of these results into memory 1004. In a specific embodiment, processor 1002 executes only instructions within one or more internal registers or internal caches, or within memory 1004 (as opposed to storage 1006 or elsewhere), and operates only on data within one or more internal registers or internal caches, or within memory 1004 (as opposed to storage 1006 or elsewhere). One or more memory buses (which may include address and data buses) couple processor 1002 to memory 1004. As described below, bus 1012 may include one or more memory buses. In a specific embodiment, one or more memory management units (MMUs) are located between processor 1002 and memory 1004 and facilitate access to memory 1004 requested by processor 1002. In a specific embodiment, memory 1004 includes random access memory (RAM). Where appropriate, the RAM may be volatile memory. Where appropriate, the RAM may be dynamic RAM (DRAM) or static RAM (SRAM). Moreover, where appropriate, the RAM may be single-port or multi-port RAM. This disclosure contemplates any suitable RAM. Where appropriate, memory 1004 may include one or more memories 1004. While this disclosure describes and shows specific memories, it contemplates any suitable memory.

[0173] In a specific embodiment, storage 1006 includes mass storage for data or instructions. By way of example and not limitation, storage 1006 may include a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, storage 1006 may include removable or fixed (non-removable) media. Where appropriate, storage 1006 may be located internally or externally to computer system 1000. In a specific embodiment, storage 1006 is non-volatile solid-state memory. In a specific embodiment, storage 1006 includes read-only memory (ROM). Where appropriate, the ROM may be a mask-programmed ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), an electrically alterable ROM (EAROM), or flash memory, or a combination of two or more of these. This disclosure contemplates mass storage 1006 in any suitable physical form. Where appropriate, storage 1006 may include one or more storage control units that facilitate communication between processor 1002 and storage 1006. Where appropriate, storage 1006 may include one or more storages 1006. While this disclosure describes and shows particular storage, this disclosure contemplates any suitable storage.

[0174] In a specific embodiment, the I/O interface 1008 includes hardware and/or software that provides one or more interfaces for communication between the computer system 1000 and one or more I/O devices. Where appropriate, the computer system 1000 may include one or more of these I/O devices. One or more of these I/O devices may enable communication between a person and the computer system 1000. By way of example, and not by way of limitation, I/O devices may include a keyboard, buttons, a microphone, a display, a mouse, a printer, a scanner, a speaker, a still camera, a stylus, a tablet computer, a touchscreen, a trackball, a camera, another suitable I/O device, or a combination of two or more of these. I/O devices may include one or more sensors. This disclosure contemplates any suitable I/O device and any suitable I/O interface 1008 for such I/O devices. Where appropriate, the I/O interface 1008 may include one or more devices or software drivers that enable the processor 1002 to drive one or more of these I/O devices. Where appropriate, I/O interface 1008 may include one or more I/O interfaces 1008. Although this disclosure describes and shows specific I/O interfaces, this disclosure contemplates any suitable I/O interface.

[0175] In a specific embodiment, the communication interface 1010 includes hardware and/or software that provides one or more interfaces for communication (e.g., packet-based communication) between the computer system 1000 and one or more other computer systems 1000 or one or more networks. By way of example, and not by way of limitation, the communication interface 1010 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wired network, or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network (e.g., a Wi-Fi network). This disclosure contemplates any suitable network and any suitable communication interface 1010 for that network. By way of example, and not by way of limitation, the computer system 1000 may communicate with one or more portions of an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or the Internet, or a combination of two or more of these. One or more portions of one or more of these networks may be wired or wireless. As an example, computer system 1000 may communicate with a wireless PAN (WPAN) (e.g., a BLUETOOTH WPAN), a Wi-Fi network, a WiMAX network, a cellular telephone network (e.g., a Global System for Mobile Communications (GSM) network), or another suitable wireless network, or a combination of two or more of these. Where appropriate, computer system 1000 may include any suitable communication interface 1010 for any of these networks. Where appropriate, communication interface 1010 may include one or more communication interfaces 1010. While this disclosure describes and shows specific communication interfaces, this disclosure contemplates any suitable communication interface.

[0176] In a specific embodiment, bus 1012 includes hardware and/or software that couples components of computer system 1000 to each other. By way of example, and not by way of limitation, bus 1012 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), or another suitable bus, or a combination of two or more of these. Where appropriate, bus 1012 may include one or more buses 1012. While this disclosure describes and shows specific buses, this disclosure contemplates any suitable bus or interconnect.

[0177] In this document, references to computer-readable non-transitory storage media, where appropriate, may include semiconductor-based or other integrated circuits (ICs) (e.g., field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs)), hard disk drives (HDDs), hybrid hard disk drives (HHDs), optical disks, optical disk drives (ODDs), magneto-optical disks, magneto-optical drives, floppy disks, floppy disk drives (FDDs), magnetic tape, solid-state drives (SSDs), RAM drives, secure digital cards or drives, another suitable computer-readable non-transitory storage medium, or a suitable combination of two or more of these. Where appropriate, computer-readable non-transitory storage media may be volatile, non-volatile, or a combination of volatile and non-volatile.

[0178] In this document, unless otherwise expressly stated or indicated by context, "or" is inclusive rather than exclusive. Therefore, in this document, unless otherwise expressly stated or indicated by context, "A or B" means "A and/or B". Furthermore, unless otherwise expressly stated or indicated by context, "and" is both joint and several. Therefore, in this document, unless otherwise expressly stated or indicated by context, "A and B" means "A and B, jointly or severally".

[0179] The scope of this disclosure includes all changes, substitutions, variations, alterations, and modifications to the exemplary embodiments described or shown herein that would be understood by those skilled in the art. The scope of this disclosure is not limited to the exemplary embodiments described or shown herein. Furthermore, while this disclosure describes and illustrates corresponding embodiments including specific elements, components, features, functions, operations, or steps, any of these embodiments may include any combination or arrangement of any element, component, feature, function, operation, or step described or shown anywhere herein that would be understood by those skilled in the art. Moreover, a reference in the appended claims to a device, system, or element being suitable for, set up to, capable of, configured to, enabled to, usable to, or effective in performing a particular function encompasses that device, system, or element whether or not it or that particular function is activated, turned on, or enabled, as long as that device, system, or element is so suitable, set up, capable, configured, enabled, usable, or effective. Additionally, although this disclosure describes or illustrates specific embodiments as providing particular advantages, specific embodiments may provide none, some, or all of these advantages.

Claims

1. A method comprising the following steps performed by a computing system: determining that a performance metric of an eye-tracking system is below a performance threshold, wherein the eye-tracking system is associated with a head-mounted display worn by a user; in response to determining that the performance metric is below the performance threshold, identifying one or more pieces of content displayed by the head-mounted display; accessing, based on the one or more identified pieces of content, one or more attributes of the one or more pieces of content; predicting a visual convergence distance of the user based at least on the one or more attributes associated with the one or more pieces of displayed content; and adjusting one or more configurations associated with the head-mounted display based on the predicted visual convergence distance of the user.

2. The method according to claim 1, further comprising: determining that the performance metric of the eye-tracking system is below a second performance threshold; and receiving one or more inputs associated with the user's body, wherein the user's visual convergence distance is predicted based at least on the one or more inputs associated with the user's body and the one or more attributes of the one or more pieces of content displayed by the head-mounted display.

3. The method according to claim 2, wherein determining that the performance metric of the eye-tracking system is below the second performance threshold includes: determining that the eye-tracking system does not exist or is unable to provide eye-tracking data.

4. The method according to claim 2, further comprising: estimating, based on the one or more received inputs associated with the user's body, a region within the field of view of the head-mounted display that the user is looking at, wherein the user's visual convergence distance is predicted based at least on the estimated region within the field of view of the head-mounted display that the user is looking at and the position of the one or more pieces of content displayed by the head-mounted display.

5. The method according to claim 1, further comprising: determining that the performance metric of the eye-tracking system is higher than a second performance threshold; and receiving eye-tracking data from the eye-tracking system, wherein the user's visual convergence distance is predicted based at least on the eye-tracking data, one or more inputs associated with the user's body, and the one or more attributes associated with the one or more pieces of content displayed by the head-mounted display.

6. The method according to claim 5, wherein the one or more inputs associated with the user's body include one or more of the following: hand position, hand orientation, hand movement, gesture, head position, head orientation, head movement, head posture, gaze angle, body posture, body pose, body movement, the user's behavior, or a weighted combination of one or more related parameters.

7. The method according to claim 6, wherein the one or more inputs associated with the user's body are received from one or more of the following: a controller, a sensor, a camera, a microphone, an accelerometer, a helmet worn by the user, or a mobile device.

8. The method according to claim 1, further comprising: feeding a first combination of inputs into a fusion algorithm to predict the user's visual convergence distance, wherein the first combination of inputs includes: one or more inputs associated with the user's body, one or more attributes of the one or more pieces of displayed content, or eye-tracking data from the eye-tracking system.

9. The method according to claim 8, further comprising: determining, by the fusion algorithm, a first confidence score based on the predicted visual convergence distance of the user; in response to determining that the first confidence score is lower than a confidence threshold, feeding a second combination of inputs into the fusion algorithm, wherein the second combination of inputs is different from the first combination of inputs; and determining a new visual convergence distance of the user based on the second combination of inputs, wherein the new visual convergence distance is associated with a second confidence score that is higher than the first confidence score.

10. The method according to claim 9, further comprising: determining, using the fusion algorithm, a Z-depth of a display screen of the head-mounted display based on the first combination of inputs, wherein adjusting the one or more configurations includes: adjusting the position of the display screen of the head-mounted display based on the determined Z-depth of the display screen.

11. The method according to claim 10, wherein the fusion algorithm determines the Z-depth, the first confidence score, and the second confidence score by segmented comparison of two or more inputs in the first combination of inputs.

12. The method according to claim 10, wherein the Z-depth, the first confidence score, and the second confidence score are determined based on the correlation between two or more inputs in the first combination of inputs.

13. The method according to claim 8, wherein the fusion algorithm includes a machine learning (ML) algorithm, and wherein the first combination of inputs fed to the fusion algorithm is determined by the machine learning algorithm.

14. The method according to claim 1, wherein the one or more attributes of the displayed content include one or more of the following: a Z-buffer value associated with a display element, a display element marked by a developer, image analysis results, the shape of a display element, face recognition results, object recognition results, people identified in the displayed content, objects identified in the displayed content, correlation between two or more display elements, or a weighted combination of one or more second inputs.

15. The method according to claim 1, wherein the performance metric of the eye-tracking system includes one or more of the following: accuracy of parameters from the eye-tracking system, precision of parameters from the eye-tracking system, parameter values from the eye-tracking system, pupil detectability, a metric based on one or more parameters associated with the user, parameter variation, parameter variation trend, data availability, or a weighted combination of one or more performance parameters.

16. The method according to claim 15, wherein the one or more parameters associated with the user include one or more of the following: the user's eye distance, pupil position, pupil state, correlation between the user's two pupils, the user's head size, the position of the helmet worn by the user, the angle at which the user wears the helmet, the direction in which the user wears the helmet, the alignment of the user's eyes, or a weighted combination of one or more related parameters associated with the user.

17. The method according to claim 1, wherein the performance threshold is associated with one or more of the following: a predetermined value, a predetermined range, the state of the data, the rate of change of the data, or the trend of data change.

18. The method according to claim 1, wherein the one or more configurations associated with the head-mounted display are associated with one or more of the following: the rendered image, the position of the display screen, or the position of the optical block.

19. One or more computer-readable non-transitory storage media comprising software that, when executed, is operable to: determine that a performance metric of an eye-tracking system is below a performance threshold, wherein the eye-tracking system is associated with a head-mounted display worn by a user; in response to determining that the performance metric is below the performance threshold, identify one or more pieces of content displayed by the head-mounted display; access, based on the one or more identified pieces of content, one or more attributes of the one or more pieces of content; predict a visual convergence distance of the user based at least on the one or more attributes associated with the one or more pieces of displayed content; and adjust one or more configurations associated with the head-mounted display based on the predicted visual convergence distance of the user.

20. A system comprising: one or more non-transitory computer-readable storage media containing instructions; and one or more processors coupled to the storage media and operable to execute the instructions to perform the following steps: determining that a performance metric of an eye-tracking system is below a performance threshold, wherein the eye-tracking system is associated with a head-mounted display worn by a user; in response to determining that the performance metric is below the performance threshold, identifying one or more pieces of content displayed by the head-mounted display; accessing, based on the one or more identified pieces of content, one or more attributes of the one or more pieces of content; predicting a visual convergence distance of the user based at least on the one or more attributes associated with the one or more pieces of displayed content; and adjusting one or more configurations associated with the head-mounted display based on the predicted visual convergence distance of the user.

21. A method comprising the following steps performed by a computing system: determining a visual convergence of a user of an eye-tracking system associated with a head-mounted display worn by the user; applying, by the computing system, a fusion algorithm to determine where the user may be looking, wherein the fusion algorithm weights inputs from eye-tracking-based methods, body-based methods, and content-based methods; when the helmet detects that the user's hand has interacted with a virtual object, inferring, by the fusion algorithm, that the user is looking at the virtual object in the user's hand; and after inferring that the user is looking at the virtual object, adjusting the configuration accordingly to eliminate or mitigate the negative impact caused by visual convergence-accommodation conflict.

22. The method according to claim 21, further comprising: determining, by the computing system, an appropriate Z-depth for the display screen.

23. One or more computer-readable non-transitory storage media comprising software that, when executed, is operable to: determine a visual convergence of a user of an eye-tracking system associated with a head-mounted display worn by the user; apply a fusion algorithm to determine where the user may be looking, wherein the fusion algorithm weights inputs from eye-tracking-based methods, body-based methods, and content-based methods; when the helmet detects that the user's hand has interacted with a virtual object, infer, by the fusion algorithm, that the user is looking at the virtual object in the user's hand; and after inferring that the user is looking at the virtual object, adjust the configuration accordingly to eliminate or mitigate the negative impact caused by visual convergence-accommodation conflict.

24. The one or more computer-readable non-transitory storage media according to claim 23, wherein the software is further operable to: determine an appropriate Z-depth for the display screen.

25. A system comprising: one or more non-transitory computer-readable storage media containing instructions; and one or more processors coupled to the storage media and operable to execute the instructions to perform the following steps: determining a visual convergence of a user of an eye-tracking system associated with a head-mounted display worn by the user; applying a fusion algorithm to determine where the user may be looking, wherein the fusion algorithm weights inputs from eye-tracking-based methods, body-based methods, and content-based methods; when the helmet detects that the user's hand has interacted with a virtual object, inferring, by the fusion algorithm, that the user is looking at the virtual object in the user's hand; and after inferring that the user is looking at the virtual object, adjusting the configuration accordingly to eliminate or mitigate the negative impact caused by visual convergence-accommodation conflict.

26. The system according to claim 25, wherein the instructions further include instructions for performing the following step: determining an appropriate Z-depth for the display screen.