Method, system, and apparatus for identifying the point of gaze in eye movements in a three-dimensional environment

The 3D gaze cone model with dynamic weight assignment and error correction enhances gaze point identification accuracy in complex 3D environments, addressing limitations of conventional 2D trackers.

JP7876090B1 · Active · Publication Date: 2026-06-18 · 南航科技(広東横琴)有限公司

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: 南航科技(広東横琴)有限公司
Filing Date: 2026-04-07
Publication Date: 2026-06-18

Application Information

IPC: A61B3/113

AI Tagging

Application Domain

Eye diagnostics

Technology Topics

Pattern recognition; Computer graphics (images)

AI Technical Summary

Technical Problem

Conventional eye trackers are limited to 2D space representation, lacking adaptability to 3D environments and lacking error correction mechanisms, leading to inaccurate gaze point identification in complex scenarios.

Method used

A method and system that constructs a 3D gaze cone model based on eye position and direction, assigns dynamic weights to scene objects, calculates geometric intersections, and uses a time-series analysis to correct errors, enhancing accuracy and adaptability in 3D spaces.

Benefits of technology

Enables high-precision gaze point identification in 3D environments by correcting errors and improving accuracy in dynamic scenes, reducing false positives and negatives.

✦ Generated by Eureka AI based on patent content.

Abstract

[Problem] This invention belongs to the field of gaze point identification and, more specifically, relates to a method, system, and device for identifying the gaze point of eye movements in a three-dimensional environment. It aims to solve two problems of the prior art: insufficient adaptability to three-dimensional space under the constraints of a two-dimensional plane, and the lack of an error correction mechanism. [Solution] The present invention constructs a 3D gaze cone model of the user and a dynamic scene model with semantic labels, selects candidate objects based on the intersection of their geometric bounding boxes with the gaze cone, calculates an object score by combining the intersection volume ratio with dynamic weights, processes the resulting score time series with a time-series analysis model, and finally identifies the target object that the user is gazing at. By fusing spatial geometric relationships, semantic weights, and temporal continuity analysis, the present invention achieves high-precision positioning and error correction of gaze points in a 3D scene.

Description

Technical Field

[0001] The present invention belongs to the field of fixation point identification, and specifically relates to a method, system and device for identifying a fixation point of eye movement in a three-dimensional environment.

Background Art

[0002] During flight and training, a pilot must attend to the main panel, overhead panel, center console, and the environment outside the front and side windshields, but a conventional eye tracker cannot cover all of these areas. At the same time, evaluating the pilot's attention allocation is an essential part of flight training. In recent years, eye-tracking technology has been widely applied in fields such as human-computer interaction, psychological research, and virtual reality. Current mainstream eye trackers detect the fixation point using an infrared light source and image analysis: the eyes are illuminated with infrared light, a high-speed camera captures reflection images of the sclera, iris, and pupil, feature points such as the pupil center and corneal reflection point are extracted, and the gaze direction is determined by calculating the eye movement vector. To achieve spatial positioning, the prior art usually defines a virtual display coordinate system with its origin at the upper left corner of the screen (the stimulus display area). The X-axis extends horizontally to the right; the Y-axis extends vertically downward, so Y values are smaller toward the top of the screen and larger toward the bottom; and the Z-axis is perpendicular to the screen plane and points toward the subject, so the Z value of any screen position is always zero. This coordinate system maps the point where the line of sight lands onto a two-dimensional plane and, combined with the eye position and gaze direction parameters, supports basic interaction functions.

[0003] However, the above technical solution has obvious limitations.

[0004] Lack of adaptability to 3D space under the constraints of a 2D plane: Conventional virtual display coordinate systems are limited to representing only the point of gaze within a 2D plane and cannot accurately identify the depth information (Z-axis direction) of the object of gaze in a true 3D environment (e.g., virtual reality scenes, multi-screen interaction systems, or spatial projection interfaces). As a result, the identification of the point of gaze is limited to projection onto the screen surface, making it difficult to meet the requirements of scenes such as augmented reality and 3D modeling. Therefore, in the current 3D simulation training environments based on panoramic screens, cylindrical screens, and LED displays of simulators, there is a lack of a method to identify the pilot's point of gaze in the training scene. This makes it impossible to evaluate the pilot's attention allocation during the training process, making it difficult to detect problems in the pilot's attention allocation and degrading the quality of training.

[0005] Lack of error correction mechanisms: Limited by hardware precision (e.g., scattering of the infrared light source, camera resolution, ambient light interference) and algorithmic robustness (e.g., pupil ellipse-fitting deviation, corneal reflection point drift), conventional technologies lack a dynamic error compensation mechanism. Fixation point positioning errors therefore accumulate, especially during prolonged use or in complex conditions (e.g., slight head movements, a user wearing glasses), degrading interaction accuracy and user experience.

[0006] Based on these considerations, the present invention proposes a method, system, and apparatus for identifying the point of gaze in eye movements in a three-dimensional environment.

Summary of the Invention

Problems to Be Solved by the Invention

[0007] To address the aforementioned problems in the prior art, namely the lack of adaptability to three-dimensional space under the constraints of a two-dimensional plane and the absence of an error correction mechanism, the present invention provides a method, system, and apparatus for identifying the point of fixation of eye movements in a three-dimensional environment.

Means for Solving the Problems

[0008] A first aspect of the present invention proposes a method for identifying the point of gaze in eye movements in a three-dimensional environment, the method including: a step of constructing a 3D gaze cone model based on the user's eye position and gaze direction vector; a step of constructing a dynamic 3D scene model that includes semantic labels, and assigning to each object in the scene a dynamic weight that reflects the object's probability of being gazed at within the scene; a step of obtaining the geometric bounding box of each object in the 3D scene model, calculating the intersection of the geometric bounding box and the 3D gaze cone model, and, if at least one vertex of the object's geometric bounding box is located within the 3D gaze cone model, adding that object to the candidate gaze target set; and a step of calculating the common volume between each candidate object and the 3D gaze cone model, generating a weight coefficient based on the ratio of the common volume to the target volume, combining the weight coefficient with the dynamic weight of the object to obtain a final score, processing the score sequence within a time window using a time-series analysis model, and outputting the target object that the user is gazing at.

[0009] Furthermore, the method for constructing the three-dimensional gaze cone model includes: a step of obtaining the user's eye position as the vertex of the three-dimensional gaze cone model; a step of obtaining the gaze direction vector as the central axis of the model; a step of determining the half-opening angle of the model according to the measurement accuracy of the eye tracker and the gaze diffusion angle; a step of calculating the gaze distance of the model in three-dimensional space, either from binocular disparity or by a preset distance extraction method; and a step of constructing the three-dimensional gaze cone model from the vertex, central axis, half-opening angle, and gaze distance.

[0010] Furthermore, the semantic label includes at least one of the following: airport buildings and equipment, environmental elements, and urban elements.

[0011] Furthermore, the method for calculating the intersection of the geometric bounding box and the three-dimensional gaze cone model includes: a step of calculating the direction vector of each vertex of the target object's geometric bounding box relative to the user's eye position; a step of determining whether the angle between each direction vector and the gaze direction vector is less than or equal to the half-opening angle of the three-dimensional gaze cone model; a step of excluding the object if the angles corresponding to all vertices of the geometric bounding box are greater than the half-opening angle; and a step of adding the object to the candidate gaze target set if the angle corresponding to at least one vertex of the geometric bounding box is less than or equal to the half-opening angle.

[0012] Furthermore, the geometric bounding box may include an axis-aligned bounding box (AABB) or a polygon mesh.

[0013] Furthermore, the method for obtaining the final score by combining the weight coefficient and the dynamic weight of the object includes: a step of calculating the ratio of the common volume to the sum of the volume of the target object and the volume of the 3D gaze cone model to obtain the weight coefficient; and a step of multiplying the weight coefficient by the dynamic weight of the object to obtain the final score.

[0014] Furthermore, the method for calculating the half-opening angle includes: a step of calculating the maximum deviation angle from the measurement accuracy error of the eye tracker; and a step of adding the gaze diffusion angle to the maximum deviation angle to obtain the half-opening angle.

[0015] Furthermore, the method for processing the score sequence within a time window with the time-series analysis model to output the object that the user is gazing at includes: setting the time-series analysis model as a long short-term memory (LSTM) network; setting the input of the model as the score sequences of the candidate objects within the time window; and setting the output as a probability distribution over the objects being gazed at.

[0016] Another aspect of the present invention proposes a fixation point identification system for eye movement in a 3D environment, based on the above method for identifying a fixation point of eye movement in a 3D environment, the system comprising: a 3D gaze cone model construction module configured to construct a 3D gaze cone model based on the position of the user's eyes and the gaze direction vector; a weight assignment module configured to construct a dynamic 3D scene model including semantic labels and to assign to each object in the scene a dynamic weight reflecting the probability of that object being gazed at; a candidate gaze target set construction module configured to obtain the geometric bounding box of each object in the 3D scene model, calculate the intersection of the geometric bounding box and the 3D gaze cone model, and add the object to the candidate gaze target set if at least one vertex of the object's geometric bounding box is located within the 3D gaze cone model; and a target object generation module configured to calculate the common volume between each candidate object and the 3D gaze cone model, generate a weight coefficient based on the ratio of the common volume to the target volume, combine the weight coefficient with the dynamic weight of the object to obtain a final score, and process the score sequence within a time window with a time-series analysis model to output the object that the user is gazing at.

[0017] A third aspect of the present invention proposes an electronic device comprising at least one processor and at least one memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the processor, and the instructions, when executed by the processor, implement the above-described method for identifying a fixation point of eye movement in a three-dimensional environment.

Advantages of the Invention

[0018] The beneficial effects of the present invention are as follows.

[0019] High-precision identification of fixation points in three-dimensional space: Constructing a three-dimensional gaze cone model from the user's gaze direction vector and eye position removes the constraints of the conventional two-dimensional plane coordinate system, making it possible to represent the gaze direction and the potential fixation region directly in three-dimensional space. This effectively meets the need to identify depth information in three-dimensional scenes such as virtual reality, multi-screen interaction, and spatial projection, and solves the prior-art problem that the gaze point could only be mapped to its projection on the screen surface.

[0020] Fusion of scene dynamic weights and semantics: Assigning dynamic weights to objects in the three-dimensional scene, associating them with semantic labels, and applying a comprehensive scoring mechanism based on the common volume and dynamic weight of each candidate object significantly improves the accuracy of discriminating gazed objects in complex scenes. For example, in a dynamic interaction scene, an object with a high weight can be preferentially identified through semantic weight compensation even if only a part of its geometric bounding box enters the gaze cone because of device error, which reduces the false positive rate.

[0021] Autonomous error correction and enhanced robustness: Using a time-series analysis model to smooth the score sequences of gazed objects within a time window suppresses instantaneous errors caused by hardware noise, momentary occlusion, or slight head movements, enabling dynamic error correction. In addition, calculating the common volume of the geometric bounding box and the gaze cone quantifies the spatial correlation of the gazed object, avoiding the false detections and missed detections that conventional threshold methods suffer when equipment accuracy fluctuates.

[0022] Optimization of computational efficiency and scene adaptability: A high-speed intersection detection algorithm based on geometric bounding boxes enables efficient selection of candidate gaze targets within a 3D scene, reducing the real-time computation load. By combining this with a hierarchical scoring mechanism using dynamic weights, the technology can be adapted not only to static scenes but also to dynamic scenes, expanding its range of applications.

Brief Description of the Drawings

[0023] Other features, purposes, and advantages of the present invention will become more apparent through a detailed description of non-limiting embodiments made below with reference to the drawings.

[0024]
[Figure 1] A schematic flowchart of the overall method for identifying a fixation point in eye movements in a three-dimensional environment according to the present invention.
[Figure 2] A schematic diagram of the process of constructing the three-dimensional gaze cone model in the method for identifying a fixation point in eye movements in a three-dimensional environment according to the present invention.
[Figure 3] A schematic diagram of the process of calculating the intersection of the geometric bounding box and the three-dimensional gaze cone model in the method for identifying a fixation point in eye movements in a three-dimensional environment according to the present invention.

Modes for Carrying Out the Invention

[0025] The present invention will be described in more detail below, combining drawings and embodiments. It should be understood that the specific embodiments described here are merely for interpreting the related invention and do not limit the present invention. Furthermore, for the sake of clarity, only the parts relating to the related invention are shown in the drawings.

[0026] Furthermore, the embodiments and features described herein can be combined with each other, provided that no contradictions arise. The present application will now be described in detail in combination with the embodiments, with reference to the drawings.

[0027] A first embodiment of the present invention provides a method for identifying the point of gaze of eye movements in a three-dimensional environment, the method including: Step S10, constructing a three-dimensional gaze cone model based on the user's eyeball position and gaze direction vector; Step S20, constructing a dynamic 3D scene model that includes semantic labels, and assigning to each object in the scene a dynamic weight that reflects the object's probability of being gazed at within the scene; Step S30, obtaining the geometric bounding box of each object in the 3D scene model, calculating the intersection of the geometric bounding box and the 3D gaze cone model, and, if at least one vertex of the object's geometric bounding box is located within the 3D gaze cone model, adding the object to the candidate gaze target set; and Step S40, calculating the common volume between each candidate object and the 3D gaze cone model, generating a weight coefficient based on the ratio of the common volume to the target volume, combining the weight coefficient with the dynamic weight of the object to obtain a final score, processing the score sequence within a time window using a time-series analysis model, and outputting the target object that the user is gazing at.

[0028] To more clearly explain the method for identifying the point of fixation of eye movements in a three-dimensional environment according to the present invention, each step, including steps S10 to S40 of the embodiment of the present invention, will be described in detail below with reference to Figure 1. The details of each step are as follows.

[0029] In step S10, a three-dimensional gaze cone model is constructed based on the user's eyeball position and gaze direction vector.

[0030] Referring to Figure 2, in this embodiment, the method for constructing the three-dimensional gaze cone model includes: Step S11, obtaining the user's eyeball position as the vertex of the three-dimensional gaze cone model; Step S12, obtaining the gaze direction vector as the central axis of the model; Step S13, determining the half-opening angle of the model according to the measurement accuracy of the eye tracker and the gaze diffusion angle; Step S14, calculating the gaze distance of the model in three-dimensional space, either from binocular disparity or by a preset distance extraction method; and Step S15, constructing the three-dimensional gaze cone model from the vertex, central axis, half-opening angle, and gaze distance.

[0031] In this embodiment, the method for calculating the half-opening angle includes: a step of calculating the maximum deviation angle from the measurement accuracy error of the eye tracker; and a step of adding the gaze diffusion angle to the maximum deviation angle to obtain the half-opening angle.

[0032] Specifically, in a dynamic 3D scene, the gaze ray provided by an eye tracker typically consists of two elements: a starting point and a direction vector. The starting point is generally the user's eye position (e.g., the center of the eye, or the center point where the gaze of both eyes converges), indicating the origin of the gaze. The direction vector indicates the direction of the eye's gaze and is a unit vector pointing from the starting point towards the user's gaze direction. By using a non-contact eye tracker, a gaze ray (O, d) can be obtained at each time step, where O is the eye position and d is the vector representing the direction of the gaze.

[0033] Theoretically, a gaze ray accurately passes through the object or point the user is fixated on. However, in practical applications, factors such as the accuracy limitations of the eye tracker and compensation errors for head movement introduce a certain degree of uncertainty into the measured gaze direction. This means that a small angle deviation may exist between the true gaze direction and the measured direction. Therefore, representing the gaze direction with only a single infinitely extending ray may be insufficient to cover the true point of fixation.

[0034] To more accurately characterize the region of uncertainty in the line of sight, we introduce the concept of a gaze cone. A gaze cone is a conical volume centered on a measured line of sight ray, defining a conical field of view in space and representing the region that the eye can fixate on. Constructing this cone requires considering two parameters: gaze distance and line of sight uncertainty.

[0035] The gaze distance (or focal point estimate) refers to the depth at which the line of sight intersects in space, as estimated by the eye tracker. If the system can estimate the distance the user is gazing at (for example, by calculating the focal distance using binocular disparity), it can determine one preliminary gaze point at that distance; otherwise, it can select one default distance (for example, extending the line of sight to a fixed depth in the scene, or to the point of collision with the first object) and extract a line of sight ray.
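As an illustration only, the two options in this paragraph (a binocular estimate, otherwise a default distance) might be sketched as follows. The patent does not specify how binocular disparity is converted to a distance; this sketch assumes one common reading, namely taking the point where the left-eye and right-eye gaze rays come closest to each other. The function name `estimate_gaze_distance`, its parameters, and the `default_distance` value are hypothetical.

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def estimate_gaze_distance(left_origin, left_dir, right_origin, right_dir,
                           default_distance=10.0, eps=1e-9):
    """Estimate gaze distance from the vergence of the two eyes' gaze rays.

    Returns the distance from the midpoint between the eyes to the midpoint
    of the shortest segment joining the two rays. Falls back to
    `default_distance` (a hypothetical preset) when the rays are parallel
    or their closest approach lies behind the eyes.
    """
    w = _sub(left_origin, right_origin)
    a = _dot(left_dir, left_dir)
    b = _dot(left_dir, right_dir)
    c = _dot(right_dir, right_dir)
    d = _dot(left_dir, w)
    e = _dot(right_dir, w)
    denom = a * c - b * b
    if denom < eps:                        # (nearly) parallel rays
        return default_distance
    t_left = (b * e - c * d) / denom       # parameter along the left ray
    t_right = (a * e - b * d) / denom      # parameter along the right ray
    if t_left <= 0.0 or t_right <= 0.0:    # rays diverge in front of the eyes
        return default_distance
    p_left = [o + t_left * k for o, k in zip(left_origin, left_dir)]
    p_right = [o + t_right * k for o, k in zip(right_origin, right_dir)]
    gaze_point = [(p + q) / 2.0 for p, q in zip(p_left, p_right)]
    eye_center = [(p + q) / 2.0 for p, q in zip(left_origin, right_origin)]
    return math.sqrt(sum(x * x for x in _sub(gaze_point, eye_center)))
```

With eyes 6 cm apart both aimed at a point 1 m straight ahead, the estimate is that 1 m; with exactly parallel rays the preset distance is returned instead.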

[0036] The uncertainty in gaze direction stems from measurement errors in the eye tracker and residual errors in head movement compensation. Typically, eye trackers provide a single accuracy index (e.g., ±1°) representing the maximum angle by which the measured gaze direction can deviate from the true direction. The uncertainty angle θ reflects the tolerance of the gaze direction and is the sum of this maximum deviation angle and the gaze diffusion angle; it is the half-opening angle of the cone. For example, if the accuracy of the eye tracker is approximately 1° and the gaze diffusion angle is 3°, then θ = 4° can be adopted as the half-opening angle for constructing the gaze cone.

[0037] The shape of the gaze cone is determined by combining the two factors described above. When the gaze distance D is known, the uncertainty θ of the line of sight corresponds, at distance D from O, to a circular region centered on the predicted gaze point with a radius of approximately D·tan θ. In other words, the greater the distance, the larger the spatial region covered by the uncertainty angle; conversely, at closer distances the region covered becomes smaller.
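The two relations above (θ as the sum of the maximum deviation angle and the gaze diffusion angle, and the uncertainty radius D·tan θ) can be illustrated with a minimal sketch. The function names and the sample distances are hypothetical; only the 1° and 3° values come from the example in the text.

```python
import math

def half_opening_angle_deg(max_deviation_deg: float, diffusion_deg: float) -> float:
    """Half-opening angle = tracker maximum deviation + gaze diffusion angle."""
    return max_deviation_deg + diffusion_deg

def uncertainty_radius(distance: float, half_angle_deg: float) -> float:
    """Radius of the circular uncertainty region at the given gaze distance."""
    return distance * math.tan(math.radians(half_angle_deg))

theta = half_opening_angle_deg(1.0, 3.0)    # the 1 degree + 3 degree example
for D in (0.5, 1.0, 2.0):                   # illustrative distances, any length unit
    print(f"D = {D}: radius = {uncertainty_radius(D, theta):.4f}")
```

The printed radii grow in proportion to D, which is the behavior the paragraph describes.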

[0038] Geometrically, the gaze cone can be defined by its vertex, axis, and half-opening angle.

[0039] Cone vertex: The eyeball position O is the vertex of the cone.

[0040] Cone axis: The gaze direction vector d is the central axis of the cone.

[0041] Half-angle: The half-opening angle θ of the cone (i.e., the angle between the axis and the side surface of the cone) is the maximum deviation given by the measurement accuracy of the eye tracker plus the gaze diffusion angle.

[0042] Based on the above definition, the criterion for determining whether a point is inside the cone can be expressed in algebraic form. Let P be any point in space, and let its direction vector relative to the eye be $v = P - O$. Point P is located inside the cone if and only if the angle between vector v and the gaze direction vector d is less than or equal to θ. Expressing this condition using the dot product gives the following determination formula:

$$\frac{v \cdot d}{\lVert v \rVert \, \lVert d \rVert} \ge \cos\theta$$

Here, $\frac{v \cdot d}{\lVert v \rVert \, \lVert d \rVert}$ is the cosine of the angle between vector v and the gaze direction vector d. If this value is greater than or equal to cos θ (equivalent to the angle being less than or equal to θ), P is within the gaze cone; otherwise, it is outside the cone. Intuitively, the gaze cone can be seen as a single "line of sight cone" with O as its vertex, extending along the gaze direction vector d with half-angle θ, and all possible true gaze directions are contained within it.
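The dot-product criterion described in this paragraph can be sketched as a small function. This is an illustrative sketch rather than the patented implementation; the name `point_in_cone` is hypothetical, and treating the apex itself as inside the cone is an assumption made here to avoid dividing by zero.

```python
import math

def point_in_cone(p, apex, axis, half_angle_rad):
    """Return True if point p lies within the infinite cone (apex O, axis d, half-angle theta)."""
    v = [pi - oi for pi, oi in zip(p, apex)]            # v = P - O
    v_norm = math.sqrt(sum(c * c for c in v))
    d_norm = math.sqrt(sum(c * c for c in axis))
    if d_norm == 0.0:
        raise ValueError("gaze direction vector must be non-zero")
    if v_norm == 0.0:
        return True                                     # assumption: the apex counts as inside
    cos_angle = sum(a * b for a, b in zip(v, axis)) / (v_norm * d_norm)
    return cos_angle >= math.cos(half_angle_rad)        # angle <= theta

theta = math.radians(4.0)
print(point_in_cone((0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), theta))  # on the axis
print(point_in_cone((1.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), theta))  # 45 degrees off axis
```

Comparing cosines avoids an inverse-trigonometric call per point, which matters when the test runs over many vertices per frame.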

[0043] In step S20, a dynamic 3D scene model including semantic labels is constructed, and dynamic weights are assigned to each object in the scene that reflect the object's probability of being noticed within the scene.

[0044] In this embodiment, the semantic label includes at least one of the following: airport buildings and equipment, environmental elements, and urban elements.

[0045] The aforementioned airport structures and equipment include at least runways, terminal buildings, PAPI (Precision Approach Path Indicator) lights, airport vehicles, other aircraft, and runway edge lights, while environmental elements include at least mountains, sea surfaces, lake surfaces, river surfaces, clouds, and vegetation, and urban elements include at least buildings, bridges, and automobiles.

[0046] Step S20 of the present invention essentially constructs a digital twin 3D scene together with a dynamic weight assignment mechanism. The digital twin is a detailed 3D model of the real-world environment, including all relevant objects, created by 3D scanning technology or manual modeling. Semantic labels are assigned to each 3D object using computer vision technology; for example, object classification (e.g., runway, terminal building, PAPI lights) is achieved with a 3D object detection algorithm or by manual annotation. The scene type (e.g., cruising, landing, ground taxiing) is identified by a machine learning model based on the placement of objects and their label characteristics.

[0047] Depending on the identified scene, dynamic weights are assigned to each object to reflect the likelihood of it being viewed in that scene. For example, in the landing process of a night flight, the runway centerline might be weighted 0.5, the PAPI lights 0.3, and the runway edge lights 0.2. These weights may be predefined (based on statistical data or expert knowledge) or learned through user gaze data.
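A predefined weight table of the kind described here could be represented as a nested mapping from scene type to semantic label. The sketch below is illustrative: the key names, the `dynamic_weight` helper, and the default of 0.0 for labels not listed in a scene are assumptions; only the three night-landing weights come from the text.

```python
# Scene type -> semantic label -> dynamic weight.
# The night-landing values are the example from the text; the key names are hypothetical.
SCENE_WEIGHTS = {
    "night_landing": {
        "runway_centerline": 0.5,
        "papi_lights": 0.3,
        "runway_edge_lights": 0.2,
    },
}

def dynamic_weight(scene_type: str, label: str, default: float = 0.0) -> float:
    """Look up the weight of a semantic label in a scene (default is an assumption)."""
    return SCENE_WEIGHTS.get(scene_type, {}).get(label, default)

print(dynamic_weight("night_landing", "papi_lights"))
print(dynamic_weight("night_landing", "terminal_building"))
```

Weights learned from user gaze data, as the paragraph also allows, could replace the static table without changing the lookup interface.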

[0048] In step S30, the geometric bounding box of each object in the 3D scene model is obtained, the intersection of the geometric bounding box and the 3D gaze cone model is calculated, and if at least one vertex of the object's geometric bounding box is located within the 3D gaze cone model, the object is added to the candidate gaze target set.

[0049] Referring to Figure 3, in this embodiment, the method for calculating the intersection of the geometric bounding box and the three-dimensional gaze cone model includes: Step S31, calculating the direction vector of each vertex of the target object's geometric bounding box relative to the user's eye position; Step S32, determining whether the angle between each direction vector and the gaze direction vector is less than or equal to the half-opening angle of the three-dimensional gaze cone model; Step S33, excluding the object if the angles corresponding to all vertices of the geometric bounding box are greater than the half-opening angle; and Step S34, adding the object to the candidate gaze target set if the angle corresponding to at least one vertex of the geometric bounding box is less than or equal to the half-opening angle.

[0050] Here, the geometric bounding box includes an axis-aligned bounding box (AABB) or a polygon mesh.

[0051] In this embodiment, the gaze cone makes it possible to efficiently select the objects of interest that the user may be focusing on within a scene. The method checks whether the bounding volume of each potential target (an object or region), for example its axis-aligned bounding box (AABB) or polygon mesh, shares a common region with the gaze cone. One simple test is to calculate the direction vector of the object relative to the eye and determine whether the angle between it and the line of sight is within θ. If the entire geometry of the object lies outside the cone (i.e., the angle between every key point and the gaze axis is greater than θ), the object is unlikely to be a focus of attention and can be excluded. Conversely, if any one point of the object lies within the cone, the gaze cone covers the object, and it is included as a candidate object of interest.

[0052] The above cosine condition is applied to the eight vertices of the AABB of each object. If any vertex $P_i$ satisfies the condition

$$\frac{(P_i - O) \cdot d}{\lVert P_i - O \rVert \, \lVert d \rVert} \ge \cos\theta$$

the object is determined to intersect the gaze cone and is added to the candidate list. Furthermore, performing more refined detection (e.g., checking the intersection between the object's fine mesh and the cone) on these preliminarily selected objects makes it possible to identify which parts of the object the line of sight can actually reach.
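The eight-vertex check of steps S31 to S34 might look like the following sketch. The function names are hypothetical, and the box is assumed to be given by its minimum and maximum corners.

```python
import itertools
import math

def aabb_vertices(box_min, box_max):
    """Return the eight corner points of an axis-aligned bounding box."""
    return list(itertools.product(*zip(box_min, box_max)))

def is_candidate(box_min, box_max, apex, axis, half_angle_rad):
    """True if at least one AABB vertex satisfies the cosine condition (steps S31 to S34)."""
    cos_theta = math.cos(half_angle_rad)
    axis_norm = math.sqrt(sum(c * c for c in axis))
    for vertex in aabb_vertices(box_min, box_max):
        v = [a - b for a, b in zip(vertex, apex)]       # S31: direction from the eye to the vertex
        v_norm = math.sqrt(sum(c * c for c in v))
        if v_norm == 0.0:
            return True                                 # assumption: a vertex at the eye counts as inside
        cos_angle = sum(a * b for a, b in zip(v, axis)) / (v_norm * axis_norm)
        if cos_angle >= cos_theta:                      # S32/S34: angle <= theta
            return True
    return False                                        # S33: every vertex is outside the cone

theta = math.radians(4.0)
print(is_candidate((-0.1, -0.1, 5.0), (1.0, 1.0, 6.0), (0, 0, 0), (0, 0, 1), theta))
print(is_candidate((5.0, 5.0, 5.0), (6.0, 6.0, 6.0), (0, 0, 0), (0, 0, 1), theta))
```

One property of a vertex-only test worth keeping in mind: a box large enough that the cone passes through a face without enclosing any corner is not selected, which is one reason the finer mesh-level check mentioned in the text can be useful.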

[0053] In this embodiment, the method for obtaining the final score by combining the weight coefficient and the dynamic weight of the object includes: a step of calculating the ratio of the common volume to the sum of the volume of the target object and the volume of the 3D gaze cone model to obtain the weight coefficient; and a step of multiplying the weight coefficient by the dynamic weight of the object to obtain the final score.

[0054] The common volume between the cone and each potential target object is calculated, and the area weight coefficient is defined as common volume / (target volume + cone volume). The coefficient is then multiplied by the dynamic weight of the object to obtain the final score. For example, if the common volume is 10, the target volume is 20, and the cone volume is 15, the coefficient will be 10 / (20+15) ≈ 0.286, and if the dynamic weight is 0.5, the final score will be approximately 0.143.
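The arithmetic in this paragraph can be reproduced with a short Python sketch. The function names `weight_coefficient` and `final_score` are hypothetical; the formula follows the definition given here (common volume divided by the sum of the target volume and the cone volume).

```python
def weight_coefficient(common_volume, target_volume, cone_volume):
    """Common volume divided by (target volume + cone volume)."""
    denominator = target_volume + cone_volume
    if denominator <= 0:
        raise ValueError("target and cone volumes must sum to a positive value")
    return common_volume / denominator

def final_score(common_volume, target_volume, cone_volume, dynamic_weight):
    """Weight coefficient multiplied by the object's dynamic weight."""
    return weight_coefficient(common_volume, target_volume, cone_volume) * dynamic_weight

# Worked example from the text: 10 / (20 + 15) ≈ 0.286, times 0.5 ≈ 0.143.
coefficient = weight_coefficient(10, 20, 15)
score = final_score(10, 20, 15, 0.5)
```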

[0055] In step S40, the common volume between the candidate target and the 3D gaze cone model is calculated, weight coefficients are generated based on the ratio of the common volume to the target volume, the weight coefficients and the dynamic weights of the object are combined to obtain the final score, and the score series within the time window is processed by a time series analysis model to output the target object that the user is gazing at.
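The specification does not state how the common volume is computed. As one possible illustration only, the following Python sketch estimates it by Monte Carlo sampling inside the object's AABB. The sampling approach, the function name `common_volume`, and the choice of a flat-capped cone whose height equals the gaze distance are all assumptions, not the method of the specification.

```python
import math
import random

def common_volume(eye, gaze_dir, half_angle_rad, gaze_distance,
                  box_min, box_max, samples=20000, seed=0):
    """Estimate the volume shared by an AABB and a finite gaze cone.

    Assumptions: the cone is capped by a plane at `gaze_distance` along its
    axis, and `half_angle_rad` is below 90 degrees.
    """
    rng = random.Random(seed)
    norm = math.sqrt(sum(c * c for c in gaze_dir))
    axis = [c / norm for c in gaze_dir]
    cos_half = math.cos(half_angle_rad)

    box_volume = 1.0
    for lo, hi in zip(box_min, box_max):
        box_volume *= (hi - lo)

    hits = 0
    for _ in range(samples):
        # Uniform sample inside the bounding box.
        p = [rng.uniform(lo, hi) for lo, hi in zip(box_min, box_max)]
        v = [a - b for a, b in zip(p, eye)]
        along = sum(a * b for a, b in zip(v, axis))   # distance along the axis
        dist = math.sqrt(sum(c * c for c in v))       # distance from the eye
        # Inside if within the cap and within the half-opening angle.
        if 0.0 < along <= gaze_distance and along >= dist * cos_half:
            hits += 1
    return box_volume * hits / samples
```

A box that lies entirely inside the cone returns its full volume, a box entirely outside returns zero, and partial overlaps are approximate with an error that shrinks as `samples` grows.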

[0056] In this embodiment, the method for processing the score sequence within a time window using a time series analysis model and outputting the target object that the user is gazing at includes: defining the time series analysis model as a long short-term memory (LSTM) network, defining the input to the time series analysis model as the score sequence of the candidate objects within the time window, and defining the output as a probability distribution over which object is being looked at.
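To make the input and output shapes concrete, the following Python sketch runs the forward pass of a single-layer LSTM followed by a softmax read-out. It is a minimal illustration under stated assumptions: the class name `TinyLSTM`, the hidden size, and the read-out layer are hypothetical, and the weights are random placeholders rather than the trained parameters the specification refers to.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class TinyLSTM:
    """Single-layer LSTM with a softmax read-out (untrained, random weights)."""

    def __init__(self, n_objects, hidden=8, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        # One matrix per gate (input, forget, output, candidate), each acting
        # on the concatenation of the current input and the previous hidden state.
        self.gates = {k: mat(hidden, n_objects + hidden) for k in "ifog"}
        self.bias = {k: [0.0] * hidden for k in "ifog"}
        self.readout = mat(n_objects, hidden)

    def forward(self, score_sequence):
        """`score_sequence` is a list of per-time-step score vectors,
        one score per candidate object. Returns one probability per object."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for scores in score_sequence:
            xh = list(scores) + h
            pre = {k: [a + b for a, b in zip(matvec(self.gates[k], xh), self.bias[k])]
                   for k in "ifog"}
            i_gate = [sigmoid(v) for v in pre["i"]]
            f_gate = [sigmoid(v) for v in pre["f"]]
            o_gate = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["g"]]
            c = [f * c_prev + i * g for f, c_prev, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
        return softmax(matvec(self.readout, h))
```

With three candidate objects, `TinyLSTM(3).forward(sequence)` returns three probabilities that sum to 1. Because the weights are untrained, the values carry no meaning beyond showing the data flow.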

[0057] Within a set detection time window (e.g., 200 ms), the score sequence of each object is collected. These sequences are processed by an LSTM network whose input is the time-series scores and whose output is a probability distribution over which object is being gazed at. The LSTM model is trained on labeled gaze data, which includes gaze sequences with known target objects, and thereby captures spatiotemporal features (e.g., dynamic changes in gaze position and time).
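Collecting per-object score sequences within a detection window can be sketched as below. The class name `ScoreWindow`, the millisecond timestamps, and the choice to fill a missing object with a score of 0.0 are assumptions made for illustration; the specification gives only the example window length of 200 ms.

```python
from collections import deque

class ScoreWindow:
    """Keeps the score samples that fall within the most recent time window."""

    def __init__(self, window_ms=200.0):
        self.window_ms = window_ms
        self.samples = deque()  # entries are (timestamp_ms, {object_id: score})

    def add(self, timestamp_ms, scores):
        """Append one sample and drop samples older than the window."""
        self.samples.append((timestamp_ms, dict(scores)))
        while self.samples and timestamp_ms - self.samples[0][0] > self.window_ms:
            self.samples.popleft()

    def sequences(self):
        """Return {object_id: [score at each retained sample]}.

        An object absent from a sample gets 0.0 for that sample (assumption).
        """
        object_ids = sorted({oid for _, s in self.samples for oid in s})
        return {oid: [s.get(oid, 0.0) for _, s in self.samples] for oid in object_ids}
```

The resulting sequences have equal length for every object, so they can be stacked into the per-time-step score vectors that a time series model takes as input.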

[0058] Based on the score sequence, spatiotemporal features (e.g., gaze position and dynamic changes over time) are combined to ultimately identify the object the user is fixated on.

[0059] In the above embodiment, each step was described in the order described above, but as those skilled in the art will understand, in order to achieve the effects of this embodiment, the steps do not necessarily have to be performed in this order, but may be performed simultaneously (in parallel) or in reverse order, and all of these simple variations are within the scope of protection of the present invention.

[0060] A second embodiment of the present invention is a gaze point identification system for eye movements in a three-dimensional environment, based on the gaze point identification method for eye movements in a three-dimensional environment, the system comprising: a 3D gaze cone model construction module configured to construct a 3D gaze cone model based on the user's eyeball position and gaze direction vector; a weight assignment module configured to construct a dynamic 3D scene model including semantic labels and to assign to each object in the scene a dynamic weight that reflects the object's probability of being looked at within the scene; a candidate gaze target set construction module configured to obtain the geometric bounding box of each object in the 3D scene model, calculate the intersection of the geometric bounding box and the 3D gaze cone model, and, if at least one vertex of the object's geometric bounding box is located within the 3D gaze cone model, include that object in the candidate gaze target set; and a target object generation module configured to calculate the common volume of a candidate target and the 3D gaze cone model, generate a weight coefficient based on the ratio of the common volume to the target volume, combine the weight coefficient with the dynamic weight of the object to obtain a final score, process the score sequence within a time window using a time series analysis model, and output the target object that the user is gazing at.

[0061] For the convenience and brevity of explanation, and so that those skilled in the art can clearly understand, the specific operating processes and related explanations of the system described above can be found in the corresponding processes in the previously mentioned method embodiments and will not be described again here.

[0062] It should be noted that the division into functional modules described for the gaze point identification system of the above embodiment is merely an example, and in actual applications the above functions can be assigned to different functional modules as needed. In other words, the modules or steps according to the embodiment of the present invention may be further broken down or combined. For example, the modules of the above embodiment may be integrated into a single module, or they may be further divided into multiple submodules, to complete all or some of the functions described above. The names of the modules and steps related to the embodiment of the present invention merely distinguish each module or step and should not be considered an undue limitation of the present invention.

[0063] A third embodiment of the present invention is an electronic device comprising at least one processor and a memory connected to the at least one processor, wherein the memory stores instructions that can be executed by the processor, and these instructions, when executed by the processor, realize the method for identifying the point of fixation of eye movements in the three-dimensional environment described above.

[0064] A fourth embodiment of the present invention is a computer-readable storage medium storing computer instructions that, when executed by a computer, realize the fixation point identification method for eye movements in the three-dimensional environment described above.

[0065] For the convenience and brevity of explanation, and so that those skilled in the art can clearly understand, the specific operating processes and related explanations of the storage device and processing device described above can be found in the corresponding processes in the previously mentioned method embodiments and will not be described again here.

[0066] As will be recognized by those skilled in the art, each exemplary module and method step described in relation to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination thereof, and the programs corresponding to the software modules and method steps may be stored in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the art. To clearly illustrate the compatibility of electronic hardware and software, the above description generally describes each exemplary configuration and step according to its function. Whether these functions are ultimately performed as electronic hardware or as software will depend on the specific application and design constraints of the invention. Those skilled in the art may implement the described functions using different methods for each specific application, but such implementations should not be construed as exceeding the scope of the invention.

[0067] The terms "first," "second," etc., are used to distinguish similar objects and are not intended to describe or represent a specific order or sequence.

[0068] The term “including” or any other similar term is intended to cover non-exclusive inclusion such that a process, method, article or apparatus / device containing a set of elements includes not only those elements but also other elements not explicitly listed, or elements specific to those processes, methods, articles or apparatus / devices.

[0069] Although the technical proposals of the present invention have been described above in relation to preferred embodiments shown in the drawings, it goes without saying that the scope of protection of the present invention is not limited to these specific embodiments, as will be easily understood by those skilled in the art. Those skilled in the art can make equivalent modifications or substitutions to the relevant technical features without departing from the principles of the present invention, and all such modified or substituted technical proposals will fall within the scope of protection of the present invention.

Claims

1. A method for identifying the point of fixation in eye movements in a three-dimensional environment, characterized by comprising: a step of constructing a three-dimensional gaze cone model based on the user's eyeball position and gaze direction vector; a step of constructing a dynamic 3D scene model that includes semantic labels, and assigning to each object in the scene a dynamic weight that reflects the object's probability of being looked at within the scene; a step of obtaining the geometric bounding box of each object in the 3D scene model, calculating the intersection of the geometric bounding box and the 3D gaze cone model, and, if at least one vertex of the object's geometric bounding box is located within the 3D gaze cone model, adding that object to the candidate gaze target set; and a step of calculating the common volume between the candidate object and the 3D gaze cone model, generating a weight coefficient based on the ratio of the common volume to the target volume, combining the weight coefficient with the dynamic weight of the object to obtain a final score, processing the score sequence within a time window using a time series analysis model, and outputting the target object that the user is gazing at.

2. The method for identifying a point of fixation in eye movements in a three-dimensional environment according to claim 1, wherein constructing the three-dimensional gaze cone model includes: a step of obtaining the user's eyeball position as the vertex of the three-dimensional gaze cone model; a step of obtaining the line-of-sight direction vector as the central axis of the three-dimensional gaze cone model; a step of determining the half-opening angle of the three-dimensional gaze cone model according to the measurement accuracy of the eye tracker and the line-of-sight diffusion angle; a step of calculating the gaze distance of the three-dimensional gaze cone model in three-dimensional space by calculating binocular disparity or by using a preset distance extraction method; and a step of constructing the three-dimensional gaze cone model based on the vertex, central axis, half-opening angle, and gaze distance.

3. The method for identifying a point of fixation in eye movements in a three-dimensional environment according to claim 1, wherein the semantic labels include at least one of the following: airport buildings and equipment, environmental elements, and urban elements.

4. The method for identifying a point of fixation in eye movements in a three-dimensional environment according to claim 2, wherein calculating the intersection of the geometric bounding box and the three-dimensional gaze cone model includes: a step of calculating the direction vectors of the vertices of the geometric bounding box of the target object relative to the user's eyeball position; a step of determining whether the angle between each direction vector and the line-of-sight direction vector is less than or equal to the half-opening angle of the three-dimensional gaze cone model; a step of excluding the object if the angles corresponding to all vertices of the geometric bounding box are greater than the half-opening angle; and a step of adding the object to the candidate gaze target set if the angle corresponding to at least one vertex of the geometric bounding box is less than or equal to the half-opening angle.

5. The method for identifying a point of fixation in eye movements in a three-dimensional environment according to claim 4, wherein the geometric bounding box includes an axis-aligned bounding box or a polygon mesh.

6. The method for identifying a point of fixation in eye movements in a three-dimensional environment according to claim 1, wherein obtaining the final score by combining the weight coefficient and the dynamic weight of the object includes: a step of calculating the ratio of the common volume to the sum of the volume of the target object and the volume of the 3D gaze cone model to obtain the weight coefficient; and a step of multiplying the weight coefficient by the dynamic weight of the object to obtain the final score.

7. The method for identifying a point of fixation in eye movements in a three-dimensional environment according to claim 2, wherein calculating the half-opening angle includes: a step of calculating the maximum deviation angle according to the measurement accuracy error of the eye tracker; and a step of adding the line-of-sight diffusion angle to the maximum deviation angle to obtain the half-opening angle.

8. The method for identifying a point of fixation in eye movements in a three-dimensional environment according to claim 1, wherein processing the score sequence within a time window using a time series analysis model to output the target object that the user is gazing at includes: defining the time series analysis model as a long short-term memory (LSTM) network, defining the input to the time series analysis model as the score sequence of the candidate objects within the time window, and defining the output as a probability distribution over which object is being looked at.

9. A system for identifying the point of gaze of eye movements in a three-dimensional environment, based on the method for identifying the point of gaze of eye movements in a three-dimensional environment according to any one of claims 1 to 8, characterized by comprising: a 3D gaze cone model construction module configured to construct a 3D gaze cone model based on the user's eyeball position and gaze direction vector; a weight assignment module configured to construct a dynamic 3D scene model including semantic labels and to assign to each object in the scene a dynamic weight that reflects the object's probability of being looked at within the scene; a candidate gaze target set construction module configured to obtain the geometric bounding box of each object in the 3D scene model, calculate the intersection of the geometric bounding box and the 3D gaze cone model, and, if at least one vertex of the object's geometric bounding box is located within the 3D gaze cone model, include that object in the candidate gaze target set; and a target object generation module configured to calculate the common volume between a candidate target and the 3D gaze cone model, generate a weight coefficient based on the ratio of the common volume to the target volume, combine the weight coefficient with the dynamic weight of the object to obtain a final score, process the score sequence within a time window using a time series analysis model, and output the target object that the user is gazing at.

10. An electronic device characterized by comprising at least one processor and a memory connected to the at least one processor, wherein the memory stores instructions that can be executed by the processor, and the instructions, when executed by the processor, realize the method for identifying a point of gaze of eye movements in a three-dimensional environment according to any one of claims 1 to 8.