A calibration-free eye tracking method and system based on constructing eyeball internal parameters

CN120949929B · Active · Publication Date: 2026-06-23 · ZHEJIANG UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHEJIANG UNIV
Filing Date: 2025-07-29
Publication Date: 2026-06-23

Abstract

The application discloses a calibration-free eye-tracking method and system based on constructing intraocular parameters, comprising: acquiring eye images and scene images while a verifier gazes at multiple specific target points, and generating the corresponding three-dimensional gaze vectors and the two-dimensional coordinates of the target points using existing algorithms; solving, by the least squares method, the mapping matrix parameters of a camera intrinsic parameter model from the three-dimensional gaze vectors and the two-dimensional coordinates; and, during actual eye tracking, substituting the three-dimensional gaze vector acquired in real time into the mapping relationship model containing the mapping matrix parameters to calculate the two-dimensional coordinates of the gaze point in the world camera image. The component ratios of the three-dimensional gaze vector are used as normalized image coordinates, and a linear mapping relationship with the two-dimensional coordinates is established. The application reduces cost while improving tracking accuracy.

Description

Technical Field

[0001] This invention belongs to the field of eye tracking and image processing technology, and particularly relates to a calibration-free eye tracking method and system based on constructing intraocular parameters.

Background Technology

[0002] Eye-tracking technology refers to the process of acquiring images of a user's eyes, analyzing their gaze direction, and thus inferring their gaze location or region of interest. Specifically, the core objective of eye-tracking is to determine the spatial location of a user's gaze at a given moment, typically represented as a two-dimensional coordinate point in a scene image. This process is widely used in human-computer interaction, visual perception research, augmented reality, driver assistance, and advertising effectiveness evaluation, among other fields. However, current eye-tracking methods require calibration before use, increasing the complexity of device deployment and usage. This is because current three-dimensional gaze estimation results for eye images are based on the eye coordinate system, and the mapping relationship between the eye coordinate system and the scene camera image plane needs to be calibrated before use. Furthermore, over extended use, device slippage can lead to decreased accuracy, necessitating recalibration.

[0003] Existing technical solutions and their shortcomings:

[0004] Explicit calibration methods: These methods require users to gaze at a series of preset screen calibration points (typically 9 points) or perform other specific tasks (such as watching videos, clicking a mouse, or tracking targets) before use to collect gaze data. This data is used to fit an individualized model, thereby achieving more accurate gaze prediction. Implicit calibration methods: These methods typically obtain calibration data indirectly through the user's natural interactive behaviors during actual tasks, such as clicking, gazing at predictable targets, and hand-eye coordination. Compared to explicit calibration, implicit calibration methods are more natural, but still rely on the user's active participation in specific interactive processes. Current mainstream explicit or implicit calibration methods all require performing specific interactive tasks before use to build an individualized model. This process is not only time-consuming and cumbersome, but also requires a high degree of user cooperation, especially for children, the elderly, or users with cognitive / physical disabilities. Even after individualized calibration is completed, the established model is still extremely sensitive to changes in head posture; slight displacement or rotation can cause gaze prediction to fail, thus requiring a re-calibration process.

[0005] Calibration-free methods: These methods typically construct a mapping relationship between 3D gaze direction and eye image features (such as pupil center, iris, corneal reflection, eye corners, and head posture) based on geometric features or deep learning, allowing them to be used without a calibration process in practical applications. While existing calibration-free methods avoid explicit interaction, they still rely on eye and head features expressed in global space, making them susceptible to inter-individual physiological differences (such as eye position and head shape) and natural posture variations, leading to significant prediction errors. The lack of additional supervision or personalized adjustments makes it difficult to simultaneously meet the requirements of robustness, low cost, and immediacy in practical deployments.

Summary of the Invention

[0006] To address the aforementioned technical problems, this invention proposes a calibration-free eye-tracking method and system based on constructing intraocular parameters, which reduces costs while improving tracking accuracy.

[0007] To achieve the above objectives, this invention provides a calibration-free eye-tracking method based on constructing intraocular parameters, comprising:

[0008] Acquire eye images and scene images when the verifier gazes at multiple specific target points, and generate corresponding three-dimensional gaze vectors and two-dimensional coordinates of the target points based on existing algorithms;

[0009] Using the aforementioned three-dimensional line-of-sight vector and two-dimensional coordinates, the mapping matrix parameters based on the camera intrinsic parameter model are solved by the least squares method;

[0010] In actual eye tracking, the real-time acquired three-dimensional gaze vector is substituted into the mapping relationship model containing the mapping matrix parameters to calculate the two-dimensional coordinates of the gaze point in the world camera image.

[0011] The mapping relationship model uses the component ratio of the three-dimensional line-of-sight vector as normalized image coordinates to establish a linear mapping relationship with the two-dimensional coordinates.

[0012] On the other hand, to achieve the above objectives, the present invention also provides a calibration-free eye-tracking system based on constructing intraocular parameters, comprising:

[0013] The first acquisition module is used to acquire eye images and scene images when the verifier stares at multiple specific target points, and generate corresponding three-dimensional gaze vectors and two-dimensional coordinates of the target points based on existing algorithms;

[0014] The second calibration module is used to receive the three-dimensional line-of-sight vector and two-dimensional coordinates, and solve the mapping matrix parameters based on the camera intrinsic parameter model by the least squares method;

[0015] The third inference module is used to substitute the real-time three-dimensional gaze vector into the mapping relationship model containing the mapping matrix parameters to calculate the two-dimensional coordinates of the gaze point in the world camera image.

[0016] The mapping relationship model described above establishes a linear mapping using the component ratios of the three-dimensional gaze vector as normalized image coordinates.

[0017] Technical effects of the invention:

[0018] (1) Eliminate user interaction burden: Calibration-point data collection (for example, with a 9-point calibration board) is completed at the factory, so the end user does not need to perform any calibration operation. This removes the limitations that existing technology imposes on children, the elderly, and people with mobility impairments.

[0019] (2) Overcoming the limitations of individual physiological differences: The ratio of three-dimensional gaze vector components is used as normalized image coordinates. A linear mapping relationship is established through the camera intrinsic physical model, which reduces the system’s sensitivity to individual differences such as eye position and head shape by more than 80% (compared to the traditional calibration-free method).

[0020] (3) Improved head motion tolerance: The mapping model remains stable when the head displacement is ≤5mm or the rotation is ≤10°, which is 5 times larger than the tolerance range of traditional calibration methods.

[0021] (4) Achieve sub-millisecond real-time computing: The single solution delay of the linear mapping equation is ≤0.05ms, which meets the real-time interaction requirements of scenarios such as augmented reality.

[0022] (5) Ensure high-precision gaze positioning: Under standard test conditions, the gaze point prediction angle error is ≤1°, and the coordinate positioning accuracy is 2 times higher than that of the individualized model.

Attached Figure Description

[0023] The accompanying drawings, which form part of this application, are used to provide a further understanding of this application. The illustrative embodiments and descriptions of this application are used to explain this application and do not constitute an undue limitation of this application. In the drawings:

[0024] Figure 1 is a schematic flowchart of a calibration-free eye-tracking method based on constructing intraocular parameters according to an embodiment of the present invention;

[0025] Figure 2 is a schematic diagram of the structure of a calibration-free eye-tracking system based on constructing intraocular parameters according to an embodiment of the present invention.

Detailed Implementation

[0026] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. This application will now be described in detail with reference to the accompanying drawings and embodiments.

[0027] It should be noted that the steps shown in the flowchart in the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order than that shown here.

[0028] As shown in Figure 1, this embodiment provides a calibration-free eye-tracking method based on constructing intraocular parameters, including:

[0029] Acquire eye images and scene images when the verifier gazes at multiple specific target points, and generate corresponding three-dimensional gaze vectors and two-dimensional coordinates of the target points based on existing algorithms;

[0030] Using the aforementioned three-dimensional line-of-sight vector and two-dimensional coordinates, the mapping matrix parameters based on the camera intrinsic parameter model are solved by the least squares method;

[0031] In actual eye tracking, the real-time acquired three-dimensional gaze vector is substituted into the mapping relationship model containing the mapping matrix parameters to calculate the two-dimensional coordinates of the gaze point in the world camera image.

[0032] The mapping relationship model uses the component ratio of the three-dimensional line-of-sight vector as normalized image coordinates to establish a linear mapping relationship with the two-dimensional coordinates.

[0033] Furthermore, the process of acquiring the three-dimensional line-of-sight vector and two-dimensional coordinates is performed in a controlled offline environment before the device leaves the factory.

[0034] Furthermore, the mapping relationship model is a camera intrinsic parameter model, which specifically establishes a linear mapping with two-dimensional pixel coordinates by using the component ratio of the three-dimensional gaze vector as normalized image coordinates.

[0035] Furthermore, the three-dimensional gaze vector is represented by a unit vector d = (α, β, γ), and the component ratios are α / γ and β / γ.
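The component ratios can be illustrated with a minimal Python sketch. The function name `gaze_to_normalized` and the guard threshold `eps` are assumptions introduced here for illustration; the source does not say how a gaze vector with γ close to zero should be handled.

```python
import numpy as np

def gaze_to_normalized(d, eps=1e-9):
    """Return (alpha/gamma, beta/gamma) for a 3D gaze vector d = (alpha, beta, gamma)."""
    alpha, beta, gamma = np.asarray(d, dtype=float)
    if abs(gamma) < eps:
        # Assumption: a gaze parallel to the image plane has no finite projection.
        raise ValueError("gamma is close to zero; the component ratios are undefined")
    return alpha / gamma, beta / gamma

d = np.array([0.1, -0.2, 1.0])
d /= np.linalg.norm(d)            # normalize to a unit vector, as in [0035]
x_n, y_n = gaze_to_normalized(d)  # the ratios do not depend on the vector length
```

Because only ratios are used, normalizing d to unit length does not change the result.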

[0036] Furthermore, the solution process includes:

[0037] Construct a linear equation about the mapping matrix parameters for each set of target point data;

[0038] Integrate all equations to form an overdetermined linear system;

[0039] The optimal estimates of the mapping matrix parameters are obtained by solving the normal equations.

[0040] Furthermore, the calculation process specifically involves substituting the component ratios of the real-time three-dimensional gaze vector into an equation containing the mapping matrix parameters to directly compute the two-dimensional pixel coordinates.

[0041] Furthermore, the gaze point is defined as the center point of the image region at the calculated two-dimensional coordinate position.

[0042] Specifically, the implementation process of this embodiment includes the following steps:

[0043] The gaze point is obtained from the scene image, and the mapping point of the gaze point p on the pixel plane of the forward-looking camera image is defined as: p0 = (u, v);

[0044] Before the equipment is put into use, a verifier (calibration operator) wears the equipment and observes nine calibration points on a given calibration board. Nine sets of mapping point data $p_i = (u_i, v_i)$ are obtained using computer target recognition technology.

[0045] The gaze direction vector is obtained by using existing algorithms. The three-dimensional unit vector of the wearer's gaze direction in the near-eye camera coordinate system is defined as: d=(α,β,γ).

[0046] While the verifier gazes at the nine calibration points on the given calibration board, nine sets of three-dimensional unit vectors representing the gaze direction, $d_i = (\alpha_i, \beta_i, \gamma_i)$, are calculated through eye image processing.

[0047] Mapping Modeling Based on Camera Intrinsic Model: In computer vision, the camera intrinsic model describes how a camera projects points from the 3D world onto its 2D image sensor. Essentially, it is a mathematical framework that defines the transformation between 3D points in the camera coordinate system and 2D pixel coordinates on the image plane. Taking the near-eye camera as the origin of the camera coordinate system and the forward-looking world view as the image plane, a mapping model between the gaze direction and the projected points is constructed. Because its structure is identical to that of an ideal camera intrinsic model, the mapping relationship can be constructed by obtaining the intrinsic parameter matrix.

[0048] Projection process: For the constructed mapping model, the fixation point p = (x, y, z) is first projected onto a normalized image plane with a depth of 1. Based on the principle of similar triangles, the normalized image coordinates can be obtained:

[0049] $x' = \dfrac{x}{z}, \quad y' = \dfrac{y}{z}$

[0050] Then, the "intrinsic parameter matrix" K of the mapping relationship model is used to establish a relationship between it and the pixel coordinates p0 = (u, v):

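[0051] $\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}, \quad K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$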

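[0052] When the gaze direction is accurate, that is, when the gaze direction points to the fixation point, the fixation point p is a positive multiple of the gaze direction, so the normalized image coordinates can be obtained from the unit vector d = (α, β, γ) of the gaze direction, that is: $x' = \alpha / \gamma$, $y' = \beta / \gamma$.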

[0053] Therefore, as long as matrix K is determined through scientific calibration methods, the pixel coordinates p0 of the gaze point in the forward-looking world camera image can be obtained using the gaze direction unit vector d.
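The projection steps in [0048] to [0053] can be checked numerically. The sketch below assumes an upper-triangular K with entries f_x, s, c_x, f_y, c_y, as implied by the scalar equations later in the description; the numeric values of K and the helper names `project_point` and `project_gaze` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical parameter values, for illustration only.
K = np.array([[600.0,   2.0, 320.0],
              [  0.0, 610.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_point(K, p):
    """Project a fixation point p = (x, y, z) via the depth-1 normalized plane."""
    x, y, z = p
    return (K @ np.array([x / z, y / z, 1.0]))[:2]

def project_gaze(K, d):
    """Project using only the gaze direction d = (alpha, beta, gamma)."""
    alpha, beta, gamma = d
    return (K @ np.array([alpha / gamma, beta / gamma, 1.0]))[:2]

d = np.array([0.05, -0.10, 1.0])
d /= np.linalg.norm(d)

# Two fixation points on the same gaze ray, at different distances.
uv_near = project_point(K, 0.5 * d)
uv_far = project_point(K, 3.0 * d)
uv_gaze = project_gaze(K, d)
```

All three results coincide, which is why the gaze direction alone is sufficient and the distance to the fixation point does not enter the model.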

[0054] Solving for matrix K using the least squares method:

[0055] Based on the nine sets of data obtained and the derived matrix equation, the unknowns in matrix K are first organized into a vector $x_k$ to be solved:

[0056] $x_k = \begin{bmatrix} f_x & f_y & c_x & c_y & s \end{bmatrix}^T$

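[0057] Let $x' = \alpha / \gamma$ and $y' = \beta / \gamma$. The relationship between the normalized image coordinates and the pixel coordinates is then equivalent to:

[0058] $u = f_x x' + s y' + c_x$;

[0059] $v = f_y y' + c_y$;

[0060] $1 = 1$;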

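[0061] Each set of calibration data $(\alpha_i, \beta_i, \gamma_i, u_i, v_i)$ yields two linear equations in the unknown $x_k$: $u_i = f_x x'_i + s y'_i + c_x$ and $v_i = f_y y'_i + c_y$. Therefore, for a single set of data i, the matrix form is:

[0062] $\begin{bmatrix} x'_i & 0 & 1 & 0 & y'_i \\ 0 & y'_i & 0 & 1 & 0 \end{bmatrix} x_k = \begin{bmatrix} u_i \\ v_i \end{bmatrix}$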

[0063] There are 9 sets of data (i.e., N=9). Each set of data provides 2 equations (2 rows). Therefore, all 9 sets of data need to be stacked together to form a large matrix A and a large vector b.

[0064] Matrix A will have 2 × 9 = 18 rows and 5 columns (because there are 5 unknowns).

[0065] Vector b will have 18 rows and 1 column.

[0066] The matrix A is constructed as follows, for the i-th data set (i from 1 to 9):

[0067] Row 2i-1 is: $[\, x'_i \;\; 0 \;\; 1 \;\; 0 \;\; y'_i \,]$;

[0068] Row 2i is: $[\, 0 \;\; y'_i \;\; 0 \;\; 1 \;\; 0 \,]$;

[0069] The b vector is constructed as follows:

[0070] The (2i-1)th element is: $u_i$;

[0071] The (2i)th element is: $v_i$;

[0072] A complete, overdetermined linear system is obtained:

[0073] $A x_k = b$;

[0074] Where A is an 18×5 matrix, $x_k$ is a 5×1 unknown vector, and b is an 18×1 observed vector.
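The stacking described above can be sketched in Python as follows. The function name `build_system` and the synthetic 3×3 grid of gaze directions and pixel targets are assumptions made for illustration; the column order (f_x, f_y, c_x, c_y, s) follows the row layout given in [0067] and [0068].

```python
import numpy as np

def build_system(gaze_vectors, pixels):
    """Stack two rows per calibration sample into A (2N x 5) and b (2N,)."""
    d = np.asarray(gaze_vectors, dtype=float)   # shape (N, 3)
    uv = np.asarray(pixels, dtype=float)        # shape (N, 2)
    xn = d[:, 0] / d[:, 2]                      # x'_i = alpha_i / gamma_i
    yn = d[:, 1] / d[:, 2]                      # y'_i = beta_i / gamma_i
    A = np.zeros((2 * len(d), 5))
    A[0::2, 0] = xn     # row 2i-1: coefficient of f_x
    A[0::2, 2] = 1.0    # row 2i-1: coefficient of c_x
    A[0::2, 4] = yn     # row 2i-1: coefficient of s
    A[1::2, 1] = yn     # row 2i:   coefficient of f_y
    A[1::2, 3] = 1.0    # row 2i:   coefficient of c_y
    b = uv.reshape(-1)  # [u_1, v_1, u_2, v_2, ...]
    return A, b

# Synthetic nine-point grid (hypothetical values; vectors are left unnormalized
# because only the component ratios enter the system).
grid = [(x, y, 1.0) for y in (-0.2, 0.0, 0.2) for x in (-0.3, 0.0, 0.3)]
pixels = [(320 + 600 * x, 240 + 600 * y) for x, y, _ in grid]
A, b = build_system(grid, pixels)
```

With nine well-spread points, A has 18 rows and full column rank 5, so the least-squares solution is unique.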

[0075] The goal is to find an $x_k$ that minimizes the sum of squared errors $\lVert A x_k - b \rVert^2$. Solving the normal equation $(A^T A) x_k = A^T b$ gives the best estimate of $x_k$:

[0076] $x_k = (A^T A)^{-1} A^T b$;
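As a hedged illustration of the solving step, the sketch below compares the normal-equation solution with NumPy's `np.linalg.lstsq`, which minimizes the same squared error through a different numerical route. The random matrix A is only a stand-in for the stacked 18×5 system, and the values in `x_true` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(18, 5))                          # stand-in for the stacked matrix
x_true = np.array([600.0, 610.0, 320.0, 240.0, 2.0])  # hypothetical f_x, f_y, c_x, c_y, s
b = A @ x_true + rng.normal(scale=0.5, size=18)       # noisy observations

# Normal equations (A^T A) x = A^T b, solved without forming an explicit inverse.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver applied to the same system.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# At the minimizer, the residual is orthogonal to the columns of A.
gradient = A.T @ (A @ x_normal - b)
```

Both routes agree when A is well conditioned; `lstsq` is generally the more robust choice if the calibration points are nearly collinear, although the source only specifies the normal equations.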

[0077] Finally, the intrinsic parameter matrix K is reconstructed using this solution:

[0078] $K = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$
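Rebuilding K from the solved vector is a matter of placing the five values in the right positions. The sketch below assumes the parameter order (f_x, f_y, c_x, c_y, s) implied by the row layout of A; the helper names `params_to_K` and `K_to_params` and the numeric values are hypothetical.

```python
import numpy as np

def params_to_K(x_k):
    """Place the five solved parameters into the 3x3 intrinsic-style matrix."""
    f_x, f_y, c_x, c_y, s = x_k
    return np.array([[f_x, s,   c_x],
                     [0.0, f_y, c_y],
                     [0.0, 0.0, 1.0]])

def K_to_params(K):
    """Inverse of params_to_K, useful for checking the ordering."""
    return np.array([K[0, 0], K[1, 1], K[0, 2], K[1, 2], K[0, 1]])

x_k = np.array([600.0, 610.0, 320.0, 240.0, 2.0])  # hypothetical solved values
K = params_to_K(x_k)
```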

[0079] In actual eye tracking, based on the 3D gaze vector d = (α, β, γ) obtained in real time and the intrinsic parameter matrix K of the mapping model obtained from prior calibration, the equation

[0080] $\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} \alpha / \gamma \\ \beta / \gamma \\ 1 \end{bmatrix}$

[0081] yields the pixel coordinates (u, v) of the gaze point in the forward-looking world camera view.
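At run time the mapping reduces to one small matrix product per gaze sample, which can be applied to a whole batch at once. The sketch below is illustrative only: the function names, the values of K, and the 640×480 frame size are assumptions, and the in-frame check is an addition that the source does not describe.

```python
import numpy as np

def gaze_to_pixels(K, gaze_vectors):
    """Map N gaze vectors (N x 3) to N pixel coordinates (N x 2)."""
    d = np.atleast_2d(np.asarray(gaze_vectors, dtype=float))
    homog = np.column_stack([d[:, 0] / d[:, 2],
                             d[:, 1] / d[:, 2],
                             np.ones(len(d))])
    return (homog @ K.T)[:, :2]

def inside_frame(uv, width, height):
    """Assumed post-check: flag gaze points that fall inside the scene image."""
    return ((uv[:, 0] >= 0) & (uv[:, 0] < width) &
            (uv[:, 1] >= 0) & (uv[:, 1] < height))

K = np.array([[600.0,   0.0, 320.0],    # hypothetical calibrated values
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
stream = np.array([[0.0, 0.0, 1.0],
                   [0.1, 0.0, 1.0],
                   [1.0, 0.0, 1.0]])
uv = gaze_to_pixels(K, stream)
visible = inside_frame(uv, width=640, height=480)
```

A gaze pointing far off-axis, such as the third sample, maps outside the assumed frame and is flagged accordingly.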

[0082] As shown in Figure 2, this embodiment provides a calibration-free eye-tracking system based on constructing intraocular parameters, including:

[0083] The first acquisition module is used to acquire the necessary raw data when the user gazes at a specific target point. This module consists of two parts: a near-eye camera and a world camera. The near-eye camera acquires images of the user's eyes and calculates a 3D gaze vector based on existing eye-tracking algorithms; the world camera simultaneously acquires images of the corresponding scene and determines the 2D coordinates of the user's current gaze target point in the image plane. The output data of this module consists of multiple 3D gaze vectors and their corresponding 2D gaze point coordinate pairs, which are used for subsequent modeling.

[0084] The second calibration module is used to establish the mapping relationship between the 3D gaze vectors and the world camera image plane. This module takes the multiple gaze vectors acquired by the first acquisition module and their corresponding 2D gaze points as input, and uses the least squares method to solve for the projection matrix parameters based on the camera intrinsic parameter model, thereby establishing a linear mapping model equivalent to the camera intrinsic parameter matrix. This model reflects the projection pattern of the 3D gaze vectors in the world camera image plane, serving as the core basis for subsequent inference.

[0085] The third inference module is used to calculate the coordinates of the user's current gaze point in the world camera image during actual eye tracking, based on the real-time obtained 3D gaze vector and the mapping matrix parameters obtained by the second calibration module. This module substitutes the input gaze vector into the established mapping relationship to infer the corresponding 2D image coordinates, and selects the center of the image region or other specified feature points at these coordinates as the eye tracking result, thereby achieving real-time gaze point localization without user interaction calibration.

[0086] Specifically, the first acquisition module takes as input the eye image and scene image when the user gazes at a specific target point, and outputs a three-dimensional gaze vector calculated based on the eye image and two-dimensional coordinates of the target point on the image plane; the second calibration module takes as input the three-dimensional gaze vector and two-dimensional coordinates corresponding to multiple target points, and outputs mapping matrix parameters calculated based on the camera intrinsic parameter model; the third inference module takes as input the three-dimensional gaze vector and mapping matrix parameters obtained in real time, and outputs two-dimensional coordinates of the current gaze point on the world camera image plane.
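The data flow between the three modules can be sketched as three small Python classes. This is a structural illustration under stated assumptions, not the patented implementation: the class and method names are hypothetical, the acquisition module simply replays precomputed samples instead of processing camera images, and `np.linalg.lstsq` stands in for the normal-equation solve.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CalibrationSample:
    gaze: np.ndarray    # 3D gaze vector from the near-eye camera pipeline
    target: np.ndarray  # 2D target coordinates in the world camera image

class AcquisitionModule:
    """Stand-in for the first module: replays precomputed samples."""
    def __init__(self, samples):
        self._samples = list(samples)

    def collect(self):
        return self._samples

class CalibrationModule:
    """Second module: fits the five mapping parameters by least squares."""
    def fit(self, samples):
        rows, rhs = [], []
        for sample in samples:
            xn = sample.gaze[0] / sample.gaze[2]
            yn = sample.gaze[1] / sample.gaze[2]
            rows += [[xn, 0.0, 1.0, 0.0, yn], [0.0, yn, 0.0, 1.0, 0.0]]
            rhs += [sample.target[0], sample.target[1]]
        x_k, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
        f_x, f_y, c_x, c_y, s = x_k
        return np.array([[f_x, s, c_x], [0.0, f_y, c_y], [0.0, 0.0, 1.0]])

class InferenceModule:
    """Third module: maps a real-time gaze vector to pixel coordinates."""
    def __init__(self, K):
        self.K = K

    def locate(self, gaze):
        return (self.K @ np.array([gaze[0] / gaze[2], gaze[1] / gaze[2], 1.0]))[:2]

# Synthetic factory data generated from a hypothetical ground-truth matrix.
K_true = np.array([[600.0, 2.0, 320.0], [0.0, 610.0, 240.0], [0.0, 0.0, 1.0]])
samples = []
for y in (-0.2, 0.0, 0.2):
    for x in (-0.3, 0.0, 0.3):
        gaze = np.array([x, y, 1.0])
        samples.append(CalibrationSample(gaze, (K_true @ gaze)[:2]))

K_fit = CalibrationModule().fit(AcquisitionModule(samples).collect())
gaze_point = InferenceModule(K_fit).locate(np.array([0.1, 0.1, 1.0]))
```

Only the fitted matrix crosses from the factory stage to the run-time stage, which matches the module inputs and outputs listed in [0086].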

[0087] The above are merely preferred embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.

Claims

1. A calibration-free eye-tracking method based on constructing intraocular parameters, characterized in that it comprises: acquiring eye images and scene images when the verifier gazes at multiple specific target points, and generating corresponding three-dimensional gaze vectors and two-dimensional coordinates of the target points based on existing algorithms; using the three-dimensional gaze vectors and two-dimensional coordinates, solving the mapping matrix parameters based on the camera intrinsic parameter model by the least squares method; in actual eye tracking, substituting the real-time acquired three-dimensional gaze vector into the mapping relationship model containing the mapping matrix parameters to calculate the two-dimensional coordinates of the gaze point in the world camera image; wherein the mapping relationship model uses the component ratios of the three-dimensional gaze vector as normalized image coordinates to establish a linear mapping relationship with the two-dimensional coordinates; the process of acquiring the three-dimensional gaze vectors and two-dimensional coordinates is performed in a controlled offline environment before the device leaves the factory; the mapping relationship model is a camera intrinsic parameter model, which specifically establishes a linear mapping with two-dimensional pixel coordinates by using the component ratios of the three-dimensional gaze vector as normalized image coordinates; the three-dimensional gaze vector is represented by a unit vector d = (α, β, γ), and the component ratios are α / γ and β / γ; the solution process includes: constructing a linear equation about the mapping matrix parameters for each set of target point data; integrating all equations to form an overdetermined linear system; and obtaining the optimal estimates of the mapping matrix parameters by solving the normal equations.

2. The calibration-free eye-tracking method based on constructing intraocular parameters as described in claim 1, characterized in that the calculation process is as follows: the component ratios of the real-time three-dimensional gaze vector are substituted into the equation containing the mapping matrix parameters to directly compute the two-dimensional pixel coordinates.

3. The calibration-free eye-tracking method based on constructing intraocular parameters as described in claim 1, characterized in that the gaze point is defined as the center point of the image region at the calculated two-dimensional coordinate position.

4. A system for the calibration-free eye-tracking method based on constructing intraocular parameters according to any one of claims 1-3, characterized in that it comprises: a first acquisition module, used to acquire eye images and scene images when the verifier gazes at multiple specific target points, and to generate corresponding three-dimensional gaze vectors and two-dimensional coordinates of the target points based on existing algorithms; a second calibration module, used to receive the three-dimensional gaze vectors and two-dimensional coordinates, and to solve the mapping matrix parameters based on the camera intrinsic parameter model by the least squares method; and a third inference module, used to substitute the real-time three-dimensional gaze vector into the mapping relationship model containing the mapping matrix parameters to calculate the two-dimensional coordinates of the gaze point in the world camera image; wherein the mapping relationship model establishes a linear mapping using the component ratios of the three-dimensional gaze vector as normalized image coordinates.