Method for locating a line of sight impact point and monitoring screen device

By combining multiple camera sensors with a convolutional neural network (CNN), the method reduces the impact of environmental factors on image pattern recognition, achieving high-precision gaze point localization and ROI detection and thereby improving overall recognition accuracy.

CN116363200B · Active · Publication Date: 2026-06-23 · ZHEJIANG UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG UNIV OF TECH
Filing Date
2023-02-21
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

In existing technologies, image pattern recognition is affected by environmental factors such as lighting, translation, tilt, occlusion, and blur, resulting in low accuracy, especially in the target detection stage.

Method used

This method employs multiple camera sensors to acquire user facial images, combines them with a CNN convolutional neural network, and uses training and testing sets to locate the gaze point. It defines the eye region by utilizing the gradient modulus distribution of grayscale images and connecting edge curves with extreme points, and combines a fully connected layer classifier to determine the gaze point, thus achieving high-precision ROI region detection.

Benefits of technology

It achieves high-precision line-of-sight positioning in complex environments, which helps improve the accuracy of pattern recognition and provides a solid ROI region foundation for subsequent algorithm recognition.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses a line-of-sight landing point positioning method and a monitoring screen device thereof. The method comprises the following steps: 1) collecting user face images through the multi-path camera sensors of the monitoring screen device; 2) inputting the collected face images into a line-of-sight landing point positioning system; 3) extracting the eye area from the image signal received by each sensor; 4) training a CNN convolutional neural network to generate a line-of-sight landing point positioning system based on multi-path sensor images; and 5) in actual application, outputting the position at which the user gazes on the screen, completing the line-of-sight landing point positioning function. The application uses multiple camera sensors placed far from the eyeballs and arranged around the monitoring screen; based on image processing of ordinary visible-light imaging and assisted by an artificial intelligence algorithm, it analyzes the landing point of the user's line of sight on the monitoring screen.

Description

Technical Field

[0001] This invention relates to the field of computer pattern recognition technology, specifically to a method for locating the point of gaze and a monitoring screen device thereof.

Background Technology

[0002] Image-based computer pattern recognition commonly includes facial recognition, license plate recognition, fingerprint recognition, iris recognition, and QR code recognition. In industrial applications, machine vision-based equipment fault detection, product quality inspection, and automatic control often require processing acquired images using built-in algorithms. Examples include the positioning of optical mice, X-ray detection of equipment faults, and reading the draft of bulk carriers. In many cases, factors affecting the accuracy of the algorithm include differences in ambient lighting during image acquisition, translation or rotation of the target (movement or tilt of the camera relative to the target during acquisition), occlusion of the target by interference objects (or damage to the target), and blur (focus blur of the camera or motion blur relative to the target).

[0003] Image pattern recognition algorithms consist of two stages. The first is "Region of Interest (ROI) localization," or target detection, which involves using a bounding box to enclose the target region (i.e., outputting a sub-square in the original image containing the target to be identified). The second stage involves feeding the pixels of the ROI region into the recognition algorithm to complete the formal pattern recognition. For example, face recognition first performs face detection and then face recognition; similarly, other methods, such as recognizing the draft of a bulk carrier, require a preliminary step of detecting and extracting a waterline scale region.

[0004] Internationally, eye-tracking technology, exemplified by Tobii, uses sensors suspended between the eyebrows to monitor the movement of the two eyes relative to the head through iris features, pupil reflection, or infrared imaging. This analysis determines the point of gaze and can be used to monitor driver focus, student attention, and so on. This is one type of optical tracking method. Other solutions use contact sensors to detect pupil movement or voltage around the eyes to determine the eye's position relative to the head.

[0005] In summary, existing image pattern recognition technologies are limited by environmental factors such as lighting, translation, tilt, occlusion, and blurring, as well as the recognition context. These factors have a fatal impact on pattern recognition accuracy, primarily affecting the target detection stage. However, the line-of-sight localization proposed in this invention can be well applied to any pattern recognition context, thus achieving satisfactory target detection accuracy even under these critical conditions, thereby helping to improve the accuracy of the pattern recognition stage.

Summary of the Invention

[0006] To address the problems existing in the prior art, this invention provides a reasonably designed method for locating the line of sight and a monitoring screen device thereof.

[0007] The technical solution of the present invention is as follows:

[0008] A method for locating the gaze point includes the following steps:

[0009] 1) Collect user facial images using multiple camera sensors on the monitoring screen device;

[0010] 2) Input the collected facial images into the constructed gaze-point localization system;

[0011] 3) Extract the eye region from the image signals received by each camera sensor and divide them into training and test sets;

[0012] 4) Train a CNN convolutional neural network using the training set to generate a gaze-point localization system based on multi-sensor images;

[0013] 5) Input the test set into the gaze-point localization system based on multi-sensor images, output the position of the user's gaze on the screen, and complete the gaze-point localization of the entire system.
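The division into training and test sets in step 3) can be sketched as a seeded random split. The patent does not state a split ratio or method, so the 80/20 ratio, the fixed seed, and the function name `split_samples` are all assumptions made for illustration.

```python
import random


def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples reproducibly and return (training_set, test_set).

    test_fraction and seed are illustrative assumptions, not values from the patent.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one sample in the test set.
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]
```

In practice each sample would be one group of eye-region images (one per camera) together with its gaze label, so that the images of a single capture are never separated between the two sets.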

[0014] Furthermore, step 3) is as follows:

[0015] 1) Convert the image to grayscale and calculate the gradient magnitude distribution of the grayscale image;

[0016] 2) Connect the extreme points of the gradient magnitude to form an edge curve;

[0017] 3) Define the eye portion using paired closed elliptical edge curves;

[0018] 4) Based on the distance between the two eyes, a square area is selected to completely include the eye area, thus completing the facial eye area detection.

[0019] Furthermore, step 4) is as follows:

[0020] 1) Training data collection: Divide the screen into multiple grid areas and have different users look at different grid areas in front of the screen. At the same time, multiple cameras on the monitoring screen device collect images of users looking at different areas. For each fixed area, multiple images from multiple users are saved uniformly. With N grids, N+1 types of data are collected, including N types of grid areas that are looked at and 1 type of data that is not looked at.

[0021] 2) For N+1 gaze patterns, design a fully connected layer classifier with M leaf nodes, where 2^M>N+1; perform eye region detection, convolution operation, and pooling on the multi-channel images of each gaze data, and then convert the pooled data of the multi-channel into a one-dimensional vector, that is, obtain a one-dimensional vector by traversing the rows and columns or zigzags of the pooling blocks corresponding to the multi-channel images, and concatenate the corresponding vectors of the multi-channel images.

[0022] 3) Label the output values of the one-dimensional vectors with the binary numbers of the corresponding gaze patterns, and then use these vectors as inputs and the numbers as outputs to train a classifier of the fully connected layer, and finally obtain the trained CNN convolutional neural network.
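The condition 2^M > N+1 and the binary labelling of gaze patterns can be illustrated with a short sketch. It picks the smallest M that satisfies the condition (the patent only requires that the condition holds) and writes each class number as M bits, one bit per output node. The function names and the most-significant-bit-first ordering are assumptions for illustration.

```python
def min_output_nodes(n_grids: int) -> int:
    """Smallest M with 2**M > n_grids + 1 (N gazed cells plus one 'not gazing' class)."""
    m = 1
    while 2 ** m <= n_grids + 1:
        m += 1
    return m


def encode_label(class_id: int, m: int) -> list:
    """Binary code of a class number, most significant bit first (assumed ordering)."""
    if not 0 <= class_id < 2 ** m:
        raise ValueError("class_id does not fit in m bits")
    return [(class_id >> bit) & 1 for bit in range(m - 1, -1, -1)]


def decode_label(bits) -> int:
    """Inverse of encode_label: turn the output bits back into a class number."""
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value
```

For a 16-grid screen this gives M = 5, since 2^5 = 32 > 17 while 2^4 = 16 does not exceed 17. Some of the 32 possible codes remain unused and would need to be treated as invalid outputs.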

[0023] Furthermore, the positioning process of the line-of-sight positioning system based on multi-sensor images in step 4) is as follows:

[0024] 1) When the user is looking at the screen, multiple camera sensors capture the user's facial images in real time;

[0025] 2) Extract the eye region from multiple images and input it into the gaze-point localization system;

[0026] 3) Convolve the data from multiple eye regions;

[0027] 4) Pooling of multi-channel convolutional data;

[0028] 5) Convert multi-pooled data into a one-dimensional vector;

[0029] 6) Input the one-dimensional vector into the fully connected layer classifier to obtain the number of the gaze region and determine the localization location.

[0030] A monitoring screen device includes a display screen and four cameras deployed at the four corners of the display screen; it can be used to implement the line-of-sight positioning method of the present invention.

[0031] The beneficial effects of this invention are as follows:

[0032] This invention uses multiple camera sensors, positioned far from the eye and around the monitoring screen. Based on image processing using ordinary visible light imaging, and aided by artificial intelligence algorithms, it analyzes the point where the user's gaze falls on the monitoring screen. Once the position of this focal point on the screen is determined, the ROI region can be determined with relatively high accuracy, thus achieving the goal of perfect target detection and laying a solid foundation for further determination of the ROI boundary.

Attached Figure Description

[0033] Figure 1 is a system block diagram of the present invention;

[0034] Figure 2 is a network architecture diagram of the present invention;

[0035] Figure 3 is a screen partitioning diagram of the present invention;

[0036] Figure 4 is a schematic diagram of the positioning decision of the present invention.

Detailed Implementation

[0037] The present invention will be further described below with reference to the accompanying drawings.

[0038] As shown in Figure 1, the present invention is composed of software and hardware working together. The hardware includes a display screen and four cameras deployed at specific locations on the display screen. In this embodiment, the four cameras are located at the four corners of the display screen.

[0039] When a user looks at the screen, four cameras capture the user's facial image from their respective positions and input it into the gaze positioning software. The software analyzes and makes decisions to output the position where the user is looking at the screen, thus completing the gaze positioning function of the entire system.

[0040] As shown in Figure 2, the gaze-point localization software consists of two parts: eye region detection and localization calculation based on a CNN (Convolutional Neural Network). The localization calculation based on the CNN is further divided into three parts: convolutional layers, pooling layers, and a fully connected layer classifier for the image.

[0041] Eye region detection: The process of extracting the eye region from the image signals received by each sensor is as follows:

[0042] 1) Convert the image to grayscale and calculate the gradient magnitude distribution of the grayscale image;

[0043] 2) Connect the extreme points of the gradient magnitude to form an edge curve;

[0044] 3) Define the eye portion using paired closed elliptical edge curves;

[0045] 4) Based on the distance between the two eyes, a square area is selected to completely include the eye area, thus completing the facial eye area detection.

[0046] 5) The extracted facial eye region is used as the input for viewpoint localization calculation.
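Steps 1) and 4) of the eye region detection can be sketched with NumPy. This is only a partial illustration: linking the gradient extrema into edge curves and finding the paired closed elliptical curves (steps 2 and 3) is not shown, and the eye centres are assumed to be already known. The grayscale weights (the common 0.299/0.587/0.114 luma weights), the `scale` factor for the square, and all function names are assumptions.

```python
import numpy as np


def to_gray(rgb):
    """Weighted sum of the R, G, B channels (assumed luma weights)."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def gradient_magnitude(gray):
    """Gradient modulus of a grayscale image using finite differences."""
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    return np.hypot(gx, gy)


def eye_square(left_eye, right_eye, shape, scale=2.0):
    """Square box centred between the eyes, side = scale * inter-eye distance.

    Eye centres are (x, y) pixel coordinates; the box is clipped to the image.
    Returns (left, top, right, bottom).
    """
    (x1, y1), (x2, y2) = left_eye, right_eye
    side = int(round(scale * float(np.hypot(x2 - x1, y2 - y1))))
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    height, width = shape[:2]
    left = max(0, int(round(cx - side / 2)))
    top = max(0, int(round(cy - side / 2)))
    return left, top, min(width, left + side), min(height, top + side)
```

The pixels inside the returned box would then be cropped from each camera image and passed on to the CNN-based localization calculation.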

[0047] As shown in Figure 3, before the localization calculation based on the CNN convolutional neural network, the CNN convolutional neural network needs to be trained. The screen is divided into multiple regions using a nine-square grid or a sixteen-square grid to collect training data.
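The screen partitioning can be sketched as a mapping from a grid number to a pixel rectangle. The patent does not say how the cells are numbered, so the 0-based, row-major numbering used here is an assumption, as are the function names and the example resolution.

```python
def grid_cell_rect(index, rows, cols, width, height):
    """Pixel rectangle (left, top, right, bottom) of a grid cell.

    Cells are numbered 0..rows*cols-1 in row-major order (an assumed convention).
    """
    if not 0 <= index < rows * cols:
        raise ValueError("cell index out of range")
    row, col = divmod(index, cols)
    left = col * width // cols
    right = (col + 1) * width // cols
    top = row * height // rows
    bottom = (row + 1) * height // rows
    return left, top, right, bottom


def grid_cell_center(index, rows, cols, width, height):
    """Centre of a grid cell, usable as the estimated gaze position on the screen."""
    left, top, right, bottom = grid_cell_rect(index, rows, cols, width, height)
    return (left + right) // 2, (top + bottom) // 2
```

For example, on an assumed 1920x1080 screen with a sixteen-square (4x4) grid, cell 5 covers the rectangle from (480, 270) to (960, 540).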

[0048] The training process for a CNN convolutional neural network is as follows:

[0049] 1) The first step is to collect training data by having different users gaze at different grid areas on the screen, while cameras at the four corners capture images of users gazing at different areas. For each fixed area, the four images from multiple users are saved uniformly; if there is a 16-grid layout, then 17 types of data are collected, including 16 types of gazed grid areas and 1 type of non-gazed grid data.

[0050] 2) For 17 gaze patterns, a fully connected layer classifier with 5 leaf nodes (2^5 = 32 > 17) is designed; then, eye region detection, convolution operation, and pooling are performed on the four images of each gaze pattern. Then, the pooled data of the four images are converted into a one-dimensional vector (by traversing the rows, columns, or zigzags of the pooling blocks corresponding to the four images to obtain a one-dimensional vector, and concatenating the first and last ends of the corresponding vectors of the four images to form a one-dimensional vector with a length four times that of the original).

[0051] 3) Label the output value of the one-dimensional vector with the binary number of the corresponding gaze pattern, and then use these vectors as input and the numbers as output to train the classifier of the fully connected layer, and finally obtain the trained CNN convolutional neural network.
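The conversion of the pooled data into a single one-dimensional vector in step 2) can be sketched as follows. Each pooled block is assumed to be a 2-D array; "zigzag" is interpreted here as a back-and-forth (snake) row scan, which is one possible reading of the text and not confirmed by the patent. The function names are hypothetical.

```python
import numpy as np


def flatten_block(block, order="row"):
    """Turn one 2-D pooled block into a 1-D vector using the chosen traversal."""
    block = np.asarray(block)
    if order == "row":
        return block.reshape(-1)
    if order == "column":
        return block.T.reshape(-1)
    if order == "zigzag":
        # Assumed interpretation: reverse every second row, then read row by row.
        snake = block.copy()
        snake[1::2] = block[1::2, ::-1]
        return snake.reshape(-1)
    raise ValueError("unknown traversal order")


def concat_views(blocks, order="row"):
    """Join the vectors of all camera views end to end."""
    return np.concatenate([flatten_block(b, order) for b in blocks])
```

With four cameras and equally sized pooled blocks, the concatenated vector is four times as long as the vector of a single view. Whichever traversal is chosen, the same one has to be used during training and during positioning.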

[0052] As shown in Figure 4, the positioning process of the line-of-sight positioning system based on 4-channel sensor images is as follows:

[0053] 1) When the user is looking at the screen, four cameras capture the user's facial images in real time;

[0054] 2) Detect the eye region in each of the four images and feed the extracted regions into the gaze-point localization system;

[0055] 3) Convolve the data from the four eye regions;

[0056] 4) Pool the 4-way convolutional data;

[0057] 5) Convert the 4-way pooled data into a one-dimensional vector;

[0058] 6) Input the vector into the fully connected layer classifier to obtain the number of the gaze region and determine the localization location.
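Steps 3) to 6) of the positioning process can be sketched as a single forward pass in NumPy. This is a minimal illustration, not the patented network: it uses one convolution kernel shared by all views, 2x2 max pooling, and a single linear layer whose outputs are thresholded into bits. The kernel, the weights, the bias, the pooling size, and the function names are placeholders and assumptions; in a real system the parameters would come from training.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool(feature, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a block are dropped."""
    h = feature.shape[0] - feature.shape[0] % size
    w = feature.shape[1] - feature.shape[1] % size
    blocks = feature[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


def locate_gaze(eye_regions, kernel, weights, bias):
    """Convolve, pool, flatten and concatenate the views, classify, decode the region number."""
    vectors = [max_pool(conv2d_valid(r, kernel)).reshape(-1) for r in eye_regions]
    x = np.concatenate(vectors)
    logits = weights @ x + bias
    bits = (logits > 0).astype(int)  # same decision as sigmoid(logit) > 0.5
    region = 0
    for b in bits:  # most significant bit first (assumed ordering)
        region = (region << 1) | int(b)
    return region
```

The returned number is the binary code produced by the output nodes. Mapping it to a grid area, or to the "not gazing" class, depends on the numbering used when the training labels were assigned.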

[0059] The line-of-sight positioning system is used to determine the location of the line of sight, which will be used for further ROI delimitation;

[0060] Taking image-guided, man-portable cruise missiles as an example, the process involves aiming, launching, and then leaving the target unattended. Specifically, the soldier aims the launch tube at a moving target, then watches the target on a monitor. The monitor, equipped with line-of-sight positioning, accurately determines the soldier's gaze point. Subsequently, built-in software determines the target pixel boundaries based on the viewpoint position and motion frame differences, thus identifying the Region of Interest (ROI). The navigation algorithm then locks onto this ROI for real-time matching and tracking.

[0061] This invention is suitable for image pattern recognition under manual monitoring. Through the design of the image monitoring screen and its supporting software, it automatically detects and localizes pattern targets, creating the conditions for accurate target recognition by subsequent algorithms. Because the human eye locks onto a target easily and accurately during manual monitoring, the method can provide highly accurate target detection and localization, which in turn facilitates accurate ROI box delimitation (the examples in this patent show how ROI delimitation is performed once the target location is determined) and makes accurate subsequent pattern recognition easier to achieve.

[0062] This invention estimates the location of the Region of Interest (ROI) by identifying the focal point of the human eye on the screen through a combination of specialized hardware and software design. This method and system can be applied to automatic pattern recognition in scenarios involving manual surveillance. For example, when customs uses drone cameras to read waterline markings on ships, the monitoring screen only needs to identify the focal point of the human eye to determine the location of the ROI. Then, by utilizing the color difference between the markings and the background, the boundary range of the ROI can be determined. Another example is image-guided, man-portable cruise missiles. The soldier looks at the target on the monitoring screen and clicks to launch. The launching device locks onto the target based on the soldier's gaze, allowing the navigation and control system to guide the missile to track and attack the target.
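For the waterline-marking example, ROI delimitation from a known gaze position and a color difference can be sketched as below. The patent does not describe the procedure in detail, so everything here is an assumption: the search window around the gaze point, the use of the window's median color as the background estimate (which presumes the markings cover less than half of the window), the Euclidean color distance, the threshold value, and the function name.

```python
import numpy as np


def roi_from_gaze(image, gaze_xy, window=40, threshold=60.0):
    """Bounding box (left, top, right, bottom) of pixels near the gaze point
    whose color differs from the local background, or None if there are none.

    image is an H x W x 3 array; gaze_xy is the (x, y) gaze position in pixels.
    """
    gx, gy = gaze_xy
    height, width = image.shape[:2]
    x0, x1 = max(0, gx - window), min(width, gx + window + 1)
    y0, y1 = max(0, gy - window), min(height, gy + window + 1)
    patch = image[y0:y1, x0:x1].astype(np.float64)
    # Assumed background estimate: per-channel median color of the search window.
    background = np.median(patch.reshape(-1, patch.shape[-1]), axis=0)
    distance = np.linalg.norm(patch - background, axis=-1)
    ys, xs = np.nonzero(distance > threshold)
    if ys.size == 0:
        return None
    return (int(x0 + xs.min()), int(y0 + ys.min()),
            int(x0 + xs.max()) + 1, int(y0 + ys.max()) + 1)
```

The gaze position passed in would typically be the centre of the grid area reported by the line-of-sight positioning system.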

Claims

1. A method for locating the gaze point, characterized in that it includes the following steps:
1) The user's facial images are captured by multiple camera sensors on the monitoring screen device;
2) Input the collected facial images into the constructed gaze-point localization system;
3) Extract the eye region from the image signals received by each camera sensor and divide them into training and test sets;
The positioning process of the line-of-sight positioning system based on multi-sensor images in step 3) is as follows:
  1) When the user is looking at the screen, multiple camera sensors capture the user's facial images in real time;
  2) Extract the eye region from multiple images and input it into the gaze-point localization system;
  3) Convolve the data from multiple eye regions;
  4) Pooling of multi-channel convolutional data;
  5) Convert multi-pooled data into a one-dimensional vector;
  6) Input the one-dimensional vector into the fully connected layer classifier to obtain the number of the gaze region and determine the localization location;
4) Train a CNN convolutional neural network using the training set to generate a gaze-point localization system based on multi-sensor images;
Step 4) is as follows:
  1) Training data collection: Divide the screen into multiple grid areas and have different users look at different grid areas in front of the screen. At the same time, multiple camera sensors on the monitoring screen device collect images of users looking at different areas. For each fixed area, multiple images from multiple users are saved uniformly. Dividing the screen into N grids requires collecting N+1 types of data, including N types of grid areas being looked at and 1 type of data not being looked at.
  2) For N+1 gaze patterns, design a fully connected layer classifier with M leaf nodes, where 2^M>N+1; perform eye region detection, convolution operation, and pooling on the multi-channel images of each gaze data, and then convert the multi-channel pooling data into a one-dimensional vector, that is, obtain a one-dimensional vector by traversing the rows, columns, or zigzags of the pooling blocks corresponding to the multi-channel images, and concatenate the corresponding vectors of the multi-channel images.
  3) Label the output value of the one-dimensional vector with the binary number of the corresponding gaze pattern, and then use these vectors as input and the numbers as output to train the classifier of the fully connected layer, and finally obtain the trained CNN convolutional neural network.
5) Input the test set into the gaze-point localization system based on multi-sensor images, output the position of the user's gaze on the screen, and complete the gaze-point localization of the entire system.

2. The method for locating the gaze point according to claim 1, characterized in that step 3) is as follows:
1) Convert the image to grayscale and calculate the gradient magnitude distribution of the grayscale image;
2) Connect the extreme points of the gradient magnitude to form an edge curve;
3) Define the eye portion using paired closed elliptical edge curves;
4) Based on the distance between the two eyes, a square area is selected to completely include the eye area, thus completing the facial eye area detection.

3. A monitoring screen device, characterized in that the monitoring screen device is used for the line-of-sight positioning method according to any one of claims 1-2, and includes a display screen and four cameras deployed at the four corners of the display screen.