The invention provides a line-of-sight estimation method based on a deep appearance-based gaze network. The main content of the method includes a gaze data set, a gaze network, and cross-data-set evaluation. The method comprises the following steps: a large number of images from different participants are collected as a gaze data set, and facial landmarks are manually annotated on subsets of the data set; face calibration is performed on input images captured by a monocular RGB camera; a face detection method and a facial landmark detection method are adopted to locate the landmarks; and a generic three-dimensional face shape model is fitted to the detected landmarks to estimate the three-dimensional face pose (a sketch of this step is given below).
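A minimal sketch of this pose-fitting step, assuming OpenCV's solvePnP and a small generic 3D face model; the landmark set and its coordinates below are hypothetical placeholders, not the face model specified by the invention.

```python
import numpy as np
import cv2

# Generic 3D face shape model: approximate coordinates (in millimetres) of a few
# facial landmarks in a face-centred coordinate system (hypothetical values).
FACE_MODEL_3D = np.array([
    [-45.0, -35.0, -25.0],   # right eye outer corner
    [-15.0, -35.0, -30.0],   # right eye inner corner
    [ 15.0, -35.0, -30.0],   # left eye inner corner
    [ 45.0, -35.0, -25.0],   # left eye outer corner
    [-25.0,  35.0, -20.0],   # right mouth corner
    [ 25.0,  35.0, -20.0],   # left mouth corner
], dtype=np.float64)

def estimate_head_pose(landmarks_2d, camera_matrix, dist_coeffs=None):
    """Fit the generic 3D face model to detected 2D landmarks and return the
    head rotation matrix and translation vector in camera coordinates."""
    if dist_coeffs is None:
        dist_coeffs = np.zeros(5)
    ok, rvec, tvec = cv2.solvePnP(
        FACE_MODEL_3D, landmarks_2d.astype(np.float64),
        camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_EPNP)
    if not ok:
        raise RuntimeError("PnP fitting failed")
    rotation, _ = cv2.Rodrigues(rvec)   # 3x3 head rotation matrix
    return rotation, tvec               # head pose in camera coordinates
```

The recovered rotation and translation give the head pose in the camera coordinate system, which the subsequent normalization step consumes.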
Next, a spatial normalization technique is applied: the head pose and the eye images are warped into a normalized training space, as in the sketch below.
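A minimal sketch of the spatial normalization, assuming NumPy/OpenCV and the commonly used virtual-camera formulation (rotate and scale a virtual camera so that it looks at the eye centre from a fixed distance); the focal length, distance, and patch size below are illustrative parameters, not values fixed by the invention.

```python
import numpy as np
import cv2

def normalize_eye_image(image, eye_center, head_rotation, camera_matrix,
                        focal_norm=960.0, distance_norm=600.0, size=(60, 36)):
    """Warp the input image so that a virtual camera looks straight at the eye
    centre from a fixed distance, cancelling most of the head-pose variation."""
    # Intrinsics of the normalized (virtual) camera.
    cam_norm = np.array([[focal_norm, 0.0, size[0] / 2.0],
                         [0.0, focal_norm, size[1] / 2.0],
                         [0.0, 0.0, 1.0]])
    distance = np.linalg.norm(eye_center)
    scale = np.diag([1.0, 1.0, distance_norm / distance])

    # Rotation that points the virtual camera's z-axis at the eye centre while
    # keeping its x-axis as close as possible to the head's x-axis.
    forward = (eye_center / distance).reshape(3)
    down = np.cross(forward, head_rotation[:, 0])
    down /= np.linalg.norm(down)
    right = np.cross(down, forward)
    right /= np.linalg.norm(right)
    rot = np.vstack([right, down, forward])          # rows: x, y, z axes

    # Perspective warp from the real camera to the normalized camera.
    warp = cam_norm @ scale @ rot @ np.linalg.inv(camera_matrix)
    eye_patch = cv2.warpPerspective(image, warp, size)

    # Head rotation expressed in the normalized space (input to the network).
    head_rotation_norm = rot @ head_rotation
    return eye_patch, head_rotation_norm
```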
Finally, a convolutional neural network is used to learn the mapping from the head pose and the eye images to the three-dimensional gaze direction in the camera coordinate system; a sketch of such a network follows.
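A minimal sketch, in PyTorch, of a convolutional network of this kind; the layer sizes, the 36x60 input resolution, and the choice to regress two gaze angles that are then converted to a unit vector are illustrative assumptions, not the exact architecture claimed by the invention.

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional feature extractor over a 1x36x60 grayscale eye patch.
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(20, 50, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Fully connected layers; the 2-D head-pose angles are concatenated
        # with the image features before the final regression layer.
        self.fc1 = nn.Linear(50 * 6 * 12, 500)
        self.fc2 = nn.Linear(500 + 2, 2)   # outputs (pitch, yaw) gaze angles

    def forward(self, eye_image, head_pose):
        x = self.features(eye_image)
        x = torch.relu(self.fc1(x.flatten(1)))
        x = torch.cat([x, head_pose], dim=1)
        return self.fc2(x)

def angles_to_vector(angles):
    """Convert (pitch, yaw) gaze angles to a 3-D unit gaze vector."""
    pitch, yaw = angles[:, 0], angles[:, 1]
    return torch.stack([-torch.cos(pitch) * torch.sin(yaw),
                        -torch.sin(pitch),
                        -torch.cos(pitch) * torch.cos(yaw)], dim=1)
```

The predicted gaze direction in the normalized space can be rotated back with the normalization rotation to obtain the gaze vector in the camera coordinate system.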
According to the method, a continuous conditional neural network model is employed to detect the facial landmarks, and average face shapes are used to perform three-dimensional pose estimation. The method is suitable for line-of-sight estimation in different environments and improves the accuracy of the estimation results.