Gaze point estimation device, selected path prediction device, movement device, gaze point estimation method, selected path prediction method, and program
Patent Information
- Authority / Receiving Office
- JP · JP
- Patent Type
- Patents
- Current Assignee / Owner
- HIROSHIMA CITY UNIVERSITY
- Filing Date
- 2022-05-25
- Publication Date
- 2026-06-15
Figures
- Figure 0007873814000001
- Figure 0007873814000002
- Figure 0007873814000003
Claims
[Claim 1] A gaze point estimation device that estimates the gaze point of a subject, comprising:
a first camera that images the subject's face from the left front side;
a second camera, whose positional relationship with the first camera is fixed, that images the subject's face from the right front side;
a 360-degree camera, whose positional relationship with the first and second cameras is fixed, that images the area around the subject;
a learning data generation unit that generates learning data for supervised learning, the learning data having, as input data, a first image captured by the first camera and a second image captured by the second camera and, as output data, the position coordinates of the gaze point in a unit spherical coordinate system in a third image captured by the 360-degree camera; and
an estimation unit that includes a deep learning model and that, after supervised learning of the deep learning model using the learning data generated by the learning data generation unit, estimates the position coordinates of the gaze point with the deep learning model based on a newly captured first image and second image.
[Claim 2] The gaze point estimation device according to claim 1, wherein the deep learning model comprises a first neural network, which is a convolutional neural network that takes the first image and the second image as inputs.
[Claim 3] The gaze point estimation device according to claim 2, wherein the deep learning model comprises a second neural network, which is a neural network that obtains information for estimating the gaze point based on time-series data output from the first neural network.
[Claim 4] The gaze point estimation device according to claim 3, wherein the second neural network is an LSTM (Long Short-Term Memory) network.
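
As an illustration of the "position coordinates of the gaze point in a unit spherical coordinate system" recited in claim 1, the following is a minimal sketch of one way a pixel in the 360-degree image could be mapped to a point on the unit sphere and back. It assumes the third image is stored in equirectangular projection and uses an arbitrary axis convention (x forward, z up); neither is stated in the claims, and the function names `pixel_to_unit_sphere` and `unit_sphere_to_pixel` are hypothetical.

```python
import math


def pixel_to_unit_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a point (x, y, z) on the unit sphere.

    Assumption: u spans longitude -pi..pi left to right and v spans
    latitude +pi/2..-pi/2 top to bottom. The claims do not fix this layout.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)


def unit_sphere_to_pixel(x, y, z, width, height):
    """Inverse mapping: a unit-sphere point back to equirectangular pixel coordinates."""
    lon = math.atan2(y, x)
    lat = math.asin(max(-1.0, min(1.0, z)))  # clamp against rounding error
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return (u, v)


# Hypothetical training label: the location the subject was asked to gaze at,
# detected at pixel (100, 50) in a 400 x 200 third image.
label = pixel_to_unit_sphere(100, 50, 400, 200)
```

Expressing the label on the unit sphere rather than in pixels avoids the discontinuity at the left and right edges of an equirectangular image, which may be one reason for the choice of coordinate system, although the claims do not say so.
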
[Claim 5] The gaze point estimation device according to claim 1, wherein the learning data generation unit generates the learning data by detecting, as the gaze point, a specific location in the third image, the first, second, and third images having been captured while the subject gazes at that specific location.
[Claim 6] A mobile device comprising:
a drive unit that drives a mobile body carrying a person;
the gaze point estimation device according to any one of claims 1 to 5; and
a control unit that controls the drive unit so that the mobile body moves in the direction of the position coordinates of the subject's gaze point estimated by the gaze point estimation device.
[Claim 7] A selected path prediction device that predicts the path a subject will select, comprising:
a first camera that outputs time-series data of a first image obtained by imaging the subject's face from the left front side;
a second camera, whose positional relationship with the first camera is fixed, that outputs time-series data of a second image obtained by imaging the subject's face from the right front side;
a 360-degree camera, whose positional relationship with the first and second cameras is fixed, that outputs time-series data of a third image obtained by imaging the area around the subject;
a gaze point estimation unit that estimates, using deep learning, time-series data of the position coordinates of the subject's gaze point based on the time-series data of the first image and the time-series data of the second image;
an environmental structure estimation unit that estimates, using deep learning, time-series data of the environmental structure related to the paths around the subject based on the time-series data of the third image; and
a path prediction unit that predicts, using deep learning, the path selected by the subject based on the time-series data of the position coordinates of the subject's gaze point estimated by the gaze point estimation unit and on time-series data of either the environmental structure estimated by the environmental structure estimation unit or feature vectors obtained intermediately when estimating the environmental structure.
[Claim 8] The selected path prediction device according to claim 7, wherein
the environmental structure estimation unit comprises:
a convolutional neural network that takes the time-series data of the third image as input;
a recurrent neural network that obtains feature vectors for estimating the time-series data of the environmental structure based on time-series data output from the convolutional neural network; and
a fully connected layer that combines the feature vectors output from the recurrent neural network and outputs the estimated time-series data of the environmental structure, and
the path prediction unit predicts the path the subject will select based on the estimated time-series data of the environmental structure.
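
The two alternative inputs to the path prediction unit in claim 7 (the estimated environmental structure, or intermediate feature vectors) can be pictured with a short sketch. It assumes the gaze coordinates and the environment data are simply concatenated at each time step, and it uses a made-up three-element encoding of the environmental structure; the claims only say that the prediction is "based on" both series, so the helper `build_predictor_inputs` and all values below are hypothetical.

```python
def build_predictor_inputs(gaze_seq, env_seq):
    """Concatenate, per time step, the gaze coordinates with an environment vector.

    Assumption: plain concatenation. The claims do not specify how the two
    time series are combined.
    """
    if len(gaze_seq) != len(env_seq):
        raise ValueError("both sequences must cover the same time steps")
    return [list(g) + list(e) for g, e in zip(gaze_seq, env_seq)]


# Gaze point on the unit sphere at two consecutive time steps (illustrative values).
gaze_seq = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Alternative A: estimated environmental structure, here encoded (hypothetically)
# as flags for "path to the left / straight ahead / to the right".
env_structure_seq = [[1, 0, 1], [1, 1, 0]]

# Alternative B: intermediate feature vectors taken from inside the
# environmental structure estimator (illustrative values).
env_feature_seq = [[0.2, -0.1, 0.7, 0.4], [0.3, 0.0, 0.5, 0.1]]

inputs_a = build_predictor_inputs(gaze_seq, env_structure_seq)
inputs_b = build_predictor_inputs(gaze_seq, env_feature_seq)
```

Either sequence has one row per time step, so the same downstream predictor interface could accept both; only the row width differs.
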
[Claim 9] The selected path prediction device according to claim 7, wherein
the environmental structure estimation unit comprises:
a convolutional neural network that takes the time-series data of the third image as input;
a recurrent neural network that obtains feature vectors for estimating the time-series data of the environmental structure based on time-series data output from the convolutional neural network; and
a fully connected layer that combines the feature vectors output from the recurrent neural network and outputs the estimated time-series data of the environmental structure, and
the path prediction unit predicts the path the subject will select based on time-series data of feature vectors output from the layer immediately preceding the output layer of the fully connected layer that outputs the estimated time-series data of the environmental structure.
[Claim 10] The selected path prediction device according to claim 7, wherein the path prediction unit is an encoder-decoder network with an attention mechanism.
[Claim 11] A mobile device comprising:
a drive unit that drives a mobile body carrying a person;
the selected path prediction device according to any one of claims 7 to 10; and
a control unit that controls the drive unit so that the mobile body moves in the direction of the path selected by the subject, as predicted by the selected path prediction device.
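
Claim 10 names "an attention mechanism" without specifying its form. As one possible reading, the sketch below computes scaled dot-product attention for a single decoder step over a sequence of encoder states. The choice of scaled dot-product scoring, the function names `softmax` and `attend`, and the example vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    Assumption: scaled dot-product scoring between the decoder state and
    each encoder state. The claim only says "attention mechanism".
    """
    scale = math.sqrt(len(decoder_state))
    scores = [
        sum(q * k for q, k in zip(decoder_state, h)) / scale
        for h in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Encoder states for three time steps and one decoder state (illustrative values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend([1.0, 0.0], encoder_states)
```

In this example the first encoder state is most similar to the decoder state, so it receives the largest weight; in a path predictor this would correspond to the decoder emphasising the time steps whose gaze and environment features are most relevant to the current output.
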
[Claim 12] A gaze point estimation method performed by an estimation device that estimates a subject's gaze point, the method comprising:
a learning data generation step of generating learning data for supervised learning, the learning data having, as input data, a first image obtained by imaging the subject's face from the left front side with a first camera and a second image obtained by imaging the subject's face from the right front side with a second camera whose positional relationship with the first camera is fixed and, as output data, the position coordinates of the gaze point in a unit spherical coordinate system in a third image obtained by imaging the area around the subject with a 360-degree camera whose positional relationship with the first and second cameras is fixed;
a learning step of performing supervised learning of a deep learning model using the learning data generated in the learning data generation step; and
an estimation step in which the deep learning model estimates the position coordinates of the gaze point based on a newly captured first image and second image.
[Claim 13] A selected path prediction method performed by a selected path prediction device that predicts the path a subject will select, the method comprising:
a first estimation step of estimating, by deep learning, time-series data of the position coordinates of the subject's gaze point based on time-series data of a first image obtained by imaging the subject's face from the left front side with a first camera and time-series data of a second image obtained by imaging the subject's face from the right front side with a second camera whose positional relationship with the first camera is fixed;
a second estimation step of estimating, by deep learning, time-series data of the environmental structure related to the paths around the subject based on time-series data of a third image obtained by imaging the area around the subject with a 360-degree camera whose positional relationship with the first and second cameras is fixed; and
a third estimation step of predicting, by deep learning, the path selected by the subject based on the time-series data of the position coordinates of the subject's gaze point estimated in the first estimation step and on time-series data of either the environmental structure estimated in the second estimation step or feature vectors obtained intermediately when estimating the environmental structure in the second estimation step.
[Claim 14] A program that causes a computer that estimates a subject's gaze point to function as:
a learning data generation unit that generates learning data for supervised learning, the learning data having, as input data, a first image obtained by imaging the subject's face from the left front side with a first camera and a second image obtained by imaging the subject's face from the right front side with a second camera whose positional relationship with the first camera is fixed and, as output data, the position coordinates of the gaze point in a unit spherical coordinate system in a third image obtained by imaging the area around the subject with a 360-degree camera whose positional relationship with the first and second cameras is fixed; and
an estimation unit that includes a deep learning model and that, after supervised learning of the deep learning model using the learning data generated by the learning data generation unit, estimates the position coordinates of the gaze point with the deep learning model based on a newly captured first image and second image.
[Claim 15] A program that causes a computer that predicts the path a subject will select to function as:
a gaze point estimation unit that estimates, by deep learning, the position coordinates of the subject's gaze point based on time-series data of a first image obtained by imaging the subject's face from the left front side with a first camera and time-series data of a second image obtained by imaging the subject's face from the right front side with a second camera whose positional relationship with the first camera is fixed;
an environmental structure estimation unit that estimates, by deep learning, time-series data of the environmental structure related to the paths around the subject based on time-series data of a third image obtained by imaging the area around the subject with a 360-degree camera whose positional relationship with the first and second cameras is fixed; and
a path prediction unit that predicts the path selected by the subject based on time-series data of the position coordinates of the subject's gaze point estimated by the gaze point estimation unit and on time-series data of either the environmental structure estimated by the environmental structure estimation unit or feature vectors obtained intermediately when estimating the environmental structure.