Gaze point estimation device, selected path prediction device, movement device, gaze point estimation method, selected path prediction method, and program

JP7873814B2 · Active · Publication Date: 2026-06-15 · HIROSHIMA CITY UNIVERSITY


Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Patents
Current Assignee / Owner: HIROSHIMA CITY UNIVERSITY
Filing Date: 2022-05-25
Publication Date: 2026-06-15

Application Information

IPC: G06T7/70; G06N3/045; G06T7/00; G06V10/82

AI Tagging

Application Domain

Image analysis; Biological models

Abstract

PROBLEM TO BE SOLVED: To provide a gaze point estimation device and the like that accurately estimate where a subject is looking.

SOLUTION: In a gaze point estimation device 1 that estimates a gaze point P of a subject M, a left camera 2 images a face F of the subject M from the left front side. A right camera 3, whose positional relationship with the left camera 2 is fixed, images the face F of the subject M from the right front side. An omnidirectional camera 4, whose positional relationship with the left camera 2 and the right camera 3 is fixed, images the surroundings of the subject M. A learning data generation unit 5 generates learning data for supervised learning, taking a first image Im1 captured by the left camera 2 and a second image Im2 captured by the right camera 3 as input data, and taking the position coordinates of the gaze point P in a unit spherical coordinate system in a third image Im3 captured by the omnidirectional camera 4 as output data. An estimation unit 6 performs supervised learning using the learning data generated by the learning data generation unit 5, and then estimates the position coordinates of the gaze point P based on a newly captured first image Im1 and second image Im2.

SELECTED DRAWING: Figure 2


Claims

1. A gaze point estimation device that estimates the gaze point of a subject, comprising:
a first camera that images the subject's face from the left front side;
a second camera whose positional relationship with the first camera is fixed and that images the subject's face from the right front side;
a 360-degree camera whose positional relationship with the first camera and the second camera is fixed and that images the area around the subject;
a learning data generation unit that generates learning data for supervised learning, taking as input data a first image captured by the first camera and a second image captured by the second camera, and taking as output data the position coordinates, in a unit spherical coordinate system, of the gaze point in a third image captured by the 360-degree camera; and
an estimation unit that includes a deep learning model and that, after supervised learning of the deep learning model using the learning data generated by the learning data generation unit, estimates the position coordinates of the gaze point with the deep learning model based on a newly captured first image and second image.
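For illustration only, the following sketch shows one way a gaze point located at a pixel of the third image could be expressed in unit spherical coordinates. It assumes the 360-degree camera delivers an equirectangular image, which the claim does not state; the function name `pixel_to_unit_sphere` and the axis conventions are hypothetical.

```python
import math

def pixel_to_unit_sphere(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to spherical angles and a unit vector.

    Assumed conventions (not taken from the patent):
      theta: azimuth in [-pi, pi), zero at the centre column of the image
      phi:   elevation in [-pi/2, pi/2], positive towards the top of the image
    """
    theta = (u / width - 0.5) * 2.0 * math.pi
    phi = (0.5 - v / height) * math.pi
    # Point on the unit sphere corresponding to (theta, phi).
    x = math.cos(phi) * math.cos(theta)
    y = math.cos(phi) * math.sin(theta)
    z = math.sin(phi)
    return (theta, phi), (x, y, z)
```

Because the radius is fixed at 1, the two angles alone are enough to serve as the supervised-learning target; the unit vector is returned only as a convenience.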

2. The gaze point estimation device according to claim 1, wherein the deep learning model comprises a first neural network, which is a convolutional neural network that takes the first image and the second image as inputs.

3. The gaze point estimation device according to claim 2, wherein the deep learning model comprises a second neural network, which is a neural network that obtains information for estimating the gaze point based on time-series data output from the first neural network.

4. The gaze point estimation device according to claim 3, wherein the second neural network is an LSTM (Long Short-Term Memory) network.
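Claims 2 to 4 describe a convolutional network whose per-frame outputs are fed, as time-series data, to an LSTM. The sketch below illustrates only the recurrent part, using the standard LSTM update written in plain Python. The per-frame feature vectors stand in for the convolutional network's output, and the weight layout in `params` is a hypothetical choice, not something the patent specifies.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, b):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h, c, params):
    """One LSTM time step.

    params maps a gate name ("i", "f", "o", "g") to (W, b), where W acts on
    the concatenation of the input x and the previous hidden state h.
    """
    xh = list(x) + list(h)
    pre = {name: affine(W, xh, b) for name, (W, b) in params.items()}
    i = [sigmoid(a) for a in pre["i"]]    # input gate
    f = [sigmoid(a) for a in pre["f"]]    # forget gate
    o = [sigmoid(a) for a in pre["o"]]    # output gate
    g = [math.tanh(a) for a in pre["g"]]  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new

def run_lstm(features, hidden_size, params):
    """Feed a sequence of per-frame feature vectors through the LSTM; return the last hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in features:
        h, c = lstm_step(x, h, c, params)
    return h
```

In a full model, the final hidden state would typically pass through a further layer that regresses the two spherical angles of the gaze point; that layer is omitted here.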

5. The gaze point estimation device according to claim 1, wherein the learning data generation unit generates the learning data by detecting, as the gaze point, a specific location in the third image, the first image, the second image, and the third image having been captured while the subject gazes at that specific location.
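A minimal sketch of the labelling step in claim 5 follows. It assumes, purely for illustration, that the specific location is a bright marker that can be found by thresholding a grayscale third image, that the image is equirectangular, and that the marker does not straddle the left and right image edges. `TrainingSample`, `detect_target_centroid`, and `make_sample` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class TrainingSample:
    first_image: object   # face image from the left front camera (input)
    second_image: object  # face image from the right front camera (input)
    gaze_theta: float     # azimuth of the gaze point (output label)
    gaze_phi: float       # elevation of the gaze point (output label)

def detect_target_centroid(image, threshold=200):
    """Return the centroid (u, v) of pixels brighter than threshold, or None if there are none."""
    us, vs = [], []
    for v, row in enumerate(image):
        for u, value in enumerate(row):
            if value > threshold:
                us.append(u)
                vs.append(v)
    if not us:
        return None
    return sum(us) / len(us), sum(vs) / len(vs)

def make_sample(first_image, second_image, third_image):
    """Pair the two face images with the gaze label detected in the 360-degree image."""
    centroid = detect_target_centroid(third_image)
    if centroid is None:
        return None  # no marker visible: skip this frame
    u, v = centroid
    height, width = len(third_image), len(third_image[0])
    # Assumed equirectangular mapping from pixel position to spherical angles.
    theta = (u / width - 0.5) * 2.0 * math.pi
    phi = (0.5 - v / height) * math.pi
    return TrainingSample(first_image, second_image, theta, phi)
```

Frames in which no marker is detected are dropped rather than labelled, so that every stored sample carries a valid gaze point.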

6. A mobile device comprising:
a drive unit that drives a mobile body carrying a person;
the gaze point estimation device according to any one of claims 1 to 5; and
a control unit that controls the drive unit so that the mobile body moves in the direction of the position coordinates of the subject's gaze point estimated by the gaze point estimation device.
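The claim does not say how the control unit converts the estimated gaze point into drive commands. As one hypothetical possibility, the sketch below turns the azimuth of the gaze point into a clamped proportional turn rate; the gain, dead zone, and rate limit are illustrative values, not values from the patent.

```python
import math

def steering_command(gaze_theta, gain=1.5, max_turn_rate=1.0, dead_zone=math.radians(5)):
    """Return a turn rate in rad/s; positive turns towards positive azimuth.

    gaze_theta is the azimuth of the estimated gaze point in radians,
    with zero meaning straight ahead (an assumed convention).
    """
    # Wrap the angle into [-pi, pi) so that the shorter turn direction is chosen.
    theta = (gaze_theta + math.pi) % (2.0 * math.pi) - math.pi
    if abs(theta) < dead_zone:
        return 0.0  # gaze is close to straight ahead: do not turn
    rate = gain * theta
    return max(-max_turn_rate, min(max_turn_rate, rate))
```

The dead zone keeps small gaze fluctuations around the forward direction from producing continuous steering corrections.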

7. A selected path prediction device that predicts the path a subject will choose, comprising:
a first camera that outputs time-series data of a first image obtained by imaging the subject's face from the left front side;
a second camera whose positional relationship with the first camera is fixed and that outputs time-series data of a second image obtained by imaging the subject's face from the right front side;
a 360-degree camera whose positional relationship with the first camera and the second camera is fixed and that outputs time-series data of a third image obtained by imaging the area around the subject;
a gaze point estimation unit that estimates, using deep learning, time-series data of the position coordinates of the subject's gaze point based on the time-series data of the first image and the time-series data of the second image;
an environmental structure estimation unit that estimates, using deep learning, time-series data of the environmental structure related to the paths around the subject based on the time-series data of the third image; and
a path prediction unit that predicts, using deep learning, the path selected by the subject based on the time-series data of the position coordinates of the subject's gaze point estimated by the gaze point estimation unit and on time-series data of either the environmental structure estimated by the environmental structure estimation unit or feature vectors obtained intermediately when estimating the environmental structure.

8. The selected path prediction device according to claim 7, wherein the environmental structure estimation unit comprises:
a convolutional neural network that takes the time-series data of the third image as input;
a recurrent neural network that obtains feature vectors for estimating the time-series data of the environmental structure based on the time-series data output from the convolutional neural network; and
a fully connected layer that combines the feature vectors output from the recurrent neural network and outputs the estimated time-series data of the environmental structure,
and wherein the path prediction unit predicts the path that the subject will choose based on the estimated time-series data of the environmental structure.

9. The selected path prediction device according to claim 7, wherein the environmental structure estimation unit comprises:
a convolutional neural network that takes the time-series data of the third image as input;
a recurrent neural network that obtains feature vectors for estimating the time-series data of the environmental structure based on the time-series data output from the convolutional neural network; and
a fully connected layer that combines the feature vectors output from the recurrent neural network and outputs the estimated time-series data of the environmental structure,
and wherein the path prediction unit predicts the path that the subject will choose based on time-series data of feature vectors output from the layer immediately preceding the output layer of the fully connected layer that outputs the estimated time-series data of the environmental structure.
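Claims 8 and 9 differ only in what the path prediction unit receives: the estimated environmental structure itself (claim 8) or the feature vector from the layer just before the output layer (claim 9). The sketch below illustrates that distinction for a single time step, assuming a two-layer fully connected head with a ReLU activation; the layer sizes, the activation, and the function names are hypothetical.

```python
def dense(W, b, x, relu=False):
    """Fully connected layer: W @ x + b, optionally followed by ReLU."""
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, v) for v in y] if relu else y

def fc_head(feature, hidden_layer, output_layer):
    """Return (estimated structure, penultimate feature vector) for one time step.

    feature is the recurrent network's output; each layer is a (W, b) pair.
    """
    penultimate = dense(*hidden_layer, feature, relu=True)
    structure = dense(*output_layer, penultimate)
    return structure, penultimate

def predictor_input(gaze, structure, penultimate, use_intermediate):
    """Build the per-step input of the path predictor.

    use_intermediate=False corresponds to claim 8, True to claim 9.
    """
    return list(gaze) + list(penultimate if use_intermediate else structure)
```

Repeating this for every frame yields the time-series data that the path prediction unit consumes together with the gaze point coordinates.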

10. The selected path prediction device according to claim 7, wherein the path prediction unit is an encoder-decoder network with an attention mechanism.
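The claim names an attention mechanism without specifying its form. As a hedged illustration, the sketch below implements plain dot-product attention, one common variant: a decoder query is compared with each encoder state, the scores are normalised with a softmax, and the weighted sum forms the context vector. The choice of dot-product scoring is an assumption.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Dot-product attention: return (context vector, attention weights)."""
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return context, weights
```

In the setting of claim 7, the encoder states would summarise the gaze and environment time series, and the weights indicate which time steps the decoder relies on when predicting the selected path.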

11. A mobile device comprising:
a drive unit that drives a mobile body carrying a person;
the selected path prediction device according to any one of claims 7 to 10; and
a control unit that controls the drive unit so that the mobile body moves in the direction of the path selected by the subject, as predicted by the selected path prediction device.

12. A gaze point estimation method performed by an estimation device that estimates a subject's gaze point, the method comprising:
a learning data generation step of generating learning data for supervised learning, taking as input data a first image obtained by capturing the subject's face from the left front side with a first camera and a second image obtained by capturing the subject's face from the right front side with a second camera whose positional relationship with the first camera is fixed, and taking as output data the position coordinates, in a unit spherical coordinate system, of the gaze point in a third image obtained by capturing the area around the subject with a 360-degree camera whose positional relationship with the first camera and the second camera is fixed;
a learning step of performing supervised learning of a deep learning model using the learning data generated in the learning data generation step; and
an estimation step in which the deep learning model estimates the position coordinates of the gaze point based on a newly captured first image and second image.

13. A selected path prediction method performed by a selected path prediction device that predicts the path a subject will choose, the method comprising:
a first estimation step of estimating, by deep learning, time-series data of the position coordinates of the subject's gaze point based on time-series data of a first image obtained by capturing the subject's face from the left front side with a first camera and time-series data of a second image obtained by capturing the subject's face from the right front side with a second camera whose positional relationship with the first camera is fixed;
a second estimation step of estimating, by deep learning, time-series data of the environmental structure related to the paths around the subject based on time-series data of a third image obtained by imaging the area around the subject with a 360-degree camera whose positional relationship with the first camera and the second camera is fixed; and
a third estimation step of predicting, by deep learning, the path selected by the subject based on the time-series data of the position coordinates of the subject's gaze point estimated in the first estimation step and on time-series data of either the environmental structure estimated in the second estimation step or feature vectors obtained intermediately when estimating the environmental structure in the second estimation step.

14. A program that causes a computer that estimates a subject's gaze point to function as:
a learning data generation unit that generates learning data for supervised learning, taking as input data a first image obtained by capturing the subject's face from the left front side with a first camera and a second image obtained by capturing the subject's face from the right front side with a second camera whose positional relationship with the first camera is fixed, and taking as output data the position coordinates, in a unit spherical coordinate system, of the gaze point in a third image obtained by capturing the area around the subject with a 360-degree camera whose positional relationship with the first camera and the second camera is fixed; and
an estimation unit that includes a deep learning model, performs supervised learning of the deep learning model using the learning data generated by the learning data generation unit, and then estimates the position coordinates of the gaze point with the deep learning model based on a newly captured first image and second image.

15. A program that causes a computer that predicts the path a subject will choose to function as:
a gaze point estimation unit that estimates, by deep learning, the position coordinates of the subject's gaze point based on time-series data of a first image obtained by capturing the subject's face from the left front side with a first camera and time-series data of a second image obtained by capturing the subject's face from the right front side with a second camera whose positional relationship with the first camera is fixed;
an environmental structure estimation unit that estimates, by deep learning, time-series data of the environmental structure related to the paths around the subject based on time-series data of a third image obtained by imaging the area around the subject with a 360-degree camera whose positional relationship with the first camera and the second camera is fixed; and
a path prediction unit that predicts the path selected by the subject based on time-series data of the position coordinates of the subject's gaze point estimated by the gaze point estimation unit and on time-series data of either the environmental structure estimated by the environmental structure estimation unit or feature vectors obtained intermediately when estimating the environmental structure.