Monitoring device, monitoring method, and non-transitory computer-readable medium
Combining two machine learning models addresses a weakness of existing surveillance systems, in which an LLM/VLM alone performs poorly at recognizing where people are and what they are doing. The combination achieves high-precision prediction of the state of monitored objects and efficient surveillance processing.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: TOYOTA JIDOSHA KK
- Filing Date: 2025-12-15
- Publication Date: 2026-06-19
AI Technical Summary
Existing surveillance systems built on large language models or vision-language models (LLM/VLM) struggle to accurately locate people and identify their actions, resulting in low recognition performance.
Two machine learning models are combined. The first model detects each monitored object and determines its location; the second model infers the object's detailed state. A prompt (cue) containing the location and identification information of each monitored object is generated, and state inference is performed from this prompt together with the image.
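The two-model flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the implementation claimed in the patent: the `Detection` structure, the `build_prompt` and `monitor` functions, the prompt wording, and both stand-in models (`fake_detector`, `fake_state_model`) are hypothetical names introduced here. In a real system the first model might be an object detector and the second an LLM/VLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    """One monitored object reported by the first model (hypothetical structure)."""
    object_id: str                   # identification information, e.g. "person_1"
    box: Tuple[int, int, int, int]   # location as (x_min, y_min, x_max, y_max) in pixels


def build_prompt(detections: List[Detection]) -> str:
    """Build a text prompt carrying each object's ID and location (assumed wording)."""
    lines = ["The image contains the following monitored objects:"]
    for det in detections:
        x0, y0, x1, y1 = det.box
        lines.append(f"- {det.object_id} at box ({x0}, {y0}, {x1}, {y1})")
    lines.append("For each object, describe its current state.")
    return "\n".join(lines)


def monitor(
    image: object,
    detector: Callable[[object], List[Detection]],
    state_model: Callable[[object, str], str],
) -> str:
    """Run detection first, then pass the image plus the generated prompt to the second model."""
    detections = detector(image)         # first model: detect objects and locate them
    prompt = build_prompt(detections)    # prompt with location and identification info
    return state_model(image, prompt)    # second model: state inference from image + prompt


def fake_detector(image: object) -> List[Detection]:
    """Stand-in for the first model; returns fixed detections regardless of the image."""
    return [
        Detection("person_1", (40, 60, 120, 300)),
        Detection("person_2", (400, 80, 470, 310)),
    ]


def fake_state_model(image: object, prompt: str) -> str:
    """Stand-in for the second model; only reports how many objects the prompt lists."""
    count = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"received prompt describing {count} object(s)"


if __name__ == "__main__":
    print(build_prompt(fake_detector(None)))
    print(monitor(None, fake_detector, fake_state_model))
```

Passing the two models in as plain callables keeps the sketch independent of any particular detector or LLM/VLM library; the only point it illustrates is the ordering of detection, prompt generation, and state inference.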
This achieves high-precision prediction of the state of monitored objects and improves the accuracy of person localization and action recognition. The state can be predicted effectively even in wide-field images, and the processing runs efficiently in resource-limited environments.