Monitoring device, monitoring method, and non-transitory computer-readable medium

By combining two machine learning models, the disclosure addresses the poor performance of existing LLM/VLM-based surveillance systems at locating people and recognizing their actions, enabling high-precision estimation of the state of monitored objects and efficient surveillance processing.

CN122244642A · Pending · Publication Date: 2026-06-19 · TOYOTA JIDOSHA KK

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: TOYOTA JIDOSHA KK
Filing Date: 2025-12-15
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing surveillance systems built on large language models or vision-language models (LLM/VLM) struggle to locate people accurately and to identify their actions, resulting in low recognition performance.

Method used

Two machine learning models are combined. The first model detects the monitored object and determines its location; the second model infers its state in detail. A prompt containing the location and identification information of the monitored object is generated, and the second model performs state inference on this prompt together with the image.
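The prompt-generation step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Detection` structure, the `build_prompt` function, the bounding-box coordinate format, and the prompt wording are all assumptions introduced here, since the summary only states that the prompt carries location and identification information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One monitored object reported by the first model (hypothetical structure)."""
    object_id: str  # identification information, e.g. "person_1"
    x_min: int      # bounding box in pixel coordinates (assumed format)
    y_min: int
    x_max: int
    y_max: int


def build_prompt(detections: list[Detection]) -> str:
    """Compose a text prompt carrying each object's ID and position."""
    lines = [
        f"{d.object_id}: bounding box ({d.x_min}, {d.y_min}, {d.x_max}, {d.y_max})"
        for d in detections
    ]
    return (
        "The image contains the following monitored objects:\n"
        + "\n".join(lines)
        + "\nDescribe the state of each object, referring to it by its ID."
    )


if __name__ == "__main__":
    print(build_prompt([Detection("person_1", 10, 20, 110, 220)]))
```

Passing explicit coordinates and an identifier in the text gives the second model the spatial grounding that, according to the summary, LLM/VLM systems struggle to derive on their own.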

Benefits of technology

The approach achieves high-precision estimation of the state of monitored objects and improves the accuracy of person localization and action recognition. It can estimate state effectively even in wide-field images and runs efficiently in resource-limited environments.

Generated by Eureka AI based on patent content.

Abstract

This disclosure provides a monitoring device, a monitoring method, and a non-transitory computer-readable medium for highly accurate estimation of the state of a monitored object. The monitoring device (1) includes a control unit (12). The control unit inputs an image into a first machine learning model, trained on a dataset labeled with the presence or absence of a monitored object, to determine whether the image contains a monitored object. If it does, the control unit generates a prompt that includes coordinate information indicating the position of the monitored object in the image and identification information identifying the monitored object. It then inputs tokens of the image and tokens of the prompt into a second machine learning model, trained on a dataset labeled with the state of the monitored object, to estimate the state of the monitored object in the image.
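The control flow in the abstract can be sketched as a gate followed by a conditional second stage. The sketch below is illustrative only and rests on several assumptions: the function name `estimate_state`, the callable interfaces for both models, and the prompt wording are hypothetical, and tokenization of the image and prompt is assumed to happen inside the second model's callable. The two stand-in functions are stubs, not real models.

```python
from typing import Callable, Optional, Sequence

# (identification info, (x_min, y_min, x_max, y_max)); assumed format
Box = tuple[str, tuple[int, int, int, int]]


def estimate_state(
    image: object,
    first_model: Callable[[object], Sequence[Box]],
    second_model: Callable[[object, str], str],
) -> Optional[str]:
    """Run the second model only when the first model finds a monitored object."""
    boxes = first_model(image)
    if not boxes:
        return None  # no monitored object: the second model is never called
    # Prompt carries identification and coordinate information for each object.
    prompt = "; ".join(f"{obj_id} at {coords}" for obj_id, coords in boxes)
    return second_model(image, prompt)


# Stub models for demonstration only (hypothetical behavior).
def fake_detector(image: object) -> Sequence[Box]:
    return [("person_1", (10, 20, 110, 220))] if image == "occupied" else []


def fake_vlm(image: object, prompt: str) -> str:
    return f"state estimated from prompt: {prompt}"


if __name__ == "__main__":
    print(estimate_state("empty", fake_detector, fake_vlm))
    print(estimate_state("occupied", fake_detector, fake_vlm))
```

Skipping the second model for frames without a monitored object is one plausible reason such a design could run efficiently on limited hardware, since the larger model is invoked only when needed.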