Combined recognition system for human actions and language instructions
The speech recognition and action recognition modules apply multi-view modeling and time-frequency decomposition, and work together with a mutual information value module and an instruction generation module. This addresses poor accent adaptability and viewpoint limitations, enabling efficient and flexible joint recognition and instruction generation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents (China)
- Current Assignee / Owner
- PUWANG (SHANGHAI) INFORMATION TECH CO LTD
- Filing Date
- 2025-08-07
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies for recognizing human actions and language instructions suffer from poor accent adaptability, significant viewpoint limitations, and simplistic, inefficient logic for joint recognition and fusion. The result is a high misjudgment rate and poor interaction flexibility.
The system comprises a language recognition module, an action recognition module, a mutual information value module, an independent analysis module, and a fusion analysis module. It generates instructions dynamically through training-data expansion, multi-view modeling, time-frequency decomposition, and parameter calculation, which allows it to adapt to scenarios with varying accents and diverse viewpoints.
The system improves recognition accuracy and interaction flexibility in complex environments, reduces misjudgments, and provides a natural, efficient user experience.
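The summary names a mutual information value module alongside independent and fusion analysis modules, but does not say how the value is computed or used. As a minimal sketch only, the Python below estimates mutual information between two discrete label streams (recognized speech intents and recognized actions) and uses it to pick an analysis path. The function names, the threshold, and the routing rule are all hypothetical assumptions for illustration and are not taken from the patent.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate mutual information (in bits) between paired discrete labels.

    `pairs` is a list of (speech_label, action_label) tuples. This is the
    standard plug-in estimate from empirical counts; it is an assumption
    that the patent's "mutual information value" resembles it.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    speech = Counter(s for s, _ in pairs)
    action = Counter(a for _, a in pairs)
    mi = 0.0
    for (s, a), count in joint.items():
        p_joint = count / n
        # p(s, a) / (p(s) * p(a)) written with raw counts
        ratio = count * n / (speech[s] * action[a])
        mi += p_joint * math.log2(ratio)
    return mi


def choose_analysis(pairs, threshold=0.5):
    """Hypothetical routing rule: fuse the two modalities only when they
    carry enough shared information; otherwise analyze them independently."""
    return "fusion" if mutual_information(pairs) >= threshold else "independent"


if __name__ == "__main__":
    # Speech and action always agree: the streams are strongly dependent.
    correlated = [("stop", "halt_gesture"), ("go", "wave")] * 2
    # Every combination is equally likely: the streams are independent.
    unrelated = [("stop", "halt_gesture"), ("stop", "wave"),
                 ("go", "halt_gesture"), ("go", "wave")]
    print(choose_analysis(correlated))
    print(choose_analysis(unrelated))
```

In this sketch a high value routes the pair to a fused interpretation and a low value to separate interpretations; a real system would likely work on continuous features and a calibrated threshold rather than the fixed 0.5 assumed here.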
Figure CN120808781B_ABST