Joint recognition system for human actions and language instructions

The speech recognition and action recognition modules use multi-view modeling and time-frequency decomposition. Combined with the mutual information value module and the instruction generation module, they address poor accent adaptability and viewpoint limitations, enabling efficient and flexible joint recognition and instruction generation.

CN120808781B · Active · Publication Date: 2026-06-19 · PUWANG (SHANGHAI) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
PUWANG (SHANGHAI) INFORMATION TECH CO LTD
Filing Date
2025-08-07
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing technologies for recognizing human actions and language commands adapt poorly to accents, are constrained by camera viewpoint, and rely on simplistic, inefficient logic for joint recognition and fusion. The result is a high misjudgment rate and inflexible interaction.

Method used

The system employs a language recognition module, an action recognition module, a mutual information value module, an independent analysis module, and a fusion analysis module. It generates instructions dynamically through training-data expansion, multi-view modeling, time-frequency decomposition, and parameter calculation, adapting to scenarios with varying accents and diverse viewpoints.

Benefits of technology

It improves recognition accuracy and interaction flexibility in complex environments, reduces misjudgments, and provides a natural and efficient user experience.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN120808781B_ABST
Patent Text Reader

Abstract

This invention discloses a joint recognition system for human actions and language commands, in the field of intelligent recognition. The system comprises a language recognition module, an action recognition module, a mutual information value module, an independent analysis module, a fusion analysis module, and a command generation module. The language recognition module collects human language information and extracts features from it to generate a language signal X. The action recognition module collects human action information and extracts features from it to generate an action signal Y. The mutual information value module constructs a joint distribution from X and Y, obtaining the marginal probability distributions of X and Y together with their joint probability distribution. The system can reliably recognize speech even in complex environments with diverse accents and varying viewpoints, reducing interaction errors caused by signal misjudgment and giving users a natural and efficient experience across interaction scenarios.
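
The abstract does not say how the mutual information value is derived from these distributions. As a rough illustration only, the sketch below assumes the standard Shannon definition, I(X;Y) = Σ p(x,y) · log2( p(x,y) / (p(x) · p(y)) ), applied to signals that have already been discretized into labels. The function name `mutual_information` and the command and gesture labels are hypothetical and do not come from the patent.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate I(X;Y) in bits from paired discrete observations (x, y).

    Assumption: X and Y are already discretized (e.g. recognized command
    and gesture labels); the patent's actual feature signals may differ.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one observation")

    joint = Counter(pairs)              # counts behind p(x, y)
    px = Counter(x for x, _ in pairs)   # counts behind p(x)
    py = Counter(y for _, y in pairs)   # counts behind p(y)

    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical data: speech and gesture always agree.
agree = [("stop", "palm_out"), ("go", "wave")] * 4
# Hypothetical data: speech and gesture are statistically independent.
mixed = [("stop", "palm_out"), ("stop", "wave"),
         ("go", "palm_out"), ("go", "wave")] * 2

print(mutual_information(agree))  # 1.0 bit: each signal determines the other
print(mutual_information(mixed))  # 0.0 bits: the signals share no information
```

Under this reading, a high value suggests that speech and gesture carry the same intent and could be fused, while a value near zero suggests they are better analyzed separately. How the independent analysis and fusion analysis modules actually use the value is not stated in this excerpt.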