Human action and language instruction combined recognition system

Multi-view modeling and time-frequency decomposition in the speech recognition and action recognition modules, combined with a mutual information value module and an instruction generation module, address poor accent adaptability and viewpoint limitations, enabling efficient and flexible joint recognition and instruction generation.

CN120808781B · Active · Publication Date: 2026-06-19 · PUWANG (SHANGHAI) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: PUWANG (SHANGHAI) INFORMATION TECH CO LTD
Filing Date: 2025-08-07
Publication Date: 2026-06-19

Patent Timeline

07 Aug 2025: Application
19 Jun 2026: Publication (CN120808781B)

IPC: G10L15/22; G06F3/01

AI Tagging

Application Domain

Input/output for user-computer interaction · Speech recognition

Similar Technology Patents

User interface display system, method, computer device and storage medium
US12657756B2 · Input/output for user-computer interaction · Image analysis
Electronic devices with finger sensors
US12656914B2 · Input/output for user-computer interaction · Details for portable computers
Semiconductor inventory equipment maintenance system and method
CN120087937Blower requirement · Easy to carry out · Input/output for user-computer interaction · Data processing applications
Device for work support in a predefined work area within an assigned spatial profile
DE102013201309B4 · Input/output for user-computer interaction · Measuring points marking
AR head-mounted device, and AR head-mounted device and terminal device combination system
CN114967926B · Input/output for user-computer interaction · Graph reading

AI Technical Summary

Technical Problem

Existing technologies for recognizing human movements and language commands adapt poorly to accents, are restricted to limited viewing perspectives, and rely on simplistic, inefficient logic for joint recognition and fusion. The result is a high misjudgment rate and inflexible interaction.

Method used

The system comprises a language recognition module, an action recognition module, a mutual information value module, an independent analysis module, and a fusion analysis module. It generates instructions dynamically through training-data expansion, multi-view modeling, time-frequency decomposition, and parameter calculation, which lets it adapt to varied accents and viewing perspectives.
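The summary names time-frequency decomposition but does not say which transform the patent uses. As a minimal sketch, assuming a short-time Fourier transform (STFT) with a Hann window, the idea of splitting a signal into per-frame frequency content can be illustrated as follows. The function name `stft_magnitudes` and the parameters `frame_len` and `hop` are hypothetical and are not taken from the patent.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum per overlapping frame of `signal`.

    Assumption: an STFT stands in for the unspecified
    "time-frequency decomposition" mentioned in the summary.
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Naive DFT over the non-negative frequency bins only.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Usage: a pure tone that sits exactly on frequency bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spectra = stft_magnitudes(tone)
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spectra]
```

Each row of `spectra` describes one time slice, so a feature extractor could track how the frequency content of speech (or of a motion signal) changes over time. A production system would more likely use an FFT library than this naive DFT.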

Benefits of technology

The system improves recognition accuracy and interaction flexibility in complex environments, reduces misjudgments, and provides a natural, efficient user experience.

✦ Generated by Eureka AI based on patent content.

Figure CN120808781B_ABST

Patent Text Reader

Abstract

This invention discloses a joint recognition system for human actions and language commands, in the field of intelligent recognition. The system includes a language recognition module, an action recognition module, a mutual information value module, an independent analysis module, a fusion analysis module, and a command generation module. The language recognition module collects human language information and extracts features from it to generate a language signal X. The action recognition module collects human action information and extracts features from it to generate an action signal Y. The mutual information value module constructs a joint distribution from the language signal X and the action signal Y, obtaining the marginal probability distributions of X and Y together with their joint probability distribution. The system can recognize speech reliably even in complex environments with diverse accents and varying perspectives, reducing interaction errors caused by signal misjudgment and giving users a natural and efficient experience across interaction scenarios.
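The abstract says the mutual information value module builds a joint distribution over X and Y but does not give a formula. A minimal sketch, assuming the standard definition I(X;Y) = Σ p(x,y) · log2( p(x,y) / (p(x) · p(y)) ) and assuming both signals have already been discretized into a small set of labels, might look like this. The function name `mutual_information` and the example counts are hypothetical and are not taken from the patent.

```python
import math


def mutual_information(joint):
    """Return I(X;Y) in bits from a 2-D table of joint counts or probabilities.

    Rows index values of the language signal X, columns index values of
    the action signal Y. Assumption: the textbook definition of mutual
    information is used; the patent's exact computation is not shown here.
    """
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]   # joint p(x, y)
    p_x = [sum(row) for row in p]                     # marginal p(x)
    p_y = [sum(col) for col in zip(*p)]               # marginal p(y)
    mi = 0.0
    for i, row in enumerate(p):
        for j, p_xy in enumerate(row):
            if p_xy > 0:                              # treat 0 * log 0 as 0
                mi += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return mi


# Usage with made-up co-occurrence counts of two spoken commands (rows)
# and two gestures (columns).
independent = mutual_information([[1, 1], [1, 1]])   # speech says nothing about gesture
coupled = mutual_information([[5, 0], [0, 5]])       # speech fully determines gesture
```

A value near zero suggests the two signals carry unrelated information, while a larger value suggests they agree. How the patent's independent analysis and fusion analysis modules use the value is not described in this abstract.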
