Voice interaction method, device, medium, equipment and program product

By starting multiple voice recognition links and comparing their recognized text with a reference text when the interaction confirmation countdown ends, the method addresses the response delay of existing voice interaction and achieves a faster response.

CN122224166A · Pending · Publication Date: 2026-06-16 · CHINA MERCHANTS ADVANCED TECHNOLOGY DEVELOPMENT (SHENZHEN) CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: CHINA MERCHANTS ADVANCED TECHNOLOGY DEVELOPMENT (SHENZHEN) CO LTD
Filing Date: 2026-03-11
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

In existing technologies, AI voice interaction must wait for a period to confirm that the user has finished voice input. This wait lengthens the response delay and makes it difficult to meet the needs of real-time interaction.

Method used

Multiple second voice recognition links are started. Before the interaction confirmation countdown ends, these links recognize the voice data received so far to produce candidate recognition texts, and a candidate answer voice is generated for each of them. When the countdown ends, each candidate recognition text is compared with the reference recognition text. If one matches, its candidate answer voice is output directly, which saves the time otherwise spent generating the answer after the countdown.
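
The comparison step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Candidate` structure, the `normalize` rule (case and whitespace insensitive matching) and the function name `select_answer` are all assumptions, since the summary does not say how a "match" is decided or what happens when nothing matches.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Output of one second recognition link (hypothetical structure)."""
    recognized_text: str  # candidate recognition text
    answer_voice: bytes   # answer audio already generated for that text


def normalize(text: str) -> str:
    # Assumed matching rule: ignore case and extra whitespace.
    return " ".join(text.lower().split())


def select_answer(reference_text: str,
                  candidates: Sequence[Candidate]) -> Optional[bytes]:
    """Return the answer voice of the first candidate whose text matches
    the reference (baseline) recognition text, or None if none matches."""
    target = normalize(reference_text)
    for candidate in candidates:
        if normalize(candidate.recognized_text) == target:
            return candidate.answer_voice
    return None
```

A `None` result means no pre-generated answer can be reused, so the caller would have to generate the answer from the reference text in the usual way; that fallback is an assumption here.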

Benefits of technology

It shortens the delay between the final voice response and the user ending voice input, thus improving the response speed of real-time interaction.


Figure CN122224166A_ABST

Abstract

The application belongs to the technical field of artificial intelligence and discloses a voice interaction method, device, medium, equipment and program product. When a trigger event is met, a plurality of second voice recognition links are started. When it is determined that the user has ended voice input, an interaction confirmation countdown is started, and before the countdown ends, candidate answer voices are generated through the second voice recognition links. When the countdown ends, a reference recognition text is obtained by recognizing the accumulated received voice data through a first voice recognition link, and each candidate recognition text obtained by the second voice recognition links is compared with the reference recognition text. If any candidate recognition text matches the reference recognition text, the candidate answer voice corresponding to that candidate recognition text is directly taken as the final answer voice. This shortens the delay between the user ending voice input and the final answer voice.
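
The timing of the flow in the abstract can be illustrated with a small `asyncio` sketch. It is a simplified model under stated assumptions, not the claimed method: the names `speculative_answer`, `Recognizer`, `Responder`, `audio_at_pause` and `get_accumulated_audio` are hypothetical, the answer voice is represented by a string, matching is taken to be exact string equality, and the fallback when no candidate matches is an assumption because the abstract does not describe that path.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Tuple

# Hypothetical interfaces; none of these names come from the application.
Recognizer = Callable[[bytes], Awaitable[str]]  # one voice recognition link
Responder = Callable[[str], Awaitable[str]]     # text -> answer voice (str stand-in)


async def speculative_answer(
    audio_at_pause: bytes,
    get_accumulated_audio: Callable[[], bytes],
    first_link: Recognizer,
    second_links: Sequence[Recognizer],
    respond: Responder,
    countdown_s: float,
) -> Tuple[str, bool]:
    """Return (final answer voice, whether a candidate answer voice was reused)."""

    async def build_candidate(link: Recognizer) -> Tuple[str, str]:
        text = await link(audio_at_pause)
        return text, await respond(text)

    # The second links work while the countdown is running.
    tasks = [asyncio.create_task(build_candidate(link)) for link in second_links]
    await asyncio.sleep(countdown_s)  # interaction confirmation countdown

    # Countdown over: the first link recognizes everything received so far.
    reference_text = await first_link(get_accumulated_audio())

    chosen = None
    for task in tasks:
        if not task.done():
            task.cancel()  # candidate not ready in time
        elif not task.cancelled() and task.exception() is None:
            text, voice = task.result()
            if chosen is None and text == reference_text:  # assumed exact match
                chosen = voice
    if chosen is not None:
        return chosen, True
    # Assumed fallback: generate the answer from the reference text.
    return await respond(reference_text), False
```

If the user keeps speaking during the countdown, `get_accumulated_audio()` returns more audio than `audio_at_pause`, the texts differ, and the sketch falls back to answering from the reference text.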
