
Audio and video hybrid voice front-end processing method for service robot voice interaction

A voice-interaction and audio-video hybrid voice technology, applied in speech analysis, speech recognition, instruments, etc. It addresses problems such as low-pass signal distortion and main-lobe narrowing, achieves good sound quality and speech intelligibility, and improves recognition accuracy.

Active Publication Date: 2021-12-31
南京南大电子智慧型服务机器人研究院有限公司 and 2 other applicants
Cites: 14 · Cited by: 0

AI Technical Summary

Problems solved by technology

The delay-and-sum (DS) beamformer (Brandstein M, Ward D. Microphone Arrays: Signal Processing Techniques and Applications [M]. Springer Science & Business Media, 2013.) is the most commonly used fixed beamforming algorithm. It is robust to disturbances, but its main lobe narrows as frequency increases: the higher the frequency, the stronger the directivity, which results in low-pass distortion of the signal.
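The frequency dependence of the DS main lobe can be illustrated numerically. The sketch below is not taken from the patent: it assumes a hypothetical 4-microphone uniform linear array with 5 cm spacing, far-field plane waves, a speed of sound of 343 m/s, and broadside steering, then scans for the angle at which the normalized response falls below the half-power (-3 dB) level.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def ds_response(freq_hz, angle_deg, n_mics=4, spacing_m=0.05, steer_deg=0.0):
    """Normalized magnitude response of a delay-and-sum beam on a uniform linear array.

    Angles are measured from broadside. Each channel is delayed so that a plane
    wave from steer_deg adds in phase; other directions add with phase errors.
    """
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # acoustic wavenumber (rad/m)
    offset = math.sin(math.radians(angle_deg)) - math.sin(math.radians(steer_deg))
    total = sum(cmath.exp(1j * k * m * spacing_m * offset) for m in range(n_mics))
    return abs(total) / n_mics  # equals 1.0 in the look direction


def half_power_beamwidth(freq_hz, n_mics=4, spacing_m=0.05):
    """Width in degrees of the broadside main lobe at the -3 dB level."""
    threshold = 1 / math.sqrt(2)  # magnitude 1/sqrt(2) means half power
    for step in range(1, 901):  # scan 0.1 to 90 degrees in 0.1 degree steps
        angle = step / 10
        if ds_response(freq_hz, angle, n_mics, spacing_m) < threshold:
            return 2 * angle  # main lobe is symmetric about broadside
    return 180.0  # never 3 dB down: almost no directivity at this frequency


for f in (500, 1000, 2000, 4000):
    print(f"{f:5d} Hz: {half_power_beamwidth(f):6.1f} deg")
```

With these assumed dimensions, the 500 Hz beam is never 3 dB down anywhere in the front half-plane, while at 4 kHz the half-power width is a little over 20 degrees. A talker slightly off the beam axis therefore loses more high-frequency than low-frequency energy, which is the low-pass distortion described above.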



Embodiment Construction

[0025] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments are only intended to illustrate the present invention and not to limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.

[0026] An audio-video hybrid voice front-end processing method for service robot voice interaction, as shown in Figure 1, includes the following steps:

[0027] Step 1, model training: collect training audio-video samples; split the video part of each training sample into frame images; label the speech part of the sample according to the corresponding frame images, and obtain the clean-speech VAD labels of the correspond...
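Step 1 labels the speech according to the video frames, but audio is usually processed in much shorter frames than video. The excerpt is truncated before it says how the two are aligned, so the following is only a sketch under assumed parameters: 25 fps video, 16 kHz audio, 25 ms analysis frames (400 samples) with a 10 ms hop (160 samples). Each audio frame inherits the label of the video frame containing its centre; the function name `video_labels_to_audio_frames` is hypothetical.

```python
def video_labels_to_audio_frames(video_labels, video_fps=25.0, sample_rate=16000,
                                 hop=160, frame_len=400):
    """Map per-video-frame speech labels (1 = talking, 0 = silent) to audio frames.

    Hypothetical helper: each audio frame takes the label of the video frame
    that contains the audio frame's centre time.
    """
    n_samples = int(round(len(video_labels) * sample_rate / video_fps))
    if n_samples < frame_len:
        return []  # too short for a single full audio frame
    n_audio = 1 + (n_samples - frame_len) // hop
    labels = []
    for i in range(n_audio):
        centre_s = (i * hop + frame_len / 2) / sample_rate  # centre time in seconds
        v = min(int(centre_s * video_fps), len(video_labels) - 1)
        labels.append(video_labels[v])
    return labels


# Four video frames (0.16 s): silent, silent, talking, talking
print(video_labels_to_audio_frames([0, 0, 1, 1]))
# [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
```

Using the frame centre avoids assigning an audio frame to a video frame it barely overlaps; a real labeling pipeline might instead use majority overlap or a tolerance around lip onsets.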


Abstract

The invention discloses an audio-video hybrid voice front-end processing method for voice interaction of a service robot. The specific steps are as follows: (1) capture the mouth-movement information of the expected speaker by video processing; (2) perform voice activity detection according to the mouth movements of the expected speaker; (3) optimize the beamforming algorithm of the robot's microphone array according to the voice activity detection results; (4) achieve speech enhancement through the microphone array, suppressing environmental noise and improving the signal-to-noise ratio of the speech collected by the robot. The invention can effectively improve the quality of the speech signal collected by the robot in the complex sound-field environment in which it operates.
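The abstract does not state how the visual VAD result optimizes the beamformer in step (3). One common pattern, assumed here rather than taken from the patent, is to update noise statistics only while the target speaker is silent, so that an adaptive beamformer or post-filter does not learn to cancel the speaker's own voice. The sketch below uses a hypothetical normalized mouth-opening feature per frame, a threshold with a hangover to bridge short closures between syllables, and a recursively smoothed noise-power estimate that is frozen during detected speech. The names `mouth_vad` and `gated_noise_psd` and all parameter values are illustrative.

```python
def mouth_vad(openings, threshold=0.3, hangover=3):
    """Frame-wise VAD from a hypothetical mouth-opening feature.

    A frame is active when the opening exceeds `threshold`; the active decision
    is held for `hangover` further frames to bridge brief mouth closures.
    """
    decisions, hold = [], 0
    for opening in openings:
        if opening > threshold:
            hold = hangover
            decisions.append(True)
        elif hold > 0:
            hold -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


def gated_noise_psd(frame_powers, vad, alpha=0.9):
    """Recursive noise-power estimate updated only on frames the VAD marks silent."""
    if not frame_powers:
        return []
    noise = frame_powers[0]  # assumption: the first frame contains noise only
    estimates = []
    for power, active in zip(frame_powers, vad):
        if not active:
            noise = alpha * noise + (1 - alpha) * power  # exponential smoothing
        estimates.append(noise)  # held constant while the speaker talks
    return estimates


openings = [0.1, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
vad = mouth_vad(openings, threshold=0.3, hangover=2)
# [False, True, True, True, False, False, False]
powers = [1.0, 9.0, 9.0, 9.0, 1.2, 1.0, 1.1]
print(gated_noise_psd(powers, vad))
```

Because the gate comes from video rather than from the audio itself, it is not fooled by loud competing talkers or non-stationary noise, which is the motivation an audio-video front end offers over an audio-only VAD.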

Description

Technical Field
[0001] The invention belongs to the technical field of speech signal processing, and in particular relates to a speech front end that uses a microphone array in complex environments to improve the speech-capture quality of a service robot.
Background Art
[0002] Voice interaction systems, as the fastest and most effective form of intelligent human-computer interaction, are ubiquitous in daily life. A voice interaction system must capture the user's speech in different scenarios and perform automatic speech recognition (ASR) after preprocessing steps such as speech enhancement and separation. In far-field, noisy, and other harsh acoustic environments, recognition accuracy drops rapidly. To improve the robustness of the system, various speech enhancement algorithms are needed to improve the quality and reliability of the speech. Speech enhancement mainly includes: speech separation, speech reverberation...


Application Information

Patent type & authority: Patent (China)
IPC (8): G10L15/06; G10L15/14; G10L15/20; G10L15/25; G10L25/84
Inventors: 雷桐 (Lei Tong), 卢晶 (Lu Jing), 刘晓峻 (Liu Xiaojun), 狄敏 (Di Min), 吴宝佳 (Wu Baojia)
Owner: 南京南大电子智慧型服务机器人研究院有限公司