Audio and video hybrid voice front-end processing method for service robot voice interaction

A voice-interaction and hybrid-voice technology, applied in voice analysis, voice recognition, instruments, and related fields. It addresses problems such as low-pass distortion of the signal and main-lobe narrowing, achieves good sound quality and speech intelligibility, and improves accuracy.

Active Publication Date: 2021-12-31

南京南大电子智慧型服务机器人研究院有限公司 +2

14 Cites, 0 Cited by

AI Technical Summary
Problems solved by technology

The delay-and-sum (DS) beamformer (BRANDSTEIN M, WARD D. Microphone arrays: signal processing techniques and applications [M]. [S.l.]: Springer Science & Business Media, 2013.) is the most commonly used fixed-beam algorithm. It is robust to disturbances, but its main lobe narrows as the frequency increases. That is, the higher the frequency, the stronger the directivity, which results in low-pass distortion of the signal.
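
The frequency dependence described above can be illustrated with a small numerical sketch. It is not taken from the patent: the uniform linear array, the 8 microphones, the 4 cm spacing, and the function names `ds_response` and `half_power_beamwidth` are all assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ds_response(freq_hz, angle_deg, num_mics=8, spacing_m=0.04, steer_deg=0.0):
    """Magnitude response of a delay-and-sum beamformer on a uniform linear array.

    Angles are measured from broadside. The array geometry is hypothetical.
    """
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    # Phase step between adjacent microphones, relative to the steering direction.
    delta = k * spacing_m * (
        math.sin(math.radians(angle_deg)) - math.sin(math.radians(steer_deg))
    )
    total = sum(cmath.exp(1j * m * delta) for m in range(num_mics))
    return abs(total) / num_mics


def half_power_beamwidth(freq_hz, step_deg=0.1, **array_kwargs):
    """Approximate -3 dB main-lobe width in degrees for a broadside-steered beam."""
    threshold = 1.0 / math.sqrt(2.0)
    angle = 0.0
    # Walk away from broadside until the response drops below -3 dB.
    while angle < 90.0 and ds_response(freq_hz, angle, **array_kwargs) >= threshold:
        angle += step_deg
    return 2.0 * angle


widths = {f: half_power_beamwidth(f) for f in (500, 1000, 2000, 4000)}
for f, w in widths.items():
    print(f"{f} Hz: main lobe roughly {w:.1f} degrees wide")
```

Under these assumptions the computed width shrinks as the frequency rises, so a talker slightly off the steering direction loses more high-frequency than low-frequency energy, which is the low-pass effect mentioned above.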

Embodiment Construction

[0025] The present invention is further illustrated below in conjunction with the accompanying drawings and specific embodiments. It should be understood that these examples are only for illustrating the present invention and are not intended to limit its scope. After reading the present invention, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of the present application.

[0026] An audio-video mixed voice front-end processing method for service robot voice interaction, as shown in Figure 1, includes the following steps:

[0027] Step 1, model training: collect training audio and video samples; divide the video part of the training samples into images frame by frame; label the voice part of the training samples according to the corresponding frame image; and obtain the clean voice VAD label of the correspond...
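
As a rough illustration of labelling audio according to the corresponding video frame, the sketch below spreads one speaking/silent label per video frame over short audio analysis frames. The 25 fps video rate, 16 kHz sample rate, 10 ms hop, and the function name `video_labels_to_audio_frames` are assumptions for illustration. The source text is truncated, so this may not match the patent's actual labelling procedure.

```python
def video_labels_to_audio_frames(video_labels, video_fps, num_audio_samples,
                                 sample_rate, hop_samples):
    """Map one 0/1 speaking label per video frame onto audio analysis frames.

    Each audio frame receives the label of the video frame that covers its
    start time. Integer arithmetic avoids floating-point rounding at frame
    boundaries; video_fps is assumed to be an integer.
    """
    audio_labels = []
    last = len(video_labels) - 1
    for start in range(0, num_audio_samples, hop_samples):
        video_index = min(start * video_fps // sample_rate, last)
        audio_labels.append(video_labels[video_index])
    return audio_labels


# Four video frames at 25 fps span 0.16 s, i.e. 2560 samples at 16 kHz.
video_labels = [0, 1, 1, 0]  # hypothetical labels: 1 = mouth moving
audio_labels = video_labels_to_audio_frames(
    video_labels, video_fps=25, num_audio_samples=2560,
    sample_rate=16000, hop_samples=160,
)
print(audio_labels)
```

With these numbers every video frame covers four 10 ms audio frames, so the audio-rate label sequence simply repeats each video label four times.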

Abstract

The invention discloses an audio-video mixed voice front-end processing method for voice interaction of a service robot. The specific steps are as follows: (1) capture the mouth movement information of the expected speaker through video processing; (2) perform voice activity detection according to the mouth movement information of the expected speaker; (3) optimize the beam algorithm of the robot microphone array according to the voice activity detection results; (4) realize voice enhancement through the microphone array, suppress environmental noise, and improve the signal-to-noise ratio of the voice collected by the robot. The invention can effectively improve the signal quality of the voice collected by the robot in the complex sound field environment where the robot is located.
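
One common way for voice activity detection results to inform a beamformer is to update the noise statistics only during frames marked as non-speech. The sketch below shows that idea for a two-microphone MVDR beamformer in a single frequency bin. The abstract does not name MVDR, so the choice of beamformer, the smoothing factor `alpha`, the diagonal `loading` term, and both function names are assumptions rather than the patent's method.

```python
def update_noise_cov(cov, frame, is_speech, alpha=0.9):
    """Recursively average a 2x2 noise covariance, skipping speech frames.

    cov is a 2x2 nested list of complex numbers, frame is a pair of complex
    microphone values for one frequency bin, is_speech is the VAD decision.
    """
    if is_speech:
        return cov  # freeze the noise estimate while the talker is active
    x0, x1 = frame
    inst = [[x0 * x0.conjugate(), x0 * x1.conjugate()],
            [x1 * x0.conjugate(), x1 * x1.conjugate()]]
    return [[alpha * cov[i][j] + (1 - alpha) * inst[i][j] for j in range(2)]
            for i in range(2)]


def mvdr_weights(cov, steer, loading=1e-3):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for two microphones."""
    a, b = cov[0][0] + loading, cov[0][1]
    c, d = cov[1][0], cov[1][1] + loading
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    r_inv_d = [inv[0][0] * steer[0] + inv[0][1] * steer[1],
               inv[1][0] * steer[0] + inv[1][1] * steer[1]]
    denom = steer[0].conjugate() * r_inv_d[0] + steer[1].conjugate() * r_inv_d[1]
    return [r_inv_d[0] / denom, r_inv_d[1] / denom]


# Hypothetical data: three frames of one frequency bin with VAD decisions.
cov = [[1 + 0j, 0j], [0j, 1 + 0j]]
frames = [(1 + 0j, 0.5j), (2 + 0j, 2 + 0j), (0.5 + 0j, -0.5 + 0j)]
vad = [0, 1, 0]
for frame, is_speech in zip(frames, vad):
    cov = update_noise_cov(cov, frame, is_speech)

steer = [1 + 0j, 1 + 0j]  # assumed steering vector toward the talker
w = mvdr_weights(cov, steer)
response = w[0].conjugate() * steer[0] + w[1].conjugate() * steer[1]
print(abs(response))
```

The weights keep unit gain in the assumed talker direction while minimizing output power from the noise covariance, and gating the covariance update on the VAD decision keeps the talker's own speech out of the noise estimate.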

Description

Technical field

[0001] The invention belongs to the technical field of voice signal processing, and in particular relates to a voice front-end that uses a microphone array in a complex environment to improve the voice collection quality of a service robot.

Background

[0002] As the fastest and most effective intelligent human-computer interaction system, the voice interaction system is ubiquitous in our lives. It needs to capture the user's speech audio in different scenarios and perform automatic speech recognition (ASR) after preprocessing steps such as speech enhancement and separation. In far-field, noisy, and other harsh acoustic environments, recognition accuracy drops rapidly. To improve the robustness of the system, various speech enhancement algorithms are needed to improve the quality and reliability of the speech. Speech enhancement mainly includes: speech separation, speech reverberation...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L15/06, G10L15/14, G10L15/20, G10L15/25, G10L25/84

Inventors: 雷桐, 卢晶, 刘晓峻, 狄敏, 吴宝佳

Owner: 南京南大电子智慧型服务机器人研究院有限公司
