Voice activity detection method combined with voice enhancement

A voice activity detection technology combined with speech enhancement, applicable to speech analysis, neural network learning methods and neural network models. It addresses the limited VAD performance gains of existing methods, improving robustness and working efficiency while maintaining high performance.

Status: Inactive; Publication date: 2021-07-13

NORTHWESTERN POLYTECHNICAL UNIV +1

Cites: 2; Cited by: 2

AI Technical Summary

Problems solved by technology

Existing methods that introduce speech enhancement into VAD yield only limited improvements in VAD performance.

Method used

A joint network built on the Conv-TasNet fully convolutional network: a shared encoder and temporal convolutional network (TCN) feed two independent decoders, one for speech enhancement (SE) and one for voice activity detection (VAD), and the network is trained with a joint mSI-SDR and cross-entropy loss.
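The shared-encoder, two-decoder data flow of this patent can be illustrated with a toy, dependency-free Python sketch. Every component below is a hypothetical stand-in, not the patent's network: `encode` simply frames the waveform instead of applying a learned convolution, `estimate_mask` applies a sigmoid instead of running a TCN, and `se_decode` / `vad_decode` are fixed functions instead of trained decoders. What the sketch does reflect from the source is the wiring: one encoder and one mask estimator are shared, and their point-wise product feeds both decoders.

```python
import math
from typing import List, Tuple

Frames = List[List[float]]


def encode(wave: List[float], win: int) -> Frames:
    """Hypothetical stand-in for the learned encoder: split the waveform into
    non-overlapping frames of `win` samples (a trailing partial frame is dropped)."""
    return [wave[i:i + win] for i in range(0, len(wave) - win + 1, win)]


def estimate_mask(frames: Frames) -> Frames:
    """Hypothetical stand-in for the TCN: one sigmoid value in (0, 1) per coefficient."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in f] for f in frames]


def se_decode(masked: Frames) -> List[float]:
    """Hypothetical SE decoder: concatenate the masked frames back into a waveform."""
    return [x for f in masked for x in f]


def vad_decode(masked: Frames, bias: float = 0.01) -> List[float]:
    """Hypothetical VAD decoder: per-frame mean energy squashed to a speech probability."""
    probs = []
    for f in masked:
        energy = sum(x * x for x in f) / len(f)
        probs.append(1.0 / (1.0 + math.exp(-(energy - bias) * 100.0)))
    return probs


def joint_forward(wave: List[float], win: int = 4) -> Tuple[List[float], List[float]]:
    enc = encode(wave, win)        # shared encoder
    mask = estimate_mask(enc)      # shared mask estimator (TCN in the patent)
    # The point-wise product of mask and encoder output is the input to BOTH decoders.
    masked = [[e * m for e, m in zip(ef, mf)] for ef, mf in zip(enc, mask)]
    return se_decode(masked), vad_decode(masked)
```

With `wave = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] * 2`, `joint_forward(wave)` returns a 16-sample "enhanced" waveform and four frame probabilities; the two silent frames score below 0.5 and the two active frames above it.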

Examples

Specific Embodiment

[0063] This embodiment sets up two groups of sub-experiments. Groups 1 and 2 illustrate how the algorithm improves the VAD task and the SE task.

[0064] Group 1: The proposed method, a joint model trained with the mSI-SDR loss, is denoted Multi-mSS. For comparison, two further models are trained: a joint model using the SI-SDR loss (Multi-SS) and a model with only the VAD branch (single-VAD). Multi-SS has exactly the same network structure as Multi-mSS, but its SE decoder is trained with the SI-SDR target. The single-VAD model removes the SE decoder and is optimized using only the VAD loss function L_vad. VAD performance is evaluated with the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC) and the equal error rate (EER), with AUC and EER computed from one score per 10 ms frame.
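Because AUC and EER are computed from one VAD score per 10 ms frame, both metrics can be sketched in plain Python. This is an illustrative implementation, not the authors' evaluation code: AUC uses the Mann-Whitney pair-counting formulation (ties count one half; quadratic in the number of frames, so only suited to small examples), and EER is approximated as the mean of the false-alarm and miss rates at the candidate threshold where the two are closest. Each entry of `scores` is assumed to be the speech probability of one 10 ms frame, and `labels` marks speech (1) versus non-speech (0).

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both speech (1) and non-speech (0) frames")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """(threshold, false-alarm rate, miss rate) for every candidate threshold.
    A frame is declared speech when its score is >= the threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both speech (1) and non-speech (0) frames")
    points = []
    for t in sorted(set(scores)) + [float("inf")]:
        fa = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        miss = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t) / n_pos
        points.append((t, fa, miss))
    return points


def eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Equal error rate: mean of the two error rates where they are closest."""
    _, fa, miss = min(roc_points(scores, labels), key=lambda p: abs(p[1] - p[2]))
    return (fa + miss) / 2.0
```

For `scores = [0.1, 0.6, 0.4, 0.9]` with `labels = [0, 0, 1, 1]`, three of the four speech/non-speech pairs are ranked correctly, giving an AUC of 0.75, and the error rates meet at threshold 0.6, giving an EER of 0.5.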

Abstract

The invention discloses a voice activity detection method combined with speech enhancement. First, a joint network model is constructed on the basis of the Conv-TasNet fully convolutional network; the joint model consists of three parts: an encoder, a temporal convolutional network (TCN) and a decoder. The speech enhancement (SE) task and the voice activity detection (VAD) task use two independent decoders while sharing the same encoder and TCN. The mask output by the TCN is multiplied point-wise with the encoder output, and the product serves as the input to both decoders. In the training stage, a joint loss function combining mSI-SDR and cross-entropy is used to evaluate the output. Finally, the network is trained with the Adam optimizer, and the trained network performs voice activity detection more effectively. By jointly training speech enhancement and voice activity detection, the invention improves the robustness of voice activity detection, which maintains high performance in complex noise environments, especially under severe interference from other human voices.
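The joint training objective pairs an SI-SDR-style enhancement loss with a cross-entropy VAD loss. A minimal sketch follows, under stated assumptions: the exact definition of mSI-SDR is not given in this excerpt, so the standard SI-SDR (the target used by the Multi-SS baseline in [0064]) stands in for it, and the weighting factor `lam` is a hypothetical parameter, not a value from the patent.

```python
import math
from typing import Sequence


def si_sdr(est: Sequence[float], ref: Sequence[float], eps: float = 1e-8) -> float:
    """Standard scale-invariant SDR in dB; both signals are zero-meaned first.
    (Stand-in for the patent's mSI-SDR, whose definition is not shown here.)"""
    m_e, m_r = sum(est) / len(est), sum(ref) / len(ref)
    est = [x - m_e for x in est]
    ref = [x - m_r for x in ref]
    # Project the estimate onto the reference to get the scaled target component.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    return 10.0 * math.log10((sum(t * t for t in target) + eps) / (sum(n * n for n in noise) + eps))


def bce(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between frame speech probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def joint_loss(est_wave: Sequence[float], clean_wave: Sequence[float],
               vad_probs: Sequence[float], vad_labels: Sequence[int],
               lam: float = 1.0) -> float:
    """Negative SI-SDR (maximize SDR) plus lam times the VAD cross-entropy.
    `lam` is a hypothetical weight chosen for illustration."""
    return -si_sdr(est_wave, clean_wave) + lam * bce(vad_probs, vad_labels)
```

Negating SI-SDR turns "higher is better" into a quantity to minimize, so both terms can be summed and minimized together by an optimizer such as Adam.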

Description

Technical field: [0001] The invention belongs to the technical field of speech recognition and relates in particular to a voice activity detection method.

Background art: [0002] Voice activity detection (VAD) aims to distinguish speech segments from noise segments in audio recordings. It is an important front end for many speech-related applications, such as speech recognition and speaker recognition. In recent years, deep learning-based VAD has brought significant performance improvements. In particular, end-to-end VAD, which feeds time-domain signals directly into a deep network, is a recent research trend.

[0003] Although deep learning-based VAD has proven effective, how to further improve its performance in low signal-to-noise ratio (SNR) environments has long been a topic of interest. A VAD model alone struggles to meet this requirement, so a natural idea is to introduce speech enhancement (SE) into VAD. The earliest methods initialize VAD using deep learning-based inter-me...
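Distinguishing speech segments from noise segments is usually posed as a per-frame decision. The sketch below converts per-sample speech/non-speech annotations into frame labels at the 10 ms rate used for scoring in [0064]. The 16 kHz sample rate and the strict-majority rule (a frame counts as speech only if more than half of its samples are speech) are assumptions for illustration, not details taken from the patent.

```python
from typing import List, Sequence


def frame_labels(sample_labels: Sequence[int], sample_rate: int = 16000,
                 frame_ms: int = 10) -> List[int]:
    """Collapse per-sample speech (1) / non-speech (0) labels into one label per
    frame_ms-long frame by strict majority vote.
    Assumptions: non-overlapping frames; a trailing partial frame is dropped."""
    hop = sample_rate * frame_ms // 1000  # 160 samples at 16 kHz and 10 ms
    labels = []
    for start in range(0, len(sample_labels) - hop + 1, hop):
        chunk = sample_labels[start:start + hop]
        labels.append(1 if 2 * sum(chunk) > len(chunk) else 0)
    return labels
```

A strict majority keeps a frame that is exactly half speech labeled as non-speech; a different tie rule would shift segment boundaries by at most one frame.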

Application Information

Patent type and authority: Application (China)

IPC(8): G10L25/78, G10L25/84, G10L25/30, G06N3/04, G06N3/08

CPC: G10L25/78, G10L25/84, G10L25/30, G06N3/08, G06N3/048, G06N3/045

Inventors: Zhang Xiaolei (张晓雷), Tan Xu (谭旭), Chen Yijiang (陈益江)

Owner: NORTHWESTERN POLYTECHNICAL UNIV