Multi-target voice enhancement method based on SCNN (Stacked Convolutional Neural Network) and TCNN (Temporal Convolutional Neural Network) joint estimation

A joint-estimation speech enhancement technology, applicable to speech analysis, speech recognition, and related instruments, which addresses the problem of unsatisfactory speech enhancement performance

Active Publication Date: 2020-03-06
BEIJING UNIV OF TECH


Problems solved by technology

[0006] The purpose of the invention is to propose a brand-new multi-objective speech enhancement algorithm for the unsat...



Examples

Specific embodiments

[0034] As shown in Figure 1, the present invention provides a new speech enhancement method based on multi-target learning, comprising the following steps:

[0035] Step 1: apply framing and windowing to the input signal to obtain its time-frequency representation;

[0036] (1) First, time-frequency decomposition is performed on the input signal;

[0037] The speech signal is a typical time-varying signal. Time-frequency decomposition focuses on the time-varying spectral characteristics of the components of a real speech signal, decomposing the one-dimensional speech signal into a two-dimensional time-frequency representation. The aim is to reveal which frequency components a speech signal contains and how each component varies with time.

[0038] First, the original speech signal y(p) is preprocessed as in Equation (1): the signal is divided into frames, and each frame is smoothed by a Hamming window...
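The framing-and-windowing front end described in steps [0035]-[0038] can be sketched as follows. This is a minimal NumPy illustration, not the patent's exact procedure: the frame length of 512 samples, hop of 256, and 16 kHz sampling rate are assumed values, since Equation (1) itself is truncated in the source text.

```python
import numpy as np

def frame_and_window(y, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    frame_len and hop are illustrative choices, not the patent's parameters.
    """
    n_frames = 1 + (len(y) - frame_len) // hop
    win = np.hamming(frame_len)
    frames = np.stack([y[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * win

def log_power_spectra(frames, eps=1e-12):
    """Per-frame log-power spectrum (LPS) via the real FFT of each windowed frame."""
    spec = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spec) ** 2 + eps)

# Example: 1 s of white noise at 16 kHz -> 2-D time-frequency representation
y = np.random.randn(16000)
lps = log_power_spectra(frame_and_window(y))
print(lps.shape)  # (n_frames, frame_len // 2 + 1)
```

The resulting LPS matrix is the two-dimensional time-frequency representation that paragraph [0037] describes: one axis indexes frames (time), the other FFT bins (frequency).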



Abstract

The invention provides a multi-target voice enhancement method based on SCNN (Stacked Convolutional Neural Network) and TCNN (Temporal Convolutional Neural Network) joint estimation. On the basis of a SCNN and a TCNN, a new stacked and temporal convolutional neural network (STCNN) is provided; a log-power spectra (LPS) is used as the main characteristic and input into the SCNN, so that high-level abstract characteristics are extracted; then, a power-function-compressed Mel-frequency cepstral coefficient (PC-MFCC), which better accords with the auditory characteristics of human ears, is provided; the TCNN takes the high-level abstract characteristics extracted by the SCNN and the PC-MFCC as input; then, sequence modelling is carried out; furthermore, joint estimation of the clean LPS, PC-MFCC and an ideal ratio mask (IRM) is carried out; finally, in the enhancement stage, different voice characteristics have complementarity in the voice synthesis process; an IRM-based post-processing method is provided; and enhanced speech is synthesized by adaptively adjusting the weights of the estimated LPS and IRM through voice presence information.
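Two of the quantities named in the abstract, the PC-MFCC feature and the ideal ratio mask, can be sketched as below. This is a hedged illustration only: the compression exponent alpha = 1/15, the 40-band Mel filterbank, and the 13 retained cepstral coefficients are assumed values, since the patent text does not state its actual parameters here. The key idea shown is replacing the logarithm in the standard MFCC pipeline with a power-function compression x**alpha.

```python
import numpy as np

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Triangular Mel filterbank mapping n_fft//2+1 FFT bins to n_filters bands."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(0.0, mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
    return fb

def dct2(x):
    """Unnormalized DCT-II along the last axis (sufficient for a sketch)."""
    N = x.shape[-1]
    n = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5) * n[:, None])  # basis[k, n]
    return x @ basis.T

def pc_mfcc(power_spec, fb, alpha=1.0 / 15.0, n_ceps=13):
    """PC-MFCC sketch: power-law compression (x**alpha) replaces the usual log.

    alpha and n_ceps are assumed values, not taken from the patent.
    """
    mel_energy = power_spec @ fb.T                        # (frames, n_filters)
    compressed = np.maximum(mel_energy, 1e-12) ** alpha   # power-function compression
    return dct2(compressed)[:, :n_ceps]

def ideal_ratio_mask(clean_ps, noise_ps):
    """IRM: per-bin square-root ratio of clean power to total power, in [0, 1]."""
    return np.sqrt(clean_ps / (clean_ps + noise_ps + 1e-12))

# Toy usage with random power spectra (61 frames, 257 FFT bins)
ps = np.abs(np.random.randn(61, 257)) ** 2
fb = mel_filterbank()
print(pc_mfcc(ps, fb).shape)  # (61, 13)
irm = ideal_ratio_mask(ps, np.abs(np.random.randn(61, 257)) ** 2)
```

Because the IRM lies in [0, 1], it can serve directly as a per-bin weight when blending mask-based and LPS-based estimates at synthesis time, which is the role the abstract's IRM-based post-processing plays.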

Description

Technical field:

[0001] The invention belongs to the technical field of speech signal processing, and relates to speech recognition and speech enhancement in mobile speech communication, which are key speech signal processing technologies.

Background technique:

[0002] The purpose of speech enhancement is to remove background noise from noisy speech and improve its quality and intelligibility. Single-channel speech enhancement technology is widely used in many fields of speech signal processing, including mobile speech communication, speech recognition, and digital hearing aids. However, at present, the performance of speech enhancement systems in these fields in real acoustic environments is not always satisfactory. Traditional speech enhancement techniques, such as spectral subtraction, Wiener filtering, minimum mean square error, statistical models, and wavelet transforms, are unsupervised speech enhancement methods that have been extensively studied in the past...

Claims


Application Information

IPC(8): G10L15/20; G10L15/02; G10L15/06; G10L21/0216; G10L21/0264; G10L25/03; G10L25/24; G10L25/30
CPC: G10L15/20; G10L15/02; G10L15/063; G10L21/0216; G10L21/0264; G10L25/03; G10L25/24; G10L25/30
Inventor: 李如玮, 孙晓月, 李涛, 赵丰年
Owner BEIJING UNIV OF TECH