Voice enhancing method based on multiresolution auditory cepstrum coefficient and deep convolutional neural network

A deep convolutional neural network technology, applied in speech analysis, speech recognition, instruments, etc., which addresses the unsatisfactory performance of current speech enhancement algorithms, particularly under non-stationary noise.

Active Publication Date: 2018-03-27
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to propose a speech enhancement method combining multi-resolution auditory cepstral coefficients and a deep convolutional neural network, in view of the unsatisfactory performance of current speech enhancement algorithms under non-stationary noise and the problems existing in the extraction of speech feature parameters.
Then, the adaptive masking threshold based on ideal so...

Method used



Examples


Detailed description of embodiments

[0031] As shown in Figure 1, the present invention provides a speech enhancement method based on multi-resolution auditory cepstral coefficients and a deep convolutional neural network, comprising the following steps:

[0032] Step 1: perform time-frequency decomposition on the input signal, then apply windowing and framing to obtain the time-frequency representation of the input signal;
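As a minimal illustration of the windowing and framing in Step 1, the sketch below splits a one-dimensional signal into overlapping Hamming-windowed frames. The frame length (25 ms), frame shift (10 ms), sampling rate, and window type are assumptions for illustration only; the patent text above does not fix them.

```python
import numpy as np

def frame_signal(x, fs=16000, frame_ms=25, shift_ms=10):
    """Split a 1-D signal into overlapping Hamming-windowed frames.

    Frame length/shift and the Hamming window are illustrative defaults,
    not values taken from the patent.
    """
    flen = int(fs * frame_ms / 1000)     # samples per frame (400 at 16 kHz)
    fshift = int(fs * shift_ms / 1000)   # samples per shift (160 at 16 kHz)
    n_frames = 1 + max(0, (len(x) - flen) // fshift)
    win = np.hamming(flen)
    frames = np.stack([x[i * fshift : i * fshift + flen] * win
                       for i in range(n_frames)])
    return frames  # shape: (n_frames, flen)
```

Each row of the returned array is one windowed frame, ready for per-frame spectral analysis.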

[0033] (1) First, time-frequency decomposition is performed on the input signal;

[0034] The speech signal is a typical time-varying signal. Time-frequency decomposition focuses on the time-varying spectral characteristics of the components of the real speech signal, decomposing the one-dimensional speech signal into a two-dimensional time-frequency representation, with the aim of revealing which frequency components the speech signal contains and how each component varies with time. The Gammatone filter is a good tool for time-frequency...
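The Gammatone-filterbank decomposition described above can be sketched as follows. This is a hedged illustration rather than the patent's exact implementation: the channel count (64), ERB-rate center-frequency spacing (Glasberg–Moore/Slaney formula), and 4th-order filter are common choices in auditory modeling that are assumed here.

```python
import numpy as np

def erb_space(low, high, n):
    # ERB-rate spacing of center frequencies (Glasberg & Moore constants)
    ear_q, min_bw = 9.26449, 24.7
    return -(ear_q * min_bw) + np.exp(
        np.arange(1, n + 1) * (-np.log(high + ear_q * min_bw)
        + np.log(low + ear_q * min_bw)) / n) * (high + ear_q * min_bw)

def gammatone_ir(fc, fs, duration=0.064, order=4):
    # 4th-order gammatone impulse response: t^(n-1) e^{-2*pi*b*t} cos(2*pi*fc*t)
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * 24.7 * (4.37 * fc / 1000 + 1)   # bandwidth from ERB of fc
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_decompose(x, fs=16000, n_ch=64):
    """Decompose a 1-D signal into an (n_ch, len(x)) time-frequency map."""
    fcs = erb_space(50, fs / 2 * 0.9, n_ch)
    return np.stack([np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
                     for fc in fcs])
```

Each output row is the signal seen through one auditory channel; framing each row then yields the two-dimensional time-frequency representation the text describes.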



Abstract

The invention discloses a voice enhancing method based on multi-resolution auditory cepstral coefficients and a deep convolutional neural network. The method comprises the following steps: first, constructing new characteristic parameters, the multi-resolution auditory cepstral coefficients (MR-GFCC), which are capable of distinguishing speech from noise; second, establishing an adaptive masking threshold based on the ideal ratio mask (IRM) and the ideal binary mask (IBM) according to noise variations; then training a seven-layer deep convolutional neural network (DCNN), using the newly extracted characteristic parameters and their first and second derivatives as input and the adaptive masking threshold as output; and finally enhancing the noisy speech with the adaptive masking threshold estimated by the DCNN. The method makes full use of the working mechanism of the human ear: the speech characteristic parameters simulate the physiological model of human auditory perception, so not only is a relatively large amount of speech information retained, but the extraction process is also simple and feasible.
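The IRM and IBM masks named in the abstract, and an adaptive combination of them, can be sketched as below. The IRM and IBM definitions are the standard ones from the masking-based enhancement literature; the combination rule in `adaptive_mask` (weighting the IBM more heavily as noise power grows) is a hypothetical stand-in, since the patent's exact adaptation rule is not given in this excerpt.

```python
import numpy as np

def ideal_masks(speech_pow, noise_pow, lc_db=0.0):
    """Standard IRM and IBM per time-frequency unit from clean speech/noise power."""
    irm = speech_pow / (speech_pow + noise_pow + 1e-12)        # ideal ratio mask
    snr_db = 10 * np.log10(speech_pow / (noise_pow + 1e-12) + 1e-12)
    ibm = (snr_db > lc_db).astype(float)                       # ideal binary mask
    return irm, ibm

def adaptive_mask(irm, ibm, noise_pow):
    # Hypothetical adaptation rule for illustration only: lean on the IBM
    # where noise is strong and on the IRM where noise is weak.
    w = noise_pow / (noise_pow.max() + 1e-12)
    return w * ibm + (1 - w) * irm
```

In the patent's pipeline, a mask of this kind would serve as the DCNN training target; at inference time the estimated mask is applied to the noisy time-frequency representation to recover enhanced speech.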

Description

Technical field

[0001] The invention belongs to the technical field of speech signal processing, and relates to a speech enhancement method based on multi-resolution auditory cepstral coefficients and a deep convolutional neural network.

Background technique

[0002] Speech enhancement technology refers to extracting the purest possible speech signal from a noisy background when the speech signal is interfered with, or even submerged by, various noises (including other speech): enhancing the useful speech signal while suppressing and reducing the noise interference. Due to the randomness of the interference, it is almost impossible to extract a completely pure speech signal from noisy speech. In this case, speech enhancement has two main purposes: one is to improve speech quality and eliminate background noise, so that the listener accepts the signal without feeling fatigued, which is a subjective measure; the other is to improve the intelligibility of speech, which is an o...

Claims


Application Information

IPC(8): G10L21/0216; G10L15/16; G10L25/24
CPC: G10L15/16; G10L21/0216; G10L25/24
Inventors: 李如玮, 刘亚楠, 李涛, 孙晓月
Owner BEIJING UNIV OF TECH