Voice enhancing method based on multiresolution auditory cepstrum coefficient and deep convolutional neural network

A deep convolutional neural network technology, applied in speech analysis, speech recognition, instruments, etc., which addresses the unsatisfactory performance of current speech enhancement algorithms, particularly under non-stationary noise.

Active Publication Date: 2018-03-27
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to propose a speech enhancement method combining multi-resolution auditory cepstral coefficients and a deep convolutional neural network, in view of the unsatisfactory performance of current speech enhancement algorithms under non-stationary noise and the problems existing in the extraction of speech feature parameters.
Then, the adaptive masking threshold based on ideal so...

Method used



Examples


Detailed description of embodiments

[0031] As shown in Figure 1, the present invention provides a speech enhancement method based on multi-resolution auditory cepstral coefficients and a deep convolutional neural network, comprising the following steps:

[0032] Step 1: perform time-frequency decomposition on the input signal, then apply windowing and framing to obtain the time-frequency representation of the input signal;
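As a minimal illustration of the windowing and framing in Step 1, the sketch below splits a one-dimensional signal into overlapping Hamming-windowed frames. The frame length (25 ms), frame shift (10 ms), sampling rate, and window type are assumptions for illustration only; the patent text above does not fix them.

```python
import numpy as np

def frame_signal(x, fs=16000, frame_ms=25, shift_ms=10):
    """Split a 1-D signal into overlapping Hamming-windowed frames.

    Frame length/shift and the Hamming window are illustrative defaults,
    not values taken from the patent.
    """
    flen = int(fs * frame_ms / 1000)     # samples per frame (400 at 16 kHz)
    fshift = int(fs * shift_ms / 1000)   # samples per shift (160 at 16 kHz)
    n_frames = 1 + max(0, (len(x) - flen) // fshift)
    win = np.hamming(flen)
    frames = np.stack([x[i * fshift : i * fshift + flen] * win
                       for i in range(n_frames)])
    return frames  # shape: (n_frames, flen)
```

Each row of the returned array is one windowed frame, ready for per-frame spectral analysis.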

[0033] (1) First, time-frequency decomposition is performed on the input signal;

[0034] The speech signal is a typical time-varying signal. Time-frequency decomposition focuses on the time-varying spectral characteristics of the components of the real speech signal, decomposing the one-dimensional speech signal into a two-dimensional time-frequency representation, with the aim of revealing which frequency components the speech signal contains and how each component varies with time. The Gammatone filter is a good tool for time-frequency...
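The Gammatone-filterbank decomposition described above can be sketched as follows. This is a hedged illustration rather than the patent's exact implementation: the channel count (64), ERB-rate center-frequency spacing (Glasberg–Moore/Slaney formula), and 4th-order filter are common choices in auditory modeling that are assumed here.

```python
import numpy as np

def erb_space(low, high, n):
    # ERB-rate spacing of center frequencies (Glasberg & Moore constants)
    ear_q, min_bw = 9.26449, 24.7
    return -(ear_q * min_bw) + np.exp(
        np.arange(1, n + 1) * (-np.log(high + ear_q * min_bw)
        + np.log(low + ear_q * min_bw)) / n) * (high + ear_q * min_bw)

def gammatone_ir(fc, fs, duration=0.064, order=4):
    # 4th-order gammatone impulse response: t^(n-1) e^{-2*pi*b*t} cos(2*pi*fc*t)
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * 24.7 * (4.37 * fc / 1000 + 1)   # bandwidth from ERB of fc
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_decompose(x, fs=16000, n_ch=64):
    """Decompose a 1-D signal into an (n_ch, len(x)) time-frequency map."""
    fcs = erb_space(50, fs / 2 * 0.9, n_ch)
    return np.stack([np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
                     for fc in fcs])
```

Each output row is the signal seen through one auditory channel; framing each row then yields the two-dimensional time-frequency representation the text describes.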



Abstract

The invention discloses a voice enhancing method based on multi-resolution auditory cepstral coefficients and a deep convolutional neural network. The method comprises the following steps: first, constructing new characteristic parameters, the multi-resolution auditory cepstral coefficients (MR-GFCC), which are capable of distinguishing speech from noise; second, establishing an adaptive masking threshold based on the ideal ratio mask (IRM) and the ideal binary mask (IBM) according to noise variations; then training a seven-layer deep convolutional neural network (DCNN), using the newly extracted characteristic parameters and their first and second derivatives as input and the adaptive masking threshold as output; and finally enhancing the noisy speech with the adaptive masking threshold estimated by the DCNN. The method makes full use of the working mechanism of the human ear: the speech characteristic parameters simulate the physiological model of human auditory perception, so not only is a relatively large amount of speech information retained, but the extraction process is also simple and feasible.
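The IRM and IBM masks named in the abstract, and an adaptive combination of them, can be sketched as below. The IRM and IBM definitions are the standard ones from the masking-based enhancement literature; the combination rule in `adaptive_mask` (weighting the IBM more heavily as noise power grows) is a hypothetical stand-in, since the patent's exact adaptation rule is not given in this excerpt.

```python
import numpy as np

def ideal_masks(speech_pow, noise_pow, lc_db=0.0):
    """Standard IRM and IBM per time-frequency unit from clean speech/noise power."""
    irm = speech_pow / (speech_pow + noise_pow + 1e-12)        # ideal ratio mask
    snr_db = 10 * np.log10(speech_pow / (noise_pow + 1e-12) + 1e-12)
    ibm = (snr_db > lc_db).astype(float)                       # ideal binary mask
    return irm, ibm

def adaptive_mask(irm, ibm, noise_pow):
    # Hypothetical adaptation rule for illustration only: lean on the IBM
    # where noise is strong and on the IRM where noise is weak.
    w = noise_pow / (noise_pow.max() + 1e-12)
    return w * ibm + (1 - w) * irm
```

In the patent's pipeline, a mask of this kind would serve as the DCNN training target; at inference time the estimated mask is applied to the noisy time-frequency representation to recover enhanced speech.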

Description

Technical field

[0001] The invention belongs to the technical field of speech signal processing, and relates to a speech enhancement method based on multi-resolution auditory cepstral coefficients and a deep convolutional neural network.

Background technique

[0002] Speech enhancement technology refers to extracting the purest possible speech signal from a noisy background when the speech signal is interfered with, or even submerged by, various noises (including other speech): enhancing the useful speech signal while suppressing and reducing the noise interference. Due to the randomness of the interference, it is almost impossible to extract a completely pure speech signal from noisy speech. In this case, speech enhancement has two main purposes: one is to improve speech quality and eliminate background noise, so that the listener accepts the signal without feeling fatigued, which is a subjective measure; the other is to improve the intelligibility of speech, which is an o...

Claims


Application Information

IPC(8): G10L21/0216; G10L15/16; G10L25/24
CPC: G10L15/16; G10L21/0216; G10L25/24
Inventors: 李如玮, 刘亚楠, 李涛, 孙晓月
Owner BEIJING UNIV OF TECH