Speech enhancement method based on time-frequency domain generative adversarial network

Speech enhancement technology operating in the time-frequency domain, applied in speech analysis, instruments, etc. It addresses the problem that existing methods ignore the frequency-domain characteristics of speech and noise.

Active Publication Date: 2021-05-14
WUHAN UNIV

AI Technical Summary

Problems solved by technology

The traditional GAN-based speech enhancement method (SEGAN) maps noisy speech to clean speech only in the time domain, ignoring the frequency-domain characteristics of speech and noise.


Examples

Embodiment approach

[0111] Further, the specific implementation is as follows:

[0112] Training the generative adversarial network is an alternating (cross-training) process, which is divided into the following steps:

[0113] Step 4.1: The training-set speech, which includes clean original speech and noisy original speech, is divided into frames and sampled to obtain clean speech x and noisy speech x_c. The frame length is N = 16384, the frame shift is M = 10 ms, and the sampling rate is S = 16 kHz;
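The framing in Step 4.1 can be illustrated with a minimal NumPy sketch. The function name `frame_signal`, the conversion of the 10 ms frame shift into 160 samples at 16 kHz, and the choice to drop trailing samples that do not fill a whole frame are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

def frame_signal(x, frame_len=16384, hop=160):
    """Split a 1-D signal into overlapping frames (illustrative sketch).

    frame_len=16384 follows N in Step 4.1. hop=160 assumes the 10 ms frame
    shift is applied at S = 16 kHz (0.010 s * 16000 Hz = 160 samples).
    Trailing samples that do not fill a whole frame are dropped, which is
    an assumption of this sketch rather than something the source states.
    """
    x = np.asarray(x, dtype=np.float32)
    if x.shape[0] < frame_len:
        # Signal too short for even one frame: return an empty frame matrix.
        return np.empty((0, frame_len), dtype=np.float32)
    n_frames = 1 + (x.shape[0] - frame_len) // hop
    # Row i of idx holds the sample positions [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]
```

Applying the same function to the clean and the noisy recordings would give aligned frame pairs (x, x_c), provided both recordings have the same length.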

[0114] Step 4.2: A short-time Fourier transform (STFT) is applied to the training-set speech to obtain the frequency-domain magnitude spectra X and X_c of the clean speech and the noisy speech, respectively. The STFT uses a Hamming window with window length N and sampling rate S. The standard short-time Fourier transform is given in Equation 4.

[0115]

[0116] where n is time, x(n) is the time d...
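As a rough illustration of Step 4.2, the sketch below computes a magnitude spectrogram with a Hamming window using NumPy. It follows the common discrete STFT (window each segment, then take its DFT) and is not claimed to match the patent's Equation 4, which is not reproduced in this excerpt. The function name `stft_magnitude` and the default `win_len` and `hop` values are hypothetical and chosen only to keep the example small.

```python
import numpy as np

def stft_magnitude(x, win_len=512, hop=128):
    """Magnitude spectrogram of a 1-D signal using a Hamming window.

    win_len and hop are illustrative values, not taken from the patent.
    Returns an array of shape (n_frames, win_len // 2 + 1).
    """
    x = np.asarray(x, dtype=np.float64)
    if x.shape[0] < win_len:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hamming(win_len)
    n_frames = 1 + (x.shape[0] - win_len) // hop
    # Row i of idx selects the samples of the i-th analysis segment.
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * window                 # windowed segments
    spectrum = np.fft.rfft(frames, axis=1)   # one-sided DFT of each segment
    return np.abs(spectrum)                  # keep magnitude, discard phase
```

For example, a 1 kHz tone sampled at 16 kHz and analysed with `win_len=512` has a bin spacing of 16000 / 512 = 31.25 Hz, so its energy peaks at bin 1000 / 31.25 = 32 in every frame.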

Abstract

The invention discloses a speech enhancement method based on a time-frequency domain generative adversarial network. A frequency-domain discriminator is added to the traditional GAN-based speech enhancement method, so that the speech enhancement model learns time-domain and frequency-domain characteristics of the input speech simultaneously, which improves model performance. The time-domain discriminator directly discriminates the enhanced speech output by the generator; the frequency-domain discriminator discriminates the frequency-domain features obtained by applying a short-time Fourier transform to the enhanced speech. During training, the two discriminators supervise the generator at the same time, so that the generator learns time-domain and frequency-domain characteristics of speech and noise together. In addition, to retain the underlying information of the original speech and to prevent the generator from overfitting, a frequency-domain constraint term is added to the model's loss function. Compared with the traditional method, the proposed method provides better enhancement performance, handles more types of noise, and applies to a wider range of scenarios.
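The abstract describes the generator objective only in words, so the following NumPy sketch shows one plausible way the three parts could be combined. Everything specific here is an assumption: the least-squares adversarial form, the L1 distance between magnitude spectra as the frequency-domain constraint, the weight `lam`, and the function name `generator_loss` are illustrative choices, not the patent's actual loss function.

```python
import numpy as np

def generator_loss(d_time_fake, d_freq_fake, enhanced_mag, clean_mag, lam=1.0):
    """Illustrative generator objective with two discriminators.

    d_time_fake, d_freq_fake: scores that the time-domain and
        frequency-domain discriminators assign to the enhanced speech
        (1.0 is taken to mean "judged clean"; this convention is assumed).
    enhanced_mag, clean_mag: magnitude spectra of enhanced and clean speech.
    lam: hypothetical weight of the frequency-domain constraint term.
    """
    d_time_fake = np.asarray(d_time_fake, dtype=np.float64)
    d_freq_fake = np.asarray(d_freq_fake, dtype=np.float64)
    enhanced_mag = np.asarray(enhanced_mag, dtype=np.float64)
    clean_mag = np.asarray(clean_mag, dtype=np.float64)
    # Adversarial terms (assumed least-squares form): the generator is
    # rewarded when each discriminator scores its output close to 1.
    adv_time = 0.5 * np.mean((d_time_fake - 1.0) ** 2)
    adv_freq = 0.5 * np.mean((d_freq_fake - 1.0) ** 2)
    # Frequency-domain constraint (assumed L1 distance between spectra).
    constraint = np.mean(np.abs(enhanced_mag - clean_mag))
    return adv_time + adv_freq + lam * constraint
```

In this sketch the loss is zero only when both discriminators are fully fooled and the enhanced spectrum equals the clean one, which reflects the abstract's point that both discriminators and the constraint term supervise the generator at once.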

Description

Technical field

[0001] The invention relates to speech enhancement technology, in particular to a speech enhancement method based on a time-frequency domain generative adversarial network.

Background

[0002] Speech enhancement refers to techniques that suppress and reduce noise in speech; its main purpose is to improve speech quality and intelligibility. Speech enhancement technology emerged in the 1970s and has a history of about 50 years. From the earliest spectral subtraction methods, to later methods based on statistical models, and then to methods based on deep learning, speech enhancement technology has developed rapidly.

[0003] Speech enhancement methods based on deep learning mainly include: methods based on Deep Neural Networks (DNN), methods based on Convolutional Neural Networks (CNN), methods based on Recurrent Neural Networks (RNN)...

Application Information

IPC(8): G10L21/0224, G10L21/0232, G10L25/30, G10L19/02
CPC: G10L21/0224, G10L21/0232, G10L19/0212, G10L25/30
Inventor: 高戈, 尹文兵, 陈怡, 杨玉红, 曾邦, 王霄
Owner: WUHAN UNIV