Speech enhancement method based on time-frequency domain joint loss function

A time-frequency domain joint loss function for speech enhancement, applied in speech analysis, neural learning methods, instruments, etc., addresses problems such as phase mismatch in enhanced speech, damage to speech feature information, and reduced speech recognition accuracy.

Active Publication Date: 2021-06-08
WUHAN UNIV

AI Technical Summary

Problems solved by technology

A mismatch between the magnitude spectrum and the phase spectrum damages characteristic information of the speech signal, such as the Mel-frequency cepstral coefficients (MFCC), and thus reduces the accuracy of speech recognition.
[0009] It...



Examples


Embodiment Construction

[0068] This embodiment is trained and tested on the AISHELL speech set and the MUSAN noise set.

[0069] As shown in Figure 1, this embodiment performs speech enhancement and training based on a convolutional neural network (CNN) model. It is compared with existing speech enhancement algorithms by replacing their loss function with the joint loss function in the time-frequency domain.

[0070] The first embodiment of the present invention is a speech enhancement method based on a joint loss function in the time-frequency domain, and the specific steps are as follows:

[0071] Step 1: Combine the clean speech data set and the noise data from the open source data sets into a noisy speech data set. Each clean utterance in the clean speech data set is split into overlapping frames and converted by the short-time Fourier transform into its frequency domain amplitude spectrum, construct a c...
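Step 1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the target SNR, the frame length, the hop size, the Hann window and the helper names (mix_at_snr, stft, magnitude_and_phase) are all assumptions, and a naive DFT stands in for a real FFT library so the block has no dependencies.

```python
import cmath
import math

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the clean-to-noise power ratio equals snr_db, then add it."""
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    # The gain g satisfies p_clean / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]

def stft(signal, frame_len=8, hop=4):
    """Naive STFT: Hann-windowed overlapping frames, one-sided DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames

def magnitude_and_phase(spec):
    """Split a complex spectrogram into amplitude and phase spectra."""
    mag = [[abs(v) for v in frame] for frame in spec]
    phase = [[cmath.phase(v) for v in frame] for frame in spec]
    return mag, phase

# Toy stand-ins for one clean utterance and one noise clip (hypothetical data).
clean = [math.sin(2 * math.pi * 2 * n / 16) for n in range(32)]
noise = [((2 * n) % 13 - 6) / 6 for n in range(32)]

noisy = mix_at_snr(clean, noise, snr_db=5.0)
noisy_mag, noisy_phase = magnitude_and_phase(stft(noisy))  # model input and phase for resynthesis
clean_mag, _ = magnitude_and_phase(stft(clean))            # training label
```

In this sketch the noisy amplitude spectrum would serve as the model input, the clean amplitude spectrum as the label, and the noisy phase is kept for later waveform reconstruction.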



Abstract

The invention provides a speech enhancement method based on a time-frequency domain joint loss function. The method comprises the following steps: integrating a clean speech data set and a noise data set from open source data sets into a noisy speech data set, converting the noisy speech data set into an amplitude spectrum, a phase spectrum and waveform data through preprocessing, and constructing a training set; constructing a CNN network model, taking the noisy speech amplitude spectrum as input and the clean speech amplitude spectrum as the label, and training the model; reconstructing the waveform from the amplitude spectrum estimate output by the model and the noisy speech phase spectrum through the inverse short-time Fourier transform to obtain the time domain waveform of the estimated speech; calculating the frequency domain loss from the clean speech amplitude spectrum and the amplitude spectrum estimate; calculating the time domain loss from the clean speech time domain waveform and the estimated speech time domain waveform; and constructing the time-frequency domain joint loss from the frequency domain loss and the time domain loss to guide the weight optimization of the CNN network model. The method reduces the mismatch between the estimated amplitude spectrum and the phase spectrum and improves the speech enhancement effect.
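The loss construction described in the abstract can be illustrated with a small sketch. The abstract does not state how the two losses are defined or combined, so the mean squared error, the weighted sum and the weight alpha used here are assumptions, as are the toy frame length, the hop size and all function names.

```python
import cmath
import math

N, HOP = 8, 4  # toy frame length and hop size (assumed values)
WIN = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # periodic Hann

def stft(x):
    """One-sided DFT of each Hann-windowed, 50%-overlapping frame."""
    return [[sum(x[s + n] * WIN[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
             for k in range(N // 2 + 1)]
            for s in range(0, len(x) - N + 1, HOP)]

def istft(spec, length):
    """Inverse one-sided DFT per frame, then overlap-add.
    A periodic Hann window at 50% overlap sums to 1, so interior samples
    are recovered without a synthesis window; the edges are not."""
    out = [0.0] * length
    for i, frame in enumerate(spec):
        for n in range(N):
            v = frame[0].real + frame[N // 2].real * (-1) ** n
            v += 2 * sum((frame[k] * cmath.exp(2j * math.pi * k * n / N)).real
                         for k in range(1, N // 2))
            out[i * HOP + n] += v / N
    return out

def split(spec):
    """Return (amplitude spectrum, phase spectrum) of a complex spectrogram."""
    return ([[abs(v) for v in f] for f in spec],
            [[cmath.phase(v) for v in f] for f in spec])

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_loss(est_mag, clean_mag, noisy_phase, clean_wave, alpha=0.5):
    """Assumed form: alpha * frequency loss + (1 - alpha) * time loss."""
    flat = lambda m: [v for frame in m for v in frame]
    loss_f = mse(flat(est_mag), flat(clean_mag))
    # Waveform reconstruction from the estimated amplitude and the noisy phase.
    spec = [[m * cmath.exp(1j * p) for m, p in zip(mf, pf)]
            for mf, pf in zip(est_mag, noisy_phase)]
    est_wave = istft(spec, len(clean_wave))
    # Compare interior samples only, where overlap-add is exact.
    loss_t = mse(est_wave[HOP:-HOP], clean_wave[HOP:-HOP])
    return alpha * loss_f + (1 - alpha) * loss_t, loss_f, loss_t

# Toy signals (hypothetical data).
clean = [math.sin(2 * math.pi * 2 * n / 16) for n in range(32)]
noise = [((2 * n) % 13 - 6) / 6 for n in range(32)]
noisy = [c + 0.5 * v for c, v in zip(clean, noise)]

clean_mag, clean_phase = split(stft(clean))
_, noisy_phase = split(stft(noisy))

# Even a perfect amplitude estimate leaves a time domain error
# once it is paired with the noisy phase.
total, loss_f, loss_t = joint_loss(clean_mag, clean_mag, noisy_phase, clean)
print(f"frequency loss = {loss_f:.4f}, time loss = {loss_t:.4f}, joint = {total:.4f}")
```

With a perfect amplitude estimate the frequency domain loss is zero while the time domain loss is not, which is the amplitude and phase mismatch that a purely frequency domain loss cannot see. In a real training setup the same computation would be written with a differentiable framework so that gradients of the joint loss reach the CNN weights.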

Description

Technical field

[0001] The invention relates to the field of speech enhancement, in particular to a speech enhancement method based on a time-frequency domain joint loss function.

Background art

[0002] Voice communication is the most convenient way for people and machines to exchange information. In any environment, however, voice communication is disturbed to some degree by ambient noise. Speech enhancement is an effective way to reduce the impact of noise during voice interaction. Its purpose is to extract the clean speech signal from background noise as far as possible, suppress environmental noise, and improve speech quality and intelligibility.

[0003] In recent years, artificial intelligence technology has remained highly popular, speech enhancement technology has developed rapidly, and new speech enhancement techniques keep emerging. These speech enhancement s...


Application Information

IPC(8): G10L21/0232, G10L21/0224, G10L25/30, G06N3/04, G06N3/08
CPC: G10L21/0232, G10L21/0224, G10L25/30, G06N3/08, G06N3/045
Inventor: 高戈, 王霄, 陈怡, 杨玉红, 曾邦, 尹文兵
Owner: WUHAN UNIV