Method for performing voice denoising by using single noisy voice sample

A speech denoising technology that works from noisy speech, applied in speech analysis, neural network learning methods, and biological neural network models, which improves speech denoising performance and offers good applicability

Pending Publication Date: 2021-12-21
SOUTHEAST UNIV
Cites: 0 | Cited by: 2

AI Technical Summary

Problems solved by technology

[0006] To address the deficiencies of existing denoising networks, the present invention integrates a complex-valued module based on a two-stage Transformer between the complex encoder and decoder. The module learns the local and global context information of the encoder output and captures long-term dependencies while allowing parallel computation, thereby improving the performance of the speech denoising network.
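The two-stage idea can be illustrated with a minimal NumPy sketch. It assumes a layout common to two-stage (dual-path) Transformer designs that the text above does not spell out: the encoder output is treated as a real-valued (T, F, C) tensor of frames, frequency bins, and channels (a complex feature could, for example, be carried as stacked real and imaginary channels); the first stage attends across frequency within each frame (local context) and the second attends across time within each frequency bin (global context). The class names, single-head attention, random untrained weights, and all shapes are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector over the channel (last) axis.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class TransformerLayer:
    """Single-head, post-norm Transformer encoder layer with random (untrained) weights."""

    def __init__(self, d_model, d_ff, rng):
        s = 1.0 / np.sqrt(d_model)
        self.wq, self.wk, self.wv = (rng.normal(0.0, s, (d_model, d_model)) for _ in range(3))
        self.w1 = rng.normal(0.0, s, (d_model, d_ff))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(d_ff), (d_ff, d_model))

    def __call__(self, x):
        # x: (batch, seq_len, d_model); self-attention runs along seq_len.
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
        x = layer_norm(x + softmax(scores) @ v)
        ff = np.maximum(x @ self.w1, 0.0) @ self.w2  # ReLU feed-forward
        return layer_norm(x + ff)

class TwoStageTransformerBlock:
    """Hypothetical block: local attention across frequency, then global attention across time."""

    def __init__(self, channels, d_ff=64, seed=0):
        rng = np.random.default_rng(seed)
        self.local_layer = TransformerLayer(channels, d_ff, rng)
        self.global_layer = TransformerLayer(channels, d_ff, rng)

    def __call__(self, feat):
        # feat: (T, F, C) encoder output with T frames, F frequency bins, C channels.
        # Stage 1: each frame is one sequence of F bins (batch axis = T).
        x = feat + self.local_layer(feat)
        # Stage 2: each frequency bin is one sequence of T frames (batch axis = F).
        xt = x.transpose(1, 0, 2)
        xt = xt + self.global_layer(xt)
        return xt.transpose(1, 0, 2)

# Stack several blocks between the encoder and decoder.
rng = np.random.default_rng(1)
encoder_out = rng.normal(size=(50, 8, 16))  # hypothetical: 50 frames, 8 bins, 16 channels
blocks = [TwoStageTransformerBlock(16, seed=i) for i in range(4)]
h = encoder_out
for block in blocks:
    h = block(h)
print(h.shape)  # (50, 8, 16)
```

Because each stage makes one axis the sequence and the other the batch, all frames and all frequency bins are processed in parallel, unlike a recurrent layer that must step through time frame by frame.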



Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0087] The technical solutions in the embodiments of the present invention are clearly and completely described below in conjunction with the accompanying drawings.

[0088] Data set: The present invention uses the Voice Bank data set as the clean speech samples. It contains 28 different speakers, 26 used for training and 2 for evaluation. Gaussian white noise and noise from the UrbanSound8K data set are superimposed on the clean speech samples to generate the noisy speech data set. For Gaussian white noise, the signal-to-noise ratio is chosen at random in the range 0 to 10; UrbanSound8K supplies real-world noise samples, and all ten of its noise classes are used in the experiments. During superposition, PyDub is used to overlay the noise on the clean audio, and the noise is truncated or repeated to cover the entire speech segment, forming a complete noisy speech sample.
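The superposition step can be sketched in NumPy. This is a stand-in for the PyDub overlay described above, not a call into PyDub; it assumes the 0 to 10 SNR range is in decibels, assumes a 16 kHz sample rate, and uses a sine tone in place of a Voice Bank utterance. The function names are hypothetical.

```python
import numpy as np

def fit_noise_length(noise, n):
    """Repeat the noise clip as many times as needed, then truncate it to n samples."""
    reps = int(np.ceil(n / len(noise)))
    return np.tile(noise, reps)[:n]

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the clean-to-noise power ratio equals snr_db (dB), then add it."""
    noise = fit_noise_length(noise, len(clean))
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10*log10(p_clean / (scale**2 * p_noise)) == snr_db for scale.
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def measured_snr_db(clean, noisy):
    """SNR of a noisy signal against its clean reference, in dB."""
    residual = noisy - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

rng = np.random.default_rng(0)
sr = 16000  # assumed sample rate
clean = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s tone standing in for speech
white = rng.normal(size=3000)  # shorter than the speech, so it is repeated, then truncated
target = rng.uniform(0.0, 10.0)  # SNR drawn at random from [0, 10] dB
noisy = mix_at_snr(clean, white, target)
print(f"target {target:.2f} dB, measured {measured_snr_db(clean, noisy):.2f} dB")
```

Scaling the noise rather than the speech keeps the clean reference unchanged, so the same clean signal can serve as the evaluation target for every noise condition.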

[0089] Experimental environment: This embodiment is developed under the Ubuntu operating system a...


Abstract

The invention provides a method for speech denoising using a single noisy speech sample. The method comprises the following steps: (1) superimposing synthetic noise and different types of real-world noise, respectively, on clean speech signals to generate noisy speech samples; (2) for a single noisy speech sample, generating a pair of speech training samples with a speech down-sampler; (3) converting the training input speech into a spectrogram and feeding the spectrogram into a denoising network for training, where the denoising network stacks several two-stage Transformer modules between the encoder and decoder of a ten-layer deep complex U-Net; and (4) forming the training loss function from a basic loss and a regularization loss, where the basic loss is determined by the network characteristics and the regularization loss prevents over-smoothing in single-sample denoising training. Compared with the traditional approach of training on pairs of clean and noisy speech, the method achieves better results on several evaluation metrics, including signal-to-noise ratio (SNR), perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI).
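Steps (2) and (4) can be illustrated together. The abstract does not specify the down-sampler or the exact loss terms, so this sketch assumes a one-dimensional analogue of the Neighbor2Neighbor scheme: the waveform is split into windows of two adjacent samples, one sample of each window goes to the input sub-signal and the other to the target sub-signal. The basic loss is assumed to be mean squared error, `gamma` is a hypothetical weight, and `denoise` is any length-preserving callable standing in for the complex U-Net, applied here directly to waveforms rather than spectrograms. All function names are hypothetical.

```python
import numpy as np

def random_neighbor_indices(n, rng, k=2):
    """Split n samples into n // k windows of k adjacent samples and pick two
    distinct positions per window (assumed 1-D analogue of Neighbor2Neighbor)."""
    n_win = n // k
    starts = np.arange(n_win) * k
    first = rng.integers(0, k, size=n_win)
    second = (first + rng.integers(1, k, size=n_win)) % k  # always differs from `first`
    return starts + first, starts + second

def single_sample_loss(denoise, y, idx1, idx2, gamma=1.0):
    """Basic loss plus regularization loss for one noisy signal y (assumed form)."""
    g1_y, g2_y = y[idx1], y[idx2]  # training pair: network input and target
    out = denoise(g1_y)
    basic = np.mean((out - g2_y) ** 2)
    # Denoise the full signal, then sub-sample it; in training this branch is stop-gradient.
    full = denoise(y)
    reg = np.mean((out - g2_y - (full[idx1] - full[idx2])) ** 2)
    return basic + gamma * reg, basic, reg

rng = np.random.default_rng(0)
y = rng.normal(size=1001)  # stand-in noisy waveform; with odd length the last sample is dropped
idx1, idx2 = random_neighbor_indices(len(y), rng)
total, basic, reg = single_sample_loss(lambda x: 0.5 * x, y, idx1, idx2, gamma=2.0)
print(f"total={total:.4f} basic={basic:.4f} reg={reg:.4f}")
```

The regularization term compares the network's output on the sub-sampled input with the sub-sampled output of the network on the full signal; it is exactly zero for an identity denoiser and penalizes mappings whose behavior on the sub-signals drifts away from their behavior on the full signal, which is one way to discourage an over-smoothed solution.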

Description

Technical field

[0001] The invention relates to a method for denoising speech using a single noisy speech sample, and belongs to the fields of deep learning, speech denoising, and speech enhancement.

Background

[0002] Electronic technology is now widely used. Speech, a typical non-stationary random signal, is the most common medium through which people transmit information and communicate with each other. As voice services spread across smart terminals, people pay increasing attention to speech quality. With the rapid development of informatization, speech signals are inevitably corrupted by various kinds of noise, which not only hinders human understanding but also makes it difficult for human-machine devices to obtain accurate information. Therefore, a variety of speech denoising techniques have been rapidly developed and studied. In traditional research, to achieve good speech noise reduction results, a large number of noisy speech samples and...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/0208, G10L21/0232, G10L21/0264, G10L25/30, G06N3/04, G06N3/08
CPC: G10L21/0208, G10L21/0232, G10L21/0264, G10L25/30, G06N3/08, G06N3/045
Inventors: 伍家松, 李清淳, 孔佑勇, 杨淳沨, 杨冠羽, 姜龙玉, 陈阳, 舒华忠
Owner SOUTHEAST UNIV