
Speech enhancement method using stacked multiscale modules

A speech enhancement technology based on multi-scale modules, applied in speech analysis and related instruments. It addresses the difficulty of modeling time-domain signals directly, with the effects of improving the ability to process the time-domain signal directly, improving speech enhancement performance, and providing good noise robustness.

Active Publication Date: 2020-02-04
UNIV OF ELECTRONICS SCI & TECH OF CHINA
11 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0004] These end-to-end methods map the one-dimensional time-domain waveform directly to the target speech. However, the time-domain waveform does not exhibit an obvious feature structure, so modeling the time-domain signal directly is difficult. The difficulty of modeling will be further increased...

Embodiment

[0051] Referring to Figures 1-2, the present invention provides a technical solution: an end-to-end speech enhancement method using stacked multi-scale modules, comprising the following steps:

[0052] S1: Construct a cascaded end-to-end speech enhancement framework and splice the stacked multi-scale modules into the network structure;

[0053] S2: In the preprocessing stage, transform the time-domain signal into two-dimensional features;

[0054] S3: Use the speech enhancement module to enhance the two-dimensional features;

[0055] S4: In the post-processing stage, transform the enhanced feature representation into a one-dimensional time-domain signal by decoding synthesis.
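The four steps can be sketched as a minimal NumPy pipeline. The excerpt does not say how S2 builds its two-dimensional features or how S4 decodes them, so this sketch assumes simple overlapping framing for S2 and overlap-add for S4, with an identity placeholder standing in for the S3 enhancement module. The function names and the `frame_len`/`hop` values are hypothetical; a trained network would replace `enhance`.

```python
import numpy as np


def preprocess(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """S2 (assumed framing): 1-D waveform -> 2-D array of shape (n_frames, frame_len)."""
    n_frames = 1 + max(0, int(np.ceil((len(signal) - frame_len) / hop)))
    total = (n_frames - 1) * hop + frame_len
    padded = np.pad(signal, (0, total - len(signal)))  # zero-pad so the last frame is full
    # Row i holds samples [i*hop, i*hop + frame_len).
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return padded[idx]


def enhance(features: np.ndarray) -> np.ndarray:
    """S3 placeholder: identity. A network of stacked multi-scale modules would go here."""
    return features


def postprocess(frames: np.ndarray, hop: int, length: int) -> np.ndarray:
    """S4 (assumed overlap-add): 2-D frames -> 1-D waveform. Assumes hop <= frame_len."""
    n_frames, frame_len = frames.shape
    total = (n_frames - 1) * hop + frame_len
    out = np.zeros(total)
    counts = np.zeros(total)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
        counts[i * hop:i * hop + frame_len] += 1
    return (out / counts)[:length]  # average overlapping contributions, drop padding


def run_pipeline(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    features = preprocess(signal, frame_len, hop)   # S2: 1-D -> 2-D
    enhanced = enhance(features)                     # S3: enhance 2-D features
    return postprocess(enhanced, hop, len(signal))   # S4: 2-D -> 1-D
```

With the identity placeholder, the pipeline returns the input unchanged, which is a convenient check that the S2 and S4 transforms invert each other before any learned module is inserted.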

[0056] The end-to-end speech enhancement framework proposed by the present invention consists of speech time-domain signal preprocessing, a speech enhancement module, and target speech synthesis post-processing, as shown in Figure 1.
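The excerpt does not define the internals of a multi-scale module. Purely as an illustrative assumption, one module can be modeled as parallel dilated 1-D convolutions (one branch per scale) whose ReLU outputs are averaged and added back to the input, with several such modules stacked. The dilation rates, kernel size of 3, random untrained weights, and function names below are hypothetical and are not the patent's architecture.

```python
import numpy as np


def dilated_conv1d(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """Depthwise dilated 1-D convolution (cross-correlation, as in deep-learning
    frameworks) with 'same' padding. x: (channels, time); kernel: (channels, k), k odd."""
    _, t = x.shape
    k = kernel.shape[1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros_like(x, dtype=float)
    for j in range(k):
        # Tap j reads samples that are j*dilation steps apart.
        out += kernel[:, j:j + 1] * xp[:, j * dilation:j * dilation + t]
    return out


def multiscale_module(x: np.ndarray, kernels, dilations) -> np.ndarray:
    """One module: a ReLU branch per dilation (scale), averaged, plus a residual path."""
    branches = [np.maximum(dilated_conv1d(x, w, d), 0.0) for w, d in zip(kernels, dilations)]
    return x + np.mean(branches, axis=0)


def stacked_multiscale(x: np.ndarray, n_modules: int = 3,
                       dilations=(1, 2, 4), seed: int = 0) -> np.ndarray:
    """Stack n_modules modules with randomly initialised (untrained) kernels."""
    rng = np.random.default_rng(seed)
    for _ in range(n_modules):
        kernels = [0.1 * rng.standard_normal((x.shape[0], 3)) for _ in dilations]
        x = multiscale_module(x, kernels, dilations)
    return x
```

Because each branch combines samples `d` steps apart, a stack of these modules sees context at several time scales at once, and the residual path keeps the feature shape unchanged so modules can be chained freely.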

[0057] Assuming that the time-domain clean speech is x and the noise si...
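The excerpt is cut off before the signal model is stated. A common convention in speech enhancement, assumed here rather than confirmed by the text, is the additive model y = x + n. Under that assumption, training mixtures at a chosen signal-to-noise ratio (relevant to the low-SNR conditions mentioned in the abstract) can be built by scaling the noise; `mix_at_snr` is a hypothetical helper.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float):
    """Return (noisy, scaled_noise) with noisy = clean + scaled_noise at snr_db.

    Assumes the additive model y = x + n. The noise is cyclically repeated or
    truncated (np.resize) to match the clean signal's length."""
    noise = np.resize(noise, clean.shape)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Pick gain g so that 10*log10(p_clean / (g**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled = gain * noise
    return clean + scaled, scaled
```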

Abstract

The invention discloses an end-to-end speech enhancement method using stacked multi-scale modules, comprising the following steps: S1, building a cascaded end-to-end speech enhancement framework and splicing the stacked multi-scale modules into the network structure; S2, in the preprocessing stage, converting the time-domain signal into two-dimensional features; S3, using a speech enhancement module to enhance the two-dimensional features; and S4, in the post-processing stage, converting the enhanced feature representation into a one-dimensional time-domain signal through decoding synthesis. To further improve the performance of the algorithm, a multi-objective joint optimization training strategy is applied that integrates the speech enhancement evaluation metrics STOI and SDR into the loss function. Experiments show that the method markedly improves speech enhancement performance and has better noise robustness under unknown noise and low signal-to-noise ratio conditions.
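The abstract says STOI and SDR are folded into the loss through multi-objective joint optimization but gives no formula or weights. A minimal sketch, assuming a weighted sum of the two metrics: `alpha`, `beta`, and the function names are hypothetical, SDR here is the basic (not scale-invariant) definition, and STOI is passed in as a precomputed score because a full STOI implementation is long. Training a network with this loss would additionally need a differentiable STOI approximation, which this NumPy sketch does not provide.

```python
import numpy as np


def sdr_db(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Basic SDR in dB: 10*log10(||x||^2 / ||x - x_hat||^2)."""
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return float(10 * np.log10((signal_energy + eps) / (error_energy + eps)))


def joint_loss(reference: np.ndarray, estimate: np.ndarray, stoi_score: float,
               alpha: float = 1.0, beta: float = 1.0) -> float:
    """Hypothetical multi-objective loss. SDR and STOI are both 'higher is better',
    so their weighted sum is negated to give a quantity to minimise.
    stoi_score must come from an external STOI estimator (not implemented here)."""
    return -(alpha * sdr_db(reference, estimate) + beta * stoi_score)
```

SDR is measured in dB while STOI lies roughly in [0, 1], so the weights would need to be chosen to keep one term from dominating the other.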

Description

Technical field

[0001] The invention belongs to the technical field of speech enhancement, in particular to an end-to-end speech enhancement method using stacked multi-scale modules.

Background

[0002] Speech enhancement refers to the task of removing or attenuating additive noise in noisy speech. By suppressing and separating noise, it improves the overall perceptual quality and intelligibility of speech, and it has a wide range of applications in robust speech recognition, hearing aid design, speaker verification, and similar areas. Traditional speech enhancement methods include spectral subtraction, Wiener filtering, statistical model-based methods, and subspace-based methods. In the past few years, supervised speech enhancement methods based on deep learning have gradually become the main research focus.

[0003] Some scholars consider processing the time-domain signal of speech directly instead of relying on the frequency-domain represen...
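Spectral subtraction, the first traditional method listed above, can be illustrated with a basic magnitude-subtraction sketch. The frame length, the spectral floor, the periodic Hann window with 50% overlap, and estimating the noise spectrum from a separate noise-only recording are all assumptions for illustration; the noisy phase is reused for resynthesis, and the first and last half-frames are only partially reconstructed.

```python
import numpy as np


def _hann(frame_len: int) -> np.ndarray:
    # Periodic Hann window: copies overlapped at hop = frame_len // 2 sum to 1.
    n = np.arange(frame_len)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)


def estimate_noise_mag(noise_only: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Average magnitude spectrum over the frames of a noise-only segment."""
    hop, win = frame_len // 2, _hann(frame_len)
    mags = [np.abs(np.fft.rfft(win * noise_only[s:s + frame_len]))
            for s in range(0, len(noise_only) - frame_len + 1, hop)]
    return np.mean(mags, axis=0)


def spectral_subtraction(noisy: np.ndarray, noise_mag: np.ndarray,
                         frame_len: int = 256, floor: float = 0.02) -> np.ndarray:
    """Subtract the noise magnitude per frame, keep the noisy phase, overlap-add.
    frame_len should be even so the 50% overlap sums the windows to 1."""
    hop, win = frame_len // 2, _hann(frame_len)
    out = np.zeros(len(noisy))
    for s in range(0, len(noisy) - frame_len + 1, hop):
        spec = np.fft.rfft(win * noisy[s:s + frame_len])
        mag, phase = np.abs(spec), np.angle(spec)
        clean_mag = np.maximum(mag - noise_mag, floor * mag)  # floor avoids negative magnitudes
        out[s:s + frame_len] += np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len)
    return out
```

The spectral floor trades residual noise against the "musical noise" artifacts that hard zeroing of bins tends to produce.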

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/02; G10L25/30
CPC: G10L21/0364; G10L25/30
Inventors: 蓝天, 李森, 吕忆蓝, 刘峤, 钱宇欣, 叶文政, 惠国强, 李萌, 彭川
Owner UNIV OF ELECTRONICS SCI & TECH OF CHINA