Transfer Learning Speech Enhancement Method Based on Self-Attention Multi-Kernel Maximum Mean Discrepancy

A speech enhancement technique based on maximum mean discrepancy, applied in speech analysis, instruments, and related fields. It addresses model mismatch and aims to improve the robustness and performance of speech enhancement and the effectiveness of the learned features. The source describes the method as novel.

Active Publication Date: 2021-02-19
NANJING INST OF TECH

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome the model mismatch that existing single-channel speech enhancement methods suffer when the acoustic environment changes.

Embodiment Construction

[0043] The present invention will be further described below in conjunction with the accompanying drawings.

[0044] As shown in Figure 1, the transfer learning speech enhancement method of the present invention, based on self-attention multi-kernel maximum mean discrepancy (MK-MMD), comprises the following steps:

[0045] Step (A): extract Gammatone frequency cepstral coefficient (GFCC) features from the original speech and use them as the input features of the deep neural network;

[0046] Step (B): use the noisy speech and the clean speech to calculate the ideal ratio mask (IRM) in the Fourier transform domain, and use it as the training target of the deep neural network;

[0047] Step (C): construct a speech enhancement model based on a deep neural network as the baseline model. The baseline model is a 4-layer DNN speech enhancement model in which the first two layers form the feature encoder and the last two layers form the reconstruction decoder;

[0048] Step (D), according to...
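
The mask used as the training target in Step (B) can be illustrated with a short sketch. This is not the patent's own formula, which is not reproduced in this excerpt. It assumes the commonly used ideal ratio mask definition, (S² / (S² + N²))^β with β = 0.5, additive noise (noise = noisy − clean), and a naive per-frame DFT in place of a windowed STFT. The function names `dft_power` and `ideal_ratio_mask` are hypothetical.

```python
import cmath
import math


def dft_power(frame):
    """Power spectrum of one real-valued frame via a naive DFT.

    Only the non-negative frequency bins (0 .. n/2) are returned.
    """
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    return power


def ideal_ratio_mask(clean_frame, noisy_frame, beta=0.5, eps=1e-12):
    """Per-bin mask in [0, 1] for one frame.

    Assumption: the noise is additive, so it can be recovered as
    noisy - clean. beta = 0.5 is a common choice, not a value taken
    from the source document.
    """
    noise_frame = [y - s for s, y in zip(clean_frame, noisy_frame)]
    speech_power = dft_power(clean_frame)
    noise_power = dft_power(noise_frame)
    return [(s / (s + n + eps)) ** beta
            for s, n in zip(speech_power, noise_power)]


if __name__ == "__main__":
    n = 16
    # Toy frame: "speech" is a tone in bin 2, "noise" is a tone in bin 5.
    clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
    noise = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
    noisy = [c + v for c, v in zip(clean, noise)]
    for k, m in enumerate(ideal_ratio_mask(clean, noisy)):
        print(k, round(m, 3))
```

In this toy frame the mask is close to 1 in the bin dominated by speech and close to 0 in the bin dominated by noise, which is the behavior a network trained on this target is asked to reproduce from the noisy features alone.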

Abstract

The invention discloses a transfer learning speech enhancement method based on self-attention multi-kernel maximum mean discrepancy (MK-MMD). The method comprises: extracting GFCC features from the original speech as the input features of a deep neural network; using the noisy and clean speech to calculate the ideal ratio mask in the Fourier transform domain as the training target of the deep neural network; building a speech enhancement model based on a deep neural network; building a transfer learning speech enhancement model with self-attention MK-MMD; training that transfer learning model; and inputting the frame-level features of noisy target-domain speech to reconstruct the enhanced speech waveform. The invention places a self-attention algorithm in front of the MK-MMD computation. By minimizing the MK-MMD between the attended source-domain features and the attended target-domain features, it achieves transfer learning to an unlabeled target domain and improves speech enhancement performance. The source states that the method has good application prospects.
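
The domain-adaptation term described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the self-attention here is plain scaled dot-product attention with no learned projections (Q = K = V), the kernel is an equally weighted mixture of Gaussian kernels with arbitrarily chosen bandwidths, and the estimator is the biased estimate of the squared discrepancy. All function names and bandwidth values are hypothetical.

```python
import math


def self_attention(feats):
    """Scaled dot-product self-attention with Q = K = V = feats.

    Assumption: no learned projection matrices; the source does not
    specify the attention parameterization in this excerpt.
    """
    d = len(feats[0])
    attended = []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in feats]
        top = max(scores)  # subtract the max for a stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended.append([sum(w * v[j] for w, v in zip(weights, feats)) / total
                         for j in range(d)])
    return attended


def multi_kernel(x, y, bandwidths):
    """Equally weighted mixture of Gaussian kernels."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return sum(math.exp(-sq_dist / (2.0 * s * s))
               for s in bandwidths) / len(bandwidths)


def mk_mmd2(xs, ys, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of the squared multi-kernel discrepancy."""
    def mean_kernel(a, b):
        return sum(multi_kernel(p, q, bandwidths)
                   for p in a for q in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def attention_mk_mmd_loss(source_feats, target_feats):
    """Discrepancy between attended source and attended target features."""
    return mk_mmd2(self_attention(source_feats), self_attention(target_feats))


if __name__ == "__main__":
    source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    target = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
    print(attention_mk_mmd_loss(source, source))
    print(attention_mk_mmd_loss(source, target))
```

In training, a term like this would be added to the supervised mask-estimation loss on the labeled source domain, so that the encoder is pushed toward features whose attended distributions match across domains without needing target-domain labels.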

Description

Technical field

[0001] The invention relates to the technical field of speech enhancement, in particular to a transfer learning speech enhancement method based on self-attention multi-kernel maximum mean discrepancy.

Background

[0002] Speech enhancement has important applications in various fields of speech processing. Its purpose is to improve the quality and intelligibility of speech corrupted by noise. Early research on single-channel speech enhancement focused on how to estimate the noise spectrum from noisy speech effectively and then suppress it. Typical algorithms include spectral subtraction, Wiener filtering, the minimum mean square error method, the minima-controlled recursive averaging noise estimation algorithm and its improved variants. These algorithms mainly target additive background noise and are designed around the complex statistical properties relating noise and clean speech. However, the complex statistical int...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L21/02, G10L25/30, G10L25/03, G10L25/24
CPC: G10L21/02, G10L25/03, G10L25/24, G10L25/30
Inventor: 梁瑞宇, 程佳鸣, 梁镇麟, 谢跃, 王青云, 包永强, 赵力
Owner: NANJING INST OF TECH