Transfer learning speech enhancement method based on self-attention multi-kernel maximum mean discrepancy

A technology combining maximum mean discrepancy and speech enhancement, applied in speech analysis, instruments, and related fields. It addresses problems such as model mismatch and improves the robustness and performance of speech enhancement. The method is novel and has good application prospects.

Active Publication Date: 2019-08-09
NANJING INST OF TECH

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome the existing speech (single-ch...

Examples

Embodiment Construction

[0043] The present invention will be further described below in conjunction with the accompanying drawings.

[0044] As shown in Figure 1, the transfer learning speech enhancement method based on self-attention multi-kernel maximum mean discrepancy (MK-MMD) of the present invention comprises the following steps:

[0045] Step (A): extract gammatone frequency cepstral coefficient (GFCC) features from the original speech and use them as the input features of the deep neural network;

[0046] Step (B): use the noisy speech and clean speech information to calculate the ideal ratio mask (IRM) in the Fourier transform domain, and use it as the training target of the deep neural network;

[0047] Step (C): construct a speech enhancement model based on a deep neural network as the baseline model. The baseline model is a 4-layer DNN speech enhancement model in which the first two layers form the feature encoder and the last two layers form the reconstruction decoder;

[0048] Step (D), according to...
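
Steps (B) and (C) above can be illustrated with a minimal sketch. It is not the patented implementation: the mask exponent `beta = 0.5`, the assumption that the noise equals noisy speech minus clean speech, the layer widths, the activations, and all names (`ideal_ratio_mask`, `BaselineDNN`, `n_hidden`) are hypothetical choices made only for illustration. The input vector stands in for the GFCC features of Step (A), which are not computed here.

```python
import cmath
import math
import random


def dft_power(frame):
    """Power spectrum |X[k]|^2 of one real frame, bins 0..N/2, via a naive DFT."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    return power


def ideal_ratio_mask(clean, noisy, beta=0.5, eps=1e-12):
    """Step (B) sketch: per-bin mask (S^2 / (S^2 + N^2)) ** beta.

    Assumptions (not from the source): additive noise, so noise = noisy - clean,
    and beta = 0.5.
    """
    noise = [y - s for s, y in zip(clean, noisy)]
    clean_power, noise_power = dft_power(clean), dft_power(noise)
    return [(s / (s + n + eps)) ** beta
            for s, n in zip(clean_power, noise_power)]


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, weights, biases, activation):
    """One fully connected layer applied to a single feature vector."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


class BaselineDNN:
    """Step (C) sketch: 4 layers, first two = encoder, last two = decoder."""

    def __init__(self, n_features, n_hidden, n_bins, seed=0):
        rng = random.Random(seed)
        sizes = [n_features, n_hidden, n_hidden, n_hidden, n_bins]
        self.layers = []
        for n_in, n_out in zip(sizes, sizes[1:]):
            scale = 1.0 / math.sqrt(n_in)
            weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                       for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))

    def encode(self, x):
        # Feature encoder: layers 1 and 2.
        for weights, biases in self.layers[:2]:
            x = dense(x, weights, biases, relu)
        return x

    def decode(self, h):
        # Reconstruction decoder: layer 3, then a sigmoid output layer so the
        # prediction lies in (0, 1) like the mask it is trained to match.
        h = dense(h, *self.layers[2], relu)
        return dense(h, *self.layers[3], sigmoid)

    def forward(self, x):
        return self.decode(self.encode(x))
```

In this sketch, `ideal_ratio_mask(clean_frame, noisy_frame)` would supply the training target for one frame, and `BaselineDNN(...).forward(features)` would supply the prediction to be regressed onto it. The training loop itself is omitted.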

Abstract

The invention discloses a transfer learning speech enhancement method based on self-attention multi-kernel maximum mean discrepancy (MK-MMD), which comprises the following steps: extracting GFCC features from the original speech and using them as the input features of a deep neural network; calculating the ideal ratio mask in the Fourier transform domain from the noisy speech and clean speech information, and using it as the training target of the deep neural network; building a speech enhancement model based on the deep neural network; building a transfer learning speech enhancement model based on self-attention MK-MMD; training that transfer learning model; and inputting frame-level features of the noisy speech in the target domain to reconstruct the enhanced speech waveform. By adding a self-attention algorithm at the front end of the MK-MMD and minimizing the MK-MMD between the attended features of the source domain and the attended features of the target domain, transfer learning to the unlabeled target domain is realized and the speech enhancement performance is improved. The method of the invention has a good application prospect.
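
The adaptation term described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patent's formulation: the self-attention here is plain scaled dot-product attention with no learned projections (queries, keys, and values are the frames themselves), the kernels are Gaussian with hypothetical bandwidths `(0.5, 1.0, 2.0, 4.0)`, and the estimator is the simple biased estimate of the squared MMD. All function names are made up for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with Q = K = V = frames.

    Assumption: no learned projection matrices; the source does not specify
    the attention variant.
    """
    dim = len(frames[0])
    attended = []
    for query in frames:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in frames]
        weights = softmax(scores)
        attended.append([sum(w * value[j] for w, value in zip(weights, frames))
                         for j in range(dim)])
    return attended


def multi_kernel(x, y, bandwidths):
    """Average of Gaussian kernels with several bandwidths."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return sum(math.exp(-sq_dist / (2.0 * s * s))
               for s in bandwidths) / len(bandwidths)


def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of the squared multi-kernel MMD between two sets."""
    def mean_kernel(a, b):
        return sum(multi_kernel(x, y, bandwidths)
                   for x in a for y in b) / (len(a) * len(b))

    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def attended_mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """MK-MMD computed on attention-weighted features of both domains."""
    return mk_mmd2(self_attention(source), self_attention(target), bandwidths)
```

In a training setup of this kind, a term like `attended_mk_mmd2(source_features, target_features)` would be minimized alongside the mask regression loss on the labeled source domain. How the two terms are weighted is not stated in the text above and would be a design choice.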

Description

Technical field

[0001] The invention relates to the technical field of speech enhancement, in particular to a transfer learning speech enhancement method based on self-attention multi-kernel maximum mean discrepancy.

Background

[0002] Speech enhancement has important applications in various fields of speech processing. Its purpose is to improve the quality and intelligibility of speech corrupted by noise. Early research on single-channel speech enhancement focused on how to estimate the noise spectrum from noisy speech effectively and then suppress it. Typical algorithms include spectral subtraction, Wiener filtering, the minimum mean square error method, the minima controlled recursive averaging noise estimation algorithm and its improved variants, etc. These algorithms mainly address additive background noise and are designed around the complex statistical properties between noise and clean speech. However, the complex statistical int...

Application Information

IPC(8): G10L21/02, G10L25/30, G10L25/03, G10L25/24
CPC: G10L21/02, G10L25/03, G10L25/24, G10L25/30
Inventor: 梁瑞宇, 程佳鸣, 梁镇麟, 谢跃, 王青云, 包永强, 赵力
Owner: NANJING INST OF TECH