Speech enhancement method based on hybrid masking learning target

A speech enhancement technique based on a hybrid masking learning target. It addresses the problems of degraded speech intelligibility and quality, poorly represented features, and poor generalization, with the effect of improving intelligibility, quality, and calculation accuracy while reducing the amount of computation.

Active Publication Date: 2020-05-08
TIANJIN UNIV
5 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

Commonly used time-frequency masking targets include the Ideal Binary Mask (IBM), the Ideal Ratio Mask (IRM), and the Target Binary Mask (TBM). The most commonly used learning targets are the ideal binary mask and the ideal ratio mask, but each has its own shortcomings, such as inaccurate prediction and poor generalization.
[0005] When the learning target is the IBM, the model only needs to classify (0 or 1) whether each time-frequency unit is dominated by noise or by the target speech. As a result, noise is retained in the time-frequency units dominated by the target speech, and this residual noise seriously affects the intelligibility and quality of the speech. When the learning target is the IRM, the model needs to predict a coefficient for each time-frequency unit. In noise-dominated time-frequency units, the extracted features cannot represent the characteristics of the target speech well, so it is difficult for the model to predict the coefficients of those units accurately.
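
The contrast between the two targets can be made concrete with a small sketch. It uses the commonly cited textbook definitions (IBM: 1 where the local SNR exceeds a threshold, otherwise 0; IRM: the speech-to-mixture energy ratio raised to a power), which are assumptions here and not necessarily the exact formulas of this patent. The function names, the 0 dB threshold `lc_db`, the exponent `beta`, and the toy energies are all illustrative.

```python
import math

def ideal_binary_mask(speech_pow, noise_pow, lc_db=0.0):
    """IBM: 1 where the local SNR (in dB) exceeds lc_db, else 0."""
    mask = []
    for s_row, n_row in zip(speech_pow, noise_pow):
        row = []
        for s, n in zip(s_row, n_row):
            # A tiny floor avoids log(0) and division by zero.
            snr_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask

def ideal_ratio_mask(speech_pow, noise_pow, beta=0.5):
    """IRM: (speech energy / (speech energy + noise energy)) ** beta."""
    mask = []
    for s_row, n_row in zip(speech_pow, noise_pow):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask

# Toy 2x2 grid of per-unit energies (rows: frames, columns: frequency bins).
speech_pow = [[9.0, 1.0], [4.0, 0.0]]
noise_pow = [[1.0, 9.0], [4.0, 2.0]]

ibm = ideal_binary_mask(speech_pow, noise_pow)
irm = ideal_ratio_mask(speech_pow, noise_pow)
```

In the first unit the IBM passes the whole mixture (mask 1) even though a tenth of its energy is noise, while the IRM assigns a fractional coefficient that the model would have to regress.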

Embodiment Construction

[0015] The speech enhancement method based on a hybrid masking learning target of the present invention is described in detail below with reference to the embodiments.

[0016] The speech enhancement method based on a hybrid masking learning target of the present invention comprises the following steps:

[0017] 1) Perform traditional feature extraction on the speech signals: divide the acquired speech signals into a training set and a test set, and extract the traditional features of the speech signals in each set.

[0018] This includes: randomly select 1500 speech segments from the training part of the TIMIT corpus and randomly mix them with 9 kinds of noise taken from the NOISEX-92 corpus at continuously varying signal-to-noise ratios of -5 to 5 dB, generating 1500 mixed speech signals that form the training set. Randomly select 500 clean speech segments from the test part of the TIMIT corpus, and randomly mix them ...
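
A minimal sketch of the mixing step in [0018], assuming the SNR for each utterance is drawn uniformly from -5 to 5 dB and the noise is scaled to hit that SNR before being added. The source does not give the mixing procedure in detail, so the function name `mix_at_snr`, the uniform draw, and the synthetic stand-in signals are all illustrative assumptions.

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then add."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise segment is silent")
    # Solve 10*log10(speech_power / (g^2 * noise_power)) = snr_db for the gain g.
    gain = math.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise

rng = random.Random(0)
# Synthetic stand-ins: a sine tone in place of a TIMIT utterance,
# Gaussian noise in place of a NOISEX-92 clip.
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]
snr_db = rng.uniform(-5.0, 5.0)  # assumed: one uniform draw per utterance
mixture, scaled_noise = mix_at_snr(speech, noise, snr_db)
```

Returning the scaled noise alongside the mixture is a convenience: mask-style learning targets are computed from the clean speech and the noise actually present in the mixture.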


Abstract

A speech enhancement method based on a hybrid masking learning target comprises the steps of: performing traditional feature extraction on speech signals, in which the acquired speech signals are divided into a training set and a test set and the traditional features of the speech signals of each set are extracted; extracting the amplitude spectrum features in the STFT domain of the speech signals of the training set and the test set; constructing a deep stacking residual network; constructing a learning target; training the deep stacking residual network using the extracted traditional features of the training set, the extracted amplitude spectrum features of the STFT domain, and the learning target; inputting the extracted traditional features of the test set and the amplitude spectrum features of the STFT domain into the trained deep stacking residual network to obtain a predicted learning target; performing ISTFT on the predicted learning target to obtain an enhanced speech signal; and calculating a PESQ value of the speech signal. Noise information is not retained in speech-dominated time-frequency units, the amount of computation is reduced, and the neural network is easy to train, which improves the intelligibility and quality of the speech.
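
The abstract does not say how the predicted target is combined with the noisy signal before the ISTFT. A common practice, assumed here purely for illustration, is to scale each noisy STFT magnitude by the predicted mask value and reuse the noisy phase. The function name `apply_mask` and the toy spectrogram are hypothetical, and the STFT and ISTFT themselves are outside this sketch.

```python
import cmath

def apply_mask(noisy_spec, mask):
    """Scale each noisy STFT magnitude by the mask and keep the noisy phase."""
    enhanced = []
    for spec_row, mask_row in zip(noisy_spec, mask):
        row = []
        for y, m in zip(spec_row, mask_row):
            magnitude, phase = cmath.polar(y)
            row.append(cmath.rect(m * magnitude, phase))
        enhanced.append(row)
    return enhanced

# Toy 2x2 complex spectrogram (rows: frames, columns: frequency bins).
noisy_spec = [[3 + 4j, 1 - 1j], [0 + 2j, -2 + 0j]]
predicted_mask = [[1.0, 0.0], [0.5, 0.25]]
enhanced_spec = apply_mask(noisy_spec, predicted_mask)
```

A mask value of 1 leaves a unit unchanged, 0 removes it, and intermediate values attenuate it; the resulting complex spectrogram is what an ISTFT would turn back into a waveform.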

Description

Technical field

[0001] The present invention relates to a hybrid masking learning target, and in particular to a speech enhancement method based on a hybrid masking learning target.

Background technique

[0002] At present, there are many speech enhancement methods based on deep learning, and the key technologies mainly involve three aspects: which features to extract, which model to use, and which target to learn. Like features, learning targets are a valuable subject of study: given the same training features and learning model, a better learning target yields better model training.

[0003] In a speech enhancement system using a supervised neural network, the learning target is generally calculated from the background noise and the clean speech. An effective learning target has an important impact on the learning ability of the speech enhancement model and on the generalization of the system.

[0004] Currently u...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/02, G10L21/0208, G10L25/24, G10L25/30, G06N20/00
CPC: G10L21/02, G10L21/0208, G10L25/30, G10L25/24, G06N20/00
Inventor: 张涛, 王泽宇, 朱诚诚
Owner: TIANJIN UNIV