Speech enhancement method based on hybrid masking learning target

A technology for learning targets and speech enhancement, applied in the field of speech enhancement based on hybrid masking learning targets. It addresses degraded speech intelligibility and quality, poorly represented features, and poor generalization, with the effect of improving intelligibility, quality, and calculation accuracy while reducing the amount of calculation.

Active Publication Date: 2020-05-08

TIANJIN UNIV

5 Cites, 9 Cited by


Problems solved by technology

Commonly used time-frequency masking targets include the Ideal Binary Mask (IBM), the Ideal Ratio Mask (IRM), and the Target Binary Mask (TBM). The most commonly used learning targets are the ideal binary mask and the ideal ratio mask, but each has shortcomings, such as inaccurate prediction and poor generalization.

[0005] When the learning target is IBM, the model only needs to classify (0 or 1) whether each time-frequency unit is dominated by noise or by the target speech. As a result, noise is retained in the time-frequency units dominated by the target speech, and this noise seriously affects the intelligibility and quality of the speech. When the learning target is IRM, the model needs to predict a coefficient for each time-frequency unit. In the time-frequency units dominated by noise, the extracted features cannot represent the characteristics of the target speech well, so it is difficult for the model to predict the coefficient of those units accurately.
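The difference between the two targets described above can be made concrete with a small sketch. It uses the textbook definitions of the IBM (a threshold on the local signal-to-noise ratio) and the IRM (a power ratio raised to an exponent), which are assumptions here and not formulas taken from the patent. The function names, the threshold `lc_db`, and the exponent `beta` are illustrative choices.

```python
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the criterion lc_db, else 0."""
    eps = 1e-12  # guards against log(0) and division by zero
    local_snr_db = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (local_snr_db > lc_db).astype(np.float64)

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """IRM: (S^2 / (S^2 + N^2))^beta, a soft coefficient in [0, 1]."""
    eps = 1e-12
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + eps)) ** beta

# Toy magnitude spectra: rows are frequency bins, columns are frames.
speech = np.array([[4.0, 1.0], [0.5, 3.0]])
noise = np.array([[1.0, 2.0], [2.0, 3.0]])

ibm = ideal_binary_mask(speech, noise)  # hard 0/1 decision per unit
irm = ideal_ratio_mask(speech, noise)   # one coefficient per unit
```

In the unit where the IBM equals 1, the noisy mixture passes through unchanged, which is how noise survives in speech-dominated units. The IRM attenuates every unit by a predicted coefficient instead, and that coefficient is the value the text describes as hard to estimate in noise-dominated units.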



Embodiment Construction

[0015] The speech enhancement method based on a hybrid masking learning target of the present invention is described in detail below with reference to the embodiments.

[0016] The speech enhancement method based on a hybrid masking learning target of the present invention comprises the following steps:

[0017] 1) Perform traditional feature extraction on the speech signals: divide the acquired speech signals into a training set and a test set, and extract the traditional features of the speech signals of the training set and the test set respectively;

[0018] This includes: randomly extracting 1500 speech segments from the training part of the TIMIT corpus and randomly mixing them with 9 kinds of noise taken from the NOISEX-92 corpus, at a continuously varying signal-to-noise ratio of -5 to 5 dB, to generate 1500 mixed speech signals that form the training set. Randomly select 500 clean speech segments from the test part of the TIMIT corpus, and randomly mix them ...
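A minimal sketch of the mixing step in paragraph [0018], assuming that "continuously varying" means one SNR value drawn uniformly from -5 to 5 dB per utterance. The helper `mix_at_snr` is hypothetical, and random arrays stand in for the TIMIT and NOISEX-92 recordings, which are not loaded here.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio is snr_db, then add."""
    noise = noise[: len(speech)]  # assumes the noise is at least as long as the speech
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose scale so that speech_power / (scale^2 * noise_power) = 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stand-in for one TIMIT utterance
noise = rng.standard_normal(16000)   # stand-in for one NOISEX-92 noise segment
snr_db = rng.uniform(-5.0, 5.0)      # assumed: one uniform draw per utterance
mixture = mix_at_snr(speech, noise, snr_db)
```

Repeating this for each of the 1500 selected utterances, with a randomly chosen noise type each time, would give a training set of the kind described.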


Abstract

A speech enhancement method based on a hybrid masking learning target comprises the following steps: performing traditional feature extraction on speech signals, which includes dividing the acquired speech signals into a training set and a test set and extracting the traditional features of the speech signals of each set; extracting the amplitude spectrum features in the STFT domain of the speech signals of the training set and the test set; constructing a deep stacking residual network; constructing a learning target; training the deep stacking residual network using the extracted traditional features of the training set, the extracted amplitude spectrum features of the STFT domain, and the learning target; inputting the extracted traditional features of the test set and the amplitude spectrum features of the STFT domain into the trained deep stacking residual network to obtain a predicted learning target; performing ISTFT on the predicted learning target to obtain an enhanced speech signal; and calculating the PESQ value of the speech signal. Because noise information is not retained in the speech-dominated time-frequency units, the amount of calculation is reduced and the neural network is easier to train, which improves the intelligibility and quality of the speech.
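The STFT and ISTFT steps named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the frame length, hop size, and Hann window are arbitrary choices, the predicted mask is applied to the noisy magnitude spectrum, the noisy phase is reused for reconstruction (a common simplification), and an all-ones placeholder stands in for the network output. The functions `stft`, `istft`, and `enhance` are hypothetical helpers written with NumPy only.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns a (frames, bins) complex array."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=256):
    """Weighted overlap-add inverse of the stft function above."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(spec) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Divide by the summed squared window; the floor protects the signal edges.
    return out / np.maximum(norm, 1e-8)

def enhance(noisy, mask):
    """Apply a mask to the noisy magnitude and resynthesize with the noisy phase."""
    spec = stft(noisy)
    enhanced_mag = mask * np.abs(spec)
    return istft(enhanced_mag * np.exp(1j * np.angle(spec)))

rng = np.random.default_rng(0)
noisy = rng.standard_normal(4096)               # stand-in for a noisy utterance
placeholder_mask = np.ones(stft(noisy).shape)   # stand-in for the network output
enhanced = enhance(noisy, placeholder_mask)
```

With the all-ones placeholder the output reproduces the input away from the signal edges, which is a quick check that the analysis and synthesis stages are consistent before a trained network supplies the mask. The PESQ evaluation is not shown, since it requires a reference implementation of the metric.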

Description

Technical field

[0001] The present invention relates to a hybrid masking learning target. In particular, it concerns a speech enhancement method based on a hybrid masking learning target.

Background technique

[0002] At present, there are many speech enhancement methods based on deep learning, and the key technologies mainly involve three aspects: which features to extract, which model to use, and which target to learn. Like the choice of features, the study of learning targets is valuable: given the same training features and the same learning model, a better learning target leads to a better-trained model.

[0003] In a speech enhancement system using a supervised neural network, the learning target is generally calculated from the background noise and the clean speech. An effective learning target has an important impact on the learning ability of the speech enhancement model and on the generalization of the system.

[0004] Currently u...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L21/02, G10L21/0208, G10L25/24, G10L25/30, G06N20/00

CPC: G10L21/02, G10L21/0208, G10L25/30, G10L25/24, G06N20/00

Inventors: 张涛, 王泽宇, 朱诚诚

Owner: TIANJIN UNIV
