Speech enhancement model construction method and system, speech enhancement method and system

A method for constructing a speech enhancement model, applied in the field of acoustics. It addresses the poor clarity and intelligibility of denoised speech, and achieves accurate speech intelligibility estimation, high clarity and intelligibility, and an improved enhancement effect.

Active Publication Date: 2021-11-05
浙江芯劢微电子股份有限公司

AI Technical Summary

Problems solved by technology

[0004] The present invention addresses the poor clarity and intelligibility of denoised speech produced by neural-network-based noise reduction algorithms. It proposes a technique for constructing a speech enhancement model, and also a speech enhancement technique implemented with the constructed model.



Examples


Embodiment 1

[0069] Embodiment 1: a method for constructing a speech enhancement model, comprising the following steps:

[0070] S100. Obtain training sample pairs, where the training sample pairs include corresponding pure speech and noisy speech;

[0071] In this embodiment, the pure speech, the noisy speech and the estimated speech all refer to the time-domain sampling point data of the corresponding audio.

[0072] The noisy speech includes real noisy speech and synthesized noisy speech;

[0073] S110. Construct the synthesized noisy speech:

[0074] Obtain pure speech, then adjust the noise energy according to the speech signal-to-noise ratio (SNR) formula to obtain synthesized noisy speech at different SNRs. The speech SNR formula is as follows:

[0075] $$\mathrm{SNR} = 10\log_{10}\frac{\sum_t s^2(t)}{\sum_t n^2(t)}$$

[0076] where t is the time-domain index, $\sum_t s^2(t)$ is the pure speech energy, $\sum_t n^2(t)$ is the noise energy, and the synthesized noisy speech is y(t) = s(t) + n(t).
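The construction of synthesized noisy speech in S110 can be sketched as follows. This is a minimal illustration, assuming the usual energy-ratio definition of SNR in dB; the function names `mix_at_snr` and `measure_snr_db`, the sine-wave stand-in for pure speech, and the white-noise source are hypothetical and not taken from the patent.

```python
import numpy as np

def measure_snr_db(s, n):
    """SNR in dB as the ratio of speech energy to noise energy."""
    return 10.0 * np.log10(np.sum(np.square(s)) / np.sum(np.square(n)))

def mix_at_snr(s, n, target_snr_db):
    """Scale noise n so that y = s + gain * n has the requested SNR."""
    s = np.asarray(s, dtype=np.float64)
    n = np.asarray(n, dtype=np.float64)
    if len(n) < len(s):
        raise ValueError("noise must be at least as long as the speech")
    n = n[: len(s)]
    speech_energy = np.sum(s ** 2)
    noise_energy = np.sum(n ** 2)
    if noise_energy == 0.0:
        raise ValueError("noise has zero energy")
    # Solve 10*log10(speech_energy / (gain**2 * noise_energy)) = target_snr_db
    gain = np.sqrt(speech_energy / (noise_energy * 10.0 ** (target_snr_db / 10.0)))
    n_scaled = gain * n
    return s + n_scaled, n_scaled

# Stand-in signals (assumptions): a 440 Hz tone as "pure speech", white noise
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
s = np.sin(2 * np.pi * 440 * t)
n = rng.standard_normal(len(t))
y, n_scaled = mix_at_snr(s, n, 5.0)
```

Calling `mix_at_snr` with several target values yields training pairs (s, y) at different SNRs from the same pure speech and noise recording.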

[0077] S...

Embodiment 2

[0142] Embodiment 2 improves the scheme for calculating the speech intelligibility of the estimated speech in Embodiment 1; everything else is the same as in Embodiment 1.

[0143] Referring to Figure 3, the specific steps of step S343, which generates the speech intelligibility of the corresponding estimated speech from the intelligibility of each frame, are as follows:

[0144] S410. Group the pure speech frames by sound decibel value to obtain several pure speech frame sets, and construct the estimated speech frame set corresponding to each pure speech frame set;

[0145] Since the pure speech frames and the estimated speech frames correspond one to one, the estimated speech frame set corresponding to a pure speech frame set can be constructed by extracting the estimated speech frame corresponding to each pure speech frame in that set.

[0146] Specifically:

[0147] S411. Group the pure speech frames:
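The grouping in S410 can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not give the decibel thresholds or the number of groups, so the three groups (low, middle, high), the thresholds in `edges_db`, the use of level relative to the loudest frame, and all function names are hypothetical.

```python
import numpy as np

def split_frames(x, frame_len):
    """Cut a signal into non-overlapping frames, dropping the tail."""
    x = np.asarray(x, dtype=np.float64)
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)

def frame_level_db(frames, eps=1e-12):
    """Per-frame RMS level in dB (eps avoids log of zero)."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20.0 * np.log10(rms + eps)

def group_by_level(pure, estimated, frame_len=256, edges_db=(-40.0, -10.0)):
    """Group pure frames by level and pick the matching estimated frames."""
    if len(pure) != len(estimated):
        raise ValueError("pure and estimated speech must have equal length")
    pure_frames = split_frames(pure, frame_len)
    est_frames = split_frames(estimated, frame_len)
    level = frame_level_db(pure_frames)
    # Assumed convention: level relative to the loudest pure frame
    labels = np.digitize(level - level.max(), edges_db)  # 0 low, 1 middle, 2 high
    names = ("low", "middle", "high")
    # The same boolean mask indexes both arrays, which preserves the
    # one-to-one correspondence between pure and estimated frames.
    return {name: (pure_frames[labels == k], est_frames[labels == k])
            for k, name in enumerate(names)}

# Toy signals (assumptions): three constant-amplitude segments
pure = np.concatenate([np.full(512, 1.0), np.full(512, 0.1), np.full(512, 0.001)])
estimated = 0.9 * pure
groups = group_by_level(pure, estimated)
```

Each entry of `groups` holds a pure speech frame set and its corresponding estimated speech frame set, so a per-group clarity value can be computed from matched frames.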

[0148] The pure speech fra...

Embodiment 3

[0173] Embodiment 3 changes the weights of the per-segment clarity in Embodiment 2 from fixed weights to adaptive weights; everything else is the same as in Embodiment 2.

[0174] In this embodiment, the clarity of each segment is weighted, and the speech clarity of the corresponding estimated speech is calculated as:

[0175]

[0176] W_high, W_middle, and W_low are adaptive weights and are calculated in the same way, so this embodiment illustrates the steps using W_high as an example. Referring to Figure 4, the specific calculation steps are as follows:

[0177] ①. Calculate the short-term average amplitude of each pure speech frame to obtain the corresponding frame amplitude data M_m, calculated as follows:

[0178]

[0179] where i is the index of the time-domain sampling point within the current frame, I is the length of one frame (number of sampling points), and x_m(i) is the time-domain sampling point data of...
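Step ① can be sketched as follows. The formula itself is not reproduced in this excerpt, so this sketch assumes the common definition of short-term average amplitude as the mean of the absolute sample values over one frame; whether the patent normalizes by the frame length I is an assumption, and the function name is hypothetical.

```python
import numpy as np

def short_term_average_amplitude(frames):
    """Return M_m for each frame m, assuming M_m = (1/I) * sum_i |x_m(i)|."""
    frames = np.asarray(frames, dtype=np.float64)
    return np.mean(np.abs(frames), axis=1)

# Two toy frames of length I = 4 (hypothetical data)
frames = np.array([[1.0, -1.0, 1.0, -1.0],
                   [0.5, 0.0, -0.5, 0.0]])
M = short_term_average_amplitude(frames)
```

Here `frames` has one row per pure speech frame, so `M[m]` plays the role of the frame amplitude data M_m.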


Abstract

The invention discloses a method and system for constructing a speech enhancement model, and also a speech enhancement method and system implemented with the constructed model. The construction method includes iteratively training a speech enhancement network on corresponding pure speech and noisy speech. Each training iteration comprises: inputting the noisy speech to the speech enhancement network, which outputs the corresponding estimated speech; calculating the magnitude-squared coherence between the corresponding pure speech and the estimated speech; calculating the energy spectral density data of the estimated speech; obtaining a preset auditory filter, and calculating the speech intelligibility of the estimated speech from the magnitude-squared coherence, the energy spectral density data, and the auditory filter; and updating the speech enhancement network based on the speech intelligibility. Because the model parameters are updated based on speech intelligibility, the noise reduction output of the trained model is clearer and more intelligible.
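Two of the quantities named in the abstract, the magnitude-squared coherence and the energy spectral density, can be sketched as follows. This is a minimal NumPy illustration assuming a standard frame-averaged (Welch-style) estimate with a Hann window; the frame length, hop size, and function names are hypothetical. How the patent combines these quantities with the auditory filter is not shown in this excerpt and is not reproduced here, and an actual training loop would need these computations in a differentiable framework.

```python
import numpy as np

def stft_frames(x, frame_len=256, hop=128):
    """Windowed one-sided spectra, one row per frame."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[idx] * window, axis=1)

def magnitude_squared_coherence(clean, estimated, frame_len=256, hop=128, eps=1e-12):
    """MSC(k) = |sum_m X_m(k) conj(Y_m(k))|^2 / (sum_m |X_m|^2 * sum_m |Y_m|^2)."""
    X = stft_frames(clean, frame_len, hop)
    Y = stft_frames(estimated, frame_len, hop)
    cross = np.sum(X * np.conj(Y), axis=0)
    pxx = np.sum(np.abs(X) ** 2, axis=0)
    pyy = np.sum(np.abs(Y) ** 2, axis=0)
    return np.abs(cross) ** 2 / (pxx * pyy + eps)

def energy_spectral_density(x, frame_len=256, hop=128):
    """Energy per frequency bin, summed over all frames."""
    X = stft_frames(x, frame_len, hop)
    return np.sum(np.abs(X) ** 2, axis=0)

# Toy signals (assumptions): white noise standing in for speech
rng = np.random.default_rng(0)
clean = rng.standard_normal(4096)
unrelated = rng.standard_normal(4096)
msc_same = magnitude_squared_coherence(clean, clean)
msc_unrelated = magnitude_squared_coherence(clean, unrelated)
esd = energy_spectral_density(clean)
```

The coherence is close to 1 in every bin when the estimated speech equals the pure speech and drops toward 0 for an unrelated signal, which is why it can serve as a per-frequency measure of how much of the estimate is linearly related to the pure speech.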

Description

Technical field

[0001] The invention relates to the technical field of acoustics, in particular to a speech enhancement technology based on human perception.

Background art

[0002] With the rapid development of deep learning, neural network models are widely used in speech noise reduction, for example the speech enhancement generative adversarial network SEGAN and the well-known audio processing network WaveNet.

[0003] However, when existing neural-network-based denoising algorithms denoise noisy speech in complex scenes, the intelligibility and clarity of the denoised speech are poor. In particular, under non-stationary noise, severe sound cancellation and residual non-stationary noise are likely to occur, which seriously degrade the quality of the denoised speech.

Contents of the invention

[0004] The present invention aims at the disadvantages of poor clarity and intelligibility of the denoised speech obtained by ...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G10L21/02
CPC: G10L21/02
Inventor: 高旭博
Owner: 浙江芯劢微电子股份有限公司