Speech enhancement model training and application method, apparatus, device, and storage medium

A speech enhancement model training method, applied in speech analysis, speech synthesis, instruments, etc., that addresses the problems of high recording cost and large information loss in audio or acoustic features, and achieves the effect of small distortion.

Pending Publication Date: 2021-09-24
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

Recording high-quality speech data is costly, and if recording is done in an ordinary indoor environment, background noise, other environmental noise, and reverberation are picked up or even amplified by the recording equipment.
If current mainstream deep neural network methods are used for speech enhancement, they often cause large distortion, so the audio or acoustic features suffer large information loss before speech synthesis model training.

Embodiment Construction

[0029] The technical solutions in the embodiments of the present application are described clearly and completely below. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application, without creative effort, fall within the scope of protection of the present application.

[0030] The flowcharts shown in the drawings are merely illustrative. They need not include all content and operations/steps, nor must the operations/steps be performed in the order described. For example, some operations/steps can be decomposed, combined, or partially merged, so the order actually performed may change according to the actual situation.

[0031] It should be understood that the terms used in the specification of the present application are merely for the purpose of describing specific embodiments and are not intended to limit the present application. As us...

Abstract

The invention relates to the field of artificial intelligence speech enhancement and discloses a training method and an application method for a speech enhancement model, an apparatus, a device, and a storage medium. By jointly modeling the speech enhancement model and a vocoder, a speech enhancement model with small distortion and noise-reduction capability is obtained. The method comprises the following steps: adding simulated noise to clean speech to obtain noisy speech, and determining a target time-frequency mask according to the clean speech and the noisy speech; extracting noisy Mel spectrum features from the noisy speech, inputting the noisy Mel spectrum features into the speech enhancement model to output a predicted time-frequency mask, and determining a first loss value according to the predicted time-frequency mask and the target time-frequency mask; obtaining denoised Mel spectrum features according to the predicted time-frequency mask and the noisy Mel spectrum features; inputting the denoised Mel spectrum features into a vocoder to obtain synthetic speech, and determining a second loss value according to the synthetic speech and the clean speech; and optimizing the parameters of the speech enhancement model and the vocoder according to the first loss value and the second loss value to obtain a trained speech enhancement model.
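
The data flow in the abstract can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: the abstract does not say how the target mask or either loss is defined, so the ratio-style mask, the mean squared error for the first loss, the mean absolute error for the second loss, and the weighted sum in `joint_loss` are all assumptions, as are the function names. The speech enhancement model, the vocoder, and Mel feature extraction are left out, and the functions simply operate on arrays that such components would produce.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Simulated noise addition: scale `noise` to reach `snr_db`, then add it."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

def target_mask(clean_feat, noisy_feat, eps=1e-8):
    """Target time-frequency mask in [0, 1] (assumed ratio-style definition)."""
    return np.clip(clean_feat / (noisy_feat + eps), 0.0, 1.0)

def mask_loss(pred_mask, tgt_mask):
    """First loss value: MSE between predicted and target masks (assumed form)."""
    return float(np.mean((pred_mask - tgt_mask) ** 2))

def apply_mask(pred_mask, noisy_feat):
    """Denoised features: element-wise product of mask and noisy features."""
    return pred_mask * noisy_feat

def waveform_loss(synth, clean):
    """Second loss value: mean absolute error on waveforms (assumed form)."""
    return float(np.mean(np.abs(synth - clean)))

def joint_loss(first, second, weight=1.0):
    """Combine both loss values; the weighted sum is an assumption."""
    return first + weight * second
```

In a real training loop, the combined value would be backpropagated through both the vocoder and the speech enhancement model using an automatic differentiation framework, which is what makes the modeling joint.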

Description

Technical field

[0001] The present application relates to the field of artificial intelligence speech enhancement, and in particular to a training method and an application method for a speech enhancement model, an apparatus, a computer device, and a storage medium.

Background technique

[0002] Speech synthesis technology can already generate speech that is relatively close to a human voice, but building a high-quality speech synthesis system requires high-quality speech training data. High-quality speech data typically must be recorded in an anechoic room that is equipped with high-end recording equipment and has a very low noise floor. Recording high-quality speech data is therefore costly, and if recording is done in an ordinary indoor environment, the noise floor, other environmental noise, and reverberation will be picked up by the recording equipment. If speech enhancement is performed using current mainstream deep neural network methods, considerable distortion is often introduced, so that the audio or acoustic features before ...

Application Information

IPC(8): G10L25/30, G10L25/18, G10L21/0232, G10L13/02
CPC: G10L25/18, G10L25/30, G10L21/0232, G10L13/02
Inventors: 孙奥兰, 王健宗
Owner: PING AN TECH (SHENZHEN) CO LTD