Speech enhancement method and system based on generative adversarial network

A speech enhancement technology based on a generative adversarial network, applied in biological neural network models, speech analysis, neural learning methods, etc. It addresses the problem that the temporal characteristics of speech are not taken into account, and yields enhanced speech with high quality and high short-time intelligibility.

Pending Publication Date: 2022-06-24
SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN +1

AI Technical Summary

Problems solved by technology

The fully convolutional neural networks used for the generator and the discriminator in earlier designs do not adequately account for the temporal characteristics of speech.



Examples


Embodiment 1

[0037] This embodiment provides a speech enhancement method based on a generative adversarial network;

[0038] The speech enhancement method based on a generative adversarial network includes:

[0039] S101: Obtain a noisy speech signal;

[0040] S102: Input the noisy speech signal into the trained generative adversarial network, and output the enhanced speech signal;

[0041] Wherein, the generative adversarial network includes two generators and two discriminators;

[0042] During training of the generative adversarial network, the adversarial game between the two generators and the two discriminators improves the generators' ability to approximate the target signal.

[0043] The training process of the two generators is to minimize the following loss function:

[0044]

[0045] The training process of the two discriminators is to minimize the following loss function:

[0046]

[0047] During training, the training input to the generator is ...

Embodiment

[0085] Example: RMSProp is chosen as the optimizer.
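For readers unfamiliar with RMSProp, the following is a minimal NumPy sketch of the standard update rule. The learning rate, decay factor, and epsilon are illustrative assumptions; the text names the optimizer but does not state its hyperparameters, and the function name `rmsprop_step` is hypothetical.

```python
import numpy as np

def rmsprop_step(param, grad, sq_avg, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp update. lr, decay and eps are assumed values."""
    # Exponential moving average of the squared gradient.
    sq_avg = decay * sq_avg + (1.0 - decay) * grad ** 2
    # Divide the step by the root of that average.
    param = param - lr * grad / (np.sqrt(sq_avg) + eps)
    return param, sq_avg

# Toy usage: minimise f(w) = w^2, whose gradient is 2w.
w = np.array([1.0])
v = np.zeros_like(w)
for _ in range(2000):
    w, v = rmsprop_step(w, 2.0 * w, v, lr=0.01)
```

Because each step is normalised by the recent gradient magnitude, the effective step size stays roughly constant even when raw gradients vary widely, which is one common reason for choosing RMSProp in GAN training.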

[0086] (1.3) Rescale the amplitude range of the random noise and clean speech, and at the same time apply pre-emphasis with a coefficient in the range of 0.9 to 1.

[0087] Example: Rescale the random noise and clean speech to the range -1 to 1 to prevent problems such as gradient explosion, and apply pre-emphasis with a coefficient of 0.95 so that the high-frequency characteristics are better represented.
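Step (1.3) can be sketched as follows. This is a minimal illustration, assuming peak normalisation for the rescaling (the text does not say how the range of -1 to 1 is obtained) and the usual first-order pre-emphasis filter y[n] = x[n] - 0.95 x[n-1]. The function names are hypothetical, and the order of the two operations is also an assumption.

```python
import numpy as np

def rescale_to_unit_range(x):
    """Scale a waveform so its samples lie in [-1, 1] (peak normalisation, assumed)."""
    x = np.asarray(x, dtype=np.float64)
    peak = np.max(np.abs(x))
    return x if peak == 0 else x / peak

def pre_emphasis(x, coeff=0.95):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1]; y[0] = x[0]."""
    x = np.asarray(x, dtype=np.float64)
    return np.concatenate(([x[0]], x[1:] - coeff * x[:-1]))

def de_emphasis(y, coeff=0.95):
    """Inverse filter, applied to the enhanced output: x[n] = y[n] + coeff * x[n-1]."""
    y = np.asarray(y, dtype=np.float64)
    x = np.zeros_like(y)
    x[0] = y[0]
    for n in range(1, len(y)):
        x[n] = y[n] + coeff * x[n - 1]
    return x

signal = rescale_to_unit_range(np.array([0.5, -2.0, 1.0, 4.0]))
emphasised = pre_emphasis(signal)
restored = de_emphasis(emphasised)
```

Note that pre-emphasis applied after rescaling can push individual samples slightly outside the range of -1 to 1; whether the original method rescales before or after filtering is not stated.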

[0088] (1.4) Put the random noise and clean speech into a queue, and take out the required batch of enhanced speech and clean speech each time.

[0089] Example: the batch size is 50, and each item is 16384 frames long;
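The queueing in step (1.4) might look like the sketch below. It assumes the two signals are time-aligned 1-D arrays that are cut into non-overlapping 16384-long segments, with any remainder dropped; the text does not describe segment overlap or shuffling, and the helper names `fill_queue` and `next_batch` are hypothetical.

```python
from collections import deque
import numpy as np

def fill_queue(noisy, clean, seg_len=16384):
    """Cut two aligned waveforms into segment pairs and enqueue them."""
    queue = deque()
    n_segments = min(len(noisy), len(clean)) // seg_len
    for i in range(n_segments):
        window = slice(i * seg_len, (i + 1) * seg_len)
        queue.append((noisy[window], clean[window]))
    return queue

def next_batch(queue, batch_size=50):
    """Dequeue batch_size pairs and stack them into two 2-D arrays."""
    if len(queue) < batch_size:
        raise ValueError("not enough segments left for a full batch")
    pairs = [queue.popleft() for _ in range(batch_size)]
    noisy_batch = np.stack([pair[0] for pair in pairs])
    clean_batch = np.stack([pair[1] for pair in pairs])
    return noisy_batch, clean_batch

# Toy usage with silent placeholder audio long enough for 60 segments.
noisy_wave = np.zeros(16384 * 60, dtype=np.float32)
clean_wave = np.zeros(16384 * 60, dtype=np.float32)
segment_queue = fill_queue(noisy_wave, clean_wave)
noisy_batch, clean_batch = next_batch(segment_queue)
```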

[0090] Because multiple generators can cooperate to generate speech in multiple stages, a training scheme with two generators is adopted to reconstruct and generate clean speech.
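One way to picture multi-stage generation is a chain in which each generator refines the output of the previous one. The sketch below assumes exactly that chaining, which the text does not spell out. It uses simple moving-average filters as stand-ins for the trained generator networks; `multi_stage_enhance` and `smooth` are hypothetical names.

```python
from typing import Callable, List, Sequence
import numpy as np

Generator = Callable[[np.ndarray], np.ndarray]

def multi_stage_enhance(noisy: np.ndarray,
                        generators: Sequence[Generator]) -> List[np.ndarray]:
    """Apply the generators one after another and keep every stage's output."""
    outputs = []
    current = noisy
    for generator in generators:
        current = generator(current)
        outputs.append(current)
    return outputs

def smooth(width: int) -> Generator:
    """Stand-in 'generator': a moving-average filter, NOT a neural network."""
    kernel = np.ones(width) / width
    return lambda x: np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(0)
noisy = np.sin(np.linspace(0, 20, 1024)) + 0.3 * rng.standard_normal(1024)
stage_outputs = multi_stage_enhance(noisy, [smooth(5), smooth(9)])
enhanced = stage_outputs[-1]
```

Keeping the intermediate outputs makes it possible to score each stage separately during training, for example with one discriminator per stage, although how the two discriminators are paired with the generators is not described in this excerpt.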

[0091] Further, the initialization steps for the first generator and the second generator specifically include:

[0092] (2.1) Take out the random noise separately and adjust its dimensions.

[0...

Embodiment 2

[0120] This embodiment provides a speech enhancement system based on a generative adversarial network;

[0121] The speech enhancement system based on a generative adversarial network includes:

[0122] an acquisition module, which is configured to: acquire a noisy speech signal;

[0123] a speech enhancement module, which is configured to: input the noisy speech signal into the trained generative adversarial network, and output the enhanced speech signal;

[0124] Wherein, the generative adversarial network includes two generators and two discriminators;

[0125] During training of the generative adversarial network, the adversarial game between the two generators and the two discriminators improves the generators' ability to approximate the target signal.

[0126] It should be noted here that the above acquisition module and speech enhancement module correspond to steps S101 to S102 in the first embodiment, and the examples and application scenarios ...
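The two-module structure of this embodiment can be sketched as a pair of small classes. This is only an illustration of the module boundaries: the class names, the in-memory signal source, and the placeholder model (a plain gain of 0.5) are all assumptions, and no real trained network is involved.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class AcquisitionModule:
    """Acquires a noisy speech signal; here it simply reads an in-memory array."""
    source: np.ndarray

    def acquire(self) -> np.ndarray:
        return np.asarray(self.source, dtype=np.float64)

@dataclass
class SpeechEnhancementModule:
    """Feeds the noisy signal to a trained model and returns the enhanced signal."""
    trained_model: Callable[[np.ndarray], np.ndarray]

    def enhance(self, noisy: np.ndarray) -> np.ndarray:
        return self.trained_model(noisy)

@dataclass
class SpeechEnhancementSystem:
    acquisition: AcquisitionModule
    enhancement: SpeechEnhancementModule

    def run(self) -> np.ndarray:
        # Mirrors steps S101 (acquire) and S102 (enhance) of the method.
        return self.enhancement.enhance(self.acquisition.acquire())

# Placeholder "model": a fixed gain, standing in for the trained network.
system = SpeechEnhancementSystem(
    AcquisitionModule(np.array([0.2, -0.4, 0.6])),
    SpeechEnhancementModule(lambda x: 0.5 * x),
)
result = system.run()
```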


Abstract

The invention discloses a speech enhancement method and system based on a generative adversarial network. The method comprises the following steps: acquiring a noisy speech signal; inputting the noisy speech signal into a trained generative adversarial network, and outputting an enhanced speech signal. The generative adversarial network comprises two generators and two discriminators; during training, the adversarial game between the two generators and the two discriminators improves the generators' ability to approximate the target signal. The method takes full account of the temporal relationships in speech signals and improves on the earlier fully convolutional design of the generator and the discriminator: a multi-head attention mechanism is added to the generator, and multi-generator, multi-stage enhancement is combined with the attention mechanism, making full use of both the multi-head attention mechanism and the adversarial game idea of generative adversarial networks. As a result, the enhanced speech has higher quality and intelligibility.
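To make the multi-head attention mechanism concrete, here is a minimal NumPy sketch of generic multi-head self-attention over a sequence of time frames. It shows only the standard mechanism, not where or how the generator uses it, which this excerpt does not specify. The dimensions, the random weights, and the function names are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (T, d_model); each weight: (d_model, d_model).

    Returns the attended sequence (T, d_model) and the attention
    weights (num_heads, T, T).
    """
    T, d_model = x.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split_heads(m):
        # (T, d_model) -> (num_heads, T, d_head)
        return m.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Scaled dot-product scores between every pair of time steps, per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v
    # Concatenate the heads back to (T, d_model) and mix them.
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))          # 8 time steps, 16 features
w_q, w_k, w_v, w_o = (0.1 * rng.standard_normal((16, 16)) for _ in range(4))
attended, attn = multi_head_self_attention(frames, w_q, w_k, w_v, w_o, num_heads=4)
```

Every output frame is a weighted mixture of all input frames, so each time step can draw on distant context directly; a purely convolutional layer with a small kernel sees only a local window, which is the limitation the abstract refers to.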

Description

Technical field

[0001] The present invention relates to the technical field of speech signal processing, and in particular to a speech enhancement method and system based on a generative adversarial network.

Background technique

[0002] The statements in this section merely provide background related to the present disclosure and do not necessarily constitute prior art.

[0003] Speech is the most direct way to transmit information, but everyday environments contain a great deal of noise that degrades speech quality. Noise interferes with human-to-human communication and human-computer interaction, and the quality of noisy speech greatly affects the operating efficiency of speech systems. Because the speech signal is mixed with various interfering noises, the purpose of speech enhancement is to remove the unwanted noise contained in the signal as much as possible, improve the quality of the noisy speech, and at the same time inc...


Application Information

IPC(8): G10L21/0208, G10L25/30, G06N3/04, G06N3/08
CPC: G10L21/0208, G10L25/30, G06N3/084, G06N3/045
Inventor: 汪付强, 袁从刚, 夏源, 张鹏, 吴晓明, 张建强, 刘祥志, 郝秋赟, 马晓凤
Owner: SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN