Voice processing method and device thereof, electronic equipment and computer storage medium

A voice processing technology, applied in the field of speech analysis, that addresses a shortcoming of traditional methods: they cannot deeply mine the correlation and difference between the normal signal and the noise signal, which degrades the noise separation effect.

Active Publication Date: 2021-09-14

出门问问创新科技有限公司



Problems solved by technology

Traditional speech noise reduction methods build models on signal filtering algorithms. The Kalman filter algorithm, for example, constructs a linear combination model of the normal signal and the noise signal to separate the noise. Such methods cannot deeply mine the correlation and difference between the normal signal and the noise signal, which degrades the noise separation effect, and they require the model builder to add certain prior knowledge to ensure the robustness of the model.


Examples


Embodiment 1

[0043] An embodiment of the present invention provides a speech processing method, as shown in Figure 1, including:

[0044] Step 101: construct a training pair of first voice data and second voice data.

[0045] Here, the first voice data may be clean, noise-free speech data, referred to as Clean Audio; the second voice data may be the speech data obtained by adding noise to the first voice data, referred to as Noisy Audio.

[0046] Constructing the training pair of first voice data and second voice data may include:

[0047] Data enhancement processing is performed on the first voice data to obtain the corresponding second voice data; the first voice data and its corresponding second voice data form a training pair. The data enhancement includes at least one of the following methods: same-category enhancement, noise enhancement, time-shift enhancement, and pitch-transformation enhancement.

[0048] The Noisy Audio data is generated from its corresponding Clean Audio by ...
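The pair construction of Step 101 can be illustrated with a minimal sketch. It covers only two of the listed enhancement methods (noise enhancement and time-shift enhancement), and every name in it (`add_noise`, `time_shift`, `build_training_pairs`, the `snr_db` parameter) is a hypothetical choice for illustration, not something stated in the patent text. The use of white Gaussian noise at a fixed signal-to-noise ratio is likewise an assumption.

```python
import math
import random


def add_noise(clean, snr_db, rng):
    """Return the clip plus white Gaussian noise at the requested SNR in dB.

    Assumption: noise enhancement is modeled as additive Gaussian noise.
    """
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in clean]


def time_shift(samples, shift):
    """Circularly shift a waveform by `shift` samples (positive delays it)."""
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift] if shift else list(samples)


def build_training_pairs(clean_clips, snr_db=10.0, seed=0):
    """Pair each Clean Audio clip with a Noisy Audio clip derived from it."""
    rng = random.Random(seed)
    return [(clean, add_noise(clean, snr_db, rng)) for clean in clean_clips]


# Usage: a 0.1 s, 440 Hz tone sampled at 16 kHz stands in for clean speech.
clip = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
pairs = build_training_pairs([clip], snr_db=10.0)
```

Seeding the random generator keeps the pairs reproducible between runs. Pitch-transformation and same-category enhancement are omitted because the excerpt does not describe how they are performed.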

Embodiment 2

[0074] An embodiment of the present invention provides a voice processing device, as shown in Figure 2, including:

[0075] A building module 10, configured to construct a training pair of first voice data and second voice data;

[0076] A generating module 20, configured to input the original features of the first voice data and the second voice data into a generation model, respectively, generating first embedded data corresponding to the first voice data and second embedded data corresponding to the second voice data;

[0077] A discrimination module 30, configured to input the first embedded data and the second embedded data into a discrimination model for training to obtain a discrimination result;

[0078] A learning module 40, configured to perform adversarial model learning according to the discrimination model, using stochastic gradient descent, to obtain a speech noise reduction model;

[0079] A processing module 50, configured to perform noise reduction processing on target voice data according to the speech noise reduction model.
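The interplay of modules 20, 30 and 40 can be pictured with a toy sketch of adversarial learning by stochastic gradient descent. Everything concrete here is an assumption made for illustration: the generation model is reduced to a single linear map, the discrimination model to logistic regression, and the losses to binary cross-entropy. The excerpt does not specify the network architectures or loss functions, and the names `embed`, `discriminate` and `train_adversarial` are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def embed(W, x):
    """Toy generation model: linear map from features to an embedding."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def discriminate(v, b, e):
    """Toy discrimination model: probability that an embedding is 'clean'."""
    return sigmoid(sum(vi * ei for vi, ei in zip(v, e)) + b)


def train_adversarial(pairs, embed_dim=4, lr=0.05, epochs=20, seed=0):
    """Alternate SGD steps on the discriminator and the generator.

    pairs: list of (clean_features, noisy_features), equal-length lists.
    """
    rng = random.Random(seed)
    feat_dim = len(pairs[0][0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)]
         for _ in range(embed_dim)]
    v = [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
    b = 0.0
    for _ in range(epochs):
        for clean, noisy in pairs:
            # Discriminator step: label clean embeddings 1, noisy embeddings 0.
            for x, label in ((clean, 1.0), (noisy, 0.0)):
                e = embed(W, x)
                err = discriminate(v, b, e) - label  # d(loss)/d(logit)
                v = [vi - lr * err * ei for vi, ei in zip(v, e)]
                b -= lr * err
            # Generator step: push the noisy embedding toward the 'clean' label.
            e = embed(W, noisy)
            err = discriminate(v, b, e) - 1.0
            W = [[w - lr * err * vi * xj for w, xj in zip(row, noisy)]
                 for row, vi in zip(W, v)]
    return W, v, b


# Usage with two tiny, made-up feature pairs.
toy_pairs = [([0.5, -0.2, 0.1], [0.6, -0.1, 0.3]),
             ([0.1, 0.4, -0.3], [0.0, 0.5, -0.1])]
W, v, b = train_adversarial(toy_pairs, epochs=5)
```

The sketch stops at the learned embedding and discriminator. How the trained model is turned into denoised audio for the target voice data (module 50) is not covered, since the excerpt does not describe that step.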

[00...

Embodiment 3

[0088] An embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method steps of the embodiments of the present invention when executing the program stored in the memory.

[0089] An embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method steps of the embodiments of the present invention.


Abstract

The invention discloses a voice processing method. The method comprises the following steps: constructing a training pair of first voice data and second voice data; respectively inputting the original features of the first voice data and the second voice data into a generation model, and generating first embedded data corresponding to the first voice data and second embedded data corresponding to the second voice data; inputting the first embedded data and the second embedded data into a discrimination model for training to obtain a discrimination result; according to the discrimination model, performing adversarial model learning by stochastic gradient descent to obtain a voice noise reduction model; and performing noise reduction processing on the target voice data according to the voice noise reduction model. With robustness guaranteed and little dependence on prior knowledge, the method uses a deep learning network to adaptively learn the correlation and difference between normal signals and noise signals, achieving a good voice noise reduction effect.

Description

Technical field

[0001] The present invention relates to the field of speech processing, and more particularly to a speech processing method, apparatus, electronic device, and computer storage medium.

Background technique

[0002] With the development of voice communication systems, speech has entered all aspects of daily life, such as mobile phone audio and video calls and in-car calls. The external environment usually affects the intelligibility and clarity of the speech and causes listening fatigue. The traditional speech noise reduction method builds models on various signal filtering algorithms, such as the Kalman filter algorithm, constructing a linear combination model of the normal signal and the noise signal to achieve noise separation. Its drawback is that it cannot deeply mine the correlation and difference between the normal signal and the noise signal, which degrades the noise separation effect, and it requires the model constructor to ensure the rob...


Application Information


IPC(8): G10L21/0208, G10L21/0216, G10L21/0224, G10L25/30

CPC: G10L21/0208, G10L21/0216, G10L21/0224, G10L25/30

Inventor: 汪剑, 李志飞

Owner: 出门问问创新科技有限公司
