Deep network waveform synthesis method and device based on filter bank frequency discrimination

A filter bank and deep network technology applied in speech synthesis, instrumentation, speech analysis, etc. It addresses problems such as aliasing failure in the high-frequency part, spectral distortion in the high-frequency band, and large model size. Its effects are reduced spectral distortion, faster inference, and clearer detail in the mel spectrum.

Pending Publication Date: 2022-08-09

TIANJIN UNIV

AI Technical Summary

Problems solved by technology

The second is WaveGlow [9], a Glow-based [17] generative model that supports very fast inference on a GPU. However, models of this type are very large and have high memory requirements.

J. Kong et al. [19] pointed out that speech signals are composed of sinusoidal components with different periods. For this reason, modeling the periodic patterns of speech is crucial to improving the quality of generated waveform samples. They applied this idea to the discriminator of their GAN and improved the quality of the generated speech waveforms. However, their discriminator processes the speech waveform with average pooling. This causes aliasing failure in the high-frequency part and spectral distortion in the high-frequency band.
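
The aliasing problem attributed to average pooling above can be illustrated with a small NumPy sketch. The sketch assumes a toy signal, a 400 Hz tone sampled at 1 kHz, and a non-overlapping average pool with kernel 2 and stride 2. These values are illustrative only and are not taken from [19] or from the present invention. A two-tap average is a weak low-pass filter, so the tone is attenuated but not removed. After downsampling to 500 Hz, the tone folds to 500 - 400 = 100 Hz.

```python
import numpy as np

def avg_pool1d(x, kernel=2, stride=2):
    """Average pooling over a 1-D signal (no padding)."""
    n_out = (len(x) - kernel) // stride + 1
    return np.array([x[i * stride:i * stride + kernel].mean() for i in range(n_out)])

def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest-magnitude rFFT bin."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

fs = 1000.0                       # toy sampling rate (Hz)
t = np.arange(1000) / fs          # one second of samples
tone = np.sin(2 * np.pi * 400.0 * t)

pooled = avg_pool1d(tone)         # effective sampling rate becomes fs / 2 = 500 Hz
print(dominant_frequency(tone, fs))        # 400 Hz in the original signal
print(dominant_frequency(pooled, fs / 2))  # folds to 100 Hz after pooling
```

The energy at 100 Hz in the pooled signal does not exist in the input. A discriminator that sees only the pooled signal therefore receives misleading information about the high-frequency band.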

Examples

Embodiment 1

[0062] An embodiment of the present invention provides a deep network text-to-speech waveform synthesis method based on filter bank frequency discrimination. Referring to Figures 1 to 6, the method includes the following steps:

[0063] 101: Prepare the speech dataset used for training, the transcription text corresponding to each utterance, and the test text, and provide a front-end acoustic-model network that maps text to a mel spectrum;

[0064] 102: Split the speech data into a training set. Compute the mel spectrum of each utterance in turn to construct the training mel-spectrum dataset, which completes the preprocessing of the dataset;
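
Step 102 can be sketched in NumPy as follows. Only the 22.05 kHz sampling rate comes from Embodiment 3. The STFT settings are assumptions borrowed from common neural-vocoder setups, not values stated in this embodiment: a 1024-point FFT, a hop of 256 samples, a Hann window, 80 mel bands, the HTK mel scale, and log compression with a 1e-5 floor. The helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; the embodiment does not name the variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters spaced evenly on the mel axis, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, center, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Log-magnitude mel spectrogram, shape (n_mels, n_frames); no edge padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft // 2 + 1)
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(np.clip(mel, 1e-5, None)).T               # (n_mels, n_frames)

def build_mel_dataset(waves, sr=22050):
    """Step 102: compute the mel spectrum of each training utterance in turn."""
    return [log_mel_spectrogram(w, sr=sr) for w in waves]

rng = np.random.default_rng(0)
waves = [rng.standard_normal(22050), rng.standard_normal(11025)]  # stand-ins for training utterances
mel_dataset = build_mel_dataset(waves)
print([m.shape for m in mel_dataset])
```

A production pipeline would usually pad the signal edges and normalize loudness. Those steps are omitted here to keep the frame count easy to follow.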

[0065] 103: Build the network:

[0066] Build the generator network shown in Figure 1, comprising transposed convolution modules and multi-receptive field fusion (MRF) modules. Build the multi-frequency discriminator network shown in Figure 2, which is composed of a number of sub-discriminators, each of which processes the signal...
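
The generator in Figure 1 can be illustrated with a single-channel NumPy toy. Each stage applies a transposed convolution that upsamples by a fixed rate, followed by an MRF module that averages residual blocks with different kernel sizes and dilations. Several choices are assumptions modeled on HiFi-GAN-style vocoders, not values from this embodiment: the upsampling rates (8, 8, 2, 2), kernel sizes (3, 7, 11), dilations (1, 3, 5), random weights, and the single-channel simplification. A real generator is multi-channel, feeds the 80-band mel spectrum through an input convolution, and learns its weights. The rates multiply to 256, which matches the assumed hop length, so each conditioning frame becomes 256 output samples.

```python
import numpy as np

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def transposed_conv1d(x, w, stride):
    """Single-channel transposed convolution, cropped to len(x) * stride samples."""
    k = len(w)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, xi in enumerate(x):
        out[i * stride:i * stride + k] += xi * w   # scatter each input sample through the kernel
    pad = (k - stride) // 2
    return out[pad:pad + len(x) * stride]

def dilated_conv1d_same(x, w, dilation):
    """Dilated convolution (correlation form), zero-padded so the output length equals len(x)."""
    span = dilation * (len(w) - 1)
    xp = np.pad(x, (span // 2, span - span // 2))
    return np.array([np.dot(w, xp[t:t + span + 1:dilation]) for t in range(len(x))])

def res_block(x, kernels, dilations):
    """Simplified residual block: one dilated convolution per dilation, each with a skip connection."""
    for w, d in zip(kernels, dilations):
        x = x + dilated_conv1d_same(leaky_relu(x), w, d)
    return x

def mrf(x, blocks):
    """Multi-receptive field fusion: average residual blocks with different receptive fields."""
    return sum(res_block(x, kernels, dilations) for kernels, dilations in blocks) / len(blocks)

def toy_generator(cond, upsample_rates=(8, 8, 2, 2), kernel_sizes=(3, 7, 11),
                  dilations=(1, 3, 5), seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(cond, dtype=float)
    for rate in upsample_rates:
        x = transposed_conv1d(leaky_relu(x), rng.normal(0.0, 0.1, 2 * rate), rate)
        blocks = [([rng.normal(0.0, 0.1, k) for _ in dilations], dilations) for k in kernel_sizes]
        x = mrf(x, blocks)
    return np.tanh(x)   # waveform samples in [-1, 1]

frames = np.linspace(-1.0, 1.0, 10)   # hypothetical 1-D conditioning sequence (10 frames)
wave = toy_generator(frames)
print(len(wave))                      # 10 * 8 * 8 * 2 * 2 = 2560 samples
```

Averaging in the MRF lets short and long receptive fields contribute equally. Each transposed-convolution kernel is twice the stride, which keeps the cropped output length an exact multiple of the input.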

Embodiment 2

[0080] The scheme of Embodiment 1 is further described below with specific calculation formulas and examples:

[0081] 1. Vocoder design based on a generative adversarial network

[0082] 1. Network structure

[0083] Suppose that in a low-dimensional space $\mathcal{Z}$ there is a simple, easy-to-sample distribution $p(z)$, usually the standard multivariate normal distribution $\mathcal{N}(0, I)$. A neural network is used to construct a mapping function $G: \mathcal{Z} \to \mathcal{X}$, called the generator network. The strong fitting ability of neural networks is used to make $G(z)$ follow the data distribution $p_r(x)$. Such a model is called an implicit density model: it does not explicitly model $p_r(x)$, but instead models the generation process.

[0084] A key issue for an implicit density model is how to ensure that the samples produced by the generator network follow the real data distribution.

[0085] A generative adversarial network is designed to make the generated samples obey the re...
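
For reference, the standard GAN objective formalizes [0084]-[0085] as a minimax game. In this game, $D$ learns to separate real samples from generated ones, while $G$ learns to fool $D$. Using the symbols of [0083], the objective can be written as follows. This is the textbook formulation, not necessarily the exact loss of the present invention. According to the Abstract, that loss combines the losses of several sub-discriminators, and it may take a different form, for example a least-squares one.

```latex
\min_{G}\max_{D}\; V(D,G)
= \mathbb{E}_{x \sim p_r(x)}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
```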

Embodiment 3

[0150] The audio used in the experiments has a sampling rate of 22.05 kHz, and the frequency-sampling vector length is N = 512. Take the ninth filter in the filter bank as an example. Its passband is $[f_L, f_H] = [700\,\mathrm{Hz}, 1000\,\mathrm{Hz}]$, i.e., $f_L = 700$ Hz and $f_H = 1000$ Hz, with $f_s = 22050$ Hz and $N = 512$. Substituting these values into Eq. (19) gives P = 16 and Q = 8. A Hann window of length N is then convolved with its flipped copy of length N to obtain the convolution window $w_c(n)$. Substituting the above values into Eq. (25) yields the filter coefficients $g(n)$, from which the frequency response $G(j2\pi f)$ of the filter is obtained. This response is shown as the black line in Figure 6. Taking the fourth filter in the filter bank as an example, Figure 7 shows the original speech waveform and its spectrum. It also shows the waveform and spectrum after filtering by analytic filter 4.
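
The design in [0150] can be approximated by the NumPy sketch below. Eqs. (19) and (25) are not reproduced in this excerpt, so two parts of the sketch are assumptions. The first is the bin mapping $P = \lfloor f_L N / f_s \rfloor$, $Q = \lceil f_H N / f_s \rceil - P$, chosen only because it reproduces P = 16 and Q = 8 for the ninth filter. The second is an all-phase-style construction. In it, an ideal frequency-sampled response is weighted by the convolution window $w_c(n)$, normalized to a unit center tap. That response has ones on bins P to P + Q - 1 and covers positive frequencies only, which makes the filter analytic. The function names are hypothetical.

```python
import numpy as np

def band_bins(f_low, f_high, fs, n):
    """Hypothetical bin mapping; reproduces P = 16, Q = 8 for [700, 1000] Hz, fs = 22050 Hz, N = 512."""
    p = int(np.floor(f_low * n / fs))
    q = int(np.ceil(f_high * n / fs)) - p
    return p, q

def convolution_window(n):
    """Length-(2n - 1) window: a length-n Hann window convolved with its flipped copy."""
    hann = np.hanning(n)
    wc = np.convolve(hann, hann[::-1])
    return wc / wc[n - 1]          # unit center tap (assumption), giving a passband gain near 1

def analytic_bandpass(f_low, f_high, fs, n):
    """Complex FIR coefficients g(n), n = -(N-1), ..., N-1, for one analytic band-pass filter."""
    p, q = band_bins(f_low, f_high, fs, n)
    H = np.zeros(n)
    H[p:p + q] = 1.0               # passband on positive-frequency bins only -> analytic filter
    h = np.fft.ifft(H)             # one period of the frequency-sampled impulse response
    taps = np.arange(-(n - 1), n)  # centered tap indices
    return convolution_window(n) * h[taps % n]

def frequency_response(g, freqs_hz, fs):
    """Evaluate G(j 2 pi f) of a zero-centered FIR filter at the given frequencies (Hz)."""
    taps = np.arange(len(g)) - (len(g) - 1) // 2
    return np.array([np.sum(g * np.exp(-2j * np.pi * f * taps / fs)) for f in freqs_hz])

fs, N = 22050.0, 512
print(band_bins(700.0, 1000.0, fs, N))   # (16, 8)
g9 = analytic_bandpass(700.0, 1000.0, fs, N)
gains = np.abs(frequency_response(g9, [-850.0, 850.0, 3000.0], fs))
print(np.round(gains, 3))
```

The sketch's gain is close to 1 at the band center (850 Hz) and close to 0 at the mirrored frequency (-850 Hz) and in the stopband (3000 Hz). This is the behavior an analytic band-pass filter should have. The actual coefficients of the invention may differ.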

[0151] Second, the overall effect of the embodiment within the end-to-end model is verified. First, a TTS front-end model is used to generate the intermediate...

Abstract

The invention discloses a deep network waveform synthesis method and device based on filter bank frequency discrimination. The method comprises the following steps:

- Design, by an analytic method, a filter bank composed of multiple filters with arbitrary passbands.
- Feed the speech signal output by the generator into the filter bank in parallel to obtain signals in multiple narrow frequency bands.
- Input each narrowband signal into its own sub-discriminator for processing, and train the parameters of the generative adversarial network with a loss function that combines the losses of the sub-discriminators.
- Feed a test text into a given acoustic-model front-end network to generate a test mel spectrum, and input the test mel spectrum into the generator to generate a speech signal.

The device comprises a processor and a memory. The proposed GAN for speech waveform synthesis solves the problem of aliasing failure in the high-frequency part and greatly reduces spectral distortion in the high-frequency band.
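
The discrimination pipeline summarized in the Abstract can be sketched as below. The generator output and the real speech are split into the same narrow bands. Each band goes to its own sub-discriminator, and the per-band losses are combined into one objective. Several parts are assumptions for illustration only:

- Ideal FFT-domain masks stand in for the analytically designed filter bank.
- The band edges are arbitrary.
- Each sub-discriminator is a hypothetical random linear scorer rather than a trained network.
- The per-band losses use a least-squares GAN form and are simply summed.

```python
import numpy as np

def split_into_bands(x, fs, edges_hz):
    """Split a real waveform into contiguous bands with ideal FFT masks (stand-in for the analytic filter bank)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    last = len(edges_hz) - 2
    bands = []
    for i, (lo, hi) in enumerate(zip(edges_hz[:-1], edges_hz[1:])):
        in_band = freqs >= lo
        if i < last:
            in_band &= freqs < hi    # the top band keeps everything up to Nyquist
        bands.append(np.fft.irfft(spectrum * in_band, n=len(x)))
    return bands

def make_sub_discriminator(seed, frame=64):
    """Hypothetical sub-discriminator: one tanh score per non-overlapping frame."""
    w = np.random.default_rng(seed).normal(size=frame)
    def score(band):
        frames = band[: len(band) // frame * frame].reshape(-1, frame)
        return np.tanh(frames @ w)
    return score

def combined_losses(discriminators, real, fake, fs, edges_hz):
    """Sum least-squares GAN losses over all sub-discriminators (one per band)."""
    d_loss, g_loss = 0.0, 0.0
    pairs = zip(split_into_bands(real, fs, edges_hz), split_into_bands(fake, fs, edges_hz))
    for D, (real_band, fake_band) in zip(discriminators, pairs):
        d_real, d_fake = D(real_band), D(fake_band)
        d_loss += np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)
        g_loss += np.mean((d_fake - 1.0) ** 2)
    return d_loss, g_loss

fs = 22050.0
edges = [0.0, 1000.0, 3000.0, 6000.0, fs / 2]   # hypothetical band edges (Hz)
rng = np.random.default_rng(0)
real = rng.standard_normal(8192)                # stand-in for a real speech segment
fake = rng.standard_normal(8192)                # stand-in for generator output
discs = [make_sub_discriminator(seed) for seed in range(len(edges) - 1)]
print(combined_losses(discs, real, fake, fs, edges))
```

The band filters are applied directly to the full-rate waveform, with no pooling. Each sub-discriminator therefore sees its band without the folding that average pooling introduces.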

Description

Technical field

[0001] The invention relates to the field of text-to-speech technology, and in particular to a deep network waveform synthesis method and device based on filter bank frequency discrimination.

Background art

[0002] Text-to-speech (TTS) [1-4] has long been a popular research topic in artificial intelligence. It aims to make machines speak as fluently and naturally as humans. This technology can be used in many voice interaction applications, such as smart personal assistants, robots, games, and e-books.

[0003] End-to-end neural TTS models usually first convert text into acoustic features [3,5-7], such as the mel spectrum. This process is usually divided into two parts: extracting pronunciation and linguistic information from the text, and generating acoustic features from that linguistic information. Second, the model converts the mel spectrum into audio waveform samples ...

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G10L13/08; G10L25/03; G10L25/30

CPC: G10L13/08; G10L25/03; G10L25/30

Inventors: 黄翔东 (Huang Xiangdong), 王俊芹 (Wang Junqin), 甘霖 (Gan Lin), 王文俊 (Wang Wenjun)

Owner: TIANJIN UNIV
