Neural network vocoder training method based on short-time spectrum consistency

A neural network vocoder training technique based on short-time spectrum consistency, applied in the field of speech signal processing. It reduces the inconsistency of the predicted short-time spectrum and thereby improves the quality of synthesized speech.

Pending Publication Date: 2021-04-09

UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0005] Building on the original training method of the neural network vocoder, the present invention designs a short-time spectrum consistency loss function and uses it to jointly train the amplitude spectrum predictor and the phase spectrum predictor in the vocoder. This reduces the inconsistency of the short-time spectrum formed by combining the predicted amplitude spectrum with the predicted phase spectrum, and thereby improves the quality of synthesized speech.

Embodiment Construction

[0052] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0053] The neural network vocoder training method based on short-time spectrum consistency provided by the present invention is applied to HiNet, a neural network vocoder that predicts the amplitude spectrum and the phase spectrum hierarchically. The method is used to alleviate the inconsistency of the short-time spectrum formed by combining the predicted amplitude spectrum and the predicted phase spectrum. The HiNet vocoder consists of an amplitude spectrum predictor and a phase spectrum predictor.

[0054] Since the amplitude spectrum and the phase spectrum of the HiNet vocoder are predicted separately, it is difficult for the short-time spectrum formed from the two to meet the consistency condition, that is, the composed short-time spectrum falls outside the ...
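The consistency condition can be illustrated with a small numerical sketch. It is not taken from the patent: it assumes the common formulation in which a short-time spectrum S is consistent when STFT(ISTFT(S)) = S, and it uses a naive DFT, a periodic Hann window, and toy frame sizes (`N_FFT`, `HOP`) chosen only for illustration. Pairing the amplitude of one signal with the phase of another stands in for separately predicted amplitude and phase spectra.

```python
import cmath
import math

N_FFT, HOP = 8, 2  # toy frame length and hop size (illustrative assumptions)

def hann(n):
    # Periodic Hann window.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(frame, sign=-1):
    # Naive O(n^2) DFT (sign=-1) or unnormalised inverse DFT (sign=+1).
    n = len(frame)
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t, v in enumerate(frame)) for k in range(n)]

def stft(x):
    # Windowed frames of x, each transformed with the DFT.
    w = hann(N_FFT)
    return [dft([x[s + i] * w[i] for i in range(N_FFT)])
            for s in range(0, len(x) - N_FFT + 1, HOP)]

def istft(spec):
    # Least-squares (weighted overlap-add) inverse STFT.
    w = hann(N_FFT)
    length = (len(spec) - 1) * HOP + N_FFT
    num, den = [0.0] * length, [0.0] * length
    for m, frame in enumerate(spec):
        y = [v.real / N_FFT for v in dft(frame, sign=+1)]
        for i in range(N_FFT):
            num[m * HOP + i] += y[i] * w[i]
            den[m * HOP + i] += w[i] ** 2
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]

def consistency_gap(spec):
    # Squared distance between S and STFT(ISTFT(S)); zero when S is consistent.
    back = stft(istft(spec))
    return sum(abs(a - b) ** 2
               for fa, fb in zip(spec, back) for a, b in zip(fa, fb))

x = [math.sin(0.9 * t) for t in range(22)]
y = [math.sin(0.35 * t) + 0.5 * math.cos(1.7 * t) for t in range(22)]

natural = stft(x)  # spectrum of a real waveform: consistent by construction
# Amplitude of x paired with the phase of y: generally NOT the STFT of any waveform.
mixed = [[cmath.rect(abs(a), cmath.phase(b)) for a, b in zip(fa, fb)]
         for fa, fb in zip(natural, stft(y))]

gap_natural = consistency_gap(natural)
gap_mixed = consistency_gap(mixed)
print(f"natural: {gap_natural:.2e}  mixed: {gap_mixed:.2e}")
```

The gap for the natural spectrum is zero up to rounding error, while the mixed spectrum leaves a clearly nonzero gap. A quantity of this kind is what a consistency loss would penalize, although the exact loss used in the patent is not reproduced here.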

Abstract

The invention discloses a neural network vocoder training method based on short-time spectrum consistency. The method comprises: training an amplitude spectrum predictor using natural acoustic features and a natural log amplitude spectrum; training a phase spectrum predictor using the natural log amplitude spectrum, a natural fundamental frequency and a natural waveform; and connecting the two trained predictors and training them jointly using the natural acoustic features, the natural log amplitude spectrum, the natural fundamental frequency and the natural waveform. In other words, the amplitude spectrum predictor and the phase spectrum predictor are first trained separately, and a short-time spectrum consistency loss function is then added to train them jointly. This greatly reduces the inconsistency of the short-time spectrum formed by the predicted amplitude spectrum and phase spectrum, and improves the quality of the synthesized speech.
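The three training steps in the abstract can be summarized as a small schedule. The sketch below only restates which predictor is trained on which natural data at each step. The class, the identifiers, and the loss labels are hypothetical names introduced for illustration, and keeping the original losses active during joint training is an assumption based on the statement that the consistency loss is "added".

```python
from dataclasses import dataclass

ASP = "amplitude_spectrum_predictor"
PSP = "phase_spectrum_predictor"

@dataclass(frozen=True)
class Stage:
    name: str
    trains: tuple   # predictors whose parameters are updated
    data: tuple     # natural (ground-truth) data used in this stage
    losses: tuple   # hypothetical loss labels, not the patent's terms

SCHEDULE = (
    Stage("1: amplitude predictor alone", (ASP,),
          ("acoustic_features", "log_amplitude_spectrum"),
          ("amplitude_loss",)),
    Stage("2: phase predictor alone", (PSP,),
          ("log_amplitude_spectrum", "f0", "waveform"),
          ("phase_loss",)),
    # Assumption: the original losses stay active and the consistency
    # loss is added on top of them during joint training.
    Stage("3: joint training", (ASP, PSP),
          ("acoustic_features", "log_amplitude_spectrum", "f0", "waveform"),
          ("amplitude_loss", "phase_loss", "consistency_loss")),
)

def stages_using(loss):
    # Names of the stages in which the given loss label is active.
    return [s.name for s in SCHEDULE if loss in s.losses]

for stage in SCHEDULE:
    print(stage.name, "->", ", ".join(stage.trains))
```

In this sketch, `stages_using("consistency_loss")` selects only the joint stage, which mirrors the abstract: the consistency loss is introduced after both predictors have been trained separately.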

Description

Technical field

[0001] The invention relates to the technical field of speech signal processing, in particular to a neural network vocoder training method and a speech synthesis method based on short-time spectrum consistency.

Background art

[0002] Speech synthesis, which aims to enable machines to speak fluently and naturally like humans, has benefited many voice-interactive applications, such as intelligent personal assistants and robots. Currently, statistical parametric speech synthesis (SPSS) is one of the mainstream methods.

[0003] Statistical parametric speech synthesis uses an acoustic model to model the relationship between text features and acoustic features, and uses a vocoder to generate speech waveforms from the predicted acoustic features. The performance of the vocoder can significantly affect the quality of synthesized speech. Traditional vocoders such as STRAIGHT and WORLD are widely used in current SPSS systems. However, these traditional vo...

Application Information

IPC(8): G10L19/16, G10L25/30, G10L13/02

CPC: G10L19/16, G10L25/30, G10L13/02

Inventors: 艾杨 (Ai Yang), 凌震华 (Ling Zhenhua)

Owner: UNIV OF SCI & TECH OF CHINA
