Neural network vocoder training method based on short-time spectrum consistency

A neural network vocoder training technique based on short-time spectrum consistency, applied in the field of speech signal processing, that reduces the inconsistency of the predicted short-time spectrum and thereby improves the quality of synthesized speech.

Pending Publication Date: 2021-04-09
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0005] Building on the original training method of the neural network vocoder, the present invention designs a short-time spectrum consistency loss function and uses it to jointly train the amplitude spectrum predictor and the phase spectrum predictor in the neural network vocoder. This reduces the inconsistency of the short-time spectrum formed by combining the predicted amplitude spectrum with the predicted phase spectrum, and thereby improves the quality of the synthesized speech.



Examples


Embodiment Construction

[0052] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with specific embodiments and with reference to the accompanying drawings.

[0053] The neural network vocoder training method based on short-time spectrum consistency provided by the present invention is applied to HiNet, a neural network vocoder that predicts the amplitude spectrum and the phase spectrum hierarchically. The method is used to alleviate the inconsistency of the short-time spectrum obtained by combining the predicted amplitude spectrum and phase spectrum. The HiNet vocoder consists of an amplitude spectrum predictor and a phase spectrum predictor.

[0054] Since the amplitude spectrum and phase spectrum of the HiNet vocoder are predicted separately, it is difficult for the short-time spectrum composed of the two to meet the consistency condition, that is, the composed short-time spectrum falls outside the ...
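The consistency condition can be illustrated with a minimal NumPy sketch. This is not the loss function of the invention, whose definition is not reproduced in the text above. It assumes the commonly used notion that a short-time spectrum is consistent when an inverse STFT followed by an STFT returns it unchanged. The helper names `stft`, `istft`, and `consistency_gap`, the frame length, and the hop size are all illustrative assumptions.

```python
import numpy as np

N_FFT, HOP = 64, 16  # illustrative frame length and hop size (assumptions)


def stft(x, n_fft=N_FFT, hop=HOP):
    """Short-time Fourier transform with a periodic Hann window."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Least-squares inverse STFT via windowed overlap-add."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i]
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Guard against division by zero where the window sum vanishes.
    return out / np.maximum(norm, 1e-8)


def consistency_gap(log_amp, phase):
    """Mean squared distance between a spectrum and its STFT(ISTFT(.)) image."""
    spec = np.exp(log_amp) * np.exp(1j * phase)
    reprojected = stft(istft(spec))
    return float(np.mean(np.abs(reprojected - spec) ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal(N_FFT + HOP * 20)  # toy waveform, not speech
spec = stft(x)
log_amp = np.log(np.maximum(np.abs(spec), 1e-10))

# Amplitude and phase taken from the same waveform: consistent spectrum.
gap_consistent = consistency_gap(log_amp, np.angle(spec))

# Same amplitude paired with an unrelated random phase: inconsistent spectrum.
random_phase = rng.uniform(-np.pi, np.pi, size=spec.shape)
gap_inconsistent = consistency_gap(log_amp, random_phase)

print(gap_consistent, gap_inconsistent)
```

In this toy setting the gap is numerically zero when the amplitude and phase come from the same waveform, and clearly positive when the phase is unrelated to the amplitude, which mirrors the situation of two separately trained predictors.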



Abstract

The invention discloses a neural network vocoder training method based on short-time spectrum consistency. The method comprises: training an amplitude spectrum predictor using natural acoustic features and the natural log amplitude spectrum; training a phase spectrum predictor using the natural log amplitude spectrum, the natural fundamental frequency and the natural waveform; and connecting the trained amplitude spectrum predictor and phase spectrum predictor and training the connected predictors using the natural acoustic features, the natural log amplitude spectrum, the natural fundamental frequency and the natural waveform. In the training method provided by the invention, the amplitude spectrum predictor and the phase spectrum predictor are first trained separately, and a short-time spectrum consistency loss function is then added to train the two predictors jointly. As a result, the inconsistency of the short-time spectra formed from the predicted amplitude spectra and phase spectra is greatly alleviated, and the quality of the synthesized speech is improved.

Description

Technical field

[0001] The invention relates to the technical field of speech signal processing, in particular to a neural network vocoder training method and a speech synthesis method based on short-time spectrum consistency.

Background technique

[0002] Speech synthesis, which aims to enable machines to speak fluently and naturally like humans, has benefited many voice-interactive applications, such as intelligent personal assistants and robots. Currently, statistical parametric speech synthesis (SPSS) is one of the mainstream methods.

[0003] Statistical parametric speech synthesis utilizes an acoustic model to model the relationship between text features and acoustic features, and utilizes a vocoder to obtain speech waveforms given predicted acoustic features. The performance of the vocoder can significantly affect the quality of synthesized speech. Traditional vocoders such as STRAIGHT and WORLD are widely used in current SPSS systems. However, these traditional vo...


Application Information

IPC(8): G10L19/16, G10L25/30, G10L13/02
CPC: G10L19/16, G10L25/30, G10L13/02
Inventors: 艾杨 (Ai Yang), 凌震华 (Ling Zhenhua)
Owner UNIV OF SCI & TECH OF CHINA