
Sequence-to-sequence speech synthesis method and system for double-layer autoregressive decoding

A speech synthesis technique based on autoregressive decoding, applied to speech synthesis, speech analysis, and neural learning methods. It addresses insufficient long-term correlation modeling, poor robustness of the attention mechanism, and synthesis that fails to stop.

Active Publication Date: 2020-11-03
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0005] Current neural network-based sequence-to-sequence speech synthesis methods are all built on a frame-level autoregressive decoding structure and lack the ability to model long-term correlations. In addition, the attention mechanism these models adopt is not robust enough: when synthesizing complex text, the synthesized speech can contain errors such as repetition, omission, and failure to stop.



Detailed Description of the Embodiments

[0053] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that persons of ordinary skill in the art obtain from these embodiments without creative effort fall within the scope of protection of the present invention.

[0054] According to an embodiment of the present invention, a sequence-to-sequence speech synthesis system with double-layer autoregressive decoding is proposed, comprising an encoder and a decoder. The encoder has the same structure as that of the Tacotron2 model, and the decoder includes three modules: phoneme-level representation, phoneme-level prediction, and frame-level prediction. Additionally, a total of four loss functions are propos...
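The interaction of the three decoder modules can be sketched as two nested autoregressive loops. This is a minimal illustration of the control flow only, not the patented implementation: the function `double_layer_decode`, its callable parameters, the toy stand-ins, the fixed frame counts per phoneme, and the shared vector dimension are all assumptions introduced here. In the described system the prediction modules are neural networks.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def double_layer_decode(
    context_units: Sequence[Vector],
    frames_per_phoneme: Sequence[int],
    predict_unit: Callable[[Vector, Vector], Vector],
    predict_frame: Callable[[Vector, Vector], Vector],
    summarize: Callable[[List[Vector]], Vector],
    dim: int,
) -> List[List[Vector]]:
    """Generate frames phoneme by phoneme; returns frames grouped by phoneme.

    Assumptions (hypothetical, for illustration): frame counts are given
    and positive, and units and frames share the same dimension `dim`.
    """
    prev_unit = [0.0] * dim   # acoustic unit representation of the previous phoneme
    prev_frame = [0.0] * dim  # last generated frame (all-zero start frame)
    output: List[List[Vector]] = []
    for context, n_frames in zip(context_units, frames_per_phoneme):
        # Outer layer: phoneme-level autoregression over acoustic units.
        predicted_unit = predict_unit(context, prev_unit)
        frames: List[Vector] = []
        for _ in range(n_frames):
            # Inner layer: frame-level autoregression within the phoneme.
            prev_frame = predict_frame(predicted_unit, prev_frame)
            frames.append(prev_frame)
        output.append(frames)
        # Phoneme-level representation of the frames just generated.
        prev_unit = summarize(frames)
    return output


# Toy stand-ins so the sketch runs end to end (not learned models).
def add_vectors(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def average_vectors(a: Vector, b: Vector) -> Vector:
    return [0.5 * (x + y) for x, y in zip(a, b)]


def mean_vector(frames: List[Vector]) -> Vector:
    return [sum(f[d] for f in frames) / len(frames) for d in range(len(frames[0]))]


generated = double_layer_decode(
    context_units=[[1.0], [2.0]],
    frames_per_phoneme=[2, 1],
    predict_unit=add_vectors,
    predict_frame=average_vectors,
    summarize=mean_vector,
    dim=1,
)
```

The point of the structure is that each phoneme's prediction depends on a summary of the previous phoneme's frames, so dependencies are carried at the phoneme level as well as frame to frame.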


Abstract

The invention provides a sequence-to-sequence speech synthesis method and system with double-layer autoregressive decoding. The system comprises an encoder and a decoder, and the decoder comprises a phoneme-level representation module, a phoneme-level prediction module, and a frame-level prediction module. The encoder represents phoneme names, tones, and prosodic phrase boundary information as vectors, then encodes and fuses this information using a convolutional neural network and a bidirectional long short-term memory (LSTM) network to obtain a context unit representation of each phoneme in a sentence. The phoneme-level representation module obtains an acoustic unit representation of each phoneme unit through a frame-level LSTM and pooling. The phoneme-level prediction module uses a phoneme-level autoregressive structure to predict the acoustic unit representation of the current phoneme and to establish dependencies between consecutive phonemes. The frame-level prediction module predicts frame-level acoustic features through a decoder LSTM.
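The pooling step of the phoneme-level representation module can be illustrated as follows. This is a hedged sketch under stated assumptions: the abstract does not say which pooling operation is used, so mean pooling is assumed here; the function name `pool_frames_by_phoneme` is hypothetical; and the input vectors stand in for the frame-level LSTM outputs, which are not computed in this example.

```python
from typing import List, Sequence

Vector = List[float]


def pool_frames_by_phoneme(
    frame_states: Sequence[Vector],
    frames_per_phoneme: Sequence[int],
) -> List[Vector]:
    """Collapse frame-level states into one vector per phoneme.

    Mean pooling is an assumption of this sketch; the source only says
    "pooling". `frame_states` stands in for frame-level LSTM outputs.
    """
    if sum(frames_per_phoneme) != len(frame_states):
        raise ValueError("frame counts must cover every frame exactly once")
    units: List[Vector] = []
    start = 0
    for n_frames in frames_per_phoneme:
        if n_frames <= 0:
            raise ValueError("each phoneme needs at least one frame")
        segment = frame_states[start:start + n_frames]
        dim = len(segment[0])
        # Average each dimension over the frames aligned to this phoneme.
        units.append([sum(v[d] for v in segment) / n_frames for d in range(dim)])
        start += n_frames
    return units


# Three frames: the first two belong to phoneme 1, the last to phoneme 2.
acoustic_units = pool_frames_by_phoneme(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [2, 1],
)
```

Whatever the pooling operation, the result is a fixed-size vector per phoneme regardless of how many frames the phoneme spans, which is what lets the phoneme-level prediction module work on a sequence of phoneme units rather than frames.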

Description

Technical field

[0001] The invention belongs to the field of speech signal processing, and in particular relates to a sequence-to-sequence speech synthesis method and system with double-layer autoregressive decoding.

Background art

[0002] Speech synthesis, which aims to enable machines to speak as fluently and naturally as humans, has benefited many voice-interactive applications, such as intelligent personal assistants and robots. Currently, statistical parametric speech synthesis (SPSS) is one of the mainstream methods.

[0003] Statistical parametric speech synthesis uses an acoustic model to model the relationship between text features and acoustic features, and uses a vocoder to generate speech waveforms from the predicted acoustic features. Although this approach can produce intelligible speech, the quality of the synthesized speech is limited by the acoustic model and the vocoder. Recently, Wang and Shen et al. proposed a neural network-based...


Application Information

IPC(8): G10L13/047, G10L13/04, G10L13/08, G10L25/30, G10L25/24, G06N3/04, G06N3/08
CPC: G10L13/047, G10L13/08, G10L25/30, G10L25/24, G06N3/08, G06N3/084, G06N3/045
Inventor: 周骁, 凌震华, 戴礼荣
Owner: UNIV OF SCI & TECH OF CHINA