Speech synthesis method and system, electronic equipment and storage medium

A speech synthesis technology, applied in fields such as speech synthesis, speech analysis, and instruments. It addresses the degradation of audio quality and achieves reduced error propagation, preserved prosody diversity, and improved sound quality.

Pending Publication Date: 2022-07-22
AISPEECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] The invention addresses at least the following problem in the prior art: in a cascaded system, the prediction error of the acoustic model is passed downstream, so an inaccurate Mel-spectrum prediction degrades the quality of the generated audio.



Examples


Detailed Description of Embodiments

[0027] To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0028] FIG. 1 is a flowchart of a speech synthesis method provided by an embodiment of the present invention, which includes the following steps:

[0029] S11: Obtain a hidden layer representation for speech synthesis data, and input the hidden layer representation to a phoneme-level prosody controller to obtain discrete ph...


Abstract

An embodiment of the invention provides a speech synthesis method and system, an electronic device and a storage medium. The method comprises the following steps: obtaining a hidden layer representation for speech synthesis data, and inputting the hidden layer representation into a phoneme-level prosody controller to obtain a discrete phoneme-level prosody prediction; mixing the discrete phoneme-level prosody prediction with the hidden layer representation and inputting the result into an acoustic model, where a classifier in the acoustic model predicts the discrete acoustic features of each frame and a convolutional neural network in the acoustic model predicts frame-level prosody features; and inputting the discrete acoustic features and the frame-level prosody features into a vocoder to generate speech with diverse prosody. The embodiment replaces the traditional Mel spectrum with a discretized speech representation, which greatly reduces error propagation. The sound quality of the synthesized speech is greatly improved while prosody diversity is preserved. The prosody controller can generate different prosodies, so that diverse speech is produced.
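The data flow described in the abstract can be sketched as follows. This is a minimal illustration only: every function name, constant, and computation is a hypothetical stand-in chosen so the example runs without dependencies. The source describes trained neural modules and does not give their architectures, label-set sizes, or durations, so none of the values below should be read as the patented implementation.

```python
import math
from typing import List, Tuple

# Hypothetical sizes; the source does not specify any of them.
NUM_PROSODY_CLASSES = 4    # assumed size of the discrete prosody label set
NUM_ACOUSTIC_CLASSES = 8   # assumed size of the discrete acoustic feature set
FRAMES_PER_PHONEME = 3     # assumed fixed duration per phoneme
SAMPLES_PER_FRAME = 4      # assumed number of waveform samples per frame


def prosody_controller(hidden: List[List[float]]) -> List[int]:
    """Toy stand-in: map each phoneme's hidden vector to one discrete prosody label."""
    return [int(abs(sum(h)) * 10) % NUM_PROSODY_CLASSES for h in hidden]


def acoustic_model(hidden: List[List[float]],
                   prosody: List[int]) -> Tuple[List[int], List[float]]:
    """Toy stand-in: return per-frame discrete acoustic ids and frame-level prosody."""
    # Mix the prosody label into each hidden vector, then repeat it for every frame.
    frames: List[List[float]] = []
    for h, p in zip(hidden, prosody):
        frames.extend([h + [float(p)]] * FRAMES_PER_PHONEME)
    if not frames:
        return [], []

    # Classifier stand-in: score every class and keep the argmax for each frame.
    acoustic_ids = []
    for f in frames:
        scores = [math.cos(sum(f) + c) for c in range(NUM_ACOUSTIC_CLASSES)]
        acoustic_ids.append(max(range(NUM_ACOUSTIC_CLASSES), key=scores.__getitem__))

    # CNN stand-in: one 1-D convolution (kernel size 3, edge padding) over frames.
    energy = [sum(f) for f in frames]
    padded = [energy[0]] + energy + [energy[-1]]
    frame_prosody = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
                     for i in range(len(energy))]
    return acoustic_ids, frame_prosody


def vocoder(acoustic_ids: List[int], frame_prosody: List[float]) -> List[float]:
    """Toy stand-in: turn the two frame-level streams into waveform samples."""
    wave = []
    for idx, pros in zip(acoustic_ids, frame_prosody):
        for n in range(SAMPLES_PER_FRAME):
            phase = 2 * math.pi * (idx + 1) * n / SAMPLES_PER_FRAME
            wave.append(math.tanh(pros) * math.sin(phase))
    return wave


def synthesize(hidden: List[List[float]]) -> List[float]:
    """Chain the three stages in the order given in the abstract."""
    prosody = prosody_controller(hidden)
    acoustic_ids, frame_prosody = acoustic_model(hidden, prosody)
    return vocoder(acoustic_ids, frame_prosody)


if __name__ == "__main__":
    # Three phonemes with two-dimensional hidden vectors (made-up values).
    example_hidden = [[0.1, 0.2], [0.5, -0.3], [0.9, 0.4]]
    print(len(synthesize(example_hidden)), "samples generated")
```

The point of the sketch is the interface between stages: the acoustic model hands the vocoder a sequence of class indices plus a frame-level prosody stream, rather than a continuous Mel spectrum.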

Description

Technical field

[0001] The present invention relates to the field of intelligent speech, and in particular to a speech synthesis method, system, electronic device and storage medium.

Background

[0002] TTS (text-to-speech) synthesis is the process of converting text into corresponding speech. Compared with traditional statistical parametric speech synthesis, neural TTS models based on deep neural networks perform better. Mainstream neural TTS systems are usually cascaded systems that convert the input text to a Mel spectrum and then the Mel spectrum to audio. Tacotron 2, FastSpeech 2, Glow-TTS, etc. can be used for the conversion. Among them, Tacotron 2 is a sequence-to-sequence model based on an attention mechanism, FastSpeech 2 is a parallel generation model based on the Transformer network, and Glow-TTS uses an invertible network that transforms the distribution of the Mel spectrum into a simple distribution, through the maximum likelihood crite...


Application Information

IPC(8): G10L13/10, G10L25/30, G10L25/03
CPC: G10L13/10, G10L25/30, G10L25/03
Inventor: 俞凯, 杜晨鹏, 郭奕玮, 陈谐
Owner: AISPEECH CO LTD