
Speech synthesis model training method, speech synthesis method and device thereof

A speech synthesis technology and model training method, applied in speech synthesis, speech analysis, speech recognition, and related fields. It addresses the difficulty of ensuring the accuracy of synthesized speech, including blurred pronunciation and background noise.

Status: Pending. Publication Date: 2021-09-14
TENCENT TECH (SHENZHEN) CO LTD
Cites: 0; Cited by: 7

AI Technical Summary

Problems solved by technology

However, because even a very small error in the spectrum may cause ambiguous pronunciation or background noise, and the MSE loss value takes only the spectral error into account, it is difficult to guarantee the accuracy of speech synthesized by a model trained in this way.
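To make this limitation concrete, the following minimal Python sketch computes an MSE loss over Mel-spectrum frames, assuming (for illustration only) that a Mel spectrum is a plain list of frames, each a list of floats. The loss depends only on per-bin numerical differences, so it has no notion of whether a perturbed bin changes which phoneme a listener hears.

```python
from typing import List

Spectrogram = List[List[float]]  # assumed layout: frames x mel bins


def mel_mse(predicted: Spectrogram, target: Spectrogram) -> float:
    """Mean squared error averaged over every (frame, mel-bin) entry."""
    if len(predicted) != len(target):
        raise ValueError("spectrograms must have the same number of frames")
    total = 0.0
    count = 0
    for pred_frame, true_frame in zip(predicted, target):
        if len(pred_frame) != len(true_frame):
            raise ValueError("frames must have the same number of mel bins")
        for p, t in zip(pred_frame, true_frame):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty spectrogram")
    return total / count


# A small perturbation in one bin yields a tiny loss value, even though
# such an error could still be heard as blurred pronunciation or noise.
real = [[1.0, 2.0], [3.0, 4.0]]
predicted = [[1.0, 2.0], [3.0, 4.1]]
print(mel_mse(predicted, real))
```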

Embodiment Construction

[0093] Embodiments of the present application provide a method for training a speech synthesis model, and a speech synthesis method and apparatus. They evaluate a speech synthesis model comprehensively by combining the speech recognition error with the spectral error, which helps training produce a speech synthesis model with better prediction performance and improves the accuracy of the synthesized speech.
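One way to read "combining the speech recognition error with the spectral error" is as a weighted sum of two loss terms. In the sketch below, the weight λ and the exact form of each term are assumptions for illustration; the text above does not state how the two loss values are combined. Here M̂ is the first Mel spectrum predicted by the speech synthesis model, M is the real Mel spectrum, P̂ is the first phoneme sequence output by the speech recognition model, and P is the labeled phoneme sequence.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{mel}}\big(\hat{M}, M\big)
  + \lambda \, \mathcal{L}_{\text{phone}}\big(\hat{P}, P\big),
  \qquad \lambda > 0
```

With λ = 0 this reduces to training on the spectral loss alone, which is the setting criticized above.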

[0094] The terms "first", "second", "third", "fourth", and so on (if any) in the description, the claims, and the above-mentioned drawings of this application are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that the data so used are interchangeable where appropriate, so that the embodiments of the application described herein can, for example, be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "corresponding to", and...

Abstract

The invention discloses a method, based on artificial intelligence technology, for training a speech synthesis model, and relates in particular to the technical field of speech processing. The method comprises: obtaining a sample pair to be trained; obtaining, based on the text to be trained, a first Mel spectrum through the speech synthesis model; obtaining, based on the first Mel spectrum, a first phoneme sequence through a speech recognition model; and updating the model parameters of the speech synthesis model according to a loss value between the first Mel spectrum and the real Mel spectrum and a loss value between the first phoneme sequence and the labeled phoneme sequence. Embodiments of the invention further provide a speech synthesis method and apparatus. Because the speech synthesis model is evaluated comprehensively by combining the speech recognition error with the spectral error, training can yield a speech synthesis model with better prediction performance, improving the accuracy of the synthesized speech.
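The training steps above can be sketched as a single loss computation. This is a minimal illustration, not the patent's implementation: the model callables (`tts_model`, `asr_model`), the toy stand-ins, the weight `phone_weight`, and the choice of a per-position negative log-likelihood for the phoneme loss (which assumes the recognizer output is already aligned one-to-one with the labels) are all hypothetical. It also assumes the predicted and real Mel spectra have the same number of frames.

```python
import math
from typing import Callable, Dict, List

Spectrogram = List[List[float]]      # assumed layout: frames x mel bins
Posteriors = List[Dict[str, float]]  # one phoneme distribution per position


def mel_mse(predicted: Spectrogram, target: Spectrogram) -> float:
    """Spectral loss: mean squared error over every (frame, mel-bin) entry."""
    if len(predicted) != len(target):
        raise ValueError("Mel spectra must have the same number of frames")
    squared: List[float] = []
    for pred_frame, true_frame in zip(predicted, target):
        if len(pred_frame) != len(true_frame):
            raise ValueError("frames must have the same number of mel bins")
        squared.extend((p - t) ** 2 for p, t in zip(pred_frame, true_frame))
    if not squared:
        raise ValueError("empty Mel spectrum")
    return sum(squared) / len(squared)


def phoneme_nll(posteriors: Posteriors, labels: List[str]) -> float:
    """Recognition loss: mean negative log-probability of each labeled phoneme.

    Hypothetical simplification: assumes one posterior per labeled phoneme.
    """
    if len(posteriors) != len(labels) or not labels:
        raise ValueError("need one posterior distribution per labeled phoneme")
    eps = 1e-12  # avoid log(0) when the recognizer gives a label no mass
    total = 0.0
    for dist, phoneme in zip(posteriors, labels):
        total -= math.log(max(dist.get(phoneme, 0.0), eps))
    return total / len(labels)


def combined_loss(
    text: str,
    real_mel: Spectrogram,
    labeled_phonemes: List[str],
    tts_model: Callable[[str], Spectrogram],
    asr_model: Callable[[Spectrogram], Posteriors],
    phone_weight: float = 1.0,  # hypothetical weighting, not from the source
) -> float:
    first_mel = tts_model(text)        # the "first Mel spectrum"
    posteriors = asr_model(first_mel)  # from which the first phoneme sequence is decoded
    spectral = mel_mse(first_mel, real_mel)
    recognition = phoneme_nll(posteriors, labeled_phonemes)
    return spectral + phone_weight * recognition


# Toy stand-ins for the two models (hypothetical, for illustration only).
def toy_tts(text: str) -> Spectrogram:
    return [[0.5, 0.5], [0.2, 0.8]]


def toy_asr(mel: Spectrogram) -> Posteriors:
    return [{"n": 0.9, "m": 0.1}, {"i": 0.8, "e": 0.2}]


loss = combined_loss("ni", [[0.5, 0.5], [0.2, 0.8]], ["n", "i"], toy_tts, toy_asr)
print(loss)
```

In this toy case the spectral term is zero, so the loss comes entirely from the recognition term. In practice the update step would backpropagate the combined loss with an autodiff framework; the abstract states that the parameters of the speech synthesis model are the ones updated.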

Description

Technical Field

[0001] The present application relates to the technical field of speech processing, and in particular to a method for training a speech synthesis model, and a speech synthesis method and apparatus.

Background Art

[0002] Speech is a very common means of everyday human communication. With the development of artificial intelligence (AI) technology, text-to-speech (TTS) technology has attracted increasing attention. TTS technology can convert any text into corresponding speech, with the aim of making the synthesized speech intelligible, clear, natural, and expressive.

[0003] At present, speech synthesis is usually implemented with a mainstream end-to-end speech synthesis model. The text to be synthesized is first converted into a phoneme sequence, the phoneme sequence is then input into the speech synthesis model, and the model outputs the synthesized speech.

[0004] In the process of training the sp...
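The two-stage pipeline described in [0003] (text to phoneme sequence, then phoneme sequence into the speech synthesis model) can be sketched as follows. The lexicon, its ARPAbet-style symbols, and the `model` callable are hypothetical placeholders; a real front end would also need to handle out-of-vocabulary words, numbers, and punctuation more carefully.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

# Hypothetical toy pronunciation lexicon, for illustration only.
TOY_LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_phonemes(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Convert text to a flat phoneme sequence by dictionary lookup."""
    phonemes: List[str] = []
    for token in text.lower().split():
        word = token.strip(".,!?;:")  # drop simple surrounding punctuation
        if not word:
            continue
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(lexicon[word])
    return phonemes


def synthesize(
    text: str,
    lexicon: Dict[str, List[str]],
    model: Callable[[List[str]], T],  # hypothetical speech synthesis model
) -> T:
    """Front end (text -> phonemes) followed by the speech synthesis model."""
    return model(text_to_phonemes(text, lexicon))


print(text_to_phonemes("Hello, world!", TOY_LEXICON))
```

Keeping the front end separate from the model mirrors the description: the model only ever sees phoneme sequences, never raw text.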

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/04; G10L13/08; G10L15/02; G10L15/06; G10L25/24
CPC: G10L13/04; G10L25/24; G10L15/02; G10L15/063; G10L13/08; G10L2015/025
Inventor: 张泽旺 (Zhang Zewang)
Owner: TENCENT TECH (SHENZHEN) CO LTD