Speech synthesis model training method, speech synthesis method and device thereof

A speech synthesis and model training technology, applied in speech synthesis, speech analysis, speech recognition and related fields, that addresses the difficulty of ensuring the accuracy of synthesized speech, blurred pronunciation, and background noise.

Pending Publication Date: 2021-09-14

TENCENT TECH (SHENZHEN) CO LTD



AI Technical Summary

Problems solved by technology

A very small error in the spectrum can lead to ambiguous pronunciation or background noise, and the MSE loss value only accounts for the spectral error, so it is difficult to guarantee the accuracy of the speech synthesized by the trained speech synthesis model.


Embodiment Construction

[0093] Embodiments of the present application provide a method for training a speech synthesis model, and a method and device for speech synthesis, which can comprehensively evaluate a speech synthesis model by combining the speech recognition error with the spectral error. This facilitates training a speech synthesis model with a better prediction effect and improves the accuracy of the synthesized speech.

[0094] The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of this application and the above-mentioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein can, for example, be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "corresponding to", and...


Abstract

The invention discloses a speech synthesis model training method based on artificial intelligence technology, and particularly relates to the technical field of speech processing. The method comprises the steps of: obtaining a to-be-trained sample pair; based on the to-be-trained text, obtaining a first Mel spectrum through a speech synthesis model; obtaining a first phoneme sequence through a speech recognition model based on the first Mel spectrum; and updating model parameters of the speech synthesis model according to a loss value between the first Mel spectrum and the real Mel spectrum and a loss value between the first phoneme sequence and the labeled phoneme sequence. The embodiment of the invention further provides a speech synthesis method and a device thereof. The speech synthesis model can be comprehensively evaluated by combining the speech recognition error with the spectrum error, so that a speech synthesis model with a better prediction effect can be obtained through training, and the speech synthesis accuracy is improved.
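The loss described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation. Several points are assumptions made only for illustration: the spectral loss is taken to be the MSE mentioned in the summary, the recognition loss is taken to be a cross-entropy over per-step phoneme probabilities output by the speech recognition model, the predicted and labeled phoneme sequences are assumed to have the same length, and the names `mse_loss`, `phoneme_cross_entropy`, `combined_loss` and `asr_weight` are hypothetical. The parameter update itself is omitted.

```python
import math


def mse_loss(pred_mel, real_mel):
    """Mean squared error over all bins of two equally shaped Mel spectra."""
    total, count = 0.0, 0
    for pred_frame, real_frame in zip(pred_mel, real_mel):
        for p, r in zip(pred_frame, real_frame):
            total += (p - r) ** 2
            count += 1
    return total / count


def phoneme_cross_entropy(phoneme_probs, labeled_phonemes):
    """Average negative log-probability of the labeled phoneme at each step.

    Assumption: the recognizer yields one probability vector per labeled step.
    """
    eps = 1e-12  # guards against log(0)
    nll = 0.0
    for probs, label in zip(phoneme_probs, labeled_phonemes):
        nll -= math.log(probs[label] + eps)
    return nll / len(labeled_phonemes)


def combined_loss(pred_mel, real_mel, phoneme_probs, labeled_phonemes,
                  asr_weight=1.0):
    """Spectral error plus weighted recognition error (weighting is assumed)."""
    spectral = mse_loss(pred_mel, real_mel)
    recognition = phoneme_cross_entropy(phoneme_probs, labeled_phonemes)
    return spectral + asr_weight * recognition


# Toy data: 2 frames x 2 Mel bins, and 2 phoneme steps over a 2-phoneme inventory.
pred_mel = [[0.0, 1.0], [2.0, 3.0]]
real_mel = [[0.0, 1.0], [2.0, 5.0]]
phoneme_probs = [[0.5, 0.5], [0.25, 0.75]]
labeled_phonemes = [0, 1]

loss = combined_loss(pred_mel, real_mel, phoneme_probs, labeled_phonemes)
print(f"combined loss: {loss:.4f}")
```

In this toy case the spectral term alone is 1.0, and the recognition term adds a penalty whenever the recognizer is unsure about the labeled phonemes, which is the kind of error a purely spectral loss does not measure.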

Description

Technical field

[0001] The present application relates to the technical field of speech processing, and in particular to a method for training a speech synthesis model, and a method and apparatus for speech synthesis.

Background

[0002] Speech is a very common means of everyday communication. With the development of artificial intelligence (AI) technology, text-to-speech (TTS) technology has attracted more and more attention. Using TTS technology, any text can be converted into corresponding speech, so that the synthesized speech is understandable, clear, natural and expressive.

[0003] At present, speech synthesis is usually implemented with a mainstream end-to-end speech synthesis model. First, the text to be synthesized is converted into a phoneme sequence; the phoneme sequence is then input into the speech synthesis model, which outputs the synthesized speech.

[0004] In the process of training the sp...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L13/04, G10L13/08, G10L15/02, G10L15/06, G10L25/24

CPC: G10L13/04, G10L25/24, G10L15/02, G10L15/063, G10L13/08, G10L2015/025

Inventor: 张泽旺

Owner: TENCENT TECH (SHENZHEN) CO LTD
