Method and device for optimizing end-to-end speech synthesis model, electronic device

A speech synthesis optimization method, applicable to speech synthesis, speech analysis, instruments, and related fields. It addresses the poor robustness of end-to-end models and the resulting loss of speech quality by increasing data perturbation, which improves robustness.

Active Publication Date: 2022-08-09
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

However, end-to-end synthesis also has problems, which stem from the purely black-box structure of the end-to-end model.
For example, in a model with an explicit duration module, the decoder is prone to overfitting to erroneous information, which degrades the quality of the final synthesized speech.
Current end-to-end models therefore have poor robustness, and those skilled in the art urgently need a solution that improves it.



Embodiment Construction

[0046] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood, and will fully convey the scope of the present disclosure to those skilled in the art.

[0047] Refer to Figure 1, which shows a flowchart of the steps of an end-to-end speech synthesis model optimization method according to an embodiment of the present invention.

[0048] The end-to-end speech synthesis model optimization method according to the embodiment of the present invention may include the following steps:

[0049] Step 101: According to the first preset rule, perform a first soft occlusion on the phonemes i...



Abstract

The present invention provides an optimization method and device for an end-to-end speech synthesis model, an electronic device, and a storage medium. The method includes: according to a first preset rule, performing a first soft occlusion on the phonemes included in the text input to the end-to-end speech synthesis model, to generate a second text; encoding the second text with a phoneme encoder, and applying a variable information predictor to the encoded second text to obtain a first output; according to a second preset rule, performing a second soft occlusion on the first output; and inputting the first output, after the second soft occlusion, into a preset decoder for decoding to obtain a Mel spectrum. By adding soft occlusion to both the input of the end-to-end speech synthesis model and the input of the decoder, the method increases data perturbation and improves the robustness of the end-to-end speech synthesis model.
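The data flow in the abstract can be illustrated with a minimal sketch. The source does not define "soft occlusion" or either preset rule, so everything below is an assumption for illustration: phoneme occlusion is modeled as randomly replacing ids with a hypothetical `MASK_ID`, decoder-input occlusion as randomly attenuating frames by a hypothetical `keep_scale`, and the encoder, variable information predictor, and decoder are toy stand-ins rather than the patent's modules.

```python
import random

MASK_ID = 0  # hypothetical id reserved for an occluded phoneme (assumption)


def soft_occlude_phonemes(phoneme_ids, prob, rng):
    """First soft occlusion (assumed rule): mask each phoneme id with probability `prob`."""
    return [MASK_ID if rng.random() < prob else p for p in phoneme_ids]


def soft_occlude_frames(frames, prob, keep_scale, rng):
    """Second soft occlusion (assumed rule): attenuate, rather than zero, random frames."""
    out = []
    for frame in frames:
        scale = keep_scale if rng.random() < prob else 1.0
        out.append([scale * v for v in frame])
    return out


def toy_encoder(phoneme_ids):
    # Stand-in for the phoneme encoder: map each id to a 2-dimensional vector.
    return [[float(p), 0.5 * float(p)] for p in phoneme_ids]


def toy_variance_predictor(encoded):
    # Stand-in for the variable information predictor: repeat each vector twice,
    # mimicking a fixed predicted duration of two frames per phoneme.
    return [list(vec) for vec in encoded for _ in range(2)]


def toy_decoder(frames):
    # Stand-in for the decoder: collapse each frame to one "Mel" value.
    return [sum(frame) for frame in frames]


def training_forward(phoneme_ids, rng, p_text=0.1, p_dec=0.1, keep_scale=0.5):
    second_text = soft_occlude_phonemes(phoneme_ids, p_text, rng)
    first_output = toy_variance_predictor(toy_encoder(second_text))
    occluded = soft_occlude_frames(first_output, p_dec, keep_scale, rng)
    return toy_decoder(occluded)


rng = random.Random(0)
mel = training_forward([3, 7, 2, 9], rng)
print(len(mel))  # one value per frame: 4 phonemes x 2 frames each
```

In this sketch both perturbations apply only during training; whether the patent also applies them at inference time is not stated in the text shown here.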

Description

Technical field

[0001] The present invention relates to the technical field of speech synthesis, and in particular to a method and device for optimizing an end-to-end speech synthesis model, and an electronic device.

Background

[0002] Generally, as shown in Figure 1, TTS (text-to-speech, i.e. speech synthesis) is divided into several parts: a text analysis module (e.g. text normalization, polyphone disambiguation), a prosody prediction module, a duration model, an acoustic model, and a vocoder. The processed text passes through the prosody prediction module, which outputs text annotated with prosodic symbols, and then goes through grapheme-to-phoneme conversion and other stages. The current mainstream end-to-end model merges the duration model and the acoustic model into a single model. The front end generates phoneme information from the text, the end-to-end model takes the phoneme information as input and generates a Mel spectrum, and an external vocoder then converts ac...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L13/08, G10L13/047
CPC: G10L13/08, G10L13/047
Inventor: 李睿端, 李健, 陈明, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD