Method and device for optimizing end-to-end speech synthesis model, electronic device

A speech synthesis optimization method applied in speech synthesis, speech analysis, instruments, etc. It addresses the degraded speech quality and poor robustness of end-to-end models by increasing data disturbance, which improves robustness.

Active Publication Date: 2022-08-09

BEIJING SINOVOICE TECH CO LTD

Cites: 4; Cited by: 0


AI Technical Summary

Problems solved by technology

End-to-end synthesis nevertheless has certain problems, which stem from the pure black-box structure of the end-to-end model.

For example, in a model with an explicit duration module, the decoder is prone to overfitting to wrong information, which degrades the quality of the final synthesized speech.

The current end-to-end model therefore has poor robustness, and those skilled in the art urgently need a solution that improves it.

Embodiment Construction

[0046] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be more thoroughly understood, and will fully convey the scope of the present disclosure to those skilled in the art.

[0047] Referring to Figure 1, a flow chart of the steps of an end-to-end speech synthesis model optimization method according to an embodiment of the present invention is shown.

[0048] The end-to-end speech synthesis model optimization method according to the embodiment of the present invention may include the following steps:

[0049] Step 101: According to the first preset rule, perform a first soft occlusion on the phonemes i...


Abstract

The present invention provides an optimization method and device for an end-to-end speech synthesis model, an electronic device, and a storage medium. The method includes: according to a first preset rule, performing a first soft occlusion on the phonemes included in the text input to the end-to-end speech synthesis model to generate a second text; encoding the second text with a phoneme encoder, and performing prediction processing on the encoded second text with a variable information predictor to obtain a first output; according to a second preset rule, performing a second soft occlusion on the first output; and inputting the first output processed by the second soft occlusion into a preset decoder for decoding to obtain a mel spectrum. By adding soft occlusion to both the model input and the decoder input, the method increases data disturbance and improves the robustness of the end-to-end speech synthesis model.
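The order of the four steps in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the text available here does not define the two preset rules or what "soft occlusion" does, so the sketch assumes that the first occlusion randomly replaces phoneme ids with a mask id and that the second attenuates randomly chosen frames rather than deleting them. The names `MASK_ID`, `occlude_phonemes`, `occlude_frames`, `training_forward`, the probabilities, and the toy encoder, predictor, and decoder are all hypothetical.

```python
import random

MASK_ID = 0  # hypothetical id reserved for an occluded phoneme


def occlude_phonemes(phoneme_ids, mask_prob, rng):
    """First soft occlusion (assumed form): swap each phoneme id for
    MASK_ID with probability mask_prob; sequence length is unchanged."""
    return [MASK_ID if rng.random() < mask_prob else p for p in phoneme_ids]


def occlude_frames(frames, mask_prob, keep_scale, rng):
    """Second soft occlusion (assumed form): scale each frame by
    keep_scale with probability mask_prob rather than zeroing it."""
    out = []
    for frame in frames:
        scale = keep_scale if rng.random() < mask_prob else 1.0
        out.append([scale * x for x in frame])
    return out


def training_forward(phoneme_ids, encoder, predictor, decoder, rng,
                     p_text=0.1, p_frames=0.1, keep_scale=0.5):
    """Run the four steps in the order the abstract lists them."""
    second_text = occlude_phonemes(phoneme_ids, p_text, rng)
    first_output = predictor(encoder(second_text))
    occluded = occlude_frames(first_output, p_frames, keep_scale, rng)
    return decoder(occluded)  # stands in for the mel spectrum


# Toy stand-ins so the sketch runs without a real model.
def toy_encoder(ids):
    return [[float(i), 1.0] for i in ids]


def toy_predictor(encoded):
    return encoded


def toy_decoder(frames):
    return frames


if __name__ == "__main__":
    rng = random.Random(0)
    mel = training_forward([3, 7, 2, 9], toy_encoder, toy_predictor,
                           toy_decoder, rng)
    print(len(mel), "frames")
```

Both occlusions preserve sequence length, so the perturbation changes the content the encoder and decoder see without changing their input shapes. Presumably the occlusion would be applied only during training, although the visible text does not say so.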

Description

Technical field

[0001] The present invention relates to the technical field of speech synthesis, in particular to a method and device for optimizing an end-to-end speech synthesis model, and electronic equipment.

Background

[0002] Generally speaking, as shown in Figure 1, TTS (text-to-speech, speech synthesis) is divided into several parts: a text analysis module (e.g., text normalization, polyphone disambiguation, etc.), a prosody prediction module, a duration model, an acoustic model, and a vocoder. The processed text passes through the prosody prediction module, which outputs text with prosodic symbols, and then goes through grapheme-to-phoneme conversion and other stages. The current mainstream end-to-end model combines the duration model and the acoustic model into one model. The front end generates phoneme information from the text, the end-to-end model takes the phoneme information as input to generate a mel spectrum, and then an external vocoder converts ac...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G10L13/08, G10L13/047

CPC: G10L13/08, G10L13/047

Inventor: 李睿端, 李健, 陈明, 武卫东

Owner: BEIJING SINOVOICE TECH CO LTD
