Method and device for optimizing end-to-end speech synthesis model, and electronic equipment

A speech synthesis optimization technology, applied in speech synthesis, speech analysis, instruments and similar fields. It addresses the poor robustness of end-to-end models, which degrades speech quality, and improves robustness by increasing data disturbance.

Active Publication Date: 2021-05-28

BEIJING SINOVOICE TECH CO LTD

Cites: 4; Cited by: 3

AI Technical Summary

Problems solved by technology

End-to-end synthesis nevertheless has certain problems, which stem from the purely black-box structure of the end-to-end model.

For example, in a model with an explicit duration module, the decoder is prone to overfitting to erroneous information, which degrades the quality of the final synthesized speech.

The current end-to-end model therefore has poor robustness, and those skilled in the art urgently need a solution that improves its robustness.

Method used

Embodiment Construction

[0046] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope is fully conveyed to those skilled in the art.

[0047] Referring to Figure 1, a flow chart of the steps of an end-to-end speech synthesis model optimization method according to an embodiment of the present invention is shown.

[0048] The end-to-end speech synthesis model optimization method of the embodiment of the present invention may include the following steps:

[0049] Step 101: Perform a first soft occlusion on the phonemes contained in the text input to the end-to-end speech synth...
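
Step 101 can be illustrated with a minimal Python sketch. The visible text does not state the first preset rule, so the sketch assumes one: each phoneme is independently replaced by a placeholder mask symbol with probability `mask_prob`, yielding the second text without deleting phonemes or changing the sequence length. Reading "soft" as partial, probabilistic masking is itself an assumption. `MASK_TOKEN`, `first_soft_occlusion` and `mask_prob` are hypothetical names, not taken from the patent, and the sketch is not the inventors' actual rule.

```python
import random
from typing import List, Optional

# Hypothetical placeholder symbol; the patent's actual masking symbol (if any) is not shown.
MASK_TOKEN = "<mask>"


def first_soft_occlusion(
    phonemes: List[str],
    mask_prob: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Build a 'second text' from a phoneme sequence.

    Assumed first preset rule (hypothetical): each phoneme is independently
    replaced by MASK_TOKEN with probability `mask_prob`. The output has the
    same length as the input, and the input list is not modified.
    """
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    return [MASK_TOKEN if rng.random() < mask_prob else p for p in phonemes]


# Usage: a seeded generator makes the perturbation reproducible.
second_text = first_soft_occlusion(["n", "i", "h", "ao"], mask_prob=0.25, rng=random.Random(42))
print(second_text)
print(first_soft_occlusion(["n", "i"], mask_prob=1.0))  # → ['<mask>', '<mask>']
```

Keeping the sequence length fixed means any phoneme-level alignment (for example, per-phoneme durations) still lines up with the perturbed input; whether the patent preserves length in this way is not stated in the visible text.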

Abstract

The invention provides a method and a device for optimizing an end-to-end speech synthesis model, electronic equipment and a storage medium. The method comprises the following steps: performing a first soft shielding on the phonemes contained in a text input to the end-to-end speech synthesis model according to a first preset rule, to generate a second text; passing the second text in turn through a phoneme encoder, which encodes it, and a variable information predictor, which performs prediction processing on the encoded second text to obtain a first output; performing a second soft shielding on the first output according to a second preset rule; and inputting the soft-shielded first output into a preset decoder and decoding it to obtain a Mel spectrum. Because the optimization method adds soft shielding both to the input of the end-to-end speech synthesis model and to the input of the decoder, data disturbance is increased and the robustness of the end-to-end speech synthesis model can be improved.
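
The order of operations in the abstract can be sketched as a forward pass with injected components. The abstract does not give the second preset rule, so `second_soft_shielding` assumes one: each frame of the first output is selected with probability `mask_prob` and attenuated by `keep_weight` instead of being zeroed, so the decoder receives a weakened rather than missing input. All function names, parameters and the toy components in the usage lines are hypothetical illustrations under these assumptions, not the patent's implementation.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def second_soft_shielding(
    first_output: Sequence[Vector],
    mask_prob: float = 0.1,
    keep_weight: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Vector]:
    """Attenuate randomly chosen frames of the predictor output.

    Assumed second preset rule (hypothetical): each frame is selected with
    probability `mask_prob`; a selected frame is scaled by `keep_weight`.
    New lists are returned; the input frames are not modified.
    """
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    shielded: List[Vector] = []
    for frame in first_output:
        if rng.random() < mask_prob:
            shielded.append([keep_weight * x for x in frame])
        else:
            shielded.append(list(frame))
    return shielded


def forward_with_soft_shielding(
    phonemes: List[str],
    first_shield: Callable[[List[str]], List[str]],
    phoneme_encoder: Callable[[List[str]], List[Vector]],
    predictor: Callable[[List[Vector]], List[Vector]],
    second_shield: Callable[[List[Vector]], List[Vector]],
    decoder: Callable[[List[Vector]], List[Vector]],
) -> List[Vector]:
    """Step order from the abstract; every component is passed in."""
    second_text = first_shield(phonemes)      # first soft shielding -> second text
    encoded = phoneme_encoder(second_text)    # phoneme encoder
    first_output = predictor(encoded)         # variable information predictor
    shielded = second_shield(first_output)    # second soft shielding
    return decoder(shielded)                  # decoder -> Mel spectrum frames


# Usage with toy stand-ins (all hypothetical):
table = {"n": [1.0, 0.0], "i": [0.0, 1.0]}
mel = forward_with_soft_shielding(
    ["n", "i"],
    first_shield=lambda p: p,                               # identity, for illustration only
    phoneme_encoder=lambda p: [table[x] for x in p],         # lookup-table "encoder"
    predictor=lambda h: [v for v in h for _ in range(2)],    # repeat each encoding twice
    second_shield=lambda f: second_soft_shielding(f, mask_prob=1.0),
    decoder=lambda f: f,                                     # identity "decoder"
)
print(mel)  # → [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]
```

Injecting the components as callables keeps the sketch independent of any particular network. In a real model they would be trained modules, and the shielding steps would presumably be active only during training, which the abstract does not state explicitly.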

Description

Technical Field

[0001] The invention relates to the technical field of speech synthesis, and in particular to an end-to-end speech synthesis model optimization method and device, and electronic equipment.

Background

[0002] Generally speaking, as shown in Figure 1, TTS (text-to-speech, i.e., speech synthesis) is divided into several parts: a text analysis module (for example, text normalization and polyphone disambiguation), a prosody prediction module, a duration model, an acoustic model and a vocoder. The processed text passes through the prosody prediction module, which outputs text annotated with prosody symbols, and then goes through word-to-sound conversion and other steps. The current mainstream end-to-end model integrates the duration model and the acoustic model into a single model: the front end converts the text into phoneme information, the end-to-end model takes this phoneme information as input to generate a mel spectrum, and a vocoder is then attached. Convert acoustic featur...

Claims

Application Information

Patent Type & Authority: Applications (China)

IPC (8): G10L13/08; G10L13/047

CPC: G10L13/08; G10L13/047

Inventors: 李睿端 (Li Ruiduan), 李健 (Li Jian), 陈明 (Chen Ming), 武卫东 (Wu Weidong)

Owner: BEIJING SINOVOICE TECH CO LTD