Method and device for optimizing end-to-end speech synthesis model, and electronic equipment

A speech synthesis model optimization method, applied in speech synthesis, speech analysis, instruments, etc. It addresses the poor robustness of end-to-end models and the resulting loss of speech quality, and improves robustness by increasing data perturbation.

Active Publication Date: 2021-05-28
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

However, end-to-end synthesis also has certain problems, which stem from the pure black-box structure of the end-to-end model.
For example, in a model with an explicit duration module, the decoder is prone to overfitting to wrong information, which degrades the quality of the final synthesized speech.
The current end-to-end model therefore has poor robustness, and those skilled in the art urgently need a solution that improves it.


Detailed Description of Embodiments

[0046] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0047] Referring to Figure 1, a flow chart of the steps of an end-to-end speech synthesis model optimization method according to an embodiment of the present invention is shown.

[0048] The end-to-end speech synthesis model optimization method of the embodiment of the present invention may include the following steps:

[0049] Step 101: Perform first soft occlusion on the phonemes included in the text input into the end-to-end speech synth...


Abstract

The invention provides a method and a device for optimizing an end-to-end speech synthesis model, electronic equipment, and a storage medium. The method comprises the following steps: performing a first soft occlusion, according to a first preset rule, on the phonemes contained in a text input into the end-to-end speech synthesis model, thereby generating a second text; encoding the second text with a phoneme encoder and then performing prediction processing on the encoded second text with a variable information predictor to obtain a first output; performing a second soft occlusion on the first output according to a second preset rule; and inputting the occluded first output into a preset decoder, which decodes it into a mel spectrum. Because soft occlusion is added both to the input of the end-to-end speech synthesis model and to the input of the decoder, data perturbation is increased and the robustness of the end-to-end speech synthesis model can be improved.
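The data flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the source does not disclose the two preset rules here, so the sketch assumes that the first soft occlusion (soft shielding) replaces each phoneme with a mask symbol with some probability, and that the second attenuates a predictor output frame by a scale factor with some probability. The names `MASK`, `encode`, `predict_variation`, `decode`, and all rates are hypothetical stand-ins chosen for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol; not named in the source


def soft_occlude_phonemes(phonemes, rate, rng):
    """First soft occlusion (assumed rule): replace each phoneme by MASK with probability `rate`."""
    return [MASK if rng.random() < rate else p for p in phonemes]


def encode(phonemes, dim=4):
    """Toy stand-in for the phoneme encoder: one small vector per phoneme."""
    return [[(sum(map(ord, p)) % 7 + i) / 10.0 for i in range(dim)] for p in phonemes]


def predict_variation(encoded, frames_per_phoneme=2):
    """Toy stand-in for the variable information predictor: expand each phoneme into frames."""
    return [list(vec) for vec in encoded for _ in range(frames_per_phoneme)]


def soft_occlude_frames(frames, rate, scale, rng):
    """Second soft occlusion (assumed rule): attenuate a frame by `scale` with probability `rate`."""
    return [[x * scale for x in f] if rng.random() < rate else list(f) for f in frames]


def decode(frames):
    """Toy stand-in for the preset decoder: returns the frames as a mock mel spectrum."""
    return [list(f) for f in frames]


def training_forward(phonemes, rng, p_rate=0.2, f_rate=0.2, scale=0.5):
    """Run the four steps in order: occlude, encode and predict, occlude again, decode."""
    second_text = soft_occlude_phonemes(phonemes, p_rate, rng)
    first_output = predict_variation(encode(second_text))
    occluded = soft_occlude_frames(first_output, f_rate, scale, rng)
    return second_text, decode(occluded)


if __name__ == "__main__":
    rng = random.Random(0)
    text, mel = training_forward(["n", "i", "h", "ao"], rng)
    print(text, len(mel))
```

Setting both rates to zero leaves the input and the predictor output untouched, which is presumably how such a model would be run once the perturbation is no longer wanted, although the source text available here does not say so.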

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, and in particular to an end-to-end speech synthesis model optimization method and device, and electronic equipment.

Background

[0002] Generally, as shown in Figure 1, a TTS (text-to-speech, i.e. speech synthesis) system is divided into several parts: a text analysis module (for example, text normalization and polyphone disambiguation), a prosody prediction module, a duration model, an acoustic model, and a vocoder. The processed text passes through the prosody prediction module, which outputs text with prosody symbols, and then goes through grapheme-to-phoneme conversion and other steps. The current mainstream end-to-end model integrates the duration model and the acoustic model into one model. The front end generates phoneme information from the text, and the end-to-end model takes the phoneme information as input to generate a mel spectrum, which is then passed to a vocoder. Convert acoustic featur...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/08, G10L13/047
CPC: G10L13/08, G10L13/047
Inventor: 李睿端, 李健, 陈明, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD