Speech model training method and speech synthesis method based on front-end design
A speech model training method and technique, applied to speech synthesis, speech analysis, neural learning methods, and related fields. It addresses the long optimization and iteration cycle of models, the difficulty of flexibly adjusting end-to-end networks, and the difficulty of fixing synthesis problems. Its effects are to reduce training difficulty, improve stability and controllability, and lower the probability of pronunciation errors and speech-rate errors.
Embodiment Construction
[0039] Specific embodiments of the present invention are described in further detail below.
[0040] The speech model training method based on front-end design of the present invention comprises sample collection followed by the subsequent steps:
[0041] Sample collection gathers high-quality audio data from a single speaker, together with the text corresponding to that audio, as the original training data, and extracts the Mel features of the audio data. The subsequent steps are shown in Figure 1.
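As an illustration of the Mel feature extraction mentioned in [0041], the following is a minimal sketch of a log-Mel spectrogram computed with NumPy. The source does not state the sample rate, frame size, hop size, or number of Mel bands, so the defaults here (22050 Hz, 1024, 256, 80) and all function names are assumptions chosen for illustration, not the parameters of the patented method.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel formula (an assumption; the source names no formula).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centers spaced evenly on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(wave, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return an array of shape (frames, n_mels) of log-Mel energies."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Floor the energies so that empty bands do not produce log(0).
    return np.log(np.maximum(mel, 1e-10))
```

In practice a library routine would usually replace this hand-written version; the sketch only shows what "Mel features" refers to: short-time power spectra pooled through a bank of filters spaced on the mel scale.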
[0042] S1: Use the prosody prediction model to predict and label the prosody of the text, generating prosody-labeled text.
[0043] The prosody prediction model mainly predicts the short pauses and long pauses in the text and labels them accordingly.
[0044] Specifically, this includes:
[0045] S1.1 Train the prosody prediction model on a text prosody labeling dataset, using special symbols to mark the text prosody. For example, symbols #1, #2, #3, and...
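The labeling format described in S1.1 can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: it assumes a prosody model has already produced one integer boundary level per token (0 meaning no boundary), and it simply writes each non-zero level as a `#<level>` marker after the token. The function names, the token-level granularity, and the meaning of each level are assumptions; the source only states that symbols such as #1, #2, #3 are used.

```python
import re

def insert_prosody_labels(tokens, levels):
    """Attach '#<level>' to every token whose predicted boundary level is > 0.

    tokens: list of words (hypothetical tokenization).
    levels: list of ints of the same length, one predicted level per token.
    """
    if len(tokens) != len(levels):
        raise ValueError("tokens and levels must have the same length")
    labeled = []
    for token, level in zip(tokens, levels):
        labeled.append(f"{token}#{level}" if level > 0 else token)
    return " ".join(labeled)

def strip_prosody_labels(labeled_text):
    """Recover the plain text by removing every '#<digits>' marker."""
    return re.sub(r"#\d+", "", labeled_text)

tokens = ["today", "the", "weather", "is", "nice"]
levels = [2, 0, 1, 0, 3]  # made-up predictions, for illustration only
labeled = insert_prosody_labels(tokens, levels)
print(labeled)                        # today#2 the weather#1 is nice#3
print(strip_prosody_labels(labeled))  # today the weather is nice
```

Keeping the markers inline in the text lets the same string serve both as a training target for the prosody model and as input to later stages.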

