
Speech model training method and speech synthesis method based on front-end design

A speech-model training method, applied in speech synthesis, speech analysis, and neural-network learning. It addresses the long iteration cycle of model optimization, the difficulty of flexibly adjusting an end-to-end network, and the difficulty of fixing synthesis problems once they are found. Its effects are reduced training difficulty, improved stability and controllability, and a lower probability of pronunciation errors and speech-rate errors.

Active Publication Date: 2021-08-13
成都启英泰伦科技有限公司 (Chengdu Chipintelli Technology Co., Ltd.)
Cites: 9; Cited by: 1

AI Technical Summary

Problems solved by technology

The model-optimization iteration cycle is long, and synthesis problems, once found, are not easy to fix.
In addition, speech rate, pronunciation, prosody, and so on may vary across application scenarios, and a highly integrated end-to-end network has difficulty adapting flexibly to these changes.

Embodiment Construction

[0039] Specific embodiments of the present invention will be further described in detail below.

[0040] The speech model training method based on front-end design of the present invention comprises sample collection followed by the subsequent steps below.

[0041] Sample collection consists of collecting high-quality audio data from a single speaker, together with the text corresponding to that audio, as the original training data, and extracting Mel features from the audio data. The subsequent steps are shown in Figure 1.
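As an illustration of the Mel-feature extraction in [0041], the following minimal NumPy sketch computes a log-Mel spectrogram from a mono waveform. The patent does not state the sample rate, FFT size, hop length, number of Mel bands, or Mel-scale variant; the values used here (16 kHz, 1024, 256, 80, HTK-style formula) and all function names are assumptions, and a production pipeline would more likely use an audio library.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale; the variant is an assumption, not taken from the patent.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters mapping the n_fft//2+1 linear bins onto n_mels Mel bands."""
    fmax = sr / 2 if fmax is None else fmax
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle that peaks at 1.0 on the band center and is 0 outside [left, right].
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(audio, sr=16000, n_fft=1024, hop=256, n_mels=80):
    """Return a (frames, n_mels) log-Mel matrix for a mono float waveform."""
    audio = np.asarray(audio, dtype=float)
    if len(audio) < n_fft:
        audio = np.pad(audio, (0, n_fft - len(audio)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (frames, n_mels)
    return np.log(np.maximum(mel, 1e-10))                 # floor avoids log(0)
```

For a one-second 1 kHz test tone at 16 kHz this yields 59 frames of 80 bands, with the strongest band centered near 1 kHz.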

[0042] S1: Predict the prosody of the text with a prosody prediction model and label it, producing prosody-labeled text.

[0043] The prosody prediction model mainly predicts the short pauses and long pauses in the text and marks them with corresponding labels.

[0044] Specifically, S1 includes:

[0045] S1.1: Train the prosody prediction model on a text prosody-labeling dataset, in which special symbols mark the text prosody, for example the symbols #1, #2, #3, and...
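The labeling in S1 can be sketched as appending a prosody symbol after each token once the prediction model has assigned it a break level. The function name, the token segmentation, and the level-to-symbol mapping below (0 = no break, then #1 for the shortest pause up to #3 for the longest) are hypothetical; the visible text lists #1, #2, #3 without defining them, and the prediction model itself is not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from a predicted break level to a prosody symbol.
# Treating higher numbers as longer pauses is an assumption for illustration.
BREAK_SYMBOLS: Dict[int, str] = {0: "", 1: "#1", 2: "#2", 3: "#3"}


def insert_prosody_labels(tokens: Sequence[str], break_levels: Sequence[int]) -> str:
    """Append each token's break symbol after the token and join the result."""
    if len(tokens) != len(break_levels):
        raise ValueError("one break level is required per token")
    parts: List[str] = []
    for token, level in zip(tokens, break_levels):
        parts.append(token + BREAK_SYMBOLS[level])
    return "".join(parts)


# "今天 / 天气 / 很好" ("today / the weather / is fine"), with break levels 1, 2, 3.
print(insert_prosody_labels(["今天", "天气", "很好"], [1, 2, 3]))  # 今天#1天气#2很好#3
```

Keeping the symbols inline in the text, rather than in a side channel, lets later steps read prosody from the same string as the content.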

Abstract

The invention discloses a speech model training method and a speech synthesis method based on front-end design. The speech model training method comprises sample collection followed by these steps: S1, generating prosody-annotated text; S2, obtaining a first encoding of the linguistic features of the text content; S3, obtaining the pronunciation duration of each phoneme; S4, training a pronunciation duration model for the phonemes; S5, outputting a front-end feature encoding vector of fixed dimension; and S6, performing iterative training to obtain an autoregressive model. The method effectively reduces the probability of pronunciation errors and speech-rate errors for individual words within a sentence. In addition, the pronunciation of special phonemes, phoneme durations, sentence prosody, and so on can be controlled by fine-tuning the front-end linguistic features and duration features.
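Steps S2, S3, and S5 can be illustrated with a toy front end: a one-hot linguistic code per phoneme (S2), per-phoneme durations read off a frame-level alignment (S3), and a frame-level feature matrix whose width stays fixed regardless of sentence length (S5). This is a sketch under stated assumptions, not the patented encoding: the phoneme and tone inventories, the one-hot scheme, the relative-position and duration features, and all function names are hypothetical. The duration model (S4) and the autoregressive model (S6) are not shown.

```python
from typing import List, Sequence, Tuple

import numpy as np

# Hypothetical toy inventories; a real Mandarin front end uses a full phone set.
PHONEMES = ["sil", "n", "i", "h", "ao"]
TONES = [0, 1, 2, 3, 4, 5]  # 0 = no tone (silence, initials); an assumption


def durations_from_alignment(frame_labels: Sequence[str]) -> List[Tuple[str, int]]:
    """S3 sketch: run-length encode a frame-level alignment into (phoneme, n_frames)."""
    runs: List[Tuple[str, int]] = []
    for label in frame_labels:
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + 1)
        else:
            runs.append((label, 1))
    return runs


def linguistic_code(phoneme: str, tone: int) -> np.ndarray:
    """S2 sketch: one-hot phoneme identity concatenated with a one-hot tone."""
    vec = np.zeros(len(PHONEMES) + len(TONES))
    vec[PHONEMES.index(phoneme)] = 1.0
    vec[len(PHONEMES) + TONES.index(tone)] = 1.0
    return vec


def frame_level_features(phonemes: Sequence[str], tones: Sequence[int],
                         durations: Sequence[int]) -> np.ndarray:
    """S5 sketch: repeat each phoneme's code for its duration and append the
    frame's relative position inside the phoneme plus the phoneme's duration.
    Output shape is (total_frames, len(PHONEMES) + len(TONES) + 2)."""
    if not (len(phonemes) == len(tones) == len(durations)):
        raise ValueError("phonemes, tones and durations must have equal length")
    rows = []
    for ph, tone, dur in zip(phonemes, tones, durations):
        code = linguistic_code(ph, tone)
        for k in range(dur):
            position = (k + 0.5) / dur  # center of frame k within the phoneme
            rows.append(np.concatenate([code, [position, float(dur)]]))
    return np.stack(rows)


runs = durations_from_alignment(["sil", "sil", "n", "i", "i", "i", "h", "ao", "ao"])
feats = frame_level_features([p for p, _ in runs], [0, 0, 3, 0, 3], [d for _, d in runs])
print(runs)         # [('sil', 2), ('n', 1), ('i', 3), ('h', 1), ('ao', 2)]
print(feats.shape)  # (9, 13)
```

Run-length encoding merges two identical phonemes that happen to be adjacent; an aligner that emits phoneme indices rather than labels avoids this. Editing a duration value before expansion is one way the fine-grained duration control described above could be exercised.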

Description

Technical field

[0001] The invention belongs to the technical field of artificial-intelligence speech synthesis, and in particular relates to a speech model training method and a speech synthesis method based on front-end design.

Background

[0002] Speech synthesis is a technology that converts text into corresponding audio; it is also known as text-to-speech (Text To Speech, TTS). With the development of artificial intelligence and growing social demand, speech synthesis with accurate, clear, natural, and pleasant pronunciation has attracted much attention. Traditional speech synthesis technologies include concatenative synthesis and parametric synthesis. Because of their poor naturalness and listening quality, these two methods are gradually being replaced by end-to-end speech synthesis solutions.

[0003] An end-to-end speech synthesis solution passes the text content directly through a more complex model to generate acoustic features, and then use th...

Application Information

Patent Type & Authority: Application (China)
IPC (IPC(8)): G10L13/027, G10L13/08, G10L13/10, G10L25/30, G06N3/08
CPC: G06N3/08, G10L13/027, G10L13/08, G10L13/10, G10L25/30
Inventors: 陈佩云 (Chen Peiyun), 曹艳艳 (Cao Yanyan), 高君效 (Gao Junxiao)
Owner: 成都启英泰伦科技有限公司 (Chengdu Chipintelli Technology Co., Ltd.)