Method and device for forecasting duration of speech synthesis unit

A technology of speech synthesis and duration, applied in the field of information processing, can solve problems such as single

Active Publication Date: 2013-03-20
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, the existing duration prediction method, which predicts duration with a decision tree-Gaussian mixture model, has a significant shortcoming. The prediction first coarsely classifies the value space of the context parameters and then describes each class with a single mean value, so over-averaging occurs in both steps.




Embodiment Construction

[0060] In order to make the above objectives, features and advantages of the present invention more obvious and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0061] Existing duration prediction models use a decision tree-Gaussian mixture model to predict duration. They cannot produce accurate duration predictions because the decision tree-Gaussian mixture model is built on a decision tree. Decision tree-based clustering is limited by the number of tree nodes, so only the most significant classification criteria can be selected, and the resulting classification is coarse. As a result, the decision tree-Gaussian mixture model describes the durations of an entire subclass with a single mean duration, thus obliterating the differences between the specific personaliti...
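The over-averaging described in [0061] can be illustrated with a small sketch. The class labels and durations below are hypothetical values chosen for illustration and do not come from the patent. Each leaf of the tree is collapsed to one mean, and the gap between that mean and the individual samples is what the single-mean description loses.

```python
from statistics import mean

# Hypothetical training samples: (coarse class chosen by the tree, duration in ms).
samples = [
    ("vowel_stressed", 110.0),
    ("vowel_stressed", 150.0),
    ("vowel_stressed", 190.0),
    ("vowel_unstressed", 60.0),
    ("vowel_unstressed", 80.0),
]


def leaf_means(data):
    """Collapse each leaf (class) to the single mean of its durations."""
    groups = {}
    for label, duration in data:
        groups.setdefault(label, []).append(duration)
    return {label: mean(values) for label, values in groups.items()}


def mean_abs_error(data, means):
    """Average gap between each sample and the mean of its leaf."""
    return mean(abs(duration - means[label]) for label, duration in data)


means = leaf_means(samples)
error = mean_abs_error(samples, means)
print(means, error)
```

Every unit that falls into the same leaf receives the same predicted duration, whatever finer context distinguishes it from its neighbours.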


Abstract

The invention provides a method and device for forecasting the duration of a speech synthesis unit. The method comprises the steps of: based on context environment parameters, carrying out an initial forecast of the duration of the speech synthesis unit with a stepwise linear regression duration forecasting model to obtain an initial duration forecasting result; and distributing the initial duration forecasting result with a decision tree-Gaussian mixture model to obtain a distributed duration forecasting result. The method and device provided by the invention can increase the accuracy of the duration forecasting result, so that speech synthesized by a speech synthesis system has a realistic sense of rhythm.
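The two steps in the abstract can be sketched as follows. The text shown here does not say how the decision tree-Gaussian mixture model distributes the initial result, so this sketch assumes a variance-weighted allocation across sub-unit states, a scheme familiar from HMM-based synthesis. The function names, regression weights, and state statistics are all hypothetical.

```python
def predict_initial(features, weights, bias):
    """Step 1: linear prediction of the unit's total duration (ms)."""
    return bias + sum(w * x for w, x in zip(weights, features))


def distribute(total, state_means, state_vars):
    """Step 2 (assumed scheme): split the total across states.

    Each state starts at its Gaussian mean and absorbs a share of the
    remaining gap in proportion to its variance, so the state durations
    sum to the total predicted in step 1.
    """
    rho = (total - sum(state_means)) / sum(state_vars)
    return [m + rho * v for m, v in zip(state_means, state_vars)]


# Hypothetical context features (stressed, phrase-final) and weights.
total = predict_initial([1.0, 0.0], weights=[30.0, 50.0], bias=80.0)

# Hypothetical per-state means and variances read from a tree leaf.
durations = distribute(total, [20.0, 40.0, 30.0], [4.0, 12.0, 4.0])
print(total, durations)
```

Under this assumption the regression fixes the overall duration, and the Gaussian statistics only decide how that duration is shared out, with high-variance states stretching or shrinking the most.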

Description

Technical field

[0001] The present invention relates to the technical field of information processing, in particular to a method and device for training a stepwise linear regression duration prediction model, and a method and device for predicting the duration of a speech synthesis unit.

Background technique

[0002] In a text-to-speech (TTS) speech synthesis system, predicting and generating the duration of a speech synthesis unit is an indispensable step, and it plays a vital role in the perceived prosody of the synthesized speech.

[0003] According to the theories of phonetics and phonology, the duration and other characteristics of a speech synthesis unit are determined by its context. The prediction of speech duration is essentially a mapping from the value space of the context environment parameters to the value space of duration. For the analysis and modeling of this kind of mapping relationship, the existing duration prediction method usually adopts the decisi...
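A minimal sketch of training a stepwise linear regression model on context parameters follows. It assumes that "stepwise" means forward selection of predictors by reduction in residual sum of squares, which the text shown here does not confirm. The feature columns (stressed, phrase-final, and an irrelevant flag) and the durations are hypothetical, and the solver assumes the chosen features are not collinear.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    rss = sum((t - sum(w * v for w, v in zip(coef, d))) ** 2
              for d, t in zip(design, y))
    return coef, rss


def forward_stepwise(rows, y, min_gain=1e-6):
    """Add one feature at a time while it still lowers the residual."""
    selected = []
    coef, rss = fit_ols(rows, y, selected)
    remaining = set(range(len(rows[0])))
    while remaining:
        trials = {c: fit_ols(rows, y, selected + [c]) for c in remaining}
        best = min(trials, key=lambda c: trials[c][1])
        if rss - trials[best][1] < min_gain:
            break  # no remaining feature improves the fit enough
        selected.append(best)
        coef, rss = trials[best]
        remaining.remove(best)
    return selected, coef


# Hypothetical data: duration = 80 + 30*stressed + 50*phrase_final (ms).
rows = [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 1, 1],
        [0, 0, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0]]
y = [80.0 + 30.0 * r[0] + 50.0 * r[1] for r in rows]
selected, coef = forward_stepwise(rows, y)
print(selected, coef)
```

On this toy data the procedure picks the phrase-final column first, then the stress column, and stops before adding the irrelevant third column.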


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/187, G10L13/04
Inventors: 王愈, 李健
Owner: BEIJING SINOVOICE TECH CO LTD