Speech synthesis method based on a prosody model and parameter-based sound selection

Prosody-model and parameter-based sound selection, applied to speech synthesis, addresses problems such as reduced naturalness of speech, acoustic parameters being ignored during selection, and mismatched syllable durations, and thereby improves the naturalness of the synthesized speech.

Active Publication Date: 2006-06-14
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

For example, if two consecutive syllables in a sentence are selected from two different recorded sentences, the selection is made according to position only, and the actual acoustic parameters may not be considered.


Examples

Detailed Description of the Embodiments

[0022] Before any speech is synthesized, the following resource libraries are established:

[0023] Large-scale recorded speech library: the speech waveform data, the start position of each syllable in the waveform, and the acoustic parameter data of each syllable (pitch, duration, intensity).

[0024] Index library: for every syllable, the serial numbers of all of its samples in the large-scale recorded speech library are recorded. Looking up a serial number in the recorded speech library quickly retrieves the data for that sample of the syllable.

[0025] Prosody model library: the prosody model obtained through statistical training, which specifies what the pitch, duration, and intensity of each syllable in a sentence should be. The values of these acoustic parameters are closely related to factors such as the sentence pattern, the part-of-speech sequence, and the lengths of the sentence and its prosodic phrases.
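The relationship between the recorded speech library and the index library can be pictured with a minimal Python sketch. This is only an illustration under assumptions: the class names `SyllableSample` and `SpeechLibrary`, the field names, the units, and the pinyin-style syllable labels are all hypothetical and do not come from the patent text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SyllableSample:
    """One recorded instance of a syllable (hypothetical layout)."""
    serial: int       # serial number within the recorded speech library
    syllable: str     # syllable label, e.g. "ma3" (assumed labelling scheme)
    start: int        # start position in the waveform, in samples
    pitch: float      # assumed unit: Hz
    duration: float   # assumed unit: seconds
    intensity: float  # assumed unit: dB


class SpeechLibrary:
    """Recorded speech library plus a syllable -> serial numbers index."""

    def __init__(self):
        self.samples = []               # position in this list == serial number
        self.index = defaultdict(list)  # syllable label -> list of serial numbers

    def add(self, syllable, start, pitch, duration, intensity):
        serial = len(self.samples)
        self.samples.append(
            SyllableSample(serial, syllable, start, pitch, duration, intensity)
        )
        self.index[syllable].append(serial)
        return serial

    def candidates(self, syllable):
        """Return every recorded sample of a syllable via the index."""
        return [self.samples[s] for s in self.index.get(syllable, [])]


lib = SpeechLibrary()
lib.add("ma3", start=0, pitch=180.0, duration=0.21, intensity=62.0)
lib.add("ni3", start=3360, pitch=210.0, duration=0.18, intensity=60.0)
lib.add("ma3", start=6240, pitch=165.0, duration=0.25, intensity=58.0)
print([s.serial for s in lib.candidates("ma3")])
```

Keeping the index separate from the sample records means that a lookup by syllable touches only a short list of serial numbers, which matches the stated purpose of the index library: quick access to the data for a syllable.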

[0026] As shown in Figure 1, the process of speech synthesi...


Abstract

The invention provides a speech synthesis method based on a prosody model and parameter-based sound selection. First, acoustic prosody parameters are planned to obtain the target values of the acoustic parameters expected for each syllable. Maximum matching is then performed, and the matches with the smallest difference are selected as the samples actually used. After maximum matching, single-character matching is applied to the unmatched segments. A synthesis cost is calculated for every path through the candidate samples of all syllables; this cost is determined by the difference between the acoustic parameters of each candidate sample and their planned values, combined with the difference between the candidate samples of two adjacent syllables on the path. The path with the lowest synthesis cost is obtained with a dynamic programming algorithm. Once the samples for all syllables have been selected, their data are retrieved from the speech library and the waveforms are concatenated to give the final synthesis result.
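The lowest-cost path search described in the abstract can be sketched as a Viterbi-style dynamic program. The sketch below rests on several assumptions that the source does not state: each candidate is reduced to a `(pitch, duration, intensity)` tuple, both cost terms are weighted absolute differences, the weights are all 1.0, and every syllable has at least one candidate. The function names `target_cost`, `join_cost`, and `select_units` are hypothetical.

```python
def target_cost(sample, target, weights=(1.0, 1.0, 1.0)):
    """Difference between a candidate's parameters and the planned values.
    Assumed form: weighted sum of absolute differences."""
    return sum(w * abs(s - t) for w, s, t in zip(weights, sample, target))


def join_cost(prev, cur, weights=(1.0, 1.0, 1.0)):
    """Difference between candidates of two adjacent syllables.
    Assumed form: weighted sum of absolute parameter differences."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, prev, cur))


def select_units(candidates, targets):
    """Pick one candidate per syllable so the total cost is minimal.

    candidates[i] -- non-empty list of (pitch, duration, intensity) tuples
    targets[i]    -- planned (pitch, duration, intensity) for syllable i
    Returns (total_cost, list of chosen candidate indices).
    """
    if not candidates:
        return 0.0, []
    # best[k]: cheapest cost of any path ending at candidate k of the
    # current syllable.
    best = [target_cost(c, targets[0]) for c in candidates[0]]
    back = []  # back[i-1][k]: best predecessor index for candidate k of syllable i
    for i in range(1, len(candidates)):
        new_best, pointers = [], []
        for cur in candidates[i]:
            k, cost = min(
                ((k, best[k] + join_cost(prev, cur))
                 for k, prev in enumerate(candidates[i - 1])),
                key=lambda pair: pair[1],
            )
            new_best.append(cost + target_cost(cur, targets[i]))
            pointers.append(k)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last syllable to the first.
    last = min(range(len(best)), key=best.__getitem__)
    total = best[last]
    path = [last]
    for pointers in reversed(back):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return total, path


candidates = [
    [(100, 0.2, 60), (120, 0.2, 60)],  # two recorded samples of syllable 0
    [(118, 0.2, 60), (100, 0.2, 60)],  # two recorded samples of syllable 1
]
targets = [(120, 0.2, 60), (120, 0.2, 60)]
print(select_units(candidates, targets))  # (4.0, [1, 0])
```

In the example, the second sample of the first syllable and the first sample of the second syllable are chosen: together they are closest to the planned pitch and differ least from each other. The dynamic program keeps the search linear in the number of syllables instead of enumerating every path.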

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, and in particular to a speech synthesis method.

Background

[0002] At present, Chinese speech synthesis is developing toward waveform concatenation based on a large-scale library of real recordings. Such a library records a large amount of natural speech and basically covers the pronunciation variants that occur in most contexts. For each context, the system selects the best-matching original speech fragments for concatenation. Because the library is large, in almost all cases the most suitable original natural speech can be found without adjustment by other techniques, which keeps the final synthesized speech consistent with the original speech. In addition, the fragments selected here go beyond the level of syllables, and can be ...


Application Information

IPC(8): G10L13/00, G10L13/10
Inventors: 陈明, 吕士楠, 张连毅, 武卫东, 肖娜
Owner: BEIJING SINOVOICE TECH CO LTD