Method for synthesizing pronunciation based on rhythm model and parameter selecting voice

A prosody-model and parameter-based selection technique, applied in the field of speech synthesis. It addresses sample selection that ignores acoustic parameters, which causes mismatched syllable durations and reduces the naturalness of the speech, with the aim of improving naturalness.

Active Publication Date: 2006-06-14

BEIJING SINOVOICE TECH CO LTD


Problems solved by technology

For example, if two consecutive syllables of a sentence are taken from two different recorded sentences, selecting samples by position alone may ignore their actual acoustic parameters, so the two syllables may not follow the way natural speech changes from one syllable to the next.

The listener then hears a pitch jump or a mismatch in duration, which reduces the naturalness of the speech.

Embodiment Construction

[0022] Before any speech is synthesized, the following resource libraries are built:

[0023] Large-scale recorded speech library: the speech waveform data, the start position of each syllable in the waveform, and each syllable's acoustic parameters (pitch, duration, and intensity).

[0024] Index library: for every syllable, the serial numbers of all its samples in the large-scale recorded speech library, so that the data for a syllable can be retrieved quickly by looking up these serial numbers in the library.

[0025] Prosody model library: the prosody model obtained through statistical training, which describes what the pitch, duration, and intensity of each syllable in a sentence should be. The values of these acoustic parameters depend closely on factors such as sentence pattern, part-of-speech sequence, and the lengths of the sentence and its prosodic phrases.
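The relationship between the recorded speech library and the index library can be sketched in Python as follows. This is only an illustration under assumptions: the class `SyllableSample`, its field names, the pinyin-style syllable labels, and the function `build_index` are hypothetical and do not come from the patent text, and the numeric values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SyllableSample:
    """One recorded syllable (hypothetical layout, not from the patent)."""
    syllable: str      # label of the syllable, e.g. pinyin with tone
    start: int         # start position of the syllable in the waveform
    pitch: float       # acoustic parameters stored with the sample
    duration: float
    intensity: float


def build_index(library):
    """Map each syllable to the serial numbers of its samples."""
    index = defaultdict(list)
    for serial, sample in enumerate(library):
        index[sample.syllable].append(serial)
    return dict(index)


# Made-up entries standing in for the large-scale recorded speech library.
library = [
    SyllableSample("ni3", 0, 180.0, 0.21, 62.0),
    SyllableSample("hao3", 3360, 170.0, 0.25, 60.0),
    SyllableSample("ni3", 9120, 205.0, 0.18, 65.0),
]
index = build_index(library)

# All candidate samples for one syllable, fetched through the index.
candidates = [library[serial] for serial in index["ni3"]]
```

Keeping only serial numbers in the index means the acoustic data is stored once, in the speech library, and each lookup is a dictionary access followed by list indexing.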

[0026] As shown in Figure 1, the process of speech synthesi...

Abstract

The invention provides a speech synthesis method based on a prosody model and parameter-based sample selection. Acoustic prosody parameters are first planned to obtain the target values of the acoustic parameters expected for each syllable. Maximum matching is then performed, and the samples with the smallest difference are selected as the samples actually used. Segments left unmatched after maximum matching are handled by single-character matching. A synthesis cost is calculated for every path through the candidate samples of all syllables; the cost is determined by the difference between each candidate sample's acoustic parameters and their planned values, together with the combined difference between the candidate samples of each pair of adjacent syllables on the path. A dynamic programming algorithm finds the path with the lowest synthesis cost. Once the samples for all syllables have been selected, their data are retrieved from the speech library and the waveforms are spliced to give the final synthesis result.
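The lowest-cost path search can be illustrated with a short dynamic programming sketch in Python. It is not the patented implementation: the abstract does not give the cost formulas, so the weighted absolute difference in `distance`, the use of that same distance as the cost between adjacent candidates, and the names `Sample`, `select_path`, and `join_weight` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    pitch: float
    duration: float
    intensity: float


def distance(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted absolute difference of acoustic parameters (assumed form)."""
    return (weights[0] * abs(a.pitch - b.pitch)
            + weights[1] * abs(a.duration - b.duration)
            + weights[2] * abs(a.intensity - b.intensity))


def select_path(targets, candidates, join_weight=1.0):
    """Return (path, cost): one candidate index per syllable, lowest total cost.

    targets[i] holds the planned parameter values for syllable i;
    candidates[i] is a non-empty list of recorded samples for syllable i.
    """
    # best[i][j]: lowest cost of any path that ends at candidate j of syllable i
    best = [[distance(c, targets[0]) for c in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, pointers = [], []
        for cand in candidates[i]:
            # Cheapest predecessor, counting the cost of joining it to cand.
            k, cost = min(
                ((k, best[i - 1][k] + join_weight * distance(prev, cand))
                 for k, prev in enumerate(candidates[i - 1])),
                key=lambda pair: pair[1],
            )
            row.append(cost + distance(cand, targets[i]))
            pointers.append(k)
        best.append(row)
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


# Made-up values: two syllables with two candidate samples each.
targets = [Sample(200.0, 0.2, 60.0), Sample(190.0, 0.2, 60.0)]
candidates = [
    [Sample(200.0, 0.2, 60.0), Sample(230.0, 0.2, 60.0)],
    [Sample(150.0, 0.2, 60.0), Sample(195.0, 0.2, 60.0)],
]
path, total = select_path(targets, candidates)
```

In this toy input the search picks candidate 0 for the first syllable and candidate 1 for the second. Because the cost of a path splits into per-syllable terms and adjacent-pair terms, the search only needs the best cost for each candidate of the previous syllable, so the work grows linearly with sentence length rather than with the number of possible paths.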

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, and in particular to a speech synthesis method.

Background

[0002] Chinese speech synthesis is currently moving toward waveform concatenation based on a large-scale library of real recordings. Such a library holds a large amount of natural speech and covers the pronunciation variants found in most contexts. For each context, the system selects the best-matching original speech segments and splices them together. Because the library is so large, a suitable segment of original natural speech can be found in almost all cases without applying other techniques to adjust it, which keeps the final synthesized speech consistent with the original recordings. In addition, the fragments selected here go beyond the level of syllables, and can be ...


Application Information


IPC(8): G10L13/00, G10L13/10

Inventor: 陈明, 吕士楠, 张连毅, 武卫东, 肖娜

Owner: BEIJING SINOVOICE TECH CO LTD
