Speech synthesis method based on restricted Boltzmann machine

Applied in the field of signal processing, this restricted Boltzmann machine and speech synthesis technology addresses the loss of spectral detail information, poor naturalness, and unsatisfactory sound quality of synthesized speech. Its effect is to improve modeling accuracy and, in turn, the sound quality and naturalness of the synthesized speech.

Active Publication Date: 2015-06-17

UNIV OF SCI & TECH OF CHINA

Cites: 5 · Cited by: 0

Problems solved by technology

However, the sound quality of the synthesized speech is often not ideal, which results in poor overall naturalness.

[0003] Shortcomings in the spectrum modeling of the traditional HMM-based parametric speech synthesis method described above are an important reason for the unsatisfactory sound quality of synthesized speech.

Specifically, there are two causes. First, the spectral features used in traditional spectral modeling are usually high-level features, such as mel-cepstra and line spectral pairs. These features are a model or approximate representation of the original speech spectrum, so spectral detail is lost during feature extraction. Second, traditional spectral modeling usually uses a single Gaussian distribution to describe the output probability of the spectral features for each HMM state, and in the synthesis stage the spectral features are predicted under the maximum output probability criterion. Because a single Gaussian attains its largest output probability at its mean, the generated parameters lie very close to the model means. The means, in turn, are estimated under the maximum likelihood criterion in the training phase by averaging the training samples. The predicted spectral features are therefore over-smoothed, which degrades the sound quality of the final synthesized speech.
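The over-smoothing effect described above can be illustrated with a small numerical sketch. This is not the patent's procedure: the bin count, frame count, and the synthetic single-peak "envelopes" are all assumptions chosen for illustration. The sketch only shows that the maximum-likelihood mean of a single Gaussian, which is the sample average, flattens a sharp peak whose position varies between frames.

```python
import numpy as np

rng = np.random.default_rng(0)

n_bins = 64      # hypothetical number of spectral-envelope bins
n_frames = 200   # hypothetical number of training frames assigned to one HMM state

bins = np.arange(n_bins)
# Synthetic data (assumption): each frame has one sharp peak of height 1.0
# whose position jitters from frame to frame.
centers = rng.integers(24, 40, size=n_frames)
frames = np.exp(-0.5 * ((bins[None, :] - centers[:, None]) / 1.5) ** 2)

# The maximum-likelihood mean of a single Gaussian is the sample average.
ml_mean = frames.mean(axis=0)

frame_peak = frames.max(axis=1).mean()  # typical peak height of a training frame
mean_peak = ml_mean.max()               # peak height of the mean envelope

print(f"average peak height of training frames: {frame_peak:.3f}")
print(f"peak height of the ML mean envelope:    {mean_peak:.3f}")
```

Every individual frame keeps a sharp peak, but the averaged envelope has a much lower and broader one, which is the kind of lost spectral detail the text attributes to mean-based parameter generation.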

Embodiment Construction

[0025] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0026] Embodiments of the present invention are described in further detail below with reference to the accompanying drawings. Figure 1 is a schematic flow chart of a speech synthesis method based on a restricted Boltzmann machine provided by an embodiment of the present invention. The method includes:

[0027] Step 11: In the model training phase, the spectral envelope extracted by the adaptive weighted spectral interpolation ST...

Abstract

The invention discloses a speech synthesis method based on a restricted Boltzmann machine. The method comprises the following steps: using the spectral envelope extracted by the adaptive weighted spectral interpolation STRAIGHT synthesizer, instead of high-level spectral features, for spectrum modeling; performing state segmentation on the acoustic feature sequences in the training database by using a trained Gaussian hidden Markov model (HMM); segmenting the original spectral envelope features extracted from the training database by the start and end times of each state obtained through segmentation, thereby acquiring the spectral envelope data corresponding to each state in the context-dependent HMM; and predicting the spectral features by using the Gaussian-HMM model, feeding the predicted spectral envelope features and the fundamental frequency features into the STRAIGHT synthesizer, and generating the final synthesized speech. The method can increase the spectral feature modeling accuracy of the HMM-based parametric speech synthesis method, thereby improving the sound quality and naturalness of the synthesized speech.
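The segmentation step in the abstract, collecting the spectral envelope frames that fall between the start and end times of each state, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `collect_state_envelopes`, the `(state, start, end)` alignment format, the state labels, and the 5 ms frame shift are all hypothetical.

```python
from collections import defaultdict

import numpy as np


def collect_state_envelopes(envelopes, alignment, frame_shift_s=0.005):
    """Group spectral-envelope frames by the HMM state they were aligned to.

    envelopes: array of shape (n_frames, n_bins), one envelope per frame.
    alignment: list of (state_name, start_s, end_s) tuples, times in seconds.
    frame_shift_s: frame shift in seconds (assumed value, not from the source).
    """
    per_state = defaultdict(list)
    n_frames = envelopes.shape[0]
    for state, start_s, end_s in alignment:
        first = int(round(start_s / frame_shift_s))
        last = min(int(round(end_s / frame_shift_s)), n_frames)
        if first < last:
            per_state[state].append(envelopes[first:last])
    # Stack all segments that belong to the same state into one array.
    return {s: np.concatenate(chunks, axis=0) for s, chunks in per_state.items()}


# Toy data: 10 frames of a 4-bin "envelope" and a made-up state alignment.
envelopes = np.arange(10 * 4, dtype=float).reshape(10, 4)
alignment = [("a_s1", 0.000, 0.015), ("a_s2", 0.015, 0.040), ("a_s1", 0.040, 0.050)]
grouped = collect_state_envelopes(envelopes, alignment)

for state, data in grouped.items():
    print(state, data.shape)
```

The per-state arrays returned here would then serve as training data for whatever state-level spectral model is used. Pooling frames from every occurrence of a state is a design choice of this sketch.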

Description

Technical field

[0001] The invention relates to the technical field of signal processing, in particular to a speech synthesis method based on a restricted Boltzmann machine.

Background

[0002] Speech synthesis converts text to speech and is one of the core technologies of intelligent human-computer interaction. Parametric speech synthesis based on the hidden Markov model (HMM) is a mainstream speech synthesis method at present. During training, the method first extracts acoustic features such as the spectrum and the fundamental frequency from the training speech database, and then models the acoustic features within a unified HMM framework. During synthesis, the trained statistical models are first used to predict the acoustic features under the maximum output probability criterion, and the predicted acoustic features are then sent to a parametric synthesizer to reconstruct the synthesized speech. ...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L13/027

Inventor: 凌震华, 陈凌辉, 戴礼荣

Owner: UNIV OF SCI & TECH OF CHINA
