Method for improving naturalness of speech synthesis

A speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses the scarcity of truly end-to-end models and the resulting loss of naturalness in synthesized speech. It achieves more human-like pronunciation, reduced model complexity, and lower computing and deployment costs.

Pending Publication Date: 2021-10-08
HANGZHOU QUWEI SCI & TECH

AI Technical Summary

Problems solved by technology

Synthesis quality keeps improving, but very few truly end-to-end models exist at present; most build a bridge between text and speech through the Mel spectrum.
This results in a loss of naturalness in the synthesized speech.

Embodiment Construction

[0018] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0019] As shown in Figure 1, in the described embodiment, a method for improving the naturalness of speech synthesis specifically includes the following steps:

[0020] (1) Text encoding: the phonemes corresponding to the text are obtained with a grapheme-to-phoneme tool; all the phonemes then form a phoneme dictionary, and the size of the phoneme dictionary is used as the dimension of the embedding layer to represent the phonemes of the text, that is, an Embedding layer in deep learning maps each phoneme to a feature vector;

[0021] (2) The represented features are encoded by the CBHG module. The represented features are the feature vectors produced by the embedding layer, and encoding means mapping them to another feature vector through the CBHG module; the CBHG module consists of a one-dimensional convolution...
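Step (1) can be illustrated with a minimal sketch in plain Python. It is not the patent's implementation: the toy lookup table TOY_G2P stands in for a real grapheme-to-phoneme tool, the function names are hypothetical, and the embedding table is filled with random values where a trained model would hold learned ones. It shows only how the size of the phoneme dictionary fixes the number of rows in the embedding table and how each phoneme is mapped to a feature vector.

```python
import random

# Hypothetical toy grapheme-to-phoneme table (assumption for illustration);
# a real system would call an actual G2P tool here.
TOY_G2P = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_phonemes(text):
    """Convert text to a flat phoneme sequence using the toy table."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(TOY_G2P[word])
    return phonemes


def build_phoneme_dictionary(phoneme_sequences):
    """Collect every distinct phoneme and assign it an integer id."""
    symbols = sorted({p for seq in phoneme_sequences for p in seq})
    return {p: i for i, p in enumerate(symbols)}


def make_embedding_table(num_phonemes, embedding_dim, seed=0):
    """One row per dictionary entry; random values stand in for learned weights."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
        for _ in range(num_phonemes)
    ]


def embed(phonemes, phoneme_to_id, table):
    """Map each phoneme to its feature vector by table lookup."""
    return [table[phoneme_to_id[p]] for p in phonemes]


phonemes = text_to_phonemes("hello world")
phoneme_to_id = build_phoneme_dictionary([phonemes])
table = make_embedding_table(len(phoneme_to_id), embedding_dim=4)
features = embed(phonemes, phoneme_to_id, table)
```

The resulting features list holds one vector per phoneme and would be the input to the CBHG encoder in step (2), which is not sketched here.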

Abstract

The invention discloses a method for improving the naturalness of speech synthesis. The method comprises the following steps: phonemes corresponding to a text are obtained from the text through a grapheme-to-phoneme tool, all the phonemes form a phoneme dictionary, the size of the phoneme dictionary serves as the dimension of an embedding layer, the phonemes of the text are represented, and the represented features are encoded through a CBHG module; the text encoding result is used as input, the duration of each phoneme is predicted, the prediction result is compared with the real label, and the duration model is optimized; and the features expanded by the duration model are decoded, the decoded results are combined into complex features, and the decoded complex features are restored into a speech waveform through the inverse short-time Fourier transform. The method has the beneficial effects that the complexity of the model can be reduced, the amount of calculation is reduced, and the calculation and deployment costs are saved; and the naturalness of the synthesized speech is improved, so that the pronunciation is more like that of a real person.
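The duration and decoding stages described in the abstract can be sketched as follows. This is a hedged illustration rather than the patent's method, and several details are assumptions: the duration loss is taken to be a mean squared error, durations are counted in whole frames, and the decoder is assumed to output a real part and an imaginary part that are paired into complex values. All function names are hypothetical. The inverse short-time Fourier transform itself is left to a signal-processing library and is not shown.

```python
def duration_mse(predicted, target):
    """Assumed loss: mean squared error between predicted and labeled durations."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def expand_by_duration(encoded, durations):
    """Repeat each phoneme's encoded vector for its duration in frames."""
    expanded = []
    for vector, frames in zip(encoded, durations):
        expanded.extend([vector] * frames)
    return expanded


def to_complex(real_part, imag_part):
    """Pair assumed real and imaginary decoder outputs into complex features."""
    return [
        [complex(r, i) for r, i in zip(real_row, imag_row)]
        for real_row, imag_row in zip(real_part, imag_part)
    ]


# Two phonemes, lasting 2 frames and 1 frame, become 3 frame-level vectors.
frames = expand_by_duration([[0.1, 0.2], [0.3, 0.4]], [2, 1])

# One frame with two frequency bins, built from separate real and imaginary parts.
spectrum = to_complex([[1.0, 0.0]], [[0.0, 2.0]])
```

In this sketch the complex spectrum would be passed frame by frame to an inverse short-time Fourier transform to recover the waveform, so no Mel spectrum or separate vocoder is needed between the text encoding and the audio.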

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a method for improving the naturalness of speech synthesis.

Background technique

[0002] Speech synthesis has benefited greatly from the development of deep learning and its application in various fields. Speech synthesis can be roughly divided into two stages: 1. The splicing (concatenative) method and the parametric method. The splicing method searches a relatively large corpus for speech fragments that correspond to the text to be synthesized and combines them. Although speech synthesized in this way is the voice of a real person, it is limited in expressing some global features, such as speaking tone and rhythm. The splicing method also requires a relatively large corpus and places relatively high demands on the data set. The parametric method refers to the establishment of a mappi...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/02, G10L13/08
CPC: G10L13/02, G10L13/08
Inventor: 盛乐园
Owner: HANGZHOU QUWEI SCI & TECH