Speech synthesis method, vocoder training method and device, medium and electronic equipment

A speech synthesis and vocoder technology, applied in the fields of vocoder training methods, speech synthesis methods, devices, media, and electronic equipment, that can solve problems such as the unsatisfactory speed of speech synthesis.

Active Publication Date: 2020-08-25
BEIJING BYTEDANCE NETWORK TECH CO LTD
Cites: 15; Cited by: 17

AI Technical Summary

Problems solved by technology

In recent years, with the successful application of deep neural network models in acoustic modeling, the accuracy and naturalness of speech synthesis have been effectively improved, but the speed of speech synthesis is not ideal.

Examples

[0041] Figure 1 is a flowchart of a speech synthesis method according to an exemplary embodiment. As shown in Figure 1, the method may include the following steps 101 and 102.

[0042] In step 101, acoustic feature information is acquired for each speech frame corresponding to each phoneme in the text to be synthesized.
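Step 101 pairs every speech frame with the phoneme it belongs to. The following minimal sketch shows one way that frame-to-phoneme mapping could be represented. It assumes that each phoneme carries a frame count, for example one predicted by a duration model. The `Phoneme` class, its `num_frames` field, and `frame_to_phoneme_index` are hypothetical names, and the excerpt does not say how durations are obtained.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phoneme:
    symbol: str
    num_frames: int  # hypothetical duration in frames (assumed, e.g. from a duration model)


def frame_to_phoneme_index(phonemes: List[Phoneme]) -> List[int]:
    """Return, for each speech frame in order, the index of the phoneme it belongs to."""
    mapping: List[int] = []
    for i, ph in enumerate(phonemes):
        mapping.extend([i] * ph.num_frames)
    return mapping


# Hypothetical phoneme sequence with made-up durations.
utterance = [Phoneme("n", 3), Phoneme("i", 5), Phoneme("h", 2)]
frames = frame_to_phoneme_index(utterance)
print(len(frames))  # 10
print(frames)       # [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
```

An acoustic model would then produce one acoustic feature vector for each entry in this list. These per-frame vectors are the input that step 102 passes to the vocoder.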

[0043] In one implementation, the acoustic feature information may be features such as a cepstrum or a linear spectrum.
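To make these two feature types concrete, the sketch below computes both for a single frame:

- the linear magnitude spectrum of the frame;
- its low-order real cepstrum, which is the inverse FFT of the log magnitude spectrum.

It is a textbook illustration, not the patent's feature extractor. The 16 kHz sampling rate, the 1024-point FFT, the Hann window, the 40 retained coefficients, and the names `linear_spectrum` and `real_cepstrum` are all assumptions.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sampling rate
N_FFT = 1024         # assumed FFT size


def linear_spectrum(frame: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of one Hann-windowed frame (N_FFT // 2 + 1 bins)."""
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=N_FFT))


def real_cepstrum(frame: np.ndarray, n_ceps: int = 40) -> np.ndarray:
    """Low-order real cepstrum: inverse FFT of the log magnitude spectrum, truncated."""
    log_mag = np.log(linear_spectrum(frame) + 1e-10)  # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=N_FFT)[:n_ceps]


# A 25 ms synthetic frame: a 200 Hz sine sampled at 16 kHz.
t = np.arange(400) / SAMPLE_RATE
frame = np.sin(2 * np.pi * 200 * t)
spec = linear_spectrum(frame)
ceps = real_cepstrum(frame)
print(spec.shape, ceps.shape)  # (513,) (40,)
```

Truncating to the first few dozen cepstral coefficients is a common way to keep the cepstrum compact. The value `n_ceps=40` is an assumption and does not come from the patent.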

[0044] In another implementation, the acoustic feature information may include a spectral envelope and a fundamental frequency. Because the spectral envelope has a higher dimensionality, it contains more spectral detail than the cepstrum and the linear spectrum, that is, richer feature information, which improves the accuracy of subsequent speech synthesis.
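The excerpt does not say how the spectral envelope and fundamental frequency are obtained. The sketch below swaps in two textbook techniques:

- cepstral liftering to obtain a smoothed log envelope;
- autocorrelation to estimate F0.

It shows what an "envelope + F0" feature vector for one frame can look like. Here the vector has 513 envelope bins plus one log-F0 value, considerably more dimensions than the few dozen coefficients of a typically truncated cepstrum. The function names, `n_lifter`, the F0 search range, and the 16 kHz / 1024-point settings are hypothetical. Practical systems typically use more robust estimators.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed
N_FFT = 1024         # assumed


def spectral_envelope(frame: np.ndarray, n_lifter: int = 30) -> np.ndarray:
    """Smoothed log-magnitude envelope via cepstral liftering (N_FFT // 2 + 1 bins)."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=N_FFT))
    ceps = np.fft.irfft(np.log(spec + 1e-10), n=N_FFT)
    # The real cepstrum is symmetric; zero the high quefrencies on both sides.
    ceps[n_lifter:N_FFT - n_lifter + 1] = 0.0
    return np.fft.rfft(ceps).real


def estimate_f0(frame: np.ndarray, fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Crude autocorrelation pitch estimate in Hz (assumes a voiced frame)."""
    x = frame - frame.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lag_min, lag_max = int(SAMPLE_RATE / fmax), int(SAMPLE_RATE / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return SAMPLE_RATE / lag


def frame_feature(frame: np.ndarray) -> np.ndarray:
    """Per-frame vector: log spectral envelope followed by log F0."""
    return np.concatenate([spectral_envelope(frame), [np.log(estimate_f0(frame))]])


# Synthetic voiced frame: 200 Hz fundamental plus a weaker second harmonic.
t = np.arange(400) / SAMPLE_RATE
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
feat = frame_feature(frame)
print(feat.shape)  # (514,)
```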

[0045] In step 102, the acoustic feature information of each speech frame is input into a vocoder to obtain audio info...
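Acoustic features arrive at frame rate, while the vocoder must emit audio at sample rate. The excerpt is truncated here and does not say how the vocoder bridges the two rates. One simple option, sketched below as an assumption, is to repeat each frame's feature vector once for every output sample in that frame. The hop size of 200 samples and the name `upsample_frames` are hypothetical.

```python
import numpy as np

HOP = 200  # assumed hop size: 12.5 ms per frame at 16 kHz


def upsample_frames(features: np.ndarray, hop: int = HOP) -> np.ndarray:
    """Repeat each frame's feature vector `hop` times: (frames, dim) -> (frames * hop, dim)."""
    return np.repeat(features, hop, axis=0)


feats = np.random.default_rng(0).standard_normal((10, 514))  # 10 hypothetical frames
cond = upsample_frames(feats)
print(cond.shape)  # (2000, 514)
```

Neural vocoders often use learned upsampling layers instead of plain repetition. The excerpt does not indicate which approach, if either, is used here.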

Abstract

The invention relates to a speech synthesis method, a vocoder training method and device, a medium, and electronic equipment. The method comprises:

1. obtaining acoustic feature information of each speech frame corresponding to each phoneme in a text to be synthesized;
2. inputting the acoustic feature information of each speech frame into a vocoder to obtain audio information corresponding to the text to be synthesized.

The vocoder is based on an extended (dilated) convolutional neural network. Because this network can process data in parallel, synthesizing speech with a vocoder based on it increases the speed of speech synthesis and also accelerates vocoder training. Moreover, the vocoder synthesizes speech directly from the acoustic feature information of each speech frame, so no further extraction of that information is required. This reduces the vocoder's computational load and further increases the speed of speech synthesis.
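The abstract attributes the speed-up to the parallelism of the "extended" convolutional network. The sketch below assumes this means dilated causal 1-D convolutions in the WaveNet style; that is an interpretation the excerpt does not confirm. It illustrates two relevant properties:

- Each layer is computed for all time steps at once with array operations, rather than one sample after another.
- Stacking dilations grows the receptive field quickly, as R = 1 + (K - 1) * sum(d_i). For kernel size K = 2 and dilations 1, 2, 4, 8, this gives R = 16.

The layer count, kernel size, and dilations are hypothetical. The sketch has no nonlinearity, no training, and no conditioning.

```python
import numpy as np


def dilated_causal_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Causal dilated convolution over time.

    x: (T, C_in), w: (K, C_in, C_out). Computes y[t] = sum_k x[t - k * dilation] @ w[k],
    treating samples before t = 0 as zeros. All T outputs are produced with one matrix
    product per kernel tap, i.e. in parallel across time.
    """
    K = w.shape[0]
    T = x.shape[0]
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)
    y = np.zeros((T, w.shape[2]))
    for k in range(K):
        start = pad - k * dilation  # xp[start + t] == x[t - k * dilation]
        y += xp[start:start + T] @ w[k]
    return y


def receptive_field(kernel_size: int, dilations) -> int:
    """Number of input steps that can influence one output step."""
    return 1 + (kernel_size - 1) * sum(dilations)


DILATIONS = [1, 2, 4, 8]  # hypothetical stack
K = 2                     # hypothetical kernel size

# Impulse-response check: with all-ones weights, an impulse at t = 0 reaches exactly
# receptive_field(K, DILATIONS) output steps.
x = np.zeros((32, 1))
x[0, 0] = 1.0
h = x
for d in DILATIONS:
    h = dilated_causal_conv1d(h, np.ones((K, 1, 1)), d)
print(receptive_field(K, DILATIONS))    # 16
print(int(np.count_nonzero(h[:, 0])))  # 16
```

By contrast, an autoregressive vocoder must generate sample t before it can compute sample t + 1. Avoiding that sequential dependency is the speed argument the abstract makes.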

Description

Technical Field

[0001] The present disclosure relates to the technical field of speech synthesis, and in particular to a speech synthesis method, a vocoder training method, a device, a medium, and an electronic device.

Background

[0002] A speech synthesis vocoder reconstructs the speech waveform from acoustic features such as fundamental frequency and frequency spectrum, and is an integral part of a speech synthesis system. The accuracy, naturalness, and synthesis speed of the synthesized speech are important indicators of vocoder performance. In recent years, with the successful application of deep neural network models in acoustic modeling, the accuracy and naturalness of speech synthesis have been effectively improved, but the speed of speech synthesis is not ideal. Therefore, how to increase the speed of speech synthesis while ensuring the accuracy and naturalness of the synthesized speech has become the research...
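Paragraph [0002] describes a vocoder as reconstructing a waveform from fundamental frequency and spectrum. The sketch below uses a classic signal-processing approach, offered for intuition only and not the patent's neural vocoder. A voiced frame is synthesized as a sum of sinusoids at integer multiples of F0, each scaled by a spectral envelope evaluated at that frequency. The names `harmonic_frame` and `decaying_envelope`, the 16 kHz rate, and the envelope shape are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed


def harmonic_frame(f0: float, envelope, n_samples: int = 400) -> np.ndarray:
    """Sum of sinusoids at k * f0 (strictly below Nyquist), each scaled by envelope(k * f0)."""
    t = np.arange(n_samples) / SAMPLE_RATE
    n_harmonics = int(np.ceil(SAMPLE_RATE / 2 / f0)) - 1  # keep k * f0 < SAMPLE_RATE / 2
    y = np.zeros(n_samples)
    for k in range(1, n_harmonics + 1):
        y += envelope(k * f0) * np.sin(2 * np.pi * k * f0 * t)
    return y


def decaying_envelope(freq_hz: float) -> float:
    """Hypothetical envelope: amplitude falls off with frequency."""
    return 1.0 / (1.0 + freq_hz / 500.0)


frame = harmonic_frame(200.0, decaying_envelope)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(frame))))
peak_hz = peak_bin * SAMPLE_RATE / len(frame)
print(peak_hz)  # 200.0
```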

Application Information

Patent type & authority: Applications (China)
IPC (8): G10L13/08; G10L13/10; G10L13/02; G10L19/04; G10L19/24; G10L25/18; G10L25/24; G10L25/30
CPC: G10L13/02; G10L13/08; G10L13/10; G10L19/04; G10L19/24; G10L25/18; G10L25/24; G10L25/30
Inventor: 顾宇 (Gu Yu)
Owner: BEIJING BYTEDANCE NETWORK TECH CO LTD