Speech synthesis method and device

A speech synthesis technology, applied in the fields of speech synthesis, speech analysis, instruments, etc. It addresses slow speech synthesis on overly long text and the inability to synthesize audio accurately when the text is too long, and it achieves the effect of improving speech synthesis speed.

Pending Publication Date: 2020-12-04
SICHUAN CHANGHONG ELECTRIC CO LTD

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a speech synthesis method and device that improve the stability and efficiency of the speech synthesis system, in order to solve the technical problems described in the background art: when the text is too long, speech synthesis is slow, and the audio cannot be synthesized accurately.



Examples


Embodiment 1

[0038] As shown in Figure 1, a speech synthesis method comprises the following steps:

[0039] Analyze the long text to be synthesized and select the segmentation level;

[0040] Segment the long text at the selected segmentation level to obtain a set of short texts;

[0041] Specify the concurrency number and divide the short text set into batches;

[0042] Load the speech synthesis model, compute the acoustic features of each short text batch by batch, and splice the acoustic features in order;

[0043] Input the spliced acoustic features into the vocoder model and output the synthesized audio.
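The steps above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: `make_batches`, `synthesize_long_text`, `toy_acoustic_model`, and `toy_vocoder` are hypothetical names, and the two toy functions merely stand in for a real speech synthesis model and vocoder.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one acoustic-feature frame, e.g. one spectrum column


def make_batches(items: Sequence[str], concurrency: int) -> List[List[str]]:
    """Group the short texts into batches of at most `concurrency` items."""
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    return [list(items[i:i + concurrency])
            for i in range(0, len(items), concurrency)]


def synthesize_long_text(
    short_texts: Sequence[str],
    concurrency: int,
    acoustic_model: Callable[[List[str]], List[List[Frame]]],
    vocoder: Callable[[List[Frame]], List[float]],
) -> List[float]:
    """Compute features batch by batch, splice them in text order, then vocode."""
    spliced: List[Frame] = []
    for batch in make_batches(short_texts, concurrency):
        # Assumption: the model returns one feature sequence per input text,
        # in the same order as the batch.
        for features in acoustic_model(batch):
            spliced.extend(features)
    return vocoder(spliced)


# Hypothetical stand-ins so the sketch runs without a trained model.
def toy_acoustic_model(batch: List[str]) -> List[List[Frame]]:
    # One frame per word; each frame holds the character length of the text.
    return [[[float(len(text))] for _ in text.split()] for text in batch]


def toy_vocoder(frames: List[Frame]) -> List[float]:
    # Flatten frames into a fake "waveform".
    return [frame[0] for frame in frames]


audio = synthesize_long_text(["one two", "three"], 2,
                             toy_acoustic_model, toy_vocoder)
```

Because batches are visited in order and each batch preserves the order of its texts, the spliced features follow the original segmentation sequence without any extra bookkeeping.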

[0044] Analyzing the long text to be synthesized and selecting the segmentation level includes: performing prosody analysis on the long text to obtain pause-level labels, and then selecting one level of pause label as the segmentation level.
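One possible reading of this step is sketched below. The tag format (`#1` to `#4` after a word, with a higher number meaning a longer pause), the length limit `max_len`, and the rule "pick the coarsest level whose segments all fit" are assumptions made for illustration; the source does not specify how the level is chosen. `split_at_level` and `select_level` are hypothetical helper names.

```python
import re
from typing import List, Sequence

# Assumed tag format: "#<digit>" inserted by a prosody front end.
TAG = re.compile(r"#(\d)")


def split_at_level(tagged: str, level: int) -> List[str]:
    """Split at every pause tag whose level is >= `level`; strip all tags."""
    segments: List[str] = []
    current: List[str] = []
    pos = 0
    for match in TAG.finditer(tagged):
        current.append(tagged[pos:match.start()])
        pos = match.end()
        if int(match.group(1)) >= level:
            segments.append("".join(current))
            current = []
    current.append(tagged[pos:])
    segments.append("".join(current))
    return [s.strip() for s in segments if s.strip()]


def select_level(tagged: str, max_len: int,
                 levels: Sequence[int] = (4, 3, 2, 1)) -> int:
    """Return the coarsest pause level whose segments all fit in `max_len`.

    Falls back to the finest level if no level satisfies the limit.
    """
    for level in levels:
        if all(len(s) <= max_len for s in split_at_level(tagged, level)):
            return level
    return levels[-1]


tagged_text = "good morning#2 everyone#3 welcome to the show#4"
level = select_level(tagged_text, max_len=25)
short_texts = split_at_level(tagged_text, level)
```

Splitting only at pause boundaries keeps each short text prosodically complete, which is presumably why the method segments by pause level rather than by a fixed character count.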

[0045] ...


Abstract

The invention discloses a speech synthesis method and belongs to the technical field of speech synthesis. The method comprises the following steps: performing prosody analysis on a text to be synthesized; segmenting the long text into a set of short texts of suitable length according to the prosody analysis result, and recording the segmentation order; calling a speech synthesis model on the text objects in the set to generate acoustic features in parallel; splicing the acoustic features of the text objects according to the segmentation order; and passing the complete spliced acoustic features through a vocoder model to output the audio. Building on traditional speech synthesis methods, the invention makes use of parallel processing when generating the spectrum of the text to be synthesized. This addresses two problems: slow synthesis when the text is too long, and the tendency of the speech synthesis model's spectrum generation to fail on overly long text. Synthesis speed is increased, so that the speech synthesis system is more efficient, stable and natural.
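The idea of recording the segmentation order and restoring it after parallel generation can be illustrated with Python's standard `concurrent.futures` module. This is a sketch under assumptions, not the method as filed: `generate_in_parallel` and `toy_acoustic_model` are hypothetical names, threads are used purely for illustration, and the toy model stands in for a real acoustic model.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Sequence

Frame = List[float]  # one acoustic-feature frame


def generate_in_parallel(
    short_texts: Sequence[str],
    acoustic_model: Callable[[str], List[Frame]],
    max_workers: int = 4,
) -> List[Frame]:
    """Run the model on each short text concurrently, then splice by index."""
    results: Dict[int, List[Frame]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Record the segmentation index alongside each submitted task.
        futures = {pool.submit(acoustic_model, text): index
                   for index, text in enumerate(short_texts)}
        # Tasks may finish in any order; the index restores the sequence.
        for future in as_completed(futures):
            results[futures[future]] = future.result()

    spliced: List[Frame] = []
    for index in range(len(short_texts)):
        spliced.extend(results[index])
    return spliced


# Hypothetical stand-in: one frame per character, holding its code point.
def toy_acoustic_model(text: str) -> List[Frame]:
    return [[float(ord(ch))] for ch in text]


features = generate_in_parallel(["ab", "c"], toy_acoustic_model, max_workers=2)
```

Keying results by the recorded index means the splice is correct regardless of which short text finishes first.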

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a speech synthesis method and device.

Background

[0002] Speech synthesis is a technology that converts text information into speech information. Its process mainly includes: processing the text (text preprocessing, word segmentation, prosody prediction, phoneme labeling, etc.), then training the acoustic model, using the Mel spectrum or linear spectrum as the acoustic feature, and finally using a vocoder to synthesize the spectrum into sound.

[0003] At present, the end-to-end modeling method represented by Tacotron has become the mainstream. Tacotron is an intermediate-stage architecture proposed by Google that merges the formerly separate duration model and acoustic model, and it can be connected to any TTS front end and back end. The TTS front end includes text regularization, prosody prediction, phoneme conversion, etc., while the TTS back end mainly re...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/08, G10L13/02
CPC: G10L13/08, G10L13/02
Inventors: 朱海, 王昆, 周琳珉
Owner: SICHUAN CHANGHONG ELECTRIC CO LTD