Voice synthesis method and device, computer readable medium and electronic equipment

A speech synthesis technology, applied to speech synthesis methods and devices, computer-readable media, and electronic equipment, that addresses problems such as text content being harder to understand, language switching that is not smooth enough, and the inability to pause.

Pending Publication Date: 2020-06-16
BEIJING BYTEDANCE NETWORK TECH CO LTD

AI Technical Summary

Problems solved by technology

A text reading system can only read the text content sequentially and cannot pause according to the text content.
As a result, it is difficult for the user to understand the corresponding text content from the speech that is read aloud.
Especially when th



Examples


Detailed description of embodiments

[0042] Figure 2 is a flowchart of a speech synthesis method according to another exemplary embodiment. As shown in Figure 2, the above method may further include the following step 103.

[0043] In step 103, the prosodic representation of the target reader is obtained.

[0044] In the present disclosure, the target reader may be a default reader or a reader set by the user. The prosodic representation can be used to indicate changes in pitch and volume. The prosodic representation of the target reader can be obtained as follows: first, obtain the first Mel spectrum feature information corresponding to any second audio information read aloud by the target reader; then, input the first Mel spectrum feature information into a preset Variational Auto-Encoder (VAE) model to obtain the prosodic representation of the target reader. The VAE model is trained based on the Mel ...
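
The two-stage procedure in [0044] (Mel spectrum features in, prosodic representation out) can be illustrated with a minimal sketch. The excerpt is truncated before it describes the VAE's architecture or training, so everything below is an assumption made for illustration: the mean pooling over time, the single linear layers, the random untrained weights, and the names `N_MELS`, `LATENT_DIM`, `encode`, and `sample_latent` are all hypothetical.

```python
import math
import random

N_MELS = 4      # hypothetical number of Mel bands (real systems often use far more)
LATENT_DIM = 2  # hypothetical size of the prosodic representation


def make_linear(rng, n_in, n_out):
    """Random weights standing in for a trained layer (assumption: untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def encode(mel_frames, w_mu, w_logvar):
    """Mean-pool the frames over time, then project the pooled vector to the
    posterior mean and log-variance of the latent variable."""
    n = len(mel_frames)
    pooled = [sum(frame[i] for frame in mel_frames) / n for i in range(N_MELS)]
    return linear(w_mu, pooled), linear(w_logvar, pooled)


def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


rng = random.Random(0)
w_mu = make_linear(rng, N_MELS, LATENT_DIM)
w_logvar = make_linear(rng, N_MELS, LATENT_DIM)

# Toy stand-in for the "first Mel spectrum feature information": 3 frames x N_MELS bands.
mel = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5], [0.3, 0.4, 0.5, 0.6]]

mu, logvar = encode(mel, w_mu, w_logvar)
prosody = mu                        # deterministic choice: use the posterior mean
z = sample_latent(mu, logvar, rng)  # stochastic alternative, as used during VAE training
```

Using the posterior mean as the representation keeps the output reproducible for a given recording; whether the patent uses the mean or a sampled latent is not stated in this excerpt.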


Abstract

The invention relates to a voice synthesis method and device, a computer readable medium and electronic equipment. The method comprises the following steps: voice feature information of a multilingual text and the language feature vector of each language in the multilingual text are acquired, where the voice feature information comprises phonemes, tones, word segmentation and prosodic boundaries; and voice synthesis is performed according to the voice feature information and the language feature vectors to obtain first audio information corresponding to the multilingual text. As a result, the accuracy and understandability of the first audio information are improved, and a user can quickly understand the text content corresponding to the first audio information. In addition, pauses can be made at natural prosodic boundaries during speech synthesis, which improves the naturalness and fluency of the first audio information. Furthermore, the voice synthesis method enables smooth switching between different languages and supports voice synthesis of texts in various languages without being limited to specific languages, so it has wide applicability.
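
As a rough illustration of attaching a language feature vector to each language in a multilingual text, the sketch below splits mixed Chinese and English text into single-language segments. The abstract does not say how languages are detected or how the vectors are built, so the Unicode-range check, the one-hot vectors in `LANGUAGE_VECTORS`, and the names `Segment`, `detect_language`, and `split_by_language` are all hypothetical. Extraction of phonemes, tones, word segmentation, and prosodic boundaries is not shown.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical one-hot language feature vectors (assumption: not from the patent).
LANGUAGE_VECTORS = {"zh": [1.0, 0.0], "en": [0.0, 1.0]}


@dataclass
class Segment:
    text: str
    language: str
    language_vector: List[float]


def detect_language(ch: str) -> str:
    """Classify one character by script; spaces, digits and punctuation are neutral."""
    if "\u4e00" <= ch <= "\u9fff":   # CJK Unified Ideographs
        return "zh"
    if ch.isascii() and ch.isalpha():
        return "en"
    return ""


def split_by_language(text: str) -> List[Segment]:
    """Split text into consecutive single-language segments.
    Neutral characters stay with the segment that is currently open."""
    segments: List[Segment] = []
    current_lang, buffer = "", ""
    for ch in text:
        lang = detect_language(ch)
        if lang and current_lang and lang != current_lang:
            # Language switch: close the open segment before starting a new one.
            segments.append(Segment(buffer.strip(), current_lang, LANGUAGE_VECTORS[current_lang]))
            buffer = ""
        if lang:
            current_lang = lang
        buffer += ch
    if buffer.strip() and current_lang:
        segments.append(Segment(buffer.strip(), current_lang, LANGUAGE_VECTORS[current_lang]))
    return segments


segments = split_by_language("我喜欢 machine learning 技术")
```

In a full system each segment's vector would be passed to the synthesizer together with that segment's voice feature information; the one-hot encoding here is only the simplest placeholder for such a vector.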

Description

Technical field

[0001] The present disclosure relates to the technical field of speech synthesis, and in particular, to a speech synthesis method, device, computer readable medium and electronic equipment.

Background

[0002] Nowadays, more and more reader applications use speech synthesis (also known as Text to Speech, TTS, which converts text information into speech and reads it aloud in real time, equivalent to installing an artificial mouth on the machine) to read text aloud. At the present stage, when a text reading system reads text aloud, each word usually takes the same reading time, and the system pauses slightly longer at the punctuation marks in the text. As a result, the text reading system can only read the text content sequentially and cannot pause according to the text content, which makes it difficult for the user to understand the corresponding text content from the speech that is read aloud. Especially when the text ...


Application Information

IPC(8): G10L13/08, G10L13/10, G10L13/047, G10L13/04, G10L13/06, G10L25/24, G10L25/30
CPC: G10L13/047, G10L13/06, G10L13/08, G10L13/086, G10L13/10, G10L25/24, G10L25/30
Inventor 殷翔
Owner BEIJING BYTEDANCE NETWORK TECH CO LTD