
Voice synthesis method and device based on rhythm boundary, medium and equipment

A speech synthesis technique based on prosodic boundaries, applied in the fields of speech synthesis, speech analysis, instruments, etc. It addresses problems such as the inability of existing methods to fully represent the semantic and grammatical information of the input text sequence, and synthesized prosody that does not match the prosodic rhythm of the text content.

Pending Publication Date: 2020-12-29
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0003] These methods try to analyze prosodic information from the speech side, that is, they extract prosodic features from spectral information in the frequency domain, because the prosody of an utterance is fully visible in the frequency domain. Spectral information, however, cannot fully represent the semantic and grammatical information of the input text sequence, and it is the text-side information that largely determines the local prosody of a sentence. As a result, the prosody of the synthesized speech often does not match the prosodic rhythm of the text content.



Examples


Embodiment Construction

[0052] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0053] Figure 1 shows a schematic flowchart of a speech synthesis method based on prosodic boundaries according to an embodiment of the present invention. As Figure 1 shows, the method provided by this embodiment may include at least the following steps S102-S108.

[0054] Step S102, acquiring prosodic boundary information of the text information to ...


Abstract

The invention provides a speech synthesis method and device based on prosodic boundaries, together with a medium and equipment. The method comprises the following steps: acquiring prosodic boundary information of the text information to be synthesized, and generating graph embedding information based on the prosodic boundary information; generating a hidden state vector of the graph embedding information and a sequence code of the text information to be synthesized; generating a speech spectrum based on the hidden state vector and the sequence code; and synthesizing the voice information of the text information to be synthesized according to the speech spectrum. With this method, the semantics and grammatical structure of sentences can be analyzed from the text side, and prosodic boundaries are represented through graph embedding, so that the prosodic information in the text can fully participate in training and inference, improving the prosody of the synthesized voice information. The invention also relates to blockchain technology: data such as the hidden state vector and the sequence code of the text information to be synthesized are stored in a blockchain, which improves the security of data storage.
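The first two steps of the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the boundary encoding (a set of token indices after which a boundary falls), the function names, and the use of one round of neighbour averaging as the "graph embedding". The listing does not disclose how the patent actually builds the graph or the embedding, and the text encoder, spectrum decoder, and vocoder are left out entirely.

```python
from typing import List, Set

Matrix = List[List[float]]


def boundary_adjacency(n_tokens: int, boundaries: Set[int]) -> Matrix:
    """Link neighbouring tokens unless a prosodic boundary separates them.

    Hypothetical encoding: index i in `boundaries` means "a boundary falls
    after token i". Every token also gets a self-loop.
    """
    adj = [[1.0 if i == j else 0.0 for j in range(n_tokens)]
           for i in range(n_tokens)]
    for i in range(n_tokens - 1):
        if i not in boundaries:
            adj[i][i + 1] = adj[i + 1][i] = 1.0
    return adj


def graph_embedding(adj: Matrix, features: Matrix) -> Matrix:
    """One round of degree-normalised neighbour averaging over the graph."""
    dim = len(features[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1.0 because of the self-loop
        out.append([
            sum(w * features[j][d] for j, w in enumerate(row)) / degree
            for d in range(dim)
        ])
    return out


def decoder_inputs(graph_emb: Matrix, seq_code: Matrix) -> Matrix:
    """Concatenate, per token, the graph embedding with the sequence code."""
    return [g + s for g, s in zip(graph_emb, seq_code)]


# Four tokens with one prosodic boundary after token 1: [t0 t1 | t2 t3].
# The "sequence code" here is just the token position, a stand-in for a
# learned text encoding.
seq_code = [[0.0], [1.0], [2.0], [3.0]]
adj = boundary_adjacency(4, {1})
emb = graph_embedding(adj, seq_code)
combined = decoder_inputs(emb, seq_code)
print(emb)       # [[0.5], [0.5], [2.5], [2.5]]
print(combined)  # [[0.5, 0.0], [0.5, 1.0], [2.5, 2.0], [2.5, 3.0]]
```

In this toy setup, tokens inside the same prosodic phrase end up with the same embedding value, while tokens on opposite sides of the boundary do not mix. That is the property a boundary-aware graph representation is meant to expose to the downstream spectrum generator.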

Description

Technical field

[0001] The present invention relates to the technical field of speech synthesis, in particular to a speech synthesis method, device, medium and equipment based on prosodic boundaries.

Background

[0002] In a text-to-speech (TTS) system based on deep learning, prosody is an important factor in the naturalness and fluency of synthesized speech. Prosody can be broken down into three features: fundamental frequency, loudness, and duration. In end-to-end speech synthesis systems, academia and industry have tried several approaches: extracting the hidden state of a prosody embedding from the mel spectrum of the speech, and then introducing a global style vector into a multi-head attention mechanism during training to control the prosody of the synthesized sentence; using a variational autoencoder as a prosody classifier to learn the hidden state of prosody embeddings across a variety of prosodic data sets; in order to obtain more accurate local prosodic con...
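To make the three prosodic features named in the background concrete, here is a minimal sketch that measures them on a synthetic tone. The choices are assumptions for illustration only: loudness is approximated by RMS amplitude, fundamental frequency is estimated with a plain autocorrelation peak search, and the function name and search range (50 to 400 Hz) are hypothetical. The patent text does not prescribe any of these.

```python
import math
from typing import List, Tuple


def prosody_features(samples: List[float], sample_rate: int,
                     f_min: float = 50.0, f_max: float = 400.0
                     ) -> Tuple[float, float, float]:
    """Return (f0 in Hz, RMS loudness, duration in seconds) for one signal."""
    n = len(samples)
    duration = n / sample_rate
    rms = math.sqrt(sum(x * x for x in samples) / n)

    # Autocorrelation pitch estimate: the lag with the strongest
    # self-similarity inside the allowed range is taken as one period.
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag, rms, duration


# Synthetic test tone: 200 Hz sine, amplitude 0.5, 0.1 s at 8 kHz.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * i / SAMPLE_RATE) for i in range(800)]
f0, rms, duration = prosody_features(tone, SAMPLE_RATE)
print(f0, round(rms, 4), duration)  # 200.0 0.3536 0.1
```

Real speech would need framing, voicing detection, and a more robust pitch tracker; the sketch only shows what each of the three quantities measures.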


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/10, G10L13/027
CPC: G10L13/10, G10L13/027
Inventors: 孙奥兰, 王健宗, 程宁
Owner: PING AN TECH (SHENZHEN) CO LTD