
Speech synthesis method and structure, terminal and storage medium

A speech synthesis technology operating on phoneme sequences, applied in speech synthesis, speech analysis, instruments, etc. It addresses the poor fine-grained local feature extraction of Transformer-based models and the need for more layers or parameters in CNN-based models, so as to reduce the number of model parameters, improve the naturalness of the synthesized speech, and refine the granularity of local features.

Status: Pending
Publication Date: 2022-06-28
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a speech synthesis method, structure, terminal and storage medium, aiming to solve technical problems such as the poor fine-grained local feature extraction of existing Transformer-based speech synthesis and the need for more layers or parameters in CNN-based speech synthesis.




Detailed Description of the Embodiments

[0042] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

[0043] The terms "first", "second" and "third" in the present invention are only used for description purposes, and should not be understood as indicating or implying relative importance or implying the number of indicated technical features. Thus, a feature defined as "first", "second", "third" may expressly or implicitly include at least one of that feature. In the description of the present invention, "a plurality of" means at least tw...



Abstract

The invention discloses a speech synthesis method and structure, a terminal and a storage medium. The method comprises the following steps: inputting the phoneme sequence of a text to be synthesized into a pre-trained speech synthesis model, and extracting local features of the phoneme sequence through the model; and mapping the phoneme local features into a Mel spectrum to obtain the speech synthesis result for the text. The speech synthesis model is based on a CNN and a Transformer: global context information of the phoneme sequence is obtained with a multi-head attention mechanism, local information of the phoneme sequence is obtained with depthwise separable convolution, and the phoneme local features are obtained from the global context information and the local information. In this method, the Transformer's multi-head attention mechanism models the relationships between different phonemes to obtain global context information, and depthwise separable convolution refines the local features within the phonemes, so that the granularity of the local information is improved and the number of model parameters, the amount of computation and the training time are reduced.
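The combination described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the patented model: the function names, the identity query/key/value projections, the residual connections, and the ordering (attention first, then convolution) are all assumptions made for illustration, since the abstract does not say how the two kinds of information are combined. The mapping to a Mel spectrum is omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_self_attention(x, num_heads):
    """Global context: every phoneme attends to every other phoneme.

    x is a list of seq_len vectors of size dim. Assumption: the learned
    Q/K/V projections are replaced by the identity to keep the sketch short.
    """
    seq_len, dim = len(x), len(x[0])
    assert dim % num_heads == 0, "dim must be divisible by num_heads"
    d_head = dim // num_heads
    out = [[0.0] * dim for _ in range(seq_len)]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head  # channel slice of this head
        for t in range(seq_len):
            scores = [
                sum(q * k for q, k in zip(x[t][lo:hi], x[s][lo:hi])) / math.sqrt(d_head)
                for s in range(seq_len)
            ]
            weights = softmax(scores)
            for c in range(lo, hi):
                out[t][c] = sum(weights[s] * x[s][c] for s in range(seq_len))
    return out

def depthwise_separable_conv(x, depthwise, pointwise):
    """Local information: per-channel 1-D conv, then a 1x1 channel mix.

    depthwise: dim kernels of odd length K (one per channel).
    pointwise: dim_out x dim matrix. Zero padding keeps seq_len unchanged.
    """
    seq_len, dim = len(x), len(x[0])
    pad = len(depthwise[0]) // 2
    dw = [[0.0] * dim for _ in range(seq_len)]
    for t in range(seq_len):
        for c in range(dim):
            acc = 0.0
            for k, w in enumerate(depthwise[c]):
                s = t + k - pad
                if 0 <= s < seq_len:
                    acc += w * x[s][c]
            dw[t][c] = acc
    return [[sum(w * v for w, v in zip(row, dw[t])) for row in pointwise]
            for t in range(seq_len)]

def phoneme_block(x, num_heads, depthwise, pointwise):
    """Hypothetical block: attention, then convolution, each with a residual.

    Assumes pointwise is square (dim x dim) so the residual shapes match.
    """
    attn = multi_head_self_attention(x, num_heads)
    glob = [[a + b for a, b in zip(xr, ar)] for xr, ar in zip(x, attn)]
    local = depthwise_separable_conv(glob, depthwise, pointwise)
    return [[a + b for a, b in zip(gr, lr)] for gr, lr in zip(glob, local)]
```

In this sketch the attention step lets each phoneme see the whole sequence in one layer, while the depthwise kernels only look at a few neighbouring phonemes per channel, which is the division of labour the abstract describes.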

Description

Technical field

[0001] The present invention relates to the technical field of speech synthesis, in particular to a speech synthesis method, structure, terminal and storage medium.

Background

[0002] Neural-network-based end-to-end text-to-speech (TTS) systems have made great progress in recent years. Existing speech synthesis technologies mainly include speech synthesis based on the Transformer structure and speech synthesis based on CNNs (Convolutional Neural Networks). However, both Transformer and CNN models have limitations. The Transformer structure can capture long-term dependencies and trains efficiently, but its local feature extraction is less fine-grained. A CNN gradually captures local context information through layer-by-layer local receptive fields, but it requires more layers or parameters to capture global context information.

Summary of the invention

[0003] The present invention pr...
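The background's point about CNNs needing "more layers or parameters" can be made concrete with some simple arithmetic. The sketch below is illustrative only: the helper names and the example sizes (a 100-phoneme sequence, kernel size 5, 256 channels) are assumptions, not figures from the patent. It uses the standard receptive-field formula for stacked stride-1 convolutions and the standard parameter counts for an ordinary 1-D convolution versus a depthwise separable one.

```python
import math

def receptive_field(num_layers, kernel_size, dilation=1):
    """Receptive field (in phonemes) of stacked stride-1 1-D convolutions."""
    return 1 + num_layers * (kernel_size - 1) * dilation

def layers_to_cover(seq_len, kernel_size, dilation=1):
    """Smallest number of stacked layers whose receptive field spans seq_len."""
    return max(0, math.ceil((seq_len - 1) / ((kernel_size - 1) * dilation)))

def standard_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Weights of an ordinary 1-D convolution: every output channel
    has a kernel over every input channel."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)

def depthwise_separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Depthwise step (one kernel per input channel) plus a 1x1 pointwise
    step that mixes channels."""
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    print(layers_to_cover(100, 5))
    print(standard_conv1d_params(256, 256, 5))
    print(depthwise_separable_conv1d_params(256, 256, 5))
```

Under these assumed sizes, 25 stacked layers are needed before one output position can see all 100 phonemes, whereas a single self-attention layer relates every pair of positions directly. The ordinary convolution has 327,936 parameters per layer against 67,328 for the depthwise separable version, roughly a 4.9x reduction.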


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/08, G10L25/30
CPC: G10L13/08, G10L25/30
Inventors: 郭洋, 王健宗, 程宁
Owner: PING AN TECH (SHENZHEN) CO LTD