Multi-speaker speech synthesis method based on probability generation and non-autoregression model

A non-autoregressive model and speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses problems such as low similarity and insufficient generalization

Pending Publication Date: 2022-04-01
XIAMEN UNIV

AI Technical Summary

Problems solved by technology

[0007] To address the insufficient generalization and low similarity that prior-art multi-speaker speech synthesis systems show for speakers outside the data set, the present invention proposes a multi-speaker speech synthesis method based on probability generation and a non-autoregressive model.


Examples

Embodiment Construction

[0046] The features and exemplary embodiments of various aspects of the present invention are described in detail below. To make the purpose, technical solutions and advantages of the present invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is provided only to give a better understanding of the present invention by showing examples of it.

[0047] It should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another entity or operation, and ...


Abstract

The invention provides a multi-speaker speech synthesis method based on probability generation and a non-autoregressive model. The method comprises the following steps:

S1, a speaker personalized encoder and a probability generation encoder receive a target Mel spectrum and extract a speaker personalized vector and a probability generation vector, respectively;
S2, an encoder based on a deep network encodes the spliced and fused input vectors to obtain phoneme-level deep features;
S3, a phoneme duration predictor receives the spliced and fused features and predicts a phoneme duration sequence;
S4, a length regulation network receives the phoneme duration sequence and expands the fused features to obtain frame-level features;
S5, a decoder based on a deep network receives the frame-level features and maps them to a predicted Mel spectrum, and a post-processing network supplements the residual information of the predicted Mel spectrum;
S6, a vocoder maps the predicted Mel spectrum, supplemented with the residual information, to a sound waveform to obtain the synthesized speech.

The method can improve the generalization of a multi-speaker speech synthesis system and the similarity of the synthesized speech.
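The splicing in steps S2 and S3 and the expansion in step S4 can be illustrated with a minimal sketch. It assumes that "spliced and fused" means plain concatenation of the speaker personalized vector and the probability generation vector onto each phoneme-level vector, and that the length regulation network repeats each phoneme-level vector once per predicted frame; the abstract does not confirm either detail. The function names `fuse` and `length_regulate` and all numeric values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def fuse(phoneme_vectors: Sequence[Vector],
         speaker_vec: Vector,
         prob_vec: Vector) -> List[Vector]:
    """Concatenate both utterance-level vectors onto every phoneme vector.

    Assumption: the abstract only says the vectors are "spliced and fused";
    concatenation is one common way to do that, not necessarily the patent's.
    """
    return [list(p) + list(speaker_vec) + list(prob_vec) for p in phoneme_vectors]


def length_regulate(features: Sequence[Vector],
                    durations: Sequence[int]) -> List[Vector]:
    """Expand phoneme-level features to frame level by repeating each one.

    durations[i] is the number of frames predicted for phoneme i.
    """
    if len(features) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    frames: List[Vector] = []
    for feat, dur in zip(features, durations):
        if dur < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector for each frame so frames do not share one list object.
        frames.extend(list(feat) for _ in range(dur))
    return frames


# Toy example: three phonemes with 2-dimensional features (made-up numbers).
phonemes = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
fused = fuse(phonemes, speaker_vec=[1.0], prob_vec=[0.0])
frames = length_regulate(fused, durations=[2, 0, 3])
print(len(frames))  # 2 + 0 + 3 frames
```

Because the expansion depends only on the predicted durations, every frame-level feature is available at once, which is what lets a non-autoregressive decoder produce all Mel spectrum frames in parallel instead of one frame at a time.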

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a multi-speaker speech synthesis method based on probability generation and a non-autoregressive model.

Background

[0002] Text-to-speech (TTS) is a technology that converts arbitrary text into audio. In recent years, end-to-end single-speaker speech synthesis models based on deep learning have been able to synthesize clear and natural speech. With the further development of speech synthesis technology, its application scenarios are gradually increasing, and there is also demand for multi-speaker speech synthesis technology, for example for rapid customization of sound libraries and for audio novels.

[0003] The traditional multi-speaker speech synthesis (multi-speaker TTS) system uses a one-hot vector to represent the identity of the speaker, and synthesizes the speech of a specific speaker by changing the one-hot vector, but the one-hot vector ...
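The one-hot speaker representation mentioned in [0003] can be sketched as follows. The source passage is truncated, so this does not reproduce its argument; it only illustrates the representation itself, assuming a fixed list of speakers known to the system. The function name `one_hot_speaker` and the speaker labels are hypothetical.

```python
from typing import List, Sequence


def one_hot_speaker(speaker_id: str, known_speakers: Sequence[str]) -> List[int]:
    """Return a one-hot vector with a 1 at the position of speaker_id.

    The vector has one slot per known speaker, so a speaker that is not in
    known_speakers has no valid encoding and a KeyError is raised.
    """
    if speaker_id not in known_speakers:
        raise KeyError(f"unknown speaker: {speaker_id}")
    return [1 if s == speaker_id else 0 for s in known_speakers]


# Hypothetical closed set of speakers.
speakers = ["spk_a", "spk_b", "spk_c"]
vec = one_hot_speaker("spk_b", speakers)
print(vec)  # [0, 1, 0]
```

Switching the synthesized voice amounts to switching which slot holds the 1. A speaker absent from the list has no slot at all, which is one way to see why speakers outside the data set, the case raised in [0007], are a problem for this representation.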

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/08, G10L25/30
Inventor: 李琳, 欧阳贝贝, 洪青阳
Owner: XIAMEN UNIV