Multi-speaker speech synthesis method based on probability generation and non-autoregression model

A non-autoregressive model and speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses problems such as low similarity and insufficient generalization

Pending Publication Date: 2022-04-01

XIAMEN UNIV

Cites: 0. Cited by: 2.

Problems solved by technology

[0007] To address the insufficient generalization and low similarity of prior-art multi-speaker speech synthesis systems for speakers outside the data set, the present invention proposes a multi-speaker speech synthesis method based on probability generation and non-autoregressive models. The method is used to solve the above-mentioned technical problems to realize ...

Embodiment Construction

[0046] The characteristics and exemplary embodiments of various aspects of the present invention are described in detail below. To make the purpose, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is provided only to give a better understanding of the present invention by showing examples of it.

[0047] It should be noted that in this article, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and ...

Abstract

The invention provides a multi-speaker speech synthesis method based on probability generation and a non-autoregressive model. The method comprises the following steps:

S1: a speaker personalized encoder and a probability generation encoder receive a target Mel spectrum and extract a speaker personalized vector and a probability generation vector, respectively;
S2: a deep-network-based encoder encodes the concatenated and fused input vectors to obtain phoneme-level deep features;
S3: a phoneme duration predictor receives the concatenated and fused features and predicts a phoneme duration sequence;
S4: a length regulation network receives the phoneme duration sequence and expands the fused features to obtain frame-level features;
S5: a deep-network-based decoder receives the frame-level features and maps them into a predicted Mel spectrum, and a post-processing network supplements the residual information of the predicted Mel spectrum;
S6: a vocoder maps the predicted Mel spectrum, supplemented with the residual information, into a sound waveform to obtain the synthesized speech.

The method can improve the generalization of a multi-speaker speech synthesis system and the similarity of the synthesized speech.
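The fusion and length regulation steps of the abstract (S2 to S4) can be illustrated with a minimal sketch. It assumes that "fusion" means appending the utterance-level speaker vector and probability generation vector to every phoneme-level feature, and that length regulation repeats each feature once per predicted frame. The abstract does not spell out either operation, so the function names `fuse` and `length_regulate`, the vector sizes, and the duration values are all hypothetical.

```python
from typing import List

Vector = List[float]


def fuse(phoneme_features: List[Vector], speaker_vec: Vector, prob_vec: Vector) -> List[Vector]:
    """Append both utterance-level vectors to every phoneme-level feature.

    Assumption: fusion is plain concatenation; the patent text may use
    a different fusion operation.
    """
    return [feat + speaker_vec + prob_vec for feat in phoneme_features]


def length_regulate(features: List[Vector], durations: List[int]) -> List[Vector]:
    """Repeat each phoneme-level feature for its predicted number of frames."""
    if len(features) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative frame counts")
    frames: List[Vector] = []
    for feat, dur in zip(features, durations):
        # Copy the feature so that frames do not share one list object.
        frames.extend(list(feat) for _ in range(dur))
    return frames


# Hypothetical toy input: three phonemes with 2-dimensional features.
phonemes = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
fused = fuse(phonemes, speaker_vec=[1.0], prob_vec=[0.5])
frames = length_regulate(fused, durations=[2, 0, 3])
print(len(frames))  # 5 frames: 2 + 0 + 3
```

Because every frame count is known before decoding starts, a decoder in this kind of design can process all frames in parallel, which is the usual motivation for a non-autoregressive model.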

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a multi-speaker speech synthesis method based on probability generation and non-autoregressive models.

Background

[0002] Text To Speech (TTS) refers to technology that converts arbitrary text into audio. In recent years, end-to-end single-speaker speech synthesis models based on deep learning have been able to synthesize clear and natural speech. With the further development of speech synthesis technology, its application scenarios are gradually increasing, and demand has emerged for multi-speaker speech synthesis, for example rapid customization of voice libraries and audio novels.

[0003] The traditional multi-speaker speech synthesis (multi-speaker TTS) system uses a one-hot vector to represent the identity of the speaker, and synthesizes the speech of a specific speaker by changing the one-hot vector, but the one-hot vector ...
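The one-hot speaker representation mentioned in [0003] can be sketched as follows. This is only an illustration of one-hot encoding over a fixed speaker list, not code from the patent; the function name `one_hot_speaker` and the speaker labels are hypothetical. It shows one general property of the encoding: a speaker who is not in the list has no slot in the vector.

```python
from typing import List


def one_hot_speaker(speaker_id: str, known_speakers: List[str]) -> List[int]:
    """Return a one-hot identity vector for a speaker in a fixed, closed set."""
    if speaker_id not in known_speakers:
        # A one-hot table has exactly one slot per known speaker.
        raise KeyError(f"speaker {speaker_id!r} has no slot in the one-hot table")
    return [1 if name == speaker_id else 0 for name in known_speakers]


# Hypothetical speaker set, for illustration only.
speakers = ["spk_a", "spk_b", "spk_c"]
print(one_hot_speaker("spk_b", speakers))  # [0, 1, 0]
```

Switching the synthesized voice then amounts to passing a different one-hot vector to the model, as the paragraph above describes.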

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L13/08, G10L25/30

Inventor: 李琳, 欧阳贝贝, 洪青阳

Owner: XIAMEN UNIV
