Lightweight multi-speaker speech synthesis system and electronic equipment

A lightweight speech synthesis technology, applied in the fields of speech synthesis, speech analysis, instruments, etc. It addresses problems such as a large amount of computation and slow synthesis speed, with the effect of speeding up synthesis and reducing computational complexity.

Active Publication Date: 2022-07-08

XIAMEN UNIV

6 Cites, 0 Cited by


Problems solved by technology

[0005] In addition, most existing text-to-speech systems can only synthesize speech for a single speaker in a single style, and the few systems that support multi-speaker synthesis suffer from slow synthesis speed, a large amount of computation, and high memory consumption.


Embodiment Construction

[0037] In realizing the technical concept of the present disclosure, the inventor found that the prior art has the following technical problems: (1) Most existing end-to-end speech synthesis systems are autoregressive generative models that learn the text-to-speech alignment through an attention mechanism. Their synthesis speed is slow, which degrades the user experience of actual products. (2) The non-autoregressive model FastSpeech extracts text features with a self-attention mechanism. The computational complexity of this mechanism is quadratic in the total length of the input text, so the computational cost is high and the memory consumption is large. (3) The non-autoregressive model FastSpeech can currently synthesize the speech of only a single speaker and does not introduce any prosody-related speech information, which limits the personalized characteristics of the speech synthesis system and the...
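To make point (2) concrete, the following minimal sketch counts the work in one self-attention head as the input grows. It is an illustration only, not taken from the patent: the function names and the head dimension of 64 are assumptions, and the count covers just the score matrix and the weighted sum, ignoring projections, softmax, and multiple heads.

```python
def attention_score_entries(seq_len: int) -> int:
    """Number of pairwise scores in one self-attention head: n * n."""
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    return seq_len * seq_len


def attention_multiply_adds(seq_len: int, dim: int) -> int:
    """Approximate multiply-adds for Q @ K^T plus weights @ V: 2 * n^2 * d.

    Hypothetical cost model for illustration; real implementations add
    projection, softmax, and multi-head overhead on top of this.
    """
    if dim < 0:
        raise ValueError("dim must be non-negative")
    return 2 * attention_score_entries(seq_len) * dim


if __name__ == "__main__":
    # Doubling the text length quadruples both counts.
    for n in (100, 200, 400):
        print(n, attention_score_entries(n), attention_multiply_adds(n, 64))
```

Doubling the input length from 100 to 200 tokens quadruples both the number of stored scores and the multiply-adds, which is the growth in computation and memory that the paragraph above attributes to self-attention.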


Abstract

A lightweight multi-speaker speech synthesis system and electronic equipment. The system comprises a text feature extraction and regularization module, a speaker feature extraction module, a feature fusion module, and a speech generation module. The text feature extraction and regularization module uses a lightweight encoder to encode the text to be processed and extract its features, uses a lightweight duration prediction network to predict the duration of each word or phoneme corresponding to the deep text features output by the lightweight encoder, and performs length regulation to obtain regularized text features with the same length as the target mel spectrogram. The speaker feature extraction module generates features that characterize the target speaker. The feature fusion module fuses the features of the target speaker with the regularized text features. The speech generation module performs deep feature extraction, dimension mapping, residual integration, and speech generation on the fused features. The system supports multi-speaker speech synthesis with fast synthesis speed.
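The length regulation and feature fusion steps described in the abstract can be sketched as follows. This is a hedged illustration under stated assumptions, not the patented implementation: the helper names regulate_length and fuse_speaker are hypothetical, durations are assumed to be integer frame counts, and fusion is assumed to be element-wise addition of one speaker vector to every frame (the abstract does not specify the fusion rule).

```python
from typing import List

Vector = List[float]


def regulate_length(features: List[Vector], durations: List[int]) -> List[Vector]:
    """Repeat each token's feature vector durations[i] times (one copy per frame)."""
    if len(features) != len(durations):
        raise ValueError("one duration per token is required")
    frames: List[Vector] = []
    for vec, dur in zip(features, durations):
        if dur < 0:
            raise ValueError("durations must be non-negative")
        frames.extend(list(vec) for _ in range(dur))
    return frames


def fuse_speaker(frames: List[Vector], speaker: Vector) -> List[Vector]:
    """Add the same speaker vector to every frame (assumed fusion rule)."""
    if any(len(frame) != len(speaker) for frame in frames):
        raise ValueError("frame and speaker dimensions must match")
    return [[x + s for x, s in zip(frame, speaker)] for frame in frames]


# Toy data: three phonemes with 2-dimensional features (all values hypothetical).
token_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
predicted_durations = [2, 1, 3]   # frames per phoneme, as a duration predictor might output
speaker_embedding = [0.5, -0.5]   # stand-in for the speaker feature extraction module

regular = regulate_length(token_features, predicted_durations)
fused = fuse_speaker(regular, speaker_embedding)

if __name__ == "__main__":
    print(len(regular))  # total frames = sum of durations
    print(fused[0])
```

After regulation, the sequence has as many frames as the sum of the predicted durations, so it can be aligned frame by frame with a target mel spectrogram of that length. Because the expansion is a simple repeat, it needs no attention-based alignment at synthesis time.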

Description

Technical field

[0001] The present disclosure belongs to the technical field of speech synthesis, and relates to a lightweight multi-speaker speech synthesis system and electronic equipment.

Background

[0002] In recent years, neural network-based end-to-end speech synthesis systems have surpassed traditional statistical parametric speech synthesis systems in terms of system architecture and generated speech quality. End-to-end speech synthesis systems, such as the Tacotron2 system and the Transformer text-to-speech system (Transformer TTS system for short), directly use neural networks to convert text into the corresponding speech, eliminating the need for extensive and complex text front-end processing, extraction of various linguistic features, and complex domain expert knowledge.

[0003] However, most of the current mainstream end-to-end speech synthesis systems use the attention mechanism to implicitly learn the text-to-speech alignment relationship, which brings a huge...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G10L13/08, G10L25/18, G10L25/30

CPC: G10L13/08, G10L25/18, G10L25/30

Inventor: 李琳, 李松, 洪青阳

Owner: XIAMEN UNIV
