Lightweight multi-speaker voice synthesis system and electronic equipment

A lightweight speech synthesis technology, applied in the fields of speech synthesis, speech analysis, and instruments, that addresses the problems of heavy computation and slow synthesis speed by reducing model parameters and speeding up synthesis.

Active Publication Date: 2020-12-25

XIAMEN UNIV

6 Cites, 18 Cited by


AI Technical Summary

Problems solved by technology

[0005] In addition, most existing text-to-speech systems can only synthesize speech for a single speaker in a single style, and the few systems that do support multi-speaker synthesis suffer from slow synthesis speed, heavy computation, and high memory consumption.


Embodiment Construction

[0037] In implementing the technical concept of the present disclosure, the inventors identified the following technical problems in the prior art: (1) Most existing end-to-end speech synthesis systems are autoregressive generative models that learn the text-to-speech alignment through an attention mechanism. Their synthesis speed is slow, which degrades the user experience of deployed products. (2) The non-autoregressive model FastSpeech extracts text features with a self-attention mechanism whose computational complexity is quadratic in the total length of the input text, so the computational cost is high and memory consumption is large. (3) FastSpeech can currently synthesize the speech of only a single speaker and does not introduce any prosody-related speech information, which limits the personalized characteristics of the speech synthesis system and the expressiven...
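The quadratic cost described in point (2) can be illustrated with a small calculation. This is only a sketch: it counts the pairwise scores of a single self-attention head in a single layer and ignores the feature dimension. The function names and the 4-byte score size are assumptions made for illustration, not details from the patent.

```python
def attention_score_count(sequence_length: int) -> int:
    """Pairwise scores computed by one self-attention head: n * n."""
    return sequence_length * sequence_length


def attention_matrix_bytes(sequence_length: int, bytes_per_score: int = 4) -> int:
    """Memory for one n-by-n score matrix (4 bytes per score is an assumption)."""
    return attention_score_count(sequence_length) * bytes_per_score


# Each doubling of the input length quadruples both quantities.
for n in (100, 200, 400):
    print(n, attention_score_count(n), attention_matrix_bytes(n))
```

Doubling the input length from 100 to 200 tokens raises the score count from 10,000 to 40,000, which is the growth pattern the paragraph above identifies as a bottleneck for long inputs.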

Abstract

The invention discloses a lightweight multi-speaker speech synthesis system and electronic equipment. The system comprises a text feature extraction and normalization module, a speaker feature extraction module, a feature fusion module, and a speech generation module. The text feature extraction and normalization module encodes the text to be processed and extracts its features with a lightweight encoder, predicts the duration of each word or phoneme corresponding to the deep text features output by the encoder with a lightweight duration prediction network, and applies length normalization to obtain length-regulated text features whose length equals that of the target Mel spectrum. The speaker feature extraction module generates features that represent the target speaker. The feature fusion module fuses the target speaker's features with the length-regulated text features. The speech generation module performs deep feature extraction, dimension mapping, and residual integration on the fused features and generates speech. The system supports multi-speaker speech synthesis and synthesizes speech quickly.
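The length normalization and feature fusion steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: durations are taken to be whole numbers of Mel-spectrum frames, and fusion is assumed to be concatenation of a speaker vector onto every frame, since the abstract does not specify the fusion operation. All function and variable names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def regulate_length(text_features: Sequence[Vector],
                    durations: Sequence[int]) -> List[Vector]:
    """Repeat each token's feature vector by its predicted duration in frames."""
    if len(text_features) != len(durations):
        raise ValueError("one duration is required per token")
    regulated: List[Vector] = []
    for feature, frames in zip(text_features, durations):
        if frames < 0:
            raise ValueError("durations must be non-negative")
        regulated.extend(list(feature) for _ in range(frames))
    return regulated


def fuse_speaker(regulated: Sequence[Vector],
                 speaker_embedding: Vector) -> List[Vector]:
    """Assumed fusion: append the speaker embedding to every frame."""
    return [list(frame) + list(speaker_embedding) for frame in regulated]


# Three phoneme vectors with predicted durations of 2, 1, and 3 frames.
phoneme_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
predicted_durations = [2, 1, 3]
speaker_embedding = [1.0, -1.0, 0.5]  # hypothetical speaker vector

frames = regulate_length(phoneme_features, predicted_durations)
fused = fuse_speaker(frames, speaker_embedding)
print(len(frames), len(fused[0]))
```

Here the three phoneme vectors expand to six frames, matching a target Mel spectrum of six frames, and each fused frame carries the two text dimensions plus the three speaker dimensions.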

Description

Technical field

[0001] The disclosure belongs to the technical field of speech synthesis and relates to a lightweight multi-speaker speech synthesis system and electronic equipment.

Background

[0002] In recent years, neural network-based end-to-end speech synthesis systems have surpassed traditional statistical parametric speech synthesis systems in both system architecture and generated speech quality. End-to-end speech synthesis systems, such as the Tacotron2 system and the Transformer text-to-speech system (Transformer TTS system for short), use a neural network to convert text directly into the corresponding speech. They no longer require extensive and complicated text front-end processing, the extraction of various linguistic features, or complex domain expert knowledge.

[0003] However, most of the current mainstream end-to-end speech synthesis systems use the attention mechanism to implicitly learn the text-to-speech alignment relationship, which brings a huge amount of ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L13/08, G10L25/18, G10L25/30

CPC: G10L13/08, G10L25/18, G10L25/30

Inventor: 李琳, 李松, 洪青阳

Owner: XIAMEN UNIV
