
Lightweight multi-speaker speech synthesis system and electronic equipment

A lightweight speech synthesis technology, applied in the fields of speech synthesis, speech analysis, and instruments. It addresses the heavy computation and slow synthesis speed of existing systems by reducing computational complexity and thereby speeding up synthesis.

Active Publication Date: 2022-07-08
XIAMEN UNIV

AI Technical Summary

Problems solved by technology

[0005] In addition, most existing text-to-speech systems can only synthesize speech from a single speaker in a single style, and the few systems that support multi-speaker synthesis suffer from slow synthesis speed, heavy computation, and high memory consumption.




Embodiment Construction

[0037] In realizing the technical concept of the present disclosure, the inventors found that the prior art has the following technical problems: (1) Most existing end-to-end speech synthesis systems are autoregressive generative models that learn the text-to-speech alignment through an attention mechanism; their synthesis speed is slow, which degrades the user experience of actual products. (2) The non-autoregressive model FastSpeech extracts text features with a self-attention mechanism whose computational complexity is quadratic in the total length of the input text, so the computation is heavy and memory consumption is large. (3) The non-autoregressive model FastSpeech can currently synthesize the speech of only a single speaker and does not introduce any prosody-related speech information, which limits the personalized characteristics of the speech synthesis system and the...
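The quadratic cost mentioned in point (2) can be illustrated with a rough count. The sketch below is not taken from the patent; the function name `attention_score_cost` and the feature dimension of 64 are hypothetical. It only counts the multiply-adds needed to form the pairwise score matrix that self-attention builds over an input of `seq_len` tokens.

```python
def attention_score_cost(seq_len: int, dim: int) -> int:
    """Multiply-adds needed to form the seq_len x seq_len score matrix.

    Self-attention compares every token with every other token, and each
    comparison is a dot product over `dim` features (hypothetical helper,
    not part of the patent text).
    """
    if seq_len < 0 or dim < 0:
        raise ValueError("seq_len and dim must be non-negative")
    return seq_len * seq_len * dim


# Doubling the input length quadruples the cost of the score matrix.
short = attention_score_cost(100, 64)
long = attention_score_cost(200, 64)
ratio = long // short
```

Under these assumptions, the number of stored scores grows in the same way, which is why memory consumption rises together with computation for long input text.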


Abstract

A lightweight multi-speaker speech synthesis system and electronic equipment. The system comprises a text feature extraction and regularization module, a speaker feature extraction module, a feature fusion module, and a speech generation module. The text feature extraction and regularization module uses a lightweight encoder to encode the text to be processed and extract its features, uses a lightweight duration prediction network to predict the duration of each word or phoneme corresponding to the deep text features output by the lightweight encoder, and performs length regulation to obtain regularized text features with the same length as the target mel spectrogram. The speaker feature extraction module generates features that characterize the target speaker. The feature fusion module fuses the target speaker's features with the regularized text features. The speech generation module performs deep feature extraction, dimension mapping, residual integration, and speech generation on the fused features. The system supports multi-speaker speech synthesis and synthesizes speech quickly.
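The length regulation and feature fusion steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `length_regulate` and `fuse_speaker` are hypothetical, durations are assumed to be integer frame counts, and fusion by concatenation is an assumption, since the excerpt does not say how the features are fused.

```python
from typing import List

Vector = List[float]


def length_regulate(text_features: List[Vector], durations: List[int]) -> List[Vector]:
    """Repeat each token's feature vector durations[i] times.

    The output has sum(durations) frames, which is meant to match the
    length of the target mel spectrogram.
    """
    if len(text_features) != len(durations):
        raise ValueError("need exactly one duration per token")
    regulated: List[Vector] = []
    for vector, duration in zip(text_features, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so that frames do not share one list object.
        regulated.extend(list(vector) for _ in range(duration))
    return regulated


def fuse_speaker(frames: List[Vector], speaker_embedding: Vector) -> List[Vector]:
    """Append the speaker embedding to every frame (assumed fusion scheme)."""
    return [frame + list(speaker_embedding) for frame in frames]


# Three phoneme vectors with predicted durations of 2, 0 and 3 frames.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
frames = length_regulate(features, [2, 0, 3])
fused = fuse_speaker(frames, [0.5])
```

Because every frame is produced in one pass from the predicted durations, no frame depends on a previously generated frame, which is the property that lets non-autoregressive systems synthesize in parallel.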

Description

Technical field

[0001] The present disclosure belongs to the technical field of speech synthesis and relates to a lightweight multi-speaker speech synthesis system and electronic equipment.

Background

[0002] In recent years, neural network-based end-to-end speech synthesis systems have surpassed traditional statistical parametric speech synthesis systems in both system architecture and generated speech quality. End-to-end speech synthesis systems, such as the Tacotron2 system and the Transformer text-to-speech system (Transformer TTS system for short), use neural networks to convert text directly into the corresponding speech, eliminating the need for extensive text front-end processing, extraction of various linguistic features, and complex domain expertise.

[0003] However, most current mainstream end-to-end speech synthesis systems use an attention mechanism to implicitly learn the text-to-speech alignment, which brings a huge...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L13/08, G10L25/18, G10L25/30
CPC: G10L13/08, G10L25/18, G10L25/30
Inventor: 李琳, 李松, 洪青阳
Owner: XIAMEN UNIV