Lightweight multi-speaker voice synthesis system and electronic equipment

A lightweight speech synthesis technology, applied in the fields of speech synthesis, speech analysis, and instruments. It addresses the heavy computation and slow synthesis speed of existing systems by reducing the number of model parameters and speeding up synthesis.

Active Publication Date: 2020-12-25
XIAMEN UNIV
6 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

[0005] In addition, most existing text-to-speech systems can synthesize only a single speaker's speech in a single style, and the few systems that support multi-speaker synthesis suffer from slow synthesis speed, heavy computation, and high memory consumption.

Examples

Embodiment Construction

[0037] In implementing the technical concept of the present disclosure, the inventors identified the following technical problems in the prior art: (1) Most existing end-to-end speech synthesis systems are autoregressive generative models that learn the text-to-speech alignment through an attention mechanism; their synthesis speed is slow, which degrades the user experience of deployed products. (2) The non-autoregressive model FastSpeech extracts text features with a self-attention mechanism whose computational complexity is quadratic in the total length of the input text, so both computation and memory consumption are high. (3) FastSpeech can currently synthesize the speech of only a single speaker and introduces no prosody-related speech information, which limits the personalized characteristics of the speech synthesis system and the expressiven...
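The quadratic cost mentioned in point (2) can be illustrated with a minimal sketch. It is not FastSpeech's implementation: the function `attention_scores` is a hypothetical helper that uses the raw feature vectors as both queries and keys, with no learned projections, scaling, or softmax. It only shows that one score is computed for every pair of positions, so an input of length n yields n × n scores.

```python
from typing import List

Vector = List[float]


def attention_scores(features: List[Vector]) -> List[List[float]]:
    """Unscaled dot-product score between every pair of positions.

    Illustrative assumption: the features act as both queries and keys.
    """
    return [
        [sum(a * b for a, b in zip(query, key)) for key in features]
        for query in features
    ]


# Three positions produce a 3 x 3 score matrix (9 scores).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = attention_scores(features)
print(len(scores), len(scores[0]))

# Doubling the sequence length quadruples the number of scores.
for n in (100, 200, 400):
    print(n, n * n)
```

Because the score matrix grows with the square of the input length, both the arithmetic and the memory needed to hold it grow the same way, which is the cost the paragraph above attributes to self-attention.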

Abstract

The invention discloses a lightweight multi-speaker voice synthesis system and electronic equipment. The system comprises a text feature extraction and normalization module, a speaker feature extraction module, a feature fusion module, and a voice generation module. The text feature extraction and normalization module encodes the to-be-processed text information and extracts its features with a lightweight encoder, predicts the duration of each word or phoneme corresponding to the deep text features output by the encoder with a lightweight duration prediction network, and performs length normalization to obtain regular text features whose length equals that of the target Mel spectrum. The speaker feature extraction module generates features that represent the target speaker. The feature fusion module fuses the target speaker's features with the regular text features. The voice generation module performs deep feature extraction, dimension mapping, and residual integration on the fused features and generates the voice. The system supports multi-speaker voice synthesis and synthesizes quickly.
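The length normalization and feature fusion steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `regulate_length` and `fuse_speaker` are hypothetical, durations are assumed to be integer counts of Mel frames, and fusion is assumed to be simple concatenation, since the abstract does not specify the fusion operation.

```python
from typing import List

Vector = List[float]


def regulate_length(phoneme_features: List[Vector], durations: List[int]) -> List[Vector]:
    """Repeat each phoneme feature by its predicted duration in Mel frames."""
    if len(phoneme_features) != len(durations):
        raise ValueError("one duration is required per phoneme feature")
    frames: List[Vector] = []
    for feature, n_frames in zip(phoneme_features, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        frames.extend(list(feature) for _ in range(n_frames))
    return frames


def fuse_speaker(frames: List[Vector], speaker_embedding: Vector) -> List[Vector]:
    """Append the same speaker vector to every frame (assumed fusion: concatenation)."""
    return [frame + list(speaker_embedding) for frame in frames]


# Three phonemes with predicted durations of 2, 1, and 3 frames.
phoneme_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 1, 3]
speaker_embedding = [1.0, -1.0]  # hypothetical target-speaker feature

regular = regulate_length(phoneme_features, durations)
fused = fuse_speaker(regular, speaker_embedding)
print(len(regular), len(fused[0]))
```

The regulated sequence has as many frames as the sum of the predicted durations, which is how its length can match the target Mel spectrum; the fused frames then carry both text and speaker information into the voice generation module.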

Description

Technical field

[0001] The disclosure belongs to the technical field of speech synthesis and relates to a lightweight multi-speaker speech synthesis system and electronic equipment.

Background

[0002] In recent years, neural-network-based end-to-end speech synthesis systems have surpassed traditional statistical parametric speech synthesis systems in both system architecture and generated speech quality. End-to-end systems such as Tacotron2 and the Transformer text-to-speech system (Transformer TTS for short) use a neural network to convert text directly into the corresponding speech, and no longer require extensive text front-end processing, extraction of various linguistic features, or complex domain expertise.

[0003] However, most current mainstream end-to-end speech synthesis systems use an attention mechanism to implicitly learn the text-to-speech alignment, which brings a huge amount of ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/08, G10L25/18, G10L25/30
CPC: G10L13/08, G10L25/18, G10L25/30
Inventors: 李琳, 李松, 洪青阳
Owner: XIAMEN UNIV