Speech synthesis device supporting multiple speaker styles, language switching, and controllable prosody
A speech synthesis and speaker technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the inability to decouple and separate multiple speakers, the inability to synthesize mixed-language speech, and the limited, single-dimension control over synthesized speech. Its effects are richer functionality, lower deployment cost, and improved fault tolerance.
Embodiment
[0114] The present invention was tested on a dataset of 32,500 audio clips with corresponding text and prosody annotations from six speakers: 30,000 in Chinese, 2,000 in English, and 500 in mixed Chinese and English. The following preprocessing was applied to the dataset:
[0115] 1) Extract the Chinese and English phoneme files and the corresponding audio, and use the open-source tool Montreal Forced Aligner to obtain the pronunciation duration of each phoneme.
[0116] 2) Extract an 80-dimensional mel spectrogram from each audio clip, using a window size of 50 ms and a frame shift of 12.5 ms.
[0117] 3) Extract the pitch of each audio clip using the WORLD vocoder.
[0118] 4) Sum the mel spectrogram of each audio clip across its 80 mel dimensions to obtain the energy of each frame.
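The four preprocessing steps can be sketched in Python as below. This is a minimal, hypothetical sketch under stated assumptions, not the patent's implementation: all function names are illustrative; the sample rate is left to the caller because the source does not give one; the aligner output is assumed to be already parsed into `(phone, start_s, end_s)` tuples; and the third-party packages `librosa` (mel spectrogram) and `pyworld` (Python bindings for WORLD) are assumed to be installed. The source also does not say whether the mel spectrogram is power or magnitude, or log-compressed, so librosa's default power spectrogram without log compression is assumed.

```python
# Hypothetical sketch of the preprocessing in [0115]-[0118]; names are illustrative.
WIN_MS = 50.0   # analysis window, from [0116]
HOP_MS = 12.5   # frame shift, from [0116]
N_MELS = 80     # mel dimensions, from [0116]


def frame_params(sample_rate, win_ms=WIN_MS, hop_ms=HOP_MS):
    """Convert window length and frame shift from milliseconds to samples."""
    win = round(sample_rate * win_ms / 1000.0)
    hop = round(sample_rate * hop_ms / 1000.0)
    return win, hop


def phone_durations_to_frames(intervals, hop_ms=HOP_MS):
    """Step 1: map aligner intervals [(phone, start_s, end_s), ...] to frame counts.

    Each boundary is snapped to the frame grid before differencing, so the
    counts sum to the rounded length of the aligned span instead of drifting.
    """
    hop_s = hop_ms / 1000.0
    return [(phone, round(end / hop_s) - round(start / hop_s))
            for phone, start, end in intervals]


def mel_spectrogram(y, sample_rate, n_mels=N_MELS):
    """Step 2: mel spectrogram shaped (frames, n_mels); assumes librosa.

    Uses librosa's default power (2.0) spectrogram with no log compression,
    since the source specifies neither.
    """
    import librosa  # assumed third-party dependency

    win, hop = frame_params(sample_rate)
    n_fft = 1 << (win - 1).bit_length()  # smallest power of two >= window length
    mel = librosa.feature.melspectrogram(
        y=y, sr=sample_rate, n_fft=n_fft, hop_length=hop,
        win_length=win, n_mels=n_mels)
    return mel.T  # librosa returns (n_mels, frames); transpose to time-major


def world_pitch(y, sample_rate, hop_ms=HOP_MS):
    """Step 3: F0 contour with WORLD via pyworld (assumed installed).

    frame_period is set to the mel frame shift so pitch frames are spaced like
    the mel frames; the two lengths can still differ slightly and may need trimming.
    """
    import numpy as np
    import pyworld as pw  # assumed third-party dependency

    x = np.asarray(y, dtype=np.float64)  # pyworld expects float64 samples
    f0, t = pw.dio(x, sample_rate, frame_period=hop_ms)  # coarse F0 estimate
    return pw.stonemask(x, f0, t, sample_rate)  # refined F0


def frame_energy(mel):
    """Step 4: sum each frame's values over the mel dimension."""
    return [float(sum(frame)) for frame in mel]
```

Snapping each phoneme boundary to the 12.5 ms grid before subtracting, rather than rounding each duration on its own, keeps the per-phoneme frame counts from drifting away from the total length, which matters if the durations must line up with the mel frames from step 2.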