Clock level variational encoder based on attention
An attention- and duration-based technology, applied in instruments, speech analysis, biological neural network models, and related fields, that addresses problems such as ineffective prosody modeling and a lack of expressiveness in synthesized speech.
Embodiment Construction
[0025] Text-to-speech (TTS) models commonly used by speech synthesis systems are typically given only a textual input at run time, without any reference acoustic representation. To produce synthetic speech that sounds realistic, they must therefore impute many linguistic factors that the textual input does not provide. A subset of these linguistic factors is collectively referred to as prosody, and can include intonation (pitch variation), stress (stressed versus unstressed syllables), duration of sounds, loudness, pitch, rhythm, and speaking style. Prosody may indicate the emotional state of the speech, the form of the speech (e.g., statement, question, command), the presence of irony or sarcasm in the speech, uncertainty in the knowledge of the speech, or other linguistic elements that cannot be encoded by the grammar or vocabulary choice of the input text. Thus, a given text input associated with a high degree of prosodic variability can produce synthetic speech with local variations in pit...
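To make the prosodic factors listed above (duration, pitch, loudness) concrete, the following is a minimal sketch of summarizing them per syllable from frame-level acoustic tracks. It assumes that a fundamental-frequency (F0) track, a frame energy track, and syllable boundaries (as frame-index spans) are already available, and it assumes a 5 ms frame shift. The names `summarize_prosody` and `SyllableProsody` are hypothetical and are not taken from the patent; this is an illustration of the kind of per-syllable prosodic features a TTS model must impute, not the patented method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed analysis hop between frames, in milliseconds (hypothetical value).
FRAME_SHIFT_MS = 5.0


@dataclass
class SyllableProsody:
    duration_ms: float   # syllable duration
    mean_f0_hz: float    # mean pitch over voiced frames; 0.0 if fully unvoiced
    mean_energy: float   # mean frame energy, a proxy for loudness


def summarize_prosody(
    f0: Sequence[float],
    energy: Sequence[float],
    boundaries: Sequence[Tuple[int, int]],
    frame_shift_ms: float = FRAME_SHIFT_MS,
) -> List[SyllableProsody]:
    """Summarize duration, pitch, and loudness for each [start, end) syllable span.

    F0 values <= 0 are treated as unvoiced frames and excluded from the pitch mean.
    """
    if len(f0) != len(energy):
        raise ValueError("f0 and energy must have the same number of frames")
    summaries = []
    for start, end in boundaries:
        if not 0 <= start < end <= len(f0):
            raise ValueError(f"invalid syllable span ({start}, {end})")
        frames = end - start
        voiced = [v for v in f0[start:end] if v > 0.0]
        mean_f0 = sum(voiced) / len(voiced) if voiced else 0.0
        mean_energy = sum(energy[start:end]) / frames
        summaries.append(SyllableProsody(frames * frame_shift_ms, mean_f0, mean_energy))
    return summaries
```

For example, two three-frame syllables whose second one is higher in pitch would show a rising contour, the kind of local pitch variation that can distinguish a question from a statement even when the text is identical:

```python
syllables = summarize_prosody(
    f0=[0.0, 120.0, 130.0, 0.0, 200.0, 210.0],
    energy=[0.1, 0.5, 0.7, 0.2, 0.9, 0.8],
    boundaries=[(0, 3), (3, 6)],
)
```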