Clock level variational encoder based on attention

A technique based on attention and duration prediction, applied in instruments, speech analysis, biological neural network models, etc., that addresses problems such as ineffective prosody modeling and the resulting lack of expressiveness in synthesized speech.

Pending Publication Date: 2022-07-12

GOOGLE LLC

Problems solved by technology

While traditional concatenative and parametric synthesis models are able to provide intelligible speech, and recent advances in neural modeling of speech have significantly improved the naturalness of synthesized speech, most existing TTS models are ineffective at modeling prosody, resulting in a lack of expressiveness in the synthesized speech used by important applications.

Embodiment Construction

[0025] Text-to-speech (TTS) models commonly used by speech synthesis systems are typically given only a textual input at run-time, without any reference acoustic representation, and must introduce many linguistic factors that are not provided by the textual input in order to produce synthetic speech that sounds realistic. A subset of these linguistic factors is collectively referred to as prosody and can include intonation (pitch changes), stress (stressed versus unstressed syllables), duration of sounds, loudness, pitch, rhythm, and speaking style. Prosody may indicate the emotional state of the speech, the form of the speech (e.g., statement, question, command, etc.), the presence of irony or sarcasm in the speech, uncertainty in the knowledge of the speech, or other linguistic elements that cannot be encoded by the grammar or vocabulary choices of the input text. Thus, a given text input associated with a high degree of prosodic variability can produce synthetic speech with local variations in pit...

Abstract

A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240), and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230), and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method further includes predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) of the syllable based on attention, by an attention mechanism (340), to linguistic features (222) of each phoneme of the syllable, and generating a plurality of fixed-length predicted frames (260) based on the predicted duration of the syllable.
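
The per-syllable step in the abstract can be pictured with a small toy sketch. It is not the patented decoder: the dot-product attention, the softplus "duration head", the 100 ms scale, and the 5 ms frame length are all assumptions made for illustration, as are the names `attend` and `predict_frames`. It only shows the shape of the computation: a syllable embedding attends over the linguistic features of the syllable's phonemes, a duration is predicted from the result, and that duration determines how many fixed-length frames are emitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(syllable_embedding, phoneme_features):
    """Dot-product attention of one syllable embedding over its phonemes.

    Returns the attention weights and the weighted sum (context vector)
    of the per-phoneme linguistic feature vectors.
    """
    scores = [
        sum(q * f for q, f in zip(syllable_embedding, feats))
        for feats in phoneme_features
    ]
    weights = softmax(scores)
    dim = len(phoneme_features[0])
    context = [
        sum(w * feats[i] for w, feats in zip(weights, phoneme_features))
        for i in range(dim)
    ]
    return weights, context


def predict_frames(syllable_embedding, phoneme_features, frame_ms=5.0):
    """Predict a syllable duration and lay out fixed-length frames.

    Hypothetical duration head: softplus of the embedding/context dot
    product, scaled by an assumed 100 ms. A trained model would learn
    this mapping instead.
    """
    _, context = attend(syllable_embedding, phoneme_features)
    raw = sum(e * c for e, c in zip(syllable_embedding, context))
    duration_ms = 100.0 * math.log1p(math.exp(raw))  # softplus keeps it positive
    num_frames = max(1, round(duration_ms / frame_ms))
    # Each frame is represented here only by its start time in milliseconds.
    frame_starts = [i * frame_ms for i in range(num_frames)]
    return duration_ms, frame_starts


if __name__ == "__main__":
    # One syllable with three phonemes, each with a 3-dim feature vector.
    phonemes = [[0.2, 0.1, 0.0], [0.5, 0.3, 0.1], [0.1, 0.4, 0.2]]
    embedding = [0.3, -0.1, 0.8]
    duration, frames = predict_frames(embedding, phonemes)
    print(f"predicted duration: {duration:.1f} ms, frames: {len(frames)}")
```

In this sketch the number of frames varies with the predicted duration while each frame keeps the same length, which is the relationship the abstract describes between the syllable duration (238) and the fixed-length predicted frames (260).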

Description

Technical field

[0001] The present disclosure relates to an attention-based clock-level variational encoder.

Background

[0002] Speech synthesis systems use a text-to-speech (TTS) model to generate speech from textual input. The generated/synthesized speech should accurately convey the message (intelligibility) while sounding like human speech (naturalness) with the intended prosody (expressiveness). While traditional concatenative and parametric synthesis models are able to provide intelligible speech, and recent advances in neural modeling of speech have significantly improved the naturalness of synthesized speech, most existing TTS models are ineffective at modeling prosody, resulting in a lack of expressiveness in the synthesized speech used by important applications. For example, for applications such as conversational assistants and long-form readers, it is desirable to generate realistic speech from prosodic features that are not conveyed in the input text...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L13/10, G06N3/04, G10L13/047

CPC: G10L13/10, G10L13/047, G10L2013/105, G06N3/084, G06N3/047, G06N3/044, G06N3/045, G10L25/30

Inventor: Robert Clark, Chun-an Chan, Vincent Wan

Owner: GOOGLE LLC
