85 results about "Text to speech synthesis" patented technology

Text-to-speech (TTS) is a speech synthesis application that produces a spoken version of the text in a computer document, such as a help file or a web page.

System for handling frequently asked questions in a natural language dialog service

Inactive | US7197460B1 | Improve customer relationship | Efficient manner | Speech recognition | Input/output processes for data processing | Spoken language | Dialog management

A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.

Owner:NUANCE COMM INC
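
The module chain in this abstract (recognition, understanding, dialog management, synthesis, plus a separate FAQ path that switches voice) can be pictured with a small sketch. Everything here is hypothetical: the `FAQ_PROMPTS` table, the voice names, and the keyword matching are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass

# Hypothetical canned answers; the abstract does not list any.
FAQ_PROMPTS = {
    "opening hours": "We are open from 9 to 5 on weekdays.",
}


@dataclass
class Response:
    text: str
    voice: str  # name of the synthetic voice the TTS module should use


def understand(transcript):
    """Toy understanding step: lowercase the text and drop punctuation."""
    kept = (ch for ch in transcript.lower() if ch.isalnum() or ch == " ")
    return "".join(kept).strip()


def respond(transcript):
    """Answer FAQ matches with a predetermined prompt in a different voice."""
    meaning = understand(transcript)
    for key, prompt in FAQ_PROMPTS.items():
        if key in meaning:
            return Response(text=prompt, voice="faq_voice")
    # Otherwise fall back to a generic dialog-manager reply in the default voice.
    return Response(text="Could you tell me more about your issue?",
                    voice="default_voice")
```

Calling `respond("What are your opening hours?")` takes the FAQ path, while an unrelated utterance falls through to the default voice.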

Method and system for text-to-speech synthesis with personalized voice

Active | US20080235024A1 | Speech synthesis | Personalization | Data set

A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).

Owner:CERENCE OPERATING CO

Systems and methods for selective text to speech synthesis

Active | US20100082349A1 | Easy to convert | Speech synthesis | Text string | Text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Systems and methods for text to speech synthesis

Active | US20100082346A1 | Easy to convert | Speech synthesis | Natural language processing | Text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Systems and methods for text normalization for text to speech synthesis

Active | US8355919B2 | Easy to convert | Special data processing applications | Speech synthesis | Text to speech synthesis | Text string

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Systems and methods for text to speech synthesis

Active | US8352272B2 | Easy to convert | Speech synthesis | Text to speech synthesis | Text string

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Systems and methods for concatenation of words in text to speech synthesis

Active | US8396714B2 | Easy to convert | Speech recognition | Speech synthesis | Text string | Text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Systems and methods for selecting from multiple phonectic transcriptions for text-to-speech synthesis

Active | US7869999B2 | Quality improvement | Reduce in quantity | Speech synthesis | End system | Text to speech synthesis

A system and method for generating synthetic speech operates in a computer-implemented Text-To-Speech system. The system comprises at least a speaker database that has been previously created from user recordings, a Front-End system to receive an input text, and a Text-To-Speech engine. The Front-End system generates multiple phonetic transcriptions for each word of the input text, and the TTS engine uses a cost function to select which phonetic transcription is the most appropriate for searching the speech segments within the speaker database to be concatenated and synthesized.

Owner:CERENCE OPERATING CO
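
As a rough illustration of selecting among several transcriptions with a cost function, the sketch below scores each candidate by how well its phonemes are represented in a speaker database. The cost formula, the `speaker_db` layout (phoneme to recorded-unit count), and the penalty value are assumptions made for this example; the abstract does not define the actual cost function.

```python
MISSING_UNIT_PENALTY = 10.0  # assumed penalty for a phoneme with no recordings


def unit_cost(phoneme, speaker_db):
    """Cheaper when the speaker database holds more recordings of the phoneme."""
    count = speaker_db.get(phoneme, 0)
    return 1.0 / count if count else MISSING_UNIT_PENALTY


def select_transcription(candidates, speaker_db):
    """Return the candidate phoneme sequence with the lowest total cost."""
    return min(candidates,
               key=lambda seq: sum(unit_cost(p, speaker_db) for p in seq))


# Hypothetical counts of recorded units per phoneme.
speaker_db = {"t": 5, "ah": 2, "m": 4, "ey": 3, "ow": 4}
candidates = [
    ["t", "ah", "m", "ey", "t", "ow"],
    ["t", "ah", "m", "aa", "t", "ow"],  # "aa" is absent from the database
]
best = select_transcription(candidates, speaker_db)
```

Here the first candidate wins because the second one needs a phoneme the database cannot supply.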

Voice recording tool for creating database used in text to speech synthesis system

Inactive | US7890330B2 | Reading | Navigation instruments | Syllable | Greedy algorithm

A method records verbal expressions of a person for use in a vehicle navigation system. The vehicle navigation system has a database including a map and text describing street names and points of interest of the map. The method includes the steps of obtaining from the database text of a word having at least one syllable, analyzing the syllable with a greedy algorithm to construct at least one text phrase comprising each syllable, such that the number of phrases is substantially minimized, converting the text phrase to at least one corresponding phonetic symbol phrase, displaying to the person the phonetic symbol phrase, the person verbally expressing each phrase of the phonetic symbol phrase, and recording the verbal expression of each phrase of the phonetic symbol phrase.

Owner:ALPINE ELECTRONICS INC
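
One common way to keep the number of recorded phrases small is a greedy set-cover heuristic: repeatedly pick the phrase that covers the most syllables not yet covered. The sketch below assumes the candidate phrases and their syllable sets are already known; the names and the syllabification are invented for illustration, and the patent's own greedy procedure may differ.

```python
def greedy_phrase_cover(phrases):
    """phrases maps a phrase name to its set of syllables.

    Returns the names chosen, in order, so that every syllable is covered.
    """
    uncovered = set().union(*phrases.values())
    chosen = []
    while uncovered:
        # Pick the phrase that covers the most still-uncovered syllables.
        best = max(phrases, key=lambda name: len(phrases[name] & uncovered))
        chosen.append(best)
        uncovered -= phrases[best]
    return chosen


# Hypothetical street names with hand-made syllable sets.
phrases = {
    "Madison": {"mad", "i", "son"},
    "Addison": {"ad", "i", "son"},
    "Madrid": {"mad", "rid"},
    "Ison": {"i", "son"},
}
to_record = greedy_phrase_cover(phrases)
```

In this example "Ison" is never selected, because its syllables are already covered by phrases chosen earlier.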

Text to speech synthesis

Active | US20090076819A1 | Fast way | Easy procedure | Speech synthesis | Input language | Iteration loop

An input linguistic description is converted into a speech waveform by deriving at least one target unit sequence corresponding to the linguistic description, selecting from a waveform unit database for the target unit sequences a plurality of alternative unit sequences approximating the target unit sequences, concatenating the alternative unit sequences to alternative speech waveforms and presenting the alternative speech waveforms to an operating person and enabling the choice of one of the presented alternative speech waveforms. There are no iterative cycles of manual modification and automatic selection, which enables a fast way of working. The operator does not need knowledge of units, targets, and costs, but chooses from a set of given alternatives. The fine-tuning of TTS prompts therefore becomes accessible to non-experts.

Owner:CERENCE OPERATING CO

Systems and methods for speech preprocessing in text to speech synthesis

Inactive | US20100082328A1 | Easy to convert | Natural language data processing | Special data processing applications | Natural language processing | Text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Systems and methods of detecting language and natural language strings for text to speech synthesis

Active | US20100082329A1 | Easy to convert | Natural language data processing | Speech recognition | Text string | Text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis

Inactive | US8719006B2 | Natural language data processing | Special data processing applications | Part of speech | Text to speech synthesis

In response to a word of a text sequence, a first part-of-speech (POS) tag is generated using a statistical part-of-speech (POS) tagger based on a corpus of trained text sequences, each representing a likely POS of a word for a given text sequence. A second POS tag is generated using a rule-based POS tagger based on a set of one or more rules associated with a type of an application associated with the text sequence. A final POS tag is assigned to the word of the text sequence for TTS synthesis based on the first POS tag and the second POS tag.

Owner:APPLE INC
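
A minimal sketch of combining the two taggers follows. The abstract only says the final tag is based on both tags; the policy used here (an application-specific rule, when one exists, overrides the statistical tag) is an assumption, as are the toy tagger and the rule table.

```python
def combine_pos_tags(word, statistical_tagger, rules):
    """Return a final POS tag from a statistical tag and an optional rule tag."""
    first_tag = statistical_tagger(word)   # tag from the statistical tagger
    second_tag = rules.get(word.lower())   # tag from the rule set, if any
    # Assumed policy: prefer the rule-based tag when a rule applies.
    return second_tag if second_tag is not None else first_tag


def toy_statistical_tagger(word):
    """Stand-in for a trained tagger; always guesses NOUN."""
    return "NOUN"


# Hypothetical rules for an e-book reader, where "read" is usually a command.
reader_rules = {"read": "VERB"}

tag_for_read = combine_pos_tags("Read", toy_statistical_tagger, reader_rules)
tag_for_book = combine_pos_tags("book", toy_statistical_tagger, reader_rules)
```

The final tag matters for TTS because homographs such as "read" are pronounced differently depending on their part of speech.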

Systems and methods for mapping phonemes for text to speech synthesis

Inactive | US20100082327A1 | Easy to convert | Special data processing applications | Speech synthesis | Text to speech synthesis | Text string

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Systems and methods for text normalization for text to speech synthesis

Active | US20100082348A1 | Easy to convert | Special data processing applications | Speech synthesis | Text to speech synthesis | Text string

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Personal message service with enhanced text to speech synthesis

Inactive | US7027568B1 | Distribution cost | Reduce manufacturing cost | Automatic call-answering/message-recording/conversation-recording | Automatic exchanges | Short Message Service | Terminal equipment

A server in a network gathers textual information, such as news items, E-mail and the like. From that information, the server develops or identifies messages for use by individual subscribers. The same server that accumulates the text messages or another server in the network converts the textual information in each message to a sequence of speech synthesizer instructions. The converted messages, containing the sequences of speech synthesizer instructions, are transmitted to each identified subscriber's terminal device. A synthesizer in the terminal generates an audio waveform signal, representing the speech information, in response to the instructions. In the preferred embodiment, the terminals utilize concatenative type speech synthesizers, each of which has an associated vocabulary of stored fundamental sound samples. The instructions identify the sound samples, in order. The instructions also provide parameters for controlling characteristics of the signal generated during waveform synthesis for each sound sample in each sequence. For example, the instructions may specify the pitch, duration, amplitude, attack envelope and decay envelope for each sample. The division of the text to speech synthesis processing between the server and the terminals places the cost of the front end processing in the server, which is a shared resource. As a result, the hardware and software of the terminal may be relatively simple and inexpensive. Also, it is possible to upgrade the quality of the synthesis by upgrading the server software, without modifying the terminals.

Owner:GOOGLE LLC
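
The server/terminal split described above can be pictured as the server emitting a list of small instruction records and the terminal consuming them. The record below carries the parameters the abstract names (pitch, duration, amplitude, attack and decay envelopes); the field names, units, default values, and the tiny `LEXICON` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class SynthInstruction:
    sample_id: str      # which stored sound sample the terminal should play
    pitch_hz: float
    duration_ms: int
    amplitude: float    # 0.0 to 1.0
    attack_ms: int
    decay_ms: int


# Hypothetical server-side lexicon mapping words to sound-sample identifiers.
LEXICON = {"hello": ["HH", "EH", "L", "OW"]}


def server_convert(text):
    """Front-end work done once on the shared server: text to instructions."""
    instructions = []
    for word in text.lower().split():
        for sample_id in LEXICON.get(word, []):
            instructions.append(
                SynthInstruction(sample_id, 120.0, 80, 0.8, 5, 10))
    return instructions


def terminal_total_duration_ms(instructions):
    """The terminal only needs to walk the instructions in order."""
    return sum(item.duration_ms for item in instructions)


message = server_convert("Hello")
```

Because the terminal only interprets these records, improving the server's text analysis changes the output quality without touching terminal software, which is the benefit the abstract points to.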

Systems and methods for concatenation of words in text to speech synthesis

Active | US20100082347A1 | Easy to convert | Speech recognition | Speech synthesis | Text string | Text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Voice-enabled dialog system

Active | US7869998B1 | Improve customer relationship | Efficient manner | Speech recognition | Special data processing applications | Spoken language | Dialog management

A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.

Owner:NUANCE COMM INC

Method for facilitating text to speech synthesis using a differential vocoder

Inactive | US20070106513A1 | High memory requirement | Effectively prime | Speech synthesis | Ademetionine | Text to speech synthesis

A text to speech system (100) uses differential voice coding (230, 416) to compress a database of digitized speech waveform segments (210). A seed waveform (535) is used to precondition each speech waveform prior to encoding which, upon encoding, provides a seeded preconditioned encoded speech token (550). The seed portion (541) may be removed and the preconditioned encoded speech token portion (542) may be stored in a database for text to speech synthesis. When speech is to be synthesized, upon requesting the appropriate speech waveform for the present sound to be produced, the seed portion is prepended to the preconditioned encoded speech token for differential decoding.

Owner:MOTOROLA INC
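
The seed idea can be illustrated with plain delta coding standing in for the differential vocoder, which is a deliberate simplification. A fixed seed is placed in front of each waveform before encoding, the seed's share of the encoded tokens is dropped before storage, and it is put back before decoding. The `SEED` values and function names are hypothetical.

```python
SEED = [0, 4, 8]  # assumed shared seed waveform


def delta_encode(samples):
    """Encode each sample as its difference from the previous one."""
    tokens, prev = [], 0
    for sample in samples:
        tokens.append(sample - prev)
        prev = sample
    return tokens


def delta_decode(tokens):
    """Rebuild samples by accumulating the differences."""
    samples, prev = [], 0
    for token in tokens:
        prev += token
        samples.append(prev)
    return samples


def encode_for_storage(waveform):
    """Precondition with the seed, encode, then drop the seed portion."""
    tokens = delta_encode(SEED + waveform)
    return tokens[len(SEED):]


def decode_from_storage(stored_tokens):
    """Put the encoded seed back in front, decode, then strip the seed."""
    seed_tokens = delta_encode(SEED)
    samples = delta_decode(seed_tokens + stored_tokens)
    return samples[len(SEED):]


stored = encode_for_storage([10, 12, 9])
restored = decode_from_storage(stored)
```

Only the non-seed tokens are stored per segment, yet the decoder always starts from the same known state, so the round trip recovers the original samples.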

Methods and apparatus related to pruning for concatenative text-to-speech synthesis

Inactive | US20080091428A1 | Speech recognition | Speech synthesis | Singular value decomposition | Characteristic space

The present invention provides, among other things, automatic identification of near-redundant units in a large TTS voice table, identifying which units are distinctive enough to keep and which units are sufficiently redundant to discard. According to an aspect of the invention, pruning is treated as a clustering problem in a suitable feature space. All instances of a given unit (e.g. word or characters expressed as Unicode strings) are mapped onto the feature space and clustered in that space using a suitable similarity measure. Since all units in a given cluster are, by construction, closely related from the point of view of the measure used, they are suitably redundant and can be replaced by a single instance. The disclosed method can detect near-redundancy in TTS units in a completely unsupervised manner, based on an original feature extraction and clustering strategy. Each unit can be processed in parallel, and the algorithm is totally scalable, with a pruning factor determinable by a user through the near-redundancy criterion. In an exemplary implementation, a matrix-style modal analysis via Singular Value Decomposition (SVD) is performed on the matrix of the observed instances for the given word unit, resulting in each row of the matrix associated with a feature vector, which can then be clustered using an appropriate closeness measure. Pruning is achieved by mapping each instance to the centroid of its cluster.

Owner:APPLE INC
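
To make the pruning-as-clustering idea concrete, the sketch below groups instances whose feature vectors are nearly parallel and keeps one representative per group. It assumes the feature vectors have already been produced upstream (for instance by an SVD step, which is not shown), and it uses a simple "leader" clustering with the first member as representative rather than a true centroid. The threshold and the sample vectors are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prune(instances, threshold=0.95):
    """Map each instance id to the id of the representative that replaces it.

    instances maps an id to its feature vector. A higher threshold keeps
    more instances, so it plays the role of the near-redundancy criterion.
    """
    representatives = []
    mapping = {}
    for inst_id, vector in instances.items():
        for rep_id in representatives:
            if cosine(vector, instances[rep_id]) >= threshold:
                mapping[inst_id] = rep_id  # near-redundant: reuse the representative
                break
        else:
            representatives.append(inst_id)  # distinctive: start a new cluster
            mapping[inst_id] = inst_id
    return mapping


# Hypothetical feature vectors for three recordings of the same word.
mapping = prune({"a": [1.0, 0.0], "b": [0.99, 0.05], "c": [0.0, 1.0]})
```

Instance "b" is folded into "a", while "c" is distinctive enough to stay in the voice table.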

System for handling frequently asked questions in a natural language dialog service

Inactive | US20090070113A1 | Improve usability | Improve customer satisfaction | Speech recognition | Input/output processes for data processing | Spoken language | Dialog management

A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.

Owner:NUANCE COMM INC

Systems and methods for selective text to speech synthesis

Active | US8712776B2 | Easy to convert | Speech synthesis | Text string | Text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Accuracy of text-to-speech synthesis

Active | US20140222415A1 | Easy to manage | Reduce in quantity | Special data processing applications | Speech synthesis | Text to speech synthesis | Speech sound

According to a first example configuration, a pair of text-to-speech synthesizers produces audio representations for each of multiple words. The outputs are compared to identify instances in which a lexicon lookup algorithm and a grapheme-to-phoneme algorithm produce different audio representations for the same words. Results of the analysis are used to train a classifier that subsequently determines a degree to which a grapheme-to-phoneme algorithm is likely to detect a newly detected out-of-vocabulary word to be converted into an audio representation. According to a second example configuration, a text analyzer tags a non-standard word. A group of reviewers generate one or more proposed text-to-speech expansion rules for a detected non-standard word. When there is a high amount of agreement amongst the reviewers how to expand the non-standard word, the proposed expansion rule is published for use by respective one or more text-to-speech synthesizers.

Owner:CERENCE OPERATING CO
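
The second configuration, publishing an expansion rule only when reviewers largely agree, can be sketched as a simple vote. The agreement threshold of 0.8 and the function name are assumptions; the abstract only speaks of "a high amount of agreement".

```python
from collections import Counter


def publish_rule(proposals, min_agreement=0.8):
    """Return the agreed expansion for a non-standard word, or None.

    proposals is the list of expansions suggested by individual reviewers.
    """
    if not proposals:
        return None
    expansion, votes = Counter(proposals).most_common(1)[0]
    if votes / len(proposals) >= min_agreement:
        return expansion
    return None


# Hypothetical reviews for the token "Dr." in two different contexts.
agreed = publish_rule(["doctor"] * 9 + ["drive"])
disputed = publish_rule(["doctor", "drive"])
```

With nine of ten reviewers agreeing the rule is published, while an even split leaves the token without a rule.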

Method of handling frequently asked questions in a natural language dialog service

Active | US8645122B1 | Improve customer relationship | Efficient manner | Natural language data processing | Speech recognition | Spoken language | Dialog management

A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer frequently asked questions.

Owner:NUANCE COMM INC

Personalized text-to-speech synthesis and personalized speech feature extraction

Inactive | US20110165912A1 | Defect of and inflexibility | Reduce the amount of calculation | Substation equipment | Speech recognition | Personalization | Feature extraction

A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker. A personalized speech feature library of a specific speaker is established without a deliberate training process, and a text is synthesized into personalized speech with the speech characteristics of the speaker.

Owner:SONY MOBILE COMM INC +1

System and method of spoken language understanding in a spoken dialog service

Inactive | US7451089B1 | Improve usability | Improve customer satisfaction | Speech recognition | Input/output processes for data processing | Spoken language | Dialog management

A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.

Owner:NUANCE COMM INC

Method and System for a Speech Synthesis and Advertising Service

Active | US20080059189A1 | Reduce the need for computing resources | Reduce deployment | Automatic call-answering/message-recording/conversation-recording | Multiple digital computer combinations | Text to speech synthesis | Speech sound

Methods and systems for providing a network-accessible text-to-speech synthesis service are provided. The service accepts content as input. After extracting textual content from the input content, the service transforms the content into a format suitable for high-quality speech synthesis. Additionally, the service produces audible advertisements, which are combined with the synthesized speech. The audible advertisements themselves can be generated from textual advertisement content.

Owner:CHEMTRON RES
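
The flow described in the abstract above (extract textual content, prepare it for synthesis, combine it with an advertisement) might be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to be HTML, the names `TextExtractor` and `build_script` are hypothetical, placing the advertisement first is an arbitrary choice, and the actual speech synthesis step is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def build_script(html, ad_text):
    """Return ordered (kind, text) segments to hand to a synthesizer.

    The ad-before-content ordering is an assumption for illustration.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    body = " ".join(parser.chunks)
    return [("ad", ad_text), ("content", body)]


if __name__ == "__main__":
    page = "<p>Hello <b>world</b></p><script>x = 1</script>"
    print(build_script(page, "Sponsored by Example"))
```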

Prosody generation for text-to-speech synthesis based on micro-prosodic data

Inactive | US20060074678A1 | Avoids round-off error | High complexity | Speech synthesis | Original data | Text to speech synthesis

A prosody modification system for use in text-to-speech includes an input receiving a sequence of prosodic data vectors Pn, measured at time Tn, which samples a sound waveform. A prosody data warping module directly derives new prosodic data vectors Qn from the original data vectors Pn using a function, which is controlled by warping parameters A0, . . . Ak, which avoids round-off errors in deriving quantized values, which has derivatives with respect to A0, . . . Ak, Pn, and Tn that are continuous, and which has sufficiently high complexity to model intentional prosody of the sound waveform, and sufficiently low complexity to avoid modeling micro-prosody of the sound waveform. The smoothness and simplicity of the function ensure that micro-prosodic perturbations and errors in measurement of Tn are transferred directly to the output Qn. The errors are thus reversed during re-synthesis and therefore eliminated, resulting in micro-prosodic perturbations being preserved during re-synthesis.

Owner:PANASONIC CORP
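
The idea in the abstract above, a smooth, low-complexity warping function whose output carries the fine-grained perturbations of the input, can be illustrated with a small sketch. The abstract does not give the actual function, so the form used here is an assumption: each pitch value Pn is shifted by a low-order polynomial in time Tn with coefficients A0, ..., Ak. The name `warp_prosody` is hypothetical.

```python
def warp_prosody(times, pitches, coeffs):
    """Compute Qn = Pn + sum_k A_k * Tn**k for each sample.

    A polynomial offset is an assumed stand-in for the warping function:
    it is smooth in A_k, Pn, and Tn, and too simple to model small
    sample-to-sample perturbations, so those pass through to Qn unchanged.
    """
    warped = []
    for t, p in zip(times, pitches):
        offset = sum(a * t**k for k, a in enumerate(coeffs))
        warped.append(p + offset)
    return warped


if __name__ == "__main__":
    times = [0.0, 0.5, 1.0]
    smooth = [100.0, 110.0, 105.0]
    jitter = [0.5, -0.25, 0.125]  # stand-in for micro-prosodic detail
    noisy = [p + j for p, j in zip(smooth, jitter)]
    coeffs = [10.0, 20.0]  # A0, A1: raise pitch, more so later in time

    base = warp_prosody(times, smooth, coeffs)
    detailed = warp_prosody(times, noisy, coeffs)
    # The difference between the two outputs reproduces the jitter.
    print([d - b for d, b in zip(detailed, base)])
```

Because the offset depends only on time and the coefficients, any small perturbation in the input appears unchanged in the output, which is the preservation property the abstract describes.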

Systems and methods of detecting language and natural language strings for text to speech synthesis

Active | US8583418B2 | Easy to convert | Natural language data processing | Speech recognition | Text to speech synthesis | Text string

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.

Owner:APPLE INC

Method and system for intuitive text-to-speech synthesis customization

Inactive | US20050177369A1 | Speech synthesis | Graphics | Transformation of text

A system for tuning the text-to-speech conversion process has a text-to-speech engine that converts the input text into a processed text form that includes speech features. A visual editing interface displays the processed text form using graphical indicators on an output device, allowing a user to edit the text and the graphical indicators to modify the speech features of the text input.

Owner:PANASONIC CORP
