2492 results about "Speech synthesis" patented technology

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

System for handling frequently asked questions in a natural language dialog service

A voice-enabled help desk service is disclosed. The service comprises an automatic speech recognition module for recognizing speech from a user, a spoken language understanding module for understanding the output from the automatic speech recognition module, a dialog management module for generating a response to speech from the user, a natural voices text-to-speech synthesis module for synthesizing speech to generate the response to the user, and a frequently asked questions module. The frequently asked questions module handles frequently asked questions from the user by changing voices and providing predetermined prompts to answer the frequently asked question.
Owner:NUANCE COMM INC

System and method for supporting interactive user interface operations and storage medium

There is provided a system for supporting interactive operations for inputting user commands to a household electric apparatus, such as a television set/monitor, and to information apparatuses. The system applies an animated character called a personified assistant, which interacts with the user through speech synthesis and animation, realizing a user-friendly user interface while also making it possible to meet a demand for complex commands or to provide an entry point for services. Further, since the system is provided with a command system producing an effect close to human natural language, the user can easily operate the apparatus with a feeling close to ordinary human conversation.
Owner:SONY CORP

Interactive multimedia tour guide

An interactive multimedia tour guide provides a user with packaged tours in a multimedia format that includes directions and useful information about a selected tour. The packaged tours are composed of principal and ancillary points of interest. A user profile is developed which is used to generate a preference mask for the user. The preference mask is used to select only those ancillary points of interest that would be of most interest to the user. The selected tour is stored on a portable self-contained electronic system which includes a GPS navigation system and cell phone. The system includes voice recognition software and speech synthesis software to provide the user with a verbal interface that provides directions and information on various points of interest during the tour. Combined with an optional camera, the interactive multimedia tour guide allows for rapid identification and editing of pictures or videos made on a tour.
Owner:WHITHAM HLDG

Method and apparatus for delivering a virtual reality environment

This invention involves a virtual environment created through the combination of technologies. The invention employs the knowledge and experience of a Personal Assistant or Host, created through Artificial Intelligence applications, which assists and guides the user of the environment to products and/or services that they will most likely be interested in purchasing or requiring. The intelligent assistant's choices are based on its experiences with the specific user. The intelligent assistant communicates with the user by means of a speech recognition and speech synthesis device. This invention is an easy to use virtual reality environment that takes advantage of existing technologies and global communications networks such as the Internet without requiring any given degree of computer literacy. This invention includes a virtual intelligent assistant for each user which adapts to its user as it provides individualized guidance. The intelligent assistant or avatar projects human-like features and behaviors appropriate to the preferences of its user and appears as a virtual person to the user.
Owner:MISSION THE

Front-end architecture for a multi-lingual text-to-speech system

A text processing system for processing multi-lingual text for a speech synthesizer includes a first language dependent module for performing at least one of text and prosody analysis on a portion of input text comprising a first language. A second language dependent module performs at least one of text and prosody analysis on a second portion of input text comprising a second language. A third module is adapted to receive outputs from the first and second language dependent modules and performs prosodic and phonetic context abstraction over the outputs based on multi-lingual text.
Owner:MICROSOFT TECH LICENSING LLC

Method, apparatus for synthesizing speech and acoustic model training method for speech synthesis

According to one embodiment, a method and apparatus for synthesizing speech, and a method for training an acoustic model used in speech synthesis, are provided. The method for synthesizing speech may include determining data generated by text analysis as fuzzy heteronym data, performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof, generating fuzzy context feature labels based on the plurality of candidate pronunciations and probabilities thereof, determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree, generating speech parameters from the model parameters, and synthesizing the speech parameters via a synthesizer as speech.
Owner:KK TOSHIBA

Phonetic decoding and concatentive speech synthesis

A speech processing system includes a multiplexer that receives speech data input as part of a conversation turn in a conversation session between two or more users where one user is a speaker and each of the other users is a listener in each conversation turn. A speech recognizing engine converts the speech data to an input string of acoustic data while a speech modifier forms an output string based on the input string by changing an item of acoustic data according to a rule. The system also includes a phoneme speech engine for converting the first output string of acoustic data including modified and unmodified data to speech data for output via the multiplexer to listeners during the conversation turn.
Owner:CERENCE OPERATING CO

Text-to-speech method and system, computer program product therefor

A text-to-speech system adapted to operate on text in a first language including sections in a second language includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.
Owner:CERENCE OPERATING CO
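
The mapping step in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the phoneme symbols, the PHONEME_MAP table, the "L1"/"L2" language tags, and the function name map_to_first_language are all hypothetical, and a real system would likely derive the mapping from phonetic similarity rather than from a hand-written table.

```python
# Hypothetical table: each second-language phoneme maps to a set (here an
# ordered tuple) of first-language phonemes that approximates it.
PHONEME_MAP = {
    "y": ("i", "u"),  # one foreign vowel approximated by two native vowels
    "R": ("r",),
    "a": ("a",),
}


def map_to_first_language(stream):
    """Merge a mixed-language phoneme stream into first-language phonemes.

    `stream` is a list of (language_tag, phoneme) pairs, assumed to come from
    an upstream grapheme/phoneme transcriptor.
    """
    result = []
    for lang, phoneme in stream:
        if lang == "L1":
            # First-language phonemes pass through unchanged.
            result.append(phoneme)
        else:
            # Assumption: keep the symbol as-is when no mapping is known.
            result.extend(PHONEME_MAP.get(phoneme, (phoneme,)))
    return result


mixed = [("L1", "t"), ("L1", "e"), ("L2", "R"), ("L2", "y"), ("L1", "s")]
print(map_to_first_language(mixed))
```

The single output list stands in for the "resulting stream of phonemes" that the abstract says is fed to the speech-synthesis module.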

Text structure for voice synthesis, voice synthesis method, voice synthesis apparatus, and computer program thereof

Inactive | US7487093B2 | Continuously and easily change | Sound input/output | Speech synthesis | Morphing | Synthesis methods
In a voice synthesis apparatus, a desired range of the input text is bounded by, e.g., a start tag <morphing type="emotion" start="happy" end="angry"> and an end tag </morphing>. When the synthetic voice is output, its character changes continuously and gradually over that range, from a happy voice to an angry voice.
Owner:CANON KK
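
As a rough sketch of how such a tagged range might be turned into a gradual change, the snippet below parses the morphing tag from the abstract and assigns each word a pair of blend weights. Several things here are assumptions rather than details from the patent: the tags are taken to use plain straight quotes, the blend is linear, and it advances once per word; the names MORPH_RE and morph_weights are hypothetical.

```python
import re

# Matches one <morphing ...>...</morphing> range with straight-quoted attributes.
MORPH_RE = re.compile(
    r'<morphing type="(?P<type>\w+)" start="(?P<start>\w+)" end="(?P<end>\w+)">'
    r"(?P<body>.*?)</morphing>",
    re.DOTALL,
)


def morph_weights(text):
    """Return (word, weights) pairs for the first morphing range in `text`.

    The weights move linearly from 100% start style on the first word
    to 100% end style on the last word.
    """
    match = MORPH_RE.search(text)
    if match is None:
        return []
    words = match.group("body").split()
    count = len(words)
    result = []
    for index, word in enumerate(words):
        t = index / (count - 1) if count > 1 else 0.0
        result.append((word, {match.group("start"): 1.0 - t, match.group("end"): t}))
    return result


sample = '<morphing type="emotion" start="happy" end="angry">one two three</morphing>'
for word, weights in morph_weights(sample):
    print(word, weights)
```

A synthesizer could use the weights to interpolate between two sets of voice parameters; how the parameters are actually blended is outside this sketch.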

Method and apparatus for speech synthesis of text message

Provided is a method and apparatus for speech synthesis of a text message. The method includes receiving input of voice parameters for a text message, storing each of the text message and the input voice parameters in a data packet, and transmitting the data packet to a receiving terminal.
Owner:SAMSUNG ELECTRONICS CO LTD
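
One simple way to picture the data packet is a small serialized record that carries the message and the voice parameters together. The sketch below assumes a JSON encoding and hypothetical field names ("text", "voice", "pitch", "rate", "gender"); the abstract does not specify the packet format, so none of this should be read as the patented layout.

```python
import json


def build_packet(text, voice_params):
    """Sender side: store the text message and its voice parameters in one packet."""
    return json.dumps({"text": text, "voice": voice_params}).encode("utf-8")


def read_packet(packet):
    """Receiver side: recover the text and the voice parameters for synthesis."""
    data = json.loads(packet.decode("utf-8"))
    return data["text"], data["voice"]


packet = build_packet("See you at 6", {"pitch": 1.2, "rate": 0.9, "gender": "female"})
text, voice = read_packet(packet)
print(text, voice)
```

The receiving terminal would then pass both values to its own speech synthesizer, so the message is spoken with the voice settings the sender chose.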

Dynamically Changing Voice Attributes During Speech Synthesis Based upon Parameter Differentiation for Dialog Contexts

A method of speech synthesis can include automatically identifying spoken passages within a text source and converting the text source to speech by applying different voice configurations to different portions of text within the text source according to whether each portion of text was identified as a spoken passage. The method further can include identifying the speaker and/or the gender of the speaker and applying different voice configurations according to the speaker identity and/or speaker gender.
Owner:CERENCE OPERATING CO
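
A minimal sketch of the first step, assuming spoken passages are simply the text between straight double quotes, might look like this. The voice configurations NARRATOR and DIALOG, their fields, and the function name segment are hypothetical, and speaker or gender identification is left out entirely.

```python
import re

# Hypothetical voice configurations for narration and for spoken passages.
NARRATOR = {"voice": "narrator", "pitch": 1.0}
DIALOG = {"voice": "character", "pitch": 1.2}

# Assumption: a spoken passage is any run of text inside straight double quotes.
QUOTE_RE = re.compile(r'"[^"]*"')


def segment(text):
    """Split `text` into (portion, voice_configuration) pairs in reading order."""
    segments = []
    position = 0
    for match in QUOTE_RE.finditer(text):
        before = text[position:match.start()].strip()
        if before:
            segments.append((before, NARRATOR))
        segments.append((match.group(0).strip('"'), DIALOG))
        position = match.end()
    tail = text[position:].strip()
    if tail:
        segments.append((tail, NARRATOR))
    return segments


story = 'She opened the door. "Who is there?" she asked.'
for portion, config in segment(story):
    print(config["voice"], "->", portion)
```

Each pair could then be handed to a synthesizer call that accepts a voice configuration, which is the part the abstract describes as applying different configurations to different portions of text.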

Personalized agent for portable devices and cellular phone

Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize email and voice mail using speech commands.
Owner:SOVEREIGN PEAK VENTURES LLC

Dialog device with dialog support generated using a mixture of language models combined using a recurrent neural network

A dialog device comprises a natural language interfacing device (chat interface or a telephonic device), and a natural language output device (the chat interface, a display device, or a speech synthesizer outputting to the telephonic device). A computer stores natural language dialog conducted via the interfacing device and constructs a current utterance word-by-word. Each word is chosen by applying a plurality of language models to a context comprising concatenation of the stored dialog and the current utterance thus far. Each language model outputs a distribution over the words of a vocabulary. A recurrent neural network (RNN) is applied to the distributions to generate a mixture distribution. The next word is chosen using the mixture distribution. The output device outputs the current natural language utterance after it has been constructed by the computer.
Owner:CONDUENT BUSINESS SERVICES LLC
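
The word-by-word mixture step can be illustrated with a toy example. Note the substitutions: fixed mixing weights stand in for the recurrent neural network that the abstract uses to combine the distributions, the next word is picked greedily (the abstract only says it is chosen using the mixture distribution), and the two tiny "language models" are hand-written dictionaries with made-up probabilities. The names mix and next_word are hypothetical.

```python
def mix(distributions, weights):
    """Combine per-model word distributions into one mixture distribution."""
    vocabulary = set().union(*distributions)
    return {
        word: sum(weight * dist.get(word, 0.0)
                  for dist, weight in zip(distributions, weights))
        for word in vocabulary
    }


def next_word(distributions, weights):
    """Greedy choice: the most probable word under the mixture (an assumption)."""
    mixture = mix(distributions, weights)
    return max(mixture, key=mixture.get)


# Toy distributions over a three-word vocabulary (illustrative values only).
general_lm = {"hello": 0.5, "router": 0.1, "thanks": 0.4}
domain_lm = {"hello": 0.1, "router": 0.8, "thanks": 0.1}

# Fixed weights replace the RNN output in this sketch; they must sum to 1.
weights = (0.25, 0.75)
print(next_word([general_lm, domain_lm], weights))
```

In the device described above, the weights would be recomputed at every step from the model outputs, so the mixture can lean on different language models as the utterance unfolds.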

Voice personalization of speech synthesizer

Inactive | US6970820B2 | Excellent personalization result | Minimal computing burden | Speech synthesis | Personalization | Independent parameter
The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context-independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters. To adapt the parameters with a small amount of enrollment data, an eigenspace is constructed and used to constrain the position of the new speaker so that context independent parameters not provided by the new speaker may be estimated.
Owner:SOVEREIGN PEAK VENTURES LLC

Hybrid Speech Synthesizer, Method and Use

Disclosed are novel embodiments of a speech synthesizer and speech synthesis method for generating human-like speech wherein a speech signal can be generated by concatenation from phonemes stored in a phoneme database. Wavelet transforms and interpolation between frames can be employed to effect smooth morphological fusion of adjacent phonemes in the output signal. The phonemes may have one prosody or set of prosody characteristics and one or more alternative prosodies may be created by applying prosody modification parameters to the phonemes from a differential prosody database. Preferred embodiments can provide fast, resource-efficient speech synthesis with an appealing musical or rhythmic output in a desired prosody style such as reportorial or human interest. The invention includes computer-determining a suitable prosody to apply to a portion of the text by reference to the determined semantic meaning of another portion of the text and applying the determined prosody to the text by modification of the digitized phonemes. In this manner, prosodization can effectively be automated.
Owner:LESSAC TECH INC

Speech synthesis method and system

The invention provides a speech synthesis method and a speech synthesis system. The method comprises: receiving a text input by a user; performing text analysis to obtain a syllable sequence corresponding to the text and the syllable name of each syllable in the syllable sequence; for each syllable in the syllable sequence, planning and acquiring a corresponding duration parameter and a corresponding basic frequency parameter by combining a statistic parameter model according to the syllable name and context; for each syllable in the syllable sequence, acquiring a corresponding spectrum parameter by matching from a spectrum parameter database according to the syllable name, the context, the duration parameter and the basic frequency parameter; and acquiring speech data corresponding to the syllable sequence by using a synthesizer according to the duration parameter, basic frequency parameter and spectrum parameter of each syllable in the syllable sequence. The method and the system can be used in embedded equipment and effectively reduce data storage space occupation while achieving a high tone quality.
Owner:BEIJING SINOVOICE TECH CO LTD

Knowledge-information-processing server system having image recognition system

Extensive social communication is induced. A connection is made with a network terminal capable of connecting to the Internet, and an image and voice signal reflecting the user's subjective visual field, obtained from a headset system worn on the user's head, is uploaded via the network terminal to a knowledge-information-processing server system. On the server system, working in collaboration with a voice recognition system, the user can specify and select by his or her own voice an attention-given target, that is, a specific object or the like in the image to which the user gives attention. The server system, working in collaboration with a voice-synthesizing system, then reports the image recognition result and the recognition processes as voice information, via the Internet and the user's network terminal, to an earphone incorporated into the user's headset system, so that the user's message or tweet can be extensively shared by users.
Owner:CYBER AI ENTERTAINMENT

Customizing the speaking style of a speech synthesizer based on semantic analysis

A method is provided for customizing the speaking style of a speech synthesizer. The method includes: receiving input text; determining semantic information for the input text; determining a speaking style for rendering the input text based on the semantic information; and customizing the audible speech output of the speech synthesizer based on the identified speaking style.
Owner:SOVEREIGN PEAK VENTURES LLC

Speech synthesis apparatus and speech synthesis method

The present invention includes: a characteristic parameter DB 106 that holds, with respect to each speech-unit, speech-unit data indicating a loan word attribute and acoustic characteristics; a language analysis unit 104 and a prosody prediction unit 109 that obtain text data and respectively predict a loan word attribute and acoustic characteristics of each of a plurality of speech-units that form text indicated by the text data; a speech-unit selection unit 108 that selects, from the characteristic parameter DB 106, speech-unit data that represents the loan word attribute and the acoustic characteristics similar to the predicted loan word attribute and acoustic characteristics of each speech-unit; and a speech synthesis unit 110 that generates synthesized speech using a plurality of the selected speech-units and outputs the synthesized speech.
Owner:PANASONIC CORP

Translingual visual speech synthesis

A computer implemented method in a language independent system generates audio-driven facial animation given the speech recognition system for just one language. The method is based on the recognition that once alignment is generated, the mapping and the animation hardly have any language dependency in them. Translingual visual speech synthesis can be achieved if the first step of alignment generation can be made speech independent. Given a speech recognition system for a base language, the method synthesizes video with speech of any novel language as the input.
Owner:PENDRAGON NETWORKS

Multi-lingual speech synthesis

A method for speech synthesis of a word in a first language, comprising dividing the word into a first sequence of pronunciation phonemes in the first language, mapping the first phoneme sequence to a second sequence of pronunciation phonemes in at least one second language, and generating an audio output of the phonemes in the second phoneme sequence using prosody models adapted for the at least one second language. According to this method, an audio output of a word in a first language can be generated by a speech synthesizing engine not having actual support for this language. Instead, the pronunciation phonemes of the word are mapped onto phonemes of at least one second language, for which the speech synthesizing engine does have support.
Owner:NOKIA CORP

Optimization of an objective measure for estimating mean opinion score of synthesized speech

A method is provided for optimizing an objective measure used to estimate mean opinion score or naturalness of synthesized speech from a speech synthesizer. The method includes using an objective measure that has components derived directly from textual information used to form synthesized utterances. The objective measure has a high correlation with mean opinion score such that a relationship can be formed between the objective measure and corresponding mean opinion score. The objective measure is altered to provide a different function of textual information derived from the utterances so as to improve the relationship between the scores of the objective measure and subjective ratings of the synthesized utterances.
Owner:MICROSOFT TECH LICENSING LLC

Synthesis by Generation and Concatenation of Multi-Form Segments

A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.
Owner:CERENCE OPERATING CO

Audible menu system

An audible menu system associated with distribution of television content over a service provider network is disclosed. The menu system includes a speech synthesizer and screen reader. Electronic programming guide (EPG) elements are read by a screen reader and provided to a speech synthesizer for presenting audible representations of EPG elements to a user. The user may provide inputs to a remote control device to navigate an EPG that may also be presented through a graphical user interface. As a user navigates a cursor over selectable EPG elements, disclosed embodiments provide audible outputs that correspond to the selectable EPG elements. In some embodiments, users may provide customized audio inputs that are played as audio outputs during future menu navigation sessions.
Owner:SBC KNOWLEDGE VENTURES LP

Speech synthesis system

A speech synthesizing system produces speech of improved voice quality by selecting the combination of speech segments most suitable for a synthesis speech unit sequence. The system comprises a speech segment storage section where speech segments are stored; a speech segment selection information storage section that stores speech segment selection information, including combinations of speech segments, built from the segments in the speech segment storage section, for an arbitrary speech unit sequence, together with appropriateness information representing the appropriateness of each combination; a speech segment selecting section for selecting the combination of speech segments most suitable for a synthesis parameter according to the speech segment selection information stored in the speech segment storage section; and a waveform generating section for generating speech waveform data from the combination of speech segments selected by the speech segment selecting section.
Owner:FUJITSU LTD

Voice synthesis system

A voice synthesis system for interactive voice services comprises a voice server connected to a packet network dispensing a voice service to a user terminal by executing a service file associated with the voice service. An HTTP client in the voice server transmits a request containing a text to be synthesized during execution of the service file. The service file includes an address designating a resource in a voice synthesis server connected to the packet network and a command responsive to the audio format for commanding the transmitting of the request to the voice synthesis server. An HTTP server in the voice synthesis server transmits to the voice server an audio response including the text that has been synthesized by the voice synthesis server independently of the voice server.
Owner:FRANCE TELECOM SA

Interactive multimedia book

An interactive multimedia book provides hands-on multimedia instruction to the user in response to voiced commands. The book is implemented on a computer system and includes both text and audio/video clips. The interactive multimedia book is accessed by voiced commands and natural language queries as the primary user input. The displayed text is written in a markup language and contains hyperlinks which link the current topic with other related topics. The user may command the book to read the text and, as the text is read by the voice synthesizer, a word which is also a hyperlink will change its attributes upon being spoken. The user will be able to observe or hear this and simply utter the word which is the hyperlink to navigate to the linked topic.
Owner:WHITHAM HLDG