42 results about "Phonetic representation" patented technology

Phonetic representation, or more commonly phonetic transcription, is the representation of speech sounds using symbols from a phonetic alphabet such as IPA, X-SAMPA, or Kirshenbaum, for linguistic studies and for learning the pronunciation of languages. Among these systems, the International Phonetic Alphabet is the most widely used; its symbols are printed in most dictionaries and books on linguistics.

Generating large units of graphonemes with mutual information criterion for letter to sound conversion

A method and apparatus are provided for segmenting words into component parts. Under the invention, mutual information scores for pairs of graphoneme units found in a set of words are determined. Each graphoneme unit includes at least one letter. The graphoneme units of one pair of graphoneme units are combined based on the mutual information score. This forms a new graphoneme unit. Under one aspect of the invention, a syllable n-gram model is trained based on words that have been segmented into syllables using mutual information. The syllable n-gram model is used to segment a phonetic representation of a new word into syllables. Similarly, an inventory of morphemes is formed using mutual information and a morpheme n-gram is trained that can be used to segment a new word into a sequence of morphemes.
Owner:MICROSOFT TECH LICENSING LLC
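
The merging step described in the abstract above can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not give the scoring formula, so the score used here (joint probability times the log ratio of joint to independent probability) is an assumption, and the names `best_pair` and `merge_pair` and the `(letters, phones)` unit layout are hypothetical.

```python
import math
from collections import Counter

# Assumed unit layout: (letters, phones), a letter string plus a tuple of phones.

def best_pair(words):
    """Return the adjacent unit pair with the highest (assumed) MI score."""
    unit_counts = Counter(u for w in words for u in w)
    pair_counts = Counter(p for w in words for p in zip(w, w[1:]))
    if not pair_counts:
        return None
    n_units = sum(unit_counts.values())
    n_pairs = sum(pair_counts.values())

    def score(pair):
        a, b = pair
        p_ab = pair_counts[pair] / n_pairs
        p_a = unit_counts[a] / n_units
        p_b = unit_counts[b] / n_units
        # Assumed scoring form; the source abstract does not specify it.
        return p_ab * math.log(p_ab / (p_a * p_b))

    return max(pair_counts, key=score)

def merge_pair(words, pair):
    """Replace every adjacent occurrence of `pair` with one combined unit."""
    a, b = pair
    merged = (a[0] + b[0], a[1] + b[1])
    result = []
    for w in words:
        new_word, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                new_word.append(merged)
                i += 2
            else:
                new_word.append(w[i])
                i += 1
        result.append(new_word)
    return result
```

Calling `best_pair` and then `merge_pair` repeatedly would grow larger units one merge at a time; when to stop is left open here.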

Synthesis by Generation and Concatenation of Multi-Form Segments

A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.
Owner:CERENCE OPERATING CO

Apparatus and method for synthesized audible response to an utterance in speaker-independent voice recognition

When a speaker-independent voice-recognition (SIVR) system recognizes a spoken utterance that matches a phonetic representation of a speech element belonging to a predefined vocabulary, it may play a synthesized speech fragment as a means for the user to verify that the utterance was correctly recognized. When a speech element in the vocabulary has more than one possible pronunciation, the system may select the one most closely matching the user's utterance, and play a synthesized speech fragment corresponding to that particular representation.
Owner:MARVELL WORLD TRADE LTD

Method and apparatus for recognizing a speaker in lawful interception systems

A method and apparatus for identifying a speaker within a captured audio signal from a collection of known speakers. The method and apparatus receive or generate voice representations for each known speaker and tag the representations according to meta data related to the known speaker or to the voice. The representations are grouped into one or more groups according to the indices. When a voice to be recognized is introduced, characteristics are determined according to which the groups are prioritized, so that the representations participating only in part of the groups are matched against the voice to be identified, thus reducing identification time and improving the statistical significance.
Owner:CYBERBIT

Arrangement for Creating and Using a Phonetic-Alphabet Representation of a Name of a Party to a Call

A first party creates and edits a phonetic-alphabet representation of its name. The phonetic representation is conveyed to a second party as “caller-identification” information by messages that set up a call between the parties. The phonetic representation of the name is displayed to the second party, converted to speech, and/or converted to an alphabet of a language of the second party and then displayed to the second party.
Owner:AVAYA INC

Reducing a size of a compiled speech recognition grammar

The present invention discloses creating and using speech recognition grammars of reduced size. The reduced speech recognition grammars can include a set of entries, each entry having a unique identifier and a phonetic representation that is used when matching speech input against the entries. Each entry can lack a textual spelling corresponding to the phonetic representation. The reduced speech recognition grammar can be digitally encoded and stored in a computer readable media, such as a hard drive or flash memory of a portable speech enabled device.
Owner:NUANCE COMM INC

Context sensitive multi-stage speech recognition

A system enables devices to recognize and process speech. The system includes a database that retains one or more lexical lists. A speech input detects a verbal utterance and generates a speech signal corresponding to the detected verbal utterance. A processor generates a phonetic representation of the speech signal that is designated a first recognition result. The processor generates variants of the phonetic representation based on context information provided by the phonetic representation. One or more of the variants of the phonetic representation selected by the processor are designated as a second recognition result. The processor matches the second recognition result with stored phonetic representations of one or more of the stored lexical lists.
Owner:NUANCE COMM INC

System and a Method For Representing Unrecognized Words in Speech to Text Conversions as Syllables

The present invention is a novel system and method for overcoming the shortcomings of existing speech-to-text systems that relate to the processing of unrecognized words. On encountering words that it cannot decipher, the preferred embodiment of the present invention analyzes the syllables that make up these words and translates them into the appropriate phonetic representations. The method ensures that words that were not uttered clearly are not lost or distorted when the text is transcribed. Additionally, it allows the use of smaller and simpler speech-to-text applications, which are suitable for mobile devices with limited storage and processing resources, since these applications may use smaller dictionaries and may be designed to identify only commonly used words. Also disclosed are several examples of possible implementations of the described system and method.
Owner:SHPIGEL AVRAHAM

Method and system for obtaining personal aliases through voice recognition

Methods and systems for recognizing a spoken alias are disclosed. The present invention includes generating a plurality of alias variations based on a discoverable name and creating a phonetic representation for each of the alias variations. The present invention also includes capturing a phonetic pronunciation of the spoken alias. At least one of the created alias variations that has a phonetic representation that corresponds to the captured phonetic pronunciation is selected.
Owner:MICROSOFT TECH LICENSING LLC

Midi-compatible hearing device and reproduction of speech sound in a hearing device

The method for providing a user of a hearing device with speech sound comprises the step of a) providing in the hearing device speech-representing data representative of speech-bound contents. The speech-bound contents is encoded in said speech-representing data in a compressed way by means of a set of encoded-speech-segment data, wherein each of the encoded-speech-segment data of the set is indicative of one speech segment, and wherein the speech-representing data comprise a multitude of the encoded-speech-segment data. And it also comprises the steps of b) deriving from the multitude of the encoded-speech-segment data audio signals representative of the speech-bound contents by composing audio signal segments derived by decoding the multitude of encoded-speech-segment data; and c) converting the so-derived audio signals into speech sound by means of an output converter of the hearing device. Preferably, the encoded-speech-segment data are MIDI data, wherein MIDI stands for Musical Instrument Digital Interface. For example, the speech-bound contents is the contents of an audio book or news to which the user wants to listen.
Owner:PHONAK

Method and system for autocompletion for languages having ideographs and phonetic characters

When a user enters text in a text input box (e.g., a browser or a tool bar), a sorted set of predicted input-complete strings comprising ideographic strings is presented to the user. The user-entered text may include zero or more ideograms followed by one or more phonetic characters, or the entered text may be one or more. The predicted completion string can be a URL or a query string. Ranking can be based on any number of factors (e.g., frequency of queries submitted by user groups). URLs may be ranked based on the URL's importance value. The set of sort-predicted completion strings may be obtained by matching the fingerprint of the user input string with the fingerprint-to-table mapping containing the set of sort-predicted input complete strings. The sequence-predicted string generation takes into account multiple phonetic representations of an ideographic string.
Owner:GOOGLE LLC

Method and apparatus for voice controlled devices with improved phrase storage, use, conversion, transfer, and recognition

The embodiments of the invention provide for the storage of speech phrases. Speech phrases are processed by a speaker-independent speech recognition engine of a voice controlled device. This engine returns a speaker-independent representation of the phrase. The speaker-independent representation is stored. Embodiments of the invention include methods of converting text to speaker-independent representations of speech and speaker-independent representations of speech into text.
Owner:WINBOND ELECTRONICS CORP

Providing speech recognition data to a speech enabled device when providing a new entry that is selectable via a speech recognition interface of the device

Inactive | US20090157392A1 | Reduce consumption | Enhance solution scalability | Speech recognition | Speech identification | Speech sound
The present invention discloses a solution for providing a phonetic representation for a content item along with a content item delivered to a speech enabled computing device. The phonetic representation can be specified in a manner that enables it to be added to a speech recognition grammar of the speech enabled computing device. Thus, the device can recognize speech commands that involve the content item using the newly added phonetic representation. Current implementations of speech recognition systems of this type rely on internal generation of speech recognition data that is added to the speech recognition grammar. Generation of speech recognition data can, however, be resource intensive, which can be particularly problematic when the speech enabled device is resource limited. The disclosed solution offloads the task of providing the speech recognition data to an external device, such as a relatively resource rich server or a desktop device.
Owner:IBM CORP

Method and apparatus for providing foreign language text display when encoding is not available

A method and apparatus include referencing a phonetic language database that includes double-byte font entries and associated phonetic representations of the double-byte font entries. At least one of the double-byte font entries is used to obtain a phonetic representation of the used at least one double-byte font. The phonetic representation is displayed on a display device.
Owner:MICROSOFT TECH LICENSING LLC

System and method for phonetic searching of data

A method of phonetically searching media information comprises receiving a plurality of search queries from one or more client systems and providing a phonetic representation of each search query. One or more search jobs are instantiated, each search job comprising a plurality of tasks, each task being arranged to sequentially read a block from an archive file. The archive file is stored within a distributed filing system (DFS) in which sequential blocks of data comprising the archive file are replicated to be locally available to one or more processors from a cluster of processors for executing the tasks. Each block stores index files corresponding to a plurality of source media files, each index file containing a phonetic stream corresponding to audio information for a given source media file. Each task obtains phonetic representations of outstanding search queries for a block and sequentially searches the block for each outstanding search query.
Owner:AVAYA INC

Flexible keyword searching

A search engine implements a multi-level search scheme. A first level involves performing a keyword search based on character matching. A second level, performed only if the first level yields no results, is a keyword search based on phonetic representations of a search phrase and of the keywords. A third level, performed only if the first and second levels yield no results, is a rough matching search. The keywords or keyword phrases are specified in a phrase table. Each entry of the phrase table specifies a keyword phrase, its phonetic representation, a topic URL, and an action that is to be performed in conjunction with the topic URL. There are a plurality of defined actions, having different priorities. If multiple keyword phrases are found in the multi-level search, the one having the action with the highest priority is initiated. If there is a tie for the highest priority, the results are listed in a results page, regardless of the actions associated with the matched entries. Different actions can be specified in the phrase table entries, corresponding to different levels of the multi-level search scheme that were required to discover a matching entry.
Owner:MICROSOFT TECH LICENSING LLC
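
The three-level fallback described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented scheme: `phonetic_key` is a crude hypothetical stand-in for a real phonetic representation, the rough-matching level uses Python's `difflib`, and the phrase table is reduced to a mapping from keyword phrase to topic URL, with the actions and priorities left out.

```python
import difflib

VOWELS = set("aeiou")

def phonetic_key(text):
    """Hypothetical phonetic stand-in: keep the first letter, then drop
    vowels and any letter equal to the last letter kept."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    for c in letters[1:]:
        if c in VOWELS or c == key[-1]:
            continue
        key.append(c)
    return "".join(key)

def search(query, phrase_table):
    """Return (level, phrase, url) for the first level that matches, else None."""
    q = query.lower().strip()

    # Level 1: keyword search based on character matching.
    for phrase, url in phrase_table.items():
        if phrase.lower() == q:
            return ("exact", phrase, url)

    # Level 2: compare phonetic keys, only reached if level 1 found nothing.
    q_key = phonetic_key(q)
    for phrase, url in phrase_table.items():
        if phonetic_key(phrase) == q_key:
            return ("phonetic", phrase, url)

    # Level 3: rough matching, only reached if levels 1 and 2 found nothing.
    lowered = {phrase.lower(): phrase for phrase in phrase_table}
    close = difflib.get_close_matches(q, list(lowered), n=1, cutoff=0.6)
    if close:
        phrase = lowered[close[0]]
        return ("rough", phrase, phrase_table[phrase])
    return None
```

With a table such as `{"color settings": "/help/color"}` (a made-up entry), the query "colour settings" misses the character match but is caught at the phonetic level, because both spellings reduce to the same key.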

Semantic recognition method, device, storage medium and computer equipment based on voice interaction

The invention provides a semantic recognition method, device, storage medium and computer equipment based on voice interaction. The semantic recognition method based on voice interaction comprises the steps that voice data acquired is converted into phonetic text, wherein the phonetic text comprises at least one phonetic representation character; by means of the phonetic representation characters in the phonetic text, term matching is conducted, so that text is acquired, wherein the text contains meaning representation terms matched with the phonetic representation characters; semantic understanding processing is conducted on the text, so that the semantic meaning of the voice data is acquired. By means of the technical scheme, the accuracy of voice recognition can be improved.
Owner:SHANGHAI XIAOI ROBOT TECH CO LTD

System and method for phonetic searching of data

A method for phonetically searching media including a plurality of audio tracks is disclosed where each audio track is indexed to provide a phonetic representation of the audio track. The method comprises obtaining a text search query and searching for the text query against a set of reference documents to obtain a sub-set of pseudo-relevant documents. The pseudo-relevant documents are examined for a set of search expressions characterizing the pseudo-relevant documents. A phonetic representation corresponding to at least some of the set of search expressions is provided and for each of the phonetic representations of the search expressions, the indexed phonetic representations for one or more of the plurality of audio tracks is phonetically searched to provide any indicators of the incidence of the search expression within the one or more audio tracks.
Owner:AVAYA INC

Apparatus and methods for pronunciation lexicon compression

A compressed pronunciation lexicon file is generated from a source pronunciation lexicon using a pronunciation prediction algorithm in a multi-output mode. The pronunciation prediction algorithm may generate a deterministic ordered list of phoneme strings from the textual representation of a particular word. The compressed pronunciation lexicon file may include a sorted list of records of compressed textual representations of words and compressed phonetic representations of the words.
Owner:MARVELL ASIA PTE LTD

System and method for combining phonetic and automatic speech recognition search

A text search query including one or more words may be received. An ASR index created for an audio recording may be searched using the query to produce ASR search results including words, each word associated with a confidence score. For each word in the ASR search results whose confidence score is below a threshold (and which, in some cases, has one or more preceding words and one or more subsequent words in the ASR index), a phonetic representation of the audio recording may be searched for that word at the place where it occurs in the audio recording, possibly after the one or more preceding words and before the one or more subsequent words, to produce phonetic search results. The search results returned may include both ASR and phonetic results.
Owner:NICE LTD
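
The way the abstract above falls back from ASR results to phonetic search can be sketched as below. All names are hypothetical: `asr_index` is assumed to map each word to a list of `(time_sec, confidence)` hits, and `phonetic_search` is assumed to be a caller-supplied function that checks the phonetic representation of the recording near a given time. The preceding-word and subsequent-word constraints mentioned in the abstract are omitted.

```python
def combined_search(query_words, asr_index, phonetic_search, threshold=0.5):
    """Keep confident ASR hits; re-check low-confidence hits phonetically.

    asr_index: dict mapping word -> list of (time_sec, confidence) (assumed layout).
    phonetic_search: callable (word, time_sec) -> list of confirmed times (assumed).
    """
    results = []
    for word in query_words:
        for time_sec, confidence in asr_index.get(word, []):
            if confidence >= threshold:
                # Confident ASR hit: keep it as an ASR result.
                results.append(("asr", word, time_sec))
            else:
                # Low-confidence hit: look for the word in the phonetic
                # representation around the same position.
                for confirmed in phonetic_search(word, time_sec):
                    results.append(("phonetic", word, confirmed))
    return results
```

The returned list mixes both kinds of result, each tagged with its source, so a caller can rank or display them differently.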

Database storing syllables and sound units for use in text to speech synthesis system

In embodiments the present invention includes a method for populating a text to speech synthesis database. This method can include the steps of defining a set of phonetic symbols, wherein each symbol is a single alphabetic character representing a separate sound, representing a syllable by at least one phonetic symbol of the set of phonetic symbols to form a phonetic representation of the syllable, recording a verbal expression of the syllable using the phonetic representation, indexing the recording of the verbal expression of the syllable to a description of the recording, and storing the indexed recording of the verbal expression of the syllable in a database.
Owner:ALPINE ELECTRONICS INC