19577 results about "Speech recognition" patented technology

Speech recognition is an interdisciplinary subfield of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech to text (STT). It incorporates knowledge and research from the linguistics, computer science, and electrical engineering fields.

Using Context Information To Facilitate Processing Of Commands In A Virtual Assistant

A virtual assistant uses context information to supplement natural language or gestural input from a user. Context helps to clarify the user's intent and to reduce the number of candidate interpretations of the user's input, and reduces the need for the user to provide excessive clarification input. Context can include any available information that is usable by the assistant to supplement explicit user input to constrain an information-processing problem and / or to personalize results. Context can be used to constrain solutions during various phases of processing, including, for example, speech recognition, natural language processing, task flow processing, and dialog generation.
Owner:APPLE INC

System and methods for recognizing sound and music signals in high noise and distortion

A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
Owner:APPLE INC
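
The matching step in the abstract above can be illustrated with a minimal sketch. It assumes the simplest linear relation between landmarks (a constant time offset, i.e. no speed change), and the names `best_match`, `index`, and `sample` are hypothetical, not taken from the patent. Fingerprints are plain strings here; computing them from audio is out of scope.

```python
from collections import Counter, defaultdict

def best_match(sample, index):
    """Return (file_id, votes) for the file whose landmarks line up best.

    sample: list of (fingerprint, time) pairs from the unknown clip.
    index:  dict mapping fingerprint -> list of (file_id, time) pairs.
    Returns None when no fingerprint of the sample is in the index.
    """
    votes = defaultdict(Counter)
    for fingerprint, t_sample in sample:
        for file_id, t_file in index.get(fingerprint, []):
            # Landmarks from the true source share one constant offset
            # (assumption: the sample is played at the original speed).
            votes[file_id][t_file - t_sample] += 1

    best = None
    for file_id, offsets in votes.items():
        _offset, count = offsets.most_common(1)[0]
        if best is None or count > best[1]:
            best = (file_id, count)
    return best

# Hypothetical toy index: three fingerprints, two recordings.
index = {
    "a1": [("song_x", 10), ("song_y", 3)],
    "b2": [("song_x", 12)],
    "c3": [("song_x", 15), ("song_y", 40)],
}
sample = [("a1", 0), ("b2", 2), ("c3", 5)]
print(best_match(sample, index))
```

Here all three fingerprints of `song_x` agree on an offset of 10, while the two hits on `song_y` disagree, so `song_x` wins with three votes. A dictionary lookup is used for simplicity; the logarithmic search time mentioned in the abstract would correspond to a sorted index.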

Ontology-based parser for natural language processing

An ontology-based parser incorporates both a system and method for converting natural-language text into predicate-argument format that can be easily used by a variety of applications, including search engines, summarization applications, categorization applications, and word processors. The ontology-based parser contains functional components for receiving documents in a plurality of formats, tokenizing them into instances of concepts from an ontology, and assembling the resulting concepts into predicates. The ontological parser has two major functional elements, a sentence lexer and a parser. The sentence lexer takes a sentence and converts it into a sequence of ontological entities that are tagged with part-of-speech information. The parser converts the sequence of ontological entities into predicate structures using a two-stage process that analyzes the grammatical structure of the sentence, and then applies rules to it that bind arguments into predicates.
Owner:LEIDOS

Method and system for mitigating delay in receiving audio stream during production of sound from audio stream

A communication component modifies production of an audio waveform at determined modification segments to thereby mitigate the effects of a delay in processing and / or receiving a subsequent audio waveform. The audio waveform and / or data associated with the audio waveform are analyzed to identify the modification segments based on characteristics of the audio waveform and / or data associated therewith. The modification segments show where the production of the audio waveform may be modified without substantially affecting the clarity of the sound or audio. In one embodiment, the invention modifies the sound production at the identified modification segments to extend production time and thereby mitigate the effects of delay in receiving and / or processing a subsequent audio waveform for production.
Owner:VOCOLLECT

System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input

Systems and methods are provided for performing focus detection, referential ambiguity resolution and mood classification in accordance with multi-modal input data, in varying operating conditions, in order to provide an effective conversational computing environment for one or more users.
Owner:IBM CORP

Method and system for purchasing pre-recorded music

A method and system are described that allow users to identify pre-recorded sounds such as music, radio broadcasts, commercials, and other audio signals in almost any environment. The audio signal (or sound) must be a recording represented in a database of recordings. The service can quickly identify the signal from an excerpt of just a few seconds, while tolerating high noise and distortion. Once the signal is identified to the user, the user may perform transactions interactively in real time or offline using the identification information.
Owner:APPLE INC

Method and apparatus for automatically recognizing input audio and/or video streams

A method and system for the automatic identification of audio, video, multimedia, and / or data recordings based on immutable characteristics of these works. The invention does not require the insertion of identifying codes or signals into the recording. This allows the system to be used to identify existing recordings that have not been through a coding process at the time that they were generated. Instead, each work to be recognized is “played” into the system where it is subjected to an automatic signal analysis process that locates salient features and computes a statistical representation of these properties. These features are then stored as patterns for later recognition of live input signal streams. A different set of features is derived for each audio or video work to be identified and stored. During real-time monitoring of a signal stream, a similar automatic signal analysis process is carried out, and many features are computed for comparison with the patterns stored in a large feature database. For each particular pattern stored in the database, only the relevant characteristics are compared with the real-time feature set. Preferably, during analysis and generation of reference patterns, data are extracted from all time intervals of a recording. This allows a work to be recognized from a single sample taken from any part of the recording.
Owner:ICEBERG IND

System and methods for maintaining speech-to-speech translation in the field

A method and apparatus are provided for updating the vocabulary of a speech translation system for translating a first language into a second language including written and spoken words. The method includes adding a new word in the first language to a first recognition lexicon of the first language and associating a description with the new word, wherein the description contains pronunciation and word class information. The new word and description are then updated in a first machine translation module associated with the first language. The first machine translation module contains a first tagging module, a first translation model and a first language module, and is configured to translate the new word to a corresponding translated word in the second language. Optionally, the invention may be used for bidirectional or multi-directional translation.
Owner:META PLATFORMS INC

Electronic text input involving word completion functionality for predicting word candidates for partial word inputs

A text input method is described for an electronic apparatus having a user interface with text input means and a display screen. Word completion functionality is provided for predicting word candidates for partial word inputs made by the user with the text input means. The method involves receiving a partial word input from the user and deriving a set of word completion candidates using the word completion functionality. Each of the word completion candidates in the set has a prefix and a suffix, wherein the prefix corresponds to the partial word input. The method also involves presenting the suffixes for at least a subset of the word completion candidates in a predetermined area on the display screen, wherein each of the presented suffixes is made selectable for the user.
Owner:NOKIA CORP
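
As a rough illustration of the prefix and suffix split described above, the following sketch derives completion candidates from a word list and returns only their suffixes. The function name `suffix_candidates`, the `limit` parameter, and the assumption that the vocabulary is already ordered by preference are all hypothetical and not taken from the patent.

```python
def suffix_candidates(partial, vocabulary, limit=3):
    """Return the suffixes of words that complete the partial input.

    vocabulary is assumed to be ordered by preference (for example by
    frequency); that order is kept in the result.
    """
    matches = [w for w in vocabulary if w.startswith(partial) and w != partial]
    # Only the part after the typed prefix is shown to the user.
    return [w[len(partial):] for w in matches[:limit]]

vocabulary = ["speech", "speaker", "spectrum", "special", "text"]
print(suffix_candidates("spe", vocabulary))
```

Typing `spe` yields the selectable suffixes `ech`, `aker`, and `ctrum`; choosing one appends it to the partial input.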

System and method for rendering text synchronized audio

One or more computing devices include software and / or hardware implemented processing units that synchronize a textual content with an audio content, where the textual content is made up of a sequence of textual units and the audio content is made up of a sequence of sound units. The system and / or method matches each of the sequence of sound units with a corresponding textual unit. The system and / or method determines a corresponding time of occurrence for each sound unit in the audio content relative to a time reference. Each matched textual unit is then associated with a tag that corresponds to the time of occurrence for the sound unit matched with the textual unit.
Owner:I SCROLL
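
The tagging idea above can be sketched as follows, under the assumption that the sound units have already been recognized as labeled words with start times in seconds. The function `tag_text_units` and the greedy forward matching are illustrative choices, not the method claimed in the patent.

```python
def tag_text_units(text_units, sound_units):
    """Tag each textual unit with the start time of its matching sound unit.

    text_units:  list of words as they appear in the text.
    sound_units: list of (label, start_seconds) pairs in playback order.
    Unmatched textual units receive a tag of None.
    """
    tags = []
    pos = 0
    for word in text_units:
        key = word.lower().strip(".,!?")
        # Search forward only, so the tags stay in playback order.
        for i in range(pos, len(sound_units)):
            label, start = sound_units[i]
            if label == key:
                tags.append((word, start))
                pos = i + 1
                break
        else:
            tags.append((word, None))
    return tags

text = ["Hello", "world", "again."]
sounds = [("hello", 0.0), ("um", 0.4), ("world", 0.7), ("again", 1.3)]
print(tag_text_units(text, sounds))
```

The filler sound `um` has no textual counterpart and is skipped, while each word of the text is tagged with the time at which it is spoken.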

Method and system for generating spelling suggestions

A computer implemented method of suggesting replacement words for words of a string. In the method, an input string of input words is received. The input words are then matched to subject words of a candidate table. Next, candidate replacement words and scores from the candidate table corresponding to the matched subject words are extracted. Each score is indicative of a probability that the input word should be replaced with the corresponding candidate replacement word. Finally, replacement of the input words with their corresponding candidate replacement words is selectively suggested based on the scores for the replacement words. Another aspect of the present invention is directed to a spell checking system that is configured to implement the method.
Owner:MICROSOFT TECH LICENSING LLC
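
A minimal sketch of the candidate-table lookup described above is given below. The table contents, the `threshold` parameter, and the rule of suggesting only the highest-scoring candidate are assumptions made for illustration; the abstract does not specify them.

```python
def suggest(input_string, candidate_table, threshold=0.5):
    """Suggest replacements for words whose best candidate scores high enough.

    candidate_table maps a subject word to a list of
    (replacement, score) pairs, where score approximates the probability
    that the input word should be replaced.
    """
    suggestions = {}
    for word in input_string.split():
        candidates = candidate_table.get(word.lower(), [])
        if not candidates:
            continue
        replacement, score = max(candidates, key=lambda c: c[1])
        # Suggest selectively: low-scoring candidates are ignored.
        if score >= threshold:
            suggestions[word] = replacement
    return suggestions

candidate_table = {
    "teh": [("the", 0.95), ("ten", 0.10)],
    "form": [("from", 0.30)],
}
print(suggest("teh form is ready", candidate_table))
```

Only `teh` is flagged, because the best candidate for `form` scores below the assumed threshold of 0.5.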

Un-tethered wireless stereo speaker system

A wireless speaker system configured to receive stereo audio information wirelessly transmitted by an audio source including first and second loudspeakers. The first loudspeaker establishes a bidirectional secondary wireless link with the audio source for receiving and acknowledging receipt of the stereo audio information. The first and second loudspeakers communicate with each other via a primary wireless link, and the first and second loudspeakers are configured to extract first and second audio channels, respectively, from the stereo audio information. A wireless audio system including an audio source and first and second loudspeakers, each having a wireless transceiver. The first and second loudspeakers communicate via a primary wireless link. The audio source communicates audio information to the first loudspeaker via a secondary wireless link which is configured according to a standard wireless protocol. The first loudspeaker is configured to acknowledge successful reception of audio information via the secondary wireless link.
Owner:APPLE INC

Method, medium and apparatus for providing mobile voice web service

Provided are a method and apparatus for providing a mobile voice web service in a mobile terminal. The method includes analyzing a web history of a user from web search logs of the user and generating a voice access list based on the analysis results, and performing voice recognition by dynamically generating a voice recognition syntax according to the generated voice access list. Accordingly, by limiting syntax required for voice recognition by generating a syntax suitable for a web context of the user, efficient voice recognition, which can be performed in a terminal not a server, can be implemented.
Owner:SAMSUNG ELECTRONICS CO LTD

Foreign language abbreviation translation in an instant messaging system

A system for automatically providing foreign language abbreviation translation in an instant messaging system that identifies a foreign language abbreviation translation database based on a user indicated source culture. The foreign abbreviation translation database stores abbreviation translations for foreign language abbreviations frequently used by people from the user indicated source culture. The system locates a candidate term in an instant message and compares the candidate term to the foreign language abbreviations in the foreign language abbreviation translation database. In the event that the candidate term matches one of the foreign language abbreviations in the identified foreign language abbreviation translation database, the corresponding translation is retrieved and displayed. The comparison of the candidate term with the foreign language abbreviations may include automatically obtaining a transliteration of the candidate term. The disclosed system advantageously enables translation of foreign language abbreviations to be performed in real-time.
Owner:IBM CORP

Gesture recognition system

A gesture recognition system includes: elements for detecting and generating a signal corresponding to a number of markers arranged on an object, elements for processing the signal from the detecting elements, and members for detecting the position of the markers in the signal. The markers are divided into first and second sets of markers, the first set of markers constituting a reference position, and the system comprises elements for detecting movement of the second set of markers and generating a signal as a valid movement with respect to the reference position.
Owner:CREDOBLE

Audio signal decorrelator, multi channel audio signal processor, audio signal processor, method for deriving an output audio signal from an input audio signal and computer program

An audio signal decorrelator for deriving an output audio signal from an input audio signal has a frequency analyzer for extracting from the input audio signal a first partial signal descriptive of an audio content in a first audio frequency range and a second partial signal descriptive of an audio content in a second audio frequency range having higher frequencies compared to the first audio frequency range. A partial signal modifier modifies the first and second partial signals, to obtain first and second processed partial signals, so that a modulation amplitude of a time variant phase shift or time variant delay applied to the first partial signal is higher than that applied to the second partial signal, or modifies only the first partial signal. A signal combiner combines the first and second processed partial signals, or combines the first processed partial signal and the second partial signal, to obtain an output audio signal.
Owner:FRAUNHOFER GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG EV

Audio network distribution system

An audio distribution network system (20) allowing an audio distribution system to be created that is integrated with the home automation system into a home network that permits vocal feedback, status and even control with the audio through network speakers (100).
Owner:CLEARONCE COMM INC

Method and apparatus for aligning texts

A method and apparatus for aligning texts. The method includes acquiring a target text and a reference text and aligning the target text and the reference text at word level based on phoneme similarity. The method can be applied to automatically archiving a multimedia resource and a method of automatically searching a multimedia resource.
Owner:IBM CORP
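
Word-level alignment by phoneme similarity can be sketched with a small dynamic-programming aligner. This is an illustration under stated assumptions, not the patented method: each word is assumed to arrive with its phoneme sequence already looked up, similarity is approximated with `difflib.SequenceMatcher`, and the names `align`, `similarity`, and the `gap` penalty are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(phonemes_a, phonemes_b):
    """Similarity in [0, 1] between two phoneme tuples."""
    return SequenceMatcher(None, phonemes_a, phonemes_b).ratio()

def align(target, reference, gap=-0.5):
    """Align two lists of (word, phonemes) pairs at word level.

    Returns a list of (target_word, reference_word) pairs, with None
    marking a word that has no counterpart in the other text.
    """
    n, m = len(target), len(reference)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = similarity(target[i - 1][1], reference[j - 1][1])
            score[i][j] = max(score[i - 1][j - 1] + sim,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover the pairing.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + similarity(
                target[i - 1][1], reference[j - 1][1]):
            pairs.append((target[i - 1][0], reference[j - 1][0]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((target[i - 1][0], None))
            i -= 1
        else:
            pairs.append((None, reference[j - 1][0]))
            j -= 1
    return pairs[::-1]

# Hypothetical phoneme transcriptions (ARPAbet-like symbols).
target = [("right", ("R", "AY", "T")), ("um", ("AH", "M")), ("hear", ("HH", "IY", "R"))]
reference = [("write", ("R", "AY", "T")), ("here", ("HH", "IY", "R"))]
print(align(target, reference))
```

The homophones `right`/`write` and `hear`/`here` are paired even though their spellings differ, and the filler `um` is left unaligned. This is the case where a spelling-based comparison would struggle and a phoneme-based one helps.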

Data framing for adaptive-block-length coding system

An audio encoder applies an adaptive block-encoding process to segments of audio information to generate frames of encoded information that are aligned with a reference signal conveying the alignment of a sequence of video information frames. The audio information is analyzed to determine various characteristics of the audio signal such as the occurrence and location of a transient, and a control signal is generated that causes the adaptive block-encoding process to encode segments of varying length. A complementary decoder applies an adaptive block-decoding process to recover the segments of audio information from the frames of encoded information. In embodiments that apply time-domain aliasing cancellation (TDAC) transforms, window functions and transforms are applied according to one of a plurality of segment patterns that define window functions and transform parameters for each segment in a sequence of segments. The segments in each frame of a sequence of overlapping frames may be recovered without aliasing artifacts independently from the recovery of segments in other frames. Window functions are adapted to provide preferred frequency-domain responses and time-domain gain profiles.
Owner:DOLBY LAB LICENSING CORP

World Wide Web-based melody retrieval system with thresholds determined by using distribution of pitch and span of notes

A World Wide Web-based melody retrieval system takes a sung melody as a query and retrieves the song's title or other information from a music database over a WWW network. It comprises a method of obtaining search clues with the maximum quantity of information from pitch and span (dynamic threshold determination) and a method of effectively reducing the number of answer candidates (coarse-to-fine matching), thus increasing the matching accuracy. It is characterized in that a user can retrieve music, or media containing music, by singing.
Owner:SONODA TOMONARI

Audio coding system using spectral hole filling

Audio coding processes like quantization can cause spectral components of an encoded audio signal to be set to zero, creating spectral holes in the signal. These spectral holes can degrade the perceived quality of audio signals that are reproduced by audio coding systems. An improved decoder avoids or reduces the degradation by filling the spectral holes with synthesized spectral components. An improved encoder may also be used to realize further improvements in the decoder.
Owner:DOLBY LAB LICENSING CORP

Apparatus for delivering music and information

The invention comprises music and information delivery systems and methods. One system comprises a portable communication device configured to receive a piece of music from an audio source and transmit the piece of music via a first communication medium to a host computer. The host computer is configured to receive the piece of music from the portable communication device and search a storage medium to identify and access the piece of music from the storage medium. The host computer is configured to transmit the piece of music via a second communication medium to one or more reception units that are configured to receive the piece of music from the host computer via the second communication medium.
Owner:CDN INNOVATIONS LLC

Personalized sound system hearing profile selection process

A method of generating a personalized sound system hearing profile for a user. The method begins by selecting an initial profile, based on selected factors of user input. In an embodiment, the initial profile is selected based on demographic factors. Then the system identifies one or more alternate profiles, each having a selected relationship with the initial profile. The relationship between alternate profiles and the initial profile can be based on gain as a function of frequency, one alternate profile having a higher sensitivity at given frequencies and the other a lower sensitivity. The next step links at least one audio sample with the initial and alternate profiles and then plays the selected samples for the user. The system then receives identification of the preferred sample from the user; and selects a final profile based on the user's preference. An embodiment offers multiple sound samples in different modes, resulting in the selection of multiple final profiles for the different modes. Finally, the system may apply the final profile to the sound system.
Owner:HIMPP

Systems and methods for multimedia time stretching

The invention is related to methods and apparatus that can advantageously alter a playback rate of a multimedia presentation, such as a video clip. One embodiment of the invention permits a multimedia presentation to be sped up or slowed down with a controlled change in pitch of the sped up or slowed down audio. In one embodiment, this controlled change in the pitch permits the sped up or slowed down audio to retain a same sounding pitch as at normal playback speeds. In one embodiment, a duration is specified and playback of the video clip is advantageously sped to complete playback within the specified duration. In another embodiment, a finish by a time is specified, and the playback of the video clip is advantageously sped to complete playback by the specified time.
Owner:COREL CORP +1

Method for automatically producing music videos

Music videos are automatically produced from source audio and video signals. The music video contains edited portions of the video signal synchronized with the audio signal. An embodiment detects transition points in the audio signal and the video signal. The transition points are used to align in time the video and audio signals. The video signal is edited according to its alignment with the audio signal. The resulting edited video signal is merged with the audio signal to form a music video.
Owner:FUJIFILM BUSINESS INNOVATION CORP

Personalized audio system and method

A personalized audio system and method that overcomes many of the broadcast-type disadvantages associated with conventional radio stations. According to one embodiment, the personalized audio system includes the following: (1) a user interface that enables a user of the personalized audio system to specify a profile for a personalized audio channel, (2) a sound recording library comprising a plurality of sound recordings, (3) a playlist generator that (a) selects a plurality of sound recording identifiers from a set of sound recording identifiers, wherein each of the plurality of sound recording identifiers identifies a sound recording that matches the profile and that is stored in the library, and that (b) creates a playlist that lists the plurality of sound recording identifiers in a particular order, and (4) a sound recording reproducing device for reproducing the plurality of identified sound recordings according to the particular order in which the sound recording identifiers are listed in the playlist so that the user can listen to the sound recordings. Advantageously, the personalized audio system does not provide the user with a way to determine the plurality of sound recording identifiers prior to the reproducing means reproducing the plurality of sound recordings, and the personalized audio system does not provide the user with a way to directly control which sound recording identifiers in the set are selected by the playlist generator to be included in the plurality of sound recording identifiers.
Owner:MUSIC CHOICE