
90 results about "Speech transcription" patented technology

Transcription (linguistics) is the representation of speech or signing in written form. Orthographic transcription is a transcription method that employs the standard spelling system of each target language. Phonetic transcription is the representation of specific speech sounds or sign components.

Systems and methods for building a native language phoneme lexicon having native pronunciations of non-native words derived from non-native pronunciations

Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
Owner:NUANCE COMM INC

Systems and methods for selecting from multiple phonetic transcriptions for text-to-speech synthesis

A system and method for generating synthetic speech, which operates in a computer-implemented Text-To-Speech (TTS) system. The system comprises at least a speaker database that has been previously created from user recordings, a Front-End system to receive an input text, and a TTS engine. The Front-End system generates multiple phonetic transcriptions for each word of the input text, and the TTS engine uses a cost function to select which phonetic transcription is the most appropriate for searching the speech segments within the speaker database to be concatenated and synthesized.
Owner:CERENCE OPERATING CO
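
As a rough illustration of the selection step in the abstract above, the sketch below picks one of several candidate phonetic transcriptions by minimizing a cost. The abstract does not say what the cost function measures; the cost used here (the number of adjacent phoneme pairs missing from the speaker database) and the names `coverage_cost`, `select_transcription`, and `inventory` are assumptions made only for this example.

```python
def diphones(phonemes):
    """Return adjacent phoneme pairs for a transcription."""
    return list(zip(phonemes, phonemes[1:]))


def coverage_cost(phonemes, inventory):
    """Hypothetical cost: count of diphones with no unit in the speaker database."""
    return sum(1 for pair in diphones(phonemes) if pair not in inventory)


def select_transcription(candidates, inventory):
    """Pick the candidate transcription with the lowest cost."""
    return min(candidates, key=lambda phonemes: coverage_cost(phonemes, inventory))


# Assumed toy inventory of diphones recorded for one speaker.
inventory = {("t", "ah"), ("ah", "m"), ("m", "ey"), ("ey", "t"), ("t", "ow")}
candidates = [
    ["t", "ah", "m", "aa", "t", "ow"],
    ["t", "ah", "m", "ey", "t", "ow"],
]
best = select_transcription(candidates, inventory)
```

A real system would likely combine several terms (for example target and join costs); the single coverage term is only meant to show where the choice among transcriptions happens.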

Apparatus, system, and method for voice chat transcription

An apparatus, system, and method to transcribe a voice chat session initiated from a text chat session. The system includes a chat server, a voice server, and a transcription engine. The chat server is configured to facilitate a text chat session between multiple instant messaging clients. The voice server is coupled to the chat server and configured to facilitate a transition from the text chat session to a voice chat session between the multiple instant messaging clients. The transcription engine is coupled to the voice server and configured to generate a voice transcription of the voice chat session. The voice transcription may be aggregated into a text chat history.
Owner:SNAP INC

Continuous speech transcription performance indication

A method of providing speech transcription performance indication includes receiving, at a user device, data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device.
Owner:AMAZON TECH INC

Combining Re-Speaking, Partial Agent Transcription and ASR for Improved Accuracy / Human Guided ASR

A speech transcription system is described for producing a representative transcription text from one or more different audio signals representing one or more different speakers participating in a speech session. A preliminary transcription module develops a preliminary transcription of the speech session using automatic speech recognition having a preliminary recognition accuracy performance. A speech selection module enables user selection of one or more portions of the preliminary transcription to receive higher accuracy transcription processing. A final transcription module is responsive to the user selection for developing a final transcription output for the speech session having a final recognition accuracy performance for the selected one or more portions which is higher than the preliminary recognition accuracy performance.
Owner:NUANCE COMM INC

Speech Recognition Model Construction Method, Speech Recognition Method, Computer System, Speech Recognition Apparatus, Program, and Recording Medium

A construction method for a speech recognition model, in which a computer system includes: a step of acquiring alignment between speech of each of a plurality of speakers and a transcript of the speaker; a step of joining transcripts of the respective ones of the plurality of speakers along a time axis, creating a transcript of speech of mixed speakers obtained from synthesized speech of the speakers, and replacing predetermined transcribed portions of the plurality of speakers overlapping on the time axis with a unit which represents a simultaneous speech segment; and a step of constructing at least one of an acoustic model and a language model which make up a speech recognition model, based on the transcript of the speech of the mixed speakers.
Owner:IBM CORP

Real-time transcription correction system

A voice transcription system employing a speech engine to transcribe spoken words detects the spelled entry of words via keyboard or voice and invokes a database of common words, attempting to complete the word before all the letters have been input. This database is separate from the database of words used by the speech engine. A voice level indicator is presented to the operator to help the operator keep his or her voice in the ideal range of the speech engine.
Owner:ULTRATEC INC

Multi-command single utterance input method

Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user.
Owner:APPLE INC
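
The pipeline in the abstract above (transcribed text, candidate substrings, per-candidate probability, threshold) might be sketched as follows. This is a minimal illustration, not the patented method: the verb list, the conjunction list, and the scoring function `toy_score` are hypothetical stand-ins for the domain keywords and probability models the abstract mentions.

```python
IMPERATIVE_VERBS = {"call", "play", "set", "send", "open"}  # hypothetical list
CONJUNCTIONS = {"and", "then"}  # hypothetical list


def split_on_verbs(text):
    """Split a transcribed utterance into candidate substrings at imperative verbs."""
    candidates, current = [], []
    for word in text.lower().split():
        if word in IMPERATIVE_VERBS and current:
            candidates.append(current)
            current = []
        current.append(word)
    if current:
        candidates.append(current)
    # Drop a trailing conjunction left over from the split.
    return [
        " ".join(c[:-1] if len(c) > 1 and c[-1] in CONJUNCTIONS else c)
        for c in candidates
    ]


def toy_score(candidate):
    """Stand-in probability that a candidate is an actionable command."""
    words = candidate.split()
    return 0.9 if words[0] in IMPERATIVE_VERBS and len(words) > 1 else 0.1


def actionable_commands(text, score, threshold=0.5):
    """Return the candidates only if every probability exceeds the threshold."""
    candidates = split_on_verbs(text)
    if candidates and all(score(c) > threshold for c in candidates):
        return candidates
    return []


commands = actionable_commands("Play some jazz and set a timer", toy_score)
```

Each returned substring would then go to intent determination and execution, which are outside the scope of this sketch.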

Instant Translation System

A method of, and corresponding headset computer for, performing instant speech translation including, establishing a local network including a link between a first and a second headset computer in which preferred language settings of each headset computer are exchanged, transmitting captured speech in a first language from a first headset computer to a network-based speech recognition service to recognize and transcribe the captured speech as text, receiving the text at the first headset computer, broadcasting the text over the local network to at least the second headset computer, receiving the text at the second headset computer, transmitting the received text from the second headset computer to a network-based text translation service to translate the text to a text in a second language, receiving the text in the second language at the second headset computer from the network-based text translation service, and displaying the translated text at the second headset computer.
Owner:KOPIN CORPORATION

Use of intermediate speech transcription results in editing final speech transcription results

A communication system includes at least one transmitting device and at least one receiving device, one or more network systems for connecting the transmitting device to the receiving device, and an automatic speech recognition (“ASR”) system, including an ASR engine. A user speaks an utterance into the transmitting device, and the recorded speech audio is sent to the ASR engine. The ASR engine returns intermediate transcription results to the transmitting device, which displays the intermediate transcription results in real-time to the user. The intermediate transcription results are also correlated by utterance fragment to final transcription results and displayed to the user. The user may use the information thus presented to make decisions as to whether to edit the final transcription results or to speak the utterance again, thereby repeating the process. The intermediate transcription results may also be used by the user to edit the final transcription results.
Owner:AMAZON TECH INC +1

Automatic multi-language phonetic transcribing system

A multi-language phonetic transcribing system and method are provided for automatically transcribing speech into a phonetic equivalent. The phonetic transcribing system need only be able to recognize the limited number of phonemes of a particular language, e.g., the forty-two phonemes in the English language. Each language to be translated may be broken down into the phonetic elements and stored in a phonetic library for that language. The transcribing system detects speech, converts the speech to an electrical signal, and analyzes the frequency, amplitude, and timing characteristics of the speech to produce incoming phoneme information. The incoming phoneme information is compared to the phonetic elements in the active library. A correlator determines the degree of correlation between the incoming phoneme information and the stored information for each phoneme in the library. For each match, based on the degree of correlation, a respective phoneme is preferably stored and / or printed.
Owner:THE UNITED STATES OF AMERICA AS REPRESENTED BY THE SECRETARY OF THE NAVY

Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer

The invention relates to an operating method for an automated language recognizer intended for the speaker-independent language recognition of words from different languages, particularly for recognizing names from different languages. The method is based on a language defined as the mother tongue and has an input phase for establishing a language recognizer vocabulary. Phonetic transcripts are determined for words in various languages in order to obtain phoneme sequences for pronunciation variants. The phonemes of each relevant phoneme set of the mother tongue are then specifically mapped to determine phoneme sequences that correspond to pronunciation variants.
Owner:SIEMENS AG

Transcription of Speech

A speech media transcription system comprises a playback device arranged to play back speech delimited in segments. The system is programmed to provide, for a segment being transcribed, an adaptive estimate of the proportion of the segment that has not been transcribed by a transcriber. The device is arranged to play back that proportion of the segment, optionally after having already played back the entire segment. Additionally, a segmentation engine is arranged to divide speech media into a plurality of segments by identifying speech as such and using timing information but without using a machine conversion of the speech media into text or a representation of text.
Owner:JPAL

System and method for decoding speech

The system and method for speech decoding in speech recognition systems provides decoding for speech variants common to such languages. These variants include within-word and cross-word variants. For decoding of within-word variants, a data-driven approach is used, in which phonetic variants are identified, and a pronunciation dictionary and language model of a dynamic programming speech recognition system are updated based upon these identifications. Cross-word variants are handled with a knowledge-based approach, applying phonological rules, part-of-speech tagging or tagging of small words to a speech transcription corpus and updating the pronunciation dictionary and language model of the dynamic programming speech recognition system based upon identified cross-word variants.
Owner:KING FAHD UNIVERSITY OF PETROLEUM AND MINERALS +1

Spoken Document Retrieval using Multiple Speech Transcription Indices

A method and system are provided of spoken document retrieval using multiple search transcription indices. The method includes receiving a query input formed of one or more query terms and determining a type of a query term, wherein a type includes a term in a speech recognition vocabulary or a term not in a speech recognition vocabulary. One or more indices of search transcriptions are selected for searching the query term based on the type of the query term. The one or more indices are generated using different speech transcription methods. The results for the query term are scored by the one or more indices and the results of the one or more indices for the query term are merged. The results of the one or more query terms are then merged to provide the results for the query.
Owner:NUANCE COMM INC
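
The index-selection idea in the abstract above can be illustrated with a small sketch. It assumes two indices, a word-level one for in-vocabulary terms and a phonetic one for out-of-vocabulary terms, each mapping a term to per-document scores; the function name `retrieve`, the index layout, and merging by summing scores are all assumptions, since the abstract does not specify them.

```python
def retrieve(query_terms, vocabulary, word_index, phonetic_index):
    """Route each query term to an index by its type, then merge the results."""
    merged = {}
    for term in query_terms:
        # In-vocabulary terms use the word index; other terms use the phonetic index.
        index = word_index if term in vocabulary else phonetic_index
        for doc_id, score in index.get(term, {}).items():
            merged[doc_id] = merged.get(doc_id, 0.0) + score
    # Highest combined score first; ties broken by document id.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


# Assumed toy data.
vocabulary = {"budget"}
word_index = {"budget": {"doc1": 1.0, "doc2": 0.5}}
phonetic_index = {"kopin": {"doc2": 1.0}}
ranking = retrieve(["budget", "kopin"], vocabulary, word_index, phonetic_index)
```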

Method for building language model, speech recognition method and electronic apparatus

A method for building a language model, a speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. Phonetic transcriptions of a speech signal are obtained from an acoustic model. Phonetic spellings matching the phonetic transcriptions are obtained according to the phonetic transcriptions and a syllable acoustic lexicon. According to the phonetic spellings, a plurality of text sequences and a plurality of text sequence probabilities are obtained from a language model. Each phonetic spelling is matched to a candidate sentence table; a word probability of each phonetic spelling matching a word in a sentence of the sentence table is obtained; and the word probabilities of the phonetic spellings are calculated so as to obtain the text sequence probabilities. The text sequence corresponding to the largest of the text sequence probabilities is selected as the recognition result of the speech signal.
Owner:VIA TECH INC

Fast transcription of speech

A transcription tool assists a user in transcribing audio. The transcription tool includes an audio classification component that classifies an incoming audio stream based on whether portions of the audio stream contain speech data. The transcription tool plays the portions of the audio that contain speech data back to the user and skips the portions of the audio that do not contain speech data. Using a relatively simple command set, the user can control the transcription tool and annotate transcribed text.
Owner:BBN TECHNOLOGIES CORP

Method and system for efficient pacing of speech for transcription

A method and system for improving the efficiency of real-time and non-real-time speech transcription by machine speech recognizers, human dictation typists, and human voicewriters using speech recognizers. In particular, the pacing with which recorded speech is presented to transcriptionists is automatically adjusted by monitoring the transcriptionists' output by comparing the output acoustically or phonetically to the presented recorded speech as well as monitoring the resulting transcription, and accordingly adjusting the pacing.
Owner:COGI

Combining re-speaking, partial agent transcription and ASR for improved accuracy / human guided ASR

A speech transcription system is described for producing a representative transcription text from one or more different audio signals representing one or more different speakers participating in a speech session. A preliminary transcription module develops a preliminary transcription of the speech session using automatic speech recognition having a preliminary recognition accuracy performance. A speech selection module enables user selection of one or more portions of the preliminary transcription to receive higher accuracy transcription processing. A final transcription module is responsive to the user selection for developing a final transcription output for the speech session having a final recognition accuracy performance for the selected one or more portions which is higher than the preliminary recognition accuracy performance.
Owner:MICROSOFT TECH LICENSING LLC

Method and system for automatic transcription prioritization

A visual toolkit for prioritizing speech transcription is provided. The toolkit can include a logger (102) for capturing information from a speech recognition system, a processor (104) for determining an accuracy rating of the information, and a visual display (106) for categorizing the information and prioritizing a transcription of the information based on the accuracy rating. The prioritizing identifies spoken utterances having a transcription priority in view of the recognized result. The visual display can include a transcription category (156) having a modifiable textbox entry with a text entry initially corresponding to a text of the recognized result, and an accept button (157) for validating a transcription of the recognized result. The categories can be automatically ranked by the accuracy rating in an ordered priority for increasing an efficiency of transcription.
Owner:NUANCE COMM INC

Speech processing method and device, and device for speech processing

The embodiments of the invention provide a speech processing method and a device, and a device for speech processing. The method comprises the following steps: after one speech transcription operation on a speech stream, acquiring a target speech data packet needing to be re-transcribed from the speech data packets in the speech stream according to a processing result of the speech data packets in the speech stream returned by a server, wherein the processing result includes a speech recognition result and / or an error code; resending the target speech data packet to the server to enable the server to recognize speech from the target speech data packet; receiving a speech recognition result returned by the server for the target speech data packet; and adding the speech recognition result corresponding to the target speech data packet to a speech transcription result corresponding to the speech stream. According to the embodiments of the invention, the integrity of the speech transcription result corresponding to the speech stream is improved, and the accuracy of speech transcription is improved.
Owner:BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
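
A minimal sketch of the re-transcription loop described in the abstract above, under stated assumptions: packets are keyed by sequence number, each server result is a dict with `text` and `error` fields, and a packet needs resending when its result is missing, carries an error code, or has empty text. These field names, the selection rule, and the `recognize` callback (standing in for the server round trip) are hypothetical.

```python
def packets_to_resend(packets, results):
    """Return sequence numbers whose first-pass result is missing or failed."""
    targets = []
    for seq in sorted(packets):
        result = results.get(seq)
        if result is None or result.get("error") is not None or not result.get("text"):
            targets.append(seq)
    return targets


def complete_transcript(packets, results, recognize):
    """Resend failed packets and splice the new text into the stream transcript."""
    targets = set(packets_to_resend(packets, results))
    pieces = []
    for seq in sorted(packets):
        if seq in targets:
            pieces.append(recognize(packets[seq]))  # second attempt for this packet
        else:
            pieces.append(results[seq]["text"])
    return " ".join(pieces)


# Assumed toy data: packet 1 failed and packet 2 never got a result.
packets = {0: b"a", 1: b"b", 2: b"c"}
results = {0: {"text": "hello", "error": None}, 1: {"text": "", "error": "E_TIMEOUT"}}
fake_server = {b"b": "big", b"c": "world"}
transcript = complete_transcript(packets, results, lambda audio: fake_server[audio])
```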

Speech transcription tool for efficient speech transcription

A transcription tool [115] includes a graphical user interface [209] that displays the waveform of an input audio signal to a user. The user may define speaker turn segments using the displayed waveform. The graphical user interface further displays a transcription section [302] that includes a textual representation of speech that was transcribed by the user and a graphical representation of annotation information [314] relating to the transcribed text. The user may enter the annotation information on-the-fly while transcribing the text using predefined keyboard shortcut commands or other mechanisms. The graphical user interface may further display a structured representation section [303] that may present the transcribed text as a hierarchical tree structure.
Owner:BBN TECHNOLOGIES CORP

System and method for generating accurate speech transcription from natural speech audio signals

Apparatus for generating accurate speech transcription from natural speech, comprising: a data storage for storing a plurality of audio data items, each of which is a recitation of text by a specific speaker; a plurality of ASR modules, each of which is trained to optimally create a unique acoustic / linguistic model according to the spectral components contained in said audio data item, analyzing each audio data item and representing said audio data item by an ASR module; a memory for storing all unique acoustic / linguistic models; and a controller, adapted to: receive natural speech audio signals and divide each natural speech audio signal into equal segments of a predetermined time; adjust the length of each segment, such that each segment will contain one or more complete words; distribute said segments to all ASR modules and activate each ASR module to generate a transcription of the words in each segment according to the level of matching to its unique acoustic / linguistic model; calculate, for each given word in a segment, a confidence measure being the probability that said given word is correct; for each segment and for each ASR module, calculate the average confidence of the transcription; obtain the confidence for each word in the segment and calculate the mean confidence value of said word; for each segment, decide which transcription is the most accurate by choosing only the ASR module with the highest average confidence from all chosen ASR modules for said segment; and create the transcription of said audio signal by combining all transcriptions resulting from the decisions made for each segment.
Owner:VOCASEE TECH LTD
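
The per-segment decision in the abstract above (keep the transcription from the ASR module with the highest average word confidence, then concatenate the winners) can be sketched as below. The data layout, one dict per segment mapping a module name to a list of (word, confidence) pairs, and the function name are assumptions for illustration only.

```python
from statistics import mean


def pick_best_transcriptions(segments):
    """For each segment, keep the hypothesis with the highest average confidence."""
    chosen = []
    for hypotheses in segments:
        # Average word confidence per ASR module, ignoring empty hypotheses.
        averages = {
            name: mean(conf for _, conf in words)
            for name, words in hypotheses.items()
            if words
        }
        if not averages:
            continue  # no module produced any words for this segment
        best = max(averages, key=averages.get)
        chosen.append(" ".join(word for word, _ in hypotheses[best]))
    return " ".join(chosen)


# Assumed toy output from two ASR modules over two segments.
segments = [
    {"a": [("hello", 0.9), ("word", 0.5)], "b": [("hello", 0.8), ("world", 0.9)]},
    {"a": [("again", 0.95)], "b": [("a", 0.4), ("gain", 0.3)]},
]
text = pick_best_transcriptions(segments)
```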

Inquiry information processing method and device, storage medium and computer equipment

The invention discloses an inquiry information processing method and device, a storage medium and computer equipment, and relates to the technical field of artificial intelligence; the main purpose is to receive and identify answering information of a patient, extract keywords through word segmentation processing, match corresponding question information, build an optimal inquiry path by using a reinforcement learning model, and output the inquiry information corresponding to a path end point, so that more accurate question information is matched according to answers, and the inquiry accuracy and the inquiry efficiency are improved. The method comprises the following steps: acquiring answer text data transcribed by voice; performing word segmentation processing on the text data of the answering language; obtaining a numerical vector of the text data of the answering language through feature extraction; obtaining corresponding question text data according to a preset question answering matching algorithm and the feature vector of the answer text data; and constructing an optimal inquiry path by utilizing a preset machine learning algorithm, the answer text data feature vector and the question text data, and outputting inquiry information corresponding to the path end point.
Owner:NORTHEASTERN UNIV

Automatic application-based exercise tracking system and method

An automatic application-based exercise tracking system and methods comprising: i) voice-transcribed or typed text natural language processing and automatic tracking to record exercises, comprehensive exercise quantities, and calories burned data, and ii) multi-exercise administration to record multiple exercises and related data in a single user voice-transcribed or typed text submission. Further, such automatic application-based exercise tracking system is usable through computers, tablets, mobile phones, smart watches, wearables and other similar devices.
Owner:GENESANT TECH INC