877 results about "Voice transformation" patented technology

Multimodal natural language query system for processing and analyzing voice and proximity-based queries

The present invention provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location / proximity module for receiving location / proximity information from a location / proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database.
Owner:PORTAL COMM LLC

Intelligent Text-to-Speech Conversion

Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues can be based on an analysis of a document prior to a text-to-speech conversion. Another aspect can produce an audio summary for a file. The audio summary for a document can thereafter be presented to a user so that the user can hear a summary of the document without having to process the document to produce its spoken text via text-to-speech conversion.
Owner:APPLE INC

Mobile Speech-to-Speech Interpretation System

Interpretation from a first language to a second language via one or more communication devices is performed through a communication network (e.g. phone network or the internet) using a server for performing recognition and interpretation tasks, comprising the steps of: receiving an input speech utterance in a first language on a first mobile communication device; conditioning said input speech utterance; first transmitting said conditioned input speech utterance to a server; recognizing said first transmitted speech utterance to generate one or more recognition results; interpreting said recognition results to generate one or more interpretation results in an interlingua; mapping the interlingua to a second language in a first selected format; second transmitting said interpretation results in the first selected format to a second mobile communication device; and presenting said interpretation results in a second selected format on said second communication device.
Owner:NANT HLDG IP LLC

Method and device for converting speech

An electronic device and method for a speech-to-text conversion procedure, wherein the overall conversion result may include smaller portions with multiple conversion options that are reproduced audibly, and optionally visually or tactilely, for user confirmation, thereby resulting in enhanced conversion accuracy with minimal additional effort by the user.
Owner:MOBITER DICTA

Method, apparatus and computer program product for providing voice conversion using temporal dynamic features

An apparatus for providing voice conversion using temporal dynamic features includes a feature extractor and a transformation element. The feature extractor may be configured to extract dynamic feature vectors from source speech. The transformation element may be in communication with the feature extractor and configured to apply a first conversion function to a signal including the extracted dynamic feature vectors to produce converted dynamic feature vectors. The first conversion function may have been trained using at least dynamic feature data associated with training source speech and training target speech. The transformation element may be further configured to produce converted speech based on an output of applying the first conversion function.
Owner:WSOU INVESTMENTS LLC
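
The abstract above does not say how the dynamic feature vectors are computed. As a minimal sketch, assuming they are first-order delta features (a common choice in speech processing, not confirmed by the abstract), the extraction step could look like this. The function name `delta_features` is hypothetical, and the trained conversion function is left out because the abstract does not specify it.

```python
def delta_features(frames):
    """Return first-order deltas for a sequence of feature vectors.

    Assumption: "dynamic features" means central-difference deltas.
    Edge frames reuse their nearest neighbour, so the output has the
    same number of frames as the input.
    """
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, last)]
        deltas.append([(b - a) / 2 for a, b in zip(prev_frame, next_frame)])
    return deltas


if __name__ == "__main__":
    # Four frames of a one-dimensional static feature.
    print(delta_features([[1.0], [2.0], [4.0], [7.0]]))
```

The deltas capture how each static feature changes over time, which is the kind of temporal information the apparatus is described as converting.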

Apparatus and method for foreign language study

The apparatus for foreign language study includes: a voice recognition device configured to recognize a speech entered by a user and convert the speech into a speech text; a speech intent recognition device configured to extract a user speech intent for the speech text using skill level information of the user and dialogue context information; and a feedback processing device configured to extract a different expression depending on the user speech intent and a speech situation of the user. According to the present invention, the intent of a learner's speech may be determined even though the learner's skill is low, and customized expressions for various situations may be provided to the learner.
Owner:POSTECH ACAD IND FOUND

Automated transcription system and method using two speech converting instances and computer-assisted correction

A system for automating transcription services for one or more users. This system receives a voice dictation file from a current user, which is automatically converted into a first written text based on a set of conversion variables. The same voice dictation file is automatically converted into a second written text based on a second set of conversion variables. The first and second sets of conversion variables have at least one difference, such as different speech recognition programs, different vocabularies, and the like. The system further includes a program for manually editing a copy of the first and second written text to create a verbatim text of the voice dictation file. This verbatim text can be delivered to the current user as transcribed text. The verbatim text can also be fed back into each speech recognition instance to improve the accuracy of each instance with respect to the human voice in the file.
Owner:CUSTOM SPEECH USA
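
One way to support the manual editing step described above is to show the editor only the places where the two written texts disagree. The sketch below is an illustration, not the patented method: it uses Python's standard `difflib.SequenceMatcher` on word lists, and the function name `disagreements` is hypothetical.

```python
import difflib


def disagreements(first_text, second_text):
    """Return (first, second) word spans where two transcripts differ."""
    a = first_text.split()
    b = second_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep both readings so a human editor can choose between them.
            spans.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return spans


if __name__ == "__main__":
    print(disagreements("the cat sat on the mat", "the cat sat on a mat"))
```

Words on which both conversions agree are skipped, so the editor's attention goes to the spans most likely to contain a recognition error.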

Electronic book with voice emulation features

A method and system for providing text-to-audio conversion of an electronic book displayed on a viewer. A user selects a portion of displayed text and converts it into audio. The text-to-audio conversion may be performed via a header file and pre-recorded audio for each electronic book, via text-to-speech conversion, or other available means. The user may select manual or automatic text-to-audio conversion. The automatic text-to-audio conversion may be performed by automatically turning the pages of the electronic book or by the user manually turning the pages. The user may also select to convert the entire electronic book, or portions of it, into audio. The user may also select an option to receive an audio definition of a particular word in the electronic book. The present invention allows a user to control the system by selecting options from a screen or by entering voice commands.
Owner:ADREA LLC

Multilingual text-to-speech system with limited resources

A multilingual text-to-speech system includes a source datastore of primary source parameters providing information about a speaker of a primary language. A plurality of primary filter parameters provides information about sounds in the primary language. A plurality of secondary filter parameters provides information about sounds in a secondary language. One or more secondary filter parameters is normalized to the primary filter parameters and mapped to a primary source parameter.
Owner:SOVEREIGN PEAK VENTURES LLC

Text-to-speech user interface control

A system and method includes detecting computer-readable text associated with a device, detecting a starting point for a text-to-speech conversion of the text, beginning the text-to-speech conversion upon detection of movement of a pointing device in a direction of text flow, and controlling a rate of the text-to-speech conversion based on a rate of movement of the pointing device in relation to the text to be converted.
Owner:NOKIA CORP
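
The rate control described above can be illustrated with a simple mapping from pointer speed to speaking rate. This is only a sketch under stated assumptions: the function name `speech_rate_wpm`, the five-characters-per-word estimate, and the 80 to 400 words-per-minute limits are all hypothetical and do not come from the abstract.

```python
def speech_rate_wpm(chars_moved, seconds, chars_per_word=5,
                    min_wpm=80, max_wpm=400):
    """Map pointer movement over text to a speaking rate in words per minute.

    All defaults are illustrative assumptions, not values from the patent.
    """
    if seconds <= 0:
        return min_wpm  # no usable timing information, fall back to slowest
    words_per_second = chars_moved / seconds / chars_per_word
    wpm = words_per_second * 60
    # Clamp so very slow or very fast pointer motion stays intelligible.
    return max(min_wpm, min(max_wpm, wpm))


if __name__ == "__main__":
    print(speech_rate_wpm(chars_moved=50, seconds=2))
```

Clamping is a design choice added here for usability; the abstract only states that the conversion rate follows the rate of pointer movement.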

Precision speech to text conversion

A speech-to-text conversion module uses a central database of user speech profiles to convert speech to text. Incoming audio information is fragmented into numerous audio fragments based upon detecting silence. The audio information is also converted to numerous text files by any number of speech engines. Each text file is then fragmented into numerous text fragments based upon the boundaries established during the audio fragmentation. Each set of text fragments from the different speech engines corresponding to a single audio fragment is then compared. The best approximation of the audio fragment is produced from the set of text fragments; a hybrid may be produced. If no agreement is reached, the audio fragment and the set of text fragments are sent to human agents who verify and edit to produce a final edited text fragment that best corresponds to the audio fragment. Fragmentation that produces overlapping audio fragments requires splicing of the final text fragments to produce the output text file.
Owner:FONEWEB
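
Two steps of the process above lend themselves to a short sketch: splitting audio at silence and comparing the text fragments returned by several engines. The code below is a simplified illustration, not the patented implementation. The amplitude threshold, the minimum silence length, the strict-majority rule, and both function names are assumptions.

```python
from collections import Counter


def split_on_silence(samples, threshold=0.01, min_silence=3):
    """Return (start, end) index pairs of non-silent fragments (end exclusive)."""
    fragments = []
    start = None
    quiet = 0
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence:
                # Close the fragment at the first silent sample of the run.
                fragments.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        fragments.append((start, len(samples) - quiet))
    return fragments


def pick_consensus(candidates):
    """Return the text a strict majority of engines agree on, else None.

    None stands for "no agreement", the case the abstract routes to
    human agents.
    """
    if not candidates:
        return None
    text, count = Counter(candidates).most_common(1)[0]
    return text if count > len(candidates) / 2 else None


if __name__ == "__main__":
    audio = [0, 0.5, 0.5, 0, 0, 0, 0.4, 0.4, 0]
    print(split_on_silence(audio))
    print(pick_consensus(["hello world", "hello world", "hello word"]))
```

The sketch does not build a hybrid from partially agreeing fragments or splice overlapping fragments; both are mentioned in the abstract but would need details it does not give.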

Method and system for statistic-based distance definition in text-to-speech conversion

A method for distance definition in a text-to-speech conversion system by applying Gaussian Mixture Model (GMM) to a distance definition. According to an embodiment, the text that is to be subjected to text-to-speech conversion is analyzed to obtain a text with descriptive prosody annotation; clustering is performed for samples in the obtained text; and a GMM model is generated for each cluster, to determine the distance between the sample and the corresponding GMM model.
Owner:CERENCE OPERATING CO
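
The abstract above does not define the distance itself. A minimal sketch, assuming the distance between a sample and a cluster's GMM is taken as the negative log-likelihood of the sample under that model, is shown below for one-dimensional features. The function name and this choice of distance are assumptions for illustration.

```python
import math


def gmm_distance(x, weights, means, variances):
    """Negative log-likelihood of scalar x under a 1-D Gaussian mixture.

    Assumption: a lower likelihood means the sample is "farther" from
    the model. Very distant samples may underflow to zero density, so a
    tiny floor keeps the logarithm defined.
    """
    density = 0.0
    for w, m, v in zip(weights, means, variances):
        density += w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return -math.log(max(density, 1e-300))


if __name__ == "__main__":
    model = ([0.5, 0.5], [0.0, 4.0], [1.0, 1.0])
    print(gmm_distance(0.0, *model), gmm_distance(2.0, *model))
```

A sample near one of the component means gets a smaller distance than a sample lying between the components, which matches the intuitive reading of "distance to the cluster".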

Speech transformation using log energy and orthogonal matrix

Calculate the log frame energy value of each of a pre-determined number n of frames of an input speech signal and apply a matrix transform to the n log frame energy values to form a temporal matrix representing the input speech signal. The matrix transform may be a discrete cosine transform.
Owner:BRITISH TELECOMM PLC
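
The computation described above can be sketched in a few lines of Python. This is an illustration under assumptions: frames are taken as non-overlapping blocks of `frame_len` samples, a small floor is added before the logarithm to avoid log(0), and an unnormalized DCT-II is used as the matrix transform. The function names are hypothetical, and the result here is a vector of transformed coefficients for a single energy track.

```python
import math


def log_frame_energies(signal, frame_len, n):
    """Return the log energy of each of the first n non-overlapping frames."""
    energies = []
    for i in range(n):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame)
        energies.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return energies


def dct_ii(values):
    """Unnormalized DCT-II: X[k] = sum_t x[t] * cos(pi * k * (2t + 1) / (2n))."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
            for t, v in enumerate(values))
        for k in range(n)
    ]


if __name__ == "__main__":
    signal = [1.0] * 40  # constant signal, so every frame has equal energy
    coefficients = dct_ii(log_frame_energies(signal, frame_len=10, n=4))
    print(coefficients)
```

For a signal whose energy does not change from frame to frame, only the first coefficient is non-zero; the higher coefficients describe how the log energy varies over the n frames.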

Prosthetic hearing device that transforms a detected speech into a speech of a speech form assistive in understanding the semantic meaning in the detected speech

A speech transformation apparatus comprises a microphone for detecting speech and generating a speech signal; a signal processor for performing a speech recognition process using the speech signal; a speech information generator for transforming the recognition result responsive to the physical state of the user, the operating conditions, and / or the purpose for using the apparatus; and a display unit and loudspeaker for generating a control signal for outputting a raw recognition result and / or a transformed recognition result. In a speech transformation apparatus thus constituted, speech enunciated by a spoken-language-impaired individual can be transformed and presented to the user, and sounds from outside sources can also be transformed and presented to the user.
Owner:YUGEN KAISHA GM & M

Internet accessed text-to-speech reading assistant

An Internet accessed server that on demand downloads and activates text-to-speech program elements to a subscriber's computer. Program elements are customized to match the operating system of the subscriber's computer. Upon termination of the text-to-speech session, the server deactivates all program elements and the subscriber becomes free to reinitiate over the Internet the text-to-speech program on the same or another computer system.
Owner:SZCZEPANEK NOAH JOHN

Method and system for searching recorded speech and retrieving relevant segments

A system and method for searching recorded speech is disclosed. The system and method comprises converting the recorded speech into text using a voice recognition system. As the speech is being converted, naturally occurring breaks in the language are used to take time indexes from the recording. The system and method includes creating a full text index of the recorded speech utilizing an information extender. The full text index contains a plurality of time stamps that point to the occurrence of words in the recorded speech. The text is then searched by a full text search server that has linguistic search capabilities using the full text index. Finally, the searched text, the text index and the recorded speech are stored in the database. The recorded speech is searched by locating relevant phrases or words, and then mapping the time stamps associated with the relevant phrases or words back to the recorded speech in the database.
Owner:UNILOC 2017 LLC
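
The idea of a full text index whose entries point back into the recording can be sketched as follows. This is an illustration only: it assumes the recognizer already yields `(word, start_time_in_seconds)` pairs, and the function names are hypothetical. It does not model the linguistic search capabilities mentioned in the abstract.

```python
from collections import defaultdict


def build_time_index(timed_words):
    """Map each lower-cased word to the times at which it was spoken."""
    index = defaultdict(list)
    for word, seconds in timed_words:
        index[word.lower()].append(seconds)
    return index


def find_phrase(timed_words, phrase):
    """Return the start time of every occurrence of a phrase."""
    target = phrase.lower().split()
    if not target:
        return []
    words = [word.lower() for word, _ in timed_words]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(timed_words[i][1])  # time stamp of the first word
    return hits


if __name__ == "__main__":
    transcript = [("Voice", 0.0), ("search", 0.4), ("finds", 0.9),
                  ("voice", 1.5), ("search", 1.9)]
    print(build_time_index(transcript)["voice"])
    print(find_phrase(transcript, "voice search"))
```

The returned time stamps are what a player would use to jump to the relevant segment of the recorded speech.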

Method and system for interactive conversational dialogue for cognitively overloaded device users

A system and method to interactively converse with a cognitively overloaded user of a device, includes maintaining a knowledge base of information regarding the device and a domain, organizing the information in at least one of a relational manner and an ontological manner, receiving speech from the user, converting the speech into a word sequence, recognizing a partial proper name in the word sequence, identifying meaning structures from the word sequence using a model of the domain information, adjusting a boundary of the partial proper names to enhance an accuracy of the meaning structures, interpreting the meaning structures in a context of the conversation with the cognitively overloaded user using the knowledge base, selecting a content for a response to the cognitively overloaded user, generating the response based on the selected content, the context of the conversation, and grammatical rules, and synthesizing speech wave forms for the response.
Owner:ROBERT BOSCH CORP +1

System and method for illustrating a menu of insights associated with visualizations

A system and method are provided for generating one or more menus having options that display insights from visualizations. The options presented in the menus enable users to determine relationships between elements of the visualization. The relationships may be displayed textually to enable users to navigate the menus using a keyboard, a text-to-voice converter, and / or pointers.
Owner:IBM CORP

Method for building voice transformation model and method and system for voice transformation

The invention discloses a method for building a voice transformation model, and a method and device for achieving voice transformation between a first language and a second language. The transformation method includes: segmenting the first-language voice to be transformed to obtain at least one first-language syllable; recording the syllable duration parameter of each first-language syllable obtained by the segmentation; extracting the base frequency parameter of each first-language syllable; determining the base frequency parameter and syllable duration of each corresponding second-language syllable according to the base frequency parameter and syllable duration parameter of each first-language syllable; and adjusting the voice waveform of each corresponding first-language syllable according to the base frequency parameter and syllable duration of each second-language syllable, to obtain and output the voice waveform of each second-language syllable. When the method is used for voice transformation, the voice quality of the input voice and of the transformed output voice is basically consistent, and the transformation can be conducted in real time.
Owner:SIEMENS AG

Linguistic extraction of temporal and location information for a recommender system

One embodiment of the present invention provides a system that recommends activities. During operation, the system receives a piece of content obtained from text or converted to text from speech. The system then analyzes the received content to identify any activity type, indication of willingness to participate in any type of activities, and at least one piece of temporal information, which can be implicitly and / or explicitly stated in the content, and / or one piece of location information associated with the activity type. The system further recommends one or more activities, venues, and / or services that afford or support activities for a user based on the information extracted from the content.
Owner:XEROX CORP

Method and apparatus for speech privacy

A privacy apparatus adds a privacy sound based on a speaker's own voice into the environment, thereby confusing listeners as to which of the sounds is the real source. This permits disruption of the ability to understand the source speech of the user by eliminating segregation cues that the auditory system uses to interpret speech. The privacy apparatus minimizes segregation cues. The privacy apparatus is relatively quiet and thus easily acceptable in a typical open floor design office space. The privacy apparatus contains an A / D converter that converts the speech into a digital signal, a DSP that converts the digital signal into a privacy signal with pre-recorded speech fragments of the person speaking, a D / A converter that converts the privacy signal into an output signal and one or more loudspeakers from which the output signal is emitted.
Owner:HERMAN MILLER INC