9347 results about "Voice data" patented technology

System and method for the creation and automatic deployment of personalized, dynamic and interactive inbound and outbound voice services, with real-time interactive voice database queries

The interactive delivery of voice service messages communicating financial, personal or other news telecasts can be accessed at a subscriber's convenience on an inbound telephone, network enabled or other call. A voice service bureau may generate voice messages for individual subscribers according to their natural language queries. For instance, a subscriber may inquire “What was the closing Dow today” and the system converts the request to digital form, discriminates search terms, interrogates a database and annunciates to the caller “The closing Dow was 11,000 today”. Responses to those or other inquiries may be delivered in real time or substantially real time.
Owner:MICROSTRATEGY

Method and system for text-to-speech synthesis with personalized voice

A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).
Owner:CERENCE OPERATING CO

Method and system for adapting a wireless mobile communication device for wireless transactions

A method for configuring a mobile communication device to perform transactions using a second communication channel that is different from a first communication channel through which the mobile communication device sends voice data. The method includes attaching a secure element to the mobile communication device. The secure element includes a memory storing an application, a processor configured to execute the application stored in the memory; and a wireless transceiver configured to send transaction data associated with the executed application through the second communication channel to a terminal that is remote from the mobile communication device.
Owner:BLAZE MOBILE

Communication-status notification apparatus for communication system, communication-status display apparatus, communication-status notification method, medium in which communication-status notification program is recorded and communication apparatus

A communication-status notification apparatus enables a subscriber to observe various kinds of communication status in a network easily via the subscriber's own terminal in a communication system. The apparatus includes a request analysis section for discriminating whether or not voice data received by gateway equipment from a subscriber terminal contains a request for monitoring/controlling or notification of a communication status in the network and for analyzing the content of the request when it does, a communication-status monitor/control section for monitoring/controlling the communication status responsive to the content of the request analyzed by the request analysis section based on a processing status of the voice data in the gateway equipment, and a communication-status notification section for notifying the subscriber terminal of the communication status monitored/controlled by the communication-status monitor/control section via the gateway equipment responsive to the content of the request analyzed by the request analysis section. The apparatus is useful when applied to VoIP gateway equipment or the like used for a VoIP communication system.
Owner:FUJITSU LTD

Virtual conference room for voice conferencing

A system and method are disclosed for packet voice conferencing. The system and method divide a conferencing presentation sound field into sectors, and allocate one or more sectors to each conferencing endpoint. At some point between capture and playout, the voice data from each endpoint is mapped into its designated sector or sectors. Thereafter, when the voice data from a plurality of participants from multiple endpoints is combined, a listener can identify a unique apparent location within the presentation sound field for each participant. The system allows a conference participant to increase their comprehension when multiple participants speak simultaneously, as well as alleviate confusion as to who is speaking at any given time.
Owner:CISCO TECH INC
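
The sector mapping described in this abstract can be illustrated with a minimal sketch. It assumes a 180-degree stereo sound field split into equal sectors and uses constant-power panning toward each sector's center; the function names, the equal-width allocation, and the panning law are illustrative assumptions, not details taken from the patent.

```python
import math

def allocate_sectors(endpoints, field_degrees=180.0):
    """Split the sound field into equal sectors, one per endpoint (assumed policy)."""
    width = field_degrees / len(endpoints)
    half = field_degrees / 2
    return {ep: (i * width - half, (i + 1) * width - half)
            for i, ep in enumerate(endpoints)}

def pan_gains(angle_deg, field_degrees=180.0):
    """Constant-power (left, right) gains for an apparent angle in the field."""
    theta = (angle_deg + field_degrees / 2) / field_degrees * (math.pi / 2)
    return math.cos(theta), math.sin(theta)

def mix(samples_by_endpoint, sectors):
    """Combine one mono sample per endpoint into a single stereo sample."""
    left = right = 0.0
    for ep, sample in samples_by_endpoint.items():
        lo, hi = sectors[ep]
        gain_l, gain_r = pan_gains((lo + hi) / 2)  # place voice at sector center
        left += sample * gain_l
        right += sample * gain_r
    return left, right

sectors = allocate_sectors(["a", "b", "c"])
stereo = mix({"a": 1.0, "c": 1.0}, sectors)
```

With three endpoints, endpoint "a" lands left of center, "b" at center, and "c" right of center, so simultaneous talkers arrive from distinct apparent locations.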

System, method, program product, and networking use for recognizing words and their parts of speech in one or more natural languages

A system, method, and computer program are disclosed for recognizing one or more words not listed in a dictionary database. One or more sequences of characters in the word are checked to determine a probability that the word is valid. A prefix removal process removes any prefixes from a word, and obtains information about the removed prefix. A suffix removal process removes any suffixes from the word, and obtains information about the removed suffix. A root process obtains information about a root word from the dictionary database. A combination process then determines if the prefix, the root, and the suffix can be combined into a valid word as defined by one or more combination rules, obtains one or more of the possible parts of speech of the valid word, and stores the parts of speech with the valid word in the dictionary database.
Owner:IBM CORP
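
The prefix removal, suffix removal, root lookup, and combination steps in this abstract can be sketched as follows. The dictionary, affix tables, and combination rules are hypothetical toy data, and the character-sequence probability check mentioned in the abstract is omitted; this is an illustration of the general idea, not the patented method.

```python
# Toy data, invented for illustration only.
DICTIONARY = {"happy": {"adj"}, "read": {"verb"}, "kind": {"adj", "noun"}}
PREFIXES = {"un": "negation", "re": "repetition"}
# Suffix rule: (required part of speech of root, resulting part of speech).
SUFFIXES = {"ness": ("adj", "noun"), "able": ("verb", "adj"), "ly": ("adj", "adv")}

def analyze(word):
    """Return the possible parts of speech of `word`, or an empty set."""
    if word in DICTIONARY:
        return set(DICTIONARY[word])
    results = set()
    prefix_options = [""] + [p for p in PREFIXES if word.startswith(p)]
    suffix_options = [""] + [s for s in SUFFIXES if word.endswith(s)]
    for prefix in prefix_options:
        for suffix in suffix_options:
            if not prefix and not suffix:
                continue  # nothing was stripped, so there is nothing to combine
            root = word[len(prefix):len(word) - len(suffix)]
            root_pos = DICTIONARY.get(root)
            if not root_pos:
                continue
            if suffix:
                required, produced = SUFFIXES[suffix]
                if required in root_pos:  # combination rule is satisfied
                    results.add(produced)
            else:
                results |= root_pos  # assumed: a prefix alone keeps the root's POS
    if results:
        DICTIONARY[word] = set(results)  # store the newly recognized word
    return results

print(analyze("unkindness"))
print(analyze("unreadable"))
```

Storing the result back in the dictionary mirrors the last step of the abstract, so a word recognized once is found directly on later lookups.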

Method and apparatus for monitoring and processing voice over internet protocol packets

A processor architecture for processing data packets representing voice over Internet Protocol (VoIP) calls in a packet-switched network is disclosed. According to an embodiment, a VoIP processor executes a voice packet processing operating system that is configured to monitor or manipulate the packets at an IP layer, media layer and signaling layer of the call. The VoIP processor includes a plurality of independently callable primitive software functions that carry out low-level VoIP packet processing functions. The VoIP processor executes one or more application programs that selectively call one or more of the primitive software functions and are independent of any underlying protocols of the existing network, thereby isolating the application programs from low-level processing details. Further, techniques are described for modifying characteristics of VoIP traffic for the purpose of monitoring and directing the VoIP traffic through a network. The techniques include extracting information associated with the VoIP traffic and using the information for the purpose of controlling access, for fraud detection, for billing, for enforcing policy decisions, for protection against denial of service attacks, for lawful interception, for service selection, and other applications.
Owner:JUNIPER NETWORKS INC

Retrieval and Presentation of Network Service Results for Mobile Device Using a Multimodal Browser

A method of obtaining information using a mobile device can include receiving a request including speech data from the mobile device, and querying a network service using query information extracted from the speech data, whereby search results are received from the network service. The search results can be formatted for presentation on a display of the mobile device. The search results further can be sent, along with a voice grammar generated from the search results, to the mobile device. The mobile device then can render the search results.
Owner:NUANCE COMM INC

Transaction apparatus and method that identifies an authorized user by appearance and voice

A financial transaction apparatus (30) includes a financial transaction machine (32). The machine includes devices (34) including transaction function devices (42, 44, 46, 48) for carrying out operations associated with financial transactions. The terminal also includes an imaging device (50) and an audio input device (52), as well as a visual output device (36) and an audio output device (54). Terminal (32) is connected to a computer (68) which has an associated data store (70). The data store includes user data including image data and voice data corresponding to authorized users. The identity of a customer operating the machine is determined by resolving first identity data based on image signals from the imaging device which correspond to a user's appearance. Second identity data is resolved by the processor from voice signals from the audio input device corresponding to the user's voice. The computer enables operation of the transaction function devices if the level of correlation between the first and second identity data is sufficient to establish that the image and voice signals originate from a single authorized user.
Owner:DIEBOLD NIXDORF

Synthesis unit selection apparatus and method, and storage medium

Input text data undergoes language analysis to generate prosody, and a speech database is searched for a synthesis unit on the basis of the prosody. A modification distortion of the found synthesis unit, and concatenation distortions upon connecting that synthesis unit to those in the preceding phoneme are computed, and a distortion determination unit weights the modification and concatenation distortions to determine the total distortion. An Nbest determination unit obtains N best paths that can minimize the distortion using the A* search algorithm, and a registration unit determination unit selects a synthesis unit to be registered in a synthesis unit inventory on the basis of the N best paths in the order of frequencies of occurrence, and registers it in the synthesis unit inventory.
Owner:CANON KK

Internet-based telephone call manager

A method is provided that allows data access service provider subscribers to manage their telephone service through a data connection. The subscriber is enabled to obtain call data information and is provided real time control. During a data call, a visual incoming call indicator informs the subscriber, through a popup window, connected to the data access service provider that there is a call attempt. A visual message waiting indicator allows a subscriber, connected to the data access service provider to be notified of a pending message on the voice message system. A visual call disposition allows the subscriber, through the data connection, to dispose of calls. The call disposition options include forwarding a call to voice mail, playing an announcement to the calling party, forwarding the call to another line, sending a text message which could be converted to speech using text to speech technology, answering the call using voice over data call or terminating the data connection in order to accept the call.
Owner:RPX CLEARINGHOUSE

Adaptive context for automatic speech recognition systems

A system that improves speech recognition includes an interface linked to a speech recognition engine. A post-recognition processor coupled to the interface compares recognized speech data generated by the speech recognition engine to contextual information retained in a memory, generates modified recognized speech data, and transmits the modified recognized speech data to a parsing component.
Owner:NUANCE COMM INC +1

Voice authentication method and system utilizing same

A system for authorizing user access to a secure site includes a memory unit, first and second input devices, and first and second processing devices. The memory unit stores voice prints and identities of the set of individuals that have access to the secure site. The first input device is for inputting information that identifies the user as a member of the set. The second input device is for inputting temporary user voice data. The first processing device is for generating a temporary voice print from the temporary data. The second processing device is for comparing the temporary voice print to the stored voice prints. Access is granted only if the temporary voice print is most similar to the voice print of the individual that the user claims to be.
Owner:SENTRYCOM
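
The access rule in this abstract (grant access only if the temporary voice print is most similar to the print of the claimed individual) can be sketched as below. Representing voice prints as plain numeric vectors and comparing them with cosine similarity are assumptions made for illustration; the abstract does not specify either.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def authorize(claimed_id, temp_print, stored_prints):
    """Grant access only if the claimed identity's print is the closest match."""
    if claimed_id not in stored_prints:
        return False  # the user is not a member of the authorized set
    best_match = max(stored_prints,
                     key=lambda uid: cosine(temp_print, stored_prints[uid]))
    return best_match == claimed_id

# Hypothetical stored voice prints.
stored = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
print(authorize("alice", [0.9, 0.1, 0.0], stored))
print(authorize("bob", [0.9, 0.1, 0.0], stored))
```

Comparing against every stored print, rather than only the claimed one, is what lets the check reject a speaker who sounds more like a different enrolled user.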

Technique of Generating High Quality Synthetic Speech

A synthetic speech system includes a phoneme segment storage section for storing multiple phoneme segment data pieces; a synthesis section for generating voice data from text by reading phoneme segment data pieces representing the pronunciation of an inputted text from the phoneme segment storage section and connecting the phoneme segment data pieces to each other; a computing section for computing a score indicating the unnaturalness of the voice data representing the synthetic speech of the text; a paraphrase storage section for storing multiple paraphrases of the multiple first phrases; a replacement section for searching the text and replacing with appropriate paraphrases; and a judgment section for outputting generated voice data on condition that the computed score is smaller than a reference value and for inputting the text after the replacement to the synthesis section to cause the synthesis section to further generate voice data for the text.
Owner:CERENCE OPERATING CO

Apparatus and method for reproducing voice in synchronism with music piece

Inactive | US7365260B2 | Avoid wasting | Edited and revised with ease | Gearworks | Musical toys | Data design | Event data
Music piece sequence data are composed of a plurality of event data which include performance event data and user event data designed for linking a voice to progression of a music piece. A plurality of voice data files are stored in a memory separately from the music piece sequence data. In music piece reproduction, the individual event data of the music piece sequence data are sequentially read out, and a tone signal is generated in response to each readout of the performance event data. In the meantime, a voice reproduction instruction is output in response to each readout of the user event data. In accordance with the voice reproduction instruction, a voice data file is selected from among the voice data files stored in the memory, and a voice signal is generated on the basis of each read-out voice data.
Owner:YAMAHA CORP
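
The readout loop in this abstract can be sketched in a few lines: performance events produce tones, and user events trigger selection of a separately stored voice data file. The event dictionaries, the `voice_id` key, and the output tuples are hypothetical structures chosen for illustration.

```python
def play(sequence, voice_files):
    """Read events in order and return the signals they would generate."""
    output = []
    for event in sequence:
        if event["type"] == "performance":
            # Performance event: generate a tone signal for the note.
            output.append(("tone", event["note"]))
        elif event["type"] == "user":
            # User event: issue a voice reproduction instruction and select
            # the matching file from the separately stored voice data files.
            data = voice_files.get(event["voice_id"])
            if data is not None:
                output.append(("voice", data))
    return output

# Hypothetical sequence data and voice files.
sequence = [
    {"type": "performance", "note": 60},
    {"type": "user", "voice_id": "greeting"},
    {"type": "performance", "note": 64},
]
voice_files = {"greeting": b"hello-pcm"}
print(play(sequence, voice_files))
```

Keeping the voice files outside the sequence data means a voice can be replaced or edited without touching the music piece itself.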

Hierarchical real-time speaker recognition for biometric VoIP verification and targeting

A method for real-time speaker recognition including obtaining speech data of a speaker, extracting, using a processor of a computer, a coarse feature of the speaker from the speech data, identifying the speaker as belonging to a pre-determined speaker cluster based on the coarse feature of the speaker, extracting, using the processor of the computer, a plurality of Mel-Frequency Cepstral Coefficients (MFCC) and a plurality of Gaussian Mixture Model (GMM) components from the speech data, determining a biometric signature of the speaker based on the plurality of MFCC and the plurality of GMM components, and determining in real time, using the processor of the computer, an identity of the speaker by comparing the biometric signature of the speaker to one of a plurality of biometric signature libraries associated with the pre-determined speaker cluster.
Owner:THE BOEING CO

Portable digital audio recorder with adaptive control configurations

A portable digital audio recorder provides a first set of control or editing options with respect to a first category of voice data files stored in the recorder, and provides a different set of control or editing options with respect to a second category of voice data files stored in the recorder. The recorder may also be conveniently shared among a number of users and adapts various operating parameters to the preferences of the current user.
Owner:NUANCE COMM INC

Using network time protocol in voice over packet transmission

One or more methods and systems of effectively transmitting voice and voice band data from one node to another are presented. In one embodiment, the system comprises an NTP time server generating absolute times to computing devices such as residential voice over internet protocol (VoIP) gateways. The NTP time server generates absolute times in response to NTP time requests made by one or more computing devices such as residential VoIP gateways. In one embodiment, the method comprises determining an adequate rate for requesting absolute times from an NTP server, making periodic requests to the NTP server, obtaining the absolute times from the NTP server, and generating an adjustment parameter for use by a computing device such as a residential VoIP gateway.
Owner:AVAGO TECH INT SALES PTE LTD

Phonetic decoding and concatenative speech synthesis

A speech processing system includes a multiplexer that receives speech data input as part of a conversation turn in a conversation session between two or more users where one user is a speaker and each of the other users is a listener in each conversation turn. A speech recognizing engine converts the speech data to an input string of acoustic data while a speech modifier forms an output string based on the input string by changing an item of acoustic data according to a rule. The system also includes a phoneme speech engine for converting the first output string of acoustic data including modified and unmodified data to speech data for output via the multiplexer to listeners during the conversation turn.
Owner:CERENCE OPERATING CO

Apparatus for speech recognition using multiple acoustic model and method thereof

Disclosed are an apparatus for recognizing voice using multiple acoustic models and a method thereof. The apparatus includes a voice data database (DB) configured to store voice data collected in various noise environments; a model generating means configured to perform classification for each speaker and environment based on the collected voice data, and to generate an acoustic model of a binary tree structure as the classification result; and a voice recognizing means configured to extract feature data of voice data when the voice data is received from a user, to select multiple models from the generated acoustic model based on the extracted feature data, to recognize the voice data in parallel based on the selected multiple models, and to output a word string corresponding to the voice data as the recognition result.
Owner:ELECTRONICS & TELECOMM RES INST

Conference support system, record generation method and a computer program product

A conference support system includes a data reception portion for receiving image data of attendants in a conference and voice data, an emotion distinguishing portion for distinguishing emotions of attendants in accordance with the image data, a text data generation portion for generating comment text data that indicate contents of speeches of the attendants in accordance with the voice data, and a record generation portion for generating record data that include contents of a speech of an attendant and emotions of attendants when the speech was made, in accordance with emotion data that indicate a result of distinguishing made by the emotion distinguishing portion and the comment text data.
Owner:FUJITSU LTD

Multiplexing several individual application sessions over a pre-allocated reservation protocol session

Apparatus and methods are provided for multiplexing application flows over a pre-allocated bandwidth reservation protocol session. According to one embodiment, a pre-allocated reservation protocol session, such as an RSVP session, is shared by one or more application sessions. The reservation protocol session is pre-allocated over a path between a first network device associated with a first user community and a second network device associated with a second user community based upon an estimated usage of the path for application sessions between users of the first and second user communities. Subsequently, the one or more application sessions are dynamically aggregated by multiplexing application flows associated with the one or more individual application sessions onto the pre-allocated reservation protocol session at the first network device and demultiplexing at the second network device. According to another embodiment, a network device enables multiple applications, such as VoIP applications, that require real-time performance to share an aggregated reservation protocol session, such as an RSVP session. The network device includes a storage device having stored therein one or more routines for establishing and managing the aggregated reservation protocol session. A processor coupled to the storage device executes the one or more routines to pre-allocate the aggregated reservation protocol session and thereafter share the aggregated reservation protocol session among multiple application sessions of individual application sessions. The aggregated reservation protocol session is pre-allocated based upon an estimate of the bandwidth requirements to accommodate the multiple application sessions. 
The aggregated reservation protocol session is shared by multiplexing, onto the aggregated reservation protocol session, outbound media packets (e.g., packetized voice data) originated by local application / endpoints associated with the application sessions, and demultiplexing, from the aggregated reservation protocol session, inbound media packets (e.g., packetized voice data) originated by remote application / endpoints.
Owner:WINTERSPRING DIGITAL LLC

System and method of providing conversational visual prosody for talking heads

Active | US7136818B1 | High-quality impression | Natural appearing interaction | Speech recognition | Visual prosody | Speech sound
A system and method of controlling the movement of a virtual agent while the agent is speaking to a human user during a conversation is disclosed. The method comprises receiving speech data to be spoken by the virtual agent, performing a prosodic analysis of the speech data, selecting matching prosody patterns from a speaking database and controlling the virtual agent movement according to the selected prosody patterns.
Owner:INTERACTIONS LLC (US)

Speech data mining for call center management

Inactive | US20050010411A1 | Improving automatic recognition of speech | Easy to identify | Speech recognition | Quality of service | Frustration
A speech data mining system for use in generating a rich transcription having utility in call center management includes a speech differentiation module differentiating between speech of interacting speakers, and a speech recognition module improving automatic recognition of speech of one speaker based on interaction with another speaker employed as a reference speaker. A transcript generation module generates a rich transcript based on recognized speech of the speakers. Focused, interactive language models improve recognition of a customer on a low quality channel using context extracted from speech of a call center operator on a high quality channel with a speech model adapted to the operator. Mined speech data includes number of interaction turns, customer frustration phrases, operator politeness, interruptions, and/or contexts extracted from speech recognition results, such as topics, complaints, solutions, and resolutions. Mined speech data is useful in call center and/or product or service quality management.
Owner:PANASONIC CORP

Voice-controlled data system

A voice-controlled data system may include a data storage unit including media files having associated file identification data, and a vocabulary generating unit generating phonetic data corresponding to the file identification data, the phonetic data being supplied to a speech recognition unit as a recognition vocabulary, where one of the media files may be selected according to a recognized speech control command on the basis of the generated phonetic data, where the file identification data include a language identification part for identifying the language of the file identification data, and where the vocabulary generating unit generates the phonetic data for the file identification data of a media file based on its language identification part.
Owner:HARMAN BECKER AUTOMOTIVE SYST