
225 results about how to "Improve speech recognition accuracy" patented technology

Transparent monitoring and intervention to improve automatic adaptation of speech models

A system and method to improve the automatic adaptation of one or more speech models in automatic speech recognition systems. After a dialog begins, the dialog asks the customer to provide spoken input, which is recorded. If the speech recognizer determines that it may not have correctly transcribed the verbal response, i.e., the voice input, the invention uses monitoring and, if necessary, intervention to guarantee that the next transcription of the verbal response is correct. The dialog asks the customer to repeat the verbal response, which is recorded, and a transcription of the input is sent to a human monitor, i.e., an agent or operator. If the transcription of the spoken input is correct, the human does not intervene and the transcription remains unmodified. If the transcription of the verbal response is incorrect, the human intervenes and the transcription of the misrecognized word is corrected. In both cases, the dialog asks the customer to confirm the unmodified or corrected transcription. If the customer confirms it, the dialog continues and the customer does not hang up in frustration because, in most cases, only one misrecognition has occurred. Finally, the invention uses the first and second customer recordings of the misrecognized word or utterance, along with the corrected or unmodified transcription, to automatically adapt one or more speech models, which improves the performance of the speech recognition system.
Owner:AVAYA INC

Speech-to-speech translation system with user-modifiable paraphrasing grammars

The present invention discloses a speech-to-speech translation device which allows one or more users to input a spoken utterance in one language, translates the utterance into one or more second languages, and outputs the translation in speech form. Additionally, the device allows for translation in both directions, recognizing inputs in the one or more second languages and translating them back into the first language. The device recognizes and translates utterances in a limited domain, as in a phrase book translation system, so the translation accuracy is essentially 100%. By limiting the domain, the system increases the accuracy of the speech recognition component and thus the accuracy of the overall system. However, unlike other phrase book systems, the device also allows wide variations and paraphrasing in the input, so that the user is much more likely to find the desired phrase in the stored list of phrases. The device paraphrases the input to a basic canonical form and performs the translation on that canonical form, ignoring the non-essential variations in the surface form of the input. The device can provide visual and / or auditory feedback to confirm the recognized input, which makes the system usable with absolute confidence by non-bilingual users.
Owner:EHSANI FARZAD +2

Method and system for speech recognition

A method and a system for speech recognition are provided. In the method, vocal characteristics are captured from speech data and used to identify a speaker identification of the speech data. Next, a first acoustic model is used to recognize a speech in the speech data. According to the recognized speech and the speech data, a confidence score of the speech recognition is calculated and it is determined whether the confidence score is over a threshold. If the confidence score is over the threshold, the recognized speech and the speech data are collected, and the collected speech data is used for performing a speaker adaptation on a second acoustic model corresponding to the speaker identification.
Owner:ASUSTEK COMPUTER INC
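
The confidence gate described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Utterance` record, the `collect_for_adaptation` helper, and the 0.8 threshold are all hypothetical, and the recognizer is assumed to have already produced a transcript and a confidence score.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker_id: str    # result of speaker identification (hypothetical field)
    transcript: str    # text recognized with the first acoustic model
    confidence: float  # confidence score of that recognition, assumed in [0, 1]


def collect_for_adaptation(utterances, threshold=0.8):
    """Group utterances whose confidence exceeds the threshold by speaker.

    The returned per-speaker lists are the data that would be used to adapt
    the second, speaker-specific acoustic model.
    """
    collected = defaultdict(list)
    for utt in utterances:
        if utt.confidence > threshold:
            collected[utt.speaker_id].append(utt)
    return dict(collected)
```

Low-confidence utterances are dropped rather than stored, so a misrecognized transcript does not feed back into the speaker's model.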

Speech recognition system and method

A speech recognition system having multiple recognition vocabularies, and a method of selecting an optimal working vocabulary used by the system, are disclosed. Each vocabulary is particularly suited for recognizing speech in a particular language, or with a particular accent or dialect. The system prompts a speaker for an initial spoken response, receives the initial spoken response, and compares the response to each of a set of possible responses in an initial speech recognition vocabulary to determine the best matched response in the initial vocabulary. A working speech recognition vocabulary is then selected from a plurality of speech recognition vocabularies, based on the best matched response.
Owner:RPX CLEARINGHOUSE
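
A rough sketch of the selection step might look like the following. The greetings, the vocabulary names, and the `select_working_vocabulary` helper are hypothetical, and plain string similarity from Python's `difflib` stands in for the acoustic matching that a real recognizer would perform.

```python
import difflib

# Hypothetical initial vocabulary: each possible initial response maps to
# the working vocabulary that would be selected if it is the best match.
INITIAL_VOCABULARY = {
    "hello": "english",
    "bonjour": "french",
    "hola": "spanish",
}


def select_working_vocabulary(initial_response, initial_vocabulary=INITIAL_VOCABULARY):
    """Return (best matched response, name of the working vocabulary)."""
    best = max(
        initial_vocabulary,
        key=lambda cand: difflib.SequenceMatcher(None, initial_response, cand).ratio(),
    )
    return best, initial_vocabulary[best]
```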

Speech processing apparatus and control method thereof

A speech processing apparatus comprises a setting unit that sets an association between a speech recognition target vocabulary and the shortcut data for transitioning to a state to which a transition is made, when a user makes a transition to a state among the plurality of states using an operation input unit, a speech input unit that inputs audio, a speech recognition unit that employs the speech recognition target vocabulary to recognize the audio that is input via the speech input unit, and a control unit that employs the shortcut data that corresponds to the speech recognition target vocabulary that is a recognition result of the speech recognition unit to transition to the state, in order to improve speech recognition accuracy for audio shortcuts, while also preserving the convenience of the audio shortcuts.
Owner:CANON KK

Voice interaction method and voice interaction device

The invention discloses a voice interaction method and a voice interaction device. The method comprises the following steps: after a voice recognition text is received, the text is distributed to each service, and each service performs semantic understanding on it; the semantic understanding results are then ranked by confidence based on both the results themselves and the application state of the client, and a response is given to the result with the highest confidence. The confidence ranking draws on multidimensional information: it considers not only the matching degree between a semantic understanding result and each service but also the application state of the client, for example, whether the client is in a navigation state or a music listening state. Because the client's application and its state may themselves be the objects that the voice interaction acts on, this approach can more accurately judge which service an utterance belongs to, which improves the accuracy of semantic understanding in man-machine interaction and enhances the user experience.
Owner:IFLYTEK CO LTD
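
One way to picture the confidence ranking is a match score per service plus a bonus when the service corresponds to the client's current application state. The abstract does not say how the two signals are combined, so the additive `state_bonus` and the `rank_semantic_results` helper below are assumptions made purely for illustration.

```python
def rank_semantic_results(results, active_service, state_bonus=0.2):
    """Rank per-service semantic results, best first.

    results: list of (service_name, match_score) pairs.
    active_service: the service matching the client's current state,
        e.g. "navigation" or "music".
    state_bonus: hypothetical amount added when a result belongs to the
        active service.
    """
    def confidence(item):
        service, match_score = item
        bonus = state_bonus if service == active_service else 0.0
        return match_score + bonus

    return sorted(results, key=confidence, reverse=True)
```

With `results = [("music", 0.6), ("navigation", 0.5)]`, the navigation result ranks first while the client is navigating, and the music result ranks first otherwise.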

Distributed voice recognition system and method

A distributed voice recognition system (500) and method employs principles of bottom-up (i.e., raw input) and top-down (i.e., prediction based on past experience) processing to perform client-side and server-side processing by (i) at the client-side, replacing application data by a phonotactic table (504); (ii) at the server-side, tracking separate confidence scores for matches against an acoustic model and comparison to a grammar; and (iii) at the server-side using a contention resolver (514) to weight the client-side and server-side results to establish a single output which represents the collaboration between client-side processing and server-side processing.
Owner:MICROSOFT TECH LICENSING LLC
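
The contention-resolution idea, weighting a client-side hypothesis against a server-side one to produce a single output, can be sketched in a few lines. The weights and the `resolve_contention` helper are hypothetical; the abstract does not give the actual weighting scheme.

```python
def resolve_contention(client_result, server_result,
                       client_weight=0.4, server_weight=0.6):
    """Pick a single output from client-side and server-side hypotheses.

    Each result is a (text, confidence) pair. The weights are illustrative
    assumptions, not values taken from the patent.
    """
    client_text, client_conf = client_result
    server_text, server_conf = server_result
    if client_text == server_text:
        return client_text  # both sides agree, nothing to resolve
    if client_weight * client_conf > server_weight * server_conf:
        return client_text
    return server_text
```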

Enhanced multilingual speech recognition system

A speech recognition system comprising: a language identification unit for identifying the language of a text item entry; at least one separate pronunciation modelling unit including a phoneme set and pronunciation model for at least one language; means for activating the pronunciation modelling unit including the phoneme set and pronunciation model for the language corresponding to the language identified in the language identification unit for obtaining a phoneme transcription for the entry; and a multilingual acoustic modelling unit for creating a recognition model for the entry.
Owner:NOKIA CORP

Disambiguating results within a speech based IVR session

Within an interactive voice response system, a method of automatically disambiguating results presented to a user can include determining the identity of a user within an interactive voice response session, receiving user inputs specifying selections in an interactive voice response menu hierarchy, and storing historical information specifying the user selections within a profile associated with the identity of the user. For at least one subsequent input from the user, identifying the historical information associated with the identity of the user and using the historical information to reduce a number of possible selections in the interactive voice response menu hierarchy which are presented to the user.
Owner:NUANCE COMM INC
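
As a simple illustration of using stored selections to reduce the number of options presented, the sketch below keeps only the options a user has chosen most often. The `prune_menu` helper and the `max_options` cutoff are hypothetical; the abstract does not specify how the history is applied.

```python
from collections import Counter


def prune_menu(menu_options, history, max_options=3):
    """Return the menu options to present to an identified user.

    history: this user's past selections, as stored in their profile.
    Falls back to the full menu when there is no usable history.
    """
    counts = Counter(choice for choice in history if choice in menu_options)
    if not counts:
        return list(menu_options)
    return [option for option, _ in counts.most_common(max_options)]
```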

Voice recognition method and device

The present invention discloses a voice recognition method and a device. The method comprises the steps of firstly, acquiring the voice information input by a speaker and acquiring the information of the speaker; secondly, judging the existence / absence of a personal acoustic model corresponding to the speaker according to the information of the speaker; thirdly, upon judging the existence of the personal acoustic model, acquiring the personal acoustic model and conducting the voice recognition on the voice information according to the personal acoustic model of the speaker; fourthly, upon judging the absence of the personal acoustic model, conducting the voice recognition on the voice information according to a basic acoustic model, generating the corpus information of the speaker according to the voice information, and storing the corpus information; fifthly, generating a personal acoustic model for the speaker based on the basic acoustic model and the stored corpus information. Based on the method, acoustic models can be customized for all speakers based on the characteristics of the speakers during the self-adaptive voice recognition process of the speakers. Therefore, the recognition accuracy for each speaker is improved and the user experience is improved.
Owner:BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD
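
The five steps above amount to a lookup with a fallback. A minimal sketch, assuming models are plain callables from audio to text and that a personal model is built once a speaker has a few stored samples (the `AdaptiveRecognizer` class, the `train_personal_model` callback, and `min_corpus_size` are all hypothetical):

```python
class AdaptiveRecognizer:
    """Use a personal acoustic model when one exists, else the basic model."""

    def __init__(self, base_model, train_personal_model, min_corpus_size=3):
        self.base_model = base_model
        # Hypothetical trainer: (base_model, corpus) -> personal model
        self.train_personal_model = train_personal_model
        self.min_corpus_size = min_corpus_size
        self.personal_models = {}  # speaker_id -> personal model
        self.corpora = {}          # speaker_id -> list of (audio, text)

    def recognize(self, speaker_id, audio):
        model = self.personal_models.get(speaker_id)
        if model is not None:
            return model(audio)
        # No personal model yet: recognize with the basic model and
        # store the sample as corpus information for this speaker.
        text = self.base_model(audio)
        corpus = self.corpora.setdefault(speaker_id, [])
        corpus.append((audio, text))
        if len(corpus) >= self.min_corpus_size:
            self.personal_models[speaker_id] = self.train_personal_model(
                self.base_model, corpus
            )
        return text
```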

Speech recognition method, speech assessment method, speech recognition system, and speech assessment system

The invention discloses a speech recognition method for recognizing a user's speech and generating a speech recognition result. The method includes the following steps: a speech obtaining step of obtaining the user's speech; a speech recognition step of recognizing the obtained speech as text data and taking that data as an initial speech recognition result; and an error correction step of checking an error correction list that records a plurality of associations between original vocabulary and deviation vocabulary. If a deviation vocabulary from one of these associations appears in the initial speech recognition result, error correction is performed by replacing the matching vocabulary in the initial result with the associated original vocabulary. The corrected result is then taken as the speech recognition result generated by the method. The invention also discloses a speech assessment method based on the speech recognition method, and a corresponding speech recognition system and speech assessment system.
Owner:RICOH KK
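
The error correction step is essentially a lookup-and-replace over the initial result. A minimal sketch, assuming whitespace-separated words and a hypothetical `corrections` dictionary that maps each deviation vocabulary to its original vocabulary:

```python
def apply_error_corrections(text, corrections):
    """Replace each deviation word found in the text with its original word.

    corrections: dict mapping deviation vocabulary -> original vocabulary.
    Words that do not appear in the list are left unchanged.
    """
    return " ".join(corrections.get(word, word) for word in text.split())
```

For example, with `corrections = {"right": "write"}`, the initial result "please right the report" becomes "please write the report".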

System and method for improving recognition accuracy in speech recognition applications

A speech recognition system and method are provided to correctly distinguish among multiple interpretations of an utterance. This system is particularly useful when the set of possible interpretations is large, changes dynamically, and / or contains items that are not phonetically distinctive. The speech recognition system extends the capabilities of mobile wireless communication devices that are voice operated after their initial activation.
Owner:STRYKER CORP

Distributed voice recognition system and method

A distributed voice recognition system (500) and method employs principles of bottom-up (i.e., raw input) and top-down (i.e., prediction based on past experience) processing to perform client-side and server-side processing by (i) at the client-side, replacing application data by a phonotactic table (504); (ii) at the server-side, tracking separate confidence scores for matches against an acoustic model and comparison to a grammar; and (iii) at the server-side using a contention resolver (514) to weight the client-side and server-side results to establish a single output which represents the collaboration between client-side processing and server-side processing.
Owner:NUANCE COMM INC

Speech processing system and terminal

[Object] An object is to provide an easy-to-use speech processing system attaining higher accuracy of speech recognition. [Solution] Receiving a speech utterance, the speech processing system performs speech recognition and displays a text 158 of the recognition result. Further, the speech processing system translates the recognition result in accordance with settings to a text 176 of another language and displays and synthesizes speech of the translated result. Further, the speech processing system selects utterance candidates having high possibility to be uttered as the next utterance and having high translation and speech recognition scores, using outputs of various sensors at the time of utterance, a pre-trained utterance sequence model and translation and speech recognition scores of utterance candidates, and recommends utterance candidates in the form of an utterance candidate recommendation list 192. A user can think of what to say next using the utterances in utterance candidate recommendation list 192 as hints.
Owner:NAT INST OF INFORMATION & COMM TECH

Speech recognition method and device based on Chinese and English mixed dictionary

The invention provides a speech recognition method and a speech recognition device based on a Chinese and English mixed dictionary. The speech recognition method comprises the steps of: acquiring the Chinese and English mixed dictionary annotated with the International Phonetic Alphabet (IPA), wherein the Chinese and English mixed dictionary comprises a Chinese dictionary and an English dictionary corrected by means of the Chinese dictionary; training a model that consists of one convolutional neural network (CNN) layer plus five Long Short-Term Memory (LSTM) layers, using the Chinese and English mixed dictionary as the training dictionary, the states of the IPA as the target, and connectionist temporal classification (CTC) as the training criterion, so as to obtain a trained CTC acoustic model; and performing speech recognition on mixed Chinese and English speech using the trained CTC acoustic model. Because the dictionary used for training comprises the Chinese dictionary and the English dictionary corrected by means of the Chinese dictionary, the English word coverage is comprehensive and Chinglish can be recognized, and the accuracy of Chinese and English mixed language recognition is further improved by applying the CTC acoustic model.
Owner:BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD

Speech recognition method and speech recognition device

The invention discloses a speech recognition method and a speech recognition device. A specific embodiment of the method comprises the following steps: segmenting speech information to be recognized into multiple frames of speech segments; performing acoustic model scoring and language model score-checking on the speech segments frame by frame through a preset decoding network; and determining a word sequence corresponding to at least one decoding path in the decoding network as the speech recognition result based on the score result, wherein first language model score-checking and second language model score-checking are carried out in sequence during language model score-checking on a frame of speech segment. By implementing the technical scheme, accurate and efficient speech recognition is realized.
Owner:BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD

Distributed language processing system and method of outputting intermediary signal thereof

A unified speech input dialogue interface, and a distributed multiple application-dependent language processing unit system with the unified speech recognition function and the unified dialogue interface are provided. The system not only provides a convenient user environment, but also enhances the overall performance of speech recognition. The distributed multiple application-dependent language processing unit system uses a speech input interface so that the user can become familiar with a simple, unified interface. The system also improves the speech recognition accuracy and enhances the convenience of use through a self-learning personalized dialogue model.
Owner:DELTA ELECTRONICS INC

Voice recognition method and device based on Chinese and English mixed dictionary

The invention proposes a voice recognition method and device based on a Chinese and English mixed dictionary. The method comprises the steps of: obtaining the Chinese and English mixed dictionary annotated with the IPA (International Phonetic Alphabet), wherein the Chinese and English mixed dictionary comprises a Chinese dictionary and an English dictionary that has undergone Chinglish correction; forming a model from one CNN (Convolutional Neural Network) layer plus five LSTM (Long Short-Term Memory) recurrent layers, and training it with the Chinese and English mixed dictionary as the training dictionary, syllables or words as the targets, and CTC (connectionist temporal classification) as the training criterion, so as to obtain a trained CTC acoustic model; and carrying out voice recognition of mixed Chinese and English speech using the trained CTC acoustic model. According to the embodiment of the invention, the Chinese and English mixed dictionary used for training comprises the Chinese dictionary and the English dictionary that has undergone Chinglish correction, so the Chinese and English word coverage is complete and Chinglish can be recognized. Combining this with the CTC acoustic model further improves the recognition accuracy for the Chinese and English mixed language.
Owner:BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD

Methods and systems for improving alphabetic speech recognition accuracy

Methods and systems are provided for improving the accuracy of a speech recognition system in recognizing alphabetic character input. If spoken alphabetic characters are erroneously recognized by a speech recognition system, a user may reenter the alphabetic characters using DTMF key tones on the user's telephone keypad to assist the speech recognition system in determining the correct input. If the input is originally input using DTMF key tones, the user may reenter the input as spoken alphabetic characters to assist the system to identify the correct input from the combinations of input that may be associated with the DTMF key tone entry.
Owner:AT&T INTPROP I L P
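
To illustrate how a DTMF re-entry narrows down spoken letters, the sketch below filters each position's recognizer hypotheses against the letters printed on the pressed key. The keypad layout is the standard telephone one; the `disambiguate_with_dtmf` helper and the shape of its inputs are assumptions for illustration.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}


def disambiguate_with_dtmf(letter_candidates, dtmf_keys):
    """Resolve each spoken letter using the key pressed for that position.

    letter_candidates: per position, recognizer hypotheses ordered best first.
    dtmf_keys: string of keys pressed, one per position.
    Returns the first hypothesis consistent with each key, or None.
    """
    resolved = []
    for candidates, key in zip(letter_candidates, dtmf_keys):
        allowed = KEYPAD.get(key, "")
        match = next((c for c in candidates if c and c.upper() in allowed), None)
        resolved.append(match)
    return resolved
```

A spoken "D" that was misheard with "B" as the top hypothesis is recovered once the user presses 3, because "B" is not on that key.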

Small data speech acoustic modeling method in speech recognition

The invention belongs to the technical field of signal processing in the electronics industry, and aims to solve the problem that an acoustic model for a target language with only a small amount of labeled data has poor discriminative performance. To solve this problem, the invention provides a small data speech acoustic modeling method in speech recognition. The method comprises the steps of: performing adversarial training on the acoustic features of multiple languages through a language adversarial discriminator, so as to build a multi-language adversarial bottleneck network model; taking the acoustic features of a target language as the input of the multi-language adversarial bottleneck network model, so as to extract language-independent bottleneck features; fusing the language-independent bottleneck features with the acoustic features of the target language, so as to obtain fusion features; and training on the fusion features, so as to build an acoustic model of the target language. In the prior art, bottleneck information that contains language-dependent information leads to only marginal improvement in target-language recognition performance, or even to negative transfer. The method effectively overcomes these defects, thereby improving the voice recognition precision for the target language.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

Voice recognition method and device and electronic equipment

The invention relates to the technical field of voice recognition and provides a voice recognition method and device and electronic equipment, which aim to solve the problem of relatively low voice recognition accuracy. The method comprises the following steps: acquiring a to-be-recognized voice; performing feature extraction on the to-be-recognized voice to obtain voice feature information; and determining a target character sequence corresponding to the voice feature information according to the target acoustic model and the target language model, wherein the target language model comprises a first language model and a second language model, the first language model is obtained by performing language model training on a command word training text of a first scene, and the second language model is obtained by performing language model training on a first text training set. In the voice recognition process, two language models are adopted, and the first language model is obtained by performing language model training on the command word training text of the first scene, so that the recognition capability of the first language model for related command words in the first scene can be enhanced, and the voice recognition accuracy can be improved.
Owner:SOUNDAI TECH CO LTD

Method and system for speech input

Inputting speech includes receiving feature information obtained by a client, the feature information comprising speech signals and user feature image signals, recognizing first candidate recognition data matching the user feature image signals, determining target recognition data based at least on the first candidate recognition data, and outputting the target recognition data.
Owner:ALIBABA GRP HLDG LTD

Network speech recognition method in English oral language machine examination system

The invention relates to a scheme for realizing network speech recognition in an English oral language machine examination system. The scheme improves on traditional spectral subtraction (SS) and cepstral mean normalization (CMN) noise reduction technology and combines them with a probability scale DP identification method for a continuous state hidden Markov model (HMM), providing a network speech recognition scheme for unspecified speakers in an English network examination system; with this scheme, a network speech recognition apparatus for a physical environment is realized. The method combines an SS method with a self-adapting input amplitude spectrum and a CMN method based on a progressive adaptive MAP algorithm, which substantially reduces the influence of ambient noise on the recognition system. In addition, building on the traditional DP method, recognition is carried out with a probability scale DP algorithm, so that a DSP speech recognition apparatus can be applied to speech recognition of unspecified speakers in different outdoor settings, and the scope and precision of the recognition system are improved.
Owner:SOUTHEAST UNIV
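
For readers unfamiliar with cepstral mean normalization, the plain batch form subtracts the mean of each cepstral coefficient, taken over all frames of an utterance, from every frame. The sketch below shows only that textbook form; it is not the progressive, MAP-based variant the abstract describes, and the `cepstral_mean_normalize` name is hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames from each frame.

    frames: list of equal-length lists of cepstral coefficients,
        one inner list per frame.
    """
    if not frames:
        return []
    n = len(frames)
    means = [sum(column) / n for column in zip(*frames)]
    return [[value - mean for value, mean in zip(frame, means)] for frame in frames]
```

Removing the mean cancels a constant channel offset in the cepstral domain, which is why the technique is commonly used against stationary channel distortion.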

Speech scoring method and device, electronic device, and storage medium

Inactive · CN109256152A · Improve review score · Accurate assessment score · Speech analysis · Score method · Acoustic model
The invention discloses a speech scoring method and device, an electronic device, and a storage medium, and relates to the technical field of computers. The speech scoring method includes the steps that sample features are extracted from sample speech data, and an acoustic model is trained through the sample features, and the trained acoustic model is obtained; a language model is constructed according to standard text data corresponding to the sample speech data, the sample speech data are decoded by the language model and the trained acoustic model to obtain acoustic features of the sample speech data; and a scoring model is trained through the acoustic features and prosodic features of the sample speech data, and target speech data is scored according to the trained scoring model to obtain scores of the target speech data. The speech scoring method and device, the electronic device, and the storage medium can accurately score the target speech data.
Owner:上海一起作业信息科技有限公司

Lip language recognition method and device based on deep learning

The invention relates to a lip language recognition method and device based on deep learning. The method includes the following steps: a voice signal and a video of a user are obtained, the video being captured by filming the user's face while the user utters the voice signal; the voice signal is recognized using voice recognition technology to obtain a first text; a to-be-recognized lip image sequence is obtained from the video; a lip feature vector is extracted from the to-be-recognized lip image sequence, and a second text is obtained according to the lip feature vector; and the first text is corrected according to the second text to obtain the text corresponding to the user's voice signal. This technical scheme addresses the low accuracy of prior-art voice recognition in noisy environments.
Owner:ONE CONNECT SMART TECH CO LTD SHENZHEN
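
The abstract does not say how the first text is corrected with the second. One possible scheme, shown here purely as an assumption, is to replace words the speech recognizer was unsure about with the lip-reading word at the same position. The `correct_with_lip_text` helper, the per-word confidences, and the position-by-position alignment are all hypothetical.

```python
def correct_with_lip_text(asr_words, lip_words, asr_confidences, min_confidence=0.5):
    """Replace low-confidence speech-recognized words with lip-read words.

    asr_words: words of the first text (from the voice signal).
    lip_words: words of the second text (from the lip image sequence),
        assumed to align with asr_words position by position.
    asr_confidences: per-word confidence of the first text.
    """
    corrected = []
    for i, (word, conf) in enumerate(zip(asr_words, asr_confidences)):
        if conf < min_confidence and i < len(lip_words):
            corrected.append(lip_words[i])
        else:
            corrected.append(word)
    return corrected
```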

Voice input method, device, and system

Inputting speech includes receiving feature information obtained by a client, the feature information comprising speech signals and user feature image signals, recognizing first candidate recognition data matching the user feature image signals, determining target recognition data based at least on the first candidate recognition data, and outputting the target recognition data.
Owner:ALIBABA GRP HLDG LTD

Speech processing device, speech processing method, and speech processing program

A speech processing device includes a speech recognition unit configured to sequentially recognize recognition segments from an input speech, a reverberation influence storage unit configured to store a degree of reverberation influence indicating an influence of a reverberation based on a preceding speech to a subsequent speech subsequent to the preceding speech and a recognition segment group including a plurality of recognition segments in correlation with each other, a reverberation influence selection unit configured to select the degree of reverberation influence corresponding to the recognition segment group which includes the plurality of recognition segments recognized by the speech recognition unit from the reverberation influence storage unit, and a reverberation reduction unit configured to remove a reverberation component weighted with the degree of reverberation influence from the speech from which at least a part of recognition segments of the recognition segment group is recognized.
Owner:HONDA MOTOR CO LTD