74 results about "Phoneme recognition" patented technology

Phoneme recognition is carried out using the acoustic model. The acoustic model is created using machine learning algorithms. The machine learning is divided into two phases: training and testing.

Speech recognition apparatus, speech recognition method, and speech recognition robot

Active | US8886534B2 | Low correct answer rate | Avoid correcting a phoneme | Speech recognition | Phoneme recognition | Speech input
A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result.
Owner:HONDA MOTOR CO LTD
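
The matching-then-correcting flow in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two phoneme sequences are matched, so the use of Python's `difflib.SequenceMatcher` here is an assumption, as are the function name `correct_phonemes` and the rule of patching only substituted phonemes.

```python
from difflib import SequenceMatcher


def correct_phonemes(original, corrected):
    """Patch substituted phonemes in `original` using `corrected`.

    Assumption: both arguments are lists of phoneme symbols, and only
    aligned substitutions are corrected; unmatched phonemes in the
    original are left untouched.
    """
    matcher = SequenceMatcher(a=original, b=corrected, autojunk=False)
    result = list(original)
    # Apply edits from the end so earlier indices remain valid.
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag == "replace":
            result[i1:i2] = corrected[j1:j2]
    return result
```

For example, `correct_phonemes(list("heros"), list("helo"))` replaces the aligned `r` with `l` and keeps the trailing `s`.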

Method for Automated Training of a Plurality of Artificial Neural Networks

The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is trained using the common phoneme label.
Owner:CERENCE OPERATING CO
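
The data preparation described in this abstract can be sketched roughly as follows. The abstract does not fix how subsequences are chosen or how the common label is derived, so the non-overlapping split and the majority vote over the central subsequence are assumptions, and `assign_subsequences` is a hypothetical name.

```python
from collections import Counter


def assign_subsequences(frames, labels, num_networks, sub_len):
    """Give each network its own subsequence and pick one common label.

    Assumptions: subsequences are contiguous and non-overlapping, and the
    common label is the majority label of the central subsequence.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    if len(frames) < num_networks * sub_len:
        raise ValueError("not enough frames for non-overlapping subsequences")
    subsequences = [
        frames[k * sub_len:(k + 1) * sub_len] for k in range(num_networks)
    ]
    centre = num_networks // 2
    centre_labels = labels[centre * sub_len:(centre + 1) * sub_len]
    common_label = Counter(centre_labels).most_common(1)[0][0]
    return subsequences, common_label
```

Each network would then be trained on its own subsequence against the shared `common_label`; the training step itself is outside this sketch.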

Multi-phoneme streamer and knowledge representation speech recognition system and method

A system and method related to a new approach to speech recognition that reacts to concepts conveyed through speech. In its fullest implementation, the system and method shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. This is done by using a probabilistically unbiased multi-phoneme recognition process, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. The invention can be employed for a myriad of applications, such as improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command / response scenarios.
Owner:CHEMTRON RES

Systems and methods for combining subword recognition and whole word recognition of a spoken input

A computer-based detection (e.g., speech recognition) system combines a word decoder and subword decoder to detect words (or phrases) in a spoken input provided by a user into a speaker connected to the detection system. The word decoder detects words by comparing an input pattern (e.g., of hypothetical word matches) to reference patterns (e.g., words). The subword decoder compares an input pattern (e.g., hypothetical word matches based on subword or phoneme recognition) to reference patterns (e.g., words) based on a word pronunciation distance measure that indicates how close each input pattern is to matching each reference pattern. The subword decoder sorts the source set of reference patterns based on the closeness of each reference pattern to correctly matching the input pattern, based on generated pattern comparisons. The word decoder and subword decoder each provide an N-best list of hypothetical matches to the spoken input. A list fusion module of the detection system selectively combines the two N-best lists to produce a final or combined N-best list. The final or combined list has a predefined number of matches.
Owner:HEWLETT PACKARD DEV CO LP
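
The list fusion step can be sketched as below. The abstract says only that the two N-best lists are selectively combined into a list of predefined size, so keeping the higher score for a hypothesis that appears in both lists is an assumption, as is the name `fuse_nbest`.

```python
def fuse_nbest(word_list, subword_list, n):
    """Merge two N-best lists of (hypothesis, score) pairs into one.

    Assumptions: higher scores are better, and a hypothesis present in
    both lists keeps its higher score.
    """
    combined = {}
    for hypothesis, score in word_list + subword_list:
        combined[hypothesis] = max(score, combined.get(hypothesis, float("-inf")))
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```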

Downsampling Schemes in a Hierarchical Neural Network Structure for Phoneme Recognition

An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.
Owner:NUANCE COMM INC
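
A minimal sketch of the data flow between the two layers follows. Both perceptrons are omitted; uniform downsampling and a greedy per-frame decoder are assumptions standing in for whatever downsampling scheme and decoder the patent actually uses, and the function names are hypothetical.

```python
def downsample(posteriors, step):
    """Keep every `step`-th posterior vector (uniform downsampling)."""
    return posteriors[::step]


def greedy_decode(posteriors, phonemes):
    """Pick the most probable phoneme per vector and collapse repeats."""
    decoded = []
    for vector in posteriors:
        best = phonemes[max(range(len(vector)), key=vector.__getitem__)]
        if not decoded or decoded[-1] != best:
            decoded.append(best)
    return decoded
```

In the described approach the downsampled vectors would feed the second-layer perceptron before decoding; here they are decoded directly only to keep the example short.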

Network teaching method and system with voice assessment function

The invention provides a network teaching method and system with a voice assessment function. According to the voice assessment method provided by the invention, a phoneme state of a voice is used for replacing a multi-Gaussian mixture model trained by a conventional Mel-frequency cepstral coefficient (MFCC), and a posterior probability and zeroth-order Baum-Welch statistics are calculated according to the feature. A voice feature based on phonemes is extracted through a multi-language phoneme recognizer. A feature based on multi-language extraction is complementary in capturing non-native pronunciation information, and a feature based on phoneme duration is effective in automatic native accent assessment. Finally, a fusion system is provided in the method, so that Spearman correlation coefficients of 0.5706 and 0.6089 are reached on a development set and a test set, respectively. As indicated by these correlation coefficients, the method provided by the invention is very accurate and effective in oral speech assessment.
Owner:SHENZHEN EAGLESOUL EDUCATION SERVICE CO LTD

Method and apparatus for recognizing speech

Provided are an apparatus and method for recognizing speech, in which reliability with respect to phoneme-recognized phoneme sequences is calculated and performance of speech recognition is enhanced using the calculated results. The method of recognizing speech includes the steps of: determining a boundary between phonemes included in character sequences that are phonetically input to detect each phoneme interval; calculating reliability according to a probability that a phoneme indicated by the detected phoneme interval corresponds to a phoneme included in a predefined phoneme model; calculating a phoneme alignment cost with respect to the character sequences based on the calculated reliability and a pre-trained and stored phoneme recognition probability distribution; and performing phoneme alignment based on the calculated phoneme alignment cost to perform speech recognition on the input character sequences. As a result, reliability with respect to the phoneme-recognized phoneme sequences can be calculated, and the performance of speech recognition can be enhanced using the calculated results.
Owner:ELECTRONICS & TELECOMM RES INST

Microphone assembly comprising a phoneme recognizer

The present invention relates to a microphone assembly comprising a phoneme recognizer. The phoneme recognizer comprises an artificial neural network (ANN) comprising at least one phoneme expect pattern and a digital processor configured to repeatedly apply one or more sets of frequency components derived from a digital filter bank to respective inputs of the artificial neural network. The artificial neural network is configured to detect and indicate a match between the at least one phoneme expect pattern and the one or more sets of frequency components.
Owner:KNOWLES ELECTRONICS INC

Identity uniformity check method and device based on spectrogram and phoneme retrieval

The invention provides an identity uniformity check method and device based on a spectrogram and phoneme retrieval. The method includes the following steps: acquiring a spectrogram corresponding to a sample audio file; acquiring the speech feature parameters of the sample audio file; constructing a phoneme recognition model, inputting the speech feature parameters to the phoneme recognition model and carrying out phoneme retrieval to get qualified phonemes; and marking the qualified phonemes on the spectrogram, checking the uniformity of vowels or vowel combinations with the same identifier, and judging whether a to-be-identified person corresponding to the sample audio file passes identity verification. The technical problem on phoneme searching and finding in actual voiceprint authentication is solved. Phonemes can be displayed visually. The identification efficiency of investigators is improved.
Owner:SPEAKIN TECH CO LTD

DNN (Deep Neural Network)-HMM (Hidden Markov Model)-based civil aviation radiotelephony communication acoustic model construction method

The invention relates to a DNN (Deep Neural Network)-HMM (Hidden Markov Model)-based civil aviation radiotelephony communication acoustic model construction method. The method includes the following steps: a Chinese radiotelephony communication corpus is set up; civil aviation radiotelephony communication speech signals are pre-processed; Fbank features are extracted from the civil aviation radiotelephony communication speech signals and are adopted as civil aviation radiotelephony communication speech features; linear discrimination analysis, feature space maximum likelihood regression transformation and speaker adaptive training transformation processing are performed on the civil aviation radiotelephony communication speech features; and the processed speech features are utilized to build a DNN-HMM-based radiotelephony communication acoustic model. With the method of the invention adopted, the FBANK and MFCC features of radiotelephony communication speech are extracted to train a DNN network, so that the DNN-HMM acoustic model suitable for radiotelephony communication speech recognition can be obtained; and because a dictionary and a language model are combined, the feature-enhanced DNN-HMM model can reduce the phoneme recognition error rate of the radiotelephony communication speech to 5.62% on the basis of constructed data.
Owner:CIVIL AVIATION UNIV OF CHINA

Lexical acquisition apparatus, multi dialogue behavior system, and lexical acquisition program

A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an inputted speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among a plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching word and an unknown word, wherein the discrimination section 4 calculates, for each word sequence, a first evaluation value showing how much words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing a probability at which the words in the word sequence are adjacent to one another and selects a word sequence for which a sum of the first evaluation value and the second evaluation value is maximum, and wherein the acquisition section 5 acquires, as a new word, a word in the word sequence selected by the discrimination section that is not involved in the calculation of the first evaluation value.
Owner:HONDA MOTOR CO LTD +1

Method for estimating language model weight and system for the same

The method of the present invention may include receiving a speech feature vector converted from a speech signal; performing a first search by applying a first language model to the received speech feature vector, and outputting a word lattice and a first acoustic score of the word lattice as a continuous speech recognition result; outputting a second acoustic score as a phoneme recognition result by applying an acoustic model to the speech feature vector; comparing the first acoustic score of the continuous speech recognition result with the second acoustic score of the phoneme recognition result; outputting a first language model weight when the first acoustic score of the continuous speech recognition result is better than the second acoustic score of the phoneme recognition result; and performing a second search by applying a second language model weight, which is the same as the output first language model, to the word lattice.
Owner:ELECTRONICS & TELECOMM RES INST

Automatic pattern recognition using category dependent feature selection

Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category dependent feature selection. The validity of the output of the model is examined by deriving feature vectors from the dimension expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. This provides for a novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.
Owner:GEORGIA TECH RES CORP

Method for automated training of a plurality of artificial neural networks

The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is trained using the common phoneme label.
Owner:CERENCE OPERATING CO

Method and apparatus for context independent gender recognition utilizing phoneme transition probability

Inactive | US20140172428A1 | Discriminately distinguishing | Speech recognition | Feature vector | Phoneme recognition
Provided is a method for context independent gender recognition utilizing phoneme transition probability. The method for the context independent gender recognition includes detecting a voice section from a received voice signal, generating feature vectors within the detected voice section, performing a hidden Markov model on the feature vectors by using a search network that is set according to a phoneme rule to recognize a phoneme and obtain scores of first and second likelihoods, and comparing final scores of the first and second likelihoods obtained while the phoneme recognition is performed up to the last section of the voice section to finally decide gender with respect to the voice signal.
Owner:ELECTRONICS & TELECOMM RES INST

Conceptual analysis driven data-mining and dictation system and method

A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command / response scenarios.
Owner:CHEMTRON RES

Recognition method and device for voice phoneme

Active | CN109754789A | Make up unary hypothesis | Compensating for binary assumptions | Speech recognition | Local optimum | Phoneme recognition
The invention discloses a recognition method and device for a voice phoneme, and relates to the technical field of voice recognition. A main purpose is to solve a problem of low phoneme segmentation efficiency or a locally optimal solution during voice recognition. According to the main technical scheme provided in the invention, the recognition method comprises the following steps of inputting a to-be-recognized voice into a phoneme recognition model, and obtaining, according to an output result, an expected result corresponding to the to-be-recognized voice, wherein the phoneme recognition model identifies each phoneme in the to-be-recognized voice through multiple neural network models and a hidden Markov model; training a model parameter in the phoneme recognition model according to the expected result until a rate of change of an output result of a phoneme model is less than a preset threshold value; and determining an output result with a rate of change less than the preset threshold value as a final phoneme recognition result corresponding to the to-be-recognized voice. The recognition method is mainly applied to a process of recognizing a sound.
Owner:BEIJING GRIDSUM TECH CO LTD

Cross-language timbre conversion system and method based on zero-order learning

The invention discloses a cross-language timbre conversion system and method based on zero-order learning. The system sequentially comprises a mixed phoneme recognition module, a timbre conversion module, a speaker coding module and a vocoder module. According to the system, a voice signal Mel spectrum serves as an input signal, bottleneck features of the voice signal Mel spectrum are extracted through the phoneme recognition module, the features are normalized and then transmitted to an acoustic model, the Mel spectrum synthesized by the acoustic model is controlled by controlling a speaker reference vector, and finally audio is synthesized through a vocoder. The system can convert the voice of a common speaker into the timbre of a specified speaker, is suitable for accent corpora which do not appear in a training database, can be suitable for voice change of dialects in multiple regions, and has a wide application prospect.
Owner:SOUTH CHINA UNIV OF TECH

Voice adaptive completion system based on multi-modal knowledge graph

The invention discloses a voice adaptive completion system based on a multi-modal knowledge graph. The system comprises a data receiver, a data analyzer and a data inference device. The data receiver preprocesses received audio and video data and outputs the audio and video data to the data analyzer; the data analyzer analyzes the voice and the image to extract waveform time sequence features and lip track features, and a phoneme sequence is obtained through multi-mode joint representation; and the data inference device carries out domain session modeling and candidate text prediction according to historical texts, text inference is carried out in combination with a phoneme sequence, statements with semantics are obtained, and complemented voice is synthesized according to waveform features. According to the invention, through a phoneme reasoning model, phoneme recognition is carried out when the voice modality is lost, the domain session modeling is carried out on the historical text generated by the existing voice according to the semantic relationship between the entities in the multi-modal knowledge graph, so that reasoning is carried out to generate the text with semantic, the voice is synthesized in combination with the waveform characteristics of the user voice, and the complemented audio is formed.
Owner:SHANGHAI JIAO TONG UNIV

Voice phoneme recognition method and device, storage medium and electronic device

The invention discloses a voice phoneme recognition method, a voice phoneme recognition device, a storage medium and an electronic device. The method comprises the following steps: extracting a plurality of first voice features from a plurality of voice frames by using a shared coder; determining a plurality of key voice features from the plurality of first voice features by using a CTC model, wherein each key voice feature corresponds to a peak position output by the CTC model; determining a voice feature set corresponding to each key voice feature, wherein each voice feature set comprises a corresponding key voice feature and one or more voice features adjacent to the corresponding key voice feature in the plurality of first voice features; carrying out feature fusion on the voice features in each voice feature set by using self-attention network, thus obtaining a plurality of fused voice features, wherein each voice feature set corresponds to one fused voice feature; and recognizing the phoneme corresponding to each fused voice feature in the phoneme set by using a coder of a target attention model.
Owner:TENCENT TECH (SHENZHEN) CO LTD
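
The peak-and-neighbourhood steps of this abstract can be sketched as follows. Treating a peak as the first frame of each non-blank run in the per-frame CTC labels is an assumption, and mean pooling is swapped in for the self-attention fusion purely to keep the example dependency-free. All names are hypothetical.

```python
BLANK = 0  # assumed id of the CTC blank symbol


def ctc_peaks(frame_labels):
    """Return frame indices where a new non-blank label starts."""
    peaks = []
    previous = BLANK
    for t, label in enumerate(frame_labels):
        if label != BLANK and label != previous:
            peaks.append(t)
        previous = label
    return peaks


def feature_sets(features, peaks, radius):
    """Collect each key feature together with its adjacent features."""
    return [features[max(0, p - radius):p + radius + 1] for p in peaks]


def mean_fuse(feature_set):
    """Fuse one feature set by averaging (stand-in for self-attention)."""
    dim = len(feature_set[0])
    return [sum(v[d] for v in feature_set) / len(feature_set) for d in range(dim)]
```

Each fused vector would then be passed to the attention model's final stage to predict one phoneme; that stage is not shown.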

Interactive language learning system and method thereof

The invention discloses an interactive language learning system and a method thereof. The interactive language learning system comprises a voice reference module, a characteristic extracting module, a phoneme associating module, a voice learning module, a phoneme correction module, a correction suggestion module, a phoneme evaluating module, a voice feedback module and a corpus, wherein the voice learning module is used for collecting voice data designated by aloud reading of a learner; the phoneme correction module is used for synthesizing feedback voice having reference voice rhythm and learner's tone, and rhythm corrected voice can guide the learner to simulate rhythm of the reference voice; the correction suggestion module, the phoneme evaluating module and the voice feedback module are used for saving results in a data collection module; and the corpus is used for transmitting random spoken language information to the learner, and learner's learning is fed back to a database via the correction suggestion module. The interactive language learning system and the method provided by the invention, besides usual pronunciation evaluation, also provide an error detection function based on phoneme associating and phoneme recognition; and in combination with standard voice improvement suggestions and phoneme correction voice in the corpus, the learner can be helped timely, and most unintentional errors of learners having certain basis can be corrected.
Owner:合肥凌极西雅电子科技有限公司

Spoken language pronunciation evaluation method based on deep neural network posterior probability algorithm

Inactive | CN108364634A | Accurate Voice Evaluation Results | Speech recognition | Evaluation result | Phoneme recognition
The present invention discloses a spoken language pronunciation evaluation method based on a deep neural network posterior probability algorithm. The method comprises the following steps of: selecting a certain amount of voice frequencies from voice, wherein the number of words of each voice frequency is in a certain range; calculating the average likelihood of the phoneme of one word, the average EGOP of the phoneme of one word and the average duration probability of the phoneme of one word in each voice frequency; and taking these three averages as input items, inputting them into a neural network, and outputting scores of words. The spoken language pronunciation evaluation method based on a deep neural network posterior probability algorithm starts from an acoustic model: LSTM modeling is employed to improve the phoneme recognition rate, the FA likelihood and all the similar phoneme likelihoods are compared, a GOP method is extended to an EGOP method, and an artificial neural network scoring model is employed to perform scoring so as to obtain an accurate voice evaluation result.
Owner:苏州声通信息科技有限公司
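
The abstract does not define EGOP, so the sketch below shows only the commonly used goodness-of-pronunciation (GOP) form that it says EGOP extends: the forced-alignment log-likelihood of the expected phoneme compared with the best competing phoneme, normalised by duration. The function name and argument layout are assumptions.

```python
def gop(forced_loglik, competitor_logliks, num_frames):
    """Duration-normalised log-likelihood ratio for one phoneme segment.

    `forced_loglik` is the log-likelihood of the expected phoneme from
    forced alignment; `competitor_logliks` holds the log-likelihoods of
    all candidate phonemes over the same frames (expected one included).
    A score near zero means the expected phoneme was also the best match.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (forced_loglik - max(competitor_logliks)) / num_frames
```

Per-word averages of such scores could then serve as one of the inputs to a scoring network, as the abstract describes for EGOP.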

Lip language combination method and device, electronic device and storage medium

The embodiment of the invention discloses a lip language combination method and device, an electronic device and a storage medium. The method comprises the steps of: performing automatic speech recognition, performing phoneme recognition according to a recognition result, determining a time interval of the phonemes in the speech signals to achieve conversion of original speech signals to phonemes with period information (namely pronunciation duration of the phonemes in the speech signals), and finally, combining a lip language through a corresponding relation of preset phonemes and a mouth shape. The method is employed to combine the lip language to improve the matching degree of the dynamic rhythms of the lip language and the rhythms of the speech, improve the mouth shape accuracy and achieve the combination of the lip language with high vividness while automatic combination of the lip language.
Owner:广州方硅信息技术有限公司
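
The final step, mapping timed phonemes to mouth shapes through a preset table, can be sketched as below. The table contents, the shape names, and the rule of merging adjacent identical shapes are all illustrative assumptions, not taken from the patent.

```python
# Hypothetical preset phoneme-to-mouth-shape table.
MOUTH_SHAPES = {
    "p": "closed", "b": "closed", "m": "closed",
    "a": "open", "o": "round", "u": "round",
}


def lip_track(timed_phonemes, default="neutral"):
    """Convert (phoneme, start, end) triples into (shape, start, end).

    Consecutive segments with the same shape and touching boundaries
    are merged into one longer segment.
    """
    track = []
    for phoneme, start, end in timed_phonemes:
        shape = MOUTH_SHAPES.get(phoneme, default)
        if track and track[-1][0] == shape and track[-1][2] == start:
            track[-1] = (shape, track[-1][1], end)
        else:
            track.append((shape, start, end))
    return track
```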

Lexical acquisition apparatus, multi dialogue behavior system, and lexical acquisition program

InactiveUS8566097B2Speech recognitionPhoneme recognitionLexical acquisition
A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an inputted speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among a plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching word and an unknown word, wherein the discrimination section 4 calculates, for each word sequence, a first evaluation value showing how much words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing a probability at which the words in the word sequence are adjacent to one another and selects a word sequence for which a sum of the first evaluation value and the second evaluation value is maximum, and wherein the acquisition section 5 acquires, as a new word, a word in the word sequence selected by the discrimination section that is not involved in the calculation of the first evaluation value.
Owner:HONDA MOTOR CO LTD +1

Method and apparatus for recognizing continuous speech using search space restriction based on phoneme recognition

Provided are an apparatus and method for recognizing continuous speech using search space restriction based on phoneme recognition. In the apparatus and method, a search space can be primarily reduced by restricting connection words to be shifted at a boundary between words based on the phoneme recognition result. In addition, the search space can be secondarily reduced by rapidly calculating a degree of similarity between the connection word to be shifted and the phoneme recognition result using a phoneme code and shifting the corresponding phonemes to only connection words having degrees of similarity equal to or higher than a predetermined reference value. Therefore, the speed and performance of the speech recognition process can be improved in various speech recognition services.
Owner:ELECTRONICS & TELECOMM RES INST
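
The similarity-threshold pruning described in this abstract can be illustrated with a small sketch. The patent's phoneme code and similarity measure are not given here, so `difflib.SequenceMatcher.ratio()` is used as a stand-in, and `restrict_candidates` and the lexicon layout are hypothetical.

```python
from difflib import SequenceMatcher


def restrict_candidates(recognized, lexicon, threshold):
    """Keep only words whose pronunciation is close to the recognized phonemes.

    `lexicon` maps each candidate word to its phoneme list; a word survives
    when its similarity to `recognized` is at least `threshold`.
    """
    kept = []
    for word, pronunciation in lexicon.items():
        similarity = SequenceMatcher(
            a=recognized, b=pronunciation, autojunk=False
        ).ratio()
        if similarity >= threshold:
            kept.append(word)
    return kept
```

Only the surviving words would be expanded in the continuous-speech search, which is where the reduction in search space comes from.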

System and method for detection and correction of incorrectly pronounced words

A system and method are disclosed for capturing a segment of speech audio, performing phoneme recognition on the segment of speech audio to produce a segmented phoneme sequence, comparing the segmented phoneme sequence to stored phoneme sequences that represent incorrect pronunciations of words to determine if there is a match, and identifying an incorrect pronunciation for a word in the segment of speech audio. The system builds a library based on the data collected for the incorrect pronunciations.
Owner:SOUNDHOUND AI IP LLC
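
The comparison against stored incorrect pronunciations can be sketched as a simple subsequence lookup. Exact matching, the library layout (phoneme tuple mapped to the intended word), and the phoneme symbols are assumptions for illustration.

```python
def find_mispronunciations(phonemes, library):
    """Return (start_index, intended_word) for each stored mispronunciation found.

    `library` maps a tuple of phonemes representing a known incorrect
    pronunciation to the word the speaker meant.
    """
    hits = []
    for start in range(len(phonemes)):
        for bad_sequence, word in library.items():
            window = tuple(phonemes[start:start + len(bad_sequence)])
            if window == bad_sequence:
                hits.append((start, word))
    return hits
```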

Language model training method and system, mobile terminal and storage medium

The invention provides a language model training method and system, a mobile terminal and a storage medium, and the method comprises the steps: obtaining a training text and a training vocabulary, carrying out the classification of the training text so as to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabulary; performing model training on a module language model in the language module according to the language dictionary, and training the training text to obtain a text language model; obtaining to-be-recognized voice to perform phoneme recognition to obtain a phoneme string, and matching the phoneme string with the module language model to obtain a phoneme matching result; and performing probability calculation on the phoneme matching result through a text language model, and outputting the sentence corresponding to the maximum probability value. According to the method, the training efficiency and accuracy of the language model are improved by classifying the training texts and constructing and designing the language dictionary, and the language model can be effectively expanded on the basis of the training design of the module language model and the training texts.
Owner:XIAMEN KUAISHANGTONG TECH CORP LTD