74 results about "Phoneme recognition" patented technology

Phoneme recognition is carried out using an acoustic model, which is created with machine learning algorithms. The machine learning process is divided into two phases: training and testing.

Speech recognition apparatus, speech recognition method, and speech recognition robot

Active | US8886534B2 | Low correct answer rate | Avoid correcting a phoneme | Speech recognition | Phoneme recognition | Speech input

A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result.

Owner:HONDA MOTOR CO LTD

Method for Automated Training of a Plurality of Artificial Neural Networks

Active | US20100217589A1 | High phoneme recognition accuracy | Improve networking | Speech recognition | Neural architectures | Pattern recognition | Phoneme recognition

The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is trained using the common phoneme label.

Owner:CERENCE OPERATING CO
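
The frame assignment described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the subsequences are chosen or how the common label is derived, so two assumptions are made here: each network receives a contiguous window shifted by one frame, and the common label is a majority vote over the centre frame of each window. The function names and the toy data are hypothetical.

```python
from collections import Counter


def split_into_subsequences(frames, num_networks, frames_per_subsequence):
    """Return one contiguous window of frames per network (stride 1, assumed)."""
    needed = num_networks + frames_per_subsequence - 1
    if len(frames) < needed:
        raise ValueError("frame sequence is too short for the requested windows")
    return [frames[i:i + frames_per_subsequence] for i in range(num_networks)]


def common_label(subsequences):
    """Majority vote over the centre-frame label of each window (assumed rule)."""
    centre_labels = [window[len(window) // 2][1] for window in subsequences]
    return Counter(centre_labels).most_common(1)[0][0]


# Each frame is a (feature, phoneme_label) pair; the values are made up.
frames = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.4, "b"), (0.5, "b")]
windows = split_into_subsequences(frames, num_networks=3, frames_per_subsequence=3)
label = common_label(windows)
# Every network would then be trained on its own window with the target `label`.
```

Under these assumptions each network sees a different slice of the same stretch of speech while all of them share one training target.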

Multi-phoneme streamer and knowledge representation speech recognition system and method

Active | US7286987B2 | Adequate response | Improve power balance | Speech recognition | Speech identification | Call routing

A system and method related to a new approach to speech recognition that reacts to concepts conveyed through speech. In its fullest implementation, the system and method shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. This is done by using a probabilistically unbiased multi-phoneme recognition process, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. The invention can be employed for a myriad of applications, such as improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command / response scenarios.

Owner:CHEMTRON RES

Systems and methods for combining subword recognition and whole word recognition of a spoken input

Inactive | US6985861B2 | Build accurately | Efficient production | Speech recognition | Phoneme recognition | Spoken language

A computer-based detection (e.g., speech recognition) system combines a word decoder and subword decoder to detect words (or phrases) in a spoken input provided by a user into a speaker connected to the detection system. The word decoder detects words by comparing an input pattern (e.g., of hypothetical word matches) to reference patterns (e.g., words). The subword decoder compares an input pattern (e.g., hypothetical word matches based on subword or phoneme recognition) to reference patterns (e.g., words) based on a word pronunciation distance measure that indicates how close each input pattern is to matching each reference pattern. The subword decoder sorts the source set of reference patterns based on the closeness of each reference pattern to correctly matching the input pattern, using the generated pattern comparisons. The word decoder and subword decoder each provide an N-best list of hypothetical matches to the spoken input. A list fusion module of the detection system selectively combines the two N-best lists to produce a final or combined N-best list. The final or combined list has a predefined number of matches.

Owner:HEWLETT PACKARD DEV CO LP
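
The list fusion step in this abstract can be sketched as follows. The abstract does not specify how the two N-best lists are combined, so this example assumes a simple weighted sum of per-hypothesis scores; `fuse_n_best`, `word_weight` and the sample scores are hypothetical.

```python
def fuse_n_best(word_hyps, subword_hyps, n, word_weight=0.5):
    """Merge two N-best lists of (hypothesis, score) into at most n entries."""
    combined = {}
    for text, score in word_hyps:
        combined[text] = combined.get(text, 0.0) + word_weight * score
    for text, score in subword_hyps:
        combined[text] = combined.get(text, 0.0) + (1.0 - word_weight) * score
    # Highest combined score first; ties are broken alphabetically.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


word_list = [("call home", 0.9), ("call Rome", 0.4)]
subword_list = [("call Rome", 0.8), ("cold home", 0.3)]
fused = fuse_n_best(word_list, subword_list, n=2)
fused_texts = [text for text, _ in fused]
```

A hypothesis that appears in both lists accumulates score from both decoders, which is one plausible way for agreement between the decoders to raise its rank.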

Downsampling Schemes in a Hierarchical Neural Network Structure for Phoneme Recognition

Inactive | US20120239403A1 | Speech recognition | Pattern recognition | Phoneme recognition

An approach for phoneme recognition is described. A sequence of intermediate output posterior vectors is generated from an input sequence of cepstral features using a first layer perceptron. The intermediate output posterior vectors are then downsampled to form a reduced input set of intermediate posterior vectors for a second layer perceptron. A sequence of final posterior vectors is generated from the reduced input set of intermediate posterior vectors using the second layer perceptron. Then the final posterior vectors are decoded to determine an output recognized phoneme sequence representative of the input sequence of cepstral features.

Owner:NUANCE COMM INC
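
The data flow in this abstract (first-layer posteriors, downsampling, second-layer posteriors, decoding) can be sketched in Python. This is only an outline under stated assumptions: the two perceptrons are passed in as plain callables, the downsampling keeps every `rate`-th vector, and decoding is a per-frame argmax with repeated labels collapsed, which stands in for whatever decoder the patent actually uses. All names are hypothetical.

```python
def downsample(vectors, rate):
    """Keep every rate-th posterior vector (uniform downsampling, assumed)."""
    return vectors[::rate]


def decode(posteriors, phonemes):
    """Pick the most probable phoneme per vector and collapse repeats."""
    best = [phonemes[max(range(len(p)), key=p.__getitem__)] for p in posteriors]
    return [b for i, b in enumerate(best) if i == 0 or b != best[i - 1]]


def recognize(cepstral_frames, first_layer, second_layer, phonemes, rate=3):
    """Run the two-layer hierarchy with downsampling between the layers."""
    intermediate = [first_layer(frame) for frame in cepstral_frames]
    reduced = downsample(intermediate, rate)
    final = [second_layer(vector) for vector in reduced]
    return decode(final, phonemes)


# Toy run: identity functions stand in for trained perceptrons.
toy_frames = [[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3
result = recognize(toy_frames, lambda v: v, lambda v: v, ["a", "b"], rate=3)
```

With `rate=3` the second layer processes a third of the vectors, which is the saving the downsampling step is meant to provide.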

Network teaching method and system with voice assessment function

Active | CN105578115A | Troubleshoot slow evaluation performance | Improve accuracy | Television system details | Data processing applications | Spoken language | Multi language

The invention provides a network teaching method and system with a voice assessment function. In the voice assessment method, phoneme states of the speech replace the multi-Gaussian mixture model conventionally trained on Mel-frequency cepstral coefficients (MFCC), and posterior probabilities and zero-order Baum-Welch statistics are calculated from this feature. Phoneme-based voice features are extracted by a multi-language phoneme recognizer. Features extracted across multiple languages are complementary in capturing non-native pronunciation information, and features based on phoneme duration are effective for automatic native-accent assessment. Finally, the method provides a fusion system that reaches Spearman correlation coefficients of 0.5706 and 0.6089 on a development set and a test set, respectively. The abstract presents these coefficients as evidence that the method is accurate and effective for spoken-language assessment.

Owner:SHENZHEN EAGLESOUL EDUCATION SERVICE CO LTD

Method and apparatus for recognizing speech

Inactive | US20090076817A1 | Improve speech recognition performance | Speech recognition | Phoneme recognition | Speech identification

Provided are an apparatus and method for recognizing speech, in which reliability with respect to phoneme-recognized phoneme sequences is calculated and performance of speech recognition is enhanced using the calculated results. The method of recognizing speech includes the steps of: determining a boundary between phonemes included in character sequences that are phonetically input to detect each phoneme interval; calculating reliability according to a probability that a phoneme indicated by the detected phoneme interval corresponds to a phoneme included in a predefined phoneme model; calculating a phoneme alignment cost with respect to the character sequences based on the calculated reliability and a pre-trained and stored phoneme recognition probability distribution; and performing phoneme alignment based on the calculated phoneme alignment cost to perform speech recognition on the input character sequences. As a result, reliability with respect to the phoneme-recognized phoneme sequences can be calculated, and the performance of speech recognition can be enhanced using the calculated results.

Owner:ELECTRONICS & TELECOMM RES INST

Phonetic, syntactic and conceptual analysis driven speech recognition system and method

Inactive | US20090063147A1 | Adequate response | Improve power balance | Speech recognition | Speech identification | Call routing

A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command / response scenarios.

Owner:CHEMTRON RES

Microphone assembly comprising a phoneme recognizer

Inactive | US20170154620A1 | Speech recognition | Sound input/output | Phoneme recognition | Digital filter

The present invention relates to a microphone assembly comprising a phoneme recognizer. The phoneme recognizer comprises an artificial neural network (ANN) comprising at least one phoneme expect pattern and a digital processor configured to repeatedly apply one or more sets of frequency components derived from a digital filter bank to respective inputs of the artificial neural network. The artificial neural network is configured to detect and indicate a match between the at least one phoneme expect pattern and the one or more sets of frequency components.

Owner:KNOWLES ELECTRONICS INC

Identity uniformity check method and device based on spectrogram and phoneme retrieval

Active | CN107680601A | Improve identification efficiency | Improve accuracy | Speech analysis | Phoneme recognition | Phase retrieval

The invention provides an identity uniformity check method and device based on a spectrogram and phoneme retrieval. The method includes the following steps: acquiring a spectrogram corresponding to a sample audio file; acquiring the speech feature parameters of the sample audio file; constructing a phoneme recognition model, inputting the speech feature parameters to the phoneme recognition model and carrying out phoneme retrieval to get qualified phonemes; and marking the qualified phonemes on the spectrogram, checking the uniformity of vowels or vowel combinations with the same identifier, and judging whether a to-be-identified person corresponding to the sample audio file passes identity verification. The technical problem on phoneme searching and finding in actual voiceprint authentication is solved. Phonemes can be displayed visually. The identification efficiency of investigators is improved.

Owner:SPEAKIN TECH CO LTD

DNN (Deep Neural Network)-HMM (Hidden Markov Model)-based civil aviation radiotelephony communication acoustic model construction method

Inactive | CN109119072A | Reduce false recognition rate | Speech recognition | Phoneme recognition | Hidden Markov model

The invention relates to a DNN (Deep Neural Network)-HMM (Hidden Markov Model)-based civil aviation radiotelephony communication acoustic model construction method. The method includes the following steps: a Chinese radiotelephony communication corpus is set up; civil aviation radiotelephony communication speech signals are pre-processed; Fbank features are extracted from the civil aviation radiotelephony communication speech signals and are adopted as civil aviation radiotelephony communication speech features; linear discriminant analysis, feature space maximum likelihood regression transformation and speaker adaptive training transformation processing are performed on the civil aviation radiotelephony communication speech features; and the processed speech features are utilized to build a DNN-HMM-based radiotelephony communication acoustic model. With the method of the invention, the Fbank and MFCC features of radiotelephony communication speech are extracted to train a DNN network, so that a DNN-HMM acoustic model suitable for radiotelephony communication speech recognition can be obtained; because a dictionary and a language model are combined, the feature-enhanced DNN-HMM model can reduce the phoneme recognition error rate of radiotelephony communication speech to 5.62% on the constructed data.

Owner:CIVIL AVIATION UNIV OF CHINA

Lexical acquisition apparatus, multi dialogue behavior system, and lexical acquisition program

Inactive | US20100332231A1 | Speech recognition | Phoneme recognition | Word list

A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an inputted speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among a plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching word and an unknown word, wherein the discrimination section 4 calculates, for each word sequence, a first evaluation value showing how much words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing a probability at which the words in the word sequence are adjacent to one another and selects a word sequence for which a sum of the first evaluation value and the second evaluation value is maximum, and wherein the acquisition section 5 acquires, as a new word, a word in the word sequence selected by the discrimination section that is not involved in the calculation of the first evaluation value.

Owner:HONDA MOTOR CO LTD +1
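
The selection rule in this abstract (maximise the sum of two evaluation values, then treat words outside the teaching list as new) can be sketched as below. The abstract does not define how either value is computed, so this example assumes the first value is a count of teaching-word matches and the second is the average adjacency probability of neighbouring words. The function names, the `floor` fallback and the toy data are hypothetical.

```python
def first_value(sequence, teaching_words):
    """Count the words that also appear in the teaching word list."""
    return sum(1 for word in sequence if word in teaching_words)


def second_value(sequence, adjacency_prob, floor=0.01):
    """Average probability that neighbouring words occur next to each other."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    return sum(adjacency_prob.get(pair, floor) for pair in pairs) / len(pairs)


def acquire_new_words(candidates, teaching_words, adjacency_prob):
    """Pick the best-scoring sequence and return it with its non-teaching words."""
    best = max(
        candidates,
        key=lambda seq: first_value(seq, teaching_words)
        + second_value(seq, adjacency_prob),
    )
    new_words = [word for word in best if word not in teaching_words]
    return best, new_words


teaching = {"my", "name", "is"}
adjacency = {("my", "name"): 0.9, ("name", "is"): 0.9, ("is", "taro"): 0.5,
             ("my", "game"): 0.1, ("game", "is"): 0.1}
candidates = [["my", "name", "is", "taro"], ["my", "game", "is", "taro"]]
best_sequence, acquired = acquire_new_words(candidates, teaching, adjacency)
```

In this toy run the sequence that matches the teaching phrase wins, and the one word that played no part in the first evaluation value is returned as the newly acquired word.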

Voice recognizing method and device, as well as neural network training method and device

Active | CN110600018A | Improve experience | Easy to integrate | Speech recognition | Neural learning methods | Phoneme recognition | Frequency spectrum

The invention provides a voice recognizing method and a voice recognizing device, as well as a neural network training method and a neural network training device, and relates to the technical field of artificial intelligence. The neural network training method comprises the following steps: acquiring sample data, wherein the sample data comprises mixed voice frequency spectra and the labelled phonemes of the mixed voice frequency spectra; extracting the target voice frequency spectrum from the mixed voice frequency spectra through a first sub-network; carrying out adaptive conversion on the target voice frequency spectrum through a second sub-network, so that an intermediate transition representation is obtained; carrying out phoneme recognition based on the intermediate transition representation through a third sub-network; and, according to the phoneme recognition result and the labelled phonemes, updating the parameters of the first sub-network, the second sub-network and the third sub-network. With this technical scheme, voice recognition performance under complicated interfering-sound conditions can be improved.

Owner:TENCENT TECH (SHENZHEN) CO LTD

Method for estimating language model weight and system for the same

Inactive | US20120150539A1 | Speech recognition | Feature vector | Phoneme recognition

The method of the present invention may include: receiving a speech feature vector converted from a speech signal; performing a first search by applying a first language model to the received speech feature vector, and outputting a word lattice and a first acoustic score of the word lattice as a continuous speech recognition result; outputting a second acoustic score as a phoneme recognition result by applying an acoustic model to the speech feature vector; comparing the first acoustic score of the continuous speech recognition result with the second acoustic score of the phoneme recognition result; outputting a first language model weight when the first acoustic score of the continuous speech recognition result is better than the second acoustic score of the phoneme recognition result; and performing a second search by applying a second language model weight, which is the same as the output first language model weight, to the word lattice.

Owner:ELECTRONICS & TELECOMM RES INST

Method of recognizing speech and electronic device thereof

Inactive | CN103544955A | Speech recognition | Phoneme recognition | Acoustic model

A method of recognizing speech and an electronic device thereof are provided. The method includes: segmenting a speech signal into a plurality of sections at preset time intervals; performing phoneme recognition with respect to one of the plurality of sections of the speech signal by using a first acoustic model; extracting a candidate word of the one of the plurality of sections of the speech signal by using the phoneme recognition result; and performing speech recognition with respect to the one of the plurality of sections of the speech signal by using the candidate word.

Owner:SAMSUNG ELECTRONICS CO LTD
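
Two of the steps in this abstract, segmenting at preset time intervals and extracting candidate words from a phoneme recognition result, can be sketched as follows. The abstract does not say how candidate words are extracted, so this example assumes a lookup in a pronunciation lexicon that keeps every word whose phonemes appear contiguously in the recognized sequence. `segment`, `candidate_words` and the toy lexicon are hypothetical.

```python
def segment(samples, sample_rate, interval_seconds):
    """Cut a sample list into consecutive sections of a preset duration."""
    step = int(sample_rate * interval_seconds)
    if step <= 0:
        raise ValueError("interval must cover at least one sample")
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def candidate_words(phonemes, lexicon):
    """Return lexicon words whose pronunciation occurs in the phoneme sequence."""
    def occurs(pronunciation):
        n = len(pronunciation)
        return any(phonemes[i:i + n] == pronunciation
                   for i in range(len(phonemes) - n + 1))
    return sorted(word for word, pron in lexicon.items() if occurs(pron))


sections = segment(list(range(10)), sample_rate=4, interval_seconds=1.0)
lexicon = {"cat": ["k", "ae", "t"], "at": ["ae", "t"], "dog": ["d", "ao", "g"]}
candidates = candidate_words(["k", "ae", "t"], lexicon)
```

The candidate list would then restrict the vocabulary for the full speech recognition pass over the same section.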

Automatic pattern recognition using category dependent feature selection

Inactive | US20080147402A1 | Character and pattern recognition | Speech recognition | Feature vector | Data set

Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category dependent feature selection. The validity of the output of the model is examined by deriving feature vectors from the dimension expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set where sounds are categorized according to the regions containing their most distinguishing features. This provides for a novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.

Owner:GEORGIA TECH RES CORP

Method for automated training of a plurality of artificial neural networks

Active | US8554555B2 | Improve networking | Easy to identify | Speech recognition | Neural architectures | Pattern recognition | Phoneme recognition

The invention provides a method for automated training of a plurality of artificial neural networks for phoneme recognition using training data, wherein the training data comprises speech signals subdivided into frames, each frame associated with a phoneme label, wherein the phoneme label indicates a phoneme associated with the frame. A sequence of frames from the training data is provided, wherein the number of frames in the sequence of frames is at least equal to the number of artificial neural networks. Each of the artificial neural networks is assigned a different subsequence of the provided sequence, wherein each subsequence comprises a predetermined number of frames. A common phoneme label for the sequence of frames is determined based on the phoneme labels of one or more frames of one or more subsequences of the provided sequence. Each artificial neural network is trained using the common phoneme label.

Owner:CERENCE OPERATING CO

Method and apparatus for context independent gender recognition utilizing phoneme transition probability

Inactive | US20140172428A1 | Discriminately distinguishing | Speech recognition | Feature vector | Phoneme recognition

Provided is a method for context independent gender recognition utilizing phoneme transition probability. The method for the context independent gender recognition includes detecting a voice section from a received voice signal, generating feature vectors within the detected voice section, performing a hidden Markov model on the feature vectors by using a search network that is set according to a phoneme rule to recognize a phoneme and obtain scores of first and second likelihoods, and comparing final scores of the first and second likelihoods obtained while the phoneme recognition is performed up to the last section of the voice section to finally decide gender with respect to the voice signal.

Owner:ELECTRONICS & TELECOMM RES INST

Conceptual analysis driven data-mining and dictation system and method

Inactive | US20090048830A1 | Adequate response | Improve power balance | Digital computer details | Speech recognition | Data dredging | Speech identification

A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response. Applications include improving accuracy or automatically generating punctuation for transcription and dictation, word or concept spotting in audio streams, concept spotting in electronic text, customer support, call routing and other command / response scenarios.

Owner:CHEMTRON RES

Recognition method and device for voice phoneme

Active | CN109754789A | Make up unary hypothesis | Compensating for binary assumptions | Speech recognition | Local optimum | Phoneme recognition

The invention discloses a recognition method and device for a voice phoneme, and relates to the technical field of voice recognition. A main purpose is to solve a problem of low phoneme segmentation efficiency or a locally optimal solution during voice recognition. According to the main technical scheme provided in the invention, the recognition method comprises the following steps of inputting a to-be-recognized voice into a phoneme recognition model, and obtaining, according to an output result, an expected result corresponding to the to-be-recognized voice, wherein the phoneme recognition model identifies each phoneme in the to-be-recognized voice through multiple neural network models and a hidden Markov model; training a model parameter in the phoneme recognition model according to the expected result until a rate of change of an output result of a phoneme model is less than a preset threshold value; and determining an output result with a rate of change less than the preset threshold value as a final phoneme recognition result corresponding to the to-be-recognized voice. The recognition method is mainly applied to a process of recognizing a sound.

Owner:BEIJING GRIDSUM TECH CO LTD
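
The stopping rule in this abstract (keep training until the rate of change of the output falls below a preset threshold) can be sketched as a generic loop. The abstract does not define the rate of change, so this example assumes it is the fraction of phoneme positions that differ between two consecutive outputs. The model update is passed in as a callable, and all names, including `max_rounds`, are hypothetical.

```python
def change_rate(previous, current):
    """Fraction of positions whose phoneme label differs between two outputs."""
    differing = sum(1 for p, c in zip(previous, current) if p != c)
    differing += abs(len(previous) - len(current))
    return differing / max(len(previous), len(current), 1)


def refine_until_stable(recognize_and_update, first_output, threshold, max_rounds=50):
    """Repeat recognition and parameter updates until the output stops changing."""
    previous = first_output
    for _ in range(max_rounds):
        current = recognize_and_update(previous)
        if change_rate(previous, current) < threshold:
            return current
        previous = current
    return previous  # Not converged within max_rounds; return the last output.


# Toy stand-in for the model: it replays a fixed series of outputs.
outputs = iter([["a", "b", "c"], ["a", "b", "d"], ["a", "b", "d"]])
final = refine_until_stable(lambda prev: next(outputs), ["x", "y", "z"], threshold=0.1)
```

The `max_rounds` cap is an addition for safety in this sketch; the abstract itself only mentions the threshold.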

Cross-language timbre conversion system and method based on zero-order learning

Pending | CN112767958A | Accurate modeling | Reduce dependence | Speech recognition | Phoneme recognition | Voice change

The invention discloses a cross-language timbre conversion system and method based on zero-order learning. The system sequentially comprises a mixed phoneme recognition module, a timbre conversion module, a speaker coding module and a vocoder module. According to the system, a voice signal Mel spectrum serves as an input signal, bottleneck features of the voice signal Mel spectrum are extracted through the phoneme recognition module, the features are normalized and then transmitted to an acoustic model, the Mel spectrum synthesized by the acoustic model is controlled by controlling a speaker reference vector, and finally audio is synthesized through a vocoder. The system can convert the voice of a common speaker into the timbre of a specified speaker, is suitable for accent corpora which do not appear in a training database, can be suitable for voice change of dialects in multiple regions, and has a wide application prospect.

Owner:SOUTH CHINA UNIV OF TECH

Voice adaptive completion system based on multi-modal knowledge graph

Pending | CN113936637A | Reduce irrelevant variables | Improve interpretability | Character and pattern recognition | Natural language data processing | Phoneme recognition | Knowledge graph

The invention discloses a voice adaptive completion system based on a multi-modal knowledge graph. The system comprises a data receiver, a data analyzer and a data inference device. The data receiver preprocesses received audio and video data and outputs the audio and video data to the data analyzer; the data analyzer analyzes the voice and the image to extract waveform time sequence features and lip track features, and a phoneme sequence is obtained through multi-mode joint representation; and the data inference device carries out domain session modeling and candidate text prediction according to historical texts, text inference is carried out in combination with a phoneme sequence, statements with semantics are obtained, and complemented voice is synthesized according to waveform features. According to the invention, through a phoneme reasoning model, phoneme recognition is carried out when the voice modality is lost, the domain session modeling is carried out on the historical text generated by the existing voice according to the semantic relationship between the entities in the multi-modal knowledge graph, so that reasoning is carried out to generate the text with semantic, the voice is synthesized in combination with the waveform characteristics of the user voice, and the complemented audio is formed.

Owner:SHANGHAI JIAO TONG UNIV

Voice phoneme recognition method and device, storage medium and electronic device

Active | CN110335592A | Improve accuracy | Speech recognition | Attention model | Self attention

The invention discloses a voice phoneme recognition method, a voice phoneme recognition device, a storage medium and an electronic device. The method comprises the following steps: extracting a plurality of first voice features from a plurality of voice frames by using a shared coder; determining a plurality of key voice features from the plurality of first voice features by using a CTC model, wherein each key voice feature corresponds to a peak position output by the CTC model; determining a voice feature set corresponding to each key voice feature, wherein each voice feature set comprises the corresponding key voice feature and one or more voice features adjacent to it in the plurality of first voice features; carrying out feature fusion on the voice features in each voice feature set by using a self-attention network, thus obtaining a plurality of fused voice features, wherein each voice feature set corresponds to one fused voice feature; and recognizing the phoneme corresponding to each fused voice feature in the phoneme set by using a coder of a target attention model.

Owner:TENCENT TECH (SHENZHEN) CO LTD
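
The peak-selection and fusion steps described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names (`ctc_peak_frames`, `feature_windows`, `fuse`), the blank index, the window size and the unparameterized dot-product attention are all assumptions made for the example.

```python
import numpy as np

def ctc_peak_frames(posteriors, blank=0):
    """Return the first frame of each run of a non-blank best label.

    posteriors: (T, V) array of per-frame CTC label probabilities.
    Treating the start of each non-blank run as the "peak" is an
    assumption of this sketch.
    """
    best = posteriors.argmax(axis=1)
    peaks, prev = [], blank
    for t, label in enumerate(best):
        if label != blank and label != prev:
            peaks.append(t)
        prev = label
    return peaks

def feature_windows(features, peaks, left=1, right=1):
    """Collect each peak frame plus its neighbours, clipped at the edges."""
    total = len(features)
    return [features[max(0, t - left): min(total, t + right + 1)] for t in peaks]

def fuse(window):
    """Toy self-attention fusion without learned weights (assumption).

    Computes softmax(W W^T / sqrt(d)) W and averages the rows, giving
    one fused vector per window.
    """
    d = window.shape[1]
    scores = window @ window.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights @ window).mean(axis=0)
```

Each fused vector would then be passed to the attention model that predicts the phoneme; that stage is omitted here because the abstract does not describe it in enough detail to sketch.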

Interactive language learning system and method thereof

Pending | CN108806719A | Correct unintentional errors | Speech recognition | Electrical appliances | Spoken language | Phoneme recognition

The invention discloses an interactive language learning system and a method thereof. The interactive language learning system comprises a voice reference module, a characteristic extracting module, a phoneme associating module, a voice learning module, a phoneme correction module, a correction suggestion module, a phoneme evaluating module, a voice feedback module and a corpus. The voice learning module collects voice data of designated text read aloud by a learner; the phoneme correction module synthesizes feedback voice that has the rhythm of the reference voice and the learner's tone, so that the rhythm-corrected voice can guide the learner to imitate the rhythm of the reference voice; the correction suggestion module, the phoneme evaluating module and the voice feedback module save results in a data collection module; and the corpus transmits random spoken language information to the learner, with the learner's progress fed back to a database via the correction suggestion module. Besides the usual pronunciation evaluation, the system and method also provide an error detection function based on phoneme associating and phoneme recognition; combined with standard voice improvement suggestions and phoneme-corrected voice from the corpus, the learner can be helped in a timely manner, and most unintentional errors of learners who already have a certain foundation can be corrected.

Owner:合肥凌极西雅电子科技有限公司

Spoken language pronunciation evaluation method based on deep neural network posterior probability algorithm

Inactive | CN108364634A | Accurate Voice Evaluation Results | Speech recognition | Evaluation result | Phoneme recognition

The present invention discloses a spoken language pronunciation evaluation method based on a deep neural network posterior probability algorithm. The method comprises the following steps: selecting a certain number of audio clips from speech, wherein the number of words in each clip is within a certain range; calculating, for each clip, the average likelihood, the average EGOP and the average duration probability of the phonemes of each word; and feeding these three values into a neural network as input items, which outputs scores for the words. The method starts from the acoustic model: LSTM modeling is employed to improve the phoneme recognition rate, the FA likelihood is compared with the likelihoods of all similar phonemes, the GOP method is extended to an EGOP method, and an artificial neural network scoring model performs the scoring so as to obtain an accurate voice evaluation result.

Owner:苏州声通信息科技有限公司
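
The abstract above does not define EGOP, so the sketch below only illustrates the baseline it extends: one common posterior-based form of the goodness of pronunciation (GOP) score, followed by the per-word averaging that would feed a scoring network. The function names and the exact formula are assumptions for illustration, not the patented method.

```python
import numpy as np

def gop(posteriors, target):
    """Posterior-based GOP for one aligned phoneme segment.

    posteriors: (T, V) frame-level phoneme posteriors for the segment.
    target: index of the phoneme the speaker was supposed to say.
    Returns the mean over frames of log p(target) - log max_q p(q),
    which is 0 when the target is the top hypothesis in every frame
    and negative otherwise.
    """
    log_post = np.log(np.clip(posteriors, 1e-10, None))
    return float(np.mean(log_post[:, target] - log_post.max(axis=1)))

def word_gop(segments):
    """Average GOP over the (posteriors, target) segments of one word."""
    return float(np.mean([gop(p, t) for p, t in segments]))
```

In the scheme the abstract describes, a value like `word_gop` would be one of three per-word inputs (alongside average likelihood and average duration probability) to the neural network that outputs the word score.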

Lip language combination method and device, electronic device and storage medium

Active | CN108831463A | Achieve synthesis | Synthetic high-fidelity | Speech recognition | Phoneme recognition | Speech identification

The embodiment of the invention discloses a lip language combination method and device, an electronic device and a storage medium. The method comprises the steps of: performing automatic speech recognition; performing phoneme recognition according to the recognition result; determining the time interval of each phoneme in the speech signal, so that the original speech signal is converted into phonemes with period information (namely the pronunciation duration of each phoneme in the speech signal); and finally combining the lip language through a preset correspondence between phonemes and mouth shapes. Combining the lip language in this way improves the match between the dynamic rhythm of the lip language and the rhythm of the speech, improves mouth-shape accuracy, and achieves highly vivid lip language while combining it automatically.

Owner:广州方硅信息技术有限公司
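
The final step in the abstract above, turning timed phonemes into mouth shapes through a preset table, might look roughly like this. The table `MOUTH_SHAPES`, its shape names and the function `lip_track` are hypothetical; the patent's actual correspondence table is not given in the abstract.

```python
# Hypothetical phoneme-to-mouth-shape table; a real system would cover
# the full phoneme inventory of the target language.
MOUTH_SHAPES = {
    "p": "closed", "b": "closed", "m": "closed",
    "aa": "open", "iy": "spread", "uw": "rounded",
}

def lip_track(timed_phonemes, table=MOUTH_SHAPES, default="neutral"):
    """Convert (phoneme, start, end) triples into mouth-shape segments.

    Adjacent segments with the same shape are merged so the animation
    holds one pose instead of re-triggering it.
    """
    track = []
    for phoneme, start, end in timed_phonemes:
        shape = table.get(phoneme, default)
        if track and track[-1][0] == shape and track[-1][2] == start:
            track[-1] = (shape, track[-1][1], end)
        else:
            track.append((shape, start, end))
    return track
```

Because each segment keeps the phoneme's own start and end times, the mouth animation follows the rhythm of the speech rather than a fixed frame rate.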

Lexical acquisition apparatus, multi dialogue behavior system, and lexical acquisition program

Inactive | US8566097B2 | Speech recognition | Phoneme recognition | Lexical acquisition

A lexical acquisition apparatus includes: a phoneme recognition section 2 for preparing a phoneme sequence candidate from an inputted speech; a word matching section 3 for preparing a plurality of word sequences based on the phoneme sequence candidate; a discrimination section 4 for selecting, from among a plurality of word sequences, a word sequence having a high likelihood in a recognition result; an acquisition section 5 for acquiring a new word based on the word sequence selected by the discrimination section 4; a teaching word list 4A used to teach a name; and a probability model 4B of the teaching word and an unknown word, wherein the discrimination section 4 calculates, for each word sequence, a first evaluation value showing how much words in the word sequence correspond to teaching words in the list 4A and a second evaluation value showing a probability at which the words in the word sequence are adjacent to one another and selects a word sequence for which a sum of the first evaluation value and the second evaluation value is maximum, and wherein the acquisition section 5 acquires, as a new word, a word in the word sequence selected by the discrimination section that is not involved in the calculation of the first evaluation value.

Owner:HONDA MOTOR CO LTD +1
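
The selection rule in the abstract above (pick the word sequence that maximizes the sum of two evaluation values, then take the words not covered by the teaching list as new words) can be sketched as follows. The concrete scoring functions are assumptions: a simple count of teaching-word matches stands in for the first value, and a sum of bigram log-probabilities with a fixed floor stands in for the second.

```python
def first_value(words, teaching_words):
    """How many words in the sequence match the teaching word list."""
    return sum(1.0 for w in words if w in teaching_words)

def second_value(words, bigram_logprob, floor=-5.0):
    """Sum of log-probabilities of adjacent word pairs (floor for unseen pairs)."""
    return sum(bigram_logprob.get(pair, floor) for pair in zip(words, words[1:]))

def select_and_acquire(candidates, teaching_words, bigram_logprob):
    """Pick the best-scoring word sequence and return its unmatched words."""
    best = max(
        candidates,
        key=lambda ws: first_value(ws, teaching_words)
        + second_value(ws, bigram_logprob),
    )
    new_words = [w for w in best if w not in teaching_words]
    return best, new_words
```

For instance, with teaching words `{"this", "is", "called"}`, the candidate `["this", "is", "called", "pochi"]` outscores `["this", "is", "cold", "pochi"]`, and `"pochi"` is returned as the newly acquired word.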

Method and apparatus for recognizing continuous speech using search space restriction based on phoneme recognition

Active | US20080133239A1 | Narrow down the search space | Improve performance | Speech recognition | Phoneme recognition | Speech identification

Provided are an apparatus and method for recognizing continuous speech using search space restriction based on phoneme recognition. In the apparatus and method, a search space can be primarily reduced by restricting connection words to be shifted at a boundary between words based on the phoneme recognition result. In addition, the search space can be secondarily reduced by rapidly calculating a degree of similarity between the connection word to be shifted and the phoneme recognition result using a phoneme code and shifting the corresponding phonemes to only connection words having degrees of similarity equal to or higher than a predetermined reference value. Therefore, the speed and performance of the speech recognition process can be improved in various speech recognition services.

Owner:ELECTRONICS & TELECOMM RES INST
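
The second pruning stage in the abstract above keeps only those connection words whose similarity to the phoneme recognition result reaches a reference value. The abstract does not say how the phoneme-code similarity is computed, so this sketch substitutes the ratio from Python's `difflib.SequenceMatcher`; the function name, the toy lexicon and the threshold are likewise assumptions.

```python
from difflib import SequenceMatcher

def restrict_candidates(recognized, lexicon, threshold=0.6):
    """Keep words whose phoneme sequence is similar enough to `recognized`.

    recognized: list of phonemes from the phoneme recognizer.
    lexicon: dict mapping each candidate word to its phoneme list.
    Similarity is SequenceMatcher.ratio(), a stand-in for the patent's
    unspecified phoneme-code measure.
    """
    kept = []
    for word, phonemes in lexicon.items():
        score = SequenceMatcher(None, recognized, phonemes).ratio()
        if score >= threshold:
            kept.append(word)
    return kept
```

Only the surviving words would be expanded at the word boundary during decoding, which is how the restriction shrinks the search space.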

System and method for detection and correction of incorrectly pronounced words

Active | US20200184958A1 | Correct pronunciation | Sound input/output | Speech recognition | Phoneme recognition | Speech sound

A system and method are disclosed for capturing a segment of speech audio, performing phoneme recognition on the segment of speech audio to produce a segmented phoneme sequence, comparing the segmented phoneme sequence to stored phoneme sequences that represent incorrect pronunciations of words to determine if there is a match, and identifying an incorrect pronunciation for a word in the segment of speech audio. The system builds a library based on the data collected for the incorrect pronunciations.

Owner:SOUNDHOUND AI IP LLC
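
The comparison step in the abstract above, matching a recognized phoneme sequence against stored incorrect pronunciations, can be illustrated with an exact sub-sequence search. The function name, the library layout and the sample phoneme strings are hypothetical; a production system would presumably tolerate recognition noise rather than require exact matches.

```python
def find_mispronunciations(phonemes, library):
    """Locate stored incorrect pronunciations inside a phoneme sequence.

    phonemes: list of recognized phonemes for the speech segment.
    library: dict mapping a tuple of phonemes (an incorrect
             pronunciation) to the word it belongs to.
    Returns a sorted list of (start_index, word) matches.
    """
    hits = []
    for bad, word in library.items():
        n = len(bad)
        for i in range(len(phonemes) - n + 1):
            if tuple(phonemes[i:i + n]) == bad:
                hits.append((i, word))
    return sorted(hits)
```

Each hit identifies both the word and where it occurred, which is the information a correction prompt would need.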

Language model training method and system, mobile terminal and storage medium

Active | CN111192570A | Effective expansion | Improve training efficiency | Character and pattern recognition | Natural language data processing | Phoneme recognition | Language module

The invention provides a language model training method and system, a mobile terminal and a storage medium. The method comprises the steps of: obtaining a training text and a training vocabulary, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabulary; training a module language model in each language module according to the language dictionary, and training on the training text to obtain a text language model; obtaining to-be-recognized voice and performing phoneme recognition on it to obtain a phoneme string, and matching the phoneme string with the module language model to obtain a phoneme matching result; and performing probability calculation on the phoneme matching result through the text language model, and outputting the sentence corresponding to the maximum probability value. By classifying the training texts and constructing the language dictionary, the method improves the training efficiency and accuracy of the language model, and the language model can be effectively expanded on the basis of the module language models and the training texts.

Owner:XIAMEN KUAISHANGTONG TECH CORP LTD
