Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm
Deep neural network and posterior probability technology, applied to spoken pronunciation evaluation, can solve problems such as time-consuming...
Example Embodiment
[0040] Embodiment one:
[0041] Deep neural networks (DNNs) have become a major topic in machine learning in both industry and academia in recent years, and they have significantly improved recognition rates over earlier methods. Most current speech recognition systems use hidden Markov models (HMMs) to handle the temporal variability of speech, and Gaussian mixture models (GMMs) to determine how well each state of each HMM matches the acoustic observations. An alternative way to evaluate the degree of matching is to use a feedforward neural network (NN); a deep neural network is a neural network with more hidden layers. Compared with GMMs, DNNs have been shown to greatly improve performance on a variety of speech recognition benchmarks.
[0042] From traditional spoken pronunciation evaluation methods, we can see that to improve the quality...
[0055] Embodiment two:
[0056] This embodiment describes the specific solution of the above embodiment in more detail.
[0057] First, a sequence of feature vectors is extracted from the speech, frame by frame.
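The framing that precedes feature extraction can be illustrated with a minimal sketch. It only splits a sample sequence into overlapping frames; computing the feature vector for each frame is not shown. The function name `split_into_frames`, the 16 kHz sample rate, and the 25 ms frame and 10 ms hop lengths are assumptions chosen for illustration and are not stated in the source.

```python
def split_into_frames(samples, frame_len, hop_len):
    """Split a 1-D sample sequence into overlapping fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames


# Hypothetical settings (not from the source): 16 kHz audio,
# 25 ms frames, 10 ms hop, expressed in whole samples.
SAMPLE_RATE = 16000
FRAME_LEN = SAMPLE_RATE * 25 // 1000  # 400 samples
HOP_LEN = SAMPLE_RATE * 10 // 1000    # 160 samples

one_second = [0.0] * SAMPLE_RATE      # placeholder silent signal
frames = split_into_frames(one_second, FRAME_LEN, HOP_LEN)
print(len(frames), len(frames[0]))
```

Each frame would then be converted into one feature vector, so the utterance becomes a sequence with one vector per frame.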
[0058] Common speech features include perceptual linear prediction (PLP) and Mel-frequency cepstral coefficient (MFCC) features. Then, using the trained DNN+HMM acoustic model, the given evaluation text, and the corresponding word pronunciation dictionary, the time boundaries of the phoneme states are determined with the Viterbi algorithm.
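The boundary search can be illustrated with a deliberately simplified forced-alignment sketch. It assumes the expected state sequence has already been derived from the text and the pronunciation dictionary, uses per-frame log scores only, and omits HMM transition probabilities, so it is not the DNN+HMM decoder the source refers to. The name `forced_align` and the toy posteriors are hypothetical.

```python
import math


def forced_align(log_probs, state_seq):
    """Align frames to a fixed left-to-right state sequence (Viterbi).

    log_probs[t][s] is the log score of state s at frame t.
    state_seq is the ordered list of state ids expected from the text.
    Returns (state_id, start_frame, end_frame) tuples, end exclusive.
    Simplification: no transition scores; every state gets >= 1 frame.
    """
    T, N = len(log_probs), len(state_seq)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per state")
    neg_inf = -math.inf
    best = [[neg_inf] * N for _ in range(T)]
    advanced = [[False] * N for _ in range(T)]  # True: entered state i at t
    best[0][0] = log_probs[0][state_seq[0]]
    for t in range(1, T):
        for i in range(N):
            stay = best[t - 1][i]
            move = best[t - 1][i - 1] if i > 0 else neg_inf
            advanced[t][i] = move > stay
            best[t][i] = max(stay, move) + log_probs[t][state_seq[i]]
    # Backtrack from the last frame, which must be in the last state.
    path = [0] * T
    i = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = i
        if advanced[t][i]:
            i -= 1
    # Collapse the frame-level path into contiguous segments.
    segments = []
    start = 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((state_seq[path[start]], start, t))
            start = t
    return segments


# Toy posteriors for 5 frames over 3 states (illustrative values only).
posteriors = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.20, 0.70],
    [0.05, 0.05, 0.90],
]
log_probs = [[math.log(p) for p in row] for row in posteriors]
segments = forced_align(log_probs, [0, 1, 2])
print(segments)
```

Because the state order is fixed by the text, the search only decides where each state starts and ends, which is what yields the time boundaries.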
[0059] After the time boundaries are determined, the DNN posterior probabilities of all frames within each boundary are extracted and averaged over the number of frames to give the posterior probability of the phoneme state. This yields the following scheme for calculating a word posterior score from the phoneme-state posteriors:
[0060] $P(\mathrm{word}) = \frac{1}{n} \sum_{j=0}^{n-1} \ldots$
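The per-state averaging described in [0059] can be sketched as follows. The sketch covers only the frame average for each aligned phoneme state; it does not attempt the word-level formula, whose summand is truncated in this text. The name `state_posteriors`, the segment tuples, and the toy values are assumptions for illustration.

```python
def state_posteriors(posteriors, segments):
    """Average the aligned state's posterior over each segment's frames.

    posteriors[t][s] is the DNN posterior of state s at frame t.
    segments holds (state_id, start_frame, end_frame), end exclusive.
    """
    scores = []
    for state_id, start, end in segments:
        if end <= start:
            raise ValueError("segment must contain at least one frame")
        total = sum(posteriors[t][state_id] for t in range(start, end))
        scores.append(total / (end - start))
    return scores


# Toy posteriors (5 frames, 3 states) and a hypothetical alignment.
posteriors = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.20, 0.70],
    [0.05, 0.05, 0.90],
]
segments = [(0, 0, 2), (1, 2, 3), (2, 3, 5)]
scores = state_posteriors(posteriors, segments)
print([round(s, 3) for s in scores])
```

Dividing by the segment length keeps long and short states on the same scale, so a state's score does not depend on how many frames it spans.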
[0072] Embodiment three:
[0073] In summary, the spoken pronunciation evaluation algorithm based on DNN posteriors is as follows:
[0074] Step 1: Extract audio features.
[0075] Step 2: Input the audio features into the pre-trained DNN+HMM model and, using the given text and pronunciation dictionary, apply the Viterbi algorithm to determine the phoneme boundaries of the sentence read by the speaker and the corresponding DNN posterior probabilities.
[0076] Step 3: Use formula (1) to calculate the word-level score.
[0077] Step 4: Use formula (2) to calculate the sentence-level score.
[0078] Step 5: Finally, map the word-level and sentence-level posterior scores to the required score ranges through the preset mapping function.
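The mapping in Step 5 can be illustrated with a minimal sketch. The source does not specify the form of the preset mapping function, so a piecewise-linear map is assumed here purely for illustration; the name `map_score` and the breakpoint values are hypothetical.

```python
def map_score(posterior, breakpoints):
    """Piecewise-linear map from a posterior score to a reported score.

    breakpoints is a list of (posterior, score) pairs with strictly
    increasing posterior values. Inputs outside the covered range are
    clamped to the first or last score.
    """
    if len(breakpoints) < 2:
        raise ValueError("need at least two breakpoints")
    for (x0, _), (x1, _) in zip(breakpoints, breakpoints[1:]):
        if x1 <= x0:
            raise ValueError("breakpoint posteriors must strictly increase")
    if posterior <= breakpoints[0][0]:
        return breakpoints[0][1]
    if posterior >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x0 <= posterior <= x1:
            # Linear interpolation inside the enclosing interval.
            return y0 + (y1 - y0) * (posterior - x0) / (x1 - x0)
    raise AssertionError("unreachable: posterior lies within the range")


# Hypothetical breakpoints mapping posteriors in [0, 1] to a 0-100 scale.
BREAKPOINTS = [(0.0, 0.0), (0.2, 40.0), (0.6, 80.0), (1.0, 100.0)]

for p in (0.1, 0.4, 0.85):
    print(p, round(map_score(p, BREAKPOINTS), 1))
```

In practice the breakpoints would be tuned against human ratings, since raw posterior averages are not on the scale a grader would use.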
[0079] In addition, in steps 3 and 4 above, the posterior probability of the phoneme state may be calculated using the following preferred scheme:
[0080] According to the centralphone posterior probability calculation scheme, adjust the posterior probability of eac...