Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm

A deep neural network and posterior probability technology, applied in the field of oral pronunciation evaluation based on deep neural network algorithm, can solve problems such as time-consuming

Active Publication Date: 2015-04-29
SUZHOU CHIVOX INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] Since REC is a Viterbi decoding process of an unconstrained phoneme sequ



Examples


Example Embodiment

[0040] Embodiment one:

[0041] Deep neural networks (DNNs) have become a major topic in machine learning in both industry and academia in recent years, and they have significantly improved recognition rates over earlier approaches. Most current speech recognition systems use hidden Markov models (HMMs) to handle the temporal variability of speech, and Gaussian mixture models (GMMs) to determine how well each state of each HMM matches the acoustic observations. An alternative way to evaluate the degree of matching is to use a feedforward neural network (NN); a deep neural network (DNN) is a neural network with more hidden layers. Compared with Gaussian mixture models, DNNs have been shown to greatly improve performance on a variety of speech recognition benchmarks.
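To make the idea of a DNN scoring HMM states concrete, the following is a minimal sketch of a feedforward network that turns each frame's feature vector into a posterior distribution over states. The layer sizes, the ReLU activation, the random weights, and the name `dnn_state_posteriors` are all illustrative assumptions; the patent does not specify the network architecture.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dnn_state_posteriors(features, weights, biases):
    """Forward pass: (frames x feature_dim) -> (frames x num_states) posteriors."""
    h = features
    # Hidden layers (ReLU is an assumption, not taken from the patent).
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)
    # Output layer: softmax gives one posterior distribution per frame.
    return softmax(h @ weights[-1] + biases[-1])

# Hypothetical sizes: 39-dim features, two hidden layers, 10 HMM states.
rng = np.random.default_rng(0)
dims = [39, 64, 64, 10]
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

frames = rng.normal(size=(5, 39))
post = dnn_state_posteriors(frames, weights, biases)
```

Each row of `post` sums to 1, so an entry can be read as the posterior probability of one state given that frame.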

[0042] From the traditional spoken language pronunciation evaluation methods, we can see that to improve the quality...

Example Embodiment

[0055] Embodiment two:

[0056] This embodiment describes the specific solution of the above embodiment in more detail.

[0057] First, a sequence of feature vectors is extracted from the speech frame by frame.
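As a rough illustration of the framing step, the sketch below splits a waveform into overlapping frames. The 25 ms window and 10 ms hop are common defaults assumed here, not values stated in the patent, and `split_into_frames` is a hypothetical helper. Computing PLP or MFCC features from each frame is not shown.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return a (num_frames x frame_len) array of overlapping frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

# One second of dummy audio at 16 kHz (values are just sample indices).
signal = np.arange(16000, dtype=float)
frames = split_into_frames(signal, 16000)
```

With these assumed settings each frame has 400 samples and consecutive frames start 160 samples apart.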

[0058] Common speech features include perceptual linear prediction (PLP) features and Mel-frequency cepstral coefficient (MFCC) features. Then, using the trained DNN+HMM acoustic model, the given spoken language evaluation text, and the corresponding word pronunciation dictionary, the time boundaries of the phoneme states are determined with the Viterbi algorithm.
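The boundary search can be pictured as a constrained Viterbi pass (forced alignment) over the state sequence implied by the text and dictionary. The sketch below is a simplified version under stated assumptions: states are visited strictly left to right with self-loops and no skips, transition probabilities are ignored, and raw log posteriors are used as frame scores. The function name `forced_align` and the toy posterior matrix are hypothetical and not taken from the patent.

```python
import numpy as np

def forced_align(log_post, state_seq):
    """Return [(start, end)] frame ranges (end exclusive), one per state in state_seq."""
    T, N = log_post.shape[0], len(state_seq)
    if T < N:
        raise ValueError("need at least one frame per state")
    emit = log_post[:, state_seq]            # (T, N) score of each expected state per frame
    score = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)       # 0 = stayed in state, 1 = advanced from previous
    score[0, 0] = emit[0, 0]
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1, j]
            move = score[t - 1, j - 1] if j > 0 else -np.inf
            if move > stay:
                score[t, j] = move + emit[t, j]
                back[t, j] = 1
            else:
                score[t, j] = stay + emit[t, j]
    # Backtrace from the last state at the last frame.
    path = np.empty(T, dtype=int)
    j = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = j
        j -= back[t, j]
    boundaries = []
    for j in range(N):
        frames_j = np.flatnonzero(path == j)
        boundaries.append((int(frames_j[0]), int(frames_j[-1]) + 1))
    return boundaries

# Toy posteriors: 6 frames x 3 states; the expected state order is 0, 1, 2.
post = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
                 [0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
boundaries = forced_align(np.log(post + 1e-10), [0, 1, 2])
```

A production aligner would normally also use HMM transition probabilities and scaled likelihoods; they are left out here to keep the dynamic programming visible.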

[0059] After the time boundaries are determined, the DNN posterior probabilities of all frames within each time boundary are extracted and averaged over the number of frames to give the posterior probability of the phoneme state. This yields the following scheme for computing the word posterior score from the phoneme state posteriors:

[0060] $P(\text{word}) = \frac{1}{n} \sum_{j=0}^{n-1} \ldots$
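A minimal sketch of this scoring scheme follows. It assumes that the elided summand is the per-state average posterior, as the abstract describes (the word score is the average of the phoneme state posterior scores it contains). The helper names, the `(state_id, start, end)` tuple layout, and the toy posterior matrix are hypothetical.

```python
import numpy as np

def state_posterior(post, state_id, start, end):
    """Average DNN posterior of `state_id` over frames [start, end)."""
    return float(post[start:end, state_id].mean())

def word_score(post, aligned_states):
    """Average the state posteriors of all phoneme states in one word.

    aligned_states: list of (state_id, start_frame, end_frame) tuples,
    e.g. as produced by a forced alignment (assumed layout).
    """
    scores = [state_posterior(post, s, a, b) for s, a, b in aligned_states]
    return sum(scores) / len(scores)

# Toy posteriors: 6 frames x 3 states.
post = np.array([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
                 [0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
aligned = [(0, 0, 2), (1, 2, 5), (2, 5, 6)]
score = word_score(post, aligned)
```

In this toy case the three state posteriors are 0.75, about 0.767, and 0.8, so the word score is their mean, about 0.772.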

Example Embodiment

[0072] Embodiment three:

[0073] In summary, the spoken pronunciation evaluation algorithm based on DNN posteriors is as follows:

[0074] Step 1: Extract audio features.

[0075] Step 2: Input the audio features into the pre-trained DNN+HMM model and, according to the given text and pronunciation dictionary, use the Viterbi algorithm to determine the phone boundaries of the sentence read by the speaker and the corresponding DNN posterior probabilities.

[0076] Step 3: Use formula (1) to calculate the word-level score.

[0077] Step 4: Use formula (2) to calculate the sentence-level score.

[0078] Step 5: Finally, the word-level and sentence-level posterior scores are mapped to the required score segments through the preset mapping function.
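Step 5 leaves the mapping function unspecified. One simple possibility, shown purely as an assumption, is a piecewise-linear map from a posterior score in [0, 1] to a 0 to 100 scale. The anchor points below are invented for illustration; in practice they would be tuned, for example against human ratings.

```python
import numpy as np

def map_score(posterior,
              anchors_in=(0.0, 0.3, 0.6, 0.85, 1.0),
              anchors_out=(0.0, 40.0, 70.0, 90.0, 100.0)):
    """Map a posterior score to a reporting scale by linear interpolation.

    Both anchor tuples are hypothetical; np.interp clamps inputs that fall
    outside the first and last anchors.
    """
    return float(np.interp(posterior, anchors_in, anchors_out))

mapped_mid = map_score(0.45)    # halfway between the 0.3 and 0.6 anchors
mapped_high = map_score(0.6)    # exactly on an anchor
```

The same function could be applied to word-level and sentence-level posterior scores, possibly with different anchor sets for each level.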

[0079] In addition, in steps 3 and 4 above, the posterior probability of the phoneme state can be computed with the following preferred scheme:

[0080] According to the centralphone posterior probability calculation scheme, adjust the posterior probability of eac...


Abstract

The invention discloses a method for detecting and evaluating spoken pronunciation based on a deep neural network posterior probability algorithm. The method comprises the following steps: first, extracting a sequence of audio feature vectors from the speech frame by frame; second, inputting the audio features into a pre-trained DNN+HMM model and, using the spoken language evaluation text and the corresponding word pronunciation dictionary, determining the time boundaries of the phoneme states; third, after the time boundaries are determined, extracting all frames within each time boundary and averaging over the number of frames to obtain the posterior probability of the phoneme state, then obtaining a word posterior score based on the phoneme state posteriors, the word posterior score being the average of the posterior scores of the phoneme states contained in the word.

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and relates to a method for evaluating spoken pronunciation based on a deep neural network algorithm.

Background technique

[0002] Globalization has brought people from different language regions together and increased the demand for foreign language proficiency. For learners of English as a second language, computer-assisted language learning is very helpful. Computer-aided pronunciation training, which aims to assess a student's speaking proficiency and to detect or identify pronunciation errors or deficiencies with a high degree of accuracy, remains a challenging area of research.

[0003] The purpose of the traditional oral assessment scheme is to give a score based on the phoneme. When calculating this score, it is assumed that a GMM+HMM model can well determine the likelihood probability of the phonemes corresponding to these segments based on certain acoustic segments, and then meas...


Application Information

IPC(8): G10L15/00, G10L15/06, G10L15/14, G10L25/69
Inventor: 惠寅华, 王欢良, 杨嵩, 代大明, 袁军峰, 林远东
Owner: SUZHOU CHIVOX INFORMATION TECH CO LTD