Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm

Deep neural network posterior probability technology, applied to spoken pronunciation evaluation, addresses problems such as time-consuming decoding.

Active Publication Date: 2015-04-29
SUZHOU CHIVOX INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] Because REC performs Viterbi decoding over an unconstrained phoneme sequence, its decoding network is larger, and its decoding more time-consuming, than the FA phoneme decoding network.

Examples

Embodiment 1

[0041] Deep neural networks (DNNs) have become a prominent topic in machine learning, in both industry and academia, in recent years, and DNN algorithms have significantly improved recognition rates. Most current speech recognition systems use hidden Markov models (HMMs) to handle the temporal variation of speech, and Gaussian mixture models (GMMs) to determine how well each state of each HMM matches the acoustic observations. An alternative way to evaluate this match is a feedforward neural network (NN); a DNN is a neural network with more hidden layers. DNNs have been shown to outperform GMMs on a variety of speech recognition benchmarks, with large performance gains.

[0042] From the traditional oral pronunciation evaluation method, we can see that to improve the quality of the oral evaluation algorithm, we need a high-quality acoustic mo...

Embodiment 2

[0056] According to this embodiment, the specific solutions of the above embodiments will be described in more detail.

[0057] First, features are extracted from the speech frame by frame, yielding a sequence of feature vectors.
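The frame-by-frame extraction in [0057] can be illustrated with a minimal sketch. The frame length, hop length, and the toy log-energy feature are assumptions made for illustration only; the source names PLP and MFCC features but gives no extraction parameters, and the helper names here are hypothetical.

```python
import math


def split_into_frames(samples, frame_len, hop_len):
    """Split a sample list into overlapping frames; a trailing partial frame is dropped."""
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def log_energy(frame):
    """Toy one-dimensional feature (a stand-in for PLP or MFCC): log of mean energy."""
    energy = sum(s * s for s in frame) / len(frame)
    return math.log(energy + 1e-10)  # small floor avoids log(0) on silence


# Assumed setup: 1 s of 16 kHz audio, 25 ms frames (400 samples), 10 ms hop (160 samples).
samples = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
frames = split_into_frames(samples, frame_len=400, hop_len=160)
features = [[log_energy(f)] for f in frames]  # one feature vector per frame
```

A real system would replace `log_energy` with a PLP or MFCC front end; the framing logic is the part this sketch is meant to show.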

[0058] Common speech features include perceptual linear prediction (PLP) and Mel-frequency cepstral coefficient (MFCC) features. Then, using the trained DNN+HMM acoustic model, the given evaluation text, and the corresponding word pronunciation dictionary, the time boundaries of the phoneme states are determined with the Viterbi algorithm.
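As a rough illustration of how the Viterbi algorithm can determine state time boundaries when the state sequence is fixed by the text and dictionary, the sketch below aligns frames to a left-to-right state sequence. It is a simplification under stated assumptions: HMM transition probabilities are ignored, every state must cover at least one frame, and `forced_align`, `log_post`, and `state_seq` are hypothetical names, not taken from the patent.

```python
def forced_align(log_post, state_seq):
    """Align frames to a fixed left-to-right state sequence.

    log_post[t][s] is the log posterior of state s at frame t.
    state_seq lists the expected state ids in order.
    Returns one (start_frame, end_frame_exclusive) pair per entry of state_seq.
    """
    T, N = len(log_post), len(state_seq)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per state")
    NEG = float("-inf")
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]  # 0 = stayed in state, 1 = advanced from previous
    score[0][0] = log_post[0][state_seq[0]]
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            move = score[t - 1][j - 1] if j > 0 else NEG
            if move > stay:
                best, back[t][j] = move, 1
            else:
                best, back[t][j] = stay, 0
            if best > NEG:
                score[t][j] = best + log_post[t][state_seq[j]]
    # Trace back from the last state at the last frame to recover the boundaries.
    bounds = [None] * N
    j, end = N - 1, T
    for t in range(T - 1, 0, -1):
        if back[t][j] == 1:
            bounds[j] = (t, end)
            end = t
            j -= 1
    bounds[0] = (0, end)
    return bounds
```

Given five frames whose posteriors favour state 0 for the first two frames and state 1 for the rest, `forced_align` with `state_seq=[0, 1]` places the boundary between frames 1 and 2.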

[0059] After the time boundaries are determined, the DNN posterior probabilities of all frames within each boundary are extracted and averaged over the number of frames to give the posterior probability of the phoneme state. This yields the following scheme for calculating the word posterior score from the phoneme state posteriors:

[0060] P ( word ) = ...
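The averaging described in [0059] can be sketched as follows. This is not a reproduction of the elided formula; it only follows the abstract's statement that the word posterior score is the average of the posterior scores of the phoneme states the word contains. The function names and the input layout (one posterior per frame for the aligned state) are assumptions.

```python
def state_posterior(frame_posteriors, start, end):
    """Mean DNN posterior of the aligned state over frames [start, end)."""
    if end <= start:
        raise ValueError("a state must span at least one frame")
    segment = frame_posteriors[start:end]
    return sum(segment) / len(segment)


def word_score(frame_posteriors, state_bounds):
    """Average of the per-state posteriors for the states that make up one word."""
    if not state_bounds:
        raise ValueError("a word must contain at least one state")
    scores = [state_posterior(frame_posteriors, s, e) for s, e in state_bounds]
    return sum(scores) / len(scores)


# Hypothetical word with two states: frames 0-1 and frames 2-4.
frame_posteriors = [0.8, 0.6, 0.9, 0.7, 0.5]
score = word_score(frame_posteriors, [(0, 2), (2, 5)])
```

Averaging per state first, then across states, keeps long states from dominating the word score, which is one plausible reading of the frame-length normalisation in the text.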

Embodiment 3

[0073] To sum up, our oral evaluation algorithm based on DNN posterior is as follows:

[0074] Step 1: Extract audio features.

[0075] Step 2: Input the audio features into the pre-trained DNN+HMM model and, given the text and the pronunciation dictionary, use the Viterbi algorithm to determine the phone boundaries of the sentence read by the speaker and the corresponding DNN posterior probabilities.

[0076] Step 3: Calculate the word-level score using formula (1).

[0077] Step 4: Calculate the sentence-level score using formula (2).

[0078] Step 5: Finally, map the word-level and sentence-level posterior scores to the required score bands through a preset mapping function.
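Step 5 does not specify the mapping function. As one possible shape for it, the sketch below uses a piecewise-linear map from a posterior score in [0, 1] to a 0-100 scale. The breakpoints and the name `map_score` are purely hypothetical; a deployed system would presumably fit them to human ratings.

```python
def map_score(posterior, breakpoints):
    """Piecewise-linear map from a posterior score to a reported score.

    breakpoints is a list of (posterior, score) pairs sorted by posterior.
    Inputs outside the covered range are clamped to the end scores.
    """
    if posterior <= breakpoints[0][0]:
        return breakpoints[0][1]
    if posterior >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x0 <= posterior <= x1:
            return y0 + (y1 - y0) * (posterior - x0) / (x1 - x0)


# Hypothetical preset: posteriors below 0.3 score under 60, above 0.7 score 90 or more.
PRESET = [(0.0, 0.0), (0.3, 60.0), (0.7, 90.0), (1.0, 100.0)]
word_level = map_score(0.5, PRESET)
```

The same function could be applied to both word-level and sentence-level posterior scores, with separate breakpoint tables if the two levels are calibrated differently.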

[0079] In addition, in steps 3 and 4 above, the posterior probability of the phoneme state can be calculated with the following optimized scheme:

[0080] According to the centralphone posterior probability calculation scheme, the posterior probability of each phoneme state i...

Abstract

The invention discloses a method for detecting and evaluating spoken language pronunciation based on a deep neural network posterior probability algorithm. The method comprises the following steps: first, extracting features from the speech frame by frame to form a sequence of audio feature vectors; second, inputting the audio features, together with the spoken-language evaluation text and the corresponding word pronunciation dictionary, into a pre-trained model, which is a DNN+HMM model, and determining the time boundaries of the phoneme states; third, after the time boundaries are determined, extracting the posterior probabilities of all frames within each time boundary, averaging them over the number of speech frames, and taking the average as the posterior probability of the phoneme state, thereby obtaining a word posterior score based on the phoneme state posteriors, where the word posterior score is the average of the posterior scores of the phoneme states the word contains.

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and relates to a method for evaluating spoken pronunciation based on a deep neural network algorithm.

Background

[0002] Globalization across language regions has increased the demand for foreign language proficiency, and computer-assisted language learning is very helpful for learners of English as a second language. Computer-aided pronunciation training, which aims to assess a student's speech proficiency and to detect or identify pronunciation errors or deficiencies with high accuracy, remains a challenging area of research.

[0003] The traditional spoken assessment scheme aims to give a phoneme-based score. Calculating this score assumes that a GMM+HMM model can reliably determine, for given acoustic segments, the likelihood of the phonemes corresponding to those segments, and then meas...

Application Information

IPC(8): G10L15/00, G10L15/06, G10L15/14, G10L25/69
Inventor: 惠寅华, 王欢良, 杨嵩, 代大明, 袁军峰, 林远东
Owner SUZHOU CHIVOX INFORMATION TECH CO LTD