Spoken language pronunciation detecting and evaluating method based on deep neural network posterior probability algorithm

Deep neural network and posterior probability technology, applied to oral pronunciation evaluation, that addresses problems such as time-consuming decoding.

Active Publication Date: 2015-04-29

SUZHOU CHIVOX INFORMATION TECH CO LTD

15 Cites, 48 Cited by


AI Technical Summary
This helps you quickly interpret patents by identifying the three key elements:
Problems solved by technology
Method used
Benefits of technology

Problems solved by technology

[0006] Because REC is a Viterbi decoding process over an unconstrained phoneme sequence, its decoding network is larger and decoding is more time-consuming than with the FA phoneme decoding network.



Examples


Embodiment 1

[0041] The deep neural network (DNN) algorithm has been a hot topic in machine learning, in both industry and academia, in recent years, and it has raised recognition rates significantly over earlier methods. Most current speech recognition systems use hidden Markov models (HMMs) to handle the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM matches the acoustic observations. An alternative way to evaluate this degree of match is a feedforward neural network (NN); a DNN is a neural network with more hidden layers. DNN methods have been shown to outperform Gaussian mixture models on a variety of speech recognition benchmarks, with large gains in performance.

[0042] From the traditional oral pronunciation evaluation method, we can see that to improve the quality of the oral evaluation algorithm, we need a high-quality acoustic mo...

Embodiment 2

[0056] According to this embodiment, the specific solutions of the above embodiments will be described in more detail.

[0057] First, a sequence of feature vectors is extracted from the speech frame by frame.
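The frame-by-frame extraction in [0057] can be illustrated with a minimal sketch. The 25 ms window, the 10 ms hop, and the function names `split_into_frames` and `log_energy` are assumptions made for illustration and do not come from the patent text. The single log-energy value per frame is only a stand-in for a real feature vector such as PLP or MFCC.

```python
import math


def split_into_frames(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Cut a waveform into overlapping fixed-length frames.

    frame_ms and hop_ms are assumed values, not taken from the patent.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def log_energy(frame):
    """Toy one-dimensional feature: log of the frame energy.

    A real front end would compute a PLP or MFCC vector here instead.
    """
    energy = sum(x * x for x in frame)
    return math.log(energy + 1e-10)  # small floor avoids log(0) on silence


# One second of a 440 Hz tone sampled at 16 kHz (synthetic test signal).
sample_rate = 16000
signal = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate)
          for n in range(sample_rate)]
frames = split_into_frames(signal, sample_rate)
features = [[log_energy(f)] for f in frames]  # one feature vector per frame
```

Each entry of `features` corresponds to one frame, which is the unit over which the later steps align states and average posteriors.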

[0058] Common speech features include Perceptual Linear Prediction (PLP) and Mel-Frequency Cepstral Coefficient (MFCC) features. Then, given the trained DNN+HMM acoustic model, the oral evaluation text, and the corresponding word pronunciation dictionary, the time boundaries of the phoneme states are determined with the Viterbi algorithm.
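As a rough illustration of how the Viterbi algorithm can place state boundaries when the expected state sequence is already known from the text and the dictionary, the sketch below aligns frames to a left-to-right state sequence. It is a simplification under stated assumptions: it scores frames with log posteriors only, ignores HMM transition probabilities and state priors, and requires every state to cover at least one frame. The function name `forced_align` and the toy posteriors are hypothetical.

```python
import math


def forced_align(log_post, state_seq):
    """Align T frames to a known left-to-right sequence of state ids.

    log_post[t][s] is the log posterior of DNN output s at frame t.
    Returns a list of (state_id, start_frame, end_frame) with end exclusive.
    """
    n_frames, n_states = len(log_post), len(state_seq)
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_post[0][state_seq[0]]
    for t in range(1, n_frames):
        for j in range(min(t + 1, n_states)):
            stay = score[t - 1][j]                        # remain in state j
            move = score[t - 1][j - 1] if j > 0 else neg_inf  # advance to j
            if move > stay:
                best, back[t][j] = move, j - 1
            else:
                best, back[t][j] = stay, j
            if best > neg_inf:
                score[t][j] = best + log_post[t][state_seq[j]]
    # Trace back from the last state at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    # Collapse the per-frame path into contiguous segments.
    segments = []
    start = 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[t - 1]:
            segments.append((state_seq[path[start]], start, t))
            start = t
    return segments


# Toy posteriors over two DNN outputs for four frames (made-up numbers).
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
log_post = [[math.log(p) for p in frame] for frame in posteriors]
segments = forced_align(log_post, [0, 1])
```

In this toy case the boundary falls between frames 1 and 2, because that split maximizes the summed log posterior along the path.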

[0059] Once the time boundaries are determined, the DNN posterior probabilities of all frames within each time boundary are extracted and averaged over the number of frames, and the average is taken as the posterior probability of the phoneme state. This gives the following word posterior score calculation based on the phoneme state posteriors:

[0060] P ( word ) = ...
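The formula above is truncated in this text, so the following sketch relies only on the prose in [0059] and on the abstract, which describes the word posterior score as the average of the posterior scores of the phoneme states the word contains. Whether the original formula applies weighting or works in the log domain is not recoverable here, and the names `state_posterior` and `word_score` are hypothetical.

```python
def state_posterior(posteriors, state_id, start, end):
    """Mean posterior of one state over its aligned frames [start, end)."""
    frames = posteriors[start:end]
    return sum(frame[state_id] for frame in frames) / len(frames)


def word_score(posteriors, segments):
    """Average the state posteriors of all states that belong to one word.

    segments is a list of (state_id, start_frame, end_frame) tuples,
    for example the output of a forced alignment.
    """
    scores = [state_posterior(posteriors, s, a, b) for s, a, b in segments]
    return sum(scores) / len(scores)


# Toy example: four frames, two DNN outputs, a word made of two states.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
word_segments = [(0, 0, 2), (1, 2, 4)]
score = word_score(posteriors, word_segments)
```

Averaging per state first, and only then across states, keeps a long state from dominating the word score simply because it spans more frames.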

Embodiment 3

[0073] To sum up, our oral evaluation algorithm based on DNN posterior is as follows:

[0074] Step 1: Extract audio features.

[0075] Step 2: Input the audio features into the pre-trained DNN+HMM model and, given the text and the pronunciation dictionary, use the Viterbi algorithm to determine the phone boundaries of the sentence read by the speaker and the corresponding DNN posterior probabilities.

[0076] Step 3: Calculate the word-level score using formula (1).

[0077] Step 4: Calculate the sentence-level score using formula (2).

[0078] Step 5: Finally, the word-level and sentence-level posterior scores are mapped to the required score segments through a preset mapping function.
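Step 5 does not say what form the preset mapping function takes. Purely as an illustration, the sketch below assumes a piecewise-linear mapping from a posterior score in [0, 1] to a 0 to 100 scale. The breakpoints in `BREAKPOINTS` and the function name `map_score` are invented for the example and are not taken from the patent.

```python
from bisect import bisect_right

# Hypothetical (posterior, mapped score) breakpoints, sorted by posterior.
BREAKPOINTS = [(0.0, 0.0), (0.2, 40.0), (0.6, 80.0), (1.0, 100.0)]


def map_score(posterior, breakpoints=BREAKPOINTS):
    """Map a posterior score onto a reporting scale by linear interpolation.

    Values outside the breakpoint range are clamped to the end points.
    """
    xs = [x for x, _ in breakpoints]
    ys = [y for _, y in breakpoints]
    if posterior <= xs[0]:
        return ys[0]
    if posterior >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, posterior)  # index of first breakpoint above input
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (posterior - x0) / (x1 - x0)


word_points = map_score(0.4)      # a word-level posterior score
sentence_points = map_score(0.85)  # a sentence-level posterior score
```

The same function can be applied to word-level and sentence-level posterior scores, although a deployed system might use separate mappings for each level.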

[0079] In addition, in steps 3 and 4 above, the posterior probability of the phoneme state can be computed with the following preferred calculation scheme:

[0080] According to the centralphone posterior probability calculation scheme, the posterior probability of each phoneme state i...


Abstract

The invention discloses a method for detecting and evaluating spoken-language pronunciation based on a deep neural network posterior probability algorithm. The method comprises the following steps: first, extracting a sequence of audio feature vectors from the speech frame by frame; second, inputting the audio features into a pre-trained model, together with the spoken-language evaluation text and the corresponding word pronunciation dictionary, and determining the time boundaries of the phoneme states, the model being a DNN+HMM model; third, once the time boundaries are determined, extracting the posterior probabilities of all frames within each time boundary, averaging them over the number of frames, taking the average as the posterior probability of the phoneme state, and obtaining a word posterior score based on the phoneme state posteriors, the word posterior score being the average of the posterior scores of the phoneme states contained in the word.

Description

Technical field

[0001] The invention belongs to the field of language recognition and relates to a method for evaluating spoken pronunciation based on a deep neural network algorithm.

Background technique

[0002] Globalization has brought people from different language regions together and increased the demand for foreign-language proficiency, and computer-assisted language learning is very helpful for learners of English as a second language. Computer-aided pronunciation training, which aims to assess a student's speech proficiency and to detect or identify pronunciation errors or deficiencies with a high degree of accuracy, remains a challenging area of research.

[0003] The purpose of the traditional oral assessment scheme is to give a score based on the phoneme. When calculating this score, it is assumed that a GMM+HMM model can reliably determine, from given acoustic segments, the likelihood of the phonemes corresponding to those segments, and then meas...


Application Information


IPC(8): G10L15/00, G10L15/06, G10L15/14, G10L25/69

Inventor: 惠寅华, 王欢良, 杨嵩, 代大明, 袁军峰, 林远东

Owner: SUZHOU CHIVOX INFORMATION TECH CO LTD
