Pronunciation mistake detection method and device based on deep learning

A deep-learning-based detection technology, applied in speech analysis, instruments, etc., that addresses poor detection performance on some phonemes.

Active Publication Date: 2017-01-04
SUZHOU CHIVOX INFORMATION TECH CO LTD
Cites: 10; Cited by: 15

AI Technical Summary

Problems solved by technology

Detection performance on rare phonemes, which have few training samples, is worse than on phonemes with sufficient training samples.

Examples

Embodiment 1

[0055] As shown in Figure 1, a mispronunciation detection method based on deep learning includes:

[0056] Step 1) Extracting acoustic features from the read-aloud audio, and constructing a phoneme-level decoding network from the read-aloud text and the corresponding word pronunciation dictionary;

[0057] Step 2) Decoding the phoneme-level decoding network using the acoustic features and the pre-trained acoustic model to determine the boundaries of the phonemes to be detected;

[0058] Based on a deep autoencoder composed of a deep neural network, extracting phoneme-level features according to the phoneme boundaries and the acoustic features within those boundaries, where the deep neural network is a deep Bayesian belief network;

[0059] Feeding the phoneme-level features of the phoneme to be detected into the pre-trained pronunciation correctness classifier to judge whether the phoneme is pronounced correctly.
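As a rough illustration of the text-side half of step 1), the sketch below expands read-aloud text into the phoneme sequence that a decoding network would be built from. It is a minimal sketch under stated assumptions: the `LEXICON` entries, the phoneme symbols, and the function name `text_to_phonemes` are hypothetical, and a real decoding network would typically also model optional silence and alternative pronunciations, which this linear expansion omits.

```python
from typing import Dict, List

# Hypothetical pronunciation dictionary (word -> phoneme list).
# The entries and phoneme symbols are illustrative, not taken from the patent.
LEXICON: Dict[str, List[str]] = {
    "good": ["g", "uh", "d"],
    "morning": ["m", "ao", "r", "n", "ih", "ng"],
}


def text_to_phonemes(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Expand read-aloud text into the linear phoneme sequence to be aligned."""
    phonemes: List[str] = []
    for word in text.lower().split():
        if word not in lexicon:
            # A missing word cannot be expanded, so fail loudly.
            raise KeyError(f"word not in pronunciation dictionary: {word}")
        phonemes.extend(lexicon[word])
    return phonemes


if __name__ == "__main__":
    print(text_to_phonemes("Good morning", LEXICON))
```

The resulting sequence is what the acoustic model would be aligned against in step 2) to obtain phoneme boundaries.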

[0060] After the present invention takes the above sche...

Embodiment 2

[0064] The above embodiment is described in further detail below. The phoneme-level feature extraction specifically includes:

[0065] Through the causal relationships between the nodes of the deep Bayesian network, the probability values of a group of nodes are calculated to form a vector, which is used as the phoneme-level feature.

[0066] Preferably, the conditional probability values of the causal relationships between nodes in the deep Bayesian network are obtained from a large amount of statistics.

[0067] Preferably, step 2) further includes:

[0068] Using a deep neural network as the classifier, so that all phonemes share the hidden layers of the deep neural network when the classifier is trained;

[0069] Here, the hidden layers are the layers other than the input layer and the output layer in the multi-layer graph structure of the deep neural network.
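The hidden-layer sharing described above can be pictured with a small forward-pass sketch. This is only an illustration under assumptions not stated in the source: the layer sizes, the random untrained weights, the phoneme inventory, and the choice of one small output unit per phoneme on top of the shared hidden layers are all hypothetical.

```python
import math
import random

random.seed(0)

FEATURE_DIM = 4                # assumed phoneme-level feature size
HIDDEN_DIM = 6                 # assumed hidden-layer width
PHONEMES = ["ae", "th", "zh"]  # illustrative phoneme inventory


def rand_matrix(rows, cols):
    """Random weights; each row holds the input weights of one output unit."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(x, weights, bias):
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


# Hidden layers shared by every phoneme.
SHARED = [
    (rand_matrix(HIDDEN_DIM, FEATURE_DIM), [0.0] * HIDDEN_DIM),
    (rand_matrix(HIDDEN_DIM, HIDDEN_DIM), [0.0] * HIDDEN_DIM),
]

# Assumption: one output unit per phoneme, giving P(pronounced correctly).
HEADS = {p: (rand_matrix(1, HIDDEN_DIM), [0.0]) for p in PHONEMES}


def p_correct(feature, phoneme):
    """Run the shared hidden layers, then the output unit for this phoneme."""
    h = feature
    for weights, bias in SHARED:
        h = dense(h, weights, bias)
    weights, bias = HEADS[phoneme]
    return dense(h, weights, bias)[0]


if __name__ == "__main__":
    print(p_correct([0.5] * FEATURE_DIM, "zh"))
```

Because the weights in `SHARED` would be updated by training samples from every phoneme, a rare phoneme such as the illustrative "zh" only needs enough data to fit its own small output unit.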

[0070] Preferably, in step 2), specifical...

Embodiment 3

[0077] As shown in Figure 2, in one embodiment, the present invention adopts a deep autoencoder (DAE) method based on deep learning technology, and extracts more abstract and general features to represent phonemes.

[0078] At the same time, a deep neural network is used as the classifier, so that all phonemes share the hidden layers of the deep neural network when the classifier is trained, thereby avoiding poor detection performance on rare phonemes.

[0079] Specifically, the boundaries of the phoneme sequence are determined by forced alignment against the given read-aloud text. Frame-level features are then output by the first three layers of the acoustic model and converted, using statistical methods and the phoneme boundary information, into phoneme-level features that represent each phoneme. The phoneme-level feature vector is then reduced to a lower dimension through DA...
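The conversion from frame-level to phoneme-level features can be sketched as pooling over each aligned segment. The source only says "statistical methods", so the use of per-dimension mean and standard deviation here is an assumption, as are the function name `pool_phoneme_features` and the toy data; the subsequent dimensionality reduction is not shown.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def pool_phoneme_features(
    frames: Sequence[Sequence[float]],
    boundaries: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Pool frame-level features into one vector per phoneme.

    boundaries holds (start, end) frame indices, end exclusive, as would be
    produced by forced alignment. Each output vector is the per-dimension
    mean followed by the per-dimension population standard deviation
    (an assumed choice of statistics).
    """
    pooled: List[List[float]] = []
    for start, end in boundaries:
        segment = frames[start:end]
        if not segment:
            raise ValueError(f"empty phoneme segment: ({start}, {end})")
        dims = list(zip(*segment))  # transpose: one tuple per feature dimension
        pooled.append([mean(d) for d in dims] + [pstdev(d) for d in dims])
    return pooled


if __name__ == "__main__":
    # Toy data: four frames of 2-dimensional features, two phonemes.
    frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    boundaries = [(0, 2), (2, 4)]
    for vector in pool_phoneme_features(frames, boundaries):
        print(vector)
```

Pooling yields a fixed-length vector per phoneme regardless of how many frames the phoneme spans, which is what allows a single classifier to consume phonemes of different durations.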

Abstract

The present invention discloses a pronunciation mistake detection method and device based on deep learning. The method comprises: step 1) extracting acoustic features from the read-aloud audio, and constructing a phoneme-level decoding network from the read-aloud text and the corresponding word pronunciation dictionary; step 2) decoding the phoneme-level decoding network using the acoustic features and a pre-trained acoustic model to determine the boundaries of the phonemes to be detected; based on a deep autoencoder composed of a deep neural network, extracting phoneme-level features according to the phoneme boundaries and the acoustic features within those boundaries, where the deep neural network is a deep Bayesian belief network; and feeding the phoneme-level features of the phonemes to be detected into a pre-trained pronunciation correctness classifier to determine whether the phonemes to be detected are pronounced correctly.

Description

Technical field

[0001] The invention relates to a method and device for mispronunciation detection based on deep learning.

Background

[0002] Pronunciation errors in spoken English include phoneme errors and prosody errors.

[0003] Phoneme errors include: non-standard phoneme pronunciation, extra pronunciation (insertion error), missing pronunciation (deletion error), and mispronunciation as another sound (substitution error).

[0004] The detection scheme we propose mainly aims to find phonemes that are pronounced in a non-standard way or mispronounced as other sounds, collectively referred to as mispronunciation detection.

[0005] Traditional schemes are mainly divided into GOP (Goodness of Pronunciation) schemes based on likelihood difference and classification schemes based on extracted phoneme-level features.

[0006] GOP scheme based on likelihood difference: extract acoustic features from the read-aloud audio, construct a phoneme-level decoding network from the read-aloud text and co...

Application Information

IPC(8): G10L25/78
CPC: G10L25/78
Inventor: 惠寅华, 王欢良, 杨嵩, 黄正伟, 方敏, 袁军峰, 戚自力
Owner SUZHOU CHIVOX INFORMATION TECH CO LTD