Mispronunciation detection method and device based on deep learning
A deep-learning-based detection technology, applied in speech analysis, instruments, etc.; it addresses the problem of poor detection performance on scarce phonemes and achieves the effect of avoiding degraded detection performance.
Examples
Embodiment 1
[0055] As shown in Figure 1, a mispronunciation detection method based on deep learning includes:
[0056] Step 1) extracting acoustic features from the read-aloud audio, and constructing a phoneme-level decoding network from the read-aloud text and the corresponding word pronunciation dictionary;
[0057] Step 2) decoding over the phoneme-level decoding network, in combination with the acoustic features and a pre-trained acoustic model, to determine the boundaries of the phonemes to be detected;
[0058] extracting phoneme-level features, based on a deep autoencoder built from a deep neural network, according to each phoneme's boundary and the acoustic features within that boundary, where the deep neural network is a deep Bayesian belief network;
[0059] The phoneme-level features of the phoneme to be detected are fed to a pre-trained correct-pronunciation classifier to judge whether the detected phoneme is pronounced correctly.
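The final judgment step in paragraph [0059] can be sketched as a binary scoring of the phoneme-level feature vector. The logistic form of the classifier, the toy weights, and the 0.5 decision threshold below are illustrative assumptions, not details taken from the patent:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def judge_pronunciation(feature, weights, bias, threshold=0.5):
    """Stand-in for the pretrained correct-pronunciation classifier:
    a linear score over the phoneme-level feature followed by a sigmoid.
    Returns True if the phoneme is judged correctly pronounced."""
    score = sum(f * w for f, w in zip(feature, weights)) + bias
    return sigmoid(score) >= threshold

# Toy example with made-up weights (hypothetical values):
print(judge_pronunciation([0.2, 0.8, 0.5], [1.0, 2.0, -0.5], -1.0))
```

In practice the classifier's weights would come from the pre-training described elsewhere in the patent; only the thresholded score-to-decision shape is shown here.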
[0060] After the present invention takes the above sche...
Embodiment 2
[0064] The above embodiment is described in further detail below; the phoneme-level feature extraction specifically includes:
[0065] Using the causal relationships between nodes of the deep Bayesian network, the probability values of a group of nodes are computed and assembled into a vector, which serves as the phoneme-level feature.
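A minimal sketch of this node-probability computation, assuming (as is common for belief networks, though not stated in the patent) that each node's conditional probability given its parents takes a logistic form:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def node_probabilities(parents, weights, biases):
    """P(node = 1 | parent values) for each node in one layer of a belief
    network; `weights[j]` holds node j's incoming weights and `biases[j]`
    its bias. The resulting probability vector is what the patent uses
    as the phoneme-level feature."""
    return [sigmoid(sum(p * w for p, w in zip(parents, col)) + b)
            for col, b in zip(weights, biases)]

# With zero weights and bias, a node is maximally uncertain:
print(node_probabilities([1.0], [[0.0]], [0.0]))  # → [0.5]
```

The conditional probabilities' parameters would, per paragraph [0066], be estimated from large-scale statistics rather than hand-set as here.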
[0066] Preferably, the conditional probability values of the causal relationships between nodes in the deep Bayesian network are estimated from statistics over a large amount of data.
[0067] Preferably, step 2) further includes:
[0068] using a deep neural network as the classifier, so that all phonemes share the hidden layers of the deep neural network when the classifier is trained;
[0069] Here, the hidden layers are the layers of the deep neural network's multi-layer graph structure other than the input layer and the output layer.
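The shared-hidden-layer arrangement can be sketched as one hidden layer used by every phoneme, with a separate output unit per phoneme; the layer sizes, single hidden layer, and random initialization below are illustrative assumptions:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class SharedHiddenClassifier:
    """One hidden layer shared by all phonemes, one output unit per phoneme.
    Because gradient updates to the shared weights come from every phoneme's
    training examples, rare phonemes benefit from representations learned
    on frequent ones."""

    def __init__(self, n_in, n_hidden, n_phonemes, seed=0):
        rng = random.Random(seed)
        self.W_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for _ in range(n_phonemes)]

    def forward(self, x):
        # Shared hidden representation, then one score per phoneme.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
             for row in self.W_hidden]
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)))
                for row in self.W_out]
```

Training (omitted) would update `W_hidden` from all phonemes' data jointly, which is the mechanism the patent credits for avoiding poor performance on scarce phonemes.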
[0070] Preferably, in step 2), specifical...
Embodiment 3
[0077] As shown in Figure 2, in one embodiment the present invention adopts a deep autoencoder (DAE) method based on deep learning technology to extract more abstract and general features for representing phonemes.
[0078] At the same time, a deep neural network is used as the classifier, so that all phonemes share its hidden layers during classifier training, thereby avoiding the problem of poor detection performance on scarce phonemes.
[0079] Specifically, the boundaries of the phoneme sequence are determined by a forced-alignment operation against the given read-aloud text. Frame-level features are then output by the first three layers of the acoustic model, and statistical methods convert these frame-level features into phoneme-level features according to the phonemes' boundary information to represent each phoneme; the phoneme-level feature vector is then reduced to a lower dimension through DA...
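The frame-to-phoneme conversion step can be sketched as pooling the frame-level vectors inside each phoneme's aligned boundary. The patent only says "statistical methods"; the mean used here is one plausible such statistic, chosen for illustration:

```python
def pool_phoneme_features(frame_feats, boundaries):
    """Average the frame-level feature vectors inside each phoneme's
    [start, end) frame boundary (from forced alignment) to obtain one
    fixed-size phoneme-level vector per phoneme."""
    pooled = []
    for start, end in boundaries:
        frames = frame_feats[start:end]
        n = len(frames)
        pooled.append([sum(f[d] for f in frames) / n
                       for d in range(len(frames[0]))])
    return pooled

# Three 2-dim frames, two phonemes covering frames [0,2) and [2,3):
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(pool_phoneme_features(feats, [(0, 2), (2, 3)]))
# → [[2.0, 3.0], [5.0, 6.0]]
```

The pooled vectors would then be passed through the DAE's bottleneck layer (not shown) for the dimensionality reduction the paragraph describes.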