Mispronunciation detection method and device based on deep learning
A deep-learning-based detection technology, applied in speech analysis, instruments, etc.; it addresses the problem of poor detection performance on scarce phonemes and achieves the effect of avoiding degraded detection performance.
Examples
Embodiment 1
[0055] As shown in Figure 1, a mispronunciation detection method based on deep learning includes:
[0056] Step 1) extracting acoustic features from the read-aloud audio, and constructing a phoneme-level decoding network from the read-aloud text and the corresponding word pronunciation dictionary;
[0057] Step 2) decoding over the phoneme-level decoding network, in combination with the acoustic features and a pre-trained acoustic model, to determine the boundaries of the phonemes to be detected;
[0058] extracting phoneme-level features, based on a deep autoencoder built from a deep neural network, according to each phoneme's boundary and the acoustic features within that boundary, where the deep neural network is a deep Bayesian belief network;
[0059] The phoneme-level features of the phoneme to be detected are fed to a pre-trained correct-pronunciation classifier to judge whether the detected phoneme is pronounced correctly.
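The final judgment step in paragraph [0059] can be sketched as a binary scoring of the phoneme-level feature vector. The logistic form of the classifier, the toy weights, and the 0.5 decision threshold below are illustrative assumptions, not details taken from the patent:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def judge_pronunciation(feature, weights, bias, threshold=0.5):
    """Stand-in for the pretrained correct-pronunciation classifier:
    a linear score over the phoneme-level feature followed by a sigmoid.
    Returns True if the phoneme is judged correctly pronounced."""
    score = sum(f * w for f, w in zip(feature, weights)) + bias
    return sigmoid(score) >= threshold

# Toy example with made-up weights (hypothetical values):
print(judge_pronunciation([0.2, 0.8, 0.5], [1.0, 2.0, -0.5], -1.0))
```

In practice the classifier's weights would come from the pre-training described elsewhere in the patent; only the thresholded score-to-decision shape is shown here.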
[0060] After the present invention takes the above sche...
Embodiment 2
[0064] The above embodiment is described in further detail below; the phoneme-level feature extraction specifically includes:
[0065] Using the causal relationships between nodes of the deep Bayesian network, the probability values of a group of nodes are computed and assembled into a vector, which serves as the phoneme-level feature.
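A minimal sketch of this node-probability computation, assuming (as is common for belief networks, though not stated in the patent) that each node's conditional probability given its parents takes a logistic form:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def node_probabilities(parents, weights, biases):
    """P(node = 1 | parent values) for each node in one layer of a belief
    network; `weights[j]` holds node j's incoming weights and `biases[j]`
    its bias. The resulting probability vector is what the patent uses
    as the phoneme-level feature."""
    return [sigmoid(sum(p * w for p, w in zip(parents, col)) + b)
            for col, b in zip(weights, biases)]

# With zero weights and bias, a node is maximally uncertain:
print(node_probabilities([1.0], [[0.0]], [0.0]))  # → [0.5]
```

The conditional probabilities' parameters would, per paragraph [0066], be estimated from large-scale statistics rather than hand-set as here.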
[0066] Preferably, the conditional probability values of the causal relationships between nodes in the deep Bayesian network are estimated from statistics over a large amount of data.
[0067] Preferably, step 2) further includes:
[0068] using a deep neural network as the classifier, so that all phonemes share the hidden layers of the deep neural network when the classifier is trained;
[0069] Here, the hidden layers are the layers of the deep neural network's multi-layer graph structure other than the input layer and the output layer.
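The shared-hidden-layer arrangement can be sketched as one hidden layer used by every phoneme, with a separate output unit per phoneme; the layer sizes, single hidden layer, and random initialization below are illustrative assumptions:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class SharedHiddenClassifier:
    """One hidden layer shared by all phonemes, one output unit per phoneme.
    Because gradient updates to the shared weights come from every phoneme's
    training examples, rare phonemes benefit from representations learned
    on frequent ones."""

    def __init__(self, n_in, n_hidden, n_phonemes, seed=0):
        rng = random.Random(seed)
        self.W_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                      for _ in range(n_phonemes)]

    def forward(self, x):
        # Shared hidden representation, then one score per phoneme.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
             for row in self.W_hidden]
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)))
                for row in self.W_out]
```

Training (omitted) would update `W_hidden` from all phonemes' data jointly, which is the mechanism the patent credits for avoiding poor performance on scarce phonemes.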
[0070] Preferably, in step 2), specifical...
Embodiment 3
[0077] As shown in Figure 2, in one embodiment the present invention adopts a deep autoencoder (DAE) method based on deep learning technology to extract more abstract and general features for representing phonemes.
[0078] At the same time, a deep neural network is used as the classifier, so that all phonemes share its hidden layers during classifier training, thereby avoiding the problem of poor detection performance on scarce phonemes.
[0079] Specifically, the boundaries of the phoneme sequence are determined by a forced-alignment operation against the given read-aloud text. Frame-level features are then output by the first three layers of the acoustic model, and statistical methods convert these frame-level features into phoneme-level features according to the phonemes' boundary information to represent each phoneme; the phoneme-level feature vector is then reduced to a lower dimension through DA...
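The frame-to-phoneme conversion step can be sketched as pooling the frame-level vectors inside each phoneme's aligned boundary. The patent only says "statistical methods"; the mean used here is one plausible such statistic, chosen for illustration:

```python
def pool_phoneme_features(frame_feats, boundaries):
    """Average the frame-level feature vectors inside each phoneme's
    [start, end) frame boundary (from forced alignment) to obtain one
    fixed-size phoneme-level vector per phoneme."""
    pooled = []
    for start, end in boundaries:
        frames = frame_feats[start:end]
        n = len(frames)
        pooled.append([sum(f[d] for f in frames) / n
                       for d in range(len(frames[0]))])
    return pooled

# Three 2-dim frames, two phonemes covering frames [0,2) and [2,3):
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(pool_phoneme_features(feats, [(0, 2), (2, 3)]))
# → [[2.0, 3.0], [5.0, 6.0]]
```

The pooled vectors would then be passed through the DAE's bottleneck layer (not shown) for the dimensionality reduction the paragraph describes.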