Modeling approach and modeling system of acoustic model used in speech recognition

An acoustic model and modeling method, applied in speech recognition, speech analysis, instruments, etc. It addresses problems such as low modeling accuracy and poor speech recognition performance, and improves modeling accuracy.

Active Publication Date: 2013-05-22

INST OF ACOUSTICS CHINESE ACAD OF SCI +1

Cites: 6; Cited by: 69

AI Technical Summary
Problems solved by technology

The traditional acoustic model has low modeling accuracy, which results in poor speech recognition performance.

Embodiment Construction

[0016] The technical solutions of the embodiments of the present invention will be described in further detail below with reference to the drawings and embodiments.

[0017] Because the Gaussian mixture model requires inappropriate assumptions about speech features and their probability distributions, the embodiment of the present invention uses a context-dependent deep neural network instead of the Gaussian mixture model for acoustic modeling. The deep neural network includes a plurality of hidden layers, and its modeling unit is a context-dependent triphone state clustered by a phoneme decision tree. The basic block diagram of the whole system is shown in Figure 2.
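As a rough illustration of the network described in [0017], the following is a minimal sketch of a forward pass that maps one feature frame to a posterior distribution over clustered triphone states. The layer sizes, the sigmoid hidden units, the random initialization, and all function names are assumptions made for this example; the excerpt does not specify them.

```python
import math
import random


def sigmoid(x):
    """Logistic activation, assumed here for the hidden layers."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Numerically stable softmax over the output-layer scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, biases, inputs):
    """Compute W x + b, with weights stored as one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (placeholder initialization)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layers, features):
    """Run the hidden layers, then a softmax over tied triphone states."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = [sigmoid(z) for z in affine(weights, biases, activations)]
    weights, biases = layers[-1]
    return softmax(affine(weights, biases, activations))


rng = random.Random(0)
# Hypothetical sizes: 39-dim features, three hidden layers, 10 tied states.
sizes = [39, 16, 16, 16, 10]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
frame = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]
posteriors = forward(layers, frame)
```

Each entry of `posteriors` plays the role of the probability of one clustered triphone state given the frame. A real system would have far more output states and wider layers than this toy configuration.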

[0018] The minimum cross-entropy criterion is used as the objective function during deep neural network training. Because the network has multiple hidden layers, its error function has many local extrema, which makes it easy for the deep neural network to fall into a local extremum during the t...
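The cross-entropy criterion in [0018] can be sketched for a single frame as follows. This assumes a softmax output layer and a single frame-level target state; the function names are hypothetical and not taken from the patent.

```python
import math


def frame_cross_entropy(posteriors, target_state):
    """Cross-entropy for one frame: negative log posterior of the target state."""
    return -math.log(posteriors[target_state])


def output_layer_gradient(posteriors, target_state):
    """Gradient of the loss w.r.t. the pre-softmax scores: posterior minus one-hot."""
    return [p - (1.0 if k == target_state else 0.0)
            for k, p in enumerate(posteriors)]


# Toy posterior over three states, with state 0 as the aligned target.
posteriors = [0.5, 0.25, 0.25]
loss = frame_cross_entropy(posteriors, 0)
grad = output_layer_gradient(posteriors, 0)
```

The gradient at the output layer is the quantity that error back-propagation would pass down through the hidden layers.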

Abstract

The invention relates to a modeling method and a modeling system for an acoustic model used in speech recognition. The modeling method includes the steps of: S1, training an initial model, wherein the modeling unit is a triphone state clustered by a phoneme decision tree and the state transition probabilities are provided by the model; S2, obtaining frame-level state information by using the initial model to force-align the speech features of the training data to triphone states; S3, pre-training a deep neural network to obtain initial weights for each hidden layer; S4, training the initialized network with the error back-propagation algorithm, based on the obtained frame-level state information, and updating the weights. The method uses context-dependent triphone states as the modeling unit, builds the model on a deep neural network, initializes the weights of each hidden layer with the restricted Boltzmann machine algorithm, and subsequently updates the weights with the error back-propagation algorithm. This effectively mitigates the risk of the network becoming trapped in a local extremum, and greatly improves the modeling accuracy of the acoustic model.
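Step S3 initializes hidden-layer weights with a restricted Boltzmann machine. The abstract does not say how the RBM is trained, so the sketch below assumes one step of contrastive divergence (CD-1) on a binary-visible, binary-hidden RBM. The toy data, learning rate, layer sizes, and function names are all illustrative assumptions; real speech features are real-valued and would typically need a different visible-unit model for the first layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(W, c, v):
    """P(h_j = 1 | v), with W[j][i] linking hidden unit j to visible unit i."""
    return [sigmoid(sum(W[j][i] * v[i] for i in range(len(v))) + c[j])
            for j in range(len(c))]


def visible_probs(W, b, h):
    """P(v_i = 1 | h)."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + b[i])
            for i in range(len(b))]


def sample(probs, rng):
    """Draw binary states from Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def cd1_update(W, b, c, v0, rng, lr=0.1):
    """One CD-1 step (assumed training rule); returns the reconstruction error."""
    h0 = hidden_probs(W, c, v0)                     # positive phase
    v1 = visible_probs(W, b, sample(h0, rng))       # reconstruction
    h1 = hidden_probs(W, c, v1)                     # negative phase
    for j in range(len(c)):
        for i in range(len(b)):
            W[j][i] += lr * (h0[j] * v0[i] - h1[j] * v1[i])
        c[j] += lr * (h0[j] - h1[j])
    for i in range(len(b)):
        b[i] += lr * (v0[i] - v1[i])
    return sum((a - r) ** 2 for a, r in zip(v0, v1))


rng = random.Random(0)
n_visible, n_hidden = 4, 3  # hypothetical toy sizes
W = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)] for _ in range(n_hidden)]
initial_W = [row[:] for row in W]
b = [0.0] * n_visible
c = [0.0] * n_hidden
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # toy binary patterns

errors = []
for epoch in range(200):
    errors.append(sum(cd1_update(W, b, c, v, rng) for v in data))
```

After pre-training, `W` and `c` would serve as the initial weights and biases of one hidden layer, and the hidden activations would become the training data for the next RBM in the stack. Step S4 would then fine-tune the whole network with back-propagation against the frame-level state labels from S2.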

Description

Technical field

[0001] The invention relates to the field of speech recognition, in particular to a modeling method and modeling system of an acoustic model for speech recognition.

Background

[0002] The current mainstream framework for speech recognition is based on statistical pattern recognition. A typical speech recognition system framework is shown in Figure 1. It includes a speech acquisition and front-end processing module, a feature extraction module, an acoustic model module, a language model module, and a decoder module. The basic process of speech recognition is as follows: the speech acquisition device captures human speech, and features are extracted after front-end processing. The extracted feature sequence, such as MFCC or PLP, is passed through the acoustic model to obtain its observation probability, which is sent to the decoder together with the language model probability to obtain the most likely text sequence. The modeling of the acoustic m...
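The decoder's combination of acoustic and language model probabilities, described in [0002], can be sketched as a log-domain score over candidate text sequences. The candidate list, the scores, and the `lm_weight` scaling factor are assumptions for illustration; a real decoder searches a very large hypothesis space rather than a fixed list.

```python
def best_hypothesis(scores, lm_weight=1.0):
    """Pick the text whose combined log score is highest.

    `scores` maps each candidate text to a pair
    (acoustic log-probability, language-model log-probability).
    `lm_weight` is a hypothetical scaling factor for the language model.
    """
    def total(item):
        _, (acoustic_logp, lm_logp) = item
        return acoustic_logp + lm_weight * lm_logp

    return max(scores.items(), key=total)[0]


# Made-up scores for two acoustically similar candidates.
candidates = {
    "recognize speech": (-120.0, -4.0),
    "wreck a nice beach": (-118.0, -9.0),
}
with_lm = best_hypothesis(candidates, lm_weight=1.0)
without_lm = best_hypothesis(candidates, lm_weight=0.0)
```

In this toy case the language model overturns the slightly better acoustic score of the second candidate, which is the role the text assigns to combining the two probabilities.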

Application Information

IPC(8): G10L15/14, G10L15/06

Inventor 颜永红, 肖业鸣, 潘接林

Owner INST OF ACOUSTICS CHINESE ACAD OF SCI
