Modeling method and modeling system of acoustic model for speech recognition

An acoustic modeling technology, applied in speech recognition and speech analysis, that addresses the low modeling accuracy and resulting poor recognition performance of conventional acoustic models by improving modeling accuracy.

Active Publication Date: 2015-10-28
INST OF ACOUSTICS CHINESE ACAD OF SCI +1

AI Technical Summary

Problems solved by technology

The traditional acoustic model has low modeling accuracy, which results in poor speech recognition performance.

Embodiment Construction

[0016] The technical solutions of the embodiments of the present invention will be described in further detail below with reference to the drawings and embodiments.

[0017] Because the Gaussian mixture model requires inappropriate assumptions about speech features and their probability distributions, the embodiment of the present invention uses a context-dependent deep neural network instead of the Gaussian mixture model for acoustic modeling. The deep neural network includes a plurality of hidden layers, and its modeling unit is a context-dependent triphone state clustered by a phoneme decision tree. The basic block diagram of the whole system is shown in Figure 2.
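As a rough illustration of the network described in [0017], the sketch below runs one feature frame through several sigmoid hidden layers and a softmax output layer whose units stand for clustered triphone states. The layer sizes, the random initialization, and the names `init_layer` and `forward` are assumptions made for this example and are not taken from the patent.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(z):
    # Subtract the maximum for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def affine(weights, bias, x):
    # weights holds one row per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    # Hypothetical random initialization, used only to make the sketch runnable.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(layers, features):
    """Return a posterior distribution over triphone states for one frame."""
    h = features
    for weights, bias in layers[:-1]:
        h = [sigmoid(v) for v in affine(weights, bias, h)]
    weights, bias = layers[-1]
    return softmax(affine(weights, bias, h))

rng = random.Random(0)
# Toy sizes (assumed): 39-dim features, three hidden layers, 10 tied states.
sizes = [39, 16, 16, 16, 10]
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
frame = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]
posteriors = forward(layers, frame)
```

A real system would have far more output units, one per clustered triphone state produced by the decision tree.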

[0018] The minimum cross-entropy criterion is used as the objective function during deep neural network training. Because the network has multiple hidden layers, its error function has many local extrema, which makes it easy for the deep neural network to fall into a local extremum during the t...
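The cross-entropy objective named in [0018] can be sketched for frame-level state labels as follows. This is a minimal example assuming one-hot targets, that is, one aligned state index per frame; the function names are hypothetical.

```python
import math

def cross_entropy(posteriors, target_state):
    # With a one-hot target, cross-entropy reduces to the negative
    # log-probability assigned to the aligned state.
    return -math.log(posteriors[target_state])

def output_gradient(posteriors, target_state):
    # Gradient of the loss with respect to the softmax inputs:
    # posterior minus one-hot target. Back-propagation starts from this vector.
    return [p - (1.0 if k == target_state else 0.0)
            for k, p in enumerate(posteriors)]

def mean_cross_entropy(batch_posteriors, targets):
    losses = [cross_entropy(p, t) for p, t in zip(batch_posteriors, targets)]
    return sum(losses) / len(losses)

# Two toy frames over three states, with aligned states 0 and 1.
batch = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]
loss = mean_cross_entropy(batch, targets)
grad = output_gradient(batch[0], targets[0])
```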

Abstract

The invention relates to a modeling method and a modeling system of an acoustic model for speech recognition. The modeling method includes the steps of: S1, training an initial model, wherein the modeling unit is a triphone state clustered by a phoneme decision tree and the state transition probabilities are provided by the model; S2, obtaining frame-level state information by using the initial model to force-align the speech features of the training data to triphone states; S3, pre-training a deep neural network to obtain initial weights for each hidden layer; and S4, training the initialized network with the error back-propagation algorithm based on the obtained frame-level state information and updating the weights. The method uses context-dependent triphone states as the modeling unit, builds the model on a deep neural network, initializes the weights of each hidden layer with the restricted Boltzmann machine (RBM) algorithm, and subsequently updates the weights by error back-propagation. This effectively reduces the risk of the network falling into a local extremum during training and greatly improves the modeling accuracy of the acoustic model.
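Step S3 can be illustrated with a small sketch of greedy layer-wise pre-training using restricted Boltzmann machines. The abstract does not say how the RBMs are trained, so the one-step contrastive divergence (CD-1) update, the Bernoulli units, the learning rate, and the names `RBM` and `pretrain_stack` are all assumptions of this example rather than the patent's procedure.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM (a simplification; real-valued speech
    features would usually need Gaussian visible units in the first layer)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        # Positive phase on the data, negative phase on a one-step reconstruction.
        h0_prob = self.hidden_probs(v0)
        h0 = self.sample(h0_prob)
        v1 = self.visible_probs(h0)
        h1_prob = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0_prob)):
                self.w[i][j] += lr * (v0[i] * h0_prob[j] - v1[i] * h1_prob[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0_prob)):
            self.b_hid[j] += lr * (h0_prob[j] - h1_prob[j])

def pretrain_stack(data, hidden_sizes, epochs, rng):
    # Train one RBM per hidden layer; each layer's hidden probabilities
    # become the training data for the next layer.
    rbms = []
    layer_input = data
    for n_hidden in hidden_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms

rng = random.Random(0)
data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]] * 5  # toy binary patterns
stack = pretrain_stack(data, hidden_sizes=[3, 2], epochs=20, rng=rng)
```

In this sketch, the weights and hidden biases of each trained RBM would serve as the initial parameters of the corresponding hidden layer before the back-propagation training of step S4.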

Description

Technical field

[0001] The invention relates to the field of speech recognition, in particular to a modeling method and modeling system of an acoustic model for speech recognition.

Background technique

[0002] The current mainstream framework for speech recognition is based on statistical pattern recognition. A typical speech recognition system framework is shown in Figure 1; it includes a speech acquisition and front-end processing module, a feature extraction module, an acoustic model module, a language model module, and a decoder module. The basic process of speech recognition is as follows: the speech acquisition device collects human speech, and features are extracted after front-end processing. The observation probability of the extracted feature sequence, such as MFCC or PLP, is obtained through the acoustic model and sent to the decoder together with the language model probability to obtain the most likely text sequence. The modeling of the acoustic m...
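The way the decoder in [0002] combines the acoustic model and language model probabilities is conventionally written as the decision rule below. The symbols are assumptions introduced here for illustration: $O$ is the extracted feature sequence, $W$ a candidate text sequence, $P(O \mid W)$ the acoustic model score, and $P(W)$ the language model score.

```latex
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\, P(W)
```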

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/14, G10L15/06
Inventor: 颜永红, 肖业鸣, 潘接林
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI