Automatic speech recognition method based on random depth delay neural network model

An automatic speech recognition and neural network model technology, applied in speech recognition, speech analysis, instruments, etc. It addresses problems such as limited neural network learning ability, growth in model parameters and vanishing gradients; by solving overfitting and vanishing gradients, it enhances the modeling ability of the model.

Active Publication Date: 2018-12-21
SOUTH CHINA UNIV OF TECH
Cites: 9; Cited by: 15

AI Technical Summary

Problems solved by technology

Traditional acoustic modeling uses a Gaussian mixture model (GMM) to model each phoneme state, but this model has several disadvantages. First, GMM offers no advantage for nonlinear modeling and needs many parameters to achieve good results on complex signals such as speech. Second, GMM is sensitive to the input feature dimension, and growth in the input dimension brings geometric...
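To make the dimensionality argument concrete, here is a small illustrative sketch (not taken from the patent) that counts the free parameters of one K-component GMM over D-dimensional features. The function name `gmm_param_count` and the example sizes are assumptions chosen for illustration.

```python
def gmm_param_count(num_components: int, dim: int, full_covariance: bool = True) -> int:
    """Free parameters of a single K-component GMM over dim-dimensional features."""
    means = num_components * dim
    if full_covariance:
        # A symmetric D x D covariance has D(D+1)/2 free entries per component.
        covariances = num_components * dim * (dim + 1) // 2
    else:
        # A diagonal covariance has only D variances per component.
        covariances = num_components * dim
    weights = num_components - 1  # mixture weights are constrained to sum to 1
    return means + covariances + weights


if __name__ == "__main__":
    # Hypothetical feature sizes; print diagonal vs. full-covariance counts.
    for d in (13, 39, 120):
        print(d, gmm_param_count(16, d, full_covariance=False), gmm_param_count(16, d))
```

With full covariance the count is quadratic in D, and a GMM-HMM system multiplies it by the number of modeled states. Diagonal covariances keep the count linear in D at the cost of ignoring correlations between feature dimensions.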

Examples

  • Experimental program
  • Comparison scheme
  • Effect test

Embodiment Construction

[0036] The technical solutions of the present invention will be further described below in conjunction with the drawings and embodiments.

[0037] The automatic speech recognition method based on the stochastic-depth time-delay neural network (TDNN-SD) model fully considers the respective advantages of stochastic depth and TDNN, and embeds stochastic depth into TDNN. As a model for long-term dependencies, TDNN offers higher computational efficiency and shorter training time than recurrent neural networks. Embedding stochastic depth into TDNN means that, in the original TDNN, stochastic depth is introduced for every TDNN layer that splices preceding and following frames. This enhances the modeling ability and robustness of the network and addresses overfitting and vanishing gradients during training, thereby improving speech recognition accuracy.
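The embedding described in [0037] can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the patent's implementation: each TDNN layer splices frames at fixed offsets (here `(-1, 0, 1)`, an assumption), applies an affine transform with ReLU, and is wrapped in a residual block whose TDNN branch is randomly dropped during training, following the standard stochastic-depth formulation. The class `StochasticDepthTDNNLayer`, the edge clamping in `splice`, and the default survival probability of 0.8 are hypothetical choices.

```python
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]  # rows are frames, columns are feature dimensions


def splice(frames: Matrix, context: Sequence[int]) -> Matrix:
    """Concatenate the frames at each offset in `context` around every time step.

    Offsets that fall outside the utterance are clamped to the first/last frame
    (an assumed edge policy).
    """
    num_frames = len(frames)
    spliced: Matrix = []
    for t in range(num_frames):
        row: List[float] = []
        for offset in context:
            row.extend(frames[min(max(t + offset, 0), num_frames - 1)])
        spliced.append(row)
    return spliced


def affine_relu(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Per-frame ReLU(W @ frame + b); `weight` has shape (out_dim, in_dim)."""
    return [
        [max(0.0, sum(w * v for w, v in zip(w_row, frame)) + b)
         for w_row, b in zip(weight, bias)]
        for frame in x
    ]


class StochasticDepthTDNNLayer:
    """Hypothetical TDNN layer inside a stochastic-depth residual block.

    Training: the TDNN branch survives with probability `survival_prob`;
    otherwise the layer is the identity. Inference: the branch is always
    applied but scaled by `survival_prob`.
    """

    def __init__(self, weight: Matrix, bias: List[float], context: Sequence[int],
                 survival_prob: float = 0.8, rng: Optional[random.Random] = None):
        self.weight = weight
        self.bias = bias
        self.context = context
        self.survival_prob = survival_prob
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, x: Matrix, training: bool) -> Matrix:
        if training and self.rng.random() >= self.survival_prob:
            return [row[:] for row in x]  # branch dropped: pass the input through
        branch = affine_relu(splice(x, self.context), self.weight, self.bias)
        scale = 1.0 if training else self.survival_prob
        # Residual connection: input plus (possibly scaled) TDNN branch.
        return [[a + scale * b for a, b in zip(x_row, b_row)]
                for x_row, b_row in zip(x, branch)]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 frames, 2-dim features
    weight = [[0.1] * 6, [0.2] * 6]  # 2 outputs x (2 dims * 3 offsets)
    layer = StochasticDepthTDNNLayer(weight, [0.0, 0.0], context=(-1, 0, 1),
                                     survival_prob=0.5, rng=random.Random(0))
    print(layer(frames, training=True))
    print(layer(frames, training=False))
```

The residual connection requires the layer's output dimension to equal its input dimension, so `weight` needs `dim` rows and `dim * len(context)` columns. Dropping whole layers shortens the effective depth during training, which is what stochastic depth relies on to ease gradient flow and act as a regularizer; scaling by the survival probability at inference keeps the expected output consistent between the two modes.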

[0038] A typical speech recognition system consists of feature extraction, acoustic model, lang...

Abstract

The invention, belonging to the field of automatic speech recognition technology, relates to an automatic speech recognition method based on a random-depth (stochastic depth) time-delay neural network model. The method comprises: preparing training data; extracting acoustic features from the training speech audio data; training a traditional GMM-HMM model and using the trained GMM-HMM model to perform forced alignment on the training speech audio data, obtaining the corresponding frame-level training labels; using the training speech audio data and the corresponding frame-level training labels to perform supervised training of a stochastic-depth-based time-delay neural network model, and obtaining an acoustic model by combining it with a hidden Markov model; training on the corresponding text annotation data or on texts from other data sets to obtain a trained language model; and constructing an automatic speech recognition decoder from the trained language model and acoustic model. The modeling ability of the model is thereby strengthened, and the problems of overfitting and vanishing gradients during training are solved, so that speech recognition accuracy is improved.
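The forced-alignment step in the abstract turns each training utterance into frame-level labels. Below is a minimal sketch of Viterbi forced alignment, assuming the per-frame state log-likelihoods from the trained GMM-HMM are already computed, the transcript has already been expanded into a left-to-right state sequence, and transition probabilities are uniform (so they drop out of the maximization). The function `forced_align` is hypothetical; real toolkits also model transition probabilities, optional silence and search pruning.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def forced_align(log_lik: Sequence[Sequence[float]], state_seq: Sequence[int]) -> List[int]:
    """Viterbi forced alignment through a left-to-right state sequence.

    log_lik[t][s]: log-likelihood of frame t under HMM state s (e.g. from a GMM).
    state_seq: HMM states the transcript expands to, in order.
    At each frame the path either stays in its current state or advances by one;
    transitions are treated as uniform (a simplifying assumption).
    Returns one state label per frame.
    """
    num_frames, num_states = len(log_lik), len(state_seq)
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_lik[0][state_seq[0]]  # the path must start in the first state
    for t in range(1, num_frames):
        for j in range(num_states):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else NEG_INF
            best = max(stay, advance)
            if best > NEG_INF:
                score[t][j] = best + log_lik[t][state_seq[j]]
                back[t][j] = j if stay >= advance else j - 1
    # Backtrace from the last state at the last frame.
    j, path = num_states - 1, []
    for t in range(num_frames - 1, -1, -1):
        path.append(state_seq[j])
        j = back[t][j]
    return path[::-1]
```

The returned list gives one state label per frame, which is the kind of frame-level target the abstract uses to supervise the stochastic-depth TDNN.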

Description

Technical field

[0001] The invention belongs to the technical field of automatic speech recognition and relates to an automatic speech recognition method based on a stochastic-depth time-delay neural network model.

Background art

[0002] With the continuous development of deep learning technology, automatic speech recognition is used in an ever wider range of practical applications, such as Apple Siri and Amazon Alexa, and it continues to permeate people's work, study and life. There is therefore an increasing demand for models that are more robust and have stronger modeling ability.

[0003] The main task of automatic speech recognition is to achieve a recognition rate comparable to that of humans while effectively coping with different environmental factors (such as speakers and vocal channels). […] features, the corresponding text is obtained by decoding with the acoustic model and the language model. Traditional acoustic modeling uses Gaussian mix...
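The decoding step in [0003] follows the standard statistical formulation of speech recognition, stated here for reference rather than quoted from the patent: given the acoustic feature sequence O, the decoder searches for the word sequence that maximizes the product of the acoustic-model likelihood and the language-model prior.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

P(O) does not depend on W, so it drops out of the maximization. In hybrid neural-network/HMM acoustic models, the network's state posteriors P(s | o_t) are commonly divided by the state priors P(s) to obtain scaled likelihoods p(o_t | s) ∝ P(s | o_t) / P(s) for use in P(O | W); the visible text does not say whether this patent does the same.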

Application Information

IPC(8): G10L15/16; G10L15/14
CPC: G10L15/144; G10L15/16
Inventors: 黄晓荣 (Huang Xiaorong), 张伟彬 (Zhang Weibin), 徐向民 (Xu Xiangmin), 殷瑞祥 (Yin Ruixiang)
Owner: SOUTH CHINA UNIV OF TECH