Automatic speech recognition method based on random depth delay neural network model

An automatic speech recognition and neural network model technology, applied in speech recognition, speech analysis, instruments, etc. It addresses the problems of limited neural network learning ability, growth in model parameters and vanishing gradients; its effect is to counter over-fitting and vanishing gradients and to strengthen modeling ability.

Active Publication Date: 2018-12-21

SOUTH CHINA UNIV OF TECH

Cites: 9; Cited by: 15

AI Technical Summary

Problems solved by technology

Traditional acoustic modeling uses a Gaussian mixture model (GMM) to model each phoneme state, but this model has several disadvantages. First, a GMM offers no advantage for nonlinear modeling, and for complex signals such as speech it needs many more parameters to achieve good results. Second, a GMM is sensitive to the input feature dimension: increasing the input dimension causes geometric growth in the number of model parameters.
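To make the parameter-growth argument concrete, the sketch below counts the free parameters of a single GMM (for example, the model of one phoneme state). The helper name and the choice of 16 mixture components are illustrative assumptions, not values from the patent. With full covariance matrices each component needs d means plus d(d+1)/2 covariance entries, so the count grows quadratically in the feature dimension d. Diagonal covariances grow only linearly but ignore correlations between feature dimensions.

```python
def gmm_param_count(dim: int, n_components: int, covariance: str = "full") -> int:
    """Free parameters of one GMM (e.g. the model of a single phoneme state).

    Hypothetical helper for illustration: `dim` is the feature dimension and
    `n_components` is the number of Gaussians in the mixture.
    """
    if covariance == "full":
        cov_params = dim * (dim + 1) // 2  # symmetric d x d covariance matrix
    elif covariance == "diag":
        cov_params = dim  # one variance per feature dimension
    else:
        raise ValueError(f"unknown covariance type: {covariance!r}")
    per_component = dim + cov_params  # mean vector + covariance entries
    # Mixture weights must sum to one, so only n_components - 1 of them are free.
    return n_components * per_component + (n_components - 1)


# Compare diagonal and full covariance as the feature dimension grows.
for d in (13, 39, 120):
    print(f"d={d:4d}  diag={gmm_param_count(d, 16, 'diag'):7d}  "
          f"full={gmm_param_count(d, 16, 'full'):7d}")
```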

[0006] 1. When the TDNN model operates at the granularity of each context, it has only one TDNN layer, so its modeling ability is insufficient;

[0007] 2. A deeper TDNN model suffers from vanishing gradients, which limits the learning ability of the neural network;

[0008] 3. A larger TDNN model is prone to over-fitting.
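Problem [0007] can be illustrated with a toy scalar network. The sketch assumes sigmoid activations and a unit weight, which the patent does not specify. Backpropagation multiplies one derivative per layer, and the sigmoid derivative never exceeds 0.25, so the gradient through a plain stack shrinks at least geometrically with depth. Adding an identity shortcut (y = x + f(x)), the residual structure that stochastic depth is built on, turns each factor into 1 + f'(x), so the gradient no longer vanishes.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def chain_gradient(depth: int, residual: bool, w: float = 1.0, x: float = 0.0) -> float:
    """d(output)/d(input) of `depth` stacked scalar layers (toy model).

    Plain layer:    y = sigmoid(w * y_prev)
    Residual layer: y = y_prev + sigmoid(w * y_prev)
    """
    grad, y = 1.0, x
    for _ in range(depth):
        s = sigmoid(w * y)
        local = w * s * (1.0 - s)  # derivative of sigmoid(w * y) w.r.t. y, at most w / 4
        if residual:
            grad *= 1.0 + local  # the identity path contributes a constant 1
            y = y + s
        else:
            grad *= local
            y = s
    return grad


for depth in (2, 10, 30):
    print(depth, chain_gradient(depth, residual=False), chain_gradient(depth, residual=True))
```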

Embodiment Construction

[0036] The technical solutions of the present invention will be further described below in conjunction with the drawings and embodiments.

[0037] This automatic speech recognition method is based on the stochastic-depth time-delay neural network (TDNN-SD) model. It takes into account the respective advantages of stochastic depth and of TDNN, and embeds stochastic depth into the TDNN. As a model of long-term dependencies, the TDNN offers higher computational efficiency and shorter training time than recurrent neural networks. Stochastic depth is embedded as follows: in the original TDNN, a stochastic-depth network is introduced for each TDNN layer that splices the current frame with preceding and following frames. This strengthens the modeling ability and robustness of the network and addresses over-fitting and vanishing gradients during training, thereby improving speech recognition accuracy.
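The paragraph above does not give the exact layer equations. The following is therefore only a minimal NumPy sketch of one way to embed stochastic depth in a TDNN layer, assuming the residual formulation of stochastic depth used for residual networks. The layer splices each frame with its neighbours at fixed offsets, applies an affine transform and a ReLU, and adds the result to its input. During training the TDNN branch is kept with survival probability p and skipped otherwise; at inference it is always applied but scaled by p. The class name, the offsets (-1, 0, 1), p = 0.8, edge-frame replication and the weight initialisation are all assumptions for illustration.

```python
import numpy as np


def splice(frames: np.ndarray, offsets=(-1, 0, 1)) -> np.ndarray:
    """Concatenate each frame with its neighbours at the given frame offsets.

    frames: (T, D) array. Indices outside [0, T-1] are clamped, i.e. edge
    frames are replicated (an assumption for this sketch).
    Returns a (T, len(offsets) * D) array.
    """
    t = np.arange(frames.shape[0])
    last = frames.shape[0] - 1
    return np.concatenate([frames[np.clip(t + o, 0, last)] for o in offsets], axis=1)


class StochasticDepthTDNNLayer:
    """Hypothetical TDNN layer wrapped in a stochastic-depth residual block."""

    def __init__(self, dim: int, offsets=(-1, 0, 1), survival_prob: float = 0.8, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.offsets = tuple(offsets)
        self.p = survival_prob
        fan_in = dim * len(self.offsets)
        # He-style initialisation for a ReLU layer (illustrative choice).
        self.W = self.rng.standard_normal((fan_in, dim)) * np.sqrt(2.0 / fan_in)
        self.b = np.zeros(dim)

    def branch(self, x: np.ndarray) -> np.ndarray:
        # TDNN transform: frame splicing, affine map, ReLU. Output shape stays (T, dim).
        return np.maximum(splice(x, self.offsets) @ self.W + self.b, 0.0)

    def __call__(self, x: np.ndarray, training: bool) -> np.ndarray:
        if training:
            # Bernoulli gate: keep the branch with probability p, otherwise pass x through.
            if self.rng.random() < self.p:
                return x + self.branch(x)
            return x
        # Inference: always use the branch, scaled by its survival probability.
        return x + self.p * self.branch(x)


feats = np.random.default_rng(1).standard_normal((50, 40))  # 50 frames, 40-dim features
layer = StochasticDepthTDNNLayer(dim=40)
print(layer(feats, training=True).shape, layer(feats, training=False).shape)
```

A full model would stack several such layers and learn W and b by backpropagation. The original stochastic depth technique also lets p decrease linearly with depth instead of fixing one value, which is a simplification made here.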

[0038] A typical speech recognition system consists of feature extraction, acoustic model, lang...

Abstract

The invention, which belongs to the field of automatic speech recognition technology, relates to an automatic speech recognition method based on a random depth time-delay neural network model. The method comprises: preparing training data; extracting acoustic features from the training speech audio data; training a traditional GMM-HMM model and using the trained GMM-HMM model to force-align the training speech audio data, obtaining the corresponding frame-level training labels; using the training speech audio data and the corresponding frame-level training labels to train, under supervision, a time-delay neural network model based on random depth, and combining it with a hidden Markov model to obtain an acoustic model; training on the corresponding text annotation data, or on text from other data sets, to obtain a trained language model; and constructing an automatic speech recognition decoder from the trained language model and acoustic model. This strengthens the modeling ability of the model and addresses over-fitting and vanishing gradients during training, thereby improving speech recognition accuracy.
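The abstract combines the neural network with an HMM but does not say how. A common hybrid NN-HMM recipe, assumed here rather than taken from the patent, works as follows. It expands the forced alignment into one state label per frame and trains the network on those labels. It then estimates state priors from the label counts. At decoding time it divides the network's state posteriors by those priors to obtain scaled likelihoods for the HMM. The (state_id, n_frames) segment format and all function names below are hypothetical.

```python
import numpy as np


def labels_from_alignment(segments):
    """Expand forced-alignment segments [(state_id, n_frames), ...] into per-frame labels."""
    return np.concatenate([np.full(n, s, dtype=np.int64) for s, n in segments])


def state_priors(labels: np.ndarray, n_states: int, floor: float = 1e-8) -> np.ndarray:
    """Relative frequency of each HMM state in the frame-level training labels."""
    counts = np.bincount(labels, minlength=n_states).astype(float)
    return np.maximum(counts / counts.sum(), floor)  # floor avoids log(0) for unseen states


def scaled_log_likelihoods(log_posteriors: np.ndarray, priors: np.ndarray) -> np.ndarray:
    """Bayes' rule up to a per-frame constant: log p(x|s) = log P(s|x) - log P(s) + const.

    log_posteriors: (T, n_states) network outputs; priors: (n_states,).
    """
    return log_posteriors - np.log(priors)


labels = labels_from_alignment([(0, 2), (2, 3), (1, 1)])
priors = state_priors(labels, n_states=3)
print(labels, priors)
```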

Description

Technical field

[0001] The invention belongs to the technical field of automatic speech recognition and relates to an automatic speech recognition method based on a random deep time-delay neural network model.

Background art

[0002] With the continuous development of deep learning technology, automatic speech recognition is used in an ever wider range of practical applications, such as Apple Siri and Amazon Alexa, and it continues to penetrate people's work, study and life. There is therefore an increasing demand for models that are more robust and have stronger modeling capability.

[0003] The main task of automatic speech recognition is to find a way to achieve a recognition rate equal to that of human beings while effectively handling different environmental factors (such as speakers, vocal channels, etc.). features, the corresponding text is obtained by decoding the acoustic model and the language model. Traditional acoustic modeling uses Gaussian mix...
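The closing sentence of [0003] refers to the standard statistical formulation of decoding. In the usual notation (standard background, not quoted from the patent), O is the sequence of acoustic feature vectors and W is a candidate word sequence. The decoder searches for:

```latex
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} p(O \mid W)\, P(W)
```

Here p(O | W) is supplied by the acoustic model (in this method, the HMM combined with the random-depth TDNN) and P(W) is supplied by the language model. The evidence p(O) is dropped because it does not depend on W.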

Application Information

IPC(8): G10L15/16; G10L15/14

CPC: G10L15/144; G10L15/16

Inventors: 黄晓荣 (Huang Xiaorong), 张伟彬 (Zhang Weibin), 徐向民 (Xu Xiangmin), 殷瑞祥 (Yin Ruixiang)

Owner: SOUTH CHINA UNIV OF TECH
