Improved time delay neural network acoustic model

An acoustic model and neural network technology, applied in the field of time-delay neural network (TDNN) acoustic models. It addresses two problems: acoustic model performance still needs improvement, and the TDNN model does not explicitly model its inter-layer features. The effect is improved performance and stronger modeling capability.

Active Publication Date: 2019-01-04
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] With the development of deep learning, the time-delay deep neural network (TDNN) model was applied to acoustic modeling and achieved good results, but the TDNN model did ...

Examples

Detailed Description of the Embodiments

[0022] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0023] An improved time-delay neural network (TDNN) acoustic model adds a dedicated module (also called an attention layer or attention module) between several hidden layers of the TDNN. The module weights the original input features and sends the weighted features to the next hidden layer.

[0024] The attention module consists of an affine transformation and a weighting function. It takes the output of the previous hidden layer as input, extracts a weight value for each input feature, and uses the extracted weights to weight the original input features (element-wise multiplication), yielding the weighted features. The attention module combines well with the TDNN and effectively improves the performance of the TDNN acoustic model without introducing too many para...
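The weighting step in [0024] can be illustrated with a minimal NumPy sketch. The text does not name the weighting function, so a sigmoid is assumed here; the function name `attention_module` and the parameter names `W` and `b` are also hypothetical and are not taken from the patent.

```python
import numpy as np

def sigmoid(x):
    # Assumed weighting function; the source only says "a weighting function".
    return 1.0 / (1.0 + np.exp(-x))

def attention_module(h, W, b):
    """Weight the hidden features h with weights computed from h itself.

    h: (T, D) output of the previous hidden layer (T frames, D features)
    W: (D, D) affine weight matrix (hypothetical shape choice)
    b: (D,)   affine bias
    """
    weights = sigmoid(h @ W + b)  # affine transformation, then weighting function
    return weights * h            # element-wise multiplication with the input
```

With `W` and `b` set to zero, every weight is 0.5, so the module simply halves its input; learned parameters would instead emphasize some features and suppress others.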

Abstract

The invention belongs to the technical field of speech recognition and relates to an improved time-delay neural network (TDNN) acoustic model, built in the following steps: a basic TDNN network is established; an attention module is added between two adjacent hidden layers to obtain the improved TDNN network; and the improved TDNN network is trained to obtain the final acoustic model. The attention module is composed of an affine transformation and a weighting function. It takes the output of the previous hidden layer as input, extracts a weight value for each input feature, and applies the extracted weights to the original input features to obtain the weighted features. The design takes into account the modeling capability of the model, its capacity to extract context information, and the model size. By weighting the hidden-layer features at multiple layers, it explicitly models the relative importance of the inter-layer features, which enhances the performance of the TDNN acoustic model and the overall performance of the speech recognition system.
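The first two steps in the abstract (a basic TDNN, then an attention module between adjacent hidden layers) can be sketched as a forward pass in NumPy. This is an illustration under stated assumptions, not the patented implementation: the ReLU activation, sigmoid weighting function, edge clamping, layer sizes, and context offsets are all chosen for the example, every function name is hypothetical, and the output layer and training step are omitted.

```python
import numpy as np

def splice(x, offsets):
    # Concatenate frames at the given time offsets (TDNN context window).
    # Frames past the edges are clamped to the first/last frame (assumption).
    T = x.shape[0]
    idx = np.arange(T)
    return np.concatenate([x[np.clip(idx + o, 0, T - 1)] for o in offsets], axis=1)

def tdnn_layer(x, W, b, offsets):
    # Affine transform over the spliced context, then ReLU (assumed activation).
    return np.maximum(0.0, splice(x, offsets) @ W + b)

def attend(h, Wa, ba):
    # Attention module: affine transform + sigmoid (assumed weighting function),
    # then element-wise weighting of the layer output.
    gate = 1.0 / (1.0 + np.exp(-(h @ Wa + ba)))
    return gate * h

def init_model(in_dim, hidden, contexts, seed=0):
    # Hypothetical helper: random parameters for the TDNN layers and for one
    # attention module between each pair of adjacent hidden layers.
    rng = np.random.default_rng(seed)
    layers, d = [], in_dim
    for offsets in contexts:
        W = rng.normal(0.0, 0.1, (d * len(offsets), hidden))
        layers.append((W, np.zeros(hidden), offsets))
        d = hidden
    attn = [(rng.normal(0.0, 0.1, (hidden, hidden)), np.zeros(hidden))
            for _ in range(len(contexts) - 1)]
    return layers, attn

def forward(x, layers, attn):
    h = x
    for i, (W, b, offsets) in enumerate(layers):
        h = tdnn_layer(h, W, b, offsets)
        if i < len(layers) - 1:  # attention sits between adjacent hidden layers
            h = attend(h, *attn[i])
    return h
```

For example, `init_model(40, 64, [(-2, -1, 0, 1, 2), (-1, 0, 1), (-3, 0, 3)])` builds three hidden layers with two attention modules between them. Each attention module adds only one `hidden × hidden` matrix and a bias, which is in line with the claim that the parameter count stays modest.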

Description

Technical field

[0001] The invention belongs to the technical field of speech recognition and relates to a time-delay neural network acoustic model.

Background

[0002] From the birth of the world's first speech recognition system in the 1950s to the first decade of the 21st century, the core technology of speech recognition evolved gradually from template matching to statistical modeling. The most classic method in the field, and one that remains significant, combines the Hidden Markov Model (HMM) with the Gaussian Mixture Model (GMM). The HMM models the speech signal dynamically and describes the temporal transitions between pronunciation states, while the GMM fits the feature distribution of each pronunciation state. Because this method makes good use of the short-term stationarity of the speech signal, it has been the core technology of acoustic modeling in speech recognition for the past few decades. ...

Application Information

IPC(8): G10L15/16
CPC: G10L15/16
Inventor: 陈凯斌, 张伟彬, 徐向民
Owner SOUTH CHINA UNIV OF TECH