Deep bidirectional LSTM acoustic model based on Maxout neurons

Acoustic-model and neuron technology, applied in speech analysis, speech recognition, instruments, etc., that addresses problems such as existing training algorithms not being directly applicable to DBLSTM and speech recognition delays.

Active Publication Date: 2017-10-27
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

However, because a DBLSTM depends on context in both directions at every time step, the above training algorithm cannot be applied directly to DBLSTM training.
In LVCSR, a DBLSTM is also poorly suited to low-latency recognition, which can delay the entire speech recognition process.

Embodiment Construction

[0054] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0055] The technical solution by which the present invention solves the above technical problems is as follows:

[0056] Figure 1 shows the structure of a single LSTM cell, which differs from that of a standard RNN. For a standard RNN, given an input sequence $x = (x_1, x_2, \ldots, x_T)$, the state vector $h = (h_1, h_2, \ldots, h_T)$ and the output vector $y = (y_1, y_2, \ldots, y_T)$ are computed as:

[0057] $h_t = H(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$

[0058] $y_t = W_{hy} h_t + b_y$

[0059] Here, W denotes the weight matrices between layers; $b_h$ and $b_y$ are the bias vectors of the hidden layer and the output layer, respectively; and H is the activation function of the hidden layer. ...
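The recurrence in [0057] and [0058] can be illustrated with a minimal pure-Python sketch. This is not the patent's implementation: the function names `matvec` and `rnn_forward`, the zero initial state, the choice of tanh for H, and the toy weights are all assumptions made for illustration.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y, H=math.tanh):
    """Apply h_t = H(W_xh x_t + W_hh h_{t-1} + b_h) and y_t = W_hy h_t + b_y.

    xs is a list of input vectors x_1..x_T. Returns (hs, ys), the lists of
    hidden states and outputs. Assumption: the initial state h_0 is zero.
    """
    h_prev = [0.0] * len(b_h)
    hs, ys = [], []
    for x_t in xs:
        # Pre-activation: input term + recurrent term + hidden bias.
        pre = [a + b + c for a, b, c in
               zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)]
        h_t = [H(p) for p in pre]
        # The output is a linear read-out of the hidden state.
        y_t = [a + b for a, b in zip(matvec(W_hy, h_t), b_y)]
        hs.append(h_t)
        ys.append(y_t)
        h_prev = h_t
    return hs, ys

# Toy usage: 1-dimensional input, hidden state, and output (hypothetical values).
hs, ys = rnn_forward(
    xs=[[1.0], [0.0]],
    W_xh=[[1.0]], W_hh=[[0.5]], W_hy=[[2.0]],
    b_h=[0.0], b_y=[1.0],
)
```

Because each h_t depends only on h_{t-1}, this standard RNN uses past context only; a bidirectional model adds a second pass over the sequence in reverse order.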

Abstract

The invention discloses an acoustic model based on a deep bidirectional long short-term memory (DBLSTM) recurrent neural network (RNN). The DBLSTM network is divided into three main parts. In the fully connected part, Maxout neurons replace the original sigmoid neurons to address the vanishing and exploding gradient problems common in RNNs, and the Dropout regularization training algorithm is used to prevent the network from overfitting during training. In the multi-layer BLSTM part, a context-sensitive-chunk back-propagation through time (CSC-BPTT) algorithm is proposed for training the network, to accommodate the DBLSTM's bidirectional dependence at each time step. A selective connection layer placed after the multi-layer BLSTM part transforms the DBLSTM output into the input of the fully connected part. The acoustic model achieves a higher speech recognition rate.
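The two ingredients of the fully connected part, Maxout units and Dropout, can be sketched as follows. This is a rough illustration only: the abstract does not give the number of linear pieces per unit, the dropout rate, or the dropout variant, so the function names `maxout_layer` and `dropout`, the "inverted" scaling by the keep probability, and all numeric values are assumptions.

```python
import random

def maxout_layer(x, weights, biases):
    """Maxout: each unit outputs the maximum of k affine functions of x.

    weights[j][i] is the weight vector of piece i of unit j, and
    biases[j][i] is the matching bias. The unit is piecewise linear,
    so it does not saturate the way a sigmoid does.
    """
    out = []
    for unit_w, unit_b in zip(weights, biases):
        pieces = [sum(w * v for w, v in zip(w_vec, x)) + b
                  for w_vec, b in zip(unit_w, unit_b)]
        out.append(max(pieces))
    return out

def dropout(h, p_drop, rng, training=True):
    """Inverted dropout (an assumed variant): zero each activation with
    probability p_drop and rescale the survivors by 1 / (1 - p_drop)."""
    if not training or p_drop == 0.0:
        return list(h)
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in h]

# Toy usage: two Maxout units with two linear pieces each (hypothetical values).
x = [1.0, -2.0]
weights = [
    [[1.0, 0.0], [0.0, 1.0]],     # unit 0: pieces x[0] and x[1]
    [[-1.0, 0.0], [0.0, -1.0]],   # unit 1: pieces -x[0] + 0.5 and -x[1]
]
biases = [[0.0, 0.0], [0.5, 0.0]]
h = maxout_layer(x, weights, biases)
h_train = dropout(h, p_drop=0.5, rng=random.Random(0))
```

At test time the sketch calls `dropout(..., training=False)`, which returns the activations unchanged; that is the usual reason for rescaling during training instead.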

Description

Technical field

[0001] The invention belongs to the field of artificial-intelligence speech recognition and mainly relates to the application of deep neural networks in speech acoustic models.

Background

[0002] Research on and application of deep neural networks (DNNs) have greatly advanced automatic speech recognition (ASR) technology. In large vocabulary continuous speech recognition (LVCSR) systems, DNN-based acoustic models outperform the traditional Gaussian mixture model-hidden Markov model (GMM-HMM) acoustic model. Seide F et al. used the DNN-HMM acoustic model for the Switchboard telephone transcription task, and its word error rate (WER) dropped by 33%. Research on and extension of the DNN-HMM acoustic model have brought unprecedented progress to ASR technology.

[0003] Previous research has mainly focused on feed-forward neural networks for processing contextual acoustic featu...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/16, G10L15/14, G10L15/06
CPC: G10L15/063, G10L15/144, G10L15/16
Inventors: 罗元, 刘宇, 张毅
Owner: CHONGQING UNIV OF POSTS & TELECOMM