A method for establishing a CLDNN structure applied to end-to-end speech recognition

A method for establishing a CLDNN structure, applied in speech recognition, speech analysis and neural network learning methods. It addresses problems that arise as model depth increases, such as vanishing and exploding gradients.

Active Publication Date: 2020-12-22
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

A method is proposed for building a CLDNN structure for end-to-end speech recognition. It addresses the tendency of the LSTM in the traditional CLDNN to overfit, and overcomes the vanishing-gradient, exploding-gradient and degradation problems caused by increasing the model depth.


Examples


Embodiment Construction

[0049] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0050] The technical solution by which the present invention solves the above technical problems is as follows:

[0051] S1: divide the speech data set into a training set, a cross-validation set and a test set;
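
Step S1 can be sketched as a seeded shuffle followed by slicing. This is only an illustration: the 80/10/10 ratios, the function name `split_dataset` and the utterance IDs are assumptions, since this excerpt does not state how the patent partitions its data.

```python
import random


def split_dataset(utterances, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train / cross-validation / test lists.

    The fractions are illustrative assumptions, not values from the patent.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")
    items = list(utterances)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # everything left over goes to the test set
    return train, val, test


# Hypothetical utterance IDs, used only to exercise the function.
utt_ids = [f"utt{i:03d}" for i in range(100)]
train_set, val_set, test_set = split_dataset(utt_ids)
```

Splitting by utterance keeps every frame of one recording in the same subset. Splitting by speaker would be stricter, but the excerpt does not say which the patent uses.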

[0052] S2: preprocess all data and then extract the mel-frequency cepstral coefficients (MFCC) of the speech signal. The preprocessing steps are:

[0053] Pre-emphasis: pass the signal through the high-pass filter $H(z) = 1 - \mu z^{-1}$.
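
In the time domain, the pre-emphasis filter corresponds to y[n] = x[n] - μ·x[n-1]. A minimal sketch follows. The default `mu=0.97` is a commonly used value and an assumption here, since this excerpt does not give the patent's μ, and the name `pre_emphasis` is hypothetical.

```python
def pre_emphasis(signal, mu=0.97):
    """Apply y[n] = x[n] - mu * x[n-1], the time-domain form of H(z) = 1 - mu * z^-1.

    mu=0.97 is an assumed default; the source excerpt does not state it.
    """
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor, so it passes through
    for n in range(1, len(signal)):
        out.append(signal[n] - mu * signal[n - 1])
    return out


# A constant (zero-frequency) input is attenuated after the first sample,
# which is the high-pass behaviour expected of this filter.
emphasized = pre_emphasis([1.0, 1.0, 1.0, 1.0], mu=0.5)
```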

[0054] Framing: divide the entire speech signal into short frames of 30 ms each, with a frame shift of 10 ms.
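
The framing step can be sketched as overlapping slices. The 30 ms frame length and 10 ms shift come from the text. The 16 kHz sample rate, the choice to drop trailing samples that do not fill a whole frame, and the name `frame_signal` are assumptions.

```python
def frame_signal(signal, sample_rate=16000, frame_ms=30, shift_ms=10):
    """Cut a 1-D signal into overlapping frames.

    sample_rate=16000 is an assumption; the frame and shift durations
    follow the 30 ms / 10 ms values given in the text.
    """
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    shift = sample_rate * shift_ms // 1000      # samples between frame starts
    if len(signal) < frame_len:
        return []
    # Trailing samples that cannot fill a whole frame are dropped (no padding).
    n_frames = 1 + (len(signal) - frame_len) // shift
    return [signal[i * shift:i * shift + frame_len] for i in range(n_frames)]


# One second of silence at the assumed 16 kHz rate.
one_second = [0.0] * 16000
frames = frame_signal(one_second)
```

With these settings each frame holds 480 samples and consecutive frames overlap by 320 samples.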

[0055] Windowing: apply a Hamming window to each frame:

[0056] $S'(n) = S(n) \cdot W(n)$

[0057] (a is set to 0.46) ...
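
The window definition itself is elided in this excerpt. The sketch below therefore assumes the standard generalized Hamming form W(n) = (1 - a) - a·cos(2πn / (N - 1)) for 0 ≤ n ≤ N - 1, with a = 0.46 as stated in the text. The function names are hypothetical.

```python
import math


def hamming_window(length, a=0.46):
    """Return the generalized Hamming window (1 - a) - a * cos(2*pi*n / (N - 1)).

    The formula is the standard definition and is an assumption here;
    only the value a = 0.46 is taken from the source text.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return [1.0]  # avoid division by zero for a single-sample frame
    return [(1 - a) - a * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame, a=0.46):
    """Compute S'(n) = S(n) * W(n) sample by sample."""
    window = hamming_window(len(frame), a)
    return [s * w for s, w in zip(frame, window)]


# A constant frame makes the window shape directly visible in the output.
windowed = apply_window([1.0] * 5)
```

The window tapers each frame towards its edges, which reduces spectral leakage before the MFCC computation.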


Abstract

The invention claims an end-to-end speech recognition method based on an improved CLDNN structure. The conventional CLDNN structure commonly used for speech recognition adopts a fully connected LSTM (Long Short-Term Memory) model to process the temporal information in a speech signal, so overfitting occurs easily during training and degrades learning. A deeper model usually performs better, but increasing model depth by simply stacking network layers can cause vanishing gradients, exploding gradients and degradation. To address these problems, the invention discloses an improved CLDNN structure that combines a residual network with ConvLSTM to build a residual ConvLSTM model, and replaces the fully connected LSTM model in the conventional CLDNN structure with it. This structure remedies the shortcomings of the conventional CLDNN model and allows the model depth to be increased by stacking residual ConvLSTM blocks without causing vanishing gradients, exploding gradients or degradation, so that the speech recognition system performs better.
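
The abstract names the two ingredients of the improved structure, ConvLSTM and a residual connection, but this excerpt gives no equations. As a hedged illustration only, one common ConvLSTM formulation (peephole terms omitted) with an identity shortcut can be written as below. All symbols are assumptions for illustration and are not taken from the patent: $X_t$ is the input, $H_t$ the hidden state, $C_t$ the cell state, $W$ the convolution kernels, $b$ the biases, $Y_t$ the block output, $*$ convolution and $\odot$ the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \odot \tanh\left(C_t\right) \\
Y_t &= H_t + X_t
\end{aligned}
```

The last line is the residual shortcut. It requires $H_t$ and $X_t$ to have the same shape, otherwise a projection of $X_t$ would be needed. Because the identity path carries the input forward unchanged, stacking such blocks is the usual way residual networks keep gradients well behaved as depth grows.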

Description

Technical field

[0001] The invention belongs to the field of speech recognition, in particular to a method for establishing a CLDNN structure applied to end-to-end speech recognition.

Background technique

[0002] Automatic speech recognition technology has always played a pivotal role in the field of artificial intelligence. The traditional speech recognition technology represented by the HMM-GMM model has been the mainstream and dominated the field of speech recognition for decades. In recent years, thanks to breakthroughs in deep learning, automatic speech recognition technology is also in a stage of rapid development. At present, the popularity of end-to-end speech recognition systems based on deep learning has surpassed traditional speech recognition systems in academia, and has begun to gradually replace traditional speech recognition systems in actual production.

[0003] Since the 1980s, acoustic models based on Gaussian Mixture Model / Hidden Markov Model (GMM / HMM) ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/16, G10L15/06, G06N3/08, G06N3/04
Inventors: 冯昱劼, 张毅, 徐轩
Owner CHONGQING UNIV OF POSTS & TELECOMM