A method for establishing a CLDNN structure applied to end-to-end speech recognition

A speech recognition technology and establishment method, applied in speech recognition, speech analysis, neural learning methods, etc., which addresses problems such as vanishing and exploding gradients

Active Publication Date: 2020-12-22

CHONGQING UNIV OF POSTS & TELECOMM

Cites: 4; Cited by: 0

Problems solved by technology

The invention proposes a CLDNN structure for end-to-end speech recognition. It addresses the tendency of the LSTM in a traditional CLDNN to overfit, and overcomes the vanishing-gradient, exploding-gradient and degradation problems caused by increasing the model depth.

Embodiment Construction

[0049] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0050] The technical solution by which the present invention solves the above technical problems is as follows:

[0051] S1: divide the speech data set into a training set, a cross-validation set and a test set;

[0052] S2: preprocess all the data, then extract the Mel-frequency cepstral coefficients (MFCCs) of the speech signal. The preprocessing steps are:

[0053] Pre-emphasis: pass the signal through the high-pass filter $H(z) = 1 - \mu z^{-1}$.

[0054] Framing: divide the speech signal into short frames of 30 ms each, with a frame shift of 10 ms.

[0055] Windowing: multiply each frame by a Hamming window:

[0056] $S'(n) = S(n) \cdot W(n)$

[0057] (a is set to 0.46) ...
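
The three preprocessing steps in [0053] to [0057] can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: the source does not state the value of μ or the sampling rate, so `mu=0.97` and a 16 kHz sampling rate are assumptions. The window is taken to be the standard generalized Hamming form W(n) = (1 - a) - a·cos(2πn / (N - 1)) with a = 0.46, because the window formula itself is elided in the text above. All function names are hypothetical.

```python
import math


def pre_emphasis(signal, mu=0.97):
    """Apply y[n] = x[n] - mu * x[n-1], the time-domain form of H(z) = 1 - mu * z^-1.

    mu=0.97 is an assumed value; the source does not specify it.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - mu * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal, sample_rate, frame_ms=30, shift_ms=10):
    """Split the signal into overlapping frames (30 ms long, 10 ms apart by default).

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, shift)]


def hamming_window(length, a=0.46):
    """Generalized Hamming window W(n) = (1 - a) - a * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [(1 - a) - a * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def apply_window(frames, a=0.46):
    """Compute S'(n) = S(n) * W(n) for every frame (element-wise product)."""
    if not frames:
        return []
    window = hamming_window(len(frames[0]), a)
    return [[s * w for s, w in zip(frame, window)] for frame in frames]


# Toy input: one second of a 440 Hz sine at an assumed 16 kHz sampling rate.
sample_rate = 16000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

emphasized = pre_emphasis(signal)
frames = frame_signal(emphasized, sample_rate)
windowed = apply_window(frames)
```

At 16 kHz, a 30 ms frame holds 480 samples and a 10 ms shift is 160 samples, so consecutive frames overlap by two thirds. The windowed frames would then go to the MFCC extraction mentioned in S2, which is not shown here.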

Abstract

The invention provides an end-to-end speech recognition method based on an improved CLDNN structure. The conventional CLDNN structure commonly used for speech recognition adopts a fully connected LSTM (Long Short-Term Memory) model to process the temporal information in a speech signal, so overfitting easily occurs during training and impairs learning. A deeper model usually performs better, but increasing the depth by simply stacking network layers can cause vanishing gradients, exploding gradients and degradation. To address these problems, the invention discloses an improved CLDNN structure that combines a residual network with ConvLSTM to build a residual ConvLSTM model, which replaces the fully connected LSTM model in the conventional CLDNN structure. The resulting structure mitigates the problems of the conventional CLDNN model, and its depth can be increased by stacking residual ConvLSTM blocks without causing vanishing gradients, exploding gradients or degradation, so that the speech recognition system performs better.
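
To make the idea of a residual ConvLSTM block concrete, the following is a minimal sketch in plain Python. The abstract does not specify the block's internals, so everything here is an assumption made for illustration: a single-channel 1-D ConvLSTM cell whose gates are computed by convolution along the feature axis, a kernel size of 3, and an identity shortcut that adds the block input to the cell's hidden state. The names `ConvLSTMCell`, `residual_convlstm_block` and `stack_blocks` are hypothetical.

```python
import math
import random


def conv1d_same(x, kernel):
    """Zero-padded 1-D cross-correlation; the output has the same length as x."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(x))]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ConvLSTMCell:
    """Single-channel ConvLSTM cell: gate pre-activations come from convolutions
    over the input and the previous hidden state instead of dense products."""

    GATES = "ifog"  # input gate, forget gate, output gate, candidate

    def __init__(self, kernel_size=3, scale=0.1, seed=0):
        rng = random.Random(seed)
        self.wx = {g: [rng.uniform(-scale, scale) for _ in range(kernel_size)]
                   for g in self.GATES}
        self.wh = {g: [rng.uniform(-scale, scale) for _ in range(kernel_size)]
                   for g in self.GATES}
        self.bias = {g: 0.0 for g in self.GATES}

    def step(self, x, h, c):
        """Advance one time step; returns the new hidden and cell states."""
        pre = {}
        for g in self.GATES:
            from_x = conv1d_same(x, self.wx[g])
            from_h = conv1d_same(h, self.wh[g])
            pre[g] = [a + b + self.bias[g] for a, b in zip(from_x, from_h)]
        i_gate = [sigmoid(v) for v in pre["i"]]
        f_gate = [sigmoid(v) for v in pre["f"]]
        o_gate = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c_new = [f * ck + i * g for f, ck, i, g in zip(f_gate, c, i_gate, cand)]
        h_new = [o * math.tanh(ck) for o, ck in zip(o_gate, c_new)]
        return h_new, c_new


def residual_convlstm_block(frames, cell):
    """Run the cell over the frame sequence and add the shortcut: y_t = x_t + h_t."""
    width = len(frames[0])
    h, c = [0.0] * width, [0.0] * width
    outputs = []
    for x in frames:
        h, c = cell.step(x, h, c)
        outputs.append([xk + hk for xk, hk in zip(x, h)])
    return outputs


def stack_blocks(frames, cells):
    """Increase depth by feeding each block's output into the next block."""
    for cell in cells:
        frames = residual_convlstm_block(frames, cell)
    return frames


# Toy input: 5 frames of 8 features each (made-up values, not real MFCCs).
features = [[0.1 * (t + k) for k in range(8)] for t in range(5)]
cells = [ConvLSTMCell(seed=s) for s in range(3)]
deep = stack_blocks(features, cells)
```

In this sketch, a cell with all-zero kernels keeps its hidden state at zero, so the block reduces to the identity mapping. Each block therefore only has to learn a correction to its input, which is the usual argument for why residual blocks can be stacked deeply.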

Description

Technical field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a method for establishing a CLDNN structure applied to end-to-end speech recognition.

Background

[0002] Automatic speech recognition has long played a pivotal role in the field of artificial intelligence. Traditional speech recognition technology, represented by the HMM-GMM model, dominated the field for decades. In recent years, thanks to breakthroughs in deep learning, automatic speech recognition has been developing rapidly. End-to-end speech recognition systems based on deep learning have now overtaken traditional systems in popularity in academia, and have begun to replace them in production.

[0003] Since the 1980s, acoustic models based on Gaussian Mixture Model / Hidden Markov Model (GMM / HMM) ...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L15/16, G10L15/06, G06N3/08, G06N3/04

Inventor: 冯昱劼, 张毅, 徐轩

Owner: CHONGQING UNIV OF POSTS & TELECOMM
