LSTM model optimization method, accelerator, device and medium

An optimization method and model technology, applied in the fields of neural learning methods, biological neural network models, and neural architectures. It addresses problems such as the difficulty of deploying models, the high power consumption of computing platforms, and the inability of traditional platforms to carry LSTM workloads. Its effects are enhanced overall performance and range of applicability, improved computing efficiency and speed, and easier hardware deployment.

Pending Publication Date: 2021-10-22
SHENZHEN ECHIEV AUTONOMOUS DRIVING TECH CO LTD

AI Technical Summary

Problems solved by technology

Traditional computing platforms cannot handle the large volume of computation that LSTM requires.
In embedded applications, especially in latency-critical fields such as autonomous driving, the LSTM model has a very large number of parameters as well as large amounts of training data and inference test data, so both the computational complexity and the power consumption of the computing platform are high.
This makes it difficult to deploy models on embedded devices with strict power consumption requirements.
[0003] For hardware-accelerated computation of sparse LSTM models, the industry introduced the Delta algorithm, which exploits the numerical similarity of sequence data to construct and mine sparsity in that data, then reconstructs the LSTM model and accelerates the algorithm. However, this method is limited by the temporal dependence of the sequence data: input data at adjacent time steps must be highly similar, which clearly restricts its scope of application. In addition, hardware acceleration for Delta-based LSTM model reconstruction is complex to implement, which hinders hardware deployment.
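The limitation described in [0003] can be illustrated with a minimal sketch of delta-based sparsity. This is one common formulation, not necessarily the one the patent refers to: the function names `delta_encode` and `zero_fraction` and the `threshold` parameter are hypothetical. The sketch only shows why similar adjacent inputs produce many zero deltas while dissimilar inputs produce almost none.

```python
def delta_encode(sequence, threshold):
    """Return thresholded deltas for a sequence of input vectors.

    Assumption: a change smaller than `threshold` is treated as zero and
    the reference value for that element is left unchanged.
    """
    if not sequence:
        return []
    reference = [0.0] * len(sequence[0])  # last propagated value per element
    deltas = []
    for x in sequence:
        step = []
        for i, value in enumerate(x):
            d = value - reference[i]
            if abs(d) >= threshold:
                step.append(d)
                reference[i] = value  # update only when the change is propagated
            else:
                step.append(0.0)
        deltas.append(step)
    return deltas


def zero_fraction(rows):
    """Fraction of elements that are exactly zero."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0.0)
    return zeros / total if total else 0.0


# Adjacent inputs are similar: most deltas fall below the threshold.
smooth = [[1.0, 2.0], [1.01, 2.02], [1.02, 2.01]]
# Adjacent inputs differ strongly: no delta can be skipped.
noisy = [[1.0, 2.0], [-3.0, 5.0], [4.0, -1.0]]

smooth_sparsity = zero_fraction(delta_encode(smooth, 0.1))
noisy_sparsity = zero_fraction(delta_encode(noisy, 0.1))
```

With the smooth sequence, every step after the first is skipped; with the noisy sequence, nothing is skipped, which is the restricted scope of application noted above.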



Examples


Example 1

[0051] Referring to Figure 1, which shows the first embodiment of the LSTM model optimization method of the present application, the method includes:

[0052] Step S110: Obtain the weight matrix of the pruned LSTM network.

[0053] Specifically, a Recurrent Neural Network (RNN) based on Long Short-Term Memory (LSTM) is a neural network model for processing sequence data. It effectively addresses the problems of vanishing and exploding gradients and is widely used in intelligent cognition tasks such as speech recognition, behavior recognition, and natural language processing.

[0054] Specifically, the pruning operation compresses the LSTM network to reduce its storage and computing costs. Methods for compressing LSTM networks include, but are not limited to, parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation.
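As an illustration of parameter pruning, the following is a minimal sketch of magnitude-based pruning. The source does not state which pruning criterion is used, so the magnitude criterion, the function name `magnitude_prune`, and the `prune_ratio` parameter are all assumptions made for this example.

```python
def magnitude_prune(weights, prune_ratio):
    """Zero out roughly the smallest-magnitude `prune_ratio` of the weights.

    Assumption: weights whose magnitude ties with the cutoff are also
    pruned, so slightly more than the requested fraction may be removed.
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * prune_ratio)
    if k == 0:
        return [row[:] for row in weights]  # nothing to prune
    cutoff = flat[k - 1]
    return [[0.0 if abs(w) <= cutoff else w for w in row] for row in weights]


weights = [[0.5, -0.05, 0.3],
           [0.01, -0.8, 0.02]]
pruned = magnitude_prune(weights, 0.5)
```

The pruned matrix keeps its shape but contains zeros, which is what makes a weight sparsity measurement on it meaningful.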

[0055] Specifically, the weight matrix is ...

Example 2

[0090] Referring to Figure 6, which shows the second embodiment of the LSTM model optimization method of the present application, the method also includes:

[0091] Step S210: Obtain the weight matrix of the pruned LSTM network.

[0092] Step S220: Obtain the weight sparsity of the weight matrix based on the weight matrix.

[0093] Step S230: Obtain the sparsity of the input sequence.

[0094] Step S240: If the weight sparsity is greater than or equal to the weight sparsity threshold and/or the input sequence sparsity is greater than or equal to the input sequence sparsity threshold, determine that the operation mode is the sparse operation mode.

[0095] Step S250: Calculate the input sequence according to the sparse operation mode.

[0096] Step S260: If the weight sparsity is less than the weight sparsity threshold and/or the input sequence sparsity is less than the input sequence sparsity threshold, determine that the operation mode is the dense operation mode.

[0097] Step S270: Calculate the input sequence ...
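The mode selection in steps S220 to S260 can be sketched as follows. This is a rough illustration under stated assumptions, not the patented implementation: sparsity is assumed to mean the fraction of zero elements, the "and/or" in S240 is read as a logical "or", and the names `sparsity`, `select_mode`, `matvec_dense`, and `matvec_sparse` as well as the threshold values are hypothetical.

```python
def sparsity(rows):
    """Fraction of zero elements (assumed definition of sparsity)."""
    total = sum(len(r) for r in rows)
    zeros = sum(1 for r in rows for v in r if v == 0.0)
    return zeros / total if total else 0.0


def select_mode(weight_sparsity, input_sparsity,
                weight_threshold, input_threshold):
    """Pick the operation mode; 'and/or' in S240 is read as 'or' here."""
    if weight_sparsity >= weight_threshold or input_sparsity >= input_threshold:
        return "sparse"
    return "dense"


def matvec_dense(matrix, vector):
    """Multiply every weight by every input element."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def matvec_sparse(matrix, vector):
    """Skip zero inputs and zero weights; same result, fewer multiplications."""
    nonzero_inputs = [(j, v) for j, v in enumerate(vector) if v != 0.0]
    result = []
    for row in matrix:
        acc = 0.0
        for j, v in nonzero_inputs:
            if row[j] != 0.0:
                acc += row[j] * v
        result.append(acc)
    return result


W = [[0.0, 0.0, 1.5],
     [0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0]]
x = [0.0, 3.0, 0.0]

mode = select_mode(sparsity(W), sparsity([x]), 0.5, 0.5)
y = matvec_sparse(W, x) if mode == "sparse" else matvec_dense(W, x)
```

Both paths compute the same product; the point of the mode switch is that the sparse path only pays off when enough elements are zero, so dense data is routed to the plain computation instead.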



Abstract

The invention discloses an LSTM model optimization method, an accelerator, a device and a medium. The method comprises the following steps: obtaining the weight matrix of a pruned LSTM network; obtaining the weight sparsity of the weight matrix based on the weight matrix; obtaining the sparsity of an input sequence; determining an operation mode based on the weight sparsity and the input sequence sparsity; and, if the operation mode is determined to be the sparse operation mode, calculating the input sequence according to the sparse operation mode. The invention aims to improve the energy efficiency ratio of the LSTM hardware accelerator, and is simple to implement and easy to deploy.

Description

Technical field

[0001] The invention relates to the field of computer hardware acceleration, in particular to an LSTM model optimization method, accelerator, device and medium.

Background

[0002] A Recurrent Neural Network (RNN) based on Long Short-Term Memory (LSTM) is a neural network model for processing sequence data. It is widely used in cognitive fields such as speech recognition, behavior recognition, and natural language processing. In practical engineering applications, however, it faces many problems. Traditional computing platforms cannot handle the large volume of computation that LSTM requires. In embedded applications, especially in latency-critical fields such as autonomous driving, the LSTM model has a very large number of parameters as well as large amounts of training data and inference test data, so not only is the computational complexity high, but the power consumption of the computing platform is also very lar...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04, G06N3/063, G06N3/08
CPC: G06N3/08, G06N3/063, G06N3/044
Inventor: 宋朝忠, 李小莲, 连帅军
Owner: SHENZHEN ECHIEV AUTONOMOUS DRIVING TECH CO LTD