
FPGA-based compressed LSTM accelerator and acceleration method

The technology concerns accelerators and multiplication modules, applied to FPGA-based compressed LSTM accelerators and acceleration methods. It addresses idle computing units and low overall efficiency, and it saves on-chip cache, shortens the calculation cycle time, and improves calculation performance and throughput.

Active Publication Date: 2021-08-06
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
7 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0004] At present, FPGA-based LSTM accelerators generally rely on parallel computation and parallel data reads to improve acceleration performance. If a sparse weight matrix takes part in the calculation directly, its many zero elements leave many computing units idle within a computing cycle, which leads to low overall efficiency.
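To make the inefficiency concrete, the following is a minimal sketch, not taken from the patent, that counts how many multiplier slots do useful work when a sparse weight matrix is fed to a dense parallel array. The matrix values and the function name `multiplier_utilization` are hypothetical, and the model of one multiplication scheduled per matrix element is an assumption.

```python
def multiplier_utilization(weights):
    """Return the fraction of multiplications that use a non-zero weight.

    Assumes a dense design that schedules one multiplication per matrix
    element, so a zero weight occupies a multiplier without contributing.
    """
    total = sum(len(row) for row in weights)
    useful = sum(1 for row in weights for w in row if w != 0)
    return useful / total


# Hypothetical 4x4 pruned weight matrix (illustrative values only).
sparse_w = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.25],
]

utilization = multiplier_utilization(sparse_w)
print(f"useful multiplications: {utilization:.0%}")
```

In this toy case only 4 of the 16 scheduled multiplications involve a non-zero weight, so three quarters of the multiplier slots would sit idle under the stated assumption.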



Examples


Embodiment Construction

[0023] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. The terms "temporary" and "first" are used to distinguish different stages of algorithm training and have no limiting meaning. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.

[0024] As shown in Figure 1, an FPGA-based compressed LSTM accelerator comprises, inside the FPGA, a plurality of calculation units (PE units), a storage unit, and a control unit.

[0025] The calculation unit includes a non-zero detection module, a weight storage unit, four weight decoding module...



Abstract

The invention discloses an FPGA-based compressed LSTM accelerator and an acceleration method. The FPGA accelerator internally comprises a plurality of calculation units, a storage unit, and a control unit. The method comprises the steps of: detecting the non-zero weight element values of the hidden node sparse weight matrix and their corresponding column index values by using a non-zero detection module; finding the corresponding excitation value according to each column index value by using a weight decoding module; sending the non-zero weight element values and the excitation values to a matrix vector multiplication module to obtain the result vectors of the four gates; and having the element operation module calculate the cell state value and the output value at the current moment from the result vectors of the four gates. Because only the non-zero weight element values of each gate and their corresponding excitation values are multiplied in a calculation period, no matrix vector multiplication module sits idle during that period. This also shortens the time of a single calculation period, improves the calculation performance and throughput of the accelerator, and saves on-chip cache of the FPGA.
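The data flow in the abstract can be sketched in software as follows. This is an illustrative model only, not the patented hardware design: the function names are hypothetical, the gate names `i`, `f`, `g`, `o` and the standard LSTM element-wise equations are assumptions, and biases are omitted for brevity.

```python
import math


def compress_rows(matrix):
    """Non-zero detection: keep each non-zero weight with its column index."""
    return [[(col, w) for col, w in enumerate(row) if w != 0] for row in matrix]


def sparse_matvec(compressed, excitation):
    """Multiply only non-zero weights; the column index selects the excitation value."""
    return [sum(w * excitation[col] for col, w in row) for row in compressed]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(gate_weights, excitation, c_prev):
    """One time step: four sparse gate products, then the element-wise update.

    gate_weights maps the assumed gate names "i", "f", "g", "o" to
    compressed matrices. Biases are omitted (an assumption of this sketch).
    """
    pre = {name: sparse_matvec(w, excitation) for name, w in gate_weights.items()}
    c_new, h_new = [], []
    for k in range(len(c_prev)):
        i = sigmoid(pre["i"][k])    # input gate
        f = sigmoid(pre["f"][k])    # forget gate
        g = math.tanh(pre["g"][k])  # candidate cell value
        o = sigmoid(pre["o"][k])    # output gate
        c = f * c_prev[k] + i * g   # cell state at the current moment
        c_new.append(c)
        h_new.append(o * math.tanh(c))  # output value at the current moment
    return c_new, h_new


# Hypothetical 2x3 sparse matrix, reused for all four gates for brevity.
dense = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
gates = {name: compress_rows(dense) for name in ("i", "f", "g", "o")}
c, h = lstm_step(gates, excitation=[1.0, 0.5, -1.0], c_prev=[0.0, 0.0])
print(c, h)
```

Each row here performs one multiplication instead of three, which mirrors the idea that only non-zero weights and their matching excitation values reach the multipliers.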

Description

Technical field

[0001] The invention relates to the field of hardware acceleration for neural networks, in particular to an FPGA-based compressed LSTM accelerator and an acceleration method.

Background technique

[0002] At present, LSTM networks have achieved great success in applications such as machine translation, multilingual processing, handwriting generation, and image caption generation. As the value of these applications grows, choosing the right accelerator platform becomes more important. An FPGA allows a hardware structure suited to neural network algorithms: developers can connect the logic units inside the FPGA through programmable interconnects according to their own needs to realize the corresponding functions, and the hardware architecture can be designed around the characteristics of the algorithm being accelerated. And in terms of comprehensive computing and power consumption, FPGA has a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/063, G06N3/08, G06N3/04, G06F15/78, G06F17/16
CPC: G06N3/063, G06N3/082, G06F15/781, G06F17/16, G06N3/048, G06N3/044, Y02D10/00
Inventor: 葛芬, 崔晨晨, 张伟枫, 岳鑫, 李梓瑜, 周芳, 吴宁
Owner: NANJING UNIV OF AERONAUTICS & ASTRONAUTICS