Hardware accelerator and method for realizing sparse GRU neural network based on FPGA

A neural network sparsification technology, applied in the field of hardware accelerators, which addresses the problems that general-purpose processors achieve only a limited acceleration ratio and cannot obtain substantial benefits from sparsification.

Active Publication Date: 2017-10-03
XILINX INC


Problems solved by technology

However, existing general-purpose processors (such as GPUs or CPUs) cannot obtain substantial benefits from sparsification.



Examples


example 1

[0171] Next, two computing units (Process Elements, PEs), PE0 and PE1, are used to calculate a matrix-vector multiplication, with compressed column storage (CCS) taken as an example to briefly explain the basic idea of performing the corresponding operations on the hardware of the present invention.
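For reference, a minimal Python sketch of a CCS encoding; the function name and layout are illustrative, not the patent's on-chip format:

```python
import numpy as np

def dense_to_ccs(w):
    """Encode a dense matrix in compressed column storage (CCS): the nonzero
    values column by column, their row indices, and per-column pointers."""
    values, row_idx, col_ptr = [], [], [0]
    rows, cols = w.shape
    for j in range(cols):
        for i in range(rows):
            if w[i, j] != 0:
                values.append(float(w[i, j]))
                row_idx.append(i)
        col_ptr.append(len(values))  # entries of column j end here
    return values, row_idx, col_ptr

# Column j of the original matrix is values[col_ptr[j]:col_ptr[j+1]],
# located at rows row_idx[col_ptr[j]:col_ptr[j+1]].
```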

[0172] The sparsity of the matrices in the compressed GRU is unbalanced, which lowers the utilization of computing resources.

[0173] As shown in Figure 11, suppose the input vector a contains 6 elements {a0, a1, a2, a3, a4, a5} and the weight matrix contains 8×6 elements. The two PEs (PE0 and PE1) are responsible for calculating a3×w[3], where a3 is the fourth element of the input vector and w[3] is the fourth column of the weight matrix.

[0174] As can be seen from Figure 11, the workloads of PE0 and PE1 differ: PE0 performs three multiplication operations, while PE1 performs only one.
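The imbalance can be reproduced with a small sketch, assuming (as is common in comparable designs, though not stated here) that matrix rows are interleaved across the PEs so that PE k owns rows i with i mod 2 = k; the column values are invented for illustration:

```python
import numpy as np

# Illustrative nonzero pattern for column w[3] of the 8x6 matrix:
# PE0 owns the even rows, PE1 the odd rows (assumed interleaving).
w_col3 = np.array([0.5, 0.0, -1.2, 0.7, 0.0, 0.0, 0.3, 0.0])
a3 = 2.0  # fourth element of the input vector

for pe in (0, 1):
    owned = [(i, v) for i, v in enumerate(w_col3) if i % 2 == pe and v != 0.0]
    print(f"PE{pe}: {len(owned)} multiplies ->",
          {i: a3 * v for i, v in owned})
# PE0 performs 3 multiplications, PE1 only 1: the workloads are unbalanced.
```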

[0175] In the prior art, th...

example 2

[0188] This embodiment is intended to explain how the present invention balances IO bandwidth against the computing units.

[0189] If the memory controller's user interface is 512 bits wide and clocked at 250 MHz, the required PE concurrency satisfies 512 × 250 MHz = PE_num × freq_PE × data_bit. With 8-bit fixed-point weights and a PE computing module clocked at 200 MHz, the number of PEs required is 80.
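The arithmetic behind this balance, as a quick check (variable names are ours):

```python
# IO/compute balance: memory interface supply must equal aggregate PE demand,
# interface_bits * mem_freq = PE_num * freq_PE * data_bit.
interface_bits = 512    # memory controller user interface width
mem_freq_hz    = 250e6
data_bit       = 8      # 8-bit fixed-point weights
freq_pe_hz     = 200e6

pe_num = (interface_bits * mem_freq_hz) / (freq_pe_hz * data_bit)
print(pe_num)  # 80.0 -> 80 PEs saturate the available bandwidth
```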

[0190] For a 2048×1024 network with an input of length 1024, the most time-consuming computation under different sparsity levels is still the matrix-vector multiplication. For a sparse GRU network, the calculations of z_t, r_t, and the candidate hidden state h̃_t can be hidden behind the matrix-vector multiplications Wx_t and Uh_{t-1}. Since the subsequent pointwise multiplication and addition operations are serially pipelined, the resources they require are relatively small. In summary, the present invention fully combines sparse matrix-vector multiplication, IO and computation balance, and serial...
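For reference, a plain-Python sketch of one GRU step in the standard formulation (the patent's exact gate equations and ordering may differ): the matrix-vector products on x_t and h_{t-1} dominate the cost, and everything after them is cheap pointwise arithmetic that can be pipelined serially:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    # Heavy part: six (sparse) matrix-vector products, the Wx_t / Uh_{t-1} terms.
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate z_t
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate r_t
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev))   # candidate state h~_t
    # Light part: pointwise multiply/add, easily pipelined after the matvecs.
    return (1.0 - z) * h_prev + z * h_cand
```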


Abstract

The invention provides a hardware accelerator and method for realizing a sparse GRU neural network based on FPGA. According to the invention, an apparatus for realizing a sparse GRU neural network includes: an input receiving unit for receiving a plurality of input vectors and distributing them to a plurality of computing units; a plurality of computing units (PEs) which acquire input vectors from the input receiving unit, read the weight matrix data of the neural network, decode the weight matrix data, perform matrix computation on the decoded weight matrix data and the input vectors, and output the results of the matrix computation to a hidden layer state computing module; a hidden layer state computing module which acquires the results of the matrix computation from the computing units and computes the state of the hidden layer; and a control unit for global control. In addition, the invention also provides a method for realizing a sparse GRU neural network through iteration.
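A toy software model of this dataflow, with invented names and an assumed row-interleaved partitioning, purely to make the module roles concrete:

```python
import numpy as np

def accelerator_matvec(W, x, num_pes=2):
    """The input receiving unit broadcasts x to every PE; each PE computes
    the partial product for the weight rows it owns (row interleaving is an
    assumption); the hidden-layer state module consumes the assembled result."""
    y = np.zeros(W.shape[0])
    for k in range(num_pes):
        rows = np.arange(k, W.shape[0], num_pes)  # rows owned by PE k
        y[rows] = W[rows, :] @ x                  # PE k's partial matvec
    return y
```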

Description

[0001] This application claims priority to U.S. Patent Application No. 15/242,622, filed August 22, 2016, U.S. Patent Application No. 15/242,624, filed August 22, 2016, and U.S. Patent Application No. 15/242,625, filed August 22, 2016, the entire contents of which are hereby incorporated by reference.

Field of Invention

[0002] The invention relates to the field of artificial intelligence, and in particular to a hardware accelerator and a method for implementing a sparse GRU neural network based on FPGA.

Background Technique

[0003] Introduction to RNNs

[0004] A recurrent neural network (RNN) is a class of artificial neural networks in which the connections between units form directed cycles. This gives the network an internal state that allows it to exhibit dynamic temporal behavior. RNNs can handle variable-length sequences by means of a recurrent hidden state, where the activation at each instant depends on the activation at ...
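In the textbook formulation, that recurrent hidden state is h_t = tanh(W x_t + U h_{t-1} + b); a one-line sketch of the update (names are generic, not the patent's):

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """Vanilla RNN update: the activation at time t depends on the input x_t
    and on the previous activation h_{t-1}, forming a directed cycle."""
    return np.tanh(W @ x_t + U @ h_prev + b)
```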


Application Information

IPC(8): G06N3/04, G06N3/063
CPC: G06N3/063, G06N3/047
Inventor: Dongliang Xie (谢东亮), Song Han (韩松), Yi Shan (单羿)
Owner XILINX INC