Hardware accelerator and method for realizing sparse GRU neural network based on FPGA

A neural network sparsification technology, applied in the field of hardware accelerators, which addresses the problems that general-purpose processors achieve only a limited acceleration ratio and cannot obtain substantial benefits from sparsification.

Active Publication Date: 2017-10-03
XILINX INC


Problems solved by technology

However, existing general-purpose processors (such as GPUs or CPUs) cannot obtain substantial benefits from sparsification.



Examples


example 1

[0171] Next, two computing units (Process Elements, PEs), PE0 and PE1, are used to calculate a matrix-vector multiplication, with compressed column storage (CCS) taken as an example to briefly explain the basic idea of performing the corresponding operations on the hardware of the present invention.
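For reference, a minimal Python sketch of a CCS encoding; the function name and layout are illustrative, not the patent's on-chip format:

```python
import numpy as np

def dense_to_ccs(w):
    """Encode a dense matrix in compressed column storage (CCS): the nonzero
    values column by column, their row indices, and per-column pointers."""
    values, row_idx, col_ptr = [], [], [0]
    rows, cols = w.shape
    for j in range(cols):
        for i in range(rows):
            if w[i, j] != 0:
                values.append(float(w[i, j]))
                row_idx.append(i)
        col_ptr.append(len(values))  # entries of column j end here
    return values, row_idx, col_ptr

# Column j of the original matrix is values[col_ptr[j]:col_ptr[j+1]],
# located at rows row_idx[col_ptr[j]:col_ptr[j+1]].
```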

[0172] The sparsity of the matrices in the compressed GRU is unbalanced, which lowers the utilization of computing resources.

[0173] As shown in Figure 11, suppose the input vector a contains 6 elements {a0, a1, a2, a3, a4, a5} and the weight matrix contains 8×6 elements. The two PEs (PE0 and PE1) are responsible for calculating a3×w[3], where a3 is the fourth element of the input vector and w[3] is the fourth column of the weight matrix.

[0174] As can be seen from Figure 11, the workloads of PE0 and PE1 differ: PE0 performs three multiplication operations, while PE1 performs only one.
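The imbalance can be reproduced with a small sketch, assuming (as is common in comparable designs, though not stated here) that matrix rows are interleaved across the PEs so that PE k owns rows i with i mod 2 = k; the column values are invented for illustration:

```python
import numpy as np

# Illustrative nonzero pattern for column w[3] of the 8x6 matrix:
# PE0 owns the even rows, PE1 the odd rows (assumed interleaving).
w_col3 = np.array([0.5, 0.0, -1.2, 0.7, 0.0, 0.0, 0.3, 0.0])
a3 = 2.0  # fourth element of the input vector

for pe in (0, 1):
    owned = [(i, v) for i, v in enumerate(w_col3) if i % 2 == pe and v != 0.0]
    print(f"PE{pe}: {len(owned)} multiplies ->",
          {i: a3 * v for i, v in owned})
# PE0 performs 3 multiplications, PE1 only 1: the workloads are unbalanced.
```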

[0175] In the prior art, th...

example 2

[0188] This embodiment is intended to explain how the present invention balances IO bandwidth against the computing units.

[0189] If the memory controller's user interface is 512 bits wide and clocked at 250 MHz, the required PE concurrency satisfies 512 × 250 MHz = PE_num × freq_PE × data_bit. With 8-bit fixed-point weights and a PE computing module clocked at 200 MHz, the number of PEs required is 80.
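The arithmetic behind this balance, as a quick check (variable names are ours):

```python
# IO/compute balance: memory interface supply must equal aggregate PE demand,
# interface_bits * mem_freq = PE_num * freq_PE * data_bit.
interface_bits = 512    # memory controller user interface width
mem_freq_hz    = 250e6
data_bit       = 8      # 8-bit fixed-point weights
freq_pe_hz     = 200e6

pe_num = (interface_bits * mem_freq_hz) / (freq_pe_hz * data_bit)
print(pe_num)  # 80.0 -> 80 PEs saturate the available bandwidth
```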

[0190] For a 2048×1024 network with an input of length 1024, the most time-consuming computation under different sparsity levels is still the matrix-vector multiplication. For a sparse GRU network, the calculations of z_t, r_t, and the candidate hidden state h̃_t can be hidden behind the matrix-vector multiplications Wx_t and Uh_{t-1}. Since the subsequent pointwise multiplication and addition operations are serially pipelined, the resources they require are relatively small. In summary, the present invention fully combines sparse matrix-vector multiplication, IO and computation balance, and serial...
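For reference, a plain-Python sketch of one GRU step in the standard formulation (the patent's exact gate equations and ordering may differ): the matrix-vector products on x_t and h_{t-1} dominate the cost, and everything after them is cheap pointwise arithmetic that can be pipelined serially:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    # Heavy part: six (sparse) matrix-vector products, the Wx_t / Uh_{t-1} terms.
    z = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate z_t
    r = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate r_t
    h_cand = np.tanh(Wh @ x_t + Uh @ (r * h_prev))   # candidate state h~_t
    # Light part: pointwise multiply/add, easily pipelined after the matvecs.
    return (1.0 - z) * h_prev + z * h_cand
```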


Abstract

The invention provides a hardware accelerator and method for realizing a sparse GRU neural network based on FPGA. According to the invention, an apparatus for realizing a sparse GRU neural network includes: an input receiving unit for receiving a plurality of input vectors and distributing them to a plurality of computing units; a plurality of computing units (PEs) which acquire input vectors from the input receiving unit, read the weight matrix data of the neural network, decode the weight matrix data, perform matrix computation on the decoded weight matrix data and the input vectors, and output the results of the matrix computation to a hidden layer state computing module; a hidden layer state computing module which acquires the results of the matrix computation from the computing units and computes the state of the hidden layer; and a control unit for global control. In addition, the invention also provides a method for realizing a sparse GRU neural network through iteration.
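A toy software model of this dataflow, with invented names and an assumed row-interleaved partitioning, purely to make the module roles concrete:

```python
import numpy as np

def accelerator_matvec(W, x, num_pes=2):
    """The input receiving unit broadcasts x to every PE; each PE computes
    the partial product for the weight rows it owns (row interleaving is an
    assumption); the hidden-layer state module consumes the assembled result."""
    y = np.zeros(W.shape[0])
    for k in range(num_pes):
        rows = np.arange(k, W.shape[0], num_pes)  # rows owned by PE k
        y[rows] = W[rows, :] @ x                  # PE k's partial matvec
    return y
```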

Description

[0001] This application claims priority to U.S. Patent Application No. 15/242,622, filed August 22, 2016, U.S. Patent Application No. 15/242,624, filed August 22, 2016, and U.S. Patent Application No. 15/242,625, filed August 22, 2016, the entire contents of which are hereby incorporated by reference.

Field of Invention

[0002] The invention relates to the field of artificial intelligence, and in particular to a hardware accelerator and a method for implementing a sparse GRU neural network based on FPGA.

Background Technique

[0003] Introduction to RNNs

[0004] A recurrent neural network (RNN) is a class of artificial neural networks in which the connections between units form directed cycles. This gives the network an internal state that allows it to exhibit dynamic temporal behavior. RNNs can handle variable-length sequences by means of a recurrent hidden state, where the activation at each instant depends on the activation at ...
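In the textbook formulation, that recurrent hidden state is h_t = tanh(W x_t + U h_{t-1} + b); a one-line sketch of the update (names are generic, not the patent's):

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """Vanilla RNN update: the activation at time t depends on the input x_t
    and on the previous activation h_{t-1}, forming a directed cycle."""
    return np.tanh(W @ x_t + U @ h_prev + b)
```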


Application Information

IPC(8): G06N3/04, G06N3/063
CPC: G06N3/063, G06N3/047
Inventor: Dongliang Xie (谢东亮), Song Han (韩松), Yi Shan (单羿)
Owner XILINX INC