A neural network acceleration system based on a block cyclic sparse matrix

A neural network and sparse matrix technology, applied in the field of neural network acceleration systems based on block cyclic sparse matrices. It addresses problems such as the inability to effectively exploit the sparsity of excitations and weights, irregular operations, and load imbalance. It improves processing energy efficiency and throughput, reduces storage capacity requirements, and reduces excessive memory access.

Active Publication Date: 2019-03-15
NANJING UNIV
Cites: 5; cited by: 25

AI Technical Summary

Problems solved by technology

Accelerator architectures for sparse neural networks suffer from load imbalance caused by irregular operations.
While the accelerati
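The load-imbalance problem can be illustrated with a small simulation. This is a hypothetical sketch, not the patented design: it assumes rows are interleaved across processing elements (PEs) and that each PE's work is proportional to the number of non-zero weights it holds. `pe_workloads`, `random_sparse` and `block_pruned` are illustrative names.

```python
import random

def pe_workloads(weight, num_pes):
    """Non-zero weights per PE when row r is assigned to PE r % num_pes."""
    loads = [0] * num_pes
    for r, row in enumerate(weight):
        loads[r % num_pes] += sum(1 for w in row if w != 0)
    return loads

def random_sparse(rows, cols, density, seed=0):
    """Unstructured pruning: each weight survives independently with prob. density."""
    rng = random.Random(seed)
    return [[rng.uniform(0.1, 1.0) if rng.random() < density else 0.0
             for _ in range(cols)] for _ in range(rows)]

def block_pruned(rows, cols, block, keep_per_block_row, seed=0):
    """Structured pruning: every block row keeps the same number of dense blocks."""
    rng = random.Random(seed)
    w = [[0.0] * cols for _ in range(rows)]
    for br in range(rows // block):
        for bc in rng.sample(range(cols // block), keep_per_block_row):
            for i in range(block):
                for j in range(block):
                    w[br * block + i][bc * block + j] = rng.uniform(0.1, 1.0)
    return w

print("unstructured:", pe_workloads(random_sparse(16, 16, 0.5), 4))
print("block-structured:", pe_workloads(block_pruned(16, 16, 4, 2), 4))
```

In the block-structured case every row holds exactly 8 non-zeros, so each of the 4 PEs receives 32, while the unstructured counts generally differ from PE to PE.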




Embodiment Construction

[0030] The solution of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0031] As shown in Figure 1, the accelerator system of this embodiment combines two compression methods, cyclic (circulant) block structuring and sparsification, and exploits the properties of the compressed neural network for acceleration. The architecture makes effective use of the compressed weights and excitations, which gives it high throughput and low latency.
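To make the combination of cyclic block structure and sparsification concrete, here is a minimal NumPy sketch. It assumes, as an illustration rather than a statement of the patented format, that each k×k block of W is circulant (fully defined by one length-k vector) and that sparsification removes whole blocks. `circulant`, `block_circulant_matvec` and `assemble_dense` are hypothetical names. Each kept block is applied with an FFT-based circular convolution, a common way to exploit circulant structure; the source does not say whether the hardware uses FFTs.

```python
import numpy as np

def circulant(c):
    """k x k circulant matrix with first column c: C[i, j] = c[(i - j) % k]."""
    k = len(c)
    return np.array([[c[(i - j) % k] for j in range(k)] for i in range(k)], dtype=float)

def block_circulant_matvec(blocks, x, k):
    """blocks[p][q] is the defining vector of block (p, q), or None if the
    block was pruned to zero. Computes W @ x without forming W."""
    y = np.zeros(len(blocks) * k)
    for p, row in enumerate(blocks):
        for q, c in enumerate(row):
            if c is None:  # pruned block: no storage, no multiplications
                continue
            xq = x[q * k:(q + 1) * k]
            # circulant block times vector == circular convolution of c and xq
            y[p * k:(p + 1) * k] += np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(xq)))
    return y

def assemble_dense(blocks, k):
    """Dense W, built only to check the compressed computation."""
    return np.vstack([
        np.hstack([circulant(c) if c is not None else np.zeros((k, k)) for c in row])
        for row in blocks
    ])

rng = np.random.default_rng(0)
k = 4
blocks = [[rng.standard_normal(k), None],
          [None, rng.standard_normal(k)],
          [rng.standard_normal(k), rng.standard_normal(k)]]
x = rng.standard_normal(2 * k)
assert np.allclose(block_circulant_matvec(blocks, x, k), assemble_dense(blocks, k) @ x)
```

Storing one length-k vector per kept block cuts weight storage from k² to k values per block, and pruned blocks cost neither storage nor arithmetic.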

[0032] The fully connected layer is computed as follows:

[0033] $y = f(Wa + b)$    (1)

[0034] where a is the input excitation vector, y is the output vector, b is the bias, f is the nonlinear function, and W is the weight matrix.
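As a baseline for the accelerated versions, formula (1) can be evaluated directly with NumPy. This is a reference sketch only: the source does not name the nonlinearity f, so ReLU is assumed here, and `fully_connected` is a hypothetical helper.

```python
import numpy as np

def fully_connected(W, a, b, f=lambda z: np.maximum(z, 0.0)):
    """Dense reference for y = f(W a + b); f defaults to ReLU (an assumption)."""
    return f(W @ a + b)

W = np.array([[1.0, -2.0], [0.5, 1.0]])
a = np.array([3.0, 1.0])
b = np.array([-2.0, -1.0])
print(fully_connected(W, a, b))  # W a + b = [-1.0, 1.5], ReLU gives [0.0, 1.5]
```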

[0035] Each element of the output vector y in formula (1) can be computed as:

[0036] $y_i = f\left(\sum_{j} W_{ij} a_j + b_i\right)$
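Because each output element is an independent dot product, zero terms contribute nothing and can be skipped. The pure-Python sketch below drops zero excitations before the loop and skips zero weights inside it, counting the multiply-accumulate (MAC) operations actually performed. ReLU as f and the helper name `sparse_fc` are assumptions for illustration.

```python
def relu(z):
    return max(z, 0.0)

def sparse_fc(W, a, b, f=relu):
    """Computes y_i = f(sum_j W[i][j] * a[j] + b[i]), skipping zero operands.
    Returns (y, number of MACs executed)."""
    nz_a = [(j, aj) for j, aj in enumerate(a) if aj != 0]  # excitation sparsity
    y, macs = [], 0
    for i, row in enumerate(W):
        acc = b[i]
        for j, aj in nz_a:
            w = row[j]
            if w != 0:  # weight sparsity
                acc += w * aj
                macs += 1
        y.append(f(acc))
    return y, macs

W = [[0.0, 2.0, 0.0], [1.0, 0.0, 3.0]]
a = [4.0, 0.0, 1.0]
b = [0.0, -1.0]
print(sparse_fc(W, a, b))  # ([0.0, 6.0], 2)
```

Here only 2 of the 6 MACs of the dense computation are executed.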

[0037] Therefore, the main operations of the fully connected layer are divided into: mat...



Abstract

The invention relates to a neural network acceleration system based on a block cyclic sparse matrix. The system comprises: a scalable processing unit array, which stores part of the neural network's weights and performs decoding and computation on the compressed network; a main controller, which is mainly responsible for controlling the computation flow; and an excitation distribution unit, which distributes non-zero operand data to the scalable processing unit array under the control of the main controller. The beneficial effects are as follows. The properties of the block cyclic sparse matrix are exploited effectively, which solves the load imbalance of sparse matrix-vector multiplication and improves the utilization of the computing units. By exploiting the sparsity of both excitations and weights, on-chip storage usage is reduced and redundant operations are skipped, which increases the throughput of the hardware accelerator and meets the real-time requirements of deep neural network processing.
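The division of labor among the three components can be mimicked with a small software model. Everything below is a hypothetical sketch: `ProcessingElement`, `distribute_nonzero` and `run_layer` are illustrative names, rows are assigned to PEs by interleaving, weights are kept uncompressed (the real PEs decode a compressed format), and ReLU is assumed as the nonlinearity. The distribution step forwards only non-zero excitations, and each PE skips zero weights.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingElement:
    """Hypothetical PE: holds some weight rows and one accumulator per row."""
    rows: dict  # output index -> list of weights for that row
    acc: dict = field(default_factory=dict)
    macs: int = 0

    def consume(self, j, aj):
        for i, w_row in self.rows.items():
            w = w_row[j]
            if w != 0:  # skip zero weights
                self.acc[i] = self.acc.get(i, 0.0) + w * aj
                self.macs += 1

def distribute_nonzero(a):
    """Excitation distribution unit (sketch): emit only non-zero excitations."""
    for j, aj in enumerate(a):
        if aj != 0:
            yield j, aj

def run_layer(W, a, b, num_pes):
    """Main controller (sketch): partition rows, broadcast excitations, collect y."""
    pes = [ProcessingElement(rows={i: W[i] for i in range(p, len(W), num_pes)})
           for p in range(num_pes)]
    for j, aj in distribute_nonzero(a):
        for pe in pes:  # broadcast each non-zero excitation to every PE
            pe.consume(j, aj)
    y = [0.0] * len(W)
    for pe in pes:
        for i in pe.rows:
            y[i] = max(pe.acc.get(i, 0.0) + b[i], 0.0)  # bias + ReLU (assumed)
    return y, [pe.macs for pe in pes]

W = [[1, 0, 2, 0], [0, 3, 0, 1], [2, 0, 0, 1], [0, 1, 1, 0]]
y, macs = run_layer(W, a=[1.0, 0.0, 2.0, 3.0], b=[0.0] * 4, num_pes=2)
print(y, macs)  # [5.0, 3.0, 5.0, 2.0] [4, 2]
```

With this unstructured sparse W the two PEs perform 4 and 2 MACs, so one PE is idle half the time; this is the kind of load imbalance the block cyclic structure is intended to remove.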

Description

Technical field
[0001] The invention relates to the field of neural network hardware acceleration, and in particular to a neural network acceleration system based on a block cyclic sparse matrix.

Background
[0002] Deep neural networks have received extensive attention from academia and industry because they currently deliver the best results in artificial intelligence applications such as image recognition. Deep neural networks keep growing in scale, and large networks have high computational complexity and a large number of parameters. At the same time, the limited performance and energy efficiency of traditional processors make it difficult to deploy large convolutional neural networks on embedded or terminal devices. In resource-constrained systems such as embedded systems, processor energy efficiency is critical. Therefore, under the premise of maintaining the recognition accuracy of the d...


Application Information

IPC (8): G06N3/04; G06N3/063
CPC: G06N3/063; G06N3/045
Inventors: 潘红兵, 秦子迪, 朱志炜, 郭良蛟, 查弈, 陈轩, 沈庆宏
Owner: NANJING UNIV