Parallel method and device for convolution calculation and data loading of neural network accelerator

A neural network accelerator technology in the field of neural network computing. It addresses the problems that large neural networks suffer long inference times, degraded performance, and increased chip area, with the effect of improving efficiency and real-time performance, reducing cache space, and shrinking chip area.

Active Publication Date: 2021-11-02
ZHEJIANG LAB +1


Problems solved by technology

The principle of the existing technology is simple, direct, and easy to understand, but it does not fully exploit the parallelism among the various operations. As a result, inference times for large neural networks become excessively long; for object-detection networks with strict real-time requirements in particular, the degraded real-time performance causes the network to miss detections and thus seriously harms overall performance.
At the same time, because the existing technology lacks parallelism, a large number of convolution kernels must be kept on-chip to sustain the calculation. This enlarges the on-chip convolution kernel cache, which in turn increases the chip area and the cost.




Detailed Description of the Embodiments

[0043] Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.

[0044] As shown in Figure 1, the parallel scheme for convolution calculation and data loading in a neural network accelerator comprises: input feature map loading; loading of 64 convolution kernels; loading of 64 convolution kernels in parallel with convolution calculation; and loading of the next layer's 64 convolution kernels in parallel with convolution calculation.

[0045] Input feature map loading covers all of the input feature map channels used in the convolution calculation. Loading of 64 convolution kernels covers both the case of loading exactly 64 kernels and the case where the total number of kernels is less than 64. Loading of 64 convolution kernels in parallel with convolution calculation covers the parallel operati...
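The overlap described in [0044] and [0045] can be illustrated in software. The following is a minimal sketch, assuming a ping-pong pair of on-chip kernel caches and groups of 64 kernels; the buffer layout, the load routine, and the convolution stand-in are hypothetical placeholders rather than the patent's actual hardware interfaces.

    import threading

    GROUP = 64  # kernels loaded per group, as in the embodiment

    def load_kernels(dram_kernels, start, buf):
        # Copy up to GROUP kernels [start, start + GROUP) from off-chip
        # memory into an on-chip cache block; a final short group (< 64)
        # is handled naturally by the slice.
        buf.clear()
        buf.extend(dram_kernels[start:start + GROUP])

    def convolve(feature_maps, kernel_buf, outputs):
        # Stand-in for the accelerator's convolution: one output per kernel.
        for k in kernel_buf:
            outputs.append(sum(x * k for x in feature_maps))

    def run_layer(feature_maps, dram_kernels):
        outputs = []
        bufs = [[], []]                          # ping-pong kernel caches
        load_kernels(dram_kernels, 0, bufs[0])   # first group loads up front
        n_groups = (len(dram_kernels) + GROUP - 1) // GROUP
        for g in range(n_groups):
            cur, nxt = bufs[g % 2], bufs[(g + 1) % 2]
            loader = None
            if g + 1 < n_groups:
                # Load the next 64-kernel group while computing with the
                # current group.
                loader = threading.Thread(
                    target=load_kernels,
                    args=(dram_kernels, (g + 1) * GROUP, nxt))
                loader.start()
            convolve(feature_maps, cur, outputs)
            if loader:
                loader.join()  # next group must be resident before reuse
        return outputs

Calling run_layer with, for example, 130 kernels exercises all three cases: a full first group loaded up front, a middle group loaded while the previous group is being computed, and a final short group of only 2 kernels.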



Abstract

The invention discloses a parallel method and device for convolution calculation and data loading in a neural network accelerator. The parallel mode requires two cache blocks each for the input feature maps and for the convolution kernels; the input feature maps and groups of 64 convolution kernels are written into the cache sub-blocks in turn according to the load length, so that convolution is computed efficiently while the next group of 64 convolution kernels is being loaded. A typical neural network requires hundreds or thousands of convolution kernels, and storing all of them on-chip demands a large amount of storage space, which enlarges the chip area and correspondingly raises the manufacturing cost. The present invention significantly reduces on-chip storage while keeping the convolution calculation efficient, thereby reducing chip area and, in turn, chip cost. The method is simple to implement, flexible, and controllable, and is independent of the number of layers in the neural network.
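A back-of-the-envelope calculation makes the storage claim concrete. The layer dimensions below are assumptions chosen only for illustration (512 kernels of shape 3x3x256 with 8-bit weights); they are not figures from the patent.

    KERNELS, KH, KW, CIN, BYTES = 512, 3, 3, 256, 1  # assumed layer shape

    # Option A: keep every kernel of the layer resident on-chip.
    all_on_chip = KERNELS * KH * KW * CIN * BYTES
    # Option B: the two 64-kernel ping-pong cache blocks described above.
    double_buffer = 2 * 64 * KH * KW * CIN * BYTES

    print(f"all kernels on-chip: {all_on_chip / 1024:.0f} KiB")    # 1152 KiB
    print(f"ping-pong buffers:   {double_buffer / 1024:.0f} KiB")  # 288 KiB

Under these assumptions the ping-pong scheme needs only a quarter of the kernel storage for this layer, and the saving grows with the kernel count, since the on-chip requirement stays fixed at two groups of 64 kernels.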

Description

Technical Field

[0001] The invention relates to the field of neural network computing, and in particular to a parallel method and device for convolution calculation and data loading of a neural network accelerator.

Background

[0002] Ever since neural networks first appeared, they have been a research hotspot in academia and industry. As research has deepened, a wide variety of neural networks has been proposed, including large networks that are hundreds of layers deep with enormous numbers of parameters. At present, most of this work is carried out on graphics processing units (GPUs). GPUs are easy to operate and program and have inherent advantages for training neural networks, but they also have shortcomings: for some inference applications, and especially for large-scale terminal deployment, their power consumption and cost are prohibitively high.

[0003] In March 2016, Google's AlphaGo defeated the Go world champion and professional nine-dan player Lee Sedol. Since then, research...


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F15/78G06N3/04G06N3/063
CPCG06F15/781G06F15/7817G06N3/063G06N3/045G06F9/5066G06F2209/509G06F9/5038G06F9/5016G06N3/0464G06N3/0495G06F9/5027G06N3/04
Inventors: 朱国权, 陆启明, 凡军海, 杨方超, 金孝飞, 孙世春, 章明, 何煜坤, 马德, 胡有能
Owner: ZHEJIANG LAB