Parallel method and device for convolution calculation and data loading of neural network accelerator

A neural network accelerator technology in the field of neural network computing. It addresses the problems that large neural networks suffer long inference times, degraded performance, and increased chip area, with the effect of improving efficiency and real-time performance, reducing cache space, and shrinking chip area.

Active Publication Date: 2021-11-02
ZHEJIANG LAB +1


Problems solved by technology

The principle of the existing technology is simple, direct, and easy to understand, but it does not fully exploit the parallelism among the various operations. As a result, inference times for large neural networks become excessively long; for object-detection networks with strict real-time requirements in particular, the degraded real-time performance causes the network to miss detections and thus seriously harms overall performance.
At the same time, because the existing technology lacks parallelism, a large number of convolution kernels must be kept on-chip to sustain the calculation. This enlarges the on-chip convolution kernel cache, which in turn increases the chip area and the cost.




Detailed Description of the Embodiments

[0043] Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.

[0044] As shown in Figure 1, the parallel scheme for convolution calculation and data loading in a neural network accelerator comprises: input feature map loading; loading of 64 convolution kernels; loading of 64 convolution kernels in parallel with convolution calculation; and loading of the next layer's 64 convolution kernels in parallel with convolution calculation.

[0045] Input feature map loading covers all of the input feature map channels used in the convolution calculation. Loading of 64 convolution kernels covers both the case of loading exactly 64 kernels and the case where the total number of kernels is less than 64. Loading of 64 convolution kernels in parallel with convolution calculation covers the parallel operati...
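The overlap described in [0044] and [0045] can be illustrated in software. The following is a minimal sketch, assuming a ping-pong pair of on-chip kernel caches and groups of 64 kernels; the buffer layout, the load routine, and the convolution stand-in are hypothetical placeholders rather than the patent's actual hardware interfaces.

    import threading

    GROUP = 64  # kernels loaded per group, as in the embodiment

    def load_kernels(dram_kernels, start, buf):
        # Copy up to GROUP kernels [start, start + GROUP) from off-chip
        # memory into an on-chip cache block; a final short group (< 64)
        # is handled naturally by the slice.
        buf.clear()
        buf.extend(dram_kernels[start:start + GROUP])

    def convolve(feature_maps, kernel_buf, outputs):
        # Stand-in for the accelerator's convolution: one output per kernel.
        for k in kernel_buf:
            outputs.append(sum(x * k for x in feature_maps))

    def run_layer(feature_maps, dram_kernels):
        outputs = []
        bufs = [[], []]                          # ping-pong kernel caches
        load_kernels(dram_kernels, 0, bufs[0])   # first group loads up front
        n_groups = (len(dram_kernels) + GROUP - 1) // GROUP
        for g in range(n_groups):
            cur, nxt = bufs[g % 2], bufs[(g + 1) % 2]
            loader = None
            if g + 1 < n_groups:
                # Load the next 64-kernel group while computing with the
                # current group.
                loader = threading.Thread(
                    target=load_kernels,
                    args=(dram_kernels, (g + 1) * GROUP, nxt))
                loader.start()
            convolve(feature_maps, cur, outputs)
            if loader:
                loader.join()  # next group must be resident before reuse
        return outputs

Calling run_layer with, for example, 130 kernels exercises all three cases: a full first group loaded up front, a middle group loaded while the previous group is being computed, and a final short group of only 2 kernels.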



Abstract

The invention discloses a parallel method and device for convolution calculation and data loading in a neural network accelerator. The parallel mode requires two cache blocks each for the input feature maps and for the convolution kernels; the input feature maps and groups of 64 convolution kernels are written into the cache sub-blocks in turn according to the load length, so that convolution is computed efficiently while the next group of 64 convolution kernels is being loaded. A typical neural network requires hundreds or thousands of convolution kernels, and storing all of them on-chip demands a large amount of storage space, which enlarges the chip area and correspondingly raises the manufacturing cost. The present invention significantly reduces on-chip storage while keeping the convolution calculation efficient, thereby reducing chip area and, in turn, chip cost. The method is simple to implement, flexible, and controllable, and is independent of the number of layers in the neural network.
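A back-of-the-envelope calculation makes the storage claim concrete. The layer dimensions below are assumptions chosen only for illustration (512 kernels of shape 3x3x256 with 8-bit weights); they are not figures from the patent.

    KERNELS, KH, KW, CIN, BYTES = 512, 3, 3, 256, 1  # assumed layer shape

    # Option A: keep every kernel of the layer resident on-chip.
    all_on_chip = KERNELS * KH * KW * CIN * BYTES
    # Option B: the two 64-kernel ping-pong cache blocks described above.
    double_buffer = 2 * 64 * KH * KW * CIN * BYTES

    print(f"all kernels on-chip: {all_on_chip / 1024:.0f} KiB")    # 1152 KiB
    print(f"ping-pong buffers:   {double_buffer / 1024:.0f} KiB")  # 288 KiB

Under these assumptions the ping-pong scheme needs only a quarter of the kernel storage for this layer, and the saving grows with the kernel count, since the on-chip requirement stays fixed at two groups of 64 kernels.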

Description

Technical Field

[0001] The invention relates to the field of neural network computing, and in particular to a parallel method and device for convolution calculation and data loading of a neural network accelerator.

Background

[0002] Ever since neural networks first appeared, they have been a research hotspot in academia and industry. As research has deepened, a wide variety of neural networks has been proposed, including large networks that are hundreds of layers deep with enormous numbers of parameters. At present, most of this work is carried out on graphics processing units (GPUs). GPUs are easy to operate and program and have inherent advantages for training neural networks, but they also have shortcomings: for some inference applications, and especially for large-scale terminal deployment, their power consumption and cost are prohibitively high.

[0003] In March 2016, Google's AlphaGo defeated the Go world champion and professional nine-dan player Lee Sedol. Since then, research...


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F15/78G06N3/04G06N3/063
CPCG06F15/781G06F15/7817G06N3/063G06N3/045G06F9/5066G06F2209/509G06F9/5038G06F9/5016G06N3/0464G06N3/0495G06F9/5027G06N3/04
Inventors: 朱国权, 陆启明, 凡军海, 杨方超, 金孝飞, 孙世春, 章明, 何煜坤, 马德, 胡有能
Owner: ZHEJIANG LAB