
Reconfigurable general standard convolution accelerator design method based on HLS

A design method for a general standard convolution accelerator, applied to neural architectures, physical implementation, biological neural network models, etc. It addresses the insufficient parallelism of existing convolutional neural network designs, the difficulty of practical deployment on embedded devices, and the low resource density of early FPGA chips, achieving the effects of improving overall computing speed, reducing system power consumption, and saving design time.

Pending Publication Date: 2021-01-29
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

With the continuing wave of artificial intelligence, advanced algorithms have emerged in rapid succession: AlexNet, VGG, ResNet, and GoogLeNet in image classification; R-CNN, Faster R-CNN, SSD, and YOLO in object detection. Network accuracy keeps rising, but network structures are becoming more complex and larger in scale, so both training and the post-training feed-forward (inference) process have become very slow.
Large-scale networks rely on GPU servers for computation, but for mobile and edge devices such large network structures make practical deployment on embedded hardware difficult.
In addition, advances in chip manufacturing continue to improve CPU and GPU performance, and there are chips aimed at the embedded field, such as Nvidia's TX1 and TX2, which offer relatively good performance and power consumption; however, their power consumption and size remain comparatively large and cannot meet the industry's urgent needs.
[0003] Regarding FPGA acceleration of network models, as early as 1996 Cloutier et al. studied accelerating convolutional neural networks on FPGAs and implemented handwritten-letter recognition. Limited by the chip manufacturing process of the time, the resource density of FPGA chips was very low and the parallelism of convolutional neural networks was not fully exploited, so no good experimental results were obtained and network execution was slow (article title: "VIP: An FPGA-based processor for image processing and neural networks"; conference: Fifth International Conference on Microelectronics for Neural Networks and Fuzzy Systems (MicroNeuro'96), Lausanne, Switzerland; authors: Cloutier J, Cosatto E, Pigeon S, et al.; publication year: 1996; pages: 330-336).




Embodiment Construction

[0030] The present invention will be described in further detail below in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0031] Figure 1 is a block diagram of the overall accelerator of the present invention. The loop-blocking factors are q and p, and the degree of parallelism can be controlled by changing the sizes of q and p. For the weight data, this embodiment transfers only the weights required for the current calculation to on-chip BRAM, so the on-chip BRAM size is p*q*k*k, where k is the size of the convolution kernel. For the feature-map input, the "Line Buffer" in the library functions is used to build a cache structure of q*k rows, and the input data is cached by shifting and inserting. Taking the 3*3 convolution calculation as an example, use the "...
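The loop-blocking scheme above can be sketched in plain C++ (a minimal illustration only, not the patent's actual HLS source; `conv_tile`, and the fixed tile sizes `P`, `Q`, `K` are hypothetical names chosen here). The weight tile matches the text's `p*q*k*k` on-chip BRAM sizing, and the loops that would carry `#pragma HLS UNROLL` in an HLS design are marked in comments:

```cpp
#include <cassert>

// Hypothetical tile sizes standing in for the patent's loop-blocking
// factors q (input-channel tile) and p (output-channel tile).
constexpr int Q = 2;   // input channels processed in parallel
constexpr int P = 2;   // output channels processed in parallel
constexpr int K = 3;   // convolution kernel size

// Computes one output pixel for P output channels, given a K*K window
// for each of Q input channels and a P*Q*K*K weight tile (the on-chip
// BRAM of the text). In HLS, the po and qi loops would carry
// #pragma HLS UNROLL so those dimensions execute in parallel.
void conv_tile(const float window[Q][K][K],
               const float weights[P][Q][K][K],
               float out[P]) {
    for (int po = 0; po < P; ++po) {        // unrolled in hardware
        float acc = 0.0f;
        for (int qi = 0; qi < Q; ++qi)      // unrolled in hardware
            for (int r = 0; r < K; ++r)
                for (int c = 0; c < K; ++c)
                    acc += window[qi][r][c] * weights[po][qi][r][c];
        out[po] = acc;
    }
}
```

Changing `P` and `Q` rescales both the weight-tile footprint and the number of parallel multiply-accumulate units, which is the reconfigurability the blocking factors provide.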



Abstract

The invention discloses an HLS-based reconfigurable general standard convolution accelerator design method, comprising the following steps: analyze the computational characteristics of standard convolution to obtain parallelism at four scales (within and across convolution kernels, and within and across feature maps), and parallelize the convolution calculation using loop-tiling and loop-unrolling techniques to obtain a reconfigurable general standard convolution accelerator; introduce loop-blocking factors to control the parallelization scale; design a weight/bias storage structure, a feature-map row-cache structure, and a multi-stage output-cache structure for the accelerator's on-chip storage, and a multi-channel parallel read-write structure using the AXI4 bus; for 3*3 and 1*1 convolution calculations, design a reusable convolution calculation method that effectively saves hardware resources; and design a five-stage pipelined structure for the convolution-layer calculation process, improving convolution calculation efficiency.
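One plausible reading of the abstract's reusable 3*3/1*1 method is to embed the 1*1 weight at the centre of a zeroed 3*3 tile, so the same 3*3 multiply-accumulate array serves both kernel sizes. The sketch below illustrates that idea only; `embed_1x1` and `mac3x3` are hypothetical names, and the patent's actual sharing circuit may differ:

```cpp
#include <cassert>

constexpr int K = 3;   // shared MAC array is K*K

// Place a single 1*1 weight at the centre tap of a zeroed K*K tile,
// so the 1*1 case reuses the K*K datapath unchanged.
void embed_1x1(float w, float tile[K][K]) {
    for (int r = 0; r < K; ++r)
        for (int c = 0; c < K; ++c)
            tile[r][c] = 0.0f;
    tile[K / 2][K / 2] = w;   // centre tap carries the 1*1 weight
}

// The shared K*K multiply-accumulate over one input window; both the
// 3*3 and the embedded 1*1 case flow through this same function.
float mac3x3(const float window[K][K], const float tile[K][K]) {
    float acc = 0.0f;
    for (int r = 0; r < K; ++r)
        for (int c = 0; c < K; ++c)
            acc += window[r][c] * tile[r][c];
    return acc;
}
```

The zero taps waste a few multiplications per window, but the hardware saving of a single shared MAC array is the point the abstract makes.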

Description

technical field
[0001] The invention relates to the field of hardware-accelerator design for convolution calculation in convolutional neural networks, in particular to an HLS-based reconfigurable general standard convolution accelerator design method.
Background technique
[0002] Convolutional neural networks are increasingly applied in real life, and their practical value is continually being highlighted; they have very important applications in fields such as image classification, object detection, semantic segmentation, and speech recognition. With the continuing wave of artificial intelligence, advanced algorithms have emerged in rapid succession: AlexNet, VGG, ResNet, and GoogLeNet in image classification; R-CNN, Faster R-CNN, SSD, and YOLO in object detection. Network accuracy keeps rising, but the structure of the network is becoming more complex and the scale of the networ...


Application Information

IPC(8): G06N3/063; G06N3/04
CPC: G06N3/063; G06N3/045; Y02D10/00
Inventor: 马书根 (Ma Shugen), 王龙海 (Wang Longhai), 任超 (Ren Chao)
Owner: TIANJIN UNIV