A convolutional neural network accelerator based on calculation optimization of an FPGA

Convolutional neural network and accelerator technology, applied to the hardware structure of convolutional neural network accelerators. It addresses the problem of a large amount of redundant calculation, achieving high computing performance, fewer data reads, and better real-time performance.

Pending Publication Date: 2019-04-09
SOUTHEAST UNIV +2

AI Technical Summary

Problems solved by technology

[0006] Based on the foregoing analysis, there is a problem of excessive redundant cal



Example Embodiment

[0029] The technical solutions and beneficial effects of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0030] Figure 1 shows the hardware structure of the convolutional neural network accelerator designed in the present invention. Taking a PE array size of 16*16, a convolution kernel size of 3*3, and a convolution kernel stride of 1 as an example, its working mode is as follows:

[0031] The PC caches the data partitions in the external DDR memory through the PCI-E interface. The data cache area reads the feature map data through the AXI4 bus interface and caches it row by row in three feature map sub-buffer areas; the input index values are cached in the feature map buffer area in the same way. The weight data read through the AXI4 bus interface is buffered sequentially in 16 convolution kernel buffer areas, and the weight index values are buffered in the convolution kernel buffer area in the same way. T...
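The row-wise caching in three feature map sub-buffer areas can be illustrated in software. The following is a minimal sketch, not the patented hardware: it assumes a 3*3 kernel with stride 1, no padding, and the usual CNN convention of not flipping the kernel. The function name `conv3x3_row_buffered` and the use of a `deque` to stand in for the three sub-buffers are illustrative assumptions.

```python
from collections import deque


def conv3x3_row_buffered(feature_map, kernel):
    """3x3, stride-1, unpadded convolution computed from a rolling 3-row buffer.

    Hypothetical software model: `rows` stands in for the three feature map
    sub-buffer areas, each holding one row of the input feature map.
    """
    k = 3
    width = len(feature_map[0])
    rows = deque(maxlen=k)  # the oldest row is dropped when a new row arrives
    out = []
    for row in feature_map:  # one feature map row arrives per step
        rows.append(row)
        if len(rows) < k:
            continue  # not enough rows buffered for a 3x3 window yet
        out_row = []
        for x in range(width - k + 1):
            acc = 0
            for dy in range(k):
                for dx in range(k):
                    acc += rows[dy][x + dx] * kernel[dy][dx]
            out_row.append(acc)
        out.append(out_row)
    return out


if __name__ == "__main__":
    fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(conv3x3_row_buffered(fmap, ones))
```

Only three rows are held at any time, so each feature map row is read from the source once and reused for up to three output rows.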


Abstract

The invention discloses a convolutional neural network accelerator based on calculation optimization of an FPGA. The convolutional neural network accelerator comprises an AXI4 bus interface, a data cache region, a pre-fetched data region, a result cache region, a state controller and a PE array. The data cache region is used for caching feature map data, convolution kernel data and index values read from an external DDR memory through the AXI4 bus interface; the pre-fetched data region is used for pre-fetching, from the feature map sub-cache region, the feature map data that needs to be input into the PE array in parallel; the result cache region is used for caching the calculation result of each row of PEs; the state controller is used for controlling the working state of the accelerator and the transitions between working states; and the PE array is used for reading the data in the pre-fetched data region and the convolution kernel sub-cache region to carry out the convolution operation. The accelerator exploits parameter sparsity, repeated weight data and the ReLU activation function to end redundant calculation early, so that the amount of calculation is reduced, and energy consumption is reduced by lowering the memory access frequency.
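The three sources of redundancy named in the abstract can be sketched in software. The text available here does not describe the actual circuits, so everything below is an assumption made for illustration: the function names are hypothetical, input activations are assumed non-negative (as they are after a preceding ReLU layer), and weights are assumed to be processed with the positive ones first.

```python
from collections import defaultdict


def sparse_mac_with_relu_cutoff(activations, weights):
    """Return (relu_output, multiplies_performed) for one output value.

    Assumes every activation is >= 0. Zero operands are skipped, and the
    accumulation stops once the result is certain to be clipped by ReLU.
    """
    assert all(a >= 0 for a in activations), "sketch assumes non-negative inputs"
    # Zero skipping: keep only pairs in which both operands are nonzero.
    pairs = [(a, w) for a, w in zip(activations, weights) if a != 0 and w != 0]
    # Assumed ordering: positive weights first, negative weights last.
    pairs.sort(key=lambda p: p[1] < 0)
    acc = 0
    mults = 0
    for a, w in pairs:
        if w < 0 and acc <= 0:
            # Only negative contributions remain, so ReLU will output 0.
            return 0, mults
        acc += a * w
        mults += 1
    return max(acc, 0), mults


def shared_weight_dot(activations, weights):
    """Return (dot_product, multiplies_performed) using repeated weights.

    Activations that share the same weight value are summed first, so each
    distinct nonzero weight needs only one multiplication.
    """
    sums = defaultdict(int)
    for a, w in zip(activations, weights):
        if a != 0 and w != 0:
            sums[w] += a
    result = sum(w * s for w, s in sums.items())
    return result, len(sums)


if __name__ == "__main__":
    print(sparse_mac_with_relu_cutoff([1, 2, 3, 4], [1, -2, -1, -1]))
    print(shared_weight_dot([1, 2, 3, 4], [2, 2, -1, 2]))
```

In the first call a dense computation would need four multiplications, while the sketch stops after two because the partial sum is already non-positive and only negative weights remain. In the second call three of the four weights are equal, so two multiplications suffice.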

Description

Technical field

[0001] The invention belongs to the field of electronic information and deep learning, and in particular relates to a computation-optimized convolutional neural network accelerator hardware structure based on an FPGA (Field Programmable Gate Array).

Background technique

[0002] In recent years, the use of deep neural networks has grown rapidly and has had a significant impact on the world's economic and social activities. Deep convolutional neural network technology has received widespread attention in many machine learning fields, including speech recognition, natural language processing, and intelligent image processing. In the field of image recognition in particular, deep convolutional neural networks have achieved remarkable results. In these domains, deep convolutional neural networks are able to achieve superhuman accuracy. The excellence of deep convolutional neural networks stems from their ability to extract high-level features from raw data after performing stati...


Application Information

IPC(8): G06N3/04, G06N3/063
CPC: G06N3/063, G06N3/045, Y02D10/00
Inventor: 陆生礼, 庞伟, 舒程昊, 范雪梅, 吴成路, 邹涛
Owner SOUTHEAST UNIV