A Convolutional Neural Network Accelerator Based on FPGA
A convolutional neural network accelerator technology, applied in the field of convolutional neural network accelerators. It addresses two problems: GPU-based acceleration solutions consume too much power, and general-purpose processors struggle to meet high-performance requirements. The design achieves a flexible, low-power accelerator built around a dedicated address generator.
Examples
Embodiment 1
[0052] A CNN is a highly parallel network whose layers are independent of one another. A general-purpose processor implements a CNN serially, so its performance on such a highly parallel workload is clearly poor; the present invention is instead based on an FPGA, whose inherent parallelism suits the high parallelism of the network, so an FPGA-based implementation can meet the high-performance requirement. As for GPU-based acceleration schemes: although a GPU does process in parallel, its high power consumption makes it difficult to deploy on resource-constrained embedded devices, whereas the FPGA is a low-power acceleration platform that satisfies the resource constraints of embedded devices.
[0053] As shown in Figure 2, this embodiment provides an FPGA-based convolutional neural network accelerator. The accelerator in this embodiment is a hardware structure that realizes the functions of a CNN netw...
Embodiment 2
[0087] To better illustrate the structure of the array-processor-based convolutional neural network accelerator of the present invention, and the reconfigurability of the operation processing units in the accelerator, the structure and information-processing flow of the LENET-5 neural network are described below as an example.
[0088] As shown in Figure 1A, the first layer of the LENET-5 network comprises 6 convolution kernels and 6 pooling layers. The processing method is to convolve the original image and then apply average pooling, yielding 6 feature maps.
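The convolve-then-average-pool step of the first layer can be sketched in plain Python/NumPy. This is a minimal software illustration, not the patented hardware structure; the 5*5 kernel size is inferred from the 32*32 input and 28*28 feature-map dimensions given in Embodiment 3.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: no padding, stride 1."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool_2x2(fmap):
    """Non-overlapping 2x2 average pooling."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((32, 32))                      # original input image
kernels = [rng.random((5, 5)) for _ in range(6)]  # 6 first-layer kernels
feature_maps = [avg_pool_2x2(conv2d_valid(image, k)) for k in kernels]
# 32x32 --conv 5x5--> 28x28 --avg pool 2x2--> 14x14, six maps in total
```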
[0089] As shown in Figure 1B, the second layer comprises 6*12 convolution kernels. The processing method is: for each of the 12 rows of kernels, the 6 feature maps are multiplied by the 6 corresponding convolution kernels and the products are summed to output one result; specifically, 6 caches hold the result of multiplying the feature maps output by the first layer with the corresponding convolution kernels in the ...
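The multiply-then-accumulate behaviour of the second layer can be modelled in software as follows. This is an illustrative sketch only: `conv2d_valid` is a naive convolution standing in for the hardware multiplier array, and the 14*14 and 5*5 sizes follow the dimensions given in Embodiment 3.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: no padding, stride 1."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
inputs = [rng.random((14, 14)) for _ in range(6)]  # 6 first-layer feature maps
# 6*12 kernels: 12 rows of 6 kernels each
kernels = [[rng.random((5, 5)) for _ in range(6)] for _ in range(12)]

outputs = []
for row in kernels:                   # one row of 6 kernels -> one output map
    acc = np.zeros((10, 10))          # 14 - 5 + 1 = 10
    for fmap, k in zip(inputs, row):
        acc += conv2d_valid(fmap, k)  # multiply each map by its kernel, then add
    outputs.append(acc)
# 12 output feature maps of 10x10
```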
Embodiment 3
[0115] In the structure shown in Figure 1A, the original input image is 32*32 and the first convolution layer has 6 convolution kernels. Convolution produces 6 feature maps of 28*28; after average pooling of these 6 maps, 6 feature maps of 14*14 are obtained. The second convolution layer has 6*12 kernels. Note that the second layer's convolution is not a simple per-map multiplication: the 6 feature maps are multiplied by 6 convolution kernels and the results are then added together (the first-layer PEs exchange no data, while in the second layer there is data exchange, e.g. the other 5 results are accumulated onto the first PE) to output one feature map. There are 12 such operations in total, finally outputting 12 feature maps of 10*10, which after average pooling yield 12 feature maps of 5*5.
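The layer-size arithmetic in this paragraph can be checked with a short trace, assuming 5*5 "valid" convolutions (stride 1, no padding) and non-overlapping 2*2 pooling, which are consistent with the dimensions given above:

```python
def conv_out(n, k=5):
    """Output width of a 'valid' convolution with a k x k kernel, stride 1."""
    return n - k + 1

def pool_out(n, p=2):
    """Output width of non-overlapping p x p pooling."""
    return n // p

size = 32                # original input image: 32*32
size = conv_out(size)    # first convolution layer  -> 28
size = pool_out(size)    # first average pooling    -> 14
size = conv_out(size)    # second convolution layer -> 10
size = pool_out(size)    # second average pooling   -> 5
print(size)              # prints 5
```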
[0116] The present invention is based on the design scheme of FPGA neural...