A Convolutional Neural Network Accelerator Based on FPGA
A convolutional neural network accelerator technology, applied in the field of convolutional neural network accelerators. It addresses two problems: GPU-based acceleration solutions consume too much power, and general-purpose processors struggle to meet high-performance requirements. The design achieves a flexible, low-power accelerator built around a dedicated address generator.
Examples
Embodiment 1
[0052] A CNN is a highly parallel network whose layers are independent of one another. A general-purpose processor implements a CNN serially, so its performance on such a highly parallel workload is clearly poor; the present invention is instead based on an FPGA, whose inherent parallelism suits the high parallelism of the network, so an FPGA-based implementation can meet the high-performance requirement. As for GPU-based acceleration schemes: although a GPU does process in parallel, its high power consumption makes it difficult to deploy on resource-constrained embedded devices, whereas the FPGA is a low-power acceleration platform that satisfies the resource constraints of embedded devices.
[0053] As shown in Figure 2, this embodiment provides an FPGA-based convolutional neural network accelerator. The accelerator in this embodiment is a hardware structure that realizes the functions of a CNN netw...
Embodiment 2
[0087] To better illustrate the structure of the array-processor-based convolutional neural network accelerator of the present invention, and the reconfigurability of the operation processing units in the accelerator, the structure and information-processing flow of the LENET-5 neural network are described below as an example.
[0088] As shown in Figure 1A, the first layer of the LENET-5 network comprises 6 convolution kernels and 6 pooling layers. The processing method is to convolve the original image and then apply average pooling, yielding 6 feature maps.
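The convolve-then-average-pool step of the first layer can be sketched in plain Python/NumPy. This is a minimal software illustration, not the patented hardware structure; the 5*5 kernel size is inferred from the 32*32 input and 28*28 feature-map dimensions given in Embodiment 3.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: no padding, stride 1."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool_2x2(fmap):
    """Non-overlapping 2x2 average pooling."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((32, 32))                      # original input image
kernels = [rng.random((5, 5)) for _ in range(6)]  # 6 first-layer kernels
feature_maps = [avg_pool_2x2(conv2d_valid(image, k)) for k in kernels]
# 32x32 --conv 5x5--> 28x28 --avg pool 2x2--> 14x14, six maps in total
```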
[0089] As shown in Figure 1B, the second layer comprises 6*12 convolution kernels. The processing method is: for each of the 12 rows of kernels, the 6 feature maps are multiplied by the 6 corresponding convolution kernels and the products are summed to output one result; specifically, 6 caches hold the result of multiplying the feature maps output by the first layer with the corresponding convolution kernels in the ...
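The multiply-then-accumulate behaviour of the second layer can be modelled in software as follows. This is an illustrative sketch only: `conv2d_valid` is a naive convolution standing in for the hardware multiplier array, and the 14*14 and 5*5 sizes follow the dimensions given in Embodiment 3.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: no padding, stride 1."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
inputs = [rng.random((14, 14)) for _ in range(6)]  # 6 first-layer feature maps
# 6*12 kernels: 12 rows of 6 kernels each
kernels = [[rng.random((5, 5)) for _ in range(6)] for _ in range(12)]

outputs = []
for row in kernels:                   # one row of 6 kernels -> one output map
    acc = np.zeros((10, 10))          # 14 - 5 + 1 = 10
    for fmap, k in zip(inputs, row):
        acc += conv2d_valid(fmap, k)  # multiply each map by its kernel, then add
    outputs.append(acc)
# 12 output feature maps of 10x10
```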
Embodiment 3
[0115] In the structure shown in Figure 1A, the original input image is 32*32 and the first convolution layer has 6 convolution kernels. Convolution produces 6 feature maps of 28*28; after average pooling of these 6 maps, 6 feature maps of 14*14 are obtained. The second convolution layer has 6*12 kernels. Note that the second layer's convolution is not a simple per-map multiplication: the 6 feature maps are multiplied by 6 convolution kernels and the results are then added together (the first-layer PEs exchange no data, while in the second layer there is data exchange, e.g. the other 5 results are accumulated onto the first PE) to output one feature map. There are 12 such operations in total, finally outputting 12 feature maps of 10*10, which after average pooling yield 12 feature maps of 5*5.
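The layer-size arithmetic in this paragraph can be checked with a short trace, assuming 5*5 "valid" convolutions (stride 1, no padding) and non-overlapping 2*2 pooling, which are consistent with the dimensions given above:

```python
def conv_out(n, k=5):
    """Output width of a 'valid' convolution with a k x k kernel, stride 1."""
    return n - k + 1

def pool_out(n, p=2):
    """Output width of non-overlapping p x p pooling."""
    return n // p

size = 32                # original input image: 32*32
size = conv_out(size)    # first convolution layer  -> 28
size = pool_out(size)    # first average pooling    -> 14
size = conv_out(size)    # second convolution layer -> 10
size = pool_out(size)    # second average pooling   -> 5
print(size)              # prints 5
```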
[0116] The present invention is based on the design scheme of FPGA neural...