Hardware acceleration implementation architecture for backward training of convolutional neural network based on FPGA

A convolutional neural network and convolutional layer technology, applied in the field of deep learning, that solves problems such as insufficient storage bandwidth and achieves a significant acceleration effect, a regular structure, and improved storage bandwidth.

Active Publication Date: 2019-12-06
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

Therefore, the FPGA is a very good choice for deep learning acceleration. However, there are few studies on the specific structure of FPGA implementations ...



Examples


Example 1

[0068] Example 1: FPGA simulation and implementation of the backward training process of the Hcnn convolutional neural network

[0069] The simulation platforms used in Example 1 are Matlab R2017b, ISE 14.7, and Modelsim 10.1a, and the implemented architecture is as shown in Figure 2. First, the Hcnn convolutional neural network of Figure 2 is verified by fixed-point simulation in Matlab R2017b, where the accuracy of the model reaches 95.34%. Then the simulation, verification, and implementation of the hardware architecture are carried out in ISE 14.7 and Modelsim 10.1a. In both the Matlab fixed-point simulation and the FPGA implementation, most parameters and intermediate register variables adopt the fixed-point format fi(1,18,12), that is, 1 sign bit, 5 integer bits, and 12 fractional bits.
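The fi(1,18,12) quantization described above can be reproduced outside Matlab for checking purposes. The following is a minimal Python/NumPy sketch (not part of the patent; the helper names to_fixed and to_float are illustrative) of rounding values onto this 18-bit grid with saturation:

```python
import numpy as np

# fi(1,18,12): signed, 18-bit word, 12 fractional bits -> 1 sign + 5 integer bits
TOTAL_BITS = 18
FRAC_BITS = 12
SCALE = 1 << FRAC_BITS                    # 2**12 = 4096
MAX_INT = (1 << (TOTAL_BITS - 1)) - 1     # +131071 -> about +31.99976
MIN_INT = -(1 << (TOTAL_BITS - 1))        # -131072 -> -32.0

def to_fixed(x):
    """Quantize floats to the fi(1,18,12) grid, saturating out-of-range values."""
    q = np.round(np.asarray(x, dtype=np.float64) * SCALE)
    return np.clip(q, MIN_INT, MAX_INT).astype(np.int32)

def to_float(q):
    """Map the stored integers back to the real values they represent."""
    return np.asarray(q, dtype=np.float64) / SCALE

w = np.array([0.3141, -1.5, 31.99999, -40.0])
print(to_float(to_fixed(w)))   # ~[0.3142, -1.5, 31.99976, -32.0]; last entry saturates
```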

[0070] The Modelsim simulation results of the forward prediction process of the Hcnn convolutional neural network in Example 1 are as shown in Figure 9. It can be seen from the...

Example 2

[0073] The simulation platforms used in Example 2 are ISE 14.7 and PyCharm. First, it can be seen from Figure 9 that the data processing time of the model is 821 clock cycles (excluding the time to read input data), so at a clock frequency of 200 MHz, the time used is 4105 ns.
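The conversion from cycle count to wall-clock time is a one-line check (plain Python, figures taken from the paragraph above):

```python
# Cross-check of paragraph [0073]: 821 cycles at a 200 MHz clock.
cycles = 821                      # data processing time, input reads excluded
clock_hz = 200e6                  # 200 MHz
print(cycles / clock_hz * 1e9)    # 4105.0 ns, matching the figure quoted above
```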

[0074] Then, on the PyCharm simulation platform, an Intel E3-1230V2 @ 3.30 GHz CPU and a TitanX GPU are used to perform the computation of the structure in Example 1; the time for the CPU and the GPU to process one sample is 7330 ns and 405 ns, respectively.

[0075] Figure 11 compares the speed and power consumption of the Example 1 structure as implemented on FPGA, CPU, and GPU. As can be seen from the figure, in terms of speed, the convolutional neural network FPGA implementation architecture of the present invention achieves an improvement of about three times over the CPU; there is still a certain gap compared with the GPU, which is limited by ...



Abstract

The invention discloses a hardware acceleration implementation architecture for backward training of a convolutional neural network based on an FPGA. Starting from a basic processing module for the backward training of each layer of a convolutional neural network, the architecture comprehensively considers operation processing time and resource consumption, and realizes the backward training process of the Hcnn convolutional neural network in a parallel pipeline form by using methods such as parallel-serial conversion, data fragmentation, pipeline design, and resource reuse, following the principle of achieving the maximum degree of parallelism with the minimum resource consumption. The architecture fully utilizes the data parallelism and pipeline parallelism of the FPGA; the implementation is simple, the structure is regular, the wiring is consistent, the operating frequency is greatly improved, and the acceleration effect is remarkable. More importantly, the architecture uses an optimized systolic array structure to balance IO reads and writes against computation, improving throughput while consuming less storage bandwidth, and effectively solving the problem in FPGA implementations of convolutional neural networks that the required data access speed is much higher than the data processing speed.
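The abstract does not disclose the RTL of the optimized systolic array, but the balancing idea can be illustrated abstractly. Below is a minimal Python sketch (a simplified weight-stationary variant with illustrative data, not the patented structure) of a 1-D systolic pipeline that consumes one input sample per cycle and, once full, emits one finished output per cycle, overlapping IO with computation:

```python
import numpy as np

def systolic_conv1d(x, w):
    """Cycle-by-cycle sketch of a weight-stationary systolic pipeline that
    computes the valid 1-D correlation y[i] = sum_j w[j] * x[i+j].

    Each cell holds one weight; every cycle one new input sample is fed in
    and partial sums advance one cell, so after the pipeline fills, the
    array emits one finished output per cycle. This overlap of input reads
    with computation is the IO/compute balancing the abstract refers to.
    """
    k = len(w)
    psum = np.zeros(k)          # partial-sum register after each cell
    out = []
    for t, sample in enumerate(x):
        # each cell adds its weight times this cycle's sample to the
        # partial sum arriving from its left neighbour (cell 0 gets 0)
        shifted = np.concatenate(([0.0], psum[:-1]))
        psum = shifted + w * sample
        if t >= k - 1:          # pipeline full: one output per cycle
            out.append(psum[-1])
    return np.array(out)

x = np.arange(8.0)
w = np.array([1.0, 2.0, 3.0])
assert np.allclose(systolic_conv1d(x, w), np.correlate(x, w, mode="valid"))
```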

Description

Technical field

[0001] The present invention relates to the field of deep learning, one of the important development directions in artificial intelligence, and specifically relates to a hardware acceleration implementation architecture for backward training of convolutional neural networks based on FPGA.

Background art

[0002] In recent years, the field of artificial intelligence, especially machine learning, has achieved breakthroughs in both theory and application. Deep learning is one of the most important development directions of machine learning. Deep learning can learn features at multiple levels of abstraction, and therefore performs excellently on complex and abstract learning problems. However, as problems become more complex and abstract, deep learning network models grow more complex, and the time needed to train them increases accordingly. For example, Google's "AlphaGo" uses a multi-layer neural netw...


Application Information

IPC(8): G06N3/063, G06N3/04
CPC: G06N3/063, G06N3/045
Inventors: 黄圳, 何春, 李玉柏, 王坚
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA