
Hardware acceleration implementation architecture for forward prediction of convolutional neural network based on FPGA

A convolutional neural network forward-prediction technology, applied in the field of deep learning, that solves problems such as insufficient storage bandwidth and achieves a significant acceleration effect, a regular structure, and fast processing speed.

Active Publication Date: 2019-09-20
UNIV OF ELECTRONIC SCI & TECH OF CHINA
Cites 9, Cited by 11
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

Therefore, the FPGA is a very good choice for accelerating deep learning. However, there are still few studies on specific architectures for implementing deep learning algorithms on FPGAs; existing designs suffer from problems such as insufficient storage bandwidth, and there remains much room to improve the acceleration they achieve.



Examples


Example 1

[0057] Example 1: FPGA simulation and implementation of the forward prediction process of the Hcnn convolutional neural network

[0058] The simulation platforms used in Example 1 are PyCharm, ISE 14.7, and ModelSim 10.1a, and the implemented architecture is shown in Figure 2. First, the Hcnn convolutional neural network of Figure 2 is modeled and trained in PyCharm, where the model reaches an accuracy of 96.64%. The parameters of the trained Hcnn model, that is, the weights and bias terms of each layer, are saved for FPGA simulation and implementation. Note that in the FPGA implementation, most parameters and intermediate register variables use the fixed-point format fi(1,18,12): an 18-bit word with 1 sign bit, 5 integer bits, and 12 fractional bits. In the implementation of the softmax unit, however, the coefficients of the fitting function vary too widely across different intervals, so a piecewise fixed-point representation is required, that...
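For illustration, the fi(1,18,12) quantization described above can be sketched in a few lines of Python. The helper names below are our own; the patent performs this conversion inside the FPGA design, not in software.

```python
# Minimal sketch of fi(1,18,12) signed fixed-point quantization:
# an 18-bit word = 1 sign bit + 5 integer bits + 12 fractional bits.
# Helper names are illustrative, not from the patent.

def to_fixed(x, word_bits=18, frac_bits=12):
    """Quantize a float to fi(1,18,12): round to steps of 2^-12 and
    saturate to the representable range [-2^5, 2^5 - 2^-12]."""
    scale = 1 << frac_bits                 # 2^12 = 4096
    q = round(x * scale)                   # nearest-integer rounding
    lo = -(1 << (word_bits - 1))           # -131072, i.e. -32.0
    hi = (1 << (word_bits - 1)) - 1        # +131071, i.e. +31.99975...
    return max(lo, min(hi, q))             # saturate on overflow

def to_float(q, frac_bits=12):
    """Convert a fixed-point code word back to a float."""
    return q / (1 << frac_bits)

# Example: quantize a trained weight before loading it into the FPGA.
w = 0.73486
qw = to_fixed(w)
print(qw, to_float(qw))   # 3010 0.73486328125 (error below 2^-13)
```

With 12 fractional bits the worst-case rounding error is 2^-13, which suggests why the softmax fitting coefficients, whose magnitudes vary widely between intervals, need piecewise formats rather than a single fi(1,18,12).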

Example 2

[0062] The simulation platforms used in Example 2 are ISE 14.7 and PyCharm. The data processing time of this model is 233 clock cycles (excluding the time to read input data). By analysis and counting, the forward prediction process performs a total of 170510 fixed-point operations. Therefore, at a clock frequency of 200 MHz, the number of operations per second is 170510 × 200×10^6 / 233 ≈ 146.36 GOPS.
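The throughput figure in [0062] can be reproduced with a one-line calculation; the sketch below is just that arithmetic in Python, not code from the patent.

```python
# Throughput of the FPGA architecture ([0062]): operations per forward
# pass divided by the time for one pass (233 cycles at 200 MHz).
ops_per_pass = 170510     # fixed-point operations per forward prediction
cycles = 233              # clock cycles per pass (input readout excluded)
f_clk = 200e6             # 200 MHz clock frequency

throughput = ops_per_pass * f_clk / cycles
print(f"{throughput / 1e9:.2f} GOPS")   # -> 146.36 GOPS
```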

[0063] Then, on the PyCharm simulation platform, the same computation as the architecture of Example 1 is completed on an Intel E3-1230V2 @ 3.30 GHz CPU and a Titan X GPU. Processing one sample takes 3620 ns on the CPU and 105 ns on the GPU, so the CPU achieves 47.10 GFLOPS and the GPU achieves 1623.90 GFLOPS.
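The CPU and GPU figures in [0063] follow from the same operation count divided by the measured per-sample latency; again, the snippet below is only a sketch of that arithmetic.

```python
# CPU/GPU throughput from per-sample latency ([0063]).
ops = 170510         # operations per forward prediction
t_cpu = 3620e-9      # seconds per sample, Intel E3-1230V2 @ 3.30 GHz
t_gpu = 105e-9       # seconds per sample, NVIDIA Titan X

print(f"CPU: {ops / t_cpu / 1e9:.2f} GFLOPS")   # -> 47.10 GFLOPS
print(f"GPU: {ops / t_gpu / 1e9:.2f} GFLOPS")   # -> 1623.90 GFLOPS
```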

[0064] A comparison diagram of the speed and power consumption performance of the architecture of Example 1 as implemented on the FPGA...



Abstract

The invention discloses a hardware acceleration implementation architecture for forward prediction of a convolutional neural network based on an FPGA. For a specific simplified and optimized convolutional neural network, Hcnn, the hardware architecture of the forward prediction process is researched and implemented. In this architecture, the main operation units of the convolutional neural network are realized with an optimized systolic array. Weighing operation processing time against resource consumption, and using methods such as parallel-serial conversion, data fragmentation, and pipeline design, the forward prediction process of the Hcnn network is realized as a parallel pipeline, on the principle of keeping parallelism as high and resource consumption as low as possible, and making full use of the data parallelism and pipeline parallelism of the FPGA. The systolic array structure balances I/O reads and writes against computation, improving throughput while consuming less storage bandwidth, and effectively alleviating the mismatch between the data access speed and the data processing speed of a convolutional neural network implemented on an FPGA.
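The data-reuse property the abstract attributes to the systolic array can be illustrated with a small behavioral model. The Python sketch below is a transposed-form FIR pipeline, a close relative of a weight-stationary systolic array: each processing element (PE) holds one weight, every input sample is fetched from memory exactly once, and partial sums travel through registers between PEs, producing one result per clock. This is our own illustrative model, not the patent's RTL.

```python
# Behavioral model of a weight-stationary MAC pipeline (transposed-form
# FIR). Each input word is read from memory once but used by every tap,
# which is how a systolic structure raises throughput without raising
# storage bandwidth. Illustrative only; not the patent's design.

def systolic_fir(x, w):
    """Stream samples x through one PE per weight in w (len(w) >= 2);
    returns y[n] = sum_j w[j] * x[n-j], with zero initial state."""
    m = len(w)
    s = [0.0] * (m - 1)             # partial-sum registers between PEs
    y = []
    for xn in x:                    # one clock cycle per input sample
        y.append(w[0] * xn + s[0])  # first PE emits a finished output
        for i in range(m - 2):      # interior PEs update passing sums
            s[i] = w[i + 1] * xn + s[i + 1]
        s[m - 2] = w[m - 1] * xn    # last PE starts a fresh partial sum
    return y

x = [1.0, 2.0, 3.0, 4.0]            # input stream, one sample per clock
w = [0.5, 0.25, 0.125]              # stationary weights, one per PE
print(systolic_fir(x, w))           # [0.5, 1.25, 2.125, 3.0]
```

Once the pipeline fills, every multiply-accumulate unit fires on every clock while memory supplies only one new word per cycle, which is the balance between I/O and computation described above.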

Description

Technical Field

[0001] The present invention relates to deep learning, one of the important development directions of artificial intelligence, and specifically to a hardware acceleration implementation architecture for FPGA-based convolutional neural network forward prediction.

Background

[0002] In recent years the field of artificial intelligence, and machine learning in particular, has achieved breakthroughs in both theory and application. Deep learning is one of the most important development directions of machine learning. Deep learning can learn features at multiple levels of abstraction, and therefore performs excellently on complex and abstract learning problems. However, as problems continue to become more complex and abstract, deep learning network models grow more complex and their learning time increases. For example, Google's "AlphaGo" uses a multi-layer neural network str...

Claims


Application Information

IPC(8): G06N3/063, G06N3/04
CPC: G06N3/063, G06N3/045
Inventors: 黄圳 (Huang Zhen), 何春 (He Chun), 朱立东 (Zhu Lidong), 王剑 (Wang Jian)
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA