
A device for realizing hardware acceleration of forward prediction of a convolutional neural network based on FPGA

A convolutional-neural-network forward-prediction technology, applied in the field of deep learning, that addresses problems such as insufficient storage bandwidth and achieves significant acceleration, increased clock frequency, and consistent wiring.

Active Publication Date: 2022-03-15
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

Therefore, the FPGA is a very good choice for deep learning acceleration. However, there has been relatively little research on concrete FPGA devices for deep learning algorithms; existing implementations suffer from problems such as insufficient storage bandwidth, and there is still much room for improvement in the acceleration achieved.



Examples


Example 1

[0057] Example 1: FPGA simulation and implementation of the forward prediction process of the convolutional neural network Hcnn

[0058] The simulation platforms used in Example 1 are PyCharm, ISE 14.7 and ModelSim 10.1a, and the implemented device is shown in Figure 2. First, the Hcnn convolutional neural network of Figure 2 is modeled and trained in PyCharm; the accuracy of the model reaches 96.64%. The parameters of the trained Hcnn model, that is, the weights and bias terms of each layer, are saved for the FPGA simulation and implementation. It should be noted that in the FPGA implementation, most parameters and intermediate register variables adopt the fixed-point format fi(1,18,12), that is, 1 sign bit, 5 integer bits, and 12 fractional bits. However, in the implementation of the softmax unit, the fitted-function coefficient values fluctuate too much across different intervals, so segmented fixed-point formats are required, that is, diff...
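The fi(1,18,12) format described above (1 sign bit, 18-bit word, 12 fractional bits, leaving 5 integer bits) can be sketched in Python. These are hypothetical helper functions for illustration, not part of the patent's implementation:

```python
def to_fixed_point(x, word_len=18, frac_len=12):
    """Quantize a real value to a signed fixed-point integer, fi(1,18,12) style.

    word_len - 1 - frac_len = 5 integer bits; values outside the
    representable range saturate, as FPGA arithmetic typically does.
    """
    scale = 1 << frac_len                 # 2**12 = 4096
    max_val = (1 << (word_len - 1)) - 1   # largest signed 18-bit value
    min_val = -(1 << (word_len - 1))      # smallest signed 18-bit value
    q = int(round(x * scale))
    return max(min_val, min(max_val, q))  # saturate on overflow

def from_fixed_point(q, frac_len=12):
    """Convert the stored integer back to its real value."""
    return q / (1 << frac_len)

# A weight of 0.5 is stored as the integer 2048 (0.5 * 2**12).
```

With 12 fractional bits the quantization step is 2^-12 ≈ 0.000244, which is why coefficients with widely varying magnitude (as in the softmax unit) need per-interval formats.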

Example 2

[0061] Example 2: Speed and power consumption analysis of the model implemented in Example 1

[0062] The simulation platforms used in Example 2 are ISE 14.7 and PyCharm. The data processing time of this model is 233 clock cycles (excluding the time to read input data). According to analysis and statistics, the total number of fixed-point operations in the forward prediction process is 170510. Therefore, at a clock frequency of 200 MHz, the number of operations per second (FLOPS) is .
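The throughput value is left blank in the source, but it follows from the numbers given (233 cycles per sample at 200 MHz, 170510 operations per sample). A quick back-of-the-envelope check:

```python
ops = 170_510        # fixed-point operations per forward pass (from the text)
clks = 233           # clock cycles per forward pass
f_clk = 200e6        # 200 MHz clock

latency_s = clks / f_clk           # ~1.165 microseconds per sample
flops = ops / latency_s            # operations per second
print(f"{flops / 1e9:.2f} GFLOPS") # prints "146.36 GFLOPS"
```

So, under the stated assumptions, the FPGA device would sustain roughly 146 GFLOPS equivalent throughput.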

[0063] Then, on the simulation platform PyCharm, the computation of the device in Example 1 was completed using a CPU (Intel E3-1230V2 @ 3.30GHz) and a GPU (Titan X). The time for the CPU and the GPU to process one sample is 3620 ns and 105 ns respectively, so the CPU achieves 47.10 GFLOPS and the GPU achieves 1623.90 GFLOPS.
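Taking the per-sample times as 3620 ns (CPU) and 105 ns (GPU), as the surrounding arithmetic implies, the quoted GFLOPS figures can be reproduced directly:

```python
ops = 170_510                    # operations per forward pass (from the text)
cpu_t = 3620e-9                  # CPU time per sample, assumed from the text
gpu_t = 105e-9                   # GPU time per sample, assumed from the text

cpu_gflops = ops / cpu_t / 1e9   # operations per second, in GFLOPS
gpu_gflops = ops / gpu_t / 1e9

print(f"CPU: {cpu_gflops:.2f} GFLOPS, GPU: {gpu_gflops:.2f} GFLOPS")
# prints "CPU: 47.10 GFLOPS, GPU: 1623.90 GFLOPS"
```

Both results match the figures quoted in the text, which supports reading the garbled timing sentence as 3620 ns for the CPU and 105 ns for the GPU.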

[0064] The sp...



Abstract

The invention discloses an FPGA-based hardware acceleration device for the forward prediction of a convolutional neural network. For a specific simplified and optimized convolutional neural network, Hcnn, a hardware device for the forward prediction process is researched and realized. The device uses an optimized systolic array as the main computing unit of the convolutional neural network. Comprehensively weighing computation time against resource consumption, it applies methods such as parallel-serial conversion, data slicing, and pipeline design, following the principle of achieving the greatest possible parallelism while reducing resource consumption as far as possible, and realizes the forward prediction process of the Hcnn convolutional neural network as a parallel pipeline. The data-parallel and pipeline-parallel characteristics of the FPGA are fully utilized. The systolic array structure balances IO reads and writes against computation, improves throughput while consuming less storage bandwidth, and effectively addresses the problem in FPGA implementations of convolutional neural networks that data access speed is much faster than data processing speed.
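To illustrate the systolic-array idea the abstract refers to, here is a behavioral Python sketch of a 1-D weight-stationary array. This is an illustrative model, not the patent's RTL design: each processing element (PE) holds one weight, partial sums advance one register per cycle, and inputs pass through two registers per PE so that PE j multiplies w[j] by the correctly delayed sample:

```python
def systolic_conv1d(x, w):
    """Cycle-by-cycle model of a 1-D weight-stationary systolic array.

    Computes y[i] = sum_j w[j] * x[i-j] (x[<0] treated as 0), producing
    one output per clock cycle once the pipeline is full.
    """
    K = len(w)

    def x_at(j, t):                  # input sample seen by PE j at cycle t
        i = t - 2 * j                # inputs cross two registers per PE
        return x[i] if 0 <= i < len(x) else 0

    s_reg = [0] * (K + 1)            # partial-sum pipeline registers
    raw = []
    for t in range(len(x) + K):      # run long enough to flush the pipeline
        new_s = [0] * (K + 1)
        for j in range(K):           # all PEs fire in parallel each cycle
            new_s[j + 1] = s_reg[j] + w[j] * x_at(j, t)
        raw.append(new_s[K])         # value leaving the last PE
        s_reg = new_s
    return raw[K - 1 : K - 1 + len(x)]

# systolic_conv1d([1, 2, 3, 4], [1, 10]) → [1, 12, 23, 34]
```

Because every PE performs one multiply-accumulate per cycle while only one new input word enters the array per cycle, the structure trades a small amount of register storage for high compute utilization at low memory bandwidth, which is the balance the abstract describes.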

Description

Technical field [0001] The present invention relates to one of the important development directions of artificial intelligence, the field of deep learning, and in particular to an FPGA-based device for realizing hardware acceleration of the forward prediction of a convolutional neural network. Background technique [0002] In recent years, the field of artificial intelligence, especially machine learning, has achieved breakthroughs in both theory and application. Deep learning is one of the most important development directions of machine learning. Deep learning can learn features at multiple levels of abstraction, and therefore performs excellently on complex, abstract learning problems. However, as problems become more complex and abstract, deep learning network models grow more complex and their learning time increases. For example, Google's "AlphaGo" uses a multi-layer neural network structure co...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06N3/063, G06N3/04
CPC: G06N3/063, G06N3/045
Inventors: 黄圳, 何春, 朱立东, 王剑
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA