
Hardware acceleration implementation architecture for forward prediction of convolutional neural network based on FPGA

A convolutional neural network forward-prediction technology, applied in the field of deep learning, that solves problems such as insufficient storage bandwidth and achieves a significant acceleration effect, a regular structure, and fast processing speed.

Active Publication Date: 2019-09-20
UNIV OF ELECTRONIC SCI & TECH OF CHINA
Cites 9, Cited by 11
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

Therefore, the FPGA is a very good choice for accelerating deep learning. However, there are still few studies on specific architectures for implementing deep learning algorithms on FPGAs; existing designs suffer from problems such as insufficient storage bandwidth, and there remains much room to improve the acceleration they achieve.



Examples


Example 1

[0057] Example 1: FPGA simulation and implementation of the forward prediction process of the Hcnn convolutional neural network

[0058] The simulation platforms used in Example 1 are PyCharm, ISE 14.7, and ModelSim 10.1a, and the implemented architecture is shown in Figure 2. First, the Hcnn convolutional neural network of Figure 2 is modeled and trained in PyCharm, where the model reaches an accuracy of 96.64%. The parameters of the trained Hcnn model, that is, the weights and bias terms of each layer, are saved for FPGA simulation and implementation. Note that in the FPGA implementation, most parameters and intermediate register variables use the fixed-point format fi(1,18,12): an 18-bit word with 1 sign bit, 5 integer bits, and 12 fractional bits. In the implementation of the softmax unit, however, the coefficients of the fitting function vary too widely across different intervals, so a piecewise fixed-point representation is required, that...
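For illustration, the fi(1,18,12) quantization described above can be sketched in a few lines of Python. The helper names below are our own; the patent performs this conversion inside the FPGA design, not in software.

```python
# Minimal sketch of fi(1,18,12) signed fixed-point quantization:
# an 18-bit word = 1 sign bit + 5 integer bits + 12 fractional bits.
# Helper names are illustrative, not from the patent.

def to_fixed(x, word_bits=18, frac_bits=12):
    """Quantize a float to fi(1,18,12): round to steps of 2^-12 and
    saturate to the representable range [-2^5, 2^5 - 2^-12]."""
    scale = 1 << frac_bits                 # 2^12 = 4096
    q = round(x * scale)                   # nearest-integer rounding
    lo = -(1 << (word_bits - 1))           # -131072, i.e. -32.0
    hi = (1 << (word_bits - 1)) - 1        # +131071, i.e. +31.99975...
    return max(lo, min(hi, q))             # saturate on overflow

def to_float(q, frac_bits=12):
    """Convert a fixed-point code word back to a float."""
    return q / (1 << frac_bits)

# Example: quantize a trained weight before loading it into the FPGA.
w = 0.73486
qw = to_fixed(w)
print(qw, to_float(qw))   # 3010 0.73486328125 (error below 2^-13)
```

With 12 fractional bits the worst-case rounding error is 2^-13, which suggests why the softmax fitting coefficients, whose magnitudes vary widely between intervals, need piecewise formats rather than a single fi(1,18,12).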

Example 2

[0062] The simulation platforms used in Example 2 are ISE 14.7 and PyCharm. The data processing time of this model is 233 clock cycles (excluding the time to read input data). By analysis and counting, the forward prediction process performs a total of 170510 fixed-point operations. Therefore, at a clock frequency of 200 MHz, the number of operations per second is 170510 × 200×10^6 / 233 ≈ 146.36 GOPS.
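The throughput figure in [0062] can be reproduced with a one-line calculation; the sketch below is just that arithmetic in Python, not code from the patent.

```python
# Throughput of the FPGA architecture ([0062]): operations per forward
# pass divided by the time for one pass (233 cycles at 200 MHz).
ops_per_pass = 170510     # fixed-point operations per forward prediction
cycles = 233              # clock cycles per pass (input readout excluded)
f_clk = 200e6             # 200 MHz clock frequency

throughput = ops_per_pass * f_clk / cycles
print(f"{throughput / 1e9:.2f} GOPS")   # -> 146.36 GOPS
```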

[0063] Then, on the PyCharm simulation platform, the same computation as the architecture of Example 1 is completed on an Intel E3-1230V2 @ 3.30 GHz CPU and a Titan X GPU. Processing one sample takes 3620 ns on the CPU and 105 ns on the GPU, so the CPU achieves 47.10 GFLOPS and the GPU achieves 1623.90 GFLOPS.
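The CPU and GPU figures in [0063] follow from the same operation count divided by the measured per-sample latency; again, the snippet below is only a sketch of that arithmetic.

```python
# CPU/GPU throughput from per-sample latency ([0063]).
ops = 170510         # operations per forward prediction
t_cpu = 3620e-9      # seconds per sample, Intel E3-1230V2 @ 3.30 GHz
t_gpu = 105e-9       # seconds per sample, NVIDIA Titan X

print(f"CPU: {ops / t_cpu / 1e9:.2f} GFLOPS")   # -> 47.10 GFLOPS
print(f"GPU: {ops / t_gpu / 1e9:.2f} GFLOPS")   # -> 1623.90 GFLOPS
```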

[0064] A comparison diagram of the speed and power consumption performance of the architecture of Example 1 as implemented on the FPGA...



Abstract

The invention discloses a hardware acceleration implementation architecture for forward prediction of a convolutional neural network based on an FPGA. For a specific simplified and optimized convolutional neural network, Hcnn, the hardware architecture of the forward prediction process is researched and implemented. In this architecture, the main operation units of the convolutional neural network are realized with an optimized systolic array. Weighing operation processing time against resource consumption, and using methods such as parallel-serial conversion, data fragmentation, and pipeline design, the forward prediction process of the Hcnn network is realized as a parallel pipeline, on the principle of keeping parallelism as high and resource consumption as low as possible, and making full use of the data parallelism and pipeline parallelism of the FPGA. The systolic array structure balances I/O reads and writes against computation, improving throughput while consuming less storage bandwidth, and effectively alleviating the mismatch between the data access speed and the data processing speed of a convolutional neural network implemented on an FPGA.
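The data-reuse property the abstract attributes to the systolic array can be illustrated with a small behavioral model. The Python sketch below is a transposed-form FIR pipeline, a close relative of a weight-stationary systolic array: each processing element (PE) holds one weight, every input sample is fetched from memory exactly once, and partial sums travel through registers between PEs, producing one result per clock. This is our own illustrative model, not the patent's RTL.

```python
# Behavioral model of a weight-stationary MAC pipeline (transposed-form
# FIR). Each input word is read from memory once but used by every tap,
# which is how a systolic structure raises throughput without raising
# storage bandwidth. Illustrative only; not the patent's design.

def systolic_fir(x, w):
    """Stream samples x through one PE per weight in w (len(w) >= 2);
    returns y[n] = sum_j w[j] * x[n-j], with zero initial state."""
    m = len(w)
    s = [0.0] * (m - 1)             # partial-sum registers between PEs
    y = []
    for xn in x:                    # one clock cycle per input sample
        y.append(w[0] * xn + s[0])  # first PE emits a finished output
        for i in range(m - 2):      # interior PEs update passing sums
            s[i] = w[i + 1] * xn + s[i + 1]
        s[m - 2] = w[m - 1] * xn    # last PE starts a fresh partial sum
    return y

x = [1.0, 2.0, 3.0, 4.0]            # input stream, one sample per clock
w = [0.5, 0.25, 0.125]              # stationary weights, one per PE
print(systolic_fir(x, w))           # [0.5, 1.25, 2.125, 3.0]
```

Once the pipeline fills, every multiply-accumulate unit fires on every clock while memory supplies only one new word per cycle, which is the balance between I/O and computation described above.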

Description

Technical Field

[0001] The present invention relates to deep learning, one of the important development directions of artificial intelligence, and specifically to a hardware acceleration implementation architecture for FPGA-based convolutional neural network forward prediction.

Background

[0002] In recent years the field of artificial intelligence, and machine learning in particular, has achieved breakthroughs in both theory and application. Deep learning is one of the most important development directions of machine learning. Deep learning can learn features at multiple levels of abstraction, and therefore performs excellently on complex and abstract learning problems. However, as problems continue to become more complex and abstract, deep learning network models grow more complex and their learning time increases. For example, Google's "AlphaGo" uses a multi-layer neural network str...

Claims


Application Information

IPC(8): G06N3/063, G06N3/04
CPC: G06N3/063, G06N3/045
Inventors: 黄圳 (Huang Zhen), 何春 (He Chun), 朱立东 (Zhu Lidong), 王剑 (Wang Jian)
Owner: UNIV OF ELECTRONIC SCI & TECH OF CHINA