
A hardware acceleration implementation device for backward training of convolutional neural network based on FPGA

A convolutional neural network hardware acceleration technology, applied in the field of deep learning, which addresses problems such as insufficient storage bandwidth and achieves a significant acceleration effect, more consistent wiring, and an increased operating frequency

Active Publication Date: 2022-05-03
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

Therefore, the FPGA is a very good choice for accelerating deep learning. However, few studies address the specific structure of FPGA implementations of deep learning algorithms; problems such as insufficient storage bandwidth remain, and there is still considerable room for improvement in the acceleration achieved.



Examples


Example 1

[0068] Example 1: FPGA simulation and implementation of the backward training process of the Hcnn convolutional neural network

[0069] The simulation platforms used in Example 1 are Matlab R2017b, ISE 14.7, and Modelsim 10.1a, and the implemented device is shown in Figure 2. First, the Hcnn convolutional neural network of Figure 2 is verified by fixed-point simulation in Matlab R2017b, where the accuracy of the model reaches 95.34%. Then the hardware device is simulated, verified, and implemented in ISE 14.7 and Modelsim 10.1a. In both the Matlab fixed-point simulation and the FPGA implementation, most parameters and intermediate register variables use the fixed-point format fi(1,18,12), that is, 1 sign bit, 5 integer bits, and 12 fractional bits.
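The quantization implied by the fi(1,18,12) format can be emulated to see its precision and range. The helper below is an illustrative sketch (the name `to_fixed` and the round-then-saturate policy are my assumptions; Matlab's `fi` defaults behave similarly), not the patent's code.

```python
# Illustrative sketch: emulate Matlab's fi(1, 18, 12) signed fixed-point
# format -- 1 sign bit, 5 integer bits, 12 fractional bits (18 bits total).

def to_fixed(x, word=18, frac=12):
    """Quantize a real number to the nearest representable fi(1, word, frac) value."""
    scale = 1 << frac                         # 2**12 = 4096 steps per unit
    lo = -(1 << (word - 1))                   # most negative raw code: -131072
    hi = (1 << (word - 1)) - 1                # most positive raw code:  131071
    raw = max(lo, min(hi, round(x * scale)))  # round to nearest, then saturate
    return raw / scale                        # real value the raw code encodes

print(to_fixed(0.1))    # 0.10009765625 (nearest multiple of 2**-12)
print(to_fixed(100.0))  # 31.999755859375 (saturates: only 5 integer bits)
```

The 12 fractional bits give a resolution of about 2.4e-4, which is why a network verified at this precision can still reach the 95.34% accuracy reported above.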

[0070] The Modelsim simulation results of the forward prediction process of the Hcnn convolutional neural network in Example 1 are shown in Figure 9. It can be seen from the figure that ...

Example 2

[0073] The simulation platforms used in Example 2 are ISE 14.7 and PyCharm. First, as can be seen from Figure 9, the data processing time of the model is 821 clock cycles (excluding the time to read input data), so at a clock frequency of 200 MHz the processing takes 4105 ns.

[0074] Then, on the simulation platform PyCharm, a CPU (Intel E3-1230 V2 @ 3.30 GHz) and a GPU (Titan X) are used to complete the computation of the structure in Example 1; the time for the CPU and the GPU to process one sample is 7330 ns and 405 ns respectively.
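For reference, the raw per-sample latency ratios implied by these figures can be computed directly. Note that the roughly threefold CPU speedup quoted in the comparison below presumably reflects a different metric (e.g. pipelined throughput or power-normalized speed); that interpretation is my assumption, since the simple latency ratio is smaller.

```python
# Latency ratios implied by the per-sample times reported in the text
# (FPGA: 4105 ns, CPU: 7330 ns, GPU: 405 ns). Raw latency only; the
# patent's speed/power comparison in Figure 11 may use other metrics.
fpga_ns, cpu_ns, gpu_ns = 4105, 7330, 405
print(cpu_ns / fpga_ns)   # ~1.79: the FPGA is faster than the CPU
print(fpga_ns / gpu_ns)   # ~10.1: the GPU is faster than the FPGA
```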

[0075] A comparison of the speed and power consumption of the Example 1 structure implemented on FPGA, CPU, and GPU is shown in Figure 11. As can be seen from the figure, in terms of speed the convolutional neural network FPGA implementation device of the present invention is about three times faster than the CPU; compared with the GPU there is still a certain gap, which is limited by the resourc...



Abstract

The invention discloses a device for realizing hardware acceleration of the backward training of a convolutional neural network based on FPGA. Starting from the basic processing modules of the backward training of each layer of the convolutional neural network, and weighing processing time against resource consumption, the device uses methods such as parallel-serial conversion, data fragmentation, pipeline design, and resource reuse to realize the backward training process of the Hcnn convolutional neural network in the form of a parallel pipeline, following the principle of as much parallelism and as little resource consumption as possible. The device makes full use of the data parallelism and pipeline parallelism of the FPGA: it is simple to implement, has a regular structure and consistent wiring, runs at a greatly increased frequency, and achieves a remarkable acceleration effect. More importantly, the structure uses an optimized systolic array to balance IO reads and writes against computation, improving throughput while consuming less storage bandwidth, and effectively resolving the mismatch between data access speed and data processing speed in convolutional neural network FPGA implementations.
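The patent does not publish its RTL, so as an illustrative sketch only, the snippet below models a weight-stationary systolic FIR (the 1-D analogue of the convolution arrays used in such designs; the function name and dataflow details are my assumptions). Each input word is fetched from memory once and then flows through the chain of processing elements, being reused by every tap, which is the storage-bandwidth-for-throughput trade described in the abstract.

```python
# Illustrative sketch (not the patent's design): weight-stationary systolic
# FIR. Weights stay fixed in the PEs; inputs and partial sums are pipelined.
# Two input registers per tap keep the data aligned with the adder cascade.

def systolic_fir(x, w):
    """Cycle-accurate model of an n-tap systolic FIR; returns the full
    convolution of x with w after the pipeline latency of n + 1 cycles."""
    n = len(w)
    xpipe = [0.0] * (2 * n)      # input delay line: two registers per tap
    acc = [0.0] * n              # one accumulator register per tap
    out = []
    total = len(x) + 2 * n       # len(x) + n - 1 outputs + n + 1 latency
    for t in range(total):
        xin = float(x[t]) if t < len(x) else 0.0   # flush with zeros
        # one clock edge: every register updates from pre-edge values
        acc = [(acc[i - 1] if i else 0.0) + w[i] * xpipe[2 * i + 1]
               for i in range(n)]
        xpipe = [xin] + xpipe[:-1]
        out.append(acc[-1])      # last PE's accumulator is the output port
    return out[n + 1:]           # drop the pipeline fill-up cycles

print(systolic_fir([1, 2, 3, 4], [1, 0, -1]))
# [1.0, 2.0, 2.0, 2.0, -3.0, -4.0]
```

Because the partial sums ride a register cascade rather than a wide adder tree, the structure places and routes regularly and clocks fast on an FPGA, consistent with the wiring and frequency claims above.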

Description

Technical field
[0001] The present invention relates to the field of deep learning, one of the important development directions in artificial intelligence, and in particular to a hardware acceleration implementation device for the backward training of a convolutional neural network based on FPGA.
Background technique
[0002] In recent years, the field of artificial intelligence, especially machine learning, has achieved breakthroughs in both theory and application. Deep learning is one of the most important development directions of machine learning. Deep learning can learn features at multiple levels of abstraction, and therefore performs excellently on complex and abstract learning problems. However, as problems become more complex and abstract, deep learning network models grow more complex and their training time increases. For example, Google's "AlphaGo" uses a multi-layer neural network structure c...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06N3/063; G06N3/04
CPC: G06N3/063; G06N3/045
Inventors: 黄圳, 何春, 李玉柏, 王坚
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA