Multi-parallel acceleration method for CNN full connection layer operation

A fully connected layer computation technology, applied in neural architectures, biological neural network models, etc. It addresses the accuracy loss that can accompany parameter reduction, wasted energy consumption, and long processing time, so as to improve the system energy efficiency ratio, save processing time, and guarantee processing accuracy.

Active Publication Date: 2019-12-06
BEIJING INST OF SPACECRAFT SYST ENG

AI Technical Summary

Problems solved by technology

[0008] (1) Factors such as the number of target types, the degree of difference between target features, and the error tolerance are not analyzed comprehensively, so the data format is not chosen reasonably and no good balance is reached between processing accuracy, parameter storage space, and parameter data throughput.
[0009] (2) Model compression of the fully connected layer parameters can effectively reduce the number of parameters, but it risks lowering the accuracy rate. The sparsity of the convolutional processing results of the convolutional neural network is not exploited, so a large number of invalid read operations are generated during the fully connected operation, which wastes energy and processing time.
[0010] (3) Implementing the fully connected layer on a dedicated processor, ASIC, or FPGA usually achieves acceleration simply by adding and replicating hardware resources, without fully exploiting the full-pipeline performance of programmable logic circuits. This occupies more hardware resources and greatly reduces the energy efficiency ratio.
[0011] (4) Implementing the fully connected layer on a GPU offers simple programming and fast running speed, but suffers from problems such as high power consumption and difficult heat dissipation.

Embodiment

[0048] The fully connected layer operation acceleration method based on programmable logic circuits designed by the present invention is divided into the following modules: a processing data format determination module, an operation parallelism determination module, a fully connected layer weight parameter data format conversion and storage module, an operation flow control module, a data judgment module, data storage control modules A/B, a data calculation module, a calculation result storage module, and a data output module.

[0049] 1. The processing data format determination module determines the data format used for the fully connected layer operation according to the application scenario and target characteristics of the CNN. If the CNN is used to classify and identify remote sensing ship targets, the classification types are designed to be at most 30, mainly to distinguish large targets such as aircraft carriers, d...
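The trade-off this module weighs can be illustrated with a minimal sketch. The candidate formats below (8-bit and 16-bit signed fixed point), the random weights, and the helper names `quantize` and `format_report` are all assumptions made for illustration; the source does not state them as the formats or procedure actually used.

```python
import random

def quantize(value, frac_bits, total_bits):
    """Round to signed fixed point with the given widths, then convert back to float."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(value * scale)))  # saturate to the representable range
    return q / scale

def format_report(weights, total_bits, frac_bits):
    """Return (storage_bytes, max_abs_error) for one candidate data format."""
    errors = [abs(w - quantize(w, frac_bits, total_bits)) for w in weights]
    return len(weights) * total_bits // 8, max(errors)

random.seed(0)
# Hypothetical weights in [-1, 1]; real weights would come from a trained model.
weights = [random.uniform(-1.0, 1.0) for _ in range(4096)]

for total_bits, frac_bits in [(8, 6), (16, 14)]:
    size, err = format_report(weights, total_bits, frac_bits)
    print(f"{total_bits}-bit: {size} bytes, max error {err:.6f}")
```

A narrower format halves storage and read traffic but raises the worst-case rounding error, which is the balance between accuracy, storage space, and throughput that the module is meant to settle for a given application.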

Abstract

The invention discloses a multi-parallel acceleration method for CNN fully connected layer operation. By exploiting the sparsity of the convolutional neural network's operation results and inspecting the values of the convolutional layer processing result in advance, the method greatly reduces read operations on fully connected layer parameters, effectively saves energy, and improves the energy efficiency ratio of the system. By using the hardware resource reuse and strong expansion capability of programmable logic devices such as FPGAs to construct a parallel pipelined multiply-accumulate architecture, it effectively saves processing time and improves processing efficiency. By comprehensively analyzing factors such as the number of target types, the difference between target features, and the error tolerance of the application, the processing data format is set reasonably, so that the access efficiency of data and parameters is effectively improved while processing precision is ensured, achieving multi-parallel acceleration of the fully connected layer.
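The zero-skipping idea in the abstract can be sketched in software as follows. This is a simplified model, not the patented hardware design: the function name `fc_zero_skip`, the row-per-input weight layout, and the `lanes` parameter that stands in for parallel multiply-accumulate units are assumptions made for illustration.

```python
def fc_zero_skip(activations, weights, lanes=4):
    """Fully connected layer that skips weight reads for zero-valued inputs.

    weights[i][j] is the weight from input i to output j.
    Returns (outputs, weight_rows_read).
    """
    n_out = len(weights[0])
    outputs = [0.0] * n_out
    rows_read = 0
    for i, a in enumerate(activations):
        if a == 0:        # judge the value first: a zero input adds nothing
            continue      # so its weight row is never fetched
        row = weights[i]  # stands in for one read of weight parameters
        rows_read += 1
        # Each group of `lanes` outputs models MAC units working side by side.
        for start in range(0, n_out, lanes):
            for j in range(start, min(start + lanes, n_out)):
                outputs[j] += a * row[j]
    return outputs, rows_read

# Hypothetical sparse convolution-layer result and a small weight matrix.
acts = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0]
w = [[float(i + j) for j in range(3)] for i in range(6)]
out, reads = fc_zero_skip(acts, w)
print(out, reads)
```

In this toy input only two of six activations are nonzero, so only two weight rows are read while the result matches a dense computation. In hardware the inner loop over a lane group would run concurrently rather than sequentially.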

Description

Technical field

[0001] The invention belongs to the technical field of computer architecture, and in particular relates to a multi-parallel acceleration method for CNN fully connected layer operations.

Background technique

[0002] As an important representative of deep learning, the convolutional neural network (CNN) is widely used in object classification, recognition, video analysis, and natural language processing. A CNN is mainly composed of convolutional layers, pooling layers, activation functions, and fully connected layers. Because of its large volume of parameter data, high precision requirements, and high demands on external storage speed, the fully connected layer operation has long been the bottleneck restricting CNN acceleration. For example, the typical classification network VGG-16 has about 138M parameters in total, of which roughly 124M belong to its fully connected layers.

[0003] At present, the main optimization methods of the fully connected layer operation method of CNN are parameter compre...
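The VGG-16 figure can be checked with a short calculation. The layer shapes below are those of the standard VGG-16 classifier for 1000 ImageNet classes, under which the three fully connected layers hold about 124M of the network's roughly 138M parameters. The 16-bit storage width in the last line is only an assumption for illustration.

```python
# Fully connected layer shapes (inputs, outputs) of the standard VGG-16 classifier.
fc_layers = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

def fc_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

counts = [fc_params(n_in, n_out) for n_in, n_out in fc_layers]
total = sum(counts)
print(counts, total)

# Assumed 16 bits per parameter, purely to show the scale of the read traffic.
print(f"{total * 2 / 2**20:.1f} MiB at 16 bits per parameter")
```

The first layer alone accounts for over 100M parameters, which is why parameter reads dominate the cost of the fully connected stage.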

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04
CPC: G06N3/045
Inventor: 李宗凌, 汪路元, 禹霁阳, 程博文, 李珂, 蒋帅, 庞亚龙, 郝梁, 牛跃华, 刘伟伟
Owner BEIJING INST OF SPACECRAFT SYST ENG