Convolutional neural network parallel processing method based on OpenCL

Convolutional neural network and parallel processing technology, applied in the field of parallel processing of convolutional neural networks. It can solve problems such as high model complexity and slow running speed, and achieves the effects of simplifying convolution operations, improving running speed, and overcoming complex structures.

Active Publication Date: 2019-08-09
XIDIAN UNIV
Cites: 7; Cited by: 9

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to address the above-mentioned deficiencies in the prior art and to propose a method for parallel processing of convolutional neural ...



Embodiment Construction

[0035] The present invention will be further described below in conjunction with the accompanying drawings.

[0036] Referring to Figure 1, the specific steps of the present invention are further described.

[0037] Step 1, obtain the reorganization matrix of the image data matrix.

[0038] Read the image data matrix in host memory.

[0039] Create an image data matrix buffer object in GPU global memory, and transfer the image data matrix from host memory to this buffer object.

[0040] The number of rows of the reorganization matrix is calculated from the size of the convolution kernel and the number of channels of the image data matrix, and the image data matrix is then combined and rearranged in parallel to obtain a two-dimensional reorganization matrix of size K×N.
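The rearrangement in [0040] resembles the widely used im2col transformation. The following is a minimal CPU sketch under stated assumptions, not the patented OpenCL kernel: the image is stored channel-first (C×H×W), the stride is 1, there is no padding, and `im2col` is a hypothetical helper name. Under these assumptions K = C·kh·kw, which depends only on the kernel size and channel count, and N is the number of output positions.

```python
def im2col(image, kh, kw):
    """Rearrange a C x H x W image into a K x N matrix (hypothetical helper).

    Assumptions (not from the patent): stride 1, no padding, channel-first layout.
    Row index k encodes (channel, kernel row, kernel column);
    column index n encodes the output position (oy, ox).
    """
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    out_h, out_w = height - kh + 1, width - kw + 1
    rows = channels * kh * kw   # K: kernel size times channel count
    cols = out_h * out_w        # N: one column per output pixel
    matrix = [[0.0] * cols for _ in range(rows)]
    # Every (k, n) entry is computed independently of the others, so on a GPU
    # one work-item could fill one entry; here we simply loop over them.
    for k in range(rows):
        c, rem = divmod(k, kh * kw)
        dy, dx = divmod(rem, kw)
        for n in range(cols):
            oy, ox = divmod(n, out_w)
            matrix[k][n] = image[c][oy + dy][ox + dx]
    return matrix
```

For a single-channel 3×3 image with rows [1, 2, 3], [4, 5, 6], [7, 8, 9] and a 2×2 kernel, this yields a 4×4 matrix whose first row is [1, 2, 4, 5]: the top-left pixel of each of the four 2×2 windows.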

[0041] The specific steps of the parallel combination rearrangement are as follows:

[0042] Step 1, according to the following formula, c...


Abstract

The invention provides a convolutional neural network parallel processing method based on OpenCL, which mainly addresses the problems of high model complexity and slow operation speed in existing convolutional neural network parallel processing. The method comprises the following steps: obtaining a recombination matrix of an image data matrix; obtaining a weight matrix; computing the product of the weight matrix and the recombination matrix block-wise in parallel; performing parallel batch normalization on the product matrix; and outputting a feature value matrix. The invention uses the large number of parallel computing units in a graphics processing unit (GPU) to convert the convolution process of the convolutional neural network into large-scale matrix multiplication, and computes the product of the weight matrix and the recombination matrix in parallel blocks. This simplifies the processing of convolution-layer data, optimizes the data access pattern, increases the data reuse rate, and greatly improves the operation speed of the convolutional neural network.
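The core of the abstract (convolution as one large matrix product computed block by block, followed by batch normalization) can be illustrated on the CPU in plain Python. This is a sketch under assumptions, not the patent's OpenCL kernels: the weight matrix is taken as M×K (one row per convolution kernel), the recombination matrix as K×N, the names `blocked_matmul` and `batch_norm_rows` and the `tile` size are hypothetical, and batch normalization is assumed to be the inference form with precomputed per-channel mean and variance.

```python
import math


def blocked_matmul(weights, recomb, tile=2):
    """Compute weights (M x K) @ recomb (K x N) one output tile at a time.

    Hypothetical helper: each (tile x tile) output block is independent,
    mirroring how one GPU work-group could compute one block.
    """
    m, k, n = len(weights), len(recomb), len(recomb[0])
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):          # row band of the output
        for j0 in range(0, n, tile):      # column band of the output
            for p0 in range(0, k, tile):  # slice of the shared K dimension
                # Add this K-slice's contribution to the current output block.
                for i in range(i0, min(i0 + tile, m)):
                    for p in range(p0, min(p0 + tile, k)):
                        w = weights[i][p]
                        for j in range(j0, min(j0 + tile, n)):
                            out[i][j] += w * recomb[p][j]
    return out


def batch_norm_rows(matrix, mean, var, gamma, beta, eps=1e-5):
    """Inference-style batch normalization with one parameter set per row
    (i.e. per output channel); assumes mean and variance are precomputed."""
    return [
        [gamma[r] * (x - mean[r]) / math.sqrt(var[r] + eps) + beta[r] for x in row]
        for r, row in enumerate(matrix)
    ]
```

Because each output block reads only one row band of the weight matrix and one column band of the recombination matrix, a GPU version could stage those bands in fast local memory and reuse them across the block, which is one plausible source of the data-reuse gain the abstract describes. Each row of the normalized M×N result can then be reshaped into one output feature map.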

Description

Technical field

[0001] The invention belongs to the field of computer technology, and further relates to a convolutional neural network parallel processing method using OpenCL (Open Computing Language) in the fields of computer vision and deep learning. The invention can accelerate the image convolution process of a convolutional neural network and can be used for real-time target detection in computer vision.

Background art

[0002] The convolution process of a convolutional neural network requires a large number of floating-point operations. As convolutional neural networks grow deeper, the execution efficiency of the CPU falls far short of requirements. The GPU provides a large number of parallel computing units; OpenCL allows the GPU to be programmed from the host side, and a heterogeneous CPU+GPU architecture can be used to accelerate the image convolution process of the convolutional neura...
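To make "a large number of floating-point operations" concrete, here is a worked count under the common im2col convention. The symbols are assumptions, since this excerpt does not define them: C input channels, a k_h×k_w kernel, M output channels (kernels), and an H_out×W_out output map. The numeric values in the last line are illustrative only and are not taken from the patent.

```latex
\begin{aligned}
Y_{M\times N} &= W_{M\times K}\,X_{K\times N}, \qquad K = C\,k_h k_w,\quad N = H_{\mathrm{out}}\,W_{\mathrm{out}},\\
\mathrm{FLOPs} &\approx 2MKN \quad \text{(one multiply and one add per term)},\\
C=64,\ k_h=k_w=3,\ M=128,\ H_{\mathrm{out}}=W_{\mathrm{out}}=56 &\;\Rightarrow\; K=576,\ N=3136,\ 2MKN = 462\,422\,016 \approx 4.6\times 10^{8}.
\end{aligned}
```

A single such layer already needs hundreds of millions of operations, and a deep network stacks many of them, which is why the text turns to the GPU's parallel units.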


Application Information

IPC(8): G06N3/04
CPC: G06N3/045
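Inventors: 田小林 (Tian Xiaolin), 荀亮 (Xun Liang), 张晰 (Zhang Xi), 李娇娇 (Li Jiaojiao), 李芳 (Li Fang), 李帅 (Li Shuai), 逯甜甜 (Lu Tiantian), 焦李成 (Jiao Licheng)
Owner: XIDIAN UNIV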