Vectorization implementation method for Valid convolution of convolutional neural network

A vectorized implementation method for Valid convolution of convolutional neural networks, applied in the field of vector processors. It addresses problems such as reduced data-loading efficiency, wasted storage bandwidth, and a mismatch between the data dimension and the number of processing units. Its effects are improved overall computing efficiency, avoidance of reduction summation, and reduced bandwidth requirements.

Active Publication Date: 2020-02-14
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

[0005] Given the architectural characteristics of vector processors, various vectorized implementations of convolution have been proposed. Examples include the vectorization method for convolutional neural network operations on vector processors disclosed in Chinese patent application 201810687639.X; the GPDSP-oriented multi-core parallel computing method for convolutional neural networks disclosed in patent application 201810689646.3; and the vectorized implementation method of two-dimensional matrix convolution for vector processors disclosed in patent application 201710201589.5. All of these schemes load the weight data into the vector array memory (AM) and the input image feature data into the scalar memory (SM) of the vector processor to complete the convolution, and most of them reorder the data along the third dimension. Valid convolution is computationally intensive, and no existing vectorization method targets Valid convolution in convolutional neural networks. Applying the conventional schemes above to the vectorized implementation of Valid convolution raises the following problems:
[0006] 1. The weight data cannot be shared effectively, which wastes storage bandwidth and prevents the vector processor from reaching its full computing efficiency.
[0007] 2. The size of the third dimension is not fixed and does not match the number of processing units of the vector processor. Because it also differs across convolutional neural network models and across convolutional layers, the data-loading efficiency of these schemes suffers considerably and the schemes are not general.
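
To make problem 2 concrete, the following minimal sketch estimates how many vector lanes do useful work when a dimension of a given size is processed in chunks as wide as the number of processing units. The function name and the lane count of 16 are illustrative assumptions, not values taken from the patent.

```python
import math


def lane_utilization(dim_size: int, num_pes: int) -> float:
    """Fraction of vector lanes doing useful work when `dim_size` elements
    are processed in chunks of `num_pes` lanes (the last chunk is padded).

    Both parameters are hypothetical; the patent does not fix them.
    """
    if dim_size <= 0 or num_pes <= 0:
        raise ValueError("dim_size and num_pes must be positive")
    chunks = math.ceil(dim_size / num_pes)
    return dim_size / (chunks * num_pes)


# A 3-channel input layer on an assumed 16-lane processor uses few lanes,
# while a 64-channel layer fills them completely.
rgb_layer = lane_utilization(3, 16)
deep_layer = lane_utilization(64, 16)
odd_layer = lane_utilization(24, 16)
```

Under these assumptions the utilization changes from layer to layer, which is the lack of generality that problem 2 describes.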




Embodiment Construction

[0048] The present invention will be further described below in conjunction with the accompanying drawings and specific preferred embodiments, but the protection scope of the present invention is not limited thereby.

[0049] As shown in Figure 2, the vectorized implementation method for Valid convolution of a convolutional neural network in this embodiment includes the following steps:

[0050] Step 1: Store the input feature data set used for convolutional neural network calculation in sample-dimension-first order. That is, the input feature data set is stored contiguously in the off-chip memory of the vector processor as an N*M matrix, where M is the total number of samples in the data set and N=preH*preW*preC is the number of input features of a single sample. The convolution kernel data is stored in convolution-kernel-count-dimension-first order;
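
The sample-dimension-first layout of Step 1 can be sketched as follows. This is an illustration under stated assumptions: the N*M matrix is taken to be stored row by row, so that the M samples of one feature are adjacent in memory, and the ordering of features inside N (height, then width, then channel) is an assumption, since the source does not specify it. The helper names are hypothetical.

```python
def feature_row(h, w, c, preW, preC):
    """Row index of feature (h, w, c) inside N = preH*preW*preC.
    Assumed ordering: height, then width, then channel."""
    return (h * preW + w) * preC + c


def flat_offset(n, m, M):
    """Offset of feature n of sample m in the contiguous N*M matrix."""
    return n * M + m


def pack_sample_first(samples):
    """Pack M per-sample feature lists (each of length N) into one flat
    list laid out as an N*M matrix, row by row."""
    M = len(samples)
    N = len(samples[0])
    return [samples[m][n] for n in range(N) for m in range(M)]


# Two samples (M=2) with three features each (N=3).
packed = pack_sample_first([[0, 1, 2], [10, 11, 12]])
second_feature_of_second_sample = packed[flat_offset(1, 1, 2)]
```

With this layout, one contiguous vector load along a row fetches the same feature for many samples, so the vector length depends on the sample count M rather than on the channel count.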

[0051] Step 2: the vector processor divides the input feature data set data ...


Abstract

The invention discloses a vectorized implementation method for Valid convolution of a convolutional neural network. The method comprises the following steps: 1, storing the input feature data set in sample-dimension-first order, and storing the convolution kernel data in convolution-kernel-count-dimension-first order; 2, dividing the data matrix of the input feature data set into a plurality of matrix blocks by columns; 3, on each pass, transmitting the convolution kernel data matrix to the SM of each core, and transmitting a sub-matrix formed by extracting K rows from the input feature data matrix to the AM of each core; 4, executing vectorized and parallelized matrix multiplication; 5, storing the resulting output feature matrix in the off-chip memory of the vector processor; and 6, repeating steps 4 and 5 until all input feature data matrices have been processed. The method is simple to implement, achieves high execution efficiency and precision, and has low bandwidth requirements.
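
The steps in the abstract can be sketched in plain Python as a reference model. This is not the patented implementation. It assumes that K equals kH*kW*preC (the source text defining K is truncated), that features inside N are ordered height, then width, then channel, and that the kernel matrix is K rows by nextC columns. The function name, `block_cols`, and `stride` are hypothetical.

```python
def valid_conv_as_matmul(data, kernels, preH, preW, preC, kH, kW,
                         stride=1, block_cols=4):
    """data: N x M matrix (rows = features, columns = samples).
    kernels: K x nextC matrix with K = kH*kW*preC (assumed).
    Returns an (outH*outW*nextC) x M output feature matrix."""
    N, M = len(data), len(data[0])
    K, nextC = len(kernels), len(kernels[0])
    assert N == preH * preW * preC and K == kH * kW * preC
    outH = (preH - kH) // stride + 1   # Valid convolution: no padding
    outW = (preW - kW) // stride + 1
    out = [[0.0] * M for _ in range(outH * outW * nextC)]
    for col0 in range(0, M, block_cols):      # step 2: column blocks
        col1 = min(col0 + block_cols, M)
        for oh in range(outH):
            for ow in range(outW):
                # step 3: the K input rows covered by this kernel window
                rows = [((oh * stride + kh) * preW + ow * stride + kw) * preC + c
                        for kh in range(kH)
                        for kw in range(kW)
                        for c in range(preC)]
                base = (oh * outW + ow) * nextC
                for j in range(nextC):        # step 4: matrix multiply
                    acc = out[base + j]
                    for k, r in enumerate(rows):
                        weight = kernels[k][j]        # scalar, broadcast
                        src = data[r]
                        for m in range(col0, col1):   # vector dimension
                            acc[m] += weight * src[m]
    return out                                # step 5: output matrix


# 3x3 single-channel input, one 2x2 all-ones kernel, two samples:
# sample 0 holds the values 1..9, sample 1 holds all ones.
data = [[n + 1, 1] for n in range(9)]
kernels = [[1.0] for _ in range(4)]
result = valid_conv_as_matmul(data, kernels, 3, 3, 1, 2, 2)
```

In this model the innermost loop runs over samples, so each kernel weight acts as a scalar applied to a whole vector of samples, and each output element is accumulated lane by lane without any cross-lane summation.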

Description

Technical field

[0001] The invention relates to vector processors, and in particular to a vectorized implementation method for Valid convolution of a convolutional neural network.

Background technique

[0002] In recent years, deep learning models based on deep convolutional neural networks have achieved remarkable results in image recognition and classification, target detection, video analysis, and other areas, alongside the rapid development of related technologies such as data processing and processors. A convolutional neural network (CNN) is a type of feedforward neural network that includes convolution calculations and has a deep structure; it is one of the representative algorithms of deep learning. The input layer of a convolutional neural network can process multi-dimensional data. Since convolutional neural networks are most widely used in the field of computer vision, when designing the convolutional neural network structure, three-dimensional inp...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04, G06N3/063, G06N3/08, G06F15/80
CPC: G06N3/063, G06N3/08, G06F15/8007, G06N3/045
Inventors: 刘仲, 郭阳, 邓林, 田希, 扈啸, 陈海燕, 孙书为, 马媛, 曹坤, 吴立
Owner: NAT UNIV OF DEFENSE TECH