
Parallel processing method for multi-input multi-output matrix convolution

A parallel processing method for multi-input multi-output matrix convolution, applied to neural learning methods, neural architectures, biological neural network models, and related fields. It addresses problems such as the small scale of convolution kernels and the difficulty of exploiting the computing advantages of high-performance hardware, achieving high-performance computation, easy operation, and improved computational efficiency.

Active Publication Date: 2018-06-26
NAT UNIV OF DEFENSE TECH
Cites: 6 · Cited by: 18

AI Technical Summary

Problems solved by technology

[0005] Matrix convolution is one of the core modules of convolutional neural network models. It is both computationally intensive and memory-intensive, and the convolution kernels involved are generally small. Therefore, without a well-designed calculation method, it is difficult to realize the full computing advantage even of high-performance computing equipment.
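For reference, the multi-input multi-output matrix convolution discussed throughout this record can be written as a direct (unoptimized) computation. The following NumPy sketch is illustrative only and not part of the patent; the function name and array layouts are assumptions:

```python
import numpy as np

def conv_mimo(inputs, kernels, s=1):
    """Direct multi-input multi-output matrix convolution (reference sketch).

    inputs  : (M, H, W)     -- M input feature maps
    kernels : (P, M, k, k)  -- P convolution kernels, one per output map
    s       : moving step size (stride)
    Returns (P, H_out, W_out) output feature maps.
    """
    M, H, W = inputs.shape
    P, M2, k, _ = kernels.shape
    assert M == M2, "kernel channel count must match input map count"
    H_out = (H - k) // s + 1
    W_out = (W - k) // s + 1
    out = np.zeros((P, H_out, W_out))
    for p in range(P):                 # one output feature map per kernel
        for i in range(H_out):
            for j in range(W_out):
                patch = inputs[:, i*s:i*s+k, j*s:j*s+k]
                out[p, i, j] = np.sum(patch * kernels[p])
    return out
```

Because the kernel size k is small, the inner loops do little work per iteration, which is exactly the inefficiency the patent's parallel scheme targets.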



Examples


Embodiment Construction

[0029] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0030] As shown in Figure 7, the parallel processing method for multi-input multi-output matrix convolution of the present invention comprises the following steps:

[0031] S1: Determine the optimal calculation scheme for the output feature maps according to the number N of vector processing elements (VPEs) of the vector processor, the number M of input feature maps, the number P of convolution kernels, the convolution kernel size k, and the moving step size s;
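Step S1 can be sketched as a small planning helper. The function below is a hypothetical illustration of the kind of scheme selection described (the patent does not give this formula explicitly); the name `plan_output_scheme` and the pass-count heuristic are assumptions:

```python
import math

def plan_output_scheme(n_vpe, M, P, k, H, W, s):
    """Hypothetical sketch of step S1: decide how many kernels (output
    feature maps) to compute per pass given N vector processing elements.
    H, W are the input feature-map height and width (assumed parameters).
    """
    n = min(n_vpe, P)            # kernels processed in parallel (N <= P)
    passes = math.ceil(P / n)    # sweeps over the input to cover all P maps
    H_out = (H - k) // s + 1     # output feature-map height
    W_out = (W - k) // s + 1     # output feature-map width
    return n, passes, (H_out, W_out)
```

For example, with 16 VPEs, 32 kernels of size 3, and 8x8 inputs at stride 1, this plan computes 16 output maps per pass over two passes.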

[0032] S2: Store the M input feature maps in the external storage DDR in turn; splice the N input convolution kernels along the third dimension, row by row, and transfer the spliced convolution kernel matrix into the vector memory (AM) of the vector processor, where N <= P;
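One plausible reading of the splicing in S2 is that each of the N kernels becomes one column of a matrix, with its weights flattened channel by channel and row by row, so that a single matrix row can be handed to all N VPE lanes at once. This layout is an assumption, not stated verbatim in the record:

```python
import numpy as np

def splice_kernels(kernels):
    """Plausible sketch of step S2's kernel splicing.

    kernels : (N, M, k, k) -- N kernels with M channels each
    Returns a (M*k*k, N) matrix whose column p is kernel p flattened in
    (channel, row, column) order, matching the broadcast order of input
    elements in the later steps.
    """
    N, M, k, _ = kernels.shape
    return kernels.reshape(N, M * k * k).T
```

With this layout, row r of the spliced matrix holds the r-th weight of every kernel, which is what a broadcast multiply-accumulate over N lanes consumes per step.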

[0033] S3: Load the first element of input feature map 1 and broadcast it to the vector register; meanwhile, load the elements of the first row of the convolution kernels in the AM into the vector register...


PUM

No PUM

Abstract

The invention discloses a parallel processing method for multi-input multi-output matrix convolution. The method comprises the following steps: S1, an optimal calculation scheme for the output feature maps is determined according to the number N of vector processing elements (VPEs) of a vector processor and other parameters; S2, M input feature maps are stored in turn in an external storage DDR, and N input convolution kernels are spliced row by row along the third dimension; S3, the first element of the first input feature map is loaded and broadcast to a vector register, and meanwhile the elements of the first row of a convolution kernel in the AM are loaded into the vector register; S4, accumulation is performed k×k times to complete the calculation for the first input feature map, while the second input feature map is loaded; S5, the preceding steps are repeated until the first elements of the N output feature maps have been calculated; S6, all elements of the N output feature maps are calculated according to the moving step size; S7, the steps are repeated until the whole process is complete. The method is easy to implement and convenient to operate, improves the parallelism of the vector processor, and raises processor operating efficiency.
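The broadcast-and-accumulate loop of steps S3 through S7 can be simulated in software to check that it reproduces the direct convolution. In the sketch below, a NumPy vector stands in for the N VPE lanes; each input element is broadcast to all lanes and multiply-accumulated against that element's weight in every kernel. The loop structure is an interpretation of the abstract, not the patent's literal implementation:

```python
import numpy as np

def conv_broadcast_mac(inputs, kernels, s=1):
    """Sketch of the S3-S7 scheme: each input element is broadcast to all
    N 'VPE' lanes, and each lane multiply-accumulates with its own kernel's
    weight, so N output feature maps are built element by element in parallel.

    inputs  : (M, H, W)     -- M input feature maps
    kernels : (N, M, k, k)  -- N kernels, one per output feature map
    """
    M, H, W = inputs.shape
    N, _, k, _ = kernels.shape
    W_mat = kernels.reshape(N, M * k * k).T   # spliced kernel matrix (S2)
    H_out = (H - k) // s + 1
    W_out = (W - k) // s + 1
    out = np.zeros((N, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):                # S6: slide window by step s
            acc = np.zeros(N)                 # vector accumulator, one lane per map
            row = 0
            for m in range(M):                # S4/S5: one input map after another
                for r in range(k):
                    for c in range(k):        # S3: broadcast one input element
                        x = inputs[m, i*s + r, j*s + c]
                        acc += x * W_mat[row]  # N MACs in parallel
                        row += 1
            out[:, i, j] = acc
    return out
```

Each pass through the inner loops performs M·k·k broadcast multiply-accumulates and yields one element of all N output feature maps simultaneously, which is where the claimed parallelism comes from.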

Description

Technical Field

[0001] The invention mainly relates to the fields of artificial intelligence, machine learning, and convolutional neural networks, and in particular to a parallel processing method for multi-input multi-output matrix convolution.

Background Technique

[0002] With the rise of deep learning, target recognition based on convolutional neural networks has made breakthroughs and is widely used in image recognition, speech recognition, natural language processing, and other fields. Matrix convolution is a computation-intensive and memory-intensive operation, and the matrix convolution operations in a convolutional neural network model often account for more than 85% of the model's total computation, so how to accelerate matrix convolution is an important and difficult point of current research.

[0003] For computation- and memory-intensive matrix convolution operations, the curren...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/04, G06N3/08
CPC: G06N3/08, G06N3/045
Inventor: 郭阳, 张军阳, 杨超, 田希, 扈啸, 李斌
Owner: NAT UNIV OF DEFENSE TECH