
Convolution operation memory access optimization method based on GPU

A memory access optimization technology for convolution operations, applied in the field of GPU-based convolution, which addresses the high memory access overhead of convolution operations by reducing the number of memory accesses.

Active Publication Date: 2020-10-20
HARBIN INST OF TECH


Problems solved by technology

[0004] The purpose of the present invention is to solve the problems in the prior art that the memory access cost of convolution operations is high and that the excessive number of memory accesses degrades convolution performance.



Examples


Embodiments

[0072] Embodiments of the present invention are illustrated by the following examples.

[0073] To achieve the purpose of memory access optimization, an embodiment of the present invention, as shown in Figure 6, includes:

[0074] S1: Load the convolution kernel data into the shared memory.

[0075] S2: Divide the convolution output into sub-blocks in units of 32 columns, obtaining several sub-blocks containing 32 columns of data and one sub-block containing fewer than 32 columns of data. Figure 5 illustrates this division.
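The sub-block division in step S2 can be sketched in host-side Python (a simulation of the tiling, not the patent's GPU code; the function name is my own):

```python
def split_into_subblocks(total_cols, block_width=32):
    """Split the convolution output's columns into fixed-width
    sub-blocks. All blocks hold block_width columns except possibly
    the last, which holds the remainder (fewer than block_width)."""
    blocks = []
    start = 0
    while start < total_cols:
        width = min(block_width, total_cols - start)
        blocks.append((start, width))  # (first column, column count)
        start += width
    return blocks
```

For a 70-column output this yields two full 32-column sub-blocks and one 6-column remainder block, matching the "several full sub-blocks plus one partial sub-block" structure described above.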

[0076] S3: Suppose there are N threads for processing the sub-blocks; each thread calculates the index of the first data element it requires. The index of the first data is that of the first, left, and right data required by each thread, as shown in FIG. 2. The other required data can be obtained by index arithmetic from the first data element.
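Step S3 amounts to each thread deriving one starting input index, from which all of its other input indices follow by arithmetic. A minimal sketch, assuming stride-1 convolution with each thread owning one output column of a sub-block (these assumptions and the function name are mine, not from the patent):

```python
def first_input_index(block_start_col, thread_id, stride=1):
    """Index of the first input column a thread needs.

    Assumes one thread per output column within a 32-column
    sub-block and a 'valid' stride-1 convolution, so the first
    required input column coincides with the output column.
    Subsequent inputs are reached by adding kernel-column offsets.
    """
    out_col = block_start_col + thread_id
    return out_col * stride
```

Under these assumptions, thread 5 of the sub-block starting at column 32 first reads input column 37, and its remaining reads are simple offsets from that index rather than freshly computed addresses.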

[0077] S4: Each thread acquires the remaining required input data from the ...



Abstract

A convolution operation memory access optimization method based on a GPU relates to memory access optimization technology for convolution operations. The invention addresses the defect of high memory access overhead of convolution operations in the prior art. The method comprises the steps of: loading the convolution kernel data into shared memory; dividing the convolution output into sub-blocks in units of 32 columns to obtain a plurality of sub-blocks containing 32 columns of data and one sub-block containing fewer than 32 columns of data; having each thread calculate the index of the first data element it requires; having each thread obtain the remaining required input data from the index of the first data through a column reuse algorithm and transmit the obtained input data to a row reuse algorithm; calculating the output result through the row reuse algorithm and storing it in the register variable sum; writing sum into global memory; and calculating the remaining data of the convolution output. The method performs memory access optimization for convolution operations in the fields of image processing, video processing, and machine learning.
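The pipeline the abstract describes can be mirrored in a host-side Python sketch of direct 2D convolution: each simulated thread owns one output column, neighboring threads reuse overlapping input columns, the kernel rows are walked so input rows are reused across taps, and the partial result accumulates in a local variable (standing in for the register sum) before a single write to the output. This is an illustrative simulation under stride-1 "valid" assumptions, not the patent's GPU implementation:

```python
def conv2d_direct(inp, kernel):
    """Direct 2D 'valid' convolution mirroring the described scheme:
    one (simulated) thread per output column, a local accumulator
    acting as the register variable 'sum', and one write-out per
    output element."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(inp) - kh + 1
    ow = len(inp[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for oc in range(ow):                # one thread per output column
        for orow in range(oh):
            s = 0.0                     # register accumulator 'sum'
            for kr in range(kh):        # row reuse: walk kernel rows
                for kc in range(kw):    # column reuse: adjacent threads
                    s += inp[orow + kr][oc + kc] * kernel[kr][kc]
            out[orow][oc] = s           # single global-memory write
    return out
```

On the GPU, the payoff of this arrangement is that the inner accumulation never touches global memory: inputs that neighboring threads share are fetched once, and each output element is written exactly once.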

Description

Technical Field

[0001] The invention relates to memory access optimization technology for convolution operations, and in particular to a GPU-based convolution operation memory access optimization method.

Background Art

[0002] In the fields of image processing, video processing, and machine learning, the convolution operation has become a core computing pattern. 2D convolution is widely used in image filtering and frame differencing; depth-wise convolution is common in mobile neural networks; multi-channel 2D convolution is the core operation in neural networks. However, convolution operations consume substantial computing and memory resources, occupying up to 90% of execution time in image processing and machine learning workloads. Many optimization methods for convolution have been proposed, among which those based on GEMM (matrix multiplication), FFT, and Winograd are the most widely used. However, these methods need to convert the input and out...
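The conversion overhead of GEMM-based methods alluded to above comes from the im2col lowering step: every output position is expanded into a row holding all the input values under the kernel, so overlapping windows are physically duplicated. A minimal sketch of this lowering (illustrative only; stride-1 "valid" convolution assumed):

```python
def im2col(inp, kh, kw):
    """im2col lowering used by GEMM-based convolution: each output
    position becomes one row of kh*kw input values. Overlapping
    windows duplicate data, so the lowered matrix is roughly kh*kw
    times larger than the input."""
    oh = len(inp) - kh + 1
    ow = len(inp[0]) - kw + 1
    return [[inp[r + dr][c + dc]
             for dr in range(kh) for dc in range(kw)]
            for r in range(oh) for c in range(ow)]
```

This duplication is the extra memory traffic that direct, reuse-aware schemes such as the one described in this patent try to avoid.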

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06N3/063; G06N3/04; G06F9/50
CPC: G06N3/063; G06F9/5016; G06N3/045; Y02D10/00
Inventors: 张伟哲, 鲁刚钊, 王峥, 李克勤, 孙广中
Owner: HARBIN INST OF TECH