Convolution operation method based on expansion access on heterogeneous many-core architecture

A convolution operation method for heterogeneous many-core processors, applied in the field of deep learning. It addresses problems such as under-utilized processor computing resources, poor optimization results, and pressure on system bandwidth, and is intended to save memory bandwidth, reduce memory access requirements, and improve performance.

Pending Publication Date: 2022-03-22

JIANGNAN INST OF COMPUTING TECH

AI Technical Summary

Problems solved by technology

[0004] Some optimized convolution methods already exist. For example, im2col converts the convolution into a matrix multiplication so that an optimized matrix-multiplication routine can be used, but it requires expanding the input to K*K times its original size, which puts additional pressure on system memory.

Heterogeneous many-core processors contain a large number of slave cores and have powerful computing capability, so memory access bandwidth is the bottleneck of the system. For compute-intensive operations of this kind, such a method not only fails to utilize the processor's computing resources but also puts heavy pressure on system bandwidth, so the optimization effect is poor. Using memory access bandwidth efficiently and reducing memory access pressure is therefore the key to realizing the processor's full performance.
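The K*K expansion attributed to im2col can be illustrated with a small sketch. It assumes a single-channel 2-D input, no padding, and plain Python lists; the function names and the output-shape formula are illustrative assumptions and do not come from the patent text.

```python
def output_shape(hi, wi, k, stride=1):
    # Assumed "valid" (no padding) output size; not stated in the text.
    return (hi - k) // stride + 1, (wi - k) // stride + 1


def im2col(inp, k, stride=1):
    """Unfold a 2-D input (list of rows) into one flattened K*K window
    per output element."""
    ho, wo = output_shape(len(inp), len(inp[0]), k, stride)
    return [
        [inp[oy * stride + ky][ox * stride + kx]
         for ky in range(k) for kx in range(k)]
        for oy in range(ho)
        for ox in range(wo)
    ]


def conv_via_im2col(inp, weight, stride=1):
    """Convolution expressed as a matrix-vector product over the unfolded input."""
    k = len(weight)
    flat_w = [w for row in weight for w in row]
    return [sum(a * b for a, b in zip(col, flat_w))
            for col in im2col(inp, k, stride)]


def expansion_ratio(hi, wi, k, stride=1):
    """Elements stored after im2col divided by elements in the original input."""
    ho, wo = output_shape(hi, wi, k, stride)
    return ho * wo * k * k / (hi * wi)
```

Under these assumptions, `expansion_ratio(64, 64, 3)` is about 8.4, and the ratio approaches K*K = 9 as the input grows, which is the extra memory footprint the paragraph above refers to.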

Method used

Embodiment

[0019] Embodiment: The present invention provides a convolution operation method based on expansion access on a heterogeneous many-core architecture, which comprises the following steps:

[0020] S1. Take input, weight, and stride as inputs, where input has shape Hi*Wi and weight has shape K*K, and calculate the shape of the output, Ho*Wo, from the shapes of input and weight;

[0021] S2. According to the shape of the output, distribute the convolution calculation tasks evenly among the cores along the Ho and Wo dimensions, according to the logical number of each core, so that each core processes a calculation task of size Ho_BLOCK*Wo_BLOCK;

[0022] S3. Each core calculates the required input size Hi_BLOCK*Wi_BLOCK according to its own task size, where Hi_BLOCK=Ho_BLOCK*stride+K-1 and Wi_BLOCK=Wo_BLOCK*stride+K-1;

[0023] S4. Each core performs the convolution calculation using the obtained input (Hi_BLOCK*Wi_BLOCK) and the weight;

[0024] S5. Steps S3 and S4 are repeated until the...
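The steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes a single-channel input, no padding, and a sequential loop standing in for the slave cores. All function names are hypothetical, and the output-shape formula in `output_shape` is an assumption, since the text does not give it. The tile size uses the formula from S3, clamped at the input edge.

```python
def output_shape(hi, wi, k, stride):
    # S1 (assumed formula): "valid" convolution without padding.
    return (hi - k) // stride + 1, (wi - k) // stride + 1


def direct_conv(inp, weight, stride):
    """Reference convolution over the whole input, used only for checking."""
    k = len(weight)
    ho, wo = output_shape(len(inp), len(inp[0]), k, stride)
    return [
        [sum(inp[y * stride + ky][x * stride + kx] * weight[ky][kx]
             for ky in range(k) for kx in range(k))
         for x in range(wo)]
        for y in range(ho)
    ]


def blocked_conv(inp, weight, stride, ho_block, wo_block):
    """Tile the output; each tile plays the role of one core's task.
    Returns the output and the number of input elements fetched."""
    hi, wi = len(inp), len(inp[0])
    k = len(weight)
    ho, wo = output_shape(hi, wi, k, stride)
    out = [[0] * wo for _ in range(ho)]
    fetched = 0
    # S2: split the Ho*Wo output into Ho_BLOCK*Wo_BLOCK tasks.
    for oy0 in range(0, ho, ho_block):
        for ox0 in range(0, wo, wo_block):
            hb = min(ho_block, ho - oy0)  # edge tiles may be smaller
            wb = min(wo_block, wo - ox0)
            # S3: input tile size per the text, clamped to the input edge.
            iy0, ix0 = oy0 * stride, ox0 * stride
            hi_block = min(hb * stride + k - 1, hi - iy0)
            wi_block = min(wb * stride + k - 1, wi - ix0)
            # One contiguous fetch of the tile (stand-in for a local-memory copy).
            tile = [row[ix0:ix0 + wi_block] for row in inp[iy0:iy0 + hi_block]]
            fetched += hi_block * wi_block
            # S4: convolve using only the locally held tile.
            for y in range(hb):
                for x in range(wb):
                    out[oy0 + y][ox0 + x] = sum(
                        tile[y * stride + ky][x * stride + kx] * weight[ky][kx]
                        for ky in range(k) for kx in range(k))
    return out, fetched
```

In this sketch, a 6*6 input with K = 3, stride 1, and 2*2 output blocks fetches 4 tiles of 4*4 elements (64 in total), whereas unfolding the same input with im2col would store 16 windows of 9 elements (144 in total).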

Abstract

The invention discloses a convolution operation method based on expansion access on a heterogeneous many-core architecture. The method comprises the following steps: S1, taking an input, a weight, and a stride, where the input is Hi*Wi and the weight is K*K, and calculating the shape of the output, Ho*Wo, from the shapes of the input and the weight; S2, according to the shape of the output, distributing the convolution calculation tasks evenly among the cores along the Ho and Wo dimensions, according to the logical number of each core; S3, each core determining its required input size according to its own task size; S4, each core performing the convolution calculation using the obtained input (Hi_BLOCK*Wi_BLOCK) and the weight; and S5, repeating S3 and S4 until the calculation is finished. The method saves memory bandwidth resources while making full use of the many-core computing resources.

Description

Technical field

[0001] The invention relates to a convolution operation method based on expansion access on a heterogeneous many-core architecture, and belongs to the technical field of deep learning.

Background

[0002] Convolution is one of the most important concepts in deep learning. During the training and inference of a convolutional neural network, the convolution operation accounts for the vast majority of the computation. High-performance computing platforms usually provide dedicated solutions. For compute-intensive functions such as convolution in deep learning, the problems to be solved are how to supply enough data to the powerful computing cores in a timely manner and how to improve data reuse.

[0003] The convolution operation is the core operation of the artificial intelligence CNN network. The data of each K times of the convolution operation overlaps but is not repeated. If the number of data is frequent, if the chara...

Application Information

IPC(8): G06F17/15, G06F9/30, G06F15/16

CPC: G06F17/153, G06F15/161, G06F9/30007

Inventor: 袁欣辉, 尹万旺, 林蓉芬, 魏迪, 郑岩, 王飞, 孙浩男, 孙强, 史俊达, 王丹云

Owner: JIANGNAN INST OF COMPUTING TECH
