GPU batch matrix multiplication accelerator and processing method thereof

A GPU batch matrix multiplication accelerator and processing method, applied in the fields of input data processing, electrical digital data processing, and digital data processing components. The technology aims to improve versatility and scope of application, improve efficiency and speed, and accelerate execution.

Pending Publication Date: 2022-07-01
SOUTH CHINA UNIV OF TECH +1
AI Technical Summary

Problems solved by technology

[0006] However, in variable-size matrix computations, different computing instances call different kernel functions at the same time, which causes irregular behavior inside the hardware. Under the round-robin scheduling algorithm, the workload assigned to each computing unit is unbalanced, so GPU computing resources are distributed unevenly, which in turn severely harms computing parallelism and computing density.
[0007] The prior art therefore has the following problem: when an existing GPU-based batch matrix multiplication library processes a large batch of variable-size matrix multiplications, the combined effect of the variable-size input distribution and the work group scheduling algorithm leads to serious load imbalance and internal irregularity among the computing units. This reduces the execution efficiency of the computing pipeline, prevents the GPU from making good use of the hardware's parallel capability, and lowers the computation rate of the final algorithm.
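The imbalance described in [0006] and [0007] can be illustrated with a small simulation. This is a minimal sketch, not the patented method: the batch shapes, the `2 * m * n * k` cost model, and the greedy largest-first assignment (a standard heuristic, swapped in here only as a point of comparison) are all assumptions made for illustration.

```python
import heapq

def gemm_flops(m, n, k):
    """Approximate cost of one C[m,n] = A[m,k] @ B[k,n] product (assumed cost model)."""
    return 2 * m * n * k

def round_robin(costs, num_units):
    """Assign task i to unit i % num_units, ignoring task size."""
    loads = [0] * num_units
    for i, c in enumerate(costs):
        loads[i % num_units] += c
    return loads

def greedy_balanced(costs, num_units):
    """Largest task first, each to the currently least-loaded unit.

    This is a generic load-balancing heuristic used for comparison only;
    it is not taken from the patent text.
    """
    heap = [(0, u) for u in range(num_units)]
    heapq.heapify(heap)
    for c in sorted(costs, reverse=True):
        load, u = heapq.heappop(heap)
        heapq.heappush(heap, (load + c, u))
    # Report loads in unit order.
    return [load for load, _ in sorted(heap, key=lambda t: t[1])]

def imbalance(loads):
    """Ratio of the busiest unit's load to the mean load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))

# Hypothetical variable-size batch: large and small products alternate,
# so round-robin sends every large product to the same two units.
shapes = [(256, 256, 256), (16, 16, 16)] * 8
costs = [gemm_flops(*s) for s in shapes]
rr = round_robin(costs, 4)
lpt = greedy_balanced(costs, 4)
print("round-robin imbalance:", imbalance(rr))
print("size-aware imbalance: ", imbalance(lpt))
```

With this input, two of the four simulated units receive all the large products under round-robin, while the size-aware assignment spreads them evenly. Real GPU work group scheduling is more complex; the simulation only shows why input size distribution and a size-blind scheduler interact badly.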

Examples

Embodiment 1

[0053] A GPU batch matrix multiplication accelerator, as shown in Figure 1, includes:

[0054] An instruction decoding processor, which reads and decodes the DMA transfer packets submitted by the CPU side to the instruction queue and passes the decoded result to the DMA engine, which loads the matrix data from the CPU side;

[0055] A computing unit, which executes computations on the loaded matrix shards in parallel, computing the result of each shard through a kernel function;

[0056] A shared memory unit, which stores the partial matrix results produced during the intermediate computation of the matrix shards;

[0057] A high-speed storage unit, which stores related data;

[0058] A work group manager, which manages work between units;

[0059] A main branch circuit, composed of the work group manager, the instruction decoding processor, the high-speed storage unit, the shared memory unit, the computing unit, and the DMA, which is used to optimi...
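The shard-based computation in [0055] and [0056] can be sketched in software as a tiled matrix product. This is an illustrative sketch only: the tile size, the function names `split_ranges` and `tiled_matmul`, and the use of a local accumulator to stand in for the shared memory unit are assumptions, and the loops run sequentially where the hardware would process shards in parallel.

```python
def split_ranges(extent, tile):
    """Return (start, stop) index ranges covering [0, extent) in steps of tile."""
    return [(s, min(s + tile, extent)) for s in range(0, extent, tile)]

def tiled_matmul(a, b, tile=2):
    """Compute a @ b one output shard at a time.

    Each (row range, column range) pair is one output shard. The local
    accumulator `acc` stands in for the intermediate partial results that
    the text says are held in the shared memory unit (an assumption of
    this sketch).
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for r0, r1 in split_ranges(m, tile):
        for c0, c1 in split_ranges(n, tile):
            acc = [[0] * (c1 - c0) for _ in range(r1 - r0)]
            # Accumulate partial products over slices of the shared dimension.
            for k0, k1 in split_ranges(k, tile):
                for i in range(r0, r1):
                    for j in range(c0, c1):
                        acc[i - r0][j - c0] += sum(
                            a[i][p] * b[p][j] for p in range(k0, k1)
                        )
            # Write the finished shard back into the output matrix.
            for i in range(r0, r1):
                c[i][c0:c1] = acc[i - r0]
    return c

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 0], [0, 1], [2, 2]]
result = tiled_matmul(a, b)
print(result)
```

Because each output shard depends only on its own row and column ranges, the shards are independent of one another, which is what makes them suitable for parallel execution on separate computing units.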

Abstract

The invention discloses a GPU (Graphics Processing Unit) batch matrix multiplication accelerator. A main branch circuit, consisting of a work group manager, an instruction decoding processor, a high-speed storage unit, a shared memory unit, a computing unit, and a DMA (Direct Memory Access) engine, performs batch order optimization on the matrix data to obtain matrix fragments and loads the fragments into the computing unit. A bypass branch circuit, consisting of the instruction decoding processor, the high-speed storage unit, the shared memory unit, the computing unit, and the DMA engine, is used when batch order optimization is not performed: the kernel function computation is applied directly to the matrix data to obtain the matrix result. The method effectively improves the utilization of the computing units, balances the task load across them, and raises computing density during operation. It achieves higher instruction parallelism, thread parallelism, and memory access parallelism, so that the computing power of the hardware is fully exploited and the computation is accelerated.
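The two paths in the abstract can be pictured as a simple dispatch. This is a hedged software sketch, not the circuit itself: the abstract does not say how the batch order is optimized, so the descending-cost ordering, the row-wise fragmenting, and the names `run_batch`, `cost`, and `optimize_order` are all hypothetical.

```python
def matmul(a, b):
    """Plain product of two list-of-lists matrices (stands in for the kernel function)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cost(pair):
    """Assumed cost of one product: m * k * n multiply-adds."""
    a, b = pair
    return len(a) * len(b) * len(b[0])

def run_batch(pairs, optimize_order, tile=2):
    """Return one result per (a, b) pair, in the original batch order."""
    if not optimize_order:
        # Bypass path: apply the kernel directly to each matrix pair.
        return [matmul(a, b) for a, b in pairs]
    # Main path: reorder the batch (largest first, an assumed criterion),
    # split each left operand into row fragments, and compute per fragment.
    order = sorted(range(len(pairs)), key=lambda i: cost(pairs[i]), reverse=True)
    results = [None] * len(pairs)
    for i in order:
        a, b = pairs[i]
        out = []
        for r0 in range(0, len(a), tile):
            out.extend(matmul(a[r0:r0 + tile], b))
        results[i] = out
    return results

batch = [
    ([[1, 2]], [[3], [4]]),                        # 1x2 times 2x1
    ([[1, 0], [0, 1], [1, 1]], [[5, 6], [7, 8]]),  # 3x2 times 2x2
]
main_out = run_batch(batch, optimize_order=True)
bypass_out = run_batch(batch, optimize_order=False)
print(main_out == bypass_out)
```

Both paths produce the same numerical results; the difference lies only in the order and granularity of the work, which is where the load-balancing benefit claimed in the abstract would come from.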

Description

Technical field

[0001] The invention relates to the research field of high-performance computing, in particular to a GPU batch matrix multiplication accelerator and a processing method thereof.

Background

[0002] Basic Linear Algebra Subprograms (BLAS) is an interface standard for a set of basic linear algebra routines that are widely used in scientific computing and industry. Some higher-level languages and computing libraries are also implemented by calling the BLAS interface (the R language, MATLAB, NumPy, LAPACK, etc.). As BLAS has developed, implementations for different platforms and hardware architectures have appeared, such as cuBLAS, rocBLAS, MKL, MAGMA, and OpenBLAS, and they have played a crucial role in the development of modern science and industry.

[0003] Classical BLAS application scenarios tend to have better performance for large inputs (la...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/16, G06F17/18, G06F13/28, G06F9/54, G06F9/50, G06F7/523, G06F7/08
CPC: G06F17/16, G06F17/18, G06F7/08, G06F7/523, G06F13/28, G06F9/544, G06F9/5011
Inventors: 陆璐, 王瑞民, 冼允廷
Owner: SOUTH CHINA UNIV OF TECH