GPU thread scheduling optimization method

A thread scheduling method, applied in the field of GPU thread scheduling, that addresses problems such as destroyed locality, unbalanced warp progress, and idle cycles, with the effect of improving utilization and avoiding stalls.

Status: Inactive; Publication Date: 2013-10-02

凯习(北京)信息科技有限公司

Problems solved by technology

Fair round-robin scheduling allows this to happen; however, the scheduling policy results in unbalanced warp progress that destroys this locality.

A pure round-robin scheduling strategy, however, tends to make all warps reach the same long-latency operation at the same time. Because all warps are then stalled, there are not enough ready warps to hide the long latency, and as a result the FUs are idle for some cycles.
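The stall described above can be illustrated with a small back-of-the-envelope model. The Python sketch below is an illustration only and is not taken from the patent: it assumes a single issue slot per cycle, warps that each run the same number of one-cycle compute instructions before one long-latency operation, and strict round-robin issue. The function name `stall_window` and all of its parameters are hypothetical.

```python
def stall_window(num_warps, compute_instrs, latency):
    """Estimate idle issue cycles when all warps hit a long-latency op together.

    Assumed toy model (not from the patent): one issue slot per cycle,
    strict round-robin, every warp runs `compute_instrs` one-cycle
    instructions and then one long-latency operation.
    """
    # Under strict round-robin, warp w issues its i-th instruction (0-based)
    # at cycle i * num_warps + w, so the long-latency op (index
    # compute_instrs) is issued at the cycles below.
    issue = [compute_instrs * num_warps + w for w in range(num_warps)]
    # Each warp becomes ready again `latency` cycles after issuing the op.
    ready = [t + latency for t in issue]
    # Cycles between the last issue and the first return have no ready warp.
    idle = max(0, min(ready) - max(issue) - 1)
    return issue, ready, idle


if __name__ == "__main__":
    for warps in (4, 16):
        _, _, idle = stall_window(warps, compute_instrs=2, latency=10)
        print(f"{warps} warps: {idle} idle cycles")
```

In this model, 4 warps with a 10-cycle latency leave 6 idle cycles, while 16 warps leave none because the issue slots of the other warps cover the latency. Real hardware has more issue slots and overlapping memory requests, so the numbers are only indicative.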

Embodiment Construction

[0030] The present invention is a GPU thread scheduling method, comprising the following processing steps:

[0031] Step 1: Architecture

[0032] (A) The GPU architecture referred to in this document is the CUDA architecture.

[0033] (B) The GPU contains multiple SMs (streaming multiprocessors), and each SM contains multiple CUDA cores.

[0034] (C) Each CUDA core has one computational unit (FU).

[0035] (D) A warp contains 32 threads, and the threads in the same warp execute the same instruction and process different data.
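The relationship between threads and warps in Step 1 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the patent: it assumes threads are identified by a linearized index within their block and that consecutive indices are packed into the same warp, which is how CUDA partitions a block. The helper names are hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as stated in [0035]


def warp_and_lane(thread_idx):
    """Return (warp index, lane within the warp) for a linear thread index."""
    return thread_idx // WARP_SIZE, thread_idx % WARP_SIZE


def warps_per_block(threads_per_block):
    """Number of warps needed for a block; the last warp may be partly empty."""
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE


if __name__ == "__main__":
    print(warp_and_lane(31))     # last thread of warp 0
    print(warp_and_lane(32))     # first thread of warp 1
    print(warps_per_block(100))  # 100 threads round up to whole warps
```

A block whose size is not a multiple of 32 still occupies whole warps, so block sizes are usually chosen as multiples of the warp size.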

[0036] Step 2: Thread Block

[0037] (A) A kernel corresponds to a thread grid, which is the collective term for all threads generated by that kernel. The dimensions of the grid are specified by the programmer.

[0038] (B) The thread grid contains multiple blocks, and the dimensions of each block are specified by the programmer. Thread blocks are numbered starting from 0.

[0039] (C) The thread block is mapped to t...
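The block numbering in Step 2 can be sketched as follows. The source only states that thread blocks are numbered starting from 0; the row-major linearization of a two-dimensional grid used here is an assumption made for illustration, and the function name `number_blocks` is hypothetical.

```python
def number_blocks(grid_dim_x, grid_dim_y):
    """Assign each (bx, by) block coordinate a linear number starting at 0.

    Assumption: row-major order, i.e. block_id = by * grid_dim_x + bx.
    The patent text does not specify the ordering.
    """
    return {
        (bx, by): by * grid_dim_x + bx
        for by in range(grid_dim_y)
        for bx in range(grid_dim_x)
    }


if __name__ == "__main__":
    ids = number_blocks(grid_dim_x=3, grid_dim_y=2)
    for coord, block_id in sorted(ids.items(), key=lambda kv: kv[1]):
        print(coord, "->", block_id)
```

Any ordering that gives each block a unique number from 0 upward would serve the same purpose.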

Abstract

The invention discloses a GPU thread scheduling optimization method. The method comprises numbering the thread blocks, mapping the thread blocks to SMs with a hash method, assigning different priority levels to the thread blocks in the same SM, dividing the warps in each block into fixed-size groups according to the number of pipeline stages, and scheduling the warps within a group, the groups, and the blocks with a round-robin method. The method addresses the problem of all warps reaching the same long-latency operation at the same time. Through its three-level warp scheduling strategy, it alleviates to some degree the idle FU cycles caused by long-latency operations and improves the utilization of GPU computing resources.
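The steps in the abstract can be sketched as a toy Python model. This is a hedged illustration under stated assumptions, not the patented implementation: the hash is assumed to be a simple modulo (the source does not name the hash function), the group size is a free parameter standing in for the number of pipeline stages, block-level priorities are omitted, and each warp runs a fixed pattern of compute instructions, one long-latency operation, then more compute instructions. All function names are hypothetical.

```python
from collections import defaultdict


def map_blocks_to_sms(num_blocks, num_sms):
    """Map block numbers to SMs. Assumed hash: block_id mod num_sms."""
    sms = defaultdict(list)
    for block_id in range(num_blocks):
        sms[block_id % num_sms].append(block_id)
    return dict(sms)


def group_warps(num_warps, group_size):
    """Split warp ids 0..num_warps-1 into fixed-size groups."""
    warps = list(range(num_warps))
    return [warps[i:i + group_size] for i in range(0, num_warps, group_size)]


def simulate(groups, compute, latency):
    """Return (total_cycles, idle_cycles) for one issue slot per cycle.

    Each warp runs `compute` one-cycle instructions, one long-latency op,
    then `compute` more instructions. The scheduler stays on the active
    group (round-robin inside it) and moves to the next group only when
    no warp in the active group is ready.
    """
    total_instrs = 2 * compute + 1
    warps = [w for g in groups for w in g]
    pc = {w: 0 for w in warps}        # next instruction index per warp
    ready_at = {w: 0 for w in warps}  # earliest cycle the warp can issue
    rr = [0] * len(groups)            # round-robin pointer per group
    cur = 0                           # index of the active group
    cycle = idle = 0
    while any(pc[w] < total_instrs for w in warps):
        issued = False
        for gk in range(len(groups)):
            gi = (cur + gk) % len(groups)
            group = groups[gi]
            for k in range(len(group)):
                idx = (rr[gi] + k) % len(group)
                w = group[idx]
                if pc[w] < total_instrs and ready_at[w] <= cycle:
                    is_long = pc[w] == compute
                    ready_at[w] = cycle + (latency if is_long else 1)
                    pc[w] += 1
                    rr[gi] = (idx + 1) % len(group)
                    cur = gi
                    issued = True
                    break
            if issued:
                break
        if not issued:
            idle += 1
        cycle += 1
    return cycle, idle


if __name__ == "__main__":
    print(map_blocks_to_sms(num_blocks=5, num_sms=2))
    print("one group :", simulate(group_warps(4, 4), compute=2, latency=10))
    print("two groups:", simulate(group_warps(4, 2), compute=2, latency=10))
```

Putting all four warps in one group reproduces pure round-robin and, in this toy model, gives 26 cycles with 6 idle; splitting them into two groups of two gives 24 cycles with 4 idle, because the second group computes while the first waits on its long-latency operation. These figures come from the model only and say nothing about measured GPU performance.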

Description

Technical field

[0001] The invention relates to a multi-thread scheduling method in computer architecture, in particular to a GPU thread scheduling method in a heterogeneous architecture.

Background

[0002] Graphics processing units (GPUs) have become a popular platform for executing general-purpose parallel applications. Programming systems such as CUDA, ATI, and OpenCL allow programmers to parallelize applications into thousands of threads executing the same code. Existing research has also shown that applications running on GPUs achieve large speedups compared with running on CPUs. The GPU can achieve such speedups because it has more computing resources than the CPU. Programmers can make full use of these resources by exploiting thread-level parallelism (TLP). Although there are a large number of computing resources in the GPU, the computing resources in the...

Application Information

IPC(8): G06F9/48

Inventor 傅翠娇, 王锐, 栾钟治, 钱德沛

Owner 凯习(北京)信息科技有限公司
