
GPU thread scheduling optimization method

A scheduling method and thread technology, applied in the field of GPU thread scheduling, which solves problems such as destroyed data locality, unbalanced warp progress, and idle functional units, and achieves the effect of improving GPU utilization and avoiding pipeline stalls.

Inactive Publication Date: 2013-10-02
凯习(北京)信息科技有限公司


Problems solved by technology

Fair round-robin scheduling allows inter-warp locality to be exploited; however, the policy results in unbalanced warp progress that destroys this locality.
Moreover, a pure round-robin scheduling strategy tends to make all warps reach the same long-latency operation at the same time. With all warps stalled at once, there are no ready warps left to hide the latency, and the result is that some functional-unit (FU) cycles sit idle.
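The idle-cycle problem described above can be shown with a toy simulation. This is a hypothetical sketch, not from the patent: a handful of warps execute the same instruction stream under fair round-robin issue, advance in lockstep, and all reach the long-latency load within a few cycles of each other, leaving the FU idle until the first load returns.

```python
# Toy single-issue SM under fair round-robin warp scheduling (illustrative
# model only; latencies and warp counts are assumed, not from the patent).

MEM_LATENCY = 100        # cycles for one long-latency memory operation
N_WARPS = 8
# Per-warp instruction stream: compute, one long-latency load, more compute.
PROGRAM = ["alu"] * 10 + ["load"] + ["alu"] * 10

def simulate_round_robin():
    pc = [0] * N_WARPS              # next instruction index per warp
    ready_at = [0] * N_WARPS        # cycle at which each warp may issue again
    cycle = idle = start = 0
    while any(p < len(PROGRAM) for p in pc):
        issued = False
        for i in range(N_WARPS):    # fair round-robin: rotate the start warp
            w = (start + i) % N_WARPS
            if pc[w] < len(PROGRAM) and ready_at[w] <= cycle:
                op = PROGRAM[pc[w]]
                pc[w] += 1
                if op == "load":    # the warp stalls for the memory latency
                    ready_at[w] = cycle + MEM_LATENCY
                start = (w + 1) % N_WARPS
                issued = True
                break
        if not issued:
            idle += 1               # no ready warp: this FU cycle is wasted
        cycle += 1
    return cycle, idle

total, idle = simulate_round_robin()
print(f"total cycles: {total}, idle FU cycles: {idle}")
```

Because every warp issues its load within 8 consecutive cycles, nearly the whole memory latency shows up as idle FU cycles, which is exactly the pathology the three-level scheduling below is meant to avoid.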




Embodiment Construction

[0030] The present invention is a GPU thread scheduling method, comprising the following processing steps:

[0031] Step 1: Architecture

[0032] (A) The GPU architecture referred to here is the CUDA architecture.

[0033] (B) It contains multiple SMs (streaming multiprocessors) inside, and each SM contains multiple CUDA cores.

[0034] (C) Each CUDA core contains a functional unit (FU) that performs computation.

[0035] (D) A warp contains 32 threads, and the threads in the same warp execute the same instruction and process different data.
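The SIMT execution model in [0035] can be illustrated with a toy sketch (plain Python standing in for hardware lanes; the function and names are hypothetical, not part of the patent):

```python
# Illustrative SIMT model: the 32 threads of one warp execute the same
# instruction, each on its own data element.

WARP_SIZE = 32

def warp_execute(instruction, data):
    """Apply one instruction across all 32 lanes of a warp."""
    assert len(data) == WARP_SIZE
    return [instruction(x) for x in data]

lanes = list(range(WARP_SIZE))                 # each thread's private operand
result = warp_execute(lambda x: x * 2, lanes)  # same instruction, 32 lanes
print(result[:4])  # → [0, 2, 4, 6]
```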

[0036] Step 2: Thread Block

[0037] (A) A kernel corresponds to one thread grid, the collective term for all threads spawned by that kernel; the grid's dimensions are specified by the programmer.

[0038] (B) The thread grid contains multiple thread blocks, whose dimensions are likewise specified by the programmer. Thread blocks are numbered starting from 0.
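The abstract states that the numbered thread blocks are mapped onto SMs "with a hash method", but the text here is truncated before the hash is defined. The sketch below therefore assumes the simplest such hash, block number modulo SM count; the patent's actual mapping may differ.

```python
# Hypothetical block-to-SM hash mapping (assumed: modulo over the SM count).

NUM_SMS = 4

def block_to_sm(block_id: int, num_sms: int = NUM_SMS) -> int:
    """Map a numbered thread block to an SM index."""
    return block_id % num_sms

mapping = {b: block_to_sm(b) for b in range(8)}
print(mapping)  # blocks 0..7 spread evenly over SMs 0..3
```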

[0039] (C) The thread block is mapped to t...



Abstract

The invention discloses a GPU thread scheduling optimization method. The method numbers the thread blocks, maps them to SMs with a hash method, assigns different priority levels to the thread blocks within the same SM, divides the warps in each block into fixed-size groups according to the number of pipeline stages, and schedules warps within a group, groups within a block, and blocks with a round-robin method. The method resolves the problem of all warps reaching the same long-latency operation at the same time; through this three-level warp scheduling strategy, FU cycles left idle by long-latency operations are reduced to some degree, and the utilization of GPU computing resources is improved.
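The three-level round-robin in the abstract can be sketched as nested scheduling pointers. This is one interpretation under stated assumptions (the patent's full detail is truncated here): each scheduling slot picks the next block on the SM, the next group within that block, and the next warp within that group, all round-robin, which staggers warp progress across groups.

```python
# Sketch of three-level round-robin warp selection (hypothetical layout:
# warp ids grouped per block; group size assumed equal to pipeline stages).

def make_groups(warps, size):
    """Divide a block's warps into fixed-size groups."""
    return [warps[i:i + size] for i in range(0, len(warps), size)]

def three_level_schedule(blocks, group_size, slots):
    """blocks: list of per-block warp-id lists. Return `slots` warp picks."""
    groups = [make_groups(b, group_size) for b in blocks]
    bp = 0                                  # block pointer (per SM)
    gp = [0] * len(groups)                  # group pointer per block
    wp = [[0] * len(gs) for gs in groups]   # warp pointer per group
    out = []
    for _ in range(slots):
        gs = groups[bp]
        g = gp[bp]
        w = wp[bp][g]
        out.append(gs[g][w])
        wp[bp][g] = (w + 1) % len(gs[g])    # round-robin within the group
        gp[bp] = (g + 1) % len(gs)          # round-robin over groups in block
        bp = (bp + 1) % len(groups)         # round-robin over blocks on SM
    return out

blocks = [list(range(0, 8)), list(range(8, 16))]
print(three_level_schedule(blocks, group_size=4, slots=8))
# → [0, 8, 4, 12, 1, 9, 5, 13]
```

Note how consecutive picks alternate blocks and groups, so when one group stalls on a long-latency operation, warps of other groups are still at earlier instructions and can keep the FUs busy.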

Description

technical field

[0001] The invention relates to a multi-thread scheduling method in computer architecture, and in particular to a GPU thread scheduling method in heterogeneous architectures.

Background technique

[0002] Graphics processing units (GPUs) have become a popular platform for executing general-purpose parallel applications. Programming systems such as CUDA, ATI, and OpenCL allow programmers to parallelize an application into thousands of threads that all execute the same code. Existing research has shown that applications running on GPUs achieve large speedups over the same applications running on CPUs. The GPU achieves such acceleration because it devotes far more of its resources to computation than a CPU does; programmers can exploit these rich computing resources by developing thread-level parallelism (TLP). Although the GPU contains a large number of computing resources, the computing resources in the...

Claims


Application Information

IPC(8): G06F9/48
Inventor: Fu Cuijiao (傅翠娇), Wang Rui (王锐), Luan Zhongzhi (栾钟治), Qian Depei (钱德沛)
Owner 凯习(北京)信息科技有限公司