GPU thread load balancing method and device, chip and electronic equipment

A GPU thread load balancing technique that addresses the limited performance gains and the added kernel launch and synchronization overhead of existing approaches, with the aim of eliminating load imbalance among threads and improving performance.

Pending Publication Date: 2022-06-03
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

The program splitting method does not fundamentally eliminate the load imbalance among GPU threads. Because it introduces multiple kernels, it also adds kernel launch and synchronization overhead, so the performance gain is limited.
The coarse-grained parallel method launches fewer GPU threads and gives each thread more work, which reduces the performance bottleneck caused by complex and unbalanced thread loads, but it still does not fundamentally eliminate the load imbalance among GPU threads.



Examples


Embodiment Construction

[0065] To make the above features and effects of the present invention clearer and easier to understand, specific embodiments are described in detail below with reference to the accompanying drawings.

[0066] As mentioned above, when optimizing GPU programs, load imbalance among GPU threads seriously degrades performance and becomes the bottleneck of the program. In CPU multi-threaded programs, the main technique for resolving load imbalance among threads is dynamic task allocation: tasks are handed out at run time according to how far each thread has progressed. In GPU programs, however, dynamic task allocation is difficult because the mapping between threads and tasks is static. Neither the program splitting method nor the coarse-grained parallel method fundamentally eliminates the performance bottleneck caused by load imbalance among GPU threads. The...
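The contrast drawn in [0066] between a static thread-to-task mapping and dynamic task allocation can be illustrated with a small CPU-side model. This is only a sketch under stated assumptions, not the patent's implementation: the function names, the round-robin static mapping, and the task costs are all hypothetical and chosen for illustration.

```python
import heapq


def static_makespan(costs, num_workers):
    """Static mapping: task i is bound to worker i % num_workers up front."""
    finish = [0] * num_workers
    for i, cost in enumerate(costs):
        finish[i % num_workers] += cost
    # The slowest worker determines when the whole job is done.
    return max(finish)


def dynamic_makespan(costs, num_workers):
    """Dynamic allocation: the next task goes to whichever worker frees up first."""
    free_at = [0] * num_workers  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Hypothetical task costs: two heavy tasks among many light ones.
costs = [8, 1, 1, 1, 8, 1, 1, 1]
print(static_makespan(costs, 4), dynamic_makespan(costs, 4))  # prints: 16 9
```

With the static mapping both heavy tasks land on the same worker, so that worker alone sets the finish time. With dynamic allocation the second heavy task is picked up by a worker that has already finished its light task.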



Abstract

The invention provides a GPU thread load balancing method and device, a medium, and electronic equipment. The method comprises the following steps: the total number of threads launched by a GPU program is fixed by fixing the number of work groups the program launches and the number of threads in each work group; all computing tasks to be processed are grouped and placed into a command queue, a global task queue is constructed, and each work group is granted access to the global task queue; and, according to the current task load of the GPU's resident threads, each work group obtains the computing tasks it needs to execute from the global task queue. By establishing a dynamic mapping between threads and tasks and constructing a local queue and a global queue, the method allocates tasks dynamically according to task load and thereby achieves load balancing among GPU threads.
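The steps in the abstract (a fixed number of work groups, grouped tasks in a global queue, and each group fetching work as it becomes free) can be sketched as a CPU-side simulation. This is a minimal sketch under assumptions, not the patented implementation: Python threads stand in for work groups, a lock-protected index stands in for whatever atomic operation a GPU would use, the fetched batch plays the role of the local queue, and `NUM_GROUPS`, `BATCH_SIZE`, `run`, and the squaring workload are all hypothetical.

```python
import threading

NUM_GROUPS = 4   # fixed number of work groups (assumed value)
BATCH_SIZE = 3   # tasks per group of tasks in the global queue (assumed value)


def run(tasks, num_groups=NUM_GROUPS, batch_size=BATCH_SIZE):
    # Group all tasks up front; the list of batches is the global task queue.
    batches = [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]
    next_batch = 0
    lock = threading.Lock()          # stands in for an atomic counter on a GPU
    results = {}
    handled_by = [0] * num_groups    # how many tasks each work group processed

    def work_group(group_id):
        nonlocal next_batch
        while True:
            # Claim the next batch from the global queue, or stop when it is empty.
            with lock:
                if next_batch >= len(batches):
                    return
                local_queue = batches[next_batch]
                next_batch += 1
            for task in local_queue:
                results[task] = task * task  # placeholder computation
            handled_by[group_id] += len(local_queue)

    groups = [threading.Thread(target=work_group, args=(g,)) for g in range(num_groups)]
    for t in groups:
        t.start()
    for t in groups:
        t.join()
    return results, handled_by


results, handled_by = run(list(range(10)))
print(len(results), sum(handled_by))
```

The number of workers never changes with the input size; only the queue length does. How the batches are split among groups varies from run to run, because a group that finishes early simply claims the next batch.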

Description

Technical field

[0001] The present invention relates to the field of GPU technology, and in particular to a method, device, chip, and electronic device for balancing GPU thread loads.

Background

[0002] The graphics processing unit (GPU) is an important part of artificial intelligence. As an accelerator, it is widely used in fields such as high-performance computing and image processing.

[0003] Existing GPU chips have a large-scale, fine-grained parallel architecture that suits large-scale data-parallel tasks. At the same time, in traditional GPU programming the mapping between threads and tasks is static and does not change. Existing GPUs are well suited to applications with large-scale data parallelism and can greatly improve their performance. However, for scenarios that require a combination of graphics processing algorithms and non-graphical algorithms such as artif...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50
CPC: G06F9/505, G06F9/5038, Y02D10/00
Inventors: 贾海鹏, 张云泉
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI