
CUDA thread placement optimization method

An optimization method and threading technology, applied in the field of multi-programming devices, program control devices, program synchronization, etc. It can solve problems such as program performance degradation, time-consuming tuning processes, and resource allocation conflicts, thereby reducing working difficulty and workload and achieving a good optimization effect.

Active Publication Date: 2019-11-01
HARBIN INST OF TECH
8 Cites · 5 Cited by

AI Technical Summary

Problems solved by technology

Setting the thread block size appropriately is not easy, and a single evaluation criterion or an overly simple strategy applies only to some programs.
In most cases, larger thread blocks improve GPU utilization and program concurrency, but excessively large blocks can also cause GPU resource allocation conflicts and reduce program performance. GPU utilization alone is therefore an incomplete measure of performance.
In addition, some programs are sensitive to changes in thread block size: small changes cause large fluctuations in program performance. This shows that the factors affecting performance are intertwined in many ways, which also makes strategies for setting the thread block size more complicated.
Without a method that automatically searches for the optimal configuration, finding it manually requires programmers with relevant experience who are familiar with the hardware environment. Performance information must be collected by repeatedly modifying the thread block size and rerunning the program; the process is time-consuming, and empirical settings often fail to reach the optimum.
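To make the cost of manual tuning concrete, the loop below sketches the exhaustive search described above. `time_kernel` is a hypothetical stand-in for launching the kernel and reading its elapsed time with a profiler; its non-monotonic shape mimics the utilization-versus-resource-conflict trade-off, and the constants are illustrative, not measured.

```python
# Brute-force tuning loop the patent seeks to avoid: every candidate
# block size is tried and the program re-run each time.

def time_kernel(block_size: int) -> float:
    """Hypothetical timing model: performance is non-monotonic in block
    size -- utilization rises first, then resource conflicts bite."""
    occupancy_gain = min(block_size / 256.0, 1.0)      # bigger blocks fill the SM
    conflict_cost = max(0, block_size - 512) * 0.004   # too big: register/shared-memory pressure
    return 10.0 / (0.2 + occupancy_gain) + conflict_cost  # simulated ms

def exhaustive_search(candidates):
    # One full program run per candidate: this is the time-consuming part.
    timings = {bs: time_kernel(bs) for bs in candidates}
    best = min(timings, key=timings.get)
    return best, timings

candidates = [32, 64, 128, 256, 512, 1024]
best, timings = exhaustive_search(candidates)
print(best)  # → 256
```

With real kernels each probe costs a full program run, which is exactly why the patent replaces this search with a learned performance model.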

Method used



Examples


Embodiment Construction

[0032] With reference to the accompanying drawings, the implementation of a CUDA thread placement optimization method according to the present invention is set forth as follows:

[0033] 1. The overall process of building and applying the CUDA program thread placement optimization model is shown in figure 1. The implementation of the thread placement optimization model is divided into three stages. The first stage is raw data acquisition, which is further divided into two parts: the first part uses the CUDA performance analysis tool nvprof to obtain program runtime information, such as kernel function running time; the second part first uses the LLVM tool clang to convert the CUDA source program into an intermediate representation, and then uses an analysis pass to collect static information about the CUDA program's kernel functions. The second stage completes the data processing, including information summarization, numerical processing, normalization p...
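The data-processing stage described above can be sketched as follows: static features from the compiler analysis and runtime features from the profiler are summarized into one record per kernel, categorical fields are converted to numbers, and numeric fields are min-max normalized. The field names and values here are illustrative assumptions, not the patent's actual feature set.

```python
# Minimal sketch of stage two: summarization, numericalization, normalization.

def numericalize(records, field, vocabulary):
    # Map a categorical field (e.g. dominant instruction type) to an integer code.
    for r in records:
        r[field] = vocabulary.index(r[field])

def min_max_normalize(records, field):
    # Scale a numeric field to [0, 1] across the training set.
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against constant columns
    for r in records:
        r[field] = (r[field] - lo) / span

# One record per kernel: static (branches, mem ops) + runtime (time) features.
records = [
    {"branches": 4,  "mem_ops": 120, "runtime_ms": 8.2, "dom_type": "float"},
    {"branches": 16, "mem_ops": 30,  "runtime_ms": 2.1, "dom_type": "int"},
    {"branches": 10, "mem_ops": 75,  "runtime_ms": 5.0, "dom_type": "float"},
]

numericalize(records, "dom_type", ["int", "float"])
for field in ("branches", "mem_ops", "runtime_ms"):
    min_max_normalize(records, field)

print(records[0])
```

After this stage every kernel is a fixed-length numeric feature vector, ready to be paired with a label derived from its execution-time measurements.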



Abstract

The invention discloses a CUDA thread placement optimization method and relates to a thread optimization technology based on machine learning. The invention aims to provide a CUDA (Compute Unified Device Architecture) thread placement optimization method in order to reduce the working difficulty of programmers and shorten the acquisition time of training data. The technical key points are: program information acquisition, program information processing, and machine learning model training. Program information processing comprises performing information summarization, numerical processing, and normalization on static information and program runtime information to obtain training-set program features, and setting labels using the set of program execution times so as to complete the generation of label data. The training-set program features and the label data are taken as input, and performance modeling is carried out with a support vector machine algorithm to obtain a program performance prediction model. When thread placement is optimized in application, the program information acquisition module is called to acquire the information of the program to be optimized; this is then fed into the trained program performance prediction model, so that a suitable thread block setting scheme can be obtained.
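As a rough illustration of the performance-modeling step, the following trains a linear support vector machine with sub-gradient descent on hinge loss over already-normalized features. The training data is synthetic and the training procedure is a common textbook formulation, not the patent's specific implementation; a production system would use a mature SVM library.

```python
# Linear SVM via hinge-loss sub-gradient descent (textbook formulation).

def train_linear_svm(X, y, epochs=200, lr=0.05, lam=0.01):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point inside the margin: hinge loss is active
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # otherwise only the regularizer shrinks the weights
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Each sample: (normalized program features..., normalized block size);
# label +1 if that block setting performed well, -1 otherwise (synthetic).
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [0.15, 0.85]), predict(w, b, [0.85, 0.15]))
```

Once trained, the model plays the role of the program performance prediction model in the abstract: candidate block settings are scored without rerunning the program, and a well-scoring setting is selected.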

Description

technical field [0001] The invention relates to a CUDA thread placement optimization method, and in particular to a machine learning-based thread optimization technology. Background technique [0002] GPUs are common hardware in modern computers, originally providing basic graphics operations alongside the CPU. In recent years, thanks to their powerful data-parallel processing and floating-point computing capabilities, GPUs have been widely used in engineering applications and scientific computing. However, the GPU has a complex architecture and a multi-threaded programming model completely different from the CPU's, which makes developing efficient parallel programs on the GPU relatively complicated. It is therefore particularly important to reduce the complexity of GPU programming while improving program performance. In 2007, NVIDIA Corporation released a programming model and development environment for parallel computing on the GPU: the Compute Unified Device Architecture, CUDA. Relying on the C langua...

Claims


Application Information

IPC(8): G06F9/448 · G06F9/52 · G06K9/62
CPC: G06F9/4484 · G06F9/449 · G06F9/52 · G06F18/2411 · Y02D10/00
Inventors: 张伟哲, 何慧, 谢根栓, 鲁刚钊
Owner HARBIN INST OF TECH