
CUDA thread placement optimization method

An optimization method and threading technology, applied in the field of multi-programming devices, program control devices, program synchronization, etc. It can solve problems such as program performance degradation, time-consuming tuning processes, and resource allocation conflicts, thereby reducing working difficulty and workload and achieving a good optimization effect.

Active Publication Date: 2019-11-01
HARBIN INST OF TECH
8 Cites · 5 Cited by

AI Technical Summary

Problems solved by technology

Setting the thread block size appropriately is not easy, and a single evaluation criterion or an overly simple strategy applies only to some programs.
In most cases, larger thread blocks improve GPU utilization and program concurrency, but excessively large blocks can also cause GPU resource allocation conflicts and reduce program performance. GPU utilization alone is therefore an incomplete measure of performance.
In addition, some programs are sensitive to changes in thread block size: small changes cause large fluctuations in program performance. This shows that the factors affecting performance are intertwined in many ways, which also makes strategies for setting the thread block size more complicated.
Without a method that automatically searches for the optimal configuration, finding it manually requires programmers with relevant experience who are familiar with the hardware environment. Performance information must be collected by repeatedly modifying the thread block size and rerunning the program; the process is time-consuming, and empirical settings often fail to reach the optimum.
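To make the cost of manual tuning concrete, the loop below sketches the exhaustive search described above. `time_kernel` is a hypothetical stand-in for launching the kernel and reading its elapsed time with a profiler; its non-monotonic shape mimics the utilization-versus-resource-conflict trade-off, and the constants are illustrative, not measured.

```python
# Brute-force tuning loop the patent seeks to avoid: every candidate
# block size is tried and the program re-run each time.

def time_kernel(block_size: int) -> float:
    """Hypothetical timing model: performance is non-monotonic in block
    size -- utilization rises first, then resource conflicts bite."""
    occupancy_gain = min(block_size / 256.0, 1.0)      # bigger blocks fill the SM
    conflict_cost = max(0, block_size - 512) * 0.004   # too big: register/shared-memory pressure
    return 10.0 / (0.2 + occupancy_gain) + conflict_cost  # simulated ms

def exhaustive_search(candidates):
    # One full program run per candidate: this is the time-consuming part.
    timings = {bs: time_kernel(bs) for bs in candidates}
    best = min(timings, key=timings.get)
    return best, timings

candidates = [32, 64, 128, 256, 512, 1024]
best, timings = exhaustive_search(candidates)
print(best)  # → 256
```

With real kernels each probe costs a full program run, which is exactly why the patent replaces this search with a learned performance model.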

Method used



Examples


Embodiment Construction

[0032] With reference to the accompanying drawings, the implementation of a CUDA thread placement optimization method according to the present invention is set forth as follows:

[0033] 1. The overall process of building and applying the CUDA program thread placement optimization model is shown in figure 1. The implementation of the thread placement optimization model is divided into three stages. The first stage is raw data acquisition, which is further divided into two parts: the first part uses the CUDA performance analysis tool nvprof to obtain program runtime information, such as kernel function running time; the second part first uses the LLVM tool clang to convert the CUDA source program into an intermediate representation, and then uses an analysis pass to collect static information about the CUDA program's kernel functions. The second stage completes the data processing, including information summarization, numerical processing, normalization p...
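The data-processing stage described above can be sketched as follows: static features from the compiler analysis and runtime features from the profiler are summarized into one record per kernel, categorical fields are converted to numbers, and numeric fields are min-max normalized. The field names and values here are illustrative assumptions, not the patent's actual feature set.

```python
# Minimal sketch of stage two: summarization, numericalization, normalization.

def numericalize(records, field, vocabulary):
    # Map a categorical field (e.g. dominant instruction type) to an integer code.
    for r in records:
        r[field] = vocabulary.index(r[field])

def min_max_normalize(records, field):
    # Scale a numeric field to [0, 1] across the training set.
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against constant columns
    for r in records:
        r[field] = (r[field] - lo) / span

# One record per kernel: static (branches, mem ops) + runtime (time) features.
records = [
    {"branches": 4,  "mem_ops": 120, "runtime_ms": 8.2, "dom_type": "float"},
    {"branches": 16, "mem_ops": 30,  "runtime_ms": 2.1, "dom_type": "int"},
    {"branches": 10, "mem_ops": 75,  "runtime_ms": 5.0, "dom_type": "float"},
]

numericalize(records, "dom_type", ["int", "float"])
for field in ("branches", "mem_ops", "runtime_ms"):
    min_max_normalize(records, field)

print(records[0])
```

After this stage every kernel is a fixed-length numeric feature vector, ready to be paired with a label derived from its execution-time measurements.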



Abstract

The invention discloses a CUDA thread placement optimization method and relates to a thread optimization technology based on machine learning. The invention aims to provide a CUDA (Compute Unified Device Architecture) thread placement optimization method in order to reduce the working difficulty of programmers and shorten the acquisition time of training data. The technical key points are: program information acquisition, program information processing, and machine learning model training. Program information processing comprises performing information summarization, numerical processing, and normalization on static information and program runtime information to obtain training-set program features, and setting labels using the set of program execution times so as to complete the generation of label data. The training-set program features and the label data are taken as input, and performance modeling is carried out with a support vector machine algorithm to obtain a program performance prediction model. When thread placement is optimized in application, the program information acquisition module is called to acquire the information of the program to be optimized; this is then fed into the trained program performance prediction model, so that a suitable thread block setting scheme can be obtained.
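As a rough illustration of the performance-modeling step, the following trains a linear support vector machine with sub-gradient descent on hinge loss over already-normalized features. The training data is synthetic and the training procedure is a common textbook formulation, not the patent's specific implementation; a production system would use a mature SVM library.

```python
# Linear SVM via hinge-loss sub-gradient descent (textbook formulation).

def train_linear_svm(X, y, epochs=200, lr=0.05, lam=0.01):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point inside the margin: hinge loss is active
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # otherwise only the regularizer shrinks the weights
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Each sample: (normalized program features..., normalized block size);
# label +1 if that block setting performed well, -1 otherwise (synthetic).
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [0.15, 0.85]), predict(w, b, [0.85, 0.15]))
```

Once trained, the model plays the role of the program performance prediction model in the abstract: candidate block settings are scored without rerunning the program, and a well-scoring setting is selected.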

Description

technical field [0001] The invention relates to a CUDA thread placement optimization method, and in particular to a machine learning-based thread optimization technology. Background technique [0002] GPUs are common hardware in modern computers, originally providing basic graphics operations alongside the CPU. In recent years, thanks to their powerful data-parallel processing and floating-point computing capabilities, GPUs have been widely used in engineering applications and scientific computing. However, the GPU has a complex architecture and a multi-threaded programming model completely different from the CPU's, which makes developing efficient parallel programs on the GPU relatively complicated. It is therefore particularly important to reduce the complexity of GPU programming while improving program performance. In 2007, NVIDIA Corporation released a programming model and development environment for parallel computing on the GPU: the Compute Unified Device Architecture, CUDA. Relying on the C langua...

Claims


Application Information

IPC(8): G06F9/448 · G06F9/52 · G06K9/62
CPC: G06F9/4484 · G06F9/449 · G06F9/52 · G06F18/2411 · Y02D10/00
Inventors: 张伟哲, 何慧, 谢根栓, 鲁刚钊
Owner HARBIN INST OF TECH