Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment

A program optimization technology, applied in the field of high-performance computing, that can solve the following problems: published optimization techniques do not clearly explain the occasions and conditions under which each technique should be used, are difficult to bring to a practical and operable standard, and are insufficient to fully exploit the computing power of GPU devices.

Active Publication Date: 2013-03-20
北京微视威信息科技有限公司


Problems solved by technology

[0007] The above-mentioned published optimization techniques are limited to a few aspects of CUDA program optimization; they are not enough to fully exploit the computing power of the GPU device, and the actual optimization effect is imperfect. At the same time, these documents or authors often do not elaborate the theory behind each technique or clearly explain the occasions and conditions under which it applies, so the techniques are difficult to bring to a practical and operable standard.

Method used




Embodiment Construction

[0033] Principle of invention

[0034] The performance of a CUDA parallel program depends on many factors, each of which imposes a specific lower bound on the execution time of the program; the final execution time is determined by the most restrictive (largest) of these lower bounds. Performance bottlenecks, or performance optimization points, of CUDA programs are widely discussed in the published literature, and the present invention likewise adopts a bottleneck-driven mode of optimization. To optimize program performance, it is first necessary to define a broad range of program performance bottlenecks (the present invention may involve performance bottlenecks that are the same as, or similar to, those in the existing literature, but the definitions given here are not exactly the same as the definitions given in that literature).

[0035] The processor of a GPU device will only be in two states during operation: executi...



Abstract

The invention relates to a graphics processing unit (GPU) program optimization method based on the compute unified device architecture (CUDA) parallel environment. The method defines the performance bottlenecks of a GPU kernel program and grades them as global memory access latency, shared memory access conflict, instruction pipeline conflict, and instruction bottleneck. For each performance bottleneck, a practical, operational judgment criterion and a bottleneck-resolving optimization method are provided. The optimization methods for global memory access latency include staging data in shared memory, access coalescing, improving thread-level parallelism, and improving instruction-level parallelism. The optimization methods for shared memory access conflicts and instruction pipeline conflicts include resolving bank conflicts, staging data in registers, improving thread-level parallelism, and improving instruction-level parallelism. The optimization methods for the instruction bottleneck include instruction replacement and branch reduction. The method provides a basis for CUDA programming and optimization, helps a programmer conveniently locate the performance bottleneck in a CUDA program and apply efficient, targeted optimization to it, and enables the CUDA program to exploit the computing power of the GPU device to the greatest extent.

Description

Technical field

[0001] The invention relates to a parallel computing and data processing method in the fields of graphics, animation, scientific computing, geology, biology, physical simulation, etc., and in particular to a GPU kernel program optimization method based on the CUDA architecture, which belongs to the field of high-performance computing.

Background technique

[0002] The CUDA architecture (Compute Unified Device Architecture) is a parallel computing architecture for GPU (Graphics Processing Unit) devices and other devices; programming interfaces for it include CUDA C, C++, OpenCL, RapidMind, etc. CUDA C is a C language extension based on the CUDA architecture, and programmers can easily use this set of APIs for GPU programming. The effect actually realized by a program depends on the programmer writing a CUDA kernel program with high performance, stable functionality, and strong portability. The CUDA kernel program, also called the kernel function, is a parallel computing function running on th...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/38
Inventors: 孟洋, 李胜, 汪国平
Owner: 北京微视威信息科技有限公司