Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment

A program optimization technology, applied in the field of high-performance computing, that can solve the following problems: published optimization techniques do not clearly explain the occasions and conditions under which each technique should be used, are difficult to bring to a practical and operable standard, and are insufficient to fully exploit the computing power of GPU devices.

Active Publication Date: 2013-03-20
北京微视威信息科技有限公司


Problems solved by technology

[0007] The above-mentioned published optimization techniques are limited to a few aspects of CUDA program optimization; they are not enough to fully exploit the computing power of the GPU device, and the actual optimization effect is imperfect. At the same time, these documents or authors often do not elaborate the theory behind each technique or clearly explain the occasions and conditions under which it applies, so the techniques are difficult to bring to a practical and operable standard.

Method used




Embodiment Construction

[0033] Principle of invention

[0034] The performance of a CUDA parallel program depends on many factors, each of which imposes a specific lower bound on the execution time of the program; the final execution time is determined by the most restrictive (largest) of these lower bounds. Performance bottlenecks, or performance optimization points, of CUDA programs are widely discussed in the published literature, and the present invention likewise adopts a bottleneck-driven mode of optimization. To optimize program performance, it is first necessary to define a broad range of program performance bottlenecks (the present invention may involve performance bottlenecks that are the same as, or similar to, those in the existing literature, but the definitions given here are not exactly the same as the definitions given in that literature).

[0035] The processor of a GPU device will only be in two states during operation: executi...



Abstract

The invention relates to a graphics processing unit (GPU) program optimization method based on the compute unified device architecture (CUDA) parallel environment. The method defines the performance bottlenecks of a GPU kernel program and grades them as global memory access latency, shared memory access conflict, instruction pipeline conflict, and instruction bottleneck. For each performance bottleneck, a practical, operational judgment criterion and a bottleneck-resolving optimization method are provided. The optimization methods for global memory access latency include staging data in shared memory, access coalescing, improving thread-level parallelism, and improving instruction-level parallelism. The optimization methods for shared memory access conflicts and instruction pipeline conflicts include resolving bank conflicts, staging data in registers, improving thread-level parallelism, and improving instruction-level parallelism. The optimization methods for the instruction bottleneck include instruction replacement and branch reduction. The method provides a basis for CUDA programming and optimization, helps a programmer conveniently locate the performance bottleneck in a CUDA program and apply efficient, targeted optimization to it, and enables the CUDA program to exploit the computing power of the GPU device to the greatest extent.

Description

Technical field

[0001] The invention relates to a parallel computing and data processing method in the fields of graphics, animation, scientific computing, geology, biology, physical simulation, etc., and in particular to a GPU kernel program optimization method based on the CUDA architecture, which belongs to the field of high-performance computing.

Background technique

[0002] The CUDA architecture (Compute Unified Device Architecture) is a parallel computing architecture for GPU (Graphics Processing Unit) devices and other devices; programming interfaces for it include CUDA C, C++, OpenCL, RapidMind, etc. CUDA C is a C language extension based on the CUDA architecture, and programmers can easily use this set of APIs for GPU programming. The effect actually realized by a program depends on the programmer writing a CUDA kernel program with high performance, stable functionality, and strong portability. The CUDA kernel program, also called the kernel function, is a parallel computing function running on th...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F9/38
Inventors: 孟洋, 李胜, 汪国平
Owner: 北京微视威信息科技有限公司