The invention relates to a graphics processing unit (GPU) program optimization method based on the compute unified device architecture (CUDA) parallel environment. The method defines the performance bottlenecks of a GPU program kernel and classifies them by grade into global memory access latency, shared memory access conflicts, instruction pipeline conflicts, and instruction bottlenecks. A practical judgment criterion and a corresponding optimization method are provided for each performance bottleneck. The optimization method for global memory access latency includes moving data into shared memory, coalescing memory accesses, improving thread-level parallelism, and improving instruction-level parallelism. The optimization method for shared memory access conflicts and instruction pipeline conflicts includes resolving bank conflicts, moving data into registers, improving thread-level parallelism, and improving instruction-level parallelism. The optimization method for instruction bottlenecks includes instruction replacement and branch reduction. The method provides a basis for CUDA programming and optimization, helps a programmer conveniently locate the performance bottleneck in a CUDA program, apply efficient and targeted optimization to it, and exploit the computing capability of the GPU device to the greatest extent.
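The bottleneck categories and optimizations named above can be sketched in CUDA as follows. This is an illustrative sketch only, not the claimed method itself: the kernel names, the `TILE` size, and the array parameters are assumptions introduced for the example.

```cuda
#include <cuda_runtime.h>

#define TILE 32

// Global memory access latency: consecutive threads touch consecutive
// addresses, so the hardware coalesces each warp's 32 accesses into a
// minimal number of memory transactions.
__global__ void coalesced_copy(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // thread-contiguous index
    if (i < n)
        out[i] = in[i];                             // coalesced load and store
}

// Shared memory access conflicts: in a naive TILE x TILE transpose tile,
// a warp reading one column hits 32 addresses in the same bank.  Padding
// each row to TILE + 1 floats shifts the rows into different banks and
// removes the 32-way bank conflict.  Launch with a (TILE, TILE) block.
__global__ void tiled_transpose(const float *in, float *out, int w, int h)
{
    __shared__ float tile[TILE][TILE + 1];          // +1 pad avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < w && y < h)
        tile[threadIdx.y][threadIdx.x] = in[y * w + x];   // coalesced read
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;            // transposed coordinates
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < h && y < w)
        out[y * h + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}

// Instruction bottleneck: __fdividef replaces the slower full-precision
// division (instruction replacement), and the conditional select compiles
// to a predicated instruction rather than a divergent branch within a
// warp (branch reduction).
__global__ void instruction_tuned(const float *a, float *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = __fdividef(a[i], 3.0f);           // instruction replacement
        b[i] = (v > 0.0f) ? v : 0.0f;               // branch reduction
    }
}
```

Improving thread-level parallelism in these sketches amounts to choosing grid and block sizes that keep enough warps resident to hide latency; improving instruction-level parallelism amounts to having each thread process several independent elements per iteration.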