
Method and system for reverse parsing of GPU instruction

A technology for instruction analysis and decode solving, applied in the field of reverse analysis of GPU instructions. It addresses problems such as wasted computing resources, low code efficiency, and PTX's inability to control register allocation, and achieves the effect of improved efficiency.

Active Publication Date: 2017-06-13
INST OF COMPUTING TECH CHINESE ACAD OF SCI


Problems solved by technology

[0003] Nvidia, the GPU manufacturer in a monopoly position, maintains its habit of keeping technology closed: it provides no assembler, does not support programming at the lowest assembly level, and does not disclose hardware architecture features that can only be controlled at the assembly level. As a partial substitute it provides the low-level interface PTX. Although PTX is an intermediate representation very close to assembly, its control over the hardware is weaker than that of assembly; for example, PTX can neither control register allocation nor precisely control instruction scheduling. With so little control, developers can only hope that compiler optimization improves performance. However, "Daniel J. Bernstein, Hsieh-Chung Chen, Chen-Mou Cheng, Tanja Lange, Ruben Niederhagen, Peter Schwabe, and Bo-Yin Yang. Usable assembly language for GPUs: a success story. IACR Cryptology ePrint Archive, 2012:137, 2012." pointed out that the code generated by Nvidia's compiler NVCC is not very efficient; its register allocation, for example, produces many bank conflicts. In fact, many of the parallel algorithm libraries Nvidia releases are built on an internal assembler and then hand-tuned to reach ideal efficiency. The problem is that, unlike the 3D graphics field with its small number of rendering-engine developers, the GPGPU user base is large and diverse. Nvidia hand-optimizes only a few algorithm libraries and provides official support only to a few large customers, so the vast majority of users cannot extract the maximum performance from expensive GPGPU hardware, which is a huge waste of computing resources.
Worse, Nvidia has not optimized many widely used core algorithms. For single-precision floating-point matrix multiplication (SGEMM), for example, Nvidia's hand-assembled version for the mainstream Kepler architecture reaches only 74% of the theoretical peak. Comparisons between third-party assembly-optimized single-precision matrix multiplication and the SGEMM in Nvidia's cuBLAS show that the third-party assembly-optimized versions outperform cuBLAS. These studies show that assembly-level optimization is very valuable for extracting GPU performance.
[0004] Researchers have made scattered progress on GPU performance tuning and tools, such as microbenchmark studies: "Zhang, Yao, and John D. Owens. A quantitative performance analysis model for GPU architectures. In 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp. 382-393. IEEE, 2011."; "Xinxin Mei, Kaiyong Zhao, Chengjian Liu, and Xiaowen Chu. Benchmarking the memory hierarchy of modern GPUs. In Network and Parallel Computing, pages 144-156. Springer, 2014."; "Henry Wong, Misel-Myrto Papadopoulou, Maryam Sadooghi-Alvandi, and Andreas Moshovos. Demystifying GPU microarchitecture through microbenchmarking. In Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on, pages 235-246. IEEE, 2010."; as well as GPU assembly-level optimization. However, this work concentrates on a single architecture at a time and has not produced a general technique that carries over to next-generation architectures, such as instruction-cracking methods and corresponding automation tools. In addition, there are still no public assembly-level GPU benchmark programs; most benchmarks in use are based on CUDA, which makes their results unreliable.



Embodiment Construction

[0031] The instruction parsing algorithm of the present invention proceeds as follows:

[0032] Instruction decoding must establish the correspondence between the 64-bit instruction encoding and the assembly instruction, as shown in Figure 2. The algorithm flow is as follows:

[0033] First, a PTX instruction generator automatically produces PTX files covering all NVIDIA PTX instructions and their modifiers. These PTX files are compiled to cubin with ptxas and disassembled with cuobjdump, and the disassembled output is finally parsed by an assembly parser into an instMap variable, which serves as input to the decoding solver. The instMap structure includes the operation code, the instruction, modifier codes, all operands, and the corresponding operand types.
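The final parsing step above can be sketched as follows. This is a minimal illustration, assuming a typical cuobjdump SASS output line; the function and field names are this sketch's own, not the patent's actual parser.

```python
import re

# Hypothetical parser turning one line of cuobjdump SASS output into an
# instMap-style record: opcode, modifiers, operands, and the 64-bit encoding.
# Assumed line shape: /*addr*/  MNEMONIC OPERANDS;  /* 0x... */
SASS_LINE = re.compile(
    r"/\*(?P<addr>[0-9a-f]+)\*/\s+"
    r"(?P<asm>[^;]+);\s+"
    r"/\*\s*(?P<enc>0x[0-9a-f]+)\s*\*/"
)

def parse_sass_line(line):
    m = SASS_LINE.search(line)
    if not m:
        return None
    asm = m.group("asm").strip()
    mnemonic, _, rest = asm.partition(" ")
    opcode, *modifiers = mnemonic.split(".")     # e.g. "LD.E.64" -> LD + [E, 64]
    operands = [op.strip() for op in rest.split(",")] if rest else []
    return {
        "opcode": opcode,
        "modifiers": modifiers,
        "operands": operands,
        "encoding": int(m.group("enc"), 16),     # 64-bit instruction word
    }

inst = parse_sass_line(
    "/*0008*/    MOV R1, c[0x0][0x44];    /* 0x2800400110005de4 */"
)
```

A real parser would also carry the operand types described below, but the record above already covers the instMap fields the text names.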

[0034] Operands can be registers (R5), global memory ([R6+0x20]), constant memory (C[0x2][0x40]), shared memory ([0x50]), immediate data (0x9a...
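Classifying an operand string into these categories can be done with simple pattern matching. A sketch under the assumption that the textual forms listed above are representative; the category names are illustrative, not the patent's.

```python
import re

# Hypothetical classifier mapping an operand string to the operand types
# named in the text: register, global, constant, shared, immediate.
def operand_type(op):
    if re.fullmatch(r"R\d+", op):
        return "register"                 # R5
    if re.fullmatch(r"c\[0x[0-9a-f]+\]\[0x[0-9a-f]+\]", op, re.IGNORECASE):
        return "constant"                 # C[0x2][0x40]
    if re.fullmatch(r"\[R\d+(\+0x[0-9a-f]+)?\]", op):
        return "global"                   # [R6+0x20]: register-based address
    if re.fullmatch(r"\[0x[0-9a-f]+\]", op):
        return "shared"                   # [0x50]: bare offset in brackets
    if re.fullmatch(r"0x[0-9a-f]+", op, re.IGNORECASE):
        return "immediate"                # 0x9a
    return "unknown"
```

Note the ordering: global and shared memory operands both use brackets, so the presence of a register inside the brackets is what distinguishes them here.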


Abstract

The invention provides a method and system for reverse parsing of a GPU instruction, relating to the technical fields of GPU microarchitecture, compiler code generation, and program optimization. The method includes the following steps: the GPU instruction is compiled to generate a compiled file; the compiled file is disassembled to generate a disassembled file; the disassembled file is expressed as instMap variables by an assembly parser, where the fields of an instMap variable include the operation code, modifier codes, the instruction, the operands, and the corresponding operand types; the instMap variables are input to a decoding solver, which judges the type of each instMap variable and finds the corresponding encoding according to the determined operation code or modifier code. On the basis of the decoded instruction encodings, combined with the PTX documentation, a GPU assembler can be constructed; the method provides decoding support for such an assembler, improves the efficiency of GPU programs, and enables the design and standardization of a series of microbenchmark programs to probe the characteristics and parameters of a GPU microarchitecture.
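The solver step, finding which encoding bits correspond to a given field, can be illustrated by XOR-diffing the encodings of instructions that differ in only one field. This is a sketch of the general reverse-engineering technique the abstract describes; the patent's actual solver may work differently.

```python
# Illustrative bit-field solver: XOR the 64-bit encodings of instructions
# that are identical except for one field (e.g. the destination register).
# The bits that vary across the group reveal where that field lives.
def field_bits(encodings):
    """Return the bit positions that vary across a list of encodings."""
    base = encodings[0]
    varying = 0
    for enc in encodings[1:]:
        varying |= base ^ enc
    return [bit for bit in range(64) if (varying >> bit) & 1]

# Toy example: pretend the destination register occupies bits 8..13 and
# generate encodings for registers R0..R3, varying only that field.
encs = [0x1000 | (reg << 8) for reg in (0, 1, 2, 3)]
bits = field_bits(encs)
```

Running this over instMap entries grouped by opcode and modifier would recover, field by field, the correspondence between assembly text and encoding bits.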

Description

Technical Field

[0001] The invention relates to the technical fields of GPU microarchitecture, compiler code generation, and program optimization, and in particular to a method and system for reverse parsing of GPU instructions.

Background Technique

[0002] For years, GPU manufacturers have provided users only with the upper-layer API encapsulated by the driver, exposing as few internal principles and details as possible, such as the driver's software architecture, the GPU microarchitecture, and the instruction set. As a result, academic research on GPU architecture has lagged far behind industry and stagnated for a long time. In the era when GPUs were used only for graphics acceleration, this conservative strategy was not a prominent problem in practice and even had a certain rationality: the original drawing API was strongly tied to the hardware implementation, and the API was just a simple ...


Application Information

IPC(8): G06F9/45
CPC: G06F8/427; G06F8/53
Inventor: 谭光明 (Tan Guangming), 张秀霞 (Zhang Xiuxia)
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI