GPU-based N-body simulation program performance optimization method

A performance optimization method for N-body simulation programs, applied in the computer field. It addresses the high complexity of the direct method, the large amount of calculation, and computing power that is insufficient to meet requirements, and it achieves reduced computation time, reduced memory-access delay, and reduced data transmission.

Active Publication Date: 2021-05-07
COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI

AI Technical Summary

Problems solved by technology

Although fast algorithms exist, such as the Fast Multipole Method (FMM), which reduces the overall complexity to O(N), an accurate and complete simulation of the N-body problem requires thousands of iterations. The huge amount of calculation has become the main bottleneck, and the computing power of traditional computers falls far short of the requirements. With the rapid development of high-performance computing technology, parallel heterogeneous platforms represented by the GPU have become the main means of improving computing performance.
[0004] For performance optimization of the direct method (Particle-to-Particle, P2P) in the all-pairs problem, the shared memory of the GPU is mainly used to cache particle information that is reused during the calculation. Although this is efficient, the complexity of the direct method, O(N^2), is too high for practical problems where N reaches hundreds of billions.
For fast N-body simulation methods, the basic GPU optimization uses zero-copy memory to improve the transfer efficiency of GPU operations, but repeated data transmission and repeated calculation remain.
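
The shared-memory caching described for the direct method can be illustrated with a minimal CPU-side sketch in Python. It is not the patent's implementation: the function names, the softening length `eps`, the tile size, and the choice of G = 1 are all assumptions made for illustration. The tiled variant processes source particles in fixed-size chunks, which stands in for a thread block loading one tile into shared memory and reusing it for every target particle.

```python
import math

def _accumulate(acc_i, p_i, p_j, m_j, eps):
    """Add the softened gravitational pull of particle j to acc_i (G = 1, assumed)."""
    dx, dy, dz = p_j[0] - p_i[0], p_j[1] - p_i[1], p_j[2] - p_i[2]
    r2 = dx * dx + dy * dy + dz * dz + eps * eps
    w = m_j / (r2 * math.sqrt(r2))
    acc_i[0] += dx * w
    acc_i[1] += dy * w
    acc_i[2] += dz * w

def accel_naive(pos, mass, eps=1e-3):
    """O(N^2) direct sum: every particle reads every other particle."""
    acc = [[0.0, 0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            _accumulate(acc[i], pos[i], pos[j], mass[j], eps)
    return acc

def accel_tiled(pos, mass, tile=4, eps=1e-3):
    """Same sum, but sources are consumed one tile at a time."""
    acc = [[0.0, 0.0, 0.0] for _ in pos]
    for start in range(0, len(pos), tile):
        # Stand-in for a cooperative load of one tile into GPU shared memory.
        tile_pos = pos[start:start + tile]
        tile_mass = mass[start:start + tile]
        for i in range(len(pos)):
            for p_j, m_j in zip(tile_pos, tile_mass):
                _accumulate(acc[i], pos[i], p_j, m_j, eps)
    return acc
```

Both functions still perform N^2 pair evaluations; tiling only changes how often each source particle has to be fetched from slow memory, which is why it does not help once N becomes very large.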


Embodiment Construction

[0022] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0023] In an N-body simulation program, the basic GPU optimization first packs the relevant data into an array and transfers it to GPU memory. After the particle interactions have been calculated in each iteration, the accelerations or forces are transferred back to host memory to update the data. When particles are split into packets, different packets may share a large number of particles, so a large amount of duplicated data is transmitted, which increases the load on the bus. In addition, many instructions in the calculation consume a large number of clock cycles. Optimizing these steps can therefore improve efficiency without compromising accuracy.
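
The cost of duplicated particles between packets can be made concrete with a small sketch. It assumes, purely for illustration, that each particle record holds four values (x, y, z, mass) and that packets are given as lists of particle indices; `gather` and `transfer_cost` are hypothetical helper names, not part of the patent. The sketch compares sending fully expanded packets against sending each particle once together with the index lists.

```python
def gather(particles, packets):
    """Expand index lists into per-packet particle records.

    On the baseline path this runs on the CPU before the transfer; on the
    index-based path the same gather would run on the GPU after the transfer.
    """
    return [[particles[i] for i in idx] for idx in packets]

def transfer_cost(particles, packets, fields=4):
    """Return (values sent with duplication, values sent once, indices sent)."""
    refs = sum(len(idx) for idx in packets)
    duplicated = refs * fields           # every packet carries full records
    unique = len(particles) * fields     # each particle is sent exactly once
    return duplicated, unique, refs

# Six particles, three packets that overlap heavily.
particles = [(float(i), 0.0, 0.0, 1.0) for i in range(6)]
packets = [[0, 1, 2, 3], [2, 3, 4, 5], [0, 2, 4]]
duplicated, unique, refs = transfer_cost(particles, packets)
print(duplicated, unique, refs)  # 44 24 11
```

In this toy case the expanded packets carry 44 values, while the index-based layout carries 24 values plus 11 integer indices. The saving grows with the amount of overlap between packets.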

[0024] As shown in Figure 1 and Figure 2, the embodiment of the present invention provides a schematic flowchart of...

Abstract

The invention relates to a GPU (Graphics Processing Unit)-based method for optimizing the performance of an N-body simulation program, comprising the following steps: transmitting the relevant index information to the GPU, so that construction of the short-range force list is migrated to the GPU and parallelized; changing the thread block scheduling mode, and loading particle information into the shared memory of the GPU in turn through pipeline scheduling on the GPU; calculating the short-range force in a GPU kernel function using an interpolation polynomial and mixed precision, where the interpolation constants are calculated on the CPU, transmitted to the GPU, and stored in the shared memory of the GPU; reordering the short-range force results of all particles on the GPU, merging them by reduction in GPU global memory, and, after the calculation for all particles is complete, transmitting the final result back to the CPU. The method reduces data transmission from CPU memory to GPU memory, reduces the delay of repeated memory accesses, improves data access efficiency while the GPU calculates the short-range force, reduces data transmission from GPU memory back to CPU memory, and shortens the time needed to update the information on the CPU side.
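
The interpolation step in the abstract can be sketched as follows. The source does not say which function is interpolated or what polynomial degree is used, so this example makes several assumptions: it uses the short-range cutoff factor common in TreePM codes, erfc(r / 2r_s) + (r / (r_s √π)) exp(−r² / 4r_s²), a piecewise quadratic per bin, and single precision emulated on the CPU with `struct`. All names (`f32`, `cutoff`, `build_table`, `eval_table`) are hypothetical. Coefficients are computed in double precision, as they would be on the CPU, then stored and evaluated in single precision, as they might be in a GPU kernel.

```python
import math
import struct

def f32(x):
    """Round a Python float to IEEE single precision (emulates GPU float)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def cutoff(r, r_s=1.0):
    """Assumed short-range factor; expensive because of erfc and exp."""
    u = r / (2.0 * r_s)
    return math.erfc(u) + (r / (r_s * math.sqrt(math.pi))) * math.exp(-u * u)

def build_table(r_max, n_bins, func=cutoff):
    """CPU side: fit one quadratic per bin in double precision, store as float32."""
    h = r_max / n_bins
    table = []
    for k in range(n_bins):
        f0, fm, f1 = func(k * h), func((k + 0.5) * h), func((k + 1) * h)
        # Quadratic in t in [0, 1] through the bin's endpoints and midpoint.
        table.append((f32(f0), f32(-3 * f0 + 4 * fm - f1), f32(2 * f0 - 4 * fm + 2 * f1)))
    return h, table

def eval_table(r, h, table):
    """Kernel side: table lookup plus Horner evaluation in single precision."""
    k = min(int(r / h), len(table) - 1)
    t = f32(r / h - k)
    a0, a1, a2 = table[k]
    return f32(a0 + f32(t * f32(a1 + f32(t * a2))))

h, table = build_table(6.0, 64)
max_err = max(abs(eval_table(i * 0.01, h, table) - cutoff(i * 0.01)) for i in range(600))
```

The table replaces each `erfc` and `exp` call with a lookup and two multiply-adds. With 64 bins the table holds 192 single-precision values, small enough that keeping it in shared memory, as the abstract describes, is plausible.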

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to a GPU-based N-body simulation program performance optimization method.

Background technique

[0002] The N-body problem is one of the basic problems in mechanics. It mainly studies the interaction between N particles and their laws of motion. Given the initial parameters of the particles, such as position, mass, and velocity, the physical interaction of each particle with the other particles is calculated in order to simulate the evolution of the system. In the calculation, the interaction is mainly divided into long-range forces and short-range forces. The N-body problem has important applications in astrophysics, molecular dynamics, materials science, and other fields.

[0003] For a physical system whose total number of particles reaches hundreds of billions or even trillions, the calculation time increases dramatically as the scale increases. Alth...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F30/25, G06F119/14
CPC: G06F30/25, G06F2119/14
Inventors: 王武, 赵文龙
Owner: COMP NETWORK INFORMATION CENT CHINESE ACADEMY OF SCI