CPU + GPU heterogeneous parallel optimization method in gas dynamic theory unified algorithm

A CPU + GPU heterogeneous parallel optimization method for the unified algorithm of gas kinetic theory, applied in the field of GPU parallel optimization. It addresses problems such as growing input congestion, read-write contention, occupation of CPU resources, and loss of calculation data, with the effects of reducing the CPU computing burden and making fuller use of memory resources.

Active Publication Date: 2020-10-30
Hypervelocity Aerodynamics Institute, China Aerodynamics Research and Development Center (中国空气动力研究与发展中心超高速空气动力研究所)
11 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0004] Although the Boltzmann equation can be modeled and solved numerically and the Unified Algorithm of Gas Kinetic Theory (GKUA) is widely used, there is a growing need for a high-performance, large-scale parallel computing environment in which a single computing task can run stably on tens of thousands of CPU cores or more. Parallel computing research and algorithm tests on 64 to 80750 CPU cores show that once a job uses more than 8192 CPU cores, it becomes less stable as the core count grows: a fault in one of the processors taking part in the parallel computation, in parallel communication, in input/output (I/O), or elsewhere in the computer system forces the running job to stop.
The research found that, although in theory a program computes faster and finishes sooner the more CPU resources it uses, in practice there is a significant gap between the measured speedup ratio and the theoretical speedup ratio once a job uses more than 2048 cores. Once the job uses more than 8192 cores, the program becomes unstable.
This is because parallel programs generate more and more parallel communication as CPU resources increase. Besides occupying CPU resources, this communication raises the probability of failures such as input congestion and read-write contention.
Therefore, blindly adding CPU resources is not a feasible way to improve parallelism. Figure 6 shows the convergence curve of a 23,800-core parallel calculation with the unified algorithm for near-space flow, across flow regimes, around a medium-pointed leading-edge wing-body combination aircraft. At point A, calculation data are lost and the job rolls back to the last data backup and resumes from there. At point B, after the job has advanced to 20,000 steps, the program is forced to stop by a computer failure.
However, this does not mean that the unified algorithm program cannot be optimized further.
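
The gap between theoretical and measured speedup described above can be illustrated with a toy model. The following Python sketch is not taken from the patent: the function names, the serial fraction, and the assumption that communication overhead grows linearly with the core count are all hypothetical choices made only to show the trend.

```python
def amdahl_speedup(cores, serial_fraction):
    """Theoretical speedup with no communication cost (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def speedup_with_comm(cores, serial_fraction, comm_cost_per_core):
    """Speedup when an overhead term grows with the core count.

    Assumption (hypothetical): the normalized run time gains an extra
    term comm_cost_per_core * cores that stands in for parallel
    communication, congestion, and read-write contention.
    """
    run_time = (serial_fraction
                + (1.0 - serial_fraction) / cores
                + comm_cost_per_core * cores)
    return 1.0 / run_time


def compare(core_counts, serial_fraction=1e-5, comm_cost_per_core=1e-8):
    """Return (cores, ideal speedup, modeled speedup, modeled efficiency)."""
    rows = []
    for cores in core_counts:
        ideal = amdahl_speedup(cores, serial_fraction)
        modeled = speedup_with_comm(cores, serial_fraction, comm_cost_per_core)
        rows.append((cores, ideal, modeled, modeled / cores))
    return rows


if __name__ == "__main__":
    # Core counts mentioned in the text; the model parameters are made up.
    for cores, ideal, modeled, eff in compare([512, 2048, 8192, 23800]):
        print(f"{cores:6d} cores: ideal {ideal:9.1f}, "
              f"modeled {modeled:9.1f}, efficiency {eff:.3f}")
```

Under these assumed parameters the modeled speedup falls further behind the ideal curve as cores are added, which is the qualitative behavior the text reports. The numbers themselves carry no meaning for the real GKUA runs.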


Examples

Embodiment

[0085] In this embodiment, testing is performed on the supercomputer platform of the National Supercomputing Center in Tianjin. The compute nodes use Intel Xeon X5670 CPUs and NVIDIA Tesla M2050 (Fermi architecture) GPUs. The GPU parallel acceleration test was carried out for three models with grid sizes of 70482, 207638, and 397434 respectively. The coordinate points in the three-dimensional discrete velocity space span [-5.4, 5.4] with an interval of 1.8, and the incoming flow has Mach number Ma = 2 and Knudsen number Kn = 1E-1. Tests were run with 512, 768, 1152, 1728, 2304, 3072, 4096, 6144, and 9216 threads respectively. The results obtained are as follows:
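
To make the size of the discrete velocity space concrete, the following Python sketch enumerates the nodes implied by the range [-5.4, 5.4] and interval 1.8. It assumes, which the text does not state explicitly, that both endpoints are included and that the same nodes are used along each of the three velocity axes. The function names are hypothetical.

```python
from itertools import product


def velocity_nodes(v_min, v_max, step):
    """Evenly spaced nodes from v_min to v_max, both endpoints included."""
    count = round((v_max - v_min) / step) + 1
    # Rounding removes floating-point noise such as -3.6000000000000005.
    return [round(v_min + i * step, 10) for i in range(count)]


def velocity_grid(nodes):
    """All (vx, vy, vz) combinations, assuming identical nodes per axis."""
    return list(product(nodes, repeat=3))


def phase_space_size(grid_cells, nodes_per_axis):
    """Number of (cell, discrete velocity) pairs handled per time step."""
    return grid_cells * nodes_per_axis ** 3


if __name__ == "__main__":
    nodes = velocity_nodes(-5.4, 5.4, 1.8)
    print("nodes per axis:", nodes)
    print("discrete velocities:", len(velocity_grid(nodes)))
    # Grid sizes of the three test models from paragraph [0085].
    for cells in (70482, 207638, 397434):
        print(cells, "cells ->", phase_space_size(cells, len(nodes)), "pairs")
```

Under these assumptions each physical grid cell carries one distribution-function value per discrete velocity, so the work per step scales with the product of the grid size and the number of velocity nodes.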

[0086] Table 1 Comparison table of single-step calculation optimization with grid size of 70482

[0087]

[0088] Note: Time unit: second (s).

[0089] Table 2 Comparison table of single-step calculation optimization with grid size of 207638

[0090]

[0091]

[0092] Note: Time unit: second (s).

...

Abstract

The invention discloses a CPU + GPU heterogeneous parallel optimization method in a gas dynamic theory unified algorithm. The method comprises the implementation and the optimization of CPU + GPU heterogeneous parallelism. The optimization method comprises the following steps: S1, collecting statistics on CPU time consumption based on the parallel operation process and the hotspot code of the Boltzmann equation unified algorithm; S2, collecting statistics on the usage of each memory based on the data structures and variable usage of the Boltzmann model equation unified algorithm; S3, performing parallel calculation based on the Boltzmann model equation unified algorithm to obtain the corresponding variable dependency relationships; S4, optimizing the CPU + GPU heterogeneous parallel Boltzmann model equation unified algorithm at three levels (system, algorithm, and statement) on the basis of the CPU time consumption, memory configuration, and variable dependency relationships obtained from the parallel calculation of the Boltzmann model equation unified algorithm. The method improves parallel efficiency through optimization at these three levels.
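
Step S1 amounts to measuring where the CPU time goes before deciding what to offload or optimize. The following Python sketch shows one generic way to accumulate wall-clock time per code section and rank the sections. It is not the patent's implementation: the `SectionTimer` class and the section names `collision_term` and `boundary_update` are hypothetical, and the workloads are placeholders.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulates wall-clock time per named code section."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def section(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def ranked(self):
        """Sections sorted by total time, most expensive first."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


def demo():
    timer = SectionTimer()
    for _ in range(3):
        # Hypothetical section names; the loops are stand-in workloads.
        with timer.section("collision_term"):
            sum(i * i for i in range(200_000))
        with timer.section("boundary_update"):
            sum(range(1_000))
    return timer.ranked()


if __name__ == "__main__":
    for name, seconds in demo():
        print(f"{name:16s} {seconds:.6f} s")
```

The section at the top of the ranking is the hotspot candidate. In a real solver the same measurement would typically be taken with a profiler rather than hand-placed timers.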

Description

Technical field

[0001] The invention belongs to the technical field of aircraft aerodynamics, and in particular relates to a GPU parallel optimization method for solving complex multi-scale non-equilibrium flow problems across all flow regimes, from highly rarefied free-molecular flow to continuum flow.

Background

[0002] With the rapid development of science and technology, high-performance computing has become a research method of strategic significance in scientific and technological development and in major engineering design. The complementary and interrelated research methods in aerodynamic design have become the three pillars of scientific research in the 21st century. In particular, with the development and use of large-memory, high-speed massively parallel computers, high-performance parallel computing has come to dominate the field of complex scientific computing. The complex aerodynamic problem of re-entry into the atmosphere has extremely important opportunities...


Application Information

IPC(8): G06F9/50, G06F11/34, G06F30/15, G06F30/20
CPC: G06F9/5027, G06F11/3452, G06F30/15, G06F30/20
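Inventor: Li Zhihui (李志辉), Zhang Zibin (张子彬), Peng Aoping (彭傲平), Bai Zhiyong (白智勇), Xu Jinxiu (徐金秀), Wu Junlin (吴俊林), Jiang Xinyu (蒋新宇)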
Owner: Hypervelocity Aerodynamics Institute, China Aerodynamics Research and Development Center (中国空气动力研究与发展中心超高速空气动力研究所)