Method for optimizing finite difference algorithm in heterogeneous many-core framework

This finite difference optimization technique, applied in the field of high-performance computing, addresses the low performance of finite difference numerical algorithms and the resulting limits on simulation range and simulation time. It reduces pipeline bubbles and speeds up instruction execution.

Inactive Publication Date: 2016-10-12
THE PLA INFORMATION ENG UNIV +2
3 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

[0006] The technical problem to be solved by the present invention is to solve the problem that the performance of the finite difference nu


Examples

Embodiment 1

[0026] Embodiment 1: With reference to Figures 1 and 2, a method for optimizing a finite difference algorithm in a heterogeneous many-core architecture. The finite difference algorithm is optimized using a three-step progressive optimization method, whose steps are as follows:

[0027] Step 1. Basic optimization: extract loop invariants to reduce computational intensity, and eliminate branches inside loops to facilitate vectorization. Specifically, computational intensity is reduced through basic optimizations such as loop unrolling and invariant extraction, and branch tests are eliminated by changing the initial values and exit conditions of the loop variables.
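The invariant extraction described in Step 1 can be sketched as follows. This is only an illustration under stated assumptions, not the patent's code: the 1D three-point update, the function names `update_naive` and `update_hoisted`, and the parameters `v`, `dt` and `dx` are all hypothetical. The first function recomputes a coefficient that never changes inside the loop; the second computes it once before the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: the coefficient (v*v*dt*dt)/(dx*dx) is recomputed on every
 * iteration even though it does not depend on i. Only interior points
 * 1..n-2 are written; out[0] and out[n-1] are left untouched. */
void update_naive(const double *u, double *out, size_t n,
                  double v, double dt, double dx)
{
    assert(n >= 3);
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = u[i] + (v * v * dt * dt) / (dx * dx)
                        * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    }
}

/* Invariant extracted: the coefficient is computed once, before the loop,
 * so the loop body is left with one multiply and a few adds per point. */
void update_hoisted(const double *u, double *out, size_t n,
                    double v, double dt, double dx)
{
    assert(n >= 3);
    const double c = (v * v * dt * dt) / (dx * dx);  /* loop invariant */
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    }
}
```

An optimizing compiler will often hoist such an expression by itself; writing it out by hand makes the intent explicit and does not depend on the optimization level.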

[0028] Step 2. Parallel optimization: using the OpenMP parallel model, thread-level parallelism is achieved by adding pragmas before the core loop, and instruction-level parallelism is achieved by rewriting the core loop with built-in vector instructions. Specifically, after the loop is divided into blocks...
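A minimal sketch of the two levels of parallelism named in Step 2, under stated assumptions: the 2D five-point Laplacian, the function name `laplacian2d_omp` and the row-major grid layout are hypothetical and do not come from the patent. The patent rewrites the core loop with built-in vector instructions; this sketch substitutes the portable `#pragma omp simd` hint for hand-written intrinsics and does not show loop blocking.

```c
#include <assert.h>
#include <stddef.h>

/* Five-point Laplacian on an ny-by-nx row-major grid.
 * Only interior points are written; boundary entries of out are untouched. */
void laplacian2d_omp(const double *u, double *out, long ny, long nx)
{
    assert(ny >= 3 && nx >= 3);
    /* Thread-level parallelism: rows are distributed across threads.
     * Each row writes a disjoint part of out, so no synchronization
     * is needed inside the loop. */
    #pragma omp parallel for
    for (long j = 1; j < ny - 1; ++j) {
        /* Vectorization hint for the unit-stride inner loop
         * (a stand-in for explicit vector intrinsics). */
        #pragma omp simd
        for (long i = 1; i < nx - 1; ++i) {
            size_t k = (size_t)(j * nx + i);
            out[k] = u[k - 1] + u[k + 1]
                   + u[k - (size_t)nx] + u[k + (size_t)nx]
                   - 4.0 * u[k];
        }
    }
}
```

With GCC or Clang the pragmas take effect when the file is compiled with `-fopenmp`; without that flag they are ignored and the function runs serially with the same result.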

Embodiment 2

[0035] Embodiment 2: With reference to Figures 1 and 2, the method of the present invention for optimizing a finite difference numerical algorithm targets a hybrid heterogeneous high-performance computer system that combines many-core accelerators (MIC) with multi-core general-purpose processors (CPU). Branch tests are eliminated by changing the initial value and the exit condition of the loop variable. The reason is that when the processor encounters a conditional branch, its branch prediction logic uses statistical methods to predict the outcome before the result of the condition is available. On a misprediction, the instruction pipeline must return to the branch position, which creates pipeline bubbles and wastes clock cycles. In addition, a branch inside the loop body prevents the compiler from performing subsequent optimizations such as loop unrol...
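The branch elimination described above can be illustrated with a small sketch. The 1D second-difference kernel and the function names `second_diff_branchy` and `second_diff_branchless` are assumptions chosen for illustration, not code from the patent. The first version tests on every iteration whether the point is interior; the second moves that test into the loop's initial value and exit condition, leaving a straight-line body.

```c
#include <assert.h>
#include <stddef.h>

/* Branching form: the loop covers every index and an if-test inside
 * the body skips the two boundary points. */
void second_diff_branchy(const double *u, double *out, size_t n)
{
    assert(n >= 3);
    for (size_t i = 0; i < n; ++i) {
        if (i >= 1 && i + 1 < n) {
            out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];
        }
    }
}

/* Branch-free form: the loop starts at 1 and exits before n-1, so the
 * body no longer contains a conditional and writes the same points. */
void second_diff_branchless(const double *u, double *out, size_t n)
{
    assert(n >= 3);
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = u[i - 1] - 2.0 * u[i] + u[i + 1];
    }
}
```

Both functions write exactly the interior points 1 to n-2 and leave the boundary entries of `out` unchanged, so the second can replace the first without affecting results.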

Abstract

The invention belongs to the technical field of high-performance computing and relates to a method for optimizing a finite difference algorithm in a heterogeneous many-core architecture. In a hybrid heterogeneous high-performance computer system based on many-core accelerators (MIC) and multi-core general-purpose processors (CPU), the finite difference algorithm is optimized using a three-step progressive optimization method consisting of basic optimization, parallel optimization, and heterogeneous collaborative optimization. The beneficial effects are as follows. The three-step progressive method addresses the low computational performance and poor parallel efficiency, caused by strided memory access and a lack of parallel execution, that arise when a finite difference algorithm is ported from a multi-core system to a heterogeneous many-core system. The method is efficient and scalable: basic optimizations such as branch elimination, loop unrolling, and invariant extraction reduce computational intensity and clear obstacles to vectorization, while parallel optimizations, in which data dependencies are analyzed, loops are blocked, and the core algorithm is rewritten with a vector instruction set, make full use of the multithreading and long-vector mechanisms of the many-core processor.

Description

Technical field

[0001] The present invention belongs to the technical field of high-performance computing, specifically to the cooperative optimization of CPU and MIC in heterogeneous high-performance computer systems, and relates to a method for optimizing a finite difference algorithm in a heterogeneous many-core architecture.

Background

[0002] MIC (Many Integrated Core), i.e. the "many-core architecture", has far more cores than a CPU and supports parallel computing in cooperation with CPUs.

[0003] In recent years, with the development of massively parallel architectures, heterogeneous many-core architectures have been widely used in supercomputing. The Top500 list of supercomputers, released every six months, shows that more and more MICs, which focus on parallel processing performance, are being integrated into high-performance clusters. Among them, the list released in Nov...

Application Information

IPC(8): G06F9/30, G06F9/38
CPC: G06F9/30098, G06F9/3851, G06F9/3867
Inventor: 许瑾晨, 张乾坤, 郝鑫, 单征, 戴涛, 周蓓, 郭绍忠
Owner: THE PLA INFORMATION ENG UNIV