A Parallel Optimization Method and System for Radiation Hydrodynamic Equations (AMG)

CN116226587B · Active · Publication Date: 2026-06-30 · SHANDONG COMP SCI CENT (NAT SUPERCOMP CENT IN JINAN) +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANDONG COMP SCI CENT (NAT SUPERCOMP CENT IN JINAN)
Filing Date
2023-03-01
Publication Date
2026-06-30


Abstract

This disclosure provides an AMG parallel optimization method and system for the radiation hydrodynamic (RHD) equations, relating to the field of data computation and processing technology. The method includes constructing a system of RHD equations, reading in the equations, and initializing the basic parameters, then solving the RHD equations using the algebraic multigrid (AMG) method. During the AMG solution process, an optimized hybrid Gauss-Seidel (GS) smoothing operator is used on the slave cores for smoothing calculations: a cache array is dynamically constructed for each slave core; based on the constructed cache array, the computing tasks are allocated from the master core and mapped by index to each slave core; each slave core iterates over the data it needs to calculate and then returns the iteration results to the master core. This disclosure accelerates the solution of the RHD equations.

Description

Technical Field

[0001] This disclosure relates to the field of data computation and processing technology, specifically to an AMG parallel optimization method and system for the radiation hydrodynamic equations.

Background Technology

[0002] The statements in this section are merely background information relating to this disclosure and do not necessarily constitute prior art.

[0003] Radiation hydrodynamics is the discipline that describes the propagation of thermal radiation in fluids and its effects on general fluid motion. Radiation hydrodynamic theory has wide applications in astrophysics, laser nuclear fusion, and supernova explosion theory. Fluid motion and energy transfer at high energy density are highly complex processes, which can be described by a set of radiation hydrodynamic equations (RHD). Due to the influence of physical structure, multi-scale factors, and other factors, solving these equations is very difficult and represents a significant computational challenge.

[0004] Algebraic Multigrid (AMG) is one of the most efficient methods for solving RHD problems and is also one of the most commonly used, best-converging preconditioners for linear elliptic differential equations. When solving large-scale systems of equations on high-performance computers, conventional iterative methods are slow to reduce the low-frequency components of the error. AMG compensates for this by alternating between coarse and fine grids. A smoothing process eliminates the high-frequency components of the iteration error, while the remaining low-frequency components are eliminated by solving a smaller linear system on a coarse grid (coarse grid correction). This process is repeated recursively for the linear system on the coarse grid until the coarse grid is sufficiently small. The smoothing operator is what allows the error to be approximated accurately and efficiently on the coarser grid; it is typically a relaxation method such as Jacobi or Gauss-Seidel (GS).

[0005] The complete AMG algorithm can be divided into two stages: setup and solution. The setup stage builds the mesh hierarchy and constructs components such as the restriction operators and interpolation operators. The solution stage solves cyclically across the mesh layers through a V-cycle (from the fine mesh to the coarse mesh and back to the fine mesh) or a W-cycle (multiple alternations between coarse and fine meshes).

[0006] In the AMG calculation process, the smoothing module is the most time-consuming module. Furthermore, dependencies exist between variables during smoothing calculations; the calculation of elements in the current iteration step requires data from elements in the previous iteration step, and the dependent variables are scattered. Therefore, the biggest challenge in parallel optimization of smoothing operators lies in strong data dependencies, poor locality, and low computation-to-memory ratio. The Sunway next-generation supercomputer is built using the new-generation Sunway many-core processor SW26010pro. Each CPU integrates six core groups, and each core group contains a management processing unit (MPE) and a set of 8*8 computing processing elements (CPEs). The MPE is a general-purpose processor responsible for handling the logic-intensive parts of the program and controlling system resources; the numerous CPEs are computing cores responsible for accelerating hot-spot parts of the program, and each CPE can be indexed by a unique identifier. The system supports parallel modes of MPI, OpenMP, athread, and OpenACC, using the athread library as the primary efficient acceleration method within the core group.

[0007] Currently, there is no efficient master-slave core implementation method for hybrid GS smoothing operators on the Sunway series supercomputers. Existing hybrid GS smoothing operators cannot adapt to the characteristics of the Sunway many-core processor, resulting in a long solution time for the AMG method in solving the RHD equations, which cannot achieve rapid simulation of complex applications such as large-scale laser fusion.

Summary of the Invention

[0008] To address the aforementioned issues, this disclosure presents a parallel AMG optimization method and system for the radiation hydrodynamics equations. It proposes an Algebraic Multigrid (AMG) method based on the Sunway supercomputer architecture to accelerate the solution of the radiation hydrodynamics (RHD) equations. Considering the characteristics of the Sunway supercomputer architecture and the features of the AMG algorithm, it optimizes the hybrid GS smoothing operator, which has the longest processing time in the AMG algorithm, to achieve efficient parallel solution of the RHD equations.

[0009] According to some embodiments, the present disclosure adopts the following technical solutions:

[0010] An AMG parallel optimization method for the radiation hydrodynamic equations, implemented on the SW26010pro processor, includes:

[0011] Constructing a system of radiation hydrodynamic equations, reading in the system of equations, and initializing the basic parameters;

[0012] Solving the radiation hydrodynamic equations using the algebraic multigrid (AMG) method;

[0013] During the AMG solution process, an optimized hybrid GS smoothing operator is used for smoothing calculations on the slave cores. The process includes:

[0014] Building a cache array and, based on the built cache array, allocating computing tasks from the master core and mapping them by index to each slave core;

[0015] Each slave core iterates over the data it needs to calculate and then returns the iteration results to the master core.

[0016] An AMG parallel optimization system for the radiation hydrodynamic equations, characterized in that it comprises:

[0017] An initialization module, used to construct a system of radiation hydrodynamic equations, read in the equations, initialize the basic parameters, and solve the system of radiation hydrodynamic equations using the algebraic multigrid (AMG) method;

[0018] A smoothing module, used to perform smoothing calculations on the slave cores using an optimized hybrid GS smoothing operator during the AMG solution process. The process includes: constructing a cache array; and, according to the constructed cache array, allocating the computing tasks from the master core and mapping them by index to each slave core;

[0019] Each slave core iterates over and calculates the data it needs to calculate;

[0020] A data feedback module, used to return the iteration results to the master core.

[0021] Compared with the prior art, the beneficial effects of this disclosure are as follows:

[0022] This disclosure presents an AMG parallel optimization method for the radiation hydrodynamic equations. The solver is optimized for the new generation of domestic Sunway many-core processors. The test platform is the Sunway SW26010pro many-core processor, and the time taken by the hotspot function within a V-cycle is measured.

[0023] Compared to the original algorithm, the optimized hybrid GS algorithm of this disclosure makes full use of the Sunway many-core processor and significantly improves computational efficiency. For the two RHD test cases, using a single-process solution and comparing the computation time of a single V-cycle before and after optimization, the speedup after optimization reaches 8.9 and 8, respectively. With the optimized AMG method of this disclosure, the computing power of the Sunway next-generation supercomputer can be leveraged to accelerate the solution of the RHD equations.

Description of the Drawings

[0024] The accompanying drawings, which form part of this disclosure, are used to provide a further understanding of this disclosure. The illustrative embodiments of this disclosure and their descriptions are used to explain this disclosure and do not constitute an undue limitation of this disclosure.

[0025] Figure 1 is a schematic diagram of the AMG process of the method provided in the embodiments of this disclosure;

[0026] Figure 2 is a V-cycle diagram of the method provided in the embodiments of this disclosure;

[0027] Figure 3 is a schematic diagram of the slave-core hybrid GS process provided in the embodiments of this disclosure;

[0028] Figure 4 is a graph comparing the solution time of the system of equations before and after optimization for the method provided in the embodiments of this disclosure.

Detailed Description

[0029] The present disclosure will be further described below with reference to the accompanying drawings and embodiments.

[0030] It should be noted that the following detailed descriptions are illustrative and intended to provide further explanation of this disclosure. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains.

[0031] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to this disclosure. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms “comprising” and/or “including” are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

[0032] Terminology Explanation:

[0033] The Sunway next-generation supercomputer is built using the new-generation Sunway SW26010pro many-core processor. Each CPU integrates six core groups, and each core group contains a management processing unit (MPE) and a set of 8x8 computing processing elements (CPEs). The MPE is a general-purpose processor responsible for handling the logic-intensive parts of the program and controlling system resources; the numerous CPEs are computing cores responsible for accelerating the hot spots in the program, and each CPE can be indexed by a unique identifier. The system supports parallel modes of MPI, OpenMP, athread, and OpenACC, using the athread library as the primary efficient acceleration method within the core group.

[0034] Example 1

[0035] One embodiment of this disclosure provides an AMG parallel optimization method for solving the radiation hydrodynamic equations based on the Sunway architecture. Building on the architecture of the next-generation Sunway supercomputer, it provides an AMG algorithm optimized for that architecture to accelerate the solution of the RHD equations. By optimizing the hybrid GS smoothing operator, which takes the longest time, and using the slave-core computing resources of the Sunway processor, an efficient parallel solution method for the RHD equations is obtained, as shown in Figures 1-3.

[0036] An AMG parallel optimization method for the radiation hydrodynamic equations, implemented on the SW26010pro processor, includes:

[0037] Step 1: Construct the system of radiation hydrodynamic equations, read in the system of equations and initialize the basic parameters;

[0038] Step 2: Solve the radiation hydrodynamic equations using the algebraic multigrid method (AMG).

[0039] In step 2, the improved hybrid GS smoothing operator is used to accelerate the solution of the RHD equations by the AMG algorithm. During the AMG solution process, the optimized hybrid GS smoothing operator is used for smoothing calculations on the slave cores. The process includes:

[0040] Step 21: Create a cache array in main memory;

[0041] Step 22: Based on the constructed cache array, allocate computing tasks from the master core and map them by index to each slave core;

[0042] Step 23: Each slave core iterates over the data it needs to calculate, then returns the iteration results to the master core, which outputs the calculation results.

[0043] The advantage of the above technical solution is that, taking into account the characteristics of the Sunway supercomputer architecture and the features of the AMG algorithm, it provides an efficient parallel solution method for the RHD equation system by optimizing the hybrid GS smoothing operator, which has the longest processing time.

[0044] As one embodiment, as shown in Figure 1, step 1 constructs a system of radiation hydrodynamic equations, reads in the equations, and initializes the basic parameters.

[0045] The complete AMG algorithm can be divided into two stages: setup and solution. After the system of equations has been read in and the basic parameters initialized, the setup stage builds a nested hierarchy of coarse and fine mesh layers and constructs components such as the restriction operator R and the interpolation operator P for transferring between coarse and fine meshes. The solution stage solves cyclically across the mesh layers through a V-cycle, W-cycle, etc.

[0046] As shown in Figure 2, one V-cycle proceeds as follows:

[0047] Take the solution of the system of equations Ax = b as an example, where A is a known coefficient matrix, b is a known vector, and x is the unknown vector to be found.

[0048] 1. Pre-smoothing: Starting with the original equation system as the finest mesh layer, perform hybrid GS smoothing on the equation system $Ax = b$ to eliminate the high-frequency error and obtain an approximate solution $x_f$.

[0049] 2. Coarse mesh correction:

[0050] 2.1 Calculate the residual $r = b - A x_f$ and restrict it to the coarse mesh layer: $b_k = R r$;

[0051] 2.2 Solve the coarse mesh equation $A_k x_k = b_k$;

[0052] 2.3 Interpolate and correct the fine-mesh approximate solution: $x_f = x_f + P x_k$.

[0053] 3. Post-smoothing: Smoothing is performed again on the fine mesh using the hybrid GS operator, updating the approximate solution $x_f$.

[0054] Above, $b_k$, $A_k$, and $x_k$ denote b, A, and x, respectively, on the coarse mesh layer after the original system $Ax = b$ has been restricted.
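The V-cycle steps above can be sketched for two mesh layers in plain Python. This is only an illustration under stated assumptions, not the implementation of this disclosure: the 1D Poisson matrix, the linear interpolation operator, the choice R = Pᵀ with A_k = R A P, the plain (serial) Gauss-Seidel smoother, and every function name are hypothetical stand-ins, and the coarse system is solved directly instead of recursing to further layers.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def gauss_seidel(A, b, x, sweeps=1):
    """Serial Gauss-Seidel relaxation (stand-in for the hybrid GS smoother)."""
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

def solve_dense(A, b):
    """Gaussian elimination with partial pivoting for the small coarse system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def two_grid_v_cycle(A, b, x, P):
    R = transpose(P)                       # assumed restriction operator R = P^T
    A_k = matmul(matmul(R, A), P)          # assumed coarse operator A_k = R A P
    x_f = gauss_seidel(A, b, list(x))      # 1. pre-smoothing
    r = [bi - ai for bi, ai in zip(b, matvec(A, x_f))]   # 2.1 r = b - A x_f
    b_k = matvec(R, r)                     # 2.1 b_k = R r
    x_k = solve_dense(A_k, b_k)            # 2.2 solve A_k x_k = b_k
    x_f = [xi + ci for xi, ci in zip(x_f, matvec(P, x_k))]  # 2.3 x_f += P x_k
    return gauss_seidel(A, b, x_f)         # 3. post-smoothing

def poisson_1d(n):
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def linear_interpolation(n_coarse):
    """Each coarse point maps to one fine point and half-weights its neighbours."""
    P = [[0.0] * n_coarse for _ in range(2 * n_coarse + 1)]
    for j in range(n_coarse):
        i = 2 * j + 1
        P[i - 1][j] += 0.5
        P[i][j] = 1.0
        P[i + 1][j] += 0.5
    return P

def residual_norm(A, b, x):
    return max(abs(bi - ai) for bi, ai in zip(b, matvec(A, x)))

if __name__ == "__main__":
    A, b, x = poisson_1d(7), [1.0] * 7, [0.0] * 7
    P = linear_interpolation(3)
    for cycle in range(5):
        x = two_grid_v_cycle(A, b, x, P)
        print(cycle, residual_norm(A, b, x))
```

In a full AMG solver, step 2.2 would itself be another V-cycle on the next coarser layer, and the operators R and P would come from the setup stage rather than from a fixed geometric stencil.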

[0055] Then, the algebraic multigrid algorithm (AMG) is used to solve the radiation hydrodynamic equations.

[0056] As one embodiment, during the AMG solution process, an optimized hybrid GS smoothing operator is used for smoothing calculations on the slave cores. As shown in Figures 1 and 3, the dashed box marks the operations performed on the slave cores, namely the optimized GS smoothing process:

[0057] In step 21, a cache array is constructed;

[0058] The key to slave-core acceleration is for the slave cores to compute without interfering with one another, so that data dependencies between cores are avoided. The difficulty is that when every slave core updates data at the same time, the values on which the iteration depends become unreliable, leading to data errors, more iterations, and a lower convergence rate.

[0059] A cache array u_temp_data is built in the main memory of the core group to cache the vector data u_data. Each slave core reads the part of main memory it needs for its calculation, so that data updates do not interfere with one another. Each slave core is assigned an independent computing task so that valid values are not overwritten during the parallel computation.

[0060] In step 22, based on the constructed cache array, the computing tasks are allocated from the master core and the indices are mapped to each slave core; this process involves allocating the element data to be computed from the master core to the slave core.

[0061] This includes: dividing the local data block of the current computing task into a diagonal block, which contains the main diagonal elements of the equation system, and an off-diagonal block, which does not; storing them in memory in a column-compressed format; distributing the data of the task to be computed evenly to each slave core; and allocating any remaining data to the slave cores with the smaller index numbers.

[0062] Specifically, each process N divides its local data block $A_n$ into diagonal block data and off-diagonal block data, which are stored in memory in a column-compressed format.
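The split into a diagonal block and an off-diagonal block can be illustrated as follows. This is a minimal sketch under several assumptions that are not taken from the source: the process is assumed to own a contiguous range of rows, a column belongs to the diagonal block when its global index falls in that same range, and each block is stored in compressed sparse row (CSR) form purely for illustration, whereas the text only states a "column-compressed format". The function name `split_diag_offd` and the dictionary-based input are hypothetical.

```python
def split_diag_offd(rows, row_start, row_end):
    """Split locally owned rows into a diagonal block and an off-diagonal block.

    rows: one dict {global_column: value} per owned row
          (global rows row_start .. row_end - 1).
    Returns two (indptr, indices, data) triples in CSR form. Diagonal-block
    columns are renumbered locally; off-diagonal columns stay global.
    """
    def to_csr(row_dicts):
        indptr, indices, data = [0], [], []
        for row in row_dicts:
            for col in sorted(row):
                indices.append(col)
                data.append(row[col])
            indptr.append(len(indices))
        return indptr, indices, data

    diag_rows, offd_rows = [], []
    for row in rows:
        # Columns owned by this process form the diagonal block.
        diag_rows.append({c - row_start: v for c, v in row.items()
                          if row_start <= c < row_end})
        # All other columns refer to data owned elsewhere.
        offd_rows.append({c: v for c, v in row.items()
                          if not (row_start <= c < row_end)})
    return to_csr(diag_rows), to_csr(offd_rows)


if __name__ == "__main__":
    # Rows 2 and 3 of a 6x6 tridiagonal matrix, owned by one process.
    local_rows = [{1: -1.0, 2: 2.0, 3: -1.0}, {2: -1.0, 3: 2.0, 4: -1.0}]
    diag, offd = split_diag_offd(local_rows, 2, 4)
    print("diag:", diag)
    print("offd:", offd)
```

Keeping the two blocks apart lets the smoother treat locally owned dependencies (which may be updated during the sweep) differently from dependencies on data owned elsewhere.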

[0063] Then, the data to be accelerated is distributed evenly among the 64 slave cores, and any remaining data is assigned to the slave cores with the smaller index numbers, that is, those with smaller n. Whether the data index Ele of the current slave core accepts an additional element is determined by the current slave core index number n and the total number of elements AllEle, where the total number of elements is the number of data items to be calculated. This takes both the data dependency problem and the effective use of computing resources into account.

[0064] The data index $\mathrm{Ele}_n$ of the computing task of the n-th slave core (n starts from 0) is calculated by the following formula:

[0065]

[0066] Where AllEle represents the total number of elements, n represents the n-th slave core, and $\mathrm{Ele}_n$ denotes the data index.
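The formula itself is not reproduced in this text. As a hedged illustration only, one common way to realize the distribution described above (an even share per slave core, with the remainder going to the lowest-indexed cores) is sketched below. The function name `partition_bounds` and the half-open range convention are assumptions, and this is not necessarily the exact formula of this disclosure.

```python
def partition_bounds(all_ele, n, num_cores=64):
    """Return the half-open element range [lower, upper) for slave core n.

    Every core gets all_ele // num_cores elements; the first
    all_ele % num_cores cores each take one additional element.
    """
    base, extra = divmod(all_ele, num_cores)
    lower = n * base + min(n, extra)
    upper = lower + base + (1 if n < extra else 0)
    return lower, upper


if __name__ == "__main__":
    # 130 elements over 64 cores: cores 0 and 1 get 3 elements, the rest get 2.
    for n in (0, 1, 2, 63):
        print(n, partition_bounds(130, n))
```

Because the ranges are contiguous and non-overlapping, each slave core can decide locally, from n and AllEle alone, which elements belong to its task.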

[0067] Step 23: Each slave core iterates through the data that needs to be calculated, then returns the iteration results to the main core and outputs the calculation results.

[0068] In the hybrid GS smoothing calculation, the fine mesh formed by the equations is first divided into 64 equally sized data blocks, with each slave core performing calculations on one data block. Based on this division, each slave core computes the data within its own block, as shown in Figure 3.

[0069] This includes:

[0070] Each slave core first defines variables i, j, and k for traversal. The initial value of i is the lower bound given by the index $\mathrm{Ele}_n$ above, and its end value is the upper bound given by $\mathrm{Ele}_n$; j and k are initially 0, and their final values are determined by the number of elements; index is the data index mapping, determined by the formula above. It is then determined whether the data is data that the current slave core needs to calculate. Each slave core calculates the data within its own data block; data not allocated to the current slave core's task and data that does not require smoothing are not modified.

[0071] If the data requires smoothing, it must be determined whether it lies in the main diagonal block of the equation system. That is, when calculating the diagonal block data, it must be determined whether the element data on which the current computing task depends is in the current slave core.

[0072] If the element data on which the computing task depends is in the current slave core, the residual is obtained by subtracting, from the original residual, the product of the diagonal block data and the corresponding vector. That is, the residual res is the original residual minus the product of the diagonal block data diag_data and the corresponding vector u_data.

[0073] If the required data is not in the current slave core, the calculation is performed through the cache array u_temp_data, thereby avoiding the impact of data updates from other slave cores on the calculation results. Through this step, the effect of parallel acceleration of slave core calculation is achieved.

[0074] The off-diagonal block data is calculated in the same way as the diagonal block data, except that there is no need to check whether the data within the off-diagonal block is located in the current slave core. Once the residual is obtained, it is used in the subsequent coarse mesh correction, and u_data is updated to the obtained residual divided by the corresponding data of the previous mesh layer.

[0075] After the above calculation is completed, each slave core transmits its resulting u_data back to the master core. According to the slave core indices assigned in step 22, the data is synchronized and written back to main memory for the calculation of the next mesh layer.
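Steps 21 to 23 can be sketched as a single-process simulation in plain Python. This is an illustrative sketch under stated assumptions, not the athread implementation of this disclosure: the slave cores are simulated by a sequential loop, the matrix is dense, only the diagonal block of a single process is considered, and the update divides the residual by the diagonal entry as in the standard Gauss-Seidel step. The names `block_bounds`, `hybrid_gs_sweep`, `u_temp`, and `core_order` are hypothetical.

```python
def block_bounds(all_ele, n, num_cores):
    """Even split of all_ele rows; the first all_ele % num_cores cores get one extra."""
    base, extra = divmod(all_ele, num_cores)
    lower = n * base + min(n, extra)
    return lower, lower + base + (1 if n < extra else 0)


def hybrid_gs_sweep(A, f, u, num_cores=4, core_order=None):
    """One hybrid GS sweep: Gauss-Seidel inside a block, cached values across blocks."""
    n_rows = len(f)
    u_temp = u[:]   # step 21: cache array holding u before the sweep
    u_new = u[:]    # each simulated core writes only to its own rows
    order = core_order if core_order is not None else range(num_cores)
    for core in order:                      # step 22: rows mapped to cores by index
        lower, upper = block_bounds(n_rows, core, num_cores)
        for i in range(lower, upper):       # step 23: each core sweeps its own rows
            res = f[i]
            for j, a_ij in enumerate(A[i]):
                if j == i or a_ij == 0.0:
                    continue
                if lower <= j < upper:
                    res -= a_ij * u_new[j]  # dependency in this core: latest value
                else:
                    res -= a_ij * u_temp[j] # dependency elsewhere: cached value
            u_new[i] = res / A[i][i]        # assumed standard GS update
    return u_new                            # results returned to the master core


if __name__ == "__main__":
    n = 10
    A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
          for j in range(n)] for i in range(n)]
    f = [sum(row) for row in A]             # exact solution is all ones
    u = [0.0] * n
    for _ in range(20):
        u = hybrid_gs_sweep(A, f, u)
    print(max(abs(ui - 1.0) for ui in u))
```

Because values owned by other cores are always read from the cache array, the result of a sweep does not depend on the order or timing in which the cores run, which is the property that allows the blocks to be processed in parallel.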

[0076] As shown in Figure 4, the test cases are two RHD equation systems. The solver is optimized for the new generation of domestic Sunway many-core processors, and the test platform is the Sunway SW26010pro many-core processor. The timed hotspot function is the hybrid GS module. Figure 4 shows the time taken by the hybrid GS function in a single V-cycle before and after optimization.

[0077] After the test case has been read in and the setup phase is completed, a timestamp is inserted and the V-cycle loop begins. The V-cycle performs one pre-smoothing and one post-smoothing calculation on each mesh layer. A second timestamp is inserted when the set calculation accuracy is reached, and the elapsed times are compared.

[0078] As can be seen, compared with the original AMG algorithm, the optimized hybrid GS algorithm of this disclosure makes full use of the Sunway many-core processor and significantly improves computational efficiency. The speedup of the hotspot function reaches 8.9 and 8 for the two test cases. With the optimized AMG method of this disclosure, the computing power of the Sunway next-generation supercomputer can be leveraged to accelerate the solution of the RHD equations.

[0079] Example 2

[0080] One embodiment of this disclosure provides an AMG parallel optimization system for the radiation hydrodynamic equations, comprising:

[0081] An initialization module, used to construct a system of radiation hydrodynamic equations, read in the equations, initialize the basic parameters, and solve the system of radiation hydrodynamic equations using the algebraic multigrid (AMG) method;

[0082] A smoothing module, used to perform smoothing calculations on the slave cores using an optimized hybrid GS smoothing operator during the AMG solution process. The process includes: constructing a cache array; and, according to the constructed cache array, allocating the computing tasks from the master core and mapping them by index to each slave core;

[0083] Each slave core iterates over and calculates the data it needs to calculate;

[0084] A data feedback module, used to return the iteration results to the master core.

[0085] The system executes all the steps of the method described in Example 1.

[0086] This disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0087] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0088] While the specific embodiments of this disclosure have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of this disclosure. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of this disclosure are still within the scope of protection of this disclosure.

Claims

1. An AMG parallel optimization method for a system of radiation hydrodynamic equations, characterized in that the method is implemented on the SW26010pro processor and includes: constructing a system of radiation hydrodynamic equations, reading in the system of equations, and initializing the basic parameters; solving the radiation hydrodynamic equations using the algebraic multigrid (AMG) method; wherein, during the AMG solution process, an optimized hybrid GS smoothing operator is used for smoothing calculations on the slave cores, the process including: dynamically constructing a cache array for each slave core; based on the constructed cache array, allocating the computing tasks from the master core and mapping them by index to each slave core, which includes dividing the local data block of the current computing task into diagonal block data and off-diagonal block data, storing them in memory in a column-compressed format, distributing the data of the task to be computed evenly to each slave core, and allocating the remaining data to the slave cores with the smaller index numbers; and each slave core iterating over the data it needs to calculate and then returning the iteration results to the master core, which includes first determining whether the data is data that the current slave core needs to calculate, wherein data that is not in the current slave core's task allocation and data that does not require smoothing are not modified.

2. The AMG parallel optimization method for the radiation hydrodynamic equations according to claim 1, wherein the process of dynamically constructing a cache array for each slave core is as follows: the computed data u_data is cached by building a cache array u_temp_data, thereby allocating an independent computing task to each slave core.

3. The AMG parallel optimization method for the radiation hydrodynamic equations according to claim 1, wherein whether the data index computed by each slave core accepts additional remaining data is determined by the current slave core index number and the total number of data items.

4. The AMG parallel optimization method for the radiation hydrodynamic equations according to claim 1, wherein, if the data requires smoothing, it is determined whether the data is within the diagonal block, and, when calculating the diagonal block data, it is determined whether the element data on which the current computing task depends is within the current slave core.

5. The AMG parallel optimization method for the radiation hydrodynamic equations according to claim 4, wherein, if the element data on which the computing task depends is within the current slave core, the final residual is the difference between the original residual and the product of the diagonal block data and the corresponding vector.

6. The AMG parallel optimization method for the radiation hydrodynamic equations according to claim 4, wherein, if the element data on which the computing task depends is not in the current slave core, the computation is performed through the cache array.

7. The AMG parallel optimization method for the radiation hydrodynamic equations according to claim 1, wherein the iteration result is a solution vector, which is passed back from each slave core to the master core.

8. An AMG parallel optimization system for the radiation hydrodynamic equations, characterized in that it executes the AMG parallel optimization method for the radiation hydrodynamic equations according to any one of claims 1-7 and comprises: an initialization module, used to construct a system of radiation hydrodynamic equations, read in the equations, initialize the basic parameters, and solve the system of radiation hydrodynamic equations using the algebraic multigrid (AMG) method; a smoothing module, used to perform smoothing calculations on the slave cores using an optimized hybrid GS smoothing operator during the AMG solution process, the process including constructing a cache array and, according to the constructed cache array, allocating the computing tasks from the master core and mapping them by index to each slave core, with each slave core iterating over and calculating the data it needs to calculate; and a data feedback module, used to return the iteration results to the master core.