A method for efficiently generating unstructured meshes of tens of billions of cells in parallel on tens of thousands of cores
By employing a parallel generation method that runs on tens of thousands of cores, together with MPI hierarchical and group communication and adaptive mesh stitching, the method overcomes the computational-resource and efficiency limits on generating unstructured meshes of tens of billions of cells. It achieves efficient generation with assured mesh quality and supports refined, engineering-grade CFD analysis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INST OF MECHANICS CHINESE ACAD OF SCI
- Filing Date
- 2025-09-29
- Publication Date
- 2026-06-26
AI Technical Summary
Existing technologies struggle to efficiently generate unstructured meshes of tens of billions of elements, limited by computational resources and efficiency, which hinders the refinement and engineering applications of CFD analysis.
A parallel generation method running on tens of thousands of cores, built on an MPI hierarchical and group communication architecture and adaptive mesh stitching, efficiently generates unstructured meshes of tens of billions of cells through small-scale mesh stitching, adaptive interface matching, and global mesh reconstruction.
The system efficiently generates unstructured meshes of tens of billions of cells on a domestic supercomputing platform and reduces generation time to a few hours. This significantly improves computational efficiency and mesh quality while reducing the need for manual intervention.

Figure CN121302964B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of computational fluid dynamics, and specifically to a method for efficiently generating unstructured meshes of tens of billions of cells in parallel on tens of thousands of cores.
Background Technology
[0002] The mesh is the foundation of numerical simulation in Computational Fluid Dynamics (CFD). By discretizing a continuous fluid domain into a finite number of control volumes or control points, it provides the discrete computational framework for the partial differential equations that govern the flow. In the CFD analysis process, the mesh stores the flow-field information and directly affects the accuracy, convergence, and computational efficiency of the numerical solution. Meshes can be broadly classified into structured meshes and unstructured meshes (including hybrid meshes). Unstructured meshes are favored in engineering applications because of their particular advantages: they typically combine element types such as tetrahedra, hexahedra, prisms, and pyramids flexibly, which lets them adapt efficiently to complex geometries while maintaining mesh quality. For complex geometric configurations such as aircraft wing-body junctions, engine intake and exhaust ducts, and landing-gear bays, unstructured hybrid meshes show excellent geometric adaptability and have become the mainstream technique for CFD analysis of complex shapes.
[0003] As high-end manufacturing industries such as aerospace and shipbuilding pursue ever better product performance, CFD technology is rapidly moving toward high-fidelity, refined simulation, which places higher demands on mesh size. For example, the number of grid cells required for Direct Numerical Simulation (DNS) grows in proportion to $Re^{37/14}$, and the number required for Large Eddy Simulation (LES) grows in proportion to $Re^{14/7}$, where $Re$ is the Reynolds number. Taking the SUBOFF standard model of a typical underwater vehicle as an example, wall-resolved large eddy simulation at laboratory scale requires billions of grid cells; when the Reynolds number is raised further to the level of practical engineering applications, the required number of grid cells reaches tens or even hundreds of billions. This power-law growth in grid demand, set against the limits of existing computing resources, has become the core bottleneck restricting the engineering application of high-fidelity CFD technology.
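The scaling laws in [0003] can be turned into a quick estimate. The sketch below is a back-of-the-envelope illustration, not part of the patented method: it scales a reference cell count by a Reynolds-number ratio using the exponents stated above (37/14 for DNS, 14/7 for LES). The function name and the reference count are hypothetical inputs.

```python
from fractions import Fraction

# Exponents quoted in paragraph [0003]: cell count N is proportional to Re**exponent.
SCALING_EXPONENTS = {
    "DNS": Fraction(37, 14),
    "LES": Fraction(14, 7),
}


def scaled_cell_count(n_ref: float, re_ratio: float, method: str) -> float:
    """Scale a reference cell count when the Reynolds number changes.

    n_ref    -- cell count of a reference case (hypothetical input)
    re_ratio -- Re_new / Re_ref
    method   -- "DNS" or "LES"
    """
    exponent = float(SCALING_EXPONENTS[method])
    return n_ref * re_ratio ** exponent


if __name__ == "__main__":
    # Hypothetical reference: a 1e9-cell laboratory-scale case with Re raised tenfold.
    for method in ("DNS", "LES"):
        print(method, f"{scaled_cell_count(1e9, 10.0, method):.3e}")
```

With these exponents, a tenfold increase in Re multiplies the LES estimate by 100, which is how a billion-cell laboratory case grows toward the hundred-billion range.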
[0004] Commercial software struggles to meet the demand for ultra-large-scale meshes in refined CFD simulation. The first obstacle is the inherent limit of computational resources. Most mainstream commercial mesh generation software still relies on a serial architecture and is constrained by the memory capacity of a single computer. Generating meshes with billions of nodes requires hundreds of gigabytes or even terabytes of memory, far beyond the capacity of ordinary workstations. Even on high-end servers, memory expansion is limited by hardware and cost and cannot keep up with the continuously growing demand. The second obstacle is computational efficiency: serial programs often take days or even weeks to generate meshes of more than several hundred million cells, which significantly reduces the overall efficiency of CFD analysis. This not only prolongs product development cycles but also limits the many iterations required for design optimization. Breaking through the limitations of traditional serial mesh generation and developing large-scale parallel mesh generation technology with independent intellectual property rights is therefore a key requirement for making CFD technology more refined and engineering-oriented.
Summary of the Invention
[0005] To address the technical problems in the background art described above, this invention proposes a method for efficiently generating unstructured meshes of tens of billions of cells in parallel. The approach is conceptually sound: by stitching together small-scale meshes, it overcomes the size limit on unstructured meshes of tens of billions of cells and provides a way to generate ultra-large-scale unstructured meshes. Combined with MPI (Message Passing Interface) parallel communication, it greatly improves computational efficiency, so that stitching and generating an unstructured mesh of tens of billions of cells can be completed in one to two hours.
[0006] To solve the technical problems above, this invention provides a method for efficiently generating unstructured meshes of tens of billions of cells in parallel on tens of thousands of cores, which mainly includes the following steps:
[0007] (1) For mesh generation with a heavy communication load, construct an MPI hierarchical and group communication architecture based on the total mesh size and the number of blocks, achieving parallelism at the level of 100,000 cores, and establish a structure array matching this architecture;
[0008] (2) All processes read the M blocks of unstructured mesh data in parallel, distribute them evenly across the memory of the P processes, and establish the mesh connectivity between adjacent processes, thereby overcoming the single-node memory limit and achieving 10,000-core parallelism;
[0009] (3) Efficiently and adaptively match the sub-meshes at the interfaces between the M unstructured mesh blocks: if the coordinate points on both sides of an interface coincide, replace them directly; if they do not coincide, delete the mesh on one side and regenerate a new mesh locally to achieve adaptive matching;
[0010] (4) Efficiently and in parallel adjust the numbering of point, surface, and volume mesh elements in the global mesh, reconstruct the connection relationship of surface and volume elements in the global mesh, and check and ensure the correctness of mesh information;
[0011] (5) Analyze the global grid boundary, reconstruct the boundary information of the global grid and check it.
[0012] In the method for efficiently generating unstructured meshes in parallel at the 10,000-core level, the specific process of constructing the MPI hierarchical and group communication architecture in step (1) is as follows:
[0013] Initialize the global communication domain comm_world and start P parallel MPI processes. The main process reads, in sequence, the cell counts $NC_1, \ldots, NC_M$ of the M small-scale meshes and calculates the total cell count $NC_{global} = \sum_{i=1}^{M} NC_i$. The global communication domain is then divided into M sub-communication domains (sub_comm). This establishes a hierarchical communication mode between the global communication domain and the sub-communication domains, as well as a group communication mode among the M sub-communication domains. The i-th sub-communication domain contains $L_i$ local processes; if …, then for i = 1 to … execute $L_i = L_i + 1$, so that every process handles almost the same amount of data, which improves parallel efficiency;
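The balancing rule in [0013] is only partly legible, so the following is a minimal sketch under an assumed rule. Each sub-mesh receives a share of the P processes proportional to its cell count, at least one, and the leftover processes are handed out one at a time starting from i = 1. The function names are hypothetical, and the code simulates the bookkeeping in plain Python rather than calling MPI.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def allocate_processes(cell_counts: List[int], num_procs: int) -> List[int]:
    """Assumed rule: L_i = floor(P * NC_i / NC_global), at least 1, then
    leftover processes are added one at a time from i = 1 (L_i = L_i + 1)."""
    m = len(cell_counts)
    if num_procs < m:
        raise ValueError("need at least one process per sub-mesh")
    nc_global = sum(cell_counts)  # NC_global = NC_1 + ... + NC_M
    if nc_global <= 0:
        raise ValueError("total cell count must be positive")
    alloc = [max(1, num_procs * nc // nc_global) for nc in cell_counts]
    # Enforcing "at least one process" can overshoot P: take processes back
    # from the currently largest group.
    while sum(alloc) > num_procs:
        k = max(range(m), key=lambda i: alloc[i])
        alloc[k] -= 1
    # Floor division usually leaves a remainder: distribute it from i = 1 onward.
    i = 0
    while sum(alloc) < num_procs:
        alloc[i % m] += 1
        i += 1
    return alloc


def sub_comm_color(rank: int, alloc: List[int]) -> int:
    """0-based index of the sub-communication domain that a global rank joins."""
    bounds = list(accumulate(alloc))  # cumulative process boundaries
    if not 0 <= rank < bounds[-1]:
        raise ValueError("rank out of range")
    return bisect_right(bounds, rank)
```

In an mpi4py program, the value returned by `sub_comm_color` could be passed as the `color` argument of `MPI.COMM_WORLD.Split` to create each sub_comm. Handing out the remainder from i = 1 mirrors the "$L_i = L_i + 1$" loop in the text.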
[0014] The structure array contains the number of grid cells, the number of grid nodes, the coordinates of the grid nodes, the composition of the grid cells, the boundary surfaces, and the boundary types; the structure array also contains local grid information and the connection relationship between grids in adjacent processes.
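As an illustration of the structure array in [0014], one entry per process might be modelled as below. The class and field names are hypothetical. Only the listed contents come from the text: cell and node counts, node coordinates, cell composition, boundary faces and types, and the links to adjacent processes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class SubMeshRecord:
    """One entry of the per-process structure array (field names are hypothetical)."""
    num_cells: int = 0
    num_nodes: int = 0
    node_coords: List[Point] = field(default_factory=list)            # grid node coordinates
    cell_nodes: List[Tuple[int, ...]] = field(default_factory=list)   # cell composition (node ids)
    boundary_faces: List[Tuple[int, ...]] = field(default_factory=list)
    boundary_types: List[str] = field(default_factory=list)           # one type per boundary face
    # Connectivity to meshes held by adjacent processes:
    # neighbour rank -> list of (local node id, node id on that neighbour).
    neighbor_links: Dict[int, List[Tuple[int, int]]] = field(default_factory=dict)

    def check_counts(self) -> bool:
        """True when the stored counts agree with the stored arrays."""
        return (self.num_nodes == len(self.node_coords)
                and self.num_cells == len(self.cell_nodes)
                and len(self.boundary_faces) == len(self.boundary_types))
```

Using `default_factory` gives every record its own lists, so processes never share mutable state by accident.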
[0015] The method for efficiently generating unstructured meshes at the level of tens of thousands of cores in parallel, wherein the specific process of step (2) is as follows:
[0016] The M sub-meshes are read in parallel through the M sub-communication domains (sub_comm). The mesh information read comprises $NN_i$ nodes, $NC_i$ volume-cell connectivity records, and $NF_i$ surface mesh element records, with i ∈ [1, M]. In the i-th sub-communication domain, the j-th process is assigned $NN_{sub,ij}$ node coordinates, $NC_{sub,ij}$ mesh cells, and the $NF_i$ boundary faces, where $NN_{sub,ij} = \mathrm{int}(NN_i / L_i)$ and, if …, then $NN_{sub,ij} = NN_{sub,ij} + 1$; $NC_{sub,ij}$ is assigned in the same way from $NC_i$.
[0017] Then, within each sub-communication domain sub_comm, each process determines from its process number the node numbers referenced by its mesh cells. Nodes with identical numbers in the cells along the interface between adjacent processes are compared, and from this the connectivity between the meshes stored in adjacent processes is computed. Next, in the global communication domain comm_world, the ranges of node, cell, and face numbers that the meshes stored in the different sub-communication domains occupy in the global mesh are determined.
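The even distribution in [0016] and the neighbour search in [0017] can be sketched for a single sub-communication domain. This sketch makes three assumptions. The nodes of sub-mesh i are numbered 0 to NN_i - 1. Each process owns one contiguous block. The remainder of NN_i / L_i goes to the first processes, because the text's exact increment condition is not legible. All names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def block_range(n_items: int, n_parts: int, j: int) -> Tuple[int, int]:
    """(start, count) of the j-th contiguous block, 0-based.

    Assumed rule: count = int(n_items / n_parts), plus one for the first
    n_items % n_parts blocks.
    """
    base, extra = divmod(n_items, n_parts)
    count = base + (1 if j < extra else 0)
    start = j * base + min(j, extra)
    return start, count


def owner_of(node_id: int, n_items: int, n_parts: int) -> int:
    """Rank (0-based, within the sub-communication domain) that owns node_id."""
    if not 0 <= node_id < n_items:
        raise ValueError("node id out of range")
    base, extra = divmod(n_items, n_parts)
    cut = extra * (base + 1)  # nodes held by the larger leading blocks
    if node_id < cut:
        return node_id // (base + 1)
    return extra + (node_id - cut) // base


def neighbor_nodes(cells: Iterable[Tuple[int, ...]], my_rank: int,
                   n_nodes: int, n_parts: int) -> Dict[int, Set[int]]:
    """Nodes referenced by local cells but owned elsewhere, grouped by owner rank."""
    start, count = block_range(n_nodes, n_parts, my_rank)
    links: Dict[int, Set[int]] = {}
    for cell in cells:
        for nid in cell:
            if not start <= nid < start + count:
                links.setdefault(owner_of(nid, n_nodes, n_parts), set()).add(nid)
    return links
```

For example, with 10 nodes on 3 processes the blocks hold 4, 3 and 3 nodes. A cell on process 0 that references nodes 4 and 8 therefore links process 0 to processes 1 and 2.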
[0018] The method for efficiently generating unstructured meshes at the level of tens of thousands of cores in parallel, wherein the specific process of step (3) is as follows:
[0019] (3.1) All processes started in step (1) search simultaneously and in parallel for the faces/points marked as "internal faces/points" in the current process and store this face/point information in the array $Sel_{ij}$, where i ∈ [1, M] and j ∈ [1, $L_i$]. The array $Sel_{ij}$ of the i-th communication domain is sent via MPI to all processes of the (i+1)-th communication domain. There, the coordinates $Sel_{i+1,j}$ matching $Sel_{ij}$ are searched for, and the number of matching points found, $Ndel_{i+1,j}$, is recorded;
[0020] (3.2) If the coordinates corresponding to all received $Sel_{ij}$ can be found in the (i+1)-th communication domain, delete these $Ndel_{i+1,j}$ coordinate points and the $Fdel_{i+1,j}$ boundary faces formed by them, and record the indices of the $Ndel_{i+1,j}$ points in the i-th mesh block as $Ind_{ij}$;
[0021] (3.3) If not all coordinate points corresponding to the received $Sel_{ij}$ can be found in the (i+1)-th communication domain, delete the $Ndel_{i+1,j}$ coordinate points that were found, record the resulting reduction in boundary faces, $Fdel_{i+1,j}$, and record the indices of the $Ndel_{i+1,j}$ points in the i-th mesh block as $Ind_{ij}$. Then, taking the remaining points $Nrsd_{ij}$ whose corresponding coordinates were not found as reference, delete the $NRdel_{i+1,j}$ points in the (i+1)-th communication domain whose distance to them is less than the local mesh scale, and record the resulting reduction in mesh cells, $NCRdel_{i+1,j}$. The boundary faces connected to the points $Nrsd_{ij}$, together with the new boundary faces exposed by deleting the $NRdel_{i+1,j}$ points, form the advancing front. The advancing-front method is then used to generate a local tetrahedral mesh that fills the space left by the $NCRdel_{i+1,j}$ cells removed with the $NRdel_{i+1,j}$ points. The quality of the locally generated mesh is controlled so that the Jacobian coefficient is > 0.7, and the numbers of newly added points $Nadd_{i+1,j}$, faces $Fadd_{i+1,j}$, and cells $NCadd_{i+1,j}$ are recorded.
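The classification in steps (3.1) to (3.3) can be sketched with a uniform spatial hash. An interface point of block i either coincides, within a tolerance, with a point of block i+1 (the Ndel case) or remains unmatched (Nrsd). Points of block i+1 lying closer than the local mesh scale to an unmatched point are marked for deletion (NRdel). The tolerance, the bucket-grid approach and all names are assumptions for illustration. The advancing-front refill is not shown.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Key = Tuple[int, int, int]


def _cell(p: Point, h: float) -> Key:
    return (math.floor(p[0] / h), math.floor(p[1] / h), math.floor(p[2] / h))


def _build_grid(points: Sequence[Point], h: float) -> Dict[Key, List[int]]:
    grid: Dict[Key, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        grid[_cell(p, h)].append(idx)
    return grid


def _within(grid: Dict[Key, List[int]], points: Sequence[Point],
            q: Point, h: float, r: float) -> List[int]:
    """Indices of points within distance r of q; valid because r <= h."""
    cx, cy, cz = _cell(q, h)
    hits = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for idx in grid.get((cx + dx, cy + dy, cz + dz), ()):
            if math.dist(points[idx], q) <= r:
                hits.append(idx)
    return hits


def match_interface(sel_i: Sequence[Point], side_next: Sequence[Point],
                    tol: float, local_scale: float):
    """Return (matched, nrsd, nrdel).

    matched -- {index in sel_i: index in side_next} for coincident points (Ndel)
    nrsd    -- indices in sel_i with no coincident partner (Nrsd)
    nrdel   -- indices in side_next closer than local_scale to an Nrsd point (NRdel)
    """
    h = max(tol, local_scale)  # bucket size must cover the largest search radius
    grid = _build_grid(side_next, h)
    matched: Dict[int, int] = {}
    nrsd: List[int] = []
    for a, q in enumerate(sel_i):
        hits = _within(grid, side_next, q, h, tol)
        if hits:
            matched[a] = min(hits, key=lambda b: math.dist(side_next[b], q))
        else:
            nrsd.append(a)
    used = set(matched.values())
    nrdel = set()
    for a in nrsd:
        nrdel.update(b for b in _within(grid, side_next, sel_i[a], h, local_scale)
                     if b not in used)
    return matched, nrsd, sorted(nrdel)
```

Bucketing by a cell size at least as large as the search radius means only the 27 surrounding buckets need to be scanned per query, instead of every point in block i+1.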
[0022] The method for efficiently generating unstructured meshes at the level of tens of thousands of cores in parallel, wherein the specific process of step (4) is as follows:
[0023] (4.1) After all processes started in step (1) have completed step (3), each communication domain collects and reduces the numbers of points, faces, and volumes added within it, as well as the reductions caused by deletion. The reduced quantities $Nadd_i$, $Fadd_i$, $NCadd_i$, $Ndel_i$, $Fdel_i$, and $NCdel_i$ are broadcast to all processes via MPI, which determines the numbering ranges of the k-th communication domain. Its node numbers run from … to …, its mesh cell numbers from … to …, and its boundary face numbers from … to …; when k = 1, the first node number, mesh cell number, and boundary face number are each 1;
[0024] (4.2) Based on the starting positions of the local meshes in the global mesh given in step (4.1) above, reorganize all mesh numbering information to assemble the global mesh;
[0025] (4.3) After all processes started in step (1) have completed step (4.2) above, check whether the grid information is correct.
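Step (4.1) leaves the range formulas unstated. A minimal sketch of one plausible reading rests on two assumptions. First, each domain's net count is its original count plus additions minus deletions. Second, domains are numbered consecutively from k = 1. Under these assumptions the ranges follow from an inclusive prefix sum over the broadcast totals. The same hypothetical function would be applied separately to nodes, cells and boundary faces.

```python
from itertools import accumulate
from typing import List, Tuple


def numbering_ranges(original: List[int], added: List[int],
                     deleted: List[int]) -> List[Tuple[int, int]]:
    """1-based (first, last) numbers owned by each communication domain k.

    Assumption: net_k = original_k + added_k - deleted_k, and domains are
    numbered consecutively, so domain 1 starts at 1.
    """
    net = [o + a - d for o, a, d in zip(original, added, deleted)]
    if any(n < 0 for n in net):
        raise ValueError("more entities deleted than exist")
    ends = list(accumulate(net))                      # inclusive prefix sums
    starts = [e - n + 1 for e, n in zip(ends, net)]
    return list(zip(starts, ends))
```

In an MPI code, this prefix sum would typically come from a scan collective such as `MPI_Exscan`, or from an allgather of the per-domain totals.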
[0026] In the method for efficiently generating unstructured meshes in parallel at the 10,000-core level, the global mesh in step (4.2) is assembled as follows. When a deleted point is encountered in the node numbering of a communication domain, the gap is filled sequentially by the subsequent points, and the mapping $G_{ij}(l)$ from the original number l to the assembled number is recorded. The mappings $Ind_{ij}$ and $G_{ij}(l)$ each cover only the local process's part, so they must be redefined as mappings valid across the whole sub-communication domain, $Ind_i$ and $G_i(l)$. The reduction takes the union of the sets $Ind_{ij}$ in order of j from 1 to $L_i$ to obtain $Ind_i$, and $G_i(l)$ is obtained in the same way. In the surface and volume meshes, references to deleted points are adjusted according to $Ind_i$, and references to retained points are adjusted according to the mapping $G_i(l)$. An improved hash-table method is then applied: the data to be compared and the hash table are dynamically divided into blocks whose size is determined by a memory function from the remaining available memory. This avoids memory overflow and makes it possible to generate meshes of tens of billions of cells.
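The renumbering in [0026] can be illustrated on a single block. The sketch below uses hypothetical names and 0-based local numbering, and performs three operations. Surviving nodes are shifted down to close the gaps left by deleted nodes, which gives G. Per-process Ind maps are merged in order of j. Element connectivity is rewritten so that deleted nodes point to their coincident partners in block i, while retained nodes follow G.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def compact_numbering(num_nodes: int, deleted: Iterable[int], offset: int) -> Dict[int, int]:
    """G(l): map each surviving original number l to its new global number.

    Survivors are shifted down to fill gaps left by deleted points, then
    offset by the first global number assigned to this communication domain.
    """
    gone = set(deleted)
    g: Dict[int, int] = {}
    nxt = offset
    for l in range(num_nodes):
        if l not in gone:
            g[l] = nxt
            nxt += 1
    return g


def union_maps(parts: Sequence[Dict[int, int]]) -> Dict[int, int]:
    """Merge per-process maps (j = 1 .. L_i, in order) into one domain-wide map."""
    merged: Dict[int, int] = {}
    for part in parts:
        merged.update(part)
    return merged


def remap_elements(elements: Iterable[Tuple[int, ...]], g: Dict[int, int],
                   ind: Dict[int, int]) -> List[Tuple[int, ...]]:
    """Deleted nodes are redirected through Ind (to the coincident node of
    block i); surviving nodes are renumbered through G."""
    return [tuple(ind[n] if n in ind else g[n] for n in elem) for elem in elements]
```

For instance, deleting local nodes 1 and 3 from a five-node block that starts at global number 100 maps nodes 0, 2 and 4 to 100, 101 and 102.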
[0027] The method for efficiently generating unstructured meshes at the 10,000-core level in parallel, wherein the method for checking the correctness of the mesh information in step (4.3) is as follows:
[0028] ① The maximum values of the grid point, surface, and volume numbers in the k-th communication domain should be respectively:
[0029] ② The number of nodes in each surface and volume mesh element should be consistent with the original mesh, with at most 4 and 8 nodes, respectively;
[0030] ③ Every boundary surface mesh element must have a unique corresponding volume mesh element; if an error occurs, the current process outputs the error messages corresponding to ①-③, and then the current process sends an exit command to all other processes.
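Checks ② and ③ can be sketched as follows, under three assumptions. Faces and cells are tuples of global node numbers. A face "belongs" to a cell when all of its nodes are nodes of that cell, which is exact for tetrahedra and a simplification for hexahedra and prisms. The comparison with the original mesh in ② is reduced to the bounds of 3 to 4 nodes per face and 4 to 8 nodes per cell. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def check_mesh(faces: Sequence[Tuple[int, ...]], cells: Sequence[Tuple[int, ...]],
               first_node: int, last_node: int) -> List[str]:
    """Return error messages for checks (2) and (3); an empty list means pass."""
    errors: List[str] = []
    # Check (2): element sizes and node-number range.
    for f_id, face in enumerate(faces):
        if not 3 <= len(face) <= 4:            # triangle or quadrilateral
            errors.append(f"face {f_id}: {len(face)} nodes")
        for n in face:
            if not first_node <= n <= last_node:
                errors.append(f"face {f_id}: node {n} out of range")
    for c_id, cell in enumerate(cells):
        if not 4 <= len(cell) <= 8:            # tetrahedron up to hexahedron
            errors.append(f"cell {c_id}: {len(cell)} nodes")
        for n in cell:
            if not first_node <= n <= last_node:
                errors.append(f"cell {c_id}: node {n} out of range")
    # Check (3): each boundary face must belong to exactly one cell.
    cells_of_node: Dict[int, Set[int]] = defaultdict(set)
    for c_id, cell in enumerate(cells):
        for n in cell:
            cells_of_node[n].add(c_id)
    for f_id, face in enumerate(faces):
        if not face:
            continue  # already reported by check (2)
        owners = set.intersection(*(cells_of_node.get(n, set()) for n in face))
        if len(owners) != 1:
            errors.append(f"face {f_id}: {len(owners)} owning cells")
    return errors
```

Intersecting the per-node cell sets avoids comparing every face against every cell. On an error, the calling process would report these messages and signal the other processes to exit, as described in ③.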
[0031] The method for efficiently generating unstructured meshes at the level of tens of thousands of cores in parallel, wherein the specific process of step (5) is as follows:
[0032] (5.1) The i-th sub-mesh contains $Nbc_i$ boundaries and $NF_i - Fdel_i$ boundary face elements. All boundary face elements are categorized and organized by attribute: "wall," "far field," "symmetry plane," "periodic boundary," "velocity inlet," "velocity outlet," and "slip surface." The boundary attribute types are then reduced and counted to obtain the global number of boundary types, $Nbc_{global}$, and their type names, which are sent to all processes started in step (1). Every process classifies its boundary faces by attribute, labels the attributes with the numbers 1 to $Nbc_{global}$, and associates each number with the corresponding attribute type name. Boundary faces with the same attribute are then organized into $Nbc_{global}$ boundary surfaces, whose geometry is used to reconstruct the boundary mesh;
[0033] (5.2) Check the correctness of the reconstructed boundary face mesh; if the mesh information is correct, the current process writes the generated mesh with parallel I/O at the file position corresponding to its process ID and releases all memory.
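The attribute reduction in (5.1) amounts to building one global table of boundary type names and grouping faces by label. The following sketch assumes that labels are assigned in order of first appearance across the sub-meshes, since the text does not say how the numbers 1 to $Nbc_{global}$ are ordered. All names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def global_boundary_ids(per_block_types: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Reduce the boundary type names of all sub-meshes to labels 1..Nbc_global.

    Assumption: labels follow first appearance in block order.
    """
    ids: Dict[str, int] = {}
    for names in per_block_types:
        for name in names:
            if name not in ids:
                ids[name] = len(ids) + 1
    return ids


def group_faces(faces: Sequence[Tuple[int, ...]], face_types: Sequence[str],
                ids: Dict[str, int]) -> Dict[int, List[Tuple[int, ...]]]:
    """Collect faces with the same attribute into one boundary surface per label."""
    groups: Dict[int, List[Tuple[int, ...]]] = {label: [] for label in ids.values()}
    for face, name in zip(faces, face_types):
        groups[ids[name]].append(face)
    return groups
```

Because every process receives the same global table, the same attribute name gets the same label on every process, which keeps the reconstructed boundary surfaces consistent.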
[0034] In the method for efficiently generating unstructured meshes of tens of billions of cells in parallel, the correctness of the reconstructed boundary face mesh in step (5.2) is checked as follows:
[0035] ① Boundary surface attribute check
[0036] Traverse all boundary faces and verify that each face's predefined boundary condition identifier field is non-empty, unique, and within the set of valid values; a boundary face that meets all three conditions is correct.
[0037] ② Duplicate-face check: sort the global node numbers of each boundary face to generate a unique signature, and use a hash table to check quickly for duplicates within the process. A signature that maps to only one boundary face is correct, while a signature that maps to two or more boundary faces indicates a duplicate error.
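Both checks in [0035] to [0037] fit in a few lines. In this sketch, which uses hypothetical names, each face carries a list of identifiers so that "empty" and "non-unique" can both be detected. The valid set is the seven attributes listed in (5.1), and duplicates are found by keying a hash table on the sorted node tuple.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

VALID_BOUNDARY_TYPES = frozenset({
    "wall", "far field", "symmetry plane", "periodic boundary",
    "velocity inlet", "velocity outlet", "slip surface",
})


def check_boundary_faces(faces: Sequence[Tuple[int, ...]],
                         identifiers: Sequence[Sequence[str]]):
    """Return (attribute_errors, duplicates) for one process's boundary faces."""
    attribute_errors: List[str] = []
    for f_id, names in enumerate(identifiers):
        if len(names) == 0:
            attribute_errors.append(f"face {f_id}: empty boundary identifier")
        elif len(set(names)) > 1:
            attribute_errors.append(f"face {f_id}: conflicting identifiers {sorted(set(names))}")
        elif names[0] not in VALID_BOUNDARY_TYPES:
            attribute_errors.append(f"face {f_id}: invalid identifier {names[0]!r}")
    # Signature = sorted global node numbers, so node order on the face is irrelevant.
    by_signature: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for f_id, face in enumerate(faces):
        by_signature[tuple(sorted(face))].append(f_id)
    duplicates = {sig: ids for sig, ids in by_signature.items() if len(ids) > 1}
    return attribute_errors, duplicates
```

Sorting the node numbers makes two copies of the same face collide in the hash table even when their nodes are listed in a different order or orientation.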
[0038] By adopting the above technical solution, the present invention has the following beneficial effects:
[0039] This invention provides a method for efficiently generating unstructured meshes with tens of billions of cells/nodes in parallel on tens of thousands of cores. Through innovative distributed data organization, storage, and operation methods, it overcomes the memory and computing-power limits of a single machine. It enables efficient parallel computing at the ten-thousand-core level on domestic platforms to generate industrial-grade ultra-large-scale unstructured meshes of tens of billions of cells/nodes or more. The built-in mesh correctness mechanism includes verification of mesh node numbers and their upper limits, checks on cell composition and node-number ranges, verification that boundary face attributes are consistent and uniquely assigned, and duplicate-face detection. Errors can be located quickly and accurately down to the partition ID, ensuring both the topological and semantic correctness of the output mesh. For parallel I/O and resource management, a parallel read/write strategy based on sub-communication domains is adopted, which effectively improves throughput and reduces file read/write time.
[0040] This invention removes the requirement of traditional mesh stitching that coordinate points match strictly. Using local dynamic mesh reconstruction, it intelligently fills mismatched nodes/faces/volumes, ensures local mesh quality with a Jacobian coefficient > 0.7, and preserves the topological closure of the mesh. It automatically handles non-strict matching situations such as overlaps, gaps, and misalignments, avoiding manual repairs and repeated iterations and significantly reducing manual intervention.
[0041] This invention overcomes the communication bottleneck of highly concurrent mesh generation through dynamic load balancing across tens of thousands of cores combined with hierarchical and group communication. Dynamic domain decomposition transforms the global mesh matching problem into a collaborative solution over multiple levels of subdomains, greatly increasing mesh stitching speed and testing efficiency, as shown in Figure 4.
[0042] Within this parallel computing architecture, memory management is optimized by dynamically partitioning the comparison list in real time according to the remaining available memory. This prevents the memory overflow caused by expanding the comparison list too much at once. It overcomes the technical bottleneck of generating unstructured meshes of tens of billions of cells for complex geometries and achieves efficient generation of such meshes on a domestic supercomputing platform. Tests show that 30 billion mesh cells can be generated within a few hours, more than an order of magnitude faster than traditional methods.
Attached Figure Description
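The memory-adaptive comparison described in [0026] and [0042] resembles a block hash join. Only one block of the reference list is hashed at a time, and the block size is recomputed from the currently free memory before each block. The sketch below assumes a hypothetical memory function that uses half of the reported free bytes. In a real program, the callable might wrap `psutil.virtual_memory().available`.

```python
from typing import Callable, Hashable, List, Sequence, Set


def block_size_from_memory(available_bytes: int, bytes_per_entry: int,
                           safety: float = 0.5) -> int:
    """Hypothetical memory function: entries that fit in a fraction of free memory."""
    return max(1, int(available_bytes * safety) // bytes_per_entry)


def blocked_hash_match(reference: Sequence[Hashable], queries: Sequence[Hashable],
                       available_bytes: Callable[[], int],
                       bytes_per_entry: int) -> List[int]:
    """Indices of queries present in reference, hashing one block at a time."""
    found: Set[int] = set()
    start = 0
    while start < len(reference):
        # Re-query free memory before every block, so the block size adapts.
        size = block_size_from_memory(available_bytes(), bytes_per_entry)
        table = set(reference[start:start + size])  # hash table for this block only
        for q_idx, key in enumerate(queries):
            if q_idx not in found and key in table:
                found.add(q_idx)
        start += size
    return sorted(found)
```

The trade-off is extra passes over the query list when memory is tight, in exchange for a hash table that never grows beyond what the current free memory allows.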
[0043] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.
[0044] Figure 1 is a schematic diagram of the structure array used in the method of this invention for efficiently generating unstructured meshes of tens of billions of cells in parallel on tens of thousands of cores;
[0045] Figure 2 is a schematic diagram of the interface mesh stitching and local reconstruction in the method of this invention for generating unstructured meshes in parallel at the ten-thousand-core level;
[0046] Figure 3 is a schematic diagram of the mesh reading and the MPI hierarchical communication between nodes in the method of this invention for efficiently generating unstructured meshes of tens of billions of cells in parallel at the ten-thousand-core level;
[0047] Figure 4 is a speed comparison between the method of this invention for generating unstructured meshes in parallel at the ten-thousand-core level and the traditional classical "file read/write + search calculation" method (the horizontal axis is the number of mesh cells; the vertical axis is the computation time in hours);
[0048] Figure 5 is a schematic diagram of the partitioning of the standard SUBOFF configuration for generating unstructured meshes in parallel at the ten-thousand-core level (each diagram shows three nodes).
Detailed Implementation
[0049] The technical solution of the present invention will now be clearly and completely described with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0050] The present invention will be further explained below with reference to specific embodiments.
[0051] This embodiment provides a method for efficiently generating unstructured meshes of tens of billions of cells in parallel, which mainly includes the following steps:
[0052] (1) For mesh generation with a heavy communication load, construct an MPI hierarchical and group communication architecture based on the total mesh size and the number of blocks, achieving parallelism at the level of 100,000 cores. Establish a structure array matching this architecture, as shown in Figure 1, in order to optimize data layout and improve communication efficiency;
[0053] The specific process of constructing the MPI hierarchical and group communication architecture is as follows:
[0054] Initialize the global communication domain comm_world and start P parallel MPI processes. The main process reads, in sequence, the cell counts $NC_1, \ldots, NC_M$ of the M small-scale meshes and calculates the total cell count $NC_{global} = \sum_{i=1}^{M} NC_i$. The global communication domain is divided into M sub-communication domains (sub_comm), establishing a hierarchical communication mode between the global communication domain and the sub-communication domains (as shown in Figure 3) and a group communication mode among the M sub-communication domains. The i-th sub-communication domain contains $L_i$ local processes; if …, then for i = 1 to … execute $L_i = L_i + 1$, so that every process handles almost the same amount of data, which improves parallel efficiency;
[0055] To store the mesh information other than the number of mesh cells, a structure array is created containing the node coordinates, the cell composition, the boundary faces, and the boundary types; the structure array also contains the local mesh information and the connectivity between the meshes held by adjacent processes.
[0056] (2) All processes read M blocks of unstructured grid data in parallel, distribute them evenly across the memory of P processes, and establish grid connections between adjacent processes, thereby overcoming the single-node memory limitation and achieving parallel processing across tens of thousands of cores; the specific process is as follows:
[0057] The M sub-meshes are read in parallel through the M sub-communication domains (sub_comm). The mesh information read comprises $NN_i$ nodes, $NC_i$ volume-cell connectivity records, and $NF_i$ surface mesh element records, with i ∈ [1, M]. In the i-th sub-communication domain, the j-th process is assigned $NN_{sub,ij}$ node coordinates, $NC_{sub,ij}$ mesh cells, and the $NF_i$ boundary faces, where $NN_{sub,ij} = \mathrm{int}(NN_i / L_i)$ and, if …, then $NN_{sub,ij} = NN_{sub,ij} + 1$; $NC_{sub,ij}$ is assigned in the same way from $NC_i$.
[0058] Then, within each sub-communication domain sub_comm, each process determines from its process number the node numbers referenced by its mesh cells. Nodes with identical numbers in the cells along the interface between adjacent processes are compared, and from this the connectivity between the meshes stored in adjacent processes is computed. Next, in the global communication domain comm_world, the ranges of node, cell, and face numbers that the meshes stored in the different sub-communication domains occupy in the global mesh are determined.
[0059] (3) Efficiently and adaptively match the sub-meshes at the interfaces between the M unstructured mesh blocks, as shown in Figure 2. If the coordinate points on both sides of an interface coincide, the mesh is replaced directly; if they do not coincide, the mesh on one side is deleted and a new mesh is regenerated locally to achieve adaptive matching. The specific process is as follows:
[0060] (3.1) All processes started in step (1) search simultaneously and in parallel for the faces/points marked as "internal faces/points" in the current process and store this face/point information in the array $Sel_{ij}$, where i ∈ [1, M] and j ∈ [1, $L_i$]. The array $Sel_{ij}$ of the i-th communication domain is sent via MPI to all processes of the (i+1)-th communication domain. There, the coordinates $Sel_{i+1,j}$ matching $Sel_{ij}$ are searched for, and the number of matching points found, $Ndel_{i+1,j}$, is recorded.
[0061] (3.2) If the coordinates corresponding to all received $Sel_{ij}$ can be found in the (i+1)-th communication domain, delete these $Ndel_{i+1,j}$ coordinate points and the $Fdel_{i+1,j}$ boundary faces formed by them, and record the indices of the $Ndel_{i+1,j}$ points in the i-th mesh block as $Ind_{ij}$.
[0062] (3.3) If not all coordinate points corresponding to the received $Sel_{ij}$ can be found in the (i+1)-th communication domain, delete the $Ndel_{i+1,j}$ coordinate points that were found, record the resulting reduction in boundary faces, $Fdel_{i+1,j}$, and record the indices of the $Ndel_{i+1,j}$ points in the i-th mesh block as $Ind_{ij}$. Then, taking the remaining points $Nrsd_{ij}$ whose corresponding coordinates were not found as reference, delete the $NRdel_{i+1,j}$ points in the (i+1)-th communication domain whose distance to them is less than the local mesh scale, and record the resulting reduction in mesh cells, $NCRdel_{i+1,j}$. The boundary faces connected to the points $Nrsd_{ij}$, together with the new boundary faces exposed by deleting the $NRdel_{i+1,j}$ points, form the advancing front. The advancing-front method is then used to generate a local tetrahedral mesh that fills the space left by the $NCRdel_{i+1,j}$ cells removed with the $NRdel_{i+1,j}$ points. The quality of the locally generated mesh is controlled so that the Jacobian coefficient is > 0.7, and the numbers of newly added points $Nadd_{i+1,j}$, faces $Fadd_{i+1,j}$, and cells $NCadd_{i+1,j}$ are recorded.
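The patent requires a Jacobian coefficient above 0.7 for the locally regenerated tetrahedra but does not define the metric. This sketch assumes one common choice, the scaled Jacobian. At each corner, the signed volume determinant is divided by the product of the three edge lengths meeting there, and the result is normalised so that a regular tetrahedron scores 1. The element's value is the minimum over its four corners. Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _det3(u: Point, v: Point, w: Point) -> float:
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def tet_scaled_jacobian(p: Sequence[Point]) -> float:
    """Minimum corner scaled Jacobian of the tetrahedron p[0..3].

    For a consistently ordered tet every corner's Jacobian determinant equals
    the same signed value 6V, so each corner's score is 6V divided by the product
    of that corner's three edge lengths, times sqrt(2) (regular tet -> 1).
    """
    j = _det3(_sub(p[1], p[0]), _sub(p[2], p[0]), _sub(p[3], p[0]))  # signed 6V
    lengths = {frozenset(e): math.dist(p[e[0]], p[e[1]])
               for e in combinations(range(4), 2)}
    worst = math.inf
    for k in range(4):
        prod = 1.0
        for m in range(4):
            if m != k:
                prod *= lengths[frozenset((k, m))]
        if prod == 0.0:
            return 0.0  # collapsed edge: degenerate element
        worst = min(worst, math.sqrt(2.0) * j / prod)
    return worst


def passes_quality(p: Sequence[Point], threshold: float = 0.7) -> bool:
    """Acceptance test for a locally regenerated tetrahedron."""
    return tet_scaled_jacobian(p) > threshold
```

Under this definition, a right-angled corner tetrahedron scores about 0.707, just above the 0.7 threshold, while flat elements score 0 and inverted elements score below 0.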
[0063] (4) Efficiently and in parallel adjust the numbering of point, surface, and volume mesh elements in the global mesh, reconstruct the connection relationships of surface and volume elements in the global mesh, and check and ensure the correctness of the mesh information; the specific process is as follows:
[0064] (4.1) After all processes started in step (1) have completed step (3), each communication domain collects and reduces the numbers of points, faces, and volumes added within it, as well as the reductions caused by deletion. The reduced quantities $Nadd_i$, $Fadd_i$, $NCadd_i$, $Ndel_i$, $Fdel_i$, and $NCdel_i$ are broadcast to all processes via MPI, which determines the numbering ranges of the k-th communication domain. Its node numbers run from … to …, its mesh cell numbers from … to …, and its boundary face numbers from … to …; when k = 1, the first node number, mesh cell number, and boundary face number are each 1.
[0065] (4.2) Based on the starting positions of the local meshes in the global mesh given in step (4.1) above, reorganize all mesh numbering information to assemble the global mesh. The global mesh is assembled as follows. When a deleted point is encountered in the node numbering of a communication domain, the gap is filled sequentially by the subsequent points, and the mapping $G_{ij}(l)$ from the original number l to the assembled number is recorded. The mappings $Ind_{ij}$ and $G_{ij}(l)$ each cover only the local process's part, so they must be redefined as mappings valid across the whole sub-communication domain, $Ind_i$ and $G_i(l)$. The reduction takes the union of the sets $Ind_{ij}$ in order of j from 1 to $L_i$ to obtain $Ind_i$, and $G_i(l)$ is obtained in the same way. In the surface and volume meshes, references to deleted points are adjusted according to $Ind_i$, and references to retained points are adjusted according to the mapping $G_i(l)$. An improved hash-table method is then applied: the data to be compared and the hash table are dynamically divided into blocks whose size is determined by a memory function from the remaining available memory. This avoids memory overflow and makes it possible to generate meshes of tens of billions of cells.
[0066] (4.3) After all processes started in step (1) complete step (4.2) above, check whether the grid information is correct; the method for checking whether the grid information is correct is as follows:
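[0067] ① The maximum point, surface, and volume numbers in the k-th communication domain should be $\sum_{i=1}^{k}(NN_i+Nadd_i-Ndel_i)$, $\sum_{i=1}^{k}(NF_i+Fadd_i-Fdel_i)$ and $\sum_{i=1}^{k}(NC_i+NCadd_i-NCdel_i)$, respectively;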
[0068] ② The number of nodes in each surface and volume mesh element should be consistent with the original mesh, with at most 4 nodes per surface element and at most 8 nodes per volume element;
[0069] ③ Every boundary surface mesh element must have a unique corresponding volume mesh element; if an error occurs, the current process outputs the error messages corresponding to ①-③, and then the current process sends an exit command to all other processes.
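The three checks of step (4.3) are straightforward to express on a domain's local data. A minimal Python sketch, assuming elements are stored as dictionaries keyed by their global number and that a face-to-cell adjacency map is available; in the parallel code, a non-empty error list would trigger the exit command to the other processes, which is not shown. The function name and data layout are hypothetical.

```python
def check_mesh(faces, cells, face_to_cells, expected_max):
    """Run checks (1)-(3) of step (4.3) on one domain; return error messages.

    faces, cells: dict element number -> tuple of global node numbers.
    face_to_cells: dict boundary face number -> list of adjacent volume cells.
    expected_max: expected (max node, max face, max cell) numbers for the domain.
    """
    errors = []
    # (1) Largest node, face and cell numbers must equal the expected range ends.
    all_nodes = [n for elem in (*faces.values(), *cells.values()) for n in elem]
    actual = (max(all_nodes, default=0), max(faces, default=0), max(cells, default=0))
    if actual != tuple(expected_max):
        errors.append(f"(1) maxima {actual} != expected {tuple(expected_max)}")
    # (2) At most 4 nodes per surface element and 8 per volume element
    #     (the lower bounds 3 and 4 are an extra sanity check added here).
    for fid, nodes in faces.items():
        if not 3 <= len(nodes) <= 4:
            errors.append(f"(2) face {fid} has {len(nodes)} nodes")
    for cid, nodes in cells.items():
        if not 4 <= len(nodes) <= 8:
            errors.append(f"(2) cell {cid} has {len(nodes)} nodes")
    # (3) Each boundary face must belong to exactly one volume element.
    for fid in faces:
        owners = face_to_cells.get(fid, [])
        if len(owners) != 1:
            errors.append(f"(3) face {fid} has {len(owners)} adjacent cells")
    return errors


faces = {1: (1, 2, 3), 2: (1, 2, 4)}
cells = {1: (1, 2, 3, 4)}
print(check_mesh(faces, cells, {1: [1], 2: [1]}, (4, 2, 1)))  # → []
```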
[0070] (5) Analyze the global mesh boundary, reconstruct the boundary information of the global mesh, and check it; the specific process is as follows:
[0071] (5.1) The i-th sub-mesh contains Nbc_i boundaries and NF_i - Fdel_i boundary surface elements. All boundary surface elements are categorized by attribute: "wall", "far field", "symmetry plane", "periodic boundary", "velocity inlet", "velocity outlet", and "slip surface". The boundary surface attribute types are then reduced across all sub-meshes to obtain their number, Nbc_global, and their type names, which are sent to all processes started in step (1). All processes classify the boundary surfaces with the same attribute and label the attributes with the numbers 1 to Nbc_global, each number corresponding to one attribute type name; the boundary surfaces with the same attribute are then grouped into Nbc_global boundaries, whose geometry is used to reconstruct the boundary mesh;
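Step (5.1) amounts to gathering the attribute names from all sub-meshes, assigning each distinct name a number from 1 to Nbc_global, and grouping faces by that number. A Python sketch, assuming face numbers are already global and that sorting the names is an acceptable way to make every process derive the same numbering (the source does not say how the numbers are assigned); `label_boundary_attributes` is a hypothetical name.

```python
def label_boundary_attributes(per_domain_faces):
    """Number the boundary attribute types 1..Nbc_global and group faces by type.

    per_domain_faces: one list per sub-mesh of (global face number, attribute name).
    Returns (names, groups): names[k - 1] is the name of attribute k, and
    groups[k] lists the faces carrying attribute k.
    """
    # Reduction step: the distinct attribute names over all sub-meshes.
    # Sorting gives every process the same deterministic numbering.
    names = sorted({attr for faces in per_domain_faces for _, attr in faces})
    label = {name: k for k, name in enumerate(names, start=1)}
    groups = {k: [] for k in label.values()}
    for faces in per_domain_faces:
        for face_id, attr in faces:
            groups[label[attr]].append(face_id)
    return names, groups


names, groups = label_boundary_attributes(
    [[(1, "wall"), (2, "far field")], [(3, "wall"), (4, "symmetry plane")]]
)
print(names)   # → ['far field', 'symmetry plane', 'wall']
print(groups)  # → {1: [2], 2: [4], 3: [1, 3]}
```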
[0072] (5.2) Check the correctness of the reconstructed boundary surface mesh; if the mesh information is correct, each process writes the generated mesh using parallel I/O at the file position corresponding to its process ID and then releases all memory. The correctness of the reconstructed boundary surface mesh is checked as follows:
[0073] ① Boundary surface attribute check
[0074] Traverse all boundary surfaces and verify that each one's predefined boundary condition identifier field is non-empty, unique, and within the set of valid values; a boundary surface that meets all three conditions is correct. The set of valid values comprises "wall", "far field", "symmetry surface", "periodic boundary", "velocity inlet", "velocity outlet", "slip surface", and "internal surface".
[0075] ② After sorting the global node numbers of each boundary surface, a unique signature is generated, and a hash table is used to quickly check for duplicates within the process; if a signature maps to only one boundary surface, then the boundary surface is correct, and if a signature maps to two or more, it is a duplicate error.
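Both boundary checks can be sketched in Python as follows, assuming each boundary face carries a single attribute string (so uniqueness of the identifier is guaranteed by the data layout) and a tuple of global node numbers. `check_boundary_faces` and `VALID_ATTRIBUTES` are hypothetical names, with the valid set copied from [0074]. Sorting the node numbers makes the signature independent of the starting node and orientation, so two copies of the same face collide in the hash table.

```python
from collections import defaultdict

VALID_ATTRIBUTES = {
    "wall", "far field", "symmetry surface", "periodic boundary",
    "velocity inlet", "velocity outlet", "slip surface", "internal surface",
}


def check_boundary_faces(faces):
    """faces: dict face number -> (attribute, tuple of global node numbers)."""
    errors = []
    by_signature = defaultdict(list)  # hash table: signature -> face numbers
    for face_id, (attr, nodes) in faces.items():
        # Check (1): the attribute must be present and in the valid set.
        if not attr or attr not in VALID_ATTRIBUTES:
            errors.append(f"face {face_id}: invalid attribute {attr!r}")
        # Check (2): signature = sorted global node numbers.
        by_signature[tuple(sorted(nodes))].append(face_id)
    for signature, ids in by_signature.items():
        if len(ids) > 1:
            errors.append(f"duplicate faces {ids} with nodes {signature}")
    return errors


faces = {1: ("wall", (3, 1, 2)), 2: ("wall", (1, 2, 3)), 3: ("", (4, 5, 6))}
for message in check_boundary_faces(faces):
    print(message)
```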
[0076] As an embodiment, a CFD numerical simulation mesh of approximately 2.1 billion cells is to be generated for a typical standard submarine model with appendages (the appended SUBOFF model). The specific implementation steps are as follows:
[0077] 1) Define the computational domain. Set up a rectangular computational domain measuring 80L × 60L × 60L (L is the model length) and roughly divide it into M = 20 sub-regions. Generate a structured or unstructured mesh in each sub-region, ensuring that adjacent sub-regions have roughly the same number of mesh nodes on their shared boundaries.
[0078] 2) Start the parallel program and read the control parameters. Launch P = 25,000 MPI processes on the supercomputing platform under the global communication domain comm_world. The main process then reads the parameter control file, which specifies the input file path, file name/file type, number of groups (i.e., number of sub-regions), output mesh name, output mesh path, mesh interface matching precision, etc.
[0079] 3) Allocate mesh data and optimize the computational load. The main program reads the number of mesh cells NC_i of each sub-region according to its path and file name and calculates the total number of cells NC_global. Based on the sub-mesh information, M = 20 sub-communication domains (sub_comm) are created dynamically, and the P = 25,000 processes are allocated to them using the sub-region cell counts NC_i as weights. The local storage arrays (e.g., ...) are then initialized. As shown in Figure 1, all processes read the corresponding sub-region meshes by group, and each process reads roughly the same total number of cells, which balances the process load within each sub-domain. Free edges/faces are then compared within each sub-communication domain to obtain the connectivity of the meshes read by all local processes within the sub-region mesh and their connectivity with adjacent meshes; global communication then yields the numbering and connectivity of each sub-domain mesh within the global mesh.
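The source states only that processes are allocated with the sub-region cell counts as weights and that each process within a domain reads roughly the same number of cells; the exact allocation formula is not given here. A minimal Python sketch, assuming largest-remainder rounding in integer arithmetic with at least one process per domain (a choice made for this illustration); `allocate_processes` and `split_evenly` are hypothetical names.

```python
def allocate_processes(cell_counts, n_procs):
    """Split n_procs processes among sub-communication domains, weighted by cell count.

    Largest-remainder rounding in exact integer arithmetic; every domain
    receives at least one process.
    """
    m = len(cell_counts)
    if n_procs < m:
        raise ValueError("need at least one process per sub-region")
    total = sum(cell_counts)
    spare = n_procs - m  # processes left after reserving one per domain
    quotas = [divmod(spare * c, total) for c in cell_counts]
    alloc = [1 + q for q, _ in quotas]
    # Give the processes still unassigned to the largest fractional remainders.
    leftover = n_procs - sum(alloc)
    order = sorted(range(m), key=lambda i: quotas[i][1], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def split_evenly(n_items, n_parts):
    """Near-equal split of n_items cells over the n_parts processes of one domain."""
    base, extra = divmod(n_items, n_parts)
    return [base + (1 if j < extra else 0) for j in range(n_parts)]


print(allocate_processes([3, 1], 10))  # → [7, 3]
print(split_evenly(10, 3))             # → [4, 3, 3]
```

Integer `divmod` avoids floating-point rounding, so the allocation always sums exactly to P even for P = 25,000 and M = 20.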
[0080] 4) Locate and adaptively match the interfaces. The interfaces are located by the keyword "Interface_XXX" written into the mesh boundary conditions, and the interface information is used to determine the FPC vertices of each interface and the volume mesh elements connected to it. Each sub-communication domain collects the number of interfaces and their vertex coordinates and sends them to the next sub-communication domain, which holds the sub-region adjacent to the current one. The receiving domain compares the received interface vertex coordinates with its local interface vertex coordinates. If every received vertex has a corresponding boundary vertex in the current sub-communication domain, all corresponding vertices and the interfaces they form are deleted from the current sub-communication domain. If some coordinates are unmatched, the matched points and boundary surfaces are deleted first; then, taking the remaining unmatched points as reference, the points in the adjacent domain that lie closer to them than the local mesh scale are deleted. Finally, the remaining boundary surfaces serve as the front from which a tetrahedral mesh is generated locally by the advancing-front method, ensuring mesh quality.
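The source does not specify how received and local vertex coordinates are compared. One common approach, swapped in here for illustration, is a uniform grid hash whose cell size equals the matching tolerance (assumed to be the "mesh interface matching precision" from the control file): any partner within the tolerance then lies in one of the 27 neighbouring cells. `match_interface_vertices` is a hypothetical name, and the local deletion and advancing-front regeneration steps are not shown.

```python
import math


def match_interface_vertices(received, local, tol):
    """Match received interface vertices to local ones within distance tol.

    received, local: dict vertex id -> (x, y, z).
    Returns (matches, unmatched): matches maps a received id to the closest
    local id within tol; unmatched lists received ids without a partner.
    """
    def cell(p):
        return tuple(math.floor(c / tol) for c in p)

    # Hash the local vertices into grid cells of edge length tol.
    grid = {}
    for lid, p in local.items():
        grid.setdefault(cell(p), []).append(lid)

    matches, unmatched = {}, []
    for rid, p in received.items():
        cx, cy, cz = cell(p)
        best, best_d = None, tol
        # A partner within tol can only sit in this cell or one of its neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for lid in grid.get((cx + dx, cy + dy, cz + dz), []):
                        d = math.dist(p, local[lid])
                        if d <= best_d:
                            best, best_d = lid, d
        if best is None:
            unmatched.append(rid)
        else:
            matches[rid] = best
    return matches, unmatched


local = {10: (0.0, 0.0, 0.0), 11: (1.0, 0.0, 0.0)}
received = {1: (1e-9, 0.0, 0.0), 2: (0.5, 0.5, 0.0)}
print(match_interface_vertices(received, local, 1e-6))  # → ({1: 10}, [2])
```

The unmatched ids returned here correspond to the points that, in the method described above, serve as the reference for deleting nearby points and regenerating the mesh locally.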
[0081] 5) Reconstruct the global node, face, and cell numbering. After all processes have completed interface matching, each sub-communication domain counts the added and deleted points, faces, and volumes and broadcasts these counts to all processes via MPI. From the counts, the global numbering range and the node and cell numbering ranges of each sub-communication domain are determined. The mapping G_ij(l) from each original number l to its global number is then established, and the mapping tables are merged into Ind_i and G_i(l) by a reduction within each sub-domain. The matching search uses an improved hash table method that dynamically divides the data to be compared and the hash table into blocks, with the block size set according to the remaining available memory to avoid memory overflow. Finally, the mesh information is checked for correctness, including the numbering ranges, the number of nodes per element, and the correspondence between boundary surfaces and volume elements.
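The source gives no details of the improved hash table beyond the blocking rule. The Python sketch below shows one way such blocked matching might work, assuming keys are sorted node tuples (face signatures) mapped to face numbers. `block_size_for` stands in for the unspecified memory function, and its 50% safety factor is an arbitrary illustrative choice; both function names are hypothetical.

```python
def block_size_for(available_bytes, bytes_per_entry, safety=0.5):
    """Hypothetical memory function: entries that fit in a fraction of free memory."""
    return max(1, int(available_bytes * safety) // bytes_per_entry)


def blocked_hash_match(queries, table_items, block_size):
    """Look up each query key in a large key -> value table, block by block.

    Both the queries and the table are cut into blocks of block_size, so at
    most one table block is held in a hash table at a time.
    """
    result = {}
    for q0 in range(0, len(queries), block_size):
        pending = set(queries[q0:q0 + block_size])
        for t0 in range(0, len(table_items), block_size):
            if not pending:
                break
            block = dict(table_items[t0:t0 + block_size])  # hash only this block
            found = pending.intersection(block)
            for key in found:
                result[key] = block[key]
            pending -= found
    return result


table = [((1, 2, 3), 101), ((2, 3, 4), 102), ((3, 4, 5), 103), ((4, 5, 6), 104)]
matched = blocked_hash_match([(3, 4, 5), (1, 2, 3), (9, 9, 9)], table, block_size=2)
print(sorted(matched.items()))  # → [((1, 2, 3), 101), ((3, 4, 5), 103)]
```

Rebuilding each table block for every query block trades repeated hashing work for a hard bound on peak memory, which is the property the blocking rule is meant to guarantee.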
[0082] 6) Analyze and reconstruct the boundaries of the global mesh, and output the generated mesh. Boundary surfaces are classified by attribute (e.g., wall, far field, symmetry plane), the number of attribute types Nbc_global is counted, and each attribute type is labeled with a number. Boundary surfaces with the same attribute are grouped into Nbc_global boundaries, whose geometry is then analyzed. Next, the boundary faces are checked for unique attributes and for duplicates. If the checks pass, the final mesh file is written in parallel via MPI-IO and memory is released. Figure 5 shows the final CFD mesh generated for the appended SUBOFF model, with each sub-region outlined by a light yellow line.
[0083] The invention is well conceived. It generates ultra-large-scale unstructured meshes by stitching together small-scale meshes, overcoming the scale limit of unstructured meshes of tens of billions of cells, and combines this with MPI (Message Passing Interface) parallel communication to greatly improve computational efficiency, so that stitching and generating an unstructured mesh of tens of billions of cells can be completed in one to two hours.
[0084] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some or all of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the scope of the technical solutions of the embodiments of the present invention.
Claims
1. A method for efficiently generating unstructured meshes with tens of billions of cores in parallel, characterized in that it mainly comprises the following steps: (1) for mesh generation with a heavy communication load, constructing an MPI layered and group communication architecture based on the total mesh size and the number of blocks, so as to achieve parallelism at the level of 100,000 cores, and establishing a structure array matching the MPI layered and group communication architecture; (2) all processes reading the M blocks of unstructured mesh data in parallel, distributing them evenly across the memory of the P processes, and establishing the mesh connectivity between adjacent processes, thereby overcoming the single-node memory limit and achieving 10,000-core parallelism; (3) efficiently and adaptively matching the sub-meshes at the interfaces of the M unstructured mesh blocks: if the coordinate points on the two sides of an interface coincide, they are replaced directly; if they do not coincide, the mesh on one side is deleted and a new mesh is generated locally to achieve adaptive matching; (4) efficiently adjusting, in parallel, the numbering of the point, surface, and volume mesh elements in the global mesh, reconstructing the connectivity of the surface and volume elements in the global mesh, and checking and ensuring the correctness of the mesh information; (5) analyzing the global mesh boundary, reconstructing the boundary information of the global mesh, and checking it.
2. The method for efficiently generating unstructured meshes with tens of billions of cores in parallel as described in claim 1, characterized in that the specific process of constructing the MPI layered and group communication architecture in step (1) is as follows: initialize the global communication domain comm_world and start P parallel MPI processes; the main process sequentially reads the cell counts NC_1, ..., NC_M of the M small-scale meshes and calculates the total number of cells NC_global; the communication domain is divided into M sub-communication domains (sub_comm), establishing a hierarchical communication mode between the global communication domain and the sub-communication domains as well as a group communication mode for the M sub-communication domains; the i-th sub-communication domain contains L_i local processes; if ..., then ... is executed from i = 1 to L_i, so that each process handles almost the same amount of data, thus improving parallel efficiency; the structure array contains the number of mesh cells, the number of mesh nodes, the node coordinates, the cell composition, the boundary surfaces, and the boundary types, and also contains the local mesh information and the connectivity between the meshes in adjacent processes.
3. The method for efficiently generating unstructured meshes with tens of billions of cores in parallel as described in claim 2, characterized in that the specific process of step (2) is as follows: the M sub-meshes are read in parallel through the M sub-communication domains (sub_comm); the mesh information read for the i-th sub-mesh comprises NN_i nodes, NC_i volume mesh cell connectivity records, and NF_i surface mesh elements, i ∈ [1, M]; in the i-th sub-communication domain sub_comm, the j-th process is assigned NN_sub,ij node coordinates, NC_sub,ij mesh cells, and NF_i boundary surfaces; ..., if ..., then ..., ...; then, within the sub-communication domain sub_comm, the node numbers of the mesh cells in the current process are calculated from the process number, nodes with the same numbers in the mesh cells on the interfaces between adjacent processes are compared, and the connectivity of the meshes stored in adjacent processes is calculated; then, in the global communication domain comm_world, the node, cell, and face numbering ranges, within the global mesh, of the meshes stored in the different sub-communication domains are compared and analyzed.
4. The method for efficiently generating unstructured meshes with tens of billions of cores in parallel as described in claim 3, characterized in that the specific process of step (3) is as follows: (3.1) all processes started in step (1) search in parallel for the faces/points marked as "internal faces/points" in the current process and store this face/point information in the array Sel_ij, i ∈ [1, M], j ∈ [1, L_i]; the Sel_ij of the i-th communication domain are sent via MPI to all processes in the (i+1)-th communication domain, the coordinates Sel_{i+1,j} matching Sel_ij are searched for in the (i+1)-th communication domain, and the number of corresponding points found, Ndel_{i+1,j}, is recorded; (3.2) if the coordinates corresponding to all received Sel_ij are found in the (i+1)-th communication domain, these Ndel_{i+1,j} coordinate points and the Fdel_{i+1,j} boundary surfaces formed by them are deleted, and the indices of the Ndel_{i+1,j} points in the i-th mesh block are recorded as Ind_ij; (3.3) if not all coordinates corresponding to the received Sel_ij are found in the (i+1)-th communication domain, the Ndel_{i+1,j} corresponding coordinate points already found are deleted, the resulting reduction Fdel_{i+1,j} in boundary surfaces is recorded, and the indices of the Ndel_{i+1,j} points in the i-th mesh block are recorded as Ind_ij; then, taking the remaining Nrsd_ij points whose corresponding coordinates were not found as reference, the NRdel_{i+1,j} points in the (i+1)-th communication domain that lie closer than the local mesh scale are deleted, and the resulting reduction NCRdel_{i+1,j} in mesh cells is recorded; taking the boundary surfaces connected to the Nrsd_ij points together with the new boundary surfaces formed by deleting the NRdel_{i+1,j} points as the front, the advancing-front method is used to generate a tetrahedral mesh locally that fills the space left by the NCRdel_{i+1,j} mesh cells removed with the NRdel_{i+1,j} points, ensuring that the locally generated mesh has a Jacobian coefficient > 0.7, and the numbers of newly added points Nadd_{i+1,j}, faces Fadd_{i+1,j}, and cells NCadd_{i+1,j} are recorded.
5. The method for efficiently generating unstructured meshes with tens of billions of cores in parallel as described in claim 4, characterized in that the specific process of step (4) is as follows: (4.1) after all processes started in step (1) have completed step (3), each communication domain obtains by reduction the numbers of points, surfaces and volumes added in its domain, Nadd_i, Fadd_i and NCadd_i, and the numbers removed by deletion, Ndel_i, Fdel_i and NCdel_i, and broadcasts them to all processes via MPI; the node numbers of the k-th communication domain then run from $1+\sum_{i=1}^{k-1}(NN_i+Nadd_i-Ndel_i)$ to $\sum_{i=1}^{k}(NN_i+Nadd_i-Ndel_i)$, its mesh cell numbers from $1+\sum_{i=1}^{k-1}(NC_i+NCadd_i-NCdel_i)$ to $\sum_{i=1}^{k}(NC_i+NCadd_i-NCdel_i)$, and its boundary surface numbers from $1+\sum_{i=1}^{k-1}(NF_i+Fadd_i-Fdel_i)$ to $\sum_{i=1}^{k}(NF_i+Fadd_i-Fdel_i)$; when k=1, the node, mesh cell, and boundary surface numbers all start at 1; (4.2) based on the starting position of the local mesh in the global mesh given in step (4.1), all mesh numbering information is reorganized to generate the integrated global mesh; (4.3) after all processes started in step (1) have completed step (4.2), the mesh information is checked for correctness.
6. The method for efficiently generating unstructured meshes with tens of billions of cores in parallel as described in claim 5, characterized in that the integrated global mesh in step (4.2) is generated as follows: when a deleted point is encountered in the node numbering of a communication domain, the gap is filled by shifting the subsequent points forward in order, and the mapping G_ij(l) from each original number l to its integrated number is recorded; because the mappings Ind_ij and G_ij(l) each cover only the part held by the local process, they are merged into domain-wide mappings Ind_i and G_i(l) within each sub-communication domain by a reduction operation, in which Ind_i is the union of the sets Ind_ij taken in order for j = 1 to L_i and G_i(l) is obtained in the same way; in the surface and volume meshes, deleted points are renumbered according to Ind_i and points that were not deleted according to G_i(l); an improved hash table method is then used, in which the data to be compared and the hash table are divided dynamically into blocks whose size is set by a memory function according to the remaining available memory, thus avoiding memory overflow and making it possible to generate meshes of tens of billions of cells.
7. The method for efficiently generating unstructured meshes with tens of billions of cores in parallel as described in claim 5, characterized in that the correctness of the mesh information in step (4.3) is checked as follows: ① the maximum point, surface, and volume numbers in the k-th communication domain should be $\sum_{i=1}^{k}(NN_i+Nadd_i-Ndel_i)$, $\sum_{i=1}^{k}(NF_i+Fadd_i-Fdel_i)$ and $\sum_{i=1}^{k}(NC_i+NCadd_i-NCdel_i)$, respectively; ② the number of nodes in each surface and volume mesh element should be consistent with the original mesh, with at most 4 and 8 nodes, respectively; ③ every boundary surface mesh element must have a unique corresponding volume mesh element; if an error occurs, the current process outputs the error message corresponding to ①-③ and then sends an exit command to all other processes.
8. The method for efficiently generating unstructured meshes with tens of billions of cores in parallel as described in claim 5, characterized in that the specific process of step (5) is as follows: (5.1) the i-th sub-mesh contains Nbc_i boundaries and NF_i - Fdel_i boundary surface elements; all boundary surface elements are categorized by attribute: "wall", "far field", "symmetry plane", "periodic boundary", "velocity inlet", "velocity outlet", and "slip surface"; the boundary surface attribute types are then reduced to obtain their number Nbc_global and their type names, which are sent to all processes started in step (1); all processes classify the boundary surfaces with the same attribute, label the attributes with the numbers 1 to Nbc_global, each corresponding to one attribute type name, and group the boundary surfaces with the same attribute into Nbc_global boundaries whose geometry is used to reconstruct the boundary mesh; (5.2) the correctness of the reconstructed boundary surface mesh is checked; if the mesh information is correct, the current process writes the generated mesh using parallel I/O at the file position corresponding to its process ID and releases all memory.
9. The method for efficiently generating unstructured meshes with tens of billions of cores in parallel as described in claim 8, characterized in that the correctness of the reconstructed boundary surface mesh in step (5.2) is checked as follows: ① boundary surface attribute check: traverse all boundary surfaces and verify that the predefined boundary condition identifier field is non-empty, unique, and within the set of valid values; a boundary surface that satisfies these conditions is correct; ② the global node numbers of each boundary surface are sorted to generate a unique signature, and a hash table is used to check quickly for duplicates within the process; if a signature maps to only one boundary surface, that boundary surface is correct, and if it maps to two or more, a duplicate error is reported.