Method and apparatus for multiscale training of physics-informed neural networks

The multi-Adam optimizer addresses the instability and inefficiency of PINNs by balancing loss terms, ensuring stable and fast convergence for PDE solutions.

JP2026521815A · Pending · Publication Date: 2026-07-01 · ROBERT BOSCH GMBH +1

Patent Information

Authority / Receiving Office: JP · JP
Patent Type: Applications
Current Assignee / Owner: ROBERT BOSCH GMBH
Filing Date: 2023-07-05
Publication Date: 2026-07-01

AI Technical Summary

Technical Problem

Traditional methods for solving partial differential equations (PDEs) face challenges such as unrealistic predictions, high computational cost, and inefficiency, particularly in high dimensions, and Physics-Informed Neural Networks (PINNs) suffer from unbalanced loss functions and inappropriate scaling, leading to instability and slow convergence.

Method used

A scale-invariant optimizer, referred to as multi-Adam, is introduced to balance the loss terms. It divides them into groups based on the PDEs, the boundary conditions, and optionally the initial conditions, and uses equal hyperparameters for the first and second moment estimates to achieve stable training and fast convergence.

Benefits of technology

The multi-Adam optimizer ensures balanced loss scaling, resulting in stable and efficient training of PINNs, enabling accurate and rapid convergence for PDE solutions.

✦ Generated by Eureka AI based on patent content.

Figure 2026521815000001_ABST

Abstract

A computer-implemented method for training a neural network to approximate a solution to a partial differential equation (PDE) is disclosed. The method includes constructing a loss function for the neural network as a weighted sum of loss terms generated based on at least one PDE and one or more boundary conditions, dividing the loss terms into groups such that the loss terms generated based on the one or more PDEs are in groups separate from the loss terms generated based on the one or more boundary conditions, updating each group of loss terms, and averaging the updates of all groups of loss terms as the step size for updating the parameters of the neural network in each epoch. Several other embodiments are also disclosed.

Description

[Technical Field]

[0001] Aspects of this disclosure relate generally to artificial intelligence, and more specifically to methods and apparatus for the multiscale training of physics-informed neural networks using a per-parameter scale-invariant optimizer.

[Background Art]

[0002] Numerical solutions to partial differential equations (PDEs) represent a significant branch of applied mathematics, a hot topic for research in various fields, and play a crucial role in real-world industrial applications, particularly in the simulation of a wide range of physical components. Finding accurate and efficient methods for solving PDEs would be beneficial.

[0003] Traditional methods for solving PDEs are numerical methods such as the finite difference method and the finite element method. However, these numerical methods can generate unrealistic predictions for certain scientific problems, and they struggle to handle PDEs in high dimensions. They are also time-consuming and expensive. There is therefore a growing trend to combine machine learning techniques with PDE solvers. Physics-informed neural networks (PINNs) are one of the leading approaches. They have demonstrated their effectiveness in various advanced applications and are already being applied in a wide range of fields, including fluid dynamics, bioengineering, and metamaterial design.

[Summary of the Invention]

[Problems to Be Solved by the Invention]

[0004] However, the original PINN still suffers from several problems during training. One of the problems is the gap between the loss function of the PINN and the actual absolute error. In practical cases, one of the loss terms of the PINN may be several orders of magnitude larger than the others and occupy most of the training process. In that case, contrary to the usual situation, a low total loss does not necessarily mean a better approximation. One of the important causes of the above problem is the inappropriate scaling of the PDE region. Scaling can have a significant impact on the PDE loss when the PDE is not invariant with respect to scaling. Therefore, a method for training the PINN is needed to counter the effects of unbalanced losses and inappropriate region scaling.

Means for Solving the Problems

[0005] The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects. It is not intended to identify key or critical elements of all aspects or to define the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in simplified form as a prelude to the more detailed description presented later.

[0006] The physics-informed neural network (PINN) has made remarkable progress in solving partial differential equations in various fields by encoding the PDE as a loss. PINNs are favored for their simplicity and flexibility, but their convergence and accuracy remain a major problem. This is especially true for systems with multiscale characteristics, which can arise from different numbers of sampling points distributed on the boundary and within the solution domain. There is therefore a need to improve PINN training under unbalanced optimization objectives.

[0007] This disclosure provides a scale-invariant optimizer that uses gradient moment estimates to balance the losses for each parameter. Because the optimizer is scale-invariant per parameter, training remains stable under domain scaling and converges quickly.

[0008] In one embodiment, a method is disclosed. The method includes constructing a loss function for a neural network as a weighted sum of loss terms generated based on at least one PDE and one or more boundary conditions; dividing the loss terms into groups such that the loss terms generated based on the one or more PDEs are in groups separate from the loss terms generated based on the one or more boundary conditions; updating each group of loss terms; and averaging the updates of all groups of loss terms as the step size for updating the parameters of the neural network in each epoch.

[0009] In a further embodiment, dividing the loss terms into groups further includes placing the loss terms generated based on each of the one or more PDEs in a separate group, and placing the loss terms generated based on the one or more boundary conditions together in one group.

[0010] In a further embodiment, the weighted sum further includes loss terms generated based on one or more initial conditions, and the loss terms generated based on each of the one or more initial conditions are divided into separate groups.

[0011] In a further embodiment, all groups of loss terms are updated by the Adam optimizer.

[0012] In a further embodiment, the hyperparameters β1 and β2 of the Adam optimizer are set to equal values.

[0013] In a further embodiment, the loss terms of the loss function are L2 losses.

[0014] In a further embodiment, the one or more PDEs are one or more of the following: Maxwell's equations, the Navier-Stokes equations, Poisson's equation, the Helmholtz equation, the heat and diffusion equations, equilibrium differential equations, displacement equations, and/or constitutive equations.

[0015] In one embodiment, a computer-implemented method for approximating a solution to a partial differential equation (PDE) using a neural network trained by one of the methods described herein includes inputting the coordinates of a point within the solution domain of the PDE into the neural network, and outputting the solution of the PDE corresponding to the point by the neural network.

[0016] In a further embodiment, the PDE is Maxwell's equations, and multiple solutions of the PDE correspond to electromagnetic field distributions that satisfy Maxwell's equations. These solutions can be used in the design of DC/DC converters.

[0017] In a further embodiment, the PDE is the Navier-Stokes equations, and multiple solutions of the PDE correspond to velocity and pressure fields that satisfy the Navier-Stokes equations. These solutions can be used in the implementation of fuel cells.

[0018] In one embodiment, a computer system is disclosed. The computer system comprises one or more processors and one or more storage devices that store computer executable instructions, which, when executed, cause one or more processors to perform one of the operations disclosed herein.

[0019] In one embodiment, one or more computer-readable storage media are disclosed. The media store computer-executable instructions that, when executed, cause one or more processors to perform one of the operations disclosed herein.

[0020] In one embodiment, a computer program product is disclosed. It includes computer-executable instructions that, when executed, cause one or more processors to perform one of the operations disclosed herein.

[0021] The disclosed aspects are described in relation to the accompanying drawings, which are provided to illustrate, not to limit, the disclosed aspects.

[Brief Description of Drawings]

[0022]
[Figure 1] illustrates an exemplary Poisson equation on a complex domain according to various aspects of this disclosure.
[Figure 2] is a flowchart illustrating an exemplary method for training a neural network to approximate a solution to a partial differential equation (PDE) according to various aspects of this disclosure.
[Figure 3] is a flowchart illustrating an exemplary use of a trained neural network according to various aspects of this disclosure.
[Figure 4] illustrates an exemplary computer system according to various aspects of this disclosure.

[Modes for Carrying Out the Invention]

[0023] The present disclosure is described here with reference to several exemplary implementations. These implementations do not imply any limitation on the scope of the present disclosure. They are discussed solely to enable those skilled in the art to better understand, and thereby implement, embodiments of the present disclosure.

[0024] Various embodiments are described in detail with reference to the accompanying drawings. Wherever possible, the same reference numerals are used throughout the drawings to refer to the same or similar parts. References made to examples and embodiments are for illustrative purposes only and are not intended to limit the scope of this disclosure.

[0025] Physics-informed neural networks (PINNs) are a machine learning technique for solving partial differential equations (PDEs). A PINN trains a neural network to approximate the solution of a PDE by minimizing a differentiable loss. First, an exemplary PDE is defined as follows, together with its boundary and initial conditions:

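[0026]

$$
\begin{aligned}
f\left(\mathbf{x}; \frac{\partial u}{\partial x_1}, \ldots, \frac{\partial u}{\partial x_d}; \frac{\partial^2 u}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 u}{\partial x_1 \partial x_d}; \ldots; \lambda\right) &= 0, \quad \mathbf{x} \in \Omega, \\
\mathcal{B}(u, \mathbf{x}) &= 0, \quad \mathbf{x} \in \partial\Omega, \\
\mathcal{I}(u, \mathbf{x}) &= 0, \quad \mathbf{x} \in T_i.
\end{aligned} \tag{1}
$$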

[0027] Here, f is the differential equation, u is its solution, Ω is the domain, and ∂Ω is the boundary of the domain. λ denotes additional parameters, B and I are the boundary and initial conditions, and T_i is the set of points at which the initial conditions are imposed.

[0028] Next, the model's loss function can be defined as a weighted sum of the three losses generated by the above constraints, as follows:

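[0029]

$$
\mathcal{L}(\theta, \lambda; T) = w_f \, \mathcal{L}_f(\theta, \lambda; T_f) + w_b \, \mathcal{L}_b(\theta, \lambda; T_b) + w_i \, \mathcal{L}_i(\theta, \lambda; T_i) \tag{2}
$$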

[0030] Here, w_f, w_b, and w_i are the weights of the different losses. T, T_f, T_b, and T_i denote the sets of sampling points, and |·| gives the number of points in each set. The losses L_f, L_b, and L_i can be defined as follows. In this example, they are L2 losses.

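[0031]

$$
\begin{aligned}
\mathcal{L}_f(\theta, \lambda; T_f) &= \frac{1}{|T_f|} \sum_{\mathbf{x} \in T_f} \left\| f\left(\mathbf{x}; \frac{\partial \hat{u}}{\partial x_1}, \ldots, \frac{\partial \hat{u}}{\partial x_d}; \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_1}, \ldots, \frac{\partial^2 \hat{u}}{\partial x_1 \partial x_d}; \ldots; \lambda\right) \right\|_2^2, \\
\mathcal{L}_b(\theta, \lambda; T_b) &= \frac{1}{|T_b|} \sum_{\mathbf{x} \in T_b} \left\| \mathcal{B}(\hat{u}, \mathbf{x}) \right\|_2^2, \\
\mathcal{L}_i(\theta, \lambda; T_i) &= \frac{1}{|T_i|} \sum_{\mathbf{x} \in T_i} \left\| \mathcal{I}(\hat{u}, \mathbf{x}) \right\|_2^2,
\end{aligned} \tag{3}
$$

where û denotes the neural network's approximation of u with parameters θ.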
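The following is a minimal sketch of equations (2) and (3). It assumes the residuals f(x; ...), B(û, x), and I(û, x) have already been evaluated at the sampling points. In a real PINN, these residuals come from the network through automatic differentiation, which is not shown here. The function names `l2_loss` and `pinn_loss` and the residual values are hypothetical:

```python
from typing import Optional, Sequence


def l2_loss(residuals: Sequence[float]) -> float:
    """Mean squared residual over the sampling points of one constraint (equation (3))."""
    if not residuals:
        raise ValueError("at least one sampling point is required")
    return sum(r * r for r in residuals) / len(residuals)


def pinn_loss(res_f: Sequence[float], res_b: Sequence[float],
              res_i: Optional[Sequence[float]] = None,
              w_f: float = 1.0, w_b: float = 1.0, w_i: float = 1.0) -> float:
    """Weighted sum of PDE, boundary, and optional initial-condition losses (equation (2))."""
    total = w_f * l2_loss(res_f) + w_b * l2_loss(res_b)
    if res_i is not None:
        total += w_i * l2_loss(res_i)
    return total


# Hypothetical residuals: the PDE residuals are ~200x larger than the boundary residuals.
res_f = [100.0, -100.0]  # PDE residuals at points in T_f
res_b = [0.5, -0.5]      # boundary residuals at points in T_b
print(l2_loss(res_f), l2_loss(res_b), pinn_loss(res_f, res_b))
```

With equal weights, the boundary term contributes only 0.25 of a total of about 10000. A low total loss therefore says almost nothing about how well the boundary conditions are fitted, which is the imbalance discussed in [0032].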

[0032] When training a PINN, the goal is to optimize the different losses of equation (3) simultaneously and to make all loss terms as low as possible. In real-world cases, however, the PDE loss and the boundary loss can differ by several orders of magnitude, so that training fails to approach the correct solution. One of the main reasons for this problem is improper scaling of the domain. Most PDEs are not scale-invariant, so changing the size of the domain rescales the PDE loss.

[0033] Figure 1 illustrates exemplary Poisson equations in the complex domain according to various aspects of this disclosure.

[0034] The left side of Figure 1 shows an exemplary reference solution to the Poisson equation. The center image shows the training result of a baseline PINN on an 8x8 domain, and the right image shows the result on a 1x1 domain. The Poisson equation is not scale-invariant, so narrowing the domain from 8x8 to 1x1 makes the PDE loss and the boundary loss differ significantly. As a result, the model trained on the 1x1 domain has difficulty fitting the boundary conditions.
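A short scaling argument makes this effect concrete. It assumes the Poisson equation −Δu = g with Dirichlet data h on a square of side s, evaluated at corresponding sampling points. The symbols g, h, s, and r are introduced here for illustration only:

```latex
\begin{aligned}
&\text{Domain } [0,s]^2:\quad -\Delta_x u(x) = g(x), \qquad u = h \text{ on the boundary},\\
&\text{Change of variables: } x = s\,x', \quad v(x') = u(s\,x') \;\Rightarrow\; \Delta_{x'} v(x') = s^{2}\,(\Delta_x u)(s\,x'),\\
&\text{Rescaled problem on } [0,1]^2:\quad -\Delta_{x'} v(x') = s^{2}\,g(s\,x'), \qquad v(x') = h(s\,x') \text{ on the boundary},\\
&\text{PDE residual: } r'(x') = -\Delta_{x'} v(x') - s^{2} g(s\,x') = s^{2}\, r(s\,x'), \qquad r = -\Delta_x u - g,\\
&\text{L2 losses: } \mathcal{L}_f' = s^{4}\,\mathcal{L}_f, \qquad \mathcal{L}_b' = \mathcal{L}_b,\\
&s = 8:\quad \mathcal{L}_f' / \mathcal{L}_b' = 8^{4}\,\mathcal{L}_f / \mathcal{L}_b = 4096\,\mathcal{L}_f / \mathcal{L}_b.
\end{aligned}
```

The boundary residual compares only function values, so it does not change under the mapping. The squared PDE residual, however, picks up a factor of s^4. Under these assumptions, shrinking the 8x8 domain to 1x1 inflates the PDE loss relative to the boundary loss by a factor of 4096.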

[0035] Unlike plain stochastic gradient descent (SGD), PINNs are generally optimized with the adaptive moment estimation (Adam) optimizer. Adam uses running estimates of the first and second moments of the gradient to calculate the step size for updating the neural network parameters. However, the Adam optimizer often behaves abnormally when the scales and convergence rates of the loss terms differ significantly.
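For reference, here is a minimal pure-Python sketch of the standard Adam update described above, using the common defaults (0.9, 0.999). The function name and the list-based interface are illustrative assumptions, not a library API:

```python
import math
from typing import List


def adam_step(params: List[float], grads: List[float], m: List[float], v: List[float],
              t: int, lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> List[float]:
    """Return params after one standard Adam step; m and v are updated in place.

    m: moving average of gradients (first moment),
    v: moving average of squared gradients (second moment),
    t: 1-based step count used for bias correction.
    """
    new_params = []
    for k, (p, g) in enumerate(zip(params, grads)):
        m[k] = beta1 * m[k] + (1.0 - beta1) * g
        v[k] = beta2 * v[k] + (1.0 - beta2) * g * g
        m_hat = m[k] / (1.0 - beta1 ** t)  # correct the bias from zero initialization
        v_hat = v[k] / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Two parameters whose gradients differ by six orders of magnitude.
m, v = [0.0, 0.0], [0.0, 0.0]
params = adam_step([0.0, 0.0], [1000.0, 0.001], m, v, t=1)
print(params)
```

On the first step, the bias-corrected ratio m̂/√v̂ equals g/|g|. Both parameters therefore move by about lr, even though their gradients differ by a factor of 10^6. Multi-Adam builds on this per-parameter normalization.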

[0036] Observing that domain scaling can significantly impact PINN training, this disclosure proposes a novel optimizer to address this challenge.

[0037] As mentioned above, each PDE and each boundary condition is an individual objective, so it is natural to view PINN optimization as a multi-task training problem. A natural approach is then to reweight the loss terms to correct the imbalances between the individual objective functions.

[0038] The inventors have made two observations. First, the first and second moment estimates of the Adam optimizer are relatively stable. Second, the second moment essentially reflects the inherent difference between the scales of the PDE loss and the boundary loss. Using the second moment as a weight therefore brings the PDE loss and the boundary loss to approximately the same scale.
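The second observation can be illustrated numerically. This sketch makes a simplifying assumption: the PDE-loss and boundary-loss gradients for one parameter have the same shape over time but differ in magnitude by a factor of 10^4. Each stream is divided by the square root of its own running second moment, with β = 0.99 as an arbitrary choice, and this removes that factor. All names and values are hypothetical:

```python
import math
import random
from typing import List


def normalized_gradients(grads: List[float], beta: float = 0.99,
                         eps: float = 1e-8) -> List[float]:
    """Divide each gradient by the square root of its bias-corrected running second moment."""
    v = 0.0
    out = []
    for t, g in enumerate(grads, start=1):
        v = beta * v + (1.0 - beta) * g * g
        v_hat = v / (1.0 - beta ** t)
        out.append(g / (math.sqrt(v_hat) + eps))
    return out


random.seed(0)
base = [random.uniform(0.5, 1.5) for _ in range(500)]
pde_grads = [1e3 * g for g in base]  # hypothetical gradients of a large-scale PDE loss
bc_grads = [1e-1 * g for g in base]  # hypothetical gradients of a small-scale boundary loss

raw_ratio = pde_grads[-1] / bc_grads[-1]
norm_ratio = normalized_gradients(pde_grads)[-1] / normalized_gradients(bc_grads)[-1]
print(f"raw ratio: {raw_ratio:.1f}, normalized ratio: {norm_ratio:.6f}")
```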

[0039] Treating PINN optimization as a multi-task problem, this disclosure divides the losses of the PINN into several groups.

[0040] In one embodiment of the disclosed method, all PDE losses are divided into one group, and all boundary losses are divided into another group. This is because the PDE losses and boundary losses differ by several orders of magnitude.

[0041] In other embodiments of the method of the present disclosure, each PDE loss is placed in its own group. Different PDEs are likely to have different intrinsic scaling factors, which would cause imbalances within a shared group. All boundary losses, by contrast, are placed together in one group. Boundary losses are typically calculated by measuring the L2 error at the sampling points, so they are invariant with respect to the scaling of the domain.

[0042] In further embodiments of the method of the present disclosure, all initial-condition losses are placed in a single group, separate from the PDE losses and the boundary losses. This is because the initial-condition losses may be imbalanced relative to the losses of the PDEs and the boundary conditions.

[0043] The first and second moments are maintained separately for each group, and the update for each group is computed in the same way as in the Adam optimizer. Finally, the updates of all groups are averaged and used as the final update step for the network parameters. Because it treats training as a multi-task problem, the optimizer proposed in this disclosure is referred to as multi-Adam.
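The grouped update can be sketched as follows. Table 1 is not reproduced in this text, so this sketch is reconstructed from the prose alone and rests on several assumptions. Parameters and gradients are plain Python lists. Each group's gradient has already been computed from the sum of that group's loss terms. The class name `MultiAdam` and its interface are hypothetical. The choice β1 = β2 = 0.99 follows [0046], and the learning rate is arbitrary.

```python
import math
from typing import List


class MultiAdam:
    """Hypothetical sketch: one Adam state per loss group, with group updates averaged."""

    def __init__(self, n_params: int, n_groups: int, lr: float = 1e-3,
                 beta1: float = 0.99, beta2: float = 0.99, eps: float = 1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [[0.0] * n_params for _ in range(n_groups)]  # per-group first moments
        self.v = [[0.0] * n_params for _ in range(n_groups)]  # per-group second moments
        self.t = 0

    def step(self, params: List[float], group_grads: List[List[float]]) -> List[float]:
        """group_grads[k][j] is the gradient of group k's loss w.r.t. params[j]."""
        self.t += 1
        c1 = 1.0 - self.beta1 ** self.t
        c2 = 1.0 - self.beta2 ** self.t
        summed = [0.0] * len(params)
        for k, grads in enumerate(group_grads):
            m, v = self.m[k], self.v[k]
            for j, g in enumerate(grads):
                m[j] = self.beta1 * m[j] + (1.0 - self.beta1) * g
                v[j] = self.beta2 * v[j] + (1.0 - self.beta2) * g * g
                # Adam direction computed from this group's moments only
                summed[j] += (m[j] / c1) / (math.sqrt(v[j] / c2) + self.eps)
        n_groups = len(group_grads)
        # Average the per-group updates and apply them as one step.
        return [p - self.lr * s / n_groups for p, s in zip(params, summed)]


def run(scale: float) -> List[float]:
    opt = MultiAdam(n_params=2, n_groups=2)
    params = [0.0, 0.0]
    for t in range(1, 51):
        pde_grads = [scale * math.sin(t), scale * math.cos(t)]  # hypothetical group-0 gradients
        bc_grads = [0.5, -0.25]                                   # hypothetical group-1 gradients
        params = opt.step(params, [pde_grads, bc_grads])
    return params


a, b = run(1.0), run(1e4)
print(a, b)
```

Each group is normalized by its own second moment. As a result, multiplying one group's loss by a constant (here 10^4) leaves the parameter trajectory essentially unchanged, which is the balancing effect described in [0047].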

[0044] In one embodiment, the multi-Adam process is shown in Table 1 below.

[0045] [Table 1]

[0046] The hyperparameters β1 and β2 of the Adam optimizer are typically set to (0.9, 0.999) by default, but these values are not optimal for the proposed multi-Adam. For the best convergence of PINN training, the inventors recognize that scale invariance depends on β1 and β2 being equal. With equal values, the optimizer tracks the same history period for the first and second gradient moments, so a scaling factor cancels out. It is not divided by a factor averaged over a different period. In one embodiment, β1 and β2 may be set to (0.99, 0.99).
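One way to read the "same history period" argument is sketched below for a single parameter. It rests on a simplifying assumption: the gradient keeps its direction, but its magnitude suddenly drops 100-fold, as a stand-in for a loss term changing scale during training. A perfectly scale-invariant step would return 1.0. The helper name and the step counts are hypothetical:

```python
import math


def normalized_step_after_scale_drop(beta1: float, beta2: float, warmup: int = 2000,
                                     after: int = 1000, drop: float = 0.01,
                                     eps: float = 1e-8) -> float:
    """Feed gradient 1.0 for `warmup` steps, then `drop` for `after` steps,
    and return the final bias-corrected Adam direction m_hat / (sqrt(v_hat) + eps)."""
    m = v = 0.0
    t = 0
    for g in [1.0] * warmup + [drop] * after:
        t += 1
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return m_hat / (math.sqrt(v_hat) + eps)


default = normalized_step_after_scale_drop(0.9, 0.999)
equal = normalized_step_after_scale_drop(0.99, 0.99)
print(f"(0.9, 0.999): {default:.3f}   (0.99, 0.99): {equal:.3f}")
```

With the default (0.9, 0.999), the second moment still remembers the old, larger gradients long after the first moment has adapted, so the step stays suppressed. With equal values, both averages forget at the same rate, and the normalized step recovers toward 1. A related property follows from the Cauchy-Schwarz inequality: with β1 = β2, the bias-corrected ratio m̂/√v̂ never exceeds 1 in magnitude.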

[0047] The proposed multi-Adam optimizer uses the second moment as the weight for each group, so each loss term has a nearly equivalent scale. Experiments have also shown that multi-Adam makes little progress during the first few thousand epochs. Once good estimates of the first and second moments are obtained, however, an ultrafast convergence rate can be observed. Methods such as the standard Adam optimizer or GradNorm converge slowly and are much less stable. Compared with them, multi-Adam achieves high efficiency, stability, and accuracy.

[0048] In real-world industrial applications, physical systems can be characterized by PDEs, particularly in the simulation of a wide range of physical components. In one embodiment, Maxwell's equations can be used in the design of DC/DC converters. Maxwell's equations are a set of PDEs describing the relationship between electric and magnetic fields and charge and current densities. In another embodiment, transport and fluid-dynamics equations, such as the heat and diffusion equations and/or the Navier-Stokes equations, are used in the implementation of fuel cells. Another example is the PDEs of structural mechanics, such as elasticity, which are used for the design of E-machines.

[0049] To solve the PDEs that characterize such systems accurately and efficiently, PINNs can be trained in a supervised manner to approximate the PDE solutions. The PDE constraints are introduced into the loss function. As a result, the trained neural network fits the observed real data while simultaneously minimizing the PDE residuals.

[0050] Figure 2 shows a flowchart of an exemplary method for training a neural network to approximate a solution to a partial differential equation (PDE) according to various aspects of the present disclosure. As described below, some or all of the illustrated operations may be omitted in a given implementation within the scope of the present disclosure, and not all illustrated operations are required in every embodiment. Some blocks may also be carried out in parallel or in a different order. In some embodiments, the method may be carried out by any suitable apparatus or means for performing the functions or algorithms described below.

[0051] This method begins in block 201 by constructing a neural network loss function by a weighted sum of loss terms generated based on at least one PDE and one or more boundary conditions.

[0052] In one embodiment, the input to the neural network being trained is the coordinates of a plurality of sampling points, located either on the boundary or within the solution domain. In a further embodiment, for time-dependent problems, the input to the neural network also includes a time point.
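The sketch below shows one way such input points might be generated for a rectangular domain. It assumes uniform random sampling, since the disclosure does not specify a sampling scheme. The function `sample_rectangle` is hypothetical. The optional `t_max` argument appends a time coordinate for time-dependent problems:

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def sample_rectangle(n_interior: int, n_boundary: int, width: float, height: float,
                     t_max: Optional[float] = None,
                     seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Uniformly sample points inside [0, width] x [0, height] and on its four edges."""
    rng = random.Random(seed)

    def with_time(xy: Tuple[float, float]) -> Point:
        # Append a time coordinate in [0, t_max] for time-dependent problems.
        return xy if t_max is None else xy + (rng.uniform(0.0, t_max),)

    interior = [with_time((rng.uniform(0.0, width), rng.uniform(0.0, height)))
                for _ in range(n_interior)]

    boundary = []
    perimeter = 2.0 * (width + height)
    for _ in range(n_boundary):
        s = rng.uniform(0.0, perimeter)  # arc-length position along the boundary
        if s < width:                     # bottom edge
            xy = (s, 0.0)
        elif s < width + height:          # right edge
            xy = (width, s - width)
        elif s < 2.0 * width + height:    # top edge
            xy = (2.0 * width + height - s, height)
        else:                             # left edge
            xy = (0.0, perimeter - s)
        boundary.append(with_time(xy))
    return interior, boundary


interior, boundary = sample_rectangle(1000, 200, 8.0, 8.0)
print(len(interior), len(boundary), boundary[0])
```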

[0053] In one embodiment, the loss function of the neural network is constructed as

$$
\mathcal{L}(\theta, \lambda; T) = w_{f1} \mathcal{L}_{f1}(\theta, \lambda; T_f) + \cdots + w_{fn} \mathcal{L}_{fn}(\theta, \lambda; T_f) + w_{b1} \mathcal{L}_{b1}(\theta, \lambda; T_b) + \cdots + w_{bn} \mathcal{L}_{bn}(\theta, \lambda; T_b),
$$

where L_f1 to L_fn are the PDE losses, L_b1 to L_bn are the boundary losses, w_f1, ..., w_fn and w_b1, ..., w_bn are the weights of the PDE losses and the boundary losses, and T, T_f, and T_b denote the sampling points.

[0054] In a further example, the loss function of the neural network can also be constructed based on one or more initial conditions as

$$
\mathcal{L}(\theta, \lambda; T) = w_{f1} \mathcal{L}_{f1}(\theta, \lambda; T_f) + \cdots + w_{fn} \mathcal{L}_{fn}(\theta, \lambda; T_f) + w_{b1} \mathcal{L}_{b1}(\theta, \lambda; T_b) + \cdots + w_{bn} \mathcal{L}_{bn}(\theta, \lambda; T_b) + w_{i1} \mathcal{L}_{i1}(\theta, \lambda; T_i) + \cdots + w_{in} \mathcal{L}_{in}(\theta, \lambda; T_i),
$$

where L_i1 to L_in are the initial losses, w_i1, ..., w_in are the weights of the initial losses, and T, T_f, T_b, and T_i denote the sampling points.

[0055] In one embodiment, the loss term of the constructed loss function is the L2 loss, as shown in equation (3).

[0056] In one embodiment, the one or more PDEs may be one or more of the following: Maxwell's equations for electromagnetic problems, the Navier-Stokes equations for flow control, Poisson's equation for electronic, magnetic, and/or thermal problems, the Helmholtz equation for electromagnetic problems, the heat and diffusion equations for thermodynamic problems, and the equilibrium differential equations, displacement equations, and constitutive equations for elastic problems.

[0057] The method then proceeds to block 202, where the loss terms are divided into groups such that the loss terms generated based on the one or more PDEs are in groups separate from the loss terms generated based on the one or more boundary conditions.

[0058] In terms of the above embodiment, the loss terms L_f1(θ,λ;T_f) to L_fn(θ,λ;T_f), which are generated based on the one or more PDEs, are divided into groups separate from the group containing the loss terms L_b1(θ,λ;T_b) to L_bn(θ,λ;T_b), which are generated based on the one or more boundary conditions.

[0059] In further embodiments, each loss term generated based on the one or more PDEs is placed in its own group, and all loss terms generated based on the one or more boundary conditions are placed together in one group. Thus, L_f1(θ,λ;T_f) forms one group, L_f2(θ,λ;T_f) forms another group, and so on, while the loss terms L_b1(θ,λ;T_b) to L_bn(θ,λ;T_b) form a single group.

[0060] In further embodiments, if the loss function includes loss terms based on one or more initial conditions, the loss terms corresponding to one or more initial conditions may be divided into a separate group, or each loss term corresponding to each initial condition may be divided into a separate group.
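The grouping options of [0040] to [0042] and [0058] to [0060] can be expressed as a small partitioning helper. This is an illustrative sketch in which loss terms are represented only by their names. The function `group_loss_terms` and its flags are hypothetical:

```python
from typing import List, Sequence


def group_loss_terms(pde_terms: Sequence[str], bc_terms: Sequence[str],
                     ic_terms: Sequence[str] = (), split_pde: bool = True,
                     split_ic: bool = False) -> List[List[str]]:
    """Partition named loss terms into groups that each get their own optimizer state.

    split_pde=False: all PDE terms share one group ([0040]).
    split_pde=True:  one group per PDE term ([0041], [0059]).
    Boundary terms always share one group. Initial-condition terms share one group,
    or, with split_ic=True, get one group each ([0060]).
    """
    groups: List[List[str]] = []
    if split_pde:
        groups.extend([term] for term in pde_terms)
    elif pde_terms:
        groups.append(list(pde_terms))
    if bc_terms:
        groups.append(list(bc_terms))
    if split_ic:
        groups.extend([term] for term in ic_terms)
    elif ic_terms:
        groups.append(list(ic_terms))
    return groups


print(group_loss_terms(["L_f1", "L_f2"], ["L_b1", "L_b2", "L_b3"], ["L_i1"]))
```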

[0061] The method then proceeds to block 203, where each group of loss terms is updated and the updates of all groups are averaged to give the step size for updating the neural network parameters in each epoch.

[0062] In one embodiment, each group of loss terms is updated by the Adam optimizer.

[0063] In further embodiments, the hyperparameters β1 and β2 of the Adam optimizer are set to equal values. For example, the hyperparameters β1 and β2 of the Adam optimizer may be set to (0.99, 0.99).

[0064] In one embodiment, the operation in block 203 may be carried out with reference to Table 1 above. After the neural network is trained, it can be used for various industrial applications, particularly in the simulation of a wide range of physical components.

[0065] Figure 3 illustrates a flowchart of an exemplary method for using a trained neural network according to various aspects of the present disclosure. As described below, some or all of the illustrated operations may be omitted in a given implementation within the scope of the present disclosure, and not all illustrated operations are required in every embodiment. Some blocks may also be performed in parallel or in a different order. In some embodiments, the method may be performed by any suitable apparatus or means for performing the functions or algorithms described below. The neural network is trained as shown in Figure 2 to approximate a solution of a PDE. The PDE may be, for example, one or more of the following: Maxwell's equations for electromagnetic problems, the Navier-Stokes equations for flow control, Poisson's equation for electronic, magnetic, and/or thermal problems, the Helmholtz equation for electromagnetic problems, the heat and diffusion equations for thermodynamic problems, and the equilibrium differential equations, displacement equations, and constitutive equations for elastic problems.

[0066] The method begins at block 301, where the coordinates of a point within the solution domain of the PDE are input to the trained neural network. In further embodiments, for time-dependent problems, a specific time is also input to the neural network.

[0067] The method then proceeds to block 302, where the neural network outputs the PDE solution corresponding to the point. In further embodiments, for time-dependent problems, the output solution corresponds to the specified time.

[0068] In one embodiment, the PDE is Maxwell's equations, and multiple solutions of the PDE correspond to electromagnetic field distributions that satisfy Maxwell's equations. These solutions can be used in the design of DC/DC converters.

[0069] In another embodiment, the PDE is the Navier-Stokes equations, and multiple solutions of the PDE correspond to velocity and pressure fields that satisfy the Navier-Stokes equations. These solutions can be used in the implementation of fuel cells.

[0070] In one embodiment, the network is evaluated at points sampled with a specific density. Plotting the corresponding solutions produces images similar to the center and right images of Figure 1.
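Below is a minimal sketch of this evaluation step. It assumes a rectangular domain, takes a regular grid as the "specific density", and treats the trained network as a Python callable. The function names and the stand-in solution are hypothetical placeholders for the trained PINN:

```python
from typing import Callable, List


def evaluate_on_grid(model: Callable[[float, float], float], width: float, height: float,
                     nx: int, ny: int) -> List[List[float]]:
    """Query the network on an nx-by-ny grid covering [0, width] x [0, height].

    Row j of the result corresponds to the j-th y value, column i to the i-th x value.
    """
    if nx < 2 or ny < 2:
        raise ValueError("need at least 2 grid points per axis")
    xs = [width * i / (nx - 1) for i in range(nx)]
    ys = [height * j / (ny - 1) for j in range(ny)]
    return [[model(x, y) for x in xs] for y in ys]


# Hypothetical stand-in for a trained PINN: any callable mapping (x, y) to u(x, y).
def trained_net(x: float, y: float) -> float:
    return x * (1.0 - x) * y * (1.0 - y)


field = evaluate_on_grid(trained_net, 1.0, 1.0, nx=5, ny=5)
print(field[2])
```

The resulting 2D list can be passed to a plotting routine, for example matplotlib's `imshow`, to produce an image like those in Figure 1.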

[0071] Figure 4 illustrates an exemplary computer system according to various embodiments of the present disclosure. The computer system may include at least one processor 410 and at least one storage device 420. The storage device 420 may store computer-executable instructions that, when executed, cause the processor 410 to perform any operation according to embodiments of the present disclosure, as described in relation to Figures 1 to 3. Embodiments of the present disclosure may be embodied in one or more computer-readable media, such as non-transitory computer-readable media. These media may store instructions that, when executed, cause one or more processors to perform any operation according to embodiments of the present disclosure, as described in relation to Figures 1 to 3. Embodiments of the present disclosure may also be embodied in a computer program product that includes computer-executable instructions. When executed, these instructions cause one or more processors to perform any operation according to embodiments of the present disclosure, as described in relation to Figures 1 to 3.

[0072] All operations in the method described above are merely illustrative. This disclosure is not limited to any operation in the method or to any particular order of such operations, and it should encompass all other equivalents under the same or similar concepts.

[0073] All modules within the device described above may be implemented in various ways: as hardware, as software, or as a combination thereof. Furthermore, any of these modules may be further functionally divided into submodules, or they may be combined together.

[0074] The foregoing description is provided to enable those skilled in the art to carry out the various embodiments described herein. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may also apply to other embodiments. Therefore, the claims are not intended to be limited to the embodiments shown herein. All structural and functional equivalents to the elements of the various embodiments described throughout this disclosure that are known, or later come to be known, to those skilled in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.

Claims

1. A computer-implemented method for training a neural network to approximate the solution of a partial differential equation (PDE), the method comprising: constructing a loss function of the neural network as a weighted sum of loss terms generated based on at least one PDE and one or more boundary conditions; dividing the loss terms into groups such that the loss terms generated based on the one or more PDEs are in groups separate from the loss terms generated based on the one or more boundary conditions; and updating each group of loss terms and averaging the updates of all groups of loss terms as the step size for updating the parameters of the neural network in each epoch.

2. The computer-implemented method according to claim 1, wherein dividing the loss terms into groups comprises: placing the loss terms generated based on each of the one or more PDEs in a separate group, and placing the loss terms generated based on the one or more boundary conditions together in one group.

3. The computer-implemented method according to claim 1, wherein the weighted sum further includes loss terms generated based on one or more initial conditions, and the loss terms generated based on each of the one or more initial conditions are divided into separate groups.

4. The computer-implemented method according to claim 1, wherein each group of loss terms is updated by an Adam optimizer.

5. The computer-implemented method according to claim 4, wherein the hyperparameters β1 and β2 of the Adam optimizer are set to equal values.

6. The computer-implemented method according to claim 1, wherein the loss terms of the loss function are L2 losses.

7. The computer-implemented method according to claim 1, wherein the one or more PDEs are one or more of the following: Maxwell's equations, the Navier-Stokes equations, Poisson's equation, the Helmholtz equation, the heat and diffusion equations, equilibrium differential equations, displacement equations, and/or constitutive equations.

8. A computer-implemented method for approximating a solution to a partial differential equation (PDE) using a neural network trained according to any one of claims 1 to 7, the method comprising: inputting the coordinates of a point within the solution domain of the PDE into the trained neural network; and outputting, by the neural network, the solution of the PDE corresponding to the point.

9. The computer-implemented method according to claim 8, wherein the PDE is Maxwell's equations, and multiple solutions of the PDE correspond to distributions of the electromagnetic field that satisfy Maxwell's equations.

10. The computer-implemented method according to claim 8, wherein the PDE is the Navier-Stokes equations, and multiple solutions of the PDE correspond to velocity and pressure fields that satisfy the Navier-Stokes equations.

11. A computer system comprising: one or more processors; and one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations according to any one of claims 1 to 10.

12. One or more computer-readable storage media storing computer-executable instructions for causing one or more processors to perform an operation according to any one of claims 1 to 10 when executed.

13. A computer program product comprising a computer-executable instruction that, when executed, causes one or more processors to perform an operation according to any one of claims 1 to 10.