Neural network multi-agent consensus control method based on constructive certificate
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING TECH UNIV
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-30
Figure CN122308191A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the fields of distributed control and machine learning technology, specifically to a neural network multi-agent consensus control method based on stability certificates, belonging to the interdisciplinary fields of multi-agent cooperative control and deep learning.
[0002] This method is applicable to application scenarios with strict requirements for collaborative accuracy and system stability, such as multi-robot collaboration, UAV swarm control, and networked aerospace platforms.
Background Technology
[0003] An agent is an entity capable of perceiving its environment and taking actions to achieve specific goals. By sensing changes in the environment (e.g., through sensors or data input), an agent makes judgments and decisions based on its learned knowledge and algorithms, and then executes actions to influence the environment or achieve predetermined objectives. In applications such as multi-robot collaboration, drone swarm control, and networked aerospace platforms, the agents are robots, drones, and aircraft, respectively. They are capable of autonomous learning and continuous evolution to better complete tasks and adapt to complex environments.
[0004] Distributed cooperation in networked multi-agent systems (MASs) is a cornerstone of modern control theory. Consensus, in particular, is the core mechanism for achieving state synchronization and protocol agreement among agents, and is widely used in various cooperative tasks. Although the solvability theory for the consensus problem is relatively mature, designing controllers that can guarantee fast convergence and adapt to complex dynamics remains a significant challenge in practical applications. Existing technologies mainly suffer from the following key technical deficiencies:
[0005] 1. Traditional linear protocols suffer from performance bottlenecks and parameter tuning difficulties: Traditional fixed-form linear protocols typically rely on neighbor-based diffusion feedback laws. Although theoretically sound, in practical multi-agent networks, careful adjustment of gain or weight parameters is often necessary to ensure stability. These methods are conservative in the face of uncertainty and communication delays, and face an inherent "disagreement-effort trade-off," making it difficult to achieve optimal transient performance through limited parameter adjustments.
[0006] 2. Lack of theoretical guarantees for stability in conventional neural network controllers: While introducing neural network (NN) strategies can improve transient response and adapt to complex dynamics, embedding them in feedback loops introduces core safety hazards. The nonlinear mapping induced by neural networks has "black box" characteristics, making it difficult to analyze the stability and robustness of the closed-loop system. Unconstrained training often results in controllers that are vulnerable or even unstable under distribution shifts or unmodeled disturbances, making it difficult to meet the stringent requirements of safety-critical systems.
[0007] 3. Existing "stability-guaranteed learning" methods struggle to address the coupling problem in multi-agent networked systems: While stability-guaranteed learning methods based on Lyapunov construction or convex relaxation exist (such as neural observers and Lipschitz-constrained architectures), most current results are limited to the control of a single plant. They cannot directly handle the graph-coupled closed-loop dynamics of networked environments, nor do they enforce distributed implementation constraints throughout the optimization process. Furthermore, existing methods often focus on post-training validation rather than maintaining certificate feasibility throughout the entire training process, resulting in low training efficiency and the possibility that the final model may still violate distributed constraints.
Summary of the Invention
[0008] This invention proposes a neural network (NN) multi-agent consensus control method based on constructive certificates. Through a specific network architecture and a constructive training process, it ensures that networked multi-agent systems (MASs) maintain closed-loop stability while learning complex transient behaviors.
[0009] The above objectives are achieved through the following technical solutions:
[0010] The present invention provides a neural network multi-agent consensus control method based on constructive certificates, which includes the following steps:
[0011] 1) Construct a distributed control network for a multi-agent system, and configure a residual neural network (ResNN) controller with shared parameters for each agent; the controller topology includes parallel linear shortcut branches and nonlinear residual branches;
[0012] 2) Define a stability constraint mechanism based on constructive certificates, calculate the upper bound of the global Lipschitz constant of the residual branch, and dynamically constrain the gain of the linear shortcut branch accordingly to construct the stability certificate of the closed-loop system.
[0013] 3) Perform end-to-end training based on differentiable simulation, unfold system dynamics within the time horizon, and jointly optimize controller parameters and stability certificate parameters through backpropagation;
[0014] 4) Update the ResNN network weights through spectral normalization projection, balance the consensus convergence speed and the control energy consumption of the multi-agent system based on the comprehensive loss function, and output the trained ResNN controller model.
[0015] Furthermore, in step 1), the ResNN controller is a parameterized policy whose input is the agent's local diffusion error signal and whose output is the control signal;
[0016] The linear shortcut branch applies the linear shortcut gain k to the input; k is a learnable scalar gain used to maintain baseline stability;
[0017] The nonlinear residual branch adopts a bias-free multilayer perceptron (MLP) structure comprising an input layer, two hidden layers, and an output layer, with layer widths n0=1, n1=16, n2=16, and n3=1. The activation function is ReLU, and the branch is used to fit nonlinear transient behavior.
[0018] Furthermore, in step 2), the stability constraint mechanism is derived as follows: using Lyapunov theory, it is shown that when the linear shortcut gain k is greater than the global Lipschitz constant of the residual branch, the system achieves exponential consensus convergence; during training, a computable upper bound β_ub of this Lipschitz constant is constructed and k > β_ub is enforced, which ensures that the system always has a positive stability margin k − β_ub.
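The gain-dominance condition can be illustrated with a short Lyapunov argument. The following is only a sketch under assumptions that are not spelled out in this text: continuous-time first-order agents \dot{x}_i = u_i, an undirected connected graph with Laplacian L and second-smallest eigenvalue \lambda_2, local error z = Lx, the sign convention u_i = -k z_i + \phi(z_i), and a residual \phi with \phi(0) = 0 whose Lipschitz constant is at most \beta_{ub}.

```latex
\begin{aligned}
V(x) &= \tfrac{1}{2}\, x^\top L x, \qquad z = Lx, \qquad u = -k z + \phi(z),\\
\dot V &= x^\top L \dot x = z^\top u
        = -k \|z\|^2 + z^\top \phi(z)
        \le -(k - \beta_{ub}) \|z\|^2
        \quad \text{since } |\phi(z_i)| \le \beta_{ub} |z_i|,\\
\|z\|^2 &= x^\top L^2 x \;\ge\; \lambda_2\, x^\top L x = 2 \lambda_2 V,\\
\dot V &\le -2 \lambda_2 (k - \beta_{ub})\, V
\;\Longrightarrow\;
V(t) \le V(0)\, e^{-2 \lambda_2 (k - \beta_{ub})\, t}.
\end{aligned}
```

Under these assumptions the decay rate scales with the margin k − β_ub. The discrete-time Euler rollout used in training would additionally require a sufficiently small step size, which this sketch does not analyze.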
[0019] Furthermore, in steps 3) and 4), the specific training and updating processes include:
[0020] A1: Sample the initial state and perform a K-step forward differentiable simulation (rollout):
[0021] ;
[0022] A2: Calculate the comprehensive loss function, given by:
[0023]
[0024] where the terms are the disagreement energy, the control effort, and the weighting factor that balances them;
[0025] A3: After the gradient descent update, perform spectral clipping on the weights of each layer of the MLP so that the spectral norm of each layer's weights does not exceed the dynamic threshold, which satisfies the stability certificate requirement of step 2). The dynamic threshold is a learnable parameter, parameterized and updated as follows:
[0026]
[0027] In the formula, the sigmoid function is applied to trainable scalar parameters of the network, and the two bounds are the preset lower and upper bounds of the spectral norm, respectively.
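The sigmoid-based threshold can be sketched as follows. Since the formula itself is not reproduced in this text, the form c = c_min + (c_max − c_min)·sigmoid(s) is an assumption inferred from the description, and the names `spectral_threshold`, `s`, `C_MIN`, and `C_MAX` as well as the numeric bounds are hypothetical.

```python
import math


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def spectral_threshold(s: float, c_min: float, c_max: float) -> float:
    """Map an unconstrained scalar s to a threshold between c_min and c_max.

    Assumed form: c = c_min + (c_max - c_min) * sigmoid(s).
    """
    if not c_min < c_max:
        raise ValueError("c_min must be smaller than c_max")
    return c_min + (c_max - c_min) * sigmoid(s)


# Hypothetical bounds; the preset values are not given in this text.
C_MIN, C_MAX = 0.5, 1.5
threshold_at_zero = spectral_threshold(0.0, C_MIN, C_MAX)  # midpoint of the bounds
```

Because the sigmoid is bounded, gradient updates on `s` can never push the threshold outside the preset interval, so no separate projection of the threshold is needed.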
[0028] Compared with the prior art, the beneficial effects of the present invention include:
[0029] 1. Stability guarantee throughout the entire process:
[0030] Unlike the traditional "train first, validate later" approach, this invention ensures that the ResNN controller meets the closed-loop stability certificate at every step of training through constructive parameterization and projection, thus avoiding training divergence.
[0031] 2. Specific lightweight network design:
[0032] The lightweight MLP structure achieves excellent control performance at low computational cost, making it easy to deploy on agents with limited computing resources.
[0033] 3. Convergence performance is controllable:
[0034] It is theoretically proven that the convergence rate of the system depends on the certificate margin (the difference between the shortcut gain and the upper bound of the residual gain). By explicitly optimizing this margin, the method can effectively improve the transient response speed of the system.
[0035] Experiments show that the present invention achieves comprehensive performance comparable to the optimal linear controller without a manual grid search over parameters, significantly reduces control energy consumption, and provides strict Lyapunov stability guarantees throughout the training process.
Attached Figure Description
[0036] Figure 1 is a schematic diagram of the policy parameterization structure of the ResNN controller of the present invention, showing the parallel structure of the linear shortcut connection and the bias-free MLP residual branch;
[0037] Figure 2 is a schematic diagram of the communication topology of the multi-agent system used for simulation in the embodiments of the present invention;
[0038] Figure 3 is a schematic diagram of the convergence curve of the loss function versus the number of iterations during the training process of this invention;
[0039] Figure 4 is a schematic diagram of the dynamic evolution of the shortcut gain and the upper bound of the residual gain during the training process of this invention, showing that the difference between the two (i.e., the stability margin) remains positive throughout;
[0040] Figures 5(a) and 5(b) are schematic diagrams comparing the transient response performance of the method of the present invention (Ours) with the benchmark methods (BL1 to BL3); Figure 5(a) shows the average disagreement energy over time, and Figure 5(b) shows the average control effort over time;
[0041] Figures 6(a) to 6(d) are comparison diagrams of representative state trajectories under the method of the present invention (Ours) and the benchmark methods (BL1 to BL3), respectively;
[0042] Figures 7(a) and 7(b) are schematic diagrams of the average disagreement energy (Fig. 7(a)) and control norm (Fig. 7(b)) curves of the present invention under a sweep of the weight parameter;
[0043] Figure 8 is a visualization of the Pareto frontier between the disagreement cost and the control cost of the present invention under different weight parameters;
[0044] Figures 9(a) to 9(d) are schematic diagrams of the evolution of the state trajectories of each agent under different weight parameters.
Detailed Implementation
[0045] The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that these embodiments are for illustrative purposes only and are not intended to limit the scope of the invention.
[0046] The neural network multi-agent consensus control method of the present invention includes the following steps:
[0047] S1: Constructing a distributed ResNN controller: For leaderless multi-agent systems, a shared control strategy based on residual neural networks is constructed, which consists of a linear shortcut branch and a neural network residual branch connected in parallel;
[0048] S2: Establish a stability certificate mechanism based on gain dominance: calculate a global upper bound on the gain (Lipschitz constant) of the neural network residual branch, and set the gain of the linear shortcut branch to be strictly greater than this upper bound, forming a sufficient condition that guarantees consensus convergence.
[0049] S3: Perform constructive certificate training: Use a differentiable simulation framework to train the controller end-to-end. In each iteration of training, the stability certificate is forced to be satisfied through parameterized constraints and projection updates to ensure that the controller maintains closed-loop stability throughout the training process.
[0050] S4: Distributed Execution and Coordination: The trained controller calculates the control input based solely on the agent's local neighbor information, driving the multi-agent system to achieve leaderless consensus.
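The distributed execution in step S4 can be sketched as below: each agent forms its diffusion error from its own state and its neighbors' states only, then applies the same shared policy. The 4-agent ring `NEIGHBORS`, the sign convention z_i = sum over neighbors of (x_i − x_j), and `placeholder_policy` (a plain linear law standing in for the trained ResNN) are all illustrative assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical 4-agent ring with unit edge weights (not the topology of Figure 2).
NEIGHBORS: Dict[int, List[int]] = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}


def local_error(i: int, x: List[float], neighbors: Dict[int, List[int]]) -> float:
    """Diffusion error of agent i, using only its own and its neighbors' states."""
    return sum(x[i] - x[j] for j in neighbors[i])


def distributed_step(
    x: List[float],
    neighbors: Dict[int, List[int]],
    policy: Callable[[float], float],
) -> List[float]:
    """Every agent evaluates the same shared policy on its own local error."""
    return [policy(local_error(i, x, neighbors)) for i in range(len(x))]


def placeholder_policy(z: float) -> float:
    """Stand-in for the trained ResNN policy: a linear law with unit gain."""
    return -1.0 * z


states = [1.0, 2.0, 3.0, 4.0]
errors = [local_error(i, states, NEIGHBORS) for i in range(len(states))]
controls = distributed_step(states, NEIGHBORS, placeholder_policy)
```

No agent reads the global state vector or the full Laplacian. When all states are equal, every local error vanishes and, for any policy that maps zero to zero, so does every control input.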
[0051] In step S1, the specific structure of the distributed ResNN controller is as follows:
[0052] The input signal is the local diffusion error signal, defined as the weighted sum of the state differences between an agent and its neighbors;
[0053] The control law adopts a shared policy composed of a linear shortcut term and a residual term, where k is the shortcut gain of the linear branch and the residual term is the output of a bias-free multilayer perceptron (MLP);
[0054] The MLP residual branch contains two hidden layers, and the neuron widths of the input layer, the first hidden layer, the second hidden layer, and the output layer are configured as n0=1, n1=16, n2=16, and n3=1, respectively. The activation function is the rectified linear unit (ReLU).
[0055] The technical principle behind MLP's use of this specific configuration is as follows:
[0056] Physical matching of dimensions: Since the controlled object is a first-order multi-agent system, both the local diffusion error signal and the control input are scalars. Therefore, the widths of the input layer and the output layer are strictly constrained to n0=1 and n3=1, respectively.
[0057] Transient reshaping and nonlinear representation: By configuring two hidden layers with a width of 16 (n1=16, n2=16), the controller is given sufficient nonlinear fitting capability under extremely low computational load, breaking through the inherent "convergence speed-energy consumption" bottleneck of traditional linear controllers. It can provide strong feedback when the system deviation is large and output a smooth and gentle signal when approaching consistency.
[0058] Guarantee of forward invariance of the system: The residual branch strictly adopts a bias-free design combined with the ReLU activation function. This design inherently guarantees that when the input is 0, the network output is always 0. This means that when the agents reach consensus (i.e., the local error signal vanishes), the residual network does not generate any additional action signal, which guarantees in theory the forward invariance of the consensus manifold and supports the convergence of the system.
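A minimal sketch of the 1-16-16-1 bias-free ReLU residual branch follows, written in plain Python so that it runs without any framework. The random initialization, the helper names, and the sign convention u = −k·z + φ(z) are assumptions made for illustration; only the widths, the absence of biases, and the ReLU activation come from the description.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def init_weights(widths: Sequence[int] = (1, 16, 16, 1), seed: int = 0,
                 scale: float = 0.3) -> List[Matrix]:
    """One weight matrix per layer (rows = outputs, cols = inputs); no biases."""
    rng = random.Random(seed)
    return [[[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(widths[:-1], widths[1:])]


def matvec(W: Matrix, v: List[float]) -> List[float]:
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def residual_branch(z: float, weights: List[Matrix]) -> float:
    """Bias-free MLP: ReLU after each hidden layer, linear output layer."""
    h = [z]
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)[0]


def resnn_policy(z: float, k: float, weights: List[Matrix]) -> float:
    """Shortcut plus residual; the sign convention is an assumption."""
    return -k * z + residual_branch(z, weights)


weights = init_weights()
output_at_zero = residual_branch(0.0, weights)
```

Because every layer is a matrix product followed by ReLU and no bias is added anywhere, a zero input propagates as zeros through all layers, whatever the weight values are.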
[0059] In step S3, during the constructive certificate training, the parameterization and update of the gain k include the following:
[0060] An unconstrained variable is introduced, and the gain k is reparameterized using the Softplus function, where β_ub is the upper bound of the residual branch gain calculated in the current iteration step; this formula enforces k > β_ub, that is, the stability margin k − β_ub is always positive.
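The reparameterization can be sketched as follows. The exact formula is not reproduced in this text, so the form k = β_ub + softplus(ρ) + ε is an assumption: `rho` is a hypothetical name for the unconstrained variable, and the small floor `eps` is an addition of this sketch that guards against floating-point underflow of the Softplus for very negative inputs.

```python
import math


def softplus(r: float) -> float:
    """Numerically stable softplus: log(1 + exp(r))."""
    return max(r, 0.0) + math.log1p(math.exp(-abs(r)))


def shortcut_gain(rho: float, beta_ub: float, eps: float = 1e-3) -> float:
    """Gain that exceeds the residual bound beta_ub by construction.

    Assumed form: k = beta_ub + softplus(rho) + eps, with eps > 0.
    """
    return beta_ub + softplus(rho) + eps


# Whatever value the optimizer assigns to rho, the margin k - beta_ub stays positive.
margins = [shortcut_gain(rho, beta) - beta
           for rho in (-50.0, -5.0, 0.0, 5.0, 50.0)
           for beta in (0.1, 1.0, 10.0)]
```

Since the gain is recomputed from the current bound in every iteration, a change in the residual weights that raises β_ub automatically raises k with it, which is what keeps the certificate feasible during training rather than only at the end.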
[0061] In step S4, the specific operation of spectral normalization projection is as follows:
[0062] Set an upper limit for the spectral norm of each layer's weights. After each gradient descent update, a spectral truncation projection is applied to each weight matrix. By restricting the weight norms, growth of the Lipschitz constant of the residual branch is prevented, thus maintaining the feasibility of the stability certificate.
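The projection and the resulting Lipschitz bound can be sketched as below. To stay dependency-free, this sketch swaps in the conservative bound sqrt(‖W‖₁·‖W‖∞) ≥ ‖W‖₂ in place of an exact SVD-based spectral norm, and it rescales the whole matrix rather than truncating individual singular values. Taking the product of per-layer bounds as β_ub is a standard bound for ReLU networks, but the text does not confirm that this is the bound used. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def spectral_norm_upper(W: Matrix) -> float:
    """Upper bound sqrt(||W||_1 * ||W||_inf) on the spectral norm ||W||_2."""
    max_row_sum = max(sum(abs(w) for w in row) for row in W)            # ||W||_inf
    max_col_sum = max(sum(abs(row[c]) for row in W)
                      for c in range(len(W[0])))                        # ||W||_1
    return math.sqrt(max_row_sum * max_col_sum)


def clip_spectral(W: Matrix, c: float) -> Matrix:
    """Rescale W so that its spectral-norm bound does not exceed c."""
    bound = spectral_norm_upper(W)
    scale = 1.0 if bound <= c else c / bound
    return [[scale * w for w in row] for row in W]


def lipschitz_upper_bound(weights: List[Matrix]) -> float:
    """Product of per-layer bounds; valid because ReLU is 1-Lipschitz."""
    beta = 1.0
    for W in weights:
        beta *= spectral_norm_upper(W)
    return beta


# Tiny 1-2-2-1 example with a hypothetical per-layer threshold of 1.0.
layers = [[[3.0], [4.0]], [[2.0, 0.0], [0.0, 1.0]], [[1.0, -1.0]]]
clipped = [clip_spectral(W, 1.0) for W in layers]
beta_ub = lipschitz_upper_bound(clipped)
```

An upper bound is used deliberately: an estimate that can fall below the true norm (for example, a few steps of power iteration) could understate β_ub and silently invalidate the certificate.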
[0063] In steps S3 and S4, the construction and calculation of the comprehensive loss function includes:
[0064] Using the forward Euler method with integration step size h, predicted trajectories are generated within a finite time horizon T, and the comprehensive loss function is calculated. It consists of a disagreement energy term that measures consensus, a control effort term that measures energy consumption, and a weighting coefficient that balances the two.
[0065] Disagreement energy: measures the overall deviation of the current states of all agents from the system's mean state; minimizing this term helps the multi-agent system achieve state synchronization quickly.
[0066] Control effort: measures the overall energy consumption and actuation amplitude required for the controller to drive all agents; minimizing this term avoids excessive control signals and thus achieves energy-saving control.
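The two terms and their weighted combination can be sketched as follows. The loss formula is not reproduced in this text, so the rectangle-rule time sum, the 1/N normalization, and the form J = E_dis + λ·E_u are assumptions, and the function and parameter names are hypothetical.

```python
from typing import List, Sequence


def disagreement_energy(xs: Sequence[Sequence[float]], h: float) -> float:
    """Time-summed mean squared deviation of the states from their average."""
    total = 0.0
    for x in xs:
        mean = sum(x) / len(x)
        total += h * sum((xi - mean) ** 2 for xi in x) / len(x)
    return total


def control_effort(us: Sequence[Sequence[float]], h: float) -> float:
    """Time-summed mean squared control input."""
    return sum(h * sum(ui * ui for ui in u) / len(u) for u in us)


def combined_loss(xs: Sequence[Sequence[float]], us: Sequence[Sequence[float]],
                  h: float, lam: float) -> float:
    """Assumed form: J = E_dis + lam * E_u."""
    return disagreement_energy(xs, h) + lam * control_effort(us, h)


# Two time steps of a hypothetical 2-agent trajectory.
xs: List[List[float]] = [[1.0, -1.0], [0.5, -0.5]]
us: List[List[float]] = [[-1.0, 1.0], [-0.5, 0.5]]
loss = combined_loss(xs, us, h=0.1, lam=2.0)
```

A larger `lam` penalizes actuation more heavily and favors gentler, slower controllers; a smaller `lam` favors fast agreement at a higher control cost.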
[0067] In this embodiment, to address the challenges of nonlinear fitting and stability verification in the cooperative control of multi-agent systems (MAS), a complete "Correct-by-Construction" control and training scheme is constructed.
[0068] 1. Experimental Environment and System Setup
[0069] This embodiment considers a multi-agent system whose communication topology, shown in Figure 2, is a fixed undirected connected graph with unit edge weights. The system is characterized by the Laplacian eigenvalues of this topology. The simulation uses a fixed integration step size h over a finite evaluation horizon T (with a corresponding grid length), and the initial states are drawn from a uniform distribution over a fixed interval.
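A setup of this kind can be sketched as below. The number of agents, the edge list, and the sampling interval are not recoverable from this text, so the 5-agent ring and the interval [-1, 1] are purely hypothetical stand-ins.

```python
import random
from collections import deque
from typing import List, Tuple


def laplacian(n: int, edges: List[Tuple[int, int]]) -> List[List[float]]:
    """Graph Laplacian L = D - A for an undirected graph with unit edge weights."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def is_connected(n: int, edges: List[Tuple[int, int]]) -> bool:
    """Breadth-first search from node 0 over the undirected edges."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n


# Hypothetical stand-in for the topology of Figure 2.
N_AGENTS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
LAPLACIAN = laplacian(N_AGENTS, EDGES)

rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(N_AGENTS)]  # assumed sampling interval
```

Connectivity matters for the certificate: on a disconnected graph the Laplacian has more than one zero eigenvalue, and no local feedback law can drive separate components to a common value.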
[0070] 2. ResNN Controller Construction
[0071] As shown in Figure 1, this invention designs a parameterized ResNN controller. Its mathematical expression is:
[0072]
[0073] The specific structure includes:
[0074] (1) Linear shortcut branch (shortcut connection): consists of the linear term scaled by k, where k is a learnable scalar gain. This branch provides basic linear feedback, similar to a proportional controller, to establish the baseline stability of the system.
[0075] (2) Residual branch: constructed using a bias-free multilayer perceptron (MLP).
[0076] In this embodiment, the MLP includes two hidden layers, and the neuron widths (the number of neurons in each layer) are configured as n0=1, n1=16, n2=16, and n3=1. The activation function is ReLU. This branch is used to capture complex nonlinear dynamic characteristics to optimize the transient response.
[0077] 3. Constructive Certificate Training Process
[0078] The core of this invention lies in employing a training strategy based on constructive certificates, which enforces Lyapunov stability conditions through parameterized constraints at each step of training. The specific implementation steps are as follows:
[0079] ① Initialization:
[0080] Initialize the network weights and the trainable scalar parameters, and set the optimizer learning rate and the batch size.
[0081] ② Dynamic stability constraints (Certificate):
[0082] In each iteration, the upper bound β_ub of the Lipschitz constant of the residual branch is calculated first. The shortcut gain k is then parameterized to ensure k > β_ub. As shown in Figure 4, over the training iterations both the shortcut gain (orange line) and the bound (blue line) are adjusted dynamically, but the difference between the two (green line) always remains strictly positive. The stability of the closed-loop system is thus ensured by construction.
[0083] ③ Differentiable simulation (Rollout):
[0084] Sample the initial conditions and perform a K-step time evolution:
[0085] (1)
[0086] (2)
[0087] (3)
[0088] where:
[0089] m is the index of the discrete time step; h is the time step size of the numerical integration.
[0090] x[m] is the joint state vector of all agents at step m;
[0091] L is the Laplacian matrix that reflects the communication topology of the agents;
[0092] Equation (1) gives the vector of local diffusion error signals obtained by all agents through the topology graph;
[0093] The joint control input vector collects the control inputs of all agents; the residual term is the output vector obtained when each agent independently applies the shared residual network weights;
[0094] Equation (2) indicates that the control input is calculated using a shared ResNN strategy;
[0095] Equation (3) represents the iterative update of the system state to the next step based on the current state and control input;
[0096] This process utilizes an automatic differentiation framework, allowing gradient backpropagation to traverse the entire time horizon.
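The three-step rollout can be sketched as below. Equations (1) to (3) are not reproduced in this text, so the forms z[m] = L·x[m], u_i[m] = policy(z_i[m]), and x[m+1] = x[m] + h·u[m] are assumptions based on the surrounding description of a first-order system integrated with forward Euler. The sketch uses plain Python and carries no gradients; in an automatic differentiation framework the same loop would be unrolled and differentiated. The 3-agent path graph and the linear stand-in policy are hypothetical.

```python
from typing import Callable, List, Tuple


def rollout(x0: List[float], L: List[List[float]], policy: Callable[[float], float],
            h: float, K: int) -> Tuple[List[List[float]], List[List[float]]]:
    """Unroll K forward-Euler steps; returns the state and control trajectories."""
    n = len(x0)
    x = list(x0)
    xs, us = [list(x)], []
    for _ in range(K):
        z = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]  # (1) local errors
        u = [policy(zi) for zi in z]                                   # (2) shared policy
        x = [xi + h * ui for xi, ui in zip(x, u)]                      # (3) Euler update
        xs.append(list(x))
        us.append(u)
    return xs, us


# Hypothetical 3-agent path graph and a linear stand-in for the ResNN policy.
PATH_LAPLACIAN = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
xs, us = rollout([1.0, 0.0, -1.0], PATH_LAPLACIAN, lambda z: -1.0 * z, h=0.05, K=100)
final_spread = max(xs[-1]) - min(xs[-1])
```

With the linear stand-in policy on an undirected graph the state average is preserved and the spread between agents shrinks at every step; a nonlinear residual term need not preserve the average, although the disagreement can still decay.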
[0097] ④ Loss Calculation and Optimization:
[0098] Calculate the comprehensive loss function. The formula is:
[0099]
[0100] In this embodiment, a loss scaling normalization is set at initialization to balance the orders of magnitude of the disagreement energy and the control effort. As shown in Figure 3, the training loss converges quickly and then stabilizes.
[0101] ⑤ Spectral projection:
[0102] After updating the weights, spectral clipping is applied to bound the spectral norm of each layer's weights. This prevents the Lipschitz constant from exploding and maintains the validity of the certificate.
[0103] 4. Performance Verification and Result Analysis
[0104] Under the same experimental conditions, this embodiment compares the performance of the constructive certificate-based ResNN controller proposed in this invention with three benchmark methods (BL1 to BL3) to verify the advantages of this method in terms of "convergence accuracy", "control energy consumption" and "overall performance".
[0105] 4.1 Comparison Benchmarks and Evaluation Indicators
[0106] (1) Comparison baseline settings:
[0107] The method of this invention (Ours): a ResNN controller trained with a constructive certificate.
[0108] BL1 (Same-Gain Linear Control): employs a linear controller whose gain is set directly to the shortcut gain value obtained by training in this invention.
[0109] BL2 (Tuned-Gain Linear Control): employs a linear controller whose gain is optimized by grid search on the validation set to minimize the overall loss. This is taken as the best performance that a linear controller can achieve.
[0110] BL3 (Unconstrained ResNN): It adopts the same ResNN architecture and parameterization method as this invention, but removes the spectral clipping step, that is, it does not enforce stability certificate constraints.
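The BL2 baseline can be sketched as a plain grid search over the gain of a linear consensus law u = −k·L·x, scored with the combined loss. The candidate grid, the validation states, the path graph, the step size, and the loss normalization are all hypothetical choices for illustration and are not the values used in the embodiment.

```python
from typing import List, Sequence


def linear_cost(k: float, x0: Sequence[float], L: List[List[float]],
                h: float, K: int, lam: float) -> float:
    """Combined loss of the linear law u = -k * L x over a K-step Euler rollout."""
    n = len(x0)
    x = list(x0)
    cost = 0.0
    for _ in range(K):
        mean = sum(x) / n
        z = [sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        u = [-k * zi for zi in z]
        cost += h * (sum((xi - mean) ** 2 for xi in x) + lam * sum(ui * ui for ui in u)) / n
        x = [xi + h * ui for xi, ui in zip(x, u)]
    return cost


def grid_search_gain(gains: Sequence[float], x0s: Sequence[Sequence[float]],
                     L: List[List[float]], h: float, K: int, lam: float) -> float:
    """Return the candidate gain with the lowest average validation loss."""
    def average_cost(k: float) -> float:
        return sum(linear_cost(k, x0, L, h, K, lam) for x0 in x0s) / len(x0s)
    return min(gains, key=average_cost)


# Hypothetical validation setup on a 3-agent path graph.
PATH_L = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
GAINS = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
VALIDATION_X0 = [[1.0, 0.0, -1.0], [0.5, -1.0, 0.5], [1.0, -0.5, 0.2]]
best_cheap_control = grid_search_gain(GAINS, VALIDATION_X0, PATH_L, h=0.05, K=200, lam=0.01)
best_costly_control = grid_search_gain(GAINS, VALIDATION_X0, PATH_L, h=0.05, K=200, lam=10.0)
```

The result is only the best gain within the searched grid, and the search has to be repeated whenever the weighting changes, which is the manual tuning effort the learned controller is meant to avoid.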
[0111] (2) Evaluation indicators: the following three indicators are used to evaluate transient performance within the evaluation horizon. Results are averaged over the trials and reported as mean (standard deviation).
[0112] Disagreement energy integral: the smaller the value, the better the consensus convergence.
[0113] Control effort integral: the smaller the value, the lower the system energy consumption.
[0114] Overall score: a comprehensive indicator that measures both performance and cost.
[0115] 4.2 Comparison of Experimental Results
[0116] The performance data of each model on the test set are shown in Table 1.
[0117] Table 1 Performance comparison between different models
[0118]
[0119] 4.3 Results Analysis
[0120] Based on the experimental results in Table 1 and Figures 5(a), 5(b), and 6(a) to 6(d), the specific analysis is as follows:
[0121] 1. Energy efficiency is significantly better than direct linear control (compared to BL1): although BL1 has a slight advantage in disagreement energy, this comes at the cost of extremely high control energy consumption. In contrast, this invention significantly reduces control effort while maintaining good convergence performance. This demonstrates that the nonlinear residual branch of the ResNN effectively reshapes the control strategy, achieving more efficient control without sacrificing stability.
[0122] 2. Optimal linear performance is achieved without parameter tuning (compared to BL2): BL2 represents the "ceiling" performance of a linear controller after tedious grid-search tuning. Experimental data show that the overall score of this invention is consistent with that of BL2. This means that the present invention does not require an explicit gain grid search; it finds the performance balance point automatically through end-to-end training and additionally provides a theoretical stability certificate, which gives it higher practical engineering value.
[0123] 3. Verification of the necessity of constructive certificates (compared to BL3): although BL3, with the stability constraints removed, has the lowest control energy consumption, its disagreement energy surges, resulting in the worst overall score. This directly demonstrates the crucial role of the "spectral clipping" and "constructive certificate" mechanisms in this invention: without these constraints, the neural network easily becomes trapped in local optima in which the control objective is abandoned in order to reduce energy consumption, which can even lead to a collapse of system performance.
[0124] 4. Pareto trade-off validation: by adjusting the weight in the loss function, this invention can generate a series of controllers with different trade-off characteristics. As shown in Figure 8, as the weight increases, the disagreement cost and the control cost exhibit a clear Pareto front. This demonstrates that the present invention can flexibly customize the control strategy according to different preferences for "response speed" or "energy saving" in actual applications.
[0125] Implementation results: through specific parameter comparisons and ablation experiments, this embodiment verifies the effectiveness of the ResNN control method based on constructive certificates. Its core advantages are that it achieves comprehensive performance comparable to optimally tuned linear controllers, significantly reduces control energy consumption, and, more importantly, ensures the validity of the certificate through mathematical construction throughout the entire training process. This addresses the trust problem that makes traditional neural network controllers difficult to apply in safety-critical systems.
[0126] Summary:
[0127] This invention discloses a ResNN multi-agent consensus control method based on constructive certificates, belonging to the interdisciplinary field of multi-agent cooperative control and deep learning. The method includes:
[0128] 1) Construct a distributed controller based on residual neural network (ResNN), and adopt parallel linear shortcut branch and nonlinear residual branch structure to enhance nonlinear transient regulation capability while maintaining baseline stability;
[0129] 2) Establish a stability constraint mechanism based on constructive certificates, calculate the upper bound of the Lipschitz constant of the residual branch in real time during training, and dynamically adjust the shortcut gain accordingly to ensure that the stability margin of the closed-loop system is always positive.
[0130] 3) Perform end-to-end training based on differentiable simulation, and combine spectral clipping with the comprehensive loss function to jointly optimize the controller parameters.
[0131] Experiments show that the present invention achieves comprehensive performance comparable to the optimal linear controller without a manual grid search over parameters, significantly reduces control energy consumption, and provides strict Lyapunov stability guarantees throughout the training process.
Claims
1. A neural network multi-agent consensus control method based on constructive certificates, characterized in that it includes the following steps: (i) construct a distributed control network for a multi-agent system, and configure a ResNN controller with shared parameters for each agent, the controller topology including parallel linear shortcut branches and residual branches; (ii) define a stability constraint mechanism based on constructive certificates, calculate the upper bound of the global Lipschitz constant of the residual branch, and dynamically constrain the gain of the linear shortcut branch accordingly to construct the stability certificate of the closed-loop system; (iii) perform end-to-end training based on differentiable simulation, unfold the system dynamics within the time horizon, and jointly optimize the controller parameters and the stability certificate parameters through backpropagation; (iv) update the ResNN network weights through spectral normalization projection, and balance the consensus convergence speed and the control energy consumption of the multi-agent system based on the comprehensive loss function to obtain the trained ResNN controller model; (v) in the multi-agent system, the controller calculates the control input based only on the local neighbor information of the agent itself, and drives the multi-agent system to achieve leaderless consensus.
2. The method of claim 1, wherein, in step (i), the ResNN controller configured for each agent is a parameterized ResNN controller whose expression combines a linear shortcut branch and a nonlinear residual branch, wherein the linear shortcut branch applies the linear shortcut gain, which is a learnable scalar gain, to the agent's local diffusion error signal; the nonlinear residual branch employs a bias-free multilayer perceptron (MLP); and the total parameter set of the ResNN control policy includes the set of weight matrices of the residual branch.
3. The neural network multi-agent consensus control method based on constructive certificates according to claim 2, characterized in that, in the residual branch, the multilayer perceptron (MLP) includes two hidden layers, the neuron widths of the input layer, the first hidden layer, the second hidden layer, and the output layer are configured as n0=1, n1=16, n2=16, and n3=1, respectively, and the activation function is ReLU.
4. The neural network multi-agent consensus control method based on constructive certificates according to claim 1, characterized in that, in step (ii), the upper bound β_ub of the Lipschitz constant of the residual branch is first calculated; the linear shortcut gain k is then set by parameterization, which ensures k > β_ub and is used as a dynamic stability constraint for the training in step (iii).
5. The method of claim 4, wherein, in step (iii), a training strategy based on constructive certificates is adopted, and at each step of training the Lyapunov stability condition is enforced through parameterized constraints. The specific steps include: 1) in each iteration, parameterize and update the gain k by introducing an unconstrained variable and reparameterizing k with the Softplus function, where β_ub is the upper bound of the global Lipschitz constant of the residual branch calculated in the current iteration step; 2) training and updates: first, the initial conditions are sampled and a K-step time evolution is performed according to equations (1), (2), and (3), where m is the index of the discrete time step; h is the time step size of the numerical integration; x[m] is the joint state vector of all agents at step m; L is the Laplacian matrix that reflects the communication topology of the agents; equation (1) gives the vector of local diffusion error signals obtained by all agents through the topology graph; the joint control input vector collects the control inputs of all agents, and the residual term is the output vector obtained when each agent independently applies the shared residual network weights; equation (2) indicates that the control input is calculated using the shared ResNN policy; equation (3) represents the iterative update of the system state to the next step based on the current state and the control input; then, loss calculation and optimization are performed with the comprehensive loss function, in which the parameter t represents the continuous time variable, the parameter T represents the time horizon, and the parameter N represents the total number of agents in the multi-agent system, and whose terms are the disagreement energy, the control effort, and the weighting factor.
6. The neural network multi-agent consensus control method based on constructive certificates according to claim 2, characterized in that the specific operation of spectral normalization projection in step (iv) is as follows: an upper limit is set for the spectral norm of the weights of each layer of the MLP; after each gradient descent update, a spectral truncation projection is applied to each weight matrix; the upper limit is a dynamic threshold, which is a learnable parameter whose parameterization and update use the sigmoid function applied to trainable scalar parameters of the network, together with preset lower and upper bounds of the spectral norm.