Control methods and systems for output consistency in multi-agent systems under switching topologies
The value function with a time-varying kernel matrix is decomposed into two parts, which are learned by a two-layer reinforcement learning algorithm; this solves the output consistency problem of multi-agent systems under switching topologies. The resulting distributed control strategy applies to both fixed and random topologies, imposes no constraints on the eigenvalues of the leader's dynamic matrix, and requires no system model information.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHANDONG NORMAL UNIV
- Filing Date
- 2023-10-23
- Publication Date
- 2026-06-30
Figure CN117270397B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of multi-agent control technology, and specifically to a control method and system for ensuring output consistency of a multi-agent system under a switching topology.
Background Technology
[0002] The statements in this section are merely background information related to the present invention and do not necessarily constitute prior art.
[0003] An intelligent agent is a computational entity that resides in a specific environment and can continuously and autonomously perform its functions. A computational system in which multiple intelligent agents interact within an environment is called a multi-agent system (MAS); examples include groups of drones, robots, or robotic arms.
[0004] The control of multi-agent systems (MAS) faces various challenges, such as state/output consistency, formation control, and swarm control. Consistency is fundamental to the other control problems; it aims to bring the leader and all followers to the same state. In practical applications, however, the followers may have different states, so not every MAS requires state consistency. In such cases it is sufficient to achieve output consistency.
[0005] Most current solutions to the output consistency problem target MAS with fixed topologies. In practice, however, it is difficult to guarantee that the topology of a MAS remains unchanged. Because the agents exchange information over communication networks, unexpected events such as link failures, node failures, and data packet loss may occur during information exchange. These events change the topology and, when the topology switches, affect the output consistency of the MAS.
[0006] Stochastic and periodic switching topologies (ST) are the two main types of switching topology in MAS. A stochastic ST is usually modeled by a stochastic Markov process, and designing an output consistency controller for it usually requires the model information of the agents. In practice, however, the model information of the system may be inaccurate or unavailable, so the resulting output consistency controller may fail to meet the actual control requirements.
[0007] Furthermore, reinforcement learning (RL) algorithms are among the core algorithms in the field of artificial intelligence. RL removes the need for an exact system model and uses system trajectory data to solve optimal control problems. Data-based distributed controllers have been designed for the optimal output consistency problem of linear MAS without requiring any system model information, but such strategies apply only to MAS with fixed topologies. Moreover, existing RL algorithms have only one learning layer and can iteratively learn only value functions with a time-invariant kernel matrix. For the output consistency problem of MAS under ST, the kernel matrix of the value function may depend on the switching signal and therefore varies over time as the topology switches. Current RL algorithms cannot solve value functions with such time-varying kernel matrices, which in turn prevents them from achieving output consistency of the MAS under switching topologies.
Summary of the Invention
[0008] To address the technical problems described in the background, this invention provides a control method and system for output consistency in a multi-agent system under switching topologies. By employing a two-layer RL algorithm with a preparation layer and a learning layer, it solves the problem of existing algorithms being unable to solve value functions with time-varying kernel matrices. This strategy is applicable to both fixed and random topologies. Furthermore, it imposes no constraints on the eigenvalues of the leader's dynamic matrix and requires no model information from the multi-agent system.
[0009] To achieve the above objectives, the present invention adopts the following technical solution:
[0010] The first aspect of the present invention provides a control method for output consistency of a multi-agent system under a switching topology, comprising the following steps:
[0011] Taking the convergence of the tracking errors of all agents to 0 as the goal, determine a control equation that makes the outputs of the followers and the leader consistent, and optimize the performance index under the switching topology based on this control equation to obtain the optimal control strategy equation.
[0012] The tracking state of the multi-agent system is changed according to the tracking error in the optimal control strategy equation to achieve consistent output control.
[0013] Specifically, a value function with a time-varying kernel matrix is obtained from the tracking state. Using a two-layer reinforcement learning method with a preparation layer and a learning layer, the two parts of this value function are obtained separately; merging them yields the required value function and the corresponding optimal control policy.
[0014] Further, obtaining, with the tracking errors of all agents driven to zero, the control equations that make the outputs of the followers and the leader consistent includes:
[0015] The local tracking error of the agent is determined based on the agent's adjacency elements in the topology, the agent's time-invariant traction gain, and the agent's output.
[0016] The corresponding tracking state is obtained based on the agent's local tracking error, the agent's state, and the input.
[0017] Based on the dynamics of the agent, the dynamics of the leader in the agent, the local tracking error, and the tracking state, the control equations that make the outputs of the followers and the leader consistent are obtained.
[0018] Furthermore, the performance index of the control equation under the switching topology is determined and optimized to obtain the optimal control strategy equation; this includes: determining the value function of each agent based on the admissible control strategy in the performance index, and obtaining the optimal control strategy equation based on the Bellman equation.
[0019] Further, obtaining a value function with a time-varying kernel matrix from the tracking state, obtaining its two parts separately with a two-layer reinforcement learning method comprising a preparation layer and a learning layer, and merging them to obtain the desired value function and the corresponding optimal control policy includes:
[0020] The value function is determined based on the tracking error and kernel matrix of the multi-agent system under the switching topology.
[0021] Based on the switching kernel matrix and the constant kernel matrix, the value of the kernel matrix under the switching topology is obtained;
[0022] The value function with the time-varying kernel matrix is divided into two parts, which are obtained by using the preparation layer and the learning layer in the two-layer reinforcement learning method, respectively.
[0023] Furthermore, the value function with the time-varying kernel matrix is divided into two parts, as shown in the following equation:
[0024]
[0025]
[0026] In the formula, the left-hand side is the value function; P_i > 0 is a constant kernel matrix; the other kernel matrix changes with switching; σ(k) is the switching signal; the data vector is composed of system state, input, and output data, and the superscript T denotes its transpose; k is the time instant; and φ is a positive integer to be designed.
[0027] Further, in the preparation layer, the switching-dependent part of the value function is obtained as shown in the following formula:
[0028]
[0029] In the formula, the terms involve the tracking state and the control policy obtained at iteration s−1; Q_i and R_i are symmetric positive-definite matrices to be selected; and the remaining term is the value function.
[0030] Further, in the learning layer, the following updates are iterated until convergence, yielding the constant-kernel part of the value function:
[0031]
[0032]
[0033] In the formula, the terms involve the tracking state and the control policy obtained at iteration s−1; Q_i and R_i are symmetric positive-definite matrices to be selected; and the remaining term is the value function.
[0034] A second aspect of the present invention provides a system for implementing the above-described method, comprising:
[0035] The target construction module is configured to: taking the convergence of the tracking errors of all agents to 0 as the goal, determine the control equation that makes the outputs of the followers and the leader consistent, and optimize the performance index under the switching topology based on the control equation to obtain the optimal control strategy equation.
[0036] The target output module is configured to change the tracking state of the multi-agent system according to the tracking error in the optimal control strategy equation, thereby achieving consistent output control.
[0037] Specifically, a value function with a time-varying kernel matrix is obtained from the tracking state. Using a two-layer reinforcement learning method with a preparation layer and a learning layer, the two parts of the value function are obtained separately; merging them yields the final required value function and the corresponding optimal control policy.
[0038] A third aspect of the present invention provides a computer-readable storage medium.
[0039] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps in the control method for output consistency of a multi-agent system under a switched topology as described above.
[0040] A fourth aspect of the present invention provides a computer device.
[0041] A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps in the control method for output consistency of a multi-agent system under a switched topology as described above.
[0042] Compared with existing technologies, one or more of the above technical solutions have the following beneficial effects:
[0043] The invention proposes a two-layer reinforcement learning algorithm with a preparation layer and a learning layer, which solves value functions with time-varying kernel matrices that existing algorithms cannot handle. The strategy is applicable to both fixed and random topologies. Furthermore, it imposes no constraints on the eigenvalues of the leader's dynamic matrix and requires no model information of the multi-agent system.
Attached Figure Description
[0044] The accompanying drawings, which form part of this invention, are used to provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of the invention.
[0045] Figure 1 is a schematic diagram of the consistency control process provided in one or more embodiments of the present invention;
[0046] Figure 2 is a schematic diagram of a directed graph topology provided in one or more embodiments of the present invention;
[0047] Figure 3 is a schematic diagram of a random switching signal provided in one or more embodiments of the present invention;
[0048] Figure 4 is a schematic diagram of a periodic switching signal provided in one or more embodiments of the present invention.
Detailed Implementation
[0049] The present invention will be further described below with reference to the accompanying drawings and embodiments.
[0050] It should be noted that the following detailed descriptions are exemplary and intended to provide further illustration of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains.
[0051] Terminology Explanation:
[0052] Switching topology (ST) refers to the change in the topological relationships between nodes over time.
[0053] In a random ST, the topological relationships between nodes change randomly over time.
[0054] In a periodic ST, the topological relationships between nodes change periodically over time.
[0055] Example 1:
[0056] This embodiment provides a control method and system for output consistency of a multi-agent system under switching topologies. A two-layer RL algorithm with a preparation layer and a learning layer solves value functions with time-varying kernel matrices, which existing algorithms cannot handle. The strategy is applicable to both fixed and random topologies, imposes no constraints on the eigenvalues of the leader's dynamic matrix, and requires no model information of the multi-agent system. As shown in Figures 1-4, the method includes the following steps:
[0057] Step 1: Preliminary knowledge and problem formulation for MAS consistency control;
[0058] Step 2: Establish performance metrics for MAS under ST;
[0059] Step 3: Obtain the optimal equation for achieving the MAS consensus goal;
[0060] Step 4: Rewrite the MAS tracking state;
[0061] Step 5: Construct a two-layer reinforcement learning algorithm architecture;
[0062] Step 6: Define the operating mechanism of the two-layer reinforcement learning algorithm;
[0063] Step 7: Design a more general data-based distributed control strategy;
[0064] Step 8: Implementation of the two-layer reinforcement learning algorithm;
[0065] Specifically:
[0066] Step 1: Preliminary knowledge and problem formulation for MAS consistency control.
[0067] Based on graph theory, this step introduces switching topologies (ST) and describes the output consistency problem of linear MAS under ST.
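[0068] Graph theory can describe the topological relationships in a MAS. Owing to unexpected events such as link failures, the topology of the agents may become time-varying. Such a time-varying topology is called a switching topology (ST) and corresponds to a time-varying directed graph $\mathcal{G}(\sigma(k)) = (\mathcal{V}, \mathcal{E}(\sigma(k)), \mathcal{A}(\sigma(k)))$, where σ(k) is the topological state at time k. The graph $\mathcal{G}(\sigma(k))$ represents the topology of the MAS at time k and consists of the node set $\mathcal{V} = \{v_1, \dots, v_N\}$, the edge set $\mathcal{E}(\sigma(k)) \subseteq \mathcal{V} \times \mathcal{V}$, and the weighted adjacency matrix $\mathcal{A}(\sigma(k)) = [a_{ij}(\sigma(k))]$. An edge $(v_j, v_i) \in \mathcal{E}(\sigma(k))$ indicates that information flows from agent j to agent i at time k; in that case $a_{ij}(\sigma(k)) > 0$, and otherwise $a_{ij}(\sigma(k)) = 0$. The neighbor set of agent i is defined as $\mathcal{N}_i(\sigma(k)) = \{v_j : (v_j, v_i) \in \mathcal{E}(\sigma(k))\}$. The Laplacian matrix is defined as $L(\sigma(k)) = D(\sigma(k)) - \mathcal{A}(\sigma(k))$, where the degree matrix is $D(\sigma(k)) = \operatorname{diag}\big(\sum_{j=1}^{N} a_{1j}(\sigma(k)), \dots, \sum_{j=1}^{N} a_{Nj}(\sigma(k))\big)$.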
[0069] Suppose there is a leader v_0. If directed paths exist from v_0 to all other agents, the graph contains a spanning tree rooted at v_0. If agent i can obtain information directly from v_0, its time-invariant traction gain is g_i = 1; otherwise, g_i = 0. The traction matrix is then defined as $G = \operatorname{diag}(g_1, \dots, g_N)$.
[0070] In general, σ(k) switches within a finite set $\{1, 2, \dots, t\}$, where t is the total number of distinct topologies. When the random variable ..., the union of the time-varying topology graphs can be defined as $\bar{\mathcal{G}} = \bigcup_{p=1}^{t} \mathcal{G}(p)$, and the union of the neighbor sets of agent i is $\bar{\mathcal{N}}_i = \bigcup_{p=1}^{t} \mathcal{N}_i(p)$. The union degree matrix and the union weighted adjacency matrix are defined correspondingly.
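The graph quantities above can be checked numerically. The Python sketch below is illustrative only and rests on stated assumptions: topologies are given as hypothetical 0/1 adjacency matrices with `adj[i, j] > 0` meaning that agent i receives information from agent j, the Laplacian is taken as the standard L = D − A with D the row-sum degree matrix, and the union of the topologies is taken as the element-wise maximum of their adjacency matrices (the text does not spell out this formula). The helper `leader_reaches_all` (a hypothetical name) performs a breadth-first search from the leader to test whether every follower can be reached, which is the spanning-tree condition described above.

```python
from collections import deque

import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Laplacian L = D - A, where D is the diagonal matrix of row sums of A."""
    return np.diag(adj.sum(axis=1)) - adj


def union_adjacency(adjs: list) -> np.ndarray:
    """Union graph: keep an edge if it appears in any topology (element-wise max, an assumption)."""
    return np.maximum.reduce([np.asarray(a) for a in adjs])


def leader_reaches_all(adj: np.ndarray, gains: np.ndarray) -> bool:
    """Check that every follower is reachable from the leader v0.

    adj[i, j] > 0 means information flows from agent j to agent i;
    gains[i] = 1 means agent i receives information directly from v0.
    """
    n = adj.shape[0]
    reached = {int(i) for i in np.flatnonzero(gains)}
    queue = deque(reached)
    while queue:
        j = queue.popleft()
        for i in np.flatnonzero(adj[:, j]):  # agents that listen to agent j
            if int(i) not in reached:
                reached.add(int(i))
                queue.append(int(i))
    return len(reached) == n


# Hypothetical example: three followers, two topologies, leader pins agent 0 only.
adj_1 = np.array([[0, 0, 0], [1, 0, 0], [0, 0, 0]], dtype=float)  # agent 0 -> agent 1
adj_2 = np.array([[0, 0, 0], [0, 0, 0], [0, 1, 0]], dtype=float)  # agent 1 -> agent 2
gains = np.array([1, 0, 0])
G = np.diag(gains)  # traction matrix
adj_union = union_adjacency([adj_1, adj_2])
print(leader_reaches_all(adj_1, gains), leader_reaches_all(adj_union, gains))  # False True
```

In this example neither topology alone lets the leader reach every follower, but their union does, which illustrates why the union graph is a useful object when topologies switch.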
[0071] This embodiment considers stochastic ST governed by an irreducible Markov process, as well as periodic ST.
[0072] 1) Stochastic ST. The switching process is stochastic and governed by an irreducible Markov process, whose transition probability matrix is:
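[0073] $\Pi = [\pi_{pq}]_{t \times t} \qquad (1)$
[0074] where $\pi_{pq} = \Pr\{\sigma(k+1) = q \mid \sigma(k) = p\}$ is the probability of switching from the current topology p to the next topology q; hence $\pi_{pq} \ge 0$ and $\sum_{q=1}^{t} \pi_{pq} = 1$. The irreducible Markov process is assumed to start from its invariant distribution.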
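To make the two switching types concrete, the following Python sketch generates a stochastic switching signal from a transition probability matrix and a periodic one from a repeating pattern. The two-topology matrix is hypothetical, topologies are indexed from 0, and the invariant distribution is computed as the left eigenvector of the transition matrix for eigenvalue 1, so that the chain starts from its invariant distribution as assumed above.

```python
import numpy as np


def stationary_distribution(trans: np.ndarray) -> np.ndarray:
    """Invariant distribution pi with pi @ trans = pi (left eigenvector for eigenvalue 1)."""
    eigvals, eigvecs = np.linalg.eig(trans.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()


def markov_switching(trans: np.ndarray, steps: int, rng: np.random.Generator) -> np.ndarray:
    """Sample sigma(0), ..., sigma(steps-1); trans[p, q] = Pr{sigma(k+1)=q | sigma(k)=p}."""
    if not np.allclose(trans.sum(axis=1), 1.0) or np.any(trans < 0):
        raise ValueError("each row must be a probability distribution")
    t = trans.shape[0]
    sigma = np.empty(steps, dtype=int)
    sigma[0] = rng.choice(t, p=stationary_distribution(trans))  # start from invariant distribution
    for k in range(1, steps):
        sigma[k] = rng.choice(t, p=trans[sigma[k - 1]])
    return sigma


def periodic_switching(pattern: list, steps: int) -> np.ndarray:
    """Periodic ST: repeat a fixed sequence of topology indices."""
    return np.array([pattern[k % len(pattern)] for k in range(steps)])


trans = np.array([[0.8, 0.2], [0.3, 0.7]])  # hypothetical two-topology chain
rng = np.random.default_rng(0)
sigma = markov_switching(trans, 10000, rng)
print(stationary_distribution(trans))    # approximately [0.6 0.4]
print(periodic_switching([0, 1, 2], 7))  # [0 1 2 0 1 2 0]
```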
[0075] 2) Periodic ST. The switching process is periodic, such as...
[0076] Regarding the consistency of MAS output under ST.
[0077] Considering the discrete-time (DT) linear MAS, the dynamics of agent i are as follows:
[0078] $x_i(k+1) = A x_i(k) + B_i u_i(k), \quad y_i(k) = C x_i(k), \quad i = 1, \dots, N \qquad (2)$
[0079] where $x_i(k) \in \mathbb{R}^n$, $u_i(k)$, and $y_i(k) \in \mathbb{R}^q$ are the system state, control input, and system output of agent i, respectively. The system matrices A, B_i, and C are constant. It is also assumed that $(A, B_i)$ is controllable and $(A, C)$ is observable.
[0080] The leader's dynamics are shown in the following formula:
[0081] $x_0(k+1) = A x_0(k), \quad y_0(k) = C x_0(k) \qquad (3)$
[0082] where $x_0(k)$ and $y_0(k)$ are the leader's state and output, respectively. The consistency problem is solved by making the outputs of the followers (2) consistent with the output of the leader (3).
[0083] Step 2: Establish performance metrics for MAS under ST.
[0084] For the output consistency problem of MAS (2) under ST, define the local tracking error of agent i as follows:
[0085]
[0086] where the weights are the adjacency elements of agent i in the topology at time k. Clearly, if all local tracking errors are driven to 0, the outputs are consistent.
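Equation (4) is not reproduced in this text. The Python sketch below assumes the form that is common in the leader-follower consensus literature, e_i(k) = Σ_j a_ij(σ(k))(y_j(k) − y_i(k)) + g_i(y_0(k) − y_i(k)), built from the same ingredients named here (adjacency elements, traction gain, and outputs); the function name and the example values are hypothetical. It also shows the easy direction of the claim above: if every output equals the leader's output, every local error vanishes.

```python
import numpy as np


def local_tracking_errors(adj, gains, y, y0):
    """Assumed form e_i = sum_j a_ij (y_j - y_i) + g_i (y0 - y_i), for all agents at once.

    adj: (N, N) adjacency of the current topology, adj[i, j] = a_ij
    gains: (N,) traction gains g_i; y: (N, q) follower outputs; y0: (q,) leader output
    """
    adj = np.asarray(adj, dtype=float)
    gains = np.asarray(gains, dtype=float)
    y = np.asarray(y, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    neighbor_term = adj @ y - adj.sum(axis=1)[:, None] * y  # sum_j a_ij (y_j - y_i)
    leader_term = gains[:, None] * (y0[None, :] - y)        # g_i (y0 - y_i)
    return neighbor_term + leader_term


adj = [[0, 0], [1, 0]]  # hypothetical: agent 0 -> agent 1
gains = [1, 0]          # only agent 0 hears the leader
print(local_tracking_errors(adj, gains, y=[[1.0], [3.0]], y0=[2.0]))  # e_0 = 1, e_1 = -2
```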
[0087] Based on the tracking error (4), the corresponding tracking state can be obtained as follows:
[0088]
[0089] Combining equations (2)-(5), we obtain:
[0090]
[0091]
[0092] where
[0093]
[0094] and
[0095] Therefore, the output consistency problem of MAS (2) is transformed into the output regulation problem of the error system (6). To achieve consistency, the following assumption on the graph is required.
[0096] Assumption 1: The graph contains at least one spanning tree, and the leader is connected to its root node.
[0097] Under Assumption 1, this embodiment establishes the following performance index for MAS under ST based on equation (6):
[0098]
[0099] where Q_i and R_i are the symmetric positive-definite weighting matrices of agent i, i = 1, ..., N. Minimizing J_i in (7) reduces the control cost of each agent while enabling the agents to accomplish their common task, namely output consistency.
[0100] Step 3: Obtain the optimal equations for achieving the MAS consistency objective, namely the optimal Bellman equation and the optimal control strategy equation.
[0101] According to equation (7), for an admissible control policy u_i, the value function of each agent i is:
[0102]
[0103] Then the Bellman equation is:
[0104]
[0105] For the optimal value function, we obtain:
[0106]
[0107] According to Bellman's principle of optimality and equation (10), the optimal Bellman equation is:
[0108]
[0109] The corresponding optimal control strategy equation is:
[0110]
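For linear dynamics with a quadratic cost, a Bellman equation of this kind and its optimal policy take a familiar Riccati form. The Python sketch below illustrates this for a single hypothetical agent with known matrices A and B and stage cost x^T Q x + u^T R u, which is a simplification of the tracking-state formulation in the text (switching and the data-based representation are omitted): it repeatedly applies the Bellman update to a quadratic value function V(x) = x^T P x and returns the greedy gain K with u = −Kx. The function name is hypothetical.

```python
import numpy as np


def lq_value_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Iterate the LQ Bellman equation with V(x) = x^T P x, starting from P = 0.

    Each sweep applies P <- Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A,
    i.e. the Bellman update evaluated at the minimizing input u = -K x.
    """
    P = np.zeros_like(Q, dtype=float)
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # greedy policy for the current P
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)      # optimal gain, u = -K x
    return P, K


A = np.array([[1.0, 1.0], [0.0, 1.0]])  # hypothetical agent
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
P, K = lq_value_iteration(A, B, Q, R)
print(np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0)  # True: closed loop is stable
```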
[0111] Step 4: Rewrite the MAS tracking state. Based on the tracking error system (6), the tracking state can be rewritten in terms of system data.
[0112] For the tracking state in (6), the error system can be extended over the interval [k−φ, k], that is:
[0113]
[0114]
[0115] where
[0116]
[0117]
[0118]
[0119]
[0120]
[0121]
[0122] $O_b = \big[(CA^{\varphi-1})^T \;\; (CA^{\varphi-2})^T \;\; \cdots \;\; (CA)^T \;\; C^T\big]^T$,
[0123]
[0124]
[0125] In addition,
[0126]
[0127]
[0128]
[0129] Consider agent i. Let d_i be the total number of its neighboring agents, and let i_1, ..., i_{d_i} denote these neighbors. For j = 1, ..., N, a dummy factor is defined that takes one value if the stated condition holds and another value otherwise. Then define
[0130]
[0131]
[0132]
[0133] Based on equation (15), the following lemma is given.
[0134] Lemma 1: For MAS (2) under ST, there exists an observability index $I_o$ such that
[0135] $\operatorname{rank}(O_b) < n, \quad \varphi < I_o,$
[0136] $\operatorname{rank}(O_b) = n, \quad \varphi \ge I_o \qquad (16)$
[0137] In addition, $I_o$ satisfies $I_o q \ge n$.
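Lemma 1 can be checked numerically. The Python sketch below builds O_b in the order given above and returns the smallest φ for which O_b has full column rank n; it assumes, as in (2), a single output matrix C shared by all agents, and the double-integrator example and function names are hypothetical.

```python
import numpy as np


def stacked_observability(A, C, phi):
    """O_b = [(C A^(phi-1))^T ... (C A)^T C^T]^T, stacked in the order given in the text."""
    return np.vstack([C @ np.linalg.matrix_power(A, p) for p in range(phi - 1, -1, -1)])


def observability_index(A, C):
    """Smallest phi with rank(O_b) = n, or None if (A, C) is unobservable.

    By the Cayley-Hamilton theorem, checking phi up to n is sufficient.
    """
    n = A.shape[0]
    for phi in range(1, n + 1):
        if np.linalg.matrix_rank(stacked_observability(A, C, phi)) == n:
            return phi
    return None


A = np.array([[1.0, 1.0], [0.0, 1.0]])  # hypothetical double integrator
C = np.array([[1.0, 0.0]])              # q = 1 measured output
I_o = observability_index(A, C)
print(I_o, I_o * C.shape[0] >= A.shape[0])  # 2 True
```

Choosing φ ≥ I_o, as Theorem 1 below requires, therefore guarantees that the stacked output data determine the state.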
[0138] Theorem 1: For the output consistency problem of MAS (2) under the switching topology, if φ ≥ I_o, the tracking state can be represented by system data, that is,
[0139]
[0140] where
[0141]
[0142] In addition,
[0143] and:
[0144]
[0145] where
[0146]
[0147]
[0148]
[0149] Step 5: Construct the two-layer reinforcement learning architecture, which obtains the value function from the tracking state. A two-layer RL framework is proposed to iteratively learn the optimal value function in equation (12) and the corresponding optimal control strategy.
[0150] For the tracking error system (6), the exact form of the actual value function is unknown, but it can be approximated by a quadratic function over the state space. Therefore, the value function (8) can be approximated as:
[0151]
[0152] where
[0153] Substituting the data-based tracking state (17) from Theorem 1 into equation (18) gives:
[0154]
[0155] where the matrix of this quadratic form is the kernel matrix.
[0156]
[0157] Note that the kernel matrix in (19) changes with the switching signal σ(k), whereas existing algorithms cannot iteratively learn a changing kernel matrix. The kernel matrix is therefore decomposed as follows:
[0158]
[0159] where P_i > 0 is a constant kernel matrix and the other matrix is the kernel matrix that changes with switching. According to equations (19) and (20), we have:
[0160]
[0161] where
[0162]
[0163] Substituting equation (21) into equations (11)-(12), we get:
[0164]
[0165]
[0166] The value function with the time-varying kernel matrix is divided into two parts. Let s (s = 0, 1, 2, ...) denote the iteration number. Based on equations (23)-(24), a two-layer RL framework and its operating mechanism are given below, from which u_i(k) and the two parts of the value function are obtained.
[0167] The first part represents the quadratic form formed by the portion of the value function (19) that is independent of the switching signal, and the second part represents the quadratic form formed by the portion that is related to the switching signal. After the division, using the designed two-layer reinforcement learning algorithm, the kernel matrix of the first part, which is independent of the switching signal, is constant and can be obtained in the learning layer; the kernel matrix of the second part is time-varying and can be obtained in the preparation layer.
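The two-part split can be illustrated in a few lines. The Python sketch below assumes, for illustration only, that the time-varying kernel is the sum of a constant matrix and a per-topology matrix (the exact relation between the two parts is given by the equations above, which are not reproduced here); all matrices and the function name are hypothetical. Only the second part changes when σ(k) changes, which is why a fixed-kernel learner can handle the first part while the switching part is handled separately.

```python
import numpy as np


def split_value(z, P_const, P_switch, sigma):
    """Return the two parts of V = z^T (P_const + P_switch[sigma]) z (additive split assumed)."""
    z = np.asarray(z, dtype=float)
    first = float(z @ P_const @ z)            # independent of the switching signal
    second = float(z @ P_switch[sigma] @ z)   # depends on sigma(k)
    return first, second


P_const = np.eye(2)                                        # hypothetical constant kernel
P_switch = {0: np.diag([1.0, 0.0]), 1: np.zeros((2, 2))}   # hypothetical per-topology kernels
for sigma in (0, 1):
    print(sigma, split_value([1.0, 2.0], P_const, P_switch, sigma))  # 0 (5.0, 1.0) / 1 (5.0, 0.0)
```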
[0168] Two-layer RL consists of a preparation layer and a learning layer, wherein:
[0169] I) Preparation layer
[0170]
[0171] II) Learning layer
[0172] The value update (26) and policy improvement (27) below are performed repeatedly until convergence.
[0173]
[0174]
[0175] Step 6: Define the operating mechanism of the two-layer reinforcement learning algorithm (25)-(27), which includes:
[0176] 1) For i = 1, ..., N, choose l (the number of data groups collected per iteration), Q_i, and R_i; initialize the required quantities; and set s = 0 and k = 1.
[0177] 2) Set h = k.
[0178] 3) Execute the loop for k = h, h+1, ..., h+l:
[0179] a) Compute the local tracking error according to formula (4).
[0180] b) Combining the obtained quantities, solve for the switching-dependent part of the value function with the preparation layer (25).
[0181] c) Collect and save the data required by the learning layer.
[0182] 4) If the learning-layer iteration has not converged, solve the value update (26) by the least squares method using the l saved data groups.
[0183] Otherwise, stop learning.
[0184] 5) Based on the obtained kernel matrices, compute the updated control policy from the policy improvement (27).
[0185] 6) Set s = s + 1 and return to step 2).
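The least-squares step in 4) can be illustrated with a standard quadratic policy-evaluation problem. The Python sketch below is not the update (26) itself; it assumes a single agent under a fixed topology with a fixed stabilizing gain K, and fits a symmetric kernel P from l saved transitions by least squares on the Bellman relation x_k^T P x_k − x_{k+1}^T P x_{k+1} = r_k. The system matrices, the gain, and the function names are hypothetical; the model is used only to generate data and to check the result.

```python
import numpy as np


def quad_features(x):
    """Features with x^T P x = quad_features(x) @ theta, theta = upper triangle of symmetric P."""
    iu = np.triu_indices(x.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)  # off-diagonal entries appear twice
    return np.outer(x, x)[iu] * scale


def theta_to_sym(theta, n):
    """Rebuild the symmetric matrix from its upper-triangular entries."""
    M = np.zeros((n, n))
    M[np.triu_indices(n)] = theta
    return M + M.T - np.diag(np.diag(M))


def ls_value_update(x_now, x_next, costs):
    """Least-squares solve of x_k^T P x_k - x_{k+1}^T P x_{k+1} = r_k over l saved samples."""
    Phi = np.array([quad_features(a) - quad_features(b) for a, b in zip(x_now, x_next)])
    theta, *_ = np.linalg.lstsq(Phi, np.asarray(costs), rcond=None)
    return theta_to_sym(theta, x_now.shape[1])


# Hypothetical single-agent data under a fixed stabilizing policy u = -K x.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.5, 1.0]])
Q, R = np.eye(2), np.eye(1)
A_cl = A - B @ K
Q_bar = Q + K.T @ R @ K
rng = np.random.default_rng(1)
x_now = rng.normal(size=(20, 2))  # l = 20 saved data groups
x_next = x_now @ A_cl.T
costs = np.einsum("ki,ij,kj->k", x_now, Q_bar, x_now)
P_ls = ls_value_update(x_now, x_next, costs)

# Model-based check: P = Q_bar + A_cl^T P A_cl, solved with Kronecker products.
n = 2
P_ref = np.linalg.solve(np.eye(n * n) - np.kron(A_cl.T, A_cl.T), Q_bar.reshape(-1)).reshape(n, n)
print(np.allclose(P_ls, P_ref))  # True
```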
[0186] Using the proposed two-layer RL algorithm (25)-(27), the switching-dependent part of the value function is obtained in the preparation layer (25), the constant part is obtained in the learning layer (26)-(27), and the corresponding distributed control strategies are obtained as well.
[0187] Note 1: Distributed control strategies for MAS under ST have been proposed in the existing literature, but they assume that all eigenvalues of the leader's dynamic matrix lie outside the unit circle. As formulas (25)-(27) and (3) show, this invention makes no assumption about the eigenvalues of the leader's dynamic matrix and thus removes that requirement. Note 1 will be verified in simulation.
[0188] Step 7: Design a more general data-based distributed control strategy. To execute the two-layer RL algorithm, an explicit expression for the distributed control strategy (27) is needed. Combining the value function (21) with the Bellman equation (9), we have:
[0189]
[0190] The Hamiltonian functions H_i(k), i = 1, ..., N, are then defined as:
[0191]
[0192] According to equation (22), the corresponding matrix can be partitioned as:
[0193]
[0194] where
[0195]
[0196] Likewise, it can be partitioned as:
[0197]
[0198] where
[0199]
[0200]
[0201] According to the necessary condition for optimal control (stationarity of the Hamiltonian with respect to u_i(k)):
[0202]
[0203] The control strategy is therefore composed from the constant kernel matrix P_i and the switching kernel matrix as follows:
[0204]
[0205] where
[0206]
[0207] By combining equations (22) and (35) with the proposed two-layer RL framework (25)-(27), we obtain the specific expressions for the preparation layer and the learning layer.
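The stationarity condition above can be illustrated for a generic partitioned quadratic kernel. The Python sketch below assumes a kernel H over the stacked vector [x; u] and shows that setting the gradient with respect to u to zero gives u = −H_uu^{-1} H_ux x; it then applies the same formula to a constant kernel plus a hypothetical per-topology term, which is an assumed additive combination rather than expression (35) itself. All matrices and names are hypothetical.

```python
import numpy as np


def greedy_gain(H, n_x):
    """Gain K such that u = -K x minimizes [x; u]^T H [x; u] (H symmetric, H_uu > 0).

    Setting the gradient 2 H_ux x + 2 H_uu u to zero gives u = -H_uu^{-1} H_ux x.
    """
    H_ux = H[n_x:, :n_x]
    H_uu = H[n_x:, n_x:]
    return np.linalg.solve(H_uu, H_ux)


# Hypothetical kernel built from A, B, Q, R and a value matrix P = I.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, P = np.eye(2), np.eye(1), np.eye(2)
H = np.block([[Q + A.T @ P @ A, A.T @ P @ B],
              [B.T @ P @ A, R + B.T @ P @ B]])
print(greedy_gain(H, n_x=2))  # K = [[0, 0.5]]

# With an assumed additive split, the gain for topology sigma uses the summed kernel.
H_switch = {0: np.zeros_like(H), 1: 0.5 * np.eye(3)}  # hypothetical switching parts
gains = {s: greedy_gain(H + H_switch[s], n_x=2) for s in H_switch}
```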
[0208] Step 8: Implement the two-layer reinforcement learning algorithm. In summary, the resulting two-layer RL algorithm takes the following specific form.
[0209] Algorithm 1: Data-based two-layer RL algorithm
[0210] 1. Define the quantities used below and choose a small positive value ε.
[0211] 2. Preparation layer:
[0212]
[0213]
[0214] 3. Learning layer:
[0215] Ⅰ) Value Update
[0216] if
[0217]
[0218] otherwise
[0219] Stop learning
[0220] II) Strategy Improvement
[0221]
[0222] Set s = s + 1 and return to step 2.
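The overall loop of Algorithm 1 (value update by least squares, policy improvement, and a convergence test with threshold ε) resembles standard model-free policy iteration on a quadratic Q-function kernel. The Python sketch below shows that standard technique for a single hypothetical agent under a fixed topology; it has no preparation layer and does not reproduce the updates of Algorithm 1. The matrices A and B are used only to simulate data, exploration noise is added to the input to keep the least-squares problem well posed, the initial gain K0 is assumed to be stabilizing, and all names are hypothetical.

```python
import numpy as np


def sym_features(z):
    """Features with z^T H z = sym_features(z) @ theta, theta = upper triangle of symmetric H."""
    iu = np.triu_indices(z.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, 2.0)
    return np.outer(z, z)[iu] * scale


def theta_to_sym(theta, n):
    """Rebuild a symmetric matrix from its upper-triangular entries."""
    M = np.zeros((n, n))
    M[np.triu_indices(n)] = theta
    return M + M.T - np.diag(np.diag(M))


def q_policy_iteration(A, B, Q, R, K0, eps=1e-8, samples=60, max_iter=50, seed=0):
    """Model-free policy iteration on the Q-function kernel H (single agent, fixed topology).

    A and B only simulate transitions; the learner uses (x, u, x_next, cost) samples.
    Returns the last kernel H and gain K (u = -K x).
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    K, H_prev = K0, None
    for _ in range(max_iter):
        rows, costs = [], []
        for _sample in range(samples):
            x = rng.normal(size=n)
            u = -K @ x + rng.normal(size=m)                 # exploration noise
            x_next = A @ x + B @ u
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])  # next input from the current policy
            rows.append(sym_features(z) - sym_features(z_next))
            costs.append(x @ Q @ x + u @ R @ u)
        theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
        H = theta_to_sym(theta, n + m)                      # value update (policy evaluation)
        K = np.linalg.solve(H[n:, n:], H[n:, :n])           # policy improvement
        if H_prev is not None and np.max(np.abs(H - H_prev)) < eps:
            break                                           # successive kernels within eps
        H_prev = H
    return H, K


A = np.array([[1.0, 1.0], [0.0, 1.0]])  # hypothetical agent
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
H, K = q_policy_iteration(A, B, Q, R, K0=np.array([[0.5, 1.0]]))
print(K)
```

Under these assumptions the learned gain can be compared against the model-based Riccati solution as a sanity check.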
[0223] Note 2: Existing data-based distributed control strategies apply only to MAS with fixed topologies. In this embodiment, the expression (35) for u_i(k) applies to both fixed topologies and ST. If the topology is fixed, the quantities defined in equation (15) no longer switch, and the data-based u_i(k) in equation (35) degenerates into the result for a fixed topology; that is, u_i(k) also covers the fixed-topology case. Therefore, the data-based control strategy u_i(k) proposed in this embodiment is more general.
[0224] Note 3: Equations (6), (13)-(14), and (17) contain switching terms, so the kernel matrix in equation (19) also changes with switching. Existing algorithms have only a learning layer and can iteratively learn only a constant kernel matrix. Because this kernel matrix is not constant but changes with σ(k) during learning, it cannot be solved with existing algorithms.
[0225] The above content addresses the output consistency problem of discrete-time linear MAS under ST conditions. For MAS under ST conditions, the kernel matrix of the value function changes with the switching signal. To solve for the time-varying kernel matrix, this invention proposes a two-layer RL algorithm and designs a more general data-based distributed control strategy, eliminating the constraints of existing technologies on the eigenvalues of the leader's dynamic matrix. Furthermore, the proposed technique overcomes the requirements of existing technologies regarding agent model information.
[0226] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
[0227] Example 2:
[0228] A system for implementing the above method includes:
[0229] The target construction module is configured to: taking the convergence of the tracking errors of all agents to 0 as the goal, obtain the control equation that makes the outputs of the followers and the leader consistent, and determine and optimize the performance index of the control equation under the switching topology to obtain the optimal control strategy equation.
[0230] The target output module is configured to change the tracking state of the multi-agent system according to the tracking error in the optimal control strategy equation, thereby achieving consistent output control.
[0231] Specifically, a value function with a time-varying kernel matrix is obtained from the tracking state. Using a two-layer reinforcement learning method with a preparation layer and a learning layer, the two parts of the value function are obtained separately; merging them yields the required value function and the corresponding optimal control policy.
[0232] Example 3:
[0233] This embodiment provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps in the control method for output consistency of a multi-agent system under a switching topology as described in Embodiment 1 above.
[0234] Example 4:
[0235] This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps in the control method for output consistency of a multi-agent system under a switching topology as described in Embodiment 1 above.
[0236] The steps or networks involved in Embodiments 2 to 4 above correspond to those in Embodiment 1. For specific implementation details, please refer to the relevant description section of Embodiment 1. The term "computer-readable storage medium" should be understood as a single medium or multiple media including one or more instruction sets; it should also be understood as including any medium capable of storing, encoding, or carrying an instruction set for execution by a processor and enabling the processor to perform any of the methods in this invention.
Claims
1. A control method for output consistency of a multi-agent system under a switching topology, characterized in that it comprises the following steps: taking the convergence of the tracking errors of all agents to 0 as the goal, determining a control equation that makes the outputs of the followers and the leader consistent, and optimizing the performance index under the switching topology based on the control equation to obtain an optimal control strategy equation; changing the tracking state of the multi-agent system according to the tracking error in the optimal control strategy equation to achieve consistent output control; specifically, obtaining a value function with a time-varying kernel matrix from the tracking state, obtaining the two parts of the value function separately with a two-layer reinforcement learning method comprising a preparation layer and a learning layer, and merging them to obtain the desired value function and the corresponding optimal control strategy; wherein the value function with the time-varying kernel matrix is divided into two parts: the first part is the quadratic form formed by the portion of the value function that is independent of the switching signal, and the second part is the quadratic form formed by the portion related to the switching signal; after the division, with the designed two-layer reinforcement learning algorithm, the kernel matrix of the first part, which is independent of the switching signal and constant, is obtained in the learning layer, and the kernel matrix of the second part, which is time-varying, is obtained in the preparation layer; and the relationship between the two satisfies... .
2. The control method for output consistency of a multi-agent system under a switching topology as described in claim 1, characterized in that obtaining, with the tracking errors of all agents driven to zero, the control equations that make the outputs of the followers and the leader consistent includes: determining the local tracking error of each agent based on its adjacency elements in the topology, its time-invariant traction gain, and its output; obtaining the corresponding tracking state based on the agent's local tracking error, state, and input; and obtaining, based on the dynamics of the agents, the dynamics of the leader, the local tracking error, and the tracking state, the control equations that make the outputs of the followers and the leader consistent.
3. The control method for output consistency of a multi-agent system under a switching topology according to claim 1, characterized in that determining and optimizing the performance index of the control equation under the switching topology to obtain the optimal control strategy equation comprises: determining the value function of each agent from an admissible control strategy in the performance index, and obtaining the optimal control strategy equation from the Bellman equation.
4. The control method for output consistency of a multi-agent system under a switching topology according to claim 1, characterized in that obtaining a value function with a time-varying kernel matrix from the tracking state, obtaining the two parts of the value function separately with the preparation layer and the learning layer of the two-layer reinforcement learning method, and merging them to obtain the desired value function and the corresponding optimal control policy comprises: determining the value function from the tracking error and the kernel matrix of the multi-agent system under the switching topology, where the kernel matrix is obtained from a switching kernel matrix and a constant kernel matrix; and dividing the value function with the time-varying kernel matrix into two parts, which are obtained by the preparation layer and the learning layer of the two-layer reinforcement learning method, respectively.
5. The control method for output consistency of a multi-agent system under a switching topology according to claim 1, characterized in that the value function with the time-varying kernel matrix is divided into two parts according to the equations of this claim, in which the symbols denote, respectively: the value function; a constant kernel matrix; a kernel matrix that changes with switching; the switching signal; a vector group consisting of system state, input, and output data; the transpose; the time instant k; and a positive integer to be designed.
6. The control method for output consistency of a multi-agent system under a switching topology according to claim 1, characterized in that, in the preparation layer, the value function is obtained according to the formula of this claim, in which the symbols denote, respectively: the tracking state; the control strategy obtained at iteration s-1; a pair of positive definite matrices to be selected; the value function; s, the iteration number; and k, the time instant.
7. The control method for output consistency of a multi-agent system under a switching topology according to claim 1, characterized in that, in the learning layer, the update formulas of this claim are iterated until convergence to obtain the value function, in which the symbols denote, respectively: the tracking state; the control strategy obtained at iteration s-1; a pair of positive definite matrices to be selected; the value function; s, the iteration number; and k, the time instant.
8. A control system for output consistency in a multi-agent system under a switching topology, characterized in that it comprises: a target construction module configured to determine, with the goal of driving the tracking errors of all agents to 0, a control equation that makes the outputs of the followers and the leader among the agents consistent, and to optimize a performance index under the switching topology based on the control equation to obtain an optimal control strategy equation; and a target output module configured to change the tracking state of the multi-agent system according to the tracking error in the optimal control strategy equation, thereby achieving output consistency control. Specifically, a value function with a time-varying kernel matrix is obtained from the tracking state; based on a two-layer reinforcement learning method comprising a preparation layer and a learning layer, the two parts of the value function are obtained separately and, after merging, yield the desired value function and the corresponding optimal control strategy. The value function with the time-varying kernel matrix... is divided into two parts, namely a first part and a second part: the first part is the quadratic form formed by the portion of the value function that is independent of the switching signal, and the second part is the quadratic form formed by the portion that depends on the switching signal. After this division, the designed two-layer reinforcement learning algorithm obtains, in the learning layer, the kernel matrix of the first part, which is independent of the switching signal and constant, and obtains, in the preparation layer, the time-varying kernel matrix of the second part; the relationship between the two satisfies... .
9. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the control method for output consistency of a multi-agent system under a switching topology according to any one of claims 1-7.
10. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the control method for output consistency of a multi-agent system under a switching topology according to any one of claims 1-7.
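Claim 1 states that the two parts of the value function are obtained separately and then merged to give the desired value function and the corresponding optimal control strategy. The following is a minimal sketch of only that merging step, under assumptions that the claims do not state: a scalar state x and scalar input u, a symmetric quadratic kernel H over the stacked vector z = [x, u], and hypothetical helper names (`merge_kernels`, `quad_form`, `greedy_gain`). It is an illustration, not the patented algorithm.

```python
# Hypothetical illustration of merging a constant kernel with a switching kernel.
# Assumes scalar x and scalar u, so every kernel is a symmetric 2x2 list of lists.

def merge_kernels(h_const, h_switch):
    """Element-wise sum of the constant and the switching 2x2 kernels."""
    return [[h_const[i][j] + h_switch[i][j] for j in range(2)] for i in range(2)]


def quad_form(h, z):
    """Return z^T H z for a 2-vector z and a 2x2 matrix H."""
    return sum(z[i] * h[i][j] * z[j] for i in range(2) for j in range(2))


def greedy_gain(h):
    """Minimise [x u] H [x u]^T over u for symmetric H.

    Setting the derivative with respect to u to zero gives u = -(H_ux / H_uu) x,
    so the returned K satisfies u = -K x. A minimiser exists only if H_uu > 0.
    """
    h_ux, h_uu = h[1][0], h[1][1]
    if h_uu <= 0:
        raise ValueError("H_uu must be positive for a minimiser to exist")
    return h_ux / h_uu


if __name__ == "__main__":
    h = merge_kernels([[2.0, 0.5], [0.5, 1.0]], [[1.0, 0.5], [0.5, 1.0]])
    print(h, greedy_gain(h))  # [[3.0, 1.0], [1.0, 2.0]] 0.5
```

Because the value is a quadratic form, z^T (H_c + H_s) z equals z^T H_c z + z^T H_s z, which is why the two parts can be learned separately and simply added.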
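Claim 2 derives each agent's local tracking error from its adjacency-matrix entries, its pinning gain, and its output. A minimal sketch follows, assuming scalar outputs and the common neighbourhood form e_i = sum_j a_ij (y_j - y_i) + g_i (y_0 - y_i); the sign convention, the two example topologies, and the switching signal `sigma` are hypothetical and are not taken from the claims.

```python
# Hypothetical sketch: scalar outputs, topology chosen by a switching signal sigma(k).
# Sign convention (neighbour minus self) is an assumption; conventions vary in the literature.

def local_tracking_errors(adjacency, pinning, outputs, leader_output):
    """e_i = sum_j a_ij (y_j - y_i) + g_i (y_0 - y_i) for every follower i."""
    n = len(outputs)
    errors = []
    for i in range(n):
        neighbour_term = sum(adjacency[i][j] * (outputs[j] - outputs[i]) for j in range(n))
        leader_term = pinning[i] * (leader_output - outputs[i])
        errors.append(neighbour_term + leader_term)
    return errors


def errors_at_step(k, sigma, topologies, outputs, leader_output):
    """Look up the active (adjacency, pinning) pair for mode sigma(k), then compute errors."""
    adjacency, pinning = topologies[sigma(k)]
    return local_tracking_errors(adjacency, pinning, outputs, leader_output)


# Two hypothetical topologies for three followers; the leader pins a different follower in each.
TOPOLOGIES = {
    0: ([[0, 1, 0], [1, 0, 1], [0, 1, 0]], [1, 0, 0]),
    1: ([[0, 1, 1], [1, 0, 0], [1, 0, 0]], [0, 0, 1]),
}


def sigma(k):
    """Hypothetical switching signal: alternate between the two modes."""
    return k % 2
```

For example, `errors_at_step(0, sigma, TOPOLOGIES, [1.0, 2.0, 4.0], 0.0)` gives `[0.0, 1.0, -2.0]`, while outputs that all equal the leader's output give zero error in every mode, which is the consistency target.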
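Claim 3 evaluates an admissible control strategy and obtains the optimal strategy from the Bellman equation. As a hedged, model-based illustration (not the claimed data-driven method), the sketch below runs standard policy iteration on a hypothetical scalar system x_{k+1} = a x_k + b u_k with stage cost q x^2 + r u^2 and value V(x) = p x^2. Policy evaluation solves the Bellman equation p = q + r K^2 + gamma (a - bK)^2 p for a fixed gain K, and policy improvement minimises the right-hand side over u. All names and numbers are assumptions.

```python
# Hypothetical scalar LQ example of Bellman-equation-based policy iteration.

def evaluate_policy(a, b, q, r, k_gain, gamma=1.0):
    """Solve p = q + r K^2 + gamma (a - bK)^2 p for the policy u = -K x."""
    closed_loop = a - b * k_gain
    denom = 1.0 - gamma * closed_loop ** 2
    if denom <= 0:
        raise ValueError("policy is not admissible (closed loop not stable)")
    return (q + r * k_gain ** 2) / denom


def improve_policy(a, b, r, p, gamma=1.0):
    """Greedy gain from minimising r u^2 + gamma p (a x + b u)^2 over u."""
    return gamma * a * b * p / (r + gamma * b ** 2 * p)


def policy_iteration(a, b, q, r, k0, gamma=1.0, tol=1e-12, max_iter=100):
    """Alternate evaluation and improvement from an admissible initial gain k0."""
    k_gain = k0
    p = evaluate_policy(a, b, q, r, k_gain, gamma)
    for _ in range(max_iter):
        k_gain = improve_policy(a, b, r, p, gamma)
        p_new = evaluate_policy(a, b, q, r, k_gain, gamma)
        if abs(p_new - p) < tol:
            return p_new, k_gain
        p = p_new
    return p, k_gain


if __name__ == "__main__":
    # Open loop a = 1.2 is unstable; k0 = 1.0 is admissible because |1.2 - 1.0| < 1.
    print(policy_iteration(1.2, 1.0, 1.0, 1.0, k0=1.0))
```

The initial gain must be admissible (stabilising), which mirrors the claim's use of an admissible control strategy; the converged p satisfies the scalar discrete-time Riccati equation.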
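Claim 4 determines the value function from the tracking error and a kernel matrix obtained from a switching kernel matrix and a constant kernel matrix. The sketch below assumes, for illustration only, that the active kernel at time k is the sum of a constant kernel and the kernel of the mode selected by a switching signal sigma(k); the 2x2 matrices, the switching time, and the function names are hypothetical.

```python
# Hypothetical sketch: time-varying kernel P(k) = P_const + P_switch[sigma(k)].

def kernel_at(k, sigma, p_const, p_switch):
    """Return the kernel active at time k for the mode chosen by sigma(k)."""
    ps = p_switch[sigma(k)]
    n = len(p_const)
    return [[p_const[i][j] + ps[i][j] for j in range(n)] for i in range(n)]


def value(k, x, sigma, p_const, p_switch):
    """Quadratic value x^T P(k) x of a tracking state x at time k."""
    p = kernel_at(k, sigma, p_const, p_switch)
    n = len(x)
    return sum(x[i] * p[i][j] * x[j] for i in range(n) for j in range(n))


P_CONST = [[1.0, 0.0], [0.0, 1.0]]
P_SWITCH = {0: [[1.0, 0.0], [0.0, 0.0]], 1: [[0.0, 0.0], [0.0, 2.0]]}


def sigma(k):
    """Hypothetical switching signal: mode 0 before k = 3, mode 1 afterwards."""
    return 0 if k < 3 else 1
```

With x = [1.0, 1.0], the value is 3.0 while mode 0 is active and 4.0 after the switch, even though the constant part of the kernel never changes.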
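Claim 5 divides the value function into a part independent of the switching signal and a part that depends on it; the claim's own equations are not reproduced in this text. Purely as a hypothetical illustration of such a split, the sketch below takes the element-wise mean of a set of per-mode kernels as the constant part and the remainders as the switching parts, so each mode kernel is reconstructed exactly. This averaging rule is an assumption for demonstration and is not how the claims obtain the two parts (claim 1 assigns them to the learning and preparation layers).

```python
# Hypothetical decomposition of per-mode kernels into a common part plus residuals.

def split_kernels(mode_kernels):
    """Split {mode: P_mode} into (P_common, {mode: P_residual}).

    Assumed rule: P_common is the element-wise mean over modes, so that
    P_mode = P_common + P_residual[mode] holds exactly for every mode.
    Residuals need not be positive definite on their own.
    """
    modes = list(mode_kernels)
    n = len(mode_kernels[modes[0]])
    common = [[sum(mode_kernels[m][i][j] for m in modes) / len(modes)
               for j in range(n)] for i in range(n)]
    residual = {m: [[mode_kernels[m][i][j] - common[i][j] for j in range(n)]
                    for i in range(n)] for m in modes}
    return common, residual


if __name__ == "__main__":
    common, residual = split_kernels({0: [[2.0, 0.0], [0.0, 1.0]],
                                      1: [[4.0, 1.0], [1.0, 3.0]]})
    print(common)    # [[3.0, 0.5], [0.5, 2.0]]
    print(residual)  # {0: [[-1.0, -0.5], [-0.5, -1.0]], 1: [[1.0, 0.5], [0.5, 1.0]]}
```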
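Claim 6 obtains, in the preparation layer, a value function for the control strategy of iteration s-1 using a pair of positive definite weights. Since the claim's formula is not reproduced here, the sketch below substitutes a standard data-based policy evaluation for a hypothetical scalar system: it simulates the fixed policy u = -K x and fits p in the Bellman relation p (x_k^2 - x_{k+1}^2) = q x_k^2 + r u_k^2 by least squares, without using a or b in the fit. The system, weights, and function names are assumptions.

```python
# Hypothetical data-based evaluation of a fixed policy u = -K x (scalar case).

def simulate(a, b, k_gain, x0, steps):
    """Roll out x_{k+1} = a x_k + b u_k under u_k = -K x_k; return states and inputs."""
    xs, us = [x0], []
    for _ in range(steps):
        u = -k_gain * xs[-1]
        us.append(u)
        xs.append(a * xs[-1] + b * u)
    return xs, us


def evaluate_from_data(xs, us, q, r):
    """Least-squares fit of p in p * (x_k^2 - x_{k+1}^2) = q x_k^2 + r u_k^2."""
    num = 0.0
    den = 0.0
    for k, u in enumerate(us):
        phi = xs[k] ** 2 - xs[k + 1] ** 2   # regressor from measured states
        cost = q * xs[k] ** 2 + r * u ** 2  # measured stage cost
        num += phi * cost
        den += phi * phi
    return num / den


if __name__ == "__main__":
    xs, us = simulate(1.2, 1.0, 1.0, 1.0, 10)
    print(evaluate_from_data(xs, us, 1.0, 1.0))  # close to 2 / 0.96
```

With noise-free data the fit recovers the same p as the model-based formula (q + r K^2) / (1 - (a - bK)^2); the design choice of fitting from trajectories is what makes this style of evaluation usable when the model is unknown.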
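Claim 7 iterates update formulas in the learning layer until convergence. The claim's formulas are not reproduced in this text, so as a hedged stand-in the sketch below uses standard value iteration for a hypothetical scalar LQ problem: starting from p = 0, it repeats the Riccati-difference update p <- q + a^2 p - (a b p)^2 / (r + b^2 p) until successive values differ by less than a tolerance, then returns the resulting gain and the iteration count s.

```python
# Hypothetical scalar value iteration run until convergence.

def value_iteration(a, b, q, r, p0=0.0, tol=1e-12, max_iter=10_000):
    """Iterate the scalar Riccati-difference update until |p_s - p_{s-1}| < tol.

    Returns (p, K, s): the converged kernel, the gain for u = -K x, and the
    number of iterations used. Unlike policy iteration, no initial stabilising
    gain is required when starting from p0 = 0.
    """
    p = p0
    for s in range(1, max_iter + 1):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            gain = a * b * p_next / (r + b * b * p_next)
            return p_next, gain, s
        p = p_next
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    print(value_iteration(1.2, 1.0, 1.0, 1.0))
```

For a = 1.2, b = q = r = 1 the iterate converges to the positive root of p^2 - 1.44 p - 1 = 0, the same kernel that policy iteration reaches, but typically after more, cheaper iterations.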