Identification and self-learning cooperative control method for unmanned vehicle cluster system under unknown model
By employing reinforcement learning and data-driven methods to identify and reconstruct models for unmanned vehicle swarm systems, a distributed cooperative controller was designed. This solved the cooperative control problem of unmanned vehicle swarm systems under unknown models, optimized steady-state and transient performance, and achieved self-learning cooperative control of the unmanned vehicle swarm.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING INST OF TECH
- Filing Date
- 2023-06-08
- Publication Date
- 2026-06-23
AI Technical Summary
Existing collaborative control methods for autonomous vehicle swarm systems rely on precise models, which cannot guarantee the steady-state and transient performance of the system when the model is unknown or uncertain. Furthermore, traditional controller designs fail to effectively utilize system data for adaptive learning.
The method uses reinforcement learning and data-driven techniques to identify and reconstruct the system model, designs a distributed cooperative controller and cost function, and applies a policy iteration learning algorithm to find the optimal control strategy, thereby achieving self-learning cooperative control of an unmanned vehicle cluster.
When the vehicle models are unknown, the method achieves efficient cooperative control of the system, ensures the steady-state and transient performance of the unmanned vehicle cluster, relaxes the dependence on the dynamic model, and realizes optimal cooperative path-tracking control.
Figure CN116700257B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of unmanned systems technology, specifically to a method for identification and self-learning cooperative control of unmanned vehicle swarm systems under unknown model conditions.
Background Technology
[0002] A multi-unmanned-vehicle swarm system consists of multiple unmanned vehicles that cooperate and exchange information to complete complex tasks. Cooperative control of such swarm systems has wide application in military and civilian fields, including cooperative search and rescue, cooperative reconnaissance, and cooperative strike. Existing research on cooperative control of unmanned vehicle swarm systems assumes precise vehicle models. In practice, however, communication between vehicles is easily affected by the external environment, which can leave the vehicle system model unknown or uncertain. Model uncertainty poses significant challenges to controller design and performance analysis. System identification offers an effective technical means of addressing model uncertainty in unmanned vehicle systems. How to achieve system identification from collected system data, without relying on a vehicle model, is currently an active and difficult research topic.
[0003] Controller design is crucial to the cooperative control of an unmanned vehicle swarm system. Existing controller design techniques consider only the steady-state performance of the system and neglect its transient performance, so they cannot guarantee optimal cooperative control. A pressing problem is therefore how to design a reasonable and efficient distributed cooperative control strategy that ensures both steady-state and transient performance, and that achieves optimal cooperative control of the swarm through adaptive learning from collected system data alone, without relying on precise vehicle models. Consequently, it is necessary to research and provide system identification and self-learning cooperative control methods for unmanned vehicle swarm systems under unknown model conditions.
Summary of the Invention
[0004] In view of this, the present invention provides a method for identification and self-learning cooperative control of unmanned vehicle swarm systems under unknown model conditions, which can realize system identification and self-learning cooperative control of unmanned vehicle swarm systems under unknown model conditions.
[0005] To achieve the above objectives, the present invention provides a method for identification and self-learning cooperative control of unmanned vehicle swarm systems under unknown model conditions. The technical solution includes the following steps:
[0006] Step 1: Establish a multi-autonomous vehicle system model, in which the multi-autonomous vehicle cluster system consists of N follower autonomous vehicles and one navigator autonomous vehicle, which communicate through a directed graph topology.
[0007] Step 2: Using the system data collected by each autonomous vehicle, the system model is identified and reconstructed through reinforcement learning and data-driven methods.
[0008] Step 3: Based on the identified and reconstructed system model, design a reinforcement-learning-based distributed cooperative controller and the corresponding cost function.
[0009] Step 4: Under the designed reinforcement learning algorithm, find the optimal control strategy through the policy iteration learning algorithm to achieve optimal cooperative tracking of the navigator vehicle by the follower vehicles.
[0010] Furthermore, in step one, the established multi-vehicle system model includes the dynamics model of the navigator vehicle and the dynamics model of the follower vehicle.
[0011] The dynamic model of the navigator unmanned vehicle is $\dot{z}_0 = A z_0$, where $z_0$ denotes the state of the navigator vehicle, $\dot{z}_0$ denotes its first derivative, and $A$ denotes the system matrix of the unmanned vehicle, which is a parameter to be learned or identified.
[0012] The dynamic model of the i-th follower unmanned vehicle is $\dot{z}_i = A z_i + B u_i$, $i = 1, 2, \dots, N$, where $z_i$ denotes the state of the follower vehicle, $u_i$ denotes its control input, and $A$ and $B$ are the unknown system matrices to be learned or identified.
[0013] Furthermore, in step two, the system model is identified and reconstructed through reinforcement learning and data-driven methods, as follows:
[0014] Step ①: Write the dynamics of the navigator unmanned vehicle in the following form: […], where ⊗ denotes the Kronecker product, $z_0$ denotes the state of the navigator vehicle, $I_{n_z}$ denotes the $n_z \times n_z$ identity matrix, and vec(A1) denotes the row vectorization of matrix A1.
[0015] Step ②: The navigator vehicle quantities […] and […] are estimated using the following two filters:
[0016]
[0017] where […] and β denote the filter variables designed for […] and […], respectively; […] and $\dot{\beta}$ denote their first derivatives; and γ > 0 is an arbitrarily chosen constant.
[0018] Let […]; the relationship between […] and β is then expressed as follows:
[0019] Step ③: Let […] denote the estimated value. Stability analysis is performed on the error obtained by transforming the system model identification problem, and the estimation error equation is as follows:
[0020]
[0021] where ‖·‖ denotes the 2-norm; […] and […] denote the estimated value and the true value after filtering, respectively; ε1, ε2, and ε3 are arbitrarily chosen constants greater than zero; the subscript a is the iteration index, running from a = 1, the moment system data collection begins, to a = s, where s is the data collection time length; and $\beta_a$ and […] denote the estimated values of […] and […] at the a-th learning iteration.
[0022] Step ④: Data-driven system model reconstruction. The specific steps are as follows:
[0023] The time interval $[t_0, t_l]$ is divided into multiple sub-intervals, denoted $[t_0, t_1], [t_1, t_2], \dots, [t_{l-1}, t_l]$. System data are collected over these sub-intervals and stored in the introduced storage vectors […] and […], which are expressed as follows:
[0024]
[0025] where $z_{ri} = \Pi_{ri} z_0$ denotes the state of the navigator vehicle $z_0$ mapped through the mask $\Pi_r$; $z_r(t_0)$ denotes the mapped state at time $t_0$; r denotes the mapping point corresponding to follower vehicle i; $\Pi_{ri}$ is a diagonal matrix whose non-zero elements are all 1; $\mathrm{vecv}(z_r(t_1)) - \mathrm{vecv}(z_r(t_0))$ is the change in the mapped state $z_{ri}$ obtained from the system data collected within the sub-interval $[t_0, t_1]$; and […] denotes the first system data entry collected by the storage vector within $[t_0, t_1]$.
[0026] The system model matrix is reconstructed as follows:
[0027] $A = \left[ [A_1]^T, [A_2]^T, \dots, [A_p]^T \right]^T$
[0028] where […] denotes the value of the b-th element in the reconstruction of the unknown matrix A, with 1 ≤ b ≤ p ≤ N; $\mathrm{vecs}(I_n)$ denotes the row vectorization of the identity matrix; and p is the number of columns of matrix A. The unknown system matrix B is then obtained from the reconstructed matrix A.
[0029] Further, in step four, under the designed reinforcement learning algorithm, the optimal control strategy is found through the policy iteration learning algorithm to achieve optimal cooperative tracking of the navigator (leader) vehicle by the follower vehicles. The specific steps are as follows:
[0030] Set the initial iteration index m = 0. The control policy of each follower vehicle starts learning from an initial stabilizing control policy […], i ∈ [1, N]. Each vehicle i, for i from 1 to N, executes steps 401 to 403 to find its optimal control strategy, which yields the optimal cooperative tracking control strategy of all follower vehicles with respect to the navigator vehicle.
[0031] Step 401: Policy evaluation. Learn from the system data collected over the time interval [t, t+ε], where ε is an arbitrary constant greater than zero. At the end of the interval, at time t+ε, use the reconstructed system matrix A to solve the following integral Bellman equation:
[0032]
[0033] where $u_i$ and $u_i^{[m]}$ denote the control input of vehicle i and its control input at the m-th iteration, respectively; $u_j(\tau)$ denotes the control input of vehicle j, a neighbor of vehicle i; $a_{ij}$ is the weighting coefficient of the connection between vehicle i and vehicle j; $S_{ij}$ is an arbitrarily selectable known matrix associated with the control inputs of vehicles i and j; $S_{ii}$ is an arbitrarily selectable known matrix associated with the control input of vehicle i; $W_i$ is an arbitrarily selectable known positive definite matrix for vehicle i; $h_i$ denotes the weight of the entire topology graph and $h_i^{-1}$ its reciprocal; $\theta_i$, of dimension 2×1, is the state tracking error of vehicle i, and $\theta_i^T$, of dimension 1×2, is its transpose; $V_i^{[m+1]}[\theta_i(t+\varepsilon)]$ is the value function of the state tracking error $\theta_i(t+\varepsilon)$ at the (m+1)-th iteration at time t+ε; and N denotes the number of neighbors of vehicle i.
[0034] Step 402: Using $V_i^{[m]}$, the improved control strategy is obtained as follows:
[0035]
[0036] where $h_i$ denotes the weight of the entire topology graph, that is, the sum of the weights between vehicle i and each vehicle j and between vehicle i and the leader vehicle 0 over the entire unmanned vehicle system; matrix B is the input matrix of the unmanned vehicle; and $S_{ii} > 0$ is an arbitrarily selectable known matrix associated with the control input of vehicle i.
[0037] Step 403: If […], where υ is an arbitrarily chosen constant greater than zero, the optimal learning strategy of the i-th follower vehicle is obtained as […], i = 1, 2, ..., N; otherwise, m is incremented by 1 and the process returns to step 401, repeating until the optimal learning strategy of the i-th follower vehicle is obtained.
[0038] Beneficial effects:
[0039] This invention provides a model identification and self-learning cooperative control method for unmanned vehicle swarm systems with unknown models. First, for systems whose models are completely unknown, the invention combines reinforcement learning and data-driven control to design a reinforcement-learning-based method for system model identification and reconstruction. Second, it designs an optimal cooperative control scheme based on reinforcement learning and proposes an online adaptive learning cooperative control algorithm that learns the optimal control strategy from online state and input information alone. Lyapunov stability analysis shows that the optimal control strategy ensures that the follower unmanned vehicles achieve optimal tracking of the leader unmanned vehicle. The method removes the reliance of traditional tracking control protocols on the vehicle dynamics model while achieving efficient cooperative path-tracking control.
Description of the Drawings
[0040] Figure 1 is the design flowchart of the unmanned vehicle cluster system model identification and self-learning cooperative control provided by this invention.
[0041] Figure 2 is the directed topology diagram of the unmanned vehicle swarm system in an embodiment of the present invention.
Detailed Implementation
[0042] The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0043] As shown in Figure 1, this invention provides a self-learning cooperative control method for an unmanned vehicle swarm system under unknown model conditions. The specific steps are as follows:
[0044] Step 1: Establish the multi-vehicle system model. Consider a multi-unmanned-vehicle cluster system consisting of N follower vehicles and one navigator (leader) vehicle, communicating through a directed graph topology.
[0045] The established multi-vehicle system model includes the dynamic model of the navigator vehicle and the dynamic model of the follower vehicles. The dynamic model of the navigator vehicle is $\dot{z}_0 = A z_0$, where $z_0$ denotes the state of the navigator vehicle, $\dot{z}_0$ denotes its first derivative, and $A$ denotes the system matrix of the unmanned vehicle, which is the parameter to be learned or identified.
[0046] The dynamic model of the i-th follower vehicle is $\dot{z}_i = A z_i + B u_i$, $i = 1, 2, \dots, N$, where $z_i$ denotes the state of the follower vehicle, $u_i$ denotes its control input, and $A$ and $B$ are the unknown system matrices to be learned or identified. The leader and follower vehicles have identical structure, so A is the same for both.
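As a minimal sketch of the leader and follower dynamics described above, the following simulates both with forward Euler integration. The matrices `A` and `B`, the step size, and the helper names `euler_step` and `simulate` are hypothetical choices for illustration only; the patent treats A and B as unknown quantities to be identified.

```python
import numpy as np

# Hypothetical 2-state example values; the patent does not specify A or B.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = np.array([[0.0], [1.0]])


def euler_step(z, dt, u=None):
    """One forward-Euler step of dz/dt = A z, plus B u when an input is given."""
    dz = A @ z
    if u is not None:
        dz = dz + B @ u
    return z + dt * dz


def simulate(z_init, inputs, dt=0.01):
    """Integrate from z_init, applying one entry of `inputs` per step."""
    z = np.array(z_init, dtype=float)
    trajectory = [z.copy()]
    for u in inputs:
        z = euler_step(z, dt, u)
        trajectory.append(z.copy())
    return np.array(trajectory)


# Leader: no input. Followers: same A, driven through B.
leader = simulate([1.0, 0.0], [None] * 100)
follower_free = simulate([1.0, 0.0], [np.zeros(1)] * 100)
follower_driven = simulate([1.0, 0.0], [np.array([1.0])] * 100)
```

With zero input and the same initial state, the follower trajectory coincides with the leader trajectory, which reflects the shared system matrix A.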
[0047] Step Two: System Model Identification and Reconstruction. Utilizing the system data collected by each autonomous vehicle, the system model is identified and reconstructed through reinforcement learning and data-driven methods.
[0048] The specific methods for system model identification and reconstruction in step two are as follows:
[0049] Step ①: System model identification method based on reinforcement learning. The dynamics of the navigator unmanned vehicle are written in the following form: […], where ⊗ denotes the Kronecker product, $z_0$ denotes the state of the navigator vehicle, $I_{n_z}$ denotes the $n_z \times n_z$ identity matrix, and vec(A1) denotes the row vectorization of matrix A1.
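The patent's vectorized equation is not reproduced in this text, but the general identity behind such a rewriting can be checked numerically. The sketch below verifies the standard fact that A z equals (I ⊗ zᵀ) applied to the row-wise vectorization of A, so the unknown entries of A appear linearly in a parameter vector. The variable names and the random test data are illustrative assumptions, not the patent's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
z = rng.standard_normal(n)

# Row vectorization: rows of A stacked into one long vector.
row_vec_A = A.reshape(-1)

# Regressor (I_n kron z^T), shape (n, n*n): linear in the unknown entries of A.
regressor_rows = np.kron(np.eye(n), z[None, :])

# Column-stacking counterpart: (z^T kron I_n) vec(A).
col_vec_A = A.reshape(-1, order="F")
regressor_cols = np.kron(z[None, :], np.eye(n))

direct = A @ z
via_rows = regressor_rows @ row_vec_A
via_cols = regressor_cols @ col_vec_A
```

Both forms give the same product as `A @ z`; which one applies depends on whether vectorization stacks rows or columns.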
[0050] Step ②: Filter design. The navigator vehicle quantities […] and […] are estimated using the following two filters:
[0051]
[0052] where […] and β denote the filter variables designed for […] and […], respectively; […] and $\dot{\beta}$ denote their first derivatives; and γ > 0 is an arbitrarily chosen constant.
[0053] Let […]; the relationship between […] and β is then expressed as follows:
[0054] Step ③: Stability analysis. Stability analysis is performed on the error obtained by transforming the system model identification problem, and the estimation error equation is as follows:
[0055]
[0056] where ‖·‖ denotes the 2-norm; […] and […] denote the estimated value and the true value after filtering, respectively; ε1, ε2, and ε3 are arbitrarily chosen constants greater than zero; the subscript a is the iteration index, running from a = 1, the moment system data collection begins, to a = s, where s is the data collection time length; and $\beta_a$ and […] denote the estimated values of […] and […] at the a-th learning iteration.
[0057] Step ④: Data-driven system model reconstruction method
[0058] The time interval $[t_0, t_l]$ is divided into multiple sub-intervals, denoted $[t_0, t_1], [t_1, t_2], \dots, [t_{l-1}, t_l]$. System data are collected over these sub-intervals and stored in the introduced storage vectors […] and […], which are expressed as follows:
[0059]
[0060] where $z_{ri} = \Pi_{ri} z_0$ denotes the state of the navigator vehicle $z_0$ mapped through the mask $\Pi_r$; $z_r(t_0)$ denotes the mapped state at time $t_0$; r denotes the mapping point corresponding to follower vehicle i; $\Pi_{ri}$ is a diagonal matrix whose non-zero elements are all 1; $\mathrm{vecv}(z_r(t_1)) - \mathrm{vecv}(z_r(t_0))$ is the change in the mapped state $z_{ri}$ obtained from the system data collected within the sub-interval $[t_0, t_1]$; and […] denotes the first system data entry collected by the storage vector within $[t_0, t_1]$.
[0061] (2) System model matrix reconstruction
[0062] $A = \left[ [A_1]^T, [A_2]^T, \dots, [A_p]^T \right]^T$
[0063] where […] denotes the value of the b-th element in the reconstruction of the unknown matrix A, with 1 ≤ b ≤ p ≤ N; $\mathrm{vecs}(I_n)$ denotes the row vectorization of the identity matrix; and p is the number of columns of matrix A. The unknown system matrix B is then obtained from the reconstructed matrix A.
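To illustrate the idea of reconstructing A from data collected over sub-intervals, here is a minimal sketch under stated assumptions. It is a generic integral least-squares fit, not the patent's storage-vector construction: for dz/dt = A z, the state change over each sub-interval equals A times the integral of the state over that sub-interval, so stacking those pairs gives a linear system in A. The matrix `A_true`, the step sizes, and all helper names are hypothetical.

```python
import numpy as np


def rk4_step(f, z, h):
    """Classic fourth-order Runge-Kutta step, used only to generate data."""
    k1 = f(z)
    k2 = f(z + 0.5 * h * k1)
    k3 = f(z + 0.5 * h * k2)
    k4 = f(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)


# Hypothetical "unknown" system matrix, used here only to simulate measurements.
A_true = np.array([[0.0, 1.0], [-2.0, -0.5]])
h, steps_per_interval, n_intervals = 1e-3, 200, 20

z = np.array([1.0, 0.0])
traj = [z]
for _ in range(steps_per_interval * n_intervals):
    z = rk4_step(lambda x: A_true @ x, z, h)
    traj.append(z)
traj = np.array(traj)

delta_cols, integral_cols = [], []
for k in range(n_intervals):
    seg = traj[k * steps_per_interval:(k + 1) * steps_per_interval + 1]
    # State change over the sub-interval [t_k, t_{k+1}].
    delta_cols.append(seg[-1] - seg[0])
    # Trapezoidal approximation of the state integral over the sub-interval.
    integral_cols.append(h * (seg.sum(axis=0) - 0.5 * (seg[0] + seg[-1])))

Delta = np.array(delta_cols).T        # shape (n, n_intervals)
Integral = np.array(integral_cols).T  # shape (n, n_intervals)

# Least-squares solution of Delta = A @ Integral.
A_hat = Delta @ np.linalg.pinv(Integral)
```

The fit needs the integral columns to span the state space, which is the usual excitation requirement for data-driven identification.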
[0064] Step 3: Based on the identified and reconstructed system model, design a reinforcement-learning-based distributed cooperative controller and the corresponding cost function.
[0065] Distributed controller design
[0066]
[0067] where $a_{ij}$ denotes the weight between vehicle i and vehicle j: if vehicles i and j are neighbors, $a_{ij} > 0$; otherwise $a_{ij} = 0$. Likewise, $a_{i0}$ denotes the weight between vehicle i and the leader vehicle 0: if vehicle i and the leader vehicle 0 are neighbors, $a_{i0} > 0$; otherwise $a_{i0} = 0$.
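The controller equation itself is not reproduced in this text. As a hedged illustration of how the weights a_ij and a_i0 are typically used, the sketch below builds a small directed topology and computes the local neighborhood tracking error commonly found in distributed tracking designs, Σ_j a_ij (z_i − z_j) + a_i0 (z_i − z_0). That error form, the example topology, and the function name `neighborhood_error` are assumptions for illustration, not the patent's stated controller.

```python
import numpy as np

# Hypothetical directed topology with 3 followers: adjacency[i, j] = a_ij.
adjacency = np.array([
    [0.0, 0.0, 0.0],   # follower 1 listens only to the leader
    [1.0, 0.0, 0.0],   # follower 2 listens to follower 1
    [0.0, 1.0, 0.0],   # follower 3 listens to follower 2
])
leader_weights = np.array([1.0, 0.0, 0.0])  # a_i0

# Total incoming weight per follower: sum_j a_ij + a_i0.
h = adjacency.sum(axis=1) + leader_weights


def neighborhood_error(states, leader_state):
    """Assumed form: sum_j a_ij (z_i - z_j) + a_i0 (z_i - z_0) for each i."""
    diffs = states[:, None, :] - states[None, :, :]      # diffs[i, j] = z_i - z_j
    local = np.einsum("ij,ijk->ik", adjacency, diffs)
    return local + leader_weights[:, None] * (states - leader_state)


states = np.array([[1.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
leader_state = np.array([0.0, 0.0])
errors = neighborhood_error(states, leader_state)
```

Each follower uses only the states of vehicles it is connected to, which is what makes the scheme distributed; the error vanishes for every follower when all states match the leader's.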
[0068] Cost function design
[0069]
[0070] where $W_i > 0$, $S_{ii} > 0$, and $S_{ij} \ge 0$ are arbitrarily selectable known matrices; $\theta_i = z_i - z_0$ denotes the tracking error between follower vehicle i and the leader vehicle; and $v_j$ denotes the control input of vehicle j.
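The cost function equation is not reproduced in this text. Given the definitions above, a quadratic running cost is a natural reading, and the sketch below evaluates one such cost at a single time instant: the tracking error weighted by W_i, the vehicle's own input weighted by S_ii, and each neighbor's input weighted by S_ij. The exact form, the function name `running_cost`, and the example numbers are assumptions for illustration.

```python
import numpy as np


def running_cost(theta_i, u_i, neighbor_inputs, W_i, S_ii, S_ij):
    """Assumed quadratic form: theta^T W theta + u_i^T S_ii u_i + sum_j u_j^T S_ij u_j."""
    cost = theta_i @ W_i @ theta_i + u_i @ S_ii @ u_i
    for j, u_j in neighbor_inputs.items():
        cost += u_j @ S_ij[j] @ u_j
    return float(cost)


# Hypothetical values for one follower with a single neighbor (index 2).
theta = np.array([1.0, 2.0])       # tracking error z_i - z_0
W = np.eye(2)                      # W_i > 0
u_own = np.array([3.0])
S_own = np.array([[2.0]])          # S_ii > 0
neighbor_inputs = {2: np.array([1.0])}
S_neighbors = {2: np.array([[0.5]])}   # S_ij >= 0

cost = running_cost(theta, u_own, neighbor_inputs, W, S_own, S_neighbors)
```

Integrating such a running cost over time gives a performance index that penalizes the whole error trajectory, which is how transient behavior, not only the steady state, enters the design.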
[0071] Step 4: Under the designed reinforcement learning algorithm, find the optimal control strategy through the policy iteration learning algorithm to achieve optimal cooperative tracking of the leader vehicle by the follower vehicles.
[0072] The specific methods of existing optimal cooperative control algorithms based on reinforcement learning are as follows:
[0073] (1) Model-based policy iteration scheme for distributed optimal cooperative control
[0074]
[0075]
[0076] However, Algorithm 1 is limited by its reliance on the precise dynamic matrix B of the system, which appears in the optimal control strategy in step four. If the dynamic model of the follower vehicle is inaccurate, the model-based reinforcement learning algorithm may be ineffective for output tracking control. Therefore, an optimal cooperative control scheme based on offline reinforcement learning is proposed.
[0077] (2) Model-free offline policy iteration scheme for distributed optimal cooperative control
[0078]
[0079]
[0080] The above algorithm can be written as follows:
[0081] The specific steps are as follows:
[0082] Set the initial iteration index m = 0. The control policy of each follower vehicle starts learning from an initial stabilizing control policy […], i ∈ [1, N]. Each vehicle i, for i from 1 to N, executes steps 401 to 403 to find its optimal control strategy, which yields the optimal cooperative tracking control strategy of all follower vehicles with respect to the leader vehicle;
[0083] Step 401: Policy evaluation. Learn from the system data collected over the time interval [t, t+ε], where ε is an arbitrary constant greater than zero. At the end of the interval, at time t+ε, use the reconstructed system matrix A to solve the following integral Bellman equation:
[0084]
[0085] where $u_i$ and $u_i^{[m]}$ denote the control input of vehicle i and its control input at the m-th iteration, respectively; $u_j(\tau)$ denotes the control input of vehicle j, a neighbor of vehicle i; $a_{ij}$ is the weighting coefficient of the connection between vehicle i and vehicle j; $S_{ij}$ is an arbitrarily selectable known matrix associated with the control inputs of vehicles i and j; $S_{ii}$ is an arbitrarily selectable known matrix associated with the control input of vehicle i; $W_i$ is an arbitrarily selectable known positive definite matrix for vehicle i; $h_i$ denotes the weight of the entire topology graph and $h_i^{-1}$ its reciprocal; $\theta_i$, of dimension 2×1, is the state tracking error of vehicle i, and $\theta_i^T$, of dimension 1×2, is its transpose; $V_i^{[m+1]}[\theta_i(t+\varepsilon)]$ is the value function of the state tracking error $\theta_i(t+\varepsilon)$ at the (m+1)-th iteration at time t+ε; and N denotes the number of neighbors of vehicle i;
[0086] Step 402: Using $V_i^{[m]}$, the improved control strategy is obtained as follows:
[0087]
[0088] where $h_i$ denotes the weight of the entire topology graph, that is, the sum of the weights between vehicle i and each vehicle j and between vehicle i and the leader vehicle 0 over the entire unmanned vehicle system, expressed as […]; matrix B is the input matrix of the unmanned vehicle; and $S_{ii} > 0$ is an arbitrarily selectable known matrix associated with the control input of vehicle i;
[0089] Step 403: If […], where υ is an arbitrarily chosen constant greater than zero, the optimal learning strategy of the i-th follower vehicle is obtained as […], i = 1, 2, ..., N; otherwise, m is incremented by 1 and the process returns to step 401, repeating until the optimal learning strategy of the i-th follower vehicle is obtained.
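The evaluate, improve, and check-convergence loop of steps 401 to 403 can be sketched with a simpler, well-known stand-in. The code below is the classic model-based policy iteration for a single-agent linear-quadratic problem (Kleinman's iteration), not the patent's data-driven, multi-vehicle integral Bellman formulation. The matrices, the threshold name `tol` (playing the role of υ), and the function names are hypothetical.

```python
import numpy as np


def solve_lyapunov(Ac, Qc):
    """Solve Ac^T P + P Ac + Qc = 0 for P via column-stacked vectorization."""
    n = Ac.shape[0]
    eye = np.eye(n)
    M = np.kron(eye, Ac.T) + np.kron(Ac.T, eye)
    p = np.linalg.solve(M, -Qc.reshape(-1, order="F"))
    P = p.reshape(n, n, order="F")
    return 0.5 * (P + P.T)


def policy_iteration(A, B, W, S, K0, tol=1e-9, max_iter=50):
    """Alternate policy evaluation and improvement until P stops changing."""
    K, P_prev = K0, None
    for m in range(max_iter):
        Ac = A - B @ K
        P = solve_lyapunov(Ac, W + K.T @ S @ K)   # evaluation (cf. step 401)
        K = np.linalg.solve(S, B.T @ P)           # improvement (cf. step 402)
        if P_prev is not None and np.linalg.norm(P - P_prev) <= tol:
            return P, K, m                        # termination (cf. step 403)
        P_prev = P
    return P, K, max_iter


# Hypothetical stable system, so the zero gain is a stabilizing initial policy.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
W = np.eye(2)
S = np.array([[1.0]])
P, K, iterations = policy_iteration(A, B, W, S, K0=np.zeros((1, 2)))
```

In this stand-in the value function is the quadratic form θᵀPθ, and the converged P satisfies the algebraic Riccati equation; the patent's scheme instead evaluates the policy from collected data over [t, t+ε].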
[0090] In summary, the above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. A method for identification and self-learning cooperative control of unmanned vehicle swarm systems under unknown model conditions, characterized in that it includes the following steps:
Step 1: Establish a multi-unmanned-vehicle system model, in which the multi-unmanned-vehicle cluster system consists of N follower unmanned vehicles and one navigator unmanned vehicle, which communicate through a directed graph topology;
Step 2: Using the system data collected by each unmanned vehicle, identify and reconstruct the system model through reinforcement learning and data-driven methods;
Step 3: Based on the identified and reconstructed system model, design a reinforcement-learning-based distributed cooperative controller and the corresponding cost function;
Step 4: Under the designed reinforcement learning algorithm, find the optimal control strategy through the policy iteration learning algorithm to achieve optimal cooperative tracking of the navigator vehicle by the follower vehicles;
In step one, the established multi-vehicle system model includes the dynamic model of the navigator vehicle and the dynamic model of the follower vehicles. The dynamic model of the navigator unmanned vehicle is $\dot{z}_0 = A z_0$, where $z_0$ denotes the state of the navigator vehicle, $\dot{z}_0$ denotes its first derivative, and $A$ denotes the system matrix of the unmanned vehicle, which is the parameter to be learned or identified. The dynamic model of the i-th follower unmanned vehicle is $\dot{z}_i = A z_i + B u_i$, where $z_i$ denotes the state of the follower vehicle, $u_i$ denotes its control input, and $A$ and $B$ are the unknown system matrices to be learned or identified.
In step two, the identification and reconstruction of the system model are achieved through reinforcement learning and data-driven methods, as follows:
Step ①: Write the dynamics of the navigator unmanned vehicle in the following form: […], where ⊗ denotes the Kronecker product, $z_0$ denotes the state of the navigator vehicle, $I_{n_z}$ denotes the $n_z \times n_z$ identity matrix, and vec(A1) denotes the row vectorization of matrix A1;
Step ②: The navigator vehicle quantities […] and […] are estimated using the following two filters: […], where […] and β denote the filter variables designed for […] and […], respectively; […] and $\dot{\beta}$ denote their first derivatives; and γ > 0 is an arbitrarily chosen constant; let […]; the relationship between […] and β is then expressed as follows: […];
Step ③: Let […] denote the estimated value. Stability analysis is performed on the error obtained by transforming the system model identification problem, and the estimation error equation is as follows: […], where ‖·‖ denotes the 2-norm; […] and […] denote the estimated value and the true value after filtering, respectively; ε1, ε2, and ε3 are arbitrarily chosen constants greater than zero; the subscript a is the iteration index, running from a = 1, the moment system data collection begins, to a = s, where s is the data collection time length; and $\beta_a$ and […] denote the estimated values of […] and […] at the a-th learning iteration;
Step ④: Data-driven system model reconstruction, with the following specific steps: the time interval $[t_0, t_l]$ is divided into multiple sub-intervals, denoted $[t_0, t_1], [t_1, t_2], \dots, [t_{l-1}, t_l]$; system data are collected over these sub-intervals and stored in the introduced storage vectors, which are expressed as follows: […], where $z_{ri} = \Pi_{ri} z_0$ denotes the state of the navigator vehicle $z_0$ mapped through the mask $\Pi_r$; $z_r(t_0)$ denotes the mapped state at time $t_0$; r denotes the mapping point corresponding to follower vehicle i; $\Pi_{ri}$ is a diagonal matrix whose non-zero elements are all 1; $\mathrm{vecv}(z_r(t_1)) - \mathrm{vecv}(z_r(t_0))$ is the change in the mapped state $z_{ri}$ obtained from the system data collected within the sub-interval $[t_0, t_1]$; and […] denotes the first system data entry collected by the storage vector within $[t_0, t_1]$.
The system model matrix is reconstructed as $A = \left[ [A_1]^T, [A_2]^T, \dots, [A_p]^T \right]^T$, where […] denotes the value of the b-th element in the reconstruction of the unknown matrix A, with 1 ≤ b ≤ p ≤ N; $\mathrm{vecs}(I_n)$ denotes the row vectorization of the identity matrix; and p is the number of columns of matrix A. The unknown system matrix B is then obtained from the reconstructed matrix A.
2. The identification and self-learning cooperative control method for an unmanned vehicle swarm system under unknown model conditions as described in claim 1, characterized in that, in step four, under the designed reinforcement learning algorithm, the optimal control strategy is found through the policy iteration learning algorithm to achieve optimal cooperative tracking of the leader vehicle by the follower vehicles, with the following specific steps:
Set the initial iteration index m = 0. The control policy of each follower vehicle starts learning from an initial stabilizing control policy […]. Each vehicle i, for i from 1 to N, executes steps 401 to 403 to find its optimal control strategy, which yields the optimal cooperative tracking control strategy of all follower vehicles with respect to the leader vehicle.
Step 401: Policy evaluation. Learn from the system data collected over the time interval [t, t+ε], where ε is an arbitrary constant greater than zero. At the end of the interval, at time t+ε, use the reconstructed system matrix A to solve the following integral Bellman equation: […], where $u_i$ and $u_i^{[m]}$ denote the control input of vehicle i and its control input at the m-th iteration, respectively; $u_j(\tau)$ denotes the control input of vehicle j, a neighbor of vehicle i; $a_{ij}$ is the weighting coefficient of the connection between vehicle i and vehicle j; $S_{ij}$ is an arbitrarily selectable known matrix associated with the control inputs of vehicles i and j; $S_{ii}$ is an arbitrarily selectable known matrix associated with the control input of vehicle i; $W_i$ is an arbitrarily selectable known positive definite matrix for vehicle i; $h_i$ denotes the weight of the entire topology graph and $h_i^{-1}$ its reciprocal; $\theta_i$, of dimension 2×1, is the state tracking error of vehicle i, and $\theta_i^T$, of dimension 1×2, is its transpose; $V_i^{[m+1]}[\theta_i(t+\varepsilon)]$ is the value function of the state tracking error $\theta_i(t+\varepsilon)$ at the (m+1)-th iteration at time t+ε; and N denotes the number of neighbors of vehicle i;
Step 402: Using $V_i^{[m]}$, the improved control strategy is obtained as follows: […], where $h_i$ denotes the weight of the entire topology graph, that is, the sum of the weights between vehicle i and each vehicle j and between vehicle i and the leader vehicle 0 over the entire unmanned vehicle system; matrix B is the input matrix of the unmanned vehicle; and $S_{ii} > 0$ is an arbitrarily selectable known matrix associated with the control input of vehicle i;
Step 403: If […], where υ is an arbitrarily chosen constant greater than zero, the optimal learning strategy of the i-th follower vehicle is obtained as […]; otherwise, increment m by 1 and return to step 401, repeating until the optimal learning strategy of the i-th follower vehicle is obtained.