A multi-unmanned system scene perception decision-making method and system based on dynamic cooperation

By constructing a scene perception and decision-making model for multiple unmanned systems, and utilizing sensor information and executed actions, the problem of incomplete perception in unmanned systems is solved, enabling more accurate collaborative situational analysis and team collaboration.

CN118133887B · Active · Publication Date: 2026-06-23 · SHANGHAI UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANGHAI UNIV
Filing Date
2024-03-20
Publication Date
2026-06-23


Abstract

The application discloses a multi-unmanned system scene perception decision method and system based on dynamic cooperation, relates to the field of multi-agent reinforcement learning, and comprises the following steps: acquiring sensor information of each unmanned system at a current time and an executed action at a previous time; inputting the sensor information of all unmanned systems at the current time and the executed action at the previous time into a multi-unmanned system scene perception decision model to obtain an executed action of each unmanned system at the current time; the multi-unmanned system scene perception decision model comprises a trained first network module, a second network module and a third network module; the trained first network module is used for determining a time sequence trajectory pre-encoding vector of each unmanned system at the current time; the trained second network module is used for determining a dynamic cooperation relationship adjacency matrix of the unmanned system at the current time; and the trained third network module is used for determining the executed action of each unmanned system at the current time. The application improves the effectiveness of the executed action of each unmanned system and promotes team cooperation.

Description

Technical Field

[0001] This invention relates to the field of multi-agent reinforcement learning technology, and in particular to a method and system for scene perception and decision-making in multi-unmanned systems based on dynamic cooperation.

Background Technology

[0002] Unmanned systems typically utilize various onboard sensors for real-time scene perception and understanding. After analyzing and understanding the environmental state, they select appropriate actions based on their own behavioral strategies. However, in most real-world applications, unmanned systems suffer from incomplete perception and dynamically changing scenarios, leading to incomplete or inaccurate situational awareness. This poses a significant challenge to collaborative decision-making among multiple unmanned systems.

[0003] In existing technologies, a global static collaboration graph is generally constructed to determine the actions executed by each unmanned system at different times. However, the global static collaboration graph is usually determined from the external scene information perceived by the various sensors of the unmanned system. This is a static process that copes poorly with dynamic external collaboration scenarios. Furthermore, it does not consider the importance of the collaboration relationships between unmanned systems. As a result, the actions executed by the unmanned systems are uncoordinated and cooperation between them is not smooth.

Summary of the Invention

[0004] The purpose of this invention is to provide a method and system for scene perception and decision-making based on dynamic collaboration among multiple unmanned systems, which fully considers the dynamic collaborative relationship between multiple unmanned systems, improves the effectiveness of actions performed by each unmanned system, and promotes team collaboration.

[0005] To achieve the above objectives, embodiments of the present invention provide the following solutions:

[0006] A scene perception and decision-making method for multi-unmanned systems based on dynamic collaboration includes:

[0007] Acquire the sensor information of each unmanned system at the current moment and the actions performed at the previous moment; the sensor information includes at least RGB images from visual cameras and point cloud data from lidar.

[0008] The sensor information of all unmanned systems at the current moment and the execution actions of the previous moment are input into the multi-unmanned system scene perception and decision model to obtain the execution actions of each unmanned system at the current moment; the multi-unmanned system scene perception and decision model includes a trained first network module, a trained second network module and a trained third network module;

[0009] The trained first network module is used to determine the pre-encoded temporal trajectory vector of each unmanned system at the current moment based on the sensor information of all unmanned systems at the current moment and the actions performed at the previous moment;

[0010] The trained second network module is used to determine the adjacency matrix of the dynamic cooperative relationship of the unmanned systems at the current moment based on the pre-encoded vectors of the temporal trajectories of all unmanned systems at the current moment;

[0011] The trained third network module is used to determine the action to be performed by each unmanned system at the current moment based on the pre-encoded vectors of the temporal trajectories of all unmanned systems at the current moment and the adjacency matrix of the dynamic cooperative relationship of the unmanned systems at the current moment.

[0012] Optionally, the trained first network module includes: a first fully connected neural network, a gated recurrent network, and a second fully connected neural network;

[0013] The first fully connected neural network is used to obtain the state feature encoding vector of each unmanned system at the current moment based on the sensor information of all unmanned systems at the current moment and the actions executed at the previous moment;

[0014] The gated recurrent network is used to obtain the hidden feature vector of each unmanned system at the current moment based on the state feature encoding vector of all unmanned systems at the current moment and the hidden feature vector of the previous moment; the hidden feature vector is equal to the temporal trajectory feature encoding vector.

[0015] The second fully connected neural network is used to obtain the pre-encoded temporal trajectory vector of each unmanned system at the current moment based on the temporal trajectory feature encoding vector of all unmanned systems at the current moment.

[0016] Optionally, the trained second network module includes: a precoding matrix unit and a multi-head attention network;

[0017] The precoding matrix unit is used to obtain the temporal trajectory precoding matrix at the current moment based on the temporal trajectory precoding vectors of all unmanned systems at the current moment;

[0018] The multi-head attention network is used to obtain the adjacency matrix of the dynamic cooperative relationship of the unmanned system at the current moment based on the temporal trajectory pre-encoding matrix at the current moment.

[0019] Optionally, the trained third network module includes: a graph convolutional network, a third fully connected neural network, and an action selection unit;

[0020] The graph convolutional network is used to obtain the cooperative situation analysis matrix at the current moment based on the pre-encoded vectors of the temporal trajectories of all unmanned systems at the current moment and the adjacency matrix of the dynamic cooperative relationship of the unmanned systems at the current moment.

[0021] The third fully connected neural network is used to obtain the action strategy value vector of each unmanned system at the current moment based on the cooperative situation analysis matrix at the current moment; the action strategy value vector contains the strategy value of all executed actions;

[0022] The action selection unit is used to select the action to be executed at the current moment from the action strategy value vector using a greedy algorithm.

[0023] Optionally, the training process of the multi-unmanned system scene perception decision model includes:

[0024] Acquire sensor information and actions performed by each unmanned system at historical moments; the historical moments refer to moments prior to the current moment.

[0025] Construct a network model; the network model includes: a first network module, a second network module, and a third network module connected in sequence;

[0026] The sensor information at the first historical moment, the actions executed at the first historical moment, and the actions executed at the second historical moment of all unmanned systems are input into the network model. The network model is trained with the goal of minimizing the network loss function, and the trained network model is determined as the scene perception and decision-making model for multiple unmanned systems. The second historical moment is the moment before the first historical moment.

[0027] Optionally, the network loss function is:

[0028] $L_i(\theta_i) = \sum_{t \in T} \left( y_i^t - Q_i(H^{(L)}[i], a_i^t; \theta_i) \right)^2$

[0029] $y_i^t = r_t + \gamma \max_{a_i^{t+1}} Q_i(H^{(L)}[i], a_i^{t+1}; \theta_i^-)$

[0030] Among them, $L_i(\cdot)$ represents the network loss function, $\theta_i$ are the actual network parameters, and $T$ represents the different historical times. $y_i^t$ is the target policy value of the i-th unmanned system at time t. $Q_i$ is the action strategy value vector of the i-th unmanned system. $H^{(L)}[i]$ represents the situational features of the i-th unmanned system. $a_i^t$ is the action performed by the i-th unmanned system at time t. $a_i^{t+1}$ is the action performed by the i-th unmanned system at time t+1. $r_t$ is the reward function at time t, and $\gamma$ is the reward discount factor. $\theta_i^-$ are the target network parameters.

[0031] Optionally, the adjacency matrix of the dynamic cooperative relationship of the unmanned system is:

[0032] $M = f_{Attention}(Z)$

[0033] $Z = [z_1^t, z_2^t, \dots, z_N^t]^{\top}$

[0034] Where $M$ is the adjacency matrix of the dynamic cooperative relationship of the unmanned systems, $f_{Attention}(\cdot)$ represents the multi-head attention network, and $Z$ is the temporal trajectory pre-encoding matrix. $z_i^t$ is the pre-encoded vector of the temporal trajectory of the i-th unmanned system at time t, where i = 1, 2, ..., N, and N is the number of unmanned systems.

[0035] This invention also provides a computer system, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of a scene perception and decision-making method for a multi-unmanned system based on dynamic cooperation.

[0036] According to the specific embodiments provided in this invention, the present invention discloses the following technical effects:

[0037] On the one hand, the embodiments of the present invention utilize the sensor information of each unmanned system at the current moment and the execution actions at the previous moment to increase the amount of information input to the multi-unmanned system scene perception and decision model. On the other hand, the multi-unmanned system scene perception and decision model fully considers the dynamic cooperation relationship between each unmanned system and determines the adjacency matrix of the dynamic cooperation relationship of the unmanned systems based on the pre-encoded vectors of the temporal trajectories of all unmanned systems. This matrix determines the importance of the cooperation relationship of each unmanned system at different moments, improves the effectiveness of solving the execution actions of each unmanned system, and further promotes team collaboration.

Attached Figure Description

[0038] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0039] Figure 1 is a flowchart of a scene perception and decision-making method for multi-unmanned systems based on dynamic cooperation provided in an embodiment of the present invention;

[0040] Figure 2 is a flowchart illustrating the training process of a scene perception and decision-making model for a multi-unmanned system provided in an embodiment of the present invention;

[0041] Figure 3 is a first virtual mission environment diagram for multi-unmanned surface vessel collaborative search and rescue provided in an embodiment of the present invention;

[0042] Figure 4 is a second virtual mission environment diagram for multi-unmanned surface vessel collaborative search and rescue provided in an embodiment of the present invention.

Detailed Implementation

[0043] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0044] The purpose of this invention is to provide a scene perception and decision-making method and system based on dynamic collaboration among multiple unmanned systems, which fully considers the dynamic collaborative relationship between multiple unmanned systems, improves the effectiveness of each unmanned system's actions, and promotes team collaboration.

[0045] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0046] Figure 1 shows the specific process of a scene perception and decision-making method for multi-unmanned systems based on dynamic cooperation. The following is a detailed introduction to each part of the method.

[0047] Referring to Figure 1, this scene perception and decision-making method for multi-unmanned systems based on dynamic cooperation includes:

[0048] Step S1: Obtain the sensor information of each unmanned system at the current moment and the actions performed at the previous moment; the sensor information includes at least the RGB images from the visual camera and the point cloud data from the lidar.

[0049] Step S2: Input the current sensor information and previous actions of all unmanned systems into the multi-unmanned system scene perception decision model to obtain the current actions of each unmanned system. The multi-unmanned system scene perception decision model includes a trained first network module, a trained second network module, and a trained third network module. The trained first network module is used to determine the temporal trajectory pre-encoded vector of each unmanned system at the current moment based on the current sensor information and previous actions of all unmanned systems. The trained second network module is used to determine the adjacency matrix of the dynamic cooperation relationship of the unmanned systems at the current moment based on the temporal trajectory pre-encoded vector of all unmanned systems at the current moment. The trained third network module is used to determine the current actions of each unmanned system at the current moment based on the temporal trajectory pre-encoded vector of all unmanned systems at the current moment and the adjacency matrix of the dynamic cooperation relationship of the unmanned systems at the current moment.

[0050] Specifically, the trained first network module includes: a first fully connected neural network, a gated recurrent network, and a second fully connected neural network.

[0051] The first fully connected neural network is used to obtain the state feature encoding vector of each unmanned system at the current moment based on the sensor information of all unmanned systems at the current moment and the actions executed at the previous moment, that is:

[0052] $e_i^t = f_{E_1}(o_i^t, a_i^{t-1})$ (1)

[0053] In the formula, $e_i^t$ is the state feature encoding vector of the i-th unmanned system at time t, $f_{E_1}$ is the first fully connected neural network, $o_i^t$ is the sensor information of the i-th unmanned system at time t, and $a_i^{t-1}$ is the action performed by the i-th unmanned system at time t-1.

[0054] The gated recurrent network is used to obtain the hidden feature vector of each unmanned system at the current time step (the hidden feature vector is equal to the temporal trajectory feature encoding vector) based on the state feature encoding vector of all unmanned systems at the current time step and the hidden feature vector of the previous time step, that is:

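[0055] $h_i^t = f_{GRU}(e_i^t, h_i^{t-1})$ (2)

[0056] $\tau_i^t = h_i^t$ (3)

[0057] In the formula, $h_i^t$ is the hidden feature vector of the i-th unmanned system at time t, $f_{GRU}$ is the gated recurrent network, $e_i^t$ is the state feature encoding vector of the i-th unmanned system at time t, $h_i^{t-1}$ is the hidden feature vector of the i-th unmanned system at time t-1, and $\tau_i^t$ is the temporal trajectory feature encoding vector of the i-th unmanned system at time t.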

[0058] The second fully connected neural network is used to obtain the pre-encoded temporal trajectory vector of each unmanned system at the current moment based on the temporal trajectory feature encoding vector of all unmanned systems at the current moment, that is:

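[0059] $z_i^t = f_{E_2}(\tau_i^t)$ (4)

[0060] In the formula, $z_i^t$ is the pre-encoded vector of the temporal trajectory of the i-th unmanned system at time t, $f_{E_2}$ is the second fully connected neural network, and $\tau_i^t$ is the temporal trajectory feature encoding vector of the i-th unmanned system at time t.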
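
The first network module chains a fully connected encoder, a gated recurrent cell and a second fully connected layer. The following is a minimal pure-Python sketch of one time step for a single unmanned system, not the implementation of this embodiment. It assumes that the sensor information has already been flattened into a numeric vector and that the previous action is one-hot encoded. The names `TrajectoryEncoder`, `linear` and `init_layer`, the ReLU activation, the layer sizes and the random initialisation are all illustrative assumptions.

```python
import math
import random

def linear(weights, bias, x):
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init_layer(n_out, n_in, rng):
    """Small random weights and zero bias (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class TrajectoryEncoder:
    """First FC network -> GRU cell -> second FC network, for one unmanned system."""

    def __init__(self, obs_dim, n_actions, hidden_dim, code_dim, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        self.e1 = init_layer(hidden_dim, obs_dim + n_actions, rng)
        # Each GRU gate acts on the concatenation of the encoding and a hidden vector.
        self.update = init_layer(hidden_dim, 2 * hidden_dim, rng)
        self.reset = init_layer(hidden_dim, 2 * hidden_dim, rng)
        self.cand = init_layer(hidden_dim, 2 * hidden_dim, rng)
        self.e2 = init_layer(code_dim, hidden_dim, rng)

    def step(self, obs, prev_action, h_prev):
        one_hot = [1.0 if k == prev_action else 0.0 for k in range(self.n_actions)]
        # State feature encoding from sensor vector and previous action (ReLU assumed).
        e = [max(0.0, v) for v in linear(*self.e1, obs + one_hot)]
        # Standard GRU cell: update gate u, reset gate r, candidate state n.
        u = [sigmoid(v) for v in linear(*self.update, e + h_prev)]
        r = [sigmoid(v) for v in linear(*self.reset, e + h_prev)]
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        n = [math.tanh(v) for v in linear(*self.cand, e + gated)]
        h = [(1 - ui) * ni + ui * hi for ui, ni, hi in zip(u, n, h_prev)]
        # The hidden vector doubles as the temporal trajectory feature encoding.
        tau = h
        # Second FC network yields the temporal trajectory pre-encoded vector.
        return linear(*self.e2, tau), h
```

In use, the returned hidden vector is fed back as `h_prev` at the next time step, which is how the encoding accumulates the trajectory of each unmanned system over time.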

[0061] Furthermore, the trained second network module includes: a precoding matrix unit and a multi-head attention network.

[0062] The precoding matrix unit is used to obtain the temporal trajectory precoding matrix at the current moment based on the temporal trajectory precoding vectors of all unmanned systems at the current moment, that is:

[0063] $Z = [z_1^t, z_2^t, \dots, z_N^t]^{\top}$ (5)

[0064] In the formula, $Z$ is the temporal trajectory precoding matrix, and $z_i^t$ is the pre-encoded temporal trajectory vector of the i-th unmanned system at time t, where i = 1, 2, ..., N, and N is the number of unmanned systems.

[0065] Multi-head attention networks are used to obtain the adjacency matrix of the dynamic cooperative relationship of the unmanned system at the current moment based on the pre-encoded matrix of the temporal trajectory at the current moment, that is:

[0066] $M = f_{Attention}(Z)$ (6)

[0067] In the formula, $M$ is the adjacency matrix of the dynamic cooperation relationship of unmanned systems (the weight matrix of each edge in a fully connected undirected graph composed of N unmanned systems), with a dimension of N×N, and the value of each element in this matrix belongs to [0,1]. $f_{Attention}$ is the multi-head attention network.
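
One way to obtain an N×N matrix with entries in [0,1] from the precoding matrix is to average the softmax attention maps of several heads. The sketch below assumes scaled dot-product attention with one query projection and one key projection per head. The function names and the averaging across heads are illustrative assumptions, since the specification does not state how the heads are combined.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]

def project(W, x):
    """Return W x for a matrix W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def attention_adjacency(Z, heads):
    """Average the per-head attention maps over Z (N rows, one per unmanned system).

    `heads` is a list of (W_q, W_k) projection pairs; each pair yields one N x N map.
    """
    n = len(Z)
    M = [[0.0] * n for _ in range(n)]
    for W_q, W_k in heads:
        queries = [project(W_q, z) for z in Z]
        keys = [project(W_k, z) for z in Z]
        scale = math.sqrt(len(keys[0]))
        for i in range(n):
            scores = [sum(q * k for q, k in zip(queries[i], keys[j])) / scale
                      for j in range(n)]
            # Row-wise softmax keeps every weight in [0, 1] and each row summing to 1.
            for j, w in enumerate(softmax(scores)):
                M[i][j] += w / len(heads)
    return M
```

A row-wise softmax generally gives an asymmetric matrix. If a strictly undirected graph is required, averaging the result with its transpose is one possible post-processing step, again an assumption rather than something stated here.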

[0068] Furthermore, the trained third network module includes: a graph convolutional network, a third fully connected neural network, and an action selection unit.

[0069] Graph convolutional networks are used to obtain the cooperative situation analysis matrix at the current moment based on the pre-encoded vectors of the temporal trajectories of all unmanned systems at the current moment and the adjacency matrix of the dynamic cooperative relationships of the unmanned systems at the current moment.

[0070] $H^{(l+1)} = \sigma\left(M H^{(l)} W^{(l)}\right)$ (7)

[0071] In the formula, l is the layer index of the graph convolutional network, l∈[1,L]; $H^{(l)}$ is the cooperative situation analysis matrix for layer l; $H^{(l+1)}$ is the cooperative situation analysis matrix for layer l+1; $H^{(1)} = Z$; $\sigma$ is the activation function of the graph convolutional network; and $W^{(l)}$ are the network parameters. The final cooperative situation analysis matrix $H^{(L)}$ is obtained after passing through the L-layer graph convolutional network.
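
The layer-by-layer propagation can be illustrated with plain nested lists. This sketch assumes the common form in which each layer multiplies the adjacency matrix, the current feature matrix and a weight matrix, then applies ReLU as the activation. The helper names and the choice of ReLU are assumptions made for illustration.

```python
def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(X):
    return [[max(0.0, v) for v in row] for row in X]

def gcn_forward(M, Z, weights):
    """Start from H = Z and apply H <- relu(M H W) once per weight matrix."""
    H = Z
    for W in weights:
        H = relu(matmul(matmul(M, H), W))
    return H

# Two unmanned systems that weight each other equally end up sharing features.
M = [[0.5, 0.5], [0.5, 0.5]]
Z = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
H = gcn_forward(M, Z, [identity, identity])
```

Row i of the returned matrix is the situation feature of the i-th unmanned system, mixed with the features of the others according to the weights in row i of the adjacency matrix.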

[0072] The third fully connected neural network is used to obtain the action strategy value vector of each unmanned system at the current moment based on the cooperative situation analysis matrix at the current moment, that is:

[0073] $Q_i = f_{E_3}(H^{(L)}[i])$ (8)

[0074] In the formula, $Q_i$ is the action policy value vector of the i-th unmanned system (containing the policy values of all actions executed by the i-th unmanned system), $f_{E_3}$ is the third fully connected neural network, and $H^{(L)}[i]$ represents the situation features of the i-th unmanned system in the cooperative situation analysis matrix $H^{(L)}$.

[0075] The action selection unit is used to select the action to be performed at the current moment from the action strategy value vector of the i-th unmanned system using the ε-greedy algorithm.
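
The ε-greedy rule itself is standard: with probability ε a random action is explored, otherwise the action with the highest policy value is taken. A minimal sketch follows; the function name and the optional `rng` argument are illustrative.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action index with probability epsilon, else the argmax of q_values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With `epsilon=0.0` the rule reduces to a purely greedy choice, which corresponds to acting on a trained model without exploration.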

[0076] As a preferred implementation, before applying the multi-unmanned system scene perception and decision-making model, it is necessary to use the multi-unmanned system experience pool D (built based on historical data) to train the multi-unmanned system scene perception and decision-making model in the early stage. At the same time, it is also necessary to use newly acquired data to continuously update the multi-unmanned system experience pool D.
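
An experience pool of this kind is commonly implemented as a fixed-capacity replay buffer that evicts the oldest transitions as new data arrive. The sketch below is one such arrangement; the class name, the stored fields and the uniform sampling are assumptions, since the specification does not describe the internal layout of pool D.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-capacity buffer of joint transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, observations, prev_actions, actions, reward, next_observations, done):
        # One entry holds the data of all unmanned systems for a single time step.
        self.buffer.append(
            (observations, prev_actions, actions, reward, next_observations, done)
        )

    def sample(self, batch_size, rng=random):
        """Draw up to batch_size transitions uniformly at random, without replacement."""
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```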

[0077] As shown in Figure 2, the training process of the scene perception and decision-making model for multi-unmanned systems includes:

[0078] Step 101: Obtain sensor information and actions executed by each unmanned system at historical moments; where historical moments refer to moments prior to the current moment.

[0079] Step 102: Construct the network model; the network model includes: a first network module, a second network module, and a third network module connected in sequence.

[0080] Step 103: Input the sensor information of the first historical moment of all unmanned systems, the execution actions of the first historical moment and the execution actions of the second historical moment into the network model, repeat the operation of formulas (1) to (8), train the network model with the goal of minimizing the network loss function, and determine the trained network model as the multi-unmanned system scene perception decision model; the second historical moment is the moment before the first historical moment.

[0081] The network loss function is:

[0082] $L_i(\theta_i) = \sum_{t \in T} \left( y_i^t - Q_i(H^{(L)}[i], a_i^t; \theta_i) \right)^2$ (9)

[0083] $y_i^t = r_t + \gamma \max_{a_i^{t+1}} Q_i(H^{(L)}[i], a_i^{t+1}; \theta_i^-)$ (10)

[0084] Among them, $L_i(\cdot)$ represents the network loss function, $\theta_i$ are the actual network parameters, and $T$ represents the different historical times. $y_i^t$ is the target strategy value of the i-th unmanned system at time t. $Q_i(H^{(L)}[i], a_i^t; \theta_i)$ is a concretization of the action strategy value vector of the i-th unmanned system. $a_i^t$ is the action performed by the i-th unmanned system at time t. $a_i^{t+1}$ is the action performed by the i-th unmanned system at time t+1. $r_t$ is the reward function at time t, and $\gamma$ is the reward discount factor. $\theta_i^-$ are the target network parameters.
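
The quantities defined above (a target strategy value, a reward, a discount factor and separate target network parameters) read as a squared temporal-difference loss of the kind used in deep Q-learning. The sketch below computes such a loss for one unmanned system from already-evaluated policy value vectors. The function names, the batch layout and the `done` flag that disables bootstrapping on terminal steps are assumptions for illustration.

```python
def td_target(reward, next_q_values, gamma, done=False):
    """Reward plus the discounted best next policy value; no bootstrap when done."""
    return reward if done else reward + gamma * max(next_q_values)

def td_loss(batch, gamma):
    """Sum of squared temporal-difference errors over a batch of transitions.

    Each item is (q_values, action, reward, next_q_values, done), where
    next_q_values are assumed to come from the target network.
    """
    loss = 0.0
    for q_values, action, reward, next_q_values, done in batch:
        y = td_target(reward, next_q_values, gamma, done)
        loss += (y - q_values[action]) ** 2
    return loss
```

For example, with policy values `[1.0, 2.0]`, chosen action 1, reward 1.0, next values `[0.5, 3.0]` and a discount factor of 0.9, the target is 1.0 + 0.9 × 3.0 = 3.7 and the squared error is (3.7 − 2.0)² = 2.89.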

[0085] To verify the effectiveness of the above-mentioned scene perception and decision-making method for multi-unmanned systems based on dynamic collaboration, the following data analysis and verification will be conducted using a multi-unmanned surface vessel collaborative search and rescue mission as an example.

[0086] As shown in Figures 3 and 4, to better reflect real-world application scenarios, this section constructs a multi-unmanned surface vessel (USV) collaborative search and rescue mission environment in the Unity 3D virtual simulation engine. This is a typical multi-unmanned system cluster collaborative rescue mission under incomplete observation. USVs use their onboard radar sensors to perceive and understand the environment, and select appropriate rescue targets and search routes based on their own understanding. Successfully rescuing all stranded individuals within a limited time is considered mission completion. The USVs modeled in this environment conform to the kinematic rules of maritime USVs, and the time delay and large inertia of maritime vehicles are also considered when implementing the control logic.

[0087] In real maritime rescue missions, unmanned surface vessel (USV) search and rescue teams conduct a thorough search of the sea surface according to pre-assigned patrol tasks to ensure that a search and rescue operation can be launched immediately upon receiving a distress signal. Furthermore, due to severe weather and fluctuating sea temperatures, those stranded can quickly suffer from hypothermia, making the search and rescue mission extremely urgent; failure to rescue them promptly can endanger their lives. Therefore, in the mission simulation, the initial state of the mission is randomized for the USV and the positions of the stranded individuals. The termination condition is set to the USV reaching the rescue range of all stranded individuals or completing a specified number of decision-making steps. The positions of the USV and the target point are reset at the start of a new round.
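
The termination condition described above can be expressed as a small predicate. This sketch assumes 2D positions, a circular rescue range, and that a stranded individual counts as reached when any USV is currently within that range. The function name, the `rescue_radius` parameter and this reading of "reaching the rescue range" are all assumptions.

```python
import math

def episode_done(usv_positions, victim_positions, rescue_radius, step, max_steps):
    """True when every victim has a USV within rescue_radius, or the step budget is spent."""
    all_reached = all(
        any(math.dist(u, v) <= rescue_radius for u in usv_positions)
        for v in victim_positions
    )
    return all_reached or step >= max_steps
```

An environment that marks individuals as rescued permanently once reached would track a set of rescued targets across steps instead of re-checking distances each time.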

[0088] In this embodiment, all observation information for the unmanned surface vessels (USVs) is provided by radar sensors with a maximum observable angle of 150 degrees and a total of 30 rays, which can acquire relevant information about walls, other USVs, and lifebuoys. Each USV's action space is 6-dimensional and discrete, with commands such as stationary, straight ahead, left turn, right turn, left turn, and right turn.

[0089] In the comparative experiments of this embodiment, the IQL algorithm under a fully distributed training framework was selected as one of the baseline algorithms; secondly, the DICG algorithm based on a single-head attention mechanism was selected as another benchmark algorithm to verify the role of the multi-head attention mechanism in graph attention mechanisms; finally, the cutting-edge multi-unmanned system reinforcement learning method MA-POCA was used as the benchmark. The experimental metric is the team reward score, that is, the cumulative reward of the unmanned system team in each round.

[0090] Table 1 Comparison of Team Reward Scores for Various Methods in Multi-USV Collaborative Search and Rescue Missions

[0091]
Method     40(k)   80(k)   120(k)   160(k)   200(k)
IQL        5.2     5.8     5.5      5.88     5.9
MA-POCA    5.89    6.23    6.0      6.17     6.16
DICG       5.85    6.1     6.25     6.56     6.53
Ours       6.1     6.24    6.71     6.56     6.66

[0092] Table 1 presents the performance of all algorithms from the perspective of algorithm scores at each training stage, with a sampling interval of 40k steps. It is evident that all algorithms tend to stabilize their scores after 120k iterations, indicating that the unmanned surface vessels (USVs) can effectively complete collaborative search and rescue tasks after training to a certain level of cooperative strategy. Specifically, in the initial 40k time steps, both the IQL and MA-POCA algorithms learn strategies with high scores. However, as time progresses, the improvement trend of these two algorithms becomes smaller and fluctuates. The main reason for this phenomenon is the poor adaptability of unmanned systems to non-steady-state scenarios, leading to poor learning effects of multi-USV cooperative strategies. The DICG algorithm performs only moderately in these three tasks, further illustrating that the cooperative graph constructed based on a single dimension has limited effect on algorithm improvement.

[0093] In summary, the localized observation characteristics of multiple unmanned systems' sensors lead to insufficient observation and understanding of their dynamic task scenarios, resulting in inadequate or even ineffective strategy capabilities. This invention addresses this issue by leveraging the sensor information and actions of each unmanned system to increase the amount of information, and by integrating the temporal characteristics of different unmanned systems through dynamic collaboration among them. This enhances the unmanned systems' understanding of the task scenario, considering the dynamic collaboration relationships (i.e., the importance of collaboration relationships at different times) to more accurately express the collaborative situation. This helps each unmanned system better integrate its state characteristics, achieve collaborative situation analysis, and promote teamwork.

[0094] Furthermore, embodiments of the present invention also provide a computer system, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the above-described dynamic cooperative multi-unmanned system scene perception and decision-making method.

[0095] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the systems disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the descriptions are relatively simple; relevant parts can be referred to the method section.

[0096] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A scene perception and decision-making method for multi-unmanned systems based on dynamic collaboration, characterized in that it includes: Acquire the sensor information of each unmanned system at the current moment and the actions performed at the previous moment; the sensor information includes at least RGB images from visual cameras and point cloud data from lidar. The sensor information of all unmanned systems at the current moment and the execution actions of the previous moment are input into the multi-unmanned system scene perception and decision model to obtain the execution actions of each unmanned system at the current moment; the multi-unmanned system scene perception and decision model includes a trained first network module, a trained second network module and a trained third network module; The trained second network module includes: a precoding matrix unit and a multi-head attention network; the precoding matrix unit is used to obtain the temporal trajectory precoding matrix at the current time based on the temporal trajectory precoding vectors of all unmanned systems at the current time; the multi-head attention network is used to obtain the adjacency matrix of the dynamic cooperation relationship of the unmanned systems at the current time based on the temporal trajectory precoding matrix at the current time; the adjacency matrix of the dynamic cooperation relationship of the unmanned systems is: $M = f_{Attention}(Z)$; $Z = [z_1^t, z_2^t, \dots, z_N^t]^{\top}$; where $M$ is the adjacency matrix of the dynamic cooperation relationship of the unmanned systems, $f_{Attention}(\cdot)$ is the multi-head attention network, $Z$ is the temporal trajectory precoding matrix, $z_i^t$ is the temporal trajectory pre-encoded vector of the i-th unmanned system at time t, i = 1, 2, ..., N, and N is the number of unmanned systems; The trained first network module is used to determine the pre-encoded temporal trajectory vector of each unmanned system at the current moment based on the sensor information of all unmanned systems at the current moment and the actions performed at the previous moment; The trained second network module is used to determine the adjacency matrix of the dynamic cooperative relationship of the unmanned systems at the current moment based on the pre-encoded vectors of the temporal trajectories of all unmanned systems at the current moment; The trained third network module is used to determine the action to be performed by each unmanned system at the current moment based on the pre-encoded vectors of the temporal trajectories of all unmanned systems at the current moment and the adjacency matrix of the dynamic cooperative relationship of the unmanned systems at the current moment.

2. The scene perception and decision-making method for multiple unmanned systems based on dynamic cooperation according to claim 1, characterized in that the trained first network module includes: a first fully connected neural network, a gated recurrent network, and a second fully connected neural network; the first fully connected neural network is used to obtain the state feature encoding vector of each unmanned system at the current moment based on the sensor information of all unmanned systems at the current moment and the actions executed at the previous moment; the gated recurrent network is used to obtain the hidden feature vector of each unmanned system at the current moment based on the state feature encoding vectors of all unmanned systems at the current moment and the hidden feature vectors at the previous moment, the hidden feature vector being the temporal trajectory feature encoding vector; the second fully connected neural network is used to obtain the temporal trajectory pre-encoding vector of each unmanned system at the current moment based on the temporal trajectory feature encoding vectors of all unmanned systems at the current moment.
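
The three-stage structure of claim 2 (fully connected layer, gated recurrent network, fully connected layer) can be sketched for a single unmanned system as follows. This is an illustrative sketch only: the class name `TrajectoryEncoder`, the layer sizes, the ReLU activation, the standard GRU gate equations and the random weights are assumptions, and the sensor information is represented as an already flattened feature vector rather than raw RGB images and point clouds.

```python
import math
import random

def linear(W, b, x):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init(rows, cols, rng):
    """Random weights and zero bias standing in for trained parameters."""
    W = [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]
    return W, [0.0] * rows

class TrajectoryEncoder:
    """FC -> GRU cell -> FC, applied to one unmanned system per call."""

    def __init__(self, in_dim, hid_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.fc1 = init(hid_dim, in_dim, rng)
        # Each gate acts on the concatenation [state features, hidden vector].
        self.update = init(hid_dim, 2 * hid_dim, rng)
        self.reset = init(hid_dim, 2 * hid_dim, rng)
        self.cand = init(hid_dim, 2 * hid_dim, rng)
        self.fc2 = init(out_dim, hid_dim, rng)

    def step(self, obs, prev_action, h_prev):
        # First fully connected layer: state feature encoding vector.
        x = [max(0.0, v) for v in linear(*self.fc1, obs + prev_action)]
        # Gated recurrent update: hidden (trajectory feature) vector.
        z = [sigmoid(v) for v in linear(*self.update, x + h_prev)]
        r = [sigmoid(v) for v in linear(*self.reset, x + h_prev)]
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        n = [math.tanh(v) for v in linear(*self.cand, x + gated)]
        h = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h_prev)]
        # Second fully connected layer: pre-encoding vector.
        e = linear(*self.fc2, h)
        return e, h

# Hypothetical inputs: 3 sensor features and a one-hot previous action.
enc = TrajectoryEncoder(in_dim=5, hid_dim=4, out_dim=3)
h = [0.0] * 4
obs = [0.2, -0.1, 0.7]
prev_action = [0.0, 1.0]
e, h = enc.step(obs, prev_action, h)
```

Calling `step` once per moment and feeding the returned `h` back in is what lets the hidden vector accumulate the trajectory of observations and actions over time.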

3. The scene perception and decision-making method for multiple unmanned systems based on dynamic cooperation according to claim 1, characterized in that the trained third network module includes: a graph convolutional network, a third fully connected neural network, and an action selection unit; the graph convolutional network is used to obtain the cooperative situation analysis matrix at the current moment based on the temporal trajectory pre-encoding vectors of all unmanned systems at the current moment and the dynamic cooperation relationship adjacency matrix of the unmanned systems at the current moment; the third fully connected neural network is used to obtain the action strategy value vector of each unmanned system at the current moment based on the cooperative situation analysis matrix at the current moment, the action strategy value vector containing the strategy values of all executable actions; the action selection unit is used to select the action to be executed at the current moment from the action strategy value vector using a greedy algorithm.
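
The data flow of claim 3 can be sketched with a single graph convolution followed by a linear value head and greedy selection. The single-layer form ReLU(A E W), the absence of bias terms, the function name `select_actions` and all numeric values below are assumptions chosen so the result is easy to check by hand.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def select_actions(A, E, W_gcn, W_q):
    """Return the greedy action index and value vector of every system."""
    # Graph convolution: mix each system's vector with its neighbours',
    # weighted by the adjacency matrix (cooperative situation analysis).
    S = relu(matmul(matmul(A, E), W_gcn))
    # Fully connected value head: one strategy value per action.
    Q = matmul(S, W_q)
    # Greedy selection: index of the largest strategy value per system.
    actions = [max(range(len(q)), key=q.__getitem__) for q in Q]
    return actions, Q

# Hypothetical values: 2 systems, 2-dimensional vectors, 3 actions.
A = [[0.9, 0.1], [0.2, 0.8]]
E = [[1.0, 0.0], [0.0, 1.0]]
W_gcn = [[1.0, 0.0], [0.0, 1.0]]
W_q = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
actions, Q = select_actions(A, E, W_gcn, W_q)
print(actions)  # [2, 1]
```

With these values the first system's strategy values are approximately (0.9, 0.3, 1.8) and the second system's are approximately (0.2, 2.4, 0.4), which is why the greedy choice differs between the two.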

4. The scene perception and decision-making method for multiple unmanned systems based on dynamic cooperation according to claim 1, characterized in that the training process of the multi-unmanned system scene perception and decision-making model includes: acquiring the sensor information and executed actions of each unmanned system at historical moments, the historical moments being moments prior to the current moment; constructing a network model, the network model including a first network module, a second network module, and a third network module connected in sequence; inputting the sensor information and executed actions of all unmanned systems at a first historical moment, together with their executed actions at a second historical moment, into the network model, training the network model with the goal of minimizing the network loss function, and determining the trained network model as the multi-unmanned system scene perception and decision-making model; the second historical moment is the moment immediately before the first historical moment.

5. The scene perception and decision-making method for multiple unmanned systems based on dynamic cooperation according to claim 4, characterized in that the network loss function is:

$$L(\theta) = \sum_{t} \sum_{i} \left( y_i^t - Q_i(\tau_i, a_i^t; \theta) \right)^2$$

$$y_i^t = r^t + \gamma \max_{a_i^{t+1}} Q_i(\tau_i, a_i^{t+1}; \theta^-)$$

where $L(\theta)$ is the network loss function, $\theta$ denotes the current network parameters, $t$ indexes the different historical moments, $y_i^t$ is the target strategy value of the $i$-th unmanned system at moment $t$, $Q_i$ is the action strategy value vector of the $i$-th unmanned system, $\tau_i$ is the situational feature of the $i$-th unmanned system, $a_i^t$ is the action executed by the $i$-th unmanned system at moment $t$, $a_i^{t+1}$ is the action to be executed by the $i$-th unmanned system at moment $t+1$, $r^t$ is the reward function at moment $t$, $\gamma$ is the reward discount factor, and $\theta^-$ denotes the target network parameters.
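
The quantities named in claim 5 (target strategy value, reward, discount factor, current and target network parameters) match the usual ingredients of a temporal-difference loss with a target network. The sketch below assumes that form: a squared error between a bootstrapped target and the value of the executed action, summed over unmanned systems for one moment. The function name `td_loss`, the summation rather than averaging, and the numeric values are assumptions, not details confirmed by the claim.

```python
def td_loss(q_online, q_target_next, actions, reward, gamma):
    """Sum of squared temporal-difference errors over all systems.

    q_online[i]      : strategy values of system i at moment t,
                       computed with the current network parameters
    q_target_next[i] : strategy values of system i at moment t+1,
                       computed with the target network parameters
    actions[i]       : index of the action system i executed at moment t
    reward           : shared team reward at moment t (assumption)
    """
    total = 0.0
    for q, q_next, a in zip(q_online, q_target_next, actions):
        # Target strategy value: reward plus discounted best next value.
        y = reward + gamma * max(q_next)
        total += (y - q[a]) ** 2
    return total

# Hypothetical values for two systems with two actions each.
q_online = [[1.0, 2.0], [0.5, 0.0]]
q_target_next = [[1.0, 3.0], [2.0, 1.0]]
loss = td_loss(q_online, q_target_next, actions=[1, 0], reward=1.0, gamma=0.5)
print(loss)  # 2.5
```

Here the targets are 2.5 and 2.0, the values of the executed actions are 2.0 and 0.5, and the squared errors 0.25 and 2.25 add up to 2.5.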

6. A computer system, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor executes the computer program to implement the steps of the scene perception and decision-making method for multiple unmanned systems based on dynamic cooperation according to any one of claims 1-5.