Extensible multi-agent hierarchical navigation method and device based on graph network
By using a multi-agent hierarchical navigation method based on graph networks, a relationship graph between agents and target points is generated, the relationship matrix is calculated, and the optimal path is planned. This solves the problems of high computational load, poor performance, and weak transfer capability in multi-agent navigation, and achieves efficient multi-target allocation and multi-machine cooperation.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- TSINGHUA UNIVERSITY
- Filing Date
- 2023-11-03
- Publication Date
- 2026-06-23
AI Technical Summary
Existing multi-agent navigation methods suffer from high computational cost, poor performance, and overly complex action and state spaces for multiple agents, leading to overfitting, poor transferability, and difficulty in large-scale application.
A scalable multi-agent hierarchical navigation method based on graph networks is adopted. By generating agent relationship graphs and target point relationship graphs, the agent-target point relationship matrix is calculated. The optimal path is generated using a preset path planner and a multi-layer perception strategy, enabling multiple agents to autonomously select navigation targets, reducing real-time computing overhead and improving model transferability.
It greatly reduces real-time computing overhead, improves performance and model transferability, promotes multi-objective allocation and multi-machine cooperation, and solves the problems of large computational load and poor performance in existing technologies.
Figure CN117629210B_ABST
Description
Technical Field
[0001] This application relates to the fields of robotics and machine learning, and in particular to a scalable multi-agent hierarchical navigation method and apparatus based on graph networks.
Background Technology
[0002] Multi-agent multi-target navigation is an important topic in robotics research, with significant applications in autonomous driving, disaster relief, and logistics. The task can be described as allocating multiple target points among multiple agents, which then navigate to their assigned points to complete their missions. During navigation, it is crucial to balance the load among the agents to achieve higher efficiency.
[0003] Currently, existing technologies mainly include goal allocation and path planning algorithms, as well as machine learning algorithms. Multi-goal allocation is typically modeled as the Multiple Traveling Salesman Problem (mTSP), as mixed-integer linear programming (MILP), or with auction and contract network models. The mTSP is an extension of the classic Traveling Salesman Problem. The MILP approach allocates tasks using a hybrid genetic algorithm with integer encoding; it can be combined with the Traveling Salesman Problem to solve dynamic multi-goal problems, enabling online dynamic task allocation and trajectory planning. The auction model, first proposed by Bertsekas, solves the task allocation problem, and the contract network protocol is then used to achieve real-time task allocation for agents. Path planning algorithms have a long research history in both single-agent and multi-agent applications. They can be deployed directly in different environments without additional training, and they search for a path that optimizes a predetermined performance function. Representative multi-agent path planning methods include the Rapidly-exploring Random Tree (RRT), Voronoi graph search, and the A* algorithm. While these algorithms do not require extensive training, they incur significant computational overhead during deployment. Moreover, because the parameters of the search algorithms are designed by hand, these methods generalize poorly and are not suitable for scenarios that require complex collaboration.
[0004] Machine learning algorithms, leveraging the performance of neural networks, are well-suited for complex collaborative tasks. In particular, algorithms based on reinforcement learning have achieved excellent results in navigation tasks. These algorithms typically use a pre-trained neural network model as the agent, allowing for deployment with minimal computational overhead, inferring each action of the agent through the trained model.
[0005] At present, algorithms based on recurrent neural networks can maintain an abstract representation of the environment through the memory of the recurrent network. In addition, algorithms based on imitation learning optimize their behavior by imitating expert action trajectories. However, as the number of agents increases, the action space and state space grow exponentially, which makes these algorithms very difficult to optimize. Therefore, although these algorithms achieve good performance, they are mostly applicable only to single-agent scenarios.
[0006] Existing technologies can address the above problems through multi-agent reinforcement learning, achieving relatively good performance in the case of multiple agents. However, this method still faces the following shortcomings:
[0007] 1. In scenarios with a large number of intelligent agents, the algorithm has very high requirements for computing power. The search space of the overall task grows exponentially with the number of intelligent agents. Faced with the huge search space of a large number of intelligent agents, the algorithm is difficult to train and converge to the optimal solution with limited computing power. Therefore, existing methods usually handle tasks with fewer than 10 intelligent agents.
[0008] 2. In scenarios with different numbers of agents, the transferability of the algorithm is weak. The network structure of existing algorithms usually only supports tasks with a fixed number of agents. Even if there is a network structure that supports changes in the number of agents, due to the overfitting characteristics of neural networks, the trained model often performs well in the training environment, but performs poorly in scenarios with different numbers of agents, which greatly limits the flexible deployment of the algorithm.
[0009] In summary, existing multi-agent navigation systems suffer from high real-time computational overhead and poor performance. Furthermore, the action and state spaces of multiple agents are too complex to optimize with single-agent machine learning methods. Overfitting also tends to occur during training, resulting in poor model transferability. Deployment in environments with varying numbers of agents is also difficult, which hinders large-scale application. These issues urgently need to be addressed.
Summary of the Invention
[0010] This application provides a scalable multi-agent hierarchical navigation method and apparatus based on graph networks to solve the problems of existing multi-agent navigation methods, such as large computational load, poor performance, and overly complex action and state spaces of multiple agents, which easily lead to overfitting, poor transfer ability, and difficulty in large-scale application.
[0011] The first aspect of this application provides a scalable multi-agent hierarchical navigation method based on graph networks, comprising the following steps: receiving state information of all agents and position information of all target points based on a preset target point selector, generating an agent relationship graph of all agents based on the state information, and generating a target point relationship graph of all target points based on the position information; calculating an agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph, and matching the corresponding optimal target point for the current agent based on the agent-target point relationship matrix; fusing the state information and the position information through a preset subgraph fusion strategy based on a preset path planner to obtain a fusion result, encoding the fusion result according to a preset multi-layer perception strategy, extracting the encoded relationship between the current agent and the optimal target point, and generating the optimal path between the current agent and the optimal target point based on the encoded relationship, so as to control the current agent to go to the optimal target point according to the optimal path.
[0012] Optionally, in one embodiment of this application, the step of calculating the agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph includes: randomly capturing any agent node in the agent relationship graph and any target point node in the target point relationship graph; calculating the linear projection of the node features of the agent node and the target point node respectively; performing a dot product operation on the linear projection of the node features of the agent node and the target point node to obtain the dot product operation result; and normalizing the dot product operation result according to a preset prediction function to obtain the agent-target point relationship matrix.
[0013] Optionally, in one embodiment of this application, fusing the state information and the position information through a preset subgraph fusion strategy includes: encoding all agents to obtain encoding results; splitting all agents based on the encoding results to generate multiple state subgraphs of the agent relationship graph, wherein each state subgraph of the multiple state subgraphs has the same number of agent nodes; encoding the multiple state subgraphs based on a preset graph network model to obtain an encoded state subgraph corresponding to each state subgraph; extracting feature data of the current agent in each encoded state subgraph, and performing information fusion on the feature data to obtain the average feature value of the current agent.
[0014] Optionally, in one embodiment of this application, the step of encoding the fusion result according to a preset multi-layer perception strategy, extracting the encoding relationship between the current agent and the optimal target point, and generating the optimal path between the current agent and the optimal target point based on the encoding relationship includes: encoding the feature average value of the current agent and the position information of the optimal target point according to the preset multi-layer perception strategy to obtain target point encoding information and current agent encoding information; extracting the encoding relationship between the current agent and the optimal target point based on the target point encoding information and the current agent encoding information, and generating the probability distribution of the next action of the current agent according to the encoding relationship; determining the next action of the current agent based on the path planner according to the probability distribution of the next action, and generating the optimal path from the current agent to the optimal target point through the next action.
[0015] A second aspect of this application provides a scalable multi-agent hierarchical navigation device based on graph networks, comprising: a generation module, configured to receive state information of all agents and position information of all target points based on a preset target point selector, and generate an agent relationship graph of all agents based on the state information, and a target point relationship graph of all target points based on the position information; a matching module, configured to calculate an agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph, and match the corresponding optimal target point for the current agent based on the agent-target point relationship matrix; and a navigation module, configured to fuse the state information and the position information based on a preset path planner using a preset subgraph fusion strategy to obtain a fusion result, encode the fusion result according to a preset multi-layer perception strategy, extract the encoded relationship between the current agent and the optimal target point, and generate the optimal path between the current agent and the optimal target point based on the encoded relationship, so as to control the current agent to travel to the optimal target point according to the optimal path.
[0016] Optionally, in one embodiment of this application, the matching module includes: a capturing unit, configured to randomly capture any agent node in the agent relationship graph and any target node in the target point relationship graph; a first calculation unit, configured to calculate the linear projection of node features of the any agent node and the any target node respectively; a second calculation unit, configured to perform a dot product operation on the linear projection of node features of the any agent node and the linear projection of node features of the any target node to obtain a dot product operation result; and a normalization unit, configured to normalize the dot product operation result according to a preset prediction function to obtain the agent-target point relationship matrix.
[0017] Optionally, in one embodiment of this application, the navigation module includes: a first encoding unit, configured to encode all the agents to obtain an encoding result; a splitting unit, configured to split all the agents based on the encoding result to generate multiple state subgraphs of the agent relationship graph, wherein each state subgraph of the multiple state subgraphs has the same number of agent nodes; a second encoding unit, configured to encode the multiple state subgraphs based on a preset graph network model to obtain an encoded state subgraph corresponding to each state subgraph; and a first extraction unit, configured to extract feature data of the current agent in each encoded state subgraph and perform information fusion on the feature data to obtain the average feature value of the current agent.
[0018] Optionally, in one embodiment of this application, the navigation module includes: a third encoding unit, configured to encode the average feature value of the current agent and the position information of the optimal target point according to the preset multi-layer perception strategy, to obtain target point encoding information and current agent encoding information; a second extraction unit, configured to extract the encoding relationship between the current agent and the optimal target point based on the target point encoding information and the current agent encoding information, and generate the probability distribution of the next action of the current agent according to the encoding relationship; and a determination unit, configured to determine the next action of the current agent based on the path planner and the probability distribution of the next action, and generate the optimal path from the current agent to the optimal target point through the next action.
[0019] A third aspect of this application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the scalable multi-agent hierarchical navigation method based on graph networks as described in the above embodiments.
[0020] A fourth aspect of this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described scalable multi-agent hierarchical navigation method based on graph networks.
[0021] Therefore, the embodiments of this application have the following beneficial effects:
[0022] The embodiments of this application can receive the state information of all agents and the position information of all target points based on a preset target point selector. Based on the state information, an agent relationship graph of all agents is generated, and based on the position information, a target point relationship graph of all target points is generated. An agent-target point relationship matrix is calculated based on the agent relationship graph and the target point relationship graph. Based on the agent-target point relationship matrix, the optimal target point is matched for the current agent. Based on a preset path planner, the state information and position information are fused using a preset subgraph fusion strategy to obtain the fusion result. The fusion result is encoded according to a preset multi-layer perception strategy to extract the encoded relationship between the current agent and the optimal target point. Based on the encoded relationship, the optimal path between the current agent and the optimal target point is generated, so that the current agent can be controlled to move to the optimal target point according to the optimal path. This application, based on reinforcement learning and a subgraph fusion strategy, enables multiple agents to autonomously select different target points as navigation targets simultaneously, promoting multi-target allocation and multi-machine cooperation, greatly reducing real-time computation overhead, and improving performance and model transferability. This solves the problems of existing multi-agent navigation methods, such as high computational cost, poor performance, and overly complex action and state spaces of multiple agents, which can easily lead to overfitting, poor transferability, and difficulty in large-scale application.
[0023] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Attached Figure Description
[0024] The above and / or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
[0025] Figure 1 is a flowchart of a scalable multi-agent hierarchical navigation method based on graph networks provided in an embodiment of this application;
[0026] Figure 2 is a schematic diagram of the logical architecture of a scalable multi-agent hierarchical navigation method based on graph networks provided in one embodiment of this application;
[0027] Figure 3 is a framework diagram of a path planner provided in one embodiment of this application;
[0028] Figure 4 is an example diagram of a scalable multi-agent hierarchical navigation device based on graph networks according to an embodiment of this application;
[0029] Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
[0030] Reference numerals: 10, scalable multi-agent hierarchical navigation device based on graph networks; 100, generation module; 200, matching module; 300, navigation module; 501, memory; 502, processor; 503, communication interface.
Detailed Implementation
[0031] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.
[0032] The following describes a scalable multi-agent hierarchical navigation method and apparatus based on graph networks according to embodiments of this application, with reference to the accompanying drawings. Addressing the problems mentioned in the background section, this application provides a scalable multi-agent hierarchical navigation method based on graph networks. In this method, based on a preset target point selector, the state information of all agents and the position information of all target points are received. An agent relationship graph of all agents is generated based on the state information, and a target point relationship graph of all target points is generated based on the position information. An agent-target point relationship matrix is calculated based on the agent relationship graph and the target point relationship graph. Based on the agent-target point relationship matrix, the optimal target point is matched for the current agent. Based on a preset path planner, the state information and position information are fused using a preset subgraph fusion strategy to obtain a fusion result. The fusion result is encoded according to a preset multi-layer perception strategy to extract the encoded relationship between the current agent and the optimal target point. Based on the encoded relationship, the optimal path between the current agent and the optimal target point is generated, so as to control the current agent to move to the optimal target point according to the optimal path. This application, based on reinforcement learning and subgraph fusion strategies, enables multiple agents to autonomously and simultaneously select different target points as navigation targets. This promotes multi-target allocation and multi-machine cooperation, significantly reduces real-time computation overhead, and improves performance and model transferability. 
Therefore, it solves the problems of existing multi-agent navigation methods, such as high computational cost, poor performance, and overly complex action and state spaces of multiple agents, leading to overfitting, poor transferability, and difficulty in large-scale application.
[0033] Specifically, Figure 1 is a flowchart of a scalable multi-agent hierarchical navigation method based on graph networks provided in an embodiment of this application.
[0034] As shown in Figure 1, this scalable multi-agent hierarchical navigation method based on graph networks includes the following steps:
[0035] In step S101, based on a preset target point selector, the state information of all agents and the position information of all target points are received, and an agent relationship graph of all agents is generated according to the state information, and a target point relationship graph of all target points is generated according to the position information.
[0036] The embodiments of this application can first use a target point selector to receive the state information of all agents and the position information of all target points. As shown in Figure 1, each agent is treated as a node, and all agents are connected into a fully connected graph $G_a$, that is, the agent relationship graph of all agents; in addition, each target point is also treated as a node, and all target points are connected into a fully connected graph $G_g$, that is, the target point relationship graph of all target points.
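The graph construction described above can be illustrated with a minimal sketch. It is not the applicant's implementation: the function name `fully_connected_graph`, the `(x, y, heading)` state layout, and the sample coordinates are all assumptions made for illustration.

```python
from itertools import combinations


def fully_connected_graph(node_features):
    """Return (nodes, edges) for a complete undirected graph.

    nodes maps a node index to its feature tuple; edges lists every
    unordered pair of node indices, so each node is linked to all others.
    """
    nodes = dict(enumerate(node_features))
    edges = list(combinations(nodes, 2))
    return nodes, edges


# Hypothetical inputs: the state layout (x, y, heading) is an assumption.
agent_states = [(0.0, 0.0, 0.1), (2.0, 1.0, -0.3), (4.0, 3.0, 0.0)]
target_positions = [(1.0, 1.0), (5.0, 5.0)]

G_a = fully_connected_graph(agent_states)      # agent relationship graph
G_g = fully_connected_graph(target_positions)  # target point relationship graph
```

With three agents the sketch yields three edges in `G_a`, and with two target points a single edge in `G_g`.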
[0037] Therefore, embodiments of this application can use a target point selector to construct an agent relationship graph and a target point relationship graph based on the state information of each agent and the position information of the target points. Introducing the target point selector in this way lets the agents select target points that are more conducive to global optimization.
[0038] In step S102, the agent-target point relationship matrix is calculated based on the agent relationship graph and the target point relationship graph, and the optimal target point is matched for the current agent based on the agent-target point relationship matrix.
[0039] After constructing the agent relationship graph and the target point relationship graph, embodiments of this application can further use a graph neural network to calculate the agent-target point relationship matrix from the agent relationship graph $G_a$ and the target point relationship graph $G_g$, and then select the best target point for the current agent based on the matching probabilities of all target points, completing the matching of the agent relationship graph and the target point relationship graph.
[0040] Optionally, in one embodiment of this application, calculating the agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph includes: randomly capturing any agent node in the agent relationship graph and any target point node in the target point relationship graph; calculating the linear projection of the node features of any agent node and any target point node respectively; performing a dot product operation on the linear projection of the node features of any agent node and the linear projection of the node features of any target point node to obtain the dot product operation result; and normalizing the dot product operation result according to a preset prediction function to obtain the agent-target point relationship matrix.
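The projection, dot product, and normalization steps above can be sketched as follows. This is a rough illustration under stated assumptions: Softmax is used as the preset prediction function, normalization is applied per agent row, and the projection matrices `W_agent` and `W_target` are placeholder identity matrices rather than learned weights.

```python
import math


def matvec(W, x):
    """Linear projection of feature vector x by matrix W (list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_matrix(agent_feats, target_feats, W_agent, W_target):
    """Row i holds agent i's matching probabilities over all target points."""
    projected_agents = [matvec(W_agent, a) for a in agent_feats]
    projected_targets = [matvec(W_target, g) for g in target_feats]
    matrix = []
    for pa in projected_agents:
        # Dot product between the projected agent and each projected target.
        scores = [sum(p * q for p, q in zip(pa, pg)) for pg in projected_targets]
        matrix.append(softmax(scores))  # assumption: normalize per agent row
    return matrix


# Placeholder data: identity projections stand in for learned weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
agent_feats = [[1.0, 0.0], [0.0, 1.0]]
target_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
M = relation_matrix(agent_feats, target_feats, identity, identity)
```

Each row of `M` sums to one, so it can be read as a probability distribution over target points for the corresponding agent.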
[0041] Specifically, as shown in Figure 2, embodiments of this application can first use a graph matcher to randomly capture any agent node in the agent relationship graph and any target node in the target point relationship graph; compute the linear projections $W_{key}X$, $W_{query}X$, and $W_{value}X$ of the node features $X$ of the agent node and the target node, respectively; apply the Softmax function to the dot product of $W_{query}X$ and $W_{key}X$ to obtain the inter-node weight matrix $W_{mi}$; and update the node features with a weight-sharing multi-layer perception $f_{node}$ that takes $X$ and $W_{value}X \cdot W_{mi}^{T}$ as input.
[0042] Secondly, embodiments of this application can calculate the weight matrix $W_r$ between any two nodes from different relationship graphs and update these nodes. A weight-sharing multi-layer perception layer $f_{edge}$ takes $W_{query}Y$, $W_{key}Z$, and $\mathrm{dist}_{FMM}$ as input and predicts $W_r$ through a Softmax operation; $f_{node}$ then encodes the node features from $Y$ and $W_{value}Z \cdot W_r^{T}$. Here $Y$ and $Z$ denote nodes from different relationship graphs, $\mathrm{dist}_{FMM}(i,j)$ is the FMM distance between node $y_i$ in $Y$ and node $z_j$ in $Z$, and $M_r(k,h)$ represents the relational distribution over all global target candidate points for agent $k$, where $k$ is a specific agent, $j$ is a planetary node, and $h$ is the star node corresponding to planetary node $j$.
[0043] Therefore, embodiments of this application can calculate the agent-target point relationship matrix using the agent relationship graph and the target point relationship graph based on a preset graph matcher, thereby assigning the best target point to the current agent.
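As a small illustration of the final matching step, the sketch below picks, for each agent, the target point with the highest matching probability in its row of the relationship matrix. The function name `best_targets` and the sample matrix are hypothetical, and the plain per-row argmax is an assumption: the source does not state how ties or duplicate selections are resolved, and this sketch does not enforce distinct targets.

```python
def best_targets(relation):
    """Return, for each agent row, the index of its most probable target."""
    return [max(range(len(row)), key=row.__getitem__) for row in relation]


# Hypothetical agent-target point relationship matrix (rows sum to 1).
relation = [
    [0.7, 0.1, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
assignment = best_targets(relation)  # one target index per agent
```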
[0044] In step S103, based on a preset path planner, state information and position information are fused through a preset subgraph fusion strategy to obtain a fusion result. The fusion result is then encoded according to a preset multi-layer perception strategy to extract the encoding relationship between the current agent and the best target point. Based on the encoding relationship, the best path between the current agent and the best target point is generated so as to control the current agent to go to the best target point according to the best path.
[0045] Furthermore, to generalize better in scenarios with a variable number of agents, embodiments of this application use a subgraph-based path planner, as shown in Figure 3. Each agent receives the state information of all other agents and the target point assigned by the target point selector (i.e., the best target point of the current agent). It uses the agent subgraph fusion unit and the agent-target point feature extractor to extract the relationships among agents and the relationship between the agent and its target point, respectively, and outputs the next action. This enables the current agent to effectively reduce the collision probability and reach the best target point faster in a multi-target-point, multi-agent scenario.
[0046] Therefore, the embodiments of this application adopt a distributed decision-making approach, in which each agent makes its own independent decision after receiving sensor information, to adapt to the situation in real-world scenarios where agents go online or offline midway. This utilizes hierarchical reinforcement learning to effectively decouple the policy space, reduce the spatial dimension, optimize the agent's neural network model with high data efficiency, and greatly improve the performance of the agent's neural network model.
[0047] Optionally, in one embodiment of this application, state information and position information are fused using a preset subgraph fusion strategy, including: encoding all agents to obtain encoding results; splitting all agents based on the encoding results to generate multiple state subgraphs of the agent relationship graph, wherein each state subgraph has the same number of agent nodes; encoding the multiple state subgraphs based on a preset graph network model to obtain an encoded state subgraph corresponding to each state subgraph; extracting feature data of the current agent in each encoded state subgraph, and fusing the feature data to obtain the average feature value of the current agent.
[0048] Those skilled in the art should understand that, in order to enable intelligent agents to better identify the state information of other intelligent agents around them and avoid unnecessary collisions in the process of heading to the best target point, the embodiments of this application use graph networks for relationship recognition. At the same time, in order to complete the task efficiently in scenarios with different numbers of intelligent agents, the embodiments of this application can also utilize an intelligent agent subgraph fusion device, which adopts the form of multiple subgraph representations to represent different numbers of intelligent agents as multiple subgraphs with a fixed number of nodes.
[0049] Specifically, embodiments of this application first encode the features of all agents using a multilayer perception strategy with weight sharing, based on a subgraph extractor; secondly, embodiments of this application can split the N agents, i.e., all agents, to obtain m subgraphs with k nodes each, where... Nodes in different subgraphs can represent repeated agents until the set of subgraphs contains all agents; then, graph neural networks can be used to encode each of the m subgraphs to extract the relationships between nodes in the subgraphs.
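The splitting step can be sketched as below, with several assumptions flagged. The source text elides the expression relating m, N, and k, so the subgraph count used here (a ceiling division) is a guess. Placing the current agent in every subgraph is also an assumption, suggested by the later step that reads the current agent's features from each encoded subgraph. Wrapping around to earlier agents to fill the last subgraph is one possible reading of "nodes in different subgraphs can represent repeated agents".

```python
import math


def split_into_subgraphs(agent_ids, current, k):
    """Split agents into subgraphs of exactly k nodes, each holding `current`.

    Assumptions (not from the source): m = ceil((N - 1) / (k - 1)) subgraphs,
    and the last subgraph is padded by wrapping around to earlier agents.
    """
    others = [a for a in agent_ids if a != current]
    per_graph = k - 1
    if per_graph < 1 or not others:
        raise ValueError("need k >= 2 and at least one other agent")
    m = math.ceil(len(others) / per_graph)
    subgraphs = []
    for s in range(m):
        members = [others[(s * per_graph + j) % len(others)]
                   for j in range(per_graph)]
        subgraphs.append([current] + members)
    return subgraphs


# Six hypothetical agents, current agent 0, fixed subgraph size k = 3.
subgraphs = split_into_subgraphs(list(range(6)), current=0, k=3)
```

Because every subgraph has the same size, the same graph network can encode each one regardless of how many agents are present in total.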
[0050] In the specific implementation process, the embodiments of this application can obtain the encoded information of each subgraph based on the subgraph feature fusion device, that is, the encoded state subgraph, extract the feature data of the agent in each encoded state subgraph, and perform a fusion operation on the extracted feature data to obtain the average feature value of the current agent, which is the encoded relationship characteristic between the current agent and other agents.
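The fusion operation described above can be illustrated with a short sketch that averages the current agent's feature vector across the encoded state subgraphs. The data layout (one dictionary per encoded subgraph, mapping an agent id to its feature vector) and the name `fuse_current_agent` are assumptions, and the feature values are placeholders rather than the output of a real graph network.

```python
def fuse_current_agent(encoded_subgraphs, current):
    """Average the current agent's features over every subgraph containing it."""
    vectors = [sg[current] for sg in encoded_subgraphs if current in sg]
    if not vectors:
        raise ValueError("current agent is absent from all subgraphs")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


# Placeholder encodings for two subgraphs that both contain agent 0.
encoded_subgraphs = [
    {0: [1.0, 2.0], 1: [0.5, 0.5], 2: [0.0, 1.0]},
    {0: [3.0, 4.0], 3: [1.5, 0.2], 4: [0.3, 0.9]},
]
average_feature = fuse_current_agent(encoded_subgraphs, current=0)
```

The result has a fixed dimension no matter how many subgraphs were produced, which is what lets the downstream encoder stay unchanged as the number of agents varies.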
[0051] It is understood that the embodiments of this application employ a subgraph fusion approach, which divides multiple agents into multiple subgraphs with a fixed number of nodes, where each node represents an agent. The information of the subgraphs is fused, and then the information from the fused subgraphs is merged. This is equivalent to decomposing scenarios with different numbers of agents into multiple trained scenarios with a fixed number of agents, thereby avoiding overfitting during training, improving the generalization of the trained model in scenarios with different numbers of agents, improving training performance in multi-agent environments with dynamically changing numbers of agents, and enhancing the model's transferability, enabling it to be deployed in environments with varying numbers of agents, thus facilitating large-scale applications.
[0052] Optionally, in one embodiment of this application, encoding the fusion result according to a preset multi-layer perception strategy, extracting the encoding relationship between the current agent and the optimal target point, and generating the optimal path between the current agent and the optimal target point based on the encoding relationship includes: encoding the average feature value of the current agent and the position information of the optimal target point according to the preset multi-layer perception strategy to obtain target point encoding information and current agent encoding information; extracting the encoding relationship between the current agent and the optimal target point based on the target point encoding information and the current agent encoding information, and generating the probability distribution of the next action of the current agent according to the encoding relationship; and determining, based on the path planner, the next action of the current agent according to the probability distribution of the next action, and generating the optimal path from the current agent to the optimal target point through the next action.
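The encoding and action-selection steps described above can be sketched as follows. This is only an illustration under stated assumptions: the layer sizes, the ReLU activation, the softmax output, the five discrete actions, and the randomly initialized weights are all hypothetical, since the text does not specify the network structure or the action space. A trained model would use learned weights.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_in, n_out):
    """Placeholder weights; a real system would load trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def action_distribution(agent_feat, target_pos, layers):
    # Encode the agent's average feature value and the target position separately.
    agent_code = relu(linear(agent_feat, *layers["agent"]))
    target_code = relu(linear(target_pos, *layers["target"]))
    # Combine both encodings and map them to one logit per action.
    logits = linear(agent_code + target_code, *layers["head"])
    return softmax(logits)


rng = random.Random(0)
layers = {
    "agent": random_layer(rng, 4, 8),    # 4-dim agent feature (assumed)
    "target": random_layer(rng, 2, 8),   # 2-dim target position (assumed)
    "head": random_layer(rng, 16, 5),    # 5 discrete actions (assumed)
}
probs = action_distribution([0.1, -0.2, 0.3, 0.0], [1.0, 2.0], layers)
# Sample the next action from the probability distribution.
next_action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
print(probs, next_action)
```

Repeating this step at each time instant yields a sequence of actions, which corresponds to the path from the current agent to its target point.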
[0053] Furthermore, embodiments of this application can also use an agent-target point feature extractor to fuse agent state information with target point position information. In the path planner, the current agent receives the state information of the other agents and the position information of the best target point assigned to it. The state information of the current agent and of the other agents is then connected to form a graph, and information fusion is performed on that graph. The fused graph information is then merged with the position information of the best target point and the state information of the current agent to complete agent feature enhancement. Moreover, embodiments of this application can encode the fusion result through a weight-sharing multi-layer perception strategy to extract the relationship between the agent and the target point, enabling the current agent to travel to its assigned best target point more efficiently.
[0054] Therefore, the embodiments of this application can utilize reinforcement learning to deploy the trained neural network as an intelligent agent in a distributed manner in the environment, which greatly reduces the overhead of real-time computing and generates efficient cooperative strategies, promoting multi-objective allocation and multi-machine cooperation.
[0055] The scalable multi-agent hierarchical navigation method based on graph networks proposed in this application receives state information of all agents and position information of all target points based on a preset target point selector. It generates an agent relationship graph for all agents based on the state information and a target point relationship graph for all target points based on the position information. An agent-target point relationship matrix is calculated based on the agent relationship graph and the target point relationship graph, and the optimal target point is matched for the current agent based on this matrix. A preset path planner fuses the state information and position information using a preset subgraph fusion strategy to obtain the fusion result. The fusion result is encoded according to a preset multi-layer perception strategy, and the encoded relationship between the current agent and the optimal target point is extracted. Based on this encoded relationship, the optimal path between the current agent and the optimal target point is generated, allowing the current agent to be controlled to reach the optimal target point. This application, based on reinforcement learning and a subgraph fusion strategy, enables multiple agents to autonomously select different target points simultaneously as navigation targets, promoting multi-target allocation and multi-machine cooperation, significantly reducing real-time computation overhead, and improving performance and model transferability.
[0056] Secondly, with reference to the accompanying drawings, a scalable multi-agent hierarchical navigation device based on graph networks, according to an embodiment of this application, is described.
[0057] Figure 4 is a block diagram of a graph-network-based scalable multi-agent hierarchical navigation device according to an embodiment of this application.
[0058] As shown in Figure 4, the scalable multi-agent hierarchical navigation device 10 based on graph networks includes: a generation module 100, a matching module 200, and a navigation module 300.
[0059] The generation module is used to receive the state information of all agents and the position information of all target points based on a preset target point selector, and generate an agent relationship graph of all agents based on the state information, and a target point relationship graph of all target points based on the position information.
[0060] The matching module is used to calculate the agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph, and to match the best target point for the current agent based on the agent-target point relationship matrix.
[0061] The navigation module is used to fuse state information and position information based on a preset path planner and a preset subgraph fusion strategy to obtain the fusion result. It then encodes the fusion result according to a preset multi-layer perception strategy, extracts the encoding relationship between the current agent and the best target point, and generates the best path between the current agent and the best target point based on the encoding relationship. The module then controls the current agent to go to the best target point according to the best path.
[0062] Optionally, in one embodiment of this application, the matching module 200 includes: a capture unit, a first calculation unit, a second calculation unit, and a normalization unit.
[0063] The capture unit is used to randomly capture any agent node in the agent relationship graph and any target point node in the target point relationship graph.
[0064] The first calculation unit is used to calculate the linear projection of the node features of the agent node and of the target point node, respectively.
[0065] The second calculation unit is used to perform a dot product operation on the linear projection of the node features of the agent node and the linear projection of the node features of the target point node to obtain the dot product operation result.
[0066] The normalization unit is used to normalize the dot product operation result according to the preset prediction function to obtain the agent-target point relationship matrix.
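The four units above (capture, linear projection, dot product, normalization) can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the text: the "preset prediction function" is taken to be a row-wise softmax, the projection weights are identity matrices chosen for readability, and each agent is matched to the target point with the highest score in its row.

```python
import math


def project(features, weights):
    """Linear projection of each node feature vector (one output per weight row)."""
    return [[sum(w * x for w, x in zip(row, f)) for row in weights]
            for f in features]


def relation_matrix(agent_feats, target_feats, w_agent, w_target):
    """Agent-target relationship matrix: softmax over projected dot products."""
    q = project(agent_feats, w_agent)
    k = project(target_feats, w_target)
    matrix = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) for kj in k]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        matrix.append([e / total for e in exps])  # each row sums to 1
    return matrix


identity = [[1.0, 0.0], [0.0, 1.0]]          # placeholder projection weights
agents = [[1.0, 0.0], [0.0, 1.0]]            # illustrative agent node features
targets = [[2.0, 0.0], [0.0, 2.0]]           # illustrative target node features
matrix = relation_matrix(agents, targets, identity, identity)
# Assumed matching rule: pick the highest-scoring target in each agent's row.
best_targets = [row.index(max(row)) for row in matrix]
print(matrix, best_targets)
```

Note that a per-row maximum does not by itself prevent two agents from selecting the same target point; how the method resolves such ties is not described in this section, so the matching rule here should be read as a placeholder.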
[0067] Optionally, in one embodiment of this application, the navigation module 300 includes: a first encoding unit, a splitting unit, a second encoding unit, and a first extraction unit.
[0068] The first encoding unit is used to encode all agents to obtain the encoding result.
[0069] The splitting unit is used to split all agents based on the encoding results, generating multiple state subgraphs of the agent relationship graph, wherein each state subgraph has the same number of agent nodes.
[0070] The second encoding unit is used to encode multiple state subgraphs based on a preset graph network model to obtain the encoded state subgraph corresponding to each state subgraph.
[0071] The first extraction unit is used to extract the feature data of the current agent in each coded state subgraph and perform information fusion on the feature data to obtain the average feature value of the current agent.
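The role of the second encoding unit can be illustrated with a single round of message passing over one state subgraph. This is a deliberately simplified stand-in for the "preset graph network model", whose structure is not specified here: the subgraph is assumed to be fully connected, the aggregation is a plain mean over the other nodes, and the mixing weight `self_weight` is a hypothetical fixed constant rather than a learned parameter.

```python
def encode_subgraph(node_feats, self_weight=0.5):
    """One mean-aggregation message-passing step on a fully connected subgraph.

    Each node's new feature mixes its own feature with the mean feature of
    the other nodes. A trained graph network would use learned weights and
    non-linearities instead of this fixed mix (assumption for illustration).
    """
    dim = len(node_feats[0])
    encoded = []
    for i, own in enumerate(node_feats):
        others = [f for j, f in enumerate(node_feats) if j != i]
        if others:
            mean = [sum(f[d] for f in others) / len(others) for d in range(dim)]
        else:
            mean = [0.0] * dim  # a single-node subgraph has no neighbours
        encoded.append([self_weight * own[d] + (1.0 - self_weight) * mean[d]
                        for d in range(dim)])
    return encoded


# A state subgraph with three agent nodes and 2-dim features (illustrative).
encoded_nodes = encode_subgraph([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(encoded_nodes)
```

Because every state subgraph has the same number of nodes, the same encoder can be applied to each subgraph regardless of how many agents are present in total.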
[0072] Optionally, in one embodiment of this application, the navigation module 300 further includes: a third encoding unit, a second extraction unit, and a determination unit.
[0073] The third encoding unit is used to encode the average feature value of the current agent and the location information of the best target point according to the preset multi-layer perception strategy, so as to obtain the target point encoding information and the current agent encoding information.
[0074] The second extraction unit is used to extract the encoding relationship between the current agent and the optimal target point based on the target point encoding information and the current agent encoding information, and to generate the probability distribution of the next action of the current agent based on the encoding relationship.
[0075] The determination unit is used to determine the next action of the current agent based on the probability distribution of the next action, and to generate the optimal path from the current agent to the optimal target point based on the next action.
[0076] It should be noted that the foregoing explanation of the embodiment of the scalable multi-agent hierarchical navigation method based on graph networks also applies to the scalable multi-agent hierarchical navigation device based on graph networks in this embodiment, and will not be repeated here.
[0077] The scalable multi-agent hierarchical navigation device based on graph networks proposed in this application includes a generation module, used to receive state information of all agents and position information of all target points based on a preset target point selector, and generate an agent relationship graph of all agents based on the state information, and a target point relationship graph of all target points based on the position information; a matching module, used to calculate an agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph, and match the corresponding best target point for the current agent based on the agent-target point relationship matrix; and a navigation module, used to fuse state information and position information based on a preset path planner and a preset subgraph fusion strategy to obtain a fusion result, and encode the fusion result according to a preset multi-layer perception strategy, extract the encoded relationship between the current agent and the best target point, and generate the best path between the current agent and the best target point based on the encoded relationship, so as to control the current agent to go to the best target point according to the best path. Based on reinforcement learning and subgraph fusion strategies, this application enables multiple agents to autonomously select different target points as navigation targets simultaneously, promoting multi-target allocation and multi-machine cooperation, greatly reducing the overhead of real-time computing, and improving performance and model transferability.
[0078] Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application. The electronic device may include:
[0079] A memory 501, a processor 502, and a computer program stored on the memory 501 and executable on the processor 502.
[0080] When the processor 502 executes the program, it implements the scalable multi-agent hierarchical navigation method based on graph networks provided in the above embodiments.
[0081] Furthermore, the electronic device also includes:
[0082] A communication interface 503, used for communication between the memory 501 and the processor 502.
[0083] The memory 501 is used to store computer programs that can run on the processor 502.
[0084] The memory 501 may include high-speed RAM memory, and may also include non-volatile memory, such as at least one disk storage device.
[0085] If the memory 501, processor 502, and communication interface 503 are implemented independently, then the communication interface 503, memory 501, and processor 502 can be interconnected via a bus to complete communication between them. The bus can be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, etc. Buses can be categorized as address buses, data buses, control buses, etc. For ease of representation, the bus is shown in Figure 5 as a single thick line, but this does not mean that there is only one bus or one type of bus.
[0086] Optionally, in a specific implementation, if the memory 501, processor 502, and communication interface 503 are integrated on a single chip, then the memory 501, processor 502, and communication interface 503 can communicate with each other through an internal interface.
[0087] Processor 502 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application.
[0088] This application also provides a computer-readable storage medium storing a computer program thereon, which, when executed by a processor, implements the above-described scalable multi-agent hierarchical navigation method based on graph networks.
[0089] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0090] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "N" means at least two, such as two, three, etc., unless otherwise explicitly specified.
[0091] Any process or method described in the flowchart or otherwise herein can be understood as representing a module, segment, or portion of code comprising one or N executable instructions for implementing custom logic functions or processes, and the scope of the preferred embodiments of this application includes additional implementations in which functions may be performed not in the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which embodiments of this application pertain.
[0092] The logic and/or steps represented in the flowchart or otherwise described herein can, for example, be considered a sequenced list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection having one or more wires (electronic device), a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Alternatively, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
[0093] It should be understood that the various parts of this application can be implemented using hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods can be implemented using software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.
[0094] Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
[0095] Furthermore, the functional units in the various embodiments of this application can be integrated into a processing module, or each unit can exist physically separately, or two or more units can be integrated into a module. The integrated module can be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
[0096] The storage medium mentioned above can be a read-only memory, a disk, or an optical disk, etc. Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of this application.
Claims
1. A scalable multi-agent hierarchical navigation method based on graph networks, characterized in that it includes the following steps:
based on a preset target point selector, receiving the state information of all agents and the position information of all target points, generating an agent relationship graph of all agents based on the state information, and generating a target point relationship graph of all target points based on the position information;
calculating an agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph, and matching the corresponding optimal target point for the current agent based on the agent-target point relationship matrix;
based on a preset path planner, fusing the state information and the position information through a preset subgraph fusion strategy to obtain a fusion result, encoding the fusion result according to a preset multi-layer perception strategy, extracting the encoding relationship between the current agent and the optimal target point, and generating the optimal path between the current agent and the optimal target point based on the encoding relationship, so as to control the current agent to go to the optimal target point according to the optimal path;
wherein the step of calculating the agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph includes:
randomly capturing any agent node in the agent relationship graph and any target point node in the target point relationship graph;
calculating the linear projection of the node features of the agent node and of the target point node, respectively;
performing a dot product operation on the linear projection of the node features of the agent node and the linear projection of the node features of the target point node to obtain a dot product operation result;
normalizing the dot product operation result according to a preset prediction function to obtain the agent-target point relationship matrix;
the step of fusing the state information and the position information through the preset subgraph fusion strategy includes:
encoding all the agents to obtain an encoding result;
splitting all the agents based on the encoding result to generate multiple state subgraphs of the agent relationship graph, wherein each state subgraph of the multiple state subgraphs has the same number of agent nodes;
encoding the multiple state subgraphs based on a preset graph network model to obtain the encoded state subgraph corresponding to each state subgraph;
extracting the feature data of the current agent in each encoded state subgraph, and performing information fusion on the feature data to obtain the average feature value of the current agent;
the step of encoding the fusion result according to the preset multi-layer perception strategy, extracting the encoding relationship between the current agent and the optimal target point, and generating the optimal path between the current agent and the optimal target point based on the encoding relationship includes:
encoding the average feature value of the current agent and the position information of the optimal target point according to the preset multi-layer perception strategy to obtain target point encoding information and current agent encoding information;
extracting the encoding relationship between the current agent and the optimal target point based on the target point encoding information and the current agent encoding information, and generating the probability distribution of the next action of the current agent according to the encoding relationship;
based on the path planner, determining the next action of the current agent according to the probability distribution of the next action, and generating the optimal path from the current agent to the optimal target point through the next action.
2. A scalable multi-agent hierarchical navigation device based on graph networks, characterized in that it includes:
a generation module, used to receive the state information of all agents and the position information of all target points based on a preset target point selector, generate an agent relationship graph of all agents based on the state information, and generate a target point relationship graph of all target points based on the position information;
a matching module, used to calculate the agent-target point relationship matrix based on the agent relationship graph and the target point relationship graph, and to match the corresponding optimal target point for the current agent based on the agent-target point relationship matrix;
a navigation module, used to fuse the state information and the position information based on a preset path planner and a preset subgraph fusion strategy to obtain a fusion result, encode the fusion result according to a preset multi-layer perception strategy, extract the encoding relationship between the current agent and the optimal target point, and generate the optimal path between the current agent and the optimal target point based on the encoding relationship, so as to control the current agent to go to the optimal target point according to the optimal path;
wherein the matching module includes:
a capture unit, used to randomly capture any agent node in the agent relationship graph and any target point node in the target point relationship graph;
a first calculation unit, used to calculate the linear projection of the node features of the agent node and of the target point node, respectively;
a second calculation unit, used to perform a dot product operation on the linear projection of the node features of the agent node and the linear projection of the node features of the target point node to obtain a dot product operation result;
a normalization unit, used to normalize the dot product operation result according to a preset prediction function to obtain the agent-target point relationship matrix;
the navigation module includes:
a first encoding unit, used to encode all the agents to obtain an encoding result;
a splitting unit, used to split all the agents based on the encoding result to generate multiple state subgraphs of the agent relationship graph, wherein each state subgraph of the multiple state subgraphs has the same number of agent nodes;
a second encoding unit, used to encode the multiple state subgraphs based on a preset graph network model to obtain the encoded state subgraph corresponding to each state subgraph;
a first extraction unit, used to extract the feature data of the current agent in each encoded state subgraph, and to perform information fusion on the feature data to obtain the average feature value of the current agent;
the navigation module further includes:
a third encoding unit, used to encode the average feature value of the current agent and the position information of the optimal target point according to the preset multi-layer perception strategy, so as to obtain target point encoding information and current agent encoding information;
a second extraction unit, used to extract the encoding relationship between the current agent and the optimal target point based on the target point encoding information and the current agent encoding information, and to generate the probability distribution of the next action of the current agent according to the encoding relationship;
a determination unit, used to determine, based on the path planner, the next action of the current agent according to the probability distribution of the next action, and to generate the optimal path from the current agent to the optimal target point through the next action.
3. An electronic device, characterized in that it includes: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the scalable multi-agent hierarchical navigation method based on graph networks as described in claim 1.
4. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the scalable multi-agent hierarchical navigation method based on graph networks as described in claim 1.