A multi-objective routing decision-making method and system applied to the metaverse
By establishing diverse service network models and multi-agent routing methods in the metaverse, the problem that traditional networks cannot meet the high quality of service requirements of the metaverse is solved, enabling on-demand customized routing decisions and resource optimization, and improving the QoS of the metaverse network.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- NORTHWESTERN POLYTECHNICAL UNIV
- Filing Date
- 2023-03-13
- Publication Date
- 2026-06-30
AI Technical Summary
The existing network environment cannot meet the high quality of service requirements of the metaverse: traditional core networks are prone to congestion, edge computing is difficult and costly to deploy and has limited resources, and traditional routing algorithms cannot distinguish the QoS requirements of diverse services.
A multi-objective routing decision-making method is adopted. By conducting in-depth analysis of the diverse services in the metaverse, a diversified service network model is established. Multi-agent routing methods and reinforcement learning algorithms are used to make routing decisions in a collaborative manner to meet the differentiated QoS requirements of different data types.
It enables on-demand customized routing decisions in the metaverse, meeting the QoS requirements of diverse services, improving network resource utilization, optimizing routing strategies, reducing network congestion, and enhancing user experience.
Figure CN116366533B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of edge network routing decision technology, and specifically relates to a multi-objective routing decision method and system applied to the metaverse.
Background Technology
[0002] The metaverse has become a new and important form of the next generation of the Internet. However, the current network environment is not yet able to adapt to the immersive high Quality of Service (QoS) requirements of the metaverse.
[0003] Traditional core networks, by centralizing various applications to the cloud, are prone to network congestion and cannot meet the QoS requirements of the metaverse, such as latency and packet loss. Multi-access edge computing (MEC) brings servers closer to the endpoints, enabling faster network service response. However, it is difficult to deploy, costly, and suffers from limited and unevenly distributed resources. Furthermore, traditional routing algorithms cannot differentiate between diverse services, selecting the next-hop route solely based on the destination address.
[0004] Currently, there is a lack of research on routing decisions that address the diverse QoS requirements of different services. Furthermore, the metaverse has spawned numerous application scenarios, such as virtual education, virtual meetings, industrial manufacturing, and immersive performances, further highlighting the diverse QoS requirements of services. At the same time, the distribution of various services across cities also varies significantly.
Summary of the Invention
[0005] The technical problem to be solved by this invention is to address the shortcomings of the prior art by providing a multi-objective routing decision method and system for the metaverse. By conducting in-depth analysis of the diverse services of the metaverse, this invention solves the technical problem that routing decisions cannot be customized on demand for different data types in specific application scenarios of the metaverse.
[0006] The present invention adopts the following technical solution:
[0007] A multi-objective routing decision-making method applied to the metaverse includes the following steps:
[0008] S1. Design the metaverse network scenario and establish a diversified business network model under the metaverse background;
[0009] S2. Based on the diversified business network model obtained in step S1, and taking the metaverse traffic characteristics as a basis, model the differentiated service quality characteristics, decompose the diversified business requirements into multiple independent target optimizations, and obtain a multi-objective optimization model.
[0010] S3. Based on the multi-objective optimization model obtained in step S2, create virtual objects as decision models for various data types in each router;
[0011] S4. Using the decision model obtained in step S3 as an independent agent, construct a cooperative multi-agent routing method.
[0012] S5. Train the agents using the multi-agent routing method constructed in step S4, and implement multi-objective routing decisions based on the trained agents.
[0013] Specifically, in step S1, the scenario construction of the metaverse network is as follows:
[0014] The core routing network is applied within a city, with M regions having different social service functions and different service types. There are n core routers in each region. The set of core routers V is used as nodes in an undirected graph, and the set of links E is used as edges in the undirected graph, resulting in an undirected graph G(V,E) representing the metaverse network environment.
[0015] Specifically, in step S2, the diversified services are defined as follows:
[0016] Services are divided into latency-sensitive services, high-reliability services, and throughput-sensitive services, with the following weights for the three indicators:
[0017]
[0018] Services are differentiated by assigning different weights to the three indicators, and a utility function is used as the evaluation function for each type of service.
[0019] Furthermore, the evaluation function for each type of service is as follows:
[0020]
[0021] where the three metric terms represent the normalized latency, packet loss, and throughput at time t, respectively, and the balancing coefficients α1, α2, and α3 are all set to 1.
[0022] Specifically, in step S3, a decision model is set for the end-to-end routing path between key routers, and the minimum hop count routing algorithm is used for the end-to-end routing path of other routers. The end-to-end routing path between key routers selects i routers as starting nodes and j routers as ending nodes, and decomposes them into 3 types of services. There are a total of i×j×3 agents in the scenario.
[0023] Furthermore, for the decision model, the service quality of the corresponding services in the end-to-end routing path is evaluated using a utility function, and the set of optional actions for the end-to-end routing path is obtained through the k-shortest path algorithm. Each virtual object is treated as an agent, and a multi-agent reinforcement learning algorithm is used to train different end-to-end paths and corresponding services. The resulting multi-agent reinforcement learning model is used as the decision model for each end-to-end path.
[0024] Specifically, in step S4, a centralized learning and distributed execution application framework is adopted, as follows:
[0025] First, we model the state, action, and reward of the reinforcement learning algorithm:
[0026] Based on modeling of network latency, packet loss rate, and throughput, the network environment status is determined as follows:
[0027] $s_t = \{s_{t,1}, s_{t,2}, \ldots, s_{t,n}\}$
[0028] where the state of routing node $v_i$ is described by the number of data packets in its queue, the data packet length $l$, and the sending rate of node $v_i$. Each agent can only obtain the states of its adjacent routing nodes as its observation space, represented as...
[0029] The set of k shortest path actions in each decision model is used as the action space;
[0030] Use a utility function as the reward value;
[0031] The distributed execution strategy is as follows: each agent uses its own neural network to obtain the action that maximizes its local value function, and uses gated recurrent units (GRUs) to track state changes over time. For learning centralized information, a mixing network (the hybrid network) integrates the local value functions of the agents to obtain a joint action-value function.
[0032] By restricting the weights of the hybrid network to non-negative numbers and taking the action that maximizes the local value function, the joint action value function is maximized, as follows:
[0033]
[0034]
[0035] where $\tau_i$ represents the observation state of agent i and $u_i$ represents the action performed by agent i;
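For reference, the two conditions described in [0032] match the standard QMIX formulation, which can be written as below. This is given as an assumption about the formulas missing from [0033] and [0034], not as a reproduction of them: $Q_{tot}$ (joint action-value function), $Q_i$ (local value function of agent i), and $N$ (number of agents) are the usual QMIX names and do not appear in the surviving text.

```latex
% Monotonicity: enforced by keeping the mixing-network weights non-negative
\frac{\partial Q_{tot}}{\partial Q_i} \ge 0, \qquad \forall i \in \{1, \ldots, N\}

% Consequence: the joint greedy action is the tuple of per-agent greedy actions
\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) =
\begin{pmatrix}
\arg\max_{u_1} Q_1(\tau_1, u_1) \\
\vdots \\
\arg\max_{u_N} Q_N(\tau_N, u_N)
\end{pmatrix}
```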
[0036] The loss function L(θ) of the QMIX algorithm is as follows:
[0037]
[0038] where b represents the size of the batch sampled at random from the experience pool, the target term is the target network's estimate for sample i, θ represents the network parameters, and s represents the environment state.
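For reference, the standard QMIX loss has the form below, which is consistent with the description in [0038]. It is offered as an assumption about the formula missing from [0037]: the target symbol $y_i^{tot}$, the discount factor $\gamma$, the reward $r$, and the target-network parameters $\theta^{-}$ are the usual QMIX notation rather than symbols recovered from the source.

```latex
L(\theta) = \sum_{i=1}^{b} \left( y_i^{tot} - Q_{tot}(\boldsymbol{\tau}, \mathbf{u}, s; \theta) \right)^2,
\qquad
y^{tot} = r + \gamma \max_{\mathbf{u}'} Q_{tot}(\boldsymbol{\tau}', \mathbf{u}', s'; \theta^{-})
```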
[0039] Specifically, in step S5, the training of the multi-agent system is carried out as follows:
[0040] S501. Input the number of training rounds T and the number of packet forwarding rounds required in each training round, episode_limit;
[0041] S502. Model the multi-agent reinforcement learning method;
[0042] S503. Assign the agents and set up neural network models for the routers' forwarding services and destinations;
[0043] S504. Construct and initialize the experience pool;
[0044] S505. Determine if the current training round number is less than T;
[0045] S506. When the number of training rounds is less than T, proceed to step S508.
[0046] S507. When the number of training rounds is greater than or equal to T, proceed to step S5022.
[0047] S508. Initialize the environment and obtain the corresponding attributes and status;
[0048] S509. Reset the packet forwarding round count to zero;
[0049] S5010. Determine if the number of data packet forwarding rounds is less than episode_limit;
[0050] S5011. When the number of data packet forwarding rounds is less than episode_limit, proceed to step S5013.
[0051] S5012. When the number of packet forwarding rounds is greater than or equal to episode_limit, proceed to step S5021.
[0052] S5013. Increment the packet forwarding round count by 1;
[0053] S5014. Obtain the Q value through the neural network model of each agent, and select actions according to the ε greedy policy;
[0054] S5015. Store the current status, next status, reward, action and other data into the experience pool;
[0055] S5016. Determine if the number of data in the experience pool is greater than or equal to the batch size;
[0056] S5017. When the experience pool data is greater than or equal to the batch size, proceed to step S5019.
[0057] S5018. When the experience pool data is smaller than the batch size, jump to step S5010.
[0058] S5019. Randomly sample and learn data from the same position in each episode in the experience pool;
[0059] S5020. Update the neural network parameters and jump to step S5010;
[0060] S5021. Increment the number of training rounds by 1, and jump to step S505;
[0061] S5022. Obtain the Q value using the neural network trained by each agent;
[0062] S5023. Each agent selects the action with the largest Q value as the forwarding path for various data streams;
[0063] S5024. Output the corresponding forwarding path.
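The control flow of steps S501 to S5024 can be sketched as below. This is a minimal illustration under strong simplifying assumptions, not the patented method: the per-agent neural networks and the QMIX mixing network are replaced by one small value table per agent, the network state is omitted, experiences are sampled uniformly from the pool, and `toy_reward` with its path qualities and congestion penalty is a hypothetical stand-in for the utility function. All names and default values are illustrative.

```python
import random

def greedy(values):
    """Index of the largest value (the first index wins ties)."""
    return max(range(len(values)), key=values.__getitem__)

def train(num_agents, num_actions, reward_fn, T, episode_limit,
          batch_size=8, epsilon=0.2, lr=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_agents)]  # S503: one value table per agent
    pool = []                                              # S504: experience pool
    for _ in range(T):                                     # S505-S507 and S5021
        for _ in range(episode_limit):                     # S509-S5013
            # S5014: epsilon-greedy action selection for every agent
            actions = [rng.randrange(num_actions) if rng.random() < epsilon
                       else greedy(q[i]) for i in range(num_agents)]
            pool.append((actions, reward_fn(actions)))     # S5015
            if len(pool) < batch_size:                     # S5016-S5018
                continue
            for acts, reward in rng.sample(pool, batch_size):  # S5019
                for i, a in enumerate(acts):               # S5020: move estimate toward reward
                    q[i][a] += lr * (reward - q[i][a])
    return [greedy(row) for row in q]                      # S5022-S5024

def toy_reward(actions):
    """Hypothetical shared reward: path quality minus a penalty when agents collide."""
    quality = [0.2, 1.0, 0.5]
    reward = sum(quality[a] for a in actions)
    if len(set(actions)) < len(actions):
        reward -= 0.3  # assumed congestion penalty
    return reward

paths = train(num_agents=2, num_actions=3, reward_fn=toy_reward, T=50, episode_limit=20)
print(paths)  # one chosen path index per agent
```

Because every agent is updated with the same shared reward, the sketch keeps the cooperative flavor of the procedure, but it has none of the credit assignment that a mixing network provides.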
[0064] Furthermore, the routing and packet forwarding are as follows:
[0065] Step 1: Initialize the environment and set related parameters;
[0066] Step 2: Load the network topology;
[0067] Step 3: Update link latency, packet loss, and router cache utilization based on network conditions;
[0068] Step 4: Use the RIP algorithm as the routing algorithm for the non-intelligent router;
[0069] Step 5: Initialize the routing algorithms of each agent router;
[0070] Step 6: When the agent router selects the deep reinforcement learning routing algorithm, proceed to step 9;
[0071] Step 7: When the agent router selects the MT-OSPF routing algorithm, proceed to step 32;
[0072] Step 8: When the intelligent agent router selects the RIP routing algorithm, proceed to step 34;
[0073] Step 9: Calculate the weight of the path corresponding to each data type using the utility function;
[0074] Step 10: Determine if this is the first time the k shortest paths are obtained as the set of optional actions;
[0075] Step 11: If this is the first time the k shortest paths are obtained, jump to step 13;
[0076] Step 12: If this is not the first time the k shortest paths are obtained, proceed to step 14;
[0077] Step 13: Use the MT-OSPF algorithm to obtain the k shortest paths as the set of possible actions for each agent;
[0078] Step 14: Use the agent networks to obtain the packet forwarding path;
[0079] Step 15: Perform one round of packet forwarding based on the packet forwarding path;
[0080] Step 16: Obtain the new network status;
[0081] Step 17: Determine whether the routing algorithm is a deep reinforcement learning algorithm;
[0082] Step 18: If the routing algorithm is a deep reinforcement learning algorithm, proceed to step 20;
[0083] Step 19: If the routing algorithm is not a deep reinforcement learning algorithm, skip to step 6;
[0084] Step 20: Calculate and record the reward value based on the utility function;
[0085] Step 21: Determine if the number of data packet forwarding attempts is less than episode_limit;
[0086] Step 22: When the number of packet forwardings is less than episode_limit, proceed to step 6;
[0087] Step 23: Increase episode_limit by 1;
[0088] Step 24: Train the neural network model for each agent;
[0089] Step 25: Determine if the number of training iterations is less than train_limit;
[0090] Step 26: When the number of training iterations is less than train_limit, proceed to step 28;
[0091] Step 27: When the number of training iterations is greater than or equal to train_limit, proceed to step 6;
[0092] Step 28: Increase training count by 1;
[0093] Step 29: Determine whether each agent has selected a deep reinforcement learning routing algorithm;
[0094] Step 30: When the agent selects a deep reinforcement learning routing algorithm, proceed to step 35;
[0095] Step 31: If the agent does not select a deep reinforcement learning routing algorithm, proceed to step 36.
[0096] Step 32: Calculate the weight of the path corresponding to each data type using the utility function;
[0097] Step 33: Use the MT-OSPF algorithm to obtain the packet forwarding path, and then jump to step 15;
[0098] Step 34: Use the RIP algorithm to obtain the packet forwarding path, and then jump to step 15;
[0099] Step 35: Save the current neural network models of each agent;
[0100] Step 36: Output the corresponding performance graph.
[0101] In a second aspect, embodiments of the present invention provide a multi-objective routing decision system applied to the metaverse, comprising:
[0102] The network module designs the metaverse network scenario and establishes a diverse business network model within the metaverse context.
[0103] The demand module, based on the diversified business network model obtained from the network module, uses the metaverse traffic characteristics as a basis to model the differentiated service quality characteristics, decomposes the diverse business demands into multiple independent target optimizations, and obtains a multi-objective optimization model.
[0104] The decision module creates virtual objects based on the multi-objective optimization model obtained from the demand module, serving as decision models for various data types in each router.
[0105] The collaboration module treats the decision model obtained from the decision module as an independent agent and constructs a cooperative multi-agent routing method.
[0106] The training module uses the multi-agent routing method built by the collaboration module to train the agents, and then implements multi-objective routing decisions based on the trained agents.
[0107] Compared with the prior art, the present invention has at least the following beneficial effects:
[0108] A multi-objective routing decision-making method applied to the metaverse provides a solution for end-to-end routing strategies addressing the differentiated QoS requirements of diverse services in future 6G core network construction. It analyzes the diverse QoS of metaverse services from three perspectives: latency, packet loss, and throughput, and uses a utility function to weight these three indicators, aiming to quantify the requirements of each service for different service types. Furthermore, this invention employs multi-agent collaborative reinforcement learning to enable different types of services in each router to select paths as needed under the current network resource conditions, improving utility function indicators, fully utilizing network resources, and meeting differentiated QoS requirements. By segmenting features and using a utility function to highlight these features, the method quantifies the diverse services of the metaverse from a routing decision-making perspective, formulates routing strategies as needed, and achieves multi-objective optimization of diverse services in the metaverse.
[0109] Furthermore, metaverse applications are still in their early stages, and as an important part of 6G, there are currently no relevant scenario solutions. Therefore, a network model is constructed based on the functional area division within a city.
[0110] Furthermore, based on the characteristics of diverse services and differentiated QoS requirements in the metaverse network scenario constructed in the previous step, diverse QoS indicators are modeled to refine the processing of diverse services in the metaverse.
[0111] Furthermore, different metaverse services have differentiated requirements for different QoS indicators. By proposing a utility function, weights are heuristically assigned to differentiated QoS, reflecting the QoS requirements of diverse services.
[0112] Furthermore, virtual objects are created for various service types along each end-to-end path as decision models for path selection. This facilitates optimal individual routing algorithms for diverse services during end-to-end path transmission. In addition, employing a hybrid routing approach—setting virtual objects in critical end-to-end transmission paths while using only minimum hop count routing in other end-to-end paths—improves network scalability and accelerates the convergence speed of routing algorithms.
[0113] Furthermore, each virtual object's decision-making only considers its own optimality. However, without coordination among these virtual objects, conflicts between decisions can easily arise, leading to network congestion. Adopting a cooperative algorithm framework of centralized learning and distributed execution is beneficial for coordinating various decisions and preventing conflicts.
[0114] Furthermore, by using multi-agent reinforcement learning training, multi-objective functions are optimized to improve the QoS of diverse services.
[0115] Furthermore, a packet forwarding method applicable to diverse services in the metaverse is provided.
[0116] It can be understood that the beneficial effects of the second aspect are covered by the relevant descriptions of the first aspect above and are not repeated here.
[0117] In summary, this invention can be used for routing calculations in diverse business scenarios within the metaverse, providing a new approach for the construction of future 6G core networks.
[0118] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.
Attached Figure Description
[0119] Figure 1 is a flowchart of the present invention;
[0120] Figure 2 is a diagram of the diverse service scenarios within the metaverse;
[0121] Figure 3 is a diagram of the multi-agent reinforcement learning framework in the metaverse scenario;
[0122] Figure 4 is a neural network model diagram for the multi-agent reinforcement learning algorithm;
[0123] Figure 5 is a flowchart of the multi-agent reinforcement learning algorithm;
[0124] Figure 6 is a flowchart of the routing packet forwarding algorithm;
[0125] Figure 7 is a comparison chart of the reward values of the present invention and those of the MT-OSPF and RIP methods;
[0126] Figure 8 is a performance comparison chart of the average packet latency of the present invention, MT-OSPF, and RIP;
[0127] Figure 9 is a performance comparison chart of the packet loss rate of the present invention, MT-OSPF, and RIP;
[0128] Figure 10 is a performance comparison chart of the data packet throughput of the present invention, MT-OSPF, and RIP.
Detailed Implementation
[0129] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0130] In the description of this invention, it should be understood that the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
[0131] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.
[0132] It should also be further understood that the term "and / or" as used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes such combinations. For example, A and / or B can represent three cases: A alone, A and B simultaneously, and B alone. Additionally, the character " / " in this document generally indicates that the preceding and following objects have an "or" relationship.
[0133] It should be understood that although terms such as first, second, third, etc., may be used in the embodiments of the present invention to describe the preset range, these preset ranges should not be limited to these terms. These terms are only used to distinguish the preset ranges from one another. For example, without departing from the scope of the embodiments of the present invention, the first preset range may also be referred to as the second preset range, and similarly, the second preset range may also be referred to as the first preset range.
[0134] Depending on the context, the word "if" as used here can be interpreted as "when," "upon," "in response to determination," or "in response to detection." Similarly, depending on the context, the phrase "if determination" or "if detection (of the stated condition or event)" can be interpreted as "when determination," "in response to determination," "when detection (of the stated condition or event)," or "in response to detection (of the stated condition or event)."
[0135] The accompanying drawings illustrate various structural schematic diagrams according to embodiments disclosed in this invention. These drawings are not to scale, and some details have been enlarged for clarity, and some details may have been omitted. The shapes of the various regions and layers shown in the drawings, as well as their relative sizes and positional relationships, are merely exemplary and may deviate from reality due to manufacturing tolerances or technical limitations. Furthermore, those skilled in the art can design regions / layers with different shapes, sizes, and relative positions as needed.
[0136] This invention provides a multi-objective routing decision-making method applied to the metaverse. It analyzes the diverse QoS of metaverse services from three perspectives: latency, packet loss, and throughput. A utility function is used to weight these three indicators, aiming to quantify the service requirements of each service for different service types. Furthermore, this invention employs multi-agent collaborative reinforcement learning to enable different types of services in each router to select paths as needed under the current network resource conditions, improving utility function indicators, fully utilizing network resources, and meeting differentiated QoS requirements. By performing feature segmentation and using a utility function to highlight these features, this invention quantifies the diverse services of the metaverse from a routing decision-making perspective, formulates routing strategies as needed, and achieves multi-objective optimization of the diverse services of the metaverse.
[0137] Referring to Figure 1, this invention provides a multi-objective routing decision-making method applied to the metaverse, comprising the following steps:
[0138] S1. Design the metaverse network scenario and establish a diversified business network model under the metaverse background;
[0139] Referring to Figure 2, the multi-type service network environment model under the metaverse background is as follows:
[0140] According to modern urban planning, the city is divided into five functional zones: industrial zone, central business district, residential zone, cultural and educational zone, and scenic zone.
[0141] Different functional zones correspond to different metaverse application scenarios. For example, residential areas primarily feature home entertainment projects such as virtual cinemas and immersive games; central business districts mainly offer virtual meetings and immersive shopping; industrial zones primarily handle equipment testing and industrial manufacturing; scenic areas offer virtual exhibitions and virtual tourism; and educational zones feature virtual classrooms and immersive sports. Furthermore, different application scenarios have different network resource requirements. Industrial zones may prioritize data reliability over throughput; scenic areas have lower requirements for latency and data reliability but higher requirements for data throughput. In addition, the availability of network resources such as edge servers, routers, and links varies across different areas, resulting in an uneven distribution of resources.
[0142] The metaverse network environment in this invention is defined as follows:
[0143] This invention is applied to a core routing network within a city, where there are M regions with different social service functions, and a certain number of core routers exist in each region.
[0144] The core router is represented as:
[0145] $V = \{v_1, v_2, v_3, \ldots, v_n\}$
[0146] There are n routers in total, and $v_i$ denotes the i-th router.
[0147] The link set is represented as:
[0148] $E = \{e_{i,j} \mid i, j \in \{1, 2, \ldots, n\}\}$
[0149] where $e_{i,j}$ represents the link connecting node $v_i$ and node $v_j$.
[0150] An end-to-end path is represented as:
[0151] $path_{src,dst} \in \{v_{src}, e_{src,i}, v_i, \ldots, v_{dst}\}$
[0152] where $path_{src,dst}$ represents the end-to-end path from source node $v_{src}$ to target node $v_{dst}$.
[0153] Furthermore, an undirected graph G(V,E) is used to represent the metaverse network environment, where V represents the set of router nodes and E represents the set of links.
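The graph model above can be sketched in a few lines. This is only an illustration: the adjacency-map representation, the helper names `build_graph` and `is_valid_path`, and the five-router topology are assumptions made for the example and do not come from the patent.

```python
from collections import defaultdict

def build_graph(links):
    """Build an undirected graph G(V, E) as an adjacency map from (i, j) link pairs."""
    adjacency = defaultdict(set)
    for i, j in links:
        adjacency[i].add(j)
        adjacency[j].add(i)  # undirected: link e_{i,j} is usable in both directions
    return dict(adjacency)

def is_valid_path(adjacency, path):
    """Check that consecutive routers in `path` share a link and no router repeats."""
    if len(set(path)) != len(path):
        return False
    return all(b in adjacency.get(a, ()) for a, b in zip(path, path[1:]))

# Hypothetical 5-router topology (router indices are illustrative only).
links = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 4)]
graph = build_graph(links)
print(is_valid_path(graph, [1, 2, 3, 4]))  # routers 1-2, 2-3, 3-4 are all linked
print(is_valid_path(graph, [1, 3, 4]))     # there is no link between routers 1 and 3
```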
[0154] S2. The traffic characteristics of diverse services in the metaverse are analyzed from the perspectives of latency, packet loss, and throughput. In order to reflect the differentiated QoS requirements of diverse services, a general utility function is used to customize the end-to-end transmission path of diverse services on demand, so as to meet the differentiated QoS requirements and improve the user experience.
[0155] This invention defines diversified services as follows:
[0156] The various service types in this invention include traffic characteristics from three perspectives: latency x, packet loss y, and throughput z, as shown in the following formula:
[0157]
[0158] where $x_t$, $y_t$, and $z_t$ represent the sets of latency, packet loss, and throughput of the end-to-end paths within the network at time t; their elements represent, respectively, the latency, packet loss, and throughput of the end-to-end path from node $v_i$ to node $v_j$ at time t.
[0159] The latency of the data packet on the link is:
[0160]
[0161] where the link length is that of transmission link $e_{i,j}$, $\iota$ represents the data transmission speed on the link, $l$ represents the length of the data packet, $\eta$ represents the number of data packets in the queue, and the sending rate is that of node $v_i$.
[0162] The end-to-end latency of the data packet is:
[0163]
[0164] The number of packet losses from end to end is:
[0165]
[0166] where the two terms represent the number of data packets lost at router node $v_i$ and on link $e_{i,j}$, respectively.
[0167] However, due to the dynamic nature of networks, end-to-end paths change over time. A packet successfully received by a router at a given moment may originate from routing decisions made in previous moments, and therefore cannot reflect the throughput performance of the current routing path. Therefore, we use the minimum remaining capacity ratio of routers along the end-to-end path to reflect this metric.
[0168]
[0169] where the two quantities represent the remaining capacity and the full capacity of node $v_i$, respectively.
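The three end-to-end metrics described above can be sketched as follows. Because the formulas themselves are missing from the text, the exact forms used here are assumptions inferred from the variable descriptions: link latency is taken as propagation delay (link length over transmission speed) plus queueing delay (queued packets times packet length over sending rate), end-to-end latency and packet loss are sums along the path, and the throughput indicator is the minimum remaining-capacity ratio over the routers on the path. The dictionary layout and all numbers are hypothetical.

```python
def link_latency(length, speed, queued_packets, packet_length, send_rate):
    """Assumed form: propagation delay plus queueing delay at the sending node."""
    return length / speed + queued_packets * packet_length / send_rate

def path_metrics(path, links, nodes):
    """Return (latency, packets_lost, throughput_indicator) for a router path."""
    latency, lost = 0.0, 0
    for a, b in zip(path, path[1:]):
        link, node = links[(a, b)], nodes[a]
        latency += link_latency(link["length"], link["speed"], node["queued"],
                                node["packet_length"], node["send_rate"])
        lost += node["lost"] + link["lost"]  # losses at the sending router and on the link
    # Throughput indicator: smallest remaining-capacity ratio along the path.
    throughput = min(nodes[v]["remaining"] / nodes[v]["capacity"] for v in path)
    return latency, lost, throughput

# Hypothetical three-router path 1 -> 2 -> 3.
nodes = {
    1: {"queued": 4, "packet_length": 1000.0, "send_rate": 1e6, "lost": 1, "remaining": 60, "capacity": 100},
    2: {"queued": 0, "packet_length": 1000.0, "send_rate": 1e6, "lost": 0, "remaining": 20, "capacity": 100},
    3: {"queued": 2, "packet_length": 1000.0, "send_rate": 1e6, "lost": 0, "remaining": 90, "capacity": 100},
}
links = {
    (1, 2): {"length": 2000.0, "speed": 2e8, "lost": 0},
    (2, 3): {"length": 1000.0, "speed": 2e8, "lost": 2},
}
latency, lost, throughput = path_metrics([1, 2, 3], links, nodes)
print(latency, lost, throughput)
```

In this example router 2 is the bottleneck, so it alone determines the throughput indicator even though the other routers have spare capacity.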
[0170] After modeling the three QoS metrics, an objective function needs to be defined to optimize the routing problem for the differentiated QoS requirements of various services. Based on this, we propose a utility function that represents the QoS requirements of the different service types on the end-to-end path from node $v_i$ to node $v_j$ at time t.
[0171]
[0172] where the three metric terms represent the normalized latency, packet loss, and throughput at time t, respectively. α1, α2, and α3 are used to balance different QoS metrics to ensure fairness among them, and are all set to 1.
[0173] The normalization process is as follows:
[0174] $q_t \in \{x_t, y_t, z_t\}$
[0175]
[0176] where the left-hand side represents the normalized value of the QoS metric and $q_{max}$ represents the maximum value of that QoS metric.
[0177] To ensure that the multi-type service classification method in this invention conforms to most real-world situations, services are classified into latency-sensitive services, high-reliability services, and throughput-sensitive services, with the following weights corresponding to the three indicators:
[0178]
[0179] The objective optimization function is:
[0180]
[0181] s.t. $w_1 + w_2 + w_3 = 1$
[0182] $w_1, w_2, w_3 \geq 0$
[0183]
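A minimal sketch of a weighted utility of this kind is shown below. Since the normalization formula, the utility formula, and the weight table are all missing from the text, everything specific here is an assumption: normalization is taken as division by the metric's maximum, latency and packet loss are treated as costs (negative sign) and throughput as a benefit (positive sign), and the `HYPOTHETICAL_WEIGHTS` values are invented for illustration. Only the constraints $w_1 + w_2 + w_3 = 1$, $w_1, w_2, w_3 \geq 0$ and α1 = α2 = α3 = 1 come from the description.

```python
def normalize(value, max_value):
    """Assumed normalization: scale a metric by its maximum value."""
    return value / max_value if max_value > 0 else 0.0

def utility(latency, loss, throughput, maxima, weights, alphas=(1.0, 1.0, 1.0)):
    """Weighted utility of one end-to-end path for one service type (assumed sign convention)."""
    w1, w2, w3 = weights
    if min(weights) < 0 or abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    a1, a2, a3 = alphas
    x = normalize(latency, maxima[0])
    y = normalize(loss, maxima[1])
    z = normalize(throughput, maxima[2])
    # Latency and loss lower the utility; throughput raises it (assumption).
    return w3 * a3 * z - w1 * a1 * x - w2 * a2 * y

# Hypothetical weight vectors (w1: latency, w2: packet loss, w3: throughput).
HYPOTHETICAL_WEIGHTS = {
    "latency_sensitive": (0.6, 0.2, 0.2),
    "high_reliability": (0.2, 0.6, 0.2),
    "throughput_sensitive": (0.2, 0.2, 0.6),
}

maxima = (100.0, 100.0, 1.0)
for service, weights in HYPOTHETICAL_WEIGHTS.items():
    print(service, utility(50.0, 10.0, 0.5, maxima, weights))
```

The same path therefore scores differently for each service type, which is what allows the routing decision to be customized per service.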
[0184] S3. In real network topologies, there are a large number of routers. Each router needs to calculate routing paths for each destination address, and also consider diverse service types. Therefore, it is impractical to set up a complex decision model for every end-to-end path.
[0185] In addition, to increase network scalability, decision models are only set up between critical end-to-end paths, while the minimum hop count routing algorithm is used for other end-to-end paths. i routers are selected as starting routers and j routers as ending routers, and three types of services are set up respectively. Therefore, a total of i×j×3 decision models need to be created.
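The agent count i×j×3 follows from pairing every starting router with every ending router and every service type. The short sketch below enumerates these combinations; the router identifiers and the service-type labels are hypothetical, and the starting and ending sets are assumed to be disjoint.

```python
from itertools import product

SERVICE_TYPES = ("latency_sensitive", "high_reliability", "throughput_sensitive")

def enumerate_agents(sources, destinations):
    """One decision model (agent) per (source, destination, service type) triple."""
    return list(product(sources, destinations, SERVICE_TYPES))

# Hypothetical selection: i = 2 starting routers and j = 3 ending routers.
agents = enumerate_agents(sources=[1, 2], destinations=[7, 8, 9])
print(len(agents))  # 2 * 3 * 3 = 18
```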
[0186] In this invention, for the decision model between key end-to-end paths, the edge weights of each link in the end-to-end routing path are first calculated using a utility function. Then, the set of possible actions for the end-to-end routing path is obtained through the k-shortest path algorithm. Each virtual object runs a deep Q-learning algorithm, continuously exploring and learning to achieve the objective function of end-to-end optimal path selection. A neural network is used to fit the Q-table to solve the problem of excessively large state-action space. Each virtual object selects an appropriate action based on its state through the neural network and receives reward feedback. This process is repeated until a superior strategy is finally obtained. The trained strategy is the decision model for that QoS metric in the end-to-end path.
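The action set of k shortest paths can be illustrated with the sketch below. It is not the algorithm used in the patent: for brevity it enumerates every simple path by depth-first search and keeps the k cheapest, which is only practical on small graphs, whereas a production system would use a dedicated k-shortest-path algorithm such as Yen's. The weighted adjacency map and its edge costs are hypothetical and stand in for the utility-derived link weights.

```python
def simple_paths(adjacency, src, dst, path=None):
    """Yield every loop-free path from src to dst by depth-first search."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for nxt in adjacency.get(src, {}):
        if nxt not in path:
            path.append(nxt)
            yield from simple_paths(adjacency, nxt, dst, path)
            path.pop()

def k_shortest_paths(adjacency, src, dst, k):
    """Return the k lowest-cost simple paths; ties are broken by router order."""
    def cost(p):
        return sum(adjacency[a][b] for a, b in zip(p, p[1:]))
    return sorted(simple_paths(adjacency, src, dst), key=lambda p: (cost(p), p))[:k]

# Hypothetical weighted topology: adjacency[a][b] is the cost of link e_{a,b}.
adjacency = {
    1: {2: 1, 3: 2},
    2: {1: 1, 3: 1, 4: 1},
    3: {1: 2, 2: 1, 4: 2},
    4: {2: 1, 3: 2},
}
actions = k_shortest_paths(adjacency, src=1, dst=4, k=3)
print(actions)  # [[1, 2, 4], [1, 2, 3, 4], [1, 3, 2, 4]]
```

Each returned path is one candidate action, so an agent's action space has at most k entries regardless of the size of the network.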
[0187] S4. Construct a cooperative multi-agent reinforcement learning routing algorithm;
[0188] Referring to Figure 3, each decision model obtained in step S3 is treated as an independent agent. Each agent selects appropriate actions based on changes in the network environment, striving to meet the differentiated QoS requirements of diverse services. In the second layer of Figure 3, the multi-type service classification process, agent routers and non-agent routers classify services using the same method, namely the method used in S2. In the third layer of the framework diagram, only the decision models of the agents are trained as neural networks; the trained network models are obtained in the fourth layer and synchronized to the corresponding agents.
[0189] Referring to Figure 4, this invention proposes an application framework based on the QMIX algorithm that uses centralized learning and distributed execution for the diverse services in the metaverse and coordinates the multi-agent routing algorithms in a cooperative manner.
[0190] First, this invention models the state, action, and reward of reinforcement learning algorithms:
[0191] Network environment status: Based on the above modeling of network latency, packet loss rate, and throughput, the network environment status is represented as follows:
[0192] s_t = {s_{t,1}, s_{t,2}, …, s_{t,n}}
[0193] where the state of routing node v_i is described by the number of data packets in its queue, the length l of the data packets, and the sending rate of node v_i;
[0194] Each agent can obtain only the states of its neighboring routing nodes as its observation space, represented as:
[0195] Observation space: For each agent, it can only obtain the states of its neighboring routing nodes. Therefore, the observation space can be represented as follows:
[0196] Action Space: Segment Routing (SR) is adopted, and the end-to-end routing path is determined at the source routing node. First, the RIP algorithm is used to determine the set of k shortest path actions for the end-to-end path. Then, a multi-agent reinforcement learning routing algorithm is used to select an optimal path from among them.
[0197] Rewards: The reward value needs to reflect the differentiated QoS requirements of diverse services; therefore, we use an objective optimization function as the reward value. Furthermore, to reflect the cooperative nature of multi-agent reinforcement learning, a centralized learning approach is adopted. Using the sum of the reward values of all agents as the global reward value is an important part of cooperative learning.
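The reward construction can be sketched as follows. The text states only that the weights w1, w2, w3 are non-negative and sum to 1, that the objective optimization function serves as each agent's reward, and that the global reward is the sum over all agents. The weighted-sum form below, its sign conventions, the example weights, and the function names are assumptions made for illustration.

```python
def service_reward(delay, loss, throughput, weights):
    """Weighted utility for one agent; all metrics are normalized to [0, 1]."""
    w1, w2, w3 = weights
    if min(weights) < 0 or abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Assumed form: lower delay and loss are better, higher throughput is better.
    return w1 * (1.0 - delay) + w2 * (1.0 - loss) + w3 * throughput

def global_reward(agent_rewards):
    """Global reward used for centralized learning: the sum over all agents."""
    return sum(agent_rewards)

# Hypothetical weights for a latency-sensitive service.
latency_weights = (0.6, 0.2, 0.2)
r = service_reward(delay=0.2, loss=0.1, throughput=0.5, weights=latency_weights)
total = global_reward([r, r])
```

Because every agent is trained against the same summed value, an agent that improves its own utility at the expense of another service type gains nothing unless the total also rises.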
[0198] Furthermore, the strategy is executed in a distributed manner: each agent network uses a neural network to obtain the action that maximizes its local value function and uses a gated recurrent unit (GRU) to observe state changes, thereby enhancing its adaptability. For learning centralized information, this invention designs a hybrid (mixing) neural network that integrates the local value functions of the agents to obtain a joint action-value function. To ensure cooperation, the algorithm restricts the weights of the hybrid network (but not its biases) to non-negative values, so that the joint action-value function is monotonic in each local value function. Therefore, taking the actions that maximize the local value functions is equivalent to maximizing the joint action-value function; the specific formula is as follows:
[0199]
[0200]
[0201] where τ_i represents the observation state of agent i, and u_i represents the action performed by agent i.
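The effect of the non-negativity constraint can be checked on a small example. The sketch below is a deliberate simplification: it uses a fixed linear mixer, whereas a QMIX mixing network is a state-conditioned nonlinear network. The Q-values, weights, and names are hypothetical. It compares the actions chosen greedily by each agent with the joint action found by exhaustive search over the mixed value.

```python
from itertools import product

def mix(local_qs, weights, bias):
    """Monotonic linear mixer: weights are forced non-negative, bias is free."""
    return sum(abs(w) * q for w, q in zip(weights, local_qs)) + bias

# Hypothetical local Q-values: q_tables[i][u] for agent i and action u.
q_tables = [[0.2, 0.9, 0.4], [0.7, 0.1, 0.3]]
weights, bias = [0.5, -2.0], -1.0  # abs() makes the effective weights 0.5 and 2.0

# Distributed execution: each agent maximizes its own local value function.
greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)

# Centralized check: exhaustive search over all joint actions.
joint = max(
    product(*(range(len(q)) for q in q_tables)),
    key=lambda us: mix([q[u] for q, u in zip(q_tables, us)], weights, bias),
)
```

Here `greedy` and `joint` coincide, which is the property that lets each router act on its own Q-values at execution time while training optimizes the joint value.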
[0202] The loss function of the QMIX algorithm is:
[0203]
[0204] where b represents the size of the batch of data samples obtained by random sampling from the experience pool, tar_i^{tot} represents the target network's decision on sample i, θ represents the network parameters, and s represents the environment state.
[0205] tar^{tot} = reward + γ max_{u'} Q_{tot}(τ', u', s'; θ^-)
[0206] where reward represents the reward function and γ represents the discount factor.
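For reference, the loss in the published QMIX formulation is a squared temporal-difference error summed over the sampled batch. It is restated here in the notation of this description, on the assumption that the formula of this invention has the same form:

```latex
\mathcal{L}(\theta) = \sum_{i=1}^{b} \left( tar_i^{tot} - Q_{tot}(\tau, u, s; \theta) \right)^{2},
\qquad
tar^{tot} = reward + \gamma \max_{u'} Q_{tot}(\tau', u', s'; \theta^{-})
```

Here θ^- denotes the parameters of the target network, which are held fixed while θ is updated and are refreshed from θ periodically.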
[0207] S5. The multi-agent system is trained using the proposed routing algorithm.
[0208] S501. Input the number of training rounds T and the number of rounds, episode_limit, for which data packets need to be forwarded in each training round;
[0209] S502, Modeling multi-agent reinforcement learning methods;
[0210] S503: Assign intelligent agents and set up neural network models for router forwarding services and destinations;
[0211] S504. Construct and initialize the experience pool;
[0212] S505. Determine if the current training round number is less than T;
[0213] S506. When the number of training rounds is less than T, proceed to step S508.
[0214] S507. When the number of training rounds is greater than or equal to T, proceed to step S5022.
[0215] S508. Initialize the environment and obtain the corresponding attributes and status;
[0216] S509. Reset the number of packet forwarding rounds to zero;
[0217] S5010. Determine if the number of data packet forwarding rounds is less than episode_limit;
[0218] S5011. When the number of data packet forwarding rounds is less than episode_limit, proceed to step S5013.
[0219] S5012. When the number of packet forwarding rounds is greater than or equal to episode_limit, proceed to step S5021.
[0220] S5013. Increment the number of forwarding rounds by 1;
[0221] S5014. Obtain the Q value through the neural network model of each agent, and select actions according to the ε greedy policy;
[0222] S5015. Store the current status, next status, reward, action and other data into the experience pool;
[0223] S5016. Determine if the number of data in the experience pool is greater than or equal to the batch size;
[0224] S5017. When the experience pool data is greater than or equal to the batch size, proceed to step S5019.
[0225] S5018. When the experience pool data is smaller than the batch size, jump to step S5010.
[0226] S5019. Randomly sample and learn data from the same position in each episode in the experience pool;
[0227] S5020, Update the neural network parameters and jump to step S5010;
[0228] S5021. Increment the number of training rounds by 1, and jump to step S505;
[0229] S5022. Obtain the Q value using the neural network trained by each agent;
[0230] S5023. Each agent selects the action with the largest Q value as the forwarding path for various data streams;
[0231] S5024, Output the corresponding forwarding path.
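The control flow of steps S504 to S5023 can be sketched as follows. This is a skeleton only: a single agent with a tabular value estimate stands in for the per-agent neural networks and the QMIX update, and the environment is reduced to a fixed hypothetical mean reward per candidate path. All names and numeric values other than T, episode_limit, and the batch size gate are assumptions.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one (S5014)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def train(T, episode_limit, batch_size, epsilon=0.2, lr=0.1, seed=0):
    rng = random.Random(seed)
    true_reward = [0.2, 0.8, 0.5]          # hypothetical mean reward per path
    q_values = [0.0] * len(true_reward)    # stand-in for an agent's network
    replay = []                            # experience pool (S504)
    for _ in range(T):                     # S505 to S507, S5021
        for _ in range(episode_limit):     # S509 to S5013
            action = epsilon_greedy(q_values, epsilon, rng)
            reward = true_reward[action] + rng.uniform(-0.05, 0.05)
            replay.append((action, reward))                   # S5015
            if len(replay) >= batch_size:                     # S5016 to S5018
                for a, r in rng.sample(replay, batch_size):   # S5019
                    q_values[a] += lr * (r - q_values[a])     # S5020
    best = max(range(len(q_values)), key=q_values.__getitem__)  # S5022, S5023
    return best, q_values

best, q_values = train(T=100, episode_limit=20, batch_size=8)
print(best)  # index of the path selected after training (S5024)
```

The batch size gate mirrors S5016 to S5018: no update is made until the experience pool holds at least one full batch, after which every forwarding round triggers a sampled update.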
[0232] Referring to Figure 6, after the agents complete training and the routing decision model is obtained, the model is applied to the packet forwarding process, which includes the following steps:
[0233] Step 1: Initialize the environment and set related parameters;
[0234] Step 2: Load the network topology;
[0235] Step 3: Update link latency, packet loss, and router cache utilization based on network conditions;
[0236] Step 4: Use the RIP (Routing Information Protocol) algorithm as the routing algorithm for the non-agent routers;
[0237] Step 5: Initialize the routing algorithm of each agent router;
[0238] Step 6: When the agent router selects the deep reinforcement learning routing algorithm, proceed to step 9;
[0239] Step 7: When the agent router selects the MT-OSPF (Multi-type Open Shortest Path First) routing algorithm, proceed to step 32;
[0240] Step 8: When the agent router selects the RIP routing algorithm, proceed to step 34;
[0241] Step 9: Calculate the weight of the path corresponding to each data type using the utility function;
[0242] Step 10: Determine if this is the first time the k shortest paths are obtained as the set of optional actions;
[0243] Step 11: If this is the first time the k shortest paths are obtained, jump to step 13;
[0244] Step 12: If this is not the first time the k shortest paths are obtained, proceed to step 14;
[0245] Step 13: Use the MT-OSPF algorithm to obtain the k shortest paths as the set of possible actions for each agent;
[0246] Step 14: Use the agent networks to obtain the packet forwarding path;
[0247] Step 15: Perform one round of packet forwarding based on the packet forwarding path (one round refers to all packet forwarding actions within the βΔt time interval);
[0248] Step 16: Obtain the new network status;
[0249] Step 17: Determine whether the routing algorithm is a deep reinforcement learning algorithm;
[0250] Step 18: If the routing algorithm is a deep reinforcement learning algorithm, proceed to step 20;
[0251] Step 19: If the routing algorithm is not a deep reinforcement learning algorithm, skip to step 6;
[0252] Step 20: Calculate and record the reward value based on the utility function;
[0253] Step 21: Determine if the number of data packet forwarding attempts is less than episode_limit;
[0254] Step 22: When the number of packet forwardings is less than episode_limit, proceed to step 6;
[0255] Step 23: Increase episode_limit by 1;
[0256] Step 24: Train the neural network model for each agent;
[0257] Step 25: Determine if the number of training iterations is less than train_limit;
[0258] Step 26: When the number of training iterations is less than train_limit, proceed to step 28;
[0259] Step 27: When the number of training iterations is greater than or equal to train_limit, proceed to step 6;
[0260] Step 28: Increase training count by 1;
[0261] Step 29: Determine whether each agent has selected a deep reinforcement learning routing algorithm;
[0262] Step 30: When the agent selects a deep reinforcement learning routing algorithm, proceed to step 35;
[0263] Step 31: If the agent does not select a deep reinforcement learning routing algorithm, proceed to step 36.
[0264] Step 32: Calculate the weight of the path corresponding to each data type using the utility function;
[0265] Step 33: Use the MT-OSPF algorithm to obtain the packet forwarding path, and then jump to step 15;
[0266] Step 34: Use the RIP algorithm to obtain the packet forwarding path, and then jump to step 15;
[0267] Step 35: Save the current neural network models of each agent;
[0268] Step 36: Output the corresponding performance graph.
[0269] S6. Test the routing strategies for various types of services in the metaverse.
[0270] Referring to Figure 7, the reward values of the proposed method are compared with those of the MT-OSPF and RIP routing algorithms at different packet generation rates. The RIP algorithm maintains a reward value of 11.1 while the packet generation rate is below 3.85 Gbps, but its reward value decreases significantly as the packet generation rate increases further. The MT-OSPF algorithm maintains a reward value of 12.54 while the packet generation rate is below 3.95 Gbps, and its reward value likewise decreases as the rate increases. The proposed method, like MT-OSPF, begins to decline only when the packet generation rate reaches 3.95 Gbps, but it declines much more slowly than the other two routing algorithms. At low packet generation rates, the proposed method improves the reward value by 9% and 32% over MT-OSPF and RIP, respectively; at a generation rate of 4.55 Gbps, the improvements are 37% and 614%, respectively.
[0271] Referring to Figure 8, the average latency of the proposed method is compared with that of the MT-OSPF and RIP routing algorithms at different packet generation rates for latency-sensitive services. When the packet generation rate is below 3.9 Gbps, the latency of all three algorithms remains constant, with the proposed method exhibiting the lowest average latency. This is because the network is still relatively uncongested at this point, and the cooperative multi-agent reinforcement learning algorithm coordinates the different service types so that other services yield the lowest-latency routing path to latency-sensitive services. At lower data generation rates, the proposed method improves on MT-OSPF and RIP by 10% and 30%, respectively. As the data generation rate increases, the average latency of all three algorithms rises, with the proposed method rising the most slowly.
[0272] Referring to Figure 9, the packet loss rate of the proposed method is compared with that of the MT-OSPF and RIP routing algorithms at different packet generation rates for high-reliability services. The packet loss rate of the RIP routing algorithm increases significantly once the packet generation rate reaches 4.25 Gbps, and the MT-OSPF routing algorithm begins to lose packets at 4.45 Gbps. In contrast, the proposed algorithm loses only a small number of packets when the data generation rate reaches 4.55 Gbps.
[0273] Referring to Figure 10, the throughput of the proposed method is compared with that of the MT-OSPF and RIP routing algorithms at different packet generation rates for throughput-sensitive services. At lower data generation rates, the throughput of all three algorithms increases linearly. With the RIP routing algorithm, the throughput growth rate begins to decline when the data generation rate reaches 3.89 Gbps; with the MT-OSPF routing algorithm, it begins to decline at 4.1 Gbps. In contrast, with the routing algorithm proposed in this invention, the throughput growth rate begins to decline only when the packet generation rate reaches 4.17 Gbps, and it declines the most slowly.
[0274] In another embodiment of the present invention, a multi-objective routing decision system for the metaverse is provided. This system can be used to implement the above-mentioned multi-objective routing decision method for the metaverse. Specifically, the multi-objective routing decision system for the metaverse includes a design module, an optimization module, a virtual module, a routing module, a training module, and a decision module.
[0275] Among them, the design module designs the metaverse network scenario and establishes a diverse business network model under the metaverse background;
[0276] The optimization module, based on the diversified business network model obtained from the design module, uses the metaverse traffic characteristics as a basis to model the differentiated service quality characteristics, decomposes the diversified business requirements into multiple independent optimization objectives, and obtains a multi-objective optimization model.
[0277] The virtual module creates virtual objects based on the multi-objective optimization model obtained from the optimization module, serving as a decision model for various data types in each router.
[0278] The routing module uses the decision model obtained from the virtual module as an independent agent to construct a cooperative multi-agent routing method.
[0279] The training module uses a multi-agent routing method built by the routing module to train the agents, and then implements multi-objective routing decisions based on the trained agents.
[0280] In another embodiment of the present invention, a terminal device is provided, comprising a processor and a memory. The memory stores a computer program, the computer program including program instructions, and the processor executes the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. It is the computing and control core of the terminal, suitable for implementing one or more instructions, specifically suitable for loading and executing one or more instructions to achieve a corresponding method flow or corresponding function. The processor described in this embodiment of the present invention can be used in the operation of a multi-objective routing decision method applied to a metaverse, including:
[0281] Designing a metaverse network scenario and establishing a diversified service network model under the metaverse background; based on the diversified service network model and the characteristics of metaverse traffic, modeling the differentiated quality-of-service characteristics and decomposing the needs of diversified services into multiple independent optimization objectives to obtain a multi-objective optimization model; based on the multi-objective optimization model, creating virtual objects as decision models for the various data types in each router; using the decision models as independent agents to construct a cooperative multi-agent routing method; and training the agents with the constructed multi-agent routing method and realizing multi-objective routing decisions based on the trained agents.
[0282] In another embodiment of the present invention, a storage medium is also provided, specifically a computer-readable storage medium (memory). This computer-readable storage medium is a memory device in a terminal device used to store programs and data. It is understood that the computer-readable storage medium here can include both the built-in storage medium in the terminal device and extended storage media supported by the terminal device. The computer-readable storage medium provides storage space that stores the terminal's operating system. Furthermore, this storage space also stores one or more instructions suitable for loading and execution by a processor. These instructions can be one or more computer programs (including program code). It should be noted that the computer-readable storage medium here can be high-speed RAM or non-volatile memory, such as at least one disk storage device.
[0283] One or more instructions stored in a computer-readable storage medium can be loaded and executed by a processor to implement the corresponding steps of the multi-objective routing decision method applied to the metaverse in the above embodiments; one or more instructions in the computer-readable storage medium are loaded and executed by the processor in the following steps:
[0284] Designing a metaverse network scenario and establishing a diversified service network model under the metaverse background; based on the diversified service network model and the characteristics of metaverse traffic, modeling the differentiated quality-of-service characteristics and decomposing the needs of diversified services into multiple independent optimization objectives to obtain a multi-objective optimization model; based on the multi-objective optimization model, creating virtual objects as decision models for the various data types in each router; using the decision models as independent agents to construct a cooperative multi-agent routing method; and training the agents with the constructed multi-agent routing method and realizing multi-objective routing decisions based on the trained agents.
[0285] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort are within the scope of protection of the present invention.
[0286] Example
[0287] Without loss of generality, this embodiment uses the real network topology Abilene as the city-wide network topology in the metaverse scenario; it includes 11 nodes and 14 links. Further, 5 nodes are selected as origin routers and 3 nodes as destination routers. For data packets initiated by the origin routers and received by the destination routers, the proposed reinforcement learning method is used for route selection; for all other data packets, the RIP routing algorithm is used. The specific parameter settings for this scenario are as follows:
[0288] Each agent selects three shortest paths as its action set. The ratio of data packet generation for the three data types is 1:1:3. The router's data packet sending rate is 1.5Gbps, the router's cache capacity is 375MB, and the data packet size is 20KB.
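To give a feel for these parameters, the following arithmetic derives the cache depth in packets and the time a router needs to drain a full cache. Decimal prefixes (1 Gbps = 10^9 bit/s, 1 MB = 10^6 bytes, 1 KB = 10^3 bytes) are an assumption, as the text does not state which convention is used.

```python
GBPS = 1e9  # bits per second (decimal prefix, assumed)
MB = 1e6    # bytes (decimal prefix, assumed)
KB = 1e3    # bytes (decimal prefix, assumed)

send_rate_bps = 1.5 * GBPS
cache_bytes = 375 * MB
packet_bytes = 20 * KB

packets_in_cache = cache_bytes / packet_bytes            # packets a full cache holds
drain_seconds = cache_bytes * 8 / send_rate_bps          # time to empty a full cache
packets_per_second = send_rate_bps / (packet_bytes * 8)  # forwarding capacity

print(packets_in_cache, drain_seconds, packets_per_second)
```

Under these assumptions a full cache holds 18,750 packets and takes 2 seconds to drain at 9,375 packets per second, which bounds the queuing delay a packet can accumulate at a single router before losses begin.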
[0289] Compared with traditional MT-OSPF and RIP routing algorithms, the method of this invention demonstrates its superior performance in terms of latency, packet loss rate, and throughput across diverse service scenarios. This successfully illustrates that the proposed method optimizes the network service quality for various types of services in the metaverse, thereby improving the user experience.
[0290] Experimental results show that this invention comprehensively considers the relationship between the network environment and the differentiated QoS requirements of diverse services, alleviates the uneven distribution of resources and services, and, by guaranteeing monotonicity between the local optimal strategies and the global optimal strategy, keeps the optimal goal of each agent consistent with the overall optimal goal. It resolves conflicts between diverse services, realizes customized end-to-end routing strategies, meets the differentiated QoS requirements of diverse services, and improves the immersive user experience of the metaverse.
[0291] In summary, this invention presents a multi-objective routing decision-making method and system applied to the metaverse. It fully considers the needs of diverse services, decomposing the metaverse application scenario into three metrics: latency, packet loss, and throughput. A general utility function is designed to quantify these metrics, highlighting the characteristics of each service and providing a foundation for subsequent multi-objective routing strategy solutions. By solving the multi-objective routing strategy, optimal strategies are formulated for diverse services on demand. Based on the QMIX algorithm, a cooperative learning strategy is provided for diverse services in the metaverse to coordinate the routing strategies of each decision-making unit. Compared with MT-OSPF and RIP, it shows significant performance improvements in latency, reliability, and throughput.
[0292] The above content is only for illustrating the technical concept of the present invention and should not be construed as limiting the scope of protection of the present invention. Any modifications made to the technical solution based on the technical concept proposed in this invention shall fall within the scope of protection of the claims of this invention.
Claims
1. A multi-objective routing decision method applied to a metaverse, characterized in that it includes the following steps: S1. Design the metaverse network scenario and establish a diverse business network model under the metaverse background. The metaverse network scenario is constructed as follows: the core routing network is applied within a city range containing regions with different social service functions; different regions have differentiated service types, and each region contains core routers; the set of core routers is taken as the nodes of an undirected graph and the set of links as its edges, yielding an undirected graph that represents the metaverse network environment; S2. Based on the diversified business network model obtained in step S1, and taking the metaverse traffic characteristics as a basis, model the differentiated service quality characteristics, decompose the diversified business requirements into multiple independent optimization objectives, and obtain a multi-objective optimization model; S3. Based on the multi-objective optimization model obtained in step S2, create virtual objects as decision models for each data type in each router; S4. Using the decision models obtained in step S3 as independent agents, construct a cooperative multi-agent routing method, employing an application framework of centralized learning and distributed execution, as detailed below: first, the state, action, and reward of the reinforcement learning algorithm are modeled: based on the modeling of network latency, packet loss rate, and throughput, the network environment status is expressed as s_t = {s_{t,1}, s_{t,2}, …, s_{t,n}}, in which the state of routing node v_i is described by the number of data packets in its queue, the length l of the data packets, and the sending rate of node v_i; each agent can obtain only the states of its adjacent routing nodes as its observation space, represented as...
; the set of k shortest path actions in each decision model is used as the action space; a utility function is used as the reward value; the distributed execution strategy is specifically as follows: each agent network uses a neural network to obtain the action that maximizes its local value function and uses gated recurrent units to observe state changes; for learning centralized information, a hybrid neural network integrates the local value functions of the agents to obtain a joint action-value function; by restricting the weights of the hybrid network to non-negative numbers, taking the actions that maximize the local value functions maximizes the joint action-value function, as follows: where τ_i represents the observation state of agent i, and u_i represents the action performed by agent i; the loss function of the QMIX algorithm is as follows: where b represents the size of the batch of data samples randomly sampled from the experience pool, tar_i^{tot} represents the target network's decision on sample i, θ represents the network parameters, and s represents the state of the environment; S5. Train the agents using the multi-agent routing method constructed in step S4, and implement multi-objective routing decisions based on the trained agents.
2. The multi-objective routing decision method applied to the metaverse according to claim 1, characterized in that, in step S2, the diversified services are defined as follows: the business is divided into latency-sensitive business, high-reliability business, and throughput-sensitive business, with the following weights for the three indicators: The businesses are differentiated by assigning different weights to the three indicators, and a utility function is used as the evaluation function for each type of business.
3. The multi-objective routing decision method applied to the metaverse according to claim 2, characterized in that the evaluation functions for the various types of business are as follows: where the three quantities represent, respectively, the normalized latency, packet loss, and throughput at a given moment, and the three associated constants are all 1.
4. The multi-objective routing decision method applied to the metaverse according to claim 1, characterized in that, in step S3, a decision model is set for the end-to-end routing paths between key routers, and the minimum hop count routing algorithm is used for end-to-end routing path selection between other routers; i routers serve as starting nodes and j routers serve as endpoint nodes, the services are decomposed into three categories, and the scenario involves a total of i×j×3 agents.
5. The multi-objective routing decision method applied to the metaverse according to claim 4, characterized in that, For the decision model, the service quality of the corresponding service in the end-to-end routing path is evaluated using a utility function, and then the set of optional actions for the end-to-end routing path is obtained through the k-shortest path algorithm. Each virtual object is treated as an agent, and a multi-agent reinforcement learning algorithm is used to train different end-to-end paths and corresponding services. The resulting multi-agent reinforcement learning model is used as the decision model for each end-to-end path.
6. The multi-objective routing decision method applied to the metaverse according to claim 1, characterized in that, in step S5, the training of the multi-agent system is carried out as follows: S501. Input the number of training rounds T and the number of rounds, episode_limit, for which data packets need to be forwarded in each training round; S502. Model the multi-agent reinforcement learning method; S503. Assign agents and set up neural network models for the router forwarding services and destinations; S504. Construct and initialize the experience pool; S505. Determine whether the current training round number is less than T; S506. When the number of training rounds is less than T, proceed to step S508; S507. When the number of training rounds is greater than or equal to T, proceed to step S5022; S508. Initialize the environment and obtain the corresponding attributes and status; S509. Reset the number of packet forwarding rounds to zero; S5010. Determine whether the number of packet forwarding rounds is less than episode_limit; S5011. When the number of packet forwarding rounds is less than episode_limit, proceed to step S5013; S5012. When the number of packet forwarding rounds is greater than or equal to episode_limit, proceed to step S5021; S5013. Increment the number of forwarding rounds by 1; S5014. Obtain the Q value through the neural network model of each agent, and select actions according to the ε-greedy policy; S5015. Store the current status, next status, reward, action, and other data in the experience pool; S5016. Determine whether the number of data items in the experience pool is greater than or equal to the batch size; S5017. When the experience pool data is greater than or equal to the batch size, proceed to step S5019; S5018. When the experience pool data is smaller than the batch size, jump to step S5010;
S5019. Randomly sample and learn from the data at the same position in each episode in the experience pool; S5020. Update the neural network parameters and jump to step S5010; S5021. Increment the number of training rounds by 1, and jump to step S505; S5022. Obtain the Q value using the neural network trained by each agent; S5023. Each agent selects the action with the largest Q value as the forwarding path for the various data streams; S5024. Output the corresponding forwarding path.
7. The multi-objective routing decision method applied to the metaverse according to claim 6, characterized in that the routing and packet forwarding are as follows: Step 1: Initialize the environment and set related parameters; Step 2: Load the network topology; Step 3: Update link latency, packet loss, and router cache utilization based on network conditions; Step 4: Use the RIP algorithm as the routing algorithm for the non-agent routers; Step 5: Initialize the routing algorithm of each agent router; Step 6: When the agent router selects the deep reinforcement learning routing algorithm, proceed to step 9; Step 7: When the agent router selects the MT-OSPF routing algorithm, proceed to step 32; Step 8: When the agent router selects the RIP routing algorithm, proceed to step 34; Step 9: Calculate the weight of the path corresponding to each data type using the utility function; Step 10: Determine whether this is the first time the k shortest paths are obtained as the set of optional actions; Step 11: If this is the first time the k shortest paths are obtained, jump to step 13; Step 12: If this is not the first time the k shortest paths are obtained, proceed to step 14;
Step 13: Use the MT-OSPF algorithm to obtain the k shortest paths as the set of possible actions for each agent; Step 14: Using agent networks Obtain the packet forwarding path; Step 15: Perform one round of packet forwarding based on the packet forwarding path; Step 16: Obtain the new network status; Step 17: Determine whether the routing algorithm is a deep reinforcement learning algorithm; Step 18: If the routing algorithm is a deep reinforcement learning algorithm, proceed to step 20; Step 19: If the routing algorithm is not a deep reinforcement learning algorithm, skip to step 6; Step 20: Calculate and record the reward value based on the utility function; Step 21: Determine if the number of data packet forwarding attempts is less than episode_limit; Step 22: When the number of packet forwardings is less than episode_limit, proceed to step 6; Step 23: Increase episode_limit by 1; Step 24: Train the neural network model for each agent; Step 25: Determine if the number of training iterations is less than train_limit; Step 26: When the number of training iterations is less than train_limit, proceed to step 28; Step 27: When the number of training iterations is greater than or equal to train_limit, proceed to step 29; Step 28: Increase training count by 1; Step 29: Determine whether each agent has selected a deep reinforcement learning routing algorithm; Step 30: When the agent selects a deep reinforcement learning routing algorithm, proceed to step 35; Step 31: If the agent does not select a deep reinforcement learning routing algorithm, proceed to step 36. Step 32: Calculate the weight of the path corresponding to each data type using the utility function; Step 33: Use the MT-OSPF algorithm to obtain the packet forwarding path, and then jump to step 15; Step 34: Use the RIP algorithm to obtain the packet forwarding path, and then jump to step 15; Step 35: Save the current neural network models of each agent; Step 36: Output the corresponding performance graph.
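Steps 9 and 13 weight each path per data type and then take the k cheapest paths as an agent's action set. The sketch below illustrates that idea on a four-router toy topology. It is an assumption-laden stand-in: brute-force enumeration of simple paths replaces MT-OSPF, the utility is assumed to be a weighted sum of link latency, packet loss, and cache utilization, and the tables `LINKS` and `WEIGHTS`, the data type names, and all numeric values are hypothetical.

```python
import heapq

# Hypothetical per-link metrics: (latency_ms, loss_percent, cache_percent).
LINKS = {
    ("A", "B"): (5.0, 1.0, 20.0),
    ("B", "D"): (4.0, 2.0, 20.0),
    ("A", "C"): (2.0, 10.0, 50.0),
    ("C", "D"): (2.0, 10.0, 50.0),
    ("B", "C"): (1.0, 0.0, 10.0),
}

# Hypothetical utility weights per data type, in the same metric order.
WEIGHTS = {
    "latency_sensitive": (1.0, 0.0, 0.0),
    "loss_sensitive": (0.0, 1.0, 0.0),
}

def build_adjacency(links):
    """Undirected adjacency lists derived from the link table."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def simple_paths(adj, src, dst, path=None):
    """Depth-first enumeration of all loop-free paths from src to dst."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for nxt in adj[src]:
        if nxt not in path:
            path.append(nxt)
            yield from simple_paths(adj, nxt, dst, path)
            path.pop()

def path_cost(path, weights):
    """Weighted sum of the link metrics along a path (assumed utility form)."""
    total = 0.0
    for u, v in zip(path, path[1:]):
        metrics = LINKS.get((u, v)) or LINKS[(v, u)]
        total += sum(w * m for w, m in zip(weights, metrics))
    return total

def k_shortest(src, dst, data_type, k):
    """Return the k lowest-cost (cost, path) pairs for one data type."""
    adj = build_adjacency(LINKS)
    weights = WEIGHTS[data_type]
    scored = ((path_cost(p, weights), p) for p in simple_paths(adj, src, dst))
    return heapq.nsmallest(k, scored)
```

Calling `k_shortest("A", "D", "latency_sensitive", 2)` and `k_shortest("A", "D", "loss_sensitive", 2)` yields different candidate sets for the same source and destination, which is the point of computing the action set per data type. Exhaustive enumeration is only practical on small graphs; a real deployment would use a dedicated k-shortest-path algorithm.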
8. A multi-objective routing decision-making system applied to the metaverse, characterized in that it comprises:
a network module, which designs the metaverse network scenario and establishes a diversified service network model in the metaverse context, the metaverse network scenario being constructed as follows: the core routing network is deployed within a city that contains multiple regions with different social service functions, and different regions carry different types of services; each region contains a core router; taking the set of core routers as the nodes of an undirected graph and the set of links as its edges yields an undirected graph representing the metaverse network environment;
a demand module, which, based on the diversified service network model obtained by the network module and taking the metaverse traffic characteristics as a basis, models the differentiated quality-of-service characteristics, decomposes the diverse service demands into multiple independent optimization objectives, and obtains a multi-objective optimization model;
a decision module, which, based on the multi-objective optimization model obtained by the demand module, creates virtual objects that serve as the decision models for the various data types in each router;
a collaboration module, which treats the decision models obtained by the decision module as independent agents to construct a cooperative multi-agent routing method under a framework of centralized learning and distributed execution, as detailed below:
first, the state, action, and reward of the reinforcement learning algorithm are modeled: based on the modeling of network latency, packet loss rate, and throughput, the network environment state is determined from the number of data packets in the queue of each routing node, the length of the data packets, and the transmission rate of each node; each agent can obtain only the states of its adjacent routing nodes as its observation space; the set of k shortest-path actions in each decision model serves as the action space; the utility function serves as the reward value;
for distributed execution, each agent network uses a neural network to obtain the action with the maximum local value function and uses gated recurrent units (GRU) to observe state changes;
for centralized learning, a mixing network integrates the local value functions of the agents to obtain a joint action-value function; by constraining the weights of the mixing network to be non-negative, taking the action that maximizes each local value function also maximizes the joint action-value function, a relation expressed in terms of each agent's observation state and the action it performs; the loss function of the QMIX algorithm is defined in terms of the size of the batch of data samples randomly drawn from the experience pool, the target network's decision for each sample, the network parameters, and the state of the environment;
a training module, which uses the multi-agent routing method constructed by the collaboration module to train the agents and then implements multi-objective routing decisions based on the trained agents.
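For orientation, the monotonicity relation and the loss function named in claim 8 can be written in the form commonly used for the QMIX algorithm. The block below is a sketch under that assumption: the formulas of the claim itself are not reproduced in this text, so every symbol here follows the usual QMIX notation and is not taken from the patent.

```latex
% Assumed notation (standard QMIX form, not taken from the claim text):
%   \tau^a, u^a       : observation state and action of agent a
%   s                 : state of the environment
%   b                 : size of the batch sampled from the experience pool
%   \theta, \theta^-  : parameters of the online and target networks
%   y_i^{tot}         : target network's decision (target value) for sample i
\begin{align}
\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) &=
\begin{pmatrix}
\arg\max_{u^1} Q_1(\tau^1, u^1) \\
\vdots \\
\arg\max_{u^n} Q_n(\tau^n, u^n)
\end{pmatrix},
\qquad \frac{\partial Q_{tot}}{\partial Q_a} \ge 0 \quad \forall a \\
\mathcal{L}(\theta) &= \sum_{i=1}^{b}
\left( y_i^{tot} - Q_{tot}(\boldsymbol{\tau}, \mathbf{u}, s; \theta) \right)^2,
\qquad y^{tot} = r + \gamma \max_{\mathbf{u}'} Q_{tot}(\boldsymbol{\tau}', \mathbf{u}', s'; \theta^-)
\end{align}
```

The non-negative mixing weights are what guarantee the partial-derivative condition in the first line, so each agent can act greedily on its own local value function during distributed execution without contradicting the jointly learned value.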