Multi-agent risk-aware security computing method based on federated reinforcement learning

By optimizing the selection of participating nodes and edge offloading strategies through federated reinforcement learning, the problems of vehicle computing task requirements and privacy leakage are solved, and a safe and efficient computing mode selection is achieved.

CN116017571B · Active · Publication Date: 2026-06-26 · NANJING UNIV OF AERONAUTICS & ASTRONAUTICS

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
Filing Date
2022-12-14
Publication Date
2026-06-26


Abstract

The application provides a multi-agent risk-aware security computing method based on federated reinforcement learning, belonging to the field of wireless communication and information security. The method provides a multi-agent joint framework based on federated reinforcement learning that lets each vehicle autonomously select a computing mode according to the dynamic characteristics of the Internet of Vehicles environment. In "safety first" mode, the federated server acts as the agent: it observes parameters such as the current channel quality and each vehicle's historical federated participation rate and most recent participation state, and uses a safe reinforcement learning algorithm to select the vehicles that participate in the current round of federated training. In "efficiency first" mode, the vehicle acts as the agent and does not participate in federated training: it observes the current channel quality, the available resources of the edge nodes, and their historical service quality, and uses a multi-agent safe reinforcement learning algorithm to select an edge node for computing task offloading. The method balances the data security and computing efficiency of vehicles, and reduces vehicle energy consumption and task computing delay.

Description

Technical Field

[0001] This invention provides a multi-agent risk-aware security computing method based on federated reinforcement learning, belonging to the fields of wireless communication and information security.

Background Technology

[0002] Vehicles inevitably generate computational tasks while in motion, but their limited resources are insufficient to meet these demands, so they need to seek assistance from edge nodes to perform the computations. However, the data exchanged between a vehicle and a vehicular edge computing (VEC) server travels over a public channel without any protection of the task data, which makes it highly susceptible to user privacy leaks.

[0003] Federated learning, as an emerging distributed framework, allows participating parties to upload their local parameters to a federated server for model training. The federated server does not need to use the original datasets of the participating parties, effectively avoiding the risks associated with user privacy leaks. Federated learning also has significant applications in connected vehicles. For example, Chinese patent application publication number CN113163366 discloses a privacy-preserving model aggregation system and method based on federated learning for connected vehicles; and Chinese patent application publication number CN114627648 discloses a method and system for guiding urban traffic flow based on federated learning.

[0004] However, federated learning consumes resources and introduces latency, which makes it difficult to maintain efficiency. Reinforcement learning, which finds optimal decisions through exploration and exploitation, can observe the environment in real time and adapt its strategy dynamically, thereby improving the computational efficiency of connected vehicle services. For example, Chinese patent application publication number CN113296845 proposes a task offloading algorithm for edge computing environments based on a dual-deep Q-network algorithm improved by deep reinforcement learning, and Chinese patent application publication number CN114138373 discloses an edge computing task offloading method based on reinforcement learning.

[0005] Unlike prior research on federated-learning application systems in the Internet of Vehicles and on reinforcement-learning decision optimization, this patent combines the privacy protection of federated learning with the efficiency optimization of reinforcement learning, and designs a multi-agent risk-aware security computing method based on federated reinforcement learning. The method provides a multi-agent joint framework based on federated reinforcement learning, which enables vehicles to autonomously select computing modes according to the dynamic characteristics of the Internet of Vehicles environment and optimizes the efficiency of communication between nodes.

Summary of the Invention

[0006] The purpose of this invention is to use safe reinforcement learning to optimize both the selection of participating nodes in federated learning and the parallel multi-agent edge offloading strategy, thereby improving the efficiency of the federated learning and edge computing modes while ensuring security, and realizing a multi-agent risk-aware security computing method based on federated reinforcement learning.

[0007] This invention includes the following steps:

[0008] Step 1: Suppose there are N vehicles within the communication range of the federated server. In the k-th time slot, vehicle n (1≤n≤N) has a federated participation rate, a participation status, and a provided-model quality [symbols missing]. The computational task generated by each vehicle has a data amount, a latest delivery time, and required computing resources [symbols missing]. Suppose there are M edge nodes providing computing services to these vehicles; edge node m has available resources and a historical service quality [symbols missing]. The channel between vehicle n and edge node m and the channel between vehicle n and the federated server each have a transmission rate [symbols missing]. Assume the federated server needs to select [symbol missing] vehicles to participate in federated training each round;

[0009] Step 2: Construct the first deep neural network. The federated server denotes each possible state of the vehicle-to-everything (V2X) environment as [symbol missing]. Let the action space of the first deep neural network be A1, with dimension |A1|, and let [symbol missing] denote the selected action. Let the Q-value [symbol missing] be the output of the deep neural network for a given state and action, and let the weight parameters of the deep neural network in the k-th time slot be [symbol missing]. Initialize the experience pool size buffer_size1, the weight parameters [value missing], the learning factor α1, the discount factor γ1, and the participation level threshold κ1;

[0010] In step 2, the action to be selected, i.e., a set of selected participating nodes, can be represented as [expression missing], where j_n = 1 when vehicle n is selected by the federated server and j_n = 0 otherwise;

[0011] Step 3: In the k-th time slot, the federated server observes the current environment, builds the state [symbol missing], and selects an action [symbol missing] according to a probability [symbol missing], where a′∈A1;

[0012] In step 3, the federated server builds the state as follows: in the k-th time slot, it combines the current set of transmission rates of the communication channels between the federated server and all vehicles [symbol missing], the set θ^(k-1) of participation rates of all vehicles at the previous moment, the set d^(k-1) of participation states of all vehicles at the previous moment, and the set h^(k-1) of qualities of the models provided by all vehicles at the previous moment into the state [symbol missing].
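As a rough illustration of the state construction described for step 3, the sketch below concatenates the four observed sets into one flat vector. The names `ServerObservation` and `build_server_state`, the 0/1 encoding of participation states, and the concatenation order are assumptions made for illustration; the original text does not specify how the state is encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerObservation:
    """What the federated server observes in time slot k (hypothetical container)."""
    rates: List[float]                # current server-to-vehicle channel rates
    participation_rates: List[float]  # theta^(k-1), one entry per vehicle
    participation_states: List[int]   # d^(k-1), assumed 1 = participated, 0 = did not
    model_qualities: List[float]      # h^(k-1), one entry per vehicle


def build_server_state(obs: ServerObservation) -> List[float]:
    """Concatenate the four per-vehicle sets into one flat state vector."""
    n = len(obs.rates)
    per_vehicle_sets = (
        obs.participation_rates,
        obs.participation_states,
        obs.model_qualities,
    )
    if any(len(values) != n for values in per_vehicle_sets):
        raise ValueError("each set needs exactly one entry per vehicle")
    state = [float(r) for r in obs.rates]
    for values in per_vehicle_sets:
        state.extend(float(v) for v in values)
    return state


obs = ServerObservation(
    rates=[2.0, 3.5],
    participation_rates=[0.8, 0.5],
    participation_states=[1, 0],
    model_qualities=[0.9, 0.0],
)
print(build_server_state(obs))
```

A flat vector is only one possible encoding; a real implementation might normalize each set or keep per-vehicle features grouped together.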

[0013] The probability [symbol missing] is calculated as follows:

[0014]

[0015] Here, the long-term risk value [symbol missing] is updated based on the set θ^(k-1) of participation rates of all vehicles observed at the previous moment, the set d^(k-1) of participation states of all vehicles at the previous moment, and the set h^(k-1) of qualities of the models provided by all vehicles at the previous moment, using the following formula:

[0016]

[0017] where [symbol missing] is the learning rate for the long-term risk value;

[0018] Step 4: The federated server sends a request to the selected vehicle to upload its local model;

[0019] Step 5: The [number missing] vehicles that choose to participate in this round of federated training upload their local models to the federated server, which then aggregates them to train a new public service model. The remaining [number missing] vehicles, which do not participate in federated training, proceed to step 10;

[0020] In step 5, vehicles in the environment can autonomously choose a computing mode: if safety is a priority, they can choose to participate in federated training; if efficiency is a priority, they can choose edge computing.

[0021] Step 6: The federated server evaluates the quality of the model provided by vehicle n and updates the parameter [symbol missing]; if vehicle n did not participate in federated training, then [expression missing]. It then updates the federated participation rate [symbol missing] based on the total number of times vehicle n was selected by the federated server up to time slot k and the total number of times it participated in federated training, and calculates the reward [symbol missing] generated by this communication.
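The participation-rate bookkeeping in step 6 can be sketched as follows. The text states only that the rate is updated from the two running totals; taking it as participations divided by selections, and returning 0.0 for a vehicle that has never been selected, are assumptions of this sketch, and `ParticipationTracker` is a hypothetical helper name.

```python
from typing import List


class ParticipationTracker:
    """Tracks selection and participation counts per vehicle (hypothetical helper)."""

    def __init__(self, num_vehicles: int) -> None:
        self.times_selected: List[int] = [0] * num_vehicles
        self.times_participated: List[int] = [0] * num_vehicles

    def record_round(self, selected: List[int], participated: List[int]) -> None:
        """Record one round; both arguments are 0/1 indicators per vehicle."""
        for n, (was_selected, took_part) in enumerate(zip(selected, participated)):
            if took_part and not was_selected:
                raise ValueError(f"vehicle {n} participated without being selected")
            self.times_selected[n] += was_selected
            self.times_participated[n] += took_part

    def participation_rate(self, n: int) -> float:
        """Participations divided by selections; 0.0 if never selected (assumption)."""
        if self.times_selected[n] == 0:
            return 0.0
        return self.times_participated[n] / self.times_selected[n]


tracker = ParticipationTracker(num_vehicles=3)
tracker.record_round(selected=[1, 1, 0], participated=[1, 0, 0])
tracker.record_round(selected=[1, 1, 0], participated=[1, 1, 0])
print([tracker.participation_rate(n) for n in range(3)])
```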

[0022] In step 6, the federated server calculates the reward function from the set d^(k) of participation states of all vehicles, the set of transmission rates of the communication channels between the federated server and all vehicles [symbol missing], and the set h^(k) of qualities of the models currently provided by all vehicles, using the following formula.

[0023]

[0024] Step 7: The sequence [expression missing] is stored in the experience pool as a new experience.

[0025] Step 8: The federated server randomly samples Z1 experience sequences from the experience pool and uses the Adam optimization algorithm to update the weight parameters of the first deep neural network.

[0026] In step 8, the weight parameters [symbol missing] are updated as follows:

[0027]

[0028] Where g(z) follows the uniform distribution U(1, buffer_size1);

[0029] Step 9: Repeat steps 3-8 until the federated server learns a stable participating node selection strategy;

[0030] Step 10: Construct the second deep neural network. Each possible state of all vehicles in the connected vehicle environment is denoted as [symbol missing]. Let the action space of the second deep neural network be A2, with dimension |A2|, and let [symbol missing] denote the selected action. Let the Q-value [symbol missing] be the output of the deep neural network for a given state and action, and let the weight parameters of the deep neural network in the k-th time slot be [symbol missing]. Initialize the experience pool size buffer_size2, the weight parameters [value missing], the learning factor α2, and the discount factor γ2;

[0031] In step 10, the action to be selected, i.e., the edge node to which the vehicle offloads its task, can be represented as:

[0032]

[0033] Step 11: In the k-th time slot, all vehicles observe the current environment, construct a state [symbol missing], and select an action [symbol missing] according to a probability [symbol missing], where a″∈A2;

[0034] In step 11, the state is built as follows: in the k-th time slot, vehicle n combines the amount of data of the computational task it generates at the current time [symbol missing], the latest delivery delay of that task [symbol missing], the computing resources the task requires [symbol missing], the set of estimated transmission rates of the communication channels between vehicle n and all edge nodes at the current time [symbol missing], the set of available resources of all edge nodes at the current moment [symbol missing], and the set ρ^(k-1) of service qualities of all edge nodes at the previous time step into its state [symbol missing]. The states of all vehicles are then combined into the joint state [symbol missing].

[0035] The probability [symbol missing] is calculated as follows:

[0036]

[0037] Here, the long-term risk value [symbol missing] is updated based on the latest delivery time of the computation task [symbol missing], the latency caused by edge computing [symbol missing], and the set ρ^(k-1) of service qualities of all edge nodes at the previous time step, using the following formula:

[0038]

[0039] where [symbol missing] is the learning rate for the long-term risk value;

[0040] The latency here [symbol missing] is an estimated value, composed of the edge offloading delay estimate and the edge computing delay estimate, and is calculated as follows:

[0041]
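The composition of the latency estimate (offloading delay plus computing delay) can be sketched as below. The formula itself is not available in this text, so the sketch assumes the common form in which the offloading delay is the task data amount divided by the estimated channel rate and the computing delay is the required computing resources divided by the edge node's available resources. The function name, parameter names, and units are hypothetical.

```python
def estimate_latency(
    data_bits: float,
    rate_bits_per_s: float,
    required_cycles: float,
    available_cycles_per_s: float,
) -> float:
    """Estimated latency = offloading delay estimate + computing delay estimate.

    Assumed forms (not taken from the original text):
      offloading delay = data amount / estimated transmission rate
      computing delay  = required resources / available edge resources
    """
    if rate_bits_per_s <= 0 or available_cycles_per_s <= 0:
        raise ValueError("rate and available resources must be positive")
    offloading_delay = data_bits / rate_bits_per_s
    computing_delay = required_cycles / available_cycles_per_s
    return offloading_delay + computing_delay


# Example: an 8 Mbit task over a 4 Mbit/s channel, needing 1e9 cycles
# on an edge node with 2e9 cycles/s available.
print(estimate_latency(8e6, 4e6, 1e9, 2e9))
```

A vehicle could compare this estimate with the task's latest delivery time for each candidate edge node before choosing where to offload.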

[0042] Step 12: Vehicle n offloads the computation task to edge node m. After the edge node completes the computation, it returns the computation result and records the latency.

[0043] Step 13: Based on whether the delay [symbol missing] exceeds the latest delivery time of the computation task [symbol missing], the service quality of edge node m is determined and its parameter [symbol missing] is updated. The reward [symbol missing] generated by this communication is then observed.

[0044] In step 13, vehicle n calculates the reward function from the set ρ^(k) of service qualities of all current edge nodes and the latency [symbol missing], using the following formula.

[0045]

[0046] Here, the delay [symbol missing] is the actual value, obtained through measurement;

[0047] Step 14: The sequence [expression missing] is stored in the experience pool as a new experience.

[0048] Step 15: Vehicle n randomly samples Z2 experience sequences from the experience pool, and updates the weight parameters of the deep neural network Q2 using the Adam optimization algorithm.

[0049] In step 15, the weight parameters [symbol missing] are updated as follows:

[0050]

[0051] Where g(z) follows the uniform distribution U(1, buffer_size2);

[0052] Step 16: Repeat steps 11-15 until vehicle n learns a stable edge computing offloading selection strategy.

[0053] Beneficial effects: First, this invention dynamically optimizes the federated server's selection of training participants based on each vehicle's historical federated participation and the quality of the model it provides, so that a new, higher-quality model is trained more efficiently. Second, each vehicle dynamically selects its edge offloading strategy based on the available resources and historical service quality of the edge nodes, achieving more efficient and secure edge computing.

Detailed Implementation

[0054] To better understand the technical content of this invention, the following embodiments are provided for detailed explanation.

[0055] A multi-agent risk-aware security computation method based on federated reinforcement learning includes the following steps:

[0056] Step 1: Assume there are 10 vehicles within the communication range of the federated server. In the k-th time slot, vehicle n (1≤n≤10) has a federated participation rate, a participation status, and a provided-model quality [symbols missing]. The computational task generated by each vehicle has a data amount, a latest delivery time, and required computing resources [symbols missing]. There are 5 edge nodes providing computing services to these vehicles; edge node m has available resources and a historical service quality [symbols missing]. The channel between vehicle n and edge node m and the channel between vehicle n and the federated server each have a transmission rate [symbols missing]. Assume that the federated server selects 8 vehicles to participate in federated training each round.

[0057] Step 2: Construct the first deep neural network. The federated server denotes each possible state of the vehicle-to-everything (V2X) environment as [symbol missing]. Let the action space of the first deep neural network be A1, with dimension |A1|, and let [symbol missing] denote the selected action. Let the Q-value [symbol missing] be the output of the deep neural network for a given state and action, and let the weight parameters of the deep neural network in the k-th time slot be [symbol missing]. Initialize the experience pool size buffer_size1 = 3000, the weight parameters [value missing], the learning factor α1 = 0.9, the discount factor γ1 = 0.5, and the participation level threshold κ1 = 0.6.

[0058] In step 2, the action to be selected, i.e., a set of selected participating nodes, can be represented as [expression missing], where j_n = 1 when vehicle n is selected by the federated server and j_n = 0 otherwise;
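To make the action representation concrete for this embodiment, the sketch below enumerates every indicator vector (j_1, ..., j_10) with exactly 8 ones. It assumes that the action space A1 consists of all ways to pick exactly 8 of the 10 vehicles, which the text implies but does not state outright; `enumerate_actions` is a hypothetical helper name.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def enumerate_actions(num_vehicles: int, num_selected: int) -> List[Tuple[int, ...]]:
    """List every 0/1 indicator vector with exactly num_selected ones."""
    actions = []
    for chosen in combinations(range(num_vehicles), num_selected):
        chosen_set = set(chosen)
        actions.append(
            tuple(1 if n in chosen_set else 0 for n in range(num_vehicles))
        )
    return actions


A1 = enumerate_actions(num_vehicles=10, num_selected=8)
print(len(A1), comb(10, 8))  # both counts agree: 45 candidate actions
print(A1[0])                 # first indicator vector in enumeration order
```

Under this assumption |A1| = C(10, 8) = 45, so the first network needs 45 Q-value outputs. The count grows combinatorially with the number of vehicles, which is worth keeping in mind for larger deployments.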

[0059] Step 3: In the k-th time slot, the federated server observes the current environment, builds the state [symbol missing], and selects an action [symbol missing] according to a probability [symbol missing], where a′∈A1.

[0060] In step 3, the federated server builds the state as follows: in the k-th time slot, it combines the current set of transmission rates of the communication channels between the federated server and all vehicles [symbol missing], the set θ^(k-1) of participation rates of all vehicles at the previous moment, the set d^(k-1) of participation states of all vehicles at the previous moment, and the set h^(k-1) of qualities of the models provided by all vehicles at the previous moment into the state [symbol missing].

[0061] The probability [symbol missing] is calculated as follows:

[0062]

[0063] Here, the long-term risk value [symbol missing] is updated based on the set θ^(k-1) of participation rates of all vehicles observed at the previous moment, the set d^(k-1) of participation states of all vehicles at the previous moment, and the set h^(k-1) of qualities of the models provided by all vehicles at the previous moment, using the following formula:

[0064]

[0065] Where 0.95 is the learning rate for the long-term risk value;
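The update formula for the long-term risk value is not available in this text, so the following is only a guess at its general shape: an exponential moving average that blends the previous risk estimate with a newly observed risk signal, using the stated learning rate of 0.95. Both the averaging form and the risk signal (the share of vehicles whose participation rate falls below the threshold κ1 = 0.6) are assumptions of this sketch, not the patented formula, and both function names are hypothetical.

```python
from typing import Sequence


def observed_risk_from_rates(
    participation_rates: Sequence[float], threshold: float = 0.6
) -> float:
    """Hypothetical risk signal: share of vehicles whose rate is below the threshold."""
    if not participation_rates:
        raise ValueError("at least one participation rate is required")
    below = sum(1 for rate in participation_rates if rate < threshold)
    return below / len(participation_rates)


def update_long_term_risk(
    previous_risk: float, observed_risk: float, learning_rate: float = 0.95
) -> float:
    """Blend the previous estimate with the new observation (assumed EMA form)."""
    if not 0.0 <= learning_rate <= 1.0:
        raise ValueError("learning_rate must lie in [0, 1]")
    return (1.0 - learning_rate) * previous_risk + learning_rate * observed_risk


risk = 0.0
signal = observed_risk_from_rates([0.9, 0.5, 0.7, 0.2])  # two of four below 0.6
risk = update_long_term_risk(risk, signal)
print(signal, risk)
```

With a learning rate as high as 0.95, an estimate of this form would track the most recent observation closely and retain little history.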

[0066] Step 4: The federated server sends a request to the selected vehicle to upload its local model.

[0067] Step 5: The [number missing] vehicles that choose to participate in this round of federated training upload their local models to the federated server, which then aggregates them to train a new public service model. The remaining [number missing] vehicles, which do not participate in federated training, proceed to step 10. If vehicle n participates in federated training, then [expression missing]; otherwise, [expression missing].

[0068] In step 5, vehicles in the environment can autonomously choose a computing mode: if safety is a priority, they can choose to participate in federated training; if efficiency is a priority, they can choose edge computing.
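The mode choice in step 5 amounts to splitting the vehicles into two groups, one continuing with federated training and one moving on to edge offloading at step 10. A minimal sketch, assuming each vehicle declares a single priority per round; the `Priority` enum and `split_by_mode` function are hypothetical names, and how a vehicle arrives at its priority is not specified in the text.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Priority(Enum):
    SAFETY = "safety"          # vehicle prefers federated training
    EFFICIENCY = "efficiency"  # vehicle prefers edge computing


def split_by_mode(priorities: Dict[int, Priority]) -> Tuple[List[int], List[int]]:
    """Return (vehicles for federated training, vehicles for edge offloading)."""
    federated = sorted(n for n, p in priorities.items() if p is Priority.SAFETY)
    edge = sorted(n for n, p in priorities.items() if p is Priority.EFFICIENCY)
    return federated, edge


federated_ids, edge_ids = split_by_mode(
    {1: Priority.SAFETY, 2: Priority.EFFICIENCY, 3: Priority.SAFETY}
)
print(federated_ids, edge_ids)
```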

[0069] Step 6: The federated server evaluates the quality of the model provided by vehicle n and updates the parameter [symbol missing]; if vehicle n did not participate in federated training, then [expression missing]. It then updates the federated participation rate [symbol missing] based on the total number of times vehicle n was selected by the federated server up to time slot k and the total number of times it participated in federated training, and calculates the reward [symbol missing] generated by this communication.

[0070] In step 6, the federated server calculates the reward function from the set d^(k) of participation states of all vehicles, the set of transmission rates of the communication channels between the federated server and all vehicles [symbol missing], and the set h^(k) of qualities of the models currently provided by all vehicles, using the following formula.

[0071]

[0072] Step 7: The sequence [expression missing] is stored in the experience pool as a new experience.

[0073] Step 8: The federated server randomly samples 32 experience sequences from the experience pool and uses the Adam optimization algorithm to update the weight parameters of the deep neural network Q1.

[0074] In step 8, the weight parameters [symbol missing] are updated as follows:

[0075]

[0076] Where g(z) follows the uniform distribution U(1, 3000).
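The experience pool used in steps 7 and 8 can be sketched as a fixed-capacity buffer from which 32 entries are drawn at uniformly random indices, mirroring g(z) ~ U(1, 3000). The class name `ExperiencePool`, the four-field experience tuple, the eviction of the oldest entry when the pool is full, and the use of 0-based indices over the currently filled part of the pool are assumptions of this sketch.

```python
import random
from collections import deque
from typing import Any, List, Optional, Tuple

Experience = Tuple[Any, Any, float, Any]  # (state, action, reward, next_state)


class ExperiencePool:
    """Fixed-capacity pool with uniform random sampling (hypothetical helper)."""

    def __init__(self, capacity: int = 3000, seed: Optional[int] = None) -> None:
        # A deque with maxlen drops the oldest entry once capacity is reached.
        self._items: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state: Any, action: Any, reward: float, next_state: Any) -> None:
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size: int = 32) -> List[Experience]:
        """Draw batch_size experiences at independent, uniformly random indices."""
        if not self._items:
            raise ValueError("cannot sample from an empty experience pool")
        size = len(self._items)
        return [self._items[self._rng.randrange(size)] for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self._items)


pool = ExperiencePool(capacity=3000, seed=0)
for k in range(3500):
    pool.store(state=k, action=0, reward=0.0, next_state=k + 1)
batch = pool.sample(32)
print(len(pool), len(batch))
```

Because each index is drawn independently, the same experience can appear more than once in a batch; sampling without replacement would be an equally plausible reading of the text.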

[0077] Step 9: Repeat steps 3-8 until the federated server learns a stable participating node selection strategy.

[0078] Step 10: Construct the second deep neural network. Each possible state of all vehicles in the connected vehicle environment is denoted as [symbol missing]. Let the action space of the second deep neural network be A2, with dimension |A2|, and let [symbol missing] denote the selected action. Let the Q-value [symbol missing] be the output of the deep neural network for a given state and action, and let the weight parameters of the deep neural network in the k-th time slot be [symbol missing]. Initialize the experience pool size buffer_size2 = 3000, the weight parameters [value missing], the learning factor α2 = 0.9, and the discount factor γ2 = 0.5.

[0079] In step 10, the action to be selected, i.e., the edge node to which the vehicle offloads its task, can be represented as:

[0080]

[0081] Step 11: In the k-th time slot, all vehicles observe the current environment, construct a state [symbol missing], and select an action [symbol missing] according to a probability [symbol missing], where a″∈A2.

[0082] In step 11, the state is built as follows: in the k-th time slot, vehicle n combines the amount of data of the computational task it generates at the current time [symbol missing], the latest delivery delay of that task [symbol missing], the computing resources the task requires [symbol missing], the set of estimated transmission rates of the communication channels between vehicle n and all edge nodes at the current time [symbol missing], the set of available resources of all edge nodes at the current moment [symbol missing], and the set ρ^(k-1) of service qualities of all edge nodes at the previous time step into its state [symbol missing]. The states of all vehicles are then combined into the joint state [symbol missing].

[0083] The probability [symbol missing] is calculated as follows:

[0084]

[0085] Here, the long-term risk value [symbol missing] is updated based on the latest delivery time of the computation task [symbol missing], the latency caused by edge computing [symbol missing], and the set ρ^(k-1) of service qualities of all edge nodes at the previous time step, using the following formula:

[0086]

[0087] Where 0.95 is the learning rate for the long-term risk value;

[0088] The latency here [symbol missing] is an estimated value, composed of the edge offloading delay estimate and the edge computing delay estimate, and is calculated as follows:

[0089]

[0090] Step 12: Vehicle n offloads the computation task to edge node m. After the edge node completes the computation, it returns the computation result and records the latency.

[0091] Step 13: Based on whether the delay [symbol missing] exceeds the latest delivery time of the computation task [symbol missing], the service quality of edge node m at this time is determined. The service quality parameter [symbol missing] is updated based on the ratio of the total number of edge computing operations completed by edge node m to the total number of edge computing operations it has performed. The reward [symbol missing] generated by this communication is then observed.

[0092] In step 13, vehicle n calculates the reward function from the set ρ^(k) of service qualities of all current edge nodes and the latency [symbol missing], using the following formula.

[0093]

[0094] Here, the delay [symbol missing] is the actual value, obtained through measurement.
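The service-quality update in step 13 can be sketched as a running ratio per edge node. The sketch assumes that an operation counts as "completed" when the measured delay does not exceed the task's latest delivery time, and that the service quality is exactly the ratio of completed to performed operations; `ServiceQualityTracker` is a hypothetical helper name.

```python
from typing import List


class ServiceQualityTracker:
    """Per-edge-node ratio of on-time operations to all operations (hypothetical)."""

    def __init__(self, num_edge_nodes: int) -> None:
        self.completed: List[int] = [0] * num_edge_nodes
        self.performed: List[int] = [0] * num_edge_nodes

    def record(self, m: int, measured_delay: float, deadline: float) -> float:
        """Record one offloaded task on node m and return its updated quality."""
        self.performed[m] += 1
        if measured_delay <= deadline:  # on time counts as completed (assumption)
            self.completed[m] += 1
        return self.completed[m] / self.performed[m]


tracker = ServiceQualityTracker(num_edge_nodes=5)
print(tracker.record(0, measured_delay=0.8, deadline=1.0))  # met the deadline
print(tracker.record(0, measured_delay=1.5, deadline=1.0))  # missed the deadline
```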

[0095] Step 14: The sequence [expression missing] is stored in the experience pool as a new experience.

[0096] Step 15: Vehicle n randomly samples 32 experience sequences from the experience pool, and updates the weight parameters of the deep neural network Q2 using the Adam optimization algorithm.

[0097] In step 15, the weight parameters [symbol missing] are updated as follows:

[0098]

[0099] Where g(z) follows the uniform distribution U(1, 3000).
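Steps 8 and 15 both rely on the Adam optimization algorithm to update the network weights. For readers unfamiliar with it, the sketch below shows a single Adam step on a plain list of weights. The hyperparameters (step size 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8) are the commonly used Adam defaults, not values from this text, and the gradient is assumed to come from whatever loss the weight-update formula defines.

```python
import math
from typing import List


class Adam:
    """Minimal Adam optimizer over a flat list of weights (illustrative only)."""

    def __init__(self, size: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size  # running mean of gradients
        self.v = [0.0] * size  # running mean of squared gradients
        self.t = 0             # number of steps taken so far

    def step(self, weights: List[float], grads: List[float]) -> List[float]:
        """Return updated weights after one Adam step."""
        self.t += 1
        updated = []
        for i, (w, g) in enumerate(zip(weights, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            # Bias correction compensates for the zero initialization of m and v.
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            updated.append(w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


optimizer = Adam(size=2, lr=0.1)
weights = optimizer.step([1.0, 1.0], grads=[2.0, -2.0])
print(weights)
```

On the first step the update has magnitude close to the step size regardless of the gradient's scale, which is why the two weights above move by roughly 0.1 in opposite directions.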

[0100] Step 16: Repeat steps 11-15 until vehicle n learns a stable edge computing offloading selection strategy.

Claims

1. A multi-agent risk-aware security computation method based on federated reinforcement learning, characterized in that it includes the following steps: Step 1: Assume that the communication range of the federated server includes [symbol missing] vehicles, and that in the k-th time slot vehicle n has a federated participation rate, a participation status, and a model quality [symbols missing]; the computational task generated by each vehicle has a data amount, a latest delivery time, and required computing resources [symbols missing]; there are [symbol missing] edge nodes, and edge node m has available resources and a service quality [symbols missing]; the channel between vehicle n and edge node m and the channel between vehicle n and the federated server each have a transmission rate [symbols missing]; assume the federated server selects [symbol missing] vehicles to participate in federated training each round; Step 2: Construct the first deep neural network: the federated server denotes each possible state of the vehicle-to-everything (V2X) environment as [symbol missing]; let the action space of the first deep neural network be [symbol missing], let [symbol missing] denote the selected action, let [symbol missing] be the Q-value output by the deep neural network for a given state and action, and let [symbol missing] be the weight parameters of the deep neural network in the k-th time slot; Step 3: In the k-th time slot, the federated server observes the current environment, builds the state [symbol missing], and selects an action [symbol missing] according to a probability [symbol missing], where [expression missing]; Step 4: The federated server sends the selected [symbol missing] vehicles a request to upload their local models; Step 5: The [symbol missing] vehicles that choose to participate in this round of federated training upload their local models to the federated server, which then uses model aggregation to train a new public service model; the remaining [symbol missing] vehicles proceed to step 10; Step 6: The federated server evaluates the model quality of vehicle n and updates the parameter [symbol missing]; for a vehicle that did not participate in federated training, [expression missing]; the federated participation rate [symbol missing] is then updated based on the total number of times vehicle n was selected by the federated server up to time slot k and the total number of times it participated in federated training, and the reward function [symbol missing] is calculated; Step 7: The sequence [expression missing] is stored in the experience pool; Step 8: The federated server randomly samples 100 experience sequences from the experience pool and uses the Adam optimization algorithm to update the weight parameters [symbol missing] of the first deep neural network; Step 9: Repeat steps 3-8 until the federated server learns a stable participating node selection strategy; Step 10: Construct the second deep neural network: each possible state of all vehicles in the connected vehicle environment is denoted as [symbol missing]; the action space of the second deep neural network is [symbol missing], [symbol missing] denotes the selected action, [symbol missing] is the Q-value output by the deep neural network for a given state and action, and [symbol missing] is the weight parameters of the deep neural network in the k-th time slot; Step 11: In the k-th time slot, all vehicles observe the current environment, construct a state [symbol missing], and select an action [symbol missing] according to a probability [symbol missing], where [expression missing]; Step 12: Vehicle n offloads the computation task to edge node m; after the edge node completes the computation, it returns the computation result, and the latency [symbol missing] is recorded; Step 13: Based on whether the delay [symbol missing] exceeds the latest delivery time of the computation task [symbol missing], the service quality of edge node m is determined and its parameter [symbol missing] is updated, and the reward function [symbol missing] is calculated; Step 14: The sequence [expression missing] is stored in the experience pool; Step 15: Vehicle n randomly samples [symbol missing] experience sequences from the experience pool and uses the Adam optimization algorithm to update the weight parameters [symbol missing] of the deep neural network [symbol missing]; Step 16: Repeat steps 11-15 until vehicle n learns a stable edge computing offloading selection strategy.

2. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 2, the action to be selected, i.e., a set of selected participating nodes, is represented as [expression missing], wherein j_n = 1 when vehicle n is selected by the federated server and j_n = 0 otherwise.

3. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 3, the federated server builds the state as follows: in the k-th time slot, it combines the current set of transmission rates of the communication channels between the federated server and all vehicles [expression missing], the set of participation rates of all vehicles at the previous moment [expression missing], the set of participation states of all vehicles at the previous moment [expression missing], and the set of qualities of the models provided by all vehicles at the previous moment [expression missing] into the state [symbol missing]; the probability [symbol missing] is calculated as follows: [formula missing]; here, the long-term risk value [symbol missing] is updated based on the set of participation rates of all vehicles observed at the previous moment [symbol missing], the set of participation states of all vehicles at the previous moment [symbol missing], and the set of qualities of the models provided by all vehicles at the previous moment [symbol missing], using the following formula: [formula missing]; where [symbol missing] is the learning rate for the long-term risk value and [symbol missing] is the participation level threshold.

4. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 5, vehicles in the environment can autonomously choose a computing mode: if safety is a priority, they can choose to participate in federated training; if efficiency is a priority, they can choose edge computing.

5. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 6, the federated server calculates the reward function [symbol missing] from the set of participation states of all vehicles [symbol missing], the set of transmission rates of the communication channels between the federated server and all vehicles [symbol missing], and the set of qualities of the models currently provided by all vehicles [symbol missing], using the following formula: [formula missing].

6. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 8, the weight parameters [symbol missing] are updated as follows: [formula missing]; where [symbol missing] follows the uniform distribution [expression missing], [symbol missing] is the size of the experience pool, [symbol missing] is the learning factor, and [symbol missing] is the discount factor.

7. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 10, the action to be selected, i.e., the edge node to which the vehicle offloads its task, is represented as: [expression missing].

8. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 11, the state is built as follows: in the k-th time slot, vehicle n combines the amount of data of the computational task it generates at the current time [symbol missing], the latest delivery delay of that task [symbol missing], the computing resources the task requires [symbol missing], the set of estimated transmission rates of the communication channels between vehicle n and all edge nodes at the current moment [expression missing], the set of available resources of all edge nodes at the current moment [expression missing], and the set of service qualities of all edge nodes at the previous moment [expression missing] into the state [symbol missing]; the states of all vehicles are then combined into the joint state [symbol missing]; the probability [symbol missing] is calculated as follows: [formula missing]; here, the long-term risk value [symbol missing] is updated based on the latest delivery delay of the computation task [symbol missing], the latency generated by edge computing [symbol missing], and the set of service qualities of all edge nodes at the previous time step [symbol missing], using the following formula: [formula missing]; where [symbol missing] is the learning rate for the long-term risk value; the delay [symbol missing] is an estimated value, composed of the edge offloading delay estimate and the edge computing delay estimate, and is calculated as follows: [formula missing].

9. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 13, vehicle n calculates the reward function [symbol missing] from the set of service qualities of all current edge nodes [symbol missing] and the latency [symbol missing], using the following formula: [formula missing]; here, the delay [symbol missing] is the actual value, obtained through measurement.

10. The multi-agent risk-aware security computation method based on federated reinforcement learning as described in claim 1, characterized in that, in step 15, the weight parameters [symbol missing] are updated as follows: [formula missing]; where [symbol missing] follows the uniform distribution [expression missing], [symbol missing] is the size of the experience pool, [symbol missing] is the learning factor, and [symbol missing] is the discount factor.