Vehicle cloud multi-path computation offloading method and device

By constructing a communication model and a multi-path computation offloading reliability model for vehicle cloud, and combining deep reinforcement learning algorithms to optimize the computation offloading objective function, the problem of failure risk in multi-path offloading of vehicle cloud is solved, and resource utilization and reliability are improved.

CN116360878B · Active · Publication Date: 2026-06-23 · CHINA TELECOM CORP LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA TELECOM CORP LTD
Filing Date
2021-12-28
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing vehicle cloud multi-path computation offloading methods fail to effectively consider the failure risk of multi-path offloading, resulting in low resource utilization and low reliability.

Method used

A communication model for the vehicle cloud is constructed, defining the relationship between the overall effect value of the task vehicles and the computation task offloading variable. A multi-path computation offloading reliability model based on risk factors is adopted, and the model is formulated as a Markov game and solved with a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient. The computation offloading objective function is optimized, and the Ornstein-Uhlenbeck process is introduced to increase the generalization ability of the model.

Benefits of technology

It improves the success rate and resource utilization of multi-path computation offloading, enhances the reliability of computation task offloading in the vehicle cloud environment, and reduces the failure probability of computation task offloading.

✦ Generated by Eureka AI based on patent content.


Abstract

The present disclosure provides a vehicle cloud multi-path computation offloading method and device. A computation offloading objective function is first constructed for the overall effect value of the vehicle-mounted resources, with maximization of the function value as the optimization target of computation offloading. Optimization of this objective function is then converted into optimization of a multi-path computation offloading reliability model based on a risk factor, so as to improve the success rate and resource utilization of multi-path computation offloading. In addition, the present disclosure constructs a Markov game model for the multi-path computation offloading reliability model and uses a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient to realize the computation offloading. To avoid converging to a local optimum, an Ornstein-Uhlenbeck process based on mean reversion and diffusion is used to increase the generalization ability of the model.

Description

Technical Field

[0001] This disclosure relates to the field of vehicle networking technology, and in particular to a method and apparatus for offloading multi-path computing in vehicle cloud.

Background Technology

[0002] Background of Computational Task Offloading Technology: With the increasing number of resource-intensive vehicle-to-everything (V2X) applications such as autonomous driving, driver assistance, augmented reality, and virtual reality, the problem of resource shortage in V2X is becoming increasingly serious. Computational task offloading, as an effective means, can alleviate the resource shortage problem to a large extent by migrating computational tasks from resource-scarce vehicles to vehicles or transportation facilities with abundant computational resources.

[0003] Background of Computational Task Offloading Research in Vehicle-to-Everything (V2X) Networks: Current methods for offloading computational tasks in V2X networks are mostly based on cloud computing centers, transportation infrastructure, or a combination of both. The dependence on base stations and transportation infrastructure severely limits these solutions. Furthermore, since these computational task offloading services are invested in and operated by third parties, these solutions typically incur high costs. Additionally, in high-speed V2X scenarios, the limited coverage of base stations or transportation infrastructure necessitates frequent switching during computational task offloading, inevitably impacting performance. Considering the differentiated onboard resource states among V2X networks, vehicle clouds, composed of vehicles, offer a new solution for computational task offloading in V2X networks. Therefore, researching cloud-based V2X computational task offloading methods based on vehicle clouds is of significant importance.

[0004] Background of Multi-path Computation Offloading Research in 5G Vehicle Cloud: 5G vehicle cloud computing task offloading is achieved by sharing computing resources among vehicles. The biggest challenge it faces is the instability of the computing task offloading environment caused by the dynamic changes in resources. In intensive vehicle-to-everything (V2X) applications, where the supply of computing resources exceeds demand, computation offloading may fail due to the high speed and dynamic nature of V2X. Therefore, multi-path computation offloading methods have been proposed. Traditional multi-path computation offloading methods have not considered the failure risk of multi-path offloading, resulting in either low resource utilization or low reliability. Resource waste refers to the situation where the success rate of computation task offloading no longer increases with the increase of resources; low reliability refers to the situation where the success rate of computation task offloading has not been maximized.

Summary of the Invention

[0005] The vehicle cloud multi-path computation offloading method and apparatus provided in this disclosure are used to solve the problems in related technologies where multi-path computation offloading methods have not considered the failure risk of multi-path offloading, resulting in either low resource utilization or low reliability.

[0006] On one hand, embodiments of this disclosure provide a vehicle cloud multi-path computation offloading method, including:

[0007] A communication model for a vehicle cloud is defined, comprising service vehicles with idle computing resources and task vehicles with insufficient computing resources. The communication model defines the relationship between the overall effect value of task vehicles and the computational task offloading variable.

[0008] A computation offloading objective function is constructed with maximization of the overall effect value as the optimization objective, in combination with multiple constraints.

[0009] Determine whether the idle computing resources of the service vehicle are greater than the computing resources required by the task vehicle. If so, introduce a multi-path computation offloading reliability model based on risk factors. The optimization objective of the multi-path computation offloading reliability model is to maximize the overall effect value, and the risk factor represents the average failure probability of offloading each computing task of the service vehicle.

[0010] For the aforementioned multi-path computation offloading reliability model, a Markov game model is constructed, and a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient is used to solve the Markov game model to obtain the optimal solution for computation task offloading.

[0011] In some embodiments, the vehicle cloud multi-path computation offloading method provided in this disclosure includes, while solving the Markov game model using a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient, optimizing the multi-agent deep reinforcement learning algorithm based on the Ornstein-Uhlenbeck process.

[0012] In some embodiments, the vehicle cloud multi-path computation offloading method provided in this disclosure optimizes the multi-agent deep reinforcement learning algorithm based on the Ornstein-Uhlenbeck process, specifically including:

[0013] Noise $N_e$ based on the Ornstein-Uhlenbeck process is added to the output of the actor network contained in the multi-agent deep reinforcement learning algorithm, where the noise $N_e$ is given by:

[0014] $N_e = \kappa(\mu - X(t))\,dt + \sigma_n Y(t)$;

[0015] Where $\kappa$ represents the mean-reversion rate, $\mu$ represents the mean, $X(t)$ represents a random variable with mean $\mu$, $dt$ represents the variance, $\sigma_n$ represents the diffusion weight, and $Y(t)$ represents a random variable that follows a Gaussian distribution with mean 0 and variance $dt$.
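The noise formula in [0014] can be illustrated with a minimal Python sketch. It assumes that the running state X(t) is advanced by each increment $N_e$ (the text defines only the increment), and the function names `ou_noise_path` and `noisy_action`, the clipping range, and the parameter values are hypothetical choices for illustration only.

```python
import random


def ou_noise_path(kappa, mu, sigma_n, dt, steps, x0=0.0, seed=None):
    """Return a list of increments N_e = kappa*(mu - X(t))*dt + sigma_n*Y(t).

    Y(t) is Gaussian with mean 0 and variance dt, as stated in [0015].
    Assumption: X(t) is updated by adding each increment to it.
    """
    rng = random.Random(seed)
    x = x0
    increments = []
    for _ in range(steps):
        y = rng.gauss(0.0, dt ** 0.5)  # standard deviation = sqrt(variance dt)
        n_e = kappa * (mu - x) * dt + sigma_n * y
        x += n_e  # assumed state update
        increments.append(n_e)
    return increments


def noisy_action(actor_output, noise, low=0.0, high=1.0):
    """Add noise to a scalar actor output and clip it to an assumed valid range."""
    return min(high, max(low, actor_output + noise))


if __name__ == "__main__":
    path = ou_noise_path(kappa=0.15, mu=0.0, sigma_n=0.2, dt=0.01, steps=5, seed=1)
    print([round(n, 4) for n in path])
    print(noisy_action(0.9, 0.3))
```

With `sigma_n=0` the diffusion term vanishes and the increments shrink as X(t) approaches $\mu$, which shows the mean-reversion behaviour in isolation.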

[0016] In some embodiments, the vehicle cloud multi-path computation offloading method provided in this disclosure includes, in particular, a multi-path computation offloading reliability model based on risk factors:

[0017]

[0018]

[0019] Where the reliability index represents the risk resistance capability, ε represents the risk factor, ξ represents the multiplexing factor, $K_s$ represents the number of service vehicles, and $K_t$ represents the number of task vehicles; the remaining two quantities are the available resources of service vehicle j and the resource requirements of task vehicle i.

[0020] In some embodiments, in the vehicle cloud multi-path computational offloading method provided in this disclosure, the optimization objective is to maximize the overall effect value, and a computational offloading objective function is constructed by combining multiple constraints, specifically including:

[0021]

[0022]

[0023] Where max U represents the maximized overall effect value and $L_i$ represents the computational task offloading variable for task vehicle i. C1 indicates that the total computational resources allocated to all task vehicles cannot exceed the available resources in the model; C2 indicates that the computational resources allocated by service vehicle j cannot exceed the computational resources it can provide; in C3, $L_{i,j}$ is an indicator of whether task vehicle i offloads its computing task to service vehicle j; C4 indicates that the computing resources allocated to task vehicle i cannot exceed its required computing resources; C5 and C6 indicate that the numbers of offloading connections supported by service vehicle j and task vehicle i cannot exceed the maximum limits N1 and N2, respectively; and C7 indicates that the transmission rate $C_i$ should be higher than the specified reference transmission rate $R_{baseline}$.

[0024] In some embodiments, in the vehicle cloud multi-path computation offloading method provided in this disclosure, constructing a Markov game model specifically includes: using an array (S,A,R,T,γ) to represent the Markov game model, wherein...

[0025]

[0026]

[0027] R = R comm +R comp ,

[0028] S represents the state space, whose components are the channel gain $H_i$, the resource requirements of task vehicle i, and the set of available resources of the service vehicles, where $\Omega_s$ is the set of service vehicles and $K_s$ is the number of service vehicles. A represents the action space, where $A_i$ is the action space of task vehicle i, which is composed of sub-action spaces of task vehicle i, and M represents the dimension of the action space. R represents the reward function, where $R_{comm}$ is the reward function for the communication process, $R_{comp}$ is the reward function for the computation process, and $K_t$ is the number of task vehicles; the remaining terms are the signal-to-interference-plus-noise ratio (SINR) of the communication connection between task vehicle i and service vehicle j, the number of service vehicles assigned to task vehicle i, and the number of task vehicles assigned to service vehicle j. T represents the state transition probability, and γ represents the discount coefficient.

[0029] In some embodiments, the vehicle cloud multi-path computation offloading method provided in this disclosure uses a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradients to solve the Markov game model, specifically including:

[0030] The parameters $\theta_i$ of the critic network in the multi-agent deep reinforcement learning algorithm are updated based on the temporal-difference loss function Loss($\theta_i$);

[0031]

[0032] The parameters of the actor network contained in the multi-agent deep reinforcement learning algorithm are updated along the negative gradient direction of the loss function L(w);

[0033]

[0034] Where, in addition to the state-action value obtained by the critic network and the negative-gradient operator, $y_i$ represents the target state-action value obtained by the critic-target network included in the multi-agent deep reinforcement learning algorithm, and E represents the expectation (average) over the long-term reward.
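The two update rules are not reproduced in this text, so the following Python sketch uses the form commonly seen in deep deterministic policy gradient methods as an assumption: the target is $y_i = r_i + \gamma Q'(s', a')$, the critic loss is the mean squared difference between the critic value and $y_i$, and the actor loss is the negative mean critic value. The function names and the sample numbers are hypothetical.

```python
def td_targets(rewards, next_q_target, gamma):
    """Assumed target form: y_i = r_i + gamma * Q'(s', a') from the critic-target network."""
    return [r + gamma * q for r, q in zip(rewards, next_q_target)]


def critic_loss(q_values, targets):
    """Mean squared temporal-difference error between critic values and targets."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


def actor_loss(q_values_for_policy_actions):
    """Negative mean critic value; descending this loss increases the expected value."""
    values = list(q_values_for_policy_actions)
    return -sum(values) / len(values)


if __name__ == "__main__":
    y = td_targets(rewards=[1.0, 0.0], next_q_target=[2.0, 4.0], gamma=0.5)
    print(y)                              # targets for a batch of two samples
    print(critic_loss([1.0, 3.0], y))     # mean squared TD error
    print(actor_loss([1.0, 3.0]))         # negative mean critic value
```

In a full implementation the gradients of these losses with respect to the network parameters would be computed by an automatic differentiation library; only the scalar losses are shown here.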

[0035] In some embodiments, in the vehicle cloud multi-path computing offloading method provided in this disclosure, when it is determined that the idle computing resources of the service vehicle are less than the computing resources required by the task vehicle, a value priority mechanism is adopted to offload the computing task.

[0036] Based on the same inventive concept, embodiments of this disclosure provide a vehicle cloud multi-path computing offloading device, comprising:

[0037] The computation offloading objective function module is configured to define a communication model for a vehicle cloud, which includes service vehicles with idle computing resources and task vehicles with insufficient computing resources. The communication model defines the relationship between the overall effect value of the task vehicles and the computation task offloading variable. The computation offloading objective function is constructed with the goal of maximizing the overall effect value and in combination with multiple constraints.

[0038] The reliability model construction module is configured to determine whether the idle computing resources of the service vehicle are greater than the computing resources required by the task vehicle. If so, a multi-path computation offloading reliability model based on risk factors is introduced. The optimization objective of the multi-path computation offloading reliability model is to maximize the overall effect value, and the risk factor represents the average failure probability of offloading each computing task of the service vehicle.

[0039] The reinforcement learning module is configured to construct a Markov game model for the multi-path computation offloading reliability model, and use a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient to solve the Markov game model to obtain the optimal solution for computation task offloading.

[0040] In some embodiments, the vehicle cloud multi-path computing offloading device provided in this disclosure further includes an optimization module configured to optimize the multi-agent deep reinforcement learning algorithm based on the Ornstein-Uhlenbeck process.

[0041] The beneficial effects of the embodiments disclosed herein are as follows:

[0042] The vehicle cloud multi-path computation offloading method and apparatus provided in this disclosure first construct a computation offloading objective function for the overall effect value of vehicle resources, with maximization of this function value as the optimization objective for computation offloading. For high-density vehicle cloud environments where computing resources are in oversupply, the optimization of the computation offloading objective function is transformed into the optimization of a multi-path computation offloading reliability model based on risk factors, thereby improving the success rate and resource utilization of multi-path computation offloading. Furthermore, this disclosure constructs a Markov game model for the multi-path computation offloading reliability model and uses a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient to achieve computation offloading. To avoid convergence to local optima, an Ornstein-Uhlenbeck process based on mean reversion and diffusion is used to increase the model's generalization ability.

Attached Figure Description

[0043] Figure 1 is a model diagram of a computing task offloading scenario in a cloud-based vehicle network provided in an embodiment of this disclosure;

[0044] Figure 2 is a flowchart of the vehicle cloud multi-path computation offloading method provided in an embodiment of this disclosure;

[0045] Figure 3 is a flowchart of a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient provided in an embodiment of this disclosure;

[0046] Figure 4 is a comparison of the convergence of the multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient provided in the embodiments of this disclosure with the reinforcement learning algorithm Q-learning in related technologies;

[0047] Figure 5 is a comparison diagram of the reliability of the vehicle cloud multi-path computation offloading method provided in the embodiments of this disclosure and single-path computation offloading;

[0048] Figure 6 is a structural block diagram of the vehicle cloud multi-path computation offloading device provided in an embodiment of this disclosure.

Detailed Implementation

[0049] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be more thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art.

[0050] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. Numerous specific details are provided in the following description to give a thorough understanding of embodiments of this disclosure. However, those skilled in the art will recognize that the technical solutions of this disclosure may be practiced without one or more of the specific details, or other methods, components, apparatuses, steps, etc. may be employed. In other instances, well-known methods, apparatuses, implementations, or operations are not shown or described in detail to avoid obscuring various aspects of this disclosure.

[0051] The block diagrams shown in the accompanying drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities can be implemented in software, in one or more hardware modules or integrated circuits, or in different network and / or processor devices and / or microcontroller devices.

[0052] The flowcharts shown in the accompanying drawings are merely illustrative and do not necessarily include all content and operations / steps, nor do they necessarily have to be performed in the described order. For example, some operations / steps can be broken down, while others can be combined or partially combined; therefore, the actual execution order may change depending on the specific circumstances.

[0053] Figure 1 illustrates a vehicle-cloud-based computing task offloading model in a vehicle-to-everything (V2X) network, consisting primarily of two bidirectional intersections. This scenario involves only vehicles, without base stations or other traffic infrastructure. There are K vehicles in the V2X network, equipped with heterogeneous computing resources and tasks. From the perspective of onboard resource utilization, these vehicles can be categorized into those with idle computing resources and those with insufficient resources. Vehicles with idle resources are defined as service vehicles; their computing resources are sufficient not only to meet their own computing needs but also to provide computing resources to task vehicles, so each service vehicle acts as a task destination node. Vehicles with insufficient computing resources are task vehicles; their own computing resources cannot meet their own V2X computing task requirements, so they must offload computing tasks to other task destination nodes, and each task vehicle acts as a task source node. The numbers of service vehicles and task vehicles are $K_s$ and $K_t$, respectively. The model introduces indices i and j to represent the task vehicle and the service vehicle, respectively. It is worth noting that in reality, the role of each vehicle as a service vehicle or a task vehicle is not fixed. The occurrences of service vehicles and task vehicles follow Poisson distributions with parameters $\lambda_s$ and $\lambda_t$, respectively.
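As a rough illustration of the Poisson assumption for vehicle occurrences, the sketch below samples a number of service vehicles and task vehicles for one scenario. The sampler is the classic multiplication method attributed to Knuth, which is suitable for small rates; the function names, the rates 4.0 and 6.0, and the seed are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample by multiplying uniform variates until the
    product drops to exp(-lam) or below."""
    threshold = math.exp(-lam)
    k = 0
    product = 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def sample_vehicle_counts(lambda_s, lambda_t, seed=None):
    """Return (K_s, K_t): sampled numbers of service vehicles and task vehicles."""
    rng = random.Random(seed)
    return sample_poisson(lambda_s, rng), sample_poisson(lambda_t, rng)


if __name__ == "__main__":
    k_s, k_t = sample_vehicle_counts(lambda_s=4.0, lambda_t=6.0, seed=7)
    print("service vehicles:", k_s, "task vehicles:", k_t)
```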

[0054] In addition, each vehicle is equipped with a communication module that supports various communication modes, including end-to-end (Device-to-Device, D2D) V2V communication and vehicle-to-multi-vehicle (V2mV) communication based on non-orthogonal multiple access. Using V2V or V2mV communication, the task vehicle can offload computing tasks to the service vehicle. Network function virtualization (NFV) technology is then used to construct a virtual computing center from the idle resources of the service vehicle.

[0055] The communication environment considers both large-scale fading caused by path loss and small-scale fading caused by relative speed. Computing task offloading in vehicle-to-everything (V2X) networks mainly involves three communication methods: V2V, V2I (vehicle-to-infrastructure), and V2mV. V2V is primarily based on end-to-end communication, and the throughput $C_{i,j}$ between task vehicle i and service vehicle j can be represented by the following formula, in which $P_{i,j}$ and $h_{i,j}$ represent the power and channel gain of the communication link, respectively, and N0 is the noise of the V2V link. It is worth noting that, in the following formula, the V2V communication connection accesses spectrum resources via Orthogonal Frequency Division Multiple Access (OFDMA). The implementation of V2I communication is similar to V2V, and its throughput can also be characterized by the following formula, where N0 is the noise in the V2I communication connection. Since transportation infrastructure typically has many antennas, a single infrastructure unit can communicate with multiple vehicles simultaneously.

[0056]
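The throughput formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes the usual Shannon-capacity form $C_{i,j} = B \log_2(1 + P_{i,j} h_{i,j} / N_0)$; the bandwidth factor B, the function name `v2v_throughput`, and the sample values are assumptions, not taken from the source.

```python
import math


def v2v_throughput(power, channel_gain, noise, bandwidth=1.0):
    """Assumed Shannon-style throughput of one V2V (or V2I) link.

    power        -> P_ij, transmit power of the link
    channel_gain -> h_ij, channel gain of the link
    noise        -> N0, noise of the link (must be positive)
    bandwidth    -> assumed scaling factor B; 1.0 gives bit/s/Hz
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    return bandwidth * math.log2(1.0 + power * channel_gain / noise)


if __name__ == "__main__":
    # A signal-to-noise ratio of 3 gives log2(4) = 2 bit/s/Hz.
    print(v2v_throughput(power=1.0, channel_gain=3.0, noise=1.0))
```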

[0057] It is worth noting that V2mV communication primarily utilizes non-orthogonal multiple access (NOMA) technology. NOMA-based V2mV communication not only enables simultaneous, same-frequency transmission from one vehicle to multiple vehicles but also significantly improves spectral efficiency. Furthermore, NOMA technology achieves relatively stable performance gains without relying on user feedback of channel state information (CSI). Because of the high-speed mobility and latency inherent in vehicular networks, effective network status information is often unavailable when the network environment changes, which allows NOMA technology to leverage its performance advantages. In terms of technical implementation, NOMA uses power multiplexing for signal modulation at the transmitting end and Successive Interference Cancellation (SIC) for signal demodulation at the receiving end. Essentially, the communication gain of NOMA technology is achieved at the expense of receiver complexity.

[0058] Due to the structural limitations of the vehicle itself, the V2mV communication link considers only the case of two multiplexed signals. When the channel gains $h_{i,k}$ and $h_{i,k'}$ satisfy the following relationship, the following formula characterizes the power allocation scheme $P_{i,k} < P_{i,k'}$ corresponding to the communication connection from transmitter i to receivers k and k'. $P_i$, $P_{i,k}$, and $P_{i,k'}$ are the total transmission power limit for the V2mV connection and the transmission powers for users k and k', respectively. In a communication environment where the channel gain $h_{i,k}$ of user k is inferior to that of user k', $P_{i,k}$ is usually set to a value less than $P_{i,k'}$, based on SIC technology. The throughputs $C_{i,k}$ and $C_{i,k'}$ of the two communication links and the overall throughput $C_i$ are characterized separately, with terms for the internal interference and for the Gaussian white noise N0 of the V2mV link.

[0059] $h_{i,k} < h_{i,k'} \rightarrow P_{i,k} < P_{i,k'}$, where $P_{i,k} + P_{i,k'} \le P_i$;

[0060]

[0061] $C_i = C_{i,k} + C_{i,k'}$;
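The power allocation rule in [0059] and the sum in [0061] can be sketched as follows. The fixed split ratio `low_share`, the equal split for equal gains, and the function names are assumptions for illustration; the per-link throughput formula is not reproduced in this text, so the link throughputs are taken as inputs.

```python
def split_v2mv_power(h_k, h_k_prime, p_total, low_share=0.3):
    """Return (P_ik, P_ik') such that the receiver with the lower channel gain
    gets the smaller power and P_ik + P_ik' <= P_i (here the sum equals P_i).

    low_share is an assumed fraction (0 < low_share < 0.5) of the total power.
    """
    if not 0.0 < low_share < 0.5:
        raise ValueError("low_share must lie strictly between 0 and 0.5")
    low = low_share * p_total
    high = p_total - low
    if h_k < h_k_prime:
        return low, high
    if h_k > h_k_prime:
        return high, low
    return p_total / 2.0, p_total / 2.0  # equal gains: assumed equal split


def overall_throughput(c_k, c_k_prime):
    """C_i = C_ik + C_ik', the overall throughput of the two multiplexed links."""
    return c_k + c_k_prime


if __name__ == "__main__":
    p_k, p_k_prime = split_v2mv_power(h_k=0.2, h_k_prime=0.8, p_total=10.0)
    print(p_k, p_k_prime)
    print(overall_throughput(1.5, 2.5))
```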

[0062] Specifically, this disclosure provides a vehicle cloud multi-path computation offloading method which, as shown in Figure 2, may include the following steps:

[0063] S201. Define the communication model of the vehicle cloud. The vehicle cloud includes service vehicles with idle computing resources and task vehicles with insufficient computing resources. The communication model defines the relationship between the overall effect value of task vehicles and the computing task offloading variable.

[0064] S202. With the goal of maximizing the overall effect value, and in combination with multiple constraints, construct a computation offloading objective function;

[0065] S203. Determine whether the idle computing resources of the service vehicle are greater than the computing resources required by the task vehicle. If yes (i.e., supply exceeds demand), proceed to step S204; if no (i.e., supply is less than demand), proceed to step S204'.

[0066] S204. Introduce a multi-path computation offloading reliability model based on risk factors. The optimization objective of the multi-path computation offloading reliability model is to maximize the overall effect value, and the risk factor represents the average failure probability of offloading each computing task of the service vehicle.

[0067] S204'. Use a value-first mechanism to offload computing tasks, for example, prioritize offloading computing tasks for vehicle networking services that are related to safety or have high requirements for user service quality.

[0068] S205. For the multi-path computation offloading reliability model, construct a Markov game model;

[0069] S206. A multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient is used to solve the Markov game model to obtain the optimal solution for offloading the computing tasks.
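The branch at step S203 and the value-first fallback of step S204' can be sketched as below. Comparing aggregate supply with aggregate demand, the mode labels, and the numeric task values are assumptions made for illustration; the source does not specify how task value is scored.

```python
def choose_offloading_mode(idle_resources, required_resources):
    """S203 (assumed aggregate comparison): pick the branch to follow.

    idle_resources     -> idle computing resources of each service vehicle
    required_resources -> computing resources required by each task vehicle
    """
    supply = sum(idle_resources)
    demand = sum(required_resources)
    if supply > demand:
        return "multi_path_reliability"  # continue with S204, S205, S206
    return "value_first"                 # continue with S204'


def value_first_order(tasks):
    """S204' sketch: order (name, value) pairs so higher-value tasks go first."""
    return [name for name, _ in sorted(tasks, key=lambda t: t[1], reverse=True)]


if __name__ == "__main__":
    print(choose_offloading_mode([8, 6, 5], [4, 3, 2]))
    print(choose_offloading_mode([2, 1], [4, 3]))
    print(value_first_order([("infotainment", 1), ("collision_warning", 9), ("navigation", 4)]))
```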

[0070] In the above-mentioned vehicle cloud multi-path computation offloading method provided in the embodiments of this disclosure, a computation offloading objective function for the overall effect value of vehicle resources is first constructed. The maximization of this function value is taken as the optimization objective for computation offloading. For high-density vehicle cloud environments where the supply of computing resources exceeds the demand, it is proposed to transform the optimization of the computation offloading objective function into the optimization of the multi-path computation offloading reliability model based on risk factors, so as to improve the success rate and resource utilization of multi-path computation offloading.

[0071] For convenience, this disclosure stipulates that each communication connection accesses spectrum resources via orthogonal multiple access (OMA), so that each communication connection occupies its own spectrum resource block. To better control the internal interference introduced by multiplexing in the NOMA mechanism, the number of vehicles connected in a V2mV connection is limited. Since vehicle power resources are relatively abundant, power allocation is set to a fixed value. Therefore, the optimization variable is primarily the computational task offloading variable L, and the optimization objective is defined as the overall effect value U, which combines the benefit of task vehicle i in the computation phase and its benefit in the communication phase, satisfying the following relationship:

[0072]

[0073] Where U represents the overall effect value, $L_i$ represents the computation offloading variable of task vehicle i, and the two benefit terms represent the benefit of task vehicle i in the computation phase and in the communication phase, respectively.

[0074] In some embodiments, in the vehicle cloud multi-path computational offloading method provided in this disclosure, step S202, with the optimization objective of maximizing the overall effect value and combining multiple constraints, constructs a computational offloading objective function, which can be specifically constructed as follows:

[0075]

[0076]

[0077] Where max U represents the maximized overall effect value. Among the constraints, C1 means that the computational resources allocated to all task vehicles cannot exceed the available resources in the model; C2 means that the computational resources allocated by service vehicle j cannot exceed the computational resources it can provide; in C3, $L_{i,j}$ is an indicator of whether task vehicle i offloads its computing task to service vehicle j; C4 indicates that the computing resources allocated to task vehicle i cannot exceed its required computing resources; C5 and C6 indicate that the numbers of offloading connections supported by service vehicle j and task vehicle i cannot exceed the maximum limits N1 and N2, respectively; and C7 indicates that the transmission rate $C_i$ should be higher than the specified reference transmission rate $R_{baseline}$.
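The constraints described above can be checked for a candidate allocation with a short Python sketch. The constraint formulas are not reproduced in this text, so the matrix layout `alloc[i][j]` (resources that service vehicle j gives to task vehicle i), the derivation of the link indicator from a positive allocation, the descriptive result keys, and the sample numbers are all assumptions.

```python
def check_constraints(alloc, available, demand, rates, n1, n2, r_baseline):
    """Return a dict mapping each described constraint to True or False.

    alloc[i][j] -> resources allocated to task vehicle i by service vehicle j
    available   -> resources each service vehicle can provide
    demand      -> resources each task vehicle requires
    rates       -> transmission rate C_i of each task vehicle
    """
    k_t, k_s = len(demand), len(available)
    # Binary indicator L_ij: 1 if task vehicle i offloads to service vehicle j.
    link = [[1 if alloc[i][j] > 0 else 0 for j in range(k_s)] for i in range(k_t)]
    return {
        "total_supply": sum(map(sum, alloc)) <= sum(available),
        "per_service_supply": all(
            sum(alloc[i][j] for i in range(k_t)) <= available[j] for j in range(k_s)
        ),
        "per_task_demand": all(sum(alloc[i]) <= demand[i] for i in range(k_t)),
        "service_links": all(
            sum(link[i][j] for i in range(k_t)) <= n1 for j in range(k_s)
        ),
        "task_links": all(sum(link[i]) <= n2 for i in range(k_t)),
        "rate": all(r > r_baseline for r in rates),
    }


if __name__ == "__main__":
    result = check_constraints(
        alloc=[[2, 0], [1, 3]], available=[3, 4], demand=[2, 4],
        rates=[5.0, 6.0], n1=2, n2=2, r_baseline=1.0,
    )
    print(result)
```

A solver or learning agent could use such a check to reject infeasible actions or to penalize them in the reward.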

[0078] In V2X scenarios with high vehicle density, the supply of computing resources often exceeds the resource requirements of V2X applications, while the high-speed, dynamic nature of V2X means that computational task offloading faces a risk of failure. To use computing resources more fully and improve reliability under these conditions, a multiplexing-based computational task offloading mechanism is introduced into the model. Specifically, because the risk of offloading failure during high-speed vehicle movement reduces the reliability of computational task offloading, the model first allocates the necessary computing resources to the task vehicles, and then redistributes the unused computing resources to these task vehicles. Accordingly, a computational task is offloaded from the source node to multiple destination nodes for computation. As long as the computational task is successfully offloaded at one of the destination nodes, the sending end can obtain the expected computation result, which significantly reduces the probability of offloading failure. In other words, the reliability of computational task offloading is improved by offloading the same task multiple times onto idle resources. Furthermore, to reflect the risk of computational task offloading failure in V2X, this disclosure introduces and defines a new reliability metric, risk resistance, which reflects the ability to successfully offload computing tasks while vehicle network resources change dynamically.

[0079] Specifically, step S204 above, the multi-path calculation offloading reliability model based on risk factors, can be implemented in the following way:

[0080]

[0081]

[0082] Where the reliability index represents the risk resistance capability, ε represents the risk factor, ξ represents the multiplexing factor, $K_s$ represents the number of service vehicles, and $K_t$ represents the number of task vehicles; the remaining two quantities are the available resources of service vehicle j and the resource requirements of task vehicle i. The risk factor ε characterizes the average failure probability of offloading each computing task in the vehicle-to-everything (V2X) network, and the multiplexing factor ξ represents the number of times the same computing task is offloaded.
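To see why offloading the same task several times raises reliability, the sketch below treats ε as the per-offload failure probability (as in the method summary) and assumes the ξ offloads fail independently, so that at least one succeeds with probability $1 - \varepsilon^{\xi}$. The independence assumption, the function names, and the numbers are illustrative and are not the patent's reliability formula, which is not reproduced in this text.

```python
def multipath_success_probability(epsilon, xi):
    """Probability that at least one of xi independent offloads succeeds,
    given a per-offload failure probability epsilon (independence is assumed)."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be a probability")
    if xi < 1:
        raise ValueError("xi must be at least 1")
    return 1.0 - epsilon ** xi


def smallest_multiplexing_factor(epsilon, target, max_xi=16):
    """Smallest xi whose success probability reaches target, or None if none does."""
    for xi in range(1, max_xi + 1):
        if multipath_success_probability(epsilon, xi) >= target:
            return xi
    return None


if __name__ == "__main__":
    for xi in (1, 2, 3):
        print(xi, round(multipath_success_probability(0.2, xi), 4))
    print(smallest_multiplexing_factor(0.2, target=0.99))
```

The diminishing gain per extra copy is the resource-waste effect described in the background: beyond some ξ, additional resources barely change the success rate.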

[0083] In some embodiments, in step S205 above, the computing task offloading problem in cloud-based vehicle networks can be modeled as a partially observable Markov game, represented by the tuple (S, A, R, T, γ), where S represents the state space, A represents the action space, R represents the reward function, T represents the state transition probability, and γ represents the discount coefficient. The definitions of the state space S, action space A, and reward function R are given below.

[0084] State space S: For each task vehicle i (that is, each agent), the state space typically includes the factors relevant to the problem being solved. Here, the state space can be defined as..., where $H_i$ denotes the channel gain, and the remaining terms denote the resource requirements of task vehicle i, the set Ω of available resources of the service vehicles j, and the set of service vehicles to which j belongs.

[0085] Action space A: The action space $A_i$ of task vehicle i is defined in the following formula, where $A_i$ comprises a set of sub-action spaces (1≤m≤M). M is the dimension of the action space, defined in the following formula; it is usually determined by the computing resource requirements of task vehicle i itself and by the number $K_s$ of service vehicles j. The action set is given in the following formula. Compared with the traditional multi-task-vehicle DRL model, in which all task vehicles share the same action dimension, the heterogeneity of computing resources in this model (that is, different task vehicles may have different resource requirements) results in different action dimensions for different task vehicles.

[0086]

[0087]

[0088]

[0089] In addition, the overall size of the action space $A_i$ and of each action subset is given by the following formula, where the combinatorial term represents the total number of possible combinations of selecting m vehicles from all service vehicles.
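The combinatorial size of the action space can be illustrated with a short Python sketch. It assumes, as one plausible reading of the text, that the m-th sub-action space contains every way of choosing m service vehicles out of $K_s$, and that the whole action space is the union of the sub-action spaces for m = 1..M. The function names are hypothetical.

```python
from itertools import combinations
from math import comb


def sub_action_space(service_vehicles, m):
    """Every way of selecting m service vehicles for one task vehicle."""
    return list(combinations(service_vehicles, m))


def action_space_size(num_service_vehicles, max_dim):
    """Number of actions when the space is the union of the sub-action
    spaces for m = 1..max_dim (an assumption made for this sketch)."""
    return sum(comb(num_service_vehicles, m) for m in range(1, max_dim + 1))


# Six service vehicles, a task vehicle that may use up to two of them.
vehicles = list(range(6))
print(len(sub_action_space(vehicles, 2)), action_space_size(6, 2))
```

Because M depends on each task vehicle's own resource requirements, two task vehicles facing the same service vehicles can end up with action spaces of different sizes.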

[0090]

[0091] Reward function R: As shown in the following equation, the overall reward value of all task vehicles is maximized as the overall effect value U, which is the optimization objective of the computation offloading objective function defined in step S202. Accordingly, the reward value is defined as the overall reward value of all task vehicles i and consists of two parts: the communication-process reward $R_{comm}$ and the computation-process reward $R_{comp}$. $R_{comm}$ mainly considers the signal-to-interference-plus-noise ratio (SINR) of the communication link between task vehicle i and service vehicle j, while $R_{comp}$ mainly considers the number of service vehicles assigned to task vehicle i and the number of task vehicles assigned to service vehicle j.

[0092] $R = R_{comm} + R_{comp}$;

[0093]

[0094] where the first term is the signal-to-interference-plus-noise ratio (SINR) of the communication link between task vehicle i and service vehicle j, and the other two terms represent the number of service vehicles assigned to task vehicle i and the number of task vehicles assigned to service vehicle j, respectively. Since the complexity of power multiplexing in a NOMA-based communication link increases dramatically with the number of users, each task vehicle benefits more when fewer service vehicles are assigned to it.

[0095] Based on the computing task offloading optimization objective determined by the intelligent optimization objective mechanism, an appropriate deep reinforcement learning model is selected to carry out computing task offloading. For each computing task offloading optimization objective, a multi-agent deep reinforcement learning (DRL) algorithm based on deep deterministic policy gradient (MADDPG) is introduced to solve it. As a variant of the Actor-Critic algorithm, the DDPG network consists of four parts: an Actor network, an Actor-Target network, a Critic network, and a Critic-Target network. The Critic network is a neural network that takes the environment state and the selected action as input and outputs the state-action value. The Actor network is a parameterized policy network that takes the environment state as input and outputs the selected action.

[0096] In step S206 of this embodiment, while the Markov game model is solved using the multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradients, the parameters of the critic network are updated based on the temporal-difference loss function $Loss(\theta_i)$ in the following formula, in order to reduce the gap between the value predicted by the critic network and the actual value.

[0097]

[0098] where the first term represents the state-action value obtained by the critic network, and $y_i$ represents the target state-action value obtained by the critic-target network contained in the multi-agent deep reinforcement learning algorithm.
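The temporal-difference update of the critic can be sketched in a few lines of Python. The sketch uses the textbook DDPG form, in which the target is the reward plus the discounted value from the critic-target network and the loss is the mean squared difference between target and prediction. This form, the `done` flag, and all function names are assumptions for illustration, not the formula of this disclosure.

```python
def td_target(reward, gamma, next_q_target, done=False):
    """Target value y: reward plus the discounted critic-target estimate.

    `done` (hypothetical) marks a terminal transition with no future value.
    """
    return reward + (0.0 if done else gamma * next_q_target)


def critic_loss(q_values, targets):
    """Mean squared temporal-difference error over a mini-batch."""
    if not q_values or len(q_values) != len(targets):
        raise ValueError("q_values and targets must be non-empty and equal in length")
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)


# Two transitions: critic predictions versus their target values.
targets = [td_target(1.0, 0.9, 2.0), td_target(0.5, 0.9, 0.0, done=True)]
print(critic_loss([2.5, 0.5], targets))
```

Minimizing this loss with respect to the critic parameters pulls the predicted state-action values toward the targets, which is the gap-reduction goal stated above.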

[0099] For the actor network, the goal is to select the optimal action with the maximum reward. Without loss of generality, as shown in the following equation, the parameters of the actor network are updated along the negative policy gradient direction L(w) of the loss function. Because the DDPG algorithm uses a deterministic policy, the policy gradient is obtained directly from the negative gradient of the Q-value in the following equation.

[0100]

[0101] where E represents the expected value over long-term returns, s represents the state space, a represents the action space, θ represents the value network parameters, μ represents the policy network, and Q represents the current value.
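A toy Python example may help show how a deterministic policy is updated through the critic's gradient. Everything in it is hypothetical: the critic is a fixed analytic function Q(s, a) = -(a - 2s)^2 rather than a neural network, and the actor is the one-parameter linear policy μ(s) = w·s. Only the chain-rule structure of the update mirrors the description above.

```python
def dq_da(state, action):
    """Gradient with respect to a of the hypothetical critic Q(s, a) = -(a - 2s)^2."""
    return -2.0 * (action - 2.0 * state)


def actor_step(w, states, learning_rate):
    """One update of the linear actor mu(s) = w * s."""
    # Chain rule: dQ/dw = dQ/da * d(mu)/dw, and d(mu)/dw = s for this actor.
    grad = sum(dq_da(s, w * s) * s for s in states) / len(states)
    # Ascending Q is the same as descending the loss L(w) = -E[Q].
    return w + learning_rate * grad


def train_actor(w, states, learning_rate, steps):
    """Repeat the actor update for a fixed number of steps."""
    for _ in range(steps):
        w = actor_step(w, states, learning_rate)
    return w


# The best action for this critic is a = 2s, so w should approach 2.
print(train_actor(0.0, [0.5, 1.0, 1.5], 0.1, 200))
```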

[0102] To avoid convergence to a local optimum when step S206 is performed and the Markov game model is solved using the multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradients, the following can be done, as shown in Figure 3: after step S2061 (providing a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradients) and before step S2063 (centralized training and distributed execution of the multi-agent deep reinforcement learning algorithm), step S2062 can be performed to optimize the multi-agent deep reinforcement learning algorithm based on the Ornstein-Uhlenbeck process. Specifically, noise $N_e$ based on the Ornstein-Uhlenbeck process can be added to the output of the actor network to increase the model's exploration capability.

[0103] $N_e = \kappa(\mu - X(t))\,dt + \sigma_n Y(t)$;

[0104] where κ represents mean reversion, μ represents the mean, X(t) represents a random variable with mean μ, dt represents the variance, $\sigma_n$ represents the diffusion weight, and Y(t) represents a random variable that follows a Gaussian distribution with mean 0 and variance dt.
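The noise term can be simulated with a simple discrete-time Python sketch. It treats $N_e$ as the increment of the process and accumulates it into a running state X, which is the usual way Ornstein-Uhlenbeck exploration noise is applied to actor outputs; that accumulation, the class name, and the default parameter values are assumptions for illustration only.

```python
import math
import random


class OUNoise:
    """Discrete-time Ornstein-Uhlenbeck noise generator (illustrative)."""

    def __init__(self, kappa=0.15, mu=0.0, sigma=0.2, dt=0.01, x0=None, seed=None):
        self.kappa = kappa      # mean-reversion strength
        self.mu = mu            # long-run mean
        self.sigma = sigma      # diffusion weight (sigma_n)
        self.dt = dt            # time step, also the variance of Y(t)
        self.x = mu if x0 is None else x0
        self.rng = random.Random(seed)

    def sample(self):
        # Y(t) is Gaussian with mean 0 and variance dt (std = sqrt(dt)).
        y = self.rng.gauss(0.0, math.sqrt(self.dt))
        # Increment: kappa * (mu - X) * dt + sigma * Y.
        self.x += self.kappa * (self.mu - self.x) * self.dt + self.sigma * y
        return self.x


# Perturb a (hypothetical) actor output with temporally correlated noise.
noise = OUNoise(seed=0)
actor_output = 0.5
print([actor_output + noise.sample() for _ in range(3)])
```

Unlike independent Gaussian noise, successive samples are correlated and drift back toward μ, so exploration moves smoothly around the actor's output.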

[0105] To meet the computing resource requirements of each task vehicle i, the model can select actions in descending order of the probability distribution output by the actor network. The tuple $\{S_e, A_e, R_e, S_{e+1}\}$ is then stored in the experience replay pool D.
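A minimal Python sketch of these two steps follows: picking the most probable actions from the actor's output and storing transitions in a bounded replay pool. The fixed capacity, the uniform sampling, and all names are assumptions made for illustration.

```python
import random
from collections import deque


def top_actions(probabilities, count):
    """Indices of the `count` most probable actions, in descending order."""
    order = sorted(range(len(probabilities)),
                   key=lambda k: probabilities[k], reverse=True)
    return order[:count]


class ReplayPool:
    """Bounded experience replay pool D; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement (an assumption of this sketch).
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


pool = ReplayPool(capacity=1000)
chosen = top_actions([0.1, 0.5, 0.4], count=2)
pool.store("s_e", chosen, 1.0, "s_e+1")
print(chosen, len(pool))
```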

[0106] Specifically, in step S2063 above (centralized training and distributed execution of the multi-agent deep reinforcement learning algorithm), the centralized training phase is completed in a virtual computing center within the vehicle network, which essentially consists of a cluster of service vehicles traveling on the road. Because the transmission distance between service vehicle i and its neighboring task vehicles j is short and their relative speed is low, a relatively stable communication environment can be established. After collecting training samples, the service vehicles interact with other vehicles using idle channels in a roulette-like manner. This vehicle cluster can be regarded, to some extent, as a sample collector that provides the training set required for centralized training. When the training phase is completed, the parameter update process for the critic network and the actor network ends. Each task vehicle then downloads its policy network from nearby service vehicles, which achieves distributed execution and completes the offloading of computing tasks. The target network parameters $\theta^{Q'}$ and $\theta^{\mu'}$ can be updated using the following formula.

[0107]

[0108] where τ is the weight of the update step, which is usually defined as a positive number much less than 1, θ represents the value network parameters, μ represents the policy network, μ' represents the target policy network parameters, Q' represents the target value, and Q represents the current value.
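The role of τ can be illustrated with the soft (Polyak) update commonly used in DDPG, in which each target parameter moves a fraction τ of the way toward the corresponding online parameter. This specific form and the function name are assumptions for illustration, with parameters represented as plain lists of floats.

```python
def soft_update(target_params, online_params, tau):
    """Return new target parameters: tau * online + (1 - tau) * target."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# With a small tau the target network changes slowly, which stabilizes training.
target = [0.0, 1.0]
online = [1.0, 1.0]
print(soft_update(target, online, tau=0.01))
```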

[0109] This disclosure also verifies the convergence of the proposed method in a vehicle-to-everything (V2X) scenario with 3 task vehicles and 6 service vehicles, as shown in Figure 4. Figure 4 shows that, compared with the Q-learning-based multi-agent reinforcement learning algorithms in the related art, the MADDPG-based multi-agent deep reinforcement learning algorithm proposed in this disclosure has a significant advantage in convergence performance. Specifically, thanks to the centralized training mechanism, the MADDPG-based algorithm converges stably. By contrast, Q-learning is a table-based reinforcement learning algorithm whose state and action spaces tend to grow explosively as the number of agents increases. Q-learning-based multi-agent RL algorithms mainly use an ε-greedy exploration method for action selection, in which the agent selects the action with the maximum value with probability 1-ε or a random action with probability ε. During this exploration process, some states and actions are never visited, which results in poor performance. In summary, by using the centralized training and distributed execution architecture, the MADDPG-based multi-agent deep reinforcement learning algorithm proposed in this disclosure can guarantee the convergence of the deep reinforcement learning model.
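For reference, the ε-greedy rule used by the Q-learning baseline can be written as a short Python sketch. The function name and the list-based Q-table row are hypothetical; note that ε here is the exploration probability, not the risk coefficient used elsewhere in this disclosure.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# One row of a (hypothetical) Q-table for a single state.
q_row = [0.2, 1.5, 0.7]
print(epsilon_greedy(q_row, epsilon=0.1))
```

With a small ε most selections are greedy, so rarely chosen state-action pairs may receive few or no updates, which is the weakness described above.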

[0110] To verify the overall reward performance of the proposed method, a vehicle-to-everything (V2X) model with a total of 12 vehicles was constructed, in which the number of task vehicles increased from 3 to 9 and the number of service vehicles decreased from 9 to 3. As shown in Figure 4, compared with the Q-learning algorithm, the proposed MADDPG-based multi-agent deep reinforcement learning algorithm shows significant performance advantages. Its superiority primarily stems from the additional information that the critic network introduces through reward sharing during the centralized training phase. Furthermore, as the number of task vehicles increases, the performance gap between the proposed MADDPG-based algorithm and the Q-learning algorithm widens, which demonstrates the effectiveness of the proposed algorithm in competitive environments. In vehicular network scenarios with fewer than six task vehicles, resources are in a state of oversupply. As the number of task vehicles increases, the ample computing resources supplied by the service vehicles are used to offload computing tasks and meet the growing computing resource demands of the task vehicles, so the overall reward gradually increases. Notably, when there are six task vehicles in the system, the overall reward reaches its maximum, corresponding to a state of supply-demand equilibrium in which all computing resources in the system can be fully utilized. Then, as the number of service vehicles continues to decrease while the number of task vehicles continues to increase, the resource state changes from oversupply to undersupply.

[0111] Figure 5 verifies the performance of the deep reinforcement learning algorithm based on the multiplexing offloading mechanism. The baseline corresponds to the success rate of single-path offloading of computing tasks when the risk coefficient ε = 0.8. By contrast, in the schemes with ε = 0.7, ε = 0.8, and ε = 0.9, computing resources are repeatedly allocated to other task vehicles until all resources are allocated. As can be seen from Figure 5, as the number of service vehicles increases, the advantage of the multiplexing offloading mechanism over the baseline scheme in terms of offloading success rate gradually widens. The risk coefficient ε represents the average success probability of offloading each computing task in the vehicle network. As ε increases, the room for the multiplexing offloading mechanism to improve the offloading success rate gradually decreases, because in a vehicle network environment with a high risk coefficient the success rate of offloading tasks is already higher, which weakens the advantage of the multiplexing offloading mechanism. In summary, the proposed scheme can improve the reliability of computing task offloading by about 20%. Based on the above analysis, the proposed method can significantly improve the reliability of computing task offloading in the vehicle network.

[0112] Based on the same inventive concept, this disclosure provides a vehicle cloud multi-path computing offloading device. Since the principle of this vehicle cloud multi-path computing offloading device in solving the problem is similar to that of the above-mentioned vehicle cloud multi-path computing offloading method in solving the problem, the implementation of the vehicle cloud multi-path computing offloading device provided in this disclosure can refer to the implementation of the above-mentioned vehicle cloud multi-path computing offloading method provided in this disclosure, and the repeated parts will not be described again.

[0113] Specifically, as shown in Figure 6, the vehicle cloud multi-path computing offloading device provided in the embodiments of this disclosure includes: a computation offloading objective function module 601, which is configured to define a communication model of a vehicle cloud, the vehicle cloud including service vehicles with idle computing resources and task vehicles with insufficient computing resources, the communication model defining the relationship between the overall effect value of the task vehicles and the computing task offloading variables, and to construct a computation offloading objective function with the optimization objective of maximizing the overall effect value in combination with multiple constraints; and a reliability model construction module 602, which is configured to determine whether the idle computing resources of the service vehicles are greater than the computing resources required by the task vehicles and, if so, to introduce a multi-path computation offloading reliability model based on risk factors, where the optimization objective of the multi-path computation offloading reliability model represents the maximized overall effect value, and the risk factor represents the average failure probability of each computing task offloading of the service vehicles.

[0114] The reinforcement learning module 603 is configured to construct a Markov game model for the multi-path computation offloading reliability model, and to use a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient to solve the Markov game model to obtain the optimal solution for computing task offloading.

[0115] Optionally, as shown in Figure 6, to avoid convergence to a local optimum, the vehicle cloud multi-path computing offloading device provided in the embodiments of this disclosure may also include: an optimization module 604, which is configured to optimize the multi-agent deep reinforcement learning algorithm based on the Ornstein-Uhlenbeck process.

[0116] Those skilled in the art will understand that embodiments of this disclosure can be provided as methods, systems, or computer program products. Therefore, this disclosure can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this disclosure can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0117] This disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to this disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0118] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0119] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0120] Obviously, those skilled in the art can make various modifications and variations to this disclosure without departing from its spirit and scope. Therefore, if such modifications and variations fall within the scope of the claims of this disclosure and their equivalents, this disclosure is also intended to include such modifications and variations.

Claims

1. A vehicle cloud multi-path computation offloading method, characterized in that it comprises: defining a communication model of a vehicle cloud, the vehicle cloud comprising service vehicles with idle computing resources and task vehicles with insufficient computing resources, wherein the communication model defines the relationship between the overall effect value of the task vehicles and the computing task offloading variable, and the overall effect value includes the sum of the benefits of the task vehicles in the computing phase and the benefits in the communication phase; constructing a computation offloading objective function with the optimization objective of maximizing the overall effect value in combination with multiple constraints, wherein, in the objective function: the objective term represents the maximized overall effect value; the decision variable is the computing task offloading variable of the task vehicle; the computing resources allocated to all task vehicles cannot exceed the available resources in the model; the computing resources allocated by a service vehicle cannot exceed the computing resources it can provide, with an indicator denoting whether a task vehicle offloads its computing task to a service vehicle; the computing resources allocated to a task vehicle cannot exceed the computing resources it requires; the numbers of offloading connections supported by a service vehicle and by a task vehicle cannot exceed their respective maximum limits; and the transmission rate should be higher than the specified reference transmission rate; and, if the idle computing resources of the service vehicle are greater than the computing resources required by the task vehicle, introducing a multi-path computation offloading reliability model based on risk factors, wherein the multi-path computation offloading reliability model aims to maximize the risk resistance of the reliability index. 
The risk resistance of the reliability index is calculated based on the risk factor and the multi-path multiplexing factor; the optimization objective of the multi-path computation offloading reliability model represents the maximized overall effect value; the risk factor represents the average failure probability of offloading each computing task of the service vehicle; and the multi-path multiplexing factor represents the number of times the same computing task is offloaded. For the multi-path computation offloading reliability model, a Markov game model is constructed, and a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradients is used to solve the Markov game model to obtain the optimal solution for computing task offloading, specifically including: updating the parameters of the critic network of the multi-agent deep reinforcement learning algorithm based on a temporal-difference loss function; and updating the parameters of the actor network of the multi-agent deep reinforcement learning algorithm along the negative policy gradient direction of the loss function; where, in the update formulas, one term represents the state-action value obtained by the critic network and another represents the target state-action value obtained by the critic-target network included in the multi-agent deep reinforcement learning algorithm; E represents the expected value over long-term returns; L(w) represents the negative policy gradient; w represents the parameters of the loss function; s represents the state space; a represents the action space; R represents the reward function; θ represents the value network parameters; μ represents the policy network; μ' represents the target policy network parameters; Q represents the current value; and γ represents the discount factor.

2. The vehicle cloud multi-path computation offloading method as described in claim 1, characterized in that, while the Markov game model is solved using the multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradients, the method further includes optimizing the multi-agent deep reinforcement learning algorithm based on the Ornstein-Uhlenbeck process.

3. The vehicle cloud multi-path computation offloading method as described in claim 2, characterized in that optimizing the multi-agent deep reinforcement learning algorithm based on the Ornstein-Uhlenbeck process specifically includes: adding noise $N_e$ based on the Ornstein-Uhlenbeck process to the output of the actor network contained in the multi-agent deep reinforcement learning algorithm, where the noise $N_e$ is given by: $N_e = \kappa(\mu - X(t))\,dt + \sigma_n Y(t)$; where κ represents mean reversion, μ represents the mean, X(t) represents a random variable with mean μ, dt represents the variance, $\sigma_n$ represents the diffusion weight, and Y(t) represents a random variable that follows a Gaussian distribution with mean 0 and variance dt.

4. The vehicle cloud multi-path computation offloading method as described in any one of claims 1 to 3, characterized in that the multi-path computation offloading reliability model based on risk factors is specifically given by a formula in which: the reliability index represents the risk resistance; ε represents the risk factor; ξ represents the multiplexing factor; $K_s$ represents the number of service vehicles; $K_t$ represents the number of task vehicles; and the remaining two terms represent the idle resources of service vehicle j and the resource requirements of task vehicle i, respectively.

5. The vehicle cloud multi-path computation offloading method as described in any one of claims 1 to 3, characterized in that constructing a Markov game model specifically includes: characterizing the Markov game model by the tuple (S, A, R, T, γ), where: S represents the state space, in which $H_i$ is the channel gain and the remaining terms are the resource requirements of task vehicle i, the set Ω of available resources of the service vehicles, the set of service vehicles, and the number $K_s$ of service vehicles; A represents the action space, in which $A_i$ represents the action space of task vehicle i and comprises the sub-action spaces of task vehicle i, with M representing the dimension of the action space; R represents the reward function, in which $R_{comm}$ represents the reward function of the communication process, $R_{comp}$ represents the reward function of the computation process, $K_t$ represents the number of task vehicles, and the remaining terms are the signal-to-interference-plus-noise ratio of the communication link between task vehicle i and service vehicle j, the number of service vehicles allocated to task vehicle i, and the number of task vehicles allocated to service vehicle j; T represents the state transition probability; and γ represents the discount factor.

6. The vehicle cloud multi-path computation offloading method as described in any one of claims 1 to 3, characterized in that, when it is determined that the available computing resources of the service vehicle are less than the computing resources required by the task vehicle, a value-first mechanism is used to offload the computing task.

7. A vehicle cloud multi-path computing offloading device, characterized in that it comprises: a computation offloading objective function module, configured to define a communication model of a vehicle cloud, the vehicle cloud comprising service vehicles with idle computing resources and task vehicles with insufficient computing resources, wherein the communication model defines the relationship between the overall effect value of the task vehicles and the computing task offloading variable, and the overall effect value includes the sum of the task vehicles' gains in the computation phase and the gains in the communication phase, and to construct a computation offloading objective function with the goal of maximizing the overall effect value in combination with multiple constraints; a reliability model construction module, configured to determine whether the idle computing resources of the service vehicle are greater than the computing resources required by the task vehicle and, if so, to introduce a multi-path computation offloading reliability model based on risk factors, wherein the multi-path computation offloading reliability model aims to maximize the risk resistance of the reliability index, the risk resistance of the reliability index is calculated based on the risk factor and the multi-path multiplexing factor, the optimization objective of the multi-path computation offloading reliability model represents the maximized overall effect value, the risk factor represents the average failure probability of offloading each computing task of the service vehicle, and the multi-path multiplexing factor represents the number of times the same computing task is offloaded; and 
a reinforcement learning module, configured to construct a Markov game model for the multi-path computation offloading reliability model, and to use a multi-agent deep reinforcement learning algorithm based on deep deterministic policy gradient to solve the Markov game model to obtain the optimal solution for computing task offloading.

8. The vehicle cloud multi-path computing offloading device as described in claim 7, characterized in that it further comprises: an optimization module, configured to optimize the multi-agent deep reinforcement learning algorithm based on the Ornstein-Uhlenbeck process.