
Multi-agent reinforcement learning method for data offloading in the Internet of Things

A multi-agent reinforcement learning technique, applied to reducing energy consumption, advanced technology, and electrical components, that addresses problems such as slow learning speed and improves the learning rate and performance.

Pending Publication Date: 2022-03-22
SUN YAT SEN UNIV SHENZHEN +1

AI Technical Summary

Problems solved by technology

In recent years, the intelligent reflective surface (IRS) has been regarded as a promising technology because it can improve the quality and spectral efficiency of wireless communication. Using machine learning for IRS regulation is highly robust, but the offline training of a deep neural network (DNN) relies on exhaustive search and alternating optimization (AO) methods. Although the optimal policy can be learned from scratch, learning is usually slow.



Examples


Specific Embodiment 1

[0036] To handle continuous-variable control in the agents' dynamic environment, the multi-agent deep deterministic policy gradient (MADDPG) method is used. Assume there are $n$ agents, with corresponding actions $a_1, \dots, a_n$ and observations $o_1, \dots, o_n$. The state transition equation involves all states, actions, and observations. Each agent's policy maps only its own observation to its own action: $\mu_i : o_i \to a_i$. Each agent uses a deep critic network to approximate the Q function; that is, the critic network of the $i$-th agent learns that agent's action-value function. The parameters of the critic network are updated every decision epoch, so that it outputs a progressively better approximation of the Q value. This is achieved by training a deep neural network (DNN) to minimize the loss function:

[0037]

[0038] where:

[0039]

[0040] represents the output of the target-critic netwo...
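As a rough illustration of the critic update described in paragraph [0036], the sketch below uses the standard MADDPG form: a target $y = r + \gamma Q'(\cdot)$ produced by the target-critic network, and a mean squared error between the critic output and $y$. This form is an assumption here, since the patent's own equations are not reproduced in this text. The function names, the `done` flag, and all numbers are hypothetical.

```python
import numpy as np

def td_target(reward, gamma, target_q_next, done):
    """Target y = r + gamma * Q'(next state, next actions); no bootstrap at terminal steps."""
    return reward + gamma * (1.0 - done) * target_q_next

def critic_loss(q_values, targets):
    """Mean squared error between the critic output and the target y."""
    return float(np.mean((q_values - targets) ** 2))

# Toy mini-batch of three transitions for one agent (all numbers are made up).
reward = np.array([1.0, 0.5, 0.0])
done = np.array([0.0, 0.0, 1.0])
target_q_next = np.array([2.0, 4.0, 10.0])  # stand-in for the target-critic output
q_values = np.array([2.5, 4.0, 0.5])        # stand-in for the critic output

y = td_target(reward, gamma=0.9, target_q_next=target_q_next, done=done)
loss = critic_loss(q_values, y)
print(y, loss)
```

In a full implementation the two stand-in arrays would come from neural networks that take the joint observations and actions of all agents as input, and the loss would be minimized by gradient descent on the critic parameters.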

Specific Embodiment 2

[0048] To address the learning complexity caused by highly coupled variables and the problem of target-value estimation, an optimization-driven deep deterministic policy gradient (DDPG) method is designed:

[0049] Specifically, the action $a_t = (\theta_i, \omega_i, \rho_i, \tau_i, k_i)$ is split into the global action $a_{c,t} = \theta_i$ and the local action $a_{o,t} = (\omega_i, \rho_i, \tau_i, k_i)$;

[0050] A deep deterministic policy gradient (DDPG) module and an optimization module are defined. The DDPG module generates the global action $a_{c,t}$, and the optimization module generates the local action $a_{o,t}$.

[0051] At the beginning of each iteration, the DDPG module outputs the global action $a_{c,t} = \theta_i$ and feeds it into the optimization module;

[0052] The optimization module first fixes the phase $\theta_i$ and obtains the active beamforming strategy $\omega_i$ by solving an equivalent convex problem. It then fixes these parameters, performs inner-layer iterations, and alternately solves the reflection coefficient $\rho_i$ and time slot division ratio τ ...
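The division of labor between the two modules can be sketched as follows. This is a minimal toy, not the patent's optimization problem: the quadratic objective, the scalar stand-ins for each variable, the placeholder value for `k`, and the names `LocalAction`, `toy_optimization_module`, and `hierarchical_action` are all assumptions made for illustration. It only shows the control flow: a global action from the policy, a first solve with the phase fixed, then alternating updates of two remaining variables.

```python
from typing import Callable, NamedTuple, Tuple

class LocalAction(NamedTuple):
    omega: float  # active beamforming (scalar stand-in)
    rho: float    # reflection coefficient
    tau: float    # time slot division ratio
    k: float      # remaining local variable, left as a placeholder

def toy_optimization_module(theta: float, iters: int = 50) -> LocalAction:
    """Hypothetical stand-in for the optimization module.

    With theta fixed, omega is obtained first; rho and tau are then updated
    alternately to minimize the toy objective
    (rho - omega)^2 + (tau - rho)^2 + (tau - 0.5)^2.
    """
    omega = theta / 2.0  # placeholder for the convex beamforming solve
    rho, tau = 0.0, 0.0
    for _ in range(iters):
        rho = (omega + tau) / 2.0  # exact minimizer over rho with tau fixed
        tau = (rho + 0.5) / 2.0    # exact minimizer over tau with rho fixed
    return LocalAction(omega=omega, rho=rho, tau=tau, k=0.0)

def hierarchical_action(
    obs: float,
    global_policy: Callable[[float], float],
    optimizer: Callable[[float], LocalAction],
) -> Tuple[float, LocalAction]:
    """Compose the full action from a global part and a local part."""
    theta = global_policy(obs)  # global action a_{c,t}
    local = optimizer(theta)    # local action a_{o,t}
    return theta, local

theta, local = hierarchical_action(
    obs=1.0,
    global_policy=lambda o: o,  # placeholder for the DDPG actor network
    optimizer=toy_optimization_module,
)
print(theta, local)
```

Because the learned policy only has to output the global action, the action space seen by the DDPG module is much smaller than the full action tuple.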

Specific Embodiment 3

[0060] For multi-agent information fusion in the MADDPG framework, each user generates an estimate of the target value $y$ with its target-critic network and estimates the policies of the other users $j \neq i$ with approximate policy networks, thereby fusing information across agents. In the early stage of learning, because the critic network and the approximate policy networks are randomly initialized, both the estimate of $y$ and the policy estimates are far from their optimal values. This problem can be tackled with an optimization-driven hierarchical reinforcement learning approach: a lower bound on the target value $y$ and approximate strategies for the other agents are estimated by solving approximate optimization problems. Specifically, system participants are divided into a high-level controller and low-level multi-user agents. The controller agent has a DDPG module and an optimization module. Through the optimization module in Embodiment 2, the action est...
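One plausible way to use such a lower bound is sketched below. The text is cut off before it describes how the bound enters the update, so taking the element-wise maximum of the learned target and the optimization-based bound is purely an assumption, and the name `guided_target` and the numbers are hypothetical.

```python
import numpy as np

def guided_target(critic_target, lower_bound):
    """Element-wise maximum of the learned target and the lower bound."""
    return np.maximum(critic_target, lower_bound)

# Made-up values: early in training the critic's targets can be poor,
# so the bound replaces any target that falls below it.
y_critic = np.array([-3.0, 0.2, 1.5])  # targets from the target-critic network
y_lower = np.array([-1.0, 0.0, 2.0])   # bounds from the optimization module
y = guided_target(y_critic, y_lower)
print(y)
```

Under this assumption the bound matters most while the networks are still close to their random initialization, and it stops having any effect once the learned targets exceed it.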



Abstract

The invention discloses a multi-agent reinforcement learning method for data offloading in the Internet of Things. The method comprises the following steps: jointly optimizing active and passive beamforming and the users' resource allocation decisions in a multi-terminal Internet of Things scenario, and formulating a power minimization problem; and constructing a Markov decision process and solving the power minimization problem with multi-agent reinforcement learning. By solving the optimization problem in a layered manner, the method improves multi-agent deep reinforcement learning and markedly increases the learning rate and performance. The method can be widely applied in the field of wireless communication.

Description

Technical field

[0001] The invention relates to the field of wireless communication, and in particular to a multi-agent reinforcement learning method for data offloading in the Internet of Things.

Background

[0002] With the rapid development of wireless communication networks, the number of terminals connected to the network keeps growing, and IoT devices such as sensor nodes will become widespread. Ensuring the power supply of these ubiquitous devices is an urgent problem for the Internet of Things. A wireless powered communication network uses radio-frequency energy signals to deliver energy to passive terminals, which is an important way to address the energy limitations of IoT devices. In recent years, the intelligent reflective surface (IRS) has been regarded as a promising technology because it can improve the quality and spectral efficiency of wireless communication. The use of machine learning for IRS regulation has strong r...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04W72/04
CPC: H04W72/53, Y02D30/70
Inventors: 龚世民, 谭源正, 刘玥, 周航
Owner: SUN YAT SEN UNIV SHENZHEN