Multi-agent reinforcement learning method for data unloading of Internet of Things

A reinforcement learning and multi-agent technology, applied to reducing energy consumption, advanced technology, electrical components, etc. It addresses problems such as slow learning speed and improves learning rate and performance.

Pending Publication Date: 2022-03-22

SUN YAT SEN UNIV SHENZHEN +1


AI Technical Summary

Problems solved by technology

In recent years, the intelligent reflective surface (IRS) has been regarded as a promising technology because it can improve the quality and spectral efficiency of wireless communication. Using machine learning for IRS regulation is highly robust, but the offline training of a deep neural network (DNN) relies on exhaustive search and on alternating optimization (AO) methods. Although the optimal policy can be learned from scratch, the learning speed is usually slow.


Examples


Specific Embodiment 1

[0036] To handle continuous variable control in the agents' dynamic environment, the multi-agent deep deterministic policy gradient (MADDPG) method is used. Assuming there are $n$ agents, we design for each agent a corresponding action $a_1, \dots, a_n$ and a corresponding observation $o_1, \dots, o_n$. The state transition equation includes all states, actions and observations [equation missing in source]. Each agent's policy maps only its own observation to its own action: $\mu_i : o_i \to a_i$. Each agent uses a deep critic network to approximate the Q function, that is, the $i$-th agent's critic network learns the action-value function [equation missing in source]. The parameters of the critic network are updated every decision epoch, so that the network outputs a better approximation of the Q value. This can be achieved by training a deep neural network (DNN) to minimize the loss function:

[0037]

[0038] where:

[0039]

[0040] represents the output of the target-critic netwo...
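The loss equation itself did not survive in this copy of the text. As a minimal sketch, the block below assumes the standard MADDPG form: a centralized critic that sees every agent's observation and action, a target $y = r_i + \gamma Q'_i(x', a'_1, \dots, a'_n)$ computed by the target-critic network, and a mean squared error between the critic output and $y$. The linear critic, the discount factor, and all function names are hypothetical stand-ins for illustration, not the patent's networks.

```python
def linear_q(weights, joint_obs, joint_act):
    """Toy centralized critic: linear in all agents' observations and actions."""
    features = list(joint_obs) + list(joint_act)
    return sum(w * f for w, f in zip(weights, features))


def td_target(reward, gamma, target_weights, next_joint_obs, next_joint_act):
    """Assumed target: y = r_i + gamma * Q'_i(x', a'_1, ..., a'_n)."""
    return reward + gamma * linear_q(target_weights, next_joint_obs, next_joint_act)


def critic_loss(weights, target_weights, batch, gamma=0.95):
    """Mean squared TD error over a mini-batch of joint transitions.

    Each transition is (joint_obs, joint_act, reward, next_joint_obs, next_joint_act).
    """
    total = 0.0
    for obs, act, reward, next_obs, next_act in batch:
        y = td_target(reward, gamma, target_weights, next_obs, next_act)
        total += (linear_q(weights, obs, act) - y) ** 2
    return total / len(batch)


# Two agents, one scalar observation and one scalar action each.
batch = [([1.0, 2.0], [0.5, 0.5], 1.0, [1.0, 2.0], [0.5, 0.5])]
loss = critic_loss([0.0] * 4, [0.0] * 4, batch)
```

In a real implementation the critic would be a DNN and the loss would be minimized by gradient descent on the critic parameters, with the target-critic parameters tracking them slowly.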

Specific Embodiment 2

[0048] To address the learning complexity caused by highly coupled variables and the target value estimation problem, an optimization-driven deep deterministic policy gradient (DDPG) method is designed:

[0049] Specifically, the action $a_t = (\theta_i, \omega_i, \rho_i, \tau_i, k_i)$ is split into the global action $a_{c,t} = \theta_i$ and the local action $a_{o,t} = (\omega_i, \rho_i, \tau_i, k_i)$;

[0050] A deep deterministic policy gradient (DDPG) module and an optimization module are defined. The DDPG module generates the global action $a_{c,t}$, and the optimization module generates the local action $a_{o,t}$.

[0051] At the beginning of the iteration, the DDPG module outputs the global action $a_{c,t} = \theta_i$ and feeds it into the optimization module;

[0052] The optimization module first fixes the phase $\theta_i$ and obtains the active beamforming strategy $\omega_i$ by solving an equivalent convex problem. It then fixes these parameters, performs inner-layer iterations, and alternately solves for the reflection coefficient $\rho_i$ and the time slot division ratio $\tau$ ...
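The division of labor between the two modules can be illustrated with a small sketch. It splits the action as in [0049], then runs an optimization step shaped like [0052]: fix $\theta$, solve for $\omega$, then alternate between $\rho$ and $\tau$. Everything numerical here is an assumption: the variables are scalars, `toy_cost` is an invented biconvex objective with closed-form coordinate updates, and `k` is simply carried through because its role is not described in the visible text. It is not the patent's power minimization problem.

```python
from typing import NamedTuple


class LocalAction(NamedTuple):
    omega: float  # stand-in for the active beamforming strategy
    rho: float    # stand-in for the reflection coefficient
    tau: float    # stand-in for the time slot division ratio
    k: float      # remaining local variable; carried through unchanged here


def split_action(action):
    """Split a_t = (theta, omega, rho, tau, k) into global and local parts."""
    theta, omega, rho, tau, k = action
    return theta, LocalAction(omega, rho, tau, k)


def toy_cost(theta, local):
    """Hypothetical objective: convex in each variable when the others are fixed."""
    return ((local.omega - theta) ** 2
            + (local.rho * local.tau - 1.0) ** 2
            + 0.1 * (local.rho ** 2 + local.tau ** 2))


def clip01(x):
    return min(1.0, max(0.0, x))


def optimization_module(theta, local, inner_iters=20):
    """With theta fixed, solve omega first, then alternate over rho and tau."""
    omega = theta  # minimizer of the toy omega-subproblem for fixed theta
    rho, tau = local.rho, local.tau
    for _ in range(inner_iters):
        # Each update is the exact minimizer of toy_cost over one variable on [0, 1].
        rho = clip01(tau / (tau ** 2 + 0.1))
        tau = clip01(rho / (rho ** 2 + 0.1))
    return LocalAction(omega, rho, tau, local.k)


# theta plays the role of the DDPG module's output for this step.
theta, start = split_action((0.3, 0.0, 0.5, 0.5, 2.0))
improved = optimization_module(theta, start)
```

Because each inner update exactly minimizes the cost over one variable, the cost never increases across iterations, which is the property that makes alternating optimization usable as the inner loop.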

Specific Embodiment 3

[0060] For the problem of multi-agent information fusion: in the MADDPG framework, each user generates an estimate of the target value $y$ with its target-critic network and estimates the strategies of the other users $j \neq i$ with approximate policy networks, which completes the information fusion across agents. In the early stage of learning, because the critic network and the approximate policy networks are randomly initialized, the estimate of $y$ and the policy estimates are far from their optimal values. This problem can be tackled with an optimization-driven hierarchical reinforcement learning approach: a lower bound on the target value $y$ and approximate strategies for the other agents are estimated by solving approximate optimization problems. Specifically, system participants are divided into high-level controllers and low-level multi-user agents. The controller agent has a DDPG module and an optimization module. Through the optimization module in Embodiment 2, the action est...
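The text states that the optimization module supplies a lower bound on the target value $y$, but how that bound is combined with the target-critic estimate is cut off in this copy. The sketch below shows one plausible use, assumed here rather than taken from the source: clipping the critic's estimate from below, so that a poorly initialized critic cannot push the target under a value the optimization module already certifies. The function names and the sample numbers are hypothetical.

```python
def bounded_target(y_critic, y_lower_bound):
    """Assumed combination rule: never let the target fall below the bound."""
    return max(y_critic, y_lower_bound)


def bounded_targets(critic_estimates, lower_bounds):
    """Apply the rule element-wise to a mini-batch of target estimates."""
    return [bounded_target(y, lb) for y, lb in zip(critic_estimates, lower_bounds)]


# Early in training the critic estimates are unreliable; the bound takes over
# where they are too pessimistic and leaves the others untouched.
early_estimates = [-3.0, 0.2, 2.5]
bounds_from_optimization = [1.0, 1.0, 1.0]
targets = bounded_targets(early_estimates, bounds_from_optimization)
```

As the critic improves, its estimates exceed the bound more often and the rule has less and less effect.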


Abstract

The invention discloses a multi-agent reinforcement learning method for data offloading in the Internet of Things. The method comprises the steps of: jointly optimizing active and passive beamforming and the users' resource allocation decisions in a multi-terminal Internet of Things scenario, and formulating a power minimization problem; and constructing a Markov decision process and solving the power minimization problem based on multi-agent reinforcement learning. Because the optimization problem is solved hierarchically, multi-agent deep reinforcement learning is improved, and both the learning rate and the performance improve markedly. The method can be widely applied in the field of wireless communication.

Description

Technical field

[0001] The invention relates to the field of wireless communication, in particular to a multi-agent reinforcement learning method for data offloading of the Internet of Things.

Background technique

[0002] With the rapid development of wireless communication networks, the number of terminals connected to the network is increasing, and IoT devices represented by sensor nodes will exist widely. How to ensure the power supply of these ubiquitous devices is an urgent problem to be solved by the Internet of Things. The wireless power communication network uses radio frequency energy signals to transmit energy to passive terminals, which is an important way to solve the energy limitation problem of Internet of Things devices. In recent years, intelligent reflective surface (IRS) is considered to be a promising technology because it can improve the quality and spectral efficiency of wireless communication. The use of machine learning for IRS regulation has strong r...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): H04W72/04

CPC: H04W72/53, Y02D30/70

Inventor: 龚世民, 谭源正, 刘玥, 周航

Owner: SUN YAT SEN UNIV SHENZHEN
