Multi-agent reinforcement learning scheduling method and system, and electronic device

A multi-agent reinforcement learning technique, applied in the field of multi-agent systems, that addresses problems such as uncoordinated strategies, failure to achieve optimal resource allocation, and training difficulty.

Active Publication Date: 2019-06-28
SHENZHEN INST OF ADVANCED TECH
Cites: 4; Cited by: 66

AI Technical Summary

Problems solved by technology

In multi-agent reinforcement learning, applying traditional algorithms such as Q-learning or policy gradient (PG) methods in a distributed fashion still fails to achieve the expected results. At every step, each agent tries to learn to predict the actions of the other agents, but in a dynamic environment those agents are themselves constantly changing. The environment therefore becomes unstable (non-stationary) from each agent's point of view, learning is difficult, and optimal resource allocation cannot be achieved.
In addition, from the perspective of reinforcement learning methods, most current scheduling methods use either single-agent reinforcement learning or distributed reinforcement learning. If a single agent is trained centrally, the complex state changes and the combinatorial number of arrangements under the network topology produce a very large action space, which makes the algorithm difficult to train and difficult to converge.
Distributed reinforcement learning faces a different problem. It usually speeds up convergence by training multiple agents jointly, but the scheduling strategies of these agents are identical: the multiple copies exist only to accelerate training. The result is a set of homogeneous agents that have no ability to cooperate.
In traditional multi-agent methods, each agent predicts the decisions of the other agents at every decision step. Because those decisions are themselves unstable in a dynamic environment, training is very difficult, and the agents end up doing much the same thing, without a collaborative strategy.
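
The action-space argument above can be made concrete with a rough count. The sketch below is illustrative only: the way actions are counted (a centralized agent choosing a full placement of every virtual machine, versus a per-server agent either doing nothing or moving one local virtual machine to another server) is an assumption made for this example and is not taken from the patent text. The function names are hypothetical.

```python
def centralized_placements(num_servers: int, num_vms: int) -> int:
    """Assumed centralized action space: every VM may go to any server (N ** M)."""
    return num_servers ** num_vms


def per_agent_actions(num_servers: int, vms_on_server: int) -> int:
    """Assumed per-server action space: do nothing, or move one local VM
    to one of the other servers."""
    return 1 + vms_on_server * (num_servers - 1)


# Hypothetical cluster: 10 servers, 20 VMs, 2 VMs on the server in question.
central = centralized_placements(10, 20)
local = per_agent_actions(10, 2)
print(central, local)
```

Under these assumptions the centralized count grows exponentially with the number of virtual machines, while each per-server agent only faces a count that grows linearly with the number of servers.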

Examples

Embodiment Construction

[0044] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit the present application.

[0045] To address the deficiencies of the existing technology, the multi-agent reinforcement learning scheduling method of the embodiments of this application applies multi-agent reinforcement learning. It builds a model from the load information on each service node in the cloud service environment, uses a recurrent neural network to learn temporal information for decision-making, and trains one agent for each server. Agents with different tasks compete or cooperate with one another to maintain load balance across the entire network topology. Afte...
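
As a rough illustration of "one recurrent agent per server", the sketch below gives each server its own small Elman-style recurrent policy whose hidden state carries information from earlier load observations. Everything specific here is an assumption made for the example: the class name `RecurrentAgent`, the two-feature observation (CPU and memory utilization), the three actions, and the random untrained weights. The patent excerpt does not specify the network architecture or the training procedure, and no training is shown.

```python
import math
import random


class RecurrentAgent:
    """Tiny Elman-style recurrent policy for one server (illustrative, untrained)."""

    def __init__(self, obs_size, hidden_size, num_actions, seed):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(hidden_size, obs_size)
        self.w_rec = matrix(hidden_size, hidden_size)
        self.w_out = matrix(num_actions, hidden_size)
        self.hidden = [0.0] * hidden_size  # memory of past observations

    def act(self, obs):
        # h_t = tanh(W_in x_t + W_rec h_{t-1})
        new_hidden = []
        for i in range(len(self.hidden)):
            total = sum(w * x for w, x in zip(self.w_in[i], obs))
            total += sum(w * h for w, h in zip(self.w_rec[i], self.hidden))
            new_hidden.append(math.tanh(total))
        self.hidden = new_hidden
        # Pick the action with the highest score (greedy selection).
        scores = [sum(w * h for w, h in zip(row, self.hidden)) for row in self.w_out]
        return max(range(len(scores)), key=scores.__getitem__)


# One independent agent per server, each with its own weights and memory.
agents = [RecurrentAgent(obs_size=2, hidden_size=4, num_actions=3, seed=s)
          for s in range(3)]

# Hypothetical (cpu, memory) utilization per server over two time steps.
load_history = [
    [(0.90, 0.80), (0.20, 0.10), (0.50, 0.40)],
    [(0.95, 0.85), (0.25, 0.10), (0.45, 0.40)],
]
for step_obs in load_history:
    actions = [agent.act(obs) for agent, obs in zip(agents, step_obs)]
```

Because each agent keeps its own hidden state, its decision at a given step can depend on the load trend it has seen so far rather than only on the latest observation.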

Abstract

The invention relates to a multi-agent reinforcement learning scheduling method, a multi-agent reinforcement learning scheduling system, and an electronic device. The method comprises the following steps: step a, collecting the server parameters of a network data center and the load information of the virtual machines running on each server; step b, establishing a virtual simulation environment from the server parameters and the virtual machine load information, and establishing a multi-agent deep reinforcement learning model; step c, performing offline training with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server; and step d, deploying the agent models to the real service nodes and scheduling according to the load condition of each service node. Services running on the servers are virtualized, load balancing is carried out by scheduling virtual machines, resource allocation is handled at a more macroscopic level, and multiple agents can arrive at a cooperative strategy in a complex dynamic environment.
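
A toy version of the simulation environment in step b, and of scheduling by moving virtual machines as in step d, might look like the sketch below. It is a sketch under stated assumptions, not the patented method: the `migrate_once` rule is a hand-written greedy stand-in for the trained agents, the imbalance metric (population standard deviation of utilization) is chosen for illustration, and the server capacities and VM loads are made-up numbers.

```python
from statistics import pstdev


def utilization(servers):
    """Fraction of capacity in use on each server."""
    return [sum(s["vms"]) / s["capacity"] for s in servers]


def imbalance(servers):
    """Population standard deviation of utilization; 0 means perfectly balanced."""
    return pstdev(utilization(servers))


def migrate_once(servers):
    """Greedy stand-in for a learned policy: move the smallest VM from the
    busiest server to the idlest one, keeping the move only if it helps."""
    util = utilization(servers)
    src = max(range(len(servers)), key=util.__getitem__)
    dst = min(range(len(servers)), key=util.__getitem__)
    if src == dst or not servers[src]["vms"]:
        return False
    before = imbalance(servers)
    vm = min(servers[src]["vms"])
    servers[src]["vms"].remove(vm)
    servers[dst]["vms"].append(vm)
    if imbalance(servers) < before:
        return True
    # The move did not reduce imbalance, so undo it.
    servers[dst]["vms"].remove(vm)
    servers[src]["vms"].append(vm)
    return False


# Hypothetical data center: equal capacities, very uneven VM loads.
servers = [
    {"capacity": 100, "vms": [40, 30, 20]},
    {"capacity": 100, "vms": [10]},
    {"capacity": 100, "vms": [5, 5]},
]
start = imbalance(servers)
while migrate_once(servers):
    pass
end = imbalance(servers)
print(f"imbalance before: {start:.3f}, after: {end:.3f}")
```

In the method described by the abstract, the decision made here by `migrate_once` would instead come from the per-server agent models trained offline against the simulation environment.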

Description

Technical field

[0001] The application belongs to the technical field of multi-agent systems, and in particular relates to a multi-agent reinforcement learning scheduling method, system, and electronic device.

Background

[0002] In a cloud computing environment, traditional service deployment methods struggle to cope with changing access patterns. Although a fixed allocation of resources can provide services stably, it also wastes a large amount of resources. For example, under the same network topology, some servers may often run at full load, while others host only a few services and still have a great deal of unused storage space and computing power. Traditional service deployment cannot deal with this waste, and without efficient scheduling the resources cannot be used efficiently. Therefore, a scheduling algorithm that can adapt to the dynamic environment is...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50, G06F9/48, G06N3/04, G06N3/08
CPC: G06F9/48, G06F9/50, G06N3/04, G06N3/08
Inventor: 任宏帅, 王洋, 须成忠
Owner: SHENZHEN INST OF ADVANCED TECH