Cluster resource management and task scheduling method and system based on deep reinforcement learning

A reinforcement learning and task scheduling technology, applied in neural learning methods, resource allocation, electrical digital data processing, and related fields. It addresses problems such as the inability to pursue multiple goals at the same time, low utilization of system resources, and poor bin-packing results, and achieves the effects of reducing the number of resource fragments, improving system efficiency, and saving manpower.

Pending Publication Date: 2020-11-20
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0004] Task scheduling and resource management are essentially a bin packing problem, which has been proven to be NP-complete, so no polynomial-time algorithm that solves it optimally is known. Existing methods based on heuristics and hand-written rules have two main problems: on the one hand, heuristic rules easily fall into local optima and cannot achieve a good packing result, that is, many resource fragments remain and the utilization of system resources is low; on the other hand, when the task composition or cluster
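
To make the bin-packing framing in [0004] concrete, the following minimal sketch applies a first-fit rule to tasks with two-dimensional resource demands. The function names, the (CPU, memory) units, the machine capacity, and the task list are all illustrative assumptions and do not come from the patent; the sketch only shows how a fixed rule can leave fragments that a pending task cannot use.

```python
from typing import List, Optional, Tuple

Vector = Tuple[int, ...]  # hypothetical (cpu, memory) units


def fits(free: Vector, demand: Vector) -> bool:
    """Return True if the demand fits in every resource dimension."""
    return all(f >= d for f, d in zip(free, demand))


def first_fit(tasks: List[Vector], capacity: Vector, n_machines: int):
    """Place each task on the first machine with enough free resources.

    Returns (placements, free): placements[i] is the machine index chosen
    for task i, or None if no machine could host it; free holds the
    leftover resources of each machine.
    """
    free: List[Vector] = [capacity] * n_machines
    placements: List[Optional[int]] = []
    for demand in tasks:
        chosen = None
        for m, avail in enumerate(free):
            if fits(avail, demand):
                free[m] = tuple(a - d for a, d in zip(avail, demand))
                chosen = m
                break
        placements.append(chosen)
    return placements, free


if __name__ == "__main__":
    # Illustrative workload: two machines with 10 CPU and 10 memory units each.
    tasks = [(6, 2), (4, 2), (6, 8), (4, 8)]
    placements, free = first_fit(tasks, capacity=(10, 10), n_machines=2)
    print(placements)  # [0, 0, 1, None]
    print(free)        # [(0, 6), (4, 2)]
```

In this made-up workload the last task is rejected even though pairing (6, 2) with (4, 8) and (4, 2) with (6, 8) would fill both machines exactly. The leftovers (0, 6) and (4, 2) are the kind of fragments the paragraph describes.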

Examples

Detailed Description of the Embodiments

[0027] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0028] As shown in Figure 1, the present invention mainly includes a queue of tasks to be scheduled, a resource scheduling management agent, a task scheduling module, and a simulator, and it operates in two phases: training and running.
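
The components named in [0028] might be wired together for the running phase roughly as sketched below. All class and function names (`Task`, `Cluster`, `PlaceholderAgent`, `run_scheduler`) are hypothetical, and the agent here is a simple stand-in rule, since the patent's agent is a trained neural network whose structure is not given in this excerpt.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple


@dataclass
class Task:
    task_id: str
    demand: Tuple[int, int]  # hypothetical (cpu, memory) demand vector


class Cluster:
    """Tracks free resources per machine (a stand-in for real cluster state)."""

    def __init__(self, capacities: List[Tuple[int, int]]):
        self.free = [list(c) for c in capacities]

    def can_host(self, machine: int, task: Task) -> bool:
        return all(f >= d for f, d in zip(self.free[machine], task.demand))

    def allocate(self, machine: int, task: Task) -> None:
        for i, d in enumerate(task.demand):
            self.free[machine][i] -= d


class PlaceholderAgent:
    """Stand-in for the trained agent (assumption): among feasible machines,
    pick the one that would be left with the least total free resource."""

    def decide(self, cluster: Cluster, task: Task) -> Optional[int]:
        feasible = [m for m in range(len(cluster.free)) if cluster.can_host(m, task)]
        if not feasible:
            return None
        return min(feasible, key=lambda m: sum(cluster.free[m]) - sum(task.demand))


def run_scheduler(queue: Deque[Task], agent, cluster: Cluster) -> Dict[str, Optional[int]]:
    """Process the queue in order and dispatch each task per the agent's decision.

    Tasks with no feasible machine are recorded as None; a real system would
    presumably requeue or delay them, which this sketch does not model.
    """
    decisions: Dict[str, Optional[int]] = {}
    while queue:
        task = queue.popleft()
        machine = agent.decide(cluster, task)
        if machine is not None:
            cluster.allocate(machine, task)  # role of the task scheduling module
        decisions[task.task_id] = machine
    return decisions


if __name__ == "__main__":
    queue = deque([Task("a", (4, 4)), Task("b", (2, 2)), Task("c", (8, 8))])
    cluster = Cluster([(8, 8), (4, 4)])
    print(run_scheduler(queue, PlaceholderAgent(), cluster))
```

Keeping the agent behind a single `decide` method means a trained policy could replace the placeholder rule without changing the queue or the dispatch loop.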

[0029] In the training process, task execution data is first collected on the computer cluster over a period of time, including task resource demand vectors, task request times, and other attributes. Based on the collected data, the Q neural network shown in Figure 2 is trained in the simulator until the expected goal is reached, at which point training ends.
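
The training loop in [0029] can be illustrated with a deliberately small sketch. It swaps the Q neural network for a tabular Q-learning table so that it stays self-contained; the `Simulator` class, the reward of +1 for a placed task and -1 for an infeasible decision, the hyperparameters, and the four-task trace are all assumptions made for illustration, not details from the patent.

```python
import random
from typing import Dict, List, Tuple

State = Tuple[Tuple[int, ...], int]  # (free capacity per machine, index of next task)


class Simulator:
    """Replays a recorded trace of task demands against simulated machines."""

    def __init__(self, capacities: List[int], trace: List[int]):
        self.capacities = capacities
        self.trace = trace

    def reset(self) -> State:
        self.free = list(self.capacities)
        self.idx = 0
        return (tuple(self.free), self.idx)

    def step(self, machine: int) -> Tuple[State, float, bool]:
        demand = self.trace[self.idx]
        if self.free[machine] >= demand:
            self.free[machine] -= demand
            reward = 1.0   # task placed
        else:
            reward = -1.0  # infeasible decision; the task is dropped
        self.idx += 1
        done = self.idx == len(self.trace)
        return (tuple(self.free), self.idx), reward, done


def greedy_action(q: Dict[State, List[float]], state: State, n_actions: int) -> int:
    values = q.setdefault(state, [0.0] * n_actions)
    return values.index(max(values))


def evaluate(sim: Simulator, q: Dict[State, List[float]]) -> float:
    """Total reward of the greedy policy over one pass through the trace."""
    state, done, total = sim.reset(), False, 0.0
    while not done:
        state, reward, done = sim.step(greedy_action(q, state, len(sim.capacities)))
        total += reward
    return total


def train(sim: Simulator, target: float, max_episodes: int = 5000,
          alpha: float = 0.5, gamma: float = 1.0, epsilon: float = 0.2, seed: int = 0):
    """Epsilon-greedy Q-learning that stops once the greedy policy reaches
    the target reward (the 'expected goal') or max_episodes is exhausted."""
    rng = random.Random(seed)
    n_actions = len(sim.capacities)
    q: Dict[State, List[float]] = {}
    for episode in range(1, max_episodes + 1):
        state, done = sim.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)          # explore
            else:
                action = greedy_action(q, state, n_actions)  # exploit
            next_state, reward, done = sim.step(action)
            best_next = 0.0 if done else max(q.setdefault(next_state, [0.0] * n_actions))
            values = q.setdefault(state, [0.0] * n_actions)
            values[action] += alpha * (reward + gamma * best_next - values[action])
            state = next_state
        if evaluate(sim, q) >= target:
            return q, episode
    return q, max_episodes


if __name__ == "__main__":
    sim = Simulator(capacities=[4, 4], trace=[2, 2, 3, 1])
    q, episodes_used = train(sim, target=4.0)
    print(episodes_used, evaluate(sim, q))
```

In this toy trace, spreading the two size-2 tasks across both machines leaves no room for the size-3 task, so the table has to learn to pack them together. A neural network, as in the patent, would replace the dictionary `q` when the state space is too large to enumerate.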

[0030] After training, the parameters of the Q neural network shown in Figure 2 are copied to the resource scheduling management agent in Figure 1...

Abstract

The invention relates to a cluster resource management and task scheduling method and system based on deep reinforcement learning. The method comprises the following steps: placing the tasks that need to run in a to-be-scheduled task queue; processing the tasks in the queue sequentially through a resource scheduling management agent, which generates a scheduling decision according to the cluster resource conditions and the resource requirements of each task, wherein the agent is a neural network trained with a deep reinforcement learning method on historical records of tasks run on the cluster; and, according to the scheduling decision, scheduling each task to the corresponding machine in the cluster for execution. The invention can improve the utilization rate of cluster resources and the throughput of the system, and cluster resource allocation can adapt automatically when the task load changes. The method can also shorten task response time and reduce the number of cluster machines needed under the same load, which is significant for saving energy and protecting the environment.

Description

Technical field

[0001] The invention belongs to the technical field of computer software, and relates to a resource management and task scheduling method and system for large-scale computer clusters.

Background

[0002] In today's information age, people cannot live without the various services provided by the Internet, which has given rise to a large number of giant Internet companies such as Alibaba, Baidu, Tencent, and Meituan. At the same time, the concepts of cloud computing and the Industrial Internet of Things are booming, and originally scattered small computer clusters are gradually migrating to the large clusters of cloud computing vendors such as Alibaba Cloud and AWS. As a result, large Internet companies and cloud computing vendors need to maintain a large number of computer clusters and the countless applications and tasks running on them. Keeping a large number of clusters running normally consumes a lot of power, which is an importan...

Application Information

IPC(8): G06F9/50, G06F9/48, G06F9/54, G06N3/04, G06N3/08
CPC: G06F9/5061, G06F9/4881, G06F9/546, G06N3/084, G06F2209/548, G06N3/045, Y02D10/00
Inventor: 张正超, 肖臻, 毛航宇, 潘丽晨
Owner: PEKING UNIV