Cloud resource online scheduling method and device for distributed machine learning task

The technology combines machine learning and distributed computing, is applied in the field of cloud computing, and addresses problems such as poor scheduling performance.

Active Publication Date: 2019-08-02
长三角信息智能创新研究院

AI Technical Summary

Problems solved by technology

[0007] In view of this, the present invention provides an online scheduling method and device for cloud resources oriented to distributed



Examples


Embodiment 1

[0094] This embodiment provides an online scheduling method for cloud resources oriented to distributed machine learning tasks. Referring to Figure 1, the method includes:

[0095] Step S1: At the beginning of each period, the cloud resource broker observes the price function of each type of resource in each geographically distributed data center and the amount of data to be trained for each machine learning task. The geographically distributed data centers host the computing nodes and parameter servers. Machine learning tasks are submitted by users, and at each moment the amount of data that each task needs to train at the next moment is generated. The price function is

[0096]

[0097] where h is the resource usage; the other quantities in the formula are a threshold and adjustable parameters set according to actual resource prices.

[0098] Specifically, the inventors of the present application found through extensive practice and research that ...

Embodiment 2

[0154] This embodiment provides an online scheduling device for cloud resources oriented to distributed machine learning tasks. Referring to Figure 4, the device comprises:

[0155] The price function and data volume observation module 201 is used to observe, at the beginning of each period, the price function of each type of resource in each geographically distributed data center and the amount of data that each machine learning task needs to train. The geographically distributed data centers host the computing nodes and parameter servers. Machine learning tasks are submitted by users, and at each moment the amount of data that each task needs to train at the next moment is generated. The price function is

[0156]

[0157] where h is the resource usage; the other quantities in the formula are a threshold and adjustable parameters set according to actual resource prices;

[0158] The cost calculation module 202 is used to calculate the cost generated in the proce...

Embodiment 3

[0185] Referring to Figure 5, based on the same inventive concept, the present application also provides a computer-readable storage medium 300 on which a computer program 311 is stored; when the program is executed, the method described in Embodiment 1 is implemented.

[0186] Since the computer-readable storage medium introduced in Embodiment 3 is the medium used to implement the cloud resource online scheduling method for distributed machine learning tasks of Embodiment 1, those skilled in the art can, based on the method introduced in Embodiment 1, understand the specific structure and variations of the computer-readable storage medium, so details are not repeated here. All computer-readable storage media used by the method in Embodiment 1 fall within the scope of protection intended by the present invention.


Abstract

The invention discloses a cloud resource online scheduling method for distributed machine learning tasks. The method comprises the following steps: first, at the beginning of each time period, a cloud resource broker observes the price function of each resource in each data center and the amount of data each task needs to train; the sum of all costs generated in the distributed machine learning task scheduling process is calculated and expressed as an integer linear program; the coupling between every two adjacent time periods in the relaxed linear program is removed through a regularization method, which converts the hard-to-handle online problem over the whole horizon T into an independent linear program at each moment, so that real-time decisions can be made without depending on future information; finally, the designed independent rounding method is used to solve for the deployment scheme of the computing nodes and parameter servers and the data migration scheme of each machine learning task at each moment, so that the total cost is minimized while the task completion effect is guaranteed, and the scheduling effect is optimized.
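The final step of the abstract turns a fractional linear-programming solution into integer deployment decisions by independent rounding. The patent's own rounding procedure is not reproduced in this text, so the following is only a minimal sketch of generic independent randomized rounding, offered as an illustration. The function name `independent_round` and the sample values are hypothetical.

```python
import math
import random


def independent_round(fractional_values, rng):
    """Independently round each non-negative fractional value to a neighbouring integer.

    Each value x is rounded up with probability equal to its fractional part
    and down otherwise, so the expected rounded value equals x.
    """
    rounded = []
    for x in fractional_values:
        lower = math.floor(x)
        frac = x - lower
        # rng.random() is uniform on [0, 1), so this rounds up with probability frac.
        rounded.append(lower + 1 if rng.random() < frac else lower)
    return rounded


# Hypothetical fractional LP solution: computing nodes per data center (assumed values).
fractional_nodes = [2.0, 1.3, 0.6]
integer_nodes = independent_round(fractional_nodes, random.Random(0))
```

Values that are already integers are left unchanged, and every other value moves to one of its two neighbouring integers. Rounding each variable independently can violate constraints that couple the variables, so a practical scheme would likely need a feasibility check or repair step afterwards; whether and how the patent does this is not stated here.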

Description

Technical field

[0001] The invention relates to the technical field of cloud computing, in particular to an online scheduling method and device for cloud resources oriented to distributed machine learning tasks.

Background

[0002] Traditional machine learning tends to gather all data sets for offline training to obtain a better model. In practice, however, the data sources are usually geographically dispersed, and the data are generated sequentially over time rather than all at once. The traditional training method is therefore no longer applicable, and geo-distributed machine learning is needed. Distributed machine learning enables efficient training on large, geographically dispersed datasets as they arrive over time, eliminating the need to gather all datasets at a central site for training.

[0003] At present, distributed machine learning generally adopts the parameter server framework. In order to tr...


Application Information

IPC(8): H04L12/24, H04L29/08, G06N3/08
CPC: H04L41/142, H04L41/145, H04L67/1097, G06N3/08, H04L67/60
Inventor: 李晓彤, 李宗鹏, 周睿婷, 黄浩
Owner: 长三角信息智能创新研究院