An online scheduling method and device for distributed machine learning tasks

A machine learning and task scheduling technology applied in the field of cloud computing. It addresses problems such as poor scheduling performance by scheduling and deploying tasks efficiently, keeping allocations reasonable, and avoiding idle and wasted resources.

Active Publication Date: 2022-06-07
WUHAN UNIV

AI Technical Summary

Problems solved by technology

[0007] In view of this, the present invention provides an online scheduling method and device for distributed machine learning tasks, to solve or at least partially solve the technical problem of poor scheduling performance in the prior art.

Examples

Embodiment 1

[0090] This embodiment provides an online scheduling method for distributed machine learning tasks. Referring to Figure 1, the method includes:

[0091] Step S1: Set the decision parameters related to task scheduling. These include: a parameter indicating whether the task starts at time t; the time required to complete the task; the weight of the task; a parameter indicating whether all worker nodes and parameter servers of task j are deployed on the same server; the number of type-m worker nodes deployed on server h to execute task j; and the number of type-p parameter servers deployed on server h to execute task j. The weight of a task represents its urgency.
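The decision parameters listed in Step S1 can be pictured as one record per task. The following is a minimal sketch, not the patent's implementation: the class name `TaskDecision`, its field names, and the choice to derive the co-location indicator from the placement counts (rather than treat it as an independent decision parameter) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TaskDecision:
    """Hypothetical container for the Step S1 decision parameters of one task j."""
    start_time: int   # time slot t at which the task starts
    duration: int     # time required to complete the task
    weight: float     # urgency of the task
    # (server h, worker type m) -> number of worker nodes deployed
    workers: Dict[Tuple[str, str], int] = field(default_factory=dict)
    # (server h, parameter server type p) -> number of parameter servers deployed
    param_servers: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def started_at(self, t: int) -> bool:
        """Indicator: does the task start at time t?"""
        return t == self.start_time

    @property
    def completion_time(self) -> int:
        """Assumes the task runs without interruption once started."""
        return self.start_time + self.duration

    @property
    def colocated(self) -> bool:
        """True when every worker and parameter server sits on one server."""
        used = {h for (h, _), n in self.workers.items() if n > 0}
        used |= {h for (h, _), n in self.param_servers.items() if n > 0}
        return len(used) == 1


decision = TaskDecision(
    start_time=2, duration=3, weight=1.5,
    workers={("h1", "m1"): 2},
    param_servers={("h1", "p1"): 1},
)
print(decision.colocated, decision.completion_time)
```

Keying the placement counts by (server, type) mirrors the per-server, per-type counts named in the text; an actual scheduler would also need capacity constraints, which this sketch leaves out.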

[0092] Specifically, the inventors of the present application have found through extensive practice and research that in traditional distributed machine learning tasks, the resource configuration of computing nodes is estima...

Embodiment 2

[0164] This embodiment provides an online scheduling device for distributed machine learning tasks. Referring to Figure 5, the device includes:

[0165] Parameter setting module 201: used to set the decision parameters related to task scheduling. These include: a parameter indicating whether the task starts at time t; the time required to complete the task; the weight of the task; a parameter indicating whether all worker nodes and parameter servers of task j are deployed on the same server; the number of type-m worker nodes deployed on server h to execute task j; and the number of type-p parameter servers deployed on server h to execute task j. The weight of a task represents its urgency;

[0166] The task completion time weighted sum representation module 202 is used to obtain the completion time weighted sum of all tasks according to the parameter indica...

Embodiment 3

[0175] Referring to Figure 6, based on the same inventive concept, the present application also provides a computer-readable storage medium 300 on which a computer program 311 is stored; when executed, the program implements the method described in Embodiment 1.

[0176] The computer-readable storage medium introduced in Embodiment 3 of the present invention is used to implement the online scheduling method for distributed machine learning tasks of Embodiment 1. Based on the method described in Embodiment 1, those skilled in the art can understand the specific structure and variations of the computer-readable storage medium, so the details are not repeated here. All computer-readable storage media used to implement the method of Embodiment 1 fall within the scope of protection of the present invention.

Abstract

The invention discloses an online scheduling method for distributed machine learning tasks. Using a dual approximation method, the hard-to-handle online problem over the full time horizon is converted into a series of offline task scheduling problems, with the algorithm's performance guaranteed during the conversion. Without knowledge of the future, the invention schedules distributed machine learning tasks and resources online in a near-optimal manner and avoids idle and wasted resources, so that tasks and resources are scheduled and deployed efficiently in real time while limited resources are fully used. In practice, it helps cloud resource providers allocate resources and schedule tasks reasonably, and exploits the elasticity of different tasks' resource demands to make full use of existing resources. With the invention, the resource provider can adjust the online scheduling strategy in real time as time passes, so as to maximize resource utilization and minimize the weighted sum of completion times of all tasks.
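To make the objective concrete, the sketch below computes the weighted sum of completion times for two orderings of the same tasks on a single shared resource. It is an illustration only: the function names, the one-task-at-a-time assumption, and the example tasks are hypothetical and are not taken from the patent, and the sketch does not implement the dual approximation method.

```python
def sequential_schedule(tasks):
    """Run tasks back to back in the given order.

    tasks: list of (name, duration, weight).
    Returns a list of (name, start_time, duration, weight).
    """
    schedule, clock = [], 0
    for name, duration, weight in tasks:
        schedule.append((name, clock, duration, weight))
        clock += duration
    return schedule


def weighted_completion_sum(schedule):
    """Sum of weight * completion time, where completion = start + duration."""
    return sum(weight * (start + duration)
               for _, start, duration, weight in schedule)


long_low = ("A", 3, 1.0)      # long task, low urgency
short_urgent = ("B", 1, 5.0)  # short task, high urgency

print(weighted_completion_sum(sequential_schedule([long_low, short_urgent])))  # 23.0
print(weighted_completion_sum(sequential_schedule([short_urgent, long_low])))  # 9.0
```

Running the urgent short task first lowers the objective from 23.0 to 9.0, which shows why the task weight (urgency) and the completion time both appear in the quantity being minimized.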

Description

Technical field

[0001] The present invention relates to the technical field of cloud computing, and in particular to an online scheduling method and device for distributed machine learning tasks.

Background

[0002] Machine learning is an important data analysis technique for extracting useful information from large-scale datasets. In distributed machine learning, the dataset is distributed across a large number of worker nodes, which train and update model parameters in parallel. Depending on how model parameters are updated, implementations fall into the parameter server framework and the AllReduce framework. In the parameter server architecture, computing nodes are divided into two categories: worker nodes and parameter servers. A worker node trains on its portion of the training dataset, sends the resulting parameter changes to the parameter server, and receives the updated parameters from the parameter server. The parameter server maintains the entire model parameter set, and the...
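The exchange between worker nodes and a parameter server described in the background can be sketched in a few lines. This is a toy, single-process simulation under stated assumptions: the class `ParameterServer`, the functions `worker_gradient` and `train`, the linear model, and the learning rate are all hypothetical, and workers take turns here rather than running in parallel as they would in a real deployment.

```python
class ParameterServer:
    """Maintains the full parameter set and applies updates pushed by workers."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def pull(self):
        """A worker receives the current parameters."""
        return list(self.params)

    def push(self, grads):
        """A worker sends its parameter changes; the server applies them."""
        for i, g in enumerate(grads):
            self.params[i] -= self.lr * g


def worker_gradient(params, shard):
    """Gradient of mean squared error for y ~ w*x + b on one data shard."""
    w, b = params
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return [grad_w, grad_b]


def train(shards, steps=2000, lr=0.05):
    """Each shard stands in for one worker node's share of the dataset."""
    server = ParameterServer([0.0, 0.0], lr)
    for _ in range(steps):
        for shard in shards:            # workers take turns in this sketch
            params = server.pull()      # receive updated parameters
            server.push(worker_gradient(params, shard))  # send changes
    return server.pull()


# Two workers, each holding part of a dataset generated by y = 2x + 1.
shards = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
w, b = train(shards)
print(round(w, 3), round(b, 3))
```

The number of workers and parameter servers a task receives, and where they are placed, are exactly the quantities the scheduling method decides; this sketch fixes them by hand.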

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06N20/00, G06F9/48
CPC: G06N20/00, G06F9/4806
Inventors: 张琴, 李宗鹏, 黄浩
Owner: WUHAN UNIV