An online scheduling method and device for distributed machine learning tasks
A machine learning task scheduling technology, applied in the field of cloud computing, that addresses the problem of poor scheduling performance. It achieves efficient scheduling and deployment, keeps scheduling decisions reasonable, and avoids idle and wasted resources.
Examples
Embodiment 1
[0090] This embodiment provides an online scheduling method for distributed machine learning tasks. Referring to Figure 1, the method includes:
[0091] Step S1: Set the decision parameters related to task scheduling. These parameters include: a parameter indicating whether the task starts at time t; the time required to complete the task; the weight of the task; a parameter indicating whether all worker nodes and parameter servers of task j are deployed on the same server; the number of type-m worker nodes deployed on server h to execute task j; and the number of type-p parameter servers deployed on server h to execute task j. The weight of a task represents its urgency.
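As an illustration only, the decision parameters listed in Step S1 could be grouped per task as in the following Python sketch. The class name `TaskDecision`, its field names, and the consistency check are hypothetical and do not appear in the patent text; the sketch only mirrors the parameters named above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class TaskDecision:
    """Hypothetical container for the Step S1 decision parameters of one task j."""
    start_time: int   # time slot t at which the task starts
    duration: int     # time required to complete the task
    weight: float     # urgency of the task
    colocated: bool   # True if all workers and parameter servers share one server
    # (server h, worker type m) -> number of type-m workers on server h
    workers: Dict[Tuple[str, str], int] = field(default_factory=dict)
    # (server h, parameter-server type p) -> number of type-p parameter servers on h
    param_servers: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def starts_at(self, t: int) -> int:
        """Binary indicator: 1 if the task starts at time t, else 0."""
        return 1 if t == self.start_time else 0

    def servers_used(self) -> Set[str]:
        """Servers hosting at least one worker or parameter server of this task."""
        used = {h for (h, _), n in self.workers.items() if n > 0}
        used |= {h for (h, _), n in self.param_servers.items() if n > 0}
        return used

    def is_consistent(self) -> bool:
        """Assumed sanity check: the co-location flag agrees with the placement."""
        return self.colocated == (len(self.servers_used()) == 1)


# Example: two type-m1 workers and one type-p1 parameter server, all on server h1.
decision = TaskDecision(
    start_time=3, duration=5, weight=2.0, colocated=True,
    workers={("h1", "m1"): 2}, param_servers={("h1", "p1"): 1},
)
```

Keeping the placement counts keyed by (server, type) follows the wording "number of type-m worker nodes deployed on server h"; a real solver would more likely hold these as integer variables in an optimization model.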
[0092] Specifically, the inventors of the present application have found through extensive practice and research that in traditional distributed machine learning tasks, the resource configuration of computing nodes is estima...
Embodiment 2
[0164] This embodiment provides an online scheduling device for distributed machine learning tasks. Referring to Figure 5, the device includes:
[0165] Parameter setting module 201: used to set the decision parameters related to task scheduling. These parameters include: a parameter indicating whether the task starts at time t; the time required to complete the task; the weight of the task; a parameter indicating whether all worker nodes and parameter servers of task j are deployed on the same server; the number of type-m worker nodes deployed on server h to execute task j; and the number of type-p parameter servers deployed on server h to execute task j, where the weight of a task represents its urgency;
[0166] The task completion time weighted sum representation module 202 is used to obtain the completion time weighted sum of all tasks according to the parameter indica...
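The weighted sum of completion times that module 202 is named for can be sketched as follows. This is a minimal illustration, not the patent's formula, which is not reproduced in this excerpt: it assumes that a task's completion time equals its start time plus the time required to complete it, and the function name `weighted_completion_sum` is hypothetical.

```python
from typing import Iterable, Tuple


def weighted_completion_sum(tasks: Iterable[Tuple[int, int, float]]) -> float:
    """Sum of weight * completion time over all tasks.

    Each task is a (start_time, duration, weight) tuple. Completion time is
    assumed to be start_time + duration, which is an assumption of this sketch.
    """
    return sum(weight * (start + duration) for start, duration, weight in tasks)


# Two tasks: an urgent one (weight 2.0) finishing at t=3, another finishing at t=6.
total = weighted_completion_sum([(0, 3, 2.0), (2, 4, 1.0)])
print(total)
```

Under this reading, a scheduler that minimizes the sum is pushed to finish high-weight (urgent) tasks earlier.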
Embodiment 3
[0175] Referring to Figure 6, based on the same inventive concept, the present application also provides a computer-readable storage medium 300 on which a computer program 311 is stored. When executed, the program implements the method described in Embodiment 1.
[0176] Since the computer-readable storage medium introduced in Embodiment 3 of the present invention is used to implement the online scheduling method for distributed machine learning tasks in Embodiment 1 of the present invention, those skilled in the art can, based on the method described in Embodiment 1, understand the specific structure and variations of the computer-readable storage medium, so they are not repeated here. All computer-readable storage media used to implement the method in Embodiment 1 of the present invention fall within the scope of protection of the present invention.


