
Method, system and program products for a dynamic, hierarchical reporting framework in a network job scheduler

A job scheduler with hierarchical reporting technology, applied in the field of job scheduling systems. It addresses the problem of scheduling agents being overwhelmed by heavy communication loads, and improves the scalability and performance of the scheduling agent.

Status: Inactive — Publication Date: 2009-04-09
IBM CORP

AI Technical Summary

Benefits of technology

[0006]The present invention proposes a reporting framework solution that enhances the scalability and performance of the scheduling agent by relying on dynamic, hierarchical aggregation of data.
[0007]The shortcomings of the prior art are overcome and additional advantages are provided through the use of a system and method for scheduling jobs in a multinode job scheduling system, the method comprising the steps of: passing job start information from a scheduling agent to a master node; passing the job start information from the master node to a tree structure of nodes in the multinode job scheduling system; providing an indication, from the nodes in the tree structure, to the master node, that the respective nodes are ready to receive an executable task; transferring executable tasks to the nodes in the tree; starting the executable tasks on the nodes of the tree; and returning job status information from the nodes in the tree to the master node along the same node tree structure. Additionally, the present invention provides a method for reporting job status in a multinode job processing system, which comprises: preserving information regarding a hierarchical tree of compute nodes, established during job launch in respective ones of memory systems in the compute nodes; and, at least during job execution, returning job status information from the compute nodes to a master node higher in said tree.
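The sequence below is a minimal, illustrative sketch of the message flow described in paragraph [0007]: job start information fans out from the master node down a node tree, readiness indications come back, executable tasks flow down, and job status returns up the same tree. The class, message, and node names are hypothetical and are not taken from the patent, which does not prescribe a particular implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class MsgType(Enum):
    JOB_START = auto()   # job start information, sent down the tree
    READY = auto()       # node is ready to receive an executable task
    TASK = auto()        # executable task, sent down the tree
    STATUS = auto()      # job status, returned up the same tree


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)

    def handle(self, msg: MsgType, payload=None):
        if msg in (MsgType.JOB_START, MsgType.TASK):
            # downward messages: forward to every child, then act locally
            for child in self.children:
                child.handle(msg, payload)
            if msg == MsgType.TASK:
                print(f"{self.name}: starting task {payload!r}")
            return []
        # upward messages: collect reports from the subtree and add our own
        reports = [r for c in self.children for r in c.handle(msg, payload)]
        return reports + [(self.name, msg.name)]


# master node established by the scheduling agent for this job (hypothetical topology)
master = Node("master", [Node("n1", [Node("n3"), Node("n4")]), Node("n2")])
master.handle(MsgType.JOB_START, "job-42")   # job start info flows down
print(master.handle(MsgType.READY))          # readiness indications return up
master.handle(MsgType.TASK, "a.out")         # executable tasks flow down
print(master.handle(MsgType.STATUS))         # job status returns up the same tree
```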

Problems solved by technology

Scheduling parallel workloads in a High Performance Computing (HPC) cluster is an increasingly complex task, especially when it concerns scalability and performance of the scheduling agent.
When hundreds of compute agents running across the cluster attempt to report the status of a job to a scheduling agent running on a single node, the scheduling agent quickly becomes a performance bottleneck under the heavy communication load.
In many cases, this scenario could also lead to the failure of the scheduling agent.
Reporting every status update directly in this way creates the bottleneck, while reducing the reporting load by deferring or batching status reports instead causes delays in recognizing failures, which in turn affects the reliability of the scheduling agent.




Embodiment Construction

[0013]When a job is submitted for execution, the scheduling agent determines a set of nodes on which the job can run, based on the requirements of the job as well as the availability of the necessary resources on those compute nodes. Each compute node runs an agent that is capable of reporting the status of jobs dispatched to it for execution.
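As an illustration of this selection step, the sketch below filters candidate compute nodes by comparing a job's per-node requirements against what each node currently has free. The resource fields and function names are hypothetical stand-ins for whatever resource model the scheduling agent actually uses.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    name: str
    free_cpus: int
    free_mem_gb: int


def select_nodes(nodes, cpus_per_node, mem_per_node_gb, nodes_needed):
    """Return a set of nodes that can satisfy the job, or None if the cluster cannot."""
    eligible = [n for n in nodes
                if n.free_cpus >= cpus_per_node and n.free_mem_gb >= mem_per_node_gb]
    return eligible[:nodes_needed] if len(eligible) >= nodes_needed else None


cluster = [ComputeNode("n1", 8, 32), ComputeNode("n2", 2, 8), ComputeNode("n3", 8, 64)]
print(select_nodes(cluster, cpus_per_node=4, mem_per_node_gb=16, nodes_needed=2))
```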

[0014]To start the job, the scheduling agent sets up a master compute node and sends the job object to it in a JOB_START transaction. The master compute node forwards this “job start order” to a predetermined number of slave nodes, initiating the hierarchical job launch tree. The master node informs each slave node of its child nodes. Each slave node in turn forwards the job to its children until all of the nodes on which the job will run have received the job start transaction (see FIG. 1).
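A minimal sketch of how such a launch tree might be set up, assuming a fixed fan-out: each node is assigned its children (heap-style layout here, purely for illustration), and the job start order is forwarded recursively until every node has received it. The fan-out value and function names are illustrative, not taken from the patent.

```python
def build_launch_tree(node_names, fanout=2):
    """Map each node to the child nodes it must forward the job start order to."""
    children = {name: [] for name in node_names}
    for i, name in enumerate(node_names):
        # node i's children sit at positions fanout*i+1 .. fanout*i+fanout
        children[name] = node_names[fanout * i + 1 : fanout * i + 1 + fanout]
    return children


def forward_job_start(node, children, job):
    """Recursively forward the job start order, mirroring the hierarchical launch."""
    print(f"{node}: received JOB_START for {job}, forwarding to {children[node] or 'no children'}")
    for child in children[node]:
        forward_job_start(child, children, job)


nodes = ["master", "n1", "n2", "n3", "n4", "n5", "n6"]
tree = build_launch_tree(nodes, fanout=2)
forward_job_start("master", tree, "job-42")
```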

[0015]Every node in the tree now communicates that it is ready to receive the executable or task(s) to be run. In most existing schemes, all of the age...
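In the hierarchical scheme described here, a node can combine its own report with the reports aggregated from its children before sending a single message to its parent, so each level of the tree, and ultimately the master node, hears from only its direct children rather than from the whole subtree. The sketch below illustrates that aggregation; the report format and names are hypothetical.

```python
def aggregate_status(node, children, local_status):
    """Combine this node's status with the reports aggregated from its subtree."""
    subtree = {node: local_status[node]}
    for child in children.get(node, []):
        # each child hands up one already-aggregated report for its whole subtree
        subtree.update(aggregate_status(child, children, local_status))
    return subtree


children = {"master": ["n1", "n2"], "n1": ["n3", "n4"], "n2": [], "n3": [], "n4": []}
local_status = {n: "RUNNING" for n in children}
local_status["n4"] = "COMPLETED"

# the master ends up with status for the whole tree after hearing from only two children
print(aggregate_status("master", children, local_status))
```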



Abstract

The present invention employs a master node for each job to be scheduled; the master node in turn distributes job start information and executable tasks to a plurality of nodes configured in a hierarchical node tree of a multinode job scheduling system. The various tasks executing at the leaf nodes and other nodes of the tree report their status back up the same hierarchical tree structure used to start the job, not to a scheduling agent but rather to the master node, which has been established by the scheduling agent as the focal point not only for job starting but also for the reporting of status information from the leaf and other nodes in the tree.

Description

TECHNICAL FIELD[0001]The present invention is generally directed to a job scheduling system in an environment which includes a large plurality of data processing nodes. More particularly, the present invention is directed to providing a hierarchical structure for the return of job or task status information so as to relieve bottlenecks created at a scheduling agent especially when there are a large plurality of nodes carrying out the job.BACKGROUND OF THE INVENTION[0002]Scheduling parallel workloads in a High Performance Computing (HPC) cluster is an increasingly complex task, especially when it concerns scalability and performance of the scheduling agent. This is because clusters are being used to solve extremely large and complicated problems. This has led to an increase in the number of nodes required to execute a parallel job by an order of magnitude or more. By implication, the total number of nodes in a typical HPC cluster has gone up by an order of magnitude as well.[0003]Whe...


Application Information

Patent Type & Authority Applications(United States)
IPC IPC(8): G06F9/46
CPCG06F9/4843
Inventor BRELSFORD, DAVID P.CHAN, WAIMANHUGHES, STEPHEN C.MARTHI, KAILASH N.SURE, RAVINDRA R.
Owner IBM CORP