A Method for Optimizing Job Running Parameters Applied to Supercomputing Cluster Scheduling

A job and job-parameter technology, applied in computing, electrical digital data processing, and multi-programming arrangements. It addresses problems such as users' reduced willingness to run optimization tests, the machine time those tests consume, and the difficulty of obtaining a good running speed. Its stated effects are improved hardware resource utilization, reduced hardware resource usage, and a better computing-speed experience.

Active Publication Date: 2022-05-13
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

Having to find good parameters by manual testing is a major inconvenience to users and reduces their willingness to run optimization tests. The experience data a user can draw on is also limited, so it is difficult to arrive at good candidate operating parameters.
In addition, a user's test runs consume the user's own machine-time resources and so increase machine-time costs. Surveys show that few users do this kind of optimization and debugging.
In current cluster job scheduling systems such as Slurm, PBS Pro, Platform LSF, or TORQUE, when a user submits a new job the system simply runs it with the parallel parameters the user submitted. It does not test for faster calculation parameters on the user's behalf, and in particular it does not automatically optimize or modify the application's own input parameters.
Computing software on current supercomputing clusters tends to have a multi-layer parallel structure, and the input parameter space of these applications is becoming increasingly complex. With only a small amount of computing experience, users find it difficult to reach an ideal running speed, which leaves a large number of jobs in the cluster running inefficiently.



Examples


Embodiment 1

[0048] The method for optimizing job operation parameters applied to supercomputing cluster scheduling proposed in this embodiment includes the following steps.

[0049] SA1. Obtain an application job submitted by a user.

[0050] SA2. Obtain the application category stated in the application job, select a parameter estimation model according to the application category and the operating parameters to be optimized, and obtain the estimated parameter configuration by applying the parameter estimation model to the application job.

[0051] The input of the parameter estimation model is the information of the job to be run, and the output is the estimated parameter configuration for that job. The parameter estimation model can be an empirical model, that is, one set manually, or it can be obtained by training on big data. The training database of the parameter estimation model is the historical job database of t...
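As an illustration of a data-driven parameter estimation model, the following is a minimal sketch, not the model described in the patent. It assumes a hypothetical `HistoricalJob` record with a single job feature (`problem_size`) and a single parallel parameter (`cpu_cores`), and it swaps in a simple nearest-neighbour lookup: among the `k` historical jobs of the same application category that are closest in problem size, it returns the core count of the fastest one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HistoricalJob:
    """One row of a hypothetical historical job database."""
    app_category: str   # application category, e.g. "VASP"
    problem_size: int   # assumed job feature (for example, number of atoms)
    cpu_cores: int      # parallel parameter that was used
    runtime_s: float    # measured calculation completion time in seconds


def estimate_cores(history: List[HistoricalJob], app_category: str,
                   problem_size: int, k: int = 3) -> Optional[int]:
    """Estimate a core count from the k most similar historical jobs.

    Returns None when the database has no job of this application category,
    in which case a manually set (empirical) value would be needed instead.
    """
    same_app = [j for j in history if j.app_category == app_category]
    if not same_app:
        return None
    # Keep the k jobs whose problem size is closest to the new job.
    nearest = sorted(same_app, key=lambda j: abs(j.problem_size - problem_size))[:k]
    # Among those, reuse the parameter of the job that finished fastest.
    return min(nearest, key=lambda j: j.runtime_s).cpu_cores


history = [
    HistoricalJob("VASP", 100, 32, 500.0),
    HistoricalJob("VASP", 110, 64, 300.0),
    HistoricalJob("VASP", 90, 128, 350.0),
    HistoricalJob("VASP", 1000, 256, 100.0),
    HistoricalJob("LAMMPS", 100, 16, 50.0),
]
print(estimate_cores(history, "VASP", 105))
```

A real model would use more job features and more than one output parameter; the lookup here only shows the input-to-output shape the paragraph describes.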

Embodiment 2

[0065] Building on Embodiment 1, in this embodiment, when the application job starts running in step SA5 and any of its test jobs has not finished (that is, a test job has not yet started or is still running), all test jobs corresponding to the application job are stopped and deleted, and the estimated parameter configuration corresponding to the application job is used as the optimal parameter configuration.

[0066] All test jobs corresponding to the application job are recorded as the test job set of that application job. If the test job set has not completed before the application job runs, the original parameter configuration, the estimated parameter configuration, and the groups of supplementary parameter configurations corresponding to the application job cannot be effectively compared. At this time, continuing to tes...
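The fallback rule of this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: `TrialJob`, `State`, and `config_at_launch` are hypothetical names, cancellation is modelled as a state change rather than a real scheduler call, and choosing the fastest finished test job when the whole set has completed is an assumption about the screening step, not something this passage specifies.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    PENDING = "pending"      # test job has not been run yet
    RUNNING = "running"      # test job is still running
    FINISHED = "finished"
    CANCELLED = "cancelled"


@dataclass
class TrialJob:
    """A test job for one candidate parameter configuration (hypothetical type)."""
    config: dict
    state: State = State.PENDING
    runtime_s: Optional[float] = None


def config_at_launch(trial_jobs: List[TrialJob], estimated_config: dict) -> dict:
    """Pick the configuration to use at the moment the application job starts."""
    unfinished = [t for t in trial_jobs if t.state is not State.FINISHED]
    if unfinished or not trial_jobs:
        # The test job set is incomplete, so its results cannot be compared:
        # stop the remaining test jobs and fall back to the estimate.
        for t in unfinished:
            t.state = State.CANCELLED
        return estimated_config
    # Assumption: with a complete set, the fastest test job wins.
    return min(trial_jobs, key=lambda t: t.runtime_s).config


jobs = [TrialJob({"cores": 32}, State.FINISHED, 40.0),
        TrialJob({"cores": 64}, State.RUNNING)]
print(config_at_launch(jobs, {"cores": 48}))
```

In a real deployment the cancellation step would call the scheduler's own job-cancel facility; that part is left out here on purpose.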

Embodiment 3

[0070] On the basis of Embodiment 1, in this embodiment, the historical job database further includes the job configuration parameters used in the test job and the corresponding calculation completion time, so as to increase the number of samples through the test job.

[0071] Since a test job is only a part of the corresponding application job, its running time is much shorter than that of the application job. When training the parameter estimation model, the job calculation time of the test job must therefore be processed to restore the job calculation time that the application job would require if the test job's parameter configuration were applied to it. The parameter configuration of the test job and the restored job calculation time are then used to train the parameter estimation model.
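The text does not state how the test job's time is restored to a full-job time. One simple possibility, shown here purely as an assumption, is linear extrapolation: if the test job runs `test_steps` of the application job's `total_steps` iterations, the per-step cost is scaled up, with an optional fixed startup cost counted only once. All names below are hypothetical.

```python
def restore_runtime(test_runtime_s: float, test_steps: int, total_steps: int,
                    startup_s: float = 0.0) -> float:
    """Extrapolate a test job's runtime to the full application job.

    Assumes every step costs the same and that startup_s is a fixed overhead
    paid once; neither assumption comes from the patent text.
    """
    if test_steps <= 0 or total_steps < test_steps:
        raise ValueError("need 0 < test_steps <= total_steps")
    if not 0.0 <= startup_s <= test_runtime_s:
        raise ValueError("startup_s must lie within the test runtime")
    per_step_s = (test_runtime_s - startup_s) / test_steps
    return startup_s + per_step_s * total_steps


# A 10-step test job that took 20 s, restored to a 1000-step application job.
print(restore_runtime(20.0, test_steps=10, total_steps=1000))
```

The restored value, paired with the test job's parameter configuration, is what would be written into the training data in place of the raw test runtime.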

[0072] In this embodiment, the historical job database is divided into the original sub-library and the test sub-library. The orig...


Abstract

A method for optimizing job operation parameters applied to supercomputing cluster scheduling, comprising: obtaining an application job submitted by a user, and obtaining multiple sets of different job parameter configurations corresponding to the application job; screening the optimal parameter configuration from the multiple sets of job parameter configurations; and pushing the optimal parameter configuration to the user, or modifying the job parameter configuration of the application job submitted by the user according to the optimal parameter configuration. The invention automatically optimizes the parameter configuration of application jobs submitted by supercomputing cluster users, makes up for the fact that most users are not able to optimize the parameter configuration themselves, and helps improve the computing efficiency of the supercomputing cluster as a whole.
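The flow in the abstract (collect candidate configurations, screen the best one, then either recommend it or apply it) can be sketched as below. This is a hedged illustration only: `screen_optimal`, `apply_or_recommend`, the `measure` callback, and the dictionary layout of a job are all hypothetical, and "optimal" is taken here to mean the shortest measured time.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def screen_optimal(candidates: List[Config],
                   measure: Callable[[Config], float]) -> Tuple[Config, float]:
    """Time every candidate once and return the fastest with its time."""
    if not candidates:
        raise ValueError("at least one candidate configuration is required")
    timed = [(measure(c), c) for c in candidates]
    best_time, best = min(timed, key=lambda pair: pair[0])
    return best, best_time


def apply_or_recommend(job: dict, best: Config, auto_apply: bool) -> dict:
    """Either modify the job's parameters or attach them as a recommendation."""
    if auto_apply:
        return {**job, "params": dict(best)}
    return {**job, "recommended_params": dict(best)}


# Stand-in timings; a real system would obtain these from test jobs.
fake_times = {32: 9.0, 64: 5.0, 128: 6.5}
candidates = [{"cores": 32}, {"cores": 64}, {"cores": 128}]
best, t = screen_optimal(candidates, lambda c: fake_times[c["cores"]])
job = {"name": "demo", "params": {"cores": 32}}
print(apply_or_recommend(job, best, auto_apply=False))
```

Both helper functions return new dictionaries rather than mutating the submitted job, so the user's original parameters stay available when the result is only pushed as a suggestion.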

Description

Technical field

[0001] The invention relates to the field of supercomputing clusters, in particular to a method for optimizing job operation parameters applied to supercomputing cluster scheduling.

Background technique

[0002] Computing software such as, but not limited to, VASP runs on supercomputing clusters. When submitting a job for such software, the user needs to set operating environment parameters, operating resource parameters, application-related input parameters, and other operating parameters. In particular, one or more parallel parameters need to be specified as input parameters, such as the total number of CPU cores required or, in the case of a multi-layer parallel structure, the number of parallel computing tasks assigned at each layer. Users can adjust these parallel parameters to significantly increase the calculation speed without changing the calculation results. But at present, many software itself cannot pre-judge a parallel parameter that is clo...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/50
CPC: G06F9/5061, G06F9/5038, G06F2209/5021
Inventor: 张文帅, 李会民, 李京
Owner: UNIV OF SCI & TECH OF CHINA