Job operation parameter optimization method applied to super-computing cluster scheduling

A technique for optimizing the operating parameters of application jobs, applied in computing, electrical digital data processing, and multi-programming arrangements. It addresses three problems: manual tuning discourages users from running optimization tests, it consumes machine-time resources, and it rarely yields a good running speed. Its effects are higher hardware utilization efficiency, lower hardware resource usage, and a faster computing experience for users.

Active Publication Date: 2022-02-15
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

Testing operating parameters by hand is a major inconvenience for users and discourages them from running optimization tests. The experience data available to a single user is also limited, so it is difficult to arrive at good candidate operating parameters.
In addition, a user's test runs consume that user's own machine-time allocation, which raises machine-time costs. Surveys show that few users carry out this kind of optimization and debugging.
In current cluster job scheduling systems, such as Slurm, PBS Pro, Platform LSF, or TORQUE, when a user submits a new job the system simply runs the calculation with the parallel parameters the user supplied. It does not test for faster calculation parameters on the user's behalf, and in particular it does not automatically optimize or modify the application's own input parameters.
Computing software on current supercomputing clusters also tends to have a multi-layer parallel structure, and the input parameter space of the corresponding applications keeps growing more complex. With only a small amount of computing experience, supercomputing cluster users find it difficult to reach an ideal running speed, which leaves a large number of jobs in the cluster running inefficiently.



Examples


Embodiment 1

[0048] A method for optimizing job operation parameters applied to supercomputing cluster scheduling proposed in this embodiment includes the following steps.

[0049] SA1. Obtain an application job submitted by a user.

[0050] SA2. Obtain the application category described in the application job, select a parameter estimation model according to the application category and the operating parameters to be optimized, and obtain the estimated parameter configuration in combination with the application job and the parameter estimation model.

[0051] The input of the parameter estimation model is the information of the job to be run, and the output is the estimated parameter configuration corresponding to the job to be run. The parameter estimation model can adopt an empirical model, that is, a manual setting. The parameter estimation model can also be obtained by using big data training. The training database of the parameter estimation model is the historical job database of t...
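The model selection in step SA2 can be illustrated with a minimal Python sketch. It assumes an empirical (hand-set) model looked up by application category and by the operating parameter to be optimized. The names `JobInfo`, `MODELS`, `empirical_two_layer`, `estimate_config`, the parameter label `"parallel_layout"`, and the divisor rule itself are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Config = Dict[str, int]


@dataclass(frozen=True)
class JobInfo:
    app_category: str  # application category named in the job, e.g. "VASP"
    total_cores: int   # total CPU cores requested by the user


def empirical_two_layer(job: JobInfo) -> Config:
    """Hypothetical hand-set rule for a two-layer parallel layout.

    Picks the largest divisor of total_cores that does not exceed its
    square root as the number of outer groups.
    """
    groups = 1
    d = 1
    while d * d <= job.total_cores:
        if job.total_cores % d == 0:
            groups = d
        d += 1
    return {"groups": groups, "cores_per_group": job.total_cores // groups}


# Registry keyed by (application category, parameter to optimize).
# A trained model could be registered here in place of the empirical rule.
MODELS: Dict[Tuple[str, str], Callable[[JobInfo], Config]] = {
    ("VASP", "parallel_layout"): empirical_two_layer,
}


def estimate_config(job: JobInfo, parameter: str) -> Optional[Config]:
    """Return the estimated configuration, or None if no model is registered."""
    model = MODELS.get((job.app_category, parameter))
    return model(job) if model is not None else None
```

Keeping the registry keyed by both category and parameter mirrors the text, which selects the model "according to the application category and the operating parameters to be optimized".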

Embodiment 2

[0065] On the basis of Embodiment 1, in this embodiment, when the application job is about to be run in step SA5 and any of its corresponding test jobs has not finished (that is, the test job has not started or is still running), all test jobs corresponding to the application job are stopped and deleted, and the estimated parameter configuration corresponding to the application job is used as the optimal parameter configuration.

[0066] All test jobs corresponding to the application job are recorded as the test job set of that application job. If the test job set has not been completed before the application job is run, the original parameter configuration, the estimated parameter configuration, and the groups of supplementary parameter configurations corresponding to the application job cannot be effectively compared. At this time, continuing to tes...
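The fallback rule of this embodiment can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: `TrialJob`, `State`, `config_at_launch`, and the `cancel` callback are hypothetical names, and choosing the shortest measured time when every test job has finished is an assumed judgment condition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional

Config = Dict[str, int]


class State(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"


@dataclass
class TrialJob:
    """One test job from the test job set of an application job."""
    config: Config
    state: State
    seconds: Optional[float] = None  # measured time, set once DONE


def config_at_launch(
    test_jobs: List[TrialJob],
    estimated: Config,
    cancel: Callable[[TrialJob], None],
) -> Config:
    """Decide which configuration the application job starts with."""
    unfinished = [t for t in test_jobs if t.state is not State.DONE]
    if unfinished:
        # The test set is incomplete, so no fair comparison is possible:
        # stop whatever is pending or running and fall back to the estimate.
        # Finished test jobs have nothing left to stop.
        for t in unfinished:
            cancel(t)
        return estimated
    if not test_jobs:
        return estimated
    # Assumed judgment condition: shortest measured running time wins.
    return min(test_jobs, key=lambda t: t.seconds).config
```

In a real scheduler the `cancel` callback would wrap the scheduler's own job cancellation command; it is passed in here so the decision logic stays independent of any particular system.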

Embodiment 3

[0070] On the basis of Embodiment 1, in this embodiment, the historical job database further includes the job configuration parameters used in the test job and the corresponding calculation completion time, so as to increase the number of samples through the test job.

[0071] Since a test job is only a part of its corresponding application job, its running time is much shorter than that of the application job. When training the parameter estimation model, the job calculation time of the test job must therefore be processed to restore the job calculation time that the application job would require under the test job's parameter configuration. The test job's parameter configuration and the restored job calculation time are then used to train the parameter estimation model.
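The text does not state how the calculation time is restored. One possible approach, shown here purely as an assumption, is linear scaling: if a test job runs a known number of steps out of the application job's total, its time is scaled by that ratio. The functions `restore_full_time` and `training_samples`, and the notion of "steps", are hypothetical.

```python
from typing import Dict, List, Tuple

Config = Dict[str, int]


def restore_full_time(test_seconds: float, test_steps: int, full_steps: int) -> float:
    """Scale a test job's time up to the full application job.

    Assumes running time grows linearly with the number of steps, which
    ignores startup cost and is only a rough approximation.
    """
    if test_steps <= 0 or full_steps < test_steps:
        raise ValueError("need 0 < test_steps <= full_steps")
    return test_seconds * full_steps / test_steps


def training_samples(
    test_records: List[Tuple[Config, float, int]],
    full_steps: int,
) -> List[Tuple[Config, float]]:
    """Turn (config, test_seconds, test_steps) records into training pairs."""
    return [
        (config, restore_full_time(seconds, steps, full_steps))
        for config, seconds, steps in test_records
    ]
```

For example, a test job that completes 10 of 1000 steps in 30 seconds would be recorded with a restored time of 3000 seconds under this assumption.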

[0072] In this embodiment, the historical job database is divided into the original sub-library and the test sub-library. The orig...


Abstract

A job operation parameter optimization method applied to super-computing cluster scheduling comprises the following steps: acquiring an application job submitted by a user, and acquiring multiple groups of different job parameter configurations corresponding to the application job; screening an optimal parameter configuration from the multiple groups of operation parameter configurations according to a set parameter configuration judgment condition; and pushing the optimal parameter configuration to the user, or modifying the job parameter configuration of the application job submitted by the user according to the optimal parameter configuration. According to the method, automatic optimization of the parameter configuration of the application job submitted by the super-computing cluster user is realized, the defect that most users do not have the parameter configuration optimization capability is made up, and the calculation efficiency of the super-computing cluster is improved on the whole.
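The screening and delivery steps described in the abstract can be sketched in Python as follows. The abstract leaves the "set parameter configuration judgment condition" open; this sketch assumes it is the shortest measured running time. The names `screen_optimal`, `deliver`, and the `auto_modify` flag are hypothetical.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, int]


def screen_optimal(measured: Dict[str, Tuple[Config, float]]) -> str:
    """Return the label of the configuration with the shortest time.

    `measured` maps a label (e.g. "original", "estimated") to a pair of
    (configuration, running time in seconds).
    """
    if not measured:
        raise ValueError("no candidate configurations to screen")
    return min(measured, key=lambda label: measured[label][1])


def deliver(
    submitted: Config, best: Config, auto_modify: bool
) -> Tuple[Config, Optional[Config]]:
    """Either rewrite the submitted job or push a suggestion to the user.

    Returns (configuration the job will run with, suggestion or None).
    """
    if auto_modify:
        # Modify the job: the best values override the submitted ones.
        return {**submitted, **best}, None
    # Push only: the job runs as submitted and the user sees the suggestion.
    return submitted, best
```

The two return paths correspond to the two alternatives in the abstract: pushing the optimal configuration to the user, or modifying the submitted job's parameter configuration directly.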

Description

Technical field

[0001] The invention relates to the field of supercomputing clusters, in particular to a method for optimizing job operation parameters applied to supercomputing cluster scheduling.

Background

[0002] Computing software, such as but not limited to VASP, runs on supercomputing clusters. When submitting a job for such software, the user needs to set operating environment parameters, operating resource parameters, application-related input parameters, and other operating parameters. In particular, one or more parallel parameters need to be specified as input parameters, such as the total number of CPU cores required or, in the case of a multi-layer parallel structure, the number of parallel computing tasks assigned to each layer. Users can adjust these parallel parameters to significantly increase the calculation speed without changing the calculation results. At present, however, much of this software cannot itself pre-judge a parallel parameter that is clo...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50
CPC: G06F9/5061, G06F9/5038, G06F2209/5021
Inventor: 张文帅, 李会民, 李京
Owner: UNIV OF SCI & TECH OF CHINA