Method, system and related equipment for scheduling deep learning jobs

A deep learning job scheduling technology in the field of artificial intelligence. It addresses three problems: deep learning jobs have scheduling requirements that differ from those of typical system services, existing schedulers lack batch-processing abstractions and mechanisms, and user experience suffers as a result.

Active Publication Date: 2018-11-30
HUAWEI CLOUD COMPUTING TECH CO LTD
Cites: 8 | Cited by: 31

AI Technical Summary

Problems solved by technology

These complex factors make it difficult for a batch job scheduler to schedule deep learning jobs in a simple way, and users have to write adaptation scripts that are not very reusable.
[0009] (2) Deep learning debugging jobs and online inference jobs resemble traditional services. However, as application services submitted by users, they often have relatively short life cycles, and their scheduling requirements differ from those of typical system services such as web servers and databases.
A service scheduler is designed for system-service scenarios in which the number of services and their life cycles are relatively stable, and it lacks batch-processing abstractions and mechanisms. For such a scheduler, these special scheduling requirements are either impossible to meet or can be met only with the help of complex external mechanisms.
[0010] Neither of these two types of traditional scheduler can fully meet the complex and diverse scheduling requirements of multiple deep learning libraries and multiple types of deep learning jobs. This is a major obstacle to providing deep learning services in the public cloud.
Simply using an existing batch job scheduler or service scheduler fails to implement scheduling policies specific to deep learning, which degrades user experience and increases operation and maintenance complexity. It may also lower hardware resource utilization and raise the operating costs of the public cloud.

Embodiment approach

[0090] In the first way, the job request further includes at least one of the following: the job name, the storage location of the deep learning program, the application startup file, the storage location of the data set, the type of the at least one task, the quantity of each of the at least one task, the command-line parameters of the job, and the resource requirements of each of the at least one task.

[0091] In the second way, the job request further includes at least one of the following: the job name, the deep learning program, the application startup file, the storage location of the data set, the type of the at least one task, the quantity of each type of task, the command-line parameters of the job, and the resource requirements of each of the at least one task.

[0092] Here, the job name is the identifier of the deep learning job. The storage location of the deep learning program is used by the computing node to read the deep learning program according to the storage location of the app...
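
The request fields listed in [0090] to [0092] can be pictured as a simple record. The sketch below is a minimal, hypothetical Python model of the first way, in which the request carries the program's storage location rather than the program itself. The class names (`JobRequest`, `TaskSpec`), the field names, and the validation rules are illustrative assumptions, not definitions taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskSpec:
    """One task type in a job (hypothetical names, e.g. 'ps' or 'worker')."""
    task_type: str
    quantity: int                  # number of instances of this task type
    resources: Dict[str, float]    # per-instance requirements, e.g. {"cpu": 4, "gpu": 1}


@dataclass
class JobRequest:
    """Hypothetical model of a job request in the first way of [0090]."""
    dl_library: str                # deep learning library type, e.g. "tensorflow"
    job_type: str                  # job type, e.g. "training" or "inference"
    job_name: str = ""             # identifier of the deep learning job
    program_location: str = ""     # where compute nodes read the program from
    startup_file: str = ""         # application startup file
    dataset_location: str = ""     # storage location of the data set
    tasks: List[TaskSpec] = field(default_factory=list)
    args: List[str] = field(default_factory=list)  # job command-line parameters

    def total_instances(self) -> int:
        """Total number of task instances across all task types."""
        return sum(t.quantity for t in self.tasks)

    def validate(self) -> None:
        """Assumed sanity checks; the patent does not specify any."""
        if not self.dl_library or not self.job_type:
            raise ValueError("library type and job type are required")
        for t in self.tasks:
            if t.quantity < 1:
                raise ValueError(f"task {t.task_type!r} needs quantity >= 1")


# Usage: one parameter-server task and two worker tasks.
req = JobRequest(
    dl_library="tensorflow", job_type="training", job_name="mnist-demo",
    tasks=[TaskSpec("ps", 1, {"cpu": 2}), TaskSpec("worker", 2, {"cpu": 4, "gpu": 1})],
)
req.validate()
print(req.total_instances())  # 3
```

The library type and job type are modeled as required fields because, per the abstract, they are always carried in the request and drive template and image selection; everything else is optional, matching the "at least one of the following" wording.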

Abstract

The invention discloses a method, a system and related equipment for scheduling deep learning jobs. The method includes: acquiring a job request for a deep learning job, where the job request carries a deep learning library type and a job type; determining a target job description file template from a plurality of pre-stored job description file templates according to the deep learning library type and the job type; determining the identifier of a target job base image from the identifiers of a plurality of pre-stored job base images according to the deep learning library type and the job type; generating a target job description file according to the target job description file template and the identifier of the target job base image; and transmitting the target job description file to a container scheduler, which selects the target job base image from the pre-stored job base images according to the target job description file and creates at least one container for executing the job request. The scheme can improve the compatibility of deep learning job scheduling.
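
The pipeline in the abstract (select a template and a base image by library type and job type, fill the template, and hand the result to a container scheduler) can be sketched as follows. This is a minimal illustration under stated assumptions: the template registry, image identifiers such as `dl-base/tensorflow-train:1.0`, the JSON layout of the job description file, the `kind` field, and the function names are all hypothetical, and the container scheduler is represented only by a stand-in function that expands the description into one container record per task instance.

```python
import json
from string import Template
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (deep learning library type, job type)

# Hypothetical pre-stored job description file templates.
# The "kind" field is an assumed marker: batch semantics for training,
# service semantics for inference.
TEMPLATES: Dict[Key, Template] = {
    ("tensorflow", "training"): Template(
        '{"name": $job_name, "image": $image_id, "kind": "batch", "tasks": $tasks}'),
    ("tensorflow", "inference"): Template(
        '{"name": $job_name, "image": $image_id, "kind": "service", "tasks": $tasks}'),
}

# Hypothetical identifiers of pre-stored job base images, keyed the same way.
BASE_IMAGES: Dict[Key, str] = {
    ("tensorflow", "training"): "dl-base/tensorflow-train:1.0",
    ("tensorflow", "inference"): "dl-base/tensorflow-serve:1.0",
}


def generate_job_description(library: str, job_type: str, job_name: str,
                             tasks: Dict[str, int]) -> dict:
    """Select the target template and base image id, then fill the template."""
    key = (library, job_type)
    if key not in TEMPLATES or key not in BASE_IMAGES:
        raise KeyError(f"no template or base image for {key}")
    # Values are JSON-encoded so the filled template is always valid JSON.
    text = TEMPLATES[key].substitute(
        job_name=json.dumps(job_name),
        image_id=json.dumps(BASE_IMAGES[key]),
        tasks=json.dumps(tasks),  # task type -> number of instances
    )
    return json.loads(text)


def expand_containers(description: dict) -> List[dict]:
    """Stand-in for the container scheduler: one container per task instance."""
    return [
        {"name": f"{description['name']}-{task}-{i}", "image": description["image"]}
        for task, count in description["tasks"].items()
        for i in range(count)
    ]


desc = generate_job_description("tensorflow", "training", "mnist", {"ps": 1, "worker": 2})
containers = expand_containers(desc)
print(desc["image"])    # dl-base/tensorflow-train:1.0
print(len(containers))  # 3
```

Keying both registries on the same (library type, job type) pair reflects the abstract's point that one request attribute pair drives both selections; supporting a new library or job type then means adding a template and an image entry rather than writing a per-job adaptation script.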

Description

Technical field

[0001] This application relates to the field of artificial intelligence, and in particular to a deep learning job scheduling method, system and related equipment.

Background

[0002] In recent years, deep learning technology has been used ever more widely across industries. Major public cloud service providers in China and abroad have launched deep learning cloud services, and such services have become an inevitable choice for enterprises seeking to lower the barrier to using the technology and to reduce software and hardware deployment costs. When providing deep learning services, cloud service providers must weigh many indicators such as cost, performance, resource utilization, reliability, scalability and maintainability, and the quality of the scheduling system largely determines these indicators. This is because the "on-demand" and "elastic" usage characteristics of cloud services need to be realized through in...

Application Information

Patent type & authority: Applications (China)
IPC (8): G06F9/48
CPC: G06F9/4881, G06F9/5027, G06F9/5083, G06N3/08, G06N3/105, G06N20/00
Inventors: 林健 (Lin Jian), 杨洁 (Yang Jie), 洪斯宝 (Hong Sibao)
Owner: HUAWEI CLOUD COMPUTING TECH CO LTD