Elastic quota scheduling method and device for AI computing cluster and medium

A computing cluster scheduling technology, applied in the field of cloud computing, that addresses the inability to allocate quotas according to cloud platform cluster load and improves resource utilization.

Active Publication Date: 2021-06-11
SHANDONG YINGXIN COMP TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] The present invention mainly solves the problem that computing resource quotas cannot be dynamically allocated to enterprises according to the cloud platform cluster load.

Examples

Embodiment 1

[0035] This embodiment of the present invention provides an elastic quota scheduling method for an AI computing cluster. Referring to Figure 1, the method includes the following steps:

[0036] S100: According to the performance of the cloud platform, set the scan interval at which GPU and CPU resource load information is periodically acquired. The GPU limits are written to the configuration file (int maxgpu = Utils.getConf("MAXGPU"), int GPUADD = Utils.getConf("AddGPU")), and the CPU limits are written to the configuration file in the same way. These limits matter because, once the GPUs required by the tasks an enterprise user has submitted to the cloud platform reach the upper limit, that user's new tasks must wait: a new task is not computed until a current computing task is completed.
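The configuration lookup in S100 can be illustrated with a minimal sketch. The text names Utils.getConf but does not show it, so the QuotaConf class below is a hypothetical stand-in, and the key=value file format is an assumption; only the keys MAXGPU and AddGPU come from the text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical stand-in for the Utils.getConf helper named in the text.
final class QuotaConf {
    private final Properties props = new Properties();

    private QuotaConf() {}

    // Parse key=value lines; a real deployment would read a file instead.
    static QuotaConf fromString(String text) {
        QuotaConf conf = new QuotaConf();
        try {
            conf.props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    // Return the integer stored under key, failing loudly if it is absent.
    int getConf(String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            throw new IllegalArgumentException("missing config key: " + key);
        }
        return Integer.parseInt(raw.trim());
    }
}
```

With this sketch, QuotaConf.fromString("MAXGPU=8\nAddGPU=2\n").getConf("MAXGPU") yields the GPU upper limit, and the CPU limits could be read through the same call with their own keys.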

[0037] Set the expansion threshold according to the overall performance of the cloud platform; this value can be tuned to the cloud platform cluster performance. The code is as follows:

[0038]
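The listing for this paragraph is not reproduced in this text. As an illustration only, an expansion threshold test might look like the sketch below. It assumes the threshold is expressed as a minimum fraction of idle GPUs on the platform and that expansion is allowed once the idle fraction reaches it; the class name, the method name, and the comparison direction are all assumptions, not the patent's code.

```java
// Sketch of an expansion-threshold test (assumed semantics, see lead-in).
final class ExpansionThreshold {
    // Minimum idle fraction of platform GPUs, e.g. 0.25 means 25% idle.
    private final double minIdleRatio;

    ExpansionThreshold(double minIdleRatio) {
        if (minIdleRatio < 0.0 || minIdleRatio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        this.minIdleRatio = minIdleRatio;
    }

    // True when the platform is idle enough to lend GPUs to a container.
    boolean allowsExpansion(int idleGpus, int totalGpus) {
        if (totalGpus <= 0 || idleGpus < 0 || idleGpus > totalGpus) {
            throw new IllegalArgumentException("invalid GPU counts");
        }
        return (double) idleGpus / totalGpus >= minIdleRatio;
    }
}
```

Expressing the threshold as a ratio rather than an absolute count keeps the same setting usable on clusters of different sizes, which fits the statement that the value is tuned to cluster performance.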

[0039] Depending on whether the current GPU's idle qua...

Embodiment 2

[0049] This embodiment of the present invention provides an elastic quota scheduling system for an AI computing cluster. Referring to Figure 2, the system includes a threshold configuration module, a load monitoring module, and an elastic quota management module.

[0050] The threshold configuration module sets the scan interval, the expansion threshold, and the expansion policy according to the cloud platform or user needs.

[0051] The load monitoring module is implemented with the open-source components Prometheus or cAdvisor, or directly with the docker stats command of the container management component Docker. The module scans at the configured scan interval. Here a container corresponds to the computing resource cluster purchased by each enterprise user on the cloud platform. Because the performance of each container is different, the module monitors each container's GPU total, GPU idle amount, CPU total, and CPU idle amount, and the monitored values are recorded in th...
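The scan-and-record behaviour of the load monitoring module can be sketched as follows. The probe that actually queries Prometheus, cAdvisor, or docker stats is abstracted as a Supplier, because the text does not specify how the readings are collected; ContainerLoad, LoadMonitor, and their members are hypothetical names introduced for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// One load sample for a container (hypothetical field names).
final class ContainerLoad {
    final int gpuTotal, gpuIdle, cpuTotal, cpuIdle;

    ContainerLoad(int gpuTotal, int gpuIdle, int cpuTotal, int cpuIdle) {
        this.gpuTotal = gpuTotal;
        this.gpuIdle = gpuIdle;
        this.cpuTotal = cpuTotal;
        this.cpuIdle = cpuIdle;
    }
}

final class LoadMonitor {
    // Latest sample per container id.
    private final Map<String, ContainerLoad> latest = new ConcurrentHashMap<>();
    // Assumed probe: returns current readings keyed by container id.
    private final Supplier<Map<String, ContainerLoad>> probe;

    LoadMonitor(Supplier<Map<String, ContainerLoad>> probe) {
        this.probe = probe;
    }

    // One scan: overwrite the stored sample of every reported container.
    void scanOnce() {
        latest.putAll(probe.get());
    }

    // Returns the most recent sample, or null if none was recorded.
    ContainerLoad latestFor(String containerId) {
        return latest.get(containerId);
    }

    // Repeat scanOnce at the configured scan interval.
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleAtFixedRate(this::scanOnce, 0, intervalSeconds, TimeUnit.SECONDS);
        return pool;
    }
}
```

Keeping scanOnce separate from the scheduler makes a single scan easy to exercise on its own; the caller is responsible for shutting down the executor returned by start.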

Abstract

The invention discloses an elastic quota scheduling method for an AI computing cluster, applied to a cloud platform. The method comprises the following steps: configuring a scan interval, a capacity expansion threshold, and a capacity expansion strategy according to the cloud platform; scanning the computing resources of the container running on the cloud platform and the computing resources of the cloud platform itself; performing a maximum-value check on a first computing resource, used by a first computing task already running in the container, and a second computing resource, required by a second computing task about to run in the container; judging whether the container's computing resources satisfy the maximum-value check and, if not, proceeding to the capacity expansion step; and detecting whether the computing resources of the cloud platform reach the capacity expansion threshold and, if so, starting the capacity expansion strategy for the container. In this way, when the cloud platform has many idle computing resources, they can be flexibly allocated to enterprise users, users in need can make full use of the idle resources to run tasks, and the cluster resource utilization rate is improved.
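The decision flow in the abstract can be summarized in a short sketch. It assumes the maximum-value check means that the GPUs of the running task plus those of the pending task must not exceed the container's quota, and that a task which cannot run or trigger expansion simply waits; the class, enum, and parameter names are hypothetical.

```java
// Sketch of the quota decision described in the abstract (assumed semantics).
final class QuotaDecision {
    enum Action { RUN, EXPAND, WAIT }

    static Action decide(int runningGpus, int pendingGpus, int quotaGpus,
                         boolean platformReachedThreshold) {
        // Maximum-value check: do both tasks fit inside the current quota?
        if (runningGpus + pendingGpus <= quotaGpus) {
            return Action.RUN;
        }
        // Check failed: expand only if the platform reached its threshold.
        if (platformReachedThreshold) {
            return Action.EXPAND;
        }
        // Otherwise the new task waits for running work to finish.
        return Action.WAIT;
    }
}
```

For example, with a quota of 8 GPUs, a running task using 6, and a pending task needing 4, the check fails, so the result depends only on whether the platform has reached its expansion threshold.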

Description

Technical field

[0001] The present invention relates to the field of cloud computing, and more particularly to an elastic quota scheduling method, apparatus, and medium for an AI computing cluster.

Background technique

[0002] With the continuous development of artificial intelligence technology and the cloud computing industry, more and more companies have begun to build their own AI resource management platforms to support the research and development of AI business. For the computing resources of cloud platform suppliers to be used promptly and efficiently, the computing resources available to enterprise tenants running on the cloud platform should change dynamically.

[0003] However, existing cloud platform computing resource quotas are often fixed according to the contract between the tenant and the cloud platform. Even if the cloud platform has a lot of free computing resources, it is impossible to adjust a company's quota in accordance ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50, G06F9/48, G06F11/30
CPC: G06F9/505, G06F9/4881, G06F11/3006, G06F11/3051, Y02D10/00
Inventor: 胡叶
Owner: SHANDONG YINGXIN COMP TECH CO LTD