A cluster GPU multiplexing and intelligent load method and system

A GPU card multiplexing and intelligent load technique for computer clusters. It addresses wasted GPU computing resources, reliance on manual adjustment, and the inability to adapt to changing cluster computing requirements. Its effects are improved resource utilization, avoidance of task termination, and normal operation of high-priority tasks.

Inactive Publication Date: 2019-05-17
ZHENGZHOU YUNHAI INFORMATION TECH CO LTD
0 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

[0005] Because the GPU is a scarce computing resource compared with CPU and memory, scheduling and allocating only by whole card means that only one computing task can run on a GPU card at a time. When a task needs little GPU computation (for example, early-stage AI development often involves only simple calculations), this wastes GPU computing resources.
[0006] Although NVIDIA's vGPU technology can virtualize one GPU into multiple vGPUs, this approach requires manual adjustment and cannot adapt to the changing computing needs of the cluster.

Examples

Embodiment

[0082] The method can be implemented according to the following steps:

[0083] A) Modify the gpuNodes file initialization process:

[0084] Determine whether a gpuNodes file exists, then compare it with the node and nodeShare files in the scheduling system to add the GPU card position records of the nodes. For example, for node2 (4 cards): if multiplexing is not set, add the record "node2:0 0 0 0"; otherwise add the record "node2:0 0 0 0 0 0 0 0".
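
The initialization in step A can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the multiplexing factor of 2 is only inferred from the node2 example (4 cards become 8 positions). The in-memory dictionaries stand in for the gpuNodes, node, and nodeShare files, whose actual formats the text does not give.

```python
def init_gpu_record(node: str, num_cards: int, multiplexed: bool, factor: int = 2) -> str:
    """Build one gpuNodes record such as 'node2:0 0 0 0'.

    `factor` (positions per physical card when multiplexed) is an assumption
    inferred from the 4-card example in the text.
    """
    slots = num_cards * factor if multiplexed else num_cards
    return f"{node}:" + " ".join(["0"] * slots)


def ensure_records(existing: dict[str, str],
                   node_cards: dict[str, int],
                   shared_nodes: set[str]) -> dict[str, str]:
    """Add a record for every scheduler node that gpuNodes does not list yet.

    existing     -- records already present in gpuNodes, keyed by node name
    node_cards   -- card count per node (stand-in for the node file)
    shared_nodes -- nodes with multiplexing set (stand-in for nodeShare)
    """
    records = dict(existing)
    for node, cards in node_cards.items():
        if node not in records:
            records[node] = init_gpu_record(node, cards, node in shared_nodes)
    return records


if __name__ == "__main__":
    print(init_gpu_record("node2", 4, multiplexed=False))
    print(init_gpu_record("node2", 4, multiplexed=True))
```

Existing records are left untouched, so positions already marked as used would survive a re-run of the initialization.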

[0085] B) Modify an existing resource allocation module

[0086] Input parameter: task ID $JOBID

[0087] Output: Task GPU resource sequence such as "601.node01;;node01#0,1;node02#2,3"

[0088] The module first obtains the task information from the task ID and extracts the list of nodes to be allocated to the task and the number of GPUs to allocate on each node. It then traverses the node list to obtain the GPU usage of each node from the gpuNodes file; for example, "node01:0 1 0 1" indicates that the node has used t...
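
A rough sketch of the allocation in step B is given below. It assumes that a 1 in a gpuNodes record marks a used position and a 0 a free one, that free positions are taken in ascending order, and that the output sequence is the task ID followed by ";;" and then "node#positions" groups joined by ";". These points are read off the examples in the text rather than stated by it, and all function names are hypothetical.

```python
def parse_record(line: str) -> tuple[str, list[int]]:
    """Split a gpuNodes record like 'node01:0 1 0 1' into name and flags."""
    node, _, flags = line.partition(":")
    return node, [int(flag) for flag in flags.split()]


def allocate(usage: dict[str, list[int]],
             request: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Choose free GPU positions for each (node, count) pair in `request`.

    Assumes each node appears at most once in `request`. Positions are
    marked as used only after every node could be satisfied.
    """
    plan: dict[str, list[int]] = {}
    for node, count in request:
        free = [i for i, used in enumerate(usage[node]) if used == 0]
        if len(free) < count:
            raise RuntimeError(f"{node}: need {count} GPU positions, {len(free)} free")
        plan[node] = free[:count]
    for node, positions in plan.items():
        for i in positions:
            usage[node][i] = 1
    return plan


def format_sequence(job_id: str, plan: dict[str, list[int]]) -> str:
    """Render the plan in the assumed 'JOBID;;node#i,j;node#k,l' layout."""
    groups = [f"{node}#{','.join(map(str, positions))}"
              for node, positions in plan.items()]
    return f"{job_id};;" + ";".join(groups)


if __name__ == "__main__":
    usage = dict(parse_record(r) for r in ["node01:0 0 1 1", "node02:1 1 0 0"])
    plan = allocate(usage, [("node01", 2), ("node02", 2)])
    print(format_sequence("601.node01", plan))
```

Deferring the update of `usage` until the whole request is satisfiable keeps a failed allocation from leaving positions half-reserved.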

Abstract

The invention discloses a cluster GPU multiplexing and intelligent load method. The method comprises the following steps: applying a multiplexing setting to a GPU card; allocating a task that has applied for execution to the multiplexed GPU card and binding the task to that card; periodically scanning the video memory usage of the GPU card and dynamically adjusting the multiplexing state of the card based on the video memory utilization rate and a configured strategy; and, when the task finishes, releasing the GPU card bound to it. With this method, multiple computing tasks can run on one GPU. For application scenarios with a small amount of GPU computation, resource utilization can be effectively improved; task termination caused by GPU video memory overflow during multi-task concurrency is avoided; and the normal operation of high-priority tasks is guaranteed.
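
The periodic scan can be pictured with the sketch below. The abstract does not specify the strategy, so the 0.8 utilization threshold and the function names are purely illustrative assumptions. The input format matches what `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` prints, but the sketch parses a string so that it runs without a GPU.

```python
from dataclasses import dataclass


@dataclass
class CardStatus:
    index: int
    used_mib: int
    total_mib: int

    @property
    def utilization(self) -> float:
        """Fraction of video memory currently in use."""
        return self.used_mib / self.total_mib


def parse_memory_csv(text: str) -> list[CardStatus]:
    """Parse lines of the form 'index, memory.used, memory.total' (MiB)."""
    cards = []
    for line in text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        cards.append(CardStatus(index, used, total))
    return cards


def open_for_sharing(card: CardStatus, threshold: float = 0.8) -> bool:
    """Hypothetical strategy: accept further tasks only below `threshold`."""
    return card.utilization < threshold


if __name__ == "__main__":
    sample = "0, 1024, 16384\n1, 15000, 16384\n"
    for card in parse_memory_csv(sample):
        state = "open" if open_for_sharing(card) else "closed"
        print(f"GPU {card.index}: {card.utilization:.0%} used, {state} for multiplexing")
```

A scheduler loop would call such a check at a fixed interval and mark a card's extra positions as unavailable once the threshold is crossed, which is one way to keep concurrent tasks from overflowing video memory.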

Description

Technical field

[0001] The invention relates to the computer field, and more specifically to a method and system for cluster GPU multiplexing and intelligent load.

Background technique

[0002] Maui is an open source job scheduling application that is widely used in high-performance service clusters to implement job scheduling management. Through Maui's GRES attribute, the number of GPU nodes can be set and the scheduling of GPU resources is supported. There is also related research, but these solutions only consider job scheduling in units of GPU cards; that is, a GPU card can only run one task at a time.

[0003] With the growing demand for artificial intelligence (AI) computing, GPU resources in high-performance clusters are becoming more and more important as accelerated computing resources. As an expensive and scarce computing resource (compared to CPU and memory), GPU needs to be able to provide finer-grained and more flexible scheduling a...

Application Information

IPC(8): G06F9/50
CPC: G06F9/5016
Inventor 胡叶
Owner ZHENGZHOU YUNHAI INFORMATION TECH CO LTD