A cluster GPU multiplexing and intelligent load method and system
An intelligent load and GPU card multiplexing technology, applied in the computer field. It addresses problems such as manual adjustment, wasted GPU computing resources, and the inability to adapt to cluster computing requirements, and it avoids task termination, improves resource utilization, and ensures normal operation.
Embodiment
[0082] The method can be implemented in the following steps:
[0083] A) Modify the gpuNodes file initialization process:
[0084] Check whether a gpuNodes file exists and compare it with the node and nodeShare files in the scheduling system, adding a GPU card position record for each node that lacks one. For example, if multiplexing is not set for node2 (4 cards), add the record "node2:0 0 0 0"; otherwise add the record "node2:0 0 0 0 0 0 0 0".
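The initialization in step A) can be sketched as follows. This is only an illustration under stated assumptions: the source does not give the on-disk formats of the node and nodeShare files, so the sketch takes already-parsed data (a hypothetical `node_cards` mapping of node name to card count, and a hypothetical `shared_nodes` set of nodes with multiplexing enabled). It also assumes that multiplexing doubles the number of slots, as the node2 example (4 cards, 8 zeros) suggests. The function names are illustrative and do not come from the patent.

```python
def init_gpu_nodes(existing, node_cards, shared_nodes):
    """Return gpuNodes records, adding a zeroed record for each missing node.

    existing     -- dict: node name -> list of 0/1 slot flags already in gpuNodes
    node_cards   -- dict: node name -> number of physical GPU cards (assumed input)
    shared_nodes -- set of node names with multiplexing enabled (assumed input)
    """
    records = {node: list(slots) for node, slots in existing.items()}
    for node, cards in node_cards.items():
        if node in records:
            continue  # keep the record that is already present
        # Assumption: multiplexing exposes two slots per card (4 cards -> 8 slots).
        slot_count = cards * 2 if node in shared_nodes else cards
        records[node] = [0] * slot_count
    return records


def format_record(node, slots):
    """Render one record in the "node2:0 0 0 0" form used in the text."""
    return f"{node}:{' '.join(str(s) for s in slots)}"
```

For instance, `format_record("node2", init_gpu_nodes({}, {"node2": 4}, set())["node2"])` yields the non-multiplexed record from the text, and passing `{"node2"}` as `shared_nodes` yields the eight-slot record.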
[0085] B) Modify an existing resource allocation module
[0086] Input parameter: task ID $JOBID
[0087] Output: Task GPU resource sequence such as "601.node01;;node01#0,1;node02#2,3"
[0088] The module first obtains the task information from the task ID and extracts the list of nodes to be allocated to the task and the number of GPUs that each node should allocate. It then traverses the node list to obtain the GPU usage of each corresponding node from the gpuNodes file; for example, "node01:0 1 0 1" indicates that the node has used t...
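A minimal sketch of how such an allocation step might work is given below. Because the description is cut off, several points are assumptions rather than the patent's stated method: a flag of 1 is taken to mean "slot in use" and 0 "slot free", free slots are picked in ascending order, and the output joins the task ID and the per-node entries in the "601.node01;;node01#0,1;node02#2,3" form shown above. The names `parse_record` and `allocate` are hypothetical, and how multiplexed slots map back to physical cards is not covered.

```python
def parse_record(line):
    """Parse a gpuNodes line such as "node01:0 1 0 1" into (node, flags)."""
    node, _, usage = line.partition(":")
    return node.strip(), [int(flag) for flag in usage.split()]


def allocate(job_id, requests, gpu_nodes):
    """Pick free GPU slots for a task and return its GPU resource sequence.

    job_id    -- task ID, e.g. "601.node01"
    requests  -- list of (node, gpu_count) pairs extracted from the task info
    gpu_nodes -- dict: node -> list of 0/1 flags; updated in place on success
    """
    picks = []
    # Phase 1: choose slots without modifying anything, so a failure on a
    # later node leaves the usage table untouched.
    for node, count in requests:
        free = [i for i, used in enumerate(gpu_nodes[node]) if used == 0]
        if len(free) < count:
            raise RuntimeError(f"{node}: need {count} free GPU slots, found {len(free)}")
        picks.append((node, free[:count]))
    # Phase 2: mark the chosen slots as used and build the output string.
    parts = []
    for node, slots in picks:
        for i in slots:
            gpu_nodes[node][i] = 1
        parts.append(f"{node}#{','.join(str(i) for i in slots)}")
    return f"{job_id};;" + ";".join(parts)
```

With `gpu_nodes = {"node01": [0, 0, 0, 0], "node02": [1, 1, 0, 0]}` and a request for two GPUs on each node, `allocate("601.node01", [("node01", 2), ("node02", 2)], gpu_nodes)` returns a string in the same form as the example output in [0087].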

