GPU cluster shared video memory system, method, device and equipment

This GPU cluster and computing node technology, applied to GPU cluster shared video memory systems, addresses the complex programming of GPU cluster systems and the low performance of shared video memory. Its effects are an expanded design space, simpler programming, and improved efficiency.

Pending Publication Date: 2021-11-19
ALIBABA SINGAPORE HLDG PTE LTD

AI Technical Summary

Problems solved by technology

[0006] This application provides a GPU cluster shared video memory system to solve the existing problems in the prior art when the GPU cluster supports a large load with hig



Examples


Example 1

[0066] Please refer to Figure 1, which is a schematic structural diagram of an embodiment of the GPU cluster shared video memory system of the present application. The system provided in this embodiment may include an application development device 1 and an application running device 2.

[0067] As shown in Figure 2, the application development device 1 may be a software editor, such as a TensorFlow integrated development environment, installed on a terminal device (such as a personal computer or a notebook computer) used by a program developer. The application development device 1 can be used to develop an application program based on the unified video memory of the GPU cluster. The application development device 1 provides developers with the abstraction of a single system, so that they need not consider multiple machines or parallelism, and only needs to use a video memory allocation instruction (such as malloc) once to allocate video memory for use in the face of a large-load app...
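The single-allocation programming view described in [0067] can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `GlobalAllocator` class, the 4 KiB page size, and the fill-first placement policy do not come from the application, which does not specify them in the text available here. The sketch only shows how one allocation call could return a range backed by the video memory of more than one node.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes; the source does not specify one


@dataclass
class GlobalAllocator:
    """Hypothetical allocator over the pooled video memory of several nodes."""
    free_pages: dict                                 # node id -> free physical pages
    next_vpage: int = 0                              # next unused global virtual page
    page_home: dict = field(default_factory=dict)    # virtual page -> node id

    def malloc(self, nbytes):
        """Reserve a contiguous global virtual range and return its base address."""
        npages = -(-nbytes // PAGE_SIZE)  # ceiling division
        if npages > sum(self.free_pages.values()):
            raise MemoryError("cluster-wide video memory exhausted")
        base = self.next_vpage
        for vpage in range(base, base + npages):
            # Fill-first placement (an assumption): lowest-numbered node with space.
            node = next(n for n in sorted(self.free_pages) if self.free_pages[n] > 0)
            self.free_pages[node] -= 1
            self.page_home[vpage] = node
        self.next_vpage += npages
        return base * PAGE_SIZE


# Two nodes with 4 pages each; a single call asks for 6 pages.
alloc = GlobalAllocator(free_pages={0: 4, 1: 4})
base = alloc.malloc(6 * PAGE_SIZE)
nodes_used = sorted(set(alloc.page_home.values()))
```

In this sketch the request exceeds what either node holds alone, yet the caller issues one `malloc` and receives one base address; which node backs which page stays hidden behind `page_home`.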

Example 2

[0116] In the foregoing embodiments, a GPU cluster shared video memory system is provided. Correspondingly, the present application also provides a device, namely a software editor. The device corresponds to the embodiment of the above-mentioned system. Since the device embodiment is basically similar to the system embodiment, its description is relatively brief; for related parts, refer to the corresponding description of the system embodiment. The device embodiments described below are illustrative only.

[0117] The present application additionally provides a software editor, which uses the global virtual video memory address space of the GPU cluster as a programming view to determine the program code with which the target application uses the global virtual video memory resources of the GPU cluster. The program code does not include code for transferring video memory data between different computing nodes.

[0118] The global virtual video memory address spa...

Example 3

[0122] In the foregoing embodiments, a GPU cluster shared video memory system is provided. Correspondingly, the present application also provides a GPU cluster shared video memory device, which may be a module of an operating system. The device corresponds to the embodiment of the above-mentioned system. Since the device embodiment is basically similar to the system embodiment, its description is relatively brief; for related parts, refer to the corresponding description of the system embodiment. The device embodiments described below are illustrative only.

[0123] The present application additionally provides a GPU cluster shared video memory device, including:

[0124] The physical video memory allocation unit is used to determine the GPU cluster global video memory address mapping information of the target application according to the GPU cluster global virtual video memory address space of the target application running on the first computing node;

[0125] Th...


Abstract

The invention discloses a GPU cluster video memory sharing method, device, system, and equipment. The method comprises the following steps: determining the GPU cluster global video memory address mapping information of a target application according to the GPU cluster global virtual video memory address space of the target application running on a first computing node; when a page fault occurs while the target application accesses the GPU video memory, determining the second computing node where the target page data is located according to the global video memory address mapping information of the target application; and loading the target page data from the second computing node into the GPU video memory of the first computing node, from which the target application then reads the target page data. With this processing approach, video memory resources are aggregated at the GPU cluster system level, a unified GPU video memory address space and a single programming view are provided for a distributed GPU cluster facing large loads with high video memory resource requirements, explicit management of data migration and communication is avoided, and GPU cluster system programming is simplified.
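The page-fault path summarized in the abstract can be sketched as a small Python simulation. This is an illustration under stated assumptions, not the application's implementation: the `Node` and `Cluster` classes, the dictionary used as the global address mapping, and the choice to move a page (rather than copy it) are all hypothetical.

```python
class Node:
    """One computing node; vram maps a global virtual page number to its data."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.vram = {}


class Cluster:
    """Hypothetical cluster holding the global video memory address mapping."""

    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.page_map = {}  # global virtual page -> id of the node holding it

    def store(self, node_id, vpage, data):
        """Place a page on a node and record it in the global mapping."""
        self.nodes[node_id].vram[vpage] = data
        self.page_map[vpage] = node_id

    def read(self, first_node_id, vpage):
        """Read a page from the first node, resolving a page fault if needed."""
        first = self.nodes[first_node_id]
        if vpage not in first.vram:
            # Page fault: look up the second node through the global mapping.
            second = self.nodes[self.page_map[vpage]]
            # Move the page into the first node's video memory
            # (moving rather than copying is an assumption of this sketch).
            first.vram[vpage] = second.vram.pop(vpage)
            self.page_map[vpage] = first_node_id
        return first.vram[vpage]


cluster = Cluster(num_nodes=2)
cluster.store(node_id=1, vpage=7, data=b"weights")
value = cluster.read(first_node_id=0, vpage=7)  # faults on node 0, fetches from node 1
```

The application code calls only `read`; it contains no explicit transfer between nodes, which is the property the abstract attributes to the single programming view.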

Description

Technical field

[0001] The application relates to the technical field of video memory management, in particular to a GPU cluster shared video memory system, method and device, a software editor, and electronic equipment.

Background technique

[0002] The high-computing graphics processing unit (GPU) cluster carries many key intelligent computing services in the enterprise, and has become a solid foundation for high-end applications such as AI deep learning training, massive data analysis, and large-scale scientific computing. The software frameworks used by these applications are mostly distributed architectures, such as the machine learning platform TensorFlow. In the case of a single GPU with limited video memory resources, GPU clusters need to share GPU video memory to support heavy-duty applications with higher video memory resource requirements.

[0003] At present, a typical GPU cluster sharing video memory method is that the programming model is distributed or parall...


Application Information

IPC(8): G06T1/20, G06F9/50, G06F8/65
CPC: G06T1/20, G06F9/5027, G06F8/65, Y02D10/00
Inventor 安仲奇
Owner ALIBABA SINGAPORE HLDG PTE LTD