Method, system and device for efficient memory replacement between GPU devices and storage medium

A device-to-device memory swapping technology, applied to inter-program communication, multi-program devices, processor architecture/configuration, etc. It addresses problems such as insufficient memory and unbalanced memory requirements, with the effects of reducing memory constraints, freeing memory faster, and retrieving data in time.

Active Publication Date: 2022-08-05
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

In parallel training scenarios, data replacement between each GPU and the CPU makes CPU memory the load bottleneck. On DGX-1, four GPUs share each PCIe link to the CPU (a 4:1 ratio), so the data replacement cost grows linearly with the number of GPUs. The problem is especially acute in pipeline-parallel training: pipeline stages are designed to balance the computational load, but the memory requirements of the stages are unbalanced. Compared with the tail of the pipeline, the head must store more intermediate data for backward computation. As a result, on a server with 8 NVIDIA 1080Ti GPU devices, ResNet152 training with a batch size greater than 128 and Bert training with more than 640 million parameters both fail with out-of-memory errors, even though the memory consumption of the entire model accounts for only 68.9% and 88.7%, respectively, of the total memory of the 8 GPUs. It is therefore necessary to develop a new memory replacement technology to improve data exchange performance.



Examples


Embodiment 1

[0026] An embodiment of the present invention provides a method for efficient memory replacement between GPU devices in a pipeline-parallel scenario, which reduces memory constraints. It mainly includes the following steps:

[0027] 1. Allocate work components for GPU devices participating in the swap work.

[0028] In the embodiment of the present invention, the work components mainly include: an advisor (Advisor), a memory manager (MemoryManager), a coordinator (Coordinator), and a transmitter (Transmission). Figure 1 shows the architecture of these four work components within a single GPU device.

[0029] In the embodiment of the present invention, the GPUs participating in the work comprise high-memory-load GPU devices and other GPU devices. A high-memory-load GPU device is one whose memory load exceeds a set threshold, and the other GPU devices refer to the non-h...
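The four work components named above can be sketched as plain Python classes attached to each participating GPU device. This is a hypothetical illustration only: class names, method names, and the most-free-memory placement heuristic are assumptions, not details taken from the patent.

```python
class Advisor:
    """Determines the data exchange scheme: which peer GPU should receive the data."""
    def choose_destination(self, size, peer_free):
        # Assumption for illustration: pick the peer with the most free memory that fits.
        fitting = {dev: free for dev, free in peer_free.items() if free >= size}
        return max(fitting, key=fitting.get) if fitting else None

class MemoryManager:
    """Allocates and tracks memory on its own GPU device."""
    def __init__(self, device_id, total):
        self.device_id, self.total, self.used = device_id, total, 0

    def free(self):
        return self.total - self.used

    def allocate(self, size):
        assert size <= self.free(), "destination device cannot hold the data"
        self.used += size
        # Destination space information returned to the requesting device.
        return {"device": self.device_id, "size": size}

class Transmitter:
    """Performs the device-to-device transfer (an NVLink copy in practice)."""
    def send(self, payload, dest_space):
        dest_space["payload"] = payload
        return dest_space

class Coordinator:
    """Entry point on each GPU: receives requests and drives the other components."""
    def __init__(self, device_id, advisor, transmitter, managers):
        # `managers` maps device id -> MemoryManager (a shared view of the cluster).
        self.device_id = device_id
        self.advisor, self.transmitter, self.managers = advisor, transmitter, managers

    def offload(self, payload, size):
        peer_free = {d: m.free() for d, m in self.managers.items()
                     if d != self.device_id}
        dest = self.advisor.choose_destination(size, peer_free)
        if dest is None:
            return None  # no peer has room; caller must fall back (e.g. to CPU)
        space = self.managers[dest].allocate(size)
        return self.transmitter.send(payload, space)
```

A high-memory-load device would instantiate all four components and call `offload` on inactive data; the returned destination space information is what it later uses to retrieve the data.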

Embodiment 2

[0052] The present invention also provides a system for efficient memory replacement between GPU devices, implemented mainly on the basis of the methods provided in the foregoing embodiments. As shown in Figure 4, the system mainly includes:

[0053] A work component allocation unit, used to allocate work components to the GPU devices participating in the swap work, including: advisors, memory managers, coordinators and transmitters;

[0054] A memory replacement unit, used to perform memory replacement between GPU devices through the work components. The steps include: when the coordinator of the current GPU device receives a data exchange request, if the request type is data offloading, the advisor determines the corresponding data exchange scheme, and the memory manager allocates memory of the corresponding size on the destination GPU device according to that scheme and generates the destination space information ...
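The offload path just described can be sketched end to end as a single function over a simplified cluster state. All names and the dict-based representation are hypothetical, chosen only to make the request → scheme → allocation → transfer sequence concrete.

```python
def handle_exchange_request(request, devices):
    """Coordinator-side handling of one data exchange request (offload only).

    devices: {dev_id: {"total": int, "used": int, "store": dict}}
    request: {"type": "offload", "src": dev_id, "key": str, "size": int, "data": obj}
    Returns destination space information on success, None if no peer has room.
    """
    if request["type"] != "offload":
        raise ValueError("only the offload path is sketched here")

    src = request["src"]

    # Advisor: determine the data exchange scheme (here: peer with most free memory).
    peer_free = {d: v["total"] - v["used"] for d, v in devices.items() if d != src}
    fitting = [d for d, free in peer_free.items() if free >= request["size"]]
    if not fitting:
        return None
    dest = max(fitting, key=peer_free.get)

    # MemoryManager: allocate memory of the corresponding size on the destination
    # and generate the destination space information.
    devices[dest]["used"] += request["size"]
    dest_space = {"device": dest, "key": request["key"], "size": request["size"]}

    # Transmitter: perform the device-to-device copy (stand-in for an NVLink DMA).
    devices[dest]["store"][request["key"]] = request["data"]

    # Source frees its local copy once the transfer completes.
    devices[src]["used"] -= request["size"]
    devices[src]["store"].pop(request["key"], None)
    return dest_space
```

Retrieval would be the mirror image: the coordinator looks up `dest_space`, the transmitter copies the data back, and the destination's memory manager releases the space.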

Embodiment 3

[0057] The present invention also provides a processing device. As shown in Figure 5, it mainly includes: one or more processors; and a memory for storing one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the methods provided by the foregoing embodiments.

[0058] Further, the processing device includes at least one input device and at least one output device; within the processing device, the processor, the memory, the input device, and the output device are connected through a bus.

[0059] In this embodiment of the present invention, the specific types of the memory, the input device, and the output device are not limited; for example:

[0060] The input device can be a touch screen, an image capture device, a physical button or a mouse, etc.;

[0061] The output device can be a display terminal;

[0062] The memory may be random access memory (Random Access Memory, RAM), or ...



Abstract

The invention discloses a method, system and device for efficient memory replacement between GPU devices, and a storage medium. In the disclosed scheme, first, while effectively reducing the memory limitation, the data exchange operations (offloading and retrieval) run in parallel with model training, so no computational overhead is introduced and the transmission time can be hidden. Second, inactive data on GPU devices with a high memory load is offloaded to other GPU devices and retrieved when needed, so that the free memory space of the devices in the system is fully utilized; multiple direct high-speed links among the GPUs are aggregated to obtain high communication bandwidth, so that memory is freed more quickly and data is retrieved in time. Combining these two points, the performance overhead introduced by memory compression can be greatly reduced, and the limitation that memory places on model training can be effectively reduced, thereby improving model training efficiency.

Description

technical field

[0001] The present invention relates to the technical field of GPU device memory replacement, and in particular to a method, system, device and storage medium for efficient memory replacement between GPU devices.

Background technique

[0002] In 2020, a deep learning team at New York University in the United States published the deep-learning parallel-training memory compression system SwapAdvisor (SwapAdvisor: Pushing Deep Learning Beyond the GPU Memory Limit via Smart Swapping) at the ASPLOS (Architectural Support for Programming Languages and Operating Systems) conference. The system swaps inactive data from GPU (graphics processing unit) memory to CPU (central processing unit) memory, and swaps it back into GPU memory the next time it is needed. However, the bandwidth of data replacement between the CPU and a GPU over a PCIe Gen3 x16 link is 16 GB/s, only about 60% of the bandwidth of a single direct NVLink 2.0 link between GPUs. As a result, on DGX-1, the upper limit of the replacement data speed is onl...
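The bandwidth gap above can be made concrete with a rough transfer-time estimate. Only the 16 GB/s PCIe figure comes from the text; the 25 GB/s per-link NVLink 2.0 bandwidth and the four-link aggregation are assumptions used for illustration.

```python
# Estimated time to swap a 4 GiB activation tensor over different links.
PCIE_GEN3_X16_GBPS = 16.0     # CPU<->GPU bandwidth quoted in the text
NVLINK2_PER_LINK_GBPS = 25.0  # assumed per-direction bandwidth of one NVLink 2.0 link
DATA_GB = 4.0

def transfer_time(size_gb, bandwidth_gbps):
    """Idealized transfer time in seconds (ignores latency and protocol overhead)."""
    return size_gb / bandwidth_gbps

t_pcie = transfer_time(DATA_GB, PCIE_GEN3_X16_GBPS)
t_one_link = transfer_time(DATA_GB, NVLINK2_PER_LINK_GBPS)
# Aggregating several direct GPU-GPU links multiplies the usable bandwidth:
t_four_links = transfer_time(DATA_GB, 4 * NVLINK2_PER_LINK_GBPS)

print(f"PCIe Gen3 x16: {t_pcie:.3f} s")        # 0.250 s
print(f"1x NVLink 2.0: {t_one_link:.3f} s")    # 0.160 s
print(f"4x NVLink 2.0: {t_four_links:.3f} s")  # 0.040 s
```

Under these assumptions, aggregating four direct GPU-GPU links cuts the swap time by more than 6x versus the shared PCIe path, which is the motivation for swapping to peer GPUs rather than to the CPU.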

Claims


Application Information

Patent Type & Authority Applications(China)
IPC(8): G06F9/50, G06F9/54, G06T1/20, G06T1/60
CPC: G06F9/5016, G06F9/5027, G06F9/546, G06T1/20, G06T1/60, G06F2209/548, Y02D10/00
Inventor: 王海权, 李诚, 周泉, 于笑颜, 吕敏, 许胤龙
Owner UNIV OF SCI & TECH OF CHINA