Distributed deep neural network performance modeling method based on an instruction queue
A deep neural network and instruction queue technology applied in the field of performance modeling. It addresses the problems that existing model construction depends on experimental results and that no estimate of the neural network's training time consumption can be given, achieving stronger characterization ability, better modeling results, and broad applicability.
Examples
Embodiment 1
[0025] The workflow of the distributed deep neural network performance modeling method introduced in this Embodiment 1, which updates parameters via the parameter server method, covers the extraction of key software and hardware feature parameters, single-GPU performance modeling, and multi-GPU performance modeling, and finally yields an estimate of the per-iteration training time of this kind of deep neural network under the current hardware environment and software configuration. The GPU performance modeling comprises an instruction queue model, a throughput model, a GPU topology model, and a parameter server / collective communication transmission model.
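As a minimal sketch of how such sub-models could be composed into a per-iteration time estimate, the code below combines a single-GPU compute term with a multi-GPU transmission term. The class fields, function names, and numeric values are illustrative assumptions, not the patent's actual implementation.

    from dataclasses import dataclass

    # Hypothetical container for the key software/hardware feature parameters.
    @dataclass
    class FeatureParams:
        layer_flops: list          # per-layer floating-point operations (software feature)
        gradient_bytes: float      # gradient volume exchanged per iteration
        gpu_peak_flops: float      # single-GPU peak throughput (hardware feature)
        link_bandwidth: float      # interconnect bandwidth in bytes/s
        num_gpus: int

    def estimate_iteration_time(p, compute_time_fn, transmission_time_fn):
        """Single-GPU compute time (instruction queue / throughput models) plus
        multi-GPU transmission time (topology / transmission models)."""
        compute = sum(compute_time_fn(f, p.gpu_peak_flops) for f in p.layer_flops)
        comm = transmission_time_fn(p.gradient_bytes, p.num_gpus, p.link_bandwidth)
        return compute + comm

    # Example usage with placeholder sub-models (assumed values for illustration).
    t = estimate_iteration_time(
        FeatureParams([1e9, 5e8], 4 * 25e6, 1e13, 12e9, 4),
        compute_time_fn=lambda flops, peak: flops / peak,       # simple throughput model
        transmission_time_fn=lambda b, n, bw: n * b / bw)       # placeholder transmission model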
[0026] Figure 1 shows a schematic diagram of the workflow of the distributed deep neural network performance modeling method based on an instruction queue. As shown in Figure 1, the specific workflow is as follows: first, the key software and hardware feature parameters A are extracted, and the key parame...
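For illustration of the extraction step A, the sketch below derives per-layer FLOPs and gradient volume for a small fully connected network; the layer list, FLOP formula, and fp32 gradient size are assumptions standing in for the patent's actual feature set.

    # Illustrative extraction of key software feature parameters (assumed formulas).
    def extract_feature_params(layers, batch_size):
        layer_flops, param_count = [], 0
        for in_dim, out_dim in layers:                               # (fan-in, fan-out)
            layer_flops.append(2.0 * batch_size * in_dim * out_dim)  # multiply-accumulate FLOPs
            param_count += in_dim * out_dim + out_dim                # weights + bias
        gradient_bytes = 4.0 * param_count                           # fp32 gradients
        return layer_flops, gradient_bytes

    layer_flops, grad_bytes = extract_feature_params([(784, 256), (256, 10)], batch_size=64)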
Embodiment 2
[0039] This embodiment describes the workflow of a distributed deep neural network performance modeling method that updates parameters via collective communication. The difference from Embodiment 1 is the transmission model C that is used: as shown in Figure 3, this embodiment uses the collective communication transmission model C2, and the rest is consistent with Embodiment 1.
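A widely used analytic cost for ring all-reduce is sketched below as an illustration; whether the collective communication transmission model C2 uses exactly this formula is an assumption.

    # Ring all-reduce cost sketch (assumed form, for illustration only).
    def allreduce_time(gradient_bytes, num_gpus, link_bandwidth):
        if num_gpus <= 1:
            return 0.0
        # each GPU sends and receives 2*(N-1)/N of the gradient volume in a ring
        traffic = 2.0 * (num_gpus - 1) / num_gpus * gradient_bytes
        return traffic / link_bandwidth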
[0040] The principle of the collective communication transmission model C2 is shown in Figure 8. It consists of two parts: the server GPU topology H8 and the computation and transmission time diagram H9. The server GPU topology H8 includes GPUs and CPUs; in the example of Figure 8 it is composed of CPU H1, GPU0 H2, GPU1 H3, GPU2 H4, and GPU3 H5. The computation and transmission time diagram H9 is composed of a computation module H6 and a parameter update kernel function module H7. Among them, t...
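One plausible reading of the time composition in Figure 8 is sketched below: per-GPU computation (module H6), transfers over the CPU-GPU links of topology H8, and the parameter update kernel (module H7). Serializing the transfers over one shared CPU link is an assumption made only for illustration, not the patent's stated model.

    # Sketch of compute + transfer + update-kernel time over a CPU-centered topology.
    def transmission_model_time(compute_time, gradient_bytes, num_gpus,
                                cpu_link_bandwidth, update_kernel_time):
        to_cpu = num_gpus * gradient_bytes / cpu_link_bandwidth     # gradients GPU -> CPU
        from_cpu = num_gpus * gradient_bytes / cpu_link_bandwidth   # parameters CPU -> GPU
        return compute_time + to_cpu + update_kernel_time + from_cpu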


