
Distributed deep neural network performance modeling method based on an instruction queue

A deep neural network and instruction-queue technology, applied in the field of performance modeling. It addresses the problems that existing model construction depends on experimental results and cannot estimate the time consumed by one iteration of neural-network training, achieving enhanced characterization, improved accuracy, and strong generality.

Active Publication Date: 2019-04-19
UNIV OF SCI & TECH OF CHINA


Problems solved by technology

At present, performance modeling of deep neural networks trained on GPUs is mainly carried out by collecting low-level data experimentally. For example, the article "Performance modeling and evaluation of distributed deep learning frameworks on GPUs [C]" (IEEE, 2018) constructs a single-iteration latency model of a convolutional neural network for three different deep learning frameworks, but the model construction depends on experimental results, and it cannot give an a-priori estimate of the time consumed by one iteration of neural-network training. There is also the practice of estimating performance by theoretical calculation: for example, the article "Paleo: A Performance Model for Deep Neural Networks [C]" (In Proceedings of the International Conference on Learning Representations, ICLR 2017) constructs a deep neural network performance model for different networks and distributed hardware environments, mapped onto specific software, hardware, and communication-strategy spaces, and uses it to explore the scalability of deep learning systems; however, its error can only be controlled to within 30%.
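The Paleo-style theoretical estimate mentioned above can be illustrated with a minimal roofline-like sketch: per-layer time is approximated as compute work over device throughput plus data moved over memory bandwidth. The function name, layer list, and hardware numbers below are illustrative assumptions, not values from the cited paper.

```python
# Minimal Paleo-style per-layer time estimate (illustrative sketch only).

def layer_time(flops, bytes_moved, peak_flops, bandwidth):
    """Estimated layer time in seconds: compute term plus memory-transfer term."""
    return flops / peak_flops + bytes_moved / bandwidth

# Hypothetical 2-layer network on a GPU with 10 TFLOP/s peak and 500 GB/s bandwidth.
layers = [(2e9, 4e6), (8e9, 1.6e7)]  # (FLOPs, bytes moved) per layer
total = sum(layer_time(f, b, 10e12, 500e9) for f, b in layers)
print(f"estimated forward time: {total * 1e3:.3f} ms")
```

Such closed-form estimates require no profiling runs, which is exactly why, as the passage notes, their error is harder to control than that of measurement-driven models.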

Method used



Examples


Embodiment 1

[0025] The workflow of the distributed deep neural network performance modeling method introduced in this Embodiment 1, which updates parameters via the parameter-server method, involves extracting key software and hardware feature parameters, single-GPU performance modeling, and multi-GPU performance modeling, finally realizing an estimate of the time consumed by one training iteration of such a deep neural network under the current hardware environment and software configuration. The GPU performance modeling includes an instruction-queue model, a throughput model, a GPU topology model, and a parameter-server / collective-communication transmission model.
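For the parameter-server update path named above, a common first-order estimate of per-iteration communication time is that each worker pushes its gradients to the server and pulls updated parameters back, so the link carries twice the model size per worker. This is a hedged sketch of that general idea; the worker count, model size, and bandwidth are illustrative assumptions, not the patent's model.

```python
# Hedged sketch of a parameter-server transfer-time estimate (not the
# patent's actual transmission model; all numbers are assumptions).

def ps_comm_time(n_workers, model_bytes, link_bandwidth):
    """Seconds per iteration for gradient push + parameter pull over a shared link."""
    return 2 * n_workers * model_bytes / link_bandwidth

# 4 worker GPUs, 100 MB of parameters, a 10 GB/s PCIe-like link.
t = ps_comm_time(4, 100e6, 10e9)
print(f"communication time per iteration: {t * 1e3:.1f} ms")
```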

[0026] Figure 1 gives a schematic diagram of the workflow of the distributed deep neural network performance modeling method based on an instruction queue. As shown in Figure 1, the specific workflow is as follows: first, the key software and hardware feature parameters are extracted (A), and the key parame...
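The instruction-queue idea underlying the workflow can be sketched as draining a FIFO of timed operations and accumulating a clock. This is a toy illustration of the general technique only; the patent's actual queue semantics are not given in this excerpt, and the operation names and durations below are invented.

```python
from collections import deque

# Toy instruction-queue latency model (illustrative sketch, not the patent's model).

def simulate_queue(instructions):
    """Drain a FIFO of (name, duration_us) items; return total time and a trace."""
    queue = deque(instructions)
    clock, trace = 0, []
    while queue:
        name, dur = queue.popleft()
        trace.append((name, clock, clock + dur))  # (operation, start, end)
        clock += dur
    return clock, trace

total_us, trace = simulate_queue([
    ("conv1_forward", 1200),
    ("relu1_forward", 100),
    ("conv1_backward", 2400),
    ("gradient_push", 800),   # hypothetical transfer to the parameter server
])
print(f"estimated iteration time: {total_us / 1000:.1f} ms")
```

A serial drain like this gives an upper bound; overlapping compute and transfer entries would tighten the estimate.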

Embodiment 2

[0039] This embodiment provides a workflow for a distributed deep neural network performance modeling method that updates parameters via a collective-communication method. It differs from Embodiment 1 only in the transmission model C used: as shown in Figure 3, this embodiment uses the collective-communication transmission model C2, and the rest is consistent with Embodiment 1.

[0040] The principle of transmission model C2 is shown in Figure 8. It consists of two parts: the server GPU topology H8 and the computation-and-transmission timing diagram H9. The server GPU topology H8 includes GPUs and CPUs; in Figure 8, for example, it is composed of CPU H1, GPU0 H2, GPU1 H3, GPU2 H4, and GPU3 H5. The computation-and-transmission timing diagram H9 is composed of a computation module H6 and a parameter-update kernel-function module H7. Among them, t...
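For the collective-communication update used in this embodiment, a standard closed-form estimate is the bandwidth term of a ring all-reduce, in which each of N GPUs sends and receives 2·(N−1)/N of the model size. This is a generic textbook formula offered as a hedged sketch, not the patent's transmission model; the GPU count, model size, and bandwidth are assumptions.

```python
# Bandwidth term of a ring all-reduce (generic formula, illustrative numbers).

def ring_allreduce_time(n_gpus, model_bytes, link_bandwidth):
    """Seconds for one all-reduce: 2 * (N - 1) / N * bytes / bandwidth."""
    return 2 * (n_gpus - 1) / n_gpus * model_bytes / link_bandwidth

# 4 GPUs as in the Figure 8 topology, 100 MB of gradients, 10 GB/s links.
t = ring_allreduce_time(4, 100e6, 10e9)
print(f"all-reduce time per iteration: {t * 1e3:.1f} ms")
```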



Abstract

The invention discloses a distributed deep neural network performance modeling method based on an instruction queue. The method uses hardware performance parameters, the neural network structure, the underlying computing architecture, the data transmission protocol, link bandwidth characteristics, and the server GPU topology to carry out layered mapping, splitting, and segmented computation on a deep neural network; an instruction queue is then used to estimate the time consumed by one training iteration, while the data-interaction behavior between hardware components is also output. The method considers software and hardware characteristics simultaneously and performs time-consumption analysis with an instruction-level queue model, thereby realizing a one-iteration time estimate for deep neural network training and an analysis of each hardware data-interaction process. It is applicable to different hardware environments (different servers, different types of GPUs, and different numbers of GPUs) and to different neural networks.

Description

technical field

[0001] The invention belongs to the technical field of performance modeling based on a specific computation model, and in particular relates to a method for modeling the performance of a deep neural network trained on a single or multiple graphics processing units (GPUs).

Background technique

[0002] The central processing unit (Central Processing Unit, CPU) is the computing core and control core of a computer, and the graphics processing unit (Graphics Processing Unit, GPU) is a microprocessor for image-computing work. Compared with the CPU, the GPU has more computing units. Since the development of GPU general-purpose computing technology, GPUs have been widely used in large-scale computing tasks, especially in the field of deep learning.

[0003] A Deep Neural Network (DNN) is an artificial neural network (Artificial Neural Network, ANN) with multiple hidden layers. Its concept was proposed by Geoffrey Hinton's research group at the University of Toronto in 20...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06N3/10, G06N3/08, G06N3/063
CPC: G06N3/063, G06N3/084, G06N3/10, Y02D10/00
Inventors: 李陈圣, 秦晓卫, 裴梓茜, 李晓敏, 杨渡佳
Owner: UNIV OF SCI & TECH OF CHINA