
Hybrid pipeline parallel method for accelerating distributed deep neural network training

A deep neural network technology, applied in the field of hybrid model division and task placement, which addresses the problem of accelerating distributed training and achieves the effect of improving training speed.

Pending Publication Date: 2021-05-11
SOUTHEAST UNIV
Cites: 0 · Cited by: 6

AI Technical Summary

Problems solved by technology

[0007] The present invention mainly targets the acceleration of distributed deep learning training under the current pipeline training mode, and proposes a hybrid model division and task placement method.


Examples


Embodiment 1

[0030] Embodiment 1: see Figure 1. A hybrid pipeline parallel method for accelerating distributed deep neural network training, the method comprising the following steps:

[0031] Step 1: Establish the layer-wise cumulative distribution function (CDF) model of the deep neural network and analyze the input conditions required by the model division and task placement algorithms for the deep learning application. Use the pytorch framework to obtain the parameter amount of each layer; then, according to the given batch size, calculate the intermediate-result traffic of each layer; finally, according to the type of each layer (such as convolutional layer or fully connected layer), calculate the floating-point computation amount of each layer, in preparation for step 2;
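The per-layer profiling in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `profile_layer` helper, the layer specifications, and the float32 assumption for intermediate traffic are all hypothetical (the patent obtains the per-layer parameter amounts via the pytorch framework).

```python
# Hypothetical sketch of Step 1: for each layer, estimate the parameter
# count, the intermediate-result traffic for a given batch size, and the
# floating-point operation count, keyed by layer type.

def profile_layer(layer, batch_size):
    kind = layer["type"]
    if kind == "conv":
        cin, cout, k, h, w = (layer[key] for key in ("cin", "cout", "k", "h", "w"))
        params = cout * cin * k * k + cout      # weights + biases
        activ = batch_size * cout * h * w       # output elements
        flops = 2 * activ * cin * k * k         # multiply-adds per output element
    elif kind == "fc":
        fin, fout = layer["fin"], layer["fout"]
        params = fin * fout + fout
        activ = batch_size * fout
        flops = 2 * batch_size * fin * fout
    else:
        raise ValueError(f"unknown layer type: {kind}")
    bytes_out = activ * 4                       # float32 intermediate-result traffic
    return {"params": params, "flops": flops, "traffic_bytes": bytes_out}

# A toy two-layer network profiled at batch size 32.
layers = [
    {"type": "conv", "cin": 3, "cout": 16, "k": 3, "h": 32, "w": 32},
    {"type": "fc", "fin": 16 * 32 * 32, "fout": 10},
]
profile = [profile_layer(layer, batch_size=32) for layer in layers]
```

The three quantities per layer (parameters, traffic, FLOPs) are exactly the inputs the division algorithm of step 2 consumes.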

[0032] Step 2: Based on the results of step 1, use a dynamic programming algorithm to solve the parallel time between any two layers of...
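The dynamic program in Step 2 is truncated above; a common formulation for this kind of pipeline-stage balancing is sketched below. This is a simplified illustration under stated assumptions (each layer's cost is a single time value, communication cost is ignored); the `partition` function is hypothetical, not the patent's algorithm.

```python
# Sketch of a balanced-partition dynamic program: split a chain of layers
# into contiguous pipeline stages so that the slowest stage (the pipeline
# bottleneck) is as fast as possible.

def partition(times, stages):
    """Split the layer-time list into `stages` contiguous groups minimizing
    the maximum group sum. Returns (bottleneck_time, list_of_groups)."""
    n = len(times)
    prefix = [0]
    for t in times:
        prefix.append(prefix[-1] + t)
    seg = lambda i, j: prefix[j] - prefix[i]    # sum of times[i:j]

    INF = float("inf")
    # dp[i][s]: best bottleneck when the first i layers form s stages.
    dp = [[INF] * (stages + 1) for _ in range(n + 1)]
    cut = [[0] * (stages + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for s in range(1, stages + 1):
            for j in range(s - 1, i):           # last stage is layers j..i-1
                cand = max(dp[j][s - 1], seg(j, i))
                if cand < dp[i][s]:
                    dp[i][s], cut[i][s] = cand, j
    # Recover the stage boundaries by walking the cut table backwards.
    groups, i = [], n
    for s in range(stages, 0, -1):
        j = cut[i][s]
        groups.append(times[j:i])
        i = j
    groups.reverse()
    return dp[n][stages], groups
```

For example, splitting layer times `[1, 2, 3, 4, 5]` into two stages yields the cut `[1, 2, 3] | [4, 5]`, with a bottleneck of 9 rather than the 10 or 12 the other cuts would give, which is the load-balance objective the abstract describes (minimizing the maximum task execution time of each stage).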

Specific Embodiment

[0046] Specific embodiment: the present invention is mainly implemented in a GPU cluster environment.

[0047] Figure 1 shows a schematic diagram of a GPU cluster, which mainly comprises several GPU server nodes, each equipped with several GPUs. Within a node, the GPUs are connected through PCIe; between nodes, the connection is through Ethernet.

[0048] Figure 2 shows a schematic diagram of the hybrid parallel method. A rectangle divided into two parts represents a division method; rectangles from different parts stacked together represent the repeated-computation part, that is, the part requiring communication. The three diagrams correspond to three tensor division methods, which require communicating the weight parameters, the output-layer data, and the input-layer data, respectively.
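The trade-off among the three tensor division methods of Figure 2 can be illustrated by comparing their communication volumes for a single fully connected layer. The `comm_volumes` helper and the layer sizes are hypothetical assumptions for illustration, not values from the patent.

```python
# Illustrative comparison of the three tensor-division choices for one
# fully connected layer: each choice must communicate a different tensor,
# so the cheapest choice depends on the layer's shape and batch size.

def comm_volumes(batch, fin, fout, bytes_per_elem=4):
    return {
        "weights": fin * fout * bytes_per_elem,    # synchronize weight parameters
        "outputs": batch * fout * bytes_per_elem,  # exchange output-layer data
        "inputs": batch * fin * bytes_per_elem,    # exchange input-layer data
    }

# For a large fc layer at a modest batch size, communicating the output
# activations is far cheaper than synchronizing the weights.
vols = comm_volumes(batch=64, fin=4096, fout=1024)
best = min(vols, key=vols.get)
```

This is why a division algorithm benefits from choosing the division method per layer rather than globally: the cheapest tensor to communicate differs from layer to layer.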

[0049] Figure 3 shows the overall flow chart: the first step is to initialize the system, load the input data, and create the specified model; then describe the model layer by layer,...



Abstract

The invention provides a hybrid pipeline parallel method for accelerating distributed deep neural network training, mainly solving the problems that resources are under-utilized and efficient distributed training cannot be realized in traditional GPU cluster distributed training. The core mechanism comprises three parts: deep learning model description, model hybrid division, and hybrid pipeline parallel division. First, for the resource requirements of the deep learning application during GPU training, the method describes indexes such as the computation amount, the intermediate-result communication amount, and the parameter synchronization amount, and uses them as the input of model hybrid division and task placement. Then, according to the model description result and the GPU cluster environment, two division algorithms based on dynamic programming realize model hybrid division and hybrid pipeline parallel division, minimizing the maximum task execution time across the stages after division; this ensures load balance and realizes efficient distributed training of the deep neural network.

Description

[0001] Field

[0002] The invention relates to a method for model division and task placement in hybrid pipeline distributed deep learning, and belongs to the technical field of distributed computing.

Background technique

[0003] Deep learning is a general term for a class of pattern analysis methods: through deep neural networks, multi-layer nonlinear information is used for supervised or unsupervised feature extraction and transformation. In recent years, with continuous technological advances, deep learning has been widely applied in many fields such as image recognition, natural language processing, and human-computer game playing. However, as deep learning develops, networks grow deeper, layer counts increase, and training time lengthens; training a complete deep neural network model usually takes tens of hours, or even days or weeks. Therefore, how to efficiently execute the traini...

Claims


Application Information

IPC(8): G06N3/04 G06N3/08 G06F9/50
CPC: G06F9/505 G06F9/5083 G06N3/04 G06N3/084
Inventor: 张竞慧, 李剑歌, 王宇晨, 金嘉晖, 东方, 罗军舟
Owner SOUTHEAST UNIV