
Hybrid pipeline parallel method for accelerating distributed deep neural network training

A deep neural network technology, applied in the field of hybrid model division and task placement, which addresses the problem of accelerating distributed training and achieves the effect of improving training speed.

Pending Publication Date: 2021-05-11
SOUTHEAST UNIV
Cites: 0 · Cited by: 6

AI Technical Summary

Problems solved by technology

[0007] The present invention mainly targets the acceleration of distributed deep learning training under the current pipeline training mode, and proposes a hybrid model division and task placement method.


Examples


Embodiment 1

[0030] Embodiment 1: see Figure 1. A hybrid pipeline parallel method for accelerating distributed deep neural network training, the method comprising the following steps:

[0031] Step 1: Establish the layer-wise cumulative distribution function (CDF) model of the deep neural network and analyze the input conditions required by the model division and task placement algorithms for the deep learning application. Use the pytorch framework to obtain the parameter amount of each layer; then, according to the given batch size, calculate the intermediate-result traffic of each layer; finally, according to the type of each layer (such as convolutional layer or fully connected layer), calculate the floating-point computation amount of each layer, in preparation for step 2;
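The per-layer profiling in Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `profile_layer` helper, the layer specifications, and the float32 assumption for intermediate traffic are all hypothetical (the patent obtains the per-layer parameter amounts via the pytorch framework).

```python
# Hypothetical sketch of Step 1: for each layer, estimate the parameter
# count, the intermediate-result traffic for a given batch size, and the
# floating-point operation count, keyed by layer type.

def profile_layer(layer, batch_size):
    kind = layer["type"]
    if kind == "conv":
        cin, cout, k, h, w = (layer[key] for key in ("cin", "cout", "k", "h", "w"))
        params = cout * cin * k * k + cout      # weights + biases
        activ = batch_size * cout * h * w       # output elements
        flops = 2 * activ * cin * k * k         # multiply-adds per output element
    elif kind == "fc":
        fin, fout = layer["fin"], layer["fout"]
        params = fin * fout + fout
        activ = batch_size * fout
        flops = 2 * batch_size * fin * fout
    else:
        raise ValueError(f"unknown layer type: {kind}")
    bytes_out = activ * 4                       # float32 intermediate-result traffic
    return {"params": params, "flops": flops, "traffic_bytes": bytes_out}

# A toy two-layer network profiled at batch size 32.
layers = [
    {"type": "conv", "cin": 3, "cout": 16, "k": 3, "h": 32, "w": 32},
    {"type": "fc", "fin": 16 * 32 * 32, "fout": 10},
]
profile = [profile_layer(layer, batch_size=32) for layer in layers]
```

The three quantities per layer (parameters, traffic, FLOPs) are exactly the inputs the division algorithm of step 2 consumes.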

[0032] Step 2: Based on the results of step 1, use a dynamic programming algorithm to solve the parallel time between any two layers of...
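The dynamic program in Step 2 is truncated above; a common formulation for this kind of pipeline-stage balancing is sketched below. This is a simplified illustration under stated assumptions (each layer's cost is a single time value, communication cost is ignored); the `partition` function is hypothetical, not the patent's algorithm.

```python
# Sketch of a balanced-partition dynamic program: split a chain of layers
# into contiguous pipeline stages so that the slowest stage (the pipeline
# bottleneck) is as fast as possible.

def partition(times, stages):
    """Split the layer-time list into `stages` contiguous groups minimizing
    the maximum group sum. Returns (bottleneck_time, list_of_groups)."""
    n = len(times)
    prefix = [0]
    for t in times:
        prefix.append(prefix[-1] + t)
    seg = lambda i, j: prefix[j] - prefix[i]    # sum of times[i:j]

    INF = float("inf")
    # dp[i][s]: best bottleneck when the first i layers form s stages.
    dp = [[INF] * (stages + 1) for _ in range(n + 1)]
    cut = [[0] * (stages + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for s in range(1, stages + 1):
            for j in range(s - 1, i):           # last stage is layers j..i-1
                cand = max(dp[j][s - 1], seg(j, i))
                if cand < dp[i][s]:
                    dp[i][s], cut[i][s] = cand, j
    # Recover the stage boundaries by walking the cut table backwards.
    groups, i = [], n
    for s in range(stages, 0, -1):
        j = cut[i][s]
        groups.append(times[j:i])
        i = j
    groups.reverse()
    return dp[n][stages], groups
```

For example, splitting layer times `[1, 2, 3, 4, 5]` into two stages yields the cut `[1, 2, 3] | [4, 5]`, with a bottleneck of 9 rather than the 10 or 12 the other cuts would give, which is the load-balance objective the abstract describes (minimizing the maximum task execution time of each stage).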

Specific Embodiment

[0046] Specific embodiment: the present invention is mainly implemented in a GPU cluster environment.

[0047] Figure 1 shows a schematic diagram of a GPU cluster, which mainly comprises several GPU server nodes, each equipped with several GPUs. Within a node, the GPUs are connected through PCIe; between nodes, the connection is through Ethernet.

[0048] Figure 2 shows a schematic diagram of the hybrid parallel method. A rectangle divided into two parts represents a division method; rectangles from different parts stacked together represent the repeated-computation part, that is, the part requiring communication. The three diagrams correspond to three tensor division methods, which require communicating the weight parameters, the output-layer data, and the input-layer data, respectively.
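The trade-off among the three tensor division methods of Figure 2 can be illustrated by comparing their communication volumes for a single fully connected layer. The `comm_volumes` helper and the layer sizes are hypothetical assumptions for illustration, not values from the patent.

```python
# Illustrative comparison of the three tensor-division choices for one
# fully connected layer: each choice must communicate a different tensor,
# so the cheapest choice depends on the layer's shape and batch size.

def comm_volumes(batch, fin, fout, bytes_per_elem=4):
    return {
        "weights": fin * fout * bytes_per_elem,    # synchronize weight parameters
        "outputs": batch * fout * bytes_per_elem,  # exchange output-layer data
        "inputs": batch * fin * bytes_per_elem,    # exchange input-layer data
    }

# For a large fc layer at a modest batch size, communicating the output
# activations is far cheaper than synchronizing the weights.
vols = comm_volumes(batch=64, fin=4096, fout=1024)
best = min(vols, key=vols.get)
```

This is why a division algorithm benefits from choosing the division method per layer rather than globally: the cheapest tensor to communicate differs from layer to layer.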

[0049] Figure 3 shows the overall flow chart: the first step is to initialize the system, load the input data, and create the specified model; then describe the model layer by layer,...



Abstract

The invention provides a hybrid pipeline parallel method for accelerating distributed deep neural network training, mainly solving the problems that resources are under-utilized and efficient distributed training cannot be realized in traditional GPU cluster distributed training. The core mechanism comprises three parts: deep learning model description, model hybrid division, and hybrid pipeline parallel division. First, for the resource requirements of the deep learning application during GPU training, the method describes indexes such as the computation amount, the intermediate-result communication amount, and the parameter synchronization amount, and uses them as the input of model hybrid division and task placement. Then, according to the model description result and the GPU cluster environment, two division algorithms based on dynamic programming realize model hybrid division and hybrid pipeline parallel division, minimizing the maximum task execution time across the stages after division; this ensures load balance and realizes efficient distributed training of the deep neural network.

Description

[0001] Field

[0002] The invention relates to a method for model division and task placement in hybrid pipeline distributed deep learning, and belongs to the technical field of distributed computing.

Background technique

[0003] Deep learning is a general term for a class of pattern analysis methods: through deep neural networks, multi-layer nonlinear information is used for supervised or unsupervised feature extraction and transformation. In recent years, with continuous technological advances, deep learning has been widely applied in many fields such as image recognition, natural language processing, and human-computer game playing. However, as deep learning develops, networks grow deeper, layer counts increase, and training time lengthens; training a complete deep neural network model usually takes tens of hours, or even days or weeks. Therefore, how to efficiently execute the traini...

Claims


Application Information

IPC(8): G06N3/04 G06N3/08 G06F9/50
CPC: G06F9/505 G06F9/5083 G06N3/04 G06N3/084
Inventor: 张竞慧, 李剑歌, 王宇晨, 金嘉晖, 东方, 罗军舟
Owner SOUTHEAST UNIV