A distributed acceleration method and system for a deep learning training task

A deep learning and distributed-computing technology, applied in the field of deep learning, addressing problems such as reduced training accuracy and prior approaches that only reduce per-communication traffic, to achieve the effects of improving cluster scaling efficiency, compressing communication time, and accelerating the training process.

Active Publication Date: 2019-06-18
INST OF INFORMATION ENG CAS

AI Technical Summary

Problems solved by technology

Most of the existing technologies focus on the first approach, which compresses the gradients to be sent by means of quantization and sparsification.
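
To make the prior-art approach referenced above concrete, the following is a minimal sketch of top-k gradient sparsification; the function name, the selection rule, and the compression ratio are illustrative assumptions, not the patent's method.

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, ratio: float = 0.01) -> np.ndarray:
    """Keep only the largest-magnitude fraction of gradient entries.

    Illustrative sketch of the sparsification idea mentioned above; the
    selection rule and ratio are assumptions, not the patent's scheme.
    """
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    compressed = np.zeros_like(flat)
    compressed[idx] = flat[idx]
    return compressed.reshape(grad.shape)
```

Such compression shrinks each message but, as noted above, does not reduce how many parameter-update communications occur, which is the bottleneck the present method targets.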

Method used



Embodiment Construction

[0030] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0031] Referring to Figure 1, the present embodiment provides a distributed acceleration method for deep learning training tasks. The method includes the following steps:

[0032] (1) Build a distributed GPU training cluster, including dividing the machines into parameter servers and working nodes and determining the communication architecture. Referring to Figure 2, the specific steps are as follows:

[0033] (1-1) Build the parameter server to store and update the model parameters. The CPUs of all servers in the cluster collectively form the parameter server, and all model parameters are stored evenly across the memory of the CPUs. Parameter updates are performed by the CPUs, and two operations, push and pull, are exposed for the working nodes to invoke. The push operation refers to the parameter server receiving the gradients sent by a working node, and the pull operation refers to a working node retrieving the latest model parameters from the parameter server.
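
As a concrete illustration of step (1-1), here is a minimal single-process sketch of the push/pull interface of a parameter server; the class name, the plain-SGD update rule, and the learning rate are assumptions made for illustration, not the patent's exact implementation.

```python
import numpy as np

class ParameterServer:
    """Minimal sketch of the parameter-server role in step (1-1).

    Model parameters live on the server (CPU memory in the embodiment);
    working nodes call push() to send gradients and pull() to fetch the
    latest parameters. The SGD update rule here is an assumption.
    """

    def __init__(self, init_params: dict, lr: float = 0.1):
        self.params = {name: p.copy() for name, p in init_params.items()}
        self.lr = lr

    def push(self, grads: dict) -> None:
        # Receive gradients from a working node and update the parameters.
        for name, g in grads.items():
            self.params[name] -= self.lr * g

    def pull(self) -> dict:
        # Return the latest parameters to the requesting working node.
        return {name: p.copy() for name, p in self.params.items()}

# Usage: a working node computes gradients locally, pushes them, then pulls.
server = ParameterServer({"w": np.zeros(4)})
server.push({"w": np.ones(4)})
latest = server.pull()
```

In a real cluster these calls would cross the network between working nodes and the CPU-side parameter server; reducing how often they are invoked is what compresses the communication time.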


Abstract

The invention relates to a distributed acceleration method and system for a deep learning training task. The method comprises the following steps: (1) building a distributed GPU training cluster; (2) adopting a swap-in/swap-out strategy to adjust the minibatch size on a single GPU working node in the distributed GPU training cluster; (3) adjusting the learning rate according to the minibatch size determined in step (2); and (4) carrying out deep learning training with the hyper-parameters determined in steps (2) and (3), namely the minibatch size and the learning rate. On the premise that training accuracy is not affected, the communication time is compressed greatly, simply, and efficiently by reducing the number of parameter-update communications between cluster nodes; compared with the single-GPU mode, the cluster scaling efficiency in the multi-GPU mode can be fully improved, and acceleration of the training process of ultra-deep neural network models is achieved.
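
Steps (2) and (3) enlarge the per-GPU minibatch and adjust the learning rate accordingly; the sketch below assumes the common linear-scaling rule for illustration, and is not asserted to be the patent's exact adjustment.

```python
def scaled_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate with the enlarged minibatch size.

    Linear scaling is assumed here for illustration; the patent only states
    that the learning rate is adjusted according to the minibatch size.
    """
    return base_lr * new_batch / base_batch

# Example: growing the per-GPU minibatch from 256 to 2048 scales 0.1 -> 0.8.
lr = scaled_learning_rate(base_lr=0.1, base_batch=256, new_batch=2048)
```

A larger minibatch means fewer parameter updates per epoch, and hence fewer push/pull communications with the parameter server, which is the source of the claimed speed-up.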

Description

Technical field

[0001] The invention belongs to the field of deep learning. It specifically addresses the problems of low cluster scaling efficiency and slow training when distributed GPU clusters are used to train ultra-deep neural network models, and proposes an acceleration method to reduce the time required for training.

Background technique

[0002] In recent years, big-data-driven deep learning technology has achieved considerable performance improvements in many fields of artificial intelligence. Ever-deeper neural network models and ever-larger data scales have become the basic trend. Complex network models often require more training data to obtain good generalization ability, but training deep models on data of this scale is a great challenge. Deep learning training tasks are typical compute-intensive tasks, so distributed GPU (Graphics Processing Unit) clusters are often used for training...

Claims


Application Information

IPC(8): G06N3/063, G06N3/08, G06T1/20
Inventor: 刘万涛, 郭锦荣, 虎嵩林, 韩冀中
Owner: INST OF INFORMATION ENG CAS