Single-step delay stochastic gradient descent training method for machine learning

A stochastic gradient descent training method, applied to the optimization of distributed training, that addresses problems such as low convergence accuracy and achieves the effects of improving training speed, increasing the utilization of computing resources, and preserving training accuracy.

Pending Publication Date: 2020-11-03
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

Although ASGD achieves better training performance by alleviating the synchronization overhead in SSGD, it tends to converge less accurately than SSGD due to the use of delayed gradient information for training.



Examples


Embodiment 1

[0018] Example 1: A single-step delay stochastic gradient descent training method for machine learning.

[0019] This embodiment implements OD-SGD in MXNet, an efficient and flexible distributed framework designed for neural network training. MXNet is built on the parameter-server architecture. One of its key designs is the dependency engine, a scheduling library that orders operations according to the dependencies between them; during training, operations without mutual dependencies can be executed in parallel. The default MXNet framework defines only one update function per training mode. Although OD-SGD is implemented here in MXNet, it is equally applicable to other mainstream deep learning frameworks such as PyTorch and Caffe. The method includes the following steps:
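To make the dependency-engine behaviour concrete, the toy scheduler below is a minimal sketch, not MXNet's actual engine API (the class and variable names are illustrative): each pushed operation runs only after every earlier operation that touched one of the same variables has finished, so operations on disjoint variables may overlap.

```python
from concurrent.futures import ThreadPoolExecutor

class ToyDependencyEngine:
    """Toy illustration of dependency-based scheduling: an operation waits for
    all earlier operations that used any of its variables; otherwise it may
    run in parallel with them."""

    def __init__(self, num_threads=4):
        self._pool = ThreadPoolExecutor(num_threads)
        self._last = {}                      # variable name -> last future touching it

    def push(self, fn, variables):
        deps = [self._last[v] for v in variables if v in self._last]

        def task():
            for d in deps:                   # block until conflicting predecessors finish
                d.result()
            return fn()

        fut = self._pool.submit(task)
        for v in variables:
            self._last[v] = fut
        return fut

# Example: the gradient send and a local update on a different variable may
# overlap, while a later operation on the same gradient must wait for the send.
eng = ToyDependencyEngine()
eng.push(lambda: print("send grad_w"), ["grad_w"])
eng.push(lambda: print("update local_w"), ["local_w"])        # disjoint: may overlap
eng.push(lambda: print("server applies grad_w"), ["grad_w"])  # waits for the send
```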

[0020] S1: Define global update function and local update function

[0021] The single-step delay stochastic gradient descent training method includes a global update function and a local update function ...
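Paragraph [0021] is truncated above, so the following is only an illustrative sketch of the two functions under stated assumptions: the global update is assumed to be a plain SGD step applied on the parameter server to the one-step-delayed aggregated gradient, while the local update is assumed to refine the worker's local weights with its freshly computed gradient (here with momentum; the actual rule and its hyper-parameters are whatever the method configures for the local update).

```python
import numpy as np

def global_update(server_weight, aggregated_grad, lr):
    """Assumed global update on the parameter server: a plain SGD step on the
    aggregated (one-step-delayed) gradient received from the computing nodes."""
    return server_weight - lr * aggregated_grad

def local_update(pulled_weight, local_grad, lr, momentum_buf, mu=0.9):
    """Assumed local update on a computing node: refine the weight pulled from
    the server with the node's own latest gradient using momentum SGD, so the
    node can keep computing while its gradient is still in flight."""
    momentum_buf *= mu
    momentum_buf += local_grad
    return pulled_weight - lr * momentum_buf
```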

Embodiment 2

[0029] Example 2: Implementation of OD-SGD in the MXNet framework.

[0030] OD-SGD is implemented in the MXNet framework, and evaluation experiments are conducted on GPU clusters of 5, 9 and 13 machines. Each machine has 20 Intel cores at 2.60 GHz and 256 GB of memory, and is equipped with two Tesla K80 (dual-GPU) processors. The clusters are interconnected via Ethernet and InfiniBand, with network bandwidths of 10 Gbps and 56 Gbps respectively. All machines run Red Hat 4.8.3 with CUDA 8.0 and cuDNN 6.0 installed. During the implementation, one of the machines is set as the parameter server and scheduling node, and the remaining machines are set as computing nodes.
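As a rough illustration of the role assignment described above (the host address, port, and node counts below are placeholders, and in practice MXNet's launch tooling sets these variables before starting each process):

```python
import os
import mxnet as mx

# Placeholder environment for one worker process; the machine acting as
# parameter server / scheduler would instead set DMLC_ROLE to "server" or
# "scheduler".
os.environ.update({
    "DMLC_ROLE": "worker",
    "DMLC_PS_ROOT_URI": "10.0.0.1",   # address of the scheduling node (placeholder)
    "DMLC_PS_ROOT_PORT": "9000",
    "DMLC_NUM_SERVER": "1",           # one machine acts as the parameter server
    "DMLC_NUM_WORKER": "4",           # remaining machines act as computing nodes
})

# Distributed key-value store through which workers push gradients and pull weights.
kv = mx.kvstore.create("dist_sync")
print("rank", kv.rank, "of", kv.num_workers, "workers")
```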

[0031] To demonstrate the effectiveness of the method of the present invention, three data sets are used: (1) MNIST, a handwritten digit database consisting of 60,000 training samples and 10,000 test samples; (2) CIFAR-10, which contains 60,000 32×32 color images in 10 different categories, ...



Abstract

The invention discloses a single-step delay stochastic gradient descent training method for machine learning. The method comprises the following steps: defining a global update function and a local update function, where the update function runs on the parameter server in distributed training mode and on the computing node in single-node training mode; adjusting the execution order of computation and communication operations in the computing node, so that in the OD-SGD training mode the computing node performs the update of its local weights while the gradient sending operation is being executed; introducing a new parameter variable to break the original data dependency; and executing the distributed training task, setting the number of iterations of the warm-up stage, and additionally specifying the algorithm used for the local update and its corresponding hyper-parameters. The method can increase the utilization of computing resources in distributed deep learning training and, on the premise of ensuring training accuracy, increase the distributed training speed and shorten the training time of the neural network model.
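The following self-contained sketch (a toy least-squares problem, not the patented implementation; the learning rate, warm-up length and update rules are illustrative assumptions) shows the single-step-delay pattern summarised in the abstract: after a warm-up of synchronous steps, the server applies the gradient pushed one iteration earlier, while the computing node simultaneously refines a local copy of the weights with its fresh gradient.

```python
import numpy as np

# Toy least-squares problem: minimise ||X w - y||^2 / (2 n)
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=10)

def grad(w, idx):
    xb, yb = X[idx], y[idx]
    return xb.T @ (xb @ w - yb) / len(idx)

lr, warmup, iters, batch = 0.05, 10, 300, 32
server_w = np.zeros(10)        # weights held on the parameter server
local_w = server_w.copy()      # worker-local copy used for overlapped updates
pending_grad = None            # gradient "in flight" to the server (one-step delay)

for t in range(iters):
    idx = rng.choice(len(X), batch, replace=False)
    g = grad(local_w, idx)

    if t < warmup:
        # Warm-up stage: plain synchronous SGD, no delay.
        server_w -= lr * g
        local_w = server_w.copy()
        continue

    # Global update: the server applies the gradient sent one iteration earlier.
    if pending_grad is not None:   # nothing is in flight right after warm-up
        server_w -= lr * pending_grad
    # Local update: meanwhile the worker refines its own copy with the fresh gradient.
    local_w = server_w - lr * g
    pending_grad = g               # this gradient reaches the server next iteration

print("final training loss:", float(np.mean((X @ local_w - y) ** 2) / 2))
```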

Description

Technical field

[0001] The invention relates to the field of artificial intelligence, and in particular to an optimization method for distributed training.

Background technique

[0002] Deep learning has recently achieved great success in fields such as computer vision, natural language processing, autonomous driving, and intelligent medical care. The rise of deep learning is mainly due to two conditions: one is the emergence of general-purpose and customized hardware accelerators (GPU, NPU, TPU, etc.), which has brought great progress in computing power; the other is the open-sourcing of general training datasets such as CIFAR. However, with the rapid growth of deep neural networks and datasets, the computing power of the machines used for training becomes a bottleneck, and it can take days or weeks to complete the training of a large neural network model; for example, training ResNet-50 on the ImageNet dataset for 90 epochs with 8 P100 GPUs takes 21 hours, which makes real-t...


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06N3/08G06N3/04
CPCG06N3/08G06N3/045
Inventor 董德尊徐叶茂徐炜遐廖湘科
Owner NAT UNIV OF DEFENSE TECH