
Multi-step delayed update method for distributed deep learning based on sparsification of communication operations

A communication-operation and deep-learning technology, applied to neural learning methods, biological neural network models, neural architectures, etc. It addresses problems such as reduced distributed-training speed, and has the effects of optimizing communication overhead, eliminating synchronization overhead, and compensating for delayed gradient information.

Active Publication Date: 2022-04-22
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

Considering the impact of the weight-delay problem on model training accuracy, the key to optimizing the ASGD method is to guarantee the convergence accuracy of the model. Researchers have proposed different optimization measures based on the asynchronous update mechanism. Although these improve the final convergence accuracy of the model, the additional restrictions or operations they introduce reduce the speed of distributed training to some extent, so they cannot train faster than the original ASGD method.




Embodiment Construction

[0025] To better understand the content of the present invention, an example is given here.

[0026] The invention discloses a multi-step delayed update method for distributed deep learning based on communication-operation sparsification (SSD-SGD). Its specific steps include:

[0027] S1, warm-up training: before performing multi-step delayed iterative training, use the synchronous stochastic gradient descent method to train the deep learning model for a certain number of iterations, so that the weights and gradients of the network model reach a stable state.
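The warm-up step (S1) can be sketched as plain synchronous SGD: every worker computes a local gradient, the gradients are averaged, and a single global update is applied per iteration. This is a minimal illustration, not the patented implementation; `ToyWorker` and its quadratic loss are hypothetical stand-ins for real workers and models.

```python
class ToyWorker:
    """Hypothetical worker computing the gradient of a simple
    quadratic loss 0.5 * ||w - target||^2 on its local data."""
    def __init__(self, target):
        self.target = target

    def compute_gradient(self, w):
        return [wi - ti for wi, ti in zip(w, self.target)]


def warmup_ssgd(weights, workers, lr=0.1, num_warmup_iters=10):
    """S1 warm-up: synchronous SGD for a fixed number of iterations,
    so weights and gradients settle before switching to the
    multi-step delayed training mode."""
    for _ in range(num_warmup_iters):
        # Every worker contributes a gradient each iteration (synchronous).
        grads = [wk.compute_gradient(weights) for wk in workers]
        # Average the worker gradients into one global gradient.
        avg = [sum(g[i] for g in grads) / len(grads)
               for i in range(len(weights))]
        # One global update per iteration.
        weights = [w - lr * g for w, g in zip(weights, avg)]
    return weights
```

With two workers whose targets are `[0, 0]` and `[2, 2]`, the weights drift toward the mean target `[1, 1]`, illustrating the stable state the warm-up phase aims for.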

[0028] S2, the switching phase: this phase includes only 2 iterations of training, used respectively to back up the retrieved global weights and to complete the first local parameter update operation. Its purpose is to switch from the synchronous stochastic gradient descent update method to the multi-step delayed training mode. The local parameter update operation adopts a local update method based on the global gradient, in order to alleviate weight delay and ensure the model's convergence accuracy.
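The two iterations of the switching phase (S2) can be sketched as follows. This is a hedged illustration under an assumed recovery rule: the global gradient is reconstructed from the difference between two successive global-weight snapshots divided by the learning rate; `pull_global` is a hypothetical stand-in for retrieving weights from the parameter server.

```python
def switch_phase(local_w, pull_global, lr=0.1):
    """S2 switching phase sketch (2 iterations):
    iteration 1 backs up the retrieved global weights; iteration 2
    performs the first global-gradient-based local update."""
    backup = pull_global()   # iteration 1: back up the global weights
    latest = pull_global()   # iteration 2: retrieve the newer globals
    # Recover the global gradient from two successive snapshots
    # (assumed rule: g = (w_old - w_new) / lr).
    global_grad = [(b - l) / lr for b, l in zip(backup, latest)]
    # First local parameter update based on the global gradient.
    local_w = [w - lr * g for w, g in zip(local_w, global_grad)]
    return local_w, latest
```

Using the recovered global gradient, rather than only the stale local one, is what the text credits with alleviating weight delay while preserving convergence accuracy.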



Abstract

The invention discloses a multi-step delayed update method for distributed deep learning based on communication-operation sparsification. Its specific steps include: warm-up training, in which a synchronous stochastic gradient descent method trains the deep learning model for a certain number of iterations before multi-step delayed iterative training begins; a switching phase, whose purpose is to switch from the synchronous stochastic gradient descent update method to the multi-step delayed training mode, and in which the local parameter update operation adopts a local update method based on the global gradient to alleviate weight delay and ensure the model's convergence accuracy; and multi-step delayed training, which comprises three steps: global parameter update, local parameter update, and communication-operation sparsification. By sparsifying communication operations, the present invention relieves network congestion, eliminates synchronization overhead, and greatly reduces the communication overhead of the distributed training process.
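The multi-step delayed training stage described above can be sketched as a loop where local updates run every iteration but the communication operation fires only once every few steps. This is a minimal, hypothetical illustration of "communication sparsification", not the patented protocol; the recorded `pushed` list stands in for real parameter-server pushes.

```python
def multi_step_delayed_training(local_w, grads_stream, lr=0.1, comm_period=4):
    """Multi-step delayed training sketch: a local update is applied
    at every iteration, while communication (pushing the accumulated
    gradient) is sparsified to once every `comm_period` steps."""
    pushed = []                      # records of sparsified communications
    accum = [0.0] * len(local_w)     # gradient accumulated between pushes
    for step, g in enumerate(grads_stream, start=1):
        # Local parameter update happens every iteration.
        local_w = [w - lr * gi for w, gi in zip(local_w, g)]
        # Accumulate the gradient until the next communication step.
        accum = [a + gi for a, gi in zip(accum, g)]
        if step % comm_period == 0:  # sparsified communication operation
            pushed.append(list(accum))
            accum = [0.0] * len(local_w)
    return local_w, pushed
```

Because only one in every `comm_period` iterations touches the network, per-step synchronization overhead disappears and total communication volume drops by roughly that factor, which is the effect the abstract claims.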

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, in particular to a training update method for distributed deep learning.

Background technique

[0002] Deep learning has recently achieved great success in fields such as computer vision, natural language processing, autonomous driving, and intelligent medical care. The rise of deep learning is mainly due to two conditions: one is the emergence of general-purpose and customized hardware accelerators (GPU, NPU, TPU, etc.), which have brought great progress in computing power; the other is the open sourcing of general training datasets such as CIFAR. However, with the rapid growth of deep neural networks and datasets, the computing power of the machines used for training becomes a bottleneck, and it can take days or weeks to complete the training of a large neural network model. In this case, distributed training has become a common practice, which greatly improves training efficiency and speeds up ...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F 8/65, G06N 3/04, G06N 3/063, G06N 3/08
CPC: G06F 8/65, G06N 3/063, G06N 3/08, G06N 3/045
Inventors: 董德尊, 徐叶茂, 徐炜遐, 廖湘科
Owner: NAT UNIV OF DEFENSE TECH