A distributed acceleration method and system for a deep learning training task

A deep learning and distributed-computing technology, applied in the field of deep learning, addressing problems such as reduced training accuracy and prior approaches that only reduce per-communication traffic, to achieve the effects of improving cluster scaling efficiency, compressing communication time, and accelerating the training process.

Active Publication Date: 2019-06-18
INST OF INFORMATION ENG CAS

AI Technical Summary

Problems solved by technology

Most of the existing technologies focus on the first approach, which compresses the gradients to be sent by means of quantization and sparsification.
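
To make the prior-art approach referenced above concrete, the following is a minimal sketch of top-k gradient sparsification; the function name, the selection rule, and the compression ratio are illustrative assumptions, not the patent's method.

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, ratio: float = 0.01) -> np.ndarray:
    """Keep only the largest-magnitude fraction of gradient entries.

    Illustrative sketch of the sparsification idea mentioned above; the
    selection rule and ratio are assumptions, not the patent's scheme.
    """
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    compressed = np.zeros_like(flat)
    compressed[idx] = flat[idx]
    return compressed.reshape(grad.shape)
```

Such compression shrinks each message but, as noted above, does not reduce how many parameter-update communications occur, which is the bottleneck the present method targets.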

Method used



Embodiment Construction

[0030] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0031] Referring to Figure 1, the present embodiment provides a distributed acceleration method for deep learning training tasks. The method includes the following steps:

[0032] (1) Build a distributed GPU training cluster, including dividing the machines into parameter servers and working nodes and determining the communication architecture. Referring to Figure 2, the specific steps are as follows:

[0033] (1-1) Build the parameter server to store and update the model parameters. The CPUs of all servers in the cluster collectively form the parameter server, and all model parameters are stored evenly across the memory of the CPUs. Parameter updates are performed by the CPUs, and two operations, push and pull, are exposed for the working nodes to invoke. The push operation refers to the parameter server receiving the gradients sent by a working node, and the pull operation refers to a working node retrieving the latest model parameters from the parameter server.
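
As a concrete illustration of step (1-1), here is a minimal single-process sketch of the push/pull interface of a parameter server; the class name, the plain-SGD update rule, and the learning rate are assumptions made for illustration, not the patent's exact implementation.

```python
import numpy as np

class ParameterServer:
    """Minimal sketch of the parameter-server role in step (1-1).

    Model parameters live on the server (CPU memory in the embodiment);
    working nodes call push() to send gradients and pull() to fetch the
    latest parameters. The SGD update rule here is an assumption.
    """

    def __init__(self, init_params: dict, lr: float = 0.1):
        self.params = {name: p.copy() for name, p in init_params.items()}
        self.lr = lr

    def push(self, grads: dict) -> None:
        # Receive gradients from a working node and update the parameters.
        for name, g in grads.items():
            self.params[name] -= self.lr * g

    def pull(self) -> dict:
        # Return the latest parameters to the requesting working node.
        return {name: p.copy() for name, p in self.params.items()}

# Usage: a working node computes gradients locally, pushes them, then pulls.
server = ParameterServer({"w": np.zeros(4)})
server.push({"w": np.ones(4)})
latest = server.pull()
```

In a real cluster these calls would cross the network between working nodes and the CPU-side parameter server; reducing how often they are invoked is what compresses the communication time.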


Abstract

The invention relates to a distributed acceleration method and system for a deep learning training task. The method comprises the following steps: (1) building a distributed GPU training cluster; (2) adopting a swap-in/swap-out strategy to adjust the minibatch size on a single GPU working node in the distributed GPU training cluster; (3) adjusting the learning rate according to the minibatch size determined in step (2); and (4) carrying out deep learning training with the hyper-parameters determined in steps (2) and (3), namely the minibatch size and the learning rate. On the premise that training accuracy is not affected, the communication time is compressed greatly, simply, and efficiently by reducing the number of parameter-update communications between cluster nodes; compared with the single-GPU mode, the cluster scaling efficiency in the multi-GPU mode can be fully improved, and acceleration of the training process of ultra-deep neural network models is achieved.
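
Steps (2) and (3) enlarge the per-GPU minibatch and adjust the learning rate accordingly; the sketch below assumes the common linear-scaling rule for illustration, and is not asserted to be the patent's exact adjustment.

```python
def scaled_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate with the enlarged minibatch size.

    Linear scaling is assumed here for illustration; the patent only states
    that the learning rate is adjusted according to the minibatch size.
    """
    return base_lr * new_batch / base_batch

# Example: growing the per-GPU minibatch from 256 to 2048 scales 0.1 -> 0.8.
lr = scaled_learning_rate(base_lr=0.1, base_batch=256, new_batch=2048)
```

A larger minibatch means fewer parameter updates per epoch, and hence fewer push/pull communications with the parameter server, which is the source of the claimed speed-up.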

Description

Technical field

[0001] The invention belongs to the field of deep learning. It specifically addresses the problems of low cluster scaling efficiency and slow training when distributed GPU clusters are used to train ultra-deep neural network models, and proposes an acceleration method to reduce the time required for training.

Background technique

[0002] In recent years, big-data-driven deep learning technology has achieved considerable performance improvements in many fields of artificial intelligence. Ever-deeper neural network models and ever-larger data scales have become the basic trend. Complex network models often require more training data to obtain good generalization ability, but training deep models on data of this scale is a great challenge. Deep learning training tasks are typical compute-intensive tasks, so distributed GPU (Graphics Processing Unit) clusters are often used for training...

Claims


Application Information

IPC(8): G06N3/063, G06N3/08, G06T1/20
Inventor: 刘万涛, 郭锦荣, 虎嵩林, 韩冀中
Owner: INST OF INFORMATION ENG CAS