Distributed training method and device for deep learning model, and computing equipment

A distributed training technique for deep learning models, applied in the field of data processing. It addresses the low utilization of computing-node hardware resources and the low efficiency of distributed training. It improves throughput and hardware resource utilization, reduces the communication-to-computation ratio, and thereby improves training efficiency.

Pending Publication Date: 2021-11-12
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

[0004] However, in existing distributed training methods, the communication-to-computation ratio of each computing node (the ratio of the time the node spends communicating with other computing nodes to the time it spends computing gradients) is relatively high. As a result, the utilization of the node's hardware resources is low, which makes distributed training inefficient.
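To make the ratio in [0004] concrete, the following minimal sketch computes it from two measured times. The function names, the example timings, and the assumption that communication and computation do not overlap are illustrative only and do not come from the patent.

```python
def comm_compute_ratio(comm_time_s: float, compute_time_s: float) -> float:
    """Time spent communicating divided by time spent computing gradients."""
    if compute_time_s <= 0:
        raise ValueError("compute_time_s must be positive")
    return comm_time_s / compute_time_s


def compute_fraction(comm_time_s: float, compute_time_s: float) -> float:
    """Share of a step spent on gradient computation.

    Assumption (not from the source): communication and computation
    run one after the other, with no overlap.
    """
    return compute_time_s / (comm_time_s + compute_time_s)


# Hypothetical timings for one training step on one node.
ratio = comm_compute_ratio(comm_time_s=0.75, compute_time_s=0.25)
busy = compute_fraction(comm_time_s=0.75, compute_time_s=0.25)
print(f"ratio={ratio}, compute fraction={busy}")
```

Under these assumed timings the node computes for only a quarter of each step, which is the kind of low hardware utilization the paragraph describes.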



Embodiment Construction

[0039] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0040] First, the implementation environment of the distributed training method of the embodiment of the present invention is introduced.

[0041] Data center

[0042] A data center is a globally coordinated network of dedicated equipment used to transmit, accelerate, display, compute, and store data over the Internet infrastructure. In the future development, the data center will also become a...


Abstract

The invention discloses a distributed training method and device for a deep learning model, and computing equipment. The method is performed on each computing node and comprises the following steps: in each training step, acquiring a predetermined number of training samples from a training data set as batch training data; calculating the gradient of the model parameters of the deep learning model on the batch training data, and taking this gradient as a local gradient; calculating the sum of the local gradients over a preset number of training steps as an accumulated gradient; communicating with the other computing nodes and exchanging accumulated gradients with them; and calculating the average of the accumulated gradients of all the computing nodes, and updating the model parameters based on this gradient average.
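The steps in the abstract can be sketched in a single process as follows. This is a minimal illustration, not the patented implementation: the linear model, the squared-error loss, the function names, and the in-memory averaging that stands in for real inter-node communication are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]   # (features, target)
Batch = Sequence[Sample]


def local_gradient(weights: List[float], batch: Batch) -> List[float]:
    """Gradient of 0.5 * mean squared error for a linear model (assumed model)."""
    grad = [0.0] * len(weights)
    for features, target in batch:
        error = sum(w * x for w, x in zip(weights, features)) - target
        for i, x in enumerate(features):
            grad[i] += error * x / len(batch)
    return grad


def accumulate_gradients(weights: List[float], batches: Sequence[Batch]) -> List[float]:
    """Sum the local gradients of several training steps with no communication."""
    accumulated = [0.0] * len(weights)
    for batch in batches:
        grad = local_gradient(weights, batch)
        accumulated = [a + g for a, g in zip(accumulated, grad)]
    return accumulated


def average_across_nodes(accumulated_per_node: Sequence[List[float]]) -> List[float]:
    """Stand-in for the gradient exchange: element-wise mean over all nodes."""
    n = len(accumulated_per_node)
    return [sum(values) / n for values in zip(*accumulated_per_node)]


def train_round(weights, node_batches, learning_rate):
    """One round: each node accumulates locally, then one exchange and one update."""
    accumulated = [accumulate_gradients(weights, b) for b in node_batches]
    mean_grad = average_across_nodes(accumulated)
    # Plain gradient descent is assumed; the source only says the update
    # is "based on the gradient average value".
    return [w - learning_rate * g for w, g in zip(weights, mean_grad)]


# Two simulated nodes, each accumulating over two training steps.
node_batches = [
    [[([1.0], 2.0)], [([1.0], 2.0)]],
    [[([1.0], 2.0)], [([1.0], 2.0)]],
]
new_weights = train_round([0.0], node_batches, learning_rate=0.1)
print(new_weights)
```

Because each node communicates once per accumulation window rather than once per training step, the communication cost is spread over several gradient computations.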

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a distributed training method, device and computing equipment for a deep learning model.

Background

[0002] Deep learning is an increasingly popular approach to computing and machine learning in the industry, and can be used in various scenarios such as images, voice, video, and machine translation. Taking machine translation as an example, the quality of neural-network-based machine translation has improved significantly and has continued to develop in recent years. At present, in some languages and scenarios, the translation quality can even reach the level of human translation.

[0003] Data parallelism (Data Parallel) is a form of distributed training for deep learning models, which divides the training data into multiple parts and trains on different computing nodes. If the computing nodes do not have shared public me...
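The data-parallel split described in [0003] can be illustrated with a short sketch. The round-robin assignment and the name `shard_dataset` are assumptions chosen for the example; the source does not say how the data is divided.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_dataset(samples: Sequence[T], num_nodes: int) -> List[List[T]]:
    """Split samples into disjoint parts, one per computing node (round-robin)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return [list(samples[rank::num_nodes]) for rank in range(num_nodes)]


# Seven hypothetical sample ids divided among three nodes.
shards = shard_dataset(list(range(7)), num_nodes=3)
print(shards)
```

Each node then trains on its own shard only, so every sample is processed by exactly one node per pass over the data.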


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06N20/00
CPC: G06N20/00
Inventor: 樊士庆, 孟晨, 王思宇, 龙国平, 杨军
Owner: ALIBABA GRP HLDG LTD