Distributed training method and device for deep learning model, and computing equipment

A distributed computing and deep learning technology, applied in the field of data processing, that addresses the low hardware resource utilization of computing nodes and the low efficiency of distributed training. It improves throughput and hardware resource utilization, reduces the communication-to-computation ratio, and improves training efficiency.
CN113642734A · Pending · Publication Date: 2021-11-12 · ALIBABA GRP HLDG LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ALIBABA GRP HLDG LTD
Publication Date
2021-11-12

Abstract

The invention discloses a distributed training method and device for a deep learning model, and computing equipment. The method comprises the following steps: in each training step, acquiring a predetermined number of training samples from a training data set as batch training data; calculating the gradient of the model parameters of the deep learning model on the batch training data, and taking this gradient as a local gradient; calculating the accumulated value of the local gradients over a preset number of training steps as an accumulated gradient; communicating with the other computing nodes and exchanging accumulated gradients with them; and calculating the average of the accumulated gradients of all the computing nodes, and updating the model parameters based on this gradient average.
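The steps in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the one-parameter model `y = w * x`, the names `local_gradient` and `train`, the learning rate, and the division of the averaged gradient by `accum_steps` are all illustrative choices that the abstract does not specify, and the "exchange" between nodes is simulated by a plain Python list rather than real network communication.

```python
import random

def local_gradient(w, batch):
    """Gradient of mean squared error for the toy model y = w * x (assumed model)."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def train(shards, w=0.0, lr=0.5, batch_size=4, accum_steps=4, rounds=100, seed=0):
    """Simulate nodes that accumulate local gradients, then exchange and average them."""
    rng = random.Random(seed)
    for _ in range(rounds):
        accumulated = []
        for shard in shards:  # each shard stands in for one computing node
            acc = 0.0
            for _ in range(accum_steps):
                # one training step: draw batch training data, compute the local gradient
                batch = rng.sample(shard, batch_size)
                acc += local_gradient(w, batch)
            accumulated.append(acc)  # accumulated gradient of this node
        # simulated exchange: every node sees all accumulated gradients
        grad_avg = sum(accumulated) / len(accumulated)
        # assumption: rescale by accum_steps so the step size matches a single step
        w -= lr * grad_avg / accum_steps
    return w

# Toy data generated from y = 3 * x, split across two simulated nodes.
data = [(x / 10.0, 3.0 * x / 10.0) for x in range(-10, 11) if x != 0]
shards = [data[0::2], data[1::2]]
w_final = train(shards)
```

In this sketch the nodes communicate once per `accum_steps` training steps instead of once per step, which is the sense in which accumulation lowers the amount of communication relative to computation.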

Description

Technical Field

[0001] The present invention relates to the technical field of data processing, and in particular to a distributed training method and device for a deep learning model, and computing equipment.

Background Art

[0002] Deep learning is an increasingly popular approach to computing and machine learning in industry, and it can be used in scenarios such as image, speech, and video processing and machine translation. Taking machine translation as an example, the quality of neural machine translation has improved significantly and has continued to advance in recent years. At present, for some languages and scenarios, translation quality can even reach the level of human translation.

[0003] Data parallelism is a form of distributed training for deep learning models in which the training data is divided into multiple parts that are trained on different computing nodes. If the computing nodes do not have shared public me...
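The division of training data described in [0003] can be sketched as follows. The function name `shard_dataset` and the round-robin assignment are assumptions made for illustration; the text only states that the data is divided into multiple parts, not how.

```python
def shard_dataset(samples, num_nodes):
    """Split samples into num_nodes disjoint, near-equal parts (round-robin, assumed scheme)."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # node `rank` receives samples rank, rank + num_nodes, rank + 2 * num_nodes, ...
    return [samples[rank::num_nodes] for rank in range(num_nodes)]

# Ten toy samples divided across three hypothetical computing nodes.
parts = shard_dataset(list(range(10)), 3)
```

Each part would then be used by one computing node to draw its own batch training data.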
