
Distributed training method and device for deep learning model, equipment and storage medium

A deep learning model training technology, applied in the field of model training, which solves the problem that inter-node network communication speed is not considered during scheduling, and achieves the effect of avoiding training slowdown caused by overly slow inter-node network communication.

Pending Publication Date: 2020-04-07
GUANGDONG INSPUR BIG DATA RES CO LTD
Cites: 6 · Cited by: 8

AI Technical Summary

Problems solved by technology

However, when Kubernetes schedules a distributed model onto nodes, it does not consider the network communication speed between those nodes. When network transmissions between nodes are frequent, or the volume of transmitted data is large, network transmission speed matters: a network that is too slow can become the bottleneck of model training.




Embodiment Construction

[0036] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0037] It should be noted that Kubernetes can manage large-scale distributed clusters, but a cluster's network environment is complex: nodes may be physically far apart, multiple gateways may sit between adjacent nodes, or the access bandwidths of different nodes may be inconsistent. These practical problems lead to inconsistent network communication speeds between nodes, and the networ...
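The patent does not say how the network communication speed data set is collected. A minimal sketch, assuming speeds are derived by timing a fixed-size transfer between each node pair (the helper name, units, and input layout are my own, not from the source):

```python
def build_speed_matrix(transfer_seconds, payload_bytes):
    """transfer_seconds[i][j]: measured seconds to move `payload_bytes`
    from node i to node j. Returns pairwise speeds in MB/s; a node's
    speed to itself is recorded as 0."""
    n = len(transfer_seconds)
    return [[0.0 if i == j else payload_bytes / transfer_seconds[i][j] / 1e6
             for j in range(n)]
            for i in range(n)]
```

In practice the timings would come from probe transfers between cluster nodes; slow links (distant nodes, multiple gateways, narrow access bandwidth) show up directly as small entries in the matrix.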


PUM

No PUM data available.

Abstract

The invention discloses a distributed training method for a deep learning model, comprising the following steps: acquiring a data set of network communication speeds between nodes in a cluster; grouping all nodes of the cluster into network groups using a clustering algorithm together with the network communication speed data set; and scheduling the distributed training task of the deep learning model onto the nodes of the network group with the highest network communication speed. Because the nodes are grouped in advance according to inter-node network communication speed, the nodes in the fastest network group can be selected when a distributed training task is executed, which avoids the problem of training being slowed down by nodes with overly slow network communication. The invention further discloses a distributed training device and equipment for the deep learning model, and a computer-readable storage medium, which achieve the same technical effects.
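The three steps in the abstract can be sketched in Python. This is an illustrative reconstruction, not the patent's implementation: the patent names only "a clustering algorithm" (the CPC code G06F 18/23213 suggests k-means), so a minimal k-means over rows of the speed matrix is assumed, and all function names are hypothetical.

```python
import random

def kmeans_labels(points, k, iters=50, seed=0):
    """Plain k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for j, p in enumerate(points):
            labels[j] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        for c in range(k):
            members = [points[j] for j, l in enumerate(labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(col) for col in zip(*members)]
    return labels

def fastest_group(speed_matrix, k=2):
    """Cluster nodes by their rows of pairwise speeds (MB/s), then
    return the group with the highest average intra-group speed."""
    top = max(max(row) for row in speed_matrix)
    # Replace the zero diagonal with the max speed so a node's "speed
    # to itself" does not distort the clustering geometry.
    points = [[top if i == j else s for j, s in enumerate(row)]
              for i, row in enumerate(speed_matrix)]
    labels = kmeans_labels(points, k)

    def avg_speed(c):
        members = [j for j, l in enumerate(labels) if l == c]
        pairs = [(a, b) for a in members for b in members if a != b]
        if not pairs:
            return 0.0
        return sum(speed_matrix[a][b] for a, b in pairs) / len(pairs)

    best = max(range(k), key=avg_speed)
    return [j for j, l in enumerate(labels) if l == best]
```

With a measured speed matrix, a scheduler (for example a custom Kubernetes scheduler extension) would then place the training pods only on the returned node group; that integration is outside this sketch.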

Description

Technical Field

[0001] The present invention relates to the technical field of model training, and more specifically to a distributed training method, device, and equipment for a deep learning model, and a computer-readable storage medium.

Background Technique

[0002] As deep learning models grow larger and more parameters need to be trained, the computing power of a single computing node can no longer meet the needs of model training, so researchers train deep learning models in a distributed fashion. Distributed model training means that the training process is spread across multiple computing nodes, whose combined computing resources participate in training the model's parameters and thereby speed up training. When multiple nodes participate in training, each node, according to the researcher's design, computes some of the model's parameters or the model parameters of a certain iteratio...
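The synchronization step described in the background (nodes exchanging per-iteration parameters) is what makes inter-node bandwidth a bottleneck. A minimal illustration, not taken from the patent, is averaging gradients across nodes:

```python
def allreduce_average(node_gradients):
    """Average each parameter's gradient across all nodes. In a real
    system this is a network collective (e.g. a ring all-reduce), which
    is why slow links between nodes throttle every training step."""
    n = len(node_gradients)
    return [sum(vals) / n for vals in zip(*node_gradients)]
```

Each training iteration pays this communication cost, so the total bytes moved scale with both model size and iteration count; this is the transfer volume the patent's speed-aware scheduling tries to route over fast links.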

Claims


Application Information

IPC (8): G06K 9/62, H04L 12/26
CPC: H04L 43/0894, G06F 18/23213, G06F 18/217
Inventor: 王振 (Wang Zhen)
Owner: GUANGDONG INSPUR BIG DATA RES CO LTD