Distributed training method and device for deep learning model
A deep learning and distributed technology, applied in the fields of deep learning and distributed training, which can solve problems such as the inability to adaptively adjust the number of servers, idle GPUs, and low utilization of GPU clusters.
Embodiment Construction
[0091] Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
[0092] As shown in Figure 1, this embodiment provides a distributed training method for a deep learning model, including the following steps:
[0093] Step S110: Obtain the training state data corresponding to the training task sent by the deep learning platform;
[0094] Step S120: Generate an elastic scaling policy according to the resource requirements corresp...
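Steps S110 and S120 can be sketched roughly as follows. The source text is truncated at Step S120 and does not specify what the training state data contains or how the elastic scaling policy is derived, so everything in this sketch is a hypothetical illustration, not the patented method. The `TrainingState` fields, the `generate_scaling_policy` function, and the rule used (grow the task into free GPUs, shrink it when queued jobs need GPUs, and clamp the result to the task's worker bounds) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SCALE_OUT = "scale_out"
    SCALE_IN = "scale_in"
    KEEP = "keep"


@dataclass
class TrainingState:
    # Hypothetical contents of the training state data reported by the
    # deep learning platform (Step S110); the source does not define them.
    task_id: str
    current_workers: int
    min_workers: int
    max_workers: int
    gpus_per_worker: int


@dataclass
class ScalingPolicy:
    task_id: str
    action: Action
    target_workers: int


def generate_scaling_policy(
    state: TrainingState, free_gpus: int, queued_demand_gpus: int
) -> ScalingPolicy:
    """Hypothetical Step S120 rule based on cluster GPU supply and demand."""
    if state.gpus_per_worker <= 0:
        raise ValueError("gpus_per_worker must be positive")

    # Positive surplus: idle GPUs this task could absorb.
    # Negative surplus: GPUs that queued jobs are waiting for.
    surplus = free_gpus - queued_demand_gpus
    if surplus >= 0:
        delta = surplus // state.gpus_per_worker
    else:
        # Ceiling division: release enough workers to cover the deficit.
        delta = -((-surplus + state.gpus_per_worker - 1) // state.gpus_per_worker)

    # Keep the worker count within the task's allowed range.
    target = max(state.min_workers,
                 min(state.max_workers, state.current_workers + delta))

    if target > state.current_workers:
        action = Action.SCALE_OUT
    elif target < state.current_workers:
        action = Action.SCALE_IN
    else:
        action = Action.KEEP
    return ScalingPolicy(state.task_id, action, target)


if __name__ == "__main__":
    state = TrainingState("job-1", current_workers=2, min_workers=1,
                          max_workers=8, gpus_per_worker=2)
    print(generate_scaling_policy(state, free_gpus=5, queued_demand_gpus=0))
```

With 5 free GPUs and 2 GPUs per worker, the sketch adds two workers (target 4). With no free GPUs and 3 GPUs requested by queued jobs, it would release two workers, but the `min_workers` bound keeps the task at 1. Clamping to explicit bounds is one simple way to keep a rule like this from starving or over-allocating a single training task.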