Distributed training method and device for deep learning model

A deep learning and distributed training technology, applied in the fields of deep learning and distributed training, that addresses problems such as the inability to adaptively adjust the number of worker servers, idle GPUs, and low GPU cluster utilization.

Pending Publication Date: 2020-11-27

CHINA UNIONPAY

Cites: 0 | Cited by: 24


AI Technical Summary

Problems solved by technology

[0003] However, the training tasks of a deep learning model differ in their GPU needs: some training tasks require more GPUs, others require fewer, and some special training tasks use GPUs periodically, with peaks and valleys in usage. As a result, GPUs sit idle during parts of some training tasks.

Therefore, the number of worker servers cannot be adaptively adjusted for different training tasks, which results in low GPU cluster utilization.
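The link between fixed worker counts and low utilization can be made concrete with a small calculation. The sketch below assumes a hypothetical task whose GPU demand alternates between a peak of 8 and a valley of 2 GPUs; the numbers are illustrative only and are not taken from the patent. It compares a static reservation sized for the peak with an allocation that tracks demand interval by interval.

```python
# Hypothetical per-interval GPU demand of one periodic training task
# (illustrative numbers only, not from the patent).
demand = [8, 8, 2, 2, 8, 8, 2, 2]


def utilization(demand, allocated):
    """Fraction of allocated GPU-intervals that were actually used."""
    used = sum(min(d, a) for d, a in zip(demand, allocated))
    return used / sum(allocated)


# Static allocation: reserve enough GPUs for the peak in every interval.
static = [max(demand)] * len(demand)
# Elastic allocation: follow the demand interval by interval.
elastic = list(demand)

print("static: ", utilization(demand, static))
print("elastic:", utilization(demand, elastic))
```

With these numbers the static reservation uses only 40 of 64 GPU-intervals, while the elastic allocation uses all of them; the idle 24 GPU-intervals are the valleys described above.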



Embodiment Construction

[0091] Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0092] As shown in Figure 1, this embodiment provides a distributed training method for a deep learning model, including the following steps:

[0093] Step S110: Obtain the training state data corresponding to the training task sent by the deep learning platform;

[0094] Step S120: Generate an elastic scaling policy according to the resource requirements corresp...
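Step S120 and the node-adjustment step named in the abstract can be pictured as turning a task's resource demand into bounds on its worker count and then picking a count that fits the GPUs currently free. The sketch below is a minimal illustration under stated assumptions: `ScalingPolicy`, `generate_policy`, `target_workers`, and `scaling_action` are hypothetical names, and the rule "one worker per fixed number of GPUs, clamped to [min, max]" is an assumed policy, not the one claimed in the patent.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical policy fields; the patent excerpt does not define them.
    min_workers: int
    max_workers: int
    gpus_per_worker: int


def generate_policy(requested_gpus: int, gpus_per_worker: int,
                    min_workers: int = 1) -> ScalingPolicy:
    """Derive worker bounds from the task's cluster resource demand."""
    max_workers = max(min_workers, requested_gpus // gpus_per_worker)
    return ScalingPolicy(min_workers, max_workers, gpus_per_worker)


def target_workers(policy: ScalingPolicy, free_gpus: int, current: int) -> int:
    """Choose a worker count within the policy bounds for the GPUs on hand."""
    # GPUs already held by the task's current workers remain usable.
    available = free_gpus + current * policy.gpus_per_worker
    affordable = available // policy.gpus_per_worker
    # Clamp to [min_workers, max_workers]; never go below min_workers.
    return max(policy.min_workers, min(policy.max_workers, affordable))


def scaling_action(current: int, target: int) -> tuple:
    """Translate a target count into an add/remove/keep decision."""
    if target > current:
        return ("add", target - current)
    if target < current:
        return ("remove", current - target)
    return ("keep", 0)


policy = generate_policy(requested_gpus=16, gpus_per_worker=2)
# Off-peak: 4 free GPUs plus the 4 held by 2 workers leave room for 4 workers.
print(scaling_action(2, target_workers(policy, free_gpus=4, current=2)))
```

Clamping at `min_workers` means a task keeps at least one worker even when no GPUs are free; a real scheduler would presumably queue or preempt in that case, which this sketch does not model.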


Abstract

The invention discloses a distributed training method and device for a deep learning model. According to a specific implementation, the method comprises: obtaining training state data corresponding to a training task sent by a deep learning platform; generating an elastic scaling strategy according to the cluster resource demand corresponding to the training task; dynamically adjusting the number of training nodes corresponding to the training task using the elastic scaling strategy; and executing the training task according to the training state data and the adjusted training nodes. The method improves adaptability to the cluster resource demand of each training task, raises GPU or CPU resource utilization, and ensures that the training task is executed correctly and efficiently on the adjusted training nodes even when training nodes are added or removed at any time.
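One way to read "executing the training task according to the training state data and the adjusted training node" is that progress recorded before a scaling event is used to hand the remaining work to whatever workers exist afterwards. The sketch below assumes the training state records how many samples of the current epoch have been consumed and re-partitions the rest contiguously across the new worker count; `TrainingState` and `reshard` are hypothetical, and the patent excerpt does not say what the state data contains.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingState:
    # Hypothetical fields for the "training state data"; the excerpt does not list them.
    epoch: int
    samples_done: int   # samples already consumed in the current epoch
    total_samples: int  # samples per epoch


def reshard(state: TrainingState, num_workers: int) -> List[range]:
    """Split the samples not yet consumed this epoch across the adjusted workers."""
    if num_workers < 1:
        raise ValueError("need at least one worker")
    remaining = range(state.samples_done, state.total_samples)
    base, extra = divmod(len(remaining), num_workers)
    shards, start = [], remaining.start
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)  # first `extra` workers take one more
        shards.append(range(start, start + size))
        start += size
    return shards


state = TrainingState(epoch=3, samples_done=600, total_samples=1000)
print(reshard(state, 3))  # after scaling down to 3 workers
print(reshard(state, 4))  # after scaling up to 4 workers
```

Because the shards are computed from the saved progress rather than from the old worker count, no sample is repeated or skipped when a node joins or leaves mid-epoch: with 3 workers the 400 remaining samples split as 134, 133, and 133.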

Description

Technical field

[0001] This application relates to the field of deep learning, and in particular to the field of distributed training.

Background

[0002] Deep learning frameworks / platforms support a distributed training mode: multiple devices can be used, each device can be equipped with multiple GPUs (Graphics Processing Units), and the deep learning model is trained in parallel on the GPUs of each device. Among existing deep learning frameworks / platforms, for example, the native PS (parameter server) architecture of TensorFlow (which is based on dataflow programming) supports an asynchronous training mode. At run time, the deep learning framework / platform can be deployed to a specific physical cluster. The nodes in a TensorFlow cluster fall into two categories: parameter servers and workers. The parameter server stores the parameters of the model, and the worker is responsible for calcu...
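The parameter-server / worker split and the asynchronous training mode described in the background can be illustrated in a single process. This is a minimal sketch, not TensorFlow's API: threads stand in for worker machines, a `ParameterServer` class (a hypothetical name) holds one scalar weight, and each worker pulls the weight, computes a mean-squared-error gradient on its own data shard for the toy model y = w·x, and pushes the gradient back without waiting for other workers.

```python
import threading


class ParameterServer:
    """Holds the model parameter and applies gradients pushed by workers."""

    def __init__(self, init_w: float, lr: float):
        self._w = init_w
        self._lr = lr
        self._lock = threading.Lock()

    def pull(self) -> float:
        with self._lock:
            return self._w

    def push(self, grad: float) -> None:
        # Asynchronous update: applied on arrival, with no barrier across workers.
        with self._lock:
            self._w -= self._lr * grad


def worker(ps: ParameterServer, xs, ys, steps: int) -> None:
    """Pull the weight, compute the MSE gradient on this shard, push it."""
    for _ in range(steps):
        w = ps.pull()
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        ps.push(grad)


# Toy data from y = 3x, split into one shard per worker.
xs = [0.1 * i for i in range(1, 21)]
ys = [3.0 * x for x in xs]
ps = ParameterServer(init_w=0.0, lr=0.05)
shards = [(xs[0::2], ys[0::2]), (xs[1::2], ys[1::2])]
threads = [threading.Thread(target=worker, args=(ps, sx, sy, 200)) for sx, sy in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ps.pull())  # close to 3.0
```

A worker may push a gradient computed from a weight another worker has since changed (a stale gradient); with a small learning rate the updates still converge, which is why the asynchronous PS mode tolerates workers running at different speeds.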


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F9/50; G06N3/063; G06N3/04; G06N3/08

CPC: G06F9/5027; G06F9/5066; G06N3/063; G06N3/08; G06N3/045

Inventors: 乔萧雅 (Qiao Xiaoya), 刘国宝 (Liu Guobao), 周雍恺 (Zhou Yongkai)

Owner: CHINA UNIONPAY
