Model training request scheduling method and device

A model training request scheduling method, applied in the field of deep learning, that addresses the slowdown of asynchronous training caused by inefficient request handling, thereby shortening request response time and improving training efficiency.

Pending Publication Date: 2022-06-07
ALIBABA (CHINA) CO LTD

AI Technical Summary

Problems solved by technology

Because different workers run asynchronously, requests arrive at the ps (parameter server) at random times. Processing each request strictly in chronological order makes request responses inefficient, which slows down asynchronous training.



Embodiment Construction

[0029] Exemplary embodiments are described in detail here, and examples of them are illustrated in the drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar features unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the present application. Rather, they are only examples of devices and methods consistent with some aspects of the present application, as detailed in the appended claims.

[0030] First, the terms involved in this application are explained:

[0031] Parameter nodes: Nodes that store, distribute, aggregate, and update the parameters of the model during deep learning training. Each parameter node is responsible for a portion of the model's parameters.
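
As an illustration of the statement that each parameter node is responsible for some of the model's parameters, the following minimal sketch partitions parameter names across parameter nodes. The hash-based assignment, the function names `assign_shard` and `build_shards`, and the example parameter names are all assumptions made for illustration; the application does not state how parameters are divided.

```python
import zlib
from collections import defaultdict


def assign_shard(param_name: str, num_parameter_nodes: int) -> int:
    """Map a parameter name to a parameter-node index using a stable hash.

    Hash partitioning is an assumption; the source only says that each
    parameter node holds some of the parameters.
    """
    return zlib.crc32(param_name.encode("utf-8")) % num_parameter_nodes


def build_shards(param_names, num_parameter_nodes):
    """Group parameter names by the parameter node that would own them."""
    shards = defaultdict(list)
    for name in param_names:
        shards[assign_shard(name, num_parameter_nodes)].append(name)
    return dict(shards)


# Hypothetical parameter names, used only to exercise the sketch.
param_names = ["embedding.weight", "dense1.weight", "dense1.bias", "output.weight"]
shards = build_shards(param_names, num_parameter_nodes=2)
```

Every parameter name lands on exactly one parameter node, and the assignment is deterministic across runs because `zlib.crc32` does not depend on process-specific hash seeds.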

[0032] Compute nodes: In the process of deep learning training, the nodes used to perform training-related jobs, including inference computat...


Abstract

The invention provides a model training request scheduling method and device for a model training system. The system consists of a plurality of parameter nodes and a plurality of computing nodes: the parameter nodes update the parameters of the model based on gradients, and the computing nodes compute the gradients based on the parameters of the model. The method comprises the following steps: acquiring to-be-processed requests sent by a plurality of computing nodes; sorting the to-be-processed requests according to the node identifiers they carry, where each node identifier identifies the computing node that sent the corresponding request; and, according to the sorting result, sending each to-be-processed request to the parameter node in turn to obtain a processing result for each request. Because the requests are sorted by the identifiers of the computing nodes, they are answered in an orderly manner, which shortens the time the parameter nodes take to respond to the requests of a single computing node and improves model training efficiency.
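
The three steps in the abstract (acquire, sort by node identifier, send in order) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Request` fields `arrival` and `payload`, the tie-break on arrival order, the `handle` callback standing in for a parameter node, and the unit service time used by `mean_completion_time` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    node_id: int   # identifier of the computing node that sent the request
    arrival: int   # arrival order at the scheduler (hypothetical field)
    payload: str   # what the request asks for (hypothetical field)


def schedule(pending):
    """Sort pending requests by node identifier; ties keep arrival order."""
    return sorted(pending, key=lambda r: (r.node_id, r.arrival))


def dispatch(ordered, handle):
    """Send each request to the parameter node in turn and collect results."""
    return [handle(r) for r in ordered]


def mean_completion_time(ordered):
    """Mean over computing nodes of the step at which a node's last request
    is served, assuming every request takes one time unit (an assumption)."""
    last_done = {}
    for step, r in enumerate(ordered, start=1):
        last_done[r.node_id] = step
    return sum(last_done.values()) / len(last_done)


# Made-up interleaved arrivals from two computing nodes.
pending = [
    Request(node_id=2, arrival=0, payload="a"),
    Request(node_id=1, arrival=1, payload="a"),
    Request(node_id=2, arrival=2, payload="b"),
    Request(node_id=1, arrival=3, payload="b"),
]
ordered = schedule(pending)
results = dispatch(ordered, lambda r: f"node{r.node_id}:{r.payload}")

fifo_mean = mean_completion_time(pending)    # chronological order: 3.5
sorted_mean = mean_completion_time(ordered)  # grouped by node: 3.0
```

In this toy case, grouping requests by computing node lets node 1 receive all of its responses after two steps instead of four, so the average time until a node has been fully served drops from 3.5 to 3.0 steps. The numbers depend entirely on the assumed unit service time and arrival pattern.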

Description

Technical field

[0001] The present application relates to the field of deep learning technology, and in particular to a model training request scheduling method and apparatus.

Background

[0002] In deep learning fields such as computer vision, natural language processing, and personalized recommendation, model training usually relies on larger-scale model parameters or larger-scale training data to improve the quality or effect of the model. Examples include click-through rate prediction models with trillions of parameters and language models with hundreds of billions of parameters. Distributed training has become a necessary means of training ultra-large-scale models efficiently.

[0003] The commonly used distributed training architecture is the ps-worker architecture, which divides nodes into two roles: ps (parameter server) and worker (compute server). In asynchronous mode, each worker independently initiates pull...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F9/48, G06F9/50, G06N20/00
CPC: G06F9/4881, G06F9/5038, G06N20/00, G06F2209/5011, G06F2209/5018
Inventor: 李豪, 董建波, 宋钺, 张泽超
Owner: ALIBABA (CHINA) CO LTD