Method and equipment for model training in distributed system

A model training technology for distributed systems, applied in computing models, transmission systems, program control design, etc. It addresses the heavy workload and burden placed on the master node, with the effects of avoiding the risk of deadlock, reducing that burden, and saving performance overhead.

Active Publication Date: 2017-08-08
HUAWEI TECH CO LTD
Cites: 6. Cited by: 42.

AI Technical Summary

Problems solved by technology

[0003] It can be seen that the master node needs to perform multiple model updates and parameter deliveries. In large-scale training scenarios, the workload of the master node is relatively heavy, which places a greater burden on the master node and can easily make it the bottleneck of the entire training process.



Examples


Embodiment Construction

[0033] In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the embodiments of the present invention.

[0034] It should also be understood that, although the terms first, second, etc. may be used herein to describe various components, these terms are only used to distinguish one component from another. "Multiple" in the embodiments of the present invention means two or more. "And/or" describes the association relationship of ...



Abstract

A method and equipment for model training in a distributed system are used to relieve the burden on the master node during model training. In the method, a parameter server in a first slave node receives training results sent by a parameter client in at least one slave node of the distributed system. The first slave node is any slave node in the distributed system, and the parameter client of each slave node executes a training task corresponding to a sub-model stored in the parameter server of that slave node so as to obtain the training results. The parameter server in the first slave node then updates its stored sub-model according to the received training results.
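The flow in the abstract can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the class names (`ParameterServer`, `ParameterClient`, `SlaveNode`), the gradient-averaging update rule, the quadratic loss, and the choice that every client trains against every server's sub-model are all assumptions made for the example. The point it shows is only that each sub-model is updated by the node that stores it, with no master node in the update path.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class ParameterServer:
    """Stores one sub-model (a list of weights) and updates it from results."""
    sub_model: list[float]
    learning_rate: float = 0.5  # assumed value, not from the source

    def update(self, training_results: list[list[float]]) -> None:
        # Assumed update rule: average the received gradients, take one step.
        n = len(training_results)
        for i in range(len(self.sub_model)):
            avg_grad = sum(result[i] for result in training_results) / n
            self.sub_model[i] -= self.learning_rate * avg_grad


@dataclass
class ParameterClient:
    """Runs the training task for a sub-model and returns the result."""
    target: float  # stands in for this node's local training data

    def train(self, sub_model: list[float]) -> list[float]:
        # Gradient of the toy loss 0.5 * (w - target)^2 for each weight w.
        return [w - self.target for w in sub_model]


@dataclass
class SlaveNode:
    server: ParameterServer
    client: ParameterClient


def run_round(nodes: list[SlaveNode]) -> None:
    """One training round with no master node involved in any update."""
    for owner in nodes:
        # Clients on the slave nodes train against the sub-model kept by
        # owner.server and send their results to that server.
        results = [node.client.train(owner.server.sub_model) for node in nodes]
        # The owning parameter server updates its own stored sub-model.
        owner.server.update(results)
```

With two nodes whose clients pull toward 1.0 and 3.0, repeated calls to `run_round` move every weight toward 2.0, and the update work is spread across the nodes that hold the sub-models.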

Description

Technical field

[0001] The invention relates to the technical field of machine learning, and in particular to a method and equipment for model training in a distributed system.

Background

[0002] Building a model in machine learning (ML) is a key step in data mining (DM) tasks. Take a general parallel framework (Spark) as an example. When building a model, the master node (Master) can send tasks to multiple slave nodes (Slave) for execution. Executing a task generally takes multiple rounds of iterative operations. After each round, each slave node reports the result of its iterative operation to the master node; the master node updates the model and sends the updated parameters to each slave node; and each slave node then starts the next round of iterative operation.

[0003] It can be seen that the master node needs to perform multiple model updates and parameter deliv...
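To make the master node's workload in this background scheme concrete, here is a minimal sketch of the centralized loop. It is not Spark code: the function name `train_centralized`, the single scalar parameter, the toy gradient, and the learning rate are all hypothetical choices for illustration. It simply counts how many updates and parameter deliveries fall on the master.

```python
def train_centralized(initial_param, slave_targets, rounds, learning_rate=0.5):
    """Toy master/slave training loop that tallies the master's work.

    slave_targets stands in for each slave node's local data (assumption).
    Returns (final_param, master_updates, master_sends).
    """
    param = initial_param
    master_updates = 0
    master_sends = 0
    for _ in range(rounds):
        # Each slave runs one round of iteration and reports to the master.
        reports = [param - target for target in slave_targets]
        # The master alone updates the model from all reports.
        param -= learning_rate * sum(reports) / len(reports)
        master_updates += 1
        # The master alone delivers the updated parameter to every slave.
        master_sends += len(slave_targets)
    return param, master_updates, master_sends
```

In this sketch the master performs one update per round and one delivery per slave per round, so its delivery count grows with rounds times the number of slave nodes. That growth is the load the text identifies as a burden on the master node.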


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F15/18, G06N20/00
CPC: G06N20/00, G06F9/5027, G06F2209/5017, H04L41/0803, H04L67/10
Inventor: 张友华, 涂丹丹
Owner: HUAWEI TECH CO LTD