Parameter communication optimization method for distributed machine learning

A machine learning and parameter communication technology, applied in machine learning, database distribution/replication, instruments, etc., which can solve problems such as unbalanced iterative computing load, fault-tolerance amplification, and lost local updates, and achieves the effects of optimized heterogeneous-cluster computing performance, high accuracy and speed, and optimized training speed.

Pending Publication Date: 2020-04-17
Applicant: 杭州电子科技大学舟山同博海洋电子信息研究院有限公司 +2

AI Technical Summary

Problems solved by technology

The bulk synchronous parallel method leads to an unbalanced iterative computation load because of performance differences among the computing nodes.

[0004] At present, there are some methods that address the above problems, such as the asynchronous iteration scheme for distributed machine learning, in which a computing node may use its local model parameters to execute the next iteration before receiving the global model parameters. However, this approach may fall into a local optimum, and it cannot guarantee that the machine learning model will eventually converge to the optimal solution.
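A minimal sketch of the two synchronization behaviours discussed above, using a toy least-squares objective; all names are illustrative assumptions and do not come from the patent:

```python
# Hypothetical illustration of bulk synchronous vs. asynchronous iteration.
import numpy as np

def gradient(w, X, y):
    """Gradient of mean squared error on one node's data shard."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def bsp_round(shards, w, lr=0.01):
    # Bulk synchronous parallel: the update waits for *every* node's gradient,
    # so one slow node leaves the faster nodes idle (unbalanced load).
    grads = [gradient(w, X, y) for X, y in shards]   # implicit barrier
    return w - lr * np.mean(grads, axis=0)

def async_round(shard, w_global, w_stale, lr=0.01):
    # Asynchronous iteration: a node pushes an update computed from its own,
    # possibly stale, parameters without waiting; with no bound on staleness
    # this can drift toward a poor solution, and convergence is not guaranteed.
    X, y = shard
    return w_global - lr * gradient(w_stale, X, y)

rng = np.random.default_rng(0)
shards = [(rng.normal(size=(64, 4)), rng.normal(size=64)) for _ in range(4)]
w = np.zeros(4)
for _ in range(50):
    w = bsp_round(shards, w)
```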



Embodiment Construction

[0023] The present invention is further described below in conjunction with the accompanying drawings and a specific implementation process:

[0024] Step 1 - Set up nodes in a master-slave manner:

[0025] As shown in Figure 1, the present invention uses one node of the heterogeneous cluster as the parameter server and the other nodes as computing nodes, forming a parameter server system. As shown in Figure 2, the parameter server is implemented in a multi-threaded manner: each thread corresponds to one computing node and is used to receive the gradient calculated by that node and send parameters back to it, while a further thread is dedicated to summing the gradients gathered by the above threads and to updating and broadcasting the model parameters. As shown in Figure 3, the computing nodes are mainly used to compute and update the model gradients.
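A minimal sketch of this thread layout, using in-process queues in place of real network communication; the class and method names below are assumptions for illustration, not the patent's implementation:

```python
# Hypothetical sketch of the multi-threaded parameter server described in
# [0025]; queues stand in for messages between the server and the nodes.
import queue
import threading
import numpy as np

class ParameterServer:
    def __init__(self, dim, n_nodes, lr=0.01):
        self.params = np.zeros(dim)
        self.lr = lr
        self.n_nodes = n_nodes
        self.grad_in = [queue.Queue() for _ in range(n_nodes)]     # node -> server
        self.params_out = [queue.Queue() for _ in range(n_nodes)]  # server -> node
        self.agg = queue.Queue()                                   # node threads -> update thread

    def node_thread(self, i, rounds):
        # One thread per computing node: receive that node's gradient and
        # forward it to the dedicated aggregation/update thread.
        for _ in range(rounds):
            self.agg.put(self.grad_in[i].get())

    def update_thread(self, rounds):
        # Dedicated thread: sum the gradients collected by the node threads,
        # update the global model parameters, and broadcast them to all nodes.
        for _ in range(rounds):
            total = sum(self.agg.get() for _ in range(self.n_nodes))
            self.params -= self.lr * total / self.n_nodes
            for q in self.params_out:
                q.put(self.params.copy())

def start_server(ps, rounds):
    threads = [threading.Thread(target=ps.node_thread, args=(i, rounds), daemon=True)
               for i in range(ps.n_nodes)]
    threads.append(threading.Thread(target=ps.update_thread, args=(rounds,), daemon=True))
    for t in threads:
        t.start()
    return threads
```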

[0026] Step 2 - Adopt a data parallel strategy:

[0027] The present invention constructs multiple copies of the network model to be ...
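A minimal data-parallel sketch under the usual assumption that each computing node keeps a full replica of the model and trains on its own shard of the data; the names below are illustrative only, not taken from the patent:

```python
# Hypothetical sketch of the data-parallel strategy of [0027]: several model
# replicas, one per computing node, each trained on a disjoint data shard.
import numpy as np

def shard_data(X, y, n_nodes):
    """Split the training set so that every computing node gets its own shard."""
    parts = np.array_split(np.arange(len(y)), n_nodes)
    return [(X[idx], y[idx]) for idx in parts]

class ModelReplica:
    """Full copy of the model parameters held by one computing node."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def local_gradient(self, X, y):
        # Gradient of a least-squares loss computed on this node's shard only.
        return 2.0 * X.T @ (X @ self.w - y) / len(y)

    def pull(self, global_w):
        # Replace the replica with the parameters broadcast by the server.
        self.w = global_w.copy()

rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 4)), rng.normal(size=256)
replicas = [ModelReplica(4) for _ in range(4)]
shards = shard_data(X, y, 4)
```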


Abstract

The invention discloses a parameter communication optimization method for distributed machine learning. The method extends the fault-tolerant characteristic of iterative-convergent machine learning algorithms and proposes a dynamic finite fault-tolerance characteristic, on which a distributed machine learning parameter communication optimization strategy is realized. By combining a performance detection model and dynamically adjusting the synchronization strategy between each computing node and the parameter server, the performance of every computing node is fully utilized while the accuracy of the machine learning model is ensured; sufficient computing resources are guaranteed, so that the training process of the model is not affected by dynamic changes of the distributed computing resources; and the training algorithm is decoupled from the system hardware resources, freeing developers from manually allocating computing resources and tuning data communication by experience, which effectively improves the scalability and execution efficiency of a program in various cluster environments. The method can be applied to fields such as distributed machine learning parameter communication optimization and cluster computing performance optimization.
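As one way to read the abstract, the sketch below shows a performance detection model that tracks per-node iteration times and derives a finite staleness bound from them (the "dynamic finite fault tolerance"); the concrete adjustment rule and all names are assumed heuristics for illustration, not the patent's formula:

```python
# Hypothetical sketch: a performance monitor measures each node's iteration
# time, and the allowed staleness bound is adapted from that measurement.
# The adjustment rule below is an assumption for illustration only.

class PerformanceMonitor:
    def __init__(self, n_nodes, alpha=0.5):
        self.alpha = alpha
        self.iter_time = [None] * n_nodes   # smoothed per-node iteration time

    def record(self, node, elapsed):
        old = self.iter_time[node]
        self.iter_time[node] = elapsed if old is None else (
            self.alpha * elapsed + (1.0 - self.alpha) * old)

    def staleness_bound(self, base=2, cap=16):
        # Heuristic: the more the slowest node lags behind the fastest one,
        # the larger the bound, so fast nodes are not idled by stragglers;
        # the bound stays finite so local updates cannot drift without limit.
        times = [t for t in self.iter_time if t is not None]
        if len(times) < 2:
            return base
        ratio = max(times) / min(times)
        return int(min(cap, max(base, round(base * ratio))))

def must_wait(node_clock, slowest_clock, bound):
    """A node blocks only when it is more than `bound` iterations ahead."""
    return node_clock - slowest_clock > bound
```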

Description

technical field

[0001] The invention belongs to the field of machine learning and high-performance parallel computing, and in particular relates to a parameter communication optimization method oriented to distributed machine learning.

Background technique

[0002] With the advent of the big data era, distributed machine learning has become a research hotspot because it can adapt to the complexity of big data, obtain higher prediction accuracy, and support more intelligent tasks.

[0003] The main purposes of distributing machine learning are: (1) to overcome the insufficient memory of a single computing node, so that data volumes at the TB level and above can be processed; and (2) to use parallel acceleration of model training to shorten training times that would otherwise take months. One of the most important issues is how to achieve parallel acceleration of the training process. Data parallelization based on a parameter server is a common parallelization scheme in distributed machine learning, in...


Application Information

IPC(8): G06N20/00; G06F16/27
CPC: G06N20/00; G06F16/27
Inventor: 张纪林, 屠杭镝, 沈静, 李明伟, 万健, 孙海
Owner: 杭州电子科技大学舟山同博海洋电子信息研究院有限公司