Distributed machine learning system and communication scheduling method suitable for same

A communication scheduling technology for distributed machine learning, applied in machine learning, transmission systems, instruments, etc. It addresses the waste of computing and communication resources and the growth of communication time as working nodes are added. Its stated effects are avoiding operating system scheduling and providing comprehensive, accurate traffic analysis.

Active Publication Date: 2020-09-01
HUNAN UNIV
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

In the above process, communication and computation do not overlap, which wastes computing and communication resources.
[0010] Moreover, as the amount of training data grows, adding nodes can reduce the computation time linearly. Placing more working nodes in a distributed machine learning system to train a model is therefore an inevitable trend. At the same time, adding working nodes makes the communication time grow rapidly and nonlinearly, so communication time becomes the bottleneck of distributed machine learning.

Examples

Embodiment Construction

[0046] The following is a detailed description of the embodiments of the present invention. This embodiment is carried out based on the technical solution of the present invention, and provides detailed implementation methods and specific operation processes to further explain the technical solution of the present invention.

[0047] The present invention provides an embodiment of a communication scheduling method applicable to a distributed machine learning system. An automaton is added to the distributed machine learning system. The automaton is a functional unit placed on a parameter server or on the network chip of a switch. It performs network traffic analysis on all nodes of the distributed machine learning system to identify the parameter servers and working nodes in the system.

[0048] As shown in Figure 1, the specific process by which the automaton performs traffic analysis on all nodes of the distributed machine learning system is:

[0049] Let the cu...

Abstract

The invention discloses a distributed machine learning system and a communication scheduling method suitable for it. The method comprises the following steps. An automaton is added on a parameter server of the distributed machine learning system or on a network chip of a switch, and the automaton identifies the parameter server and the working nodes in the system. The parameter server sends each working node its assigned parameters in sequence, sending to only one working node at any point in time. As soon as a working node has pulled its parameters from the parameter server, it starts computing the gradient from those parameters. When a working node finishes its gradient computation, it checks whether the parameter server is receiving gradients pushed by other working nodes at that moment; if not, it pushes its own gradient to the parameter server. According to the invention, the communication of the distributed machine learning system is reasonably scheduled, and the communication time cost of distributed machine learning is effectively reduced.
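The scheduling rule in the abstract can be illustrated with a small timeline simulation. This is a minimal sketch and not the patent's implementation: the function and field names are hypothetical, pull and push durations are assumed equal for every working node, downlink and uplink are assumed independent, and a working node that finds the parameter server busy is assumed to wait until the server is free (the abstract does not state what happens in that case).

```python
from dataclasses import dataclass


@dataclass
class WorkerTimeline:
    """Timestamps of one working node within a single iteration."""
    worker: int
    pull_start: float
    pull_end: float
    compute_end: float
    push_start: float
    push_end: float


def schedule_iteration(compute_times, pull_time, push_time):
    """Simulate one iteration under the one-transfer-at-a-time rule.

    compute_times: gradient computation time per working node.
    pull_time / push_time: assumed identical for all nodes (illustrative).
    """
    # Step 1: the parameter server sends parameters to one node at a time.
    # Each node starts computing as soon as its own pull completes.
    downlink_free = 0.0
    pending = []
    for worker, compute in enumerate(compute_times):
        pull_start = downlink_free
        pull_end = pull_start + pull_time
        downlink_free = pull_end
        pending.append((pull_end + compute, worker, pull_start, pull_end))

    # Step 2: a node pushes only when the server is not already receiving
    # a gradient. Assumption: otherwise it waits for the uplink to free up.
    uplink_free = 0.0
    timelines = []
    for compute_end, worker, pull_start, pull_end in sorted(pending):
        push_start = max(compute_end, uplink_free)
        push_end = push_start + push_time
        uplink_free = push_end
        timelines.append(WorkerTimeline(
            worker, pull_start, pull_end, compute_end, push_start, push_end))
    return sorted(timelines, key=lambda t: t.worker)


def iteration_time(timelines):
    """Time at which the last gradient has reached the parameter server."""
    return max(t.push_end for t in timelines)


if __name__ == "__main__":
    for t in schedule_iteration([4.0, 4.0, 4.0], pull_time=1.0, push_time=1.0):
        print(t)
```

In this toy model, staggering the pulls also staggers the moments at which the working nodes finish computing, so with equal compute times their pushes tend not to collide at the parameter server.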

Description

Technical field

[0001] The invention belongs to the interdisciplinary field of distributed computing and machine learning, and specifically relates to a distributed machine learning system and a communication scheduling method applicable to it.

Background art

[0002] With the advent of the big data era, machine learning algorithms, especially deep learning algorithms suited to large-scale data, are receiving more and more attention and are applied in areas including speech recognition, image recognition, and natural language processing. However, as the input training data (the data used to fit the neural network model in machine learning) and the neural network models grow, single-node machine learning training runs into memory limits and training times of weeks or even months, and distributed machine learning emerged in response. Distributed machine learning has received widespread attention in both industry and academia. For example, Goo...

Application Information

IPC(8): G06N20/00, H04L29/08
CPC: G06N20/00, H04L67/10, H04L67/60, Y02D30/50
Inventor: 陈果, 陈博伟, 蔡均瑶
Owner: HUNAN UNIV