Efficient neural network training scheduling method based on heterogeneous distributed system

A neural network training scheduling technology for heterogeneous distributed systems. It targets the problems of increased synchronization waiting time, training interruption, and non-convergence, with the aim of improving robustness and scalability while keeping high accuracy and a high convergence rate.

Inactive Publication Date: 2020-04-28
杭州电子科技大学舟山同博海洋电子信息研究院有限公司 +1
2 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0006] 2) When the performance difference between computing nodes is large, the time overhead of synchronous waiting increases.
However, none of the above strategies can completely solve the uneven load caused by node heterogeneity or task preemption, which leads to frequent training interruptions and non-convergence.
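The synchronous waiting overhead described above can be illustrated with a small calculation. The sketch below assumes a plain synchronous barrier, where every node idles until the slowest node finishes its iteration. The function name and the timing values are hypothetical and are not taken from the patent.

```python
def sync_wait_times(iter_times):
    """Return the idle time of each node at a synchronous barrier.

    iter_times: seconds each node needs for one iteration (hypothetical values).
    Every node has to wait until the slowest node has finished.
    """
    slowest = max(iter_times)
    return [slowest - t for t in iter_times]


# Four nodes with equal performance: nobody waits.
homogeneous = [1.0, 1.0, 1.0, 1.0]
# One node is four times slower: the other three idle for 3 s per iteration.
heterogeneous = [1.0, 1.0, 1.0, 4.0]

print(sync_wait_times(homogeneous))
print(sync_wait_times(heterogeneous))
```

In the heterogeneous case the three fast nodes spend more time waiting than computing, which is the load imbalance the passage refers to.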


Examples


Embodiment Construction

[0023] The specific steps of the method of the present invention are:

[0024] Step 1. Establish a distributed neural network training system based on a parameter server. The system has two kinds of nodes, the master node and the worker nodes, which communicate with each other in a point-to-point manner.

[0025] As shown in Figure 6, the present invention adopts a parameter server as the implementation architecture and uses processes to simulate the master node and the worker nodes. The master node uses multithreading to communicate with the worker nodes point-to-point. Worker nodes do not exchange data with each other; each worker node communicates only with the master node, which greatly reduces network congestion. In the process simulating the master node, an additional thread is set up to aggregate the model parameters transmitted...
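The architecture in paragraph [0025] can be sketched as follows. This is a minimal illustration, not the patent's implementation: for brevity it simulates the master and the workers as threads inside one Python process (the text uses processes), models each point-to-point link as a pair of queues, and uses a made-up constant update in place of a real gradient. The averaging rule in the aggregation thread is also an assumption, since the source text is truncated before it describes the aggregation.

```python
import queue
import threading


def run_parameter_server(num_workers=3, rounds=2, dim=4):
    """Simulate a master with one handler thread per worker plus an aggregator."""
    params = [0.0] * dim
    lock = threading.Lock()
    updates = queue.Queue()  # handler threads -> aggregation thread
    # One queue pair per worker stands in for a point-to-point link.
    to_worker = [queue.Queue() for _ in range(num_workers)]
    from_worker = [queue.Queue() for _ in range(num_workers)]

    def worker(rank):
        # A worker talks only to the master, never to another worker.
        for _ in range(rounds):
            _current = to_worker[rank].get()   # pull parameters from the master
            delta = [float(rank + 1)] * dim    # hypothetical stand-in for a gradient
            from_worker[rank].put(delta)       # push the update to the master

    def handler(rank):
        # Master-side thread serving exactly one worker.
        for _ in range(rounds):
            with lock:
                snapshot = list(params)
            to_worker[rank].put(snapshot)
            updates.put(from_worker[rank].get())

    def aggregator():
        # Additional master thread that folds worker updates into the model.
        for _ in range(num_workers * rounds):
            delta = updates.get()
            with lock:
                for i, d in enumerate(delta):
                    params[i] += d / num_workers  # assumed rule: average the updates

    threads = [threading.Thread(target=aggregator)]
    for rank in range(num_workers):
        threads.append(threading.Thread(target=handler, args=(rank,)))
        threads.append(threading.Thread(target=worker, args=(rank,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params


if __name__ == "__main__":
    print(run_parameter_server())
```

Because every exchange goes through the master, the number of links grows linearly with the number of workers rather than quadratically, which is the congestion argument made in the paragraph.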

Abstract

The invention discloses an efficient neural network training scheduling method based on a heterogeneous distributed system. The method comprises the following steps. First, a resource detection system detects and analyzes dynamic changes of resources in the distributed system. The training process is then decomposed into inner iterations and outer iterations, which serve as key components of a task scheduling system. Finally, the task scheduling system adaptively modifies environmental parameters and schedules computation according to the node state information provided by the resource detection system. Experiments on a public data set show that the method offers better robustness and scalability while maintaining high accuracy and a high convergence rate.
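One way to picture the interplay between the resource detection system and the task scheduling system is the sketch below. It is only an illustration under stated assumptions: the abstract does not say which parameters are adapted or how, so the proportional rule used here, the function names, and the node speeds are all hypothetical. The idea shown is that each node receives a number of inner iterations proportional to its measured speed, so that all nodes reach the end of an outer iteration at roughly the same time.

```python
def plan_inner_iterations(node_speeds, base_iters=10):
    """Assign inner-iteration counts proportional to node speed.

    node_speeds: iterations per second per node, as reported by a
                 hypothetical resource detector.
    base_iters:  inner iterations given to the fastest node (assumed value).
    """
    fastest = max(node_speeds.values())
    return {
        node: max(1, round(base_iters * speed / fastest))
        for node, speed in node_speeds.items()
    }


def outer_iteration_time(plan, node_speeds):
    """Duration of one outer iteration: the slowest node determines it."""
    return max(plan[node] / node_speeds[node] for node in plan)


speeds = {"a": 10.0, "b": 5.0, "c": 2.0}   # hypothetical measurements
adaptive = plan_inner_iterations(speeds)    # fewer inner iterations on slow nodes
uniform = {node: 10 for node in speeds}     # same workload on every node

print(adaptive, outer_iteration_time(adaptive, speeds))
print(uniform, outer_iteration_time(uniform, speeds))
```

With these numbers the uniform plan is held back by node "c", while the adaptive plan lets all three nodes finish together. A real scheduler would refresh the speed measurements between outer iterations.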

Description

Technical field

[0001] The invention belongs to the technical field of distributed machine learning acceleration, and specifically relates to an efficient neural network training scheduling method based on a heterogeneous distributed system.

Background art

[0002] Machine learning, especially deep learning, has become one of the core research topics in artificial intelligence and is widely used in image recognition, natural language processing, and other fields. As training data sets and the number of model parameters continue to grow, training machine learning models on a single machine can no longer cope with large-scale data. Large training data sets and complex model structures can improve model accuracy, but they bring higher time and resource costs. In recent years, with the development of distributed systems and the improvement of hardware performance, distributed machine learning has become a resear...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/063, G06N3/08
CPC: G06N3/063, G06N3/08
Inventor: 张纪林, 周详, 万健, 任永坚, 周丽
Owner 杭州电子科技大学舟山同博海洋电子信息研究院有限公司