Deep learning framework Caffe system and algorithm based on MIC cluster

A deep learning and cluster technology, applied in the field of high-performance computing, that addresses the limited cost-effectiveness, scalability, and performance of a single node and its high time complexity. It achieves load balancing, improves kernel computing efficiency, and improves overall performance.

Status: Inactive
Publication Date: 2017-05-10
ZHENGZHOU YUNHAI INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] As the problems to be solved grow more complex and the performance demanded of convolutional neural networks keeps rising, networks require more and more training data, which is stored in a distributed manner, and correspondingly more trainable parameters and more computation. The original version of Caffe, however, is generally implemented serially on a stand-alone system, which results in a considerable amount of time spent training a complex model that use

Examples

Embodiment 1

[0023] The MIC-cluster-based system for the deep learning framework Caffe comprises multiple nodes in the MIC cluster, namely a master node and slave nodes. The nodes share data and tasks through MPI communication. The master node computes and summarizes the information fed back by each node and then distributes the updated parameters to every node. Each slave node uses the new parameters to perform the next round of iterative calculation and feeds its execution result back to the master node.

[0024] The MIC-cluster-based algorithm for the deep learning framework Caffe runs on multiple nodes of the MIC cluster using MPI. Tasks and data are divided equally among the nodes through MPI communication, and the nodes process their sub-tasks and sub-data in parallel, performing the ForwardBackward calculation in Caffe. The execution results are fed back to the master node, the master node calcu...
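
The round described in paragraphs [0023] and [0024] can be sketched as a single-process simulation. This is only an illustration under stated assumptions: it uses no real MPI, the names `split_evenly`, `slave_forward_backward`, `master_round`, and `train` are hypothetical, a one-weight least-squares model stands in for Caffe's ForwardBackward, and the master is assumed to average the gradients returned by the slaves (the source only says it summarizes the fed-back information).

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair; hypothetical toy data


def split_evenly(data: Sequence, num_slaves: int) -> List[list]:
    """Divide data into num_slaves contiguous shards of nearly equal size."""
    if not 0 < num_slaves <= len(data):
        raise ValueError("need 1 <= num_slaves <= len(data)")
    base, extra = divmod(len(data), num_slaves)
    shards, start = [], 0
    for rank in range(num_slaves):
        size = base + (1 if rank < extra else 0)
        shards.append(list(data[start:start + size]))
        start += size
    return shards


def slave_forward_backward(w: float, shard: Sequence[Sample]) -> float:
    """Stand-in for ForwardBackward on one slave: mean gradient of
    0.5 * (w * x - y) ** 2 over the slave's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def master_round(w: float, shards: List[list], lr: float) -> float:
    """One iteration: collect slave results, average them, update w.
    On a real cluster the slave calls would run in parallel on separate nodes."""
    grads = [slave_forward_backward(w, shard) for shard in shards]
    total = sum(len(shard) for shard in shards)
    avg = sum(g * len(s) for g, s in zip(grads, shards)) / total
    return w - lr * avg  # the updated parameter is then sent back to every slave


def train(data: Sequence[Sample], num_slaves: int, rounds: int,
          lr: float = 0.05, w: float = 0.0) -> float:
    shards = split_evenly(data, num_slaves)
    for _ in range(rounds):
        w = master_round(w, shards, lr)
    return w


if __name__ == "__main__":
    toy = [(float(x), 2.0 * x) for x in range(1, 9)]  # y = 2x
    print(train(toy, num_slaves=4, rounds=50))
```

Weighting each slave's result by its shard size keeps the averaged update equal to the full-batch update even when the data cannot be divided exactly evenly.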

Embodiment 2

[0028] Taking 5 nodes as an example, the allocation of master and slave nodes and of master and slave processes is shown in Figure 2. Numbers 0-16 are assigned across the MIC cluster: number 0 is the master node, which is connected to 4 slave nodes through threads, and numbers 1-16 are the processes on the slave nodes. Each slave node contains 1 master process and 3 slave processes. Slave node 1 includes master process 1 and slave processes 2, 3, and 4. Slave node 2 includes master process 5 and slave processes 6, 7, and 8. Slave node 3 includes master process 9 and slave processes 10, 11, and 12. Slave node 4 includes master process 13 and slave processes 14, 15, and 16.
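
The numbering in paragraph [0028] follows a regular pattern that a short sketch can reproduce. The function name `process_layout` and the dictionary layout are hypothetical, chosen only for illustration. The sketch assumes number 0 is reserved for the master node and that each slave node receives one master process followed by its slave processes.

```python
from typing import Dict, List, Union

NodeInfo = Dict[str, Union[int, List[int]]]


def process_layout(num_slave_nodes: int, slaves_per_node: int) -> Dict[int, NodeInfo]:
    """Map each node to its master process number and slave process numbers.

    Node 0 is the master node and holds number 0. Slave node k (k >= 1)
    holds a contiguous block of 1 + slaves_per_node process numbers.
    """
    layout: Dict[int, NodeInfo] = {0: {"master_process": 0, "slave_processes": []}}
    block = 1 + slaves_per_node  # processes per slave node
    for node in range(1, num_slave_nodes + 1):
        first = 1 + (node - 1) * block
        layout[node] = {
            "master_process": first,
            "slave_processes": list(range(first + 1, first + block)),
        }
    return layout


if __name__ == "__main__":
    for node, info in process_layout(4, 3).items():
        print(node, info)
```

With 4 slave nodes and 3 slave processes per node this yields master processes 1, 5, 9, and 13, matching the allocation listed in the paragraph.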

[0029] When the number of parallel threads used by each slave process changes, the number n of slave processes is increased or decreased accordingly, so that the threads available on each MIC node are fully utilized.
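
One possible reading of paragraph [0029] is that n is chosen so that n times the threads per slave process fills the thread budget of a node. The sketch below assumes exactly that. The function name `slave_process_count` and the budget of 240 threads per node are hypothetical and do not come from the source.

```python
def slave_process_count(threads_per_node: int, threads_per_process: int) -> int:
    """Number n of slave processes that fits a node's thread budget.

    Assumes every slave process uses the same number of threads and that
    at least one slave process always runs.
    """
    if threads_per_node <= 0 or threads_per_process <= 0:
        raise ValueError("thread counts must be positive")
    return max(1, threads_per_node // threads_per_process)


if __name__ == "__main__":
    budget = 240  # hypothetical thread budget of one MIC node
    for per_process in (60, 80, 120):
        print(per_process, slave_process_count(budget, per_process))
```

Under this assumption, raising the thread count of each slave process lowers n and lowering it raises n, which is the adjustment the paragraph describes.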

Embodiment 3

[0031] The difference from Embodiment 1 is that the ForwardBackward calculation in the Caffe kernel involves complex operations on matrices and equations. These operations are decomposed using OpenMP multi-threaded concurrent execution, with the parallelism applied to the outer loop, which reduces the overhead of thread scheduling. Parallelizing the matrix operations in this way greatly improves the computational efficiency of the whole program. The multi-threaded parallelism in the kernel is based mainly on decomposing the batch_size dimension of the convolution, pooling, and other layers, that is, on reading and processing images in parallel, which reduces the running time of the program and improves performance. The flow chart of the multi-threaded parallel implementation is shown in Figure 3.
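
The batch_size decomposition can be illustrated with a small sketch in which each image of a batch is handled by a worker thread, the analogue of parallelizing the outer loop over the batch. This is not the patent's OpenMP code: it uses Python's `concurrent.futures` as a stand-in, the names `max_pool_2x2` and `pool_batch` are hypothetical, and CPython's global interpreter lock means it shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Image = List[List[float]]


def max_pool_2x2(image: Image) -> Image:
    """2x2 max pooling with stride 2 on one image (a 2D list)."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def pool_batch(batch: List[Image], num_threads: int) -> List[Image]:
    """Apply pooling to every image of the batch, one task per image.

    The outer loop over the batch is the only parallel loop; the inner
    loops over rows and columns stay serial inside each task.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(max_pool_2x2, batch))  # map preserves batch order


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(pool_batch([img, img], num_threads=2))
```

Parallelizing only the outermost loop gives each thread one large unit of work per image, so threads are scheduled once per image rather than once per pooling window.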


Abstract

The invention discloses a system and algorithm for the deep learning framework Caffe based on an MIC cluster. The Caffe algorithm runs on multiple nodes of the MIC cluster using MPI. The nodes divide tasks and data equally via MPI communication; the nodes execute their subtasks and process their sub-data in parallel, perform the ForwardBackward calculation in Caffe, and feed the execution results back to a master node. The master node computes and aggregates the weight information fed back by each node and distributes the updated parameters to the nodes, and each slave node performs the next iterative computation with the new parameters. In addition, multi-threaded parallel computing is implemented within each process. Because a convolutional neural network involves a large number of matrix calculations, and multi-threaded parallel computing is applied mainly to those matrix calculations, setting the number of parallel threads allows the performance of the MIC processor to be exploited to the fullest extent and improves the runtime performance of Caffe.

Description

Technical field

[0001] The invention relates to the field of high-performance computing, and in particular to an optimization method, based on a cluster system, for processing the data and tasks of a deep learning framework in parallel while ensuring load balancing between processes and between nodes.

Background technique

[0002] Caffe (Convolutional Architecture for Fast Feature Embedding) was written by Jia Yangqing, who received his Ph.D. from UC Berkeley. It is currently one of the most popular deep learning frameworks and is characterized by ease of use, modularity, and openness. Caffe contains implementations of a variety of convolutional neural network models, including GoogLeNet, AlexNet, and others. The training process of an entire convolutional neural network is realized through layer-by-layer calculations such as convolution and down-sampling.

[0003] As the problems that need to be solved become more and more complex and the performance requirements of convolutional neural networks are get...


Application Information

IPC(8): G06N3/063, G06F9/48, G06F9/50, G06F9/54
CPC: G06N3/063, G06F9/4843, G06F9/5027, G06F9/546, G06F2209/483, G06F2209/5018
Inventor: 刘姝
Owner: ZHENGZHOU YUNHAI INFORMATION TECH CO LTD