Deep learning framework Caffe system and algorithm based on MIC cluster

A deep learning and cluster computing technique, applied in the field of high-performance computing. It addresses the cost, scalability, and performance limitations and the high time complexity of running on a single node, achieves load balancing, and improves kernel computing efficiency and overall performance.

Inactive Publication Date: 2017-05-10

ZHENGZHOU YUNHAI INFORMATION TECH CO LTD

3 Cites, 16 Cited by

Problems solved by technology

[0003] As the problems to be solved grow more complex and the performance demanded of convolutional neural networks keeps rising, networks require more and more training data, stored in a distributed fashion, and correspondingly more trainable parameters and computation. The original version of Caffe, however, is generally implemented serially on a stand-alone system, so training a complex model on a large amount of data takes a considerable amount of time.

[0004] Because the original version of Caffe runs on a single machine in a single process, its scalability and performance are limited. In addition, the ForwardBackward calculation in the Caffe kernel involves complex operations on matrices and equations. The original version of Caffe executes this part serially on a single thread, so when the matrices are large, the time complexity on a single node is high.

Examples

Embodiment 1

[0023] The MIC-cluster-based Caffe deep learning framework system comprises multiple nodes in the MIC cluster. The nodes include a master node and slave nodes, and the nodes share data and tasks through MPI communication. The master node computes and summarizes the information fed back by each node, then distributes the updated parameters to each node. Each slave node uses the new parameters to perform the next round of iterative calculation and feeds the execution result back to the master node.

[0024] The MIC-cluster-based Caffe deep learning framework algorithm runs on multiple nodes of the MIC cluster through MPI. The tasks and data are divided equally among the nodes through MPI communication, and the nodes execute their sub-tasks and process their sub-data in parallel, performing the ForwardBackward calculation in Caffe. The execution results are fed back to the master node, the master node calcu...
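The iteration described in this embodiment can be sketched as a single-process simulation. This is a minimal illustration under stated assumptions, not the patent's implementation: plain Python lists stand in for MPI messages, a one-parameter linear model stands in for Caffe's ForwardBackward, and the master is assumed to summarize slave feedback by simple averaging. All function names are hypothetical.

```python
# Single-process simulation of the master/slave iteration described above.
# Hypothetical names throughout; plain lists stand in for MPI messages.

def split_equally(samples, num_slaves):
    """Divide the samples into num_slaves contiguous, near-equal shards."""
    base, extra = divmod(len(samples), num_slaves)
    shards, start = [], 0
    for i in range(num_slaves):
        size = base + (1 if i < extra else 0)
        shards.append(samples[start:start + size])
        start += size
    return shards

def forward_backward(weight, shard):
    """Stand-in for ForwardBackward: gradient of mean squared error for y = weight * x."""
    grad = 0.0
    for x, y in shard:
        grad += 2.0 * (weight * x - y) * x
    return grad / len(shard)

def master_step(weight, gradients, learning_rate):
    """Master node: summarize slave feedback (assumed: plain average) and update."""
    mean_grad = sum(gradients) / len(gradients)
    return weight - learning_rate * mean_grad

def train(samples, num_slaves=4, iterations=50, learning_rate=0.01):
    shards = split_equally(samples, num_slaves)
    weight = 0.0
    for _ in range(iterations):
        # In a real run each slave would compute its gradient in parallel
        # and send it back to the master.
        gradients = [forward_backward(weight, shard) for shard in shards]
        # The master then distributes the updated weight for the next round.
        weight = master_step(weight, gradients, learning_rate)
    return weight

# Toy data generated from y = 3x, so training should recover a weight near 3.
samples = [(float(x), 3.0 * x) for x in range(1, 9)]
final_weight = train(samples)
```

In an actual MPI program the list comprehension over shards would be replaced by collective communication between ranks; the sketch only shows the order of operations: divide, compute locally, summarize on the master, redistribute.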

Embodiment 2

[0028] Taking 5 nodes as an example, the master-slave nodes and master-slave processes are allocated as shown in Figure 2. The numbering in the MIC cluster runs from 0 to 16. Number 0 is set as the master node, and the master node is connected to 4 slave nodes through threads. Each slave node contains 1 master process and 3 slave processes. Slave node 1 contains master process 1 and slave processes 2, 3, and 4. Slave node 2 contains master process 5 and slave processes 6, 7, and 8. Slave node 3 contains master process 9 and slave processes 10, 11, and 12. Slave node 4 contains master process 13 and slave processes 14, 15, and 16.

[0029] If the number of parallel threads used for slave-process calculation changes, the number n of slave processes is increased or decreased accordingly so that the threads available on each MIC node are fully utilized.
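The numbering in this embodiment follows a regular pattern, which a short sketch can make explicit. The function name `describe_rank` and the parameter `procs_per_slave_node` are hypothetical; the mapping is inferred from the enumeration in the paragraph above, not taken from code in the patent.

```python
def describe_rank(rank, procs_per_slave_node=4):
    """Map a process number to (node, role) for the layout in this embodiment."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    if rank == 0:
        # Number 0 is the master node.
        return (0, "master node")
    # Numbers 1..16 fill slave nodes 1..4 in blocks of four.
    node = (rank - 1) // procs_per_slave_node + 1
    # The first number in each block is that node's master process.
    is_master_process = (rank - 1) % procs_per_slave_node == 0
    return (node, "master process" if is_master_process else "slave process")

# Rebuild the full 0-16 layout, grouped by node.
layout = {}
for rank in range(17):
    node, role = describe_rank(rank)
    layout.setdefault(node, []).append((rank, role))
```

Iterating over `layout` reproduces the allocation listed above, for example master process 9 and slave processes 10, 11, and 12 on slave node 3.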
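The patent does not state how n is derived from the thread count. One plausible rule, shown here purely as an assumption, is to divide a node's hardware thread budget by the threads used per process and reserve one process as the node's master process. The function name, its parameters, and the figure of 240 threads are illustrative and not taken from the patent.

```python
def slave_processes_per_node(hardware_threads, threads_per_process, master_processes=1):
    """Return n, the number of slave processes that fills a node's thread budget.

    Assumed rule (not from the patent): every process, master or slave,
    uses threads_per_process threads.
    """
    if hardware_threads <= 0 or threads_per_process <= 0:
        raise ValueError("thread counts must be positive")
    total_processes = hardware_threads // threads_per_process
    return max(total_processes - master_processes, 0)

# Illustrative budget of 240 hardware threads per node.
n_wide = slave_processes_per_node(240, 60)    # 4 processes in total
n_narrow = slave_processes_per_node(240, 30)  # 8 processes in total
```

With 60 threads per process this rule gives 1 master process and 3 slave processes per node, the same shape as the layout in Embodiment 2; halving the threads per process raises n to 7.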

Embodiment 3

[0031] The difference from Embodiment 1 is that the ForwardBackward calculation in the Caffe kernel, which involves complex operations on matrices and equations, is decomposed using OpenMP multi-threaded concurrent execution. The multi-threading is applied to the outer loop, which reduces thread-scheduling overhead, and parallelizing the matrix operations greatly improves the computational efficiency of the whole program. The kernel's multi-threaded parallelism is mainly based on decomposing convolution, pooling, and other layers along batch_size, that is, reading and processing images in parallel, which reduces the time complexity of the program and improves performance. The flow chart of the multi-threaded parallel implementation is shown in Figure 3.
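The batch-level decomposition can be illustrated with a small sketch. The patent uses OpenMP inside Caffe's kernel; the version below swaps in a Python thread pool as an analogue, so it shows the structure (one task per image, parallelism on the outer loop over the batch) rather than the actual speedup, which CPython's global interpreter lock would limit for pure-Python work. The function names and the 2x2 max pooling layer are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor

def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on one image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]

def pool_batch_serial(batch):
    """Reference version: process the images one after another."""
    return [max_pool_2x2(image) for image in batch]

def pool_batch_parallel(batch, num_threads=4):
    """Outer-loop parallelism: each image in the batch is an independent task."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns results in input order, so the output matches the serial version.
        return list(pool.map(max_pool_2x2, batch))

# A batch of eight 4x4 images with deterministic toy pixel values.
batch = [[[(n + r * 4 + c) % 7 for c in range(4)] for r in range(4)] for n in range(8)]
pooled = pool_batch_parallel(batch)
```

Parallelizing over the batch rather than over pixels inside one image keeps each task large, which is consistent with the paragraph's point that a parallel outer loop reduces thread-scheduling overhead.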

Abstract

The invention discloses a deep learning framework Caffe system and algorithm based on an MIC cluster. The Caffe algorithm runs on multiple nodes of the MIC cluster using MPI. The nodes divide the tasks and data equally via MPI communication; different nodes execute their subtasks and process their sub-data in parallel, perform the ForwardBackward calculation in Caffe, and feed the execution results back to a master node. The master node computes and summarizes the weight information fed back by each node and distributes the updated parameters to the nodes, and each slave node performs the next iteration with the new parameters. In addition, multi-threaded parallel computing within a single process is realized. Because a convolutional neural network involves a large number of matrix calculations, and multi-threaded parallel computing is applied mainly to those calculations, setting the number of parallel threads allows the performance of the MIC processor to be exploited to the fullest extent and improves the runtime performance of Caffe.

Description

Technical field

[0001] The invention relates to the field of high-performance computing, and in particular to an optimization method that processes the data and tasks of a deep learning framework in parallel on a cluster system while ensuring load balancing between processes and nodes.

Background technique

[0002] Caffe (Convolutional Architecture for Fast Feature Embedding) was written by Jia Yangqing, who received his Ph.D. from UC Berkeley. It is currently one of the most popular deep learning frameworks and is characterized by ease of use, modularity, and openness. Caffe includes implementations of a variety of convolutional neural network models, such as GoogLeNet and AlexNet. The training process of the entire convolutional neural network is realized through layer-by-layer calculations such as convolution and down-sampling.

[0003] As the problems that need to be solved become more and more complex and the performance requirements of convolutional neural networks are get...

Application Information

IPC(8): G06N3/063, G06F9/48, G06F9/50, G06F9/54

CPC: G06N3/063, G06F9/4843, G06F9/5027, G06F9/546, G06F2209/483, G06F2209/5018

Inventor: 刘姝

Owner: ZHENGZHOU YUNHAI INFORMATION TECH CO LTD
