Distributed parallel deep neural network performance evaluation method for super computer

A deep neural network and supercomputer technology, applied to neural learning methods, biological neural network models, computing, and related fields. It addresses the problems that the Tianhe-3 prototype provides no framework support and no evaluation method for distributed parallel deep neural networks, so the supercomputer's powerful computing capability cannot be fully evaluated or effectively utilized, and it achieves the effects of shortening training time, fully utilizing computing resources, and improving convergence speed.

Active Publication Date: 2021-03-02
Applicant: XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

The prototype machine does not support deep neural network development frameworks such as caffe, pytorch, or tensorflow, nor does it provide an evaluation method for distributed parallel deep neural networks on the platform. As a result, the corresponding distributed parallel deep neural network evaluations cannot be carried out directly, and the powerful computing capability of the Tianhe-3 supercomputer cannot be fully evaluated or effectively utilized.

Examples

Embodiment

[0064] Taking the Tianhe-3 prototype as an example, the supercomputer-oriented distributed parallel deep neural network performance evaluation of the present invention is carried out. The Tianhe-3 prototype contains two different processor nodes, MT-2000+ and FT-2000+. In this embodiment, four groups of tasks are designed: MT-2000+ single-node multi-process parallel training, FT-2000+ single-node multi-process parallel training, MT-2000+ multi-node multi-process distributed training, and FT-2000+ multi-node multi-process distributed training. Together they comprehensively evaluate the parallel training performance of a single node on the Tianhe-3 prototype and its scalability in multi-node distributed training. To ensure the robustness of the data, every experimental result in this embodiment is the arithmetic mean of five runs (a sketch of such a timing harness follows the results below). The evaluation results are as follows:

[0065] 1. The performance of a single node is as follows:

[0066] In a single MT-2000+ node, the loss value of tra...
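The embodiment above reports every figure as the arithmetic mean of five timed runs. The following is a minimal sketch of such a timing harness, assuming a standard PyTorch training loop; the model, data set, and function names are illustrative placeholders, not the patent's actual code.

```python
import statistics
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def timed_epoch(model, loader, optimizer, criterion):
    """Run one training epoch and return its wall-clock time in seconds."""
    model.train()
    start = time.perf_counter()
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    return time.perf_counter() - start


def mean_epoch_time(model, loader, optimizer, criterion, repeats=5):
    """Arithmetic mean over `repeats` runs, as used for the results in [0064]."""
    return statistics.mean(
        timed_epoch(model, loader, optimizer, criterion) for _ in range(repeats)
    )


if __name__ == "__main__":
    # Toy stand-ins for the real network and training set.
    data = TensorDataset(torch.randn(512, 64), torch.randint(0, 10, (512,)))
    loader = DataLoader(data, batch_size=32, shuffle=True)
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    avg = mean_epoch_time(model, loader, optimizer, nn.CrossEntropyLoss())
    print(f"mean epoch time over 5 runs: {avg:.3f} s")
```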



Abstract

The invention discloses a distributed parallel deep neural network performance evaluation method for a supercomputer, belonging to the fields of high-performance computing and deep neural networks. The method comprises, in order, distributed parallel granularity design, deep neural network training framework platform configuration, distributed parallel communication protocol deployment, deep neural network optimization, training and test data set slicing, and test mode application, providing a universal test method for developers. The design of the distributed parallel granularity ensures a comprehensive test of both single nodes and multiple nodes; coupling the deep neural network training framework pytorch with the underlying distributed communication framework MPI ensures the reliability of application-layer deployment and low-level communication; and single-node and multi-node tests on different processors help to fully utilize supercomputer computing resources, improve program computing performance, shorten neural network training time, and increase the convergence rate of neural network training.
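As one possible concretization of the framework/communication coupling and data set slicing steps named above, the sketch below initializes torch.distributed over an MPI back end and shards a toy data set with DistributedSampler. It assumes a PyTorch build compiled with MPI support and launched via mpirun; the data set, model, and hyperparameters are illustrative placeholders, not the patent's own configuration.

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Initialize the process group over MPI; rank and world size are supplied by the
# MPI launcher (e.g. `mpirun -np 8 python evaluate.py`).
dist.init_process_group(backend="mpi")
rank = dist.get_rank()
world_size = dist.get_world_size()

# Toy data set standing in for the real training set; DistributedSampler slices it
# so that each process trains on a disjoint shard (the data set slicing step).
dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

# Wrapping the model in DistributedDataParallel couples pytorch to the underlying
# communication layer: gradients are averaged across all processes after backward().
model = DDP(torch.nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle the shards each epoch
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), targets)
        loss.backward()
        optimizer.step()

dist.destroy_process_group()
```

Launched with different process counts and host files, the same script covers both the single-node multi-process and the multi-node multi-process modes described in the embodiment; each rank trains on its own shard while gradients are exchanged over MPI.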

Description

Technical field
[0001] The invention belongs to the fields of high-performance computing and deep neural networks, and in particular relates to a supercomputer-oriented distributed parallel deep neural network performance evaluation method.
Background technique
[0002] The processors used by the Tianhe-3 prototype include the FT-2000+ (FTP) and the MT-2000+ (MTP). The FTP contains 64 FTC662 processor cores of the armv8 architecture, with a main frequency of 2.2-2.4 GHz; its 32 MB L2 cache can provide 204.8 GB/s of memory access bandwidth, and its typical working power consumption is about 100 W. The MTP processor contains a total of 128 armv8 cores organized into 4 super nodes, with a main frequency of up to 2.0 GHz and a whole-processor power consumption of 240 W. The prototype machine does not support deep neural network development frameworks such as caffe, pytorch, or tensorflow, nor does it provide evaluation methods for distributed parallel deep neural networks ...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N 3/04; G06N 3/08; G06F 11/34
CPC: G06N 3/08; G06F 11/3447; G06N 3/045; Y02D 10/00
Inventors: 张兴军, 魏嘉, 纪泽宇, 李靖波, 姬辰肇, 魏正, 岳莹莹, 高柏松
Owner: XI AN JIAOTONG UNIV