
Method for pausing and resuming the running of MPI (Message Passing Interface) parallel applications

A technology for the continuous operation of application programs, applied in the computer field. It addresses the difficulty of implementation, the limited support offered by communication protocols, and processes exiting on communication timeout, and makes parallel applications convenient to control and schedule.

Active Publication Date: 2015-04-29
徐州君迈网络科技有限公司 (Xuzhou Junmai Network Technology Co., Ltd.)
Cites: 5; Cited by: 1

AI Technical Summary

Problems solved by technology

This approach has already been tried in preliminary form both in China and abroad. The main difficulty is that ordinary communication protocols (such as TCP/IP) provide very limited support for it, which makes it very difficult to implement.
The second approach to suspending parallel programs requires support from the communication protocol, so that enough information about the communication state of each MPI process can be obtained. This makes it easier to save the information of the parallel application and to keep the communication state between processes consistent. Otherwise, when the parallel application is suspended in the system's default way, processes that are in the middle of communicating will exit on timeout, and eventually the entire parallel application will crash.
[0006] In the traditional TCP/IP protocol, the maximum time spent retransmitting data after transmission errors is about 9 minutes, and current TCP implementations do not allow this limit to be changed (some commercial versions of Solaris do let the system administrator change it). Therefore, if a parallel application is suspended in the ordinary way in an Ethernet environment, some of its process communications will eventually time out and exit, causing the entire parallel application to crash.
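A figure of several minutes follows from exponential backoff of the TCP retransmission timeout (RTO): each unanswered retransmission doubles the wait, up to a cap, until the sender gives up. The sketch below assumes, for illustration only, an initial RTO of 1.5 s, a 64 s cap, and 13 timeout intervals before the connection is aborted; real stacks use different and often tunable values, and `total_retransmit_time()` is a hypothetical helper.

```c
#include <assert.h>

/* Total time (seconds) a sender keeps retransmitting before giving up,
   assuming the RTO starts at initial_rto, doubles after every timeout
   (exponential backoff), and never exceeds max_rto.
   `intervals` is the number of timeout intervals waited before abort. */
double total_retransmit_time(double initial_rto, double max_rto, int intervals) {
    double rto = initial_rto;
    double total = 0.0;
    for (int i = 0; i < intervals; i++) {
        total += rto;          /* wait out the current timeout */
        rto *= 2.0;            /* back off for the next attempt */
        if (rto > max_rto) {
            rto = max_rto;     /* clamp to the configured ceiling */
        }
    }
    return total;
}
```

With those assumed values, `total_retransmit_time(1.5, 64.0, 13)` returns 542.5 seconds, roughly 9 minutes, which matches the order of magnitude quoted above. A job paused for longer than such a window risks having its peers' connections aborted.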




Embodiment Construction

[0031] To make the technical means, inventive features, objectives, and effects of the present invention easy to understand, the invention is further described below with reference to the accompanying drawings and specific embodiments.

[0032] Figure 1 shows the processing flow of this method for pausing or resuming the running of an MPI parallel application. The main steps are as follows:

[0033] Step 1. Modify the TCP implementation in the Linux operating system by adding a control interface function, tcp_ioctl_MPI(), which queries the detailed state of communication between MPI processes. This function is then used to control inter-process communication and to solve the problem of synchronizing communication among the processes.
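The text does not give the signature of tcp_ioctl_MPI(), which lives inside a modified kernel TCP stack. As a rough user-space analogue only, stock Linux already exposes some per-connection state: `getsockopt()` with `TCP_INFO`, and the `FIONREAD`/`TIOCOUTQ` ioctls (the values behind `SIOCINQ`/`SIOCOUTQ` on TCP sockets). The sketch below, whose `conn_snapshot`, `query_conn()`, and `conn_is_quiescent()` names are hypothetical, snapshots one connection and treats it as quiescent when nothing is queued, unacknowledged, or being retransmitted, which is the kind of condition a pause coordinator would want before stopping a process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical snapshot of the state a pause coordinator would inspect. */
struct conn_snapshot {
    unsigned state;        /* TCP state, e.g. TCP_ESTABLISHED */
    unsigned unacked;      /* segments sent but not yet acknowledged */
    unsigned retransmits;  /* retransmissions currently outstanding */
    int unread_bytes;      /* received but not yet read (FIONREAD == SIOCINQ) */
    int unsent_bytes;      /* send queue not yet acked (TIOCOUTQ == SIOCOUTQ) */
};

/* Fill *out for TCP socket fd. Returns 0 on success, -1 on error (errno set). */
int query_conn(int fd, struct conn_snapshot *out) {
    struct tcp_info info;
    socklen_t len = sizeof info;
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0) {
        return -1;
    }
    out->state = info.tcpi_state;
    out->unacked = info.tcpi_unacked;
    out->retransmits = info.tcpi_retransmits;
    if (ioctl(fd, FIONREAD, &out->unread_bytes) != 0) {
        return -1;
    }
    if (ioctl(fd, TIOCOUTQ, &out->unsent_bytes) != 0) {
        return -1;
    }
    return 0;
}

/* A connection is "quiescent" here if no data is in flight in either direction. */
bool conn_is_quiescent(const struct conn_snapshot *s) {
    return s->unacked == 0 && s->retransmits == 0 &&
           s->unread_bytes == 0 && s->unsent_bytes == 0;
}
```

A coordinator would run such a check on every MPI connection of every rank. The patent instead performs the query inside the kernel, where, per Step 1, the same interface is also used to control and synchronize inter-process communication.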

[0034] Step 2. Modify the signal mechanism in the Linux operating system by changing the interface function catch_tstp(), which handles the pause ...
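Step 2 modifies catch_tstp() inside the Linux signal mechanism, and the description of its new behaviour is truncated here. The user-space sketch below only illustrates the general idea of intercepting the pause signal (`SIGTSTP`) instead of letting the default action stop the process at an arbitrary point: the handler merely records the request, which keeps it async-signal-safe, and the application later reaches a safe point, runs a hypothetical quiesce callback, and stops itself with `SIGSTOP`. `install_pause_handler()`, `take_pause_request()`, and `coordinated_stop()` are illustrative names, not the patent's interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the handler, consumed at a safe point in the main loop. */
static volatile sig_atomic_t pause_requested = 0;

/* Async-signal-safe: only records that a pause was requested. */
static void on_tstp(int signo) {
    (void)signo;
    pause_requested = 1;
}

/* Replace the default SIGTSTP action (immediate stop) with on_tstp. */
int install_pause_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tstp;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted syscalls such as read() */
    return sigaction(SIGTSTP, &sa, NULL);
}

/* Atomically test-and-clear the request; SIGTSTP is blocked meanwhile
   so a signal arriving between the read and the clear is not lost. */
bool take_pause_request(void) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGTSTP);
    sigprocmask(SIG_BLOCK, &block, &old);
    bool requested = pause_requested != 0;
    pause_requested = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return requested;
}

/* At a safe point: drain/flag communication (hypothetical callback), then stop.
   raise(SIGSTOP) returns only after the process receives SIGCONT. */
void coordinated_stop(void (*quiesce)(void)) {
    if (quiesce != NULL) {
        quiesce();
    }
    raise(SIGSTOP);
}
```

`SIGSTOP` cannot be caught or ignored, so it is used for the final stop; execution resumes after `coordinated_stop()` once the process is sent `SIGCONT`.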



Abstract

The invention discloses a method for pausing and resuming the running of MPI (Message Passing Interface) parallel applications. Some low-priority parallel computing tasks must be paused in advance so that more computing resources can be given to new, urgent parallel computing tasks. Specifically, when a pause or resume signal is received while an MPI parallel application is running, an improved MPI library function, an improved Linux signal mechanism, and an improved TCP (Transmission Control Protocol) implementation are used together to pause or resume all processes of the MPI parallel application in a coordinated way. Because the MPI library function, the Linux signal mechanism, and TCP are modified in the layers beneath the MPI parallel application, the method is transparent to the application running above them, which makes the running of MPI parallel applications much easier to control and schedule.
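The abstract describes pausing or resuming all processes of the application in a coordinated way. A minimal model of that coordination, assuming the coordinator can see a per-rank count of in-flight messages (a hypothetical field standing in for the state the modified TCP layer reports), is an all-or-nothing step: pause only when every rank is quiescent, otherwise pause none and retry later. This is a simplification for illustration, not the patent's protocol; `rank_status`, `coordinated_pause()`, and `coordinated_resume()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum proc_state { RUNNING, PAUSED };

/* Hypothetical per-rank view held by a pause coordinator. */
struct rank_status {
    enum proc_state state;
    int in_flight;   /* messages sent or being received but not yet complete */
};

/* Phase 1: verify every rank is running and has nothing in flight.
   Phase 2: only then mark all ranks paused. If any rank is busy,
   no rank is touched and the caller retries later. */
bool coordinated_pause(struct rank_status *ranks, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (ranks[i].state != RUNNING || ranks[i].in_flight != 0) {
            return false;
        }
    }
    for (size_t i = 0; i < n; i++) {
        ranks[i].state = PAUSED;
    }
    return true;
}

/* Resume is also all-or-nothing: only a fully paused group is resumed. */
bool coordinated_resume(struct rank_status *ranks, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (ranks[i].state != PAUSED) {
            return false;
        }
    }
    for (size_t i = 0; i < n; i++) {
        ranks[i].state = RUNNING;
    }
    return true;
}
```

The all-or-nothing rule mirrors the concern in the background section: pausing only some ranks while others still exchange data is exactly what leads to communication timeouts.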

Description

Technical Field
[0001] The invention belongs to the technical field of computers and relates to a method for controlling the running of parallel application programs, in particular to a method for suspending and resuming the running of MPI (Message Passing Interface) parallel application programs.
Background Art
[0002] In a parallel computer system, urgent MPI parallel computing tasks may need to be inserted at any time. To do so, some low-priority parallel computing tasks must be suspended so that more computing resources can be freed for the new urgent tasks. At present, most parallel applications do not provide a suspend/pause function, so this must be done from outside the parallel application: the running state and communication state of the application's processes on every computing node must be saved synchronously. One of the difficulties is how to save the comm...

Claims


Application Information

IPC(8): G06F9/48; G06F9/54
Inventors: 曾小荟, 罗文浪, 龙满生, 李金忠, 卜登立, 吕敬祥
Owner: 徐州君迈网络科技有限公司 (Xuzhou Junmai Network Technology Co., Ltd.)