Parallel runtime error detection method

A parallel runtime error detection method, applied in the computer field, that addresses the high runtime overhead of existing approaches and achieves low overhead, little interference with the running program, and good portability.

Active Publication Date: 2017-09-26
JIANGNAN INST OF COMPUTING TECH

AI Technical Summary

Problems solved by technology

[0008] The technical problem to be solved by the present invention is to provide, in view of the above-mentioned defects in the prior art, a method for detecting errors during parallel runtime. The method targets MPI communication deadlock detection and addresses two defects of the prior art: high runtime overhead, and the lack of interactive fault exploration and analysis during parallel runtime. It has almost no effect on the performance of parallel programs when it is not activated. It can be implemented as a runtime library on top of the open interface of the MPI library, so it has good portability and is completely transparent to user programming.




Detailed Description of the Embodiments

[0021] In order to make the content of the present invention clearer and easier to understand, the content of the present invention will be described in detail below in conjunction with specific embodiments and accompanying drawings.

[0022] Figure 1 schematically shows a flow chart of a method for detecting errors during parallel runtime according to a preferred embodiment of the present invention.

[0023] Specifically, the method for detecting errors during parallel runtime according to the preferred embodiment of the present invention is used, for example, to locate errors generated while an MPI parallel program is running.

[0024] As shown in Figure 1, the error detection method according to the preferred embodiment of the present invention includes:

[0025] The first step S1: performing steady-state detection;

[0026] Wherein, when performing steady-state detection, for example, in the function MPI_INIT of the MPI tool (used to initialize the MPI execu...


Abstract

The invention provides a method for detecting errors during parallel runtime, comprising: setting a first counter and a second counter, each with an initial value of 0; when a process enters an MPI blocking operation, incrementing the first counter by one and starting a timer; when the process returns from the blocking operation, assigning the value of the first counter to the second counter and canceling the timer; and, if the process is blocked in an MPI call, triggering a soft interrupt signal when the timer expires, thereby entering an interrupt handler. The interrupt handler compares the current values of the first counter and the second counter. If they are not equal, a state dump is performed, followed by deadlock detection; if they are equal, the handler returns and the parallel program continues to execute.
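The counter comparison described in the abstract can be illustrated with a small single-process model. This is only a sketch under stated assumptions: the class `BlockingWatchdog`, its method names, and the `demo` function are hypothetical and do not come from the patent; the real timer and soft interrupt are replaced by an explicit call to `on_timer_expired`, and no MPI library is involved.

```python
class BlockingWatchdog:
    """Hypothetical single-process model of the two-counter scheme."""

    def __init__(self):
        self.first_counter = 0    # incremented on entry to a blocking operation
        self.second_counter = 0   # set equal to first_counter on return
        self.timer_armed = False  # stands in for the real timer
        self.actions = []         # actions the handler decided to take

    def enter_blocking(self):
        # Entering a blocking operation: add one to the first counter, start the timer.
        self.first_counter += 1
        self.timer_armed = True

    def exit_blocking(self):
        # Returning from the blocking operation: copy the counter, cancel the timer.
        self.second_counter = self.first_counter
        self.timer_armed = False

    def on_timer_expired(self):
        # Stands in for the interrupt handler entered when the timer expires.
        if self.first_counter != self.second_counter:
            # Still inside a blocking operation: dump state, then detect deadlock.
            self.actions.append("state dump")
            self.actions.append("deadlock detection")
            return "blocked"
        # Counters are equal: the operation already returned, so just resume.
        return "resume"


def demo():
    w = BlockingWatchdog()

    # Case 1: the blocking operation returns, and the handler runs afterwards.
    w.enter_blocking()
    w.exit_blocking()
    completed = w.on_timer_expired()

    # Case 2: the blocking operation never returns before the timer expires.
    w.enter_blocking()
    stuck = w.on_timer_expired()

    return completed, stuck, w.actions
```

In this model the handler only distinguishes "still inside a blocking operation" from "already returned"; what the state dump and the deadlock detection actually do is not modeled here.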

Description

Technical field

[0001] The invention relates to the technical field of computers, in particular to a method for detecting errors during parallel runtime.

Background

[0002] In the field of HPC (High Performance Computing), MPI (Message Passing Interface) is currently the most widely used and flexible parallel programming model and the mainstream message passing standard, with multiple open source implementations. Because MPI does not parallelize programs automatically, programmers generally develop parallel programs by gradually increasing the parallel scale from small to large.

[0003] When a program runs in parallel at a small scale, errors during parallel runtime can be detected and located by adding print statements to the program or by using a debugging tool. However, when faced with hundreds of point...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F11/36, G06F9/38
Inventor: 刘勇, 彭超, 陈华蓉, 王敬宇, 冯赟龙, 王雯霞
Owner: JIANGNAN INST OF COMPUTING TECH