System and method for recording and recovering checkpoints in a distributed environment

A checkpoint technology for distributed environments, applied in the field of computer science, that addresses the problem of being unable to find a consistent state and reduces inter-process communication.

Active Publication Date: 2015-04-15
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

Because each process records its checkpoints on its own at arbitrary times, a consistent state may not be found.



Embodiment Construction

[0043] The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments.

[0044] 1. System initialization process

[0045] The present invention consists of three main modules: a monitoring module, a checkpoint recording module, and a checkpoint recovery module. The monitoring module monitors whether the process is running correctly; when operation is abnormal, it shuts down the recording module and starts the recovery module. The checkpoint recording module records checkpoints in units of messages. The checkpoint recovery module restores checkpoints according to certain rules. The monitoring module is connected to both the checkpoint recording module and the checkpoint recovery module, and the monitoring module, the checkpoint recording module and the checkpoint recovery module respectively monitor, record and recover the process in re...
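The division of responsibilities among the three modules can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the class and method names (Checkpoint, CheckpointRecorder, CheckpointRecovery, Monitor, check) are hypothetical, and because the text does not state the recovery rules, the sketch assumes the simplest possible rule of restoring the most recent checkpoint.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Checkpoint:
    message_id: int        # message whose handling produced this checkpoint
    state: Dict[str, Any]  # snapshot of the process state at that point


@dataclass
class CheckpointRecorder:
    """Hypothetical recording module: one checkpoint per message."""
    enabled: bool = True
    log: List[Checkpoint] = field(default_factory=list)

    def record(self, message_id: int, state: Dict[str, Any]) -> None:
        if self.enabled:
            # Shallow copy so later changes to top-level keys
            # do not alter the stored snapshot.
            self.log.append(Checkpoint(message_id, dict(state)))


class CheckpointRecovery:
    """Hypothetical recovery module.

    ASSUMPTION: the source only says "according to certain rules";
    returning the latest checkpoint is a placeholder rule.
    """

    def recover(self, log: List[Checkpoint]) -> Optional[Checkpoint]:
        return log[-1] if log else None


@dataclass
class Monitor:
    """Hypothetical monitoring module, connected to both other modules."""
    recorder: CheckpointRecorder
    recovery: CheckpointRecovery

    def check(self, process_healthy: bool) -> Optional[Checkpoint]:
        if process_healthy:
            return None
        # Abnormal operation: shut down recording, then start recovery.
        self.recorder.enabled = False
        return self.recovery.recover(self.recorder.log)


# Usage: handle three messages, then simulate a failure.
recorder = CheckpointRecorder()
monitor = Monitor(recorder, CheckpointRecovery())
state = {"counter": 0}
for msg_id in range(3):
    state["counter"] += 1
    recorder.record(msg_id, state)
restored = monitor.check(process_healthy=False)
```

In this sketch the monitor is the only component that switches between recording and recovery, which mirrors the description of the monitoring module being connected to both of the other modules.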


Abstract

The invention discloses a system and a method for recording and recovering checkpoints in a distributed environment. The system comprises a monitoring module, a checkpoint recording module and a checkpoint recovery module. The monitoring module monitors whether a process is running correctly; when operation is abnormal, it shuts down the recording module and starts the recovery module. The checkpoint recording module records checkpoints in units of messages. The checkpoint recovery module recovers checkpoints according to certain rules. The monitoring module is connected to both the checkpoint recording module and the checkpoint recovery module, and the three modules respectively monitor, record and recover the running information of a parent process. The method comprises recording and recovering checkpoints. The invention solves the problem that a consistent set of checkpoints may never be found among distributed checkpoints, requires little communication between processes, and retains the non-blocking property of distributed checkpointing.

Description

Technical field

[0001] The invention belongs to the field of computer science and relates to the reliability of computer clusters. More specifically, it is an asynchronous checkpointing protocol suitable for distributed environments that can be used for error recovery in computer clusters.

Background art

[0002] Computing tasks are becoming increasingly complex and require ever longer computation times. At the same time, high-performance computing systems contain a growing number of failure-prone components. As a result, long-running distributed computations are increasingly interrupted by frequent hardware errors. In distributed computing, when a process fails, the cost is not only the loss of all of that process's computation but also the loss of the computation of the processes communicating with it. In order to ensure that distributed applications can be used more effectively in large-scale environments, it is imperative t...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F11/07
Inventor: 马建峰, 孟园, 李金库, 姚青松, 马卓
Owner: XIDIAN UNIV