A system and method for checkpoint recording and recovery in a distributed environment

A checkpoint technology for distributed environments, applied in the field of computer science, that addresses the inability to find a consistent state and requires less communication between processes.

Active Publication Date: 2017-08-25
XIDIAN UNIV
2 cites, 0 cited by

AI Technical Summary

Problems solved by technology

Because each process records its checkpoints on its own at arbitrary times, a consistent global state may not be found among the recorded checkpoints.
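The consistency problem named above can be illustrated with a small sketch. It assumes the standard definition of a consistent global state: no message is recorded as received by one process's checkpoint while the sender's checkpoint has not yet recorded sending it. The names `Message`, `is_consistent`, and the per-process event counters are hypothetical and are not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    send_seq: int  # sender's local event count when the message was sent
    recv_seq: int  # receiver's local event count when it was received


def is_consistent(cut: dict[str, int], messages: list[Message]) -> bool:
    """Return True if no message is received inside the cut but sent outside it.

    `cut` maps each process name to the local event count at its checkpoint.
    """
    for m in messages:
        received_before = m.recv_seq <= cut[m.receiver]
        sent_before = m.send_seq <= cut[m.sender]
        if received_before and not sent_before:
            return False  # orphan message: receipt is recorded, sending is not
    return True


# P1 sends at its 5th event; P2 receives at its 3rd event.
msgs = [Message(sender="P1", receiver="P2", send_seq=5, recv_seq=3)]

# P1 checkpointed too early (event 4), P2 after the receipt (event 3).
bad_cut = {"P1": 4, "P2": 3}
# Both checkpoints cover the send and the receive.
good_cut = {"P1": 5, "P2": 3}
```

With independently chosen checkpoints, every combination a recovery procedure tries can turn out like `bad_cut`, which is the situation the summary describes as a consistent state that cannot be found.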



Examples


Embodiment Construction

[0043] The present invention will be described in further detail below in conjunction with the accompanying drawings and embodiments.

[0044] 1. System initialization process

[0045] The present invention consists mainly of three modules: a monitoring module, a checkpoint recording module, and a checkpoint recovery module. The monitoring module monitors whether the process is running correctly; when operation is abnormal, it shuts down the recording module and starts the recovery module. The checkpoint recording module records checkpoints in units of messages. The checkpoint recovery module restores checkpoints according to certain rules. The monitoring module is connected to both the checkpoint recording module and the checkpoint recovery module, and the monitoring module, the checkpoint recording module and the checkpoint recovery module respectively monitor, record and recover the process in re...
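The interaction of the three modules can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and method names are hypothetical, the health signal is passed in as a plain boolean, and the recovery rule used here (restore the most recent checkpoint) is only a placeholder, since the text says no more than "certain rules".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CheckpointRecorder:
    """Records one checkpoint per message while enabled."""

    enabled: bool = True
    checkpoints: list[tuple[int, dict]] = field(default_factory=list)

    def on_message(self, msg_id: int, state: dict) -> None:
        if self.enabled:
            # Store a copy so later changes to `state` do not alter the checkpoint.
            self.checkpoints.append((msg_id, dict(state)))


class CheckpointRecovery:
    """Restores a saved state. Placeholder rule: take the latest checkpoint."""

    def restore(self, checkpoints: list[tuple[int, dict]]) -> dict | None:
        if not checkpoints:
            return None
        return dict(checkpoints[-1][1])


@dataclass
class Monitor:
    """Connected to both other modules; switches from recording to recovery."""

    recorder: CheckpointRecorder
    recovery: CheckpointRecovery

    def check(self, process_healthy: bool) -> dict | None:
        if process_healthy:
            return None
        self.recorder.enabled = False  # close the recording module
        return self.recovery.restore(self.recorder.checkpoints)  # start recovery


recorder = CheckpointRecorder()
monitor = Monitor(recorder, CheckpointRecovery())
recorder.on_message(1, {"x": 1})
recorder.on_message(2, {"x": 2})
restored = monitor.check(process_healthy=False)
```

Keeping the switch in the monitor means the recorder and the recovery module never need to know about each other, which matches the description of the monitor being connected to both.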


Abstract

The invention discloses a system and method for checkpoint recording and recovery in a distributed environment. The system includes three modules: a monitoring module, a checkpoint recording module and a checkpoint recovery module. The monitoring module monitors whether the process is running correctly and, when operation is abnormal, shuts down the recording module and starts the recovery module; the checkpoint recording module records checkpoints in units of messages; the checkpoint recovery module restores checkpoints according to certain rules. The monitoring module is connected to both the checkpoint recording module and the checkpoint recovery module, and the three modules respectively monitor, record and restore the parent process's running information. The method includes checkpoint recording and checkpoint recovery. The invention solves the problem that distributed checkpointing may ultimately fail to find a consistent checkpoint, keeps the communication volume between processes very small, and has the advantage that distributed checkpointing is non-blocking.

Description

Technical field

[0001] The invention belongs to the field of computer science and relates to the reliability of computer clusters. More specifically, it is an asynchronous checkpoint protocol suitable for distributed environments that can be used for error recovery in computer clusters.

Background

[0002] Computing tasks are becoming increasingly complex and require ever longer computation times. At the same time, high-performance computing systems include a growing number of failure-prone components. As a result, long-running distributed computations are increasingly interrupted by frequent hardware errors. In distributed computing, when a process fails, the cost is not only the loss of all of that process's computation but also the loss of the computation of the processes communicating with it. In order to ensure that distributed applications can be used more effectively in large-scale environments, it is imperative t...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F11/07
Inventor: 马建峰, 孟园, 李金库, 姚青松, 马卓
Owner: XIDIAN UNIV