Proactive failure recovery model for distributed computing

A computing-node technology in the field of data error detection, error response, and computing redundancy. It addresses problems such as the limited usefulness and efficiency of existing failure recovery models, with the stated effects of saving workload and cost and optimizing processing.

Active Publication Date: 2017-05-31
SAUDI ARABIAN OIL CO

AI Technical Summary

Problems solved by technology

Periodic checkpointing incurs overhead associated with choosing the optimal checkpoint interval and a stable storage location for the checkpoint data.
Furthermore, current failure recovery models are usually limited to a few types of computational failure and are invoked manually when a failure occurs, which limits their usefulness and efficiency.

Embodiment Construction

[0027] The following detailed description is given to enable any person skilled in the art to make and use the disclosed subject matter, and is presented in the context of one or more specific implementations. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the scope of the present disclosure. Thus, the present disclosure is not intended to be limited to the described and/or illustrated embodiments but is to be accorded the widest scope consistent with the principles and features disclosed herein.

[0028] The present disclosure generally describes methods and systems for providing a proactive failure recovery model (FRM) for distributed computing to ensure business continuity optimization in the event of a computing node (e.g., a computer server) failure, including computer-implemented methods, computer program ...

Abstract

This disclosure generally describes methods and systems, including computer-implemented methods, computer-program products, and computer systems, for providing a proactive failure recovery model for distributed computing. One computer-implemented method includes: building a virtual tree-like computing structure of a plurality of computing nodes; for each computing node of the virtual tree-like computing structure, executing, by a hardware processor, a node failure prediction model to calculate a mean time between failures (MTBF) associated with the computing node; determining whether to perform a checkpoint of the computing node based on a comparison between the calculated MTBF and maximum and minimum thresholds; migrating a process from the computing node to a different computing node acting as a recovery node; and resuming execution of the process on the different computing node.
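
The per-node decision described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything specific in it is an assumption made for illustration: the `Node` fields, the `fanout` used to shape the virtual tree, the MTBF estimate (observed uptime divided by observed failure count), and the policy that maps the threshold comparison to an action (below the minimum: migrate; between minimum and maximum: checkpoint; above the maximum: no action). The source states only that the MTBF is compared against a maximum and a minimum threshold.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    """Hypothetical computing node; the fields are illustrative assumptions."""
    name: str
    uptime_hours: float              # observed operating time
    failures: int                    # observed number of failures
    children: List["Node"] = field(default_factory=list)


def build_tree(nodes: List[Node], fanout: int = 2) -> Node:
    """Arrange nodes into a virtual tree: node i hangs under node (i - 1) // fanout."""
    for i, node in enumerate(nodes[1:], start=1):
        nodes[(i - 1) // fanout].children.append(node)
    return nodes[0]


def mtbf(node: Node) -> float:
    """Assumed estimate: uptime / failures, or infinity if no failure was seen."""
    if node.failures == 0:
        return float("inf")
    return node.uptime_hours / node.failures


def decide(node: Node, min_mtbf: float, max_mtbf: float) -> str:
    """Assumed policy mapping the threshold comparison to an action."""
    value = mtbf(node)
    if value < min_mtbf:
        return "migrate"      # failure looks imminent: move work to a recovery node
    if value <= max_mtbf:
        return "checkpoint"   # moderately reliable: save state periodically
    return "none"             # reliable enough: skip the checkpoint overhead


def walk(root: Node) -> Iterator[Node]:
    """Depth-first, pre-order traversal of the virtual tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def plan(root: Node, min_mtbf: float, max_mtbf: float) -> Dict[str, str]:
    """Return the chosen action for every node in the tree."""
    return {n.name: decide(n, min_mtbf, max_mtbf) for n in walk(root)}
```

For example, with thresholds of 24 and 500 hours, a node with 1000 hours of uptime and 50 failures has an estimated MTBF of 20 hours and would be marked for migration under this assumed policy, while a node with 5 failures (MTBF 200 hours) would be checkpointed.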

Description

[0001] Priority claim

[0002] This application claims priority to US Patent Application No. 14/445,369, filed July 29, 2014, the entire contents of which are incorporated herein by reference.

Background

[0003] Critical/real-time scientific applications (e.g., seismic data processing, 3D reservoir uncertainty modeling and simulation) require high-end computing power, which can take days or weeks to process data to generate the desired solution. The success of longer job executions depends on the reliability of the system. Since most scientific applications deployed on supercomputers can fail if only one of the processes fails, fault tolerance in distributed systems is an important feature in complex computing environments. Tolerating computer processing failures of any type reactively generally involves the choice of whether to allow periodic checkpointing of the state of one or more processes - an effective technique that can be widely applied in high performance...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/14, G06F11/20
CPC: G06F11/1438, G06F11/1471, G06F11/203, G06F11/0757, G06F11/0721, G06F11/1461, G06F11/1407, G06F11/34
Inventor: 哈兰德·S·AL-瓦哈比
Owner: SAUDI ARABIAN OIL CO