
Active Failure Recovery Model for Distributed Computing

A technology relating to computers and computing nodes, applied in the fields of data error detection, response to computing errors, and computing redundancy. It addresses problems such as limited usefulness and efficiency, and achieves workload savings, optimized processing, and savings in energy and time.

Active Publication Date: 2021-01-05
SAUDI ARABIAN OIL CO
Cites: 4 | Cited by: 0

AI Technical Summary

Problems solved by technology

However, checkpointing incurs overhead associated with choosing the optimal checkpoint interval and a stable storage location for the checkpoint data.
Furthermore, current failure recovery models are usually limited to a few types of computational failures and must be invoked manually when a computational failure occurs, which limits their usefulness and efficiency.
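The checkpoint-interval overhead mentioned above can be made concrete with Young's first-order approximation, a well-known textbook formula that is not taken from this patent: the optimal interval between checkpoints is roughly the square root of twice the checkpoint cost times the MTBF. The sketch below assumes exponentially distributed failures; the function name and the example numbers are hypothetical.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval.

    checkpoint_cost_s: time (seconds) to write one checkpoint to stable storage.
    mtbf_s: mean time between failures (seconds) of the system running the job.
    Assumes exponentially distributed failures (illustrative, not from the patent).
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical numbers: 5-minute checkpoint writes, 24-hour system MTBF.
    interval = young_checkpoint_interval(300.0, 24 * 3600.0)
    print(f"checkpoint every {interval / 3600:.2f} h")
```

With a 5-minute checkpoint cost and a 24-hour MTBF this gives a 2-hour interval (the script prints `checkpoint every 2.00 h`). Because the interval depends directly on the MTBF, an MTBF estimate is useful when deciding how often, or whether, to checkpoint.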




Embodiment Construction

[0027] The following detailed description is given to enable any person skilled in the art to make and use the disclosed subject matter, and is presented in the context of one or more specific implementations. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the scope of the present disclosure. Thus, the present disclosure is not intended to be limited to the described and/or illustrated embodiments but is to be accorded the widest scope consistent with the principles and features disclosed herein.

[0028] The present disclosure generally describes methods and systems for providing a proactive failure recovery model (FRM) for distributed computing to optimize business continuity in the event of a computing node (e.g., computer server) failure. Computer-implemented methods, computer program ...



Abstract

The present disclosure generally describes methods and systems, including computer-implemented methods, computer program products, and computer systems, for providing an active failure recovery model for distributed computing. One computer-implemented method includes: constructing a virtual tree-like computing structure of a plurality of computing nodes; for each computing node of the virtual tree-like computing structure, executing, by a hardware processor, a node failure prediction model to calculate a mean time between failures (MTBF) associated with the computing node; determining whether to checkpoint the computing node based on a comparison of the calculated MTBF with maximum and minimum thresholds; migrating a process from the computing node to a different computing node that serves as a recovery node; and continuing execution of the process on the different computing node.
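The steps in the abstract can be sketched as a single recovery pass over the nodes. This is a minimal Python illustration under stated assumptions, not the patented implementation: the abstract does not say how the tree is laid out, how the failure prediction model computes the MTBF, what each threshold triggers, or how the recovery node is chosen. Here the tree is assumed to be a k-ary layout in list order, the MTBF is simply the mean gap between a node's recorded failures, an MTBF below the minimum threshold triggers checkpoint-plus-migration, an MTBF between the thresholds triggers a checkpoint only, and the recovery node is the healthiest other node. All names (`Node`, `build_virtual_tree`, `predict_mtbf`, `decide`, `run_recovery_pass`) are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, Iterator, List


@dataclass
class Node:
    name: str
    failure_times_h: List[float]  # hypothetical input: timestamps (hours) of past failures
    children: List["Node"] = field(default_factory=list)
    processes: List[str] = field(default_factory=list)


def build_virtual_tree(nodes: List[Node], fanout: int = 2) -> Node:
    """Arrange nodes into a k-ary tree in list order (assumed layout, root = nodes[0])."""
    for i, node in enumerate(nodes):
        start = fanout * i + 1
        node.children = nodes[start:start + fanout]
    return nodes[0]


def iter_tree(root: Node) -> Iterator[Node]:
    """Pre-order traversal of the virtual tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def predict_mtbf(node: Node) -> float:
    """Stand-in for the node failure prediction model: mean gap between past failures."""
    times = sorted(node.failure_times_h)
    if len(times) < 2:
        return float("inf")  # too little history: treat the node as healthy
    return mean(b - a for a, b in zip(times, times[1:]))


def decide(mtbf_h: float, min_h: float, max_h: float) -> str:
    """Assumed threshold policy (not confirmed by the abstract)."""
    if mtbf_h < min_h:
        return "migrate"      # label: failure looks imminent, checkpoint then move processes
    if mtbf_h <= max_h:
        return "checkpoint"   # label: moderate risk, save process state only
    return "none"             # label: low risk, skip checkpoint overhead


def run_recovery_pass(root: Node, min_h: float, max_h: float) -> Dict[str, str]:
    """Visit every node, record its action, and move processes off at-risk nodes.

    Writing and restoring checkpoint state is not modelled; only the decision is recorded.
    """
    nodes = list(iter_tree(root))
    mtbf = {n.name: predict_mtbf(n) for n in nodes}
    actions: Dict[str, str] = {}
    for node in nodes:
        action = decide(mtbf[node.name], min_h, max_h)
        actions[node.name] = action
        if action != "migrate" or not node.processes:
            continue
        # Hypothetical recovery-node choice: healthiest other node at or above the minimum.
        candidates = [n for n in nodes if n is not node and mtbf[n.name] >= min_h]
        if candidates:
            recovery = max(candidates, key=lambda n: mtbf[n.name])
            recovery.processes.extend(node.processes)  # execution continues on the recovery node
            node.processes.clear()
    return actions


if __name__ == "__main__":
    cluster = [
        Node("n0", [0.0, 500.0, 1000.0]),                          # MTBF 500 h
        Node("n1", [0.0, 10.0, 20.0], processes=["seismic-job"]),  # MTBF 10 h
        Node("n2", [0.0, 100.0, 200.0]),                           # MTBF 100 h
    ]
    root = build_virtual_tree(cluster)
    print(run_recovery_pass(root, min_h=50.0, max_h=300.0))
    print(cluster[0].processes)
```

In the example, `n1` (10 h MTBF) falls below the 50 h minimum, so its process moves to `n0`, the node with the highest estimated MTBF; `n2` is only marked for checkpointing, and `n0` is left alone. The script prints `{'n0': 'none', 'n1': 'migrate', 'n2': 'checkpoint'}` followed by `['seismic-job']`.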

Description

[0001] Priority claim

[0002] This application claims priority to U.S. Patent Application No. 14/445,369, filed July 29, 2014, the entire contents of which are incorporated herein by reference.

Background

[0003] Executing critical/real-time scientific applications (e.g., seismic data processing, and 3D reservoir uncertainty modeling and simulation) requires high-end computing power, and processing the data to generate the desired solution can take days or weeks. The success of such long-running jobs depends on the reliability of the system. Because most scientific applications deployed on supercomputers can fail if even one of their processes fails, fault tolerance is an important feature of distributed systems in complex computing environments. Reactively tolerating computer processing failures of any type generally involves deciding whether to allow periodic checkpointing of the state of one or more processes, an effective technique that can be widely applied in high per...
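The observation that a job can fail if only one of its processes fails can be quantified with a standard series-system reliability model, an assumption added here for illustration rather than something stated in the patent: if node failures are independent and exponentially distributed, the job's MTBF is the reciprocal of the sum of the nodes' failure rates. The function names and figures below are hypothetical.

```python
import math
from typing import Sequence


def system_mtbf(node_mtbfs_h: Sequence[float]) -> float:
    """MTBF of a job that fails when any node fails (series system, exponential failures)."""
    return 1.0 / sum(1.0 / m for m in node_mtbfs_h)


def success_probability(job_hours: float, node_mtbfs_h: Sequence[float]) -> float:
    """Probability that the job runs for job_hours with no node failure."""
    return math.exp(-job_hours / system_mtbf(node_mtbfs_h))


if __name__ == "__main__":
    nodes = [50_000.0] * 1000  # hypothetical: 1,000 nodes, each with a 50,000 h MTBF
    print(system_mtbf(nodes))  # about 50 h for the whole job
    print(round(success_probability(7 * 24, nodes), 3))  # one-week run
```

With 1,000 nodes at 50,000 h each, the job as a whole sees a failure roughly every 50 h, so a week-long (168 h) run completes without any failure only about 3.5% of the time, which is why long-running jobs rely on checkpointing or proactive recovery.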


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F11/14; G06F11/20
CPC: G06F11/1438; G06F11/1471; G06F11/203; G06F11/0757; G06F11/0721; G06F11/1461; G06F11/1407; G06F11/34
Inventor: 哈兰德·S·AL-瓦哈比
Owner: SAUDI ARABIAN OIL CO