Failure detection and fault tolerance method and failure detection and fault tolerance system for real-time cloud platform

A fault detection and fault tolerance system technology, applied in transmission systems, digital transmission systems, electrical components, etc. It addresses problems such as loss of state information, functional failure, and the inability to completely eliminate dependence on the master node, with the effect of ensuring uninterrupted operation and achieving fault tolerance.

Active Publication Date: 2014-04-09
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0005] 1. State loss on node failure cannot be completely avoided. State information, configuration information, business program files, etc. are usually saved on the node itself, so once the node fails, the state information is lost.
[0006] 2. Dependence on the master node cannot be completely eliminated. For example, in Twitter Storm, although the working nodes can still run when the master node fails, most functions, such as task submission and fault detection, stop working.
[0007] 3. A comprehensive, overall fault detection and fault tolerance mechanism that can detect and repair both program-level and node-level faults in time is lacking.
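Program-level fault handling of the kind item 3 calls for can be illustrated with a daemon that guards a work process and restarts it when it exits abnormally. The following is only a minimal sketch under assumptions: the function name `guard`, the `max_restarts` limit, and the always-crashing worker command are hypothetical and are not taken from the patent text.

```python
import subprocess
import sys


def guard(cmd, max_restarts=3):
    """Run `cmd`; restart it whenever it exits with a non-zero status.

    Returns the number of restarts performed. The restart limit is an
    assumption for illustration, not something the source specifies.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1


# Hypothetical worker that always crashes, used to exercise the restart path.
crashing_worker = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(guard(crashing_worker, max_restarts=2))
```

A real daemon would probably also report each restart to the shared state store so that node-level monitoring can see it; that step is omitted here.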

Embodiment Construction

[0057] The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

[0058] The present invention can be applied to platforms with distributed structures, such as real-time cloud platforms and stream computing platforms, to provide their fault detection and fault tolerance functions.

[0059] As shown in Figure 1, a fault detection and fault tolerance system for a real-time cloud platform includes a client 1, a global state monitoring module 2, a global state storage module 3, and several working nodes 4.

[0060] The client 1 is used to send commands to the global state monitoring module 2, submit tasks, assign tasks to each working node, and store the tasks assigned to each working node in the global state storage module 3;
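The client's role of assigning tasks and recording each working node's assignment in the global state storage module can be sketched roughly as follows. This is an illustration under stated assumptions: `GlobalStateStore` is a hypothetical in-memory stand-in for the storage module, the `/assignments/<node>` path layout is invented for the example, and round-robin assignment is an assumed policy that the text does not specify.

```python
from collections import defaultdict
from itertools import cycle


class GlobalStateStore:
    """Hypothetical in-memory stand-in for the global state storage module."""

    def __init__(self):
        self._data = {}

    def put(self, path, value):
        self._data[path] = value

    def get(self, path, default=None):
        return self._data.get(path, default)


def submit_tasks(store, tasks, nodes):
    """Assign tasks round-robin and record each node's tasks under its path."""
    assignments = defaultdict(list)
    for task, node in zip(tasks, cycle(nodes)):
        assignments[node].append(task)
    for node in nodes:
        # The "/assignments/<node>" layout is assumed for illustration only.
        store.put(f"/assignments/{node}", list(assignments[node]))
    return dict(assignments)


store = GlobalStateStore()
submit_tasks(store, ["t1", "t2", "t3"], ["node-a", "node-b"])
print(store.get("/assignments/node-a"))  # ['t1', 't3']
```

Because assignments live in the store rather than on any node, a replacement node can look up what it should run without asking the failed node.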

[006...


Abstract

The invention relates to a failure detection and fault tolerance method and a failure detection and fault tolerance system for a real-time cloud platform. The system comprises: a client used for sending commands, submitting tasks, and storing the tasks assigned to working nodes in corresponding paths; a global state monitoring module used for monitoring the operating state of the working nodes, carrying out node-level failure detection and fault tolerance according to heartbeat information uploaded by the working nodes, and migrating the tasks of a failed node; a global state storage module used for storing the working state and heartbeat information of the global state monitoring module and the working nodes; and working nodes used for performing tasks, running a daemon process to guard the work process, and performing program-level failure detection and fault tolerance. The state information of the whole cluster is stored in a ZooKeeper system, so the nodes have a stateless architecture and a node failure does not cause state loss. The system has a complete failure detection and fault tolerance mechanism, realizes multilevel fault tolerance, and guarantees uninterrupted operation of real-time services.
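The node-level mechanism summarized above, detecting failed nodes from heartbeat information and migrating their tasks, can be sketched as follows. This is a minimal illustration, not the patented method: plain dictionaries stand in for the state kept in ZooKeeper, the timeout value is arbitrary, and the least-loaded migration policy and the function names are assumptions.

```python
def find_failed_nodes(heartbeats, now, timeout):
    """Return nodes whose last heartbeat is older than `timeout` seconds."""
    return sorted(n for n, ts in heartbeats.items() if now - ts > timeout)


def migrate_tasks(assignments, failed):
    """Move tasks of failed nodes to the least-loaded surviving node.

    The least-loaded policy is assumed; the source does not specify one.
    """
    alive = [n for n in assignments if n not in failed]
    if not alive:
        raise RuntimeError("no surviving node to take over tasks")
    result = {n: list(assignments[n]) for n in alive}
    for node in failed:
        for task in assignments.get(node, []):
            target = min(alive, key=lambda n: len(result[n]))
            result[target].append(task)
    return result


# Last heartbeat timestamps (seconds) and current assignments, as they
# might be read from the shared state store.
heartbeats = {"node-a": 100.0, "node-b": 91.0, "node-c": 99.0}
assignments = {"node-a": ["t1"], "node-b": ["t2", "t3"], "node-c": []}

failed = find_failed_nodes(heartbeats, now=101.0, timeout=5.0)
print(failed)  # ['node-b']
print(migrate_tasks(assignments, failed))
```

Since both inputs come from the shared store, the monitoring step itself holds no state and could be restarted on another machine.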

Description

Technical field

[0001] The invention relates to the field of real-time cloud computing, in particular to a fault detection and fault tolerance method and system for a real-time cloud platform.

Background technique

[0002] With the rise of technologies such as cloud computing and the Internet of Things, data is growing and accumulating at an unprecedented rate, and it increasingly appears in applications in the form of large-scale, continuous streams. The most typical applications are monitoring applications, such as financial market monitoring, network monitoring, mobile object monitoring, intrusion detection, and ecosystem monitoring. Real-time applications place higher requirements on fault detection, recovery, and fault tolerance.

[0003] For this reason, many data stream processing systems have been developed in industry and academia, including Stanford University's STREAM, Xerox's Tapestry, University of California, Berkeley's Telegraph, Brown University and MIT's Au...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L12/24, H04L12/26
Inventor: 张闯, 李钊, 徐克付, 张鹏
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI