Distributed system multilevel fault tolerance method under cloud environment

A distributed-system technology for cloud environments, applied to error detection through computational redundancy, error-response generation, and related areas. It addresses problems such as heavy impact on application performance, poor scalability, and network congestion.

Active Publication Date: 2014-05-07
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0006] Directly reusing a fault tolerance scheme designed for physical-cluster distributed systems is costly. Such schemes generally adopt process-level distributed checkpointing, which can back up and roll back only the state of a process and cannot save the state or environment of the operating system.
When a node fails, the backup file can only be migrated to a redundant standby node for recovery, so the standby node must be kept running at all times, which wastes considerable resources.
Scalability is also poor. When the application is restored, process migration must resolve dependencies on the target node's environment, such as the IP address and the runtime environment. Therefore, the recovery process is highly dependent on th

Abstract

The invention provides a multilevel fault tolerance method for distributed systems in a cloud environment. It comprises three parts. First, a distributed application collaboration algorithm based on virtual machine disk snapshots, which can back up the I/O state and the operating system environment that the application depends on. Second, a hierarchical fault detection and recovery mechanism, which detects faults at the physical layer, virtual layer, cloud platform layer, virtual machine OS layer, and application layer in real time and applies a matching recovery method to each kind of fault; fault detection and recovery can thus be refined to the module level, and a top-down stepwise recovery strategy minimizes recovery overhead. Third, a template-based deployment strategy for virtual fault-tolerant cluster services, with which a user can deploy a virtual machine fault-tolerant cluster in one click from a virtual machine template, upload jobs and configure them, and use the authorized fault-tolerant PaaS service. The invention effectively addresses the complexity of existing cluster deployment and the high cost of fault tolerance, and handles distributed application faults at every level of a cloud computing environment.
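The hierarchical detection and stepwise recovery idea can be illustrated with a minimal Python sketch. It is not the patented mechanism: the layer ordering, the probe and recovery callables, and the reading of "top-down stepwise recovery" as "try the recovery action matched to the faulty layer first, then escalate one layer down only if it fails" are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Layers named in the abstract, ordered from the top of the stack down.
# The ordering itself is an assumption of this sketch.
LAYERS: List[str] = ["application", "vm_os", "cloud_platform", "virtual", "physical"]


def locate_fault(probes: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Return the lowest layer whose (hypothetical) health probe fails.

    Assumption: a fault in a lower layer makes the layers above it look
    unhealthy too, so the lowest failing layer is treated as the cause.
    Returns None when every probe reports healthy.
    """
    for name in reversed(LAYERS):
        if not probes[name]():
            return name
    return None


def recover_stepwise(
    fault_layer: str, recoverers: Dict[str, Callable[[], bool]]
) -> Tuple[Optional[str], List[str]]:
    """Try the recovery action for fault_layer, then escalate downward.

    Each recoverer is a hypothetical callable returning True on success.
    Returns (layer that recovered or None, list of layers attempted).
    """
    attempted: List[str] = []
    for name in LAYERS[LAYERS.index(fault_layer):]:
        attempted.append(name)
        if recoverers[name]():
            return name, attempted
    return None, attempted


if __name__ == "__main__":
    # Simulated cluster: the application and the guest OS look unhealthy.
    probes = {name: (lambda: True) for name in LAYERS}
    probes["application"] = lambda: False
    probes["vm_os"] = lambda: False

    # Simulated recovery: only a cloud-platform-level action succeeds.
    recoverers = {name: (lambda: False) for name in LAYERS}
    recoverers["cloud_platform"] = lambda: True

    layer = locate_fault(probes)
    if layer is not None:
        print(recover_stepwise(layer, recoverers))
```

Starting at the faulty layer and escalating only on failure keeps the cheapest recovery action first, which is one plausible way to read the abstract's goal of minimizing recovery overhead.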

Description

Technical field

[0001] The invention belongs to the field of computing disaster tolerance in cloud computing reliability research, and more specifically relates to a multilevel fault tolerance method for a distributed system in a cloud environment.

Background technique

[0002] In cluster fault tolerance, traditional fault-tolerant technology mainly deals with the failure of computing nodes in the cluster. The main method is time redundancy: when a node fails, a backup node replaces the failed node, and the business application is rolled back and re-executed from a previous point in time. Process checkpoint/rollback is a relatively mature and common technique that uses time redundancy for fault tolerance. Process checkpointing saves the running CPU register state and memory image to an external storage device to form a checkpoint file. When a node fails, the checkpoint file can be...
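The checkpoint/rollback idea in the background can be shown with a toy Python sketch. It checkpoints application-level state with `pickle` rather than CPU registers and a memory image, so it only mirrors the time-redundancy pattern; the function names, the state layout, and the `fail_at` failure injection are hypothetical.

```python
import os
import pickle


def save_checkpoint(state: dict, path: str) -> None:
    """Write the state to a temporary file, then rename it into place
    so a crash during the write never leaves a partial checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path: str) -> dict:
    with open(path, "rb") as f:
        return pickle.load(f)


def run(total_steps: int, path: str, checkpoint_every: int = 3, fail_at=None) -> int:
    """Sum 0..total_steps-1, checkpointing every few steps.

    If a checkpoint file exists, execution rolls back to it and re-executes
    from that point. fail_at simulates a node failure before that step.
    """
    if os.path.exists(path):
        state = load_checkpoint(path)
    else:
        state = {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state["acc"]
```

A failed run followed by a second call to `run` with the same path resumes from the last saved step instead of starting over, which is the rollback and re-execution behaviour the paragraph describes.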

Application Information

IPC(8): G06F11/14
Inventor: 邹德清, 金海, 江昌庆, 羌卫中
Owner HUAZHONG UNIV OF SCI & TECH