Disaster-tolerant method, device and machine-readable medium of cluster system

A disaster recovery technology for cluster systems, applied to the generation of instruments, error response, and fault handling not based on redundancy. It can solve problems such as data corruption and split brain, and thereby improve stability and reliability.

Active Publication Date: 2019-01-18
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

However, when the cluster system has at least two standby nodes, each standby node has the same right to be promoted to master node, so a split-brain problem may occur: two or more standby nodes may be promoted to master node at the same time. The cluster system then has at least two master nodes competing for resources, which in severe cases may lead to data corruption.



Examples


Application example 1

[0098] In application example 1, the detected node can be a working node, and the detection node can be a control node. Refer to Figure 4, which shows a flowchart of application example 1 of the disaster recovery method for a cluster system of the present application. The flow may specifically include:

[0099] Step 401, the working node sends a lease request to the control node after startup;

[0100] Step 402, the control node sends a lease to the working node according to the lease request;

[0101] Optionally, the control node may be the master node in the cluster system. The lease may carry the lease generation time and lease duration corresponding to the lease. The working node may receive the lease sent by the control node according to the lease request.

[0102] Step 403, the working node sends a renewal request to the control node before its own lease expires;

[0103] Step 404, the control node sends the lease to the working node according to the renewal request; ...
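The exchange in steps 401 to 404 can be sketched as follows. This is a minimal single-process illustration, not the patent's implementation: the class names `Lease`, `ControlNode`, and `WorkingNode`, the `renew_margin` parameter, and the convention of passing the current time in explicitly are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    holder: str
    generated_at: float  # lease generation time, in seconds
    duration: float      # lease duration, in seconds

    def expires_at(self) -> float:
        return self.generated_at + self.duration


class ControlNode:
    """Hypothetical control node that grants fixed-duration leases."""

    def __init__(self, lease_duration: float):
        self.lease_duration = lease_duration

    def grant(self, worker_id: str, now: float) -> Lease:
        # Steps 402 and 404: answer a lease or renewal request with a fresh lease.
        return Lease(worker_id, now, self.lease_duration)


class WorkingNode:
    """Hypothetical working node that renews shortly before expiry."""

    def __init__(self, node_id: str, control: ControlNode, renew_margin: float):
        self.node_id = node_id
        self.control = control
        self.renew_margin = renew_margin  # assumed: how early to renew
        self.lease = None

    def start(self, now: float) -> None:
        # Step 401: request a lease after startup.
        self.lease = self.control.grant(self.node_id, now)

    def tick(self, now: float) -> bool:
        """Renew if the lease is close to expiry; return True if renewed."""
        if self.lease is None or now >= self.lease.expires_at():
            return False  # lease already invalid; recovery is out of scope here
        if now >= self.lease.expires_at() - self.renew_margin:
            # Step 403: send the renewal request before the lease expires.
            self.lease = self.control.grant(self.node_id, now)
            return True
        return False


control = ControlNode(lease_duration=10.0)
worker = WorkingNode("w1", control, renew_margin=3.0)
worker.start(now=0.0)
```

Passing `now` as an argument keeps the sketch deterministic; a real node would read a clock and would have to account for clock drift between nodes.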

Application example 2

[0112] In application example 2, the detected node may be the master node of the control node, and the detection node may be the management node. Specifically, the master node can determine whether its own lease is valid, and when its own lease is invalid, stop the lease task corresponding to its own lease; therefore, the split-brain problem caused by dual master nodes can be effectively avoided.

[0113] In addition, the lease duration recorded by the management node for the lease of the master node may be longer than the actual lease duration corresponding to the lease. In this way, when the management node determines that the master node is invalid and looks for a new master node, the lease of the master node has expired. Therefore, the split-brain problem caused by dual master nodes can be further effectively avoided.
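The timing argument in [0112] and [0113] can be illustrated with a small sketch. The class names and the `safety_margin` parameter are hypothetical; the only point shown is that if the management node records a duration longer than the actual one, the master has already stopped its lease task by the time the management node declares it invalid.

```python
class MasterNode:
    """Hypothetical master that stops its lease task once its lease expires."""

    def __init__(self, generated_at: float, duration: float):
        self.generated_at = generated_at
        self.duration = duration  # actual lease duration
        self.task_running = True

    def check_lease(self, now: float) -> bool:
        # Self-check: stop the lease task as soon as the lease is invalid.
        if now >= self.generated_at + self.duration:
            self.task_running = False
        return self.task_running


class ManagementNode:
    """Hypothetical management node that records a longer duration."""

    def __init__(self, generated_at: float, actual_duration: float,
                 safety_margin: float):
        self.generated_at = generated_at
        # Recorded duration exceeds the actual duration by an assumed margin.
        self.recorded_duration = actual_duration + safety_margin

    def master_considered_invalid(self, now: float) -> bool:
        return now >= self.generated_at + self.recorded_duration


master = MasterNode(generated_at=0.0, duration=10.0)
manager = ManagementNode(generated_at=0.0, actual_duration=10.0,
                         safety_margin=2.0)
```

With these numbers, the master stops at time 10 while the management node waits until time 12 before looking for a new master, so the two never overlap as long as clock error stays below the margin.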

[0114] As for the communication process between the management node and the master node in application example 2, since it is similar to the communication ...

Application example 3

[0116] In application example 3, the detected node may be the standby node of the control node, and the detection node may be the master node of the control node. Specifically, the standby node can determine whether its own lease is valid and, when its lease is invalid, stop the lease task corresponding to that lease. The lease task in this case can include, for example, the task of electing the master node. A standby node whose lease has expired does not participate in the election of the master node, so the split-brain problem caused by dual master nodes can be effectively avoided.

[0117] As for the communication process between the master node and the standby node in Application Example 3, since it is similar to the communication process between the control node and the working node in Application Example 1, it will not be described here, and cross-references are sufficient.
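The eligibility rule of application example 3 can be sketched as a filter over standby leases. The function name and the `(generated_at, duration)` tuple layout are assumptions made for illustration; the source does not specify how leases are stored or how the election itself proceeds.

```python
def eligible_candidates(standby_leases: dict, now: float) -> list:
    """Return the standby nodes whose lease is still valid at time `now`.

    `standby_leases` maps a node id to (generated_at, duration) in seconds.
    Nodes with an expired lease are excluded from master election.
    """
    return sorted(
        node_id
        for node_id, (generated_at, duration) in standby_leases.items()
        if now < generated_at + duration
    )


leases = {"s1": (0.0, 10.0), "s2": (5.0, 10.0)}
early = eligible_candidates(leases, now=3.0)   # both leases valid
late = eligible_candidates(leases, now=12.0)   # s1 expired at time 10
```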

[0118] The embodiment of the present application provides a data process...


Abstract

The embodiment of the application provides a disaster recovery method, a device, and a machine-readable medium for a cluster system. The cluster system comprises a master node and a standby node. The standby node corresponds to a lease granted by the master node. The method comprises the following steps: obtaining the lease time corresponding to the lease of the standby node; preempting the distributed lock of the master node according to the lease time corresponding to the lease of the standby node; and, if the local node, acting as the standby node, preempts the distributed lock of the master node, switching the master node. Embodiments of the present application can improve the stability and reliability of a cluster system.
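One possible reading of the preemption step is sketched below: a standby attempts to take the master's lock only while its own lease is valid, and at most one standby can succeed. This is an interpretation, not the claimed method. `InMemoryLock` is a hypothetical in-process stand-in for a distributed lock service, and `try_become_master` is an illustrative name.

```python
import threading


class InMemoryLock:
    """Hypothetical stand-in for a distributed lock with one holder."""

    def __init__(self):
        self._guard = threading.Lock()
        self.holder = None

    def try_acquire(self, node_id: str) -> bool:
        # Atomic check-and-set: only one caller can become the holder.
        with self._guard:
            if self.holder is None:
                self.holder = node_id
                return True
            return False


def try_become_master(node_id: str, generated_at: float, duration: float,
                      lock: InMemoryLock, now: float) -> bool:
    """Attempt master switchover for a standby node (assumed rule)."""
    if now >= generated_at + duration:
        return False  # expired lease: do not compete for the lock
    return lock.try_acquire(node_id)


lock = InMemoryLock()
results = [
    try_become_master("s1", 0.0, 10.0, lock, now=12.0),  # lease expired
    try_become_master("s2", 5.0, 10.0, lock, now=12.0),  # valid, lock free
    try_become_master("s3", 5.0, 10.0, lock, now=12.0),  # valid, lock taken
]
```

Because acquisition is atomic, two standbys with valid leases cannot both win, which is the property that prevents dual master nodes in this sketch.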

Description

Technical field

[0001] The present application relates to the technical field of computer clusters, in particular to a disaster recovery method, device and machine-readable medium for a cluster system.

Background

[0002] Computer clusters can use multiple cluster nodes to perform parallel computing to obtain higher computing speed, and can also use multiple cluster nodes for backup, so that the entire cluster system can still operate normally after any one device fails. The reliability of a cluster system refers to its ability to respond to requests under any circumstances; that is, when any device or any cluster node in the cluster system fails, the cluster system can continue to run on the remaining devices and cluster nodes. This places higher requirements on the disaster recovery capability of the cluster system.

[0003] Existing solutions usually use a heartbeat (Heartbeat) detection method to detect whether a cluster node in...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F11/07, G06F11/14
CPC: G06F11/0709, G06F11/0793, G06F11/1425
Inventor: 安龙送周博
Owner: ALIBABA GRP HLDG LTD