Cluster node fault processing method and system
A fault handling method and system for cluster nodes, applied in the field of communications. It addresses problems such as split-brain, failure of the cluster to provide external services, and disruption to client business, with the aim of improving stability and adaptability to different scenarios and avoiding the risk of split-brain.
Examples
Embodiment 1
[0072] As shown in Figure 1, this embodiment provides a fault processing method for cluster nodes, including the following steps:
[0073] S1: Add a node information database to the cluster, collect information on all nodes in the cluster, and store it in the node information database.
[0074] The data stored in the node information database includes node service information, node information, cluster information, and a fault processing node field. The node service information includes the service name, service start time, and service status information. The node information includes the node start time, the CPU usage of the node, and the number of clients connected to the node. The cluster information includes the number of each node in the cluster, node status information, and cluster status information. The fault processing node field stores the label of the current fault processing node.
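The record layout described in [0074] can be pictured as a small data model. The following is only a minimal sketch: the class names, field names, types, and status strings are assumptions made for illustration and do not appear in the source, and the per-node usage metric is modeled as a generic percentage.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServiceInfo:
    """Node service information: name, start time, status."""
    name: str
    start_time: float          # epoch seconds (assumed unit)
    status: str                # status values are assumed, e.g. "running"


@dataclass
class NodeInfo:
    """Per-node information plus the node's entry in the cluster view."""
    node_id: int               # the node's number within the cluster
    start_time: float          # epoch seconds (assumed unit)
    processor_usage: float     # usage percentage (assumed unit)
    client_count: int          # number of clients connected to the node
    status: str                # status values are assumed, e.g. "online"
    services: Dict[str, ServiceInfo] = field(default_factory=dict)


@dataclass
class NodeInfoDatabase:
    """Hypothetical in-memory stand-in for the node information database."""
    cluster_status: str = "unknown"
    nodes: Dict[int, NodeInfo] = field(default_factory=dict)
    # Fault processing node field: label of the current fault processing node.
    fault_processing_node: Optional[int] = None

    def upsert(self, node: NodeInfo) -> None:
        """Insert or overwrite the record for one node."""
        self.nodes[node.node_id] = node


db = NodeInfoDatabase(cluster_status="healthy")
db.upsert(NodeInfo(node_id=1, start_time=1000.0, processor_usage=12.5,
                   client_count=3, status="online"))
db.fault_processing_node = 1
```

Keeping the fault processing node label in the same store as the node records means every node reads one shared answer to "who handles the fault", which is the property the method relies on to avoid split-brain.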
[0075] S2: When the cluster starts and a client connects to a cluster node, the node information database is updat...
Embodiment 2
[0089] This embodiment also provides a fault processing method of a cluster node, including:
[0090] 1. After a node starts, the cluster adds a node information database and starts a timed event that collects information on each node and stores it to update the node information database.
[0091] The following information is saved in the node information database:
[0092] Node service information: service name, service start time, service status information;
[0093] Local node information: node start time, CPU usage, client connection information;
[0094] Cluster information: the number and status of each node, and the cluster status;
[0095] A fault processing node field, which stores the label of the fault recovery node.
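The timed event in step 1 can be sketched as a background loop that polls every node and overwrites its record. This is an illustrative sketch only: the `Probe` callable, the dictionary-based database, the 5-second default interval, and the choice to record an unreachable node as "offline" are all assumptions, not details given in the source.

```python
import threading
from typing import Callable, Dict

# Hypothetical probe: returns the current information for one node.
Probe = Callable[[], dict]


def refresh_once(database: Dict[int, dict], probes: Dict[int, Probe]) -> None:
    """Query every node once and overwrite its record in the database."""
    for node_id, probe in probes.items():
        try:
            database[node_id] = probe()
        except Exception:
            # Assumption: a node that cannot be queried is recorded as offline.
            database[node_id] = {"status": "offline"}


def start_periodic_refresh(database: Dict[int, dict],
                           probes: Dict[int, Probe],
                           interval_s: float = 5.0) -> threading.Event:
    """Run refresh_once every interval_s seconds until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            refresh_once(database, probes)
            stop.wait(interval_s)  # sleeps, but wakes early when stop is set

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Calling `start_periodic_refresh(db, probes)` returns an event; setting it (`stop.set()`) ends the loop after the current pass.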
[0096] 2. When the cluster starts and a client connects to a cluster node, the above information is updated in the node information database, the health of each node is determined according to the sorting algorithm, and the sort is sequentially suited to the node...
Embodiment 3
[0107] Based on Embodiment 1, as shown in Figure 2, the present invention also discloses a fault processing system for cluster nodes, comprising: a database construction unit 1, a sorting unit 2, a storage unit 3, and a node selection unit 4.
[0108] The database construction unit 1 is configured to add a node information database to the cluster, collect information on all nodes in the cluster, and store it in the node information database.
[0109] The sorting unit 2 is configured to, when the cluster starts and a client connects to a cluster node, update the node information database on a timer and use the sorting algorithm to determine and rank node health from the data stored in the node information database.
[0110] The sorting unit 2 comprises:
[0111] A first scoring module, for determining a state score for each node according to the node status information;
[0112] A second scoring module, for determining a start-time score for each node according to the node start time;
[0113] The third s...
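The first two scoring modules listed above can be sketched as two scoring functions feeding a sort. This is a hedged illustration, not the patented algorithm: the status values and their scores, the rule that a longer-running node scores higher, and the lexicographic combination of the two scores are all assumptions. The third module is elided in the text and is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical status values and scores; the source does not list them.
STATE_SCORES = {"online": 2, "degraded": 1, "offline": 0}


@dataclass
class Node:
    node_id: int
    status: str
    start_time: float  # epoch seconds (assumed unit)


def state_score(node: Node) -> int:
    """First scoring module: score a node from its status information."""
    return STATE_SCORES.get(node.status, 0)


def start_time_score(node: Node, now: float) -> float:
    """Second scoring module: score a node from its start time.

    Assumption: a node that has been up longer scores higher.
    """
    return now - node.start_time


def rank_nodes(nodes: List[Node], now: float) -> List[Node]:
    """Sort nodes from healthiest to least healthy.

    Assumption: the state score dominates and the start-time score
    only breaks ties between nodes in the same state.
    """
    return sorted(nodes,
                  key=lambda n: (state_score(n), start_time_score(n, now)),
                  reverse=True)


nodes = [Node(1, "offline", 0.0), Node(2, "online", 50.0), Node(3, "online", 10.0)]
ranking = [n.node_id for n in rank_nodes(nodes, now=100.0)]
```

With these inputs, nodes 2 and 3 are both online, node 3 has been up longer, and node 1 is offline, so `ranking` is `[3, 2, 1]`.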

