Computer system and cluster management method

A leader-follower cluster configuration with client system spares and dynamic reconfiguration addresses high costs and split-brain issues, ensuring low-cost high availability and reliability.

WO2026126552A1 · PCT designated stage · Publication Date: 2026-06-18 · HITACHI LTD

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
HITACHI LTD
Filing Date
2025-07-09
Publication Date
2026-06-18


Abstract

This computer system includes: a server system in which a plurality of server nodes operate; and a client system in which a plurality of nodes operate. The computer system includes a cluster configured from two of the server nodes and one of the nodes of the client system. In the cluster, one server node is a leader node that functions as the leader of a leader-follower type cluster management algorithm, the other server node is a follower node that functions as a follower of that algorithm, and the node of the client system is a monitor node that functions as a monitor that votes in the election of the leader.

Description

Computer System and Cluster Management Method Incorporation by Reference 【0001】 This application claims priority to Japanese Patent Application No. 2024-217756, filed on December 12, 2024, the content of which is incorporated herein by reference. 【0002】 The present invention relates to a cluster implementing a quorum-based algorithm. 【0003】 In a cloud system, a cluster configuration is adopted to ensure availability. In the cluster configuration, one node operates as the primary system, and the other nodes are standby systems. When a failure occurs in the primary system node, one of the standby system nodes continues the service as the primary system. 【0004】 In the cluster configuration, split-brain is a problem. A quorum-based algorithm is used as a method for preventing split-brain (see, for example, Patent Document 1). One class of quorum-based algorithms adopts a Leader-Follower type algorithm (for example, Raft). The cluster is composed of one Leader node that functions as the primary system node and two or more Follower nodes. When a failure occurs in the Leader, candidates for the next Leader are nominated and the nodes vote for the candidates according to the Leader election algorithm, and the Follower that obtains votes from a majority of nodes becomes the next Leader. 【0005】 Patent Document 1: Japanese Translation of PCT International Application Publication No. 2017-517817 【0006】 To implement a quorum-based algorithm, the cluster must be configured with an odd number of nodes, three or more. Multiple non-primary system nodes must therefore be prepared, which makes the cost of introduction high. 【0007】 An object of the present invention is to realize a cluster that can ensure high availability at low cost and avoid split-brain. 
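The majority requirement described in paragraphs 【0004】 and 【0006】 can be illustrated with a short calculation. The following is a minimal sketch, not part of the application; the function names `quorum` and `tolerated_failures` are hypothetical and chosen only for illustration.

```python
def quorum(voters: int) -> int:
    """Smallest number of votes that forms a strict majority of `voters`."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """Number of voters that can be lost while a majority can still be formed."""
    return voters - quorum(voters)


# With two voters a single failure already prevents a majority, which is
# why a third voter is needed before a new Leader can be elected.
for n in (2, 3, 4, 5):
    print(f"voters={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

An even number of voters tolerates no more failures than the next smaller odd number, which is one common reason quorum-based clusters are sized with an odd number of voting nodes.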
【0008】 A representative example of the invention disclosed in this application is as follows: A computer system comprising a server system consisting of a plurality of computers and operating a plurality of server nodes that provide services, and a client system consisting of a plurality of computers and operating a plurality of nodes, wherein the plurality of nodes of the client system comprises a plurality of gateway nodes and a plurality of client nodes that use services provided by the server nodes, and the computer system comprises a first cluster consisting of two of the server nodes and one of the nodes, wherein in the first cluster, one of the server nodes is a leader node that functions as the leader of a leader-follower cluster management algorithm, the other server node is a follower node that functions as a follower of the leader-follower cluster management algorithm, and the one of the nodes is a monitor node that functions as a monitor that votes in the election of a leader in the leader-follower cluster management algorithm. 【0009】 According to the present invention, a cluster can be realized that ensures high availability at low cost and avoids split-brain. Other issues, configurations, and effects will be clarified by the following description of the embodiments. 【0010】 This figure shows an example of the configuration of the computer system in Example 1. This figure shows an example of the hardware configuration of the computer that implements the node in Example 1. This figure shows an example of the functional configuration of the server node in Example 1. This figure shows an example of the functional configuration of the client node that functions as a Monitor in Example 1. This figure shows an example of the data structure of the liveness management information in Example 1. This is a flowchart illustrating an example of the liveness confirmation process performed by the potential failure management unit in Example 1. 
This is a flowchart illustrating an example of the fault monitoring process performed by the potential failure management unit in Example 1. This figure shows an example of how to change the cluster configuration in Example 1. This figure shows an example of how to change the cluster configuration in Example 1. This is a diagram illustrating an example of how to change the cluster configuration in Example 1. This is a diagram illustrating an example of how to change the cluster configuration in Example 1. This is a diagram illustrating an example of how to change the cluster configuration in Example 1. This is a diagram illustrating an example of how to change the cluster configuration in Example 1. This is a diagram illustrating an example of how to change the cluster configuration in Example 1. This is a flowchart illustrating an example of how to change the cluster configuration in Example 1. This is a diagram illustrating an example of how to change the cluster configuration in Example 1. This is a flowchart illustrating an example of the processing performed by the cluster management unit of the Monitor in Example 1. This figure shows an example of the configuration of the computer system in Example 2. This figure shows an example of the configuration of the computer system in Example 3. 【0011】 The embodiments of the present invention will be described below with reference to the drawings. However, the present invention is not to be construed as being limited to the embodiments described below. It will be readily apparent to those skilled in the art that the specific configuration can be modified without departing from the spirit or intent of the present invention. 【0012】 In the configuration of the invention described below, identical or similar components or functions are denoted by the same reference numerals, and redundant descriptions are omitted. 
【0013】 The designations "First," "Second," "Third," etc., used in this specification are for the purpose of identifying constituent elements and do not necessarily limit their number or order. 【0014】 Figure 1 shows an example of the configuration of the computer system in Example 1. Figure 2 shows an example of the hardware configuration of the computer that implements the node in Example 1. 【0015】The computing system consists of a server system including multiple server nodes 100 that provide services, and a client system including multiple client nodes 101 that use the services. The server system is, for example, a cloud system. The client system is, for example, an on-premises system located at a site. 【0016】 Server node 100 is connected to other server nodes 100 via a network such as a LAN (Local Area Network) so as to be able to communicate, and is also connected to each client node 101 via the network so as to be able to communicate. Client node 101 is connected to other client nodes 101 via the network so as to be able to communicate. 【0017】 The server node 100 and client node 101 are implemented using a computer 200 as shown in Figure 2. The nodes may be implemented using one computer 200, or multiple computers 200. Alternatively, the nodes may be implemented using virtualization technology. 【0018】 The computer 200 comprises a processor 201, main memory 202, secondary memory 203, and a network interface 204. Each hardware component is connected via a bus 205. 【0019】 The processor 201 executes a program stored in the main memory 202. By executing processing according to the program, the processor 201 operates as a functional unit (module) that realizes a specific function. In the following description, when the processing is described with a functional unit as the subject, it indicates that the processor 201 is executing a program that realizes that functional unit. 
【0020】 The main memory 202 is a memory unit that stores the program executed by the processor 201 and the information used by the program. The main memory 202 is also used as a work area. 【0021】The secondary storage device 203 is a large-capacity storage device such as an HDD (Hard Disk Drive) or SSD (Solid State Drive). The programs and information stored in the main memory 202 may also be stored in the secondary storage device 203. In this case, the processor 201 reads the programs and information from the secondary storage device 203 and loads them into the main memory 202. 【0022】 The network interface 204 is an interface for communicating with other devices over a network. 【0023】 In Example 1, as shown in Figure 1, the cluster is configured with two server nodes 100-1 and 100-2 and two client nodes 101-1 and 101-2. 【0024】 Server node 100-1 functions as the Leader of a Leader-Follower cluster management algorithm, and server node 100-2 functions as the Follower of a Leader-Follower cluster management algorithm. Client node 101-1 functions as a Monitor, and client node 101-2 functions as a Spare. A Monitor is a type of Follower in a Leader-Follower cluster management algorithm; it can vote in the Leader selection process but cannot run for office (it will not be selected as the next Leader). A Spare is a node that acts as a standby for a Monitor. An example of a Leader-Follower cluster management algorithm is Raft. 【0025】 Note that there may be two or more client nodes 101 that function as spares. Also, the cluster does not need to contain any client nodes 101 that function as spares. 【0026】 In the following explanation, a node that functions as a Leader will be referred to as Leader, a node that functions as a Follower will be referred to as Follower, a node that functions as a Monitor will be referred to as Monitor, and a node that functions as a Spare will be referred to as Spare. 
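The four roles introduced in paragraphs 【0023】 to 【0026】 can be summarized as a small data model. This is a minimal sketch under stated assumptions: the names `Role`, `Node`, `can_vote`, and `can_be_elected` are hypothetical, and treating the Spare as a non-voting node is an inference from the later description of voting rights, not an explicit statement of the application.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    LEADER = "Leader"
    FOLLOWER = "Follower"
    MONITOR = "Monitor"
    SPARE = "Spare"


# Assumption: the Spare only stands by for the Monitor and does not vote.
VOTING_ROLES = {Role.LEADER, Role.FOLLOWER, Role.MONITOR}
# The Monitor votes but is never selected as the next Leader.
ELECTABLE_ROLES = {Role.LEADER, Role.FOLLOWER}


@dataclass
class Node:
    name: str
    role: Role

    def can_vote(self) -> bool:
        return self.role in VOTING_ROLES

    def can_be_elected(self) -> bool:
        return self.role in ELECTABLE_ROLES


# The cluster of Figure 1: two server nodes and two client nodes.
cluster = [
    Node("100-1", Role.LEADER),
    Node("100-2", Role.FOLLOWER),
    Node("101-1", Role.MONITOR),
    Node("101-2", Role.SPARE),
]

voters = [n.name for n in cluster if n.can_vote()]
candidates = [n.name for n in cluster if n.can_be_elected()]
print("voters:", voters)
print("candidates:", candidates)
```

Under these assumptions the cluster has three voters, so a majority of two can still be formed after any single voter is lost, while only the two server nodes can ever hold the Leader role.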
【0027】 The arrows in Figure 1 represent health check communication between nodes. Health check communication transmits liveness information such as heartbeats. Messages in the consensus protocol, as well as messages between server node 100 and client node 101, may also be treated as liveness information. In accordance with the Leader-Follower type cluster management algorithm, health check communication is performed at a high frequency, such as every 100 milliseconds, between the Leader and the Follower and between the Leader and the Monitor. Between the other nodes, health check communication is performed at a low frequency, such as every 10 seconds. 【0028】 By configuring the cluster that implements the Leader-Follower type cluster management algorithm as shown in Figure 1, it is not necessary to prepare two or more standby server nodes 100. Furthermore, by including the client node 101 in the cluster, failures and replacements of the Leader can be detected quickly. By adopting the cluster configuration of Example 1, it is possible to ensure high availability at low cost and avoid split-brain. 【0029】 Figure 3 shows an example of the functional configuration of the server node 100 in Example 1. 【0030】 The server node 100 includes a cluster management unit 110 and a server application 120, and also holds liveness management information 150. 【0031】 The server application 120 is an application that provides services. These services include, for example, storage services and web services. The present invention is not limited to any particular type of service. 【0032】 The cluster management unit 110 manages the cluster according to a Leader-Follower type cluster management algorithm. The Leader's cluster management unit 110 performs high-frequency health check communication with the Follower and the Monitor. The Follower's cluster management unit 110 performs high-frequency health check communication with the Leader. 
If a Leader failure is detected as a result of monitoring through the health checks, a new Leader is selected according to a known method. 【0033】 The cluster management unit 110 includes a potential failure management unit 111. The potential failure management unit 111 performs low-frequency health check communication. The health check communication performed by the cluster management unit 110 and that performed by the potential failure management unit 111 are conducted in a manner that allows them to be distinguished from each other. 【0034】 The potential failure management unit 111 of each node communicates with the other nodes at a low frequency to confirm that they are alive. 【0035】 The liveness management information 150 is information for managing the liveness status of the nodes and of the communication paths between nodes that are monitored by the potential failure management unit 111. 【0036】 The cluster management unit 110 also holds information (liveness management information) for managing the liveness status of nodes monitored by high-frequency health check communication, but this is omitted from the figure. 【0037】 Figure 4 shows an example of the functional configuration of client node 101-1, which functions as a Monitor in Example 1. 【0038】 Client node 101-1 includes a cluster management unit 110 and a client application 130, and also holds liveness management information 150 and a queue 160. Client nodes that function as Spares have a similar configuration. 【0039】 The client application 130 uses the service to perform predetermined processing. The present invention is not limited to the type and content of processing performed by the client application 130. 【0040】 The cluster management unit 110 manages the cluster according to a Leader-Follower type cluster management algorithm. The Monitor's cluster management unit 110 performs high-frequency health check communication with the Leader. 
The Spare's cluster management unit 110 does not perform high-frequency health check communication with the Leader. 【0041】 The cluster management unit 110 includes a potential failure management unit 111. The potential failure management unit 111 performs low-frequency health check communication. The health check communication performed by the cluster management unit 110 and that performed by the potential failure management unit 111 are conducted in a manner that allows them to be distinguished from each other. 【0042】 The Spare's potential failure management unit 111 performs low-frequency health check communication with the Leader, the Follower, and the Monitor. 【0043】 The liveness management information 150 is information for managing the liveness of the nodes and of the communication paths connecting the nodes. 【0044】 The queue 160 is a queue for accumulating requests to be sent to the server application 120. 【0045】 Each node constituting the cluster holds information for managing the liveness status of the nodes monitored by high-frequency health check communication, which is omitted from the figure. 【0046】 FIG. 5 is a diagram showing an example of the data structure of the liveness management information 150 of Example 1. 【0047】 The liveness management information 150 stores entries including an ID 501, a monitored site 502, a source time 503, and a liveness confirmation time 504. 【0048】 The ID 501 is a field for storing an identifier of an entry. The monitored site 502 is a field for storing information representing the site to be monitored. The entry whose ID 501 is "1" manages the liveness information of node 1, and the entry whose ID 501 is "3" manages the liveness information of the transmission path from node 1 to node 2. Here, nodes 1, 2, ... are each either a server node 100 or a client node 101, and the names are unique identifiers for distinguishing the nodes. The roles of the nodes, namely Leader, Follower, Monitor, and Spare, are not distinguished. 
The source time 503 is a field for storing the time at which the liveness information was sent. The liveness confirmation time 504 is a field for storing the time at which the liveness information was received. The liveness information transmitted by the potential failure management unit 111 at a low frequency includes the contents of the table in FIG. 5 excluding the liveness confirmation time 504. This allows a node to grasp the state of communication paths that it cannot observe directly. 【0049】 FIG. 6 is a flowchart for explaining an example of the liveness confirmation process executed by the potential failure management unit 111 of Example 1. 【0050】 After startup, the potential failure management unit 111 monitors for incoming liveness information (step S101). 【0051】 When liveness information is received, the potential failure management unit 111 updates the liveness management information 150 based on the received liveness information (step S102) and continues to monitor for liveness information. 【0052】 Here, an example of how to update the liveness management information 150 is explained, assuming that the local node is node 3. 【0053】 When node 3 receives liveness information from node 2, including the liveness status of the transmission path from node 1 to node 2, it updates the liveness confirmation time 504 of the entry corresponding to the liveness status of node 2 (the entry with ID 501 being 2) and the liveness confirmation time 504 of the entry corresponding to the liveness status of the transmission path from node 1 to node 2 (the entry with ID 501 being 3). 【0054】 Figure 7 is a flowchart illustrating an example of the fault monitoring process performed by the potential failure management unit 111 in Example 1. The potential failure management unit 111 periodically performs the fault monitoring process described below. 
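The table of FIG. 5 and the update performed in step S102 can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Entry` and `apply_report` are hypothetical, times are plain numbers, and ignoring a report that is older than the stored one is an added assumption that the application does not state.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entry:
    entry_id: int          # ID 501
    monitored_site: str    # monitored site 502
    source_time: float     # source time 503
    confirmed_time: float  # liveness confirmation time 504 (set locally)


# A report carries every field except the confirmation time.
Report = List[Tuple[int, str, float]]


def apply_report(table: Dict[int, Entry], report: Report, now: float) -> None:
    """Step S102: record when each reported site was last confirmed alive."""
    for entry_id, site, source_time in report:
        entry = table.get(entry_id)
        if entry is None:
            table[entry_id] = Entry(entry_id, site, source_time, now)
        elif source_time >= entry.source_time:  # assumption: skip stale reports
            entry.source_time = source_time
            entry.confirmed_time = now


# Local node is node 3; entry 3 is a path that node 3 cannot observe itself.
table = {
    1: Entry(1, "node 1", 10.0, 10.5),
    2: Entry(2, "node 2", 10.0, 10.5),
    3: Entry(3, "path node 1 -> node 2", 10.0, 10.5),
}

# Node 2 reports its own liveness and the path from node 1 to node 2.
apply_report(table, [(2, "node 2", 20.0), (3, "path node 1 -> node 2", 19.5)], now=20.4)
print(table[1].confirmed_time, table[2].confirmed_time, table[3].confirmed_time)
```

After the report from node 2 is applied, the entries with ID 2 and ID 3 carry the new confirmation time, while the entry for node 1 keeps its old one and will eventually exceed the fault threshold if node 1 stays silent.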
【0055】 The potential failure management unit 111 determines whether its own node is the Leader (step S201). 【0056】 If the local node is not the Leader, the potential failure management unit 111 terminates the fault monitoring process. 【0057】 If the local node is the Leader, the potential failure management unit 111 refers to the liveness management information 150 and determines whether a faulty site exists (step S202). 【0058】 Specifically, the potential failure management unit 111 searches for entries in which the difference (elapsed time) between the liveness confirmation time 504 and the current time is greater than a predetermined threshold. The potential failure management unit 111 determines that the site indicated by the monitored site 502 of each entry found is a faulty site. 【0059】 The predetermined threshold is, for example, three or five times the execution cycle of the fault monitoring process. 【0060】 If no faulty site is found, the potential failure management unit 111 terminates the fault monitoring process. 【0061】 If a faulty site exists, the potential failure management unit 111 requests the cluster management unit 110 to change the cluster configuration (step S203), and then terminates the fault monitoring process. Since a fault at one site may affect other sites, when a fault is detected at one site, the potential failure management unit 111 may postpone step S203 for a certain period of time to check whether faults have occurred over a wider range before proceeding. 【0062】 When the cluster management unit 110 receives the request, it modifies the cluster configuration according to the location of the failure so that a new Leader can be selected (to reduce the occurrence of split-brain). 【0063】 Here, we will explain how to change the cluster configuration. Figures 8A and 8B show an example of how to change the cluster configuration in Example 1. 
【0064】 (1) If a failure occurs in the communication path between the Leader and the Monitor, the Leader's cluster management unit 110 changes the Monitor to a Spare and changes the Spare to a Monitor. 【0065】 (2) If failures occur in both the communication path between the Leader and the Monitor and the communication path between the Leader and the Spare, the Leader will be unable to provide services to the client nodes. The Leader's cluster management unit 110 changes the Follower to the Leader, and changes the Leader to a Follower. 【0066】 (3) If a failure occurs in the communication path between the Follower and the Monitor (Figure 8A), and the Leader then fails and stops in this state, the Follower will stand as a candidate for the next Leader based on the Leader election algorithm. However, the Follower cannot obtain a vote from the Leader or the Monitor, which are the only voting nodes other than itself. In other words, it cannot obtain a majority of votes, and a new Leader is not elected. To avoid this situation, the Leader's cluster management unit 110 changes the Monitor to a Spare and the Spare to a Monitor, as shown in Figure 8B. 【0067】 (4) A failure in the communication path between the Leader and the Spare, in the communication path between the Follower and the Spare, or in both does not affect the Leader selection process. Therefore, the Leader's cluster management unit 110 does not change the cluster configuration. However, the Leader's cluster management unit 110 may issue an alert to prompt recovery of the failed communication path. The Leader's cluster management unit 110 may also select a new Spare from among the client nodes 101 included in the client system. 【0068】 (5) If a failure occurs in the communication path between the Leader and the Follower, the Leader's cluster management unit 110 removes the Follower from the cluster, changes the Spare to a Monitor, and adds it to the cluster. 
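The fault monitoring process (steps S201 to S203) and the reconfiguration cases (1) to (5) above can be combined into one sketch. Everything here is a hedged illustration, not the claimed implementation: the function names are hypothetical, paths are keyed by role for brevity (the table of FIG. 5 keys them by node name), case (2) is read as both Leader-side paths failing, and the order in which overlapping cases are checked is an assumption.

```python
LEADER, FOLLOWER, MONITOR, SPARE = "Leader", "Follower", "Monitor", "Spare"


def path(a, b):
    """Undirected communication path between two roles."""
    return frozenset((a, b))


def find_failed_paths(confirmed, now, threshold):
    """Step S202: paths whose last liveness confirmation is too old."""
    return {p for p, t in confirmed.items() if now - t > threshold}


def plan_reconfiguration(failed):
    """Map a set of failed paths to one of the changes in cases (1) to (5)."""
    if path(LEADER, MONITOR) in failed and path(LEADER, SPARE) in failed:
        return "swap Leader and Follower"                  # case (2)
    if path(LEADER, FOLLOWER) in failed:
        return "remove Follower, change Spare to Monitor"  # case (5)
    if path(LEADER, MONITOR) in failed or path(FOLLOWER, MONITOR) in failed:
        return "swap Monitor and Spare"                    # cases (1) and (3)
    return "no change"                                     # case (4)


def fault_monitoring(is_leader, confirmed, now, period, multiplier=3):
    """One periodic run; returns the requested change or None."""
    if not is_leader:                                      # step S201
        return None
    failed = find_failed_paths(confirmed, now, multiplier * period)
    if not failed:
        return None
    return plan_reconfiguration(failed)                    # step S203


# Low-frequency checks every 10 s; the Follower-Monitor path has been
# silent for 40 s, which exceeds the threshold of 3 x 10 s.
confirmed = {
    path(LEADER, FOLLOWER): 95.0,
    path(LEADER, MONITOR): 95.0,
    path(LEADER, SPARE): 95.0,
    path(FOLLOWER, MONITOR): 60.0,
    path(FOLLOWER, SPARE): 95.0,
}
result = fault_monitoring(True, confirmed, now=100.0, period=10.0)
print(result)
```

Returning a description of the change rather than applying it keeps the sketch free of any assumption about how membership changes are committed through the cluster management algorithm.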
【0069】 Thus, in this embodiment, the cluster configuration is modified in advance to avoid split-brain and other failures based on the monitoring results of the potential failure management unit 111. 【0070】 Figures 9A, 9B, 9C, and 9D illustrate an example of a change in the cluster configuration of Embodiment 1. 【0071】If a failure occurs in the Leader (Figure 9A), a Follower is elected as the new Leader based on the Leader election algorithm. The cluster management unit 110 of the server node 100 elected as Leader removes the old Leader from the cluster and changes the Spare to a Monitor and adds it to the cluster, as shown in Figure 9B. 【0072】 If server node 100, which was the Leader, recovers from a failure and is then added to the cluster as a Follower (Figure 9C), the Leader's cluster management unit 110 changes one of the two Monitors included in the cluster to Spare, as shown in Figure 9D. One possible method for selecting which client node 101 to change is to prioritize changing the client node 101 with poor communication quality. 【0073】 Figure 10 is a flowchart illustrating an example of the processing performed by the cluster management unit 110 of the Monitor in Example 1. 【0074】 The Monitor's cluster management unit 110 periodically executes the processes described below. 【0075】 The cluster management unit 110 determines whether the difference (elapsed time) between the time the Leader was confirmed to be alive and the current time is less than or equal to a first threshold (step S301). In other words, it determines whether or not a failure has occurred in the Leader. If the elapsed time is greater than the first threshold, it is determined that a failure has occurred in the Leader. 
【0076】 If the elapsed time is greater than the first threshold, the cluster management unit 110, together with the Follower, selects a new Leader based on the Leader selection algorithm (step S302), releases the request transmission hold (step S306), and then terminates the process. If requests are stored in the queue 160, the cluster management unit 110 sends the requests to the new Leader in the order they were stored. 【0077】If the elapsed time is less than or equal to the first threshold, the cluster management unit 110 determines whether the elapsed time is less than or equal to the second threshold (step S303). That is, it determines whether there is a possibility that a failure has occurred in the Leader. If the elapsed time is greater than the second threshold, it is determined that there is a possibility that a failure has occurred in the Leader. The second threshold is a value that is greater than the survival monitoring period and less than the first threshold. 【0078】 If the elapsed time is greater than the second threshold, the cluster management unit 110 suspends the transmission of the request from the client application 130 (step S304). After that, the cluster management unit 110 terminates processing. At this time, the cluster management unit 110 records the Leader's liveness check time in the work area. After suspending the transmission of the request, the cluster management unit 110 stores the request received from the client application 130 in the queue 160. 【0079】 If the elapsed time is less than or equal to the second threshold, the cluster management unit 110 determines whether or not the Leader's liveness confirmation time has been updated (step S305). Specifically, the cluster management unit 110 compares the Leader's liveness confirmation time recorded in the work area with the Leader's liveness confirmation time in the liveness management information (not shown). 
【0080】 If the Leader's liveness check time has not been updated, the cluster management unit 110 terminates processing. 【0081】 If the Leader's liveness check time has been updated, the cluster management unit 110 releases the request transmission hold (step S306), and then terminates the process. 【0082】 A Monitor, which also functions as a client node, will, if it detects a potential failure of the Leader, suspend sending requests to the Leader (which also functions as the primary server node), and resume sending requests after the likelihood of the Leader failure decreases or after a new Leader is elected. This reduces the chance of request loss due to Leader re-election. 【0083】In Example 2, the nodes of the client systems that make up the cluster are different from those in Example 1. Below, Example 2 will be described focusing on the differences from Example 1. 【0084】 Figure 11 shows an example of the configuration of the computer system in Example 2. 【0085】 The client system in Embodiment 2 includes a gateway node 102. The server node 100 communicates with the client node 101 via the gateway node 102. 【0086】 In Example 2, as shown in Figure 11, the cluster is configured with two server nodes 100-1 and 100-2 and two gateway nodes 102-1 and 102-2. 【0087】 Server node 100-1 functions as the Leader of a Leader-Follower type cluster management algorithm, and server node 100-2 functions as the Follower of the Leader-Follower type cluster management algorithm. Gateway node 102-1 functions as the Monitor, and gateway node 102-2 functions as the Spare. 【0088】 Note that there may be two or more gateway nodes 102 that function as spares. Also, the cluster does not need to include gateway nodes 102 that function as spares. 【0089】 The functions of Leader, Follower, Monitor, and Spare are the same as in Example 1. Furthermore, the processing performed by the cluster management unit 110 and the potential failure management unit 111 is also the same as in Example 1. 
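The Monitor-side processing of Figure 10 (steps S301 to S306), including the request hold and the queue 160, can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `MonitorGate` and its methods are hypothetical, the `send` callable is assumed to deliver a request to whichever node is currently the Leader, and `elect_new_leader` stands in for the Leader selection algorithm, which is not reproduced here.

```python
from collections import deque


class MonitorGate:
    def __init__(self, first_threshold, second_threshold, send, elect_new_leader):
        # The second threshold lies below the first threshold.
        assert second_threshold < first_threshold
        self.first = first_threshold
        self.second = second_threshold
        self.send = send
        self.elect_new_leader = elect_new_leader
        self.holding = False
        self.recorded_time = None  # liveness time noted when the hold began
        self.queue = deque()       # corresponds to queue 160

    def submit(self, request):
        """Called by the client application for each outgoing request."""
        if self.holding:
            self.queue.append(request)
        else:
            self.send(request)

    def _release(self):
        """Step S306: lift the hold and flush queued requests in order."""
        self.holding = False
        while self.queue:
            self.send(self.queue.popleft())

    def tick(self, leader_confirmed_time, now):
        """One periodic run of the Monitor's cluster management processing."""
        elapsed = now - leader_confirmed_time
        if elapsed > self.first:        # S301: the Leader has failed
            self.elect_new_leader()     # S302
            self._release()             # S306
        elif elapsed > self.second:     # S303: the Leader may have failed
            self.holding = True         # S304
            self.recorded_time = leader_confirmed_time
        elif self.holding and leader_confirmed_time != self.recorded_time:
            self._release()             # S305 then S306


sent = []
gate = MonitorGate(1.0, 0.3, send=sent.append, elect_new_leader=lambda: None)
gate.tick(leader_confirmed_time=0.0, now=0.5)  # suspicious: start holding
gate.submit("req-1")                           # queued, not sent
queued_during_hold = list(gate.queue)
gate.tick(leader_confirmed_time=0.6, now=0.7)  # Leader confirmed again
print(queued_during_hold, sent)
```

Holding requests between the second and first thresholds means that a request issued while the Leader is in doubt is delivered either to the recovered Leader or to its successor, rather than being sent to a node that may no longer act on it.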
【0090】 In Example 1, the opportunity for request loss can be suppressed only for the client node 101 that functions as a monitor. On the other hand, in the cluster of Example 2, the effect of suppressing the opportunity for request loss can be expected for the client node 101 connected to the gateway node 102. 【0091】Example 3 differs from Example 1 in that a multi-tiered cluster is configured in the server system. Below, we will describe Example 3, focusing on the differences from Example 1. 【0092】 Figure 12 shows an example of the configuration of the computer system in Example 3. 【0093】 Cluster A is a cluster with the configuration described in Example 1. In Example 3, cluster B is formed with server nodes 100-3 and 100-4, which provide services used by the Leader and Follower of cluster A. The Leader of cluster A functions as a Monitor in cluster B, and the Follower of cluster A functions as a Spare in cluster B. If a Follower of cluster A becomes a new Leader according to the contents of Example 1, the Monitor and Spare of cluster B are swapped in conjunction with this. For example, if server node 100-2 becomes the new Leader of cluster A, server node 100-2 becomes the Monitor of cluster B, and server node 100-1 becomes a Spare of cluster B. 【0094】 By configuring cluster B in the same way as cluster A, the control described in Example 1 becomes possible. 【0095】 It should be noted that the present invention is not limited to the embodiments described above, and various modifications are included. Furthermore, for example, the embodiments described above are detailed explanations of the configuration in order to clearly illustrate the present invention, and are not necessarily limited to those having all the configurations described. In addition, some of the configurations in each embodiment can be added to, deleted from, or replaced with other configurations. 
【0096】Furthermore, each of the above-mentioned configurations, functions, processing units, processing means, etc., may be implemented in hardware, either partially or entirely, by designing them as integrated circuits, for example. The present invention can also be implemented by software program code that realizes the functions of the embodiment. In this case, a storage medium on which the program code is recorded is provided to a computer, and the processor of that computer reads the program code stored in the storage medium. In this case, the program code read from the storage medium itself realizes the functions of the embodiment described above, and the program code itself and the storage medium on which it is stored constitute the present invention. Examples of storage media used to supply such program code include flexible disks, CD-ROMs, DVD-ROMs, hard disks, SSDs (Solid State Drives), optical disks, magneto-optical disks, CD-Rs, magnetic tapes, non-volatile memory cards, ROMs, and the like. 【0097】 Furthermore, the program code that implements the functions described in this embodiment can be implemented in a wide range of programming or scripting languages, such as assembler, C / C++, Perl, Shell, PHP, Python, and Java. 【0098】 Furthermore, the program code for the software that implements the functions of the embodiment may be distributed via a network and stored in a storage means such as a computer's hard disk or memory, or in a storage medium such as a CD-RW or CD-R, and the computer's processor may read and execute the program code stored in the storage means or storage medium. 【0099】 In the above-described embodiment, the control lines and information lines shown are those deemed necessary for explanation and do not necessarily represent all control lines and information lines in the actual product. All components may be interconnected.

Claims

1. A computer system comprising: a server system consisting of a plurality of computers and operating on a plurality of server nodes that provide services; and a client system consisting of a plurality of computers and operating on a plurality of nodes, wherein the plurality of nodes of the client system comprises a plurality of gateway nodes and a plurality of client nodes that utilize services provided by the server nodes; the computer system comprises a first cluster consisting of two of the server nodes and one of the nodes, wherein in the first cluster, one of the server nodes is a leader node that functions as the leader of a leader-follower type cluster management algorithm; the other server node is a follower node that functions as a follower of a leader-follower type cluster management algorithm; and one of the nodes is a monitor node that functions as a monitor that votes for the election of a leader.

2. A computer system according to claim 1, wherein the first cluster includes a spare node that is a node of the client system and is a spare for the monitor node; each of the leader node, the follower node, the monitor node, and the spare node monitors the health status of the communication paths connecting the leader node, the follower node, the monitor node, and the spare node to one another; and the leader node, when a failure of one of the communication paths is detected, changes the configuration of the first cluster so that a new leader node can be elected.

3. A computer system according to claim 2, wherein the leader node, when it detects a failure in the communication path connecting the follower node and the monitor node, changes the monitor node to the spare node and changes the spare node to the monitor node.

4. A computer system according to claim 2, wherein the leader node, when it detects a failure in the communication path connecting the leader node and the follower node, removes the follower node from the first cluster and changes the spare node to the monitor node.

5. A computer system according to claim 2, wherein the server system includes a plurality of upper-level server nodes that provide services to the leader node and the follower node, the computer system includes a second cluster comprising two of the upper-level server nodes, the leader node of the first cluster, and the follower node of the first cluster, wherein in the second cluster, one of the upper-level server nodes is the leader node of the second cluster, the other upper-level server node is the follower node of the second cluster, the leader node of the first cluster is the monitor node of the second cluster, and the follower node of the first cluster is the spare node of the second cluster.

6. A computer system according to claim 1, wherein the monitor node, when there is a possibility that the leader node is experiencing a failure, stores requests to be sent to the leader node in a queue, and when the failure of the leader node has been resolved, sends the requests stored in the queue to the leader node.

7. A cluster management method executed by a computer system, wherein the computer system comprises: a server system that consists of a plurality of computers and on which a plurality of server nodes that provide services operate; and a client system that consists of a plurality of computers and on which a plurality of nodes operate, wherein the plurality of nodes of the client system comprises a plurality of gateway nodes and a plurality of client nodes that utilize the services provided by the server nodes; the computer system comprises a cluster consisting of two of the server nodes and two of the nodes of the client system, wherein in the cluster, one of the server nodes is a leader node that functions as the leader of a leader-follower type cluster management algorithm; the other server node is a follower node that functions as a follower of the leader-follower type cluster management algorithm; one of the two nodes of the client system is a monitor node that functions as a monitor that votes in the election of the leader; and the other of the two nodes is a spare node that functions as a spare for the monitor node; and the cluster management method comprises the steps of: each of the leader node, the follower node, the monitor node, and the spare node monitoring the health status of the communication paths connecting the leader node, the follower node, the monitor node, and the spare node to one another; and the leader node changing the configuration of the cluster so that a new leader node can be elected when a failure is detected in one of the communication paths.
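
For illustration only, the reconfiguration rules recited in claims 3 and 4 can be sketched in Python as follows. The role map, the node names (`server-a`, `gw-1`, and so on), and the functions `quorum` and `reconfigure` are hypothetical and do not appear in the specification. The sketch further assumes that a spare node does not vote, and that in the case of claim 4 the original monitor node keeps its role so that three voting nodes remain; neither assumption is stated in the claims.

```python
from typing import Dict, FrozenSet, Optional

LEADER, FOLLOWER, MONITOR, SPARE = "leader", "follower", "monitor", "spare"
# Assumption for this sketch: spare nodes hold no vote.
VOTING_ROLES = {LEADER, FOLLOWER, MONITOR}


def node_with_role(roles: Dict[str, str], role: str) -> Optional[str]:
    """Return the first node that currently holds `role`, or None."""
    return next((node for node, r in roles.items() if r == role), None)


def quorum(roles: Dict[str, str]) -> int:
    """Number of votes that forms a majority of the voting nodes."""
    voters = sum(1 for r in roles.values() if r in VOTING_ROLES)
    return voters // 2 + 1


def reconfigure(roles: Dict[str, str], failed_path: FrozenSet[str]) -> Dict[str, str]:
    """Return the role map the leader would install after a path failure.

    `failed_path` holds the names of the two nodes at the ends of the
    failed communication path. Paths not covered by claims 3 and 4 are
    left unchanged in this sketch.
    """
    new_roles = dict(roles)
    leader = node_with_role(roles, LEADER)
    follower = node_with_role(roles, FOLLOWER)
    monitor = node_with_role(roles, MONITOR)
    spare = node_with_role(roles, SPARE)
    if spare is None:
        return new_roles  # no spare node is available to take over

    if failed_path == frozenset({follower, monitor}):
        # Claim 3: the monitor node and the spare node exchange roles.
        new_roles[monitor] = SPARE
        new_roles[spare] = MONITOR
    elif failed_path == frozenset({leader, follower}):
        # Claim 4: the follower node is removed and the spare node
        # becomes a monitor node.
        del new_roles[follower]
        new_roles[spare] = MONITOR
    return new_roles


roles = {
    "server-a": LEADER,
    "server-b": FOLLOWER,
    "gw-1": MONITOR,
    "gw-2": SPARE,
}
after_claim_3 = reconfigure(roles, frozenset({"server-b", "gw-1"}))
after_claim_4 = reconfigure(roles, frozenset({"server-a", "server-b"}))
```

Under these assumptions, `after_claim_3` has `gw-2` as the monitor node and `gw-1` as the spare node, while `after_claim_4` no longer contains `server-b` and has both gateway nodes acting as monitor nodes. In both results three nodes still vote, so a majority of two remains reachable for electing a leader.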