Method for providing distributed system election by container orchestration system
Using the heartbeat detection and lease mechanisms of a container orchestration system, the method achieves automatic master-slave failover and configuration updates in a distributed system. This addresses the stability and availability problems that arise when the master node fails and raises the system's level of automation.
Patent Information
- Authority / Receiving Office: WO
- Patent Type: Applications
- Current Assignee / Owner: QIMING INFORMATION TECH CO LTD
- Filing Date: 2025-01-21
- Publication Date: 2026-06-25
AI Technical Summary
In distributed systems, when the master node fails, existing technologies require manual intervention to switch between master and slave nodes, which is time-consuming, labor-intensive, and prone to errors. Furthermore, configuration information updates also require manual intervention, affecting the availability and stability of the system.
Heartbeat detection is performed using a container orchestration system, master node election is conducted using a lease mechanism, health checks are performed using probes, and node configurations are automatically synchronized to achieve master-slave switching and configuration updates.
It enables automatic master-slave failover, rapid recovery, and configuration updates in distributed systems, improving system stability and availability, reducing manual intervention, and increasing automation.
Description
Method for providing distributed system election by container orchestration system
Technical Field
[0001] This invention relates to the field of container orchestration technology, and more particularly to a method for providing distributed system election by a container orchestration system.
Background Technology
[0002] With the rapid development of cloud computing, container orchestration systems are being used more and more widely in distributed systems. However, in distributed systems, how to quickly and automatically perform master-slave failover when the master node fails, in order to ensure the availability and stability of the system, has always been a problem that needs to be solved.
[0003] 1. Automatic master-slave failover when the master node fails: In traditional distributed systems, a master node failure usually requires manual intervention to perform the master-slave failover, which is time-consuming, labor-intensive, and error-prone.
[0004] 2. Rapid recovery from master node failure: In a distributed system, a master node failure may cause a system shutdown or service interruption, so rapid recovery is required.
[0005] 3. Automatic configuration update when the master node fails: In a distributed system, updating configuration information usually requires manual intervention. After the master node fails and the master-slave switch completes, the configuration information needs to be updated automatically to keep the system running normally.
[0006] 4. Notifying other nodes when the master node fails: In a distributed system, after the master node fails and the master-slave switch completes, the other nodes need to be notified and the corresponding configuration information updated.
Technical Problem
[0007] The purpose of this invention is to address the aforementioned technical problems by proposing a method for distributed system election in a container orchestration system.
[0008] A method for providing distributed system election by a container orchestration system, comprising the following steps:
[0009] S1: Utilize a container orchestration system for heartbeat detection;
[0010] S2: Use the lease mechanism built into the container orchestration system to elect the master node;
[0011] S3: Switch the tasks on the node that failed heartbeat detection to the master node;
[0012] S4: Automatically synchronize nodes using the configuration tools of the container orchestration system.
[0013] Furthermore, in the method, step S1 includes the following sub-steps:
[0014] S11: Collect node resource usage information through the container orchestration system;
[0015] The node resources include CPU, memory, network, and disk I/O;
[0016] S12: Perform periodic health checks using probes;
[0017] The probes include Kubernetes liveness and readiness probes;
[0018] S13: Set the heartbeat detection interval to 2 seconds by default, and determine whether a node is abnormal from the number of retries after a failed heartbeat detection.
[0019] Furthermore, in the method, step S13 includes the following sub-steps:
[0020] If the heartbeat detection fails and the number of retries is ≤ 3, the node is still considered normal;
[0021] If the heartbeat detection fails and the number of retries exceeds 3, the node is considered abnormal and a re-election is triggered.
[0022] Furthermore, in the method, step S2 includes the following sub-steps:
[0023] S21: During the election, each node instance continuously attempts to update the holder and renewal time of the Lease resource;
[0024] S22: Once a node instance has successfully updated the holder and renewal time of the Lease resource, the other node instances find during their checks that the Lease is occupied and no longer attempt to take over as master;
[0025] S23: The node instance that successfully acquires the Lease becomes the master node, and each node instance registers in the Lease holder field under a unique node name;
[0026] S24: When a node instance restarts, it confirms whether it holds control by comparing its own ID with the Lease holder's ID.
[0027] Furthermore, in the method, step S3 includes the following sub-steps:
[0028] S31: After the master node election is confirmed complete, the orchestration system switches the tasks originally handled by the faulty node to the master node, ensuring the stability and availability of the system;
[0029] S32: Safely pause and isolate the tasks on the faulty node, and keep the stateful data of the master node instance and the member node instances continuously synchronized through shared data storage.
[0030] Furthermore, in the method, step S4 includes the following sub-steps:
[0031] S41: Using the configuration tool of the container orchestration system, update the role configuration of the new master node and mark the original faulty node as a standby or offline node;
[0032] S42: Use the configuration tool to automatically synchronize and publish the configuration, updating all nodes related to the original faulty node and the new master node.
Beneficial Effects
[0033] The beneficial effects of this invention are as follows: the container orchestration system manages and coordinates the components of the distributed system, which improves its stability and availability; the election mechanism for selecting a master node gives the distributed system an automatic repair capability; and automating task switching and configuration updates reduces manual intervention and raises the automation level of the distributed system.
Description of the Drawings
[0034] Figure 1 is a node relationship diagram for the method for providing distributed system election by a container orchestration system.
Best Embodiment of the Invention
[0035] To provide a clearer understanding of the technical features, objectives, and effects of the present invention, specific embodiments of the present invention will now be described with reference to the accompanying drawings.
[0036] As shown in Figure 1, the method for providing distributed system election by a container orchestration system comprises the following steps:
[0037] S1: Utilize a container orchestration system for heartbeat detection;
[0038] S2: Use the lease mechanism built into the container orchestration system to elect the master node;
[0039] S3: Switch the tasks on the node that failed heartbeat detection to the master node;
[0040] S4: Automatically synchronize nodes using the configuration tools of the container orchestration system.
[0041] Furthermore, in the method, step S1 includes the following sub-steps:
[0042] S11: Collect node resource usage information through the container orchestration system;
[0043] The node resources include CPU, memory, network, and disk I/O;
[0044] S12: Perform periodic health checks using probes;
[0045] The probes include Kubernetes liveness and readiness probes;
[0046] S13: Set the heartbeat detection interval to 2 seconds by default, and determine whether a node is abnormal from the number of retries after a failed heartbeat detection.
[0047] Furthermore, in the method, step S13 includes the following sub-steps:
[0048] If the heartbeat detection fails and the number of retries is ≤ 3, the node is still considered normal;
[0049] If the heartbeat detection fails and the number of retries exceeds 3, the node is considered abnormal and a re-election is triggered.
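The retry rule of step S13 can be illustrated with a minimal Python sketch. It is not part of the disclosed method: the names `HeartbeatMonitor` and `watch` are hypothetical, and the sketch assumes that every consecutive failed check counts as one retry and that a successful heartbeat resets the counter, neither of which is stated in the specification.

```python
import time
from dataclasses import dataclass
from typing import Callable

HEARTBEAT_INTERVAL_SECONDS = 2  # default interval from step S13
MAX_RETRIES = 3                 # more than 3 failed retries marks the node abnormal


@dataclass
class HeartbeatMonitor:
    """Counts consecutive failed heartbeats for one node (hypothetical helper)."""
    probe: Callable[[], bool]   # returns True when the node answers the heartbeat
    retries: int = 0

    def check(self) -> bool:
        """Run one heartbeat; return True once the node is considered abnormal."""
        if self.probe():
            self.retries = 0    # assumption: a success resets the retry counter
            return False
        self.retries += 1       # assumption: each consecutive failure is one retry
        return self.retries > MAX_RETRIES


def watch(monitor: HeartbeatMonitor,
          trigger_reelection: Callable[[], None],
          sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll at the heartbeat interval until the node is abnormal, then re-elect."""
    while not monitor.check():
        sleep(HEARTBEAT_INTERVAL_SECONDS)
    trigger_reelection()
```

With a probe that answers once and then fails four times in a row, five successive calls to `check()` return `False, False, False, False, True`: the first three failures stay within the retry limit and the fourth triggers the abnormal state.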
[0050] Furthermore, in the method, step S2 includes the following sub-steps:
[0051] S21: During the election, each node instance continuously attempts to update the holder and renewal time of the Lease resource;
[0052] S22: Once a node instance has successfully updated the holder and renewal time of the Lease resource, the other node instances find during their checks that the Lease is occupied and no longer attempt to take over as master;
[0053] S23: The node instance that successfully acquires the Lease becomes the master node, and each node instance registers in the Lease holder field under a unique node name;
[0054] S24: When a node instance restarts, it confirms whether it holds control by comparing its own ID with the Lease holder's ID.
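Sub-steps S21 to S24 can be sketched with an in-memory stand-in for the Lease resource. This is an illustrative assumption, not the orchestration system's API: the `Lease` class, its `duration_seconds` value, and the expiry rule are hypothetical, and a thread lock replaces the conflict detection that a real API server would provide.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for a Lease resource (holder plus renewal time)."""
    duration_seconds: float = 15.0          # assumed value, not from the specification
    holder: Optional[str] = None
    renew_time: float = 0.0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def try_acquire_or_renew(self, node_name: str, now: float) -> bool:
        """S21-S23: try to write holder and renewal time; True means this node is master."""
        with self._lock:
            expired = now - self.renew_time > self.duration_seconds
            if self.holder in (None, node_name) or expired:
                self.holder = node_name
                self.renew_time = now
                return True
            return False                    # S22: the Lease is occupied by a live holder

    def is_held_by(self, node_name: str) -> bool:
        """S24: after a restart, compare this instance's ID with the Lease holder's ID."""
        with self._lock:
            return self.holder == node_name
```

For example, with a fresh `Lease()`, `try_acquire_or_renew("node-a", now=0.0)` returns `True`, a later attempt by `"node-b"` at `now=5.0` returns `False` because the Lease is still occupied, and an attempt by `"node-b"` at `now=30.0` returns `True` because `"node-a"` stopped renewing and the assumed 15-second duration has elapsed.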
[0055] Furthermore, in the method, step S3 includes the following sub-steps:
[0056] S31: After the master node election is confirmed complete, the orchestration system switches the tasks originally handled by the faulty node to the master node, ensuring the stability and availability of the system;
[0057] S32: Safely pause and isolate the tasks on the faulty node, and keep the stateful data of the master node instance and the member node instances continuously synchronized through shared data storage.
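A rough sketch of the task switch in S31 and S32 follows. The `Cluster` class is hypothetical bookkeeping rather than an orchestrator interface, and the synchronization of stateful data through shared storage is not modeled; the sketch only shows one plausible ordering, isolating the faulty node before moving its tasks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Cluster:
    """Hypothetical bookkeeping for task placement; not an orchestrator API."""
    tasks: Dict[str, List[str]] = field(default_factory=dict)  # node name -> task names
    isolated: Set[str] = field(default_factory=set)            # nodes fenced off after failure

    def fail_over(self, failed_node: str, master_node: str) -> List[str]:
        """Isolate the faulty node (S32) and move its tasks to the master node (S31)."""
        if failed_node == master_node:
            raise ValueError("the faulty node cannot also be the elected master")
        self.isolated.add(failed_node)          # fence first so no task runs in two places
        moved = self.tasks.pop(failed_node, [])
        self.tasks.setdefault(master_node, []).extend(moved)
        return moved
```

Isolating before reassigning is a design choice of this sketch: it avoids a window in which the same task could be active on both the faulty node and the master node.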
[0058] Furthermore, in the method, step S4 includes the following sub-steps:
[0059] S41: Using the configuration tool of the container orchestration system, update the role configuration of the new master node and mark the original faulty node as a standby or offline node;
[0060] S42: Use the configuration tool to automatically synchronize and publish the configuration, updating all nodes related to the original faulty node and the new master node.
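The role update of S41 and the publication of S42 might look like the following sketch. The functions `update_roles` and `publish`, the role strings, and the shape of the per-node configuration are assumptions made for illustration; the specification does not name a particular configuration tool or format.

```python
from typing import Dict

Roles = Dict[str, str]  # node name -> "master" | "member" | "standby" | "offline"


def update_roles(roles: Roles, new_master: str, failed_node: str,
                 failed_role: str = "standby") -> Roles:
    """S41: return a new role map with the new master promoted and the faulty node demoted."""
    if failed_role not in ("standby", "offline"):
        raise ValueError("failed_role must be 'standby' or 'offline'")
    updated = dict(roles)                 # copy so the previous configuration is untouched
    updated[failed_node] = failed_role
    updated[new_master] = "master"
    return updated


def publish(roles: Roles) -> Dict[str, Dict[str, str]]:
    """S42: build the configuration each node would receive after synchronization."""
    master = next(name for name, role in roles.items() if role == "master")
    return {name: {"role": role, "master": master} for name, role in roles.items()}
```

Starting from `{"node-a": "master", "node-b": "member", "node-c": "member"}`, calling `update_roles(roles, "node-b", "node-a")` yields `{"node-a": "standby", "node-b": "master", "node-c": "member"}`, and `publish` then gives every node a configuration that points at `"node-b"` as master.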
[0061] Specific Implementation Example 1: Solution to the Problems of the Prior Art
[0062] 1. Automatic master-slave failover when the master node fails in a distributed system: The container orchestration system of this invention can automatically detect the status of the master node and automatically start the election process when the master node fails, electing a new master node and realizing automatic master-slave failover.
[0063] 2. Rapid recovery from master node failure in distributed systems: The container orchestration system of this invention can quickly elect a new master node when the master node fails, update the configuration file and notify other nodes, ensuring that the system can quickly resume normal operation.
[0064] 3. Automatic configuration update when the master node fails in a distributed system: The container orchestration system of this invention can automatically update the configuration file after the election is completed, write the information of the new master node into the configuration file, and mark the information of the original master node as a standby node, so as to ensure that the system can be correctly configured and run.
[0065] 4. Notification of other nodes when the master node fails in a distributed system: The container orchestration system of this invention can notify other nodes through broadcast messages, informing them that a new master node has been elected and updating the corresponding configuration information, ensuring that other nodes can obtain the latest configuration information in a timely manner and make corresponding adjustments, thus ensuring the normal operation of the system.
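The broadcast notification described in item 4 can be sketched as a simple publish/subscribe helper. The class `MasterChangeBroadcaster` is hypothetical and runs in a single process; a real deployment would deliver the message over the network, and the specification does not say which transport is used.

```python
from typing import Callable, Dict, List

Listener = Callable[[str], None]  # receives the name of the new master node


class MasterChangeBroadcaster:
    """Hypothetical in-process broadcaster; stands in for a network broadcast."""

    def __init__(self) -> None:
        self._listeners: Dict[str, Listener] = {}

    def subscribe(self, node_name: str, listener: Listener) -> None:
        """Register the callback a node uses to refresh its configuration."""
        self._listeners[node_name] = listener

    def announce(self, new_master: str) -> List[str]:
        """Notify every node except the new master; return the names notified."""
        notified = []
        for node_name, listener in self._listeners.items():
            if node_name == new_master:
                continue                  # the new master already knows its own role
            listener(new_master)
            notified.append(node_name)
        return notified
```

Each listener would typically update the node's local record of the master, which corresponds to the configuration adjustment the paragraph describes.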
[0066] This solution provides a method for distributed system election through a container orchestration system. By using the container orchestration system to manage and coordinate the various components of the distributed system, the stability and availability of the distributed system are improved. An election mechanism is used to select the master node, ensuring the automatic repair capability of the distributed system. Automated processing of task switching and configuration updates reduces the trouble of manual intervention and improves the automation level of the distributed system.
[0067] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of this invention is defined by the appended claims and their equivalents.
Claims
1. A method for providing distributed system election by a container orchestration system, characterized in that it comprises the following steps: S1: Utilize a container orchestration system for heartbeat detection; S2: Use the lease mechanism built into the container orchestration system to elect the master node; S3: Switch the tasks on the node that failed heartbeat detection to the master node; S4: Automatically synchronize nodes using the configuration tools of the container orchestration system.
2. The method for providing distributed system election by a container orchestration system according to claim 1, characterized in that step S1 includes the following sub-steps: S11: Collect node resource usage information through the container orchestration system, wherein the node resources include CPU, memory, network, and disk I/O; S12: Perform periodic health checks using probes, wherein the probes include Kubernetes liveness and readiness probes; S13: Set the heartbeat detection interval to 2 seconds by default, and determine whether a node is abnormal from the number of retries after a failed heartbeat detection.
3. The method for providing distributed system election by a container orchestration system according to claim 2, characterized in that step S13 includes the following sub-steps: if the heartbeat detection fails and the number of retries is ≤ 3, the node is still considered normal; if the heartbeat detection fails and the number of retries exceeds 3, the node is considered abnormal and a re-election is triggered.
4. The method for providing distributed system election by a container orchestration system according to claim 1, characterized in that step S2 includes the following sub-steps: S21: During the election, each node instance continuously attempts to update the holder and renewal time of the Lease resource; S22: Once a node instance has successfully updated the holder and renewal time of the Lease resource, the other node instances find during their checks that the Lease is occupied and no longer attempt to take over as master; S23: The node instance that successfully acquires the Lease becomes the master node, and each node instance registers in the Lease holder field under a unique node name; S24: When a node instance restarts, it confirms whether it holds control by comparing its own ID with the Lease holder's ID.
5. The method for providing distributed system election by a container orchestration system according to claim 1, characterized in that step S3 includes the following sub-steps: S31: After the master node election is confirmed complete, the orchestration system switches the tasks originally handled by the faulty node to the master node, ensuring the stability and availability of the system; S32: Safely pause and isolate the tasks on the faulty node, and keep the stateful data of the master node instance and the member node instances continuously synchronized through shared data storage.
6. The method for providing distributed system election by a container orchestration system according to claim 1, characterized in that step S4 includes the following sub-steps: S41: Using the configuration tool of the container orchestration system, update the role configuration of the new master node and mark the original faulty node as a standby or offline node; S42: Use the configuration tool to automatically synchronize and publish the configuration, updating all nodes related to the original faulty node and the new master node.