Decentralized HPC computing cluster management method and system based on paxos algorithm

A decentralized computing cluster technology in the field of HPC computing cluster management methods and systems. It addresses increased job scheduling pressure, poor scalability, and the difficulty of supporting high-concurrency monitoring scenarios, with the effect of improving computing capability and availability.

Active Publication Date: 2020-05-26
DAWNING INFORMATION IND BEIJING +1
Cites: 5; Cited by: 2

AI Technical Summary

Problems solved by technology

In the working mode of a single-master cluster, all jobs can only be submitted and scheduled through the main management node. When the cluster is small, queuing multiple jobs eases the job scheduling pressure. When the supercomputer is large enough, however, computing power is no longer the bottleneck; instead, the scheduling capacity and availability of the main management node become the new bottleneck. In particular, when many small jobs are submitted with high concurrency, the job scheduling pressure increases exponentially. The same is true of cluster computing resource monitoring: the load of collecting data is shifted onto the management node, which makes ultra-large-scale, high-concurrency monitoring scenarios difficult to support.
Because existing high-performance computing clusters use a job scheduling system with a single management node, the job scheduler cannot achieve load balancing. Scalability is poor, or expansion is not supported at all, and management nodes cannot be added at will.

Method used

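A main management node and multiple standby management nodes are deployed, and a cluster management election mechanism is set up. When the reply to the heartbeat connection sent by the main management node exceeds a preset value, the standby management nodes hold an election according to the paxos algorithm to generate a new main management node. The original main management node then goes offline, and the new main management node performs heartbeat monitoring on the remaining standby management nodes.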


Examples


Embodiment 1

[0041] As shown in Figures 1-2, a decentralized HPC computing cluster management method based on the paxos algorithm includes:

[0042] deploying a main management node and multiple standby management nodes, and setting up a cluster management election mechanism;

[0043] the cluster management election mechanism includes: when the reply to the heartbeat connection sent by the main management node exceeds the preset value, the standby management nodes hold an election according to the paxos algorithm to generate a new main management node;

[0044] the original main management node goes offline, and the new main management node performs heartbeat monitoring on the remaining standby management nodes.
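The election trigger in [0043] can be illustrated with a simple heartbeat-timeout failure detector. This is a minimal Python sketch under stated assumptions. The patent does not say how the "preset value" is measured, so it is assumed here to be a timeout in seconds. `HeartbeatMonitor`, `watch`, `record_reply`, `unwatch`, and `overdue` are hypothetical names and are not taken from the patent.

```python
import time
from typing import Callable, Dict, List, Optional


class HeartbeatMonitor:
    """Hypothetical failure detector: flags nodes whose heartbeat reply is overdue."""

    def __init__(self, timeout_s: float, clock: Optional[Callable[[], float]] = None):
        self.timeout_s = timeout_s            # assumed meaning of the "preset value", in seconds
        self.clock = clock or time.monotonic  # injectable clock so the logic can be tested
        self.last_reply: Dict[str, float] = {}

    def watch(self, node_id: str) -> None:
        """Start monitoring a node; 'now' counts as its most recent reply."""
        self.last_reply[node_id] = self.clock()

    def record_reply(self, node_id: str) -> None:
        """Call whenever a heartbeat reply arrives from node_id."""
        self.last_reply[node_id] = self.clock()

    def unwatch(self, node_id: str) -> None:
        """Stop monitoring a node, e.g. one that has been taken offline."""
        self.last_reply.pop(node_id, None)

    def overdue(self) -> List[str]:
        """Return monitored nodes whose last reply is older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_reply.items() if now - t > self.timeout_s)


# A standby node watching the main node, driven by a fake clock.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=3.0, clock=lambda: fake_now[0])
monitor.watch("main-0")
fake_now[0] = 2.0
monitor.record_reply("main-0")
fake_now[0] = 4.5
print(monitor.overdue())  # [] (2.5 s since the last reply)
fake_now[0] = 5.5
print(monitor.overdue())  # ['main-0'] (3.5 s > 3.0 s: time to start an election)
```

In step [0044], the newly elected main node could reuse the same detector by calling `watch` for each remaining standby node.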

[0045] The cluster management method of the present invention is mainly based on the paxos algorithm for multi-cluster election. During deployment, multiple standby management nodes can be deployed, and each management node can perform job scheduling; the main management node monitor...
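Paragraph [0045] states that the election is based on the paxos algorithm but gives no protocol details. The following sketch shows a textbook single-decree Paxos round in which a standby node proposes itself as the new main management node. It is an in-memory simulation under explicit assumptions. Messages are direct method calls, and the offline main node is modelled as an unreachable acceptor. Ballot numbers are assumed to be unique per proposer (for example `round * cluster_size + node_index`). `Acceptor` and `propose` are hypothetical names, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Paxos acceptor state kept by one management node (in-memory simulation)."""
    online: bool = True
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) most recently accepted

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        """Phase 1b: promise to ignore lower ballots and report any earlier acceptance."""
        if self.online and ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2b: accept unless a higher ballot has already been promised."""
        if self.online and ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, candidate: str) -> Optional[str]:
    """One single-decree Paxos round; returns the chosen main node or None."""
    quorum = len(acceptors) // 2 + 1  # majority of the full configured membership
    # Phase 1a: send prepare(ballot) and keep the replies that were promises.
    promises = [prev for ok, prev in (a.prepare(ballot) for a in acceptors) if ok]
    if len(promises) < quorum:
        return None
    # If some acceptor already accepted a value, re-propose the highest-ballot one.
    earlier = [p for p in promises if p is not None]
    value = max(earlier, key=lambda p: p[0])[1] if earlier else candidate
    # Phase 2a: send accept(ballot, value); the value is chosen on a majority of acks.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None


# Five management nodes; node 0 was the main node and is now offline.
nodes = [Acceptor() for _ in range(5)]
nodes[0].online = False
print(propose(nodes, ballot=1, candidate="mgr-1"))  # mgr-1
# A later, higher-ballot proposal from another standby cannot overturn the result.
print(propose(nodes, ballot=7, candidate="mgr-3"))  # mgr-1
```

A round that returns `None` would be retried with a higher ballot. Real deployments also need message timeouts and durable storage of `promised` and `accepted`, both of which this sketch omits.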

Embodiment 2

[0069] The present invention also provides a decentralized HPC computing cluster system based on the paxos algorithm, which is used to implement the method of Embodiment 1. As shown in Figure 1, the cluster system includes a main management node and multiple standby management nodes, and the main management node and the managed nodes automatically generate management nodes according to the cluster management election mechanism preset by the cluster system.

[0070] The cluster management election mechanism includes:

[0071] when the reply to the heartbeat connection sent by the main management node exceeds the preset value, the standby management nodes hold an election according to the paxos algorithm to generate a new main management node;

[0072] the original main management node goes offline, and the new main management node performs heartbeat monitoring on the remaining standby management nodes.
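Once an election has finished, step [0072] reduces to a membership change. The following minimal sketch assumes that the management nodes share a simple membership record. `ClusterView` and `apply_election` are hypothetical names. The source does not specify how this record is replicated between nodes (for instance, whether it goes through the same paxos instance).

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ClusterView:
    """Hypothetical membership record shared by the management nodes."""
    main: str
    standbys: Set[str] = field(default_factory=set)
    offline: Set[str] = field(default_factory=set)

    def apply_election(self, new_main: str) -> Set[str]:
        """Take the old main offline, promote new_main, and return the nodes it must monitor."""
        if new_main not in self.standbys:
            raise ValueError(f"{new_main} is not a standby management node")
        self.offline.add(self.main)      # the original main management node goes offline
        self.standbys.discard(new_main)  # the winner leaves the standby pool
        self.main = new_main
        return set(self.standbys)        # remaining standbys to heartbeat-monitor


view = ClusterView(main="mgr-0", standbys={"mgr-1", "mgr-2", "mgr-3"})
monitored = view.apply_election("mgr-2")
print(view.main, sorted(monitored), sorted(view.offline))
# mgr-2 ['mgr-1', 'mgr-3'] ['mgr-0']
```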

[0073] Referring to Figure 2, specifically, the cluster...



Abstract

The invention discloses a decentralized HPC computing cluster management method and system based on the paxos algorithm. The method comprises deploying a main management node and a plurality of standby management nodes and setting up a cluster management election mechanism. Under this mechanism, when the reply to the heartbeat connection sent by the main management node exceeds a preset value, the standby management nodes carry out an election according to the paxos algorithm to generate a new main management node. The original main management node is then taken offline, and the new main management node performs heartbeat monitoring on the remaining standby management nodes. With the invention, an HPC high-performance job scheduling cluster can be changed from a single-master centralized cluster mode to a decentralized cluster mode. This change greatly improves the availability of the cluster and avoids the single-point-of-failure limitation of the single-master centralized mode. It also improves the fault tolerance of the cluster by several orders of magnitude, making the cluster better suited to real-world scenarios. The cluster gains automatic high availability, and this is achieved without a third-party tool.
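Standard Paxos quorum arithmetic makes the fault-tolerance claim concrete. This is a general property of majority-quorum Paxos, not a figure taken from the patent. With n management nodes voting, an election needs a majority of floor(n/2) + 1 nodes, so it can still complete when up to n - (floor(n/2) + 1) nodes are down. A single-master cluster, by contrast, cannot survive the loss of its one management node. `paxos_fault_tolerance` is a hypothetical helper name.

```python
from typing import Tuple


def paxos_fault_tolerance(n: int) -> Tuple[int, int]:
    """Return (majority quorum size, max crashed nodes tolerated) for n paxos acceptors."""
    if n < 1:
        raise ValueError("need at least one management node")
    quorum = n // 2 + 1
    return quorum, n - quorum


for n in (1, 3, 4, 5, 7):
    q, f = paxos_fault_tolerance(n)
    print(f"{n} management nodes: quorum {q}, tolerates {f} failure(s)")
```

An even node count adds no tolerance over the next lower odd count (four nodes tolerate one failure, the same as three), which is why odd-sized management groups are the usual choice.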

Description

Technical field

[0001] The present invention relates to the technical field of computer data processing, and specifically to a decentralized HPC computing cluster management method and system based on the paxos algorithm.

Background

[0002] With the country's vigorous promotion of informatization reform, China's supercomputer construction now ranks among the best in the world, and there are more and more national-level supercomputing centers. This places ever higher requirements on software such as the job scheduling systems and cluster monitoring systems that run on supercomputers. The HPC software architecture adopted when clusters were small cannot cope with larger-scale scheduling and computing resource monitoring. As a result, the hardware and the software system are mismatched, which limits the actual computing performance of the whole computing cluster at the software level. The current HPC cluster product software is basically a master-slave cluster archi...


Application Information

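Patent type & authority: Applications (China)
IPC (8): H04L12/24; H04L12/26; H04L29/08
CPC: H04L41/042; H04L67/1008; H04L43/10
Inventors: 解文龙 (Xie Wenlong), 张晋锋 (Zhang Jinfeng), 张永生 (Zhang Yongsheng), 刘瑞贤 (Liu Ruixian), 李斌 (Li Bin), 历军 (Li Jun)
Owner: DAWNING INFORMATION IND BEIJING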