Mirroring and recovering method of high-performance computing cluster node

A mirroring and recovery technology for high-performance computing cluster nodes, applied in the fields of error detection, error response, and redundancy in computing. It addresses long operating system installation times, a higher rate of operating system installation failures, and long installation and deployment times for high-performance computing cluster systems. Its effects are faster mirroring and recovery, no secondary modification operations, and less data copying.

Pending Publication Date: 2021-03-05
DAWNING INFORMATION IND BEIJING

AI Technical Summary

Problems solved by technology

The problem with this deployment method is that installing the operating system often takes a long time, especially when many software packages are installed. In addition, once the operating system is installed, secondary configuration such as setting node names and IP addresses is still required.
[0005] For computing nodes that include GPUs or other co-processor accelerator cards and a high-speed interconnect, the corresponding driver packages must be installed after the operating system. These drivers often contain kernel modules, so the computing node has to be restarted several times during installation. This lengthens the installation and deployment of the high-performance computing cluster system and greatly increases the risk that operating system installation fails when a computing node restarts.

Examples

Embodiment 1

[0029] The mirroring and recovery method of this embodiment is described below with reference to Figures 1 and 2.

[0030] As shown in Figure 1, in this embodiment the high-performance computing cluster is provided with a server, which is a diskless boot server. An image extraction node and N deployment nodes (N = 1, 2, ...) are mounted under the diskless boot server. After the image extraction node and the deployment nodes are restarted, they can enter the diskless boot environment. The partition information and the contents of each partition of the image extraction node serve as the template, and each deployment node is partitioned and restored with reference to the image extraction node. In addition, because the partition image files used to restore the deployment nodes are relatively large, to avoid occupying resources on the diskless boot server, an image storage node is set up in the high-performance computing...
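The roles in this paragraph can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the `plan_restore` helper, the node names, the partition sizes, and the image path scheme on the storage node are all hypothetical. It only shows the idea that the layout read from the image extraction node is the template, and that every deployment node gets one restore task per template partition, with the image fetched from the image storage node rather than from the diskless boot server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    mount_point: str   # e.g. "/boot"
    size_mib: int      # partition size in MiB
    filesystem: str    # e.g. "xfs"


@dataclass(frozen=True)
class RestoreTask:
    node: str             # deployment node to restore
    partition: Partition  # layout copied from the template node
    image_source: str     # where the partition image is fetched from


def plan_restore(template_layout, deployment_nodes, storage_node):
    """Give every deployment node the template layout, one task per partition."""
    tasks = []
    for node in deployment_nodes:
        for index, part in enumerate(template_layout, start=1):
            # Hypothetical naming scheme for image files on the storage node.
            source = f"{storage_node}:/images/part{index}.img"
            tasks.append(RestoreTask(node, part, source))
    return tasks


# Layout as it might be read from the image extraction node (made-up values).
template = [
    Partition("/boot", 1024, "xfs"),
    Partition("/", 51200, "xfs"),
]
nodes = [f"deploy{i:02d}" for i in range(1, 4)]  # N = 3 deployment nodes
tasks = plan_restore(template, nodes, "imgstore")

for task in tasks[:2]:
    print(task.node, task.partition.mount_point, task.image_source)
```

Because every task points at the storage node, the diskless boot server in this sketch only has to boot the nodes and is not loaded with serving large image files.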

Embodiment 2

[0113] As shown in Figure 3, this embodiment provides another method for mirroring and restoring high-performance computing cluster nodes. In the clusters to which this method applies, the image extraction node and the image storage node are not set up separately. Instead, a partition image file saved in advance on the diskless boot server is used to upgrade and restore the deployment nodes. The partition table information of the deployment nodes is also stored on the diskless boot server.

[0114] The mirroring and recovery method includes:

[0115] 201. Build a diskless boot server so that the image extraction node and the deployment nodes can boot the system disklessly. This step is the same as in Embodiment 1 and is not repeated here.

[0116] 202. Transfer the previously saved partition image file to an external image storage node for storage. In this embodiment, it is assumed that there is a previo...

Abstract

The invention discloses a mirroring and recovery method for high-performance computing cluster nodes. The cluster is provided with a server, and at least two deployment nodes are mounted on the server. The mirroring and recovery method comprises the following steps. Step 1: initialize the network services of the server, establish the diskless boot system of the server, and build the diskless boot image file of the diskless boot system on the server. Step 2: the server acquires preset partition table information, generates a partition image file according to the preset partition table information and the diskless boot image file, and sends the preset partition table information and the partition image file to a deployment node. Step 3: the deployment node formats and partitions its disk according to the preset partition table information, and the partitioned deployment node performs system recovery using the partition image file. With this technical scheme, the local hard disk can be mounted conveniently and flexibly, and the risk of failure from repeated operating system and driver installation is avoided.
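The order of the three steps in the abstract can be sketched as a small in-memory simulation in Python. This is an illustration under stated assumptions, not the patent's implementation: the `Server` and `DeploymentNode` classes, the method names, the file names, and the partition sizes are hypothetical, and no real disk, network, or boot operation is performed.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    boot_image: str = ""        # diskless boot image built in step 1
    network_ready: bool = False

    def step1_prepare(self):
        """Step 1: initialize network services and build the diskless boot image."""
        self.network_ready = True
        self.boot_image = "diskless-boot.img"  # placeholder file name

    def step2_build_payload(self, partition_table):
        """Step 2: derive one partition image per entry of the preset table."""
        if not self.boot_image:
            raise RuntimeError("step 1 must run before step 2")
        images = {name: f"{name}.img" for name in partition_table}
        return {"partition_table": dict(partition_table), "images": images}


@dataclass
class DeploymentNode:
    name: str
    partitions: dict = field(default_factory=dict)  # name -> size in MiB
    restored: dict = field(default_factory=dict)    # name -> image used

    def step3_restore(self, payload):
        """Step 3: partition per the received table, then restore each partition."""
        self.partitions = dict(payload["partition_table"])
        for part in self.partitions:
            self.restored[part] = payload["images"][part]


server = Server()
server.step1_prepare()
payload = server.step2_build_payload({"boot": 1024, "root": 51200})

cluster = [DeploymentNode("node01"), DeploymentNode("node02")]
for node in cluster:
    node.step3_restore(payload)

print(cluster[0].restored)
```

The same payload is sent to every deployment node, so each node ends up with the same partition layout and restored contents without a per-node operating system installation.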

Description

Technical field

[0001] The present application relates to the technical field of high-performance computing clusters, and in particular to a method for mirroring and restoring nodes of high-performance computing clusters.

Background

[0002] Modern high-performance computing cluster systems are mostly composed of multiple computers interconnected through high-speed networks, and each computer is called a computing node. With the development of the computer industry and the continuous improvement of computing capabilities, the number of computing nodes in high-performance computing clusters has increased year by year, ranging from dozens to hundreds of thousands. The largest high-performance computing clusters currently include more than a million computing nodes. In recent years, with the rise of heterogeneous computing, in order to improve the floating-point computing capability of a single computin...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/61, G06F11/14
CPC: G06F8/63, G06F11/1464
Inventor: 韩孟之, 解西国, 翟建, 孙建鹏, 况吕林
Owner: DAWNING INFORMATION IND BEIJING