Program for managing failure in a network of nodes based on a local strategy

The technology applies to networks of nodes. It addresses the large loss of computation time and the complexity associated with handling failures, and aims to reduce downtime, minimize interference, and keep storage simple.

Inactive Publication Date: 2019-12-31
BULL SA

AI Technical Summary

Problems solved by technology

[0008] The second level L2 performs a first intermediate backup by duplication on a partner node. It is less simple and slightly more expensive, and it causes a greater loss of computation time for failures that can only be recovered at the second level L2.
[0009] The third level L3 performs a second intermediate backup using Reed-Solomon encoding. It is less simple still and again slightly more expensive, and it causes an even greater loss of computation time for failures that can only be recovered at the third level L3.
[0010] The fourth level L4 performs a global backup at the file-system level. It is complex and very expensive, and it causes a really significant loss of computation time for failures that can only be recovered at the fourth level L4.
[0011] From the local level L1 to the global level L4, the backup becomes more powerful and flexible, but also more complex and expensive.
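
The ordering of the levels can be illustrated with a minimal sketch. It assumes that each failure has a lowest level at which it can be recovered and that higher levels can also recover it, so the cheapest sufficient level is chosen. The `relative_cost` values are hypothetical ordinal ranks, not figures from the text, and the names `BackupLevel` and `recovery_level` are introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupLevel:
    name: str
    mechanism: str
    relative_cost: int  # hypothetical ordinal rank, not a figure from the text


# Levels ordered from local (L1) to global (L4), as in paragraph [0011].
LEVELS = [
    BackupLevel("L1", "local backup", 1),
    BackupLevel("L2", "duplication on a partner node", 2),
    BackupLevel("L3", "Reed-Solomon encoding", 3),
    BackupLevel("L4", "global backup at file-system level", 4),
]


def recovery_level(min_level: str) -> BackupLevel:
    """Return the cheapest level able to recover a failure needing at least `min_level`."""
    names = [level.name for level in LEVELS]
    # Assumption: every level at or above `min_level` can recover the failure.
    candidates = LEVELS[names.index(min_level):]
    return min(candidates, key=lambda level: level.relative_cost)
```

For example, `recovery_level("L2")` selects the partner-node duplication rather than the more expensive levels L3 or L4.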

Examples

Embodiment Construction

[0064] Figure 1 schematically shows an example of a part of a network, comprising a group of nodes and their storage media connected to each other by a PCIe switch, according to one embodiment of the invention. An example of the fault management method according to the invention can be carried out in this part of the network.

[0065] This part of the network comprises several computing nodes 2 (three computing nodes 21, 22 and 23 in the example of Figure 1) and several storage media 3 (three storage media 31, 32 and 33 in the example of Figure 1).

[0066] These computing nodes 2 and their storage media 3 form a group of computing nodes managed by a PCIe switch 1, which connects the computing nodes 2 to their respective storage media 3 via bidirectional PCIe connections 7, 8 or 9. These PCIe connections 7, 8 or 9 may be PCIe multiplex connections. Connection 7 is a 4-way connection. Connection 8 is a 4-way connection. The connections 9 are bi-directional connect...

Abstract

Disclosed is a failure management method in a network of nodes, the method including, for each considered node: first, a step of locally saving the state of this considered node to a storage medium for this considered node; then, if the considered node has failed, a step of retrieving the local backup of the state of this considered node by redirecting the link between the considered node and its storage medium so as to connect this storage medium to an operational node other than the considered node, this operational node being already in the process of carrying out this calculation, the local backups of these considered nodes used for the retrieving steps being coherent with each other so as to correspond to the same state of calculation; and, if at least one considered node has failed, a step of returning this local backup for this considered node to a new additional node added to the network at the time of the failure.
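
The sequence in the abstract can be sketched as a small simulation. This is a toy model under stated assumptions, not the patented implementation: the switch is reduced to a table of storage-to-node links, the node state is a plain dictionary, and coherence is checked by comparing a step counter stored with each backup. All class, function and object names (`Switch`, `save_locally`, `recover`, `node24`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StorageMedium:
    name: str
    backup: Optional[dict] = None  # last local backup: {"step": int, "data": dict}


@dataclass
class Node:
    name: str
    state: dict = field(default_factory=dict)
    failed: bool = False


class Switch:
    """Toy model of the switch: maps each storage medium to the node it is linked to."""

    def __init__(self) -> None:
        self.links: Dict[str, str] = {}

    def connect(self, storage: StorageMedium, node: Node) -> None:
        self.links[storage.name] = node.name


def save_locally(node: Node, storage: StorageMedium, step: int) -> None:
    """Save the node's state to its own storage medium, tagged with the step."""
    storage.backup = {"step": step, "data": dict(node.state)}


def recover(storage: StorageMedium, helper: Node, spare: Node, switch: Switch) -> int:
    """Redirect the failed node's storage medium to an operational node (helper),
    then hand the saved state to a newly added node (spare)."""
    if helper.failed:
        raise ValueError("helper node must be operational")
    if storage.backup is None:
        raise ValueError("no local backup to recover")
    switch.connect(storage, helper)             # redirect the link
    spare.state = dict(storage.backup["data"])  # return the backup to the new node
    return storage.backup["step"]


# Usage with three nodes and three storage media (names are illustrative).
switch = Switch()
nodes = [Node(f"node{i}", {"partial_sum": i * 10}) for i in (21, 22, 23)]
media = [StorageMedium(f"storage{i}") for i in (31, 32, 33)]
for node, medium in zip(nodes, media):
    switch.connect(medium, node)
    save_locally(node, medium, step=5)

nodes[0].failed = True   # node21 fails
spare = Node("node24")   # new node added at the time of the failure
restored_step = recover(media[0], nodes[1], spare, switch)

# Coherence check: every local backup corresponds to the same computation step.
coherent = all(m.backup["step"] == restored_step for m in media)
```

The failed node itself is never involved in the recovery: only the link of its storage medium changes, which is what lets an operational node read the backup.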

Description

Technical field

[0001] The invention relates to a method for managing failures in a network of nodes, and to the part of a network of nodes associated with this failure management.

Background technique

[0002] Backups are performed at one or more levels in a network of nodes performing the same computation. These are therefore multilevel backups. When a failure occurs, the computation can be at least partially restored without a full restart, simply thanks to the backups. Depending on the type of failure, a certain level of backup is used to restore the computation partially, mostly, or almost completely.

[0003] Distributed applications can run for much longer than the average time without a failure in the network, also known as the MTBF ("mean time between failures") of the cluster, so they have many opportunities to be interrupted. Typically, they have no internal fault management solution, which can lead to loss of local backup data in the event of a physical failure of a compute node. All computation...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L12/24, H04L29/08
CPC: H04L41/0654, H04L41/0893, H04L67/1095, G06F3/0635, G06F9/5061, G06F11/2033, G06F11/1438, G06F11/2035, H04L41/0668, H04L41/0856
Inventor: Guillaume Lepoutere, Emmanuel Brelle, Florent Germain, Piotr Lesnicki
Owner: BULL SA