Fault detection method, device, local node, detection system and storage medium

By using the second BFD keep-alive message to cyclically detect other forwarding paths in multi-hop BFD detection, the problem of fault misjudgment caused by untimely route convergence in load routing scenarios is solved, ensuring the stability of inter-node communication and user experience.

CN116886574B · Active · Publication Date: 2026-06-26 · MAIPU COMM TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
MAIPU COMM TECH CO LTD
Filing Date
2023-07-18
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In multi-hop BFD detection, under load routing scenarios, failure to converge routes in a timely manner can lead to misjudgments of faults, causing communication between nodes to stop and affecting user experience.

Method used

The local node periodically sends a first BFD keep-alive message. When it determines that the peer's packet sending path is faulty, it suspends that transmission and uses a second BFD keep-alive message to cyclically detect the other forwarding paths, until either a fault-free specific forwarding path is determined or, once the preset detection period has elapsed, a communication fault is determined. The diagnostic field of the messages is used to adjust the packet sending path.

Benefits of technology

This effectively avoids misjudgments of faults caused by untimely route convergence, ensures normal communication between nodes, and improves user experience.



Abstract

The application provides a fault detection method, device, local node, detection system and storage medium. The fault detection method is applied to a local node. The local node is connected with a peer node in communication and has multiple forwarding paths. When it is determined that a path fault exists in a packet sending path of the peer node, the periodic sending of a first BFD keep-alive packet is suspended. The other forwarding paths are cyclically detected for faults by sending a second BFD keep-alive packet to the peer node. When a third BFD keep-alive packet is received, a specific forwarding path without fault is determined. When the third BFD keep-alive packet is not received within a preset detection time length, it is determined that the communication between the local node and the peer node is faulty. This can find the specific forwarding path in time or determine the inter-node communication fault after the preset detection time length, effectively avoids the fault misjudgment caused by the slow routing convergence, further avoids the inter-node communication suspension caused by the fault misjudgment, and improves the user experience.

Description

Technical Field

[0001] This invention relates to the field of communications, and more specifically, to a fault detection method, apparatus, local node, detection system, and storage medium.

Background Technology

[0002] Bidirectional Forwarding Detection (BFD) is a high-speed fault detection mechanism based on the RFC5880 standard, enabling end-to-end fault detection. The BFD detection mechanism involves establishing a BFD session between two nodes and periodically sending BFD packets along the path between them. If one node does not receive a BFD packet within a predetermined time, the BFD session state changes to Down, indicating a fault has occurred on the path.
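The timeout behaviour described above can be sketched as follows. This is a simplified illustration, not the RFC 5880 state machine: it models only the Up and Down states, and the class name `BfdSession` and the parameter `detect_time_ms` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class BfdSession:
    """Toy model of one end of a BFD session (Up/Down only, an assumption)."""
    detect_time_ms: int        # hypothetical: how long silence is tolerated
    last_rx_ms: int = 0        # timestamp of the last BFD packet received
    state: str = "Up"

    def on_packet(self, now_ms: int) -> None:
        # Record the arrival time of a BFD packet from the other node.
        self.last_rx_ms = now_ms

    def poll(self, now_ms: int) -> str:
        # No packet within the predetermined time: the session goes Down.
        if now_ms - self.last_rx_ms > self.detect_time_ms:
            self.state = "Down"
        return self.state


session = BfdSession(detect_time_ms=300)
session.on_packet(now_ms=1000)
state_in_time = session.poll(now_ms=1200)   # still within the detection time
state_late = session.poll(now_ms=1400)      # detection time exceeded
```

In this sketch a single missed detection window is enough to declare the path faulty, which is the behaviour that causes the misjudgment discussed later in the text.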

[0003] BFD detection is divided into single-hop BFD detection (the two nodes communicate directly) and multi-hop BFD detection (the two nodes communicate through an intermediate node). Multi-hop BFD is a network-wide unified detection mechanism used to quickly detect and monitor the forwarding connectivity of IP routes in the network. In multi-hop BFD detection scenarios, the routes between the two nodes may be load balancing routes: the routing table of one node contains multiple next-hop nodes with the other node as the destination address, meaning there is more than one path between the two nodes.

[0004] In existing technologies, in scenarios where multi-hop BFD detection involves load routing, a BFD session is created between the two endpoints, utilizing one of the paths for fault detection. However, if a fault occurs in the middle of this path or if the outgoing interfaces at both ends of the path experience oscillations, the BFD session may directly determine that the communication between the two endpoints has failed if route convergence is not timely, thus stopping communication between the two endpoints. However, there may still be other communicable paths between the two endpoints at this time, which constitutes a false fault diagnosis by the BFD session.

[0005] Furthermore, if routing convergence is not timely, misjudgment of BFD session failures can lead to excessively long communication interruptions between nodes, affecting user experience.

Summary of the Invention

[0006] The purpose of this invention is to provide a fault detection method, device, local node, detection system, and storage medium to improve the problems existing in the prior art.

[0007] The embodiments of the present invention can be implemented as follows:

[0008] In a first aspect, the present invention provides a fault detection method applied to a local node, wherein the local node and a peer node are communicatively connected and have multiple forwarding paths; the method includes:

[0009] The first BFD keep-alive message is periodically sent to the peer node through the local packet sending path;

[0010] If the first BFD keep-alive message sent by the peer node through the peer packet sending path is not received within a preset period, it is determined that there is a path failure in the peer packet sending path and the periodic sending of the first BFD keep-alive message is suspended; the local packet sending path and the peer packet sending path are either of the forwarding paths.

[0011] Fault detection is performed cyclically on forwarding paths other than the peer's packet sending path by sending a second BFD keep-alive message to the peer node.

[0012] When the third BFD keep-alive message of the peer node is received within the preset detection time, a fault-free specific forwarding path is obtained;

[0013] If the third BFD keep-alive message is not received within the preset detection time, it is determined that there is a communication failure between the local node and the peer node.

[0014] Optionally, the third BFD keep-alive message is sent by the peer node upon receiving the second BFD keep-alive message: the peer node pauses its periodic transmission of the first BFD keep-alive message and returns the third BFD keep-alive message through the forwarding path on which the second BFD keep-alive message was received;

[0015] The specific forwarding path is the forwarding path traversed by the third BFD keep-alive message; after the step of obtaining the fault-free specific forwarding path, the method further includes:

[0016] Resume the periodic transmission of the first BFD keep-alive message on the specific forwarding path;

[0017] Receive the first BFD keep-alive message periodically sent by the peer node through the specific forwarding path.

[0018] Optionally, the local node includes the local outgoing interface for each of the forwarding paths; the step of cyclically performing fault detection on other forwarding paths besides the peer packet sending path by sending a second BFD keep-alive message to the peer node includes:

[0019] From all the local outgoing interfaces, determine all other outgoing interfaces except the local outgoing interface corresponding to the peer packet sending path;

[0020] Within the preset detection time, the second BFD keep-alive message is sent cyclically to the peer node through each of the other outgoing interfaces to perform fault detection on each of the other forwarding paths.

[0021] Optionally, the step of cyclically sending the second BFD keep-alive message to the peer node through each of the other outgoing interfaces within the preset detection time to perform fault detection on each of the other forwarding paths includes:

[0022] Designate one of the other output interfaces as the interface to be tested;

[0023] Within the preset detection time, the second BFD keep-alive message is sent to the peer node through the interface under test to perform fault detection on the forwarding path corresponding to the interface under test;

[0024] Determine whether the third BFD keepalive message is received from the interface under test within the preset period; the third BFD keepalive message is sent by the peer node upon receiving the second BFD keepalive message: the peer node pauses its periodic transmission of the first BFD keepalive message and returns the third BFD keepalive message over the forwarding path on which the second BFD keepalive message was received.

[0025] If the third BFD keep-alive message is received from the interface under test within the preset period, the forwarding path corresponding to the interface under test is determined to be a fault-free specific forwarding path.

[0026] If the third BFD keep-alive message is not received from the interface under test within the preset period, it is determined whether the loop detection time exceeds the preset detection duration.

[0027] If the loop detection time exceeds the preset detection time, then the communication between the local node and the peer node is determined to be faulty.

[0028] If the loop detection time does not exceed the preset detection time, then the next other outgoing interface is taken as the interface to be tested, and the process of sending the second BFD keep-alive message to the peer node through the interface to be tested to perform fault detection on the corresponding forwarding path continues, until either the forwarding path corresponding to the interface to be tested is determined to be a fault-free specific forwarding path or a communication failure between the local node and the peer node is determined.
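The loop in [0022] to [0028] can be sketched roughly as below. This is one possible reading under stated assumptions, not the claimed implementation: the function name `loop_detect` and the callback `probe` are hypothetical, `probe(iface)` stands for "send the second BFD keep-alive message through `iface` and wait one preset period for the third message", and elapsed time is simulated by adding one preset period per failed probe.

```python
from typing import Callable, Optional


def loop_detect(
    other_interfaces: list[str],
    probe: Callable[[str], bool],
    preset_period_ms: int,
    preset_detection_ms: int,
) -> Optional[str]:
    """Return the interface of a fault-free path, or None on communication fault."""
    if preset_period_ms <= 0:
        raise ValueError("preset_period_ms must be positive")
    if not other_interfaces:
        return None
    elapsed_ms = 0
    index = 0
    while True:
        # Take the next other outgoing interface as the interface under test.
        iface = other_interfaces[index % len(other_interfaces)]
        if probe(iface):
            return iface                      # third message received: path is fault-free
        elapsed_ms += preset_period_ms        # one preset period spent waiting
        if elapsed_ms > preset_detection_ms:
            return None                       # detection time exceeded: communication fault
        index += 1


healthy = {"if3"}                             # hypothetical: only if3 still reaches the peer
found = loop_detect(["if2", "if3"], lambda iface: iface in healthy, 100, 500)
```

Wrapping around with the modulo index is an assumption about what "cyclically" means here: the list of other interfaces is retried from the start until the preset detection time runs out.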

[0029] Optionally, the first BFD keep-alive message, the second BFD keep-alive message, and the third BFD keep-alive message all include diagnostic fields;

[0030] In the first BFD keep-alive message, the second BFD keep-alive message, and the third BFD keep-alive message, the diagnostic fields are 0, preset interface change identifier, and preset interface change response identifier, respectively.

[0031] The diagnostic field in the first BFD keep-alive message is used to indicate that the forwarding path through which the first BFD keep-alive message passes is fault-free at the message receiving end;

[0032] The diagnostic field in the second BFD keepalive message is used to indicate that there is a path failure in the peer node's packet sending path, and the peer packet sending path needs to be adjusted to the forwarding path traversed by the second BFD keepalive message;

[0033] The diagnostic field in the third BFD keep-alive message is used to inform the local node that the peer node has received the second BFD keep-alive message, and at the same time instructs the local node to adjust its packet sending path to the forwarding path traversed by the third BFD keep-alive message.

[0034] In a second aspect, the present invention provides a fault detection device applied to a local node, wherein the local node and the peer node are communicatively connected and have multiple forwarding paths; the device includes:

[0035] The periodic transceiver module is used for:

[0036] The first BFD keep-alive message is periodically sent to the peer node through the local packet sending path;

[0037] If the first BFD keep-alive message sent by the peer node through the peer packet sending path is not received within a preset period, it is determined that there is a path failure in the peer packet sending path and the periodic sending of the first BFD keep-alive message is suspended; the local packet sending path and the peer packet sending path are either of the forwarding paths.

[0038] The loop detection module is used for:

[0039] Fault detection is performed cyclically on forwarding paths other than the peer's packet sending path by sending a second BFD keep-alive message to the peer node.

[0040] When the third BFD keep-alive message of the peer node is received within the preset detection time, a fault-free specific forwarding path is obtained;

[0041] If the third BFD keep-alive message is not received within the preset detection time, it is determined that there is a communication failure between the local node and the peer node.

[0042] Optionally, the third BFD keep-alive message is sent by the peer node upon receiving the second BFD keep-alive message: the peer node pauses its periodic transmission of the first BFD keep-alive message and returns the third BFD keep-alive message through the forwarding path on which the second BFD keep-alive message was received;

[0043] The specific forwarding path is the forwarding path traversed by the third BFD keep-alive message; after the loop detection module obtains the fault-free specific forwarding path, the periodic transceiver module is further used for:

[0044] Resume the periodic transmission of the first BFD keep-alive message on the specific forwarding path;

[0045] Receive the first BFD keep-alive message periodically sent by the peer node through the specific forwarding path.

[0046] In a third aspect, the present invention provides a local node, comprising: a memory and a processor, wherein the memory stores a software program, and when the local node is running, the processor executes the software program to implement the fault detection method as described in any of the foregoing embodiments.

[0047] In a fourth aspect, the present invention provides a detection system, the detection system including a local node as described in the foregoing embodiments and a peer node, wherein the local node and the peer node are communicatively connected and have multiple forwarding paths.

[0048] In a fifth aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the fault detection method described in any of the foregoing embodiments.

[0049] Compared with existing technologies, this invention provides a fault detection method, apparatus, local node, detection system, and storage medium. The fault detection method is applied to a local node that communicates with a peer node and has multiple forwarding paths. When the local node determines that a path fault exists in the peer's packet transmission path, it suspends the periodic transmission of a first BFD keep-alive message. It then cyclically detects faults in other forwarding paths by sending a second BFD keep-alive message to the peer node. Upon receiving a third BFD keep-alive message, it identifies a specific forwarding path without faults; or, if no third BFD keep-alive message is received within a preset detection period, it determines that there is a communication fault between the local and peer nodes. In this way, the specific forwarding path is identified in time, or the inter-node communication fault is declared only after the preset detection period has elapsed. This effectively avoids misjudgments caused by untimely route convergence, further prevents communication interruptions due to such misjudgments, and improves user experience.

Attached Figure Description

[0050] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of the present invention and should not be regarded as a limitation on the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.

[0051] Figure 1 is a topology diagram for a load routing scenario.

[0052] Figure 2 is a schematic diagram of a detection system provided in an embodiment of the present invention.

[0053] Figure 3 is the first flowchart of a fault detection method provided in an embodiment of the present invention.

[0054] Figure 4 is the second flowchart of a fault detection method provided in an embodiment of the present invention.

[0055] Figure 5 is the third flowchart of a fault detection method provided in an embodiment of the present invention.

[0056] Figure 6 is the fourth flowchart of a fault detection method provided in an embodiment of the present invention.

[0057] Figure 7 is a schematic diagram of a detection scenario between nodes provided in an embodiment of the present invention.

[0058] Figure 8 is a schematic diagram of a fault detection device provided in an embodiment of the present invention.

[0059] Figure 9 is a schematic diagram of the structure of a local node provided in an embodiment of the present invention.

Detailed Implementation

[0060] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The components of the embodiments of the present invention described and shown in the accompanying drawings can generally be arranged and designed in various different configurations.

[0061] Therefore, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.

[0062] It should be noted that similar labels and letters in the following figures indicate similar items. Therefore, once an item is defined in one figure, it does not need to be further defined and explained in subsequent figures.

[0063] It should be noted that, where there is no conflict, the features in the embodiments of the present invention can be combined with each other.

[0064] Here, we will first introduce the keywords or key terms involved in this invention:

[0065] 1. Load Routing: If node A's routing table contains multiple next-hop nodes with node B as the destination address, the routing between nodes A and B is load routing. In other words, there is more than one path between nodes A and B. For example, Figure 1 shows six router nodes (node A, node B, and R1 to R4). For node A, the next-hop router nodes with node B as the destination address are R1, R2, and R3; for node B, the next-hop router nodes with node A as the destination address are R1, R2, and R4. There are three paths between nodes A and B. This example is merely one illustration and is not intended to be limiting.
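As a small illustration of this definition, the check below treats a routing table as a list of (destination, next hop) pairs and reports load routing when a destination has more than one next hop. The data layout and the names `next_hops`, `is_load_routing` and `ROUTES_OF_A` are assumptions made for this sketch; the entries mirror the node A example of Figure 1.

```python
def next_hops(routing_table: list[tuple[str, str]], destination: str) -> list[str]:
    """Collect the distinct next hops recorded for one destination."""
    return sorted({hop for dest, hop in routing_table if dest == destination})


def is_load_routing(routing_table: list[tuple[str, str]], destination: str) -> bool:
    """Load routing: more than one next hop leads to the same destination."""
    return len(next_hops(routing_table, destination)) > 1


# Node A's routes toward node B in the Figure 1 example.
ROUTES_OF_A = [("B", "R1"), ("B", "R2"), ("B", "R3")]
hops_to_b = next_hops(ROUTES_OF_A, "B")
```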

[0066] 2. Route convergence: This refers to the process by which, after a change in the network topology, a node rebuilds its routing table, advertises and learns the updated routes until the table stabilizes, and notifies all relevant nodes in the network of the change.

[0067] Referring again to Figure 1, there are three paths between nodes A and B: S1 (node A - R1 - node B), S2 (node A - R2 - node B), and S3 (node A - R3 - R4 - node B). In the scenario shown in Figure 1, performing BFD detection between node A and node B is called multi-hop BFD detection.

[0068] With reference to Figure 1, in the existing technology, multi-hop BFD detection between node A and node B requires creating a BFD session and then performing fault detection over one of the paths. Assuming that path S1 is used, node A and node B periodically send BFD messages to each other over S1.

[0069] A failure in path S1 can occur due to factors such as router node R1 malfunctioning, interface A1 of node A experiencing oscillation, or interface B1 of node B experiencing oscillation. If route convergence is timely, the BFD session can choose a new path (e.g., S2) for BFD detection. However, if route convergence is delayed, a failure in path S1 will prevent nodes A and B from receiving BFD messages from their counterparts. In this case, the BFD session will directly determine that communication between nodes A and B has failed, and communication between the nodes will cease.

[0070] However, in reality, there are two other paths, S2 and S3, between node A and node B. These two paths may not be faulty. If route convergence is not timely and S2 or S3 is not faulty, the BFD session's fault determination result is a false fault assessment. Moreover, in the case of untimely route convergence, the communication interruption time caused by the BFD session's false fault assessment is too long, which can easily affect the user experience.

[0071] In view of this, embodiments of the present invention provide a fault detection method that can use a second BFD keep-alive message to cyclically perform fault detection on other forwarding paths. This allows for the identification of a fault-free specific forwarding path upon receiving a third BFD keep-alive message, or the determination of inter-node communication failure only if a third BFD keep-alive message is not received within a preset detection period. This timely identification of specific forwarding paths or the determination of inter-node communication failure only after a preset detection period effectively avoids misjudgments caused by untimely route convergence, further preventing communication interruptions due to misjudgments and improving user experience. The following detailed description, through embodiments and accompanying drawings, illustrates this method.

[0072] Here, we will first introduce the application scenarios of this invention.

[0073] Please refer to Figure 2. The present invention provides a detection system 100, which includes a local node 110 and a peer node 120. The local node 110 and the peer node 120 are communicatively connected and have three forwarding paths (forwarding path 1 to forwarding path 3).

[0074] Any one of these forwarding paths may have no intermediate node or at least one intermediate node. The local node 110 and the peer node 120 can be network devices such as switches or routers. It should be noted that the number of forwarding paths between the local node 110 and the peer node 120 depends on the actual application. The number of forwarding paths shown in Figure 2 is for illustrative purposes only and is not limiting.

[0075] The fault detection method provided in this embodiment of the invention can be applied to the local node 110 and the peer node 120 in the above-mentioned detection system 100. The fault detection method will be described in detail below with the local node 110 as the execution subject.

[0076] Please refer to Figure 3, which is a flowchart illustrating a fault detection method provided in an embodiment of the present invention. The fault detection method includes the following steps S101 to S105:

[0077] S101. Periodically send the first BFD keep-alive message to the peer node through the local packet sending path.

[0078] In this embodiment, the local node can send a first BFD keep-alive message to the peer node through a pre-selected local packet sending path at preset intervals. Similarly, the peer node can send a first BFD keep-alive message to the local node through a pre-selected peer packet sending path at preset intervals. The local packet sending path can be any forwarding path, and the peer packet sending path can be any forwarding path other than the local packet sending path.

[0079] S102. When the first BFD keep-alive message sent by the peer node through the peer packet sending path is not received within the preset period, it is determined that there is a path failure in the peer packet sending path and the periodic sending of the first BFD keep-alive message is suspended.

[0080] S103. By sending a second BFD keep-alive message to the peer node, fault detection is performed on other forwarding paths besides the peer packet sending path in a loop.

[0081] In this embodiment, if the local node does not receive the first BFD keep-alive message sent by the peer node through the peer packet sending path within a preset period, it can be considered that there is a path failure in the peer packet sending path. At this time, the periodic sending of the first BFD keep-alive message to the peer node is suspended. Instead, the local node sends the second BFD keep-alive message to the peer node to cyclically detect the failure of other forwarding paths besides the peer packet sending path.

[0082] This is because, when the local node does not receive the first BFD keep-alive message from the peer node within the preset period, it can only determine that the peer's packet sending path is faulty; whether the other forwarding paths have path faults is unknown, so they need to be detected.

[0083] S104. When a third BFD keep-alive message is received from the peer node within the preset detection time, a fault-free specific forwarding path is obtained.

[0084] S105. If no third BFD keep-alive message is received within the preset detection time, it is determined that there is a communication failure between the local node and the peer node.

[0085] In this embodiment, when the peer node receives the second BFD keep-alive message, it can pause the periodic transmission of the first BFD keep-alive message and return a third BFD keep-alive message through the same forwarding path used to receive the second BFD keep-alive message.
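The peer-side reaction described in [0085] might look like the sketch below. It is only an illustration under assumptions: the class `PeerNode`, its fields, and the `sent` log are hypothetical, and the diagnostic values 9 and 10 are the example identifiers the text gives for the second and third messages, not fixed values.

```python
from dataclasses import dataclass, field

DIAG_INTERFACE_CHANGE = 9             # assumed marker of the second BFD keep-alive message
DIAG_INTERFACE_CHANGE_RESPONSE = 10   # assumed marker of the third BFD keep-alive message


@dataclass
class PeerNode:
    """Toy peer node that records what it would transmit (hypothetical model)."""
    periodic_tx_enabled: bool = True
    sent: list[tuple[str, int]] = field(default_factory=list)

    def on_receive(self, diag: int, rx_interface: str) -> None:
        if diag == DIAG_INTERFACE_CHANGE:
            # Pause the periodic first keep-alive message ...
            self.periodic_tx_enabled = False
            # ... and answer on the same path the second message arrived on.
            self.sent.append((rx_interface, DIAG_INTERFACE_CHANGE_RESPONSE))


peer = PeerNode()
peer.on_receive(DIAG_INTERFACE_CHANGE, rx_interface="if2")
```

Replying on the receiving interface is what lets the local node conclude that this particular forwarding path works in both directions.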

[0086] For the local node, when it receives the third BFD keep-alive message from the peer node within the preset detection time, it can determine that the forwarding path traversed by the third BFD keep-alive message is a fault-free specific forwarding path; when the local node does not receive the third BFD keep-alive message within the preset detection time, it means that the peer node has not received the second BFD keep-alive message within the preset detection time, and it can be considered that all forwarding paths between the local node and the peer node have path faults, and the local node can determine that the communication between the local node and the peer node is faulty.

[0087] The fault detection method provided in this invention involves the local node pausing the periodic transmission of the first BFD keep-alive message when it determines that there is a path fault in the packet transmission path of the peer node. It then cyclically detects faults in other forwarding paths by sending a second BFD keep-alive message to the peer node. Upon receiving a third BFD keep-alive message, it identifies a specific forwarding path without faults. Alternatively, if no third BFD keep-alive message is received within a preset detection period, it determines that there is a communication fault between the local and peer nodes. This method can promptly identify specific forwarding paths or determine inter-node communication faults only after a preset detection period, effectively avoiding misjudgments caused by untimely route convergence. Furthermore, it prevents communication interruptions due to misjudgments, thus improving user experience.

[0088] In an optional implementation, once a fault-free specific forwarding path is obtained, both the local node and the peer node can resume the periodic transmission of the first BFD keep-alive message on that specific forwarding path. Referring to Figure 4, after step S104 above, the method may further include:

[0089] S106. Resume the periodic transmission of the first BFD keep-alive message on a specific forwarding path;

[0090] S107. Receive the first BFD keep-alive message periodically sent by the peer node through a specific forwarding path.

[0091] In this embodiment, after determining that there is a path failure in the peer's packet transmission path, the local node can use the second BFD keep-alive message to detect other forwarding paths besides the peer's packet transmission path. Once a specific forwarding path without faults is identified, both the local and peer nodes can resume the periodic transmission of the first BFD keep-alive message on that specific forwarding path. This effectively avoids misjudgment of faults caused by untimely route convergence, ensuring normal communication between the local and peer nodes before route convergence.

[0092] The preset period is the sending period of the first BFD keep-alive message. For example, the preset period can be set to 100ms or 200ms. It should be noted that this example is only for illustration, and the size of the preset period should be set according to the actual application situation, and is not limited here.

[0093] In an optional implementation, the first BFD keep-alive message, the second BFD keep-alive message, and the third BFD keep-alive message all adopt the unified message format specified by RFC 5880. This unified message format includes a 5-bit diagnostic field (Diag), which can express field values from 0 to 31. As shown in the table below, field values 0 to 8 have defined meanings, while field values 9 to 31 are reserved.

[0094]
Diag field value    Description
0                   No Diagnostic
1                   Control Detection Time Expired
2                   Echo Function Failed
3                   Neighbor Signaled Session Down
4                   Forwarding Plane Reset
5                   Path Down
6                   Concatenated Path Down
7                   Administratively Down
8                   Reverse Concatenated Path Down
9~31                Reserved (for future use)

[0095] In this embodiment, the diagnostic field values of the first BFD keep-alive message, the second BFD keep-alive message, and the third BFD keep-alive message are 0, a preset interface change identifier, and a preset interface change response identifier, respectively. The preset interface change identifier and the preset interface change response identifier can be any two of the 23 reserved values from 9 to 31. For example, the preset interface change identifier and the preset interface change response identifier can be set to 9 and 10 respectively; this example is merely illustrative and not limiting.
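To make the encoding concrete, the sketch below packs and unpacks the first octet of a BFD control packet, which in RFC 5880 carries a 3-bit version followed by the 5-bit Diag field. The helper names and the `classify` labels are hypothetical, and 9 and 10 are just the example identifiers from the text.

```python
BFD_VERSION = 1                       # protocol version defined by RFC 5880
DIAG_NONE = 0                         # first keep-alive message
DIAG_INTERFACE_CHANGE = 9             # assumed: second keep-alive message
DIAG_INTERFACE_CHANGE_RESPONSE = 10   # assumed: third keep-alive message


def pack_vers_diag(diag: int, version: int = BFD_VERSION) -> int:
    """Build the first octet: version in the top 3 bits, Diag in the low 5 bits."""
    if not 0 <= diag <= 31:
        raise ValueError("Diag is a 5-bit field (0 to 31)")
    if not 0 <= version <= 7:
        raise ValueError("Version is a 3-bit field (0 to 7)")
    return (version << 5) | diag


def unpack_diag(first_octet: int) -> int:
    """Extract the 5-bit Diag value from the first octet."""
    return first_octet & 0x1F


def classify(diag: int) -> str:
    """Map a Diag value to the message role used in this method (labels assumed)."""
    return {
        DIAG_NONE: "first",
        DIAG_INTERFACE_CHANGE: "second",
        DIAG_INTERFACE_CHANGE_RESPONSE: "third",
    }.get(diag, "other")


octet = pack_vers_diag(DIAG_INTERFACE_CHANGE)
role = classify(unpack_diag(octet))
```

Because only the Diag value changes, all three message types keep the standard packet layout; a receiver that does not implement this method would simply see a reserved diagnostic code.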

[0096] Therefore, the diagnostic field in the first BFD keepalive message is used to indicate to the message receiver that the forwarding path traversed by the first BFD keepalive message is fault-free. The diagnostic field in the second BFD keepalive message is used to indicate that there is a path fault in the peer node's packet sending path, and the peer node's packet sending path can be adjusted to the forwarding path traversed by the second BFD keepalive message. The diagnostic field in the third BFD keepalive message is used to inform the local node that the peer node has received the second BFD keepalive message, and at the same time, instruct the local node to adjust its packet sending path to the forwarding path traversed by the third BFD keepalive message.

[0097] In an optional implementation, the local node may include the local outgoing interface corresponding to each forwarding path, and the peer node may include the peer outgoing interface corresponding to each forwarding path. Optionally, the cause of path failure in the peer packet sending path may be: failure of intermediate nodes on the peer packet sending path, oscillation of the local outgoing interface or the peer outgoing interface at both ends of the peer packet sending path, etc.

[0098] Therefore, for the local node, if it does not receive the first BFD keep-alive message from the peer node within the preset period, it needs to send the second BFD keep-alive message through all other outgoing interfaces of the local node except for the previously recorded receiving interface to cyclically detect other forwarding paths besides the peer's packet sending path.
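
The timeout condition that triggers the cyclic detection can be sketched as follows. The class name `PeerPathMonitor` and its methods are hypothetical, and time is passed in explicitly so that the logic does not depend on a particular clock.

```python
from typing import Optional


class PeerPathMonitor:
    """Tracks receipt of the peer's first BFD keep-alive message."""

    def __init__(self, preset_period: float) -> None:
        self.preset_period = preset_period
        self.last_rx_time: Optional[float] = None
        self.recv_iface: Optional[str] = None

    def on_first_keepalive(self, iface: str, now: float) -> None:
        # Record when and on which interface the first message arrived.
        self.last_rx_time = now
        self.recv_iface = iface

    def peer_path_faulty(self, now: float) -> bool:
        # A fault is assumed once the preset period passes without a new message.
        if self.last_rx_time is None:
            return False  # nothing received yet; session setup is out of scope here
        return now - self.last_rx_time > self.preset_period
```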

[0099] The detection process is explained in detail below.

[0100] Optionally, on the basis of Figure 3 and with reference to Figure 5, the sub-steps of step S103 above may include S1031 to S1032.

[0101] S1031. Determine all other outgoing interfaces from all local outgoing interfaces except the local outgoing interface corresponding to the packet sending path of the peer end.

[0102] Optionally, the local node may include a routing management module and a BFD module. The routing management module is responsible for routing-related management, while the BFD module is responsible for fault detection between the local node and the peer node.

[0103] When the BFD module creates a BFD session with the peer node, it can subscribe to routes from the routing management module. In this way, after each route update, the routing management module will synchronize the local outgoing interface of all routes with the peer node as the destination address to the BFD module. That is, the BFD module can know the local outgoing interface of each forwarding path between the local node and the peer node.
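
One way to picture this subscription is a simple callback registration, sketched below. The class and method names (`RouteManager`, `BfdModule`, `subscribe`, `update_routes`) are hypothetical and only illustrate the synchronization described above.

```python
from typing import Callable, Dict, List


class RouteManager:
    """Pushes the outgoing interfaces of routes to subscribers after each update."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[List[str]], None]]] = {}

    def subscribe(self, dest: str, callback: Callable[[List[str]], None]) -> None:
        self._subscribers.setdefault(dest, []).append(callback)

    def update_routes(self, dest: str, out_ifaces: List[str]) -> None:
        # Synchronize the outgoing interfaces of all routes towards dest.
        for callback in self._subscribers.get(dest, []):
            callback(list(out_ifaces))


class BfdModule:
    """Keeps the local outgoing interface of every forwarding path to the peer."""

    def __init__(self, peer: str, route_manager: RouteManager) -> None:
        self.peer = peer
        self.local_out_ifaces: List[str] = []
        route_manager.subscribe(peer, self._on_routes)

    def _on_routes(self, out_ifaces: List[str]) -> None:
        self.local_out_ifaces = out_ifaces
```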

[0104] Optionally, each time the local node and the peer node receive the first BFD keepalive message from the other, they will record the receiving interface of the first BFD keepalive message. This receiving interface is the local outgoing interface corresponding to the peer's packet sending path. In this embodiment, the BFD module can determine all other outgoing interfaces besides the receiving interface from all local outgoing interfaces.
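
Step S1031 amounts to filtering the recorded receiving interface out of the list of local outgoing interfaces. A minimal sketch, with a hypothetical function name and interface labels:

```python
from typing import List, Sequence


def other_outgoing_interfaces(local_out_ifaces: Sequence[str], recv_iface: str) -> List[str]:
    """Return every local outgoing interface except the recorded receiving interface."""
    return [iface for iface in local_out_ifaces if iface != recv_iface]
```

For example, with local outgoing interfaces A1 to A4 and A4 recorded as the receiving interface, the result is A1, A2 and A3.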

[0105] S1032. Within the preset detection duration, cyclically send the second BFD keep-alive message to the peer node through each of the other outgoing interfaces to perform fault detection on each of the other forwarding paths.

[0106] In this embodiment, the BFD module can cyclically send the second BFD keep-alive message to the peer node through each of the other outgoing interfaces to perform cyclic fault detection on each of the other forwarding paths.

[0107] Optionally, on the basis of Figure 5 and with reference to Figure 6, the sub-steps of step S1032 may include S01 to S07.

[0108] S01. Select one of the other outgoing interfaces as the interface under test.

[0109] S02. Within the preset detection time, send a second BFD keep-alive message to the peer node through the interface under test to perform fault detection on the forwarding path corresponding to the interface under test.

[0110] S03. Determine whether a third BFD keep-alive message has been received from the interface under test within a preset period.

[0111] In this embodiment, if a third BFD keep-alive message is received from the interface under test within the preset period, counted from the sending of the second BFD keep-alive message, step S04 is executed; if no third BFD keep-alive message is received from the interface under test within that preset period, step S05 is executed.

[0112] S04. Determine that the forwarding path corresponding to the interface under test is a fault-free specific forwarding path.

[0113] S05. Determine whether the time taken for the loop detection exceeds the preset detection time.

[0114] In this embodiment, the loop detection time can be counted from the first time the local node sends the second BFD keep-alive message. If the loop detection time exceeds the preset detection duration, step S06 is executed; if it does not, step S07 is executed and the process returns to step S02, until it is determined either that the forwarding path corresponding to the interface under test is a fault-free specific forwarding path or that the communication between the local node and the peer node is faulty.

[0115] S06. Determine that there is a communication failure between the local node and the peer node.

[0116] S07. Select the next other outgoing interface as the interface under test.
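
Steps S01 to S07 can be sketched as a single loop. This is a sketch under stated assumptions rather than the implementation: `send_second` and `wait_third` are hypothetical callbacks standing in for the actual packet I/O, and the clock is injectable so the logic can be exercised without real waiting.

```python
import itertools
import time
from typing import Callable, Optional, Sequence


def loop_detect(
    other_ifaces: Sequence[str],
    send_second: Callable[[str], None],
    wait_third: Callable[[str, float], bool],
    detection_duration: float,
    preset_period: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Return the interface of a fault-free path, or None on communication failure."""
    if not other_ifaces:
        return None  # no other forwarding path to probe
    start = clock()  # timing starts with the first second message
    for iface in itertools.cycle(other_ifaces):   # S01 / S07: pick the interface under test
        send_second(iface)                         # S02: probe the corresponding path
        if wait_third(iface, preset_period):       # S03: third message within the period?
            return iface                           # S04: fault-free specific forwarding path
        if clock() - start > detection_duration:   # S05: detection duration exceeded?
            return None                            # S06: communication failure
    return None  # not reached for a non-empty interface list
```

Passing the callbacks in keeps the loop independent of how packets are actually sent and received, which is left open here.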

[0117] To facilitate understanding, the following is an example of performing fault detection cyclically.

[0118] With reference to Figure 7, assume there are 4 forwarding paths (S1 to S4) between local node A (hereinafter referred to as node A) and peer node B (hereinafter referred to as node B). The local outgoing interfaces of node A are A1 to A4, and the peer outgoing interfaces of node B are B1 to B4.

[0119] Assuming node A periodically sends the first BFD keepalive message to node B through its local outgoing interface A1, then the local packet sending path is S1, and the receiving interface for node B each time it receives the first BFD keepalive message is B1. Assuming node B periodically sends the first BFD keepalive message to node A through its local outgoing interface B4, then the peer packet sending path is S4, and the receiving interface for node A each time it receives the first BFD keepalive message is A4.

[0120] For node A, if it does not receive node B's first BFD keep-alive message at A4 within a preset period since the last receipt of the first BFD keep-alive message, it indicates that there is a path fault in S4 (which could be A4 oscillation, B4 oscillation, or intermediate nodes on S4 going down). However, whether there is a path fault in S1 to S3 is unknown. Therefore, node A needs to suspend sending the first BFD keep-alive message from A1 and instead needs to perform fault detection on S1 to S3 cyclically within a preset detection duration T1.

[0121] (1) Send a second BFD keep-alive message to the peer node via A1;

[0122] (2) Determine whether a third BFD keep-alive message has been received from A1 within the preset period T2;

[0123] (3) If a third BFD keep-alive message is received from A1 within T2, the peer node has received the second BFD keep-alive message on B1, paused sending the first BFD keep-alive message, and returned the third BFD keep-alive message through B1. S1 is therefore fault-free, and node A and node B can resume periodically sending the first BFD keep-alive message through A1 and B1 respectively.

[0124] (4) If the third BFD keep-alive message is not received from A1 within T2, node A will then send the second BFD keep-alive message to the peer node through A2, and determine whether the third BFD keep-alive message is received from A2 within T2.

[0125] (5) If a third BFD keep-alive message is received from A2 within T2, the principle is the same as above, which means that S2 is fault-free, and node A and node B can continue to periodically send the first BFD keep-alive message through A2 and B2 respectively.

[0126] (6) If the third BFD keep-alive message is not received from A2 within T2, node A will then send the second BFD keep-alive message to the peer node through A3, and determine whether the third BFD keep-alive message is received from A3 within T2.

[0127] (7) If a third BFD keep-alive message is received from A3 within T2, the principle is the same as above, which means that S3 is fault-free, and node A and node B can continue to periodically send the first BFD keep-alive message through A3 and B3 respectively.

[0128] (8) If no third BFD keep-alive message is received from A3 within T2, return to step (1) above and repeat, until either a third BFD keep-alive message is received from one of the local outgoing interfaces A1 to A3, in which case the forwarding path corresponding to that interface is taken as the fault-free specific forwarding path, or the loop detection time exceeds T1, in which case the communication between node A and node B is determined to be faulty.
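
Steps (1) to (8) can be traced with a small simulation in virtual time. The sketch below makes simplifying assumptions that are not part of the example itself: path faults are given as a fixed set, a reply on a fault-free path arrives immediately, and each failed probe consumes exactly T2. The function name `probe_trace` is hypothetical.

```python
from itertools import cycle
from typing import List, Optional, Set, Tuple


def probe_trace(faulty_paths: Set[str], t1: int = 10, t2: int = 1) -> Tuple[Optional[str], List[str]]:
    """Simulate node A probing S1-S3; return (fault-free path or None, interfaces probed)."""
    # A4/S4 is the failed peer packet sending path, so only A1-A3 are probed.
    iface_to_path = {"A1": "S1", "A2": "S2", "A3": "S3"}
    elapsed = 0
    trace: List[str] = []
    for iface in cycle(iface_to_path):
        trace.append(iface)                       # send the second message via iface
        if iface_to_path[iface] not in faulty_paths:
            return iface_to_path[iface], trace    # third message received within T2
        elapsed += t2                             # waited a full T2 without a reply
        if elapsed > t1:
            return None, trace                    # T1 exceeded: communication fault
    return None, trace  # not reached
```

With S1 and S2 faulty, the trace is A1, A2, A3 and S3 is selected; with S1 to S3 all faulty, probing wraps around to A1 again until T1 is exceeded.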

[0129] It should be noted that the preset detection duration T1 is longer than the preset period T2, and the value of T1 depends on the actual situation. The above example is for illustrative purposes only and is not intended to be limiting.
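
As a rough aid for choosing T1, the number of probes sent before the detection duration expires can be estimated. The calculation below assumes that every probe fails, that each failed probe waits the full T2, and that the T1 check is made after each failed probe (as in step S05); under these assumptions the k-th probe finishes at k × T2 and the loop stops at the first k with k × T2 > T1. The function names are hypothetical and durations are given in integer milliseconds.

```python
def probes_before_timeout(t1_ms: int, t2_ms: int) -> int:
    """Worst-case number of second messages sent before T1 is exceeded."""
    if t2_ms <= 0 or t1_ms <= t2_ms:
        raise ValueError("expected 0 < T2 < T1")
    return t1_ms // t2_ms + 1


def covers_all_interfaces(n_other: int, t1_ms: int, t2_ms: int) -> bool:
    """True if each of the n_other interfaces is probed at least once within T1."""
    return probes_before_timeout(t1_ms, t2_ms) >= n_other
```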

[0130] It should be noted that the execution order of each step in the above method embodiments is not limited to that shown in the attached figures, and the execution order of each step shall be subject to the actual application situation.

[0131] Compared with the prior art, the embodiments of the present invention have the following beneficial effects:

[0132] This invention utilizes any two reserved values of the diagnostic field as a preset interface change identifier and a preset interface change response identifier. Thus, when there is a path failure in the peer's packet sending path, the local node can use the second BFD keep-alive message to detect faults in other forwarding paths. When the local node receives the third BFD keep-alive message from the peer node within a preset detection time, it obtains a specific forwarding path without faults.

[0133] Compared to existing technologies that directly misjudge and terminate inter-node communication due to untimely route convergence, this invention can identify specific forwarding paths by cyclically detecting faults in other forwarding paths within a preset detection period when route convergence is untimely. This effectively avoids misjudgment of BFD sessions that could lead to the termination of inter-node communication, thus improving the user experience.

[0134] In order to perform the corresponding steps in the above method embodiments and various possible implementations, an implementation of a fault detection device is given below.

[0135] Please refer to Figure 8, which shows a schematic diagram of the fault detection device provided in an embodiment of the present invention. The fault detection device 200 is applied to a local node, which has a communication connection with a peer node over multiple forwarding paths; the fault detection device 200 includes: a periodic transceiver module 210 and a loop detection module 220.

[0136] The periodic transceiver module 210 is used to: periodically send a first BFD keep-alive message to the peer node through the local packet sending path; when the first BFD keep-alive message sent by the peer node through the peer packet sending path is not received within a preset period, determine that there is a path failure in the peer packet sending path and suspend the periodic sending of the first BFD keep-alive message; the local packet sending path and the peer packet sending path are each any one of the forwarding paths;

[0137] The loop detection module 220 is used to: cyclically detect faults in forwarding paths other than the peer's packet sending path by sending a second BFD keep-alive message to the peer node; when a third BFD keep-alive message is received from the peer node within a preset detection period, a specific forwarding path without faults is obtained; when no third BFD keep-alive message is received within the preset detection period, a communication fault is determined between the local node and the peer node.

[0138] Optionally, the fault detection device 200 can be the BFD module of the local node, and the periodic transceiver module 210 and the loop detection module 220 can be sub-modules of the BFD module.

[0139] Optionally, the third BFD keep-alive message is returned by the peer node, upon receiving the second BFD keep-alive message and after pausing the periodic transmission of the first BFD keep-alive message, through the forwarding path on which the second BFD keep-alive message was received. The specific forwarding path is the forwarding path traversed by the third BFD keep-alive message. After the loop detection module 220 obtains the fault-free specific forwarding path, the periodic transceiver module 210 is further configured to: resume the periodic transmission of the first BFD keep-alive message on the specific forwarding path; and receive the first BFD keep-alive message periodically transmitted by the peer node through the specific forwarding path.
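
The exchange described above can be sketched from both sides with one small class. This is illustrative only: the `Node` class, its `outbox` list standing in for transmitted messages, and the diagnostic values 9 and 10 are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

DIAG_FIRST = 0
DIAG_INTERFACE_CHANGE = 9            # assumed reserved value
DIAG_INTERFACE_CHANGE_RESPONSE = 10  # assumed reserved value


@dataclass
class Node:
    name: str
    send_iface: str
    periodic_enabled: bool = True
    outbox: List[Tuple[str, int]] = field(default_factory=list)

    def start_probing(self) -> None:
        # Local role: suspend the periodic first message while probing.
        self.periodic_enabled = False

    def tick(self) -> None:
        # Periodic transmission of the first message on the current sending path.
        if self.periodic_enabled:
            self.outbox.append((self.send_iface, DIAG_FIRST))

    def on_receive(self, diag: int, recv_iface: str) -> None:
        if diag == DIAG_INTERFACE_CHANGE:
            # Peer role: answer with the third message on the path the probe
            # arrived on, in place of the periodic first message.
            self.send_iface = recv_iface
            self.outbox.append((recv_iface, DIAG_INTERFACE_CHANGE_RESPONSE))
        elif diag == DIAG_INTERFACE_CHANGE_RESPONSE:
            # Local role: adopt the path of the third message and resume.
            self.send_iface = recv_iface
            self.periodic_enabled = True
```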

[0140] Optionally, the local node includes the local outgoing interface of each forwarding path; when the loop detection module 220 is used to perform fault detection on other forwarding paths besides the peer packet sending path by sending a second BFD keep-alive message to the peer node, it can be specifically used to: determine all other outgoing interfaces besides the local outgoing interface corresponding to the peer packet sending path from all local outgoing interfaces; and within a preset detection time, loop the sending of a second BFD keep-alive message to the peer node through each other outgoing interface to perform fault detection on each other forwarding path.

[0141] Optionally, when the loop detection module 220 cyclically sends the second BFD keep-alive message to the peer node through each of the other outgoing interfaces within the preset detection duration to perform fault detection on each of the other forwarding paths, it can specifically be used to: take one of the other outgoing interfaces as the interface under test; within the preset detection duration, send the second BFD keep-alive message to the peer node through the interface under test to perform fault detection on the forwarding path corresponding to the interface under test; determine whether the third BFD keep-alive message is received from the interface under test within the preset period, where the third BFD keep-alive message is returned by the peer node, upon receiving the second BFD keep-alive message and after pausing the periodic sending of the first BFD keep-alive message, through the forwarding path on which the second BFD keep-alive message was received; if the third BFD keep-alive message is received from the interface under test within the preset period, determine that the forwarding path corresponding to the interface under test is the fault-free specific forwarding path; if the third BFD keep-alive message is not received from the interface under test within the preset period, determine whether the loop detection time exceeds the preset detection duration; if the loop detection time exceeds the preset detection duration, determine that there is a communication failure between the local node and the peer node; if the loop detection time does not exceed the preset detection duration, take the next other outgoing interface as the interface under test and return to the step of sending the second BFD keep-alive message to the peer node through the interface under test, until it is determined either that the forwarding path corresponding to the interface under test is the fault-free specific forwarding path or that there is a communication failure between the local node and the peer node.

[0142] Optionally, the first, second, and third BFD keep-alive messages all include a diagnostic field. In the first, second, and third BFD keep-alive messages, the diagnostic field is 0, a preset interface change identifier, and a preset interface change response identifier, respectively. The diagnostic field in the first BFD keep-alive message indicates to the message receiver that the forwarding path traversed by the first BFD keep-alive message is fault-free; the diagnostic field in the second BFD keep-alive message indicates that the peer node's packet sending path has a path fault and needs to be adjusted to the forwarding path traversed by the second BFD keep-alive message; the diagnostic field in the third BFD keep-alive message informs the local node that the peer node has received the second BFD keep-alive message and instructs the local node to adjust its packet sending path to the forwarding path traversed by the third BFD keep-alive message.

[0143] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working process of the fault detection device 200 described above can be referred to the corresponding process in the foregoing method embodiments, and will not be repeated here.

[0144] Please refer to Figure 9, which is a schematic diagram of the structure of a local node provided in an embodiment of the present invention. The local node 300 includes a processor 310, a memory 320 and a bus 330, and the processor 310 is connected to the memory 320 through the bus 330.

[0145] The memory 320 can be used to store software programs, such as the software program corresponding to the fault detection device 200 described above. The processor 310 executes various functional applications and data processing by running the software program stored in the memory 320 to implement the fault detection method provided in the embodiments of the present invention.

[0146] The memory 320 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), flash memory, programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.

[0147] The processor 310 can be an integrated circuit chip with signal processing capabilities. The processor 310 can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.

[0148] It can be understood that the structure shown in Figure 9 is for illustrative purposes only; the local node 300 may include more or fewer components than those shown in Figure 9, or have a configuration different from that shown in Figure 9. The components shown in Figure 9 can be implemented using hardware, software, or a combination thereof.

[0149] This invention also provides a detection system, which includes the aforementioned local node and peer node, and the local node and peer node are connected by communication and have multiple forwarding paths.

[0150] This invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the fault detection method disclosed in the above embodiments. The computer-readable storage medium can be, but is not limited to, various media capable of storing program code, such as a USB flash drive, external hard drive, ROM, RAM, PROM, EPROM, EEPROM, FLASH disk, or optical disk.

[0151] In summary, this invention provides a fault detection method, apparatus, local node, detection system, and storage medium. The fault detection method is applied to a local node that is communicatively connected to a peer node over multiple forwarding paths. When the local node determines that a path fault exists in the peer's packet sending path, it suspends the periodic transmission of the first BFD keep-alive message. It then cyclically detects faults in the other forwarding paths by sending a second BFD keep-alive message to the peer node. Upon receiving a third BFD keep-alive message, it identifies a specific fault-free forwarding path; if no third BFD keep-alive message is received within a preset detection duration, it determines that there is a communication fault between the local node and the peer node. In this way, the specific forwarding path can be found in time, or an inter-node communication fault is declared only after the preset detection duration has elapsed. This effectively avoids fault misjudgment caused by untimely route convergence, further avoids the suspension of inter-node communication caused by such misjudgment, and improves the user experience.

[0152] The above are merely specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.

Claims

1. A fault detection method, characterized in that, Applied to a local node, wherein the local node and the peer node are connected and there are multiple forwarding paths; the method includes: The local end periodically sends a first BFD keep-alive message to the peer node through the local packet sending path; the first BFD keep-alive message is used to indicate that the forwarding path traversed is fault-free; If the first BFD keep-alive message sent by the peer node through the peer packet sending path is not received within a preset period, it is determined that there is a path failure in the peer packet sending path and the periodic sending of the first BFD keep-alive message is suspended; the local packet sending path and the peer packet sending path are any two of the multiple forwarding paths; By sending a second BFD keep-alive message to the peer node, fault detection is performed on other forwarding paths besides the peer packet sending path in a loop; the second BFD keep-alive message is used to indicate that there is a path fault in the peer packet sending path of the peer node, and the peer packet sending path needs to be adjusted to the forwarding path traversed by the second BFD keep-alive message. When the third BFD keep-alive message from the peer node is received within the preset detection time, a fault-free specific forwarding path is obtained; the third BFD keep-alive message is used to inform the local node that the peer node has received the second BFD keep-alive message, and at the same time instruct the local node to adjust the local packet sending path to the forwarding path traversed by the third BFD keep-alive message. If the third BFD keep-alive message is not received within the preset detection time, it is determined that there is a communication failure between the local node and the peer node.

2. The method according to claim 1, characterized in that, the third BFD keep-alive message is returned by the peer node, upon receiving the second BFD keep-alive message and after pausing the periodic transmission of the first BFD keep-alive message, through the forwarding path on which the second BFD keep-alive message was received; the specific forwarding path is the forwarding path traversed by the third BFD keep-alive message; after the step of obtaining a fault-free specific forwarding path, the method further includes: resuming the periodic transmission of the first BFD keep-alive message on the specific forwarding path; and receiving the first BFD keep-alive message periodically sent by the peer node through the specific forwarding path.

3. The method according to claim 1, characterized in that, The local node includes the local outgoing interface for each of the forwarding paths; the step of cyclically performing fault detection on other forwarding paths besides the peer packet sending path by sending a second BFD keep-alive message to the peer node includes: From all the local outgoing interfaces, determine all other outgoing interfaces except the local outgoing interface corresponding to the peer packet sending path; Within the preset detection time, the second BFD keep-alive message is sent cyclically to the peer node through each of the other outgoing interfaces to perform fault detection on each of the other forwarding paths.

4. The method according to claim 3, characterized in that, the step of cyclically sending the second BFD keep-alive message to the peer node through each of the other outgoing interfaces within the preset detection duration to perform fault detection on each of the other forwarding paths includes: taking one of the other outgoing interfaces as the interface under test; within the preset detection duration, sending the second BFD keep-alive message to the peer node through the interface under test to perform fault detection on the forwarding path corresponding to the interface under test; determining whether the third BFD keep-alive message is received from the interface under test within the preset period, wherein the third BFD keep-alive message is returned by the peer node, upon receiving the second BFD keep-alive message and after pausing the periodic transmission of the first BFD keep-alive message, through the forwarding path on which the second BFD keep-alive message was received; if the third BFD keep-alive message is received from the interface under test within the preset period, determining that the forwarding path corresponding to the interface under test is the fault-free specific forwarding path; if the third BFD keep-alive message is not received from the interface under test within the preset period, determining whether the loop detection time exceeds the preset detection duration; if the loop detection time exceeds the preset detection duration, determining that there is a communication failure between the local node and the peer node; if the loop detection time does not exceed the preset detection duration, taking the next other outgoing interface as the interface under test and returning to the step of sending the second BFD keep-alive message to the peer node through the interface under test to perform fault detection on the forwarding path corresponding to the interface under test, until it is determined either that the forwarding path corresponding to the interface under test is the fault-free specific forwarding path or that there is a communication failure between the local node and the peer node.

5. The method according to any one of claims 1-4, characterized in that, The diagnostic fields of the first BFD keep-alive message, the second BFD keep-alive message, and the third BFD keep-alive message are 0, preset interface change identifier, and preset interface change response identifier, respectively. The diagnostic field in the first BFD keep-alive message is used to indicate that the forwarding path through which the first BFD keep-alive message passes is fault-free; The diagnostic field in the second BFD keepalive message is used to indicate that there is a path failure in the peer node's packet sending path, and the peer packet sending path needs to be adjusted to the forwarding path traversed by the second BFD keepalive message; The diagnostic field in the third BFD keep-alive message is used to inform the local node that the peer node has received the second BFD keep-alive message, and at the same time instructs the local node to adjust its packet sending path to the forwarding path traversed by the third BFD keep-alive message.

6. A fault detection device, characterized in that, Applied to a local node, wherein the local node and the peer node are connected and there are multiple forwarding paths; the device includes: The periodic transceiver module is used to periodically send a first BFD keep-alive message to the peer node through the local packet sending path; the first BFD keep-alive message is used to indicate that the forwarding path traversed is fault-free; The periodic transceiver module is further configured to determine that there is a path fault in the peer packet transmission path and suspend the periodic transmission of the BFD keep-alive message when it does not receive the first BFD keep-alive message sent by the peer node through the peer packet transmission path within a preset period; the local packet transmission path and the peer packet transmission path are any two of the multiple forwarding paths. The loop detection module is used to perform fault detection on other forwarding paths besides the peer packet sending path by sending a second BFD keep-alive message to the peer node; the second BFD keep-alive message is used to indicate that there is a path fault in the peer packet sending path of the peer node, and the peer packet sending path needs to be adjusted to the forwarding path traversed by the second BFD keep-alive message. When the third BFD keep-alive message from the peer node is received within the preset detection time, a fault-free specific forwarding path is obtained; the third BFD keep-alive message is used to inform the local node that the peer node has received the second BFD keep-alive message, and at the same time instruct the local node to adjust the local packet sending path to the forwarding path traversed by the third BFD keep-alive message. If the third BFD keep-alive message is not received within the preset detection time, it is determined that there is a communication failure between the local node and the peer node.

7. The apparatus according to claim 6, characterized in that, the third BFD keep-alive message is returned by the peer node, upon receiving the second BFD keep-alive message and after pausing the periodic transmission of the first BFD keep-alive message, through the forwarding path on which the second BFD keep-alive message was received; the specific forwarding path is the forwarding path traversed by the third BFD keep-alive message; after the loop detection module obtains the fault-free specific forwarding path, the periodic transceiver module is further used for: resuming the periodic transmission of the first BFD keep-alive message on the specific forwarding path; and receiving the first BFD keep-alive message periodically sent by the peer node through the specific forwarding path.

8. A local node, characterized in that it includes: a memory and a processor, wherein the memory stores a software program, and the processor executes the software program when the local node is running to implement the fault detection method as described in any one of claims 1-5.

9. A detection system, characterized in that, the detection system includes the local node as described in claim 8 and a peer node, wherein the local node and the peer node are connected in communication and have multiple forwarding paths.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the fault detection method according to any one of claims 1-5.