Cross-layer fault diagnosis method, and electronic device, readable medium and program product
By using a knowledge-based cross-layer fault diagnosis method and knowledge graph in a hybrid IP and optical network, the problems of low fault diagnosis efficiency and inaccurate location are solved, and efficient cross-layer fault location and repair are achieved.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- ZTE CORP
- Filing Date
- 2025-12-15
- Publication Date
- 2026-07-02
AI Technical Summary
In hybrid IP and optical networks, fault diagnosis is inefficient and the location is inaccurate, resulting in low resource utilization and difficulties in inter-departmental collaboration.
The first large model trained on a knowledge base is used for cross-layer fault diagnosis. Cross-layer fault diagnosis rules are used in combination with cross-layer network knowledge graphs to locate faults and provide repair suggestions.
It improves the efficiency and accuracy of fault diagnosis in hybrid networks, enables rapid location and repair of cross-layer faults, and enhances the stability of system operation.
Description
Cross-layer fault diagnosis methods, electronic devices, readable media and program products
[0001] Cross-reference to related applications
[0002] This application claims priority to Chinese patent application CN 202411988671.3, filed on December 27, 2024, entitled “Method for cross-layer fault diagnosis, electronic device, readable medium and program product”, the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This disclosure relates to the field of communication technology, and in particular to a cross-layer fault diagnosis method, electronic device, readable medium, and program product.
Background
[0004] To achieve long-distance transmission and improve network speed, hybrid networking of different network types has gradually become an important networking method. Internet Protocol (IP) networks and optical networks are a commonly used hybrid network. IP networks consist of routers, which have strong packet processing and traffic management capabilities and are mainly used for user traffic service processing. Optical networks consist of wavelength division multiplexing (WDM) equipment, which can reduce bit costs and improve reliability. By using multiple wavelengths for multiplexing transmission, it carries the router's services, achieving high-capacity, long-distance transmission. In other words, optical networks provide IP networks with a high-speed, low-cost long-distance information transmission channel.
[0005] For a long time, IP networks and optical networks have developed independently, with different departments handling planning, design, deployment, and operation and maintenance. The cooperation and coordination between departments is cumbersome, resulting in low resource utilization. Consequently, cross-layer fault diagnosis is inefficient and fault location is inaccurate.
Summary of the Invention
[0006] This disclosure provides a cross-layer fault diagnosis method, electronic device, readable medium, and program product.
[0007] This disclosure provides a cross-layer fault diagnosis method, comprising: acquiring a fault phenomenon, wherein the fault phenomenon is a manifestation of a fault in a hybrid network, the hybrid network including two or more networks; and, based on the fault phenomenon, performing cross-layer fault diagnosis on the hybrid network using a first large model to obtain a diagnosis result, wherein the first large model is a model trained based on knowledge in a knowledge base, the knowledge including cross-layer fault diagnosis rules.
[0008] This disclosure also provides an electronic device, which includes a memory and a processor; the memory stores a computer program that can be executed by the processor, and when the computer program is executed by the processor, it implements the cross-layer fault diagnosis method according to the embodiments of this disclosure.
[0009] This disclosure also provides a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements a cross-layer fault diagnosis method according to embodiments of this disclosure.
[0010] This disclosure also provides a computer program product, which includes a computer program that, when executed by a processor, implements the cross-layer fault diagnosis method according to embodiments of this disclosure.
Brief Description of the Drawings
[0011] In the accompanying drawings of the embodiments disclosed herein:
[0012] Figure 1 illustrates a gray light application scenario according to an embodiment of this disclosure;
[0013] Figure 2 illustrates a colored light application scenario according to an embodiment of this disclosure;
[0014] Figure 3 illustrates a schematic diagram of a hybrid network architecture combining IP and optical networks according to an embodiment of the present disclosure;
[0015] Figure 4 shows a schematic diagram of a converged architecture according to an embodiment of the present disclosure;
[0016] Figure 5 shows a flowchart of a cross-layer fault diagnosis method according to an embodiment of the present disclosure;
[0017] Figure 6 illustrates a service layer model of an IP network and an optical network according to an embodiment of the present disclosure;
[0018] Figure 7 illustrates a schematic diagram of IP network service fault diagnosis rules according to an embodiment of the present disclosure;
[0019] Figure 8 shows a schematic diagram of the input power fluctuations of all OTS segment optical amplifiers according to embodiments of the present disclosure;
[0020] Figure 9 illustrates a schematic diagram of optical network service fault diagnosis rules according to an embodiment of the present disclosure;
[0021] Figure 10 illustrates a schematic diagram of a cross-layer network knowledge graph of IP networks and optical networks according to an embodiment of the present disclosure;
[0022] Figure 11 shows a flowchart of a cross-layer fault diagnosis method based on a first large model and a cross-layer network knowledge graph according to an embodiment of the present disclosure;
[0023] Figure 12 shows another flowchart of a cross-layer fault diagnosis method according to an embodiment of the present disclosure;
[0024] Figure 13 illustrates a flowchart of cross-layer service connectivity fault diagnosis for IP networks and optical networks according to an embodiment of the present disclosure;
[0025] Figure 14 illustrates a flowchart of cross-layer packet loss fault diagnosis for IP networks and optical networks according to embodiments of the present disclosure;
[0026] Figure 15 shows another flowchart of a cross-layer fault diagnosis method according to an embodiment of the present disclosure.
Detailed Description
[0027] To enable those skilled in the art to better understand the technical solutions of this disclosure, the embodiments of this disclosure will be described in detail below with reference to the accompanying drawings.
[0028] The present disclosure will be described more fully below with reference to the accompanying drawings; however, the embodiments shown may be embodied in different forms, and the present disclosure should not be construed as limited to the embodiments set forth below. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will enable those skilled in the art to fully understand the scope of the disclosure.
[0029] The accompanying drawings of the embodiments disclosed herein are provided to further illustrate the embodiments of this disclosure and form part of the specification. They are used together with the detailed embodiments to explain this disclosure and do not constitute a limitation thereof. The above and other features and advantages will become more apparent to those skilled in the art from the description of the detailed embodiments with reference to the accompanying drawings.
[0030] This disclosure may be described with reference to plan and / or cross-sectional views using the ideal schematic diagrams of this disclosure. Therefore, the example illustrations may be modified according to manufacturing techniques and / or tolerances.
[0031] Where there is no conflict, the various embodiments of this disclosure and the features thereof in the embodiments may be combined with each other.
[0032] The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. The term "and / or" as used in this disclosure includes any and all combinations of one or more of the associated listed items. The singular forms "a" and "the" as used in this disclosure are also intended to include the plural forms, unless the context clearly indicates otherwise. The terms "comprising," "made of," etc., as used in this disclosure specify the presence of the stated features, integers, steps, operations, elements, and / or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and / or groups thereof.
[0033] Unless otherwise specified, all terms used in this disclosure (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art and this disclosure, and will not be interpreted as having an idealized or overly formal meaning, unless expressly so specified in this disclosure.
[0034] This disclosure is not limited to the embodiments shown in the accompanying drawings, but includes modifications to the configuration based on the manufacturing process. Therefore, the areas illustrated in the drawings are schematic, and the shapes of the areas shown illustrate specific shapes of the areas of an element, but are not intended to be limiting.
[0035] Hybrid networking of different network types can achieve long-distance transmission and improve network transmission speed, but it complicates fault diagnosis. The cross-layer fault diagnosis method, electronic device, readable medium, and program product provided in this disclosure not only enable cross-layer diagnosis of hybrid networks, but also improve the efficiency and accuracy of fault diagnosis.
[0036] Figure 1 illustrates a gray light application scenario according to an embodiment of this disclosure.
[0037] As shown in Figure 1, taking a hybrid network consisting of IP and optical networks as an example, the hybrid network includes a coordinator 1, an IP controller 3, and an optical controller 5. The coordinator 1 is used for functions such as cross-layer resource management, cross-layer unified view, cross-layer service provisioning, cross-layer protection coordination, cross-layer alarm association, cross-layer maintenance, and cross-layer optimization. The IP controller 3 is used for IP network operation and maintenance, and the optical controller 5 is used for optical network operation and maintenance.
[0038] Coordinator 1 interacts with IP controller 3 through the MDSC-to-PNC Interface (MPI) of the Abstraction and Control of Traffic Engineered Networks (ACTN) framework. Coordinator 1 also interacts with optical controller 5 through the Transport Application Programming Interface (TAPI).
[0039] The IP network includes multiple routers 31, which are used for service processing of user traffic. Routers 31 establish connections with the optical network through gray optical modules 6.
[0040] The optical network comprises an Optical Transport Network (OTN) layer and a Dense Wavelength Division Multiplexing (DWDM) layer. The OTN layer houses a forwarding board, specifically the xPonder board 4, which includes a MuxPonder / TransPonder. An Optical Access Channel (OAC) connection is established via the Generalized Multiprotocol Label Switching-User Network Interface (GMPLS-UNI) or via a Software Defined Networking Controller (SDNC). Electrical switching between the OTN and DWDM layers is performed on Optical Data Units (ODUs). An Optical Channel (OCH) connection can also be established between the OTN and DWDM layers.
[0041] Figure 2 illustrates a colored light application scenario according to an embodiment of this disclosure.
[0042] As shown in Figure 2, taking a hybrid network consisting of IP and optical networks as an example, the hybrid network includes a coordinator 1, an IP controller 3, and an optical controller 5. The coordinator 1 is used for functions such as cross-layer resource management, cross-layer unified view, cross-layer service provisioning, cross-layer protection coordination, cross-layer alarm association, cross-layer maintenance, and cross-layer optimization. The IP controller 3 is used for IP network operation and maintenance, and the optical controller 5 is used for optical network operation and maintenance.
[0043] Coordinator 1 interacts with IP controller 3 via ACTN MPI. Coordinator 1 also interacts with optical controller 5 via TAPI.
[0044] The optical network includes a DWDM layer, which implements optical switching. The IP network is directly connected to the multiplexing / demultiplexing board 8 of the DWDM layer via the colored optical module 7, and an OCH connection is established through optical SDNC, eliminating the need for the xPonder board and OTN electrical switch. Compared with the application scenario in Figure 1, the direct connection of the colored optical module to the DWDM layer can reduce power consumption and save costs.
[0045] In this embodiment of the disclosure, in both the gray light application scenario and the colored light application scenario, either of the following two hybrid network architectures can be adopted: a layered collaborative architecture or a converged architecture.
[0046] Figure 3 illustrates a schematic diagram of a hybrid network architecture combining IP and optical networks according to an embodiment of this disclosure.
[0047] As shown in Figure 3, the coordinator 1 connects to the IP controller 3 via the Representational State Transfer (RESTful) Northbound Interface (NBI) to enable information exchange between the coordinator 1 and the IP controller 3. The coordinator 1 also exchanges information with the optical controller 5 via the northbound TAPI.
[0048] Coordinator 1 is used to implement functions such as detailed IP topology, Multi-Layer (ML) Path Computation Element (PCE), ML-Link, multi-layer alarm association, multi-layer maintenance window, optical abstract topology, multi-layer visibility, Multi-Layer Restoration (MLR), multi-layer bandwidth on demand (BoD), and multi-layer monitoring and analysis. IP Controller 3 is used to implement functions such as IP topology, IP resource storage, IP PCE, Traffic Engineering Database (TEDB), IP tunneling, IP alarms, IP protection, and IP fault diagnosis. Optical Controller 5 is used to implement functions such as optical topology construction, optical asset management, optical PCE, wavelength management, optical connectivity, optical alarms, WDM / OTN Automatically Switched Optical Network (WASON), and fault diagnosis.
[0049] IP devices in an IP network include multiple routers 31. Routers 31 can connect Virtual Private Networks (VPNs), that is, different VPNs are connected through Virtual Terminal Emulators (VTEs), and VTE links (VTELinks) are established between different VPNs to enable service processing of user traffic.
[0050] Optical devices in an optical network include the xPonder board 4 and the multiplexing / demultiplexing board 8. IP networks and optical networks establish connections through the xPonder board 4 and the multiplexing / demultiplexing board 8 to enable signal transmission between the IP network and the optical network.
[0051] Figure 4 shows a schematic diagram of a converged architecture according to an embodiment of the present disclosure.
[0052] As shown in Figure 4, the converged controller 9 directly manages the IP network and the optical network. The converged controller 9 is used to realize functions such as multi-layer visibility, multi-layer monitoring and analysis, ML-Link, MLR, multi-layer alarm association, multi-layer BoD, multi-layer planning simulation, multi-layer maintenance window, multi-dimensional unified topology, unified resources, unified PCE, unified alarm, and southbound MPI / TAPI.
[0053] IP devices in an IP network include multiple routers 31. Routers 31 can connect VPNs, that is, different VPNs are connected through VTEs, and VTE links (VTELinks) are established between different VPNs to enable service processing of user traffic.
[0054] Optical devices in an optical network include the xPonder board 4 and the multiplexing / demultiplexing board 8. IP networks and optical networks establish connections through the xPonder board 4 and the multiplexing / demultiplexing board 8 to enable signal transmission between the IP network and the optical network.
[0055] This disclosure provides a method for diagnosing cross-layer faults.
[0056] Figure 5 shows a flowchart of a cross-layer fault diagnosis method according to an embodiment of the present disclosure.
[0057] As shown in Figure 5, the cross-layer fault diagnosis method according to an embodiment of the present disclosure includes the following steps S501 to S502.
[0058] In step S501, a fault phenomenon is obtained, wherein the fault phenomenon has a fault manifestation form in a hybrid network, and the hybrid network includes two or more networks.
[0059] When a fault occurs in a hybrid network, the monitoring system will issue a description of the fault phenomenon, or the user can describe the fault phenomenon via voice or text based on their observations. In some systems, an alarm will be issued when a fault occurs. In other systems, no alarm will be issued if the fault does not exceed a fault threshold. By identifying fault phenomena in advance, faults can be diagnosed and resolved before the system issues an alarm, thereby improving the stability of system operation.
[0060] In this embodiment, the fault phenomena include, but are not limited to, service interruption and packet loss. A fault phenomenon can be reported by the monitoring system or by the user, or reported by the monitoring system and then supplemented by the user. Users can provide fault phenomena through interactive sessions.
[0061] Hybrid networks include two or more types of networks. For example, a hybrid network includes an IP network and an optical network.
[0062] This disclosure uses a hybrid network consisting of IP and optical networks as an example for description. Before describing cross-layer fault diagnosis, the service layering model of the IP and optical networks is described first.
[0063] Figure 6 illustrates a service layer model of an IP network and an optical network according to an embodiment of this disclosure.
[0064] As shown in Figure 6, IP networks and optical networks, from top to bottom, include the application layer (L4-L7), IP service layer, IP path layer, IP protocol layer (L3), IP physical layer (L2), OTN layer (L1), DWDM layer (L0), and fiber optic layer.
[0065] The application layer is used to implement customer routes / prefixes and the Attachment Circuit (AC) associated with the VPN.
[0066] The IP service layer is used to implement public network routing, VPN private network routing, L2 / L3 unicast / multicast, and segmented / end-to-end services.
[0067] The IP path layer is used to implement automatic VPN routing onto Flexible Algorithm (FlexAlgo), Segment Routing Policy (SR Policy), Label Distribution Protocol (LDP), Segment Routing Best Effort (SR-BE), Resource Reservation Protocol-Traffic Engineering (RSVP-TE), and Border Gateway Protocol Labeled Unicast (BGP-LU).
[0068] The IP protocol layer is used for SR-BE / SR Policy associated nodes / Adjacency Segment Identifier (Adj-SID) links / Locator / Interior Gateway Protocol (IGP) L3DB / Traffic Engineering Database (TEDB).
[0069] The IP physical layer is used for physical nodes / ports / signaling gateways (SG) / optical modules, Ethernet (ETH) / Link Layer Discovery Protocol (LLDP), port rate / Quality of Service (QoS) queues, etc., as well as ETH links / cross-layer links.
[0070] The inter-layer connection between the IP and optical networks is used for gray optical modules to xPonder boards (low-speed 10G / 20G / 100G, etc.); colored optical modules to multiplexing / demultiplexing boards (100G / 200G / 400G, etc.); and LLDP inter-layer link discovery / manual planning.
[0071] The OTN layer is used for the OAC services of gray optical modules, for associating gray-light cross-layer links, and for configuring xPonder boards and the electrical layer.
[0072] The DWDM layer is used for the OCH services of colored optical modules, for associating colored-light cross-layer links, and for configuring the multiplexing / demultiplexing boards.
[0073] The fiber layer is used to establish OAC connections, OCH connections, optical multiplex sections (OMS), optical transmission sections (OTS), and fiber optic transmission.
[0074] In step S502, based on the fault phenomena, the first large model is used to perform cross-layer fault diagnosis on the hybrid network to obtain the diagnosis results.
[0075] The first large model is a model trained based on knowledge from a knowledge base, and this knowledge includes cross-layer fault diagnosis rules. In some embodiments, the knowledge base also includes at least one of operation and maintenance manuals, fault cases, and diagnostic procedures.
[0076] In this embodiment, the first large model can be an AI model, which can be trained in a supervised manner and further fine-tuned using prompts. The diagnostic results output by the first large model include at least one of the following: fault location, fault reasoning process, and fault repair suggestions. The first large model can output one or more diagnostic results. When the first large model outputs multiple diagnostic results, these results can be ranked, with a higher-ranked result indicating higher accuracy; that is, the first large model places the most likely diagnostic result first.
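The ranking of multiple diagnostic results can be illustrated with a minimal Python sketch. The `DiagnosisResult` record and its `likelihood` score are assumptions introduced only for illustration; the disclosure states that results are ranked with the most likely one first, but does not say how likelihood is represented.

```python
from dataclasses import dataclass


@dataclass
class DiagnosisResult:
    fault_location: str
    reasoning: str
    repair_suggestion: str
    likelihood: float  # hypothetical score, not defined in the disclosure


def rank_results(results):
    """Order results so that the most likely diagnosis comes first."""
    return sorted(results, key=lambda r: r.likelihood, reverse=True)


candidates = [
    DiagnosisResult("router port", "CRC errors on port", "check optical module", 0.3),
    DiagnosisResult("OTS fiber", "attenuation increased", "inspect pigtail", 0.6),
]
ranked = rank_results(candidates)
```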
[0077] In some embodiments, different diagnostic rules correspond to different networks and different fault scenarios. Fault scenarios include service interruption faults and service packet loss faults, etc.
[0078] Table 1 details the IP network service interruption faults and diagnostic rules, and Table 2 details the IP network service packet loss faults and diagnostic rules. In this embodiment, "IP+optical" is an abbreviation for IP network and optical network.
[0079] Table 1 lists the rules for troubleshooting and diagnosing IP network service interruptions.
[0080] Table 2 lists the rules for diagnosing packet loss faults in IP network services.
[0081] Figure 7 illustrates a schematic diagram of IP network service fault diagnosis rules according to an embodiment of this disclosure.
[0082] Figure 7 illustrates the correspondence between fault diagnosis rules and fault repair suggestions when different fault phenomena occur in L2VPN services. For example, when packet loss occurs, the diagnosis can run sub-processes, In-situ Operations, Administration, and Maintenance (IOAM) checks, performance checks, alarm checks, tunnel checks, and service-PW checks. The check items are as follows:
- Sub-processes: hop-by-hop packet loss detection, port optical power over-limit diagnosis, and tunnel packet loss diagnosis.
- IOAM checks: IOAM deployment checks (whether IOAM is deployed) and IOAM performance status checks.
- Performance checks: port Cyclic Redundancy Check (CRC) performance testing, Two-Way Active Measurement Protocol (TWAMP) packet loss detection, and port bandwidth utilization testing.
- Alarm checks: general alarm detection (hardware alarms, resource alarms, Cyber-Physical Systems (CPS) alarms: current and historical), Pseudo-wire Bidirectional Forwarding Detection (PW-BFD) alarm detection, classified packet loss alarm checks, port optical power over-limit alarm checks, and QoS packet loss alarm detection.
- Tunnel checks: tunnel reachability checks (LSP-LB) - inbound tunnel process, LSP flow control configuration checks, and tunnel type checks.
- Service end-to-end path checks: path reconstruction.
- Physical port checks: port status checks (AC ports).
- Service checks: service reachability checks (connection detection virtual port checks), service reachability checks (UNI / NNI Ping), service type checks (VPLS / VPWS determination), and MAC routing checks (AC port MAC learning status checks).
- Service-PW checks: PW reachability checks (PW-LB, PW-Ping), PW quality checks (PW-LM), PW connectivity tracing (PW-Trace), PW-OAM configuration checks (whether PW-OAM is configured), PW-BFD configuration checks (whether PW-BFD is configured), PW type configuration checks (static / dynamic), PW configuration consistency checks, PW status checks, and PW flow control configuration checks.
- Service-LDP neighbor checks: LDP neighbor reachability checks (ICMP-Ping), LDP protocol peer configuration checks, and LDP neighbor status checks (remote neighbors).
[0083] Optical network service fault diagnosis rules include OTN end-to-end fault diagnosis and fault delimitation. When diagnosing optical network service faults, the service type and diagnostic rules are selected first, and fault diagnosis is then performed in the following steps.
[0084] Step 1: Query the service-layer services and the ordered physical routes.
[0085] Step 2: Perform OTN service fault diagnosis to obtain the root cause alarms and fault delimitation results related to the service.
[0086] If the fault is delimited as lying outside the network boundary, the diagnosis ends and the diagnosis result is returned as a third-party equipment fault; otherwise, fault diagnosis continues based on the following diagnostic rules.
[0087] Diagnostic Rule 1: Basic Configuration Check
[0088] 1. Consistency of access service types
[0089] Determine whether the service type parameter configuration (F port: 85102) of the customer interfaces of the source and destination nodes is consistent. If they are inconsistent, output: The service configurations of the xxx (network element + board label) board and the xxx (network element + board label) board are inconsistent.
[0090] Troubleshooting suggestion: Please check whether the access service type configuration on the boards is consistent.
[0091] 2. Consistency of service mapping
[0092] Determine whether the board service mapping parameter configuration (F port: 85179) of the source node and the relay node is consistent. If they are inconsistent, output: xxx (network element + board label) board, xxx (network element + board label) board... service configurations are inconsistent.
[0093] Troubleshooting suggestion: Please check whether the service mapping configuration of the boards is consistent.
[0094] 3. Consistency of Forward Error Correction (FEC)
[0095] Check whether the FEC parameter configuration (F port: 84947) of the board line ports of the source node, destination node, and relay node is consistent. If they are inconsistent, output: xxx (network element + board label) board, xxx (network element + board label) board... FEC configurations are inconsistent, and list the specific values of the board ports. When determining the consistency of FEC, it is necessary to distinguish between boards.
[0096] Troubleshooting suggestion: Please check the FEC mode configuration of the boards.
[0097] 4. Consistency of wavelength settings
[0098] If the wavelengths (F port: 85207) of the service board line ports of the source node and relay node are inconsistent, output: xxx (network element + board label) board, xxx (network element + board label) board... wavelengths are inconsistent.
[0099] Troubleshooting suggestion: Please check the board wavelength tuning configuration.
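The four checks in Diagnostic Rule 1 share one pattern: collect the same parameter from several boards and report an inconsistency if the values differ. A minimal Python sketch of that pattern follows. The function name, the board labels, and the FEC mode strings are hypothetical, and the message wording only approximates the output format described above.

```python
def check_consistency(item, values_by_board):
    """Compare one configuration item across boards.

    values_by_board maps a board label (network element + board label)
    to the value configured on that board. Returns None when all values
    agree, otherwise a message listing each board and its value.
    """
    if len(set(values_by_board.values())) <= 1:
        return None
    listing = ", ".join(
        f"{board} board = {value}" for board, value in values_by_board.items()
    )
    return f"{item} configuration is inconsistent: {listing}"


# Hypothetical board labels and FEC modes, for illustration only.
consistent = check_consistency("FEC", {"NE1-OTU1": "RS-FEC", "NE2-OTU3": "RS-FEC"})
message = check_consistency("FEC", {"NE1-OTU1": "RS-FEC", "NE2-OTU3": "SD-FEC"})
```

The same helper could be called once per item (access service type, service mapping, FEC, wavelength), each followed by its own troubleshooting suggestion.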
[0100] Diagnostic Rule 2: OTS Layer Alarm
[0101] 1. Fiber breakage rules
[0102] The fiber breakage status can be obtained directly through the supplementary document link, so that more comprehensive rules covering more scenarios can be executed.
[0103] 2. Power fluctuation diagnosis rules
[0104] Condition 1) There is an optical power over-limit alarm on the OTS port; Condition 2) 24 / 72 hours before the alarm reporting time, the absolute value of the difference between the input / output power of the port reporting the alarm and the corresponding high / low threshold is >1; When conditions 1) and 2) are met, the power fluctuation diagnosis rule is activated.
[0105] Step a) Following the signal flow direction, query the input power of each port of all OTS segments 15 minutes after the alarm reporting time; if the maximum input power minus the minimum input power is greater than 2, then calculate, in reverse along the signal flow direction, the input power difference between each pair of adjacent ports to obtain the segment containing the first fluctuating optical amplifier (OA), output the diagnosis result, and exit the power fluctuation diagnosis rule; otherwise, proceed to step b).
[0106] Figure 8 shows a schematic diagram of the input power fluctuations of all OTS segment optical amplifiers according to embodiments of the present disclosure.
[0107] As shown in Figure 8, all OTS segments include four optical amplifiers 81 to 84. The input optical power at the 402 ports of all OTS segments is queried within 15 minutes of the alarm reporting time, the input power difference between adjacent ports is calculated, and the two ports with the largest input power difference are taken as the fluctuation source. For example, the fluctuation source is between optical amplifier 81 and optical amplifier 82.
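Step a) and the Figure 8 example can be sketched in Python as follows. This is one reading of the text under stated assumptions: each amplifier contributes a single input-power reading, the readings are listed in signal-flow order, and the threshold of 2 is compared against the spread of those readings. The function name and the sample values are hypothetical.

```python
def locate_fluctuation_source(input_powers, threshold=2.0):
    """Locate the pair of adjacent amplifiers bounding the fluctuation source.

    input_powers: list of (amplifier_label, input_power) in signal-flow order.
    Returns (upstream_label, downstream_label), or None when the spread of
    readings does not exceed the threshold (the rule then moves to step b).
    """
    values = [power for _, power in input_powers]
    if len(values) < 2 or max(values) - min(values) <= threshold:
        return None
    best = None
    # Walk adjacent pairs in reverse, from the downstream end to the source.
    for i in range(len(input_powers) - 1, 0, -1):
        diff = abs(values[i] - values[i - 1])
        if best is None or diff > best[0]:
            best = (diff, input_powers[i - 1][0], input_powers[i][0])
    return best[1], best[2]


# Hypothetical readings for optical amplifiers 81 to 84.
readings = [("OA81", 1.0), ("OA82", -3.5), ("OA83", -3.0), ("OA84", -3.2)]
source = locate_fluctuation_source(readings)
```

With these sample readings the largest adjacent difference lies between the first two amplifiers, which matches the example given for Figure 8.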
[0108] Step b) Following the signal flow direction, query the fiber attenuation of every segment at the alarm reporting time, 24 hours and 72 hours before the alarm reporting time, and at the current time.
[0109] The calculation proceeds in reverse segment by segment. If the fiber attenuation at the alarm reporting time minus the current fiber attenuation is greater than or equal to 2, or the fiber attenuation at the alarm reporting time minus the fiber attenuation 24 hours before the alarm reporting time is greater than or equal to 2, or the fiber attenuation at the alarm reporting time minus the fiber attenuation 72 hours before the alarm reporting time is greater than or equal to 2, then it is determined that the corresponding fiber attenuation increased at the alarm reporting time.
[0110] The calculation proceeds in reverse segment by segment. If the fiber attenuation at the alarm reporting time minus the current fiber attenuation is ≤-2, or the fiber attenuation at the alarm reporting time minus the fiber attenuation 24 hours before the alarm reporting time is ≤-2, or the fiber attenuation at the alarm reporting time minus the fiber attenuation 72 hours before the alarm reporting time is ≤-2, then it is determined that the corresponding fiber attenuation decreased at the alarm reporting time.
[0111] In this embodiment of the disclosure, fiber attenuation = OTS transmitter power - OTS receiver power - LAC / VOA attenuation value.
[0112] Fault diagnosis result: xxx (fiber label) attenuation increased / decreased; compared to 24 / 72 hours ago, attenuation increased / decreased by xx dB.
[0113] Troubleshooting suggestions: Please check for issues such as contaminated optical interfaces, bent pigtails, and deteriorated optical cables.
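The attenuation formula and the comparison in step b) can be expressed as a short Python sketch. The function names are hypothetical, and testing the "increased" condition before the "decreased" condition is an assumption, since the text does not say which takes precedence when both could apply.

```python
def fiber_attenuation(tx_power, rx_power, lac_voa_attenuation):
    """Fiber attenuation = OTS transmitter power - OTS receiver power
    - LAC / VOA attenuation value."""
    return tx_power - rx_power - lac_voa_attenuation


def classify_attenuation_change(at_alarm, current, before_24h, before_72h,
                                threshold=2.0):
    """Compare the attenuation at alarm time with the three reference values."""
    deltas = [at_alarm - ref for ref in (current, before_24h, before_72h)]
    if any(d >= threshold for d in deltas):
        return "increased"
    if any(d <= -threshold for d in deltas):
        return "decreased"
    return "unchanged"


attenuation = fiber_attenuation(tx_power=3.0, rx_power=-10.0, lac_voa_attenuation=1.0)
change = classify_attenuation_change(attenuation, 9.5, 10.0, 11.5)
```

In a full diagnosis this classification would be repeated for each fiber segment, working backwards along the signal flow.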
[0114] Diagnostic Rule 3: OMS Alarm
[0115] 1. Power fluctuation diagnosis rules
[0116] 1) Covers power fluctuation diagnosis rules and alarm diagnosis rules reported at the OTS layer.
[0117] 2) Based on the OTS layer diagnostic rules, further analyze the internal loss between the multiplexing / demultiplexing devices and the OA (OMS link in the routing) within the network element. Sources of internal loss: pigtail + optical port. Whether it's pigtail or interface degradation, the final result is an increase in the difference between the output power of the optical port corresponding to the pigtail head and the input power of the optical port corresponding to the pigtail tail.
[0118] If the internal loss at the time of alarm reporting minus the internal loss 24 hours prior to alarm reporting is greater than 1, or if the internal loss at the time of alarm reporting minus the internal loss 72 hours prior to alarm reporting is greater than 1, then the internal loss is determined to be abnormal.
[0119] The diagnosis result is: abnormal internal loss of xxx (fiber optic tag), with attenuation increasing / decreasing by xx dB compared to 24 / 72 hours ago.
[0120] Troubleshooting suggestions: Please check for issues such as contaminated optical interfaces, bent pigtails, and deteriorated optical cables.
[0121] 2. Fault in the multiplexing / demultiplexing board
[0122] If there is an alarm indicating that the AWG operating temperature (°C) exceeds the limit, the module fails, or the module communication fails, and the difference between the alarm time and the service failure time is within 15 minutes, the diagnostic result will be output as: xxx (network element + board label) board hardware failure.
[0123] Troubleshooting suggestion: Please replace the multiplexing / demultiplexing board.
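The board-failure rule above combines an alarm type check with a 15-minute correlation window. A minimal sketch follows; the alarm code strings and the function name are hypothetical, and treating the window as symmetric around the service failure time is an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical alarm codes standing in for the three alarms named in the rule.
BOARD_ALARMS = {
    "AWG_TEMPERATURE_OUT_OF_RANGE",
    "MODULE_FAILURE",
    "MODULE_COMMUNICATION_FAILURE",
}


def is_board_hardware_failure(alarm_type: str, alarm_time: datetime,
                              service_failure_time: datetime,
                              window: timedelta = timedelta(minutes=15)) -> bool:
    """True if a qualifying board alarm lies within the window of the service failure."""
    return alarm_type in BOARD_ALARMS and abs(alarm_time - service_failure_time) <= window


failure_at = datetime(2025, 1, 1, 12, 0)
hit = is_board_hardware_failure("MODULE_FAILURE", failure_at - timedelta(minutes=10), failure_at)
```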
[0124] Diagnostic Rule 4: OCH Layer Alarm
[0125] 1. Single-board operation status diagnosis
[0126] 1) Query the current and historical alarms at the A and Z ends of the OCH service within the last 2 hours. If the board hosting the originating end (port 454, port 407, port 448) has two or more board online alarms and, within the 15 minutes before a board online alarm was generated, the other end (port 454, port 407, port 448) reported an input no-light alarm, then proceed to 2); otherwise, exit.
[0127] 2) Check the current performance of the originating board and whether the temperature at the probe point (as shown below) exceeds 55℃:
[0128] If the temperature is greater than or equal to 55℃, the diagnostic result will be output as: xxx(network element + board label) The ambient temperature of the board is too high. The fault repair suggestion is: Please check the ambient temperature of the board. Otherwise, the diagnostic result will be output as: xxx(network element + board label) The board has a software or hardware fault. The fault repair suggestion is: Please replace the board.
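The two-stage single-board check above (alarm correlation first, then the 55℃ temperature test) can be sketched as below. The function name and the data shapes are assumptions; the alarm lists are taken to be already limited to the 2-hour query range.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def diagnose_board(board_online_alarms: List[datetime],
                   peer_no_light_alarms: List[datetime],
                   probe_temperature_c: float) -> Optional[str]:
    """Return a diagnosis string, or None when the rule exits at stage 1)."""
    window = timedelta(minutes=15)
    if len(board_online_alarms) < 2:
        return None
    # Stage 1: a peer no-light alarm must precede some board online alarm by at most 15 minutes.
    correlated = any(
        timedelta(0) <= online - no_light <= window
        for online in board_online_alarms
        for no_light in peer_no_light_alarms
    )
    if not correlated:
        return None
    # Stage 2: temperature at the probe point decides between the two results.
    if probe_temperature_c >= 55.0:
        return "ambient temperature of the board is too high"
    return "board has a software or hardware fault"
```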
[0129] 2. Power fluctuation diagnosis
[0130] 1) Covers diagnostic rules for power limit exceeding alarms reported at the OMS layer.
[0131] 2) Based on the diagnostic rules of the OMS layer, further analyze the internal loss between the line-side board and the multiplexing / demultiplexing board.
[0132] If the internal loss at the time of alarm reporting minus the internal loss 24 hours prior to alarm reporting is greater than 1, or if the internal loss at the time of alarm reporting minus the internal loss 72 hours prior to alarm reporting is greater than 1, then the internal loss is determined to be abnormal, and the diagnostic result is output: xxx (fiber optic tag) internal loss abnormal, with the attenuation increasing / decreasing by xx dB compared to 24 / 72 hours prior. Fault repair suggestion: Please check for issues such as optical interface contamination, pigtail bending, and optical cable deterioration.
[0133] Diagnostic Rule 5: ODU and OAC Layer Alarms
[0134] ODU and OAC are the client layers of OCH, and each is composed of multiple OCH segments. When an alarm is detected at the ODU or OAC layer, the diagnosis can drill down to the OCH layer and reuse the diagnostic rules of the OCH layer.
[0135] Diagnostic Rule 6: Impact of Optical Layer Failures (Connectivity and Performance Degradation, including Alarms) on the IP Layer. When an optical layer failure occurs, it should be automatically associated with the corresponding IP+optical cross-layer link, with the IP layer service path (LDP / BE / RSVP-TE / SR-TE tunnel / SR / Segment Routing over IPv6 (SRv6) policy), and with the L2 and L3 VPN services, and the root cause of the failure and fault repair suggestions should be output.
[0136] Figure 9 shows a schematic diagram of optical network service fault diagnosis rules according to an embodiment of the present disclosure.
[0137] As shown in Figure 9, the optical network service fault diagnosis rules include alarm correlation analysis to obtain root cause alarms, alarm layering, scenarios with inaccurate correlation localization, and customized rules based on scenarios. For OAC, fault delimitation can be performed to check whether the fault is within the boundary; if it is outside the boundary, the diagnosis result may be a third-party fault. For basic configuration checks, the correctness of the service type configuration, service mapping, FEC mode configuration, wavelength tuning, and single-board hardware working status can be checked. When analyzing alarms, single-board alarms on the service path and alarms of network elements associated with the service path can be analyzed. Single-board alarms on the service path include OTS layer alarms, OMS layer alarms, OCH layer alarms, ODU layer alarms, and OAC layer alarms. OTS layer alarms include power limit exceeding alarms and OA status abnormal alarms. Power limit exceeding alarms can be diagnosed according to the power fluctuation diagnosis rules; for OA status abnormal alarms, the diagnosis results include hardware faults and system debugging failures. OMS layer alarms include power limit exceeding alarms and multiplexer / demultiplexer board status abnormal alarms. Power limit exceeding alarms can be diagnosed according to the power fluctuation diagnosis rules; for multiplexer / demultiplexer board status abnormal alarms, the diagnosis results include hardware faults. OCH layer alarms include power limit exceeding alarms, signal degradation alarms, and signal interruption alarms. Each of these alarms can be diagnosed according to the power fluctuation diagnosis rules, and may also be caused by a board failure or an optical module failure; a board failure may be a hardware failure or a software failure. The diagnosis results corresponding to ODU layer alarms include ODUk cross-connection loss and ODUk frame loss. The diagnosis results corresponding to OAC layer alarms include input power detection, interfacing setting problems, board failure, and optical module failure, where board failures include hardware failures and software failures. Alarms of network elements associated with the service path include fan alarms, power supply alarms, clock alarms, and monitoring alarms.
[0138] In some embodiments, the fault phenomenon can be obtained through the following steps: acquiring anomaly monitoring information and user interaction information, wherein the interaction information is information supplemented by the user to the anomaly monitoring information; performing semantic recognition on the anomaly monitoring information and the interaction information to obtain the fault phenomenon.
[0139] Anomaly monitoring information refers to the information detected by the anomaly monitoring system. Users can interact with the AI robot to supplement the anomaly monitoring information, thereby improving the accuracy of fault diagnosis. Fault phenomena are the result of semantic recognition of the anomaly monitoring information and the interactive information.
[0140] In some embodiments, using the first large model to perform cross-layer fault diagnosis on the hybrid network based on the fault phenomenon to obtain the diagnosis result (i.e., step S502) includes: obtaining a cross-layer network knowledge graph, wherein the cross-layer network knowledge graph is a graph that associates the natural knowledge of each network in the hybrid network; and, based on the fault phenomenon and the cross-layer network knowledge graph, using the first large model to perform cross-layer fault diagnosis on the hybrid network to obtain the diagnosis result.
[0141] A cross-layer network knowledge graph is a structured block of knowledge data. Knowledge describes the data characteristics of the knowledge graph, and the connections between knowledge points represent its structural features. Based on cross-layer network knowledge graphs, cross-layer network fault management and root cause analysis can be achieved, such as fault aggregation, root alarm analysis, and rapid fault diagnosis and location. It can also be used for hazard analysis, such as querying port-related services based on batch port queries. For aggregated port scenarios, graph-based relational analysis is required. Furthermore, multi-layer resource queries and knowledge question answering can be performed based on cross-layer network knowledge graphs. Utilizing cross-layer network knowledge graphs can reduce the complexity of fault diagnosis.
[0142] Figure 10 illustrates a schematic diagram of a cross-layer network knowledge graph of IP networks and optical networks according to an embodiment of the present disclosure.
[0143] As shown in Figure 10, the cross-layer network knowledge graph of IP and optical networks includes the bearer service layer, tunnel layer, protocol layer, electrical layer, optical layer, and physical layer. Different layers possess different levels of knowledge; knowledge within the same layer is interconnected; knowledge across different layers is at least partially related; and knowledge from different layers can be linked across one or more layers. These relationships can improve the accuracy of fault location during cross-layer fault diagnosis.
[0144] In this embodiment of the disclosure, the first large model automatically learns fault diagnosis rules based on the cross-layer network knowledge graph. Thus, after the fault phenomenon is input into the first large model, the first large model can automatically complete the cross-layer fault diagnosis based on the fault phenomenon and the cross-layer network knowledge graph, and output the diagnosis result.
[0145] In some embodiments, using the first large model to perform cross-layer fault diagnosis on the hybrid network based on the fault phenomenon and the cross-layer network knowledge graph to obtain the diagnosis result includes: the first large model obtains fault information based on the fault phenomenon; the fault information is matched in the cross-layer network knowledge graph to obtain target knowledge that matches the fault information, and resource information is obtained based on the fault information; and cross-layer fault diagnosis is performed on the hybrid network based on the resource information and the target knowledge to obtain the diagnosis result.
[0146] A fault phenomenon is the manifestation of a fault, while fault information is the text describing that phenomenon.
[0147] In some embodiments, fault phenomena can be input using an AI robot or through the anomaly monitoring system. For example, an AI robot can interact with a user, who describes the fault phenomenon during the interaction. The first large model identifies the fault phenomenon, obtains fault information, and then matches the fault information in the cross-layer network knowledge graph to obtain target knowledge that matches the fault information. This target knowledge consists of knowledge in the cross-layer network knowledge graph that is associated with the fault information and is useful for fault diagnosis. Resource information can also be obtained based on the fault information. Cross-layer fault diagnosis of the hybrid network is then performed based on the resource information and the target knowledge to obtain the diagnosis result. Resource information includes, but is not limited to, diagnostic instances. Using resource information and target knowledge for cross-layer fault diagnosis of the hybrid network can improve the accuracy of the diagnosis result.
[0148] In some embodiments, a cross-layer network knowledge graph is obtained through the following steps: cleaning the documents of the hybrid network to obtain cleaned documents; extracting knowledge from the cleaned documents using a second model based on pre-defined prompts, wherein the second model is a pre-trained model for acquiring knowledge in the hybrid network; performing deduplication and alignment processing on the extracted knowledge to obtain the relationships between different knowledge; and constructing a cross-layer network knowledge graph based on the knowledge and the relationships between knowledge.
[0149] For hybrid networks combining IP and optical networks, the documentation includes both IP network documentation and optical network documentation. Cleaning methods include, but are not limited to, format cleaning, that is, removing documentation that the second large model cannot recognize and retaining only the recognizable documentation.
[0150] In this embodiment of the disclosure, the prompt is pre-defined based on the hybrid network. The prompt can be knowledge related to IP networks or knowledge related to optical networks.
[0151] The second large model is used to extract knowledge from the cleaned documents, and the extracted knowledge is deduplicated. This reduces the amount of data to be processed in later stages and shortens the time required to build the cross-layer network knowledge graph.
[0152] After deduplicating the extracted knowledge, the remaining knowledge is aligned to obtain the relationships between different pieces of knowledge. A cross-layer network knowledge graph is then constructed based on these relationships.
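The deduplication and alignment stage described above can be illustrated with a small sketch. It assumes that extracted knowledge arrives as (head, relation, tail) triples and that alignment is done through a hand-written alias table; both are assumptions made for illustration, as are the alias entries and function names, and the extraction by the second large model is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical alias table mapping surface forms to one canonical entity name.
ALIASES = {"och": "OCH", "optical channel": "OCH", "vte link": "VTE Link"}


def canonical(name: str) -> str:
    """Normalize whitespace and case, then map known aliases to a canonical name."""
    cleaned = " ".join(name.split())
    return ALIASES.get(cleaned.lower(), cleaned)


def build_graph(triples: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Align entity names, drop duplicate triples, and return an adjacency map."""
    graph: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for head, relation, tail in triples:
        # Storing edges in a set removes triples that become identical after alignment.
        graph[canonical(head)].add((relation.strip().lower(), canonical(tail)))
    return dict(graph)


extracted = [("Optical Channel", "carries", "VTE link"), ("OCH ", "Carries", "VTE Link")]
graph = build_graph(extracted)
```

In this example both extracted triples collapse into a single edge once their entity names are aligned.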
[0153] In some embodiments, using the first large model to perform cross-layer fault diagnosis on the hybrid network to obtain the diagnosis result includes: the first large model generates diagnostic steps for cross-layer fault diagnosis based on the fault phenomenon, wherein each diagnostic step includes one or more application programming interfaces (APIs); the APIs are executed according to the diagnostic steps, and the diagnosis result is obtained based on the execution results of the APIs.
[0154] When performing cross-layer fault diagnosis on a hybrid network using a large model, the diagnostic results are determined through diagnostic steps. Each diagnostic step includes at least one API. The APIs are called according to the diagnostic steps, and the execution results of each API are summarized to obtain the diagnostic results.
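The step-by-step API execution described above can be sketched as follows. The API names, their return values, and the summary wording are all hypothetical placeholders; in the disclosure the steps are generated by the first large model, which is not modeled here.

```python
from typing import Callable, Dict, List

# Hypothetical diagnostic APIs; each returns a small result record.
API_REGISTRY: Dict[str, Callable[[], dict]] = {
    "ping_service": lambda: {"ok": False},
    "query_och_alarms": lambda: {"ok": False, "alarm": "input no light"},
}


def run_diagnosis(steps: List[List[str]]) -> List[dict]:
    """Execute the APIs of each diagnostic step in order and collect their results."""
    results = []
    for step_no, api_names in enumerate(steps, start=1):
        for name in api_names:
            outcome = API_REGISTRY[name]()
            results.append({"step": step_no, "api": name, **outcome})
    return results


def summarize(results: List[dict]) -> str:
    """Summarize the execution results of all APIs into one diagnosis string."""
    failed = [r["api"] for r in results if not r["ok"]]
    return "no fault found" if not failed else "abnormal: " + ", ".join(failed)


results = run_diagnosis([["ping_service"], ["query_och_alarms"]])
summary = summarize(results)
```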
[0155] In this embodiment of the disclosure, cross-layer fault knowledge question answering can be implemented based on the first large model alone, or based on the first large model together with the cross-layer network knowledge graph.
[0156] Figure 11 shows a flowchart of a cross-layer fault diagnosis method based on a first large model and a cross-layer network knowledge graph according to an embodiment of the present disclosure.
[0157] As shown in Figure 11, the cross-layer fault diagnosis method based on the first large model and the cross-layer network knowledge graph can realize online fault diagnosis, and specifically includes the following steps S1101 to S1106.
[0158] In step S1101, the fault phenomenon is obtained through the anomaly monitoring system and / or user dialogue interaction.
[0159] The anomaly monitoring system can detect faults in hybrid networks. Users can interact with an AI robot, which uses a question-and-answer format to identify faults.
[0160] In step S1102, the first large model is used to identify the fault phenomenon and obtain fault information.
[0161] Based on the identification results of the fault phenomena, fault information is determined, which is the basic information for diagnosing faults.
[0162] In step S1103, the fault information is matched in the cross-layer network knowledge graph to obtain the target knowledge that matches the fault information.
[0163] Since the amount of data in the cross-layer network knowledge graph is large, and not all the knowledge in the cross-layer network knowledge graph is used for this fault diagnosis, the fault information is matched with the cross-layer network knowledge graph to filter out the knowledge that is useful for this fault diagnosis, thereby reducing the amount of data for fault diagnosis and improving the efficiency of fault diagnosis.
[0164] According to embodiments of this disclosure, fuzzy matching can be used for matching. Based on the fuzzy matching interfaces provided by various resource types in the configuration file, fuzzy matching capabilities are provided to obtain relevant matching information for the client.
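As one possible illustration of fuzzy matching, the sketch below uses the standard-library difflib module to match fault text against knowledge node names. The disclosure does not specify the matching algorithm or the interfaces in the configuration file, so the function, the node names, and the cutoff value are assumptions.

```python
import difflib
from typing import List


def fuzzy_match(fault_text: str, knowledge_names: List[str],
                limit: int = 3, cutoff: float = 0.6) -> List[str]:
    """Return knowledge node names whose text is similar to the fault information."""
    # Compare in lower case, then map the hits back to the original names.
    lowered = {name.lower(): name for name in knowledge_names}
    hits = difflib.get_close_matches(fault_text.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


# Hypothetical knowledge node names; the query contains a typing error.
nodes = ["OCH input no light", "Fan alarm", "Clock alarm"]
matches = fuzzy_match("OCH input no lihgt", nodes)
```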
[0165] In step S1104, resource information is obtained based on the fault information.
[0166] Resource information related to the fault is obtained from IP and optical networks based on the fault information. Resource information includes, but is not limited to, resource instances.
[0167] In step S1105, cross-layer fault diagnosis is performed based on resource information and target knowledge to obtain diagnosis results.
[0168] During fault diagnosis, the first large model uses the resource information and the target knowledge for cross-layer fault diagnosis. Based on the nodes related to the matched target knowledge and on the resource instances, it investigates all possible causes and performs automatic diagnosis. In the automatic diagnosis process, the APIs can be determined first and then executed, and the results of each API can be summarized to output the diagnosis result. To further improve the accuracy of the diagnosis, the first large model can also combine the cross-layer network knowledge graph with the resource information and target knowledge for further fault diagnosis.
[0169] In step S1106, the diagnostic results are presented.
[0170] In this embodiment of the disclosure, the diagnostic results can be presented via display or voice broadcast. The diagnostic results can also be recorded via a storage module.
[0171] In some embodiments, the construction and maintenance of cross-layer network knowledge graphs can be performed offline. The following description uses a hybrid network consisting of IP and optical networks as an example to illustrate the construction and maintenance of cross-layer network knowledge graphs, specifically including the following steps S1110 to S1150.
[0172] In step S1110, the document data is cleaned to obtain the cleaned document data.
[0173] The documentation includes, but is not limited to, IP+optical cross-layer diagnostic rules, IP+optical operation and maintenance documents, and digital twin simulation exercises. Cross-layer diagnostic rules can be manually defined.
[0174] In step S1120, knowledge extraction is performed using the second large model.
[0175] The second large model is used to extract knowledge from the cleaned documents.
[0176] In step S1130, an IP+optical cross-layer network knowledge graph is constructed.
[0177] The extracted knowledge is deduplicated and aligned to obtain the relationships between different pieces of knowledge. Based on the knowledge and the relationships between them, an IP+optical cross-layer network knowledge graph is constructed. The IP+optical cross-layer network knowledge graph can be manually reviewed, corrected, and supplemented.
[0178] In step S1140, an executable diagnostic API is generated.
[0179] A diagnostic scheme is determined by combining the extracted knowledge with an IP+optical knowledge graph design, and an executable diagnostic API is generated based on the diagnostic scheme. The IP+optical diagnostic scheme is determined based on the fault scenario and fault diagnosis rules, and includes a diagnostic process, with each diagnostic process including at least one API.
[0180] In step S1150, a knowledge graph of IP+optical cross-layer faults is constructed based on the confirmed API.
[0181] This disclosure also allows for the management of IP+optical cross-layer network knowledge graphs.
[0182] In this embodiment of the disclosure, the hybrid network architecture shown in Figure 4 can be used to perform fault diagnosis using a large model, or the hybrid network architecture shown in Figure 3 can be used to perform fault diagnosis using multiple sub-models.
[0183] When using the hybrid network architecture shown in Figure 3 for fault diagnosis, the first large model includes i+1 sub-models, each corresponding to a network. For example, in a hybrid network composed of IP and optical networks, the first large model includes two sub-models: the IP sub-model and the optical sub-model.
[0184] Using the first large model to perform cross-layer fault diagnosis on the hybrid network based on the fault phenomenon to obtain the diagnosis result (i.e., step S502) includes: obtaining the i-th diagnosis result of the i-th network using the i-th sub-model based on the fault phenomenon; obtaining the (i+1)-th diagnosis result of the (i+1)-th network using the (i+1)-th sub-model based on the fault phenomenon and the i-th diagnosis result; and determining the final diagnosis result by combining the i-th diagnosis result and the (i+1)-th diagnosis result, where i is an integer greater than or equal to 1.
[0185] Figure 12 shows another flowchart of a cross-layer fault diagnosis method according to an embodiment of the present disclosure.
[0186] As shown in Figure 12, the fault diagnosis method includes the following steps S1201 to S1209.
[0187] In step S1201, the user sends a fault diagnosis command.
[0188] The user sends a fault diagnosis command to the coordinator, which coordinates between the IP network and the optical network. Fault diagnosis commands cover user complaints, IP network service interruptions, and packet loss.
[0189] In step S1202, the coordinator sends the fault information to the IP controller.
[0190] An IP controller is a controller corresponding to an IP network, used to analyze and locate faults in the IP network. In this embodiment of the disclosure, the sub-model corresponding to the IP network is the first sub-model.
[0191] After receiving the fault diagnosis command, the coordinator sends the fault phenomenon to the IP controller and requests the IP controller to perform fault diagnosis.
[0192] In step S1203, the IP controller performs fault diagnosis of the IP network based on the fault phenomenon using the first sub-model to obtain the first fault diagnosis result.
[0193] The first sub-model is obtained after learning the IP network fault diagnosis rules from the knowledge base. The first sub-model obtains the first fault diagnosis result based on the fault phenomenon. The first fault diagnosis result includes the fault point (IP link) of the IP network and the cause of the fault.
[0194] In step S1204, the first fault diagnosis result is returned to the coordinator.
[0195] In step S1205, the coordinator requests the optical controller to locate the fault point of the optical network and analyze the cause of the fault based on the first fault diagnosis result and fault phenomenon.
[0196] In step S1206, the optical controller uses the second sub-model to perform fault diagnosis of the optical network and obtains the second fault diagnosis result.
[0197] The second sub-model is obtained after learning the optical network fault diagnosis rules from the knowledge base. Based on the fault phenomenon, the second sub-model obtains the second fault diagnosis result, which includes the fault point of the optical network and the cause of the fault.
[0198] In step S1207, the optical controller returns the second fault diagnosis result to the coordinator.
[0199] In step S1208, the coordinator combines the first fault diagnosis result and the second fault diagnosis result to determine the diagnosis result.
[0200] The diagnostic results include, but are not limited to, the location of the fault, the root cause of the fault, and repair recommendations.
[0201] In step S1209, the diagnostic results are returned to the user.
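The coordinator flow in steps S1201 to S1209 can be sketched as a function that calls the IP controller first and passes its result to the optical controller. The two stub controllers, the result fields, and the way the two results are combined are assumptions for illustration; the sub-models themselves are not modeled.

```python
from typing import Callable, Dict


def coordinate(fault: str,
               ip_diagnose: Callable[[str], Dict[str, str]],
               optical_diagnose: Callable[[str, Dict[str, str]], Dict[str, str]]) -> Dict[str, object]:
    """Run IP diagnosis, feed its result to optical diagnosis, and combine both."""
    first = ip_diagnose(fault)               # S1202 to S1204
    second = optical_diagnose(fault, first)  # S1205 to S1207
    return {"fault": fault, "ip": first, "optical": second}  # S1208


def ip_stub(fault: str) -> Dict[str, str]:
    # Stands in for the IP controller and the first sub-model.
    return {"fault_point": "IP link C-E", "cause": "link down"}


def optical_stub(fault: str, first: Dict[str, str]) -> Dict[str, str]:
    # Stands in for the optical controller and the second sub-model.
    return {"fault_point": "fiber under " + first["fault_point"], "cause": "fiber break"}


result = coordinate("service interruption", ip_stub, optical_stub)
```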
[0202] In some embodiments, using the first large model to perform cross-layer fault diagnosis on the hybrid network based on the fault phenomenon to obtain the diagnosis result (i.e., step S502) includes: obtaining the i-th diagnosis result of the i-th network using the i-th sub-model based on the fault phenomenon; obtaining the j-th diagnosis result of the j-th network using the j-th sub-model based on the fault phenomenon; and combining the i-th diagnosis result and the j-th diagnosis result to determine the final diagnosis result of the fault phenomenon.
[0203] The i-th sub-model and the j-th sub-model are two distinct sub-models within the first large model. The i-th sub-model and the j-th sub-model correspond to the i-th network and the j-th network, respectively.
[0204] For example, the i-th sub-model is the IP sub-model, the j-th sub-model is the optical sub-model, the i-th network is the IP network, and the j-th network is the optical network. Based on the fault phenomenon, the diagnosis result of the IP network is obtained using the IP sub-model; based on the fault phenomenon, the diagnosis result of the optical network is obtained using the optical sub-model; the final diagnosis result is determined by combining the diagnosis results of the IP network and the optical network.
[0205] In some embodiments, before obtaining the i-th diagnosis result of the i-th network using the i-th sub-model based on the fault phenomenon, the method further includes: determining the fault scenario based on the fault phenomenon. Fault diagnosis based on the fault scenario can improve the diagnostic efficiency of the first large model.
[0206] Fault scenarios include interruption-type (connectivity) faults and packet loss-type faults.
[0207] The IP+optical cross-layer service connectivity fault diagnosis rules include: initiating a ping connectivity test; if the service is not reachable, checking whether the tunnel (SR Policy / SR-BE) carrying the service is connected. If both the primary and backup SR Policies are down, the service switches to SR-BE for escape; otherwise, a trace route command is initiated to find the faulty node in the IP network. If the fault point is at router C, the VTELINK between CE is checked for faults (if VTELINK2 is in a down state, it indicates an interruption of the optical layer OCH connection); VTELINK2 is drilled down to check the UNI-LINK and the optical layer OAC; the optical controller locates the specific fault point (e.g., a fiber break point) and the cause of the fault via OTDR, and returns this information to the IP controller for display.
[0208] Figure 13 illustrates a flowchart of cross-layer service connectivity fault diagnosis for IP networks and optical networks according to embodiments of the present disclosure.
[0209] As shown in Figure 13, the steps for diagnosing IP+optical cross-layer service connectivity faults include the following steps S1301 to S1318.
[0210] In step S1301, a ping connectivity detection signal is initiated.
[0211] In step S1302, the ping result is obtained; if the ping result is successful, the process ends; if the ping result is unsuccessful, step S1303 is executed. Ping success and ping failure are the results of the ping connectivity test.
[0212] In step S1303, the route is resolved.
[0213] Route resolution includes looking up the forwarding table and tracing the path. When looking up the forwarding table, you can query the forwarding table via A->Z or Z->A (uplink and downlink). For IPv4, you can query the private network routing table, the forwarding table, and the tunnel layer. For IPv6, you can query the private network routing table and the IPv6 forwarding route. For L2VPN, you can query for the existence of an MSPW via a PW, or query the forwarding table.
[0214] In step S1304, test whether the network connection (IP ping) is successful.
[0215] In step S1305, the IP ping result is obtained. If the IP ping result is successful, the process ends; if the IP ping result is unsuccessful (network connection not established), step S1306 is executed.
[0216] In step S1306, IP path tracing is performed. The IP path tracing signal carries the source address.
[0217] If the A->Z ping fails, then perform an A->Z IP Trace. If the Z->A ping fails, then perform a Z->A IP Trace.
[0218] In step S1307, it is determined whether the Test Message System (TMS) between the faulty node and the destination has been interrupted. If yes, step S1308 is executed; otherwise, step S1309 is executed.
[0219] In step S1308, optical network diagnostics are performed.
[0220] In step S1309, the status of the laser (IPG) is checked.
[0221] In step S1309, the IPG status and configuration can be checked. The configuration includes the IP+optical cross-layer configuration; the configuration steps and instructions are detailed in Table 3.
[0222] Table 3 shows the configuration steps and instructions for IP+optical cross-layer.
[0223] In step S1310, it is determined whether the TMS corresponding to the associated link has lost light. If yes, step S1311 is executed; otherwise, the process ends.
[0224] In step S1311, optical network diagnostics are performed.
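The IP ping branch of Figure 13 (steps S1304 to S1311) can be summarized as a small decision function. The action labels and parameter names are hypothetical, and ending the branch after the optical network diagnosis of step S1308 is an assumption, since the text does not state what follows that step.

```python
from typing import List


def ip_ping_branch(ip_ping_ok: bool, tms_interrupted: bool,
                   associated_tms_abnormal: bool) -> List[str]:
    """Return the ordered actions taken after the IP ping test of step S1304."""
    if ip_ping_ok:
        return []                                # S1305: ping succeeded, process ends
    actions = ["ip_trace"]                       # S1306
    if tms_interrupted:                          # S1307
        actions.append("optical_diagnosis")      # S1308
        return actions
    actions.append("check_ipg")                  # S1309
    if associated_tms_abnormal:                  # S1310
        actions.append("optical_diagnosis")      # S1311
    return actions
```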
[0225] After step S1303, step S1312 and the subsequent steps can also be performed in parallel.
[0226] In step S1312, a Label Switched Path (LSP) Trace test is performed.
[0227] During LSP Trace testing, the LSP Trace carries the source address (A<->Z).
[0228] In step S1313, determine whether the LSP Trace result is connected. If yes, the process ends; otherwise, proceed to step S1314.
[0229] In step S1314, query the forwarding table from the faulty node to the destination.
[0230] If the A->Z Trace fails, query the forwarding table from the fault point to the Z end.
[0231] If the Z->A Trace fails, query the forwarding table from the fault point to the A end.
[0232] In step S1315, check the configuration of the faulty node.
[0233] For the service bearer tunnel SR-BE, determine whether SR is enabled in IS-IS and whether the Prefix-SID is included in the IS-IS interface address. For LDP, check the LDP configuration (whether the interface under IS-IS / OSPF is enabled in LDP). For SRv6-BE, determine whether SRv6 is enabled globally, whether a Locator is defined, whether SRv6 is enabled in IS-IS, and whether the SRv6 Locator is configured.
[0234] In step S1316, a faulty node alarm is detected.
[0235] Check the faulty node hardware, resource alarms, and CPS alarms (CPU and corresponding ports).
[0236] In step S1317, it is determined whether the TMS between the fault node and the destination has lost light. If not, the process ends; if so, step S1318 is executed.
[0237] In step S1318, optical network diagnostics are performed.
[0238] The IP+optical cross-layer packet loss fault diagnosis rules include: TWAMP / IOAM quality detection is initiated for the service; if excessive latency or jitter occurs, the IP controller returns the end-to-end (E2E) latency of the SR Policy path (the optical-layer Shared Risk Link Group (SRLG) and latency information has been synchronized in advance); if the latency or jitter of a certain VTE Link is too high, the optical controller returns the specific optical-layer nodes and links with excessive latency along OCH->OMS->OTS.
[0239] If excessive packet loss occurs, the specific IP node / VTE Link is located via IOAM for drill-down analysis. The optical controller then drills down through the OAC and OCH->OMS->OTS layers of the optical network and returns the exact location of any abnormal fiber attenuation / optical power.
[0240] The optical network of each VTE Link that the SR Policy passes through needs to be inspected and the result output.
[0241] Figure 14 illustrates a flowchart of cross-layer packet loss fault diagnosis for IP networks and optical networks according to embodiments of the present disclosure.
[0242] As shown in Figure 14, the steps for diagnosing IP+optical cross-layer packet loss faults include the following steps S1401 to S1418.
[0243] In step S1401, the path is restored.
[0244] Path restoration includes querying the forwarding table and restoring the path using trace.
[0245] To query the forwarding table, query the L3VPN / L3EVPN forwarding table (uplink and downlink) in both the A->Z and Z->A directions.
[0246] For IPv4, you can query the private network routing table or the forwarding table - tunnel layer.
[0247] For IPv6, you can query the private network routing table or query the IPv6 forwarding route - tunnel layer.
[0248] For L2VPN (SR-BE / LDP only), query for MSPW via PW or query the forwarding table - tunnel layer.
[0249] For L2EVPN, you can query I2-vpws or query the forwarding table - tunnel layer.
[0250] For IPv6, you can query the IPv6 forwarding route - tunnel layer.
[0251] In step S1402, check the port status.
[0252] Check the port status of network elements along the path. Ports include physical ports, physical sub-interfaces, SG, SG sub-interfaces, VEI, and VEI sub-interfaces. Ping the ports (available on the network side): ping 20 packets without packet loss. Configure traffic rate limiting (available on the user side). The following queries the physical ports; the last five types of interfaces will show their corresponding physical ports.
[0253] Packet loss: Port default queue or QoS packet loss (supported by some devices), CRC: cause location, optical power: cause location, port bandwidth utilization: query performance.
[0254] In step S1403, tunnel BFD alarm check is performed.
[0255] The tunnel BFD alarm check includes checking for tunnel-BFD oscillation alarms and historical alarms within a time frame, with a default time frame of 1 hour.
[0256] In step S1404, tunnel ping detection is performed.
[0257] When performing tunnel ping testing, you can ping 20 packets, A<->Z LSP-Ping.
[0258] In step S1405, check if there is packet loss. If yes, proceed to step S1406; otherwise, proceed to step S1410.
[0259] In step S1406, a TMS Ping test is initiated.
[0260] By using TMS Ping detection, the specific set of links experiencing packet loss can be identified.
[0261] In step S1407, determine whether there is packet loss or TMS loss of light. If yes, proceed to step S1408; otherwise, proceed to step S1409.
[0262] In step S1408, optical network diagnostics are performed.
[0263] In step S1409, check the hardware critical chip alarms, tunnel node hardware, resource alarms and CPS alarms.
[0264] Hardware critical chip alarms include chip alarms of network elements along the path, and CPS alarms include CPU and corresponding port alarms.
[0265] In step S1410, check the OSPF / ISIS neighbor disconnection history alarms.
[0266] First check the OSPF / ISIS configuration, then check the OSPF / ISIS alarms; an alarm is considered present when there are 5 or more disconnections within 1 hour.
[0267] In step S1411, it is determined whether there is an alarm.
[0268] Based on the inspection results in step S1410, determine whether there is an alarm. If yes, proceed to step S1412; otherwise, proceed to step S1414.
[0269] In step S1412, it is determined whether the TMS corresponding to the associated link has lost light. If yes, step S1413 is executed; otherwise, step S1409 is executed.
[0270] In step S1413, optical network diagnostics are performed.
[0271] In step S1414, check the OSPF / ISIS protocol status.
[0272] In step S1415, determine whether the status is normal. If yes, proceed to step S1416; otherwise, proceed to step S1412.
[0273] In step S1416, TWAMP performance is tested.
[0274] TWAMP performance includes link-level TWAMP performance.
[0275] In step S1417, it is determined whether the TMS of the link showing link-level TWAMP packet loss has lost light. If yes, step S1418 is executed; otherwise, the process ends.
[0276] In step S1418, optical network diagnostics are performed.
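The branching of steps S1405 to S1418 can be summarized in a short decision function. The sketch below is illustrative only: the observation keys and the returned action names are hypothetical, and the flap check simply encodes the "5 or more disconnections within 1 hour" condition of step S1410.

```python
from datetime import datetime, timedelta


def has_flap_alarm(disconnect_times, now, window=timedelta(hours=1), threshold=5):
    """Return True if at least `threshold` neighbor disconnections fall within the window."""
    recent = [t for t in disconnect_times if now - window <= t <= now]
    return len(recent) >= threshold


def next_action(obs):
    """Map check results (a dict of booleans, hypothetical keys) to the next action."""
    if obs["tunnel_ping_loss"]:                                   # S1405
        if obs["tms_ping_loss_or_los"]:                           # S1407
            return "optical_network_diagnostics"                  # S1408
        return "check_hardware_and_resource_alarms"               # S1409
    # S1411 (alarm present) and S1415 (status abnormal) both lead to S1412.
    if obs["igp_flap_alarm"] or not obs["igp_status_normal"]:
        if obs["tms_fault_on_igp_link"]:                          # S1412
            return "optical_network_diagnostics"                  # S1413
        return "check_hardware_and_resource_alarms"               # S1409
    if obs["tms_fault_on_twamp_loss_link"]:                       # S1417
        return "optical_network_diagnostics"                      # S1418
    return "end"
```

In this sketch, three different branches lead to optical network diagnostics, which reflects how the flow hands the fault over to the optical layer whenever a TMS fault accompanies an IP-layer symptom.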
[0277] In some embodiments, after performing cross-layer fault diagnosis on the hybrid network based on the fault phenomenon using the first large model to obtain the diagnosis result (i.e., step S502), the method further includes: obtaining suggestions for modifying the diagnosis steps based on the diagnosis result using the first large model. After determining the diagnosis result, the first large model can also propose modifications to the preceding diagnosis steps. These modifications include changing the order of steps, modifying the specific content and threshold values of a diagnosis, deleting steps, and adding new diagnosis steps and rules.
[0278] In this embodiment of the disclosure, based on the fault phenomenon, existing diagnostic steps in the knowledge base are queried, fault diagnosis steps for the same fault phenomenon are retrieved, and after deduplication and merging, diagnostic steps are automatically generated.
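The deduplication and merging of retrieved diagnostic steps can be sketched as below. The source does not specify how duplicates are recognized; this sketch assumes, purely for illustration, that two steps are duplicates when their text matches after normalizing case and whitespace.

```python
def merge_diagnostic_steps(step_lists):
    """Merge step lists retrieved for the same fault phenomenon.

    Duplicates are dropped and first-seen order is kept. Equality is decided
    on normalized text, which is an assumption of this sketch.
    """
    seen = set()
    merged = []
    for steps in step_lists:
        for step in steps:
            key = " ".join(step.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(step)
    return merged
```

For example, merging `["Check port status", "Tunnel ping"]` with `["check  port status", "Check BFD alarms"]` would keep three steps, with the first spelling of the duplicated step retained.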
[0279] In some embodiments, conversational interaction with the user takes place during cross-layer fault diagnosis. Involving the user and incorporating user feedback can improve the accuracy and efficiency of diagnosis. The interaction can take various forms, such as proactively reporting progress, proactively requesting instructions, or following the user's commands; this variety of interaction modes can enhance diagnostic efficiency.
[0280] It should be noted that the embodiments disclosed herein can conduct one or more rounds of dialogue and perform session management. If it is required to display fault diagnosis results in the form of text and tables, it can support the display of text and tables in formats such as Markdown.
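Displaying diagnosis results as a Markdown table can be sketched as follows. This is an illustrative helper, not part of the disclosed method; the function name and the row layout are hypothetical.

```python
def to_markdown_table(rows, columns):
    """Render a list of dict rows as a Markdown table with the given column order."""
    def cell(value):
        # Escape pipes so that cell text cannot break the table layout.
        return str(value).replace("|", "\\|")

    header = "| " + " | ".join(columns) + " |"
    separator = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(cell(row.get(col, "")) for col in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, separator, *body])
```

A front end that renders Markdown could display the returned string directly as a table of diagnosis steps and their results.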
[0281] For ease of understanding, the cross-layer fault diagnosis method provided in this disclosure embodiment is described below with reference to Figure 15. The method mainly involves a front end, a management service module, an intent classification service module, a fault diagnosis service module, a vector database, a Large Language Model (LLM) service module, an asynchronous state sharing service module, and a controller (UME). The front end and management service module can be a ChatBot front end and a ChatBot management service module, or other front-end and management service modules.
[0282] As shown in Figure 15, the cross-layer fault diagnosis method includes the following steps S1501 to S1537.
[0283] In step S1501, the user sends a fault report to the ChatBot front end.
[0284] In step S1502, the ChatBot frontend submits the fault phenomenon to the ChatBot management service module through the Representational State Transfer (REST) API.
[0285] In step S1503, the ChatBot management service module sends a query request to the intent classification service module to query the scenario corresponding to the fault phenomenon.
[0286] The ChatBot management service module sends a query request to the intent classification service module via the intent classification REST API to query the scenario corresponding to the fault phenomenon.
[0287] In step S1504, the intent classification service module returns the fault diagnosis scenario.
[0288] The intent classification service module determines the fault diagnosis scenario based on the fault phenomenon and returns the determined fault diagnosis scenario to the ChatBot management service module. For example, the intent classification service module returns "IP + optical cross-layer fault diagnosis scenario".
[0289] In step S1505, the ChatBot management service module sends the fault symptoms to the fault diagnosis service module.
[0290] The ChatBot management service module calls the REST API interface to send the fault symptoms and fault diagnosis scenarios to the fault diagnosis service module.
[0291] In step S1506, the fault diagnosis service module generates a session ID and returns the session ID to the ChatBot management service module.
[0292] In step S1507, the fault diagnosis service module starts the IP network fault diagnosis agent thread and the optical network fault diagnosis agent thread to carry out the diagnosis process.
[0293] In step S1508, the IP network fault diagnosis agent sends a query request to the vector database to obtain the diagnostic rules corresponding to the IP network.
[0294] A vector database can be a single database, or it can include IP network vector databases and optical network vector databases.
[0295] The IP network fault diagnosis agent queries the IP network vector database for corresponding diagnostic rules through "user questions".
[0296] In step S1509, the vector database returns diagnostic rules to the fault diagnosis service module.
[0297] The diagnostic rules include prerequisites, additional questions, possible causes, and diagnostic steps.
[0298] It should be noted that if one or more cross-layer links are faulty, the optical network fault diagnosis agent queries the optical network vector database for the corresponding diagnostic rules; the optical network fault diagnosis agent returns fault diagnosis information, such as possible causes of the fault and fault diagnosis steps, to the IP network fault diagnosis agent thread.
[0299] In step S1510, the fault diagnosis service module sends the fault diagnosis prompt information to the asynchronous state sharing service module.
[0300] The fault diagnosis service module integrates the diagnostic rules returned by the IP network fault diagnosis agent and the optical network fault diagnosis agent to obtain cross-layer fault diagnosis prompts, such as possible causes of faults and fault diagnosis steps, and writes them into the asynchronous state sharing service (marked with a session ID).
[0301] In step S1511, the ChatBot management service module receives fault diagnosis prompt information from the asynchronous state sharing service module.
[0302] In step S1512, the ChatBot management service module returns cross-layer fault diagnosis prompts to the ChatBot front end, which can then display them to the user.
[0303] In step S1513, the fault diagnosis service module calls the LLM service module.
[0304] The IP network fault diagnosis agent and optical network fault diagnosis agent in the fault diagnosis service module call the LLM service module based on the "user question" and the list of diagnostic rule input parameters. The LLM service module determines the correspondence between target entity identification and diagnostic rule input parameters based on the prompt. The prompt is used to obtain the correspondence between the user question and the diagnostic rule input parameters. Target entity identification includes location, time, event, etc.
[0305] In step S1514, the LLM service module returns the correspondence between the diagnostic target entity identification and the diagnostic rule input parameters.
[0306] In step S1515, the fault diagnosis service module writes prompt information for the missing input parameters that need to be supplemented into the asynchronous state sharing service module.
[0307] The fault diagnosis service module identifies which input parameters are missing and need to be supplemented, and then writes them to the asynchronous state sharing service module.
[0308] In step S1516, the fault diagnosis service module enters a waiting state.
[0309] In step S1517, the ChatBot management service module receives the missing input parameter information from the asynchronous state sharing service module.
[0310] In step S1518, the ChatBot management service module returns the missing input parameter information to the ChatBot frontend.
[0311] In step S1519, the ChatBot frontend sends the missing input parameters supplied in the user's reply to the ChatBot management service module.
[0312] In step S1520, the ChatBot management service module writes the missing input parameters supplied in the user's reply into the asynchronous state sharing service module.
[0313] In step S1521, the asynchronous state sharing service module sends the missing input parameters supplied in the user's reply to the fault diagnosis service module.
[0314] In step S1522, the fault diagnosis service module fills in the input parameters. If there are additional questions in the diagnosis rules (such as group failures), the additional questions are asked to the user. The process is the same as asking for missing input parameters, and will not be repeated here.
[0315] In step S1523, the fault diagnosis service module executes the current diagnosis step and retrieves the function corresponding to the diagnosis step from the vector database.
[0316] In step S1524, the vector database returns a function description.
[0317] If the function corresponding to the diagnostic step is found to be a swagger function from the vector database, then the swagger description of the function is returned to the fault diagnosis service module.
[0318] In step S1525, the fault diagnosis service module generates a REST interface JSON body based on the input parameters described in the function's swagger description and the existing context variables, and sends it to the LLM service module.
[0319] In step S1526, the fault diagnosis service module calls the REST interface to access the controller.
[0320] In step S1527, the controller returns the call result of the REST interface to the fault diagnosis service module.
[0321] In step S1528, the fault diagnosis service module composes a prompt based on the output parameters described in the function's swagger description, the returned results, and the description of the diagnosis step, and sends the prompt to the LLM service module.
[0322] In step S1529, the LLM service module returns the diagnostic result information of the current diagnostic step to the fault diagnosis service module and guides the next step to be executed or the diagnostic process to end.
[0323] In step S1530, the fault diagnosis service module writes the diagnosis result information of the current diagnosis step into the asynchronous state sharing service module.
[0324] In step S1531, the asynchronous state sharing service module sends the diagnostic result information of the current diagnostic step to the ChatBot management service module.
[0325] In step S1532, the ChatBot management service module sends the diagnostic result information to the ChatBot front end, and the ChatBot front end presents the diagnostic process information or the end information.
[0326] In step S1533, the fault diagnosis service module performs the next diagnosis step based on the diagnosis result information of the LLM service module until the end.
[0327] If the fault diagnosis service module fails to find the corresponding function from the vector database in step S1523, the asynchronous state sharing service module returns a manual operation prompt.
[0328] In step S1534, the fault diagnosis service module generates cross-layer diagnostic results corresponding to the fault phenomenon based on the diagnostic result information obtained from each diagnostic step.
[0329] In step S1535, the fault diagnosis service module sends the cross-layer diagnosis results to the asynchronous state sharing service module.
[0330] In step S1536, the asynchronous state sharing service module sends the cross-layer diagnostic results to the ChatBot management service module.
[0331] In step S1537, the ChatBot management service module sends the cross-layer diagnostic results to the ChatBot frontend so that the ChatBot frontend can display the cross-layer diagnostic results.
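The per-step execution loop of steps S1523 to S1533 can be sketched as below. This is a minimal sketch under stated assumptions: the four callables (`lookup_function`, `call_controller`, `interpret`, `publish`) are hypothetical stand-ins for the vector database, the controller REST interface, the LLM service module, and the asynchronous state sharing service module. Continuing with the next step after a manual operation prompt is also an assumption, since the source does not say what follows.

```python
import uuid


def run_diagnosis(steps, lookup_function, call_controller, interpret, publish,
                  max_iterations=100):
    """Execute diagnosis steps one at a time, publishing each result under a session ID."""
    session_id = uuid.uuid4().hex                      # S1506: session ID
    results = []
    index = 0
    iterations = 0
    while index is not None and 0 <= index < len(steps) and iterations < max_iterations:
        iterations += 1                                # guard against endless jumps
        step = steps[index]
        func = lookup_function(step)                   # S1523-S1524
        if func is None:
            # [0327]: no matching function, so prompt for manual operation.
            publish(session_id, {"step": step, "info": "manual operation required"})
            results.append({"step": step, "summary": None, "manual": True})
            index += 1                                 # assumption: move to the next step
            continue
        raw = call_controller(func, step)              # S1525-S1527
        verdict = interpret(step, func, raw)           # S1528-S1529
        publish(session_id, {"step": step, "info": verdict["summary"]})  # S1530-S1532
        results.append({"step": step, "summary": verdict["summary"], "manual": False})
        index = verdict["next"]                        # S1533: next step index, or None to end
    return session_id, results
```

Here `interpret` is expected to return a dict with a `summary` string and a `next` step index (or `None` to end), mirroring how the LLM service module both reports the result of the current step and guides what happens next.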
[0332] In the cross-layer fault diagnosis method of this embodiment of the present disclosure, a first large model is used to perform cross-layer fault diagnosis on the hybrid network based on the fault phenomenon to obtain the diagnosis result. Since the first large model is trained on cross-layer fault diagnosis rules, it can combine the fault diagnosis rules of different network layers to locate the fault accurately, thereby improving the accuracy of fault diagnosis. Moreover, cross-layer fault diagnosis is completed automatically without requiring collaboration between departments, which improves the efficiency of fault diagnosis.
[0333] In a second aspect, embodiments of this disclosure provide an electronic device, which includes a memory and a processor; the memory stores a computer program that can be executed by the processor, and when the computer program is executed by the processor, it implements any one of the cross-layer fault diagnosis methods of this disclosure, which will not be described in detail here for the sake of brevity.
[0334] The processor is a device with data processing capabilities, including but not limited to a central processing unit (CPU); the memory is a device with data storage capabilities, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH); the I / O interface (read-write interface) is connected between the processor and the memory, enabling information exchange between the memory and the processor, and includes but is not limited to a data bus (Bus).
[0335] The electronic device provided in this embodiment of the present disclosure uses a first large model to perform cross-layer fault diagnosis on a hybrid network based on the fault phenomenon, and obtains a diagnosis result. Since the first large model is trained on cross-layer fault diagnosis rules, it can combine the fault diagnosis rules of different network layers to locate faults accurately and improve the efficiency of fault diagnosis.
[0336] In a third aspect, embodiments of this disclosure provide a computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements any of the cross-layer fault diagnosis methods of embodiments of this disclosure.
[0337] In a fourth aspect, embodiments of this disclosure provide a computer program product, which includes a computer program that, when executed by a processor, implements any of the cross-layer fault diagnosis methods of this disclosure.
[0338] Those skilled in the art will understand that all or some of the steps, systems, and devices disclosed above, as functional modules / units, can be implemented as software, firmware, hardware, or suitable combinations thereof.
[0339] In hardware implementations, the division between functional modules / units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be executed by several physical components working together.
[0340] Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit (CPU), digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit (ASIC). Such software may be distributed on a computer-readable medium, which may include computer storage media (or non-transitory media) and communication media (or transient media). As is known to those skilled in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technique for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory (FLASH) or other disk storage; read-only optical disc (CD-ROM), digital versatile disc (DVD) or other optical disc storage; magnetic cartridges, magnetic tapes, disk storage or other magnetic storage; and any other media that can be used to store desired information and can be accessed by a computer. Furthermore, as is known to those skilled in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in modulated data signals such as carrier waves or other transmission mechanisms, and may include any information delivery medium.
[0341] This disclosure has disclosed exemplary embodiments, and although specific terminology has been used, it is for general illustrative purposes only and should not be construed as limiting. In some instances, it will be apparent to those skilled in the art that features, characteristics, and / or elements described in conjunction with particular embodiments may be used alone, or in combination with features, characteristics, and / or elements described in conjunction with other embodiments, unless otherwise expressly indicated. Therefore, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of this disclosure as set forth by the appended claims.
Claims
1. A cross-layer fault diagnosis method, comprising: acquiring a fault phenomenon, wherein the fault phenomenon has a fault manifestation in a hybrid network comprising two or more networks; based on the fault phenomenon, using a first large model to perform cross-layer fault diagnosis on the hybrid network to obtain a diagnosis result, wherein the first large model is a model trained based on knowledge in a knowledge base, and the knowledge comprises cross-layer fault diagnosis rules.
2. The method of claim 1, wherein, based on the fault phenomenon, using a first large model to perform cross-layer fault diagnosis on the hybrid network to obtain a diagnosis result comprises: acquiring a cross-layer network knowledge graph, wherein the cross-layer network knowledge graph is a graph that associates natural knowledge of each network in the hybrid network together; based on the fault phenomenon and the cross-layer network knowledge graph, using the first large model to perform cross-layer fault diagnosis on the hybrid network to obtain the diagnosis result.
3. The method of claim 2, wherein, based on the fault phenomenon and the cross-layer network knowledge graph, using the first large model to perform cross-layer fault diagnosis on the hybrid network to obtain the diagnosis result comprises: acquiring, by the first large model, fault information based on the fault phenomenon; matching the fault information in the cross-layer network knowledge graph to obtain target knowledge matched with the fault information, and obtaining resource information based on the fault information; based on the resource information and the target knowledge, performing cross-layer fault diagnosis on the hybrid network to obtain the diagnosis result.
4. The method of claim 2, wherein the cross-layer network knowledge graph is obtained by the following steps: cleaning document materials of the hybrid network to obtain cleaned document materials; based on pre-set prompt words, using a second large model to extract knowledge from the cleaned document materials, wherein the second large model is a pre-trained model for obtaining knowledge in the hybrid network; de-duplicating and aligning the extracted knowledge to obtain the association relationship between different knowledge; based on the knowledge and the association relationship between the knowledge, constructing the cross-layer network knowledge graph.
5. The method of claim 1, wherein using a first large model to perform cross-layer fault diagnosis on the hybrid network to obtain a diagnosis result comprises: generating, by the first large model, diagnosis steps for cross-layer fault diagnosis based on the fault phenomenon, wherein each diagnosis step comprises one or more application programming interfaces (APIs); executing the APIs according to the diagnosis steps, and obtaining the diagnosis result based on the execution results of the APIs.
6. The method of claim 1, wherein the first large model comprises i+1 sub-models; based on the fault phenomenon, using a first large model to perform cross-layer fault diagnosis on the hybrid network to obtain a diagnosis result comprises: based on the fault phenomenon, using the i-th sub-model to obtain the i-th diagnosis result of the i-th network; based on the fault phenomenon and the i-th diagnosis result, using the i+1-th sub-model to obtain the i+1-th diagnosis result of the i+1-th network; combining the i-th diagnosis result and the i+1-th diagnosis result to determine the diagnosis result, wherein i is an integer greater than or equal to 1.
7. The method of claim 6, wherein, before obtaining the i-th diagnosis result of the i-th network using the i-th sub-model based on the fault phenomenon, the method further comprises: determining a fault scenario based on the fault phenomenon.
8. The method of claim 1, wherein acquiring the fault phenomenon comprises: obtaining abnormal monitoring information and interaction information of a user, wherein the interaction information is information supplemented by the user to the abnormal monitoring information; performing semantic recognition on the abnormal monitoring information and the interaction information to obtain the fault phenomenon.
9. The method of claim 1, wherein, after the first large model is used to perform cross-layer fault diagnosis on the hybrid network based on the fault phenomenon to obtain a diagnosis result, the method further comprises: based on the diagnosis result, using the first large model to obtain a suggestion for modifying the diagnosis steps.
10. The method of claim 1, wherein the knowledge comprises at least one of an operation and maintenance manual, a fault case, and a diagnosis step.
11. The method of claim 1, wherein the hybrid network comprises an Internet Protocol network and an optical network.
12. The method of claim 1, wherein the application scenarios of the hybrid network comprise a gray optical application scenario and a colored optical application scenario.
13. An electronic device comprising a memory and a processor; the memory stores a computer program, and the computer program is executed by the processor to implement the cross-layer fault diagnosis method in any one of claims 1 to 12.
14. A computer readable medium having a computer program stored thereon, wherein the computer program is executed by a processor to implement the cross-layer fault diagnosis method in any one of claims 1 to 12.
15. A computer program product comprising a computer program, wherein the computer program is executed by a processor to implement the cross-layer fault diagnosis method in any one of claims 1 to 12.