Operation and maintenance method and system for routing device, electronic device and storage medium

By analyzing routing device status data using AI computing boards and intelligent agents, the problem of low efficiency in routing device fault detection was solved, automated operation and maintenance was achieved, and the accuracy and efficiency of fault detection were improved.

WO2026138071A1 · PCT designated stage · Publication Date: 2026-07-02 · ZTE CORP

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
ZTE CORP
Filing Date
2025-10-11
Publication Date
2026-07-02


Abstract

Disclosed in embodiments of the present application are an operation and maintenance method and system for a routing device, an electronic device and a storage medium. The operation and maintenance method for a routing device comprises: acquiring state data of a routing device; performing combination processing on the state data to obtain a plurality of state data groups, each piece of state data belonging to one or more state data groups, and each state data group corresponding to one data category; on the basis of a preconfigured correspondence between prompt information and data categories, obtaining target prompt information corresponding to each state data group, each piece of prompt information describing a data analysis policy; on the basis of the target prompt information and the state data in the state data group, using a first large model to perform data analysis processing, to obtain a data analysis result; and performing operation and maintenance processing on the routing device on the basis of the data analysis result.

Description

Operation and maintenance method and system for routing device, electronic device and storage medium

[0001] Cross-reference to related applications

[0002] This application claims priority to Chinese Patent Application No. 202411954072.X, filed on December 27, 2024, entitled “Operation and Maintenance Method, Operation and Maintenance System, Electronic Device and Storage Medium for Routing Device”, the entire contents of which are incorporated herein by reference.

Technical Field

[0003] This application relates to the field of operation and maintenance, and in particular to a method for operation and maintenance of routing devices, an operation and maintenance system, electronic devices, and storage media.

Background Technology

[0004] During the operation of routing devices, random failures may occur. In some cases, the cause of a failure may go undetected because the historical maintenance data contains no record of an identical failure. In other cases, staff may have configured the device incompletely and failed to pre-configure targeted detection conditions, which affects subsequent maintenance work. Therefore, when a routing device failure is detected, it is often necessary to notify maintenance staff, who then rely on their individual maintenance experience to apply targeted detection methods, locate the cause of the failure, and implement corresponding maintenance measures. Manual maintenance is inefficient and degrades the user experience.

Summary of the Invention

[0005] The purpose of this application is to provide a method, system, electronic device, and storage medium for the operation and maintenance of routing devices, in order to solve the problem of how to improve the operation and maintenance efficiency of routing devices.

[0006] In one aspect, embodiments of this application provide a method for the operation and maintenance of a routing device, comprising: acquiring status data of the routing device; combining the status data to obtain multiple status data groups, each piece of status data belonging to one or more status data groups and each status data group corresponding to a data category; obtaining target prompt information corresponding to each status data group based on a pre-configured correspondence between prompt information and data categories, each piece of prompt information describing a data analysis strategy; performing data analysis processing using a first large model based on the target prompt information and the status data in the status data group to obtain data analysis results; and performing operation and maintenance processing on the routing device according to the data analysis results.

[0007] In another aspect, embodiments of this application provide an operation and maintenance system for a routing device. The system includes a first intelligent agent, a data acquisition module, and a fault handling module. Specifically: the data acquisition module collects status data of the routing device and transmits it to the first intelligent agent; the first intelligent agent combines the status data to obtain multiple status data groups, each piece of status data belonging to one or more status data groups and each status data group corresponding to a data category; the first intelligent agent is further configured to obtain target prompt information corresponding to each status data group based on a pre-configured correspondence between prompt information and data categories, each piece of prompt information describing a data analysis strategy; the first intelligent agent is further configured to perform data analysis processing using a first large model based on the target prompt information and the status data in the status data groups, obtain data analysis results, and transmit them to the fault handling module; the fault handling module performs operation and maintenance processing on the routing device according to the data analysis results.

[0008] In another aspect, embodiments of this application provide an electronic device, including a processor and a memory electrically connected to the processor, the memory storing a computer program, and the processor being used to call and execute the computer program from the memory to implement the above-mentioned operation and maintenance method for the routing device.

[0009] In another aspect, embodiments of this application provide a computer-readable storage medium for storing a computer program that can be executed by a processor to implement the above-described operation and maintenance method for the routing device.

[0010] In another aspect, embodiments of this application provide a computer program product, including a computer program, which is executed by a processor to implement the above-described operation and maintenance method for routing devices.

Brief Description of the Drawings

[0011] To more clearly illustrate the technical solutions in one or more embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in one or more embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0012] Figure 1 is a schematic flowchart of an operation and maintenance method for a routing device provided in an embodiment of this application;

[0013] Figure 2 is a schematic diagram of the structure of an AI computing board provided in an embodiment of this application;

[0014] Figure 3 is a partial flowchart of a routing device operation and maintenance method provided in an embodiment of this application;

[0015] Figure 4 is another partial flowchart of a routing device operation and maintenance method provided in an embodiment of this application;

[0016] Figure 5 is a partial flowchart of another method for the operation and maintenance of a routing device provided in an embodiment of this application;

[0017] Figure 6 is a data flow diagram of an operation and maintenance system for a routing device provided in an embodiment of this application;

[0018] Figure 7 is another data flow diagram of an operation and maintenance system for a routing device provided in an embodiment of this application;

[0019] Figure 8 is a schematic diagram of the operation and maintenance system of a routing device provided in an embodiment of this application;

[0020] Figure 9 is a schematic diagram of the interaction between intelligent agents in the operation and maintenance system of a routing device provided in an embodiment of this application;

[0021] Figure 10 is a schematic flowchart of a routing device operation and maintenance method provided in an embodiment of this application;

[0022] Figure 11 is another schematic flowchart of a routing device operation and maintenance method provided in an embodiment of this application;

[0023] Figure 12 is another schematic flowchart of a routing device operation and maintenance method provided in an embodiment of this application;

[0024] Figure 13 is a schematic block diagram of an operation and maintenance system for a routing device provided in an embodiment of this application;

[0025] Figure 14 is a schematic block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

[0026] This application provides a method for operating and maintaining a routing device, an operation and maintenance system, an electronic device, and a storage medium.

[0027] To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this application.

[0028] The terms "first," "second," etc., used in the specification and claims of this application are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that such terms can be used interchangeably where appropriate so that embodiments of this application can be implemented in orders other than those illustrated or described herein.

[0029] The operation and maintenance method for routing devices provided in this application can be executed by an electronic device or by software installed in an electronic device. In one example, the electronic device can be a terminal device or a server device. The terminal device can include smartphones, laptops, smart wearable devices, vehicle terminals, etc., and the server device can include an independent physical server, a server cluster composed of multiple servers, or a cloud server capable of cloud computing.

[0030] Figure 1 is a schematic flowchart of a routing device operation and maintenance method provided in an embodiment of this application.

[0031] The operation and maintenance methods for routing devices can be applied to the operation and maintenance system of routing devices.

[0032] The operation and maintenance method for routing devices provided in this application can be used to operate and maintain routing devices, which can be any type of routing device. A routing device is a network device primarily used to forward data packets between different networks, enabling communication between them. It is a key component in computer networks, responsible for transmitting data from a source address to a destination address. Routing devices include, but are not limited to, routers, Layer 3 switches, gateways, etc.

[0033] Routing devices can also be high-end routing devices. High-end routing devices refer to network devices that achieve high levels of performance, functionality, reliability, scalability, and security. They are typically used in scenarios with extremely high requirements for network performance and stability, such as large enterprises, data centers, telecommunications operators, and cloud computing centers. Compared to ordinary home or small business routers, high-end routing devices have more powerful processing capabilities, higher throughput, more complex routing functions, and richer security and management features.

[0034] The operation and maintenance system for routing devices can be a software platform or toolset used to manage and maintain routing devices.

[0035] As shown in Figure 1, the operation and maintenance method of the routing device includes the following steps S102-S110.

[0036] S102, Obtain the status data of the routing device.

[0037] During the operation of a routing device, various components within the device generate a variety of data that reflect the device's operational status, known as status data.

[0038] Status data of routing devices includes, but is not limited to: temperature, voltage, humidity, traffic, etc.

[0039] The status data listed above are merely examples; any data that reflects the operating status of a routing device during operation can be used as status data.

[0040] Acquiring status data from the routing device may include receiving temperature information collected by a first sensor.

[0041] Acquiring status data from the routing device may include receiving voltage information collected by a second sensor.

[0042] Acquiring status data from the routing device may include receiving humidity information collected by a third sensor.

[0043] Obtaining status data from routing devices can include receiving traffic information collected through a traffic monitoring system, and so on.
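The acquisition in step S102 can be sketched roughly as follows. This is only an illustration under stated assumptions: the `StatusReading` type, the `acquire` function, the source names, and the fixed values returned by the stub readers are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class StatusReading:
    """One piece of status data (hypothetical record layout)."""
    kind: str      # e.g. "temperature", "voltage", "humidity", "traffic"
    source: str    # which sensor or monitoring system produced it
    value: float


def acquire(sources: Dict[str, Tuple[str, Callable[[], float]]]) -> List[StatusReading]:
    """Poll every configured source once and return the readings in order."""
    return [StatusReading(kind, name, read()) for name, (kind, read) in sources.items()]


# Stub readers standing in for real sensors; the numbers are made up.
readings = acquire({
    "first sensor": ("temperature", lambda: 41.5),
    "second sensor": ("voltage", lambda: 11.9),
    "third sensor": ("humidity", lambda: 0.35),
    "traffic monitoring system": ("traffic", lambda: 870.0),
})
```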

[0044] In some implementations, step S102 can be executed by the data acquisition module in the operation and maintenance system of the routing device, that is, the data acquisition module obtains the status data of the routing device.

[0045] The data acquisition module obtains the status data of the routing device. This can be achieved by the data acquisition module performing data acquisition operations on the routing device to obtain the status data.

[0046] In one instance, the data acquisition module can collect real-time status data of the routing device through the data interface and the network interface of the AI (Artificial Intelligence) computing board.

[0047] AI computing boards are hardware devices specifically designed for artificial intelligence applications, typically used to accelerate the training and inference processes of AI models. They integrate high-performance processors, accelerators, and associated memory and interfaces, providing efficient computing power to meet the high computational resource demands of AI applications.

[0048] The structure of the AI computing board is illustrated below with reference to Figure 2. Figure 2 is a schematic diagram of the structure of an AI computing board provided in an embodiment of this application.

[0049] The AI computing board includes the following components.

[0050] (a1) A computing core module and an AI accelerator, which provide ample computing resources to support the deployment of machine learning and deep learning models, ensuring rapid response for real-time data analysis and fault prediction.

[0051] As shown in Figure 2, the AI computing board includes a computing core module 202 and an AI accelerator 204. The computing core module 202 and the AI accelerator 204 are connected.

[0052] (a2) A high-speed cache module and a data interface module, which enable high-speed data exchange and ensure processing efficiency in large-data scenarios.

[0053] As shown in Figure 2, the AI computing board also includes a high-efficiency cache 206 and a data interface 208. The high-efficiency cache 206 is connected to the computing core module 202, and the data interface 208 is connected to the computing core module 202.

[0054] (a3) A storage unit module and a power management module, which support data storage and persistence while ensuring the stability of the computing board.

[0055] As shown in Figure 2, the AI computing board also includes a storage unit 210 and a power management unit 212. The storage unit 210 is connected to the computing core module 202, and the power management unit 212 is also connected to the computing core module 202.

[0056] (a4) A heat dissipation system, which ensures temperature control of the computing board under long-term high-load operation.

[0057] As shown in Figure 2, the AI computing board also includes a heat dissipation system 214. The heat dissipation system 214 is connected to the computing core module 202.

[0058] (a5) A control and management module and a network interface module, which ensure that the computing board runs smoothly within the overall network management system and can quickly respond to and process abnormal data.

[0059] As shown in Figure 2, the AI computing board also includes a control management system 216 and a network interface 218. The control management system 216 is connected to the computing core module 202, and the network interface 218 is connected to the computing core module 202.

[0060] After the data acquisition module obtains the status data of the routing device, it can transmit the status data to the first intelligent agent in the operation and maintenance system of the routing device.

[0061] Alternatively, the status data of the routing device can also be obtained by the first intelligent agent in the operation and maintenance system of the routing device. That is, the first intelligent agent obtains the status data of the routing device, which is collected by the data acquisition module.

[0062] The first intelligent agent acquires the status data of the routing device, which can be achieved by the first intelligent agent receiving the status data of the routing device sent by the data acquisition module. This data acquisition module can belong to the same operation and maintenance system of the routing device as the first intelligent agent, or it can belong to a device or system outside the operation and maintenance system of the routing device.

[0063] An intelligent agent is an entity capable of perceiving, making decisions, and executing actions within a specific environment. It typically possesses a degree of autonomy, intelligence, and adaptability, enabling it to react to changes in its environment and accomplish specific tasks or goals. The concept of intelligent agents is widely applied in fields such as artificial intelligence, robotics, software systems, and network communication. The core objective of an intelligent agent is to autonomously complete tasks based on changes in its environment, usually achieved through interaction with that environment.

[0064] In this embodiment of the application, the operation and maintenance system of the routing device may include one or more intelligent agents, and the first intelligent agent may be one of the intelligent agents in the operation and maintenance system of the routing device.

[0065] One or more intelligent agents in the operation and maintenance system of routing equipment integrate machine learning and data mining technologies, enabling them to learn and optimize autonomously. Among them, the first intelligent agent can predict faults based on real-time data and take proactive measures based on the analysis results to ensure the health of the network. It can also operate seamlessly, requiring no user intervention during use.

[0066] The first intelligent agent operates with the support of a large model, which is trained using training data from the target domain. In the field of artificial intelligence, a large model typically refers to a deep learning model trained on a large dataset using complex algorithms. The large model can be used for real-time data analysis, anomaly identification, decision-making, and so on.

[0067] The target domain can be a pre-defined private domain, such as the operation and maintenance of high-end routing equipment. A private domain is a concept relative to a general domain. A general domain refers to widely applicable, publicly available, and standardized knowledge, technologies, or rules applicable to multiple industries or fields. A private domain refers to confidential resources or rules owned by a specific individual, organization, or institution, and is therefore exclusive.

[0068] S104, combine the state data to obtain multiple state data groups; each state data belongs to one or more state data groups; each state data group corresponds to a data category.

[0069] Step S104 can be executed by the first intelligent agent in the operation and maintenance system of the routing device.

[0070] In one implementation, the state data is combined to obtain multiple state data groups. This can be achieved by the first intelligent agent combining the state data based on the first large model to obtain multiple state data groups.

[0071] During data processing, the first large model does not simply query historical data. Instead, it applies the data processing capabilities learned during training to the data it receives. Therefore, when the first intelligent agent combines state data based on the first large model to obtain multiple state data groups, the first large model uses its trained combination capability to combine the various state data. On the one hand, multiple pieces of state data that match a data analysis strategy can be aggregated before data analysis processing, which helps improve data analysis efficiency. On the other hand, a piece of state data may belong to multiple state data groups at the same time and can be reused in analyses that apply different data analysis strategies. This diversifies the state data in each state data group and improves the accuracy of data analysis in subsequent steps.

[0072] In another implementation, the state data is combined to obtain multiple state data groups. This can be achieved by a first intelligent agent classifying the state data for each data category, obtaining a classification result indicating whether the state data belongs to the corresponding state data group for that data category. Then, based on the classification result, the first intelligent agent combines the state data to obtain multiple state data groups.

[0073] In another implementation, the state data is combined to obtain multiple state data groups. This can be achieved by the first intelligent agent querying one or more historical state data groups corresponding to each state data in the historical operation and maintenance data. Then, the first intelligent agent combines the state data according to the query results to obtain multiple state data groups.

[0074] The implementation methods listed above are merely examples. In practical applications, the combination methods used to process state data can also be customized according to the scenario.

[0075] Data categories can be pre-designed for various faults of the routing device. For any data category, the status data belonging to that data category can be used as a reference for analyzing whether the routing device has a fault corresponding to that data category.

[0076] Alternatively, the data categories can be pre-designed for each component in the routing device. For any given data category, the status data belonging to that data category can serve as a reference for analyzing whether the corresponding component in the routing device for that data category has failed.

[0077] Alternatively, data categories can be designed for sets of components within a routing device. For any given data category, the status data belonging to that category can serve as a reference for analyzing whether a fault has occurred in the set of components corresponding to that data category. Each set of components can include one or more components of the routing device. This arrangement takes into account that some components are highly similar in the types of faults they may experience. Analyzing them therefore requires the same or similar data types and the same or similar data analysis strategies, so the status data related to these components can be analyzed together.

[0078] The data category settings listed above are merely examples; in practical applications, data categories can be flexibly divided according to the specific scenario.

[0079] Because a numerical anomaly in a piece of state data may have any of several causes, a piece of state data may belong to multiple state data groups at the same time. As a result, a wide variety of state data is available as a reference when analyzing each state data group.

[0080] For example, if each piece of status data could belong to at most one status data group, then once temperature 1 had been used with status data group 1 to determine whether fault A exists, it could not be used with status data group 2 to determine whether fault B exists.

[0081] In contrast, the operation and maintenance method of the routing device provided in the embodiments of this application can use temperature 1 with status data group 1 to determine whether fault A exists, and can still use temperature 1 with status data group 2 to determine whether fault B exists.

[0082] Combining state data refers to combining two or more types of state data together.

[0083] For example, the status data includes: temperature 1, temperature 2, voltage 1, humidity 1, and traffic 1.

[0084] The status data is combined to obtain status data group 1, status data group 2, and status data group 3.

[0085] Status data group 1 includes: temperature 1, temperature 2, and voltage 1.

[0086] Status data group 2 includes: temperature 2, voltage 1, humidity 1, and traffic 1.

[0087] Status data group 3 includes: traffic 1.

[0088] In this embodiment of the application, each state data group may include one or more state data. Each state data may belong to one or more state data groups.
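The overlapping grouping in the example above can be sketched as follows, loosely following the per-category classification variant of step S104. This is a minimal sketch under stated assumptions: the rule table `CATEGORY_MEMBERS`, the category labels X, Y and Z, the `combine` function, and the numeric values are hypothetical, and the traffic reading is written as "traffic 1". In the application the membership decision may instead be made by the first large model or by querying historical operation and maintenance data.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical rule table: which readings are relevant to each data category.
CATEGORY_MEMBERS: Dict[str, Set[str]] = {
    "category X": {"temperature 1", "temperature 2", "voltage 1"},
    "category Y": {"temperature 2", "voltage 1", "humidity 1", "traffic 1"},
    "category Z": {"traffic 1"},
}


def combine(status_data: Dict[str, float],
            category_members: Dict[str, Set[str]] = CATEGORY_MEMBERS) -> Dict[str, Dict[str, float]]:
    """Build one status data group per category; a reading may land in several groups."""
    groups: Dict[str, Dict[str, float]] = defaultdict(dict)
    for name, value in status_data.items():
        for category, members in category_members.items():
            if name in members:
                groups[category][name] = value
    return dict(groups)


# Made-up readings, only to exercise the function.
status_data = {
    "temperature 1": 41.5,
    "temperature 2": 63.0,
    "voltage 1": 11.9,
    "humidity 1": 0.35,
    "traffic 1": 870.0,
}
groups = combine(status_data)
```

Here temperature 2 and voltage 1 each appear in two groups, and traffic 1 appears both in a multi-reading group and in a group of its own, which mirrors the reuse described in the example.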

[0089] In addition, after step S102 is executed and before step S104 is executed, the first intelligent agent can also perform operations such as cleaning, transformation, segmentation, and storage on the state data. When storing the state data, the first intelligent agent can store it locally on large-capacity SSD (Solid State Drive) or NVMe (Non-Volatile Memory Express) storage, or it can actively send the data to the cloud for storage through the network management system when the operation and maintenance system is idle.

[0090] Based on similar technical concepts, in some implementations, the operation and maintenance system of a routing device may include multiple intelligent agents. One intelligent agent performs preprocessing operations on the acquired state data to obtain processed state data, and transmits the processed state data to another intelligent agent, which then executes steps S104-S108. Preprocessing operations include, but are not limited to, cleaning, transformation, segmentation, storage, etc.

[0091] S106, based on the pre-configured correspondence between prompt information and data categories, the target prompt information corresponding to each state data group is obtained; each piece of prompt information describes a data analysis strategy.

[0092] Step S106 can be executed by the first intelligent agent in the operation and maintenance system of the routing device.

[0093] Obtaining the target prompt information corresponding to each state data group can be achieved as follows: for each state data group, the first intelligent agent uses the data category corresponding to that state data group to query the pre-configured correspondence between prompt information and data categories, and thereby obtains the target prompt information corresponding to the state data group.

[0094] Each piece of prompt information can be designed for a specific fault and can describe the data analysis strategy for that fault. Each data category corresponds to a fault, so there is a correspondence between data categories and prompt information. A data category can be used to query the corresponding prompt information within this correspondence.

[0095] Alternatively, each piece of prompt information can be designed for a specific component of the routing device and can describe data analysis strategies for the various potential failures of that component. Each data category corresponds to a component within the routing device, so there is a correspondence between data categories and prompt information. A data category can be used to query the corresponding prompt information within this correspondence.

[0096] Alternatively, each piece of prompt information can be designed for a set of components within the routing device. Each component set includes one or more components of the routing device, and the prompt information can describe data analysis strategies for the various potential failures of the corresponding component set. Each data category corresponds to a component set within the routing device, so there is a correspondence between data categories and prompt information. A data category can be used to query the corresponding prompt information within this correspondence.

[0097] The prompts listed above are merely examples; in practical applications, the content of the prompts can be flexibly set according to the specific scenario.

[0098] For example, the pre-configured correspondences include: data category 1 <-> prompt information 1; data category 2 <-> prompt information 2; data category 3 <-> prompt information 3, and so on.

[0099] For state data group 1, which corresponds to data category 2, the first intelligent agent can use data category 2 to query the pre-configured correspondence and obtain the target prompt information, namely prompt information 2.
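The lookup in step S106 amounts to a keyed query on the pre-configured correspondence. The sketch below assumes the correspondence is held as an in-memory dictionary; the names `PROMPT_BY_CATEGORY`, `GROUP_CATEGORY` and `target_prompt` are hypothetical, and the strings are placeholders for real prompt text.

```python
from typing import Dict

# Pre-configured correspondence between data categories and prompt information
# (placeholder strings stand in for the actual prompt text).
PROMPT_BY_CATEGORY: Dict[str, str] = {
    "data category 1": "prompt information 1",
    "data category 2": "prompt information 2",
    "data category 3": "prompt information 3",
}

# Data category assigned to each state data group (from the example above).
GROUP_CATEGORY: Dict[str, str] = {"state data group 1": "data category 2"}


def target_prompt(group: str) -> str:
    """Return the target prompt information for a state data group."""
    category = GROUP_CATEGORY[group]
    try:
        return PROMPT_BY_CATEGORY[category]
    except KeyError:
        # Surface a missing configuration entry instead of analyzing without a strategy.
        raise LookupError(f"no prompt information configured for {category!r}") from None


prompt = target_prompt("state data group 1")
```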

[0100] Each piece of prompt information can describe a data analysis strategy. Each data analysis strategy can be designed based on a specific type of fault. Data analysis strategies for different types of faults may be the same, similar, or different.

[0101] In configuring the prompt information, one or more of the following prompting techniques can be used: CoT prompting (Chain-of-Thought prompting), Active-Prompt, zero-shot prompting, and Reflexion.

[0102] The prompting techniques listed above are merely examples. In practical applications, any prompting technique that suits the scenario can be adopted.

[0103] S108, based on the target prompt information and the state data in the state data group, the first large model is used to perform data analysis processing to obtain the data analysis results.

[0104] Step S108 can be executed by the first intelligent agent in the operation and maintenance system of the routing device.

[0105] Using the first large model to perform data analysis processing based on the target prompt information and the state data in the state data group can be achieved by the first intelligent agent inputting the target prompt information and the state data in the state data group into the first large model, which performs the data analysis processing and produces the data analysis results for that state data group.

[0106] In some implementations, data analysis results may include the fault categories and fault handling methods of routing devices.

[0107] The fault category identifies the pending fault in the routing device, and the fault handling method describes the operation and maintenance measures to be taken for that pending fault.

[0108] The fault to be handled can be a fault that has already occurred in the routing device, or a fault that is about to occur in the routing device.

[0109] For example, the state data group includes state data 1, state data 2, and state data 3. The first agent inputs the target prompt information and these three pieces of state data into the first large model. After receiving them, the first large model performs data analysis and generates a data analysis result.
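
The input step in this example can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify how the prompt and the state data are combined, so the text layout, the function name `analyze_group`, and the stand-in `fake_model` are all hypothetical. A real deployment would call the first large model in place of `fake_model`.

```python
from typing import Callable, Dict


def analyze_group(
    target_prompt: str,
    state_data: Dict[str, str],
    first_large_model: Callable[[str], str],
) -> str:
    """Combine the prompt and the group's state data, then call the model."""
    lines = [target_prompt, "", "State data:"]
    lines += [f"- {name}: {value}" for name, value in state_data.items()]
    return first_large_model("\n".join(lines))


def fake_model(model_input: str) -> str:
    """Stand-in for the first large model: reports how many items it received."""
    count = sum(1 for line in model_input.splitlines() if line.startswith("- "))
    return f"analysis of {count} state data items"


result = analyze_group(
    "target prompt information",
    {"state data 1": "value 1", "state data 2": "value 2", "state data 3": "value 3"},
    fake_model,
)
```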

[0110] In other implementations, the data analysis result may also include the operation and maintenance level of the routing device, which indicates the extent to which pending faults that have occurred or are about to occur affect the normal operation of the routing device.

[0111] For example, operation and maintenance level S1 indicates that the pending faults that have occurred or are about to occur have almost no impact on the normal operation of the routing device; level S2 indicates that they have a slight impact; level S3 indicates that they prevent the routing device from operating normally; and so on.

[0112] Furthermore, at level S1, the operation and maintenance system of the routing device does not need to perform any special operation and maintenance operations. At level S2, the system can schedule operation and maintenance within a specified time period, such as 2:00 AM to 5:00 AM, to minimize disruption to users. At level S3, the system performs operation and maintenance processing immediately to prevent the routing device from failing to work properly.
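
The level-dependent timing above can be sketched as a small scheduling function. This is a hedged illustration: the 2:00 AM to 5:00 AM window is taken from the example in the text, while the function name, the choice to start immediately when already inside the window, and the choice to wait for the next window otherwise are assumptions.

```python
from datetime import datetime, time, timedelta
from typing import Optional

WINDOW_START = time(2, 0)  # 2:00 AM, from the example in the text
WINDOW_END = time(5, 0)    # 5:00 AM, from the example in the text


def next_maintenance_time(level: str, now: datetime) -> Optional[datetime]:
    """Return when to run maintenance, or None when no action is needed."""
    if level == "S1":
        return None  # almost no impact: nothing to schedule
    if level == "S3":
        return now   # device cannot operate normally: act immediately
    if level == "S2":
        if WINDOW_START <= now.time() < WINDOW_END:
            return now  # assumption: already inside the window, start now
        start_today = datetime.combine(now.date(), WINDOW_START)
        # Before 2:00 AM use today's window, otherwise wait for tomorrow's.
        return start_today if now < start_today else start_today + timedelta(days=1)
    raise ValueError(f"unknown operation and maintenance level: {level}")
```

For instance, an S2 result produced at 2:00 PM would be scheduled for 2:00 AM on the following day under these assumptions.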

[0113] The first large model has data analysis capabilities learned during model training. It can be used to solve complex logical problems, generate answers through step-by-step reasoning, provide decision-making suggestions based on data analysis, and so on.

[0114] During data analysis, the first large model draws on the target-domain expertise accumulated during training. This expertise includes both standard definitions of concepts and historical operation and maintenance experience.

[0115] By analyzing data to determine the fault categories of routing devices, fault detection can be achieved. By analyzing data to determine the fault handling methods of routing devices, some of the operation and maintenance work of routing devices can be accomplished.

[0116] The most significant difference between fault detection through data analysis by the first large model and fault detection according to manually configured detection rules is that the latter can only judge individual indicators whose abnormal behavior is very prominent, for example when the temperature T is greater than the temperature threshold T0, or when the traffic S is less than or equal to the traffic threshold S0.

[0117] In practical applications, a fault that has occurred or is about to occur may affect many indicators. However, the abnormal behavior of these indicators may not be obvious; it may simply appear as "slightly high temperature" or "slightly low traffic." Experienced operation and maintenance personnel might sense an impending fault from these subtle changes, but from the machine's perspective, as long as an indicator does not reach a significant outlier, the device is considered to be operating normally. Therefore, fault detection using manually configured rules typically detects only faults that have already occurred and for which a few indicators have reached significant outliers. Furthermore, if a single indicator can reach an outlier for multiple reasons, additional manual checks may be necessary to pinpoint the cause of the fault.
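
The limitation of manually configured rules can be made concrete with a small sketch. The conditions T > T0 and S <= S0 follow the text; the numeric thresholds and sample readings are hypothetical values chosen only for illustration.

```python
T0 = 80.0   # hypothetical temperature threshold
S0 = 100.0  # hypothetical traffic threshold


def rule_based_fault(temperature: float, traffic: float) -> bool:
    """Flag a fault only when a single indicator crosses its threshold."""
    return temperature > T0 or traffic <= S0


# A slightly high temperature and slightly low traffic both stay inside the
# thresholds, so the rule reports normal operation even though the
# combination may be an early warning.
subtle_case = rule_based_fault(temperature=78.0, traffic=105.0)
obvious_case = rule_based_fault(temperature=95.0, traffic=500.0)
```

Here `subtle_case` is not flagged while `obvious_case` is, which mirrors the gap that model-based analysis is meant to close.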

[0118] Data analysis with the first large model does not simply compare indicator values against preset thresholds. The large model has accumulated a wealth of operation and maintenance knowledge during training, so indicators with prominent abnormal behavior and indicators with inconspicuous abnormal behavior can both serve as references during data analysis.

[0119] For example, if temperature 1 of the first component in the routing device is examined in isolation and found to be slightly elevated, no fault is inferred. However, if temperature 1 of the first component rises slightly, humidity 1 of the second component rises slightly, traffic 1 of the third component decreases significantly, and these and other factors are considered together, the first large model can predict through data analysis that the first component is about to experience a certain fault and needs to be dealt with as soon as possible.

[0120] Therefore, data analysis using the first large model helps to detect faults before they actually happen and makes full use of state data that shows no obvious abnormality when a fault occurs or is about to occur, thereby improving the accuracy of data analysis and enabling the cause of the fault to be located more accurately.

[0121] In one implementation, performing data analysis processing using the first large model based on the target prompt information and the state data in the state data group to obtain the data analysis result includes: determining the data analysis strategy corresponding to the target prompt information; performing data analysis processing on each piece of state data in the state data group according to the data analysis strategy to obtain a first analysis result, where the first analysis result includes the pending faults of the routing device and the fault category of each pending fault; selecting, based on the first large model, a target fault handling method from the candidate fault handling methods of each pending fault; and adding the fault category and the target fault handling method to the data analysis result.

[0122] The number of pending faults can be one or more. Pending faults can be faults that have already occurred in the routing device or faults that are about to occur in the routing device.

[0123] A data analysis strategy refers to a series of plans, methods, and steps developed to achieve specific goals during the data analysis process. It encompasses the entire process from data collection, processing, and analysis to result interpretation and application, aiming to maximize the value of data through a systematic approach. Developing a data analysis strategy requires considering business needs, data characteristics, and technical capabilities to ensure the accuracy, effectiveness, and operability of the analysis results.

[0124] Each piece of prompt information corresponds to one data analysis strategy, and the target prompt information is one of several pieces of prompt information. Therefore, the data analysis strategy corresponding to the target prompt information can be determined by querying the pre-configured correspondence between prompt information and data analysis strategies.

[0125] Each piece of state data in the state data group is analyzed according to the data analysis strategy to obtain the first analysis result; the first analysis result includes one or more pending faults of the routing device and the fault category of each pending fault.

[0126] For example, the data received by the first large model includes: target prompt information 1 and state data group 1, target prompt information 2 and state data group 2, and target prompt information 3 and state data group 3.

[0127] The first large model performs data analysis processing on each piece of state data in state data group 1 according to the data analysis strategy corresponding to target prompt information 1, and obtains analysis result 1.

[0128] The first large model performs data analysis processing on each piece of state data in state data group 2 according to the data analysis strategy corresponding to target prompt information 2, and obtains analysis result 2.

[0129] The first large model performs data analysis processing on each piece of state data in state data group 3 according to the data analysis strategy corresponding to target prompt information 3, and obtains analysis result 3.

[0130] Analysis result 1 indicates that component 1 in the routing device has no fault. Analysis result 2 indicates that component 2 has pending fault 1, whose fault category is the first category. Analysis result 3 indicates that component 3 has pending faults 2 and 3; the fault category of pending fault 2 is the second category, and that of pending fault 3 is the third category.

[0131] Based on analysis results 1, 2, and 3, a first analysis result is generated. The first analysis result includes pending faults 1, 2, and 3, whose fault categories are the first, second, and third categories respectively.
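
The aggregation in this example can be sketched as a merge of per-group results. The representation of each analysis result as a mapping from pending fault to fault category is an assumption made for illustration, as is the function name `build_first_analysis_result`.

```python
from typing import Dict, List

# Per-group results from the example: each maps a pending fault to its category.
analysis_result_1: Dict[str, str] = {}  # component 1: no fault
analysis_result_2 = {"fault 1": "first category"}
analysis_result_3 = {"fault 2": "second category", "fault 3": "third category"}


def build_first_analysis_result(results: List[Dict[str, str]]) -> Dict[str, str]:
    """Collect every pending fault and its category across the per-group results."""
    merged: Dict[str, str] = {}
    for result in results:
        merged.update(result)
    return merged


first_analysis_result = build_first_analysis_result(
    [analysis_result_1, analysis_result_2, analysis_result_3]
)
```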

[0132] As another example, the data received by the first large model includes: target prompt information 1 and state data group 1, target prompt information 2 and state data group 2, and target prompt information 3 and state data group 3.

[0133] The first large model performs data analysis processing on each piece of state data in state data group 1 according to the data analysis strategy corresponding to target prompt information 1, and obtains analysis result 1.

[0134] The first large model performs data analysis processing on each piece of state data in state data group 2 according to the data analysis strategy corresponding to target prompt information 2, and obtains analysis result 2.

[0135] The first large model performs data analysis processing on each piece of state data in state data group 3 according to the data analysis strategy corresponding to target prompt information 3, and obtains analysis result 3.

[0136] Here, analysis result 1 indicates that fault 1 does not exist in the routing device. Analysis result 2 indicates that fault 2 exists in the routing device and belongs to the first category. Analysis result 3 indicates that fault 3 does not exist in the routing device.

[0137] Based on analysis results 1, 2, and 3, a first analysis result is generated. This first analysis result includes one pending fault, namely fault 2, whose fault category is the first category.

[0138] Based on the first large model, a target fault handling method is selected from the candidate fault handling methods of each pending fault, and the fault category and the target fault handling method are added to the data analysis result.

[0139] Each pending fault can have one or more candidate fault handling methods. Each candidate fault handling method can describe an operational and maintenance technique for resolving the pending fault.

[0140] For example, a candidate fault handling method could be to restart the first component in the routing device, to generate a temperature control command that adjusts the temperature of the first component, or to start a backup component for the first component, and so on.

[0141] When there is only one candidate fault handling method, that candidate fault handling method can be directly adopted, that is, the candidate fault handling method is determined as the target fault handling method.

[0142] When there are multiple candidate fault handling methods, the first large model can use the selection ability learned during model training to select the target fault handling method from among them.
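
The two cases above (one candidate versus several) can be sketched as follows. The callable `choose_with_model` is a hypothetical stand-in for the first large model's selection; the check that the chosen method is one of the candidates is an added safeguard, not something the application states.

```python
from typing import Callable, List


def select_target_method(
    candidates: List[str],
    choose_with_model: Callable[[List[str]], str],
) -> str:
    """Pick the target fault handling method from the candidates."""
    if not candidates:
        raise ValueError("a pending fault needs at least one candidate method")
    if len(candidates) == 1:
        return candidates[0]  # single candidate: adopt it directly
    chosen = choose_with_model(candidates)
    if chosen not in candidates:
        # Safeguard (assumption): reject answers outside the candidate list.
        raise ValueError("the model must choose one of the candidates")
    return chosen
```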

[0143] After the pending fault and its target fault handling method have been determined, the fault category and the target fault handling method can be added to the data analysis result of the state data group.

[0144] In addition, when there are multiple pending faults, they can be grouped based on their fault categories and the pre-configured fault category priorities.

[0145] For example, the faults to be processed include: fault 1, fault 2, fault 3, and fault 4. Fault 1 belongs to the first category, fault 2 belongs to the second category, fault 3 belongs to the third category, and fault 4 belongs to the fourth category.

[0146] In the pre-configured fault category priorities, the first category has the highest priority, the fourth category has the lowest priority, and the second and third categories have the same priority. This fault category priority is configured based on the magnitude of the potential safety risk posed by the fault category; a higher priority indicates a greater safety risk, and vice versa.

[0147] After the pending faults are grouped, two groups are obtained: the first group of pending faults includes fault 1; the second group of pending faults includes fault 2, fault 3, and fault 4.

[0148] By grouping pending faults according to the pre-configured fault category priorities, faults with relatively high risk that need to be dealt with immediately can be placed in one group, while those with relatively low risk that can be dealt with later can be placed in another group. The fault categories and fault handling methods of the grouped pending faults can then be transmitted to the fault handling module, so that the fault handling module can prioritize the handling of urgent pending faults.
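
The grouping in this example can be sketched as below. The priority table follows the example (first category highest, fourth lowest, second and third equal). The application does not state where the cut between the two groups lies; placing only the highest-priority category in the urgent group is an assumption chosen so that the output matches the example.

```python
from typing import Dict, List, Tuple

# Smaller number = higher priority; the second and third categories share one.
CATEGORY_PRIORITY = {"first": 1, "second": 2, "third": 2, "fourth": 3}


def group_pending_faults(faults: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split faults into an urgent group and a deferred group by category priority."""
    urgent = [f for f, cat in faults.items() if CATEGORY_PRIORITY[cat] == 1]
    deferred = [f for f, cat in faults.items() if CATEGORY_PRIORITY[cat] > 1]
    return urgent, deferred


pending = {"fault 1": "first", "fault 2": "second", "fault 3": "third", "fault 4": "fourth"}
first_group, second_group = group_pending_faults(pending)
```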

[0149] In some implementations, the operation and maintenance system may further include a decision processing module. The first intelligent agent performs data analysis on each piece of state data in the state data group according to the data analysis strategy, obtaining the first analysis result, which includes the pending faults of the routing device and their fault categories. The decision processing module selects, based on the first large model, a target fault handling method from the candidate fault handling methods of each pending fault, and adds the fault category of the pending fault and the corresponding target fault handling method to the data analysis result. In other words, the first intelligent agent determines the fault category of each pending fault, and the decision processing module determines the fault handling method for each pending fault.

[0150] In addition, the operation and maintenance system of the routing device may also include a fault handling module. After executing step S108 and obtaining the data analysis result, the first agent may transmit the data analysis result to the fault handling module, which performs operation and maintenance on the routing device.

[0151] Step S110: perform operation and maintenance processing on the routing device based on the data analysis result.

[0152] Step S110 can be executed by the fault handling module in the operation and maintenance system of the routing device.

[0153] When the data analysis result includes the fault category and the fault handling method, the fault handling module can perform operation and maintenance processing on the routing device according to the fault category and the fault handling method.

[0154] When the data analysis result includes the operation and maintenance level, the fault category, and the fault handling method, the fault handling module can determine the target time period for operation and maintenance based on the operation and maintenance level, and then perform operation and maintenance processing on the routing device within the target time period according to the fault category and the fault handling method.

[0155] When there are multiple pending faults, the fault handling module can also determine the priority of each pending fault, determine the fault handling order based on the priority of each pending fault, and then perform operation and maintenance processing on the routing device according to the fault handling order, the fault category of each pending fault, and the fault handling method of each pending fault.

[0156] When there are multiple pending faults and the data analysis result shows that the pending faults are divided into multiple groups, the fault handling module can also assign each group of pending faults to a different operation and maintenance component within the fault handling module, which executes the corresponding operation and maintenance processing.

[0157] For example, the first group of pending faults includes fault 1; the second group includes faults 2, 3, and 4. The first group of pending faults has a relatively higher priority, while the second group has a relatively lower priority. The fault handling module can assign the first group of pending faults to the first maintenance component and the second group to the second maintenance component. Compared to the second maintenance component, the first maintenance component has relatively abundant computing resources and can complete maintenance tasks more quickly.

[0158] The fault handling module can also store the fault handling results in a local database.

[0159] The fault handling module can also decide whether to send an alarm to the user immediately based on priority.

[0160] The fault handling module can also upload the fault handling process record to cloud storage when the operation and maintenance system of the routing device is idle, so as to accumulate more historical operation and maintenance data. The accumulated historical operation and maintenance data can be used to update the parameters of one or more models in the operation and maintenance system of the routing device.

[0161] In one implementation, the operation and maintenance method of the routing device further includes: acquiring fault information; performing intent recognition on the fault information to obtain an intent recognition result; performing text generation processing using a second large model based on the intent recognition result and the fault information to obtain target fault text; performing retrieval processing in a first knowledge graph based on the target fault text to obtain knowledge retrieval results; and determining, from the knowledge retrieval results, target knowledge that matches the intent recognition result.

[0162] Fault information can be fault status, fault phenomenon, abnormal alarm information, etc.

[0163] In the process of performing intent recognition on fault information and obtaining the intent recognition result, intent recognition can be achieved using keyword detection. For example, a keyword set can be pre-configured, and it can be determined whether the fault information matches each keyword in the keyword set. If the fault information matches at least one keyword in the keyword set, then the intent information corresponding to the at least one matched keyword is taken as the intent recognition result of the fault information.
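
Keyword-based intent recognition as described above can be sketched in a few lines. The keyword set, the intent strings, and the case-insensitive substring match are hypothetical choices made for illustration; the application only specifies that intent information is attached to matched keywords.

```python
from typing import Dict, List

# Hypothetical keyword set and the intent information attached to each keyword.
INTENT_BY_KEYWORD: Dict[str, str] = {
    "overheat": "diagnose temperature fault",
    "packet loss": "diagnose link fault",
    "reboot": "diagnose unexpected restart",
}


def recognize_intent(fault_information: str) -> List[str]:
    """Return the intent information of every keyword found in the fault information."""
    text = fault_information.lower()
    return [intent for keyword, intent in INTENT_BY_KEYWORD.items() if keyword in text]


intents = recognize_intent("Alarm: port 3 packet loss after device overheat")
```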

[0164] Alternatively, intent recognition can be performed by inputting the fault information into a second large model, which uses the intent recognition capability learned during model training to recognize the intent of the fault information and output the intent recognition result.

[0165] The second large model belongs to the target domain; for the target domain, refer to the corresponding description of the first large model above. The second large model can be a large model trained using training data from the target domain.

[0166] To obtain the target fault text, the intent recognition result and the fault information can be input into the second large model for text generation processing. This process uses the text generation capability learned by the second large model during model training.

[0167] The target fault text and the fault information have the same substantive content, but their linguistic expressions may be the same or different.

[0168] Text generation by the second large model amounts to text rewriting, which makes the rewritten text better suited to the retrieval processing in subsequent steps.

[0169] For example, the fault information includes X1. After the intent recognition result and fault information X1 are input into the second large model for text generation, the target fault text X1' is obtained. Compared with using X1 directly, using X1' for retrieval in the first knowledge graph can retrieve more knowledge. This is because fault information is not written as a retrieval query for the first knowledge graph: when it is used directly, the retrieval module may wrongly conclude that a piece of knowledge in the first knowledge graph is unrelated to the fault information merely because the same concept is expressed differently. Text rewriting transforms the fault information into an expression better suited to retrieval in the first knowledge graph, thus improving retrieval recall.

[0170] Recall is an important metric in information retrieval, machine learning, and data analysis. It measures the proportion of all actually relevant samples that a retrieval system or classification model correctly retrieves or classifies. In other words, recall focuses on how many of the relevant results the system can find.
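
As a worked illustration of the definition, recall is the number of relevant items retrieved divided by the total number of relevant items. The item identifiers and the two result sets below are made-up toy values, used only to show how rewriting X1 into X1' could raise recall; they are not measurements.

```python
from typing import Set


def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Fraction of the relevant items that were actually retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(retrieved & relevant) / len(relevant)


relevant_knowledge = {"k1", "k2", "k3", "k4"}                            # toy ground truth
recall_original = recall({"k1", "k9"}, relevant_knowledge)               # toy query with X1
recall_rewritten = recall({"k1", "k2", "k3", "k9"}, relevant_knowledge)  # toy query with X1'
```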

[0171] The first knowledge graph can be a knowledge graph for the target domain.

[0172] Knowledge graphs are a structured method of knowledge representation used to describe entities in the real world and the relationships between them. By organizing knowledge into a graph structure, they can support various applications such as semantic search, intelligent question answering, and recommendation systems.

[0173] The first knowledge graph can include multiple nodes, each node representing a piece of knowledge in the target domain.

[0174] To determine the target knowledge that matches the intent recognition result, the intent recognition result and the knowledge retrieval results can be input into the first large model, which selects from the knowledge retrieval results the target knowledge that matches the intent recognition result.

[0175] In some implementations, the operation and maintenance system may also include a second intelligent agent and a retrieval module. The second intelligent agent receives fault information sent by the first intelligent agent; the second intelligent agent performs intent recognition on the fault information to obtain intent recognition results; the second intelligent agent uses the intent recognition results as intent prompt information, and inputs the intent prompt information and fault information into a second large model for text generation processing to obtain target fault text, which is then transmitted to the retrieval module. The retrieval module then performs retrieval processing based on the target fault text in a first knowledge graph to obtain knowledge retrieval results, which are transmitted to the second intelligent agent; the second intelligent agent identifies target knowledge matching the intent recognition results from the knowledge retrieval results and transmits it to the first intelligent agent.

[0176] The second agent can be an agent within the operation and maintenance system of the routing device and can operate with the support of the second large model.

[0177] In this embodiment of the application, the first intelligent agent can serve as the center of the operation and maintenance system of the routing device. The first intelligent agent can assign some tasks to other intelligent agents in the operation and maintenance system for processing, and receive the task processing results returned by other intelligent agents.

[0178] In this implementation, the first intelligent agent can assign a portion of the data processing tasks to the second intelligent agent and receive the task processing results returned by the second intelligent agent.

[0179] The first agent can send fault information to the second agent. After receiving the fault information from the first agent, the second agent can perform intent recognition on the fault information to obtain the intent recognition result. The second agent can use the intent recognition result as intent prompt information, and input the intent prompt information and the fault information into the second large model for text generation processing to obtain the target fault text, which is then transmitted to the retrieval module.

[0180] After receiving the target fault text, the retrieval module can perform retrieval processing on the first knowledge graph based on the target fault text, obtain the knowledge retrieval results, and transmit them to the second intelligent agent. After receiving the knowledge retrieval results returned by the retrieval module, the second intelligent agent can identify the target knowledge that matches the intent recognition result from the knowledge retrieval results and transmit it to the first intelligent agent.

[0181] The target knowledge can serve as reference information when the first intelligent agent performs data analysis on the fault information based on the first large model.

[0182] In one implementation, performing retrieval processing in the first knowledge graph based on the target fault text to obtain the knowledge retrieval results includes: performing full-text retrieval in the first knowledge graph based on the target fault text to obtain a first retrieval result; performing vector retrieval in the first knowledge graph based on the target fault text to obtain a second retrieval result; merging the first and second retrieval results to obtain a merged retrieval result, where the merged retrieval result includes multiple candidate fault handling methods; and ranking the candidate fault handling methods in the merged retrieval result based on a third large model, and determining the ranked merged retrieval result as the knowledge retrieval result.

[0183] Full-text search is an information retrieval technology used to quickly find documents containing specific keywords or phrases within a large amount of text data.

[0184] Vector retrieval is an information retrieval technique based on a vector space model, used to quickly find vectors similar to the query vector in large-scale, high-dimensional vector data.
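
Vector retrieval can be illustrated with a brute-force cosine-similarity search. This is a sketch only: the application does not name a similarity measure or index structure, so cosine similarity, the linear scan, the node names, and the toy 2-dimensional vectors are all assumptions. Production systems typically use approximate nearest-neighbor indexes instead of a full scan.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(
    query: Sequence[float], index: Dict[str, Sequence[float]], top_k: int
) -> List[str]:
    """Return the ids of the top_k vectors most similar to the query."""
    ranked = sorted(
        index, key=lambda node: cosine_similarity(query, index[node]), reverse=True
    )
    return ranked[:top_k]


# Toy embeddings; real embeddings would come from an embedding model.
node_vectors = {"node A": [1.0, 0.0], "node B": [0.7, 0.7], "node C": [0.0, 1.0]}
nearest_nodes = vector_search([1.0, 0.1], node_vectors, top_k=2)
```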

[0185] The first retrieval result is obtained by performing full-text retrieval in the first knowledge graph based on the target fault text; the second retrieval result is obtained by performing vector retrieval in the first knowledge graph based on the target fault text. Because the retrieval methods differ, the first and second retrieval results may share some content and differ in other content.

[0186] To merge the first and second retrieval results, the first retrieval result can be treated as one set and the second retrieval result as another set; the union of the two sets is the merged retrieval result.
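
The union-based merge can be sketched as follows. Keeping first-seen order while removing duplicates is a design choice made here so that the output is deterministic; the text only requires the set union. The example entries are illustrative.

```python
from typing import List


def merge_results(first: List[str], second: List[str]) -> List[str]:
    """Union of both result lists, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys(first + second))


first_retrieval_result = ["restart component", "adjust temperature"]
second_retrieval_result = ["adjust temperature", "start backup component"]
merged_retrieval_result = merge_results(first_retrieval_result, second_retrieval_result)
```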

[0187] The merged retrieval result can include multiple candidate fault handling methods.

[0188] The third large model is used to rank the candidate fault handling methods in the merged retrieval result, and the ranked merged retrieval result is determined as the knowledge retrieval result.

[0189] The third large model belongs to the target domain; for the target domain, refer to the corresponding description of the first large model above. The retrieval module can run with the support of the third large model. The third large model is a large model trained using training data from the target domain.

[0190] The ranking uses the ranking ability learned by the third large model during model training to order the candidate fault handling methods in the merged retrieval result.

[0191] In the ranked merged retrieval result, candidate fault handling methods that appear earlier are those that the third large model judges to have a better operation and maintenance effect.
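
The ranking step can be sketched with a scoring callable standing in for the third large model. The `score` function and the numeric scores are hypothetical; the application does not say that the model emits numeric scores, only that it orders the candidates.

```python
from typing import Callable, List


def rerank(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Sort candidates so the highest-scoring method comes first."""
    return sorted(candidates, key=score, reverse=True)


# Made-up scores standing in for the third large model's judgement.
toy_scores = {
    "restart component": 0.4,
    "adjust temperature": 0.9,
    "start backup component": 0.6,
}
knowledge_retrieval_result = rerank(list(toy_scores), toy_scores.__getitem__)
```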

[0192] Ranking allows the second agent, once the retrieval module has transmitted the knowledge retrieval results to it, to identify more quickly the target knowledge that matches the intent recognition result and transmit it to the first agent.

[0193] The data processing flow of the second intelligent agent and the retrieval module is illustrated below with reference to Figure 3. Figure 3 is a partial flowchart of a routing device operation and maintenance method provided in an embodiment of this application.

[0194] As shown in Figure 3, step S302: intent recognition.

[0195] The second agent receives the fault information sent by the first agent; the second agent performs intent recognition on the fault information and obtains the intent recognition result; the second agent uses the intent recognition result as intent prompt information, and inputs the intent prompt information and fault information into the second large model for text generation processing to obtain the target fault text.

[0196] In addition, if the intent recognition result is within the knowledge range of the target domain, the intent recognition result will be transmitted to the retrieval module; otherwise, the fault information will be directly sent to the output buffer.

[0197] Step S304: hybrid retrieval.

[0198] The second agent transmits the target fault text to the retrieval module, enabling the retrieval module to perform retrieval processing based on the target fault text in the first knowledge graph, thereby obtaining knowledge retrieval results. The purpose of hybrid retrieval is primarily to improve knowledge coverage.

[0199] Step S306: Full-text search.

[0200] The retrieval module performs full-text retrieval processing on the target fault text in the first knowledge graph to obtain the first retrieval result.

[0201] Step S308: vector retrieval.

[0202] The retrieval module performs vector retrieval processing on the target fault text in the first knowledge graph to obtain the second retrieval result.

[0203] Step S310: Merge and reorder.

[0204] The retrieval module merges the first and second retrieval results to obtain a merged retrieval result. The merged retrieval result includes multiple candidate fault handling methods. Based on the third major model, the candidate fault handling methods in the merged retrieval result are sorted, and the sorted merged retrieval result is determined as the knowledge retrieval result. The knowledge retrieval result is then transmitted to the second intelligent agent.

[0205] Step S312: intent matching with knowledge.

[0206] Step S314: The target node is retrieved.

[0207] The second agent identifies target knowledge that matches the intent recognition result from the knowledge retrieval results. This target knowledge is the target node.

[0208] Step S316: Take backup measures.

[0209] Step S318: Store in the output buffer.

[0210] If target knowledge that matches the intent recognition result is identified in the knowledge retrieval results, the second agent sends the target knowledge to the output buffer; otherwise, the second agent directly stores the intent recognition result in the output buffer. The second agent can also transmit the data stored in the output buffer to the first agent.
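The branch between steps S314, S316 and S318 can be sketched as follows. This is a hedged illustration only: the function name `select_for_output_buffer`, the dictionary layout, and the rule that a knowledge item matches when its `intent` label equals the recognized intent are all assumptions, and treating the fallback as the backup measure of step S316 is one reading of the text.

```python
from typing import Dict, List


def select_for_output_buffer(
    intent: str, knowledge_results: List[Dict[str, str]]
) -> Dict[str, object]:
    """Return the entry that the second agent would place in the output buffer."""
    for item in knowledge_results:
        # Assumption: a knowledge item matches when its intent label
        # equals the intent recognition result.
        if item.get("intent") == intent:
            return {"kind": "target_knowledge", "payload": item}
    # No match: fall back to storing the intent recognition result itself.
    return {"kind": "intent_recognition_result", "payload": intent}


# Hypothetical knowledge retrieval results.
results = [
    {"intent": "bgp_peer_down", "method": "reset the BGP peer session"},
    {"intent": "optical_power_low", "method": "clean or replace the optical module"},
]
hit = select_for_output_buffer("bgp_peer_down", results)
miss = select_for_output_buffer("fan_failure", results)
```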

[0211] In one implementation, after determining the target knowledge that matches the intent recognition result in the knowledge retrieval results, the operation and maintenance method of the routing device further includes: obtaining log information; dividing the log information to obtain multiple log blocks; constructing a data control group based on the log blocks and the target knowledge; each data control group includes a log block and a target knowledge.

[0212] Log information refers to records automatically generated by a system, application, or device during operation, used to record events, states, errors, operations, or other relevant information. Log information is usually stored in text format and includes key information such as timestamps, event types, and descriptions.

[0213] Dividing the log information into multiple log blocks can be done by first preprocessing the log information, which includes abnormal log records, and then using a large model pre-trained on the target domain to divide the preprocessed log information into multiple log blocks, each containing at least one abnormal log record. This step relies on the segmentation capability that the large model acquired during training.

[0214] The data control group is constructed based on log blocks and target knowledge. This can be done by identifying the target knowledge that matches each log block and constructing a data control group based on the log block and the target knowledge that matches it.

[0215] In the case where each target knowledge describes a candidate fault handling method, a target knowledge matches a log block when maintaining the routing device with the candidate fault handling method described by that target knowledge may restore the abnormal log records in the log block to normal.

[0216] If the candidate fault handling method described by the target knowledge is unrelated to the abnormal log records in the log block, the target knowledge does not match the log block.

[0217] In some implementations, the operation and maintenance system may also include a third intelligent agent; after the second intelligent agent determines the target knowledge that matches the intent recognition result in the knowledge retrieval results and transmits it to the first intelligent agent, the third intelligent agent receives the log information and target knowledge sent by the first intelligent agent; the third intelligent agent divides the log information to obtain multiple log blocks; the third intelligent agent constructs a data control group based on the log blocks and target knowledge and transmits it to the first intelligent agent; each data control group includes a log block and a target knowledge.

[0218] The third agent can be an agent within the operation and maintenance system of the routing device.

[0219] In this embodiment of the application, the first intelligent agent can serve as the center of the operation and maintenance system of the routing device. The first intelligent agent can assign some tasks to other intelligent agents in the operation and maintenance system for processing, and receive the task processing results returned by other intelligent agents.

[0220] In this implementation, the first intelligent agent can assign a portion of the data processing tasks to the third intelligent agent and receive the task processing results returned by the third intelligent agent.

[0221] Considering that logs can greatly assist data analysis using the first large model, and that log text is typically quite long, a third agent can be configured to process the logs. The third agent can preprocess the received logs to remove invalid log information and normal log information that is unrelated to the fault.

[0222] The first intelligent agent can transmit log information and target knowledge to the third intelligent agent.

[0223] After the third agent receives the log information and target knowledge sent by the first agent, the third agent can divide the log information into multiple log blocks.

[0224] When the third intelligent agent divides the log information, the division principle is that each log block must have explicit characteristics. The third intelligent agent can first preprocess the log information and then divide it into multiple log blocks, so that each log block includes one or more abnormal log records.

[0225] The third agent constructs data control groups based on log blocks and target knowledge. Each data control group includes a log block and a target knowledge.

[0226] The third agent can combine log blocks with knowledge blocks in the target knowledge to form one or more data control groups, for example, [(knowledge block 1, log block 1), (knowledge block 2, log block 2), ..., (knowledge block n, log block n)], where n is an integer greater than 1.

[0227] The third agent can transmit the data control group to the first agent. The first agent can use the data control group as reference information when it performs data analysis on the fault information based on the first large model.

[0228] The data processing flow of the third intelligent agent is described below with reference to Figure 4. Figure 4 is another partial flowchart of a routing device operation and maintenance method provided in an embodiment of this application.

[0229] As shown in Figure 4, step S402, obtain the log.

[0230] The third agent receives log information and target knowledge sent by the first agent.

[0231] Step S404, log preprocessing.

[0232] The third agent preprocesses the log information to remove normal log information and invalid log information that are irrelevant to the fault.

[0233] Step S406: Split the log.

[0234] The third agent divides the preprocessed log information into multiple log blocks.

[0235] Step S408: Construct a data control group.

[0236] The third agent constructs a data control group based on the log block and the target knowledge and transmits it to the first agent; each data control group includes a log block and a target knowledge.
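Steps S402 to S408 can be sketched end to end as follows. This is a minimal sketch under loud assumptions, not the method of the application: the `LogRecord` layout, the rule that `INFO` records are "normal" and empty messages are "invalid", the fixed-size chunking (a stand-in for the model-based segmentation described above), and the keyword-based `keyword_match` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class LogRecord:
    timestamp: str
    level: str  # e.g. "INFO", "WARN", "ERROR"
    message: str


LogBlock = List[LogRecord]


def preprocess(records: List[LogRecord]) -> List[LogRecord]:
    """S404: drop invalid (empty) records and normal (INFO) records."""
    return [r for r in records if r.message.strip() and r.level != "INFO"]


def split_into_blocks(records: List[LogRecord], block_size: int = 2) -> List[LogBlock]:
    """S406: fixed-size chunking, standing in for model-based segmentation."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def build_control_groups(
    blocks: List[LogBlock],
    knowledge: List[str],
    matches: Callable[[LogBlock, str], bool],
) -> List[Tuple[str, LogBlock]]:
    """S408: pair each log block with every target knowledge that matches it."""
    return [(k, block) for block in blocks for k in knowledge if matches(block, k)]


def keyword_match(block: LogBlock, knowledge: str) -> bool:
    """Toy matcher: the text before ':' in the knowledge must occur in the block."""
    keyword = knowledge.split(":")[0].lower()
    return any(keyword in r.message.lower() for r in block)


records = [
    LogRecord("t1", "INFO", "link up"),
    LogRecord("t2", "ERROR", "BGP peer 10.0.0.1 down"),
    LogRecord("t3", "ERROR", ""),
    LogRecord("t4", "WARN", "optical power low on port 3"),
    LogRecord("t5", "ERROR", "BGP hold timer expired"),
]
knowledge = ["BGP: reset the peer session", "optical: clean or replace the module"]

cleaned = preprocess(records)
blocks = split_into_blocks(cleaned)
groups = build_control_groups(blocks, knowledge, keyword_match)
```

Because preprocessing runs before splitting, every block produced here contains only abnormal records, which is consistent with the requirement that each log block include at least one abnormal log record.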

[0237] In one implementation, after constructing a data control group based on log blocks and target knowledge, the operation and maintenance method of the routing device further includes: determining the log prompt information corresponding to the data control group; judging, under the prompt of the log prompt information, whether the log block in the data control group matches the target knowledge in the data control group; and retaining the data control group if the judgment result indicates that the log block in the data control group matches the target knowledge in the data control group.

[0238] Determining the log prompt information corresponding to the data control group can be done by querying a pre-configured correspondence between log identifiers and prompt information, using the log identifier of the log block in the data control group.

[0239] To judge whether the log block in the data control group matches the target knowledge in the data control group, the log prompt information, the log block in the data control group, and the target knowledge in the data control group can be input into the fourth large model, which outputs the judgment result.

[0240] The fourth large model belongs to the target domain; for the target domain, see the corresponding explanation of the first large model above. The fourth large model is a large model trained using training data from the target domain.

[0241] Retaining the data control group based on the judgment result can be understood as follows: if the judgment result indicates that the log block in the data control group matches the target knowledge in the data control group, the data control group is retained.

[0242] Conversely, the data control group can be deleted based on the judgment result: if the judgment result indicates that the log block in the data control group does not match the target knowledge in the data control group, the data control group is deleted.

[0243] In some implementations, the operation and maintenance system may also include a fourth intelligent agent; after the third intelligent agent constructs a data control group based on log blocks and target knowledge and transmits it to the first intelligent agent, the operation and maintenance method of the routing device further includes: the fourth intelligent agent receiving the data control group sent by the first intelligent agent; the fourth intelligent agent determining the log prompt information corresponding to the data control group, and judging whether the log blocks in the data control group match the target knowledge in the data control group under the prompt of the log prompt information; if the judgment result indicates that the log blocks in the data control group match the target knowledge in the data control group, then the data control group is retained, and the fourth intelligent agent can transmit the data control group to the first intelligent agent.

[0244] The fourth agent can be an agent in the operation and maintenance system of the routing device.

[0245] In this embodiment of the application, the first intelligent agent can serve as the center of the operation and maintenance system of the routing device. The first intelligent agent can assign some tasks to other intelligent agents in the operation and maintenance system for processing, and receive the task processing results returned by other intelligent agents.

[0246] In this implementation, the first intelligent agent can assign a portion of the data processing tasks to the fourth intelligent agent and receive the task processing results returned by the fourth intelligent agent.

[0247] In addition, after the third agent constructs a data control group based on log blocks and target knowledge and transmits it to the first agent, before the fourth agent receives the data control group sent by the first agent, the first agent can perform data analysis processing on the data control group to obtain analysis results, and determine whether to assign tasks to the fourth agent based on the analysis results.

[0248] If the first agent determines to assign a task to the fourth agent, the first agent sends the data control group to the fourth agent. If the first agent determines that it does not need to assign a task to the fourth agent, the first agent can directly use the data control group as reference information when it performs data analysis on the fault information based on the first large model.

[0249] After the fourth agent receives the data control group sent by the first agent, the fourth agent can determine the log prompt information corresponding to the data control group, and judge, under the prompt of the log prompt information, whether the log block in the data control group matches the target knowledge in the data control group.

[0250] In one example, the fourth agent can assemble the log prompt information with the data control group, input the assembled data into the fourth large model, and obtain the fourth large model's judgment of whether the log block in the data control group matches the target knowledge in the data control group.

[0251] Any prompt engineering technique can be used in determining the log prompt information, such as chain-of-thought (CoT) prompting.
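Assembling the log prompt information with one data control group can be sketched as follows. This is only an illustration: the function name `build_judgment_prompt`, the template layout, and the `MATCH` / `NO_MATCH` answer convention are hypothetical, and the closing step-by-step instruction is one possible reading of the chain-of-thought style prompting mentioned above.

```python
from typing import List


def build_judgment_prompt(
    log_prompt: str, target_knowledge: str, log_block: List[str]
) -> str:
    """Assemble the log prompt information with one data control group."""
    log_text = "\n".join(f"- {line}" for line in log_block)
    return (
        f"{log_prompt}\n\n"
        f"Target knowledge:\n{target_knowledge}\n\n"
        f"Log block:\n{log_text}\n\n"
        # Hypothetical chain-of-thought style instruction.
        "Reason step by step, then answer MATCH or NO_MATCH on the last line."
    )


prompt = build_judgment_prompt(
    log_prompt="Decide whether the fault handling method could clear the abnormal log records.",
    target_knowledge="Reset the BGP peer session.",
    log_block=["BGP peer 10.0.0.1 down", "BGP hold timer expired"],
)
```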

[0252] The fourth agent can operate with the support of the fourth large model.

[0253] If the judgment result indicates that the log block in the data control group matches the target knowledge in the data control group, the fourth agent retains the data control group and can transmit the retained data control group to the first agent. Conversely, if the judgment result indicates that the log block in the data control group does not match the target knowledge in the data control group, the fourth agent can process the data control group by deleting, masking, or adding invalid tags.
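The retain, delete, mask and invalid-tag options can be sketched as follows. This is a minimal sketch, not the implementation of the application: `MismatchPolicy`, `screen_groups`, the dictionary layout of a data control group, and the `"[masked]"` and `"valid"` conventions are hypothetical, and `keyword_judge` is a toy stand-in for the judgment that the fourth large model would return.

```python
from enum import Enum
from typing import Callable, Dict, List

Group = Dict[str, object]


class MismatchPolicy(Enum):
    DELETE = "delete"
    MASK = "mask"
    TAG_INVALID = "tag_invalid"


def screen_groups(
    groups: List[Group],
    judge: Callable[[Group], bool],
    policy: MismatchPolicy = MismatchPolicy.DELETE,
) -> List[Group]:
    """Retain matching groups; handle mismatches according to the policy."""
    screened: List[Group] = []
    for group in groups:
        if judge(group):
            screened.append(dict(group))  # retained unchanged
        elif policy is MismatchPolicy.MASK:
            screened.append({**group, "log_block": "[masked]"})
        elif policy is MismatchPolicy.TAG_INVALID:
            screened.append({**group, "valid": False})
        # MismatchPolicy.DELETE: the group is simply not carried over.
    return screened


def keyword_judge(group: Group) -> bool:
    """Toy judge: the first word of the knowledge must occur in the log block."""
    keyword = str(group["knowledge"]).split()[0]
    return keyword in str(group["log_block"])


groups = [
    {"knowledge": "bgp reset the peer session", "log_block": "bgp peer down"},
    {"knowledge": "optical replace the module", "log_block": "bgp hold timer expired"},
]
kept = screen_groups(groups, keyword_judge)
tagged = screen_groups(groups, keyword_judge, MismatchPolicy.TAG_INVALID)
masked = screen_groups(groups, keyword_judge, MismatchPolicy.MASK)
```

Masking and tagging keep the mismatched group visible to the first agent for auditing, whereas deletion shortens the reference information passed to the first large model.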

[0254] Through the data processing flow of the fourth intelligent agent, data control groups whose log blocks do not match the target knowledge can be filtered out from the multiple data control groups. As a result, the data control groups received by the first intelligent agent are more accurate. When the first intelligent agent uses the first large model to perform data analysis, the data control groups can be used as reference information, which helps to improve the accuracy of data analysis.

[0255] The data processing flow of the fourth intelligent agent can be illustrated below with reference to Figure 5.

[0256] Figure 5 is another partial flowchart of a routing device operation and maintenance method provided in an embodiment of this application.

[0257] As shown in Figure 5, in step S502, a data control group is obtained.

[0258] The fourth agent receives the data control group sent by the first agent.

[0259] Step S504: Determine the target prompt information.

[0260] The fourth agent determines the log prompt information corresponding to the data control group.

[0261] Step S506: Input the data into the fourth large model.

[0262] Step S508: Determine the judgment result.

[0263] Under the prompt of the log prompt information, the fourth agent judges whether the log block in the data control group matches the target knowledge in the data control group. If the judgment result indicates that the log block in the data control group matches the target knowledge in the data control group, the fourth agent transmits the result to the first agent.

[0264] In one implementation, the operation and maintenance method of the routing device further includes: obtaining the corresponding log data based on the alarm information; determining the corresponding alarm prompt information based on the alarm information; and performing data analysis processing using the first large model based on the alarm prompt information, log data, and alarm information to obtain the data analysis results corresponding to the alarm information.

[0265] Obtaining the corresponding log data based on the alarm information can be done by performing data analysis on the alarm information using the first large model; if the data analysis result indicates that log data is needed, the corresponding log data is obtained based on the alarm identifier in the alarm information.

[0266] Based on the alarm information, the corresponding alarm prompt information can be determined. This can be done by determining the corresponding alarm prompt information based on the pre-configured correspondence between alarm identifiers and prompt information.

[0267] Performing data analysis processing using the first large model based on the alarm prompt information, log data, and alarm information can be done by inputting the alarm prompt information, log data, and alarm information into the first large model to obtain the data analysis results corresponding to the alarm information.

[0268] In some implementations, the first intelligent agent can receive alarm information sent by the fault handling module; the first intelligent agent obtains corresponding log data based on the alarm information; the first intelligent agent determines corresponding alarm prompt information based on the alarm information; the first intelligent agent performs data analysis processing using the first large model based on the alarm prompt information, log data, and alarm information to obtain the data analysis results corresponding to the alarm information.

[0269] The operation and maintenance system of the routing equipment includes a fault handling module. This fault handling module can receive proactive anomaly alarms from the operation and maintenance system of the routing equipment in real time. After receiving a proactive anomaly alarm, the fault handling module can transmit the alarm information of the proactive anomaly alarm to the first intelligent agent.

[0270] After receiving the alarm information, the first intelligent agent can perform preliminary data analysis based on the alarm information and determine whether it is necessary to obtain reference information such as logs, operation commands, and business configuration data based on the preliminary analysis results.

[0271] The reference information here refers to the auxiliary information that may need to be input into the first large model when the first intelligent agent performs data analysis on proactive anomaly alarms based on the first large model.

[0272] If the first agent determines that logs need to be obtained based on the preliminary analysis results, the first agent can obtain the corresponding log data based on the alarm information.

[0273] Similarly, if the first intelligent agent determines that an operation command or business configuration data is required based on the preliminary analysis results, it can obtain the corresponding operation command or business configuration data based on the alarm information.

[0274] In the process of obtaining reference information, the operation and maintenance system of the routing device can be called, or the corresponding data can be obtained by notifying the fault handling module through the diagnostic channel.

[0275] Furthermore, taking log data as an example, after acquiring the log data, the first agent can assign a preprocessing task to another agent. Taking the second agent as an example, the first agent can send the acquired log data to the second agent, which will then preprocess the log data and transmit the preprocessed log data back to the first agent. The operations performed during preprocessing include, but are not limited to, cleaning, transformation, segmentation, and storage.

[0276] The first intelligent agent determines the corresponding alarm prompt information based on the alarm information. This can be done by extracting the target alarm category from the alarm information and querying the corresponding alarm prompt information in the pre-configured correspondence between alarm categories and prompt information based on the target alarm category.
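The category-based lookup can be sketched as follows. This is a hedged illustration: the table `ALARM_PROMPTS`, the `category` field name, the prompt texts, and the choice to return `None` for an unconfigured category are all assumptions that the text does not specify.

```python
from typing import Dict, Optional

# Hypothetical pre-configured correspondence between alarm categories
# and prompt information.
ALARM_PROMPTS: Dict[str, str] = {
    "bgp": "Analyse the BGP alarm: check peer state, hold timers and recent configuration changes.",
    "optical": "Analyse the optical alarm: check receive power, module type and port counters.",
}


def alarm_prompt_for(
    alarm: Dict[str, str], table: Dict[str, str] = ALARM_PROMPTS
) -> Optional[str]:
    """Extract the target alarm category and query the correspondence table."""
    category = alarm.get("category")
    if category is None:
        return None  # the alarm information carries no category
    return table.get(category)  # None when the category is not configured


bgp_prompt = alarm_prompt_for({"id": "A-17", "category": "bgp"})
unknown_prompt = alarm_prompt_for({"id": "A-18", "category": "fan"})
```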

[0277] The process of using the first large model to perform data analysis processing based on the alarm prompt information, log data, and alarm information, to obtain the data analysis results corresponding to the alarm information, follows a technical concept similar to that of step S108; see the corresponding explanation of step S108. The log data and alarm information serve a similar function to the state data group in step S108, and the alarm prompt information serves a similar function to the target prompt information in step S108.

[0278] The data analysis results can include the fault category corresponding to the alarm information and the fault handling method.

[0279] The first intelligent agent can transmit the data analysis results to the fault handling module, so that the fault handling module can perform operation and maintenance on the routing device according to the fault category and fault handling method. See the corresponding description of step S110.

[0280] Next, the data processing flow in the operation and maintenance system of the routing device in this implementation can be illustrated with reference to Figure 6. Figure 6 is a data flow diagram of an operation and maintenance system for a routing device provided in an embodiment of this application.

[0281] Referring to Figure 6, a portion of the data processing flow of the operation and maintenance system of the routing device is shown in steps (b1)-(b8) below.

[0282] (b1) When the routing device malfunctions, the fault handling module 606 transmits the alarm information triggered by the malfunction to the first intelligent agent 604.

[0283] (b2) After receiving the alarm information, the first intelligent agent 604 can perform preliminary data analysis based on the alarm information and determine whether it is necessary to obtain reference information such as logs, operation commands, and business configuration data based on the preliminary analysis results.

[0284] (b3) If the first agent 604 determines that it is necessary to obtain logs based on the preliminary analysis results, the first agent 604 may obtain log data based on the alarm information.

[0285] Additionally, during the log data acquisition process, the first agent 604 can notify the fault handling module 606 to acquire log data via a diagnostic channel. In one example, the first agent 604 sends a notification to the fault handling module 606, indicating which log data to acquire. The fault handling module 606 responds to the notification, acquires the log data, and then transmits the acquired log data to the first agent 604.

[0286] (b4) The first agent 604 can send the acquired log data to the second agent 602.

[0287] (b5) The second agent 602 preprocesses the log data and transmits the preprocessed log data to the first agent 604.

[0288] (b6) The first intelligent agent 604 determines the corresponding alarm prompt information based on the alarm information.

[0289] (b7) The first intelligent agent 604 inputs the alarm prompt information, preprocessed log data and alarm information into the first large model for data analysis and processing, and obtains the data analysis results corresponding to the alarm information. The data analysis results corresponding to the alarm information include the fault category and fault handling method corresponding to the alarm information.

[0290] (b8) The first intelligent agent 604 transmits the data analysis results to the fault handling module 606 so that the fault handling module 606 can perform operation and maintenance processing on the routing device according to the fault category and fault handling method.
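Steps (b1) to (b8) can be sketched as a single orchestration function on the side of the first intelligent agent. This is a minimal sketch under stated assumptions: `handle_alarm`, its callable parameters, the `id` and `category` fields, and `fake_first_model` are hypothetical placeholders for the fault handling module, the second agent, and the first large model, none of whose interfaces are defined in the text.

```python
from typing import Callable, Dict, List

Alarm = Dict[str, str]
Analysis = Dict[str, str]


def handle_alarm(
    alarm: Alarm,                                    # (b1) alarm from the fault handling module
    needs_logs: Callable[[Alarm], bool],             # (b2) preliminary analysis
    fetch_logs: Callable[[str], List[str]],          # (b3) e.g. via the diagnostic channel
    preprocess: Callable[[List[str]], List[str]],    # (b4)-(b5) delegated to the second agent
    prompt_for: Callable[[Alarm], str],              # (b6) alarm prompt information
    first_model: Callable[[str, List[str], Alarm], Analysis],  # (b7)
) -> Analysis:
    """Return the data analysis result that is sent back in step (b8)."""
    logs: List[str] = []
    if needs_logs(alarm):
        logs = preprocess(fetch_logs(alarm["id"]))
    prompt = prompt_for(alarm)
    return first_model(prompt, logs, alarm)


def fake_first_model(prompt: str, logs: List[str], alarm: Alarm) -> Analysis:
    """Stub for the first large model; returns a fixed handling method."""
    return {
        "fault_category": alarm["category"],
        "fault_handling_method": "reset the BGP peer session",
        "log_lines_used": str(len(logs)),
    }


alarm = {"id": "A-17", "category": "bgp", "text": "BGP peer down"}
result = handle_alarm(
    alarm,
    needs_logs=lambda a: True,
    fetch_logs=lambda alarm_id: ["  BGP hold timer expired  ", ""],
    preprocess=lambda logs: [line.strip() for line in logs if line.strip()],
    prompt_for=lambda a: f"Analyse a {a['category']} alarm.",
    first_model=fake_first_model,
)
```

Injecting each collaborator as a callable mirrors the division of labour in Figure 6, where the first intelligent agent coordinates the other components rather than doing all processing itself.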

[0291] In some implementations, users can communicate with the fault handling module in the operation and maintenance system of the routing device through the network management system.

[0292] A network management system is a software or hardware system used to monitor, manage, and maintain computer networks. It helps network administrators monitor network devices in real time, diagnose problems, optimize performance, ensure network security, and improve network reliability and efficiency.

[0293] The user communicates with the fault handling module through the network management system, which means that the network management system can transmit the user's input instructions to the fault handling module, and the instructions can be recognized and processed by the fault handling module.

[0294] When a user inputs an operation and maintenance instruction for a specific fault, the fault handling module can transmit the operation and maintenance instruction to the first intelligent agent.

[0295] The faults targeted by this maintenance instruction can be faults that the maintenance system of the routing device itself cannot detect. Considering that the data analysis process of the maintenance system of the routing device often relies on one or more large models in the target domain, and that the large model learns maintenance knowledge and capabilities during the model training process, if some faults occur with extremely low frequency in the historical maintenance data used to train the large model, then the large model may perform poorly in handling these faults. For example, it may not be able to detect these faults, or it may mistake these faults for other common faults. In this case, manual intervention can also be performed on the maintenance system of the routing device.

[0296] For example, if the operation and maintenance system does not detect a failure in the first component of the routing device, but the user enters an operation and maintenance command for a specific failure of the first component, the operation and maintenance system will respond to the command and execute the corresponding operation and maintenance operation.

[0297] After performing preliminary data analysis on the operation and maintenance instruction, the first intelligent agent can determine, based on the analysis results, whether it needs to obtain reference information such as logs, operation commands, and business configuration data. If the first intelligent agent determines that reference information is needed, it can transmit a notification message to the network management system through the fault handling module. This notification message tells the user what data is required to execute the operation and maintenance instruction. After receiving the notification message, the user can manually input some reference information, which is then transmitted to the first intelligent agent.

[0298] After the first agent obtains the reference information, it can transmit the reference information to the second agent, so that the second agent can preprocess the reference information and then transmit the preprocessed reference information back to the first agent. The preprocessing process can employ one or more of the following operations: cleaning, transformation, segmentation, storage, etc.

[0299] After receiving the preprocessed reference information, the first intelligent agent can determine the operation and maintenance prompt information corresponding to the operation and maintenance instruction. It then inputs the fault information, reference information, and operation and maintenance prompt information carried in the operation and maintenance instruction into the first large model for data analysis and processing, obtains the data analysis results corresponding to the operation and maintenance instruction, and transmits them to the fault handling module.

[0300] The data analysis results may include the fault category and fault handling method corresponding to the operation and maintenance instruction. For the process in which the fault handling module performs operation and maintenance processing, see the corresponding description of step S110.

[0301] Next, the data processing flow in the operation and maintenance system of the routing device described above can be illustrated with reference to Figure 7. Figure 7 is another data flow diagram of an operation and maintenance system for a routing device provided in an embodiment of this application.

[0302] Referring to Figure 7, a portion of the data processing flow of the operation and maintenance system of the routing device is shown in steps (c1)-(c9) below.

[0303] (c1) When a user inputs an operation and maintenance instruction for a certain fault, the network management system 708 transmits the operation and maintenance instruction to the fault handling module 706.

[0304] (c2) The fault handling module 706 transmits the operation and maintenance instruction to the first intelligent agent 704.

[0305] When the operation and maintenance instruction reaches the fault handling module 706, the situation can be regarded as follows: although the fault handling module 706 did not discover any fault on its own, it learned that the routing device was abnormal after being prompted by a human. The fault handling module 706 transmits the operation and maintenance instruction to the first intelligent agent 704, which is similar in concept to step (b1) in the explanation of Figure 6 above.

[0306] (c3) After receiving the operation and maintenance instruction, the first intelligent agent 704 can perform preliminary data analysis and processing based on the operation and maintenance instruction, and determine whether it is necessary to obtain reference information such as logs, operation commands, and business configuration data based on the preliminary analysis results.

[0307] (c4) If the first intelligent agent 704 determines, based on the preliminary analysis results, that it is necessary to obtain reference information, the first intelligent agent 704 may obtain the reference information corresponding to the operation and maintenance instruction according to the operation and maintenance instruction.

[0308] Additionally, during the process of acquiring reference information, the first intelligent agent 704 can notify the fault handling module 706 to acquire the reference information via a diagnostic channel. In one example, the first intelligent agent 704 sends a notification to the fault handling module 706, indicating which reference information to acquire. The fault handling module 706 responds to the notification, acquires the reference information, and then transmits the acquired reference information to the first intelligent agent 704.

[0309] During the process of the fault handling module 706 responding to the notification and obtaining reference information, the fault handling module 706 can transmit the notification to the network management system 708 to prompt the user what reference information needs to be obtained before executing the operation and maintenance instructions. After receiving the reference information input by the user, the network management system 708 transmits the reference information to the fault handling module 706.

[0310] (c5) The first intelligent agent 704 can send the acquired reference information to the second intelligent agent 702.

[0311] (c6) The second intelligent agent 702 preprocesses the reference information and transmits the preprocessed reference information to the first intelligent agent 704.

[0312] (c7) The first intelligent agent 704 determines the corresponding operation and maintenance prompt information according to the operation and maintenance instructions.

[0313] (c8) The first intelligent agent 704 inputs the operation and maintenance prompt information, the pre-processed reference information and the fault information in the operation and maintenance instructions into the first large model for data analysis and processing, and obtains the data analysis results corresponding to the operation and maintenance instructions. The data analysis results corresponding to the operation and maintenance instructions include the fault category and fault handling method corresponding to the operation and maintenance instructions.

[0314] (c9) The first intelligent agent 704 transmits the data analysis results to the fault handling module 706 so that the fault handling module 706 can perform operation and maintenance processing on the routing device according to the fault category and fault handling method.

[0315] In this embodiment, combining the state data has two benefits. On the one hand, multiple pieces of state data that match a data analysis strategy can be aggregated before data analysis processing, which helps to improve data analysis efficiency. On the other hand, a piece of state data may belong to multiple state data groups simultaneously, so it can be reused in data analysis processes that apply different data analysis strategies. This helps to diversify the state data in each state data group and to improve the accuracy of data analysis in subsequent steps. In addition, during data processing the first large model does not simply query historical data; it uses the data processing capabilities learned during training to process the data it receives. On this basis, the first large model applies its trained data analysis capabilities to the target prompt information and the state data in the state data group, and the data analysis results are used for operation and maintenance. The data analysis results used for operation and maintenance can therefore be generated efficiently, and their accuracy can reach a high level through model training performed in advance. The entire operation and maintenance process does not require manual intervention, which reduces the dependence on manual operation and maintenance and improves the operation and maintenance efficiency of routing devices.

[0316] Figure 8 is a schematic diagram of the operation and maintenance system of a routing device provided in an embodiment of this application.

[0317] As shown in Figure 8, the hardware structure of the operation and maintenance system of the routing device includes AI computing board 804 and intelligent agent 806.

[0318] It is important to emphasize that intelligent agents can be either hardware or software, depending on their application scenario and implementation method. Software intelligent agents refer to programs or algorithms running in a computer system that can autonomously perform tasks, perceive their environment, make decisions, and interact with other intelligent agents or systems. Hardware intelligent agents refer to physical devices with autonomous behavior capabilities, typically embedding software intelligent agents or working in collaboration with them.

[0319] As shown in Figure 8, agent 806 refers to one or more hardware agents in the operation and maintenance system of the routing device. It can be embedded with the aforementioned software agents, such as the first agent, the second agent, the third agent, the fourth agent, etc., and agent 806 can work collaboratively with the software agents.

[0320] As shown in Figure 8, the AI computing board 804 can be used to support the aforementioned data acquisition module in acquiring the status data of the routing device 802, and the acquired status data is generated in real time. For example, during the operation of the routing device, the status data is generated at the first time point, and the operation and maintenance system of the routing device supports the aforementioned data acquisition module in acquiring the status data at the second time point through the AI computing board 804. The time length between the second time point and the first time point is less than a preset time threshold, so the acquired status data can be approximated as real-time data.
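The real-time condition above amounts to a simple comparison, sketched here in Python. The function name `is_approximately_real_time` and the use of seconds as the unit are assumptions made for illustration only.

```python
def is_approximately_real_time(generated_at, acquired_at, threshold_seconds):
    """Return True when the acquisition lag is non-negative and below the threshold.

    generated_at is the first time point and acquired_at the second time point,
    both expressed in seconds (an assumed unit).
    """
    lag = acquired_at - generated_at
    return 0 <= lag < threshold_seconds
```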

[0321] In addition, the AI computing board 804 can also be used to provide computing support for the intelligent agent 806, support the deployment of machine learning and deep learning models, and ensure rapid response for real-time data analysis and fault prediction, as described in the section corresponding to Figure 2 above.

[0322] The Telecom Big Model 808 is an intelligent decision support source for the operation and maintenance system of routing equipment. It is a large model trained on sample data from the target domain, providing high-level decision support for the system. This model integrates industry knowledge and experience from the target domain, enabling it to analyze fault causes, optimize handling solutions, and generate predictive results. Through continuous learning and updating, the Telecom Big Model 808 adapts to changes in the network environment and, combined with the current network operating status, provides policy suggestions to ensure that the agent 806's decisions are more comprehensive and efficient. Smaller models derived from this large model are periodically injected into agent 806 to ensure the effectiveness and accuracy of the decisions.

[0323] For the Telecom Big Model 808, please refer to the corresponding explanations of the first, second, third, and fourth big models mentioned above.

[0324] The network management system 810 is used to collect all data from the operation and maintenance system of the routing device according to a preset time period. The collected data is stored in a backend database, providing rich private domain knowledge for the training of the Telecom Big Model 808. By integrating the network management system 810 into the routing device's operation and maintenance system, the knowledge base can be continuously updated, providing the agent 806 with the latest fault information, thereby improving the response capability of the routing device's operation and maintenance system.

[0325] Figure 9 is a schematic diagram of the interaction between intelligent agents in the operation and maintenance system of a routing device provided in an embodiment of this application.

[0326] As shown in Figure 9, each of the first agent 902, the second agent 904, the third agent 906, and the fourth agent 908 includes a corresponding large model, which belongs to the target domain.

[0327] The fault data extracted by the operation and maintenance system of the routing device can first reach the second intelligent agent 904. The second intelligent agent 904 preprocesses the fault data and transmits the preprocessed fault data to the first intelligent agent 902. The fault data extracted by the operation and maintenance system of the routing device refers to the aforementioned status data. The implementation process of the first intelligent agent 902 in processing the fault data can be referred to the corresponding descriptions of steps S102-S110 above.

[0328] The second intelligent agent 904 can receive fault information sent by the first intelligent agent 902; the second intelligent agent 904 performs intent recognition on the fault information to obtain intent recognition results; the second intelligent agent 904 uses the intent recognition results as intent prompt information, inputs the intent prompt information and fault information into the second big model for text generation processing, obtains the target fault text and transmits it to the retrieval module, so that the retrieval module can perform retrieval processing in the first knowledge graph based on the target fault text, obtain knowledge retrieval results and transmit them to the second intelligent agent; the second intelligent agent 904 determines the target knowledge that matches the intent recognition results in the knowledge retrieval results and transmits it to the first intelligent agent 902.

[0329] The third intelligent agent 906 can receive log information and target knowledge sent by the first intelligent agent 902; the third intelligent agent 906 divides the log information to obtain multiple log blocks; the third intelligent agent 906 constructs a data control group based on the log blocks and target knowledge and transmits it to the first intelligent agent 902; each data control group includes a log block and a target knowledge.
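The work of the third intelligent agent can be sketched as follows. The fixed-size splitting rule and the choice to pair every log block with every item of target knowledge are assumptions made for this sketch; the application only states that each data control group contains one log block and one piece of target knowledge. The names `split_log` and `build_control_groups` are hypothetical.

```python
from itertools import product


def split_log(lines, block_size):
    """Divide log lines into consecutive blocks of at most block_size lines."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def build_control_groups(log_blocks, target_knowledge):
    """Pair every log block with every knowledge item (assumed pairing rule)."""
    return [
        {"log_block": block, "knowledge": knowledge}
        for block, knowledge in product(log_blocks, target_knowledge)
    ]
```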

[0330] The fourth intelligent agent 908 can receive the data control group sent by the first intelligent agent 902; the fourth intelligent agent 908 determines the log prompt information corresponding to the data control group, and judges whether the log block in the data control group matches the target knowledge in the data control group under the prompt of the log prompt information; if the judgment result indicates that the log block in the data control group matches the target knowledge in the data control group, the fourth intelligent agent 908 transmits the judgment result to the first intelligent agent 902.

[0331] The first intelligent agent 902 can receive alarm information sent by the fault handling module 910; the first intelligent agent 902 obtains the corresponding log data according to the alarm information; the first intelligent agent 902 determines the corresponding alarm prompt information according to the alarm information; the first intelligent agent 902 inputs the alarm prompt information, log data and alarm information into the first big model for data analysis and processing, and obtains the data analysis results corresponding to the alarm information.

[0332] Figure 10 is a schematic flowchart of a routing device operation and maintenance method provided in an embodiment of this application.

[0333] In step S1002, the data acquisition module collects status data and transmits it to the second intelligent agent.

[0334] In step S1004, the second agent preprocesses the state data and transmits it to the first agent.

[0335] In step S1006, the first intelligent agent performs data analysis on the preprocessed status data.

[0336] In step S1008, the fault handling module performs operation and maintenance processing based on the data analysis results.
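The four steps of Figure 10 form a linear pipeline, which can be sketched as below. The function `run_maintenance_cycle` and its four callable parameters are hypothetical stand-ins for the data acquisition module, the second intelligent agent, the first intelligent agent, and the fault handling module; none of them is an interface defined in this application.

```python
def run_maintenance_cycle(collect, preprocess, analyze, handle):
    """Run one pass of the Figure 10 flow using caller-supplied stand-ins."""
    status_data = collect()                 # S1002: data acquisition module
    prepared = preprocess(status_data)      # S1004: second intelligent agent
    analysis_result = analyze(prepared)     # S1006: first intelligent agent
    return handle(analysis_result)          # S1008: fault handling module
```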

[0337] Figure 11 is another schematic flowchart of a routing device operation and maintenance method provided in an embodiment of this application.

[0338] In step S1102, the agent senses an anomaly, triggered by a communication failure log that occurs every 1 second.

[0339] Step S1104: After pre-analysis, notify fault management to retrieve logs.

[0340] Step S1106: Retrieve communication anomaly knowledge from the second intelligent agent.

[0341] Step S1108: Obtain the log.

[0342] Step S1110: Notify the third intelligent agent to process the log.

[0343] Step S1112: The fourth agent performs a control group discrimination.

[0344] Step S1114: Send the judgment result to the first intelligent agent.

[0345] Step S1116: The first agent analyzes and draws a conclusion.

[0346] Step S1118: Fault management executes the decision.

[0347] When the temperature of a single board on a routing device fails to be acquired, the communication error log is recorded in real time. However, for low-speed bus detection information such as chassis temperature and fan speed, the detection system of the routing device may have a time cycle of about 5 minutes. After 5 minutes of failed temperature acquisition, the routing device will increase the fan speed to maximum to protect the board, resulting in a sudden increase in noise, which is very noticeable to the user. This will trigger an alarm, requiring human intervention. Maintenance personnel will perform initial manual troubleshooting based on the alarm information; complex problems may require contacting development, significantly prolonging the troubleshooting time. If the cause is found to be a communication error in the IPMC (Intelligent Platform Management Controller), to quickly restore the environment and reduce the impact of the failure, maintenance personnel will generally choose the most direct method of resetting the single board. The system will recover after the single board restarts, but the recovery time is more than 2 hours, and the entire process is highly unpleasant for the customer.

[0348] Using the operation and maintenance method for the routing device provided in this application embodiment, steps S1102-S1118 are executed. Given the powerful processing capabilities of the AI computing board, system logs are read and processed in real time. Similarly, if the temperature acquisition of a single board in the system fails, the first intelligent agent in the operation and maintenance system of the routing device will detect the anomaly within 20 seconds. It will then notify the fault management module to further retrieve logs. After processing by the third and fourth intelligent agents, the logs are sent back to the first intelligent agent. Simultaneously, the first intelligent agent will retrieve graph knowledge related to IPMC communication anomalies from the knowledge base. After in-depth analysis, the first intelligent agent concludes that the IPMC bus is deadlocked, and this can only be resolved by resetting the IPMC. Furthermore, the first intelligent agent determines that the IPMC, as a monitoring board, does not affect services and can be directly reset and restored, with a restart time within 7 seconds. Ultimately, the first intelligent agent decides to restart the IPMC. After the fault management module executes the decision, the temperature returns to normal, and the system recovers. If recovery fails, an alarm will be triggered again, awaiting human intervention. The entire process takes less than 30 seconds, the fan speed will not be adjusted, and no business alarm will be triggered, truly achieving a seamless experience for the user.

[0349] Figure 12 is another schematic flowchart of a routing device operation and maintenance method provided in an embodiment of this application.

[0350] Step S1202: After pre-analysis, notify fault management to retrieve logs.

[0351] Step S1204: Retrieve SSD (Solid State Drive) knowledge from the second intelligent agent.

[0352] Step S1206: Obtain the log.

[0353] Step S1208: Notify the third intelligent agent to process the log.

[0354] Step S1210: The fourth agent performs a control group discrimination.

[0355] Step S1212: Send the judgment result to the first intelligent agent.

[0356] Step S1214: The first agent analyzes and draws a conclusion.

[0357] Step S1216: Fault management executes the decision.

[0358] Router devices frequently perform read and write operations on the SSD during operation, making file system errors a common occurrence. When a file read / write error occurs, the fault management module triggers an anomaly alarm in the network management system. Maintenance personnel then perform initial manual troubleshooting based on the alarm information; complex issues may require contacting development teams, significantly extending the troubleshooting time. Once the cause is identified as a file system error, hard drive repair is necessary. After manual repair, the system recovers, a process that takes over 3 hours in total. During this period, users will experience significant disruptions, and in severe cases, a large number of configuration files may be lost.

[0359] Using the operation and maintenance method for routing devices provided in this application embodiment, and executing steps S1202-S1216 above, when a file read / write failure occurs in the routing device, the fault management module will not immediately trigger an anomaly alarm in the network management system, but will first send it to the first intelligent agent. The first intelligent agent performs pre-analysis and then notifies the fault management module to retrieve logs, including SSD-related logs. After being processed by the third and fourth intelligent agents, the logs are sent back to the first intelligent agent, which retrieves graph knowledge related to file system anomalies from the knowledge base. After in-depth analysis, the first intelligent agent concludes that the file system is abnormal and needs to be repaired through fsck (File System Check). In order not to affect the normal operation of the current routing device, the first intelligent agent needs to consider a more comprehensive recovery plan. The final decision is to create a backup path for read and write operations in the routing device, temporarily store the read and write information in memory, then perform SSD repair operations, and after repair, switch back to the read and write path, while writing back the data generated during the process to the SSD. After the fault management module executes the decision, the file system read and write operations are normal, and the system recovers. If the recovery fails, an alarm will be triggered again to await human intervention. The entire process takes less than 60 seconds, does not affect system file reading and writing, and will not trigger any business alarms, truly achieving a seamless user experience.
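The backup read/write path described above can be illustrated with a small Python sketch. The class `RedirectingStore` is hypothetical, and plain dictionaries stand in for the SSD and for memory. As a simplification, reads of paths that were not written during the repair still fall back to the dictionary representing the SSD, which a real system might not be able to do while fsck is running.

```python
class RedirectingStore:
    """Toy model of switching writes to memory while the SSD is being repaired."""

    def __init__(self):
        self.ssd = {}           # stands in for files on the SSD
        self.buffer = {}        # in-memory backup path
        self.repairing = False

    def write(self, path, data):
        # During repair, writes go to memory instead of the SSD.
        target = self.buffer if self.repairing else self.ssd
        target[path] = data

    def read(self, path):
        # Data written during repair takes precedence over the SSD copy.
        if path in self.buffer:
            return self.buffer[path]
        return self.ssd[path]

    def begin_repair(self):
        self.repairing = True

    def end_repair(self):
        # Switch back to the SSD path and write back what was buffered.
        self.ssd.update(self.buffer)
        self.buffer.clear()
        self.repairing = False
```

A caller would invoke `begin_repair()` before starting the file system repair and `end_repair()` once it completes, so that data generated in between is written back to the SSD.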

[0360] Because the technical concepts are similar, the implementation methods shown in Figures 8-12 are described in a relatively simple way. Please refer to the corresponding explanations above.

[0361] In summary, specific embodiments of this subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require a specific or sequential order to achieve the desired result. In some embodiments, multitasking and parallel processing can be advantageous.

[0362] The above are the operation and maintenance methods for routing devices provided in the embodiments of this application. Based on the same idea, the embodiments of this application also provide an operation and maintenance system for routing devices.

[0363] Figure 13 is a schematic block diagram of an operation and maintenance system for a routing device provided in an embodiment of this application.

[0364] As shown in Figure 13, this application embodiment provides a routing device operation and maintenance system 1300, which includes a first intelligent agent 1304, a data acquisition module 1302, and a fault handling module 1306.

[0365] The data acquisition module 1302 is used to collect status data of the routing device and transmit it to the first intelligent agent 1304; the first intelligent agent 1304 is used to combine and process the status data to obtain multiple status data groups; each status data belongs to one or more status data groups; each status data group corresponds to a data category; the first intelligent agent 1304 is also used to derive the target prompt information corresponding to each data status group based on the pre-configured correspondence between prompt information and data category; each prompt information describes a data analysis strategy; the first intelligent agent 1304 is also used to perform data analysis processing using a first large model based on the target prompt information and the status data in the status data group, obtain data analysis results and transmit them to the fault handling module 1306; the fault handling module 1306 is used to perform operation and maintenance processing on the routing device according to the data analysis results.

[0366] In the aforementioned operation and maintenance system 1300, the data acquisition module 1302 collects the status data of the routing device and transmits it to the first intelligent agent 1304. For details, refer to the corresponding description of step S102 in the aforementioned method embodiment.

[0367] In one embodiment, the operation and maintenance system 1300 further includes a second intelligent agent and a retrieval module; the second intelligent agent acquires fault information; the second intelligent agent performs intent recognition on the fault information to obtain intent recognition results; based on the intent recognition results and the fault information, the second intelligent agent uses a second large model to perform text generation processing to obtain target fault text and transmits it to the retrieval module, so that the retrieval module performs retrieval processing on the first knowledge graph according to the target fault text to obtain knowledge retrieval results and transmits them to the second intelligent agent; the second intelligent agent determines the target knowledge that matches the intent recognition results in the knowledge retrieval results.

[0368] The fault information obtained by the second intelligent agent mentioned above can be issued by the first intelligent agent 1304.

[0369] In addition, after the second agent determines the target knowledge that matches the intent recognition result in the knowledge retrieval results, it can transmit the target knowledge to the first agent 1304.

[0370] In one embodiment, when the retrieval module performs retrieval processing on the first knowledge graph based on the target fault text to obtain knowledge retrieval results, it performs the following steps: performs full-text retrieval processing on the first knowledge graph based on the target fault text to obtain a first retrieval result; performs vector retrieval processing on the first knowledge graph based on the target fault text to obtain a second retrieval result; merges the first retrieval result and the second retrieval result to obtain a merged retrieval result; the merged retrieval result includes multiple candidate fault handling methods; and sorts each candidate fault handling method in the merged retrieval result based on a third large model, and determines the sorted merged retrieval result as the knowledge retrieval result.
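The merge-and-sort portion of this retrieval flow can be sketched in Python as follows. The full-text and vector retrieval back ends are assumed to exist already and to return lists of candidate fault handling methods; `score_fn` is a hypothetical stand-in for the third large model, and removing duplicates during the merge is an assumption of this sketch rather than a requirement stated in the application.

```python
def merge_results(full_text_hits, vector_hits):
    """Union the two hit lists, keeping the first occurrence of each candidate."""
    merged, seen = [], set()
    for hit in full_text_hits + vector_hits:
        if hit not in seen:
            seen.add(hit)
            merged.append(hit)
    return merged


def rerank(candidates, score_fn):
    """Sort candidates by descending score; score_fn stands in for the model."""
    return sorted(candidates, key=score_fn, reverse=True)
```

The sorted list returned by `rerank` corresponds to the knowledge retrieval result in this sketch.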

[0371] In one embodiment, the operation and maintenance system 1300 further includes a third intelligent agent; wherein: the third intelligent agent acquires log information; the third intelligent agent divides the log information to obtain multiple log blocks; the third intelligent agent constructs a data control group based on the log blocks and target knowledge; each data control group includes a log block and a target knowledge.

[0372] The log information acquired by the third intelligent agent can be issued by the first intelligent agent 1304. Furthermore, the target knowledge used by the third intelligent agent in constructing a data control group based on log blocks and target knowledge can be obtained from the first intelligent agent 1304, and the target knowledge stored by the first intelligent agent 1304 can be obtained from the second intelligent agent.

[0373] In one embodiment, the operation and maintenance system further includes a fourth intelligent agent; the fourth intelligent agent determines the log prompt information corresponding to the data control group, and judges whether the log blocks in the data control group match the target knowledge in the data control group based on the prompt information; the fourth intelligent agent retains the data control group according to the judgment result; wherein, the judgment result indicates that the log blocks in the data control group match the target knowledge in the data control group.
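The retention step performed by the fourth intelligent agent can be sketched as a filter. In the sketch below, `judge` is a hypothetical callable standing in for the model-based judgment made under the log prompt information, and `keyword_judge` is a toy substitute that ignores the prompt and matches on a keyword; neither is defined in this application.

```python
def retain_matching_groups(control_groups, log_prompt, judge):
    """Keep only the control groups that judge() reports as a match."""
    return [
        group
        for group in control_groups
        if judge(log_prompt, group["log_block"], group["knowledge"])
    ]


def keyword_judge(log_prompt, log_block, knowledge):
    """Toy judge: match when the knowledge keyword appears in any log line.

    The log_prompt argument is accepted for interface parity but is unused here.
    """
    return any(knowledge.lower() in line.lower() for line in log_block)
```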

[0374] In addition, the fourth agent can transmit the retained data control group to the first agent 1304.

[0375] In one embodiment, when the first intelligent agent 1304 performs data analysis processing based on the target prompt information and the state data in the state data group using a first large model to obtain the data analysis result, it performs the following steps: determining the data analysis strategy corresponding to the target prompt information; performing data analysis processing on each state data in the data state group according to the data analysis strategy to obtain a first analysis result; the first analysis result includes the unprocessed faults of the routing device and the fault categories of the unprocessed faults; selecting a processing method from the candidate fault processing methods of the unprocessed faults based on the first large model to obtain the target fault processing method, and adding the fault category and the target fault processing method to the data analysis result.

[0376] In one embodiment, the first intelligent agent 1304 is further configured to perform the following steps: obtain corresponding log data based on alarm information; determine corresponding alarm prompt information based on alarm information; and perform data analysis processing using the first big model based on alarm prompt information, log data, and alarm information to obtain data analysis results corresponding to alarm information.

[0377] The alarm information can be transmitted from the fault handling module 1306 to the first intelligent agent 1304.

[0378] The first intelligent agent 1304 can also transmit the data analysis results corresponding to the alarm information to the fault handling module 1306, so that the fault handling module 1306 can perform operation and maintenance processing on the routing device according to the data analysis results corresponding to the alarm information.

[0379] In summary, the operation and maintenance system for routing devices provided in this application combines and processes state data. On the one hand, it can aggregate multiple state data matching the data analysis strategy before performing data analysis processing, which is beneficial to improving data analysis efficiency. On the other hand, a single state data may belong to multiple state data groups simultaneously, and this single state data can be reused in data analysis processes using different data analysis strategies. This helps to diversify the state data in each state data group and improve the accuracy of data analysis in subsequent steps. In addition, the first large model does not simply perform data queries in historical data during data processing. Instead, it uses the data processing capabilities learned during training to process the data received by the first large model. Based on this, the first large model uses the trained data analysis capabilities to perform data analysis based on target prompt information and state data in the state data group, and uses the data analysis results for operation and maintenance. Therefore, the generation efficiency of the data analysis results used for operation and maintenance is extremely high, and the accuracy of the data analysis results can reach a high level through pre-executed model training. The entire operation and maintenance process does not require manual intervention, which can reduce the dependence on manual operation and maintenance and improve the operation and maintenance efficiency of routing devices.

[0380] In addition, embodiments of this application can also provide another operation and maintenance system for routing devices. This system includes a first intelligent agent and a fault handling module, wherein: the first intelligent agent is used to acquire status data of the routing device; the first intelligent agent is also used to combine and process the status data to obtain multiple status data groups; each status data belongs to one or more status data groups; each status data group corresponds to a data category; the first intelligent agent is also used to derive target prompt information corresponding to each data status group based on a pre-configured correspondence between prompt information and data categories; each prompt information describes a data analysis strategy; the first intelligent agent is also used to perform data analysis processing using a first large model based on the target prompt information and the status data in the status data groups, obtain data analysis results, and transmit them to the fault handling module; the fault handling module is used to perform operation and maintenance processing on the routing device according to the data analysis results.

[0381] In the operation and maintenance system of the other routing device mentioned above, the first intelligent agent is used to obtain the status data of the routing device. This can be understood as: the first intelligent agent is used to receive the status data of the routing device. This status data can be obtained through a data acquisition module. This data acquisition module can belong to the operation and maintenance system of the other routing device, or it can belong to a device or system outside the operation and maintenance system of the other routing device.

[0382] In the other operation and maintenance system described above, the first intelligent agent obtains the status data of the routing device; for details, refer to the corresponding description of step S102 in the aforementioned method embodiment.

[0383] Because of the similar technical concepts, the implementation of the operation and maintenance system for this other routing device is described in a relatively simple way. Please refer to the corresponding description of the operation and maintenance system 1300 mentioned above.

[0384] Those skilled in the art will understand that the operation and maintenance system of the routing device in Figure 13 and the other operation and maintenance system of the routing device described above can be used to implement the operation and maintenance method of the routing device described above. The detailed descriptions therein should be similar to those in the method section above. To avoid being cumbersome, they will not be repeated here.

[0385] Based on the same idea, this application also provides an electronic device, as shown in Figure 14. The electronic device can vary considerably due to differences in configuration or performance, and may include one or more processors 1401 and memory 1402. The memory 1402 may store one or more application programs or data. The memory 1402 may be temporary or persistent storage. The application programs stored in the memory 1402 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the electronic device. Furthermore, the processor 1401 may be configured to communicate with the memory 1402 and execute the series of computer-executable instructions stored in the memory 1402 on the electronic device. The electronic device may also include one or more power supplies 1403, one or more wired or wireless network interfaces 1404, one or more input / output interfaces 1405, and one or more keyboards 1406.

[0386] In this embodiment, the electronic device includes a memory and one or more programs, wherein one or more programs are stored in the memory, and one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the electronic device, configured to be executed by one or more processors. The one or more programs include computer-executable instructions for performing the following: acquiring status data of the routing device; combining the status data to obtain multiple status data groups; each status data belongs to one or more status data groups; each status data group corresponds to a data category; based on a pre-configured correspondence between prompt information and the data category, deriving target prompt information corresponding to each data status group; each prompt information describes a data analysis strategy; based on the target prompt information and the status data in the status data group, performing data analysis processing using a first large model to obtain data analysis results; and performing operation and maintenance processing on the routing device according to the data analysis results.

[0387] In this embodiment, by combining state data, on the one hand, multiple state data matching the data analysis strategy can be aggregated before data analysis processing, which is beneficial to improving data analysis efficiency. On the other hand, a state data may belong to multiple state data groups simultaneously, and this state data can be reused in data analysis processes using different data analysis strategies. This helps to diversify the state data in each state data group and improve the accuracy of data analysis in subsequent steps. In addition, the first large model does not simply perform data queries in historical data during data processing, but uses the data processing capabilities learned during training to process the data received by the first large model. Based on this, the first large model uses the trained data analysis capabilities to perform data analysis based on target prompt information and state data in the state data group, and uses the data analysis results for operation and maintenance. Therefore, the generation efficiency of the data analysis results used for operation and maintenance is extremely high, and the accuracy of the data analysis results can reach a high level through pre-executed model training. The entire operation and maintenance process does not require manual intervention, which can reduce the dependence on manual operation and maintenance and improve the operation and maintenance efficiency of routing equipment.

[0388] This application also proposes a computer-readable storage medium storing one or more computer programs, each including instructions that, when executed by an electronic device including multiple applications, enable the electronic device to perform various processes of the above-described operation and maintenance method embodiment for the routing device, and to perform the following: acquiring status data of the routing device; combining the status data to obtain multiple status data groups; each status data belonging to one or more status data groups; each status data group corresponding to a data category; deriving target prompt information corresponding to each data status group based on a pre-configured correspondence between prompt information and the data category; each prompt information describing a data analysis strategy; performing data analysis processing using a first large model based on the target prompt information and the status data in the status data groups to obtain data analysis results; and performing operation and maintenance processing on the routing device according to the data analysis results.

[0389] In this embodiment, combining the state data has two benefits. First, multiple pieces of state data that match the same data analysis strategy are aggregated before data analysis processing, which improves data analysis efficiency. Second, a piece of state data may belong to multiple state data groups at the same time, so it can be reused in data analysis processes that apply different data analysis strategies. This diversifies the state data in each state data group and improves the accuracy of the data analysis in subsequent steps. In addition, the first large model does not merely query historical data during data processing; it applies the data processing capabilities learned during training to the data it receives. The first large model therefore uses its trained data analysis capabilities to analyze the state data in each state data group under the target prompt information, and the data analysis results are used for operation and maintenance. As a result, the data analysis results used for operation and maintenance can be generated with high efficiency, and their accuracy can be raised through model training performed in advance. The entire operation and maintenance process requires no manual intervention, which reduces dependence on manual operation and maintenance and improves the operation and maintenance efficiency of routing devices.
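The prompt lookup, model invocation, and operation and maintenance steps recited for the storage medium can be sketched as follows. This is only an assumed arrangement: the `PROMPTS` table, the function names, and the ticket-style action strings are hypothetical, and `stub_model` is a rule-based stand-in that ignores the prompt text, whereas the first large model in this application would be a trained model that interprets it.

```python
from typing import Callable, Dict, List

# Hypothetical pre-configured correspondence: data category -> prompt information.
PROMPTS: Dict[str, str] = {
    "performance": "Check whether any utilisation value exceeds 90 and report it.",
    "protocol": "Report any routing protocol session that is not Established.",
}


def analyze_groups(
    groups: Dict[str, List[dict]],
    model: Callable[[str, List[dict]], str],
) -> Dict[str, str]:
    """Look up the target prompt for each group and pass it, with the group's data, to the model."""
    results = {}
    for category, data in groups.items():
        prompt = PROMPTS.get(category)
        if prompt is None:
            continue  # no data analysis strategy configured for this category
        results[category] = model(prompt, data)
    return results


def stub_model(prompt: str, data: List[dict]) -> str:
    """Stand-in for the first large model (assumption); it flags numeric values above 90."""
    high = [d["metric"] for d in data
            if isinstance(d["value"], (int, float)) and d["value"] > 90]
    return f"fault: high {', '.join(high)}" if high else "ok"


def maintain(results: Dict[str, str]) -> List[str]:
    """Turn data analysis results into operation and maintenance actions (plain strings here)."""
    return [f"open ticket for {cat}: {res}" for cat, res in results.items() if res != "ok"]


groups = {
    "performance": [{"metric": "cpu_usage", "value": 93}],
    "protocol": [{"metric": "bgp_session_state", "value": "Idle"}],
}
results = analyze_groups(groups, stub_model)
actions = maintain(results)
```

Passing the model in as a callable keeps the grouping and prompt lookup independent of any particular model service, which is one possible design choice rather than a requirement of the method.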

[0390] This application provides a computer program product including a computer program which, when executed by a processor, implements the processes of the above-described operation and maintenance method for a routing device and achieves the same technical effect. To avoid repetition, the details are not described again here.

[0391] The systems, apparatuses, modules, or units described in the above embodiments can be implemented by computer chips or entities, or by products having certain functions. A typical implementation device is a computer. In one example, the computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or any combination of these devices.

[0392] For ease of description, the above devices are described as separate units divided by function. Of course, in implementing this application, the functions of the units can be implemented in one or more pieces of software and/or hardware.

[0393] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0394] This application is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flowchart illustrations and / or one or more block diagrams.

[0395] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flowcharts and / or one or more block diagrams.

[0396] These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flowcharts and / or one or more block diagrams.

[0397] In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

[0398] Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

[0399] Computer-readable media include permanent and non-permanent, removable and non-removable media that can store information using any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

[0400] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0401] This application can be described in the general context of computer-executable instructions, such as program modules, that are executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform a specific task or implement a specific abstract data type. This application can also be practiced in distributed computing environments where tasks are performed by remote processing devices connected via a communication network. In distributed computing environments, program modules can reside in local and remote computer storage media, including storage devices.

[0402] The embodiments in this application are described in a progressive manner. For parts that are the same or similar between embodiments, the embodiments can be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are substantially similar to the method embodiments, so they are described relatively briefly; for relevant parts, refer to the descriptions of the method embodiments.

[0403] The above description is merely an embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the principles of this application should be included within the scope of the claims of this application.

Claims

1. A method for operating and maintaining a routing device, comprising: obtaining state data of the routing device; combining the state data to obtain multiple state data groups, wherein each piece of state data belongs to one or more state data groups, and each state data group corresponds to one data category; obtaining, based on a pre-configured correspondence between prompt information and data categories, target prompt information corresponding to each state data group, wherein each piece of prompt information describes a data analysis strategy; performing data analysis processing using a first large model, based on the target prompt information and the state data in the state data group, to obtain a data analysis result; and performing operation and maintenance processing on the routing device based on the data analysis result.

2. The method according to claim 1, further comprising: obtaining fault information; performing intent recognition on the fault information to obtain an intent recognition result; performing text generation processing using a second large model, based on the intent recognition result and the fault information, to obtain a target fault text; performing retrieval processing on the target fault text in a first knowledge graph to obtain a knowledge retrieval result; and determining, from the knowledge retrieval result, target knowledge that matches the intent recognition result.

3. The method according to claim 2, wherein performing retrieval processing on the target fault text in the first knowledge graph to obtain the knowledge retrieval result comprises: performing, based on the target fault text, full-text retrieval processing in the first knowledge graph to obtain a first retrieval result; performing, based on the target fault text, vector retrieval processing in the first knowledge graph to obtain a second retrieval result; merging the first retrieval result and the second retrieval result to obtain a merged retrieval result, wherein the merged retrieval result includes multiple candidate fault handling methods; and sorting, based on a third large model, the candidate fault handling methods in the merged retrieval result, and determining the sorted merged retrieval result as the knowledge retrieval result.

4. The method according to claim 2, wherein, after determining, from the knowledge retrieval result, the target knowledge that matches the intent recognition result, the method further comprises: obtaining log information; dividing the log information into multiple log blocks; and constructing data control groups based on the log blocks and the target knowledge, wherein each data control group includes one log block and one piece of target knowledge.

5. The method according to claim 4, wherein, after constructing the data control groups based on the log blocks and the target knowledge, the method further comprises: determining log prompt information corresponding to a data control group, and determining, based on the log prompt information, whether the log block in the data control group matches the target knowledge in the data control group, to obtain a judgment result; and retaining the data control group based on the judgment result, wherein the judgment result indicates that the log block in the data control group matches the target knowledge in the data control group.

6. The method according to claim 1, wherein performing data analysis processing using the first large model, based on the target prompt information and the state data in the state data group, to obtain the data analysis result comprises: determining the data analysis strategy corresponding to the target prompt information; performing data analysis processing on each piece of state data in the state data group according to the data analysis strategy to obtain a first analysis result, wherein the first analysis result includes a to-be-processed fault of the routing device and a fault category of the to-be-processed fault; and selecting, based on the first large model, a handling method from candidate fault handling methods for the to-be-processed fault to obtain a target fault handling method, and adding the fault category and the target fault handling method to the data analysis result.

7. The method according to claim 1, further comprising: obtaining, based on alarm information, log data corresponding to the alarm information; determining, based on the alarm information, alarm prompt information corresponding to the alarm information; and performing data analysis processing using the first large model, based on the alarm prompt information, the log data, and the alarm information, to obtain a data analysis result corresponding to the alarm information.

8. An operation and maintenance system for a routing device, comprising a first intelligent agent, a data acquisition module, and a fault handling module, wherein: the data acquisition module is configured to collect state data of the routing device and transmit the state data to the first intelligent agent; the first intelligent agent is configured to combine the state data to obtain multiple state data groups, wherein each piece of state data belongs to one or more state data groups, and each state data group corresponds to one data category; the first intelligent agent is further configured to obtain, based on a pre-configured correspondence between prompt information and data categories, target prompt information corresponding to each state data group, wherein each piece of prompt information describes a data analysis strategy; the first intelligent agent is further configured to perform data analysis processing using a first large model, based on the target prompt information and the state data in the state data group, to obtain a data analysis result and transmit the data analysis result to the fault handling module; and the fault handling module is configured to perform operation and maintenance processing on the routing device based on the data analysis result.

9. An electronic device comprising a processor and a memory electrically connected to the processor, the memory storing a computer program, the processor being configured to call and execute the computer program from the memory to implement the operation and maintenance method of a routing device as described in any one of claims 1-7.

10. A computer-readable storage medium for storing a computer program that can be executed by a processor to implement the operation and maintenance method of a routing device as described in any one of claims 1-7.

11. A computer program product comprising a computer program executed by a processor to implement the operation and maintenance method of a routing device as described in any one of claims 1-7.
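By way of non-limiting illustration only, the retrieval steps recited in claim 3 (full-text retrieval, vector retrieval, merging, and sorting) can be sketched in Python as follows. Several simplifications are assumptions of this sketch and not part of the application: the first knowledge graph is replaced by a short list of hypothetical text entries, the vector retrieval uses bag-of-words cosine similarity instead of learned embeddings, and `stub_rerank` is a word-overlap stand-in for the third large model.

```python
import math
from collections import Counter
from typing import Callable, List

# Hypothetical candidate fault handling methods standing in for knowledge graph content.
KNOWLEDGE = [
    "restart bgp session after hold timer expired",
    "replace fan module when fan speed is zero",
    "clear arp cache when neighbor unreachable",
]


def full_text_search(query: str, docs: List[str]) -> List[str]:
    """Keep documents that share at least one whole word with the query."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.split())]


def vector_search(query: str, docs: List[str], top_k: int = 2) -> List[str]:
    """Rank documents by cosine similarity of bag-of-words vectors (a toy embedding)."""
    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.split())), reverse=True)
    return ranked[:top_k]


def hybrid_retrieve(query: str, docs: List[str],
                    rerank: Callable[[str, List[str]], List[str]]) -> List[str]:
    first = full_text_search(query, docs)          # first retrieval result
    second = vector_search(query, docs)            # second retrieval result
    merged = list(dict.fromkeys(first + second))   # order-preserving union without duplicates
    return rerank(query, merged)                   # sorted merged retrieval result


def stub_rerank(query: str, candidates: List[str]) -> List[str]:
    """Stand-in for the third large model (assumption): order by number of shared words."""
    terms = set(query.lower().split())
    return sorted(candidates, key=lambda c: len(terms & set(c.split())), reverse=True)


results = hybrid_retrieve("bgp session hold timer expired", KNOWLEDGE, stub_rerank)
```

The merge step deduplicates entries returned by both retrieval paths before sorting, so a candidate fault handling method found by both appears only once in the sorted result.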