An operation and maintenance method, a model training method, and related devices

Training the first model on the training dataset addresses cross-vendor operation and maintenance, enabling more efficient operation and maintenance of network devices and improving the identification accuracy and applicability of the operation and maintenance system.

CN122309644A · Pending · Publication Date: 2026-06-30 · HUAWEI TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUAWEI TECH CO LTD
Filing Date
2024-12-31
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Users cannot use one vendor's maintenance system to maintain network equipment from other vendors, which makes maintenance work inconvenient.

Method used

By designing a training dataset related to the second operation and maintenance system, the first model is trained to identify the business function interfaces in the third-party system, thereby achieving cross-vendor operation and maintenance.

Benefits of technology

It improves the convenience for users to maintain network equipment from other vendors, reduces the demand for computing resources, and improves the model's recognition accuracy and generalization ability.

✦ Generated by Eureka AI based on patent content.


Abstract

An operation and maintenance method, a model training method, and related apparatus are disclosed. The operation and maintenance method includes: acquiring a user's input question; inputting the input question into a first model to output an identifier of a target operation and maintenance module; and calling the target operation and maintenance module according to parameter information indicated by the input question to obtain the answer to the input question. The first model is trained on a training dataset related to a second operation and maintenance system. In this application, the first model is trained on a training dataset related to the second operation and maintenance system, so the target operation and maintenance module identified by the first model can be an operation and maintenance module within either the first or second operation and maintenance system. Furthermore, users can use the first operation and maintenance system to operate and maintain network devices managed by other operation and maintenance systems, improving the convenience of user operation and maintenance work.

Description

Technical Field

[0001] This application relates to the field of network operation and maintenance technology, and in particular to an operation and maintenance method, a model training method, and related devices.

Background Technology

[0002] In the field of network operation and maintenance technology, the network equipment maintained by users may generally come from different manufacturers, and each manufacturer has its own independent operation and maintenance system. Therefore, users cannot maintain network equipment from other manufacturers through the operation and maintenance system of one manufacturer, which brings inconvenience to users' operation and maintenance work.

Summary of the Invention

[0003] To address the aforementioned issues, this application provides an operation and maintenance method, a model training method, and related apparatus. By designing a training dataset related to a second operation and maintenance system, a first model of the first operation and maintenance system is trained. The trained first model can identify the business function interfaces of third-party systems, thereby enabling users to operate and maintain network equipment from other vendors through the first operation and maintenance system.

[0004] Therefore, this application provides the following technical solution:

[0005] In a first aspect, this application provides an operation and maintenance method applied to a first operation and maintenance system, comprising: obtaining a user's input question; inputting the input question into a first model to output an identifier of a target operation and maintenance module; and invoking the target operation and maintenance module according to parameter information indicated by the input question to obtain an answer to the input question. The first model is trained on a training dataset, which is related to a second operation and maintenance system. The input question includes an operation and maintenance task and parameter information for executing the task; for example, the input question could be "I need to view the routing information of device name HW002," where "view routing information" is the operation and maintenance task, and "device name HW002" is the parameter information.

[0006] In this application, the first model is trained using a training dataset related to the second operation and maintenance system. Therefore, the identifier of the target operation and maintenance module identified by the first model can be an operation and maintenance module within either the first or second operation and maintenance system. Consequently, users can use the first operation and maintenance system to operate and maintain network devices managed by other operation and maintenance systems, improving the convenience of user operation and maintenance work.
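
The flow in [0005] and [0006] can be sketched as follows. This is a minimal illustration, not the implementation described in the application: the module identifiers, the keyword rule standing in for the trained first model, and the regular expression used for parameter extraction are all hypothetical.

```python
import re
from typing import Callable, Dict

# Hypothetical registry: module identifier -> callable operation and maintenance module.
# The identifiers and return strings are illustrative only.
MODULES: Dict[str, Callable[[Dict[str, str]], str]] = {
    "sys1.route_query": lambda p: f"[system 1] routes for {p['device_name']}",
    "sys2.route_query": lambda p: f"[system 2] routes for {p['device_name']}",
}

def first_model(question: str) -> str:
    """Stand-in for the trained first model: returns a module identifier."""
    # A keyword rule replaces the real classifier purely for illustration.
    return "sys2.route_query" if "HW" in question else "sys1.route_query"

def extract_params(question: str) -> Dict[str, str]:
    """Toy parameter extraction: take the token that follows 'device name'."""
    match = re.search(r"device name (\w+)", question)
    return {"device_name": match.group(1)} if match else {}

def answer(question: str) -> str:
    module_id = first_model(question)      # step 1: identify the target module
    params = extract_params(question)      # step 2: parameter information
    return MODULES[module_id](params)      # step 3: invoke the target module
```

Because the registry holds modules from both systems, a single entry point can dispatch to either one, which is the cross-system behavior the paragraph describes.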

[0007] In one possible implementation, the number of parameters in the first model is less than a first threshold.

[0008] In this implementation, the number of parameters in the first model is less than a first threshold, such as 1 billion. It can be understood that a model of this size can be trained with the user's existing computing resources. Thus, even with limited computing resources, the user can generate a training dataset related to the second operation and maintenance system and use it to train the first model of the first operation and maintenance system.

[0009] In one possible implementation, the method further includes: inputting an input question into a second model, the second model generating target parameter information based on the parameter information indicated by the input question, wherein the number of parameters in the second model is greater than a second threshold; and invoking a target operation and maintenance module based on the parameter information indicated by the input question to obtain the answer to the input question, including: invoking the target operation and maintenance module based on the target parameter information to obtain the answer to the input question.

[0010] In this implementation, the number of parameters in the second model is greater than the second threshold. The second threshold and the first threshold can be the same or different. A parameter count above the second threshold indicates that the second model is a large model with strong generalization capabilities; for example, the second model could be a large language model. The second model therefore has strong in-context learning capabilities, can understand the input question well, and thus generates highly accurate target parameter information.

[0011] In one possible implementation, inputting the input question into the second model to generate target parameter information includes: determining, from at least one parameter extraction example, a target parameter extraction example corresponding to the input question; and using the target parameter extraction example and the input question as input to the second model, so that the second model outputs the target parameter information with reference to the target parameter extraction example.

[0012] In this implementation, the parameter extraction example serves as a reference for the second model when extracting target parameter information from the input question, thereby improving the accuracy of parameter extraction by the second model.
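
One way to picture the example selection in [0011] is shown below. The application does not say how the target example is chosen; this sketch assumes token-overlap (Jaccard) similarity, and the example data, prompt wording, and function names are hypothetical. The call to the second model itself is omitted.

```python
from typing import Dict, List

# Hypothetical parameter extraction examples (question -> extracted parameters).
EXAMPLES: List[Dict[str, str]] = [
    {"question": "view the routing information of device name HW001",
     "params": '{"device_name": "HW001"}'},
    {"question": "show alarms raised in the last 24 hours",
     "params": '{"period_hours": "24"}'},
]

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the real matching step."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def pick_example(question: str, examples: List[Dict[str, str]]) -> Dict[str, str]:
    """Choose the example whose question is most similar to the input question."""
    return max(examples, key=lambda ex: jaccard(question, ex["question"]))

def build_prompt(question: str, example: Dict[str, str]) -> str:
    """Assemble the second model's input from the target example and the question."""
    return (
        "Extract the parameters as JSON.\n"
        f"Example question: {example['question']}\n"
        f"Example parameters: {example['params']}\n"
        f"Question: {question}\n"
        "Parameters:"
    )
```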

[0013] In one possible implementation, the method further includes: obtaining a sample set, which includes multiple parameter extraction samples; performing clustering processing on the multiple parameter extraction samples in the sample set, selecting at least one parameter extraction sample from each category, and obtaining the above-mentioned at least one parameter extraction example.

[0014] In this implementation, multiple parameter extraction samples in the sample set are clustered into multiple categories. Then, one or more representative samples are selected from each category as parameter extraction examples. This preserves the diversity of the parameter extraction examples while reducing their number, so that the input text length limit of large language models is met.
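
The cluster-then-select step of [0013] and [0014] might look like the sketch below. The application does not name a clustering algorithm; a greedy threshold clustering over token overlap is assumed here only to keep the example dependency-free, and all names are hypothetical.

```python
from typing import List

def similarity(a: str, b: str) -> float:
    """Jaccard similarity over whitespace tokens (an assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cluster(samples: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters: List[List[str]] = []
    for sample in samples:
        for members in clusters:
            if similarity(sample, members[0]) >= threshold:
                members.append(sample)
                break
        else:
            clusters.append([sample])  # no cluster matched: start a new one
    return clusters

def select_examples(samples: List[str], per_cluster: int = 1,
                    threshold: float = 0.5) -> List[str]:
    """Keep at most `per_cluster` samples from every cluster."""
    return [s for members in cluster(samples, threshold) for s in members[:per_cluster]]
```

With four samples that fall into two categories, two examples are kept, so the prompt stays short while both categories remain represented.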

[0015] In one possible implementation, the method further includes: obtaining a description file of a second operation and maintenance system, the description file indicating parameter information of at least one operation and maintenance module in the second operation and maintenance system, the operation and maintenance module representing a functional module that performs operation and maintenance tasks; based on the description file, calling a second model to generate an operation and maintenance problem, the number of parameters of the second model being greater than a second threshold; and obtaining a training dataset based on the operation and maintenance problem.

[0016] In this implementation, the training dataset for the first model is obtained using the second model, whose number of parameters is greater than a second threshold; the second model is, for example, a large language model. The training dataset generated by the second model is diverse, which helps ensure the accuracy of the first model's output. Furthermore, the operation and maintenance problems generated by the large language model are closer to colloquial natural language, so the first model recognizes colloquial input questions more accurately.

[0017] In one possible implementation, the aforementioned description file includes parameters of the operation and maintenance module and examples of parameter values. The step of calling the second model based on the description file to generate an operation and maintenance problem includes: selecting at least one parameter from the parameters in the description file to obtain a parameter group; assigning values to the parameters in the parameter group according to the parameter value examples in the description file to obtain parameter group information; and using the parameter group information as input to the second model to generate the operation and maintenance problem.

[0018] In this implementation, the description file of the second operation and maintenance system typically contains many parameters, but users do not provide all of them when asking questions in natural language. By selecting some parameters from the description file and assigning values to them, the parameter group information that is input to the large language model is obtained. The resulting operation and maintenance problems contain fewer parameters, better reflect how users express themselves in natural language, and improve the recognition accuracy of the first model.
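
A small sketch of the parameter group construction in [0017] follows. The layout of the description file, the field names, and the prompt text are assumptions made for illustration; the source only states that the file lists parameters and example values.

```python
import random
from typing import Dict, List

# Hypothetical description-file fragment for one module of the second system.
DESCRIPTION = {
    "module_id": "sys2.route_query",
    "task": "view routing information",
    "parameters": {
        "device_name": ["HW001", "HW002"],
        "vrf": ["default", "mgmt"],
        "protocol": ["ospf", "bgp"],
    },
}

def build_parameter_group(description: dict, rng: random.Random,
                          max_params: int = 2) -> Dict[str, str]:
    """Select a subset of parameters and assign each an example value."""
    names: List[str] = sorted(description["parameters"])
    count = rng.randint(1, min(max_params, len(names)))
    chosen = rng.sample(names, count)
    return {name: rng.choice(description["parameters"][name]) for name in chosen}

def build_generation_prompt(description: dict, group: Dict[str, str]) -> str:
    """Turn the parameter group information into input for the second model."""
    assigned = ", ".join(f"{name}={value}" for name, value in group.items())
    return (f"Write one natural-language question asking to "
            f"{description['task']} with: {assigned}")
```

Capping the group size at `max_params` mirrors the observation that users rarely supply every parameter in one sentence.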

[0019] In one possible implementation, the description file also includes problem examples. The process of calling the second model to generate operation and maintenance problems based on the description file includes: calling the second model based on the description file and the problem examples, and generating operation and maintenance problems with reference to the problem examples.

[0020] In this implementation, the description file can be made available to users through the client. Users can set some problem examples through the client to guide the process of generating operation and maintenance problems by the large language model, so that the operation and maintenance problems are more in line with the user's specific language habits.

[0021] In one possible implementation, the training dataset further includes a modified operation and maintenance problem, which is obtained by modifying the parameter values in the operation and maintenance problem. The method further includes: replacing the parameter values in the operation and maintenance problem with example parameter values from the description file to obtain the modified operation and maintenance problem; and/or modifying the parameter values in the operation and maintenance problem according to preset rules to obtain the modified operation and maintenance problem.

[0022] In this implementation, modified operation and maintenance problems are obtained by altering the parameter values in the operation and maintenance problem. Adding these modified operation and maintenance problems to the training dataset increases the diversity of the training dataset.
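
Both modification routes in [0021] can be illustrated briefly. The preset rule used here (incrementing a trailing number) is a made-up example, since the application does not specify the rules; the function names are hypothetical as well.

```python
import re
from typing import List

def replace_with_examples(question: str, current_value: str,
                          example_values: List[str]) -> List[str]:
    """Route 1: swap the current value for other example values from the description file."""
    return [question.replace(current_value, value)
            for value in example_values if value != current_value]

def bump_trailing_number(value: str) -> str:
    """Hypothetical preset rule: increment trailing digits, keeping their width."""
    match = re.search(r"(\d+)$", value)
    if not match:
        return value
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return value[: match.start()] + bumped

def modify_by_rule(question: str, current_value: str) -> str:
    """Route 2: rewrite the value according to the preset rule."""
    return question.replace(current_value, bump_trailing_number(current_value))
```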

[0023] In one possible implementation, the method further includes: determining whether there exist a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; if so, setting a first training subset in the training dataset; and setting specified information in the operation and maintenance problems in the first training subset, wherein the ground-truth operation and maintenance module in the first training subset is either the first operation and maintenance module or the second operation and maintenance module, and the specified information is used to specify the operation and maintenance system in which the functional module performing the operation and maintenance task is located.

[0024] In this implementation, if the first and second operation and maintenance systems contain operation and maintenance modules that perform the same operation and maintenance task, at least one of a first training subset and a second training subset can be set in the training dataset. The operation and maintenance problems in the first training subset include the specified information, while those in the second training subset do not. Thus, whether or not the input question contains the specified information, the first model can identify the target operation and maintenance module corresponding to the input question, and its identification result for a given input question is stable.
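
The two subsets described in [0023] and [0024] could be built along these lines. How the specified information is phrased, and which module labels the subset without it, are not given in the source; the suffix "in <system name>" and the `default_id` argument are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (operation and maintenance problem, module identifier)

def build_subsets(questions: List[str], first_id: str, second_id: str,
                  system_names: Dict[str, str],
                  default_id: str) -> Tuple[List[Sample], List[Sample]]:
    """Return (first subset with specified information, second subset without it)."""
    first_subset: List[Sample] = []
    for question in questions:
        # The specified information names the system, so the label is unambiguous.
        first_subset.append((f"{question} in {system_names[first_id]}", first_id))
        first_subset.append((f"{question} in {system_names[second_id]}", second_id))
    # Without specified information, an assumed default module serves as the label.
    second_subset: List[Sample] = [(question, default_id) for question in questions]
    return first_subset, second_subset
```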

[0025] In one possible implementation, the method further includes: obtaining a verification dataset; verifying the trained first model based on the verification dataset to obtain verification results, the verification results including the recognition accuracy and conflict error rate of the module to be identified, the module to be identified being one of the first operation and maintenance module and the second operation and maintenance module, the conflict error rate representing the error rate at which the module to be identified is identified as a conflict module, and the conflict module representing an operation and maintenance module other than the module to be identified in the first operation and maintenance module and the second operation and maintenance module; when the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, adjusting the ratio of the training samples corresponding to the module to be identified and the conflict module in the training dataset to obtain an adjusted training dataset; and retraining the first model based on the adjusted training dataset.

[0026] In this implementation, when the module to be identified is identified as a conflicting module, the ratio of training samples corresponding to the module to be identified and the conflicting modules in the training dataset can be adjusted, and the first model can be retrained. This reduces the likelihood of the module to be identified being identified as a conflicting module, ensuring the accuracy of the first model's identification results.
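
The two verification metrics in [0025] and the retraining condition can be written down directly. This is a sketch under the assumption that verification results are available as (true label, predicted label) pairs; the function and argument names are hypothetical.

```python
from typing import List, Tuple

def evaluate(pairs: List[Tuple[str, str]], target: str,
             conflict: str) -> Tuple[float, float]:
    """Return (recognition accuracy, conflict error rate) for the module to be identified.

    pairs holds (true_label, predicted_label) from the verification dataset.
    """
    predictions = [pred for true, pred in pairs if true == target]
    if not predictions:
        return 0.0, 0.0
    accuracy = sum(pred == target for pred in predictions) / len(predictions)
    conflict_rate = sum(pred == conflict for pred in predictions) / len(predictions)
    return accuracy, conflict_rate

def needs_rebalancing(accuracy: float, conflict_rate: float,
                      min_accuracy: float, max_conflict_rate: float) -> bool:
    """Both conditions must hold before the sample ratio is adjusted."""
    return accuracy < min_accuracy and conflict_rate > max_conflict_rate
```

Note that the conflict error rate counts only misclassifications into the conflict module, so errors toward unrelated modules lower the accuracy without raising it.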

[0027] In one possible implementation, the above adjustment of the proportion of training samples corresponding to the module to be identified and the conflicting module in the training dataset includes: reducing the proportion of samples of low-priority modules in the training dataset to obtain an adjusted training dataset, wherein the low-priority modules are the operation and maintenance modules with lower priority among the modules to be identified and the conflicting modules.

[0028] In this implementation, the priority of the operation and maintenance modules can be set so that the first model can identify the operation and maintenance modules with higher priority, thereby solving the problem of unstable identification results of modules with the same operation and maintenance functions.
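
Reducing the share of low-priority samples, as in [0027], amounts to downsampling one class. The sketch below assumes that a larger number means higher priority and that a fixed fraction of the low-priority samples is kept; neither detail comes from the application.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (operation and maintenance problem, module identifier)

def downsample_low_priority(dataset: List[Sample], target: str, conflict: str,
                            priority: Dict[str, int], keep_fraction: float,
                            rng: random.Random) -> List[Sample]:
    """Keep only a fraction of the samples labeled with the lower-priority module."""
    # Assumption: a larger priority value means a higher priority.
    low = target if priority[target] < priority[conflict] else conflict
    low_samples = [sample for sample in dataset if sample[1] == low]
    others = [sample for sample in dataset if sample[1] != low]
    keep = max(1, int(len(low_samples) * keep_fraction)) if low_samples else 0
    return others + rng.sample(low_samples, keep)
```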

[0029] In a second aspect, this application provides a model training method applied to a first operation and maintenance system, comprising: obtaining a description file of a second operation and maintenance system, the description file indicating parameter information of at least one operation and maintenance module in the second operation and maintenance system, the operation and maintenance module representing a functional module that performs operation and maintenance tasks; according to the description file, calling a second model to generate an operation and maintenance problem, the number of parameters of the second model being greater than a second threshold; obtaining a training dataset according to the operation and maintenance problem; and training a first model according to the training dataset to obtain a trained first model, the trained first model being used to identify the identifier of the target operation and maintenance module corresponding to the input question, the target operation and maintenance module representing a functional module that performs the operation and maintenance task represented by the input question.
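
An end-to-end toy version of the training method in [0029] is sketched below. Everything here is a stand-in: templates replace the call to the second model, a small naive Bayes classifier replaces the first model, and the description fields are hypothetical. It only shows how generated problems become labeled training data for a module classifier.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def generate_questions(description: dict) -> List[str]:
    """Stub for the second model: templates stand in for a large language model."""
    task, devices = description["task"], description["device_examples"]
    return [f"{prefix} {task} of device {device}"
            for device in devices for prefix in ("please", "I need to")]

def build_dataset(descriptions: List[dict]) -> List[Tuple[str, str]]:
    """Label every generated problem with the identifier of its module."""
    return [(q, d["module_id"]) for d in descriptions for q in generate_questions(d)]

class TinyClassifier:
    """Multinomial naive Bayes over whitespace tokens, standing in for the first model."""

    def fit(self, dataset: List[Tuple[str, str]]) -> "TinyClassifier":
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.label_counts: Counter = Counter()
        for text, label in dataset:
            self.label_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        def score(label: str) -> float:
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            prior = math.log(self.label_counts[label] / sum(self.label_counts.values()))
            return prior + sum(math.log((counts[w] + 1) / total)
                               for w in text.lower().split())
        return max(self.label_counts, key=score)
```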

[0030] In one possible implementation, the aforementioned description file includes parameters of the operation and maintenance module and examples of parameter values. The step of calling the second model based on the description file to generate an operation and maintenance problem includes: selecting at least one parameter from the parameters in the description file to obtain a parameter group; assigning values to the parameters in the parameter group according to the parameter value examples in the description file to obtain parameter group information; and using the parameter group information as input to the second model to generate the operation and maintenance problem.

[0031] In one possible implementation, the description file also includes problem examples. The process of calling the second model to generate operation and maintenance problems based on the description file includes: calling the second model based on the description file and the problem examples, and generating operation and maintenance problems with reference to the problem examples.

[0032] In one possible implementation, the training dataset further includes a modified operation and maintenance problem, which is obtained by modifying the parameter values in the operation and maintenance problem. The method further includes: replacing the parameter values in the operation and maintenance problem with example parameter values from the description file to obtain the modified operation and maintenance problem; and/or modifying the parameter values in the operation and maintenance problem according to preset rules to obtain the modified operation and maintenance problem.

[0033] In one possible implementation, the method further includes: determining whether there exist a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; if so, setting a first training subset in the training dataset; and setting specified information in the operation and maintenance problems in the first training subset, wherein the ground-truth operation and maintenance module in the first training subset is either the first operation and maintenance module or the second operation and maintenance module, and the specified information is used to specify the operation and maintenance system in which the functional module performing the operation and maintenance task is located.

[0034] In one possible implementation, the method further includes: obtaining a verification dataset; verifying the trained first model based on the verification dataset to obtain verification results, the verification results including the recognition accuracy and conflict error rate of the module to be identified, the module to be identified being one of the first operation and maintenance module and the second operation and maintenance module, the conflict error rate representing the error rate at which the module to be identified is identified as a conflict module, and the conflict module representing an operation and maintenance module other than the module to be identified in the first operation and maintenance module and the second operation and maintenance module; when the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, adjusting the ratio of the training samples corresponding to the module to be identified and the conflict module in the training dataset to obtain an adjusted training dataset; and retraining the first model based on the adjusted training dataset.

[0035] In one possible implementation, the above adjustment of the proportion of training samples corresponding to the module to be identified and the conflicting module in the training dataset includes: reducing the proportion of samples of low-priority modules in the training dataset to obtain an adjusted training dataset, wherein the low-priority modules are the operation and maintenance modules with lower priority among the modules to be identified and the conflicting modules.

[0036] In a third aspect, this application provides an operation and maintenance apparatus applied to a first operation and maintenance system, comprising: a first acquisition module for acquiring a user's input question; and a first processing module for inputting the input question into a first model to output an identifier of a target operation and maintenance module, and invoking the target operation and maintenance module according to parameter information indicated by the input question to obtain the answer to the input question. The first model is trained on a training dataset, which is related to a second operation and maintenance system. For example, the input question is expressed in natural language and includes an operation and maintenance task and parameter information for executing the task.

[0037] In one possible implementation, the number of parameters in the first model is less than a first threshold.

[0038] In one possible implementation, the first processing module is further configured to: input the input question into a second model, the second model generating target parameter information based on the parameter information indicated by the input question, wherein the number of parameters in the second model is greater than a second threshold; and, based on the parameter information indicated by the input question, invoke a target operation and maintenance module to obtain the answer to the input question, including: invoking the target operation and maintenance module based on the target parameter information to obtain the answer to the input question.

[0039] In one possible implementation, the first processing module is specifically used to: determine a target parameter extraction example corresponding to the input question from at least one parameter extraction example; use the target parameter extraction example and the input question as input to the second model, and output target parameter information with reference to the target parameter extraction example.

[0040] In one possible implementation, the first processing module is further configured to: obtain a sample set, the sample set including multiple parameter extraction samples; perform clustering processing on the multiple parameter extraction samples in the sample set, select at least one parameter extraction sample from each category, and obtain the above-mentioned at least one parameter extraction example.

[0041] In one possible implementation, the first processing module is further configured to: obtain a description file of the second operation and maintenance system, the description file indicating parameter information of at least one operation and maintenance module in the second operation and maintenance system, the operation and maintenance module representing a functional module that performs operation and maintenance tasks; based on the description file, call the second model to generate an operation and maintenance problem, the number of parameters of the second model being greater than a second threshold; and obtain a training dataset based on the operation and maintenance problem.

[0042] In one possible implementation, the aforementioned description file includes parameters of the operation and maintenance module and examples of parameter values. The first processing module is specifically configured to: select at least one parameter from the parameters in the description file to obtain a parameter group; assign values to the parameters in the parameter group according to the example parameter values in the description file to obtain parameter group information; and use the parameter group information as input to the second model to generate an operation and maintenance problem.

[0043] In one possible implementation, the description file also includes problem examples. The first processing module is specifically used to: based on the description file and the problem examples, invoke the second model, which generates operation and maintenance problems with reference to the problem examples.

[0044] In one possible implementation, the training dataset also includes a modified operation and maintenance problem, which is obtained by modifying the parameter values in the operation and maintenance problem. The first processing module is further configured to: replace the parameter values in the operation and maintenance problem with the parameter value examples in the description file to obtain the modified operation and maintenance problem; and/or modify the parameter values in the operation and maintenance problem according to preset rules to obtain the modified operation and maintenance problem.

[0045] In one possible implementation, the first processing module is further configured to: determine whether there exist a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; if so, set a first training subset in the training dataset; and set specified information in the operation and maintenance problems in the first training subset, wherein the ground-truth operation and maintenance module in the first training subset is either the first operation and maintenance module or the second operation and maintenance module, and the specified information is used to specify the operation and maintenance system in which the functional module performing the operation and maintenance task is located.

[0046] In one possible implementation, the first processing module is further configured to: acquire a verification dataset; verify the trained first model based on the verification dataset to obtain verification results, the verification results including the recognition accuracy and conflict error rate of the module to be identified, the module to be identified being one of the first operation and maintenance module and the second operation and maintenance module, the conflict error rate representing the error rate at which the module to be identified is identified as a conflict module, and the conflict module representing an operation and maintenance module other than the module to be identified in the first operation and maintenance module and the second operation and maintenance module; when the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, adjust the ratio of the training samples corresponding to the module to be identified and the conflict module in the training dataset to obtain an adjusted training dataset; and retrain the first model based on the adjusted training dataset.

[0047] In one possible implementation, the first processing module is specifically used to: reduce the proportion of samples of low-priority modules in the training dataset to obtain an adjusted training dataset, wherein the low-priority modules are the operation and maintenance modules with lower priority among the modules to be identified and the conflicting modules.

[0048] In a fourth aspect, this application provides a model training apparatus applied to a first operation and maintenance system, comprising: a second acquisition module for acquiring a description file of the second operation and maintenance system, the description file indicating parameter information of at least one operation and maintenance module in the second operation and maintenance system, the operation and maintenance module representing a functional module that performs operation and maintenance tasks; and a second processing module for calling, according to the description file, a second model to generate an operation and maintenance problem, the number of parameters of the second model being greater than a second threshold; obtaining a training dataset based on the operation and maintenance problem; and training a first model based on the training dataset to obtain a trained first model, the trained first model being used to identify the identifier of the target operation and maintenance module corresponding to the input question, the target operation and maintenance module representing a functional module that performs the operation and maintenance task represented by the input question.

[0049] In one possible implementation, the aforementioned description file includes parameters of the operation and maintenance module and examples of parameter values. The second processing module is specifically used for: selecting at least one parameter from the parameters in the description file to obtain a parameter group; assigning values to the parameters in the parameter group according to the example parameter values in the description file to obtain parameter group information; and using the parameter group information as input to the second model to generate an operation and maintenance problem.

[0050] In one possible implementation, the description file also includes problem examples. The second processing module is specifically used to: based on the description file and the problem examples, invoke a second model, which generates operation and maintenance problems with reference to the problem examples.

[0051] In one possible implementation, the training dataset also includes a modified operation and maintenance problem, which is obtained by modifying the parameter values in the operation and maintenance problem. The second processing module is further configured to: replace the parameter values in the operation and maintenance problem with the parameter value examples in the description file to obtain the modified operation and maintenance problem; and/or modify the parameter values in the operation and maintenance problem according to preset rules to obtain the modified operation and maintenance problem.

[0052] In one possible implementation, the second processing module is further configured to: determine whether there is a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; if so, set a first training subset in the training dataset; and set specified information in the operation and maintenance problems in the first training subset, wherein the ground-truth operation and maintenance module for each problem in the first training subset is either the first operation and maintenance module or the second operation and maintenance module, and the specified information is used to specify the operation and maintenance system in which the functional module performing the operation and maintenance task is located.
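A minimal sketch of building such a first training subset is given below. The wording of the specified information and the module numbers are assumptions for illustration only.

```python
def add_system_hint(question, system_name):
    """Append specified information naming the system that should handle the task."""
    return f"{question} (use the {system_name} operation and maintenance system)"

def build_first_subset(questions, first_module_id, second_module_id):
    """Pair every question with both modules, disambiguated by the system hint."""
    subset = []
    for q in questions:
        subset.append((add_system_hint(q, "first"), first_module_id))
        subset.append((add_system_hint(q, "second"), second_module_id))
    return subset

# Hypothetical identifiers: module 3 and module 14 perform the same task.
subset = build_first_subset(["query the ARP entries of device HW002"], 3, 14)
```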

[0053] In one possible implementation, the second processing module is further configured to: acquire a verification dataset; verify the trained first model based on the verification dataset to obtain verification results, the verification results including the recognition accuracy and conflict error rate of the module to be identified, the module to be identified being one of the first and second operation and maintenance modules, the conflict error rate representing the error rate at which the module to be identified is identified as a conflict module, and the conflict module representing an operation and maintenance module other than the module to be identified in the first and second operation and maintenance modules; when the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, adjust the ratio of the training samples corresponding to the module to be identified and the conflict module in the training dataset to obtain an adjusted training dataset; and retrain the first model based on the adjusted training dataset.
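The two verification metrics and the retraining condition can be sketched as follows. This is one plausible reading of the definitions above, in which both rates are computed over the verification samples whose ground truth is the module to be identified; the thresholds and module numbers are illustrative assumptions.

```python
def verify(truths, preds, target, conflict):
    """Return (recognition accuracy, conflict error rate) for the target module."""
    total = sum(1 for t in truths if t == target)
    if total == 0:
        raise ValueError("no verification samples for the module to be identified")
    correct = sum(1 for t, p in zip(truths, preds) if t == target and p == target)
    confused = sum(1 for t, p in zip(truths, preds) if t == target and p == conflict)
    return correct / total, confused / total

def needs_rebalance(accuracy, conflict_rate, acc_threshold, conflict_threshold):
    """Adjust the sample ratio only when both conditions hold."""
    return accuracy < acc_threshold and conflict_rate > conflict_threshold

# Hypothetical results: module 3 is the module to be identified, 14 the conflict module.
truths = [3, 3, 3, 3, 14, 14]
preds = [3, 14, 14, 3, 14, 14]
accuracy, conflict_rate = verify(truths, preds, target=3, conflict=14)
retrain = needs_rebalance(accuracy, conflict_rate, 0.9, 0.2)
```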

[0054] In one possible implementation, the second processing module is specifically used to: reduce the proportion of low-priority modules in the training dataset to obtain an adjusted training dataset, wherein the low-priority modules are the operation and maintenance modules with lower priority among the modules to be identified and the conflicting modules.
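One simple way to reduce the proportion of a low-priority module is random downsampling, sketched below. The sample layout (question, module identifier) and the `keep_fraction` parameter are assumptions for this sketch; other ratio adjustments are equally possible.

```python
import random

def downsample_low_priority(samples, low_priority_id, keep_fraction, rng):
    """Keep all other samples and only a fraction of the low-priority module's."""
    low = [s for s in samples if s[1] == low_priority_id]
    rest = [s for s in samples if s[1] != low_priority_id]
    keep = int(len(low) * keep_fraction)
    return rest + rng.sample(low, keep)

# Hypothetical dataset: 10 samples each for module 3 and low-priority module 14.
samples = [(f"question {i}", 3) for i in range(10)]
samples += [(f"question {i}", 14) for i in range(10)]
adjusted = downsample_low_priority(samples, 14, 0.5, random.Random(0))
```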

[0055] Fifthly, this application provides an operation and maintenance architecture, including a terminal and a server. The terminal is used to obtain user input questions, and the server is used to execute the methods described above based on the input questions.

[0056] Sixthly, this application provides a computing device, including a memory and a processor. The memory stores instructions that, when executed by the processor, cause the aforementioned method to be implemented.

[0057] Seventhly, this application provides a computer-readable storage medium having a computer program stored thereon. When the computer program is executed by a processor, the above-described method is implemented.

[0058] Eighthly, this application provides a computer program product including program instructions that, when executed by a computer, cause the computer to perform the above-described method.

Description of the Drawings

[0059] Figure 1 is a schematic diagram illustrating an application scenario in which a user operates and maintains network equipment in the related art;

[0060] Figure 2 is a schematic diagram illustrating an application scenario in which a user operates and maintains network equipment, provided in an embodiment of this application;

[0061] Figure 3a is a flowchart illustrating the first technical solution provided in an embodiment of this application;

[0062] Figure 3b is a flowchart illustrating the second technical solution provided in an embodiment of this application;

[0063] Figure 4 is a flowchart illustrating an operation and maintenance method provided in an embodiment of this application;

[0064] Figure 5 is a schematic diagram of the second model generating target parameter information, provided in an embodiment of this application;

[0065] Figure 6 is a schematic diagram illustrating an example of obtaining at least one parameter extraction example through clustering, provided in an embodiment of this application;

[0066] Figure 7 is a schematic flowchart illustrating a model training method provided in an embodiment of this application;

[0067] Figure 8a is a schematic diagram illustrating the training of a first model based on a training dataset generated by a second model, provided in an embodiment of this application;

[0068] Figure 8b is a schematic diagram illustrating the training process of the first model provided in an embodiment of this application;

[0069] Figure 9 is a schematic diagram illustrating the determination of parameter group information provided in an embodiment of this application;

[0070] Figure 10 is a schematic diagram of the second model generating an operation and maintenance problem with reference to a problem example, provided in an embodiment of this application;

[0071] Figure 11 is a schematic diagram illustrating the generation of a modified operation and maintenance problem based on an operation and maintenance problem, provided in an embodiment of this application;

[0072] Figure 12 is a schematic diagram of the second training subset and the first training subset provided in an embodiment of this application;

[0073] Figure 13a is a schematic diagram illustrating the verification results provided in an embodiment of this application;

[0074] Figure 13b is a schematic diagram illustrating the adjustment of the training dataset provided in an embodiment of this application;

[0075] Figure 14 is a schematic diagram of the hardware architecture of a first operation and maintenance system provided in an embodiment of this application;

[0076] Figure 15 is a schematic diagram of the software architecture of a first operation and maintenance system provided in an embodiment of this application;

[0077] Figure 16 is a schematic diagram of the composition of an operation and maintenance device provided in an embodiment of this application;

[0078] Figure 17 is a schematic diagram of the composition of a model training device provided in an embodiment of this application;

[0079] Figure 18 is a schematic diagram of the structure of a computing device provided in an embodiment of this application.

Detailed Description

[0080] The technical solutions in the embodiments of this application will now be described with reference to the accompanying drawings.

[0081] In this article, the term "and / or" describes the relationship between related objects, indicating that there can be three relationships. For example, A and / or B can represent three cases: A exists alone, A and B exist simultaneously, and B exists alone. The symbol " / " in this article indicates that the related objects have an "or" relationship; for example, A / B means A or B.

[0082] The terms "first" and "second," etc., used in the specification and claims herein are used to distinguish different objects, not to describe a specific order of objects. For example, "first response message" and "second response message," etc., are used to distinguish different response messages, not to describe a specific order of response messages.

[0083] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design. Specifically, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.

[0084] In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more, for example, multiple processing units means two or more processing units, multiple elements means two or more elements, etc.

[0085] To facilitate understanding of the solutions provided in the embodiments of this application, a brief introduction to some of the terms involved in this solution will be given first.

[0086] BERT (Bidirectional Encoder Representations from Transformers) model: This is a pre-trained model for language representation. BERT employs specific training strategies to generate deep bidirectional language representations.

[0087] Large Language Models (LLMs): Also known as large models, LLMs typically refer to neural network models containing an extremely large number of parameters (usually over a billion). They have the following characteristics. First, they have a massive parameter scale, containing billions of parameters and reaching sizes of hundreds of gigabytes or even larger. This scale gives them powerful expressive and learning capabilities. Second, they can learn multiple tasks simultaneously. LLMs often learn multiple different Natural Language Processing (NLP) tasks together, such as machine translation, text summarization, and question answering. Multi-task learning allows the model to acquire broader and more general language understanding. Third, they require powerful computing resources. Training a large language model typically requires hundreds or even thousands of graphics processing units (GPUs) and a significant amount of time, usually weeks to months. Powerful computing resources can accelerate the training process while preserving the capabilities of the large language model. Fourth, they require abundant data. Only with a large amount of training data can the advantages of their parameter scale be fully realized. Large language models are widely used in the field of NLP and are fundamentally changing the landscape of NLP tasks, giving rise to more powerful and intelligent language technologies. They are one of the important directions in the development of artificial intelligence (AI) and perform exceptionally well in various NLP tasks, such as text classification, sentiment analysis, summarization, and translation. Large language models can be used in multiple application areas, including automatic writing, chatbots, virtual assistants, voice assistants, and automatic translation.

[0088] Token constraints of large language models: A token can be a word, a part of a word, or even a single character, depending on the tokenization method used. Humans measure text length by word count, while large language models measure text length in tokens. For large language models, the longer the input text, the more difficult it is to maintain sufficient attention and to fully comprehend it. Furthermore, processing long texts requires significant computational power; the computational cost of the self-attention mechanism increases quadratically with text length, thus increasing costs. Therefore, large language models employ token constraints to limit the length of the input text.

[0089] Artificial Intelligence Agent (AI Agent): Also known as intelligent agent, AI agent, etc., it is an artificial intelligence system that simulates human intelligent behavior, with a large language model (LLM) as its core engine. An AI Agent can perceive its environment, make decisions, and execute tasks to achieve specific goals. The design philosophy is to endow machines with autonomy, adaptability, and interactivity, enabling them to operate independently in complex and ever-changing environments.

[0090] An Application Programming Interface (API) is a calling interface provided by an application to developers. Developers use the API to access the application during programming without needing to access the application's source code or understand the details of its internal workings. For example, an application's API can be a set of predefined functions, methods, classes, and protocols, allowing developers to leverage these predefined functionalities to implement their own applications without writing all the code from scratch. One of the main functions of an API is to provide a set of common functionalities, enabling developers to quickly integrate and use these functionalities. For instance, an operations and maintenance system can implement multiple business functions, such as querying Address Resolution Protocol (ARP) entries, querying routing information, and querying interface information. Each business function provides a user with an API interface so that users can call the API to implement the business function and maintain network devices. Users can also integrate API interfaces of different business functions to achieve unified management of operations and maintenance work.

[0091] Third-party systems: Also referred to as third-party operation and maintenance systems, these are operation and maintenance systems provided by a party other than the provider of the primary operation and maintenance system. Taking data center networks as an example, data center networks involve network equipment such as switches, firewalls, and routers from different manufacturers, each with its own independent operation and maintenance system; relative to a primary operation and maintenance system, the operation and maintenance systems outside of that primary system are third-party systems.

[0092] Operations and maintenance (O&M) refers to the maintenance of established network hardware and software systems. Its purpose is to ensure the normal operation and launch of services. During system operation, it involves maintaining the equipment within the system. Personnel performing O&M work are also called O&M engineers. The main task of O&M engineers is to ensure the secure and stable operation of websites and software services. The core of their work is to ensure the stable operation of products after launch, quickly resolve various problems that arise during this period, and continuously optimize the system architecture and deployment rationality in their daily work to improve system services.

[0093] Intelligent Operations and Maintenance (O&M): By introducing artificial intelligence and automation technologies, O&M work has been automated and made intelligent, improving efficiency, reducing costs, optimizing resource utilization, and enhancing system stability and security. Interactive O&M systems represent a key development direction and capability for intelligent network O&M. These systems utilize generative large language models for intent understanding and knowledge extraction, providing users with a natural language interaction interface. Users can operate the O&M system using natural language, such as calling APIs for different business functions, performing data queries and troubleshooting, reducing the learning curve and lowering the barrier to entry, thus improving efficiency.

[0094] Corpus: This refers to language materials used for linguistic research or natural language processing, including text, speech, and images. These corpora can be raw materials of human language use or language data that has undergone specific processing and annotation. In the field of natural language processing, corpora are commonly used to train and test machine learning algorithms to improve the accuracy and efficiency of natural language processing. Common corpora include labeled corpora, balanced corpora, dependency treebanks, and aligned corpora. Corpora have wide applications in linguistics and computer science, helping researchers better understand language rules and develop language processing applications.

[0095] Representational State Transfer (REST) style, also called RESTful style: A software architectural style or design style. It is not a standard, but rather a set of design principles and constraints. It is primarily used for software involving interaction between the client and server. Software designed based on this style can be more concise and more clearly layered, and can more easily implement mechanisms such as caching.

[0096] In the field of network operation and maintenance technology, the network equipment maintained by users may generally come from different manufacturers, and each manufacturer has its own independent operation and maintenance system. Therefore, users cannot maintain network equipment from other manufacturers through the operation and maintenance system of one manufacturer, which brings inconvenience to users' operation and maintenance work.

[0097] Figure 1 is a schematic diagram illustrating an application scenario in which a user operates and maintains network equipment in the related art. As shown in Figure 1, in the related art, users need to operate different operation and maintenance systems to maintain network devices. These different operation and maintenance systems include, for example, the three operation and maintenance systems A, B, and C shown in Figure 1. Users can maintain network devices such as router 1 and firewall 1 through system A; router 2 and switch 1 through system B; and router 3, firewall 2, and switch 2 through system C.

[0098] In view of this, embodiments of this application provide an operation and maintenance system and an operation and maintenance method. The operation and maintenance method is applied to the operation and maintenance system, which can be any type of operation and maintenance system. To distinguish the operation and maintenance system of this application embodiment from other operation and maintenance systems, the operation and maintenance system of this application embodiment is also referred to as the first operation and maintenance system, and other operation and maintenance systems are also referred to as the second operation and maintenance system, the third operation and maintenance system, etc.

[0099] Figure 2 is a schematic diagram illustrating an application scenario in which a user operates and maintains network equipment, provided in an embodiment of this application. As shown in Figure 2, in the embodiments of this application, users can interact with the first operation and maintenance system to operate and maintain all network devices. The embodiments of this application provide a first operation and maintenance system that can call a first model and a second model, and provide two technical solutions.

[0100] Figure 3a is a flowchart illustrating the first technical solution provided in an embodiment of this application. As shown in Figure 3a, the first operation and maintenance system can input user questions into a first model and a second model respectively. The first model can identify the target operation and maintenance module of the business function corresponding to the input question, such as identifying the top N API interfaces from multiple API interfaces and outputting the identifiers of the top N API interfaces, where N is an integer greater than 1, and the identifiers include, for example, interface number and interface name. The second model can analyze the parameter information indicated by the input question, identify the final API interface corresponding to the input question from the top N API interfaces, and output the interface number and parameter information of the final API interface. The first operation and maintenance system can call the API interface corresponding to the interface number based on the parameter information, and obtain the answer to the input question from the data returned by the API interface.

[0101] For example, an operations and maintenance (O&M) module is a business function module in an O&M system used to implement a specific O&M task, such as an application or software. The number of O&M modules corresponds to the number of O&M tasks the O&M system can perform. The first model can be considered a classification model, and the multiple O&M modules in multiple O&M systems can be considered as multiple categories (each O&M module corresponds to a category). After the user inputs a question into the first model, the first model can identify the O&M module corresponding to the input question (i.e., the target O&M module) from the multiple O&M modules in multiple O&M systems. The output of the first model can be an identifier of the target O&M module. The identifier can be implemented in any way that distinguishes different O&M modules. For example, the O&M modules in multiple O&M systems can be assigned numbers. If each O&M module provides an API interface to the outside world so that users can access and call the O&M module through this interface, each O&M module's API interface can be assigned a different number, i.e., the API interface number (an example of the identifier of the target O&M module). Alternatively, different names can be assigned to the O&M modules in multiple O&M systems, or any other implementation that distinguishes different O&M modules can be used. This application embodiment does not limit this.

[0102] For example, business functions could include querying device ARP entries, querying routing information, and querying interface information.

[0103] Generally, the number of parameters in the first model is less than the first threshold, which may be, for example, 1B. B represents billion. The first model is used to classify the input question and output the identifiers of the top N API interfaces. The number of parameters in the second model is greater than the second threshold. The second threshold can be the same as the first threshold or a different threshold; for example, the second threshold can be greater than the first threshold. The second threshold may be, for example, 100B. The second model is used to analyze the input question based on its context, extract the parameter information indicated by the input question, and identify the final identifier of the API interface from the identifiers of the top N API interfaces. The following explanation uses BERT as the first model and LLM as the second model as an example.

[0104] In the first technical solution, the training dataset for the BERT model is created manually or using templates. The resulting training dataset lacks diversity, so the trained BERT model generalizes poorly and has low accuracy. Therefore, an LLM model is needed for further API interface identification and selection. For example, the top five API numbers output by the BERT model can be selected, and the LLM model can then choose the final API number from those five.
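The two-stage selection of the first technical solution can be sketched as follows. The classifier scores and the reranking function are stand-ins for the BERT model and the LLM model respectively; all values are invented for illustration.

```python
def top_n(scores, n):
    """Return the n API numbers with the highest first-model scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def choose_final(candidates, rerank):
    """Let the second model pick the final API number among the candidates."""
    return max(candidates, key=rerank)

# Hypothetical first-model scores per API number.
scores = {1: 0.05, 2: 0.40, 3: 0.30, 4: 0.10, 5: 0.15}
candidates = top_n(scores, 3)

# Stand-in for the second model: a fixed preference table instead of an LLM call.
llm_preference = {2: 0.2, 3: 0.9, 5: 0.1}
final_api = choose_final(candidates, lambda api: llm_preference.get(api, 0.0))
```

In this sketch the second stage overrides the top-ranked candidate of the first stage, which is the situation the first technical solution relies on the LLM model to handle.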

[0105] In the first technical solution, the current BERT and LLM models are deployed in the first operations and maintenance system. These models can only identify multiple operations and maintenance modules within the first system. To enable them to identify modules in the second system, users need to fine-tune the BERT and LLM models. Therefore, this first technical solution has the following drawbacks:

[0106] First, LLM models have a massive number of parameters; large language models typically have tens or hundreds of billions of parameters. Fine-tuning them requires significant computing resources, and because users have limited computing resources, they cannot fine-tune the LLM. This prevents users from integrating the API interface (an example of an operation and maintenance module) of the second operation and maintenance system into the first operation and maintenance system.

[0107] Secondly, users lack training datasets (such as question-and-answer (QA) corpora) for the second operation and maintenance system. They typically need to generate training datasets manually or using templates. However, neither approach can guarantee the diversity and accuracy of the training datasets. Furthermore, both approaches incur high labor costs and require a high level of expertise.

[0108] Third, the training datasets built into the second operation and maintenance system are generally written in formal language, which differs greatly from the spoken language users employ in daily life. As a result, the trained BERT model recognizes user input questions with low accuracy. In addition, training datasets that are written manually or generated from templates are usually provided by professionals, and the training data built into the second operation and maintenance system gives users no opportunity to participate in designing the training dataset or to customize it to their own language habits.

[0109] Figure 3b is a flowchart illustrating the second technical solution provided in an embodiment of this application. As shown in Figure 3b, the first operation and maintenance system can input user questions into a first model and a second model respectively. The first model can identify the API interface of the business function corresponding to the input question and output the final API interface identifier, such as interface number and interface name. The second model can analyze the parameter information indicated by the input question and output the parameter information. The first operation and maintenance system can call the API interface corresponding to the interface number based on the parameter information, and obtain the answer to the input question from the data returned by the API interface.

[0110] In the second technical solution, users set description files for different operation and maintenance systems. Based on these description files, a large language model is used to generate the training dataset for the BERT model. This training dataset generated by the large language model has diverse and generalizable characteristics, resulting in high accuracy for the BERT model. Consequently, the BERT model can identify the interface number of the final API interface. Therefore, to enable the BERT model to recognize the API interface of the second operation and maintenance system, only the BERT model needs to be trained, without the need to train an LLM model. Thus, the second technical solution overcomes the shortcomings of the first technical solution.

[0111] First, the BERT model has a small number of parameters, and fine-tuning requires only a small amount of computing resources. The user's existing computing resources can meet the requirements for training the BERT model. Furthermore, even with limited existing computing resources, the user can use the first operation and maintenance system to operate and maintain network devices managed by other operation and maintenance systems, thus improving the convenience of the user's operation and maintenance work.

[0112] Secondly, large language models possess generalization and strong text generation capabilities, thus ensuring the diversity and accuracy of training samples in the training dataset. Furthermore, their powerful text generation eliminates the need for template-based or manually generated training datasets, saving labor costs.

[0113] Third, optionally, users can set example questions that closely resemble their everyday spoken language. The large language model can refer to these example questions when generating the training dataset. Consequently, the BERT model achieves high recognition accuracy after being trained on the training dataset. By setting example questions, users can also personalize the training dataset, making the operation and maintenance process more aligned with their expectations.

[0114] In this embodiment, the identifier of the API interface output by the first model can be the identifier of the API interface of a business function in the first operation and maintenance system, or it can be the identifier of the API interface of a business function in other operation and maintenance systems (such as the second operation and maintenance system, the third operation and maintenance system, etc.). That is, users can operate and maintain network devices managed by the first operation and maintenance system, or they can operate and maintain network devices managed by other operation and maintenance systems. Assuming the first operation and maintenance system has 10 operation and maintenance modules corresponding to 10 business functions (such as querying device ARP entries, querying routing information, etc.) and the second operation and maintenance system has 15 operation and maintenance modules corresponding to 15 business functions, the operation and maintenance modules of the first operation and maintenance system can be numbered 1-10 and those of the second operation and maintenance system 11-25. The identifier of the API interface output by the first model can be any one of the numbers 1-25.
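The numbering in this example can be illustrated by a small registry that assigns consecutive identifiers across systems. This sketch assumes, as the ranges 1-10 and 11-25 suggest, that the 15 additional modules belong to another operation and maintenance system; the module names are placeholders.

```python
def build_registry(systems):
    """Assign consecutive numbers to the modules of each system in turn."""
    registry = {}
    next_id = 1
    for system_name, module_names in systems:
        for name in module_names:
            registry[next_id] = (system_name, name)
            next_id += 1
    return registry

# Placeholder module names for a system with 10 modules and one with 15.
systems = [
    ("first", [f"first_module_{i}" for i in range(1, 11)]),
    ("second", [f"second_module_{i}" for i in range(1, 16)]),
]
registry = build_registry(systems)
```

With such a registry, the number output by the first model identifies both the module and the system that hosts it.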

[0115] Figure 4 is a flowchart illustrating an operation and maintenance method provided in an embodiment of this application. As shown in Figure 4, based on the second technical solution in Figure 3b, this application provides an operation and maintenance method applied to the first operation and maintenance system, which mainly includes the following steps:

[0116] Step S410: Obtain the user's input question.

[0117] In one example, the input question is presented in natural language (audio, video, etc.), and includes the operation and maintenance task and the parameter information for performing the task; for example, the input question is "I need to view the routing information of device name HW002," where "view routing information" is the operation and maintenance task, and "device name HW002" is the parameter information. In other examples, the input question can also be presented in the form of user-input text, images, scripts, etc.

[0118] Step S420: Input the input question into the first model to output the identifier of the target operation and maintenance module.

[0119] It should be noted that the function of the first model is to identify, based on the input question, which specific operations and maintenance (O&M) module from multiple O&M modules across multiple O&M systems is responsible for executing the O&M task corresponding to the input question. This O&M module is the target O&M module. Optionally, to distinguish between different O&M modules, different identifiers can be set for each module. For example, each O&M module can be numbered, in which case the first model outputs the target O&M module's number; alternatively, each O&M module can be given a different name, in which case the first model outputs the target O&M module's name.

[0120] Step S430: Based on the parameter information indicated by the input question, call the target operation and maintenance module to obtain the answer to the input question.
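Steps S410 to S430 can be sketched end to end as follows. The keyword lookup stands in for the trained first model, the regular expression stands in for parameter extraction, and the module table stands in for the API interfaces of the operation and maintenance modules; all of these are assumptions made for illustration.

```python
import re

def first_model(question):
    """Stand-in classifier: keyword lookup instead of a trained model."""
    keywords = {"routing": 2, "arp": 1}
    lowered = question.lower()
    for word, module_id in keywords.items():
        if word in lowered:
            return module_id
    raise LookupError("no matching operation and maintenance module")

def extract_parameters(question):
    """Stand-in parameter extraction: pull the device name out of the question."""
    match = re.search(r"device name (\w+)", question)
    return {"device_name": match.group(1)} if match else {}

# Hypothetical modules keyed by identifier; each returns a canned answer.
MODULES = {
    1: lambda device_name: f"ARP entries of {device_name}",
    2: lambda device_name: f"routing information of {device_name}",
}

def answer(question):
    module_id = first_model(question)        # S420: identify the target module
    params = extract_parameters(question)    # parameter information in the question
    return MODULES[module_id](**params)      # S430: call the target module

# S410: obtain the user's input question.
result = answer("I need to view the routing information of device name HW002")
```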

[0121] In this embodiment, the first operation and maintenance system deploys a first model, which can identify corresponding identifiers based on user input questions. The first model is also trained using a training dataset related to the second operation and maintenance system, so the target operation and maintenance module identified by the first model can be an operation and maintenance module within either the first or second operation and maintenance system. Furthermore, users can use the first operation and maintenance system to operate and maintain network devices managed by other operation and maintenance systems, improving the convenience of user operation and maintenance work.

[0122] It is understood that the operation and maintenance method provided in this application embodiment is used in a first operation and maintenance system, which deploys a first model. The first model is trained on a training dataset related to a second operation and maintenance system, so the first model can identify the operation and maintenance modules in the second operation and maintenance system. Therefore, users can operate and maintain network devices managed by the second operation and maintenance system through the first operation and maintenance system. The "second operation and maintenance system" can be any operation and maintenance system other than the first operation and maintenance system, and the number of "second operation and maintenance systems" can be one or more; this application embodiment does not impose any limitation. Users can not only operate and maintain devices managed by the first operation and maintenance system through the first operation and maintenance system, but also operate and maintain network devices managed by one or more operation and maintenance systems other than the first operation and maintenance system through the first operation and maintenance system.

[0123] In one possible implementation, the number of parameters in the first model is less than a first threshold.

[0124] Generally, users have limited computing resources and can only train neural network models with a small number of parameters. The number of parameters in the first model is less than a first threshold, such as 1 billion (B), making the first model a small model, for example a BERT model. The user's existing computing resources can therefore meet the requirements for training such a model, and even with the limited computing resources on the user's live network, the user can train the first model based on the training dataset related to the second operation and maintenance system.

[0125] In one possible implementation, the method further includes: inputting an input question into a second model, the second model generating target parameter information based on the parameter information indicated by the input question, wherein the number of parameters in the second model is greater than a second threshold; and invoking a target operation and maintenance module based on the parameter information indicated by the input question to obtain the answer to the input question, including: invoking the target operation and maintenance module based on the target parameter information to obtain the answer to the input question.

[0126] In this implementation, the number of parameters in the second model is greater than the second threshold. The second threshold and the first threshold can be the same or different thresholds. Because the second model has more parameters than the second threshold, it can be, for example, a large language model. This gives the second model excellent context learning capabilities, allowing it to understand the input question well and generate highly accurate target parameter information.
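By way of illustration only, the cooperation between the first model and the second model described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: the names `answer_question`, `stub_first_model`, `stub_second_model`, `query_arp`, and `MODULES`, as well as the identifier "3", are hypothetical, and both models are replaced by stubs so that the example runs on its own.

```python
from typing import Callable, Dict

Params = Dict[str, str]


def answer_question(
    question: str,
    first_model: Callable[[str], str],
    second_model: Callable[[str], Params],
    modules: Dict[str, Callable[[Params], str]],
) -> str:
    """Identify the target module, generate target parameters, call the module."""
    module_id = first_model(question)        # small model: question -> identifier
    target_params = second_model(question)   # large model: question -> parameters
    if module_id not in modules:
        raise KeyError(f"no O&M module registered for identifier {module_id!r}")
    return modules[module_id](target_params)


# Hypothetical stand-ins so the sketch runs without any trained model.
def stub_first_model(question: str) -> str:
    return "3" if "ARP" in question else "5"


def stub_second_model(question: str) -> Params:
    return {"deviceName": "leaf211", "type": "dynamic"}


def query_arp(params: Params) -> str:
    return f"ARP entries ({params['type']}) for {params['deviceName']}"


MODULES = {"3": query_arp}

answer = answer_question(
    "View the dynamic ARP entry for device leaf211",
    stub_first_model,
    stub_second_model,
    MODULES,
)
print(answer)
```

Keeping the identifier lookup separate from parameter generation mirrors the division of labor in this implementation: the small model only selects the module, while the larger model handles the free-form parameter extraction.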

[0127] In one possible implementation, inputting the input question into the second model, so that the second model generates the target parameter information based on the parameter information indicated by the input question, includes: determining, from at least one parameter extraction example, a target parameter extraction example corresponding to the input question; and using the target parameter extraction example and the input question as input to the second model, so that the second model outputs the target parameter information with reference to the target parameter extraction example.

[0128] Figure 5 is a schematic diagram illustrating how the second model generates target parameter information, as provided in an embodiment of this application. As shown in Figure 5, a target parameter extraction example includes an example input question and example target parameter information. In one example, the example input question is "View the current static ARP entry for device 10.78.119.211", and the example target parameter information is "{"deviceName":"10.78.119.211","updateTime":"current","type":"static"}".

[0129] When the user inputs the question "View the dynamic ARP entry for device leaf211 at 9 AM", the second model refers to the target parameter extraction example and extracts the target parameter information "{"deviceName":"leaf211","updateTime":"9 AM","type":"dynamic"}". The target parameter extraction example serves as a reference for the second model when it extracts the target parameter information from the input question, improving the accuracy of parameter extraction by the second model.
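The example-guided extraction described above can be sketched as a one-shot prompt. The following is a minimal sketch, assuming a word-overlap heuristic for choosing the target parameter extraction example and a stub in place of the second model; `select_example`, `build_prompt`, `extract_parameters`, `stub_llm`, the prompt wording, and the second (routing) example are hypothetical and are not specified by this application.

```python
import json
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, Dict[str, str]]  # (example input question, example parameters)

EXAMPLES: List[Example] = [
    ("View the current static ARP entry for device 10.78.119.211",
     {"deviceName": "10.78.119.211", "updateTime": "current", "type": "static"}),
    # Hypothetical second example, added only so that selection has a choice.
    ("Show routing table for device spine232", {"deviceName": "spine232"}),
]


def select_example(question: str, examples: List[Example]) -> Example:
    """Pick the example sharing the most words with the question (assumed heuristic)."""
    q_words = set(question.lower().split())
    return max(examples, key=lambda ex: len(q_words & set(ex[0].lower().split())))


def build_prompt(question: str, example: Example) -> str:
    ex_question, ex_params = example
    return (
        "Extract the parameters from the question as JSON.\n"
        f"Question: {ex_question}\n"
        f"Parameters: {json.dumps(ex_params)}\n"
        f"Question: {question}\n"
        "Parameters:"
    )


def extract_parameters(
    question: str, examples: List[Example], llm: Callable[[str], str]
) -> Dict[str, str]:
    prompt = build_prompt(question, select_example(question, examples))
    return json.loads(llm(prompt))


def stub_llm(prompt: str) -> str:
    # Stand-in for the second model; a real large language model would be called here.
    return '{"deviceName": "leaf211", "updateTime": "9 AM", "type": "dynamic"}'


question = "View the dynamic ARP entry for device leaf211 at 9 AM"
prompt = build_prompt(question, select_example(question, EXAMPLES))
target_params = extract_parameters(question, EXAMPLES, stub_llm)
print(target_params)
```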

[0130] In one possible implementation, the method further includes: obtaining a sample set, which includes multiple parameter extraction samples; performing clustering processing on the multiple parameter extraction samples in the sample set, selecting at least one parameter extraction sample from each category, and obtaining the above-mentioned at least one parameter extraction example.

[0131] Figure 6 is a schematic diagram illustrating how at least one parameter extraction example is obtained through clustering, as provided in an embodiment of this application. As shown in Figure 6, in one example, the sample set includes parameter extraction samples 1 to 8. The parameter extraction samples in the sample set are clustered to obtain multiple categories, each containing at least one parameter extraction sample. For example, category A includes parameter extraction samples 1, 2, 5, and 7, and category B includes parameter extraction samples 3, 4, 6, and 8. At least one parameter extraction sample is selected from each category to obtain the at least one parameter extraction example; for instance, selecting the first parameter extraction sample in each category yields parameter extraction sample 1, parameter extraction sample 3, and so on. This preserves the diversity of the parameter extraction examples while reducing their number, so that the input text length limit of the large language model is met.
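The clustering-then-selection step can be sketched as follows. This application does not name a clustering algorithm, so the sketch assumes a simple greedy grouping by Jaccard similarity of question words; `jaccard`, `cluster_samples`, `pick_examples`, the 0.5 threshold, and the sample questions are all hypothetical and were chosen so that the grouping matches categories A and B in the example above.

```python
from typing import Dict, List

Sample = Dict[str, object]  # {"id": int, "question": str}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0


def cluster_samples(samples: List[Sample], threshold: float = 0.5) -> List[List[Sample]]:
    """Greedy clustering: a sample joins the first cluster whose seed is similar enough."""
    clusters: List[List[Sample]] = []
    for sample in samples:
        tokens = set(str(sample["question"]).lower().split())
        for cluster in clusters:
            seed_tokens = set(str(cluster[0]["question"]).lower().split())
            if jaccard(tokens, seed_tokens) >= threshold:
                cluster.append(sample)
                break
        else:
            clusters.append([sample])  # no similar cluster found: start a new one
    return clusters


def pick_examples(clusters: List[List[Sample]]) -> List[Sample]:
    """Select the first sample of each category as a parameter extraction example."""
    return [cluster[0] for cluster in clusters]


SAMPLES: List[Sample] = [
    {"id": 1, "question": "view arp entry for device leaf1"},
    {"id": 2, "question": "view arp entry for device leaf2"},
    {"id": 3, "question": "show routing table of vrf red"},
    {"id": 4, "question": "show routing table of vrf blue"},
    {"id": 5, "question": "view arp entry for device leaf5"},
    {"id": 6, "question": "show routing table of vrf green"},
    {"id": 7, "question": "view arp entry for device leaf7"},
    {"id": 8, "question": "show routing table of vrf gray"},
]

clusters = cluster_samples(SAMPLES)
cluster_ids = [[s["id"] for s in c] for c in clusters]
example_ids = [s["id"] for s in pick_examples(clusters)]
print(cluster_ids, example_ids)
```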

[0132] In one possible implementation, the method further includes: obtaining a description file of a second operation and maintenance system, the description file indicating parameter information of at least one operation and maintenance module in the second operation and maintenance system, the operation and maintenance module representing a functional module that performs operation and maintenance tasks; invoking a second model based on the description file to generate operation and maintenance problems; and obtaining a training dataset based on the operation and maintenance problems. Optionally, the number of parameters in the second model is greater than a second threshold.

[0133] In this implementation, the training dataset for the first model is obtained using the second model, whose number of parameters is greater than the second threshold; the second model is, for example, a large language model. The training dataset generated by the second model is diverse, which helps ensure the accuracy of the first model's output. Furthermore, the operation and maintenance problems generated by the large language model are closer to colloquial natural-language usage, so the first model recognizes colloquial input questions more accurately.

[0134] Figure 7 is a flowchart of a model training method provided in an embodiment of this application. As shown in Figure 7, this application provides a model training method applied to the aforementioned first operation and maintenance system, with the aim of training the first model's ability to recognize the different operation and maintenance modules in the system. The model training method mainly includes the following steps:

[0135] Step S710: Obtain the description file of the second operation and maintenance system. The description file indicates the parameter information of at least one operation and maintenance module in the second operation and maintenance system, and the operation and maintenance module represents the functional module that performs operation and maintenance tasks.

[0136] Step S720: Based on the description file, call the second model to generate operation and maintenance problems to obtain the training dataset. The number of parameters in the second model is greater than a second threshold.

[0137] Step S730: Based on the training dataset, train the first model to obtain the trained first model. The trained first model is used to identify the identifier of the target operation and maintenance module corresponding to the input problem. The target operation and maintenance module represents the functional module that performs the operation and maintenance task represented by the input problem.

[0138] Figure 8a is a schematic diagram illustrating the training of the first model based on a training dataset generated by the second model, as provided in an embodiment of this application. As shown in Figure 8a, the second model is called based on the description file to generate operation and maintenance problems; the training dataset is obtained based on the operation and maintenance problems; and the first model is trained based on the training dataset.

[0139] Figure 8b is a schematic diagram illustrating the training process of the first model provided in an embodiment of this application. As shown in Figure 8b, the operation and maintenance (O&M) problems in the training dataset are used as training samples, and the ground truth values of the corresponding O&M modules are used as labels. An O&M problem is input into the first model, and the first model outputs the predicted value of the O&M module. The loss value between the ground truth value and the predicted value of the O&M module is calculated. When the loss value meets the threshold requirement, training on the current training sample ends and training continues with the next training sample. When the loss value does not meet the threshold requirement, the parameters of the first model are adjusted. Optionally, each training step can use a single sample or multiple samples: the training process can be a batch training process (the parameters of the first model are adjusted after a batch of samples is processed) or a single-sample training process (the parameters of the first model are adjusted after each sample is processed).
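The single-sample variant of this training process can be sketched as follows. This is a minimal sketch only: a bag-of-words linear softmax classifier stands in for the first model, and `featurize`, `predict_probs`, `train`, `classify`, the learning rate, and the loss threshold of 0.1 are assumptions made for illustration rather than details of this application.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Weights = DefaultDict[Tuple[str, str], float]  # (label, token) -> weight


def featurize(question: str) -> List[str]:
    return question.lower().split()


def predict_probs(weights: Weights, labels: List[str], tokens: List[str]) -> List[float]:
    """Softmax over per-label scores (sum of token weights)."""
    scores = [sum(weights[(label, t)] for t in tokens) for label in labels]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples, labels, lr=0.5, loss_threshold=0.1, max_epochs=200) -> Weights:
    weights: Weights = defaultdict(float)
    for _ in range(max_epochs):
        updated = False
        for question, label in samples:
            tokens = featurize(question)
            probs = predict_probs(weights, labels, tokens)
            loss = -math.log(probs[labels.index(label)])  # cross-entropy loss
            if loss <= loss_threshold:
                continue  # loss meets the threshold: move on to the next sample
            updated = True
            for i, candidate in enumerate(labels):
                grad = probs[i] - (1.0 if candidate == label else 0.0)
                for t in tokens:
                    weights[(candidate, t)] -= lr * grad  # adjust the parameters
        if not updated:
            break  # every sample already meets the threshold
    return weights


def classify(weights: Weights, labels: List[str], question: str) -> str:
    probs = predict_probs(weights, labels, featurize(question))
    return labels[probs.index(max(probs))]


LABELS = ["3", "5"]
TRAINING = [
    ("query the ARP entry for IP address 13.107.184.60", "3"),
    ("query the dynamic ARP entry for VRF name _public_ on device spine232", "5"),
]
weights = train(TRAINING, LABELS)
print(classify(weights, LABELS, "query the ARP entry for IP address 10.0.0.1"))
```

A batch variant would accumulate the gradients over several samples before applying a single parameter update.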

[0140] In one possible implementation, the aforementioned description file includes the parameters of the operation and maintenance module and parameter value examples. Calling the second model based on the description file to generate an operation and maintenance problem includes: selecting at least one parameter from the parameters in the description file to obtain a parameter group; assigning values to the parameters in the parameter group according to the parameter value examples in the description file to obtain parameter group information; and generating a prompt based on the parameter group information, the prompt serving as input to the second model to generate the operation and maintenance problem.

[0141] Figure 9 is a schematic diagram illustrating the determination of parameter group information provided in an embodiment of this application. As shown in Figure 9, in one example, the description file includes the parameter definitions and parameter value examples for all parameters of the operation and maintenance module. Selecting some of the parameters yields at least one parameter group. The parameters in each parameter group are then assigned values based on the parameter value examples to obtain the parameter group information. Parameter definitions include, for example, deviceName: device name; updateTime: table entry update time; ...; vrfName: VRF name. Parameter value examples for the device name include 10.78.119.211, leaf211, etc. Parameter groups include, for example, {"deviceName","type","ipAddr","macAddr"}. Parameter group information includes, for example:

[0142] {"deviceName":"10.78.119.211","type":"interface","ipAddr":"10.141.152.66","macAddr":"0001-0034-5573"} etc.

[0143] In this implementation, the description file of the second operation and maintenance system typically contains many parameters, but users do not provide all of them when asking questions in natural language. By selecting some parameters from the description file and assigning values to them, the parameter group information that is input to the large language model is obtained. The resulting operation and maintenance problems contain fewer parameters and better reflect how users express themselves in natural language, which improves the recognition accuracy of the first model.
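Selecting a parameter group, assigning values, and generating the prompt can be sketched as follows. This is a minimal sketch under stated assumptions: the layout of `DESCRIPTION`, the helper names `make_parameter_group_info` and `build_generation_prompt`, the prompt wording, and the group size limits are hypothetical; the parameter names and value examples are taken from the examples in this embodiment.

```python
import json
import random
from typing import Dict, List

# Hypothetical in-memory form of a description file: parameter -> value examples.
DESCRIPTION: Dict[str, Dict[str, List[str]]] = {
    "value_examples": {
        "deviceName": ["10.78.119.211", "leaf211"],
        "updateTime": ["current", "9 AM"],
        "type": ["interface", "dynamic", "static"],
        "ipAddr": ["10.141.152.66"],
        "macAddr": ["0001-0034-5573"],
        "vrfName": ["_public_"],
    }
}


def make_parameter_group_info(
    description: Dict[str, Dict[str, List[str]]],
    rng: random.Random,
    min_size: int = 1,
    max_size: int = 4,
) -> Dict[str, str]:
    """Select some parameters, then assign each one a value from its examples."""
    names = sorted(description["value_examples"])
    size = rng.randint(min_size, min(max_size, len(names)))
    group = rng.sample(names, size)  # the parameter group
    return {name: rng.choice(description["value_examples"][name]) for name in group}


def build_generation_prompt(info: Dict[str, str]) -> str:
    """Prompt given to the second model to generate one O&M problem (assumed wording)."""
    return (
        "Write one natural-language operation and maintenance question "
        "that uses exactly these parameters: " + json.dumps(info)
    )


rng = random.Random(0)  # fixed seed so repeated runs select the same group
info = make_parameter_group_info(DESCRIPTION, rng)
generation_prompt = build_generation_prompt(info)
print(generation_prompt)
```

Sampling only a subset of the parameters per prompt keeps the generated problems short, in line with the observation above that users do not supply every parameter.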

[0144] In one possible implementation, the description file also includes problem examples. The process of calling the second model to generate operation and maintenance problems based on the description file includes: calling the second model based on the description file and the problem examples, and generating operation and maintenance problems with reference to the problem examples.

[0145] Figure 10 is a schematic diagram, provided in an embodiment of this application, illustrating how the second model generates an operation and maintenance problem with reference to a problem example. As shown in Figure 10, in one example, the problem example is "Retrieve ARP entries for device 10.78.119.211, updated at 9:00 AM", and the parameter group information is "{"deviceName":"52.159.192.21","updateTime":"7:00 AM"}". With reference to this problem example, the operation and maintenance problem generated by the second model is "Retrieve ARP entries for device 52.159.192.21, updated at 7:00 AM".

[0146] In this implementation, the first operations and maintenance (O&M) system can obtain a description file of the second O&M system set by the user from the client. This description file includes some problem examples set by the user through the client. The first O&M system can input the problem examples into a large language model, so that the large language model can generate O&M problems with reference to the problem examples. Since the problem examples are set by the user, the O&M problems generated by the large language model are more in line with the user's specific language habits.

[0147] In one possible implementation, the training dataset also includes modified operation and maintenance (O&M) problems (training samples other than those generated by the large language model), which are obtained by modifying the parameter values in the O&M problems. The method further includes: replacing parameter values in the O&M problems according to the parameter value examples in the description file to obtain the modified O&M problems; and/or modifying parameter values in the O&M problems according to preset rules to obtain the modified O&M problems.

[0148] Figure 11 is a schematic diagram illustrating the generation of modified O&M problems from O&M problems, as provided in an embodiment of this application. As shown in Figure 11, in one example, an O&M problem is: "I need to view the ARP table entry for the interface on 10.78.119.211, the MAC address corresponding to IP address 10.141.152.66 is 0001-0034-5573". Here, the IP address and MAC address are replaced with parameter values generated according to preset rules, and the device name is replaced with a parameter value example from the description file. The resulting modified O&M problem could be: "I need to view the dynamic ARP table entry for 222.184.125.104, the MAC address corresponding to IP address 66.227.12.198 is 1B17-A010-11B1".

[0149] In this implementation, modified operation and maintenance (O&M) problems are obtained by altering the parameter values in the O&M problems. Adding these modified O&M problems to the training dataset ensures the diversity of the training dataset.
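The two modification routes (replacement from parameter value examples, and rule-based generation) can be sketched together. This is a minimal sketch, assuming that the "preset rules" are random generation of a dotted-quad IP address and of a MAC address in the XXXX-XXXX-XXXX form used in the examples; `random_ip`, `random_mac`, `modify_problem`, and the placeholder token are hypothetical.

```python
import random
import re
from typing import List

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAC_RE = re.compile(r"\b[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}\b")
PLACEHOLDER = "{DEVICE}"  # protects the device name while other values are replaced


def random_ip(rng: random.Random) -> str:
    """Assumed preset rule: four random octets in 1..254."""
    return ".".join(str(rng.randint(1, 254)) for _ in range(4))


def random_mac(rng: random.Random) -> str:
    """Assumed preset rule: three random 16-bit groups in upper-case hex."""
    return "-".join(f"{rng.randint(0, 0xFFFF):04X}" for _ in range(3))


def modify_problem(
    problem: str, device_name: str, device_examples: List[str], rng: random.Random
) -> str:
    # The device name may itself look like an IP address, so set it aside first.
    text = problem.replace(device_name, PLACEHOLDER)
    # Rule-based replacement of the remaining IP and MAC addresses.
    text = IP_RE.sub(lambda m: random_ip(rng), text)
    text = MAC_RE.sub(lambda m: random_mac(rng), text)
    # Replacement of the device name with another value example from the description file.
    candidates = [d for d in device_examples if d != device_name]
    return text.replace(PLACEHOLDER, rng.choice(candidates))


original = (
    "I need to view the ARP table entry for the interface on 10.78.119.211, "
    "the MAC address corresponding to IP address 10.141.152.66 is 0001-0034-5573"
)
modified = modify_problem(
    original, "10.78.119.211", ["10.78.119.211", "leaf211"], random.Random(7)
)
print(modified)
```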

[0150] In one possible implementation, the method further includes: determining whether there are a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; if so, a first training subset is also set in the training dataset, where the operation and maintenance problems in the first training subset contain specific information used to specify the functional module that performs the operation and maintenance task. Optionally, a second training subset may also be set in the training dataset, where the operation and maintenance problems in the second training subset do not contain the specific information.

[0151] Figure 12 is a schematic diagram illustrating the second training subset and the first training subset provided in an embodiment of this application. As shown in Figure 12, in one example, the second training subset includes training sample pair 1, training sample pair 2, etc. In training sample pair 1, the training sample is an operation and maintenance problem, such as "querying the ARP entry for IP address 13.107.184.60, MAC address 6D03-2651-65B6 at the current time," and the label is the interface number, such as "3"; in training sample pair 2, the training sample is, for example, "querying the dynamic ARP entry for VRF name _public_ on device spine232," and the label is, for example, "5." The first training subset includes training sample pair 3, training sample pair 4, etc. In training sample pair 3, the training sample might be "querying the ARP entry for IP address 13.107.184.60 and MAC address 6D03-2651-65B6 at the current time point via the XX system," with a label such as "3." In training sample pair 4, the training sample might be "querying the dynamic ARP entry for VRF name _public_ on device spine232 via the YY system," with a label such as "5." The XX system and the YY system represent two different operation and maintenance systems.

[0152] In this implementation, if the second and first operation and maintenance systems contain operation and maintenance modules that perform the same operation and maintenance task, at least one of the second and first training subsets can be set in the training dataset. The operation and maintenance problems in the second training subset do not include system-specified information, while those in the first training subset do. Thus, regardless of whether the input question contains system-specified information, the first model can identify the target operation and maintenance module corresponding to the input question, and can do so stably.
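The construction of the two training subsets can be sketched as follows. This is a minimal sketch, assuming that the system-specified information is appended as a "via the ... system" phrase as in the examples above; `build_subsets`, `SYSTEM_OF_LABEL`, and the mapping from labels to the XX and YY systems are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (operation and maintenance problem, label)


def build_subsets(
    samples: List[Sample], system_of_label: Dict[str, str]
) -> Tuple[List[Sample], List[Sample]]:
    """Return (first_subset, second_subset).

    The second subset keeps the problems unchanged (no system-specified
    information); the first subset appends the name of the system whose
    module should perform the task.
    """
    second_subset = list(samples)
    first_subset = [
        (f"{problem} via the {system_of_label[label]} system", label)
        for problem, label in samples
    ]
    return first_subset, second_subset


BASE_SAMPLES: List[Sample] = [
    ("querying the ARP entry for IP address 13.107.184.60, "
     "MAC address 6D03-2651-65B6 at the current time", "3"),
    ("querying the dynamic ARP entry for VRF name _public_ on device spine232", "5"),
]
SYSTEM_OF_LABEL = {"3": "XX", "5": "YY"}  # assumed mapping, for illustration only

first_subset, second_subset = build_subsets(BASE_SAMPLES, SYSTEM_OF_LABEL)
training_dataset = first_subset + second_subset
print(len(training_dataset))
```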

[0153] In one possible implementation, the method further includes: obtaining a validation dataset; and validating the trained first model based on the validation dataset to obtain a validation result. The purpose of using the validation dataset to validate the first model is to verify the recognition accuracy of the first model, thereby verifying whether the first model has been successfully trained.

[0154] It should be noted that the validation dataset includes validation samples, each labeled (i.e., the ground truth value of the operations and maintenance module). After each sample is input into the first model, the first model outputs the predicted value of the operations and maintenance module. When the predicted value of the operations and maintenance module matches the ground truth value, the first model is considered to have correctly identified it. When the predicted value of the operations and maintenance module does not match the ground truth value, the first model is considered to have misidentified it. The inconsistency between the predicted value and the ground truth value of the operations and maintenance module falls into two categories. The first category is: if the predicted value and the ground truth value are different operations and maintenance modules, then the "operations and maintenance module corresponding to the predicted value" and the "operations and maintenance module corresponding to the ground truth value" are considered to be in conflict. Here, "conflict" means that the first model cannot distinguish between the "operations and maintenance module corresponding to the predicted value" and the "operations and maintenance module corresponding to the ground truth value."

[0155] Figure 13a is a schematic diagram illustrating the verification results provided in an embodiment of this application. As shown in Figure 13a, in one example, the label (i.e., the ground truth value of the operation and maintenance module) is operation and maintenance module A. The verification results are discussed below by way of example. Suppose there are M validation samples labeled with operation and maintenance module A.

[0156] In the first scenario, the recognition accuracy of operation and maintenance module A is determined based on the verification results. For example, if the first model outputs operation and maintenance module A as the predicted value for m1 of the M verification samples, the recognition accuracy of operation and maintenance module A is m1 / M.

[0157] In the second scenario, the recognition error rate of operation and maintenance module A is determined based on the verification results. For example, if the predicted value output by the first model is incorrect for m2 of the M verification samples, the recognition error rate of operation and maintenance module A is m2 / M, where M = m1 + m2.

[0158] In the third scenario, the conflict error rate of operation and maintenance module A is determined based on the verification results. For example, if the first model outputs operation and maintenance module B as the predicted value for m3 of the M verification samples, the conflict error rate between operation and maintenance module A and operation and maintenance module B is m3 / M. Since the third scenario is part of the second scenario, this conflict error rate can also be expressed as m3 / m2.

[0159] Similarly, if the first model outputs operation and maintenance module C as the predicted value for m4 of the M verification samples, the conflict error rate between operation and maintenance module A and operation and maintenance module C is m4 / M. Since the third scenario is part of the second scenario, this conflict error rate can also be expressed as m4 / m2.
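The quantities m1 / M, m2 / M, m3 / M, and m3 / m2 can be computed from the predictions as sketched below. This is a minimal sketch; the function name `validation_metrics`, the dictionary keys, and the sample counts (7, 2, and 1 out of 10) are hypothetical and serve only to make the arithmetic concrete.

```python
from collections import Counter
from typing import Dict, List


def validation_metrics(truth: str, predictions: List[str]) -> Dict[str, object]:
    """Metrics for validation samples that all carry the same label `truth`."""
    M = len(predictions)
    m1 = sum(1 for p in predictions if p == truth)   # correctly identified
    m2 = M - m1                                      # misidentified
    wrong = Counter(p for p in predictions if p != truth)
    return {
        "accuracy": m1 / M,                          # m1 / M
        "error_rate": m2 / M,                        # m2 / M
        # conflict error rate per conflicting module, relative to all samples
        "conflict_over_M": {mod: c / M for mod, c in wrong.items()},
        # the same counts, relative to the misidentified samples only
        "conflict_over_m2": {mod: c / m2 for mod, c in wrong.items()} if m2 else {},
    }


# Hypothetical predictions for M = 10 samples labeled with module A.
predictions = ["A"] * 7 + ["B"] * 2 + ["C"]
metrics = validation_metrics("A", predictions)
print(metrics)
```

With these counts, m1 = 7, m2 = 3, m3 = 2, and m4 = 1, so the conflict error rate between modules A and B is 2 / 10 relative to all samples and 2 / 3 relative to the misidentified samples.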

[0160] In one possible implementation, the verification results include the recognition accuracy (e.g., m1 / M) and conflict error rate (e.g., m3 / M, m3 / m2, etc.) of the module to be identified (e.g., operation and maintenance module A). The conflict error rate represents the error rate at which the module to be identified (e.g., operation and maintenance module A) is identified as a conflicting module (e.g., operation and maintenance module B, operation and maintenance module C, etc.), where a conflicting module represents an operation and maintenance module other than the module to be identified. When the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, the ratio of training samples corresponding to the module to be identified and the conflicting modules in the training dataset is adjusted to obtain an adjusted training dataset; based on the adjusted training dataset, the first model is trained again.

[0161] In this implementation, when the module to be identified is identified as a conflicting module, the ratio of training samples corresponding to the module to be identified and the conflicting modules in the training dataset can be adjusted, and the first model can be retrained. This reduces the likelihood of the module to be identified being identified as a conflicting module, ensuring the accuracy of the first model's identification results.

[0162] Figure 13b is a schematic diagram illustrating the adjustment of the training dataset provided in an embodiment of this application. As shown in Figure 13b, in one example, the module to be identified is operation and maintenance module A, and the conflicting module is operation and maintenance module B. A verification dataset is obtained and used to verify the trained first model, yielding the verification results. From the verification results, it is determined whether the recognition accuracy of operation and maintenance module A is lower than a first preset threshold, such as 90%. If not, operation and maintenance module A is considered successfully identified. If so, it is further determined whether the conflict error rate of operation and maintenance module A (the rate at which operation and maintenance module A is identified as operation and maintenance module B) is higher than a second preset threshold. The second preset threshold can be a threshold for m3 / m2, such as 60%, or a threshold for m3 / M, such as 6%. If the conflict error rate is not higher than the second preset threshold, the cause of the errors for operation and maintenance module A is not a business function conflict, so training samples of operation and maintenance module A are added to the training dataset and the first model is trained again. If the conflict error rate is higher than the second preset threshold, the cause of the errors is a business function conflict, so the ratio of training samples of operation and maintenance module A and operation and maintenance module B in the training dataset is adjusted, for example by reducing the proportion of samples of the lower-priority module, and the first model is trained again.
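The decision procedure above can be sketched as follows. This is a minimal sketch: the default thresholds reuse the example values of 90% and 6% (the latter corresponding to m3 / M), while `decide_adjustment`, `reduce_low_priority`, the returned action names, and the keep ratio are hypothetical. How the sample ratio is actually adjusted is not specified beyond reducing the proportion of lower-priority samples, so the truncation used here is only one possible choice.

```python
from typing import List, Tuple

Sample = Tuple[str, str]  # (operation and maintenance problem, module label)


def decide_adjustment(
    accuracy: float,
    conflict_rate: float,
    accuracy_threshold: float = 0.90,
    conflict_threshold: float = 0.06,
) -> str:
    """Return the action to take for the module to be identified."""
    if accuracy >= accuracy_threshold:
        return "identified"        # accuracy is not below the first preset threshold
    if conflict_rate > conflict_threshold:
        return "rebalance"         # business function conflict: adjust the sample ratio
    return "add_samples"           # no conflict: add samples of the module itself


def reduce_low_priority(
    dataset: List[Sample], low_priority_label: str, keep_ratio: float
) -> List[Sample]:
    """Keep only the first `keep_ratio` share of the low-priority module's samples."""
    total = sum(1 for _, label in dataset if label == low_priority_label)
    quota = round(total * keep_ratio)
    kept: List[Sample] = []
    seen = 0
    for sample in dataset:
        if sample[1] == low_priority_label:
            if seen >= quota:
                continue           # drop low-priority samples beyond the quota
            seen += 1
        kept.append(sample)
    return kept


dataset = [("q1", "A"), ("q2", "B"), ("q3", "B"), ("q4", "A"), ("q5", "B"), ("q6", "B")]
action = decide_adjustment(accuracy=0.80, conflict_rate=0.10)
if action == "rebalance":
    dataset = reduce_low_priority(dataset, low_priority_label="B", keep_ratio=0.5)
print(action, dataset)
```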

[0163] It should be noted that two cases arise. In the first case, the user's input question contains information about a specific operation and maintenance (O&M) system. Generally, the business functions within a single O&M system do not conflict with one another, so neither do the O&M modules within that system, and the first model identifies the target O&M module with high accuracy. In the second case, the user's input question does not contain information about a specific O&M system, but it typically contains contextual information that allows the first model to accurately identify the target O&M module. For example, the input question might be, "I found that a blade server is malfunctioning, experiencing a long delay when reading data. This device number is HW002. Can you help me find the routing information for this device?" In this example, contextual information such as "blade server," "read data," "delay," "device number is HW002," and "routing information" allows the first model to identify the target O&M module as the "query routing information" O&M module in Company H's O&M system. The "query routing information" API in Company H's O&M system can then be called to obtain the relevant data and the answer to the input question.

[0164] In other words, despite the duplication of business functions within different operation and maintenance systems, the first model still achieves a high accuracy rate in identifying the target operation and maintenance module when combined with the context information of the input question. Conflicts in the identified operation and maintenance modules only occur when the user's input question has little or no context information. In such cases, adjusting the training dataset can further improve the accuracy of the first model. Adjusting the training dataset can be achieved by setting the priorities of different operation and maintenance systems and reducing the proportion of low-priority operation and maintenance modules in the training dataset.

[0165] To further illustrate the technical solutions of the embodiments of this application, based on the operation and maintenance method and model training method of the first operation and maintenance system described above, the embodiments of this application also provide examples of the architecture of the first operation and maintenance system at the hardware and software levels, so as to exemplarily illustrate the implementation of the technical solutions of the embodiments of this application.

[0166] Figure 14 is a schematic diagram of the hardware architecture of a first operation and maintenance system provided in an embodiment of this application. In conjunction with the first operation and maintenance system described above, Figure 14 illustrates by way of example the hardware implementation of the first operation and maintenance system. The architecture of this first operation and maintenance system may include terminal 1 and server 2. Server 2 may include one or more servers (Figure 14 shows one server as an example). Server 2 can provide the methods and / or apparatus provided in the embodiments of this application to one or more terminals.

[0167] Optionally, a relevant application may be installed on terminal 1. This application can receive user input questions through terminal 1 and send the input questions to server 2. Server 2 processes the input questions, obtains the corresponding answers, and sends the answers to terminal 1 to display the answers to the user.

[0168] It should be understood that in some optional implementations, terminal 1 can also implement the methods and / or apparatus of the embodiments of this application independently. That is, terminal 1 can complete the above-mentioned operation and maintenance methods on its own without the cooperation of server 2. In some other optional implementations, server 2 can implement the methods based on input questions that are stored locally or received from devices other than terminal 1, without the cooperation of terminal 1. The embodiments of this application are not limited in either respect.

[0169] The following describes the product form of terminal 1 in Figure 14. In this application embodiment, terminal 1 can be a mobile phone, a speaker, a robot, a watch with voice function, a tablet computer, a wearable device, an in-vehicle device, an augmented reality (AR) / virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), etc., and this application embodiment does not impose any restrictions on it.

[0170] The following describes the product form of server 2 in Figure 14. Server 2 can be any of various types of servers, such as an x86 architecture server, and specifically a rack server, blade server, high-density server, platform server, or high-performance server; this application embodiment does not limit the specific type of server. Furthermore, the server structure shown in Figure 14 does not constitute a limitation on the server structure. A server may include more or fewer components than shown, or combine certain components, or have different component arrangements.

[0171] Furthermore, server 2 can be configured as an independent physical server, a server cluster or distributed system consisting of multiple physical servers, or a cloud server or cloud server cluster deployed in several cloud data centers; the software can be an application implementing object control methods, etc., but is not limited to the above forms. Optionally, the method or apparatus of this application embodiment can be deployed on server 2 equipped with a GPU (Graphics Processing Unit) or an NPU (Neural-network Processing Unit).

[0172] Next, the communication connection method between terminal 1 and server 2 is described. For example, terminal 1 and server 2 are connected via a network, enabling terminal 1 to access the cloud management platform deployed on server 2. The network can be a wired network or a wireless network. For example, a wired network can be a cable network, fiber optic network, Digital Data Network (DDN), etc., while a wireless network can be a telecommunications network, intranet, Internet, Local Area Network (LAN), Wide Area Network (WAN), Wireless Local Area Network (WLAN), Metropolitan Area Network (MAN), Public Switched Telephone Network (PSTN), Bluetooth network, ZigBee network, Global System for Mobile Communications (GSM) network, Code Division Multiple Access (CDMA) network, General Packet Radio Service (GPRS) network, etc., or any combination thereof.

[0173] Understandably, a network can use any known network communication protocol to enable communication between different terminal layers and gateways. These network communication protocols can be various wired or wireless communication protocols, such as Ethernet, Universal Serial Bus (USB), FireWire, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), New Radio (NR), Bluetooth, Wireless Fidelity (Wi-Fi), and other communication protocols.

[0174] In one possible scenario, server 2 can serve as a cloud platform (e.g., a software platform based on virtualization technology). In practical use, server 2 can deploy a cloud management platform and a data center, allowing terminal 1 to interact with the cloud through the cloud management platform. Additionally, the data center can deploy nodes, which can be virtual machine instances, container instances, physical servers, etc.

[0175] In another possible scenario, the method provided in this application embodiment can be implemented by software. The software has a user terminal and a server terminal; terminal 1 is the user terminal running the software, and server 2 is the server terminal running the software. During the operation of the software on terminal 1, it can call the server terminal running on server 2 to implement the method provided in this application embodiment.

[0176] In other words, the method provided in this application embodiment can be applied to terminal 1 or server 2. In specific implementation, it can run as software on terminal 1 or server 2; for example, the software can be a service or an application. This application embodiment can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc., that perform specific tasks or implement specific abstract data types. This application embodiment can also be implemented in a distributed computing environment, where tasks are performed by remote processing devices connected via a communication network. In a distributed computing environment, program modules can reside in local and remote computer storage media, including storage devices.

[0177] Figure 15 is a schematic diagram of the software architecture of a first operation and maintenance system provided in an embodiment of this application. Based on the first operation and maintenance system described above, this application provides a software-level architecture diagram of that system. As shown in Figure 15, the first operation and maintenance system mainly includes a user-side device and a management device. The first operation and maintenance system communicates with one or more third-party systems. The user-side device can be deployed in, for example, terminal 1 shown in Figure 14, and the management device can be deployed in, for example, server 2 shown in Figure 14.

[0178] It is understood that the method provided in this application embodiment can be implemented by software. The user-side device refers to the user client of the software, and the management device refers to the server-side device of the software. Terminal 1 can run the user client of the software, and server 2 can run the server-side device of the software. While terminal 1 is running the user client of the software, it can call the server-side device running on server 2 to implement the method provided in this application embodiment.

[0179] In the first operation and maintenance system, the user-side device is used for human-computer interaction, including a "file import module" and a "dialogue module." The file import module is used to create and modify description files. Users can start and view the training task of the first model. Users can also create and modify description files, such as modifying interface definitions. An interface refers to the interface that calls the operation and maintenance system; an interface may be an API. The dialogue module is the natural language interaction entry point. Users input questions described in natural language through this dialogue module, and the results returned by the management device are presented to the user in the form of text, images, tables, and sound.

[0180] It's important to note that in practice, an operations and maintenance (O&M) system can execute multiple business functions, such as querying device ARP entries, routing information, and interface information. Let's assume each business function corresponds to an O&M module, and each O&M module corresponds to an interface. An interface is an API (Application Programming Interface) provided to the outside world. Therefore, calling an API interface actually means calling the corresponding O&M module to execute the relevant O&M task or business function.

[0181] Depending on the actual usage scenario, the "file import module" and the "dialogue module" may be deployed on the same user-side device on the same terminal, or they may be deployed on user-side devices on different terminals. The user-side device is open to the user. In the file import module, instructions such as corpus generation methods and illustrative examples can guide users on how to write high-quality description files to ensure the diversity of corpus generation. The file import module can also guide users on how to modify description files according to the conflict resolution methods described in this application embodiment, to resolve conflicts between the operation and maintenance functions of multiple operation and maintenance systems, and how to use the functions of each system in the event of conflicts between multiple system functions.

[0182] In this example, the corpus or QA corpus is a dataset. Based on the corpus, a training dataset can be obtained, as well as at least one of the parameter extraction examples mentioned above. For instance, the corpus is tabular data, facilitating data extraction. For example, the corpus includes multiple rows of information, where each row includes an operation and maintenance problem, the interface for that problem, and the parameter information within that problem. Extracting the operation and maintenance problem and its interface from the corpus yields the training data; the operation and maintenance problem is the training sample, and the interface is the label. Extracting the operation and maintenance problem and its parameter information from the corpus yields at least one parameter extraction example mentioned above; the parameter information is extracted from the operation and maintenance problem. These parameter extraction examples are also referred to as example corpus.
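The derivation of training data and parameter extraction examples from the corpus can be sketched as follows. This is a minimal illustration only: the row layout (question, interface number, parameter information) follows the description above, while the concrete rows and the helper name `split_corpus` are assumptions made for the example.

```python
# Hypothetical corpus rows: (O&M question, interface number, parameter information).
corpus = [
    ("Query the ARP entries of device 10.78.119.211", 3, {"deviceName": "10.78.119.211"}),
    ("Query the routing information of device 10.78.119.211", 2, {"deviceName": "10.78.119.211"}),
]


def split_corpus(rows):
    """Derive (sample, label) training pairs and parameter extraction examples."""
    # The question is the training sample and the interface is its label.
    training_data = [(question, interface) for question, interface, _ in rows]
    # The same question paired with its parameters is a parameter extraction example.
    extraction_examples = [(question, params) for question, _, params in rows]
    return training_data, extraction_examples


training_data, extraction_examples = split_corpus(corpus)
print(training_data[0])
print(extraction_examples[0])
```

Both outputs come from the same rows, so a single corpus can feed the first model (classification) and the large language model (parameter extraction).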

[0183] In the first operation and maintenance system, the management device is used to integrate the API interfaces (examples of interfaces) of third-party systems and to handle the user's input questions. The management device includes an "interface integration module", an "intelligent agent module", and a "general LLM module".

[0184] Furthermore, the interface integration module integrates the API interfaces of third-party systems, performing corpus generation, model training, and configuration generation based on the description files of each operation and maintenance system. For example, the corpus generation submodule generates the full QA corpus; the model training submodule trains the first model based on the QA corpus; and the configuration generation module generates relevant configuration files that guide the parameter extraction process of the large language model. These configuration files include interface definitions, question examples, parameter definitions, parameter examples, and example corpora. After combining this configuration file with the user's input question to form a prompt, the prompt is input into the large language model. The configuration file provides the large language model with relevant information about the second operation and maintenance system, such as information on its operation and maintenance functions and the parameter information for each function. The example corpus includes operation and maintenance questions and their corresponding parameters, serving as an example and reference for parameter extraction in the large language model.

[0185] Furthermore, the intelligent agent module includes an AI intelligent agent based on LLM, which is used to handle user questions. First, it performs intent recognition through the intent recognition submodule, extracts parameters through the parameter extraction submodule, and then calls a third-party system (an example of the second operation and maintenance system) through a plugin to obtain the operation and maintenance results returned by the third-party system and return the obtained operation and maintenance results to the user-side device.

[0186] Furthermore, the general LLM module is used to provide LLM services and provide an inference interface for business modules to use. Business modules use LLM capabilities by calling the inference interface.

[0187] Depending on the actual use case, the "interface integration module," "intelligent agent module," and "general LLM module" may be deployed on the same server or on different servers. In an interactive operation and maintenance architecture, a third-party system refers to an operation and maintenance system outside the first operation and maintenance system, which provides API interfaces for functions such as data query and business troubleshooting, and is integrated into the first operation and maintenance system.

[0188] It should be noted that, in this embodiment of the application, the operation and maintenance task executed by the first operation and maintenance system can be any operation and maintenance task, such as querying device ARP entries, querying routing information, or querying interface information. The following uses "querying device ARP entries" as an example to illustrate the interactive operation and maintenance method of this embodiment of the application.

[0189] The first operation and maintenance system of this application embodiment mainly includes two stages: a training stage and an inference stage. In the training stage, a training dataset related to the third-party system is constructed, and a first model is trained based on the training dataset. The first model then identifies interfaces in different operation and maintenance systems. In the inference stage, the first operation and maintenance system obtains the user's input question; inputs the input question into the first model to identify the target operation and maintenance module corresponding to the input question; inputs the input question into a second model to extract target parameter information from the input question; and calls the target operation and maintenance module based on the target parameter information to obtain the answer to the input question.

[0190] In this embodiment, a description file is first generated. This description file mainly includes interface definitions, problem examples, parameter definitions, and parameter examples. The first operation and maintenance system organizes the operation and maintenance modules of the second operation and maintenance system (for example, organizing the API interfaces of each operation and maintenance module of the second operation and maintenance system), models the description file, and generates plugins. The second operation and maintenance system is only one example; the first operation and maintenance system can also organize the operation and maintenance modules of one or more third-party operation and maintenance systems (referred to as "third-party systems"). A third-party system refers to any operation and maintenance system other than the first operation and maintenance system.

[0191] This explanation uses the API interfaces of each operation and maintenance module in the second operation and maintenance system as an example. Generally, third-party system API interfaces provide services using a RESTful architecture and the HTTP protocol. The purpose of the first operation and maintenance system organizing the operation and maintenance modules of the second operation and maintenance system is to determine the definition of the third-party system's API interfaces and to understand the business functions of the third-party system's APIs (i.e., operation and maintenance modules). The definition of a third-party system API includes a request message body and a response message body, as shown in Tables 1 and 2.

[0192] For example, the first operation and maintenance system can integrate n third-party systems, such as third-party system 1, third-party system 2, ..., third-party system n, where n is a positive integer. Each third-party system can operate and maintain multiple devices; for example, third-party system 1 can operate and maintain device 1 and device 2. A third-party system can perform multiple operation and maintenance tasks on a single device, such as querying the device's ARP table entries, querying routes, querying interfaces, etc. Third-party system 1 shown in Figure 4 can perform maintenance tasks 1 and 2 on device 1.

[0193] Taking maintenance task 1, "querying device ARP entries," as an example, the definition of the third-party system API includes a request message body and a response message body. In one example, the ARP entry query request message body (taking the actual request data as an example) is shown in Table 1, and the ARP entry query response message body (taking the actual response data as an example) is shown in Table 2.

[0194] Table 1 shows an example of an ARP entry query request message body.

[0195]

[0196] Table 2 shows an example of an ARP query response message body.

[0197]

[0198]

[0199] Since the API interfaces provided by third-party systems are often used for inter-machine communication, they contain many numerical parameters that are difficult to express well in natural language. Modeling is used to convert these parameters into text parameters that are easier to express in natural language. To distinguish them from the original APIs of third-party systems, the modeled API will be referred to as a description file below. Because a description file is a Language User Interface (LUI), it is also called a description file for LUI or simply LUI. For example, a description file includes interface definitions, problem examples, parameter definitions, and parameter value examples.

[0200] In one example, the maintenance task is to query the device's ARP table entries, and the description file of the modeled LUI is shown in Table 3. In Table 3, the format of the modeled description file is YAML (YAML Ain't Markup Language, originally Yet Another Markup Language). YAML is a highly readable data serialization format, mainly used for configuration files and data exchange.

[0201] Table 3. Examples of LUI description files after modeling.

[0202]

[0203]

[0204] The parameters in the description file of Table 3 are explained as follows: sysName: System name; priority: Priority; number: Interface number, which is unique within the third-party system; name: LUI name; description: LUI function description; questionExample: Example of a user question; parameters: Parameter definition; description: Parameter description; type: Parameter type; default: Default parameter; example: Parameter example; mapper: Parameter mapping, mapping natural language to enumeration definitions;

[0205] requiredParams: Required parameter.
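To make the field list above concrete, the following sketch expresses a description file as a Python dictionary and shows how `requiredParams` and `mapper` might be used. The field names come from the explanation above; every value, the mapper entries, and the helper names `missing_required` and `apply_mapper` are assumptions for illustration and do not reproduce Table 3.

```python
# Hypothetical description file content as a Python dict; all values are illustrative.
lui_description = {
    "sysName": "XX",
    "priority": 1,
    "number": 3,
    "name": "ARP entry query",
    "description": "Query the ARP entries of a device",
    "questionExample": "Query the dynamic ARP entries of device 10.78.119.211",
    "parameters": {
        "deviceName": {
            "description": "Device name",
            "type": "string",
            "example": ["10.78.119.211"],
        },
        "type": {
            "description": "ARP entry type",
            "type": "string",
            "default": "all",
            "example": ["dynamic", "static"],
            # Assumed enumeration codes; the real mapping is not given in the text.
            "mapper": {"dynamic": "D", "static": "S"},
        },
    },
    "requiredParams": ["deviceName"],
}


def missing_required(description, supplied):
    """Return the required parameter names absent from the supplied parameters."""
    return [name for name in description["requiredParams"] if name not in supplied]


def apply_mapper(description, supplied):
    """Map natural-language parameter values to their enumeration definitions."""
    mapped = {}
    for name, value in supplied.items():
        mapper = description["parameters"].get(name, {}).get("mapper", {})
        mapped[name] = mapper.get(value, value)
    return mapped


print(missing_required(lui_description, {"type": "dynamic"}))
print(apply_mapper(lui_description, {"deviceName": "10.78.119.211", "type": "dynamic"}))
```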

[0206] Optionally, the description files can be divided according to operation and maintenance tasks or functions, meaning that a description file is obtained for each operation and maintenance task, and this description file may describe that operation and maintenance task for multiple third-party systems. Alternatively, they can be divided according to third-party systems, meaning that a description file is obtained for each third-party system, or each operation and maintenance task of a third-party system corresponds to a description file. This application's embodiments do not limit this.

[0207] Optionally, the first operation and maintenance system can also generate plugins to call third-party systems and retrieve data from them. A plugin consists of two parts: API request parameter settings and API response message parsing. Plugins can take the form of scripts; they can also be implemented in various other ways, such as using a domain-specific language (DSL). The following example uses a Python script to illustrate the LUI plugin content for "ARP entry query".

[0208] An example of a plugin provided in this application embodiment. Taking a Python script as an example, the LUI plugin content for "ARP entry query" is illustrated. This plugin mainly includes constructing API request parameters, calling the API interface, and parsing the API return results. Table 4 provides an exemplary description of the specific Python code for each part.

[0209] Table 4 shows examples of the plugin's Python code.

[0210]

[0211]
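The three plugin parts named above (constructing API request parameters, calling the API interface, and parsing the API return results) can be sketched as follows. This is not the code of Table 4: the endpoint URL, the response layout with a `data` list, and all function names are assumptions, and the HTTP call is made replaceable so the sketch can be exercised without a real third-party system.

```python
import json
import urllib.request

API_URL = "https://third-party.example/api/arp/query"  # hypothetical endpoint


def build_request(params):
    """Construct the API request body, dropping parameters the user did not supply."""
    return {key: value for key, value in params.items() if value is not None}


def call_api(body, url=API_URL):
    """POST the request body as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def parse_response(response):
    """Keep only the fields presented to the user; the field names are assumptions."""
    return [
        {"ipAddr": entry.get("ipAddr"), "macAddr": entry.get("macAddr")}
        for entry in response.get("data", [])
    ]


def run_plugin(params, transport=call_api):
    """Run the three plugin parts in order; `transport` can be swapped for testing."""
    return parse_response(transport(build_request(params)))


# Usage with a fake transport standing in for the third-party system.
fake = lambda body: {"data": [{"ipAddr": "10.141.152.66", "macAddr": "0001-0034-5573", "age": 5}]}
print(run_plugin({"deviceName": "10.78.119.211", "ifName": None}, transport=fake))
```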

[0212] In this embodiment, the first operation and maintenance system generates a full QA corpus and, based on the full QA corpus, generates a training dataset for training the first model, a configuration file, etc. The full QA corpus refers to all QA corpora corresponding to the training dataset related to the second operation and maintenance system obtained by the aforementioned model training method. The interface integration module includes three steps: "corpus generation," "model training," and "configuration generation," corresponding to the "corpus generation submodule," "model training submodule," and "configuration generation submodule," respectively. The input to the interface integration module is the LUI description file, and the output is the model file and configuration file, which, along with the LUI plugins, are then updated to the first operation and maintenance system.

[0213] The steps for generating the corpus are explained below. QA corpora are generated by an LLM based on the LUI description files. Examples of QA corpora are shown in Table 5. Each LUI generates approximately several hundred QA corpora; Table 5 lists only 10 of these. The generated QA corpora are persistently stored together with the existing API corpora. In Table 5, each row represents a QA corpus. Within a QA corpus, the number identifies the maintenance module. All maintenance modules across different maintenance systems are uniformly numbered, with a one-to-one correspondence between the number and the maintenance module. For example, number "3" indicates that the maintenance module corresponding to the maintenance problem is the "Query ARP Table Entry" maintenance module in the second maintenance system; number "5" indicates that the maintenance module corresponding to the maintenance problem is the "Query ARP Table Entry" maintenance module in the first maintenance system; and number "2" indicates that the maintenance module corresponding to the maintenance problem is the "Query Routing Information" maintenance module in the second maintenance system.

[0214] Table 5 shows multiple examples of QA corpora.

[0215]

[0216]

[0217] The process of generating diverse corpora is illustrated below.

[0218] Taking the "ARP entry query" function as an example, this describes the process of generating a diverse QA corpus. The diversity of the QA corpus is reflected in various question formats, parameter combinations, and parameter values. The input of the corpus generation submodule for the diverse QA corpus is a description file, and the output is the diverse QA corpus.

[0219] During the QA corpus generation process, the description file contains multiple parameter definitions. Users normally do not supply every parameter when asking a question. To make the QA corpus resemble real user questions more closely, the parameters are randomly combined to obtain different parameter combination examples, as shown in Table 6.

[0220] Table 6 Examples of different parameter combinations

[0221]

[0222] During the QA corpus generation process, it is also necessary to assign values to the parameter combination examples. To keep the QA corpus generation process controllable, for example to prevent the hallucination problem of large language models, parameter standardization is also required. For example, a fixed value is set for each parameter; the value can be selected from the parameter examples in the description file or obtained through other methods. The results of assigning values to the parameter combination examples from the parameter examples in the description file are shown in Table 7.

[0223] Table 7 Examples of parameter combinations after assignment

[0224]

[0225]
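The random combination and value assignment steps can be sketched as follows. This is an illustration under assumptions: the parameter examples reuse values that appear elsewhere in this description, required parameters are assumed to be always present, and the function names are hypothetical.

```python
import random

# Parameter examples as they might appear in a description file (illustrative values).
param_examples = {
    "deviceName": ["10.78.119.211"],
    "type": ["interface", "dynamic"],
    "ipAddr": ["10.141.152.66"],
    "macAddr": ["0001-0034-5573"],
}
required = ["deviceName"]


def random_combination(examples, required_names, rng):
    """Pick the required parameters plus a random subset of the optional ones."""
    optional = [name for name in examples if name not in required_names]
    chosen = rng.sample(optional, rng.randint(0, len(optional)))
    return list(required_names) + chosen


def assign_values(combination, examples, rng):
    """Assign each parameter a fixed value drawn from the description file examples."""
    return {name: rng.choice(examples[name]) for name in combination}


rng = random.Random(0)  # seeded so that repeated runs give the same combinations
for _ in range(3):
    combo = random_combination(param_examples, required, rng)
    print(assign_values(combo, param_examples, rng))
```

Drawing values only from the description file keeps the later LLM step constrained to known parameter values, which is the controllability goal stated above.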

[0226] The corpus generation process is explained below. First, as shown in Table 8, prompts are obtained by combining question examples with the parameter combination examples after assignment. These prompts are then input into the LLM (large language model) to generate corresponding questions. Optionally, the parameter combination examples can be used as the parameters, and the user question examples generated by the large language model can be used as the questions, thus completing the initial QA corpus generation.

[0227] For example, suppose the parameter combination input by the first maintenance system to the LLM is: {"deviceName":"10.78.119.211","type":"interface","ipAddr":"10.141.152.66","macAddr":"0001-0034-5573"}. A feasible prompt would be: "Please generate a maintenance question to query the device's ARP table entries based on the following information: device name is 10.78.119.211; the device type is interface; IP address is 10.141.152.66; physical address is 0001-0034-5573."
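A prompt of the kind quoted above could be assembled from a parameter combination as in the following sketch. The label table and the function name are assumptions; only the parameter values and the overall wording follow the example in the text.

```python
# Assumed natural-language labels for each parameter name.
LABELS = {
    "deviceName": "device name",
    "type": "device type",
    "ipAddr": "IP address",
    "macAddr": "physical address",
}


def build_generation_prompt(task, params):
    """Turn an assigned parameter combination into a question-generation prompt."""
    details = "; ".join(f"{LABELS.get(name, name)} is {value}" for name, value in params.items())
    return (
        f"Please generate a maintenance question to {task} "
        f"based on the following information: {details}."
    )


params = {
    "deviceName": "10.78.119.211",
    "type": "interface",
    "ipAddr": "10.141.152.66",
    "macAddr": "0001-0034-5573",
}
prompt = build_generation_prompt("query the device's ARP table entries", params)
print(prompt)
```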

[0228] Table 8 Examples of User Questions Generated by Large Language Models

[0229]

[0230] The corpora generated by large language models are diverse, but can be further optimized. One feasible approach is parameter randomization. Parameters in the initial QA corpus are randomly assigned values. During randomization, a sample parameter can be randomly selected from the LUI description file for replacement. For common parameters, such as Internet Protocol Addresses (IP addresses), parameters can be further randomly generated according to rules to improve corpus diversity. The results of parameter randomization are shown in Table 9. The left side shows the modified operation and maintenance issues obtained after parameter randomization, and the right side shows the parameter information corresponding to each modified operation and maintenance issue.

[0231] Table 9 Results of parameter randomization

[0232]

[0233]
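Parameter randomization can be sketched as follows: a parameter value in an initial QA corpus entry is replaced both in the question text and in the parameter information. The rule used to generate IP addresses and the function names are assumptions, and the plain string replacement assumes the old value occurs in the question only as that parameter's value.

```python
import random


def random_ipv4(rng):
    """Generate an IPv4-style address by rule (each octet between 1 and 254)."""
    return ".".join(str(rng.randint(1, 254)) for _ in range(4))


def randomize_parameter(question, params, name, new_value):
    """Replace one parameter value in the question and in its parameter information."""
    old_value = params[name]
    new_params = dict(params)
    new_params[name] = new_value
    return question.replace(old_value, new_value), new_params


rng = random.Random(1)
question = "Query the ARP entries of device 10.78.119.211"
params = {"deviceName": "10.78.119.211"}
new_question, new_params = randomize_parameter(question, params, "deviceName", random_ipv4(rng))
print(new_question)
print(new_params)
```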

[0234] The model training steps are explained below. The first model (e.g., BERT) is trained in a supervised manner, using the "Question" (operation and maintenance question) field of the full QA corpus as input and the "ID or Name" field as output. This gives the first model a classification capability: when the input is a user question, the model classifies the question and outputs the corresponding API, and the user can then select the API to call. A new model file is generated after each round of BERT training, and this new model file is updated in the first operation and maintenance system.

[0235] The steps for generating the configuration file are explained below. The configuration file is generated based on the description file, and examples are selected from the QA corpus as part of the configuration file (the content of the prompts field). The selected QA corpus examples serve as the example corpus for the LLM's in-context learning in the parameter extraction stage, that is, as reference examples for parameter extraction.

[0236] This application provides an example of a configuration file. In one example, as shown in Table 10, the configuration file includes interface definitions, problem examples, parameter definitions, parameter examples, and example corpora.

[0237] Table 10 Examples of Configuration Files

[0238]

[0239]

[0240]

[0241]

[0242] The following example illustrates the method for selecting example corpora. This implementation does not constitute a limitation on the method for selecting example corpora. For example, corpora can also be categorized by operation and maintenance tasks or functions, and a portion of the QA corpora from each category can be selected as example corpora. Example corpora are also referred to as parameter extraction examples.

[0243] During the inference process, a large language model is used to extract parameter information from the user's input question. Based on this parameter information, the target operation and maintenance module (identified by the first model from the input question) is invoked. The parameter extraction process relies on the LLM's In-Context Learning (ICL) capability. However, ICL is constrained by the token length limit of the prompt, so the prompt can carry only a limited amount of example corpus. Furthermore, for the LLM to learn the parameter extraction process adequately, the QA example corpus carried in the prompt needs to be diverse. Therefore, it is necessary to select as many representative example corpora as possible within the prompt's token length limit.

[0244] Optionally, the QA corpus can be clustered, and then one QA corpus entry can be selected from each cluster as example corpus. For example, the K-Means algorithm is used for clustering, forming N clusters, where N is determined by the LLM's in-context learning capability and token length limit, such as N=6. The clustering feature is a parameter vector: if a parameter is present in the QA corpus entry, the corresponding element is set to 1, and if it is absent, the element is set to 0. As shown in Table 11, the feature vector is defined according to the parameter order (deviceName / updateTime / type / ipAddr / macAddr / ifName / vrfName). The QA example corpus selected after clustering is written to the LUI configuration file, in the prompts field.

[0245] Table 11 Examples of feature vectors from example corpora

[0246]
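The clustering-based selection can be sketched with a small hand-written K-Means (Lloyd's algorithm) over the binary parameter-presence vectors. Several choices here are assumptions rather than part of the described method: the centroids are seeded with the first distinct vectors, the entry closest to each centroid is taken as the representative, and the sample entries are illustrative.

```python
PARAM_ORDER = ["deviceName", "updateTime", "type", "ipAddr", "macAddr", "ifName", "vrfName"]


def feature_vector(params):
    """1 if the parameter is present in the QA entry, otherwise 0."""
    return [1 if name in params else 0 for name in PARAM_ORDER]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; the first k distinct vectors seed the centroids (assumption)."""
    centroids = []
    for vector in vectors:
        if vector not in centroids:
            centroids.append(list(vector))
        if len(centroids) == k:
            break
    assignment = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: squared_distance(vector, centroids[c]))
            for vector in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return assignment, centroids


def select_examples(corpus, k):
    """Pick, from each cluster, the QA entry closest to the cluster centroid."""
    vectors = [feature_vector(params) for _, params in corpus]
    assignment, centroids = kmeans(vectors, k)
    selected = []
    for c in range(len(centroids)):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            best = min(members, key=lambda i: squared_distance(vectors[i], centroids[c]))
            selected.append(corpus[best])
    return selected


qa_corpus = [
    ("Query the ARP entries of device A", {"deviceName": "A"}),
    ("Query the ARP entries of device B", {"deviceName": "B"}),
    ("Query the ARP entry of device A with IP x and MAC y", {"deviceName": "A", "ipAddr": "x", "macAddr": "y"}),
    ("Query the ARP entry of device B with IP x and MAC y", {"deviceName": "B", "ipAddr": "x", "macAddr": "y"}),
]
for entry in select_examples(qa_corpus, 2):
    print(entry)
```

Because the features encode only which parameters are present, the selected examples cover distinct parameter combinations rather than distinct parameter values.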

[0247] The following provides an example of how to resolve conflicts in the operation and maintenance functions of multiple operation and maintenance systems.

[0248] For example, multiple operation and maintenance systems (including third-party systems and the first operation and maintenance system) may have the same operation and maintenance function. Suppose both XX system and YY system have an ARP table entry query function and both need to be integrated into the first operation and maintenance system. When a user performs an ARP table entry query, the function may then be called from XX system at one time and from YY system at another.

[0249] In the first feasible implementation, in order to ensure that the first operation and maintenance system has stable output, the priority of each operation and maintenance system can be represented by setting the priority field in the description file. For example, the larger the priority value, the higher the priority.

[0250] In the second feasible implementation, QA corpora with system identifiers (such as the sysName field, number field, etc. in the description file) are generated. That is, users can add system identifiers to their questions to query data through a specified system. For ease of description, we assume that the ARP table query function descriptions and implementations in systems XX and YY are completely identical, and the generated QA corpora are also completely identical.

[0251] For example, in the QA corpus generation stage, in addition to generating normal corpus (hereinafter referred to as corpus A, as shown in rows 1, 2, 5 and 6 of Table 12), corpus with system identifier-qualified modifiers is also generated (hereinafter referred to as corpus B, as shown in rows 3, 4, 7 and 8 of Table 12), and both corpus types are used to train the BERT model.
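Generating corpus B from corpus A can be sketched as adding a system-qualifying modifier to each question while keeping the interface number unchanged. The phrasing "through the XX system" follows the example questions in this description; the function name and the sample row are assumptions.

```python
def with_system_modifier(question, sys_name):
    """Append a system qualifier such as 'through the XX system' to a question."""
    return f"{question.rstrip('.')} through the {sys_name} system."


# Corpus A: normal questions paired with the interface number (illustrative row).
corpus_a = [("Query the dynamic ARP entries of device 222.184.125.104.", 3)]

# Corpus B: the same questions qualified with the system identifier.
corpus_b = [(with_system_modifier(question, "XX"), number) for question, number in corpus_a]

print(corpus_b[0])
```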

[0252] The numbering is explained below. An operations and maintenance (O&M) system includes multiple interfaces, and interfaces for querying ARP entries in different O&M systems have different numbers. For instance, system XX has 3 interfaces, numbered 1-3, and the interface for querying ARP entries in system XX corresponds to number 3. System YY has 4 interfaces, numbered 4-7, and the interface for querying ARP entries in system YY corresponds to number 5. Therefore, the interface or API number is unique, and interfaces within different O&M systems can be distinguished by their API numbers.

[0253] Table 12 Examples from Corpus A and Corpus B

[0254]

[0255]

[0256] Next, conflict checking is performed on the trained first model. After training, the BERT model is validated offline. Optionally, the validation set can be obtained by splitting off a proportion of the full QA corpus. If the intent recognition accuracy for a certain operation and maintenance system is below standard (e.g., less than 90%), and most of the erroneous test cases (e.g., more than 50%) are misclassified as another operation and maintenance system, a functional conflict is considered to exist; otherwise, no functional conflict exists. Erroneous test cases are the validation samples of that operation and maintenance system whose intent was recognized incorrectly.

[0257] For example, in a conflict scenario, the offline verification accuracy of interface number 5 (which can also be understood as an API interface) in the YY system is 80%, and 60% of the erroneous test cases are misclassified as interface number 3 in the XX system. Therefore, it is considered that interface number 5 in the YY system and interface number 3 in the XX system have a functional conflict.
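The conflict check can be sketched as a function over offline validation results. The thresholds (90% accuracy, more than 50% of errors) follow the examples above; checking at the level of interface numbers, the record layout, and the function name are assumptions for illustration.

```python
from collections import Counter


def detect_conflict(records, interface, accuracy_threshold=0.9, share_threshold=0.5):
    """Return the interface number that `interface` conflicts with, or None.

    `records` is a list of (true_number, predicted_number) validation results.
    """
    predictions = [predicted for true, predicted in records if true == interface]
    if not predictions:
        return None
    errors = [predicted for predicted in predictions if predicted != interface]
    accuracy = 1 - len(errors) / len(predictions)
    if accuracy >= accuracy_threshold or not errors:
        return None
    # Find the interface that absorbs the largest share of the erroneous cases.
    other, count = Counter(errors).most_common(1)[0]
    return other if count / len(errors) > share_threshold else None


# The scenario above: interface 5 has 80% accuracy and 60% of its errors go to interface 3.
records = [(5, 5)] * 80 + [(5, 3)] * 12 + [(5, 2)] * 8
print(detect_conflict(records, 5))
```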

[0258] The second feasible implementation method described above yielded corpus B. Since the questions in corpus B include modifiers specified by the operations and maintenance system, and there are no interface conflict issues within the operations and maintenance system, the system conflicts in the aforementioned recognition results are primarily caused by corpus A from the first feasible implementation method. Therefore, a feasible way to resolve these conflicts is to adjust the proportion of training samples in corpus A, for example, by reducing the number of training samples for interfaces in the lower-priority XX system.

[0259] For example, according to the priority definition in the description file, the interface for querying ARP entries in the YY system (numbered 5) has a higher priority than the interface for querying ARP entries in the XX system (numbered 3). The number of training samples (including training and validation datasets) for interface number 3 in the XX system in corpus A is reduced proportionally. Then, based on the adjusted training and validation datasets, the first model is retrained, conflict checked, and corpus allocation adjusted until no functional conflicts exist or the maximum number of iterations is reached. During the iteration process, corpus B remains unchanged. That is, during the inference phase, when a user wants to use the function to query ARP entries in the XX system, they can add the modifier "XX system" to the question, thus avoiding system conflict issues.
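The iterative adjustment can be sketched as a loop that reduces the corpus A samples of the lower-priority interface until no conflict is reported or the iteration limit is reached, leaving corpus B unchanged. The callback `has_conflict` stands in for retraining the first model and re-running the conflict check; it, the keep ratio, and the function names are assumptions.

```python
import random


def downsample_interface(corpus_a, number, keep_ratio, rng):
    """Keep only a fraction of the corpus A samples labelled with `number`."""
    target = [sample for sample in corpus_a if sample[1] == number]
    others = [sample for sample in corpus_a if sample[1] != number]
    kept = rng.sample(target, int(len(target) * keep_ratio))
    return others + kept


def rebalance(corpus_a, corpus_b, low_priority_number, has_conflict,
              keep_ratio=0.5, max_iterations=5, seed=0):
    """Shrink corpus A for the lower-priority interface until the conflict disappears."""
    rng = random.Random(seed)
    for _ in range(max_iterations):
        # `has_conflict` represents "retrain the first model, then check for conflicts".
        if not has_conflict(corpus_a + corpus_b):
            break
        corpus_a = downsample_interface(corpus_a, low_priority_number, keep_ratio, rng)
    return corpus_a + corpus_b  # corpus B is never modified


# Toy usage: pretend a conflict exists while more than 5 samples carry number 3.
corpus_a = [(f"question {i}", 3) for i in range(8)] + [(f"question {i}", 5) for i in range(8)]
corpus_b = [("question through the XX system", 3), ("another question through the XX system", 3)]
toy_conflict = lambda data: sum(1 for _, number in data if number == 3) > 5
result = rebalance(corpus_a, corpus_b, 3, toy_conflict)
print(len(result))
```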

[0260] During the inference phase, after the functional conflicts among multiple systems have been resolved, the intent recognition results of the first model for the user's input questions are shown in Table 13.

[0261] Table 13 Examples of Intent Identification Without System Conflict

[0262]
User input question | Intent recognition result
Query the dynamic ARP entries for device 222.184.125.104. | Call the YY system ARP table query function (LUI-5)
Query the dynamic ARP entries of device 222.184.125.104 through the XX system. | Call the XX system ARP table query function (LUI-3)
Query the dynamic ARP entries of device 222.184.125.104 through the YY system. | Call the YY system ARP table query function (LUI-5)

[0263] The inference phase is described below.

[0264] During the inference phase, the user inputs a question through the dialogue module on the user-side device, and the dialogue module passes the question to the agent module. The intent recognition submodule calls the trained first model and inputs the question into it, and the first model outputs the first operation and maintenance system corresponding to the input question. The parameter extraction submodule calls the large language model in the general LLM module and inputs the question into it, and the large language model extracts parameters from the input question to obtain target parameter information. The plugin execution submodule calls the plugin, determines the call information, such as the interface address and interface parameters, based on the target parameter information, and calls the first operation and maintenance system. The first operation and maintenance system executes the operation and maintenance task and returns the operation and maintenance results. The plugin execution submodule then calls the plugin to parse the operation and maintenance results and returns the parsed answer data to the dialogue module on the user-side device for presentation to the user. Prompts are generated from the configuration file and guide the large language model through parameter extraction for the input question. For example, a prompt might be: "Please extract parameter information based on the following information. This information includes… (information obtained from the configuration file)".
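The flow through the three submodules can be sketched as below. The class and field names are hypothetical, and the three callables are stand-ins for the trained first model, the large language model, and the plugin; none of them is an interface defined in this application. Following the intent recognition example in paragraph [0266], the first model is assumed here to return a module number.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Agent:
    # Intent recognition submodule: question -> module number (first model).
    recognize_intent: Callable[[str], int]
    # Parameter extraction submodule: (prompt, question) -> parameters (LLM).
    extract_parameters: Callable[[str, str], Dict[str, Any]]
    # Plugin execution submodule: (module number, parameters) -> parsed answer.
    call_plugin: Callable[[int, Dict[str, Any]], Any]
    # Prompts generated from the configuration file, keyed by module number.
    prompts: Dict[int, str]

    def answer(self, question: str) -> Any:
        module_id = self.recognize_intent(question)          # step 1: intent
        prompt = self.prompts[module_id]
        params = self.extract_parameters(prompt, question)   # step 2: parameters
        return self.call_plugin(module_id, params)           # step 3: execution
```

Keeping the three stages as injected callables mirrors the separation in the text: the small first model, the large language model, and the plugin can each be replaced without touching the others.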

[0265] In one example, during the inference phase, users can use the first operation and maintenance system to ask operation and maintenance questions. The following example, "Query the dynamic ARP entries of device 222.184.125.104, with the time being 9:00 AM," illustrates the processing procedure of the first operation and maintenance system.

[0266] First, intent recognition is performed. When the agent module starts, the trained BERT model (an example of the first model) is loaded; when the user asks a question, the BERT model is used for intent recognition. As shown in Table 14, the input of the BERT model is the string of the user's question, and the output of the BERT model is the identifier of the operation and maintenance module. The identifier is a unique value called the "number", that is, the module's ID. Each operation and maintenance module across the multiple operation and maintenance systems has a unique ID. In practical applications, each operation and maintenance module has an API interface, so the "number" of each operation and maintenance module can also be understood as the ID of that module's API interface.

[0267] Table 14 Examples of inputs and outputs of the first model during the inference phase.

[0268]

| Input | Output |
| --- | --- |
| Query the dynamic ARP entries for device 222.184.125.104 at 9:00 AM. | 3 |

[0269] Next, parameter extraction is performed. Example corpora are retrieved from the configuration file based on the identifier and placed in the prompt content. Using its in-context learning capability, the large language model extracts parameters from the user's question with reference to the example corpora to obtain the target parameter information. Examples of the input and output of the large language model during parameter extraction are shown in Table 15.
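As a rough illustration of how retrieved examples might be placed in the prompt, consider the sketch below. The configuration layout, the example question and parameters, and the `build_prompt` helper are hypothetical, since the format of the configuration file is not given in the text; only the opening instruction sentence is taken from the prompt quoted in paragraph [0264].

```python
from typing import Dict, List

# Hypothetical configuration content keyed by module number.
CONFIG: Dict[int, List[Dict[str, str]]] = {
    3: [
        {
            "question": "Query the dynamic ARP entries of device 10.78.119.12.",
            "parameters": '{"ip": "10.78.119.12", "type": "dynamic"}',
        }
    ]
}

INSTRUCTION = "Please extract parameter information based on the following information."


def build_prompt(config: Dict[int, List[Dict[str, str]]],
                 module_id: int, question: str) -> str:
    """Assemble a few-shot prompt from the examples stored for `module_id`."""
    lines = [INSTRUCTION, "Examples:"]
    for example in config[module_id]:
        lines.append(f"Question: {example['question']}")
        lines.append(f"Parameters: {example['parameters']}")
    # The user's question goes last so the model completes its parameters.
    lines.append(f"Question: {question}")
    lines.append("Parameters:")
    return "\n".join(lines)
```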

[0270] For example, for time-type parameters (such as updateTime), the agent module (such as a large language model) will convert the parameter from natural language into a start timestamp and end timestamp of type ULONG. For parameters containing mapper definitions (such as type), the agent module (such as a large language model) will convert the parameter from natural language into an enumeration definition.
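The target representations of these two conversions can be illustrated as follows. In the text the conversion is performed by the agent module (for example, by the large language model); the code only shows what the converted values might look like. The millisecond unit, the one-hour window, the UTC time zone, the explicit date, and the contents of `TYPE_MAPPER` are all assumptions made for the sake of the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Hypothetical mapper definition for the `type` parameter.
TYPE_MAPPER = {"dynamic": "D", "static": "S"}


def to_time_range(day: str, clock: str, window_minutes: int = 60) -> Tuple[int, int]:
    """Convert e.g. ("2024-12-31", "9:00 AM") to (start, end) in epoch milliseconds."""
    start = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %I:%M %p")
    start = start.replace(tzinfo=timezone.utc)  # assumed time zone
    end = start + timedelta(minutes=window_minutes)
    start_ms = int(start.timestamp() * 1000)
    end_ms = int(end.timestamp() * 1000)
    if start_ms < 0:
        raise ValueError("an unsigned timestamp cannot be negative")
    return start_ms, end_ms


def to_enum(value: str) -> str:
    """Map a natural-language value to its enumeration definition."""
    return TYPE_MAPPER[value.lower()]
```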

[0271] Table 15 shows examples of the inputs and outputs of the large language model during the parameter extraction process.

[0272]

[0273] Finally, the first operation and maintenance system invokes the plugin to call the target operation and maintenance module, which executes the operation and maintenance task indicated by the input question. After completing intent recognition and parameter extraction, the agent module automatically invokes the plugin to query data from the third-party system or the first operation and maintenance system and presents the data to the user through the user-side device.

[0274] Optionally, as shown in Table 16, ARP entry data is presented in a tabular format.

[0275] Table 16 presents ARP entry data in tabular format.

[0276]

| IP address | Type | MAC address | Logical interface | Physical interface | VRF name |
| --- | --- | --- | --- | --- | --- |
| 10.78.119.12 | D | 94b2-71a5-19b0 | VlanIf10 | GE0/1/1 | _public_ |

[0277] Based on the same concept as the aforementioned embodiments, this application also provides an operation and maintenance device and a model training device.

[0278] Figure 16 is a schematic diagram illustrating the composition of an operation and maintenance device provided in an embodiment of this application. As shown in Figure 16, this embodiment provides an operation and maintenance device 1600, applied to a first operation and maintenance system, mainly including:

[0279] The first acquisition module 1610 is used to acquire the user's input question. For example, the input question is generated in natural language and includes the operation and maintenance task and parameter information for executing the operation and maintenance task.

[0280] The first processing module 1620 is used to input an input question into a first model to output the identifier of the target operation and maintenance module; and to call the target operation and maintenance module according to the parameter information indicated by the input question to obtain the answer to the input question. The first model is trained on a training dataset, which is related to the second operation and maintenance system. The training dataset is the dataset used to train the first model, and can be related to "any operation and maintenance system," enabling the first model to identify operation and maintenance modules within "any operation and maintenance system." Users can then use the first operation and maintenance system to operate and maintain network devices managed by "any operation and maintenance system." The second operation and maintenance system is merely one example of "any operation and maintenance system." "Any operation and maintenance system" can be the first operation and maintenance system, which is equivalent to using the training dataset related to the first operation and maintenance system to fine-tune the first model, thereby improving the accuracy of the first model in identifying operation and maintenance modules within the first operation and maintenance system. "Any operation and maintenance system" can also be any operation and maintenance system other than "the first operation and maintenance system and the second operation and maintenance system," and this embodiment does not limit this.

[0281] In one possible implementation, the number of parameters in the first model is less than a first threshold.

[0282] In one possible implementation, the first processing module 1620 is further configured to: input the input question into a second model; the second model generates target parameter information based on the parameter information indicated by the input question; the number of parameters in the second model is greater than a second threshold; and, based on the parameter information indicated by the input question, invoke a target operation and maintenance module to obtain the answer to the input question, including: invoking the target operation and maintenance module based on the target parameter information to obtain the answer to the input question.

[0283] In one possible implementation, the first processing module 1620 is specifically used to: determine a target parameter extraction example corresponding to the input question from at least one parameter extraction example; use the target parameter extraction example and the input question as input to the second model; and, with reference to the target parameter extraction example, output target parameter information.

[0284] In one possible implementation, the first processing module 1620 is further configured to: obtain a sample set, the sample set including multiple parameter extraction samples; perform clustering processing on the multiple parameter extraction samples in the sample set, select at least one parameter extraction sample in each category, and obtain the above-mentioned at least one parameter extraction example.
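The text does not name a clustering algorithm. As one possible illustration, the sketch below uses a simple greedy grouping by token overlap (Jaccard similarity) and keeps the first sample of each group as its example. The similarity measure, the 0.5 threshold, and the function names are assumptions, not the method of this application.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens of a sample."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_examples(samples: List[str], threshold: float = 0.5) -> List[str]:
    """Group samples greedily and return one representative per group."""
    clusters: List[List[str]] = []
    for sample in samples:
        for cluster in clusters:
            # Compare against the first member, which represents the cluster.
            if jaccard(tokens(sample), tokens(cluster[0])) >= threshold:
                cluster.append(sample)
                break
        else:
            clusters.append([sample])
    return [cluster[0] for cluster in clusters]
```

Selecting one sample per category keeps the example set small while still covering each distinct kind of parameter extraction.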

[0285] In one possible implementation, the first processing module 1620 is further configured to: obtain a description file of the second operation and maintenance system, the description file indicating parameter information of at least one operation and maintenance module in the second operation and maintenance system, the operation and maintenance module representing a functional module that performs operation and maintenance tasks; based on the description file, call the second model to generate an operation and maintenance problem, the number of parameters of the second model being greater than a second threshold; and obtain a training dataset based on the operation and maintenance problem.

[0286] In one possible implementation, the aforementioned description file includes parameters of the operation and maintenance module and examples of parameter values. The first processing module 1620 is specifically configured to: select at least one parameter from the parameters in the description file to obtain a parameter group; assign values to the parameters in the parameter group according to the example parameter values in the description file to obtain parameter group information; and use the parameter group information as input to the second model to generate an operation and maintenance problem.
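Parameter group construction can be sketched as below. The dictionary layout of the description file, the `max_size` limit, and the choice of the first example value for each parameter are assumptions made for illustration only.

```python
from itertools import combinations
from typing import Dict, List


def parameter_groups(description: Dict[str, Dict[str, List[str]]],
                     max_size: int = 2) -> List[Dict[str, str]]:
    """Enumerate parameter groups and fill each with an example value.

    `description["parameters"]` maps a parameter name to its example values
    (a hypothetical layout for the description file).
    """
    params = description["parameters"]
    names = sorted(params)
    groups: List[Dict[str, str]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            # Assign the first example value to every parameter in the group.
            groups.append({name: params[name][0] for name in combo})
    return groups
```

Each resulting dictionary is the "parameter group information" that would then be passed to the second model to generate one operation and maintenance problem.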

[0287] In one possible implementation, the description file also includes problem examples. The first processing module 1620 is specifically used to: based on the description file and the problem examples, invoke the second model, and, with reference to the problem examples, generate operation and maintenance problems.

[0288] In one possible implementation, the training dataset also includes a modified operation and maintenance problem, which is obtained by modifying the parameter values in the operation and maintenance problem. The first processing module 1620 is further configured to: replace the parameter values in the operation and maintenance problem with the parameter value examples in the description file to obtain the modified operation and maintenance problem; and/or modify the parameter values in the operation and maintenance problem according to preset rules to obtain the modified operation and maintenance problem.
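Both ways of producing modified problems can be illustrated briefly. The helper names are hypothetical, and the "next IP address" rule is only one invented example of a preset rule; the text does not state which rules are used.

```python
import ipaddress
from typing import List


def replace_with_examples(question: str, current_value: str,
                          example_values: List[str]) -> List[str]:
    """Swap the current parameter value for each other example value."""
    return [question.replace(current_value, value)
            for value in example_values if value != current_value]


def next_ip_rule(question: str, current_ip: str) -> str:
    """Hypothetical preset rule: replace an IPv4 address with the next one."""
    new_ip = str(ipaddress.IPv4Address(current_ip) + 1)
    return question.replace(current_ip, new_ip)
```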

[0289] In one possible implementation, the first processing module 1620 is further configured to: determine whether there are a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; if so, set a first training subset in the training dataset; and set specified information in the operation and maintenance problems in the first training subset, wherein the truth value of the operation and maintenance module in the first training subset is either the first operation and maintenance module or the second operation and maintenance module, and the specified information is used to specify the operation and maintenance system in which the functional module performing the operation and maintenance task is located.
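A first training subset of this kind might be built as in the sketch below, which appends a system modifier to each question in the style of the Table 13 examples. The helper names and the exact wording of the modifier are assumptions.

```python
from typing import Dict, List, Tuple


def add_system_modifier(question: str, system_name: str) -> str:
    """Append the specified information naming the target system."""
    return f"{question.rstrip('.')} through the {system_name}."


def build_first_subset(questions: List[str],
                       modules: Dict[str, int]) -> List[Tuple[str, int]]:
    """Label each modified question with the module of the named system.

    `modules` maps a system name to the number of its module that performs
    the shared operation and maintenance task.
    """
    return [(add_system_modifier(question, system), number)
            for question in questions
            for system, number in modules.items()]
```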

[0290] In one possible implementation, the first processing module 1620 is further configured to: acquire a verification dataset; verify the trained first model based on the verification dataset to obtain a verification result, the verification result including the recognition accuracy and conflict error rate of the module to be identified, the module to be identified being one of the first operation and maintenance module and the second operation and maintenance module, the conflict error rate representing the error rate at which the module to be identified is identified as a conflict module, and the conflict module representing an operation and maintenance module other than the module to be identified in the first operation and maintenance module and the second operation and maintenance module; when the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, adjust the ratio of the training samples corresponding to the module to be identified and the conflict module in the training dataset to obtain an adjusted training dataset; and retrain the first model based on the adjusted training dataset.
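The two verification metrics and the retraining condition can be expressed as in the following sketch. The function names are hypothetical, and the threshold values passed in by a caller would be the first and second preset thresholds, whose values the text does not specify.

```python
from typing import List, Tuple


def validate(pairs: List[Tuple[int, int]], module_id: int,
             conflict_id: int) -> Tuple[float, float]:
    """Return (recognition accuracy, conflict error rate) for `module_id`.

    `pairs` holds (true module number, predicted module number) for each
    sample in the verification dataset.
    """
    predictions = [pred for true, pred in pairs if true == module_id]
    if not predictions:
        raise ValueError("no verification samples for the module to be identified")
    accuracy = sum(pred == module_id for pred in predictions) / len(predictions)
    conflict_rate = sum(pred == conflict_id for pred in predictions) / len(predictions)
    return accuracy, conflict_rate


def needs_rebalancing(accuracy: float, conflict_rate: float,
                      min_accuracy: float, max_conflict_rate: float) -> bool:
    """True when both conditions for adjusting the sample ratio are met."""
    return accuracy < min_accuracy and conflict_rate > max_conflict_rate
```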

[0291] In one possible implementation, the first processing module 1620 is specifically used to: reduce the proportion of samples of low-priority modules in the training dataset to obtain an adjusted training dataset, wherein the low-priority modules are the operation and maintenance modules with lower priority among the modules to be identified and the conflicting modules.

[0292] Figure 17 is a schematic diagram illustrating the composition of a model training device provided in an embodiment of this application. As shown in Figure 17, this embodiment provides a model training device 1700, applied to a first operation and maintenance system, mainly including:

[0293] The second acquisition module 1710 is used to acquire the description file of the second operation and maintenance system. The description file indicates the parameter information of at least one operation and maintenance module in the second operation and maintenance system, and the operation and maintenance module represents the functional module that performs operation and maintenance tasks.

[0294] The second processing module 1720 is used to call the second model according to the description file to generate an operation and maintenance problem. The number of parameters of the second model is greater than a second threshold. Based on the operation and maintenance problem, a training dataset is obtained. Based on the training dataset, the first model is trained to obtain a trained first model. The trained first model is used to identify the target operation and maintenance module corresponding to the input problem. The target operation and maintenance module represents the functional module that performs the operation and maintenance task represented by the input problem.

[0295] In one possible implementation, the aforementioned description file includes parameters of the operation and maintenance module and examples of parameter values. The second processing module 1720 is specifically configured to: select at least one parameter from the parameters in the description file to obtain a parameter group; assign values to the parameters in the parameter group according to the example parameter values in the description file to obtain parameter group information; and use the parameter group information as input to the second model to generate an operation and maintenance problem.

[0296] In one possible implementation, the description file also includes problem examples. The second processing module 1720 is specifically used to: based on the description file and the problem examples, invoke the second model, and, with reference to the problem examples, generate operation and maintenance problems.

[0297] In one possible implementation, the training dataset also includes a modified operation and maintenance problem, which is obtained by modifying the parameter values in the operation and maintenance problem. The second processing module 1720 is further configured to: replace the parameter values in the operation and maintenance problem according to the parameter value examples in the description file to obtain the modified operation and maintenance problem; and/or modify the parameter values in the operation and maintenance problem according to preset rules to obtain the modified operation and maintenance problem.

[0298] In one possible implementation, the second processing module 1720 is further configured to: determine whether there are a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; if so, set a first training subset in the training dataset; and set specified information in the operation and maintenance problems in the first training subset, wherein the truth value of the operation and maintenance module in the first training subset is either the first operation and maintenance module or the second operation and maintenance module, and the specified information is used to specify the operation and maintenance system in which the functional module performing the operation and maintenance task is located.

[0299] In one possible implementation, the second processing module 1720 is further configured to: acquire a verification dataset; verify the trained first model based on the verification dataset to obtain verification results, the verification results including the recognition accuracy and conflict error rate of the module to be identified, the module to be identified being one of the first operation and maintenance module and the second operation and maintenance module, the conflict error rate representing the error rate at which the module to be identified is identified as a conflict module, and the conflict module representing an operation and maintenance module other than the module to be identified in the first operation and maintenance module and the second operation and maintenance module; when the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, adjust the ratio of the training samples corresponding to the module to be identified and the conflict module in the training dataset to obtain an adjusted training dataset; and retrain the first model based on the adjusted training dataset.

[0300] In one possible implementation, the second processing module 1720 is specifically used to: reduce the proportion of samples from low-priority modules in the training dataset to obtain an adjusted training dataset, wherein the low-priority modules are the operation and maintenance modules with lower priority among the modules to be identified and the conflicting modules.

[0301] The software and hardware implementation of the operation and maintenance device 1600 shown in Figure 16 and the model training device 1700 shown in Figure 17 (hereinafter referred to as "the device in the embodiments of this application") is further described below.

[0302] As an example of a software functional unit, a module can include code running on a computing instance. A computing instance can include at least one of a physical host (computing device), a virtual machine, or a container. Furthermore, there can be one or more computing instances. For example, a module can include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed within the same region or in different regions. Further, the multiple hosts/virtual machines/containers used to run the code can be distributed within the same availability zone (AZ) or in different AZs, each AZ comprising one or more geographically proximate data centers. Typically, a region can include multiple AZs.

[0303] Similarly, the multiple hosts/virtual machines/containers used to run this code can be distributed within the same virtual private cloud (VPC) or across multiple VPCs. Typically, a VPC is set up within a region. Communication between two VPCs within the same region, as well as between VPCs in different regions, requires a communication gateway to be set up within each VPC to enable interconnection between VPCs.

[0304] As an example of a hardware functional unit, a module may include at least one computing device, such as a server. Alternatively, a module may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The aforementioned PLD may be implemented using a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.

[0305] The multiple computing devices included in the module can be distributed within the same region or in different regions. Similarly, the multiple computing devices included in the module can be distributed within the same Availability Zone (AZ) or in different AZs. Likewise, the multiple computing devices included in the module can be distributed within the same Virtual Private Cloud (VPC) or multiple VPCs. These multiple computing devices can be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

[0306] It should be noted that, in other embodiments, the apparatus in this application embodiment may additionally provide one or more modules for performing any of the steps included in the above implementation. The steps implemented by one or more modules in the apparatus in this application embodiment can be specified as needed, and more or fewer modules can be obtained than in the embodiments of this application to implement different steps in the above method, thereby realizing all the functions of the apparatus in the embodiments of this application.

[0307] Based on the same concept as the foregoing embodiments, this application also provides a computing device, which includes at least a processor and a memory. The memory stores a program, and when the processor reads the program, it can implement the algorithmic functions embodied by the above-described methods or devices.

[0308] Figure 18 is a schematic diagram of the structure of a computing device provided in an embodiment of this application. As shown in Figure 18, the computing device 1800 includes at least one processor 1801, a memory 1802, and a communication interface 1803. The processor 1801, memory 1802, and communication interface 1803 are communicatively connected, which can be achieved via a wired (e.g., bus) or wireless connection. The communication interface 1803 is used to receive data sent by other devices; the memory 1802 stores computer instructions, and the processor 1801 executes these computer instructions to perform the method described in the aforementioned method embodiments.

[0309] It should be understood that, in the embodiments of this application, the processor 1801 may be a central processing unit (CPU), or it may be other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.

[0310] The memory 1802 may include read-only memory and random access memory, and provides instructions and data to the processor 1801. The memory 1802 may also include non-volatile random access memory.

[0311] The memory 1802 can be volatile memory or non-volatile memory, or it can include both. The non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory can be random access memory (RAM), which is used as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).

[0312] It should be understood that the computing device 1800 according to the embodiments of this application can execute the methods mentioned in the embodiments of this application. For a detailed description of the implementation of the method, please refer to the above text. For the sake of brevity, it will not be repeated here.

[0313] Embodiments of this application provide a computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the aforementioned technical solutions are implemented.

[0314] An embodiment of this application provides a chip including at least one processor and an interface. The at least one processor determines program instructions or data through the interface. The at least one processor is used to execute the program instructions to implement the technical solutions mentioned above.

[0315] Embodiments of this application provide a computer program or computer program product that includes instructions that, when executed, cause a computer to perform the aforementioned technical solutions.

[0316] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0317] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented in hardware, processor-executed software modules, or a combination of both. The software modules can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the art.

[0318] The specific embodiments described above further illustrate the purpose, technical solution, and beneficial effects of this application. It should be understood that the above description is only a specific embodiment of this application and is not intended to limit the scope of protection of this application. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.

Claims

1. An operation and maintenance method, characterized in that the method is applied to a first operation and maintenance system and includes: obtaining an input question of a user, the input question indicating an operation and maintenance task and parameter information for executing the operation and maintenance task; inputting the input question into a first model to output an identifier of a target operation and maintenance module, wherein the first model is trained on a training dataset, and the training dataset is related to a second operation and maintenance system; and invoking the target operation and maintenance module based on the parameter information indicated by the input question to obtain an answer to the input question.

2. The method according to claim 1, characterized in that the number of parameters in the first model is less than a first threshold.

3. The method according to claim 1 or 2, characterized in that the method further includes: inputting the input question into a second model to generate target parameter information, wherein the number of parameters in the second model is greater than a second threshold; and the step of invoking the target operation and maintenance module based on the parameter information indicated by the input question to obtain the answer to the input question includes: invoking the target operation and maintenance module based on the target parameter information to obtain the answer to the input question.

4. The method according to claim 3, characterized in that the step of inputting the input question into the second model to generate target parameter information includes: determining, from at least one parameter extraction example, a target parameter extraction example corresponding to the input question; and using the target parameter extraction example and the input question as input to the second model, wherein the second model generates the target parameter information with reference to the target parameter extraction example.

5. The method according to claim 4, characterized in that the method further includes: obtaining a sample set that includes multiple parameter extraction samples; and clustering the multiple parameter extraction samples in the sample set and selecting at least one parameter extraction sample from each category to obtain the at least one parameter extraction example.

6. The method according to any one of claims 1-5, characterized in that the method further includes: obtaining a description file of the second operation and maintenance system, wherein the description file indicates parameter information of at least one operation and maintenance module in the second operation and maintenance system; invoking the second model according to the description file to generate an operation and maintenance problem, wherein the number of parameters of the second model is greater than the second threshold; and obtaining the training dataset based on the operation and maintenance problem.

7. The method according to claim 6, characterized in that the description file includes parameters of the operation and maintenance module and examples of parameter values; and the step of invoking the second model according to the description file to generate the operation and maintenance problem includes: selecting at least one parameter from the parameters in the description file to obtain a parameter group; assigning values to the parameters in the parameter group based on the parameter value examples in the description file to obtain parameter group information; and using the parameter group information as input to the second model to generate the operation and maintenance problem.

8. The method according to claim 6 or 7, characterized in that the description file also includes a problem example; and the step of invoking the second model according to the description file to generate the operation and maintenance problem includes: invoking the second model based on the description file and the problem example, wherein the second model generates the operation and maintenance problem with reference to the problem example.

9. The method according to any one of claims 6-8, characterized in that the training dataset also includes a modified operation and maintenance problem, which is obtained by modifying the parameter values in the operation and maintenance problem; and the method further includes: replacing the parameter values in the operation and maintenance problem based on the parameter value examples in the description file to obtain the modified operation and maintenance problem; and/or modifying the parameter values in the operation and maintenance problem according to preset rules to obtain the modified operation and maintenance problem.

10. The method according to any one of claims 6-9, characterized in that the method further includes: determining whether there exist a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; and if so, setting a first training subset in the training dataset, wherein specified information is set in the operation and maintenance questions of the first training subset, the ground-truth operation and maintenance module for the first training subset is the first operation and maintenance module or the second operation and maintenance module, and the specified information is used to specify the operation and maintenance system in which the functional module performing the operation and maintenance task is located.
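
One possible reading of the first training subset is sketched below. The record layout, the module identifiers, the system names, and the choice to append the system name to the question as the specified information are all assumptions made for illustration.

```python
def build_first_training_subset(questions, first_module, second_module):
    """For two modules that perform the same task, emit one sample per module,
    with the question naming the system in which that module is located."""
    subset = []
    for question in questions:
        for module in (first_module, second_module):
            subset.append(
                {
                    # The specified information: which system should handle the task.
                    "question": f"{question} (use {module['system']})",
                    # Ground-truth label: the module located in that system.
                    "label": module["id"],
                }
            )
    return subset


# Hypothetical modules that both query port status, one in each system.
first = {"id": "sys1.query_port_status", "system": "System A"}
second = {"id": "sys2.query_port_status", "system": "System B"}

subset = build_first_training_subset(
    ["What is the status of port GE0/0/1?"], first, second
)
for sample in subset:
    print(sample)
```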

11. The method according to claim 10, characterized in that the method further includes: obtaining a validation dataset; validating the trained first model based on the validation dataset to obtain validation results, wherein the validation results include the recognition accuracy and the conflict error rate of a module to be identified, the module to be identified is one of the first operation and maintenance module and the second operation and maintenance module, the conflict error rate represents the rate at which the module to be identified is misidentified as a conflict module, and the conflict module is the one of the first operation and maintenance module and the second operation and maintenance module other than the module to be identified; when the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, adjusting the ratio of the training samples corresponding to the module to be identified to the training samples corresponding to the conflict module in the training dataset to obtain an adjusted training dataset; and retraining the first model based on the adjusted training dataset.

12. The method according to claim 11, characterized in that adjusting the ratio of the training samples corresponding to the module to be identified to the training samples corresponding to the conflict module in the training dataset includes: reducing the proportion of training samples corresponding to a low-priority module in the training dataset to obtain the adjusted training dataset, wherein the low-priority module is whichever of the module to be identified and the conflict module has the lower priority.
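
The validation check and the sample-ratio adjustment of claims 11 and 12 can be sketched as follows. The threshold values, the priority table, the keep ratio, and the record layout are hypothetical; the claims do not fix any of them, and retraining is left out.

```python
def evaluate(records, target, conflict):
    """records: (true_module, predicted_module) pairs from the validation dataset.
    Returns the recognition accuracy and the conflict error rate for `target`."""
    predictions = [pred for true, pred in records if true == target]
    if not predictions:
        raise ValueError("no validation samples for the module to be identified")
    accuracy = sum(pred == target for pred in predictions) / len(predictions)
    conflict_rate = sum(pred == conflict for pred in predictions) / len(predictions)
    return accuracy, conflict_rate


def reduce_low_priority(samples, low_priority, keep_ratio):
    """Keep only a fraction of the samples labeled with the low-priority module."""
    low = [s for s in samples if s["label"] == low_priority]
    rest = [s for s in samples if s["label"] != low_priority]
    return rest + low[: int(len(low) * keep_ratio)]


# Ten validation samples whose true module is "A": 6 correct, 3 confused with "B".
records = [("A", "A")] * 6 + [("A", "B")] * 3 + [("A", "C")]
accuracy, conflict_rate = evaluate(records, target="A", conflict="B")

ACCURACY_THRESHOLD = 0.8   # hypothetical first preset threshold
CONFLICT_THRESHOLD = 0.2   # hypothetical second preset threshold
PRIORITY = {"A": 2, "B": 1}  # hypothetical: a larger number means higher priority

training = [{"label": "A"}] * 4 + [{"label": "B"}] * 4
if accuracy < ACCURACY_THRESHOLD and conflict_rate > CONFLICT_THRESHOLD:
    low = min(("A", "B"), key=PRIORITY.get)
    training = reduce_low_priority(training, low, keep_ratio=0.5)

print(accuracy, conflict_rate, len(training))
```

With these numbers both conditions hold, so half of the samples for module "B" are dropped before the first model would be retrained.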

13. A model training method, characterized in that the method is applied to a first operation and maintenance system and includes: obtaining a description file of a second operation and maintenance system, the description file indicating parameter information of at least one operation and maintenance module in the second operation and maintenance system; invoking a second model according to the description file to generate operation and maintenance questions, so as to obtain a training dataset, wherein the number of parameters of the second model is greater than a second threshold; and training a first model based on the training dataset to obtain a trained first model, wherein the trained first model is used to identify the identifier of the target operation and maintenance module corresponding to an input question, and the target operation and maintenance module represents the functional module that performs the operation and maintenance task indicated by the input question.

14. The method according to claim 13, characterized in that the description file includes parameters for invoking the operation and maintenance module and parameter value examples; and the step of invoking the second model according to the description file to generate operation and maintenance questions includes: selecting at least one parameter from the parameters in the description file to obtain a parameter group; assigning values to the parameters in the parameter group based on the parameter value examples in the description file to obtain parameter group information; and using the parameter group information as input to the second model to generate the operation and maintenance questions.

15. The method according to claim 14, characterized in that the description file further includes question examples; and the step of invoking the second model according to the description file to generate operation and maintenance questions includes: invoking the second model based on the description file and the question examples, so that the second model generates the operation and maintenance questions with reference to the question examples.

16. The method according to any one of claims 13-15, characterized in that the training dataset further includes modified operation and maintenance questions, which are obtained by modifying parameter values in the operation and maintenance questions; and the method further includes: replacing the parameter values in an operation and maintenance question based on the parameter value examples in the description file to obtain a modified operation and maintenance question; and/or modifying the parameter values in an operation and maintenance question according to preset rules to obtain a modified operation and maintenance question.

17. The method according to any one of claims 13-16, characterized in that the method further includes: determining whether there exist a first operation and maintenance module and a second operation and maintenance module that perform the same operation and maintenance task; and if so, setting a first training subset in the training dataset, wherein specified information is set in the operation and maintenance questions of the first training subset, the ground-truth operation and maintenance module for the first training subset is the first operation and maintenance module or the second operation and maintenance module, and the specified information is used to specify the operation and maintenance system in which the functional module performing the operation and maintenance task is located.

18. The method according to claim 17, characterized in that the method further includes: obtaining a validation dataset; validating the trained first model based on the validation dataset to obtain validation results, wherein the validation results include the recognition accuracy and the conflict error rate of a module to be identified, the module to be identified is one of the first operation and maintenance module and the second operation and maintenance module, the conflict error rate represents the rate at which the module to be identified is misidentified as a conflict module, and the conflict module is the one of the first operation and maintenance module and the second operation and maintenance module other than the module to be identified; when the recognition accuracy is lower than a first preset threshold and the conflict error rate is greater than a second preset threshold, adjusting the ratio of the training samples corresponding to the module to be identified to the training samples corresponding to the conflict module in the training dataset to obtain an adjusted training dataset; and retraining the first model based on the adjusted training dataset.

19. The method according to claim 18, characterized in that adjusting the ratio of the training samples corresponding to the module to be identified to the training samples corresponding to the conflict module in the training dataset includes: reducing the proportion of training samples corresponding to a low-priority module in the training dataset to obtain the adjusted training dataset, wherein the low-priority module is whichever of the module to be identified and the conflict module has the lower priority.

20. An operation and maintenance device, characterized in that the device is applied to a first operation and maintenance system and includes: a first acquisition module, configured to acquire a user's input question, wherein the input question is expressed in natural language and includes an operation and maintenance task and parameter information for executing the operation and maintenance task; and a first processing module, configured to input the input question into a first model to output an identifier of a target operation and maintenance module, and to invoke the target operation and maintenance module according to the parameter information indicated by the input question to obtain an answer to the input question; wherein the first model is trained on a training dataset, the training dataset is related to a second operation and maintenance system, and the target operation and maintenance module represents a functional module that performs the operation and maintenance task indicated by the input question.
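
The flow handled by the first processing module (question in, module identifier out, module invoked with the question's parameters) can be sketched as below. Everything here is a stand-in: `first_model_stub` replaces the trained first model with a keyword check, the regular expressions are a toy way of reading parameter information from the question, and the module registry and its return strings are hypothetical.

```python
import re


def first_model_stub(question):
    """Stand-in for the trained first model: returns a module identifier."""
    if "port" in question.lower():
        return "sys2.query_port_status"
    return "sys1.query_device_health"


def extract_parameters(question):
    """Toy extraction of the parameter information indicated by the question."""
    params = {}
    port = re.search(r"\bGE\d+/\d+/\d+\b", question)
    if port:
        params["port"] = port.group(0)
    device = re.search(r"\b[a-z]+-[a-z]+-\d+\b", question)
    if device:
        params["device_name"] = device.group(0)
    return params


# Hypothetical registry: module identifier -> callable module in either system.
MODULES = {
    "sys1.query_device_health": lambda device_name: f"{device_name}: healthy",
    "sys2.query_port_status": lambda device_name, port: f"{device_name} {port}: up",
}


def answer(question):
    module_id = first_model_stub(question)                    # identify target module
    return MODULES[module_id](**extract_parameters(question))  # invoke it


print(answer("Is port GE0/0/1 on core-switch-01 up?"))
print(answer("How healthy is edge-router-07?"))
```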

21. A model training device, characterized in that the device is applied to a first operation and maintenance system and includes: a second acquisition module, configured to acquire a description file of a second operation and maintenance system, wherein the description file indicates parameter information of at least one operation and maintenance module in the second operation and maintenance system; and a second processing module, configured to invoke a second model according to the description file to generate operation and maintenance questions, wherein the number of parameters of the second model is greater than a second threshold; obtain a training dataset according to the operation and maintenance questions; and train a first model according to the training dataset to obtain a trained first model, wherein the trained first model is used to identify the identifier of the target operation and maintenance module corresponding to an input question, and the target operation and maintenance module represents the functional module that performs the operation and maintenance task indicated by the input question.

22. An operation and maintenance architecture, characterized in that it includes a terminal and a server; the terminal is used to obtain a user's input question, and the server is used to execute the method according to any one of claims 1-19 based on the input question.

23. A computing device, comprising a memory and a processor, characterized in that the memory stores instructions that, when executed by the processor, cause the method according to any one of claims 1-19 to be implemented.

24. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, causes the method according to any one of claims 1-19 to be implemented.

25. A computer program product, characterized in that the computer program product includes program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1-19.