A system fault diagnosis analysis method and related device
By standardizing and chaining multi-source heterogeneous fault data, and combining multi-dimensional verification and comprehensive confidence scoring, the problems of poor adaptability of multi-source heterogeneous data and uninterpretable diagnosis in existing technologies are solved, enabling accurate and reliable diagnosis of complex system faults and reducing the rate of misdiagnosis and missed diagnosis, as well as operation and maintenance costs.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGDONG KAITONG SOFTWARE DEV
- Filing Date
- 2026-03-31
- Publication Date
- 2026-06-23
AI Technical Summary
Existing technologies for fault diagnosis of complex systems suffer from poor adaptability to multi-source heterogeneous data, uninterpretable diagnostic results, and high rates of misdiagnosis and missed diagnosis. They also lack effective verification mechanisms and cannot meet the requirements for high precision and high reliability.
By acquiring multi-source heterogeneous fault data and converting it into structured and standardized input data, a pre-trained fault diagnosis model is used for chain-like reasoning. Combined with a meta-evaluator, multi-dimensional verification and penalty factor fusion are performed to generate a comprehensive confidence score, thereby achieving interpretability and reliability of the diagnosis process.
It significantly improves the accuracy and reliability of fault diagnosis, reduces operation and maintenance costs, and adapts to the fault diagnosis needs of various complex systems.
Figure CN122262584A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of intelligent analysis technology, and more specifically, to a system fault diagnosis and analysis method and related equipment.
Background Technology
[0002] In fields such as industrial automation, power communication, and intelligent equipment operation and maintenance, which rely on the operation of complex systems, system fault diagnosis is a core link to ensure the continuous and stable operation of equipment, reduce operation and maintenance costs, and avoid safety accidents. Therefore, providing efficient, accurate, and reliable fault diagnosis solutions is of great practical necessity.
[0003] Currently, existing technologies for system fault diagnosis in the industry mainly fall into two categories. One category is rule-based expert diagnostic systems. These systems rely on domain experts manually writing fault judgment rules, matching fault phenomena with their causes through pre-defined logic. However, they have significant limitations: extremely poor adaptability to various types of unstructured fault data, unable to handle multi-source heterogeneous data such as logs, real-time sensor data, and maintenance records; lagging rule base updates, making it difficult to cope with new faults arising after system upgrades; and high cost and low efficiency in manually maintaining the rule base. The other category is diagnostic methods based on traditional machine learning. These methods predict fault data by constructing a single classification model. While they can handle some structured fault data, they cannot effectively integrate multi-source heterogeneous data. The diagnostic reasoning process is a black box, making it impossible to trace the diagnostic basis and reasoning logic, resulting in extremely poor interpretability of the diagnostic results.
[0004] At the same time, existing technologies generally lack effective verification mechanisms for the diagnostic process, making it impossible to verify the logical rationality between diagnostic steps, the matching degree between diagnostic conclusions and evidence, or to quantify the reliability of diagnostic results. This leads to problems such as logical breaks and insufficient evidence in the diagnostic process, resulting in a high rate of misdiagnosis and missed diagnosis. It is difficult to meet the high precision and high reliability requirements of complex systems for fault diagnosis, and many shortcomings of existing technologies urgently need to be addressed.
[0005] Therefore, there is an urgent need for a new system fault diagnosis and analysis method to overcome the shortcomings of existing technologies and achieve accurate, reliable and interpretable diagnosis of complex system faults.
Summary of the Invention
[0006] This application provides a system fault diagnosis and analysis method and related equipment, which significantly improves the accuracy, reliability and interpretability of system fault diagnosis, reduces operation and maintenance costs, and adapts to the fault diagnosis needs of various complex systems.
[0007] A system fault diagnosis and analysis method, comprising:
[0008] Acquire multi-source heterogeneous fault data related to the target system, and convert the multi-source heterogeneous fault data into structured standardized input data;
[0009] The standardized input data is input into a pre-trained fault diagnosis model for chain reasoning to generate an initial diagnosis trajectory. The initial diagnosis trajectory consists of multiple diagnosis nodes arranged in reasoning order. Each diagnosis node contains the reasoning statement of the diagnosis node, the index of the evidence field on which the reasoning statement is based in the standardized input data, and the intermediate conclusion of the diagnosis node.
[0010] The initial diagnostic trajectory is input into a meta-evaluator, which performs multi-dimensional verification on each diagnostic node in the initial diagnostic trajectory. The multi-dimensional verification includes at least: a first dimension verification of the logical consistency between the intermediate conclusion of the diagnostic node and the evidence on which it is based; a second dimension verification of whether the logical succession relationship between adjacent diagnostic nodes conforms to a preset troubleshooting order; and a third dimension verification of the sufficiency of the evidence on which the diagnostic node is based.
[0011] Based on the results of the multidimensional verification, a comprehensive confidence score for the initial diagnostic trajectory is generated using a penalty factor fusion method, and the initial diagnostic trajectory is output in association with the comprehensive confidence score.
[0012] Optionally, the first dimension verification includes:
[0013] Extract the index of the evidence field on which the diagnostic node is based, obtain the indicator value corresponding to that index from the standardized input data, verify the logical consistency between the intermediate conclusion of the diagnostic node and the indicator value, and mark the diagnostic node as an evidence conflict node if they are inconsistent.
[0014] Optionally, the criteria for judging logical consistency include predefined indicator threshold rules and logical relationship rules between indicators, wherein the indicator threshold rules are used to determine whether an indicator value is within its normal range, and the logical relationship rules between indicators are used to determine whether the magnitude relationship between multiple indicator values conforms to preset constraints.
[0015] Optionally, the second dimension verification includes:
[0016] Based on a predefined fault troubleshooting order rule set, check whether the logical succession relationship between adjacent diagnostic nodes conforms to the rule set. If it does not, mark the position of the logical jump and identify the missing diagnostic node.
[0017] Optionally, the fault troubleshooting order rule set is a directed acyclic graph structure, where nodes in the graph represent troubleshooting steps and directed edges represent prerequisite dependencies between steps.
[0018] The verification of whether the logical connection relationship between adjacent diagnostic nodes conforms to the fault troubleshooting order rule set includes:
[0019] Each diagnostic node is mapped to its corresponding node in the directed acyclic graph. It is then checked whether there is a directed path from the mapped node of the previous diagnostic node to the mapped node of the current diagnostic node. If there is no such path, a logical jump is determined, and the missing diagnostic node is identified using a shortest path algorithm on the graph.
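The path check above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the graph contents, the `entry` step, and the function names are hypothetical, and the source does not say between which endpoints the shortest path is computed. The sketch assumes one possible reading, in which the missing steps are the unvisited prerequisites on the shortest path from the entry step to the current step.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # troubleshooting step -> steps that may follow it


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns the step list from start to goal, or None."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def find_jumps(graph: Graph, entry: str, steps: List[str]) -> List[dict]:
    """Flag adjacent steps with no directed path between them."""
    jumps = []
    for i in range(1, len(steps)):
        prev, cur = steps[i - 1], steps[i]
        if shortest_path(graph, prev, cur) is None:
            # Assumption: missing steps are the prerequisites on the shortest
            # entry -> cur path that the trajectory has not visited so far.
            route = shortest_path(graph, entry, cur) or []
            visited = set(steps[:i])
            missing = [s for s in route[:-1] if s not in visited]
            jumps.append({"position": i, "missing": missing})
    return jumps


# Hypothetical rule set: power is checked first, then network or disk.
RULES: Graph = {
    "power": ["network", "disk"],
    "network": ["service"],
    "disk": ["raid"],
    "raid": [],
    "service": [],
}
print(find_jumps(RULES, "power", ["power", "network", "raid"]))
```

In this toy rule set there is no path from `network` to `raid`, so the third step is flagged and `disk` is reported as the skipped prerequisite.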
[0020] Optionally, the third dimension verification includes:
[0021] For diagnostic nodes that are not marked as evidence conflict nodes, a sufficiency score is calculated. Diagnostic nodes whose score is below a preset threshold are marked as nodes with insufficient evidence. The sufficiency score is a weighted combination of the number of evidence fields on which the diagnostic node is based, the dispersion of the indicator values of those evidence fields, and the weight of the corresponding indicators in a predefined set of key indicators.
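A weighted sufficiency score of this kind might look like the sketch below. The source names the three inputs but not how they are normalized or combined, so everything specific here is an assumption: the term weights `w`, the cap `max_fields`, the use of the coefficient of variation as the dispersion measure, and the choice that more stable readings score higher.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def sufficiency_score(
    evidence: Dict[str, List[float]],      # evidence field -> sampled indicator values
    key_weights: Dict[str, float],         # key-indicator weights in [0, 1]
    w: Tuple[float, float, float] = (0.4, 0.2, 0.4),  # hypothetical term weights
    max_fields: int = 3,                   # hypothetical saturation point
) -> float:
    if not evidence:
        return 0.0
    # Term 1: number of evidence fields, saturating at max_fields.
    count_term = min(len(evidence) / max_fields, 1.0)
    # Term 2: dispersion, measured per field as a coefficient of variation
    # capped at 1. Assumption: lower dispersion means stronger evidence.
    cvs = []
    for samples in evidence.values():
        m = mean(samples)
        cvs.append(min(pstdev(samples) / abs(m), 1.0) if m else 1.0)
    stability_term = 1.0 - mean(cvs)
    # Term 3: average key-indicator weight of the cited fields.
    key_term = mean(key_weights.get(name, 0.0) for name in evidence)
    return w[0] * count_term + w[1] * stability_term + w[2] * key_term


def is_insufficient(score: float, threshold: float = 0.5) -> bool:
    """Mark the node when its score falls below the (hypothetical) threshold."""
    return score < threshold
```

A node that cites a single low-weight indicator scores well below one that cites several stable key indicators, which is the ordering the threshold test relies on.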
[0022] Optionally, the step of generating the comprehensive confidence score of the initial diagnostic trajectory using a penalty factor fusion method includes:
[0023] Based on the results of the multidimensional verification, an initial confidence score is used as the base score. The score is multiplied by a first penalty factor for each node with conflicting evidence, by a second penalty factor for each logical jump, and by a third penalty factor for each node with insufficient evidence, yielding the comprehensive confidence score. The first penalty factor is less than the second penalty factor, and the second penalty factor is less than the third penalty factor.
[0024] Optionally, the training process of the fault diagnosis large model includes:
[0025] Acquire knowledge of operation and maintenance fault diagnosis, and construct the knowledge of operation and maintenance fault diagnosis into a directed acyclic graph structure containing fault types, troubleshooting steps, judgment conditions and conclusions, as a structured experience graph;
[0026] The structured experience graph is compiled into meta-prompts, which drive a teacher large model to simulate and reason through historical fault cases, generating chain-of-thought data containing complete reasoning trajectories, from which a supervised training set is constructed. The meta-prompts contain instructions for generating evidence field indexes, so that the teacher large model outputs the index of the evidence it relies on together with each reasoning statement.
[0027] A base large language model undergoes supervised fine-tuning on the supervised training set. The training objective is to minimize the cross-entropy loss between the output of the base large language model and the chain-of-thought data, thereby obtaining a preliminary diagnostic model.
[0028] A multidimensional reward function is constructed, which includes a first reward component for evaluating the accuracy of the diagnostic conclusion, a second reward component for evaluating the degree of matching between the reasoning steps and the structured experience graph, a third reward component for evaluating the correctness of the evidence index, and a fourth reward component for evaluating the conformity of the output format. The preliminary diagnostic model is optimized using a group relative policy optimization algorithm to obtain the dedicated fault diagnosis large model.
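As a rough illustration of how the four reward components could feed the optimization step, the sketch below combines them into one scalar and normalizes rewards within a group of outputs sampled for the same input. It assumes the optimization algorithm named above is Group Relative Policy Optimization (GRPO), in which each sampled output is scored relative to its group. The component weights, the names, and the use of the population standard deviation are assumptions; the source specifies none of them.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical weights for the four reward components.
WEIGHTS = {"accuracy": 0.4, "graph_match": 0.3, "evidence_index": 0.2, "format": 0.1}


def total_reward(components: Dict[str, float]) -> float:
    """Weighted sum of the four reward components (each assumed in [0, 1])."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two sampled trajectories for the same fault case (made-up component scores).
group = [
    {"accuracy": 1.0, "graph_match": 0.75, "evidence_index": 1.0, "format": 1.0},
    {"accuracy": 0.0, "graph_match": 0.50, "evidence_index": 0.5, "format": 1.0},
]
advantages = group_relative_advantages([total_reward(c) for c in group])
```

Outputs that beat their group's average receive a positive advantage and are reinforced; the policy-gradient update itself is outside the scope of this sketch.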
[0029] Optionally, in the second reward component, the matching degree between the reasoning steps and the structured experience graph is calculated by aligning the sequence of reasoning steps with the paths in the directed acyclic graph structure, and calculating the matching score based on the edit distance or longest common subsequence algorithm.
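The longest-common-subsequence variant of this matching score can be sketched as below. The normalization (dividing by the longer of the two sequences) and taking the best score over all reference paths are assumptions made for illustration; the source only names the algorithms.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def match_score(steps: Sequence[str], graph_paths: List[Sequence[str]]) -> float:
    """Best LCS ratio between the reasoning steps and any reference path."""
    if not steps:
        return 0.0
    return max(
        (lcs_length(steps, p) / max(len(steps), len(p)) for p in graph_paths),
        default=0.0,
    )


# Hypothetical reference paths taken from an experience graph.
paths = [["power", "network", "service"], ["power", "disk", "raid", "service"]]
print(match_score(["power", "disk", "service"], paths))  # 0.75
```

The reasoning sequence shares three steps in order with the second path, which has four steps, giving 3/4.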
[0030] Optionally, the method further includes:
[0031] When the comprehensive confidence score is lower than a preset threshold, the standardized input data, the initial diagnostic trajectory, and the results of the multidimensional verification are pushed to a manual review interface, and the corrected diagnostic trajectory returned after manual review is obtained. The standardized input data and the corrected diagnostic trajectory are then combined into training samples and fed back into the iterative training set of the fault diagnosis large model and the meta-evaluator.
[0032] Optionally, the method further includes:
[0033] Based on the causal structure model of the target system, counterfactual reasoning is performed on the standardized input data to generate a system resilience analysis report. The counterfactual reasoning includes applying a preset perturbation to the key variables in the current system state vector, simulating whether a fault alarm is triggered in the counterfactual scenario after the perturbation, and identifying the redundancy mechanism or compensation mechanism in the target system based on the simulation results.
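The perturb-and-simulate loop described above can be illustrated with a toy model. This sketch covers only the intervention and prediction steps, and the causal model itself (two pumps feeding one flow alarm), the variable names, and the alarm threshold are all hypothetical stand-ins for the target system's causal structure model.

```python
from typing import Callable, Dict

State = Dict[str, float]


def resilience_report(
    state: State,
    alarm: Callable[[State], bool],
    perturbations: Dict[str, float],
) -> Dict[str, bool]:
    """For each perturbed key variable, report whether the system compensates.

    True means no alarm fired in the counterfactual scenario, which suggests a
    redundancy or compensation mechanism covers that variable.
    """
    report = {}
    for var, value in perturbations.items():
        counterfactual = {**state, var: value}  # intervene on one variable only
        report[var] = not alarm(counterfactual)
    return report


def toy_alarm(s: State) -> bool:
    """Hypothetical causal model: each running pump supplies 60 units of flow."""
    flow = 60.0 * s["pump_a_on"] + 60.0 * s["pump_b_on"]
    return flow < 50.0


both_on = {"pump_a_on": 1.0, "pump_b_on": 1.0}
print(resilience_report(both_on, toy_alarm, {"pump_a_on": 0.0}))
```

With both pumps running, switching off pump A raises no alarm because pump B covers the demand, so the report marks that variable as compensated. Starting from a state where pump B is already off, the same perturbation triggers the alarm.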
[0034] A system fault diagnosis and analysis device, comprising a memory and a processor;
[0035] The memory is used to store programs;
[0036] The processor is used to execute the program to implement the various steps of the system fault diagnosis and analysis method as described in any of the above claims.
[0037] A readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the system fault diagnosis and analysis method as described in any of the preceding claims.
[0038] As can be seen from the above technical solutions, the system fault diagnosis and analysis method and related equipment provided in this application achieve accurate, reliable, and interpretable diagnosis of system faults through standardized processing of multi-source heterogeneous fault data, chain-like reasoning of a large-scale fault diagnosis model, multi-dimensional verification of the meta-evaluator, and generation of comprehensive confidence scores. First, by acquiring multi-source heterogeneous fault data of the target system and converting it into structured standardized input data, unified integration and standardization of fault data of different types and formats are achieved, solving the shortcomings of existing technologies in adapting to multi-source heterogeneous data and effectively expanding the data coverage of fault diagnosis. Second, the standardized input data is input into a pre-trained large-scale fault diagnosis model for chain-like reasoning, generating an initial diagnostic trajectory containing reasoning statements, evidence indexes, and intermediate conclusions. By clearly presenting the basis and conclusions of each step of the diagnosis, the black-box dilemma of traditional diagnostic methods is broken, achieving interpretability of the diagnostic process and facilitating maintenance personnel to trace the diagnostic logic and investigate the root cause of problems. Furthermore, a meta-evaluator performs multi-dimensional verification on each diagnostic node of the initial diagnostic trajectory, comprehensively reviewing the diagnostic process from three dimensions: logical consistency, logical continuity between adjacent nodes, and sufficiency of evidence. This effectively avoids problems such as logical breaks and insufficient evidence caused by a lack of verification in existing technologies, significantly reducing the probability of misdiagnosis and missed diagnosis. 
Finally, a comprehensive confidence score is generated using a penalty factor fusion method and correlated with the initial diagnostic trajectory for output, providing a quantitative basis for the reliability of the diagnostic results. In summary, this application addresses the core deficiencies of existing technologies one by one through the above technical means, significantly improving the accuracy, reliability, and interpretability of system fault diagnosis, reducing operation and maintenance costs, and adapting to the fault diagnosis needs of various complex systems.
Attached Figure Description
[0039] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0040] Figure 1 is a flowchart of a system fault diagnosis and analysis method disclosed in an embodiment of this application;
[0041] Figure 2 is a hardware structure block diagram of a system fault diagnosis and analysis device disclosed in an embodiment of this application.
Detailed Implementation
[0042] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0043] This application can be used in a wide variety of general-purpose or special-purpose computing device environments or configurations. For example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor devices, distributed computing environments including any of the above devices, etc.
[0044] The following section introduces the solution proposed in this application. The technical solution is as follows, and details are provided below.
[0045] Figure 1 is a flowchart of a system fault diagnosis and analysis method disclosed in an embodiment of this application.
[0046] As shown in Figure 1, the method may include:
[0047] Step S1: Obtain multi-source heterogeneous fault data related to the target system, and convert the multi-source heterogeneous fault data into structured standardized input data.
[0048] Specifically, during actual operation, the target system generates a large amount of raw fault-related data from different functional modules, sensing units, and maintenance processes. This type of data is multi-source heterogeneous fault data, and its sources include, but are not limited to, system operation log files, real-time sensor parameter data, fault alarm messages, historical maintenance records, equipment configuration parameter information, and user-reported fault phenomenon descriptions. The fault data from these different sources exhibit significant differences in data format, data type, data representation standards, and data structure. For example, log data is mostly in unstructured text format, sensor data is a structured numerical sequence, and maintenance records may be in semi-structured tabular form. This heterogeneity can create data adaptation obstacles for subsequent fault reasoning and analysis.
[0049] To eliminate format and structural differences among multi-source heterogeneous data and ensure that the data can be effectively identified, parsed, and utilized by subsequent fault diagnosis models, this step standardizes the acquired raw fault data. First, data cleaning is performed on the raw multi-source heterogeneous fault data. This includes removing invalid records, redundant interference data, abnormal noise data, and completing missing values to ensure the quality and integrity of the input data. Then, according to a pre-defined unified data specification standard, the cleaned data undergoes format conversion and field standardization. The names, definitions, data types, and relationships between fields are clarified, and fault data of different formats and types are uniformly converted into a structured data format. This ultimately forms standardized input data with a unified format, standardized structure, and clear semantics, providing a high-quality, unified data foundation for subsequent fault diagnosis reasoning.
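The cleaning and field standardization in step S1 can be sketched for two of the sources mentioned, sensor samples and log lines. The unified record layout (`source`, `timestamp`, `field`, `value`), the input shapes, and the log line format are assumptions made for this example, and for brevity the sketch drops records with missing readings instead of completing them.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


def normalize_sensor(sample: Dict) -> Dict:
    """Map a raw sensor sample {"ts", "name", "reading"} to the unified layout."""
    return {
        "source": "sensor",
        "timestamp": datetime.fromtimestamp(sample["ts"], tz=timezone.utc).isoformat(),
        "field": sample["name"],
        "value": float(sample["reading"]),
    }


def normalize_log(line: str) -> Optional[Dict]:
    """Parse an assumed log layout: "<ISO timestamp> <LEVEL> <message>"."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 3:
        return None  # invalid record, discarded during cleaning
    ts, level, message = parts
    return {"source": "log", "timestamp": ts, "field": f"log.{level.lower()}", "value": message}


def standardize(sensor_samples: Iterable[Dict], log_lines: Iterable[str]) -> List[Dict]:
    # Cleaning: skip samples with missing readings and unparseable log lines.
    records = [normalize_sensor(s) for s in sensor_samples if s.get("reading") is not None]
    records += [r for r in map(normalize_log, log_lines) if r is not None]
    # Cleaning: remove exact duplicates while keeping the first occurrence.
    seen, unique = set(), []
    for r in records:
        key = (r["source"], r["timestamp"], r["field"], r["value"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # ISO-8601 strings with the same UTC offset sort chronologically.
    return sorted(unique, key=lambda r: r["timestamp"])
```

Once every source is reduced to the same four fields, downstream steps can refer to evidence by field name regardless of where it originated.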
[0050] Step S2: Input the standardized input data into the pre-trained fault diagnosis model for chain reasoning to generate an initial diagnostic trajectory. The initial diagnostic trajectory consists of multiple diagnostic nodes arranged in the order of reasoning. Each diagnostic node contains the reasoning statement of the diagnostic node, the index of the evidence field on which the reasoning statement is based in the standardized input data, and the intermediate conclusion of the diagnostic node.
[0051] Specifically, the fault diagnosis large model is a deep learning model pre-trained on large-scale fault sample data, domain expertise, and diagnostic logic rules. It provides deep feature extraction, semantic understanding, and multi-step chain reasoning over the standardized input data, simulating the troubleshooting logic and reasoning process of professional maintenance personnel. After the standardized input data is input into the pre-trained fault diagnosis large model, the model first parses the input data and extracts key fault features and fault phenomenon descriptions. Then, following the preset troubleshooting logic and the reasoning rules learned during training, the model starts from the initial fault features and reasons step by step, successively deriving staged diagnostic results such as the potential causes of the fault, the scope of its impact, and the location of the fault point, forming a coherent reasoning sequence.
[0052] The chain-like reasoning process is output and stored in the form of an initial diagnostic trajectory. The initial diagnostic trajectory consists of several diagnostic nodes arranged sequentially according to the reasoning order. Each diagnostic node is a basic unit constituting a complete reasoning process and contains three types of core information: First, reasoning statements, which clearly describe the reasoning process, logical basis, and analysis ideas corresponding to the diagnostic node, realizing the interpretability of the reasoning process; second, evidence field indexes, which clarify which field and position in the standardized input data the specific data on which the reasoning process of the diagnostic node depends comes from, facilitating the subsequent tracing and verification of the reasoning basis; and third, intermediate conclusions, which are the stage-by-stage fault diagnosis results obtained by the diagnostic node through reasoning. These intermediate conclusions will serve as the basis for the reasoning of the next diagnostic node, and finally, a complete fault diagnosis reasoning trajectory is formed by the chaining of multiple diagnostic nodes.
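One way to represent the trajectory described above is a pair of small data classes. The class and attribute names are hypothetical, and the sketch assumes evidence field indexes are string keys into the standardized input data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DiagnosisNode:
    reasoning: str             # reasoning statement for this step
    evidence_index: List[str]  # keys into the standardized input data
    conclusion: str            # intermediate conclusion of this step


@dataclass
class DiagnosisTrajectory:
    nodes: List[DiagnosisNode] = field(default_factory=list)

    def final_conclusion(self) -> str:
        """The last node's conclusion, or an empty string for an empty trajectory."""
        return self.nodes[-1].conclusion if self.nodes else ""

    def cited_evidence(self, standardized: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Resolve each node's evidence indexes to values for traceability."""
        return [{k: standardized.get(k) for k in n.evidence_index} for n in self.nodes]


# Made-up two-step trajectory over made-up standardized data.
data = {"cpu_temp": 95.0, "fan_rpm": 0.0}
trajectory = DiagnosisTrajectory([
    DiagnosisNode("CPU temperature exceeds its limit.", ["cpu_temp"], "overheating"),
    DiagnosisNode("Fan speed is zero while the CPU overheats.", ["fan_rpm", "cpu_temp"], "fan failure"),
])
```

Keeping the evidence indexes separate from the free-text reasoning is what lets a verifier look up the cited values without parsing the reasoning statement.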
[0053] Step S3: Input the initial diagnostic trajectory into the meta-evaluator to perform multi-dimensional verification on each diagnostic node in the initial diagnostic trajectory. The multi-dimensional verification includes at least: a first dimension verification of the logical consistency between the intermediate conclusion of the diagnostic node and the evidence on which it is based; a second dimension verification of whether the logical succession relationship between adjacent diagnostic nodes conforms to the preset troubleshooting order; and a third dimension verification of the sufficiency of the evidence on which the diagnostic node is based.
[0054] Specifically, the meta-evaluator is a specialized verification module independent of the large-scale fault diagnosis model. Its core function is to comprehensively review the reasoning rationality, logical rigor, and evidence sufficiency of the initial diagnostic trajectory, in order to filter out problematic nodes with logical loopholes or evidentiary deficiencies in the reasoning process, thus ensuring the reliability of the diagnostic trajectory. After the generated initial diagnostic trajectory is input into the meta-evaluator, the meta-evaluator performs multi-dimensional verification operations on each diagnostic node in the trajectory. This multi-dimensional verification includes at least three core verification dimensions, and each dimension is independent yet complementary to the others.
[0055] The first dimension is logical consistency verification. It checks whether there are logical contradictions or causal disconnects between the intermediate conclusion of each diagnostic node and the evidence fields on which that node is based, verifying that the conclusion can be reasonably derived from the corresponding evidence and preventing unfounded conclusions or mismatches between evidence and conclusions. The second dimension is logical continuity verification. It checks whether the reasoning between two adjacent diagnostic nodes is coherent and conforms to the predefined troubleshooting order, the standard reasoning process, and the professional logical rules of the domain, avoiding skipped steps, logical breaks, or out-of-order reasoning. The third dimension is evidence sufficiency verification. It checks whether the evidence fields on which each diagnostic node is based are complete and sufficient to support the node's intermediate conclusion, that is, whether missing evidence or an insufficient quantity or variety of evidence reduces the reliability of the conclusion. Through verification across these three dimensions, the meta-evaluator identifies the problematic nodes in the initial diagnostic trajectory and provides clear verification results for the subsequent confidence score calculation.
[0056] Step S4: Based on the results of the multidimensional verification, a comprehensive confidence score for the initial diagnostic trajectory is generated using a penalty factor fusion method, and the initial diagnostic trajectory is associated with the comprehensive confidence score and output.
[0057] Further, generating the comprehensive confidence score for the initial diagnostic trajectory using the penalty factor fusion method includes:
[0058] Based on the results of the multidimensional verification, an initial confidence score is used as the base score. The score is multiplied by a first penalty factor for each node with conflicting evidence, by a second penalty factor for each logical jump, and by a third penalty factor for each node with insufficient evidence, yielding the comprehensive confidence score. The first penalty factor is less than the second penalty factor, and the second penalty factor is less than the third penalty factor.
[0059] Specifically, based on the multi-dimensional verification results, the overall reliability of the initial diagnostic trajectory is quantitatively evaluated, a comprehensive confidence score is generated, and the score is correlated with the diagnostic trajectory for output, providing a reference for the decision-making of maintenance personnel. First, after the meta-evaluator completes the multi-dimensional verification of all diagnostic nodes, it outputs the verification status of each node, clarifying whether there are problems such as evidence conflicts, logical jumps, or insufficient evidence in each node. Then, based on the verification results, a penalty factor fusion method is used to calculate the comprehensive confidence score. This score uses a preset initial confidence score as the base score, which is determined based on the training accuracy of the fault diagnosis model, historical diagnostic performance, and domain-general standards.
[0060] For the problematic nodes identified during verification, different penalty factors are applied to the base score: for each diagnostic node with conflicting evidence, the score is multiplied by a first penalty factor; for each logical jump, by a second penalty factor; and for each diagnostic node with insufficient evidence, by a third penalty factor. The first penalty factor is less than the second, and the second is less than the third. Because the factors are applied multiplicatively, a smaller factor lowers the score more (taking each factor to lie between 0 and 1), so conflicting evidence is penalized most heavily, followed by logical jumps, with insufficient evidence penalized least. Fusing these penalty factors yields a comprehensive confidence score that reflects the overall reliability of the initial diagnostic trajectory. Finally, the initial diagnostic trajectory is associated with the comprehensive confidence score and output together to the user interface or operation and maintenance system, so that maintenance personnel receive the complete fault reasoning trajectory along with a quantitative indication of its reliability to support fault handling, equipment maintenance, and other decisions.
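The penalty factor fusion amounts to multiplying the base score by each factor once per occurrence of the corresponding problem. A minimal sketch follows; the default factor values and the requirement that every factor lie in (0, 1] are assumptions, since the source fixes only the ordering of the three factors.

```python
def fused_confidence(
    base: float,
    n_conflict: int,
    n_jump: int,
    n_insufficient: int,
    p_conflict: float = 0.5,      # first penalty factor (hypothetical value)
    p_jump: float = 0.7,          # second penalty factor (hypothetical value)
    p_insufficient: float = 0.9,  # third penalty factor (hypothetical value)
) -> float:
    """Multiply the base score by one penalty factor per detected problem."""
    # The source requires first < second < third; the (0, 1] range is assumed.
    if not (0.0 < p_conflict < p_jump < p_insufficient <= 1.0):
        raise ValueError("expected 0 < p_conflict < p_jump < p_insufficient <= 1")
    return (
        base
        * p_conflict ** n_conflict
        * p_jump ** n_jump
        * p_insufficient ** n_insufficient
    )


# One evidence conflict and two insufficient-evidence nodes on a 0.9 base score.
score = fused_confidence(0.9, n_conflict=1, n_jump=0, n_insufficient=2)
```

For this input the score is 0.9 × 0.5 × 0.9² = 0.3645. A trajectory with no flagged nodes keeps its base score unchanged.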
[0061] This application, through the collaborative design of steps S1 to S4, achieves effective adaptation of multi-source heterogeneous data, interpretability of the diagnostic reasoning process, rigor of diagnostic logic, and quantitative reliability of diagnostic results. It comprehensively solves the core pain points in the prior art and is applicable to fault diagnosis scenarios of various complex systems such as industrial automation, power communication, and intelligent equipment operation and maintenance. It significantly improves the efficiency and accuracy of fault diagnosis and reduces system operation and maintenance costs.
[0063] In some embodiments of this application, in step S3, after the initial diagnostic trajectory is input to the meta-evaluator, the meta-evaluator performs multi-dimensional verification operations on each diagnostic node in the initial diagnostic trajectory. Through multi-dimensional and comprehensive review, it ensures the logical rigor, evidentiary sufficiency, and procedural standardization of the diagnostic reasoning process, providing an accurate basis for the subsequent generation of comprehensive confidence scores. The specific implementation process of the three core dimensions of multi-dimensional verification is as follows:
[0064] The first dimension of verification includes:
[0065] Extract the index of the evidence field on which the diagnostic node is based, obtain the index value corresponding to the index from the standardized input data, verify the logical consistency between the intermediate conclusion of the diagnostic node and the index value, and mark the diagnostic node as an evidence conflict node if they are inconsistent.
[0066] The criteria for judging logical consistency include predefined indicator threshold rules and logical relationship rules between indicators. The indicator threshold rules are used to determine whether the indicator value is within the normal range, and the logical relationship rules between indicators are used to determine whether the size relationship between multiple indicator values meets the preset constraints.
[0067] Specifically, the first dimension verification focuses on checking the logical consistency between the intermediate conclusions of each diagnostic node and the evidence on which they are based, preventing discrepancies or contradictions between evidence and conclusions. The specific process includes: First, the meta-evaluator extracts the index information corresponding to the evidence fields on which the reasoning statement of the current diagnostic node is based from the current diagnostic node to be verified. This index information can accurately locate the specific field position in the standardized input data generated in step S1. Then, based on the extracted index, the corresponding indicator value is retrieved from the standardized input data to ensure that the obtained evidence data is completely consistent with the reasoning basis of the diagnostic node, avoiding verification deviations caused by incorrect evidence retrieval. Next, based on the preset judgment criteria, the logical consistency between the intermediate conclusions of the diagnostic node and the retrieved indicator values is verified. If there is a logical contradiction between the two and they cannot support each other, the diagnostic node is marked as an evidence conflict node, which facilitates subsequent targeted penalty adjustments.
[0068] The judgment criteria for logical consistency mainly include two types of predefined rules, which complement and work together to ensure the accuracy of the verification. The first type is the indicator threshold rule. This rule pre-sets corresponding normal ranges for various indicators in the standardized input data. By judging whether the retrieved indicator values are within the preset normal range, the rationality of the evidence is initially verified, and then the logical matching degree between the intermediate conclusion and the evidence is judged. The second type is the logical relationship rule between indicators. This rule pre-sets clear constraints for multiple indicators that are related. It is used to judge whether the size relationship, ratio relationship, etc. between multiple related indicator values meet the preset constraints, avoiding the situation where a single indicator is normal but the logical relationship between multiple indicators is abnormal, and further improving the comprehensiveness and rigor of the logical consistency verification.
[0069] In the actual verification process, the meta-evaluator first performs preliminary verification of individual evidence indicator values using indicator threshold rules. If an indicator value exceeds the normal range and the intermediate conclusion does not reflect the anomaly, or if the conclusion contradicts the fault phenomenon reflected by the anomaly indicator, it is initially determined to be logically inconsistent. Subsequently, multiple related indicators are verified using logical relationship rules between indicators. For example, if the intermediate conclusion of a diagnostic node is "the system load is too high, causing the fault," the evidence it is based on is two indicators: CPU utilization and memory utilization. The preset logical relationship rule between indicators is "only when both CPU utilization and memory utilization are higher than the threshold can it be determined that the system load is too high." If only one indicator is higher than the threshold and the other is in the normal range, it is determined that the intermediate conclusion of the node is logically inconsistent with the evidence and is marked as an evidence conflict node.
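The first dimension verification described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the threshold values, the `system_normal` and `system_load_too_high` conclusion labels, and the names `NORMAL_RANGES`, `RELATION_RULES`, `DiagnosticNode`, and `is_evidence_conflict` are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical indicator threshold rules: (lower, upper) bounds of the normal range.
NORMAL_RANGES: Dict[str, Tuple[float, float]] = {
    "cpu_utilization": (0.0, 0.85),
    "memory_utilization": (0.0, 0.80),
}


def out_of_range(name: str, value: float) -> bool:
    """Indicator threshold rule: is the value outside its normal range?"""
    low, high = NORMAL_RANGES[name]
    return not (low <= value <= high)


# Hypothetical inter-indicator rules, keyed by a machine-readable conclusion label.
# Each rule returns True when the evidence supports the conclusion.
RELATION_RULES: Dict[str, Callable[[Dict[str, float]], bool]] = {
    "system_load_too_high": lambda v: (
        v["cpu_utilization"] > NORMAL_RANGES["cpu_utilization"][1]
        and v["memory_utilization"] > NORMAL_RANGES["memory_utilization"][1]
    ),
}


@dataclass
class DiagnosticNode:
    statement: str             # reasoning statement
    evidence_index: List[str]  # indexes of evidence fields in the standardized input
    conclusion: str            # intermediate conclusion (label form is an assumption)


def is_evidence_conflict(node: DiagnosticNode, standardized_input: Dict[str, float]) -> bool:
    """Return True when the node should be marked as an evidence conflict node."""
    # Retrieve exactly the indicator values the node claims to rely on.
    values = {name: standardized_input[name] for name in node.evidence_index}
    # Threshold rule: a "normal" conclusion contradicts any abnormal indicator.
    if node.conclusion == "system_normal":
        return any(out_of_range(n, v) for n, v in values.items() if n in NORMAL_RANGES)
    rule = RELATION_RULES.get(node.conclusion)
    if rule is None:
        return False  # no rule registered for this conclusion, nothing to contradict
    try:
        return not rule(values)
    except KeyError:
        return True  # the rule needs an indicator the node did not cite
```

With CPU utilization at 0.95 and memory utilization at 0.40, a node concluding `system_load_too_high` is flagged, which mirrors the example in the text where only one of the two indicators exceeds its threshold.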
[0070] The second dimension of verification includes:
[0071] Based on a predefined fault troubleshooting order rule set, it is checked whether the logical connection between adjacent diagnostic nodes conforms to the rule set. If it does not conform, the jump position is marked and the missing diagnostic node is identified.
[0072] The fault troubleshooting order rule set is a directed acyclic graph structure, where nodes represent troubleshooting steps and directed edges represent prerequisite dependencies between steps.
[0073] The verification of whether the logical connection relationship between adjacent diagnostic nodes conforms to the fault troubleshooting order rule set includes:
[0074] Each diagnostic node is mapped to the corresponding node in the directed acyclic graph. It is checked whether there is a directed path from the node mapped by the previous diagnostic node to the node mapped by the current diagnostic node. If there is no such path, a logical jump is determined, and the missing diagnostic node is identified using a shortest path algorithm on the graph.
[0075] Specifically, the second dimension verification is mainly used to check whether the reasoning between adjacent diagnostic nodes is coherent and conforms to the preset fault troubleshooting process, avoiding problems such as reasoning jumps and logical breaks. The specific process includes: First, the meta-evaluator calls the predefined fault troubleshooting order rule set. This rule set is built from domain expertise, standard fault diagnosis processes, and a large amount of historical operation and maintenance experience, and is stored and managed as a directed acyclic graph. In this directed acyclic graph, each node corresponds to a specific troubleshooting step, and the directed edges between nodes represent the prerequisite dependencies between steps. That is, one step can be executed only after the step it depends on has been completed, which fixes the order and logical relationship of fault troubleshooting.
[0076] Subsequently, the meta-evaluator maps each diagnostic node in the initial diagnostic trajectory to a corresponding node in the directed acyclic graph (DAG) according to its corresponding investigation content and intermediate conclusions, ensuring that each diagnostic node can find a unique corresponding investigation step node. Next, for two adjacent diagnostic nodes, it focuses on checking whether there is a direct or indirect directed path between the DAG node mapped by the previous diagnostic node and the DAG node mapped by the current diagnostic node. If no directed path exists, it means that the reasoning order of the two adjacent diagnostic nodes does not conform to the preset fault investigation logic and is judged as a logical jump. Finally, based on the shortest path algorithm of the DAG, the shortest path between the previous mapped node and the current mapped node is calculated. The nodes contained on this path that do not appear in the initial diagnostic trajectory are the missing diagnostic nodes. The meta-evaluator will mark the specific location of the logical jump and clearly identify the missing diagnostic nodes, providing a basis for the optimization of the subsequent diagnostic trajectory and confidence scoring.
[0077] For example, in the directed acyclic graph corresponding to the preset fault troubleshooting order rule set, the troubleshooting steps are "fault phenomenon collection → basic parameter detection → core module investigation → fault cause location", with a directed edge between each pair of consecutive steps. If two adjacent diagnostic nodes in the initial diagnostic trajectory are mapped to the "fault phenomenon collection" and "core module investigation" nodes respectively, the two mapped nodes are not directly connected, that is, the "basic parameter detection" step is skipped, and a logical jump is determined. The missing diagnostic node is identified as "basic parameter detection" by the shortest path algorithm, and the jump position is marked as between "fault phenomenon collection" and "core module investigation".
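The second dimension verification can be sketched as follows, using breadth-first search as the shortest path algorithm on an unweighted graph. The sketch makes one interpretive assumption: because the example skips "basic parameter detection" even though the two steps remain reachable through it, a jump is flagged whenever two adjacent trajectory nodes are not joined by a direct edge, and the case with no path at all is reported separately as an order violation. `RULE_DAG`, `shortest_path`, and `check_continuity` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical rule set taken from the example: step -> steps that may follow it.
RULE_DAG: Dict[str, List[str]] = {
    "fault phenomenon collection": ["basic parameter detection"],
    "basic parameter detection": ["core module investigation"],
    "core module investigation": ["fault cause location"],
    "fault cause location": [],
}


def shortest_path(dag: Dict[str, List[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; returns the node list from src to dst, or None."""
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            node: Optional[str] = cur
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in dag.get(cur, []):
            if nxt not in parents:
                parents[nxt] = cur
                queue.append(nxt)
    return None


def check_continuity(dag: Dict[str, List[str]], trajectory: List[str]) -> List[dict]:
    """Check each adjacent pair of mapped steps and report jumps with missing steps."""
    findings = []
    for i in range(1, len(trajectory)):
        prev, cur = trajectory[i - 1], trajectory[i]
        if cur in dag.get(prev, []):
            continue  # direct prerequisite edge: the two steps are contiguous
        path = shortest_path(dag, prev, cur)
        # Intermediate steps on the shortest path that the trajectory never visited.
        missing = [] if path is None else [n for n in path[1:-1] if n not in trajectory]
        findings.append({
            "position": (i - 1, i),
            "order_violation": path is None,  # no path at all: steps are out of order
            "missing": missing,
        })
    return findings
```

Calling `check_continuity(RULE_DAG, ["fault phenomenon collection", "core module investigation", "fault cause location"])` reports one finding between positions 0 and 1 whose missing step is "basic parameter detection".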
[0078] The third dimension of verification includes:
[0079] For diagnostic nodes that are not marked as conflicting evidence, the sufficiency score of the diagnostic node is calculated. Diagnostic nodes with scores below a preset threshold are marked as nodes with insufficient evidence. The sufficiency score is calculated by weighting the number of evidence fields on which the diagnostic node is based, the dispersion of the index values of the evidence fields, and the weight of the index corresponding to the evidence fields in a predefined set of key indicators.
[0080] Specifically, the third dimension verification focuses on diagnostic nodes that have not been marked as conflicting evidence nodes. It examines whether the evidence upon which their reasoning is based is sufficiently adequate to ensure the reliability of intermediate conclusions. The process includes: First, the meta-evaluator filters diagnostic nodes that have not been marked as conflicting evidence nodes after the first dimension verification. The intermediate conclusions of these nodes do not contradict the evidence they rely on, but the sufficiency of the evidence still needs further verification. Then, for each filtered diagnostic node, an evidence sufficiency score is calculated. This score is obtained through a weighted calculation of multi-dimensional indicators and can objectively reflect the sufficiency of the evidence relied upon by the node. Finally, the calculated evidence sufficiency score is compared with a preset score threshold. If the score is lower than the preset threshold, it indicates that the evidence relied upon by the diagnostic node is insufficient to support its intermediate conclusion, and it is marked as an insufficient evidence node.
[0081] The calculation of the evidence sufficiency score is based on three core elements, which are weighted according to a preset weight ratio to ensure the reasonableness and comprehensiveness of the score. The first element is the number of evidence fields upon which the diagnostic node is based. A larger number of evidence fields indicates more comprehensive evidence supporting the intermediate conclusion, resulting in a higher score. The second element is the dispersion of the indicator values of the evidence fields. Lower dispersion indicates more consistent fault information reflected by each evidence field, higher credibility of the evidence, and a higher score. The third element is the weight of the indicator corresponding to the evidence field within a predefined set of key indicators. This set of key indicators is set based on the core needs of fault diagnosis, and the indicators within it have a greater impact on fault diagnosis. If the indicator corresponding to the evidence field belongs to the set of key indicators and has a higher weight, the evidence is more important and can significantly improve the evidence sufficiency score. Through the weighted calculation of these three elements, the evidence sufficiency score for each diagnostic node is obtained, achieving a quantitative assessment of evidence sufficiency.
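The weighted calculation of the three elements can be sketched as follows. The application does not give the formula, so everything numeric here is an assumption: indicator values are taken to be normalized to [0, 1], the field count saturates at a hypothetical `max_fields`, dispersion is measured by the population standard deviation, and the default weights (0.4, 0.3, 0.3) are placeholders.

```python
from statistics import pstdev
from typing import Dict, Tuple


def sufficiency_score(
    evidence: Dict[str, float],       # evidence field -> indicator value in [0, 1]
    key_weights: Dict[str, float],    # key indicator set: indicator -> weight in [0, 1]
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),
    max_fields: int = 5,
) -> float:
    """Weighted evidence sufficiency score in [0, 1]; higher means better supported."""
    if not evidence:
        return 0.0
    # Element 1: more evidence fields give a higher score, capped at max_fields.
    count_score = min(len(evidence) / max_fields, 1.0)
    # Element 2: lower dispersion gives a higher score.
    # For values in [0, 1] the population standard deviation is at most 0.5.
    dispersion = pstdev(list(evidence.values()))
    dispersion_score = 1.0 - min(dispersion / 0.5, 1.0)
    # Element 3: average weight of the cited indicators in the key indicator set
    # (indicators outside the set contribute 0).
    key_score = sum(key_weights.get(name, 0.0) for name in evidence) / len(evidence)
    w_count, w_disp, w_key = weights
    return w_count * count_score + w_disp * dispersion_score + w_key * key_score


def is_insufficient(score: float, threshold: float = 0.6) -> bool:
    """Mark the node as an insufficient evidence node when below the preset threshold."""
    return score < threshold
```

The threshold of 0.6 in `is_insufficient` is likewise illustrative; in practice it would be the preset score threshold mentioned in the text.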
[0082] In some embodiments of this application, the training process of the fault diagnosis large model in step S2 is described, which may specifically include:
[0083] ① Acquire operation and maintenance fault troubleshooting knowledge, and construct it into a directed acyclic graph structure containing fault types, troubleshooting steps, judgment conditions, and conclusions, which serves as a structured experience graph;
[0084] ② The structured experience graph is compiled into meta-prompts, which drive a teacher large model to simulate and deduce historical fault cases, generate chain-of-thought data containing complete reasoning trajectories, and construct a supervised training set. The meta-prompts contain instructions for generating evidence field indexes, so that the teacher large model outputs the index of the evidence it relies on while outputting each reasoning statement.
[0085] ③ Supervised fine-tuning is performed on a basic large language model using the supervised training set. The training objective is to minimize the cross-entropy loss between the output of the basic large language model and the chain-of-thought data, so as to obtain a preliminary diagnostic model.
[0086] ④ Construct a multidimensional reward function, which includes a first reward component for evaluating the accuracy of the diagnostic conclusion, a second reward component for evaluating the matching degree between the reasoning steps and the structured experience graph, a third reward component for evaluating the correctness of the evidence index, and a fourth reward component for evaluating the standardization of the output format. The preliminary diagnostic model is optimized using a Group Relative Policy Optimization (GRPO) algorithm to obtain the dedicated fault diagnosis large model.
[0087] Specifically, the process begins by acquiring a broad range of troubleshooting knowledge within the system operation and maintenance (O&M) field. This knowledge includes, but is not limited to, the troubleshooting experience of domain experts, industry-wide fault diagnosis standards and specifications, historical O&M fault handling manuals, and equipment fault troubleshooting guidelines. This ensures that the acquired O&M troubleshooting knowledge is comprehensive, authoritative, and relevant to practical application scenarios. Subsequently, the acquired O&M troubleshooting knowledge is organized, refined, and structured, constructing a directed acyclic graph (DAG) structure—the structured experience graph. This DAG clearly defines the causal relationships between fault types, troubleshooting steps, judgment conditions, and conclusions. Nodes in the graph correspond to fault types, troubleshooting steps, judgment conditions, and diagnostic conclusions, respectively. The directed edges between nodes represent the causal relationships and logical order between these elements, clearly presenting the complete logical chain from fault type, through a series of troubleshooting steps, meeting corresponding judgment conditions, to finally arriving at a diagnostic conclusion. This provides standardized domain knowledge support for subsequent model training.
[0088] After the structured experience graph is constructed, it is compiled into meta-prompts. The core function of these meta-prompts is to give the teacher large model clear reasoning guidance, enabling it to simulate fault diagnosis according to the logic of the structured experience graph. Specifically, the meta-prompts embed instructions for generating evidence field indexes. These instructions explicitly require the teacher large model, whenever it outputs a reasoning statement, to also output the index in the corresponding data of the evidence on which that statement is based, ensuring that the generated reasoning data is traceable and consistent with the evidence index requirements of the subsequent diagnostic nodes. Subsequently, these meta-prompts are input into the teacher large model, driving it to simulate and deduce a large number of historical fault cases. Guided by the meta-prompts, the teacher large model simulates the fault diagnosis approach of domain experts. For each historical fault case, it outputs chain-of-thought data containing a complete reasoning trajectory. This chain-of-thought data has the same structure as the initial diagnostic trajectory in step S2, including reasoning statements arranged in reasoning order, the corresponding evidence indexes, and intermediate conclusions. Finally, the chain-of-thought data for all historical fault cases is sorted and filtered, invalid and abnormal data is removed, and a supervised training set is constructed to provide high-quality labeled data for subsequent fine-tuning of the basic large language model.
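One way to picture the compilation of a structured experience graph into a meta-prompt is sketched below. The graph content, the prompt wording, and the names `NODES`, `PREDECESSORS`, and `compile_meta_prompt` are hypothetical; the application does not specify the prompt template. The sketch orders the graph with the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) and appends the evidence index instruction.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set, Tuple

# Hypothetical structured experience graph: node id -> (node kind, description).
NODES: Dict[str, Tuple[str, str]] = {
    "F1": ("fault type", "service response timeout"),
    "S1": ("troubleshooting step", "check CPU and memory utilization"),
    "C1": ("judgment condition", "both indicators exceed their thresholds"),
    "R1": ("conclusion", "system load is too high"),
}
# Directed edges expressed as node id -> set of prerequisite node ids.
PREDECESSORS: Dict[str, Set[str]] = {
    "F1": set(),
    "S1": {"F1"},
    "C1": {"S1"},
    "R1": {"C1"},
}


def compile_meta_prompt(nodes: Dict[str, Tuple[str, str]],
                        predecessors: Dict[str, Set[str]]) -> str:
    """Render the graph as ordered guidance plus the evidence index instruction."""
    # static_order() yields every node after all of its prerequisites.
    order = list(TopologicalSorter(predecessors).static_order())
    lines = [
        "You are simulating an expert diagnosing a historical fault case.",
        "Follow the troubleshooting logic below in order:",
    ]
    for i, node_id in enumerate(order, start=1):
        kind, text = nodes[node_id]
        lines.append(f"{i}. [{kind}] {text}")
    lines.append(
        "For every reasoning statement, also output the index of the evidence "
        "field in the input data that it relies on, followed by an intermediate conclusion."
    )
    return "\n".join(lines)
```

The returned string would then be combined with a historical fault case and sent to the teacher large model; that call is omitted because the application does not name a specific model interface.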
[0089] A basic large language model suited to text reasoning and semantic understanding scenarios is selected as the initial model. Supervised fine-tuning is then performed on this basic large language model using the constructed supervised training set, enabling it to gradually learn the reasoning logic, evidence association, and output specifications of the fault diagnosis field. The core training objective of this supervised fine-tuning is to minimize the cross-entropy loss between the output of the basic large language model and the chain-of-thought data in the supervised training set. By iteratively optimizing the model parameters, the reasoning trajectory output by the model is brought close to the chain-of-thought data in the supervised training set, ensuring that the model masters the logical rules of fault diagnosis, the evidence index generation method, and the intermediate conclusion derivation method. After multiple rounds of supervised fine-tuning, when the cross-entropy loss reaches a preset threshold and the model output stabilizes, fine-tuning is stopped, yielding a preliminary diagnostic model. This model has basic fault diagnosis reasoning and evidence index output capabilities.
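The training objective and stopping condition above can be illustrated with a small framework-free sketch. It assumes the cross-entropy loss is the mean negative log-probability that the model assigns to each target token of the chain-of-thought data, and that "stable" means the last few recorded losses vary by no more than a tolerance. The function names, `patience`, and `tolerance` are hypothetical.

```python
import math
from typing import List


def sequence_cross_entropy(target_token_probs: List[float]) -> float:
    """Mean negative log-likelihood of the target tokens.

    target_token_probs[i] is the probability the model assigns to the i-th
    token of the reference chain-of-thought sequence.
    """
    if not target_token_probs:
        raise ValueError("at least one target token is required")
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


def should_stop(loss_history: List[float], loss_threshold: float,
                patience: int = 3, tolerance: float = 1e-3) -> bool:
    """Stop when the loss has reached the threshold and the recent losses are stable."""
    if len(loss_history) < patience:
        return False
    recent = loss_history[-patience:]
    reached = recent[-1] <= loss_threshold
    stable = max(recent) - min(recent) <= tolerance
    return reached and stable
```

In a real fine-tuning run the per-token probabilities would come from the model's output distribution; here they are plain numbers so that the arithmetic is visible.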
[0090] To further improve the diagnostic accuracy, reasoning standardization, and evidence relevance of the preliminary diagnostic model, a multi-dimensional reward function is constructed to evaluate the model output from multiple dimensions, guiding the model to optimize its output results through a reward mechanism. This multi-dimensional reward function comprises four core reward components, each corresponding to a different evaluation dimension of the model output, complementing and synergistically working together: The first reward component evaluates the accuracy of the diagnostic conclusion, i.e., the degree of matching between the model's final diagnostic conclusion and the actual fault situation; the higher the matching degree, the higher the reward score. The second reward component evaluates the matching degree between the reasoning steps and the structured experience graph, ensuring that the model's reasoning process conforms to domain knowledge logic. The third reward component evaluates the correctness of the evidence index, verifying whether the evidence index output by the model accurately corresponds to the evidence fields upon which the reasoning statements are based. The fourth reward component evaluates the standardization of the output format, ensuring that the format of the model's output reasoning trajectory is consistent with the preset standard, facilitating subsequent meta-evaluator verification and user viewing.
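How the four reward components might be combined, and how a group-relative optimization step could use the result, is sketched below. The application does not state the combination rule, so the weighted sum and its weights are assumptions. The advantage calculation follows one common form of group-relative normalization, in which the rewards of several outputs sampled for the same input are standardized against the group mean and standard deviation. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def total_reward(components: Sequence[float],
                 weights: Sequence[float] = (0.4, 0.3, 0.2, 0.1)) -> float:
    """Weighted sum of the four reward components, each assumed to lie in [0, 1].

    Order: conclusion accuracy, graph matching degree, evidence index
    correctness, output format standardization.
    """
    if len(components) != 4 or len(weights) != 4:
        raise ValueError("exactly four components and four weights are expected")
    return sum(w * c for w, c in zip(weights, components))


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize each reward against the other outputs sampled for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every output in the group scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Outputs that score above the group average receive a positive advantage and are reinforced, while those below the average receive a negative one.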
[0091] In the second reward component, the matching degree between the reasoning steps and the structured experience graph is calculated by aligning the sequence of reasoning steps with the paths in the directed acyclic graph structure, and calculating the matching score based on the edit distance or longest common subsequence algorithm.
[0092] Specifically, in the second reward component, the matching degree between the reasoning steps and the structured experience graph is calculated as follows: First, the sequence of reasoning steps output by the model is aligned with the valid paths in the structured experience graph (the directed acyclic graph structure) to determine the graph path corresponding to the sequence of reasoning steps. Then, an edit distance algorithm or a longest common subsequence algorithm is used to calculate the matching score between the sequence of reasoning steps and the corresponding graph path. This score is the matching degree between the reasoning steps and the structured experience graph. The higher the score, the more closely the model's reasoning steps conform to the domain's predefined fault troubleshooting logic. After the multidimensional reward function is constructed, a Group Relative Policy Optimization (GRPO) algorithm is used to optimize the preliminary diagnostic model. This algorithm continuously adjusts the model parameters to maximize the total reward of the multidimensional reward function, gradually improving the model's reasoning accuracy, logical rigor, and evidence relevance. After multiple rounds of policy optimization, the final dedicated fault diagnosis large model is obtained. This model can be used directly for chain reasoning in step S2 to generate an initial diagnostic trajectory that meets the requirements.
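The longest common subsequence variant of the matching score can be sketched as follows. The normalization by the longer of the two sequences is an assumption, since the text only says that a matching score is derived from the algorithm; `lcs_length` and `path_match_score` are hypothetical names.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    dp: List[List[int]] = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def path_match_score(steps: Sequence[str], graph_path: Sequence[str]) -> float:
    """Matching degree in [0, 1] between reasoning steps and the aligned graph path."""
    if not steps and not graph_path:
        return 1.0
    return lcs_length(steps, graph_path) / max(len(steps), len(graph_path))
```

For reasoning steps `["A", "C", "D"]` against the graph path `["A", "B", "C", "D"]`, the longest common subsequence has length 3 and the score is 0.75, reflecting the one skipped step.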
[0093] In some embodiments of this application, considering the accuracy requirements of complex system fault diagnosis and the long-term optimization needs of system operation and maintenance, and in order to further improve the performance of the fault diagnosis large model and the meta-evaluator and to extend the functions of the fault diagnosis method, this application may also include a manual review and sample feedback process and a system resilience analysis process, which may specifically include the following two parts:
[0094] Manual review and sample feedback process:
[0095] When the comprehensive confidence score is lower than a preset threshold, the standardized input data, the initial diagnostic trajectory, and the results of the multi-dimensional verification are pushed to the manual review interface, and the corrected diagnostic trajectory returned after manual review is obtained. The standardized input data and the corrected diagnostic trajectory are then combined into training samples and fed back into the iterative training set of the fault diagnosis large model and the meta-evaluator.
[0096] Specifically, when the comprehensive confidence score is lower than a preset threshold, the reliability of the initial diagnostic trajectory has not met the preset standard. In this case, a manual review process is initiated to ensure the accuracy of the diagnostic results and to provide high-quality samples for model iteration. The specific process is as follows: First, the meta-evaluator pushes the standardized input data generated in step S1, the initial diagnostic trajectory generated in step S2, and the multi-dimensional verification results obtained in step S3 together to a preset manual review interface. This interface clearly displays the reasoning nodes in the initial diagnostic trajectory, the verification status of each node (such as evidence conflict nodes, logical jump positions, and insufficient evidence nodes), and the corresponding evidence fields, so that reviewers can locate problems quickly and review efficiently. Subsequently, the reviewers, drawing on their professional knowledge, domain experience, and the pushed data, review and correct the initial diagnostic trajectory by supplementing missing diagnostic nodes, adjusting logically contradictory reasoning statements, and completing evidence fields and their indexes, and finally generate a corrected diagnostic trajectory. The corrected diagnostic trajectory is then returned to the system through the manual review interface. Finally, the system combines the standardized input data from step S1 with the corrected diagnostic trajectory returned after manual review to form a new high-quality training sample. This training sample is fed back into the iterative training set of the fault diagnosis large model and the meta-evaluator for continuous iterative optimization. This gradually improves the chain reasoning accuracy of the fault diagnosis large model and the multi-dimensional verification accuracy of the meta-evaluator, achieving a closed-loop improvement in model performance.
[0097] The manual review interface includes modules for node editing, evidence supplementation, and conclusion correction. Reviewers can directly edit abnormal nodes in the initial diagnostic trajectory, supplement missing evidence indexes and reasoning logic, and correct intermediate or final diagnostic conclusions. At the same time, the interface records the reviewer's correction operations and stores them together with the corrected diagnostic trajectory, so that the review process can be traced later. When training samples are fed back, the system categorizes and labels them with the correction type (e.g., node supplementation, logic correction, evidence enhancement). These labels are used to optimize the reasoning logic of the fault diagnosis large model and to adjust the verification rules of the meta-evaluator, ensuring that iterative training is targeted and effective.
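The review routing and sample feedback flow can be sketched with plain data structures. The class and function names, the dictionary-based trajectory representation, and the in-memory training set are assumptions made for illustration; the application does not define the interface or storage format.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ReviewPackage:
    """What is pushed to the manual review interface."""
    standardized_input: Dict[str, Any]
    initial_trajectory: List[Dict[str, Any]]
    verification_results: Dict[str, Any]


@dataclass
class TrainingSample:
    """Standardized input paired with the reviewer-corrected trajectory."""
    standardized_input: Dict[str, Any]
    corrected_trajectory: List[Dict[str, Any]]
    correction_types: List[str]  # e.g. "node supplementation", "logic correction"


def route_for_review(confidence: float, threshold: float,
                     package: ReviewPackage) -> Optional[ReviewPackage]:
    """Return the package for manual review only when confidence is below the threshold."""
    return package if confidence < threshold else None


def build_training_sample(package: ReviewPackage,
                          corrected_trajectory: List[Dict[str, Any]],
                          correction_types: List[str]) -> TrainingSample:
    """Combine the original input with the corrected trajectory and its labels."""
    return TrainingSample(
        standardized_input=package.standardized_input,
        corrected_trajectory=corrected_trajectory,
        correction_types=list(correction_types),
    )


def feed_back(training_set: List[TrainingSample], sample: TrainingSample) -> None:
    """Append the labeled sample to the iterative training set."""
    training_set.append(sample)
```

A trajectory with confidence 0.4 against a threshold of 0.6 would be routed to review, and the reviewer's corrected trajectory would then be wrapped by `build_training_sample` and appended to the training set.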
[0098] System resilience analysis process:
[0099] Based on the causal structure model of the target system, counterfactual reasoning is performed on the standardized input data to generate a system resilience analysis report. The counterfactual reasoning includes applying a preset perturbation to the key variables in the current system state vector, simulating whether a fault alarm is triggered in the counterfactual scenario after the perturbation, and identifying the redundancy mechanism or compensation mechanism in the target system based on the simulation results.
[0100] Specifically, based on the causal structure model of the target system, counterfactual reasoning is performed on the standardized input data to generate a system resilience analysis report. This report identifies redundancy or compensation mechanisms in the target system, providing additional support for system operation and maintenance optimization and fault prevention, thereby improving the system's fault resistance. The counterfactual reasoning refers to simulating the system's operating state and fault triggering conditions based on the actual operating state of the target system, assuming changes in one or more key variables, and then analyzing the system's resilience level. The specific process includes: First, retrieving the causal structure model of the target system. This model is constructed based on the target system's architecture design, module relationships, and parameter influence patterns, clearly presenting the causal relationships and interaction mechanisms between various system variables. Then, extracting the current system state vector from the standardized input data generated in step S1. This state vector contains key information such as the operating parameters and indicator values of each core module of the system. Next, applying preset perturbations to the key variables in the system state vector. These perturbations include increases or decreases in variable values and changes in variable states, simulating the changes in the system's operating state under the counterfactual scenario corresponding to the perturbation, with a focus on monitoring whether fault alarms are triggered.
[0101] During counterfactual reasoning, different preset disturbance ranges are set for different types of key variables to ensure the rationality and relevance of the disturbance scenarios and avoid ineffective disturbances. Simultaneously, various system operating indicators and module response statuses are recorded in real time under counterfactual scenarios. If no fault alarm is triggered under a certain disturbance scenario, it indicates that the system has a mechanism to resist this type of disturbance. Further analysis of the collaborative interaction process of various modules in this scenario identifies redundancy or compensation mechanisms within the system. Redundancy mechanisms refer to backup modules and spare parameters in the system, which can replace core modules or parameters when they are abnormal. Compensation mechanisms refer to the collaborative adjustment capabilities between modules in the system, which can compensate for the impact of a module's abnormality by adjusting the operating status of other modules. Finally, based on all the simulation results of counterfactual reasoning, a system resilience analysis report is compiled, clarifying the system's fault resistance capabilities under different disturbance scenarios, detailing the identified redundancy and compensation mechanisms and their operating principles, and providing targeted optimization suggestions for system maintenance personnel to help improve the system's stability and resilience.
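The counterfactual perturbation loop can be sketched as follows. The causal structure model is replaced by a toy alarm function with a primary and a backup power source, which is purely an illustrative assumption, as are the variable names, perturbation values, and the names `toy_alarm_model` and `resilience_analysis`.

```python
from typing import Callable, Dict, List


def toy_alarm_model(state: Dict[str, float]) -> bool:
    """Hypothetical stand-in for the causal structure model.

    A fault alarm is raised when neither the primary nor the backup power
    source can carry the load.
    """
    supply = max(state["primary_power"], state["backup_power"])
    return supply < state["load"]


def resilience_analysis(
    state: Dict[str, float],
    perturbations: Dict[str, List[float]],
    alarm_model: Callable[[Dict[str, float]], bool],
) -> List[dict]:
    """Apply each preset perturbation to one key variable and record the outcome."""
    report = []
    for variable, deltas in perturbations.items():
        for delta in deltas:
            scenario = dict(state)            # copy so the real state is untouched
            scenario[variable] = state[variable] + delta
            alarm = alarm_model(scenario)
            report.append({
                "variable": variable,
                "delta": delta,
                "alarm": alarm,
                # No alarm under perturbation suggests a redundancy or
                # compensation mechanism worth examining further.
                "tolerated": not alarm,
            })
    return report
```

With a state of primary power 100, backup power 80, and load 70, removing the primary source entirely raises no alarm because the backup covers the load, which is the kind of result the text attributes to a redundancy mechanism. Raising the load by 50 does trigger the alarm.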
[0102] The system fault diagnosis and analysis method provided in this application embodiment can be applied to system fault diagnosis and analysis equipment. Figure 2 shows a hardware structure block diagram of the system fault diagnosis and analysis equipment. Referring to Figure 2, the hardware structure of the equipment may include: at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4;
[0103] In this embodiment of the application, there is at least one each of processor 1, communication interface 2, memory 3, and communication bus 4, and processor 1, communication interface 2, and memory 3 communicate with each other through communication bus 4;
[0104] Processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
[0105] Memory 3 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk storage device;
[0106] The memory stores a program, which the processor can call. The program is used for:
[0107] Acquire multi-source heterogeneous fault data related to the target system, and convert the multi-source heterogeneous fault data into structured standardized input data;
[0108] The standardized input data is input into a pre-trained fault diagnosis model for chain reasoning to generate an initial diagnosis trajectory. The initial diagnosis trajectory consists of multiple diagnosis nodes arranged in reasoning order. Each diagnosis node contains the reasoning statement of the diagnosis node, the index of the evidence field on which the reasoning statement is based in the standardized input data, and the intermediate conclusion of the diagnosis node.
[0109] The initial diagnostic trajectory is input into the meta-evaluator to perform multi-dimensional verification on each diagnostic node in the initial diagnostic trajectory. The multi-dimensional verification includes at least the first dimension verification of the logical consistency between the intermediate conclusion of the diagnostic node and the evidence on which it is based, the second dimension verification of whether the logical succession relationship between adjacent diagnostic nodes conforms to the preset investigation order, and the third dimension verification of the sufficiency of the evidence on which the diagnostic node is based.
[0110] Based on the results of the multi-dimensional verification, a comprehensive confidence score for the initial diagnostic trajectory is generated using a penalty factor fusion method, and the initial diagnostic trajectory is output in association with the comprehensive confidence score.
[0111] Optionally, for the refined and extended functions of the program, refer to the description above.
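As an illustration only, the diagnostic node structure and the penalty factor fusion performed by the program might be sketched in Python as follows. The class names, the function name, and the three factor values are hypothetical and are not taken from this application. The only constraint carried over is the ordering stated in claim 7: the evidence-conflict factor is smaller than the logical-jump factor, which is smaller than the insufficient-evidence factor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiagnosisNode:
    statement: str               # reasoning statement of the node
    evidence_indexes: List[str]  # indexes of evidence fields in the standardized input data
    conclusion: str              # intermediate conclusion of the node


@dataclass
class VerificationResult:
    conflict_nodes: int = 0      # nodes flagged by the first dimension verification
    logical_jumps: int = 0       # jumps flagged by the second dimension verification
    insufficient_nodes: int = 0  # nodes flagged by the third dimension verification


# Hypothetical penalty factors (assumed values); only their ordering follows claim 7.
P_CONFLICT, P_JUMP, P_INSUFFICIENT = 0.5, 0.7, 0.9


def fuse_confidence(base_score: float, result: VerificationResult) -> float:
    """Multiply the base score by one penalty factor per flagged finding."""
    score = base_score
    score *= P_CONFLICT ** result.conflict_nodes
    score *= P_JUMP ** result.logical_jumps
    score *= P_INSUFFICIENT ** result.insufficient_nodes
    return score
```

Because the factors are multiplied rather than subtracted, the score stays within the range of the base score, and an evidence conflict lowers it more sharply than a logical jump or an insufficiently supported node.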
[0112] This application embodiment also provides a readable storage medium that can store a program suitable for execution by a processor, the program being used for:
[0113] Acquire multi-source heterogeneous fault data related to the target system, and convert the multi-source heterogeneous fault data into structured standardized input data;
[0114] The standardized input data is input into a pre-trained fault diagnosis model for chain reasoning to generate an initial diagnostic trajectory. The initial diagnostic trajectory consists of multiple diagnostic nodes arranged in reasoning order. Each diagnostic node contains the reasoning statement of the diagnostic node, the index of the evidence field in the standardized input data on which the reasoning statement is based, and the intermediate conclusion of the diagnostic node.
[0115] The initial diagnostic trajectory is input into the meta-evaluator to perform multi-dimensional verification on each diagnostic node in the initial diagnostic trajectory. The multi-dimensional verification includes at least the first dimension verification of the logical consistency between the intermediate conclusion of the diagnostic node and the evidence on which it is based, the second dimension verification of whether the logical succession relationship between adjacent diagnostic nodes conforms to the preset investigation order, and the third dimension verification of the sufficiency of the evidence on which the diagnostic node is based.
[0116] Based on the results of the multi-dimensional verification, a comprehensive confidence score for the initial diagnostic trajectory is generated using a penalty factor fusion method, and the initial diagnostic trajectory is output in association with the comprehensive confidence score.
[0117] Optionally, for the refined and extended functions of the program, refer to the description above.
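The first dimension verification named in the program steps above can be illustrated with a minimal Python sketch. It assumes that the intermediate conclusion has already been reduced to a boolean ("the node concludes a fault") and that the rules take the two forms mentioned in claim 3: indicator threshold rules and magnitude relationships between indicators. The rule tables, indicator names, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical indicator threshold rules: evidence-field index -> (low, high) normal range.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu_usage": (0.0, 0.85),
    "disk_latency_ms": (0.0, 20.0),
}

# Hypothetical inter-indicator rules: value[left] must not exceed value[right].
RELATIONS: List[Tuple[str, str]] = [("mem_used_mb", "mem_total_mb")]


def evidence_is_abnormal(indexes: List[str], data: Dict[str, float]) -> bool:
    """Return True if any rule covering the cited evidence fields is violated."""
    for idx in indexes:
        if idx in THRESHOLDS:
            low, high = THRESHOLDS[idx]
            if not low <= data[idx] <= high:
                return True
    for left, right in RELATIONS:
        if left in indexes and right in indexes and data[left] > data[right]:
            return True
    return False


def is_evidence_conflict(indexes: List[str], concludes_fault: bool,
                         data: Dict[str, float]) -> bool:
    """Flag the node when its conclusion disagrees with what the cited evidence shows."""
    return concludes_fault != evidence_is_abnormal(indexes, data)
```

In this sketch a node is marked as an evidence conflict node both when it asserts a fault that its cited indicators do not support and when it asserts normal operation while a cited indicator violates a rule.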
[0118] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.
[0119] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on the differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other.
[0120] The above description of the disclosed embodiments enables those skilled in the art to make or use this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A system fault diagnosis and analysis method, characterized in that it comprises: acquiring multi-source heterogeneous fault data related to the target system, and converting the multi-source heterogeneous fault data into structured standardized input data; inputting the standardized input data into a pre-trained fault diagnosis model for chain reasoning to generate an initial diagnostic trajectory, wherein the initial diagnostic trajectory consists of multiple diagnostic nodes arranged in reasoning order, and each diagnostic node contains the reasoning statement of the diagnostic node, the index of the evidence field in the standardized input data on which the reasoning statement is based, and the intermediate conclusion of the diagnostic node; inputting the initial diagnostic trajectory into the meta-evaluator to perform multi-dimensional verification on each diagnostic node in the initial diagnostic trajectory, wherein the multi-dimensional verification includes at least a first dimension verification of the logical consistency between the intermediate conclusion of the diagnostic node and the evidence on which it is based, a second dimension verification of whether the logical succession relationship between adjacent diagnostic nodes conforms to the preset investigation order, and a third dimension verification of the sufficiency of the evidence on which the diagnostic node is based; and, based on the results of the multi-dimensional verification, generating a comprehensive confidence score for the initial diagnostic trajectory using a penalty factor fusion method, and outputting the initial diagnostic trajectory in association with the comprehensive confidence score.
2. The method according to claim 1, characterized in that the first dimension verification includes: extracting the index of the evidence field on which the diagnostic node is based, obtaining the indicator value corresponding to the index from the standardized input data, verifying the logical consistency between the intermediate conclusion of the diagnostic node and the indicator value, and marking the diagnostic node as an evidence conflict node if they are inconsistent.
3. The method according to claim 2, characterized in that the criteria for determining logical consistency include predefined indicator threshold rules and logical relationship rules between indicators, wherein the indicator threshold rules are used to determine whether the indicator value is within the normal range, and the logical relationship rules between indicators are used to determine whether the magnitude relationship between multiple indicator values conforms to preset constraints.
4. The method according to claim 1, characterized in that the second dimension verification includes: based on a predefined fault troubleshooting order rule set, checking whether the logical connection relationship between adjacent diagnostic nodes conforms to the fault troubleshooting order rule set, and, if it does not conform, marking the jump position and identifying the missing diagnostic node.
5. The method according to claim 4, characterized in that the fault troubleshooting order rule set is a directed acyclic graph structure, where nodes represent troubleshooting steps and directed edges represent prerequisite dependencies between steps; and verifying whether the logical connection relationship between adjacent diagnostic nodes conforms to the fault troubleshooting order rule set includes: mapping each diagnostic node to the corresponding node in the directed acyclic graph, and checking whether there is a directed path from the node mapped from the previous diagnostic node to the node mapped from the current diagnostic node; if there is no such path, a logical jump is determined, and the missing diagnostic node is identified based on a shortest path algorithm on the graph.
6. The method according to claim 1, characterized in that the third dimension verification includes: for diagnostic nodes that are not marked as evidence conflict nodes, calculating the sufficiency score of the diagnostic node, and marking diagnostic nodes whose scores are below a preset threshold as nodes with insufficient evidence; wherein the sufficiency score is calculated by weighting the number of evidence fields on which the diagnostic node is based, the dispersion of the indicator values of the evidence fields, and the weight, in a predefined set of key indicators, of the indicator corresponding to each evidence field.
7. The method according to claim 1, characterized in that generating the comprehensive confidence score for the initial diagnostic trajectory using a penalty factor fusion method includes: based on the results of the multi-dimensional verification, taking the initial confidence score as the base score; and multiplying the base score by a first penalty factor for each evidence conflict node, by a second penalty factor for each logical jump, and by a third penalty factor for each node with insufficient evidence, to obtain the comprehensive confidence score; wherein the first penalty factor is less than the second penalty factor, and the second penalty factor is less than the third penalty factor.
8. The method according to claim 1, characterized in that the training process of the fault diagnosis model includes: acquiring operation and maintenance fault diagnosis knowledge, and constructing it into a directed acyclic graph structure containing fault types, troubleshooting steps, judgment conditions, and conclusions, as a structured experience graph; compiling the structured experience graph into meta-prompts, which drive a teacher large model to simulate and deduce historical fault cases and generate chain-of-thought data containing complete reasoning trajectories, so as to construct a supervised training set, wherein the meta-prompts contain instructions for generating evidence field indexes, so that the teacher large model outputs the index of the evidence it relies on together with each reasoning statement; performing supervised fine-tuning of a base large language model using the supervised training set, with the training objective of minimizing the cross-entropy loss between the output of the base large language model and the chain-of-thought data, thereby obtaining a preliminary diagnostic model; and constructing a multi-dimensional reward function, which includes a first reward component for evaluating the accuracy of the diagnostic conclusion, a second reward component for evaluating the degree of matching between the reasoning steps and the structured experience graph, a third reward component for evaluating the correctness of the evidence index, and a fourth reward component for evaluating the standardization of the output format, and optimizing the preliminary diagnostic model using a group relative policy optimization algorithm to obtain the dedicated fault diagnosis large model.
9. The method according to claim 8, characterized in that, in the second reward component, the degree of matching between the reasoning steps and the structured experience graph is calculated by aligning the sequence of reasoning steps with the paths in the directed acyclic graph structure and calculating a matching score based on an edit distance or longest common subsequence algorithm.
10. The method according to claim 1, characterized in that it further comprises: when the comprehensive confidence score is lower than a preset threshold, pushing the standardized input data, the initial diagnostic trajectory, and the results of the multi-dimensional verification to a manual review interface to obtain the corrected diagnostic trajectory returned after manual review; and combining the standardized input data and the corrected diagnostic trajectory into training samples and feeding them back into the iterative training set of the fault diagnosis model and the meta-evaluator.
11. The method according to claim 1, characterized in that it further comprises: based on the causal structure model of the target system, performing counterfactual reasoning on the standardized input data to generate a system resilience analysis report; wherein the counterfactual reasoning includes applying a preset perturbation to the key variables in the current system state vector, simulating whether a fault alarm is triggered in the counterfactual scenario after the perturbation, and identifying the redundancy mechanism or compensation mechanism in the target system based on the simulation results.
12. A system fault diagnosis and analysis device, characterized in that it includes a memory and a processor; the memory is used to store a program; and the processor is used to execute the program to implement each step of the system fault diagnosis and analysis method as described in any one of claims 1-11.
13. A readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements each step of the system fault diagnosis and analysis method as described in any one of claims 1-11.
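For illustration, the order check of claims 4 and 5 might be sketched in Python as follows. The graph contents, step names, and function names are hypothetical. The sketch also adopts one possible reading of claim 5, which is an assumption: a direct edge between two consecutive steps counts as a valid succession, and when no direct edge exists the interior nodes of the shortest directed path (found by breadth-first search) are reported as the missing steps. If no path exists at all, the jump is reported with no recoverable missing steps.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical troubleshooting-order DAG: step -> steps that may directly follow it.
ORDER_DAG: Dict[str, List[str]] = {
    "check_alarm": ["check_power"],
    "check_power": ["check_network"],
    "check_network": ["check_service"],
    "check_service": [],
}


def shortest_path(graph: Dict[str, List[str]], src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search; returns the path src..dst inclusive, or None if unreachable."""
    parents: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path: List[str] = []
            node: Optional[str] = cur
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(cur, []):
            if nxt not in parents:
                parents[nxt] = cur
                queue.append(nxt)
    return None


def check_order(steps: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (jump position, missing steps) for each pair that is not a direct succession."""
    findings: List[Tuple[int, List[str]]] = []
    for pos in range(1, len(steps)):
        prev, cur = steps[pos - 1], steps[pos]
        if cur in ORDER_DAG.get(prev, []):
            continue  # direct prerequisite edge: no jump
        path = shortest_path(ORDER_DAG, prev, cur)
        missing = path[1:-1] if path else []
        findings.append((pos, missing))
    return findings
```

For example, a trajectory that goes from `check_alarm` straight to `check_network` would be reported as a jump at position 1 with `check_power` as the missing step.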