Fault source positioning method and system based on causal graph and global score

By constructing a digital twin model of causal dependencies between devices and a global scoring algorithm, the problems of inaccurate root cause localization and false alarms/missed alarms in complex industrial systems are solved, enabling accurate localization and clear analysis of fault propagation paths and improving the level of predictive maintenance.

CN122263045A · Pending · Publication Date: 2026-06-23 · SHENZHEN POLYTECHNIC

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: SHENZHEN POLYTECHNIC
Filing Date: 2026-03-26
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing technologies lack effective modeling of complex causal dependencies and fault propagation paths between devices in complex industrial systems, resulting in inaccurate root cause localization, high false alarm and false negative rates, poor interpretability of output results, and difficulty in adapting to changes in operating conditions and providing clear fault analysis.

Method used

A digital twin model of causal dependencies between integrated devices is constructed, and a weighted directed graph is used for mathematical representation. Combined with individualized health baselines and global scoring algorithms, anomalies are detected in real time and the root cause of failure is inferred, generating interpretable diagnostic reports.

Benefits of technology

It enables precise location of fault propagation paths in complex industrial systems, reduces false alarms, improves sensitivity to early faults, provides clear fault analysis, and reduces unplanned downtime and maintenance costs.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a fault source positioning method and system based on a causal graph and global scoring, and belongs to the technical field of industrial equipment fault prediction. Firstly, a digital twin model integrating equipment causal dependency is constructed, and a system-level fault propagation network is represented by a weighted directed graph. Secondly, an individualized health baseline based on working condition self-learning is adopted to establish a real-time updated individualized health benchmark for each equipment key parameter, thereby realizing working condition adaptive anomaly detection. When multiple equipment concurrent anomalies are detected, the pre-constructed causal graph is used for reasoning, the rationality score of the candidate fault source on the global anomaly mode is calculated, and the root fault source is accurately positioned. Finally, a diagnosis report integrating fault root sources, visualized propagation paths and quantitative evidence is generated. The application significantly improves the accuracy, robustness and explainability of decision support of complex industrial system fault prediction, and provides an efficient solution for predictive maintenance.

Description

Technical Field

[0001] This invention belongs to the technical field of industrial equipment fault prediction, and in particular relates to a fault source localization method and system based on causal graphs and global scoring.

Background Technology

[0002] With the deepening of industrial and intelligent manufacturing, digital twin technology, as a core link connecting the physical and information worlds, has shown great potential in the field of industrial equipment condition monitoring and fault prediction. This technology provides a new means to understand the internal state of equipment and predict potential faults by constructing high-fidelity virtual models of physical entities and achieving real-time data-driven operation.

[0003] Currently, most common fault prediction solutions based on digital twins focus on the individual device level. The typical approach is to build a simulation or data-driven model for a specific device, continuously collect the device's operating parameters, and compare the real-time data with the model's preset normal operating range or historical baseline. Once a parameter is detected to exceed a static threshold, the device is determined to have a fault or abnormal risk.

[0004] However, the aforementioned traditional methods have revealed several inherent defects when dealing with complex industrial systems, which seriously restrict their application effectiveness and reliability in actual production.

[0005] First, current technologies generally lack effective modeling of complex causal dependencies and fault propagation paths between equipment. In real industrial production lines or complete sets of equipment, equipment does not operate in isolation, but rather forms an organic whole through close coupling of material flow, energy flow, and information flow. An initial fault or performance degradation in one piece of equipment often propagates and amplifies along a predetermined process chain or control chain, triggering a chain reaction of abnormalities in a series of downstream equipment. For example, a decrease in pump efficiency may lead to reduced flow, which in turn causes abnormal heat exchanger temperatures, ultimately resulting in reactor pressure deviating from the standard. Existing methods analyze data from each piece of equipment in isolation, only perceiving the abnormal phenomena manifested at the very end, and cannot trace back to the root cause of the fault that triggered this chain reaction. This leads to maintenance actions often being merely stopgap measures; replacing alarm components does not solve the fundamental problem, and the fault quickly recurs, resulting in wasted maintenance resources and repeated production interruptions.

[0006] Second, existing fault diagnosis mechanisms heavily rely on statically preset thresholds. The operating conditions of industrial equipment are not static; their load, speed, throughput, and ambient temperature vary with production plans and market demands. These changes cause dynamic shifts in the normal fluctuation range of equipment operating parameters. Using fixed thresholds easily leads to false alarms when operating conditions increase, misjudging normal high-load operation as abnormal; conversely, when operating conditions decrease or faults are in their slow nascent stages, the thresholds are too lenient, resulting in missed alarms and missing the optimal maintenance window. This rigid judgment logic is ill-suited to the complex and ever-changing environment of industrial sites, reducing the alarm signal-to-noise ratio of the predictive system and compromising its reliability.

[0007] Third, the interpretability of fault prediction results provided by existing technologies is severely lacking. Systems typically only output an abstract "fault probability" value or a simple "abnormal" label, failing to clearly explain the specific mechanism of the fault, which key parameters deviated in what ways, and the correlation between anomalies in different devices. Faced with such "black box" conclusions, field maintenance personnel struggle to quickly understand the nature of the fault, assess the scope of its impact, and determine repair priorities, still relying heavily on personal experience for extensive on-site troubleshooting and diagnosis. This significantly weakens the decision support advantages that digital twin technology should offer, failing to translate data value into efficient operational and maintenance actions.

[0008] In summary, overcoming the aforementioned shortcomings and providing a fault prediction scheme that can systematically model fault propagation, intelligently adapt to changes in operating conditions, and output clear and interpretable conclusions has become a key requirement for improving the predictive maintenance level of industrial equipment.

Summary of the Invention

[0009] The purpose of this invention is to provide a fault source localization method and system based on causal graphs and global scoring, which solves the problems of inaccurate root cause localization due to ignoring the fault propagation relationship between devices, high false alarm and false negative rates due to the use of static thresholds, and poor interpretability due to the abstractness of the output results.

[0010] To achieve the above objectives, this invention provides a fault source localization method based on causal graphs and global scoring, comprising the following steps:
Step S1: Construct a digital twin model that integrates the causal dependencies between devices.
Step S2: Based on an individualized health baseline obtained by self-learning of working conditions, perform real-time anomaly detection on the operating parameters of each device in the digital twin model constructed in step S1, and generate abnormal status information.
Step S3: Based on the causal dependencies between devices in the digital twin model constructed in step S1, analyze and reason about the abnormal status information detected in step S2 to locate the root fault source of the system abnormality.
Step S4: Based on the root fault source located in step S3 and the relevant information generated in steps S1 and S2, synthesize and output an interpretable diagnostic report containing the root fault source, the propagation path, and parameter evidence.

[0011] Preferably, step S1 includes:
S11. Collect static attribute information, dynamic operating data, and physical connection and process control relationship data between devices in the industrial system.
S12. Based on the collected data, construct a system-level causal relationship graph model that fuses device mechanism knowledge with data. This model is represented mathematically as a weighted directed graph $G = (V, E, W)$, where the vertex set $V$ represents all devices in the system, the edge set $E$ represents the direct dependencies between devices, and $W$ is the weight matrix. The element $w_{ij}$ of $W$ quantifies the influence of a fault or anomaly propagating from device $v_i$ to device $v_j$ as a normalized influence coefficient based on the fault propagation probability; this coefficient is obtained by correlation analysis of historical fault event data or by simulation based on physical mechanism models.
S13. Fuse the three-dimensional geometric model and physical attribute model of each individual device with the system-level causal relationship graph model to establish a multi-level digital twin that reflects both the individual device state and the system-level causal linkage effect. Each device node $v_i$ in $V$ is associated with a predefined set of key performance parameters, which is derived from S11 and serves as the specific target for independent monitoring and baseline modeling in step S2.

[0012] Preferably, step S2 includes:
S21. For each key performance parameter defined in S13 and associated with a device node, establish and maintain a dynamic baseline model that estimates, in real time, the normal range of the parameter value under the current operating conditions. For the parameter value $x_t$ at time $t$, the dynamic baseline model calculates the dynamic reference center value $\mu_t$ and the dynamic standard deviation $\sigma_t$ that characterizes the normal range of fluctuation.
S22. Calculate in real time the normalized statistical deviation $D_t$ of the current sample value $x_t$ relative to its dynamic baseline center value. The calculation formula is $D_t = \dfrac{|x_t - \mu_t|}{k \cdot \sigma_t}$, where $k$ is a preset dynamic adjustment factor, chosen for different health degradation patterns, that controls the strictness of anomaly detection.
S23. Perform anomaly detection based on the deviation $D_t$: if $D_t > 1$, the parameter is determined to be in an abnormal state; that is, the judgment threshold is set to 1. The anomaly judgment results of all monitored parameters of each device are aggregated to form structured abnormal status information. This information includes at least: a list of abnormal device identifiers (such as device ID and name), a list of abnormal parameters corresponding to each abnormal device, and the real-time deviation value of each abnormal parameter.

[0013] Preferably, step S3 includes:
S31. When step S2 detects that multiple devices in the system report anomalies simultaneously, that is, the number of abnormal devices is ≥2, extract all currently abnormal devices to form a candidate abnormal device set $A$.
S32. For each candidate device node $v_c$ in the set $A$, assume that it is the root fault source and calculate its reasonableness score $S(v_c)$ for explaining the anomaly pattern of the entire system. The calculation formula is $S(v_c) = \sum_{v_a \in A} I(v_c, v_a) \cdot Sev(v_a)$, where $I(v_c, v_a)$ is the causal influence strength from candidate device node $v_c$ to abnormal device node $v_a$ in $G$, and $Sev(v_a)$ is a comprehensive measure of the anomaly severity of abnormal device $v_a$.
S33. Compare the reasonableness scores $S(v_c)$ of all candidate device nodes and select the device with the highest score as the finally located root fault source.

[0014] Preferably, the causal influence strength $I(v_c, v_a)$ is calculated as follows: in the weighted directed graph $G$, search for the set $\mathcal{P}(v_c, v_a)$ of all directed paths from candidate device node $v_c$ to abnormal device node $v_a$. For each path $p$ in this set, calculate the product of the weights of all edges along the path and denote it as the path propagation strength, $Str(p) = \prod_{(v_i, v_j) \in p} w_{ij}$, where $(v_i, v_j)$ is a directed edge on path $p$, $w_{ij}$ is the element of the weight matrix $W$ giving the normalized influence coefficient, based on the fault propagation probability, from node $v_i$ to node $v_j$, and $\prod$ denotes the running product. $I(v_c, v_a)$ is then defined as the maximum path propagation strength over all paths: $I(v_c, v_a) = \max_{p \in \mathcal{P}(v_c, v_a)} Str(p)$.

[0015] Preferably, the comprehensive anomaly severity $Sev(v_a)$ is calculated as follows: first, obtain the deviations $d_1, d_2, \ldots, d_m$ of all abnormal parameters of device $v_a$ reported in step S2, where $m$ is the number of abnormal parameters; then aggregate these $m$ deviations, either as a weighted average or by taking the maximum value: $Sev(v_a) = \dfrac{\sum_{i=1}^{m} \alpha_i d_i}{\sum_{i=1}^{m} \alpha_i}$ or $Sev(v_a) = \max_{1 \le i \le m} d_i$, where $\alpha_i$ is the weighting coefficient of the $i$-th abnormal parameter, reflecting the importance of that parameter to the health status of the device.

[0016] Preferably, the dynamic baseline model in step S21 is established and maintained using the exponentially weighted moving average (EWMA) algorithm, with the following update formulas: $\mu_t = \lambda \mu_{t-1} + (1 - \lambda) x_t$ and $\sigma_t^2 = \lambda \sigma_{t-1}^2 + (1 - \lambda)(x_t - \mu_t)^2$, where $\lambda$ is the forgetting factor, which controls how long the model retains historical data and how quickly it adapts to new data, and the initial values $\mu_0$ and $\sigma_0$ are obtained from statistics of data collected during a period of stable operation after the device is started.

[0017] Preferably, step S4 includes:
S41. In the three-dimensional visualization scene of the digital twin model, highlight the root fault source device located in step S3 using bright colors, flashing, or outline markers.
S42. In the 3D visualization scene, draw the fault propagation link from the root fault source device to the other affected abnormal devices. The link is rendered based on the mechanism-data fusion causal relationship graph model from step S1 and the maximum causal influence path calculated in step S3, with animated arrows indicating the direction of fault propagation.
S43. Provide a multi-dimensional parameter evidence view, including at least: (a) a historical trend graph of the key abnormal parameters of the root fault source device, overlaid with the dynamic baseline center value and normal fluctuation range; and (b) a list of the key abnormal parameters of the affected devices together with a table of their current deviation values.
S44. Generate a structured diagnostic report document with the following sections: fault summary, root cause analysis, impact scope assessment, parameter evidence details, and maintenance recommendations. The root cause analysis section uses the reasonableness score as the basis for inference.

[0018] This invention also provides a fault source localization system based on causal graphs and global scoring, comprising:

The system modeling and knowledge base module, used to execute step S1, specifically including: an equipment information acquisition unit, used to collect static and dynamic data of equipment from the industrial site; a causal relationship mining unit, used to analyze data and construct the mechanism-data fusion causal relationship graph of the equipment; and a model fusion unit, used to integrate the geometric, attribute, and causal models to form a digital twin. This module outputs and maintains a system-level causal knowledge base.

The real-time sensing and anomaly detection module, used to execute step S2, specifically including: a data stream processing unit for receiving and processing sensor data in real time; a dynamic baseline management unit for maintaining an independent EWMA baseline model for each monitored parameter; and an anomaly calculation and judgment unit for calculating deviations and generating anomaly events in real time. This module outputs a real-time anomaly state stream.

The intelligent diagnosis and root cause reasoning module, used to execute step S3, specifically including: a multi-anomaly event aggregation unit to identify concurrent anomaly patterns; a causal graph traversal calculation unit to perform path search and strength calculation on the causal network in the knowledge base; and a scoring and decision unit to calculate the reasonableness score of each candidate source and determine the root fault source. This module is the core analysis engine of the system.

The visualization interaction and report generation module, used to execute step S4, specifically including: a 3D scene rendering unit, responsible for the display and interaction of the digital twin scene; a visualization annotation unit, responsible for highlighting fault sources and drawing propagation paths; a data visualization unit, responsible for generating parameter trend charts; and a report synthesis unit, which automatically assembles and generates structured diagnostic documents. This module is the human-computer interaction interface.

[0019] Preferably, it also includes a unified data and service bus, which provides standardized data access, message communication, service invocation and storage support for the system modeling and knowledge base module, real-time perception and anomaly detection module, intelligent diagnosis and root cause reasoning module and visualization interaction and report generation module, ensuring reliable and efficient transmission of data flow and instruction flow between modules.

[0020] Therefore, the fault source localization method and system based on causal graphs and global scoring described above have the following beneficial effects: (1) By introducing system-level dependency modeling and graph propagation-based root cause analysis algorithm, the present invention can penetrate surface anomalies, effectively distinguish between root cause faults and derivative faults, and thus accurately locate the root cause device that triggers system-level chain reaction; (2) By replacing fixed thresholds with individualized health baselines based on operating conditions self-learning, the abnormality detection criteria can be automatically adjusted according to the actual operating conditions of the equipment. This significantly reduces false alarms caused by changes in production load, while enhancing the sensitivity to early-stage, slowly changing faults; (3) The diagnostic report provided not only points out "where it is broken", but also clearly explains "why it is broken" and "how it affects other parts" through visualization of the propagation path, comparison of parameter trends and quantitative impact analysis, which greatly enhances the understanding and trust of the operation and maintenance personnel in the prediction results; (4) By early and accurate detection and location of fundamental defects, planned maintenance can be carried out before they cause serious secondary damage or large-scale production stoppages, thereby reducing unplanned downtime and lowering maintenance costs.

[0021] The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and embodiments.

Description of the Drawings

[0022] Figure 1 is an overall flowchart of the fault source localization method based on causal graphs and global scoring of the present invention;
Figure 2 is a structural block diagram of the fault source localization system based on causal graphs and global scoring of the present invention;
Figure 3 is a timing diagram illustrating the fault propagation pattern in an embodiment of the present invention;
Figure 4 is a diagram showing the reasonableness scoring results of an embodiment of the present invention.

Detailed Implementation

[0023] The following detailed description of embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely to illustrate selected embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are within the scope of protection of the invention.

[0024] Referring to Figure 1, the fault source localization method based on causal graphs and global scoring includes the following steps: Step S1: Construct a digital twin model that integrates the causal dependencies between devices, specifically including: S11. Collect static attribute information, dynamic operating data, and physical connection and process control relationship data between devices in the industrial system.

[0025] S12. Based on the collected data, construct a system-level causal relationship graph model that fuses device mechanism knowledge with data. This model is represented mathematically as a weighted directed graph $G = (V, E, W)$, where the vertex set $V$ represents all devices in the system, the edge set $E$ represents the direct dependencies between devices, and $W$ is the weight matrix. The element $w_{ij}$ of $W$ quantifies the influence of a fault or anomaly propagating from device $v_i$ to device $v_j$ as a normalized influence coefficient based on the fault propagation probability; this coefficient is obtained by correlation analysis of historical fault event data or by simulation based on physical mechanism models.
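As an illustration of the weighted directed graph described in S12, the following minimal Python sketch stores the weight matrix as a nested dictionary. The class name `CausalGraph`, the device names, and the numeric weights are hypothetical and are not taken from the patent; the only constraint borrowed from the text is that each weight is a normalized coefficient, assumed here to lie in [0, 1].

```python
from dataclasses import dataclass, field


@dataclass
class CausalGraph:
    """Weighted directed graph over device identifiers.

    weights[src][dst] holds the normalized influence coefficient of the
    edge src -> dst; a missing entry means there is no direct dependency.
    """
    weights: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_device(self, device: str) -> None:
        # Register a vertex even if it has no outgoing edges yet.
        self.weights.setdefault(device, {})

    def add_dependency(self, src: str, dst: str, weight: float) -> None:
        # Assumption: "normalized" means the coefficient lies in [0, 1].
        if not 0.0 <= weight <= 1.0:
            raise ValueError("influence coefficient must lie in [0, 1]")
        self.add_device(src)
        self.add_device(dst)
        self.weights[src][dst] = weight

    @property
    def devices(self) -> set[str]:
        return set(self.weights)

    @property
    def edges(self) -> list[tuple[str, str, float]]:
        return [(s, d, w) for s, nbrs in self.weights.items() for d, w in nbrs.items()]


# Hypothetical three-device chain with made-up weights.
graph = CausalGraph()
graph.add_dependency("pump", "heat_exchanger", 0.9)
graph.add_dependency("heat_exchanger", "reactor", 0.8)
```

A sparse nested dictionary is used instead of a dense matrix because industrial dependency graphs usually have few edges per device; a dense array would be an equally valid choice.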

[0026] S13. Fuse the three-dimensional geometric model and physical attribute model of each individual device with the system-level causal relationship graph model to establish a multi-level digital twin that reflects both the individual device state and the system-level causal linkage effect. Each device node $v_i$ in $V$ is associated with a predefined set of key performance parameters, which is derived from S11 and serves as the specific target for independent monitoring and baseline modeling in step S2.

[0027] Step S2: Based on an individualized health baseline obtained by self-learning of working conditions, perform real-time anomaly detection on the operating parameters of each device in the digital twin model constructed in step S1, and generate abnormal status information, specifically as follows: S21. For each key performance parameter defined in S13 and associated with a device node, establish and maintain a dynamic baseline model that estimates, in real time, the normal range of the parameter value under the current operating conditions. For the parameter value $x_t$ at time $t$, the dynamic baseline model calculates the dynamic reference center value $\mu_t$ and the dynamic standard deviation $\sigma_t$ that characterizes the normal range of fluctuation. The dynamic baseline model is established and maintained using the exponentially weighted moving average (EWMA) algorithm, with the following update formulas: $\mu_t = \lambda \mu_{t-1} + (1 - \lambda) x_t$ and $\sigma_t^2 = \lambda \sigma_{t-1}^2 + (1 - \lambda)(x_t - \mu_t)^2$, where $\lambda$ is the forgetting factor, which controls how long the model retains historical data and how quickly it adapts to new data, and the initial values $\mu_0$ and $\sigma_0$ are obtained from statistics of data collected during a period of stable operation after the device is started.
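The EWMA baseline of S21 can be sketched in a few lines of Python. This is a minimal sketch of one common EWMA recursion, assuming that the forgetting factor weights the previous estimate and that the variance update uses the freshly updated center value; the class name `EwmaBaseline` and the default forgetting factor are hypothetical.

```python
import math


class EwmaBaseline:
    """Tracks a dynamic center value and standard deviation for one parameter."""

    def __init__(self, mean0: float, std0: float, forgetting: float = 0.95):
        # Assumption: the forgetting factor lies strictly between 0 and 1.
        if not 0.0 < forgetting < 1.0:
            raise ValueError("forgetting factor must lie in (0, 1)")
        self.mean = mean0          # initial center value from stable operation
        self.var = std0 ** 2       # initial variance from stable operation
        self.forgetting = forgetting

    def update(self, x: float) -> tuple[float, float]:
        """Fold one new sample into the baseline; return (center, std)."""
        lam = self.forgetting
        self.mean = lam * self.mean + (1.0 - lam) * x
        self.var = lam * self.var + (1.0 - lam) * (x - self.mean) ** 2
        return self.mean, math.sqrt(self.var)


baseline = EwmaBaseline(mean0=10.0, std0=1.0, forgetting=0.9)
center, std = baseline.update(12.0)
```

A forgetting factor close to 1 makes the baseline follow operating-condition changes slowly, which keeps slow drifts visible as deviations; a smaller value adapts faster but can absorb a developing fault into the baseline.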

[0028] S22. Calculate in real time the normalized statistical deviation $D_t$ of the current sample value $x_t$ relative to its dynamic baseline center value. The calculation formula is $D_t = \dfrac{|x_t - \mu_t|}{k \cdot \sigma_t}$, where $k$ is a preset dynamic adjustment factor, chosen for different health degradation patterns, that controls the strictness of anomaly detection.

[0029] S23. Perform anomaly detection based on the deviation $D_t$: if $D_t > 1$, the parameter is determined to be in an abnormal state; that is, the judgment threshold is set to 1. The anomaly judgment results of all monitored parameters of each device are aggregated to form structured abnormal status information. This information includes at least: a list of abnormal device identifiers, such as device ID and name; a list of abnormal parameters corresponding to each abnormal device; and the real-time deviation value of each abnormal parameter.
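Steps S22 and S23 can be sketched as follows. The sketch assumes that the deviation is the absolute distance from the baseline center divided by the adjustment factor times the dynamic standard deviation, which is one reading consistent with a judgment threshold of 1. The names `ParameterReading`, `deviation`, and `anomaly_status`, the default factor of 3.0, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParameterReading:
    device_id: str
    parameter: str
    value: float    # current sample
    center: float   # dynamic baseline center value
    std: float      # dynamic standard deviation
    k: float = 3.0  # hypothetical adjustment factor


def deviation(r: ParameterReading) -> float:
    """Normalized deviation of a sample from its dynamic baseline."""
    if r.std <= 0.0 or r.k <= 0.0:
        raise ValueError("std and k must be positive")
    return abs(r.value - r.center) / (r.k * r.std)


def anomaly_status(readings, threshold: float = 1.0) -> dict[str, dict[str, float]]:
    """Aggregate abnormal parameters per device: {device: {parameter: deviation}}."""
    status: dict[str, dict[str, float]] = {}
    for r in readings:
        d = deviation(r)
        if d > threshold:
            status.setdefault(r.device_id, {})[r.parameter] = d
    return status


# Made-up readings: the pump pressure is far from its baseline, the
# heat exchanger temperature is within its normal range.
readings = [
    ParameterReading("pump", "outlet_pressure", value=4.0, center=5.0, std=0.2),
    ParameterReading("heat_exchanger", "outlet_temperature", value=80.5, center=80.0, std=1.0),
]
status = anomaly_status(readings)
```

The resulting dictionary carries the three items the text requires: the abnormal device identifiers (keys), the abnormal parameters of each device, and the deviation value of each abnormal parameter.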

[0030] Step S3: Based on the causal dependencies between devices in the digital twin model constructed in step S1, analyze and reason about the abnormal status information detected in step S2 to locate the root fault source of the system abnormality, specifically including: S31. When step S2 detects that multiple devices in the system report anomalies simultaneously, that is, the number of abnormal devices is ≥2, extract all currently abnormal devices to form a candidate abnormal device set $A$.

[0031] S32. For each candidate device node $v_c$ in the set $A$, assume that it is the root fault source and calculate its reasonableness score $S(v_c)$ for explaining the anomaly pattern of the entire system. The calculation formula is $S(v_c) = \sum_{v_a \in A} I(v_c, v_a) \cdot Sev(v_a)$, where $I(v_c, v_a)$ is the causal influence strength from candidate device node $v_c$ to abnormal device node $v_a$ in $G$, and $Sev(v_a)$ is a comprehensive measure of the anomaly severity of abnormal device $v_a$. The causal influence strength $I(v_c, v_a)$ is calculated as follows: in the weighted directed graph $G$, search for the set $\mathcal{P}(v_c, v_a)$ of all directed paths from candidate device node $v_c$ to abnormal device node $v_a$. For each path $p$ in this set, calculate the product of the weights of all edges along the path and denote it as the path propagation strength, $Str(p) = \prod_{(v_i, v_j) \in p} w_{ij}$, where $(v_i, v_j)$ is a directed edge on path $p$, $w_{ij}$ is the element of the weight matrix $W$ giving the normalized influence coefficient, based on the fault propagation probability, from node $v_i$ to node $v_j$, and $\prod$ denotes the running product. $I(v_c, v_a)$ is then defined as the maximum path propagation strength over all paths: $I(v_c, v_a) = \max_{p \in \mathcal{P}(v_c, v_a)} Str(p)$.
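The maximum path propagation strength does not require enumerating every path. Assuming all edge weights lie in [0, 1], the product along a path can only shrink as the path grows, so a best-first search (a Dijkstra variant that maximizes a product instead of minimizing a sum) finds the strongest path directly. The sketch below is a substitute for the exhaustive path search described in the text, not the patent's own procedure; the function name `causal_influence`, the convention that a node influences itself with strength 1.0, and the sample weights are assumptions.

```python
import heapq


def causal_influence(weights: dict, source: str, target: str) -> float:
    """Maximum product of edge weights over directed paths source -> target.

    weights[u][v] is the coefficient of edge u -> v, assumed to lie in [0, 1].
    Returns 0.0 when the target cannot be reached from the source.
    """
    if source == target:
        return 1.0  # assumed convention, not stated in the source text
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap emulated with negated strengths
    while heap:
        neg_strength, node = heapq.heappop(heap)
        strength = -neg_strength
        if node == target:
            return strength
        if strength < best.get(node, 0.0):
            continue  # stale heap entry; a stronger path was already found
        for nxt, w in weights.get(node, {}).items():
            candidate = strength * w
            if candidate > best.get(nxt, 0.0):
                best[nxt] = candidate
                heapq.heappush(heap, (-candidate, nxt))
    return 0.0


# Hypothetical graph: the two-hop path P -> H -> R (0.9 * 0.8 = 0.72)
# is stronger than the direct edge P -> R (0.5).
example_weights = {"P": {"H": 0.9, "R": 0.5}, "H": {"R": 0.8}, "R": {}}
strength_pr = causal_influence(example_weights, "P", "R")
strength_rp = causal_influence(example_weights, "R", "P")
```

Because the search expands nodes in order of decreasing strength, the first time the target is popped its strength is already the maximum, which keeps the cost close to that of a shortest-path query even on graphs with many alternative routes.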

[0032] The comprehensive anomaly severity $Sev(v_a)$ is calculated as follows: first, obtain the deviations $d_1, d_2, \ldots, d_m$ of all abnormal parameters of device $v_a$ reported in step S2, where $m$ is the number of abnormal parameters; then aggregate these $m$ deviations, either as a weighted average or by taking the maximum value: $Sev(v_a) = \dfrac{\sum_{i=1}^{m} \alpha_i d_i}{\sum_{i=1}^{m} \alpha_i}$ or $Sev(v_a) = \max_{1 \le i \le m} d_i$, where $\alpha_i$ is the weighting coefficient of the $i$-th abnormal parameter, reflecting the importance of that parameter to the health status of the device.
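Both aggregation options for the severity measure fit in one small function. In this hypothetical sketch, omitting the weighting coefficients selects the maximum, and supplying them selects a weighted average normalized by the sum of the coefficients; the function name `severity` and the sample numbers are assumptions.

```python
def severity(deviations, coefficients=None) -> float:
    """Aggregate the deviations of one device's abnormal parameters.

    Without coefficients the maximum deviation is returned; with
    coefficients a weighted average (normalized by their sum) is returned.
    """
    if not deviations:
        raise ValueError("at least one abnormal parameter is required")
    if coefficients is None:
        return max(deviations)
    if len(coefficients) != len(deviations):
        raise ValueError("one coefficient per deviation is required")
    total = sum(coefficients)
    if total <= 0.0:
        raise ValueError("coefficients must sum to a positive value")
    return sum(a * d for a, d in zip(coefficients, deviations)) / total


sev_max = severity([1.5, 3.0])                   # maximum variant
sev_weighted = severity([1.5, 3.0], [1.0, 3.0])  # weighted-average variant
```

The maximum variant is the more conservative choice because a single badly deviating parameter dominates; the weighted average suits devices where several moderately abnormal parameters together indicate degradation.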

[0033] S33. Compare the reasonableness scores $S(v_c)$ of all candidate device nodes and select the device with the highest score as the finally located root fault source.
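The scoring and selection in S32 and S33 can be sketched as below, assuming the reasonableness score is the sum, over all abnormal devices, of causal influence strength times anomaly severity. The influence table is passed in precomputed; all numbers are made up for illustration, and treating a candidate's influence on itself as 1.0 is an assumption, not something the text specifies.

```python
def reasonableness_scores(candidates, influence, severity_of) -> dict[str, float]:
    """Score each candidate by how well it explains all observed anomalies.

    influence[(c, a)] is the causal influence strength from c to a
    (missing pairs count as 0.0); severity_of[a] is the severity of a.
    """
    return {
        c: sum(influence.get((c, a), 0.0) * severity_of[a] for a in candidates)
        for c in candidates
    }


def locate_root(scores: dict[str, float]) -> str:
    """Return the candidate with the highest reasonableness score."""
    return max(scores, key=scores.get)


# Hypothetical series chain P -> H -> R with made-up strengths and severities.
candidates = ["P", "H", "R"]
influence = {
    ("P", "P"): 1.0, ("P", "H"): 0.9, ("P", "R"): 0.72,
    ("H", "H"): 1.0, ("H", "R"): 0.8,
    ("R", "R"): 1.0,
}
severity_of = {"P": 1.2, "H": 2.0, "R": 1.5}

scores = reasonableness_scores(candidates, influence, severity_of)
root = locate_root(scores)
```

With these numbers the upstream device P scores highest even though H shows the largest individual severity, because only P has a causal route to every abnormal device.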

[0034] Step S4: Based on the root cause of the fault located in step S3 and the relevant information generated in steps S1 and S2, synthesize and output an interpretable diagnostic report containing evidence of the root cause of the fault, propagation path, and parameters; specifically including: S41. In the three-dimensional visualization scene of the digital twin model, highlight the root fault source device located in step S3 by using bright colors, flashing, or outer frame markings.

[0035] S42. In a 3D visualization scene, draw the fault propagation link from the root fault source device to other affected abnormal devices. The link is visualized and rendered based on the causal relationship graph model of mechanism-data fusion in step S1 and the path of maximum causal influence calculated in step S3, and the direction of fault propagation is indicated by animated arrows.

[0036] S43. Provide a multi-dimensional parameter evidence view, including at least: (a) a historical trend graph of the key abnormal parameters of the root fault source device, overlaid with the dynamic baseline center value and normal fluctuation range; and (b) a list of the key abnormal parameters of the affected devices together with a table of their current deviation values.

[0037] S44. Generate a structured diagnostic report document with the following sections: fault summary, root cause analysis, impact scope assessment, parameter evidence details, and maintenance recommendations. The root cause analysis section uses the reasonableness score as the basis for inference.
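The five report sections listed in S44 map naturally onto a small data structure. The following is a hypothetical sketch of such a report serialized as JSON; the class name `DiagnosticReport`, the field layout, and every value in the usage example are illustrative assumptions, not the patent's report format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DiagnosticReport:
    fault_summary: str
    root_cause_analysis: dict   # root device plus reasonableness scores
    impact_scope: list          # affected devices along the propagation path
    parameter_evidence: dict    # device -> {parameter: deviation}
    maintenance_recommendations: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize all five sections into a structured document."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up content for illustration only.
report = DiagnosticReport(
    fault_summary="Concurrent anomalies on 3 devices",
    root_cause_analysis={"root_device": "P", "scores": {"P": 4.08, "H": 3.2, "R": 1.5}},
    impact_scope=["H", "R"],
    parameter_evidence={"P": {"outlet_pressure": 1.2}},
    maintenance_recommendations=["Inspect the mechanical seal of P"],
)
document = report.to_json()
```

Keeping the scores of all candidates in the root cause section, rather than only the winner, lets maintenance personnel see how decisively the located source outranks the alternatives.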

[0038] Referring to Figure 2, the fault source localization system based on causal graphs and global scoring includes: The system modeling and knowledge base module, used to execute step S1, specifically including: an equipment information acquisition unit, used to collect static and dynamic data of equipment from the industrial site; a causal relationship mining unit, used to analyze data and construct the mechanism-data fusion causal relationship graph of the equipment; and a model fusion unit, used to integrate the geometric, attribute, and causal models to form a digital twin. This module outputs and maintains a system-level causal knowledge base.

[0039] The real-time sensing and anomaly detection module is used to execute step S2, and specifically includes: a data stream processing unit for receiving and processing sensor data in real time; a dynamic baseline management unit for maintaining an independent EWMA baseline model for each monitoring parameter; and an anomaly calculation and judgment unit for calculating deviation and generating anomaly events in real time. This module outputs a real-time and accurate anomaly state stream.

[0040] The intelligent diagnosis and root cause reasoning module is used to execute step S3, specifically including: a multi-abnormal event aggregation unit to identify concurrent abnormal patterns; a causal graph traversal calculation unit to perform path search and intensity calculation on the causal network in the knowledge base; and a scoring and decision unit to calculate the rationality score of each candidate source and determine the root cause of the failure. This module is the core analysis engine of the system.

[0041] The visualization interaction and report generation module is used to execute step S4, and specifically includes: a 3D scene rendering unit, responsible for the display and interaction of the digital twin scene; a visualization annotation unit, responsible for highlighting fault sources and drawing propagation paths; a data visualization unit, responsible for generating parameter trend charts; and a report synthesis unit, which automatically assembles and generates structured diagnostic documents. This module is the human-computer interaction interface.

[0042] A unified data and service bus provides standardized data access, message communication, service invocation, and storage support for the system modeling and knowledge base module, real-time perception and anomaly detection module, intelligent diagnosis and root cause reasoning module, and visualization interaction and report generation module, ensuring reliable and efficient transmission of data and instruction flows between modules.

[0043] Example: This embodiment uses a material pretreatment unit of a chemical plant as an example to illustrate the specific implementation of the present invention.

[0044] 1. Target system and modeling: The pretreatment unit consists of three key devices connected in series: feed pump → tubular heat exchanger (H) → preheating reactor (R). The feed pump provides the material flow for the system, the heat exchanger H is responsible for precise temperature control, and the reactor R carries out a preliminary chemical reaction.

[0045] The first step in implementing this invention is to construct a digital twin causal model for this unit. Data acquisition and parameter definition: real-time sensor data are accessed for the key parameters, namely the feed pump outlet pressure, the H outlet temperature, and the R internal pressure.

[0046] Causal graph construction: Based on historical operational data and process knowledge, a weighted directed graph is constructed. Its node set contains the feed pump, H, and R; its edge set contains the directed edges feed pump → H and H → R; and each edge carries a weight (pump performance has a significant impact on heat exchange, and temperature is crucial for the reaction).
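A minimal Python sketch of such a weighted directed graph for the three-device unit. The node label "P" for the feed pump is an illustrative choice. The weight 0.85 is taken from the 85% impact quoted in the structured report of this embodiment; the weight 0.92 is an assumption, chosen only because 0.85 × 0.92 reproduces the 78.2% figure in that report.

```python
# Causal graph sketch: node -> {successor: normalized influence weight}.
# "P" (feed pump) is an illustrative label; 0.92 is an assumed weight.
CAUSAL_GRAPH = {
    "P": {"H": 0.85},  # pump performance strongly affects heat exchange
    "H": {"R": 0.92},  # feed temperature is crucial for the reaction
    "R": {},           # the reactor has no downstream device in this unit
}


def validate_graph(graph):
    """Check that every edge targets a known node and every weight lies in (0, 1]."""
    for source, successors in graph.items():
        for target, weight in successors.items():
            if target not in graph:
                raise ValueError(f"edge {source}->{target} targets an unknown node")
            if not 0.0 < weight <= 1.0:
                raise ValueError(f"edge {source}->{target} has weight {weight}")
    return True


def edges(graph):
    """Return the edge list as (source, target, weight) tuples."""
    return [(s, t, w) for s, succ in graph.items() for t, w in succ.items()]
```

An adjacency dictionary keeps the sketch dependency-free; a weight matrix would carry the same information for a dense graph.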

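[0047] 2. Dynamic monitoring and anomaly injection: A dynamic baseline model is established for each of the three parameters (feed pump outlet pressure, H outlet temperature, and R internal pressure), and the deviation of each parameter from its baseline is monitored in real time.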
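A minimal sketch of one such dynamic baseline, assuming an exponentially weighted moving average for both the center value and the variance. The class name, the parameter names `forgetting` and `k`, their default values, and the convention that `forgetting` is the weight of the newest sample are all assumptions made for illustration, not values stated in this embodiment.

```python
import math


class DynamicBaseline:
    """EWMA baseline for one parameter: center value, spread, and deviation."""

    def __init__(self, center, std, forgetting=0.1, k=3.0):
        if std <= 0 or not 0 < forgetting < 1 or k <= 0:
            raise ValueError("std and k must be positive, forgetting in (0, 1)")
        self.center = float(center)
        self.var = float(std) ** 2
        self.forgetting = forgetting  # assumed: weight given to the newest sample
        self.k = k                    # assumed: width of the normal band in std units

    def deviation(self, x):
        """Normalized deviation of x from the baseline; above 1 means abnormal."""
        return abs(x - self.center) / (self.k * math.sqrt(self.var))

    def update(self, x):
        """Score x against the current baseline, then fold x into the baseline."""
        d = self.deviation(x)
        diff = x - self.center
        self.center += self.forgetting * diff
        # Incremental EWMA variance, consistent with the center update above.
        self.var = (1 - self.forgetting) * (self.var + self.forgetting * diff ** 2)
        return d
```

One design point this sketch leaves open: a slowly developing fault can drag the baseline along with it. A smaller forgetting factor, or suspending updates while a parameter is abnormal, are possible mitigations.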

[0048] A real-world failure scenario is simulated: because the mechanical seal of the feed pump is worn, the pump outlet pressure begins a slow but continuous decline.

[0049] 3. Fault propagation and detection: The drop in pump outlet pressure reduces the material flow supplied to H.

[0050] Because the flow is insufficient, heat exchange in H is inadequate, and the H outlet temperature begins to fall below the set value.

[0051] Because the feed temperature is insufficient, the chemical reaction rate in R slows down, and the R internal pressure rises at an abnormal rate.

[0052] The dynamic baseline models successively detect that the deviations of the H outlet temperature, the R internal pressure, and the feed pump outlet pressure exceed the threshold. The system then determines that multiple devices are experiencing concurrent anomalies and triggers the root cause localization process.

[0053] As shown in Figure 3, the specific timing pattern of fault propagation is as follows. H (heat exchanger) alarms first: at time T1, its temperature deviation is the first to exceed the threshold (1.0), which confirms that downstream equipment is often the first to show obvious abnormalities.

[0054] R (reactor) alarms second: at time T2, its pressure anomaly follows, indicating that the fault has spread along the process chain.

[0055] The feed pump is the last to be flagged: although it is the root cause of the fault, its abnormality (the pressure drop) does not reach a detectable level until time T3, which shows that the root cause of a fault may be hidden.

[0056] In addition, Figure 3 shows that while the feed pump pressure deviation is still only 0.5, the system has already recognized a system-level problem from the anomalies of H and R and issues an early warning two time units in advance. Because it uses a dynamic baseline center value, the system avoids the false alarms that a static threshold would produce during the normal fluctuation period from T0 to T1, which demonstrates the advantage of the invention's design. Finally, at time T4, all three devices are abnormal at the same time, which triggers the root cause analysis logic.

[0057] 4. Causal reasoning and root cause identification: The system extracts the set of abnormal nodes (the feed pump, H, and R) and calculates the rationality score of each candidate source, as shown in Figure 4. With the feed pump as the source: the anomalies of both H and R can be explained through causal paths, and the rationality score is computed accordingly.

[0058] With H as the source: only the anomaly of R can be explained, which gives a lower rationality score.

[0059] With R as the source: no upstream anomaly can be explained, so the rationality score is 0.

[0060] Comparing the scores, that of the feed pump is the highest, so the system correctly identifies the feed pump as the root cause of the failure.
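The comparison above can be sketched in Python under stated assumptions. The score is taken to be the sum, over the other abnormal devices, of the maximum path propagation strength times that device's severity, which is one reading of the scoring step rather than a confirmed formula. The label "P", the weight 0.92, and all severity values are hypothetical.

```python
GRAPH = {"P": {"H": 0.85}, "H": {"R": 0.92}, "R": {}}  # 0.92 is an assumed weight
SEVERITY = {"P": 1.1, "H": 1.6, "R": 1.4}  # hypothetical deviation-based severities


def max_path_strength(graph, source, target, visited=None):
    """Largest product of edge weights over all simple directed paths source -> target."""
    if source == target:
        return 1.0
    visited = (visited or set()) | {source}
    best = 0.0  # stays 0.0 when no directed path exists
    for nxt, weight in graph[source].items():
        if nxt not in visited:
            best = max(best, weight * max_path_strength(graph, nxt, target, visited))
    return best


def rationality_score(graph, candidate, abnormal, severity):
    """Sum of (influence strength x severity) over the other abnormal devices."""
    return sum(
        max_path_strength(graph, candidate, device) * severity[device]
        for device in abnormal
        if device != candidate
    )


def locate_root_cause(graph, abnormal, severity):
    """Return the highest-scoring candidate together with all candidate scores."""
    scores = {c: rationality_score(graph, c, abnormal, severity) for c in abnormal}
    return max(scores, key=scores.get), scores
```

With these inputs, `locate_root_cause(GRAPH, ["P", "H", "R"], SEVERITY)` ranks the feed pump first, gives H a smaller non-zero score because it reaches only R, and gives R a score of 0 because it has no outgoing path. That ordering matches the one described in this embodiment.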

[0061] 5. Visualized diagnostic report generation: In the digital twin interface, the feed pump is highlighted in red.

[0062] The fault propagation path from the feed pump to H and then to R is shown clearly by an animated red arrow.

[0063] At the same time, a historical trend chart of the feed pump outlet pressure pops up, showing how the pressure gradually falls below the dynamic baseline band.

[0064] The system generates a structured report: "Root cause of failure: feed pump (seal wear). Impact path: feed pump → H (85% impact, resulting in a below-normal temperature) → R (78.2% impact, causing abnormal pressure). Pump overhaul is recommended as a priority."
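A short sketch of how such a report line could be assembled, assuming that the percentage shown for each device is the cumulative product of edge weights along the path. The function name, its arguments, and the H → R weight of 0.92 are hypothetical; the weight is chosen so that 0.85 × 0.92 gives the 78.2% quoted above.

```python
def build_report(root, cause, path, edge_weights, effects, advice):
    """Render a one-paragraph diagnostic report with cumulative impact along the path."""
    strength = 1.0
    hops = []
    for source, target in zip(path, path[1:]):
        strength *= edge_weights[(source, target)]  # cumulative path strength so far
        hops.append(f"{target} ({strength:.1%} impact, {effects[target]})")
    return (
        f"Root cause of failure: {root} ({cause}). "
        f"Impact path: {path[0]} -> " + " -> ".join(hops) + f". {advice}"
    )


EXAMPLE_REPORT = build_report(
    root="feed pump",
    cause="seal wear",
    path=["feed pump", "H", "R"],
    edge_weights={("feed pump", "H"): 0.85, ("H", "R"): 0.92},  # 0.92 is assumed
    effects={"H": "below-normal temperature", "R": "abnormal pressure"},
    advice="Pump overhaul is recommended as a priority.",
)
```

A full report document would add further sections; this sketch covers only the root cause and impact path sentence.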

[0065] In this embodiment, compared with the traditional single-point threshold alarm method, the present invention not only detects system-level anomalies earlier but, more importantly, sees through the two surface symptoms of "low heat exchanger temperature" and "abnormal reactor pressure" and directly, automatically, and interpretably locates the real root cause: the feed pump failure, which is hidden upstream and has insignificant initial symptoms. This guides maintenance personnel to carry out precise and efficient repairs, avoiding misjudgment and wasted maintenance resources.

[0066] Therefore, this invention employs the aforementioned fault source localization method and system based on causal graphs and global scoring to construct an integrated intelligent fault prediction framework encompassing "perception-cognition-reasoning-decision." The technical solution begins with system-level causal modeling: by analyzing historical data and mechanistic knowledge, a causal graph is constructed with devices as nodes, fault propagation relationships as directed edges, and influence intensity as weights. This graph is then integrated with the device's 3D model to form a digital twin reflecting the system's interconnected effects. Based on this, the solution enters the real-time dynamic perception stage: an independent, condition-based, individualized health baseline model is established for each key performance parameter. Algorithms such as exponentially weighted moving averages are used to learn its normal behavior range online, and standardized deviations are calculated to overcome the limitations of fixed thresholds and achieve accurate anomaly detection through condition-adaptive learning. When multiple device anomalies are detected, the solution activates the intelligent causal reasoning engine: abnormal events are mapped to the causal graph, each anomalous device is assumed to be the root cause, and a graph traversal algorithm is used to calculate the theoretical influence intensity of the anomalous device on all other anomalous devices through the causal path. This is combined with the severity of each device's anomaly to form a comprehensive explanatory power score. The device with the highest score is identified as the most likely root cause of the fault. 
Ultimately, the solution delivers interpretable decision outputs: the fault source and propagation path are highlighted in the digital twin visualization interface, the historical trends and dynamic baselines of key parameters are displayed simultaneously, and a structured diagnostic report is automatically generated, clearly explaining "where the fault is, why it is so, and what the impact is," thereby transforming data value into knowledge that can directly guide operation and maintenance actions, and realizing a closed loop from anomaly perception to root cause management.

[0067] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can still be made to the technical solutions of the present invention, and these modifications or equivalent substitutions cannot cause the modified technical solutions to deviate from the spirit and scope of the technical solutions of the present invention.

Claims

1. A fault source localization method based on a causal graph and global scoring, characterized in that it includes the following steps: Step S1: construct a digital twin model that integrates the causal dependencies between devices; Step S2: based on an individualized health baseline obtained by working condition self-learning, perform real-time anomaly detection on the operating parameters of each device in the digital twin model constructed in step S1, and generate abnormal state information; Step S3: based on the causal dependencies between devices in the digital twin model constructed in step S1, analyze and reason about the abnormal state information detected in step S2 to locate the root fault source that caused the system abnormality; Step S4: based on the root fault source located in step S3 and the relevant information generated in steps S1 and S2, synthesize and output an interpretable diagnostic report containing the root fault source, the propagation path, and parameter evidence.

2. The fault source localization method based on a causal graph and global scoring according to claim 1, characterized in that step S1 includes: S11. Collect static attribute information and dynamic operating data of the devices in the industrial system, together with data on the physical connections and process control relationships between devices; S12. Based on the collected data, construct a system-level causal relationship graph model of device mechanism-data fusion. This model is represented mathematically as a weighted directed graph G = (V, E, W), where the vertex set V represents all devices in the system, the edge set E represents the direct dependencies between devices, and the element w_ij of the weight matrix W quantifies how strongly a fault or anomaly of device v_i propagates to and affects device v_j; w_ij is a normalized influence coefficient based on the fault propagation probability, obtained by correlation analysis of historical fault event data or by simulation with a physical mechanism model; S13. Fuse the three-dimensional geometric model and physical attribute model of each individual device with the system-level causal relationship graph model of device mechanism-data fusion to establish a multi-level digital twin that reflects both the individual device states and the system-level causal linkage effects; wherein each device node in V is associated with a predefined set of key performance parameters, which is derived from S11 and serves as the specific target for independent monitoring and baseline modeling in step S2.

3. The fault source localization method based on a causal graph and global scoring according to claim 2, characterized in that step S2 includes: S21. For each key performance parameter defined in S13 and associated with a device node, establish and maintain a dynamic baseline model that estimates in real time the normal range of the parameter value under the current operating conditions; the dynamic baseline model is used to calculate the dynamic reference center value of the parameter at time t and the dynamic standard deviation that characterizes its normal fluctuation range; S22. Calculate in real time the normalized statistical deviation of the current sample value relative to its dynamic baseline center value; the calculation includes preset dynamic adjustment factors for different health decline patterns, which control the strictness of anomaly detection; S23. Perform anomaly detection based on the deviation: if the deviation exceeds the judgment threshold, which is set to 1, the parameter is determined to be in an abnormal state; the anomaly judgment results of all monitored parameters of each device are aggregated to form structured abnormal state information, which includes at least: a list of abnormal device identifiers, a list of the abnormal parameters corresponding to each abnormal device, and the real-time deviation value of each abnormal parameter.
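As an illustration of the aggregation in S23, the following sketch groups parameter deviations by device and keeps those above the threshold of 1. The class name, field names, and parameter names are hypothetical; only the threshold value and the three required contents of the status information come from the claim.

```python
from dataclasses import dataclass, field

THRESHOLD = 1.0  # judgment threshold stated in S23


@dataclass
class AnomalyStatus:
    """Structured anomaly status: abnormal devices, their parameters and deviations."""
    abnormal_devices: list = field(default_factory=list)
    abnormal_parameters: dict = field(default_factory=dict)  # device -> {parameter: deviation}


def aggregate_anomalies(deviations, threshold=THRESHOLD):
    """Collect every parameter whose deviation exceeds the threshold, grouped by device."""
    status = AnomalyStatus()
    for device, params in deviations.items():
        abnormal = {name: d for name, d in params.items() if d > threshold}
        if abnormal:
            status.abnormal_devices.append(device)
            status.abnormal_parameters[device] = abnormal
    return status
```

A device appears in the result only when at least one of its parameters is abnormal, so the length of `abnormal_devices` can be used directly for the "two or more devices" check that starts root cause localization.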

4. The fault source localization method based on a causal graph and global scoring according to claim 3, characterized in that step S3 includes: S31. When step S2 detects that multiple devices in the system (two or more) report anomalies simultaneously, extract all currently abnormal devices to form a candidate abnormal device set; S32. For each candidate device node in the set, assume that it is the root fault source and calculate its rationality score for explaining the abnormal pattern of the entire system; the rationality score is calculated from the causal influence strength measure from the candidate device node to each abnormal device node in G and from the comprehensive measure of the anomaly severity of each abnormal device; S33. Compare the rationality scores of all candidate device nodes and select the device with the highest score as the finally located root fault source.

5. The fault source localization method based on a causal graph and global scoring according to claim 4, characterized in that the causal influence strength measure is calculated as follows: in the weighted directed graph G, search for the set of all directed paths from the candidate device node to the abnormal device node; for each path, calculate the product of the weights of all directed edges on the path and denote it as the path propagation strength, where the weight of each edge is the corresponding element of the weight matrix, that is, the normalized influence coefficient, based on the fault propagation probability, from the start node of the edge to its end node; the causal influence strength measure is then defined as the maximum path propagation strength among all the paths.

6. The fault source localization method based on a causal graph and global scoring according to claim 5, characterized in that the comprehensive measure of anomaly severity is calculated as follows: first, obtain the deviations of all abnormal parameters of the device reported in step S2; then aggregate these deviations, specifically by taking a weighted average or the maximum value, where the weighting coefficient of each abnormal parameter reflects the importance of that parameter to the health status of the device.
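The two aggregation options for the severity measure can be sketched as follows. The function name, the `mode` switch, the equal-weight default, and the normalization by the sum of the weights are assumptions made for illustration.

```python
def severity(deviations, weights=None, mode="weighted"):
    """Aggregate one device's abnormal-parameter deviations into a severity value."""
    if not deviations:
        raise ValueError("at least one abnormal parameter is required")
    if mode == "max":
        return max(deviations)
    if mode != "weighted":
        raise ValueError(f"unknown mode: {mode}")
    if weights is None:
        weights = [1.0] * len(deviations)  # assumed default: equal importance
    if len(weights) != len(deviations):
        raise ValueError("one weight per deviation is required")
    return sum(w * d for w, d in zip(weights, deviations)) / sum(weights)
```

The maximum reacts to the single worst parameter, while the weighted average lets a parameter that matters more to device health dominate the result; which option suits a given device is a tuning decision.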

7. The fault source localization method based on a causal graph and global scoring according to claim 6, characterized in that in step S21 the dynamic baseline model is established and maintained with the exponentially weighted moving average algorithm, in whose update formula a forgetting factor controls how long the model retains historical data and how quickly it adapts to new data.
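The role of the forgetting factor can be made concrete with a worked equation. The notation is assumed: x_t is the sample at time t, \mu_t the baseline center value, and \lambda in (0, 1) the forgetting factor, taken here as the weight of the newest sample. If the intended convention instead applies the factor to the previous center value, \lambda and 1 - \lambda swap roles.

```latex
% Assumed convention: \lambda \in (0,1) is the weight of the newest sample x_t.
\mu_t = \lambda x_t + (1-\lambda)\,\mu_{t-1}
\quad\Longrightarrow\quad
\mu_t = \lambda \sum_{j=0}^{t-1} (1-\lambda)^{j}\, x_{t-j} + (1-\lambda)^{t}\,\mu_0
```

Under this convention the sample j steps in the past carries weight \lambda(1-\lambda)^j, and the weighted mean age of the data is (1-\lambda)/\lambda. For \lambda = 0.1 that is 9 samples, so a smaller \lambda retains history longer and adapts more slowly.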

8. The fault source localization method based on a causal graph and global scoring according to claim 7, characterized in that step S4 includes: S41. In the three-dimensional visualization scene of the digital twin model, highlight the root fault source device located in step S3 by means of a bright color, flashing, or an outline marker; S42. In the three-dimensional visualization scene, draw the fault propagation link from the root fault source device to the other affected abnormal devices; the link is rendered based on the mechanism-data fusion causal relationship graph model of step S1 and the maximum causal influence path calculated in step S3, and the direction of fault propagation is indicated by animated arrows; S43. Provide a multi-dimensional parameter evidence view, including at least: (a) a historical trend graph of the key abnormal parameters of the root fault source device, overlaid with the dynamic baseline center value and the normal fluctuation range; (b) a list of the key abnormal parameters of the affected devices and a table of their current deviation values; S44. Generate a structured diagnostic report document that includes the following sections: fault summary, root cause analysis, impact scope assessment, parameter evidence details, and maintenance recommendations; the root cause analysis section uses the rationality score as its basis for inference.

9. A fault source localization system based on a causal graph and global scoring, applied to the fault source localization method based on a causal graph and global scoring according to any one of claims 1-8, characterized in that it includes:
a system modeling and knowledge base module, used to execute step S1 and specifically including: an equipment information acquisition unit for collecting static and dynamic equipment data from the industrial site; a causal relationship mining unit for analyzing the data and constructing the causal relationship graph of equipment mechanism-data fusion; and a model fusion unit for integrating the geometric, attribute, and causal models to form the digital twin; this module outputs and maintains a system-level causal knowledge base;
a real-time sensing and anomaly detection module, used to execute step S2 and specifically including: a data stream processing unit for receiving and processing sensor data in real time; a dynamic baseline management unit for maintaining an independent EWMA baseline model for each monitored parameter; and an anomaly calculation and judgment unit for calculating deviations and generating anomaly events in real time; this module outputs a real-time and accurate anomaly state stream;
an intelligent diagnosis and root cause reasoning module, used to execute step S3 and specifically including: a multi-abnormal event aggregation unit for identifying concurrent abnormal patterns; a causal graph traversal calculation unit for performing path search and strength calculation on the causal network in the knowledge base; and a scoring and decision unit for calculating the rationality score of each candidate source and determining the root fault source; this module is the core analysis engine of the system;
a visualization interaction and report generation module, used to execute step S4 and specifically including: a 3D scene rendering unit responsible for the display of and interaction with the digital twin scene; a visualization annotation unit responsible for highlighting fault sources and drawing propagation paths; a data visualization unit responsible for generating parameter trend charts; and a report synthesis unit that automatically assembles and generates structured diagnostic documents; this module is the human-computer interaction interface.

10. The fault source localization system based on a causal graph and global scoring according to claim 9, characterized in that it further includes a unified data and service bus, which provides standardized data access, message communication, service invocation, and storage support for the system modeling and knowledge base module, the real-time sensing and anomaly detection module, the intelligent diagnosis and root cause reasoning module, and the visualization interaction and report generation module, ensuring reliable and efficient transmission of data flows and instruction flows between modules.