UPS operating state data analysis method and system
By constructing a dynamic heterogeneous graph model with physical topology alignment for UPS and multi-scale temporal feature encoding, combined with an adaptive graph attention mechanism, the problem of cross-component coupling fault identification in UPS systems under complex operating conditions is solved, achieving highly sensitive fault identification and interpretable diagnosis, adaptable to different models of UPS equipment.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: JIANGMEN ZETA POWER SUPPLY TECH CO LTD
- Filing Date: 2026-01-17
- Publication Date: 2026-06-12
Figure CN122196380A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of computer technology, specifically relating to a method and system for analyzing UPS operating status data.
Background Technology
[0002] Uninterruptible power supplies (UPS) are core devices that ensure continuous power supply to critical loads in data centers, medical systems, and industrial automation. Real-time monitoring and accurate analysis of their operating status are crucial for improving system reliability. Currently, UPS systems are typically equipped with various sensors to collect parameters such as voltage, current, and temperature, and rely on threshold comparisons or simple statistical methods to determine anomalies. These methods have significant limitations under real-world, complex operating conditions.
[0003] Existing monitoring methods typically process data from each sensor independently, lacking the ability to comprehensively model the coupling relationships of energy flow, heat flow, and signal flow among multiple components within a UPS. This results in insufficient ability to identify cross-component and multi-parameter coupled faults, leading to frequent false alarms and missed alarms. Furthermore, although some systems have introduced machine learning algorithms for state classification, they still suffer from poor model generalization, weak interpretability, and reliance on a large number of fault samples for training, making it difficult to adapt to new equipment or unknown fault modes. While graph neural networks have the potential for relational modeling, how to achieve accurate mapping of physical topology, dynamic fusion of multi-source data, and real-time inference under resource constraints in industrial scenarios remains an unsolved technical challenge.
[0004] Therefore, existing technologies are insufficient to achieve in-depth integrated analysis of UPS operating status, nor can they provide clear anomaly propagation paths and diagnostic criteria. There is an urgent need for an intelligent analysis method and system that can integrate equipment physical topology and operating data, possess cross-component correlation analysis capabilities, and support interpretable diagnostics.
Summary of the Invention
[0005] To address the aforementioned technical problems, this invention provides a UPS operating status data analysis method and system, specifically employing the following technical solution:
In a first aspect, a method for analyzing UPS operating status data includes:
constructing a dynamic heterogeneous graph model based on the physical topology information of the UPS device to be analyzed, wherein the physical topology information includes the structural connection relationships and functional dependency relationships between components;
acquiring multi-channel time-series operating status data of each component node, and processing the data with a multi-scale temporal feature encoder to generate a state embedding vector for each node;
inputting the state embedding vectors and the dynamic heterogeneous graph model into an adaptive graph attention network, and performing multiple rounds of information aggregation along the physical relationship edges defined in the graph to obtain node state representations that integrate the global context;
calculating, based on a graph autoencoder, the anomaly score of each node from its node state representation and the corresponding reconstruction error, and making an anomaly determination based on a preset threshold;
in response to the anomaly determination, tracing back the anomaly propagation path based on the attention weights output by the adaptive graph attention network, and generating an interpretable diagnostic report containing the suspected faulty components and the propagation paths.
[0006] Preferably, constructing the dynamic heterogeneous graph model based on the physical topology information of the UPS device to be analyzed specifically includes: identifying the functional components in the UPS equipment and the sensors deployed on them to form a set of entities; extracting, based on the technical documentation and domain knowledge of the UPS equipment, the structural connection relationships and functional dependencies between entities, wherein the structural connection relationships include electrical conduction paths, heat conduction paths, and control signal paths, and the functional dependencies are derived from the chain reactions of energy transfer, heat diffusion, or control signals between components; and uniformly representing the relationships as triples of (source component, relation type, target component) as the basis for constructing the graph model.
[0007] Preferably, constructing the dynamic heterogeneous graph model further includes: abstracting each functional component as a graph node and attributing the data of the sensors deployed on it to that node; assigning each node a type identifier, the type identifiers including power conversion node, energy storage node, heat dissipation node, and sensing node; and, based on the set of triples, establishing a directed edge with the corresponding edge type between the corresponding source node and target node, wherein the direction of the directed edge follows the natural propagation direction of energy flow, heat flow, or signal flow in the physical world.
[0008] Preferably, acquiring the multi-channel time-series operating status data of each component node and processing it with the multi-scale temporal feature encoder specifically includes: segmenting the synchronized operating status data collected by the sensors using a fixed time window, and standardizing each parameter based on historical statistics; reorganizing the standardized data by the nodes to which they belong to construct a three-dimensional temporal tensor with dimensions (number of nodes, number of time steps, feature dimension); and inputting the three-dimensional temporal tensor into the multi-scale temporal feature encoder, which extracts high-frequency, mid-frequency, and low-frequency temporal features through parallel branches with convolutional kernels of different lengths, then fuses the features extracted by each branch and maps them into a unified node state embedding vector.
[0009] Preferably, the multi-scale temporal feature encoder includes three parallel one-dimensional convolutional neural network branches: a high-frequency branch, configured with a convolutional kernel of a first length, used to extract millisecond-level high-frequency transient features; a mid-frequency branch, configured with a convolutional kernel of a second length that is longer than the first length, used to extract mid-term fluctuation features at the ten-millisecond level; and a low-frequency branch, configured with a convolutional kernel of a third length covering the entire time window, used to extract second-level trend drift features.
[0010] Preferably, the adaptive graph attention network aggregates information through an improved attention weight calculation unit. For a directed edge from source node i to target node j, the attention coefficient is computed from: the state embedding vector of source node i; the state embedding vector of target node j; the edge type embedding vector; and the learnable weight matrices corresponding to the type identifiers of nodes i and j, respectively.
[0011] Preferably, performing multi-round information aggregation includes performing at least three layers of graph attention aggregation operations, such that the final state representation of each node is fused with the state information of its components in its multi-hop neighborhood.
[0012] Preferably, the preset threshold is determined as follows: the graph autoencoder is trained on historical normal operation data, and the reconstruction error distribution of each node over the historical normal samples is calculated; the reconstruction error value at a specified high quantile of that distribution is used as the anomaly judgment threshold of the node.
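The per-node quantile threshold described above can be illustrated with a minimal sketch. It assumes that reconstruction errors on normal samples have already been computed; the 0.99 quantile, the linear-interpolation rule, the function names, and the sample values are illustrative assumptions, not values given in this application.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def node_thresholds(errors_by_node, q=0.99):
    """One threshold per node, taken from that node's own error distribution."""
    return {node: quantile(errs, q) for node, errs in errors_by_node.items()}


def is_anomalous(node, error, thresholds):
    """A node is flagged when its reconstruction error exceeds its threshold."""
    return error > thresholds[node]


# Hypothetical reconstruction errors observed on historical normal samples.
normal_errors = {
    "inverter": [0.10, 0.12, 0.11, 0.13, 0.09],
    "battery_pack": [0.02, 0.03, 0.02, 0.04, 0.03],
}
thresholds = node_thresholds(normal_errors, q=0.99)
```

Keeping a separate threshold per node means a component with naturally noisier readings is not flagged more often than a quiet one.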
[0013] Preferably, the anomaly propagation path backtracking includes: selecting the node with the highest anomaly score as the target node; identifying, based on the attention weights output by the adaptive graph attention network, the set of key upstream nodes that contribute the most to the target node; starting from each key upstream node, recursively tracing further upstream nodes against the direction of the directed edges in the graph to form potential anomaly propagation paths; and, for the nodes on each propagation path, verifying whether the deviation direction of their key operating parameters conforms to the causal logic of the corresponding physical relationship, and retaining the paths that pass the verification for generating diagnostic reports.
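The backtracking procedure above can be sketched as follows. This is a simplified illustration rather than the claimed algorithm: it follows only the single strongest incoming edge at each hop, whereas the text describes a set of key upstream nodes. The `min_weight` cutoff, the sign convention used for the physical-consistency check, and all numeric values are assumptions introduced for the example.

```python
def trace_path(target, attention, min_weight=0.2):
    """Walk upstream from `target`, always taking the strongest incoming edge.

    `attention` maps a directed edge (source, target) to its attention weight.
    """
    path, visited, current = [target], {target}, target
    while True:
        incoming = [
            (w, src)
            for (src, dst), w in attention.items()
            if dst == current and src not in visited and w >= min_weight
        ]
        if not incoming:
            break
        _, current = max(incoming)
        path.append(current)
        visited.add(current)
    return list(reversed(path))  # root-cause candidate first


def path_is_consistent(path, deviation, expected_sign):
    """Check each hop: the target must deviate in the direction the edge predicts.

    deviation[node] is +1 or -1; expected_sign[edge] is +1 when the target is
    expected to move with the source and -1 when it should move against it.
    """
    return all(
        deviation[src] * expected_sign[(src, dst)] * deviation[dst] > 0
        for src, dst in zip(path, path[1:])
    )


# Hypothetical anomaly scores and attention weights.
scores = {"rectifier": 0.3, "dc_bus": 0.6, "inverter": 0.9}
attention = {
    ("rectifier", "dc_bus"): 0.7,
    ("dc_bus", "inverter"): 0.8,
    ("battery_pack", "inverter"): 0.1,
}
target = max(scores, key=scores.get)
path = trace_path(target, attention)

# All three voltages sag together, which matches a power transmission chain.
deviation = {"rectifier": -1, "dc_bus": -1, "inverter": -1}
expected_sign = {("rectifier", "dc_bus"): +1, ("dc_bus", "inverter"): +1}
consistent = path_is_consistent(path, deviation, expected_sign)
```

A path that fails the consistency check would be dropped before the diagnostic report is generated.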
[0014] In a second aspect, a UPS operating status data analysis system is provided to implement the aforementioned UPS operating status data analysis method, the system comprising:
a physical topology parsing module, used to read and parse the physical topology information of the UPS device;
a dynamic heterogeneous graph construction module, used to construct a dynamic heterogeneous graph model based on the physical topology information;
a multi-source data acquisition and preprocessing module, used to collect the operating status data of each component node and perform time window segmentation and standardization to construct a multi-channel time-series tensor;
a core neural network inference module, including a multi-scale temporal feature encoder and an adaptive graph attention network, used to generate node state embedding vectors and perform multi-round context-aware information aggregation to output node state representations;
an intelligent analysis and diagnosis module, including a graph autoencoder and a path backtracking unit, used to detect anomalies based on the node state representations and, when an anomaly is detected, to backtrack the propagation path and generate a diagnostic report.
The core neural network inference module is deployed on an AI acceleration chip, while the remaining modules are deployed on edge computing units.
[0015] In summary, this application includes at least one of the following beneficial technical effects: 1. This invention constructs a dynamic heterogeneous graph model that is strictly aligned with the physical topology of the UPS and integrates multi-scale time-series feature encoding. This enables explicit modeling of the coupling relationship between energy flow, heat flow and signal flow across components within the equipment. This allows the system to capture hidden abnormal patterns of multi-parameter coupling that are difficult to detect with a single parameter threshold, thereby effectively reducing false alarm and missed alarm rates and achieving highly sensitive identification and early warning of complex early faults.
[0016] 2. This invention adopts an adaptive graph attention mechanism guided by physical topology and node type. Its core modeling object is the inherent component connection and functional dependency relationship of the device, rather than the fixed data pattern of a specific model. Therefore, by configuring different device topology knowledge bases, the same analysis core does not need to rely on a large number of fault samples for retraining, and can be adapted to UPS devices of different models or architectures, reducing the deployment and maintenance threshold and improving the universality of the method.
[0017] 3. Unlike a "black box" model that only outputs anomaly labels, this invention is based on graph attention weights and an anomaly propagation path backtracking algorithm. It can automatically identify the key components that contribute the most to the anomaly and visualize the potential propagation link of the anomaly in a way that conforms to physical causal logic. This provides maintenance personnel with a clear basis for decision-making, enabling them to quickly locate the root cause of the fault and take targeted maintenance measures, which greatly improves maintenance efficiency and system reliability.
Attached Figure Description
[0018] Figure 1 is a flowchart illustrating a UPS operating status data analysis method according to the present invention; Figure 2 is a schematic diagram of the process of fusing the adaptive graph attention mechanism and multi-scale temporal feature encoding in this invention; Figure 3 is a schematic diagram of the process for backtracking anomaly propagation paths and generating interpretable diagnoses in this invention.
Detailed Implementation
[0019] This invention provides a UPS operation status data analysis method and system. Its core lies in constructing a dynamic heterogeneous graph model that is strictly aligned with the physical topology of the UPS, and on this basis, integrating multi-scale temporal feature encoding and adaptive graph attention mechanism to achieve accurate identification and interpretable diagnosis of complex early faults.
[0020] To further illustrate the technical means and effects adopted by the present invention to achieve the intended purpose, the following detailed description is provided in conjunction with the accompanying drawings and preferred embodiments, based on the specific implementation of the present invention. All technical details are based on the actual physical structure of the UPS equipment, the sensor deployment logic, and the operating data acquisition specifications.
[0021] Example 1
A method for analyzing UPS operating status data includes the following steps:
Step S1: Obtain the physical topology information of the UPS device, which includes the following sub-steps:
S101: Identify the various functional components (such as rectifiers, inverters, static switches, battery packs, cooling fans, etc.) involved in the UPS equipment to be analyzed, as well as the sensors (such as voltage sensors, temperature sensors, current sensors, etc.) deployed on each component, forming a set of entities to be modeled.
[0022] S102: By systematically reading the technical manual, electrical schematics, control logic descriptions, and thermal design documents of this UPS model, the connection relationships and functional dependencies between various entities are identified.
[0023] These documents typically describe the electrical connections between components, the thermal management layout, and the flow of control signals. When extracting information from multiple documents, cross-validation and integration are required to ensure consistency in the description of the same entity or relationship.
[0024] If the document information is incomplete, it can be supplemented by knowledge of the general architecture of this UPS model. The core goal is to construct a topology description that accurately reflects the actual physical structure of the equipment.
[0025] S103: Extract the direct physical connections between entities from the documents, which fall mainly into three categories:
Electrical conduction path: for example, the rectifier output is connected to the inverter input via a DC bus;
Heat conduction path: for example, the IGBT power module inside the inverter is in close contact with the heat sink, and the heat sink is cooled by forced convection through a fan;
Control signal path: for example, the main control unit is connected to the fan speed control interface via a PWM signal line.
[0026] This type of relationship reflects the hard connection of the equipment in its structure.
[0027] S104: Identify the soft coupling relationship between entities in terms of function, that is, when the state of a component changes, it will affect the working performance of other components through the chain reaction of energy transfer, heat diffusion or control signals.
[0028] The identification of such relationships can be derived based on the operating principles of the equipment, physical laws (such as the law of conservation of energy and the heat conduction equation), and the experiential knowledge of domain experts. For example, a decrease in fan speed will directly affect heat dissipation efficiency, which may lead to an increase in the temperature of the IGBT module; aging of the DC bus capacitor will cause an increase in ripple current, which may further affect the quality of the inverter output voltage.
[0029] Incorporating these functional dependencies into the model along with structural hard connections is crucial for capturing the propagation of latent faults across components.
[0030] S105: The structural connections (S103) and functional dependencies (S104) identified above are uniformly represented as a triple form of (source component, relation type, target component).
[0031] The relationship types are divided into three categories: "power transmission", "heat conduction", and "control signal". For example, (rectifier, power transmission, DC bus sensor) represents a power transmission edge. Since significant mechanical linkage is usually not involved in the typical model of this UPS system, this type of relationship is not included for the time being.
[0032] All triples together constitute a structured description of the device's physical topology and serve as the sole basis for subsequently building the graph model, thereby ensuring that the computational model and the real device maintain strict consistency in topology.
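The triple representation from S105 can be sketched as a small data structure. This is a minimal illustration; the `Triple` class and `make_triple` helper are hypothetical names, and the three example triples are taken from the connections mentioned in the description.

```python
from typing import NamedTuple

# The three relation types named in S105.
RELATION_TYPES = {"power transmission", "heat conduction", "control signal"}


class Triple(NamedTuple):
    source: str
    relation: str
    target: str


def make_triple(source, relation, target):
    """Create a triple, rejecting relation types outside the three categories."""
    if relation not in RELATION_TYPES:
        raise ValueError(f"unsupported relation type: {relation!r}")
    return Triple(source, relation, target)


topology = [
    make_triple("rectifier", "power transmission", "DC bus sensor"),
    make_triple("inverter", "heat conduction", "cooling fan"),
    make_triple("main control unit", "control signal", "cooling fan"),
]
```

Validating the relation type at creation time keeps relations such as mechanical linkage, which this embodiment excludes, out of the topology description.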
[0033] Step S2: Construct a dynamic heterogeneous graph model based on physical topology information, which includes the following sub-steps:
S201: Abstract the key physical entities in the UPS system into nodes in a graph model.
[0034] This embodiment adopts an aggregation and abstraction strategy centered on functional components: each functional component (such as rectifier, inverter, static switch, battery pack, cooling fan) is abstracted as an independent node, and the sensors deployed on the component (such as voltage sensor, temperature sensor, current sensor) serve as auxiliary monitoring points of that component node. Their data is attributed to the node, and no separate sensor node is created. In this way, a node in the graph represents a functional unit of the device together with all of its monitoring data.
[0035] Each node is assigned a distinct type identifier, which falls into one of 4 categories:
Power conversion node: a component that performs AC/DC or DC/AC conversion, such as the rectifier, inverter, or static switch;
Energy storage node: corresponds to the battery pack;
Heat dissipation node: corresponds to the cooling fan;
Sensing node: under this strategy, this type refers only to the small number of independent, purely sensing units that do not belong to any of the aforementioned functional components (if they exist). In actual UPS systems, the vast majority of sensors belong to the three types of nodes mentioned above.
[0036] This classification is based on the main physical function of the components, providing a foundation for subsequent type-aware attention mechanisms.
[0037] S202: When constructing the graph data structure, store a "node type" attribute for each node, which can take one of the four categories mentioned above.
[0038] For example, the node corresponding to the rectifier is identified as a "power conversion node", and the node corresponding to the battery pack is an "energy storage node".
[0039] S203: Traverse the set of triples (source component, relation type, target component) generated in step S1, and construct directed edges in the graph according to the following rules: First, the "source component" and "target component" in the triplet are mapped to their respective functional component nodes. For example, in a triplet (rectifier, power transmission, DC bus sensor), the "DC bus sensor" is deployed on the DC bus. The DC bus is usually considered a functional entity connecting the rectifier and inverter or belongs to the rectifier / inverter node. Therefore, the sensor is mapped to the functional component node corresponding to "rectifier" or "DC bus" (depending on the topology design). If the component is a main body without an auxiliary sensor, such as a cooling fan, it is directly mapped to its own node.
[0040] Then, a directed edge is established between the source node and the target node obtained by mapping. Each edge is assigned a corresponding edge type identifier according to the "relation type" field in the triple, which is divided into three types: "power transmission edge", "heat conduction edge" and "control signal edge".
[0041] During the construction of directed edges, it is permissible to create edges that directly connect two functional component nodes, even if there may be a physical intermediary. The purpose is to explicitly capture key cross-component dependencies in the model and control model complexity. For example, a triple (rectifier, power transmission, DC bus sensor) will ultimately create a directed edge of type "power transmission" between the "rectifier" node and the "DC bus" functional node (or the rectifier node itself, if the sensor belongs to it).
[0042] S204: The direction of a directed edge strictly follows the natural propagation direction of energy flow, heat flow, or signal flow in the physical world. That is, it points from the source component node that exerts the influence to the target component node that is affected.
[0043] For example, in the power transmission path, current flows from the rectifier to the DC bus, so the direction of the edge is "rectifier node → DC bus functional node"; in the heat conduction path, heat diffuses from the heat-generating IGBT module to the component responsible for heat dissipation, so the direction of the edge is "inverter (or IGBT) node → cooling fan node".
[0044] S205: For indirect relationships that physically require the transmission of influence through one or more intermediate components, reasonable abstraction and simplification are permitted in the graph model. The purpose is to control the complexity of the model while retaining the key functional dependencies, and to avoid introducing too many intermediate nodes that only play a transmission role and lack independent states.
[0045] For example, the temperature rise of the IGBT module in the inverter will be conducted through the heat sink, ultimately affecting the operation of the cooling fan (physical path: IGBT module → heat sink → fan).
[0046] In the graph model, a "heat conduction edge" can be directly established between the "inverter node" (reflecting the state including IGBT temperature) and the "cooling fan node". This edge represents the complete physical influence chain, enabling the model to capture the core relationship that "the inverter's heating state affects the fan's operation" without explicitly creating a separate node for the "heat sink".
[0047] This simplification is based on engineering judgment and focuses on the main interactions between functional components.
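Steps S201 to S205 can be illustrated with a minimal graph-building sketch. Several choices here are assumptions made only for the example: the DC bus is modeled as its own functional node and typed as a power conversion node, sensors are remapped to their owning component through a hypothetical `sensor_owner` table, and self-loops that result from this remapping are skipped.

```python
def build_graph(triples, node_types, sensor_owner):
    """Build a typed, directed graph from (source, relation, target) triples."""
    edges = set()
    for source, relation, target in triples:
        # Sensors are not nodes; map them to the component they monitor.
        src = sensor_owner.get(source, source)
        dst = sensor_owner.get(target, target)
        for name in (src, dst):
            if name not in node_types:
                raise KeyError(f"no node type defined for {name!r}")
        if src != dst:  # assumption: drop self-loops created by remapping
            edges.add((src, relation, dst))
    return {"nodes": dict(node_types), "edges": sorted(edges)}


def upstream(graph, node):
    """Nodes with a directed edge pointing at `node` (its potential influencers)."""
    return sorted({src for src, _, dst in graph["edges"] if dst == node})


node_types = {
    "rectifier": "power conversion",
    "DC bus": "power conversion",  # assumption: typed like its neighbors
    "inverter": "power conversion",
    "battery pack": "energy storage",
    "cooling fan": "heat dissipation",
}
sensor_owner = {"DC bus sensor": "DC bus"}
triples = [
    ("rectifier", "power transmission", "DC bus sensor"),
    # The heat sink is abstracted away, as in S205.
    ("inverter", "heat conduction", "cooling fan"),
]
graph = build_graph(triples, node_types, sensor_owner)
```

Edge direction follows the physical flow from S204, so `upstream(graph, "cooling fan")` returns the inverter as the component that thermally influences the fan.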
[0048] Step S3: Collect multi-dimensional operational status data in real time using sensors deployed at each node, specifically including the following sub-steps:
S301: Data acquisition is based on the graph model constructed in step S2.
[0049] For each functional component node (such as rectifier, inverter, battery pack, cooling fan) in the graph model, deploy or utilize existing high-precision industrial-grade sensors on the physical component it represents to monitor the critical operating status of that node.
[0050] These sensors correspond directly to the parameters that the nodes need to monitor, ensuring that the state information of each node in the graph model has a real and reliable source of physical data.
[0051] If there are a few independent "sensing nodes" in the graph model, then the corresponding sensing units should also be configured for them.
[0052] S302: Set the data sampling frequency of all sensors to 100 Hz (i.e., 100 samples per second, one sample every 10 ms). This frequency is a trade-off: it must capture common UPS fault characteristics (such as voltage ripple, current harmonics, and the thermal inertia of temperature changes) while staying within the processing capabilities of edge computing devices, so that the data volume does not become a burden.
[0053] S303: Through the aforementioned sensor network, multiple key operating parameters covering the three major health dimensions of the UPS system are collected in real time, mainly including power quality, thermal management, and signal integrity, aiming to provide comprehensive information for subsequent analysis.
[0054] The parameters collected mainly include the following 9 items: rectifier input voltage (AC), rectifier output DC bus voltage (DC), inverter output voltage (AC), battery pack terminal voltage (DC), battery pack charging and discharging current, inverter IGBT module temperature, cooling fan speed, RMS value of the DC bus ripple current, and total harmonic distortion (THD) of the input current.
[0055] S304: During data acquisition, a hardware timestamp synchronization mechanism (e.g., a precision clock source based on GPS or the IEEE 1588 protocol) is used to ensure that the data acquired by all sensor channels are aligned with the same microsecond-level time reference, thereby eliminating errors caused by asynchronous acquisition times.
[0056] S305: The raw analog signals collected by each sensor are converted into digital values by a 16-bit (or higher precision, such as 24-bit) analog-to-digital converter (ADC) and represented in floating-point form.
[0057] The converted data is organized according to its physical sensor channels and temporarily stored in a local cache (such as RAM or flash memory of the edge computing device). The synchronized data stored by channel provides raw materials for subsequent steps to reorganize and construct time series tensors according to the graph model node dimension.
[0058] Step S4: Perform time window segmentation and standardization on the multidimensional operating data to construct a multi-channel time series tensor. This includes the following sub-steps:
S401: Set a fixed time window length of 150 milliseconds.
[0059] Each time window corresponds to 15 consecutive sampling points, i.e., one sample every 10 ms (a sampling frequency of 100 Hz). The timestamp-aligned multidimensional operating status data stream collected in real time is segmented by this window length into either non-overlapping or overlapping (sliding) segments, yielding a series of consecutive data segments.
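The window segmentation can be sketched in a few lines. The window length of 15 samples comes from the description; the sliding step of 5 samples used in the overlapping case is an arbitrary illustrative value.

```python
def segment(samples, window=15, step=15):
    """Split a sample sequence into fixed-length windows.

    step == window gives non-overlapping windows; step < window gives
    overlapping (sliding) windows. Trailing samples that do not fill a
    complete window are dropped.
    """
    return [
        samples[start:start + window]
        for start in range(0, len(samples) - window + 1, step)
    ]


stream = list(range(45))  # stand-in for 45 timestamp-aligned samples
non_overlapping = segment(stream, window=15, step=15)
sliding = segment(stream, window=15, step=5)
```

With 45 samples, the non-overlapping mode yields 3 windows and the sliding mode with a step of 5 yields 7.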
[0060] S402: For each monitored operating parameter (such as DC bus voltage, IGBT temperature, etc.), maintain its historical statistics independently. The specific method is: use a sliding time window of 7 days to calculate the mean (μ) and standard deviation (σ) of the parameter within the window.
[0061] The statistics window is updated every hour, thus forming a dynamic benchmark that changes slowly over time and reflects the recent normal operating status of the equipment.
[0062] During the first 7 days after the system is initially started or the device is connected, historical data is insufficient. In this period, a preset typical value or a short window of initial data (such as the first hour) can be used to calculate temporary statistics. Once 7 days of data have accumulated, the system switches to the sliding-window mode described above.
[0063] S403: For each sampling point within each time window, the following standardization formula is applied based on its parameter type:
$$z = \frac{x - \mu}{\sigma}$$
[0064] where x is the current sample value, μ is the historical mean of the parameter, σ is the historical standard deviation of the parameter, and z is the standardized value.
[0065] This operation (also known as Z-score normalization) aims to eliminate numerical differences caused by different physical units (such as volts, amperes, and degrees Celsius) and transform all parameters to a common scale with a mean of 0 and a standard deviation of 1, making different parameters comparable.
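The Z-score normalization described above can be sketched as follows. The use of the population standard deviation, the small `eps` guard against a zero standard deviation, and the sample voltage values are assumptions for this illustration; the description does not specify them.

```python
import statistics


def fit_stats(history):
    """Mean and standard deviation of one parameter over its history window."""
    return statistics.fmean(history), statistics.pstdev(history)


def standardize(x, mu, sigma, eps=1e-8):
    """Z-score of a sample; eps avoids division by zero for constant signals."""
    return (x - mu) / max(sigma, eps)


# Hypothetical DC bus voltage history (volts) from the statistics window.
history = [538.0, 540.0, 542.0]
mu, sigma = fit_stats(history)
z = standardize(544.0, mu, sigma)
normalized_history = [standardize(v, mu, sigma) for v in history]
```

In a deployment following S402, `fit_stats` would be re-run each hour over the most recent 7 days of data for every monitored parameter.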
[0066] S404: Reorganize and aggregate the standardized data based on the graph model nodes constructed in step S2. The specific method is as follows: Based on the physical functional component to which each sensor belongs, the time-series data it collects is merged into the corresponding functional component node.
[0067] For each node in the graph model, a two-dimensional matrix is created to represent the state evolution of that node within the current time window: the rows of the matrix correspond to the 15 time steps (i.e., 15 consecutive sampling points) within the time window; the columns of the matrix correspond to all the monitoring parameters belonging to that node, with each parameter constituting a feature dimension. For example, for a "cooling fan" node, its feature dimension may only be "speed"; while for a "rectifier" node, its feature dimension may include multiple dimensions such as "input voltage" and "output DC bus voltage".
[0068] S405: Stack the two-dimensional matrices corresponding to all nodes according to a predefined fixed node order (e.g., sorted by node type or node ID) to form a three-dimensional temporal tensor.
[0069] The three dimensions of this three-dimensional temporal tensor are: the number of nodes (the total number of nodes N in the graph model), the number of time steps (fixed at T=15), and the feature dimension (the number of parameters corresponding to each node, denoted as F, which may differ between nodes).
[0070] This three-dimensional temporal tensor is the "multi-channel temporal tensor" described in this scheme, and its dimensions can be represented as (N,T,F).
[0071] This tensor encapsulates multi-node information (node dimension), time-series dynamics (time-step dimension), and multi-parameter features of each node (feature dimension) in a manner aligned with the nodes of the graph model, providing structured input for subsequent graph neural network processing.
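Steps S404 and S405 can be sketched with nested lists. Because different nodes may monitor different numbers of parameters, this sketch zero-pads every node to the largest feature count so the result has a regular (N, T, F) shape; that padding rule is an assumption of the example, not something stated in the description.

```python
def build_tensor(node_windows, node_order):
    """Stack per-node (T, F_node) matrices into one (N, T, F) nested list.

    Nodes with fewer features are zero-padded up to the largest feature count.
    """
    f_max = max(len(node_windows[name][0]) for name in node_order)
    return [
        [list(row) + [0.0] * (f_max - len(row)) for row in node_windows[name]]
        for name in node_order
    ]


T = 15
node_windows = {
    # Rectifier: input voltage and output DC bus voltage (standardized).
    "rectifier": [[0.1, -0.2] for _ in range(T)],
    # Cooling fan: speed only.
    "cooling fan": [[0.5] for _ in range(T)],
}
node_order = ["rectifier", "cooling fan"]  # fixed, predefined node order
tensor = build_tensor(node_windows, node_order)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
```

The fixed `node_order` matters because the graph network relies on each tensor row always referring to the same node.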
[0072] Step S5: Generate node state embedding vectors based on a multi-scale temporal feature encoder. This includes the following sub-steps:
S501: Input the three-dimensional temporal tensor generated in step S4 into the multi-scale temporal feature encoder.
[0073] The tensor has dimensions (number of nodes N, time steps T=15, feature dimension F) and corresponds to the state sequence of all nodes within a 150-millisecond time window.
[0074] S502: The encoder core consists of three parallel one-dimensional convolutional neural network branches.
[0075] Each branch is independently configured with convolution kernels of different lengths to specifically capture dynamic patterns in UPS operating data at different time scales, which align with the equipment failure mechanism. Specifically: High-frequency branch: The convolution kernel length is 3 (receptive field 30 ms) and the stride is 1. It is specifically designed to extract millisecond-level high-frequency transient features caused by the rapid switching of power devices, such as voltage spikes or current glitches.
[0076] Mid-frequency branch: The convolution kernel length is 7 (receptive field 70 ms) and the stride is 1, which is used to capture the mid-term periodic fluctuation characteristics of tens of milliseconds caused by load step changes or power grid fluctuations.
[0077] Low-frequency branch: The convolution kernel length is 15 (receptive field 150 ms, covering the entire window), and the stride is 1. It is used to model trend drift features on time scales of seconds or even longer, such as slow battery capacity decay and capacitor aging.
[0078] S503: For each node (N in total) in the input tensor, the encoder performs the following process independently and in parallel: The two-dimensional temporal feature matrix (of shape (T,F)) corresponding to the node is simultaneously input into the three branches mentioned above. Within each branch, one-dimensional convolution operations are performed along the time dimension (T), and the number of channels of the convolution kernel is automatically matched with the input feature dimension (F). After each convolutional layer, batch normalization is performed in sequence (to stabilize the training process and accelerate convergence) and ReLU activation function processing is performed (to introduce nonlinear transformation).
[0079] After this step, each branch outputs a new two-dimensional feature map for that node. Its time dimension may be shortened due to the convolution operation (depending on whether padding is used), while the feature dimension (number of channels) becomes the number of output channels defined by the convolutional layer of that branch.
[0080] S504: After processing by S503, each branch outputs a two-dimensional feature map, whose dimensions can be represented as (T', C), where T' is the number of time steps remaining after the convolution operation (depending on the convolution kernel size and padding strategy), and C is the number of feature channels defined by the convolutional layer of that branch.
[0081] Next, global average pooling is performed on the feature map along the time dimension (i.e., the T' dimension), averaging the activation values at all positions of the feature map in the time dimension, and finally obtaining a one-dimensional feature vector of length C. This vector represents the comprehensive statistical features of the specific time-scale pattern captured by the branch within the entire time window, and exhibits a certain degree of stability to small translations of the features on the time axis.
[0082] S505: Take the feature vectors obtained by pooling the high-frequency, mid-frequency, and low-frequency branches. Their lengths are $C_H$, $C_M$, and $C_L$ respectively (the number of output channels of each branch).
[0083] These three feature vectors are then concatenated along the feature dimension (i.e., end to end) to form a longer joint feature vector with a total dimension of $C_H + C_M + C_L$. The concatenated vector fuses key dynamic information across different time scales, from millisecond-level transients to second-level trends.
[0084] S506: Input the joint feature vector obtained by the above concatenation into a fully connected layer (also known as a linear layer). The role of the fully connected layer is to map the fused multi-scale features into a unified and more expressive high-dimensional space.
[0085] In this embodiment, the dimension of the space is set to 128, which is a commonly used design that can balance feature representation capability and model parameter quantity in practice. In addition, this dimension can be adjusted according to the complexity and computing resources of the specific UPS system.
[0086] Through the transformation of this fully connected layer, a 128-dimensional state embedding vector is finally generated for the node. For each node in the input tensor, the entire process from S503 to S506 is executed independently and in parallel.
[0087] Ultimately, each node obtains its own 128-dimensional vector, which comprehensively and concisely reflects the dynamic behavioral characteristics of the node's own operating state in the current 150-millisecond time window across multiple time dimensions.
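The per-node flow of S503 to S506 can be sketched in plain NumPy. This is an illustrative sketch under stated assumptions: no padding is used (so T' = T - K + 1), every branch has 8 output channels, batch normalization is omitted for brevity, and all weights are random rather than trained. None of these choices is fixed by the description.

```python
import numpy as np


def conv1d_valid(x, w, b):
    """1-D convolution along time, stride 1, no padding.

    x: (T, F) input, w: (K, F, C) kernel, b: (C,) bias -> (T - K + 1, C).
    """
    k = w.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, w.shape[2]))
    for t in range(steps):
        # Contract the (K, F) window against the first two kernel axes.
        out[t] = np.tensordot(x[t : t + k], w, axes=([0, 1], [0, 1])) + b
    return out


def encode_node(x, branches, w_fc, b_fc):
    """Three parallel branches -> ReLU -> global average pool -> concat -> linear."""
    pooled = []
    for w, b in branches:
        fmap = np.maximum(conv1d_valid(x, w, b), 0.0)  # ReLU
        pooled.append(fmap.mean(axis=0))               # average over T'
    joint = np.concatenate(pooled)                     # length C_H + C_M + C_L
    return joint @ w_fc + b_fc                         # fully connected layer


rng = np.random.default_rng(0)
T, F, C, D = 15, 4, 8, 128  # F and C are assumed values for this demo
branches = [
    (rng.normal(scale=0.1, size=(k, F, C)), np.zeros(C))
    for k in (3, 7, 15)     # high-, mid- and low-frequency kernel lengths
]
w_fc = rng.normal(scale=0.1, size=(3 * C, D))
b_fc = np.zeros(D)

x = rng.normal(size=(T, F))  # one node's (T, F) matrix
embedding = encode_node(x, branches, w_fc, b_fc)
print(embedding.shape)  # (128,)
```

Without padding, the kernel of length 15 produces a single time position, so global average pooling is a no-op for the low-frequency branch.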
[0088] Step S6: Perform context-aware information aggregation using an adaptive graph attention network. This includes the following sub-steps: S601: Input the 128-dimensional state embedding vector set generated for all nodes in step S5, the type identifier of each node (such as "power conversion node", "energy storage node", etc.), and the heterogeneous graph model constructed in step S2 (containing all nodes, directed edges and edge type identifiers "power transmission edge", "heat conduction edge", "control signal edge") into the adaptive graph attention network.
[0089] S602: Initialize a learnable 16-dimensional vector for each of the three predefined physical relation edge types (power transmission, heat conduction, and control signal) as the semantic embedding for that edge type. Let the type embedding of the edge pointing from node i to node j be denoted as $r_{ij}$.
[0090] These embedding vectors will learn the differences in information transmission between different physical relationships during model training.
[0091] S603: The core of the network is an improved attention weight calculation unit.
[0092] For any directed edge in the graph model pointing from source node i to target node j, calculate its attention coefficient $e_{ij}$, which quantifies the importance of node i to node j. The calculation formula, in the form implied by the definitions below, is:
$$e_{ij} = a^{\top}\,\mathrm{concat}\big(W_{\tau(i)} h_i,\; W_{\tau(j)} h_j,\; W_e r_{ij}\big)$$
where $h_i$ and $h_j$ are the 128-dimensional state embedding vectors of the source node i and the target node j, respectively.
[0093] $\tau(i)$ and $\tau(j)$ are the type identifiers of nodes i and j (e.g., "power conversion node").
[0094] $W_{\tau(i)}$ and $W_{\tau(j)}$ are learnable weight matrices whose key design feature is type-specific transformation: based on the type $\tau(i)$ of node i, the corresponding matrix $W_{\tau(i)}$ is selected from a set of weight matrices prepared for each node type and applied to $h_i$ as a linear transformation; similarly, based on the type $\tau(j)$ of node j, $W_{\tau(j)}$ is selected to transform $h_j$. Features of different node types are thus mapped through different parameter spaces, which adapts them better to their physical roles.
[0095] $W_e$ is a shared, learnable weight matrix that transforms the type embedding $r_{ij}$ of the edge from node i to node j (see S602); the same matrix is used for all edge types.
[0096] $a$ is a shared, learnable attention vector used to map the concatenated high-dimensional features to a scalar attention score.
[0097] concat(·) represents the vector concatenation operation.
[0098] S604: With this design, the computation of $e_{ij}$ depends on the features of the source node, the features of the target node, and the physical semantics (type) of the connecting edge. For example, when the system is under conditions of rising temperature, the coefficient $e_{ij}$ of the "heat conduction edge" connecting the cooling fan node i and the inverter node j may be learned during training to be larger, which suggests that the weight of the fan status in the inverter health assessment increases accordingly in this scenario.
[0099] S605: For the target node j, calculate the attention coefficients of all its incoming edges (i.e., all edges pointing to j). The softmax function is then used to normalize these coefficients to obtain the normalized attention weight of each edge, $\alpha_{ij} = \exp(e_{ij}) / \sum_{k \in \mathcal{N}(j)} \exp(e_{kj})$ (satisfying $\sum_{i \in \mathcal{N}(j)} \alpha_{ij} = 1$).
[0100] The updated feature representation $h_j'$ of node j is then obtained by weighted summation of the transformed features of all its incoming neighbor nodes i:
$$h_j' = \sum_{i \in \mathcal{N}(j)} \alpha_{ij}\, W_{\tau(i)} h_i$$
where $\mathcal{N}(j)$ represents the set of all incoming neighbor nodes of node j. At this point, the representation of node j has incorporated the context information of its direct physical neighbors.
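The attention computation of S603 to S605 can be sketched for a single target node as below. The sketch is a reading of the prose definitions, not a confirmed reproduction of the patent's formula: the score is taken as the attention vector applied to the concatenation of the type-transformed source features, the type-transformed target features, and the transformed edge-type embedding, with no extra nonlinearity before the softmax. The node names, matrix shapes, random weights, and the handling of nodes without incoming edges are assumptions.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()


def aggregate_node(j, h, node_type, edges, W_node, W_edge, r, a):
    """Update node j from its incoming neighbours.

    h: {node: embedding}; edges: list of (source, target, edge_type).
    Returns the updated vector and the normalized weight of each incoming edge,
    keyed by (source, edge_type).
    """
    incoming = [(i, et) for (i, dst, et) in edges if dst == j]
    if not incoming:
        # Assumption: a node without incoming edges keeps its own embedding.
        return h[j], {}
    z_j = W_node[node_type[j]] @ h[j]                      # type-specific transform
    z_src = [W_node[node_type[i]] @ h[i] for i, _ in incoming]
    scores = np.array([
        a @ np.concatenate([z_i, z_j, W_edge @ r[et]])     # scalar score per edge
        for z_i, (_, et) in zip(z_src, incoming)
    ])
    alpha = softmax(scores)                                # normalize over incoming edges
    h_new = sum(w * z_i for w, z_i in zip(alpha, z_src))   # weighted sum of neighbours
    return h_new, dict(zip(incoming, alpha))


rng = np.random.default_rng(0)
D, E = 128, 16  # node embedding and edge-type embedding sizes from the description
node_type = {
    "cooling_fan": "heat_dissipation",
    "dc_bus": "power_conversion",
    "inverter": "power_conversion",
}
edges = [
    ("dc_bus", "inverter", "power_transmission"),
    ("cooling_fan", "inverter", "heat_conduction"),
]
h = {n: rng.normal(size=D) for n in node_type}
W_node = {t: rng.normal(scale=0.05, size=(D, D)) for t in sorted(set(node_type.values()))}
r = {et: rng.normal(size=E) for et in ("power_transmission", "heat_conduction", "control_signal")}
W_edge = rng.normal(scale=0.05, size=(E, E))
a = rng.normal(scale=0.05, size=2 * D + E)

h_new, alpha = aggregate_node("inverter", h, node_type, edges, W_node, W_edge, r, a)
print(h_new.shape, round(sum(alpha.values()), 6))
```

Keying the weights by (source, edge type) keeps two edges of different types between the same pair of nodes distinct, which a heterogeneous graph allows.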
[0101] Step S7: Achieve global context awareness through multi-round graph attention aggregation. This includes the following sub-steps: S701: In this embodiment, based on an analysis of the length of critical coupling paths within a typical UPS system (e.g., the path from the battery pack to the inverter output typically passes through 2-3 intermediate components), the number of aggregation layers of the adaptive graph attention network is set to 3. That is, the information aggregation and node representation update process described in step S6 is performed three times, so that information can cover most of the important physical association paths within the system.
[0102] S702: Perform the first round of aggregation (1-hop neighborhood fusion).
[0103] The node representation generated in step S6 (i.e., the representation of each node after fusing its own temporal features with information from its direct neighbors) is used as input and fed into the first layer of the adaptive graph attention network for forward propagation.
[0104] After this layer of processing, each node generates a new, updated representation vector. At this point, each node's representation already includes the state information of its "one-hop" neighborhood (i.e., all components directly physically connected to it). For example, the representation of an inverter node now incorporates information from nodes directly connected to it, such as the DC bus voltage and IGBT temperature sensors.
[0105] S703: Perform the second round of aggregation (extending to 2-hop neighborhood).
[0106] The updated node representations from the first-layer aggregation output are used as input to the second-layer adaptive graph attention network (which has the same structure as the first layer but with independent parameters). In this layer, information can be passed twice through the edges of the graph.
[0107] Therefore, after the second layer of aggregation, the representation of each node can perceive the state influence of components within its "2-hop" neighborhood. For example, the inverter node can indirectly incorporate the state information from the "battery pack" node (second-hop neighbor) through the intermediate node "DC bus" (first-hop neighbor) (propagation path: inverter ← DC bus ← battery pack).
[0108] S704: Perform the third round of aggregation (extending to 3-hop neighborhood).
[0109] The node representations from the second round of aggregation are input into the third layer of the adaptive graph attention network for forward propagation.
[0110] After this processing, the information propagation distance in the graph reaches 3 hops, enabling the representation of any node to fully absorb the state information of remote components that are not directly connected in physical topology but still have strong functional coupling through one or two intermediate components.
[0111] For example, the abnormal discharge characteristics of the battery pack can significantly affect the characterization of the inverter node after the third-level aggregation through the path of "battery pack → DC bus → rectifier → inverter"; similarly, the performance changes of the cooling fan can be more completely mapped to the characterization of the heat source (such as IGBT) node.
[0112] Therefore, the model has been able to capture the synergistic relationships of most critical cross-component energy, heat, and signal flows within a UPS system.
[0113] S705: After the above three layers of sequential graph attention aggregation, each node finally outputs a 128-dimensional vector, which is defined as "globally context-aware node state representation".
[0114] The “global” nature of this representation is reflected in the fact that it not only encapsulates the dynamic behavior of the node itself over multiple time scales (from step S5), but more importantly, through the multi-hop information transmission of the graph structure, it deeply integrates the operating status of numerous components that are directly (1 hop), indirectly (2 hops), or even further (3 hops) functionally related to the node in the physical topology.
[0115] At this point, the representation of each node is no longer isolated, but becomes a "health status microcosm" reflecting its collaborative working context in the entire UPS system. This representation, rich in global topological correlation information, provides a key information foundation for achieving highly sensitive anomaly detection and clear anomaly propagation path tracing in subsequent steps.
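The way the reachable neighborhood grows with each aggregation round can be checked with a small graph traversal. The topology below is a hypothetical toy example (not the patent's actual UPS graph); the function only counts which upstream nodes can influence a target after k rounds and does not run any neural network.

```python
from collections import defaultdict


def receptive_field(edges, target, num_layers):
    """Return, for each round 1..num_layers, the set of nodes whose state
    can have reached `target` through directed edges (target included)."""
    incoming = defaultdict(set)
    for src, dst in edges:
        incoming[dst].add(src)
    reached = {target}
    frontier = {target}
    fields = []
    for _ in range(num_layers):
        # One more hop upstream from the nodes added in the previous round.
        frontier = {s for n in frontier for s in incoming[n]} - reached
        reached |= frontier
        fields.append(set(reached))
    return fields


# Hypothetical directed power-flow edges (source, target).
edges = [
    ("mains_input", "rectifier"),
    ("rectifier", "dc_bus"),
    ("battery_pack", "dc_bus"),
    ("dc_bus", "inverter"),
]
for hops, field in enumerate(receptive_field(edges, "inverter", 3), start=1):
    print(hops, sorted(field))
```

In this toy graph the battery pack enters the inverter's receptive field in the second round, matching the "inverter ← DC bus ← battery pack" example above.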
[0116] Step S8: Anomaly scoring and determination based on graph autoencoder. The graph autoencoder learns the latent patterns during normal system operation, and anomalies are identified by comparing the differences between the input and the reconstructed data. This includes the following sub-steps: S801: The core of this module is a graph autoencoder, which aims to learn from a large amount of normal data the "normal" pattern of the state representation of each node in a healthy UPS system and the topological relationships between them.
[0117] The autoencoder consists of two parts: an encoder and a decoder.
[0118] S802: The network structure and parameters of the encoder are exactly the same as the "adaptive graph attention network" with 3-layer aggregation described in steps S6 to S7. Its input is the global context-aware node representation output in step S7, and its output is a compressed, low-dimensional hidden layer representation that contains the global features of the entire graph (i.e. the entire UPS system) under normal conditions.
[0119] S803: The decoder is responsible for reconstructing the original node state embedding vector from the hidden layer representation output by the encoder. To achieve this, the decoder employs a graph attention mechanism whose information propagation runs in the opposite direction to that of the encoder. Specifically: The decoder is also a stack of graph attention layers, with the same number of layers as the encoder (3 in this example). Each decoder layer operates like an encoder layer, except that information flows from the node representation of the current layer back to its upstream source nodes (i.e., against the directed edges of the original graph, toward the neighboring nodes that point to the node) for attention aggregation and feature update. Through this reverse propagation, the decoder attempts to gradually recover the local details of each node, starting from the hidden layer representation that contains global information.
[0120] The final output of the decoder is a reconstruction estimate of the original 128-dimensional state embedding vector for each node.
[0121] S804: The training of the model depends on historical normal operation data. When preparing training data, a large number of node representation sequence samples are collected from the historical normal operation cycle and processed according to steps S1-S7. The samples are regarded as "normal" samples.
[0122] The training objective is to optimize all learnable parameters of the encoder and decoder so that the autoencoder can reconstruct these normal samples as accurately as possible. Specifically, a strategy of minimizing reconstruction error is adopted, and the loss function L is defined as the mean of the reconstruction errors of all nodes:
$$L = \frac{1}{N}\sum_{i=1}^{N} \lVert h_i - \hat{h}_i \rVert_2$$
[0123] where N is the total number of nodes in the graph, $h_i$ is the original 128-dimensional state embedding vector of node i (from step S5), $\hat{h}_i$ is the vector of the corresponding node reconstructed by the decoder, and $\lVert \cdot \rVert_2$ denotes the L2 norm (Euclidean distance) of a vector.
[0124] By minimizing this loss function through optimization algorithms (such as Adam), the model learns to capture the data distribution under normal conditions.
[0125] S805: During the online monitoring (inference) phase, for the current time window, the "global context-aware node state representation" obtained from steps S1-S7 is input into the trained graph autoencoder, and the autoencoder decoder outputs the reconstruction result of the state vector of each node.
[0126] For each node in the graph, calculate the Euclidean distance between its reconstructed vector and the original 128-dimensional state embedding vector of the node (i.e., the output of step S5). This distance is defined as the anomaly score of the node at the current time. The higher the score, the greater the difference between the actual behavior pattern of the node and the "normal" pattern learned by the model.
[0127] S806: To convert anomaly scores into binary anomaly determinations, a threshold is preset for each node. This threshold is determined from the historical normal data used during model training, as follows: Using all the normal samples collected during the training phase, calculate for each node its anomaly score on every normal sample to form a score set.
[0128] Sort this set in ascending order and take the score value at the 99.9th percentile as the anomaly detection threshold for that node. For example, if there are 100,000 normal samples, take the 99,900th score value (sorted from low to high). This method aims to control the theoretical false alarm rate to about 0.1%.
[0129] S807: During online inference, the system calculates the anomaly score of each node in real time and compares the real-time anomaly score of each node with its corresponding preset threshold (from S806).
[0130] If the anomaly score of a node exceeds the threshold set for it, it is determined that the physical component corresponding to that node or its closely related area is in an abnormal state.
[0131] The system records information about all nodes that are identified as abnormal (such as node ID, abnormal score, timestamp, etc.) to trigger subsequent root cause analysis processes.
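The scoring and thresholding of S805 to S807 can be sketched as follows. The graph autoencoder itself is not implemented here: the "reconstructions" are simulated by adding noise to random embeddings, which is purely an assumption to make the sketch runnable. Note also that `np.percentile` interpolates linearly by default, whereas the description picks a specific sorted value; for large sample counts the two are nearly identical.

```python
import numpy as np


def anomaly_scores(h, h_rec):
    """Per-node Euclidean distance between original and reconstructed embeddings."""
    return np.linalg.norm(h - h_rec, axis=-1)


def fit_thresholds(normal_scores, percentile=99.9):
    """normal_scores: (num_samples, N) scores on normal data -> (N,) thresholds."""
    return np.percentile(normal_scores, percentile, axis=0)


rng = np.random.default_rng(0)
N, D, S = 5, 128, 2000  # nodes, embedding size, number of normal samples (demo values)

# Stand-in for the trained autoencoder on normal data: small reconstruction noise.
h_train = rng.normal(size=(S, N, D))
rec_train = h_train + rng.normal(scale=0.05, size=(S, N, D))
thresholds = fit_thresholds(anomaly_scores(h_train, rec_train))

# Online window: node 2 is reconstructed poorly, simulating abnormal behaviour.
h_now = rng.normal(size=(N, D))
rec_now = h_now + rng.normal(scale=0.05, size=(N, D))
rec_now[2] = h_now[2] + rng.normal(scale=0.5, size=D)

scores = anomaly_scores(h_now, rec_now)
flagged = scores > thresholds  # per-node binary determination
print(np.flatnonzero(flagged))
```

Because each node has its own threshold, a component whose embedding is naturally harder to reconstruct does not raise the false alarm rate of the others.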
[0132] Step S9: Trace the anomaly propagation path and generate an interpretable diagnostic report. This includes the following sub-steps: S901: This process is triggered when step S8 determines that the anomaly score of at least one node exceeds its preset threshold. The system first selects the node with the highest anomaly score from all abnormal nodes and marks it as the "target node" for this diagnosis.
[0133] S902: Extract the attention weights $\alpha_{ij}$ corresponding to all incoming edges of the target node from the last layer of the graph attention network in step S7.
[0134] Based on the characteristics of the graph attention mechanism, after the model has been fully trained, these weights can reflect the contribution of the features of each upstream neighbor node to the final representation of the target node in the current (potentially abnormal) state of the system, providing data-driven clues for locating potential sources of influence.
[0135] Sort these upstream neighbor nodes in descending order according to their corresponding attention weight values.
[0136] It should be noted that not all upstream nodes need to be included in the traceability. A contribution threshold is set (for example, the cumulative contribution reaches 80% or a single weight is greater than 10% of the total incoming edge weight). Upstream nodes with significantly higher weights are selected from the sorted list to form a "key upstream node set" in order to focus on the main influencing factors and eliminate noise interference.
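One way to read the selection rule in S902 is sketched below. The description offers two example criteria joined by "or"; this sketch assumes a node is kept if it is needed to reach the cumulative share, or if its individual share exceeds the single-weight limit. The node names and weights are hypothetical.

```python
def key_upstream_nodes(weights, cumulative=0.80, single=0.10):
    """Select high-weight upstream nodes from {node: attention_weight}.

    A node is kept while the cumulative share is still below `cumulative`,
    or when its own share of the total incoming weight exceeds `single`.
    Combining the two criteria this way is an assumption of this sketch.
    """
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for node, w in ranked:
        share = w / total
        if running >= cumulative and share <= single:
            break  # remaining nodes are smaller still, since the list is sorted
        selected.append(node)
        running += share
    return selected


# Hypothetical attention weights on the incoming edges of the target node.
weights = {
    "dc_bus": 0.55,
    "rectifier": 0.25,
    "cooling_fan": 0.12,
    "controller": 0.05,
    "ambient_sensor": 0.03,
}
print(key_upstream_nodes(weights))  # ['dc_bus', 'rectifier', 'cooling_fan']
```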
[0137] S903: Select nodes from the set of key upstream nodes for tracing, usually prioritizing the node with the highest weight in the set as the first tracing starting point.
[0138] Using this node as the current backtracking point, the edge between it and the target node is used as the initial path segment to start reverse tracing; at the same time, the system reserves the possibility of performing parallel or alternative tracing on other high-weight nodes to deal with the situation of multi-source failures.
[0139] S904: Starting from the current backtracking point, recursively trace back to its upstream nodes along its incoming edges. This process continues until any of the following stopping conditions is met: (1) the trace reaches a system-defined boundary node, such as a node without an upstream power source (e.g., a mains input port) or a node without an upstream heat source (e.g., an ambient temperature reference point); (2) the trace reaches a node that has already appeared in the current path (to prevent loops); (3) the tracing depth reaches the preset upper limit (e.g., 5 hops), to avoid excessive tracing in complex relationships; (4) the attention weight of the edge to be traced is lower than the set minimum threshold (e.g., 0.05), indicating that its influence is very weak and can be ignored.
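The recursive tracing with its four stopping conditions can be sketched as a depth-first walk. The graph, weights, and the choice to start the walk at the target node are illustrative assumptions; the description also does not say which layer's attention weights apply to edges further upstream, so a single weight per edge is assumed here.

```python
def trace_back(start, incoming, boundary, max_depth=5, min_weight=0.05):
    """Enumerate upstream paths from `start`.

    incoming: {node: [(upstream_node, attention_weight), ...]}
    boundary: set of nodes at which tracing stops (condition 1).
    Paths are listed from the start node toward the upstream source.
    """
    paths = []

    def walk(node, path):
        extended = False
        # Condition 1 (boundary node) and condition 3 (depth limit, in hops).
        if node not in boundary and len(path) - 1 < max_depth:
            for upstream, weight in incoming.get(node, []):
                # Condition 2 (loop) and condition 4 (weak edge).
                if upstream in path or weight < min_weight:
                    continue
                extended = True
                walk(upstream, path + [upstream])
        if not extended:
            paths.append(path)

    walk(start, [start])
    return paths


# Hypothetical incoming edges with attention weights.
incoming = {
    "inverter": [("dc_bus", 0.70), ("cooling_fan", 0.03)],
    "dc_bus": [("rectifier", 0.40), ("battery_pack", 0.60)],
    "rectifier": [("mains_input", 0.90)],
    "battery_pack": [("dc_bus", 0.20)],  # would form a loop
}
paths = trace_back("inverter", incoming, boundary={"mains_input"})
for p in paths:
    print(" <- ".join(p))
```

Reversing each returned path gives the propagation direction used in the diagnostic report ("component A → component B → component C").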
[0140] S905: While tracing back, the system checks, for each node visited on the path, whether the key status parameters of that node have deviated significantly. Key status parameters are the most critical monitoring parameters for that node type and are directly related to the physical relationship (e.g., power, heat, control) of the current tracing path. For example, for a "power conversion node," key parameters usually include voltage and current; for a "heat dissipation node," they are usually rotational speed and temperature. The specific verification method is as follows: (1) Calculate the deviation factor: For each key parameter of the node, calculate its deviation factor $D = |x - \mu| / \sigma$, where x represents the current measurement value, μ represents the historical mean of the parameter, and σ represents the historical standard deviation of the parameter.
[0141] (2) Determine significant deviation: In this embodiment, if D > 2 (i.e., the deviation exceeds 2 standard deviations), the parameter is considered to deviate significantly at the current node. The threshold of 2 can be adjusted according to the trade-off between sensitivity and false alarm rate in the specific application scenario.
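A small worked check of the deviation test, assuming the deviation factor is the absolute distance from the historical mean divided by the historical standard deviation (which agrees with the "exceeds 2 times the standard deviation" wording). The input values are the ripple-current figures quoted in the example report of this description.

```python
def deviation_factor(x, mean, std):
    """Number of historical standard deviations between x and the historical mean."""
    if std <= 0:
        raise ValueError("historical standard deviation must be positive")
    return abs(x - mean) / std


def is_significant(x, mean, std, threshold=2.0):
    """True if the parameter deviates by more than `threshold` standard deviations."""
    return deviation_factor(x, mean, std) > threshold


# Measured 1.8 A, historical mean 0.5 A, standard deviation 0.1 A.
d = deviation_factor(1.8, 0.5, 0.1)
print(round(d, 1), is_significant(1.8, 0.5, 0.1))  # 13.0 True
```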
[0142] S906: For each node on the path that shows a significant deviation, check whether the direction of its parameter deviation conforms to the conduction logic of energy flow, heat flow, or signal flow on that physical path. The verification logic needs to be specifically formulated according to the edge type (power transmission, heat conduction, control signal) defined in step S1. For example: Power transmission path: If the voltage at the downstream node drops abnormally, the current at the upstream node should tend to increase.
[0143] Heat conduction path: If the temperature of downstream nodes (such as heat sinks) rises, the temperature of upstream heat-generating nodes (such as IGBTs) should have risen as well, or as a control response, the fan speed should have increased.
[0144] Control signal path: If the downstream node (such as the actuator) operates abnormally, the output signal of the upstream control node (such as the controller) should change accordingly.
[0145] A path is retained and considered a reasonable suspected fault propagation path only if the deviation directions of all nodes on the path are consistent with the corresponding physical causal logic; otherwise, the path is eliminated.
[0146] S907: Based on all verified traceability paths, the system generates a final diagnostic report. The report contains the following three parts of structured information: (1) List of suspected faulty components: From all the key upstream nodes, list the upstream nodes (components) that contribute most to the target node's abnormality. Usually, either the top 3 nodes or the nodes whose cumulative contribution exceeds a certain percentage (e.g., 80%) are selected. They are sorted by contribution from high to low, and the percentage is noted for each.
[0147] The contribution is a quantitative index determined by jointly considering the attention weight from step S902 and the deviation factor D calculated in step S905; for example, it can be a weighted product of the two. The core idea is to reflect both the strength of the topological influence and the degree of the node's own anomaly.
[0148] (2) Abnormal propagation path: In the form of “component A→component B→component C”, select the path with the most concentrated contribution from all the paths that have passed the verification, and clearly present the most likely abnormal propagation path.
[0149] (3) Key parameter details: For each suspected component, list in detail the current measured value, historical mean, historical standard deviation and calculated deviation multiple of the key state parameters that show significant deviation.
[0150] Example report: "Suspected faulty components: DC bus capacitor (contribution 68%), rectifier (22%), battery pack (10%); Abnormal propagation path: battery pack → DC bus → rectifier; Key parameter details: DC bus capacitor node - RMS ripple current: measured 1.8 amps, historical average 0.5 amps, standard deviation 0.1 amps, deviation multiple 13.0."
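The contribution ranking in part (1) of the report can be sketched as below, taking the plain product of attention weight and deviation factor as the "weighted product" mentioned in the description. The product form, the normalization to percentages, and the input numbers are all assumptions for illustration; they are not the values behind the example report above.

```python
def rank_contributions(attention, deviation, top_k=3):
    """Rank components by attention_weight * deviation_factor, as percentages.

    attention: {component: attention weight}
    deviation: {component: deviation factor D}
    """
    raw = {name: attention[name] * deviation[name] for name in attention}
    total = sum(raw.values())
    ranked = sorted(raw.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [(name, 100.0 * value / total) for name, value in ranked]


# Hypothetical inputs.
attention = {"dc_bus_capacitor": 0.5, "rectifier": 0.3, "battery_pack": 0.2}
deviation = {"dc_bus_capacitor": 13.0, "rectifier": 4.0, "battery_pack": 3.0}

ranking = rank_contributions(attention, deviation)
report = "Suspected faulty components: " + ", ".join(
    f"{name} ({share:.0f}%)" for name, share in ranking
)
print(report)
```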
[0151] Embodiment 2: Based on the method described in Embodiment 1, this embodiment provides a UPS operation status data analysis system. The system is implemented by an integrated software and hardware platform, aiming to translate the algorithm flow of Embodiment 1 into a real-time running solution.
[0152] The hardware platform mainly includes edge computing units that connect to the UPS equipment in the field, and embedded AI acceleration chips (such as modules equipped with NPUs or GPUs) for accelerating neural network inference. The edge computing units are responsible for general logic control, data scheduling, and communication, while the AI acceleration chips are dedicated to performing computationally intensive neural network forward propagation.
[0153] At the software level, the system functionality is designed as multiple collaborative modules, with each module interacting with data through clearly defined interfaces: The physical topology parsing module is responsible for reading the structured topology database corresponding to a specific UPS equipment model. This database is built based on the equipment technical data and stores information such as component entities, connection relationships and functional dependency triples as described in step S1 of Example 1 in a machine-readable form.
[0154] The dynamic heterogeneous graph construction module runs in the memory of the edge computing unit. It receives the output of the topology parsing module and, based on the triplet relationships defined therein, dynamically instantiates the corresponding heterogeneous graph objects using graph data structures (such as adjacency lists) to complete node creation, type allocation, and directed edge establishment as described in step S2 of Example 1.
[0155] The multi-source data acquisition module interacts with various sensors inside or outside the UPS equipment through standard industrial communication protocols (such as Modbus TCP) to realize timed polling or subscription of data, and is responsible for encapsulating protocol details and uniformly reading the original values of various operating parameters listed in step S3 of Example 1.
[0156] The time series data preprocessing module, deployed in the edge computing unit, receives the collected raw data stream and operates strictly according to the process defined in step S4 of Example 1, including data alignment based on hardware timestamps, segmentation with a fixed time window (150 milliseconds), Z-score normalization based on a sliding historical window (7 days), and finally organizes and outputs a three-dimensional time series tensor with dimensions of (number of nodes N, number of time steps T=15, feature dimension F).
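The windowing and standardization performed by this module can be sketched for one node. A window of 150 milliseconds with T=15 steps implies one sample every 10 milliseconds. Non-overlapping windows, the small epsilon in the denominator, and the two example parameters are assumptions of this sketch; the 7-day sliding history is represented simply by precomputed mean and standard deviation arrays.

```python
import numpy as np


def make_windows(samples, steps=15):
    """Cut a (num_samples, F) stream into non-overlapping (steps, F) windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    usable = (samples.shape[0] // steps) * steps
    return samples[:usable].reshape(-1, steps, samples.shape[1])


def zscore(windows, hist_mean, hist_std, eps=1e-8):
    """Standardize each parameter with statistics from the historical window."""
    return (windows - hist_mean) / (hist_std + eps)


rng = np.random.default_rng(0)
# Hypothetical parameters of one node: DC bus voltage (V) and temperature (deg C).
history = rng.normal(loc=[400.0, 25.0], scale=[5.0, 1.0], size=(1000, 2))
stream = rng.normal(loc=[400.0, 25.0], scale=[5.0, 1.0], size=(100, 2))

windows = zscore(make_windows(stream), history.mean(axis=0), history.std(axis=0))
print(windows.shape)  # (6, 15, 2)
```

Each resulting (15, F) slice is one node's two-dimensional matrix, ready to be stacked with the other nodes into the (N, T, F) tensor.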
[0157] The core neural network inference module includes a multi-scale temporal feature encoder and an adaptive graph attention network. After compilation and optimization, these two modules are deployed on an embedded AI acceleration chip to meet real-time requirements by utilizing its parallel computing capabilities. They sequentially receive preprocessed temporal tensors and graph structure information, execute all calculations from steps S5 to S7 of Example 1, and generate a globally context-aware node state representation.
[0158] The intelligent analysis and diagnosis module is a software logic module that also runs on the edge computing unit and integrates the anomaly scoring and interpretable diagnosis generation functions. Specifically: Anomaly scoring and determination: The graph autoencoder deployed on the AI chip (whose encoder shares a structure with the adaptive graph attention network) is invoked to calculate the node anomaly scores, which are compared with the preset thresholds (the 99.9th percentile of the scores on historical normal data) to achieve anomaly detection as in step S8 of Example 1.
[0159] Explainable diagnostic generation: Once an anomaly is detected, the anomaly propagation path backtracking algorithm described in step S9 of Example 1 is immediately triggered. This algorithm utilizes graph attention weights and historical data to automatically trace and verify the anomaly propagation path, generating a structured diagnostic report within 5 seconds that includes suspected faulty components, propagation links, and details of key parameter deviations.
[0160] This system receives data from sensors, processes it through a data acquisition and preprocessing module, and then feeds it into an AI acceleration chip for deep feature extraction and graph inference. The results are then sent back to the edge computing unit for anomaly scoring and diagnostic analysis.
[0161] In addition, the system design has two significant features: First, the model training only relies on historical normal operation data, without the need to collect and label fault samples, thus lowering the deployment threshold; second, by configuring different device topology databases, the same set of analysis cores can be adapted to different models of UPS equipment, and has good cross-model deployment capabilities.
[0162] It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the present invention can be implemented in other specific forms without departing from the spirit or essential characteristics of the present invention. Therefore, the embodiments should be regarded as exemplary and non-limiting in all respects.
[0163] Furthermore, it should be understood that although this specification describes embodiments, not every embodiment includes only one independent technical solution. This narrative style is merely for clarity. Those skilled in the art should consider the specification as a whole, and the technical solutions in each embodiment can also be appropriately combined to form other embodiments that can be understood by those skilled in the art.
Claims
1. A method for analyzing UPS operating status data, characterized in that it comprises: constructing a dynamic heterogeneous graph model based on the physical topology information of the UPS device to be analyzed, wherein the physical topology information includes the structural connection relationships and functional dependency relationships between components; acquiring multi-channel time-series operating status data of each component node, and processing the time-series operating status data with a multi-scale temporal feature encoder to generate a state embedding vector for each node; inputting the state embedding vectors and the dynamic heterogeneous graph model into an adaptive graph attention network, and performing multiple rounds of information aggregation along the physical relationship edges defined in the graph to obtain node state representations that integrate the global context; calculating, based on a graph autoencoder, the anomaly score of each node from the node state representation and its reconstruction error, and making an anomaly determination based on a preset threshold; and, in response to the anomaly determination, tracing back the anomaly propagation path based on the attention weights output by the adaptive graph attention network, and generating an interpretable diagnostic report containing suspected faulty components and propagation paths.
2. The UPS operating status data analysis method according to claim 1, characterized in that constructing the dynamic heterogeneous graph model based on the physical topology information of the UPS device to be analyzed comprises: identifying the functional components in the UPS device and the sensors deployed on them to form a set of entities; extracting, based on the technical documentation and domain knowledge of the UPS device, the structural connection relationships and functional dependency relationships between entities, wherein the structural connection relationships include electrical conduction paths, heat conduction paths, and control signal paths, and the functional dependency relationships are derived from the chain reactions of energy transfer, heat diffusion, or control signals between components; and uniformly representing the relationships as triples of (source component, relation type, target component), which serve as the basis for constructing the graph model.
3. The UPS operating status data analysis method according to claim 2, characterized in that constructing the dynamic heterogeneous graph model further comprises: abstracting each functional component as a graph node and attributing the sensor data collected on it to that node; assigning each node a type identifier, wherein the type identifiers include power conversion node, energy storage node, heat dissipation node, and sensing node; and establishing, based on the set of triples, a directed edge of the corresponding edge type between the corresponding source node and target node, wherein the direction of each directed edge follows the natural propagation direction of energy flow, heat flow, or signal flow in the physical world.
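By way of non-limiting illustration, the construction of a typed, directed graph from (source component, relation type, target component) triples may be sketched as follows. The component names, relation labels, and the `HeteroGraph` and `build_graph` names are hypothetical and chosen only for this example; they are not taken from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical components and their type identifiers (illustration only).
NODE_TYPES = {
    "rectifier": "power_conversion",
    "inverter": "power_conversion",
    "battery": "energy_storage",
    "fan": "heat_dissipation",
    "temp_sensor": "sensing",
}

# Hypothetical (source component, relation type, target component) triples.
TRIPLES = [
    ("rectifier", "electrical", "inverter"),
    ("battery", "electrical", "inverter"),
    ("inverter", "thermal", "fan"),
    ("temp_sensor", "control_signal", "fan"),
]


@dataclass
class HeteroGraph:
    node_type: dict = field(default_factory=dict)
    # node -> list of (relation type, neighbour) pairs
    out_edges: dict = field(default_factory=lambda: defaultdict(list))
    in_edges: dict = field(default_factory=lambda: defaultdict(list))


def build_graph(node_types, triples):
    """Create one typed directed edge per triple, pointing from source to target."""
    graph = HeteroGraph(node_type=dict(node_types))
    for src, rel, dst in triples:
        for name in (src, dst):
            if name not in graph.node_type:
                raise KeyError(f"unknown component: {name}")
        graph.out_edges[src].append((rel, dst))
        graph.in_edges[dst].append((rel, src))
    return graph


graph = build_graph(NODE_TYPES, TRIPLES)
```

Keeping both `out_edges` and `in_edges` is a design choice of this sketch: forward edges follow the physical flow direction, while the reverse index supports tracing upstream from an anomalous node.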
4. The UPS operating status data analysis method according to claim 1, characterized in that acquiring the multi-channel time-series operating status data of each component node and processing the time-series operating status data with the multi-scale temporal feature encoder specifically comprises: segmenting the synchronized operating status data collected by the sensors into segments with a fixed time window, and standardizing each parameter based on historical statistics; reorganizing the standardized data according to the nodes to which they belong, and constructing a three-dimensional temporal tensor with dimensions of (number of nodes, number of time steps, feature dimension); and inputting the three-dimensional temporal tensor into the multi-scale temporal feature encoder, wherein the multi-scale temporal feature encoder extracts high-frequency, mid-frequency, and low-frequency temporal features through parallel branches with convolutional kernels of different lengths, and then fuses the features extracted by each branch and maps them into a unified node state embedding vector.
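As a minimal sketch of the windowing, standardization, and regrouping steps, the following builds a nested list with shape (number of nodes, number of time steps, feature dimension). It assumes that every node has the same number of channels and that the historical mean and standard deviation of each channel are already known; the channel names and numeric values are hypothetical.

```python
def standardize(values, mean, std):
    """Z-score each sample using historical statistics; a zero std maps to 0.0."""
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def build_tensor(readings, node_channels, stats, window, start=0):
    """Return a nested list indexed as [node][time step][feature].

    readings:      channel name -> list of synchronized samples
    node_channels: node name -> list of channel names belonging to that node
    stats:         channel name -> (historical mean, historical std)
    """
    tensor = []
    for channels in node_channels.values():
        columns = []
        for ch in channels:
            segment = readings[ch][start:start + window]
            if len(segment) < window:
                raise ValueError(f"channel {ch} has fewer than {window} samples")
            columns.append(standardize(segment, *stats[ch]))
        # Transpose from (feature, time) to (time, feature).
        tensor.append([list(step) for step in zip(*columns)])
    return tensor


# Hypothetical data: one node with a voltage channel and a temperature channel.
readings = {"v_out": [220.0, 222.0, 218.0, 220.0], "temp": [40.0, 42.0, 44.0, 46.0]}
stats = {"v_out": (220.0, 2.0), "temp": (40.0, 2.0)}
node_channels = {"inverter": ["v_out", "temp"]}

tensor = build_tensor(readings, node_channels, stats, window=3)
```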
5. The UPS operating status data analysis method according to claim 4, characterized in that the multi-scale temporal feature encoder comprises three parallel one-dimensional convolutional neural network branches: a high-frequency branch, configured with a convolutional kernel of a first length, for extracting millisecond-level high-frequency transient features; a mid-frequency branch, configured with a convolutional kernel of a second length greater than the first length, for extracting mid-term fluctuation features at the ten-millisecond level; and a low-frequency branch, configured with a convolutional kernel of a third length covering the entire time window, for extracting second-level trend drift features.
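The three-branch structure can be illustrated with the sketch below. It is an assumption-laden simplification: the kernels use fixed uniform weights as placeholders for learned weights, each branch is pooled by its maximum absolute response, and the branch outputs are fused by simple concatenation without the final learned mapping. The kernel lengths 3 and 9 are hypothetical; only the third branch's property of spanning the whole window follows the claim.

```python
def conv1d_valid(signal, kernel):
    """Plain 1D sliding-window dot product without padding."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def encode_node(window, kernel_lengths=(3, 9, None)):
    """Encode one node's window, given as a list of time steps of feature lists.

    A kernel length of None means "cover the entire time window".
    Returns the concatenated pooled responses of all branches.
    """
    n_steps = len(window)
    n_features = len(window[0])
    embedding = []
    for length in kernel_lengths:
        k = n_steps if length is None else min(length, n_steps)
        kernel = [1.0 / k] * k  # placeholder weights; a trained encoder learns these
        for f in range(n_features):
            channel = [step[f] for step in window]
            response = conv1d_valid(channel, kernel)
            embedding.append(max(abs(r) for r in response))  # global max pooling
    return embedding


# One feature sampled over four time steps, with short kernels for readability.
embedding = encode_node([[1.0], [2.0], [3.0], [4.0]], kernel_lengths=(2, 3, None))
```

The resulting vector has one entry per branch and feature, so its length is three times the feature dimension in this sketch.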
6. The UPS operating status data analysis method according to claim 1, characterized in that the adaptive graph attention network aggregates information through an improved attention weight calculation unit, wherein, for a directed edge from source node i to target node j, the attention coefficient is computed from: the state embedding vector of source node i; the state embedding vector of target node j; the edge type embedding vector; and the learnable weight matrices corresponding to the type identifiers of nodes i and j, respectively.
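The claim names the inputs of the attention coefficient but the text here does not give its closed form. The sketch below therefore assumes one plausible form borrowed from standard graph attention networks: each node embedding is projected by the weight matrix of its node type, the two projections are concatenated with the edge type embedding, scored by a vector `a` through a LeakyReLU, and normalised with a softmax over the incoming edges of the target node. The scoring vector `a`, the LeakyReLU, the softmax, and all numeric values are assumptions of this example.

```python
import math


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(target, in_edges, h, node_type, w_by_type, edge_emb, a):
    """Return {(source, relation): weight}, normalised over the target's incoming edges.

    in_edges:  list of (source node, relation type) pairs pointing at `target`
    h:         node -> state embedding vector
    w_by_type: node type identifier -> weight matrix (assumed learnable)
    edge_emb:  relation type -> edge type embedding vector
    a:         scoring vector (an assumption of this sketch)
    """
    if not in_edges:
        return {}
    z_target = matvec(w_by_type[node_type[target]], h[target])
    scores = {}
    for src, rel in in_edges:
        z_src = matvec(w_by_type[node_type[src]], h[src])
        features = z_src + z_target + edge_emb[rel]  # list concatenation
        scores[(src, rel)] = leaky_relu(sum(x * y for x, y in zip(a, features)))
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {edge: math.exp(s - top) for edge, s in scores.items()}
    total = sum(exps.values())
    return {edge: e / total for edge, e in exps.items()}


# Hypothetical two-dimensional embeddings and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
h = {"battery": [1.0, 0.0], "rectifier": [0.0, 1.0], "inverter": [1.0, 1.0]}
node_type = {
    "battery": "energy_storage",
    "rectifier": "power_conversion",
    "inverter": "power_conversion",
}
w_by_type = {"energy_storage": identity, "power_conversion": identity}
edge_emb = {"electrical": [0.5]}
a = [1.0, 0.0, 0.0, 0.0, 0.0]

alpha = attention_weights(
    "inverter",
    [("battery", "electrical"), ("rectifier", "electrical")],
    h, node_type, w_by_type, edge_emb, a,
)
```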
7. The UPS operating status data analysis method according to claim 1, characterized in that performing the multiple rounds of information aggregation comprises executing at least three layers of graph attention aggregation operations, such that the final state representation of each node fuses the state information of the components in its multi-hop neighborhood.
8. The UPS operating status data analysis method according to claim 1, characterized in that the preset threshold is determined as follows: the graph autoencoder is trained on historical normal operation data, the reconstruction error distribution of each node over the historical normal samples is calculated, and the reconstruction error value corresponding to a specified high quantile of the distribution is used as the anomaly determination threshold of that node.
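The per-node threshold selection can be sketched as follows, assuming the reconstruction errors on historical normal samples have already been collected for each node. The nearest-rank quantile definition and the default quantile of 0.99 are choices made for this example; the claim only requires a specified high quantile.

```python
import math


def quantile_threshold(errors, q=0.99):
    """Nearest-rank quantile of reconstruction errors from normal operation."""
    if not errors:
        raise ValueError("no historical errors available")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    ordered = sorted(errors)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def node_thresholds(errors_by_node, q=0.99):
    """Compute one threshold per node from that node's own error distribution."""
    return {node: quantile_threshold(errs, q) for node, errs in errors_by_node.items()}


def is_anomalous(node, error, thresholds):
    """A node is flagged when its current error exceeds its own threshold."""
    return error > thresholds[node]


# Hypothetical historical errors for two nodes.
history = {"inverter": [0.4, 0.1, 0.3, 0.2], "battery": [0.05, 0.02]}
thresholds = node_thresholds(history, q=1.0)
```

Because each node has its own threshold, a component with naturally noisy readings does not raise the alarm rate of quieter components.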
9. The UPS operating status data analysis method according to claim 1, characterized in that tracing back the anomaly propagation path comprises: selecting the node with the highest anomaly score as the target node; identifying, based on the attention weights output by the adaptive graph attention network, the set of key upstream nodes that contribute the most to the target node; starting from each key upstream node, recursively tracing upstream nodes in the direction opposite to the directed edges in the graph to form potential anomaly propagation paths; and, for the nodes on each propagation path, verifying whether the deviation directions of their key operating parameters conform to the causal logic of the corresponding physical relationships, and retaining the paths that pass the verification for generating the diagnostic report.
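A minimal sketch of the backtracking procedure is given below. It assumes the attention weights are available as a mapping from (source, target) pairs to scalars, follows only the `top_k` highest-weighted upstream nodes at each hop, and encodes the causal logic of each edge as a single sign (+1 when both ends are expected to deviate in the same direction, -1 when in opposite directions). The sign encoding, `top_k`, `max_depth`, and all node names and values are hypothetical.

```python
def trace_paths(target, in_edges, attention, top_k=1, max_depth=4):
    """Walk against edge direction from `target`, following the strongest attention.

    in_edges:  node -> list of upstream nodes
    attention: (source, target) -> attention weight
    Returns paths ordered from the most upstream node down to `target`.
    """
    paths = []

    def walk(node, path, depth):
        upstream = [s for s in in_edges.get(node, []) if s not in path]  # avoid cycles
        upstream.sort(key=lambda s: attention.get((s, node), 0.0), reverse=True)
        chosen = upstream[:top_k]
        if not chosen or depth == max_depth:
            paths.append(list(reversed(path)))
            return
        for src in chosen:
            walk(src, path + [src], depth + 1)

    walk(target, [target], 0)
    return paths


def physically_consistent(path, deviation, expected_relation):
    """Keep a path only if every hop's deviation signs match the expected relation."""
    return all(
        deviation[a] * deviation[b] == expected_relation[(a, b)]
        for a, b in zip(path, path[1:])
    )


# Hypothetical anomaly scores, topology, and attention weights.
scores = {"fan": 0.9, "inverter": 0.7, "battery": 0.3, "rectifier": 0.1}
in_edges = {"inverter": ["rectifier", "battery"], "fan": ["inverter"]}
attention = {
    ("inverter", "fan"): 1.0,
    ("rectifier", "inverter"): 0.2,
    ("battery", "inverter"): 0.8,
}
deviation = {"battery": 1, "inverter": 1, "fan": 1}
expected_relation = {("battery", "inverter"): 1, ("inverter", "fan"): 1}

target = max(scores, key=scores.get)
candidates = trace_paths(target, in_edges, attention)
verified = [p for p in candidates if physically_consistent(p, deviation, expected_relation)]
```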
10. A UPS operating status data analysis system for implementing the UPS operating status data analysis method according to any one of claims 1 to 9, characterized in that the system comprises: a physical topology parsing module for reading and parsing the physical topology information of the UPS device; a dynamic heterogeneous graph construction module for constructing the dynamic heterogeneous graph model based on the physical topology information; a multi-source data acquisition and preprocessing module for collecting the operating status data of each component node and performing time window segmentation and standardization to construct a multi-channel time-series tensor; a core neural network inference module, including the multi-scale temporal feature encoder and the adaptive graph attention network, for generating node state embedding vectors and performing multiple rounds of context-aware information aggregation to output node state representations; and an intelligent analysis and diagnosis module, including the graph autoencoder and a path backtracking unit, for detecting anomalies based on the node state representations and, when an anomaly is detected, tracing back the propagation path and generating a diagnostic report; wherein the core neural network inference module is deployed on an AI acceleration chip, and the remaining modules are deployed on edge computing units.