Substation control system alarm aggregation processing method, system, device and medium
By standardizing and integrating the data of the substation centralized control system, the problems of alarm homogenization and insufficient integration of multi-source data have been solved, enabling precise operation and maintenance decisions and efficient resource utilization for substation equipment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- STATE GRID ANHUI ELECTRIC POWER CO LTD ELECTRIC POWER SCI RES INST
- Filing Date
- 2026-01-27
- Publication Date
- 2026-06-19
AI Technical Summary
Existing alarm aggregation technologies in substation centralized control systems suffer from problems such as alarm homogenization, insufficient fusion of multi-source data, and weak dynamic adaptation capabilities. This results in maintenance resources not being accurately allocated to high-risk equipment, insufficient support for maintenance decision-making, and difficulty in meeting the refined maintenance needs of new power grids.
By collecting full lifecycle health data, real-time operating condition data, and real-time alarm texts of power equipment, standardizing the data, calculating health index, multi-source operating condition characteristics, and text depth features, and performing feature weighted fusion, we can generate directly executable hierarchical operation and maintenance decision information.
It enables precise aggregation of alarm information and closed-loop implementation of operation and maintenance decisions, improving the accuracy of operation and maintenance decision analysis and ensuring the efficient utilization of operation and maintenance resources.
Figure CN122241167A_ABST
Description
Technical Field
[0001] This invention relates to the field of substation centralized control alarm technology, specifically to a method, system, device, and medium for alarm aggregation processing in a substation centralized control system.
Background Technology
[0002] With the continuous advancement of substation centralized control system construction, multi-dimensional monitoring of equipment operating status can now be achieved, generating alarm information for the substation centralized control system. This alarm information serves as direct feedback on abnormal equipment conditions, and its processing efficiency and accuracy directly impact the substation's operation and maintenance response speed and equipment operational reliability. However, current substation centralized control system alarm aggregation technology still faces three major bottlenecks, making it difficult to meet the refined operation and maintenance needs of new power grids:
[0003] 1. Alarm aggregation is highly homogenized and fails to correlate with the intrinsic health status of equipment. Traditional alarm aggregation methods are mostly based on real-time data thresholds or text keyword clustering, ignoring the health degradation trend and historical maintenance characteristics of equipment throughout its entire lifecycle. This results in core risk alarms being masked by redundant information, and maintenance resources failing to be accurately allocated to high-risk equipment.
[0004] 2. Insufficient fusion of multi-source data and inadequate semantic understanding. Existing methods often process health data, operating condition data, and alarm text in isolation, without establishing a deep collaborative mechanism among the three types of data. Text semantic processing is limited to keyword extraction and lacks deep semantic modeling specific to the power industry, so it cannot distinguish the fine-grained differences between similar expressions such as "main transformer oil temperature exceeds the limit" and "main transformer winding temperature exceeds the limit," resulting in a high rate of semantic misjudgment of alarms.
[0005] 3. Weak dynamic adaptation capability and insufficient support for operation and maintenance decisions. Traditional methods use fixed weight fusion and static threshold screening, which cannot dynamically adjust the feature contribution based on different scenarios such as high-incidence periods of failure and normal operation and maintenance periods. At the same time, the alarm aggregation results only output classification labels, without linking historical failure cases and operation and maintenance procedures, and lack directly executable hierarchical operation and maintenance instructions, resulting in a break in the "alarm-decision-execution" link and low response efficiency of operation and maintenance personnel.
[0006] In existing technologies, CN 119651896 A discloses a substation equipment monitoring method based on a centralized control system. This method triggers an early warning mechanism by monitoring the operating status data of substation equipment, such as current, voltage, temperature, and vibration parameters. However, it lacks a dedicated health modeling scheme designed specifically for the mechanisms of power equipment. CN 114970508 A discloses a power text knowledge discovery method and device based on multi-source data fusion. This method automatically extracts key information needed for defect diagnosis from power natural language data and power equipment status text, and combines graph neural network technology and rule engine technology to evaluate the operating status of power equipment recorded in the power text. It also achieves multi-source fusion of at least two text types. However, it does not adapt to the structured features of power terminology and alarm text; the multi-source fusion and decision-making linkage mechanisms are lacking, making it difficult to cope with alarm processing needs in complex scenarios.
[0007] Therefore, how to construct a three-in-one integrated architecture of "health status - multi-source operating conditions - text semantics" to achieve accurate aggregation of alarm information and closed-loop implementation of operation and maintenance decisions has become a core issue that urgently needs to be addressed in the field of alarm processing of centralized control systems.
Summary of the Invention
[0008] The technical problem to be solved by this invention is how to achieve accurate aggregation of alarm information and improve the accuracy of operation and maintenance decision analysis.
[0009] The present invention solves the above-mentioned technical problems through the following technical means:
collecting full life cycle health data, real-time operating condition data, and real-time alarm text of power equipment, and performing data standardization processing on the full life cycle health data, the real-time operating condition data, and the real-time alarm text to obtain standard data;
calculating the remaining service life of the power equipment based on the first normalized data in the standard data, and calculating the health index and health degradation rate of the power equipment based on the remaining service life;
calculating the real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient of the power equipment based on the second normalized data in the standard data to obtain the multi-source operating condition features;
extracting the initial text features of the standard text data in the standard data using a pre-built real-time coarse classification model, and constructing the deep text features based on the initial text features using a preset collaborative decision-making logic;
performing feature weighted fusion of the health index, the health degradation rate, the multi-source operating condition features, and the deep text features to obtain the fused feature vector of the power equipment;
calculating the risk level of the power equipment based on the fused feature vector, aggregating and grouping the power equipment according to the risk level to obtain multiple aggregate groups, and generating alarm operation and maintenance decision information for each aggregate group.
[0010] Optionally, the data standardization processing of the full life cycle health data, the real-time operating condition data, and the real-time alarm text to obtain standard data includes: calculating the data integrity index for the full life cycle health data and the real-time operating condition data respectively; when the data integrity index is less than the preset integrity threshold, re-sampling the data to obtain re-sampled data, and normalizing the re-sampled data to obtain normalized data; when the data integrity index is greater than or equal to the integrity threshold, normalizing the full life cycle health data and the real-time operating condition data to obtain first normalized data and second normalized data; cleaning the real-time alarm text to obtain standard text data; and combining the first normalized data, the second normalized data, and the standard text data to obtain the standard data.
[0011] Optionally, calculating the remaining service life of the power equipment based on the first normalized data in the standard data includes: Extract health indicator data from the first normalized data; The health indicator features of the health indicator data are extracted using a pre-trained remaining service life prediction model. The remaining service life of the power equipment is calculated based on the health indicator characteristics.
[0012] Optionally, calculating the health index and health degradation rate of the power equipment based on the remaining service life includes: calculating the average value of the health indicators of the power equipment; calculating the health index using a preset health index formula based on the average value of the health indicators and the remaining service life; and calculating the health degradation rate of the power equipment based on the health index.
[0013] Optionally, the step of calculating the real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient of the power equipment based on the second normalized data in the standard data to obtain multi-source operating condition features includes: calculating the real-time load deviation rate of the power equipment based on the real-time load rate in the second normalized data; calculating the voltage over-limit degree of the power equipment based on the bus voltage in the second normalized data; calculating the environmental risk coefficient of the power equipment based on the ambient temperature and humidity in the second normalized data; and combining the real-time load deviation rate, the voltage over-limit degree, and the environmental risk coefficient to obtain the multi-source operating condition features.
[0014] Optionally, the step of constructing deep text features based on the initial text features using a preset collaborative decision-making logic includes: performing classification prediction on the initial text features to obtain a prediction probability, and determining the text confidence of the standard text data based on the prediction probability; when the text confidence is less than a preset confidence threshold, extracting the deep semantic vector corresponding to the standard text data using a pre-constructed deep feature extraction model, and constructing the deep text features based on the deep semantic vector and the initial text features; when the text confidence is greater than or equal to the confidence threshold, padding the initial text features with special features to obtain the deep text features.
[0015] Optionally, the step of performing feature weighted fusion of the health index, the health degradation rate, the multi-source operating condition features, and the deep text features to obtain the fused feature vector of the power equipment includes: constructing a health status feature based on the health index and the health degradation rate; mapping the health status feature and the multi-source operating condition features to the feature dimension of the deep text features to obtain the mapped health status feature and the mapped multi-source operating condition features; standardizing the mapped health status feature, the mapped multi-source operating condition features, and the deep text features to obtain the standardized health feature, the standardized operating condition feature, and the standardized deep text feature; calculating the fusion weights of the standardized health feature, the standardized operating condition feature, and the standardized deep text feature respectively; and performing weighted fusion of the three standardized features using the fusion weights to obtain the fused feature vector of the power equipment.
[0016] To address the aforementioned problems, this invention also proposes an alarm aggregation and processing system for a substation centralized control system, the system comprising: The data standardization processing module is used to collect the full life cycle health data, real-time operating condition data and real-time alarm text of power equipment, and to perform data standardization processing on the full life cycle health data, the real-time operating condition data and the real-time alarm text respectively to obtain standard data. The health index and health decay rate calculation module is used to calculate the remaining service life of the power equipment based on the first normalized data in the standard data, and to calculate the health index and health decay rate of the power equipment based on the remaining service life. The multi-source operating condition characteristic calculation module is used to calculate the real-time load deviation rate, voltage over-limit degree and environmental risk coefficient of the power equipment based on the second normalized data in the standard data, so as to obtain the multi-source operating condition characteristics. The text deep feature construction module is used to extract the initial text features of the standard text data in the standard data using a pre-built real-time coarse classification model, and to construct the text deep features based on the initial text features using a preset collaborative decision logic. The fusion feature vector generation module is used to perform feature weighted fusion of the health index, the health decay rate, the multi-source operating condition features, and the text depth features to obtain the fusion feature vector of the power equipment. 
The alarm operation and maintenance decision information generation module is used to calculate the risk level of the power equipment based on the fused feature vector, aggregate and group the power equipment according to the risk level to obtain multiple aggregate groups, and generate alarm operation and maintenance decision information for each aggregate group.
[0017] The present invention also provides a processing device, characterized in that it includes at least one processor and at least one memory communicatively connected to the processor, wherein: the memory stores program instructions executable by the processor, and the processor can execute the above-described method for alarm aggregation processing of a substation centralized control system by calling the program instructions.
[0018] The present invention also provides a computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, the computer instructions causing the computer to execute the above-described method for alarm aggregation processing in a substation centralized control system.
[0019] The advantages of this invention are: This invention, through data standardization, can eliminate abnormal data and dimensional differences, while ensuring the cleanliness and consistency of real-time alarm text, thus improving the accuracy of subsequent feature extraction. Furthermore, by calculating the health index, health decay rate, and multi-source operating condition features, it can reflect the long-term health status and aging trend of equipment, as well as real-time operational risks, providing multi-faceted foundational information for subsequent alarm and maintenance decision-making. Extracting the text depth features of standard text data from the standard data accurately reflects the core meaning of the text. Then, by performing feature weighting and fusion based on the health index, health decay rate, multi-source operating condition features, and text depth features, a fused feature vector is obtained. This enables the construction of a three-in-one fusion architecture of "health level - multi-source operating condition - text semantics," resulting in directly executable hierarchical maintenance instructions, achieving precise aggregation of alarm information and closed-loop implementation of maintenance decisions.
Attached Figure Description
[0020] Figure 1 is a flowchart illustrating an alarm aggregation processing method for a substation centralized control system according to an embodiment of the present invention; Figure 2 is a comparison chart of fault diagnosis response delay in one embodiment of the present invention; Figure 3 is a functional module diagram of a substation centralized control system alarm aggregation processing system provided in one embodiment of the present invention.
Detailed Implementation
[0021] To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below in conjunction with the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0022] Referring to Figure 1, a flowchart of an alarm aggregation processing method for a substation centralized control system according to an embodiment of the present invention is shown. In this embodiment, the alarm aggregation processing method for the substation centralized control system includes:
S1. Collect the full life cycle health data, real-time operating condition data, and real-time alarm text of the power equipment, and perform data standardization processing on the full life cycle health data, the real-time operating condition data, and the real-time alarm text to obtain standard data.
[0023] In this embodiment of the invention, the full life cycle health data includes the equipment service life $T_{year}$, the repair rate, and specific health indicators (such as the H2 and CH4 content of dissolved gases (DGA) in the main transformer oil, the SF6 humidity and partial discharge of GIS (Gas Insulated Switchgear) equipment, and the opening time of high-voltage circuit breakers), which can be obtained through the Production Management System and the online monitoring platform of the power equipment. The real-time operating condition data includes the real-time load rate ($P_{load}$), bus voltage ($U_{bus}$), ambient temperature ($T_{env}$), and humidity ($H_{env}$), which can be collected through SCADA (Supervisory Control And Data Acquisition) systems and meteorological platforms. The real-time alarm text is the alarm text of the power equipment that is triggered in real time, and is collected from the alarm log of the centralized control system.
[0024] The format of fields in the real-time alarm text strictly follows the five-segment standard Q/GDW 11021-2013. The field format in the real-time alarm text is defined as follows:
$$A = a_{level} \oplus s \oplus a_{time} \oplus s \oplus a_{path} \oplus s \oplus a_{act} \oplus s \oplus a_{reason}$$
[0025] where $A$ denotes the real-time alarm text, $a_{level}$ denotes the level field, $a_{time}$ denotes the time field, $a_{path}$ denotes the full path of the device, $a_{act}$ denotes the behavior field, $a_{reason}$ denotes the reason field, and $s$ denotes the field separator.
[0026] Furthermore, data standardization processing involves quality verification of full lifecycle health data, real-time operating condition data, and real-time alarm texts to establish a verification mechanism that combines integrity and consistency, eliminate abnormal data and remove dimensional differences, and prevent inferior data from affecting the accuracy of subsequent feature extraction.
[0027] Specifically, the process of standardizing the full life cycle health data, the real-time operating condition data, and the real-time alarm text to obtain standard data includes: calculating the data integrity index for the full life cycle health data and the real-time operating condition data respectively; when the data integrity index is less than the preset integrity threshold, re-sampling the data to obtain re-sampled data, and normalizing the re-sampled data to obtain normalized data; when the data integrity index is greater than or equal to the integrity threshold, normalizing the full life cycle health data and the real-time operating condition data to obtain first normalized data and second normalized data; cleaning the real-time alarm text to obtain standard text data; and combining the first normalized data, the second normalized data, and the standard text data to obtain the standard data.
[0028] In this embodiment of the invention, the data integrity index is the proportion of data points within a data acquisition period that were actually collected, that is, one minus the ratio of the number of missing data points to the total number of data points acquired. For example, if the real-time load rate $P_{load}$ is sampled at 10 points in a single period and 2 of those points are missing, the number of missing data points in this period is 2.
[0029] Specifically, the data integrity index can be calculated using the following formula:
$$C = 1 - \frac{N_{miss}}{N_{total}}$$
[0030] where $C$ denotes the data integrity index, $N_{miss}$ denotes the number of missing data points, and $N_{total}$ denotes the total number of data collection points.
[0031] In this embodiment of the invention, an integrity threshold is set (which can be set to 0.95, i.e., the percentage of missing data points is ≤5%). If the data integrity index is less than the preset integrity threshold, a re-sampling mechanism is triggered on the terminal device, with a maximum of 3 re-sampling attempts. If the threshold is still not met, the data is marked as "faulty data" and an abnormal data collection alarm is triggered, reminding maintenance personnel to check sensors or communication links. When the data integrity index is greater than or equal to the integrity threshold, no data re-sampling is required. Data normalization is directly performed on the full lifecycle health data and real-time operating condition data. The normalized full lifecycle health data is used as the first normalized data, and the normalized real-time operating condition data is used as the second normalized data.
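The integrity check and bounded re-sampling described in [0028] to [0031] can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: missing points are represented as `None`, the index is taken as the fraction of points present (consistent with a 0.95 threshold meaning at most 5% missing), and `reacquire` is a hypothetical callback standing in for the terminal device's re-sampling mechanism.

```python
from typing import Callable, List, Optional, Tuple

Sample = Optional[float]  # None marks a missing data point (assumption)


def integrity_index(points: List[Sample]) -> float:
    """Fraction of points in one collection period that are present."""
    if not points:
        return 0.0
    missing = sum(1 for p in points if p is None)
    return 1.0 - missing / len(points)


def collect_with_retry(
    points: List[Sample],
    reacquire: Callable[[], List[Sample]],  # hypothetical re-sampling hook
    threshold: float = 0.95,
    max_retries: int = 3,
) -> Tuple[List[Sample], bool]:
    """Re-sample at most max_retries times; return (points, is_faulty)."""
    for _ in range(max_retries):
        if integrity_index(points) >= threshold:
            return points, False
        points = reacquire()
    # After the last retry, data still below the threshold is flagged as faulty.
    return points, integrity_index(points) < threshold
```

A caller would raise the abnormal data collection alarm when the second return value is `True`, and pass the points on to normalization otherwise.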
[0032] Data normalization uses linear (min-max) normalization to map all numerical data to the [0,1] interval, eliminating dimensional differences. Specifically, data normalization can be performed using the following formula:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$
[0033] where $x'$ denotes the normalized data, $x$ denotes the original data, and $x_{min}$ and $x_{max}$ denote the historical extreme values of this type of data (derived from statistics of normal operation data over the past 5 years). Finally, the normalized data is used to construct a multi-source standardized data matrix $X \in \mathbb{R}^{T \times D}$, where $T$ denotes the number of data collection cycles and $D$ denotes the number of data monitoring dimensions (including the health and operating condition data dimensions), providing a unified input format for subsequent feature extraction.
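As an illustration of the linear normalization in [0032] and [0033], the sketch below scales one reading against its historical extremes. The function name and the clipping of out-of-range readings to [0, 1] are assumptions made for the example; the source only states that the extremes come from five years of normal operation data.

```python
def min_max_normalize(x: float, x_min: float, x_max: float) -> float:
    """Map x to [0, 1] using historical extremes x_min and x_max."""
    if x_max <= x_min:
        raise ValueError("x_max must be greater than x_min")
    scaled = (x - x_min) / (x_max - x_min)
    # Assumption: readings outside the historical range are clipped.
    return min(1.0, max(0.0, scaled))


def normalize_row(row, extremes):
    """Normalize one collection cycle; extremes is a list of (min, max) pairs."""
    return [min_max_normalize(x, lo, hi) for x, (lo, hi) in zip(row, extremes)]
```

Stacking the output of `normalize_row` for each collection cycle gives one row per cycle and one column per monitored dimension.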
[0034] In detail, text cleaning of the real-time alarm text uses pre-built regular expressions to filter non-text characters out of the behavior field and the reason field of the real-time alarm text, retaining only Chinese characters, letters, digits, underscores, and spaces. Multiple consecutive spaces are then merged into a single space and redundant spaces are removed, yielding compact text and ensuring the neatness and consistency of the alarm text.
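A minimal sketch of the cleaning rule in [0034], using Python's standard `re` module. The exact patterns are not given in the source; the character class below (CJK Unified Ideographs U+4E00 to U+9FFF, ASCII letters and digits, underscore, space) is an assumption, as is the choice to delete disallowed characters rather than replace them with spaces.

```python
import re

# Assumed pattern: anything outside the retained character set.
_NON_TEXT = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9_ ]")
_MULTI_SPACE = re.compile(r" {2,}")


def clean_field(text: str) -> str:
    """Clean one behavior or reason field of an alarm text."""
    kept = _NON_TEXT.sub("", text)          # drop non-text characters
    merged = _MULTI_SPACE.sub(" ", kept)    # merge runs of spaces
    return merged.strip()                   # remove leading/trailing spaces
```

For example, `clean_field("主变  油温#越限!! ")` keeps the Chinese characters, drops `#` and `!`, and collapses the double space.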
[0035] In this embodiment of the invention, data standardization processing can eliminate abnormal data and eliminate dimensional differences, while ensuring the cleanliness and consistency of real-time alarm text and improving the accuracy of subsequent feature extraction.
[0036] S2. Calculate the remaining service life of the power equipment based on the first normalized data in the standard data, and calculate the health index and health degradation rate of the power equipment based on the remaining service life.
[0037] In this embodiment of the invention, the remaining service life, i.e. the remaining service years, of the power equipment is calculated from the first normalized data obtained by normalizing the full life cycle health data.
[0038] Specifically, calculating the remaining service life of the power equipment based on the first normalized data in the standard data includes: Extract health indicator data from the first normalized data; The health indicator features of the health indicator data are extracted using a pre-trained remaining service life prediction model. The remaining service life of the power equipment is calculated based on the health indicator characteristics.
[0039] In this embodiment of the invention, the health indicator data is the specific health indicator data included in the first normalized data, such as the H2 and CH4 content of dissolved gas (DGA) in the main transformer oil, the SF6 humidity and partial discharge of GIS (Gas Insulated Switchgear) equipment, and the opening time of the high-voltage circuit breaker.
[0040] Furthermore, before calculating the remaining service life of the power equipment, an LSTM model can be trained, consisting of an input layer (dimension = number of health indicators), two LSTM (Long Short-Term Memory) layers (64 hidden units per layer, Dropout probability = 0.2 to prevent overfitting), and an output layer (dimension = 1, outputting remaining lifespan). Continuous health indicator data can be selected to form time-series data for model training; for example, the H2 content sequence of the main transformer's DGA over the past six months.
[0041] In detail, the model loss function uses the mean squared error, and the formula is as follows:
$$L_{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$
[0042] where $L_{MSE}$ denotes the value of the loss function, $N$ denotes the total number of data points in the time-series data of health indicators, $\hat{y}_i$ denotes the predicted remaining useful life of the $i$-th data point, and $y_i$ denotes the actual remaining useful life of the $i$-th data point.
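The mean squared error used as the training loss can be computed as in the following dependency-free sketch. It only illustrates the loss itself; the LSTM training loop is outside its scope, and the function name is chosen for the example.

```python
from typing import Sequence


def mse_loss(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between predicted and actual remaining service life."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared) / len(squared)
```

With predictions of 10 and 8 years against labels of 12 and 8 years, the squared errors are 4 and 0, so the loss is 2.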
[0043] Specifically, the time-series data of health indicators used for training should cover normal operating conditions and common aging states of the equipment, with a sample size of ≥500 groups. The actual remaining service life can be obtained by labeling historical equipment failure data.
[0044] In this embodiment of the invention, the health index represents the health level of the power equipment, with 1 indicating complete health and 0 indicating malfunction; the health degradation rate represents the aging level of the power equipment (unit: / month).
[0045] Specifically, calculating the health index and health degradation rate of the power equipment based on the remaining service life includes: calculating the average value of the health indicators of the power equipment; calculating the health index using a preset health index formula based on the average value of the health indicators and the remaining service life; and calculating the health degradation rate of the power equipment based on the health index.
[0046] In this embodiment of the invention, the average health indicator can be the average of a specific health indicator over several months, such as the average SF6 humidity of GIS equipment over the past 3 months.
[0047] In detail, the health index formula is expressed as follows:
[0048]
[0049] where the quantities in the formula are: the health index; the preset first weighting coefficient; the preset second weighting coefficient; the preset third weighting coefficient; the preset curve steepness parameter; the remaining service life; the rated service life of the power equipment; the historical repair success rate of the power equipment; the preset health indicator threshold; and the average value of the health indicators.
[0050] In this embodiment of the invention, the curve steepness parameter controls the steepness of the Sigmoid curve and can be set to 1.2; the first, second, and third weighting coefficients can be set to 0.4, 0.4, and 0.2, respectively.
[0051] Furthermore, a linear fit is performed between the current health index and the previous health index (such as the health index 6 months ago) to quantify the aging rate of the power equipment.
[0052] Specifically, the rate of health decline can be calculated using the following formula:
[0053] where the quantities in the formula are the health degradation rate, the current health index, and the health index from 6 months ago.
[0054] A health degradation rate greater than 0.1 indicates rapid aging of the power equipment, and a health degradation rate ≤0.05 indicates that the power equipment is in a stable health condition.
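The formula itself is not reproduced in this text, so the sketch below is only one plausible reading of [0051] to [0054]: a two-point linear fit, i.e. the drop in health index divided by the number of months between the two evaluations, which matches the stated unit of "/month". That formula, the function names, and the label "moderate" for rates between 0.05 and 0.1 (a range the source does not name) are all assumptions.

```python
def health_degradation_rate(hi_previous: float, hi_current: float,
                            months: float = 6.0) -> float:
    """Assumed form: average monthly drop in health index over the interval."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (hi_previous - hi_current) / months


def aging_state(rate: float) -> str:
    """Apply the thresholds from the text: >0.1 rapid aging, <=0.05 stable."""
    if rate > 0.1:
        return "rapid aging"
    if rate <= 0.05:
        return "stable"
    return "moderate"  # hypothetical label; not defined in the source
```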
[0055] In this embodiment of the invention, by calculating the health index and health degradation rate of power equipment, the long-term health status and aging trend of the equipment can be reflected, which is related to the essential operational health status of the power equipment and improves the accuracy of subsequent risk level calculation.
[0056] S3. Calculate the real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient of the power equipment based on the second normalized data in the standard data to obtain the multi-source operating condition characteristics.
[0057] In this embodiment of the invention, the real-time operating risk of power equipment can be reflected by multi-source operating condition features composed of the real-time load deviation rate, the voltage over-limit degree, and the environmental risk coefficient. Specifically, the real-time load deviation rate reflects the degree to which the current load on the equipment exceeds the rated value, the voltage over-limit degree reflects the voltage risk of the power equipment, and the environmental risk coefficient represents the degree of adverse impact of the environment on the operation of the power equipment.
[0058] Specifically, the step of calculating the real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient of the power equipment based on the second normalized data in the standard data to obtain multi-source operating condition features includes: calculating the real-time load deviation rate of the power equipment based on the real-time load rate in the second normalized data; calculating the voltage over-limit degree of the power equipment based on the bus voltage in the second normalized data; calculating the environmental risk coefficient of the power equipment based on the ambient temperature and humidity in the second normalized data; and combining the real-time load deviation rate, the voltage over-limit degree, and the environmental risk coefficient to obtain the multi-source operating condition features.
[0059] Specifically, the real-time load deviation rate of the power equipment is calculated using the following formula:
[0060] In the formula, the quantities are, in order: the real-time load deviation rate, the real-time load rate, and the preset rated load of the equipment.
[0061] In this embodiment of the invention, a real-time load deviation rate greater than 0 indicates overload, and a value less than or equal to 0 indicates normal load; in particular, a value ≥0.2 indicates that the power equipment is in a high-load risk state.
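The thresholds in the preceding paragraph can be illustrated with a minimal sketch. The formula itself is not reproduced in this text, so the relative-deviation form `(load - rated) / rated` used here is an assumption, and the function names are hypothetical.

```python
def load_deviation_rate(load_rate: float, rated_load: float) -> float:
    """Relative deviation of the real-time load from the rated load.

    ASSUMPTION: the source formula is not available; a relative
    deviation (load - rated) / rated is used for illustration.
    """
    if rated_load <= 0:
        raise ValueError("rated_load must be positive")
    return (load_rate - rated_load) / rated_load


def load_state(deviation: float) -> str:
    """Classify the deviation using the thresholds stated in the text."""
    if deviation >= 0.2:
        return "high-load risk"
    if deviation > 0:
        return "overload"
    return "normal"
```

For example, a device at 125% of its rated load has a deviation of 0.25 under this assumed form and would be classified as being in a high-load risk state.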
[0062] In detail, the voltage over-limit degree of the power equipment is calculated using the following formula:
[0063] In the formula, the quantities are, in order: the voltage over-limit degree, the bus voltage, the upper limit of the rated voltage of the power equipment, and the lower limit of the rated voltage of the power equipment.
[0064] Specifically, a voltage over-limit degree equal to 0 indicates that the voltage of the power equipment is normal, and a value greater than 0 indicates that the voltage is exceeding its limit; in particular, a value ≥0.05 indicates an abnormal voltage, and the power equipment is marked as "voltage abnormality risk".
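A minimal sketch of a voltage over-limit measure consistent with the description above follows. The piecewise form (relative excess beyond the violated limit, zero inside the rated band) is an assumption, since the original formula is not reproduced here; the function names are hypothetical.

```python
def voltage_over_limit(u: float, u_min: float, u_max: float) -> float:
    """Degree to which the bus voltage lies outside [u_min, u_max].

    ASSUMPTION: relative excess beyond the violated limit; returns 0
    when the voltage is inside the rated band.
    """
    if u_min <= 0 or u_max < u_min:
        raise ValueError("expected 0 < u_min <= u_max")
    if u > u_max:
        return (u - u_max) / u_max
    if u < u_min:
        return (u_min - u) / u_min
    return 0.0


def is_voltage_abnormal(degree: float, threshold: float = 0.05) -> bool:
    """Apply the 0.05 marking threshold stated in the text."""
    return degree >= threshold
```

With a hypothetical rated band of 209 kV to 231 kV, a bus voltage of 250 kV gives a degree of about 0.082 and would be marked as "voltage abnormality risk".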
[0065] Furthermore, the environmental risk coefficient combines the impact of temperature and humidity on equipment operation, quantifies environmental risk by interval, and calculates the environmental risk coefficient of the power equipment using the following formula:
[0066] In the formula, the quantities are, in order: the environmental risk coefficient, the ambient temperature, and the ambient humidity.
[0067] In detail, a larger environmental risk coefficient indicates a stronger adverse impact of the environment on equipment operation. When the coefficient is 1.2, the power equipment is marked as "high environmental risk".
[0068] In this embodiment of the invention, multi-source operating condition characteristics can reflect the real-time operational risks of power equipment from multiple perspectives, providing basic information for subsequent generation of alarm and maintenance decision-making information.
[0069] S4. Use a pre-built real-time coarse classification model to extract the initial text features of the standard text data in the standard data, and use a preset collaborative decision-making logic to construct deep text features based on the initial text features.
[0070] In this embodiment of the method, the standard text data is the real-time alarm text that has been cleaned, the initial text features are the local statistical features corresponding to the standard text data, and the text depth features are the deep semantic vectors of the standard text data.
[0071] Specifically, the FastText real-time coarse classification model can be used to extract initial text features, and the pre-built Sentence-BERT (Sentence Bidirectional Encoder Representations from Transformers) model architecture can be used to extract deep text features.
[0072] In this embodiment of the invention, before extracting the initial text features of the standard text data in the standard data using a pre-built real-time coarse classification model, the method further includes: constructing a power-specific corpus from a predefined real-time classification corpus and a predefined deep semantic corpus; annotating the power-specific corpus to obtain an annotated corpus; and training the pre-constructed classification model and the pre-constructed deep semantic model using the annotated corpus to obtain the real-time coarse classification model and the deep feature extraction model.
[0073] In this embodiment of the invention, the real-time classification corpus is a corpus specifically used for the FastText real-time coarse classification model, and the deep semantic corpus is a corpus specifically used for the deep semantic model. The real-time classification corpus and the deep semantic corpus are corpora containing professional terms and parameter units in the power field.
[0074] In detail, the real-time classification corpus and the deep semantic corpus form a mutually exclusive and exhaustive partition, defined by the following formula:
[0075] Furthermore, the real-time classification corpus is annotated with the annotation dimensions "alarm level + fault type", using an annotation function given by the formula:
[0076] In the formula, the quantities are: a single text belonging to the real-time classification corpus, the corresponding alarm levels, the corresponding core fault types, the alarm level label of the text, and the fault type label of the text.
[0077] The deep semantic corpus is annotated at a fine-grained level with the annotation dimensions "fault type - equipment component - parameter deviation", using an annotation function given by the formula:
[0078] In the formula, the quantities are: the set of equipment components, the set of parameter deviations, and the equipment component label and parameter deviation label of the text, respectively.
[0079] In this embodiment of the invention, a pre-built classification model and a pre-built deep semantic model can be trained using an annotated corpus, and the difference between the predicted result and the corresponding annotation in the annotated corpus can be judged until the difference between the predicted result and the annotation meets the preset requirements, thereby obtaining a real-time coarse classification model for initial text feature extraction and a deep feature extraction model for deep text feature extraction.
[0080] Specifically, the step of extracting the initial text features of the standard text data from the standard data using a pre-built real-time coarse classification model includes: calculating the local statistical features of the standard text data using the real-time coarse classification model; calculating the embedding vectors corresponding to the local statistical features; and calculating the average of the embedding vectors and using the average as the initial text feature of the standard text data.
[0081] In detail, the real-time coarse classification model has a lightweight architecture comprising an input layer, an embedding layer, and a hierarchical softmax output layer. For given standard text data, n-gram features (local statistical features) of length 1 to 3 are generated. The local statistical features can be represented as:
[0082] In the formula, the quantities are: the local statistical features of the standard text data, the total number of words in the standard text data, the preset feature length, and the word at a given position in the standard text data.
[0083] Furthermore, the embedding layer maps the discrete n-gram local statistical features into continuous numerical vectors, yielding embedding vectors; each n-gram feature corresponds to one embedding vector. The text vector is the average of all n-gram embedding vectors, given by the formula:
[0084] In the formula, the quantities are: the initial text feature, the total number of features in the local statistical feature set, and the embedding vector corresponding to each local statistical feature.
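The n-gram generation and embedding averaging described above can be sketched as follows. This is an illustrative simplification: a plain dictionary stands in for the trained embedding layer, n-grams missing from the dictionary are skipped, and the function names and the sample tokens are hypothetical.

```python
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def word_ngrams(tokens: List[str], max_n: int = 3) -> List[NGram]:
    """Generate all word n-grams of length 1..max_n, in order of n."""
    grams: List[NGram] = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


def average_embedding(grams: List[NGram],
                      table: Dict[NGram, List[float]],
                      dim: int) -> List[float]:
    """Average the embedding vectors of the n-grams found in `table`.

    ASSUMPTION: `table` is a stand-in for the embedding layer; unknown
    n-grams are ignored and an all-zero vector is returned if none match.
    """
    vectors = [table[g] for g in grams if g in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
```

For a three-word alarm text, `word_ngrams` yields three unigrams, two bigrams, and one trigram; the averaged vector then serves as the initial text feature.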
[0085] In this embodiment of the invention, the deep semantic model is based on the lightweight Sentence-BERT architecture. Targeting the general semantic characteristics of power alarm texts, namely "fault type - equipment attribute - parameter association", a general pre-training task without temperature coefficient is designed to adapt to various equipment fault scenarios.
[0086] Specifically, positive sample pairs of "texts describing the same type of fault differently" and negative sample pairs of "texts describing different types of faults" are constructed based on the annotated corpus. The clustering and discriminability of the semantic vectors are optimized through a contrastive loss function, thereby achieving accurate semantic modeling of power alarm texts. The formula is:
[0087] In the formula, the quantities are: the text vector loss function value, the target text vector, the positive sample text vector, the set of all samples within the batch, and the cosine similarity function.
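A minimal sketch of a batch contrastive loss built from the quantities listed above is given below. The exact formula is not reproduced in this text; the InfoNCE-style form without a temperature coefficient is an assumption based on the description, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def contrastive_loss(anchor: Sequence[float],
                     positive: Sequence[float],
                     batch: List[Sequence[float]]) -> float:
    """-log( exp(sim(anchor, positive)) / sum_k exp(sim(anchor, batch_k)) ).

    ASSUMPTION: `batch` contains the positive sample and the negatives;
    no temperature coefficient is applied.
    """
    numerator = math.exp(cosine(anchor, positive))
    denominator = sum(math.exp(cosine(anchor, s)) for s in batch)
    return -math.log(numerator / denominator)
```

The loss is smaller when the target vector is closer to its positive sample than to the other samples in the batch, which is the clustering behaviour the paragraph describes.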
[0088] Using the text vector as input, a fully connected layer is used to predict a three-dimensional general label of "fault type - equipment component - parameter deviation". The loss function is a weighted fusion of multi-task cross-entropy losses, and the formula is as follows:
[0089] In the formula, the quantities are: the model's loss function value, the fault type classification loss, the equipment component localization loss, the equipment operating parameter deviation loss, and the weighting coefficients. The weighting coefficients are constrained so as to ensure balanced learning of the three-dimensional labels and to adapt to the semantic parsing needs of different equipment faults.
[0090] The AdamW optimizer is used when training the deep semantic model, and the semantic similarity accuracy and label prediction accuracy are monitored on the validation set, ensuring the deep semantic capture capability and yielding more accurate deep text features.
[0091] In this embodiment of the invention, the step of constructing deep text features based on the initial text features using a preset collaborative decision-making logic includes: performing classification prediction on the initial text features to obtain prediction probabilities, and determining the text confidence level of the standard text data based on the prediction probabilities; when the text confidence level is less than a preset confidence threshold, extracting the deep semantic vector corresponding to the standard text data using the pre-constructed deep feature extraction model, and constructing the deep text features based on the deep semantic vector and the initial text features; and when the text confidence level is greater than or equal to the confidence threshold, padding the initial text features to obtain the deep text features.
[0092] In this embodiment of the invention, a pre-constructed binary tree structure is used in place of the traditional Softmax method for fault category prediction on the initial text features. This transforms the classification task into path determination, reducing computational complexity. The category probability is given by the formula:
[0093] In the formula, the quantities are: the path from the root node of the binary tree structure to the predicted class c, the preset node weights, the preset node biases, and the Sigmoid function.
[0094] Specifically, in classification prediction, the binary tree structure model outputs the probability distribution over the categories to which the initial text features may belong, and the maximum probability is taken as the confidence level. If the confidence level is greater than or equal to 0.9 (high confidence), the classification result is output directly without further deep processing, ensuring real-time performance; if the confidence level is less than 0.9, the text is considered semantically ambiguous and is fed into the Sentence-BERT deep feature extraction model for deep analysis to improve semantic accuracy.
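The path-based category probability and the 0.9 confidence routing can be sketched as below. This follows the standard hierarchical softmax construction (a product of sigmoids along the root-to-leaf path); the exact formula of the source is not reproduced, so the `direction` sign convention, the data layout, and the function names are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One path step: (node weights, node bias, direction), direction is +1 or -1.
PathStep = Tuple[Sequence[float], float, int]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def path_probability(h: Sequence[float], path: List[PathStep]) -> float:
    """Probability of a class as the product of branch probabilities."""
    p = 1.0
    for weights, bias, direction in path:
        score = sum(w * x for w, x in zip(weights, h)) + bias
        p *= sigmoid(direction * score)
    return p


def route(class_probs: Dict[str, float],
          threshold: float = 0.9) -> Tuple[str, float, str]:
    """Pick the most probable class and decide which branch handles it."""
    label = max(class_probs, key=class_probs.get)
    confidence = class_probs[label]
    branch = "fast" if confidence >= threshold else "deep"
    return label, confidence, branch
```

With a single internal node, the two leaf probabilities sum to 1 because `sigmoid(s) + sigmoid(-s) = 1`; deeper trees preserve this property at every split.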
[0095] Furthermore, the deep feature extraction model first processes the standard text data through an electricity-specific tokenizer, adds the sequence start token "[CLS]" and the end token "[SEP]", pads it according to the maximum sequence length of the model (adapting to various alarm text lengths), and generates an attention mask (the mask value corresponding to the "[PAD]" padding token is 0, and the mask value corresponding to the valid semantic token is 1, ensuring that invalid tokens do not participate in attention calculation).
[0096] Next, hierarchical feature fusion is performed: the middle-layer and top-layer hidden states of the Transformer encoder in the deep feature extraction model are extracted. The middle-layer hidden state is used to capture local semantics (such as device components and fault behavior), and the top-layer hidden state is used to capture global semantics (such as the relationship among "fault type - equipment component - parameter"). The two are fused by weighting, and the formula is as follows:
[0097] In the formula, the quantities are: the fused semantic vector, the local semantics given by the middle-layer hidden state, the global semantics given by the top-layer hidden state, the preset local semantic weight, and the preset global semantic weight.
[0098] The optimal allocation of the two weights can be determined by validation on the power corpus.
[0099] Finally, the fused semantic vector is normalized: the hidden vector at the "[CLS]" position of the fused semantic vector (representing the global semantics of the entire sentence) is taken and L2-normalized to eliminate vector scale differences, generating a general deep semantic vector. The formula is as follows:
[0100] In the formula, the quantities are: the hidden vector of the "[CLS]" token, its L2 norm, and the resulting deep semantic vector.
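The weighted layer fusion and L2 normalization described above can be sketched in a few lines. The vectors here stand for the "[CLS]" hidden vectors of the middle and top layers; the weight values in the usage note and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def fuse_layers(h_mid: Sequence[float], h_top: Sequence[float],
                w_local: float, w_global: float) -> List[float]:
    """Element-wise weighted sum of middle-layer and top-layer vectors."""
    if len(h_mid) != len(h_top):
        raise ValueError("hidden vectors must have the same dimension")
    return [w_local * m + w_global * t for m, t in zip(h_mid, h_top)]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)
```

For instance, fusing `[6.0, 0.0]` and `[0.0, 8.0]` with equal weights of 0.5 gives `[3.0, 4.0]`, which normalizes to a unit vector along `[0.6, 0.8]`.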
[0101] In this embodiment of the invention, the deep semantic vector has the fine-grained semantic ability to distinguish different parameters of the same type of fault and similar expressions of different faults, and is suitable for various power equipment fault scenarios.
[0102] In this embodiment of the invention, when the text confidence level is greater than or equal to the preset confidence threshold, the collaborative decision-making logic determines that the text is semantically clear and directly uses the text vector generated by FastText. This vector is padded with zeros to the target dimension of the Sentence-BERT vector (ensuring dimensional consistency in subsequent feature fusion) to obtain the deep text features. The formula is:
[0103] In the formula, the quantities are: the Sentence-BERT vector dimension, the FastText vector dimension, and the zero vector of the corresponding dimension. This logic meets the real-time processing requirements of high-confidence text.
[0104] When the text confidence level is less than the preset confidence threshold, the standard text data is determined to be semantically ambiguous (e.g., ambiguous expressions, incomplete parameter descriptions). Dynamic weights are applied to the zero-padded FastText text vector, which is then fused by weighting with the Sentence-BERT deep vector to yield the final deep text features. The formula is:
[0105] In the formula, the quantities are the text confidence level and the preset base weight coefficients (dynamic compensation of the deep semantic weight ensures the parsing accuracy of semantically ambiguous text). In this embodiment of the invention, by constructing deep text features based on initial text features through collaborative decision-making logic, it is possible to achieve synergy between coarse classification results and deep semantics, thereby improving the accuracy of deep text feature extraction for alarm semantics.
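The two branches of the collaborative decision-making logic can be sketched as follows. Zero-padding follows the text directly. The low-confidence weighting is only an assumed form, since the source formula is not reproduced here: the FastText weight is taken as a base coefficient scaled by the confidence, and the remainder goes to the deep vector. The parameter `base_weight` and all function names are hypothetical.

```python
from typing import List, Optional, Sequence


def zero_pad(v: Sequence[float], target_dim: int) -> List[float]:
    """Pad a vector with trailing zeros up to target_dim."""
    if len(v) > target_dim:
        raise ValueError("vector is longer than the target dimension")
    return list(v) + [0.0] * (target_dim - len(v))


def deep_text_feature(fast_vec: Sequence[float],
                      deep_vec: Optional[Sequence[float]],
                      confidence: float,
                      target_dim: int,
                      threshold: float = 0.9,
                      base_weight: float = 0.4) -> List[float]:
    """High confidence: padded FastText vector. Low confidence: weighted blend.

    ASSUMPTION: blend weights are w_fast = base_weight * confidence and
    w_deep = 1 - w_fast, so lower confidence shifts weight to deep semantics.
    """
    padded = zero_pad(fast_vec, target_dim)
    if confidence >= threshold:
        return padded
    if deep_vec is None or len(deep_vec) != target_dim:
        raise ValueError("a deep vector of target_dim is required")
    w_fast = base_weight * confidence
    w_deep = 1.0 - w_fast
    return [w_fast * f + w_deep * d for f, d in zip(padded, deep_vec)]
```

In this sketch the deep model only needs to be run on the low-confidence branch, which is what keeps the high-confidence path fast.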
[0106] Preferably, the mean cosine similarity between the deep text features and the vectors of texts describing similar faults in a preset corpus is calculated to verify the effectiveness of the semantic expression. The similarity criterion is a configurable threshold, which needs to be set in combination with semantic accuracy requirements (such as the number of fault subcategories and the difficulty of distinguishing similar faults) and corpus representativeness (such as the semantic consistency of texts describing similar faults).
[0107] If the mean cosine similarity meets the threshold, the semantic vector of the deep text features is determined to be valid and is output for subsequent alarm aggregation; if it does not, the text re-parsing process is triggered and steps S1-S4 above are re-executed to ensure that the constructed deep text features accurately reflect the core meaning of the text.
[0108] S5. Perform feature weighted fusion on the health index, the health decay rate, the multi-source operating condition features, and the text depth features to obtain the fused feature vector of the power equipment.
[0109] In this embodiment of the invention, feature weighted fusion integrates the health index, the health decay rate, the multi-source operating condition features, and the text depth features into a unified and complementary feature space to obtain a fused feature vector containing health data, operating condition data, and alarm text feature information.
[0110] Specifically, the step of performing feature weighted fusion of the health index, the health decay rate, the multi-source operating condition features, and the text depth features to obtain the fused feature vector of the power equipment includes: constructing a health status feature based on the health index and the health decay rate; mapping the health status feature and the multi-source operating condition features to the feature dimension corresponding to the text depth features to obtain the mapped health status feature and the mapped multi-source operating condition features; standardizing the mapped health status feature, the mapped multi-source operating condition features, and the text depth features to obtain standardized health features, standardized operating condition features, and standardized text depth features; calculating the fusion weights of the standardized health features, the standardized operating condition features, and the standardized text depth features respectively; and performing weighted fusion of the standardized health features, standardized operating condition features, and standardized text depth features using the fusion weights to obtain the fused feature vector of the power equipment.
[0111] In this embodiment of the invention, the health index and health decay rate are used as feature values to form a feature vector, resulting in a two-dimensional health status feature of the power equipment. This two-dimensional health status feature and the three-dimensional multi-source operating condition feature (real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient) are mapped through a fully connected layer to the target dimension of the text depth features, ensuring consistency across the three feature dimensions.
[0112] Specifically, the health status feature and the multi-source working condition feature can be mapped to the feature dimension corresponding to the text depth feature using the following formula:
[0113]
[0114] In the formula, the quantities are: the mapped health status feature, the Sigmoid activation function, the preset first mapping weight matrix, the health status feature, the preset first mapping bias term, the mapped multi-source operating condition feature, the preset second mapping weight matrix, the multi-source operating condition feature, and the preset second mapping bias term.
[0115] Furthermore, feature standardization eliminates differences in numerical scales. Specifically, feature standardization can be performed using the following formula:
[0116] In the formula, the quantities are: the standardized feature (standardized health features, standardized operating condition features, or standardized text depth features), the input feature (the mapped health status feature, the mapped multi-source operating condition features, or the text depth features), the corresponding feature mean, and the corresponding standard deviation.
[0117] Specifically, when performing feature standardization, the mean and standard deviation of each type of feature can be obtained from the power historical corpus, based on statistical data of normal operating conditions and faults over the past 3 years. This keeps the standardized feature values on a comparable scale, distributed approximately within the range [-1,1], thus avoiding the dominance of the fusion result by a certain type of feature due to its excessively large numerical magnitude.
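The standardization step can be sketched as a z-score computed from historical statistics. The use of the population standard deviation and the function names are assumptions. Note that a z-score centres values and puts them on a common scale, but individual values can still fall outside [-1, 1].

```python
import statistics
from typing import List, Sequence, Tuple


def history_stats(history: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of historical feature values."""
    return statistics.fmean(history), statistics.pstdev(history)


def standardize(values: Sequence[float], mean: float, std: float) -> List[float]:
    """z-score: (x - mean) / std, using statistics from the historical corpus."""
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    return [(x - mean) / std for x in values]
```

In use, `history_stats` would be evaluated once per feature type on the historical data, and `standardize` applied to each new feature vector with those fixed statistics.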
[0118] In this embodiment of the invention, a preset scenario adaptation coefficient (with a value in [0,1]) and an equipment importance coefficient are introduced to calculate the fusion weights of the standardized health features, standardized operating condition features, and standardized text depth features. Specifically, the fusion weights can be calculated using the following formula:
[0119] In the formula, the quantities are: the fusion weight of the standardized health features, the preset equipment importance coefficient, the preset scenario adaptation coefficient, the fusion weight of the standardized operating condition features, and the fusion weight of the standardized text depth features.
[0120] Specifically, the three fusion weights jointly satisfy a preset constraint. The scenario adaptation coefficient is set to 0.7 during periods of high failure incidence (such as thunderstorms and high temperatures), focusing on health (inherent equipment risk) and operating conditions (real-time operational risk); to 0.3 during normal operation and maintenance, focusing on operating conditions (energy consumption optimization) and text semantics (alarm credibility); and to 0.5 during equipment maintenance, balancing the contributions of the three types of features. The equipment importance coefficient takes values in [0,1] and is a quantitative indicator of the status, role, and influence of the power equipment in a specific system; it can be obtained according to the actual scenario. For example, the coefficient can be set to ≥0.8 for core equipment such as the main transformer and to ≤0.5 for auxiliary equipment such as disconnect switches.
[0121] In this embodiment of the invention, the standardized health feature, the standardized working condition feature, and the standardized text depth feature are weighted and summed by fusion weights to obtain a fused feature vector.
[0122] Specifically, the fused feature vector is represented as:
[0123] In the formula, the result is the fused feature vector.
[0124] Preferably, the mean cosine similarity between the fused feature vector and historical fault feature vectors can be calculated. If this similarity meets a configurable fusion validity threshold (set according to the fault type discrimination requirements), the fusion is determined to be valid; otherwise, the dynamic weights are readjusted (for example, by increasing the health or operating condition weights) until the validity requirement is met, thus avoiding false risk judgments caused by invalid fusion.
[0125] In this embodiment of the invention, by calculating the fusion feature vector of power equipment, the feature contribution can be dynamically adjusted according to different scenarios such as high-incidence period of faults and normal operation and maintenance period, so as to realize the dynamic adaptation of feature information.
[0126] S6. Calculate the risk level of the power equipment based on the fused feature vector, aggregate and group the power equipment according to the risk level to obtain multiple aggregate groups, and generate alarm operation and maintenance decision information for each aggregate group.
[0127] In this embodiment of the invention, the risk level is the operational risk of the power equipment, which may include four levels: emergency / high / medium / low risk.
[0128] Specifically, calculating the risk level of the power equipment based on the fused feature vector includes: calculating the fault propagation probability of the power equipment; calculating the risk value of the power equipment based on the fault propagation probability and the fused feature vector; and determining the risk level of the power equipment based on the risk value.
[0129] In this embodiment of the invention, the probability of fault propagation is a core indicator for quantifying the likelihood of a power equipment failure affecting surrounding equipment, systems, or regional power grids. The closer the connection between the faulty equipment and other equipment, and the more vulnerable the surrounding equipment, the higher the probability of propagation.
[0130] Specifically, the power system in which the power equipment is located is determined and, given the number of devices in the system, a topological matrix is constructed. If an element of the matrix is 1, the two corresponding devices have a direct electrical connection or functional dependency; if the element is 0, the two devices are not connected.
[0131] The number of devices associated with the faulty device (the number of "1"s in the corresponding row of the matrix) is counted; the ratio of this count to the total number of devices in the system is the topological affinity:
[0132] The vulnerability of peripheral devices is then computed as the complement of the average health index HI of the peripheral (topologically associated) devices (the lower the HI, the more fragile the equipment and the more easily it is affected by the spread).
[0133]
[0134] In the formula, the quantities are: the vulnerability of surrounding equipment, the number of topologically associated devices, and the health index of each topologically associated device.
[0135] Finally, the fault propagation probability is calculated by introducing a preset topology influence coefficient (fixed at 0.8 in the power scenario, since topology is the core carrier of diffusion). The formula is:
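The three quantities described above can be sketched as follows. Topological affinity (row count of 1s divided by the number of devices) and vulnerability (complement of the mean neighbour health index) follow the text, assuming the health index is scaled to [0, 1]. The way the two are combined is an assumption, since the source formula is not reproduced here: a convex combination with the topology influence coefficient 0.8 is used for illustration. Function names are hypothetical.

```python
from typing import List, Sequence


def topology_affinity(adjacency: Sequence[Sequence[int]], i: int) -> float:
    """Number of devices linked to device i divided by the device count."""
    return sum(adjacency[i]) / len(adjacency)


def neighbor_vulnerability(adjacency: Sequence[Sequence[int]],
                           health: Sequence[float], i: int) -> float:
    """1 - mean health index of the neighbours of device i (HI in [0, 1])."""
    neighbors: List[int] = [j for j, a in enumerate(adjacency[i]) if a == 1]
    if not neighbors:
        return 0.0
    return 1.0 - sum(health[j] for j in neighbors) / len(neighbors)


def propagation_probability(adjacency: Sequence[Sequence[int]],
                            health: Sequence[float], i: int,
                            topo_coeff: float = 0.8) -> float:
    """ASSUMED form: topo_coeff * affinity + (1 - topo_coeff) * vulnerability."""
    return (topo_coeff * topology_affinity(adjacency, i)
            + (1.0 - topo_coeff) * neighbor_vulnerability(adjacency, health, i))
```

For a four-device system in which the faulty device is linked to two neighbours with health indices 0.6 and 0.8, the affinity is 0.5, the vulnerability is 0.3, and the assumed combination gives 0.46.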
[0136] Furthermore, the risk value of the power equipment is calculated using the following formula:
[0137] In the formula, the quantities are: the risk value, three preset risk contribution coefficients, the L2 norm of the fused feature vector, the preset equipment importance coefficient, and the fault propagation probability.
[0138] Furthermore, configurable risk thresholds can be set to categorize the risk value into four levels, each corresponding to a different operation and maintenance response strategy.
[0139]
[0140] In the formula, the quantities are: the risk level; the emergency, high, medium, and low risk levels; the risk value; and the risk thresholds corresponding to the emergency, high, and medium risk levels.
[0141] In this embodiment of the invention, aggregation grouping divides the power equipment according to risk level. For example, power equipment at the emergency risk level, the high risk level, and the medium risk level is placed into a separate aggregation group for each level, yielding the aggregation groups corresponding to emergency-risk, high-risk, and medium-risk power equipment.
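The threshold-based level assignment and the grouping by level can be sketched as below. The threshold values (0.8, 0.6, 0.4), the inclusive comparisons, and the device names in the usage note are hypothetical, since the source only states that the thresholds are configurable.

```python
from collections import defaultdict
from typing import Dict, List


def risk_level(value: float, t_emergency: float = 0.8,
               t_high: float = 0.6, t_medium: float = 0.4) -> str:
    """Map a risk value to one of four levels (thresholds are placeholders)."""
    if value >= t_emergency:
        return "emergency"
    if value >= t_high:
        return "high"
    if value >= t_medium:
        return "medium"
    return "low"


def aggregate_by_level(risk_values: Dict[str, float]) -> Dict[str, List[str]]:
    """Group device identifiers by their risk level."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for device, value in risk_values.items():
        groups[risk_level(value)].append(device)
    return dict(groups)
```

For example, `aggregate_by_level({"T1": 0.85, "GIS-2": 0.65, "CB-3": 0.62})` would place `T1` in the emergency group and the other two devices together in the high-risk group.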
[0142] Furthermore, the alarm operation and maintenance decision information includes the average risk (weighted by device importance), maximum risk value, device coverage, adaptation suggestions, hierarchical operation and maintenance instructions, and fault type for each aggregation group.
[0143] Specifically, a unique identifier (such as high-risk level) is determined based on the risk level of the power equipment in the aggregation group. A pre-built fault classification module (such as a fully connected layer) is used to classify the fused feature vectors to obtain the fault category of each power equipment. Then, it is connected to the power fault case library. The scenario similarity is calculated based on the equipment type, fault behavior and risk level of the power equipment. Qualified cases are selected and sorted by "similarity + execution effect". The best case is selected and combined with the current equipment health and operating condition to fine-tune the handling plan to form an adaptation suggestion.
[0144] Commands are matched according to the average risk of the aggregated group: emergency risks generate shutdown / standby start commands (response ≤ 5 minutes), high risks generate 24-hour maintenance commands, and medium risks generate 72-hour inspection commands; after the command is executed, the status recovery rate is calculated in the verification window (emergency ≤ 1 hour, high risk ≤ 6 hours, medium risk ≤ 24 hours), cases that meet the standard are recorded, and commands that do not meet the standard are readjusted.
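The command matching in the preceding paragraph amounts to a lookup from risk level to instruction and verification window. A minimal sketch follows; the data structure, field names, and the recovery-rate target are hypothetical, while the time limits are those stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Instruction:
    action: str
    time_limit: str
    verification_window_hours: int


INSTRUCTIONS: Dict[str, Instruction] = {
    "emergency": Instruction("shutdown / standby start", "response within 5 minutes", 1),
    "high": Instruction("maintenance", "within 24 hours", 6),
    "medium": Instruction("inspection", "within 72 hours", 24),
}


def match_instruction(level: str) -> Optional[Instruction]:
    """Return the instruction for a risk level, or None if none is defined."""
    return INSTRUCTIONS.get(level)


def needs_readjustment(recovery_rate: float, target: float) -> bool:
    """ASSUMPTION: a command is readjusted when the status recovery rate
    measured in the verification window is below a configured target."""
    return recovery_rate < target
```

Levels without a defined instruction, such as low risk in this sketch, simply return `None`.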
[0145] In detail, the unique identifier of each aggregation group, the fault type of the power equipment, the risk level, the risk indicators, the core information of the associated alarms, and the operation and maintenance instructions are collected to obtain alarm operation and maintenance decision information.
[0146] In this embodiment of the invention, alarm operation and maintenance decision information can be used to construct a three-in-one fusion architecture of "health status - multi-source operating conditions - text semantics", linking historical fault cases and operation and maintenance procedures to obtain directly executable hierarchical operation and maintenance instructions, thereby realizing the accurate aggregation of alarm information and the closed-loop implementation of operation and maintenance decisions.
[0147] For example, a 220kV smart substation in a certain region was selected as the test object. This substation includes core equipment such as main transformers, GIS equipment, and high-voltage circuit breakers. Multi-source sensors were configured to collect full lifecycle health data, real-time operating condition data, and alarm text data. An experimental platform was built and targeted test scenarios were designed. In terms of the experimental environment, the central control center server used an Intel Xeon Gold 6330 processor, the edge acquisition nodes used Intel Core i7-12700 processors and 32GB of memory, and the communication link was a 5G+fiber hybrid network. The software was based on the Ubuntu 20.04 system, equipped with the PyTorch 1.12 framework and the HuggingFace Transformers toolkit. The model parameters were set to LSTM-RUL with 64 hidden units, Dropout probability of 0.2, FastText embedding layer dimension of 128, and Sentence-BERT as the base architecture. The test covers four scenarios: single fault, multiple fault coupling, implicit performance degradation, and normal operation and maintenance. 1,000 samples were generated for each scenario, for a total of 4,000 data sets. The data includes equipment health data from the past 6 months, real-time operating condition data, and 8,000 standard alarm texts. The data is divided into a training set (2,800 sets) and a test set (1,200 sets) in a 7:3 ratio to ensure the generalizability of the test.
[0148] Furthermore, the recognition accuracy (%) of the method in this application compared with traditional keyword clustering, single-modal semantic methods, and cloud-edge collaborative methods under different fault scenarios is shown in Table 1 below: Table 1
[0149] Therefore, this application has significant advantages over the compared methods in various fault scenarios, especially in scenarios involving latent faults and multiple coupled faults. The core reason is that it integrates the health status of the entire equipment lifecycle and the characteristics of multiple operating conditions, avoiding the "indiscriminate classification" of traditional methods. Furthermore, the FastText-Sentence-BERT hybrid architecture enables a deep understanding of power-specific semantics, solving the problem of insufficient ability of single-modal semantic methods to distinguish similar fault descriptions.
[0150] Furthermore, the fault diagnosis response latency of the method of this invention was compared with that of traditional keyword clustering, single-modal semantic methods, and cloud-edge collaborative methods under different data scenarios. The results are shown in Figure 2.
[0151] Figure 3 is a functional block diagram of an alarm aggregation processing system for a substation centralized control system provided in an embodiment of the present invention.
[0152] The substation centralized control system alarm aggregation processing system 100 of the present invention can be installed in a processing device. Depending on the functions implemented, the substation centralized control system alarm aggregation processing system 100 may include a data standardization processing module 101, a health index and health decay rate calculation module 102, a multi-source operating condition feature calculation module 103, a text deep feature construction module 104, a fusion feature vector generation module 105, and an alarm operation and maintenance decision information generation module 106. The module described in this invention can also be called a unit, which refers to a series of computer program segments that can be executed by the processor of an electronic device and can perform a fixed function, and are stored in the memory of the electronic device.
[0153] In this embodiment, the functions of each module/unit are as follows:
The data standardization processing module 101 is used to collect the full life cycle health data, real-time operating condition data, and real-time alarm text of the power equipment, and to perform data standardization processing on the full life cycle health data, the real-time operating condition data, and the real-time alarm text respectively to obtain standard data.
The health index and health decay rate calculation module 102 is used to calculate the remaining service life of the power equipment based on the first normalized data in the standard data, and to calculate the health index and health decay rate of the power equipment based on the remaining service life.
The multi-source operating condition feature calculation module 103 is used to calculate the real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient of the power equipment based on the second normalized data in the standard data, so as to obtain the multi-source operating condition features.
The text deep feature construction module 104 is used to extract the initial text features of the standard text data in the standard data using a pre-built real-time coarse classification model, and to construct the deep text features based on the initial text features using a preset collaborative decision logic.
The fusion feature vector generation module 105 is used to perform feature-weighted fusion of the health index, the health decay rate, the multi-source operating condition features, and the deep text features to obtain the fused feature vector of the power equipment.
The alarm operation and maintenance decision information generation module 106 is used to calculate the risk level of the power equipment based on the fused feature vector, aggregate and group the power equipment according to the risk level to obtain multiple aggregate groups, and generate alarm operation and maintenance decision information for each aggregate group.
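The fusion and grouping performed by modules 105 and 106 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the cyclic mapping in `map_to_dim`, the softmax over externally supplied `relevance` scores, the linear `scorer` with a logistic output, and the thresholds 0.3 and 0.7 are all placeholders chosen for illustration, since this section does not specify the mapping, the weight calculation, or the risk formula.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def map_to_dim(vec, dim):
    """Placeholder mapping: repeat the values cyclically up to `dim` entries."""
    return [vec[i % len(vec)] for i in range(dim)]


def standardize(vec):
    """Z-score a vector; a constant vector maps to all zeros."""
    mu, sigma = mean(vec), pstdev(vec)
    if sigma == 0:
        return [0.0] * len(vec)
    return [(x - mu) / sigma for x in vec]


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(health, condition, text, relevance):
    """Map, standardize, and weight the three feature groups into one vector."""
    dim = len(text)
    groups = [
        standardize(map_to_dim(health, dim)),
        standardize(map_to_dim(condition, dim)),
        standardize(text),
    ]
    weights = softmax(relevance)  # assumed source of the fusion weights
    return [sum(w * g[i] for w, g in zip(weights, groups)) for i in range(dim)]


def risk_level(fused, scorer, thresholds=(0.3, 0.7)):
    """Assumed risk model: logistic of a linear score, cut into three levels."""
    score = 1.0 / (1.0 + math.exp(-sum(s * f for s, f in zip(scorer, fused))))
    low, high = thresholds
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"


def group_by_risk(level_by_device):
    """Aggregate devices that share a risk level into one group."""
    groups = defaultdict(list)
    for device, level in level_by_device.items():
        groups[level].append(device)
    return dict(groups)
```

Here `health` would hold the health index and health decay rate, `condition` the three operating condition features, and `text` the deep text features; decision information would then be generated once per group returned by `group_by_risk` rather than once per alarm.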
[0154] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0155] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this invention, "a plurality of" means at least two, such as two, three, etc., unless otherwise explicitly specified.
[0156] The above embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. An alarm aggregation processing method for a substation centralized control system, characterized in that the method comprises:
collecting full life cycle health data, real-time operating condition data, and real-time alarm text of power equipment, and performing data standardization processing on the full life cycle health data, the real-time operating condition data, and the real-time alarm text to obtain standard data;
calculating the remaining service life of the power equipment based on the first normalized data in the standard data, and calculating the health index and health decay rate of the power equipment based on the remaining service life;
calculating the real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient of the power equipment based on the second normalized data in the standard data to obtain multi-source operating condition features;
extracting initial text features of the standard text data in the standard data using a pre-built real-time coarse classification model, and constructing deep text features based on the initial text features using a preset collaborative decision logic;
performing feature-weighted fusion of the health index, the health decay rate, the multi-source operating condition features, and the deep text features to obtain a fused feature vector of the power equipment; and
calculating the risk level of the power equipment based on the fused feature vector, aggregating and grouping the power equipment according to the risk level to obtain multiple aggregate groups, and generating alarm operation and maintenance decision information for each aggregate group.
2. The alarm aggregation processing method for a substation centralized control system as described in claim 1, characterized in that performing data standardization processing on the full life cycle health data, the real-time operating condition data, and the real-time alarm text to obtain standard data comprises:
calculating a data integrity index for the full life cycle health data and for the real-time operating condition data respectively;
when the data integrity index is less than a preset integrity threshold, resampling the data to obtain resampled data, and normalizing the resampled data to obtain normalized data;
when the data integrity index is greater than or equal to the integrity threshold, normalizing the full life cycle health data and the real-time operating condition data to obtain first normalized data and second normalized data;
cleaning the real-time alarm text to obtain standard text data; and
combining the first normalized data, the second normalized data, and the standard text data to obtain the standard data.
3. The alarm aggregation processing method for a substation centralized control system as described in claim 1, characterized in that calculating the remaining service life of the power equipment based on the first normalized data in the standard data comprises:
extracting health indicator data from the first normalized data;
extracting health indicator features from the health indicator data using a pre-trained remaining service life prediction model; and
calculating the remaining service life of the power equipment based on the health indicator features.
4. The alarm aggregation processing method for a substation centralized control system as described in claim 1, characterized in that calculating the health index and health decay rate of the power equipment based on the remaining service life comprises:
calculating the average value of the health indicators of the power equipment;
calculating the health index using a preset health index formula based on the average value of the health indicators and the remaining service life; and
calculating the health decay rate of the power equipment based on the health index.
5. The alarm aggregation processing method for a substation centralized control system as described in claim 1, characterized in that calculating the real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient of the power equipment based on the second normalized data in the standard data to obtain multi-source operating condition features comprises:
calculating the real-time load deviation rate of the power equipment based on the real-time load rate in the second normalized data;
calculating the voltage over-limit degree of the power equipment based on the bus voltage in the second normalized data;
calculating the environmental risk coefficient of the power equipment based on the ambient temperature and humidity in the second normalized data; and
combining the real-time load deviation rate, the voltage over-limit degree, and the environmental risk coefficient to obtain the multi-source operating condition features.
6. The alarm aggregation processing method for a substation centralized control system as described in claim 1, characterized in that constructing deep text features based on the initial text features using a preset collaborative decision logic comprises:
performing classification prediction on the initial text features to obtain a prediction probability, and determining the text confidence of the standard text data based on the prediction probability;
when the text confidence is less than a preset confidence threshold, extracting the deep semantic vector corresponding to the standard text data using a pre-built deep feature extraction model, and constructing the deep text features based on the deep semantic vector and the initial text features; and
when the text confidence is greater than or equal to the confidence threshold, filling the initial text features with special features to obtain the deep text features.
7. The alarm aggregation processing method for a substation centralized control system as described in claim 1, characterized in that performing feature-weighted fusion of the health index, the health decay rate, the multi-source operating condition features, and the deep text features to obtain the fused feature vector of the power equipment comprises:
constructing a health status feature based on the health index and the health decay rate;
mapping the health status feature and the multi-source operating condition features to the feature dimension of the deep text features to obtain a mapped health status feature and mapped multi-source operating condition features;
standardizing the mapped health status feature, the mapped multi-source operating condition features, and the deep text features to obtain a standardized health status feature, standardized operating condition features, and standardized deep text features;
calculating the fusion weights of the standardized health status feature, the standardized operating condition features, and the standardized deep text features respectively; and
performing weighted fusion of the standardized health status feature, the standardized operating condition features, and the standardized deep text features using the fusion weights to obtain the fused feature vector of the power equipment.
8. An alarm aggregation processing system for a substation centralized control system, characterized in that the system comprises:
a data standardization processing module, used to collect the full life cycle health data, real-time operating condition data, and real-time alarm text of power equipment, and to perform data standardization processing on the full life cycle health data, the real-time operating condition data, and the real-time alarm text respectively to obtain standard data;
a health index and health decay rate calculation module, used to calculate the remaining service life of the power equipment based on the first normalized data in the standard data, and to calculate the health index and health decay rate of the power equipment based on the remaining service life;
a multi-source operating condition feature calculation module, used to calculate the real-time load deviation rate, voltage over-limit degree, and environmental risk coefficient of the power equipment based on the second normalized data in the standard data, so as to obtain the multi-source operating condition features;
a text deep feature construction module, used to extract the initial text features of the standard text data in the standard data using a pre-built real-time coarse classification model, and to construct the deep text features based on the initial text features using a preset collaborative decision logic;
a fusion feature vector generation module, used to perform feature-weighted fusion of the health index, the health decay rate, the multi-source operating condition features, and the deep text features to obtain the fused feature vector of the power equipment; and
an alarm operation and maintenance decision information generation module, used to calculate the risk level of the power equipment based on the fused feature vector, aggregate and group the power equipment according to the risk level to obtain multiple aggregate groups, and generate alarm operation and maintenance decision information for each aggregate group.
9. A processing device, characterized in that it comprises at least one processor and at least one memory communicatively connected to the processor, wherein the memory stores program instructions executable by the processor, and the processor can execute the method as described in any one of claims 1-7 by invoking the program instructions.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions that cause a computer to perform the method as described in any one of claims 1-7.
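For illustration, the confidence-gated construction of deep text features recited in claim 6 can be sketched as follows. Several points are assumptions rather than statements of the claimed method: the text confidence is taken as the largest softmax probability of the coarse classifier, the deep semantic vector is concatenated to the initial text features, and the "special features" used for filling are taken to be zeros so that both branches return vectors of the same length. The names `build_deep_text_features`, `deep_extractor`, and `deep_dim`, and the threshold 0.8, are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_deep_text_features(initial, logits, deep_extractor, deep_dim, threshold=0.8):
    """Return (features, confidence, used_deep_model).

    initial        -- initial text features from the coarse classification model
    logits         -- raw class scores of the coarse model for the same text
    deep_extractor -- zero-argument callable returning the deep semantic vector;
                      it is only called when the coarse model is not confident
    deep_dim       -- expected length of the deep semantic vector
    """
    confidence = max(softmax(logits))  # assumed definition of text confidence
    if confidence < threshold:
        deep = list(deep_extractor())
        if len(deep) != deep_dim:
            raise ValueError("deep semantic vector has unexpected length")
        return list(initial) + deep, confidence, True
    # Confident case: skip the expensive model and fill with zeros instead.
    return list(initial) + [0.0] * deep_dim, confidence, False
```

Passing the deep model as a callable means that it runs only for low-confidence alarm texts, which is the point of the gate: clear-cut alarms take the fast path, and ambiguous ones take the slower, more expressive path.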