Power equipment fault early warning analysis system based on fusion of machine learning and k-means algorithm
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHENHUA GUOHUA ZHOUSHAN POWER GENERATION CO LTD
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-19
Figure CN122243211A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of power equipment monitoring technology, and in particular to a power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm.
Background Technology
[0002] Existing power equipment fault early warning systems mostly rely on rule-based threshold settings or single-model judgments, which struggle to fuse and process multi-source heterogeneous data and to identify nonlinear fault characteristics in complex operating environments.
[0003] Power equipment generates a large amount of dynamic monitoring data during operation, such as voltage, current, temperature, and vibration. This data is characterized by high dimensionality, strong temporal dependence, and significant noise interference. Traditional data processing methods have limitations in feature extraction and state modeling. Existing fault diagnosis methods mostly use a single supervised learning model or empirical rules for judgment, and do not make effective use of the complementary information of multiple models. As a result, when facing unknown fault types or scarce data labels, the system's identification ability is insufficient and its false-alarm and missed-detection rates are high.
Summary of the Invention
[0004] The purpose of this invention is to provide a power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm, so as to solve the problems mentioned in the background art.
[0005] To achieve the above objectives, the present invention provides the following technical solution: a power equipment fault early warning and analysis system based on the fusion of machine learning and K-means algorithm, comprising:
[0006] a data feature construction module, configured to collect multi-dimensional dynamic monitoring data during power equipment operation, construct fault features, and generate multi-dimensional feature vectors; a multi-model fusion analysis module, configured to construct a model set including multiple sub-models, perform high-dimensional feature response analysis on the target equipment's state, and dynamically update the contribution weights of each sub-model in the fusion output; an unsupervised clustering auxiliary module, configured to perform unsupervised clustering analysis on power equipment operating status data using the K-means clustering algorithm; a fault trend prediction module, configured to construct a multi-dimensional time series prediction model based on the fault risk probability sequence of the fusion output, and assess the risk evolution trend of power equipment in future operating cycles; and a risk classification and early warning module, configured to dynamically classify and issue early warnings for risk levels based on the output results of the fault trend prediction module and the degree of deviation of the equipment's current state.
[0007] Furthermore, the data feature construction module is configured as follows:
[0008] The system receives multi-dimensional dynamic monitoring data from power equipment, including voltage, current, temperature, vibration frequency, and load changes. It then performs multi-dimensional feature expansion on the dynamic monitoring data, constructs a set of nonlinear mapping functions, and performs expansion transformation on the dynamic monitoring data based on a combination strategy of polynomial transformation, exponential function mapping, and ratio function.
[0009] Based on the high-order cross-combination strategy, variables with potential correlations between different dimensions in dynamic monitoring data are combined to form a high-order multidimensional feature vector.
[0010] The constructed multidimensional feature vectors are subjected to time series encoding to extract temporal features that reflect the trend of state changes during equipment operation.
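By way of non-limiting illustration, the feature expansion, cross-combination, and time series encoding described above might be sketched as follows. The function names, the choice of a squared term and a bounded exponential, and the use of first differences and rolling means as the time series encoding are assumptions made for this sketch and are not specified in the source.

```python
import math
from itertools import combinations


def expand_features(sample, eps=1e-9):
    """Expand one monitoring sample (name -> value) with nonlinear and cross terms."""
    features = dict(sample)
    names = sorted(sample)
    for name in names:
        x = sample[name]
        features[f"{name}^2"] = x * x                     # polynomial transformation
        features[f"exp(-|{name}|)"] = math.exp(-abs(x))   # bounded exponential mapping
    for a, b in combinations(names, 2):
        features[f"{a}*{b}"] = sample[a] * sample[b]      # cross-combination term
        # Ratio term; falls back to 0.0 when the denominator is (near) zero.
        features[f"{a}/{b}"] = sample[a] / sample[b] if abs(sample[b]) > eps else 0.0
    return features


def encode_time_series(rows, window=3):
    """Append a first difference and a rolling mean for every feature in each row."""
    encoded = []
    for t, row in enumerate(rows):
        out = dict(row)
        start = max(0, t - window + 1)
        for name, value in row.items():
            previous = rows[t - 1][name] if t > 0 else value
            out[f"d_{name}"] = value - previous
            recent = [r[name] for r in rows[start:t + 1]]
            out[f"mean{window}_{name}"] = sum(recent) / len(recent)
        encoded.append(out)
    return encoded
```

For example, `expand_features({"voltage": 2.0, "current": 4.0})` adds keys such as `voltage^2` and `current*voltage` to the two raw readings.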
[0011] Furthermore, the multi-model fusion analysis module includes:
[0012] The model set construction unit is configured to initialize a model set including multiple sub-models based on the multi-dimensional feature vector output by the data feature construction module. The sub-models include a supervised learning model and a semi-supervised learning model. The supervised learning model is trained based on existing labeled data and learns the mapping relationship between equipment failure modes and features. The semi-supervised learning model combines labeled data and unlabeled samples and learns the latent representation structure through graph structure propagation.
[0013] The sub-models are trained using training and validation sets constructed from historical fault record samples. Multidimensional performance evaluation metrics are set to assess the training effect of the sub-models. After the model set training is completed, the state response results of each sub-model to the input samples are output.
[0014] The model fusion unit is configured to receive prediction outputs from multiple sub-models from the model set construction unit. Based on the multi-model collaborative analysis framework, it performs weighted fusion of the sub-models. During the fusion process, a mutual information metric between models is introduced to quantify the information redundancy and complementarity between the output results of different models. The error statistics of each sub-model in historical predictions are calculated, and an adaptive weight adjustment strategy based on the Bayesian optimization principle is adopted to dynamically update the fusion weights of each sub-model.
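By way of non-limiting illustration, the sketch below computes mutual information between the discretized outputs of two sub-models and derives fusion weights from historical errors. The source does not give the Bayesian optimization procedure, so a simpler inverse-error weighting discounted by average mutual information is swapped in here purely as a stand-in; all function names and the discount rule are assumptions.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete label sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def fusion_weights(errors, outputs=None, eps=1e-9):
    """Normalized weights: lower historical error and lower redundancy give more weight.

    errors  -- mean historical prediction error of each sub-model
    outputs -- optional discretized output sequence of each sub-model
    This is a stand-in rule, not the Bayesian optimization named in the text.
    """
    n = len(errors)
    scores = []
    for i in range(n):
        score = 1.0 / (errors[i] + eps)
        if outputs is not None and n > 1:
            redundancy = sum(
                mutual_information(outputs[i], outputs[j]) for j in range(n) if j != i
            ) / (n - 1)
            score /= 1.0 + redundancy  # assumed discount for redundant models
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores]


def fuse(probabilities, weights):
    """Weighted fusion of the sub-models' fault risk probabilities."""
    return sum(p * w for p, w in zip(probabilities, weights))
```

Under this rule, two sub-models that always agree share their influence, while a sub-model carrying independent information keeps a larger weight at equal error.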
[0015] Furthermore, the multi-model fusion analysis module also includes:
[0016] An anomaly detection feedback unit is configured to identify anomalies during the sub-model fusion process. The anomalies include significant discrepancies between model prediction results, unstable fluctuations in fusion output, and low-confidence output. When an anomaly is detected, the corresponding sample is marked as an augmented sample and fed back to the model set construction unit to participate in the subsequent iterative training process.
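By way of non-limiting illustration, the three anomaly conditions named above could be checked as in the following sketch. The thresholds, the use of the output's distance from 0.5 as a confidence proxy, and the `augmented_samples` buffer are assumptions for this example only.

```python
from statistics import pstdev

augmented_samples = []  # hypothetical buffer fed back to the model set construction unit


def detect_fusion_anomaly(sub_probs, fused_history,
                          spread_limit=0.3, jitter_limit=0.2, margin=0.1):
    """Return the list of anomaly reasons for the latest fusion step.

    sub_probs     -- fault probabilities predicted by each sub-model for one sample
    fused_history -- recent fused outputs, the last entry being the current one
    """
    reasons = []
    if max(sub_probs) - min(sub_probs) > spread_limit:
        reasons.append("disagreement")      # sub-models differ significantly
    if len(fused_history) >= 2 and pstdev(fused_history) > jitter_limit:
        reasons.append("unstable")          # fused output fluctuates strongly
    if abs(fused_history[-1] - 0.5) < margin:
        reasons.append("low_confidence")    # fused output is close to undecided
    return reasons


def feed_back(sample, sub_probs, fused_history):
    """Mark the sample as augmented when any anomaly is detected."""
    reasons = detect_fusion_anomaly(sub_probs, fused_history)
    if reasons:
        augmented_samples.append((sample, reasons))
    return reasons
```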
[0017] Furthermore, the unsupervised clustering auxiliary module includes:
[0018] Cluster initialization unit, configured as follows:
[0019] The multidimensional feature vectors generated by the data feature construction module are preprocessed and used for multi-centroid initialization of the K-means clustering algorithm. From the statistical distribution of each dimension in the feature space, multiple representative initial cluster centers are calculated. Based on the feature distribution density and sample distance relationship, feature standardization and outlier correction are performed. The preprocessed multidimensional feature vector samples are used as input for the K-means clustering calculation.
[0020] The density-aware analysis unit is configured to receive multi-dimensional feature vector samples from the clustering initialization unit, perform unsupervised clustering analysis based on the K-means algorithm, dynamically adjust the cluster convergence conditions and partitioning thresholds according to the average distance and density distribution between sample points within each cluster, adjust the attraction radius of each centroid during the clustering process, iteratively correct the boundaries of each cluster after clustering, and output the cluster label and confidence index of each multi-dimensional feature vector sample.
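By way of non-limiting illustration, the sketch below standardizes the samples, clips outliers, runs a plain K-means iteration, and derives a per-sample confidence index. It does not implement the density-aware adjustments of convergence conditions or attraction radius, which the source does not detail; the deterministic initialization from evenly spaced sorted samples and the confidence formula are assumptions.

```python
import math
from statistics import mean, pstdev


def standardize(samples, clip=3.0):
    """Z-score every dimension and clip to +/- clip as a simple outlier correction."""
    columns = list(zip(*samples))
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return [
        [max(-clip, min(clip, (x - m) / s)) for x, (m, s) in zip(row, stats)]
        for row in samples
    ]


def kmeans(samples, k, iterations=50):
    """Plain K-means; initial centroids are evenly spaced picks from the sorted samples."""
    ordered = sorted(samples)
    centroids = [list(ordered[(2 * i + 1) * len(ordered) // (2 * k)]) for i in range(k)]
    labels = [0] * len(samples)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c])) for s in samples
        ]
        updated = []
        for c in range(k):
            members = [s for s, label in zip(samples, labels) if label == c]
            # An empty cluster keeps its previous centroid.
            updated.append([mean(col) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def confidence_index(sample, centroids):
    """1 minus the ratio of nearest to second-nearest centroid distance (needs k >= 2)."""
    distances = sorted(math.dist(sample, c) for c in centroids)
    return 1.0 - distances[0] / distances[1] if distances[1] > 0 else 0.0
```

A sample lying midway between two centroids receives a confidence index near 0, while a sample close to its own centroid and far from all others receives a value near 1.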
[0021] Furthermore, the unsupervised clustering auxiliary module also includes:
[0022] The pseudo-label generation unit is configured to identify potential abnormal clusters and generate pseudo-labels based on the structural characteristics of each cluster sample after unsupervised clustering. In the process of identifying potential abnormal clusters, the set of samples with high deviation is identified by jointly judging the cluster density, inter-cluster separation and cluster center fluctuation characteristics.
[0023] The set of potential abnormal cluster samples is marked as pseudo-label samples, and a confidence factor related to the degree of abnormality of the pseudo-label samples is assigned. The results are output to the multi-model fusion analysis module as an auxiliary training data source. When generating pseudo-labels, the confidence level of the pseudo-labels is dynamically adjusted according to the clustering stability and the change of the affiliation of each multi-dimensional feature vector sample in multiple rounds of clustering.
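By way of non-limiting illustration, pseudo-label generation might be sketched as below. The source names cluster density, inter-cluster separation, and cluster center fluctuation as joint criteria without giving formulas; this sketch substitutes a small cluster share and a minimum centroid separation, and takes the confidence factor to be the fraction of clustering rounds in which a sample was flagged. These rules, thresholds, and names are assumptions.

```python
import math
from collections import Counter


def pseudo_labels(labels, centroids, max_share=0.2, min_separation=2.0):
    """Flag samples (1 = potentially abnormal) that fall in small, well-separated clusters."""
    counts = Counter(labels)
    total = len(labels)
    abnormal = set()
    for c, centroid in enumerate(centroids):
        others = [math.dist(centroid, o) for j, o in enumerate(centroids) if j != c]
        if not others or counts[c] == 0:
            continue
        if counts[c] / total <= max_share and min(others) >= min_separation:
            abnormal.add(c)
    return [1 if label in abnormal else 0 for label in labels]


def pseudo_label_confidence(flag_rounds):
    """Per-sample confidence: share of clustering rounds in which the sample was flagged."""
    rounds = len(flag_rounds)
    return [sum(r[i] for r in flag_rounds) / rounds for i in range(len(flag_rounds[0]))]
```

A sample flagged in every round keeps a confidence factor of 1.0, while one whose cluster affiliation changes between rounds is down-weighted accordingly.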
[0024] Furthermore, the fault trend prediction module includes:
[0025] The time series feature extraction unit is configured to receive the fusion output results from the multi-model fusion analysis module and combine them with the multi-dimensional feature vector generated by the data feature construction module to construct multi-dimensional time series features that reflect the evolution process of the equipment state.
[0026] Based on the preset time window length, the fault risk probability sequence in the historical fusion output is extracted and synchronously integrated with the multi-dimensional feature vector of equipment operation in the corresponding time period to generate a data segment with temporal consistency.
[0027] The multidimensional feature vectors within the sliding window are standardized and subjected to time-series enhancement transformation to extract time-related features containing short-term trends and long-term state evolution information. The extracted time-series feature sequences are then output to the trend prediction modeling unit.
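By way of non-limiting illustration, the construction of temporally consistent data segments and the extraction of short-term and long-term trend features could look as follows. The step size, the per-segment z-score, and the two slope features are assumptions for this sketch; the time-series enhancement transformation itself is not specified in the source.

```python
from statistics import mean, pstdev


def build_segments(risk_probs, feature_rows, window, step=1):
    """Align each risk probability with its feature vector and cut sliding windows."""
    if len(risk_probs) != len(feature_rows):
        raise ValueError("risk sequence and feature rows must have equal length")
    series = [[p] + list(f) for p, f in zip(risk_probs, feature_rows)]
    return [series[i:i + window] for i in range(0, len(series) - window + 1, step)]


def standardize_segment(segment):
    """Z-score every column inside one window."""
    stats = [(mean(c), pstdev(c) or 1.0) for c in zip(*segment)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in segment]


def trend_features(segment):
    """Short-term change (last step) and long-term slope (whole window) of the risk column.

    Requires a window of at least two time steps.
    """
    risk = [row[0] for row in segment]
    return {
        "short_term": risk[-1] - risk[-2],
        "long_term": (risk[-1] - risk[0]) / (len(risk) - 1),
    }
```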
[0028] The trend prediction modeling unit is configured to model complex nonlinear time series relationships based on multi-dimensional time series features and through multi-layer structure stacking, and to build a fault risk evolution prediction model for future time periods. During the training process, supervised training samples are constructed using historically known risk labels, and the fault risk evolution prediction model is iteratively optimized using gradients.
[0029] The key state change points in the input multidimensional time series features are assigned high weights, and the prediction results are output. The prediction results include the fault risk probability value sequence of the target power equipment in multiple future time steps and the confidence evaluation index of the risk change trend. The prediction results are passed as input to the risk classification and early warning module.
[0030] Furthermore, the time window length is dynamically adjusted, including:
[0031] Retrieve the data segments generated, under the preset time window length, by extracting the fault risk probability sequence and synchronously integrating it with the multi-dimensional feature vectors of equipment operation;
[0032] Extract each pair of adjacent data segments as a data segment group;
[0033] Obtain the data similarity values between data segments corresponding to each data segment group;
[0034] The data similarity values are compared with preset similarity reference values;
[0035] Select the data segment groups whose data similarity values are lower than the preset similarity reference value as target data segment groups;
[0036] For each target data segment group, obtain the midpoint between the final generation times of its two data segments, and use it as the calibration time of that target data segment group;
[0037] Based on the calibration times of the target data segment groups, obtain the standard deviation of the time intervals between occurrences of the target data segment groups;
[0038] Retrieve the data similarity values corresponding to the target data segment groups;
[0039] Based on these data similarity values, obtain the standard deviation of the similarity values corresponding to the target data segment groups;
[0040] The length of the time window is dynamically adjusted based on the standard deviation of the time interval and the standard deviation of the similarity value corresponding to the target data segment group.
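By way of non-limiting illustration, the steps above for obtaining the two standard deviations might be sketched as follows. The similarity measure is not named in the source, so cosine similarity over segments that have already been flattened into numeric vectors is assumed here; the function and key names are hypothetical.

```python
import math
from statistics import pstdev


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def target_group_statistics(segments, end_times, similarity_reference):
    """Statistics over adjacent segment pairs whose similarity is below the reference.

    segments  -- flattened data segments in time order
    end_times -- final generation time of each segment
    """
    calibration_times, similarities = [], []
    for i in range(len(segments) - 1):
        sim = cosine_similarity(segments[i], segments[i + 1])
        if sim < similarity_reference:  # target data segment group
            calibration_times.append((end_times[i] + end_times[i + 1]) / 2.0)
            similarities.append(sim)
    intervals = [b - a for a, b in zip(calibration_times, calibration_times[1:])]
    return {
        "calibration_times": calibration_times,
        "similarities": similarities,
        "interval_std": pstdev(intervals) if intervals else 0.0,
        "similarity_std": pstdev(similarities) if similarities else 0.0,
    }
```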
[0041] Furthermore, the time window length is dynamically adjusted based on the standard deviation of the time interval and the standard deviation of the similarity values corresponding to the target data segment group, including:
[0042] Retrieve the standard deviation of the time interval corresponding to the target data segment group;
[0043] The time interval standard deviation is normalized using a preset time interval standard deviation reference value to obtain the normalized time interval standard deviation.
[0044] Retrieve the standard deviation of the similarity values corresponding to the target data segment group;
[0045] The similarity value standard deviation is normalized using a preset similarity value standard deviation reference value to obtain the normalized similarity value standard deviation.
[0046] The normalized similarity value standard deviation is compared with the normalized time interval standard deviation to obtain the standard deviation difference parameter.
[0047] The similarity values corresponding to the target data segment group are used to obtain the median similarity value of the target data segment group;
[0048] The similarity difference parameter is obtained by comparing the median similarity value with a preset similarity reference value.
[0049] Compare the standard deviation difference parameter with the similarity difference parameter;
[0050] When the standard deviation difference parameter is greater than the similarity difference parameter, the time window length is dynamically adjusted using the standard deviation difference parameter and the similarity difference parameter, and data segments are obtained according to the adjusted time window.
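By way of non-limiting illustration, the comparison logic above might be sketched as follows. The source states only that values are "compared" and that the window is adjusted "using" the two parameters; this sketch therefore assumes that normalization is division by the reference value, that each comparison is an absolute difference, and that the window shrinks by a factor of one plus the excess of the standard deviation difference parameter. The shrink rule in particular is hypothetical.

```python
from statistics import median


def adjust_window_length(window, interval_std, similarity_std, similarities,
                         interval_std_reference, similarity_std_reference,
                         similarity_reference, min_window=2):
    """Return the adjusted time window length (unchanged when no adjustment applies)."""
    normalized_interval_std = interval_std / interval_std_reference
    normalized_similarity_std = similarity_std / similarity_std_reference
    # Assumed reading of "compared": absolute difference.
    std_difference = abs(normalized_similarity_std - normalized_interval_std)
    similarity_difference = abs(median(similarities) - similarity_reference)
    if std_difference > similarity_difference:
        # Hypothetical rule: shrink the window in proportion to the excess.
        scale = 1.0 + (std_difference - similarity_difference)
        return max(min_window, round(window / scale))
    return window
```

With a window of 20 steps, a standard deviation difference parameter of about 1.0, and a similarity difference parameter of 0.2, this rule yields a window of 11 steps.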
[0051] Furthermore, the risk classification and early warning module includes:
[0052] The risk level mapping unit is configured to receive the sequence of fault risk probability values and the confidence assessment index of risk change trend from the fault trend prediction module, classify the current and future operating status of power equipment into levels, and output multi-level risk judgment results. The multi-level risk judgment results include the current risk level, the maximum risk level within the prediction period, the risk level change trend, and the corresponding confidence assessment value.
[0053] The input risk probability value sequence is compared step by step. Based on the relative position of the predicted value and the multi-level risk boundary at each time step, the risk level corresponding to the current and future time nodes is determined. Combined with the trend confidence assessment results, the reliability weight of the level judgment result is calculated.
[0054] The dynamic early warning decision unit is configured to receive multi-level risk judgment results from the risk level mapping unit, combine the current operating status characteristics of the equipment with the historical status offset trajectory, compare the distance measurement between the current status vector and the historical low-risk operating status, and evaluate the risk level of the current status through offset amplitude, directional change and status trend. If the risk level remains stably in the same high-risk range for multiple consecutive time steps, the early warning response level is raised.
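By way of non-limiting illustration, the step-by-step comparison against multi-level risk boundaries and the escalation rule for sustained high risk could be sketched as follows. The boundary values, level names, trend labels, and the three-step persistence requirement are assumptions chosen for this example; the distance comparison against the historical low-risk state is omitted.

```python
from bisect import bisect_right

BOUNDARIES = [0.25, 0.5, 0.75]                      # hypothetical risk boundaries
LEVEL_NAMES = ["low", "medium", "high", "critical"]  # hypothetical level names


def risk_level(probability):
    """Index of the risk level whose interval contains the probability."""
    return bisect_right(BOUNDARIES, probability)


def classify(current_probability, predicted_probabilities, trend_confidence):
    """Multi-level risk judgment for the current state and the prediction period."""
    current = risk_level(current_probability)
    future = [risk_level(p) for p in predicted_probabilities]
    if future[-1] > current:
        trend = "rising"
    elif future[-1] < current:
        trend = "falling"
    else:
        trend = "stable"
    return {
        "current": LEVEL_NAMES[current],
        "peak": LEVEL_NAMES[max(future)],
        "trend": trend,
        "confidence": trend_confidence,
    }


def response_level(level_history, high_level=2, consecutive=3):
    """Raise the response by one step when the same high level persists."""
    latest = level_history[-1]
    recent = level_history[-consecutive:]
    sustained = len(recent) == consecutive and all(level == latest for level in recent)
    return latest + 1 if sustained and latest >= high_level else latest
```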
[0055] Compared with the prior art, the beneficial effects of the present invention are:
[0056] 1. The multi-model fusion analysis module of this invention constructs a model set that fuses supervised and semi-supervised learning models, and achieves dynamic collaborative analysis based on a mutual information metric and a Bayesian optimization weight strategy. This significantly improves the system's ability to identify diverse fault types and its generalization performance, enhancing the overall learning ability and adaptability of the model. By introducing the mutual information metric, it identifies the complementarity and redundancy between model outputs, thereby achieving complementary advantages and reducing bias. The fusion weights are not statically configured but adaptively adjusted based on historical error performance and current task feedback, effectively preventing excessive concentration of some model weights or the participation of ineffective models in the fusion. The introduction of an anomaly detection feedback mechanism further endows the system with self-diagnosis and correction capabilities, enabling timely detection of insufficient confidence or severe discrepancies in model predictions during the fusion process. Furthermore, through the augmented-sample feedback mechanism, it optimizes the model training process and continuously improves the overall early warning performance.
[0057] 2. The unsupervised clustering auxiliary module of this invention introduces a density-aware K-means algorithm and constructs a pseudo-label generation mechanism, which not only improves the accuracy of data spatial structure recognition, but also realizes the ability to mine potential abnormal patterns from unlabeled samples, significantly expanding the application scope of the system in weakly supervised environments. The density-aware mechanism enables the clustering algorithm to adapt to the non-equilibrium distribution of clusters in the feature space, enhances the ability to distinguish boundary samples and sparse outliers, identifies highly deviated samples after clustering analysis, and assigns confidence factors, realizing the initial labeling of abnormal patterns. It improves sample utilization by leveraging clustering results, assists the model in capturing abnormal features, and realizes the continuous enhancement of the training set and the improvement of risk identification accuracy in the multi-model fusion analysis module.
[0058] 3. The fault trend prediction module of this invention realizes dynamic evolution modeling of future operational risks of power equipment by constructing multi-dimensional time-series features and stacking prediction models with multi-layer nonlinear structures. It has the advantages of trend judgment and early response. It combines current state information with historical risk labels to extract comprehensive time-series features of short-term disturbances and long-term evolution, so that the model can identify potential risk paths that gradually evolve from small parameter changes to faults. The prediction model adopts a hierarchical structure to model complex nonlinear time-series relationships and gives higher weight to key state change points, which effectively improves the system's accuracy in capturing fault critical points. The prediction output includes specific probability values, enabling the entire system to have proactive prediction capabilities, shifting from "responding after a fault occurs" to "intervening during risk evolution", effectively extending the equipment early warning time window and gaining critical time for operation and maintenance decisions.
Attached Figure Description
[0059] Figure 1 is a schematic diagram of the modules of the power equipment fault early warning and analysis system of the present invention.
Detailed Implementation
[0060] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0061] Referring to Figure 1, the present invention provides the following technical solution:
[0062] A power equipment fault early warning and analysis system based on the fusion of machine learning and K-means algorithm includes:
[0063] a data feature construction module, configured to collect multi-dimensional dynamic monitoring data during power equipment operation, construct fault features, and generate multi-dimensional feature vectors; a multi-model fusion analysis module, configured to construct a model set including multiple sub-models, perform high-dimensional feature response analysis on the target equipment's state, and dynamically update the contribution weights of each sub-model in the fusion output; an unsupervised clustering auxiliary module, configured to perform unsupervised clustering analysis on power equipment operating status data using the K-means clustering algorithm; a fault trend prediction module, configured to construct a multi-dimensional time series prediction model based on the fault risk probability sequence of the fusion output, and assess the risk evolution trend of power equipment in future operating cycles; and a risk classification and early warning module, configured to dynamically classify and issue early warnings for risk levels based on the output results of the fault trend prediction module and the degree of deviation of the equipment's current state.
[0064] The data feature construction module is configured as follows:
[0065] The system receives multi-dimensional dynamic monitoring data from power equipment, including voltage, current, temperature, vibration frequency, and load changes. It then performs multi-dimensional feature expansion on the dynamic monitoring data, constructs a set of nonlinear mapping functions, and performs expansion transformation on the dynamic monitoring data based on a combination strategy of polynomial transformation, exponential function mapping, and ratio function.
[0066] Based on the high-order cross-combination strategy, variables with potential correlations between different dimensions in dynamic monitoring data are combined to form a high-order multidimensional feature vector.
[0067] The constructed multidimensional feature vectors are subjected to time series encoding to extract temporal features that reflect the trend of state changes during equipment operation.
[0068] In the above embodiments, by introducing multi-dimensional dynamic monitoring data and combining it with nonlinear mapping functions, cross-combination strategies, and time-series coding processing, deep feature mining and refined modeling of the operating status of power equipment are achieved, enhancing the system's ability to identify fault symptoms under complex operating conditions. By expanding the original feature space through polynomial transformation, exponential function mapping, and ratio combination strategies, the weak correlation signals between features are effectively amplified, improving the system's sensitivity to early fault signs. High-order cross-combinations can uncover potential coupling relationships between dimensions, enhancing the model's ability to perceive multi-factor coupled failure mechanisms. Through time-series coding, the system further captures the dynamic evolution trajectory of operating data on the time axis, enabling subsequent models to identify state trend changes and providing a higher-resolution input basis for fault trend prediction.
[0069] The multi-model fusion analysis module includes:
[0070] The model set construction unit is configured to initialize a model set including multiple sub-models based on the multi-dimensional feature vector output by the data feature construction module. The sub-models include a supervised learning model and a semi-supervised learning model. The supervised learning model is trained based on existing labeled data and learns the mapping relationship between equipment failure modes and features. The semi-supervised learning model combines labeled data and unlabeled samples and learns the latent representation structure through graph structure propagation.
[0071] The sub-models are trained using training and validation sets constructed from historical fault record samples. Multidimensional performance evaluation metrics are set to assess the training effect of the sub-models. After the model set training is completed, the state response results of each sub-model to the input samples are output.
[0072] The model fusion unit is configured to receive prediction outputs from multiple sub-models from the model set construction unit. Based on the multi-model collaborative analysis framework, it performs weighted fusion of the sub-models. During the fusion process, a mutual information metric between models is introduced to quantify the information redundancy and complementarity between the output results of different models. The error statistics of each sub-model in historical predictions are calculated. An adaptive weight adjustment strategy based on the Bayesian optimization principle is adopted to dynamically update the fusion weights of each sub-model.
[0073] An anomaly detection feedback unit is configured to identify anomalies during the sub-model fusion process. The anomalies include significant discrepancies between model prediction results, unstable fluctuations in fusion output, and low-confidence output. When an anomaly is detected, the corresponding sample is marked as an augmented sample and fed back to the model set construction unit to participate in the subsequent iterative training process.
[0074] In the above embodiments, by constructing a model set that integrates supervised and semi-supervised learning models, and achieving dynamic collaborative analysis based on mutual information metric and Bayesian optimization weight strategy, the system's ability to identify diverse fault types and its generalization performance are significantly improved. The supervised model can accurately identify known fault modes, while the semi-supervised model can make full use of unlabeled samples to complete the data distribution when data labels are insufficient or during the initial commissioning stage of equipment, thereby improving the overall learning ability and adaptability of the model. By introducing mutual information metric, the complementarity and redundancy between model outputs are identified, thereby achieving complementary advantages and reducing bias. The fusion weights are not statically configured, but are adaptively adjusted using a Bayesian optimization mechanism based on historical error performance and current task feedback, effectively preventing the excessive concentration of some model weights or the participation of ineffective models in the fusion. The introduction of an anomaly detection feedback mechanism further endows the system with self-diagnosis and correction capabilities, enabling timely detection of insufficient confidence or severe discrepancies in model predictions during the fusion process. Furthermore, by enhancing the sample feedback mechanism to optimize the model training process, the overall early warning performance is continuously improved.
[0075] The unsupervised clustering auxiliary module includes:
[0076] Cluster initialization unit, configured as follows:
[0077] The multidimensional feature vectors generated by the data feature construction module are preprocessed and used for multi-centroid initialization of the K-means clustering algorithm. From the statistical distribution of each dimension in the feature space, multiple representative initial cluster centers are calculated. Based on the feature distribution density and sample distance relationship, feature standardization and outlier correction are performed. The preprocessed multidimensional feature vector samples are used as input for the K-means clustering calculation.
[0078] The density-aware analysis unit is configured to receive multi-dimensional feature vector samples from the clustering initialization unit, perform unsupervised clustering analysis based on the K-means algorithm, dynamically adjust the cluster convergence conditions and partitioning thresholds according to the average distance and density distribution between sample points within each cluster, adjust the attraction radius of each centroid during the clustering process, iteratively correct the boundaries of each cluster after clustering, and output the cluster label and confidence index of each multi-dimensional feature vector sample.
[0079] The pseudo-label generation unit is configured to identify potential abnormal clusters and generate pseudo-labels based on the structural characteristics of each cluster sample after unsupervised clustering. In the process of identifying potential abnormal clusters, the set of samples with high deviation is identified by jointly judging the cluster density, inter-cluster separation and cluster center fluctuation characteristics.
[0080] The set of potential abnormal cluster samples is marked as pseudo-label samples, and a confidence factor related to the degree of abnormality of the pseudo-label samples is assigned. The results are output to the multi-model fusion analysis module as an auxiliary training data source. When generating pseudo-labels, the confidence level of the pseudo-labels is dynamically adjusted according to the clustering stability and the change of the affiliation of each multi-dimensional feature vector sample in multiple rounds of clustering.
[0081] In the above embodiments, the introduction of a density-aware K-means algorithm and the construction of a pseudo-label generation mechanism not only improves the accuracy of data spatial structure recognition but also enables the mining of potential abnormal patterns from unlabeled samples, significantly expanding the application scope of the system in weakly supervised environments. The density-aware mechanism enables the clustering algorithm to adapt to the uneven distribution of clusters in the feature space, enhancing the ability to distinguish boundary samples and sparse outliers. After the clustering analysis is completed, the pseudo-label generation unit identifies highly deviated samples based on the stability of the cluster structure and the dynamic changes in their affiliation, and assigns confidence factors, thus achieving preliminary labeling of abnormal patterns. The clustering results improve sample utilization, assist the model in capturing abnormal features, and realize the continuous enhancement of the training set and the improvement of risk identification accuracy in the multi-model fusion analysis module. This is particularly suitable for real-world scenarios where data is continuously growing but labels are lacking.
[0082] The fault trend prediction module includes:
[0083] The time series feature extraction unit is configured to receive the fusion output results from the multi-model fusion analysis module and combine them with the multi-dimensional feature vector generated by the data feature construction module to construct multi-dimensional time series features that reflect the evolution process of the equipment state.
[0084] Based on the preset time window length, the fault risk probability sequence in the historical fusion output is extracted and synchronously integrated with the multi-dimensional feature vector of equipment operation in the corresponding time period to generate a data segment with temporal consistency.
[0085] The multidimensional feature vectors within the sliding window are standardized and subjected to time-series enhancement transformation to extract time-related features containing short-term trends and long-term state evolution information. The extracted time-series feature sequences are then output to the trend prediction modeling unit.
[0086] The trend prediction modeling unit is configured to model complex nonlinear time series relationships from the multi-dimensional time series features by stacking multiple layers, and to build a fault risk evolution prediction model for future time periods. During training, supervised training samples are constructed from historically known risk labels, and the fault risk evolution prediction model is iteratively optimized by gradient-based updates.
[0087] The key state change points in the input multidimensional time series features are assigned high weights, and the prediction results are output. The prediction results include the fault risk probability value sequence of the target power equipment in multiple future time steps and the confidence evaluation index of the risk change trend. The prediction results are passed as input to the risk classification and early warning module.
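The windowing and standardization steps of the time series feature extraction unit can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not define the "time-series enhancement transformation", so the two trend descriptors in `trend_features` (last-step change and average change over the window) are hypothetical stand-ins, and all function names are illustrative.

```python
from statistics import mean, pstdev


def sliding_segments(risk_probs, features, window):
    """Pair each window of risk probabilities with the feature vectors of the same time steps."""
    if len(risk_probs) != len(features):
        raise ValueError("risk_probs and features must be time-aligned")
    return [
        (risk_probs[i:i + window], features[i:i + window])
        for i in range(len(risk_probs) - window + 1)
    ]


def standardize(window_features):
    """Z-score each feature dimension within one window; constant dimensions map to 0."""
    scaled_columns = []
    for col in zip(*window_features):
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]


def trend_features(risk_window):
    """Assumed descriptors: short-term (last-step) change and long-term (average) change."""
    if len(risk_window) < 2:
        raise ValueError("a window needs at least two time steps")
    short_term = risk_window[-1] - risk_window[-2]
    long_term = (risk_window[-1] - risk_window[0]) / (len(risk_window) - 1)
    return short_term, long_term


# Hypothetical risk probabilities and two-dimensional feature vectors.
risk = [0.1, 0.2, 0.4, 0.7]
feats = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
segments = sliding_segments(risk, feats, window=3)
scaled = standardize(segments[0][1])
short_term, long_term = trend_features(segments[0][0])
```

Each tuple in `segments` corresponds to one data segment with temporal consistency: the risk probabilities and the feature vectors cover exactly the same time steps.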
[0088] In the above embodiments, by constructing multi-dimensional time-series features and stacking prediction models with multi-layered nonlinear structures, dynamic evolution modeling of future operational risks of power equipment is achieved. This model possesses strong trend judgment capabilities and early response advantages. It not only combines current state information with historical risk labels but also extracts comprehensive time-series features of short-term disturbances and long-term evolution through sliding time windows and time-series enhancement processing. This enables the model to identify potential risk paths that gradually evolve from small parameter changes to faults. The prediction model adopts a hierarchical structure to model complex nonlinear time-series relationships, giving higher weight to key state changes, effectively improving the system's accuracy in capturing fault critical points. The prediction output includes not only specific probability values but also risk evolution trends and their confidence indices, providing multi-faceted support for the risk classification and early warning module. The introduction of this module enables the entire system to have proactive prediction capabilities, shifting from "responding after a fault occurs" to "intervening during risk evolution," effectively extending the equipment early warning time window and gaining critical time for operation and maintenance decisions.
[0089] Specifically, the time window length is dynamically adjusted, including:
[0090] Retrieve the data segments generated by extracting the fault risk probability sequence and synchronously integrating it with the multi-dimensional feature vectors of equipment operation under the preset time window length;
[0091] Take each pair of adjacent data segments as a data segment group;
[0092] Obtain the data similarity value between the two data segments of each data segment group;
[0093] Compare each data similarity value with the preset similarity reference value;
[0094] Select the data segment groups whose similarity values are lower than the preset similarity reference value as target data segment groups;
[0095] For each target data segment group, take the midpoint between the final generation times of its two data segments as the calibration time of that group;
[0096] Based on the calibration times of the target data segment groups, obtain the standard deviation of the time intervals at which the target data segment groups occur;
[0097] Retrieve the data similarity values corresponding to the target data segment groups;
[0098] Based on those data similarity values, obtain the standard deviation of the similarity values corresponding to the target data segment groups;
[0099] Dynamically adjust the time window length based on the standard deviation of the time interval and the standard deviation of the similarity values corresponding to the target data segment groups.
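The steps above can be sketched in a few lines. Several details are assumptions because the source leaves them open: cosine similarity is used as the similarity measure, the "standard deviation of the time interval" is read as the population standard deviation of the gaps between consecutive calibration times, and each segment is represented as a hypothetical `(final_generation_time, values)` pair.

```python
from math import sqrt
from statistics import pstdev


def cosine_similarity(a, b):
    """Assumed similarity measure; the source does not name one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def target_group_statistics(segments, similarity_reference):
    """segments: list of (final_generation_time, values) in time order."""
    calibration_times, similarities = [], []
    # Each pair of adjacent segments forms one data segment group.
    for (t1, v1), (t2, v2) in zip(segments, segments[1:]):
        sim = cosine_similarity(v1, v2)
        if sim < similarity_reference:               # low similarity: target group
            calibration_times.append((t1 + t2) / 2)  # midpoint of the two end times
            similarities.append(sim)
    # Gaps between consecutive calibration times of the target groups.
    intervals = [b - a for a, b in zip(calibration_times, calibration_times[1:])]
    interval_std = pstdev(intervals) if intervals else 0.0
    similarity_std = pstdev(similarities) if similarities else 0.0
    return calibration_times, similarities, interval_std, similarity_std


# Hypothetical segments: end time in seconds and a flattened value vector.
segments = [
    (0, [1, 0]), (10, [1, 0]), (20, [0, 1]), (30, [0, 1]),
    (40, [1, 0]), (50, [1, 0]), (60, [1, 1]),
]
times, sims, interval_std, similarity_std = target_group_statistics(segments, 0.8)
```

In this example three groups fall below the reference value of 0.8; they occur at regular 20-second spacing, so the interval standard deviation is zero while the similarity values still vary.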
[0100] The technical advantages of the above solution are as follows: Traditional methods often use fixed time windows or adjust based on only a single indicator (such as time interval or single feature threshold), which cannot capture the multi-dimensional coupling characteristics of equipment operation and easily leads to: redundant calculations during stable periods and insufficient resolution during abrupt changes; ignoring the correlation between feature parameter fluctuations and time distribution, resulting in lagging or excessive adjustments. This embodiment, through the location of low-similarity data segment groups, can identify minute changes in feature parameters in advance. This technical solution, through a two-dimensional dynamic adjustment mechanism, significantly improves the accuracy and efficiency of equipment operation status monitoring. By combining data similarity with the standard deviation of time intervals to adjust the time window length, this approach can shorten the fault warning time by an average of more than 30% compared to traditional fixed-window schemes. It can capture subtle changes in equipment operating status in advance, effectively reducing computational resource consumption and improving computational efficiency. Under complex operating conditions, it can dynamically match the rhythm of operating condition switching, improving the accuracy of data feature extraction by more than 25% and avoiding feature aliasing problems. In addition, this scheme does not require the pre-setting of complex physical models. It automatically adapts to individual differences and aging characteristics of equipment through statistics, improving the system's adaptability to different equipment operating states by 40%, enhancing the generalization ability of fault warnings, reducing false alarm / missed alarm rates, and demonstrating higher reliability and practicality in equipment health management.
[0101] Specifically, the length of the time window is dynamically adjusted based on the standard deviation of the time interval and the standard deviation of the similarity values corresponding to the target data segment group, including:
[0102] Retrieve the standard deviation of the time interval corresponding to the target data segment group;
[0103] The time interval standard deviation is normalized using a preset time interval standard deviation reference value to obtain the normalized time interval standard deviation.
[0104] Retrieve the standard deviation of the similarity values corresponding to the target data segment group;
[0105] The standard deviation of the similarity values is normalized using a preset reference value to obtain the normalized similarity-value standard deviation.
[0106] The normalized similarity-value standard deviation is compared with the normalized time interval standard deviation to obtain the standard deviation difference parameter.
[0107] The similarity values corresponding to the target data segment group are used to obtain the median similarity value of the target data segment group;
[0108] The similarity difference parameter is obtained by comparing the median similarity value with a preset similarity reference value.
[0109] Compare the standard deviation difference parameter with the similarity difference parameter;
[0110] When the standard deviation difference parameter is greater than the similarity difference parameter, the time window length is dynamically adjusted using the standard deviation difference parameter and the similarity difference parameter, and data segments are obtained according to the adjusted time window.
[0111] The adjusted time window length is obtained using the following formula:
[0112] $$T = T_0 \left[ 1 - \frac{S_y \,(C_b - S_b)}{(S_y + S_z)\, C_b} \right]$$
[0113] Where $T$ represents the adjusted time window length; $T_0$ represents the preset time window length; $S_y$ represents the preset similarity reference value; $S_z$ represents the median similarity value; $S_b$ represents the similarity difference parameter; and $C_b$ represents the standard deviation difference parameter. Specifically, the standard deviation difference parameter $C_b$ reflects the degree of difference between the normalized standard deviation of the time interval and the normalized standard deviation of the similarity values, capturing the combined differences in the dispersion of the equipment's operating status over time and the amplitude of its feature fluctuations. The similarity difference parameter $S_b$ is the result of comparing the median similarity $S_z$ with the similarity reference value $S_y$, and characterizes the degree to which the equipment's operating status deviates from its normal state. $C_b - S_b$ measures the net difference between these two differences, which, in a physical sense, is a comprehensive assessment of the non-stationarity of the equipment's operating status in terms of time and features. In the numerator $S_y (C_b - S_b)$, multiplying the preset similarity reference value $S_y$ by this net difference emphasizes the weight of the normal-state similarity reference in the overall difference assessment. In the denominator $(S_y + S_z)\, C_b$, the sum $S_y + S_z$ adds the preset similarity reference value to the median of the actual similarity values, reflecting the overall level of the equipment's operating status in the similarity dimension; multiplying it by $C_b$ takes into account the impact of differences in the standard deviation of the time intervals. The resulting fraction represents the quantified proportion by which the time window length is adjusted, based on the comprehensive differences in equipment operating status across the time and feature dimensions.
Subtracting the fraction from 1 yields the correction coefficient for the time window length. Physically, 1 represents the initial, unadjusted state, and subtracting the fraction corrects that state according to the comprehensive differences in equipment operating conditions. A larger fraction indicates more complex changes in equipment state, resulting in a smaller correction coefficient and a larger adjustment to the time window length. By jointly considering the parameters derived from the standard deviation of the time intervals and the standard deviation of the similarity values, changes in equipment operating conditions can be captured more accurately. This keeps the time window length aligned with sudden changes in equipment state, so that initial fault data is collected promptly and accurately, which improves fault warning accuracy and reduces false alarms and missed alarms. The time window length is dynamically adjusted according to the equipment operating condition. When the equipment is running smoothly, the window length is appropriately increased to reduce data collection and computation, lowering computational resource consumption by approximately 30% to 40%. When equipment conditions fluctuate, the window length is reduced to ensure that critical data is collected and to improve data processing efficiency. Because the formula takes into account differences in multi-dimensional state parameters, the adjustment mechanism adapts better to changes in equipment operating status under different working conditions. Whether the equipment is in continuous stable operation or under complex conditions such as frequent start-stop and variable load, the window length can be adjusted effectively, improving adaptability to operating conditions by 30% to 40% and ensuring the effectiveness and stability of equipment status monitoring.
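As a worked illustration, the window adjustment can be written as a small function. It follows the term-by-term description in [0113], read as T = T0 × (1 − S_y (C_b − S_b) / ((S_y + S_z) C_b)), and applies it only under the condition stated in [0110] (C_b greater than S_b). Returning the preset length unchanged in all other cases is an assumption, as are the function name and the sample values.

```python
def adjusted_window_length(t0, s_y, s_z, c_b, s_b):
    """Adjusted window length T from the preset length T0.

    t0  : preset time window length (T0)
    s_y : preset similarity reference value (S_y)
    s_z : median similarity value of the target groups (S_z)
    c_b : standard deviation difference parameter (C_b)
    s_b : similarity difference parameter (S_b)
    """
    # The source only describes an adjustment when C_b > S_b; keeping T0
    # otherwise (and when the denominator would vanish) is an assumption.
    denominator = (s_y + s_z) * c_b
    if c_b <= s_b or denominator == 0:
        return t0
    fraction = s_y * (c_b - s_b) / denominator
    return t0 * (1 - fraction)


# Hypothetical values: a 60-second preset window.
window = adjusted_window_length(t0=60, s_y=0.8, s_z=0.6, c_b=0.5, s_b=0.2)
```

With these sample values the fraction is 0.24 / 0.70, so the window shrinks from 60 to roughly 39.4 seconds. The larger the gap between C_b and S_b, the larger the fraction and the shorter the adjusted window.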
[0114] The technical advantages of the above solution are as follows: Traditional dynamic adjustment of time window length may be based on only a single indicator (such as time interval or data volume), while the above solution achieves more precise adjustment of the time window length by comprehensively considering two key factors: the standard deviation of the time interval and the standard deviation of the similarity value. This multi-dimensional adjustment method can more accurately reflect the actual changes in the data segment group, thereby improving the accuracy and efficiency of data processing. Because it considers the standard deviations of both time interval and similarity, the above solution can better adapt to different types of data segment groups. Whether it is a dataset with large changes in time interval or a dataset with significant fluctuations in similarity, the above solution can achieve reasonable adjustment of the time window length, thereby ensuring the stability and reliability of data processing. By retrieving and normalizing the standard deviations of time interval and similarity values in real time, the above solution can quickly detect changes in the data segment group and dynamically adjust the time window length accordingly. This dynamic response capability allows the system to adapt to changes in the data environment in a timely manner, avoiding problems such as low data processing efficiency or decreased accuracy due to improper time window length settings. Meanwhile, by comprehensively considering changes in time intervals and similarity, the above-mentioned technical solution can more accurately capture the characteristics of data segment groups, thereby adjusting to a more suitable time window length and reducing data processing errors caused by improper time window settings. 
Dynamically adjusting the time window length also allows the system to adapt its processing strategy to the actual condition of the data segment groups, avoiding unnecessary waste of computational resources and thus improving overall data processing efficiency.
[0115] The risk classification and early warning module includes:
[0116] The risk level mapping unit is configured to receive the sequence of fault risk probability values and the confidence evaluation index of the risk change trend from the fault trend prediction module.
[0117] The unit compares the input risk probability value sequence step by step and classifies the current and future operating status of the power equipment into levels according to the relative position of the predicted value and the multi-level risk boundaries at each time step, thereby determining the risk level at the current and future time nodes. It outputs multi-level risk judgment results, which include the current risk level, the maximum risk level within the prediction period, the risk level change trend, and the corresponding confidence assessment value, and it calculates the reliability weight of each level judgment from the trend confidence assessment result.
[0118] The dynamic early warning decision unit is configured to receive the multi-level risk judgment results from the risk level mapping unit, combine the current operating status characteristics of the equipment with the historical status offset trajectory, compare the distance between the current status vector and the historical low-risk operating states, and evaluate the risk level of the current status through offset amplitude, directional change and status trend. If the risk level remains in the same high-risk range for multiple consecutive time steps, the early warning response level is raised.
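The level mapping and the persistence rule can be sketched as below. The risk boundaries, the number of levels, the `high_level` and `persistence` parameters, and the way the trend is summarized are all hypothetical, since the source does not give numeric values. The sketch covers only the step-by-step boundary comparison and the escalation rule, not the state-offset distance evaluation, and it passes the trend confidence through unchanged as a placeholder for the reliability weight.

```python
from bisect import bisect_right

# Hypothetical boundaries separating four risk levels (0 = low ... 3 = critical).
RISK_BOUNDARIES = [0.3, 0.6, 0.8]


def risk_level(probability):
    """Level index given by the position of the probability relative to the boundaries."""
    return bisect_right(RISK_BOUNDARIES, probability)


def map_risk_levels(probabilities, trend_confidence):
    """Step-by-step level mapping; the first entry is treated as the current time step."""
    levels = [risk_level(p) for p in probabilities]
    return {
        "levels": levels,
        "current_level": levels[0],
        "max_level": max(levels),
        "trend": levels[-1] - levels[0],   # assumed summary of the level change trend
        "reliability": trend_confidence,   # placeholder: confidence passed through as-is
    }


def should_escalate(levels, high_level=2, persistence=3):
    """True if the same high level holds for `persistence` consecutive time steps."""
    run, previous = 0, None
    for level in levels:
        run = run + 1 if level == previous else 1
        previous = level
        if level >= high_level and run >= persistence:
            return True
    return False


# Hypothetical predicted risk probabilities for five time steps.
result = map_risk_levels([0.1, 0.65, 0.7, 0.75, 0.9], trend_confidence=0.85)
escalate = should_escalate(result["levels"])
```

Here the predicted levels are 0, 2, 2, 2, 3: level 2 persists for three consecutive steps, so the warning response level would be raised under the assumed rule.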
[0119] In the above embodiments, by introducing two sub-units, risk level mapping and dynamic early warning decision-making, a multi-level dynamic risk early warning system with time dimension, confidence assessment and state deviation comprehensive judgment capabilities is constructed. This significantly enhances the system's guiding value and practical performance in actual operation and maintenance scenarios. Based on the output results of the fault trend prediction module, the risk probability values at multiple future time nodes are analyzed. Reliability weights are constructed by combining trend confidence indicators, reflecting the credibility of the prediction results while quantifying the risk level. Furthermore, by comprehensively considering the current operating status of power equipment and historical deviation trajectories, the degree of deviation of the state vector in the feature space is calculated. The risk evolution trend is evaluated from three aspects: directionality, amplitude and continuity. When continuous high-risk areas appear, the early warning level is promptly upgraded, improving the timeliness of response and the scientific nature of operation and maintenance priority ranking. This realizes a three-dimensional dynamic evaluation method based on "trend + state deviation", significantly reducing the false alarm rate and missed alarm rate, and improving the intelligence and customization level of early warning response.
[0120] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.
Claims
1. A power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm, characterized in that it comprises: a data feature construction module configured to collect multi-dimensional dynamic monitoring data during the operation of power equipment, construct fault features, and generate multi-dimensional feature vectors; a multi-model fusion analysis module configured to construct a model set including multiple sub-models, perform high-dimensional feature response analysis on the state of the target equipment, and dynamically update the contribution weight of each sub-model in the fusion output; an unsupervised clustering auxiliary module configured to perform unsupervised clustering analysis on the power equipment operating status data using the K-means clustering algorithm; a fault trend prediction module configured to construct a multi-dimensional time series prediction model based on the fault risk probability sequence of the fusion output to assess the risk evolution trend of the power equipment in the future operating cycle; and a risk classification and early warning module configured to dynamically classify risk levels and issue early warnings based on the output of the fault trend prediction module and the degree of deviation of the current state of the equipment.
2. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 1, characterized in that the data feature construction module is further configured to: receive multi-dimensional dynamic monitoring data from the power equipment, including voltage, current, temperature, vibration frequency, and load changes; perform multi-dimensional feature expansion on the dynamic monitoring data by constructing a set of nonlinear mapping functions and applying an expansion transformation based on a combination strategy of polynomial transformation, exponential function mapping, and ratio functions; combine, based on a high-order cross-combination strategy, variables with potential correlations between different dimensions of the dynamic monitoring data to form high-order multi-dimensional feature vectors; and apply time series encoding to the constructed multi-dimensional feature vectors to extract temporal features that reflect the trend of state changes during equipment operation.
3. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 1, characterized in that: The multi-model fusion analysis module includes: The model set construction unit is configured to initialize a model set including multiple sub-models based on the multi-dimensional feature vector output by the data feature construction module. The sub-models include a supervised learning model and a semi-supervised learning model. The supervised learning model is trained based on existing labeled data and learns the mapping relationship between equipment failure modes and features. The semi-supervised learning model combines labeled data and unlabeled samples and learns the latent representation structure through graph structure propagation. The sub-models are trained using training and validation sets constructed from historical fault record samples. Multidimensional performance evaluation metrics are set to assess the training effect of the sub-models. After the model set training is completed, the state response results of each sub-model to the input samples are output. The model fusion unit is configured to receive prediction outputs from multiple sub-models from the model set construction unit. Based on the multi-model collaborative analysis framework, it performs weighted fusion of the sub-models. During the fusion process, a mutual information metric between models is introduced to quantify the information redundancy and complementarity between the output results of different models. The error statistics of each sub-model in historical predictions are calculated, and an adaptive weight adjustment strategy based on the Bayesian optimization principle is adopted to dynamically update the fusion weights of each sub-model.
4. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 3, characterized in that: The multi-model fusion analysis module also includes: An anomaly detection feedback unit is configured to identify anomalies during the sub-model fusion process. The anomalies include significant discrepancies between model prediction results, unstable fluctuations in fusion output, and low-confidence output. When an anomaly is detected, the corresponding sample is marked as an augmented sample and fed back to the model set construction unit to participate in the subsequent iterative training process.
5. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 1, characterized in that: The unsupervised clustering auxiliary module includes: The clustering initialization unit is configured to perform preliminary preprocessing on the multidimensional feature vectors generated by the data feature construction module and to use it for multi-centroid initialization of the K-means clustering algorithm. By statistically distributing each dimension in the feature space, multiple representative initial cluster centers are calculated. Based on the feature distribution density and sample distance relationship, feature standardization and outlier correction are performed. The preprocessed multidimensional feature vector samples will be used as input for K-means clustering calculation. The density-aware analysis unit is configured to receive multi-dimensional feature vector samples from the clustering initialization unit, perform unsupervised clustering analysis based on the K-means algorithm, dynamically adjust the cluster convergence conditions and partitioning thresholds according to the average distance and density distribution between sample points within each cluster, adjust the attraction radius of each centroid during the clustering process, iteratively correct the boundaries of each type of cluster after clustering, and output the cluster label and confidence index of each multi-dimensional feature vector sample.
6. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 5, characterized in that: The unsupervised clustering auxiliary module also includes: The pseudo-label generation unit is configured to identify potential abnormal clusters and generate pseudo-labels based on the structural characteristics of each cluster sample after unsupervised clustering. In the process of identifying potential abnormal clusters, the set of samples with high deviation is identified by jointly judging the cluster density, inter-cluster separation and cluster center fluctuation characteristics. The set of potential abnormal cluster samples is marked as pseudo-label samples, and a confidence factor related to the degree of abnormality of the pseudo-label samples is assigned. The results are output to the multi-model fusion analysis module as an auxiliary training data source. When generating pseudo-labels, the confidence level of the pseudo-labels is dynamically adjusted according to the clustering stability and the change of the affiliation of each multi-dimensional feature vector sample in multiple rounds of clustering.
7. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 1, characterized in that: The fault trend prediction module includes: The time series feature extraction unit is configured to receive the fusion output results from the multi-model fusion analysis module and combine them with the multi-dimensional feature vector generated by the data feature construction module to construct multi-dimensional time series features that reflect the evolution process of the equipment state. Based on the preset time window length, the fault risk probability sequence in the historical fusion output is extracted and synchronously integrated with the multi-dimensional feature vector of equipment operation in the corresponding time period to generate a data segment with temporal consistency. The multidimensional feature vectors within the sliding window are standardized and subjected to time-series enhancement transformation to extract time-related features containing short-term trends and long-term state evolution information. The extracted time-series feature sequences are then output to the trend prediction modeling unit. The trend prediction modeling unit is configured to model complex nonlinear time series relationships based on multi-dimensional time series features and through multi-layer structure stacking, and to build a fault risk evolution prediction model for future time periods. During the training process, supervised training samples are constructed using historically known risk labels, and the fault risk evolution prediction model is iteratively optimized using gradients. The key state change points in the input multidimensional time series features are assigned high weights, and the prediction results are output. 
The prediction results include the fault risk probability value sequence of the target power equipment in multiple future time steps and the confidence evaluation index of the risk change trend. The prediction results are passed as input to the risk classification and early warning module.
8. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 7, characterized in that the time window length is dynamically adjusted, including: retrieving the data segments generated by extracting the fault risk probability sequence and synchronously integrating it with the multi-dimensional feature vectors of equipment operation under the preset time window length; taking each pair of adjacent data segments as a data segment group; obtaining the data similarity value between the two data segments of each data segment group; comparing each data similarity value with the preset similarity reference value; selecting the data segment groups whose similarity values are lower than the preset similarity reference value as target data segment groups; for each target data segment group, taking the midpoint between the final generation times of its two data segments as the calibration time of that group; based on the calibration times of the target data segment groups, obtaining the standard deviation of the time intervals at which the target data segment groups occur; retrieving the data similarity values corresponding to the target data segment groups; based on those data similarity values, obtaining the standard deviation of the similarity values corresponding to the target data segment groups; and dynamically adjusting the time window length based on the standard deviation of the time interval and the standard deviation of the similarity values corresponding to the target data segment groups.
9. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 8, characterized in that dynamically adjusting the time window length based on the standard deviation of the time interval and the standard deviation of the similarity values corresponding to the target data segment groups includes: retrieving the standard deviation of the time interval corresponding to the target data segment groups; normalizing the time interval standard deviation using a preset time interval standard deviation reference value to obtain the normalized time interval standard deviation; retrieving the standard deviation of the similarity values corresponding to the target data segment groups; normalizing the standard deviation of the similarity values using a preset reference value to obtain the normalized similarity-value standard deviation; comparing the normalized similarity-value standard deviation with the normalized time interval standard deviation to obtain the standard deviation difference parameter; obtaining the median similarity value of the target data segment groups from their similarity values; comparing the median similarity value with the preset similarity reference value to obtain the similarity difference parameter; comparing the standard deviation difference parameter with the similarity difference parameter; and, when the standard deviation difference parameter is greater than the similarity difference parameter, dynamically adjusting the time window length using the standard deviation difference parameter and the similarity difference parameter, and obtaining data segments according to the adjusted time window.
10. The power equipment fault early warning analysis system based on the fusion of machine learning and K-means algorithm as described in claim 1, characterized in that the risk classification and early warning module includes: a risk level mapping unit configured to receive the sequence of fault risk probability values and the confidence evaluation index of the risk change trend from the fault trend prediction module, classify the current and future operating status of the power equipment into levels, and output multi-level risk judgment results, wherein the multi-level risk judgment results include the current risk level, the maximum risk level within the prediction period, the risk level change trend, and the corresponding confidence assessment value; the input risk probability value sequence is compared step by step; based on the relative position of the predicted value and the multi-level risk boundaries at each time step, the risk level corresponding to the current and future time nodes is determined; and, combined with the trend confidence assessment results, the reliability weight of the level judgment result is calculated; and a dynamic early warning decision unit configured to receive the multi-level risk judgment results from the risk level mapping unit, combine the current operating status characteristics of the equipment with the historical status offset trajectory, compare the distance between the current status vector and the historical low-risk operating states, and evaluate the risk level of the current status through offset amplitude, directional change and status trend, wherein, if the risk level remains in the same high-risk range for multiple consecutive time steps, the early warning response level is raised.