A big data-based photovoltaic power station fault early warning and information diagnosis system
By analyzing string inverter data from photovoltaic power plants using big data, dynamic and static features are extracted, and an unsupervised model is constructed. This solves the problem of online identification of early-stage micro-mismatch faults in photovoltaic power plants, and achieves efficient and low-cost fault early warning and diagnosis.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUANGXI FENGMO NEW ENERGY CO LTD
- Filing Date
- 2026-03-23
- Publication Date
- 2026-06-19
AI Technical Summary
Existing photovoltaic fault diagnosis technologies cannot achieve continuous online monitoring at all times, making it difficult to identify micro-mismatch faults caused by early microcracks in modules, and they also suffer from high false alarm rates and high costs.
A photovoltaic power plant fault early warning and information diagnosis system based on big data is adopted. Through data preprocessing, dual-dimensional feature extraction and anomaly identification and cross-validation modules, dynamic response features and static features are extracted using the native data of string inverters to construct an unsupervised anomaly screening model, thereby realizing online early warning of early hidden cracks in the components.
It achieves low-cost, high-accuracy online early warning of microcracks in components, with an identification accuracy of 96% and a false alarm rate of less than 1%. No new hardware is required, and it is compatible with all types of photovoltaic power plants.
Figure CN122247338A_ABST
Description
Technical Field
[0001] This application relates to the field of intelligent operation and maintenance technology for power plants, and in particular to a photovoltaic power plant fault early warning and information diagnosis system based on big data.
Background Technology
[0002] With the rapid development of the global new energy industry, the installed capacity of photovoltaic power plants continues to expand, and string photovoltaic power plants have become the mainstream application form in the industry due to their high flexibility and high power generation efficiency. Micro-mismatch faults caused by early microcracks in photovoltaic modules are the core cause of irreversible power generation losses and hot spot fire safety accidents. The industry's demand for online accurate early warning and diagnosis of early module faults is becoming increasingly urgent.
[0003] Existing photovoltaic fault diagnosis technologies are mainly divided into two categories: one is offline detection schemes that rely on EL detection and infrared thermal imaging, which cannot achieve continuous online monitoring at all times, and have high detection costs and low operating efficiency, making it difficult to adapt to the routine operation and maintenance needs of large-scale power plants; the other is fixed threshold early warning schemes based on string power data, which can only identify obvious faults with power attenuation of more than 8%, and have the technical pain point that weak features of micro-mismatch faults within 3% caused by early microcracks are easily drowned out by operating noise and cannot be effectively extracted and identified.
[0004] Meanwhile, existing online identification solutions have blind spots in steady-state operating condition monitoring, making it difficult to distinguish between component damage and external operating condition disturbances from a physical perspective, resulting in high false alarm and false negative rates. Some high-precision solutions rely on adding new component-level acquisition hardware, which significantly increases the cost of power plant renovation and cannot be adapted to the large-scale application of small and medium-sized photovoltaic power plants. Currently, the industry lacks a low-cost, all-time, and highly accurate online early warning solution for early microcracks and micro-mismatches in components.
[0005] Therefore, a photovoltaic power plant fault early warning and information diagnosis system based on big data is proposed.
Summary of the Invention
[0006] This application aims to solve, at least in part, one of the technical problems of the existing technologies described above.
[0007] To achieve the above objectives, the first aspect of this application proposes a photovoltaic power station fault early warning and information diagnosis system based on big data, including a data acquisition unit, a data processing unit, and an early warning output unit. The data processing unit includes a data preprocessing module, a dual-dimensional feature extraction module, and an anomaly identification and cross-validation module that are connected in sequence.
[0008] The data preprocessing module collects native multi-source time-series data and string basic ledger data of string photovoltaic power stations, performs effective data interval filtering, operating condition disturbance decoupling, and data standardization processing, and constructs a benchmark string cluster under the same operating conditions.
[0009] The native multi-source time-series data includes string-level high-frequency current and DC voltage synchronized time-series data sampled at 10 Hz or above on the DC side of the string inverter, small-amplitude IV sweep frequency segment data generated periodically during the inverter MPPT optimization process, and power-station-level horizontal total irradiance and ambient temperature data acquired at second-level resolution.
[0010] The basic ledger data for the string includes the component model corresponding to the string, the number of components in the string, the installation orientation, the installation tilt angle, and the commissioning time.
[0011] The dual-dimensional feature extraction module extracts two types of specific fault features. The first type is the dynamic response features within the effective range of irradiation transient, including current and irradiation response hysteresis difference features, current time series nonlinear distortion features, and benchmark cluster feature dispersion features. The second type is the static characteristic features within the effective range of irradiation steady state, including maximum power point slope distortion features and fill factor relative decay features.
[0012] The anomaly identification and cross-validation module is configured to perform initial screening of string anomalies based on the dynamic response features, perform cross-validation of the initially screened abnormal strings based on the static characteristic features, and complete the final confirmation of early microcrack micro-mismatch faults of photovoltaic modules through time synchronization verification.
[0013] The early warning output unit is communicatively connected to the anomaly identification and cross-validation module, and is configured to perform fault severity quantification on the confirmed fault strings, generate graded early warning information and diagnostic results, and perform model self-iterative optimization based on the fault handling results.
[0014] This technical solution is compatible with the existing hardware infrastructure and operation and maintenance system of photovoltaic power plants. It requires no additional hardware investment and can be directly implemented, fundamentally filling the gap in the industry for a low-cost, high-accuracy online early warning solution for early microcracks in modules.
[0015] In addition, the photovoltaic power plant fault early warning and information diagnosis system based on big data proposed in this application may also have the following additional technical features:
[0016] As a further description of the above technical solution:
[0017] The effective data interval filtering performed by the data preprocessing module specifically involves selecting time-series intervals with irradiance between 200 W/m² and 800 W/m² and an irradiance change rate ≥ 50 W/(m²·s) as transient effective intervals, and selecting time-series intervals with irradiance between 400 W/m² and 1000 W/m² and an irradiance fluctuation amplitude ≤ 20 W/m² within 5 minutes as steady-state effective intervals. Invalid data from nighttime, inverter start-up and shutdown, power-limited operation, and out-of-range irradiance are removed at the same time.
[0018] This technical solution significantly improves the signal-to-noise ratio of weak fault characteristics from the data source, filters out more than 90% of invalid interference data, solves the problem of early microcrack micro-mismatch characteristics being submerged by operating noise, greatly reduces the computational load of subsequent modules, and fills the monitoring blind spot of existing technologies that can only rely on transient irradiation data, achieving effective data coverage for all operating conditions and all time periods.
[0019] As a further description of the above technical solution:
[0020] The data preprocessing module performs operating condition disturbance decoupling by using a fixed-frequency sliding window differential filtering algorithm to remove periodic current disturbances of 1-2Hz fixed frequency during the inverter MPPT optimization process, while retaining the non-periodic timing characteristics of the operating condition drive. The data standardization process uses the Z-score standardization method to eliminate the interference of absolute irradiance and ambient temperature fluctuations on feature extraction. The same operating condition benchmark string cluster is a set of normal strings with the same component model, same installation orientation, same installation tilt angle, and same commissioning years within the same power station.
[0021] This technical solution eliminates two major sources of false alarms: first, the problem of normal MPPT disturbances being misjudged as fault features; and second, the problem of feature misjudgment caused by differences in global operating conditions. At the same time, the constructed benchmark cluster under the same operating conditions does not require additional benchmark equipment, fully reuses existing string data of the same type in the power plant, has extremely strong adaptability, and ensures the accuracy and comparability of subsequent feature extraction.
[0022] As a further description of the above technical solution:
[0023] The current and irradiation response hysteresis difference feature is obtained by calculating the peak response hysteresis time of the target string current time series and the synchronous irradiation time series using a cross-correlation function, and then subtracting it from the baseline hysteresis time of the benchmark string cluster under the same operating conditions. The current time series nonlinear distortion feature is obtained by decomposing the current time series data using Hilbert-Huang transform to obtain the intermediate frequency component, and then calculating the fractal box dimension and sample entropy of the intermediate frequency component. The benchmark cluster feature dispersion feature is obtained by calculating the dispersion of the target string feature and the benchmark string cluster feature using Mahalanobis distance, and then standardizing the result.
[0024] This technical solution extracts exclusive high-order features related only to early microcracks in modules, and has extremely high identification sensitivity for micro-mismatch faults with an overall power attenuation of more than 1.5% in the string. It does not require additional module-level acquisition hardware, and can achieve the identification of early microcracks in a single module using only existing string-level high-frequency data.
[0025] As a further description of the above technical solution:
[0026] The maximum power point slope distortion characteristic is obtained by fitting the curve slope change rate of the ±5% voltage range near the maximum power point in the IV sweep frequency segment using the least squares method, and calculating its relative deviation from the benchmark value of the benchmark string cluster under the same operating conditions; the relative attenuation characteristic of the fill factor is obtained by calculating the relative attenuation of the measured fill factor of the target string and the benchmark fill factor based on the irradiation-temperature-fill factor benchmark fitting model constructed by the benchmark string cluster.
[0027] This technical solution enables the value mining of idle IV data of inverters. It can obtain the core characteristics of the inherent output characteristics of the component without additional detection operations and hardware investment. At the same time, these characteristics are only related to the physical characteristics of the component itself and are not affected by external operating conditions. They perfectly complement the dynamic response characteristics.
[0028] As a further description of the above technical solution:
[0029] The anomaly identification and cross-validation module performs the initial screening of anomalies by constructing an unsupervised anomaly screening model using an improved isolation forest algorithm. The training set of the model uses the dynamic response feature dataset of normal strings of the same model within one year of operation in the power plant. When the anomaly score output by the model for the target string within a single transient effective interval is ≥0.85 and all three types of dynamic response features exceed the baseline confidence interval of the benchmark string cluster under the same operating conditions, the interval is determined to be a single-interval effective anomaly. Strings with an effective anomaly interval ratio of ≥80% within a continuous 7-day sliding time window are included in the initial screening list of abnormal strings.
[0030] This technical solution addresses the industry pain points of existing supervised learning models, which rely on fault-labeled samples and have poor generalization. The model training threshold is extremely low, and it can be quickly adapted to photovoltaic power plants of different types and different years of operation. At the same time, through dual judgment rules, while ensuring the sensitivity of early fault identification, it significantly filters out occasional abnormal interference, effectively reduces the fault false alarm rate, and narrows the screening scope for subsequent cross-validation.
[0031] As a further description of the above technical solution:
[0032] The cross-validation and final confirmation performed by the anomaly identification and cross-validation module are as follows: For strings in the initial screening of abnormal strings, if both types of static characteristics continuously exceed the baseline confidence interval of the benchmark string cluster under the same operating conditions within a 7-day sliding time window, they are included in the early warning candidate pool. For strings in the early warning candidate pool, the time synchronization of dynamic response characteristics and static characteristics is verified. If the time sequence of the two types of abnormal characteristics is highly matched, they are finally confirmed as early microcrack micro-mismatch fault strings of photovoltaic modules.
[0033] This technical solution enables accurate differentiation between genuine and false faults, completely solving the problem of false alarms caused by the inability of existing technologies to distinguish between hidden cracks in the component itself and external disturbances such as cloud cover and obstruction. It controls the false alarm rate of the system's steady-state operating conditions to within 1%, while improving the fault identification accuracy to over 96%, significantly enhancing the credibility of early warning information and the operational value of maintenance.
[0034] As a further description of the above technical solution:
[0035] The early warning output unit uses the entropy weight method to determine the weight of each core feature and constructs a micro-mismatch severity index with a value range of 0-10. The graded early warning rule is that a severity index of 3-5 is a level 1 early warning, 5-8 is a level 2 early warning, and 8-10 is a level 3 early warning. The early warning information and diagnosis results include a unique identifier of the fault string, the early warning level, the severity index, core feature data, and the handling priority.
[0036] This technical solution achieves quantitative classification of early-stage microcrack faults, abandoning the binary output mode of existing technologies that can only determine the presence or absence of faults. It can accurately quantify the degree of fault development and risk level, providing power plant operation and maintenance personnel with clear guidance on handling priorities, greatly improving operation and maintenance efficiency. At the same time, it can complete model self-iteration based on fault handling results, continuously optimizing the system's identification accuracy and adaptability.
[0037] Advantages of this invention:
[0038] The big data-based photovoltaic power station fault early warning and information diagnosis system of this application requires no new hardware: it relies entirely on native data collected by string inverters, needs no additional component-level monitoring equipment, incurs no retrofit cost, and is compatible with all types of string photovoltaic power stations, so the barrier to deployment is extremely low.
[0039] Through a dynamic and static dual-dimensional feature fusion architecture, early micro-mismatch faults with an overall power attenuation of more than 1.5% in the string can be identified, with an early warning lead time of 8-14 months, far exceeding conventional fixed threshold solutions.
[0040] It achieves high recognition accuracy by distinguishing component damage from external operating condition disturbances on physical grounds. The false alarm rate under steady-state operating conditions is ≤1%, and the recognition accuracy is ≥96%, solving the industry's problem of false alarms from the same source.
[0041] The lightweight, unsupervised architecture requires no fault labeling samples and can be seamlessly embedded into existing operation and maintenance systems. Edge deployment balances real-time performance and data security.
[0042] Additional aspects and advantages of this application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of this application.
Description of the Drawings
[0043] The above and / or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, wherein:
[0044] Figure 1 is a schematic diagram of the module connections of a photovoltaic power station fault early warning and information diagnosis system based on big data according to an embodiment of this application;
[0045] Figure 2 is a pie chart showing the fault early warning verification results over a 30-day monitoring period for a photovoltaic power station fault early warning and information diagnosis system based on big data according to an embodiment of this application;
[0046] Figure 3 is a bar chart comparing the performance of a big data-based photovoltaic power plant fault early warning and information diagnosis system according to an embodiment of this application with a traditional fixed threshold scheme;
[0047] Figure 4 is a line graph showing the cumulative early warning and verification trend of a big data-based photovoltaic power station fault early warning and information diagnosis system according to an embodiment of this application over a 30-day monitoring period.
Detailed Implementation
[0048] The embodiments of this application are described in detail below. Examples of these embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain this application, and should not be construed as limiting this application.
[0049] The following description, in conjunction with the accompanying drawings, illustrates a photovoltaic power plant fault early warning and information diagnosis system based on big data, according to an embodiment of this application.
[0050] As shown in Figure 1, a photovoltaic power station fault early warning and information diagnosis system based on big data according to an embodiment of this application may include a data acquisition unit, a data processing unit and an early warning output unit. The data processing unit includes a data preprocessing module, a dual-dimensional feature extraction module and an anomaly identification and cross-validation module that are connected in sequence.
[0051] The data preprocessing module collects native multi-source time-series data and string basic ledger data of string photovoltaic power plants, performs effective data interval filtering, operating condition disturbance decoupling, and data standardization processing, and constructs a benchmark string cluster under the same operating conditions.
[0052] The native multi-source time-series data includes string-level high-frequency current and DC voltage synchronized time-series data sampled at 10 Hz or above on the DC side of the string inverter, small-amplitude IV sweep frequency segment data generated periodically during the inverter MPPT optimization process, and power-station-level horizontal total irradiance and ambient temperature data acquired at second-level resolution.
[0053] The basic ledger data for the string includes the component model corresponding to the string, the number of components in the string, the installation orientation, the installation tilt angle, and the commissioning time;
[0054] The dual-dimensional feature extraction module extracts two types of exclusive fault features. The first type is the dynamic response features within the effective range of irradiation transient, including current and irradiation response hysteresis difference features, current time series nonlinear distortion features, and benchmark cluster feature dispersion features. The second type is the static characteristic features within the effective range of irradiation steady state, including maximum power point slope distortion features and fill factor relative decay features.
[0055] The anomaly identification and cross-validation module is configured to perform initial screening of string anomalies based on dynamic response features, perform cross-validation of the initially screened abnormal strings based on static characteristic features, and finally confirm the early microcrack micro-mismatch fault of photovoltaic modules through timing synchronization verification.
[0056] The early warning output unit is communicatively connected to the anomaly identification and cross-validation module and is configured to quantify the severity of confirmed fault strings, generate graded early warning information and diagnostic results, and perform model self-iterative optimization based on the fault handling results.
[0057] This embodiment divides the system into a closed-loop chain of data acquisition, preprocessing, dual-dimensional feature extraction, anomaly identification and cross-validation, and early warning output. By fusing the dynamic response characteristics of the transient irradiation range with the static characteristics of the steady-state range, a complete technical system from data input to fault early warning output is constructed to address the physical characteristics of early micro-mismatch faults in photovoltaic modules, thus solving the problem that existing technologies cannot identify early micro-mismatch faults online.
[0058] As shown in Figure 1:
[0059] The data preprocessing module filters effective data intervals as follows: time-series intervals with irradiance between 200 W/m² and 800 W/m² and an irradiance change rate ≥ 50 W/(m²·s) are selected as transient effective intervals, and time-series intervals with irradiance between 400 W/m² and 1000 W/m² and an irradiance fluctuation amplitude ≤ 20 W/m² within 5 minutes are selected as steady-state effective intervals. Invalid data from nighttime, inverter start-up and shutdown, power-limited operation, and out-of-range irradiance are removed at the same time. This embodiment defines precise filtering rules for effective data intervals. By defining two core data windows (transient effective intervals and steady-state effective intervals), invalid data from nighttime, inverter start-up and shutdown, and power-limited operation are eliminated. The transient interval matches the conditions under which the nonlinear distortion characteristics of components with early microcracks manifest under sudden irradiance changes, while the steady-state interval matches the operating conditions under which the inherent output characteristics of the components are stably presented. This achieves targeted enrichment of effective features at the data source.
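The interval rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of this application: the function name `label_effective_intervals`, the 1-second sample spacing, the use of the sample-to-sample difference as the change rate, and the reading of "fluctuation amplitude within 5 minutes" as max minus min over a trailing window are all hypothetical choices.

```python
def label_effective_intervals(irradiance, dt_s=1.0, steady_window_s=300.0):
    """Label each irradiance sample (W/m^2) as 'transient', 'steady', or None.

    Assumptions (not stated in the source): the change rate is the absolute
    difference between consecutive samples divided by dt_s, and the
    fluctuation amplitude is max - min over the trailing 5-minute window.
    """
    window = int(round(steady_window_s / dt_s))
    labels = [None] * len(irradiance)
    for i, g in enumerate(irradiance):
        rate = abs(g - irradiance[i - 1]) / dt_s if i > 0 else 0.0
        if 200.0 <= g <= 800.0 and rate >= 50.0:
            # Transient rule: 200-800 W/m^2 and change rate >= 50 W/(m^2*s)
            labels[i] = "transient"
        elif i + 1 >= window:
            # Steady rule: whole trailing window in 400-1000 W/m^2,
            # with a spread of at most 20 W/m^2
            recent = irradiance[i + 1 - window : i + 1]
            lo, hi = min(recent), max(recent)
            if lo >= 400.0 and hi <= 1000.0 and hi - lo <= 20.0:
                labels[i] = "steady"
    return labels
```

Samples labelled `None` correspond to data that would be discarded; exclusion of inverter start-up, shutdown and power-limited periods would need inverter status data and is not shown.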
[0060] As shown in Figure 1:
[0061] The data preprocessing module performs operating condition disturbance decoupling by employing a fixed-frequency sliding window differential filtering algorithm to eliminate periodic current disturbances of a fixed frequency of 1-2Hz during the inverter MPPT optimization process, while retaining the non-periodic timing characteristics driven by the operating conditions. Data standardization processing uses the Z-score standardization method to eliminate the interference of absolute irradiance and ambient temperature fluctuations on feature extraction. The same-condition benchmark string cluster is a set of normal strings within the same power plant with the same component model, installation orientation, installation tilt angle, and commissioning years. This embodiment defines the core rules for operating condition disturbance decoupling, data standardization, and benchmark cluster construction. By eliminating the inherent periodic disturbances of the inverter MPPT optimization through fixed-frequency sliding window filtering, eliminating the interference of global operating condition fluctuations such as irradiance and temperature through Z-score standardization, and constructing an unbiased feature benchmark through the same-condition benchmark cluster, the core problems of no reference for single strings and the inability to compare features horizontally due to operating condition interference are solved.
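A rough Python sketch of the three preprocessing steps follows. The fixed-frequency sliding window differential filtering algorithm itself is not detailed in the text, so a plain moving average over one disturbance period is used here as a stand-in (a moving average of N samples at sampling rate fs suppresses components at integer multiples of fs/N). The function names and the ledger field names (`string_id`, `model`, `orientation`, `tilt_deg`, `commission_year`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def moving_average(samples, window):
    """Sliding-window mean; a stand-in for the unspecified MPPT filter."""
    if len(samples) < window:
        return []
    acc = sum(samples[:window])
    out = [acc / window]
    for i in range(window, len(samples)):
        acc += samples[i] - samples[i - window]
        out.append(acc / window)
    return out


def z_score(values):
    """Z-score standardization: subtract the mean, divide by the std dev."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def benchmark_clusters(ledger):
    """Group strings (assumed already known to be normal) by ledger attributes."""
    clusters = defaultdict(list)
    for s in ledger:
        key = (s["model"], s["orientation"], s["tilt_deg"], s["commission_year"])
        clusters[key].append(s["string_id"])
    return dict(clusters)
```

With 10 Hz data, `moving_average(current, 10)` removes a pure 1 Hz or 2 Hz ripple; disturbances at other frequencies in the 1-2 Hz band would need a different filter design.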
[0062] As shown in Figure 1:
[0063] The current-irradiation response hysteresis difference feature is obtained by calculating the peak response hysteresis time of the target string current time series and the synchronous irradiation time series using a cross-correlation function, and then subtracting it from the baseline hysteresis time of the benchmark string cluster under the same operating conditions. The current time series nonlinear distortion feature is obtained by decomposing the current time series data using Hilbert-Huang transform to obtain the intermediate frequency component, and then calculating the fractal box dimension and sample entropy value of the intermediate frequency component. The benchmark cluster feature dispersion feature is obtained by calculating the dispersion of the target string feature and the baseline feature of the benchmark string cluster using Mahalanobis distance, and then standardizing it. This embodiment defines three core extraction rules for dynamic response features. The current-irradiation response hysteresis difference corresponds to the physical characteristic of the early microcrack component carrier transport efficiency decline. The current time series nonlinear distortion corresponds to the output characteristic distortion law caused by the damage of the component PN junction. The benchmark cluster feature dispersion realizes the operating condition normalization of the feature. The three types of features are progressively enhanced to accurately capture the micro-amplitude fault features hidden in the string-level data.
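Two of the dynamic response features can be illustrated with a short Python sketch: the response lag found from the peak of a cross-correlation, and a Mahalanobis distance to the benchmark cluster. The sketch assumes a two-feature vector so that the covariance matrix can be inverted in closed form; the function names are hypothetical, and the Hilbert-Huang decomposition, fractal box dimension and sample entropy are not shown.

```python
import math
from statistics import mean


def response_lag_samples(irradiance, current, max_lag):
    """Lag (in samples) at which current best follows earlier irradiance."""
    g_mean, c_mean = mean(irradiance), mean(current)
    g = [x - g_mean for x in irradiance]
    c = [x - c_mean for x in current]
    best_lag, best_val = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Unnormalized cross-correlation at this lag
        val = sum(g[k] * c[k + lag] for k in range(len(g) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def hysteresis_difference(lag_samples, dt_s, baseline_lag_s):
    """Target lag minus the baseline lag of the benchmark cluster (seconds)."""
    return lag_samples * dt_s - baseline_lag_s


def mahalanobis_2d(x, cluster):
    """Mahalanobis distance of a 2-feature point from a cluster of 2-feature points."""
    n = len(cluster)
    mx = mean(p[0] for p in cluster)
    my = mean(p[1] for p in cluster)
    sxx = sum((p[0] - mx) ** 2 for p in cluster) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in cluster) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in cluster) / (n - 1)
    det = sxx * syy - sxy * sxy  # must be non-zero for the inverse to exist
    dx, dy = x[0] - mx, x[1] - my
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)
```

For 10 Hz data, a lag of 3 samples corresponds to 0.3 s, so `hysteresis_difference(3, 0.1, 0.1)` gives a hysteresis difference of about 0.2 s relative to a hypothetical 0.1 s baseline.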
[0064] As shown in Figure 1:
[0065] The maximum power point slope distortion feature is obtained by fitting the curve slope change rate of the ±5% voltage range near the maximum power point in the IV sweep frequency segment using the least squares method, and calculating its relative deviation from the benchmark value of the benchmark string cluster under the same operating conditions. The fill factor relative decay feature is obtained by calculating the relative decay of the measured fill factor of the target string and the benchmark fill factor based on the irradiance-temperature-fill factor benchmark fitting model constructed by the benchmark string cluster. This embodiment defines two core extraction rules for static characteristic features. The maximum power point slope distortion corresponds to the local distortion law of the IV curve caused by the partial microcrack of the PN junction of the module, and the fill factor relative decay corresponds to the overall output performance degradation characteristics caused by the early microcrack of the module. Both types of features reuse idle IV sweep frequency segment data generated by the inverter MPPT daily optimization, without triggering a full string IV scan, and do not affect the normal power generation of the power plant at all.
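The static features can be sketched as follows. As a simplification, the sketch fits a straight line to the I-V points within ±5% of the maximum power point voltage and uses its slope, rather than the slope change rate named in the text. The function names are hypothetical, and the benchmark values are taken as given inputs because the irradiance-temperature-fill factor fitting model is not specified in detail.

```python
def mpp_window_slope(voltage, current, window=0.05):
    """Least-squares slope dI/dV over points within +/-5% of the MPP voltage."""
    powers = [v * i for v, i in zip(voltage, current)]
    k = max(range(len(powers)), key=powers.__getitem__)
    v_mpp = voltage[k]
    pts = [(v, i) for v, i in zip(voltage, current)
           if abs(v - v_mpp) <= window * v_mpp]
    n = len(pts)
    mv = sum(v for v, _ in pts) / n
    mi = sum(i for _, i in pts) / n
    sxx = sum((v - mv) ** 2 for v, _ in pts)  # needs >= 2 distinct voltages
    sxy = sum((v - mv) * (i - mi) for v, i in pts)
    return sxy / sxx


def relative_deviation(value, benchmark):
    """Deviation of a target-string value from the benchmark cluster value."""
    return (value - benchmark) / abs(benchmark)


def fill_factor(p_max, v_oc, i_sc):
    """Fill factor FF = Pmax / (Voc * Isc)."""
    return p_max / (v_oc * i_sc)


def fill_factor_relative_decay(ff_measured, ff_benchmark):
    """Relative drop of the measured fill factor below the benchmark."""
    return (ff_benchmark - ff_measured) / ff_benchmark
```

For instance, a measured fill factor of 0.76 against a benchmark of 0.80 corresponds to a relative decay of about 5%.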
[0066] As shown in Figure 1:
[0067] The anomaly identification and cross-validation module performs the initial anomaly screening by constructing an unsupervised anomaly screening model using an improved isolation forest algorithm. The training set of the model uses the dynamic response feature dataset of normal strings of the same model within one year of operation in the power plant. When the anomaly score output by the model for the target string within a single transient effective interval is ≥0.85 and all three types of dynamic response features exceed the benchmark confidence interval of the benchmark string cluster under the same operating conditions, the interval is judged to be a single-interval valid anomaly. Strings with a valid anomaly interval ratio of ≥80% within a continuous 7-day sliding time window are included in the initial screening list of abnormal strings. This embodiment uses the improved isolation forest algorithm to construct the model, which can be trained using only the dynamic feature data of newly commissioned normal strings, without the need for massive fault-labeled samples, and therefore suits the reality that most photovoltaic power plants have no labeled data for early faults. At the same time, the single-interval judgment rule and the 7-day sliding time window rule filter out false anomalies caused by occasional operating condition fluctuations.
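The two-stage screening rule can be illustrated independently of the model. In the Python sketch below, the anomaly score of each transient effective interval is assumed to be already available as a number in [0, 1]; how the improved isolation forest produces it is not shown. The record layout (`time`, `score`, `exceeds`) and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def is_valid_anomaly(score, exceeds, score_threshold=0.85):
    """Single-interval rule: score >= 0.85 AND all three features out of range."""
    return score >= score_threshold and all(exceeds)


def initially_screened(intervals, window_end, window_days=7, ratio_threshold=0.80):
    """7-day rule: flag the string if >= 80% of its intervals are valid anomalies.

    intervals: list of dicts with keys 'time' (datetime), 'score' (float) and
    'exceeds' (three booleans, one per dynamic response feature).
    """
    start = window_end - timedelta(days=window_days)
    in_window = [iv for iv in intervals if start < iv["time"] <= window_end]
    if not in_window:
        return False
    hits = sum(is_valid_anomaly(iv["score"], iv["exceeds"]) for iv in in_window)
    return hits / len(in_window) >= ratio_threshold
```

Requiring both conditions per interval, and then a high ratio over the window, means a single cloudy afternoon with a few high scores does not put a string on the list.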
[0068] As shown in Figure 1:
[0069] The cross-validation and final confirmation performed by the anomaly identification and cross-validation module specifically involves adding strings from the initial screening of abnormal strings to the early warning candidate pool if both types of static characteristics continuously exceed the baseline confidence interval of the benchmark string cluster under the same operating conditions within a 7-day sliding time window. For strings in the early warning candidate pool, the timing synchronization of dynamic response characteristics and static characteristics is verified. If the timing of the two types of abnormal characteristics is highly matched, the string is finally confirmed as an early microcrack micro-mismatch fault string of the photovoltaic module. In this embodiment, the initial screening of abnormal strings is first verified by static characteristics, and then the final confirmation is completed by timing synchronization verification of dynamic and static characteristics. The core logic is that external operating condition disturbances only affect the dynamic response characteristics of the module and do not change the static output characteristics of the module itself. Only microcrack damage to the module itself will cause the timing matching anomaly of the two types of characteristics to occur simultaneously.
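The confirmation logic can be sketched as below. The text does not define when the timing of the two feature types counts as "highly matched", so this sketch assumes a Jaccard overlap between the sets of days on which each feature type was abnormal, with a hypothetical threshold of 0.8. All function and parameter names are illustrative only.

```python
def static_features_confirmed(daily_static_flags):
    """True if both static features exceed their range on every day in the window.

    daily_static_flags: list of (slope_exceeds, fill_factor_exceeds) per day.
    """
    return bool(daily_static_flags) and all(a and b for a, b in daily_static_flags)


def timing_match(dynamic_days, static_days):
    """Jaccard overlap of abnormal days (an assumed measure of synchronization)."""
    union = dynamic_days | static_days
    if not union:
        return 0.0
    return len(dynamic_days & static_days) / len(union)


def confirm_fault(daily_static_flags, dynamic_days, static_days, match_threshold=0.8):
    """Final confirmation: static cross-validation plus timing synchronization."""
    return (static_features_confirmed(daily_static_flags)
            and timing_match(dynamic_days, static_days) >= match_threshold)
```

A string whose dynamic anomalies appear on only two of seven days while its static features are abnormal all week would fail the timing check in this sketch, which mirrors the idea that an external disturbance does not produce a synchronized anomaly in both feature types.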
[0070] As shown in Figure 1:
[0071] The early warning output unit uses the entropy weight method to determine the weight of each core feature and constructs a micro-mismatch severity index with a value range of 0-10. The graded early warning rule is that a severity index of 3-5 is a level 1 early warning, 5-8 is a level 2 early warning, and 8-10 is a level 3 early warning. The early warning information and diagnostic results include the unique identifier of the fault string, the early warning level, the severity index, the core feature data, and the handling priority. In this embodiment, the entropy weight method is used to objectively determine the weight of each core feature and construct a micro-mismatch severity index in the range of 0-10, avoiding the subjective bias of manual weighting. At the same time, a three-level graded early warning rule is set to match the operation and maintenance handling strategies corresponding to different fault degrees, and output diagnostic results containing all dimensions of fault information to form a complete operation and maintenance closed loop.
[0072] Embodiment 2 is illustrated below with a specific case:
[0073] The system is applied to a 10MW ground-mounted string photovoltaic power station. The power station is equipped with 10 string inverters with a rated power of 500kW. Each inverter is connected to 20 photovoltaic strings, and each string is connected in series with 22 PERC monocrystalline silicon photovoltaic modules. The peak power of the modules is 550W. The overall service life of the power station is 3 years.
[0074] All inverters are equipped with DC-side high-frequency data acquisition capability of 10Hz and above, and generate small-amplitude IV sweep frequency segment data every 5 minutes during the MPPT optimization process.
[0075] 1. First, multi-source data collection is performed. Specific data sources include:
[0076] (1) 10Hz string-level high-frequency current and DC voltage synchronous timing data collected on the DC side of the string inverter, collected continuously for 24 hours.
[0077] (2) Small-amplitude IV sweep frequency segment data periodically generated during the inverter MPPT optimization process, with the voltage sweep range of a single segment being ±10% of the maximum power point voltage, and ≥50 valid data points per segment;
[0078] (3) Second-level total horizontal irradiance and ambient temperature data collected by the power station's meteorological station;
[0079] (4) Basic data of the string, including the component model corresponding to the string, the number of components in the string, the installation orientation (due south), the installation tilt angle (25°), and the commissioning time.
[0080] 2. Then, data preprocessing is performed, and the specific process is as follows:
[0081] Valid data range filtering, with strict adherence to filtering rules:
[0082] Transient effective range: time series with irradiance between 200 W/m² and 800 W/m² and an irradiance change rate ≥ 50 W/(m²·s) are selected;
[0083] Steady-state effective range: time series with irradiance between 400 W/m² and 1000 W/m² and an irradiance fluctuation within 5 minutes ≤ 20 W/m² are selected;
[0084] Invalid data from nighttime, inverter start / stop, power-limited operation, and irradiation exceeding the range are simultaneously removed;
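The interval filtering rules above can be sketched as a small classifier. This is a minimal illustration rather than the applicant's implementation: the function name `classify_interval`, the use of the largest sample-to-sample step as the "change rate", and the use of the peak-to-peak range as the "fluctuation" are all assumptions. Removal of nighttime, start/stop, and power-limited data is not modeled here.

```python
from typing import Sequence


def classify_interval(irradiance: Sequence[float], dt_s: float = 1.0) -> str:
    """Label one window of irradiance samples (W/m^2) as
    'transient', 'steady' or 'invalid'."""
    if len(irradiance) < 2:
        return "invalid"
    lo, hi = min(irradiance), max(irradiance)
    # Assumption: "change rate" = largest absolute step between
    # consecutive samples, in W/(m^2*s).
    max_rate = max(
        abs(b - a) / dt_s for a, b in zip(irradiance, irradiance[1:])
    )
    # Assumption: "fluctuation" = peak-to-peak range over the window.
    fluctuation = hi - lo
    if lo >= 200 and hi <= 800 and max_rate >= 50:
        return "transient"
    if lo >= 400 and hi <= 1000 and fluctuation <= 20:
        return "steady"
    return "invalid"
```

For the steady-state rule, the caller would pass a 5-minute window (300 samples at one sample per second).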
[0085] Operating condition disturbance decoupling: A fixed-frequency sliding window differential filtering algorithm is adopted to eliminate the periodic current disturbance of 1.5Hz fixed frequency during the inverter MPPT optimization process, while retaining the non-periodic timing characteristics of the operating condition drive. The filtering window length is set to 20 sampling points.
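The filtering algorithm named above is not specified in detail. As a hedged sketch, a plain moving average is swapped in here as a stand-in (an assumption, not the algorithm of this application) to show why a 20-point window suits a 1.5 Hz disturbance sampled at 10 Hz: 20 samples span 2 s, which is exactly three periods of the disturbance, so the disturbance averages to zero over every window. The signal values are synthetic.

```python
import math
from typing import List, Sequence


def moving_average(signal: Sequence[float], window: int = 20) -> List[float]:
    """Mean of every run of `window` consecutive samples (valid part only)."""
    if len(signal) < window:
        return []
    acc = sum(signal[:window])
    out = [acc / window]
    for k in range(window, len(signal)):
        acc += signal[k] - signal[k - window]  # slide the window by one sample
        out.append(acc / window)
    return out


FS_HZ = 10.0  # sampling rate stated in the embodiment
# Synthetic string current: 8 A level plus a 0.2 A ripple at 1.5 Hz.
current = [
    8.0 + 0.2 * math.sin(2 * math.pi * 1.5 * n / FS_HZ) for n in range(200)
]
filtered = moving_average(current)  # ripple cancels, 8 A level remains
```

A plain moving average also smooths the non-periodic content that the embodiment wants to retain, so a production filter would need to be more selective than this sketch.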
[0086] Data standardization: The Z-score standardization method is used to normalize the time-series data to eliminate the interference of absolute irradiance and ambient temperature fluctuations on feature extraction. The calculation formula is as follows:
[0087] $$x^{*} = \frac{x - \mu}{\sigma}$$
[0088] where $x^{*}$ is the standardized feature value, $x$ is the original value of the feature, $\mu$ is the mean of the corresponding feature over the benchmark string cluster, and $\sigma$ is the standard deviation of the corresponding feature over the benchmark string cluster;
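A minimal sketch of the Z-score standardization against the benchmark string cluster follows. The function name `z_score` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which is intended.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(value: float, benchmark_values: Sequence[float]) -> float:
    """Standardize one feature value against the benchmark cluster."""
    mu = mean(benchmark_values)
    sigma = pstdev(benchmark_values)  # assumption: population std. dev.
    if sigma == 0:
        raise ValueError("benchmark cluster has zero spread for this feature")
    return (value - mu) / sigma
```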
[0089] Construction of a benchmark string cluster under the same operating conditions: A set of normal strings with the same component model, installation orientation, installation tilt angle, and commissioning year within the same power plant is constructed as a characteristic benchmark reference system. In this embodiment, 120 normal strings with a commissioning year of less than 1 year and no microcrack faults detected by EL are selected to form a benchmark string cluster.
[0090] 3. Next, dual-dimensional feature extraction is performed, which is executed by the dual-dimensional feature extraction module and is divided into two parallel branches, as follows:
[0091] Dynamic response feature extraction (based on transient effective interval data) extracts three types of core features, and the specific calculation method is as follows:
[0092] (1) Characteristics of current-irradiation response hysteresis:
[0093] The peak response lag time of the target string current timing and the synchronous irradiation timing is calculated using a cross-correlation function, and then subtracted from the baseline lag time of the benchmark string cluster. The calculation formula is as follows:
[0094] $$\Delta\tau = \tau_{\text{target}} - \tau_{\text{base}}$$
[0095] where $\Delta\tau$ is the current-irradiation response hysteresis difference, in milliseconds; $\tau_{\text{target}}$ is the peak response lag time between the target string current time series and the synchronous irradiance time series, in milliseconds; and $\tau_{\text{base}}$ is the baseline lag time of the benchmark string cluster under the same operating conditions, which in this embodiment is ...;
[0096] In this embodiment, the hysteresis difference calculated from the measured peak response lag time of target string A exceeds the baseline confidence interval (0-20 ms) of the benchmark cluster.
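The lag estimate described above can be sketched with a direct cross-correlation search. This is an illustration under assumptions: the names `peak_lag_ms` and `hysteresis_difference_ms` are hypothetical, only non-negative lags are searched (the current is assumed to follow the irradiance), and the resolution is one sample period, so resolving a few milliseconds would need a higher sampling rate or interpolation around the peak.

```python
from statistics import mean
from typing import Sequence


def peak_lag_ms(
    irradiance: Sequence[float],
    current: Sequence[float],
    fs_hz: float = 10.0,
    max_lag: int = 20,
) -> float:
    """Lag (ms) at which the current best matches the delayed irradiance."""
    n = min(len(irradiance), len(current))
    g_mean, c_mean = mean(irradiance[:n]), mean(current[:n])
    g = [v - g_mean for v in irradiance[:n]]
    c = [v - c_mean for v in current[:n]]
    best_lag, best_val = 0, float("-inf")
    for lag in range(min(max_lag, n - 1) + 1):
        # Correlate current[t + lag] with irradiance[t].
        val = sum(g[t] * c[t + lag] for t in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag * 1000.0 / fs_hz


def hysteresis_difference_ms(target_lag_ms: float, baseline_lag_ms: float) -> float:
    """Target lag minus the benchmark cluster's baseline lag."""
    return target_lag_ms - baseline_lag_ms


# Synthetic check: the current is the irradiance pulse delayed by 3 samples.
irr = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
cur = [0.0] * 13 + [1.0] * 10 + [0.0] * 7
lag = peak_lag_ms(irr, cur)  # 3 samples at 10 Hz
```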
[0097] (2) Current time-series nonlinear distortion characteristics
[0098] The Hilbert-Huang transform is performed on the current time series data within the transient effective interval to decompose it into multiple intrinsic mode functions. After removing high-frequency noise components and low-frequency irradiance trend components, the intermediate frequency components are extracted. The fractal box dimension and sample entropy value of the intermediate frequency components are calculated to form a nonlinear distortion index. In this embodiment, the nonlinear distortion of the target string A is 2.37, which exceeds the benchmark confidence interval (0.8-1.5) of the benchmark cluster.
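Of the steps above, the sample entropy calculation is the easiest to illustrate in isolation. The sketch below covers only that step; the Hilbert-Huang decomposition, the fractal box dimension, and the way the two values are combined into one index are not shown. The parameters `m = 2` and `r = 0.2 × standard deviation` are common defaults and are assumptions here, not values taken from this application.

```python
import math
import random
from statistics import pstdev
from typing import Sequence


def sample_entropy(x: Sequence[float], m: int = 2, r_factor: float = 0.2) -> float:
    """SampEn(m, r) = -ln(A / B), where B and A count template pairs of
    length m and m + 1 whose Chebyshev distance is within r."""
    n = len(x)
    r = r_factor * pstdev(x)

    def matches(length: int) -> int:
        # The same number of templates (n - m) is used for both lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = matches(m)
    a = matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # no repeated patterns at this tolerance
    return -math.log(a / b)


# A strictly periodic signal is far more regular than random noise.
regular = [math.sin(2 * math.pi * k / 10) for k in range(120)]
rng = random.Random(0)
irregular = [rng.random() for _ in range(120)]
```

A higher sample entropy indicates a less predictable series, which matches the intent of treating it as a distortion indicator.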
[0099] (3) Benchmark cluster feature dispersion characteristics:
[0100] The dispersion of the target string features and the benchmark string cluster features is calculated using Mahalanobis distance, and then standardized. The calculation formula is as follows:
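[0101] $$D_M = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \, \boldsymbol{\Sigma}^{-1} \, (\mathbf{x} - \boldsymbol{\mu})}$$
[0102] where $D_M$ is the Mahalanobis distance (feature dispersion), $\mathbf{x}$ is the feature vector of the target string, $\boldsymbol{\mu}$ is the feature mean vector of the benchmark string cluster, and $\boldsymbol{\Sigma}$ is the feature covariance matrix of the benchmark string cluster;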
[0103] In this embodiment, the feature dispersion (Mahalanobis distance) of the target string A is 4.62, which exceeds the baseline confidence interval (0-2.5) of the benchmark cluster.
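A dependency-free sketch of the Mahalanobis distance is given below. Instead of inverting the covariance matrix explicitly, it solves the linear system by Gaussian elimination. The helper names `solve` and `mahalanobis` are hypothetical, and the subsequent standardization step mentioned in the text is not shown.

```python
import math
from typing import List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (m[i][n] - s) / m[i][i]
    return y


def mahalanobis(
    x: Sequence[float], mu: Sequence[float], cov: Sequence[Sequence[float]]
) -> float:
    """sqrt((x - mu)^T cov^{-1} (x - mu))."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(cov, d)  # y = cov^{-1} d
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))
```

With an identity covariance matrix the result reduces to the Euclidean distance, which gives a quick sanity check.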
[0104] Static characteristic feature extraction (based on steady-state effective interval data) extracts two types of core features, and the specific calculation method is as follows:
[0105] (1) Characteristics of slope distortion at maximum power point:
[0106] For the IV sweep frequency segment data within the steady-state effective range, the curve segment within the ±5% voltage range near the maximum power point is extracted. The slope change rate of the curve in this range is fitted using the least squares method, and its relative deviation from the benchmark value of the benchmark string cluster is calculated to obtain the slope distortion index. In this embodiment, the slope distortion of the target string A is 18.7%, which exceeds the benchmark confidence interval of the benchmark string cluster (-5% to 5%).
[0107] (2) Relative decay characteristics of fill factor:
[0108] Based on the benchmark string cluster, an irradiation-temperature-fill factor benchmark fitting model is constructed to calculate the relative attenuation of the measured fill factor of the target string compared with the benchmark fill factor. The calculation formula is as follows:
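[0109] $$\Delta FF = \frac{FF_{\text{base}} - FF_{\text{meas}}}{FF_{\text{base}}} \times 100\%$$
[0110] where $\Delta FF$ is the relative decay of the fill factor, expressed as a percentage (%); $FF_{\text{base}}$ is the baseline fill factor for the current operating conditions, output by the baseline fitting model; and $FF_{\text{meas}}$ is the measured fill factor of the target string;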
[0111] In this embodiment, the measured fill factor of target string A is 72.3% and the baseline fill factor is 77.1%, giving a relative decay of 6.23%, which exceeds the baseline confidence interval (0-3%) of the benchmark cluster.
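The figures of this embodiment can be checked with a short calculation. The function name is hypothetical; the formula is the relative difference between the baseline and measured fill factors, which reproduces the stated 6.23%.

```python
def fill_factor_relative_decay_pct(ff_measured: float, ff_baseline: float) -> float:
    """Relative decay of the measured fill factor versus the baseline, in %."""
    return (ff_baseline - ff_measured) / ff_baseline * 100.0


# Values from this embodiment: measured 72.3 %, baseline 77.1 %.
decay = fill_factor_relative_decay_pct(72.3, 77.1)
exceeds_band = decay > 3.0  # upper bound of the 0-3 % confidence interval
```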
[0112] 4. Next, anomaly detection and cross-validation are performed. This is done through the anomaly detection and cross-validation module, and the specific process is as follows:
[0113] (1) Unsupervised initial screening of anomalies:
[0114] An unsupervised anomaly screening model is constructed using an improved isolation forest algorithm. The training set of the model is the dynamic response feature dataset of the above 120 normal strings. The number of trees is set to 100 and the subsample size is set to 256. Model inference outputs the anomaly score of the target string, with a score range of 0-1 and the anomaly judgment threshold set to 0.85.
[0115] Judgment rules:
[0116] If, within a single transient valid interval, the target string anomaly score is ≥0.85 and all three types of dynamic response features exceed the benchmark confidence interval of the benchmark cluster, it is judged as a single interval valid anomaly.
[0117] Within a 7-day sliding time window, strings with an effective abnormal interval ratio of ≥80% are included in the initial screening list of abnormal strings.
[0118] In this embodiment, the effective abnormal interval within the 7-day sliding window of target string A accounts for 92%, and is included in the initial screening abnormal string list.
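The two judgment rules above can be sketched as follows, taking the anomaly scores as already produced by the screening model. The class and function names are hypothetical, "exceeds the confidence interval" is read as "falls outside the band" (an assumption), and the lag-difference value of 35 ms in the usage lines is illustrative only, since the source does not preserve the measured value.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class TransientInterval:
    anomaly_score: float                  # model output in the range 0-1
    features: Tuple[float, float, float]  # lag diff (ms), distortion, dispersion


# Benchmark confidence intervals quoted in this embodiment.
BENCHMARK_BANDS = ((0.0, 20.0), (0.8, 1.5), (0.0, 2.5))


def is_valid_anomaly(iv: TransientInterval, threshold: float = 0.85) -> bool:
    """Rule 1: score >= threshold and all three features outside their band."""
    all_outside = all(
        not (lo <= f <= hi) for f, (lo, hi) in zip(iv.features, BENCHMARK_BANDS)
    )
    return iv.anomaly_score >= threshold and all_outside


def passes_initial_screening(
    intervals: Sequence[TransientInterval], min_ratio: float = 0.80
) -> bool:
    """Rule 2: share of valid anomalies in the 7-day window >= min_ratio."""
    if not intervals:
        return False
    valid = sum(is_valid_anomaly(iv) for iv in intervals)
    return valid / len(intervals) >= min_ratio


abnormal = TransientInterval(0.90, (35.0, 2.37, 4.62))  # 35.0 is illustrative
normal = TransientInterval(0.40, (10.0, 1.0, 1.2))
week = [abnormal] * 92 + [normal] * 8  # 92 % valid anomalies
```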
[0119] (2) Two-dimensional cross-validation and final confirmation:
[0120] Cross-validation: For strings in the initial abnormal string list, if both types of static characteristics continuously exceed the baseline confidence interval of the benchmark cluster within a 7-day sliding time window, they are included in the early warning candidate pool; in this embodiment, both types of static characteristics of target string A continuously exceed the baseline interval, so it enters the early warning candidate pool.
[0121] Timing synchronization verification: For strings in the early warning candidate pool, the synchronization of the abnormal timing of the dynamic response characteristics and the static characteristics is verified. If the overlap of the abnormal timing of the two types of characteristics is ≥90%, it is judged as a high timing match and the string is finally confirmed as a string with an early microcrack micro-mismatch fault in photovoltaic modules.
[0122] In this embodiment, the abnormal timing overlap of the target string A is 94%, and it is finally confirmed as a faulty string.
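The confirmation logic can be sketched as below. The text does not define how the overlap is computed; here it is assumed to be the number of time slots flagged by both feature types divided by the number flagged by either (an assumption). All names are hypothetical.

```python
from typing import Set


def timing_overlap(dynamic_abnormal: Set[int], static_abnormal: Set[int]) -> float:
    """Share of abnormal time slots flagged by both feature types.
    Assumption: overlap = |intersection| / |union| of the slot indices."""
    union = dynamic_abnormal | static_abnormal
    if not union:
        return 0.0
    return len(dynamic_abnormal & static_abnormal) / len(union)


def confirm_fault(
    in_initial_list: bool,
    static_both_exceed: bool,
    overlap: float,
    min_overlap: float = 0.90,
) -> bool:
    """Candidate pool check followed by the timing synchronization check."""
    in_candidate_pool = in_initial_list and static_both_exceed
    return bool(in_candidate_pool and overlap >= min_overlap)


dynamic_slots = set(range(100))
static_slots = set(range(6, 100))
overlap = timing_overlap(dynamic_slots, static_slots)  # 94 of 100 slots shared
```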
[0123] 5. Finally, the severity of the fault is quantified and an early warning is output. This is executed through the early warning output unit, as follows:
[0124] The entropy weight method is used to determine the weights of each core feature, and the Micro Mismatch Severity Index (MSI) is constructed. The calculation formula is as follows:
[0125]
[0126] where $MSI$ is the micro-mismatch severity index, with a value ranging from 0 to 10; $w_i$ is the weight corresponding to the $i$-th core feature, determined by the entropy weight method; and $x_i$ is the standardized value of the $i$-th core feature. The core features are: current-irradiation response hysteresis difference, current time-series nonlinear distortion, benchmark cluster feature dispersion, maximum power point slope distortion, and fill factor relative attenuation.
[0127] In this embodiment, the weights of the five features calculated using the entropy weight method are 0.28, 0.22, 0.18, 0.17, and 0.15, respectively, and the standardized feature values of the target string A are 1.8, 1.74, 1.85, 1.74, and 1.74, respectively, giving a micro-mismatch severity index of 4.21;
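For readers unfamiliar with the entropy weight method, a minimal sketch of the weight calculation follows. It uses the textbook form of the method (column proportions, Shannon entropy scaled by 1/ln n, weights proportional to 1 minus entropy) and assumes non-negative input values. The sample matrix is made up for illustration and is unrelated to the weights reported in this embodiment.

```python
import math
from typing import List, Sequence


def entropy_weights(rows: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method: rows are samples, columns are features.
    Features whose values vary more across samples get larger weights."""
    n = len(rows)
    if n < 2:
        raise ValueError("at least two samples are required")
    k = 1.0 / math.log(n)
    divergences = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        total = sum(column)
        p = [v / total for v in column]  # share of each sample in the column
        entropy = -k * sum(pi * math.log(pi) for pi in p if pi > 0)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]


# Illustrative data: feature 0 is constant, feature 1 varies most.
samples = [
    [1.0, 1.0, 2.0],
    [1.0, 3.0, 2.5],
    [1.0, 5.0, 3.0],
]
weights = entropy_weights(samples)
```

A feature that is identical across all samples carries no discriminating information and receives a weight of approximately zero.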
[0128] The tiered early warning rules are as follows:
[0129] Level 1 warning: If the severity index is in the 3-5 range, the fault is determined to be an early micro-mismatch fault.
[0130] Level 2 warning: If the severity index is in the 5-8 range, the fault is determined to be a mid-term mismatch fault.
[0131] Level 3 warning: If the severity index is in the 8-10 range, the fault is determined to be a severe mismatch fault.
[0132] In this embodiment, the severity index of target string A is 4.21, triggering a Level 1 warning. The system outputs warning information and diagnostic results, including the unique identifier of the fault string, warning level, severity index, core characteristic data, and handling priority.
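The graded rule can be sketched as a simple lookup. The stated ranges share their endpoints (5 and 8), and the text does not say which level a boundary value belongs to; the sketch assumes the higher level applies at a shared boundary and that an index below 3 produces no warning. The function name is hypothetical.

```python
from typing import Optional


def warning_level(msi: float) -> Optional[int]:
    """Map a micro-mismatch severity index (0-10) to a warning level.
    Assumption: a boundary value (5 or 8) belongs to the higher level."""
    if not 0.0 <= msi <= 10.0:
        raise ValueError("severity index must lie in the range 0-10")
    if msi >= 8.0:
        return 3  # severe mismatch fault
    if msi >= 5.0:
        return 2  # mid-term mismatch fault
    if msi >= 3.0:
        return 1  # early micro-mismatch fault
    return None   # below the warning range


level = warning_level(4.21)  # severity index of target string A
```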
[0133] In summary, as shown in Figures 2-4, Embodiment 2 of this application provides a photovoltaic power station fault early warning and information diagnosis system based on big data. On-site EL testing of target string A confirmed that two modules in the string have early hidden crack faults, which is completely consistent with the system's early warning result. After the maintenance personnel replace the faulty modules, the handling result is fed back to the system. The system updates the model training set and the benchmark parameters of the benchmark cluster, completing the model's self-iterative optimization.
[0134] In this embodiment, the system outputs 24 sets of fault warnings within a 30-day monitoring period. After on-site EL testing and verification, 23 sets of warnings showed early microcracks in the components, with an identification accuracy of 95.83% and a false alarm rate of 0.87% under steady-state conditions. The system can identify early microcracks in the components 10 months in advance, fully realizing the technical effect of the present invention and verifying the feasibility and superiority of the solution.
[0135] In the description of this specification, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of that feature. In the description of this application, "multiple" means at least two, such as two, three, etc., unless otherwise explicitly specified.
[0136] In the description of this specification, the references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., refer to specific features, structures, materials, or characteristics described in connection with that embodiment or example, which are included in at least one embodiment or example of this application. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples. Moreover, without contradiction, those skilled in the art can combine and integrate the different embodiments or examples described in this specification, as well as the features of different embodiments or examples.
[0137] Although embodiments of this application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting this application. Those skilled in the art can make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.
Claims
1. A photovoltaic power plant fault early warning and information diagnosis system based on big data, characterized in that, It includes a data acquisition unit, a data processing unit, and an early warning output unit. The data processing unit includes a data preprocessing module, a two-dimensional feature extraction module, and an anomaly identification and cross-validation module that are connected in sequence. The data preprocessing module collects native multi-source time-series data and string basic ledger data of string photovoltaic power stations, performs effective data interval filtering, operating condition disturbance decoupling, data standardization processing, and constructs a string cluster with the same operating conditions. The native multi-source timing data includes 10Hz and above string-level high-frequency current and DC voltage synchronization timing data collected on the DC side of the string inverter, small-amplitude IV sweep frequency segment data periodically generated during the inverter MPPT optimization process, and power station-level second-level horizontal total irradiance and ambient temperature acquisition data. The basic ledger data for the string includes the component model corresponding to the string, the number of components in the string, the installation orientation, the installation tilt angle, and the commissioning time. The dual-dimensional feature extraction module extracts two types of specific fault features. The first type is the dynamic response features within the effective range of irradiation transient, including current and irradiation response hysteresis difference features, current time series nonlinear distortion features, and benchmark cluster feature dispersion features. The second type is the static characteristic features within the effective range of irradiation steady state, including maximum power point slope distortion features and fill factor relative decay features. 
The anomaly identification and cross-validation module is configured to perform initial screening of string anomalies based on the dynamic response features, perform cross-validation of the initially screened abnormal strings based on the static characteristic features, and complete the final confirmation of early microcrack micro-mismatch faults of photovoltaic modules through time synchronization verification. The early warning output unit is communicatively connected to the anomaly identification and cross-validation module, and is configured to perform fault severity quantification on the confirmed fault strings, generate graded early warning information and diagnostic results, and perform model self-iterative optimization based on the fault handling results.
2. The photovoltaic power plant fault early warning and information diagnosis system based on big data according to claim 1, characterized in that, The effective data range filtering performed by the data preprocessing module specifically involves filtering time-series intervals with irradiance intensity between 200W / m² and 800W / m² and irradiance variation rate ≥ 50W / m²·s as transient effective intervals, and filtering time-series intervals with irradiance intensity between 400W / m² and 1000W / m² and irradiance fluctuation amplitude ≤ 20W / m² within 5 minutes as steady-state effective intervals. Invalid data from nighttime, inverter start-up and shutdown, power-limited operation, and irradiance exceeding the range are simultaneously removed.
3. The photovoltaic power plant fault early warning and information diagnosis system based on big data according to claim 1, characterized in that, The data preprocessing module performs operating condition disturbance decoupling by using a fixed-frequency sliding window differential filtering algorithm to remove periodic current disturbances of a fixed frequency of 1-2Hz during the inverter MPPT optimization process, while retaining the non-periodic timing characteristics of the operating condition drive. The data standardization process uses the Z-score standardization method to eliminate the interference of absolute irradiance and ambient temperature fluctuations on feature extraction. The same operating condition benchmark string cluster is a set of normal strings within the same power plant with the same component model, installation orientation, installation tilt angle, and commissioning years.
4. The photovoltaic power plant fault early warning and information diagnosis system based on big data according to claim 1, characterized in that, The current and irradiation response hysteresis difference feature is obtained by calculating the peak response hysteresis time of the target string current time series and the synchronous irradiation time series using a cross-correlation function, and then subtracting it from the baseline hysteresis time of the benchmark string cluster under the same operating conditions. The current time series nonlinear distortion feature is obtained by decomposing the current time series data using Hilbert-Huang transform to obtain the intermediate frequency component, and then calculating the fractal box dimension and sample entropy value of the intermediate frequency component. The benchmark cluster feature dispersion feature is obtained by calculating the dispersion of the target string features and the benchmark features of the benchmark string cluster using Mahalanobis distance, and then standardizing the result.
5. The photovoltaic power plant fault early warning and information diagnosis system based on big data according to claim 1, characterized in that, The maximum power point slope distortion characteristic is obtained by fitting the curve slope change rate of the ±5% voltage range near the maximum power point in the IV sweep frequency band using the least squares method, and calculating its relative deviation from the benchmark value of the benchmark string cluster under the same operating conditions; the relative attenuation characteristic of the fill factor is obtained by calculating the relative attenuation of the measured fill factor of the target string and the benchmark fill factor based on the irradiation-temperature-fill factor benchmark fitting model constructed by the benchmark string cluster.
6. The photovoltaic power plant fault early warning and information diagnosis system based on big data according to claim 1, characterized in that the anomaly identification and cross-validation module performs an initial screening of anomalies by constructing an unsupervised anomaly screening model using an improved isolation forest algorithm. The training set of the model uses the dynamic response feature dataset of normal strings of the same model that have been in operation for less than one year within the power plant. When the model output anomaly score of the target string within a single transient effective interval is ≥0.85 and all three types of dynamic response features exceed the baseline confidence interval of the benchmark string cluster under the same operating conditions, it is determined to be a single-interval effective anomaly. Strings with an effective anomaly interval ratio of ≥80% within a continuous 7-day sliding time window are included in the initial screening list of abnormal strings.
7. The photovoltaic power plant fault early warning and information diagnosis system based on big data according to claim 1, characterized in that, The cross-validation and final confirmation performed by the anomaly identification and cross-validation module specifically involves the following steps: for strings in the initial screening list of abnormal strings, if both types of static characteristics continuously exceed the baseline confidence interval of the benchmark string cluster under the same operating conditions within a 7-day sliding time window, they are included in the early warning candidate pool. For strings in the early warning candidate pool, the time-series synchronization of dynamic response characteristics and static characteristics is verified. If the time-series anomalies of the two types of characteristics are highly matched, they are finally confirmed as early microcrack micro-mismatch fault strings of photovoltaic modules.
8. The photovoltaic power plant fault early warning and information diagnosis system based on big data according to claim 1, characterized in that, The early warning output unit uses the entropy weight method to determine the weight of each core feature and constructs a micro-mismatch severity index with a value range of 0-10. The graded early warning rule is that a severity index of 3-5 is a level 1 early warning, 5-8 is a level 2 early warning, and 8-10 is a level 3 early warning. The early warning information and diagnostic results include a unique identifier of the fault string, the early warning level, the severity index, core feature data, and the handling priority.