A method and system for calculating the power generation duration of a photovoltaic power station

By collecting solar irradiance and meteorological parameters of photovoltaic power plants, and combining them with real-time output current and voltage data, a power generation efficiency decay curve is constructed using physical-driven and data-driven models. Effective power generation periods are identified and predicted, solving the problem of calculation result deviation in existing technologies and realizing accurate calculation of the power generation duration of photovoltaic power plants.

CN122247337A · Pending · Publication Date: 2026-06-19 · ZHEJIANG ZHONGJIA ELECTRIC POWER TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
ZHEJIANG ZHONGJIA ELECTRIC POWER TECHNOLOGY CO LTD
Filing Date
2026-03-06
Publication Date
2026-06-19

Abstract

This invention discloses a method and system for calculating the power generation duration of a photovoltaic (PV) power station. The method includes: collecting time-series data of solar irradiance and meteorological parameter sequences of the area where the PV power station is located, and simultaneously acquiring real-time output current and voltage data at the PV string level; dynamically constructing the power generation efficiency decay curve of the PV string through a fusion model of physical and data-driven approaches based on the solar irradiance time-series data and meteorological parameter sequences; using the power generation efficiency decay curve to perform time-varying efficiency correction on the real-time output current and voltage data, generating a standardized power generation sequence that eliminates the influence of environmental fluctuations and equipment aging; calculating the duration for which the cumulative power exceeds a set threshold within each power generation day based on the standardized power generation sequence; and generating a power generation duration forecast result by combining historical power generation duration data and meteorological forecast information. Using this invention, accurate identification of power generation periods can be achieved, improving the accuracy and adaptability of PV power station power generation duration calculation.

Description

Technical Field

[0001] This invention belongs to the field of photovoltaic technology, specifically a method and system for calculating the power generation duration of a photovoltaic power station.

Background Technology

[0002] Accurate calculation of the power generation duration of photovoltaic (PV) power plants is crucial for plant operation and maintenance, power dispatch, and benefit assessment. Existing methods typically estimate it from fixed rated power or simple irradiance thresholds, but such static models are ill-suited to the complex real-world operating conditions of PV modules. In actual operation, PV modules are affected by multiple factors such as temperature, humidity, and dust accumulation. Furthermore, as years of operation accumulate, performance degradation due to equipment aging causes the actual output power to deviate increasingly from the theoretical value. Traditional calculation methods cannot dynamically track the time-varying performance degradation of modules and cannot effectively isolate the interference of environmental fluctuations and equipment aging with the power generation duration, resulting in significant discrepancies between calculated results and actual conditions. In addition, existing methods usually use fixed thresholds to determine the effective power generation period; such thresholds adapt poorly to changes across seasons and weather conditions, which reduces the accuracy and reliability of power generation duration assessment.

Summary of the Invention

[0003] The purpose of this invention is to provide a method and system for calculating the power generation duration of a photovoltaic power station, so as to overcome the shortcomings of the prior art, achieve accurate identification of power generation periods, and improve the accuracy and adaptability of calculating the power generation duration of a photovoltaic power station.

[0004] One embodiment of this application provides a method for calculating the power generation duration of a photovoltaic power plant, the method comprising: collecting time-series data of solar irradiance and meteorological parameter sequences in the area where the photovoltaic power station is located, and simultaneously acquiring real-time output current and voltage data at the photovoltaic string level; based on the solar irradiance time-series data and meteorological parameter sequences, dynamically constructing the power generation efficiency decay curve of the photovoltaic string through a fusion model of physics-driven and data-driven approaches, the curve representing the change in the actual conversion efficiency of the string under different time periods and meteorological conditions; using the power generation efficiency decay curve to perform time-varying efficiency correction on the real-time output current and voltage data, generating a standardized power generation sequence that eliminates the effects of environmental fluctuations and equipment aging; based on the standardized power generation sequence, using an adaptive threshold detection algorithm to identify the start and end points of the effective power generation period each day, and calculating the duration during which the cumulative power exceeds a set threshold within each power generation day; and combining historical power generation duration data with meteorological forecast information to extrapolate, through a time-series prediction model, the theoretical daily power generation duration of the photovoltaic power plant within a specified future period, generating power generation duration forecast results that include confidence intervals.

[0005] Another embodiment of this application provides a photovoltaic power plant power generation duration calculation system, the system comprising: a data acquisition module, used to collect time-series data of solar irradiance and meteorological parameter sequences in the area where the photovoltaic power station is located, and simultaneously acquire real-time output current and voltage data at the photovoltaic string level; a construction module, used to dynamically construct the power generation efficiency decay curve of the photovoltaic string based on the solar irradiance time-series data and meteorological parameter sequences through a fusion model of physics-driven and data-driven approaches, the curve representing the change in the actual conversion efficiency of the string under different time periods and meteorological conditions; a correction module, used to perform time-varying efficiency correction on the real-time output current and voltage data using the power generation efficiency decay curve, and generate a standardized power generation sequence that eliminates the effects of environmental fluctuations and equipment aging; an identification module, used to identify the start and end points of the effective power generation period of each day based on the standardized power generation sequence using an adaptive threshold detection algorithm, and to calculate the duration during which the cumulative power exceeds a set threshold in each power generation day; and an extrapolation module, used to combine historical power generation duration data with meteorological forecast information to extrapolate the theoretical daily power generation duration of the photovoltaic power plant within a specified future period using a time-series prediction model, and generate power generation duration forecast results that include confidence intervals.

[0006] Another embodiment of this application provides a storage medium storing a computer program, wherein the computer program is configured to execute the method described in any of the preceding claims when running.

[0007] Another embodiment of this application provides an electronic device including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the method described in any of the preceding claims.

[0008] Compared with existing technologies, the photovoltaic power plant power generation duration calculation method provided by this invention can accurately identify power generation periods and improve the accuracy and adaptability of photovoltaic power plant power generation duration calculation.

Attached Figure Description

[0009] Figure 1 is a hardware structure block diagram of a computer terminal for a photovoltaic power station power generation duration calculation method provided in an embodiment of the present invention; Figure 2 is a flowchart of a method for calculating the power generation duration of a photovoltaic power station provided in an embodiment of the present invention; Figure 3 is a schematic diagram of a photovoltaic power plant power generation duration calculation system provided in an embodiment of the present invention.

Detailed Implementation

[0010] The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.

[0011] The present invention first provides a method for calculating the power generation duration of a photovoltaic power station. This method can be applied to electronic devices such as computer terminals, for example an ordinary computer.

[0012] The following detailed explanation uses a computer terminal as an example. Figure 1 is a hardware structure block diagram of a computer terminal for a photovoltaic power station power generation duration calculation method provided in an embodiment of the present invention. As shown in Figure 1, the computer device includes a processor, memory, and a network interface connected via a system bus, wherein the memory may include non-volatile storage media and internal memory.

[0013] Referring to Figure 2, the present invention provides a method for calculating the power generation duration of a photovoltaic power plant, which may include the following steps: S201: Collect time-series data of solar irradiance and meteorological parameter sequences of the area where the photovoltaic power station is located, and simultaneously acquire real-time output current and voltage data at the photovoltaic string level. Specifically, a distributed environmental monitoring network can be used to collect solar irradiance, ambient temperature, humidity, wind speed, and cloud cover at a fixed sampling frequency to generate a raw environmental monitoring time series dataset. The core of this step is to construct an environmental monitoring system covering the entire area of the photovoltaic power station. This system collects key meteorological parameters at a fixed frequency to generate a temporally coherent raw environmental dataset, providing the basic data source for subsequent analysis of the impact of the environment on power generation efficiency. The specific implementation method is as follows: The distributed environmental monitoring network consists of multiple monitoring nodes, evenly distributed across different areas of the photovoltaic power station, covering the station's edges, center, and areas with dense photovoltaic strings. This ensures that the collected meteorological parameters reflect both the overall environment and local differences within the power station, avoiding the bias of collecting data from a single node. Based on the typical scale of a photovoltaic power station, five monitoring nodes are selected: one at the center and one on each of the four sides of the perimeter. The spacing between nodes is adjusted according to the power station's area to ensure monitoring without blind spots.

[0014] Setting a fixed sampling frequency requires balancing data accuracy against processing burden. The meteorological parameters of photovoltaic power plants (especially solar irradiance) change gradually. If the sampling frequency is too high, it increases data redundancy and storage pressure; if it is too low, the dynamic changes of key parameters cannot be captured. Based on engineering practice, the sampling frequency is set to once per minute (one sample every 60 seconds). This frequency captures the temporal trend of the environmental parameters while keeping the amount of data within a reasonable range.

[0015] The five core meteorological parameters collected are explained in detail below. Solar irradiance reflects the intensity of solar radiation and is the core energy source for photovoltaic power generation; it is measured in watts per square meter (W/m²), with a sampling range of 0-2000 W/m² and an accuracy of ±10 W/m², which covers weather scenarios such as sunny, cloudy, and overcast days. Ambient temperature affects the conversion efficiency of photovoltaic modules, and excessively high temperatures lead to efficiency degradation; the unit is degrees Celsius (°C), with a collection range of -40°C to 85°C and an accuracy of ±0.5°C, which accommodates extreme temperature conditions in different regions. Relative humidity reflects the water vapor content in the air, and excessively high humidity accelerates module aging; the unit is percent relative humidity (%RH), with a collection range of 0-100%RH and an accuracy of ±5%RH. Wind speed affects heat dissipation and dust accumulation on the module surface; the unit is meters per second (m/s), with a collection range of 0-60 m/s and an accuracy of ±0.3 m/s. Cloud cover directly affects the stability of solar irradiance; the unit is percent (%), with a collection range of 0-100% and an accuracy of ±10%, and the data is collected through image recognition combined with sensor monitoring.

[0016] During the data acquisition process, each monitoring node simultaneously collects the five parameters and records the acquisition timestamp. The timestamp uses a unified UTC time format with millisecond precision to ensure time consistency across all nodes. In the example, at a certain moment (2024-07-01 08:00:00.123), the data collected by the central monitoring node is: solar irradiance 780 W/m², ambient temperature 28.5°C, relative humidity 62%RH, wind speed 1.2 m/s, and cloud cover 20%. The data collected by the surrounding nodes differs slightly (e.g., the east node records a temperature of 28.7°C and a wind speed of 1.4 m/s). The data collected by all nodes is integrated in chronological order by timestamp. Each record includes a timestamp, node number, and the five meteorological parameters, forming the original environmental monitoring time series dataset. The dataset is continuously updated during collection and stored in a temporary storage unit in real time.

[0017] Data cleaning is performed on the original environmental monitoring time series dataset: abnormal data points are identified and removed using a sliding window anomaly detection algorithm, and missing values are filled by time series interpolation, generating a cleaned environmental parameter sequence. The core of this step is to eliminate outliers and missing values in the original environmental data, improving data quality so that subsequent model analysis is accurate and the efficiency curve is not distorted by outlier data. The specific implementation method is as follows: Data cleaning removes abnormal data caused by sensor malfunctions and environmental interference (such as bird obstruction or momentary lightning strikes), and supplements missing values caused by communication interruptions and equipment restarts, ensuring the time-series continuity and reliability of the environmental parameter sequences. Abnormal data in the original dataset mainly manifests as abrupt parameter changes (such as solar irradiance dropping instantly from 800 W/m² to 0) and parameters outside a reasonable range (such as a temperature reading of 90°C); missing values mainly manifest as an empty parameter for a given timestamp.

[0018] The sliding window anomaly detection algorithm is the core algorithm for identifying outlier data points. It sets a fixed-size sliding window, calculates the mean and standard deviation of the data within the window, and flags any data point whose absolute difference from the mean exceeds a preset threshold. The window size is set to 5 data points (i.e., 5 minutes) to balance local data trends against anomaly detection sensitivity; a window that is too large masks sudden anomalies, while a window that is too small easily misjudges normal fluctuations as anomalies. The anomaly threshold is set to 3 times the standard deviation (3σ), based on the statistical properties of the normal distribution: about 99.7% of normally distributed values fall within 3σ of the mean, so points beyond this threshold are very likely outliers, and the probability of false positives is low.
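The sliding window check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `sigma_flags`, the centred window placement, and the use of the population standard deviation are all assumptions.

```python
import statistics


def sigma_flags(values, window=5, k=3.0):
    """Return one bool per point: True if |x - window mean| > k * window std."""
    n = len(values)
    flags = []
    for i in range(n):
        # Centre the window on point i where possible, shifting it at the edges.
        start = min(max(0, i - window // 2), max(0, n - window))
        win = values[start:start + window]
        mean = statistics.mean(win)
        std = statistics.pstdev(win)  # population form (an assumption)
        flags.append(abs(values[i] - mean) > k * std)
    return flags


irradiance = [780, 785, 790, 10, 795]  # W/m², the series used in the example
flags = sigma_flags(irradiance)        # no point is flagged by the 3-sigma test alone
```

One property worth noting: in a window of n points, no single point can lie more than sqrt(n - 1) population standard deviations from the window mean (Samuelson's inequality). With n = 5 that bound is 2, so a strict 3σ test on a five-point window cannot flag anything by itself, which is consistent with the need for a supplementary operating-condition criterion.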

[0019] In the example, the solar irradiance time series for a certain period is 780, 785, 790, 10, and 795 W/m². Over the sliding window (5 data points), the mean is 632 W/m², the standard deviation is approximately 311 W/m², and three times the standard deviation is approximately 933 W/m². The absolute difference between 10 W/m² and the mean is 622 W/m², which is less than 933 W/m², so the 3σ test alone does not flag the point. However, considering the operating conditions (08:00-08:04, normal solar radiation), the solar irradiance could not have suddenly dropped below 50 W/m². Therefore, a supplementary operating-condition criterion is added: in the period from one hour after sunrise to one hour before sunset, if the solar irradiance suddenly drops to 50 W/m² or below while the adjacent data points before and after it are all above 700 W/m², the point is judged to be abnormal data. The 10 W/m² value is therefore removed.
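The supplementary operating-condition criterion can be sketched as below. The function name `daylight_drop_flags`, the `daylight` mask (True for samples between one hour after sunrise and one hour before sunset), and the choice to compare only the immediately adjacent samples are assumptions made for illustration.

```python
def daylight_drop_flags(values, daylight, low=50.0, high=700.0):
    """Flag daylight points at or below `low` whose adjacent points all exceed `high`."""
    flags = []
    for i, x in enumerate(values):
        # Immediately preceding and following samples, where they exist.
        neighbours = values[max(0, i - 1):i] + values[i + 1:i + 2]
        flags.append(bool(
            daylight[i]
            and x <= low
            and neighbours
            and all(v > high for v in neighbours)
        ))
    return flags


irradiance = [780, 785, 790, 10, 795]   # W/m², the series used in the example
daylight = [True] * len(irradiance)     # 08:00-08:04 is treated as daylight
drop_flags = daylight_drop_flags(irradiance, daylight)
cleaned = [x for x, bad in zip(irradiance, drop_flags) if not bad]
```

Here only the 10 W/m² sample is flagged and removed; the same series outside the daylight mask would be left untouched.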

[0020] Missing value imputation employs a time-series linear interpolation method. This method is suitable for scenarios where a single point is missing in time-series data. The core logic is to calculate the missing value based on the two adjacent valid data points before and after the missing value, proportional to the time interval, ensuring that the imputed data conforms to the time-series trend and avoiding data gaps. Interpolation assumes that there is valid data before and after the missing value, and the missing duration does not exceed 10 minutes (i.e., 10 data points). If the missing duration is too long, it needs to be marked as data anomaly and processed separately. In the example, the ambient temperature data at 08:05 is missing. The temperature at the previous moment (08:04) was 28.6℃, and the temperature at the next moment (08:06) was 28.8℃, with a time interval of 1 minute. The interpolated missing value is calculated as (28.6 + 28.8) / 2 = 28.7℃, which is then added to the dataset.
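A minimal sketch of the time-series linear interpolation, assuming evenly spaced one-minute samples with `None` marking a missing value. The function name `fill_missing` is hypothetical; gaps longer than `max_gap` samples, or gaps without valid data on both sides, are left as `None` for separate handling.

```python
def fill_missing(values, max_gap=10):
    """Linearly interpolate runs of None that are bounded by valid samples."""
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < n and values[j] is None:
            j += 1                      # j is the first valid index after the gap
        gap = j - i
        if i > 0 and j < n and gap <= max_gap:
            left, right = values[i - 1], values[j]
            for k in range(i, j):
                frac = (k - (i - 1)) / (j - (i - 1))
                filled[k] = left + frac * (right - left)
        i = j
    return filled


# Temperatures at 08:04, 08:05 (missing), 08:06 from the example, in °C.
temps = fill_missing([28.6, None, 28.8])
```

For a single missing point the result reduces to the midpoint of the two neighbours, matching the (28.6 + 28.8) / 2 calculation in the example.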

[0021] After data cleaning, all valid data are integrated and arranged in time stamp order to generate a cleaned environmental parameter sequence. This sequence is free of anomalies and missing data, and is sequential. Each time stamp corresponds to five complete meteorological parameters. Data cleaning markers (normal, anomaly removed, missing data interpolated) are also added to facilitate subsequent traceability. The cleaned sequence is stored in a data storage unit for subsequent data fusion.

[0022] The intelligent data acquisition unit of the photovoltaic string synchronously collects the real-time output current and voltage data of each string, ensuring that the acquisition timestamp is strictly aligned with the environmental monitoring data, and generates the original sequence of string-level electrical parameters. The core of this step is to synchronously collect the electrical parameters of the photovoltaic strings, ensuring time consistency with environmental data, and constructing string-level time-series electrical parameter data. This provides support for subsequent analysis of the correlation between string power generation efficiency and environmental parameters. The specific implementation method is as follows: The intelligent data acquisition unit is the core component for collecting the electrical parameters of photovoltaic strings. Each photovoltaic string corresponds to an independent acquisition unit, which is installed at the output end of the string. It can capture the output status of the string in real time. The acquisition objects are the real-time output current and voltage of the string. These two parameters are the core basis for calculating the output power of the string and analyzing the power generation efficiency.

[0023] The current and voltage acquisition parameters are set as follows: the output current is measured in amperes (A), with an acquisition range of 0-10 A and an accuracy of ±0.01 A, which covers the normal output range of the string (the current of a conventional string is 2-8 A); the output voltage is measured in volts (V), with an acquisition range of 0-1000 V and an accuracy of ±1 V, which is compatible with the output voltage of conventional photovoltaic strings (300-800 V); the acquisition frequency matches that of the environmental monitoring data, namely once per minute (one sample every 60 seconds), so that electrical and environmental parameters are acquired in step.

[0024] Strict timestamp alignment is a core requirement. Excessive timestamp deviation will lead to inaccurate correlation between environmental and electrical parameters in subsequent analyses, resulting in performance analysis bias. The specific implementation involves the intelligent data acquisition unit and the environmental monitoring network using a unified time synchronization protocol, periodically (every 10 minutes) performing time calibration to ensure that the system time deviation between the two does not exceed 100 milliseconds. During data acquisition, the intelligent data acquisition unit and the environmental monitoring nodes synchronously trigger the acquisition action, recording the same timestamp (UTC format, millisecond precision). If a time deviation exceeding the threshold is detected, an emergency calibration is immediately triggered, pausing acquisition until time alignment is achieved.

[0025] In the example, a photovoltaic string (numbered 1#) recorded an output current of 8.52A and a voltage of 380.2V at 08:00:00.120. The timestamp of the central environmental monitoring node was 08:00:00.123 at the same time, with a deviation of 3 milliseconds, which meets the requirements. The current and voltage data of this string are then associated and recorded with the timestamp and string number. If the deviation between the timestamp of a string and the environmental data is 150 milliseconds, calibration is immediately triggered. After calibration, the data is re-acquired to ensure time alignment.
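The deviation check in this example can be sketched as follows, using the 100 ms limit stated above. The function names and the constant `MAX_SKEW_MS` are hypothetical; the sketch only decides whether calibration is needed and does not model the synchronization protocol itself.

```python
from datetime import datetime

MAX_SKEW_MS = 100.0  # maximum allowed deviation between the two clocks


def skew_ms(a, b):
    """Absolute difference between two timestamps, in milliseconds."""
    return abs((a - b).total_seconds()) * 1000.0


def needs_calibration(string_ts, env_ts, limit_ms=MAX_SKEW_MS):
    """True when the deviation exceeds the limit and calibration should be triggered."""
    return skew_ms(string_ts, env_ts) > limit_ms


string_ts = datetime(2024, 7, 1, 8, 0, 0, 120000)  # string 1#, 08:00:00.120
env_ts = datetime(2024, 7, 1, 8, 0, 0, 123000)     # central node, 08:00:00.123
ok = not needs_calibration(string_ts, env_ts)      # 3 ms deviation is acceptable
```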

[0026] After the electrical parameters of all photovoltaic strings are collected, they are integrated in time stamp order. Each record includes a timestamp, string number, output current, and output voltage to generate a raw sequence of string-level electrical parameters. This sequence is stored in categories according to string number to facilitate the subsequent extraction of electrical parameter data of individual strings or all strings. At the same time, the working status of the acquisition unit (normal or abnormal) is marked. If the acquisition unit fails, the corresponding string data is marked as abnormal and will be cleaned and processed later.

[0027] The cleaned environmental parameter sequence is spatiotemporally registered and fused with the original string-level electrical parameter sequence to construct an environmental-electrical parameter fusion dataset with a unified time reference.

[0028] The core of this step is to eliminate the temporal and spatial discrepancies between environmental parameters and electrical parameters, integrating the two types of data into a unified dataset. This ensures that the environmental conditions corresponding to each timestamp accurately correspond to the string electrical parameters, providing synchronous data support for the subsequent construction of power generation efficiency degradation curves. The specific implementation method is as follows: Spatiotemporal registration fusion includes two core components: time registration and spatial registration. The purpose of time registration is to ensure that the time bases of the two types of data are completely consistent, while the purpose of spatial registration is to ensure that the electrical parameters of each string are associated with the environmental parameters of the corresponding region, so as to avoid data mismatch caused by spatial differences.

[0029] The specific implementation process of time registration is as follows: First, extract all timestamps of the cleaned environmental parameter sequence and the original string-level electrical parameter sequence, construct a timestamp comparison list, and compare the timestamps of the two sequences one by one, removing abnormal timestamp data with a deviation of more than 100 milliseconds (such data cannot achieve precise correspondence); for timestamps with a deviation within the allowable range (≤100 milliseconds), use the timestamps of the environmental parameter sequence as the benchmark, and fine-tune the timestamps of the string-level electrical parameter sequence (retaining millisecond-level precision) to ensure that the two are completely consistent; for timestamps that only exist in a certain type of sequence (such as environmental data having them but electrical parameter data not having them), mark them as missing data, and supplement or remove them according to the interpolation method in step two, finally achieving complete time synchronization of the two types of data.
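A minimal sketch of the time registration step, assuming timestamps are held as integer milliseconds and that each electrical timestamp is snapped to the nearest environmental timestamp. The function name `register_times` and its return structure are illustrative assumptions.

```python
import bisect


def register_times(env_ts_ms, elec_ts_ms, tol_ms=100):
    """Snap electrical timestamps to environmental ones within tol_ms.

    Returns (matched, rejected, env_only):
      matched  - dict mapping electrical index -> environmental timestamp
      rejected - electrical indices with no environmental timestamp within tol_ms
      env_only - environmental timestamps with no electrical counterpart
    """
    env_sorted = sorted(env_ts_ms)
    matched, rejected = {}, []
    for idx, t in enumerate(elec_ts_ms):
        pos = bisect.bisect_left(env_sorted, t)
        candidates = env_sorted[max(0, pos - 1):pos + 1]
        best = min(candidates, key=lambda e: abs(e - t), default=None)
        if best is not None and abs(best - t) <= tol_ms:
            matched[idx] = best          # environmental time is the benchmark
        else:
            rejected.append(idx)
    env_only = sorted(set(env_sorted) - set(matched.values()))
    return matched, rejected, env_only


env = [0, 60_000, 120_000]      # one sample per minute
elec = [3, 60_150, 119_950]     # deviations of 3 ms, 150 ms, and 50 ms
matched, rejected, env_only = register_times(env, elec)
```

The `env_only` list corresponds to timestamps present in only one sequence, which the text marks as missing data to be interpolated or removed.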

[0030] The specific implementation process of spatial registration is as follows: Based on the location of each node in the distributed environmental monitoring network, the coverage area of each monitoring node is divided, and the photovoltaic strings are classified according to their geographical location. Each string corresponds to the nearest environmental monitoring node, ensuring that the environmental parameters collected by the node accurately reflect the meteorological conditions of the string. In the example, the monitoring node at the center of the power station covers 10 photovoltaic strings in the central area of the power station, the monitoring node on the east side covers 8 photovoltaic strings in the eastern area, and so on. The number of each string is associated with the number of the corresponding monitoring node to form a spatial association table.
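The nearest-node assignment can be sketched as below. The planar coordinates, node names, and string identifiers are hypothetical values chosen for illustration; the source does not specify how positions are represented.

```python
import math


def nearest_node(string_xy, nodes):
    """Name of the monitoring node closest to a string position."""
    return min(nodes, key=lambda name: math.dist(string_xy, nodes[name]))


def build_association(strings, nodes):
    """Spatial association table: string number -> monitoring node."""
    return {sid: nearest_node(xy, nodes) for sid, xy in strings.items()}


# Hypothetical site layout in metres: one central node and four perimeter nodes.
nodes = {
    "center": (0.0, 0.0),
    "east": (200.0, 0.0),
    "west": (-200.0, 0.0),
    "north": (0.0, 200.0),
    "south": (0.0, -200.0),
}
strings = {"1#": (10.0, 5.0), "2#": (180.0, 20.0), "3#": (-15.0, -170.0)}
association = build_association(strings, nodes)
```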

[0031] During data fusion, the environmental parameters (solar irradiance, temperature, humidity, wind speed, cloud cover) corresponding to each timestamp are integrated with the electrical parameters (current, voltage) of all photovoltaic strings at that timestamp, following a unified timestamp sequence. Combined with a spatial association table, the electrical parameters of each string are associated with the environmental parameters of the corresponding monitoring node, generating a complete data record. Each fused data record includes: timestamp, environmental monitoring node number, five environmental parameters, photovoltaic string number, string output current, and string output voltage, ensuring the spatiotemporal correspondence of the data.
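As a rough sketch of the fusion step, the environmental and electrical records can be joined on timestamp and associated node. The dictionary field names (`ts`, `node`, `irradiance`, and so on) and the function name `fuse` are assumptions; the numeric values are taken from the 08:00 example earlier in the description, with timestamps assumed to be already aligned.

```python
def fuse(env_records, elec_records, association):
    """Join each string record with the environmental record of its node."""
    env_index = {(r["ts"], r["node"]): r for r in env_records}
    fused = []
    for e in elec_records:
        node = association[e["string"]]
        env = env_index.get((e["ts"], node))
        if env is None:
            continue  # no matching environmental record; handled separately
        fused.append({
            **env,
            "string": e["string"],
            "current_a": e["current_a"],
            "voltage_v": e["voltage_v"],
        })
    return fused


env_records = [{
    "ts": "2024-07-01 08:00:00.123", "node": "center",
    "irradiance": 780, "temp_c": 28.5, "rh": 62, "wind": 1.2, "cloud": 20,
}]
elec_records = [{
    "ts": "2024-07-01 08:00:00.123", "string": "1#",
    "current_a": 8.52, "voltage_v": 380.2,
}]
fused = fuse(env_records, elec_records, {"1#": "center"})
```

Each fused record then carries the timestamp, node number, five environmental parameters, string number, current, and voltage listed above.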

[0032] After fusion, the environmental-electrical parameter fusion dataset is checked for completeness and consistency. The check criteria are: no missing environmental parameters and string electrical parameters for each timestamp, complete consistency of timestamps, and correct spatial association between strings and monitoring nodes. If the check fails, the spatiotemporal registration is repeated. If the check passes, the fusion dataset is stored in a dedicated storage unit and archived in chronological order to form a standardized environmental-electrical parameter fusion dataset. This provides high-quality basic data for subsequent construction of power generation efficiency decay curves and calculation of power generation duration based on environmental and electrical parameters, ensuring the consistency and accuracy of the entire photovoltaic power plant power generation duration calculation process.

[0033] S202: Based on the solar irradiance time-series data and meteorological parameter sequences, the power generation efficiency decay curve of the photovoltaic string is dynamically constructed through a fusion model of physics-driven and data-driven approaches. The curve represents the change in the actual conversion efficiency of the string under different time periods and meteorological conditions. Specifically, solar irradiance, ambient temperature, humidity, and wind speed sequences can be extracted from the environmental-electrical parameter fusion dataset, and the actual output power of the string at the corresponding time can be calculated to generate a theoretical-actual power comparison dataset. The core of this step is to select, from the standardized environmental-electrical parameter fusion dataset, the parameters needed to construct the efficiency decay curve, calculate the actual output power of the string, and establish a correlation between theoretical and actual power. This provides high-quality basic data for subsequent dual-model fusion analysis. The specific implementation method is as follows: The environmental-electrical parameter fusion dataset is the standardized dataset obtained from the spatiotemporal registration in the previous step. It contains complete information such as timestamps, environmental parameters, and string electrical parameters. The extraction process requires strict synchronous filtering by timestamp to ensure accurate correspondence between environmental parameters and string electrical parameters, avoiding data mismatch caused by time deviations. The four core environmental parameters extracted are all key factors affecting the conversion efficiency of photovoltaic strings: solar irradiance directly determines the incident energy and is the core input for power generation; ambient temperature affects the semiconductor characteristics of the modules, and increased temperature leads to a decrease in conversion efficiency; humidity and wind speed jointly affect dust accumulation and heat dissipation on the module surface, indirectly affecting power generation efficiency. Therefore, only these four parameter sequences are extracted, and secondary influencing factors such as cloud cover are excluded to reduce model complexity while preserving analytical accuracy.

[0034] During the extraction process, each fused data record is filtered one by one in timestamp order to extract the solar irradiance G (W/m²), ambient temperature T (°C), relative humidity RH (%RH), and wind speed V (m/s) at the corresponding time. These are organized into four independent time series, with the timestamps of each sequence strictly aligned to ensure temporal consistency. Simultaneously, based on the real-time output current I (A) and voltage U (V) from the original string-level electrical parameter sequence, the actual string output power P_actual at the corresponding moment is calculated using the formula P_actual = U × I. This formula is the core basis for calculating the actual power of the photovoltaic string; it ignores minor line losses (loss percentage ≤ 0.5%, negligible), so the calculation results closely match actual operating conditions.

[0035] In the example, the electrical parameters of photovoltaic string #1 at the timestamp 2024-07-01 09:00:00 are: output current 8.6 A, output voltage 385.5 V. Substituting these parameters into the formula gives P_actual = 8.6 A × 385.5 V = 3315.3 W (i.e., 3.3153 kW). The environmental parameters extracted at the same time are: solar irradiance 890 W/m², ambient temperature 30.2℃, relative humidity 58%RH, and wind speed 1.5 m/s. The sequences of the four environmental parameters at all times, along with the actual output power of the corresponding strings, are integrated in timestamp order. Each record includes a timestamp, solar irradiance, ambient temperature, humidity, wind speed, and the actual output power of the string, generating a theoretical-actual power comparison dataset. This dataset correlates environmental parameters with actual power generation, providing direct support for subsequent theoretical power calculations and efficiency analysis. After generation, the dataset undergoes integrity verification to ensure no missing data and no time deviation before being stored in a dedicated processing unit.
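The record assembly described above can be sketched as follows. This is a minimal illustration only; the `ComparisonRecord` class and `build_record` function are hypothetical names introduced here, not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class ComparisonRecord:
    """One row of the theoretical-actual power comparison dataset."""
    timestamp: str
    irradiance_w_m2: float
    ambient_temp_c: float
    humidity_pct: float
    wind_speed_m_s: float
    p_actual_w: float


def build_record(timestamp, g, t, rh, v, current_a, voltage_v):
    # P_actual = U x I; line losses (<= 0.5%) are ignored, as in the text.
    return ComparisonRecord(timestamp, g, t, rh, v, voltage_v * current_a)


# Values for string #1 taken from the worked example.
record = build_record("2024-07-01 09:00:00", 890.0, 30.2, 58.0, 1.5, 8.6, 385.5)
print(f"P_actual = {record.p_actual_w:.1f} W")
```

Keeping the environmental values and the computed power in one record makes the timestamp alignment explicit, since each row can only be built from readings that share a timestamp.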

[0036] A physics-driven model is established. Based on the performance parameters of photovoltaic modules under standard test conditions, the theoretical output power is calculated in combination with the environmental parameter sequence. The instantaneous conversion efficiency is obtained by comparing it with the actual output power, and an instantaneous efficiency sequence is generated. The core of this step is to calculate the theoretical power generation under ideal operating conditions using the physics-driven model, compare it with the actual power to quantify the conversion efficiency, generate an instantaneous efficiency sequence, capture the real-time changes in string conversion efficiency under different environmental conditions, and provide training targets for the subsequent data-driven model. The specific implementation method is as follows: The physics-driven model is a theoretical power calculation model built upon the physical characteristics of photovoltaic modules. Its core advantage lies in its close alignment with the module's power generation principle, accurately reflecting the impact of environmental parameters on power generation. The model is constructed based on module performance parameters under Standard Test Conditions (STC), which are unified benchmark operating conditions in the photovoltaic field. The specific parameters are: solar irradiance G_stc = 1000 W/m², ambient temperature T_stc = 25℃, and wind speed V_stc = 1 m/s. Under these conditions, the performance parameters of the components (such as rated power and temperature coefficient) are calibrated at the time of manufacture and are the core benchmark for theoretical power calculation.

[0037] The component performance parameters to be extracted under standard test conditions include: component rated power P_stc (in W) and temperature coefficient k (in % / ℃). The temperature coefficient k characterizes the rate of degradation of component conversion efficiency when the ambient temperature deviates from 25℃. The temperature coefficient range of conventional photovoltaic components is -0.4% / ℃ to -0.5% / ℃. In the example, the component selected has P_stc=3500W (3.5kW) and k=-0.45% / ℃, which means that for every 1℃ increase in ambient temperature above 25℃, the component conversion efficiency decreases by 0.45%, and for every 1℃ decrease in ambient temperature below 25℃, the conversion efficiency increases by 0.45% (the increase does not exceed 3%).

[0038] The core calculation formula of the physical driving model is: P_theoretical = P_stc × (G / G_stc) × [1 + k × (T - T_stc)], where P_theoretical is the theoretical output power of the string at the corresponding time (in W), G / G_stc is the ratio of actual irradiance to standard irradiance, which characterizes the impact of irradiance change on power; [1 + k × (T - T_stc)] is the temperature correction term, which characterizes the correction effect of temperature change on power. Wind speed indirectly adjusts the temperature correction term by affecting component heat dissipation. Here it is simplified to: when wind speed ≥ 1 m / s, the temperature correction term remains unchanged, and when wind speed < 1 m / s, the temperature correction term is multiplied by 1.02 (the heat dissipation effect deteriorates and the attenuation intensifies).

[0039] In the example, the theoretical power is calculated by substituting the parameters at the corresponding time: G = 890 W/m², T = 30.2℃, and V = 1.5 m/s (≥ 1 m/s, so no additional correction is required). The calculation proceeds as follows: G / G_stc = 890 / 1000 = 0.89; temperature correction term = 1 + (-0.45%) × (30.2 - 25) = 1 - 0.0045 × 5.2 ≈ 0.9766; P_theoretical = 3500 W × 0.89 × 0.9766 ≈ 3042.11 W (3.04211 kW).

[0040] Instantaneous conversion efficiency (η_instantaneous) is a core indicator for measuring the actual power generation performance of a string, representing the ratio of actual output power to theoretical output power. The formula is η_instantaneous = P_actual / P_theoretical. The value nominally lies between 0 and 1, although measured values can slightly exceed 1 when actual operating conditions are more favorable than the simplified model assumes. The closer the value is to 1, the closer the string is to its theoretical performance. The calculation result is rounded to four decimal places to ensure accuracy. In the example, P_actual = 3315.3 W and P_theoretical = 3042.11 W. Substituting these values into the formula gives η_instantaneous = 3315.3 / 3042.11 ≈ 1.0898. This value is greater than 1 because of a slight temperature correction deviation in actual operating conditions (slightly higher wind speed, better heat dissipation than standard conditions), which is within the normal range. The typical instantaneous efficiency range is 0.85-1.10.
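The physics-driven calculation and the efficiency ratio can be sketched as below. This is a minimal illustration under the simplifications stated in the text (linear temperature term, wind-speed rule applied as a 1.02 multiplier below 1 m/s); the function names and default arguments are assumptions made for the sketch.

```python
def theoretical_power(p_stc_w, g, t_ambient_c, wind_m_s,
                      k_pct_per_c=-0.45, g_stc=1000.0, t_stc=25.0):
    """P_theoretical = P_stc * (G / G_stc) * [1 + k * (T - T_stc)]."""
    temp_term = 1.0 + (k_pct_per_c / 100.0) * (t_ambient_c - t_stc)
    if wind_m_s < 1.0:
        # Simplified wind rule from the text: scale the term by 1.02.
        temp_term *= 1.02
    return p_stc_w * (g / g_stc) * temp_term


def instantaneous_efficiency(p_actual_w, p_theoretical_w):
    """eta = P_actual / P_theoretical, rounded to four decimal places."""
    return round(p_actual_w / p_theoretical_w, 4)


# Inputs from the worked example: P_stc = 3500 W, G = 890 W/m^2,
# T = 30.2 C, V = 1.5 m/s, I = 8.6 A, U = 385.5 V.
p_theo = theoretical_power(3500.0, 890.0, 30.2, 1.5)
eta = instantaneous_efficiency(8.6 * 385.5, p_theo)
print(f"P_theoretical = {p_theo:.2f} W, eta = {eta}")
```

Expressing the temperature coefficient in %/℃ and converting inside the function keeps the call site close to the datasheet value quoted in the text.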

[0041] Using the above method, the instantaneous conversion efficiency at each moment in the theoretical-actual power comparison dataset is calculated and integrated in timestamp order to generate an instantaneous efficiency sequence. This sequence reflects the dynamic changes in the actual conversion efficiency of the string under different time periods and environmental conditions. After the sequence is generated, its rationality is checked, and outliers with efficiencies < 0.7 or > 1.15 are removed (such values correspond to component failures or calculation deviations). After passing the verification, the sequence is stored in the processing unit as the training target for the subsequent data-driven model.

[0042] A data-driven efficiency decay model is constructed, in which environmental parameter sequences are used as input features and instantaneous efficiency sequences are used as training targets. The dynamic impact of environmental factors on conversion efficiency is learned through a gated recurrent unit network to obtain an efficiency prediction model. The core of this step is to use a data-driven model to learn the intrinsic relationship between environmental parameters and instantaneous conversion efficiency, making up for the nonlinear decay laws that physical-driven models cannot capture (such as the coupled effects of humidity and wind speed), and building a model that can accurately predict conversion efficiency. The specific implementation method is as follows: The data-driven efficiency degradation model takes the environmental parameter sequence as input and the instantaneous efficiency sequence as output. The core of the model adopts a gated recurrent unit network (GRU), which is a deep learning model suitable for time series data. Its core advantage is that it can capture long-term dependencies in time series data and effectively handle the dynamic characteristics of environmental parameters (such as solar irradiance and temperature) changing over time. It avoids the shortcomings of traditional models that cannot adapt to time series fluctuations and is very suitable for predicting the conversion efficiency of photovoltaic strings.

[0043] Before building the model, the input and output data need to be preprocessed to ensure training accuracy. The input features are four time series: solar irradiance, ambient temperature, humidity, and wind speed. Normalization is required to map all parameter values to the [0,1] interval, eliminating training bias caused by differences in the units and numerical ranges of different parameters. The normalization formula is x_norm = (x - x_min) / (x_max - x_min), where x is the original parameter value, and x_min and x_max are the minimum and maximum values of the parameter series, respectively. In the example, for solar irradiance, x_min = 0 W/m² and x_max = 1200 W/m². At a certain moment, G = 890 W/m²; after normalization, G_norm = (890 - 0) / (1200 - 0) ≈ 0.7417. The output target is the instantaneous efficiency sequence, which does not need to be normalized and can be used directly as the label for model training.
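The min-max normalization above can be sketched as follows. The function name and the handling of a constant series (mapped to 0.0) are assumptions for this sketch; the text does not specify that edge case.

```python
def min_max_normalize(values, x_min=None, x_max=None):
    """Map values to [0, 1] using x_norm = (x - x_min) / (x_max - x_min)."""
    lo = min(values) if x_min is None else x_min
    hi = max(values) if x_max is None else x_max
    if hi == lo:
        # Assumption: a constant series carries no information, so map to 0.0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Irradiance bounds from the example: 0 to 1200 W/m^2.
g_norm = min_max_normalize([0.0, 890.0, 1200.0], x_min=0.0, x_max=1200.0)
print([round(x, 4) for x in g_norm])
```

Passing `x_min` and `x_max` explicitly lets the same bounds be reused at prediction time, so new data is scaled consistently with the training data.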

[0044] The core logic of model training is to learn the mapping relationship between input features and output targets layer by layer through the GRU network, and to capture the coupling effect of environmental parameters (such as the efficiency decay law under high temperature and high humidity environment). During the training process, key parameters are set as follows: the training batch size is 64 (64 time series data points are input for training each time) and the number of training iterations is 100 to ensure that the model fully learns the data patterns. The mean squared error (MSE) is used as the loss function to measure the deviation between the model's predicted value and the actual instantaneous efficiency. The smaller the loss function value, the higher the model's prediction accuracy. The training objective is that the loss function value is ≤0.001.

[0045] In the example, 72 consecutive hours of time-series data were selected as training data, with 60 hours used for model training and 12 hours for validation. The input features were normalized sequences of solar irradiance, temperature, humidity, and wind speed, and the output target was the instantaneous efficiency sequence at the corresponding time. Initially, the loss function value was relatively high (approximately 0.008). As the number of iterations increased, the loss function value gradually decreased, reaching 0.0009 after 85 iterations, achieving the training objective, and training was stopped. After training, the model was tested using validation data. The test results showed that the average deviation between the model's predicted instantaneous efficiency and the actual instantaneous efficiency was 0.012, meeting the prediction accuracy requirements. This indicates that the model can accurately capture the dynamic impact of environmental factors on conversion efficiency.
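The loss function and stopping rule described above (MSE, at most 100 iterations, stop once the loss is ≤ 0.001) can be sketched independently of any deep learning framework. In this sketch `train_one_iteration` is a hypothetical callable standing in for one pass of GRU training, and the decaying loss curve used in the demonstration is synthetic, not measured data.

```python
def mse(predicted, actual):
    """Mean squared error between predicted and actual efficiencies."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def train_until_target(train_one_iteration, max_iterations=100, target_loss=0.001):
    """Run training iterations until the loss target or the iteration cap is hit.

    train_one_iteration(i) is a hypothetical callable that performs one
    training iteration and returns the resulting MSE loss.
    """
    history = []
    for i in range(1, max_iterations + 1):
        loss = train_one_iteration(i)
        history.append(loss)
        if loss <= target_loss:
            break
    return len(history), history


# Synthetic stand-in for a real training loop: loss starts near 0.008 and decays.
stopped_at, losses = train_until_target(lambda i: 0.008 * 0.975 ** i)
print(f"stopped after {stopped_at} iterations, final loss {losses[-1]:.6f}")
```

Separating the stopping rule from the model makes it straightforward to swap in an actual GRU training step without changing the control logic.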

[0046] After passing the test, the model parameters are saved, and an efficiency prediction model is generated. This model can receive the input environmental parameter sequence and output the instantaneous conversion efficiency prediction value at the corresponding time. It makes up for the shortcomings of the physical driving model, which only considers the linear relationship, and realizes the accurate prediction of conversion efficiency, providing core support for the subsequent generation of power generation efficiency decay curve.

[0047] The efficiency prediction model is used to make rolling predictions for future periods, and at the same time, the component aging factor and dust accumulation attenuation factor are introduced for correction, generating a dynamic power generation efficiency attenuation curve.

[0048] The core of this step is to achieve rolling prediction of conversion efficiency for future periods through an efficiency prediction model, and to make corrections based on the actual degradation patterns of module aging and dust accumulation, generating a dynamic and realistic power generation efficiency degradation curve that fully characterizes the actual conversion efficiency changes of the string under different time periods and weather conditions. The specific implementation method is as follows: Rolling forecasting is a forecasting method that adapts to dynamic changes in time-series data. Its core logic involves using an efficiency forecasting model to predict the instantaneous conversion efficiency for the next minute at fixed time intervals (consistent with the data acquisition frequency, once per minute). After each forecast, the model parameters are updated based on the latest collected environmental parameters and actual efficiency data before the next forecast is performed. This ensures that the forecast results can adapt to changes in environmental conditions in real time, avoiding the accumulation of biases caused by a single forecast. The forecast period is set to 24 hours, predicting the instantaneous conversion efficiency for each minute of the next day, generating a 24-hour efficiency forecast sequence to meet the needs of daily power generation efficiency analysis.

[0049] The module aging factor k_aging is a parameter used to correct for the efficiency degradation caused by long-term aging of the module. The aging degradation of photovoltaic modules changes linearly. The annual aging degradation rate of conventional modules is 0.8%-1.0%, that is, the conversion efficiency decreases by 0.8%-1.0% every year. The formula for calculating the aging factor is k_aging=1-(λ×t), where λ is the annual aging degradation rate and t is the number of years the module has been in operation (in years). In the example, the module has been in operation for 3 years, λ=0.8% / year (0.008). Substituting into the formula, we get k_aging=1-(0.008×3)=0.976, which means that after 3 years of operation, the conversion efficiency has decayed to 97.6% of the initial efficiency. This factor incorporates the impact of long-term aging into the prediction results to ensure that the curve closely matches the actual operating state of the module.

[0050] The dust accumulation attenuation factor k_dust is a parameter used to correct for efficiency degradation caused by dust accumulation on the module surface. Dust accumulation attenuation is closely related to ambient humidity and wind speed: the higher the humidity and the lower the wind speed, the more severe the dust accumulation and the more obvious the attenuation; the lower the humidity and the higher the wind speed, the less dust accumulation and the milder the attenuation. The value range of the dust accumulation attenuation factor is 0.95-0.98. In the example, based on the current environmental parameters (humidity 58%RH, wind speed 1.5m / s), k_dust is set to 0.97, which means that dust accumulation causes a 3% reduction in module conversion efficiency. If the humidity rises to 80%RH and the wind speed drops to 0.5m / s, k_dust is adjusted to 0.95, resulting in a 5% attenuation.

[0051] The corrected instantaneous efficiency prediction formula is: η_correction = η_prediction × k_aging × k_dust, where η_prediction is the output value of the efficiency prediction model, and η_correction is the predicted instantaneous conversion efficiency after incorporating the aging and dust accumulation factors. This formula integrates the combined effects of environmental fluctuations, component aging, and dust accumulation. In the example, at a certain moment, η_prediction = 1.05, k_aging = 0.976, and k_dust = 0.97. Substituting these values into the formula gives η_correction = 1.05 × 0.976 × 0.97 ≈ 0.994.
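The aging factor and the corrected efficiency can be sketched as below. The function names are illustrative; k_dust is passed in directly because the text gives only example values for it rather than a formula.

```python
def aging_factor(annual_rate, years_in_operation):
    """k_aging = 1 - (lambda * t), the linear aging model from the text."""
    return 1.0 - annual_rate * years_in_operation


def corrected_efficiency(eta_prediction, k_aging, k_dust):
    """eta_correction = eta_prediction * k_aging * k_dust."""
    return eta_prediction * k_aging * k_dust


# Example values: lambda = 0.8%/year, 3 years in operation, k_dust = 0.97.
k_aging = aging_factor(0.008, 3)
eta_corr = corrected_efficiency(1.05, k_aging, 0.97)
print(f"k_aging = {k_aging:.3f}, eta_correction = {eta_corr:.3f}")
```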

[0052] The η_correction data for each moment within the next 24 hours is integrated in timestamp order. A dynamic power generation efficiency decay curve is plotted with time (hours) on the horizontal axis and the corrected instantaneous conversion efficiency (dimensionless) on the vertical axis. This curve clearly characterizes the efficiency variation patterns at different times: for example, in the early morning (6:00-8:00), solar irradiance is low, and efficiency gradually increases; at noon (11:00-13:00), solar irradiance is high, but temperature is also high, resulting in a slight decrease in efficiency that remains within a stable range; in the evening (16:00-18:00), solar irradiance decreases, and efficiency gradually decreases. Simultaneously, the curve shows a slight overall downward trend, reflecting the long-term decay impact of module aging and dust accumulation. After curve generation, key parameters (aging factor, dust accumulation decay factor, prediction period) are labeled and stored in the data storage unit for subsequent real-time output of time-varying efficiency correction of electrical parameters, providing a core basis for accurately calculating the power generation duration of photovoltaic power plants.

[0053] S203, the power generation efficiency decay curve is used to perform time-varying efficiency correction on the real-time output current and voltage data to generate a standardized power generation sequence that eliminates the effects of environmental fluctuations and equipment aging. Specifically, based on the power generation efficiency decay curve, the actual conversion efficiency value corresponding to each time point can be extracted, the efficiency correction coefficient at that moment relative to the standard test conditions can be calculated, and a time-varying efficiency correction coefficient sequence can be generated. The core of this step is to extract real-time efficiency data from the power generation efficiency decay curve, compare it with the efficiency under standard test conditions, calculate the time-varying correction coefficient, and provide a dynamic benchmark for subsequent electrical parameter correction. This ensures that the corrected power can eliminate the effects of environmental fluctuations and equipment aging. The specific implementation method is as follows: The power generation efficiency degradation curve is a dynamic curve generated in the previous step. With time on the horizontal axis and actual conversion efficiency on the vertical axis, it clearly characterizes the efficiency variation of the photovoltaic string under different time periods and weather conditions. Each time point in the curve corresponds to a unique actual conversion efficiency value η_actual(t). This value integrates multiple influences such as environmental fluctuations, module aging, and dust accumulation, and can truly reflect the current power generation efficiency of the string. The extraction process must strictly adhere to the time synchronization principle. 
η_actual(t) is extracted from the curve moment by moment according to the timestamp consistent with the string electrical parameter acquisition (1 time / minute), ensuring that the extracted efficiency value accurately corresponds to the current and voltage data at the corresponding time, avoiding correction errors caused by time deviations.

[0054] Standard Test Conditions (STC) are a unified benchmark operating condition in the photovoltaic industry, used to measure the standard power generation efficiency of modules. The specific parameters are: solar irradiance G_stc = 1000 W/m², ambient temperature T_stc = 25℃, and wind speed V_stc = 1 m/s. Under these conditions, the standard conversion efficiency η_stc of the module is specified by the module manufacturer. The standard conversion efficiency range of a conventional photovoltaic string is 0.92-0.96. The module used in this example has an η_stc of 0.95, which means that under standard conditions, the module can convert 95% of the incident solar radiation energy into electrical energy.

[0055] The efficiency correction coefficient k_t is a core parameter characterizing the deviation between the current actual efficiency and the standard efficiency. It is used to correct the actual power generation to the power value under standard operating conditions. The calculation formula is k_t=η_actual(t) / η_stc, where k_t is the time-varying efficiency correction coefficient at time t. It is a dimensionless quantity and its value range is usually 0.7-1.3. When k_t<1, it means that the current string efficiency is lower than the standard efficiency and there is attenuation (affected by environment, aging, etc.). When k_t>1, it means that the current operating conditions are better than the standard operating conditions (such as slightly lower temperature and better heat dissipation), and the efficiency is higher than the standard value.

[0056] In the example, at a certain moment (2024-07-01 10:00:00), the actual efficiency extracted from the power generation efficiency degradation curve is η_actual(t) = 0.914, and η_stc = 0.95. Substituting these values into the formula gives k_t = 0.914 / 0.95 ≈ 0.96. This coefficient means that the actual conversion efficiency of the string at that moment is 96% of the standard efficiency, a 4% overall degradation (covering increased ambient temperature, slight aging of components, and a small amount of dust accumulation). The correction coefficient is needed to restore the actual power to the standard operating condition. Following the above method, the efficiency correction coefficient is calculated for each time point and integrated in timestamp order to generate a time-varying efficiency correction coefficient sequence. Each time point corresponds to a unique k_t value. After the sequence is generated, a rationality check is performed, and outliers with k_t < 0.7 or k_t > 1.3 are removed (these values correspond to component failure or efficiency extraction deviation). After passing the check, the sequence is stored in a temporary processing unit to support subsequent electrical parameter correction.
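Building the correction coefficient sequence with its rationality check can be sketched as follows. The function name is hypothetical, and the second sample point (η_actual = 0.60) is an invented illustrative value used only to show an out-of-range coefficient being dropped.

```python
def correction_coefficients(eta_actual_by_time, eta_stc=0.95, lo=0.7, hi=1.3):
    """Return {timestamp: k_t} with k_t = eta_actual / eta_stc.

    Coefficients outside [lo, hi] are treated as outliers and removed.
    """
    coefficients = {}
    for timestamp, eta_actual in eta_actual_by_time.items():
        k_t = eta_actual / eta_stc
        if lo <= k_t <= hi:
            coefficients[timestamp] = k_t
    return coefficients


sample = {
    "2024-07-01 10:00:00": 0.914,  # value from the worked example
    "2024-07-01 10:01:00": 0.60,   # illustrative outlier, not from the source
}
k_seq = correction_coefficients(sample)
for ts, k_t in k_seq.items():
    print(ts, round(k_t, 2))
```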

[0057] Based on the time-varying efficiency correction coefficient sequence, the real-time current and voltage data in the original sequence of string-level electrical parameters are corrected, and the measured power is divided by the corresponding efficiency correction coefficient to generate a preliminary standardized power sequence. The core of this step is to use a time-varying efficiency correction coefficient to correct the real-time current and voltage data collected by the string, restoring it to the power value under standard operating conditions. This initially eliminates interference factors such as environmental fluctuations and equipment aging, generating a preliminary standardized power sequence. The specific implementation method is as follows: The original sequence of string-level electrical parameters includes the real-time output current I_measured(t) (in A) and voltage U_measured(t) (in V) at each time point. These data directly reflect the actual power generation status of the string. However, due to factors such as ambient temperature, component aging, and dust accumulation, they cannot be directly used to identify effective power generation periods and need to be corrected to standard operating conditions using correction coefficients. Before correction, it is necessary to ensure that the timestamps of the electrical parameter sequence and the efficiency correction coefficient sequence are perfectly aligned, with a deviation of no more than 100 milliseconds. If a deviation exists, fine-tuning is required according to the time registration principle to ensure that I_measured(t), U_measured(t), and k_t at each moment correspond precisely.

[0058] First, calculate the measured power of the string at each moment, P_measured(t), using the formula P_measured(t) = U_measured(t) × I_measured(t). This formula ignores minor line losses (≤ 0.5%, negligible) and reflects the actual output power of the string, in W (or kW). In the example, at a certain moment, U_measured(t) = 380.2 V and I_measured(t) = 8.5 A. Substituting these values into the formula gives P_measured(t) = 380.2 V × 8.5 A = 3231.7 W (i.e., 3.2317 kW).

[0059] The correction logic for the preliminary standardized power is as follows: divide the measured power by the efficiency correction coefficient k_t at the corresponding time to restore the power value under standard test conditions, eliminating the attenuation effect caused by environmental factors, aging and other factors. The calculation formula is P_preliminary standard(t) = P_measured(t) / k_t, with the unit being the same as the measured power. This value represents the output power that the string should have under standard operating conditions, and can realize the standardized comparison of power data under different time periods and different environmental conditions.

[0060] In the example, at the corresponding time point, k_t = 0.96 and P_measured(t) = 3231.7 W. Substituting these values into the formula gives P_preliminary standard(t) = 3231.7 W / 0.96 ≈ 3366.35 W (3.36635 kW). This value is the corrected preliminary standardized power, which removes the current 4% overall attenuation and restores the power level to the standard operating condition. Following the above method, the preliminary standardized power is calculated for each time point and integrated in timestamp order to generate a preliminary standardized power sequence. Each data point in the sequence corresponds to the string output power under standard operating conditions. After generation, integrity verification is performed, removing outliers such as P_preliminary standard(t) < 0 or values exceeding 1.2 times the string's rated power (such values correspond to electrical parameter acquisition faults). After passing the verification, the data is stored in the processing unit for subsequent data smoothing processing.
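The correction and validity check can be sketched as below. The function name is hypothetical, and the sketch assumes the string's rated power equals the P_stc = 3500 W used elsewhere in the example.

```python
def preliminary_standard_power(u_measured_v, i_measured_a, k_t, p_rated_w=3500.0):
    """Return P_measured / k_t, or None if the result fails the validity check.

    Assumption: the rated string power is the example's P_stc of 3500 W.
    """
    p_measured = u_measured_v * i_measured_a
    p_standard = p_measured / k_t
    if p_standard < 0 or p_standard > 1.2 * p_rated_w:
        return None  # treated as an acquisition fault and dropped
    return p_standard


p_std = preliminary_standard_power(380.2, 8.5, 0.96)
print(f"P_preliminary_standard = {p_std:.2f} W")
```

Returning `None` for rejected points leaves the decision about gap handling (dropping or interpolating) to the later smoothing step.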

[0061] The preliminary standardized power sequence is smoothed by using a local weighted regression algorithm to eliminate random noise and instantaneous disturbances, and a smoothed standardized power sequence is generated. The core of this step is to eliminate random noise and transient disturbances in the initial standardized power sequence, avoiding deviations in the identification of subsequent effective power generation periods caused by such interference factors. A local weighted regression algorithm is used to smooth and optimize the power sequence, generating a more stable standardized power sequence that better reflects actual power generation patterns. The specific implementation method is as follows: Although the preliminary standardized power sequence has eliminated major influences such as environment and aging, there are still some random noises and transient disturbances. These disturbances mainly come from sensor acquisition errors, instantaneous voltage fluctuations, and sudden changes in irradiance caused by short-term wind. They manifest as isolated peaks or valleys in the power sequence (such as a sudden drop in power from 3366W to 3200W at a certain moment, and then returning to normal at the next moment). If smoothing is not performed, it will lead to misjudgment of the effective power generation time. Therefore, it is necessary to eliminate these disturbances through data smoothing algorithms.

[0062] The Locally Weighted Regression (LOESS) algorithm is a core algorithm suitable for smoothing time-series data. Its key advantage is that it does not require a pre-set fixed fitting model, but can perform local fitting of the data, giving higher weights to nearby data points and lower weights to data points farther away, effectively preserving the overall trend of time-series data while eliminating isolated random noise. It is very suitable for smoothing photovoltaic power series. The core parameters of this algorithm include window size and weighting coefficients. The window size is set to 10 data points (i.e., 10 minutes) to balance local data trends and noise reduction. A window that is too large will mask the normal power change trend, while a window that is too small will not effectively eliminate noise. The weighting coefficients use a Gaussian weighting method, that is, the closer the data point is to the fitting center, the greater the weight. The weight value decreases with increasing distance in a Gaussian distribution, ensuring that the fitting result fits the local data pattern.

[0063] The specific process of smoothing is as follows: taking each data point in the preliminary standardized power sequence as the center, select the four data points before it and the four after it (nine points including the center, approximating the 10-point window) as the fitting window, calculate the fitted value in the window by Gaussian weighting, and replace the preliminary standardized power value of the center data point with it; move the fitting window point by point and repeat the above operation until the entire sequence has been smoothed, generating the smoothed standardized power sequence P_smoothing_standard(t).

[0064] In the example, the data for a certain period in the preliminary standardized power sequence are 3366.35W, 3368.2W, 3200.5W, 3370.1W, and 3369.8W. Among them, 3200.5W is random noise (caused by instantaneous disturbances). Taking 3200.5W as the center, four data points before and after it are selected to form a fitting window. Through Gaussian weighted fitting, the fitted value is 3367.9W, which replaces the outlier value of 3200.5W. After smoothing, the power sequence for this period is 3366.35W, 3368.2W, 3367.9W, 3370.1W, and 3369.8W. Isolated noise has been eliminated, and the overall stable upward trend has been preserved. After smoothing, a smoothed standardized power sequence is generated. This sequence is time-coherent and fluctuates smoothly, which can truly reflect the stable power generation state of the string under standard operating conditions. After the sequence is generated, the smoothing effect is verified to ensure that the power change trend after smoothing is consistent with the initial standardized sequence and there is no trend distortion caused by over-smoothing. After the verification is qualified, it is stored in a dedicated storage unit for subsequent iterative optimization.
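A Gaussian-weighted local linear fit of the kind described can be sketched as follows. The bandwidth `sigma` is an assumption (the text does not give one), and this plain weighted fit only damps an isolated spike; reproducing the near-complete removal shown in the example would need an additional robustness step that is not implemented here.

```python
import math


def gaussian_local_regression(values, half_window=4, sigma=2.0):
    """Smooth a series with a Gaussian-weighted local linear fit.

    For each center point, a weighted least-squares line is fitted over the
    points within half_window on each side, and the line's value at the
    center replaces the original value. sigma is an assumed bandwidth.
    """
    smoothed = []
    n = len(values)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        s_w = s_d = s_dd = s_y = s_dy = 0.0
        for j in range(lo, hi):
            d = j - i  # offset from the center point
            w = math.exp(-(d * d) / (2.0 * sigma * sigma))
            s_w += w
            s_d += w * d
            s_dd += w * d * d
            s_y += w * values[j]
            s_dy += w * d * values[j]
        denom = s_w * s_dd - s_d * s_d
        if denom > 1e-12:
            # Intercept of the weighted least-squares line at offset 0.
            smoothed.append((s_dd * s_y - s_d * s_dy) / denom)
        else:
            # Only one point in the window: fall back to the weighted mean.
            smoothed.append(s_y / s_w)
    return smoothed


series = [3366.35, 3368.2, 3200.5, 3370.1, 3369.8]
smoothed_demo = gaussian_local_regression(series)
print([round(v, 1) for v in smoothed_demo])
```

Because the fit is linear within each window, a steadily rising or falling power trend passes through unchanged, which is the property that distinguishes local regression from a simple moving average.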

[0065] The smoothed standardized power sequence is compared and analyzed with the theoretical power curve under ideal conditions. The correction error is calculated and the correction parameters are iteratively optimized to generate the final standardized power generation sequence.

[0066] The core of this step is to quantify the correction error of the smoothed power sequence by comparing it with the ideal theoretical power curve. Through iterative optimization and adjustment of the correction parameters, the accuracy of the standardized power is further improved, generating the final standardized power generation sequence. This provides high-quality data support for subsequent identification of effective power generation periods. The specific implementation method is as follows: The theoretical power curve under ideal conditions refers to the ideal curve of output power versus time for a photovoltaic string under standard test conditions (G_stc = 1000 W/m², T_stc = 25℃, and V_stc = 1 m/s). This curve is determined by the module's factory performance parameters and the temporal variation of solar irradiance. It characterizes the ideal power generation state of the string under conditions of no interference and no attenuation. The ideal power value P_ideal(t) at each moment in the curve can be calculated using the physical formula P_ideal(t) = P_stc × (G(t) / G_stc), where P_stc is the rated power of the module (P_stc = 3500 W in the example), G(t) is the solar irradiance at the corresponding moment (unit: W/m²), and G_stc = 1000 W/m². This formula ignores disturbances such as temperature (because the power has already been corrected to the standard temperature) and only considers the effect of solar irradiance on ideal power.

[0067] The specific process of comparative analysis is as follows: The smoothed standardized power P_smoothing_standard(t) and the ideal power P_ideal(t) are extracted synchronously according to the timestamp. The correction error between the two is calculated moment by moment and expressed as a relative error percentage. The calculation formula is E_t = |P_smoothing_standard(t) - P_ideal(t)| / P_ideal(t) × 100%, where E_t is the correction error at time t (in %). The absolute value ensures that the error is non-negative, reflecting only the magnitude of the deviation and not its direction. In the example, at a certain time, G(t) = 890W/m², P_ideal(t) = 3500W × (890 / 1000) = 3115W, and P_smoothing_standard(t) = 3367.9W. Substituting these values into the formula gives E_t = |3367.9 - 3115| / 3115 × 100% ≈ 8.12%. This error exceeds the preset reasonable range (≤5%) and requires iterative optimization.
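The ideal-power formula and the relative correction error can be checked with a few lines of Python. This is a minimal sketch using the figures from the example; the function names and the 5% limit constant are illustrative choices, not identifiers from the original method.

```python
G_STC_W_M2 = 1000.0      # standard-test-condition irradiance, W/m^2
ERROR_LIMIT_PCT = 5.0    # preset reasonable error range from the text


def ideal_power(p_stc_w: float, g_w_m2: float) -> float:
    """P_ideal(t) = P_stc * (G(t) / G_stc)."""
    return p_stc_w * (g_w_m2 / G_STC_W_M2)


def correction_error_pct(p_smoothed_w: float, p_ideal_w: float) -> float:
    """E_t = |P_smoothing_standard(t) - P_ideal(t)| / P_ideal(t) * 100."""
    if p_ideal_w <= 0:
        raise ValueError("ideal power must be positive (daytime samples only)")
    return abs(p_smoothed_w - p_ideal_w) / p_ideal_w * 100.0


# Values from the worked example: P_stc = 3500 W, G(t) = 890 W/m^2.
p_ideal = ideal_power(3500.0, 890.0)
e_t = correction_error_pct(3367.9, p_ideal)
needs_iteration = e_t > ERROR_LIMIT_PCT
print(f"P_ideal = {p_ideal:.1f} W, E_t = {e_t:.2f} %, iterate: {needs_iteration}")
```

The guard on non-positive ideal power is an added assumption: at night G(t) is zero, so the relative error is undefined and such samples would have to be skipped.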

[0068] The core object of iterative optimization is the time-varying efficiency correction coefficient k_t. The optimization logic is as follows: based on the magnitude of the correction error E_t, the value of k_t at the corresponding time is dynamically adjusted, the initial standardized power is recalculated, and smoothing is performed again until the correction error E_t ≤ 5% (the preset reasonable error range), forming an iterative closed loop. The optimization formula is k_t_optimized = k_t × (1 - E_t / 100%). When P_smoothing_standard(t) > P_ideal(t), it indicates that k_t is too small, and k_t needs to be increased to reduce the initial standardized power; when P_smoothing_standard(t) < P_ideal(t), it indicates that k_t is too large, and k_t needs to be decreased to increase the initial standardized power.

[0069] In the example, at the corresponding time k_t=0.96, E_t=8.12%, P_smoothing_standard(t)>P_ideal(t), substituting into the optimization formula, we get k_t_optimized=0.96×(1-8.12% / 100%)≈0.96×0.9188≈0.882. The initial standardized power P_initial_standard(t) is recalculated as 3231.7W / 0.882≈3664.06W. After smoothing through the local weighted regression algorithm, we get P_smoothing_standard(t)≈3118W. The correction error E_t is calculated again as |3118-3115| / 3115×100%≈0.096%, ≤5%, which meets the requirements, and the iterative optimization at this time is completed.

[0070] Following the iterative optimization process described above, the k_t value is adjusted moment by moment, and the initial standardized power is recalculated, smoothed, and checked for error until the correction error at every moment is within the preset reasonable range. After the iterative optimization is completed, all optimized standardized power values are integrated and arranged in timestamp order to generate the final standardized power generation sequence. This sequence removes interference such as environmental fluctuations, equipment aging, and random noise, and represents the stable power generation state of the photovoltaic string under standard operating conditions. With consistent timing and high accuracy, it provides core data support for subsequent identification of effective power generation periods and calculation of power generation duration using adaptive threshold detection algorithms, supporting the accuracy and reliability of the entire photovoltaic power station's power generation duration calculation.

[0071] S204, Based on the standardized power generation sequence, an adaptive threshold detection algorithm is used to identify the start and end points of the effective power generation period each day, and to calculate the duration during which the cumulative power exceeds a set threshold in each power generation day; Specifically, the standardized power generation sequence can be divided into natural days, and the statistical characteristic parameters of each segment sequence can be calculated, including mean, standard deviation, skewness and kurtosis, to generate a daily power statistical characteristic set; The core of this step is to perform time-series segmentation of the standardized power generation sequence. By calculating key statistical features, the distribution pattern and fluctuation characteristics of daily power generation are quantified, providing a quantitative basis for subsequent adaptive threshold generation and ensuring that the threshold can adapt to different power generation conditions each day. The specific implementation method is as follows: The standardized power generation sequence is a precise sequence generated in the previous step, completely eliminating interference from environmental fluctuations, equipment aging, random noise, etc. It is time-series consistent and meets the accuracy standards, with a sampling frequency of 1 time / minute. Each data point corresponds to a unique timestamp and a power generation value P_standard(t) (in W) under standard operating conditions. The core logic of segmenting by natural day follows the Earth's rotation period, using 00:00:00 as the segmentation start point and 23:59:59 as the segmentation end point, dividing the entire standardized power sequence into multiple independent daily power sequences. Each daily sequence corresponds to the power generation data of one natural day, ensuring that daily data is independently calculated and avoiding cross-day data interference.

[0072] During the segmentation process, the timestamp of each data point must be strictly verified. All power data with timestamps falling within the range of 00:00:00 to 23:59:59 on a given day are extracted and integrated into a power segmentation sequence for that day. If a small number of data points are missing for a given day (missing duration ≤ 30 minutes), linear interpolation is used to fill in the missing values to ensure the integrity of the daily sequence. If the missing duration exceeds 30 minutes, that day is marked as an abnormal day and processed separately, and it is not included in routine statistical feature calculations. In the example, the power segmentation sequence for 2024-07-01 contains 1440 data points (24 hours × 60 minutes), with power values ranging from 0 to 3500W, covering the complete power generation process from sunrise to sunset and the nighttime shutdown period.
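The natural-day segmentation and gap handling described above might look like the following sketch. It assumes one sample per minute, interprets "missing duration" as the total number of missing minutes in the day, and holds the nearest known value when a gap touches the start or end of the day; these interpretations and all function names are assumptions, not details given in the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MINUTES_PER_DAY = 1440
MAX_MISSING_MINUTES = 30  # gap limit stated in the text


def split_by_day(samples):
    """Group (timestamp, power_w) pairs into {date: {minute_of_day: power_w}}."""
    days = defaultdict(dict)
    for ts, power in samples:
        days[ts.date()][ts.hour * 60 + ts.minute] = power
    return dict(days)


def fill_day(known):
    """Return (values, is_abnormal) for one day of {minute_of_day: power_w}."""
    if MINUTES_PER_DAY - len(known) > MAX_MISSING_MINUTES:
        return None, True  # abnormal day, handled separately
    minutes = sorted(known)
    values = []
    for m in range(MINUTES_PER_DAY):
        if m in known:
            values.append(known[m])
            continue
        prev = max((k for k in minutes if k < m), default=None)
        nxt = min((k for k in minutes if k > m), default=None)
        if prev is None:
            values.append(known[nxt])       # gap at start of day
        elif nxt is None:
            values.append(known[prev])      # gap at end of day
        else:
            w = (m - prev) / (nxt - prev)   # linear interpolation weight
            values.append(known[prev] + w * (known[nxt] - known[prev]))
    return values, False


# Synthetic day: power equals the minute index, with a 10-minute gap.
day0 = datetime(2024, 7, 1)
samples = [(day0 + timedelta(minutes=m), float(m))
           for m in range(MINUTES_PER_DAY) if not 100 <= m < 110]
days = split_by_day(samples)
values, abnormal = fill_day(days[day0.date()])
```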

[0073] The calculation of statistical characteristic parameters is the core of this step. The four parameters characterize the characteristics of the daily power sequence from different dimensions. The calculation of each parameter is based on all valid data points of the daily power segment sequence. Specifically, the mean μ (in W) represents the average level of the standardized power generation on the day, reflecting the overall power generation intensity on the day. The calculation formula is μ=ΣP_standard(t) / n, where n is the number of valid data points on the day (normally n=1440). In the example, the summation of the power sequence on 2024-07-01 is ΣP_standard(t)=2880000W, μ=2880000W / 1440=2000W, indicating that the average power generation on the day is 2000W, and the overall power generation intensity is moderate.

[0074] The standard deviation σ (in W) characterizes the degree of fluctuation in the daily power sequence. The smaller the σ, the more stable the daily power generation and the less affected by instantaneous changes in solar radiation. The calculation formula is σ = √[Σ(P_standard(t) - μ)² / (n-1)]. Substituting the data from the example gives σ≈350W, indicating that the power fluctuation on that day was moderate and in line with the normal power generation fluctuation range on a sunny day (σ is usually 200-500W).

[0075] Skewness S characterizes the degree of asymmetry in the power sequence distribution. It is a dimensionless quantity. When S>0, the distribution is right-skewed, indicating that high-power periods account for a larger proportion. When S<0, the distribution is left-skewed, indicating that low-power periods account for a larger proportion. When S=0, the distribution is symmetrical. In the example, S=0.8 is calculated, indicating that the power distribution on that day is right-skewed, and the proportion of high-power generation periods (such as noon) is relatively high, which is consistent with the distribution pattern of power generation on sunny days.

[0076] Kurtosis K characterizes the steepness of the power sequence distribution. It is a dimensionless quantity. When K>3, the distribution is peaked, indicating that the power is concentrated near the mean with small fluctuations. When K<3, the distribution is flat, indicating that the power distribution is dispersed with large fluctuations. When K=3, it is a normal distribution. In the example, K=3.2 is calculated, indicating that the power distribution on that day is slightly peaked, with the power mainly concentrated in the range of 1800-2200W and relatively concentrated fluctuations.

[0077] Using the above method, the mean, standard deviation, skewness, and kurtosis of the power segmentation sequence for each natural day are calculated. The four statistical characteristic parameters of each natural day are associated with the date and integrated to form a daily power statistical characteristic set. This characteristic set can comprehensively reflect the distribution and fluctuation characteristics of daily power generation, providing core data support for the subsequent generation of dynamic thresholds. After generation, integrity verification is performed to ensure that the four characteristic parameters of each natural day are complete and without any abnormalities. After passing the verification, the data is stored in a dedicated storage unit.
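A compact way to compute the four daily statistics is sketched below. The text gives explicit formulas only for the mean and the (n-1) standard deviation; the skewness and kurtosis here use the common moment-based definitions (with kurtosis not offset by 3, so that a normal distribution gives K=3), which is an assumption about the intended definitions.

```python
import math


def daily_power_stats(values):
    """Return mean, sample standard deviation, skewness and kurtosis."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mu = sum(values) / n
    sq_dev = sum((v - mu) ** 2 for v in values)
    sigma = math.sqrt(sq_dev / (n - 1))          # formula from the text
    m2 = sq_dev / n                              # central moments (assumed)
    if m2 == 0:
        raise ValueError("constant sequence: skewness and kurtosis undefined")
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return {
        "mean": mu,
        "std": sigma,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,                # equals 3 for a normal distribution
    }


# Small illustrative input, not data from the example.
stats = daily_power_stats([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```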

[0078] Based on the daily power statistics feature set, a dynamic threshold generation algorithm is adopted, which uses a specific percentage of the daily power mean as the base threshold and combines the standard deviation for floating adjustment to generate a daily adaptive power threshold. The core of this step is to overcome the drawbacks of fixed thresholds. By combining daily power statistics characteristics and using a dynamic threshold generation algorithm, an adaptive threshold is generated that fits the power generation conditions of the day. This ensures the accuracy of identifying effective power generation times and avoids misjudgments caused by fixed thresholds (such as misjudging as invalid on cloudy days or as valid at night). The specific implementation method is as follows: The core logic of the dynamic threshold generation algorithm is "basic threshold + floating adjustment". The basic threshold is set based on the daily power average to ensure that the threshold matches the overall power generation level of the day. The floating adjustment is set based on the standard deviation to adapt to the power fluctuation characteristics of the day. The greater the fluctuation, the greater the adjustment range, so that the threshold can flexibly adapt to different weather (sunny, cloudy, partly cloudy) power generation conditions. The core advantage is that it can be dynamically adjusted according to the daily power generation status, improving the accuracy of identifying effective power generation periods.

[0079] The setting of a specific percentage needs to be combined with the actual characteristics of photovoltaic power generation. The core purpose is to exclude periods of ineffective power generation such as nighttime shutdowns and low-power generation in the early morning / evening. After verification through engineering practice, a specific percentage of 30% is set, that is, the base threshold T_base=μ×30%. This ratio can effectively distinguish between effective power generation and ineffective power generation: when the power is higher than the base threshold, it means that there is a certain intensity of sunlight, which is within the range of effective power generation; when the power is lower than the base threshold, it is mostly due to nighttime shutdowns or insufficient sunlight, which is within the range of ineffective power generation. In the example, on 2024-07-01, μ=2000W, T_base=2000W×30%=600W, that is, the base threshold for that day is 600W.

[0080] The core of the floating adjustment is to correct the base threshold by combining the standard deviation σ, thereby eliminating the impact of power fluctuations on the threshold. The adjustment formula is T_adapt=T_base+σ×k, where T_adapt is the daily adaptive power threshold (in W), and k is the floating coefficient, ranging from 0.1 to 0.3. The value of k is positively correlated with the standard deviation; the larger σ is, the larger k is, and the larger the adjustment range. In the example, σ=350W on that day, and k=0.2 is set based on the degree of fluctuation. Substituting into the formula, we can get T_adapt=600W+350W×0.2=670W, that is, the adaptive power threshold on 2024-07-01 is 670W.

[0081] If the day is cloudy, the average power output is low and the fluctuation is small. In the example, on a cloudy day, μ=800W, σ=150W, and k=0.1, we can calculate T_base=800W×30%=240W, T_adapt=240W+150W×0.1=255W. This threshold is consistent with the low power generation conditions on cloudy days, avoiding the misjudgment of valid power generation moments on cloudy days as invalid due to an excessively high threshold. If the day is sunny and the fluctuation is large, μ=2500W, σ=450W, and k=0.3, T_adapt=2500W×30%+450W×0.3=750W+135W=885W. This avoids the misjudgment of valid moments of instantaneous low power as invalid due to excessive fluctuation.

[0082] After the threshold is generated, a rationality check is required. The check standard is: the adaptive threshold T_adapt should be greater than 0 and less than 50% of the maximum power of the day. This ensures that the threshold is neither lower than the minimum effective power generation nor higher than the medium power generation, avoiding misjudgments caused by thresholds that are too high or too low. In the example, the maximum power on July 1, 2024 was 3400W, 50% of which is 1700W, and 670W is within the range of 0-1700W, so the check is successful. Following the above method, a corresponding adaptive power threshold is generated for each natural day, associated with the date and the statistical feature parameters of the day, and integrated to form a daily adaptive threshold sequence, which is then stored in the processing unit for subsequent binarization processing.
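The threshold rule T_adapt = μ × 30% + σ × k and its rationality check can be reproduced directly. In this sketch the floating coefficient k is passed in by the caller, because the text states only its range (0.1 to 0.3) and not the exact mapping from σ to k; the function names are illustrative.

```python
BASE_RATIO = 0.30  # specific percentage of the daily mean used as the base threshold


def adaptive_threshold(mu_w: float, sigma_w: float, k: float) -> float:
    """T_adapt = T_base + sigma * k, with T_base = mu * 30%."""
    if not 0.1 <= k <= 0.3:
        raise ValueError("floating coefficient k must lie in [0.1, 0.3]")
    return mu_w * BASE_RATIO + sigma_w * k


def threshold_is_reasonable(t_adapt_w: float, p_max_w: float) -> bool:
    """Rationality check: 0 < T_adapt < 50% of the day's maximum power."""
    return 0.0 < t_adapt_w < 0.5 * p_max_w


# The three cases from the examples.
t_ref = adaptive_threshold(2000.0, 350.0, 0.2)     # 2024-07-01
t_cloudy = adaptive_threshold(800.0, 150.0, 0.1)   # cloudy day
t_sunny = adaptive_threshold(2500.0, 450.0, 0.3)   # sunny, large fluctuation
print(t_ref, t_cloudy, t_sunny, threshold_is_reasonable(t_ref, 3400.0))
```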

[0083] The power sequence of the corresponding day is binarized using a daily adaptive power threshold. The time when the power value is greater than the power threshold is marked as the effective power generation time, and a daily effective power generation mark sequence is generated. The core of this step is to convert the continuous power sequence into a discrete labeled sequence through binarization, clearly distinguishing between effective and ineffective power generation times, thus providing a clear identification basis for subsequent continuous segment detection and duration calculation. The specific implementation method is as follows: Binarization involves assigning a label value to each data point in the daily power segmentation sequence based on its comparison with the adaptive power threshold of the day. The core rule is: if the standardized power value P_standard(t) > T_adapt at a certain moment, then that moment is marked as a valid power generation moment and assigned a label value of 1; if P_standard(t) ≤ T_adapt, then that moment is marked as an invalid power generation moment and assigned a label value of 0. The label value is only used to distinguish between valid and invalid data and has no actual physical meaning. This processing method can simplify continuous power data into a discrete label sequence, which is convenient for subsequent continuous segment detection.

[0084] During processing, it is necessary to ensure that the daily power segmentation sequence strictly corresponds to the adaptive threshold of the corresponding day, and to compare and mark the data moment by moment to avoid mismarking caused by cross-day threshold mixing or timestamp deviation. At the same time, for moments when the power value is exactly equal to the threshold, it is marked as an invalid power generation moment (marked as 0 when P_standard(t)=T_adapt), because the power generation at such moments is in a critical state between effective and invalid, the light intensity is weak, the power generation efficiency is extremely low, and it is not included in the effective power generation period.

[0085] In the example, the adaptive threshold T_adapt = 670W on July 1, 2024. The power value at 06:29:00 is 660W ≤ 670W, marked as 0 (invalid); the power value at 06:30:00 is 680W > 670W, marked as 1 (valid); the power value at 17:45:00 is 685W > 670W, marked as 1 (valid); the power value at 17:46:00 is 665W ≤ 670W, marked as 0 (invalid); and the power value at 23:00:00 is 0W ≤ 670W, marked as 0 (invalid). Following these rules, the marking of the 1440 data points for the day is completed minute by minute, generating a valid power generation marking sequence for the day. This sequence is a discrete sequence of 1440 values, each 0 or 1, where 1 corresponds to a valid power generation time and 0 corresponds to an invalid power generation time.

[0086] If the power data at a certain moment is an outlier (such as missing interpolated data marked as outlier), it is marked as 0 (invalid) to avoid misjudging the valid moment due to outlier data. After the marked sequence is generated, its completeness and rationality are checked. The check criteria are: the length of the marked sequence is consistent with the power segmentation sequence of the day, there are no missing marked values, and the valid power generation marks (1) are mainly concentrated in the period from sunrise to sunset, and there are no valid marks at night. In the example, in the marked sequence of 2024-07-01, 1 is mainly concentrated between 06:30:00 and 17:45:00, and all are 0 at night, so the check is qualified. According to the above method, a corresponding valid power generation mark sequence is generated for each natural day, associated with the date and adaptive threshold, and stored in the processing unit for subsequent continuous segment detection.
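The binarization rule, including the strict inequality at the threshold and the forced 0 for outlier points, is small enough to sketch in full. The optional `outlier_flags` argument is a hypothetical way of passing in the outlier marks; the text does not say how they are stored.

```python
def binarize(powers, t_adapt, outlier_flags=None):
    """Label each sample 1 if P_standard(t) > T_adapt and it is not an outlier, else 0."""
    labels = []
    for i, p in enumerate(powers):
        is_outlier = bool(outlier_flags and outlier_flags[i])
        labels.append(1 if (p > t_adapt and not is_outlier) else 0)
    return labels


# Power values quoted in the example, plus one sample exactly at the threshold.
powers = [660.0, 680.0, 670.0, 685.0, 665.0, 0.0]
labels = binarize(powers, 670.0)
labels_with_outlier = binarize(powers, 670.0, [False, True, False, False, False, False])
print(labels, labels_with_outlier)
```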

[0087] The system performs continuous segment detection on the daily effective power generation marker sequence, identifies the start and end time points when the power continuously exceeds the power threshold, calculates the duration of each continuous segment and accumulates them to generate the daily effective power generation duration.

[0088] The core of this step is to identify consecutive effective power generation periods from the labeled sequence through continuous segment detection, calculate the duration of each period and sum them up to obtain the complete effective power generation duration for the day, thus achieving accurate quantification of the effective power generation duration. The specific implementation method is as follows: The core logic of continuous segment detection is to traverse the daily valid power generation marker sequence and identify all marker segments with consecutive 1s (i.e., continuous valid power generation periods). Each continuous segment corresponds to an independent valid power generation period. The start position of the segment corresponds to the start time of valid power generation, and the end position corresponds to the end time of valid power generation. During the detection process, isolated 1s (such as a single moment marked as 1 with 0s before and after) need to be excluded. Such isolated markers are mostly caused by instantaneous interference and do not belong to valid power generation periods. The criterion for judging isolated markers is set as follows: a run of consecutive 1s lasting ≤5 minutes (i.e., at most 5 consecutive markers of 1) is judged to be an isolated marker and is not included in the valid power generation period.

[0089] The detection process is as follows: Starting from the first data point of the marked sequence, the sequence is traversed point by point. When a marker value changes from 0 to 1, the timestamp of that moment is recorded as the effective power generation start time point t_start. The traversal continues, and when the marker value changes from 1 to 0, the previous timestamp is recorded as the effective power generation end time point t_end. If the sequence is still 1 at the end, then 23:59:59 of that day is taken as t_end. The duration of each consecutive valid segment is Δt = t_end - t_start, in minutes (or hours). Since the sampling frequency is once per minute, a segment of n consecutive points marked as 1 lasts n minutes.

[0090] In the example, in the valid power generation marker sequence of 2024-07-01, the first consecutive 1 segment starts from 06:30:00 (mark value becomes 1) and ends at 11:30:00 (mark value becomes 0), with a duration of 300 minutes (5 hours) of continuous marking as 1, Δt1=300 minutes; the second consecutive 1 segment starts from 11:35:00 (with 5 minutes marked as 0, which is a short fluctuation) and ends at 17:45:00, with a duration of 370 minutes (6 hours and 10 minutes) of continuous marking as 1, Δt2=370 minutes; both segments exceed 5 minutes, are not isolated markers, and must be included in the valid power generation duration.

[0091] If a consecutive segment of 1s is only 3 minutes long (3 points marked as 1), it is considered an isolated segment and is not included in the effective power generation duration, to avoid miscalculation of duration due to instantaneous interference. Each natural day may have multiple consecutive effective segments (for example, in cloudy or overcast weather, unstable sunlight may interrupt the effective power generation period). All consecutive segments of 1s need to be detected one by one, the duration of each segment needs to be calculated, and then the durations of all effective segments need to be summed to obtain the effective power generation duration T_total=ΣΔt_i for the day (where i is the sequence number of the effective segment).

[0092] In the example, for July 1, 2024, T_total = 300 minutes + 370 minutes = 670 minutes, which translates to 11 hours and 10 minutes (670 ÷ 60 ≈ 11.17 hours). Therefore, the effective power generation duration for that day is 11 hours and 10 minutes. After the duration calculation, a reasonableness check is required. The check standard is: the effective power generation duration should be within the range of sunrise to sunset on that day and match the day's sunshine duration and power statistics. In the example, on July 1, 2024, sunrise time is 05:40, sunset time is 18:30, sunshine duration is approximately 12 hours and 50 minutes, and the effective power generation duration of 11 hours and 10 minutes falls within this range and matches the day's average power and fluctuation characteristics, thus passing the check.
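The continuous segment detection can be sketched as a single pass over the label sequence. The sketch counts the points in each run (one point per minute), drops runs of 5 points or fewer as isolated markers, and sums the rest. The synthetic labels reproduce the 300-minute and 370-minute segments of the example plus one 3-minute isolated run; the function name and the minute-index representation are illustrative assumptions.

```python
ISOLATED_MAX_MINUTES = 5  # runs of this length or shorter are treated as isolated


def effective_duration(labels):
    """Return ([(start_index, length_minutes), ...], total_minutes) for valid runs."""
    runs = []
    start = None
    for i, v in enumerate(labels):
        if v == 1 and start is None:
            start = i                          # 0 -> 1 transition
        elif v != 1 and start is not None:
            runs.append((start, i - start))    # 1 -> 0 transition
            start = None
    if start is not None:                      # still 1 at the end of the day
        runs.append((start, len(labels) - start))
    kept = [(s, n) for s, n in runs if n > ISOLATED_MAX_MINUTES]
    return kept, sum(n for _, n in kept)


labels = [0] * 1440
labels[300:303] = [1] * 3       # isolated 3-minute run at 05:00, to be discarded
labels[390:690] = [1] * 300     # 06:30 up to 11:30
labels[695:1065] = [1] * 370    # 11:35 up to 17:45
segments, total_minutes = effective_duration(labels)
hours, minutes = divmod(total_minutes, 60)
print(segments, total_minutes, f"{hours} h {minutes} min")
```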

[0093] Using the above method, the effective power generation duration for each natural day is calculated. The duration is then linked with the date, daily statistical characteristics, and adaptive thresholds to form a daily effective power generation duration sequence, which is stored in a historical parameter database. This provides core historical data support for the subsequent extrapolation of future power generation duration by combining historical data and meteorological forecast information, ensuring the consistency and accuracy of the entire photovoltaic power station power generation duration calculation process.

[0094] S205, Combining historical power generation duration data with meteorological forecast information, a time-series prediction model is used to extrapolate the daily theoretical power generation duration of the photovoltaic power station within a specified future period, and power generation duration forecast results including confidence intervals are generated.

[0095] Specifically, historical power generation duration data and historical meteorological data of the same period can be collected to construct a historical power generation duration-meteorological feature dataset, extract time series statistics of meteorological features as input features of the model, and generate a training sample set; The core of this step is to collect high-quality historical data, construct an associated dataset, and extract effective input features to provide a foundation for subsequent time-series prediction model training. This ensures that the model can learn the intrinsic correlation between historical power generation duration and meteorological conditions. The specific implementation method is as follows: Historical power generation duration data is derived from the daily effective power generation duration sequence generated in the previous step. The collection period needs to balance data representativeness and model training accuracy. Considering the seasonal and meteorological influences on photovoltaic power generation, the collection period is set to the daily effective power generation duration data for the past year (365 days). This period covers the meteorological changes of spring, summer, autumn, and winter while ensuring sufficient data to avoid model overfitting due to insufficient data. The core data collected is the daily effective power generation duration T_history (unit: hours), which is also associated with the corresponding date to ensure that each duration data point corresponds accurately to the date. In the example, T_history = 7.8 hours for 2024-01-01 and T_history = 11.2 hours for 2024-07-01, corresponding to typical power generation durations in winter and summer, respectively.

[0096] Historical meteorological data and historical power generation duration data are matched one-to-one, and core meteorological parameters closely related to power generation duration are collected, including the daily average solar irradiance G_mean (unit: W/m²), the daily average ambient temperature T_mean (unit: °C), the daily average wind speed V_mean (unit: m/s), and the daily average cloud cover C_mean (unit: %). These parameters directly affect the power generation efficiency of the photovoltaic string, and thus determine the effective power generation duration each day. The data comes from the historical environmental monitoring database of the photovoltaic power station to ensure synchronization with the power generation duration data of the corresponding date, with no time deviation.

[0097] When constructing the historical power generation duration-meteorological feature dataset, the daily historical power generation duration T_history is integrated with the four meteorological parameters for the same period. Each dataset record includes: date, T_history, G_mean, T_mean, V_mean, and C_mean. This ensures the correlation and completeness of the data. If data for a certain date is missing (e.g., missing meteorological data or missing power generation duration data), the record for that date is removed to avoid affecting the quality of the dataset. In the example, a complete dataset record is: 2024-07-01, 11.2 hours, 890W/m², 28.5℃, 1.5m/s, 20%.

[0098] Extracting time-series statistics of meteorological features is crucial for generating input features for the model. These statistics quantify the temporal variation patterns of meteorological features, improving model prediction accuracy. Extracted statistics include: the moving average of meteorological parameters over the past 7 days (reflecting short-term weather trends), the deviation of the current day's meteorological parameters from the average of the past 30 days (reflecting the fluctuation of the current day's weather relative to the monthly average), and the difference between the daily maximum and minimum values of the current day's meteorological parameters (reflecting the amplitude of daily weather fluctuations). In the example, G_mean = 890W/m² for July 1, 2024. The 7-day moving average of G_mean is 870W/m². Compared with the average of the past 30 days (850W/m²), the deviation is 40W/m². The maximum value of G_mean on that day was 950W/m² and the minimum value was 780W/m², a difference of 170W/m². These statistics are then integrated with the meteorological parameters of the day to serve as the input features for that date.
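The three time-series statistics can be derived from a list of past daily means plus the current day's values, as in the sketch below. It assumes the 7-day and 30-day windows cover the days preceding the current day, which the text does not state explicitly; the history values are synthetic and chosen only so that the averages match the 870W/m² and 850W/m² quoted in the example.

```python
def irradiance_features(past_daily_means, today_mean, today_max, today_min):
    """Build the three statistics for one meteorological parameter.

    past_daily_means: daily means of the preceding days, most recent last.
    """
    if len(past_daily_means) < 30:
        raise ValueError("need at least 30 days of history")
    last7 = past_daily_means[-7:]
    last30 = past_daily_means[-30:]
    return {
        "ma7": sum(last7) / 7,                        # short-term trend
        "dev30": today_mean - sum(last30) / 30,       # deviation from monthly average
        "daily_range": today_max - today_min,         # amplitude within the day
    }


# Synthetic history: 30-day mean 850 W/m^2, last-7-day mean 870 W/m^2.
history = [844.0] * 22 + [842.0] + [870.0] * 7
features = irradiance_features(history, today_mean=890.0,
                               today_max=950.0, today_min=780.0)
print(features)
```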

[0099] Finally, the historical power generation duration-meteorological feature dataset was divided into a training sample set and a validation sample set in an 8:2 ratio. 80% (approximately 292 days) of the data was used as the training sample set for model training, and 20% (approximately 73 days) of the data was used as the validation sample set for model accuracy testing. The division process adopted a random sampling method to ensure that the training sample set could fully cover the data under different seasons and meteorological conditions. After generating the training sample set, an integrity check was performed to ensure that there were no missing input features and no data anomalies. After passing the check, the data was stored in the model training unit to provide support for subsequent model building.
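The 8:2 random split can be sketched with the standard library alone. The fixed seed is an added assumption to make the split reproducible, and the integer records stand in for the real dataset rows.

```python
import random


def split_samples(records, train_ratio=0.8, seed=0):
    """Randomly split records into (training set, validation set)."""
    rng = random.Random(seed)          # seeded for reproducibility (assumption)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


# 365 placeholder records, one per day of the collection period.
records = list(range(365))
train_set, validation_set = split_samples(records)
print(len(train_set), len(validation_set))
```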

[0100] A time series prediction model was established, using historical power generation duration as the main time series and meteorological characteristics as exogenous variables. The model was trained using the Prophet model to generate a preliminary power generation duration prediction model. The core of this step is to construct a time-series model adapted for predicting photovoltaic power generation duration. Leveraging the advantages of the Prophet model, it integrates historical power generation duration patterns and meteorological characteristics. Through model training, it learns the inherent correlations and generates a preliminary model capable of accurately predicting future power generation duration. The specific implementation method is as follows: Time series forecasting models are core models used to capture the changing patterns of time series data and infer future trends. The daily effective power generation duration of photovoltaic power plants exhibits obvious time series characteristics, showing seasonal periodicity (longer in summer, shorter in winter), trend (slightly decreasing due to component aging), and randomness (affected by daily weather fluctuations). Therefore, the Prophet model was chosen as the core forecasting model. The Prophet model is a forecasting model suitable for time series data with periodicity and trends. Its core advantage is its strong robustness to missing and outliers, and its ability to flexibly incorporate exogenous variables, making it very suitable for time series forecasting scenarios such as photovoltaic power generation duration, which is significantly affected by meteorological exogenous variables.

[0101] The model's input consists of two parts: the main time series and exogenous variables. The main time series is a historical power generation duration sequence, arranged in chronological order, representing the temporal variation pattern of daily effective power generation duration and serving as the core basis for model prediction. The exogenous variables are the meteorological feature time series statistics extracted in the first step (including daily meteorological parameters, the 7-day moving average, and the deviation from the 30-day average, etc.). These variables can quantify the impact of meteorological conditions on power generation duration. Integrating them into the model can significantly improve prediction accuracy and overcome the limitations of relying solely on time series patterns for prediction.

[0102] The core process of model training is to learn the mapping relationship between the main time series and exogenous variables through the Prophet model. Key parameters are set during training to ensure effective model training: the model's time period is set to 365 days (corresponding to seasonal periodicity); the trend term uses a linear trend (to match the slow decline in power generation duration caused by slight component aging); and the seasonality term uses Fourier series fitting (to capture seasonal variation patterns). The number of training iterations is set to 100 to ensure the model fully learns the data patterns. Mean squared error (MSE) is used as the loss function to measure the deviation between the model's predicted values and historical actual power generation duration. The smaller the loss function value, the higher the model's prediction accuracy. The training objective is a loss function value ≤ 0.02 (unit: hours²).

[0103] In the example, the training sample set is input into the Prophet model. Initially, the loss function value is 0.15. As the number of iterations increases, the loss function value gradually decreases, reaching 0.018 after 82 iterations, achieving the training objective, and training stops. After training, the model's accuracy is tested using a validation sample set. The exogenous variables of the validation sample set are input into the model to obtain the predicted power generation duration for the validation set. This is compared with the actual historical power generation duration of the validation set to calculate the mean absolute error (MAE). In this example, the MAE is 0.32 hours, indicating that the average deviation between the model's predicted value and the actual value is only 0.32 hours, meeting the prediction accuracy requirements for photovoltaic power plant power generation duration forecasting. After passing the test, the model parameters are saved, generating a preliminary power generation duration prediction model. This model can receive future weather forecast information as exogenous variables and output the corresponding predicted power generation duration values, providing core support for subsequent future time period predictions.
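The two accuracy measures used in this step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the validation values are invented for demonstration rather than taken from the example.

```python
from typing import Sequence


def mean_squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average squared deviation between predictions and actuals (hours squared)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute deviation between predictions and actuals (hours)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def training_goal_met(mse: float, target: float = 0.02) -> bool:
    """True once the loss is at or below the training objective stated in the text."""
    return mse <= target


# Hypothetical validation data in hours (assumed values, not from the original example).
predicted_hours = [11.0, 10.4, 10.9, 9.8]
actual_hours = [10.8, 10.7, 10.6, 10.1]
val_mse = mean_squared_error(predicted_hours, actual_hours)
val_mae = mean_absolute_error(predicted_hours, actual_hours)
```

With the loss values quoted in the example, `training_goal_met(0.15)` is false at the start of training and `training_goal_met(0.018)` is true after 82 iterations.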

[0104] By inputting future weather forecast information as an exogenous variable into the preliminary power generation duration prediction model, the point prediction value of the theoretical power generation duration per day within a specified future period is obtained, and a preliminary power generation duration prediction sequence is generated. The core of this step is to use a preliminary prediction model, combined with future meteorological forecast information, to predict the daily power generation duration within a specified future period, generating a preliminary prediction sequence. This lays the foundation for subsequent confidence interval calculations and the generation of final forecast results. The specific implementation method is as follows: Future weather forecast information is the core exogenous variable in model prediction, and its accuracy directly affects the accuracy of the prediction results. Therefore, it is necessary to obtain authoritative and accurate future weather forecast data. The data source is a professional weather forecast platform, and the prediction parameters are consistent with historical weather data, including the average daily solar irradiance G_pred, the average ambient temperature T_pred, the average wind speed V_pred, and the average cloud cover C_pred. The time resolution of the prediction data is 1 day to ensure that it matches the time scale of the daily power generation duration prediction.

[0105] The specified period is set based on the operational needs of the photovoltaic power plant and is typically the next 7 days, so that the theoretical daily power generation duration is predicted for the coming week. This period satisfies short-term dispatch and operation needs while ensuring the accuracy of meteorological forecast information (the accuracy of meteorological forecasts for the next 7 days is usually ≥85%). If the power plant has long-term dispatch needs, the specified period can be adjusted to the next 30 days, while optimizing the accuracy of meteorological forecast data. In the example, the specified period is from July 8, 2024 to July 14, 2024 (7 days), and the obtained future meteorological forecast information includes: July 8, G_pred = 880 W/m², T_pred = 28.2 ℃, V_pred = 1.4 m/s, C_pred = 25%; July 9, G_pred = 820 W/m², T_pred = 27.8 ℃, V_pred = 1.6 m/s, C_pred = 30%; etc.

[0106] Before inputting the data into the model, the future weather forecast information needs to be preprocessed to extract time-series statistics consistent with the training sample set. This ensures that the format and dimensions of exogenous variables are consistent with those used during model training, avoiding prediction failures due to input mismatch. The preprocessing process includes: calculating the 7-day moving average of daily weather parameters (combining historical data from the most recent 6 days and the current day's forecast data); calculating the deviation between the current day's forecast parameters and the historical averages of the past 30 days; and calculating the difference between the daily maximum and minimum values of the current day's forecast parameters. In the example, for July 8th, G_pred = 880 W/m²; the 7-day moving average = (sum of the historical 6-day G_mean values + 880) / 7 = 875 W/m²; compared with the historical average of 850 W/m² over the past 30 days, the deviation is 30 W/m²; the maximum value of G_pred on that day is 940 W/m² and the minimum value is 790 W/m², a difference of 150 W/m².
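The three preprocessing statistics can be sketched for a single parameter as below. The function name and dictionary keys are hypothetical, and the six historical daily means are assumed values chosen only so that the moving average reproduces the 875 W/m² of the example; the original text does not list them.

```python
from typing import Dict, Sequence


def forecast_features(
    last_six_daily_means: Sequence[float],
    forecast_mean: float,
    thirty_day_mean: float,
    forecast_max: float,
    forecast_min: float,
) -> Dict[str, float]:
    """Build the exogenous statistics for one forecast day and one parameter."""
    if len(last_six_daily_means) != 6:
        raise ValueError("expected exactly six historical daily means")
    # 7-day moving average: six most recent historical days plus the forecast day.
    moving_avg = (sum(last_six_daily_means) + forecast_mean) / 7
    return {
        "moving_avg_7d": moving_avg,
        # Deviation of the forecast from the mean of the past 30 days.
        "deviation_30d": forecast_mean - thirty_day_mean,
        # Spread between the forecast daily maximum and minimum.
        "daily_range": forecast_max - forecast_min,
    }


# Assumed six-day irradiance history in W/m^2 (hypothetical values).
history = [870.0, 880.0, 860.0, 890.0, 875.0, 870.0]
features = forecast_features(history, 880.0, 850.0, 940.0, 790.0)
```

The same function would be applied to each meteorological parameter in turn (irradiance, temperature, wind speed, cloud cover) so that the feature layout matches the training sample set.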

[0107] The preprocessed exogenous variables for future weather forecasts are input into the preliminary power generation duration prediction model one by one in chronological order. Based on the correlation patterns learned during training, the model outputs a point prediction value T_pred (in hours) for the theoretical daily power generation duration on the corresponding date. This point prediction value represents the most likely daily power generation duration and is rounded to one decimal place. In the example, after inputting the exogenous variables for July 8th, the model outputs T_pred = 11.1 hours; for July 9th, T_pred = 10.5 hours; for July 10th, T_pred = 10.8 hours, and so on. The daily point prediction values for the specified future period are integrated in chronological order to generate a preliminary power generation duration prediction sequence. This sequence contains the theoretical power generation duration point prediction values for each day within the specified period. After generation, a rationality check is performed. The check criteria are: the point prediction values should conform to seasonal patterns (10-12 hours in summer and 7-9 hours in winter) and match the meteorological forecast parameters for the day (the higher the irradiance, the larger the point prediction value). In the example, the point prediction values for July 8-14 are all within the range of 10-11.5 hours, which conforms to the summer power generation duration pattern. After passing the check, the prediction values are stored in the prediction processing unit.
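The rationality check could be sketched as two small predicates. The names are hypothetical, and treating "the higher the irradiance, the larger the point prediction value" as a strict ordering test is an assumption made here; the text does not specify how the match is evaluated.

```python
from typing import Dict, Sequence, Tuple

# Seasonal plausibility ranges in hours, as quoted in the text.
SEASONAL_RANGE_HOURS: Dict[str, Tuple[float, float]] = {
    "summer": (10.0, 12.0),
    "winter": (7.0, 9.0),
}


def within_seasonal_range(duration_h: float, season: str) -> bool:
    """True if a point prediction falls inside the range for the given season."""
    low, high = SEASONAL_RANGE_HOURS[season]
    return low <= duration_h <= high


def irradiance_consistent(irradiance: Sequence[float], durations_h: Sequence[float]) -> bool:
    """True if no day with higher irradiance has a shorter predicted duration."""
    pairs = sorted(zip(irradiance, durations_h))
    return all(
        low_d <= high_d
        for (low_g, low_d), (high_g, high_d) in zip(pairs, pairs[1:])
        if low_g < high_g  # days with equal irradiance are not compared
    )


# Point predictions for July 8-10 and irradiance forecasts for July 8-9 from the example.
point_predictions_h = [11.1, 10.5, 10.8]
all_in_range = all(within_seasonal_range(d, "summer") for d in point_predictions_h)
consistent = irradiance_consistent([880.0, 820.0], [11.1, 10.5])
```

In practice the ordering test would probably need a tolerance, since temperature and cloud cover also influence the predicted duration.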

[0108] Based on the characteristics of historical prediction error distribution, the confidence interval of the theoretical daily power generation duration is calculated using the quantile regression method, and the power generation duration forecast result containing the confidence interval is generated by combining the point prediction value.

[0109] The core of this step is to quantify the uncertainty of the forecast results. Confidence intervals are calculated through quantile regression, and the point forecast values are combined with the confidence intervals to generate complete power generation duration forecasts, providing more valuable predictive information for power plant operation. The specific implementation method is as follows: The distribution characteristics of historical prediction errors are the basis for calculating confidence intervals. Historical prediction error refers to the difference between the predicted value of each training sample and the actual historical power generation duration when the preliminary power generation duration prediction model is backtested using the training sample set. The calculation formula is E_error = T_pred_history - T_actual_history, where E_error is the historical prediction error (unit: hours), T_pred_history is the predicted value of the model backtest, and T_actual_history is the actual historical power generation duration. The error can be positive or negative. A positive value indicates that the predicted value is greater than the actual value, and a negative value indicates that the predicted value is less than the actual value.

[0110] Historical prediction errors of all training samples are collected to form a historical prediction error sequence, and its distribution characteristics are analyzed. The prediction error of photovoltaic power generation duration usually follows an approximately normal distribution, whose parameters are the mean error μ_error and the standard deviation of error σ_error. In the example, μ_error = 0.05 hours (a small positive deviation: the predicted value is slightly higher than the actual value) and σ_error = 0.28 hours are calculated, indicating that the historical prediction error is mainly concentrated in the range of -0.79 hours to 0.89 hours (μ_error ± 3σ_error); the distribution is relatively concentrated, and the model prediction stability is good.
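The error statistics can be sketched with the Python standard library as follows. The function name is hypothetical, the backtest values are invented for illustration, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def error_summary(
    predicted_h: Sequence[float], actual_h: Sequence[float]
) -> Tuple[float, float, Tuple[float, float]]:
    """Return the mean error, its standard deviation, and the mean +/- 3 sigma band."""
    if len(predicted_h) != len(actual_h) or not predicted_h:
        raise ValueError("inputs must be non-empty and of equal length")
    # E_error = T_pred_history - T_actual_history, as defined in the text.
    errors = [p - a for p, a in zip(predicted_h, actual_h)]
    mu = mean(errors)
    sigma = pstdev(errors)  # population standard deviation (assumed choice)
    return mu, sigma, (mu - 3 * sigma, mu + 3 * sigma)


# Hypothetical backtest data in hours (assumed values, not from the original example).
backtest_pred_h = [10.0, 11.0, 9.0, 10.5]
backtest_actual_h = [10.2, 10.6, 9.0, 10.5]
mu_error, sigma_error, band = error_summary(backtest_pred_h, backtest_actual_h)
```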

[0111] Quantile regression is a core algorithm for calculating confidence intervals. Its key advantage lies in its ability to accurately calculate the upper and lower limits of predicted values at different confidence levels based on the quantiles of the error distribution, reflecting the uncertainty of the prediction results and avoiding the errors of traditional normal distribution approximation methods. Considering the operational needs of photovoltaic power plants, a confidence level of 95% is set, meaning there is a 95% probability that the actual daily power generation duration will fall within this confidence interval. The core logic of quantile regression is: based on the historical prediction error distribution, calculate the lower quantile (2.5% quantile) and upper quantile (97.5% quantile) at the 95% confidence level, as correction values for the upper and lower limits of the confidence interval.

[0112] In the example, quantile regression calculations yielded a lower quantile of -0.58 hours and an upper quantile of 0.68 hours at a 95% confidence level. The formulas for calculating the upper and lower limits of the confidence interval are: lower limit T_low = T_pred + lower quantile, upper limit T_high = T_pred + upper quantile, where T_pred is the predicted value for each future day. Taking July 8th as an example, T_pred = 11.1 hours. Substituting into the formula, we get T_low = 11.1 + (-0.58) = 10.52 hours and T_high = 11.1 + 0.68 = 11.78 hours. Therefore, the 95% confidence interval for the theoretical power generation duration on July 8th is [10.52, 11.78] hours.
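The interval formulas above can be sketched as follows. The function names are hypothetical, and the linear-interpolation quantile estimator is an assumption standing in for whatever quantile method an implementation would actually use; the interval function applies the offsets exactly as the stated formulas do.

```python
from typing import Sequence, Tuple


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (assumed estimator)."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("values must be non-empty and q must lie in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def confidence_interval(point_h: float, q_low: float, q_high: float) -> Tuple[float, float]:
    """T_low = T_pred + lower quantile, T_high = T_pred + upper quantile."""
    return round(point_h + q_low, 2), round(point_h + q_high, 2)


# Quantiles and point predictions from the example.
july_8 = confidence_interval(11.1, -0.58, 0.68)
july_9 = confidence_interval(10.5, -0.58, 0.68)
```

In a full pipeline, `q_low` and `q_high` would come from `empirical_quantile(errors, 0.025)` and `empirical_quantile(errors, 0.975)` applied to the historical prediction error sequence.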

[0113] Following the method described above, the confidence intervals for the theoretical daily power generation duration within a specified future period are calculated one by one. The daily date, point prediction value, and confidence intervals (lower and upper limits) are then integrated to generate a power generation duration forecast result containing the confidence intervals. The forecast result must clearly indicate the confidence level (95%) and explicitly explain the meaning of the confidence intervals to facilitate understanding of the uncertainty of the forecast results by power plant operators. In the example, the forecast result for July 8th is: Date 2024-07-08, theoretical power generation duration point prediction value 11.1 hours, 95% confidence interval [10.52, 11.78] hours; for July 9th: point prediction value 10.5 hours, confidence interval [9.92, 11.18] hours, etc.

[0114] After the forecast results are generated, a final verification is performed. The verification criteria are: the confidence interval width should be controlled within 1.5 hours (to ensure the accuracy of the forecast), and the point predicted value should lie close to the middle of the confidence interval (to ensure a balanced error distribution). In the example, the confidence interval width for all dates is within the range of 1.2-1.3 hours, and the point predicted value lies near the middle of the interval, thus passing the verification. Finally, the power generation duration forecast results are output, providing accurate reference for the short-term scheduling, power generation estimation, and operation and maintenance plan formulation of photovoltaic power plants, completing the entire photovoltaic power plant power generation duration calculation and forecasting process.
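The final verification can be sketched as a single predicate. The function name and the centering tolerance `center_tol_h` are assumptions introduced here for illustration; the text gives only the 1.5-hour width limit and the requirement that the point value sit in the middle of the interval.

```python
def verify_forecast(
    point_h: float,
    low_h: float,
    high_h: float,
    max_width_h: float = 1.5,
    center_tol_h: float = 0.1,  # assumed tolerance, not specified in the text
) -> bool:
    """Check the interval width and how far the point value sits from the midpoint."""
    width = high_h - low_h
    center_offset = abs(point_h - (low_h + high_h) / 2)
    return width <= max_width_h and center_offset <= center_tol_h


# July 8 forecast from the example: width 1.26 h, point value 0.05 h below the midpoint.
july_8_ok = verify_forecast(11.1, 10.52, 11.78)
```

A forecast that fails either test would be held back for review rather than output.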

[0115] Another embodiment of the present invention provides a photovoltaic power plant power generation duration calculation system; see Figure 3. The system may include: the acquisition module 301, used to acquire time-series data of solar irradiance and meteorological parameter sequences of the area where the photovoltaic power station is located, and simultaneously acquire real-time output current and voltage data at the photovoltaic string level; module 302, used to dynamically construct the power generation efficiency decay curve of the photovoltaic string based on the solar irradiance time series data and meteorological parameter sequence through a physical-driven and data-driven fusion model, the curve representing the actual conversion efficiency change of the string under different time periods and meteorological conditions; the correction module 303, used to perform time-varying efficiency correction on the real-time output current and voltage data using the power generation efficiency decay curve, and generate a standardized power generation sequence that eliminates the effects of environmental fluctuations and equipment aging; the identification module 304, used to identify the start and end points of the effective power generation period of each day according to the standardized power generation sequence using an adaptive threshold detection algorithm, and to calculate the duration of the cumulative power exceeding the set threshold within each power generation day; and the extrapolation module 305, used to combine historical power generation duration data with meteorological forecast information, extrapolate the theoretical daily power generation duration of the photovoltaic power plant within a specified future period through a time series prediction model, and generate power generation duration forecast results including confidence intervals.

[0116] This invention also provides a storage medium storing a computer program, wherein the computer program is configured to execute the steps in any of the above method embodiments when running.

[0117] This invention also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.

[0118] Specifically, the aforementioned electronic device may further include a transmission device and an input / output device, wherein the transmission device is connected to the aforementioned processor, and the input / output device is connected to the aforementioned processor.

[0119] The above description, based on the embodiments shown in the figures, details the structure, features, and effects of the present invention. The above description is only a preferred embodiment of the present invention, but the present invention is not limited to the scope of implementation shown in the figures. Any changes made in accordance with the concept of the present invention, or equivalent embodiments modified to have equivalent changes, that do not exceed the spirit covered by the specification and figures, should be within the protection scope of the present invention.

Claims

1. A method for calculating the power generation duration of a photovoltaic power station, characterized in that the method includes: collecting time-series data of solar irradiance and meteorological parameter sequences in the area where the photovoltaic power station is located, and simultaneously acquiring real-time output current and voltage data at the photovoltaic string level; dynamically constructing, based on the solar irradiance time-series data and meteorological parameter sequences, the power generation efficiency decay curve of the photovoltaic string through a physical-driven and data-driven fusion model, the curve representing the actual conversion efficiency change of the string under different time periods and meteorological conditions; using the power generation efficiency decay curve to perform time-varying efficiency correction on the real-time output current and voltage data, generating a standardized power generation sequence that eliminates the effects of environmental fluctuations and equipment aging; based on the standardized power generation sequence, using an adaptive threshold detection algorithm to identify the start and end points of the effective power generation period each day, and calculating the duration during which the cumulative power exceeds a set threshold within each power generation day; and combining historical power generation duration data with meteorological forecast information, extrapolating the theoretical daily power generation duration of the photovoltaic power station within a specified future period through a time series prediction model, and generating power generation duration forecast results including confidence intervals.

2. The method according to claim 1, characterized in that collecting time-series data of solar irradiance and meteorological parameter sequences in the area where the photovoltaic power station is located, and simultaneously acquiring real-time output current and voltage data at the photovoltaic string level, includes: collecting, through a distributed environmental monitoring network, data on solar irradiance, ambient temperature, humidity, wind speed, and cloud cover at a fixed sampling frequency to generate an original environmental monitoring time-series dataset; performing data cleaning on the original environmental monitoring time-series dataset, identifying and removing abnormal data points using a sliding window anomaly detection algorithm, and filling in missing values using a time series interpolation method to generate a cleaned environmental parameter sequence; synchronously collecting, by the intelligent data acquisition unit of the photovoltaic string, the real-time output current and voltage data of each string, ensuring that the acquisition timestamps are strictly aligned with the environmental monitoring data, and generating the original sequence of string-level electrical parameters; and spatiotemporally registering and fusing the cleaned environmental parameter sequence with the original sequence of string-level electrical parameters to construct an environmental-electrical parameter fusion dataset with a unified time reference.

3. The method according to claim 2, characterized in that dynamically constructing, based on the solar irradiance time-series data and meteorological parameter sequences, the power generation efficiency decay curve of the photovoltaic string through a physical-driven and data-driven fusion model, the curve representing the actual conversion efficiency change of the string under different time periods and meteorological conditions, includes: extracting solar irradiance, ambient temperature, humidity, and wind speed sequences from the environmental-electrical parameter fusion dataset, and at the same time calculating the actual output power of the string at the corresponding times, to generate a theoretical-actual power comparison dataset; establishing a physical-driven model that, based on the performance parameters of the photovoltaic modules under standard test conditions, calculates the theoretical output power in combination with the environmental parameter sequences, obtains the instantaneous conversion efficiency by comparison with the actual output power, and generates an instantaneous efficiency sequence; constructing a data-driven efficiency decay model in which the environmental parameter sequences are used as input features and the instantaneous efficiency sequence is used as the training target, and learning the dynamic impact of environmental factors on conversion efficiency through a gated recurrent unit network to obtain an efficiency prediction model; and using the efficiency prediction model to make rolling predictions for future periods while introducing a component aging factor and a dust accumulation attenuation factor for correction, to generate a dynamic power generation efficiency decay curve.

4. The method according to claim 3, characterized in that using the power generation efficiency decay curve to perform time-varying efficiency correction on the real-time output current and voltage data, generating a standardized power generation sequence that eliminates the effects of environmental fluctuations and equipment aging, includes: extracting, from the power generation efficiency decay curve, the actual conversion efficiency value corresponding to each time point, calculating the efficiency correction coefficient at that time relative to standard test conditions, and generating a time-varying efficiency correction coefficient sequence; correcting, based on the time-varying efficiency correction coefficient sequence, the real-time current and voltage data in the original sequence of string-level electrical parameters by dividing the measured power by the corresponding efficiency correction coefficient, to generate a preliminary standardized power sequence; smoothing the preliminary standardized power sequence using a locally weighted regression algorithm to eliminate random noise and instantaneous disturbances, and generating a smoothed standardized power sequence; and comparing the smoothed standardized power sequence with the theoretical power curve under ideal conditions, calculating the correction error, and iteratively optimizing the correction parameters to generate the final standardized power generation sequence.

5. The method according to claim 4, characterized in that using an adaptive threshold detection algorithm, based on the standardized power generation sequence, to identify the start and end points of the effective power generation period each day, and calculating the duration during which the cumulative power exceeds a set threshold within each power generation day, includes: dividing the standardized power generation sequence by natural day and calculating the statistical characteristic parameters of each segment, including mean, standard deviation, skewness, and kurtosis, to generate a daily power statistical feature set; based on the daily power statistical feature set, adopting a dynamic threshold generation algorithm that uses a specific percentage of the daily power mean as the base threshold and applies a floating adjustment based on the standard deviation, to generate a daily adaptive power threshold; binarizing the power sequence of the corresponding day using the daily adaptive power threshold, marking the times at which the power value is greater than the power threshold as effective power generation times, and generating a daily effective power generation marker sequence; and performing continuous segment detection on the daily effective power generation marker sequence, identifying the start and end time points between which the power continuously exceeds the power threshold, calculating the duration of each continuous segment, and accumulating the durations to generate the daily effective power generation duration.

6. The method according to claim 5, characterized in that combining historical power generation duration data with meteorological forecast information, extrapolating the theoretical daily power generation duration of the photovoltaic power station within a specified future period through a time series prediction model, and generating power generation duration forecast results including confidence intervals, includes: collecting historical power generation duration data and historical meteorological data of the same period to construct a historical power generation duration-meteorological feature dataset, extracting time series statistics of the meteorological features as model input features, and generating a training sample set; establishing a time series prediction model that uses historical power generation duration as the main time series and meteorological characteristics as exogenous variables, and training it using the Prophet model to generate a preliminary power generation duration prediction model; inputting future weather forecast information as exogenous variables into the preliminary power generation duration prediction model to obtain the point prediction value of the theoretical daily power generation duration within the specified future period, and generating a preliminary power generation duration prediction sequence; and calculating, based on the distribution characteristics of historical prediction errors, the confidence interval of the theoretical daily power generation duration using the quantile regression method, and generating, in combination with the point prediction values, the power generation duration forecast result containing the confidence interval.

7. A photovoltaic power station power generation duration calculation system, characterized in that the system includes: an acquisition module, used to collect time-series data of solar irradiance and meteorological parameter sequences in the area where the photovoltaic power station is located, and simultaneously acquire real-time output current and voltage data at the photovoltaic string level; a module used to dynamically construct the power generation efficiency decay curve of the photovoltaic string based on the solar irradiance time-series data and meteorological parameter sequences through a physical-driven and data-driven fusion model, the curve representing the actual conversion efficiency change of the string under different time periods and meteorological conditions; a correction module, used to perform time-varying efficiency correction on the real-time output current and voltage data using the power generation efficiency decay curve, and generate a standardized power generation sequence that eliminates the effects of environmental fluctuations and equipment aging; an identification module, used to identify the start and end points of the effective power generation period of each day based on the standardized power generation sequence using an adaptive threshold detection algorithm, and to calculate the duration during which the cumulative power exceeds a set threshold in each power generation day; and an extrapolation module, used to combine historical power generation duration data with meteorological forecast information, extrapolate the theoretical daily power generation duration of the photovoltaic power station within a specified future period using a time series prediction model, and generate power generation duration forecast results including confidence intervals.

8. The system according to claim 7, characterized in that the acquisition module is specifically used for: collecting, through a distributed environmental monitoring network, data on solar irradiance, ambient temperature, humidity, wind speed, and cloud cover at a fixed sampling frequency to generate an original environmental monitoring time-series dataset; performing data cleaning on the original environmental monitoring time-series dataset, identifying and removing abnormal data points using a sliding window anomaly detection algorithm, and filling in missing values using a time series interpolation method to generate a cleaned environmental parameter sequence; synchronously collecting, by the intelligent data acquisition unit of the photovoltaic string, the real-time output current and voltage data of each string, ensuring that the acquisition timestamps are strictly aligned with the environmental monitoring data, and generating the original sequence of string-level electrical parameters; and spatiotemporally registering and fusing the cleaned environmental parameter sequence with the original sequence of string-level electrical parameters to construct an environmental-electrical parameter fusion dataset with a unified time reference.

9. A storage medium, characterized in that the storage medium stores a computer program, wherein the computer program is configured to execute the method of any one of claims 1-6 when it is run.

10. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to run the computer program to perform the method of any one of claims 1-6.