Power Supply Quality Optimization Control Method for a Photovoltaic Power Generation System
By combining hierarchical predictive control with reinforcement learning, the problem of unstable power supply quality in photovoltaic power generation systems under complex environments is solved, achieving global optimization and real-time stable control of the system, thereby improving power supply quality and response capability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- QINGHAI WEIHANGBEI INNOVATIVE ENERGY TECH CO LTD
- Filing Date
- 2026-02-05
- Publication Date
- 2026-06-19
AI Technical Summary
Existing photovoltaic power generation system control methods are unable to cope with complex and ever-changing external environments and load variations, resulting in unstable power supply quality, lack of self-learning ability, and insufficient adaptability of traditional control strategies, making it difficult to achieve the global optimal operation of the system.
By combining hierarchical predictive control and reinforcement learning, a comprehensive control framework is constructed. The strategic, tactical, and operational layers handle long-term optimization, medium-term coordination, and short-term real-time adjustment, respectively. The adaptive optimization strategy of reinforcement learning is used to dynamically adjust the control strategy, thereby achieving global optimization and real-time stable control of the power supply quality of the photovoltaic power generation system.
It significantly improves the power supply quality and stability of photovoltaic power generation systems in dynamic environments, reduces voltage and frequency fluctuations, enhances the system's response capability and long-term efficiency, strengthens its adaptability to uncertainties, and achieves adaptive control.
Abstract figure: CN122246859A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of photovoltaic power generation, and more specifically to a method for optimizing and controlling the power supply quality of a photovoltaic power generation system.
Background Technology
[0002] The output power of photovoltaic (PV) power generation systems is significantly affected by external factors such as weather changes and cloud cover, leading to instability in power quality. Traditional PV control systems primarily employ a single feedback control strategy, which struggles to cope effectively with complex and ever-changing external environments and load variations, thus affecting grid stability and power quality. With the development of smart grid technology, higher demands are placed on the real-time control and optimization of PV power generation systems. Existing PV control methods mainly include PID control, fuzzy control, and model predictive control. These methods have limitations in handling the nonlinear characteristics of the system and variable environments, making it difficult to achieve short-term stability and long-term optimization goals simultaneously. Especially when facing sudden weather changes or large load fluctuations, they often fail to make timely and effective adjustments, resulting in a decline in power quality. Therefore, designing a control method for PV power generation systems that can adapt to complex environments and possesses self-learning capabilities has become a current research hotspot and challenge. Hierarchical predictive control, as an advanced control strategy, can effectively handle control problems at multiple time scales, but its application in PV power generation systems has so far been limited. Meanwhile, reinforcement learning, as a cutting-edge technology in artificial intelligence, possesses adaptive learning and continuous optimization capabilities, providing new ideas for solving control problems in complex dynamic systems. However, how to organically combine hierarchical predictive control with reinforcement learning to optimize the power supply quality of photovoltaic power generation systems remains a technical challenge that urgently needs to be solved.
[0003] Traditional photovoltaic power generation system control methods mainly include PID control and fuzzy control. Although these methods can improve system performance to a certain extent, they often fail to achieve optimal control in complex and ever-changing environments. Existing photovoltaic power generation system control methods have the following limitations: (1) Single control strategies struggle to cope with complex and ever-changing environments: traditional methods such as PID control and fuzzy control perform well in static or slowly changing environments, but they struggle to achieve fast and accurate adjustment when light intensity changes rapidly or the load changes suddenly, resulting in fluctuations in power supply quality. (2) Lack of consideration for long-term system optimization: most control methods focus only on short-term stability control and ignore long-term efficiency optimization, so they cannot achieve globally optimal system operation. (3) Insufficient adaptability: fixed-parameter control strategies struggle to adapt to changes in photovoltaic power generation characteristics across regions and seasons, requiring frequent manual adjustment and increasing maintenance costs.
[0004] Hierarchical predictive control has been widely used in industrial process control, but its application in photovoltaic power generation systems is relatively limited. Existing research mainly focuses on optimization at a single time scale, lacking a unified consideration of short-term and long-term control. Although hierarchical predictive control has advantages in handling multi-time scale problems, its application in photovoltaic power generation systems still has the following limitations: (1) Strong model dependence: The effectiveness of hierarchical predictive control is highly dependent on the accuracy of the system model. However, the environmental factors involved in photovoltaic power generation systems are complex, and establishing an accurate mathematical model is extremely challenging. (2) High computational complexity: The multi-level prediction and optimization process involves a large amount of computation, which may affect the real-time performance of the control, especially in large-scale photovoltaic systems. (3) Difficulty in inter-level coordination: Control objectives at different levels may conflict, and achieving a balance between short-term stability and long-term efficiency is a challenge.
[0005] Reinforcement learning has shown great potential in power system optimization control in recent years, but its application in photovoltaic power generation systems is still at an early stage, particularly in combination with hierarchical predictive control. Although reinforcement learning performs well in adaptive control, its application in photovoltaic power generation systems still faces the following challenges: (1) Low learning efficiency: in complex photovoltaic system environments, reinforcement learning algorithms may require large amounts of training data and time to converge to the optimal policy, which may be difficult to achieve in practical applications. (2) Insufficient safety considerations: reinforcement learning may produce unsafe control behavior during exploration, which is unacceptable in power systems; learning effectively while ensuring system safety is a key issue. (3) Poor interpretability: the decision-making process of reinforcement learning algorithms often lacks transparency, making it difficult to explain the rationale of the control strategy to system operators and complicating practical application. (4) Limited generalization: reinforcement learning models trained in specific environments may adapt poorly to new operating conditions, such as changes in system characteristics caused by seasonal changes or equipment upgrades.
[0006] In summary, Hierarchical Model Predictive Control (HMPC), as an advanced control strategy, can optimize system performance at different time scales and effectively handle short-term fluctuations and long-term changes in photovoltaic power generation systems. Reinforcement Learning (RL), as an adaptive optimization method, optimizes decision-making strategies through continuous interaction with the environment, enabling it to adapt to the complex dynamic environment of photovoltaic power generation systems. Combining HMPC and RL to form a comprehensive control framework holds promise for providing an innovative solution for optimizing the power quality of photovoltaic power generation systems. This invention aims to develop a power quality optimization control method for photovoltaic power generation systems based on the combination of hierarchical predictive control and reinforcement learning. This method achieves synergistic optimization of short-term and long-term control through a hierarchical predictive control framework, while continuously adjusting and optimizing the control strategy using reinforcement learning algorithms to adapt to the dynamic characteristics and environmental changes of the photovoltaic power generation system.
Summary of the Invention
[0007] To address the above problems, this invention provides a method for optimizing and controlling the power supply quality of a photovoltaic power generation system. The specific solution is as follows:
[0008] A method for optimizing power supply quality control in a photovoltaic power generation system includes the following steps: S1: Constructing a hierarchical predictive control framework, including a strategic layer, a tactical layer, and an operational layer, for achieving long-term optimized scheduling, medium-term coordinated control, and short-term real-time adjustment, respectively; S2: Establishing a reinforcement learning adaptive optimization strategy to continuously optimize control decisions through environmental interaction and reward mechanisms; S3: Combining the reinforcement learning adaptive optimization strategy in S2 with the hierarchical predictive control framework constructed in S1, optimizing the long-term scheduling strategy at the strategic layer, optimizing the resource allocation scheme at the tactical layer, and dynamically adjusting the control input at the operational layer, forming a comprehensive control framework oriented towards multiple time scales, thereby achieving global optimization and real-time stable control of the power supply quality of the photovoltaic power generation system.
[0009] The hierarchical predictive control framework divides system control into short-term and long-term controls, addressing the real-time stability of instantaneous parameters such as voltage and frequency, and the global optimization problem under environmental changes, respectively. The reinforcement learning adaptive optimization strategy dynamically adjusts the weights and decision paths of short-term and long-term control objectives by acquiring feedback information from the environment in real time, ensuring dynamic optimization of power quality in complex and ever-changing environments.
[0010] Furthermore, the construction of the hierarchical predictive control framework described in step S1 involves dividing the entire control system into short-term control and long-term control, as detailed below: (S1.1) Establish a dynamic model of the photovoltaic power generation system and define state variables and control variables; (S1.2) Design a short-term predictive controller and calculate the optimal control input based on the current system state and the predictive model; (S1.3) Design long-term control, including: collecting environmental data, analyzing and predicting through reinforcement learning algorithms, designing long-term control strategies, and continuously learning from the environment and adjusting the control strategies through reinforcement learning algorithms.
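Step (S1.2) leaves the predictive model and cost unspecified. As a minimal sketch, assume a scalar linear model $x_{k+1} = a x_k + b u_k$, a quadratic cost that penalizes tracking error and input effort (weight `lam`), and no constraints; the function name and all parameters are hypothetical. Following the receding-horizon principle, only the first input of the optimized sequence is returned, and the optimization would be repeated at the next sample.

```python
import numpy as np


def mpc_first_input(a: float, b: float, x0: float, ref: float,
                    horizon: int, lam: float) -> float:
    """Receding-horizon MPC for the hypothetical model x[k+1] = a*x[k] + b*u[k].

    Minimises sum_{k=1..N} (x[k] - ref)**2 + lam * sum_{j=0..N-1} u[j]**2
    without constraints and returns only u[0] of the optimal sequence.
    """
    # Free response: x[k] = a**k * x0 when every input is zero
    F = np.array([a ** k for k in range(1, horizon + 1)], dtype=float)
    # Forced response: input u[j] contributes a**(k-1-j) * b to x[k] for j < k
    G = np.zeros((horizon, horizon))
    for k in range(1, horizon + 1):
        for j in range(k):
            G[k - 1, j] = a ** (k - 1 - j) * b
    r = np.full(horizon, ref, dtype=float)
    # Zero gradient of the cost gives (G^T G + lam*I) u = G^T (r - F*x0)
    H = G.T @ G + lam * np.eye(horizon)
    u = np.linalg.solve(H, G.T @ (r - F * x0))
    return float(u[0])
```

A controller for the real system would use the multivariable model from (S1.1) and add input and state limits, which turns the closed-form solve into a constrained quadratic program.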
[0011] The combination of the control framework and reinforcement learning strategy includes the following steps: (1) Strategic Layer Integration: Reinforcement learning strategies optimize long-term control objectives, such as maximizing power conversion efficiency and improving power supply stability, by analyzing historical operating data and predicting future environments at the strategic layer. The strategic layer adopts the Model Predictive Control (MPC) method to model and optimize long-term behavioral strategies.
[0012] (2) Tactical layer integration: In the tactical layer, the reinforcement learning strategy dynamically adjusts the local control objectives according to the current operating state of the system to cope with complex and ever-changing environmental conditions. The tactical layer undertakes the mid-term scheduling task, optimizes energy storage regulation and load distribution, and also adopts the MPC algorithm to improve the accuracy and flexibility of local control.
[0013] (3) Operational layer integration: The operational layer uses reinforcement learning to adjust control variables in real time, including inverter control input and energy storage output control, to ensure the stability of instantaneous parameters such as system voltage and frequency. This layer uses a PID control algorithm to ensure the real-time performance and robustness of the control response.
[0014] In summary, this integrated control framework achieves coordinated optimization across multiple time scales through the deep integration of reinforcement learning and hierarchical control strategies, thereby effectively improving the power supply quality and system stability of photovoltaic power generation systems.
[0015] The integrated control framework formed in step S3 can coordinate the control objectives of each layer at different time scales, achieving the system's global optimal performance and rapid response capability. Simulation and experimental verification show that this framework can significantly improve power supply quality, reduce power fluctuations, and optimize energy utilization efficiency.
[0016] Therefore, this invention constructs a multi-level hierarchical control framework that integrates reinforcement learning mechanisms, forming a comprehensive control scheme that covers long-term planning, local optimization, and real-time adjustment, which can adapt to the operational needs of complex photovoltaic power generation systems.
[0017] Furthermore, the integrated control framework described in step S3 comprises three levels, strategic, tactical, and operational, which correspond to long-term optimization, medium-term coordinated control, and short-term real-time control tasks, respectively. The strategic layer employs a model predictive control (MPC) approach with objective function (1), in which the active and reactive power at each time step $t$ are expressed in watts (W), three dimensionless weighting coefficients balance the importance of the different objective terms, and a penalty term (in W) measures the cost or performance loss incurred when the system violates control constraints. The tactical layer also employs the MPC method, with objective function (2), in which the tactical optimization time window is expressed in steps (the number of prediction steps within the system control cycle), and three dimensionless weighting coefficients balance the optimization weights of the active power, reactive power, and penalty terms. The operational layer employs a PID control algorithm, whose control logic is:
$$u(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d \frac{de(t)}{dt} \tag{3}$$
where $u(t)$ is the control signal, $e(t)$ is the error signal, and $K_p$, $K_i$, and $K_d$ are the proportional, integral, and derivative coefficients, respectively.
[0018] Formula (3) describes the control logic of the operational layer, whose purpose is to adjust the system operating parameters (such as voltage and frequency) in real time according to the control target values output by the tactical layer, and to minimize the real-time error signal through the PID control algorithm. This ensures the accuracy and response speed of instantaneous control, ultimately achieving precise control of the photovoltaic power generation system.
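Formula (3) is the continuous-time PID law. A minimal discrete-time sketch follows, assuming a fixed sampling period `dt`, a rectangular approximation of the integral, and a backward difference for the derivative; the class name and gains are illustrative, and the setpoint stands for the tactical layer's target value described in [0018].

```python
class DiscretePID:
    """Positional discrete PID: u = Kp*e + Ki*sum(e*dt) + Kd*(e - e_prev)/dt."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the very first sample

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement          # e(t)
        self.integral += error * self.dt        # rectangular approximation of the integral
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt  # backward difference
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Practical implementations usually also add integrator anti-windup and derivative filtering, which the patent does not discuss.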
[0019] The three formulas above correspond to the strategic, tactical, and operational layers of the hierarchical predictive control framework, respectively. The strategic-layer objective function is used for global optimization control, determining the global power allocation for future time periods; the tactical-layer objective function is used for local optimization control, refining and correcting the results from the strategic layer; and the operational-layer control law achieves precise control of power quality over short time scales by adjusting system operating parameters in real time. Together, the three ensure optimized power quality across different time scales.
[0020] Furthermore, the reinforcement learning adaptive optimization strategy employs the Q-learning algorithm and combines it with the multi-level control structure of the hierarchical predictive control framework to achieve dynamic optimization and adjustment of short-term control inputs and long-term strategies.
[0021] By combining the characteristics of the Q-learning algorithm with hierarchical predictive control, the method can dynamically adapt to constantly changing environmental and load conditions, thereby significantly improving the stability and reliability of power supply quality. Reinforcement learning is an adaptive learning method based on environmental feedback: its goal is to continuously adjust the policy within a given state space through trial and error so as to maximize the cumulative reward, i.e., the long-term benefit of the system.
[0022] Furthermore, establishing the reinforcement learning adaptive optimization strategy requires constructing the state space, action space, and reward function of the reinforcement learning control process, that is, modeling the environment and the control variables. This includes the following elements:
(1) Define the state space $S_t$. The state vector, composed of environmental variables such as photovoltaic power generation, load power, ambient temperature, and light intensity, represents the current system state:
$$S_t = \{P_{\text{pv},t},\ P_{\text{load},t},\ G_t,\ T_t\} \tag{4}$$
where $P_{\text{pv},t}$ is the current photovoltaic power generation, $P_{\text{load},t}$ is the current load power, $G_t$ is the current light intensity, and $T_t$ is the current ambient temperature.
(2) Define the action space $A_t$. It includes the control input of the photovoltaic inverter and the output regulation of the energy storage system, enabling dynamic adjustment of the control variables:
$$A_t = \{u_{\text{inv},t},\ u_{\text{ess},t}\} \tag{5}$$
where the units of $A_t$ depend on the specific control interface; $u_{\text{inv},t}$ is the control input of the photovoltaic inverter, expressed as a duty cycle (%) or a voltage (V); and $u_{\text{ess},t}$ is the output regulation of the energy storage system, expressed as a power (W) or a current (A).
(3) Define the reward function $R_t$. Power quality, power efficiency, and system stability are combined into a reward signal that guides the control strategy toward the optimum:
$$R_t = -\omega_1 \lvert V_t - V_{\text{ref}} \rvert - \omega_2 \lvert f_t - f_{\text{ref}} \rvert + \omega_3\, \eta_t \tag{6}$$
where $V_t$ and $f_t$ are the system voltage and frequency, $V_{\text{ref}}$ and $f_{\text{ref}}$ are the reference voltage and frequency, $\eta_t$ is the system operating efficiency, and $\omega_1$, $\omega_2$, and $\omega_3$ are weighting factors that control how strongly voltage, frequency, and system efficiency influence the reward.
(4) Strategy optimization: the reinforcement learning policy is updated dynamically with the Q-learning algorithm to progressively optimize the control strategy.
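The state, action, and reward of [0022] can be sketched as two small data types plus a reward function. This is a minimal sketch under stated assumptions: the reward is taken to be $-w_v|V-V_{ref}| - w_f|f-f_{ref}| + w_{eff}\,\eta$ (deviations penalized and efficiency rewarded, as [0023] describes), while the absolute-value form, the 220 V / 50 Hz references, all weights, and the field names and units marked in the comments are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PVState:
    """State S_t: the four variables listed in [0022]."""
    p_pv_w: float           # current PV generation (W)
    p_load_w: float         # current load power (W)
    irradiance_w_m2: float  # light intensity (unit assumed: W/m^2)
    temp_c: float           # ambient temperature (unit assumed: deg C)


@dataclass(frozen=True)
class PVAction:
    """Action A_t: inverter input and storage regulation."""
    inverter_duty: float    # inverter duty cycle as a fraction 0..1 (assumed encoding)
    storage_w: float        # storage output regulation (W; sign convention assumed)


def reward(v: float, f: float, eff: float,
           v_ref: float = 220.0, f_ref: float = 50.0,
           w_v: float = 1.0, w_f: float = 10.0, w_eff: float = 5.0) -> float:
    """Assumed reward: -w_v*|V - V_ref| - w_f*|f - f_ref| + w_eff*eta.

    All default references and weights are hypothetical placeholders.
    """
    voltage_penalty = w_v * abs(v - v_ref)
    frequency_penalty = w_f * abs(f - f_ref)
    return -voltage_penalty - frequency_penalty + w_eff * eff
```

Because the frequency deviation is numerically much smaller than the voltage deviation, its weight is set larger here so that both terms matter; the patent leaves the weighting open.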
[0023] The reward function is the core of a reinforcement learning strategy, guiding the agent to select the optimal action. Specifically, the design goal of the reward function is to minimize the voltage and frequency deviations of the system while considering improvements in system efficiency. By optimizing the reward function, the optimal control strategy can be approximated step by step, achieving adaptive control in complex dynamic environments.
[0024] During reinforcement learning, the agent collects environmental information in the state space S, adjusts the system based on control choices in the action space A, and continuously updates the control strategy based on feedback from the reward function R. This process is performed in a closed-loop manner, ultimately achieving real-time optimized control of the power supply quality of the photovoltaic power generation system.
[0025] The above elements together constitute the core control process of reinforcement learning: the state space $S$ defines the environmental information of system operation, including photovoltaic power generation, load power, ambient temperature, and light intensity, and describes the current operating state of the system; the action space $A_t$ defines the executable control inputs of the system, including the control inputs of the photovoltaic inverter and the energy storage system, which are used to adjust the system operating parameters; and the reward function $R$ measures the quality of actions. The control strategy is iteratively updated by optimizing the reward function, with the ultimate goal of improving the power supply quality and operating efficiency of the system.
[0026] Furthermore, Q-learning is a value-function-based reinforcement learning algorithm that continuously updates the state-action value function to find the optimal action in each state. The update rule is:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ R_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right] \tag{7}$$
where $s_t$ and $a_t$ are the current state and action, respectively; $R_t$ is the immediate reward obtained after performing the action; $\alpha$ is the learning rate, which controls the update step size of the value function; and $\gamma$ is the discount factor, which measures the impact of future rewards on current decisions.
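Formula (7) is the standard tabular Q-learning update. A minimal sketch, assuming discrete states and actions used as keys of a `defaultdict(float)` so that unseen pairs start at zero; the function name and the default `alpha`/`gamma` values are illustrative.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step following formula (7).

    q: mapping (state, action) -> value, e.g. defaultdict(float)
    actions: the finite action set searched in max over a'.
    """
    best_next = max(q[(s_next, a_next)] for a_next in actions)  # max_a' Q(s', a')
    td_error = r + gamma * best_next - q[(s, a)]                 # temporal-difference error
    q[(s, a)] += alpha * td_error
    return q[(s, a)]


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

Calling `q_update(q_table, s, a, r, s_next, actions)` once per time step corresponds to the loop described in [0027].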
[0027] At each time step, the photovoltaic power generation system, in the current state $s_t$, selects an action $a_t$; the interaction between this action and the environment produces a new state $s_{t+1}$ and a reward $R_t$, and $Q(s_t, a_t)$ is then updated according to formula (7) so that future decisions yield higher cumulative rewards.
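The description does not say how the action is chosen at each step. A common choice, assumed here rather than taken from the patent, is ε-greedy selection: explore with probability `epsilon`, otherwise take the action with the highest Q-value. Given the safety concern raised in [0005], `epsilon` would likely be kept small on a live system, or exploration confined to simulation.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one.

    q maps (state, action) -> value; missing entries count as 0.0.
    rng is a random.Random instance so runs can be made reproducible.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    return max(actions, key=lambda a: q.get((state, a), 0.0))  # exploit
```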
[0028] Furthermore, reinforcement learning at the strategic layer adopts a long-term control policy optimization equation based on Q-learning:
$$Q(S_{\text{strat}}, A_{\text{strat}}) \leftarrow Q(S_{\text{strat}}, A_{\text{strat}}) + \alpha \left[ R_{\text{strat}} + \gamma \max_{A'} Q(S'_{\text{strat}}, A') - Q(S_{\text{strat}}, A_{\text{strat}}) \right] \tag{8}$$
where $Q(S_{\text{strat}}, A_{\text{strat}})$ is the state-action value of taking action $A_{\text{strat}}$ in the strategic state $S_{\text{strat}}$, used to evaluate the long-term benefit of the policy; $S_{\text{strat}}$ is the current system state at the strategic level, typically including long-term load forecasts and environmental trend parameters; $A_{\text{strat}}$ is an action decision within the strategic control policy; $R_{\text{strat}}$ is the immediate reward at the strategic level, quantifying the contribution of the current strategy to improving the system's long-term power supply quality and efficiency (dimensionless, or a quantified power gain); $\alpha$ is the learning rate (dimensionless), which controls the current learning step size; $\gamma$ is the discount factor (dimensionless), which measures the importance of future rewards to current decisions; $\max_{A'} Q(S'_{\text{strat}}, A')$ is the maximum expected return over all possible actions in the next state $S'_{\text{strat}}$, used for policy improvement; and $A'$ is any action from the set of possible future actions.
[0029] Formula (8) is the strategic-layer Q-learning equation, used to optimize long-term control strategies. Drawing on the analysis of historical data and future predictions, combined with the reward signal $R_{\text{strat}}$, the system dynamically adjusts strategic-level control strategies to maximize long-term operational benefits. Its core objective is to guide the system to select the optimal long-term strategy under changing environmental conditions, thereby optimizing overall power supply quality and efficiency.
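Formula (8) keeps one value per state-action pair, so continuous strategic inputs such as long-term load forecasts and environmental trends must first be mapped to discrete states. A minimal sketch, assuming simple fixed bins; the bin edges, units, and function name are hypothetical and not specified by the patent.

```python
from bisect import bisect_right

# Hypothetical bin edges; the patent does not specify any discretisation.
LOAD_FORECAST_EDGES_KW = [200.0, 400.0, 600.0]  # 4 load-forecast bins
IRRADIANCE_TREND_EDGES = [-50.0, 50.0]          # falling / flat / rising (W/m^2 per day)


def strategic_state(load_forecast_kw: float, irradiance_trend: float) -> tuple[int, int]:
    """Map continuous strategic-layer inputs to a discrete S_strat key."""
    return (bisect_right(LOAD_FORECAST_EDGES_KW, load_forecast_kw),
            bisect_right(IRRADIANCE_TREND_EDGES, irradiance_trend))
```

The resulting tuple can serve directly as the $S_{\text{strat}}$ key of a Q-table updated with formula (8); a finer grid trades learning speed for resolution.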
[0030] At the tactical layer, reinforcement learning adjusts the control strategy based on real-time environmental feedback to optimize inverter output and energy storage scheduling.
[0031] For reinforcement learning at the operational layer, a real-time control optimization equation based on Q-learning is adopted:
$$Q(S_{\text{op}}, A_{\text{op}}) \leftarrow Q(S_{\text{op}}, A_{\text{op}}) + \alpha \left[ R_{\text{op}} + \gamma \max_{A'} Q(S'_{\text{op}}, A') - Q(S_{\text{op}}, A_{\text{op}}) \right] \tag{9}$$
where $Q(S_{\text{op}}, A_{\text{op}})$ is the state-action value of action $A_{\text{op}}$ in the operational-layer state $S_{\text{op}}$, used to evaluate the effectiveness of the instantaneous control strategy; $S_{\text{op}}$ is the real-time state vector of the operational layer, typically including measurable physical variables such as the present voltage, current, and frequency deviation; $A_{\text{op}}$ is a control action at the operational level, such as inverter output regulation, voltage setting, or energy storage response; $R_{\text{op}}$ is the immediate reward of the operational layer, measuring the contribution of the control behavior to voltage/frequency stability (a dimensionless score or an inverse error ratio); $\alpha$ is the learning rate, representing how strongly current experience influences the Q-value update; $\gamma$ is the discount factor, reflecting the importance of future rewards; $\max_{A'} Q(S'_{\text{op}}, A')$ is the maximum expected value over all possible actions at the next moment $S'_{\text{op}}$; and $A'$ is any candidate action from the operational layer's action set.
[0032] Equation (9) is the Q-learning equation for the operational layer, used to optimize the control strategy on a short time scale. It obtains the operational-layer state space $S_{\text{op}}$ and action space $A_{\text{op}}$ in real time and, combined with the reward signal $R_{\text{op}}$, adjusts the output of the inverter and the energy storage system to minimize short-term power supply errors and power losses. Its core objective is to ensure the real-time stability of power quality.
[0033] It is important to note that the strategic layer and the operational layer undertake control tasks at different time scales and achieve hierarchical collaborative control through their respective Q-learning optimization equations. The strategic layer uses the long-term control optimization equation, based on historical data and environmental predictions, to formulate optimal operating strategies across time periods, focusing on the system's global benefits and long-term stability. The operational layer uses the real-time control optimization equation to respond to changes in the system's current state and rapidly adjust key parameters (such as voltage and frequency), ensuring the instantaneous stability of system operation. Through their complementary time scales and the linkage between layers, the two achieve collaborative optimization of the power supply quality of the photovoltaic power generation system in complex dynamic environments, effectively improving the system's overall performance and operational efficiency.
[0034] The strategic layer is responsible for overall optimization, with a longer time scale, and mainly deals with seasonal changes and long-term load trends.
[0035] The tactical layer is responsible for medium-term optimization, taking into account short-term weather changes and load fluctuations. Reinforcement learning at this layer focuses on adjusting strategies based on real-time environmental feedback to optimize inverter output and energy storage scheduling.
[0036] The operational layer is responsible for instantaneous control and execution, primarily adjusting instantaneous parameters such as voltage and frequency to ensure real-time system stability. The reinforcement learning control algorithm in this layer executes at a high frequency and has a short response time.
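The three layers run at different rates. A minimal sketch of that multi-rate structure, assuming every layer is triggered from a single operational-layer tick counter and the slower layers run at integer multiples of it; the function name and the intervals in the usage note are hypothetical.

```python
def layers_due(tick: int, strategic_every: int, tactical_every: int) -> list[str]:
    """Return which layers update at this operational tick, slowest first.

    The operational layer runs every tick; the tactical and strategic
    layers run every `tactical_every` and `strategic_every` ticks.
    """
    due = []
    if tick % strategic_every == 0:
        due.append("strategic")
    if tick % tactical_every == 0:
        due.append("tactical")
    due.append("operational")
    return due
```

With one-second ticks, `layers_due(t, 3600, 60)` would run the strategic layer hourly and the tactical layer every minute. Listing layers slowest first lets each layer consume the freshly updated output of the layer above it within the same tick.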
[0037] Furthermore, the method also includes experience replay and target network mechanisms. Experience replay stores the state-action-reward-next-state tuple of each time step in a memory bank and randomly samples these experiences for updates during training, thereby breaking the temporal correlation of the data and improving the convergence of the algorithm. The target network is a copy of the value function whose parameters are updated only periodically during the Q-learning update process; using it to compute the update targets stabilizes learning.
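Experience replay and the target network are described here only in outline. A minimal sketch for the tabular case, assuming the "target network" is simply a periodically refreshed copy of the Q-table (in deep Q-learning it would be a copy of the network weights); the class and function names, capacity, and hyperparameters are illustrative.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next) transitions."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next) -> None:
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int):
        # A uniform random minibatch breaks the temporal correlation of consecutive steps
        return self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


def replay_step(q, q_target, batch, actions, alpha=0.1, gamma=0.95):
    """Tabular Q updates whose bootstrap term reads the frozen target table."""
    for s, a, r, s_next in batch:
        best_next = max(q_target.get((s_next, a2), 0.0) for a2 in actions)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def sync_target(q):
    """Periodic hard copy of the online table into the target table."""
    return dict(q)
```

Between calls to `sync_target`, the targets stay fixed, so each update chases a stationary value instead of one that moves with every step.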
[0038] Furthermore, the construction of the control framework includes the following steps: (1) Layered design: The control strategy is divided into strategic, tactical and operational layers to deal with long-term, medium-term and short-term control objectives respectively; (2) Setting control objectives: Set long-term optimization objectives at the strategic level, carry out medium-term scheduling optimization at the tactical level, and execute real-time control tasks at the operational level; (3) Control algorithm configuration: The strategic and tactical layers adopt the model predictive control (MPC) method, and the operational layer adopts the proportional-integral-derivative control (PID) method to achieve the timeliness and stability of hierarchical collaborative control.
[0039] By introducing an adaptive optimization strategy based on reinforcement learning, this invention can effectively improve the power supply quality of photovoltaic power generation systems under dynamic environments. Simulation results show that, combining reinforcement learning and hierarchical predictive control, the system exhibits excellent stability and adaptability under varying climate conditions and load fluctuations, significantly reducing voltage and frequency fluctuations and improving overall power generation efficiency.
[0040] The advantages of this invention are: (1) Improve the power supply stability of the photovoltaic power generation system: Through hierarchical predictive control strategy, the system parameters are optimized at different time scales, reducing the fluctuation of key indicators such as voltage and frequency, and improving the overall stability of the system. (2) Enhance the real-time response capability of the system: By using reinforcement learning algorithm, the control strategy is optimized and adjusted in real time, improving the system's rapid response capability to load changes and environmental disturbances. (3) Optimize long-term power supply efficiency: Through optimization at the long-term control level, combined with the adaptive characteristics of reinforcement learning, the photovoltaic power generation system can achieve optimal operation under different weather conditions and load conditions, improving long-term power supply efficiency. (4) Enhance the robustness of the system: By combining the advantages of hierarchical predictive control and reinforcement learning, the system's adaptability to uncertain factors is enhanced, and the control performance in complex dynamic environments is improved. (5) Achieve adaptive control: Through continuous learning and optimization from the environment using reinforcement learning algorithm, the control strategy can be adaptively adjusted to adapt to long-term changes and new operating conditions of the photovoltaic power generation system.
[0041] Secondly, this invention is of great significance to the photovoltaic power generation industry. As the proportion of renewable energy in the power system continues to increase, the power supply quality of photovoltaic power generation systems directly affects the stable operation of the power grid and the user's electricity experience. The control method proposed in this invention is expected to significantly improve the power supply quality of photovoltaic power generation systems, providing an innovative technical path to solve the problems of unstable power supply and low efficiency faced by photovoltaic power generation systems. At the same time, the research results of this invention can provide a reference for the control optimization of other renewable energy power generation systems, promoting the development of smart grid technology.
Attached Figure Description
[0042] Figure 1 is a flowchart of the hierarchical predictive control strategy;
[0043] Figure 2 is a comparison of voltage fluctuations under different control methods in Example 1;
[0044] Figure 3 is a comparison of power supply efficiency under different weather conditions in Example 1;
[0045] Figure 4 shows the reinforcement learning training rounds versus strategy reward curves in Example 1;
[0046] Figure 5 is a comparison of the overall power quality optimization effect in Example 1.
Detailed Implementation
[0047] To verify the effectiveness of this invention in optimizing and controlling the power supply quality of a photovoltaic power generation system, tests were carried out on specific embodiments. The hardware configuration and software tools of the test environment were chosen to ensure valid and accurate results.
[0048] The power quality optimization control method for a photovoltaic power generation system was simulated on the MATLAB platform to verify the effectiveness of the control strategy and optimization algorithm. The simulation model comprises a data collection and preprocessing module, a hierarchical predictive control module (short-term and long-term control), and a reinforcement learning optimization module. Its parameter settings were based on the technical specifications and operating data of photovoltaic power generation systems so that the simulation results are of practical significance. The simulation hardware comprised an Intel Xeon Gold 6248 processor (2.5 GHz), 256 GB of DDR4 RAM, and a 1 TB NVMe SSD. For the data collection and preprocessing module, the sampling frequency is 1 kHz and the data buffering time is 10 seconds.
[0049] The settings of the key simulation parameters are shown in Table 1.
[0050] The dataset for the power quality optimization and control method consists of actual operating data covering different weather conditions and load fluctuations. Photovoltaic module rated power: 300 kW; photovoltaic inverter efficiency: 98%; energy storage system capacity: 500 kWh; load demand fluctuation range: 200-400 kW. The specific parameter settings are shown in Table 1.
[0051] Table 1 Parameter Settings
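The data collection settings of the simulation (1 kHz sampling, 10 s buffering) imply a window of 1,000 × 10 = 10,000 samples per measured channel. A minimal sketch of such a rolling buffer, assuming a simple first-in-first-out window per channel; the class `SampleBuffer` is hypothetical and not part of the described MATLAB model:

```python
from collections import deque

SAMPLING_HZ = 1_000   # 1 kHz sampling frequency (from the embodiment)
BUFFER_SECONDS = 10   # 10 s data buffering time (from the embodiment)


class SampleBuffer:
    """Hypothetical rolling window holding the most recent BUFFER_SECONDS of samples."""

    def __init__(self, rate_hz=SAMPLING_HZ, seconds=BUFFER_SECONDS):
        self.capacity = rate_hz * seconds
        # A deque with maxlen silently discards the oldest sample when full.
        self._data = deque(maxlen=self.capacity)

    def push(self, sample):
        self._data.append(sample)

    def window(self):
        """Return the buffered samples, oldest first."""
        return list(self._data)

    def mean(self):
        return sum(self._data) / len(self._data) if self._data else 0.0


buf = SampleBuffer()
for i in range(12_000):   # 12 s of samples at 1 kHz
    buf.push(float(i))
```

After 12,000 pushes only the most recent 10,000 samples remain, so preprocessing always sees exactly the last 10 s of data.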
[0052] Results analysis: Figure 2 compares the voltage fluctuation of the method of this invention with that of traditional PID control and fuzzy control over 30 seconds. The method of this invention (solid red line) shows the smallest voltage fluctuation amplitude throughout, demonstrating superior stability. In the initial stage (0-5 seconds), the voltage fluctuation of the method of this invention falls rapidly from 0.15 V to 0.08 V, while that of PID control and fuzzy control falls from 0.30 V and 0.25 V to 0.22 V and 0.18 V, respectively, indicating a faster initial response. During the abrupt load change (10-15 seconds), the fluctuation amplitude of the method of this invention is only 0.05 V, whereas PID control and fuzzy control reach 0.20 V and 0.12 V, respectively, showing strong adaptability to load changes. In the steady-state stage (25-30 seconds), the voltage fluctuation of the method of this invention remains around 0.01 V, compared with 0.14 V for PID control and 0.09 V for fuzzy control; that is, its fluctuation is about 93% smaller than that of PID control and 89% smaller than that of fuzzy control, which highlights its superior long-term stability. After the sudden load change at about 10 seconds, the method also recovers much faster, moving from -0.03 V to 0.02 V in just 5 seconds, while PID and fuzzy control take considerably longer to reach a stable state. These results demonstrate the method's advantages in dynamic response and disturbance rejection and make it an efficient and reliable option for voltage stability control in microgrid systems.
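The 93% and 89% steady-state reductions follow directly from the stated amplitudes (0.01 V versus 0.14 V and 0.09 V). A short check, using an illustrative helper `relative_reduction` that is not part of the patent:

```python
# Steady-state voltage fluctuation amplitudes (V) in the 25-30 s window, as stated for Figure 2
steady_state = {"proposed": 0.01, "PID": 0.14, "fuzzy": 0.09}


def relative_reduction(ours, baseline):
    """Percentage by which `ours` is smaller than `baseline`."""
    return (baseline - ours) / baseline * 100


for name in ("PID", "fuzzy"):
    pct = relative_reduction(steady_state["proposed"], steady_state[name])
    print(f"{name}: {pct:.1f}% smaller")
# PID: 92.9% smaller
# fuzzy: 88.9% smaller
```

Rounded to whole percentages, these give the quoted 93% and 89%.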
[0053] Figure 3 compares the power supply efficiency of the method of this invention with that of traditional MPC and PI control under different weather conditions.
[0054] The figure shows that the method of this invention achieves the highest power supply efficiency under all weather conditions. Under sunny conditions its power supply efficiency reaches 95.50%, which is 2.50 and 5.50 percentage points higher than MPC and PI control, respectively. Under cloudy conditions it is 92.75%, 2.50 and 5.25 percentage points higher than MPC and PI control, respectively. Even under the least favorable, rainy conditions, the method of this invention maintains a power supply efficiency of 89.50%, 2.75 and 5.50 percentage points higher than MPC and PI control, respectively. These data demonstrate the superiority and stability of the method of this invention under various weather conditions. As conditions change from sunny to cloudy and then to rainy, the efficiency of all three methods decreases, but the decrease for the method of this invention is no larger than that of either baseline: from sunny to rainy it falls by 6.00 percentage points, compared with 6.25 percentage points for MPC and 6.00 percentage points for PI control. The method of this invention therefore not only achieves the highest absolute efficiency but is also at least as robust to varying weather conditions as the baselines. The error bars indicate measurement uncertainty; even allowing for them, the advantage of the method remains clear. These results confirm the superior performance of the method of this invention in improving the power supply efficiency of photovoltaic power generation systems under complex weather conditions.
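The baseline efficiencies are not given directly, but they can be back-calculated from the stated percentage-point leads. The sketch below does that and recomputes the sunny-to-rainy drop for each method; the helper names are illustrative:

```python
# Efficiency of the method of this invention (%) and its stated leads (percentage points)
PROPOSED = {"sunny": 95.50, "cloudy": 92.75, "rainy": 89.50}
LEAD = {
    "MPC": {"sunny": 2.50, "cloudy": 2.50, "rainy": 2.75},
    "PI": {"sunny": 5.50, "cloudy": 5.25, "rainy": 5.50},
}


def baseline_efficiency(method):
    """Back-calculate a baseline's efficiency from the stated percentage-point lead."""
    return {w: PROPOSED[w] - LEAD[method][w] for w in PROPOSED}


def sunny_to_rainy_drop(eff):
    return round(eff["sunny"] - eff["rainy"], 2)


drops = {"proposed": sunny_to_rainy_drop(PROPOSED)}
for method in LEAD:
    drops[method] = sunny_to_rainy_drop(baseline_efficiency(method))
print(drops)  # {'proposed': 6.0, 'MPC': 6.25, 'PI': 6.0}
```

The implied baselines are 93.00/90.25/86.75% for MPC and 90.00/87.50/84.00% for PI control. The proposed method's 6.00-point drop is smaller than MPC's and equal to PI control's.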
[0055] Figure 4 compares the cumulative policy rewards of three different control methods during reinforcement learning training.
[0056] Taking a sunny-day scenario as an example, the figure shows that the proposed method (solid blue line) maintains the highest cumulative policy reward throughout training. In the initial training phase (rounds 0-50), the proposed method reaches a cumulative reward of 250.75, which is 25.38% higher than the traditional control method and 11.20% higher than the genetic algorithm control method. In absolute terms this lead keeps widening as training proceeds: at the end of training (round 300), the proposed method reaches a cumulative reward of 1501.50, which is 25.13% higher than the traditional control method and 10.97% higher than the genetic algorithm control method. Beyond the final reward, the proposed method also accumulates reward noticeably faster than the other two methods: its average growth rate is 5.01 reward units per round, compared with 4.00 and 4.51 reward units per round for traditional control and genetic algorithm control, respectively. The proposed method therefore improves policy performance more effectively in each training round, demonstrating higher learning efficiency. Although the genetic algorithm control method initially outperforms the traditional control method, the gap between them gradually narrows as training progresses, which may indicate that the convergence of the genetic algorithm slows during long-term training. In contrast, the proposed method maintains high learning efficiency throughout, demonstrating its superiority and stability in complex control tasks.
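The baseline rewards in this comparison are stated only as percentages, but they can be back-calculated. The sketch below does that and checks whether the quoted growth rates match the final cumulative reward divided by 300 rounds. Reading "average growth rate" this way is an assumption, as are the helper names:

```python
ROUNDS = 300
PROPOSED = {"early": 250.75, "final": 1501.50}  # cumulative reward of the proposed method
LEAD_PCT = {  # stated relative leads of the proposed method (%)
    "traditional": {"early": 25.38, "final": 25.13},
    "GA": {"early": 11.20, "final": 10.97},
}


def implied_baseline(value, lead_pct):
    """Baseline reward implied by 'value is lead_pct % higher than the baseline'."""
    return value / (1 + lead_pct / 100)


def absolute_gap(method, phase):
    """Reward difference between the proposed method and a baseline in one phase."""
    return PROPOSED[phase] - implied_baseline(PROPOSED[phase], LEAD_PCT[method][phase])


def growth_rate(final_reward, rounds=ROUNDS):
    """Average reward gained per round, assumed to be final reward / rounds."""
    return final_reward / rounds


for method in LEAD_PCT:
    base = implied_baseline(PROPOSED["final"], LEAD_PCT[method]["final"])
    print(method, f"final~{base:.1f}", f"rate~{growth_rate(base):.2f}",
          f"gap {absolute_gap(method, 'early'):.1f} -> {absolute_gap(method, 'final'):.1f}")
print("proposed", f"rate~{growth_rate(PROPOSED['final']):.3f}")
```

Under these assumptions, the implied final rewards are about 1200 (traditional) and 1353 (genetic algorithm), which give roughly 4.00 and 4.51 per round. The proposed method gives 1501.50 / 300 ≈ 5.005 per round. The absolute gap to both baselines grows from the early to the final phase, while the relative lead stays near 25% and 11%.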
[0057] Figure 5 compares the power quality optimization effects of different control strategies.
[0058] First, under clear weather conditions, all three control algorithms achieved high power quality scores, with the GBDT+MPC algorithm performing best at close to 96 points. This indicates that when photovoltaic generation is sufficient, this algorithm can effectively optimize energy dispatch and energy storage management to maximize power quality. PID control and LQR control also performed well but scored lower, at 91.00 and 93.50 points, respectively. This suggests that traditional control algorithms dispatch less efficiently even under stable weather conditions, especially when handling complex system responses, which makes optimal control performance hard to achieve. Second, under cloudy conditions, the power quality scores of all three algorithms decreased, mainly because cloud cover makes photovoltaic generation more volatile and energy dispatch more challenging. Nevertheless, the GBDT+MPC algorithm still maintained a high score of close to 94.50 points, demonstrating stronger robustness to dynamically changing load demand and generation uncertainty. By comparison, the PID control score dropped to 89.75 and the LQR control score to 92.25, indicating that traditional control algorithms are less flexible and adaptable in the face of uncertainty and are easily affected by external environmental fluctuations. Under rainy conditions, photovoltaic generation drops significantly, scheduling becomes more complex, and the scores of all three algorithms fall further. Even so, the GBDT+MPC algorithm still outperforms the other two, with a power quality score close to 93.25.
This shows that, through efficient scheduling, the algorithm can still provide a stable power supply in adverse weather by making full use of the energy storage system. PID control scored 88.50 and LQR control 91.00 under this condition, further confirming the limitations of traditional control algorithms in adverse weather, especially when photovoltaic generation fluctuates strongly and scheduling strategies must be adjusted quickly. Overall, the GBDT+MPC algorithm achieves the best power quality score under all weather conditions, demonstrating that, by combining machine learning with model predictive control, it can better predict photovoltaic generation and load demand and carry out efficient energy scheduling and control. In contrast, PID control and LQR control are less stable when photovoltaic generation fluctuates and weather changes, especially under complex, dynamically changing operating conditions, which makes optimal power quality difficult to guarantee.
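The scores quoted for Figure 5 can be tabulated to confirm the ranking and the margins under each weather condition. The GBDT+MPC values are described only as "close to" 96, 94.50, and 93.25, so treating them as exact is an assumption; the helper names are illustrative:

```python
# Power quality scores as quoted for Figure 5 (GBDT+MPC values are approximate)
SCORES = {
    "sunny": {"GBDT+MPC": 96.00, "LQR": 93.50, "PID": 91.00},
    "cloudy": {"GBDT+MPC": 94.50, "LQR": 92.25, "PID": 89.75},
    "rainy": {"GBDT+MPC": 93.25, "LQR": 91.00, "PID": 88.50},
}


def ranking(weather):
    """Algorithms ordered from highest to lowest score for one weather condition."""
    scores = SCORES[weather]
    return sorted(scores, key=scores.get, reverse=True)


def margin(weather, baseline, leader="GBDT+MPC"):
    """Score advantage of `leader` over `baseline`, in points."""
    return SCORES[weather][leader] - SCORES[weather][baseline]


for weather in SCORES:
    print(weather, ranking(weather), margin(weather, "LQR"), margin(weather, "PID"))
```

The ordering GBDT+MPC > LQR > PID holds in every condition. The lead over LQR control is 2.25-2.5 points and the lead over PID control is 4.75-5.0 points.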
Claims
1. A method for optimizing and controlling the power supply quality of a photovoltaic power generation system, characterized in that it comprises the following steps: S1: constructing a hierarchical predictive control framework comprising a strategic layer, a tactical layer, and an operational layer, which respectively perform long-term optimal scheduling, medium-term coordinated control, and short-term real-time adjustment; S2: establishing a reinforcement learning adaptive optimization strategy that continuously optimizes control decisions through interaction with the environment and a reward mechanism; S3: combining the reinforcement learning adaptive optimization strategy of S2 with the hierarchical predictive control framework of S1 to optimize the long-term scheduling strategy at the strategic layer, optimize the resource allocation scheme at the tactical layer, and dynamically adjust the control inputs at the operational layer, thereby forming a comprehensive multi-time-scale control framework that achieves global optimization and real-time stable control of the power supply quality of the photovoltaic power generation system.
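Claim 1 describes three layers that act on different time scales. One common way to realize this is a multi-rate loop in which each layer runs at its own period and passes references down to the faster layers. The following is a minimal sketch under that assumption. The periods, the `Layer` and `run_hierarchy` names, and the placeholder policies are all hypothetical; in the claimed method the policies would be reinforcement-learning-assisted MPC (strategic, tactical) and PID (operational):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Layer:
    name: str
    period: int                                            # runs every `period` base steps
    policy: Callable[[Dict[str, float]], Dict[str, float]]
    calls: List[int] = field(default_factory=list)


def run_hierarchy(layers: List[Layer], steps: int, context: Dict[str, float]) -> Dict[str, float]:
    """Multi-rate loop: slower layers run first so faster layers see fresh references."""
    ordered = sorted(layers, key=lambda layer: layer.period, reverse=True)
    for k in range(steps):
        for layer in ordered:
            if k % layer.period == 0:
                context.update(layer.policy(context))
                layer.calls.append(k)
    return context


# Placeholder policies (hypothetical), standing in for RL-assisted MPC and PID.
layers = [
    Layer("strategic", 100, lambda c: {"p_ref": 250.0}),                    # long-term dispatch target, kW
    Layer("tactical", 10, lambda c: {"setpoint": c["p_ref"]}),              # medium-term coordination
    Layer("operational", 1, lambda c: {"u": c["setpoint"] - c["p_meas"]}),  # real-time correction
]
context = run_hierarchy(layers, 200, {"p_meas": 240.0})
```

Over 200 base steps the strategic layer runs twice, the tactical layer 20 times, and the operational layer every step. The design choice is that each layer writes only the references it owns into the shared context.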
2. The method for optimizing and controlling the power supply quality of a photovoltaic power generation system according to claim 1, characterized in that: in step S1, the hierarchical predictive control framework is constructed by dividing the entire control system into short-term control and long-term control, specifically: (S1.1) establishing a dynamic model of the photovoltaic power generation system and defining its state variables and control variables; (S1.2) designing a short-term predictive controller that calculates the optimal control input from the current system state and the prediction model; (S1.3) designing the long-term control, which comprises collecting environmental data, analyzing and predicting through a reinforcement learning algorithm, designing long-term control strategies, and continuously learning from the environment to adjust those strategies through the reinforcement learning algorithm.
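Step S1.2 computes the optimal control input from the current state and a prediction model. The following sketch illustrates that step with an unconstrained finite-horizon MPC. It assumes a scalar linear model x[k+1] = a·x[k] + b·u[k] and a quadratic tracking cost. Neither assumption comes from the patent, and this cost is not the patent's objective (1) or (2). The function name and plant parameters are hypothetical:

```python
import numpy as np


def mpc_first_input(x0, r, a, b, horizon=10, lam=0.1):
    """Unconstrained finite-horizon MPC for the scalar model x[k+1] = a*x[k] + b*u[k].

    Minimises sum_k (x[k] - r)^2 + lam * u[k]^2 over the horizon and returns only
    the first input (receding-horizon principle).
    """
    # Predicted states over k = 1..horizon: X = F*x0 + G @ U
    F = np.array([a ** k for k in range(1, horizon + 1)])
    G = np.zeros((horizon, horizon))
    for k in range(1, horizon + 1):
        for j in range(k):
            G[k - 1, j] = a ** (k - 1 - j) * b
    # Regularised least squares: (G^T G + lam I) U = G^T (r - F x0)
    target = np.full(horizon, r) - F * x0
    U = np.linalg.solve(G.T @ G + lam * np.eye(horizon), G.T @ target)
    return float(U[0])


# Closed-loop usage with hypothetical plant parameters a = 0.9, b = 0.5
x = 0.0
for _ in range(50):
    x = 0.9 * x + 0.5 * mpc_first_input(x, r=1.0, a=0.9, b=0.5)
```

At every step only the first optimized input is applied and the problem is re-solved from the new state. The small penalty on u leaves a slight steady-state offset from the reference.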
3. The method for optimizing and controlling the power supply quality of a photovoltaic power generation system according to claim 1, characterized in that: the integrated control framework of step S3 comprises three layers (strategic, tactical, and operational), which correspond to long-term optimization, medium-term coordinated control, and short-term real-time control tasks, respectively. The strategic layer employs a model predictive control (MPC) approach with objective function (1), in which the active and reactive power at the k-th time step are expressed in watts (W); three dimensionless weighting coefficients balance the importance of the different objective terms; and the penalty term, in W, measures the cost or performance loss incurred when the system violates control constraints. The tactical layer also employs the MPC method, with objective function (2), in which the tactical-layer optimization time window is expressed in steps, i.e., the number of prediction steps within the system control cycle, and three dimensionless weighting coefficients balance the optimization weights of the active power, reactive power, and penalty terms. The operational layer employs a PID control algorithm, whose control law is:
$$u(t) = K_p e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d \frac{de(t)}{dt} \quad (3)$$
where $u(t)$ is the control signal, $e(t)$ is the error signal, and $K_p$, $K_i$, and $K_d$ are the proportional, integral, and derivative coefficients, respectively.
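The operational-layer PID law can be discretized for a digital controller. The sketch below uses rectangular integration and a backward-difference derivative, which are common choices but are not specified in the claim. The gains and the first-order test plant are hypothetical:

```python
class PID:
    """Discrete-time PID: running-sum integral and backward-difference derivative."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement              # e(t)
        self.integral += error * self.dt            # approximates the integral of e
        if self.prev_error is None:
            derivative = 0.0                        # avoid a derivative kick on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative  # u(t)


# Usage on a hypothetical first-order plant dx/dt = -x + u (Euler step of 0.01 s)
pid = PID(kp=2.0, ki=5.0, kd=0.0, dt=0.01)
x = 0.0
for _ in range(2000):
    u = pid.update(1.0, x)
    x += 0.01 * (-x + u)
```

The integral term removes the steady-state error, so after 20 simulated seconds the output settles at the setpoint.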
4. The method for optimizing and controlling the power supply quality of a photovoltaic power generation system according to claim 1, characterized in that: The reinforcement learning adaptive optimization strategy employs the Q-learning algorithm and combines it with the multi-level control structure of the hierarchical predictive control framework to achieve dynamic optimization and adjustment of short-term control inputs and long-term strategies.
5. The method for optimizing and controlling the power supply quality of a photovoltaic power generation system according to claim 4, characterized in that: establishing the reinforcement learning adaptive optimization strategy requires constructing the state space, action space, and reward function of the reinforcement learning control process, i.e., modeling the environment and the control variables, which comprises the following elements: (1) Define the state space: the state vector $S_t$, composed of environmental variables such as photovoltaic generation, load power, ambient temperature, and light intensity, represents the current system state:
$$S_t = \left[P_{\text{pv}}(t),\ P_{\text{load}}(t),\ G(t),\ T(t)\right] \quad (4)$$
where $P_{\text{pv}}(t)$ is the current photovoltaic generation, $P_{\text{load}}(t)$ the current load power, $G(t)$ the current light intensity, and $T(t)$ the current ambient temperature; (2) Define the action space $A_t$, which comprises the control input of the photovoltaic inverter and the output regulation of the energy storage system and enables dynamic adjustment of the control variables:
$$A_t = \left[u_{\text{inv}}(t),\ u_{\text{ess}}(t)\right] \quad (5)$$
where the unit of $A_t$ depends on the specific control interface; $u_{\text{inv}}(t)$ is the control input of the photovoltaic inverter, expressed as a duty cycle (%) or a voltage (V); and $u_{\text{ess}}(t)$ is the output regulation of the energy storage system, expressed as a power (W) or a current (A); (3) Define the reward function $R_t$, which combines power quality, power efficiency, and system stability into a reward signal that guides the control strategy toward the optimum, as given by formula (6), in which $V_t$ and $f_t$ are the system voltage and frequency, $V_{\text{ref}}$ and $f_{\text{ref}}$ are the reference voltage and frequency, $\eta_t$ is the system operating efficiency, and three weighting factors set the respective influence of voltage, frequency, and system efficiency on the reward; (4) Strategy optimization: the reinforcement learning strategy is updated dynamically by the Q-learning algorithm so that the control strategy is optimized progressively.
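The state, action, and reward definitions of claim 5 can be sketched as plain data types. The field names, the discretization bin widths (needed to key a tabular Q-function), the 220 V / 50 Hz reference values, the storage sign convention, and unit weights are all hypothetical. The exact form of reward (6) is not given in the text, so the absolute-deviation form below is an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PVState:
    """State vector S_t; field names are illustrative."""
    p_pv: float         # current PV generation (kW)
    p_load: float       # current load power (kW)
    irradiance: float   # current light intensity (W/m^2)
    temperature: float  # current ambient temperature (deg C)

    def discretize(self, bins=(50.0, 50.0, 100.0, 5.0)):
        """Bucket each variable (hypothetical bin widths) so the state can key a Q-table."""
        values = (self.p_pv, self.p_load, self.irradiance, self.temperature)
        return tuple(int(v // b) for v, b in zip(values, bins))


@dataclass(frozen=True)
class PVAction:
    """Action vector A_t."""
    inverter_duty: float   # inverter control input as a duty cycle (%)
    storage_power: float   # storage output regulation (kW); assumed sign: + = discharge


def reward(v, f, eta, v_ref=220.0, f_ref=50.0, w_v=1.0, w_f=1.0, w_eta=1.0):
    """Assumed reward form: penalise voltage and frequency deviation, reward efficiency."""
    return -w_v * abs(v - v_ref) - w_f * abs(f - f_ref) + w_eta * eta


state = PVState(p_pv=260.0, p_load=310.0, irradiance=820.0, temperature=27.0)
print(state.discretize())  # (5, 6, 8, 5)
```

In practice the weights would be tuned so that deviations measured in volts and hertz are comparable in scale to the efficiency term.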
6. The method for optimizing and controlling the power supply quality of a photovoltaic power generation system according to claim 4, characterized in that: Q-learning is a value-function-based reinforcement learning algorithm that continuously updates the state-action value function in order to find the optimal action in each state, according to the following rule:
$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[R_t + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t)\right] \quad (7)$$
where $S_t$ and $A_t$ are the current state and action, respectively; $R_t$ is the immediate reward after the action is performed; $\alpha$ is the learning rate, which controls the update step size of the value function; and $\gamma$ is the discount factor, which measures the influence of future rewards on the current decision. At each time step, the photovoltaic power generation system selects an action $A_t$ according to the current state $S_t$; the interaction of this action with the environment produces a new state $S_{t+1}$ and a reward $R_t$; $Q(S_t, A_t)$ is then updated according to formula (7) so that future decisions yield higher cumulative rewards.
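Update rule (7) is standard tabular Q-learning. A minimal sketch, assuming discretized states and a finite action set (tabular Q-learning requires both). The epsilon-greedy exploration, the hyperparameter values, and the `QLearner` name are illustrative choices that the claim does not specify:

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning with epsilon-greedy action selection."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)   # (state, action) -> value, default 0.0
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        """Explore with probability epsilon, otherwise pick the greedy action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])


agent = QLearner(["raise", "hold", "lower"], alpha=0.5, gamma=0.9)
agent.update("s0", "raise", 1.0, "s1")
```

With all values starting at zero, one update with reward 1.0 and alpha 0.5 moves Q("s0", "raise") to 0.5.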
7. The method for optimizing and controlling the power supply quality of a photovoltaic power generation system according to claim 6, characterized in that: reinforcement learning at the strategic layer adopts a long-term control policy optimization equation based on Q-learning:
$$Q(S_{\text{strat}}, A_{\text{strat}}) \leftarrow Q(S_{\text{strat}}, A_{\text{strat}}) + \alpha\left[R_{\text{strat}} + \gamma \max_{a'} Q(S'_{\text{strat}}, a') - Q(S_{\text{strat}}, A_{\text{strat}})\right] \quad (8)$$
where $Q(S_{\text{strat}}, A_{\text{strat}})$ is the state-action value function for taking action $A_{\text{strat}}$ in strategic state $S_{\text{strat}}$, used to evaluate the long-term benefit of the policy; $S_{\text{strat}}$ is the current system state at the strategic layer, typically including long-term load forecasts and environmental trend parameters; $A_{\text{strat}}$ is an action decision within the strategic control policy; $R_{\text{strat}}$ is the immediate reward at the strategic layer, quantifying the contribution of the current strategy to improving the system's long-term power supply quality and efficiency (dimensionless, or a quantified power gain); $\alpha$ is the learning rate, which controls the current learning step size (dimensionless); $\gamma$ is the discount factor, which measures the importance of future rewards to the current decision (dimensionless); $\max_{a'} Q(S'_{\text{strat}}, a')$ is the maximum expected return over all possible actions in the next state $S'_{\text{strat}}$, used for policy improvement; and $a'$ is any action from the set of possible future actions. Reinforcement learning at the tactical layer adjusts the strategy according to real-time environmental feedback to optimize inverter output and energy storage scheduling.
Reinforcement learning at the operational layer adopts a real-time control optimization equation based on Q-learning:
$$Q(S_{\text{op}}, A_{\text{op}}) \leftarrow Q(S_{\text{op}}, A_{\text{op}}) + \alpha\left[R_{\text{op}} + \gamma \max_{a'} Q(S'_{\text{op}}, a') - Q(S_{\text{op}}, A_{\text{op}})\right] \quad (9)$$
where $Q(S_{\text{op}}, A_{\text{op}})$ is the state-action value function for taking action $A_{\text{op}}$ in operational-layer state $S_{\text{op}}$, used to evaluate the effectiveness of the instantaneous control strategy; $S_{\text{op}}$ is the real-time state vector of the operational layer, typically including measurable physical variables such as the present voltage, current, and frequency deviation; $A_{\text{op}}$ is the control action at the operational layer; $R_{\text{op}}$ is the immediate reward of the operational layer, which measures the contribution of the control action to voltage and frequency stability (a dimensionless score or an inverse error ratio); $\alpha$ is the learning rate, which sets how strongly current experience influences the Q-value update; $\gamma$ is the discount factor, reflecting the importance of future rewards; $\max_{a'} Q(S'_{\text{op}}, a')$ is the maximum expected value over all possible actions in the next state $S'_{\text{op}}$; and $a'$ is any candidate action from the operational layer's action set.
8. The method for optimizing and controlling the power supply quality of a photovoltaic power generation system according to claim 7, characterized in that: the method further comprises an experience replay mechanism and a target network mechanism. Experience replay stores the state-action-reward-next-state tuple of each time step in a memory bank and randomly samples these experiences for updates during training, which breaks the temporal correlation of the data and improves the convergence of the algorithm. The target network is a copy of the value function used in the Q-learning update whose parameters are updated only periodically, which stabilizes the learning process.
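Experience replay and a target network are usually paired with a neural Q-function (as in DQN), but the same two mechanisms can be shown on a tabular Q-function. The sketch below is that tabular analogue. The buffer capacity, batch size, synchronization interval, and class name are hypothetical, since the claim does not specify them:

```python
import random
from collections import deque


class ReplayQLearner:
    """Tabular Q-learning with an experience replay buffer and a periodically synced target table."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, capacity=10_000,
                 batch_size=32, sync_every=100, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.q = {}                               # online value table
        self.q_target = {}                        # target table, refreshed every `sync_every` steps
        self.memory = deque(maxlen=capacity)      # replay buffer of (s, a, r, s_next)
        self.batch_size, self.sync_every = batch_size, sync_every
        self.updates = 0
        self.rng = random.Random(seed)

    def remember(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))

    def train_step(self):
        """Update from a random minibatch; bootstrap targets use the frozen target table."""
        if len(self.memory) < self.batch_size:
            return
        for s, a, r, s_next in self.rng.sample(list(self.memory), self.batch_size):
            best_next = max(self.q_target.get((s_next, b), 0.0) for b in self.actions)
            old = self.q.get((s, a), 0.0)
            self.q[(s, a)] = old + self.alpha * (r + self.gamma * best_next - old)
        self.updates += 1
        if self.updates % self.sync_every == 0:
            self.q_target = dict(self.q)          # periodic target synchronisation


agent = ReplayQLearner(["a"], alpha=0.5, gamma=0.0, batch_size=4, sync_every=2)
for _ in range(4):
    agent.remember("s", "a", 1.0, "s")
```

Random minibatch sampling decorrelates consecutive transitions. Because the bootstrap term reads from the frozen target table, the targets stay fixed between synchronizations.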
9. The method for optimizing and controlling the power supply quality of a photovoltaic power generation system according to claim 8, characterized in that: the construction of the control framework comprises the following steps: (1) layered design: the control strategy is divided into strategic, tactical, and operational layers that address long-term, medium-term, and short-term control objectives, respectively; (2) setting control objectives: long-term optimization objectives are set at the strategic layer, medium-term scheduling optimization is carried out at the tactical layer, and real-time control tasks are executed at the operational layer; (3) control algorithm configuration: the strategic and tactical layers adopt the model predictive control (MPC) method and the operational layer adopts the proportional-integral-derivative (PID) control method, so that the hierarchical collaborative control is both timely and stable.