A photovoltaic operation status management and control method and system

By constructing a photovoltaic ecological monitoring model and utilizing the physical coupling field submodule and the photovoltaic ecological agent submodule, the problems of insufficient dynamic change adaptability and global coordination capability in traditional photovoltaic operation and management are solved, and efficient and accurate management and control of photovoltaic clusters are achieved.

CN122243183A · Pending · Publication Date: 2026-06-19 · SOUTHERN POWER GRID DIGITAL GRID RESEARCH INSTITUTE CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: SOUTHERN POWER GRID DIGITAL GRID RESEARCH INSTITUTE CO LTD
Filing Date: 2026-02-11
Publication Date: 2026-06-19


IPC: G06Q10/0635; G06Q10/0637; G06Q10/04; G06Q10/067; G06Q50/06; G06N5/04; G06N3/042; G06N3/0464; G06N3/092; G06N3/006

AI Tagging

Application Domain

Forecasting; Biological models

Technical Efficacy Phrases

implement mining; achieve forecast


AI Technical Summary

Technical Problem

Traditional photovoltaic operation and management methods struggle to adapt to the rapid dynamic changes of distributed photovoltaic systems and lack global coordination capabilities, which leads to group oscillations and reduced overall efficiency. They also lack quantitative assessment of the overall system operating status and forward-looking risk warning.

Method used

A photovoltaic ecological monitoring model containing a physical coupling field submodule and a photovoltaic ecological agent submodule is constructed and trained. The current transmission and heat diffusion phenomena are simulated through graph attention network and graph convolutional network. The optimal control strategy is generated by combining multi-agent reinforcement learning, and risk assessment and counterfactual reasoning are performed to achieve global collaborative management of photovoltaic clusters.

Benefits of technology

It enables the mining and prediction of photovoltaic cluster behavior, improves the pertinence and efficiency of operation and maintenance decisions, generates the optimal control strategy that balances strategy effectiveness and robustness under extreme disturbances, and achieves efficient and accurate photovoltaic operation status management and control.


Abstract

This invention provides a photovoltaic (PV) operation status management and control method and system, comprising: constructing and training a PV ecological monitoring model including a physical coupling field submodule and a PV ecological agent submodule; performing risk assessment on a real-time PV operation dataset based on the PV ecological monitoring model to generate risk assessment results; performing strategy exploration and counterfactual reasoning based on the risk assessment results to generate corresponding optimal control strategies; executing the optimal control strategies; and performing feedback optimization based on the feedback data corresponding to the optimal control strategies, thereby achieving efficient and accurate management and control of PV operation status.


Description

Technical Field

[0001] This invention relates to the field of photovoltaic operation management technology, and more specifically, to a photovoltaic operation status management and control method and system.

Background Technology

[0002] In the field of photovoltaic operation and management, the operation status management and coordinated control of large-scale arrays are the foundation for ensuring system safety, stability and efficiency, but traditional methods still have certain limitations in management and control.

[0003] On the one hand, traditional methods are often unable to adapt to the rapid dynamic changes of distributed photovoltaics and often lack global coordination capabilities, which can easily lead to group oscillations and thus damage the overall efficiency. On the other hand, traditional methods are often limited to threshold alarms and post-event analysis of independent parameters such as voltage, current and temperature, and mostly lack quantitative assessment of the overall system operation status (such as synchronization stability and thermal imbalance propagation mode) and forward-looking risk warning capabilities.

Summary of the Invention

[0004] In view of the aforementioned problems, and in conjunction with the first aspect of the present invention, embodiments of the present invention provide a photovoltaic operation status management and control method, the method comprising:

[0005] Construct and train a photovoltaic ecological monitoring model that includes a physical coupling field submodule and a photovoltaic ecological agent submodule;

[0006] Risk assessment is performed on real-time photovoltaic operation datasets based on the photovoltaic ecological monitoring model, and risk assessment results are generated.

[0007] Based on the risk assessment results, strategy exploration and counterfactual reasoning are performed to generate the corresponding optimal control strategy;

[0008] The optimal control strategy is executed, and feedback optimization is performed based on the feedback data corresponding to the optimal control strategy.

[0009] As a further aspect of the present invention, a photovoltaic ecological monitoring model comprising a physical coupling field submodule and a photovoltaic ecological agent submodule is constructed and trained, including:

[0010] A graph attention network is used to simulate the current transmission process of a photovoltaic cluster, and a photovoltaic electrical network is constructed.

[0011] A graph convolutional network is used to simulate the thermal diffusion phenomenon of photovoltaic clusters, and a photovoltaic thermal network is constructed.

[0012] A physical coupling field submodule is constructed based on the photovoltaic electrical network and photovoltaic thermal network. The physical coupling field submodule is used to output the global physical coupling field distribution of the photovoltaic cluster.

[0013] A photovoltaic agent is constructed for each photovoltaic module in the photovoltaic cluster based on multi-agent reinforcement learning.

[0014] The policy network of the photovoltaic agent takes the local state of the photovoltaic module as input, combines it with a preset reward function, and outputs the corresponding parameter adjustment strategy.

[0015] A photovoltaic ecological agent submodule is constructed based on an agent cluster composed of photovoltaic agents;

[0016] The global physical coupling field distribution output by the physical coupling field submodule is used as the state space parameter of the photovoltaic agent;

[0017] After the photovoltaic cluster executes the parameter adjustment strategy output by the photovoltaic agent, it updates the parameters of the physical coupling field submodule based on the latest photovoltaic cluster state.

[0018] As a further aspect of the present invention, the method further includes:

[0019] Using historical operating data of photovoltaic clusters as the training dataset, the time-series prediction capability of the physical coupling field submodule is trained by combining a preset loss function;

[0020] The preset loss function includes at least the mean square error between the predicted value and the true value of the training dataset, and the sum of the residuals between the predicted value and the preset physical constraints;

[0021] Based on the pre-trained physical coupling field submodule, the training dataset is run in a preset digital twin environment to simulate different photovoltaic cluster operation scenarios;

[0022] With the goal of maximizing cumulative reward, multi-agent proximal policy optimization is used to train the photovoltaic ecological agent submodule.

[0023] As a further aspect of the present invention, a risk assessment is performed on the real-time photovoltaic operation dataset based on the photovoltaic ecological monitoring model to generate risk assessment results, including:

[0024] After the real-time photovoltaic operation dataset is input into the photovoltaic ecological monitoring model, the physical coupling field submodule extracts field features from the real-time photovoltaic operation dataset and generates a physical coupling field feature vector.

[0025] The physical coupling field feature vector includes at least global field features and local anomaly features;

[0026] The photovoltaic ecological agent submodule analyzes the real-time photovoltaic operation dataset and generates an ecological agent feature vector;

[0027] The ecological agent feature vector includes at least a strategy consistency coefficient and a strategy preference distribution;

[0028] Risk prediction is performed based on the physical coupling field feature vector and the ecological agent feature vector to obtain short-term risk probability, phase transition early warning information and abnormal area distribution.

[0029] The physical coupling field feature vector, ecological agent feature vector, short-term risk probability, phase transition early warning information, and abnormal area distribution are encapsulated into a risk assessment result for output.

[0030] As a further aspect of the present invention, strategy exploration and counterfactual reasoning are performed based on the risk assessment results to generate a corresponding optimal control strategy, including:

[0031] The risk assessment results are used to conduct a demand analysis to obtain management and control demand data;

[0032] The management and control demand data are analyzed based on a preset candidate strategy generator to generate a set of candidate control strategies;

[0033] The candidate control strategy set is executed within a preset digital twin environment, and a preset counterfactual perturbation generation mechanism is used to inject counterfactual perturbations into the execution process of the candidate control strategy set.

[0034] The counterfactual perturbations include at least Monte Carlo perturbations based on historical distributions, extreme perturbations based on chaos theory, and adversarial perturbations based on adversarial networks.

[0035] Monitor the execution utility of the candidate control strategy set after injecting counterfactual perturbations, and generate a first utility dataset;

[0036] The first utility dataset is compared with an interference-free baseline to obtain a stability score for each candidate control strategy.

[0037] The candidate control strategy set is screened based on the stability score of the candidate control strategy to obtain the optimal control strategy.
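
The screening described in paragraphs [0035] to [0037] can be illustrated with a minimal Python sketch. The source does not state how the stability score is computed, so the formula used here (one minus the mean relative utility loss against the interference-free baseline), the ranking rule, and the names `stability_score` and `select_optimal` are assumptions made purely for illustration. The sketch also assumes a nonzero baseline utility.

```python
import numpy as np

def stability_score(perturbed_utils, baseline_util):
    """Assumed score in [0, 1]; 1 means utility is unchanged under perturbation."""
    perturbed = np.asarray(perturbed_utils, dtype=float)
    # Relative utility loss per perturbed run; gains are counted as zero loss.
    loss = np.clip((baseline_util - perturbed) / abs(baseline_util), 0.0, 1.0)
    return float(1.0 - loss.mean())

def select_optimal(candidates):
    """candidates maps a strategy name to (baseline_util, perturbed_utils).

    Ranking by baseline utility times stability is an assumed trade-off
    between effectiveness and robustness, not the patent's stated rule.
    """
    ranked = {
        name: base * stability_score(pert, base)
        for name, (base, pert) in candidates.items()
    }
    return max(ranked, key=ranked.get)

# Hypothetical utilities observed in the digital twin after perturbation.
candidates = {
    "aggressive": (100.0, [40.0, 50.0, 60.0]),
    "moderate": (90.0, [81.0, 90.0, 72.0]),
}
best = select_optimal(candidates)
```

In this toy data the moderate strategy is selected: its baseline utility is lower, but it degrades far less under the injected perturbations.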

[0038] As a further aspect of the present invention, the management and control demand data is analyzed based on a preset candidate strategy generator to generate a candidate control strategy set, including:

[0039] A dynamic strategy space is constructed based on management and control demand data. Latin hypercube sampling is used to randomly sample within the dynamic strategy space to generate the original control strategy set.

[0040] Simulations are performed on all original control strategies within the original control strategy set to obtain the corresponding expected control effects.

[0041] If the expected control effect meets the preset screening conditions, the corresponding original control strategy is used as a candidate control strategy.

[0042] If the expected control effect does not meet the preset screening conditions, a multi-objective evolutionary algorithm is used to iteratively optimize the original control strategy set until the number of strategies in the candidate control strategy set reaches a preset threshold.
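
The Latin hypercube sampling step in paragraph [0039] can be sketched as follows. This is a hand-written NumPy illustration, not the patent's implementation. The two strategy dimensions (a power adjustment fraction and a ramp time in seconds) and their bounds are hypothetical, since the source does not define the contents of the dynamic strategy space.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension is covered by one
    point per equal-width stratum (the defining property of LHS)."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)  # shape (d, 2): [low, high]
    d = bounds.shape[0]
    # One uniform point inside each of n equal strata of [0, 1).
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, d))) / n_samples
    # Shuffle the strata independently in every dimension.
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + u * (hi - lo)

# Hypothetical strategy space: power adjustment in [-20%, 0], ramp time 1-30 s.
samples = latin_hypercube(5, [(-0.20, 0.0), (1.0, 30.0)])
```

Each row of `samples` is one original control strategy. Compared with plain uniform sampling, the stratification gives even coverage of every dimension with few samples.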

[0043] As a further aspect of the present invention, the optimal control strategy is executed, and feedback optimization is performed based on the feedback data corresponding to the optimal control strategy, including:

[0044] Obtain feedback data after the execution of the optimal control strategy, wherein the feedback data includes at least the initial photovoltaic operating state, the optimal control strategy adopted, the actual system response, and the final photovoltaic operating state;

[0045] An incremental learning package is generated based on the feedback data, and the photovoltaic ecological monitoring model is incrementally updated based on the incremental learning package.

[0046] The corresponding parameters in the preset strategy space are updated based on the feedback data.

[0047] Furthermore, embodiments of the present invention also provide a photovoltaic operation status management and control system, comprising:

[0048] A model construction module is used to build and train a photovoltaic ecological monitoring model, which includes a physical coupling field submodule and a photovoltaic ecological agent submodule.

[0049] The risk assessment module is used to perform risk assessment on the real-time photovoltaic operation dataset based on the photovoltaic ecological monitoring model and generate risk assessment results.

[0050] The strategy generation module is used to explore strategies and perform counterfactual reasoning based on the risk assessment results, and generate the corresponding optimal control strategy.

[0051] The feedback optimization module is used to obtain feedback data after executing the optimal control strategy, and to perform feedback optimization on the corresponding modules of the system based on the feedback data.

[0052] Compared with the prior art, the present invention has the following beneficial effects:

[0053] By constructing a photovoltaic ecological monitoring model that integrates physical law constraints and intelligent agent interaction simulation, the behavior of photovoltaic clusters can be mined and predicted, providing a non-destructive and fast virtual test environment for all strategies.

[0054] By combining real-time data to calculate corresponding macroscopic order parameters and microscopic behavioral characteristics, the system's stability evolution and internal coordination status can be monitored, enabling precise risk identification and early warning, thereby improving the pertinence and efficiency of operation and maintenance decisions.

[0055] By using counterfactual reasoning and multi-objective optimization, an optimal control strategy that can balance strategy effectiveness and robustness under extreme perturbation stress testing is generated, ultimately achieving efficient and accurate management and control of photovoltaic operation status.

Brief Description of the Drawings

[0056] Figure 1 is a flowchart of the steps of a photovoltaic operation status management and control method according to the present invention;

[0057] Figure 2 is a diagram illustrating the cases listed in step S1-2;

[0058] Figure 3 is a schematic diagram of a photovoltaic operation status management and control system according to the present invention.

Detailed Description

[0059] The present invention will now be described in detail with reference to the accompanying drawings. Figure 1 is a flowchart illustrating the steps of a photovoltaic operation status management and control method according to the present invention. Figure 2 is a schematic diagram of the cases listed in step S1-2. The method is described in detail below.

[0060] Step S1: Construct and train a photovoltaic ecological monitoring model that includes a physical coupling field submodule and a photovoltaic ecological agent submodule.

[0061] In this embodiment, step S1 includes:

[0062] Step S1-1: Construct and train the physical coupling field submodule.

[0063] Specifically, a graph attention network is used to simulate the current transmission process of the photovoltaic cluster and a photovoltaic electrical network is constructed. A graph convolutional network is used to simulate the thermal diffusion phenomenon of the photovoltaic cluster and a photovoltaic thermal network is constructed. Based on the photovoltaic electrical network and the photovoltaic thermal network, a physical coupling field submodule is constructed. The physical coupling field submodule is used to output the global physical coupling field distribution of the photovoltaic cluster.

[0064] Furthermore, using historical operating data of the photovoltaic cluster as a training dataset, the time-series prediction capability of the physical coupling field submodule is trained by combining a preset loss function. The preset loss function includes at least the mean square error between the predicted value and the true value of the training dataset, as well as the sum of the residuals between the predicted value and the preset physical constraints.

[0065] Understandably, the construction of the training dataset represents the collection of historical operation datasets of photovoltaic power plants. These historical operation datasets cover various scenarios under different seasons, weather, and load conditions, and include typical fault cases such as hot spot formation, bypass diode failure, and power attenuation caused by module glass breakage. Each case scenario sample contains at least the status data of the photovoltaic modules, such as the time-series data of DC side voltage, current, and backsheet temperature of each photovoltaic module; environmental data, such as total horizontal irradiance, ambient temperature, wind speed and direction; and grid-side data, such as grid connection point voltage, frequency, and active or reactive power commands.

[0066] The historical operational data is processed using Z-score-based outlier detection and removal, linear interpolation to fill in temporary data gaps caused by communication interruptions, and sliding window-based standardization to eliminate the influence of daily variations and seasonal trends. This process ultimately generates a corresponding training dataset. For example, for a subarray consisting of 1000 photovoltaic panels, its historical operational dataset contains second-level data for a continuous year. Suppose that during preprocessing, it is found that the current value of a certain photovoltaic panel is consistently 0 on a sunny afternoon, but the irradiance during that period is very strong. Combined with maintenance records, it is confirmed that the panel was disconnected for maintenance that day. Therefore, the data corresponding to that photovoltaic panel during that period is marked as invalid and removed from the training set. For the historical data that is retained, feature extraction is performed to obtain the corresponding features. Taking the local temperature gradient feature as an example, the extraction of the local temperature gradient feature can be represented by quantifying the root mean square of the temperature difference between the current photovoltaic module and its four adjacent modules (east, west, south, and north).
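
Two of the preprocessing steps above, together with the local temperature gradient feature, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the Z-score threshold of 3, and the handling of edge modules (using only the neighbors that exist) are hypothetical choices, and the sliding-window standardization step is left out.

```python
import numpy as np

def clean_series(x, z_max=3.0):
    """Z-score outlier removal followed by linear interpolation of gaps."""
    x = np.asarray(x, dtype=float).copy()
    mu, sigma = np.nanmean(x), np.nanstd(x)
    if sigma > 0:
        # Mark points far from the mean as missing.
        x[np.abs(x - mu) / sigma > z_max] = np.nan
    idx = np.arange(x.size)
    good = ~np.isnan(x)
    # Fill missing points by interpolating between valid neighbors.
    return np.interp(idx, idx[good], x[good])

def local_temp_gradient(temps, r, c):
    """RMS temperature difference between module (r, c) and its
    north/south/west/east neighbors on a 2-D layout grid."""
    temps = np.asarray(temps, dtype=float)
    diffs = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < temps.shape[0] and 0 <= cc < temps.shape[1]:
            diffs.append(temps[rr, cc] - temps[r, c])
    if not diffs:
        return 0.0
    return float(np.sqrt(np.mean(np.square(diffs))))

# Hypothetical 3x3 patch with one module 4 degC hotter than its neighbors.
grid = np.full((3, 3), 60.0)
grid[1, 1] = 64.0
feature = local_temp_gradient(grid, 1, 1)
```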

[0067] In one possible embodiment, a graph neural network is used as the basic architecture of the physical coupling field submodule. Specifically, each photovoltaic module is first defined as a graph node. The node feature vector includes at least its own voltage, current, temperature, received irradiance, and derived features such as local temperature gradient. The edge connections between nodes include the following two types of relationships: one is the electrical connection edge based on the electrical topology, whose edge features may include estimated values such as line resistance and inductance; the other is the thermal connection edge based on spatial location and wind direction, used to simulate heat conduction and convection, whose edge features may include distance, relative wind direction angle, and heat transfer coefficient. For example, in an array using a string structure, 20 photovoltaic modules are connected in series to form a string. These 20 nodes are connected end to end through 19 electrical connection edges to form a series chain. At the same time, thermal connection edges are established between each module and its physically adjacent modules above, below, left, and right.
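
The two edge types in the string example can be sketched as plain edge lists. The physical layout is an assumption: the 20 modules are placed on a hypothetical 2 x 10 grid in row-major order, which the source does not specify. Edge features such as resistance or heat transfer coefficient are omitted.

```python
def build_string_graph(n_rows, n_cols):
    """Return (electrical_edges, thermal_edges) for one series string
    laid out row by row on an n_rows x n_cols grid (assumed layout)."""
    n = n_rows * n_cols

    def node_id(r, c):
        return r * n_cols + c

    # Series wiring: module i is connected to module i + 1.
    electrical = [(i, i + 1) for i in range(n - 1)]

    # Thermal coupling: each module is linked to its grid neighbors.
    thermal = []
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:
                thermal.append((node_id(r, c), node_id(r, c + 1)))
            if r + 1 < n_rows:
                thermal.append((node_id(r, c), node_id(r + 1, c)))
    return electrical, thermal

electrical_edges, thermal_edges = build_string_graph(2, 10)
```

For 20 modules this yields the 19 electrical edges mentioned in the text, plus 28 undirected thermal edges for the assumed 2 x 10 layout.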

[0068] A hybrid architecture of graph attention network and graph convolutional network is adopted to handle the coupled dynamics of electrical and thermal fields respectively. Specifically, for the electrical field, the voltage and current of each node at the previous moment and the change of the power setpoint at the current moment are taken as inputs. The redistribution of current is simulated through a multi-layer graph attention network, and the predicted voltage amplitude and phase angle of each node are finally output. During the training process, standard circuit constraints including Ohm's law and Kirchhoff's law are implicitly learned. For the temperature field, the temperature at the previous moment, the current heating power of each component, the ambient temperature, and the wind speed and direction are taken as inputs. The heat diffusion process is simulated through a graph convolutional network, and the predicted temperature of each component is output.
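
As an illustration of the thermal branch, the sketch below shows a single graph-convolution step with symmetric normalization, a commonly used propagation rule for graph convolutional networks. The source does not specify the layer form, depth, or weights, so the function name `gcn_layer`, the one-feature input, and the identity weight are assumptions.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

# Three modules in a row; the middle one is hotter (hypothetical values).
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
temps = np.array([[60.0], [70.0], [60.0]])
weight = np.array([[1.0]])                      # untrained placeholder weight
hidden = gcn_layer(adj, temps, weight)
```

Each output row mixes a module's own temperature with those of its thermal neighbors, which is what lets stacked layers approximate diffusion across the array. The output is a hidden representation, not a calibrated temperature.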

[0069] The physical coupling field submodule is pre-trained according to a predefined loss function. The loss function consists of the mean square error between the predicted value and the actual monitored value in the training dataset, plus a physical conservation law residual. The physical conservation law residual is calculated after forward propagation of each batch of training data to determine whether the prediction result violates basic physical laws. For example, for any electrical node, the difference between its total inflow current and total outflow current should be close to zero; for any closed loop, the sum of voltage drops should be close to zero; and for any component, the input electrical power, output thermal power, radiative heat dissipation power, and convective heat transfer power calculated from the predicted parameters should balance, i.e., energy is conserved. The sum of the squares of the residuals between the prediction result and all the predefined constraints involved constitutes a comprehensive physical constraint loss term. This comprehensive physical constraint loss term is then weighted and fused with the mean square error.

[0070] For example, suppose in a pre-training batch, the physical coupling field submodule predicts the temperature of a certain component node to be 65℃, while the actual value is 63℃. The mean square error between the two is then calculated. Simultaneously, based on this predicted value, the predicted temperatures of its adjacent components, and wind speed, the comprehensive physical constraint loss term for that node is calculated according to Newton's law of cooling and Fourier's law of thermal conduction. Taking the thermal balance residual as an example, suppose the heat generation calculated from the predicted value is much greater than the heat dissipation. This phenomenon severely violates the law of energy conservation, and the thermal balance residual will generate a strong gradient, forcing its weights to be adjusted in the next iteration. This makes the prediction not only closer to the actual measurement but also more consistent with physical laws. Through extensive iterative training, the submodule learns to ensure the physical rationality of the data while improving prediction accuracy.
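
The combined loss described in paragraphs [0069] and [0070] can be sketched as follows. The weighting factor `lam`, the function names, and the predicted branch currents are hypothetical. Only the Kirchhoff current residual is shown, whereas a full implementation would also include the loop-voltage and energy-balance residuals.

```python
import numpy as np

def kcl_residual(currents_in, currents_out):
    """Kirchhoff's current law: inflow minus outflow should be zero."""
    return float(np.sum(currents_in) - np.sum(currents_out))

def physics_informed_loss(pred, target, residuals, lam=0.1):
    """Mean square error plus a weighted sum of squared physical residuals."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    mse = np.mean((pred - target) ** 2)
    physics = np.sum(residuals ** 2)
    return float(mse + lam * physics)

# Predicted 65 degC against a measured 63 degC, as in the example above.
r_kcl = kcl_residual([8.5], [8.3])      # hypothetical predicted currents (A)
loss = physics_informed_loss([65.0], [63.0], [r_kcl], lam=0.1)
```

Because the residual term is differentiable with respect to the predictions, a prediction that violates conservation contributes its own gradient even when it happens to match the measured value.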

[0071] Step S1-2: Construct and train the photovoltaic ecological agent submodule.

[0072] Specifically, a photovoltaic agent is constructed for each photovoltaic module in the photovoltaic cluster based on multi-agent reinforcement learning. The policy network of the photovoltaic agent takes the local state of the photovoltaic module as input, combines it with a preset reward function, and outputs the corresponding parameter adjustment strategy. A photovoltaic ecological agent submodule is constructed based on the agent cluster composed of photovoltaic agents.

[0073] Understandably, the state space of the photovoltaic agent includes the electrical and temperature states of its own module, local physical field information such as voltage gradient and temperature gradient, and neighbor state summaries. For example, the electrical and temperature states of a photovoltaic agent can be represented as [voltage: 31.2V, current: 8.5A, backsheet temperature: 68℃], and the local physical field information as [voltage gradient: (0.1V / m, -0.05V / m), temperature gradient: (2.5℃ / m, 0℃ / m)]. This information indicates that at the location of the agent, the voltage increases by 0.1V per meter in the east direction and decreases by 0.05V per meter in the north direction, while the temperature increases significantly by 2.5℃ per meter in the east direction and remains unchanged in the north direction, suggesting that there may be a high-temperature hotspot to its east. The agent obtains neighbor state summaries by communicating with neighboring modules, for example [neighbor average temperature: 65℃, neighbor average power change trend: -0.5%].

[0074] Action space refers to the control commands generated by an agent based on its state space. For example, a photovoltaic agent makes a decision based on its state space. Suppose the generated action is "to lower its own output power setpoint by 3%". If the system adopts continuous control, this action is a specific value, such as ΔP=-25W. If discrete levels are used, it may be selecting the -5% level from preset levels such as full power, -5%, -10%, and -20%. This action will be executed by the optimizer or inverter corresponding to the photovoltaic module, actually changing its power output.

[0075] The preset reward function includes at least field-following rewards, pattern maintenance rewards, strategy exploration rewards, and constraint safety rewards. Among them, the field-following reward is used to reward the consistency between the direction of the agent's generated action and the direction of the local physical field gradient, that is, to encourage the photovoltaic agent to conform to the guidance of the field. For example, the temperature gradient (2.5℃ / m, 0℃ / m) indicates that the temperature on the east side is high. In order to maintain the stable performance of the module, heat should be encouraged to diffuse to the west or heat generation on the east side should be reduced. Suppose that the field-following reward is quantified as a function of the dot product of the action and the field gradient direction. Suppose that the analysis of historical data shows that reducing power helps to cool down. In this case, cooling down can be defined as a positive action. When the agent's action is the positive action of reducing power, it will get a positive reward. Otherwise, it will get a negative reward.
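
The source describes the field-following reward only as a function of the dot product of the action and the field gradient direction. The sketch below assumes the simplest scalar reading: the action is a signed fractional power change, the desired response is a power reduction, and the reward scales with the temperature gradient magnitude. The function name and the scaling are hypothetical.

```python
import math

def field_following_reward(delta_p_frac, temp_gradient):
    """Assumed reward: positive when power is reduced near a hot spot.

    delta_p_frac  : signed fractional power change (e.g. -0.03 for -3%)
    temp_gradient : (east, north) temperature gradient in degC per meter
    """
    # Gradient magnitude; larger means a stronger nearby hot spot.
    strength = math.hypot(temp_gradient[0], temp_gradient[1])
    # Product of the action with the cooling direction (-1), scaled by strength.
    return -delta_p_frac * strength

r_cut = field_following_reward(-0.03, (2.5, 0.0))    # power reduced
r_boost = field_following_reward(+0.03, (2.5, 0.0))  # power increased
```

With the gradient from the example, cutting power yields a positive reward and raising power yields a negative reward of the same size.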

[0076] Pattern maintenance rewards are used to reward agents for behavior that is consistent with the behavior patterns of their group or neighbors. For example, suppose the photovoltaic cluster in the above example is divided by clustering into an aggressive intervention cluster and a conservative follower cluster, and that the consensus action of the conservative follower cluster is to reduce power slightly, by between 1% and 3%. Then, for agents in the conservative follower cluster, adjusting the power within this range is regarded as a positive action and is rewarded positively. Adjusting the power significantly outside the range is regarded as a negative action and is rewarded negatively.
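
A minimal sketch of the pattern maintenance reward for the follower-cluster example is given below. The reward magnitudes of +1 and -1 and the function name are assumptions, since the source states only the sign of the reward.

```python
def pattern_maintenance_reward(delta_p_pct, consensus=(-3.0, -1.0)):
    """Return +1 if the power change (in percent) lies inside the
    cluster's consensus range, otherwise -1 (assumed magnitudes)."""
    lo, hi = consensus
    return 1.0 if lo <= delta_p_pct <= hi else -1.0

in_range = pattern_maintenance_reward(-2.0)    # within the consensus range
out_range = pattern_maintenance_reward(-10.0)  # far outside the range
```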

[0077] The strategy exploration reward is used to incentivize agents to take actions different from the current best action in order to explore potentially better solutions, thereby preventing the system from getting stuck in local optima. For example, exploration is triggered with a very small probability. When it is triggered, a positive reward is given; otherwise this term is 0.

[0078] The constraint safety reward is used to penalize any action that may cause the system to violate preset hard physical safety constraints. For example, suppose the current temperature of photovoltaic agent A1 is 68°C and the preset safety limit is 85°C. If the agent performs an action that greatly increases power, causing its temperature to rise to 86°C at the next moment and exceed the safety limit, a penalty that is very large relative to the other three reward items is applied. That is, if the sum of the other three rewards is 10, the constraint safety reward for this action may be set to -100, making the total reward for this action so low that it is almost never selected, thus ensuring that all emerging group behaviors stay within the absolute safety range.

[0079] It should be noted that for the four reward items mentioned above, each photovoltaic agent maintains a weight vector based on its historical operating data. This vector is used to weight the three items: field following reward, pattern maintenance reward, and strategy exploration reward. For example, a photovoltaic agent located at the edge of a hot spot may evolve its weight vector to w=[0.7, 0.2, 0.1] in long-term evolutionary learning. This weight vector means that the current agent is biased towards the field following reward, that is, it is highly sensitive to temperature gradients and prioritizes temperature response actions such as cooling. A photovoltaic agent located in the center of the array with a stable operating environment may evolve its weight to w=[0.2, 0.7, 0.1], indicating that it values the pattern maintenance reward more and tends to keep its behavior synchronized with its neighbors to maintain local stability. Alternatively, the weight vector of a small number of agents may be set to w=[0.4, 0.4, 0.2], making them explorers responsible for trying out new behavioral patterns. As for the constraint safety reward, since it is a fixed-size maximum penalty value, the penalty is only assigned when the corresponding condition is triggered, so no weighting is required.
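
How the weight vector and the fixed safety penalty combine can be sketched as follows. The weights, the 85°C limit, and the -100 penalty come from the examples above, while the function name and the individual reward values are hypothetical.

```python
import numpy as np

SAFETY_PENALTY = -100.0   # fixed penalty from the example in the text
TEMP_LIMIT_C = 85.0       # preset safety limit from the example in the text

def total_reward(weights, field_r, pattern_r, explore_r, next_temp_c):
    """Weighted sum of the three soft rewards plus an unweighted penalty
    that applies only when the safety limit is exceeded."""
    weighted = float(np.dot(weights, [field_r, pattern_r, explore_r]))
    penalty = SAFETY_PENALTY if next_temp_c > TEMP_LIMIT_C else 0.0
    return weighted + penalty

w_hotspot = [0.7, 0.2, 0.1]   # agent at the edge of a hot spot
safe = total_reward(w_hotspot, 1.0, 0.5, 0.0, next_temp_c=70.0)
unsafe = total_reward(w_hotspot, 1.0, 0.5, 0.0, next_temp_c=86.0)
```

Keeping the penalty outside the weighted sum means that no learned weight vector can shrink it, which matches the statement that the safety term needs no weighting.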

[0080] Furthermore, based on the pre-trained physical coupling field submodule, the training dataset is run in a preset digital twin environment to simulate different photovoltaic cluster operation scenarios. With maximization of the cumulative reward as the training objective, a multi-agent proximal policy optimization algorithm is used to train the photovoltaic ecological agent submodule.

[0081] It can be understood that the preset digital twin environment is jointly composed of the physical coupling field submodule and the photovoltaic ecological agent submodule. The physical coupling field submodule simulates the spatiotemporal evolution of electrical and thermal fields, while the photovoltaic ecological agent submodule maps each photovoltaic module to an intelligent agent with needs, simulating its autonomous decision-making and group interaction through reinforcement learning. By synchronizing the real-time photovoltaic operation dataset with the corresponding state of the physical coupling field submodule, a dynamically evolving virtual environment is formed. In this environment, various counterfactual disturbances, such as extreme weather and simulated faults, can be injected so that the future trajectories of control strategies can be simulated and tested in parallel.

[0082] In one possible embodiment, the training process of the photovoltaic ecological agent submodule is carried out in a virtual environment provided by the physical coupling field submodule. The parameters of the agent's policy network and value network are randomly initialized. During the training process, the agent explores the scenarios corresponding to the training dataset, such as different irradiance distributions, initial temperatures, and grid commands. In each training cycle, the agent interacts with the environment according to the current policy to generate a state-action-reward sequence. The agent analyzes the state-action-reward sequence using the proximal policy optimization (PPO) algorithm to generate the corresponding policy gradient and updates the policy based on that gradient.

[0083] Furthermore, the global physical coupling field distribution output by the physical coupling field submodule is used as the state space parameter of the photovoltaic agent. After the photovoltaic cluster executes the parameter adjustment strategy output by the photovoltaic agent, it updates the parameters of the physical coupling field submodule based on the latest photovoltaic cluster state.

[0084] For example, as shown in Figure 2, a 2×2 small photovoltaic array contains photovoltaic modules numbered A1, A2, B1, and B2. Assume that at time t0 the module temperatures are A1: 75℃, A2: 65℃, B1: 63℃, and B2: 64℃, and that the preset safety threshold is 70℃, so a hot spot risk is present at A1. East is the positive x-direction, south is the positive y-direction, and the spacing between modules is 1 meter. The temperature difference between A1 and A2 is therefore T_A2 - T_A1 = 65 - 75 = -10℃, and the x-direction gradient component at A1 is -10 / 1 = -10℃ / m. Similarly, the y-direction gradient component at A1 is (63 - 75) / 1 = -12℃ / m. The gradient vector at A1 can thus be represented as (-10, -12)℃ / m. Since both components are negative, the gradient direction points to the third quadrant; that is, the temperature decreases fastest when moving southeast from A1. Similarly, the gradient vectors corresponding to A2, B1, and B2 are (10, -1)℃ / m, (-2, 8)℃ / m, and (5, 6)℃ / m, respectively. The photovoltaic agent corresponding to each module generates a strategy based on this gradient vector field information. Assume the final strategy action is {A1: -8%, A2: -3%, B1: 0%, B2: +1%}; that is, A1 reduces power by 8%, A2 reduces power by 3%, B1 remains unchanged, and B2 slightly increases power by 1%. After this strategy is executed, the heat generation of A1 decreases and its temperature begins to drop. The physical coupling field submodule recalculates the gradient vector field based on the new state parameters and iterates through multiple time steps until the array reaches a new equilibrium state (this is only an example of the general operation process of the photovoltaic ecological monitoring model; the specifics need to be determined based on actual conditions).
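
The gradient calculation for A1 in the example above can be reproduced with a short Python sketch. It assumes, as the example implies, that A2 lies east of A1 and B1 lies south of A1, and it uses a simple one-sided difference; the function name `forward_gradient` is hypothetical, and only the A1 case is covered.

```python
# Module temperatures at time t0, in degrees Celsius, from the 2x2 example.
temps = {"A1": 75.0, "A2": 65.0, "B1": 63.0, "B2": 64.0}
spacing_m = 1.0  # spacing between modules, in meters


def forward_gradient(center, east_neighbor, south_neighbor, temps, spacing):
    """One-sided temperature gradient (east = +x, south = +y), in degC per meter."""
    gx = (temps[east_neighbor] - temps[center]) / spacing
    gy = (temps[south_neighbor] - temps[center]) / spacing
    return gx, gy


# Assumed layout: A2 is east of A1, B1 is south of A1.
grad_a1 = forward_gradient("A1", "A2", "B1", temps, spacing_m)
print(grad_a1)  # (-10.0, -12.0)
```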

[0085] Step S2: Based on the photovoltaic ecological monitoring model, conduct a risk assessment on the real-time photovoltaic operation dataset and generate risk assessment results.

[0086] Specifically, after the real-time photovoltaic operation dataset is input into the photovoltaic ecological monitoring model, the physical coupling field submodule extracts field features from the real-time photovoltaic operation dataset to generate a physical coupling field feature vector. The physical coupling field feature vector contains at least global field features and local anomaly features.

[0087] The photovoltaic ecological agent submodule analyzes the real-time photovoltaic operation dataset and generates an ecological agent feature vector, which includes at least a strategy consistency coefficient and a strategy preference distribution.

[0088] In one possible embodiment, the extended Kalman filter algorithm is used to fuse the data within the real-time photovoltaic operation dataset with the predicted state within the physical coupling field submodule, and output a corrected optimal state estimate that is closest to the real physical state. This estimate includes the calibrated voltage and temperature values of each component and the smooth distribution of the entire physical field. At the same time, it generates the local observation vector required for the decision-making of each photovoltaic agent in the photovoltaic ecological agent submodule.

[0089] For example, suppose at a certain moment, the temperature sensor of module A5, located at the edge of the photovoltaic array, reports a sudden abnormal value of 72℃ due to temporary interference. However, the temperature predicted by the physical coupling field submodule is 65℃. Analysis of historical data shows that the noise variance of this sensor is relatively large. At this time, a Kalman filter is applied for filtering, and the current observation is given a lower weight. Suppose the calculated optimal estimate is 66.5℃. Based on the corrected global temperature distribution, the temperature gradient at the location of module A5 is calculated to be [1.2, -0.3]℃ / m. This gradient means that the temperature is higher to the east and slightly lower to the north. Combining its own voltage and current data with the state summaries of its three neighboring modules C7, C9, and D8, the local observation vector corresponding to A5 is obtained as [Voltage: 31.1V, Current: 8.2A, Temperature: 66.5℃, Temperature gradient (1.2, -0.3), Average power of neighbors: 255W].
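
The weighting of prediction against observation in this example can be illustrated with a scalar Kalman update, which is a simplified stand-in for the extended Kalman filter named in the embodiment. The variances 3.0 and 11.0 are hypothetical values chosen only so that the result matches the 66.5℃ estimate of the example; the function name `kalman_update` is likewise an assumption.

```python
def kalman_update(predicted, predicted_var, measured, measurement_var):
    """Fuse a model prediction with a noisy measurement (scalar Kalman update)."""
    # A noisy sensor (large measurement_var) yields a small gain,
    # so the observation receives a lower weight.
    gain = predicted_var / (predicted_var + measurement_var)
    estimate = predicted + gain * (measured - predicted)
    estimate_var = (1.0 - gain) * predicted_var
    return estimate, estimate_var, gain


# Field model predicts 65 degC; the sensor of module A5 reports 72 degC.
estimate, estimate_var, gain = kalman_update(
    predicted=65.0, predicted_var=3.0,     # hypothetical prediction variance
    measured=72.0, measurement_var=11.0,   # hypothetical sensor noise variance
)
```

With these assumed variances the gain is 3 / 14, so the estimate moves only 1.5℃ from the prediction toward the abnormal reading.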

[0090] Global field characteristic indicators are collected, including at least the voltage synchronization sequence parameter, the temperature field standard deviation, and the field energy spatial gradient variance. Based on the Kuramoto model, the degree of synchronization of the voltage phase angles of all nodes is calculated to generate the voltage synchronization sequence parameter (the order parameter of the Kuramoto model); the closer its value is to 1, the more electrically stable the system is. The temperature field standard deviation and the field energy spatial gradient variance are obtained at the same time. The temperature field standard deviation reflects the uniformity of the temperature distribution in the photovoltaic cluster, while the field energy spatial gradient variance measures how sharply the field distribution changes. Local anomaly characteristic indicators are obtained by calculating the deviation of the field strength of each photovoltaic module from its historical average for the same period and analyzing whether its gradient direction conflicts with the overall field structure. A module whose deviation or gradient direction fails to satisfy the preset conditions is judged to be anomalous, and its deviation and gradient direction are used as local anomaly characteristics.

[0091] For example, for a photovoltaic array with 1000 nodes, the corrected voltage phase angles of all nodes are first read. Assuming these phase angles fluctuate slightly within [-0.1, 0.1] radians, and the corresponding synchronization sequence parameter is 0.98, this value indicates that the voltage synchronization of the array is excellent and the electrical stability is good. At the same time, the standard deviation of the temperature of all components is calculated to be 4.2℃. Subsequently, the temperature of each component in the array is scanned, and it is found that the temperature of component E12 is 78℃, while its historical average temperature at the same time in the past 30 days is 62℃, with a deviation as high as 16℃, far exceeding the preset threshold of 5℃. Moreover, its local temperature gradient direction is chaotic and obviously inconsistent with the smooth temperature field structure of the surrounding area. Therefore, E12 is marked as a temperature anomaly point, and its corresponding local anomaly characteristics provide a data basis for the subsequent decision-making of the photovoltaic ecological agent sub-module.
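
As a hedged illustration of the synchronization measure, the sketch below computes the standard Kuramoto order parameter, r = |(1/N) Σ exp(iθ_j)|, which is what the voltage synchronization sequence parameter is taken to be here. The function name `order_parameter` and the sample phase angles are assumptions; no attempt is made to reproduce the value 0.98 assumed in the example.

```python
import cmath
import math


def order_parameter(phases):
    """Kuramoto order parameter: magnitude of the mean unit phasor, in [0, 1]."""
    mean_vector = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(mean_vector)


# Hypothetical phase angles, in radians.
aligned = [0.05, -0.02, 0.01, 0.03]   # nearly synchronized nodes
opposed = [0.0, math.pi]              # two nodes in anti-phase

r_aligned = order_parameter(aligned)  # close to 1
r_opposed = order_parameter(opposed)  # close to 0
```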

[0092] Within the photovoltaic ecological agent submodule, each photovoltaic agent generates its theoretically optimal action in the current state based on its corresponding local observation vector. This theoretically optimal action is then paired with the actual action executed by the photovoltaic module to form a two-element tuple. The deviation within this tuple is calculated to obtain the corresponding policy consistency coefficient. For example, for photovoltaic agent A5, its policy network calculates a theoretically optimal action of "reducing power by 15%" based on the current temperature gradient. However, since the inverter to which this module belongs has reached its minimum power limit, its actual action is only "reducing by 7%". Assuming the action range is [-20%, +20%], a span of 40%, its policy consistency coefficient is 1 - (|-15 - (-7)| / 40) = 0.8. This means that the agent's execution is compromised. Assuming the mean policy consistency coefficient of all agents in its array is 0.92 and the standard deviation is 0.08, the array as a whole executes the policy well, but a few individuals have low consistency. Assuming that in this example components that deviate from the mean by 10% are considered outliers, the location and identity of the outliers are recorded for subsequent diagnostic consideration.
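
The consistency calculation for agent A5 can be written as a one-line function. The sketch below assumes actions are expressed as power adjustments in percent; the function name `policy_consistency` is hypothetical.

```python
def policy_consistency(optimal_pct, actual_pct, action_min=-20.0, action_max=20.0):
    """1 minus the gap between optimal and actual action, normalized by the action span."""
    span = action_max - action_min
    return 1.0 - abs(optimal_pct - actual_pct) / span


# Agent A5: the policy asks for -15%, the inverter limit allows only -7%.
a5 = policy_consistency(optimal_pct=-15.0, actual_pct=-7.0)
print(a5)  # 0.8
```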

[0093] A spectral clustering algorithm is used to perform cluster analysis on all binary pairs to generate corresponding policy preference distributions. Specifically, contextual features such as the magnitude of the theoretically optimal action, the magnitude of the actual action, the direction of the action, and the volatility of the agent's historical decisions are obtained for each binary pair. Based on these contextual features, the spectral clustering algorithm is used to divide all binary pairs into K clusters, where each cluster represents a group of photovoltaic agents with similar decision-making patterns. At the same time, the proportion of photovoltaic agents in each cluster relative to the total number is calculated to form a policy preference distribution P=[p_1, p_2, ..., p_K], where p_k represents the proportion of agents in the k-th cluster.

[0094] For example, suppose spectral clustering is performed on 1000 agents with the number of clusters set to 4, and the following four clusters are obtained. Cluster 1, Precise Follower: the theoretically optimal action is highly consistent with the actual action executed by the photovoltaic module, differing by less than 5%, and the action direction is highly correlated with the local field gradient direction. Its strategy preference is to respond strictly and precisely to changes in the field; this cluster accounts for 45% of the agents. Cluster 2, Mild Coordinator: the theoretically optimal action and the actual action differ by 5% to 10%, the actual action amplitude is about half of the theoretical value, and the action direction sometimes deviates from the field gradient because of the state of neighbors. Its strategy preference is to balance field guidance against the local potential field; this cluster accounts for 30%. Cluster 3, Adaptive Explorer: the theoretically optimal action and the actual action differ by more than 10%, the match between them is low and fluctuates greatly, and historical data shows that the agent occasionally takes exploratory actions completely opposite to the theoretically optimal action. Its strategy preference is to take certain risks in order to adapt to unknown situations; this cluster accounts for 20%. Cluster 4, Communication-Restricted: the consistency between the theoretically optimal action and the actual action is extremely low, the action is not significantly related to the current field gradient or the neighbor state, and the communication latency of the corresponding physical components is high; this cluster accounts for 5%. The current strategy preference distribution is therefore {Precise Follower: 45%, Mild Coordinator: 30%, Adaptive Explorer: 20%, Communication-Restricted: 5%}.
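
Once each agent has a cluster label, the policy preference distribution P is simply the share of agents per cluster. The following sketch shows only that last step; the clustering itself is not reproduced, and the labels are assumed to come from a spectral clustering stage. The function name `preference_distribution` and the label strings are illustrative.

```python
from collections import Counter


def preference_distribution(cluster_labels):
    """Share of agents in each cluster, i.e. P = [p_1, ..., p_K] keyed by label."""
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return {label: counts[label] / total for label in sorted(counts)}


# Hypothetical label assignment for 1000 agents, matching the example shares.
labels = (["precise_follower"] * 450 + ["mild_coordinator"] * 300
          + ["adaptive_explorer"] * 200 + ["communication_restricted"] * 50)
distribution = preference_distribution(labels)
```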

[0095] Furthermore, risk prediction is performed based on the physical coupling field feature vector and the ecological agent feature vector to obtain short-term risk probability, phase transition early warning information and abnormal area distribution. The physical coupling field feature vector, ecological agent feature vector, short-term risk probability, phase transition early warning information and abnormal area distribution are then encapsulated into a risk assessment result and output.

[0096] In one possible embodiment, risk prediction includes short-term risk prediction and phase transition risk prediction. The short-term risk prediction is expressed as follows: using the optimal state estimate as the initial condition, combined with the photovoltaic ecological monitoring model, a forward simulation is performed. During the simulation, a large number of possible future trajectories are generated. In each trajectory, a random disturbance sequence is superimposed based on the historical probability distribution of environmental parameters such as irradiance and wind speed. Subsequently, the proportion of these future trajectories that violate preset key safety constraints, such as any component temperature exceeding 85°C or any node voltage exceeding the limit, is statistically analyzed. This proportion is the short-term risk probability at different future time points. For example, 200 parallel simulations are initiated for the state of high temperature detected in component A5 to simulate the development trajectory in the next 5 minutes. In 100 simulations, the temperature of component A5 rises to 86°C within 3 minutes and triggers an alarm. In another 30 simulations, the high temperature spreads and causes the two adjacent components to also exceed 80°C. Then, the risk probability of "component-level thermal failure" in the next 3 minutes is 100 / 200=50%, and the risk probability of "local thermal runaway spread" is 30 / 200=15%. Finally, all risk probabilities are mapped into a risk probability matrix.
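
The short-term risk probability is a Monte Carlo proportion: the share of simulated trajectories that violate a safety constraint. The sketch below illustrates that counting step. The random-walk simulator is a toy stand-in for the forward simulation of the photovoltaic ecological monitoring model, and its drift and noise parameters, like all function names, are hypothetical.

```python
import random


def simulate_peak_temperature(initial_c, steps, rng, drift_c=0.5, noise_c=1.0):
    """Toy random-walk stand-in for one forward simulation; returns the peak temperature."""
    temp = peak = initial_c
    for _ in range(steps):
        # Random disturbance superimposed on a deterministic drift.
        temp += drift_c + rng.gauss(0.0, noise_c)
        peak = max(peak, temp)
    return peak


def risk_probability(peaks, limit_c=85.0):
    """Proportion of trajectories whose peak temperature exceeds the safety limit."""
    return sum(1 for p in peaks if p > limit_c) / len(peaks)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
peaks = [simulate_peak_temperature(80.0, steps=6, rng=rng) for _ in range(200)]
p_thermal = risk_probability(peaks)
```

If 100 of 200 trajectories exceed the limit, `risk_probability` returns 0.5, which corresponds to the 100 / 200 = 50% of the example.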

[0097] Phase transition risk prediction is performed by using a sliding time window to continuously monitor the time series of the voltage synchronization sequence parameter in the physical coupling field feature vector, from which phase transition warning information is obtained. For example, over the past 10 minutes the voltage synchronization sequence parameter has drifted slowly downward from 0.98 to 0.92. Analysis of its time series shows three signs. First, the variance of its fluctuations has increased by 3 times in the past 2 minutes. Second, after a simulated small disturbance, the time required for the parameter to recover to its original value has increased from 10 seconds to 50 seconds. Third, its autocorrelation time has increased significantly from 5 seconds to 25 seconds. At the same time, in the forward simulation, 40% of the trajectories show the system sliding into a low-synchronization state in which the voltage synchronization sequence parameter stabilizes at around 0.7. Based on this evidence, it can be determined that the system is currently approaching the phase transition critical point of "loss of voltage synchronization". Then, {phase transition warning: voltage instability, level: emergency, prediction information: it is estimated that a phase transition may occur within 5-8 minutes without intervention} can be generated.
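
Two of the warning signs above, rising variance and rising autocorrelation, can be tracked with simple sliding-window statistics. The sketch below is one possible way to compute them, using lag-1 autocorrelation as a proxy for the autocorrelation time; the function names, the window length, and the sample series are all assumptions.

```python
from statistics import mean, pvariance


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation; values near 1 suggest slow recovery from disturbances."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series carries no fluctuation information
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    return num / denom


def variance_ratio(series, window):
    """Variance of the latest window divided by the variance of the window before it."""
    recent = series[-window:]
    earlier = series[-2 * window:-window]
    base = pvariance(earlier)
    return float("inf") if base == 0 else pvariance(recent) / base


# Hypothetical samples of the voltage synchronization sequence parameter.
samples = [0.980, 0.979, 0.981, 0.980, 0.975, 0.960, 0.945, 0.920]
growth = variance_ratio(samples, window=4)
memory = lag1_autocorrelation(samples)
```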

[0098] It should be noted that when performing the aforementioned short-term risk prediction and phase transition risk prediction, whenever a risk anomaly is detected, the corresponding abnormal region will be located. After the risk prediction is completed, all located abnormal regions will be mapped into an abnormal region distribution vector to assist in the generation of subsequent strategies.

[0099] Step S3: Based on the risk assessment results, conduct strategy exploration and counterfactual reasoning to generate the corresponding optimal control strategy.

[0100] In this embodiment, step S3 includes:

[0101] Step S3-1: Generate a set of candidate regulatory strategies.

[0102] Specifically, a demand analysis is performed on the risk assessment results to obtain management and control demand data. A dynamic strategy space is constructed based on the management and control demand data. Latin hypercube sampling is used to randomly sample within the dynamic strategy space to generate an original control strategy set. Simulation is performed on all original control strategies within the original control strategy set to obtain the corresponding expected control effect.
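
Latin hypercube sampling divides each parameter range into as many equal strata as there are samples and draws exactly one point from each stratum per dimension. A minimal pure-Python sketch follows; the two parameter ranges (a field intensity and a radius) and the function name `latin_hypercube` are hypothetical and stand in for the dynamic strategy space.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Return n_samples points with one point per stratum in every dimension."""
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # decouple the stratum order between dimensions
        width = (high - low) / n_samples
        # One uniform draw inside each stratum.
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


rng = random.Random(42)
# Hypothetical strategy space: (field intensity, influence radius in meters).
bounds = [(-0.6, -0.1), (3.0, 8.0)]
samples = latin_hypercube(20, bounds, rng)
```

Compared with independent uniform sampling, this stratification spreads a small number of original control strategies more evenly over the strategy space.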

[0103] If the expected regulatory effect meets the preset screening conditions, the corresponding original regulatory strategy will be used as a candidate regulatory strategy.

[0104] If the expected regulatory effect does not meet the preset screening conditions, a multi-objective evolutionary algorithm is used to iteratively optimize the original regulatory strategy set until the number of strategies in the candidate regulatory strategy set reaches a preset threshold.

[0105] In one possible embodiment, multi-source information such as risk probability, phase transition warning level, and operational status deviation is extracted from the risk assessment results. This multi-source information and the system operation objective are transformed into a quantified multi-objective optimization problem. Priority is determined based on the phase transition warning level and the highest risk probability in the risk probability matrix. Specifically, when the phase transition warning level is emergency, risk suppression is forcibly set as the highest priority. When the warning level is below emergency and all probabilities in the risk probability matrix are below the threshold, economy and instruction tracking are the main objectives. It should be noted that each specific objective is quantified into a cost function with a clear physical meaning. The output value of each cost function is mapped to the [0, 1] interval through min-max normalization, and each cost function is assigned a dynamic weight, with all weights summing to 1. The total cost function is defined as the weighted sum of all cost functions, where the weight allocation is determined by the risk probability, warning level, and importance of external instructions.

[0106] For example, suppose the risk probability matrix in a risk assessment result shows that the trigger probabilities of thermal runaway at 1 minute, 3 minutes, and 5 minutes are [0.2, 0.5, 0.45] and the corresponding voltage instability risks are [0.05, 0.1, 0.15]; the phase transition warning type is "thermal runaway", the level is emergency, and the estimated critical time is 300 seconds; the abnormal area is the area corresponding to component E12, marked as a level-one high temperature anomaly; and the external instruction requires the total power to be increased by 3% within 10 minutes. Assume the condition for prioritizing risk suppression is that the warning level reaches emergency and the thermal runaway risk probability within 3 minutes is not less than 0.5. Since the risk probability matrix and the phase transition warning information meet this condition, the risk suppression target gets the highest priority. Based on the above information, the following cost functions are set: the probability that the temperature of any component in the high temperature anomaly area centered on E12 exceeds 85°C within the next 5 minutes is defined as the thermal risk suppression cost function; the probability that the system voltage synchronization sequence parameter falls below 0.85 within the next 5 minutes is defined as the voltage steady-state cost function; and the square of the relative error between the actual total power and the target power after 10 minutes is defined as the dispatch instruction tracking cost function. Assume the original weight ratio of the three cost functions is 4:4:3, with the thermal risk suppression weight taken as 0.4. Since the warning level in this example is emergency, a gain of 0.2 is applied to the weight of the thermal risk suppression cost function. Simultaneously, the risk probability of 0.5 is linearly mapped to an additional 0.05. Therefore, the weight coefficient of the thermal risk suppression cost function is 0.4 + 0.2 + 0.05 = 0.65. Since the risk suppression target is the highest priority, the weight coefficients of the voltage steady-state cost function and the dispatch instruction tracking cost function are proportionally reduced. Assuming each is reduced to half of its original weight coefficient (0.4 to 0.2 and 0.3 to 0.15), the adjusted weight ratio is 0.65:0.2:0.15. The total cost function in this example can be expressed as 0.65 * thermal risk suppression cost function + 0.2 * voltage steady-state cost function + 0.15 * dispatch instruction tracking cost function.
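
The weight adjustment in this example can be traced step by step in code. The sketch below assumes the original coefficients are 0.4, 0.4, and 0.3 (the 4:4:3 ratio with the thermal weight at 0.4, as the arithmetic of the example implies); the function names and the sample cost values are hypothetical.

```python
def emergency_weights(base, emergency_gain, risk_bonus, reduction=0.5):
    """Boost the thermal risk weight and scale down the other two weights."""
    thermal, voltage, tracking = base
    thermal = thermal + emergency_gain + risk_bonus
    return (thermal, voltage * reduction, tracking * reduction)


def total_cost(weights, costs):
    """Weighted sum of normalized cost function values, each in [0, 1]."""
    return sum(w * c for w, c in zip(weights, costs))


# Assumed original coefficients (0.4, 0.4, 0.3), emergency gain 0.2,
# and 0.05 mapped from the 3-minute thermal runaway probability of 0.5.
weights = emergency_weights((0.4, 0.4, 0.3), emergency_gain=0.2, risk_bonus=0.05)

# Hypothetical normalized cost values for one candidate strategy.
cost = total_cost(weights, costs=(0.12, 0.05, 0.01))
```

The resulting weights are approximately (0.65, 0.2, 0.15), which sum to 1 as the embodiment requires.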

[0107] In the policy space composed of field programming parameters and meme injection parameters, a policy that optimizes the output value of the total cost function is searched to generate a set of candidate control policies. The field programming parameters are represented as the correction to the global guidance field, which are represented by a set of radial basis functions, each containing the corresponding center coordinates, influence radius and intensity coefficient. The meme injection parameters are represented as temporary interventions on a specific subset of photovoltaic smart agents, such as adjusting their weight vectors or policy network parameters.
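
To make the field programming parameters concrete, the sketch below evaluates a correction to the global guidance field as a sum of radial basis functions, each described by center coordinates, influence radius, and intensity coefficient. The Gaussian kernel is an assumption, since the embodiment does not name a specific basis function, and the function name and coordinates are illustrative.

```python
import math


def field_correction(x, y, basis_functions):
    """Sum of Gaussian radial basis functions evaluated at point (x, y)."""
    total = 0.0
    for cx, cy, radius, intensity in basis_functions:
        dist_sq = (x - cx) ** 2 + (y - cy) ** 2
        # Gaussian kernel (assumed): full intensity at the center, decaying with distance.
        total += intensity * math.exp(-dist_sq / (2.0 * radius ** 2))
    return total


# Hypothetical correction: one cooling field of intensity -0.4 and radius 5 m
# centered at the origin (center_x, center_y, radius, intensity).
cooling_field = [(0.0, 0.0, 5.0, -0.4)]

at_center = field_correction(0.0, 0.0, cooling_field)
at_edge = field_correction(5.0, 0.0, cooling_field)
far_away = field_correction(50.0, 0.0, cooling_field)
```

Under this kernel the correction is strongest at the center and fades smoothly, so modules far from the hot spot are left almost unaffected.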

[0108] The candidate regulation strategy set generation process is divided into two stages: In the first stage, based on the current system state and regulation requirements, a basic strategy is generated in combination with preset rules. For example, in response to the risk of thermal runaway, the preset rules suggest placing a negative intensity temperature field in the high-temperature region for correction. At the same time, a meme that enhances the field following reward weight is injected into the agent in this region, and successful strategies in similar scenarios are retrieved from the historical case library as the basic strategy. In the second stage, starting from the basic strategy, local sampling and mutation are performed in the strategy space using Latin hypercube sampling to generate a batch of candidate regulation strategies. Each candidate regulation strategy consists of a specific set of field correction parameters and a set of meme parameters.

[0109] Taking the aforementioned thermal runaway risk case as an example, a basic strategy Q0 is first generated: a temperature field correction with an intensity of -0.4 and a radius of 5 meters is placed at the position of module E12, and the meme "temporarily increase the field following weight by 0.3" is injected into all Mild Coordinator photovoltaic agents within a radius of 8 meters centered on E12. Multiple variants are then generated through parameter mutation: Q1 increases the field intensity to -0.6 but reduces the radius to 3 meters; Q2 places a field with an intensity of -0.3 at E12 and additionally places an auxiliary field with an intensity of -0.2 to its east; Q3 keeps the field parameters of Q0 but changes the meme target to all Precise Followers; Q4 applies a global temperature field correction with an intensity of -0.1 that covers the entire field and injects no memes for specific agents. In this way, a set containing 20 candidate control strategies is finally generated.

[0110] Step S3-2: Generate the optimal control strategy.

[0111] Specifically, the candidate control strategy set is executed within a preset digital twin environment. A preset counterfactual perturbation generation mechanism is used to inject counterfactual perturbations into the execution process of the candidate control strategy set. The counterfactual perturbations include at least Monte Carlo perturbations based on historical distribution, extreme perturbations based on chaos theory, and adversarial perturbations based on adversarial networks.

[0112] Furthermore, the execution utility of the candidate control strategy set after injecting counterfactual perturbations is monitored to generate a first utility dataset. The first utility dataset is compared with an interference-free baseline to obtain a stability score for the candidate control strategy. Based on the stability score, the candidate control strategy set is screened to obtain the optimal control strategy.

[0113] In one possible embodiment, for each candidate control strategy, the field correction and meme injection corresponding to the strategy are first applied in the preset digital twin environment. The photovoltaic ecological monitoring model is then run for a multi-step forward simulation, for example simulating the next 10 minutes. Each candidate control strategy is simulated N times, generating N possible development trajectories. On each trajectory, the value of the total cost function under the strategy is calculated, and it is recorded whether any preset safety constraint is violated, such as voltage or temperature exceeding its limit. During the simulation, a hybrid perturbation generation method is adopted: 70% of the time, Monte Carlo random perturbations based on historical statistics are used to simulate possible irradiance and wind speed fluctuations; 20% of the time, extreme perturbation sequences generated from chaos theory, such as the Lorenz system, are used to simulate sudden weather changes; and 10% of the time, generative adversarial networks are used to generate adversarial perturbations against the current strategy. At the same time, as a benchmark for comparison, an additional set of interference-free baseline simulations is run, that is, simulations under the same N perturbation sequences and the same initial conditions but without any candidate strategy intervention, to obtain the corresponding interference-free baseline results.

[0114] After the simulation, the average performance, standard deviation, worst-case performance, and constraint violation probability of each term in the total cost function are calculated for each candidate control strategy. These four indicators form the first utility dataset. The average performance is the arithmetic mean of the N simulation results and reflects the expected effectiveness of the strategy. The standard deviation reflects the volatility of the strategy's effectiveness; the smaller the standard deviation, the more stable the strategy output. The worst-case performance reflects the strategy's risk resistance and is taken, for example, at the worst 5% of the simulation results. The constraint violation probability is the ratio of the number of simulations that violate any preset safety constraint to the total number of simulations.
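
The four indicators of the first utility dataset can be computed from the per-trajectory results as sketched below. The sketch assumes each simulation yields one cost value where larger is worse, so the worst-case performance is read from the top 5% of the sorted costs; this reading, the population standard deviation, and the function name `utility_metrics` are assumptions.

```python
from statistics import mean, pstdev


def utility_metrics(costs, violations, tail=0.05):
    """Average, standard deviation, worst-case value, and violation probability."""
    ordered = sorted(costs, reverse=True)  # largest (worst) cost first
    # Number of trajectories in the worst tail; at least one.
    k = max(1, round(tail * len(ordered)))
    return {
        "average": mean(costs),
        "std": pstdev(costs),
        "worst_case": ordered[k - 1],  # boundary of the worst 5% of results
        "violation_probability": sum(violations) / len(violations),
    }


# Hypothetical results from 100 simulations of one candidate strategy.
costs = [i / 100 for i in range(1, 101)]
violated = [False] * 95 + [True] * 5
metrics = utility_metrics(costs, violated)
```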

[0115] Meanwhile, by comparing the simulation results of each candidate control strategy with the interference-free baseline results under the corresponding perturbation sequence, the conditional average treatment effect of the candidate control strategy is obtained, that is, the difference between the two results. The stability score of the candidate control strategy is expressed as: candidate control strategy stability score = -[α * worst-case performance + (1 - α) * standard deviation], where α is a trade-off coefficient. In high-risk scenarios, α is close to 1; its default value is obtained through analysis of historical data.

[0116] For example, 150 simulations were performed on candidate strategy Q0. In 115 of the 120 Monte Carlo perturbation simulations, the temperature of hotspot E12 was successfully reduced to below 75°C within 3 minutes, with the average probability of thermal runaway decreasing to 0.1. However, in 8 of the 20 extreme chaotic perturbation simulations, the field correction failed due to drastic fluctuations in irradiance, and the hotspot temperature actually increased. In 3 of the 10 adversarial perturbation simulations, voltage instability occurred because the perturbation induced local resonance. Taking all simulation trajectories together, Q0 has an average performance of 0.12 in suppressing thermal runaway risk, which means an average thermal runaway probability of 12%, with a standard deviation of 0.08. Its worst-case performance is 0.35, which means a thermal runaway probability of 35% in the worst 5% of cases, and its constraint violation probability is 0.05. Compared with the corresponding baseline, its conditional average treatment effect is -0.25, which means an average reduction of 25 percentage points in the thermal runaway probability. With α = 0.8, the stability score of the candidate control strategy is -(0.8 * 0.35 + 0.2 * 0.08) = -0.296.

[0117] First, any candidate control strategy with a constraint violation probability exceeding a preset safety threshold is directly eliminated. Then, a comprehensive optimization score is generated based on the weighted sum of average performance, conditional average treatment effect, and candidate control strategy stability score. Specifically, the comprehensive optimization score is calculated as: Comprehensive Optimization Score = a * Average Performance + b * Conditional Average Treatment Effect + c * Candidate Control Strategy Stability Score. Here, a, b, and c are adaptive weights. In a performance-oriented mode, the values of a and b will increase; in a stability-oriented mode, the value of c will increase. Finally, the candidate control strategy with the highest comprehensive optimization score is selected as the optimal control strategy. For example, in the above example, Q0 handles an emergency situation, where stability should be prioritized, therefore the value of c will increase. Assuming a:b:c = 3:3:4, the comprehensive optimization score for Q0 is 0.3 * 0.12 + 0.3 * (-0.25) + 0.4 * (-0.296) = -0.1574.
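
The scoring and screening steps can be checked numerically with the Q0 figures given in the text. The sketch below implements the two formulas as written and adds a small selection helper; the function names and the dictionary layout of the candidate metrics are hypothetical.

```python
def stability_score(worst_case, std, alpha):
    """Stability score = -[alpha * worst case + (1 - alpha) * standard deviation]."""
    return -(alpha * worst_case + (1.0 - alpha) * std)


def comprehensive_score(average, cate, stability, a, b, c):
    """Weighted sum of average performance, treatment effect, and stability score."""
    return a * average + b * cate + c * stability


def select_optimal(candidates, max_violation, a, b, c):
    """Drop candidates above the violation threshold, then return the best-scoring name."""
    eligible = {
        name: comprehensive_score(m["average"], m["cate"], m["stability"], a, b, c)
        for name, m in candidates.items()
        if m["violation_probability"] <= max_violation
    }
    return max(eligible, key=eligible.get) if eligible else None


# Figures for Q0 from the text: alpha = 0.8 and a:b:c = 3:3:4.
q0_stability = stability_score(worst_case=0.35, std=0.08, alpha=0.8)
q0_score = comprehensive_score(0.12, -0.25, q0_stability, a=0.3, b=0.3, c=0.4)
print(round(q0_stability, 4), round(q0_score, 4))  # -0.296 -0.1574
```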

[0118] Step S4: Execute the optimal control strategy and perform feedback optimization based on the feedback data corresponding to the optimal control strategy.

[0119] Specifically, the system obtains feedback data after the optimal control strategy is executed. The feedback data includes at least the initial photovoltaic operating state, the optimal control strategy adopted, the actual system response, and the final photovoltaic operating state. An incremental learning package is generated based on the feedback data. The photovoltaic ecological monitoring model is incrementally updated based on the incremental learning package. The corresponding parameters in the preset strategy space are updated based on the feedback data.

[0120] For example, one afternoon, an abnormal temperature was detected in component A12, with an initial temperature of 82°C. Assume the optimal control strategy is to apply a cooling field with a strength of -0.4 and a radius of 3 meters to the coordinates of A12, and inject a meme with an "enhanced field following weight of 0.2" into the surrounding 5 agents. After the strategy is executed, the actual response is as follows: the temperature of A12 drops to 76°C within 3 minutes, but the voltage of adjacent branches experiences a brief fluctuation of about 1.5%. After 10 minutes, the hotspot temperature stabilizes at 74°C, the voltage fluctuation subsides, and the total power is 0.8% lower than the expected target. The system packages the data from the entire process to generate a structured incremental learning package. This incremental learning package contains: a snapshot of the original state, parameters of the executed strategy, time-series data of the actual response, a snapshot of the final state, and corresponding performance deviation analysis, such as "voltage fluctuation amplitude exceeds expectations" and "power tracking has a slight deviation." This incremental learning package is then marked as a "thermal suppression-voltage disturbance" related case.
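One possible layout for the incremental learning package in this example is sketched below. The field values are taken from the example; the class name `IncrementalLearningPackage`, the field names, and the dictionary keys are hypothetical, since the text lists the package contents but does not define a data format.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalLearningPackage:
    initial_state: dict      # snapshot of the original state
    strategy_params: dict    # parameters of the executed strategy
    response_series: list    # time-series data of the actual response
    final_state: dict        # snapshot of the final state
    deviations: list = field(default_factory=list)  # performance deviation analysis
    case_tag: str = ""       # label used to index related cases


package = IncrementalLearningPackage(
    initial_state={"component": "A12", "temperature_c": 82},
    strategy_params={
        "cooling_field": {"target": "A12", "strength": -0.4, "radius_m": 3},
        "meme": {"enhanced_field_following_weight": 0.2, "agents": 5},
    },
    response_series=[
        {"minute": 3, "temperature_c": 76, "voltage_fluctuation_pct": 1.5},
        {"minute": 10, "temperature_c": 74},
    ],
    final_state={"temperature_c": 74, "power_vs_target_pct": -0.8},
    deviations=[
        "voltage fluctuation amplitude exceeds expectations",
        "power tracking has a slight deviation",
    ],
    case_tag="thermal suppression-voltage disturbance",
)
print(package.case_tag, len(package.response_series))
```

Keeping the initial state, strategy, response, and final state in one record lets the model update and the strategy space update draw on the same case.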

[0121] The incremental learning package is first used to update the parameters of the physical coupling field submodule in the photovoltaic ecological monitoring model. Incremental training reveals that, in the region near A12, the original thermal-electric coupling parameters underestimated the sensitivity of local voltage to rapid cooling. Through backpropagation, the weight parameters corresponding to "thermal disturbance on voltage" in the graph neural network for this region are fine-tuned, so that the model can better predict such side effects in the future. At the same time, the photovoltaic ecological agent submodule updates its internal rules based on this feedback. For example, when a strong cooling field strategy is generated for similar coordinates, a compensatory, slight "voltage support field" is generated as the default option.

[0122] Figure 3 is a schematic diagram of a photovoltaic operation status management and control system according to the present invention.

[0123] Specifically, a photovoltaic operation status management and control system includes:

[0124] The model construction module is used to build and train the photovoltaic ecological monitoring model, which includes a physical coupling field submodule and a photovoltaic ecological agent submodule.

[0125] The risk assessment module is used to perform risk assessment on the real-time photovoltaic operation dataset based on the photovoltaic ecological monitoring model and generate risk assessment results.

[0126] The strategy generation module is used to explore strategies and perform counterfactual reasoning based on the risk assessment results to generate the corresponding optimal control strategy.

[0127] The feedback optimization module is used to obtain feedback data after executing the optimal control strategy, and to perform feedback optimization on the corresponding modules of the system based on the feedback data.
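How the risk assessment, strategy generation, and feedback optimization modules hand data to one another at run time can be sketched as a single control cycle. This is only an illustrative skeleton: the interface names, method names, and the `execute` callable are assumptions, and the model construction module is left out because it runs before the cycle starts.

```python
from typing import Any, Callable, Protocol


class RiskAssessor(Protocol):
    def assess(self, realtime_data: Any) -> Any: ...


class StrategyGenerator(Protocol):
    def generate(self, risk_result: Any) -> Any: ...


class FeedbackOptimizer(Protocol):
    def update(self, feedback: dict) -> None: ...


def run_control_cycle(realtime_data: Any,
                      assessor: RiskAssessor,
                      generator: StrategyGenerator,
                      execute: Callable[[Any], tuple],
                      optimizer: FeedbackOptimizer) -> dict:
    """Run one pass: assess risk, generate a strategy, execute it, feed back."""
    risk_result = assessor.assess(realtime_data)
    strategy = generator.generate(risk_result)
    # `execute` is assumed to return (actual response, final operating state).
    response, final_state = execute(strategy)
    feedback = {
        "initial_state": realtime_data,
        "strategy": strategy,
        "response": response,
        "final_state": final_state,
    }
    optimizer.update(feedback)
    return feedback
```

The feedback dictionary carries the four items the feedback data must contain at minimum, so the optimizer can build its update from one record.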

[0128] This embodiment provides an electronic device, which may include: at least one processor, at least one network interface, a user interface, a memory, and at least one communication bus.

[0129] The following is a detailed introduction to the various components of the electronic device:

[0130] The communication bus can be used to enable communication between the various components mentioned above.

[0131] The user interface may include buttons and, optionally, standard wired and wireless interfaces.

[0132] The network interface may include, but is not limited to, Bluetooth modules, NFC modules, Wi-Fi modules, etc.

[0133] The processor may include one or more processing cores. It connects the various parts of the electronic device via interfaces and lines, and performs various functions and processes data by executing instructions, programs, code sets, or instruction sets stored in the memory and by accessing data stored in the memory. Optionally, the processor can be implemented in at least one hardware form among a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), and a Programmable Logic Array (PLA). The processor may integrate one or a combination of a CPU, a GPU, and a modem. The CPU primarily handles the operating system, user interface, and applications; the GPU is responsible for rendering and drawing the content displayed on the screen; and the modem handles wireless communication. It is understood that the modem may also be implemented as a separate chip without being integrated into the processor.

[0134] The memory may include RAM or ROM. Optionally, the memory may include a non-transitory computer-readable medium. The memory may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (e.g., touch function, sound playback function, image playback function, etc.), instructions for implementing the various method embodiments described above, etc.; the data storage area may store data involved in the various method embodiments described above, etc. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor. The memory, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an evaluation application. The processor may be used to call the evaluation application stored in the memory and execute the method steps mentioned in the foregoing embodiments.

[0135] It should be noted that the above formulas are all dimensionless calculations. The formulas are obtained through software simulation of a large amount of collected data so as to approximate real-world conditions as closely as possible. The preset parameters in the formulas are set by those skilled in the art according to the actual situation.

[0136] The above embodiments can be implemented, in whole or in part, through software, hardware (such as circuits), firmware, or any other combination thereof.

[0137] When implemented using software, the above embodiments can be implemented in whole or in part as a computer program product, which includes one or more computer instructions or computer programs; when the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are generated in whole or in part.

[0138] It is understood that the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device; the computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via transmission methods such as infrared, wireless, or microwave; the computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server or data center that includes one or more sets of available media. The available medium can be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. A semiconductor medium can be a solid-state drive.

[0139] It should be understood that the term "and / or" in this document merely describes the relationship between related objects and indicates that three relationships can exist. For example, A and / or B can represent three cases: A alone, both A and B, and B alone. A and B can each be singular or plural. Additionally, the character " / " in this document generally indicates an "or" relationship between the preceding and following related objects, but it can also represent an "and / or" relationship, depending on the context.

[0140] It should be understood that, in the embodiments of the present invention, the order of the above-mentioned process numbers does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.

[0141] The above-described embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

1. A photovoltaic operation status management and control method, characterized in that it includes the following steps: constructing and training a photovoltaic ecological monitoring model that includes a physical coupling field submodule and a photovoltaic ecological agent submodule; performing risk assessment on a real-time photovoltaic operation dataset based on the photovoltaic ecological monitoring model to generate risk assessment results; performing strategy exploration and counterfactual reasoning based on the risk assessment results to generate the corresponding optimal control strategy; and executing the optimal control strategy and performing feedback optimization based on the feedback data corresponding to the optimal control strategy.

2. The photovoltaic operation status management and control method according to claim 1, characterized in that constructing and training a photovoltaic ecological monitoring model that includes a physical coupling field submodule and a photovoltaic ecological agent submodule includes: using a graph attention network to simulate the current transmission process of a photovoltaic cluster and constructing a photovoltaic electrical network; using a graph convolutional network to simulate the thermal diffusion phenomenon of the photovoltaic cluster and constructing a photovoltaic thermal network; constructing the physical coupling field submodule based on the photovoltaic electrical network and the photovoltaic thermal network, wherein the physical coupling field submodule is used to output the global physical coupling field distribution of the photovoltaic cluster; constructing a photovoltaic agent for each photovoltaic module in the photovoltaic cluster based on multi-agent reinforcement learning, wherein the strategy network of the photovoltaic agent takes the local state of the photovoltaic module as input, combines it with a preset reward function, and outputs the corresponding parameter adjustment strategy; constructing the photovoltaic ecological agent submodule based on an agent cluster composed of the photovoltaic agents; using the global physical coupling field distribution output by the physical coupling field submodule as a state space parameter of the photovoltaic agents; and, after the photovoltaic cluster executes the parameter adjustment strategy output by the photovoltaic agents, updating the parameters of the physical coupling field submodule based on the latest photovoltaic cluster state.

3. The photovoltaic operation status management and control method according to claim 2, characterized in that the method further includes: using historical operating data of photovoltaic clusters as the training dataset and training the time-series prediction capability of the physical coupling field submodule in combination with a preset loss function, wherein the preset loss function includes at least the mean square error between the predicted values and the true values of the training dataset, and the sum of residuals between the predicted values and preset physical constraints; based on the pre-trained physical coupling field submodule, running the training dataset in a preset digital twin environment to simulate different photovoltaic cluster operation scenarios; and, with the goal of maximizing cumulative reward, training the photovoltaic ecological agent submodule using multi-agent proximal policy optimization.

4. The photovoltaic operation status management and control method according to claim 1, characterized in that performing risk assessment on the real-time photovoltaic operation dataset based on the photovoltaic ecological monitoring model to generate risk assessment results includes: after the real-time photovoltaic operation dataset is input into the photovoltaic ecological monitoring model, extracting, by the physical coupling field submodule, field features from the real-time photovoltaic operation dataset to generate a physical coupling field feature vector, wherein the physical coupling field feature vector includes at least global field features and local anomaly features; analyzing, by the photovoltaic ecological agent submodule, the real-time photovoltaic operation dataset to generate an ecological agent feature vector, wherein the ecological agent feature vector includes at least a strategy consistency coefficient and a strategy preference distribution; performing risk prediction based on the physical coupling field feature vector and the ecological agent feature vector to obtain a short-term risk probability, phase transition early warning information, and an abnormal area distribution; and encapsulating the physical coupling field feature vector, the ecological agent feature vector, the short-term risk probability, the phase transition early warning information, and the abnormal area distribution into a risk assessment result for output.

5. The photovoltaic operation status management and control method according to claim 1, characterized in that performing strategy exploration and counterfactual reasoning based on the risk assessment results to generate the corresponding optimal control strategy includes: performing demand analysis on the risk assessment results to obtain management and control demand data; analyzing the management and control demand data based on a preset candidate strategy generator to generate a candidate control strategy set; executing the candidate control strategy set within a preset digital twin environment and using a preset counterfactual perturbation generation mechanism to inject counterfactual perturbations into the execution process of the candidate control strategy set, wherein the counterfactual perturbations include at least Monte Carlo perturbations based on historical distributions, extreme perturbations based on chaos theory, and adversarial perturbations based on adversarial networks; monitoring the execution utility of the candidate control strategy set after the counterfactual perturbations are injected to generate a first utility dataset; comparing the first utility dataset with an interference-free baseline to obtain a stability score for each candidate control strategy; and screening the candidate control strategy set based on the stability scores of the candidate control strategies to obtain the optimal control strategy.

6. The photovoltaic operation status management and control method according to claim 5, characterized in that analyzing the management and control demand data based on a preset candidate strategy generator to generate a candidate control strategy set includes: constructing a dynamic strategy space based on the management and control demand data; using Latin hypercube sampling to randomly sample within the dynamic strategy space to generate an original control strategy set; performing simulations on all original control strategies in the original control strategy set to obtain the corresponding expected control effects; if the expected control effect meets the preset screening conditions, using the corresponding original control strategy as a candidate control strategy; and if the expected control effect does not meet the preset screening conditions, using a multi-objective evolutionary algorithm to iteratively optimize the original control strategy set until the number of strategies in the candidate control strategy set reaches a preset threshold.

7. The photovoltaic operation status management and control method according to claim 1, characterized in that executing the optimal control strategy and performing feedback optimization based on the feedback data corresponding to the optimal control strategy includes: obtaining feedback data after the optimal control strategy is executed, wherein the feedback data includes at least the initial photovoltaic operating state, the optimal control strategy adopted, the actual system response, and the final photovoltaic operating state; generating an incremental learning package based on the feedback data and incrementally updating the photovoltaic ecological monitoring model based on the incremental learning package; and updating the corresponding parameters in the preset strategy space based on the feedback data.

8. A photovoltaic operation status management and control system, used to implement the photovoltaic operation status management and control method according to any one of claims 1 to 7, characterized in that it includes: a model construction module, used to build and train a photovoltaic ecological monitoring model that includes a physical coupling field submodule and a photovoltaic ecological agent submodule; a risk assessment module, used to perform risk assessment on the real-time photovoltaic operation dataset based on the photovoltaic ecological monitoring model and generate risk assessment results; a strategy generation module, used to perform strategy exploration and counterfactual reasoning based on the risk assessment results and generate the corresponding optimal control strategy; and a feedback optimization module, used to obtain feedback data after the optimal control strategy is executed and to perform feedback optimization on the corresponding modules of the system based on the feedback data.