Flow battery energy storage intelligent scheduling method and system
By constructing a prediction-decision-explanation closed-loop intelligent scheduling method for flow battery energy storage, and combining time-series prediction models, reinforcement learning agents, and large language models, the problem of rigid scheduling strategies and low transparency of flow battery energy storage systems is solved, achieving efficient, safe, and human-machine collaborative intelligent scheduling.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG NORMAL UNIV
- Filing Date
- 2026-03-09
- Publication Date
- 2026-06-16
AI Technical Summary
Existing flow battery energy storage systems suffer from problems such as rigid strategies, heavy computational burden, low decision-making transparency, and insufficient human-machine collaboration in their scheduling methods, making it difficult to achieve efficient, safe, and intelligent scheduling.
A prediction-decision-explanation closed loop is constructed using a time-series prediction model, a reinforcement learning agent, and a large language model. Combined with a visual interactive interface, it enables high-precision electricity price prediction, transparent decision-making, and human-machine collaboration.
It improves the economy, safety, and interpretability of the scheduling strategy for flow battery energy storage systems, enhances the reliability and practicality of human-machine collaborative decision-making, optimizes charging and discharging benefits, and delays battery degradation.
Figure CN122225501A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of power system automation and artificial intelligence, and more specifically to a smart scheduling method and system for flow battery energy storage.
Background Technology
[0002] Building a clean, low-carbon, safe, and efficient new power system is a core direction of China's energy strategy. Against this backdrop, flow battery energy storage technology, especially the vanadium redox flow battery, is becoming a key technology for enhancing grid regulation capabilities and promoting the integration of new energy sources, owing to its long cycle life, intrinsic safety, and easy capacity expansion. However, achieving efficient, safe, and intelligent scheduling of energy storage systems in applications such as grid peak shaving and frequency regulation still faces many technical challenges.
[0003] Currently, the scheduling methods for energy storage power station operation mainly follow the following technical routes, but they all have technical defects that urgently need to be addressed.
[0004] First, scheduling methods based on fixed rules rely on preset, static time-period divisions. These methods are rigid and unable to perceive and respond to dynamic, nonlinear demand fluctuations in the power grid, making it difficult to capture optimal scheduling opportunities and resulting in limited overall scheduling efficiency.
[0005] Secondly, optimization methods based on traditional mathematical programming attempt to solve for the optimal strategy by establishing mathematical models. However, in pursuit of solvability, these methods often have to oversimplify complex nonlinear processes such as battery aging and market uncertainty, leading to significant deviations between the optimization results and the actual physical processes. Furthermore, the computational burden increases dramatically with the problem size, making it difficult to meet the real-time requirements of high-frequency online decision-making.
[0006] Furthermore, with the development of artificial intelligence technology, machine learning-based scheduling methods, especially reinforcement learning, have been introduced into this field. While these methods possess the potential to autonomously learn strategies from data, existing technical solutions still have significant shortcomings: First, the decision-making process of reinforcement learning models is an opaque "black box," and its decision-making logic is difficult for human operators to understand and trust, which seriously hinders its deployment and application in real power systems where safety and reliability are paramount. Second, most methods lack effective mechanisms for integrating high-precision forward-looking information, and decisions are mostly passive responses to current and historical states, lacking strategic foresight. Third, existing systems generally lack effective human-machine collaboration interfaces. When the agent makes high-risk decisions or decisions that violate domain experience, human experts cannot intervene and correct them in a timely and evidence-based manner, and the robustness of the system's decisions in complex and unstable environments cannot be guaranteed.
[0007] Furthermore, while existing patented technologies offer various optimization schemes for energy storage scheduling, they still have their limitations. For example, capacity allocation methods based on greedy algorithms (such as CN118868163A) can achieve rapid solutions, but they rely on static electricity price allocation, making them difficult to adapt to dynamic market environments, and they do not consider battery degradation and decision interpretability. Control methods considering peak and off-peak electricity prices (such as CN119324510A) offer more refined allocation of electricity price periods, but their prediction models are relatively simple, and decisions are still rule-driven, lacking autonomous learning capabilities. End-to-end methods that consider battery degradation costs (such as CN120146894A) incorporate battery aging into the objective function, but their prediction modules do not fully utilize the latest time-series architecture, the decision-making part lacks the adaptability of reinforcement learning, and the overall approach remains a "black box," lacking interpretability and human-machine collaboration mechanisms.
[0008] In summary, existing technologies cannot provide a smart scheduling solution for flow battery energy storage that effectively integrates high-precision forecasting, enables transparent and reliable decision-making, and supports efficient human-machine collaboration. This technological bottleneck limits the full realization of the technological value of flow battery energy storage systems. Therefore, there is an urgent need in this field for an innovative system and method to overcome the aforementioned shortcomings.
Summary of the Invention
[0009] In view of the above problems, this invention is proposed to provide a smart scheduling method and system for flow battery energy storage that overcomes or at least partially solves the above problems. By constructing a prediction-decision-interpretation closed-loop smart scheduling mechanism for flow battery energy storage, the invention significantly improves the economy, safety and interpretability of the scheduling strategy. It optimizes the charging and discharging benefits and delays battery degradation, while also supporting efficient manual review and intervention, thus enhancing the reliability and practicality of human-machine collaborative decision-making.
[0010] To achieve the above objectives, the present invention adopts the following technical solution:
[0011] In a first aspect, embodiments of the present invention provide a smart scheduling method for flow battery energy storage, comprising: based on a time-series forecasting model, taking historical electricity price data and related exogenous variables as input and outputting an electricity price forecast sequence for a specified future period; based on a reinforcement learning agent, receiving the real-time electricity price, the electricity price forecast sequence for the specified future period, and the current state of charge of the battery, and outputting a charging and discharging decision action; based on a large language model, analyzing the charging and discharging decision action and generating a natural language explanation report that includes an explanation of the decision reasons, a benefit analysis, a battery safety status assessment, and risk warnings; and dynamically displaying the key system states through a visual interactive interface while supporting manual review and intervention.
[0012] Preferably, the relevant exogenous variables include regional power load, regional electricity consumption, and renewable energy generation.
[0013] Preferably, the reinforcement learning agent is constructed based on the deep Q-network algorithm, and its state space includes real-time electricity price, a predicted sequence of electricity prices for a specified future period, and the current state of charge of the battery, and its action space includes three discrete actions: charging, maintaining, and discharging.
[0014] Preferably, the reward function of the reinforcement learning agent is:
[0015] $$r_t = R_t - C_t^{\mathrm{deg}} - C_t^{\mathrm{pen}}$$
[0016] $$R_t = \lambda_t P_t \Delta t$$
[0017] where $r_t$ is the instant reward at time t; $R_t$ is the revenue from buying and selling electricity; $C_t^{\mathrm{deg}}$ is the cost of battery degradation; $C_t^{\mathrm{pen}}$ is the penalty for SOC overcharging and over-discharging; $\lambda_t$ is the wholesale electricity price at time t; $P_t$ is the battery charging and discharging power; and $\Delta t$ is the time resolution. The degradation cost is computed from the Peukert constant, the battery's nominal cycle life under 100% deep discharge conditions, the total investment cost per unit capacity of the battery, and the battery states of charge at the current moment and the previous moment; the formulas for the degradation cost and the penalty are not reproduced in this text.
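[0018] Preferably, the loss function $L(\theta)$ of the reinforcement learning agent is:
[0019] $$L(\theta) = \begin{cases} \tfrac{1}{2}\left(y_t - Q(s_t, a_t; \theta)\right)^2, & \left|y_t - Q(s_t, a_t; \theta)\right| \le \delta \\ \delta\left(\left|y_t - Q(s_t, a_t; \theta)\right| - \tfrac{1}{2}\delta\right), & \text{otherwise} \end{cases} \qquad y_t = r_t + \gamma \max_{a' \in \mathcal{A}} Q(s_{t+1}, a'; \theta^{-})$$
[0020] where $s_{t+1}$ is the state at time t+1; $\mathcal{A}$ is the action space; $\gamma$ is the discount factor; $\theta$ and $\theta^{-}$ are the parameters of the current network and the target network, respectively; $Q(s_t, a_t; \theta)$ is the Q-value predicted by the current network for state $s_t$ and action $a_t$; and $\delta$ is a hyperparameter.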
[0021] Preferably, the key system states include the current real-time electricity price, the current state of charge of the battery, the electricity price prediction sequence for a specified future period, the charging and discharging decision actions, and the natural language interpretation report.
[0022] Preferably, the method further includes: executing the charging and discharging decision action after it has been reviewed or overridden, and updating the reinforcement learning agent based on the execution result.
[0023] In a second aspect, embodiments of the present invention provide a smart scheduling system for flow battery energy storage, comprising: The time series forecasting module is used to take historical electricity price data and related exogenous variables as input, and output the electricity price forecast sequence for a specified future period based on the time series forecasting model. The reinforcement learning decision module is used to receive real-time electricity prices, the electricity price prediction sequence for the specified future period, and the current state of charge of the battery based on the reinforcement learning agent, and output charging and discharging decision actions. The interpretability analysis module is used to parse the charging and discharging decision actions based on a large language model and generate a natural language explanation report that includes explanations of the reasons for the decision, benefit analysis, battery safety status assessment and risk warnings. The human-computer interaction module is used to dynamically display the key status of the system through a visual interactive interface, and supports manual review and intervention.
[0024] As can be seen from the above technical solution, compared with the prior art, the smart scheduling method and system for flow battery energy storage disclosed by the present invention have the following effects: they improve the response accuracy of flow battery energy storage systems and keep the battery SOC within a safe range, protecting the safety and lifespan of flow batteries. They use a large language model to interpret the reinforcement learning decisions, enhancing the transparency and credibility of decision-making. Furthermore, they establish a human-machine collaborative mode through a visual interface and standardized intervention channels, to keep decision-making robust in complex and unforeseen scenarios.
[0025] This approach provides a complete engineering solution, from algorithm modeling and data management to human-computer interface. Employing modular design and mainstream technology stacks ensures the system's scalability, maintainability, and good engineering reproducibility.
Attached Figure Description
[0026] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.
[0027] Figure 1 is a flowchart of the smart scheduling method for flow battery energy storage provided in an embodiment of the present invention;
Figure 2 is a diagram of the visual interface provided in an embodiment of the present invention;
Figure 3 is a flowchart of the training process of the reinforcement learning agent provided in an embodiment of the present invention;
Figure 4 is a bar chart comparing the profits obtained in the simulation experiments provided in an embodiment of the present invention;
Figure 5 is a bar chart comparing results with and without the overcharge / over-discharge penalty in an embodiment of the present invention;
Figure 6 is a schematic diagram of the smart scheduling system for flow battery energy storage provided in an embodiment of the present invention.
Detailed Implementation
[0028] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0029] This invention discloses a smart scheduling method for flow battery energy storage. As shown in Figure 1, the method includes: based on a time-series forecasting model, taking historical electricity price data and related exogenous variables as input and outputting an electricity price forecast sequence for a specified future period; based on a reinforcement learning agent, receiving the real-time electricity price, the electricity price forecast sequence for the specified future period, and the current state of charge of the battery, and outputting a charging and discharging decision action; based on a large language model, analyzing the charging and discharging decision action and generating a natural language explanation report that includes an explanation of the reasons for the decision, a benefit analysis, a battery safety status assessment, and risk warnings; and dynamically displaying the key system states through a visual interactive interface while supporting manual review and intervention.
[0030] The first step of this invention is to perform high-precision prediction of future power grid conditions by constructing a time-series prediction model based on the Tiny Time Mixers (TTMs) architecture. This model takes structured time-series data such as historical electricity price data, regional power load, renewable energy generation, and regional electricity consumption as input. Through its unique adaptive block partitioning and resolution prefix tuning techniques, it outputs a predicted electricity price sequence for a specified future period, providing forward-looking information for decision-making.
[0031] The second step is to construct a reinforcement learning agent based on the Deep Q-Network (DQN) algorithm to make charging and discharging decisions. The state space consists of the real-time electricity price, the predicted electricity price sequence for a specified future period, and the battery's current state of charge (SOC). The action space consists of three commands: charging, maintaining, and discharging. The reward combines the revenue from buying and selling electricity, the cost of battery degradation, and a penalty for SOC overcharging and over-discharging. The design goal is to maximize the long-term cumulative benefit of charging and discharging while keeping the battery within a normal SOC range. Because overcharging and over-discharging accelerate battery aging, the SOC overcharge and over-discharge penalty is included in the reward function.
[0032] The third step is to analyze the rationality of the decisions produced by the reinforcement learning agent. An interpretability analysis module is built on a large language model (LLM). This module takes as input the state information received by the reinforcement learning agent and the charging and discharging decisions it outputs. Through structured prompt engineering, it generates a natural language analysis report containing an explanation of the reasons for the decision, a benefit analysis, a battery safety status assessment, and potential risk warnings, making the "black box" decision-making process transparent.
[0033] The fourth step involves human-computer collaboration and visual interaction. This method provides a graphical human-computer interface that dynamically displays key system states, such as the current real-time electricity price, the battery's current state of charge, the predicted electricity price sequence for a specified future period, charging and discharging decisions, and natural language explanation reports. Human experts can review the system's decisions based on the explained information and their own experience, and enforce alternative actions through the intervention channels provided by the interface, thus achieving collaboration between algorithmic intelligence and human experience.
[0034] Furthermore, the method also includes: executing the charging and discharging decision after review or intervention, and updating the reinforcement learning agent based on the execution result.
[0035] The specific implementation process of this invention is as follows: The first step is data preparation and preprocessing. During system initialization, historical electricity price data is obtained from electricity market operators, with data fields including at least a timestamp and the actual electricity price. Simultaneously, information such as regional electricity load, regional electricity consumption, and renewable energy generation can be collected as relevant exogenous variables.
[0036] The Pandas library in Python was used to clean and perform feature engineering on historical electricity price data and related exogenous variable data. Missing values were filled using linear interpolation between preceding and following time points. Hourly information was extracted, and the sine and cosine values of the hourly information were calculated to capture the intraday periodicity of electricity prices.
[0037] Z-score standardization is applied to numerical features such as electricity price and load, so that their mean is 0 and their standard deviation is 1, in order to improve the stability of model training.
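The preprocessing described in [0036] and [0037] can be sketched as follows. This is a minimal illustration, not the embodiment's actual code: the function name `preprocess`, the column names `price` and `load`, and the toy values are assumptions made for the example, and the table is assumed to be indexed by hourly timestamps.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Clean and featurize an hourly table indexed by timestamp."""
    out = df.copy()
    # Fill gaps by linear interpolation between neighbouring time points.
    out[numeric_cols] = out[numeric_cols].interpolate(method="linear")
    # Encode the hour of day on the unit circle so 23:00 and 00:00 are close.
    hour = out.index.hour.to_numpy()
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    # Z-score standardization: zero mean, unit standard deviation.
    mean = out[numeric_cols].mean()
    std = out[numeric_cols].std(ddof=0)
    out[numeric_cols] = (out[numeric_cols] - mean) / std
    return out


# Toy data (hypothetical values), one row per hour with one gap per column.
idx = pd.date_range("2024-01-01", periods=6, freq=pd.Timedelta(hours=1))
raw = pd.DataFrame(
    {"price": [30.0, np.nan, 50.0, 40.0, 20.0, 40.0],
     "load": [9.0, 10.0, 11.0, np.nan, 13.0, 14.0]},
    index=idx,
)
clean = preprocess(raw, ["price", "load"])
```

In practice the mean and standard deviation would be computed on the training split only and reused at inference time, so that the forecast model sees consistently scaled inputs.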
[0038] The second step is the construction and deployment of the time series forecasting model. This involves implementing the TTMs model using the PyTorch framework.
[0039] Set the input sequence length to x and the prediction length to y. The feature dimension c includes the target variable electricity price and related exogenous variables.
[0040] The model employs adaptive block partitioning and resolution prefix tuning techniques, extracting multi-scale temporal features through multi-layer TSMixer blocks. The model output is a predicted electricity price sequence for the next y hours.
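To make the roles of the input length x and the prediction length y concrete, the sketch below cuts a multivariate hourly series into (context, target) pairs. It is only an illustration of the windowing idea: the helper `make_windows` is hypothetical, column 0 is assumed to hold the electricity price, and the sketch does not reproduce how the TTMs implementation prepares its inputs.

```python
def make_windows(series, x, y):
    """Split rows into (context, target) pairs.

    series: list of feature rows, one per hour (column 0 is assumed to be the price)
    x: input sequence length; y: prediction length
    """
    pairs = []
    for start in range(len(series) - x - y + 1):
        # x consecutive rows with all c features form the model input.
        context = series[start:start + x]
        # The prices of the following y hours form the forecast target.
        target = [row[0] for row in series[start + x:start + x + y]]
        pairs.append((context, target))
    return pairs


# Toy series with c = 2 features per hour: price and load (hypothetical values).
rows = [[float(h), 100.0 + h] for h in range(10)]
pairs = make_windows(rows, x=4, y=2)
```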
[0041] TTMs are pre-trained models. Model parameters can be downloaded directly from the internet. To improve the accuracy of electricity price predictions for specific regions, the model can be fine-tuned using historical electricity price data for the corresponding region.
[0042] The third step is to build and train a reinforcement learning agent, as shown in Figure 3. The DQN algorithm is implemented based on the Stable-Baselines3 library.
[0043] The reinforcement learning agent employs an experience replay mechanism: a fixed-capacity replay buffer stores state transition tuples, and transitions are drawn from it by uniform random sampling. A dual-network architecture is used, in which the target network periodically synchronizes its parameters from the current network to provide a stable learning objective. The Huber loss function and the Adam optimizer are used, with a learning rate of 1× [exponent not reproduced in the source text]. The Huber loss function is expressed as follows:
$$L(\theta) = \begin{cases} \tfrac{1}{2}\left(y_t - Q(s_t, a_t; \theta)\right)^2, & \left|y_t - Q(s_t, a_t; \theta)\right| \le \delta \\ \delta\left(\left|y_t - Q(s_t, a_t; \theta)\right| - \tfrac{1}{2}\delta\right), & \text{otherwise} \end{cases}$$
[0044] where $Q(s_t, a_t; \theta)$ is the Q-value predicted by the current network for state $s_t$ and action $a_t$; the hyperparameter $\delta$ is the threshold of the Huber loss function, which controls the transition point from quadratic loss to linear loss, and is set to 1.0 in this embodiment. The target Q-value $y_t$ is calculated using the Bellman equation:
$$y_t = r_t + \gamma \max_{a' \in \mathcal{A}} Q(s_{t+1}, a'; \theta^{-})$$
[0045] Here, $r_t$ is the instant reward, consisting of three parts: the revenue from buying and selling electricity $R_t$, the battery degradation cost $C_t^{\mathrm{deg}}$, and the SOC overcharge and over-discharge penalty $C_t^{\mathrm{pen}}$, that is, $r_t = R_t - C_t^{\mathrm{deg}} - C_t^{\mathrm{pen}}$. $s_{t+1}$ is the state at time t+1; the state includes the current real-time electricity price, the current battery SOC value, and the hourly electricity price forecast sequence for the specified future period. The action $a_t$ is discrete and takes one of three values: charging, maintaining, or discharging. The discount factor $\gamma$ is set to 0.99 to balance the importance of current rewards against future rewards. $\theta$ and $\theta^{-}$ are the parameters of the current network and the target network, respectively. The target network parameters $\theta^{-}$ are periodically copied from the current network parameters $\theta$ to stabilize the training process.
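The Bellman target and the Huber loss described above can be checked on a single transition with a few lines of plain Python. This is a hedged sketch for illustration only: the function names `td_target` and `huber` are hypothetical, the numbers are toy values, and an actual training run would compute the same quantities over sampled batches inside the DQN library.

```python
def td_target(reward, next_q_values, gamma=0.99):
    """Bellman target: reward plus the discounted best target-network Q-value."""
    return reward + gamma * max(next_q_values)


def huber(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


# Target-network Q-values for (charge, hold, discharge) in the next state.
y = td_target(reward=2.0, next_q_values=[1.0, 3.0, -1.0])
small = huber(y - 4.5)   # prediction close to the target: quadratic branch
large = huber(y - 1.0)   # prediction far from the target: linear branch
```

The linear branch is what keeps a single badly mispredicted transition, for example during an electricity price spike, from dominating the gradient.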
[0046] The electricity trading revenue is:
$$R_t = \lambda_t P_t \Delta t$$
[0047] where $\lambda_t$ is the wholesale electricity price at time t; $P_t$ is the battery's charging and discharging power, with negative values indicating charging and positive values indicating discharging; and $\Delta t$ is the time resolution, set to 1 hour.
[0048] The formula for the SOC state evolution model is: [formula not reproduced in the source text]
[0049] The parameters of this formula are the self-discharge rate, the energy conversion efficiency coefficient, and the rated capacity.
[0050] The formula for battery degradation cost is: [formula not reproduced in the source text]
[0051] The parameters of this formula are: the Peukert constant, used to characterize the nonlinear effect of discharge rate on battery aging; the nominal cycle life of the battery under 100% deep discharge conditions, where a larger value means a longer battery life; the total investment cost per unit capacity of the battery, used to convert life loss into economic cost; and the battery states of charge at the current moment and the previous moment.
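How the reward terms fit together in one time step can be sketched as below. Only the trading revenue (price times power times time resolution, with negative power meaning charging) follows the text directly. The SOC update rule, the penalty shape, and every default value (`capacity`, `efficiency`, `self_discharge`, `low`, `high`, `weight`) are assumptions introduced for illustration, and the degradation cost is passed in as a number because its formula is not available here.

```python
def trade_revenue(price, power, dt=1.0):
    """Revenue for one step: discharging (power > 0) earns, charging (power < 0) pays."""
    return price * power * dt


def next_soc(soc, power, dt=1.0, capacity=100.0, efficiency=0.8, self_discharge=0.0):
    """ASSUMED SOC update rule, not the patent's formula."""
    if power < 0:
        # Charging: only a fraction of the purchased energy is stored.
        delta = -power * dt * efficiency / capacity
    else:
        # Discharging: more energy leaves storage than is delivered.
        delta = -power * dt / (efficiency * capacity)
    return soc * (1.0 - self_discharge) + delta


def soc_penalty(soc, low=0.1, high=0.9, weight=100.0):
    """ASSUMED penalty: proportional to how far SOC leaves the band [low, high]."""
    excess = max(low - soc, 0.0) + max(soc - high, 0.0)
    return weight * excess


def reward(price, power, soc_next, degradation_cost, dt=1.0):
    """Instant reward = trading revenue - degradation cost - SOC penalty."""
    return trade_revenue(price, power, dt) - degradation_cost - soc_penalty(soc_next)


soc_after_discharge = next_soc(0.5, power=10.0)
r = reward(price=50.0, power=10.0, soc_next=soc_after_discharge, degradation_cost=5.0)
```

With these placeholder numbers, discharging 10 units of power for one hour at a price of 50 yields a revenue of 500, from which the supplied degradation cost is subtracted; no penalty applies because the resulting SOC stays inside the assumed band.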
[0052] The specific training process is shown in Figure 3. Its core execution sequence can be summarized as follows: 1) initialize the experience replay buffer, the current network, and the target network; 2) the agent interacts with the environment, selects decision actions, and collects experience; 3) store the experience in the replay buffer; 4) periodically sample from the experience replay buffer, calculate the Huber loss, and update the current network; 5) periodically synchronize the target network; 6) repeat until training is complete. Specifically, in each iteration of offline training, the reinforcement learning agent interacts with the power grid environment. First, the charging and discharging command is executed: the simulated environment changes the battery's SOC according to the decision action output by the agent. Next, the reward is calculated: the system evaluates the decision action in real time according to the preset reward function. Finally, policy optimization feedback is performed: the system stores the calculated reward along with the state transition data in the experience replay buffer, calculates the Huber loss, and uses gradient descent to correct the network parameters, thereby achieving closed-loop optimization of the scheduling policy.
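The fixed-capacity replay buffer with uniform random sampling used in steps 1), 3), and 4) can be sketched with the standard library alone. The class below is a simplified stand-in for illustration, not the buffer used by Stable-Baselines3; the class name, the tuple layout, and the toy transitions are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.data = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.data.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform random sampling without replacement.
        return rng.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=t, action=t % 3, reward=float(t), next_state=t + 1)
batch = buffer.sample(2, rng=random.Random(0))
```

Sampling uniformly from stored transitions breaks the strong hour-to-hour correlation of consecutive electricity market steps, which is the usual motivation for experience replay in DQN.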
[0053] After sufficient training of the DQN reinforcement learning agent using historical data, the reinforcement learning agent is deployed in online scheduling applications. Its decision generation process is as follows.
[0054] At each scheduling time, the agent receives a state vector consisting of three parts: the current real-time electricity price, the current state of charge of the battery, and a predicted sequence of electricity prices for a specified future period. This state vector is preprocessed and then input into a trained deep Q-network.
[0055] The network internally calculates the long-term expected value of each possible action—charging, maintaining, or discharging—in the current state through a series of nonlinear mappings. The agent employs a deterministic strategy, directly selecting the action with the highest Q-value as the optimal scheduling instruction for the current moment.
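The deterministic action selection reduces to an argmax over the three Q-values. A minimal sketch, assuming the network output is ordered as (charge, hold, discharge); the ordering and the names `ACTIONS` and `greedy_action` are illustrative assumptions:

```python
ACTIONS = ("charge", "hold", "discharge")


def greedy_action(q_values):
    """Return the action whose Q-value is largest (ties go to the first)."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return ACTIONS[best]


choice = greedy_action([0.2, -0.1, 1.4])
```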
[0056] Step 4: Construct a decision interpreter based on the DeepSeek-R1 large language model.
[0057] To make the agent's "black box" decision-making transparent, the system initiates an interpretability analysis module based on a large language model while generating scheduling instructions.
[0058] This module automatically collects structured contextual information related to the current decision, mainly including: real-time electricity price, current state of charge of the battery, electricity price prediction sequence for a specified future period, and charging / discharging decision actions output by the reinforcement learning agent.
[0059] This information is organized into a structured prompt and input into the large language model. Through carefully designed prompt engineering, the model is guided to act as an energy scheduling expert, analyzing and reasoning about the given decision information.
[0060] Based on its trained general knowledge and logical abilities, the large language model automatically generates a natural language report that includes explanations of decision-making reasons, benefit analysis, battery safety status assessment, and potential operational risk warnings. This report enables operations and maintenance personnel to quickly understand the AI's decision-making logic.
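One possible way to assemble the structured prompt is sketched below. The wording of the prompt, the function name `build_prompt`, and the example values are all assumptions made for illustration; the source does not give the actual prompt template, and the call to the language model itself is omitted.

```python
def build_prompt(price, soc, forecast, action):
    """Assemble a structured prompt from the decision context."""
    forecast_text = ", ".join(f"{p:.2f}" for p in forecast)
    return (
        "You are an energy scheduling expert.\n"
        f"Real-time electricity price: {price:.2f}\n"
        f"Battery state of charge: {soc:.0%}\n"
        f"Forecast prices for the coming hours: {forecast_text}\n"
        f"Agent decision: {action}\n"
        "Write a report with four sections: decision rationale, benefit analysis, "
        "battery safety assessment, and risk warnings."
    )


prompt = build_prompt(price=42.5, soc=0.63, forecast=[40.0, 55.25, 61.0], action="charge")
```

Keeping the four requested report sections fixed in the prompt makes the generated reports easier to display in consistent panels of the operator interface.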
[0061] Step 5: Human-machine collaboration and system deployment. A graphical user interface developed using PyQt5, shown in Figure 2, displays the real-time electricity price, the battery state of charge, the electricity price forecast for a specified future period, the charging / discharging decision, and the natural language explanation report. It provides a drop-down menu for action intervention, through which experts can manually override the system's decisions.
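Independently of the GUI toolkit, the override logic behind the intervention drop-down menu can be kept as a small pure function, which makes it easy to test. The sketch below is an assumption about how such logic might look; `resolve_action` and `VALID_ACTIONS` are hypothetical names, and the PyQt5 widgets are left out.

```python
VALID_ACTIONS = ("charge", "hold", "discharge")


def resolve_action(agent_action, operator_choice=None):
    """Return the action to execute and whether a human overrode the agent."""
    if operator_choice is None:
        # No intervention: the agent's decision is approved as-is.
        return agent_action, False
    if operator_choice not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {operator_choice!r}")
    return operator_choice, operator_choice != agent_action


approved = resolve_action("discharge")
overridden = resolve_action("discharge", operator_choice="hold")
```

Returning the override flag alongside the action lets the system log interventions, which is useful when the executed result is later fed back to update the agent.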
[0062] The simulation experiment uses hourly electricity price data from the Alberta electricity market in Canada as the core input data. The data includes key fields such as timestamps and actual electricity prices. At the same time, the load data of the power system in the same region during the same period is collected as exogenous variables for training and validation of the time series prediction model, ensuring the authenticity of the data and the adaptability of the scenario.
[0063] This simulation strictly follows the actual operating characteristics of flow batteries and the rules of electricity market transactions to set the core parameters. The specific parameter values are shown in Table 1.
[0064] Table 1. Actual Operating Characteristics of Flow Batteries and Core Parameters for Electricity Market Trading Rules
[0065] To verify the superiority of the proposed model, the following three types of models were selected as comparison benchmarks: Baseline-DQN, a DQN model that makes decisions based solely on real-time electricity prices and current SOC without a forward-looking electricity price prediction module; PatchTST-DQN, a hybrid model that uses the PatchTST time series prediction model to replace TTMs and combines it with DQN; and CnnLstmAttn-DQN, a hybrid model that uses the CNN-LSTM-Attention time series prediction model and combines it with DQN.
[0066] Within the same simulation period, key indicators of the TTMs-DQN model of this invention were quantitatively compared with those of the three comparison models. The results show that the present invention has significant advantages in profitability and in the rationality of its operational strategy. The experimental results are shown in Figure 4 and in Table 2 below.
[0067] Table 2 Comparison Data
[0068] From an economic perspective, the TTMs-DQN model of this invention achieved a total actual revenue of CAD 550,849, representing a 75.9% improvement over Baseline-DQN, a 2.4% improvement over PatchTST-DQN, and a 16.4% improvement over CnnLstmAttn-DQN, making it the most outstanding model in terms of revenue growth. The average revenue per operation was CAD 62.88, higher than Baseline-DQN's CAD 37.80, PatchTST-DQN's CAD 61.35, and CnnLstmAttn-DQN's CAD 54.01, indicating that this invention achieves higher scheduling efficiency per operation and does not rely on accumulating revenue through frequent operations.
[0069] In terms of operational strategy, the TTMs-DQN model of this invention performs 1097 charging operations and 952 discharging operations, for a total of 2049 charging and discharging operations, with 6711 idle operations. Compared with Baseline-DQN, it performs about 2.1 times as many charging and discharging operations and has 1088 fewer idle operations, significantly lowering the idle rate. This reflects the proactive decision-making of this invention based on electricity price forecasting, which avoids the passive idleness caused by a lack of forward-looking information. Compared with PatchTST-DQN and CnnLstmAttn-DQN, this invention performs slightly fewer charging and discharging operations but achieves a higher average revenue per operation. This avoids the battery damage that excessive charging and discharging may cause, while reasonable control of the number of idle operations ensures that potential arbitrage opportunities are not missed.
[0070] In the optimized scheduling of battery energy storage systems, the proper control of State of Charge (SOC) directly affects battery life and system safety. Overcharging and over-discharging significantly accelerate battery aging. This experiment introduces overcharging and over-discharging penalties into a reinforcement learning reward mechanism, compares and analyzes the impact of the penalty mechanism on the average SOC and operating state of the battery, and verifies the effectiveness of the penalty mechanism in ensuring battery safety and extending its service life.
[0071] As shown in Figure 5, after introducing the overcharge and over-discharge penalty, the average SOC of each model is closer to 0.5, with the model of this invention showing the best performance. This indicates that the penalty can significantly improve the SOC distribution, reduce the frequency of extreme states, and slow down battery aging, verifying the necessity of the penalty mechanism in ensuring battery safety.
[0072] Based on the same inventive concept, embodiments of the present invention also provide a smart scheduling system for flow battery energy storage. As shown in Figure 6, the system includes: a time series forecasting module, used to take historical electricity price data and related exogenous variables as input and, based on the time series forecasting model, output the electricity price forecast sequence for a specified future period; a reinforcement learning decision module, used to receive, based on the reinforcement learning agent, the real-time electricity price, the electricity price forecast sequence for the specified future period, and the current state of charge of the battery, and to output charging and discharging decision actions; an interpretability analysis module, used to parse the charging and discharging decision actions based on a large language model and generate a natural language explanation report that includes the reasons for the decision, a benefit analysis, a battery safety status assessment, and risk warnings; and a human-computer interaction module, used to dynamically display the key states of the system through a visual interactive interface and to support manual review and intervention.
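The data flow between the four modules can be sketched as a single scheduling step. The sketch below only illustrates the wiring: the persistence forecast, the threshold rule, and the template string are simple stand-ins for the time series forecasting model, the reinforcement learning agent, and the large language model, and every function name, threshold, and parameter is hypothetical.

```python
from typing import Callable, Mapping, Sequence

ACTIONS = ("charge", "hold", "discharge")


def forecast_prices(history: Sequence[float], exogenous: Mapping, horizon: int) -> list:
    """Stand-in forecaster: repeats the last observed price (persistence)."""
    return [history[-1]] * horizon


def decide(price_now: float, forecast: Sequence[float], soc: float) -> str:
    """Stand-in for the agent: a threshold rule on the mean forecast price."""
    mean_future = sum(forecast) / len(forecast)
    if price_now < mean_future and soc < 0.9:
        return "charge"
    if price_now > mean_future and soc > 0.1:
        return "discharge"
    return "hold"


def explain(action: str, price_now: float, forecast: Sequence[float], soc: float) -> str:
    """Stand-in for the language model: fills a fixed report template."""
    mean_future = sum(forecast) / len(forecast)
    return (f"Proposed action: {action}. Current price {price_now:.2f}, "
            f"mean forecast price {mean_future:.2f}, SOC {soc:.2f}.")


def schedule_step(history: Sequence[float], exogenous: Mapping, price_now: float,
                  soc: float, operator: Callable[[str, str], str],
                  horizon: int = 24) -> dict:
    forecast = forecast_prices(history, exogenous, horizon)   # forecasting module
    proposed = decide(price_now, forecast, soc)               # decision module
    report = explain(proposed, price_now, forecast, soc)      # explanation module
    final = operator(proposed, report)                        # manual review or intervention
    if final not in ACTIONS:
        raise ValueError(f"unknown action: {final}")
    return {"forecast": forecast, "proposed_action": proposed,
            "report": report, "final_action": final}


def approve(action: str, report: str) -> str:
    """Operator callback that accepts the proposed action unchanged."""
    return action


result = schedule_step([30.0, 40.0], {}, price_now=20.0, soc=0.5, operator=approve)
print(result["report"])
print(result["final_action"])
```

Passing a different `operator` callback models manual intervention: the operator sees the proposed action and the report and returns the action that is actually executed.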
[0073] Since the system solves the problem on a principle similar to that of the aforementioned method, its implementation can refer to the implementation of the method, and the overlapping details are not repeated here.
[0074] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the others. Similar or identical parts of the embodiments can be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the method section.
[0075] The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A smart scheduling method for flow battery energy storage, characterized in that it comprises: based on a time series forecasting model, taking historical electricity price data and related exogenous variables as input, and outputting an electricity price forecast sequence for a specified future period; based on a reinforcement learning agent, receiving the real-time electricity price, the electricity price forecast sequence for the specified future period, and the current state of charge of the battery, and outputting charging and discharging decision actions; based on a large language model, parsing the charging and discharging decision actions to generate a natural language explanation report that includes the reasons for the decision, a benefit analysis, a battery safety status assessment, and risk warnings; and dynamically displaying the key states of the system through a visual interactive interface, with support for manual review and intervention.
2. The method as described in claim 1, characterized in that the related exogenous variables include regional power load, regional electricity consumption, and renewable energy generation.
3. The method according to claim 1, characterized in that the reinforcement learning agent is constructed based on the deep Q-network (DQN) algorithm; its state space includes the real-time electricity price, the electricity price forecast sequence for the specified future period, and the current state of charge of the battery; and its action space includes three discrete actions: charging, holding (idle), and discharging.
4. The method as described in claim 3, characterized in that the reward function of the reinforcement learning agent is: [formula not reproduced in this text] wherein the symbols of the formula denote, in order: the instant reward at time t; the revenue from buying and selling electricity; the battery degradation cost; the penalty for SOC overcharging and over-discharging; the wholesale electricity price at time t; the battery charging and discharging power; the time resolution; Peukert's constant; the nominal cycle life of the battery at 100% depth of discharge; the total investment cost per unit capacity of the battery; and the battery states of charge at the current moment and the previous moment, respectively.
5. The method as described in claim 4, characterized in that the loss function of the reinforcement learning agent is: [formula not reproduced in this text] wherein the symbols of the formula denote, in order: the state at time t+1; the action space; the discount factor; the parameters of the current network and the target network, respectively; the predicted Q value output by the current network for the given state and action; and a hyperparameter.
6. The method according to claim 1, characterized in that the key states of the system include the current real-time electricity price, the current state of charge of the battery, the electricity price forecast sequence for the specified future period, the charging and discharging decision actions, and the natural language explanation report.
7. The method according to claim 1, characterized in that it further comprises: executing the approved or manually adjusted charging and discharging decision action, and updating the reinforcement learning agent based on the execution result.
8. A smart scheduling system for flow battery energy storage, characterized in that it comprises: a time series forecasting module, used to take historical electricity price data and related exogenous variables as input and, based on the time series forecasting model, output the electricity price forecast sequence for a specified future period; a reinforcement learning decision module, used to receive, based on the reinforcement learning agent, the real-time electricity price, the electricity price forecast sequence for the specified future period, and the current state of charge of the battery, and to output charging and discharging decision actions; an interpretability analysis module, used to parse the charging and discharging decision actions based on a large language model and generate a natural language explanation report that includes the reasons for the decision, a benefit analysis, a battery safety status assessment, and risk warnings; and a human-computer interaction module, used to dynamically display the key states of the system through a visual interactive interface and to support manual review and intervention.
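The deep Q-network loss referred to in claim 5 can be illustrated with a minimal pure-Python sketch. The claim lists a current network, a target network, a discount factor, and one further hyperparameter, but the formula itself is not reproduced in this text. The sketch therefore assumes a Huber loss on the one-step temporal-difference error, with the Huber threshold `delta` playing the role of that hyperparameter; this is a guess at the intended form, and the callables `q_current` and `q_target` are hypothetical stand-ins for the two networks.

```python
from typing import Callable, Sequence, Tuple

# Action indices follow claim 3: 0 = charging, 1 = holding, 2 = discharging.
Transition = Tuple[tuple, int, float, tuple]   # (state, action, reward, next_state)
QFunction = Callable[[tuple], Sequence[float]]  # state -> Q-value for each action


def huber(error: float, delta: float) -> float:
    """Quadratic for small errors, linear for large ones (assumed loss shape)."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error * error
    return delta * (abs_error - 0.5 * delta)


def dqn_loss(batch: Sequence[Transition], q_current: QFunction,
             q_target: QFunction, gamma: float = 0.99, delta: float = 1.0) -> float:
    """Mean Huber loss between the TD target and the current network's Q-value."""
    total = 0.0
    for state, action, reward, next_state in batch:
        # TD target: reward plus the discounted best Q-value of the target network.
        target = reward + gamma * max(q_target(next_state))
        predicted = q_current(state)[action]
        total += huber(target - predicted, delta)
    return total / len(batch)


def toy_q(state: tuple) -> list:
    """Toy Q-function returning fixed values, for illustration only."""
    return [0.0, 1.0, 2.0]


# One toy transition: taking action 0 (charging) yields reward 1.0.
batch = [((0.5,), 0, 1.0, (0.6,))]
print(dqn_loss(batch, toy_q, toy_q, gamma=0.5, delta=1.0))
```

In practice the target network's parameters would be held fixed and refreshed from the current network only periodically, which is the usual reason for keeping two parameter sets.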