Method, apparatus, equipment, and medium for building park energy dispatching based on air conditioning load flexible potential tapping

By constructing a temperature dynamic evolution model and an air conditioning energy efficiency conversion model, and combining a state extension observer and maximum entropy reinforcement learning, the flexible potential mining of air conditioning load is optimized, and cost-optimal adjustment instructions that do not affect user experience are generated. This solves the problems of inaccurate flexible load mining and insufficient user comfort in existing technologies, and improves energy utilization efficiency.

CN122243114A · Pending · Publication Date: 2026-06-19 · HANGZHOU KAIDA ELECTRIC POWER CONSTR +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HANGZHOU KAIDA ELECTRIC POWER CONSTR
Filing Date
2026-04-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing flexible load tapping technology for HVAC systems relies on fixed temperature dead zones and simplified models, ignoring indoor and outdoor environmental disturbances and human activities. This results in insufficiently accurate potential ranges, and flexible adjustments often sacrifice user comfort, leading to low user participation.

Method used

By establishing a temperature dynamic evolution model, an air conditioning energy efficiency conversion model, and a state extension observer, a target simulation environment is constructed. Maximum entropy reinforcement learning is used to determine the air conditioning power regulation boundary. Combined with a multi-objective reward function and random disturbance noise, the flexibility potential of air conditioning load is optimized to generate cost-optimal regulation instructions that do not affect user experience.

Benefits of technology

This technology improves the depth and accuracy of flexible load excavation and enhances energy efficiency without compromising user comfort.



Abstract

This application discloses a method, apparatus, equipment, and medium for energy dispatching in building parks based on the flexible potential mining of air conditioning loads, relating to the field of computer technology. The method includes: constructing an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and building indoor temperature evolution; determining predictive average evaluation constraints based on indoor temperature, humidity, and occupant activity parameters; constructing a target simulation environment based on a temperature dynamic evolution model characterizing indoor temperature evolution, a state extension observer compensating for random heat dissipation disturbances indoors, and a comfort entropy growth rate constraint characterizing the intensity of user comfort evolution; determining a multi-objective reward function to balance user comfort and the depth of flexible adjustment potential mining; and using maximum entropy reinforcement learning in conjunction with the target simulation environment to determine the target air conditioning adjustment command to complete the flexible potential mining operation of air conditioning loads. The method considers various external factors to achieve flexible load mining for HVAC systems.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method, apparatus, equipment and medium for energy dispatching in building parks based on the flexible potential of air conditioning load.

Background Technology

[0002] With the transformation of the energy structure, HVAC (Heating, Ventilation, and Air Conditioning) loads in building parks, as an important flexible resource, play a crucial role in peak shaving and valley filling of the power grid. Air conditioning systems have significant thermal inertia, enabling a "virtual energy storage" effect through short-term power reshaping. However, existing flexible load tapping technologies (adjustable, controllable, and predictable flexible adjustment of the electrical load of HVAC systems during operation) still have the following limitations: First, existing definitions of flexible boundaries often rely on fixed temperature dead zones or simplified physical models, ignoring the strong temporal coupling characteristics of indoor and outdoor environmental disturbances, human activities, and building thermodynamics, resulting in insufficiently accurate or overly conservative potential ranges. Second, flexible adjustment often comes at the expense of user comfort, lacking a closed-loop real-time consideration of human sensory evaluation, leading to low user participation and difficulty in achieving truly "unobtrusive" adjustment.

[0003] As can be seen from the above, how to take into account various external factors to achieve flexible load tapping for HVAC is an urgent problem to be solved.

Summary of the Invention

[0004] In view of this, the purpose of this invention is to provide a method, apparatus, equipment, and medium for energy dispatching in building parks based on the flexible potential mining of air conditioning loads, capable of considering various external factors to achieve flexible load mining for HVAC systems. The specific solution is as follows:

Firstly, this application provides a building park energy dispatching method based on the flexible potential mining of air conditioning load, including:
establishing a temperature dynamic evolution model to characterize the evolution of indoor temperature in a building; constructing an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of indoor temperature in the building; determining the predicted average evaluation constraint based on the indoor temperature, humidity and human activity parameters of the building; constructing a state extension observer to compensate for random heat dissipation disturbances in the building; and determining the comfort entropy growth rate constraint, which characterizes the intensity of the evolution of user comfort;
constructing a target simulation environment based on the temperature dynamic evolution model, the predicted average evaluation constraint, the air conditioning energy efficiency conversion model, the state extension observer, and the comfort entropy growth rate constraint; constructing a state vector using the current environmental parameters, electricity price information, and user activity; and determining the flexible adjustment potential of the air conditioning load based on the predicted average evaluation constraint;
determining a multi-objective reward function to balance user comfort and the depth of flexible adjustment potential mining; adding random disturbance noise to the target simulation environment; and performing maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain the air conditioning power adjustment boundary;
constructing a target optimization function with the goal of minimizing the total operating cost of the building park; determining a target air conditioning adjustment command based on the target optimization function and the air conditioning power adjustment boundary to complete the corresponding air conditioning load flexibility potential mining operation; and carrying out energy scheduling of the building park based on the air conditioning load flexibility potential mining results.

[0005] Optionally, the step of establishing a temperature dynamic evolution model to characterize the evolution of building indoor temperature, constructing an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of building indoor temperature, and determining predictive average evaluation constraints based on the building's indoor temperature, humidity, and occupant activity parameters, includes: The thermal inertia and heat storage capacity of the building envelope are abstracted into equivalent heat capacity and equivalent thermal resistance. Based on the equivalent heat capacity, equivalent thermal resistance, indoor temperature and outdoor temperature of the building park, a dynamic temperature evolution model is constructed. The quantitative mapping relationship between the active power of the air conditioner and the actual cooling / heating capacity of the building is determined based on the air conditioner energy efficiency ratio and the air conditioner comprehensive efficiency coefficient, and an air conditioner energy efficiency conversion model is constructed based on the quantitative mapping relationship. The predicted average evaluation is determined based on the indoor temperature and humidity of the building, the metabolic rate of the people, the thermal resistance of the people's clothing, the average radiant temperature of the building envelope, and the indoor air velocity. Define the range interval of the predicted average evaluation to obtain the predicted average evaluation constraint.

[0006] Optionally, the construction of a state extension observer for compensating for random heat dissipation disturbances within the building, and the determination of a comfort entropy growth rate constraint characterizing the drastic evolution of user comfort, includes: The random heat dissipation disturbance inside the building is determined as the equivalent heat flux disturbance, and the indoor temperature estimated by the observer is determined as the estimated indoor temperature. A state-extended observer is constructed based on the estimated indoor temperature, the indoor temperature of the building, the equivalent heat flux disturbance, and the preset observer gain coefficient and nonlinear function. A comfort entropy growth rate is determined to characterize the drastic evolution of user comfort, and a comfort entropy growth rate constraint is constructed based on the comfort entropy growth rate; the comfort entropy growth rate constraint is that the comfort entropy growth rate is not greater than the target entropy growth rate threshold.

[0007] Optionally, the step of constructing a state vector using current environmental parameters, electricity price information, and user activity, and determining the flexible adjustment potential of air conditioning load based on the predicted average evaluation constraint, includes: The current environmental parameters are constructed based on the equivalent heat flux disturbance, the indoor and outdoor temperatures of the building complex, and the predicted average evaluation. Electricity price information is constructed using real-time electricity prices and grid peak shaving demand response signals, and a state vector is constructed based on the current environmental parameters, electricity price information, and user activity. The flexible adjustment potential of air conditioning load is determined based on the air conditioning power adjustment trajectory that meets the predicted average evaluation constraint within a preset time period.
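
The state vector described above can be sketched as a small record type. This is only an illustration: the class name `ParkState`, the field names, the units, and the ordering of the components are assumptions made here, not notation from the application.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class ParkState:
    """Hypothetical container for one observation of the building park."""
    # Current environmental parameters
    heat_disturbance_kw: float   # estimated equivalent heat flux disturbance
    indoor_temp_c: float         # indoor temperature
    outdoor_temp_c: float        # outdoor temperature
    pmv: float                   # predicted average evaluation (PMV)
    # Electricity price information
    price_per_kwh: float         # real-time electricity price
    demand_response: float       # grid peak shaving demand response signal
    # User activity
    occupancy: float             # e.g. fraction of occupied space (assumed)

    def as_vector(self) -> list[float]:
        """Flatten the fields, in declaration order, into a state vector."""
        return [float(x) for x in astuple(self)]


state = ParkState(1.2, 25.0, 33.0, 0.3, 0.8, 1.0, 0.6)
vector = state.as_vector()
```

A frozen dataclass keeps each observation immutable, which is convenient when states are stored in a replay buffer during training.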

[0008] Optionally, determining the multi-objective reward function for balancing user comfort and the depth of flexible adjustment potential mining includes: penalizing, using an indicator function, any air conditioning power adjustment trajectory that exceeds the allowed range, to obtain a first penalty term; penalizing any air conditioning power adjustment trajectory that deviates from the target optimal comfort value corresponding to the predicted average evaluation, to obtain a second penalty term; rewarding, using a logarithmic function, exploration of the air conditioning power regulation range and flexibility margin within the allowed range, to obtain an incentive term; and forming a weighted combination of the first penalty term, the second penalty term, and the incentive term to construct the multi-objective reward function.
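
A minimal sketch of how the three terms might be combined is given below. The application names only the kinds of terms (indicator penalty, comfort deviation penalty, logarithmic incentive); the squared deviation, the `log(1 + depth)` form, the baseline power `p_base`, and all weights are assumptions chosen for illustration.

```python
import math


def multi_objective_reward(p_ac, p_base, p_min, p_max, pmv,
                           pmv_target=0.0,
                           w_range=10.0, w_comfort=1.0, w_flex=0.5):
    """Weighted combination of two penalty terms and one incentive term."""
    # First penalty: indicator function, 1 when the power leaves [p_min, p_max].
    out_of_range = 1.0 if (p_ac < p_min or p_ac > p_max) else 0.0

    # Second penalty: deviation of PMV from the target optimal comfort value
    # (squared deviation is an assumed choice).
    comfort_dev = (pmv - pmv_target) ** 2

    # Incentive: logarithmic in the regulation depth relative to an assumed
    # baseline power, granted only while the power stays inside the range.
    depth = abs(p_ac - p_base)
    incentive = 0.0 if out_of_range else math.log(1.0 + depth)

    return -w_range * out_of_range - w_comfort * comfort_dev + w_flex * incentive
```

The logarithm gives diminishing returns, so the agent is encouraged to explore deeper regulation without the incentive ever dominating the range penalty.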

[0009] Optionally, the step of adding random disturbance noise to the target simulation environment and performing maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain the air conditioning power adjustment boundary includes: Random interference noise satisfying a Gaussian distribution is added to the target simulation environment, Shannon entropy is added to the objective function of reinforcement learning, and maximum entropy reinforcement learning is performed in combination with the state vector, the flexible adjustment potential and the multi-objective reward function to obtain the trained target policy network. The air conditioning power regulation boundary is determined based on the target policy network.
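
The two ingredients named above, Gaussian disturbance noise and an entropy term in the learning objective, can be sketched as follows. This is not the training procedure of the application: it assumes a one-dimensional Gaussian policy (for which the entropy is the differential entropy of the action distribution), and the temperature coefficient `alpha`, the discount `gamma`, and the function names are hypothetical.

```python
import math
import random


def add_disturbance_noise(heat_kw, std_kw, rng):
    """Perturb a simulated heat disturbance with zero-mean Gaussian noise."""
    return heat_kw + rng.gauss(0.0, std_kw)


def gaussian_entropy(sigma):
    """Differential entropy of a 1-D Gaussian policy with std sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def entropy_regularized_return(rewards, sigmas, alpha=0.2, gamma=0.99):
    """Discounted sum of (reward + alpha * policy entropy) over a trajectory."""
    total = 0.0
    for t, (r, sigma) in enumerate(zip(rewards, sigmas)):
        total += gamma ** t * (r + alpha * gaussian_entropy(sigma))
    return total


rng = random.Random(0)
noisy_heat = add_disturbance_noise(2.0, 0.3, rng)
```

A wider policy (larger `sigma`) earns a larger entropy bonus, which is what keeps the agent exploring the power regulation range instead of collapsing early onto a single action.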

[0010] Optionally, the step of constructing a target optimization function with the objective of minimizing the total operating cost of the building park, and determining the target air conditioning adjustment command based on the target optimization function and the air conditioning power adjustment boundary, includes: abstracting the thermal inertia of the building complex as an energy storage battery; constructing a target optimization function with the goal of minimizing the total operating cost of the building park, and determining the power balance constraint, air conditioning comfort constraint, and energy storage battery constraint of the target optimization function; and solving the target optimization function based on the air conditioning power adjustment boundary and these constraints to obtain the target air conditioning adjustment command. The power balance constraint is that the total power load of the building park is equal to the sum of the power purchased from the grid, the photovoltaic power generation, and the energy storage discharge; the air conditioning comfort constraint is that the air conditioning power is within the air conditioning power adjustment boundary; and the energy storage battery constraint is that the charging and discharging power of the energy storage battery is within a preset safety range.
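
The following sketch shows one simplified way such a cost-minimizing schedule could be computed. It is a stand-in, not the solver of the application: it assumes that the thermal "battery" is represented by a fixed air conditioning energy target over the horizon (`energy_kwh`), ignores the battery power limits, and uses a greedy fill of the cheapest time slots, which is optimal for this linear cost with per-slot bounds. All names are hypothetical.

```python
def schedule_ac_power(prices, p_min, p_max, energy_kwh, dt_h=1.0):
    """Spread a required AC energy over the cheapest slots within the bounds."""
    power = [float(p) for p in p_min]            # start at the lower boundary
    remaining = energy_kwh - sum(power) * dt_h
    if remaining < -1e-9:
        raise ValueError("energy target is below the lower power boundary")
    for t in sorted(range(len(prices)), key=lambda i: prices[i]):
        if remaining <= 1e-12:
            break
        add = min(p_max[t] - power[t], remaining / dt_h)
        power[t] += add
        remaining -= add * dt_h
    if remaining > 1e-9:
        raise ValueError("energy target exceeds the upper power boundary")
    return power


def grid_purchase_kw(total_load_kw, pv_kw, storage_discharge_kw):
    """Power balance: total load = grid purchase + PV + storage discharge."""
    return total_load_kw - pv_kw - storage_discharge_kw


prices = [1.0, 0.2, 0.5]
plan = schedule_ac_power(prices, [1, 1, 1], [4, 4, 4], energy_kwh=7.0)
cost = sum(c * p for c, p in zip(prices, plan))
```

Here the per-slot bounds play the role of the air conditioning power adjustment boundary, so the schedule never leaves the region learned to be comfortable.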

[0011] Secondly, this application provides an energy dispatching device for building parks based on the flexible potential exploitation of air conditioning loads, comprising:
a constraint determination module, used to establish a temperature dynamic evolution model to characterize the evolution of indoor temperature in a building, construct an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of indoor temperature in the building, determine the predicted average evaluation constraint based on the temperature, humidity and human activity parameters in the building, construct a state extension observer to compensate for random heat dissipation disturbances in the building, and determine the comfort entropy growth rate constraint to characterize the severity of the evolution of user comfort;
an adjustment potential determination module, used to construct a target simulation environment based on the temperature dynamic evolution model, the predicted average evaluation constraint, the air conditioning energy efficiency conversion model, the state extension observer, and the comfort entropy growth rate constraint, to construct a state vector using current environmental parameters, electricity price information, and user activity, and to determine the flexible adjustment potential of the air conditioning load based on the predicted average evaluation constraint;
an adjustment boundary determination module, used to determine a multi-objective reward function for balancing user comfort and the depth of flexible adjustment potential mining, to add random disturbance noise to the target simulation environment, and to perform maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential and the multi-objective reward function to obtain the air conditioning power adjustment boundary;
an adjustment instruction determination module, used to construct a target optimization function with the goal of minimizing the total operating cost of the building park, determine the target air conditioning adjustment instruction based on the target optimization function and the air conditioning power adjustment boundary, complete the corresponding air conditioning load flexibility potential mining operation, and perform energy scheduling of the building park based on the air conditioning load flexibility potential mining results.

[0012] Thirdly, this application provides an electronic device, comprising: a memory, used to store a computer program; and a processor, used to execute the computer program to implement the aforementioned energy dispatching method for building parks based on the flexible potential of air conditioning load.

[0013] Fourthly, this application provides a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the aforementioned energy scheduling method for building parks based on the flexible potential mining of air conditioning load.

[0014] This application establishes a temperature dynamic evolution model to characterize the evolution of indoor building temperature; constructs an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of indoor building temperature; determines the predicted average evaluation constraint based on the indoor temperature, humidity, and human activity parameters; constructs a state extension observer to compensate for random heat dissipation disturbances in the building; and determines a comfort entropy growth rate constraint characterizing the severity of user comfort evolution. Based on the temperature dynamic evolution model, the predicted average evaluation constraint, the air conditioning energy efficiency conversion model, the state extension observer, and the comfort entropy growth rate constraint, a target simulation environment is constructed. A state vector is constructed using current environmental parameters, electricity price information, and user activity, and the flexible adjustment potential of the air conditioning load is determined based on the predicted average evaluation constraint. A multi-objective reward function is determined to balance user comfort and the depth of flexible adjustment potential mining. Random interference noise is added to the target simulation environment, and maximum entropy reinforcement learning is performed based on the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain the air conditioning power adjustment boundary. A target optimization function is constructed with the goal of minimizing the total operating cost of the building park. The target air conditioning adjustment command is determined based on the target optimization function and the air conditioning power adjustment boundary to complete the corresponding air conditioning load flexible potential mining operation. Energy scheduling of the building park is performed based on the air conditioning load flexible potential mining results.

[0015] As can be seen from the above, this application accurately depicts the thermal inertia and temperature change patterns of buildings through a temperature dynamic evolution model, achieves accurate conversion between power consumption and indoor temperature through an air conditioning energy efficiency conversion model, defines a safe red line for user comfort through predictive average evaluation constraints, eliminates prediction errors caused by random heat dissipation disturbances through a state extension observer, and eliminates sensory shocks caused by sudden temperature changes through comfort entropy growth rate constraints. Based on the above constraints, a simulation environment is constructed. By constructing a state vector and clarifying the flexible adjustment potential of air conditioning, and combining a multi-objective reward function and random interference noise for maximum entropy reinforcement learning, the air conditioning power adjustment boundary is obtained. In this way, an optimization function is constructed with the goal of minimizing the total operating cost of the park. The power adjustment boundary is used as a hard constraint to solve the optimization function to obtain the target air conditioning adjustment command. Under the premise of not affecting the user experience, the cost-optimal air conditioning adjustment command adapted to the grid demand is automatically generated, improving the depth and accuracy of flexible load mining and effectively improving energy utilization efficiency.

Attached Figure Description

[0016] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0017] Figure 1 is a flowchart of the energy dispatching method for building parks based on the flexible potential of air conditioning load disclosed in this application;
Figure 2 is a schematic diagram illustrating the control potential of a building air conditioning system that takes user comfort into account, as disclosed in this application;
Figure 3 is a schematic diagram illustrating the potential for flexible adjustment of air conditioning load disclosed in this application;
Figure 4 is a schematic diagram of the dynamic flexible boundary of air conditioning load and park scheduling response disclosed in this application;
Figure 5 is a schematic diagram of a building park energy dispatching device based on the flexible potential of air conditioning load disclosed in this application;
Figure 6 is a structural diagram of an electronic device disclosed in this application.

Detailed Implementation

[0018] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0019] Currently, existing flexible boundary definitions mostly rely on fixed temperature dead zones or simplified physical models, ignoring the strong temporal coupling characteristics of indoor and outdoor environmental disturbances, human activities, and building thermodynamics. This results in insufficiently accurate or overly conservative potential ranges. Secondly, flexible regulation often sacrifices user comfort, lacking a closed-loop real-time consideration of human sensory evaluation, leading to low user participation and difficulty in achieving truly "seamless" regulation. To address this, this application provides a building park energy dispatching method based on the flexible potential mining of air conditioning loads. An optimization function is constructed with the goal of minimizing the total operating cost of the park. The power regulation boundary is used as a hard constraint to solve the optimization function, obtaining the target air conditioning regulation command. This automatically generates cost-optimal air conditioning regulation commands that adapt to grid demands without affecting user experience, improving the depth and accuracy of flexible load mining and effectively increasing energy utilization efficiency.

[0020] As shown in Figure 1, this invention discloses an energy dispatching method for building parks based on the flexible potential mining of air conditioning load, including:

Step S11: Establish a temperature dynamic evolution model to characterize the evolution of indoor temperature in a building; construct an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of indoor temperature in the building; determine the predictive average evaluation constraint based on the indoor temperature, humidity and human activity parameters of the building; construct a state extension observer to compensate for random heat dissipation disturbances in the building; and determine the comfort entropy growth rate constraint to characterize the severity of the evolution of user comfort.

[0021] In this embodiment, the dynamic change law of indoor temperature in the building park is described by a first-order differential equation based on the law of conservation of energy. The corresponding formula, written here for the cooling case in notation reconstructed from the term definitions, is as follows:

$$C \frac{dT_{in}(t)}{dt} = \frac{T_{out}(t) - T_{in}(t)}{R} - Q_{ac}(t) + Q_{d}(t)$$

where $dT_{in}(t)/dt$ is the rate of change of indoor temperature in the building complex; $C$ is the equivalent heat capacity of the building envelope; $R$ is the equivalent thermal resistance of the building envelope; $T_{in}(t)$ and $T_{out}(t)$ are the indoor temperature and the outdoor temperature at time t, respectively; $Q_{ac}(t)$ is the real-time cooling or heating power of the air conditioning system (its sign is reversed for heating); and $Q_{d}(t)$ is the random thermal disturbance caused by indoor occupants, electrical heat dissipation, and solar radiation.

[0022] The Predicted Mean Vote (PMV) metric, referred to in this application as the predicted average evaluation, is used to characterize users' subjective thermal comfort. It is a function of six quantities:

$$PMV = f\left(T_{in}, T_{r}, v_{a}, \varphi, M, I_{cl}\right)$$

where $PMV$ is the predicted average evaluation; $T_{in}$ is the indoor temperature; $T_{r}$ is the average radiant temperature of the surfaces of the enclosure structure (such as walls, windows, and ceilings) surrounding the human body; $v_{a}$ is the speed of airflow around the human body; $\varphi$ is the relative humidity, which affects the evaporation efficiency of human sweat (excessively high humidity inhibits sweating and heat dissipation, making people feel stuffy and hot); $M$ is the human metabolic rate; and $I_{cl}$ is the thermal resistance of clothing, that is, the degree of warmth provided by the clothing worn. The corresponding predicted average evaluation constraint is usually set to $-0.5 \le PMV \le 0.5$, the band in which occupants feel neither cold nor hot and adjustments go unnoticed, but it can also be adjusted according to the actual situation.

[0023] Furthermore, in order to establish a quantitative relationship between air conditioning power consumption and indoor thermal evolution, an air conditioning energy efficiency conversion model is constructed. The real-time cooling / heating capacity of the air conditioning system and its consumed active power satisfy the following mapping relationship, written here in notation reconstructed from the term definitions:

$$Q_{ac}(t) = \eta \cdot EER(t) \cdot P_{ac}(t)$$

where $Q_{ac}(t)$ is the real-time cooling or heating capacity of the air conditioning system; $\eta$ is the overall efficiency coefficient of the air conditioning system; $EER(t)$ is the energy efficiency ratio of the air conditioning system at time t; and $P_{ac}(t)$ is the active power absorbed by the air conditioning system from the power grid at time t.

[0024] In this embodiment, the models from the preceding steps are integrated and engineered to obtain a simplified temperature dynamic evolution model and a simplified predicted average evaluation. Specifically, the temperature dynamic evolution model and the air conditioning energy efficiency conversion model are integrated, and the building's indoor environment is treated as a first-order "resistance R - capacitance C" circuit model, in which temperature is equivalent to voltage and heat flow is equivalent to current. The corresponding differential equation, written here for the cooling case in notation reconstructed from the term definitions, is:

$$C \frac{dT_{in}(t)}{dt} = \frac{T_{out}(t) - T_{in}(t)}{R} + Q_{d}(t) - \eta \cdot EER(t) \cdot P_{ac}(t)$$

where $C$ is the equivalent heat capacity of indoor air; $R$ is the equivalent thermal resistance of the building envelope; $T_{in}(t)$ and $T_{out}(t)$ are the indoor temperature and the outdoor temperature at time t, respectively; $Q_{d}(t)$ is the random thermal disturbance caused by indoor occupants, electrical heat dissipation, and solar radiation, consistent with the definition above; $\eta$ is the overall efficiency coefficient of the air conditioning system; $EER(t)$ is the energy efficiency ratio of the air conditioning system at time t, as in the energy efficiency conversion model; and $P_{ac}(t)$ is the active power absorbed by the air conditioning system from the power grid at time t.
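
As an illustration of the integrated resistance-capacitance model, the sketch below advances the indoor temperature by one forward-Euler step. It assumes the cooling case (the air conditioner removes heat), a capacity equal to efficiency coefficient times energy efficiency ratio times active power, and units chosen here for convenience (kWh/K, K/kW, kW, hours); the function and parameter names are hypothetical.

```python
def cooling_capacity_kw(p_ac_kw, eer, eta=1.0):
    """Energy efficiency conversion: active power -> cooling capacity."""
    return eta * eer * p_ac_kw


def step_indoor_temp(t_in, t_out, p_ac_kw, q_dist_kw, *,
                     c_kwh_per_k, r_k_per_kw, eer, eta=1.0, dt_h=0.25):
    """One forward-Euler step of the first-order RC thermal model."""
    envelope_gain_kw = (t_out - t_in) / r_k_per_kw   # heat through the envelope
    q_cool_kw = cooling_capacity_kw(p_ac_kw, eer, eta)
    d_temp_per_h = (envelope_gain_kw + q_dist_kw - q_cool_kw) / c_kwh_per_k
    return t_in + dt_h * d_temp_per_h


# With the AC off, a 26 degC room warms up on a 32 degC day.
warmer = step_indoor_temp(26.0, 32.0, 0.0, 1.0,
                          c_kwh_per_k=2.0, r_k_per_kw=2.0, eer=4.0, dt_h=0.5)
```

When the cooling capacity exactly matches the envelope gain plus the internal disturbance, the temperature stays constant, which is the operating point from which flexible power adjustments are measured.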

[0025] To reduce computational complexity, the predicted average evaluation can be simplified. The nonlinear relationship between the predicted average evaluation and indoor temperature is given by a formula (not reproduced in this text) in terms of the following quantities: the predicted average evaluation; the indoor temperature at time t; the indoor relative humidity at time t, consistent with the definition above; the human metabolic rate; the thermal resistance of clothing; the saturated water vapor partial pressure corresponding to the indoor temperature at time t; and three sensitivity coefficients obtained by linearizing the predicted average evaluation under specific working conditions (fixed metabolic rate and clothing thermal resistance).

[0026] Specifically, the establishment of a temperature dynamic evolution model to characterize the evolution of indoor building temperature, the construction of an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of indoor building temperature, and the determination of predictive average evaluation constraints based on indoor building temperature, humidity, and human activity parameters include: abstracting the thermal inertia and heat storage capacity of the building envelope into equivalent heat capacity and equivalent thermal resistance, and constructing a temperature dynamic evolution model based on the equivalent heat capacity, equivalent thermal resistance, indoor temperature, and outdoor temperature of the building complex; determining the quantitative mapping relationship between the active power of air conditioning and the actual cooling / heating capacity of the building interior based on the air conditioning energy efficiency ratio and the air conditioning comprehensive efficiency coefficient, and constructing an air conditioning energy efficiency conversion model based on the quantitative mapping relationship; determining the predictive average evaluation based on the indoor building temperature, humidity, human metabolic rate corresponding to personnel, thermal resistance of personnel clothing, average radiant temperature of the building envelope, and indoor air velocity; and defining the range interval of the predictive average evaluation to obtain the predictive average evaluation constraints.

[0027] In this embodiment, to eliminate the impact of random thermal disturbances such as indoor occupancy fluctuations and solar radiation on prediction accuracy, a state extension observer (commonly called an extended state observer) is introduced. It extends the unknown heat flux disturbance term into an additional state variable of the system. The observer equation (not reproduced in this text) is written in terms of the following quantities: the indoor temperature estimated by the observer; the actual indoor and outdoor temperatures at time t; the estimated real-time equivalent heat flux disturbance (including illumination, human body heat dissipation, etc.); the time derivative of the estimated equivalent heat flux disturbance; two preset observer gain coefficients, which determine the convergence speed; a nonlinear function used to improve noise suppression and observation sensitivity; the time derivative of the estimated indoor temperature; the equivalent thermal resistance of the building envelope; the overall efficiency coefficient of the air conditioning system; and the active power absorbed by the air conditioning system from the power grid at time t.
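
Because the observer equation itself is not available here, the sketch below uses a generic second-order extended state observer as a stand-in rather than the formula of the application. It assumes the cooling-case resistance-capacitance model, adds the equivalent heat capacity `c` as a parameter, and uses the `fal` function common in active disturbance rejection control as the nonlinear function; the gains, `alpha`, `delta`, and all names are hypothetical.

```python
import math


def fal(e, alpha, delta):
    """Nonlinear gain: linear for |e| <= delta, fractional power beyond it."""
    if abs(e) <= delta:
        return e / delta ** (1.0 - alpha)
    return math.copysign(abs(e) ** alpha, e)


def eso_step(t_hat, w_hat, t_meas, t_out, p_ac_kw, *, c, r, eer, eta=1.0,
             beta1=20.0, beta2=100.0, alpha=0.5, delta=0.1, dt_h=0.01):
    """One Euler step of the observer; returns updated (t_hat, w_hat)."""
    e = t_hat - t_meas                                   # estimation error
    known_heat_kw = (t_out - t_meas) / r - eta * eer * p_ac_kw
    d_t_hat = (known_heat_kw + w_hat) / c - beta1 * e    # temperature estimate
    d_w_hat = -beta2 * fal(e, alpha, delta)              # disturbance estimate
    return t_hat + dt_h * d_t_hat, w_hat + dt_h * d_w_hat


# Track a constant 1.5 kW unknown heat gain in a simulated room.
t_in, t_hat, w_hat = 26.0, 26.0, 0.0
for _ in range(1000):
    t_hat, w_hat = eso_step(t_hat, w_hat, t_in, 32.0, 1.0, c=2.0, r=2.0, eer=4.0)
    t_in += 0.01 * ((32.0 - t_in) / 2.0 - 4.0 * 1.0 + 1.5) / 2.0
```

In this simulated run the disturbance estimate `w_hat` settles close to the true 1.5 kW, which is the quantity the simulation environment and the state vector would then consume.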

[0028] It is understandable that, even if the predicted average evaluation constraint keeps the predicted average evaluation within a given range, users will still perceive a significant temperature change if the predicted average evaluation varies sharply within that range over a short period. Therefore, a comfort entropy increase is defined, and its rate of change is used to quantify how abruptly user thermal comfort evolves. The comfort entropy growth rate is given by [equation], whose terms are, in order: the comfort entropy growth rate; the predicted average evaluation at time t; and the maximum allowable value of the predicted average evaluation within its range. The smaller the comfort entropy growth rate, the more gradual the change in the thermal environment and the less likely the user is to perceive the adjustment of the air conditioning output. The comfort entropy growth rate constraint is constructed by requiring the comfort entropy growth rate not to exceed a target entropy growth rate threshold.

[0029] Specifically, the construction of a state-extended observer for compensating for random heat dissipation disturbances within the building, and the determination of a comfort entropy growth rate constraint characterizing the severity of user comfort evolution, includes: determining the random heat dissipation disturbances within the building as equivalent heat flux disturbances, and determining the indoor temperature estimated by the observer as the estimated indoor temperature; constructing a state-extended observer based on the estimated indoor temperature, the indoor temperature, the equivalent heat flux disturbance, and preset observer gain coefficients and nonlinear functions; determining the comfort entropy growth rate characterizing the severity of user comfort evolution, and constructing a comfort entropy growth rate constraint based on the comfort entropy growth rate; the comfort entropy growth rate constraint is that the comfort entropy growth rate is not greater than a target entropy growth rate threshold.
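
The rate constraint can be sketched as a simple check. The rate formula below is a hypothetical stand-in (the normalized absolute change of the predicted average evaluation per hour), because the application's own formula is not reproduced in this text; only the "not greater than a threshold" test comes from the description above.

```python
def comfort_entropy_growth_rate(pmv_now: float, pmv_prev: float,
                                pmv_limit: float, dt_h: float) -> float:
    """Hypothetical proxy: change in predicted average evaluation per hour,
    normalized by its maximum allowable value (NOT the application's formula)."""
    return abs(pmv_now - pmv_prev) / (pmv_limit * dt_h)


def satisfies_rate_constraint(rate: float, threshold: float) -> bool:
    """Constraint from the text: the growth rate must not exceed the threshold."""
    return rate <= threshold


rate = comfort_entropy_growth_rate(pmv_now=0.3, pmv_prev=0.2, pmv_limit=0.5, dt_h=0.25)
ok = satisfies_rate_constraint(rate, threshold=1.0)
```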

[0030] Step S12: Construct a target simulation environment based on the temperature dynamic evolution model, the predicted average evaluation constraint, the air conditioning energy efficiency conversion model, the state extension observer, and the comfort entropy growth rate constraint. Construct a state vector using current environmental parameters, electricity price information, and user activity. Determine the flexible adjustment potential of the air conditioning load based on the predicted average evaluation constraint.

[0031] In this embodiment, a target simulation environment is constructed based on the aforementioned models and constraints, and the flexible adjustment potential of the air conditioning load is defined by [equation], whose terms are, in order: the flexible adjustment potential; the air conditioning power adjustment trajectory within the future prediction period; the predicted average evaluation at each time within the prediction period; and the upper and lower limits of the range of the predicted average evaluation.

[0032] Understandably, the flexible potential mining problem of the air conditioning load is transformed into one that reinforcement learning can solve by modeling it as a Markov decision process. First, the state space, i.e., the state vector, is defined by [equation], whose terms are, in order: the state vector at time t; the actual indoor and outdoor temperatures at time t; the real-time electricity price or grid demand response signal at time t, which guides the agent to adjust the air conditioning power when the electricity price is high or the grid requires peak shaving, thereby maximizing economic benefit; the predicted average evaluation at time t; the random thermal disturbance caused by indoor occupants, electrical heat dissipation, and solar radiation; and the current position within the daily scheduling cycle.

[0033] Specifically, the step of constructing a state vector using current environmental parameters, electricity price information, and user activity, and determining the flexible adjustment potential of air conditioning load based on the predicted average evaluation constraint, includes: constructing current environmental parameters based on the equivalent heat flux disturbance, the indoor and outdoor temperatures of the building complex, and the predicted average evaluation; constructing electricity price information using real-time electricity prices and grid peak-shaving demand response signals, and constructing a state vector based on the current environmental parameters, electricity price information, and user activity; and determining the flexible adjustment potential of air conditioning load based on the air conditioning power adjustment trajectory that satisfies the predicted average evaluation constraint within a preset time period.
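
The state vector described above can be assembled as in the following sketch. The field names, the 0-to-1 encoding of the position in the daily cycle, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, astuple
from typing import Tuple


@dataclass(frozen=True)
class DispatchState:
    t_in: float            # actual indoor temperature, degC
    t_out: float           # actual outdoor temperature, degC
    price: float           # real-time price or demand response signal
    pmv: float             # predicted average evaluation
    disturbance_kw: float  # equivalent heat flux disturbance (e.g. observer output)
    day_phase: float       # position within the daily scheduling cycle, 0..1


def build_state(t_in: float, t_out: float, price: float, pmv: float,
                disturbance_kw: float, step_index: int,
                steps_per_day: int) -> Tuple[float, ...]:
    """Flatten the environment, price, and time information into one state tuple."""
    day_phase = (step_index % steps_per_day) / steps_per_day
    return astuple(DispatchState(t_in, t_out, price, pmv, disturbance_kw, day_phase))


state = build_state(26.0, 34.0, 0.8, 0.2, 1.5, step_index=100, steps_per_day=96)
```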

[0034] Step S13: Determine a multi-objective reward function to balance user comfort and the depth of flexible adjustment potential mining, add random disturbance noise to the target simulation environment, and perform maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential and the multi-objective reward function to obtain the air conditioning power adjustment boundary.

[0035] In this embodiment, a multi-objective reward function is constructed to balance the strictness of the physical constraints, the steady-state quality of thermal comfort, and the depth to which the flexible potential is explored. The macroscopic initial reward function is given by [equation], whose terms are, in order: the macroscopic initial reward function; the energy consumption cost of the air conditioning system at time t; the user's tolerable comfort threshold, usually set to 0.5 but adjustable to actual conditions; the penalty factor for exceeding the limit; the volume of the flexible manifold, which represents the size of the exploited flexibility margin (the larger the volume, the greater the adjustable potential of the air conditioner); the predicted average evaluation at time t; and the weighting coefficients for operating cost, over-limit penalty, and flexibility mining, used to balance economy, comfort, and depth of flexibility.

[0036] Understandably, refining the initial reward function described above yields the multi-objective reward function, given by [equation], whose terms are, in order: the multi-objective reward function; the weighting coefficient of the over-limit penalty, whose value is usually much larger than the other coefficients; the indicator function; the upper and lower limits of the range of the predicted average evaluation; the comfort deviation weighting coefficient; the predicted average evaluation; the optimal reference value of the predicted average evaluation, i.e., the target optimal value for user thermal comfort; the flexibility incentive weighting coefficient, which sets the incentive level for exploring air conditioner power regulation; and the air conditioning power adjustment at time t.

[0037] Specifically, determining the multi-objective reward function for balancing user comfort and the depth of flexibility adjustment potential includes: penalizing air conditioner power adjustment trajectories that exceed the range using an indicator function to obtain a first penalty term; penalizing air conditioner power adjustment trajectories that deviate from the target optimal comfort value corresponding to the predicted average evaluation to obtain a second penalty term; exploring the power adjustment range and flexibility margin of the air conditioner within the range using a logarithmic function to obtain an incentive term; and constructing a multi-objective reward function by weighting the first penalty term, the second penalty term, and the incentive term.
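
The three terms described above (indicator penalty, comfort deviation penalty, logarithmic incentive) can be combined as in this sketch. The quadratic form of the deviation penalty, the `log(1 + |x|)` form of the incentive, and every weight value are assumptions; the text specifies only that an indicator function, a deviation penalty, and a logarithmic function are used and weighted.

```python
import math


def multi_objective_reward(pmv: float, delta_p_kw: float, *,
                           pmv_min: float = -0.5, pmv_max: float = 0.5,
                           pmv_target: float = 0.0,
                           w_limit: float = 100.0,    # over-limit weight, much larger than the others
                           w_comfort: float = 1.0,    # comfort deviation weight
                           w_flex: float = 0.1) -> float:  # flexibility incentive weight
    """Weighted sum of two penalty terms and one incentive term."""
    out_of_range = 1.0 if (pmv < pmv_min or pmv > pmv_max) else 0.0   # indicator function
    limit_penalty = w_limit * out_of_range                             # first penalty term
    comfort_penalty = w_comfort * (pmv - pmv_target) ** 2              # second penalty term
    flexibility_incentive = w_flex * math.log(1.0 + abs(delta_p_kw))   # incentive term
    return -limit_penalty - comfort_penalty + flexibility_incentive


r_inside = multi_objective_reward(pmv=0.2, delta_p_kw=1.0)
r_outside = multi_objective_reward(pmv=0.6, delta_p_kw=1.0)
```

With these assumed weights, any trajectory that leaves the allowed range scores far below one that stays inside, while larger power adjustments within the range earn a slowly growing bonus.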

[0038] In this embodiment, the real environment contains many unmodeled random disturbances, such as people entering and leaving, doors and windows opening and closing, sudden weather changes, and electrical appliances starting and stopping. Therefore, random noise is introduced into the simulation environment to simulate this uncertainty, according to [equation], whose terms are, in order: the indoor temperature at time t+1; the indoor temperature at time t; the temperature dynamic evolution model; the real-time cooling or heating power of the air conditioning system; the random thermal disturbance caused by indoor occupants, electrical heat dissipation, and solar radiation; and the random interference noise, which follows a Gaussian (normal) distribution with zero mean and a given variance.
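
The noisy transition can be sketched as a deterministic model step plus a zero-mean Gaussian sample. The first-order RC form of the model, the parameter values, and the standard deviation `sigma` are assumptions for illustration.

```python
import random


def noisy_step(t_in: float, t_out: float, q_ac_kw: float, disturbance_kw: float,
               dt_h: float, rng: random.Random, *,
               r: float = 2.0, c: float = 2.0, sigma: float = 0.05) -> float:
    """Next indoor temperature = model prediction + Gaussian noise N(0, sigma^2).

    q_ac_kw is the delivered cooling power; disturbance_kw is the random
    thermal disturbance from occupants, appliances, and solar radiation.
    """
    drift = ((t_out - t_in) / r - q_ac_kw + disturbance_kw) / c
    return t_in + dt_h * drift + rng.gauss(0.0, sigma)


rng = random.Random(42)  # seeded so that simulation runs are reproducible
t_next = noisy_step(26.0, 34.0, q_ac_kw=2.7, disturbance_kw=0.5, dt_h=0.25, rng=rng)
```

Passing the random generator explicitly keeps training episodes reproducible while still exposing the agent to disturbance patterns it has not seen.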

[0039] Understandably, the goal of ordinary reinforcement learning is to find a single optimal policy that maximizes cumulative reward. The core requirement here, however, is not a single optimal adjustment scheme but a characterization of the entire high-dimensional flexible manifold, that is, all feasible power trajectories that satisfy the constraints, which requires wide-area exploration of the action space. The reinforcement learning framework adopted in this invention is therefore not simple policy optimization; its mathematical essence is solving a constrained Markov decision process (MDP). To address the exploration efficiency of the air conditioning load under complex disturbances, maximum entropy reinforcement learning is introduced, and the objective function is rewritten as [equation], whose terms are, in order: the objective function of the policy; the expectation operator; the state vector; the action space; the distribution of state-action pairs generated by the policy; the multi-objective reward function; the entropy temperature coefficient in the objective function; and the Shannon entropy, which represents the degree of exploration over the entire decision-making process.
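
The terms listed above match the objective commonly used in maximum entropy reinforcement learning (for example, in soft actor-critic). As a reference point, and assuming the application's formula takes this standard form, it can be written as follows; the symbols \pi, \rho_\pi, r, \alpha, and \mathcal{H} are notation chosen here rather than taken from the source.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left( \pi(\cdot \mid s_t) \right) \right],
\qquad
\mathcal{H}\!\left( \pi(\cdot \mid s_t) \right)
  = - \mathbb{E}_{a \sim \pi(\cdot \mid s_t)} \left[ \log \pi(a \mid s_t) \right]
```

A larger \alpha rewards policies that spread probability over more actions, which is what lets the agent map out the whole set of feasible power trajectories rather than a single one.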

[0040] Furthermore, once the trained target policy network has been obtained through reinforcement learning, it is used to infer in real time the extreme values of power by which the air conditioning load can be instantly increased or decreased under the current environmental conditions without the predicted average evaluation leaving its range. These extreme values are the upper and lower boundaries of the air conditioning flexibility, given by [equations], whose terms are, in order: the upper and lower limits of the air conditioner power adjustment; the real-time cooling or heating power of the air conditioning system; the indoor temperature; the upper and lower limits of the range of the predicted average evaluation; the change in indoor temperature caused by the power adjustment; and the predicted average evaluation.

[0041] Specifically, the step of adding random interference noise to the target simulation environment and performing maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain the air conditioning power adjustment boundary includes: adding random interference noise that satisfies a Gaussian distribution to the target simulation environment; adding Shannon entropy to the objective function of reinforcement learning; and performing maximum entropy reinforcement learning in combination with the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain a trained target policy network; and determining the air conditioning power adjustment boundary based on the target policy network.
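
The meaning of the power adjustment boundary can be illustrated without a trained network. The sketch below replaces policy inference with a plain grid scan over candidate adjustments, which is a substitute technique and not the application's method; the comfort proxy `pmv_of_temp`, the temperature response `temp_response`, and the rated power are hypothetical.

```python
from typing import Callable, Tuple


def power_adjustment_bounds(p_now_kw: float, t_in: float,
                            pmv_of_temp: Callable[[float], float],
                            temp_response: Callable[[float], float], *,
                            pmv_min: float = -0.5, pmv_max: float = 0.5,
                            p_rated_kw: float = 5.0,
                            step_kw: float = 0.5) -> Tuple[float, float]:
    """Return (lowest, highest) feasible power adjustment in kW.

    An adjustment is feasible if the resulting power stays within the rated
    range and the resulting predicted average evaluation stays within limits.
    """
    n_steps = int(round(p_rated_kw / step_kw))
    feasible = []
    for k in range(-n_steps, n_steps + 1):
        delta_p = k * step_kw
        if not 0.0 <= p_now_kw + delta_p <= p_rated_kw:
            continue  # outside what the equipment can deliver
        pmv = pmv_of_temp(t_in + temp_response(delta_p))
        if pmv_min <= pmv <= pmv_max:
            feasible.append(delta_p)
    if not feasible:
        return 0.0, 0.0  # no room to adjust: hold the current power
    return min(feasible), max(feasible)


# Hypothetical proxies: comfort is neutral at 25 degC; each extra kW cools by 0.5 degC.
bounds = power_adjustment_bounds(
    p_now_kw=2.0, t_in=26.5,
    pmv_of_temp=lambda temp: 0.25 * (temp - 25.0),
    temp_response=lambda delta_p: -0.5 * delta_p,
)
```

In this example the lower bound is set by comfort (reducing power further would overheat the room) and the upper bound by the rated power of the unit.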

[0042] Step S14: Construct a target optimization function with the goal of minimizing the total operating cost of the building park. Determine the target air conditioning adjustment command based on the target optimization function and the air conditioning power adjustment boundary to complete the corresponding air conditioning load flexibility potential mining operation. Perform energy scheduling of the building park based on the air conditioning load flexibility potential mining results.

[0043] In this embodiment, to standardize the "virtual energy storage" characteristic of building thermal inertia, a virtual state of charge, analogous to the state of charge (SOC) of a real battery, is introduced to quantify the degree to which building thermal inertia has accumulated (i.e., the charge and discharge depth of the virtual energy storage). It is given by [equation], whose terms are, in order: the virtual state of charge of the building at time t; the indoor temperature at time t; and the upper and lower indoor temperature limits corresponding to the range of the predicted average evaluation. Furthermore, the virtual state of charge builds on the definition of a dissipative system: by constructing a storage function for building thermal inertia, the energy storage characteristic of building thermal inertia is demonstrated theoretically, according to [equation], whose terms are, in order: the storage function of building thermal inertia; the rate of change of the energy stored in the building; the indoor temperature; the outdoor temperature; the input electrical power of the air conditioner at time t; the energy efficiency coefficient of the air conditioner; and the building heat loss function.

[0044] Understandably, the integral form of the building's virtual state of charge reflects more intuitively its mathematical essence as a normalized residual potential energy. It is given by [equation], whose terms are, in order: the integral form of the virtual state of charge; the equivalent heat capacity of the indoor air; the indoor temperature at time t; and the upper and lower indoor temperature limits corresponding to the range of the predicted average evaluation. Through this integral mapping, the power regulation of the air conditioning system is transformed into the momentum exchange of a quasi-Hamiltonian system, so that the building load exhibits linear power-frequency characteristics consistent with those of a real battery when participating in grid frequency regulation, which significantly improves the solvability and stability of park scheduling.
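
A virtual state of charge of this kind can be sketched as a normalized position of the indoor temperature between its comfort limits. The linear form and the cooling-season convention (a pre-cooled building counts as fully charged) are assumptions; the application's exact formula is not reproduced in this text.

```python
def virtual_soc(t_in: float, t_min: float, t_max: float) -> float:
    """Normalized virtual state of charge in [0, 1], cooling-season convention.

    t_in == t_min -> 1.0 (maximum stored cooling)
    t_in == t_max -> 0.0 (no cooling reserve left)
    """
    soc = (t_max - t_in) / (t_max - t_min)
    return min(1.0, max(0.0, soc))  # clamp readings outside the comfort band


soc = virtual_soc(t_in=24.0, t_min=23.0, t_max=27.0)
```

For a heating season the convention would be mirrored, with a warm building counting as charged.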

[0045] Furthermore, taking the air conditioning power adjustment boundary as a hard physical constraint and exploiting the virtual energy storage characteristic of the air conditioning, the optimal scheduling strategy is solved by coordinating the fixed energy storage and the air conditioning flexibility according to real-time electricity prices or grid peak-shaving demand while ensuring user comfort, thereby minimizing the park's operating cost. The objective optimization function is given by [equation], whose terms are, in order: the total scheduling cycle of the dispatch center; the electricity price at time t; the uncontrollable load of the park, i.e., rigid electrical loads other than air conditioning and energy storage, such as lighting and office equipment, that cannot be adjusted through dispatching; the power consumption of the air conditioning system at time t; the charging power at time t of the energy storage battery (the abstraction of the building park's thermal inertia); the discharge power of the energy storage battery at time t; and the predicted photovoltaic power output at time t.

[0046] Furthermore, the objective function described above is subject to several constraints, given by [equations], whose terms are, in order: the active power absorbed by the air conditioning system from the power grid at time t; the flexible adjustment potential of the air conditioning load, i.e., the set of powers within the corresponding air conditioner power adjustment boundary; the state of charge of the energy storage battery at time t; the charging efficiency of the energy storage battery; the discharge efficiency of the energy storage battery; the charging power of the energy storage battery at time t; the discharge power of the energy storage battery at time t; the rated capacity of the energy storage battery; the scheduling time step; the minimum and maximum power of the energy storage battery; the charging and discharging power of the energy storage battery; and the minimum and maximum values of the safe range for the state of charge of the energy storage battery.

[0047] Specifically, the step of constructing a target optimization function with the objective of minimizing the total operating cost of the building park, and determining the target air conditioning adjustment command based on the target optimization function and the air conditioning power adjustment boundary, includes: abstracting the thermal inertia of the building park as an energy storage battery; constructing a target optimization function with the objective of minimizing the total operating cost of the building park, and determining the power balance constraint, air conditioning comfort constraint, and energy storage battery constraint of the target optimization function; and solving the target optimization function based on the air conditioning power adjustment boundary and each constraint to obtain the target air conditioning adjustment command; wherein the power balance constraint is that the total electricity load of the building park equals the sum of the electricity purchased from the grid, the photovoltaic power generation, and the energy storage discharge; the air conditioning comfort constraint is that the air conditioning power lies within the air conditioning power adjustment boundary; and the energy storage battery constraint is that the charging and discharging power of the energy storage battery lies within a preset safety range.
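
The cost objective and the three constraint families above can be checked for a candidate schedule as in this sketch. It evaluates a given schedule rather than solving the optimization, and the function name, the state of charge update, and all numbers are assumptions consistent with the description rather than the application's exact formulas.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Battery:
    capacity_kwh: float
    eta_charge: float
    eta_discharge: float
    p_max_kw: float
    soc_min: float
    soc_max: float


def evaluate_schedule(prices: Sequence[float], p_fixed: Sequence[float],
                      p_pv: Sequence[float], p_ac: Sequence[float],
                      p_charge: Sequence[float], p_discharge: Sequence[float],
                      ac_bounds: Sequence[Tuple[float, float]],
                      battery: Battery, soc0: float, dt_h: float) -> Optional[float]:
    """Return the total purchase cost, or None if any constraint is violated."""
    soc, cost = soc0, 0.0
    for t in range(len(prices)):
        lo, hi = ac_bounds[t]
        if not lo <= p_ac[t] <= hi:                      # air conditioning comfort constraint
            return None
        if not (0.0 <= p_charge[t] <= battery.p_max_kw
                and 0.0 <= p_discharge[t] <= battery.p_max_kw):
            return None                                   # battery power constraint
        soc += (battery.eta_charge * p_charge[t]
                - p_discharge[t] / battery.eta_discharge) * dt_h / battery.capacity_kwh
        if not battery.soc_min <= soc <= battery.soc_max:
            return None                                   # battery state of charge constraint
        # Power balance: load + charging = grid purchase + PV + discharge.
        p_grid = p_fixed[t] + p_ac[t] + p_charge[t] - p_discharge[t] - p_pv[t]
        cost += prices[t] * p_grid * dt_h
    return cost


battery = Battery(capacity_kwh=10.0, eta_charge=0.9, eta_discharge=0.9,
                  p_max_kw=3.0, soc_min=0.1, soc_max=0.9)
cost = evaluate_schedule(prices=[0.5, 1.0], p_fixed=[10.0, 10.0], p_pv=[0.0, 2.0],
                         p_ac=[4.0, 2.0], p_charge=[2.0, 0.0], p_discharge=[0.0, 1.0],
                         ac_bounds=[(1.0, 5.0), (1.0, 5.0)],
                         battery=battery, soc0=0.5, dt_h=1.0)
```

The example schedule pre-cools and charges in the cheap first hour, then reduces air conditioning power and discharges in the expensive second hour. A solver such as a linear or mixed-integer program would search over schedules using the same cost and constraints.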

[0048] Understandably, the objective optimization function is solved based on the aforementioned constraints to obtain the target air conditioning adjustment command. The operating frequency is then adjusted based on this command, and the indoor temperature is fed back in real time. If extreme disturbances cause the temperature to deviate from the comfort range, the air conditioning power adjustment boundary will dynamically shrink in real time, and the air conditioning command will be adjusted synchronously. This closed-loop feedback ensures that the user experience remains within the comfort range. By predicting the future decrease in air conditioning flexibility through the virtual state of charge, heat / cooling capacity can be stored in advance: for example, cooling / heating during periods of low electricity prices to increase the building's energy storage level; and releasing stored energy during peak electricity prices / grid shaving periods to reduce air conditioning electricity consumption, achieving synergy between cost optimization and grid response.

[0049] As can be seen from the above, this application accurately depicts the thermal inertia and temperature change patterns of buildings through a temperature dynamic evolution model, achieves accurate conversion between power consumption and indoor temperature through an air conditioning energy efficiency conversion model, defines a safe red line for user comfort through predictive average evaluation constraints, eliminates prediction errors caused by random heat dissipation disturbances through a state extension observer, and eliminates sensory shocks caused by sudden temperature changes through comfort entropy growth rate constraints. Based on the above constraints, a simulation environment is constructed. By constructing a state vector and clarifying the flexible adjustment potential of air conditioning, and combining a multi-objective reward function and random interference noise for maximum entropy reinforcement learning, the air conditioning power adjustment boundary is obtained. In this way, an optimization function is constructed with the goal of minimizing the total operating cost of the park. The power adjustment boundary is used as a hard constraint to solve the optimization function to obtain the target air conditioning adjustment command. Under the premise of not affecting the user experience, the cost-optimal air conditioning adjustment command adapted to the grid demand is automatically generated, improving the depth and accuracy of flexible load mining and effectively improving energy utilization efficiency.

[0050] As can be seen from the above embodiments, this application constructs an optimization function based on the total operating cost of the park in order to obtain the air conditioning adjustment command with the optimal operating cost and adapted to the grid demand target. Therefore, the process of constructing an optimization function based on the total operating cost of the park is described.

[0051] With reference to Figures 2 to 4, this application discloses a specific building park energy dispatching method based on air conditioning load flexible potential mining. In this embodiment, Figure 2 is a schematic diagram of the control potential of a building air conditioning system that takes user comfort into account. The quantities marked in the figure are, in order: the indoor temperature interval, which defines the full temperature range for air conditioning adjustment and the red line of the user comfort constraint; the indoor temperature set point, corresponding to the user's comfort temperature; the temperature fluctuation range during normal start-stop operation of the air conditioner; the boundary of the comfort zone, a transitional area for seamless adjustment; and the upper and lower temperature limits, which correspond to the range of the predicted average evaluation. The flexible region in Figure 2 shows how the admissible temperature band varies over time, with the horizontal axis representing the scheduling time t and the vertical axis representing the indoor temperature. If the indoor temperature stays within this flexible region, user comfort is satisfied. Figure 2 also shows the energy storage function corresponding to the building's thermal inertia, with the horizontal axis representing the scheduling time t and the vertical axis representing the thermal / cold energy E stored in the building, together with the upper and lower limits of building energy storage. The building envelope and indoor air have heat capacity and can therefore store and release cold / heat energy like a battery.

[0052] Figure 3 is a schematic diagram of the flexible adjustment potential of the air conditioning load. The method takes as input the grid status (i.e., real-time grid operation data), building user demand analysis data (i.e., users' thermal comfort needs), and HVAC consumption (i.e., real-time HVAC operation data). A grid demand interaction module transmits the grid status data to an energy management system, which then constructs a state vector. From this vector, a reinforcement learning agent outputs air conditioning power adjustment actions, i.e., the air conditioning power adjustment boundaries. Based on these boundaries, a target air conditioning adjustment command is constructed. While this command is executed, the indoor temperature is consistently maintained within the comfort range, achieving "seamless adjustment" for the user.

[0053] Figure 4 illustrates the dynamic flexible boundary of the air conditioning load and the park-wide scheduling response. The demand curve is the grid demand curve, with time on the horizontal axis and the output power of the HVAC system on the vertical axis. During peak periods of grid demand, air conditioning power must be reduced to achieve peak shaving; conversely, during off-peak periods, air conditioning power must be increased to achieve valley filling. The air conditioning output power thus follows the grid demand curve, achieving efficient coordination between air conditioning load and grid demand. Reinforcement learning is then used to implement the air conditioning load response, outputting optimal air conditioning power adjustment commands based on the real-time status of the air conditioning and the grid targets. During off-peak periods, air conditioning power is increased, boosting cooling / heating output and storing the excess cooling / heat in the building envelope, which brings the indoor temperature closer to the comfort boundary. During peak periods, air conditioning power is reduced and the cooling / heat stored in the building is released, keeping the indoor temperature within the comfort range and reducing air conditioning electricity consumption without causing user discomfort.

[0054] As can be seen from the above, this application constructs a state vector and clarifies the flexible adjustment potential of air conditioning. It combines a multi-objective reward function and random interference noise to perform maximum entropy reinforcement learning, thereby obtaining the air conditioning power adjustment boundary. Based on the power adjustment boundary, the target air conditioning adjustment command is determined. This method is applicable to the air conditioning load mining requirements of different climate regions and different building types. While ensuring the quality of life of users, it significantly improves the energy flexibility utilization efficiency of building parks.

[0055] Accordingly, as shown in Figure 5, this application also provides a building park energy dispatching device based on air conditioning load flexible potential mining, comprising:
The constraint determination module 11, used to establish a temperature dynamic evolution model to characterize the evolution of indoor temperature in a building, construct an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of indoor temperature in the building, determine the predicted average evaluation constraint based on the temperature, humidity and human activity parameters in the building, construct a state extension observer to compensate for random heat dissipation disturbances in the building, and determine the comfort entropy growth rate constraint to characterize the severity of the evolution of user comfort;
The adjustment potential determination module 12, used to construct a target simulation environment based on the temperature dynamic evolution model, the predicted average evaluation constraint, the air conditioning energy efficiency conversion model, the state extension observer, and the comfort entropy growth rate constraint, construct a state vector using current environmental parameters, electricity price information, and user activity, and determine the flexible adjustment potential of the air conditioning load based on the predicted average evaluation constraint;
The adjustment boundary determination module 13, used to determine a multi-objective reward function for balancing user comfort and the depth of flexible adjustment potential mining, add random interference noise to the target simulation environment, and perform maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential and the multi-objective reward function to obtain the air conditioning power adjustment boundary;
The adjustment instruction determination module 14, used to construct a target optimization function with the goal of minimizing the total operating cost of the building park, determine the target air conditioning adjustment instruction based on the target optimization function and the air conditioning power adjustment boundary, complete the corresponding air conditioning load flexibility potential mining operation, and perform energy scheduling of the building park based on the air conditioning load flexibility potential mining result.

[0056] In some specific embodiments, the constraint determination module 11 may specifically include: The evolution model construction unit is used to abstract the thermal inertia and heat storage capacity of the building envelope into equivalent heat capacity and equivalent thermal resistance, and to construct a temperature dynamic evolution model based on the equivalent heat capacity, equivalent thermal resistance, indoor temperature and outdoor temperature of the building park. The conversion model construction unit is used to determine the quantitative mapping relationship between the active power of the air conditioner and the actual cooling / heating capacity of the building based on the air conditioner energy efficiency ratio and the air conditioner comprehensive efficiency coefficient, and to construct the air conditioner energy efficiency conversion model based on the quantitative mapping relationship. The average evaluation determination unit is used to determine the predicted average evaluation based on the temperature and humidity inside the building, the human metabolic rate of the personnel, the thermal resistance of the personnel's clothing, the average radiant temperature of the building envelope, and the indoor air velocity. The evaluation constraint determination unit is used to define the range interval of the predicted average evaluation in order to obtain the predicted average evaluation constraint.

[0057] In some specific embodiments, the constraint determination module 11 may specifically include: The estimated temperature determination unit is used to determine the random heat dissipation disturbance in the building interior as an equivalent heat flux disturbance, and to determine the indoor temperature estimated by the observer as the estimated indoor temperature. An observer construction unit is used to construct a state-extended observer based on the estimated indoor temperature, the temperature inside the building, the equivalent heat flux disturbance, and a preset observer gain coefficient and nonlinear function. A rate constraint construction unit is used to determine the comfort entropy growth rate, which characterizes the drastic degree of user comfort evolution, and to construct a comfort entropy growth rate constraint based on the comfort entropy growth rate; the comfort entropy growth rate constraint is that the comfort entropy growth rate is not greater than the target entropy growth rate threshold.

[0058] In some specific embodiments, the adjustment potential determination module 12 may specifically include: An environmental parameter construction unit is used to construct current environmental parameters based on the equivalent heat flux disturbance, the indoor and outdoor temperatures of the building park, and the predicted average evaluation. The state vector construction unit is used to construct electricity price information using real-time electricity price and grid peak shaving demand response signal, and to construct a state vector based on the current environmental parameters, electricity price information and user activity. The flexible potential determination unit is used to determine the flexible adjustment potential of the air conditioning load based on the air conditioning power adjustment trajectory that meets the predicted average evaluation constraint within a preset time period.

[0059] In some specific embodiments, the adjustment boundary determination module 13 may specifically include: a first penalty term determination unit, used to penalize, by means of an indicator function, the air conditioning power adjustment trajectory that exceeds the range interval, so as to obtain the first penalty term; a second penalty term determination unit, used to penalize the air conditioning power adjustment trajectory that deviates from the target optimal comfort value corresponding to the predicted average evaluation, so as to obtain the second penalty term; and a reward function construction unit, used to encourage, by means of a logarithmic function, exploration of the power adjustment range and flexibility margin of the air conditioner within the range interval so as to obtain an incentive term, and to construct the multi-objective reward function by weighting the first penalty term, the second penalty term, and the incentive term.
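
One possible form of the multi-objective reward function is sketched below. The squared deviation used for the second penalty term, the `log(1 + |x|)` form of the incentive term, the weights `w1`, `w2` and `w3`, and the default range interval are illustrative assumptions; this section does not fix the exact expressions.

```python
import math


def reward(pmv, delta_p_kw, pmv_lo=-0.5, pmv_hi=0.5, pmv_opt=0.0,
           w1=10.0, w2=1.0, w3=0.5):
    """Weighted combination of two penalty terms and one incentive term.

    pmv: predicted average evaluation after the adjustment
    delta_p_kw: magnitude of the power adjustment being explored (kW)
    """
    # First penalty: indicator that the comfort range interval is violated.
    out_of_range = 0.0 if pmv_lo <= pmv <= pmv_hi else 1.0
    # Second penalty: deviation from the target optimal comfort value.
    deviation = (pmv - pmv_opt) ** 2
    # Incentive: logarithmic reward for deeper adjustment, granted only
    # while the range interval is respected.
    incentive = math.log1p(abs(delta_p_kw)) if out_of_range == 0.0 else 0.0
    return -w1 * out_of_range - w2 * deviation + w3 * incentive
```

The logarithm makes each additional kilowatt of adjustment worth less than the previous one, so in this sketch the agent is encouraged to widen the adjustment range without being driven to the comfort limit.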

[0060] In some specific embodiments, the adjustment boundary determination module 13 may further include: a reinforcement learning unit, used to add random disturbance noise that follows a Gaussian distribution to the target simulation environment, add a Shannon entropy term to the objective function of reinforcement learning, and perform maximum entropy reinforcement learning in combination with the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain the trained target policy network; and an adjustment boundary determination unit, used to determine the air conditioning power adjustment boundary based on the target policy network.
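
The effect of adding Shannon entropy to the objective can be illustrated on a small discrete action set. The sketch below is a tabular stand-in, not the policy network training described above: it assumes a handful of candidate power levels with given action values `q_values` and a temperature coefficient `alpha`, both hypothetical, and shows that the Boltzmann policy attains the entropy-augmented optimum. The Gaussian noise helper and its `sigma_kw` parameter are likewise assumptions.

```python
import math
import random


def boltzmann_policy(q_values, alpha):
    """Policy proportional to exp(Q / alpha), computed stably."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / alpha) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(q_values, probs, alpha):
    """Expected action value plus alpha times the policy entropy."""
    expected_q = sum(p * q for p, q in zip(probs, q_values))
    return expected_q + alpha * shannon_entropy(probs)


def soft_value(q_values, alpha):
    """Closed-form optimum of soft_objective: alpha * logsumexp(Q / alpha)."""
    q_max = max(q_values)
    return q_max + alpha * math.log(
        sum(math.exp((q - q_max) / alpha) for q in q_values))


def noisy_disturbance(d_kw, sigma_kw, rng):
    """Add zero-mean Gaussian noise to the simulated heat flux disturbance."""
    return d_kw + rng.gauss(0.0, sigma_kw)
```

Because the entropy bonus rewards keeping several actions probable, the resulting policy keeps exploring the edges of the feasible power range instead of collapsing onto a single action, which is the property relied upon when the adjustment boundary is read off from the trained policy.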

[0061] In some specific embodiments, the adjustment command determination module 14 may specifically include: a thermal inertia abstraction unit, used to abstract the thermal inertia of the building park as an energy storage battery; a constraint determination unit, used to construct a target optimization function with the goal of minimizing the total operating cost of the building park, and to determine the power balance constraint, the air conditioning comfort constraint, and the energy storage battery constraint of the target optimization function; and a target instruction determination unit, used to solve the target optimization function based on the air conditioning power adjustment boundary and the above constraints to obtain the target air conditioning adjustment instruction.

[0062] Furthermore, embodiments of this application also disclose an electronic device. Figure 6 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content of the diagram should not be construed as limiting the scope of this application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input / output interface 25, and a communication bus 26. The memory 22 stores a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the building park energy dispatching method based on the flexible potential mining of air conditioning load disclosed in any of the foregoing embodiments. The electronic device 20 in this embodiment may specifically be an electronic computer.

[0063] In this embodiment, the power supply 23 is used to provide operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, and is not specifically limited here; the input / output interface 25 is used to acquire external input data or output data to the outside world, and its specific interface type can be selected according to specific application needs, and is not specifically limited here.

[0064] In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, random access memory, disk or optical disk, etc. The resources stored thereon can include operating system 221, computer program 222, etc., and the storage method can be temporary storage or permanent storage.

[0065] The operating system 221, which may be Windows Server, Netware, Unix, Linux, etc., is used to manage and control the various hardware devices on the electronic device 20 as well as the computer program 222. In addition to a computer program capable of performing the building park energy dispatching method based on the flexible potential mining of air conditioning load disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs capable of performing other specific tasks.

[0066] Furthermore, this application also discloses a computer-readable storage medium for storing a computer program; wherein, when the computer program is executed by a processor, it implements the aforementioned energy dispatching method for building parks based on the flexible potential mining of air conditioning load. Specific steps of this method can be found in the corresponding content disclosed in the foregoing embodiments, and will not be repeated here.

[0067] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments; for identical or similar parts, the embodiments may be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.

[0068] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0069] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

[0070] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0071] The technical solutions provided in this application have been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the above embodiments are only for the purpose of helping to understand the methods and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A building park energy dispatching method based on the flexible potential mining of air conditioning load, characterized in that it comprises: establishing a temperature dynamic evolution model that characterizes the evolution of the indoor temperature of a building, constructing an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of the indoor temperature of the building, and determining a predicted average evaluation constraint based on the indoor temperature, humidity, and human activity parameters of the building; constructing a state extension observer to compensate for random heat dissipation disturbances in the building, and determining a comfort entropy growth rate constraint that characterizes how drastically user comfort evolves; constructing a target simulation environment based on the temperature dynamic evolution model, the predicted average evaluation constraint, the air conditioning energy efficiency conversion model, the state extension observer, and the comfort entropy growth rate constraint; constructing a state vector using current environmental parameters, electricity price information, and user activity, and determining the flexible adjustment potential of the air conditioning load based on the predicted average evaluation constraint; determining a multi-objective reward function for balancing user comfort against the depth of flexible adjustment potential mining; adding random disturbance noise to the target simulation environment, and performing maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain an air conditioning power adjustment boundary; and constructing a target optimization function with the goal of minimizing the total operating cost of the building park, determining a target air conditioning adjustment command based on the target optimization function and the air conditioning power adjustment boundary so as to complete the corresponding air conditioning load flexibility potential mining operation, and performing energy scheduling of the building park based on the air conditioning load flexibility potential mining results.

2. The building park energy dispatching method based on the flexible potential mining of air conditioning load according to claim 1, characterized in that establishing the temperature dynamic evolution model that characterizes the evolution of the indoor temperature of the building, constructing the air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of the indoor temperature of the building, and determining the predicted average evaluation constraint based on the indoor temperature, humidity, and human activity parameters of the building comprise: abstracting the thermal inertia and heat storage capacity of the building envelope into an equivalent heat capacity and an equivalent thermal resistance; constructing the temperature dynamic evolution model based on the equivalent heat capacity, the equivalent thermal resistance, and the indoor and outdoor temperatures of the building park; determining the quantitative mapping relationship between the active power of the air conditioner and the actual cooling / heating capacity of the building based on the air conditioner energy efficiency ratio and the air conditioner comprehensive efficiency coefficient, and constructing the air conditioning energy efficiency conversion model based on the quantitative mapping relationship; determining the predicted average evaluation based on the indoor temperature and humidity of the building, the metabolic rate of the occupants, the thermal resistance of the occupants' clothing, the mean radiant temperature of the building envelope, and the indoor air velocity; and defining the range interval of the predicted average evaluation to obtain the predicted average evaluation constraint.

3. The building park energy dispatching method based on the flexible potential mining of air conditioning load according to claim 2, characterized in that constructing the state extension observer to compensate for random heat dissipation disturbances in the building, and determining the comfort entropy growth rate constraint that characterizes how drastically user comfort evolves, comprise: treating the random heat dissipation disturbance inside the building as an equivalent heat flux disturbance, and taking the indoor temperature estimated by the observer as the estimated indoor temperature; constructing the state extension observer based on the estimated indoor temperature, the indoor temperature of the building, the equivalent heat flux disturbance, and a preset observer gain coefficient and nonlinear function; and determining a comfort entropy growth rate that characterizes how drastically user comfort evolves, and constructing the comfort entropy growth rate constraint based on the comfort entropy growth rate, the constraint being that the comfort entropy growth rate is not greater than a target entropy growth rate threshold.

4. The building park energy dispatching method based on the flexible potential mining of air conditioning load according to claim 3, characterized in that constructing the state vector using the current environmental parameters, electricity price information, and user activity, and determining the flexible adjustment potential of the air conditioning load based on the predicted average evaluation constraint, comprise: constructing the current environmental parameters based on the equivalent heat flux disturbance, the indoor and outdoor temperatures of the building park, and the predicted average evaluation; constructing the electricity price information from the real-time electricity price and the grid peak shaving demand response signal, and constructing the state vector based on the current environmental parameters, the electricity price information, and the user activity; and determining the flexible adjustment potential of the air conditioning load based on the air conditioning power adjustment trajectory that satisfies the predicted average evaluation constraint within a preset time period.

5. The building park energy dispatching method based on the flexible potential mining of air conditioning load according to claim 4, characterized in that determining the multi-objective reward function for balancing user comfort against the depth of flexible adjustment potential mining comprises: penalizing, by means of an indicator function, the air conditioning power adjustment trajectory that exceeds the range interval to obtain a first penalty term; penalizing the air conditioning power adjustment trajectory that deviates from the target optimal comfort value corresponding to the predicted average evaluation to obtain a second penalty term; and encouraging, by means of a logarithmic function, exploration of the power adjustment range and flexibility margin of the air conditioner within the range interval to obtain an incentive term, and constructing the multi-objective reward function as a weighted combination of the first penalty term, the second penalty term, and the incentive term.

6. The building park energy dispatching method based on the flexible potential mining of air conditioning load according to claim 1, characterized in that adding random disturbance noise to the target simulation environment and performing maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain the air conditioning power adjustment boundary comprise: adding random disturbance noise that follows a Gaussian distribution to the target simulation environment, adding a Shannon entropy term to the objective function of reinforcement learning, and performing maximum entropy reinforcement learning in combination with the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain a trained target policy network; and determining the air conditioning power adjustment boundary based on the target policy network.

7. The building park energy dispatching method based on the flexible potential mining of air conditioning load according to claim 1, characterized in that constructing the target optimization function with the goal of minimizing the total operating cost of the building park, and determining the target air conditioning adjustment command based on the target optimization function and the air conditioning power adjustment boundary, comprise: abstracting the thermal inertia of the building park as an energy storage battery; constructing the target optimization function with the goal of minimizing the total operating cost of the building park, and determining the power balance constraint, the air conditioning comfort constraint, and the energy storage battery constraint of the target optimization function; and solving the target optimization function based on the air conditioning power adjustment boundary and the above constraints to obtain the target air conditioning adjustment command; wherein the power balance constraint is that the total power load of the building park is equal to the sum of the power purchased from the grid, the photovoltaic power generation, and the energy storage discharge; the air conditioning comfort constraint is that the air conditioning power is within the air conditioning power adjustment boundary; and the energy storage battery constraint is that the charging and discharging power of the energy storage battery is within a preset safety range.

8. A building park energy dispatching device based on the flexible potential mining of air conditioning load, characterized in that it comprises: a constraint determination module, used to establish a temperature dynamic evolution model that characterizes the evolution of the indoor temperature of a building, construct an air conditioning energy efficiency conversion model using the quantitative relationship between air conditioning power consumption and the evolution of the indoor temperature of the building, determine a predicted average evaluation constraint based on the indoor temperature, humidity, and human activity parameters of the building, construct a state extension observer to compensate for random heat dissipation disturbances in the building, and determine a comfort entropy growth rate constraint that characterizes how drastically user comfort evolves; an adjustment potential determination module, used to construct a target simulation environment based on the temperature dynamic evolution model, the predicted average evaluation constraint, the air conditioning energy efficiency conversion model, the state extension observer, and the comfort entropy growth rate constraint, to construct a state vector using current environmental parameters, electricity price information, and user activity, and to determine the flexible adjustment potential of the air conditioning load based on the predicted average evaluation constraint; an adjustment boundary determination module, used to determine a multi-objective reward function for balancing user comfort against the depth of flexible adjustment potential mining, to add random disturbance noise to the target simulation environment, and to perform maximum entropy reinforcement learning based on the state vector, the flexible adjustment potential, and the multi-objective reward function to obtain an air conditioning power adjustment boundary; and an adjustment instruction determination module, used to construct a target optimization function with the goal of minimizing the total operating cost of the building park, determine a target air conditioning adjustment instruction based on the target optimization function and the air conditioning power adjustment boundary so as to complete the corresponding air conditioning load flexibility potential mining operation, and perform energy scheduling of the building park based on the air conditioning load flexibility potential mining results.

9. An electronic device, characterized in that it comprises: a memory, used to store a computer program; and a processor, used to execute the computer program to implement the building park energy dispatching method based on the flexible potential mining of air conditioning load as described in any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it is used to store a computer program, wherein the computer program, when executed by a processor, implements the building park energy dispatching method based on the flexible potential mining of air conditioning load as described in any one of claims 1 to 7.