A building energy consumption real-time control method based on DQN

By combining DQN with rolling optimization in building energy control, this method solves the problems of model simplification and uncertainty handling in traditional optimization algorithms and basic DRL in building energy control, achieves coordination between building energy cost and comfort, and improves the robustness and computational efficiency of optimization results.

CN117650518B · Active · Publication Date: 2026-06-23 · HOHAI UNIV


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: HOHAI UNIV
Filing Date: 2023-11-28
Publication Date: 2026-06-23

Application Information

Patent Timeline

28 Nov 2023: Application
23 Jun 2026: Publication (CN117650518B)

IPC: H02J3/175; G06F30/27; G06Q50/08; G06F18/214; G06Q30/0201; G06Q50/06; F24F11/46; F24F11/64; H02J3/28; H02J3/32; H02J3/38; H02J3/008; G06F119/08; G06F111/04; H02J103/35; H02J103/40; H02J103/50; H02J101/24; H02J105/12; H02J105/42




AI Technical Summary

Technical Problem

In existing building energy consumption control methods, traditional optimization algorithms are limited by solver constraints, which leads to model simplification that affects actual results. Meanwhile, the basic DRL algorithm requires a large number of samples and high exploration costs when dealing with uncertainties, making it difficult to achieve coordination between building energy consumption and comfort.

Method used

By combining DQN and rolling optimization, a Markov decision process model and discretized control actions are constructed, a reward function is designed, and stochastic optimization is used to generate scenarios to simulate uncertainty and optimize building energy consumption strategies.

Benefits of technology

It achieves both comfort and cost reduction in building energy control, improves the robustness of optimization results and calculation speed, and enhances energy flexibility.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a DQN-based real-time control method for building energy consumption. The method generates multiple groups of scenarios from building photovoltaic power generation and building rigid load prediction data, optimizes building energy consumption behavior with the DQN algorithm, and designs the DQN reward function using the concept of stochastic optimization, so that the optimization result is robust to uncertainty in building power generation and load. The method keeps the building at a high comfort level all day, increases the flexibility of building energy consumption by scheduling energy storage equipment, and reduces the building's energy cost.


Description

Technical Field

[0001] This invention relates to building energy consumption control methods, and more particularly to a real-time building energy consumption control method based on DQN.

Background Technology

[0002] Building energy consumption accounts for approximately one-third of global energy demand. With building energy demand continuing to grow, and given current and future energy crises and environmental pollution problems, achieving energy conservation and emission reduction in buildings is crucial. However, building energy conservation must also take the comfort of building residents into account, because there is an inherent conflict between reducing building energy consumption and maintaining a good level of comfort for residents.

[0003] Currently, building energy consumption control mainly employs methods based on traditional optimization algorithms and methods based on Deep Reinforcement Learning (DRL). Both of these methods suffer from the following problems:

[0004] First, most traditional optimization algorithms rely on commercial solvers. Because of the limits the solver places on constraint formats, some constraints often have to be simplified. For example, the equivalent thermal model of a building often has to be reduced to first or second order. A low-order model, however, does not account for changes in the building's internal temperature caused by factors such as ventilation and radiation, nor for factors that affect the building's thermal inertia, such as indoor and outdoor temperature and radiation, which degrades the practical effect of the optimization results.

[0005] Second, compared with traditional optimization, machine learning can better coordinate many operational constraints that are coupled in time and space, making it more suitable for solving large-scale building energy optimization problems in real time, and it does not require simplification of constraints and models. However, the basic DRL algorithm is insufficient to handle analytical and cognitive uncertainties. In practical energy control problems it also requires a large number of samples, long exploration time, and high exploration cost, which is difficult to accommodate in practical engineering.

Summary of the Invention

[0006] Purpose of the invention: The present invention aims to provide a real-time building energy control method that combines DQN with rolling optimization to coordinate building energy costs and the comfort of residents within the building.

[0007] Technical solution: The real-time building energy consumption control method based on DQN described in this invention includes the following steps:

[0008] (1) Collect N-day historical photovoltaic power generation data and N-day historical rigid load data of user buildings;

[0009] (2) Based on the historical data collected in step (1), construct a building photovoltaic power generation model and a building rigid load model using deep learning;

[0010] (3) Define the overall optimization time domain H of the building's energy consumption control, i.e., the time range of building energy management, and the control precision Δt; based on the overall optimization time domain H, define the rolling optimization time domain $H^*$, i.e., the time range of each rolling optimization calculation. The time domain of the first rolling optimization is the overall optimization time domain H, and the rolling optimization time domain is updated each time an optimization is completed. Using the building-integrated photovoltaic (BIPV) power generation model and the building rigid load model, BIPV power generation and rigid load data are predicted for the future time period H′, where H′ = $H^*$;

[0011] (4) Construct a Markov decision process model based on the building energy consumption control problem, and determine the state space State based on the real-time control target of building energy consumption; discretize the control actions of building energy consumption control based on the air conditioning operation adjustment range of user buildings and the rated charging and discharging power of energy storage equipment, and determine the action space Action in the DQN algorithm; determine the reward function Reward in the DQN algorithm based on the stochastic optimization method.

[0012] (5) Use the DQN algorithm to solve the building energy consumption control strategy for the future time period H′;

[0013] (6) After the DQN algorithm completes m rounds of training, determine whether the obtained building energy control strategy meets the convergence condition. If it does not, training continues. If it does, the training result is output, the optimization control for the current time period is completed, the building information at that moment is collected to update the state space, the rolling optimization time domain is updated, and optimization control for the next time period is performed, until building energy control over the overall optimization time domain H is achieved.

[0014] Furthermore, the state space State includes the indoor temperature $T_{in}$, the state of charge SoC of the BESS, and the air conditioning cooling capacity of the HVAC system (denoted with the subscript dev).

[0015] Furthermore, in step (4), the discretization method for the control action discretization is as follows:

[0016] First, the air conditioning cooling capacity adjustment actions are set based on the rated cooling capacity and maximum cooling capacity of the air conditioner. These actions include keeping the cooling capacity unchanged, setting the cooling capacity to zero (i.e., turning off the air conditioner), and increasing / decreasing the air conditioning cooling capacity by $A_n$ kW, where $A_n$ is the n-th discrete air conditioning cooling capacity change value.

[0017] Then, based on the BESS real-time SoC and the rated charge and discharge power, the charge and discharge power actions of the energy storage device are set. These actions include keeping the charge and discharge power unchanged, setting the charge and discharge power to zero (i.e., shutting down the energy storage device), and charging / discharging the energy storage device at a power of $B_n$ kW, where $B_n$ is the n-th discrete charge / discharge power value, determined by the rated charge and discharge power of the energy storage device.

[0018] Furthermore, the cooling capacity of the air conditioner satisfies the following constraints:

[0019]

[0020] where the limiting value in the constraint is the maximum cooling capacity of the air conditioner;

[0021] The charge and discharge power of the energy storage device meets the following constraints:

[0022] $0 \le P_{charge} \le P_{max}$

[0023] $0 \le P_{discharge} \le P_{max}$

[0024] $SoC_{min} \le SoC \le SoC_{max}$

[0025] where $P_{charge}$ and $P_{discharge}$ are the BESS charging and discharging power, $P_{max}$ is the maximum charge / discharge power of the BESS, and $SoC_{min}$ and $SoC_{max}$ are the lower and upper limits of the BESS state of charge, respectively.
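The constraints above can be illustrated with a minimal Python sketch that projects a requested action onto the feasible range. It is not taken from the patent: the function names, the signed-power convention (positive means charging), the lossless SoC update, and the lower bound of zero on cooling capacity are all assumptions made for illustration.

```python
def clip(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def apply_hvac(q_now_kw, delta_kw, q_max_kw):
    """Apply a cooling-capacity change and keep the result in [0, q_max_kw].

    The lower bound of 0 kW is an assumption; the text only names the maximum.
    """
    return clip(q_now_kw + delta_kw, 0.0, q_max_kw)


def apply_bess(soc, power_kw, dt_h, capacity_kwh, p_max_kw, soc_min, soc_max):
    """Return (feasible_power_kw, new_soc) for one control step.

    Assumptions: positive power charges the battery, negative power
    discharges it, and conversion losses are ignored.
    """
    # Power limit: |P| <= P_max for both charging and discharging.
    power = clip(power_kw, -p_max_kw, p_max_kw)
    # Largest charging / discharging power that keeps SoC within its limits.
    max_charge = (soc_max - soc) * capacity_kwh / dt_h
    max_discharge = (soc - soc_min) * capacity_kwh / dt_h
    power = clip(power, -max_discharge, max_charge)
    new_soc = soc + power * dt_h / capacity_kwh
    return power, new_soc
```

Clipping infeasible actions is only one option; an implementation could equally mask such actions out of the action set or penalize them in the reward.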

[0026] Furthermore, the Action space includes the increase / decrease value of the HVAC cooling capacity and the charging / discharging power of the BESS. The HVAC options are: keeping the cooling capacity unchanged; increasing / decreasing it by 1 kW, 5 kW, 10 kW, 20 kW, or 40 kW; and 0 kW, where 0 kW means the HVAC is turned off. The BESS options are: keeping the charging / discharging power unchanged, or charging / discharging at a rate expressed in terms of battery capacity, namely 0.1c, 0.2c, 0.5c, or 0, where c is the battery capacity.
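As an illustration of the discrete action space listed in [0026], the following Python sketch enumerates the HVAC and BESS sub-actions. The step values and rates come from the text; the tuple encoding, the function names, the sign convention (positive power means charging), and the choice to combine the two sub-actions as a Cartesian product are assumptions.

```python
from itertools import product

# HVAC cooling-capacity adjustments in kW, as listed in paragraph [0026].
HVAC_STEPS_KW = (1, 5, 10, 20, 40)
# BESS charge / discharge rates as fractions of battery capacity c, from [0026].
BESS_RATES = (0.1, 0.2, 0.5)


def hvac_actions():
    """Return (kind, delta_kw) tuples for the HVAC sub-action."""
    actions = [("hold", 0.0), ("off", 0.0)]
    for step in HVAC_STEPS_KW:
        actions.append(("adjust", float(step)))   # increase cooling capacity
        actions.append(("adjust", -float(step)))  # decrease cooling capacity
    return actions


def bess_actions(capacity_kwh):
    """Return (kind, power_kw) tuples; positive power = charging (assumed)."""
    actions = [("hold", 0.0), ("idle", 0.0)]
    for rate in BESS_RATES:
        power = rate * capacity_kwh
        actions.append(("set", power))    # charge at this rate
        actions.append(("set", -power))   # discharge at this rate
    return actions


def joint_actions(capacity_kwh):
    """Combine both sub-actions as a Cartesian product (an assumption)."""
    return list(product(hvac_actions(), bess_actions(capacity_kwh)))
```

Under these assumptions there are 12 HVAC sub-actions and 8 BESS sub-actions, so the joint set has 96 entries, which is small enough for a DQN output layer with one Q-value per action.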

[0027] Furthermore, in step (4), the reward includes the building's energy cost and the occupants' thermal comfort, and the reward function Reward is:

[0028]

[0029] where ω1 is the weight of comfort, ω2 is the weight of electricity cost, $N_S$ is the number of scenarios set based on the stochastic optimization method, $CLD_s$ is the comfort in scenario s, $F_{buy,s}$ is the electricity purchase cost in scenario s, and $F_{sell,s}$ is the electricity sales profit in scenario s.
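The reward formula itself is not reproduced in this text, so the following Python sketch shows only one plausible form consistent with the definitions above: a weighted comfort term minus a weighted net electricity cost, averaged over the scenarios. The signs, the averaging, and the reading of the comfort term as "higher is better" are assumptions, as is the function name.

```python
def scenario_reward(cld, f_buy, f_sell, w_comfort, w_cost):
    """Average a weighted comfort-minus-net-cost score over all scenarios.

    cld, f_buy and f_sell hold one value per scenario. The formula is an
    assumed form, not the one in the patent:
        (1 / N_S) * sum_s [ w_comfort * CLD_s - w_cost * (F_buy_s - F_sell_s) ]
    """
    if not cld or not (len(cld) == len(f_buy) == len(f_sell)):
        raise ValueError("need the same non-zero number of scenarios per input")
    n_s = len(cld)
    total = 0.0
    for comfort, buy, sell in zip(cld, f_buy, f_sell):
        total += w_comfort * comfort - w_cost * (buy - sell)
    return total / n_s
```

Averaging over scenarios is what ties the reward to stochastic optimization: an action is scored by its expected outcome across the sampled photovoltaic and load realizations rather than by a single forecast.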

[0030] Furthermore, the comfort level $CLD_s$, the electricity purchase cost $F_{buy,s}$, and the electricity sales profit $F_{sell,s}$ in scenario s are as follows:

[0031]

[0032]

[0033]

[0034] where PMV is the building thermal comfort index, $T_{in,t}$ is the indoor temperature, and $T_{iw,t}$ is the interior wall temperature; the remaining quantities are the power purchased from the grid in scenario s, the power fed into the grid in scenario s, the grid electricity purchase price, and the grid feed-in price; t is the control time, and T is the set of all control times.
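The cost terms can be sketched in Python under the assumption that each is a price-weighted sum of power over the control times, multiplied by the step length. The exact formulas are not reproduced in this text, so the function name, the net-load sign convention (positive means purchasing from the grid), and the units are illustrative assumptions.

```python
def energy_costs(net_load_kw, buy_price, sell_price, dt_h):
    """Return (f_buy, f_sell) for one scenario.

    net_load_kw[t] > 0 means power is purchased from the grid at time t,
    net_load_kw[t] < 0 means surplus power is fed into the grid (assumed
    convention). Prices are per kWh and dt_h is the step length in hours.
    """
    f_buy = 0.0
    f_sell = 0.0
    for power, c_buy, c_sell in zip(net_load_kw, buy_price, sell_price):
        if power >= 0.0:
            f_buy += c_buy * power * dt_h       # energy bought in this step
        else:
            f_sell += c_sell * (-power) * dt_h  # energy sold in this step
    return f_buy, f_sell
```

For example, with 15-minute steps (`dt_h = 0.25`), a net load of 4 kW at a purchase price of 0.5 per kWh contributes 0.5 to the purchase cost for that step.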

[0035] Furthermore, in step (5), the DQN algorithm is used to solve the building energy control strategy for the future time period H′, specifically as follows: the rolling optimization time domain $H^*$ is updated according to the optimization progress as $H^* = H - a\,\Delta t$, where a is the number of optimizations completed within the overall optimization time domain H. The inputs to the DQN algorithm include the building's state space at this moment and the photovoltaic and load forecasts for the future time period H′ from this moment, where H′ = $H^*$. The output of the DQN algorithm is the building energy consumption strategy within the optimization time domain.
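Reading the update rule as a shrinking horizon, i.e. the remaining horizon equals H minus a times Δt (an interpretation of the garbled formula that agrees with the first horizon being H itself), the arithmetic can be sketched in Python. The function name is hypothetical, and the 24-hour horizon used in the example is an assumption; the 15-minute step matches the control interval of the embodiment.

```python
def rolling_horizon(total_h, dt_h, completed):
    """Return (remaining_steps, remaining_hours) for H* = H - a * dt.

    total_h is the overall horizon H in hours, dt_h the control step in
    hours, and completed the number a of optimizations already finished.
    """
    total_steps = round(total_h / dt_h)
    if not 0 <= completed <= total_steps:
        raise ValueError("completed must lie between 0 and the steps in H")
    remaining_steps = total_steps - completed
    return remaining_steps, remaining_steps * dt_h
```

With H = 24 h and Δt = 0.25 h, the first optimization covers 96 steps; after 10 completed optimizations, `rolling_horizon(24.0, 0.25, 10)` gives 86 steps, or 21.5 h.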

[0036] Furthermore, in step (6), if the convergence condition is met, the training results are output, and only the first Δt of the optimized building energy consumption behavior sequence is taken.

[0037] Furthermore, in step (6), the convergence condition is:

[0038]

[0039] where $Reward_i$ is the reward value of the DQN policy in the i-th round, m is the number of training rounds, n is the number of replay rounds, and ρ is the convergence oscillation allowance.
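The convergence formula is missing from this text, so the following Python sketch is only one possible reading: training is treated as converged when the rewards of the most recent n rounds oscillate within a relative band ρ around their mean. Treating n as a window length, and the specific spread-over-mean test, are assumptions rather than the patent's condition.

```python
def has_converged(rewards, n, rho):
    """Check whether the last n round rewards stay within a relative band.

    rewards holds one reward per completed training round. The test
    (max - min) / |mean| <= rho over the last n entries is an assumed
    stand-in for the condition that is not reproduced in the text.
    """
    if n < 2 or len(rewards) < n:
        return False  # not enough rounds to judge oscillation
    window = rewards[-n:]
    mean = sum(window) / n
    spread = max(window) - min(window)
    if mean == 0:
        return spread == 0
    return spread / abs(mean) <= rho
```

A relative band makes the threshold ρ independent of the reward scale, which varies with the comfort and cost weights.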

[0040] Beneficial Effects: Compared with existing technologies, the significant advantages of this invention are:
1. This invention combines DQN with rolling optimization to coordinate building energy costs and resident comfort.
2. This invention designs the DQN reward function using stochastic optimization, randomly generating multiple sets of scenarios based on predicted photovoltaic power generation and load data to simulate the uncertainty of future building power generation and consumption, which makes the optimization results more robust in actual control.
3. This invention discretizes the control actions of building energy control and sets the action space of the DQN algorithm accordingly, which removes redundant actions and speeds up computation.
4. This invention ensures the effectiveness of building energy control, enabling buildings to maintain a high level of comfort throughout the day, while increasing the flexibility of building energy use and reducing building energy costs by scheduling energy storage equipment.

Description of the Drawings

[0041] Figure 1 is a flowchart of the method of the present invention;

[0042] Figure 2 is a schematic diagram of the time domain for DQN-based rolling optimization;

[0043] Figure 3 is a schematic diagram of the building photovoltaic power generation and total building energy consumption over one day after real-time building energy consumption control is implemented;

[0044] Figure 4 is a schematic diagram of the changes in building room temperature and air conditioning cooling capacity over one day after real-time building energy consumption control is implemented;

[0045] Figure 5 is a schematic diagram of the changes in the operating status of the building energy storage equipment and the grid-connected power after real-time building energy consumption control is implemented.

Detailed Implementation

[0046] The invention will now be further described with reference to the accompanying drawings.

[0047] The DQN-based real-time building energy consumption control method of this invention requires the collection of historical photovoltaic power generation data and historical load data of user buildings. During the control process, it is necessary to perform real-time measurement and collection of the following parameters from user node meters, distributed power sources, energy storage devices, and other electrical equipment: historical N-day photovoltaic power generation data, historical N-day rigid load data, real-time building temperature, building exterior wall temperature, building interior wall temperature, adjacent room temperature, air conditioning power consumption, air conditioning cooling capacity, energy storage device charging and discharging power, and energy storage state of charge, etc. The DQN-based real-time building energy consumption control method includes the following steps:

[0048] (1) Collect N-day historical photovoltaic power generation data and N-day historical rigid load data of user buildings;

[0049] (2) Based on the historical data collected in step (1), construct a building photovoltaic power generation model and a building rigid load model using deep learning;

[0050] (3) Set the overall optimization time domain H of building energy consumption control, i.e., the time range of building energy management, and the control accuracy Δt; based on the overall optimization time domain H, set the rolling optimization time domain $H^*$, i.e., the time range of each rolling optimization calculation. The time domain of the first rolling optimization is the overall optimization time domain H, and the rolling optimization time domain is updated each time an optimization is completed. Using the building-integrated photovoltaic (BIPV) power generation model and the building rigid load model, BIPV power generation and rigid load data are predicted for the future time period H′, where H′ = $H^*$;

[0051] (4) Construct a Markov decision process model based on the building energy consumption control problem, and determine the state space State based on the real-time control target of building energy consumption; discretize the control actions of building energy consumption control based on the air conditioning operation adjustment range of user buildings and the rated charging and discharging power of energy storage equipment, and determine the action space Action in the DQN algorithm; determine the reward function Reward in the DQN algorithm based on the stochastic optimization method.

[0052]

[0053] After appropriate building state and action spaces are determined, the DQN reward function is designed based on stochastic optimization principles to make the optimization results more robust to uncertainties in future building photovoltaic power generation and load, thereby improving optimization performance. Based on the predicted data, $N_S$ sets of scenarios are randomly generated to simulate the uncertainties of future building photovoltaic power generation and building load. The reward is as follows:

[0054]
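The text does not state how the scenarios are sampled from the forecasts. As a minimal sketch, assuming independent multiplicative Gaussian noise around each forecast value (an assumption, as are the function name and the default relative standard deviation), scenario generation could look like this in Python:

```python
import random


def generate_scenarios(pv_forecast, load_forecast, n_s, rel_std=0.1, seed=0):
    """Draw n_s (pv, load) scenario pairs around the forecast profiles.

    Each forecast value is scaled by (1 + noise), with noise drawn from a
    zero-mean Gaussian of standard deviation rel_std; results are floored
    at zero. The noise model is an illustrative assumption.
    """
    rng = random.Random(seed)  # seeded so the sample set is reproducible
    scenarios = []
    for _ in range(n_s):
        pv = [max(0.0, p * (1.0 + rng.gauss(0.0, rel_std)))
              for p in pv_forecast]
        load = [max(0.0, l * (1.0 + rng.gauss(0.0, rel_std)))
                for l in load_forecast]
        scenarios.append((pv, load))
    return scenarios
```

Multiplicative noise keeps zero-generation periods (for example, night-time photovoltaic output) at zero, which additive noise would not.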

[0055] The state space State includes the indoor temperature $T_{in}$, the state of charge SoC of the BESS, and the air conditioning cooling capacity of the HVAC system (denoted with the subscript dev).

[0056] The discretization method for control action discretization is as follows:

[0057] First, based on the rated and maximum cooling capacity of the air conditioner, the air conditioner cooling capacity adjustment actions are set. These actions include keeping the cooling capacity constant, setting the cooling capacity to zero (i.e., turning off the air conditioner), and increasing / decreasing the cooling capacity by $A_n$ kW, where $A_n$ is the n-th discrete air conditioner cooling capacity change value. The air conditioner cooling capacity satisfies the following constraints:

[0058]

[0059] where the limiting value in the constraint is the maximum cooling capacity of the air conditioner;

[0060] Based on the BESS real-time SoC and rated charge / discharge power, the charge / discharge power actions of the energy storage device are set. These actions include keeping the charge / discharge power constant, setting the charge / discharge power to zero (i.e., shutting down the energy storage device), and charging / discharging the energy storage device at a power of $B_n$ kW, where $B_n$ is the n-th discrete charge / discharge power value, determined by the rated charge / discharge power of the energy storage device. The charge / discharge power of the energy storage device satisfies the following constraints:

[0061] $0 \le P_{charge} \le P_{max}$

[0062] $0 \le P_{discharge} \le P_{max}$

[0063] $SoC_{min} \le SoC \le SoC_{max}$

[0064] where $P_{charge}$ and $P_{discharge}$ are the BESS charging and discharging power, $P_{max}$ is the maximum charge / discharge power of the BESS, and $SoC_{min}$ and $SoC_{max}$ are the lower and upper limits of the BESS state of charge, respectively.

[0065] The Action space includes the increase / decrease value of the HVAC cooling capacity and the charging / discharging power of the BESS. The HVAC options are: keeping the cooling capacity unchanged; increasing / decreasing it by 1 kW, 5 kW, 10 kW, 20 kW, or 40 kW; and 0 kW, where 0 kW means the HVAC is turned off. The BESS options are: keeping the charging / discharging power unchanged, or charging / discharging at a rate expressed in terms of battery capacity, namely 0.1c, 0.2c, 0.5c, or 0, where c is the battery capacity.

[0066] The reward includes the building's energy cost and the thermal comfort of its occupants, and the reward function Reward is:

[0067]

[0068]

[0069]

[0070]

[0071] where ω1 is the weight of comfort, ω2 is the weight of electricity cost, $N_S$ is the number of scenarios set based on the stochastic optimization method, $CLD_s$ is the comfort in scenario s, $F_{buy,s}$ is the electricity purchase cost in scenario s, and $F_{sell,s}$ is the electricity sales profit in scenario s; PMV is the building thermal comfort index, $T_{in,t}$ is the indoor temperature, and $T_{iw,t}$ is the interior wall temperature; the remaining quantities are the power purchased from the grid in scenario s, the power fed into the grid in scenario s, the grid electricity purchase price, and the grid feed-in price; t is the control time, and T is the set of all control times.

[0072] (5) Use the DQN algorithm to solve the building energy consumption control strategy for the future time period H′; the details are as follows:

[0073] Update the rolling optimization time domain $H^*$ according to the optimization progress as $H^* = H - a\,\Delta t$, where a is the number of optimizations completed within the overall optimization time domain H. The inputs to the DQN algorithm include the building's state space at this moment and the photovoltaic and load forecasts for the future time period H′ from this moment, where H′ = $H^*$. The output of the DQN algorithm is the building energy consumption strategy within the optimization time domain.
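The patent does not spell out the internals of its DQN. For orientation, the following Python sketch shows two ingredients of the standard DQN algorithm: epsilon-greedy action selection over the Q-values of the discrete actions, and the one-step temporal-difference target used to train the Q-network. The function names, the discount factor `gamma`, and the plain-list representation of Q-values are illustrative assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values holds one estimated Q-value per discrete action; rng is a
    random.Random instance so that exploration is reproducible.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma, done):
    """Standard DQN target: r + gamma * max_a' Q_target(s', a').

    next_q_values are the target network's Q-values for the next state;
    for a terminal transition the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full implementation the Q-values would come from a neural network fed with the building state and the photovoltaic and load forecasts, and the targets would be computed on mini-batches drawn from a replay buffer.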

[0074] (6) After the DQN algorithm completes m rounds of training, determine whether the obtained building energy control strategy meets the convergence condition. If it does not, training continues. If it does, the training result is output, and only the first Δt of the optimized building energy behavior sequence is taken; the optimization control for the current time period is then complete, the building's current information is collected to update the state space, the rolling optimization time domain is updated, and optimization control for the next time period is performed, until building energy control over the overall optimization time domain H is achieved. The convergence condition is:

[0075]

[0076] where $Reward_i$ is the reward value of the DQN policy in the i-th round, m is the number of training rounds, n is the number of replay rounds, and ρ is the convergence oscillation allowance.
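Steps (5) and (6) together describe a rolling (receding-horizon) loop: plan over the remaining horizon, execute only the first Δt of the plan, measure the building, and repeat. A minimal Python sketch of that loop follows. The callables `plan_fn` and `apply_fn` are hypothetical placeholders for the DQN training step and for the building interface, which the patent does not specify at this level.

```python
def rolling_control(total_steps, initial_state, plan_fn, apply_fn):
    """Run the rolling optimization over a horizon of total_steps steps.

    plan_fn(state, remaining_steps) returns a list of actions covering the
    remaining horizon (for example, the converged DQN strategy).
    apply_fn(state, action) executes one action and returns the newly
    measured building state. Both are placeholders for illustration.
    """
    state = initial_state
    executed = []
    for completed in range(total_steps):
        remaining = total_steps - completed   # rolling horizon in steps
        plan = plan_fn(state, remaining)
        action = plan[0]                      # keep only the first step
        state = apply_fn(state, action)       # update state from the building
        executed.append(action)
    return executed, state
```

Re-planning from the measured state at every step is what lets the controller correct for forecast errors, at the cost of one optimization per control interval.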

[0077] This invention uses an office building in southern China as the computational scenario. The building has a total area of 528.08 square meters, two floors above ground, and a height of 7.2 meters. Its main function is to provide office and meeting space. The building is energy-efficient, equipped with photovoltaic tiles and photovoltaic railings as photovoltaic power generation devices, and also features energy storage equipment. The building envelope has undergone energy-saving renovations, with insulation measures applied to the exterior walls, roof, and windows. The operation of the HVAC and energy storage devices can be controlled via IoT devices. The building's energy control system participates in energy consumption control 24 / 7, with control intervals of 15 minutes.

[0078] Figure 3 shows the building photovoltaic power generation and total building energy consumption over one day after real-time building energy consumption control is implemented. Figure 4 shows the changes in building room temperature and air conditioning cooling capacity over one day under real-time control, and Figure 5 shows the changes in the operating status of the building energy storage equipment and the grid-connected power under real-time control.

[0079] With this energy control method, the building's indoor temperature stays between 24.5 and 28 degrees Celsius throughout the day, ensuring a high level of comfort. During nighttime hours when electricity prices are low, grid electricity is used to charge the energy storage device. At midday, when photovoltaic (PV) power generation exceeds the building's power consumption, the energy storage device charges, and excess electricity is fed into the grid for profit. Discharge occurs during the afternoon when electricity prices are high, reducing the building's grid-connected power and thus lowering electricity costs. The energy storage device undergoes two main discharging processes and two main charging processes. The first discharge occurs during the morning peak load period, when electricity prices, including the PV grid-connection price, are high and PV power generation can cover the building's load; at this time, the energy storage device discharges to generate profit. The second discharge occurs during the evening peak load period, when PV power generation cannot cover the building's load; the energy storage device discharges to reduce grid electricity usage and lower electricity costs. The two charging processes occur during the midday PV power generation peak and the nighttime low-price period, respectively. Therefore, the operation of the energy storage device increases the building's energy flexibility and reduces its overall daily energy costs.

Claims

1. A method for real-time control of building energy consumption based on DQN, characterized in that it includes the following steps: (1) Collect N-day historical photovoltaic power generation data and N-day historical rigid load data of user buildings; (2) Based on the historical data collected in step (1), construct a building photovoltaic power generation model and a building rigid load model using deep learning; (3) Define the overall optimization time domain H of the building's energy consumption control, i.e., the time range of building energy management, and the control precision Δt; based on the overall optimization time domain H, define the rolling optimization time domain $H^*$, i.e., the time range of each rolling optimization calculation. The time domain of the first rolling optimization is the overall optimization time domain H, and the rolling optimization time domain is updated each time an optimization is completed. Using the building-integrated photovoltaic (BIPV) power generation model and the building rigid load model, BIPV power generation and rigid load data are predicted for the future time period H′, where H′ = $H^*$; (4) Construct a Markov decision process model based on the building energy consumption control problem, and determine the state space State based on the real-time control target of building energy consumption; discretize the control actions of building energy consumption control based on the air conditioning operation adjustment range of user buildings and the rated charging and discharging power of energy storage equipment, and determine the action space Action in the DQN algorithm; determine the reward function Reward in the DQN algorithm based on the stochastic optimization method; (5) Use the DQN algorithm to solve the building energy consumption control strategy for the future time period H′; (6) After the DQN algorithm completes m rounds of training, determine whether the obtained building energy control strategy meets the convergence condition. If it does not, training continues. If it does, the training result is output, the optimization control for the current time period is completed, the building information at that moment is collected to update the state space, the rolling optimization time domain is updated, and optimization control for the next time period is performed, until building energy control over the overall optimization time domain H is achieved.

2. The method for real-time building energy consumption control based on DQN according to claim 1, characterized in that, in step (4), the state space State includes the indoor temperature $T_{in}$, the state of charge SoC of the BESS, and the air conditioning cooling capacity of the HVAC system (denoted with the subscript dev).

3. The method for real-time control of building energy consumption based on DQN according to claim 2, characterized in that, in step (4), the control actions are discretized as follows: First, the air conditioning cooling capacity adjustment actions are set based on the rated cooling capacity and maximum cooling capacity of the air conditioner. These actions include keeping the cooling capacity unchanged, setting the cooling capacity to zero (i.e., turning off the air conditioner), and increasing / decreasing the air conditioning cooling capacity by $A_n$ kW, where $A_n$ is the n-th discrete air conditioning cooling capacity change value. Then, based on the BESS real-time SoC and the rated charge and discharge power, the charge and discharge power actions of the energy storage device are set. These actions include keeping the charge and discharge power unchanged, setting the charge and discharge power to zero (i.e., shutting down the energy storage device), and charging / discharging the energy storage device at a power of $B_n$ kW, where $B_n$ is the n-th discrete charge / discharge power value, determined by the rated charge and discharge power of the energy storage device.

4. The method for real-time control of building energy consumption based on DQN according to claim 3, characterized in that the air conditioning cooling capacity meets the following constraints: where the limiting value in the constraint is the maximum cooling capacity of the air conditioner; the charge and discharge power of the energy storage device meets the following constraints: $0 \le P_{charge} \le P_{max}$; $0 \le P_{discharge} \le P_{max}$; $SoC_{min} \le SoC \le SoC_{max}$; where $P_{charge}$ and $P_{discharge}$ are the BESS charging and discharging power, $P_{max}$ is the maximum charge / discharge power of the BESS, and $SoC_{min}$ and $SoC_{max}$ are the lower and upper limits of the BESS state of charge, respectively.

5. The method for real-time control of building energy consumption based on DQN according to claim 4, characterized in that, in step (4), the Action space includes the increase / decrease value of the HVAC cooling capacity and the charging / discharging power of the BESS; the HVAC options are keeping the cooling capacity unchanged, increasing / decreasing it by 1 kW, 5 kW, 10 kW, 20 kW, or 40 kW, and 0 kW, where 0 kW means turning off the HVAC; the BESS options are keeping the charging / discharging power unchanged, or charging / discharging at a rate expressed in terms of battery capacity, namely 0.1c, 0.2c, 0.5c, or 0, where c is the battery capacity.

6. The method for real-time control of building energy consumption based on DQN according to claim 5, characterized in that, in step (4), the reward includes the building's energy cost and the occupants' thermal comfort, and the reward function Reward is: where ω1 is the weight of comfort, ω2 is the weight of electricity cost, $N_S$ is the number of scenarios set based on the stochastic optimization method, $CLD_s$ is the comfort in scenario s, $F_{buy,s}$ is the electricity purchase cost in scenario s, and $F_{sell,s}$ is the electricity sales profit in scenario s.

7. The real-time building energy consumption control method based on DQN according to claim 6, characterized in that the comfort $CLD_s$, the electricity purchase cost $F_{buy,s}$, and the electricity sales profit $F_{sell,s}$ in scenario s are as follows: where PMV is the building thermal comfort index, $T_{in,t}$ is the indoor temperature, and $T_{iw,t}$ is the interior wall temperature; the remaining quantities are the power purchased from the grid in scenario s, the power fed into the grid in scenario s, the grid electricity purchase price, and the grid feed-in price; t is the control time, and T is the set of all control times.

8. The method for real-time control of building energy consumption based on DQN according to claim 1, characterized in that, in step (5), the DQN algorithm is used to solve the building energy control strategy for the future time period H′, specifically as follows: the rolling optimization time domain $H^*$ is updated according to the optimization progress as $H^* = H - a\,\Delta t$, where a is the number of optimizations completed within the overall optimization time domain H. The input of the DQN algorithm includes the building's state space at this moment and the photovoltaic and load forecast values for the future time period H′ from this moment. The output of the DQN algorithm is the building's energy consumption strategy within the optimization time domain.

9. The method for real-time control of building energy consumption based on DQN according to claim 1, characterized in that, in step (6), if the convergence condition is met, the training results are output, and only the first Δt of the optimized building energy consumption behavior sequence is taken.

10. The method for real-time control of building energy consumption based on DQN according to claim 1, characterized in that, in step (6), the convergence condition is: where $Reward_i$ is the reward value of the DQN policy in the i-th round, m is the number of training rounds, n is the number of replay rounds, and ρ is the convergence oscillation allowance.