Smart city building cluster management method and system
By combining a reinforcement learning energy consumption prediction model with an online learning mechanism, the problems of insufficient prediction accuracy and real-time response capability in smart city building management systems have been solved, achieving efficient energy management and safety monitoring, and improving the system's adaptability and user satisfaction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- LIANYUNGANG WANCHANG TECHNOLOGY CO LTD
- Filing Date
- 2025-03-19
- Publication Date
- 2026-06-23
Figure CN120146403B_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of smart city technology, and in particular relates to a method and system for managing smart city building clusters.
Background Technology
[0002] With the acceleration of urbanization, traditional building management models are proving inadequate in meeting the demands of modern cities for efficiency, energy conservation, and safety. Traditional methods rely on manual monitoring and regulation, resulting in inefficiency, resource waste, and safety hazards. Manual inspections and paper records lead to information lag and an inability to respond to environmental changes in real time. The lack of real-time monitoring and data analysis of building internal and external environmental parameters results in energy waste. Safety monitoring systems rely on manual inspections and simple alarm devices, making it difficult to detect hazards in real time, and emergency response mechanisms are inadequate. In recent years, the development of IoT and big data technologies has provided new solutions for intelligent building management systems. By collecting environmental parameters, energy consumption, and pedestrian flow data in real time through sensor networks and combining them with big data analytics, intelligent and automated building management can be achieved. However, existing systems still suffer from insufficient prediction accuracy, limited real-time response capabilities, and a lack of self-learning ability. Energy consumption prediction models are mostly based on static algorithms, making it difficult to adapt to dynamically changing environments. Emergency response relies on fixed strategies and lacks dynamic adjustment capabilities. The system cannot continuously optimize models and strategies based on historical data and real-time feedback. Therefore, this invention proposes a smart city building cluster management method and system.
Summary of the Invention
[0003] To solve the above-mentioned technical problems, the present invention is achieved through the following technical solution:
[0004] This invention relates to a smart city building cluster management method, comprising the following steps:
[0005] Step 1: Collect real-time building interior and exterior environmental parameters, energy consumption data, and pedestrian flow data through IoT sensors, and clean and preprocess the data to remove outliers and fill in missing values;
[0006] Step 2: Based on the cleaned data, a hybrid reinforcement learning energy consumption prediction model is used for big data modeling and intelligent analysis, and the model weights are adjusted through dynamic coefficients;
[0007] Step 3: Based on the prediction results, the system performs dynamic resource scheduling and decision support. When the predicted energy consumption is higher than the threshold, the air conditioning temperature is automatically adjusted or unnecessary lighting is turned off, and elevator scheduling is optimized in combination with the prediction of passenger flow.
[0008] Step 4: The system continuously optimizes the model through online learning. The Q-Learning strategy updates the action value function based on real-time feedback, and the long short-term memory network model is incrementally trained every 24 hours.
[0009] Step 5: Verify the model's effectiveness through comparative testing, and evaluate prediction error, energy saving rate, and user satisfaction.
[0010] Furthermore, in step one, environmental parameters inside and outside the building, energy consumption data, and pedestrian flow data are collected in real time through an Internet of Things sensor network. The collected data is cleaned, outliers are removed, and missing data is filled in. Outliers exceeding the mean ± 3 times the standard deviation are removed using the 3σ principle, and missing data are filled in using time series linear interpolation. Finally, the multi-source heterogeneous data is normalized to provide input data for modeling and analysis.
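As a minimal sketch of the cleaning described in step one, the following Python function applies the 3σ rule, linear interpolation, and normalization to a single sensor series. The function name `clean_series` and the choice of min-max scaling are assumptions made for illustration; the text above only states that the data is "normalized" and does not fix the scaling method.

```python
import numpy as np

def clean_series(values):
    """Apply 3-sigma outlier removal, linear interpolation, and min-max scaling."""
    x = np.asarray(values, dtype=float)
    mean, std = np.nanmean(x), np.nanstd(x)
    # 3-sigma rule: readings outside mean +/- 3*std are treated as missing.
    x = np.where(np.abs(x - mean) > 3 * std, np.nan, x)
    # Fill every missing reading by linear interpolation over the sample index.
    idx = np.arange(x.size)
    known = ~np.isnan(x)
    filled = np.interp(idx, idx[known], x[known])
    # Assumption: min-max scaling to [0, 1]; the source does not name the method.
    span = filled.max() - filled.min()
    scaled = (filled - filled.min()) / span if span > 0 else np.zeros_like(filled)
    return filled, scaled
```

Note that the 3σ rule needs a reasonably long series to be effective: with very few samples a single extreme reading inflates the standard deviation enough to hide itself.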
[0011] Furthermore, in step two, a long short-term memory network module is constructed, which takes as input energy consumption data Xt from n historical time periods and outputs time-series predicted values, as shown in the following formula:
[0012] $\mathrm{LSTM}(X_t) = f_{\mathrm{LSTM}}(X_{t-n}, X_{t-n+1}, \ldots, X_t)$;
[0013] In the formula, $\mathrm{LSTM}(X_t)$ is the time-series prediction value based on the historical data $X_t$; $f_{\mathrm{LSTM}}$ is the long short-term memory network mapping, which takes the historical energy consumption data as input and outputs the time-series prediction value; $X_{t-n}, X_{t-n+1}, \ldots, X_t$ is the historical data sequence from time $t-n$ to time $t$; and $n$ is the length of the historical data used for prediction.
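The historical sequence fed to the long short-term memory network can be illustrated with a sliding-window construction. This is a sketch only: the helper name `make_windows` is hypothetical, and it assumes each input holds the $n+1$ readings from time $t-n$ to time $t$ with the reading at $t+1$ as the training target, which is one possible reading of the formula.

```python
import numpy as np

def make_windows(series, n):
    """Return (inputs, targets); each input is X_{t-n}..X_t, each target is X_{t+1}."""
    x = np.asarray(series, dtype=float)
    length = n + 1  # readings from t-n to t inclusive
    if x.size <= length:
        raise ValueError("need more than n + 1 readings")
    # One row per time step t for which a target X_{t+1} exists.
    inputs = np.stack([x[i:i + length] for i in range(x.size - length)])
    targets = x[length:]
    return inputs, targets
```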
[0014] The Q-Learning module is introduced to define the current state $s_t$, the current action $a_t$, and the reward function $R$. The formula for the reward function $R$ is as follows:
[0015] $R = -\left|E_{\mathrm{actual}}(t+1) - E_{\mathrm{pred}}(t+1)\right|$;
[0016] In the formula, $E_{\mathrm{actual}}(t+1)$ is the actual energy consumption value at time $t+1$, and $E_{\mathrm{pred}}(t+1)$ is the predicted energy consumption value at time $t+1$.
[0017] The formula for the Q-Learning correction value is as follows:
[0018] $\alpha(t) \times Q(s_t, a_t)$;
[0019] In the formula, $\alpha(t)$ represents the dynamic coefficient used to adjust the weight of the Q-Learning correction value, and $Q(s_t, a_t)$ represents the output of the Q-Learning module, that is, the Q value of performing action $a_t$ in state $s_t$.
[0020] The formula for the weight α(t) is as follows:
[0021]
[0022] In the formula, k is a sensitivity parameter used to control the rate of change of the dynamic coefficient, and ΔE(t) is the mean of recent prediction errors, used to reflect the deviation of the model prediction;
[0023] The final energy consumption prediction is obtained by combining the predictions from the Long Short-Term Memory network with the Q-Learning correction value:
[0024] $E_{\mathrm{pred}}(t+1) = \mathrm{LSTM}(X_t) + \alpha(t) \times Q(s_t, a_t)$;
[0025] In the formula, $E_{\mathrm{pred}}(t+1)$ is the predicted energy consumption value at time $t+1$.
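The combination of the long short-term memory output with the Q-Learning correction can be sketched as follows. The reward and the final prediction follow the formulas given in the text. The formula for the dynamic coefficient is not reproduced in this text, so the logistic form used in `dynamic_coefficient` is purely an assumption chosen for illustration; only the roles of $k$ and the mean recent error are taken from the description.

```python
import math

def dynamic_coefficient(mean_recent_error, k=0.5):
    """ASSUMED logistic form; the source does not reproduce the formula for alpha(t).

    mean_recent_error is expected to be a non-negative mean of recent errors.
    """
    return 1.0 / (1.0 + math.exp(-k * mean_recent_error))

def reward(e_actual, e_pred):
    """R = -|E_actual(t+1) - E_pred(t+1)|, as defined in the text."""
    return -abs(e_actual - e_pred)

def hybrid_prediction(lstm_value, q_value, alpha):
    """E_pred(t+1) = LSTM(X_t) + alpha(t) * Q(s_t, a_t)."""
    return lstm_value + alpha * q_value
```

Under the assumed form, a larger recent error pushes the coefficient toward 1, so the correction term gains weight when the long short-term memory network alone has been inaccurate.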
[0026] Furthermore, in step three, based on the prediction results of the hybrid reinforcement learning energy consumption prediction model, it is determined whether the future energy consumption exceeds a preset threshold. If the predicted energy consumption is lower than the threshold, the current device state is maintained. If the predicted energy consumption is higher than the threshold, a resource scheduling strategy is triggered, automatically increasing the air conditioning temperature setpoint by 1°C and turning off unnecessary lighting. The decision formula is as follows:
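[0027] $\mathrm{Decision} = \begin{cases} \mathrm{Adjust}, & E_{\mathrm{pred}}(t+1) > E_{\mathrm{threshold}} \\ \mathrm{Maintain}, & E_{\mathrm{pred}}(t+1) < E_{\mathrm{threshold}} \end{cases}$;
[0028] In the formula, $E_{\mathrm{threshold}}$ is the preset energy consumption threshold, $\mathrm{Adjust}$ is the implementation of the energy-saving strategy, and $\mathrm{Maintain}$ is to keep the current device state unchanged.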
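The threshold decision of step three can be written as a short function. This is a sketch under stated assumptions: the function name `schedule` and its return shape are hypothetical, and a prediction exactly equal to the threshold is treated as Maintain, a case the text does not specify.

```python
def schedule(e_pred, e_threshold, setpoint_c):
    """Return (decision, new_setpoint_c, lights_off) for one predicted value."""
    if e_pred > e_threshold:
        # Energy-saving strategy: raise the setpoint by 1 degree C, switch off
        # unnecessary lighting.
        return "Adjust", setpoint_c + 1.0, True
    # Assumption: equality with the threshold also keeps the current state.
    return "Maintain", setpoint_c, False
```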
[0029] Furthermore, in step four, the latest energy consumption data, environmental parameters, and pedestrian flow data are collected every 24 hours for incremental model training. The Q-Learning strategy table is updated using the reward value, and the real-time update formula is as follows:
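[0030] $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta\left[R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$;
[0031] In the formula, $\eta$ is the learning rate, used to control the speed at which the Q-value is updated, and $\gamma$ is the discount factor, used to measure the importance of future rewards. $\max_{a} Q(s_{t+1}, a)$ is the maximum Q value obtained by selecting the optimal action $a$ in the next state $s_{t+1}$.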
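A tabular version of the Q-value update can be sketched as below, using the standard Q-Learning rule with learning rate η and discount factor γ. The dictionary-based table, the function name `q_update`, and the default values of `eta` and `gamma` are assumptions; the text does not give numerical values for either parameter.

```python
from collections import defaultdict

def q_update(q_table, state, action, reward, next_state, actions, eta=0.1, gamma=0.9):
    """Apply one Q-Learning update and return the new Q(state, action)."""
    # Maximum Q value over the actions available in the next state.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += eta * (td_target - q_table[(state, action)])
    return q_table[(state, action)]

# Unvisited (state, action) pairs start at 0.0.
q_table = defaultdict(float)
```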
[0032] Furthermore, in step five, the hybrid reinforcement learning energy consumption prediction model is compared with the traditional ARIMA model to verify the accuracy improvement and energy saving effect of the hybrid reinforcement learning energy consumption prediction. The performance of the updated model is evaluated through verification experiments. If the effect is worse than that of the traditional ARIMA model, the model is updated again in step four.
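The comparison in step five can be illustrated with a mean absolute error check. This sketch assumes the ARIMA baseline predictions are supplied by an external model and that MAE is the comparison metric (the metric quoted in the application example); the helper names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Mean absolute error between two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def needs_retraining(actual, hybrid_pred, arima_pred):
    """True when the hybrid model is worse than the ARIMA baseline."""
    return mean_absolute_error(actual, hybrid_pred) > mean_absolute_error(actual, arima_pred)
```

A result of `True` corresponds to returning to step four for another model update.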
[0033] The smart city building cluster management system includes a data acquisition and processing module, a big data modeling and analysis module, a dynamic resource scheduling decision module, a safety monitoring and optimization module, and a system self-learning module.
[0034] The data acquisition and processing module is used to collect building interior and exterior environmental parameters, energy consumption data and pedestrian flow data in real time, and to clean, remove outliers, fill in missing values and standardize the raw data.
[0035] The big data modeling and analysis module receives standardized data from the data acquisition and processing module. Based on the hybrid reinforcement learning energy consumption prediction model, combined with the long short-term memory network module and the Q-Learning algorithm, it is used to realize dynamic and adaptive energy consumption prediction, providing energy demand prediction and population distribution prediction for a period of time in the future.
[0036] The dynamic resource scheduling decision module dynamically adjusts the status of air conditioning and lighting equipment and elevator operation strategies based on energy consumption and passenger flow prediction results, generates energy-saving strategies and emergency evacuation plans, assists management personnel in decision-making, receives prediction results from the big data modeling and analysis module, and outputs control commands to building equipment.
[0037] The safety monitoring and optimization module receives real-time safety data from the sensor network, outputs alarm information to management personnel and the emergency system, monitors fire and safety hazards in real time through smoke detectors, cameras and other equipment, triggers emergency response, analyzes user behavior data, and optimizes elevator operation mode and water and power supply strategies.
[0038] The system's self-learning module continuously collects historical and real-time data, trains machine learning models, optimizes prediction accuracy, receives prediction errors from the big data modeling and analysis module and scheduling effects from the dynamic resource scheduling decision module, and outputs optimized model parameters.
[0039] The present invention has the following beneficial effects:
[0040] 1. This invention utilizes a hybrid reinforcement learning energy consumption prediction model, combining a long short-term memory network and a Q-learning algorithm, to dynamically adjust model weights and accurately predict future energy consumption needs. Compared to traditional static prediction models, this model significantly improves prediction accuracy in complex environments such as extreme weather, reducing prediction errors by 20%. Based on the prediction results, the system can automatically adjust air conditioning temperatures, turn off unnecessary lighting equipment, and optimize elevator scheduling by combining pedestrian flow prediction, thereby achieving efficient energy utilization and reducing overall energy consumption by 10%.
[0041] 2. Through online learning and incremental training mechanisms, the system can continuously optimize the model based on real-time feedback and historical data. The Q-Learning strategy updates the action value function based on real-time energy consumption errors, and the long short-term memory network model is incrementally trained every 24 hours to ensure that the model can adapt to dynamically changing environments. In addition, the system can monitor building internal and external environmental parameters, energy consumption data, and pedestrian flow information in real time, and dynamically adjust resource scheduling strategies based on prediction results, thereby improving the system's real-time response capability, reducing elevator waiting time by 15%, and improving user satisfaction.
[0042] Of course, any product implementing this invention does not necessarily need to achieve all of the advantages described above at the same time.
Attached Figure Description
[0043] To more clearly illustrate the technical solutions of the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0044] Figure 1 is a flowchart illustrating the smart city building cluster management method of the present invention.
[0045] Figure 2 is a flowchart illustrating the smart city building cluster management system of the present invention.
Detailed Implementation
[0046] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0047] As shown in Figures 1-2, the present invention is a smart city building cluster management method, which includes the following steps:
[0048] Step 1: Collect real-time building interior and exterior environmental parameters, energy consumption data, and pedestrian flow data through IoT sensors, and clean and preprocess the data to remove outliers and fill in missing values;
[0049] Step 2: Based on the cleaned data, a hybrid reinforcement learning energy consumption prediction model is used for big data modeling and intelligent analysis, and the model weights are adjusted through dynamic coefficients;
[0050] Step 3: Based on the prediction results, the system performs dynamic resource scheduling and decision support. When the predicted energy consumption is higher than the threshold, the air conditioning temperature is automatically adjusted or unnecessary lighting is turned off, and elevator scheduling is optimized in combination with the prediction of passenger flow.
[0051] Step 4: The system continuously optimizes the model through online learning. The Q-Learning strategy updates the action value function based on real-time feedback, and the long short-term memory network model is incrementally trained every 24 hours.
[0052] Step 5: Verify the model's effectiveness through comparative testing, and evaluate prediction error, energy saving rate, and user satisfaction.
[0053] In step one, environmental parameters, energy consumption data, and pedestrian flow data inside and outside the building are collected in real time through an IoT sensor network. The collected data is cleaned, outliers are removed, and missing data is filled in. Outliers exceeding the mean ± 3 times the standard deviation are removed using the 3σ principle, and missing data are filled in using time series linear interpolation. Finally, the multi-source heterogeneous data is normalized to provide input data for modeling and analysis.
[0054] In step two, a long short-term memory network module is constructed. It takes energy consumption data Xt from n historical time periods as input and outputs time-series predicted values, as shown in the following formula:
[0055] $\mathrm{LSTM}(X_t) = f_{\mathrm{LSTM}}(X_{t-n}, X_{t-n+1}, \ldots, X_t)$;
[0056] In the formula, $\mathrm{LSTM}(X_t)$ is the time-series prediction value based on the historical data $X_t$; $f_{\mathrm{LSTM}}$ is the long short-term memory network mapping, which takes the historical energy consumption data as input and outputs the time-series prediction value; $X_{t-n}, X_{t-n+1}, \ldots, X_t$ is the historical data sequence from time $t-n$ to time $t$; and $n$ is the length of the historical data used for prediction.
[0057] The Q-Learning module is introduced to define the current state $s_t$, the current action $a_t$, and the reward function $R$. The formula for the reward function $R$ is as follows:
[0058] $R = -\left|E_{\mathrm{actual}}(t+1) - E_{\mathrm{pred}}(t+1)\right|$;
[0059] In the formula, $E_{\mathrm{actual}}(t+1)$ is the actual energy consumption value at time $t+1$, and $E_{\mathrm{pred}}(t+1)$ is the predicted energy consumption value at time $t+1$.
[0060] The formula for the Q-Learning correction value is as follows:
[0061] $\alpha(t) \times Q(s_t, a_t)$;
[0062] In the formula, $\alpha(t)$ represents the dynamic coefficient used to adjust the weight of the Q-Learning correction value, and $Q(s_t, a_t)$ represents the output of the Q-Learning module, that is, the Q value of performing action $a_t$ in state $s_t$.
[0063] The formula for the weight α(t) is as follows:
[0064]
[0065] In the formula, k is a sensitivity parameter used to control the rate of change of the dynamic coefficient, and ΔE(t) is the mean of recent prediction errors, used to reflect the deviation of the model prediction;
[0066] The final energy consumption prediction is obtained by combining the predictions from the Long Short-Term Memory network with the Q-Learning correction value:
[0067] $E_{\mathrm{pred}}(t+1) = \mathrm{LSTM}(X_t) + \alpha(t) \times Q(s_t, a_t)$;
[0068] In the formula, $E_{\mathrm{pred}}(t+1)$ is the predicted energy consumption value at time $t+1$.
[0069] In step three, based on the prediction results of the hybrid reinforcement learning energy consumption prediction model, it is determined whether the future energy consumption will exceed a preset threshold. If the predicted energy consumption is lower than the threshold, the current device state is maintained. If the predicted energy consumption is higher than the threshold, a resource scheduling strategy is triggered, automatically increasing the air conditioning temperature setpoint by 1°C and turning off unnecessary lighting. The decision formula is as follows:
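[0070] $\mathrm{Decision} = \begin{cases} \mathrm{Adjust}, & E_{\mathrm{pred}}(t+1) > E_{\mathrm{threshold}} \\ \mathrm{Maintain}, & E_{\mathrm{pred}}(t+1) < E_{\mathrm{threshold}} \end{cases}$;
[0071] In the formula, $E_{\mathrm{threshold}}$ is the preset energy consumption threshold, $\mathrm{Adjust}$ is the implementation of the energy-saving strategy, and $\mathrm{Maintain}$ is to keep the current device state unchanged.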
[0072] In step four, the latest energy consumption data, environmental parameters, and pedestrian flow data are collected every 24 hours for incremental model training. The Q-Learning strategy table is updated using the reward value, and the real-time update formula is as follows:
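[0073] $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta\left[R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$;
[0074] In the formula, $\eta$ is the learning rate, used to control the speed at which the Q-value is updated, and $\gamma$ is the discount factor, used to measure the importance of future rewards. $\max_{a} Q(s_{t+1}, a)$ is the maximum Q value obtained by selecting the optimal action $a$ in the next state $s_{t+1}$.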
[0075] In step five, the hybrid reinforcement learning energy consumption prediction model is compared with the traditional ARIMA model to verify the accuracy improvement and energy saving effect of the hybrid reinforcement learning energy consumption prediction. The performance of the updated model is evaluated through validation experiments. If the effect is worse than that of the traditional ARIMA model, the model is updated again in step four.
[0076] The smart city building cluster management system includes a data acquisition and processing module, a big data modeling and analysis module, a dynamic resource scheduling decision module, a safety monitoring and optimization module, and a system self-learning module.
[0077] The data acquisition and processing module is used to collect building interior and exterior environmental parameters, energy consumption data, and pedestrian flow data in real time, and to clean, remove outliers, fill in missing values, and standardize the raw data.
[0078] The big data modeling and analysis module receives standardized data from the data acquisition and processing module. Based on the hybrid reinforcement learning energy consumption prediction model, combined with the long short-term memory network module and the Q-Learning algorithm, it is used to achieve dynamic and adaptive energy consumption prediction, providing energy demand prediction and population distribution prediction for a period of time in the future.
[0079] The dynamic resource scheduling decision module dynamically adjusts the status of air conditioning and lighting equipment and elevator operation strategies based on energy consumption and passenger flow prediction results, generates energy-saving strategies and emergency evacuation plans, assists managers in decision-making, receives prediction results from the big data modeling and analysis module, and outputs control commands to building equipment.
[0080] The safety monitoring and optimization module receives real-time safety data from the sensor network, outputs alarm information to management personnel and emergency systems, monitors fire and safety hazards in real time through smoke detectors, cameras and other equipment, triggers emergency response, analyzes user behavior data, and optimizes elevator operation mode and water and power supply strategies.
[0081] The system's self-learning module continuously collects historical and real-time data, trains machine learning models, optimizes prediction accuracy, receives prediction errors from the big data modeling and analysis module and scheduling effects from the dynamic resource scheduling decision module, and outputs optimized model parameters.
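The data flow between the five modules can be sketched as one processing cycle. Everything in this example is hypothetical: the class name `ManagementCycle`, the callable signatures, and the simplification that the self-learning module receives the prediction and the issued command rather than measured prediction errors and scheduling effects.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class ManagementCycle:
    """One pass of data through the five modules (all callables are placeholders)."""
    acquire: Callable[[], Sequence[float]]           # data acquisition and processing
    predict: Callable[[Sequence[float]], float]      # big data modeling and analysis
    schedule: Callable[[float], str]                 # dynamic resource scheduling decision
    monitor: Callable[[Sequence[float]], List[str]]  # safety monitoring and optimization
    learn: Callable[[float, str], None]              # system self-learning

    def run(self) -> Tuple[str, List[str]]:
        data = self.acquire()
        alarms = self.monitor(data)       # safety checks run on the same sensor data
        e_pred = self.predict(data)
        command = self.schedule(e_pred)   # control command for building equipment
        self.learn(e_pred, command)       # feedback for later model optimization
        return command, alarms
```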
[0082] One specific application of this embodiment is:
[0083] 1. Scene Description
[0084] Large commercial complexes, including office areas, shopping malls, and hotels, require intelligent energy management. The complex is equipped with temperature and humidity sensors, smart meters, water meters, and pedestrian flow cameras to collect environmental parameters, energy consumption data, and pedestrian flow information in real time. Through the hybrid reinforcement learning energy consumption prediction model of this invention, dynamic energy consumption prediction and optimization can be achieved.
[0085] 2. Implementation Steps
[0086] Step 1: Data Acquisition and Preprocessing
[0087] Data collection:
[0088] Temperature and humidity sensor: Collects indoor and outdoor temperature and humidity data every 5 minutes;
[0089] Smart meters: record electricity consumption data every 15 minutes;
[0090] People counting camera: counts the number of people entering and exiting every 10 minutes;
[0091] Data cleaning:
[0092] Outlier removal: The power consumption data at 12 noon on a certain day was 5000kW, which far exceeded the normal range (mean ± 3 times standard deviation), and was identified as an outlier and removed.
[0093] Missing value imputation: When temperature and humidity data for a certain period are missing, linear interpolation is used to fill in the missing data.
[0094] Data standardization: Z-Score standardization is performed on temperature, humidity, energy consumption, and pedestrian flow data to eliminate differences in dimensions;
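Because the three sensor types report at different intervals, the series must be brought onto a common time grid before standardization. The sketch below assumes an hourly grid (matching the one-hour prediction horizon of the example) and averages each series within the hour; both choices are assumptions, since the embodiment does not state how the streams are aligned, and counts such as pedestrian flow might instead be summed. The function names are hypothetical.

```python
import numpy as np

def hourly_mean(samples, per_hour):
    """Average consecutive samples into hourly values.

    per_hour is 12 for 5-minute data, 6 for 10-minute data, 4 for 15-minute data.
    """
    x = np.asarray(samples, dtype=float)
    if x.size % per_hour:
        raise ValueError("sample count must cover whole hours")
    return x.reshape(-1, per_hour).mean(axis=1)

def z_score(values):
    """Z-Score standardization: subtract the mean and divide by the standard deviation."""
    x = np.asarray(values, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)
```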
[0095] Step 2: Hybrid Reinforcement Learning Energy Prediction
[0096] Model input:
[0097] Historical data: Temperature, humidity, energy consumption, and pedestrian traffic data for the past 24 hours;
[0098] Real-time data: Environmental parameters and pedestrian density at the current point in time;
[0099] Long Short-Term Memory Network Module:
[0100] Input historical 24-hour data to predict energy consumption for the next hour.
[0101] Output: Preliminary predicted value $\mathrm{LSTM}(X_t)$;
[0102] Q-Learning module:
[0103] State $s_t$: Current energy consumption, temperature and humidity, and crowd density;
[0104] Action $a_t$: Adjust the correction magnitude of the predicted value (±5%, ±10%);
[0105] Reward function $R$: $R = -\left|E_{\mathrm{actual}}(t+1) - E_{\mathrm{pred}}(t+1)\right|$;
[0106] Dynamic coefficient $\alpha(t)$:
[0107] Where $\Delta E(t)$ is the mean of recent prediction errors, and $k = 0.5$;
[0108] Final predicted value: $E_{\mathrm{pred}}(t+1) = \mathrm{LSTM}(X_t) + \alpha(t) \times Q(s_t, a_t)$;
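A tabular Q-Learning module needs a finite state space, so the continuous quantities in the state must be discretized. The sketch below shows one way to do this together with a greedy action choice over the four correction magnitudes listed in the example. The bin edges, the function names, and the greedy selection rule are all assumptions; the embodiment does not specify them.

```python
ACTIONS = (-0.10, -0.05, 0.05, 0.10)  # correction magnitudes from the example

def discretize(value, edges):
    """Return the index of the bin that value falls into, given ascending edges."""
    return sum(value >= e for e in edges)

def encode_state(energy_kw, temp_c, humidity_pct, density):
    """Map continuous readings to a hashable state; all bin edges are hypothetical."""
    return (
        discretize(energy_kw, (3000, 4000, 5000)),
        discretize(temp_c, (18, 24, 30)),
        discretize(humidity_pct, (40, 60)),
        discretize(density, (0.5, 1.5)),
    )

def greedy_action(q_table, state):
    """Pick the action with the highest Q value; unseen pairs count as 0.0."""
    return max(ACTIONS, key=lambda a: q_table.get((state, a), 0.0))
```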
[0109] Step 3: Dynamic Resource Scheduling and Decision Support
[0110] Energy consumption optimization:
[0111] If the system predicts that energy consumption will exceed the threshold (5000kW) in the next hour, it will automatically perform the following actions:
[0112] Increase the air conditioner temperature setting by 1℃ (from 24℃ to 25℃);
[0113] Turn off non-essential lighting equipment in the mall (such as corridor lights and advertising light boxes);
[0114] Elevator dispatching optimization:
[0115] The passenger flow prediction model predicts an evening peak at 6 pm, so the system increases the frequency of elevator operation in advance to reduce waiting time.
[0116] Step 4: System Self-Learning and Upgrading
[0117] Q-Learning strategy update:
[0118] Update the Q-value table based on the error between actual energy consumption and predicted values:
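[0119] $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta\left[R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$;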
[0120] Incremental training of LSTM models:
[0121] Every 24 hours, the LSTM network parameters are fine-tuned using the latest data to ensure the model adapts to dynamic changes;
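The 24-hour incremental training cycle can be sketched with a rolling buffer that triggers fine-tuning once a full day of new samples has arrived. The class name `IncrementalTrainer` and the `fine_tune` callback are hypothetical placeholders for whatever routine actually updates the LSTM parameters, and the default of 24 samples per day assumes hourly data.

```python
from collections import deque

class IncrementalTrainer:
    """Collect samples and call fine_tune once per full day of new data."""

    def __init__(self, fine_tune, samples_per_day=24):
        self.fine_tune = fine_tune  # hypothetical callback that updates the model
        self.samples_per_day = samples_per_day
        self.buffer = deque(maxlen=samples_per_day)  # keeps only the latest day
        self.since_last = 0

    def add(self, sample):
        """Store one sample; return True if fine-tuning was triggered."""
        self.buffer.append(sample)
        self.since_last += 1
        if self.since_last >= self.samples_per_day:
            self.fine_tune(list(self.buffer))
            self.since_last = 0
            return True
        return False
```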
[0122] Step 5: Verification and Effectiveness Evaluation
[0123] Evaluation indicators:
[0124] Prediction error: The MAE of the hybrid reinforcement learning energy consumption prediction model is 120kW, which is 20% lower than that of the traditional ARIMA model (MAE = 150kW);
[0125] Energy saving rate: Through dynamic scheduling, overall energy consumption is reduced by 8%;
[0126] User satisfaction: Elevator waiting time reduced by 15%, indoor comfort rating improved by 10%;
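The 20% reduction in prediction error follows directly from the two MAE values quoted above, as the following worked check shows:

```latex
\[
\frac{\mathrm{MAE}_{\mathrm{ARIMA}} - \mathrm{MAE}_{\mathrm{hybrid}}}{\mathrm{MAE}_{\mathrm{ARIMA}}}
= \frac{150\,\mathrm{kW} - 120\,\mathrm{kW}}{150\,\mathrm{kW}}
= 0.20 = 20\%
\]
```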
[0127] Comparative Test:
[0128] Compared with traditional models, the hybrid reinforcement learning energy consumption prediction model significantly improves the prediction accuracy under extreme weather conditions.
[0129] In the description of this specification, references to terms such as "an embodiment," "example," "specific example," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0130] The preferred embodiments of the present invention disclosed above are merely illustrative of the invention. These preferred embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the content of this specification. This specification selects and specifically describes these embodiments to better explain the principles and practical applications of the invention, thereby enabling those skilled in the art to better understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims
1. A smart city building cluster management method, characterized in that it includes the following steps: Step 1: Collect real-time building interior and exterior environmental parameters, energy consumption data, and pedestrian flow data through IoT sensors, and clean and preprocess the data to remove outliers and fill in missing values; Step 2: Based on the cleaned data, a hybrid reinforcement learning energy consumption prediction model is used for big data modeling and intelligent analysis, and the model weights are adjusted through dynamic coefficients; In step two, a long short-term memory network module is constructed, which takes as input energy consumption data $X_t$ from $n$ historical time periods and outputs the time-series prediction value using the following formula: $\mathrm{LSTM}(X_t) = f_{\mathrm{LSTM}}(X_{t-n}, X_{t-n+1}, \ldots, X_t)$; In the formula, $\mathrm{LSTM}(X_t)$ is the time-series prediction value based on historical data $X_t$, $f_{\mathrm{LSTM}}$ takes the energy consumption data $X_t$ from $n$ historical time periods as input and outputs the time-series prediction value, $X_{t-n}, X_{t-n+1}, \ldots, X_t$ is the historical data sequence from time $t-n$ to time $t$, and $n$ is the length of the historical data used for prediction; A Q-Learning module is introduced to define the current state $s_t$, the current action $a_t$, and the reward function $R$. The formula for the reward function $R$ is as follows: $R = -\left|E_{\mathrm{actual}}(t+1) - E_{\mathrm{pred}}(t+1)\right|$; In the formula, $E_{\mathrm{actual}}(t+1)$ is the actual energy consumption value at time $t+1$, and $E_{\mathrm{pred}}(t+1)$ is the predicted energy consumption value at time $t+1$; The formula for the Q-Learning correction value is as follows: $\alpha(t) \times Q(s_t, a_t)$; In the formula, $\alpha(t)$ represents the dynamic coefficient used to adjust the weight of the Q-Learning correction value, and $Q(s_t, a_t)$ represents the output of the Q-Learning module, that is, the Q value of performing action $a_t$ in state $s_t$; The formula for the weight $\alpha(t)$ is as follows: ; In the formula, $k$ is a sensitivity parameter used to control the rate of change of the dynamic coefficient, and $\Delta E(t)$ is the mean of recent prediction errors, used to reflect the deviation of the model's predictions; The final energy consumption prediction is obtained by combining the predictions from the Long Short-Term Memory network with the Q-Learning correction value: $E_{\mathrm{pred}}(t+1) = \mathrm{LSTM}(X_t) + \alpha(t) \times Q(s_t, a_t)$; In the formula, $E_{\mathrm{pred}}(t+1)$ is the predicted energy consumption value at time $t+1$; Step 3: Based on the prediction results, the system performs dynamic resource scheduling and decision support. When the predicted energy consumption is higher than the threshold, the air conditioning temperature is automatically adjusted or unnecessary lighting is turned off, and elevator scheduling is optimized in combination with the prediction of passenger flow. Step 4: The system continuously optimizes the model through online learning. The Q-Learning strategy updates the action value function based on real-time feedback, and the long short-term memory network model is incrementally trained every 24 hours. Step 5: Verify the model's effectiveness through comparative testing, and evaluate prediction error, energy saving rate, and user satisfaction.
2. The smart city building cluster management method according to claim 1, characterized in that, In step one, environmental parameters, energy consumption data, and pedestrian flow data of the building's interior and exterior are collected in real time through an Internet of Things (IoT) sensor network. The collected data is cleaned, outliers are removed, and missing data is filled in. Outliers exceeding the mean ± 3 times the standard deviation are removed using the 3σ principle, and missing data are filled in using time series linear interpolation. Finally, the multi-source heterogeneous data is normalized to provide input data for modeling and analysis.
3. The smart city building cluster management method according to claim 1, characterized in that, In step three, based on the prediction results of the hybrid reinforcement learning energy consumption prediction model, it is determined whether the future energy consumption will exceed a preset threshold. If the predicted energy consumption is lower than the threshold, the current device state is maintained. If the predicted energy consumption is higher than the threshold, a resource scheduling strategy is triggered, automatically increasing the air conditioning temperature setpoint by 1°C and turning off unnecessary lighting. The decision formula is as follows: $\mathrm{Decision} = \begin{cases} \mathrm{Adjust}, & E_{\mathrm{pred}}(t+1) > E_{\mathrm{threshold}} \\ \mathrm{Maintain}, & E_{\mathrm{pred}}(t+1) < E_{\mathrm{threshold}} \end{cases}$; In the formula, $E_{\mathrm{threshold}}$ is the preset energy consumption threshold, $\mathrm{Adjust}$ is the implementation of the energy-saving strategy, and $\mathrm{Maintain}$ is to keep the current device state unchanged.
4. The smart city building cluster management method according to claim 1, characterized in that, In step four, the latest energy consumption data, environmental parameters, and pedestrian flow data are collected every 24 hours for incremental model training. The Q-Learning strategy table is updated using the reward value, and the real-time update formula is as follows: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta\left[R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$; In the formula, $\eta$ is the learning rate, used to control the speed at which the Q-value is updated, $\gamma$ is the discount factor, used to measure the importance of future rewards, and $\max_{a} Q(s_{t+1}, a)$ is the maximum Q value obtained by selecting the optimal action $a$ in the next state $s_{t+1}$.
5. The smart city building cluster management method according to claim 1, characterized in that, In step five, the hybrid reinforcement learning energy consumption prediction model is compared with the traditional ARIMA model to verify the accuracy improvement and energy saving effect of the hybrid reinforcement learning energy consumption prediction. The performance of the updated model is evaluated through verification experiments. If the effect is worse than that of the traditional ARIMA model, the model is updated again in step four.
6. A smart city building cluster management system, characterized in that, It includes a data acquisition and processing module, a big data modeling and analysis module, a dynamic resource scheduling and decision-making module, a security monitoring and optimization module, and a system self-learning module; The data acquisition and processing module is used to collect building internal and external environmental parameters, energy consumption data, and pedestrian flow data in real time, and to clean, remove outliers, fill in missing values, and standardize the raw data. The big data modeling and analysis module receives standardized data from the data acquisition and processing module. Based on the hybrid reinforcement learning energy consumption prediction model, combined with the long short-term memory network module and the Q-Learning algorithm, it is used to realize dynamic and adaptive energy consumption prediction, and provide energy demand prediction and population distribution prediction for a period of time in the future. The dynamic resource scheduling decision module dynamically adjusts the status of air conditioning and lighting equipment and elevator operation strategies based on energy consumption and passenger flow prediction results, generates energy-saving strategies and emergency evacuation plans, assists management personnel in decision-making, receives prediction results from the big data modeling and analysis module, and outputs control commands to building equipment. The safety monitoring and optimization module receives real-time safety data from the sensor network, outputs alarm information to management personnel and the emergency system, monitors fire and safety hazards in real time through smoke detectors and camera equipment, triggers emergency response, analyzes user behavior data, and optimizes elevator operation mode and water and power supply strategies. 
The system's self-learning module continuously collects historical and real-time data, trains machine learning models, receives prediction errors from the big data modeling and analysis module and scheduling effects from the dynamic resource scheduling decision module, and outputs optimized model parameters.