Air conditioner load prediction method and system based on feature engineering and LSTM

By combining feature engineering with LSTM, the problems of abnormal data screening, insufficient feature dimensions, and model overfitting in air conditioning load forecasting are solved, and high-precision air conditioning load forecasting is achieved.

CN122243686A · Pending · Publication Date: 2026-06-19 · CHANGJIANG SURVEY PLANNING DESIGN & RES CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHANGJIANG SURVEY PLANNING DESIGN & RES CO LTD
Filing Date
2026-03-31
Publication Date
2026-06-19


Abstract

This invention provides a method and system for predicting air conditioning load based on feature engineering and LSTM. After outdoor temperature, relative humidity, and air conditioning load data are collected, K-means clustering is used to identify and handle outliers and missing values. The input dimensions are enriched by constructing a temperature and humidity index, lag features, rolling window statistics, difference features, and time features. A hybrid model combining linear regression and decision trees is used to evaluate feature importance and select key features. An LSTM neural network model, trained with the AdamW optimizer and an early stopping mechanism, is constructed as the air conditioning load prediction model. The invention addresses the limitations of existing outlier identification, insufficient feature dimensions, and model overfitting, achieving high-precision and stable air conditioning load prediction and providing reliable data support for energy-saving building operation.

Description

Technical Field

[0001] This invention belongs to the field of energy management technology and relates to a method and system for predicting air conditioning load.

Background Technology

[0002] Air conditioning systems account for over 30% of building operating energy consumption. Insufficient predictive analysis of actual operating energy consumption during the design phase, coupled with the fact that air conditioning systems often operate at partial load during actual operation, makes accurate prediction crucial for developing efficient operating plans. Currently, data-driven prediction methods based on IoT and AI technologies have been explored, such as SVR, random forests, artificial neural networks, and PCA-LSTM. However, each method has its advantages and disadvantages, and existing solutions like PCA-LSTM suffer from shortcomings such as not considering the impact of holidays and not enriching sample features through feature derivation.

[0003] The existing technologies have the following problems. First, the methods for screening abnormal data are limited: the interquartile range method and the 3σ method identify outliers only from the distribution of a single variable, so they cannot cope with complex outlier situations and struggle to identify hidden abnormal samples from a single dimension. Multi-dimensional comprehensive analysis is required. Second, the feature dimension is insufficient: only the collected meteorological parameters are used as training features, so the feature dimension is small, which easily leads to underfitting and poor generalization ability. Third, the model is prone to overfitting: when a Long Short-Term Memory (LSTM) neural network is used for prediction, the Adam optimizer is often used. However, air conditioning energy consumption data is strongly seasonal, noisy, and small in sample size, which easily leads to overfitting of the LSTM model.

Summary of the Invention

[0004] To address the limitations of abnormal data screening methods, the insufficient feature dimensions, and the tendency to overfit in air conditioning load forecasting described in the background art, this invention provides an air conditioning load forecasting method and system based on feature engineering and LSTM.

[0005] In a first aspect, the present invention provides an air conditioning load forecasting method based on feature engineering and LSTM, comprising:
S1. Use the building's mini weather station to collect outdoor temperature and relative humidity over a period of time, and use the energy meter in the energy station to collect the air conditioning system load over the same period.
S2. Use the K-means clustering algorithm to perform multi-dimensional clustering analysis on the collected outdoor temperature, relative humidity, and air conditioning system load data. Identify outliers by calculating the distance from each sample to its cluster center. Replace the outliers with the mean of adjacent time points, and impute missing values with the mean of the preceding and following time points, to obtain the preprocessed data.
S3. Perform feature derivation on the preprocessed data, including constructing temperature and humidity index features, lag features, rolling window statistical features, difference features, and time features.
S4. Use a hybrid model of linear regression and decision tree to evaluate the importance of all derived features. Weight and fuse the scores of the two models to obtain the comprehensive importance score of each feature. Use the elbow rule to determine the screening threshold, select the features with the highest comprehensive importance scores, and normalize them as input data for model training and validation.
S5. Based on the LSTM neural network model, use the sliding window technique to construct sequence samples, use the AdamW optimizer to update the model parameters, and use an early stopping mechanism to control the training process. Train the model on the above input data for model training and validation to obtain the air conditioning load prediction model.
S6. Obtain the outdoor temperature and relative humidity for a future period of time, and input them into the air conditioning load prediction model to predict the air conditioning load for that period. Perform inverse normalization on the prediction results to obtain the predicted value of the air conditioning load for the future period.

[0006] Further, step S2 includes:
S21. The collected outdoor temperature, relative humidity, and air conditioning system load data are standardized using StandardScaler, which transforms the data into a distribution with a mean of 0 and a standard deviation of 1 and eliminates the differences between units.
S22. For the standardized data, the elbow rule is used to determine the optimal number of clusters k. Iterate over k = 2 to k = 10, perform K-means clustering for each k, and calculate the within-cluster sum of squares (WCSS). Select the k at which the rate of decrease of WCSS changes abruptly as the optimal number of clusters.
S23. Train the K-means model with the optimal number of clusters, assign a cluster label to each sample in the dataset, calculate the Euclidean distance from each sample to the center of its cluster, and convert the distance into a Z-score to quantify how significantly the sample deviates from the cluster center.
S24. Set an anomaly detection threshold based on the Z-score, mark samples that exceed the threshold as abnormal samples, replace the identified outliers with the mean of adjacent time points, and impute missing values with the mean of the preceding and following time points to obtain the preprocessed data.

[0007] Furthermore, in step S3, the method for feature derivation from the preprocessed data includes:
S31. Based on the Steadman temperature and humidity index theory, the temperature and humidity index HI is calculated from the collected outdoor temperature T and relative humidity RH. The calculation formula is:
$$HI = -42.379 + 2.04901523\,T + 10.14333127\,RH - 0.22475541\,T\,RH - 6.83783\times10^{-3}\,T^{2} - 5.481717\times10^{-2}\,RH^{2} + 1.22874\times10^{-3}\,T^{2}\,RH + 8.5282\times10^{-4}\,T\,RH^{2} - 1.99\times10^{-6}\,T^{2}\,RH^{2}$$
The resulting value is used as the temperature and humidity index feature.
S32. For each existing feature in the preprocessed data, calculate the difference between the current feature value and the feature value one time interval earlier, for time intervals of 1 hour, 2 hours, 6 hours, 8 hours, and 24 hours respectively, to construct the lag features.
S33. Using a 24-hour rolling window, extract the mean and standard deviation of each existing feature in the preprocessed data within the window as the rolling window statistical features.
S34. Calculate the difference between adjacent time points in the time series of the preprocessed data at time intervals of 1 hour, 12 hours, and 24 hours respectively, to construct the difference features.
S35. Extract time features such as hour, weekday, weekend, and year from the timestamp information, and construct the time features by applying sine and cosine encoding to the periodic ones.
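The polynomial in S31 can be evaluated directly. The following is a minimal sketch, assuming T is given in degrees Fahrenheit and RH in percent; the function names `heat_index_f` and `celsius_to_fahrenheit` are illustrative and do not come from the text.

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def heat_index_f(temp_f: float, rh_pct: float) -> float:
    """Temperature and humidity index HI of step S31.

    Assumption: temp_f is in degrees Fahrenheit and rh_pct in percent.
    """
    t, rh = temp_f, rh_pct
    return (
        -42.379
        + 2.04901523 * t
        + 10.14333127 * rh
        - 0.22475541 * t * rh
        - 6.83783e-3 * t**2
        - 5.481717e-2 * rh**2
        + 1.22874e-3 * t**2 * rh
        + 8.5282e-4 * t * rh**2
        - 1.99e-6 * t**2 * rh**2
    )


if __name__ == "__main__":
    # 90 degrees Fahrenheit at 70 % relative humidity
    print(round(heat_index_f(90.0, 70.0), 2))
```

If the weather station reports degrees Celsius, the reading would be converted with `celsius_to_fahrenheit` before the index is computed.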

[0008] Furthermore, step S4 includes:
S41. Linear regression analysis path: For each derived feature, establish a univariate linear regression model with the target variable, calculate its regression coefficient, t-statistic, and p-value, and evaluate the independent linear explanatory power of each feature. Then perform a multicollinearity diagnosis, calculating the variance inflation factor to identify and process highly correlated features. Considering both the absolute value and the statistical significance of the regression coefficient, calculate the linear regression importance score of each feature.
S42. Decision tree analysis path: Train an ensemble model consisting of 3 decision trees, use the built-in mechanism of the tree model to calculate the importance score of each feature based on impurity reduction, and use the permutation importance method for verification. Combine the impurity-reduction importance score and the permutation importance result into the decision tree importance score of each feature.
S43. Comprehensive scoring and selection: The linear regression and decision tree importance scores are weighted and fused to obtain the comprehensive importance score of each feature. The elbow rule is used to determine the selection threshold, and the features with the highest comprehensive importance scores are selected.
S44. Normalize the selected feature data. Time-related features use sine and cosine encoding; other features use min-max normalization. The normalized feature data is used as input data for model training and validation.
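The fusion and selection in S43 can be illustrated with a short sketch. The text does not give the fusion weights, the scaling of the two score vectors, or the exact elbow criterion, so the choices below are assumptions: each score vector is min-max scaled, the two are combined with equal weights, and the cut is placed at the largest gap between consecutive sorted scores. The function names are hypothetical.

```python
import numpy as np


def fuse_importance(lin_scores, tree_scores, w_lin=0.5):
    """Weighted fusion of two importance vectors (assumed min-max scaled first)."""
    def minmax(v):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return np.zeros_like(v) if span == 0 else (v - v.min()) / span

    return w_lin * minmax(lin_scores) + (1.0 - w_lin) * minmax(tree_scores)


def select_by_elbow(scores):
    """Return indices of the features kept by an assumed elbow criterion.

    Features are sorted by score in descending order and the cut is placed
    at the largest drop between consecutive scores.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]
    sorted_scores = scores[order]
    drops = sorted_scores[:-1] - sorted_scores[1:]
    cut = int(np.argmax(drops)) + 1
    return order[:cut]


if __name__ == "__main__":
    lin = [0.9, 0.8, 0.1, 0.05]   # illustrative linear regression scores
    tree = [0.7, 0.9, 0.2, 0.0]   # illustrative decision tree scores
    fused = fuse_importance(lin, tree)
    print(select_by_elbow(fused))
```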

[0009] Furthermore, step S5 includes:
S51. Set the model training parameters: divide the input data into training and validation sets in a 7:3 ratio, with a sliding window size of 48 hours and a prediction step size of 6 hours. The LSTM neural network uses 3 hidden layers with 128 neurons each, a dropout ratio of 0.1, a learning rate of 0.005, an early stopping parameter of 20 epochs, a maximum of 200 training epochs, and a batch size of 64.
S52. Using the sliding window technique, the input data of a 48-hour historical window is used as the input feature sequence, and the air conditioning system load value 6 hours later is used as the prediction target to construct the sequence samples.
S53. During forward propagation, the input data first passes through a Gaussian noise layer to enhance the model's robustness to interference, then enters the LSTM neural network, which processes the complete sequence. The hidden states of the last 6 time steps are taken, and the prediction output is generated through a fully connected layer.
S54. The mean squared error (MSE) is used as the loss function, and the AdamW optimizer is used for parameter updates, with weight decay set to 1×10⁻⁶. -5 After each epoch of training, the loss value is calculated on the validation set. If the validation loss does not improve for 20 consecutive epochs, the early stopping mechanism is triggered, the best model parameters so far are saved, training is terminated, and the air conditioning load prediction model is obtained.
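The sample construction in S52 can be sketched as follows. This is a minimal illustration, assuming hourly rows (so that 48 rows span 48 hours), a target taken 6 rows after the last row of each window, and a chronological 7:3 split; the split order and the name `make_sequences` are assumptions, not details given in the text.

```python
import numpy as np


def make_sequences(features, target, window=48, horizon=6):
    """Build (X, y) pairs: X is a window of rows, y is the target `horizon` rows later."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(features) - window - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window and horizon")
    X = np.stack([features[i:i + window] for i in range(n)])
    # The last row of window i has index i + window - 1; the target lies horizon rows later.
    y = np.array([target[i + window + horizon - 1] for i in range(n)])
    return X, y


if __name__ == "__main__":
    feats = np.arange(200, dtype=float).reshape(100, 2)  # 100 hourly rows, 2 features
    load = np.arange(100, dtype=float)
    X, y = make_sequences(feats, load)
    split = int(0.7 * len(X))                            # assumed chronological split
    X_train, X_val = X[:split], X[split:]
    print(X.shape, y[0], len(X_train), len(X_val))
```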

[0010] Furthermore, in step S5, a unified random seed is set for the random number libraries used by the LSTM neural network.
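The training control of step S5 (fixed seed, patience of 20 epochs, at most 200 epochs) can be sketched without any deep learning framework. In the sketch below, `run_epoch` is a hypothetical callable that trains one epoch and returns the validation loss; the seed value 42 is an arbitrary assumption, and a real run would also seed the numerical and deep learning libraries in use.

```python
import random
from typing import Callable, Tuple


def fit_with_early_stopping(
    run_epoch: Callable[[int], float],
    patience: int = 20,
    max_epochs: int = 200,
    seed: int = 42,
) -> Tuple[int, float, int]:
    """Run epochs until the validation loss stalls for `patience` epochs.

    Returns (best_epoch, best_loss, epochs_run).
    """
    random.seed(seed)  # fixed seed for reproducibility (assumed value)
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        epochs_run = epoch + 1
        if val_loss < best_loss:
            # A real training loop would snapshot the model parameters here.
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss, epochs_run


if __name__ == "__main__":
    # Synthetic validation curve: improves until epoch 10, then degrades.
    print(fit_with_early_stopping(lambda e: abs(e - 10) + 1.0))
```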

[0011] Furthermore, step S6 also includes evaluating the predicted air conditioning load, with evaluation indicators including mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²).
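The five evaluation indicators follow their standard definitions and can be computed as in the sketch below. The function name `regression_metrics` is illustrative, and MAPE is reported in percent under the assumption that no true load value is zero.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE (in percent) and R^2 for two equal-length series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err**2))
    ss_res = float(np.sum(err**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": mse**0.5,
        # Assumes y_true contains no zeros; otherwise MAPE is undefined.
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
        "R2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    print(regression_metrics([100, 200, 300, 400], [110, 190, 310, 390]))
```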

[0012] In a second aspect, based on the above method, the present invention provides an air conditioning load forecasting system based on feature engineering and LSTM, comprising:
The past data acquisition module, which uses the building's mini weather station to collect outdoor temperature and relative humidity over a period of time, and uses the energy station's energy meter to collect the air conditioning system load over the same period.
The data preprocessing module, which uses the K-means clustering algorithm to perform multi-dimensional clustering analysis on the collected outdoor temperature, relative humidity, and air conditioning system load data. It identifies outliers by calculating the distance from each sample to its cluster center, replaces the identified outliers with the mean of adjacent time points, and imputes missing values with the mean of the preceding and following time points to obtain the preprocessed data.
The feature derivation module, which performs feature derivation on the preprocessed data, including constructing temperature and humidity index features, lag features, rolling window statistical features, difference features, and time features.
The input data acquisition module, which uses a hybrid model of linear regression and decision tree to evaluate the importance of all derived features. The scores of the two models are weighted and fused to obtain the comprehensive importance score of each feature. The elbow rule is used to determine the screening threshold, and the features with the highest comprehensive importance scores are selected and normalized as input data for model training and validation.
The air conditioning load prediction model construction module, which is based on the LSTM neural network model. It uses the sliding window technique to construct sequence samples, uses the AdamW optimizer to update the model parameters, and uses an early stopping mechanism to control the training process. The model is trained on the above input data for model training and validation to obtain the air conditioning load prediction model.
The air conditioning load prediction module, which obtains the outdoor temperature and relative humidity for a future period and inputs them into the air conditioning load prediction model to predict the air conditioning load for that period. The prediction results are then inversely normalized to obtain the predicted air conditioning load value for the future period.

[0013] In a third aspect, the present invention provides an electronic device, characterized in that it includes a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to implement the air conditioning load prediction method based on feature engineering and LSTM as described above.

[0014] In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the air conditioning load prediction method based on feature engineering and LSTM as described above.

[0015] Compared with the prior art, the present invention has the following advantages:
(1) Improved accuracy of abnormal data identification and processing: The K-means clustering algorithm performs multi-dimensional clustering analysis on outdoor temperature, relative humidity, and air conditioning load, and outliers are identified by calculating the distance from each sample to its cluster center. This overcomes the limitation of traditional univariate methods (such as the interquartile range method and the 3σ method), which rely only on the distribution of a single variable, and makes it possible to discover hidden multi-dimensional abnormal samples. At the same time, outliers are replaced with the mean of adjacent time points and missing values are interpolated with the mean of the preceding and following time points, which ensures data quality and provides more reliable input for subsequent modeling.
(2) Richer feature dimensions and greater model expressive power: Based on the preprocessed data, feature derivation constructs temperature and humidity index features, lag features, rolling window statistical features, difference features, and time features. This goes beyond existing methods that use only the original meteorological parameters, significantly increases the feature dimensions, reduces the risk of underfitting caused by insufficient features, and improves the model's adaptability and generalization performance under different working conditions.
(3) Optimized feature selection and more effective input: A hybrid model of linear regression and decision tree evaluates the importance of the derived features, and the scores of the two models are weighted and fused to obtain a comprehensive importance score. The elbow rule determines the screening threshold, and the key features are retained. This strategy eliminates redundant or low-contribution features, reduces model complexity, improves training efficiency, and reduces the interference of irrelevant features on prediction.
(4) More stable model training and better resistance to overfitting: In training the LSTM neural network model, the AdamW optimizer replaces the traditional Adam optimizer. Combined with the early stopping mechanism that controls the training process, it alleviates the overfitting caused by the strong seasonality, high noise, and small sample size of air conditioning load data, and improves the prediction accuracy and robustness of the model on the validation set.
(5) High-precision and reusable air conditioning load forecasting: With sequence samples constructed through the sliding window technique, the LSTM neural network model can fully learn the dynamic patterns in the time series. Combined with the aforementioned data preprocessing, feature engineering, and model optimization measures, this forms a complete forecasting process that stably outputs high-precision future air conditioning load forecasts, providing a reliable basis for building energy management.

[0016] In summary, this invention collects outdoor temperature, relative humidity, and air conditioning load data, then uses the K-means clustering algorithm to identify and handle outliers and missing values in multiple dimensions. It enriches the input dimensions by constructing a temperature and humidity index, lag features, rolling window statistics, difference features, and time features. A hybrid model combining linear regression and decision trees is used to evaluate feature importance and select key features. An LSTM neural network model, trained with the AdamW optimizer and an early stopping mechanism, is constructed as the air conditioning load prediction model. The invention addresses the limitations of existing outlier identification, insufficient feature dimensions, and model overfitting, achieving high-precision and stable air conditioning load prediction and providing reliable data support for energy-saving building operation.

Attached Figure Description

[0017] Figure 1 is a flowchart of the method of the present invention.

[0018] Figure 2 is a distribution map of the outliers discovered using the K-means clustering algorithm.

[0019] Figure 3 is a system architecture diagram of the present invention.

Detailed Implementation

[0020] To make the technical problems to be solved, the technical solutions, and the beneficial effects of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and are not intended to limit the scope of this application.

[0021] Example 1: A method for predicting air conditioning load based on feature engineering and LSTM, the flowchart of which is shown in Figure 1. The specific steps are as follows.

[0022] S1. Use the building's mini weather station to collect outdoor temperature and relative humidity data over a period of time, and use the energy station's energy meter to collect air conditioning system load data over a period of time.

[0023] Specifically, the collection period for all data is uniformly set to 1 hour; that is, one set of outdoor temperature, relative humidity, and air conditioning system load values is recorded every hour, forming a time series dataset with hourly granularity.

[0024] S2. The K-means clustering algorithm is used to perform multi-dimensional clustering analysis on the collected outdoor temperature, relative humidity, and air conditioning system load data. Outliers are identified by calculating the distance from each sample to its cluster center. The outliers are replaced with the mean of adjacent time points, and missing values are imputed with the mean of the preceding and following time points, to obtain the preprocessed data.

[0025] Specifically, after the raw data is collected, it is first cleaned to improve its quality. The specific process is as follows:
S21. The collected outdoor temperature, relative humidity, and air conditioning system load data are standardized using StandardScaler, which transforms the data into a distribution with a mean of 0 and a standard deviation of 1. This eliminates the differences between units and ensures that all variables contribute fairly to the subsequent clustering.
S22. For the standardized data, the elbow rule is used to determine the optimal number of clusters k. Iterate over k = 2 to k = 10, perform K-means clustering for each k, and calculate the within-cluster sum of squares (WCSS). WCSS is the cumulative sum of the squared distances from each sample to the center of its cluster; the smaller the value, the more compact the clusters. Calculate how the rate of decrease of WCSS changes with k, and select the k at which this rate changes abruptly as the optimal number of clusters. Specifically, the candidate numbers of clusters are set as k = 2, 3, …, $K_{max}$. For each k, perform K-means clustering N times with a fixed random seed (N = 10 in this embodiment), calculate the WCSS of each run, and take the average of the N results as the WCSS(k) for that k. Averaging the WCSS over multiple clustering runs reduces the sensitivity of the K-means algorithm to the random selection of initial cluster centers. On this basis, calculate the first difference $\Delta WCSS(k) = WCSS(k) - WCSS(k+1)$ and the second difference $\Delta^{2} WCSS(k) = \Delta WCSS(k) - \Delta WCSS(k+1)$, and take the k that maximizes $\Delta^{2} WCSS(k)$ as the optimal number of clusters. This determines the inflection point of the elbow rule programmatically and avoids the subjectivity of manual observation.
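The second-difference selection in S22 can be sketched as follows. The sketch assumes the averaged WCSS(k) values have already been computed for consecutive k; the function name `elbow_k` and the sample WCSS values are illustrative. The second difference is indexed by its leftmost k, following the definition in the text.

```python
import numpy as np


def elbow_k(ks, wcss):
    """Pick k maximizing the second difference of WCSS, as defined in step S22.

    ks   : consecutive candidate cluster counts, e.g. [2, 3, ..., K_max]
    wcss : averaged WCSS value for each entry of ks
    """
    ks = list(ks)
    w = np.asarray(wcss, dtype=float)
    if len(ks) != len(w) or len(w) < 3:
        raise ValueError("need at least three (k, WCSS) pairs of equal length")
    d1 = w[:-1] - w[1:]      # first difference:  WCSS(k) - WCSS(k+1)
    d2 = d1[:-1] - d1[1:]    # second difference: dWCSS(k) - dWCSS(k+1)
    return ks[int(np.argmax(d2))]


if __name__ == "__main__":
    ks = range(2, 8)
    wcss = [1000.0, 700.0, 300.0, 250.0, 220.0, 200.0]  # made-up values
    print(elbow_k(ks, wcss))
```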

[0026] S23. Train a K-means model with the optimal number of clusters, assign a cluster label to each sample in the dataset, calculate the Euclidean distance from each sample to its cluster center, and convert this distance into a Z-score to quantify how significantly the sample deviates from the cluster center. Samples in abnormal operating states form outliers far from the normal data clusters in the multidimensional space, and their Z-scores are significantly higher, as shown in Figure 2.

[0027] Specifically, based on the optimal number of clusters determined in S22, K-means clustering is performed on the standardized three-dimensional feature data (temperature, humidity, load) to obtain the clusters and their cluster centers. For the i-th sample within cluster c, the Euclidean distance from the sample to its cluster center is calculated as: $d_i = \sqrt{\sum_{j=1}^{3} (x_{ij} - \mu_{cj})^{2}}$, where $x_{ij}$ is the value of the sample on the j-th standardized feature dimension and $\mu_{cj}$ is the value of the cluster center in the j-th dimension; j = 1, 2, 3 correspond to the standardized temperature, humidity, and load features, respectively. Standardized features are used to calculate the Euclidean distance so that different physical units do not influence the distance metric.

[0028] Since the distribution of distances from samples to their cluster center may differ significantly between clusters, the Z-score is calculated separately for each cluster to improve the accuracy of outlier identification. With $d_i$ denoting the distance from sample i to its cluster center, the mean distance within cluster c is calculated as: $\bar{d}_c = \frac{1}{n_c} \sum_{i \in c} d_i$, and the standard deviation as: $\sigma_c = \sqrt{\frac{1}{n_c} \sum_{i \in c} (d_i - \bar{d}_c)^{2}}$, where $n_c$ is the number of samples within cluster c. The Z-score of each sample is calculated as: $Z_i = \frac{d_i - \bar{d}_c}{\sigma_c}$. In this embodiment, the anomaly detection threshold is set to 3; that is, a sample with $Z_i > 3$ is determined to be an abnormal operating state sample. This threshold means that the distance of the sample from the cluster center deviates from the average level within the cluster by more than 3 standard deviations, which corresponds to an extreme probability of about 0.27% under the normal distribution assumption, and provides a clear quantitative basis for the abnormal state identification in the subsequent step S24.
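The per-cluster Z-score described above can be sketched with NumPy, given cluster labels and centers from any K-means implementation. The sketch assumes the population standard deviation (division by the cluster size) and flags samples whose Z-score exceeds 3; the function names are illustrative.

```python
import numpy as np


def cluster_distance_zscores(X, labels, centers):
    """Return (distances, z_scores), with Z-scores computed separately per cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=int)
    centers = np.asarray(centers, dtype=float)
    dist = np.linalg.norm(X - centers[labels], axis=1)  # Euclidean distance to own center
    z = np.zeros_like(dist)
    for c in np.unique(labels):
        mask = labels == c
        mu = dist[mask].mean()
        sigma = dist[mask].std()  # population standard deviation (assumption)
        if sigma > 0:
            z[mask] = (dist[mask] - mu) / sigma
    return dist, z


def flag_anomalies(z, threshold=3.0):
    """Mark samples whose Z-score exceeds the threshold as abnormal."""
    return np.asarray(z) > threshold


if __name__ == "__main__":
    # One cluster centered at the origin: 19 samples at distance 1, one at distance 10.
    X = np.array([[1.0, 0.0, 0.0]] * 19 + [[10.0, 0.0, 0.0]])
    labels = np.zeros(20, dtype=int)
    centers = np.array([[0.0, 0.0, 0.0]])
    dist, z = cluster_distance_zscores(X, labels, centers)
    print(np.flatnonzero(flag_anomalies(z)))
```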

[0029] S24. Set an anomaly detection threshold based on the Z-score, mark samples that exceed the threshold as abnormal samples, replace the identified outliers with the mean of adjacent time points, and impute missing values with the mean of the preceding and following time points to obtain the preprocessed data.

[0030] Specifically, data repair and feature enhancement are performed on the abnormal operating state samples identified in S23. During the data repair phase, for each abnormal sample (a sample whose Z-score exceeds the threshold, which is 3 in this embodiment), normal samples are searched for within W sampling intervals before and after the time point t of the sample (in this embodiment, W = 3 and the sampling interval is 1 hour, i.e., 3 hours before and after). Linear interpolation is used to replace the abnormal values. The linear interpolation formula is: $x(t) = x_l + \frac{x_r - x_l}{t_r - t_l}\,(t - t_l)$, where $x_l$ and $x_r$ are the corresponding feature values (temperature, humidity, or load) of the nearest normal samples to the left and right of the anomaly point, t is the timestamp of the abnormal sample, and $t_l$ and $t_r$ are the timestamps of the left and right normal samples, respectively. This interpolation is performed independently for the three feature channels: temperature, humidity, and load.

[0031] The rules for handling boundary cases and consecutive anomalies are as follows. When there are no normal samples on one side of an anomaly point within W sampling intervals, the search range is gradually expanded towards that side until the nearest available normal sample is found. If the anomaly point is located at the beginning of the time series, so that no normal samples are available on the left, the two nearest normal samples on the right, $x_1$ (timestamp $t_1$) and $x_2$ (timestamp $t_2$), are used for linear extrapolation: $x(t) = x_1 + \frac{x_2 - x_1}{t_2 - t_1}\,(t - t_1)$. Similarly, at the end of the time series, the two nearest normal samples on the left are used for extrapolation. When multiple consecutive outliers form an outlier segment, the nearest normal samples at both ends of the segment are used as endpoints, and piecewise linear interpolation is performed on each point in the segment according to its time position, so that the repaired data transitions smoothly along the time axis.
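The repair rules above (interpolation between the nearest normal neighbors, extrapolation at the series boundaries, and shared endpoints for consecutive anomalies) can be sketched for a single feature channel. Because the search range is expanded until a normal sample is found, the sketch simply uses the nearest normal sample on each side; the function name `repair_series` is hypothetical.

```python
import numpy as np


def repair_series(t, x, is_anomaly):
    """Replace anomalous values of one channel by linear inter- or extrapolation."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float).copy()
    bad = np.asarray(is_anomaly, dtype=bool)
    good_idx = np.flatnonzero(~bad)
    if good_idx.size < 2:
        raise ValueError("at least two normal samples are required")
    for i in np.flatnonzero(bad):
        left = good_idx[good_idx < i]
        right = good_idx[good_idx > i]
        if left.size and right.size:
            a, b = left[-1], right[0]    # nearest normal neighbor on each side
        elif right.size:
            a, b = right[0], right[1]    # start of series: extrapolate from the right
        else:
            a, b = left[-2], left[-1]    # end of series: extrapolate from the left
        slope = (x[b] - x[a]) / (t[b] - t[a])
        x[i] = x[a] + slope * (t[i] - t[a])
    return x


if __name__ == "__main__":
    t = np.arange(8)                     # hourly timestamps
    x = [99.0, 2.0, 3.0, 50.0, 60.0, 6.0, 7.0, -5.0]
    bad = [True, False, False, True, True, False, False, True]
    print(repair_series(t, x, bad))
```

The same function would be applied separately to the temperature, humidity, and load channels.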

[0032] In addition, because the difference in building occupancy between weekends and weekdays significantly affects air conditioning energy consumption, a column representing the day of the week at each time point is added to the dataset. This attribute is sine and cosine encoded to convert the discrete day-of-week information into a continuous numerical representation that the LSTM neural network model can learn from more easily.

[0033] During the feature enhancement stage, sine and cosine cyclic encoding is applied to the weekday and hour information, which are the periodic time features, to avoid feeding periodic features into the model as linear values. The encoding formulas for the weekday feature are: $\text{weekday}_{\sin} = \sin\left(\frac{2\pi \cdot \text{weekday}}{7}\right)$, $\text{weekday}_{\cos} = \cos\left(\frac{2\pi \cdot \text{weekday}}{7}\right)$, where `weekday` is the weekday number, with values from 0 to 6 corresponding to Monday through Sunday. With this dual-channel sine and cosine encoding, Sunday (`weekday=6`) and Monday (`weekday=0`) are close to each other in the encoding space, which preserves the periodic continuity of the weekday feature. The encoding formulas for the hour feature are: $\text{hour}_{\sin} = \sin\left(\frac{2\pi \cdot \text{hour}}{24}\right)$, $\text{hour}_{\cos} = \cos\left(\frac{2\pi \cdot \text{hour}}{24}\right)$, where `hour` is the hour number, ranging from 0 to 23. Similarly, this encoding maintains continuity between 23:00 and 0:00 in the encoding space. After this processing, the two original scalar features, day of the week and hour, are each replaced with two encoding channels, and the final feature vector of each sample includes temperature, humidity, load, $\text{weekday}_{\sin}$, $\text{weekday}_{\cos}$, $\text{hour}_{\sin}$, and $\text{hour}_{\cos}$, for 7 dimensions in total.
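The cyclic encoding can be checked numerically with a short sketch. It assumes a period of 7 for the weekday number and 24 for the hour number; the helper names are illustrative.

```python
import math


def cyclic_encode(value: float, period: float):
    """Map a periodic value to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a: float, b: float, period: float) -> float:
    """Euclidean distance between two values in the encoded space."""
    sa, ca = cyclic_encode(a, period)
    sb, cb = cyclic_encode(b, period)
    return math.hypot(sa - sb, ca - cb)


if __name__ == "__main__":
    # Sunday (6) is as close to Monday (0) as Monday is to Tuesday (1).
    print(encoded_distance(6, 0, 7), encoded_distance(0, 1, 7))
    # 23:00 and 0:00 are neighbors in the encoded space.
    print(encoded_distance(23, 0, 24), encoded_distance(0, 12, 24))
```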

[0034] S3. Perform feature derivation on the preprocessed data, including constructing temperature and humidity index features, lag features, rolling window statistical features, difference features, and time features.

[0035] Specifically, feature derivation from the preprocessed data includes the following steps. S31. Based on the Steadman temperature and humidity index theory, the temperature and humidity index HI is calculated from the collected outdoor temperature T and relative humidity RH. The calculation formula is:

$$HI = -42.379 + 2.04901523\,T + 10.14333127\,RH - 0.22475541\,T\,RH - 6.83783\times10^{-3}\,T^{2} - 5.481717\times10^{-2}\,RH^{2} + 1.22874\times10^{-3}\,T^{2}\,RH + 8.5282\times10^{-4}\,T\,RH^{2} - 1.99\times10^{-6}\,T^{2}\,RH^{2}$$

where T is the temperature in degrees Fahrenheit (°F) and RH is the relative humidity (%). The resulting HI is used as the temperature and humidity index feature; it reflects the combined effect of temperature and humidity on human comfort, and adding it to the feature set helps the LSTM neural network model characterize the relationship between air conditioning load and environmental state more accurately.

S32. For the existing features in the preprocessed data, calculate the difference between the current feature value and the feature value one time interval earlier, for time intervals of 1 hour, 2 hours, 6 hours, 8 hours and 24 hours respectively, and construct the lag features; the lag features reflect the influence of the data trend in the preceding period on the current value and give the LSTM neural network model contextual information in the time dimension. Specifically, for the three continuous features (outdoor temperature, relative humidity, and air conditioning load) in the data preprocessed by S24, multi-time-scale lag features are constructed. Let the data acquisition interval be Δt (Δt = 15 minutes in this embodiment); the actual time interval corresponding to n time steps is then n × Δt. The lag intervals selected in this step and their step counts are as follows: 1 hour corresponds to $n = 60/15 = 4$ time steps; 24 hours corresponds to $n = 24 \times 60/15 = 96$ time steps.
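The S31 formula can be checked numerically with a short sketch. The function names `fahrenheit` and `heat_index` are illustrative, not from the patent, and the Celsius conversion helper assumes the sensors log temperature in °C, which the text does not state.

```python
def fahrenheit(celsius):
    """Convert Celsius to Fahrenheit (assumption: raw sensor data is in Celsius)."""
    return celsius * 9.0 / 5.0 + 32.0


def heat_index(t_f, rh):
    """Temperature and humidity index HI from S31; t_f in Fahrenheit, rh in percent."""
    return (-42.379
            + 2.04901523 * t_f
            + 10.14333127 * rh
            - 0.22475541 * t_f * rh
            - 6.83783e-3 * t_f ** 2
            - 5.481717e-2 * rh ** 2
            + 1.22874e-3 * t_f ** 2 * rh
            + 8.5282e-4 * t_f * rh ** 2
            - 1.99e-6 * t_f ** 2 * rh ** 2)


# Example: 90 °F at 70 % relative humidity.
hi = heat_index(90.0, 70.0)
print(round(hi, 2))
```

Because the polynomial is fitted for Fahrenheit input, feeding Celsius values directly would give meaningless results, which is why the conversion is kept as a separate, explicit step.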

[0036] For any feature x, the lag difference feature with time step n is defined as $\Delta x_t^{(n)} = x_t - x_{t-n}$, where $x_t$ is the feature value at the current time point t and $x_{t-n}$ is the feature value n time steps earlier (i.e., one n×Δt interval before t). This difference feature reflects the magnitude and direction of the feature's change over the specified time interval.

[0037] For the three features of outdoor temperature T, relative humidity H, and air conditioning load L, the lag difference features for n=4 (corresponding to 1 hour) and n=96 (corresponding to 24 hours) are calculated, generating a total of 3×2=6 new features.

[0038] Boundary handling: In the initial stage of the time series, when t < n, $x_{t-n}$ does not exist. In that case the corresponding lag difference feature value is set to 0 (indicating no change), and the sample is retained in subsequent model training. This padding strategy applies to the boundary cases of all lag features and avoids shrinking the training data by discarding samples.
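A minimal sketch of the lag difference with the zero-padding boundary rule, using plain Python lists. The name `lag_difference` and the sample temperature series are illustrative only.

```python
def lag_difference(series, n):
    """Return x_t - x_{t-n} for every t; positions with t < n are padded with 0.0."""
    return [series[t] - series[t - n] if t >= n else 0.0
            for t in range(len(series))]


STEPS_PER_HOUR = 60 // 15  # 15-minute sampling gives 4 steps per hour

# Hypothetical temperature series rising 0.5 degrees per step.
temps = [20.0 + 0.5 * t for t in range(8)]
lag_1h = lag_difference(temps, 1 * STEPS_PER_HOUR)
print(lag_1h)
```

The first four entries are padded zeros, so the output keeps the same length as the input and no sample is dropped.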

[0039] S33. Using a 24-hour rolling window, extract the mean and standard deviation of each existing feature in the preprocessed data within the window as the rolling window statistical features; the rolling mean can smooth short-term fluctuations and reflect the overall trend of the data, while the rolling standard deviation can characterize the dispersion and volatility of the data within the window period. For the four features in the preprocessed data of S24 (outdoor temperature T, relative humidity H, air conditioning load L, and temperature and humidity index HI constructed in S31), rolling statistical features are constructed based on the sliding window method. Let the data acquisition interval be Δt (Δt = 15 minutes in this embodiment), and the sliding window size be M time steps (the window length is 24 hours in this embodiment, corresponding to M = 24 × 60 / 15 = 96 time steps). The window moves forward in a time-point sliding manner, that is, it moves forward by one sampling interval (Δt = 15 minutes) each time.

[0040] For any feature x, the rolling mean at time point t is defined as $\mu_t = \frac{1}{M}\sum_{j=0}^{M-1} x_{t-j}$, and the rolling standard deviation is defined as $\sigma_t = \sqrt{\frac{1}{M}\sum_{j=0}^{M-1}\left(x_{t-j} - \mu_t\right)^2}$, where M is the number of time steps contained in the window and $x_{t-j}$ is the feature value obtained by backtracking j time steps from the current time point t. The rolling mean and rolling standard deviation are calculated for the four features (outdoor temperature, relative humidity, air conditioning load, and temperature and humidity index), generating a total of 4 × 2 = 8 new features.

[0041] For boundary handling: In the initial stage of the time series (t < M-1), when there are fewer than M time steps of available data within the window, a partial window calculation strategy is adopted. That is, the mean and standard deviation are calculated using the currently available t+1 data points. When there is only one data point (t=0), the rolling mean is equal to the feature value of that point, and the rolling standard deviation is set to 0. This strategy ensures that all time points are retained in the dataset and that samples are not discarded due to window boundary issues.
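The partial-window strategy can be sketched as follows. The function name `rolling_mean_std` is illustrative, and the sketch assumes the population form of the standard deviation (dividing by the number of available points), which the text does not specify; with that form a single data point naturally yields 0.

```python
import math


def rolling_mean_std(series, window):
    """Rolling mean and standard deviation with partial windows at the start.

    Assumption: population standard deviation (divide by the number of points).
    """
    means, stds = [], []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1): t + 1]  # at most `window` points
        mu = sum(chunk) / len(chunk)
        var = sum((v - mu) ** 2 for v in chunk) / len(chunk)
        means.append(mu)
        stds.append(math.sqrt(var))
    return means, stds


means, stds = rolling_mean_std([1.0, 3.0, 5.0, 7.0], window=3)
print(means, stds)
```

In the embodiment the window would be 96 steps (24 hours at 15-minute sampling); a window of 3 is used here only to keep the example readable.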

[0042] S34. For the preprocessed data, calculate the difference between time points in the time series separated by intervals of 1 hour, 12 hours and 24 hours respectively, and construct the difference features; the difference operation removes the trend component of the time series and highlights the rate of change and fluctuation pattern of the data. Specifically, for the three continuous features (outdoor temperature T, relative humidity H, and air conditioning load L) in the data preprocessed by S24, multi-time-scale difference features are constructed. Let the data acquisition interval be Δt (Δt = 15 minutes in this embodiment); each difference feature is calculated at the step size corresponding to its time interval. The difference intervals selected in this step and their step sizes are as follows: 1 hour corresponds to $n = 60/15 = 4$ time steps; 12 hours corresponds to $n = 12 \times 60/15 = 48$ time steps; 24 hours corresponds to $n = 24 \times 60/15 = 96$ time steps.

[0043] For any feature x, the difference feature with step size n is defined as $D_t^{(n)} = x_t - x_{t-n}$, where $x_t$ is the feature value at the current time point t and $x_{t-n}$ is the feature value n time steps earlier (i.e., one n×Δt interval before t). The difference is a signed value: a positive value indicates that the feature has increased over the interval, and a negative value that it has decreased. This difference feature is calculated in the same way as the lag difference feature in S32, but over a different set of time scales (a 12-hour scale is added in this step), to capture the changing trend of the feature over different time spans.

[0044] Specifically, for the three characteristics of outdoor temperature T, relative humidity H, and air conditioning load L, differential features of n=4 (1 hour), n=48 (12 hours), and n=96 (24 hours) are calculated respectively, generating a total of 3×3=9 new features.

[0045] Boundary handling: as in S32, when t < n, $x_{t-n}$ does not exist; the corresponding difference feature value is set to 0 and the sample is retained for subsequent training.

[0046] S35. Use timestamp information to extract time features such as hour, weekday, whether it is a weekend, and year, and perform sine and cosine encoding on the periodic time features to construct time features; time features can help the model capture the regular change patterns of air conditioning load over time, such as intraday, intraweek, and intrayear cycles.

[0047] Specifically, time features are classified and encoded. Based on their mathematical properties, time features are divided into two categories, periodic features and aperiodic features, each with its own encoding method. (I) Sine and cosine encoding of periodic time features: sine and cosine dual-channel encoding is used for time features with an inherent period, to preserve their cyclical continuity. The specific encoding formulas are as follows. Hourly feature encoding (period P=24): $\text{sin\_hour} = \sin(2\pi \cdot \text{hour}/24)$, $\text{cos\_hour} = \cos(2\pi \cdot \text{hour}/24)$, where hour is the hour number, ranging from 0 to 23. This encoding ensures that 23:00 and 0:00 remain adjacent in the encoding space.

[0048] Weekday feature encoding (period P=7): $\text{sin\_week} = \sin(2\pi \cdot \text{weekday}/7)$, $\text{cos\_week} = \cos(2\pi \cdot \text{weekday}/7)$, where weekday is the weekday number, with values from 0 to 6 corresponding to Monday to Sunday, respectively. With sine and cosine dual-channel encoding, Sunday (weekday=6) and Monday (weekday=0) are numerically close in the encoding space.

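[0049] Monthly feature encoding (period P=12): $\text{sin\_month} = \sin(2\pi \cdot \text{month}/12)$, $\text{cos\_month} = \cos(2\pi \cdot \text{month}/12)$, where month is the month number, ranging from 1 to 12.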

[0050] The general formula for the above encoding is as follows: for a time feature with period P, the sine and cosine codes of its value v are $\sin(2\pi v/P)$ and $\cos(2\pi v/P)$, respectively. After sine and cosine encoding, each of the three scalar features (hour, day of the week, and month) is replaced by two encoding channels, generating a total of 6 encoded features.

[0051] (II) Processing of non-periodic time features: "whether it is a weekday" and "whether it is a weekend" are mutually exclusive, so only one of them is retained as a binary feature. In this embodiment, the "whether it is a weekend" feature is retained and encoded as follows: weekend (Saturday and Sunday) takes the value 1, and weekday (Monday to Friday) takes the value 0. This feature is derived directly from the weekday number: it takes the value 1 when weekday ≥ 5, and 0 otherwise.

[0052] The year feature is a linearly increasing, non-periodic feature; feeding it into the model directly may introduce unnecessary trend bias. In this embodiment, the year feature is standardized: $\text{year\_norm} = (\text{year} - \text{year}_{\min})/(\text{year}_{\max} - \text{year}_{\min})$, where $\text{year}_{\min}$ and $\text{year}_{\max}$ are the minimum and maximum values of the year in the training dataset, respectively, so that the year is mapped to the interval [0,1].

[0053] After the above processing, the time features ultimately include: sin_hour, cos_hour, sin_week, cos_week, sin_month, cos_month, whether it is a weekend (binary), and the normalized year, for a total of 8 dimensions.
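The eight time features of S35 can be sketched with the standard library. The helper names `cyclic` and `time_features` and the dictionary keys `sin_month`, `cos_month`, `is_weekend`, and `year_norm` are illustrative; `datetime.weekday()` already uses the 0 = Monday to 6 = Sunday convention described above.

```python
import math
from datetime import datetime


def cyclic(value, period):
    """Sine and cosine codes of `value` for a feature with the given period."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(ts, year_min, year_max):
    """Build the 8 time features for one timestamp.

    year_min and year_max are assumed to come from the training set.
    """
    sin_hour, cos_hour = cyclic(ts.hour, 24)
    sin_week, cos_week = cyclic(ts.weekday(), 7)
    sin_month, cos_month = cyclic(ts.month, 12)
    span = year_max - year_min
    return {
        "sin_hour": sin_hour, "cos_hour": cos_hour,
        "sin_week": sin_week, "cos_week": cos_week,
        "sin_month": sin_month, "cos_month": cos_month,
        "is_weekend": 1 if ts.weekday() >= 5 else 0,
        "year_norm": (ts.year - year_min) / span if span else 0.0,
    }


# 6 July 2024 is a Saturday.
feats = time_features(datetime(2024, 7, 6, 23, 0), year_min=2023, year_max=2025)
print(feats)
```

The two-channel encoding is what keeps 23:00 and 0:00 close: their points on the unit circle are one twenty-fourth of a turn apart, whereas the raw hour numbers differ by 23.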

[0054] Using the five feature derivation methods described above, 60 new features can be added to the original three features, resulting in a total of 63 candidate features.

[0055] S4. Use a hybrid model of linear regression and decision tree to evaluate the importance of all derived features. Weight the scores of the two models to obtain the comprehensive importance score of each feature. Use the elbow rule to determine the screening threshold to select the features with the highest comprehensive importance scores and normalize them as input data for model training and validation.

[0056] In this embodiment, the 63 candidate features produced by feature derivation may contain redundant and noisy features, and feeding all of them into the model directly may lead to overfitting. Therefore, this embodiment uses a hybrid model combining linear regression and decision trees for feature selection, so that both linear and nonlinear relationships are taken into account when evaluating feature importance. The specific process is as follows: S41. Linear regression analysis path: First, perform univariate analysis: establish a simple linear regression model between each derived feature and the target variable (air conditioning system load), calculate its regression coefficient, t-statistic, and p-value, and assess the independent linear explanatory power of each feature. Then, perform multicollinearity diagnosis: calculate the variance inflation factor (VIF) of each feature to identify and label variables that are highly linearly correlated with other features. Finally, calculate the linear regression importance score of each feature from the absolute value of its regression coefficient and its statistical significance; the absolute value of the regression coefficient reflects the effect size, and statistical significance is measured by the p-value (the smaller the p-value, the more important the feature). A high linear regression importance score indicates that the feature makes a strong independent explanatory contribution to the target variable within the linear framework.

[0057] Specifically, for all candidate features processed by S35, a univariate linear regression model with an intercept term (OLS, ordinary least squares) is constructed one by one to evaluate the individual explanatory power of each feature for air conditioning load forecasting.

[0058] For the j-th candidate feature, the following univariate linear regression model is established: $y = \beta_{0j} + \beta_j x_j + \varepsilon_j$, where y is the air conditioning load (target variable), $x_j$ is the j-th candidate feature, $\beta_{0j}$ is the intercept term, $\beta_j$ is the regression coefficient, and $\varepsilon_j$ is the residual term. Before modeling, each feature and the target variable are standardized using the Z-score (subtracting the mean and dividing by the standard deviation) to eliminate the influence of dimensional differences on the magnitude of the regression coefficients, making the regression coefficients of different features comparable.

[0059] After estimating the regression coefficients using OLS, the t-statistic of each regression coefficient is calculated: $t_j = \hat{\beta}_j / SE(\hat{\beta}_j)$, where $SE(\hat{\beta}_j)$ is the standard error of the regression coefficient. The corresponding two-sided p-value $p_j$ is calculated from the t-statistic. The smaller the p-value, the more significant the linear relationship between the feature and the target variable.

[0060] The formula for calculating the linear regression importance score is: $S_j^{LR} = |\hat{\beta}_j| \times (1 - p_j)$, where $|\hat{\beta}_j|$ is the absolute value of the standardized regression coefficient (reflecting the magnitude of the effect) and $(1 - p_j)$ is the statistical confidence weight derived from the p-value (the smaller the p-value, the closer the weight is to 1). This formula multiplies the effect size by the statistical significance, fusing the two quantitatively: a high score is obtained only when the feature has both a large regression coefficient and a significant p-value.

[0061] To facilitate subsequent integration with the decision tree score, the linear regression importance scores are Min-Max normalized and mapped to the [0,1] interval: $\tilde{S}_j^{LR} = (S_j^{LR} - S_{\min}^{LR})/(S_{\max}^{LR} - S_{\min}^{LR})$. At the same time, the variance inflation factor (VIF) of each feature is calculated to detect multicollinearity: $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination obtained by a multiple linear regression with the j-th feature as the dependent variable and all other features as independent variables. In this embodiment, the VIF threshold is set to 10. If $VIF_j > 10$, the feature is considered highly collinear with other features and is down-weighted in subsequent feature selection (its linear regression importance score is multiplied by a decay coefficient of 0.5).
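The scoring in S41 can be sketched as below, taking the regression coefficients, p-values, and VIF values as already computed inputs. Two points are assumptions rather than statements from the text: the confidence weight is taken as 1 − p, and the 0.5 decay is applied after Min-Max normalization. The function name `linear_importance` and the sample numbers are illustrative.

```python
def linear_importance(betas, p_values, vifs, vif_threshold=10.0, decay=0.5):
    """Linear regression importance scores with Min-Max scaling and a VIF penalty.

    Assumptions: confidence weight = 1 - p; the decay is applied after scaling.
    """
    raw = [abs(b) * (1.0 - p) for b, p in zip(betas, p_values)]
    lo, hi = min(raw), max(raw)
    span = hi - lo
    scaled = [(r - lo) / span if span else 0.0 for r in raw]
    # Down-weight features flagged as highly collinear.
    return [s * decay if v > vif_threshold else s for s, v in zip(scaled, vifs)]


# Hypothetical standardized coefficients, p-values and VIFs for three features.
scores = linear_importance(betas=[0.8, -0.5, 0.1],
                           p_values=[0.001, 0.02, 0.6],
                           vifs=[2.0, 15.0, 1.2])
print(scores)
```

The second feature has a respectable raw score but a VIF of 15, so its scaled score is halved.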

[0062] S42. Decision Tree Analysis Path: Train an ensemble model consisting of 3 decision trees, use the built-in mechanism of the tree model to calculate the importance score of each feature based on impurity reduction, and use the permutation importance method for verification. Combine the importance score of impurity reduction and the result of permutation importance to synthesize the decision tree importance score of each feature.

[0063] Specifically, a random forest ensemble model based on the Bagging strategy (containing 3 decision trees) is constructed to evaluate the importance of each feature. The hyperparameters of each decision tree are set as follows: maximum depth max_depth=10, minimum number of split samples min_samples_split=20, and the impurity measure is the mean squared error (MSE) reduction criterion. During training, each tree is fitted on a Bootstrap sample drawn with replacement (the sample size is 100% of the training set), and at each split only a randomly selected subset of the p candidate features (p is the total number of features) is considered, to enhance inter-tree diversity.

[0064] (I) Feature Importance Based on Impurity Reduction (MDI): For regression problems, impurity is measured using the mean squared error. In each decision tree, the impurity reduction of feature j at a given split node is $\Delta I = MSE_{parent} - (w_L\,MSE_L + w_R\,MSE_R)$, where $MSE_{parent}$ is the MSE of the parent node before the split, $MSE_L$ and $MSE_R$ are the MSEs of the left and right child nodes, respectively, and $w_L$ and $w_R$ are the sample proportion weights of the left and right child nodes, respectively. The MDI importance of feature j in a single tree is the sum of that feature's impurity reductions over all split nodes, divided by the sum of the impurity reductions of all features for normalization. The final MDI importance is the average over the three trees: $MDI_j = \frac{1}{3}\sum_{k=1}^{3} MDI_j^{(k)}$, where $MDI_j^{(k)}$ is the normalized MDI importance of feature j in the k-th tree. The averaged MDI importances are normalized so that they sum to 1 over all features.

[0065] (II) Feature Importance Based on Permutation Importance: Permutation importance is evaluated on the validation set (not the training set, to avoid overfitting bias), with the mean squared error (MSE) as the metric. For feature j, the specific steps are as follows: First, calculate the baseline MSE of the model on the validation set, denoted $MSE_{base}$. Then, randomly permute (shuffle) the column of feature j in the validation set and recalculate the model MSE, denoted $MSE_{perm,j}$. Repeat the permutation process N_perm times (N_perm=30 in this embodiment) and take the average. The permutation importance of feature j is defined as $PI_j = \frac{1}{N_{perm}}\sum_{r=1}^{N_{perm}} \Delta MSE_j^{(r)}$, where $\Delta MSE_j^{(r)}$ is the difference between the MSE after the r-th permutation and the baseline MSE, i.e. $\Delta MSE_j^{(r)} = MSE_{perm,j}^{(r)} - MSE_{base}$. The greater the permutation importance, the more the model's performance deteriorates after the feature is shuffled, indicating that the model depends more heavily on that feature.
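The permutation procedure can be sketched for any prediction function. The names `mse` and `permutation_importance`, the toy model, and the toy data are illustrative stand-ins for the trained ensemble and the validation set.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, feature, n_perm=30, seed=0):
    """Average increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    deltas = []
    for _ in range(n_perm):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:feature] + [v] + r[feature + 1:]
                    for r, v in zip(rows, column)]
        deltas.append(mse(targets, [predict(r) for r in shuffled]) - baseline)
    return sum(deltas) / n_perm


# Toy validation set: the target depends on feature 0 only.
rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]
model = lambda r: 2.0 * r[0]  # stand-in for a fitted model's predict function

imp_used = permutation_importance(model, rows, targets, feature=0)
imp_unused = permutation_importance(model, rows, targets, feature=1)
print(imp_used, imp_unused)
```

Shuffling the feature the model ignores leaves the error unchanged, so its importance is zero, while shuffling the feature it relies on raises the error.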

[0066] (III) Decision Tree Comprehensive Importance Score: First, the MDI importance and the permutation importance are each mapped to the [0,1] interval using Min-Max normalization, and then fused by an equal-weighted average: $S_j^{DT} = 0.5\,\widetilde{MDI}_j + 0.5\,\widetilde{PI}_j$, where $\widetilde{MDI}_j$ and $\widetilde{PI}_j$ are the normalized MDI importance and the normalized permutation importance, respectively. Equal-weighted fusion is used because MDI focuses on the splitting contribution of features during the training phase, while permutation importance focuses on the prediction contribution during the validation phase; the two complement each other.

[0067] S43. Comprehensive Scoring and Screening: The importance scores of linear regression and decision tree are weighted and fused to obtain the comprehensive importance score of each feature. The elbow rule is used to determine the screening threshold, and the features with the highest comprehensive importance scores are selected. In this embodiment, the top 10 features with the highest comprehensive importance scores are selected, plus the three basic dimensions of temperature (t_out), humidity (h_out), and day of week (day_of_week), for a total of 13 features as the final model input. The 13 features selected are: temperature (t_out), humidity (h_out), day of week (day_of_week), rolling mean of air conditioning load (24h), air conditioning load 1-hour lag difference, rolling mean of outdoor temperature (24h), temperature and humidity index (THI), outdoor temperature, cos_hour, sin_hour, rolling standard deviation of relative humidity (24h), air conditioning load 24-hour difference, and whether it is a weekend.

[0068] Specifically, the feature importance scores of S41 and S42 are weighted and fused, and the final subset of features to be retained is determined based on the elbow rule.

[0069] The weighted fusion formula for the overall importance score is: $S_j = w_{LR}\,\tilde{S}_j^{LR} + w_{DT}\,S_j^{DT}$, where $\tilde{S}_j^{LR}$ is the normalized linear regression importance score from S41 (which already includes the VIF weighting), $S_j^{DT}$ is the overall decision tree importance score from S42 (already within the [0,1] interval), and $w_{LR}$ and $w_{DT}$ are the fusion weights for the linear regression score and the decision tree score, respectively. In this embodiment, the decision tree score is given the higher weight, because the decision tree model can capture non-linear relationships and is closer to the learning paradigm of the subsequent prediction model.

[0070] Sort all candidate features from highest to lowest by overall importance score, and plot the feature ranking curve (the horizontal axis is the feature rank k=1,2,...,p, and the vertical axis is the corresponding overall score $S_{(k)}$). Use the same second-order difference method as S22 to determine the screening inflection point: calculate the first-order difference $\Delta_k = S_{(k)} - S_{(k+1)}$ and the second-order difference $\Delta^2_k = \Delta_k - \Delta_{k+1}$. The k value at which the second-order difference is largest is selected as the number of features to be retained; the scores of the features after this point decrease only gradually and contribute little to the model.
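The fusion and elbow steps can be sketched together. The weights 0.4 and 0.6 are placeholders chosen only so that the decision tree score weighs more; they are not values from the embodiment. The index convention (first difference as the drop from rank k to rank k+1) is likewise an assumption, and the function names `fuse` and `elbow_count` are illustrative.

```python
def fuse(lr_scores, dt_scores, w_lr=0.4, w_dt=0.6):
    """Weighted fusion of the two scores (weights are hypothetical placeholders)."""
    return [w_lr * a + w_dt * b for a, b in zip(lr_scores, dt_scores)]


def elbow_count(scores):
    """Number of features to keep, chosen where the second difference peaks.

    Assumption: drops[k] is the fall from rank k+1 to rank k+2 (0-based k),
    and the second difference compares consecutive drops. Needs >= 3 scores.
    """
    s = sorted(scores, reverse=True)
    drops = [s[k] - s[k + 1] for k in range(len(s) - 1)]
    second = [drops[k] - drops[k + 1] for k in range(len(drops) - 1)]
    return second.index(max(second)) + 1  # convert 0-based index to a count


# Hypothetical fused scores: three strong features, then a flat tail.
kept = elbow_count([1.0, 0.9, 0.8, 0.3, 0.28, 0.27])
print(kept)
```

In this example the largest second difference sits where the steep drop after the third feature gives way to the flat tail, so three features are kept.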

[0071] In this embodiment, the top 10 features selected through the above screening process are ranked from highest to lowest based on their overall importance score as follows: rolling mean of air conditioning load (24h), air conditioning load 1-hour lag difference, rolling mean of outdoor temperature (24h), temperature and humidity index (THI), outdoor temperature, cos_hour, sin_hour, rolling standard deviation of relative humidity (24h), air conditioning load 24-hour difference, and whether it is a weekend. These 10 features constitute the final model input feature set.

[0072] S44. Normalize the selected feature data. Time-related features are encoded using sine and cosine, while other features are normalized using minimum-maximum normalization. The normalized feature data is used as input data for model training and validation. In this embodiment, the final selected 13-dimensional feature dataset is normalized according to data type: periodic time features such as weekdays and hours are converted using sine and cosine encoding to map discrete periodic information into continuous pairs of sine and cosine values, preserving their cyclical characteristics; the remaining non-periodic features are normalized using the minimum-maximum normalization method to linearly scale the data to the [0,1] interval; the label variable (air conditioning system load) is normalized separately using minimum-maximum normalization, and the scaling weight parameters in the normalization process are retained so that the model output results can be denormalized and restored during the prediction stage.

[0073] Specifically, the features retained after S43 screening are finally encoded and normalized to make them suitable for input into the prediction model.

[0074] (I) Sine and Cosine Encoding of Time-Related Features: In this step, "time-related features" refers specifically to the periodic time features (such as sin_hour, cos_hour, sin_week, cos_week, etc.) that were generated by the time feature encoding step in S35 and retained in the S43 screening results. Since these features were already sine and cosine encoded in S35, this step only confirms their encoding formulas and ensures consistency. Taking the hour feature as an example, the encoding formulas are $\text{sin\_hour} = \sin(2\pi \cdot \text{hour}/24)$ and $\text{cos\_hour} = \cos(2\pi \cdot \text{hour}/24)$, where hour is the hour number (values from 0 to 23). The weekday and month features are encoded in the same way (with periods of 7 and 12 respectively); see S35 for the specific formulas. "Whether it is a weekend" is a binary feature (0 / 1) and requires no additional encoding. The sine and cosine encoded values lie in the range [-1, 1] and are processed together with the other features in the subsequent Min-Max normalization.

[0075] (II) Min-Max Normalization: Min-Max normalization is performed on all features selected by S43 (including continuous features, rolling statistical features, difference features, lag features, and time features encoded with sine and cosine) and on the target variable (air conditioning load), mapping the values to the [0,1] interval. The normalization formula is: $x' = (x - x_{\min})/(x_{\max} - x_{\min})$, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature in the training set, respectively. Note in particular that the normalization parameters ($x_{\min}$ and $x_{\max}$) are calculated only from the training set, and the validation and test sets are transformed using the training set's normalization parameters to avoid data leakage. When a feature is constant ($x_{\max} = x_{\min}$), the normalization result is set to 0 to avoid a division by zero error.

[0076] Saving normalization parameters: For each feature and the target variable, the corresponding $x_{\min}$ and $x_{\max}$ are stored as key-value pairs in a JSON configuration file (scaler_params.json), where the keys are the feature names and the values are dictionaries containing the min and max fields. This file is loaded during model deployment and is used to apply the same normalization transformation to the real-time input data.

[0077] (III) Inverse Normalization of the Target Variable. The air conditioning load (target variable y) is also Min-Max normalized; the minimum load in the training set is denoted $y_{\min}$ and the maximum $y_{\max}$. The model's output is a normalized predicted value y′, which is restored to the actual load value through inverse normalization: $\hat{y} = y'\,(y_{\max} - y_{\min}) + y_{\min}$, where $\hat{y}$ is the actual load forecast after inverse normalization. During inverse normalization, if $y_{\max} = y_{\min}$ (the extreme case where the training set load is constant), $y_{\min}$ is returned directly as the prediction result. The $y_{\min}$ and $y_{\max}$ parameters required for inverse normalization are also stored in the scaler_params.json file under the key "load".

[0078] After the above processing, all values in the input feature matrix of the model are normalized to the [0,1] interval, and the target variable is also normalized to the [0,1] interval. This ensures that every feature has the same numerical scale during model training, while preserving a complete inverse normalization path for recovering the true prediction value.
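The normalization, parameter storage, and inverse normalization of S44 can be sketched as follows. The function names `fit_minmax`, `scale`, and `unscale` and the sample values are illustrative; the JSON layout follows the description of scaler_params.json above, and the sketch round-trips through a JSON string instead of writing the file.

```python
import json


def fit_minmax(columns):
    """Compute min and max per column; call this on the training set only."""
    return {name: {"min": min(vals), "max": max(vals)}
            for name, vals in columns.items()}


def scale(value, p):
    """Map a value to [0, 1]; constant features map to 0."""
    span = p["max"] - p["min"]
    return (value - p["min"]) / span if span else 0.0


def unscale(value, p):
    """Inverse of scale; returns the min when the training range is constant."""
    span = p["max"] - p["min"]
    return value * span + p["min"] if span else p["min"]


train = {"t_out": [10.0, 20.0, 30.0], "load": [100.0, 300.0, 500.0]}

# Serialize and reload, as the parameters would be via scaler_params.json.
params = json.loads(json.dumps(fit_minmax(train)))

scaled = scale(25.0, params["t_out"])
restored = unscale(0.5, params["load"])
print(scaled, restored)
```

Fitting on the training set alone and reusing the stored parameters for validation, test, and live data is what prevents information from later data leaking into training.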

[0079] S5. Based on the LSTM neural network model, sequence samples are constructed using a sliding window technique, the model parameters are updated with the AdamW optimizer, and an early stopping mechanism controls the training process. The model is trained on the input data prepared above for model training and validation to obtain the air conditioning load prediction model.

[0080] Specifically, the process of constructing the air conditioning load forecasting model is as follows: S51. Setting model training parameters: The input data is divided into training and validation sets in a 7:3 ratio. The sliding window size is 48 hours, so that the model makes predictions based on historical data from the past two days. The prediction step size is 6 hours, meeting the application requirements for medium-term load forecasting. In this embodiment, the feature dimension is 13, covering time features, temperature and humidity, the temperature and humidity index, and their derived features. The LSTM neural network adopts a 3-layer hidden layer structure with 128 neurons per layer; this deep architecture can capture complex dependency patterns in time series. The dropout ratio is set to 0.1, the learning rate is set to 0.005 to ensure stable convergence of training, the early stopping patience is set to 20 epochs, the maximum number of training epochs is 200, and the batch size is set to 64. In addition, to ensure reproducibility, the same random seed is set for every random number generator involved in training the LSTM neural network (the framework's CPU and GPU generators, NumPy, and Python's built-in random library), so that the model produces consistent training results under the same data and parameter conditions.

[0081] Specifically, the S44-normalized feature data and the target variable are used to construct sliding-window sequence data suitable for the LSTM model. Let the data acquisition interval be Δt = 15 minutes; the input window length is 24 hours, corresponding to $N_{in} = 24 \times 60/15 = 96$ sampling points, and the prediction step size is 6 hours, corresponding to $N_{out} = 6 \times 60/15 = 24$ sampling points.

[0082] The sliding window advances one time point at a time, that is, it moves forward by 1 sampling interval per step (step size = 1, corresponding to Δt = 15 minutes). For each time point t in the time series that has a full input window behind it and a full prediction window ahead of it, the following input / output pair is constructed: $X_t = [x_{t-N_{in}+1}, \ldots, x_t]$, $Y_t = [y_{t+1}, \ldots, y_{t+N_{out}}]$, where the input tensor $X_t$ has shape $(N_{in}, F)$, i.e. $(96, 10)$ ($N_{in}$=96 is the number of time steps, F=10 is the number of features after S43 filtering), and the output vector $Y_t$ has shape $(N_{out},)$, i.e. $(24,)$ (the normalized values of the air conditioning load every 15 minutes over a 6-hour period). During model training, the input tensor of each batch has shape $(B, 96, 10)$, where B is the batch size (batch_size).
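A minimal sketch of the window construction with plain lists. The function name `make_windows` and the synthetic data are illustrative, and the exact index convention (the input window ends at the step just before the first predicted step) is an assumption.

```python
def make_windows(features, targets, n_in=96, n_out=24):
    """Build (input window, output window) pairs with a stride of one step.

    features: list of per-step feature vectors; targets: list of load values.
    """
    xs, ys = [], []
    for end in range(n_in, len(features) - n_out + 1):
        xs.append(features[end - n_in:end])   # n_in steps of history
        ys.append(targets[end:end + n_out])   # next n_out load values
    return xs, ys


# Synthetic series: 200 steps, 10 features per step, load equal to the step index.
rows = [[float(t)] * 10 for t in range(200)]
load = [float(t) for t in range(200)]
X, Y = make_windows(rows, load)
print(len(X), len(X[0]), len(X[0][0]), len(Y[0]))
```

Each sample has the (96, 10) input shape and the 24-value output described above; stacking B of them gives the (B, 96, 10) batch tensor.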

[0083] This embodiment uses a unidirectional stacked LSTM structure (not bidirectional), with a total of 2 stacked LSTM layers. The layers are configured as follows: the first LSTM layer has input dimension F=10 (number of features) and hidden dimension 128; it returns the complete sequence output (return_sequences=True), with output shape (B, 96, 128). A Dropout layer with a dropout rate of 0.1 is applied to the output of the first layer: during training, 10% of the hidden layer outputs are randomly set to zero to prevent overfitting. The second LSTM layer has input dimension 128 and hidden dimension 64; it also returns the complete sequence output (return_sequences=True), with output shape (B, 96, 64). A Dropout layer (dropout rate = 0.1) is likewise applied to the output of the second layer.

[0084] The reason why both LSTM layers return complete sequence outputs is that the first layer needs to pass the hidden state of each time step to the second layer as input; the second layer needs to return the complete sequence so that the hidden states of the last few time steps can be extracted later. The model does not use a BatchNorm layer because the statistical characteristics of time series data change over time, and BatchNorm may introduce bias.

[0085] S52. Using the sliding window technique, the input feature sequence is constructed by using 48 hours of historical window input data and the air conditioning system load value 6 hours later as the prediction target. At the same time, all sequence samples are divided into training set and validation set in a 7:3 ratio. The divided data is converted into PyTorch tensors, and TensorDataset is used to encapsulate feature and label pairs. An iterative data loader with a batch size of 64 is created through DataLoader to achieve a balance between memory usage and training efficiency.

[0086] S53. During the forward propagation of the model, the input data first passes through a Gaussian noise layer to enhance the model's anti-interference ability, then enters the LSTM neural network to process the complete sequence, obtain the hidden states of the last 6 time steps, and finally generates the prediction output through a fully connected layer.

[0087] Specifically, the complete forward propagation process of the LSTM model includes four stages: Gaussian noise injection, LSTM layer sequence modeling, hidden state truncation, and fully connected layer output.

[0088] (I) Gaussian noise layer. During the training phase, Gaussian noise with mean 0 and standard deviation σ = 0.01 is added to the input tensor X: $\tilde{X} = X + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$, where ε is a random noise matrix with the same shape as the input tensor, (B, 96, 10). The standard deviation of 0.01 was chosen because the input data has been normalized to the [0,1] interval, so this noise level is approximately 1% of the data range, which enhances the robustness of the model without significantly interfering with the original signal. No noise is added during the validation and testing phases; the original input is used directly.

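[0089] (II) LSTM layer sequence modeling. For each time step τ (τ=1,2,...,96), the internal calculation of the LSTM unit is as follows:

Forget gate: $f_\tau = \sigma(W_f x_\tau + U_f h_{\tau-1} + b_f)$
Input gate: $i_\tau = \sigma(W_i x_\tau + U_i h_{\tau-1} + b_i)$
Candidate memory state: $\tilde{c}_\tau = \tanh(W_c x_\tau + U_c h_{\tau-1} + b_c)$
Memory state update: $c_\tau = f_\tau \odot c_{\tau-1} + i_\tau \odot \tilde{c}_\tau$
Output gate: $o_\tau = \sigma(W_o x_\tau + U_o h_{\tau-1} + b_o)$
Hidden output: $h_\tau = o_\tau \odot \tanh(c_\tau)$

where σ is the Sigmoid activation function, ⊙ is element-wise multiplication (Hadamard product), W denotes the input weight matrices, U denotes the recurrent weight matrices, and b denotes the bias vectors. In the first LSTM layer, $x_\tau$ is the input feature vector (dimension F=10) and $h_\tau$ has dimension 128. In the second LSTM layer, $x_\tau$ is the output of layer 1 (dimension 128) and $h_\tau$ has dimension 64.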
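One step of the standard LSTM cell can be written out in plain Python to make the gate arithmetic concrete. This is a didactic sketch of the textbook cell, not the framework implementation used for training; the names `lstm_step`, `affine`, and the parameter layout (one `(W, U, b)` triple per gate, keyed `f`, `i`, `c`, `o`) are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b row by row."""
    return [sum(w * v for w, v in zip(W[r], x))
            + sum(u * v for u, v in zip(U[r], h))
            + b[r]
            for r in range(len(b))]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'c', 'o' to (W, U, b)."""
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]          # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]          # input gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x, h_prev)]  # candidate
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]          # output gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Tiny check: 2 inputs, 1 hidden unit, all weights and biases zero.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {gate: zero for gate in "fico"}
h, c = lstm_step([1.0, 2.0], [0.0], [1.0], params)
print(h, c)
```

With all-zero parameters every gate equals 0.5 and the candidate is 0, so the cell state is simply halved, which makes the update rule easy to verify by hand.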

[0090] (III) Hidden State Extraction. The complete sequence output by the second LSTM layer, denoted H, has shape (B, 96, 64). The hidden states of the last 24 time steps (corresponding to the 6-hour prediction window) are extracted as follows: $H_{last} = H[:, -24:, :]$. The truncated tensor has shape (B, 24, 64): each sample contains 24 time steps, and the hidden state at each time step has dimension 64. The last 24 steps are chosen instead of just the last step because the hidden state at each time step encapsulates the context information up to that moment. The 24 time steps correspond to the predictions for each 15-minute period in the next 6 hours, thus preserving the temporal evolution information.

[0091] (IV) Output of the fully connected layer. The truncated hidden state tensor $H_{last}$ is reshaped and then fed into the fully connected layer, which computes $\hat{Y} = H'_{last} W + b$, where $H'_{last}$ is the reshaped tensor, W is the weight matrix, and b is the bias vector. The output $\hat{Y}$ has shape (B, 24), corresponding to the predicted air conditioning load values (normalized) for 24 time steps. The fully connected layer uses linear activation (i.e., no additional nonlinear transformation is applied), because the task is a regression on a continuous real-valued target and there is no need to constrain the output range.

[0092] S54. Using the mean squared error (MSE) as the loss function, the AdamW optimizer is used for parameter updates, with weight decay set to 1×10⁻⁶. -5 After each epoch of training, the loss value is calculated on the validation set. If the loss on the validation set does not improve for 20 consecutive epochs, the early stop mechanism is triggered, the current best model parameters are saved, and training is terminated to obtain the air conditioning load prediction model.

[0093] Specifically, the model is trained using the standard backpropagation process based on the mean squared error loss function and the Adam optimizer, and an early stopping strategy is combined to prevent overfitting.

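[0094] (I) Loss function: The mean squared error (MSE) is used as the loss function: $\mathcal{L} = \frac{1}{B \cdot T_p} \sum_{i=1}^{B} \sum_{j=1}^{T_p} \left(\hat{y}_{i,j} - y_{i,j}\right)^2$, where $B$ is the batch size, $T_p$ is the number of prediction steps, $\hat{y}_{i,j}$ is the predicted value of the i-th sample at the j-th prediction time step, and $y_{i,j}$ is the corresponding actual value.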
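As a small worked example of the MSE loss averaged over both the batch and the prediction steps, the following pure-Python sketch uses a made-up batch of two samples with two prediction steps each; the function name `mse_loss` and the numbers are illustrative only.

```python
def mse_loss(pred, target):
    """Mean squared error over all samples and all prediction steps."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Two samples, two prediction steps each (made-up normalized values).
pred = [[0.5, 0.7], [0.2, 0.9]]
target = [[0.4, 0.7], [0.4, 0.5]]
loss = mse_loss(pred, target)   # (0.01 + 0.0 + 0.04 + 0.16) / 4
```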

[0095] (II) Optimizer and gradient calculation: The Adam optimizer is used, with initial learning rate $\eta$, first-order moment decay coefficient $\beta_1$, and second-order moment decay coefficient $\beta_2$. The parameter update process for each training step is as follows. First, the predicted value and the MSE loss are calculated through forward propagation. Then loss.backward() is called to calculate the gradient of the loss with respect to all trainable parameters according to the chain rule, using the automatic differentiation mechanism. Before the parameters are updated, gradient clipping is applied to prevent the gradient explosion problem common in LSTM networks: $g \leftarrow g \cdot \min\left(1, \frac{\theta}{\lVert g \rVert_2}\right)$, where $\lVert g \rVert_2$ is the L2 norm of the gradients of all parameters and $\theta = 1.0$ is the gradient clipping threshold. When the gradient norm exceeds this threshold, the gradient is scaled proportionally so that its norm equals 1. After clipping, optimizer.step() is called to update the parameters, and then optimizer.zero_grad() is called to clear the gradient cache.
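Clipping by the global L2 norm can be sketched in pure Python as follows. The helper name `clip_grad_norm`, the nested-list representation of per-parameter gradients, and the small `eps` term in the denominator (added here for numerical safety) are assumptions for illustration; in a PyTorch training loop this role is normally played by `torch.nn.utils.clip_grad_norm_`.

```python
import math


def clip_grad_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    grads is a list of gradient vectors, one per parameter tensor.
    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm


# Global norm is sqrt(3^2 + 4^2) = 5, so every component is scaled by about 1/5.
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm_before = clip_grad_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))

# Gradients already below the threshold pass through unchanged.
small, _ = clip_grad_norm([[0.3, 0.4]], max_norm=1.0)
```

Because every component is multiplied by the same factor, the direction of the update is preserved and only its magnitude is limited.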

[0096] (III) Training configuration: The batch size is 64, and the maximum number of training epochs is 200. The data are divided into a training set and a validation set at a ratio of 8:2 in chronological order: the first 80% forms the training set and the last 20% forms the validation set, without random shuffling, so that the temporal characteristics are preserved.
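A chronological 8:2 split without shuffling can be sketched as below; the function name `chronological_split` and the integer stand-ins for time-ordered samples are illustrative assumptions.

```python
def chronological_split(samples, train_ratio=0.8):
    """Split time-ordered samples into (train, validation) without shuffling."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


# Integers stand in for sequence samples that are already sorted by time.
samples = list(range(100))
train, val = chronological_split(samples, train_ratio=0.8)
```

Every validation sample is later in time than every training sample, which avoids leaking future information into training.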

[0097] (IV) Early stopping strategy: An early stopping mechanism based on the validation set MSE loss is adopted, with the following rules. Let the early stopping patience be P = 15 epochs and the minimum improvement threshold be $\delta$. After each training epoch, the MSE loss on the validation set, $\mathcal{L}_{\text{val}}$, is calculated and compared with the lowest historical validation loss, $\mathcal{L}_{\text{best}}$: if $\mathcal{L}_{\text{val}} < \mathcal{L}_{\text{best}} - \delta$, then $\mathcal{L}_{\text{best}} \leftarrow \mathcal{L}_{\text{val}}$ and the counter is reset to 0; otherwise, the counter is incremented by 1. Early stopping is triggered when the counter reaches the patience value P = 15, terminating training. That is, a decrease in validation loss is considered an "improvement" only when it exceeds $\delta$, which prevents the early stopping counter from being abnormally reset by minor random fluctuations.
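The patience counter with a minimum improvement threshold can be sketched as a small class. The class name `EarlyStopping`, the threshold value `min_delta=1e-4`, and the validation losses are hypothetical placeholders (the specification's threshold value is not reproduced here); the demo uses a patience of 3 instead of 15 only to keep the trace short.

```python
class EarlyStopping:
    """Counter-based early stopping on validation loss."""

    def __init__(self, patience=15, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta      # assumed placeholder threshold
        self.best = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best - self.min_delta:
            # Improvement large enough: new best, reset the counter.
            self.best = val_loss
            self.counter = 0
        else:
            # No improvement, or improvement smaller than min_delta.
            self.counter += 1
        return self.counter >= self.patience


stopper = EarlyStopping(patience=3, min_delta=1e-4)
val_losses = [1.0, 0.5, 0.49999, 0.6, 0.55, 0.4]   # made-up values
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
```

In this trace the drop from 0.5 to 0.49999 is smaller than the threshold, so it does not reset the counter, and training stops at epoch index 4 with a best loss of 0.5.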

[0098] (V) Model saving: Whenever the validation loss reaches a new historical best (i.e., when the counter is reset to 0), the current model state is saved. The saved content includes: the model parameters (model.state_dict(), containing the weights and biases of all LSTM and fully connected layers), the optimizer state (optimizer.state_dict(), containing the momentum and adaptive learning rate parameters, used for resuming training from a checkpoint), the current training epoch number, and the validation loss value. This content is serialized in dictionary form into a .pt file using torch.save(). After training is complete (early stopping is triggered or the maximum number of epochs is reached), the model parameters with the lowest validation loss are loaded as the final model for subsequent testing and deployment.

[0099] In summary, the complete process of a single training step is as follows: forward propagation (noise injection → LSTM layer 1 → Dropout → LSTM layer 2 → Dropout → truncation of the last 24 steps → Flatten → fully connected layer) → calculation of MSE loss → backpropagation (loss.backward()) → gradient clipping (max_norm=1.0) → parameter update (optimizer.step()) → gradient clearing (optimizer.zero_grad()) → validation set evaluation → early stopping determination and model saving.

[0100] In other words, during the forward propagation of each training batch, the input data first passes through a Gaussian noise layer (this layer is only activated in training mode), which injects random noise into the input signal to enhance the model's robustness to data perturbations. The data then enters the LSTM network, which processes the input sequence time step by time step. Through its internal forget gate, input gate, and output gate mechanisms, it selectively remembers and forgets short-term and long-term information, and after processing the complete sequence it yields the hidden states corresponding to the last six time steps. These hidden states encode the model's information about the load trend over the next six hours and are finally mapped to the load prediction output for the six time steps through a fully connected layer.
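The behaviour of a noise layer that is active only in training mode can be sketched in pure Python. The function name `gaussian_noise`, the standard deviation of 0.01, and the sample values are assumptions for illustration; the specification does not state the noise level here.

```python
import random


def gaussian_noise(values, std=0.01, training=True, rng=random):
    """Add zero-mean Gaussian noise in training mode; pass through otherwise."""
    if not training:
        return list(values)
    return [v + rng.gauss(0.0, std) for v in values]


rng = random.Random(0)                       # fixed seed for repeatability
features = [0.20, 0.55, 0.80, 0.35]          # made-up normalized inputs
noisy = gaussian_noise(features, std=0.01, training=True, rng=rng)
clean = gaussian_noise(features, std=0.01, training=False, rng=rng)
```

At inference time the `training=False` path leaves the inputs untouched, so predictions are deterministic for a given input.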

[0101] The AdamW optimizer is used for parameter optimization during training, with the learning rate set to 0.005 and the weight decay set to 1×10⁻⁵. The key improvement of AdamW over the traditional Adam optimizer is that it decouples weight decay from the adaptive learning rate mechanism, avoiding the optimization bias caused by coupling L2 regularization with gradient normalization. Given the strong seasonality, high noise, and limited sample size of air conditioning energy consumption data, it suppresses overfitting more effectively and improves the generalization ability of the model.
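The decoupling of weight decay from the adaptive step can be made concrete with a single-parameter AdamW update in pure Python. This is a sketch of the standard AdamW rule, not the applicant's code: the function name `adamw_step`, the moment coefficients 0.9 and 0.999, `eps`, and the weight-decay value are assumed placeholders, while the learning rate of 0.005 follows the text.

```python
import math


def adamw_step(theta, grad, m, v, t, lr=0.005, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-5):
    """One AdamW update for a scalar parameter at step t (t starts at 1)."""
    # Decoupled weight decay: shrink the parameter directly, so the decay
    # is not rescaled by the adaptive second-moment term.
    theta = theta * (1.0 - lr * weight_decay)
    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# With a zero gradient only the decay term acts on the parameter.
theta_decay_only, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, t=1)

# With a non-zero gradient the first step moves by roughly lr.
theta_new, m_new, v_new = adamw_step(1.0, 2.0, 0.0, 0.0, t=1)
```

In coupled L2 regularization the decay term would instead be added to `grad` and then divided by `sqrt(v_hat)`, which is the interaction AdamW avoids.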

[0102] The loss is calculated using the mean squared error (MSE) loss function, and the MSE loss between the model's predicted value and the true label is calculated in each training batch. During the backpropagation phase, the loss gradient is propagated layer by layer from the output layer to the input layer, and the chain rule is used to calculate the contribution of each parameter to the final loss. The AdamW optimizer updates the model parameters based on the calculated gradient information.

[0103] S6. Obtain the outdoor temperature and relative humidity for a future period of time, and input them into the air conditioning load prediction model to predict the air conditioning load for a future period of time. Perform inverse normalization on the prediction results to obtain the predicted value of the air conditioning load for a future period of time.

[0104] Using the scaling parameters from the normalization process that were saved in step S4, the model prediction results are denormalized, converting the normalized prediction output back to actual air conditioning load values in the original units, so that the prediction results have clear physical meaning and are easy to apply in engineering practice and to interpret in operation.
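Assuming the load was scaled with min-max normalization, the inverse transform only needs the saved minimum and maximum. The following pure-Python sketch uses hypothetical helper names and made-up load values to show the round trip.

```python
def minmax_fit(values):
    """Return the (min, max) scaling parameters to be saved for later use."""
    return min(values), max(values)


def minmax_transform(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def minmax_inverse(scaled, lo, hi):
    """Map normalized values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


load_values = [120.0, 300.0, 210.0, 480.0]   # made-up load readings
lo, hi = minmax_fit(load_values)
scaled = minmax_transform(load_values, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
```

The same `lo` and `hi` that were fitted on the training data must be reused at prediction time; refitting them on new data would change the scale of the outputs.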

[0105] The prediction results are evaluated comprehensively using multiple metrics: the Mean Absolute Error (MAE) measures the average magnitude of the prediction error; the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) assess the model's sensitivity to larger errors; the Mean Absolute Percentage Error (MAPE) measures the model's relative prediction accuracy; and the Coefficient of Determination (R²) assesses the proportion of the variance of the target variable that the model explains. Through a combined analysis of these indicators, the predictive performance and practical value of the model can be fully determined.
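The five metrics can be computed together with a short pure-Python helper. The function name `regression_metrics` and the sample values are illustrative; the MAPE line assumes that no true value is zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, MAPE (in percent), and R^2."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Assumes every true value is non-zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}


y_true = [100.0, 200.0, 300.0, 400.0]   # made-up denormalized loads
y_pred = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(y_true, y_pred)
```

Metrics of this kind are usually computed on denormalized values so that MAE and RMSE are expressed in the physical unit of the load.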

[0106] Example 2: An air conditioning load forecasting system based on feature engineering and LSTM, whose architecture is shown in Figure 3, consists of a past data acquisition module, a data preprocessing module, a feature derivation module, an input data acquisition module, an air conditioning load prediction model construction module, and an air conditioning load prediction module.

[0107] The past data acquisition module uses the building's micro weather station to collect outdoor temperature and relative humidity over a period of time, and uses the energy meter of the energy station to collect the air conditioning system load over a period of time. The data preprocessing module uses the K-means clustering algorithm to perform multi-dimensional clustering analysis on the collected outdoor temperature, relative humidity, and air conditioning system load data; it identifies outliers by calculating the distance from each sample to its cluster center, replaces the identified outliers with the mean of adjacent time points, and imputes missing values with the mean of the preceding and following time points to obtain the preprocessed data. The feature derivation module performs feature derivation on the preprocessed data, including constructing temperature and humidity index features, lag features, rolling window statistical features, difference features, and time features. The input data acquisition module uses a hybrid model of linear regression and decision trees to evaluate the importance of all derived features; the scores of the two models are weighted and fused to obtain a comprehensive importance score for each feature, the elbow rule is used to determine the screening threshold and select the features with the highest comprehensive importance scores, and the selected features are normalized as input data for model training and validation. The air conditioning load prediction model construction module is based on the LSTM neural network model; it uses the sliding window technique to construct sequence samples, uses the AdamW optimizer to update the model parameters, and uses an early stopping mechanism to control the training process. The model is trained and validated with the above input data to obtain the air conditioning load prediction model. The air conditioning load prediction module obtains the outdoor temperature and relative humidity for a future period and inputs them into the air conditioning load prediction model to predict the air conditioning load for that period; the prediction results are then inversely normalized to obtain the predicted air conditioning load value for the future period.

[0108] The specific implementation methods of each module in this system are the same as those described in Example 1, and will not be repeated here.

[0109] Example 3: An electronic device comprises a memory and a processor that are communicatively connected to each other. The memory stores computer instructions, and the processor executes the computer instructions to implement the air conditioning load forecasting method based on feature engineering and LSTM described in Example 1 and the air conditioning load forecasting system based on feature engineering and LSTM described in Example 2.

[0110] Example 4: A computer-readable storage medium stores a computer program that, when executed by a processor, implements the air conditioning load forecasting method based on feature engineering and LSTM described in Example 1 and the air conditioning load forecasting system based on feature engineering and LSTM described in Example 2.

[0111] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code. The solutions in the embodiments of this application can be implemented in various computer languages, such as object-oriented programming languages like Java, C++, and Python, and interpreted scripting languages like JavaScript.

[0112] This application is described with reference to flowchart illustrations and/or block diagrams of methods, electronic devices (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing electronic device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing electronic device, create means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0113] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing electronic device to operate in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0114] These computer program instructions can also be loaded onto a computer or other programmable data processing electronic device to cause a series of operational steps to be performed on the computer or other programmable electronic device to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable electronic device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0115] Although preferred embodiments of this application have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments as well as all changes and modifications falling within the scope of this application.

[0116] Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. Therefore, if such modifications and variations fall within the scope of the claims of this application and their equivalents, this application also intends to include such modifications and variations.

Claims

1. A method for predicting air conditioning load based on feature engineering and LSTM, characterized in that it comprises:
S1. Using the building's micro weather station to collect outdoor temperature and relative humidity over a period of time, and using the energy meter of the energy station to collect the air conditioning system load over a period of time;
S2. Using the K-means clustering algorithm to perform multi-dimensional clustering analysis on the collected outdoor temperature, relative humidity, and air conditioning system load data, identifying outliers by calculating the distance from each sample to its cluster center, replacing the identified outliers with the mean of adjacent time points, and imputing missing values with the mean of the preceding and following time points, so as to obtain the preprocessed data;
S3. Performing feature derivation on the preprocessed data, including constructing temperature and humidity index features, lag features, rolling window statistical features, difference features, and time features;
S4. Using a hybrid model of linear regression and decision trees to evaluate the importance of all derived features, weighting and fusing the scores of the two models to obtain a comprehensive importance score for each feature, using the elbow rule to determine the screening threshold and select the features with the highest comprehensive importance scores, and normalizing the selected features as input data for model training and validation;
S5. Based on the LSTM neural network model, using the sliding window technique to construct sequence samples, using the AdamW optimizer to update the model parameters, and using an early stopping mechanism to control the training process, wherein the above input data for model training and validation are used to train and validate the model, so as to obtain the air conditioning load prediction model;
S6. Obtaining the outdoor temperature and relative humidity for a future period of time, inputting them into the air conditioning load prediction model to predict the air conditioning load for the future period of time, and performing inverse normalization on the prediction results to obtain the predicted value of the air conditioning load for the future period of time.

2. The air conditioning load forecasting method based on feature engineering and LSTM according to claim 1, characterized in that step S2 includes:
S21. The collected outdoor temperature, relative humidity, and air conditioning system load data are standardized using StandardScaler, transforming the data into a distribution with a mean of 0 and a standard deviation of 1 and eliminating the differences between different units;
S22. For the standardized data, the elbow rule is used to determine the optimal number of clusters k: the values from k=2 to k=10 are traversed, K-means clustering is performed for each k value, and the within-cluster sum of squares (WCSS) is calculated; the k value at which the rate of decrease of WCSS changes abruptly is selected as the optimal number of clusters;
S23. The K-means model is trained using the optimal number of clusters k, a cluster label is assigned to each sample in the dataset, the Euclidean distance from each sample to the center of its cluster is calculated, and the distance is converted into a Z-score to quantify how significantly the sample deviates from the cluster center;
S24. An anomaly detection threshold is set based on the Z-score, samples that exceed the threshold are marked as abnormal samples, the identified outliers are replaced with the mean of adjacent time points, and missing values are imputed with the mean of the preceding and following time points to obtain the preprocessed data.

3. The air conditioning load forecasting method based on feature engineering and LSTM according to claim 1, characterized in that, in step S3, the method for feature derivation of the preprocessed data includes:
S31. Based on the Steadman temperature and humidity index theory, the temperature and humidity index HI is calculated from the collected outdoor temperature T and relative humidity RH according to the formula:
$$HI = -42.379 + 2.04901523\,T + 10.14333127\,RH - 0.22475541\,T \cdot RH - 6.83783 \times 10^{-3}\,T^{2} - 5.481717 \times 10^{-2}\,RH^{2} + 1.22874 \times 10^{-3}\,T^{2} \cdot RH + 8.5282 \times 10^{-4}\,T \cdot RH^{2} - 1.99 \times 10^{-6}\,T^{2} \cdot RH^{2}$$
and the resulting HI is used as the temperature and humidity index feature;
S32. For the existing features in the preprocessed data, the difference between the current feature value and the feature value at the corresponding earlier time point is calculated for time intervals of 1 hour, 2 hours, 6 hours, 8 hours, and 24 hours respectively, to construct the lag features;
S33. Using a 24-hour rolling window, the mean and standard deviation of each existing feature in the preprocessed data within the window are extracted as the rolling window statistical features;
S34. For the preprocessed data, the difference between adjacent time points in the time series is calculated at time intervals of 1 hour, 12 hours, and 24 hours respectively, to construct the difference features;
S35. Time features such as hour, weekday, weekend, and year are extracted from the timestamp information, and the time features are constructed by applying sine and cosine encoding to the periodic time features.

4. The air conditioning load forecasting method based on feature engineering and LSTM according to claim 3, characterized in that step S4 includes:
S41. Linear regression analysis path: for each derived feature, a univariate linear regression model with the target variable is established, its regression coefficient, t-statistic, and p-value are calculated, and the independent linear explanatory power of each feature is evaluated; multicollinearity diagnosis is then performed by calculating the variance inflation factor to identify and handle highly correlated features; the absolute value and statistical significance of the regression coefficient are considered together to calculate the linear regression importance score of each feature;
S42. Decision tree analysis path: an ensemble model consisting of 3 decision trees is trained, the built-in mechanism of the tree model is used to calculate the importance score of each feature based on impurity reduction, and the permutation importance method is used for verification; the impurity-reduction importance score and the permutation importance result are combined to obtain the decision tree importance score of each feature;
S43. Comprehensive scoring and selection: the linear regression and decision tree importance scores are weighted and fused to obtain the comprehensive importance score of each feature, and the elbow rule is used to determine the selection threshold and select the features with the highest comprehensive importance scores;
S44. The selected feature data are normalized: sine and cosine encoding is used for time-related features, and min-max normalization is used for the other features; the normalized feature data are used as input data for model training and validation.

5. The air conditioning load forecasting method based on feature engineering and LSTM according to claim 4, characterized in that step S5 includes:
S51. Setting the model training parameters: the input data are divided into training and validation sets in a 7:3 ratio, with a sliding window size of 48 hours and a prediction step size of 6 hours; the LSTM neural network uses a structure of 3 hidden layers, each containing 128 neurons, with a dropout ratio of 0.1, a learning rate of 0.005, an early stopping patience of 20 epochs, a maximum number of training epochs of 200, and a batch size of 64;
S52. Using the sliding window technique, the input data of the 48-hour historical window are used as the input feature sequence, and the air conditioning system load values of the following 6 hours are used as the prediction target to construct the sequence samples;
S53. During the forward propagation of the model, the input data first passes through a Gaussian noise layer to enhance the model's anti-interference ability, then enters the LSTM neural network, which processes the complete sequence to obtain the hidden states of the last 6 time steps, and the prediction output is finally generated through a fully connected layer;
S54. The mean squared error (MSE) is used as the loss function, and the AdamW optimizer is used for parameter updates, with weight decay set to 1×10⁻⁵; after each training epoch, the loss value is calculated on the validation set; if the validation loss does not improve for 20 consecutive epochs, the early stopping mechanism is triggered, the current best model parameters are saved, and training is terminated to obtain the air conditioning load prediction model.

6. The air conditioning load forecasting method based on feature engineering and LSTM according to claim 1, characterized in that, in step S5, a unified random seed is set for the random number libraries used by the LSTM neural network.

7. The air conditioning load forecasting method based on feature engineering and LSTM according to claim 1, characterized in that step S6 further includes evaluating the predicted air conditioning load, with evaluation indicators including the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²).

8. An air conditioning load forecasting system based on feature engineering and LSTM, implementing the method of any one of claims 1-7, characterized in that it comprises:
a past data acquisition module, which uses the building's micro weather station to collect outdoor temperature and relative humidity over a period of time, and uses the energy meter of the energy station to collect the air conditioning system load over a period of time;
a data preprocessing module, which uses the K-means clustering algorithm to perform multi-dimensional clustering analysis on the collected outdoor temperature, relative humidity, and air conditioning system load data, identifies outliers by calculating the distance from each sample to its cluster center, replaces the identified outliers with the mean of adjacent time points, and imputes missing values with the mean of the preceding and following time points to obtain the preprocessed data;
a feature derivation module, which performs feature derivation on the preprocessed data, including constructing temperature and humidity index features, lag features, rolling window statistical features, difference features, and time features;
an input data acquisition module, which uses a hybrid model of linear regression and decision trees to evaluate the importance of all derived features, weights and fuses the scores of the two models to obtain a comprehensive importance score for each feature, uses the elbow rule to determine the screening threshold and select the features with the highest comprehensive importance scores, and normalizes the selected features as input data for model training and validation;
an air conditioning load prediction model construction module, which is based on the LSTM neural network model, uses the sliding window technique to construct sequence samples, uses the AdamW optimizer to update the model parameters, uses an early stopping mechanism to control the training process, and trains and validates the model with the above input data to obtain the air conditioning load prediction model;
an air conditioning load prediction module, which obtains the outdoor temperature and relative humidity for a future period, inputs them into the air conditioning load prediction model to predict the air conditioning load for that period, and inversely normalizes the prediction results to obtain the predicted air conditioning load value for the future period.

9. An electronic device, characterized in that it comprises a memory and a processor that are communicatively connected to each other, wherein the memory stores computer instructions, and the processor executes the computer instructions to implement the air conditioning load forecasting method based on feature engineering and LSTM as described in any one of claims 1-7.

10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the air conditioning load forecasting method based on feature engineering and LSTM as described in any one of claims 1-7.