Building air conditioning cooling load prediction method and electronic device

By combining a bidirectional long short-term memory network with a Transformer encoder, the method addresses the difficulty existing technologies have in capturing the nonlinear and non-stationary characteristics of building air conditioning cooling load, yielding more accurate and robust predictions.

CN122243685A · Pending · Publication Date: 2026-06-19 · CHANGJIANG SURVEY PLANNING DESIGN & RES CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
CHANGJIANG SURVEY PLANNING DESIGN & RES CO LTD
Filing Date
2026-03-31
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In predicting building air conditioning cooling load, existing technologies often fail to effectively capture nonlinear and non-stationary characteristics using traditional methods. Machine learning methods also have limitations, with single recurrent neural networks and attention mechanism models being insufficient in learning long-term and short-term correlations and local change patterns, resulting in inadequate prediction accuracy and robustness.

Method used

A hybrid architecture combining a bidirectional long short-term memory network and a Transformer encoder is adopted. By combining a systematic data cleaning process and feature engineering, the bidirectional long short-term memory network extracts local temporal features, while the Transformer encoder captures long-term dependencies to build a hybrid prediction model.

Benefits of technology

It improves the accuracy and robustness of building air conditioning cooling load forecasting, and can more comprehensively depict the changing patterns of cooling load, thereby enhancing the model's prediction accuracy and generalization ability.

Abstract

This application relates to the field of building energy management technology, and provides a method and electronic device for predicting building air conditioning cooling load. The method includes acquiring real-time operating data of the building air conditioning system and inputting it into a pre-trained cooling load prediction model for prediction, obtaining the cooling load prediction result. The pre-training process of the cooling load prediction model includes: collecting historical operating data of the building air conditioning system and cleaning the data to obtain a cleaned dataset; performing feature engineering on the cleaned dataset to obtain a target feature dataset; normalizing the target feature dataset and inputting it into a pre-constructed hybrid prediction model for training, obtaining the cooling load prediction model. The hybrid prediction model includes a bidirectional long short-term memory network and a Transformer encoder. By fusing the bidirectional long short-term memory network and the Transformer encoder, combined with a systematic data cleaning process and feature engineering strategy, the accuracy and robustness of building air conditioning cooling load prediction are improved.

Description

Technical Field

[0001] This application relates to the field of building energy management technology, specifically to a method and electronic device for predicting building air conditioning cooling load.

Background Technology

[0002] Air conditioning systems are one of the main sources of energy consumption during building operation, accounting for a significant portion of total building energy consumption. During the building design phase, the capacity selection and operating strategies of the air conditioning system are often calculated based on specific design conditions and boundary conditions. However, after the building is put into operation, the load of the air conditioning system is affected by a combination of time-varying factors, including outdoor meteorological parameters, the thermal performance of the building envelope, indoor occupant activity patterns, and equipment heat dissipation. This results in significant annual load fluctuations, with most of the operating time spent under partial load conditions. Therefore, accurate prediction of the cooling load of the air conditioning system is of great engineering significance for developing reasonable operation and control strategies, reducing system energy consumption, and improving operation and maintenance management.

[0003] Currently, various data-driven methods have been proposed and applied for predicting building air conditioning loads. Traditional statistical methods, such as autoregressive integrated moving average (ARIMA) models and multiple regression analysis, rely on the assumption of linear relationships in time series data. When faced with the nonlinear and non-stationary characteristics exhibited by air conditioning cooling loads, their prediction accuracy often falls short of practical requirements. Machine learning methods, such as K-nearest neighbors, support vector machines, and random forests, fit nonlinear relationships better to some extent, but each has its own limitations: K-nearest neighbors is sensitive to data quality and has limited prediction accuracy; support vector machines have low computational efficiency on large sample sets and are sensitive to the selection of kernel functions and hyperparameters; random forests are prone to overfitting in regression problems with high noise levels.

[0004] Furthermore, from a model architecture perspective, a single recurrent neural network structure suffers from gradient information gradually decaying as the number of propagation steps increases when processing sequence data over long time spans. This results in insufficient ability to learn the correlations between distant time steps. While models based purely on attention mechanisms can establish direct connections between any positions in the sequence, they lack the ability to specifically model local change patterns between adjacent time steps and easily overlook fine-grained temporal fluctuation information. The aforementioned methods are insufficient in simultaneously capturing short-term local fluctuation features and modeling long-term global trends, thus limiting further improvements in the accuracy of building air conditioning cooling load prediction.

Summary of the Invention

[0005] In view of this, embodiments of this application provide a method and electronic device for predicting building air conditioning cooling load, which aims to improve the accuracy and robustness of building air conditioning cooling load prediction by integrating the local temporal feature extraction capability of bidirectional long short-term memory networks and the global long-term dependency modeling capability of Transformer encoders, combined with a systematic data cleaning process and feature engineering strategy.

[0006] The first aspect of this application provides a method for predicting building air conditioning cooling load, including: obtaining real-time operating data of the building air conditioning system and inputting it into a pre-trained cooling load prediction model to obtain a cooling load prediction result. The pre-training process of the cooling load prediction model includes: collecting historical operating data of the building air conditioning system and performing data cleaning to obtain a cleaned dataset; performing feature engineering on the cleaned dataset to construct derived features and select target features, obtaining a target feature dataset; normalizing the target feature dataset; and inputting the normalized data into a pre-constructed hybrid prediction model for training to obtain the cooling load prediction model. The hybrid prediction model includes a bidirectional long short-term memory network and a Transformer encoder; the bidirectional long short-term memory network is used to extract the local temporal features of the data, and the Transformer encoder is used to capture the long-term dependencies of the data.

[0007] A second aspect of this application provides an electronic device including a processor, a memory, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the electronic device implements the building air conditioning cooling load prediction method provided in the first aspect of this application.

[0008] The building air conditioning cooling load prediction method provided in the first aspect of this application ensures the quality of input data through a systematic data cleaning process, enriches the input information of the model and removes redundant features through multi-dimensional feature engineering, eliminates the influence of dimensional differences through normalization processing, and achieves effective integration of local temporal feature extraction and global long-term dependency modeling through a hybrid architecture of bidirectional long short-term memory network and Transformer encoder. Compared with a single model structure, it can more comprehensively depict the changing pattern of cooling load at different time scales, thereby improving prediction accuracy and model robustness.

[0009] It can be understood that the beneficial effects of the second aspect are described in the relevant passages on the first aspect above and will not be repeated here.

Description of the Drawings

[0010] To more clearly illustrate the technical solutions in the embodiments of this application, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0011] Figure 1 is a schematic flowchart of a building air conditioning cooling load prediction method provided in an embodiment of this application;
Figure 2 is a schematic diagram of a data cleaning process provided in an embodiment of this application;
Figure 3 is a schematic diagram of the feature engineering process provided in an embodiment of this application;
Figure 4 is a schematic diagram illustrating the ranking of feature importance according to an embodiment of this application;
Figure 5 is a training structure diagram of an LSTM-Transformer hybrid model provided in an embodiment of this application;
Figure 6 is a comparison chart of air conditioning load prediction provided in an embodiment of this application.

Detailed Implementation

[0012] In the following description, specific details such as particular system architectures and techniques are set forth for illustrative purposes and not for limitation, in order to provide a thorough understanding of the embodiments of this application. However, those skilled in the art will understand that this application may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods have been omitted so as not to obscure the description of this application with unnecessary detail.

[0013] It should be understood that, when used in this application specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0014] It should also be understood that the term “and / or” as used in this application specification and the appended claims means any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0015] As used in this application specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when," "once," "in response to determination," or "in response to detection." Similarly, the phrase "if determined" or "if detected [the described condition or event]" may be interpreted, depending on the context, as meaning "once determined," "in response to determination," "once detected [the described condition or event]," or "in response to detection [the described condition or event]."

[0016] Furthermore, in the description of this application and the appended claims, the terms "first," "second," "third," etc., are used only to distinguish descriptions and should not be construed as indicating or implying relative importance.

[0017] References to "one embodiment" or "some embodiments" as described in this specification mean that one or more embodiments of this application include a specific feature, structure, or characteristic described in connection with that embodiment. Therefore, the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in still other embodiments," etc., appearing in different parts of this specification do not necessarily refer to the same embodiment, but rather mean "one or more, but not all, embodiments," unless otherwise specifically emphasized. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically emphasized.

[0018] As shown in Figure 1, the building air conditioning cooling load prediction method provided in the embodiments of this application includes the following steps:
Step S1: Obtain real-time operating data of the building air conditioning system and input it into the pre-trained cooling load prediction model to obtain the cooling load prediction result.
The pre-training process of the cooling load prediction model includes:
Step S101: Collect historical operating data of the building air conditioning system and perform data cleaning to obtain a cleaned dataset;
Step S102: Perform feature engineering on the cleaned dataset to construct derived features and select target features, obtaining the target feature dataset;
Step S103: Normalize the target feature dataset;
Step S104: Input the normalized data into the pre-constructed hybrid prediction model for training to obtain the cooling load prediction model. The hybrid prediction model includes a bidirectional long short-term memory network and a Transformer encoder; the bidirectional long short-term memory network is used to extract the local temporal features of the data, and the Transformer encoder is used to capture the long-term dependencies of the data.

[0019] In the application, in step S1, the real-time operating data of the building air conditioning system refers to the data reflecting the system's operating status and external environmental conditions, collected in real time by various sensors and metering devices during the actual operation of the building air conditioning system. Specifically, the building's micro-weather station collects meteorological parameters such as outdoor temperature and relative humidity, and the energy station's metering facilities and energy meters record equipment operating parameters such as the air conditioning system's cooling load, chiller system power consumption, water pump frequency, water pump power, and cooling tower power, while also recording time information such as date and time. The data collection frequency can be set to once every hour, forming an hourly operating data sequence. The historical operating data of the building air conditioning system in step S101 is the air conditioning operating data for a specified historical time period obtained from the historical database based on the recorded date, time, and other time information.

[0020] Data cleaning refers to the process of identifying and processing quality problems such as missing values, outliers, and duplicate records in the original collected data. The purpose is to eliminate the adverse effects of data noise on the accuracy of subsequent model training and to ensure the quality of the data input to the model.

[0021] In step S102, feature engineering refers to the process of constructing new derived features from the original features, using professional knowledge and data analysis methods in the field of building air conditioning, and then selecting a subset of features that contribute significantly to the prediction target. Derived features are new features generated through mathematical transformations, combination operations, or time-series operations on the original features. These features enrich the information dimensions of the input data, helping the model to more comprehensively characterize the various factors affecting cooling load changes. Target features are the set of features retained after feature evaluation and selection that have a high contribution to cooling load prediction. Retaining only these features reduces the interference of redundant information with model training efficiency and generalization ability.

[0022] In step S103, normalization refers to a data preprocessing operation that uniformly scales the numerical range of each feature to the same order of magnitude. Because the physical dimensions and numerical ranges of various features in building air conditioning operation data differ significantly (for example, temperature is typically in the tens of degrees Celsius range, while power consumption may reach hundreds of kilowatt-hours), directly inputting the raw data into the model would cause features with larger numerical ranges to dominate the gradient update direction, resulting in the neglect of features with smaller numerical ranges. This invention employs a min-max normalization method, linearly scaling the values of all features to the [0,1] interval. The calculation subtracts the minimum value of each feature from its value and then divides by the difference between its maximum and minimum values. This operation eliminates the influence of dimensional differences, ensuring that each feature receives an equal learning opportunity during model training and accelerating model convergence.
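The scaling described above, x' = (x - x_min) / (x_max - x_min), and its inverse can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names and the sample temperatures are hypothetical, and fitting the minimum and maximum on the training data alone is an assumption rather than something the application states.

```python
def fit_min_max(column):
    """Return the (min, max) of one feature column, e.g. from the training set."""
    return min(column), max(column)


def normalize(column, lo, hi):
    """Scale values to [0, 1] using x' = (x - min) / (max - min)."""
    span = hi - lo
    if span == 0:
        # Constant feature: map everything to 0.0 (an assumption of this sketch).
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def denormalize(column, lo, hi):
    """Inverse transform: x = x' * (max - min) + min."""
    return [x * (hi - lo) + lo for x in column]


# Hypothetical outdoor temperatures in degrees Celsius.
train_temp = [24.0, 28.0, 32.0, 36.0]
lo, hi = fit_min_max(train_temp)
scaled = normalize(train_temp, lo, hi)
restored = denormalize(scaled, lo, hi)
```

The same `denormalize` step is what turns a model output in [0,1] back into a cooling load in physical units.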

[0023] In step S104, the hybrid prediction model refers to a hybrid neural network architecture that combines two deep learning structures: a Bidirectional Long Short-Term Memory (BiLSTM) network and a Transformer encoder. The BiLSTM network is a recurrent neural network structure that processes sequential data in both the forward and reverse temporal directions. By concatenating the outputs of the forward and backward LSTM networks, the model can use information from past and future time steps to characterize the current time step, thereby extracting local temporal features from the sequence data. Local temporal features are the patterns and fluctuations between adjacent or nearby time steps in a time series, such as the rising and falling trends and short-cycle fluctuations of air conditioning cooling load over several hours. The Transformer encoder is a sequence modeling structure based on a self-attention mechanism. Through multi-head self-attention computation, it directly establishes association weights between any two time steps in the sequence, thereby capturing long-term dependencies in the data. Long-term dependencies are the correlations and influences between time steps that are far apart in a time series, such as the daily and weekly cycle patterns of air conditioning cooling load, as well as the continuing impact of changes in meteorological conditions on the load trend over a longer period.
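To make the "association weights between any two time steps" concrete, the following is a minimal single-head scaled dot-product self-attention in plain Python. It is a sketch under strong simplifying assumptions: the query, key and value projections are taken to be the identity (a real Transformer encoder learns them and uses several heads), and the input vectors are made up. It is not the hybrid model of the application, only the attention step in isolation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """seq: list of T feature vectors of equal length d.

    Returns (weights, outputs), where weights[i][j] is how strongly
    time step i attends to time step j, regardless of their distance.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    weights = []
    for query in seq:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in seq]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, seq)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs


# Four time steps; steps 0 and 3 look alike, as do steps 1 and 2.
seq = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
weights, outputs = self_attention(seq)
```

In this toy input, step 0 places more weight on the distant but similar step 3 than on its immediate neighbour step 1, which is the behaviour the paragraph attributes to the Transformer encoder.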

[0024] In step S1, the real-time operating data of the building's air conditioning system undergoes the same preprocessing procedure as in steps S101-S103 before being input into the cooling load prediction model. Based on the learned temporal feature mapping relationship, the model outputs the cooling load prediction result for the corresponding time period. After inverse normalization, this prediction result yields the cooling load prediction value in actual physical dimensions, which can be used to guide the operation and control decisions of the air conditioning system. Evaluation metrics such as MAPE are used to assess the predictive accuracy of the model.
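MAPE (mean absolute percentage error), named above as an evaluation metric, is the mean of |actual - predicted| / |actual| expressed as a percentage. A minimal sketch follows; the function name and sample values are hypothetical, and skipping hours whose actual load is zero is an assumption of this sketch, since the percentage error is undefined there.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Pairs whose actual value is zero are skipped (assumption of this sketch).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical cooling loads in kW, after inverse normalization.
actual_kw = [100.0, 200.0, 400.0]
predicted_kw = [110.0, 190.0, 400.0]
error_percent = mape(actual_kw, predicted_kw)  # about 5 percent
```

The metric should be computed on values in physical units, because percentage errors on normalized data depend on where the minimum of the scaling range happens to lie.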

[0025] The building air conditioning cooling load prediction method provided in the above embodiments ensures the quality of input data through a systematic data cleaning process, enriches the model's input information and removes redundant features through multi-dimensional feature engineering, eliminates the influence of dimensional differences through normalization processing, and achieves effective integration of local temporal feature extraction and global long-term dependency modeling through a hybrid architecture of bidirectional long short-term memory network and Transformer encoder. Compared with a single model structure, it can more comprehensively depict the changing patterns of cooling load at different time scales, thereby improving prediction accuracy and model robustness.

[0026] In one embodiment, the data cleaning includes: Step S201: Divide the building air conditioning operation data into a training set and a test set.

[0027] Step S202: Perform outlier detection on the training set and the test set respectively to identify abnormal data.

[0028] Step S203: Based on the proportion of abnormal data, process the abnormal data to obtain the cleaned dataset.

[0029] In the application, subsequent embodiments use data from one air-conditioning season of the air conditioning system as a basis for model training and testing. This air conditioning system employs a primary pump variable frequency and variable flow system, consisting of one screw compressor, two cooling water pumps (one in operation and one on standby), two chilled water pumps (one in operation and one on standby), and one cooling tower. All pumps and the cooling tower are equipped with frequency converters.

[0030] In step S201, the training set refers to the subset of data used for model parameter learning and weight optimization. The model learns the mapping relationship between input features and cooling load by iteratively adjusting its internal parameters on the training set. The test set refers to the subset of data used to evaluate the model's generalization performance after training. The data in the test set does not participate in the model's training process and is used to simulate the model's prediction performance when faced with unseen data in real-world applications. In this embodiment, the data is split in chronological order: the first 70% forms the training set and the last 30% forms the test set. Chronological rather than random division preserves the temporal continuity of the time series and avoids data leakage, making the evaluation of the model's generalization performance more accurate and reliable. After the division, outlier detection and data cleaning are performed independently on the training set and the test set to prevent statistical information from the training set from leaking into the test set.
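The chronological 70/30 division can be sketched as follows. The record layout (a dict with a `timestamp` key holding an ISO-formatted string) and the function name are hypothetical; only the ordering by time and the 70% cut come from the text.

```python
def chronological_split(records, train_percent=70):
    """Sort records by time, then cut once: earliest part trains, latest part tests."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = len(ordered) * train_percent // 100  # integer arithmetic avoids float rounding
    return ordered[:cut], ordered[cut:]


# Ten hypothetical hourly records, deliberately supplied out of order.
records = [
    {"timestamp": f"2025-07-01T{hour:02d}:00", "cooling_load_kw": 500.0 + hour}
    for hour in reversed(range(10))
]
train, test = chronological_split(records)
```

Because no shuffling takes place, every test record is later than every training record, so the model is never evaluated on hours that precede data it was trained on.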

[0031] In the application, step S202, outlier detection, refers to the process of identifying data points in the dataset that deviate from the normal range or distribution pattern. In building air conditioning operation data, outliers can be caused by factors including, but not limited to, sensor malfunctions, communication interruptions, human error, and extreme operating conditions. Unprocessed outliers can mislead model training, causing the model to learn incorrect data patterns and thus reducing prediction accuracy. This embodiment performs outlier detection on both the training and test sets separately to ensure the data quality of both subsets is guaranteed.

[0032] In step S203, selecting different processing strategies based on the proportion of outlier data balances data integrity against data quality. When the proportion of outlier data is small, directly deleting the corresponding samples does not significantly affect the overall size and representativeness of the dataset, and the operation is simple and efficient. When the proportion of outlier data is large, deleting all of them would significantly reduce the size of the dataset, which may leave too little data for adequate model training. In this case, mean imputation or sample deletion is chosen according to the specific circumstances of each feature.

[0033] The above embodiments, by first dividing the dataset and then performing outlier detection and processing separately, not only ensure the data quality of the training and test sets, but also avoid the leakage of statistical information from the training data to the test data. At the same time, the hierarchical processing strategy takes into account both data quality and data integrity, laying a reliable data foundation for subsequent feature engineering and model training.

[0034] In one embodiment, the outlier detection includes rule-based detection and detection based on the Isolation Forest algorithm; the rule-based detection is used to remove data that exceeds a preset physical range, and the detection based on the Isolation Forest algorithm is used to identify outliers in the data that deviate from the normal distribution.

[0035] In application, rule-based detection refers to a detection method that, based on the physical meaning and engineering reality of each operating parameter of a building air conditioning system, pre-sets reasonable value ranges for each parameter and identifies data exceeding these ranges as outliers. Its basic principle is that each physical quantity in a building air conditioning system has a clear physical meaning and reasonable value boundaries. Data exceeding these boundaries is usually due to sensor malfunction or data acquisition errors and does not reflect the actual system operating status. In this embodiment, the specific judgment criteria for rule-based detection include: data with relative humidity outside the range of 0% to 100% is considered an outlier; data with air conditioning cooling load less than 0 kW is considered an outlier; data with outdoor temperature less than 0°C is considered an outlier (under the operating conditions of the building's air conditioning cooling season); data with water pump and cooling tower operating frequencies outside the range of 25Hz to 50Hz is considered an outlier; and data with cooling tower power exceeding the rated power range is considered an outlier. Rule-based detection can quickly and accurately eliminate obviously unreasonable data and is the first screening step in the data cleaning process.
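The physical-range rules listed above translate directly into a per-record check. In this sketch the field names and the rated cooling tower power are hypothetical placeholders; the numeric limits are the ones stated in the embodiment.

```python
def violated_rules(row, rated_tower_kw):
    """Return the names of the fields in one hourly record that break a physical rule."""
    reasons = []
    if not 0.0 <= row["relative_humidity"] <= 100.0:
        reasons.append("relative_humidity")
    if row["cooling_load_kw"] < 0.0:
        reasons.append("cooling_load_kw")
    if row["outdoor_temp_c"] < 0.0:  # cooling-season operation only
        reasons.append("outdoor_temp_c")
    for key in ("pump_freq_hz", "tower_freq_hz"):
        if not 25.0 <= row[key] <= 50.0:
            reasons.append(key)
    if row["tower_power_kw"] > rated_tower_kw:
        reasons.append("tower_power_kw")
    return reasons


good = {"relative_humidity": 65.0, "cooling_load_kw": 520.0, "outdoor_temp_c": 31.0,
        "pump_freq_hz": 42.0, "tower_freq_hz": 38.0, "tower_power_kw": 5.5}
bad = dict(good, relative_humidity=120.0, pump_freq_hz=10.0)

good_reasons = violated_rules(good, rated_tower_kw=7.5)
bad_reasons = violated_rules(bad, rated_tower_kw=7.5)
```

Returning the violated field names rather than a single flag keeps a per-feature record of what went wrong, which is useful when the later handling step decides feature by feature.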

[0036] Outlier detection based on the Isolation Forest algorithm refers to further identifying outliers, using the unsupervised Isolation Forest machine learning algorithm, in data that has already passed rule-based detection. Isolation Forest is an ensemble learning algorithm specifically designed for outlier detection. Its basic principle is to recursively divide the data space into subspaces by randomly selecting features and split values, constructing multiple isolation trees. During this partitioning, normal data points located in dense areas require more splits to be isolated into a single subspace because many similar data points surround them, while outliers that deviate from the normal distribution require fewer splits because they are farther from other data points and sparsely distributed. Therefore, the average number of splits required to isolate each data point across all isolation trees (i.e., the path length) quantifies its degree of anomaly: the shorter the path length, the more likely the data point is an outlier. The Isolation Forest algorithm does not require predefined mathematical models or labeled training data, making it particularly suitable for scenarios lacking labeled information, such as building air conditioning operation data. In this embodiment, the number-of-trees parameter n_estimator of the Isolation Forest is set to 100, meaning 100 isolation trees are constructed for ensemble judgment. Isolation Forest detection can discover hidden outliers that rule-based detection misses, namely points that do not violate physical range constraints but deviate from the normal data distribution in the multidimensional feature space, and is therefore an effective supplement to rule-based detection.
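The path-length idea can be illustrated with a deliberately simplified isolation forest written in plain Python. This is a teaching sketch, not a production implementation: it omits the subsampling and score normalization of the full algorithm, stops a branch when the randomly chosen feature cannot be split, and uses made-up (temperature, load) pairs. Only the use of 100 trees follows the embodiment.

```python
import random


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random split value."""
    if len(points) <= 1 or depth >= max_depth:
        return None  # leaf
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return None  # chosen feature is constant here; stop (simplification)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return (dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point):
    """Number of splits passed before the point reaches a leaf."""
    depth = 0
    while tree is not None:
        dim, split, left, right = tree
        tree = left if point[dim] < split else right
        depth += 1
    return depth


def average_path_lengths(points, n_trees=100, max_depth=10, seed=0):
    """Average path length of every point over n_trees random trees."""
    rng = random.Random(seed)
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    return [sum(path_length(t, p) for t in trees) / n_trees for p in points]


# 30 ordinary records plus one whose load is unusual for its temperature.
points = [(20.0 + 0.1 * i, 500.0 + 2.0 * i) for i in range(30)] + [(21.0, 2000.0)]
lengths = average_path_lengths(points)
most_anomalous = lengths.index(min(lengths))
```

The last record passes every physical-range rule, yet it is isolated after far fewer splits than the others, so it ends up with the shortest average path.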

[0037] The above embodiments construct a two-level outlier detection mechanism by combining rule-based detection and the Isolation Forest algorithm: rule-based detection first quickly removes obviously unreasonable data based on physical constraints, while the Isolation Forest algorithm then performs in-depth statistical anomaly identification on the remaining data. The two methods complement each other, making outlier detection more comprehensive, capable of handling simple numerical out-of-bounds issues as well as identifying hidden anomalies at the data distribution level, thereby effectively improving the thoroughness and reliability of data cleaning.

[0038] In one embodiment, as shown in Figure 2, processing the abnormal data based on the proportion of abnormal data includes: when the proportion of abnormal data is less than a preset threshold, deleting the samples corresponding to the abnormal data; when the proportion of abnormal data is greater than or equal to the preset threshold, applying mean imputation or sample deletion to the abnormal data.

[0039] In application, the preset threshold refers to the boundary value used to distinguish between high and low proportions of abnormal data, which is set to 3% in this embodiment. The setting of this threshold takes into account factors such as the size of the dataset, the scarcity of data, and the amount of data required for model training.

[0040] In applications, when the proportion of outlier data is less than 3%, it indicates that the number of outliers in the dataset is relatively small. After deleting these samples, the remaining dataset still has sufficient size and representativeness to support model training. In this case, directly deleting the samples corresponding to the outliers is the simplest and most effective approach. It can completely eliminate the interference of outliers on model training without substantially affecting the integrity of the dataset.

[0041] When the proportion of outlier data is greater than or equal to 3%, the dataset contains a large number of outliers. Directly deleting all of them would significantly reduce the dataset size, potentially leaving insufficient data for model training. Therefore, the processing method is chosen according to the specific features and samples: for outliers with small deviations and a relatively stable surrounding data distribution, mean imputation can be used, replacing the outlier with the mean of its neighboring normal data points and thus correcting the outlier value while preserving the sample; for outliers with large deviations, or where a discontinuous data distribution makes mean imputation unreliable, sample deletion remains the preferred method. Mean imputation is a commonly used method for filling missing and outlier values. It estimates a reasonable value for the outlier from the normal data surrounding it, thus maintaining the continuity and integrity of the time series data to a certain extent.
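The two-branch handling described above can be sketched for a single feature column. The 3% threshold is from the embodiment; the function name and the rule used to choose between imputation and deletion (impute only when both immediate neighbours are normal) are assumptions of this sketch, standing in for the case-by-case judgment the text describes.

```python
def handle_outliers(values, is_outlier, threshold=0.03):
    """Clean one feature column given a parallel list of outlier flags."""
    n = len(values)
    if n == 0:
        return []
    ratio = sum(is_outlier) / n
    if ratio < threshold:
        # Few outliers: simply drop the flagged samples.
        return [v for v, bad in zip(values, is_outlier) if not bad]
    cleaned = []
    for i, (v, bad) in enumerate(zip(values, is_outlier)):
        if not bad:
            cleaned.append(v)
            continue
        prev_ok = i > 0 and not is_outlier[i - 1]
        next_ok = i + 1 < n and not is_outlier[i + 1]
        if prev_ok and next_ok:
            # Mean imputation from the two neighbouring normal points.
            cleaned.append((values[i - 1] + values[i + 1]) / 2)
        # Otherwise the sample is dropped: no reliable neighbours to average.
    return cleaned


# 1 outlier in 5 samples (20%): imputed from its neighbours.
imputed = handle_outliers([500.0, 510.0, -20.0, 530.0, 540.0],
                          [False, False, True, False, False])

# 1 outlier in 100 samples (1%): deleted.
many = [500.0] * 100
flags = [False] * 100
flags[40] = True
deleted = handle_outliers(many, flags)
```

In a full pipeline the deletion branch would remove the whole record across all features, not just one column entry; the single-column form is used here only to keep the example short.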

[0042] The above embodiments achieve an effective balance between data quality assurance and data integrity maintenance by setting a threshold for the proportion of abnormal data and adopting a hierarchical processing strategy. This avoids the problem of insufficient training samples caused by excessive data deletion, while ensuring the reliability of the retained data.
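The two-tier strategy of paragraphs [0039] to [0041] can be sketched roughly as follows. This is only an illustration under stated assumptions: the 3% threshold comes from the embodiment, but the function name `handle_outliers`, the neighbourhood size `window`, and the relative-deviation cut-off `max_rel_dev` used to decide what counts as a "small deviation" are hypothetical, since the text does not quantify them.

```python
from statistics import mean

def handle_outliers(values, is_outlier, threshold=0.03, window=2, max_rel_dev=0.5):
    """Two-tier outlier handling; returns (cleaned_values, kept_indices).

    threshold is the 3% boundary from the embodiment; window and max_rel_dev
    are hypothetical knobs for "neighbouring points" and "small deviation".
    """
    n = len(values)
    ratio = sum(is_outlier) / n
    cleaned, kept = [], []
    for i, (v, bad) in enumerate(zip(values, is_outlier)):
        if not bad:
            cleaned.append(v)
            kept.append(i)
            continue
        if ratio < threshold:
            continue  # few outliers overall: simply delete the sample
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = [values[j] for j in range(lo, hi) if not is_outlier[j]]
        if not neighbours:
            continue  # no normal data nearby: imputation would be unreliable
        local_mean = mean(neighbours)
        if local_mean != 0 and abs(v - local_mean) <= max_rel_dev * abs(local_mean):
            cleaned.append(local_mean)  # small deviation: mean imputation
            kept.append(i)
        # large deviation: fall through, i.e. delete the sample
    return cleaned, kept
```

The outlier flags themselves would come from the detection step; here they are simply passed in as a list of booleans.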

[0043] In one embodiment, the feature engineering process on the cleaned dataset includes: Step S301: Based on domain knowledge of building air conditioning, feature construction is performed on the original features in the cleaned dataset to generate derived features including temperature and humidity index features, perceived temperature features, and lag features.

[0044] Step S302: Use multiple feature evaluation methods to comprehensively rank the derived features, and select target derived features from the derived features.

[0045] Step S303: Merge the target derived features with the original features to obtain the target feature dataset.

[0046] In application, the feature engineering process for the cleaned dataset is as shown in Figure 3.

[0047] In application, step S301, feature construction based on domain knowledge of building air conditioning, refers to using professional knowledge and physical laws from HVAC engineering to selectively transform and combine the original features, generating new features that more directly reflect the factors influencing cooling load. Among these, the temperature and humidity index feature is a composite index that reflects the combined impact of temperature and humidity on human thermal comfort and air conditioning cooling load. Since cooling load is affected not only by temperature but also by air humidity, the temperature and humidity index characterizes the combined effect of environmental heat and humidity on cooling load more accurately than temperature or humidity alone. The perceived temperature feature is an equivalent temperature index actually felt by the human body after accounting for air temperature, humidity, and wind speed. It correlates strongly with the building air conditioning cooling load and more realistically reflects how human thermal comfort needs drive the air conditioning system load. The lag feature is a time-dimensional feature constructed from the differences between time steps or from data values at historical time steps, and it characterizes the time lag effect and historical inertia of cooling load changes. In this embodiment, the lag step size for regular features is set to [1, 2, 3, 6, 12, 24] hours, the historical lag step size is set to [1, 2, 3, 4, 6, 12, 24, 48] hours, and the rolling statistical window size is set to [3, 6, 12, 24] hours. Using the above feature construction method, this embodiment generates a total of 156 derived features.
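As a rough illustration of the lag and rolling-window construction, the sketch below builds lagged values and rolling means for one hourly series. The step and window lists are the ones stated in the embodiment; the function name, the feature naming scheme, the choice of the mean as the rolling statistic, and the assumption of hourly sampling are hypothetical. The rolling window deliberately covers only values strictly before the current step so that the current value does not leak into its own features.

```python
from statistics import mean

LAG_STEPS = [1, 2, 3, 6, 12, 24]   # hours, regular-feature lags from the embodiment
ROLLING_WINDOWS = [3, 6, 12, 24]   # hours, rolling-statistic windows from the embodiment

def build_lag_features(series, lags=LAG_STEPS, windows=ROLLING_WINDOWS):
    """Return one feature dict per time step; None where history is too short."""
    rows = []
    for t in range(len(series)):
        row = {}
        for k in lags:
            # value observed k hours before time step t
            row[f"lag_{k}h"] = series[t - k] if t >= k else None
        for w in windows:
            # mean of the w values strictly before t
            row[f"roll_mean_{w}h"] = mean(series[t - w:t]) if t >= w else None
        rows.append(row)
    return rows
```

Rows with `None` entries (the first 24 hours here) would typically be dropped before training.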

[0048] In application, step S302, which involves using multiple feature evaluation methods to comprehensively rank derived features, refers to using feature evaluation methods based on different principles to score the importance of each derived feature, and then combining the scores from each method to rank them, in order to select the feature subset that contributes the most to cooling load prediction. The purpose of comprehensive ranking is to overcome the bias and limitations of a single evaluation method, and to make the feature selection results more objective and reliable by mutually verifying and supplementing the evaluation results from multiple perspectives.

[0049] In application, step S303, merging the target derived features with the original features, refers to horizontally concatenating the filtered and retained derived features with the cleaned original features to form the final feature dataset used for model training. The original features retain basic information about the building's air conditioning system operation, while the target derived features supplement higher-level information reflecting the interaction relationships and temporal patterns among factors influencing cooling load. The merging of the two enables the model to learn the changing patterns of cooling load from multiple dimensions and levels. The feature importance ranking is shown in Figure 4. In this embodiment, a total of 15 target features were ultimately determined.

[0050] The above embodiments, through two stages of feature construction based on domain knowledge and feature selection based on multi-method comprehensive evaluation, effectively remove redundant and low-contribution features while enriching the dimensions of input information. This not only improves the model's ability to characterize the changing patterns of cooling load, but also reduces the adverse effects of feature redundancy on model training efficiency and generalization performance.

[0051] In one embodiment, the step of using multiple feature evaluation methods to comprehensively rank the derived features includes: scoring the derived features using random forest feature importance assessment, mutual information assessment, and correlation analysis, respectively; and, based on the scoring results of each feature evaluation method, comprehensively ranking the derived features and selecting the top-ranked derived features as the target derived features.

[0052] In applications, feature importance assessment in Random Forest refers to quantifying the importance of each feature by calculating its contribution to the prediction accuracy of the target variable during node splitting across all decision trees in the Random Forest ensemble learning algorithm. The principle is that a Random Forest consists of multiple decision trees. At each node, each decision tree selects the feature that maximizes the reduction in the impurity (e.g., mean squared error) of the target variable after splitting. The total reduction in impurity caused by each feature across all nodes in all decision trees is the measure of that feature's importance. A higher importance value indicates a greater contribution of the feature to cooling load prediction. This method can automatically handle nonlinear relationships and interaction effects between features, making it suitable for assessing the relative importance of features in high-dimensional feature sets.

[0053] In applications, mutual information assessment refers to a method that uses mutual information, an information-theoretic metric, to measure the strength of the statistical dependency between derived features and the target cooling load forecast. Mutual information measures the amount of information shared between two random variables; a higher value indicates a stronger dependency between the two variables. Unlike the linear correlation coefficient, mutual information can capture non-linear dependencies between features and the target variable, thus offering a unique advantage in assessing complex non-linear associations that may exist in building air conditioning operation data.
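To make the contrast with linear correlation concrete, the following sketch estimates mutual information from a joint histogram. It is only one possible estimator: the patent does not say how mutual information is computed, so the equal-width binning, the default of 4 bins, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from math import log

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant input: put everything in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=4):
    """Histogram estimate of I(X;Y) in nats."""
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())

x = list(range(-8, 9))
y_quadratic = [v * v for v in x]  # depends on x, but not linearly
```

For the symmetric toy input above, the dependence of `y_quadratic` on `x` is purely nonlinear, yet the estimate is clearly positive, which is the property the paragraph relies on.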

[0054] In applications, correlation analysis refers to calculating the Pearson correlation coefficient between each derived feature and the cooling load prediction target to measure the degree of linear correlation between them. The Pearson correlation coefficient ranges from -1 to 1, with a larger absolute value indicating a stronger linear correlation. Correlation analysis can intuitively reveal the direction and strength of the linear relationship between features and the target, and is one of the fundamental methods for feature evaluation.
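The Pearson coefficient described here can be written out directly from its definition. The sketch below is a plain-Python version for illustration; the function name is arbitrary and the toy inputs are not taken from the patent.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)
```

A symmetric quadratic relationship such as `pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])` evaluates to zero even though the second list is fully determined by the first, which is why a purely linear measure is combined with other evaluation methods.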

[0055] In application, the comprehensive ranking is implemented as follows: features are ranked separately based on the scoring results of each evaluation method. Then, the rankings of each feature across the three methods are comprehensively calculated (e.g., by taking the average ranking or a weighted ranking). Finally, the features are sorted from highest to lowest according to the comprehensive ranking results, and the top-ranked derived features are selected as target derived features. By integrating the results of three evaluation methods based on different principles, the importance of features can be comprehensively evaluated from three complementary perspectives: nonlinear relationship evaluation, information theory dependency measurement, and linear correlation analysis, making the screening results more robust and reliable.
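A minimal sketch of the comprehensive ranking, assuming the unweighted average-rank variant mentioned as an example: each method's scores are converted to ranks, the ranks are averaged, and the features with the best average rank are kept. The feature names and score values below are invented purely for illustration and are not results from the patent; correlation scores are compared by absolute value, as the strength of a linear relationship does not depend on its sign.

```python
def rank_descending(scores):
    """Map feature name -> rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def aggregate_rankings(score_tables, top_k):
    """Average the per-method ranks and return the top_k feature names."""
    ranks = [rank_descending(table) for table in score_tables]
    features = list(score_tables[0])
    avg_rank = {f: sum(r[f] for r in ranks) / len(ranks) for f in features}
    return sorted(features, key=lambda f: avg_rank[f])[:top_k]

# Hypothetical scores from the three evaluation methods.
rf_importance = {"lag_1h": 0.50, "thi": 0.30, "roll_mean_24h": 0.15, "lag_48h": 0.05}
mutual_info = {"lag_1h": 0.90, "thi": 0.40, "roll_mean_24h": 0.60, "lag_48h": 0.10}
correlation = {"lag_1h": 0.95, "thi": -0.70, "roll_mean_24h": 0.50, "lag_48h": 0.20}
corr_abs = {f: abs(v) for f, v in correlation.items()}

selected = aggregate_rankings([rf_importance, mutual_info, corr_abs], top_k=2)
```

A weighted variant would multiply each method's rank by a method-specific weight before summing.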

[0056] The above embodiments comprehensively rank features by integrating three different principles: random forest feature importance assessment, mutual information assessment, and correlation analysis. This overcomes the limitations of a single method in terms of assessment dimensions, comprehensively measures the contribution of each derived feature to cooling load prediction from multiple perspectives, and improves the objectivity and reliability of feature selection results.

[0057] As shown in Figure 5, in one embodiment, in the hybrid prediction model, the bidirectional long short-term memory network is used to concatenate the forward feature representation and the backward feature representation to obtain a bidirectional temporal feature representation that integrates past and future information; the Transformer encoder is used to sequentially perform positional encoding, layer normalization, multi-head self-attention calculation, residual connection, and feedforward neural network processing on the input bidirectional temporal feature representation to output the prediction result; wherein, the bidirectional long short-term memory network includes a forward long short-term memory network and a backward long short-term memory network; the forward long short-term memory network is used to process the input sequence in forward time order and output the forward feature representation; the backward long short-term memory network is used to process the input sequence in reverse time order and output the backward feature representation.

[0058] In applications, the forward long short-term memory (LSTM) network processes the input sequence in ascending time order (i.e., from the first time step to the T-th time step). LSTM is a special type of recurrent neural network whose core design uses a memory unit (cell state) and a gating mechanism to control the flow of information. Each LSTM unit contains three gates. The forget gate determines which historical information in the memory unit should be discarded; it maps its input to the [0, 1] interval using the Sigmoid activation function, with output values closer to 0 indicating that the corresponding information should be forgotten. The input gate determines which new information should be written into the memory unit; this is achieved through the combined action of the Sigmoid and Tanh activation functions, with the Sigmoid function controlling the writing ratio and the Tanh function generating candidate memory values. The output gate determines which information in the memory unit should be output as the hidden state at the current time step, controlling the output ratio using the Sigmoid activation function. Because the forward LSTM network processes the input sequence in chronological order, its output forward feature representation encodes historical information from the beginning of the sequence to the current time step.
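For reference, the three gates described above correspond to the commonly used LSTM update equations below. The patent does not write these out, so the notation is an assumption: $x_t$ is the input at time step $t$, $h_{t-1}$ the previous hidden state, $c_t$ the cell state, $\sigma$ the Sigmoid function, $\odot$ element-wise multiplication, and $W_\ast$, $b_\ast$ learned weights and biases.

```latex
\begin{align}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{align}
```

A forget-gate value near 0 suppresses the corresponding component of $c_{t-1}$, matching the description that such information is forgotten.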

[0059] In applications, the Backward Long Short-Term Memory (LSTM) network processes the input sequence in reverse temporal order (i.e., from the T-th time step to the 1st time step). Its internal structure is the same as the Forward LSTM network, differing only in the processing direction of the input sequence. The output backward feature representation of the Backward LSTM network encodes future information from the end of the sequence to the current time step. By concatenating the forward and backward feature representations along the feature dimension, the resulting bidirectional temporal feature representation contains both past and future information, enabling the model to utilize complete contextual information to characterize the features of each time step. In this embodiment, the number of hidden layer cells in the LSTM is set to 256, the number of LSTM layers is set to 3, the input sequence length is set to 36 hours, and the Dropout regularization ratio is set to 0.25.
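The forward/backward processing and the concatenation along the feature dimension can be illustrated without a deep learning framework. In the sketch below a simple exponential running summary stands in for the LSTM cell; this stand-in is an assumption chosen only to keep the example short and is not the network used in the embodiment. What the sketch does reproduce is the data flow: one pass in time order, one pass in reverse order, re-alignment of the reverse pass, and per-step concatenation.

```python
def run_recurrence(sequence, decay=0.5):
    """Stand-in for an LSTM pass: running summary of the inputs seen so far."""
    state, outputs = 0.0, []
    for x in sequence:
        state = decay * state + (1 - decay) * x
        outputs.append(state)
    return outputs

def bidirectional_features(sequence):
    """Concatenate forward and backward summaries for every time step."""
    forward = run_recurrence(sequence)
    # process in reverse time order, then flip back so index t matches time step t
    backward = run_recurrence(sequence[::-1])[::-1]
    return [[f, b] for f, b in zip(forward, backward)]

features = bidirectional_features([1.0, 0.0, 0.0, 4.0])
```

At the first time step the forward component has seen only the first input, while the backward component already summarises the whole remaining sequence, which is the sense in which the concatenated representation carries both past and future context.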

[0060] In applications, after the bidirectional temporal feature representation is input into the Transformer encoder, the Transformer encoder performs the following processing steps in sequence. Positional encoding is the first processing step. Since the Transformer architecture itself contains no recurrence or convolution, it cannot naturally perceive the positional order of elements in the input sequence. Positional encoding superimposes positional information onto the input bidirectional temporal feature representation, enabling the Transformer to distinguish data from different time steps in the sequence and thus preserving temporal information. Positional encoding is typically generated using sine and cosine functions; combinations of sine and cosine functions of different frequencies generate a unique encoding vector for each position.
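One common way to realise the sine and cosine encoding is sketched below. The frequency base of 10000 and the interleaving of sine on even indices and cosine on odd indices follow the widely used Transformer convention and are assumptions here, since the patent only states that sine and cosine functions of different frequencies are used.

```python
from math import sin, cos

def positional_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table of sinusoidal position codes."""
    table = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / base ** (i / d_model)  # lower frequency for larger i
            table[pos][i] = sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = cos(angle)
    return table

# Sequence length 36 and model dimension 256 are the values from the embodiment.
pe = positional_encoding(36, 256)
```

The table is added element-wise to the input representation, so both must share the same dimension.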

[0061] Layer normalization is an operation that normalizes positionally encoded data along its feature dimension. It normalizes the data to a zero-mean, unit-variance distribution by calculating the mean and variance of each sample along the feature dimension, and then performs an affine transformation using learnable scaling and offset parameters. The purpose of layer normalization is to stabilize the distribution of input data across layers of a neural network and alleviate internal covariate shift during training, thereby accelerating model convergence and improving training stability.
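The normalization step can be sketched for a single feature vector as follows. The small constant `eps` added to the variance for numerical stability is a conventional assumption not mentioned in the text, and `gamma` and `beta` stand for the learnable scaling and offset parameters, shown here at their usual initial values.

```python
from math import sqrt

def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance, then rescale."""
    d = len(x)
    gamma = gamma or [1.0] * d  # learnable scale, identity by default
    beta = beta or [0.0] * d    # learnable offset, zero by default
    mu = sum(x) / d
    var = sum((v - mu) ** 2 for v in x) / d
    return [g * (v - mu) / sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]
```

The statistics are computed per sample over the feature dimension, so the result does not depend on the other samples in the batch.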

[0062] Multi-head self-attention is the core computational module of the Transformer encoder. The self-attention mechanism establishes a direct correlation between any two positions globally by calculating the attention weights between each time step and all other time steps in the sequence, thereby capturing long-term dependencies in the data. Multi-head self-attention projects the input data into multiple independent subspaces through different linear transformations, calculates self-attention in each subspace, and finally concatenates and linearly transforms the attention outputs of each subspace to obtain the final attention output. The multi-head mechanism enables the model to simultaneously focus on correlation information at different positions and levels in the sequence, enhancing the model's expressive power. In this embodiment, the Transformer dimension is set to 256, the number of attention heads is set to 8, the number of attention layers is set to 3, and the Dropout regularization ratio is set to 0.25.
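The core of the attention computation can be sketched for a single head as scaled dot-product attention. To stay short, the sketch uses the inputs directly as queries, keys, and values, that is, it omits the learned linear projections and the split into multiple heads; both simplifications are assumptions made for illustration. Under the usual even split, the embodiment's dimension of 256 with 8 heads would give each head 256 / 8 = 32 dimensions.

```python
from math import exp, sqrt

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(x[0])
    outputs, weights = [], []
    for query in x:
        scores = [sum(a * b for a, b in zip(query, key)) / sqrt(d) for key in x]
        w = softmax(scores)  # attention of this time step over all time steps
        weights.append(w)
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights

out, attn = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
```

In the toy input the first and third time steps are identical, so the first step attends to the third exactly as strongly as to itself regardless of the distance between them, which illustrates how attention links distant positions directly.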

[0063] Residual connections are operations that add the output of multi-head self-attention computation to its input element-wise. The purpose of residual connections is to provide a direct backpropagation path for gradients, alleviating the vanishing gradient problem in deep network training, enabling the model to learn identity mappings more easily, and accelerating the convergence process.

[0064] A feedforward network (FFN) is a subnetwork module consisting of two fully connected layers and an intermediate nonlinear activation function. The FFN performs a nonlinear transformation on the data after attention calculations and residual connections, enhancing the model's ability to fit complex data patterns. In this embodiment, the hidden layer dimension of the FFN is set to 512, and the ReLU activation function is used.

[0065] The above embodiments extract local temporal features that integrate past and future information through a bidirectional long short-term memory network. Then, global long-term dependencies are captured through processing steps such as positional encoding, layer normalization, multi-head self-attention computation, residual connections, and feedforward neural networks using a Transformer encoder. This achieves an organic combination of local feature extraction and global dependency modeling. The gating mechanism of the bidirectional long short-term memory network excels at capturing fine-grained change patterns between consecutive time steps, while the self-attention mechanism of the Transformer encoder excels at discovering correlations between distant time steps. The synergistic effect of these two mechanisms enables the hybrid model to comprehensively characterize the changing patterns of cooling load across different time scales, thereby improving prediction accuracy.

[0066] In one embodiment, during the training of the hybrid prediction model, a hybrid loss function is used for optimization. The hybrid loss function is a weighted combination of mean squared error loss and mean absolute error loss. An early stopping mechanism is set during the training process, and training is stopped when the hybrid loss function no longer decreases within a preset number of consecutive rounds.

[0067] In applications, a hybrid loss function refers to a composite optimization objective function formed by weighted combination of two different types of loss functions. Mean Squared Error (MSE) is the mean of the squares of the differences between predicted and true values. Because the error is squared, MSE has a stronger penalty for larger prediction deviations, effectively driving the model to reduce the bias at large error prediction points and helping to improve the model's prediction accuracy. Mean Absolute Error (MAE) is the mean of the absolute values of the differences between predicted and true values. MAE imposes an equal penalty on all error points, avoiding over-adjustment of model parameters for individual extreme error points, thus exhibiting strong robustness and reducing the interference of outliers on the model's training direction. Weighted combination of MSE and MAE as a hybrid loss function can simultaneously consider prediction accuracy and model robustness. It drives the model to reduce the bias at large error prediction points through the MSE component, while maintaining a balanced focus on overall prediction bias through the MAE component, preventing the model from overfitting to individual extreme data points.

[0068] In applications, early stopping is a regularization technique used to automatically determine the optimal number of training epochs during model training to prevent overfitting. Its working principle is as follows: after each training epoch, the loss function value of the model on the validation set is calculated. If the loss function value does not decrease within a preset number of consecutive epochs (called the patience value), the model's generalization performance is considered to have reached its optimal level or begun to degrade. At this point, the training process is terminated, and the model parameters corresponding to the lowest loss function value are saved. Early stopping effectively avoids model overfitting caused by overtraining while saving unnecessary computational resources. In this embodiment, the optimizer used is the AdamW optimizer, with a learning rate of 0.0006, a batch size of 24, and a maximum of 800 training epochs.
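The stopping rule can be sketched independently of any training framework by feeding it a list of per-epoch validation losses. The 800-epoch cap is taken from the embodiment; the patience value is not stated in the text, so the value used in the example call is hypothetical, as are the function name and the loss values.

```python
def early_stopping(val_losses, patience, max_epochs=800):
    """Return (best_epoch, best_loss, stopped_epoch) for a list of validation losses."""
    best_loss, best_epoch, wait, epoch = float("inf"), 0, 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            # improvement: remember it (a real loop would save model parameters here)
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss, epoch

# Hypothetical validation losses and patience value.
result = early_stopping([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6], patience=3)
```

With these inputs training stops at epoch 6 and the parameters from epoch 3 are kept; the later improvement at epoch 7 is never reached, which shows the trade-off involved in choosing the patience value.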

[0069] The above embodiments balance prediction accuracy and robustness through the design of a hybrid loss function, and automatically determine the optimal number of training rounds and prevent overfitting through an early stopping mechanism. The combination of these two approaches effectively improves the rigor and stability of the model training process.

[0070] In one embodiment, the expression for the hybrid loss function is: Loss = α × MSE + β × MAE, where Loss is the value of the hybrid loss function, MSE is the mean squared error loss, MAE is the mean absolute error loss, and α and β are weighting coefficients with α + β = 1.

[0071] In application, α and β are weighting coefficients that control the respective contributions of mean squared error loss (MSE) and mean absolute error (MAE) loss to the mixed loss function. The constraint α + β = 1 ensures that the magnitude of the mixed loss function is at the same level as the single loss function, facilitating the setting and adjustment of hyperparameters such as the learning rate. A larger value for α results in a higher proportion of MSE in the mixed loss function, leading to stronger penalties for large error prediction points during model training, which is beneficial for improving prediction accuracy but may increase the model's sensitivity to outliers. Conversely, a larger value for β results in a higher proportion of MAE, leading to more balanced penalties for each error point during model training, which is beneficial for improving the model's robustness but may reduce the model's ability to correct large error prediction points. Therefore, the values of α and β need to be carefully chosen based on the characteristics of the actual data and application requirements. In this embodiment, α is set to 0.6 and β is set to 0.4, meaning the specific form of the mixed loss function is Loss = 0.6 × MSE + 0.4 × MAE. This value ensures that the MSE component dominates to guarantee prediction accuracy, while the MAE component provides some robust compensation.
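The weighted combination with the embodiment's coefficients (α = 0.6, β = 0.4) can be written as a short function. This is a plain-Python sketch for illustration; the function names are arbitrary and the sample values in the usage note are not from the patent.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def hybrid_loss(y_true, y_pred, alpha=0.6, beta=0.4):
    """Loss = alpha * MSE + beta * MAE, with alpha + beta expected to equal 1."""
    assert abs(alpha + beta - 1.0) < 1e-9, "weights are expected to sum to 1"
    return alpha * mse(y_true, y_pred) + beta * mae(y_true, y_pred)
```

For example, with true values `[1.0, 2.0, 3.0]` and predictions `[1.0, 2.0, 5.0]`, the single error of 2 contributes 4/3 to the MSE but only 2/3 to the MAE, so increasing α makes the loss more sensitive to that one large deviation.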

[0072] The above embodiments achieve an effective balance between prediction accuracy and model robustness by reasonably configuring the weight coefficients in the hybrid loss function, enabling the model training process to take into account both the key correction of large errors and the balanced optimization of the overall error.

[0073] To verify the effectiveness of the method proposed in this invention, comparative experiments were conducted using actual operating data from the aforementioned air conditioning system. As shown in Figure 6 and Table 1, the prediction accuracy of the proposed bidirectional LSTM-Transformer hybrid architecture was compared with that of a single LSTM model and a feature-engineered LSTM model. The evaluation metrics included root mean square error (RMSE), coefficient of determination (R²), and mean absolute percentage error (MAPE).
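The three evaluation metrics have standard definitions, sketched below in plain Python. The function names are arbitrary, and the MAPE version assumes that no true value is zero.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the target (kW here)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Lower RMSE and MAPE and a higher R² indicate better predictions.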

[0074] Table 1: Comparison of prediction accuracy of the three prediction models

Model | RMSE (kW) | R² | MAPE (%)
Single LSTM | 78.48 | 0.90 | 10.13
Feature-engineered LSTM | 60.12 | 0.94 | 7.04
BiLSTM-Transformer (this invention) | 48.88 | 0.96 | 5.67

Experimental results show that the single LSTM model has an RMSE of 78.48 kW, an R² of 0.90, and a MAPE of 10.13%; the feature-engineered LSTM model has an RMSE of 60.12 kW, an R² of 0.94, and a MAPE of 7.04%; and the BiLSTM-Transformer hybrid model proposed in this invention has an RMSE of 48.88 kW, an R² of 0.96, and a MAPE of 5.67%. Compared with the single LSTM model, the method of this invention reduces the RMSE by 37.72%, improves R² by 0.06, and lowers the MAPE by 4.46 percentage points; compared with the feature-engineered LSTM model, it further reduces the RMSE by 18.70%, improves R² by 0.02, and lowers the MAPE by 1.37 percentage points. The experimental results verify the superiority of the hybrid architecture proposed in this invention in the task of predicting building air conditioning cooling load.
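The reported improvements follow arithmetically from the three sets of figures, as the short check below shows. The RMSE change is a relative reduction in percent, while the R² and MAPE changes are simple differences. The dictionary keys are labels chosen for this sketch.

```python
# Figures as reported in paragraph [0074].
results = {
    "LSTM": {"rmse": 78.48, "r2": 0.90, "mape": 10.13},
    "FE-LSTM": {"rmse": 60.12, "r2": 0.94, "mape": 7.04},
    "BiLSTM-Transformer": {"rmse": 48.88, "r2": 0.96, "mape": 5.67},
}

def relative_reduction(baseline, value):
    """Percentage by which value is lower than baseline."""
    return 100.0 * (baseline - value) / baseline

def compare(baseline, proposed="BiLSTM-Transformer"):
    """Improvement of the proposed model over a baseline, rounded to two decimals."""
    b, p = results[baseline], results[proposed]
    return {
        "rmse_reduction_pct": round(relative_reduction(b["rmse"], p["rmse"]), 2),
        "r2_gain": round(p["r2"] - b["r2"], 2),
        "mape_drop_pp": round(b["mape"] - p["mape"], 2),
    }
```

Calling `compare("LSTM")` and `compare("FE-LSTM")` reproduces the 37.72% and 18.70% RMSE reductions stated in the text.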

[0075] In summary, this method cleans the collected building air conditioning operation data to remove outliers and noise; it then performs feature engineering on the cleaned dataset, constructing derived features such as temperature and humidity index, perceived temperature, and lag features based on domain knowledge, and employs multiple evaluation methods to select target features; it normalizes the target feature dataset to eliminate dimensional differences; and it trains a hybrid prediction model containing a bidirectional long short-term memory network (BiLSTM) and a Transformer encoder. The BiLSTM processes sequential data in both forward and backward directions to extract local temporal features, while the Transformer encoder captures long-term dependencies through a multi-head self-attention mechanism. Finally, the trained model is used to predict the cooling load for the desired period. This method, through the synergistic effect of the BiLSTM and the Transformer encoder, balances fine-grained capture of short-term local fluctuations with effective modeling of long-term global trends. Combined with a systematic data preprocessing and feature engineering process, it effectively improves the accuracy and robustness of building air conditioning cooling load prediction.

[0076] It should be understood that the sequence number of each step in the above embodiments does not imply the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.

[0077] This application also provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the electronic device implements the building air conditioning cooling load prediction method provided in the first aspect of this application.

[0078] This application also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps described in the various method embodiments above.

[0079] This application provides a computer program product, including a computer program, which, when run on an electronic device, enables the electronic device to perform the steps described in the various method embodiments above.

[0080] The above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should all be included within the protection scope of this application.

Claims

1. A method for predicting building air conditioning cooling load, characterized in that it includes: acquiring real-time operating data of the building air conditioning system and inputting it into a pre-trained cooling load prediction model for prediction to obtain a cooling load prediction result; wherein the pre-training process of the cooling load prediction model includes: collecting historical operating data of the building air conditioning system and performing data cleaning to obtain a cleaned dataset; performing feature engineering on the cleaned dataset to construct derived features and select target features, obtaining a target feature dataset; normalizing the target feature dataset, and inputting the normalized data into a pre-constructed hybrid prediction model for training to obtain the cooling load prediction model; wherein the hybrid prediction model includes a bidirectional long short-term memory network and a Transformer encoder, the bidirectional long short-term memory network is used to extract local temporal features of the data, and the Transformer encoder is used to capture long-term dependencies of the data.

2. The building air conditioning cooling load prediction method according to claim 1, characterized in that the data cleaning process includes: dividing the historical operating data of the building air conditioning system into a training set and a test set; performing outlier detection on both the training set and the test set to identify abnormal data; and processing the abnormal data based on the proportion of abnormal data to obtain the cleaned dataset.

3. The building air conditioning cooling load prediction method according to claim 2, characterized in that the outlier detection includes rule-based detection and detection based on the isolation forest algorithm; the rule-based detection is used to remove data that exceeds a preset physical range; and the detection based on the isolation forest algorithm is used to identify outliers in the data that deviate from the normal distribution.

4. The building air conditioning cooling load prediction method according to claim 2, characterized in that the step of processing the abnormal data based on the proportion of abnormal data includes: when the proportion of abnormal data is less than a preset threshold, deleting the samples corresponding to the abnormal data; and when the proportion of abnormal data is greater than or equal to the preset threshold, applying mean imputation or sample deletion to the abnormal data.

5. The building air conditioning cooling load prediction method according to claim 1, characterized in that the feature engineering process performed on the cleaned dataset includes: based on domain knowledge of building air conditioning, performing feature construction on the original features in the cleaned dataset to generate derived features including temperature and humidity index features, perceived temperature features, and lag features; using multiple feature evaluation methods to comprehensively rank the derived features, and selecting target derived features from the derived features; and merging the target derived features with the original features to obtain the target feature dataset.

6. The building air conditioning cooling load prediction method according to claim 5, characterized in that the step of using multiple feature evaluation methods to comprehensively rank the derived features includes: scoring the derived features using random forest feature importance assessment, mutual information assessment, and correlation analysis, respectively; and, based on the scoring results of each feature evaluation method, comprehensively ranking the derived features and selecting the top-ranked derived features as the target derived features.

7. The building air conditioning cooling load prediction method according to claim 1, characterized in that, in the hybrid prediction model, the bidirectional long short-term memory network is used to concatenate the forward feature representation and the backward feature representation to obtain a bidirectional temporal feature representation that integrates past and future information; the Transformer encoder is used to sequentially perform positional encoding, layer normalization, multi-head self-attention calculation, residual connection, and feedforward neural network processing on the input bidirectional temporal feature representation to output the prediction result; wherein the bidirectional long short-term memory network includes a forward long short-term memory network and a backward long short-term memory network; the forward long short-term memory network is used to process the input sequence in ascending time order and output the forward feature representation; and the backward long short-term memory network is used to process the input sequence in reverse time order and output the backward feature representation.

8. The building air conditioning cooling load prediction method according to claim 1, characterized in that, during the training process of the hybrid prediction model, a hybrid loss function is used for optimization; the hybrid loss function is a weighted combination of mean squared error loss and mean absolute error loss; and an early stopping mechanism is set during the training process, such that training is stopped when the hybrid loss function no longer decreases within a preset number of consecutive rounds.

9. The building air conditioning cooling load prediction method according to claim 8, characterized in that the expression for the hybrid loss function is: Loss = α × MSE + β × MAE, where Loss is the value of the hybrid loss function, MSE is the mean squared error loss, MAE is the mean absolute error loss, and α and β are weighting coefficients.

10. An electronic device, characterized in that the device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the computer program, the electronic device performs the method as described in any one of claims 1-9.