A flight arrival time prediction method and system
By combining multi-interval regression isolation forest outlier detection and progressive random forest feature selection with a deep ensemble learning model, the method addresses data noise and feature-interaction modeling in flight arrival time prediction, improves model robustness and prediction accuracy, and achieves efficient flight arrival time prediction.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- GUILIN UNIV OF ELECTRONIC TECH
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing flight arrival time prediction methods suffer from problems such as sensitivity to data noise, insufficient feature engineering, and weak ability to model the interaction between time series and features when dealing with multi-source heterogeneous data, resulting in poor model generalization ability and low prediction accuracy.
We employ multi-interval regression isolation forest outlier detection, progressive random forest feature selection, and a deep ensemble learning model. The model combines Transformer, LSTM, and random forest branches and integrates them with a dynamic weighting strategy. An improved Grey Wolf Optimizer (IGWO) tunes the hyperparameters, yielding the ITLR deep ensemble learning framework.
It improves data quality and model robustness, reduces the false alarm rate of outlier detection, reduces feature redundancy, enhances the model's prediction accuracy and stability, and achieves efficient flight arrival time prediction.
Figure CN122242861A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of air traffic management, and in particular to a method and system for predicting flight arrival times.
Background Technology
[0002] Accurately predicting flight arrival times is crucial for improving airport operational efficiency and resource utilization. All airport ground services, such as parking space allocation, baggage handling, and passenger transfers, are arranged based on flight arrival times. Knowing precise arrival times in advance can significantly optimize resource allocation, reduce the cascading effects of flight delays, and ultimately improve passenger satisfaction.
[0003] Currently, methods for predicting flight arrival times mainly fall into three categories. The first category is based on flight schedules, which is simple but inaccurate and cannot handle the impact of dynamic factors such as weather and air traffic control. The second category is based on traditional machine learning models, such as Support Vector Machines (SVM) or Random Forests (RF). While these models can handle some nonlinear relationships, they struggle to fully capture the high-dimensional, complex interactions and long-term temporal dependencies in flight operation data. The third category is based on deep learning models such as Long Short-Term Memory Networks (LSTM) and Convolutional Neural Networks (CNN). Although LSTM has made breakthroughs in temporal modeling, it still suffers from information decay when dealing with extremely long sequences, and a single model often struggles to simultaneously account for local fluctuations and global patterns. Furthermore, models like CNN have limited ability to extract spatial features when processing multi-source heterogeneous data and are prone to overfitting.
[0004] Existing research generally suffers from the following bottlenecks: 1) Sensitivity to data noise: the raw operational data contain a large number of abnormal records, which, if used directly for training, seriously degrade the model's generalization ability and prediction accuracy. 2) Insufficient feature engineering: it is difficult to select the most predictive feature combinations from multi-source heterogeneous data, resulting in high model complexity and a risk of overfitting. 3) Weak modeling of temporal and feature interactions: traditional models cannot effectively integrate global feature interactions with long- and short-term temporal dependencies.
Summary of the Invention
[0005] To overcome the aforementioned bottlenecks, this invention aims to provide a method and system for predicting flight arrival times, which can accurately identify and eliminate multi-dimensional abnormal data, automatically select the optimal feature subset, improve the model's generalization ability, and propose a prediction model that can deeply integrate the advantages of multiple neural network architectures.
[0006] To achieve the above objectives, the present invention adopts the following technical solution:
[0007] A method for predicting flight arrival times includes:
[0008] Multi-interval regression isolation forest outlier detection: dynamically estimating the normal range of the original data and identifying and removing outlier data, where the original data include historical operational data of the airport;
[0009] Progressive random forest feature selection: combining feature importance ranking with progressive subset validation to identify the most predictive feature subset, thereby reducing model complexity and enhancing generalization ability; and prediction: a prediction model is used to predict flight arrival times;
[0010] The model is a deep ensemble model based on ITLR. It adopts a deep ensemble learning framework that combines the Transformer, LSTM, and Random Forest branches in parallel and integrates them through a dynamic weighting strategy;
[0011] Hyperparameter optimization based on IGWO automatically searches for model hyperparameters.
[0012] Preferably, the multi-interval regression isolation forest outlier detection includes the following steps:
[0013] a1. Anomaly preprocessing: First, missing entries in the original data matrix are filled by linear interpolation. Then, a moving median filter with a fixed window length is applied for preliminary anomaly screening and adjustment: observations whose absolute deviation from the moving median exceeds three median absolute deviations (MAD) are clipped to the corresponding upper or lower limit determined by the filter, yielding the cleaned data matrix. Finally, the cleaned data are standardized.
[0014] a2. Correlation analysis: Calculate the correlation coefficient matrix of standardized data, extract the correlation coefficient between each candidate input variable and the target variable, and select the variable with the largest absolute correlation coefficient as the key variable.
[0015] a3. Data segmentation and theoretical boundary function construction: The domain of the key variable is divided into K equally spaced intervals, and a quadratic polynomial regression model is fitted by the least squares method to characterize the overall trend relationship between the key variable and the target variable.
[0016] Within each interval, an isolation forest is applied for initial anomaly screening, and the contamination rate is adaptively adjusted based on the proportion of the current interval sample to the entire dataset.
[0017] a4. Sample selection and dynamic boundary determination: Remove the observations marked as anomalous by the isolation forest; the remaining observations constitute the normal sample set. Extract the samples whose target variable values fall in the upper and lower percentile tails of the normal sample set, and fit two curves to them to construct the theoretical dynamic upper and lower boundaries.
[0018] Furthermore, buffers are introduced around the theoretical dynamic upper and lower boundaries to construct safety boundaries. Samples lying outside these safety boundaries are marked as outliers and removed.
[0019] Preferably, the progressive random forest feature selection includes the following steps:
[0020] b1. Data partitioning: A time-based partitioning strategy is adopted to divide the dataset into a training set and a test set.
[0021] b2. Feature importance assessment: Using a random forest regression model containing S trees, feature importance scores are calculated from the out-of-bag (OOB) permutation error. For each feature, its values in the OOB samples are randomly permuted, and the average increase in mean squared error caused by the permutation is taken as the measure of that feature's importance. All features are arranged in descending order of importance.
[0022] b3. Progressive subset validation: For each subset size, select the corresponding feature subset, retrain the random forest model, and evaluate it on the test set using mean absolute error and coefficient of determination to obtain the performance curve;
[0023] b4. Optimal feature selection: The final feature subset is determined by detecting inflection points on the performance curve.
[0024] Preferably, the method for integration using a dynamic weighting strategy is as follows:
[0025] The Transformer-LSTM branch maps the original feature vector to a high-dimensional hidden space through the input representation layer, reformulates the feature vector of each flight into a feature sequence, introduces a position embedding layer to assign a unique learnable encoding vector to each feature dimension in the input sequence, forming an enhanced feature representation, and then passes the enhanced feature representation to an encoder composed of multiple multi-head self-attention layers;
[0026] The high-level feature sequence generated by the Transformer is fed into the LSTM layer, and the interaction-aware representation is aggregated into a compact vector. The output of the LSTM is then passed through a fully connected layer to generate the final scalar prediction.
[0027] Furthermore, the multi-layer multi-head self-attention encoder allows each feature to attend to all other features and dynamically assigns attention weights that reflect their relative importance.
[0028] Furthermore, a Dropout layer is inserted after each self-attention block.
[0029] Preferably, the abnormal data includes data generated due to special events and recording errors.
[0030] A system for predicting flight arrival times, wherein the system employs the aforementioned flight arrival time prediction method.
[0031] Preferably, the system includes:
[0032] The multi-interval regression isolation forest outlier detection module is used to dynamically estimate the normal range of the data and to identify and remove records that deviate significantly from the normal pattern.
[0033] The progressive random forest feature selection module is used to perform feature importance ranking and progressive subset validation, and to identify the most predictive feature subsets.
[0034] The ITLR-based deep ensemble prediction module is used to perform parallel fusion of the three branches of Transformer, LSTM and Random Forest, and integrate them through a dynamic weight strategy.
[0035] The IGWO-based hyperparameter optimization module is used to efficiently identify hyperparameter configurations for the ITLR deep ensemble model.
[0036] The beneficial effects of this invention are as follows:
[0037] 1. Improve data quality and model robustness
[0038] The MRIF anomaly detection method can accurately identify and remove abnormal data caused by special events, recording errors, and other causes. Compared with the traditional isolation forest, MRIF has a false alarm rate 73.2% lower on skewed, high-dimensional, multi-feature data, which provides cleaner and more reliable training data for subsequent modeling. Experimental results show that it removes only about 33.7% of the records that the traditional method flags as abnormal, making its results more operationally relevant.
[0039] 2. Reduce feature redundancy and overfitting risk
[0040] The PRF-FS feature selection method uses progressive validation to automatically identify the subset of features with the highest predictive power and stops adding redundant features at the performance inflection point. This not only reduces the feature dimension from the original 12 to 6, simplifying model complexity, but also effectively suppresses overfitting and improves the model's generalization ability.
[0041] 3. Enhance model prediction accuracy and stability
[0042] The Transformer-LSTM branch combines self-attention and LSTM, capturing global interactions between features (such as nonlinear coupling between different flight attributes) and modeling long- and short-term temporal dependencies, thus addressing the insufficient expressive power of traditional single models. By integrating deep neural networks with the ensemble learning model Random Forest in parallel, it combines the representational power of deep learning with the robustness of traditional models. Experiments show that the ITLR model achieves an R² of 0.9495 on the test set, with predicted residuals concentrated within a narrow range and approximately normally distributed, demonstrating high accuracy and stability.
[0043] The introduction of dynamic weighting strategy and threshold θ ensures that the ensemble model is at least no worse than the single best model, avoids the decrease in prediction accuracy caused by the fusion of low-performance branches, and guarantees the reliability of the model in actual deployment.
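The dynamic weighting rule and the exact role of θ are not spelled out in this section. The Python sketch below shows one way such a guard could work, under hypothetical assumptions: each branch is weighted by its inverse validation MAE, any branch whose MAE exceeds θ times the best branch's MAE is dropped, and the ensemble falls back to the single best branch whenever the blend is worse on the validation set. The function name `dynamic_weights` and the default `theta=1.2` are illustrative only.

```python
import numpy as np


def dynamic_weights(val_preds, y_val, theta=1.2):
    """val_preds: dict branch name -> validation predictions. Returns dict of weights.
    Hypothetical rule: inverse-MAE weights, theta-based pruning, best-branch fallback."""
    maes = {k: float(np.mean(np.abs(p - y_val))) for k, p in val_preds.items()}
    best = min(maes, key=maes.get)
    # drop low-performance branches: MAE more than theta times the best MAE
    kept = {k: 1.0 / max(m, 1e-12) for k, m in maes.items() if m <= theta * maes[best]}
    total = sum(kept.values())
    weights = {k: v / total for k, v in kept.items()}
    blend = sum(w * val_preds[k] for k, w in weights.items())
    # fall back to the single best branch if blending does not help on validation
    if float(np.mean(np.abs(blend - y_val))) > maes[best]:
        weights = {best: 1.0}
    return weights
```

Because the fallback is checked on validation data, the "no worse than the best single branch" property of this sketch holds on that validation set rather than being guaranteed on unseen flights.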
[0044] 4. Implement automatic global optimization of hyperparameters.
[0045] The IGWO algorithm is used for automated hyperparameter search, which significantly improves the efficiency and quality of hyperparameter tuning compared with manual trial and error or grid search. By introducing a dual-candidate solution generation strategy and an individual historical best memory, IGWO achieves better convergence accuracy and a stronger ability to escape local optima than the standard GWO algorithm, helping the model reach its best performance.
Attached Figure Description
[0046] Figure 1 is a diagram of the overall framework of the flight arrival time prediction method disclosed in this invention.
[0047] Figure 2 shows the core steps of multi-interval regression isolation forest outlier detection.
[0048] Figure 3 shows the core steps of progressive random forest feature selection.
[0049] Figure 4 is a diagram of the model integration architecture.
[0050] Figure 5 is a flowchart of the method in Example 2.
[0051] Figure 6 is a schematic diagram of the LSTM and regression output layer structure.
[0052] Figure 7 is a schematic diagram of the Transformer-LSTM structure.
Detailed Implementation
[0053] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings.
[0054] Example 1
[0055] This embodiment discloses a flight arrival time prediction method based on deep ensemble learning and anomaly detection. Its overall architecture is shown in Figure 1 and includes:
[0056] 1. Multi-Interval Regression Isolation Forest Outlier Detection
[0057] To address noise and outlier observations in the raw data, this invention proposes an anomaly detection method based on the Multi-Interval Regression Isolation Forest (MRIF), whose implementation process is shown in Figure 2. The method identifies and removes records that deviate significantly from the normal pattern by dynamically estimating the normal range of the data. Its core steps are as follows:
[0058] a1: Anomaly Preprocessing. First, missing entries in the original data matrix are filled using linear interpolation. Next, a moving median filter with a fixed window length is applied for initial anomaly screening and adjustment: observations whose absolute deviation from the moving median exceeds three median absolute deviations (MAD) are clipped to the corresponding upper or lower limit determined by the filter, resulting in a cleaned data matrix. Finally, the cleaned data are standardized.
[0059] a2: Correlation Analysis. Identify the input variable most strongly correlated with the target variable and use it as the key predictor for the subsequent piecewise regression. Calculate the correlation coefficient matrix of the standardized data and extract the correlation coefficient between each candidate input variable and the target variable. Select the variable with the largest absolute correlation coefficient as the key variable.
[0060] a3: Data Segmentation and Construction of Theoretical Boundary Functions. The domain of the key variable is divided into K equally spaced intervals. A quadratic polynomial regression model is fitted using the least squares method to characterize the overall trend relationship between the key variable and the target variable. Within each interval, an isolation forest is applied for preliminary anomaly screening, and the contamination rate is adaptively adjusted based on the proportion of the current interval's samples to the entire dataset to mitigate the bias that may arise from using a fixed contamination level across intervals.
[0061] a4: Sample Selection and Dynamic Boundary Determination. After all intervals have been processed, observations marked as outliers by the isolation forest are removed, and the remaining observations constitute the "normal" sample set. Samples whose target variable values fall in the upper and lower percentile tails are extracted from the normal sample set, and two curves are fitted to construct the theoretical dynamic upper and lower boundaries. To improve robustness and reduce false positives, a buffer is introduced around the theoretical boundaries to construct the final discriminative safety boundary. Samples lying outside this safety boundary are marked as outliers and removed.
[0062] 2. Progressive Random Forest Feature Selection Module
[0063] To reduce model complexity and enhance generalization ability, this invention proposes a progressive random forest feature selection (PRF-FS) algorithm, whose core steps are shown in Figure 3. The method combines feature importance ranking with progressive subset validation to identify the most predictive feature subset. The algorithm proceeds as follows:
[0064] b1: Data partitioning. A time-based partitioning strategy is adopted to divide the dataset into training and test sets, ensuring temporal consistency in the regression prediction task.
[0065] b2: Feature Importance Assessment. A random forest regression model containing S trees is used to calculate feature importance scores based on out-of-bag (OOB) permutation error. For each feature, its value in the OOB samples is randomly permuted, and the average increase in mean squared error (MSE) before and after the permutation is calculated as a measure of feature importance. All features are sorted in descending order of importance.
[0066] b3: Progressive subset validation. For each subset size, the corresponding feature subset is selected, the random forest model is retrained, and it is evaluated on the test set using mean absolute error (MAE) and the coefficient of determination (R²). The resulting performance curves show how model performance evolves as the number of features increases, revealing the marginal contribution of each newly added feature.
[0067] b4: Optimal Feature Selection. The final feature subset is determined by detecting the inflection point on the performance curve. The inflection point indicates that beyond that point, the performance gains from adding more features diminish. This criterion balances model complexity and predictive performance.
[0068] 3. Deep ensemble prediction module based on ITLR
[0069] To address the limited expressive power of a single model, this invention constructs a deep ensemble learning framework called ITLR (IGWO-Transformer-LSTM-RF). The framework combines the Transformer, LSTM, and Random Forest branches in parallel and integrates them with a dynamic weighting strategy. Its structure is shown in Figure 4.
[0070] The Transformer-LSTM branch maps the original feature vectors to a high-dimensional hidden space through the input representation layer, reformulating the feature vector of each flight as a "feature sequence." A positional embedding layer is introduced to assign a unique, learnable encoding vector to each feature dimension in the input sequence to distinguish the semantics of different features. The enhanced feature representation is then passed to an encoder consisting of multiple layers of multi-head self-attention (MHSA). MHSA allows each feature token to attend to all other feature tokens and dynamically assigns attention weights reflecting relative importance. To improve generalization and mitigate overfitting with limited training data, a Dropout layer is inserted after each self-attention block. The high-level feature sequence generated by the Transformer is fed into LSTM layers to further model sequence dependencies and aggregate the interaction-aware representations into a compact vector. The output of the LSTM generates the final scalar prediction through fully connected layers. This design replaces the feedforward sublayers in the standard Transformer with LSTM layers to obtain more robust feature representations. RF mitigates overfitting and improves generalization by integrating the predictions of multiple decision trees.
[0071] 4. Hyperparameter optimization module based on IGWO
[0072] To efficiently identify effective hyperparameter configurations for ITLR deep ensemble models, an improved Grey Wolf Optimization (IGWO) algorithm is employed for automated hyperparameter optimization. Building upon the social hierarchy mechanism of the standard Grey Wolf Optimization (GWO), IGWO introduces a dual-candidate solution generation scheme and an individual historical best memory component, improving convergence behavior and robustness.
[0073] Example 2
[0074] Based on Example 1, this example discloses an outlier detection method based on the multi-interval regression isolation forest, as follows:
[0075] The raw airport operation data contains a lot of noise and outliers, and directly inputting them into the model will reduce the prediction accuracy and generalization ability.
[0076] The specific processing in this embodiment is as follows:
[0077] 1. Data Acquisition and Preprocessing: First, historical operational data of an airport, containing 3,677 samples, were acquired, and the original data matrix was processed. Missing values are filled using linear interpolation. A moving median filter with a fixed window width is then used to identify and correct initial anomalies: values deviating from the moving median by more than three median absolute deviations (MAD) are clipped to the upper or lower bound defined by the filter. This yields the cleaned data matrix $X'$. Finally, the cleaned data are standardized (Z-score) into a standard matrix $Z$ in which each column has a mean of 0 and a standard deviation of 1:
[0078] $$z_{ij} = \frac{x'_{ij} - \mu_j}{\sigma_j}$$
[0079] where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the $j$-th column of $X'$, respectively.
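A minimal NumPy sketch of this preprocessing step follows. It assumes a centred window (truncated at the edges), a MAD scaled by 1.4826 (a common convention that makes the MAD consistent with the standard deviation; the text does not say whether the MAD is scaled), and the population standard deviation in the Z-score. The window length of 5 and the helper names are hypothetical.

```python
import numpy as np


def fill_linear(col):
    """Fill NaNs in a 1-D array by linear interpolation over the row index."""
    col = np.asarray(col, dtype=float).copy()
    idx = np.arange(col.size)
    missing = np.isnan(col)
    if missing.any() and (~missing).any():
        # np.interp holds the edge values constant for leading/trailing gaps
        col[missing] = np.interp(idx[missing], idx[~missing], col[~missing])
    return col


def clip_moving_median(col, window=5, k=3.0):
    """Clip values lying more than k scaled MADs from the centred moving median."""
    half = window // 2
    out = col.copy()
    for i in range(col.size):
        seg = col[max(0, i - half): i + half + 1]      # window truncated at the edges
        med = np.median(seg)
        mad = 1.4826 * np.median(np.abs(seg - med))    # scaled MAD (assumption)
        out[i] = np.clip(col[i], med - k * mad, med + k * mad)
    return out


def preprocess(X, window=5):
    """Return the cleaned matrix X' and its column-wise Z-scores Z."""
    X = np.asarray(X, dtype=float)
    cleaned = np.column_stack(
        [clip_moving_median(fill_linear(X[:, j]), window) for j in range(X.shape[1])]
    )
    mu = cleaned.mean(axis=0)
    sigma = cleaned.std(axis=0)  # population standard deviation
    sigma[sigma == 0] = 1.0      # guard against constant columns
    return cleaned, (cleaned - mu) / sigma
```

Note that the MAD bounds are computed from the original values in each window, so a single spike cannot widen the bounds used to judge its neighbours after it has been clipped.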
[0080] 2. Key Variable Identification: To capture the core dependencies in the data, the input variable with the strongest linear correlation with the target variable y must be found. First, the correlation coefficient matrix of the standardized data is calculated, and the correlation coefficient between each input variable and y is extracted. The variable with the largest absolute correlation coefficient is selected as the key independent variable for the piecewise regression analysis. In this embodiment, correlation analysis showed that the scheduled arrival time (STA) is the variable most strongly correlated with the actual arrival time (ATA), so STA was selected as the key variable.
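[0081] $$k = \arg\max_{j} \lvert r_{j,y} \rvert$$
where $r_{j,y}$ is the correlation coefficient between the $j$-th input variable and $y$, and the $k$-th variable is taken as the key variable.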
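The selection rule fits in a few lines of NumPy. This is a sketch: `select_key_variable` and the column names are hypothetical, and Pearson correlation is assumed because the text speaks of linear correlation.

```python
import numpy as np


def select_key_variable(Z, y, names):
    """Return (name, r) for the input column with the largest |Pearson r| against y."""
    data = np.column_stack([Z, y])
    R = np.corrcoef(data, rowvar=False)  # (M+1) x (M+1) correlation matrix
    r_xy = R[:-1, -1]                    # correlation of each input with the target
    k = int(np.argmax(np.abs(r_xy)))     # largest absolute correlation wins
    return names[k], float(r_xy[k])
```

Taking the absolute value means a strongly negatively correlated input could also be selected; only the strength of the linear relationship matters here.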
[0082] 3. Data Segmentation and Theoretical Boundary Function Construction: The range of STA values is divided into 10 equally spaced intervals. Within each interval, the Isolation Forest algorithm is run, and the outlier ratio (contamination parameter) is adaptively adjusted according to the proportion of samples in that interval to the total samples, and outlier identification is performed separately.
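The per-interval screening could look like the following sketch, which applies scikit-learn's `IsolationForest` to the (key variable, target) pairs of each interval. The text does not give the adaptive contamination formula, so the rule used here (a base rate scaled by the interval's share of samples relative to an even split, clipped to scikit-learn's allowed range) is a hypothetical stand-in; `base_rate`, `min_samples`, and the function name are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def interval_isolation_forest(x, y, K=10, base_rate=0.05, min_samples=10, seed=0):
    """Run an isolation forest inside each of K equal-width intervals of x.
    Returns a boolean mask, True where a sample is flagged as anomalous."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    edges = np.linspace(x.min(), x.max(), K + 1)
    ids = np.digitize(x, edges[1:-1])  # interval index 0..K-1 (max falls in K-1)
    flagged = np.zeros(n, dtype=bool)
    for k in range(K):
        idx = np.flatnonzero(ids == k)
        if idx.size < min_samples:
            continue  # too few samples to screen reliably; leave them unflagged
        # hypothetical adaptive rule: scale the base rate by this interval's share
        # of the data relative to an even split, within sklearn's (0, 0.5] range
        rate = float(np.clip(base_rate * K * idx.size / n, 0.001, 0.5))
        forest = IsolationForest(contamination=rate, random_state=seed)
        labels = forest.fit_predict(np.column_stack([x[idx], y[idx]]))  # -1 = anomaly
        flagged[idx[labels == -1]] = True
    return flagged
```

With equally populated intervals this rule reduces to the base rate; denser intervals receive a higher contamination and sparser ones a lower one. Whether the method described here scales in this direction is not stated.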
[0083] 4. Dynamic Boundary Construction and Removal: After all intervals have been traversed, the algorithm temporarily removes the points marked as anomalous by the isolation forest, leaving a "normal" sample set N. To further refine the boundaries, the sample points whose y-values lie in the upper tail (at or above the 95th percentile) and in the lower tail (at or below the 5th percentile) are extracted from N, and each group is used to fit a separate curve that serves as the theoretical dynamic upper or lower boundary.
[0084] Upper boundary: $y = f_{U}(x)$, fitted to the upper-tail samples.
[0085] Lower boundary: $y = f_{L}(x)$, fitted to the lower-tail samples.
[0086] To enhance the robustness of the algorithm and reduce false positives, a buffer zone is introduced around these theoretical boundaries to construct the final safety boundary for discrimination. A sample $(x_i, y_i)$ is judged to be an outlier if
[0087] $$y_i > f_{U}(x_i) + \delta_{U} \quad \text{or} \quad y_i < f_{L}(x_i) - \delta_{L}$$
[0088] where $\delta_{U}$ and $\delta_{L}$ are buffer sizes set according to the data characteristics. All sample points that satisfy either condition are marked as outliers, and their row indexes are extracted for subsequent filtering.
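The boundary fitting and buffered test might be sketched as below. Two choices are assumptions not stated in the text: the percentile tails are taken on residuals from the quadratic trend of step a3 (so that the tail samples follow the STA-ATA trend instead of clustering at the largest STA values), and each boundary curve is a quadratic polynomial. The buffer widths `buf = (delta_L, delta_U)` are free parameters, and the function name is hypothetical.

```python
import numpy as np


def buffered_boundary_outliers(x, y, normal, q=(5, 95), deg=2, buf=(0.0, 0.0)):
    """Fit lower/upper boundary curves to the residual tails of the normal set and
    flag points outside the buffered band; buf = (delta_L, delta_U)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xn, yn = x[normal], y[normal]
    trend = np.poly1d(np.polyfit(xn, yn, 2))   # quadratic trend as in step a3
    resid = yn - trend(xn)
    lo_q, hi_q = np.percentile(resid, q)
    low, up = resid <= lo_q, resid >= hi_q     # tail samples of the normal set
    f_low = np.poly1d(np.polyfit(xn[low], yn[low], deg))  # lower boundary f_L
    f_up = np.poly1d(np.polyfit(xn[up], yn[up], deg))     # upper boundary f_U
    outlier = (y > f_up(x) + buf[1]) | (y < f_low(x) - buf[0])
    return outlier, f_low, f_up
```

The test is applied to every sample, including those the isolation forest removed, so points that the forest flagged but that still lie inside the buffered band are not discarded.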
[0089] The specific steps of outlier detection are shown in Figure 5.
[0090] Example 3
[0091] Building upon Example 1, this example discloses a feature selection method based on progressive random forest. Addressing the issue that the initial feature set contains multi-dimensional feature data, leading to redundancy, model overfitting, and high computational complexity, the specific solution is as follows:
[0092] 1. Dataset partitioning: The cleaned dataset is divided into training set, validation set and test set in a ratio of 8:1:1.
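[0093] 2. Construct a Random Forest (RF) model and calculate feature importance: A random forest regression model containing 100 decision trees is trained on the training set. Using the out-of-bag (OOB) samples, the increase in mean squared error (MSE) caused by permuting each feature is calculated and used as that feature's importance score; the higher the score, the more important the feature. Let $\{h_s\}_{s=1}^{S}$ denote the ensemble of $S$ regression trees. For each feature $X_j$, the importance score $I_j$ is calculated as follows:
[0094] $$I_j = \frac{1}{S}\sum_{s=1}^{S}\left(\mathrm{MSE}_{s}^{(j)} - \mathrm{MSE}_{s}\right)$$
where $\mathrm{MSE}_{s}$ is the MSE of tree $h_s$ on its OOB samples and $\mathrm{MSE}_{s}^{(j)}$ is the MSE on the same samples after the values of $X_j$ have been randomly permuted.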
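To mirror the OOB permutation importance of this step tree by tree, the sketch below builds the forest by hand: each scikit-learn `DecisionTreeRegressor` is trained on a bootstrap sample, and its own out-of-bag rows give the MSE before and after permuting each feature. The function name is hypothetical, and by default every split considers all features (`max_features=None`); passing a fraction such as `1/3` adds the random-subspace behaviour of a full random forest.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def oob_permutation_importance(X, y, n_trees=100, max_features=None, seed=0):
    """Average over trees of (OOB MSE after permuting feature j - OOB MSE)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    total = np.zeros(m)
    used = 0
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)        # bootstrap sample indices
        oob = np.setdiff1d(np.arange(n), boot)   # rows never drawn: out-of-bag
        if oob.size == 0:
            continue
        tree = DecisionTreeRegressor(max_features=max_features,
                                     random_state=int(rng.integers(2**31 - 1)))
        tree.fit(X[boot], y[boot])
        X_oob, y_oob = X[oob], y[oob]
        base = np.mean((tree.predict(X_oob) - y_oob) ** 2)  # OOB MSE of this tree
        for j in range(m):
            X_perm = X_oob.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])    # break feature j's link to y
            total[j] += np.mean((tree.predict(X_perm) - y_oob) ** 2) - base
        used += 1
    return total / used
```

Sorting with `np.argsort(-importance)` then gives the descending feature ranking used for progressive validation.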
[0095] 3. Progressive Validation: Select the top 1, 2, ... up to N features in descending order of importance score to construct subsets. For each subset, retrain the RF model and evaluate MAE and R² on the test set.
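[0096] $$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i \rvert, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
where $n$ is the number of test samples, $y_i$ and $\hat{y}_i$ are the observed and predicted values, and $\bar{y}$ is the mean of the observed values.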
[0097] 4. Determine the optimal feature set: Plot MAE and R² as functions of the number of features. As features are added in descending order of importance, model performance first improves significantly and then levels off; the subset at the inflection point of these curves achieves the best balance between model complexity and prediction performance and is taken as the optimal feature set.
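Steps 3 and 4 can be sketched as a loop over nested subsets of the ranked features. The text does not define how the inflection point is detected, so this sketch uses a simple hypothetical rule: choose the smallest subset whose R² is within `tol` of the best R² on the curve. `evaluate` is a caller-supplied (hypothetical) function that retrains the RF on a subset and returns (MAE, R²) on the test set.

```python
import numpy as np


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def r2(y, y_hat):
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))


def progressive_selection(ranked, evaluate, tol=0.005):
    """ranked: features sorted by importance (descending).
    evaluate(subset) -> (MAE, R2) for an RF retrained on `subset`.
    Returns the chosen subset and the full performance curve."""
    curve = [evaluate(ranked[:k]) for k in range(1, len(ranked) + 1)]
    r2_curve = np.array([score for _, score in curve])
    # hypothetical inflection rule: smallest k whose R2 is within tol of the best R2
    k_best = int(np.argmax(r2_curve >= r2_curve.max() - tol)) + 1
    return ranked[:k_best], curve
```

A knee-detection method on the MAE curve would be an equally valid stand-in; the tolerance rule is used here only because it is short and easy to audit.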
[0098] Example 4
[0099] Building upon Example 1, this example discloses an instance of arrival time prediction based on a deep ensemble learning model. Addressing the issue that a single model struggles to simultaneously capture global interactions and long- and short-term temporal dependencies of features, resulting in insufficient prediction accuracy and robustness, the following approach is taken:
[0100] Transformer-LSTM branch:
[0101] 1. Input Representation and Position Encoding: Unlike conventional Transformer applications, the hybrid model of this embodiment begins with a Sequence Input Layer. The original feature vector $\mathbf{x} \in \mathbb{R}^{M}$ of each flight sample is projected by a linear transformation onto a high-dimensional hidden space to obtain a $D$-dimensional feature vector, which is then reshaped into a "feature sequence" of length $L = 1$ and dimension $D$:
[0102] $$\mathbf{H} = \left(\mathbf{W}_{e}\mathbf{x} + \mathbf{b}_{e}\right)^{\top} \in \mathbb{R}^{1 \times D}$$
[0103] where $\mathbf{W}_{e} \in \mathbb{R}^{D \times M}$ is a learnable linear transformation matrix, $\mathbf{b}_{e} \in \mathbb{R}^{D}$ is the bias term, and $D$ is the hidden dimension of the model. This step converts each discrete or numerical feature into a dense, distributed vector representation. Because the Transformer architecture itself carries no positional information, this invention introduces a Positional Embedding Layer $\mathbf{P}$ to distinguish the physical meaning of different features and preserve their structural information. This layer generates a unique, learnable encoding vector for each feature dimension of the input sequence:
[0104] $$\tilde{\mathbf{H}} = \mathbf{H} + \mathbf{P}$$
[0105] This step ensures that the model can perceive that "feature A" and "feature B" are different entities, even if their values are the same.
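A NumPy sketch of the input layer as described: the raw vector x of M features is projected by a matrix W of shape (D, M) plus a bias b, reshaped into a length-1 sequence, and offset by a positional embedding P of shape (1, D). The sizes M = 6 and D = 16, the function name, and the random initialization are illustrative assumptions; in training, W, b, and P would be learned.

```python
import numpy as np


def embed_flight(x, W, b, P):
    """x: (M,) raw features; W: (D, M); b: (D,); P: (1, D) positional embedding.
    Returns the position-enhanced feature sequence of shape (L=1, D)."""
    h = W @ x + b           # linear projection into the D-dimensional hidden space
    H = h.reshape(1, -1)    # reshape into a "feature sequence" of length L = 1
    return H + P            # add the learnable positional embedding


rng = np.random.default_rng(0)
M, D = 6, 16                              # illustrative sizes (assumption)
W = rng.normal(0.0, 0.1, size=(D, M))     # would be learned during training
b = np.zeros(D)
P = rng.normal(0.0, 0.02, size=(1, D))
H_tilde = embed_flight(rng.normal(size=M), W, b, P)
```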
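[0106] 2. Multi-head self-attention mechanism: For the position-enhanced input $\tilde{\mathbf{H}}$, the query, key, and value matrices are first generated by linear transformations:
[0107] $$\mathbf{Q} = \tilde{\mathbf{H}}\mathbf{W}^{Q}, \quad \mathbf{K} = \tilde{\mathbf{H}}\mathbf{W}^{K}, \quad \mathbf{V} = \tilde{\mathbf{H}}\mathbf{W}^{V}$$
where $\mathbf{W}^{Q}$, $\mathbf{W}^{K}$, and $\mathbf{W}^{V}$ are learnable projection matrices.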
[0108] Next, the enhanced feature representation is fed into an encoder module containing multiple layers of multi-head self-attention (MHSA). To further enhance the model's generalization ability and prevent overfitting when training data are limited, this invention adds a dropout layer after each self-attention layer. The dropout rate is adjusted according to the model's performance on the data so that it provides the necessary regularization without significantly impairing the model's representational ability. This improves, to a certain extent, the generalization of the Transformer-LSTM model in complex and changing real-world aviation operating environments.
[0109] The MHSA mechanism allows each feature to interact with all other features and dynamically calculates its importance weights. Its model is as follows:
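[0110] $$\mathrm{Attention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d_k}}\right)\mathbf{V}$$
[0111] $$\mathrm{MHSA}(\tilde{\mathbf{H}}) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,\mathbf{W}^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(\mathbf{Q}_i, \mathbf{K}_i, \mathbf{V}_i)$$
[0112] where $\tilde{\mathbf{H}}$ is the position-enhanced input; $\mathbf{Q}_i$, $\mathbf{K}_i$, and $\mathbf{V}_i$ are the query, key, and value matrices of the $i$-th head, all obtained from the input by linear transformations; $d_k$ is the key dimension of each head; $h$ is the number of heads; and $\mathbf{W}^{O}$ is the output projection matrix. Furthermore, residual connections and layer normalization are introduced to ensure gradient flow and training stability:
[0113] $$\mathbf{H}_{\mathrm{enc}} = \mathrm{LayerNorm}\!\left(\tilde{\mathbf{H}} + \mathrm{Dropout}\big(\mathrm{MHSA}(\tilde{\mathbf{H}})\big)\right)$$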
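The following NumPy sketch implements scaled dot-product multi-head self-attention followed by a residual connection and layer normalization for an input of shape (L, D). Dropout is omitted (as at inference time), layer normalization has no learnable gain or bias, and the weights are random placeholders; none of these details come from the text, and the function names are hypothetical.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(H, eps=1e-5):
    mu = H.mean(axis=-1, keepdims=True)
    var = H.var(axis=-1, keepdims=True)
    return (H - mu) / np.sqrt(var + eps)     # no learnable gain/bias in this sketch


def mhsa_block(H, Wq, Wk, Wv, Wo, n_heads):
    """H: (L, D). Returns LayerNorm(H + MHSA(H)) and the per-head attention maps."""
    _, D = H.shape
    d_k = D // n_heads
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    heads, maps = [], []
    for i in range(n_heads):
        s = slice(i * d_k, (i + 1) * d_k)                # columns owned by head i
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(d_k))  # (L, L) attention weights
        maps.append(A)
        heads.append(A @ V[:, s])
    out = np.concatenate(heads, axis=-1) @ Wo            # Concat(head_1..head_h) W^O
    return layer_norm(H + out), maps                     # residual + layer norm
```

Each row of an attention map sums to 1, so row i can be read as how strongly token i draws on every other token when building its updated representation.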
[0114] 3. LSTM and regression output layer: Figure 6 shows the specific implementation of the LSTM layer and regression output layer in the improved encoder module proposed in this invention. The high-level feature sequence produced by the Transformer module is fed into an LSTM layer, which acts as a nonlinear transformation that fuses the interaction-rich feature sequence into a highly condensed representation vector. This vector aggregates all feature interaction information for the final prediction. The output of the LSTM layer is mapped to the final scalar prediction by a fully connected (FC) layer, which predicts the flight arrival time as follows:
[0115] $$\hat{y} = \mathbf{w}_{o}^{\top}\mathbf{h}_{\mathrm{LSTM}} + b_{o}$$
where $\mathbf{h}_{\mathrm{LSTM}}$ is the output vector of the LSTM layer, and $\mathbf{w}_{o}$ and $b_{o}$ are the weight vector and bias of the FC layer.
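A minimal NumPy forward pass of a single-layer LSTM whose final hidden state is mapped to a scalar by a fully connected layer is sketched below. The gate ordering (input, forget, candidate, output), the weight shapes, and the function name are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_regress(H, Wx, Wh, b, w_out, b_out):
    """Single-layer LSTM over H (T, D), then an FC layer on the final hidden state.
    Wx: (D, 4U), Wh: (U, 4U), b: (4U,); gate blocks ordered i, f, g, o (assumption)."""
    U = Wh.shape[0]
    h = np.zeros(U)
    c = np.zeros(U)
    for x_t in H:                      # iterate over sequence positions
        z = x_t @ Wx + h @ Wh + b
        i = sigmoid(z[:U])             # input gate
        f = sigmoid(z[U:2 * U])        # forget gate
        g = np.tanh(z[2 * U:3 * U])    # candidate cell state
        o = sigmoid(z[3 * U:])         # output gate
        c = f * c + i * g              # update the cell state
        h = o * np.tanh(c)             # new hidden state
    y_hat = float(h @ w_out + b_out)   # fully connected regression output
    return y_hat, h
```

Since the hidden state is a gated `tanh`, every entry of `h` stays strictly inside (-1, 1), which keeps the input to the FC layer on a bounded scale.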
[0116] The overall structure of the Transformer-LSTM branch is shown in Figure 7.
[0117] RF branch:
[0118] Random Forest (RF) combines the ideas of Bagging and Random Subspace. By constructing and ensembling multiple decision trees, it effectively suppresses overfitting. Furthermore, RF can provide estimates of prediction uncertainty. By examining the variance of the distribution of predictions from all decision trees, one can intuitively determine the model's confidence level in a particular prediction. This uncertainty information is extremely valuable for applications in the risk-sensitive field of air transport.
[0119] During training, the algorithm generates multiple training subsets by Bootstrap sampling and trains one decision tree independently on each subset. At each node split, instead of searching for the best feature among all M features, the tree searches within a randomly chosen subset of features, which increases the diversity of the base learners. For regression tasks, the final RF prediction $\hat{y}_{\mathrm{RF}}$ is the simple average of the predictions of all decision trees in the forest:
[0120] $$\hat{y}_{\mathrm{RF}}(\mathbf{x}) = \frac{1}{S}\sum_{s=1}^{S} h_{s}(\mathbf{x})$$
where $S$ is the number of trees and $h_s(\mathbf{x})$ is the prediction of the $s$-th tree.
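With scikit-learn, the averaging rule and the spread-based uncertainty described in [0118] can both be read off the fitted trees in `estimators_`. The helper name below is hypothetical; the spread is the standard deviation of the individual tree predictions for each sample.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_predict_with_spread(forest, X):
    """Average the individual tree predictions (the RF output) and report their spread."""
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])  # (S, n)
    return per_tree.mean(axis=0), per_tree.std(axis=0)
```

A large spread for a given flight indicates that the trees disagree, i.e. lower confidence in that particular prediction.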
[0121] Example 5
[0122] Building on Example 1, this example discloses automatic hyperparameter optimization based on IGWO. It addresses three problems: the high training complexity of deep ensemble learning models, the time-consuming search for parameter combinations with traditional methods, and the tendency of the optimization algorithm to become trapped in local optima. The solution is as follows:
[0123] To efficiently determine the optimal hyperparameter combination for the Transformer-LSTM-RF deep ensemble model, this embodiment employs an improved Grey Wolf Optimization (IGWO) algorithm for automated hyperparameter optimization. While retaining the social hierarchy of the Grey Wolf Optimization (GWO) algorithm, IGWO introduces a dual candidate solution generation mechanism and an individual historical-best memory, which improve the algorithm's convergence accuracy and robustness. The IGWO algorithm consists of the following four steps:
[0124] 1. Initialization and leadership update: In the IGWO algorithm, the grey wolf population is divided into four levels: α (the alpha wolf, representing the current best solution), β (the second-best solution), δ (the third-best solution), and ω (ordinary wolves, representing the remaining candidate solutions). The optimization process updates the population's positions by simulating the pack's behavior of encircling, chasing, and attacking prey. Its core mathematical description is as follows:
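[0125] a) The wolf pack's encirclement of its prey can be modeled by the following formulas:
[0126] $$\mathbf{D} = \left|\mathbf{C}\odot\mathbf{X}_p(t) - \mathbf{X}(t)\right|, \qquad \mathbf{X}(t+1) = \mathbf{X}_p(t) - \mathbf{A}\odot\mathbf{D}$$
[0127] where t is the current iteration number, $\mathbf{X}_p(t)$ is the position vector of the prey, $\mathbf{X}(t)$ is the position vector of an individual grey wolf, $\odot$ denotes element-wise multiplication, and $\mathbf{A}$ and $\mathbf{C}$ are coefficient vectors calculated as follows: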
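[0128] $$\mathbf{A} = 2a\,\mathbf{r}_1 - a, \qquad \mathbf{C} = 2\,\mathbf{r}_2$$
[0129] where $\mathbf{r}_1$ and $\mathbf{r}_2$ are random vectors with components in [0,1]. The convergence factor a decreases linearly from 2 to 0 over the iterations, i.e. $a = 2 - 2t/T_{\max}$, where $T_{\max}$ is the maximum number of iterations.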
[0130] b) During the hunt, the α, β, and δ wolves guide the ω wolf to update its position, mathematically expressed as:
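[0131] $$\mathbf{D}_\alpha = \left|\mathbf{C}_1\odot\mathbf{X}_\alpha - \mathbf{X}(t)\right|,\quad \mathbf{D}_\beta = \left|\mathbf{C}_2\odot\mathbf{X}_\beta - \mathbf{X}(t)\right|,\quad \mathbf{D}_\delta = \left|\mathbf{C}_3\odot\mathbf{X}_\delta - \mathbf{X}(t)\right|$$
$$\mathbf{X}_1 = \mathbf{X}_\alpha - \mathbf{A}_1\odot\mathbf{D}_\alpha,\quad \mathbf{X}_2 = \mathbf{X}_\beta - \mathbf{A}_2\odot\mathbf{D}_\beta,\quad \mathbf{X}_3 = \mathbf{X}_\delta - \mathbf{A}_3\odot\mathbf{D}_\delta$$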
[0132] This mechanism lets each candidate solution explore the search space efficiently, guided by the combined information of the three best solutions in the population. The core innovation of IGWO is that, in every iteration, it generates two candidate solutions in parallel for each individual.
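A minimal NumPy sketch of one standard GWO update, following the encircling and hunting description above, is shown below. Every wolf is pulled toward three leader-guided points and then moved to their mean. The function name `gwo_step` and the array layout (one row per wolf) are assumptions made for illustration.

```python
import numpy as np


def gwo_step(positions, alpha, beta, delta, t, t_max, rng):
    """One standard GWO position update for every wolf.

    positions: (n_wolves, dim) array; alpha, beta, delta: (dim,) leader positions.
    """
    a = 2.0 - 2.0 * t / t_max                  # convergence factor: 2 -> 0 linearly
    new_positions = np.empty_like(positions)
    for i, x in enumerate(positions):
        guided = []
        for leader in (alpha, beta, delta):
            A = 2.0 * a * rng.random(x.shape) - a   # coefficient vector A
            C = 2.0 * rng.random(x.shape)           # coefficient vector C
            D = np.abs(C * leader - x)              # distance to this leader
            guided.append(leader - A * D)           # X1, X2 or X3
        new_positions[i] = np.mean(guided, axis=0)  # (X1 + X2 + X3) / 3
    return new_positions
```

At the final iteration a reaches 0, so A is zero and every wolf lands exactly on the mean of the three leaders. This is why the early iterations explore widely and the late iterations exploit.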
[0133] 2. Dual candidate solution generation strategy:
[0134] a) GWO candidate solution: generated by the social-hierarchy hunting rules of standard GWO and expressed as:
[0135] $$\mathbf{X}_{i\text{-GWO}}(t+1) = \frac{\mathbf{X}_1 + \mathbf{X}_2 + \mathbf{X}_3}{3}$$
[0136] b) DLH local search candidate solution: To enhance local exploitation, the algorithm introduces a dynamic-neighborhood strategy. For each individual $\mathbf{X}_i(t)$, the Euclidean distance between the individual and its GWO candidate solution is first calculated and used as the neighborhood radius. An individual $\mathbf{X}_n(t)$ is then randomly selected from this neighborhood, and a candidate solution is generated according to the following formula:
[0137] $$\mathbf{X}_{i\text{-DLH}}(t+1) = \mathbf{X}_i(t) + \mathbf{r}\odot\left(\mathbf{X}_n(t) - \mathbf{X}_r(t)\right)$$
[0138] where $\mathbf{r}$ is a random vector with components in [0,1], $\odot$ denotes element-wise multiplication, and $\mathbf{X}_r(t)$ is a randomly chosen individual from the population, which introduces a perturbation that helps avoid local optima.
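[0139] 3. Greedy selection and historical best update: After the two candidate solutions are generated, the algorithm greedily keeps the better one, as shown in the following formula.
[0140] $$\mathbf{X}_i(t+1) = \begin{cases} \mathbf{X}_{i\text{-GWO}}(t+1), & \text{if } f\left(\mathbf{X}_{i\text{-GWO}}(t+1)\right) < f\left(\mathbf{X}_{i\text{-DLH}}(t+1)\right) \\ \mathbf{X}_{i\text{-DLH}}(t+1), & \text{otherwise} \end{cases}$$
[0141] Next, the individual's historical best is updated. If the new solution is fitter than the historical best, i.e. if $f(\mathbf{X}_i(t+1)) < f(\mathbf{X}_i^{\text{best}})$, then $\mathbf{X}_i^{\text{best}} = \mathbf{X}_i(t+1)$ and $f(\mathbf{X}_i^{\text{best}}) = f(\mathbf{X}_i(t+1))$.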
[0142] 4. Unified evolution of the population: Finally, the positions of the entire population are updated to the individuals' historical best positions, i.e. $\mathbf{X}_i(t+1) = \mathbf{X}_i^{\text{best}}$ for every individual i. This mechanism preserves good discoveries and guides the population as a whole to evolve robustly toward the global optimum. This invention formulates hyperparameter optimization of the Transformer-LSTM model as a minimization problem, in which the objective function F is the model's prediction error on the validation set and the position vector $\mathbf{X}$ represents a set of hyperparameters:
[0143]
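Steps 1 to 4 can be combined into a compact IGWO loop. The sketch below minimizes a toy sphere function rather than a real validation error. Several details are assumptions and are not stated in the source: positions are clipped to the search bounds, α, β and δ are taken to be the three fittest wolves, and the DLH neighborhood is taken to be all wolves within the radius (a set that always includes the wolf itself). The function name `igwo` is hypothetical.

```python
import numpy as np


def igwo(f, lb, ub, n_wolves=20, t_max=100, seed=0):
    """Minimise f over the box [lb, ub] with a GWO + DLH dual-candidate scheme."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    X = lb + rng.random((n_wolves, dim)) * (ub - lb)   # initial population
    fit = np.array([f(x) for x in X])
    best_pos, best_fit = X.copy(), fit.copy()           # historical bests

    for t in range(t_max):
        leaders = X[np.argsort(fit)[:3]]                # alpha, beta, delta
        a = 2.0 - 2.0 * t / t_max                       # 2 -> 0 linearly
        for i in range(n_wolves):
            # Step 2a: GWO candidate, the mean of three leader-guided points
            guided = []
            for leader in leaders:
                A = 2.0 * a * rng.random(dim) - a
                C = 2.0 * rng.random(dim)
                guided.append(leader - A * np.abs(C * leader - X[i]))
            x_gwo = np.clip(np.mean(guided, axis=0), lb, ub)

            # Step 2b: DLH candidate, neighbourhood radius = distance to x_gwo
            radius = np.linalg.norm(X[i] - x_gwo)
            dist = np.linalg.norm(X - X[i], axis=1)
            neighbours = np.flatnonzero(dist <= radius)   # includes wolf i
            x_n = X[rng.choice(neighbours)]
            x_r = X[rng.integers(n_wolves)]
            x_dlh = np.clip(X[i] + rng.random(dim) * (x_n - x_r), lb, ub)

            # Step 3: greedy choice between candidates, then historical best
            f_gwo, f_dlh = f(x_gwo), f(x_dlh)
            x_new, f_new = (x_gwo, f_gwo) if f_gwo < f_dlh else (x_dlh, f_dlh)
            if f_new < best_fit[i]:
                best_pos[i], best_fit[i] = x_new, f_new

        # Step 4: the whole population moves to the historical best positions
        X, fit = best_pos.copy(), best_fit.copy()

    k = int(np.argmin(best_fit))
    return best_pos[k], best_fit[k]


def sphere(x):
    return float(np.sum(x ** 2))


x_best, f_best = igwo(sphere, lb=[-5.0] * 5, ub=[5.0] * 5)
```

Because every wolf only ever moves to a position at least as good as its historical best, the best fitness never gets worse from one iteration to the next.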
[0144] For the RF model, the optimization problem solved by the algorithm can be expressed with the following constraints:
[0145]
[0146] where the objective function is the model's mean absolute error (MAE) on the validation set, $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$, n is the number of samples in the validation set, and $y_i$ and $\hat{y}_i$ are the true value and the model's predicted value of the i-th sample, respectively.
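The RF fitness might be wrapped as follows. A continuous wolf position is rounded and clipped to integer hyperparameters before an RF is trained and scored by validation MAE. The three hyperparameters and their bounds (`LB`, `UB`) are hypothetical, because the constraint set in [0145] is not reproduced here. The names `decode` and `rf_fitness` are also illustrative. Any fitness function of this shape can be passed to a minimizer such as IGWO.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical search space: n_estimators, max_depth, min_samples_leaf
LB = np.array([50, 3, 1])
UB = np.array([300, 20, 10])


def decode(position):
    """Round a continuous wolf position and clip it to the integer bounds."""
    p = np.clip(np.rint(position), LB, UB).astype(int)
    return {"n_estimators": int(p[0]), "max_depth": int(p[1]),
            "min_samples_leaf": int(p[2])}


def rf_fitness(position, X_tr, y_tr, X_val, y_val):
    """F(X): validation MAE = mean(|y_i - y_hat_i|) of an RF built from X."""
    model = RandomForestRegressor(random_state=0, **decode(position))
    model.fit(X_tr, y_tr)
    return mean_absolute_error(y_val, model.predict(X_val))
```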
[0147] Example 6
[0148] This embodiment discloses a system for predicting flight arrival times. The system employs the aforementioned flight arrival time prediction method and specifically includes:
[0149] The multi-interval regression isolation forest outlier detection module is used to dynamically estimate the normal range of the data and to identify and remove records that deviate significantly from the normal pattern.
[0150] The progressive random forest feature selection module is used to perform feature importance ranking and progressive subset validation, and to identify the most predictive feature subsets.
[0151] The ITLR-based deep ensemble prediction module is used to perform parallel fusion of the three branches of Transformer, LSTM and Random Forest, and integrate them through a dynamic weight strategy.
[0152] The IGWO-based hyperparameter optimization module is used to efficiently identify the hyperparameter configuration of the ITLR deep ensemble model.
[0153] Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, and all such changes and modifications shall fall within the protection scope of the appended claims.
Claims
1. A method for predicting flight arrival time, characterized in that it comprises: multi-interval regression isolation forest outlier detection: dynamically estimating the normal range of the original data and identifying and removing abnormal data, wherein the original data include historical operational data of the airport; progressive random forest feature selection, which combines feature importance ranking with progressive subset validation to identify the most predictive feature subset, thereby reducing model complexity and enhancing generalization ability; predicting flight arrival times with a model, wherein the model is an ITLR-based deep ensemble model that adopts a deep ensemble learning framework in which the three branches of Transformer, LSTM and Random Forest are integrated in parallel and combined through a dynamic weight strategy; and IGWO-based hyperparameter optimization, which automatically searches for the model hyperparameters.
2. The flight arrival time prediction method according to claim 1, characterized in that the multi-interval regression isolation forest outlier detection comprises the following steps: a1. Anomaly preprocessing: missing entries in the original data matrix are first filled by linear interpolation; a moving median filter with a fixed window length is then applied for preliminary anomaly screening and adjustment, in which observations whose absolute deviation from the moving median exceeds three times the local median absolute deviation are clipped to the corresponding upper/lower limit determined by the filter, yielding the cleaned data matrix; the cleaned data are then standardized. a2. Correlation analysis: the correlation coefficient matrix of the standardized data is calculated, the correlation coefficient between each candidate input variable and the target variable is extracted, and the variable with the largest absolute correlation coefficient is selected as the key variable. a3. Data segmentation and theoretical boundary function construction: the domain of the key variable is divided into K equally spaced intervals, and a quadratic polynomial regression model is fitted by least squares to characterize the overall trend relationship between the key variable and the target variable; within each interval, an isolation forest is applied for initial anomaly screening, with its contamination rate adaptively adjusted according to the proportion of the interval's samples in the entire dataset. a4. Sample selection and dynamic boundary determination: the observations marked as anomalous by the isolation forest are removed and the remaining observations constitute the normal sample set; samples whose target variable values fall in the upper and lower percentiles are extracted from the normal sample set, and two curves are fitted to them to construct the theoretical dynamic upper and lower boundaries.
3. The flight arrival time prediction method according to claim 2, characterized in that step a4 further comprises: introducing buffer zones around the theoretical dynamic upper and lower boundaries to construct safety boundaries, and marking samples that meet specified conditions as outliers and removing them.
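Steps a1 to a4 of claims 2 and 3 can be sketched with NumPy and scikit-learn's `IsolationForest`, which supports a `contamination` parameter and `fit_predict`. The sketch depends on several assumptions that the claims leave open, and each of the following is hypothetical:
- rows are in time order, a 7-sample window is used, and the MAD is scaled by 1.4826;
- each interval's isolation forest sees the key variable and the residual from the quadratic trend;
- contamination is `base_contam * K * share`, clipped to [0.01, 0.25];
- the percentiles in a4 are taken on the detrended target, and the boundary curves are quadratics;
- the buffer is 10% of the band width, and a sample is an outlier when it falls outside the buffered band.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def moving_median_clip(x, window=7, n_mad=3.0):
    """a1: clip values more than n_mad scaled MADs from the moving median."""
    half = window // 2
    out = x.copy()
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        med = np.median(seg)
        mad = 1.4826 * np.median(np.abs(seg - med))   # scaled MAD (assumed)
        out[i] = np.clip(x[i], med - n_mad * mad, med + n_mad * mad)
    return out


def detect_outliers(data, target_col, K=5, base_contam=0.05, q=5.0,
                    buffer_frac=0.1, seed=0):
    """Return a boolean mask of rows kept as normal (True = keep)."""
    data = np.array(data, dtype=float)
    n, p = data.shape
    idx = np.arange(n)
    for j in range(p):                                  # a1: interpolate + clip
        miss = np.isnan(data[:, j])
        if miss.any():
            data[miss, j] = np.interp(idx[miss], idx[~miss], data[~miss, j])
        data[:, j] = moving_median_clip(data[:, j])
    z = (data - data.mean(axis=0)) / data.std(axis=0)   # standardise

    corr = np.corrcoef(z, rowvar=False)[target_col].copy()  # a2
    corr[target_col] = 0.0
    xk, y = z[:, int(np.argmax(np.abs(corr)))], z[:, target_col]

    trend = np.polyfit(xk, y, 2)                        # a3: quadratic trend
    resid = y - np.polyval(trend, xk)
    edges = np.linspace(xk.min(), xk.max(), K + 1)
    bins = np.digitize(xk, edges[1:-1])                 # interval index 0..K-1
    keep = np.ones(n, dtype=bool)
    for k in range(K):
        m = bins == k
        if m.sum() < 10:
            continue
        # Hypothetical adaptive rule: scale contamination by interval share
        contam = float(np.clip(base_contam * K * m.mean(), 0.01, 0.25))
        iso = IsolationForest(contamination=contam, random_state=seed)
        flags = iso.fit_predict(np.column_stack([xk[m], resid[m]]))
        keep[np.flatnonzero(m)[flags == -1]] = False

    # a4: boundary curves fitted to the extreme (detrended) normal samples
    xn, yn, rn = xk[keep], y[keep], resid[keep]
    hi = rn >= np.percentile(rn, 100.0 - q)
    lo = rn <= np.percentile(rn, q)
    upper = np.polyval(np.polyfit(xn[hi], yn[hi], 2), xk)
    lower = np.polyval(np.polyfit(xn[lo], yn[lo], 2), xk)

    # Claim 3 (hypothetical condition): outside the buffered safety band
    margin = buffer_frac * np.abs(upper - lower)
    keep &= (y <= upper + margin) & (y >= lower - margin)
    return keep
```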
4. The flight arrival time prediction method according to claim 1, characterized in that the progressive random forest feature selection comprises the following steps: b1. Data partitioning: a time-based partitioning strategy is adopted to divide the dataset into a training set and a test set. b2. Feature importance assessment: a random forest regression model containing S trees is used to calculate feature importance scores based on the out-of-bag permutation error; for each feature, its values in the out-of-bag samples are randomly permuted, the average increase in mean squared error caused by the permutation is taken as the feature's importance, and all features are ranked in descending order of importance. b3. Progressive subset validation: for each subset size, the corresponding top-ranked feature subset is selected, the random forest model is retrained and then evaluated on the test set using the mean absolute error and the coefficient of determination, yielding a performance curve. b4. Optimal feature selection: the final feature subset is determined by detecting the inflection point of the performance curve.
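Steps b1 to b4 can be sketched as follows. The OOB permutation importance in b2 is implemented directly from bootstrapped `DecisionTreeRegressor` trees in Breiman's style, so that the out-of-bag rows of each tree are known explicitly. The time-based split in b1 assumes the rows are already sorted chronologically. The inflection-point detection in b4 is replaced by a simple hypothetical stand-in: choose the smallest k whose MAE lies within `tol` of the best MAE. The function names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.tree import DecisionTreeRegressor


def oob_permutation_importance(X, y, S=100, seed=0):
    """b2: mean increase in OOB MSE after permuting each feature, over S trees."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    imp, used = np.zeros(p), 0
    for s in range(S):
        boot = rng.integers(0, n, size=n)                 # bootstrap sample
        oob = np.setdiff1d(np.arange(n), boot)            # out-of-bag rows
        if oob.size == 0:
            continue
        tree = DecisionTreeRegressor(max_features="sqrt", random_state=s)
        tree.fit(X[boot], y[boot])
        base_mse = np.mean((y[oob] - tree.predict(X[oob])) ** 2)
        for j in range(p):
            X_perm = X[oob].copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            imp[j] += np.mean((y[oob] - tree.predict(X_perm)) ** 2) - base_mse
        used += 1
    return imp / used


def progressive_selection(X, y, train_frac=0.8, S=100, tol=0.01, seed=0):
    """b1-b4: return the selected feature indices and the (MAE, R2) curve."""
    split = int(train_frac * len(X))        # b1: rows assumed sorted by time
    X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]
    order = np.argsort(-oob_permutation_importance(X_tr, y_tr, S, seed))
    curve = []
    for k in range(1, X.shape[1] + 1):      # b3: top-k feature subsets
        cols = order[:k]
        rf = RandomForestRegressor(n_estimators=S, random_state=seed)
        pred = rf.fit(X_tr[:, cols], y_tr).predict(X_te[:, cols])
        curve.append((mean_absolute_error(y_te, pred), r2_score(y_te, pred)))
    maes = np.array([mae for mae, _ in curve])
    # b4 stand-in (hypothetical): smallest k whose MAE is within tol of the best
    k_best = int(np.flatnonzero(maes <= maes.min() * (1.0 + tol))[0]) + 1
    return order[:k_best], curve
```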
5. The flight arrival time prediction method according to claim 1, characterized in that the integration using the dynamic weighting strategy is performed as follows: the Transformer-LSTM branch maps the original feature vector to a high-dimensional hidden space through an input representation layer, reformulating the feature vector of each flight into a feature sequence; a position embedding layer assigns a unique learnable encoding vector to each feature dimension of the input sequence to form an enhanced feature representation, which is passed to an encoder composed of multiple multi-head self-attention layers; the high-level feature sequence generated by the Transformer is fed into the LSTM layer, which aggregates the interaction-aware representation into a compact vector; and the output of the LSTM is passed through a fully connected layer to generate the final scalar prediction.
6. The flight arrival time prediction method according to claim 5, characterized in that the encoder composed of multiple multi-head self-attention layers allows each feature to attend to all other features and dynamically assigns attention weights that reflect their relative importance.
7. The flight arrival time prediction method according to claim 5, characterized in that a Dropout layer is inserted after each self-attention block.
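The feature tokenisation, learnable position embedding, multi-head self-attention and Dropout described in claims 5 to 7 can be illustrated with a NumPy forward pass. Several choices here are assumptions and not part of the claims:
- each scalar feature is projected by its own weight row;
- the residual connection around the attention block is added;
- layer normalisation and the feed-forward sublayer are omitted;
- all weights are random rather than learned.

The function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def feature_tokens(x, W_in, pos_emb):
    """Turn one flight's F scalar features into F tokens of width d and add a
    learnable position embedding per feature (W_in and pos_emb are (F, d))."""
    return x[:, None] * W_in + pos_emb


def multi_head_self_attention(T, Wq, Wk, Wv, Wo, n_heads):
    """Every feature token attends to all tokens; returns output and weights."""
    d = T.shape[1]
    dh = d // n_heads                          # d assumed divisible by n_heads
    Q, K, V = T @ Wq, T @ Wk, T @ Wv
    outs, weights = [], []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        A = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dh))   # (F, F), rows sum to 1
        outs.append(A @ V[:, s])
        weights.append(A)
    return np.concatenate(outs, axis=1) @ Wo, np.stack(weights)


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero units with probability p and rescale the rest."""
    if not training or p == 0.0:
        return x
    return x * (rng.random(x.shape) >= p) / (1.0 - p)


def encoder_block(T, attn_params, n_heads, p_drop, rng, training=True):
    out, _ = multi_head_self_attention(T, *attn_params, n_heads)
    return T + dropout(out, p_drop, rng, training)   # residual is an assumption
```

Each row of the returned attention weights sums to 1. This is the per-feature importance distribution that claim 6 describes.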
8. The flight arrival time prediction method according to claim 1, characterized in that the abnormal data include data generated by special events and by recording errors.
9. A system for predicting flight arrival times, characterized in that the system employs the flight arrival time prediction method according to any one of claims 1-8.
10. The system according to claim 9, characterized in that it comprises: a multi-interval regression isolation forest outlier detection module, used to dynamically estimate the normal range of the data and to identify and remove records that deviate significantly from the normal pattern; a progressive random forest feature selection module, used to perform feature importance ranking and progressive subset validation and to identify the most predictive feature subset; an ITLR-based deep ensemble prediction module, used to fuse the three branches of Transformer, LSTM and Random Forest in parallel and to integrate them through a dynamic weight strategy; and an IGWO-based hyperparameter optimization module, used to efficiently identify the hyperparameter configuration of the ITLR deep ensemble model.