Method and device for predicting the fluoride salt feed amount in electrolytic aluminum production based on a regression model
A regression model-based method for predicting the fluoride salt feed amount in electrolytic aluminum production. By combining multi-source data preprocessing with a hybrid prediction model system, the method addresses the subjectivity of the traditional experience-based manual mode and supports standardized, precise control of electrolytic aluminum production, thereby improving prediction accuracy and stability.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- YUNNAN ALUMINUM
- Filing Date
- 2026-03-24
- Publication Date
- 2026-06-19
Figure CN122241645A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of industrial production technology, and in particular to a method and apparatus for predicting the fluoride salt feed amount in electrolytic aluminum production based on a regression model.
Background Technology
[0002] With the deepening of industrial intelligent transformation, the electrolytic aluminum industry is actively exploring ways to improve the precision control and stability of the production process through data-driven and artificial intelligence technologies. The amount of fluoride salts (such as aluminum fluoride) fed is a key control parameter in electrolytic aluminum production, and its precise control is crucial for stabilizing the operating conditions of the electrolytic cell, reducing material consumption, and improving current efficiency.
[0003] In related technologies, production planning and material control in electrolytic aluminum enterprises mainly rely on experience-based decision-making. Key process steps in electrolytic cells, such as alumina concentration adjustment and aluminum liquid height control, still depend heavily on the experience and judgment of operators, who must intervene frequently to adjust cell conditions. However, the applicant recognizes that the traditional experience-based mode is inherently subjective and inconsistent, which makes standardized and precise optimal control difficult and compromises production stability.
Summary of the Invention
[0004] In view of this, this application provides a method and apparatus for predicting the amount of fluoride salts fed into electrolytic aluminum based on a regression model. The main purpose is to solve the problems of subjectivity and inconsistency inherent in traditional manual experience mode, which makes it difficult to achieve standardized and refined optimal control and ensure production stability.
[0005] According to the first aspect of this application, a method for predicting the feed amount of electrolytic aluminum fluoride salts based on a regression model is provided, the method comprising:
acquiring multi-source raw time-series data from the electrolytic aluminum production process, and preprocessing the multi-source raw time-series data to obtain a basic feature dataset, wherein the preprocessing includes time series alignment, data unit standardization, and missing value handling;
constructing multiple types of derived features based on the basic feature dataset, and integrating and filtering the derived features to obtain a high-dimensional feature set, wherein the derived features include lag features, rolling statistical features, difference features, periodic features, and interaction features;
training multiple gradient boosting decision tree models and deep time series prediction models using the high-dimensional feature set to form a hybrid prediction model system;
inputting the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results, and weighting and fusing the multiple preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the fluoride salt feeding amount for the day to be predicted.
[0006] According to a second aspect of this application, a device for predicting the amount of electrolytic aluminum fluoride salt feed based on a regression model is provided, the device comprising:
a preprocessing module, used to acquire multi-source raw time-series data from the electrolytic aluminum production process and to preprocess the multi-source raw time-series data to obtain a basic feature dataset, wherein the preprocessing includes time series alignment, data unit standardization, and missing value handling;
a feature construction module, used to construct multiple types of derived features based on the basic feature dataset and to integrate and filter the derived features to obtain a high-dimensional feature set, wherein the derived features include lag features, rolling statistical features, difference features, periodic features, and interaction features;
a model training module, used to train multiple gradient boosting decision tree models and deep time series prediction models using the high-dimensional feature set, forming a hybrid prediction model system;
a prediction module, used to input the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results, and to weight and fuse the multiple preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the fluoride salt feeding amount for the day to be predicted.
[0007] By employing the above technical solutions, the technical solutions provided in the embodiments of this application have at least the following advantages: This application provides a method and apparatus for predicting the amount of fluoride salt fed into electrolytic aluminum based on a regression model. The method involves acquiring multi-source raw time-series data from the electrolytic aluminum production process, preprocessing the data to obtain a basic feature dataset (including time series alignment, data unit standardization, and missing value handling), constructing multiple derived features based on the basic feature dataset, integrating and filtering these features to obtain a high-dimensional feature set (including lag features, rolling statistical features, difference features, periodic features, and interaction features), training multiple gradient boosting decision tree models and deep time-series prediction models using the high-dimensional feature set to form a hybrid prediction model system, inputting the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results, and weighting and fusing these preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the fluoride salt feeding amount for the day to be predicted. By preprocessing multi-source data and constructing five types of derived features in parallel, this method accurately adapts to the characteristics of time-series data in the electrolytic aluminum industry. Simultaneously, it employs a hybrid prediction architecture combining gradient boosting decision trees and deep time-series models. This ensures model interpretability while efficiently capturing the nonlinear coupling relationship between long-term time-series dependencies and process parameters. 
Furthermore, a weighted fusion strategy based on model validation set performance significantly improves the accuracy and robustness of fluoride feed rate prediction. Compared to traditional manual experience-based methods, this application achieves standardization, datafication, and precision in production control. It stabilizes electrolytic cell operating conditions, reduces fluoride salt consumption and energy consumption, eliminates reliance on experienced process engineers, and its modular design facilitates integration with existing production systems.
[0008] The above description is only an overview of the technical solution of this application. In order to better understand the technical means of this application and to implement it in accordance with the contents of the specification, and to make the above and other objects, features and advantages of this application more obvious and understandable, the following are specific embodiments of this application.
Attached Figure Description
[0009] Various other advantages and benefits will become apparent to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of this application. Furthermore, the same reference numerals denote the same parts throughout the drawings. In the drawings:
Figure 1 shows a flowchart of a method for predicting the amount of electrolytic aluminum fluoride salt feed based on a regression model, as provided in an embodiment of this application.
Figure 2 shows a flowchart of another method for predicting the amount of electrolytic aluminum fluoride salt feed based on a regression model, as provided in an embodiment of this application.
Figure 3 shows a schematic diagram of a device for predicting the amount of electrolytic aluminum fluoride salt feed based on a regression model, as provided in an embodiment of this application.
Detailed Implementation
[0010] In the description of this application, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings. They are used only for the convenience of describing this application and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, they should not be construed as limitations on this application.
[0011] Furthermore, the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.
[0012] In this application, unless otherwise expressly specified and limited, the terms "installation," "connection," "linking," and "fixing," etc., should be interpreted broadly. For example, they can refer to a fixed connection, a detachable connection, or an integral connection; they can refer to a mechanical connection or an electrical connection; they can refer to a direct connection or an indirect connection through an intermediate medium; and they can refer to the internal connection between two components. Those skilled in the art can understand the specific meaning of the above terms in this application according to the specific circumstances.
[0013] Exemplary embodiments of the present application will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the scope of the present application to those skilled in the art.
[0014] Currently, production planning and material control in electrolytic aluminum enterprises mainly rely on experience-based decision-making. Key process steps in electrolytic cells (such as alumina concentration adjustment and aluminum liquid height control) still depend on manual judgment, requiring operators to frequently intervene to adjust cell conditions. However, this experience-driven decision-making model results in low operational standardization and significant deviations in execution between different shifts, making it difficult to guarantee production stability.
[0015] With the advancement of industrial intelligent transformation, the electrolytic aluminum industry is actively exploring new-generation technological solutions: multi-source data fusion technology, which attempts to integrate acoustic signals from electrolytic cells, thermal imaging data, and process parameters to build a more comprehensive operating condition perception system; and machine learning application exploration, where companies begin to try neural network algorithms to predict fluoride salt demand, and continue to explore feature engineering design and model generalization capabilities.
[0016] However, the current technology in the field of fluoride salt feeding control in the electrolytic aluminum industry still has obvious shortcomings: multi-source data fusion is only at the initial stage of integrating acoustic, thermal imaging and process parameters, and a standardized data governance and feature mining system adapted to the time-series characteristics of electrolytic aluminum has not yet been formed; machine learning applications are still in the exploratory stage, neural network algorithms lack specificity in feature engineering design, cannot fully capture the nonlinear coupling and time-series dependence between process parameters, have weak model generalization ability and insufficient prediction accuracy, and are difficult to stably adapt to the complex and ever-changing electrolytic cell conditions; the overall system still relies on human experience, the deployment of intelligent control systems is not yet mature, and it is impossible to achieve accurate, standardized and automated control of fluoride salt feeding, resulting in problems such as high material consumption, poor cell condition stability and large differences in personnel operation.
[0017] To address this issue, this application proposes a regression model-based method for predicting the amount of fluoride salt fed into electrolytic aluminum production. First, time-series data from multiple systems, including cell control and analysis, are uniformly aligned, dimensionally normalized, and intelligently processed for missing values to form high-quality foundational data. Then, by constructing various derived features such as lag, rolling statistics, difference, periodicity, and interaction, the temporal dynamics and coupling relationships of process parameters are deeply explored. Based on this, a hybrid prediction model system is constructed using gradient boosting trees and a deep time-series model, learning from complex feature interactions and long-term temporal dependencies respectively. The models are then weighted and fused based on their performance on the validation set, ultimately outputting an accurate prediction of the fluoride salt feeding amount. This application effectively overcomes the uncertainty of traditional manual experience and the limitations of single models, significantly improving prediction accuracy and stability in practical applications, and providing reliable data-driven decision support for refined production control, reduced material consumption, and improved energy efficiency. The implementing entity of this application may be an electrolytic aluminum fluoride salt feeding quantity prediction system. The electrolytic aluminum fluoride salt feeding quantity prediction system provides services to users by relying on the computing power of a server. The server may be an independent server or a server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
[0018] This application provides a method for predicting the fluoride salt feed amount in electrolytic aluminum production based on a regression model. As shown in Figure 1, the method includes:
101. Obtain multi-source raw time-series data from the electrolytic aluminum production process, preprocess the multi-source raw time-series data, and obtain a basic feature dataset.
[0019] In this embodiment, firstly, raw time-series data indexed by cell number and timestamp are collected from multiple sources, including the cell control system, testing system, manual measurement, and production execution system of electrolytic aluminum production. This data covers core process parameters such as cell temperature, current, alumina concentration, and historical fluoride salt feed rate. Subsequently, the data is time-series aligned to construct a supervised learning sample format of "t-1 day features corresponding to t day fluoride salt feed rate," ensuring the temporal matching between the data and the prediction target. Next, data unit standardization is used to eliminate numerical scale differences between different physical quantity units, avoiding feature weight bias during model training. Finally, differential missing value processing is performed. Missing samples for key process parameters such as cell temperature and current are directly removed; for non-key parameters, imputation is performed based on the missing type using methods such as mean imputation, linear interpolation, or model prediction, thereby preserving the most effective data information.
[0020] The entire preprocessing process includes time series alignment, data standardization, and missing value handling. "Time series alignment" ensures that data from different sources correspond strictly in the same "cell number-time" dimension; "data standardization" eliminates differences in feature scale caused by different units of measurement; and "missing value handling" adopts differentiated strategies, ranging from forward / backward imputation and interpolation to model prediction, chosen according to the pattern of missing data and the importance of the feature, to ensure the integrity of the dataset.
[0021] The standardized data governance process described above can solve problems such as heterogeneous multi-source data, misaligned time sequences, and inconsistent data quality in industrial settings. This ensures the consistency and reliability of subsequent model input data, improves the model's adaptability to real production scenarios, avoids prediction biases caused by data noise or missing data, and lays a solid data foundation for high-precision modeling.
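The differentiated missing-value handling and the scale standardization of step 101 can be illustrated with a minimal sketch. It rests on assumptions that the application does not state: the field names `cell_temp` and `current` are hypothetical, linear interpolation stands in for the several imputation options listed, and z-score scaling is only one possible way to remove unit-dependent scale differences.

```python
from statistics import mean, pstdev

# Hypothetical field names; the text only says that parameters such as
# cell temperature and current are treated as key parameters.
KEY_FIELDS = ("cell_temp", "current")


def drop_rows_missing_key(rows, key_fields=KEY_FIELDS):
    """Remove samples in which any key process parameter is missing."""
    return [r for r in rows if all(r.get(k) is not None for k in key_fields)]


def interpolate_linear(values):
    """Fill None gaps by linear interpolation.

    Gaps at the edges copy the nearest known value (an assumption).
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to interpolate from
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def zscore(values):
    """Standardize a complete numeric column to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column carries no scale
    return [(v - mu) / sigma for v in values]
```

In this sketch a non-key column would pass through `interpolate_linear` before `zscore`, while rows lacking a key parameter are dropped first.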
[0022] 102. Construct multiple types of derived features based on the basic feature dataset, integrate and filter these derived features to obtain a high-dimensional feature set.
[0023] In this embodiment, a preprocessed basic feature dataset is used as input to construct five types of derived features in parallel: historical values of the features from the previous few days are extracted to form lag features, thereby capturing the time-dependent inertia of process parameters; the mean, standard deviation, and other statistics within the sliding window are calculated to form rolling statistical features, reflecting the stability of the electrolytic cell operation; differential features are generated through first-order / second-order differencing to characterize the speed and acceleration of parameter changes; sine and cosine encoding of time-dimensional information is performed to form periodic features, thereby mining production rhythms and operating condition fluctuation patterns; and multiplication and division combinations of key process variables are constructed to form interactive features, thereby mining the nonlinear coupling relationship between parameters. Subsequently, the five types of derived features are integrated through feature importance evaluation and recursive feature elimination algorithms, and redundant features are removed, ultimately obtaining a high-dimensional feature set that balances information content and simplicity.
[0024] Among them, "lag features" introduce historical states, giving the model a memory function; "rolling statistical features" characterize recent trends and stability; "difference features" capture the speed and acceleration of change; "periodic features" encode possible weekly, monthly, and other cyclical patterns in production; and "interaction features" explore the product or proportional relationships between key process variables.
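The five derived feature types can be sketched for a single daily series of one process parameter. This is an illustrative sketch only: the window sizes, the period of 7 days, and the choice of a simple ratio as the interaction feature are assumptions, not values given in the application.

```python
import math
from statistics import mean


def lag(values, k):
    """Lag feature: the value observed k steps earlier (None where unavailable)."""
    return ([None] * k + list(values))[: len(values)]


def rolling(values, window, func=mean):
    """Rolling statistic over a trailing window (None until the window is full)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(func(values[i + 1 - window : i + 1]))
    return out


def diff(values, order=1):
    """First- or higher-order difference (None where undefined)."""
    out = list(values)
    for _ in range(order):
        out = [None] + [
            (b - a) if a is not None and b is not None else None
            for a, b in zip(out, out[1:])
        ]
    return out


def cyclic_encode(position, period):
    """Periodic feature: sine/cosine encoding of a position within a cycle."""
    angle = 2 * math.pi * position / period
    return math.sin(angle), math.cos(angle)


def ratio(a, b):
    """Interaction feature: element-wise ratio of two variables (None on zero divisor)."""
    return [x / y if y else None for x, y in zip(a, b)]
```

For example, `cyclic_encode(day_of_week, 7)` would encode an assumed weekly rhythm so that day 6 and day 0 end up close together in feature space.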
[0025] This application breaks through the limitations of traditional single-feature modeling by comprehensively exploring the temporal characteristics of electrolytic aluminum production from five dimensions: "historical inertia, operational stability, changing trends, periodic patterns, and parameter coupling." This avoids underfitting due to single features and controls the curse of dimensionality through feature selection, thereby improving the model's ability to capture complex operating conditions while ensuring the efficiency and stability of model training.
[0026] 103. Train multiple gradient boosting decision tree models and deep time series prediction models using high-dimensional feature sets to form a hybrid prediction model system.
[0027] This application's embodiments do not rely on a single model, but rather, based on a high-dimensional feature set, train multiple gradient boosting decision tree models and deep temporal prediction models separately to construct a complementary hybrid prediction system. For gradient boosting decision trees, different hyperparameters are used to train XGBoost and LightGBM models, and optimization is achieved by adding regularization, restricting leaf growth, and weighting key samples during training, making them adept at fitting structured features and residuals, and possessing strong interpretability. For deep temporal prediction, DLinear and Transformer models are trained, and early stopping mechanisms and regularization are used to avoid overfitting, enabling them to efficiently capture temporal dynamics.
[0028] These two types of models are trained in parallel. Gradient boosting trees are good at learning complex nonlinear interactions and decision boundaries from structured features, while deep time series models focus on identifying long-term dependencies and complex dynamic patterns from sequence data, thus forming a hybrid prediction capability that covers "structured features + long-term time series dependencies".
[0029] This application overcomes the performance limitations of single models by employing a hybrid architecture of "gradient boosting tree + deep time series model," achieving a balance between interpretability and prediction accuracy. The gradient boosting tree model efficiently handles high-dimensional features and provides a good interpretation of feature importance, while the deep time series model captures long-range temporal correlations that the former might overlook. The two complement each other, significantly improving the overall robustness and generalization ability of the model, and enabling it to adapt to the complex and variable operating conditions of electrolytic aluminum production.
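The early stopping mechanism mentioned for the deep temporal models in step 103 can be sketched independently of any particular framework. The `patience` and `min_delta` parameters below are hypothetical; the application does not specify a stopping criterion, so this shows only the common "stop after no improvement for several epochs" rule.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. If that never happens,
    stop_epoch is the last epoch.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

In a real training loop the weights saved at `best_epoch` would typically be restored before the model joins the hybrid system.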
[0030] 104. Input the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results. Based on the validation set performance index of each model in the hybrid prediction model system, the multiple preliminary prediction results are weighted and fused to obtain the predicted value of the fluoride salt feeding amount for the day to be predicted.
[0031] In this embodiment, the feature data of the day to be predicted is input into the hybrid prediction model system to obtain the preliminary prediction results output by each sub-model. Then, based on the performance indicators of each model on the validation set, fusion weights are assigned, and multiple preliminary prediction results are weighted and summed to finally obtain the predicted value of the fluoride salt feeding amount for the day to be predicted. The model with better performance is assigned a higher weight, and the weight allocation logic tilts towards the model that performs better on the validation set, thereby ensuring that the fusion result can combine the advantages of each model and avoid the prediction bias of a single model under specific operating conditions.
[0032] By employing dynamic weighted fusion based on validation set performance, optimal integration of multi-model prediction results can be achieved. This avoids the risk of overfitting from a single model while fully leveraging the modeling advantages of different models, further improving the accuracy and stability of the final prediction results. Simultaneously, the fusion logic is clear, transparent, and traceable, facilitating subsequent iterative updates of weights based on production data to continuously optimize prediction performance and provide reliable data support for precise control of electrolytic aluminum fluoride salt feeding.
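The weighted fusion of step 104 can be sketched as follows. The application states only that better validation performance yields a higher weight; the specific rule used here (weights proportional to the inverse of each model's validation RMSE, normalized to sum to 1) and the model names are assumptions for illustration.

```python
def fusion_weights(val_rmse):
    """Map each model's validation RMSE to a normalized fusion weight.

    Assumed rule: weight is proportional to 1 / RMSE, so lower error
    gives a higher weight.
    """
    if any(err <= 0 for err in val_rmse.values()):
        raise ValueError("validation errors must be positive")
    inverse = {name: 1.0 / err for name, err in val_rmse.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def fuse(predictions, weights):
    """Weighted sum of the preliminary predictions of all sub-models."""
    return sum(weights[name] * predictions[name] for name in predictions)


# Hypothetical validation errors and preliminary predictions.
weights = fusion_weights({"xgboost": 1.0, "dlinear": 3.0})
fused = fuse({"xgboost": 10.0, "dlinear": 20.0}, weights)
```

Because the weights are recomputed from validation errors, they can be refreshed whenever the models are re-validated on newer production data.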
[0033] This application provides a method for predicting the fluoride salt feed amount in electrolytic aluminum production based on a regression model. Compared with the prior art, this application obtains multi-source raw time-series data from the electrolytic aluminum production process and preprocesses the multi-source raw time-series data to obtain a basic feature dataset, wherein the preprocessing includes time series alignment, data unit standardization, and missing value handling; constructs multiple types of derived features based on the basic feature dataset, and integrates and filters these derived features to obtain a high-dimensional feature set, wherein the derived features include lag features, rolling statistical features, difference features, periodic features, and interaction features; trains multiple gradient boosting decision tree models and deep time-series prediction models using the high-dimensional feature set to form a hybrid prediction model system; and inputs the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results, then weights and fuses the multiple preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the fluoride salt feed amount for the day to be predicted. By preprocessing multi-source data and constructing five types of derived features in parallel, this method adapts to the characteristics of time-series data in the electrolytic aluminum industry. At the same time, it employs a hybrid prediction architecture combining gradient boosting decision trees and deep time-series models, which preserves model interpretability while capturing both long-term time-series dependencies and the nonlinear coupling between process parameters.
Furthermore, a weighted fusion strategy based on model validation set performance significantly improves the accuracy and robustness of fluoride feed rate prediction. Compared to traditional manual experience-based methods, this application achieves standardization, datafication, and precision in production control. It stabilizes electrolytic cell operating conditions, reduces fluoride salt consumption and energy consumption, eliminates reliance on experienced process engineers, and its modular design facilitates integration with existing production systems.
[0034] Furthermore, as a refinement and extension of the specific implementation of the above embodiment, and to fully illustrate its implementation process, this application provides another method for predicting the fluoride salt feed amount in electrolytic aluminum production based on a regression model. As shown in Figure 2, the method includes:
201. Obtain raw time-series data from multiple independent business systems during the electrolytic aluminum production process to form multi-source raw time-series data.
[0035] In this embodiment, raw time-series data from multiple independent business systems are acquired during the electrolytic aluminum production process, forming multi-source raw time-series data. These independent business systems include the cell control system, the testing system, the manual measurement system, and the production execution system. The cell control system provides operating parameters such as voltage, current, alumina feed rate, fluoride addition frequency, and anode effect frequency; the testing system provides the molecular ratio and electrolyte composition data such as iron and silicon content; the manual measurement system provides process parameters such as electrolysis temperature, aluminum level, electrolyte level, and furnace bottom voltage drop; and the production execution system records production information such as actual aluminum output, aluminum output time, NB setting interval, and aluminum fluoride feed rate. The multi-source raw time-series data is collected uniformly through data interfaces or database reads, using the electrolytic cell number and timestamp as key indexes.
[0036] Multi-source raw time-series data enables comprehensive and three-dimensional digital perception of the electrolytic cell's production status. It not only covers all dimensions of parameters, from real-time voltage and current, chemical composition to temperature levels and production records, but more importantly, by unifying the spatiotemporal benchmark, it integrates the originally scattered and heterogeneous data streams into a standardized information source that can be directly analyzed by machines. This lays a solid and reliable data foundation for subsequent in-depth analysis and intelligent modeling, and completely changes the one-sidedness and lag of traditional data that relies on local or manual records.
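Collecting records from several systems under the shared key of cell number and timestamp can be sketched as a simple keyed merge. The record layout and the field names (`cell_id`, `date`, `voltage`, `molecular_ratio`) are hypothetical; actual data interfaces and schemas are not described in the application.

```python
def merge_sources(*sources):
    """Merge records from several systems on the key (cell_id, date).

    Each source is an iterable of dicts that contain 'cell_id' and 'date'
    plus any number of measurement fields. Fields from later sources
    overwrite fields of the same name from earlier ones.
    """
    merged = {}
    for source in sources:
        for record in source:
            key = (record["cell_id"], record["date"])
            fields = {k: v for k, v in record.items() if k not in ("cell_id", "date")}
            merged.setdefault(key, {}).update(fields)
    return merged


# Hypothetical records from the cell control system and the testing system.
control = [{"cell_id": "A01", "date": "2025-01-01", "voltage": 4.1}]
testing = [{"cell_id": "A01", "date": "2025-01-01", "molecular_ratio": 2.4}]
unified = merge_sources(control, testing)
```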
[0037] 202. Group the multi-source raw time series data according to the electrolytic cell number to obtain the time series data of each electrolytic cell, and sort the time series data of each electrolytic cell in ascending order according to the timestamp to obtain the target time series data of each electrolytic cell.
[0038] In this embodiment, the system first uses the electrolytic cell number as a key identifier to segment the raw, mixed time-series data from different sources, constructing a dedicated data subset for each independent electrolytic cell, thereby ensuring clear isolation of data from each cell. In the aluminum electrolysis production process, each electrolytic cell is an independent operating unit, and its state evolution is both independent and continuous. Grouping by cell number ensures that the analysis is strictly limited to the lifecycle of a single electrolytic cell, avoiding erroneous conclusions due to interference between data from different electrolytic cells.
[0039] Next, within the data subset of each electrolytic cell, all data records are sorted in ascending order according to their timestamps, that is, ordered from the earliest time point to the latest time point. After this step, the data of each electrolytic cell forms a complete and ordered time series, namely the target time series data, ensuring that the data strictly follows the time causal relationship of production.
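Step 202 amounts to a group-by on the cell number followed by an ascending sort on the timestamp. A minimal sketch, assuming each record is a dict with hypothetical `cell_id` and `timestamp` fields and that timestamps are comparable values such as ISO 8601 strings:

```python
from collections import defaultdict


def group_and_sort(records):
    """Group records by electrolytic cell, then sort each group by timestamp."""
    by_cell = defaultdict(list)
    for record in records:
        by_cell[record["cell_id"]].append(record)
    return {
        cell: sorted(rows, key=lambda r: r["timestamp"])
        for cell, rows in by_cell.items()
    }


records = [
    {"cell_id": "A02", "timestamp": "2025-01-02", "voltage": 4.0},
    {"cell_id": "A01", "timestamp": "2025-01-02", "voltage": 4.2},
    {"cell_id": "A01", "timestamp": "2025-01-01", "voltage": 4.1},
]
target_series = group_and_sort(records)
```

Each value in `target_series` then plays the role of the target time series data of one cell.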
[0040] 203. For each electrolytic cell, extract the operating feature data and aluminum fluoride feed amount for multiple dates from the target time series data of the electrolytic cell, and construct supervised learning samples for multiple dates using the operating feature data and aluminum fluoride feed amount for multiple dates.
[0041] In this embodiment, time series alignment is first performed to achieve a modeling pattern of "predicting the target value of a given day using the data of the previous day". That is, the data is grouped by electrolytic cell number and sorted in chronological order. The production data of each day is shifted forward by one time step so that it is matched with the amount of aluminum fluoride added on the following day, thereby constructing a supervised learning sample consisting of "the operating characteristics of the previous day + the target value of the current day".
[0042] Specifically, for each electrolytic cell, operational characteristic data for multiple dates and aluminum fluoride feed amounts for multiple dates are extracted from the target time-series data of the electrolytic cell. Since the data for each electrolytic cell is processed independently, each cell yields a single continuous time series. In this series, time points are usually measured in days, and each time point contains two types of core information: first, the operational characteristic data of the electrolytic cell on that day, covering process parameters such as voltage, current, temperature, molecular ratio, and number of anode effects; second, the actual aluminum fluoride feed amount of the electrolytic cell on that day.
[0043] Next, supervised learning samples for multiple dates are constructed using operational feature data and aluminum fluoride feeding amounts from multiple dates. Specifically, the supervised learning sample for day t includes operational feature data from day t-1 and the aluminum fluoride feeding amount from day t. The key to data processing lies in establishing the correct supervised learning relationships. For any date t, the system performs the following operations: First, feature extraction is performed, extracting all relevant operational feature data reflecting the electrolytic cell's operating status on date t and integrating them into a feature vector (X_t). Next, label extraction is performed, extracting the actual aluminum fluoride feed amount recorded on date t and using it as the target value or label (Y_t). Finally, offset pairing is performed: the feature vectors are shifted one day forward on the timeline, so that the features recorded on day t-1 (X_t-1) line up with the feed amount of day t (Y_t), forming a supervised learning sample in the format (X_t-1, Y_t). The underlying logic is to predict the required feed amount for the current day based on the previous day's production status.
[0044] Because the feature data is lagged by one day, the feature inputs seen by the model in any training sample are strictly earlier in time than the target output it is trying to predict. This avoids the error of using future information to predict the past, so that the model learns relationships that are actually available at prediction time rather than correlations created by leaked information, thereby supporting the effectiveness of the model in actual future predictions.
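The offset pairing (X_t-1, Y_t) can be sketched as below. This is a minimal illustration under stated assumptions: the field names `features` and `alf3_feed_kg` and all values are hypothetical, and the input is assumed to be one cell's records, already sorted by date with no missing days.

```python
def build_samples(days):
    """Pair the features of day t-1 with the feed amount of day t for one cell.

    `days` must be sorted by date in ascending order.
    """
    samples = []
    for prev, cur in zip(days, days[1:]):
        samples.append({
            "date": cur["date"],            # day t, the day being predicted
            "features": prev["features"],   # X_{t-1}
            "target": cur["alf3_feed_kg"],  # Y_t
        })
    return samples

# Made-up daily records for a single cell.
days = [
    {"date": "2025-01-01", "features": {"voltage": 4.00, "temp": 955}, "alf3_feed_kg": 20.0},
    {"date": "2025-01-02", "features": {"voltage": 4.02, "temp": 958}, "alf3_feed_kg": 24.0},
    {"date": "2025-01-03", "features": {"voltage": 3.99, "temp": 952}, "alf3_feed_kg": 18.0},
]
samples = build_samples(days)
```

The first day produces no sample because it has no predecessor. If the series has gaps in the calendar, adjacent rows are no longer adjacent days, so a real pipeline would need to check the date difference before pairing.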
[0045] 204. Perform data dimension standardization on the supervised learning samples from multiple dates to obtain the supervised learning sample set of the electrolytic cell, and use a differential processing strategy to process the supervised learning sample set of each electrolytic cell to obtain the basic feature dataset.
[0046] In this embodiment, data dimension standardization is a unified conversion of process parameters with different physical units. For example, the NB interval is converted from 0.1 seconds to seconds, the aluminum output is converted from tons to kilograms, and the aluminum level and electrolyte level are converted from millimeters to centimeters. This eliminates the influence of different dimensions and improves the stability of model training.
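A minimal sketch of the unit conversions listed above. The column names are hypothetical, and the NB interval is assumed to be recorded as a count of 0.1-second ticks; the factors follow the conversions named in the text.

```python
# Multipliers that bring each (hypothetically named) column to its standard unit.
UNIT_FACTORS = {
    "nb_interval": 0.1,         # 0.1 s ticks -> seconds (assumed raw encoding)
    "aluminum_output": 1000.0,  # tonnes -> kilograms
    "aluminum_level": 0.1,      # millimetres -> centimetres
    "electrolyte_level": 0.1,   # millimetres -> centimetres
}

def standardize_units(sample):
    """Return a copy of `sample` with known columns converted; other columns pass through."""
    converted = {}
    for name, value in sample.items():
        factor = UNIT_FACTORS.get(name)
        if factor is not None and value is not None:
            converted[name] = value * factor
        else:
            converted[name] = value
    return converted

# Made-up raw sample.
converted = standardize_units({
    "nb_interval": 250, "aluminum_output": 2.5,
    "aluminum_level": 250, "electrolyte_level": 180, "voltage": 4.0,
})
```

Missing values (`None`) are left untouched so that the missing-value handling can deal with them separately.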
[0047] In industrial production data, various forms of data loss may occur due to factors such as equipment communication anomalies, data acquisition failures, or missing manual records. This application employs a systematic, tiered, rule-based intelligent decision-making process to address missing values. Instead of a one-size-fits-all approach, it performs differentiated operations based on the severity, pattern, and importance of the missing data. It should be noted that the missing value rate should not exceed 10% within any given time period. In the case of a single missing value, predefined values "0" or "999" can be used to fill the gap.
[0048] Specifically, for each electrolyzer's supervised learning sample set, if a supervised learning sample is found to be missing key process parameters such as electrolysis temperature, aluminum level, and electrolyte level, that sample will be removed from the supervised learning sample set. This strategy prioritizes ensuring the absolute accuracy and completeness of the core input data, avoiding model risks introduced by estimating key process states.
[0049] When scattered missing data points are detected in the non-critical process parameters of the supervised learning sample set, and the missing proportion is less than or equal to a first threshold, time series imputation methods are used to impute the supervised learning sample set. These time series imputation methods include forward imputation and backward imputation, suitable for scenarios with strong data continuity and small short-term fluctuations. Forward imputation is preferred, meaning the previous valid observation value at the missing time point is used to fill the gap; if forward imputation is not possible, for example, if the missing data point is the first data point, backward imputation is used, using the next valid observation value.
[0050] When random missing parameters of non-critical process parameters are detected in the supervised learning sample set, and the missing percentage is less than or equal to a second threshold, statistical imputation methods are used to impute the supervised learning sample set. These methods include mean imputation, median imputation, and mode imputation. Mean imputation is suitable for uniform data distribution, median imputation is suitable for situations with outliers, and mode imputation is suitable for categorical or discrete features. When missing parameters are random and the percentage is low, estimation using the central tendency of the variable has the least impact on the overall data distribution.
[0051] When a non-critical process parameter in the supervised learning sample set is found to be missing for consecutive time periods (i.e., no data for a parameter at multiple consecutive time points), interpolation methods are used to fill in the missing data. These interpolation methods include linear interpolation, polynomial interpolation, and spline interpolation. Linear interpolation is preferred, assuming the data changes linearly within the missing interval. For data with a clear non-linear trend, polynomial or spline interpolation can be used to fit the curve and estimate the parameter's potential trend during the missing period, using known data points before and after the missing time period.
[0052] When a feature parameter in the supervised learning sample set has an importance level greater than or equal to a preset importance level (that is, domain knowledge or preliminary analysis indicates that the feature has a significant impact on the prediction model) and its missing proportion is greater than or equal to a third threshold, the supervised learning sample set is filled using either the K nearest neighbor algorithm or a regression model. The K nearest neighbor algorithm estimates the missing value from the K samples that are most similar in their other features, while the regression model predicts the missing value from the other features. Both approaches exploit the relationships between the complete features and the missing feature in the dataset, which can give more accurate estimates than simple statistical filling.
[0053] By standardizing units, all process parameters maintain a consistent numerical scale, preventing the model from overemphasizing large numerical features and neglecting key small-scale features due to differences in units. This ensures a fair contribution of each feature to the prediction results and improves the stability of model training. The differentiated processing strategy is designed for different missing data types in industrial data. It avoids data loss due to simple deletion while ensuring the reliability of samples when key parameters are missing, achieving an optimal balance between "preserving effective information" and "ensuring data quality."
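Three of the tiers described above (dropping samples that lack key parameters, forward/backward filling, and linear interpolation) can be sketched as follows. This is an illustrative sketch only: the column names are hypothetical, missing values are represented as `None`, and the thresholds that select between tiers are not shown because the text does not give their values.

```python
KEY_PARAMS = ("electrolysis_temp", "aluminum_level", "electrolyte_level")  # hypothetical names

def drop_incomplete(samples):
    """Remove samples in which any key process parameter is missing."""
    return [s for s in samples if all(s.get(k) is not None for k in KEY_PARAMS)]

def ffill_bfill(values):
    """Forward fill; fall back to backward fill for gaps at the start of the series."""
    out = list(values)
    last = None
    for i, v in enumerate(out):
        if v is None:
            out[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out

def linear_interpolate(values):
    """Fill interior runs of None on a straight line between the bounding known points."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out
```

For example, `ffill_bfill([None, 2.0, None, 5.0])` fills the leading gap from the next valid value and the interior gap from the previous one, while `linear_interpolate([1.0, None, None, 4.0])` places the two missing points on the line between 1.0 and 4.0.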
[0054] 205. Construct multiple derived features in parallel based on the basic feature dataset, and then concatenate the multiple derived features with the basic feature dataset to obtain an extended feature matrix.
[0055] In this embodiment, the various feature derivation methods are not executed sequentially, but rather in parallel, using the basic feature dataset as a common input. That is, each feature derivation method directly operates on the basic feature dataset, rather than using the output of one method as the sole input to the next. After completing the feature derivation calculations, all generated features are uniformly integrated and filtered to form the final feature set used for model training. This parallel construction and unified fusion approach avoids the propagation of errors during feature calculation and improves feature representation capabilities.
[0056] Specifically, after acquiring the basic feature data, lag features are first constructed to reflect the impact of historical states on current control behavior during the operation of the electrolytic cell. The data is grouped by cell number and arranged chronologically, and multiple time-lag variables are generated for each basic feature, such as the corresponding feature values from one, two, three, and four days earlier. This process yields a lag feature matrix describing the historical operating states, used to characterize the time dependencies of the production process.
[0057] In the basic feature dataset, a supervised learning sample set for each electrolytic cell is extracted. For each electrolytic cell, multiple target feature variables, such as voltage and temperature, are selected in its supervised learning sample set. For each target feature variable, the time series value of the target feature variable is shifted backward along the time axis by multiple preset unit time intervals to obtain multiple time lag features corresponding to the target feature variable. These preset unit time intervals include 1 unit time interval, 2 unit time intervals, 3 unit time intervals, and 4 unit time intervals. These multiple time lag features corresponding to each target feature variable are added to the supervised learning sample set of the electrolytic cell to obtain the lag features of the electrolytic cell.
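A minimal sketch of the lag-feature construction for one cell, assuming the rows are already sorted by time. The naming pattern `<variable>_lag<k>` is an illustrative choice, not taken from the source.

```python
def add_lag_features(rows, variables, lags=(1, 2, 3, 4)):
    """Append `<var>_lag<k>` columns holding the value of `var` from k steps earlier.

    Rows near the start of the series have no history that far back and receive None.
    """
    out = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        for var in variables:
            for k in lags:
                new_row[f"{var}_lag{k}"] = rows[i - k][var] if i >= k else None
        out.append(new_row)
    return out

# Made-up voltage series for a single cell.
rows = [{"voltage": v} for v in (4.0, 4.1, 4.2, 4.3, 4.4)]
lagged = add_lag_features(rows, ["voltage"])
```

Because each lag column only looks backward, the construction uses no information from later time points.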
[0058] Based on this, rolling statistical characteristics are further calculated for key continuous variables to reflect the changing trends of process parameters within a certain time window. Specifically, variables such as operating voltage, molecular ratio, electrolysis temperature, and aluminum fluoride feed rate are selected, and their mean, standard deviation, maximum value, and minimum value are calculated within a set sliding window (such as 3 days, 5 days, or 7 days), thereby forming trend statistical characteristics describing the local operational stability and fluctuation degree.
[0059] At least one key continuous variable is selected from the supervised learning sample set of the electrolytic cell. These variables are typically process parameters that significantly affect the production status and the prediction target (the fluoride salt feed amount), such as operating voltage, molecular ratio, electrolysis temperature, and aluminum fluoride feed rate. The rolling statistics for each key continuous variable within a sliding window of a preset time length are calculated and then added to the supervised learning sample set of the electrolytic cell to obtain the rolling statistical characteristics of the electrolytic cell. The rolling statistics include the rolling mean, rolling standard deviation, rolling maximum, and rolling minimum. Taking each day in the supervised learning sample set as the current point, the sliding window consists of the current point together with the immediately preceding points, up to the preset window length, and the rolling statistics are calculated on the data within this window. The rolling mean reflects the average level or central trend of the parameter within the recent window; the rolling standard deviation reflects the fluctuation or stability of the parameter within the recent window; and the rolling maximum and rolling minimum reflect the upper and lower limits of the parameter's range of change within the recent window. The window slides forward day by day as the current point moves, generating a corresponding set of rolling statistics for each day of the time series.
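The trailing-window statistics can be sketched as follows. This is a minimal illustration that assumes a complete daily series for one cell and a window of at least two points; the standard deviation used here is the sample standard deviation, which is an assumption since the source does not say which form is intended.

```python
from statistics import mean, stdev

def rolling_stats(values, window):
    """Trailing-window statistics; the window ends at, and includes, the current point."""
    stats = []
    for i in range(len(values)):
        if i + 1 < window:
            stats.append(None)  # not enough history yet
            continue
        w = values[i + 1 - window : i + 1]
        stats.append({"mean": mean(w), "std": stdev(w), "max": max(w), "min": min(w)})
    return stats

# Made-up series; a 3-point window yields its first result at index 2.
roll3 = rolling_stats([1.0, 2.0, 3.0, 4.0], window=3)
```

When the variable being summarized is the aluminum fluoride feed amount itself, the window would presumably need to end at day t-1 rather than day t, so that the target of a sample never appears among its own features.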
[0060] Subsequently, to characterize the rate and trend of variable change, difference features are constructed for some continuous variables. First-order difference features are formed by calculating the difference between the current value and the previous value, and second-order difference features are obtained by further calculating the difference between adjacent first-order differences, so as to reflect the rate and acceleration of production parameter changes.
[0061] For each key continuous variable, its first-order difference feature sequence is calculated, which contains first-order difference feature values at multiple time points. At any given time point, the first-order difference feature value equals the difference between the parameter value of the key continuous variable at that time point and its parameter value at the previous adjacent time point. Based on the first-order difference feature sequence of the key continuous variable, its second-order difference feature sequence is calculated, which contains second-order difference feature values at multiple time points. At any given time point, the second-order difference feature value equals the difference between the first-order difference feature value of the key continuous variable at that time point and its first-order difference feature value at the previous adjacent time point. The first-order difference feature sequence and the second-order difference feature sequence of each key continuous variable are added to the supervised learning sample set of the electrolyzer to obtain the differential features of the electrolyzer.
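The two difference sequences can be sketched as below, assuming one cell's values in time order, with `None` marking positions where a difference cannot be formed.

```python
def first_difference(values):
    """d1[t] = x[t] - x[t-1]; positions without a valid predecessor are None."""
    out = []
    for i, cur in enumerate(values):
        prev = values[i - 1] if i > 0 else None
        out.append(cur - prev if cur is not None and prev is not None else None)
    return out

def second_difference(values):
    """d2[t] = d1[t] - d1[t-1], i.e. the first difference of the first difference."""
    return first_difference(first_difference(values))

# Made-up series whose increments grow by a constant amount.
series = [1.0, 4.0, 9.0, 16.0]
d1 = first_difference(series)
d2 = second_difference(series)
```

Here the first differences are the successive increments (3, 5, 7) and the second differences are constant, which is the "rate" and "acceleration" reading described in the text.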
[0062] Furthermore, periodic information is extracted from the time field to capture potential cyclical patterns in the production process. Specifically, time variables such as month, weekday, and day number are extracted from the date field, and sine and cosine functions are used to encode the periodic variables, allowing the time features to maintain their continuous and periodic expressive ability in the model.
[0063] At least one time element is extracted from the time field of the supervised learning sample set. Each time element is encoded using sine and cosine functions to obtain the time feature code corresponding to each time element. Then, the time feature code corresponding to each time element is added to the supervised learning sample set of the electrolytic cell to obtain the periodicity feature of the electrolytic cell.
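A minimal sketch of sine and cosine encoding for the month and the weekday. The output key names are hypothetical, and the day number is omitted here because its period depends on whether it means day of month or day of year, which the source does not specify.

```python
import math
from datetime import date

def cyclical(value, period):
    """Map a value on a cycle of length `period` to a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def time_features(d):
    """Encode month and weekday of a date as (sin, cos) pairs."""
    month_sin, month_cos = cyclical(d.month - 1, 12)  # months 1..12 -> 0..11
    wd_sin, wd_cos = cyclical(d.weekday(), 7)         # Monday = 0 ... Sunday = 6
    return {
        "month_sin": month_sin, "month_cos": month_cos,
        "weekday_sin": wd_sin, "weekday_cos": wd_cos,
    }

def circle_distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

features = time_features(date(2025, 1, 6))
```

With this encoding December and January sit next to each other on the circle, whereas a plain integer month would place them eleven units apart.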
[0064] Meanwhile, to uncover the nonlinear coupling relationships between variables, interactive features are constructed for some key process variables. For example, by calculating the product of the molecular ratio and voltage, the ratio of voltage to resistance, and the ratio of the amount of aluminum fluoride fed to the amount of alumina fed, combined variables that reflect the process coupling relationships are formed.
[0065] Several key process variables are selected from the supervised learning sample set of the electrolytic cell. These key process variables include molecular ratio, operating voltage, resistance, aluminum fluoride feed rate, and alumina feed rate. Multiple key process variables are combined through pre-defined mathematical operations to generate multiple interactive feature variables. Each interactive feature variable is one of the following: the product of molecular ratio and operating voltage, the ratio of operating voltage to resistance, or the ratio of aluminum fluoride feed rate to alumina feed rate. These interactive feature variables are added to the supervised learning sample set of the electrolytic cell to obtain the interactive features of the electrolytic cell.
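The three interaction variables named above can be sketched as follows. The column names and sample values are hypothetical; returning `None` for a zero or missing denominator is an illustrative choice, not something the source prescribes.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator

def interaction_features(row):
    """Build the product and ratio features described in the text for one sample."""
    return {
        "ratio_x_voltage": row["molecular_ratio"] * row["voltage"],
        "voltage_per_resistance": safe_ratio(row["voltage"], row["resistance"]),
        "alf3_per_alumina": safe_ratio(row["alf3_feed"], row["alumina_feed"]),
    }

# Made-up sample.
inter = interaction_features({
    "molecular_ratio": 2.5, "voltage": 4.0, "resistance": 0.0,
    "alf3_feed": 20.0, "alumina_feed": 2000.0,
})
```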
[0066] After constructing the aforementioned features, lag features, rolling statistical features, difference features, periodic features, and interaction features are integrated to form multiple derived features. These derived features are then horizontally concatenated with the basic feature dataset to obtain an extended feature matrix. The construction of all derived features strictly follows a chronological order, using only historical and current information and never future information. This fundamentally ensures the reliability of the features, avoids data leakage during prediction, and allows the model to learn true causal relationships. Furthermore, each feature is generated independently, avoiding the propagation of errors through sequential processing, thereby improving computational efficiency and feature quality.
[0067] 206. Feature importance assessment and recursive feature elimination methods are used to filter features in the extended feature matrix to obtain a high-dimensional feature set.
[0068] In this embodiment, a feature importance assessment method and a recursive feature elimination method are used to filter features in the extended feature matrix, removing redundant or low-contribution variables, thereby obtaining a high-dimensional feature set.
[0069] Specifically, feature importance assessment methods employ a model capable of evaluating feature importance, such as XGBoost or Random Forest, and perform initial training on the training set using the entire extended feature matrix. After model training, an importance score is calculated for each input feature. This score is typically based on the frequency with which the feature is used to split nodes when constructing all decision trees, or the total reduction in impurity it brings. All features are then sorted in descending order according to their importance scores, thus identifying which features contribute most to the prediction target from the model's perspective; for example, it might be the "rolling mean of the molecular ratio over the past 3 days" or the "voltage-to-resistance ratio." This stage provides a quantified priority list, and features with extremely low importance can be initially identified as candidates for elimination.
[0070] The recursive feature elimination method is a greedy algorithm that repeatedly retrains the model and uses the model's performance on a validation set as the selection criterion. Its goal is to find the optimal subset of features. Starting from the high-ranking features retained by the importance assessment, it iterates through the following steps until a stopping condition is met, such as reaching a preset number of features or performance starting to decline. Step A (Training and Evaluation): Train a new prediction model, such as an XGBoost model, using the current feature subset, and evaluate the model's performance on an independent validation set.
[0071] Step B (Importance Ranking): Obtain the importance ranking of all features under the current model.
[0072] Step C (Feature Removal): Remove one or a small subset of features that are currently ranked last in importance, forming a new, smaller subset of features.
[0073] Then, repeat step A and start the next round of the loop with the new feature subset.
[0074] Throughout the recursive elimination process, the model's performance on the validation set is recorded after each iteration. Finally, the feature subset that performs best on the validation set is selected. This subset is usually not the original full set, but a simplified set with fewer features while maintaining or even improving prediction accuracy.
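The loop formed by Steps A to C can be sketched as below. This is a sketch under stated assumptions, not the applicant's implementation: `fit_and_score` and `rank_features` are hypothetical callables standing in for "train a model such as XGBoost and score it on the validation set" and "read the model's importance ranking", and the toy versions at the bottom exist only so the example runs without any machine learning library.

```python
def recursive_feature_elimination(features, fit_and_score, rank_features,
                                  min_features=1, drop_per_round=1):
    """Greedy backward elimination that returns the best-scoring subset seen.

    fit_and_score(subset) -> validation error (lower is better)
    rank_features(subset) -> subset ordered from most to least important
    """
    current = list(features)
    best_subset, best_score = list(current), fit_and_score(current)  # Step A, full set
    while len(current) > min_features:
        ranked = rank_features(current)                              # Step B
        n_drop = min(drop_per_round, len(current) - min_features)
        current = ranked[:-n_drop]                                   # Step C
        score = fit_and_score(current)                               # Step A, new subset
        if score <= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score

# Toy stand-ins with made-up importances and a made-up error function.
IMPORTANCE = {"ratio_roll3_mean": 0.50, "voltage_per_resistance": 0.30,
              "temp_lag1": 0.15, "noise_a": 0.03, "noise_b": 0.02}
USEFUL = {"ratio_roll3_mean", "voltage_per_resistance", "temp_lag1"}

def toy_rank(subset):
    return sorted(subset, key=IMPORTANCE.get, reverse=True)

def toy_score(subset):
    missing = len(USEFUL - set(subset))   # dropping a useful feature hurts a lot
    noise = len(set(subset) - USEFUL)     # keeping a noise feature hurts a little
    return 1.0 + 0.5 * missing + 0.1 * noise

best_subset, best_score = recursive_feature_elimination(list(IMPORTANCE), toy_score, toy_rank)
```

This variant keeps eliminating down to `min_features` and returns the best subset recorded along the way, which matches the description of recording validation performance after every iteration; stopping as soon as performance declines would be an equally valid reading of the stopping condition.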
[0075] After obtaining a high-dimensional feature matrix containing lag features, rolling statistical features, difference features, periodic features, and interaction features, given that electrolytic aluminum production data has strong time dependence, medium sample size, complex feature structure, and high interpretability requirements in production scenarios, this application adopts a combination of machine learning models and deep time series models to construct a prediction model system.
[0076] First, at the machine learning level, gradient boosting decision tree models, including XGBoost and LightGBM, were selected. These models offer significant advantages in processing structured industrial data, effectively capturing nonlinear relationships and complex interactions between variables. They also possess good interpretability, enabling the analysis of the impact of key process variables on fluoride salt feed rates using feature importance assessment. Furthermore, tree models are robust to outliers and missing values, adapting to the common data incompleteness issues in industrial datasets.
[0077] Secondly, to further depict the long-term temporal dependencies and dynamic evolution characteristics in the electrolysis production process, deep time series prediction algorithms, such as DLinear and the Transformer series of models, are introduced into the model system. These models can perform end-to-end modeling of time series data through neural network structures. DLinear enhances its ability to depict long-cycle changes through trend and seasonal term decomposition mechanisms, while the Transformer series of models uses a self-attention mechanism to capture the global dependencies between different time steps in the sequence, thereby more accurately learning complex temporal patterns in historical operating data and improving the ability to predict future production trend changes.
[0078] By combining the high interpretability of tree models with the dynamic modeling capabilities of deep time series models, we can ensure the stability of predictions and improve the ability to depict the changing trends of complex production processes, thus providing a more reliable model basis for predicting the amount of fluoride salts fed into electrolytic aluminum production.
[0079] 207. Train multiple initial gradient boosting decision tree models using high-dimensional feature sets to obtain multiple gradient boosting decision tree models.
[0080] In this embodiment, multiple gradient boosting decision tree initial models are trained using a high-dimensional feature set to obtain multiple gradient boosting decision tree models. The initial gradient boosting decision tree models are either XGBoost or LightGBM models.
[0081] XGBoost (eXtreme Gradient Boosting) is an ensemble learning algorithm based on Gradient Boosting Decision Tree (GBDT). It can build multiple weak learners by progressively fitting residuals and continuously optimize the loss function, thereby improving the model's predictive ability.
[0082] Specifically, to ensure the model learns true temporal patterns and avoids data leakage caused by using future information to predict the past, a rolling window approach is used to dynamically divide the training and validation sets. For example, data from months 1-12 is used for training, and data from month 13 is used for validation; then the window is moved forward, using data from months 2-13 for training, and data from month 14 for validation, and so on. This rigorously simulates a progressive prediction scenario in reality.
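The rolling-window division can be sketched as a generator of (training, validation) period indices. The default sizes follow the example in the text (12 months of training, 1 month of validation); the function name and the use of 1-based month indices are illustrative choices.

```python
def rolling_window_splits(n_periods, train_size=12, val_size=1):
    """Yield (train_periods, val_periods) as lists of 1-based period indices.

    Every validation period lies strictly after its training window.
    """
    start = 1
    while start + train_size + val_size - 1 <= n_periods:
        train = list(range(start, start + train_size))
        val = list(range(start + train_size, start + train_size + val_size))
        yield train, val
        start += 1  # move the whole window forward by one period

splits = list(rolling_window_splits(n_periods=15))
```

With 15 months of data this yields three splits: months 1-12 with month 13, months 2-13 with month 14, and months 3-14 with month 15.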
[0083] Then, grid search or Bayesian optimization is used to search for the key parameters of multiple XGBoost models on the training and validation sets, resulting in multiple target XGBoost models. These methods systematically search within a predefined parameter space, such as the maximum tree depth, learning rate, and number of weak learners. The validation set obtained in the previous step is used to evaluate the performance of different parameter combinations, thereby finding the optimal or near-optimal parameter configuration for each model on its corresponding data slice. The objective function of the XGBoost model includes L1 and L2 regularization terms. L1 regularization helps with feature selection, generating sparse solutions; L2 regularization constrains model weights, preventing them from becoming excessively large. Together, they control model complexity, improve generalization ability, and prevent the model from merely "memorizing" noise in the training data. Through these steps, multiple target XGBoost models with different hyperparameter combinations are ultimately obtained. This diversity stems from different initial parameters or from optimal solutions found at different stages of the rolling window.
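For reference, the regularized objective that XGBoost minimizes can be written as follows. The notation is the standard one from the XGBoost documentation rather than from this application: l is the training loss, f_k is the k-th tree, T is its number of leaves, w_j are its leaf weights, and gamma, alpha, and lambda correspond to the `gamma`, `reg_alpha` (L1), and `reg_lambda` (L2) parameters.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \alpha \sum_{j=1}^{T} \lvert w_j \rvert + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}
```

The alpha term pushes individual leaf weights toward zero, while the lambda term shrinks all leaf weights smoothly, which is the division of roles between L1 and L2 regularization described above.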
[0084] It should be noted that, considering the characteristics of electrolytic aluminum production data, such as strong time correlation, high feature dimensionality, and uneven sample distribution, this application has made the following optimizations to the XGBoost model: First, a time-series rolling training mechanism is introduced into the training strategy. Since production data has obvious time-series characteristics, in order to avoid future information leakage, the training set and validation set are divided according to time sequence during the model training process, and a rolling window approach is used to perform multiple training and validations, so that the model can better adapt to the dynamic changes in the production process.
[0085] Secondly, regarding model structure, L1 and L2 regularization terms are introduced to control model complexity, and key parameters are optimized, including tree depth (max_depth), learning rate (learning_rate), number of weak learners (n_estimators), and feature sampling ratio (colsample_bytree). Optimal parameter combinations are automatically found through grid search or Bayesian optimization methods to improve model generalization ability and reduce overfitting risk.
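An exhaustive grid search over the four named parameters can be sketched as below. The candidate values in `PARAM_GRID` are made up for illustration, and `toy_evaluate` is a hypothetical stand-in for "train an XGBoost model with these parameters on the rolling-window training set and return its validation error".

```python
from itertools import product

# Illustrative candidate values only; the source does not specify a search space.
PARAM_GRID = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 400],
    "colsample_bytree": [0.7, 1.0],
}

def grid_search(param_grid, evaluate):
    """Try every combination; return (best_params, best_score), lower score is better."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_evaluate(params):
    """Made-up error surface with its minimum at a known point, for demonstration."""
    return (abs(params["max_depth"] - 5)
            + 10 * abs(params["learning_rate"] - 0.1)
            + abs(params["n_estimators"] - 400) / 100
            + abs(params["colsample_bytree"] - 0.7))

best_params, best_score = grid_search(PARAM_GRID, toy_evaluate)
```

The grid above has 24 combinations; Bayesian optimization, the alternative named in the text, would replace the exhaustive loop with a search guided by earlier results.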
[0086] LightGBM (Light Gradient Boosting Machine) is a high-efficiency machine learning algorithm based on the gradient boosting framework. It uses a leaf-wise strategy to construct decision trees, which has higher training efficiency and stronger non-linear fitting ability compared to traditional GBDT.
[0087] Specifically, the leaf-wise growth strategy is constrained by setting parameters such as the number of leaf nodes, maximum tree depth, and minimum number of samples per leaf node. This prevents the model from growing into overly complex and deep trees due to excessive pursuit of split gain, thus effectively controlling the risk of overfitting while maintaining its high efficiency. The number of leaf nodes limits the maximum number of leaf nodes in a single decision tree, which is a key parameter for controlling tree complexity; the maximum tree depth limits the maximum depth of a single decision tree to prevent the model from overgrowing and learning overly specific rules that may just be noise; the minimum number of samples per leaf node specifies the minimum number of data samples that a leaf node must contain, ensuring that each decision node has sufficient statistical support and avoiding decisions based on a very small number of outliers.
[0088] Next, the high-dimensional feature set is discretized using the histogram binning algorithm. Then, through feature sampling and sample sampling mechanisms, multiple LightGBM models are trained using a constrained leaf-wise growth strategy and the discretized high-dimensional feature set, resulting in multiple target LightGBM models. The histogram binning algorithm discretizes continuous floating-point feature values into integer buckets. Traditional algorithms need to examine every unique value of each feature as a potential split point, resulting in high computational cost. With histogram binning, the decision tree construction process no longer traverses all possible split points but instead finds the optimal split point from the histogram of gradient statistics (first and second derivatives) accumulated over these buckets. This significantly reduces the number of split points that need to be evaluated, reduces memory consumption, and improves training speed, and is especially suitable for high-dimensional feature data.
[0089] Feature sampling involves randomly selecting a subset of all features during the construction of each decision tree to find the optimal split point. Sample sampling involves randomly drawing a subset of the training data at a certain frequency to train the next tree. These two mechanisms introduce randomness into the training of each tree, increasing the diversity between trees, similar to the idea of ensemble learning, thereby improving the overall model's generalization ability and stability. Through this configuration, multiple target LightGBM models are trained, and their diversity may stem from different sampling ratios, growth constraint parameters, etc.
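The idea behind histogram binning can be sketched in a few lines. This is a simplified illustration, not LightGBM's actual binning procedure: it uses plain quantile edges, and all function names and values are hypothetical.

```python
from bisect import bisect_right

def quantile_bin_edges(values, n_bins):
    """Pick up to n_bins - 1 increasing edges so that bins hold roughly equal counts."""
    s = sorted(values)
    edges = []
    for k in range(1, n_bins):
        edge = s[(k * len(s)) // n_bins]
        if not edges or edge > edges[-1]:  # skip duplicate edges
            edges.append(edge)
    return edges

def to_bins(values, edges):
    """Replace each float by the integer index of the bucket it falls into."""
    return [bisect_right(edges, v) for v in values]

def gradient_histogram(bin_ids, grads, hessians, n_bins):
    """Accumulate first- and second-order gradient sums per bucket."""
    g_sum, h_sum = [0.0] * n_bins, [0.0] * n_bins
    for b, g, h in zip(bin_ids, grads, hessians):
        g_sum[b] += g
        h_sum[b] += h
    return g_sum, h_sum

# Made-up feature column with 8 distinct values, reduced to 4 buckets.
values = [0.1, 0.4, 0.2, 0.9, 0.7, 0.5, 0.3, 0.8]
edges = quantile_bin_edges(values, n_bins=4)
bins = to_bins(values, edges)
g_hist, h_hist = gradient_histogram(bins, [1.0] * 8, [1.0] * 8, n_bins=4)
```

Split search then only has to consider the three bucket boundaries instead of the seven thresholds between the eight distinct raw values, and it works from the per-bucket sums rather than from individual samples.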
[0090] It should be noted that, considering the characteristics of electrolytic aluminum production data, this application has made the following optimizations to the LightGBM model: First, by limiting parameters such as the number of leaf nodes (num_leaves), the maximum tree depth (max_depth), and the minimum number of samples in a leaf node (min_data_in_leaf), the leaf-wise growth strategy is constrained, thereby avoiding overfitting of the model.
[0091] Secondly, the LightGBM histogram-based algorithm is used to bin the continuous variables, thereby reducing the number of candidate split points and improving training efficiency. Simultaneously, the model's generalization ability is further enhanced by combining feature sampling and bagging sampling mechanisms.
[0092] In addition, to enhance the model's focus on key operating states, higher weights are assigned to some key operating condition data during training, enabling the model to more effectively learn the characteristic patterns of abnormal fluctuation stages and important operating states.
[0093] Finally, the multiple target XGBoost models and multiple target LightGBM models are integrated as the multiple gradient boosting decision tree models. A single machine learning model, no matter how well tuned, may have inherent blind spots or be sensitive to specific data distributions. By integrating multiple XGBoost models and multiple LightGBM models, when one model fails to predict under certain conditions due to its limitations, the other models can play a corrective and balancing role, making the output of the entire system more stable and reliable and reducing the prediction risk caused by occasional errors of individual models.
[0094] Although XGBoost and LightGBM belong to the gradient boosting decision tree family, they differ in implementation details and optimization priorities. For example, XGBoost is extremely rigorous in regularization control, while LightGBM has advantages in training speed and memory efficiency, and its leaf-wise growth strategy may capture different data splitting patterns. Integrating the advantageous versions of these two algorithms is equivalent to fusing a more conservative and robust understanding of the data with a more efficient and in-depth perspective on data segmentation. This fusion allows the model ensemble to uncover the complex relationships between features and targets from more diverse perspectives, thereby making more comprehensive and accurate integrated judgments.
[0095] Furthermore, each model learns slightly different data patterns during training due to the randomness of data sampling, feature sampling, or parameter settings. By aggregating the predictions of these models, the prediction variance of individual models can be effectively smoothed. This means that the overall prediction after ensemble is usually closer to the true value and has less fluctuation than the prediction of any single model, thus achieving higher average accuracy and more stable performance on the test set or in real-world applications.
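The variance-smoothing argument can be made concrete under a simplifying assumption that is not stated in this application: M models are averaged with equal weights, their prediction errors share the same variance σ², and every pair of models has the same error correlation ρ. Under that assumption:

```latex
\operatorname{Var}\!\left(\frac{1}{M}\sum_{m=1}^{M}\hat{y}_m\right)
  = \frac{1}{M^{2}}\Bigl[M\sigma^{2} + M(M-1)\,\rho\,\sigma^{2}\Bigr]
  = \rho\,\sigma^{2} + \frac{1-\rho}{M}\,\sigma^{2}
```

For ρ < 1 this is smaller than σ², and the reduction is largest when the models are weakly correlated. This is why diversity from data sampling, feature sampling, and parameter settings matters for the ensemble.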
[0096] When selecting a suitable algorithm, it is necessary to comprehensively consider factors such as the characteristics of the data, the requirements of the task, and the complexity of the model. Given the special characteristics of time-series data, this application prioritizes models that can effectively capture time dependencies and have high interpretability. Therefore, tree models suitable for time-series data, such as Gradient Boosting Trees (GBDT) and XGBoost, are chosen. These algorithms have strong performance capabilities and provide relatively good interpretability. Compared with the backpropagation and multi-layered structures of neural networks, tree models are more computationally efficient, especially suitable for small to medium-sized datasets. Tree models can be used for interpretability analysis through feature importance evaluation and path tracing of single decision trees, facilitating understanding and optimization. Furthermore, tree models can handle missing values well, without the need for complex data preprocessing required by deep learning models. Although neural networks perform exceptionally well in certain tasks, tree models are more suitable in this application. With their lower computational cost, strong interpretability, good real-time prediction capabilities, and simple and efficient tuning process, tree models can effectively support prediction tasks in production and provide a reliable basis for business decisions.
[0097] 208. The deep time series prediction initial model is trained using a high-dimensional feature set to obtain the deep time series prediction model.
[0098] In this embodiment, data with temporal order is extracted from the high-dimensional feature set to form a time series, and a sliding time window method is used to construct a training sample set from it. This method uses a fixed-length historical window as the input feature, such as data from the past 30 days, and uses the fluoride salt feed amount at one or more time points after this window as the prediction target. By sliding the window along the entire time axis, a large number of "history-future" sample pairs are generated, providing structured data from which the model can learn temporal causal relationships.
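A minimal sketch of the sliding time window construction follows, assuming the series is already sorted chronologically and has no gaps. The function name `make_windows` and the default `horizon` are hypothetical; the 30-step window mirrors the example in the text.

```python
def make_windows(series, window=30, horizon=1):
    """Build (history, future) sample pairs from a chronologically ordered series.

    history: `window` consecutive values used as model input
    future:  the next `horizon` values used as the prediction target
    """
    samples = []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        history = series[start:start + window]
        future = series[start + window:start + window + horizon]
        samples.append((history, future))
    return samples
```

A series of length N yields N - window - horizon + 1 pairs, and none when the series is shorter than one window plus one horizon.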
[0099] The DLinear model and the Transformer model are each trained using the training sample set to obtain the target DLinear model and the target Transformer model. The objective functions of both models incorporate L1 and L2 regularization terms to constrain model complexity. L1 regularization facilitates feature selection, encouraging the model to focus on key signals; L2 regularization controls model complexity by penalizing large weight parameters. This shared strategy reduces the risk of overfitting the training data, so that the learned patterns generalize better and can be applied more reliably to future data. DLinear training decomposes the input sequence into a trend term and a seasonal term, maps each through a linear layer, and then sums the two outputs. Transformer training uses a self-attention mechanism to capture global dependencies between any two time steps in the sequence.
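The DLinear decomposition step can be sketched in plain Python. This assumes a moving-average trend with edge replication, a common choice in DLinear implementations; the kernel size and the weights are placeholders that would be learned or tuned in practice, and all function names are hypothetical.

```python
def moving_average(x, kernel=5):
    """Trend term: centred moving average with edge replication (odd kernel)."""
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd")
    pad = kernel // 2
    padded = [x[0]] * pad + list(x) + [x[-1]] * pad
    return [sum(padded[i:i + kernel]) / kernel for i in range(len(x))]


def decompose(x, kernel=5):
    """Split a window into a trend term and a seasonal (remainder) term."""
    trend = moving_average(x, kernel)
    seasonal = [v - t for v, t in zip(x, trend)]
    return trend, seasonal


def linear_head(history, weights, bias=0.0):
    """One-step forecast as a weighted sum over the history window."""
    return sum(w * h for w, h in zip(weights, history)) + bias


def dlinear_forecast(history, w_trend, w_seasonal, kernel=5):
    """Map trend and seasonal terms through separate linear heads, then add."""
    trend, seasonal = decompose(history, kernel)
    return linear_head(trend, w_trend) + linear_head(seasonal, w_seasonal)
```

Because the seasonal term is defined as the remainder, trend plus seasonal always reconstructs the input window exactly.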
[0100] Finally, the target DLinear model and the target Transformer model are used together as the deep time series prediction model, so that their advantages complement each other. The DLinear model has a simple structure and high parameter efficiency, and is particularly good at capturing deterministic long-term trends and periodic changes in a time series; the Transformer model, with its self-attention mechanism, can model complex, nonlinear, long-distance dynamic dependencies in the sequence. Combining the two enables the deep time series prediction component to cover a range of temporal patterns, from linear trends to complex dynamics.
[0101] It should be noted that, in order to more deeply characterize the long-term time dependencies and dynamic evolution patterns in the electrolysis production process, this application introduces a deep time series prediction model into the model system, encompassing the DLinear and Transformer series models. The DLinear model effectively captures long-term trends in the production process by decomposing the time series into trend and seasonal components and performing linear modeling on each. The Transformer series models, on the other hand, employ a self-attention mechanism to perform global modeling on different time steps in the sequence, thereby learning the complex time dependencies in historical operating data.
[0102] During model training, derived features and historical time-series data are used as input to the model. Training samples are constructed using a sliding time window, enabling the model to learn production change patterns at different time scales. Simultaneously, regularization strategies and early stopping mechanisms are employed to control the model training process and avoid overfitting issues in deep models. The prediction results output by the deep time-series model can be used as auxiliary prediction results, fused with or compared with the prediction results of the tree model, thereby further improving the overall prediction accuracy and stability.
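The early stopping mechanism mentioned above can be sketched as a patience rule on the validation loss. The application does not specify the stopping criterion, so `patience`, `min_delta`, and the function name are assumptions.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    val_losses = list(val_losses)
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1
```

In a real training loop the model parameters saved at `best_epoch` would be restored after stopping.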
[0103] 209. Combine multiple gradient boosting decision tree models and deep time series prediction models to form a hybrid prediction model system.
[0104] In this embodiment, to further improve the prediction accuracy and stability of the model in the electrolytic aluminum production scenario, targeted optimizations are performed based on the characteristics of industrial time-series data, in addition to standard machine learning algorithms. Simultaneously, a prediction model system is constructed, consisting of two XGBoost models with different parameter configurations, two LightGBM models with different parameter configurations, and a deep time-series model. Overall prediction performance is improved by optimizing the model structure, training strategy, and employing model fusion.
[0105] This application proposes a systematic solution to common challenges in the industry, overcoming the limitations of traditional static modeling methods. By constructing a closed-loop system of "perception-cognition-decision," it significantly improves the scientific nature of production planning and the accuracy of material control, providing an innovative technical path for the intelligent upgrading of the electrolytic aluminum industry. Furthermore, the system has been validated in actual production environments for its comprehensive advantages in stabilizing cell conditions, reducing material consumption, and improving energy efficiency, demonstrating significant industry-wide application value. This application leverages a cross-system data governance framework to integrate multi-source data, achieving data flow connectivity across multiple systems such as DCS, MES, and EMS, and constructing a comprehensive feature library including material and energy balance. By integrating mechanistic processes with machine learning algorithms, it effectively improves model accuracy, enabling more precise electrolytic cell control. This application has created a large-scale AI computational digital twin model platform for automated electrolytic production control, forming a data-driven management and control model for electrolytic aluminum production that integrates data analysis, indicator management, process management, production standard maintenance, electrolytic cell rating, execution strategies, tracking analysis, and optimization iteration.
[0106] 210. Input the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results. Based on the validation set performance index of each model in the hybrid prediction model system, the multiple preliminary prediction results are weighted and fused to obtain the predicted value of the fluoride salt feed amount for the day to be predicted.
[0107] In this embodiment, the feature data of the day to be predicted is input into each model in the hybrid prediction model system to obtain the preliminary predicted value of the fluoride salt feed amount output by each model. For each model, the fusion weight corresponding to the model is calculated based on the model's validation set performance index, and the model's preliminary predicted value is multiplied by that fusion weight to obtain the model's weighted predicted value. The validation set performance index is evaluated in advance on an independent historical validation set. Because this validation set never participates in model parameter updates during training, the evaluation is objective. Commonly used indexes are the root mean square error (RMSE) and the mean absolute error (MAE), which quantify the average deviation between the model's predicted values and the true values. Finally, the weighted predicted values of the multiple models in the hybrid prediction model system are aggregated to obtain the predicted value of the fluoride salt feed amount for the day to be predicted.
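One possible form of the weighted fusion is sketched below. The application does not give the formula that turns a validation index into a fusion weight, so the inverse-error normalization is an assumption, and the function names are hypothetical.

```python
def fusion_weights(val_errors):
    """Assumed rule: weight each model by the inverse of its validation error
    (e.g. RMSE or MAE), normalised so that the weights sum to 1."""
    if any(err <= 0 for err in val_errors.values()):
        raise ValueError("validation errors must be positive")
    inverse = {name: 1.0 / err for name, err in val_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def fuse(predictions, weights):
    """Sum of each model's preliminary prediction times its fusion weight."""
    return sum(weights[name] * predictions[name] for name in predictions)
```

With this rule, a model whose validation error is half that of another receives twice its weight, so better-validated models dominate the fused prediction without fully silencing the rest.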
[0108] This application provides a method for predicting the amount of fluoride salt fed into electrolytic aluminum based on a regression model. Compared with the prior art, this application obtains multi-source raw time-series data from the electrolytic aluminum production process, preprocesses the multi-source raw time-series data to obtain a basic feature dataset, wherein the preprocessing includes time series alignment, data dimension standardization, and missing value handling; constructs multiple types of derived features based on the basic feature dataset, integrates and filters these derived features to obtain a high-dimensional feature set, wherein the multiple derived features include lag features, rolling statistical features, difference features, periodic features, and interaction features; trains multiple gradient boosting decision tree models and deep time-series prediction models using the high-dimensional feature set to form a hybrid prediction model system; inputs the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results, and weights and fuses the multiple preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the amount of fluoride salt fed into the day to be predicted. By preprocessing multi-source data and constructing five types of derived features in parallel, this method accurately adapts to the characteristics of time-series data in the electrolytic aluminum industry. Simultaneously, it employs a hybrid prediction architecture combining gradient boosting decision trees and deep time-series models. This ensures model interpretability while efficiently capturing the nonlinear coupling relationship between long-term time-series dependencies and process parameters. 
Furthermore, a weighted fusion strategy based on model validation set performance significantly improves the accuracy and robustness of fluoride feed rate prediction. Compared to traditional manual experience-based methods, this application achieves standardization, datafication, and precision in production control. It stabilizes electrolytic cell operating conditions, reduces fluoride salt consumption and energy consumption, eliminates reliance on experienced process engineers, and its modular design facilitates integration with existing production systems.
[0109] Furthermore, as a specific implementation of the method shown in Figure 1, this application provides a device for predicting the amount of electrolytic aluminum fluoride salt feed based on a regression model. As shown in Figure 3, the device includes: a preprocessing module 301, a feature construction module 302, a model training module 303, and a prediction module 304.
[0110] Preprocessing module 301 is used to acquire multi-source raw time series data in the electrolytic aluminum production process, preprocess the multi-source raw time series data to obtain a basic feature dataset, the preprocessing includes time series alignment, data unit standardization, and missing value handling; The feature construction module 302 is used to construct multiple types of derived features based on the basic feature dataset, integrate and filter the multiple types of derived features to obtain a high-dimensional feature set, wherein the multiple types of derived features include lag features, rolling statistical features, difference features, periodic features and interaction features; Model training module 303 is used to train multiple gradient boosting decision tree models and deep time series prediction models using the high-dimensional feature set, forming a hybrid prediction model system; The prediction module 304 is used to input the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results. Based on the validation set performance index of each model in the hybrid prediction model system, the multiple preliminary prediction results are weighted and fused to obtain the predicted value of the fluoride salt feeding amount for the day to be predicted.
[0111] In a specific application scenario, the preprocessing module 301 is used to acquire raw time-series data from multiple independent business systems during the electrolytic aluminum production process, forming the multi-source raw time-series data. The multiple independent business systems include a cell control system, a testing system, a manual measurement system, and a production execution system. The multi-source raw time-series data is collected using the electrolytic cell number and timestamp as key indexes. The multi-source raw time-series data is grouped according to the electrolytic cell number to obtain the time-series data for each electrolytic cell, and the time-series data for each electrolytic cell is sorted in ascending order according to the timestamp to obtain the target time-series data for each electrolytic cell. For each electrolytic cell, operational feature data and aluminum fluoride feed amounts for multiple dates are extracted from the target time-series data of the electrolytic cell. Supervised learning samples for multiple dates are constructed using the operational feature data and aluminum fluoride feed amounts for these multiple dates. The supervised learning samples for day t include the operational feature data for day t-1 and the aluminum fluoride feed amount for day t. The supervised learning samples for these multiple dates are then subjected to data dimension standardization to obtain the supervised learning sample set for the electrolytic cell. A differentiated processing strategy is then applied to the supervised learning sample set for each electrolytic cell to obtain the basic feature dataset.
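The grouping, sorting, and pairing of day t-1 features with the day t feed amount can be sketched as follows. The record layout (keys `cell_id`, `date`, `features`, `alf3_feed`) is a hypothetical schema, and the sketch assumes one record per cell per day with no missing days.

```python
from collections import defaultdict


def build_supervised_samples(records):
    """Group records by cell, sort by date, and pair the operational features
    of day t-1 with the aluminum fluoride feed amount of day t.

    Returns {cell_id: [(features_of_previous_day, feed_of_current_day), ...]}.
    """
    by_cell = defaultdict(list)
    for rec in records:
        by_cell[rec["cell_id"]].append(rec)

    samples = {}
    for cell_id, rows in by_cell.items():
        rows.sort(key=lambda r: r["date"])  # ascending timestamp order
        samples[cell_id] = [
            (prev["features"], cur["alf3_feed"])
            for prev, cur in zip(rows, rows[1:])
        ]
    return samples
```

ISO-formatted date strings sort correctly as plain strings. If days can be missing, each pair would also need a check that the two dates are adjacent.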
[0112] In a specific application scenario, the preprocessing module 301 is used to, for the supervised learning sample set of each electrolytic cell: delete a supervised learning sample from the supervised learning sample set when it is determined that the supervised learning sample is missing key process parameters; fill the supervised learning sample set using a time series filling method when it is determined that non-key process parameters in the supervised learning sample set have scattered missing points and the missing proportion is less than or equal to a first threshold, wherein the time series filling method is either forward filling or backward filling; fill the supervised learning sample set using a statistical filling method when it is determined that non-key process parameters in the supervised learning sample set have random missing points and the missing proportion is less than or equal to a second threshold, wherein the statistical filling method is any one of mean filling, median filling, and mode filling; fill the supervised learning sample set using an interpolation filling method when it is detected that non-key process parameters in the supervised learning sample set are missing over a continuous time period, wherein the interpolation filling method is any one of linear interpolation, polynomial interpolation, and spline interpolation; and fill the supervised learning sample set using the K-nearest neighbor algorithm or a regression model when it is detected that the importance level of a feature parameter in the supervised learning sample set is greater than or equal to a preset importance level and the missing proportion is greater than or equal to a third threshold.
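Two of the filling methods named above, forward filling and linear interpolation, can be sketched in plain Python with `None` marking a missing value. The function names are hypothetical, and the thresholds that decide which method applies are not disclosed, so only the missing proportion itself is computed.

```python
def missing_ratio(values):
    """Proportion of missing (None) entries in a column."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def forward_fill(values):
    """Replace each missing value with the most recent observed value.
    Leading gaps stay None because no earlier value exists."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def linear_interpolate(values):
    """Fill interior runs of missing values on a straight line between the
    nearest observed neighbours. Leading and trailing gaps stay None."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            out[i] = out[left] + (out[right] - out[left]) * (i - left) / gap
    return out
```

Forward filling suits scattered single gaps, while interpolation gives smoother values across a continuous missing period, which matches the distinction drawn in the text.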
[0113] In specific application scenarios, the feature construction module 302 is used to construct multiple types of derived features in parallel based on the basic feature dataset, and to horizontally concatenate the multiple types of derived features with the basic feature dataset to obtain an extended feature matrix; the extended feature matrix is then filtered using a feature importance evaluation method and a recursive feature elimination method to obtain the high-dimensional feature set.
[0114] In a specific application scenario, the feature construction module 302 is used to extract the supervised learning sample set for each electrolytic cell from the basic feature dataset; for each electrolytic cell, select multiple target feature variables from the supervised learning sample set of the electrolytic cell; for each target feature variable, shift the time series values of the target feature variable backward along the time axis by multiple preset unit time intervals to obtain multiple time lag features corresponding to the target feature variable, wherein the multiple preset unit time intervals include 1 unit time interval, 2 unit time intervals, 3 unit time intervals, and 4 unit time intervals; add the multiple time lag features corresponding to each target feature variable to the supervised learning sample set of the electrolytic cell to obtain the lag features of the electrolytic cell; select at least one key continuous variable from the supervised learning sample set of the electrolytic cell; calculate a rolling statistic for each key continuous variable within a sliding window of a preset time length, and add the rolling statistic corresponding to each key continuous variable to the supervised learning sample set of the electrolytic cell to obtain the rolling statistical features of the electrolytic cell, wherein the rolling statistic includes the rolling mean, rolling standard deviation, rolling maximum value, and rolling minimum value; for each key continuous variable, calculate the first-order difference feature sequence of the key continuous variable, wherein the first-order difference feature sequence includes first-order difference feature values at multiple time points and, for a given time point, the first-order difference feature value is equal to the difference between the parameter value of the key continuous variable at that time point and its parameter value at the previous adjacent time point; based on the first-order difference feature sequence of the key continuous variable, calculate the second-order difference feature sequence of the key continuous variable, wherein the second-order difference feature sequence includes second-order difference feature values at multiple time points and, for a given time point, the second-order difference feature value is equal to the difference between the first-order difference feature value of the key continuous variable at that time point and its first-order difference feature value at the previous adjacent time point; add the first-order difference feature sequence and the second-order difference feature sequence of each key continuous variable to the supervised learning sample set of the electrolytic cell to obtain the difference features of the electrolytic cell; extract at least one time element from the time field of the supervised learning sample set, encode each time element using sine and cosine functions to obtain the time feature code corresponding to each time element, and add the time feature code corresponding to each time element to the supervised learning sample set of the electrolytic cell to obtain the periodic features of the electrolytic cell; select multiple key process variables from the supervised learning sample set of the electrolytic cell, including molecular ratio, working voltage, resistance, aluminum fluoride feed rate, and alumina feed rate; combine the multiple key process variables through preset mathematical operations to generate multiple interaction feature variables, wherein each interaction feature variable is any one of the following: the product of the molecular ratio and the working voltage, the ratio of the working voltage to the resistance, and the ratio of the aluminum fluoride feed rate to the alumina feed rate; and add the multiple interaction feature variables to the supervised learning sample set of the electrolytic cell to obtain the interaction features of the electrolytic cell.
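The five types of derived features can each be sketched in a few lines of plain Python. The sketch assumes one chronologically ordered list of values per variable per cell, uses `None` where a feature is undefined, and uses hypothetical field names (`molecular_ratio`, `voltage`, `resistance`, `alf3_feed`, `alumina_feed`).

```python
import math


def lag(values, k):
    """Lag feature: the value observed k time units earlier."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rolling(values, window, stat):
    """Rolling statistic over a trailing window; None until the window is full."""
    return [
        stat(values[i - window + 1:i + 1]) if i >= window - 1 else None
        for i in range(len(values))
    ]


def diff(values):
    """Difference feature: current value minus the previous adjacent value.
    Applying diff twice gives the second-order difference."""
    return [
        None if i == 0 or values[i] is None or values[i - 1] is None
        else values[i] - values[i - 1]
        for i in range(len(values))
    ]


def cyclic_encode(value, period):
    """Periodic feature: sine/cosine code of a time element, e.g. month of 12."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def interactions(row):
    """Interaction features named in the text, from one sample's variables."""
    return {
        "ratio_x_voltage": row["molecular_ratio"] * row["voltage"],
        "voltage_per_resistance": row["voltage"] / row["resistance"],
        "alf3_per_alumina": row["alf3_feed"] / row["alumina_feed"],
    }
```

For the rolling mean, standard deviation, maximum, and minimum, `stat` could be `statistics.mean`, `statistics.stdev`, `max`, and `min`. The sine/cosine code places the last and first values of a cycle next to each other, which a raw integer encoding would not.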
[0115] In a specific application scenario, the model training module 303 is used to obtain an initial gradient boosting decision tree model and an initial deep temporal prediction model; train multiple initial gradient boosting decision tree models using the high-dimensional feature set to obtain multiple gradient boosting decision tree models, wherein the initial gradient boosting decision tree models are XGBoost models or LightGBM models; train the initial deep temporal prediction model using the high-dimensional feature set to obtain the deep temporal prediction model, wherein the initial deep temporal prediction model includes DLinear models and Transformer models; and combine the multiple gradient boosting decision tree models and the deep temporal prediction model to form the hybrid prediction model system.
[0116] In specific application scenarios, the model training module 303 is used to dynamically divide the high-dimensional feature set into a training set and a validation set in chronological order using a rolling window. The training set and the validation set are used to optimize the key parameters of multiple XGBoost models using grid search or Bayesian optimization methods, resulting in multiple target XGBoost models. The objective function of each XGBoost model includes L1 and L2 regularization terms, and the model parameters of the multiple target XGBoost models are different. By setting parameters such as the number of leaf nodes, maximum tree depth, and minimum number of samples per leaf node, the Leaf-wise growth strategy is constrained. The high-dimensional feature set is discretized using a histogram binning algorithm. Multiple LightGBM models are trained using the constrained Leaf-wise growth strategy and the discretized high-dimensional feature set through feature sampling and sample sampling mechanisms, resulting in multiple target LightGBM models. These multiple target XGBoost models and multiple target LightGBM models are then used as multiple gradient boosting decision tree models.
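The chronological rolling-window division into training and validation sets can be sketched as an index generator. The window sizes and step are not disclosed in this application, so `train_size`, `val_size`, `step`, and the function name are assumptions.

```python
def rolling_window_splits(n_samples, train_size, val_size, step=None):
    """Yield (train_indices, val_indices) pairs in chronological order.

    Each validation block directly follows its training block, so no future
    sample leaks into training. The window advances by `step` samples
    (default: `val_size`).
    """
    step = step or val_size
    start = 0
    while start + train_size + val_size <= n_samples:
        train = range(start, start + train_size)
        val = range(start + train_size, start + train_size + val_size)
        yield train, val
        start += step
```

During grid search or Bayesian optimization, each candidate parameter set could be scored by its average validation error across these splits.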
[0117] In a specific application scenario, the model training module 303 is used to extract time series data from the high-dimensional feature set, construct a training sample set using the time series data through a sliding time window method, train the DLinear model and the Transformer model using the training sample set respectively to obtain a target DLinear model and a target Transformer model, wherein the objective functions of the DLinear model and the Transformer model are configured with L1 regularization terms and L2 regularization terms, and the target DLinear model and the target Transformer model are used as the deep time series prediction model.
[0118] In a specific application scenario, the prediction module 304 is used to input the feature data of the day to be predicted into each model in the hybrid prediction model system to obtain the preliminary predicted value of the fluoride salt feed amount output by each model; for each model, calculate the fusion weight corresponding to the model based on the validation set performance index of the model, and multiply the preliminary predicted value of the model by the fusion weight corresponding to the model to obtain the weighted predicted value of the model; and aggregate the weighted predicted values of the multiple models in the hybrid prediction model system to obtain the predicted value of the fluoride salt feed amount for the day to be predicted.
[0119] This application provides an apparatus that, compared with the prior art, acquires multi-source raw time-series data from the electrolytic aluminum production process, preprocesses the multi-source raw time-series data to obtain a basic feature dataset, wherein the preprocessing includes time series alignment, data dimension standardization, and missing value handling; constructs multiple types of derived features based on the basic feature dataset, integrates and filters these derived features to obtain a high-dimensional feature set, wherein the multiple derived features include lag features, rolling statistical features, difference features, periodic features, and interaction features; trains multiple gradient boosting decision tree models and deep time-series prediction models using the high-dimensional feature set to form a hybrid prediction model system; inputs the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results, and weights and fuses the multiple preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the fluoride salt feeding amount for the day to be predicted. By preprocessing multi-source data and constructing five types of derived features in parallel, this method accurately adapts to the characteristics of time-series data in the electrolytic aluminum industry. Simultaneously, it employs a hybrid prediction architecture combining gradient boosting decision trees and deep time-series models. This ensures model interpretability while efficiently capturing the nonlinear coupling relationship between long-term time-series dependencies and process parameters. Furthermore, a weighted fusion strategy based on model validation set performance significantly improves the accuracy and robustness of fluoride feed rate prediction. 
Compared to traditional manual experience-based methods, this application achieves standardization, datafication, and precision in production control. It stabilizes electrolytic cell operating conditions, reduces fluoride salt consumption and energy consumption, eliminates reliance on experienced process engineers, and its modular design facilitates integration with existing production systems.
[0120] It should be noted that, for other descriptions of the functional units involved in the regression model-based device for predicting the amount of electrolytic aluminum fluoride salt feed provided in this embodiment, reference may be made to the corresponding descriptions of Figure 1 and Figure 2, which are not repeated here.
[0121] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties.
[0122] The technical features of the above embodiments can be combined in any way. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.
[0123] The embodiments described above are merely illustrative of several implementation methods of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of this patent application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this application should be determined by the appended claims.
[0124] In an exemplary embodiment, a computer device is also provided, comprising a bus, a processor, a memory, and a communication interface. It may also include an input / output interface and a display device, wherein the various functional units can communicate with each other via the bus. The memory stores a computer program, and the processor executes the program stored in the memory to perform the regression model-based method for predicting the amount of electrolytic aluminum fluoride salt to be fed in the above embodiment.
[0125] In an exemplary embodiment, a computer-readable storage medium is also provided, having a computer program stored thereon which, when executed by a processor, implements the steps of the regression model-based method for predicting the amount of electrolytic aluminum fluoride salt feed.
[0126] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented in hardware or by using software plus necessary general-purpose hardware platforms. Based on this understanding, the technical solution of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) and includes several instructions to cause a computer device (such as a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0127] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of a preferred embodiment, and the modules or processes shown in the drawings are not necessarily essential for implementing this application.
[0128] Those skilled in the art will understand that the modules in the apparatus of the implementation scenario can be distributed within the apparatus of the implementation scenario as described, or they can be located in one or more apparatuses different from this implementation scenario, with corresponding changes. The modules of the above-described implementation scenario can be combined into one module, or they can be further divided into multiple sub-modules.
[0129] The serial numbers in this application are for descriptive purposes only and do not represent the superiority or inferiority of the implementation scenario.
[0130] The above disclosures are only a few specific implementation scenarios of this application. However, this application is not limited to these. Any variations that can be conceived by those skilled in the art should fall within the protection scope of this application.
Claims
1. A method for predicting the amount of electrolytic aluminum fluoride salt feed based on a regression model, characterized in that the method comprises: acquiring multi-source raw time-series data from the electrolytic aluminum production process, and preprocessing the multi-source raw time-series data to obtain a basic feature dataset, wherein the preprocessing includes time series alignment, data unit standardization, and missing value handling; constructing multiple types of derived features based on the basic feature dataset, and integrating and filtering the derived features to obtain a high-dimensional feature set, wherein the derived features include lag features, rolling statistical features, difference features, periodic features, and interaction features; training multiple gradient boosting decision tree models and deep time series prediction models using the high-dimensional feature set to form a hybrid prediction model system; inputting the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results; and weighting and fusing the multiple preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the fluoride salt feeding amount for the day to be predicted.
2. The method according to claim 1, characterized in that acquiring the multi-source raw time-series data from the electrolytic aluminum production process and preprocessing the multi-source raw time-series data to obtain the basic feature dataset includes: acquiring raw time-series data from multiple independent business systems in the electrolytic aluminum production process to form the multi-source raw time-series data, wherein the multiple independent business systems include the cell control system, the testing system, the manual measurement system, and the production execution system, and the multi-source raw time-series data is collected using the electrolytic cell number and timestamp as key indexes; grouping the multi-source raw time-series data by electrolytic cell number to obtain the time-series data of each electrolytic cell, and sorting the time-series data of each electrolytic cell in ascending order of timestamp to obtain the target time-series data of each electrolytic cell; for each electrolytic cell, extracting multiple days of operating feature data and multiple days of aluminum fluoride feeding amount from the target time-series data of the electrolytic cell, and using them to construct supervised learning samples for multiple days, wherein the supervised learning sample for day t includes the operating feature data for day t-1 and the aluminum fluoride feeding amount for day t; standardizing the supervised learning samples for the multiple days to obtain the supervised learning sample set of the electrolytic cell; and processing the supervised learning sample set of each electrolytic cell using a differentiated processing strategy to obtain the basic feature dataset.
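The sample construction in claim 2 (group by cell, sort by timestamp, pair the features of day t-1 with the feed amount of day t) can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the name `build_supervised_samples`, the restriction to strictly consecutive days, and all numeric values are hypothetical and do not come from the application.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (cell_id, day_index, features, alf3_feed)
Record = Tuple[str, int, Dict[str, float], float]
Sample = Tuple[Dict[str, float], float]


def build_supervised_samples(records: List[Record]) -> Dict[str, List[Sample]]:
    """Group by cell, sort by day, and pair day t-1 features with day t feed."""
    by_cell: Dict[str, List[Record]] = {}
    for rec in records:
        by_cell.setdefault(rec[0], []).append(rec)

    samples: Dict[str, List[Sample]] = {}
    for cell_id, rows in by_cell.items():
        rows.sort(key=lambda r: r[1])  # ascending by timestamp
        pairs: List[Sample] = []
        for prev, curr in zip(rows, rows[1:]):
            # Assumption: only strictly consecutive days form a sample,
            # so a gap in the record never pairs unrelated days.
            if curr[1] - prev[1] == 1:
                pairs.append((prev[2], curr[3]))
        samples[cell_id] = pairs
    return samples


# Illustrative values only, deliberately out of order.
records = [
    ("cell_A", 2, {"voltage": 4.02}, 31.0),
    ("cell_A", 1, {"voltage": 4.00}, 30.0),
    ("cell_B", 1, {"voltage": 3.95}, 28.0),
    ("cell_A", 3, {"voltage": 4.05}, 33.0),
]
samples = build_supervised_samples(records)
```

Pairing yesterday's features with today's target keeps the target's own day out of the inputs, which is what allows the model to be used for next-day prediction.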
3. The method according to claim 2, characterized in that processing the supervised learning sample set of each electrolytic cell using the differentiated processing strategy to obtain the basic feature dataset includes, for the supervised learning sample set of each electrolytic cell: when a supervised learning sample in the set is detected to be missing a key process parameter, deleting that supervised learning sample from the set; when non-key process parameters in the set are detected to be missing in a scattered manner and the missing ratio is less than or equal to a first threshold, filling the set using a time series filling method, which is forward filling or backward filling; when non-key process parameters in the set are detected to be missing at random and the missing ratio is less than or equal to a second threshold, filling the set using a statistical filling method, which is any one of mean filling, median filling, or mode filling; when non-key process parameters in the set are detected to be missing over a continuous time period, filling the set using an interpolation filling method, which is any one of linear interpolation, polynomial interpolation, or spline interpolation; and when the importance level of a feature parameter in the set is detected to be greater than or equal to a preset importance level and its missing ratio is greater than or equal to a third threshold, filling the set using the K-nearest neighbor algorithm or a regression model.
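Three of the filling methods named in claim 3 (forward filling, median filling, and linear interpolation) can be illustrated with the minimal sketch below. The function names and the sample series are hypothetical. The rules for classifying a gap as scattered, random, or continuous, and the three thresholds, are not specified in the claim and are therefore not modeled here.

```python
from statistics import median
from typing import List, Optional

Series = List[Optional[float]]


def forward_fill(values: Series) -> Series:
    """Replace each gap with the most recent observation (leading gaps stay None)."""
    out: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def median_fill(values: Series) -> Series:
    """Replace every gap with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    m = median(observed)
    return [m if v is None else v for v in values]


def linear_interpolate(values: Series) -> Series:
    """Fill interior gaps on a straight line between neighbouring observations."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out  # leading and trailing gaps are left untouched


# Illustrative series with a two-point gap and a trailing gap.
temps: Series = [950.0, None, None, 956.0, None]
filled_forward = forward_fill(temps)
filled_median = median_fill(temps)
filled_linear = linear_interpolate(temps)
```

On this series the three methods give visibly different results for the same gap, which is why the claim ties the choice of method to the missing pattern.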
4. The method according to claim 1, characterized in that constructing the multiple types of derived features based on the basic feature dataset and integrating and filtering the derived features to obtain the high-dimensional feature set includes: constructing the multiple types of derived features in parallel based on the basic feature dataset; horizontally concatenating the multiple types of derived features with the basic feature dataset to obtain an extended feature matrix; and performing feature filtering on the extended feature matrix using a feature importance evaluation method and a recursive feature elimination method to obtain the high-dimensional feature set.
5. The method according to claim 4, characterized in that constructing the multiple types of derived features in parallel based on the basic feature dataset includes: extracting the supervised learning sample set of each electrolytic cell from the basic feature dataset; for each electrolytic cell, selecting multiple target feature variables from the supervised learning sample set of the electrolytic cell; for each target feature variable, shifting the time series values of the target feature variable backward along the time axis by multiple preset unit time intervals to obtain multiple time lag features corresponding to the target feature variable, wherein the multiple preset unit time intervals include 1, 2, 3, and 4 unit time intervals; adding the multiple time lag features corresponding to each target feature variable to the supervised learning sample set of the electrolytic cell to obtain the lag features of the electrolytic cell; selecting at least one key continuous variable from the supervised learning sample set of the electrolytic cell; calculating rolling statistics of each key continuous variable within a sliding window of a preset time length, and adding the rolling statistics corresponding to each key continuous variable to the supervised learning sample set of the electrolytic cell to obtain the rolling statistical features of the electrolytic cell, wherein the rolling statistics include the rolling mean, rolling standard deviation, rolling maximum, and rolling minimum; for each key continuous variable, calculating a first-order difference feature sequence of the key continuous variable, wherein the first-order difference feature sequence includes first-order difference feature values at multiple times, and the first-order difference feature value at a given time equals the value of the key continuous variable at that time minus its value at the immediately preceding time; calculating, based on the first-order difference feature sequence, a second-order difference feature sequence of the key continuous variable, wherein the second-order difference feature sequence includes second-order difference feature values at multiple times, and the second-order difference feature value at a given time equals the first-order difference feature value at that time minus the first-order difference feature value at the immediately preceding time; adding the first-order and second-order difference feature sequences of each key continuous variable to the supervised learning sample set of the electrolytic cell to obtain the difference features of the electrolytic cell; extracting at least one time element from the time field of the supervised learning sample set, encoding each time element using sine and cosine functions to obtain the time feature code corresponding to each time element, and adding the time feature codes to the supervised learning sample set of the electrolytic cell to obtain the periodic features of the electrolytic cell; selecting multiple key process variables from the supervised learning sample set of the electrolytic cell, wherein the key process variables include molecular ratio, operating voltage, resistance, aluminum fluoride feed amount, and alumina feed amount; combining the multiple key process variables through preset mathematical operations to generate multiple interaction feature variables, wherein each interaction feature variable is any one of the following: the product of the molecular ratio and the operating voltage, the ratio of the operating voltage to the resistance, and the ratio of the aluminum fluoride feed amount to the alumina feed amount; and adding the multiple interaction feature variables to the supervised learning sample set of the electrolytic cell to obtain the interaction features of the electrolytic cell.
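The five families of derived features in claim 5 can be illustrated with a short standard-library sketch. All function names and sample values below are hypothetical. The window length, the choice of time element, and the period used for the sine and cosine encoding are assumptions, since the claim leaves them as preset values.

```python
import math
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Values = List[Optional[float]]


def lag(values: Sequence[float], k: int) -> Values:
    """Value observed k steps earlier; the first k positions have no history."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rolling(values: Sequence[float], window: int,
            stat: Callable[[Sequence[float]], float]) -> Values:
    """Apply `stat` to each trailing window; None until the window is full."""
    return [stat(values[i + 1 - window:i + 1]) if i + 1 >= window else None
            for i in range(len(values))]


def diff(values: Sequence[Optional[float]]) -> Values:
    """Current value minus the previous one; None where either is missing."""
    out: Values = [None] * len(values)
    for i in range(1, len(values)):
        if values[i] is not None and values[i - 1] is not None:
            out[i] = values[i] - values[i - 1]
    return out


def cyclical(value: float, period: float) -> Tuple[float, float]:
    """Sine and cosine encoding of a periodic time element."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def ratio(numerators: Sequence[float], denominators: Sequence[float]) -> List[float]:
    """Element-wise ratio, e.g. operating voltage over resistance."""
    return [n / d for n, d in zip(numerators, denominators)]


# Illustrative values only; they are not process data.
signal = [10.0, 12.0, 15.0, 14.0]
other = [2.0, 4.0, 5.0, 7.0]

lag_1 = lag(signal, 1)
roll_mean = rolling(signal, 2, mean)
roll_max = rolling(signal, 2, max)
first_diff = diff(signal)
second_diff = diff(first_diff)          # difference of the first-order differences
month_sin, month_cos = cyclical(3, 12)  # assumption: month of year, period 12
interaction = ratio(signal, other)
```

Passing the statistic as a callable lets the same `rolling` helper produce the rolling mean, standard deviation (`statistics.stdev`), maximum, and minimum named in the claim.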
6. The method according to claim 1, characterized in that training the multiple gradient boosting decision tree models and the deep time series prediction models using the high-dimensional feature set to form the hybrid prediction model system includes: obtaining initial gradient boosting decision tree models and initial deep time series prediction models; training the multiple initial gradient boosting decision tree models using the high-dimensional feature set to obtain the multiple gradient boosting decision tree models, wherein each initial gradient boosting decision tree model is an XGBoost model or a LightGBM model; training the initial deep time series prediction models using the high-dimensional feature set to obtain the deep time series prediction models, which include a DLinear model and a Transformer model; and combining the multiple gradient boosting decision tree models and the deep time series prediction models to form the hybrid prediction model system.
7. The method according to claim 6, characterized in that training the multiple initial gradient boosting decision tree models using the high-dimensional feature set to obtain the multiple gradient boosting decision tree models includes: dynamically dividing the high-dimensional feature set into a training set and a validation set in chronological order using a rolling window; searching for and optimizing the key parameters of multiple XGBoost models on the training set and the validation set through grid search or Bayesian optimization, thereby training multiple target XGBoost models, wherein the objective function of each XGBoost model is configured with an L1 regularization term and an L2 regularization term, and the model parameters of the multiple target XGBoost models differ from one another; constraining the leaf-wise growth strategy by setting parameters including the number of leaf nodes, the maximum tree depth, and the minimum number of samples per leaf node; discretizing the high-dimensional feature set using a histogram binning algorithm; training multiple LightGBM models using the constrained leaf-wise growth strategy and the discretized high-dimensional feature set, with feature sampling and sample sampling mechanisms, to obtain multiple target LightGBM models; and using the multiple target XGBoost models and the multiple target LightGBM models as the multiple gradient boosting decision tree models.
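The chronological rolling-window division in claim 7 can be sketched as an index generator. The function name `rolling_window_splits` and the sizes used are hypothetical, and a fixed-length training window is an assumption, because the claim does not state whether the window slides or expands.

```python
from typing import Iterator, List, Tuple


def rolling_window_splits(
    n_samples: int, train_size: int, val_size: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs; validation always follows training in time."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train_size + val_size <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        val_idx = list(range(train_end, train_end + val_size))
        yield train_idx, val_idx
        start += step  # slide the whole window forward in time


# Assumption: samples are already sorted in ascending timestamp order.
splits = list(rolling_window_splits(n_samples=10, train_size=4, val_size=2, step=2))
```

Because every validation index is later than every training index in the same split, hyperparameter search on these splits does not use future data to predict the past.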
8. The method according to claim 6, characterized in that training the initial deep time series prediction models using the high-dimensional feature set to obtain the deep time series prediction models includes: extracting time series data from the high-dimensional feature set, and constructing a training sample set from the time series data using a sliding time window method; training the DLinear model and the Transformer model using the training sample set to obtain a target DLinear model and a target Transformer model, wherein the objective functions of the DLinear model and the Transformer model are each configured with an L1 regularization term and an L2 regularization term; and using the target DLinear model and the target Transformer model as the deep time series prediction models.
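The sliding time window method in claim 8 can be illustrated for a single variable as follows. The name `sliding_window_samples`, the input length, the forecast horizon, and the sample values are all hypothetical; the claim does not fix the window size or the number of steps predicted.

```python
from typing import List, Tuple

Window = Tuple[List[float], List[float]]


def sliding_window_samples(
    series: List[float], input_len: int, horizon: int = 1
) -> List[Window]:
    """Cut a series into (input window, target window) pairs, one per start position."""
    samples: List[Window] = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        x = series[start:start + input_len]
        y = series[start + input_len:start + input_len + horizon]
        samples.append((x, y))
    return samples


# Illustrative daily feed amounts, not process data.
feed = [30.0, 31.0, 33.0, 32.0, 34.0]
windows = sliding_window_samples(feed, input_len=3, horizon=1)
```

Each pair gives a sequence model an input window and the value or values that immediately follow it; a series shorter than `input_len + horizon` simply yields no samples.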
9. The method according to claim 1, characterized in that inputting the feature data of the day to be predicted into the hybrid prediction model system to obtain the multiple preliminary prediction results, and weighting and fusing the multiple preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the fluoride salt feed amount for the day to be predicted, includes: inputting the feature data of the day to be predicted into each model in the hybrid prediction model system to obtain the preliminary predicted value of the fluoride salt feed amount output by each model; for each model, calculating the fusion weight corresponding to the model based on the validation set performance index of the model, and multiplying the preliminary predicted value of the fluoride salt feed amount output by the model by the fusion weight corresponding to the model to obtain the weighted prediction value of the model; and summing the weighted prediction values of the multiple models in the hybrid prediction model system to obtain the predicted value of the fluoride salt feed amount for the day to be predicted.
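The weighted fusion in claim 9 can be sketched as below. The claim does not state how the validation set performance index is converted into a fusion weight, so normalized inverse validation error is used here purely as an assumption. The function names, model labels, error values, and predictions are hypothetical.

```python
from typing import Dict


def inverse_error_weights(val_errors: Dict[str, float]) -> Dict[str, float]:
    """Assumed scheme: weight proportional to 1 / validation error, normalized to sum to 1."""
    if any(err <= 0 for err in val_errors.values()):
        raise ValueError("validation errors must be positive")
    inverse = {name: 1.0 / err for name, err in val_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def fuse(predictions: Dict[str, float], weights: Dict[str, float]) -> float:
    """Multiply each preliminary prediction by its weight and sum the results."""
    return sum(predictions[name] * weights[name] for name in predictions)


# Illustrative validation errors (e.g. RMSE) and preliminary predictions.
val_rmse = {"xgboost": 1.0, "lightgbm": 2.0, "dlinear": 4.0}
preliminary = {"xgboost": 30.0, "lightgbm": 32.0, "dlinear": 36.0}

weights = inverse_error_weights(val_rmse)
fused_prediction = fuse(preliminary, weights)
```

With these numbers the weights are 4/7, 2/7, and 1/7, so the fused value lies closest to the prediction of the model with the lowest validation error.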
10. A device for predicting the amount of electrolytic aluminum fluoride salt feed based on a regression model, characterized in that the device comprises: a preprocessing module, used to acquire multi-source raw time-series data from the electrolytic aluminum production process and to preprocess the multi-source raw time-series data to obtain a basic feature dataset, wherein the preprocessing includes time series alignment, data unit standardization, and missing value handling; a feature construction module, used to construct multiple types of derived features based on the basic feature dataset and to integrate and filter the multiple types of derived features to obtain a high-dimensional feature set, wherein the multiple types of derived features include lag features, rolling statistical features, difference features, periodic features, and interaction features; a model training module, used to train multiple gradient boosting decision tree models and deep time series prediction models using the high-dimensional feature set to form a hybrid prediction model system; and a prediction module, used to input the feature data of the day to be predicted into the hybrid prediction model system to obtain multiple preliminary prediction results, and to weight and fuse the multiple preliminary prediction results according to the validation set performance index of each model in the hybrid prediction model system to obtain the predicted value of the fluoride salt feeding amount for the day to be predicted.