A method, system and device for predicting heat rate of a thermal power unit and a storage medium
The built-in L2 regularized AKRR model, constructed by K-medoids clustering and Optuna-DART joint optimization, solves the problems of insufficient adaptability and accuracy of thermal power unit heat rate prediction under different operating conditions, and achieves efficient and stable heat rate prediction results.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XIAN XIRE ENERGY SAVING TECH
- Filing Date
- 2026-02-28
- Publication Date
- 2026-06-12
AI Technical Summary
Existing methods for predicting the heat rate of thermal power units have poor adaptability under different operating conditions, resulting in insufficient prediction stability and accuracy. Furthermore, traditional models are prone to overfitting and are difficult to achieve efficient and accurate heat rate prediction in high-dimensional data scenarios.
The K-medoids clustering algorithm is used to divide the operating data of thermal power units into clusters of similar operating conditions. Combined with the Optuna-DART joint optimization mechanism, an AKRR prediction sub-model with built-in L2 regularization is constructed. Through adaptive weight factor and kernel matrix weighting, the model structure and parameters are optimized to achieve accurate heat rate prediction.
It improves the adaptability and accuracy of heat rate prediction under all operating conditions, reduces the risk of overfitting, and achieves efficient and stable heat rate prediction to meet the real-time needs of industry.
Figure CN122196757A_ABST
Description
Technical Field
[0001] This invention belongs to the field of thermal power units and relates to a method, system, device and storage medium for predicting the heat rate of thermal power units.
Background Technology
[0002] The heat rate is a core indicator of the energy conversion efficiency of thermal power units, and its accurate prediction is crucial for optimizing unit operation, reducing energy consumption, and controlling power generation costs. Currently, thermal power units operate under a wide range of conditions (dynamically adjusting load from 25% to 100%), and the parameters affecting the heat rate (such as unit load, steam pressure, and flow rate) exhibit complex nonlinear relationships, requiring efficient prediction methods to achieve accurate calculations.
[0003] In the field of heat rate prediction, existing technologies mainly fall into three typical paths: First, the mechanism calculation method based on thermodynamic formulas, which relies on accurate parameter measurement and formula derivation, but is easily affected by the accumulation of measurement errors; second, the traditional machine learning modeling method (such as single LSSVM, BP neural network, etc.), which builds a prediction model by mining the patterns of historical data, but does not consider the impact of different operating conditions on the model's adaptability; and third, the combination method of "simple clustering + basic regression", which attempts to model by grouping, but the clustering logic lacks specificity and does not combine regularization and adaptive optimization, resulting in insufficient prediction accuracy and stability.
[0004] Defects and shortcomings of the existing technology: Lack of adaptability to operating conditions: Existing single-model prediction methods do not distinguish how the heat rate is influenced under different operating conditions. The parameter correlations under special operating conditions such as low load and variable load differ significantly from those under high load, resulting in poor prediction stability and large error fluctuations across the entire load range.
[0005] Unscientific clustering application: Although some combination methods use clustering, they do not use the core influencing parameter of heat consumption rate as the basis for clustering, or do not optimize the number of clusters through quantitative indicators, resulting in the clustering results being disconnected from the heat consumption rate change pattern, and failing to provide effective support for subsequent predictions.
[0006] Imbalance between model accuracy and complexity: Traditional machine learning models lack targeted complexity constraint mechanisms, making them prone to overfitting in high-dimensional data scenarios; at the same time, parameter configuration relies on human experience or a single optimization algorithm, making it difficult to lock in the global optimal solution and affecting prediction accuracy.
Summary of the Invention
[0007] The purpose of this invention is to overcome the shortcomings of the prior art and provide a method, system, equipment and storage medium for predicting the heat consumption rate of thermal power units, thereby improving the adaptability of the prediction to operating conditions, the accuracy of the results and the efficiency of deployment.
[0008] To achieve the above objectives, the present invention employs the following technical solution: A method for predicting the heat rate of thermal power units, including the following steps:
acquiring historical operating data of the thermal power unit and dividing it into a training set and a validation set;
using the Optuna framework in conjunction with the DART algorithm to perform joint optimization on the training set and obtain adaptive weight factors and a global hyperparameter combination;
based on the number of clusters in the global hyperparameter combination, applying the K-medoids algorithm to divide the training set into clusters of similar operating conditions;
loading the adaptive weight factors and the regularization coefficient from the global hyperparameter combination, and constructing an AKRR prediction sub-model with built-in L2 regularization for each operating condition cluster;
inputting the real-time operating data to be predicted into the matching AKRR prediction sub-model and outputting the heat rate prediction result through matrix operations.
When constructing the kernel matrix, the AKRR prediction sub-model uses the adaptive weight factor to weight the Euclidean distance between samples.
[0009] Optionally, historical operating data includes unit load, main steam flow, main steam pressure, feedwater flow, feedwater temperature, and reheat steam pressure, as well as the corresponding actual heat rate.
[0010] Optionally, the process of jointly optimizing the training set using the Optuna framework and the DART algorithm is as follows: The Optuna framework defines and searches the hyperparameter space based on the TPE sampling algorithm. This hyperparameter space covers the number of clusters, AKRR kernel width, L2 regularization coefficient, DART decision tree dropout probability, maximum tree depth, and learning rate. The DART algorithm is trained based on the parameters in the hyperparameter space, optimizes the model structure through a random deactivation mechanism, and calculates an adaptive weight factor that reflects the confidence of sample prediction. The adaptive weighting factor is passed to the kernel matrix building module of the AKRR prediction sub-model.
[0011] Optionally, the process of applying the K-medoids algorithm to divide the training set into clusters of similar working conditions is as follows: The actual data points in the training set are used as cluster centers; The final cluster center is selected by iterative optimization, which minimizes the sum of the distances from all data points within the cluster to the cluster center. Based on the distance from the data point to the final cluster center, all training samples are assigned to the corresponding working condition clusters.
[0012] Optionally, the process of constructing an AKRR prediction sub-model with built-in L2 regularization is as follows: Kernel matrix construction: Calculate the Euclidean distance between each pair of samples within the operating condition cluster, multiply the Euclidean distance by the adaptive weight factor, and then input it into the radial basis function to obtain the kernel matrix elements; Regression coefficient solution: Multiply the identity matrix by the regularization coefficient and add the result to the kernel matrix to obtain the intermediate matrix. Invert the intermediate matrix and multiply the inverse by the vector of true heat rate values of the operating condition cluster to obtain the regression coefficient vector.
[0013] Optionally, the built-in L2 regularization is achieved by introducing an L2 constraint term on the regression coefficients into the objective function of the AKRR prediction sub-model. This objective function consists of two terms: the L2 term of the product of the kernel matrix and the regression coefficient vector minus the vector of true heat rate values, and the L2 term of the regression coefficient vector multiplied by the regularization coefficient.
[0014] Optionally, the process of jointly optimizing the training set using the Optuna framework and the DART algorithm also includes establishing a closed-loop feedback mechanism: In each round of optimization iteration, the root mean square error of the AKRR prediction sub-model is evaluated using the validation set; The root mean square error is fed back into the Optuna framework; The Optuna framework updates the probabilistic model based on feedback and adjusts the hyperparameter search direction for the next round until a preset number of iterations or an error threshold is reached.
[0015] A thermal power unit heat rate prediction system, comprising: a data acquisition module, used to acquire historical operating data of the thermal power unit and divide it into a training set and a validation set; a joint optimization module, used to perform joint optimization on the training set with the Optuna framework in conjunction with the DART algorithm and obtain adaptive weight factors and a global hyperparameter combination; a feature partitioning module, used to apply the K-medoids algorithm to divide the training set into clusters of similar operating conditions based on the number of clusters in the global hyperparameter combination; a regularization module, used to load the adaptive weight factors and the regularization coefficient from the global hyperparameter combination and to build an AKRR prediction sub-model with built-in L2 regularization for each operating condition cluster; and a prediction module, used to input the real-time operating data to be predicted into the matching AKRR prediction sub-model and output the heat rate prediction result through matrix operations. When constructing the kernel matrix, the AKRR prediction sub-model uses the adaptive weight factor to weight the Euclidean distance between samples.
[0016] A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the thermal power unit heat rate prediction method.
[0017] A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the thermal power unit heat rate prediction method.
[0018] Compared with the prior art, the present invention has the following beneficial effects: This invention uses the K-medoids clustering algorithm to divide the complex and variable full-condition operation data of thermal power units into multiple micro-condition clusters with similar characteristic patterns. This eliminates parameter interference between different conditions (such as high load and low load) at the physical data level and solves the problem of poor adaptability of a single model. On this basis, a deep linkage is established through the Optuna-DART joint optimization mechanism: Optuna is used to achieve global automatic locking of hyperparameters of all modules, while the adaptive weight factor calculated by the DART algorithm directly optimizes the kernel matrix construction process of the AKRR prediction model. This allows the model to dynamically focus on effective features based on the credibility of the samples, enhancing the accuracy of nonlinear mapping. Combined with the L2 regularization technology built into AKRR to constrain the regression coefficients, the risk of overfitting under high-dimensional data is effectively suppressed. Thus, real-time heat rate prediction with high computational efficiency, high prediction accuracy and strong generalization ability is achieved without manual intervention.
Attached Figure Description
[0019] Figure 1 is a schematic diagram of the thermal power unit heat rate prediction method of the present invention; Figure 2 is a schematic diagram of the K-medoids algorithm of the present invention; Figure 3 is a schematic diagram of the gradient boosting decision tree of the present invention.
Detailed Implementation
[0020] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.
[0021] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.
[0022] Example 1: As shown in Figure 1, this embodiment provides a method for predicting the heat rate of thermal power units, including the following process: S1: Obtain historical operating data of the thermal power unit and divide it into a training set and a validation set.
[0023] Specifically, historical operating data includes unit load, main steam flow rate, main steam pressure, feedwater flow rate, feedwater temperature, and reheat steam pressure, as well as the corresponding actual heat rate values.
[0024] S2: Use the Optuna framework in conjunction with the DART algorithm to perform joint optimization on the training set and obtain adaptive weight factors and a global hyperparameter combination.
[0025] Specifically, the Optuna framework defines and searches the hyperparameter space based on the TPE sampling algorithm. This hyperparameter space covers the number of clusters, AKRR kernel width, L2 regularization coefficient, DART decision tree drop probability, maximum tree depth, and learning rate.
[0026] The DART algorithm is trained based on the parameters in the hyperparameter space, optimizes the model structure through a random deactivation mechanism, and calculates an adaptive weight factor that reflects the reliability of sample predictions.
[0027] The adaptive weighting factor is passed to the kernel matrix building module of the AKRR prediction sub-model.
[0028] This also includes establishing a closed-loop feedback mechanism: In each round of optimization iteration, the root mean square error of the AKRR prediction sub-model is evaluated using the validation set.
[0029] The root mean square error is fed back to the Optuna framework.
[0030] The Optuna framework updates the probabilistic model based on feedback and adjusts the hyperparameter search direction for the next round until a preset number of iterations or an error threshold is reached.
[0031] S3: Based on the number of clusters in the global hyperparameter combination, apply the K-medoids algorithm to divide the training set into clusters of similar operating conditions.
[0032] Specifically, the actual data points in the training set are used as cluster centers.
[0033] The final cluster center is selected by iterative optimization, which minimizes the sum of the distances from all data points within the cluster to the cluster center.
[0034] Based on the distance from the data point to the final cluster center, all training samples are assigned to the corresponding working condition clusters.
[0035] S4: Load the adaptive weight factors and the regularization coefficient from the global hyperparameter combination, and construct an AKRR prediction sub-model with built-in L2 regularization for each operating condition cluster.
[0036] Specifically, the process of constructing the AKRR prediction sub-model with built-in L2 regularization is as follows: Kernel matrix construction: Calculate the Euclidean distance between two samples within the working condition cluster, multiply the Euclidean distance by the adaptive weighting factor, and then input it into the radial basis function to obtain the kernel matrix elements.
[0037] Regression coefficient solution: Multiply the identity matrix by the regularization coefficient and add the result to the kernel matrix to obtain the intermediate matrix. Invert the intermediate matrix and multiply the inverse by the vector of true heat rate values of the operating condition cluster to obtain the regression coefficient vector.
[0038] The built-in L2 regularization is achieved by introducing an L2 norm constraint term on the regression coefficients into the objective function of the AKRR prediction sub-model. This objective function consists of two terms: the L2 norm term of the product of the kernel matrix and the regression coefficient vector minus the vector of true heat rate values, and the L2 norm term of the regression coefficient vector multiplied by the regularization coefficient.
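As a worked check of how the closed-form solution relates to a regularized objective, the following derivation assumes the standard kernel ridge regression formulation, which the source does not spell out. The symbols $K$ (kernel matrix), $\boldsymbol{\alpha}$ (regression coefficient vector), $\mathbf{y}$ (vector of true heat rate values), $\lambda$ (regularization coefficient) and $I$ (identity matrix) are notation introduced here for illustration. In that standard formulation the coefficient penalty is measured through the kernel, and setting the gradient to zero yields exactly the "kernel matrix plus regularization coefficient times identity" solution:

```latex
\begin{aligned}
J(\boldsymbol{\alpha}) &= \lVert K\boldsymbol{\alpha} - \mathbf{y} \rVert_2^2
  + \lambda\, \boldsymbol{\alpha}^{\top} K \boldsymbol{\alpha}, \\
\nabla_{\boldsymbol{\alpha}} J &= 2K\bigl[(K + \lambda I)\boldsymbol{\alpha} - \mathbf{y}\bigr] = 0
  \;\Longrightarrow\; \boldsymbol{\alpha} = (K + \lambda I)^{-1}\mathbf{y}.
\end{aligned}
```

If the penalty is instead taken as the plain squared norm $\lambda \lVert \boldsymbol{\alpha} \rVert_2^2$, the stationarity condition becomes $(K^{\top}K + \lambda I)\boldsymbol{\alpha} = K^{\top}\mathbf{y}$, so which penalty is intended determines which linear system is solved.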
[0039] S5: Input the real-time operating data to be predicted into the matching AKRR prediction sub-model and output the heat rate prediction result through matrix operations; when constructing the kernel matrix, the AKRR prediction sub-model uses the adaptive weight factor to weight the Euclidean distance between samples.
[0040] Example 2: The thermal power unit heat rate prediction method with built-in regularization and linked cluster optimization described in this embodiment builds a closed-loop heat rate prediction system on the close collaboration of three modules: operating condition clustering (K-medoids), Optuna-DART joint optimization, and AKRR prediction (built-in regularization). The Optuna framework and the DART algorithm jointly drive hyperparameter optimization and model structure optimization, acting together on the AKRR sub-model to address the insufficient generalization ability of a single optimization method. Each module has its own specific function and is irreplaceable.
[0041] The working condition clustering adopts the K-medoids unsupervised clustering algorithm. The core purpose is to divide the full working condition operation data into working condition clusters with similar features, eliminate cross-working condition data interference, and provide a foundation for subsequent accurate modeling.
[0042] The K-medoids algorithm uses actual data points in the dataset as cluster centers (medoids). It iteratively optimizes the selection of cluster centers to minimize the sum of distances from all data points within a cluster to the cluster center, ultimately achieving optimal data grouping. Compared to the K-means algorithm, this algorithm is more robust to outliers in industrial data, does not require assumptions about a specific distribution, and is well-suited to the complexity of thermal power unit operating data.
[0043] The goal of the K-medoids algorithm is to minimize the sum of the distances from all data points in the clustering result to the center of their respective clusters. The objective function is:
$$J = \sum_{i=1}^{K} \sum_{x_j \in C_i} d(x_j, m_i)$$
[0044] where $J$ is the objective function, used to evaluate clustering quality, which needs to be minimized; $K$ is the number of clusters; $C_i$ is the set of data points in cluster $i$; $x_j$ is the $j$-th data point; $m_i$ is the center point of the $i$-th cluster (the representative data point of the cluster); and $d(x_j, m_i)$ is the distance between data point $x_j$ and cluster center $m_i$, usually expressed using the Euclidean distance.
[0045] In the K-medoids algorithm, cluster center selection is a crucial step. Unlike the K-means algorithm, which uses the mean as cluster centers, the K-medoids algorithm uses actual data points in the dataset as cluster centers, dividing n objects in the dataset into k clusters. A cluster center can be defined as an object in a cluster that minimizes the average dissimilarity calculated with all objects in the cluster; it is the most central point in the given dataset. Figure 2 demonstrates the computation process of the K-medoids clustering algorithm.
[0046] 1. Initialization: Randomly select $K$ data points as the cluster centers (medoids), i.e. $\{m_1, m_2, \ldots, m_K\}$.
[0047] 2. Assign data points: For each data point $x_j$, calculate its distance $d(x_j, m_i)$ to each cluster center $m_i$, and assign the data point to the nearest cluster center.
[0048] 3. Update cluster centers: Within each cluster, select a data point as the new cluster center (medoid). The new cluster center should be the point that minimizes the total distance from all data points within the cluster to the center. That is, for each cluster $C_i$, choose the point $m_i$ such that:
$$m_i = \arg\min_{x \in C_i} \sum_{x_j \in C_i} d(x_j, x)$$
[0049] 4. Iteration: Repeat steps 2 and 3 until the cluster centers no longer change or the maximum number of iterations is reached.
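The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the invention: the function name `k_medoids`, its parameters and the alternating update scheme are chosen here for illustration, and Euclidean distance is assumed.

```python
import numpy as np

def k_medoids(X, k, max_iter=100, seed=0):
    """Cluster the rows of X around k medoids (actual data points)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Pairwise Euclidean distances, computed once up front.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Step 1: pick k distinct data points as the initial medoids.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # Step 3: in each cluster, pick the member with the smallest
        # total distance to all other members of that cluster.
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster became empty
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        # Step 4: stop once the medoids no longer change.
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    cost = D[np.arange(n), medoids[labels]].sum()  # value of the objective
    return medoids, labels, cost
```

The function returns the medoid row indices, the cluster label of every sample, and the summed point-to-medoid distance. Precomputing the full distance matrix keeps the sketch short but needs memory quadratic in the number of samples.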
[0050] The Xie-Beni index is a validity index that describes how compact the parts of a partition are and how strongly they differ from one another. It is used here to measure the effectiveness of K-medoids clustering, specifically by calculating the ratio of intra-cluster compactness to inter-cluster separation; the lower the value, the more compact and better separated the clusters. In K-medoids clustering, the Xie-Beni index can be defined as follows:
$$XB = \frac{\sum_{i=1}^{K} \sum_{x_j \in C_i} u_{ij}^{q}\, d^2(x_j, m_i)}{n \cdot \min_{i \neq k} d^2(m_i, m_k)}$$
[0051] where $K$ is the number of clusters; $C_i$ is the set of data points in the $i$-th cluster; $x_j$ is the $j$-th data point; $m_i$ is the center point (medoid) of the $i$-th cluster; $n$ is the total number of data points; $u_{ij}$ is the membership degree of data point $x_j$ in cluster $i$, i.e. the degree to which an element belongs to a set; $q$ is the fuzzy coefficient (for hard clustering, $q = 1$); $d^2(x_j, m_i)$ is the squared distance from data point $x_j$ to cluster center $m_i$; and $d^2(m_i, m_k)$ is the squared distance between cluster centers $m_i$ and $m_k$.
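For a hard partition, where each sample belongs to exactly one cluster, the index reduces to the summed squared distance of every sample to its own center, divided by the number of samples times the smallest squared distance between two centers. The sketch below computes that quantity. The function name `xie_beni_hard` and its arguments are illustrative, and the standard Xie-Beni definition with the sample count in the denominator is assumed.

```python
import numpy as np

def xie_beni_hard(X, labels, medoid_idx):
    """Xie-Beni index for a hard partition: compactness / (n * separation).

    labels[j] is the position (0..K-1) in medoid_idx of the cluster
    that sample j belongs to.
    """
    centers = X[medoid_idx]
    # Compactness: squared distance of every sample to its own center.
    compactness = np.sum((X - centers[labels]) ** 2)
    # Separation: smallest squared distance between two distinct centers.
    diff = centers[:, None, :] - centers[None, :, :]
    center_d2 = np.sum(diff ** 2, axis=2)
    np.fill_diagonal(center_d2, np.inf)
    separation = center_d2.min()
    return compactness / (X.shape[0] * separation)
```

One way to use it is to run the clustering for several candidate cluster counts and keep the count with the lowest index value.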
[0052] DART (Dropouts meet Multiple Additive Regression Trees) is an ensemble learning algorithm available as a booster in XGBoost (Extreme Gradient Boosting). It improves on traditional boosted tree models by introducing a random dropout mechanism during training: in each training iteration, DART randomly discards a portion of the existing decision trees, similar to the Dropout method in deep learning. Compared with traditional gradient boosting decision trees (GBDT), DART reduces the risk of overfitting and makes the model more robust in each iteration. The principle of gradient boosting decision trees is shown in Figure 3. The 29 variables and clustering results from Section 1.1 were used as model inputs, and the slurry circulation pump operating status code was used as model output. A classification model was constructed using DART.
[0053] In the standard GBDT algorithm, the objective in each round is to minimize the sum of the loss function and the complexity penalty:
$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{t=1}^{T} \Omega(f_t)$$
[0054] where $y_i$ is the true label of the $i$-th sample; $\hat{y}_i$ is the model's predicted value for sample $x_i$; $f_t$ is the $t$-th tree; $l$ is the loss function (e.g., squared error); and $\Omega(f_t)$ is the complexity penalty for a tree, usually represented by the number of leaves in the tree.
[0055] The goal of DART is to achieve regularization by introducing a dropout mechanism. Its objective function has a form similar to that of GBDT, except that some trees are discarded in each iteration, with the $t$-th tree being discarded with probability $p_t$.
[0056] Here $p_t$ is the probability that the $t$-th tree will be discarded.
[0057] The other symbols are the same as those defined in GBDT.
[0058] In DART, the model update process combines regular tree model training with dropout. In each iteration, the model is typically updated using the following steps: 1. Calculate the current residual: Similar to traditional GBDT, first calculate the current residual, which is the difference between the true label and the model's predicted value:
$$r_i = y_i - \hat{y}_i$$
[0059] 2. Train a new tree: Train a new tree $f_t$ based on the current residual and add it to the existing set of trees.
[0060] 3. Applying the Dropout mechanism: In each update round, a certain proportion of trees are randomly discarded. Unlike ordinary GBDT, DART's update step does not add all trees to the model, but randomly discards some existing trees and retains the others. In this way, the risk of overfitting from a single model is reduced, and the robustness of the model is improved.
[0061] 4. Update the model: The updated model is a weighted sum of all non-dropped trees:
[0062] $$\hat{y}_i = \sum_{t \in S} w_t f_t(x_i)$$
[0063] where $S$ is the set of preserved trees and $w_t$ is the weight of tree $f_t$.
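A compact sketch of this update loop is given below, using one-dimensional decision stumps as the trees so that it stays self-contained. Several points are assumptions not taken from the source: the ordering and the weight normalization follow the original DART paper (trees are dropped first, the residual is computed from the retained trees, and with k dropped trees the new tree receives weight 1/(k+1) while each dropped tree is rescaled by k/(k+1)); the function names, the stump learner and the single `drop_rate` shared by all trees are illustrative. How the adaptive weight factor is derived from the trained DART model is not specified in the source and is not sketched here.

```python
import numpy as np

def fit_stump(x, r):
    """Least-squares decision stump on a 1-D feature: (threshold, left, right)."""
    best = (np.inf, x.min(), r.mean(), r.mean())
    for thr in np.unique(x)[:-1]:
        left, right = r[x <= thr], r[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    thr, left, right = stump
    return np.where(x <= thr, left, right)

def dart_fit(x, y, n_rounds=30, drop_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    stumps, weights = [], []
    for _ in range(n_rounds):
        # Dropout: each existing tree is dropped with probability drop_rate.
        dropped = rng.random(len(stumps)) < drop_rate
        kept_pred = np.zeros(len(y))
        for s, w, d in zip(stumps, weights, dropped):
            if not d:
                kept_pred += w * predict_stump(s, x)
        # Residual of the retained ensemble, then a new tree fitted to it.
        residual = y - kept_pred
        stump = fit_stump(x, residual)
        # Normalization: the new tree gets 1/(k+1), dropped trees shrink by k/(k+1).
        k = int(dropped.sum())
        weights = [w * k / (k + 1) if d else w for w, d in zip(weights, dropped)]
        stumps.append(stump)
        weights.append(1.0 / (k + 1))
    return stumps, weights

def dart_predict(stumps, weights, x):
    # Final model: weighted sum over all trees in the ensemble.
    pred = np.zeros(len(x))
    for s, w in zip(stumps, weights):
        pred += w * predict_stump(s, x)
    return pred
```

Calling `dart_fit(x, y)` on a 1-D feature array and target array returns the stumps and their weights, and `dart_predict` evaluates the weighted sum. In practice a library booster with multi-feature trees would replace the stump learner.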
[0064] Optuna Framework: While various machine learning algorithms have performed well in predicting SO2 concentration at the outlet of desulfurization systems, they generally face the complex challenge of hyperparameter tuning. An excessive number of hyperparameters, coupled with the difficulty of manual optimization, has become a key factor limiting further improvements in model performance. In contrast, the Optuna algorithm stands out due to its unique adaptive hyperparameter optimization capability. Optuna automatically explores the optimal combination of hyperparameters by defining a search space and objective function, without manual intervention. It utilizes efficient sampling strategies and advanced optimization algorithms (such as Bayesian optimization and TPE) to intelligently select hyperparameters for experimentation within the search space and dynamically adjusts the search strategy based on the experimental results to more quickly approach the global optimum. In Optuna, the search space is the foundation of the optimization process, referring to a range or set of values defined by the user for the hyperparameters of the machine learning model.
[0065] This study uses the TPE sampling algorithm, which, based on Bayesian optimization principles, estimates a suitable search direction using historical evaluation results. Each trial proposes a new set of hyperparameter combinations from a predefined range. The model is then updated using Bayesian optimization based on the results of previous trials. In each trial, Optuna simultaneously adjusts all hyperparameters while keeping other model parameters (such as model architecture, input / output dimensions, loss function optimizer, etc.) unchanged.
[0066] TPE uses two probabilistic models to describe the performance of hyperparameters: $l(x)$ represents the probability distribution of high-performance hyperparameter combinations, and $g(x)$ represents the probability distribution of low-performance hyperparameter combinations. TPE selects the new hyperparameter combination $x^*$ that maximizes the ratio:
$$x^* = \arg\max_{x} \frac{l(x)}{g(x)}$$
[0067] This ratio reflects the probability of selecting hyperparameter combinations in the high-performance region. The model is trained based on the selected hyperparameter combinations, the objective function value is calculated, the experimental results are returned, and the probability distribution is updated based on the new experimental results to readjust the search direction.
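The closed loop of proposing a trial, training, scoring on the validation set and updating the sampler can be sketched with Optuna's TPE sampler as follows. This is a reduced illustration under several assumptions: only the kernel width and the L2 regularization coefficient are searched (the number of clusters and the DART parameters named in the text are left out for brevity), the data are synthetic stand-ins, a plain unweighted kernel ridge model takes the place of the AKRR sub-model, and all function and parameter names are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def validation_rmse(sigma, lam, X_tr, y_tr, X_va, y_va):
    """Fit kernel ridge regression on the training set, score on validation."""
    K = rbf_kernel(X_tr, X_tr, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_tr)), y_tr)
    pred = rbf_kernel(X_va, X_tr, sigma) @ alpha
    return float(np.sqrt(np.mean((pred - y_va) ** 2)))

# Synthetic stand-in data (assumption): one feature, noisy sine target.
_rng = np.random.default_rng(0)
_X = _rng.uniform(0.0, 3.0, size=(60, 1))
_y = np.sin(_X[:, 0]) + 0.05 * _rng.normal(size=60)
X_TR, Y_TR, X_VA, Y_VA = _X[:42], _y[:42], _X[42:], _y[42:]

def objective(trial):
    # The sampler proposes one hyperparameter combination per trial.
    sigma = trial.suggest_float("kernel_width", 0.05, 5.0, log=True)
    lam = trial.suggest_float("l2_coefficient", 1e-6, 1.0, log=True)
    # The returned validation RMSE is the feedback that updates l(x) and g(x).
    return validation_rmse(sigma, lam, X_TR, Y_TR, X_VA, Y_VA)

def run_study(n_trials=30):
    # Imported here so the objective can be exercised without Optuna installed.
    import optuna
    study = optuna.create_study(
        direction="minimize",
        sampler=optuna.samplers.TPESampler(seed=0),
    )
    study.optimize(objective, n_trials=n_trials)  # stops after n_trials rounds
    return study.best_params, study.best_value
```

Calling `run_study()` returns the best hyperparameter combination found and its validation RMSE. Stopping on an error threshold instead of a fixed trial count would need an additional callback, which is omitted here.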
[0068] AKRR prediction is the core component of the prediction system, replacing the traditional LSSVM algorithm. It receives clustered data from the clustering module and performs nonlinear regression prediction of the heat rate. It has a simple structure, high computational efficiency and strong fitting ability.
[0069] The AKRR algorithm is an improvement on Kernel Ridge Regression (KRR). Its core logic is to map low-dimensional input features to a high-dimensional space through radial basis functions (RBF), transforming the nonlinear regression problem into a linear regression problem in a high-dimensional space. It introduces adaptive weight factors to adapt to the local nonlinear features of a single working condition cluster. It solves the regression coefficients directly through matrix operations, without the need for complex Lagrange multipliers and KKT conditions, and its computational efficiency far exceeds that of LSSVM.
[0070] The thermal power unit heat rate prediction method (including DART optimization and adaptation) described in this embodiment includes the following process: The first step is kernel matrix construction: For training data of a certain working condition cluster, construct a radial basis function kernel matrix. The calculation logic for each element is as follows: first calculate the Euclidean distance between two samples, substitute it into the radial basis function to obtain the basic kernel value, and then multiply it by the adaptive weight factor of the two samples (determined by the subsequent DART algorithm optimization; the higher the prediction confidence of the sample in DART, the larger the weight factor), to enhance the capture of effective features.
[0071] The second step is to solve for the regression coefficients: combining the regularization constraint, multiply the identity matrix by the regularization coefficient and add the result to the kernel matrix to obtain a new matrix. After inverting the new matrix, multiply it by the vector of true heat rate values of the cluster to obtain the regression coefficient vector.
[0072] The third step is to predict the output: For the input data to be predicted, calculate its kernel matrix (test kernel matrix) with the training set of the cluster, and then multiply it with the regression coefficient vector to obtain the predicted heat rate.
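The three steps can be sketched with NumPy as below. This follows the description in this embodiment literally (base RBF kernel value multiplied by the weight factors of the two samples) and makes two assumptions that the source does not state: the RBF is written as exp(-d²/(2σ²)) with σ as the kernel width, and a new sample, which has no DART confidence yet, is given a weight factor of 1 unless the caller supplies one. Function and parameter names are illustrative.

```python
import numpy as np

def weighted_rbf_kernel(A, B, w_a, w_b, sigma):
    """K[i, j] = w_a[i] * w_b[j] * exp(-||A[i] - B[j]||^2 / (2 * sigma^2))."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.outer(w_a, w_b) * np.exp(-d2 / (2.0 * sigma ** 2))

def akrr_fit(X, y, weights, sigma, lam):
    # Step 1: weighted kernel matrix of the cluster's training samples.
    K = weighted_rbf_kernel(X, X, weights, weights, sigma)
    # Step 2: solve (K + lam * I) alpha = y; solving the linear system is
    # numerically preferable to forming the inverse explicitly.
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def akrr_predict(X_new, X_train, weights, alpha, sigma, w_new=None):
    # Assumption: new samples get weight factor 1 unless w_new is given.
    if w_new is None:
        w_new = np.ones(len(X_new))
    # Step 3: test kernel matrix times the regression coefficient vector.
    K_test = weighted_rbf_kernel(X_new, X_train, w_new, weights, sigma)
    return K_test @ alpha
```

With all weight factors equal to 1 the sketch reduces to ordinary kernel ridge regression, which is a convenient sanity check before plugging in weights from DART.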
[0073] The first step is to build the framework with AKRR and improve accuracy with DART.
[0074] The function of AKRR is to build the basic kernel matrix. It uses radial basis functions (RBF) to first calculate the Euclidean distance between two samples (such as the combined data of unit load and steam pressure), and then substitutes them into the function to obtain the basic kernel value. Essentially, it maps low-dimensional operating data to a high-dimensional space, turning the originally nonlinear heat rate relationship into a calculable linear relationship.
[0075] DART's function is to weight and optimize the kernel matrix. DART first analyzes the prediction confidence of each sample (for example, a sample with a more stable correlation between its operating characteristics and heat rate has higher confidence), and assigns larger adaptive weight factors to samples with high confidence. Multiplying this weight by the base kernel value allows the kernel matrix to focus more on effective features and filter out noisy data.
[0076] The second step is regularization to prevent overfitting, followed by the AKRR solution: The function of regularization is to add constraints. If only the AKRR calculation is used, the model may memorize the noise in the training data (such as a random measurement error), leading to inaccurate subsequent predictions (overfitting). Regularization adds a penalty term to the kernel matrix in the form of identity matrix × regularization coefficient, limiting the size of the regression coefficients and preventing the model from overfitting to noise.
[0077] The function of AKRR is to solve for core parameters. It inverts the new matrix with regularization constraints, and then multiplies it with the actual heat rate data of the cluster to obtain the regression coefficient vector. This vector is the key to subsequent predictions, recording the linear relationship between features and heat rate in high-dimensional space.
[0078] The third step is for AKRR to use this key to produce the result: the function of AKRR here is to complete the final prediction. For new data to be predicted (such as the unit operating parameters at a certain moment), first calculate the test kernel matrix between the new data and all samples in the training set (equivalent to measuring the similarity between the new data and historical data), and then multiply it by the regression coefficient vector obtained in the second step to directly obtain the predicted heat rate.
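The three steps described above (kernel matrix, regression coefficients, prediction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the RBF form exp(-d²/(2·width²)), the pair weight sqrt(w_i·w_j) applied to the Euclidean distance, and the unit weight given to new samples are all hypothetical choices. The linear system is solved directly rather than by forming the explicit inverse, which gives the same coefficients.

```python
import numpy as np


def weighted_rbf_kernel(A, B, w_a, w_b, width):
    """RBF kernel whose pairwise Euclidean distances are scaled by sample weights.

    The pair weight sqrt(w_a[i] * w_b[j]) is an illustrative assumption.
    """
    # Pairwise Euclidean distances between rows of A and rows of B.
    dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    pair_w = np.sqrt(np.outer(w_a, w_b))
    return np.exp(-((pair_w * dist) ** 2) / (2.0 * width ** 2))


def akrr_fit(X, y, weights, width, lam):
    """Steps 1 and 2: build the kernel matrix, then solve (K + lam * I) alpha = y."""
    K = weighted_rbf_kernel(X, X, weights, weights, width)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def akrr_predict(X_new, X_train, alpha, weights, width):
    """Step 3: test kernel matrix multiplied by the regression coefficient vector."""
    w_new = np.ones(len(X_new))  # assumption: new samples carry no learned weight
    K_test = weighted_rbf_kernel(X_new, X_train, w_new, weights, width)
    return K_test @ alpha
```

With all weights set to 1 the sketch reduces to ordinary kernel ridge regression, which makes it easy to check against a reference implementation before adaptive weights are introduced.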
[0079] In summary, AKRR is the core prediction algorithm and is sufficient for prediction on its own. K-medoids is used to cluster the data and partition the dataset. The DART algorithm is used to increase the model's accuracy. Optuna is used to adaptively provide the optimal combination of hyperparameters.
[0080] Clustering modeling process: The first step is to input data for each operating condition cluster, including 6 core features (unit load, main steam flow, main steam pressure, feedwater flow, feedwater temperature, and reheat steam pressure) and the actual heat rate value.
[0081] The second step is to divide each cluster into a training set (for modeling) and a validation set (for evaluation) in a 7:3 ratio.
[0082] The third step is to load the optimization parameters (kernel function parameters, regularization coefficients, and weight factors after DART optimization) output by the Optuna-DART joint optimization.
[0083] The fourth step is to train the AKRR sub-model based on the optimized parameters (kernel matrix construction and regression coefficient calculation).
[0084] The fifth step is to evaluate using the validation set. The relative error (absolute difference between predicted and true values ÷ true value × 100%) must be within ±0.12%. If the requirement is not met, return to the previous step and re-optimize. The sixth step is to save all qualified AKRR sub-models (including cluster numbers, optimization parameters, and regression coefficients).
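The 7:3 split and the relative-error acceptance check in the process above can be sketched as follows. The function names are hypothetical. The random shuffle before splitting and the requirement that every validation sample, rather than the average, stay within the limit are assumptions that the text does not settle.

```python
import random


def split_train_validation(samples, train_fraction=0.7, seed=0):
    """Shuffle a cluster's samples and split them 7:3 into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumption: the split is random
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def relative_error_percent(predicted, actual):
    """Absolute difference between prediction and true value, as a percentage of the true value."""
    return abs(predicted - actual) / abs(actual) * 100.0


def submodel_is_qualified(predictions, actuals, limit_percent=0.12):
    """Assumption: a sub-model qualifies only if every validation sample is within the limit."""
    return all(relative_error_percent(p, a) <= limit_percent
               for p, a in zip(predictions, actuals))
```

A sub-model for which `submodel_is_qualified` returns `False` would be sent back for re-optimization, as the fifth step describes.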
[0085] Regularization process (built-in L2 regularization technique): This module ensures the generalization ability of the prediction system. It is built into the AKRR algorithm and its core function is to suppress overfitting. Its integration with the DART algorithm for regularization forms a dual anti-overfitting mechanism.
[0086] Technical principle: The core of L2 regularization is to introduce a L2 norm constraint term for the regression coefficients into the AKRR objective function. By penalizing excessively large regression coefficients, it avoids the model from overfitting noise in the training data, guides the model to focus on the overall pattern of the data, and improves the prediction stability of real-time data.
[0087] Integration method: L2 regularization is not an independent external module, but is deeply embedded in the AKRR objective function. The objective function logic is: objective function value = half of the L2 norm of (the result of multiplying the kernel matrix and the regression coefficient vector - the true value vector of the heat rate) + half of (the regularization coefficient × the L2 norm of the regression coefficient vector), without the need for additional execution steps.
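Written symbolically, the objective described in this paragraph could take the form below. The symbols are assumptions introduced for illustration: K is the kernel matrix, α the regression coefficient vector, y the true heat rate vector, and λ the regularization coefficient. Squaring both norms, as is usual for ridge-type objectives, is also an assumption, since the text only says "L2 norm".

```latex
J(\boldsymbol{\alpha}) = \tfrac{1}{2}\,\lVert K\boldsymbol{\alpha} - \mathbf{y} \rVert_2^{2} + \tfrac{1}{2}\,\lambda\,\lVert \boldsymbol{\alpha} \rVert_2^{2}
```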
[0088] Strength control: The regularization coefficient determines the strength of overfit suppression—a larger coefficient results in a more severe penalty and stronger suppression, but may lead to underfitting; a smaller coefficient results in a weaker penalty and poorer suppression. This coefficient is determined by subsequent Optuna-DART joint optimization to ensure a balance between fitting accuracy and generalization ability.
[0089] Joint logic: Optuna is responsible for global hyperparameter optimization (covering all core parameters of all modules), and DART is responsible for model structure optimization (focusing on the internal structure of AKRR). The two form a dual drive of "parameters + structure" to ensure optimal overall performance.
[0090] Optimization parameter definition: Clustering module: number of clusters (number of working condition divisions), search range 2 to 20 (integer).
[0091] AKRR module: kernel width (radial basis function nonlinear mapping capability), search range 0.1 to 10 (uniformly distributed floating-point numbers).
[0092] Regularization module: L2 regularization coefficient (overfit suppression strength), search range 0.001 to 1 (uniformly distributed floating-point number).
[0093] DART algorithm: decision tree drop probability (ensemble regularization strength) 0.1 to 0.3 (uniformly distributed floating-point number), maximum tree depth (structural complexity) 3 to 10 (integer number), learning rate (sub-model update step size) 0.01 to 0.3 (uniformly distributed floating-point number).
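The search ranges listed above can be gathered into one function that draws a hyperparameter combination from a trial object. The sketch assumes the trial exposes `suggest_int(name, low, high)` and `suggest_float(name, low, high)`, as Optuna trials do. The parameter names and the `MidpointTrial` stand-in, which only lets the sketch run without Optuna installed, are hypothetical.

```python
def suggest_hyperparameters(trial):
    """Draw one hyperparameter combination from the ranges listed in the text."""
    return {
        # Clustering module: number of working-condition clusters.
        "n_clusters": trial.suggest_int("n_clusters", 2, 20),
        # AKRR module: RBF kernel width.
        "kernel_width": trial.suggest_float("kernel_width", 0.1, 10.0),
        # Regularization module: L2 regularization coefficient.
        "l2_coefficient": trial.suggest_float("l2_coefficient", 0.001, 1.0),
        # DART algorithm: tree drop probability, maximum depth, learning rate.
        "dart_drop_rate": trial.suggest_float("dart_drop_rate", 0.1, 0.3),
        "dart_max_depth": trial.suggest_int("dart_max_depth", 3, 10),
        "dart_learning_rate": trial.suggest_float("dart_learning_rate", 0.01, 0.3),
    }


class MidpointTrial:
    """Hypothetical offline stand-in for a trial object; returns range midpoints."""

    def suggest_int(self, name, low, high):
        return (low + high) // 2

    def suggest_float(self, name, low, high):
        return (low + high) / 2.0


params = suggest_hyperparameters(MidpointTrial())
```

With Optuna installed, the same function could be called inside an objective that returns the cross-validated error, using a study created with `optuna.create_study(direction="minimize")` and run with `study.optimize(objective, n_trials=100)`.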
[0094] Optimization goals and steps: Optimization objective: To minimize the root mean square error (RMSE) of 5-fold cross-validation. The RMSE logic is: the square root of (the sum of the squares of the differences between the predicted and actual values of all samples ÷ the number of samples); 5-fold cross-validation involves randomly dividing the data into 5 parts, alternating between using 4 parts as the training set and 1 part as the validation set, and taking the average of the 5 RMSE values.
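The RMSE and 5-fold cross-validation logic described above can be sketched in plain Python. The function names are hypothetical, and `fit_predict` is a placeholder for whatever trains a model on the four training folds and predicts the held-out fold.

```python
import math
import random


def rmse(predicted, actual):
    """Square root of the mean squared difference between predictions and true values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(n_samples, k=5, seed=0):
    """Randomly partition sample indices into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(fit_predict, X, y, k=5, seed=0):
    """Average RMSE over k rounds; each fold serves once as the validation set.

    fit_predict(X_train, y_train, X_val) is a hypothetical callable that trains
    a model and returns predictions for X_val.
    """
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in held_out])
        scores.append(rmse(preds, [y[i] for i in held_out]))
    return sum(scores) / k
```

The value returned by `cross_validated_rmse` is the quantity that the optimization objective seeks to minimize.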
[0095] Execution steps: The first step is to load the preprocessed full feature data and heat consumption rate labels.
[0096] The second step is to define the search range and data type for each hyperparameter.
[0097] The third step is iterative optimization (100 iterations in total): ① Optuna generates a set of hyperparameter combinations.
[0098] ② The DART algorithm is trained according to the hyperparameters and outputs the optimized weight factors and the kernel matrix construction logic of AKRR.
[0099] ③ The K-medoids algorithm divides the operating conditions into clusters based on the number of clusters in the hyperparameters.
[0100] ④ The AKRR and regularization modules are loaded with optimized parameters, clustered modeling is performed, and the root mean square error of 5-fold cross-validation is calculated.
[0101] ⑤ The root mean square error is fed back to Optuna to update the probability model and adjust the search direction for the next round. The fourth step is to output the globally optimal combination of hyperparameters (including the optimal parameters of each module) and the AKRR optimization structure parameters (weight factor rules and kernel matrix logic).
[0102] Module collaboration logic. Data flow: Preprocessed data → K-medoids clustering to divide into working condition clusters → Clustering results + Optuna-DART optimization parameters → AKRR clustering modeling → Integrated output module for real-time prediction.
[0103] Parameter passing: The optimal parameters output by Optuna-DART are passed to the corresponding modules respectively: number of clusters → clustering module, kernel width + structural parameters → AKRR module, regularization coefficient → regularization module.
[0104] Optimization closed loop: The validation error of the AKRR sub-model is fed back to Optuna-DART, dynamically adjusting the search direction to form a "modeling-evaluation-optimization" closed loop.
[0105] Necessity analysis of each module: Existing technologies suffer from drawbacks such as poor adaptability to operating conditions, low prediction accuracy, weak generalization ability, and incomplete optimization. All four modules are indispensable. The necessity of K-medoids clustering: when a single model covers all operating conditions, the factors that influence the heat rate act differently under different operating conditions, which leads to poor prediction stability. This module divides operating conditions into clusters so that patterns within each cluster are consistent and cross-operating-condition interference is eliminated, which is a prerequisite for accurate modeling.
[0106] The necessity of the AKRR algorithm: Traditional LSSVM has a complex structure and low computational efficiency, while ordinary kernel ridge regression lacks adaptive capabilities. AKRR solves directly through matrix operations, improving efficiency by 30% compared to LSSVM, and supports DART optimization to adapt to specific operating conditions, balancing accuracy and real-time performance. Replacing it with other algorithms would result in insufficient accuracy or inadequate real-time performance.
[0107] The necessity of built-in L2 regularization: External regularization has poor compatibility with prediction algorithms and limited overfit suppression effects. This module is integrated into AKRR, forming a dual anti-overfit mechanism with DART, and the regularization coefficients are globally optimized and determined. If omitted, the risk of overfitting increases by 40% in high-dimensional data scenarios, and the test set error increases significantly.
[0108] The necessity of Optuna-DART joint optimization: Single-parameter optimization does not involve structural optimization, and single-structural optimization lacks global parameter support, resulting in low efficiency for manual parameter tuning. The combination of the two achieves dual optimization of "parameters + structure," solving the pain point of incomplete optimization. If either component is omitted, the optimization accuracy or generalization ability decreases significantly, failing to achieve the preset performance.
[0109] Hardware components: Data acquisition unit: supports the OPC UA protocol, communicates with the DCS of the thermal power unit, and has a storage capacity of ≥1TB.
[0110] Computing unit: CPU ≥ Intel Core i7-12700H, memory ≥ 32GB, to meet the computing needs of clustering, optimization, and modeling.
[0111] Output unit: Connects to the power plant's SIS system, displays prediction results in real time, and provides standardized data interfaces for third parties to use.
[0112] The software consists of several modules: a data preprocessing module, a K-medoids clustering module, an AKRR prediction module, a regularization module, an Optuna-DART joint optimization module, and an integrated output module. These modules are connected through standardized interfaces, supporting plug-and-play and upgrades.
[0113] The beneficial effects of the above method are: Strong adaptability to operating conditions and stable prediction across the entire load range: Existing technologies use a single model to cover all operating conditions, resulting in large error fluctuations due to differences in the influence of heat rate under different operating conditions. This invention divides the operating conditions into clusters using the K-medoids algorithm, making the influence of heat rate within each cluster more consistent. Then, a dedicated AKRR sub-model is trained for each cluster, eliminating cross-operating condition data interference at its source. Simultaneously, the adaptive weighting factor of AKRR is optimized using the DART algorithm to further adapt to the local nonlinear characteristics of a single operating condition. Ultimately, the relative error fluctuation range of the full load range (30%~100% load) prediction is reduced to within ±0.12%, and the stability is improved by more than 30% compared to traditional single models.
[0114] High prediction accuracy and excellent nonlinear fitting ability: Traditional LSSVM algorithms are computationally complex, and ordinary kernel ridge regression lacks adaptive optimization, resulting in limited fitting accuracy. This invention uses AKRR as the core prediction algorithm, achieving a low-dimensional to high-dimensional nonlinear mapping through radial basis functions. Furthermore, the kernel matrix undergoes DART algorithm-optimized weight allocation, accurately capturing the complex correlation between the heat rate and the core features. Combined with the global hyperparameter optimization of the Optuna framework, key parameters such as kernel width and regularization coefficient are kept at their optimal values, holding the average relative error of the test set within ±0.1% and achieving an accuracy improvement of 8%~10% compared to the traditional LSSVM-GWO model.
[0115] Excellent generalization ability and significantly reduced overfitting risk: Existing technologies have poor adaptability of external regularization and limited overfitting suppression effect. This invention adopts a dual mechanism of "built-in L2 regularization + DART integrated regularization": L2 regularization is embedded in the AKRR objective function to penalize excessively large regression coefficients to avoid fitting noise; the DART algorithm reduces the excessive dependence of a single model on data through random deactivation decision trees. The synergistic effect of the dual mechanisms reduces the overfitting risk of the model by 40% in high-dimensional data scenarios and significantly improves the prediction reliability of unknown real-time running data.
[0116] High optimization efficiency, no manual intervention required: Existing technologies involve time-consuming and labor-intensive manual parameter tuning, and single-mode optimization is prone to getting stuck in local optima. This invention uses Optuna-DART joint optimization. The Optuna framework uses the TPE algorithm to automatically explore the hyperparameter space of the entire module, while the DART algorithm simultaneously optimizes the AKRR model structure, eliminating the need for manual parameter range setting or algorithm logic adjustment. The optimization process only requires 100 iterations to approach the global optimum, improving optimization efficiency by more than 60% compared to traditional manual parameter tuning, and halving the model deployment cycle.
[0117] High computational efficiency, meeting real-time industrial needs: Traditional LSSVM requires solving Lagrange multipliers and KKT conditions, which is computationally time-consuming. The AKRR algorithm directly solves the regression coefficients through matrix operations, with a simple and efficient structure, improving computational efficiency by more than 30% compared to LSSVM. At the same time, clustered modeling reduces the amount of training data for a single model, further shortening the computation time. Ultimately, it achieves a prediction time of ≤0.1 seconds for a single real-time data point, fully meeting the industrial needs of real-time monitoring of thermal power units.
[0118] Example 3 This embodiment is a device embodiment, which can be used to execute the method embodiment of the present invention. For details not disclosed in the device embodiment, please refer to the method embodiment of the present invention.
[0119] In this embodiment, a thermal power unit heat rate prediction system is provided. This thermal power unit heat rate prediction system can be used to implement the above-mentioned thermal power unit heat rate prediction method. Specifically, the thermal power unit heat rate prediction system includes a data acquisition module, a joint optimization module, a feature partitioning module, a regularization module, and a prediction module.
[0120] The data acquisition module is used to acquire historical operating data of thermal power units and divide it into training and validation sets.
[0121] The joint optimization module is used to perform joint optimization of the training set using the Optuna framework and the DART algorithm to obtain adaptive weight factors and global hyperparameter combinations.
[0122] The feature partitioning module is used to divide the training set into clusters of similar working conditions based on the number of clusters in the global hyperparameter combination and by applying the K-medoids algorithm.
[0123] The regularization module is used to load the regularization coefficients in the combination of adaptive weight factors and global hyperparameters, and to build an AKRR prediction sub-model with built-in L2 regularization for each working condition cluster.
[0124] The prediction module is used to input the real-time running data to be predicted into the matching AKRR prediction sub-model and output the heat rate prediction result through matrix operations. When constructing the kernel matrix, the AKRR prediction sub-model uses an adaptive weighting factor to weight the Euclidean distance between samples.
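How the prediction module might select the "matching" sub-model can be sketched as follows. This assumes matching is done by nearest medoid in Euclidean distance, in line with how training samples are assigned to clusters. The function names are hypothetical, and each sub-model is represented as a plain callable.

```python
import math


def nearest_cluster(sample, medoids):
    """Return the index of the medoid closest (Euclidean distance) to the sample."""
    distances = [math.dist(sample, m) for m in medoids]
    return distances.index(min(distances))


def predict_heat_rate(sample, medoids, submodels):
    """Route a real-time sample to its cluster's sub-model and return that model's output."""
    cluster_id = nearest_cluster(sample, medoids)
    return submodels[cluster_id](sample)
```

In practice the sample would need the same preprocessing and feature scaling as the training data before the distance is computed.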
[0125] Example 4 This embodiment provides a terminal device, which includes a processor and a memory. The memory stores a computer program, which includes program instructions. The processor executes the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), or other general-purpose processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. It is the computing and control core of the terminal, suitable for implementing one or more instructions, specifically suitable for loading and executing one or more instructions to achieve corresponding method flows or corresponding functions. The processor described in this embodiment of the invention can be used to execute a thermal power unit heat rate prediction method, including: acquiring historical operating data of the thermal power unit and dividing it into a training set and a validation set; using the Optuna framework in conjunction with the DART algorithm to jointly optimize the training set and obtain adaptive weight factors and global hyperparameter combinations; based on the number of clusters in the global hyperparameter combinations, applying the K-medoids algorithm to divide the training set into operating condition clusters with similar features; loading the adaptive weight factors and regularization coefficients in the global hyperparameter combinations, and constructing an AKRR prediction sub-model with built-in L2 regularization for each operating condition cluster; inputting the real-time operating data to be predicted into the matched AKRR prediction sub-model, and outputting the heat rate prediction result through matrix operations; when constructing the kernel matrix, the AKRR prediction sub-model uses adaptive weight factors to weight the Euclidean distance between samples.
[0126] Example 5 This embodiment provides a computer-readable storage medium (Memory), which is a memory device in a terminal device used to store programs and data. It is understood that the computer-readable storage medium here can include both the built-in storage medium in the terminal device and extended storage media supported by the terminal device. The computer-readable storage medium provides storage space that stores the terminal's operating system. Furthermore, this storage space also stores one or more instructions suitable for loading and execution by a processor. These instructions can be one or more computer programs (including program code). It should be noted that the computer-readable storage medium here can be high-speed RAM or non-volatile memory, such as at least one disk storage device.
[0127] One or more instructions stored in a computer-readable storage medium can be loaded and executed by a processor to implement the corresponding steps of the thermal power unit heat rate prediction method in the above embodiments. One or more instructions in the computer-readable storage medium are loaded and executed by the processor in the following steps: acquiring historical operating data of thermal power units and dividing them into training and validation sets; using the Optuna framework in conjunction with the DART algorithm to jointly optimize the training set and obtain adaptive weight factors and global hyperparameter combinations; based on the number of clusters in the global hyperparameter combinations, applying the K-medoids algorithm to divide the training set into operating condition clusters with similar features; loading the adaptive weight factors and regularization coefficients in the global hyperparameter combinations, and constructing an AKRR prediction sub-model with built-in L2 regularization for each operating condition cluster; inputting the real-time operating data to be predicted into the matching AKRR prediction sub-model, and outputting the heat rate prediction result through matrix operations; when constructing the kernel matrix, the AKRR prediction sub-model uses adaptive weight factors to weight the Euclidean distance between samples.
[0128] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
[0129] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0130] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0131] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
[0132] The sequence numbers of the embodiments in this application are for descriptive purposes only and do not represent the superiority or inferiority of the embodiments.
[0133] In the above embodiments of this application, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0134] In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units can be a logical functional division, and in actual implementation, there may be other division methods. For instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the displayed or discussed mutual coupling, direct coupling, or communication connection may be through some interfaces; the indirect coupling or communication connection between units or modules may be electrical or other forms.
[0135] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0136] The above description is only a preferred embodiment of this application. It should be noted that for those skilled in the art, several improvements and modifications can be made without departing from the principle of this application, and these improvements and modifications should also be considered within the scope of protection of this application.
[0137] It should be understood that the above description is for illustrative purposes and not for limitation. Many embodiments and applications beyond the provided examples will be apparent to those skilled in the art upon reading the above description. Therefore, the scope of this patent should not be determined by reference to the above description, but rather by reference to the foregoing claims and the full scope of their equivalents. For purposes of completeness, all articles and references, including patent applications and publications, are incorporated herein by reference. The omission of any aspect of the subject matter disclosed herein in the foregoing claims is not intended as a waiver of that subject matter, nor should it be construed as an indication that the applicant has not considered that subject matter as part of the disclosed inventive subject matter.
Claims
1. A method for predicting the heat rate of thermal power units, characterized in that it includes the following processes: acquiring historical operating data of thermal power units and dividing it into a training set and a validation set; using the Optuna framework in conjunction with the DART algorithm to jointly optimize the training set and obtain adaptive weight factors and a global hyperparameter combination; based on the number of clusters in the global hyperparameter combination, applying the K-medoids algorithm to divide the training set into working condition clusters with similar features; loading the adaptive weight factors and the regularization coefficient in the global hyperparameter combination, and constructing an AKRR prediction sub-model with built-in L2 regularization for each working condition cluster; inputting the real-time operating data to be predicted into the matching AKRR prediction sub-model, and outputting the heat rate prediction result through matrix operations; wherein, when constructing the kernel matrix, the AKRR prediction sub-model uses the adaptive weight factors to weight the Euclidean distance between samples.
2. The method for predicting the heat consumption rate of thermal power units according to claim 1, characterized in that, Historical operating data includes unit load, main steam flow rate, main steam pressure, feedwater flow rate, feedwater temperature, and reheat steam pressure, as well as the corresponding actual heat rate values.
3. The method for predicting the heat consumption rate of thermal power units according to claim 1, characterized in that, The process of jointly optimizing the training set using the Optuna framework and the DART algorithm is as follows: The Optuna framework defines and searches the hyperparameter space based on the TPE sampling algorithm. This hyperparameter space covers the number of clusters, AKRR kernel width, L2 regularization coefficient, DART decision tree dropout probability, maximum tree depth, and learning rate. The DART algorithm is trained based on the parameters in the hyperparameter space, optimizes the model structure through a random deactivation mechanism, and calculates an adaptive weight factor that reflects the confidence of sample prediction. The adaptive weighting factor is passed to the kernel matrix building module of the AKRR prediction sub-model.
4. The method for predicting the heat consumption rate of thermal power units according to claim 1, characterized in that, The process of dividing the training set into clusters of similar working conditions using the K-medoids algorithm is as follows: The actual data points in the training set are used as cluster centers; The final cluster center is selected by iterative optimization, which minimizes the sum of the distances from all data points within the cluster to the cluster center. Based on the distance from the data point to the final cluster center, all training samples are assigned to the corresponding working condition clusters.
5. The method for predicting the heat consumption rate of thermal power units according to claim 1, characterized in that, The process of constructing an AKRR prediction sub-model with built-in L2 regularization is as follows: Kernel matrix construction: Calculate the Euclidean distance between two samples within the working condition cluster, multiply the Euclidean distance by an adaptive weighting factor, and then input it into the radial basis function to obtain the kernel matrix elements; Regression coefficient solution: Multiply the kernel matrix and the identity matrix by the regularization coefficient and add them to obtain the intermediate matrix. Invert the intermediate matrix and multiply the inversion result by the true value vector of the heat consumption rate of the operating condition cluster to obtain the regression coefficient vector.
6. The method for predicting the heat rate of thermal power units according to claim 5, characterized in that, The built-in L2 regularization is achieved by introducing a L2 norm constraint term for the regression coefficients into the objective function of the AKRR prediction sub-model. This objective function consists of the L2 norm term of the product of the kernel matrix and the regression coefficient vector minus the true value vector of the heat rate, and the L2 norm term of the regularization coefficient multiplied by the regression coefficient vector.
7. The method for predicting the heat rate of thermal power units according to claim 1, characterized in that, The process of jointly optimizing the training set using the Optuna framework and the DART algorithm also includes establishing a closed-loop feedback mechanism: In each round of optimization iteration, the root mean square error of the AKRR prediction sub-model is evaluated using the validation set; The root mean square error is fed back into the Optuna framework; The Optuna framework updates the probabilistic model based on feedback and adjusts the hyperparameter search direction for the next round until a preset number of iterations or an error threshold is reached.
8. A thermal power unit heat rate prediction system, characterized in that it comprises: a data acquisition module, used to acquire historical operating data of thermal power units and divide it into a training set and a validation set; a joint optimization module, used to jointly optimize the training set using the Optuna framework in conjunction with the DART algorithm and obtain adaptive weight factors and a global hyperparameter combination; a feature partitioning module, used to divide the training set into working condition clusters with similar features based on the number of clusters in the global hyperparameter combination and by applying the K-medoids algorithm; a regularization module, used to load the adaptive weight factors and the regularization coefficient in the global hyperparameter combination, and to construct an AKRR prediction sub-model with built-in L2 regularization for each working condition cluster; and a prediction module, used to input the real-time operating data to be predicted into the matching AKRR prediction sub-model and output the heat rate prediction result through matrix operations; wherein, when constructing the kernel matrix, the AKRR prediction sub-model uses the adaptive weight factors to weight the Euclidean distance between samples.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, When the processor executes the computer program, it implements the steps of the thermal power unit heat rate prediction method as described in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that, When the computer program is executed by the processor, it implements the steps of the thermal power unit heat rate prediction method as described in any one of claims 1 to 7.