Energy-based intelligent regulation method and system for thermal power production parameters
By employing an intelligent control method that combines multi-scale feature extraction, capacity gating fusion, and activation value clipping, the problems of low control accuracy and model instability in traditional thermal power production control have been solved. This method achieves high-precision and smooth control parameter prediction, ensuring equipment safety and production efficiency.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SOUTH CHINA INST OF ENVIRONMENTAL SCI MEP
- Filing Date
- 2026-02-11
- Publication Date
- 2026-06-12
Abstract
Description
Technical Field
[0001] This application belongs to the field of intelligent control, and in particular relates to an intelligent control method and system for thermal power production parameters based on production capacity.
Background Technology
[0002] Traditional control methods, whether manual or using PID control, suffer from low precision and low energy efficiency. Uncovering underlying operational patterns from historical data can assist or even replace manual decision-making, thereby achieving more refined operational optimization while ensuring safe production. The cogeneration process involves variations at different time scales, such as minute-level disturbances, hourly periodic load fluctuations, and longer-term equipment start-up and shutdown effects. Existing time series models have limited ability to capture such long-term, multi-scale dependencies and easily overlook key periodic features. The core objective of production control is to meet specific capacity plans, but many models fail to integrate the key command information about capacity with the process time-series features and merely treat this information as an ordinary input feature. This results in insufficient responsiveness to, and guidance by, capacity targets. Furthermore, the collected data often contains noise and outliers, which can destabilize model training and cause severe fluctuations or overshoot in the predicted control parameter sequences. In practical applications, such predictions are not only hard to execute but may also threaten equipment safety. Therefore, how to design an intelligent control method that can extract multi-scale time-series features, deeply integrate production capacity targets, and ensure stable and smooth prediction results is a technical problem that urgently needs to be solved in the field.
Summary of the Invention
[0003] In response to the problems mentioned in the background art, this invention proposes, in a first aspect, a method for intelligent control of thermal power production parameters based on production capacity, comprising the following steps: Acquire historical operating data and planned capacity data of thermal power production, process the historical operating data into multi-dimensional time-series features, and process the planned capacity data into a capacity feature vector; The multi-dimensional temporal features are input into a multi-scale feature extraction module. The module contains parallel multi-path causal dilated convolution branches. Each branch uses a different and non-equal dilation rate to detect temporal dependencies at different periods. The output feature maps of each branch are concatenated along the channel dimension to obtain multi-scale temporal features. The capacity-gated fusion unit generates channel attention weights using the capacity feature vector and applies the weights to each feature channel of the multi-scale temporal features to obtain fused features. The fused features are then input into stacked residual modules for deep feature learning, and activation value clipping is performed after the nonlinear activation layer of each residual module. Based on the output of the residual module, the control parameters for thermal power production at future times are predicted, and the model is trained using a composite loss function that includes a mean squared error term and a time gradient penalty term. The mean squared error term is used to represent the difference between the predicted parameters and the true parameters, and the time gradient penalty term is used to smooth the predicted control parameter sequence.
[0004] Optionally, processing the historical operational data into multi-dimensional time-series features and processing the planned capacity data into a capacity feature vector includes: The collected steam pressure, steam temperature, feedwater flow rate, fuel consumption, and actual power generation at the current moment are used as dimensions to constitute the multidimensional time series features. The time series representing the planned power generation within a preset future time period is mapped to a capacity feature vector of a preset dimension through a fully connected layer.
[0005] Optionally, the step of inputting the multidimensional temporal features into a multi-scale feature extraction module, the module comprising parallel multi-path causal dilated convolution branches, each branch employing different and non-equal dilation rates to detect temporal dependencies at different periods, includes: The multi-scale feature extraction module contains parallel multi-path causal dilated convolution branches, and the dilation rates used by each branch form a geometric sequence with a common ratio greater than 1.
[0006] Optionally, the step of generating channel attention weights using the capacity feature vector through the capacity gating fusion unit includes: The capacity feature vector is input into a network consisting of two fully connected layers. The first fully connected layer is followed by a rectified linear unit (ReLU) activation function, and the second fully connected layer is followed by a sigmoid activation function. The number of output channel attention weights is equal to the number of channels of the multi-scale temporal feature.
[0007] Optionally, the step of inputting the fused features into stacked residual modules for deep feature learning includes: The fused features are sequentially passed through a predetermined number of stacked residual modules; Each residual module consists of two one-dimensional convolutional layers, each followed by a batch normalization layer and a ReLU activation layer. After the second ReLU activation layer, the module's input is added element-wise.
[0008] Optionally, the activation value clipping process performed after the nonlinear activation layer of each residual module includes: Initialize a global clipping threshold; During the model training phase, the preset high percentile of the activation values output by the activation layer for all data in the current training batch is calculated, and the preset high percentile is fused into the global clipping threshold using a momentum update strategy. During the model inference phase, the global clipping threshold after training is read directly, and the elements in the activation values that are greater than the global clipping threshold are reset to the values corresponding to the global clipping threshold.
[0009] In the second aspect, this invention proposes an intelligent control system for thermal power production parameters based on production capacity, comprising the following modules: The acquisition module is used to acquire historical operating data and planned capacity data of thermal power production, process the historical operating data into multi-dimensional time-series features, and process the planned capacity data into a capacity feature vector. The input module is used to input the multi-dimensional temporal features into the multi-scale feature extraction module. The module includes parallel multi-path causal dilated convolution branches. Each branch uses a different and non-equal dilation rate to detect temporal dependencies at different periods. The output feature maps of each branch are concatenated along the channel dimension to obtain multi-scale temporal features. The calculation module is used to generate channel attention weights using the capacity feature vector through the capacity gating fusion unit, and apply the weights to each feature channel of the multi-scale time series features to obtain fused features; the fused features are input to stacked residual modules for deep feature learning, and activation value clipping is performed after the nonlinear activation layer of each residual module; The output module is used to predict the control parameters of thermal power production at future times based on the output of the residual module, and to train the model using a composite loss function that includes a mean square error term and a time gradient penalty term. The mean square error term is used to represent the difference between the predicted parameters and the true parameters, and the time gradient penalty term is used to smooth the predicted control parameter sequence.
[0010] Preferably, the step of processing the historical operating data into multi-dimensional time-series features and processing the planned capacity data into a capacity feature vector includes: The collected steam pressure, steam temperature, feedwater flow rate, fuel consumption, and actual power generation at the current moment are used as dimensions to constitute the multidimensional time series features. The time series representing the planned power generation within a preset future time period is mapped to a capacity feature vector of a preset dimension through a fully connected layer.
[0011] Preferably, the step of inputting the multidimensional temporal features into a multi-scale feature extraction module, the module comprising parallel multi-path causal dilated convolution branches, each branch employing different and non-equal dilation rates to detect temporal dependencies at different periods, includes: The multi-scale feature extraction module contains parallel multi-path causal dilated convolution branches, and the dilation rates used by each branch form a geometric sequence with a common ratio greater than 1.
[0012] Preferably, the step of generating channel attention weights using the capacity feature vector through the capacity gating fusion unit includes: The capacity feature vector is input into a network consisting of two fully connected layers. The first fully connected layer is followed by a rectified linear unit (ReLU) activation function, and the second fully connected layer is followed by a sigmoid activation function. The number of output channel attention weights is equal to the number of channels of the multi-scale temporal feature.
[0013] Preferably, the step of inputting the fused features into stacked residual modules for deep feature learning includes: The fused features are sequentially passed through a predetermined number of stacked residual modules; Each residual module consists of two one-dimensional convolutional layers, each followed by a batch normalization layer and a ReLU activation layer. After the second ReLU activation layer, the module's input is added element-wise.
[0014] Preferably, the activation value clipping process performed after the nonlinear activation layer of each residual module includes: After the last nonlinear activation layer of each residual module, the preset high percentile of the activation values output by the activation layer for all data in the current training batch is calculated, and the elements in the activation values that are greater than the preset high percentile are reset to the values corresponding to the preset high percentile.
[0015] This invention uses causal convolutional branches with multiple unequal dilation rates to capture complex temporal dependencies at different time scales in the thermal power production process. Through a capacity-gated fusion unit, planned capacity information is integrated into the temporal features, improving the model's prediction accuracy under different production loads. An activation value clipping mechanism is employed to suppress interference from data noise or outliers during model training. Optimizing the model with a composite loss function that combines mean squared error and a temporal gradient penalty keeps the predicted parameters accurate while keeping the output control parameter sequence smooth, avoiding drastic parameter fluctuations. This, in turn, helps ensure the safe and stable operation of production equipment and improves overall production efficiency.
Attached Figure Description
[0016] Figure 1 is a flowchart of specific embodiment one.
Detailed Implementation
[0017] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0018] Referring to Figure 1, the flowchart of the specific embodiment includes the following steps: S1, acquire historical operating data and planned capacity data of thermal power production, process the historical operating data into multi-dimensional time-series features, and process the planned capacity data into a capacity feature vector. Historical operating data, such as boiler main steam pressure, main steam temperature, current actual power generation of the unit, and fuel consumption, are read from the Distributed Control System (DCS) and Manufacturing Execution System (MES) databases. Planned power generation time-series curves for a future period are also retrieved from the Enterprise Resource Planning (ERP) system as planned capacity data. The historical operating data is preprocessed by using Lagrange interpolation to fill missing values, applying the 3σ criterion to remove outliers, and performing min-max normalization on all data to scale it to the range of zero to one. A sliding window method is used to construct fixed-length sequences from the processed historical data, for example, using every 288 consecutive time points as one sample, forming a three-dimensional tensor of shape (batch size, sequence length, number of features) as the multi-dimensional time-series features. For the planned capacity data, the normalized planned capacity sequence is mapped through a feature extraction layer to a high-dimensional vector that contains information on the rate and trend of load changes, yielding a capacity feature vector whose dimension allows it to be fused with the multi-dimensional time-series features.
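The preprocessing chain in S1 can be illustrated with a minimal sketch for a single feature column. All function names are hypothetical, the sample series is invented, and linear interpolation is swapped in for the Lagrange interpolation named in the text to keep the example short. For brevity, outliers are masked first and then filled together with any missing values, which is an assumption about ordering rather than the embodiment's exact procedure.

```python
from statistics import mean, pstdev

def mask_outliers_3sigma(values):
    """Replace points more than 3 standard deviations from the mean with None."""
    mu, sigma = mean(values), pstdev(values)
    return [None if sigma > 0 and abs(v - mu) > 3 * sigma else v for v in values]

def fill_linear(values):
    """Fill None gaps by linear interpolation between the nearest valid neighbours.

    Simplified stand-in for Lagrange interpolation; assumes at least one valid point.
    """
    valid = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = (1 - w) * values[left] + w * values[right]
    return filled

def min_max_scale(values):
    """Scale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def sliding_windows(rows, length, stride=1):
    """Cut a series into fixed-length samples."""
    return [rows[s:s + length] for s in range(0, len(rows) - length + 1, stride)]

# Hypothetical series: a ramp with one spike at index 10
raw = [float(i) for i in range(20)]
raw[10] = 500.0
masked = mask_outliers_3sigma(raw)
filled = fill_linear(masked)
scaled = min_max_scale(filled)
windows = sliding_windows(scaled, length=5)  # the text uses 288 as an example length
```

In a multi-feature setting, each column would be processed this way and the columns zipped into rows before windowing.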
[0019] In an optional embodiment, processing the historical operational data into multi-dimensional time-series features and processing the planned capacity data into a capacity feature vector includes: The collected steam pressure, steam temperature, feedwater flow rate, fuel consumption, and actual power generation at the current moment are used as dimensions to constitute the multidimensional time series features. The time series representing the planned power generation within a preset future time period is mapped to a capacity feature vector of a preset dimension through a fully connected layer.
[0020] Multiple historical operating parameters collected are constructed into a multi-dimensional time series matrix. For example, within each time step, five key parameters of the unit are collected: steam pressure, steam temperature, feedwater flow rate, fuel consumption, and current actual power generation. If the historical series length is 100 time steps, a matrix of size 100 rows and 5 columns is formed. The current actual power generation constitutes the feedback signal for the control loop, enabling the model to perceive the deviation between the current state and the target state.
[0021] The planned capacity data, representing future production tasks, is serialized. Planned capacity is a time series containing future trends, such as the planned load value per minute within the next hour, totaling 60 points. This series is input into a fully connected layer, which maps this series, containing ramp rate and target load information, to a high-dimensional vector space, such as 64-dimensional. The capacity feature vector represents the future load change trajectory, such as a sharp ramp or a smooth transition.
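As a rough illustration of this mapping, the sketch below applies one fully connected layer to a 60-point plan and produces a 64-dimensional vector, matching the sizes quoted above. The weights are random placeholders and the ramp-shaped plan is a made-up example; in the described method the weights would be learned during training.

```python
import random

def linear(x, weight, bias):
    """Fully connected layer: y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

rng = random.Random(0)
plan_len, feat_dim = 60, 64  # sizes taken from the example in the text

# Placeholder (untrained) parameters
weight = [[rng.uniform(-0.1, 0.1) for _ in range(plan_len)] for _ in range(feat_dim)]
bias = [0.0] * feat_dim

# Hypothetical normalized plan: load ramps from 0.5 to 0.8 over the next hour
planned_load = [0.5 + 0.3 * t / (plan_len - 1) for t in range(plan_len)]
capacity_vec = linear(planned_load, weight, bias)
```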
[0022] The model is a multi-input, single-output deep temporal convolutional neural network, consisting of three concatenated parts: a multi-scale dilated convolutional module for extracting temporal dependencies, a channel attention module for adjusting feature weights using production capacity information, and a stacked residual network, such as ResNet, for deep fitting. The model's input is dual-source heterogeneous data: one is historical multi-dimensional time-series data containing steam pressure, temperature, flow rate, consumption, and power, reflecting the system's past and present state inertia; the other is a planned production capacity feature vector, representing the system's future production goals and constraints. The model's output is a sequence of future thermal power production control parameters corresponding to the input time series. The training samples are a supervised learning dataset constructed based on historical archived data. During training, the model predicts results using the input part and compares them with the labeled part, i.e., the true parameters, to calculate the mean squared error and gradient penalty, thereby adjusting the network weights.
[0023] S2, the multi-dimensional temporal features are input into the multi-scale feature extraction module. The module contains parallel multi-path causal dilated convolution branches. Each branch uses a different and non-equal dilation rate to detect temporal dependencies in different periods. The output feature maps of each branch are concatenated along the channel dimension to obtain multi-scale temporal features. A parallel three-branch network structure is established, with each branch being a one-dimensional causal dilated convolutional layer. To achieve causality, the beginning of the input sequence is asymmetrically padded before the convolution operation, ensuring that the output at any given time depends only on information from the current and past time points. The dilation rates for the three branches are set to 1, 2, and 5, respectively. The convolutional kernel of the first branch samples consecutive points of the input sequence, the convolutional kernel of the second branch samples every second time point, and the convolutional kernel of the third branch samples every fifth time point, skipping four points between taps. The outputs of the three branches, which are tensors with the same shape but different feature representations, are merged along the dimension representing the feature channels. For example, three output tensors with 64 channels each are concatenated into a tensor with 192 channels, which serves as the multi-scale temporal feature.
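The branch structure in S2 can be sketched on a single input channel as follows. This is a simplified illustration, not the embodiment itself: each branch here produces one channel instead of 64, the kernel of size 3 with all-ones weights is an assumption chosen so the sampling pattern is easy to see, and the function names are hypothetical.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """1-D causal dilated convolution on a single channel.

    The input is left-padded with zeros, so output[t] depends only on
    x[t], x[t - dilation], x[t - 2 * dilation], ...
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    out = []
    for t in range(len(x)):
        # kernel[-1] multiplies the current sample, kernel[0] the oldest one
        out.append(sum(kernel[j] * padded[t + j * dilation] for j in range(k)))
    return out

def multi_scale_features(x, kernel, dilations=(1, 2, 5)):
    """Run parallel branches and stack their outputs along the channel axis."""
    return [causal_dilated_conv1d(x, kernel, d) for d in dilations]

# Unit impulse at t = 3 shows which time points each branch looks at
x = [0.0] * 14
x[3] = 1.0
branches = multi_scale_features(x, kernel=[1.0, 1.0, 1.0])
taps = [[t for t, v in enumerate(ch) if v != 0.0] for ch in branches]
```

With this impulse input, the non-zero outputs of each branch sit at multiples of its dilation rate after t = 3, and nothing appears before t = 3, which is the causality property described above.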
[0024] In an optional embodiment, the step of inputting the multidimensional temporal features into a multi-scale feature extraction module, the module comprising parallel multi-path causal dilated convolution branches, each branch employing different and non-equal dilation rates to detect temporal dependencies at different periods, includes: The multi-scale feature extraction module contains parallel multi-path causal dilated convolution branches, and the dilation rates used by each branch form a geometric sequence with a common ratio greater than 1.
[0025] The multi-scale feature extraction module is structured to process multi-dimensional temporal features in parallel. This module contains multiple independent convolutional branches, each a causal dilated one-dimensional convolutional layer. The causal nature ensures that when predicting data at any given time point, only historical information prior to that point is used, avoiding the leakage of future information. Dilated convolution expands the receptive field by inserting holes between convolutional kernel elements, thereby detecting a wider range of temporal dependencies without increasing computational cost.
[0026] The dilation rate of each parallel branch follows a geometric progression with a common ratio greater than 1. For example, with three branches, the dilation rates are 1, 2, and 4. The branch with a dilation rate of 1 is equivalent to standard convolution and is used to extract local, short-term temporal features. The branch with a dilation rate of 2 has a larger receptive field and can detect patterns over medium time spans. The branch with a dilation rate of 4 has the largest receptive field and focuses on extracting long-term trends and periodic features. After all branches have been processed, the output feature maps are concatenated along the channel dimension to obtain an enhanced feature representation that integrates multiple time scales.
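The growth of the receptive field with the dilation rate can be made concrete with a short worked calculation. The kernel size $k$ is not stated in the text; $k = 3$ below is an assumption used only for illustration.

```latex
% Receptive field of a single dilated convolution layer with kernel size k and dilation rate d
R(d) = (k - 1)\,d + 1
% Assuming k = 3 and the dilation rates 1, 2, 4 from the example:
R(1) = 3, \qquad R(2) = 5, \qquad R(4) = 9
```

Under this assumption, each doubling of the dilation rate roughly doubles the time span covered by a branch while the number of kernel weights stays at three.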
[0027] S3, through the capacity-gated fusion unit, channel attention weights are generated using the capacity feature vector, and the weights are applied to each feature channel of the multi-scale temporal features to obtain fused features; the fused features are input to stacked residual modules for deep feature learning, and activation value clipping is performed after the nonlinear activation layer of each residual module; In one embodiment, the production capacity feature vector is input into a miniature neural network consisting of two fully connected layers. The output dimension of the first fully connected layer is one-quarter of the number of channels in the multi-scale temporal feature and uses the ReLU activation function. The output dimension of the second fully connected layer is equal to the number of channels in the multi-scale temporal feature. The output of the second fully connected layer is passed through a Sigmoid activation function to compress the numerical range of the output to between zero and one, generating an attention weight vector representing the importance of each channel. This weight vector is broadcast in the time dimension to match the shape of the multi-scale temporal feature. The two are multiplied element-wise to weight different feature channels, thereby modulating the temporal feature with production capacity information and obtaining a fused feature.
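The gating step described above can be sketched in plain Python as follows. The sizes (8 channels, a 4-dimensional capacity vector, 5 time steps) and the random weights are illustrative assumptions; only the structure (hidden width equal to one quarter of the channel count, ReLU, then Sigmoid, then a per-channel multiplication broadcast over time) follows the text.

```python
import math
import random

def linear(x, weight, bias):
    """Fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def capacity_gate(capacity_vec, w1, b1, w2, b2):
    """Two fully connected layers: ReLU after the first, Sigmoid after the second."""
    hidden = [max(0.0, h) for h in linear(capacity_vec, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(hidden, w2, b2)]

def apply_gate(features, gate):
    """Multiply every time step of channel c by gate[c] (broadcast over time)."""
    return [[g * v for v in channel] for channel, g in zip(features, gate)]

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

channels, cap_dim, steps = 8, 4, 5          # illustrative sizes
hidden_dim = channels // 4                  # one quarter of the channel count
w1, b1 = rand_matrix(hidden_dim, cap_dim), [0.0] * hidden_dim
w2, b2 = rand_matrix(channels, hidden_dim), [0.0] * channels

capacity_vec = [rng.uniform(0.0, 1.0) for _ in range(cap_dim)]
features = rand_matrix(channels, steps)     # channels x time
gate = capacity_gate(capacity_vec, w1, b1, w2, b2)
fused = apply_gate(features, gate)
```

Because the gate depends only on the capacity vector, the same per-channel weights apply to every time step of a sample, so the planned load modulates which feature channels dominate rather than which time points do.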
[0028] The fused features are sequentially passed through five stacked residual modules. Each residual module contains two one-dimensional convolutional layers, two layer normalization layers, and two ReLU nonlinear activation layers, with a cross-layer connection adding the module's input to its output. After each ReLU activation function, a stability clipping process based on global statistics is performed: a global clipping threshold is initialized; during training, the 99th percentile of the current batch of activation values is calculated, and the global clipping threshold is smoothly updated with it; during inference, the global clipping threshold fixed at the end of training is used directly to cap the values of the output tensor. This approach suppresses gradient explosion caused by extreme activation values while keeping the model's behavior complete and consistent during single-sample inference.
[0029] In yet another optional embodiment, the step of generating channel attention weights using the capacity feature vector through the capacity gating fusion unit includes: The capacity feature vector is input into a network consisting of two fully connected layers. The first fully connected layer is followed by a rectified linear unit (ReLU) activation function, and the second fully connected layer is followed by a sigmoid activation function. The number of output channel attention weights is equal to the number of channels of the multi-scale temporal feature.
[0030] The fusion process begins by inputting the capacity feature vector generated in the preceding steps into a small neural network. This network consists of two fully connected layers cascaded together. The first fully connected layer performs a linear transformation on the capacity feature vector, with the output dimension typically smaller than the input dimension, effectively compressing the information. The transformed result is then passed through a ReLU activation function, which sets all negative values to zero. This non-linearity gives the network the expressive power to learn the complex relationship between capacity information and time-series characteristics.
[0031] The activated features are fed into a second fully connected layer. This layer maps the feature dimension to a dimension with the exact same number of channels as the multi-scale temporal feature. For example, if the multi-scale temporal feature has 128 channels, the output of the second fully connected layer is also a 128-dimensional vector. This vector is then passed through a sigmoid activation function, which compresses the value of each element to the range of 0 to 1. The output vector is the channel attention weight, where each element corresponds to the weight of a temporal feature channel, used for subsequent weighting of temporal features at different scales to achieve feature selection based on production capacity targets.
[0032] In yet another optional embodiment, the step of inputting the fused features into stacked residual modules for deep feature learning includes: The fused features are sequentially passed through a predetermined number of stacked residual modules; Each residual module consists of two one-dimensional convolutional layers, each followed by a batch normalization layer and a rectified linear unit (ReLU) activation layer. After the second ReLU activation layer, the module's input is added element-wise.
[0033] To learn deeper abstract representations from the fused features, the features are fed into a deep network structure consisting of multiple residual modules stacked sequentially. The data flows through each residual module in turn. For example, with four stacked residual modules, the output of the first module becomes the input of the second module, and so on, achieving layer-by-layer feature extraction.
[0034] Each residual module contains a main path and a shortcut connection. On the main path, the input features pass through a one-dimensional convolutional layer to further extract local patterns over time. The output of the convolutional layer is passed through a batch normalization layer to accelerate training convergence and improve model stability, and then through a rectified linear unit (ReLU) activation layer for a non-linear transformation. This combination of convolution, normalization, and activation is repeated once. At the end of the main path, after the second ReLU activation layer, the output is added element-wise to the initial input features of the module. The shortcut connection allows the network to easily learn identity mappings, mitigating the vanishing gradient problem in deep networks.
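The data flow through one residual module can be sketched on a single channel. Several simplifications are assumed here and are not taken from the text: one channel instead of many, a kernel size of 3 with zero padding on both sides, fixed example kernels, and standardization over the sequence as a stand-in for batch normalization (no learned scale or shift).

```python
from statistics import mean, pstdev

def conv1d_same(x, kernel):
    """1-D convolution with zero padding so the output length equals the input length.

    Assumes an odd kernel length.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[t + j] for j, k in enumerate(kernel)) for t in range(len(x))]

def normalize(x, eps=1e-5):
    """Standardize to zero mean and unit variance (simplified stand-in for batch norm)."""
    mu, sigma = mean(x), pstdev(x)
    return [(v - mu) / (sigma ** 2 + eps) ** 0.5 for v in x]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x, kernel1, kernel2):
    """conv -> norm -> ReLU, repeated once, then add the block input (shortcut)."""
    h = relu(normalize(conv1d_same(x, kernel1)))
    h = relu(normalize(conv1d_same(h, kernel2)))
    return [a + b for a, b in zip(h, x)]

x = [0.5, -1.0, 2.0, 0.0, 1.5]
single = residual_block(x, [0.2, 0.5, 0.2], [-0.3, 0.6, 0.1])

# Stacking: the output of one module is the input of the next (four modules here)
stacked = x
for _ in range(4):
    stacked = residual_block(stacked, [0.2, 0.5, 0.2], [-0.3, 0.6, 0.1])

# With all-zero kernels the main path contributes nothing and the block is an identity mapping
identity = residual_block(x, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```

Because the addition happens after the final ReLU, the main path can only add non-negative values to the input in this arrangement.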
[0035] In an optional embodiment, performing activation value clipping after the nonlinear activation layer of each residual module includes: Initialize a global clipping threshold; During the model training phase, the preset high percentile of the activation values output by the activation layer for all data in the current training batch is calculated, and the preset high percentile is fused into the global clipping threshold using a momentum update strategy. During the model inference phase, the global clipping threshold after training is read directly, and the elements in the activation values that are greater than the global clipping threshold are reset to the values corresponding to the global clipping threshold.
[0036] Clipping is performed at the end of each residual module, immediately following the second ReLU activation layer, to keep the model numerically stable. In a single training iteration, all activation values generated after all samples in the current batch pass through this activation layer are collected. Assuming a batch size of 32, a feature map with 100 time steps, and 64 channels, a total of 32 × 100 × 64 = 204,800 activation values will be collected.
[0037] Based on the collected activation values, a high percentile is calculated, for example, the 99.5th percentile, assumed to be 20.5. The global clipping threshold is then smoothly updated using a momentum update formula, for example: global threshold = 0.9 × old global threshold + 0.1 × 20.5. The updated global threshold is then used to impose an upper limit on the current activation values. This not only prevents gradient explosion caused by individual outliers but also records the overall statistical distribution of the training data, thus providing a fixed cutoff standard for the single-sample inference stage.
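The momentum update and the train/inference split can be sketched as below. The class name, the nearest-rank percentile definition, the initial threshold of 18.0, and the sample activations are all assumptions for illustration; the momentum of 0.9 and the batch percentile value of 20.5 follow the worked example above.

```python
import math

def percentile_nearest_rank(values, p):
    """p-th percentile using the nearest-rank definition (one of several conventions)."""
    ordered = sorted(values)
    # round() guards against floating-point noise such as 199.00000000000003
    rank = max(1, math.ceil(round(p / 100 * len(ordered), 9)))
    return ordered[rank - 1]

class PercentileClipper:
    """Clips activations at a global threshold tracked with a momentum update."""

    def __init__(self, percentile=99.5, momentum=0.9, init_threshold=1.0):
        self.percentile = percentile
        self.momentum = momentum
        self.threshold = init_threshold

    def __call__(self, activations, training):
        if training:
            batch_p = percentile_nearest_rank(activations, self.percentile)
            self.threshold = (self.momentum * self.threshold
                              + (1.0 - self.momentum) * batch_p)
        # In inference the stored threshold is used unchanged
        return [min(a, self.threshold) for a in activations]

# Hypothetical flattened batch: 198 ordinary values, one at 20.5, one extreme outlier
batch = [1.0] * 198 + [20.5, 1000.0]
clipper = PercentileClipper(percentile=99.5, momentum=0.9, init_threshold=18.0)
clipped_train = clipper(batch, training=True)    # threshold becomes about 0.9*18.0 + 0.1*20.5
clipped_infer = clipper([5.0, 50.0], training=False)
```

Keeping the threshold as stored state is what lets a single sample be clipped at inference time, when no batch statistics are available.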
[0038] S4. Based on the output of the residual module, predict the thermal power production control parameters at future times, and use a composite loss function including a mean square error term and a time gradient penalty term for model training. The mean square error term is used to represent the difference between the predicted parameters and the true parameters, and the time gradient penalty term is used to smooth the predicted control parameter sequence.
[0039] The output features of the last residual module are passed through a global average pooling layer, followed by a fully connected layer, to map the feature dimension to the number of control parameters to be predicted multiplied by the number of future time steps. For example, if 5 control parameters are predicted for the next 24 time points, the output dimension will be 120. During model training, the total loss is calculated and then backpropagated. Optionally, the total loss equals the mean squared error loss plus $\lambda$ multiplied by the time gradient penalty loss, where the mean squared error loss is the mean of the squares of the differences between the predicted and actual control parameter sequences, and the time gradient penalty loss is the mean of the squares of the differences between adjacent time points in the predicted control parameter sequence. The hyperparameter $\lambda$ balances prediction accuracy and sequence smoothness and is set, for example, to 0.1.
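The output head can be illustrated with a small sketch: pool each channel over time, apply one fully connected layer, and reshape the 120 outputs into 24 time steps of 5 parameters. The channel count, the random weights, and the time-major ordering of the flat output are assumptions, since the text does not specify them.

```python
import random

def global_average_pool(features):
    """Average each channel over the time axis: (channels x time) -> (channels,)."""
    return [sum(channel) / len(channel) for channel in features]

def prediction_head(features, weight, bias, horizon, n_params):
    """Pool, apply a fully connected layer, and reshape to (horizon x n_params)."""
    pooled = global_average_pool(features)
    flat = [sum(w * v for w, v in zip(row, pooled)) + b for row, b in zip(weight, bias)]
    # Assumed layout: consecutive groups of n_params values belong to one future time step
    return [flat[t * n_params:(t + 1) * n_params] for t in range(horizon)]

rng = random.Random(0)
channels, steps = 16, 100        # illustrative feature-map size
horizon, n_params = 24, 5        # values from the example in the text
out_dim = horizon * n_params

features = [[rng.uniform(-1.0, 1.0) for _ in range(steps)] for _ in range(channels)]
weight = [[rng.uniform(-0.1, 0.1) for _ in range(channels)] for _ in range(out_dim)]
bias = [0.0] * out_dim
pred = prediction_head(features, weight, bias, horizon, n_params)
```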
[0040] In an optional embodiment, the composite loss function L is calculated using the following formula:
$$L = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\hat{y}_{i,t} - y_{i,t}\right)^2 + \lambda \cdot \frac{1}{N(T-1)}\sum_{i=1}^{N}\sum_{t=2}^{T}\left(\hat{y}_{i,t} - \hat{y}_{i,t-1}\right)^2$$
where $N$ is the batch size, $T$ is the predicted sequence length, $\hat{y}_{i,t}$ is the predicted value of the i-th sample at time t, $y_{i,t}$ is the corresponding true value, and $\lambda$ is a hyperparameter used to balance the mean square error term and the time gradient penalty term.
[0041] This composite loss function consists of two parts, used to guide parameter optimization during model training. The mean squared error term, i.e. $\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left(\hat{y}_{i,t} - y_{i,t}\right)^2$, calculates the average of the squared differences between the predicted and actual values for all samples at all prediction time points within a training batch. Its main function is to measure the accuracy of predictions, driving the model to minimize the gap between the predicted results and the actual observations, and it is a primary evaluation metric for the model's predictive ability.
[0042] The time gradient penalty term, i.e. $\frac{1}{N(T-1)}\sum_{i=1}^{N}\sum_{t=2}^{T}\left(\hat{y}_{i,t} - \hat{y}_{i,t-1}\right)^2$, calculates the average of the squared changes in predicted values between adjacent time points in the prediction sequence. It penalizes overly abrupt or uneven fluctuations in the prediction sequence. In industrial process forecasting, changes in physical quantities are typically continuous and smooth; introducing this term makes the model's predictions more consistent with physical laws, avoiding meaningless spikes or jitter. The hyperparameter $\lambda$ adjusts the weight of the penalty term in the total loss; by adjusting $\lambda$, a trade-off can be struck between prediction accuracy and smoothness.
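A small numeric sketch may help show how the two terms interact. The function below computes the mean squared error, the time gradient penalty (averaged over adjacent pairs), and their weighted sum; the name `composite_loss`, the argument `lam` for the weighting hyperparameter, and the toy sequences are illustrative assumptions.

```python
def composite_loss(pred, target, lam=0.1):
    """Return (total, mse, gradient_penalty) for N sequences of length T.

    pred and target are lists of N lists, each holding T values.
    """
    n, t_len = len(pred), len(pred[0])
    mse = sum((p - y) ** 2
              for ps, ys in zip(pred, target)
              for p, y in zip(ps, ys)) / (n * t_len)
    penalty = sum((ps[t] - ps[t - 1]) ** 2
                  for ps in pred
                  for t in range(1, t_len)) / (n * (t_len - 1))
    return mse + lam * penalty, mse, penalty

target = [[1.0, 1.0, 1.0, 1.0]]
jittery = [[2.0, 0.0, 2.0, 0.0]]   # oscillates around the target
offset = [[2.0, 2.0, 2.0, 2.0]]    # same squared error, but smooth

loss_jittery, mse_jittery, pen_jittery = composite_loss(jittery, target, lam=0.1)
loss_offset, mse_offset, pen_offset = composite_loss(offset, target, lam=0.1)
```

Both toy predictions have the same mean squared error, but only the oscillating one is charged a penalty, so the composite loss prefers the smooth sequence even though plain MSE cannot tell them apart.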
[0043] The experiment used a real SCADA historical operation dataset from a thermal power plant, spanning 12 months with a sampling interval of 1 minute. The dataset included steam pressure, temperature, feedwater flow rate, fuel consumption, actual power, and the corresponding planned future power generation. The dataset was divided into training, validation, and test sets in an 8:1:1 ratio. The experiment was conducted on a single NVIDIA RTX 3090 GPU using the PyTorch framework, with the Adam optimizer and an initial learning rate of 0.001. The evaluation metrics were Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Smoothness Index (SI). To verify the contribution of each module, three comparative models were set up: Model A, which removed the capacity-gated fusion unit, directly concatenated the capacity features with the time-series features, and did not use an attention mechanism; Model B, which removed the activation value clipping from the residual modules and used standard ReLU; and Model C, which removed the time gradient penalty term from the loss function and used only the MSE loss. Table 1 shows the specific performance data of each model on the test set.
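The 8:1:1 split and two of the evaluation metrics named above can be sketched as follows. The function names are hypothetical, and the split is assumed to be chronological (no shuffling), which the description does not state. The Smoothness Index is omitted because its formula is not given in this passage.

```python
import math


def split_8_1_1(samples):
    """Split an ordered sequence into train/validation/test parts (8:1:1).

    Assumption: chronological split without shuffling.
    """
    n = len(samples)
    n_train = n * 8 // 10
    n_val = n // 10
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


def rmse(pred, true):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def mae(pred, true):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)
```

Because RMSE squares the errors before averaging, it weights large deviations more heavily than MAE, which is one reason the two metrics are usually reported together.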
[0044] Table 1 (performance of each model on the test set). It can be seen that this invention achieves the best results in both RMSE and MAE. Compared with Model A, the introduction of the capacity-gated fusion unit reduces MAE by approximately 21%, demonstrating that channel attention weights generated from the planned capacity vector can guide the model to focus on key feature channels under different load conditions, such as focusing more on feedwater flow under high load, which is more adaptable than directly concatenating features. Although the activation value clipping strategy has no impact on the number of parameters, it avoids overfitting to noise by suppressing abnormal activation values, further lowering RMSE by approximately 0.27. Although Model C and this invention differ little in prediction accuracy, they differ significantly in smoothness index. Model C has an SI as high as 1.56, indicating that its predicted control parameters exhibit jitter; in actual control, such high-frequency fluctuations can cause severe wear of actuators. In contrast, this invention, by introducing the time gradient penalty term, reduces SI to 0.45 and outputs a smoother control curve that meets the requirements of actual engineering execution while maintaining prediction accuracy. This verifies the necessity and effectiveness of the composite loss function in the scenario of thermal power production parameter control.
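The activation value clipping strategy evaluated above is specified in claim 6 as a global threshold that is updated by momentum from a high percentile of each training batch. A minimal sketch of that behavior follows. The class name, the default values (initial threshold 6.0, 99th percentile, momentum 0.9), the nearest-rank percentile method, and the choice to also clip during training are all assumptions; the source does not specify them.

```python
import math


class ActivationClipper:
    """Clip activations at a momentum-updated global threshold (sketch)."""

    def __init__(self, init_threshold=6.0, percentile=99.0, momentum=0.9):
        self.threshold = init_threshold  # global clipping threshold
        self.percentile = percentile     # preset high percentile (assumed)
        self.momentum = momentum         # weight kept from the old threshold

    def _batch_percentile(self, values):
        """Nearest-rank percentile of a non-empty list of values."""
        ordered = sorted(values)
        rank = max(1, math.ceil(self.percentile * len(ordered) / 100))
        return ordered[rank - 1]

    def __call__(self, activations, training):
        if training:
            # Fuse the batch percentile into the global threshold.
            batch_value = self._batch_percentile(activations)
            self.threshold = (self.momentum * self.threshold
                              + (1.0 - self.momentum) * batch_value)
        # Elements above the threshold are reset to the threshold.
        return [min(a, self.threshold) for a in activations]
```

At inference time the call is made with `training=False`, so the stored threshold is read directly and is no longer updated.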
[0045] The general principles defined in this invention may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims
1. A method for intelligent control of thermal power production parameters based on production capacity, characterized in that it comprises the following steps: acquiring historical operating data and planned capacity data of thermal power production, processing the historical operating data into multi-dimensional time-series features, and processing the planned capacity data into a capacity feature vector; inputting the multi-dimensional time-series features into a multi-scale feature extraction module, wherein the module contains parallel multi-path causal dilated convolution branches, each branch uses a different dilation rate to detect temporal dependencies at different periods, and the output feature maps of the branches are concatenated along the channel dimension to obtain multi-scale time-series features; generating, by a capacity-gated fusion unit, channel attention weights using the capacity feature vector, and applying the weights to each feature channel of the multi-scale time-series features to obtain fused features; inputting the fused features into stacked residual modules for deep feature learning, and performing activation value clipping after the nonlinear activation layer of each residual module; and predicting, based on the output of the residual modules, the control parameters for thermal power production at future times, and training the model using a composite loss function that includes a mean squared error term and a time gradient penalty term, wherein the mean squared error term represents the difference between the predicted parameters and the true parameters, and the time gradient penalty term is used to smooth the predicted control parameter sequence.
2. The intelligent control method for thermal power production parameters according to claim 1, characterized in that processing the historical operating data into multi-dimensional time-series features and the planned capacity data into a capacity feature vector includes: using the collected steam pressure, steam temperature, feedwater flow rate, fuel consumption, and actual power generation at the current moment as dimensions to constitute the multi-dimensional time-series features; and mapping the time series representing the planned power generation within a preset future time period to a capacity feature vector of a preset dimension through a fully connected layer.
3. The intelligent control method for thermal power production parameters according to claim 1 or 2, characterized in that, in the step of inputting the multi-dimensional time-series features into the multi-scale feature extraction module, the multi-scale feature extraction module contains parallel multi-path causal dilated convolution branches, and the dilation rates used by the branches form a geometric sequence with a common ratio greater than 1.
4. The intelligent control method for thermal power production parameters according to claim 1, characterized in that generating channel attention weights using the capacity feature vector through the capacity-gated fusion unit includes: inputting the capacity feature vector into a network consisting of two fully connected layers, wherein the first fully connected layer is followed by a rectified linear unit (ReLU) activation function and the second fully connected layer is followed by a sigmoid activation function, and the number of output channel attention weights equals the number of channels of the multi-scale time-series features.
5. The intelligent control method for thermal power production parameters according to claim 1, characterized in that the step of inputting the fused features into stacked residual modules for deep feature learning includes: passing the fused features sequentially through a predetermined number of stacked residual modules, wherein each residual module consists of two one-dimensional convolutional layers, each followed by a batch normalization layer and a rectified linear unit (ReLU) activation layer, and the module's input is added element-wise after the second ReLU activation layer.
6. The intelligent control method for thermal power production parameters according to claim 1, characterized in that the activation value clipping performed after the nonlinear activation layer of each residual module includes: initializing a global clipping threshold; during the model training phase, calculating a preset high percentile of the activation values output by the activation layer for all data in the current training batch, and fusing this percentile value into the global clipping threshold using a momentum update strategy; and during the model inference phase, reading the trained global clipping threshold directly and resetting the elements of the activation values that are greater than the global clipping threshold to the global clipping threshold.
7. An intelligent control system for thermal power production parameters based on production capacity, characterized in that it comprises the following modules: an acquisition module, used to acquire historical operating data and planned capacity data of thermal power production, process the historical operating data into multi-dimensional time-series features, and process the planned capacity data into a capacity feature vector; an input module, used to input the multi-dimensional time-series features into a multi-scale feature extraction module, wherein the module includes parallel multi-path causal dilated convolution branches, each branch uses a different dilation rate to detect temporal dependencies at different periods, and the output feature maps of the branches are concatenated along the channel dimension to obtain multi-scale time-series features; a calculation module, used to generate channel attention weights using the capacity feature vector through a capacity-gated fusion unit, apply the weights to each feature channel of the multi-scale time-series features to obtain fused features, input the fused features into stacked residual modules for deep feature learning, and perform activation value clipping after the nonlinear activation layer of each residual module; and an output module, used to predict the control parameters of thermal power production at future times based on the output of the residual modules, and to train the model using a composite loss function that includes a mean squared error term and a time gradient penalty term, wherein the mean squared error term represents the difference between the predicted parameters and the true parameters, and the time gradient penalty term is used to smooth the predicted control parameter sequence.
8. The intelligent control system for thermal power production parameters according to claim 7, characterized in that processing the historical operating data into multi-dimensional time-series features and the planned capacity data into a capacity feature vector includes: using the collected steam pressure, steam temperature, feedwater flow rate, fuel consumption, and actual power generation at the current moment as dimensions to constitute the multi-dimensional time-series features; and mapping the time series representing the planned power generation within a preset future time period to a capacity feature vector of a preset dimension through a fully connected layer.
9. The intelligent control system for thermal power production parameters according to claim 7, characterized in that, when the multi-dimensional time-series features are input into the multi-scale feature extraction module, the multi-scale feature extraction module contains parallel multi-path causal dilated convolution branches, and the dilation rates used by the branches form a geometric sequence with a common ratio greater than 1.
10. The intelligent control system for thermal power production parameters according to claim 7, characterized in that generating channel attention weights using the capacity feature vector through the capacity-gated fusion unit includes: inputting the capacity feature vector into a network consisting of two fully connected layers, wherein the first fully connected layer is followed by a rectified linear unit (ReLU) activation function and the second fully connected layer is followed by a sigmoid activation function, and the number of output channel attention weights equals the number of channels of the multi-scale time-series features.
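The capacity-gated fusion recited in claims 4 and 10 (two fully connected layers, ReLU after the first, sigmoid after the second, one weight per feature channel) can be illustrated with the following pure-Python sketch. All function names, the nested-list data layout, and the weight values used in any example are hypothetical; this is an illustration under those assumptions, not the claimed implementation.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer; `weights` is [out_dim][in_dim]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def capacity_gate(capacity_vec, w1, b1, w2, b2):
    """Map a capacity feature vector to one attention weight per channel."""
    hidden = relu(dense(capacity_vec, w1, b1))
    return [sigmoid(v) for v in dense(hidden, w2, b2)]


def apply_gate(features, gate):
    """Scale each channel of `features` ([channels][time]) by its weight."""
    if len(features) != len(gate):
        raise ValueError("one attention weight is required per channel")
    return [[g * v for v in channel] for channel, g in zip(features, gate)]
```

Because the sigmoid bounds every weight between 0 and 1, the gate can only attenuate channels, so the planned capacity effectively selects which time-series channels pass through to the residual modules at full strength.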