A kiln temperature prediction method based on the fusion of a spatiotemporal Transformer and a physics-informed neural network (PINN)

By fusing spatiotemporal Transformer with PINN, the coupling effect between burners is explicitly modeled and physical constraints are introduced, which solves the problem of low temperature prediction accuracy in ceramic kilns and achieves high-precision and interpretable temperature prediction.

CN121302280B · Active · Publication Date: 2026-06-26 · KUNMING UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: KUNMING UNIV OF SCI & TECH
Filing Date: 2025-11-11
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing methods struggle to accurately capture the strong convection coupling effect within multi-burner roller kilns, and traditional data-driven models neglect physical spatial correlations, resulting in low accuracy in ceramic kiln temperature prediction, which significantly decreases when operating conditions change.

Method used

The method integrates a spatiotemporal Transformer with a physics-informed neural network (PINN): the temporal dependence of each burner and the spatial coupling effect between burners are explicitly modeled, and the thermodynamic equation of oxygen-enriched combustion is introduced as a physical constraint. Combined with data-driven learning, this approach achieves high accuracy and interpretability in temperature prediction.

Benefits of technology

It significantly improves the accuracy and generalization ability of kiln temperature prediction. The model follows physical laws, has high transparency and interpretability, and can automatically discover the true combustion characteristics of the system.

✦ Generated by Eureka AI based on patent content.

Patent Text Reader

Abstract

The present application provides a kiln temperature prediction method based on the fusion of a spatiotemporal Transformer and a physics-informed neural network (PINN), belonging to the technical field of intelligent control of industrial kilns. The method collects historical combustion parameters of the burners and models temporal evolution and spatial coupling with a spatiotemporally coupled Transformer model: a temporal attention mechanism captures the time-series dependence of the combustion parameters, and a spatial attention mechanism models the thermal coupling effect between multiple burners. A PINN framework then embeds the oxygen-enriched combustion thermodynamic equation into the training process as an explicit constraint, a physical loss function ensures that the prediction conforms to the laws of energy conservation and heat transfer, and learnable physical parameters realize parameter inversion to automatically discover the true combustion characteristics. The method thereby fuses the fitting capability of deep learning with prior knowledge of physical laws, improving the accuracy, interpretability, and generalization ability of kiln temperature prediction.

Description

Technical Field

[0001] This invention belongs to the field of intelligent control technology for industrial kilns and specifically relates to a kiln temperature prediction method based on the fusion of a spatiotemporal Transformer and a physics-informed neural network (PINN). In particular, it combines deep learning with physical constraints to make short-term, high-precision predictions of the temperature field of ceramic kilns, and is especially suitable for temperature prediction and control of multi-burner roller kilns that use oxygen-enriched combustion.

Background Technology

[0002] Ceramic roller kilns are the core firing equipment in the modern ceramics industry, and the uniformity and stability of their internal temperature field directly determine product quality. Modern roller kilns generally employ oxygen-enriched combustion technology to improve combustion efficiency and achieve energy conservation and emission reduction.

[0003] However, oxygen-enriched combustion presents significant challenges: First, the nonlinear characteristics of the combustion process are aggravated, and the temperature response exhibits obvious nonlinear features; second, the high-speed injected gas and combustion air form a strong convection field in the kiln, and the hot airflow generated by different burners interferes with each other, forming a complex dynamic coupling system; finally, there is a time delay between the adjustment of combustion parameters and the temperature response, and the delay is affected by a variety of factors.

[0004] Existing methods struggle to accurately capture the strong convective coupling effect between burners. Mechanistic models require enormous computational resources, making them unsuitable for real-time prediction. While traditional data-driven models can handle time series, they neglect the strong correlation between measurement points in physical space, resulting in low prediction accuracy, especially with a significant decrease in accuracy when operating conditions change.

[0005] Therefore, developing a high-precision, short-term temperature prediction method that explicitly models the strong convective interaction between burners and adapts to the nonlinear characteristics of oxygen-enriched combustion has become a technical problem that urgently needs to be solved in order to raise the level of intelligent control of ceramic kilns, ensure product quality, and reduce energy consumption.

Summary of the Invention

[0006] The purpose of this invention is to overcome the shortcomings of existing technologies and provide a kiln temperature prediction method based on the fusion of a spatiotemporal Transformer and a physics-informed neural network (PINN). The method explicitly models the temporal dependencies and spatial coupling effects among multiple burners using a spatiotemporally coupled Transformer architecture. It also introduces the PINN framework to embed the thermodynamic equations of oxygen-enriched combustion as physical constraints into the model training process, unifying data-driven learning with adherence to physical laws. This improves the accuracy, generalization ability, and interpretability of kiln temperature prediction.

[0007] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0008] A method for predicting kiln temperature based on the fusion of a spatiotemporal Transformer and a physics-informed neural network (PINN) includes the following steps:

[0009] Step 1: Data Acquisition and Preprocessing. Historical operational data is acquired from the distributed control system or data acquisition system of the ceramic kiln. Assuming $N$ burners are arranged along the length of the kiln, the following multi-dimensional combustion parameters are collected for each burner: natural gas flow rate $Q_g$, combustion air flow rate $Q_a$, oxygen concentration $C_{O_2}$, and measured temperature $T$. A historical time window of length $T_h$ is selected, which forms the input dataset. Outlier detection and removal, missing value imputation, and normalization are then performed on the collected data.

[0010] Outlier detection and removal employs a 3-standard-deviation criterion: for each feature dimension, the mean $\mu$ and standard deviation $\sigma$ of that feature in the original input data are calculated. If a data point $x$ satisfies $|x - \mu| > 3\sigma$, it is considered an outlier and removed. Missing value imputation uses either linear interpolation or forward imputation.
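The 3-standard-deviation criterion and the two imputation options can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the population standard deviation is assumed (the text does not say which estimator is used), and back-filling a leading gap is an added assumption.

```python
from statistics import mean, pstdev

def three_sigma_mask(values):
    """Return True for every point with |x - mu| > 3 * sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [abs(x - mu) > 3 * sigma for x in values]

def remove_and_impute(values):
    """Remove 3-sigma outliers, then fill the gaps they leave.

    Interior gaps use linear interpolation between the nearest valid
    neighbours; trailing gaps use forward imputation.
    """
    mask = three_sigma_mask(values)
    cleaned = [None if bad else x for x, bad in zip(values, mask)]
    known = [i for i, x in enumerate(cleaned) if x is not None]
    out = list(cleaned)
    for i, x in enumerate(cleaned):
        if x is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is not None and right is not None:
            w = (i - left) / (right - left)
            out[i] = cleaned[left] + w * (cleaned[right] - cleaned[left])
        elif left is not None:
            out[i] = cleaned[left]   # forward imputation
        else:
            out[i] = cleaned[right]  # leading gap: back-fill (assumption)
    return out
```

Note that with very short series a single spike inflates $\sigma$ enough that no point can exceed $3\sigma$, so the criterion is only meaningful over a reasonably long record.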

[0011] The normalization process employs max-min normalization, scaling each feature to the range of 0 to 1. The normalization formula is as follows:

[0012] $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

[0013] where $x$ is the original feature value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of this feature over the input data, and $x'$ is the feature value after normalization.

[0014] After preprocessing, normalized input data is obtained with dimensions $[N, T_h, M]$, where $N$ is the number of burners, $T_h$ is the number of historical time steps, and $M$ is the feature dimension.
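As an illustration of the normalization step, the sketch below applies max-min scaling per feature to data laid out as burners × time steps × features. The nested-list layout and the function name are assumptions made for this example; mapping a constant feature to 0.0 is also an added safeguard that the text does not mention.

```python
def min_max_normalize(data):
    """Scale each feature of data[burner][step][feature] to [0, 1].

    Returns the normalized data plus the per-feature minima and maxima,
    which are needed later to map predictions back to physical units.
    """
    n_feat = len(data[0][0])
    mins = [min(step[m] for burner in data for step in burner)
            for m in range(n_feat)]
    maxs = [max(step[m] for burner in data for step in burner)
            for m in range(n_feat)]
    normalized = [
        [
            [
                (step[m] - mins[m]) / (maxs[m] - mins[m])
                if maxs[m] > mins[m] else 0.0  # constant feature (assumption)
                for m in range(n_feat)
            ]
            for step in burner
        ]
        for burner in data
    ]
    return normalized, mins, maxs
```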

[0015] Step 2: Spatiotemporal Feature Extraction. A spatiotemporally coupled Transformer model is used, employing a dual-path parallel architecture to extract temporal and spatial features separately, which are then fused. Specifically: Normalized input data is input into the spatiotemporally coupled Transformer model; the model includes a temporal attention encoding branch and a spatial attention encoding branch; the temporal attention encoding branch extracts temporal features through a multi-layer Transformer encoder; the spatial attention encoding branch models spatial features through a multi-layer Transformer encoder; the temporal and spatial features are fused to obtain the spatiotemporally coupled feature representation.

[0016] The input embedding of the spatiotemporally coupled Transformer model includes feature embedding, temporal position encoding, and spatial position encoding; feature embedding maps the original features to a high-dimensional embedding space through a linear transformation; both temporal position encoding and spatial position encoding use learnable parameter matrices.

[0017] Both the temporal attention encoding branch and the spatial attention encoding branch employ a multi-head self-attention mechanism, and the attention calculation formula is as follows:

[0018] $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

[0019] where $Q$ is the query matrix, $K$ is the key matrix, and $V$ is the value matrix, all three obtained through linear transformations of the input features; $d_k$ is the dimension of the key vectors; $\sqrt{d_k}$ is the scaling factor; the $\mathrm{softmax}$ function normalizes the attention weights; and the superscript $\top$ denotes the matrix transpose.
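The attention calculation can be illustrated with a single-head sketch in plain Python. It implements standard scaled dot-product attention on small nested lists; the helper names are hypothetical, and a real model would use a tensor library and multiple heads.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def attention(q, k, v):
    """Return (softmax(Q K^T / sqrt(d_k)) V, attention weights)."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # K transposed
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights
```

In the spatial branch the rows would correspond to burners, so `weights[i][j]` can be read as how strongly burner `i` attends to burner `j`; in the temporal branch the rows correspond to time steps.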

[0020] (1) Input embedding and position encoding

[0021] First, an embedding transformation is performed on the normalized input data, mapping the original feature dimension $M$ to a higher-dimensional embedding space of dimension $d_{\text{model}}$.

[0022] Embedding is achieved through a linear transformation:

[0023] $$E = X W_e + b_e$$

[0024] where $W_e$ is the embedding weight matrix, $b_e$ is the bias vector, $E$ is the embedded feature, and $X$ is the normalized input data.

[0025] Learnable position encodings are then added, including the temporal position encoding $P_t$ and the spatial position encoding $P_s$.

[0026] The final input is represented as:

[0027] $$Z = E + P_t + P_s$$

[0028] (2) Temporal attention encoding branch

[0029] The task of the temporal attention encoding branch is to capture the evolution patterns and dependencies of each burner's combustion parameters over time. For the $i$-th burner, its historical time series is extracted and input into a multi-layer Transformer encoder. Each encoder layer includes a multi-head self-attention layer, a feedforward neural network, residual connections, and layer normalization. After multi-layer encoding, average pooling or taking the last time step is applied along the temporal dimension, ultimately outputting the temporal feature representation $F_t$.

[0030] (3) Spatial attention encoding branch

[0031] The task of the spatial attention encoding branch is to model the spatial coupling effect between multiple burners caused by convective heat transfer. For a given moment $t$ (which can be the state at the last time step or the average over all time steps), the states of all burners are extracted to form a sequence. The sequence is input into a multi-layer Transformer encoder, where a spatial attention mechanism calculates the degree of interaction between burners and automatically identifies the coupling relationships between them. The final output is the spatial feature representation $F_s$.

[0032] (4) Spatiotemporal feature fusion

[0033] To fuse the temporal feature representation $F_t$ and the spatial feature representation $F_s$, this invention provides two fusion methods:

[0034] Concatenation fusion:

[0035] $$F_{\text{cat}} = [F_t ; F_s]$$

[0036] $$F_{st} = F_{\text{cat}} W_f + b_f$$

[0037] Weighted summation fusion:

[0038] $$F_{st} = w_t F_t + w_s F_s$$

[0039] where $w_t$ and $w_s$ are learnable parameters satisfying $w_t + w_s = 1$, $W_f$ is the fusion weight matrix, $b_f$ is the fusion bias vector, $F_{st}$ is the fused feature representation, $F_t$ is the temporal feature representation, and $F_s$ is the spatial feature representation.
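The two fusion options can be sketched as below. All names are hypothetical. For the weighted sum, the weights are derived here from a softmax over two free logits so that they always sum to 1; this parameterization is an assumption, since the text only states that the two weights are learnable and constrained.

```python
import math

def weighted_sum_fusion(f_t, f_s, logit_t, logit_s):
    """Fuse as w_t * F_t + w_s * F_s with w_t + w_s = 1."""
    m = max(logit_t, logit_s)
    e_t, e_s = math.exp(logit_t - m), math.exp(logit_s - m)
    w_t, w_s = e_t / (e_t + e_s), e_s / (e_t + e_s)
    return [[w_t * a + w_s * b for a, b in zip(row_t, row_s)]
            for row_t, row_s in zip(f_t, f_s)]

def concat_fusion(f_t, f_s, w_f, b_f):
    """Concatenate per burner, then apply a linear layer.

    f_t, f_s: [N][d]; w_f: [2d][d_out]; b_f: [d_out].
    """
    fused = []
    for row_t, row_s in zip(f_t, f_s):
        cat = row_t + row_s  # list concatenation -> length 2d
        fused.append([sum(x * w for x, w in zip(cat, col)) + b
                      for col, b in zip(zip(*w_f), b_f)])
    return fused
```

Concatenation lets the linear layer learn a different mix for every output dimension, whereas the weighted sum yields one interpretable scalar balance between temporal and spatial information.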

[0040] Step 3: Physical constraint modeling. A physics-informed neural network framework is introduced, embedding the physical laws of oxygen-enriched combustion as soft constraints into the model training process. Specifically: based on the energy conservation law and heat transfer principle of oxygen-enriched combustion, a physical equation describing the relationship between the rate of temperature change and the combustion parameters is constructed; a physical loss function is defined to measure the difference between the predicted rate of temperature change and the theoretical rate of temperature change calculated by the physical equation.

[0041] (1) Physical equation of oxygen-enriched combustion

[0042] Based on the law of conservation of energy and the principles of heat transfer, the physical equation includes a fuel gas combustion heat term, a combustion air temperature rise term, a gas-air co-combustion term, and a heat loss term. The rate of change of kiln temperature can be described by the following physical equation:

[0043] $$\frac{dT}{dt} = k_1 Q_g^{\,p} + k_2 Q_a^{\,q} + k_3 Q_g Q_a - k_4 \left(T - T_{\text{env}}\right)$$

[0044] where $k_1$ is the calorific value coefficient of the gas, $k_2$ is the air heat capacity coefficient, $k_3$ is the gas-air co-combustion coefficient, $k_4$ is the overall heat dissipation coefficient, $p$ is the nonlinearity exponent of gas combustion, $q$ is the nonlinearity exponent of air convection, $Q_g$ is the natural gas flow rate, $Q_a$ is the combustion air flow rate, $T$ is the temperature, $T_{\text{env}}$ is the ambient temperature, and $t$ is the time variable.

[0045] The equation is based on the following physical considerations:

[0046] The term $k_1 Q_g^{\,p}$ represents the contribution of heat generated by natural gas combustion to the temperature rise, where the exponent $p$ reflects the nonlinear intensification effect of combustion; $k_2 Q_a^{\,q}$ represents the sensible heat and preheating effect of the combustion air flowing into the kiln; $k_3 Q_g Q_a$ represents the synergistic enhancement effect when fuel gas and combustion air are mixed and burned; and $k_4 (T - T_{\text{env}})$ represents the heat loss based on Newton's law of cooling.
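The four terms described above can be combined into one function, sketched below. The exact functional form is an assumption for illustration: the gas and air terms are taken as power laws, the co-combustion term as a simple product of the two flow rates, and the loss term as Newton cooling. All parameter names are hypothetical.

```python
def theoretical_rate(q_g, q_a, temp, t_env, k1, k2, k3, k4, p, q):
    """Theoretical dT/dt from the four heat-balance terms.

    q_g, q_a : natural gas and combustion air flow rates
    temp     : current kiln temperature at the burner
    t_env    : ambient temperature
    k1..k4   : gas heat, air heat capacity, co-combustion, heat loss
    p, q     : nonlinearity exponents for gas and air
    """
    gas_heat = k1 * q_g ** p               # heat released by gas combustion
    air_heat = k2 * q_a ** q               # sensible heat of combustion air
    synergy = k3 * q_g * q_a               # assumed product-form coupling
    heat_loss = k4 * (temp - t_env)        # Newton's law of cooling
    return gas_heat + air_heat + synergy - heat_loss
```

With the inputs fixed, the rate falls as the temperature rises, so the heat loss term pulls the temperature toward an equilibrium at which heating and loss balance.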

[0047] (2) Learnable physical parameters

[0048] The six physical parameters (the calorific value, air heat capacity, co-combustion, and heat dissipation coefficients, plus the two nonlinearity exponents) are set as learnable parameters of the neural network. They are initialized with reasonable initial values and ranges based on physical intuition, empirical values, or literature. During model training, these parameters are optimized together with the other parameters of the neural network through backpropagation and gradient descent, achieving the inversion of the physical parameters.

[0049] (3) Physical loss function

[0050] The steps for calculating the physical loss function are as follows:

[0051] The predicted rate of temperature change is calculated from the temperature sequence predicted by the neural network using numerical differentiation:

[0052] $$\left(\frac{dT}{dt}\right)^{\text{pred}}_{i,t} = \frac{\hat{T}_{i,t+1} - \hat{T}_{i,t}}{\Delta t}$$

[0053] Calculate the theoretical rate of temperature change based on the physical equation and the current combustion parameters:

[0054] $$\left(\frac{dT}{dt}\right)^{\text{phys}}_{i,t} = k_1 Q_{g,i,t}^{\,p} + k_2 Q_{a,i,t}^{\,q} + k_3 Q_{g,i,t} Q_{a,i,t} - k_4 \left(\hat{T}_{i,t} - T_{\text{env}}\right)$$

[0055] Calculate the mean square error between the predicted rate of temperature change and the theoretical rate of temperature change:

[0056] $$L_{\text{phys}} = \frac{1}{N (T_p - 1)} \sum_{i=1}^{N} \sum_{t=1}^{T_p - 1} \left[ \left(\frac{dT}{dt}\right)^{\text{pred}}_{i,t} - \left(\frac{dT}{dt}\right)^{\text{phys}}_{i,t} \right]^2$$

[0057] where $N$ is the number of burners, $T_p$ is the number of prediction time steps, $\hat{T}_{i,t+1}$ and $\hat{T}_{i,t}$ are the predicted temperatures of the $i$-th burner at times $t+1$ and $t$, $\Delta t$ is the time step, and $(dT/dt)^{\text{phys}}_{i,t}$ is the theoretical rate of temperature change of the $i$-th burner at time $t$, calculated from the physical equation. Because the rate of temperature change is calculated by forward difference, which requires the temperature at time $t+1$, it cannot be calculated at the last time step; the time summation therefore runs from $t = 1$ to $T_p - 1$, and the denominator is adjusted accordingly to $N (T_p - 1)$.
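The forward-difference rate and the resulting mean square error can be sketched as follows. The function name and the nested-list layout (burners × prediction steps) are assumptions for this example; the theoretical rates are passed in precomputed so the sketch does not depend on any particular form of the physical equation.

```python
def physical_loss(pred_temps, theory_rates, dt):
    """Mean squared gap between predicted and theoretical dT/dt.

    pred_temps   : [N][T_p] predicted temperatures
    theory_rates : [N][T_p] theoretical rates (last column is unused)
    dt           : time step between consecutive predictions
    """
    n = len(pred_temps)
    t_p = len(pred_temps[0])
    total = 0.0
    for i in range(n):
        # The forward difference needs step t + 1, so the last step is skipped.
        for t in range(t_p - 1):
            rate_pred = (pred_temps[i][t + 1] - pred_temps[i][t]) / dt
            total += (rate_pred - theory_rates[i][t]) ** 2
    return total / (n * (t_p - 1))
```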

[0058] Step 4: Joint Training and Prediction. The spatiotemporal coupled feature representation is input into the prediction head network, which outputs the temperature prediction sequence for multiple future time steps. The prediction head network is a multi-layer fully connected neural network. The entire fusion model is trained using a joint loss function. The fusion model includes the spatiotemporal coupled Transformer model, the prediction head network, and the PINN framework. The joint loss function is a weighted combination of the data fitting loss and the physical loss function. The network parameters of the spatiotemporal coupled Transformer model, the network parameters of the prediction head network, and the learnable physical parameters in the physical equations are simultaneously optimized using the backpropagation algorithm.

[0059] (1) Prediction Head Network

[0060] The spatiotemporal coupled feature representation $F_{st}$ is input into the prediction head network, which outputs a temperature prediction sequence for multiple future time steps:

[0061] $$h_1 = \mathrm{Dropout}\!\left(\sigma(F_{st} W_1 + b_1),\, p_d\right), \quad h_2 = \mathrm{Dropout}\!\left(\sigma(h_1 W_2 + b_2),\, p_d\right), \quad \hat{T} = h_2 W_3 + b_3$$

[0062] The output $\hat{T}$ has dimensions $[N, T_p]$, giving the predicted temperatures of the $N$ burners at each of the next $T_p$ time steps; $h_1$ and $h_2$ are the intermediate hidden layers of the prediction head network; $W_1$, $W_2$, $W_3$ are its weight matrices; $b_1$, $b_2$, $b_3$ are its bias vectors; $\sigma$ is the activation function; and $p_d$ is the dropout probability.

[0063] (2) Joint loss function

[0064] This invention uses a joint loss function for model training:

[0065] $$L_{\text{total}} = L_{\text{data}} + \lambda L_{\text{phys}}$$

[0066] where the data fitting loss is:

[0067] $$L_{\text{data}} = \frac{1}{N T_p} \sum_{i=1}^{N} \sum_{t=1}^{T_p} \left(\hat{T}_{i,t} - T_{i,t}\right)^2$$

[0068] Here $T_{i,t}$ is the actual (measured) temperature of the $i$-th burner at time $t$. The weighting coefficient $\lambda$ is adjusted with a dynamic scheduling strategy: in the initial training phase it increases linearly from 0 to a preset value, and in the later training phase it remains at that preset value.

[0069] The weight is adjusted dynamically based on the training progress:

[0070] $$\lambda(\text{epoch}) = \lambda_{\max} \cdot \min\!\left(1, \frac{\text{epoch}}{\text{warmup\_epochs}}\right)$$

[0071] where epoch is the current training epoch, warmup_epochs is the number of warmup epochs, set to 30% of the total training epochs, and $\lambda_{\max}$ is the maximum weight value, ranging from 0.1 to 0.3.
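The warmup schedule for the physics weight and its use in the joint loss can be sketched as below. The function names are hypothetical; the 30% warmup fraction follows the text, and the linear ramp is capped at the maximum weight.

```python
def physics_weight(epoch, total_epochs, lambda_max, warmup_fraction=0.3):
    """Linear ramp from 0 to lambda_max during warmup, then constant."""
    warmup_epochs = warmup_fraction * total_epochs
    return lambda_max * min(1.0, epoch / warmup_epochs)

def joint_loss(data_loss, phys_loss, lam):
    """Weighted combination of data fitting loss and physical loss."""
    return data_loss + lam * phys_loss

# Example: 200 epochs with lambda_max = 0.15 gives a 60-epoch ramp.
schedule = [physics_weight(e, 200, 0.15) for e in (0, 30, 60, 200)]
```

Starting the physics weight at zero lets the network first fit the measurements; the constraint then tightens gradually, which avoids the untrained physical parameters dominating the early gradients.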

[0072] (3) Model training

[0073] By simultaneously optimizing the network parameters of the spatiotemporally coupled Transformer model, the network parameters of the prediction head network, and the learnable parameters in the physical equations through the backpropagation algorithm, the model can fit historical data while following physical laws.

[0074] Compared with existing technologies, the beneficial effects of this invention are as follows: Through the dual-path architecture of the spatiotemporal Transformer, the evolution and coupling laws of combustion parameters and temperature are modeled in both the temporal and spatial dimensions. The temporal attention mechanism automatically identifies key historical moments, while the spatial attention mechanism captures the convective heat transfer effects between multiple burners, significantly improving prediction accuracy. Through the PINN framework, the energy conservation equation and heat transfer equation of oxygen-enriched combustion are embedded into the training process, enabling the model to not only fit historical data but also follow physical laws. Key parameters in the physical equations are set as learnable parameters, and the actual combustion characteristics of the system are automatically discovered through training. The learned physical parameters have clear physical meaning and can quantitatively describe the degree of influence of different factors on temperature, enhancing the interpretability of the model.

Attached Figure Description

[0075] Figure 1 is a schematic diagram of the overall process of the prediction method described in this invention;

[0076] Figure 2 is a detailed architectural diagram of the spatiotemporal Transformer and PINN fusion model described in this invention.

Detailed Implementation

[0077] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

[0078] Example: Referring to Figure 1 and Figure 2, this embodiment provides a specific implementation of a kiln temperature prediction method based on the fusion of a spatiotemporal Transformer and a physics-informed neural network (PINN).

[0079] I. Application Scenarios and Data Preparation

[0080] This embodiment focuses on temperature prediction modeling for a standard ceramic roller kiln. The kiln is 15 meters long (single section) with 8 burners (N = 8) arranged along its length. It employs oxygen-enriched combustion technology, with an adjustable oxygen concentration range of 21% to 30%.

[0081] Historical operational data was collected from the kiln's distributed control system (DCS) every 30 seconds. The collected parameters included natural gas flow rate, combustion air flow rate, oxygen concentration, and measured temperature for the eight burners. A historical time window of $T_h$ = 40 time steps (corresponding to 20 minutes) was selected to predict the next $T_p$ = 12 time steps (corresponding to 6 minutes).

[0082] II. Data Preprocessing

[0083] (1) Outlier detection and removal

[0084] For each feature dimension, the mean μ and standard deviation σ are calculated over the collected parameter values. For example, for the natural gas flow rate feature, μ = 280 and σ = 95. Any data point x with |x - 280| > 3 × 95 = 285 is identified as an outlier. Outliers are replaced by linear interpolation.

[0085] (2) Imputation of missing values

[0086] Missing values exist in the data due to sensor malfunction or communication interruption. For a single missing value, linear interpolation is used; for multiple consecutive missing values, forward imputation is used.

[0087] (3) Normalization

[0088] For each feature, the minimum and maximum of the collected parameter values are calculated after outlier interpolation and missing value imputation. Taking natural gas flow rate as an example, $x_{\min}$ = 50 and $x_{\max}$ = 500. The normalization formula is:

[0089] $$x' = \frac{x - 50}{500 - 50}$$

[0090] After preprocessing, normalized input data is obtained with dimensions [8, 40, 4], which means 8 burners, 40 time steps, and 4 feature dimensions.

[0091] III. Spatiotemporal Feature Extraction

[0092] (1) Input embedding and position encoding

[0093] The embedding dimension is selected as $d_{\text{model}}$ = 256. Input data is mapped to a 256-dimensional space through a linear layer:

[0094] $$E = X W_e + b_e$$

[0095] where $W_e$ is a weight matrix of dimensions [4, 256] and $b_e$ is a 256-dimensional bias vector.

[0096] Learnable position encodings are added: the temporal position encoding $P_t$ is a parameter matrix of dimensions [40, 256], and the spatial position encoding $P_s$ is a parameter matrix of dimensions [8, 256].

[0097] (2) Temporal attention encoding branch

[0098] The time series of each burner is extracted and input into a 4-layer Transformer encoder. Each encoder layer is configured with 8 self-attention heads (h = 8), each of dimension $d_k$ = 32, and a feedforward network with an intermediate layer dimension of 1024; dropout is applied. After four layers of encoding, average pooling is applied along the time dimension to obtain $F_t$, with dimensions [8, 256].

[0099] (3) Spatial attention encoding branch

[0100] The states of all burners at the last time step are taken to form a sequence of dimensions [8, 256], which is input to a 4-layer Transformer encoder with the same structure as the temporal attention branch. The output $F_s$ has dimensions [8, 256].

[0101] (4) Spatiotemporal feature fusion

[0102] The concatenation fusion method is employed:

[0103] $$F_{\text{cat}} = [F_t ; F_s]$$

[0104] Dimensionality reduction via a linear layer:

[0105] $$F_{st} = F_{\text{cat}} W_f + b_f$$
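The tensor shapes quoted in this embodiment can be traced with a small bookkeeping sketch. It only checks dimensions and performs no learning. The helper name is hypothetical, and the [512, 256] shape of the fusion weight is inferred from concatenating two [8, 256] features and reducing back to 256; it is not stated in the text.

```python
N, T_H, M = 8, 40, 4          # burners, history steps, features
D_MODEL, HEADS = 256, 8       # embedding dimension, attention heads

def linear_shape(in_shape, weight_shape):
    """Shape after a linear layer applied to the last axis."""
    assert in_shape[-1] == weight_shape[0], "inner dimensions must match"
    return in_shape[:-1] + (weight_shape[1],)

x_shape = (N, T_H, M)                                  # normalized input
e_shape = linear_shape(x_shape, (M, D_MODEL))          # embedded input
head_dim = D_MODEL // HEADS                            # per-head dimension
f_t_shape = (e_shape[0], e_shape[2])                   # mean-pool over time
f_s_shape = (e_shape[0], e_shape[2])                   # last-step burner states
f_cat_shape = (N, f_t_shape[1] + f_s_shape[1])         # concatenation
f_st_shape = linear_shape(f_cat_shape, (2 * D_MODEL, D_MODEL))  # inferred W_f
```

The check `head_dim == 32` confirms that 8 heads of dimension 32 exactly tile the 256-dimensional embedding.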

[0106] IV. Physical Constraint Modeling

[0107] (1) Physical equations and parameter initialization

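[0108] Based on the principle of energy conservation in oxygen-enriched combustion, the physical equation is established as follows:

[0109] $$\frac{dT}{dt} = k_1 Q_g^{\,p} + k_2 Q_a^{\,q} + k_3 Q_g Q_a - k_4 \left(T - T_{\text{env}}\right)$$

[0110] The six physical parameters (the four coefficients $k_1$ to $k_4$ and the two exponents $p$ and $q$) are set as learnable parameters and assigned initial values.

[0111] The ambient temperature $T_{\text{env}}$ is taken as the lowest value of the input temperature sequence.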

[0112] (2) Calculation of physical loss function

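[0113] From the predicted temperature sequence $\hat{T}$ output by the neural network (dimensions [8, 12]), the predicted rate of temperature change is calculated:

[0114] $$\left(\frac{dT}{dt}\right)^{\text{pred}}_{i,t} = \frac{\hat{T}_{i,t+1} - \hat{T}_{i,t}}{\Delta t}$$

[0115] For each burner and each prediction time, the theoretical rate of temperature change is calculated from the input combustion parameters $Q_g$ and $Q_a$ and the current temperature $\hat{T}_{i,t}$:

[0116] $$\left(\frac{dT}{dt}\right)^{\text{phys}}_{i,t} = k_1 Q_{g,i,t}^{\,p} + k_2 Q_{a,i,t}^{\,q} + k_3 Q_{g,i,t} Q_{a,i,t} - k_4 \left(\hat{T}_{i,t} - T_{\text{env}}\right)$$

[0117] Calculate physical loss:

[0118] $$L_{\text{phys}} = \frac{1}{8 \times 11} \sum_{i=1}^{8} \sum_{t=1}^{11} \left[ \left(\frac{dT}{dt}\right)^{\text{pred}}_{i,t} - \left(\frac{dT}{dt}\right)^{\text{phys}}_{i,t} \right]^2$$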

[0119] V. Joint Training and Prediction

[0120] (1) Prediction Head Network

[0121] The fused features $F_{st}$ (dimensions [8, 256]) are input to the prediction head network and processed through multiple fully connected layers and activation functions to output $\hat{T}$, with dimensions [8, 12].

[0122] (2) Joint loss function

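[0123] $$L_{\text{total}} = L_{\text{data}} + \lambda L_{\text{phys}}$$

[0124] The data fitting loss is:

[0125] $$L_{\text{data}} = \frac{1}{8 \times 12} \sum_{i=1}^{8} \sum_{t=1}^{12} \left(\hat{T}_{i,t} - T_{i,t}\right)^2$$

[0126] The weight is dynamically scheduled, with a total of 200 training epochs, 60 warmup epochs, and a maximum weight $\lambda_{\max}$ = 0.15:

[0127] $$\lambda(\text{epoch}) = 0.15 \cdot \min\!\left(1, \frac{\text{epoch}}{60}\right)$$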

[0128] (3) Model training

[0129] Training uses the AdamW optimizer with a set initial learning rate, weight decay, and training batch size, and the learning rate is scheduled with a cosine annealing strategy. The model is evaluated on the validation set every 10 epochs, and the model with the lowest validation loss is saved.

[0130] After training, the learned physical parameters converge, and their values have clear physical meanings. For example, a learned gas combustion exponent greater than 1 indicates that combustion under oxygen-enriched conditions exhibits superlinear enhancement characteristics.

[0131] (4) Model prediction and evaluation

[0132] Evaluation index calculation formula:

[0133] Root Mean Square Error (RMSE):

[0134] $$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$

[0135] where $n$ is the total number of test samples, $y_i$ is the true temperature of the $i$-th sample, and $\hat{y}_i$ is the predicted temperature of the $i$-th sample. The model performance was evaluated on the test set using RMSE as the evaluation metric. Evaluation results: 1-step prediction (after 30 seconds) RMSE = 8.7℃, 6-step prediction (after 3 minutes) RMSE = 16.2℃, 12-step prediction (after 6 minutes) RMSE = 23.8℃.
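The per-horizon evaluation can be sketched as follows. The function names and the sample layout (one row of predicted steps per test sample) are assumptions for illustration; the numbers in the test data below are made up and are not the patent's results.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error over paired samples."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)

def rmse_at_step(true_seqs, pred_seqs, step):
    """RMSE for one prediction horizon (1-based step index).

    true_seqs, pred_seqs: [n_samples][T_p] temperatures in physical units.
    """
    return rmse([row[step - 1] for row in true_seqs],
                [row[step - 1] for row in pred_seqs])
```

Reporting the error separately for steps 1, 6, and 12 shows how quickly accuracy degrades with horizon, which a single pooled RMSE would hide.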

[0136] In this embodiment, the model, when making temperature predictions, draws on both the spatiotemporal thermodynamic patterns learned from large volumes of historical data and the recent dynamic changes. The introduced PINN framework ensures that the prediction results conform to the laws of energy conservation and heat transfer. Compared with traditional "black box" models, this method has high transparency and interpretability; operators can not only see the predicted temperature values but also view the learned physical parameters to understand the actual combustion characteristics of the system.

[0137] The successful application of this embodiment in a ceramic roller kiln verifies the practical value of the method, and the core idea of the method can be extended to temperature prediction and intelligent control of other types of industrial kilns.

[0138] Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.

Claims

1. A method for predicting kiln temperature based on the fusion of a spatiotemporal Transformer and a physics-informed neural network (PINN), characterized in that it includes the following steps:
Step 1, Data Acquisition and Preprocessing: obtain multidimensional combustion parameters of N burners in the ceramic kiln over a historical time period to form an input dataset; perform outlier detection and removal, missing value imputation, and normalization on the input dataset to obtain normalized input data.
Step 2, Spatiotemporal Feature Extraction: the normalized input data is input into a spatiotemporally coupled Transformer model, which includes a temporal attention encoding branch and a spatial attention encoding branch; the temporal attention encoding branch extracts temporal features through a multi-layer Transformer encoder; the spatial attention encoding branch models spatial features through a multi-layer Transformer encoder; the temporal and spatial features are fused to obtain the spatiotemporally coupled feature representation.
Step 3, Physical Constraint Modeling: based on the energy conservation law and heat transfer principle of oxygen-enriched combustion, construct a physical equation describing the relationship between the temperature change rate and the combustion parameters; define a physical loss function to measure the difference between the predicted temperature change rate and the theoretical temperature change rate calculated by the physical equation. The physical equation includes a term for the heat generated by gas combustion, a term for the temperature rise caused by the combustion air, a term for the co-combustion of gas and air, and a term for heat loss, and has the form: $$\frac{dT}{dt} = k_1 Q_g^{\,p} + k_2 Q_a^{\,q} + k_3 Q_g Q_a - k_4 \left(T - T_{\text{env}}\right)$$ where $k_1$ is the calorific value coefficient of the gas, $k_2$ is the air heat capacity coefficient, $k_3$ is the gas-air co-combustion coefficient, $k_4$ is the overall heat dissipation coefficient, $p$ is the nonlinearity exponent of gas combustion, $q$ is the nonlinearity exponent of air convection, $Q_g$ is the natural gas flow rate, $Q_a$ is the combustion air flow rate, $T$ is the temperature, $T_{\text{env}}$ is the ambient temperature, and $t$ is the time variable; the six parameters $k_1$, $k_2$, $k_3$, $k_4$, $p$, $q$ are all set as learnable parameters of the entire fusion model. The calculation of the physical loss function includes the following sub-steps: the predicted rate of temperature change is calculated from the temperature sequence predicted by the entire fusion model using numerical differentiation: $$\left(\frac{dT}{dt}\right)^{\text{pred}}_{i,t} = \frac{\hat{T}_{i,t+1} - \hat{T}_{i,t}}{\Delta t}$$ the theoretical rate of temperature change is calculated from the physical equation and the current combustion parameter input values: $$\left(\frac{dT}{dt}\right)^{\text{phys}}_{i,t} = k_1 Q_{g,i,t}^{\,p} + k_2 Q_{a,i,t}^{\,q} + k_3 Q_{g,i,t} Q_{a,i,t} - k_4 \left(\hat{T}_{i,t} - T_{\text{env}}\right)$$ and the mean square error between the predicted and theoretical rates of temperature change is calculated and used as the physical loss function value: $$L_{\text{phys}} = \frac{1}{N (T_p - 1)} \sum_{i=1}^{N} \sum_{t=1}^{T_p - 1} \left[ \left(\frac{dT}{dt}\right)^{\text{pred}}_{i,t} - \left(\frac{dT}{dt}\right)^{\text{phys}}_{i,t} \right]^2$$ where $N$ is the number of burners, $T_p$ is the number of prediction time steps, $\hat{T}_{i,t+1}$ and $\hat{T}_{i,t}$ are the predicted temperatures of the $i$-th burner at times $t+1$ and $t$, $\Delta t$ is the time step, and $(dT/dt)^{\text{phys}}_{i,t}$ is the theoretical rate of temperature change of the $i$-th burner at time $t$; because the rate of temperature change is calculated by forward difference, which requires the temperature at time $t+1$, it cannot be calculated at the last time step, so the time summation runs from $t = 1$ to $T_p - 1$ and the denominator is adjusted accordingly.
Step 4, Joint Training and Prediction: the spatiotemporal coupled feature representation is input into the prediction head network, which outputs temperature prediction sequences for multiple future time steps. The prediction head network is a multi-layer fully connected neural network, and the entire fusion model is trained using a joint loss function. The fusion model includes the spatiotemporal coupled Transformer model, the prediction head network, and the PINN framework. The joint loss function is a weighted combination of the data fitting loss and the physical loss function. The network parameters of the spatiotemporal coupled Transformer model, the network parameters of the prediction head network, and the learnable physical parameters in the physical equation are simultaneously optimized using the backpropagation algorithm.

2. The kiln temperature prediction method based on the fusion of spatiotemporal Transformer and physical information neural network PINN according to claim 1, characterized in that, in step 1, the multidimensional combustion parameters include natural gas flow rate, combustion air flow rate, oxygen concentration, and measured temperature. Outlier detection and removal employ a 3-standard-deviation criterion, specifically: for each feature dimension, the mean $\mu$ and standard deviation $\sigma$ of that feature on the original input data are calculated; if a data point $x$ satisfies $|x - \mu| > 3\sigma$, it is judged to be an outlier and removed.
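
As a non-limiting illustration, the 3-standard-deviation criterion can be sketched as follows. The function name `remove_outliers_3sigma`, the use of the population standard deviation (NumPy's default), and the removal of the whole sample row when any feature is an outlier are assumptions of this sketch; the sample data are made up.

```python
import numpy as np

def remove_outliers_3sigma(data):
    """Drop rows in which any feature lies more than 3 sigma from its mean.

    data: (num_samples, num_features) array of raw combustion parameters.
    """
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)        # per-feature mean
    sigma = data.std(axis=0)      # per-feature population std (ddof=0)
    keep = np.all(np.abs(data - mu) <= 3.0 * sigma, axis=1)
    return data[keep]

# 20 made-up samples: column 0 is a gas flow rate, column 1 a temperature.
rows = [[10.0 + 0.1 * (i % 3), 1200.0 + i] for i in range(20)]
rows[7][0] = 50.0                 # inject one gross gas-flow outlier
cleaned = remove_outliers_3sigma(rows)
print(cleaned.shape)
```

Note that with very few samples a single point cannot exceed three standard deviations, so the criterion is only meaningful on reasonably long records.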

3. The kiln temperature prediction method based on the fusion of spatiotemporal Transformer and physical information neural network PINN according to claim 1, characterized in that, in step 1, the missing value imputation uses either linear interpolation or forward filling; the normalization process uses min-max normalization to scale each feature to the range of 0 to 1, and the normalization formula is: $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$; where $x$ is the original feature value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of this feature on the input data being normalized, and $x'$ is the feature value after normalization.
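
As a non-limiting illustration, the two imputation options and the min-max normalization can be sketched for a single feature series. The function names, the use of NaN to mark missing values, and the handling of a constant feature (mapped to zeros) are assumptions of this sketch.

```python
import numpy as np

def fill_linear(x):
    """Replace NaNs by linear interpolation between known neighbours."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(x.size)
    missing = np.isnan(x)
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x

def fill_forward(x):
    """Replace each NaN by the last known value (a leading NaN stays NaN)."""
    x = np.asarray(x, dtype=float).copy()
    for k in range(1, x.size):
        if np.isnan(x[k]):
            x[k] = x[k - 1]
    return x

def min_max_normalize(x):
    """Scale one feature to [0, 1]; a constant feature maps to zeros."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

raw = np.array([1000.0, np.nan, 1100.0, np.nan, 1200.0])
linear = fill_linear(raw)
forward = fill_forward(raw)
scaled = min_max_normalize(linear)
print(linear, forward, scaled)
```

The minimum and maximum used for scaling should be stored so that predicted temperatures can be mapped back to physical units.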

4. The kiln temperature prediction method based on the fusion of spatiotemporal Transformer and physical information neural network PINN according to claim 1, characterized in that, in step 2, the input embedding of the spatiotemporal coupled Transformer model includes feature embedding, temporal positional encoding, and spatial positional encoding; the feature embedding maps the original features to a high-dimensional embedding space through a linear transformation; both the temporal positional encoding and the spatial positional encoding use learnable parameter matrices.
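
As a non-limiting illustration, the input embedding can be sketched with NumPy. Combining the three components by addition is an assumption of this sketch (the claim only states that the embedding includes them), as are all names, dimensions, and the random placeholder values that stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
num_burners, seq_len, num_features, d_model = 4, 6, 4, 8

# Placeholders for parameters that would be learned during training.
W_embed = rng.normal(size=(num_features, d_model)) * 0.1    # feature embedding
b_embed = np.zeros(d_model)
temporal_pos = rng.normal(size=(seq_len, d_model)) * 0.1    # one row per time step
spatial_pos = rng.normal(size=(num_burners, d_model)) * 0.1 # one row per burner

def embed(x):
    """Map (num_burners, seq_len, num_features) to (num_burners, seq_len, d_model)."""
    features = x @ W_embed + b_embed
    # Broadcast the temporal code over burners and the spatial code over time.
    return features + temporal_pos[None, :, :] + spatial_pos[:, None, :]

x = rng.random(size=(num_burners, seq_len, num_features))
tokens = embed(x)
print(tokens.shape)
```

Using lookup matrices for both position codes lets the model learn burner-specific and step-specific offsets instead of relying on fixed sinusoidal codes.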

5. The kiln temperature prediction method based on the fusion of spatiotemporal Transformer and physical information neural network PINN according to claim 1, characterized in that, in step 2, both the temporal attention encoding branch and the spatial attention encoding branch employ a multi-head self-attention mechanism, and the attention calculation formula is as follows: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$; where $Q$ is the query matrix, $K$ is the key matrix, and $V$ is the value matrix, all three being obtained through linear transformations of the input features; $d_k$ is the dimension of the key vector; $\sqrt{d_k}$ is the scaling factor; the $\mathrm{softmax}$ function is used to normalize the attention weights; and the superscript $T$ is the matrix transpose symbol.
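
As a non-limiting illustration, scaled dot-product self-attention for a single head can be sketched as follows; a multi-head version would run several such heads on separate projections and concatenate the results. The function names, dimensions, and random projection matrices are assumptions of this sketch.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = K.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))              # 5 tokens (e.g. time steps), width 8
# Q, K, V come from linear transformations of the same input features.
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape, weights.sum(axis=-1))
```

In the temporal branch the tokens would be the time steps of one burner; in the spatial branch they would be the burners at one time step, so the weight matrix can be read as a burner-to-burner coupling map.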

6. The kiln temperature prediction method based on the fusion of spatiotemporal Transformer and physical information neural network PINN according to claim 1, characterized in that, in step 2, the fusion of temporal and spatial features is achieved by either concatenation or weighted summation. The concatenation method is as follows: $F = W_f\,[F_t; F_s] + b_f$; the weighted summation method is as follows: $F = w_t F_t + w_s F_s$; where $w_t$ and $w_s$ are learnable weight coefficients satisfying $w_t + w_s = 1$, $W_f$ is the fusion weight matrix, $b_f$ is the fusion bias vector, $F$ is the fused feature representation, $F_t$ is the temporal feature representation, and $F_s$ is the spatial feature representation.
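
As a non-limiting illustration, both fusion options can be sketched with NumPy. The sketch assumes that the two weight coefficients are non-negative and sum to 1, enforced here by a softmax over two learnable logits, which is one possible choice and not taken from the claim; all names and values are hypothetical.

```python
import numpy as np

def fuse_concat(h_time, h_space, W_f, b_f):
    """Concatenate along the feature axis, then apply a linear projection."""
    return np.concatenate([h_time, h_space], axis=-1) @ W_f + b_f

def fuse_weighted(h_time, h_space, logits):
    """Convex combination; softmax keeps the two weights summing to 1."""
    logits = np.asarray(logits, dtype=float)
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return w[0] * h_time + w[1] * h_space

h_time = np.ones((2, 3))      # temporal features: 2 tokens, width 3
h_space = np.zeros((2, 3))    # spatial features, same shape

# Stacked identity blocks make the projection equal to h_time + h_space.
W_f = np.vstack([np.eye(3), np.eye(3)])
fused_cat = fuse_concat(h_time, h_space, W_f, np.zeros(3))
fused_sum = fuse_weighted(h_time, h_space, logits=[0.0, 0.0])
print(fused_cat, fused_sum)
```

Concatenation lets the projection mix individual feature channels, while the weighted sum has only two parameters and exposes directly how much the model relies on each branch.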

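7. The kiln temperature prediction method based on the fusion of spatiotemporal Transformer and physical information neural network PINN according to claim 1, characterized in that, in step 4, the formula for the joint loss function is: $L_{\text{total}} = L_{\text{data}} + \lambda L_{\text{physics}}$; the data fitting loss is: $L_{\text{data}} = \frac{1}{NH} \sum_{i=1}^{N} \sum_{t=1}^{H} \left( \hat{T}_{i,t} - T_{i,t} \right)^{2}$; the weighting coefficient $\lambda$ is adjusted using a dynamic scheduling strategy: in the initial training phase, the weight coefficient increases linearly from 0 to a preset value; in the later training phase, the weight coefficient remains unchanged at the preset value; where $T_{i,t}$ is the actual temperature of the $i$-th burner at time $t$, and, as in claim 1, $N$ is the number of burners, $H$ is the number of prediction time steps, and $\hat{T}_{i,t}$ is the predicted temperature of the $i$-th burner at time $t$.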
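
As a non-limiting illustration, the joint loss can be sketched as follows. The sketch assumes that the data fitting loss is a mean squared error over all burners and predicted time steps and that the physical loss value has already been computed elsewhere; the function names and numbers are hypothetical.

```python
import numpy as np

def data_loss(pred_temps, actual_temps):
    """Mean squared error over all burners and predicted time steps."""
    pred_temps = np.asarray(pred_temps, dtype=float)
    actual_temps = np.asarray(actual_temps, dtype=float)
    return float(np.mean((pred_temps - actual_temps) ** 2))

def joint_loss(l_data, l_physics, lam):
    """Weighted combination: data term plus lam times the physical term."""
    return l_data + lam * l_physics

# Made-up values: 2 burners, 2 predicted steps.
pred = [[1000.0, 1010.0], [900.0, 905.0]]
actual = [[1002.0, 1010.0], [900.0, 901.0]]
l_data = data_loss(pred, actual)
total = joint_loss(l_data, l_physics=0.5, lam=0.2)
print(l_data, total)
```

Starting with a small weight on the physical term lets the network first fit the data before the physical constraint begins to shape the solution.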

8. The kiln temperature prediction method based on the fusion of spatiotemporal Transformer and physical information neural network PINN according to claim 7, characterized in that, the specific formula for the dynamic scheduling strategy is as follows: $\lambda(e) = \lambda_{\max} \cdot \min\left(1, \frac{e}{E_{\text{warmup}}}\right)$; where $e$ is the current training round, $E_{\text{warmup}}$ is the number of warm-up rounds, set to 30% of the total number of training rounds, and $\lambda_{\max}$ is the maximum weight value, ranging from 0.1 to 0.3.
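
As a non-limiting illustration, the warm-up schedule for the physical loss weight can be sketched in plain Python. The function name `physics_weight`, the zero-based round counter, and the default maximum of 0.2 (a value inside the stated 0.1 to 0.3 range) are assumptions of this sketch.

```python
def physics_weight(epoch, total_epochs, lam_max=0.2, warmup_frac=0.3):
    """Linear ramp from 0 to lam_max during warm-up, constant afterwards."""
    warmup_epochs = warmup_frac * total_epochs
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return lam_max
    return lam_max * epoch / warmup_epochs

# With 100 rounds in total, warm-up covers the first 30 rounds.
schedule = [physics_weight(e, total_epochs=100) for e in (0, 15, 30, 99)]
print(schedule)
```

The weight is zero at the first round, reaches half of its maximum midway through warm-up, and stays at the maximum for the remaining 70% of training.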