Temperature control sensor data completion method based on generative adversarial network
By introducing physical residual hard embedding and sparse attention dimensionality reduction into generative adversarial networks, the problems of violating physical laws and high computational complexity in existing technologies are solved, and high-fidelity, stable and efficient data completion of temperature control sensor data is achieved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: SUZHOU HUIKE EQUIP CO LTD
- Filing Date: 2026-04-20
- Publication Date: 2026-06-26
AI Technical Summary
Existing temporal interpolation techniques based on generative adversarial networks are prone to generating non-physical, distorted data that violates the law of energy conservation when dealing with long-period, high-frequency missing data blocks. They also incur high computational complexity and are prone to gradient vanishing or mode collapse, making it difficult for the model to maintain stable convergence of the macroscopic physical evolution trend in extreme data blind zones.
By introducing physical residual hard embedding and sparse attention dimensionality reduction, a generative adversarial network model is constructed. The partial differential physical residual is calculated using automatic differentiation, and low-weight redundant vectors are removed by combining the sparse attention mechanism. A composite loss function and gradient penalty mechanism are constructed to ensure that the model conforms to physical laws and converges stably.
The method improves the physical authenticity and computational efficiency of temperature control sensor data completion, avoids erroneous control, reduces computing power consumption, and keeps the model anchored to the macroscopic evolution trend in extreme blind zones with a high missing rate, achieving high-fidelity time-series reconstruction.
Description
Technical Field
[0001] This invention relates to the fields of artificial intelligence and data processing technology, specifically to a method for data completion of temperature control sensors based on generative adversarial networks.
Background Technology
[0002] In complex industrial physical systems, temperature control sensor networks are crucial for equipment condition monitoring. However, due to extreme environments and transmission interference, sensor data often suffers from contiguous blocks of missing values. Using generative neural networks for time-series interpolation has become the mainstream research direction for data reliability recovery.
[0003] Existing temporal imputation techniques based on generative adversarial networks (GANs) are mostly purely data-driven black-box models that fit missing sequences primarily through statistical probability distributions. They have significant drawbacks when dealing with long-period, high-frequency missing data blocks. First, traditional networks lack underlying macroscopic physical constraints such as thermodynamics, making them prone to generating non-physical, distorted data that violates the law of energy conservation and leads to erroneous control. Second, conventional global self-attention mechanisms incur quadratic growth in computational and memory complexity when processing high-dimensional temporal data, resulting in enormous computational overhead. Finally, conventional adversarial loss mechanisms are prone to gradient vanishing or mode collapse during deep network iterations, making it difficult for the model to maintain stable convergence of macroscopic physical evolution trends in extreme data blind zones.
[0004] The present invention aims to address the technical shortcomings of existing pure data-driven models, such as violating physical laws, having excessively high computational complexity, and being extremely prone to mode collapse.
[0005] To address this, a data completion method for temperature control sensors based on generative adversarial networks is proposed.
Summary of the Invention
[0006] The purpose of this invention is to provide a method for data completion of temperature control sensors based on generative adversarial networks, which achieves high-fidelity time sequence reconstruction that conforms to physical laws through physical residual hard embedding and sparse attention dimensionality reduction.
[0007] To achieve the above objectives, the present invention provides the following technical solution:
[0008] A method for data completion of temperature control sensors based on generative adversarial networks includes:
[0009] Obtaining a time-series data matrix of the temperature control sensors that contains missing values; constructing a model containing a generator and a discriminator, wherein the generator is configured with a forward topological hard embedding layer and an active feature selection module;
[0010] The time-series data matrix is input into the forward topological hard embedding layer, and automatic differentiation is used to calculate the first-order temporal partial derivative and the second-order spatial partial derivative matrix to obtain the partial differential physical residual; the partial differential physical residual is embedded into the basic neuron transfer equation to output the latent feature matrix;
[0011] The latent feature matrix is input into the active feature selection module; the divergence weight of the query vector is calculated and low-weight redundant vectors are removed using a sparse attention mechanism; the retained vectors are dimensionally compressed to output a global temporal feature matrix, and then decoded to generate the first synthetic sequence matrix;
[0012] The first synthesized sequence matrix and the real sequence are input into the discriminator to obtain the discrimination confidence; based on the discrimination confidence, a composite loss function consisting of adversarial loss, reconstruction loss and residual constraint loss is calculated;
[0013] The network parameters are iterated using a gradient penalty mechanism until the composite loss function converges; the time-series data matrix is interpolated to output the target temperature time-series completion matrix.
[0014] Preferably, the step of using automatic differentiation to obtain the partial differential physical residual includes:
[0015] In the forward topology hard embedding layer, the automatic differentiation is used to construct a neural network forward computation graph about the temperature variable; based on the neural network forward computation graph, the first-order partial derivative of the temperature variable with respect to the time step is extracted from the time-series data matrix; the second-order partial derivative matrix of the temperature variable with respect to the three-dimensional spatial dimension is extracted, and matrix scaling is performed on the second-order partial derivative matrix using the effective thermal diffusivity of the target environment; the difference matrix between the first-order partial derivative and the scaled second-order partial derivative matrix is calculated, and the difference matrix is defined as the partial differential physical residual.
[0016] Preferably, embedding the partial differential physical residual into the basic neuron transfer equation and outputting a latent feature matrix includes: obtaining the weight matrix and bias vector of the basic neuron transfer equation in the forward topological hard embedding layer; performing a linear mapping calculation on the time-series data matrix using the weight matrix and superimposing the bias vector to generate an initial feature state matrix; obtaining preset physical constraint weight coefficients and performing matrix scaling calculation on the partial differential physical residual using the physical constraint weight coefficients; performing element-wise summation operation on the initial feature state matrix and the scaled partial differential physical residual to generate a physical constraint feedforward matrix; inputting the physical constraint feedforward matrix into a nonlinear activation function for activation calculation, and defining the output result of the nonlinear activation function as the latent feature matrix.
[0017] Preferably, calculating the query vector divergence weight and using a sparse attention mechanism to remove low-weight redundant vectors includes: in the active feature selection module, generating corresponding feature query vectors and feature key vectors based on the latent feature matrix; calculating the attention probability distribution between each feature query vector and all feature key vectors; using a metric distribution divergence algorithm to calculate the difference metric between the attention probability distribution and the standard uniform distribution; defining the difference metric as the query vector divergence weight corresponding to the feature query vector; setting an activity weight threshold, and using the sparse attention mechanism to extract feature query vectors whose query vector divergence weight is not less than the activity weight threshold as the retained vectors; and determining feature query vectors whose query vector divergence weight is lower than the activity weight threshold as low-weight redundant vectors.
[0018] Preferably, generating the first synthesized sequence matrix includes: in the active feature selection module, generating a feature value vector matching the feature query vector and the feature key vector based on the latent feature matrix; extracting the attention probability distribution corresponding to the retained vector, and constructing a query-key relevance weight matrix using the attention probability distribution of the retained vector; performing matrix multiplication on the query-key relevance weight matrix and the feature value vector to achieve the dimensionality compression based on the feature weights, and defining the aggregated output of the matrix multiplication operation as the global temporal feature matrix; inputting the global temporal feature matrix into the generative decoding network layer configured in the generator; in the generative decoding network layer, performing a single forward network reconstruction on the global temporal feature matrix through multi-layer fully connected mapping calculation to restore it to the same time step and spatial channel dimension as the temporal data matrix, and defining the reconstructed sequence matrix as the first synthesized sequence matrix.
[0019] Preferably, calculating the composite loss function includes: based on the discrimination confidence level, calculating a probability distribution difference metric between the generated data distribution of the first synthesized sequence matrix and the real data distribution of the real sequence, and defining the probability distribution difference metric as the adversarial loss; extracting known observation data points that are not missing from the time-series data matrix, calculating the mean square error between the first synthesized sequence matrix and the real sequence at the corresponding positions of the known observation data points, and defining the mean square error as the reconstruction loss; extracting target imputation data points that are missing from the time-series data matrix, obtaining the partial differential physical residual calculated by the forward topology hard embedding layer, calculating the mean square error of the partial differential physical residual at the corresponding positions of the target imputation data points, and defining the mean square error as the residual constraint loss; obtaining preset adversarial loss weight coefficients, reconstruction loss weight coefficients, and residual constraint weight coefficients, performing multiplication operations on the adversarial loss, reconstruction loss, and residual constraint loss using the adversarial loss weight coefficients, the reconstruction loss weight coefficients, and the residual constraint weight coefficients respectively, and weighting and summing the output results of the multiplication operations to generate the composite loss function.
[0020] Preferably, the gradient penalty mechanism includes: performing a linear interpolation calculation with a random ratio between the first synthesized sequence matrix and the real sequence to generate an interpolated sample matrix; inputting the interpolated sample matrix into the discriminator and calculating the gradient norm of the discriminator's output relative to the interpolated sample matrix; calculating the squared difference between the gradient norm and 1, and multiplying the squared difference by a preset penalty coefficient to generate a gradient penalty regularization term; superimposing the gradient penalty regularization term into the composite loss function, and performing backpropagation based on the superimposed composite loss function to iteratively update the network parameters in the model.
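The penalty term described in [0020] can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the critic is a hypothetical toy function D(x) = v · tanh(Wx), chosen because its input gradient has a closed form, whereas a real implementation would obtain the gradient by automatic differentiation. The names `critic_input_grad` and `gradient_penalty` and the default coefficient of 10.0 are not from the source.

```python
import numpy as np

def critic_input_grad(x, W, v):
    """Gradient of the toy critic D(x) = v . tanh(W x) with respect to x.

    x: (batch, features), W: (hidden, features), v: (hidden,).
    The toy critic is an assumption made so the gradient is analytic.
    """
    h = np.tanh(x @ W.T)                  # (batch, hidden)
    return ((1.0 - h ** 2) * v) @ W       # (batch, features)

def gradient_penalty(real, fake, W, v, coeff=10.0, rng=None):
    """Penalty coeff * mean((||grad D(x_hat)|| - 1)^2) on interpolated samples."""
    rng = np.random.default_rng() if rng is None else rng
    # One random mixing ratio per sample, broadcast across features.
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake       # interpolated sample matrix
    grad = critic_input_grad(x_hat, W, v)
    norm = np.linalg.norm(grad, axis=1)           # gradient norm per sample
    return coeff * np.mean((norm - 1.0) ** 2)
```

The returned value would be added to the composite loss before backpropagation; the penalty is zero only when every interpolated sample has a gradient norm of exactly 1.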
[0021] Compared with the prior art, the beneficial effects of the present invention are as follows:
[0022] 1. This invention breaks through the black-box limitations of traditional purely data-driven models by using automatic differentiation to compute partial differential physical residuals and hard-embedding them into the basic neuron transfer equation. This mechanism introduces strict physical constraints at the lowest level of network forward propagation, structurally blocking the generation of distorted data that violates thermodynamic laws and energy conservation, improving the physical authenticity of the interpolated sequence, and effectively avoiding erroneous decisions caused by abnormal data in industrial closed-loop control systems.
[0023] 2. This invention employs a metric distribution divergence algorithm to calculate query vector weights and actively eliminates low-weight redundant vectors through a sparse attention mechanism to achieve dimensionality compression. This effectively overcomes the problem of quadratic computational complexity growth inherent in traditional global self-attention mechanisms when processing high-frequency, high-dimensional time-series data, reducing computational overhead and memory consumption. While ensuring no omission of core global temporal features, it improves the computational efficiency of sequence decoding.
[0024] 3. This invention constructs a composite loss function that integrates adversarial, reconstruction, and residual constraints, and introduces a gradient penalty mechanism based on random interpolation samples. This mechanism constrains the gradient norm of the discriminator during differentiation, effectively alleviating the gradient vanishing and mode collapse problems that easily occur in deep generative adversarial networks during complex temporal training, ensuring the convergence stability of adversarial training, and enabling the model to accurately anchor the macroscopic evolution trend even when facing a high missing rate extreme blind zone.
Attached Figure Description
[0025] Figure 1 is a flowchart of the temperature control sensor data completion method based on generative adversarial networks proposed in this invention.
[0026] Figure 2 is a flowchart of the temperature control sensor data completion method based on generative adversarial networks proposed in this invention.
[0027] Figure 3 is a flowchart of the method for calculating the composite loss function according to the present invention.
Detailed Implementation
[0028] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
Example 1
[0029] Referring to Figures 1 to 3, this invention provides a method for data completion of temperature control sensors based on generative adversarial networks, the technical solution of which is as follows:
[0030] As shown in Figures 1 and 2, a method for data completion of temperature control sensors based on generative adversarial networks includes:
[0031] Obtaining a time-series data matrix of the temperature control sensors that contains missing values; constructing a model containing a generator and a discriminator, wherein the generator is configured with a forward topological hard embedding layer and an active feature selection module;
[0032] The time-series data matrix is input into the forward topological hard embedding layer, and automatic differentiation is used to calculate the first-order temporal partial derivative and the second-order spatial partial derivative matrix to obtain the partial differential physical residual; the partial differential physical residual is embedded into the basic neuron transfer equation to output the latent feature matrix;
[0033] The latent feature matrix is input into the active feature selection module; the divergence weight of the query vector is calculated and low-weight redundant vectors are removed using a sparse attention mechanism; the retained vectors are dimensionally compressed to output a global temporal feature matrix, and then decoded to generate the first synthetic sequence matrix;
[0034] The first synthesized sequence matrix and the real sequence are input into the discriminator to obtain the discrimination confidence; based on the discrimination confidence, a composite loss function consisting of adversarial loss, reconstruction loss and residual constraint loss is calculated;
[0035] The network parameters are iterated using a gradient penalty mechanism until the composite loss function converges; the time-series data matrix is interpolated to output the target temperature time-series completion matrix.
[0036] Furthermore, the step of using automatic differentiation to calculate the first-order temporal partial derivative and the second-order spatial partial derivative matrix to obtain the partial differential physical residual includes:
[0037] In the forward topology hard embedding layer, the automatic differentiation is used to construct a neural network forward computation graph about the temperature variable; based on the neural network forward computation graph, the first-order partial derivative of the temperature variable with respect to the time step is extracted from the time-series data matrix; the second-order partial derivative matrix of the temperature variable with respect to the three-dimensional spatial dimension is extracted, and matrix scaling is performed on the second-order partial derivative matrix using the effective thermal diffusivity of the target environment; the difference matrix between the first-order partial derivative and the scaled second-order partial derivative matrix is calculated, and the difference matrix is defined as the partial differential physical residual.
[0038] The forward topology hard embedding layer specifically adopts a multi-layer feedforward network topology based on implicit neural representation. In its specific network architecture, the number of network layers is configured to include one input mapping sub-layer, three consecutively stacked hidden computation sub-layers, and one feature-aligned output sub-layer. Regarding neuron connections, the aforementioned sub-layers strictly employ a fully connected topology for forward signal transmission, allowing for the full cross-fusion of the time dimension and the three-dimensional spatial coordinate dimension in the temporal data matrix. All basic neurons in the three consecutively stacked hidden computation sub-layers are configured with the hyperbolic tangent activation function, which possesses second-order continuous differentiability, thereby constructing a globally continuous and smooth neural network forward computation graph.
[0039] Specifically, extracting the second-order partial derivative matrix of the temperature variable with respect to the three-dimensional spatial dimension involves: calculating the unmixed second-order partial derivatives of the temperature variable in the three-dimensional spatial direction using automatic differentiation, and summing and aggregating the unmixed second-order partial derivatives in the three-dimensional spatial direction to obtain the second-order partial derivative matrix characterizing the Laplace operator of spatial heat conduction; the effective thermal diffusivity of the target environment is pre-calibrated and input from the material physical constants of the target monitoring area.
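Written out, the residual described in [0037] and [0039] takes the form below. The symbols T (temperature), α (effective thermal diffusivity) and R (residual) are notation chosen here for illustration; the source describes these quantities only in words.

```latex
\mathcal{R}(t, x, y, z)
  = \frac{\partial T}{\partial t}
  - \alpha \left(
      \frac{\partial^{2} T}{\partial x^{2}}
    + \frac{\partial^{2} T}{\partial y^{2}}
    + \frac{\partial^{2} T}{\partial z^{2}}
    \right)
```

The residual vanishes wherever T satisfies the source-free heat conduction equation exactly, so its magnitude measures how far a candidate temperature field departs from that equation.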
[0040] In practical industrial applications, the effective thermal diffusivity depends directly on the medium in which the sensor is located. When the temperature sensor is deployed on a thermally conductive component made of pure copper, the effective thermal diffusivity can be pre-calibrated using a table. When deployed in a normal-temperature air environment, this coefficient can likewise be pre-calibrated. When the continuous thermal diffusion equation is applied to a discrete sensor network, the five-point difference scheme of the finite difference method is used to discretize the second-order spatial partial derivatives, with the sensor node positions mapped to discrete grid points.
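The discretization mentioned in [0040] can be illustrated with a short NumPy sketch. It assumes sensors on a uniform two-dimensional grid, which is the setting where the five-point stencil applies, and a forward difference in time. The function name `heat_residual`, the array layout and the arguments `dt` and `h` are hypothetical, and no diffusivity value is taken from the source.

```python
import numpy as np

def heat_residual(T, alpha, dt, h):
    """Finite-difference heat-equation residual on interior grid nodes.

    T:     array of shape (time, ny, nx), temperatures on a uniform grid
    alpha: effective thermal diffusivity (pre-calibrated for the medium)
    dt, h: time step and grid spacing (assumed uniform)
    Returns an array of shape (time - 1, ny - 2, nx - 2).
    """
    # Forward difference in time.
    dT_dt = (T[1:] - T[:-1]) / dt
    # Five-point stencil: four neighbours minus four times the centre node.
    centre = T[:-1, 1:-1, 1:-1]
    lap = (T[:-1, 2:, 1:-1] + T[:-1, :-2, 1:-1]
           + T[:-1, 1:-1, 2:] + T[:-1, 1:-1, :-2]
           - 4.0 * centre) / h ** 2
    return dT_dt[:, 1:-1, 1:-1] - alpha * lap
```

For a field such as T = x² + y² + 4αt, which solves the two-dimensional heat equation, this residual is zero up to floating-point error, which makes a convenient sanity check.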
[0041] This invention combines the effective thermal diffusivity of real industrial media with automatic differentiation technology to construct partial differential physical residuals characterizing the Laplace operator. This mechanism embeds the physical laws of thermal conduction directly into the neural network, overcoming the black-box limitations of traditional pure data models. It not only improves the physical fidelity of the supplementary data, preventing predictions from violating the law of conservation of energy, but also enables the model to adaptively fit to different sensor deployment media, enhancing the accuracy and generalization performance of data reconstruction in complex industrial environments.
[0042] Further, embedding the partial differential physical residual into the basic neuron transfer equation and outputting a latent feature matrix includes: obtaining the weight matrix and bias vector of the basic neuron transfer equation in the forward topological hard embedding layer; performing linear mapping calculation on the time-series data matrix using the weight matrix and superimposing the bias vector to generate an initial feature state matrix; obtaining preset physical constraint weight coefficients and performing matrix scaling calculation on the partial differential physical residual using the physical constraint weight coefficients; performing element-wise summation operation on the initial feature state matrix and the scaled partial differential physical residual to generate a physical constraint feedforward matrix; inputting the physical constraint feedforward matrix into a nonlinear activation function for activation calculation, and defining the output result of the nonlinear activation function as the latent feature matrix.
[0043] The physical constraint weight coefficient controls the fusion ratio between data-driven features and the physical laws of heat conduction. Its value range is the interval [0, 1]. In the initial stage of model training, it is given a high initial weight biased towards the physical laws. Subsequently, it decays exponentially with the number of backpropagation iterations according to a preset decay rate. The nonlinear activation function is a smooth activation function with second-order continuous differentiability, specifically the hyperbolic tangent function, which ensures that the non-zero second-order spatial partial derivative matrix can be continuously extracted through automatic differentiation in the forward topological hard embedding layer, avoiding the truncation of the physical residual calculation caused by gradient discontinuity.
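The embedding step of [0042] and the decay schedule of [0043] can be sketched as follows. This is a minimal illustration, assuming the residual has already been brought to the same shape as the linear output (the source does not state how the shapes are aligned). The names `constraint_weight` and `hard_embedding` and the values 0.9 and 1e-3 are placeholders, not values from the source.

```python
import numpy as np

def constraint_weight(step, lam0=0.9, decay=1e-3):
    """Exponentially decaying physical constraint weight, staying within [0, 1]
    as long as 0 <= lam0 <= 1 and decay >= 0 (both values are assumptions)."""
    return lam0 * np.exp(-decay * step)

def hard_embedding(X, W, b, residual, lam):
    """One forward pass of the physically constrained layer.

    X: (n, d_in), W: (d_in, d_out), b: (d_out,), residual: (n, d_out).
    """
    Z0 = X @ W + b              # initial feature state matrix
    Z = Z0 + lam * residual     # physically constrained feed-forward matrix
    return np.tanh(Z)           # latent feature matrix (tanh is twice differentiable)
```

With `lam = 0` the layer reduces to an ordinary fully connected layer with a tanh activation, which is the limit the decay schedule approaches late in training.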
[0044] This invention employs dynamically decaying physical weight coefficients, strongly guiding model convergence with physical priors in the early stages of training, while focusing on data fitting in the later stages, perfectly balancing physical fidelity and completion accuracy. Simultaneously, the use of a second-order continuously differentiable activation function ensures the continuous extraction of non-zero second-order partial derivatives of the spatial matrix through automatic differentiation, avoiding gradient truncation and ensuring the physical residual constraint mechanism is truly implemented and effective.
[0045] Further, calculating the query vector divergence weights and using a sparse attention mechanism to remove low-weight redundant vectors includes: in the active feature selection module, generating corresponding feature query vectors and feature key vectors based on the latent feature matrix; calculating the attention probability distribution between each feature query vector and all feature key vectors; using a metric distribution divergence algorithm to calculate the difference metric between the attention probability distribution and the standard uniform distribution; defining the difference metric as the query vector divergence weight corresponding to the feature query vector; setting an activity weight threshold, and using the sparse attention mechanism to extract feature query vectors whose query vector divergence weights are not less than the activity weight threshold as the retained vectors; and determining feature query vectors whose query vector divergence weights are lower than the activity weight threshold as low-weight redundant vectors.
[0046] Specifically, the metric distribution divergence algorithm is the Kullback-Leibler relative entropy algorithm, which obtains the difference metric value by calculating the information gain of the attention probability distribution relative to the standard uniform distribution; the activity weight threshold is a dynamic adaptive threshold, which is specifically set as follows: calculate the arithmetic mean of the divergence weights of the query vectors corresponding to all feature query vectors in the current network layer, multiply the arithmetic mean by a preset sparsity retention coefficient, and set the multiplication result as the activity weight threshold.
[0047] The preset sparsity retention coefficient is set to a range of [0.7, 1.3]. In actual engineering implementation, the specific value of this coefficient is adaptively matched and set according to the degree of dynamic temperature change in the target monitoring area.
[0048] When the temperature control sensor is in a low dynamic stable condition with long-term constant temperature or slow temperature drift, there is a lot of similar redundancy in the target time series data. The recommended value range for the preset sparsity retention coefficient is [1.1, 1.2].
[0049] When the temperature control sensor is in a highly dynamic operating condition with frequent temperature fluctuations and dense abrupt changes (such as the start-up and shutdown of the refrigeration unit or the charging stage of the blast furnace), the recommended range for the preset sparsity retention coefficient is [0.8, 0.9].
[0050] In standard operating conditions where prior dynamic characteristics are lacking, the preset sparsity retention coefficient is set to a baseline value of 1.0 by default, that is, the arithmetic mean is directly used as the dividing line for feature selection.
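The selection rule of [0045] to [0050] can be sketched in NumPy. The scaled dot-product softmax used to form the attention distribution is an assumption, since the source only says that an attention probability distribution is computed. The names `softmax` and `select_queries` are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax with the usual max-subtraction for stability."""
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)

def select_queries(Q, K, retention=1.0):
    """Keep queries whose KL divergence from uniform attention reaches the threshold.

    Q: (n_queries, d), K: (n_keys, d).
    retention: sparsity retention coefficient (the source gives the range [0.7, 1.3]).
    Returns (indices of retained queries, attention matrix, divergence weights).
    """
    n_keys, d = K.shape
    P = softmax(Q @ K.T / np.sqrt(d))
    # KL(P || U) = sum_j p_j log p_j + log n for a uniform U over n keys.
    kl = np.sum(P * np.log(P + 1e-12), axis=-1) + np.log(n_keys)
    threshold = retention * kl.mean()     # adaptive threshold from the layer mean
    keep = np.flatnonzero(kl >= threshold)
    return keep, P, kl
```

A query that attends almost uniformly to all keys has a divergence weight near zero and is dropped first; raising `retention` above 1.0 prunes more queries, lowering it keeps more.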
[0051] This invention employs Kullback-Leibler relative entropy to precisely quantify the information gain of attention distribution, scientifically evaluating and identifying high-value features. Combined with an adaptive threshold dynamically calculated based on the mean of the current layer weights, the model can flexibly adjust the sparsity boundary according to the real-time data distribution. This design avoids the erroneous feature deletion or computational waste caused by fixed thresholds. While eliminating low-weight redundant noise and significantly reducing computational overhead, it adaptively preserves core temporal features, achieving a perfect balance between computational efficiency and data reconstruction accuracy.
[0052] Further, generating the first synthetic sequence matrix includes: in the active feature selection module, generating a feature value vector matching the feature query vector and the feature key vector based on the latent feature matrix; extracting the attention probability distribution corresponding to the retained vector, and constructing a query-key relevance weight matrix using the attention probability distribution of the retained vector; performing matrix multiplication on the query-key relevance weight matrix and the feature value vector to achieve the dimensionality compression based on feature weights, and defining the aggregated output result of the matrix multiplication operation as the global temporal feature matrix; inputting the global temporal feature matrix into the generative decoding network layer configured in the generator; in the generative decoding network layer, performing a single forward network reconstruction on the global temporal feature matrix through multi-layer fully connected mapping calculation to restore it to the same time step and spatial channel dimension as the temporal data matrix, and defining the reconstructed sequence matrix as the first synthetic sequence matrix.
[0053] Specifically, the dimension compression based on feature weights is achieved as follows: since the query-key relevance weight matrix is constructed only from the retained vectors after removing low-weight redundant vectors, its time step dimension is smaller than the original sequence, thus reducing the time dimension of the aggregated output result after performing matrix multiplication with the feature value vector.
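The shape reduction described in [0053] can be made concrete with a few lines of NumPy. The function name `compress` and the argument layout are hypothetical; the attention matrix and the retained indices are assumed to come from the preceding selection step.

```python
import numpy as np

def compress(P, V, keep):
    """Aggregate value vectors using only the retained queries.

    P:    (n_queries, n_keys) attention probability distributions
    V:    (n_keys, d) feature value vectors
    keep: indices of the retained query vectors
    Returns the global temporal feature matrix of shape (len(keep), d).
    """
    A = P[keep]      # query-key relevance weight matrix, retained rows only
    return A @ V     # time dimension shrinks from n_queries to len(keep)
```

Each output row is a convex combination of the value vectors, so the feature dimension d is unchanged while the number of time steps drops to the number of retained queries.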
[0054] In the generative decoding network layer, a single forward network reconstruction is performed on the global temporal feature matrix through multi-layer fully connected mapping calculation to restore it to the same time step and spatial channel dimension as the temporal data matrix. Specifically, a feature reshaping operator and a one-dimensional transposed convolution operator are additionally configured in the generative decoding network layer. First, the global temporal feature matrix is mapped to a high-dimensional hidden layer vector through the multi-layer fully connected mapping calculation. Second, the high-dimensional hidden layer vector is folded into a multi-channel sequence tensor using the feature reshaping operator. Finally, the multi-channel sequence tensor is subjected to temporal upsampling and channel dimensionality reduction through the one-dimensional transposed convolution operator, thereby accurately restoring it to the same time step and spatial channel dimension as the temporal data matrix.
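The reshaping and temporal upsampling steps of [0054] can be illustrated with a plain NumPy transposed convolution. This sketch assumes no padding and a weight layout of (input channels, output channels, kernel width); the name `conv_transpose_1d` and all sizes are hypothetical, and the fully connected stage is omitted.

```python
import numpy as np

def conv_transpose_1d(x, w, stride):
    """One-dimensional transposed convolution without padding.

    x: (c_in, l_in) multi-channel sequence tensor
    w: (c_in, c_out, k) kernel
    Output length is (l_in - 1) * stride + k.
    """
    c_in, l_in = x.shape
    _, c_out, k = w.shape
    y = np.zeros((c_out, (l_in - 1) * stride + k))
    for i in range(l_in):
        # Each input step scatters a k-wide patch into the output.
        y[:, i * stride:i * stride + k] += np.einsum("c,cok->ok", x[:, i], w)
    return y

# Fold a flat hidden vector into 4 channels of 6 steps, then upsample in time
# by a factor of 2 while reducing the channel count from 4 to 2.
hidden = np.arange(24.0)
x = hidden.reshape(4, 6)                       # feature reshaping operator
y = conv_transpose_1d(x, np.ones((4, 2, 2)), stride=2)
```

With a kernel width equal to the stride, the scattered patches do not overlap, so the output length is exactly `stride` times the input length.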
[0055] To ensure that the multi-layer fully connected mapping calculation can fully exploit the nonlinear expression of temporal features and stably restore the spatiotemporal tensor structure, the specific network architecture and hyperparameter configuration are as follows. The multi-layer fully connected mapping calculation consists of three consecutively stacked fully connected hidden layers. To progressively increase the dimensionality of the feature space, the number of neurons in the three fully connected hidden layers increases layer by layer according to a preset expansion ratio, specifically set to 2 times, 4 times, and 8 times the dimension of the global temporal feature matrix, respectively. In terms of signal transmission logic, each fully connected hidden layer is followed by a layer normalization operator and a nonlinear activation function, wherein, to maintain smooth compatibility with the gradient of the preceding physical residual features, the nonlinear activation function is selected as the Gaussian error linear unit. In addition, to overcome the gradient vanishing and feature decay problems that are prone to occur during deep network reconstruction, an additional cross-layer residual connection with identity mapping is constructed between the input end of the first fully connected hidden layer and the output end of the third fully connected hidden layer, and the original global temporal feature matrix is directly accumulated onto the high-dimensional reconstruction output feature, the result being used as the high-dimensional hidden layer vector input to the feature reshaping operator.
[0056] This invention achieves a substantial reduction in the time dimension by keeping only the retained vectors, effectively eliminating redundant noise and significantly reducing the computational overhead of subsequent decoding. At the same time, it innovatively introduces a feature reshaping operator and a one-dimensional transposed convolution operator, overcoming the limitation of a single fully connected layer in accurately reconstructing the spatiotemporal tensor structure. This design realizes a logical closed loop of efficient feature purification and high-fidelity temporal upsampling, balancing the computational efficiency of adversarial networks with the accuracy of multi-dimensional temporal reconstruction.
[0057] Furthermore, as shown in Figure 3, the calculation of the composite loss function includes: calculating a probability distribution difference metric between the generated data distribution of the first synthesized sequence matrix and the real data distribution of the real sequence based on the discrimination confidence level, and defining the probability distribution difference metric as the adversarial loss; extracting known observation data points that are not missing from the time-series data matrix, calculating the mean square error between the first synthesized sequence matrix and the real sequence at the corresponding positions of the known observation data points, and defining the mean square error as the reconstruction loss; extracting target imputation data points that are missing from the time-series data matrix, obtaining the partial differential physical residual calculated by the forward topology hard embedding layer, calculating the mean square error of the partial differential physical residual at the corresponding positions of the target imputation data points, and defining the mean square error as the residual constraint loss; obtaining preset adversarial loss weight coefficients, reconstruction loss weight coefficients, and residual constraint weight coefficients, performing multiplication operations on the adversarial loss, reconstruction loss, and residual constraint loss using the adversarial loss weight coefficients, the reconstruction loss weight coefficients, and the residual constraint weight coefficients respectively, and weighting and summing the output results of the multiplication operations to generate the composite loss function.
[0058] To meet the convergence requirements of the gradient penalty mechanism, the probability distribution difference metric is specifically calculated based on the Wasserstein distance algorithm. The adversarial loss is obtained by quantifying the minimum earth mover's cost required to transform the generated data distribution into the real data distribution. The specific configuration rules for the adversarial loss weight coefficient, reconstruction loss weight coefficient, and residual constraint weight coefficient are as follows: to balance the backpropagation of gradients of different magnitudes, the reconstruction loss weight coefficient is set as a baseline dominant constant, the adversarial loss weight coefficient is set as an empirical fixed value with a magnitude smaller than the reconstruction loss weight coefficient, and the residual constraint weight coefficient is set as an adaptive variable that dynamically increases with the number of training iterations. This achieves the composite optimization objective of prioritizing the reconstruction of basic temporal data in the early stage of model training and finely calibrating the distribution boundary using physical constraints in the later stage of training.
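The weighting scheme above can be illustrated with a short sketch. Several details here are assumptions rather than statements from the text: the generator-side adversarial term is written in the usual Wasserstein form (the negated mean critic score of generated data), the residual weight follows a hypothetical linear ramp, and every numeric weight and sample value is invented for illustration.

```python
def masked_mse(values, targets, mask):
    """Mean square error over the positions where mask is 1."""
    selected = [(v - t) ** 2 for v, t, m in zip(values, targets, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


def residual_weight(step, total_steps, max_weight):
    """Hypothetical schedule: linear ramp from 0 up to max_weight."""
    return max_weight * min(step / total_steps, 1.0)


def composite_loss(critic_fake, generated, real, observed_mask, residual,
                   step, total_steps, w_rec=1.0, w_adv=0.1, w_res_max=0.5):
    # Assumed generator-side Wasserstein term: raise the critic score of generated data.
    adversarial = -sum(critic_fake) / len(critic_fake)
    # Reconstruction loss: only at known (non-missing) observation points.
    reconstruction = masked_mse(generated, real, observed_mask)
    # Residual constraint loss: only at missing (target imputation) points.
    missing_mask = [1 - m for m in observed_mask]
    residual_loss = masked_mse(residual, [0.0] * len(residual), missing_mask)
    w_res = residual_weight(step, total_steps, w_res_max)
    return w_adv * adversarial + w_rec * reconstruction + w_res * residual_loss


# Invented sample data: two observed points followed by two missing points.
generated = [1.0, 2.0, 3.0, 4.0]
real = [1.0, 2.5, 0.0, 0.0]
observed_mask = [1, 1, 0, 0]
residual = [0.0, 0.0, 0.2, 0.4]
critic_fake = [0.5, 1.5]

early = composite_loss(critic_fake, generated, real, observed_mask, residual,
                       step=0, total_steps=100)
late = composite_loss(critic_fake, generated, real, observed_mask, residual,
                      step=100, total_steps=100)
```

With identical inputs, the late-stage loss is larger than the early-stage loss only because the residual constraint weight has grown, which is the "reconstruction first, physical calibration later" behavior the paragraph describes.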
[0059] This invention employs the Wasserstein distance combined with a gradient penalty mechanism to effectively overcome the mode collapse problem that is prone to occur in adversarial networks. A dynamic weight configuration strategy balances multi-objective gradient backpropagation: initially, the reconstruction loss dominates to ensure time-series fitting, while later, dynamically increasing physical weights fine-tune the distribution boundaries. This mechanism guides the model to achieve smooth convergence through "data reconstruction first, then physical calibration," improving the physical fidelity of the imputed sequence and the overall interpolation accuracy.
[0060] Furthermore, the gradient penalty mechanism includes: performing a linear interpolation calculation with a random ratio between the first synthesized sequence matrix and the real sequence to generate an interpolated sample matrix; inputting the interpolated sample matrix into the discriminator and calculating the gradient norm of the discriminator's output relative to the interpolated sample matrix; calculating the squared difference between the gradient norm and 1, and multiplying the squared difference by a preset penalty coefficient to generate a gradient penalty regularization term; superimposing the gradient penalty regularization term into the composite loss function, and performing backpropagation based on the superimposed composite loss function to iteratively update the network parameters in the model.
[0061] A linear interpolation calculation with a random ratio is performed between the first synthesized sequence matrix and the real sequence. Specifically, random weight coefficients are generated by independently sampling from a standard uniform distribution in the interval [0, 1]. The first synthesized sequence matrix and the real sequence are weighted and summed using the random weight coefficients to generate the interpolated sample matrix. The gradient norm is specifically the L2 norm of the partial derivative of the discriminator's output with respect to the interpolated sample matrix. By calculating the square of the difference between the L2 norm and 1 and applying a penalty, the gradient change rate of the discriminator is forced to approach 1, so that the discriminator strictly satisfies the 1-Lipschitz continuity condition in the entire data sample space, thereby ensuring the gradient stability of the composite loss function based on the Wasserstein distance in the global network optimization process.
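A minimal sketch of this penalty term is given below. To stay dependency-free it estimates the gradient with central differences, which stands in for the partial derivatives that an automatic differentiation framework would supply; the two linear critics and the penalty coefficient of 10.0 are hypothetical choices made only so the result is easy to check by hand.

```python
import math
import random


def numeric_gradient(critic, x, eps=1e-6):
    """Central-difference gradient of a scalar critic with respect to x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((critic(up) - critic(down)) / (2 * eps))
    return grad


def gradient_penalty(critic, generated, real, penalty_coeff=10.0, rng=random):
    ratio = rng.uniform(0.0, 1.0)  # random ratio sampled from U[0, 1]
    # Linear interpolation between the real and the generated sample.
    interpolated = [ratio * r + (1.0 - ratio) * g
                    for r, g in zip(real, generated)]
    grad = numeric_gradient(critic, interpolated)
    grad_norm = math.sqrt(sum(g * g for g in grad))  # L2 norm of the gradient
    return penalty_coeff * (grad_norm - 1.0) ** 2


def unit_critic(x):
    # Hypothetical critic whose gradient (0.6, 0.8) has L2 norm 1.
    return 0.6 * x[0] + 0.8 * x[1]


def steep_critic(x):
    # Hypothetical critic whose gradient (3, 4) has L2 norm 5.
    return 3.0 * x[0] + 4.0 * x[1]


gp_unit = gradient_penalty(unit_critic, [0.0, 1.0], [1.0, 0.0])
gp_steep = gradient_penalty(steep_critic, [0.0, 1.0], [1.0, 0.0])
```

The first critic already has unit gradient norm, so its penalty is approximately zero; the second is penalized by roughly 10 * (5 - 1)^2 = 160, which is the pressure that pushes the discriminator toward the 1-Lipschitz condition.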
[0062] This invention forces the discriminator to strictly satisfy the 1-Lipschitz continuity condition in the global data space by penalizing the L2 norm of the partial derivatives of randomly interpolated samples. This mechanism effectively overcomes the gradient explosion and mode collapse problems that are common in traditional adversarial networks from the underlying algorithmic logic, ensuring gradient stability when optimizing based on Wasserstein distance, and ensuring that the model achieves smooth, stable convergence and high-fidelity interpolation in complex temporal reconstruction tasks.
[0063] This invention embeds partial differential physical residuals into the bottom layer of a neural network, breaking the black-box limitations of pure data models and ensuring that the completed sequence strictly follows the conservation laws of thermodynamics. At the same time, it actively removes redundant features by using divergence weights and sparse attention mechanisms, reducing the computational cost of high-dimensional time series data. Combined with composite loss and gradient penalty mechanisms, it effectively overcomes the problem of adversarial network mode collapse, ensuring smooth model convergence and achieving high-fidelity and high-precision time series data reconstruction.
Example 2
[0064] This embodiment uses a temperature control sensor network in a high-density server room of a large data center as a specific application scenario to describe in detail the temperature control sensor data completion method based on generative adversarial networks proposed in this invention. During high-load operation of a data center, temperature control sensors on the top of the racks often experience continuous data loss due to sudden high-frequency electromagnetic interference from server arrays. To recover this core monitoring data, the time-series data matrix of the temperature control sensors within the rack monitoring area, which contains the missing data, is first obtained, and an adversarial model containing a generator and a discriminator is constructed. A forward topology hard embedding layer and an active feature selection module are configured within the generator.
[0065] The acquired time-series data matrix is input into the forward topological hard embedding layer, and automatic differentiation is used to construct a neural network feedforward computation graph for the temperature variable in the computer room. Based on this computation graph, the first-order partial derivatives of the temperature variable with respect to the time step are extracted from the time-series data matrix. The unmixed second-order partial derivatives of the temperature variable along the three orthogonal coordinate axes of three-dimensional space are then calculated and summed to obtain the second-order partial derivative matrix representing the Laplace operator of heat conduction in the computer room space. For the ambient-temperature air medium in the cold aisle of the computer room, the effective thermal diffusivity coefficient, pre-calibrated by table lookup, is used as a physical constant input, and this coefficient is used to perform matrix scaling calculations on the second-order partial derivative matrix. Subsequently, the difference matrix between the first-order partial derivatives and the scaled second-order partial derivative matrix is calculated to obtain the partial differential physical residual. Further, the weight matrix and bias vector of the basic neuron transmission equation are obtained, a linear mapping is performed on the time-series data matrix, and the bias is superimposed to generate an initial feature state matrix. Simultaneously, the partial differential physical residual is scaled using physical constraint weight coefficients that decay exponentially with the number of iterations according to a preset decay rate. The initial feature state matrix is summed element-wise with the scaled physical residuals to generate a physical constraint feedforward matrix, which is then input into a nonlinear activation function, here the hyperbolic tangent function, whose output is the latent feature matrix.
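The residual and embedding steps of this paragraph can be sketched as below, under loudly stated simplifications: one spatial dimension instead of three, central finite differences standing in for automatic differentiation, and hypothetical values for the diffusivity, grid spacing, initial constraint weight, and decay rate. The function names are illustrative and do not come from the text.

```python
import math


def heat_residual(field, dt, dx, alpha):
    """Residual dT/dt - alpha * d2T/dx2 at interior grid points.

    field[n][i] is the temperature at time step n and grid position i.
    Central differences stand in for automatic differentiation.
    """
    residual = []
    for n in range(1, len(field) - 1):
        row = []
        for i in range(1, len(field[n]) - 1):
            d_t = (field[n + 1][i] - field[n - 1][i]) / (2 * dt)
            d_xx = (field[n][i + 1] - 2 * field[n][i] + field[n][i - 1]) / dx ** 2
            row.append(d_t - alpha * d_xx)
        residual.append(row)
    return residual


def decayed_weight(initial, decay_rate, iteration):
    """Physical constraint weight that decays exponentially with iterations."""
    return initial * math.exp(-decay_rate * iteration)


def embed(pre_activation, residual, constraint_weight):
    """tanh(W x + b + weight * residual), element-wise on flat lists."""
    return [math.tanh(z + constraint_weight * r)
            for z, r in zip(pre_activation, residual)]


alpha, dt, dx = 0.1, 0.01, 0.05  # hypothetical diffusivity and grid spacing


def exact(x, t):
    # Analytic solution of dT/dt = alpha * d2T/dx2, used as physical data.
    return math.exp(-alpha * t) * math.sin(x)


field = [[exact(i * dx, n * dt) for i in range(20)] for n in range(5)]
physical = heat_residual(field, dt, dx, alpha)

corrupted = [row[:] for row in field]
corrupted[2][10] += 1.0  # inject a non-physical spike
violated = heat_residual(corrupted, dt, dx, alpha)

# Hypothetical pre-activation values (W x + b) and residuals for two neurons.
latent = embed([0.2, -0.1], [0.0, 0.5], decayed_weight(1.0, 0.1, 0))
```

For data that obeys the heat equation the residual stays near zero (limited only by discretization error), while the injected spike produces a large residual at that location. That contrast is what lets the residual act as a physical-consistency signal inside the feedforward equation.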
[0066] The output latent feature matrix is input into the active feature selection module, which generates corresponding feature query vectors, feature key vectors, and feature value vectors based on this matrix. After calculating the attention probability distribution between each feature query vector and all feature key vectors, the Kullback-Leibler relative entropy algorithm is used to calculate the information gain difference metric between this distribution and the standard uniform distribution, which is defined as the query vector divergence weight. The activity weight threshold is dynamically set by calculating the arithmetic mean of the divergence weights of the current network layer's feature query vectors and multiplying it by a preset sparsity retention coefficient. Using the sparse attention mechanism, vectors with weights below the threshold are identified as low-weight redundant vectors in the constant temperature dead zone of the computer room and are removed, while high-dynamic retained vectors with weights not less than the threshold are extracted. Subsequently, a query-key relevance weight matrix is constructed using the attention probability distribution of the retained vectors, and matrix multiplication is performed with the feature value vectors to achieve dimensionality compression based on feature weights, aggregating and outputting a global temporal feature matrix. The global temporal feature matrix is input into the generative decoding network layer configured in the generator. It is then mapped to a high-dimensional hidden layer vector through multi-layer fully connected mapping, and the feature reshaping operator folds it into a multi-channel sequence tensor. Finally, a one-dimensional transposed convolution operator performs temporal upsampling and channel dimensionality reduction to accurately restore the same time step and spatial channel dimension as the original time-series data matrix of the data center, and the result is decoded to generate the first synthetic sequence matrix.
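The divergence-weighted selection described in this embodiment can be sketched as follows. The scaled dot-product scoring is an assumption (the text does not specify how the attention scores are formed), and the query, key, and value vectors and the retention coefficient are invented toy values.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_to_uniform(probs):
    """KL(p || uniform) = sum of p_i * log(p_i * n)."""
    n = len(probs)
    return sum(p * math.log(p * n) for p in probs if p > 0.0)


def select_active_queries(queries, keys, retention_coeff):
    scale = math.sqrt(len(keys[0]))  # assumed scaled dot-product scoring
    attention = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]
    weights = [kl_to_uniform(p) for p in attention]
    # Threshold = mean divergence weight * sparsity retention coefficient.
    threshold = retention_coeff * sum(weights) / len(weights)
    kept = [i for i, w in enumerate(weights) if w >= threshold]
    return kept, weights, attention


def aggregate(attention, values, kept):
    """Multiply the retained attention rows by the value vectors."""
    return [[sum(p * v[d] for p, v in zip(attention[i], values))
             for d in range(len(values[0]))] for i in kept]


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
# Query 0 attends uniformly, query 1 is sharply peaked, query 2 is nearly uniform.
queries = [[0.0, 0.0], [4.0, 0.0], [0.1, 0.1]]

kept, weights, attention = select_active_queries(queries, keys, retention_coeff=1.0)
compressed = aggregate(attention, values, kept)
```

Only the sharply peaked query survives the threshold here, so the time dimension shrinks from three rows to one. A query whose attention is close to uniform carries little information gain, which corresponds to the "constant temperature dead zone" vectors that the method discards.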
[0067] After high-fidelity decoding, the first synthesized sequence matrix and the real data center sensor sequence are synchronously input into the discriminator to obtain the discrimination confidence. To facilitate gradient penalty optimization, a probability distribution difference metric between the generated data distribution and the real data distribution is calculated based on the discrimination confidence using the Wasserstein distance algorithm, thus obtaining the adversarial loss. Simultaneously, the mean square error at the known observation data points in the time-series data matrix is calculated as the reconstruction loss, and the mean square error of the partial differential physical residuals at the missing target imputation data points is calculated as the residual constraint loss. Using a preset baseline dominant constant as the reconstruction loss weight, an empirical fixed value as the adversarial loss weight, and a dynamically increasing adaptive variable as the residual constraint weight, the three losses are multiplied by their respective weights and summed to generate the composite loss function.
[0068] Finally, to ensure smooth model convergence, a gradient penalty mechanism is used to iterate the network parameters. Random weight coefficients are generated by independently sampling from a standard uniform distribution, and a weighted sum is calculated between the first synthesized sequence matrix and the true sequence to generate an interpolated sample matrix. This interpolated sample matrix is input into the discriminator, and the L2 norm of the output result relative to the partial derivative of the interpolated sample is calculated. The square of the difference between this L2 norm and 1 is calculated and multiplied by a preset penalty coefficient to generate a gradient penalty regularization term, which is superimposed on the composite loss function, forcing the discriminator to strictly satisfy the 1-Lipschitz continuity condition throughout the entire data sample space. Finally, backpropagation is performed based on the composite loss function with superimposed regularization term, iteratively updating the network parameters until convergence, completing the interpolation of the computer room time-series data matrix, and outputting a highly reliable target temperature time-series completion matrix.
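The text does not spell out how the final completion matrix is assembled from the generator output. One common convention, shown here purely as an assumption, keeps every observed reading and fills only the missing positions with generated values; the function name and the temperature values are hypothetical.

```python
def complete_matrix(observed, generated, observed_mask):
    """Keep observed readings; fill missing positions from the generator output."""
    return [
        [o if m else g for o, g, m in zip(obs_row, gen_row, mask_row)]
        for obs_row, gen_row, mask_row in zip(observed, generated, observed_mask)
    ]


# Hypothetical rack-top readings in degrees Celsius; None marks a lost sample.
observed = [[21.5, None, 22.1], [None, 23.0, 23.4]]
observed_mask = [[1, 0, 1], [0, 1, 1]]
generated = [[21.4, 21.8, 22.0], [22.6, 23.1, 23.5]]

completed = complete_matrix(observed, generated, observed_mask)
# completed == [[21.5, 21.8, 22.1], [22.6, 23.0, 23.4]]
```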
[0069] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.
Claims
1. A method for data completion of temperature control sensors based on generative adversarial networks, characterized in that the method comprises: obtaining a time-series data matrix of the temperature control sensor containing missing data; constructing a model containing a generator and a discriminator, wherein the generator is configured with a forward topological hard embedding layer and an active feature selection module; inputting the time-series data matrix into the forward topological hard embedding layer, and invoking automatic differentiation to calculate the first-order temporal partial derivative and the second-order spatial partial derivative matrix to obtain the partial differential physical residual; embedding the partial differential physical residual into the basic neuron transmission equation to output the latent feature matrix; inputting the latent feature matrix into the active feature selection module; calculating the divergence weights of the query vectors and using a sparse attention mechanism to remove low-weight redundant vectors; performing dimensionality compression on the retained vectors to output a global temporal feature matrix, and decoding to generate the first synthetic sequence matrix; inputting the first synthesized sequence matrix and the real sequence into the discriminator to obtain the discrimination confidence; calculating, based on the discrimination confidence, a composite loss function consisting of adversarial loss, reconstruction loss and residual constraint loss; iterating the network parameters using a gradient penalty mechanism until the composite loss function converges; and interpolating the time-series data matrix to output the target temperature time-series completion matrix.
2. The method for data completion of temperature control sensors based on generative adversarial networks according to claim 1, characterized in that the step of invoking automatic differentiation to obtain the partial differential physical residual includes: in the forward topology hard embedding layer, using automatic differentiation to construct a neural network forward computation graph about the temperature variable; based on the neural network forward computation graph, extracting the first-order partial derivative of the temperature variable with respect to the time step from the time-series data matrix; extracting the second-order partial derivative matrix of the temperature variable with respect to the three-dimensional spatial dimension, and performing matrix scaling on the second-order partial derivative matrix using the effective thermal diffusivity of the target environment; calculating the difference matrix between the first-order partial derivative and the scaled second-order partial derivative matrix, and defining the difference matrix as the partial differential physical residual.
3. The method for data completion of temperature control sensors based on generative adversarial networks according to claim 1, characterized in that, Embedding the partial differential physical residual into the basic neuron transfer equation and outputting a latent feature matrix includes: obtaining the weight matrix and bias vector of the basic neuron transfer equation in the forward topological hard embedding layer; performing a linear mapping calculation on the time-series data matrix using the weight matrix and superimposing the bias vector to generate an initial feature state matrix; obtaining preset physical constraint weight coefficients and performing matrix scaling calculation on the partial differential physical residual using the physical constraint weight coefficients; performing element-wise summation operation on the initial feature state matrix and the scaled partial differential physical residual to generate a physical constraint feedforward matrix; inputting the physical constraint feedforward matrix into a nonlinear activation function for activation calculation, and defining the output result of the nonlinear activation function as the latent feature matrix.
4. The method for data completion of temperature control sensors based on generative adversarial networks according to claim 1, characterized in that, The calculation of query vector divergence weights and the removal of low-weight redundant vectors using a sparse attention mechanism includes: in the active feature selection module, generating corresponding feature query vectors and feature key vectors based on the latent feature matrix; calculating the attention probability distribution between each feature query vector and all feature key vectors; using a metric distribution divergence algorithm to calculate the difference metric between the attention probability distribution and the standard uniform distribution; defining the difference metric as the query vector divergence weight corresponding to the feature query vector; setting an activity weight threshold, and using the sparse attention mechanism to extract feature query vectors whose query vector divergence weight is not less than the activity weight threshold as the retained vectors; and determining feature query vectors whose query vector divergence weight is lower than the activity weight threshold as low-weight redundant vectors.
5. The method for data completion of temperature control sensors based on generative adversarial networks according to claim 1, characterized in that, Generating the first synthetic sequence matrix includes: in the active feature selection module, generating feature value vectors that match the feature query vector and feature key vector based on the latent feature matrix; extracting the attention probability distribution corresponding to the retained vector, and constructing a query-key relevance weight matrix using the attention probability distribution of the retained vector; performing matrix multiplication on the query-key relevance weight matrix and the feature value vector to achieve the dimensionality compression based on the feature weights, and defining the aggregated output of the matrix multiplication operation as the global temporal feature matrix; inputting the global temporal feature matrix into the generative decoding network layer configured in the generator; in the generative decoding network layer, performing a single forward network reconstruction on the global temporal feature matrix through multi-layer fully connected mapping calculation to restore it to the same time step and spatial channel dimension as the temporal data matrix, and defining the reconstructed sequence matrix as the first synthetic sequence matrix.
6. The method for data completion of temperature control sensors based on generative adversarial networks according to claim 1, characterized in that, The calculation of the composite loss function includes: based on the discrimination confidence, calculating a probability distribution difference metric between the generated data distribution of the first synthesized sequence matrix and the real data distribution of the real sequence, and defining the probability distribution difference metric as the adversarial loss; extracting known observation data points that are not missing from the time-series data matrix, calculating the mean square error between the first synthesized sequence matrix and the real sequence at the corresponding positions of the known observation data points, and defining the mean square error as the reconstruction loss; extracting target imputation data points that are missing from the time-series data matrix, obtaining the partial differential physical residual calculated by the forward topology hard embedding layer, calculating the mean square error of the partial differential physical residual at the corresponding positions of the target imputation data points, and defining the mean square error as the residual constraint loss; obtaining preset adversarial loss weight coefficients, reconstruction loss weight coefficients, and residual constraint weight coefficients, performing multiplication operations on the adversarial loss, reconstruction loss, and residual constraint loss using the adversarial loss weight coefficients, reconstruction loss weight coefficients, and residual constraint weight coefficients respectively, and weighting and summing the output results of the multiplication operations to generate the composite loss function.
7. The method for data completion of temperature control sensors based on generative adversarial networks according to claim 1, characterized in that, The gradient penalty mechanism includes: performing a linear interpolation calculation with a random ratio between the first synthesized sequence matrix and the real sequence to generate an interpolated sample matrix; inputting the interpolated sample matrix into the discriminator and calculating the gradient norm of the discriminator's output relative to the interpolated sample matrix; calculating the squared difference between the gradient norm and 1, and multiplying the squared difference by a preset penalty coefficient to generate a gradient penalty regularization term; superimposing the gradient penalty regularization term into the composite loss function, and performing backpropagation based on the superimposed composite loss function to iteratively update the network parameters in the model.