Method for selective reconstruction and alignment of missing multivariate time series

By introducing an uncertainty assessment mechanism and multi-scale feature extraction, combined with semantic alignment and information compensation, the dynamic missing rate and multi-scale feature coupling problems in multivariate time series forecasting in power systems are solved, thereby improving the stability and robustness of the forecast.

CN122241022A · Pending · Publication Date: 2026-06-19 · BEIJING UNIV OF TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING UNIV OF TECH
Filing Date
2026-03-17
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing multivariate time series forecasting methods have failed to effectively handle the problems of dynamic missing rates and multi-scale feature coupling in power systems, resulting in unstable forecast results and poor robustness, especially with a significant performance degradation under high load or abnormal operating conditions.

Method used

An uncertainty assessment mechanism is introduced, and a collaborative learning strategy combining multi-scale feature extraction, semantic alignment, and information compensation is adopted to adaptively adjust the feature alignment strength and information compensation method, thereby improving the stability and robustness of the model under different missing conditions.

Benefits of technology

It improves the stability and robustness of power load forecasting, maintains the stability and consistency of forecast results under complex missing scenarios, and enhances the model's generalization ability and application flexibility.

Generated by Eureka AI based on patent content.

Figure CN122241022A_ABST

Abstract

To address the issue of decreased prediction stability of power operation data under dynamic missing conditions, this invention proposes a selective reconstruction and alignment method for missing multivariate time series data to mitigate semantic shifts and feature distortions caused by changes in missing patterns. Specifically, this invention first constructs an uncertainty modeling framework with multiple missing views to uniformly characterize the reliability of data semantic representations under different missing conditions. Based on this, a multi-scale time series representation structure integrating time indices and periodic prior information is constructed to enhance the model's ability to collaboratively model local dynamic changes and long-term dependencies. Simultaneously, semantic consistency constraints across missing conditions are dynamically adjusted based on uncertainty information to suppress interference from noise supervision and spurious negative samples, and a controlled feature compensation mechanism is introduced in regions of insufficient reliability. Through these designs, the prediction accuracy and generalization robustness of the model are improved under complex missing scenarios while ensuring load prediction stability.

Description

Technical Field

[0001] This invention belongs to the field of artificial intelligence and time series data processing technology, and provides a selective reconstruction and alignment method for missing multivariate time series.

Background Technology

[0002] Multivariate Time Series Forecasting (MTSF), a crucial technique in power system operation data analysis, jointly models multiple interrelated historical operation sequences to predict future operational trends and assess risks. The forecast results are widely applied in scenarios such as operation status monitoring, dispatch optimization, and safety early warning. Its prediction accuracy and stability directly impact the overall operational efficiency and safety level of the power system. With the expansion of power grid scale and the continuous broadening of monitoring dimensions, power operation data exhibits a significant increasing trend in variable correlation, structural complexity, and dynamic change characteristics, leading to a continuous increase in data modeling difficulty and placing higher demands on the robustness, stability, and generalization ability of existing forecasting methods.

[0003] In real-world power operation scenarios, the data acquisition process is often affected by factors such as sensor failures, communication link anomalies, equipment maintenance, and external environmental interference, leading to varying degrees of missing data in historical observations. Unlike the assumed random missing data pattern under ideal conditions, missing data in real-world power operation typically exhibits a dynamic, non-fixed missing rate characteristic over time. That is, the location and proportion of missing data differ significantly across different time periods, especially during high-load operation or abnormal operating conditions. This dynamic missing data phenomenon disrupts the continuity of the time series and the correlation structure between variables, causing significant imbalances in data quality across different operational phases, thus adversely affecting the stability and reliability of prediction results.

[0004] To address the issue of missing power operation data, existing multivariate time series forecasting methods typically combine data preprocessing with model training. Some studies use interpolation, imputation, or reconstruction to complete missing data and then perform unified modeling based on the completed data. Furthermore, some studies consider directly modeling under incomplete observation conditions, attempting to learn feature representations that are robust to missing data. However, these methods often implicitly assume that the missing patterns are relatively stable or use a uniform strategy to handle samples with different missing proportions, failing to fully consider the dynamic changes in the missing rate over time in power operation data. In actual power grid operation scenarios, the degree of missing data may vary significantly across different time periods; for example, continuous missing data is more likely to occur during high-load periods or under abnormal operating conditions. In such cases, if a uniform modeling strategy is still used, the model's performance is likely to degrade significantly under high missing data conditions, resulting in increased fluctuations in prediction results and making it difficult to maintain stability and reliability over long-term operation.

[0005] Furthermore, electricity load data contains both low-frequency components reflecting long-term trends and high-frequency components characterizing short-term fluctuations, resulting in complex coupling relationships between information at different time scales. Under conditions of missing data and noise interference, the correlation structure between multi-scale features is easily disrupted, limiting existing methods in the collaborative modeling of information at different time scales. When the degree of missing data is high or the missing data exhibits structured characteristics, relying solely on a single learning objective or fixed constraints often fails to maintain model robustness while ensuring prediction accuracy. For these reasons, in multivariate time series forecasting applications, the dynamic changes, variable missing rates, and multi-scale feature coupling characteristics exhibited by missing data make existing methods prone to fluctuations in prediction results and degradation in model performance under complex missing conditions. Therefore, maintaining the stability and consistency of prediction results under different missing patterns and data reliability conditions has become a critical technical problem that urgently needs to be solved in the field of electricity load forecasting.

Summary of the Invention

[0006] To address the challenges of dynamically changing missing rates, unstable semantic information, and complex multi-scale feature coupling in the actual acquisition of multivariate time series data, this invention proposes an uncertainty-controlled multivariate time series prediction method. This method introduces an uncertainty assessment mechanism to adaptively adjust the feature alignment and information compensation processes based on data reliability under different missing conditions, thereby improving the stability and robustness of the power load forecasting system in data-missing scenarios.

[0007] To address the dynamic missing data problem in practical industrial scenarios of power load forecasting, this invention proposes a selective reconstruction and alignment method for missing multivariate time series data. This method, designed for multivariate time series data with dynamic missing characteristics, constructs a unified prediction modeling framework and performs multi-scale feature extraction and joint modeling on historical observation data. Specifically, by introducing an uncertainty assessment mechanism, the reliability of data under different time periods, different missing proportions, and different missing structures is quantitatively characterized to represent the confidence level of the current time series information. Based on this, a collaborative learning strategy combining feature alignment and information compensation is introduced. The semantic feature alignment strength and the compensation method for missing information are adaptively adjusted according to the uncertainty assessment results, achieving selective modeling and fusion of multi-scale time series features. When data reliability is high, the alignment degree of multi-view time series features is improved by strengthening semantic consistency constraints across missing conditions; when data reliability is low or information missing is severe, a controlled information compensation mechanism is introduced to reduce the risk of error accumulation and performance degradation caused by the unified modeling strategy under complex missing conditions. 
Furthermore, considering the differences in statistical characteristics between long-term trend features and short-term fluctuation features in multivariate time series, the proposed method adopts a multi-scale collaborative modeling approach, which effectively alleviates the disruption of the continuity of the time series structure caused by missing data in complex missing scenarios, thereby improving the stability, robustness and generalization ability of power load forecasting results.

[0008] The method described in this invention and its implementation principle are described in detail below.

[0009] Step 1. Construct missing multivariate time series samples and an uncertainty assessment model.

[0010] Based on the application scenario of power system load forecasting, a multivariate time series sample composed of historical operating data is obtained, as shown in Figure 1. The historical operating data were collected continuously for two years at 15-minute sampling intervals and include transformer active load, oil temperature, ambient temperature, ambient humidity, voltage, current, and time period identifier variables. Together these constitute a multivariate time series sample whose dimensions are the number of variables in the time series, the length of historical observations, and the number of feature channels corresponding to each variable. In actual power grid operation, due to sensor failures, communication interruptions, or abnormal operating conditions, historical observation data may be missing to varying degrees during the acquisition and storage phases. Therefore, the multivariate time series samples used in the training phase are themselves incomplete observations. An observation mask represents the actual missing pattern in the training set: a marked entry indicates that a valid observation exists at that location, and any other location is treated as a true missing observation. The corresponding prediction target is a sequence whose length is the length of the prediction period.

[0011] Furthermore, to construct views with different missing rates, additional masking is applied only to existing observation data, without introducing any additional assumptions about the true missing locations. A set of missing rates is defined, whose elements are the different missing percentages and whose size is the number of missing rate categories; a default missing rate set is specified. For each missing rate, a corresponding mask matrix is generated using a random missing or structured missing strategy, and a missing multivariate time series sample is constructed, defined as:

[0012]

[0013] Here the product denotes element-wise multiplication, and the remaining term is the preset missing fill value.
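The view construction described in [0011] to [0013] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the formula itself is not reproduced in the text, so the form `x_view = mask * x + (1 - mask) * fill` is assumed, and the names `make_missing_view`, `observed`, `rate`, and `fill` are hypothetical. Only the random missing strategy is shown.

```python
import random

def make_missing_view(x, observed, rate, fill=0.0, seed=0):
    """Mask a fraction `rate` of the observed entries of a 2-D sample.

    x        : list of rows (time steps), each a list of variable values
    observed : same shape, 1 where a real observation exists, else 0
    Returns (x_view, mask), where mask is 1 for entries kept in this view.
    """
    rng = random.Random(seed)
    # Extra masking is drawn only from positions that were really observed.
    positions = [(t, n) for t, row in enumerate(observed)
                 for n, o in enumerate(row) if o == 1]
    n_drop = int(round(rate * len(positions)))
    dropped = set(rng.sample(positions, n_drop))
    mask = [[1 if (o == 1 and (t, n) not in dropped) else 0
             for n, o in enumerate(row)] for t, row in enumerate(observed)]
    # Assumed form: element-wise mask * x + (1 - mask) * fill.
    x_view = [[m * v + (1 - m) * fill for m, v in zip(mrow, xrow)]
              for mrow, xrow in zip(mask, x)]
    return x_view, mask

x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
observed = [[1, 1], [1, 0], [1, 1], [1, 1]]   # one truly missing entry
view, mask = make_missing_view(x, observed, rate=0.5)
```

Calling the function once per missing rate on the same window would yield the set of multi-view samples used in the later steps.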

[0014] A set of multi-view missing samples with different missing rates is constructed within the same time window to simulate the actual state of power operation data under different levels of data loss. To represent the variation characteristics of the multivariate time series at different time scales, a time index sequence is constructed and periodic feature information is extracted from the time index. The periodic features, including hourly and weekly periodic information, represent the operating status of the power load at different periodic positions and serve as auxiliary input for multi-scale feature modeling and uncertainty assessment. On this basis, to quantify the semantic reliability of multivariate time series samples under different missing conditions, an uncertainty assessment model is constructed that outputs the confidence index of the current time series sample under a given missing mode. This index reflects the stability of the current load operating status under missing conditions and serves as a unified basis for subsequent semantic feature alignment intensity adjustment, weight allocation, and information compensation strategy selection. Specifically, to quantify the uncertainty of the model's semantic representation under missing conditions during the training phase, a representation uncertainty assessment mechanism based on random inference of the teacher model is introduced. This mechanism does not treat the teacher representation as a deterministic target. Instead, given a missing multivariate time series sample, it obtains a set of teacher hidden representations through multiple random forward propagations, thereby characterizing the distribution of the model representation under the current missing mode. Randomness is introduced by enabling random deactivation (dropout), parameter perturbation, or random sampling mechanisms during the inference phase. Multiple random forward propagations are performed on the same sample to obtain a set of teacher hidden representations:

[0015]

[0016] Here each element of the set is the teacher hidden semantic representation obtained in one random inference pass, produced by the teacher encoder with randomness introduced in that forward propagation. The teacher encoder is a 3-layer MLP feedforward neural network. Its input features correspond to traffic flow, hour embedding, and weekday embedding, and the encoder maps the input feature dimension to the output embedding dimension. Each layer has a hidden dimension of 128: the first layer maps the input dimension to the hidden dimension, and the second and third layers keep the same input and output dimensions. Each linear mapping layer is followed by: (1) a nonlinear activation function unit, used to enhance the network's expressive power; in this invention the activation function is the rectified linear unit (ReLU); and (2) a random deactivation (dropout) unit, used to randomly mask hidden neurons during the training and random inference phases to simulate model parameter perturbations. The dropout unit zeroes neuron outputs with a preset probability. During the inference phase, the dropout units are kept on, so each forward propagation generates a different random neuron masking pattern, which randomly perturbs the model output. Based on the results of the multiple forward propagations, the mean and variance of the teacher representation are calculated to characterize the central tendency and dispersion of the teacher's semantic representations:

[0017]

[0018]

[0019] The mean of the teacher representations reflects the dominant semantics learned by the model under the current missing conditions, while its variance quantifies the stability of the model's understanding of that semantics, i.e., semantic uncertainty. Further, based on the variance of the teacher representations, an uncertainty index is constructed for each missing multivariate time series sample, and the sample confidence level is calculated from the uncertainty index. They are defined as:

[0020]

[0021]

[0022] Here, a larger uncertainty index yields a smaller confidence level, and a smaller confidence level indicates relatively large uncertainty. Through the above uncertainty assessment process, each missing multivariate time series sample obtains a corresponding uncertainty index and confidence level, which quantify the stability of the model's semantic understanding of the current sample under the given missing conditions. They transform semantic uncertainty into a control quantity that can directly participate in subsequent feature alignment strength adjustment and weight allocation.
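The random-inference procedure of [0014] to [0022] resembles what is commonly called Monte Carlo dropout, and can be sketched as below. The sketch makes several assumptions that are not in the text: the layer sizes are shrunk for readability (the description uses a hidden dimension of 128), biases are omitted, the uncertainty index is taken as the mean per-dimension variance, and the confidence is taken as `exp(-u)`. The patent's own formulas for these two quantities are not reproduced in the text, so any decreasing map would fit the description equally well. All function names are hypothetical.

```python
import math
import random

def make_mlp(d_in, d_hidden, d_out, rng):
    """Three bias-free linear layers with small random weights (illustrative only)."""
    dims = [d_in, d_hidden, d_hidden, d_out]
    return [[[rng.gauss(0.0, 0.3) for _ in range(dims[i])]
             for _ in range(dims[i + 1])] for i in range(3)]

def forward(mlp, x, p_drop, rng):
    """One stochastic pass: linear -> ReLU -> dropout after every layer."""
    h = x
    for layer in mlp:
        h = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in layer]
        # Dropout stays active at inference; survivors are rescaled by 1 / (1 - p).
        h = [0.0 if rng.random() < p_drop else v / (1.0 - p_drop) for v in h]
    return h

def teacher_statistics(mlp, x, k=20, p_drop=0.1, seed=0):
    """Mean, variance, uncertainty index and confidence from k random passes."""
    rng = random.Random(seed)
    reps = [forward(mlp, x, p_drop, rng) for _ in range(k)]
    d = len(reps[0])
    mean = [sum(r[i] for r in reps) / k for i in range(d)]
    var = [sum((r[i] - mean[i]) ** 2 for r in reps) / k for i in range(d)]
    u = sum(var) / d      # assumed uncertainty index: mean variance over dimensions
    c = math.exp(-u)      # assumed confidence: decreases as u grows
    return mean, var, u, c

mlp = make_mlp(3, 8, 4, random.Random(1))
mean, var, u, c = teacher_statistics(mlp, [0.5, 0.2, 0.1])
```

With `p_drop=0.0` every pass is identical, so the variance collapses to (numerically) zero and the confidence approaches 1, which matches the intended reading of the index.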

[0023] Step 2. Multi-scale feature extraction and semantic representation construction.

[0024] In power load forecasting scenarios, load exhibits a combination of short-term fluctuations and long-term trends across different time scales. For each missing sample, the historical time series is divided according to a preset set of time scales {96, 192, 336, 720}, where each element is the time window length of the corresponding time scale and the size of the set is the number of time scales. A multi-scale partitioning operation maps each missing sample to a set of subsequences, one per time scale, each being the power operation subsequence extracted or aggregated at that scale. For missing subsequences at different time scales, a multi-scale feature extraction model is constructed to learn the dynamic feature representation of the time series at the corresponding time scale. The multi-scale feature extraction model includes several feature extraction sub-modules, each corresponding to one time scale and performing a feature mapping operation on the input missing subsequence:

[0025]

[0026] Here the feature extraction function is specific to the time scale and takes a scale-level missing mask as input. The feature mapping process is combined with the semantic confidence index obtained in step 1, which enables adaptive adjustment of the scale feature extraction intensity under different missing conditions, so that the scale feature representation reflects the distribution of missing values at the current time scale and their impact on semantic stability.
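The multi-scale partitioning in [0024] can be illustrated with a small sketch. It assumes a single variable and assumes that each scale takes the most recent window of the given length (the text also allows aggregation, which is not shown). The helper `scale_observed_ratio` is a hypothetical summary of a scale-level missing mask and is not a quantity named in the text.

```python
def multi_scale_partition(series, mask, scales=(96, 192, 336, 720)):
    """Return {scale: (subsequence, sub_mask)} using the most recent window per scale."""
    out = {}
    for w in scales:
        if w > len(series):
            raise ValueError(f"history length {len(series)} is shorter than scale {w}")
        out[w] = (series[-w:], mask[-w:])
    return out

def scale_observed_ratio(sub_mask):
    """Fraction of valid observations inside one scale window."""
    return sum(sub_mask) / len(sub_mask)

series = [float(i) for i in range(720)]
mask = [1] * 672 + [0] * 48          # the last 12 hours (48 steps) are missing
parts = multi_scale_partition(series, mask)
ratios = {w: scale_observed_ratio(m) for w, (_, m) in parts.items()}
```

In this example the same gap removes half of the 96-step window but under 7% of the 720-step window, which is the kind of scale-dependent reliability the later gating step is meant to exploit.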

[0027] After obtaining the scale feature representations at the various time scales, a cross-scale fusion mechanism that is aware of missing values and confidence levels is further constructed, to avoid applying fusion constraints of uniform strength to all scale features under different missing conditions and time locations. First, for each time scale, a scale-level quantity is calculated from its corresponding scale-level missing mask:

[0028]

[0029] Here the additional term is the scale embedding vector. Based on it, a scale-level reliability gating function is constructed:

[0030]

[0031] Here the activation is the Sigmoid function, and the remaining terms are learnable parameters. This gating function represents the relative reliability of the semantic representation of power load based on features at different time scales, given the current missing conditions and time position. A time index sequence is constructed, and periodic feature information is extracted from the time index. The periodic temporal feature embeddings obtained by mapping the time index represent the structural state information of the time series at different periodic positions and serve as auxiliary temporal condition inputs for subsequent multi-scale semantic modeling and uncertainty assessment. To ensure the stability of weight allocation across different time scales, the scale-level gating weights are normalized:

[0032]

[0033] Here the result is the scale fusion weight obtained by normalizing all scale-gated scores. The normalization uses a time-scale index and a separate summation index that ranges over the number of time scales, and each term of the sum is the gated score at the corresponding time scale. After obtaining the normalized scale weights, the feature representations at different time scales are weighted and fused to obtain a unified conditional semantic feature representation:

[0034]

[0035] Here, scale-specific linear mappings or multilayer perceptrons are used to align the feature representations of different time scales into a common space. Through the aforementioned multi-scale feature extraction, gated modeling with missing and confidence awareness, and cross-scale fusion processes, the model can form stable and consistent conditional semantic representations under different missing conditions and time scale differences. This provides a basic representation for subsequent semantic alignment and information compensation.
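The gating, normalization, and fusion described in [0027] to [0035] can be sketched as follows. The exact gate inputs and formulas are not reproduced in the text, so this sketch assumes a sigmoid over a linear combination of a per-scale observed ratio and the sample confidence, and assumes sum normalization of the gate scores. The fixed numbers `w_ratio`, `w_conf`, and `bias` stand in for learnable parameters, and the per-scale features are assumed to be already mapped to a common dimension.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gate_scores(observed_ratios, confidence, w_ratio=2.0, w_conf=1.0, bias=-1.0):
    """Assumed gate: sigmoid of a linear function of observed ratio and confidence."""
    return [sigmoid(w_ratio * r + w_conf * confidence + bias) for r in observed_ratios]

def normalize(scores):
    """Divide each gate score by the sum over all scales."""
    total = sum(scores)
    return [s / total for s in scores]

def fuse(features, weights):
    """Weighted sum of per-scale feature vectors that share one dimension."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

ratios = [0.5, 0.75, 0.9, 0.95]            # one value per time scale
weights = normalize(gate_scores(ratios, confidence=0.8))
features = [[1.0, 2.0]] * 4                # identical features, for illustration
fused = fuse(features, weights)
```

Because the weights sum to one, fusing identical per-scale features returns that same feature vector; with differing features, scales with more valid observations contribute more.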

[0036] Step 3. A semantic feature alignment method based on uncertainty regulation with multiple missing rates.

[0037] The semantic representation stability of power system operation data varies significantly under different missing ratios and missing patterns. For example, the model's understanding of load change trends may deviate during peak load periods or sensor anomaly phases. Applying uniform alignment constraints to semantic representations under different missing conditions without semantic reliability assessment could therefore over-reinforce erroneous semantic information and reduce load forecasting accuracy. For this reason, this invention uses the uncertainty assessment results constructed in step 1 to uniformly regulate the semantic alignment and consistency learning process under multiple missing rate conditions. Furthermore, because teacher semantic references also exhibit uncertainty under different missing conditions, this step explicitly introduces uncertainty information about the teacher semantic distribution into the semantic alignment process, to avoid forcibly aligning to unreliable teacher semantics as a deterministic target. The inputs are the semantic representation obtained in step 2 under a given missing rate condition, the semantic distribution center and variance obtained by the teacher under random inference, and the uncertainty index obtained in step 1. For any missing rate condition, an alignment constraint between the semantic representation and the teacher semantic distribution center is constructed, and the uncertainty of the teacher semantic distribution is explicitly introduced as a weight:

[0038]

[0039]

[0040] Here the semantic feature alignment loss for a given missing condition constrains the semantic representations learned by the student model under the current missing pattern to be consistent with the center of the teacher model's semantic distribution. A numerical stability term prevents instability during numerical calculation, and an uncertainty regularization term avoids over-enhancing alignment constraints when the teacher's semantics are unstable.
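One common way to realize an alignment loss of the kind described in [0037] to [0040] is a variance-weighted squared error with a log-variance regularizer. The sketch below uses that form purely as an assumption, since the patent's own formula is not reproduced in the text; `eps` plays the role of the numerical stability term and `lam` is a hypothetical regularization weight.

```python
import math

def alignment_loss(z, mu, var, eps=1e-6, lam=0.5):
    """Variance-weighted squared error plus a log-variance regularizer (assumed form).

    z   : student semantic representation under one missing condition
    mu  : teacher semantic distribution center
    var : teacher per-dimension variance from random inference
    """
    d = len(z)
    fit = sum((zi - mi) ** 2 / (vi + eps) for zi, mi, vi in zip(z, mu, var)) / d
    reg = lam * sum(math.log(vi + eps) for vi in var) / d
    return fit + reg

z, mu = [1.0, 0.0], [0.0, 0.0]
loss_confident_teacher = alignment_loss(z, mu, [0.01, 0.01])
loss_uncertain_teacher = alignment_loss(z, mu, [1.0, 1.0])
```

For the same student error, the loss is much larger when the teacher variance is small, so the student is pulled hardest toward teacher targets that are stable across random passes.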

[0041] However, in scenarios involving missing multivariate time series data on power load, the same transformer operating sample may exhibit significantly different semantic representations under varying missing rates and patterns. For example, during periods of high load operation or drastic temperature fluctuations, missing data significantly affects the model's characterization of load change trends, reducing the stability of the semantic representation. To avoid imposing overly strong consistency constraints when semantic reliability is insufficient, this invention uses the sample-level semantic uncertainty index obtained in step 1 to introduce a consistency strength regulation mechanism for semantic representations under different missing conditions. Specifically, for a pair of semantic representations identified by two indexes, the consistency strength regulation factor is defined as:

[0042]

[0043] Here the two uncertainty terms are the sample-level semantic uncertainties of the two semantic representations under their respective missing conditions. When the sample semantic uncertainty is high, the factor decreases, lowering the consistency constraint strength; when semantic uncertainty is low, the factor increases, reinforcing consistency learning. This mechanism controls at the global level whether consistency constraints should be enforced, avoiding over-alignment under semantically unreliable conditions.
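The behavior described in [0043] only requires a factor that decreases as either uncertainty grows. A minimal sketch, assuming an exponential of the mean of the two uncertainties (the actual formula is not reproduced in the text, and `beta` is a hypothetical sensitivity parameter):

```python
import math

def consistency_strength(u_i, u_j, beta=1.0):
    """Assumed form: exp(-beta * mean uncertainty); equals 1 when both are 0."""
    return math.exp(-beta * 0.5 * (u_i + u_j))

strong = consistency_strength(0.0, 0.0)   # both views reliable
weak = consistency_strength(2.0, 3.0)     # both views unreliable
```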

[0044] Building upon teacher semantic anchoring, to avoid semantic shift caused by discrete constraints imposed on negative samples under varying missing conditions, this step uses the teacher semantic distribution as a continuous reference space to continuously adjust the gradient contribution of negative samples. For the semantic representations obtained under any two missing conditions within the same time window, the semantic similarity is defined as:

[0045]

[0046] Based on this, the continuous negative sample weight is defined as:

[0047]

[0048] Here the similarity threshold controls the smooth transition of negative sample weights from valid to invalid, and the remaining term is the smoothing coefficient. When the teacher model determines that a sample pair is semantically similar, the weight decreases, which reduces the gradient contribution of the pair as a negative sample. Based on this, a weighted semantic consistency constraint is constructed as follows:

[0049]

[0050]

[0051] Here the terms are, respectively, the consistency strength regulation factor between the two samples, the semantic representation obtained under a given missing rate, and the distance function that measures semantic feature similarity; the superscript denotes the vector transpose operation, and the norm is the vector norm. Through the pairwise weights, this invention further refines the contributions of different sample pairs given an overall strength, thereby enhancing consistency when semantic reliability is high and reducing constraints when semantic instability or severe missing data occurs. Unlike standard contrastive learning, which explicitly widens the distance between negative samples, this invention does not directly maximize the semantic margin between negative samples. Instead, it uses the weights to suppress erroneous consistency constraints on spurious negative sample pairs, thereby avoiding over-alignment across samples. This design is more suitable for multivariate time series scenarios with variable and periodic missing rates.
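The continuous negative sample weight described in [0044] to [0048] can be sketched as a smooth step around the similarity threshold. Cosine similarity and the logistic form are assumptions here, since the formulas are not reproduced in the text; `tau` and `gamma` stand for the similarity threshold and smoothing coefficient, with illustrative values.

```python
import math

def cosine(a, b):
    """Cosine similarity with a small constant guarding against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def negative_weight(sim, tau=0.7, gamma=0.1):
    """Smooth step: near 1 well below tau, 0.5 at tau, near 0 well above tau."""
    return 1.0 / (1.0 + math.exp((sim - tau) / gamma))

w_similar = negative_weight(cosine([1.0, 0.1], [1.0, 0.0]))     # likely spurious negative
w_different = negative_weight(cosine([1.0, 0.0], [0.0, 1.0]))   # plausible true negative
```

A smaller `gamma` makes the transition sharper and approaches a hard threshold; a larger value spreads the down-weighting over a wider similarity range.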

[0052] Because power load time series typically exhibit significant periodicity and repetitive operating patterns, samples from different time windows may be highly similar semantically. Treating all such semantically similar but temporally different samples as negative samples can easily introduce spurious negative samples and lead to semantic drift. Therefore, this step introduces a spurious negative sample filtering mechanism based on the time periodic structure. Each sample has a time index and a periodic feature. For a sample and its candidate samples, a periodicity-aware candidate negative sample set is defined, containing the samples that are not located at the same periodic position. To avoid redundant constraints at adjacent time points caused by the continuity of short-term adjacent states, a minimum time interval threshold Δ can be added. Δ is the product of a positive integer and the time series sampling period. This filtering rule does not rely on the teacher model; it only uses temporal structure information to exclude sample pairs from the same periodic position from entering the negative sample set, thereby reducing the risk of false negative samples in periodic scenarios.
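The periodic filtering rule in [0052] can be illustrated as below. The sketch assumes a daily cycle of 96 steps (24 hours at the 15-minute sampling interval stated in [0010]) and expresses the minimum gap as a number of sampling steps; the multiplier value `min_gap_steps=4` is hypothetical, since the value used in the text is not reproduced.

```python
def candidate_negatives(anchor_t, times, period=96, min_gap_steps=4):
    """Indexes of samples at a different periodic position and far enough in time.

    anchor_t, times : integer time indexes in sampling steps
    period          : steps per cycle (96 = one day at 15-minute sampling)
    min_gap_steps   : hypothetical multiplier, so the gap is this many sampling periods
    """
    out = []
    for j, t in enumerate(times):
        same_phase = (t % period) == (anchor_t % period)
        too_close = abs(t - anchor_t) < min_gap_steps
        if not same_phase and not too_close:
            out.append(j)
    return out

# Time 10 is the anchor itself, 11 is adjacent, 106 is the same time on the next day.
kept = candidate_negatives(10, [10, 11, 20, 106, 300])
```

Only the samples at times 20 and 300 remain as candidates; the rule uses time indexes alone and needs no model output.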

[0053] For candidate negative sample pairs that pass periodic filtering, a further debiasing judgment is made based on the teacher semantic distribution information. Given the teacher semantic distribution centers corresponding to the two samples, their teacher semantic distribution similarity is defined as follows:

[0054]

[0055] Based on this, a debiasing gating coefficient is introduced to determine whether a sample pair should be considered a valid negative sample pair. It is defined as follows:

[0056]

[0057] Here the indicator function takes the value 1 when the condition within the parentheses is true, and 0 otherwise, and a semantic debiasing threshold determines whether two samples should be considered semantically similar in the teacher's semantic space. When the teacher similarity exceeds the threshold, the two samples are considered semantically similar in the teacher's semantic space; the gating coefficient is then 0, and the sample pair does not participate in subsequent ranking or debiasing constraints, which avoids the incorrect constraint of treating semantically similar samples as negative samples. Otherwise, the gating coefficient is 1, and the sample pair is considered a valid negative sample pair and is used to construct the subsequent debiased ranking constraint. Based on this, the debiased ranking constraint loss is constructed. It explicitly constrains the semantic margin of valid negative sample pairs and is defined as follows:

[0058]

[0059] Here the margin hyperparameter controls the minimum separation magnitude of the ranking constraint.
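The debiasing gate and margin-based ranking constraint of [0053] to [0059] can be sketched as a gated hinge loss. The hinge form, the Euclidean distance, and the default values of `tau_d` and `margin` are assumptions, because the formulas themselves are not reproduced in the text.

```python
import math

def debias_gate(teacher_sim, tau_d=0.8):
    """1 for a valid negative pair, 0 when the teacher considers the pair similar."""
    return 1 if teacher_sim < tau_d else 0

def ranking_loss(z_i, z_j, gate, margin=1.0):
    """Gated hinge: penalize valid negative pairs closer than `margin`."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_i, z_j)))
    return gate * max(0.0, margin - dist)

near_pair = ranking_loss([0.0, 0.0], [0.3, 0.4], debias_gate(0.2))   # distance 0.5
far_pair = ranking_loss([0.0, 0.0], [3.0, 4.0], debias_gate(0.2))    # distance 5.0
filtered = ranking_loss([0.0, 0.0], [0.3, 0.4], debias_gate(0.9))    # gate is 0
```

Only the near pair with an open gate produces a penalty; pairs already separated by more than the margin, and pairs the teacher regards as similar, contribute nothing.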

[0060] To ensure that consistency constraints apply only to valid sample pairs that have passed periodic filtering, the weighted semantic consistency constraint is rewritten as follows:

[0061]

[0062] Step 4. Information compensation and missing feature reconstruction method based on uncertainty regulation.

[0063] In actual power grid operation, when sensors are offline for extended periods or data loss is severe during high-load operation, relying solely on semantic alignment mechanisms may not be sufficient to fully recover missing load and related operational variable information, thus affecting short-term load forecast accuracy and system stability. To address this issue, this invention further introduces an uncertainty-aware information compensation and missing feature reconstruction mechanism to enhance the model's ability to model missing information when semantic alignment reliability is low. To avoid unnecessary disturbance to existing observation data, the information compensation and reconstruction process applies only to the set of missing locations and does not modify the actual observations. Specifically, for any missing view sample in the set of missing multivariate time series samples constructed in step 1, its missing location set is defined as:

[0064]

[0065] Here the three indexes are the variable index, time index, and feature channel index, respectively. A missing indicator variable is defined for each missing rate view, with a value of 0 indicating a missing value at the corresponding position. Based on the multi-scale semantic feature fusion result constructed in step 2, each missing sample has a fused semantic representation at each time index, whose dimension is the unified semantic representation dimension obtained after multi-scale semantic feature fusion. Simultaneously, the time periodic features obtained in step 2 are introduced, with their own embedding dimension, to represent the periodic structure and phase information in the time series. Together these are used to construct a conditional semantic vector for missing feature reconstruction:

[0066]

[0067] where the conditional semantic context vector at a given time index under a given missing rate view is formed by vector concatenation of the multi-scale fused semantic representation and the time-periodic feature, and provides joint semantic and temporal conditioning for the subsequent reconstruction of missing features. The conditional semantic vector at a time index is shared by all variables and feature channels at that time index. For any missing location, the corresponding reconstruction value is computed through an explicit linear mapping:

[0068]

[0069] where the left-hand side is the reconstruction value, predicted from the conditional semantic vector under the given missing rate view, for the given variable at the given time index and feature channel. The two learnable parameters of the linear mapping map the conditional semantic vector to the numerical space of the corresponding variable and feature channel. This reconstruction does not rely on any complete-observation assumption; it estimates values solely from the semantic representation under the current missing conditions. Based on these reconstruction results, the missing feature reconstruction loss function is defined as:

[0070]

[0071] where the distance metric operator compares the reconstruction value with the actual observed value at the corresponding location, the latter taken from the original multivariate time series sample at the given variable, time index, and feature channel. An uncertainty-based information compensation regulation factor is introduced to adaptively adjust the strength of the missing feature reconstruction term in the overall optimization objective. Specifically, based on the semantic uncertainty index obtained in step 1 and the semantic alignment reliability function defined in step 3, the information compensation regulation factor is constructed as follows:

[0072]

[0073] The overall information compensation loss is defined as:

[0074]

[0075] This regulatory mechanism, by complementing the semantic alignment reliability function, achieves adaptive adjustment of the trade-off between semantic alignment and information compensation under different missing conditions.
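The reconstruction path of step 4 can be sketched in NumPy as follows. This is only an illustrative sketch, not the filing's exact formulas: the array shapes, the squared-error distance, the names `h_fused`, `p_period`, `W`, `b`, and the scalar weight `gamma` (standing in for the information compensation regulation factor) are all assumptions made for the example.

```python
import numpy as np

def reconstruction_loss(x_true, observed_mask, h_fused, p_period, W, b, gamma):
    """Reconstruct values from [h_fused ; p_period] and score missing positions only.

    x_true:        (N, T, C) reference values
    observed_mask: (N, T, C) 1 = observed, 0 = missing
    h_fused:       (T, D)    fused semantic representation per time index
    p_period:      (T, P)    periodic feature embedding per time index
    W, b:          (D + P, N * C) and (N * C,) parameters of the linear map
    gamma:         scalar compensation weight (assumed to be given)
    """
    n_vars, n_steps, n_ch = x_true.shape
    # Conditional semantic vector: concatenation shared by all variables/channels.
    cond = np.concatenate([h_fused, p_period], axis=-1)        # (T, D + P)
    # Explicit linear mapping to one value per (variable, channel) at each time index.
    x_hat = (cond @ W + b).reshape(n_steps, n_vars, n_ch)      # (T, N, C)
    x_hat = np.transpose(x_hat, (1, 0, 2))                     # (N, T, C)
    missing = 1.0 - observed_mask
    # Assumed distance: squared error; observed positions contribute nothing.
    sq_err = (x_hat - x_true) ** 2 * missing
    n_missing = max(missing.sum(), 1.0)
    return gamma * sq_err.sum() / n_missing, x_hat
```

Because the error is multiplied by the missing indicator, changing values at observed positions leaves the loss unchanged, which mirrors the requirement that compensation acts only on the missing location set.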

[0076] Step 5. Multi-objective joint optimization method based on uncertainty control.

[0077] In steps 3 and 4, this invention constructs, respectively, a semantic feature alignment mechanism regulated by semantic uncertainty across multiple missing rates and a selective information compensation and missing feature reconstruction mechanism, to improve semantic consistency and information integrity of power operation data under different missing rates. In actual power grid operation, transformer load forecasting must not only be accurate but also remain stable when data are missing or sensors behave abnormally. Therefore, to jointly optimize load forecast accuracy, semantic consistency, and missing information compensation, this invention constructs a multi-objective joint optimization framework. Specifically, for the missing multivariate time series sample constructed under any missing rate condition, a prediction loss function is defined on the prediction results output by the student model to constrain the prediction accuracy of the model under the current missing conditions. It is defined as:

[0078]

[0079] where the first quantity is the predicted value output by the student model, under the given missing condition, for a given power variable at a given prediction time step, and the second is the corresponding ground-truth target value. The loss is taken over the number of variables and the prediction horizon length.

[0080] On this basis, the semantic feature alignment loss and the semantic consistency constraint loss constructed in step 3 are introduced to ensure that the semantic representations learned for the same sample under different missing rate views are consistent. In addition, a bias-reduction ranking constraint loss is introduced: minimum semantic margin constraints are applied only to valid negative sample pairs confirmed by periodic filtering and teacher semantic discrimination. This prevents semantically similar power operation states from being misclassified as negative samples and entering the constraint, sharpens the semantic distinction between different operating conditions, and improves the model's ability to distinguish abnormal from normal load fluctuations. However, when semantic alignment reliability is insufficient or the degree of missing information is high, semantic consistency constraints alone cannot fully recover key information in the input space. The information compensation and missing feature reconstruction loss constructed in step 4 is therefore introduced to strengthen the model's ability to model and recover missing information under heavy missingness. To balance the multiple objectives adaptively, a semantic alignment reliability adjustment coefficient is introduced to represent the overall reliability of the semantic alignment and consistency constraints under the current missing condition. It is computed from the semantic uncertainty index obtained in step 1 and the semantic alignment reliability function defined in step 3, and serves as a joint optimization weight. This design lets the model adapt the strength of semantic alignment and consistency learning in the overall optimization objective to the missing condition. When semantic uncertainty is low and semantic alignment is highly reliable, the model emphasizes the semantic alignment and consistency constraints; conversely, when semantic uncertainty is high or the degree of missing information is large, the model automatically weakens the alignment-related constraints and correspondingly strengthens the influence of the information compensation and missing feature reconstruction terms. Accordingly, a corresponding weight adjusts the contribution of the information compensation and missing feature reconstruction terms to the overall optimization objective. The joint optimization objective function under a given missing rate condition is then constructed as follows:

[0081]

[0082] Furthermore, by uniformly modeling the joint optimization objective over the full set of missing rate conditions, the overall training objective function is obtained:

[0083]

[0084] This invention uses the Adam optimization algorithm with backpropagation to update the model parameters so as to minimize the overall loss function, a weighted joint objective that integrates the prediction loss, semantic alignment loss, semantic consistency loss, and missing feature reconstruction loss under the different missing rate conditions. The optimization settings are fixed: learning rate 0.0002, weight decay 0.0001, batch size 16, maximum gradient norm 5, and at most 100 training epochs. To ensure convergence stability, the model is considered converged, and training terminates, once the relative rate of change of the overall loss function stays below the preset threshold for 10 consecutive epochs; otherwise, training ends automatically after the 100th epoch. Through this multi-objective joint optimization mechanism, the model adaptively adjusts the weight allocation between the semantic alignment and information compensation modules according to the semantic uncertainty assessment under different missing ratios of power data. While preserving the accuracy of power load prediction, it improves prediction stability and robustness under sensor anomalies, communication interruptions, and missing peak load data.

[0085] Beneficial effects

[0086] Compared with the prior art, the present invention has the following advantages:

[0087] 1. This invention uses an uncertainty assessment mechanism to perform reliability modeling on samples with multiple missing rates, thereby achieving adaptive control of semantic alignment and information compensation and improving the robustness of the model in complex missing scenarios.

[0088] 2. This invention introduces joint semantic consistency constraints under multiple missing rate conditions, enabling a single model to adapt to different missing patterns, avoiding repeated training, and improving the model's generalization ability and application flexibility.

[0089] 3. This invention employs a selective information compensation mechanism, which reconstructs only the missing locations in a controlled manner, reducing interference with the observed data and improving prediction stability and the reliability of the training process.

Description of the drawings

[0090] Figure 1 shows the missing multivariate time series prediction model;

[0091] Figure 2 shows the overall process of the selective reconstruction and alignment method.

Detailed description of the embodiments

[0092] To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0093] This invention proposes an adaptive modeling method for power load forecasting to improve forecast stability under conditions of missing operational data. The method performs missing data awareness analysis on the input multivariate power operation data and dynamically adjusts the feature modeling intensity based on data reliability. By performing hierarchical modeling and collaborative fusion of historical information, the model selectively performs feature alignment or information compensation operations under different missing data states, thereby achieving multi-stage feature collaborative learning and improving overall modeling stability and robustness while ensuring forecast accuracy.

[0094] The specific steps are as follows:

[0095] Step 1: Construct missing multivariate time series samples and an uncertainty assessment model.

[0096] In this embodiment, historical load data and related environmental data collected by the transformer operation monitoring system of a regional power grid are acquired to construct multivariate time series samples for the power load forecasting task. Each sample is a three-dimensional array: its first dimension is the number of variables in the time series (transformer load, oil temperature, ambient temperature, ambient humidity, current, and voltage), its second is the length of the historical observation window, and its third is the number of feature channels per variable. Because sensor failures, communication interruptions, or data anomalies can occur during power grid operation, historical observations may be missing to varying degrees during acquisition and storage, so the samples used in the training phase are themselves incomplete observations. The missing pattern of the true observations is recorded in a binary indicator of the same shape: a value of 1 indicates that a valid observation exists at that location; otherwise, the location is considered missing. The prediction target spans the prediction horizon. To construct sample views under different missing rate conditions, a set of missing rates is defined, whose size is the number of missing rate categories. For any given missing rate, a corresponding mask matrix is generated by a random or structured missing strategy. The mask is applied on top of the true observation mask, so the originally missing locations are unchanged and further positions are masked; the missing multivariate time series sample is then constructed by element-wise multiplication, with a preset padding value filling the missing positions. A time index is introduced to represent how the multivariate time series varies across time points and period stages, and periodic feature information is extracted from it. On this basis, to quantify the stability of the semantic representation of power load under different missing conditions, an uncertainty assessment model based on stochastic inference is introduced. The semantic encoder is a two-layer feedforward neural network that maps the input embedding to a hidden layer of dimension 128, with the ReLU activation function, ReLU(x) = max(0, x). Each linear layer is followed by a random deactivation (dropout) unit with a fixed dropout probability, which remains enabled during both the training and inference phases. Given a missing sample, several stochastic forward passes are performed on the same sample to obtain a set of teacher semantic representations. The mean and variance of the teacher semantic representation are computed from these passes, a semantic uncertainty index is constructed from the teacher semantic variance, and the sample confidence level is computed from the semantic uncertainty index.
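A minimal NumPy sketch of step 1 follows, under stated assumptions: the random (not structured) missing strategy, the helper names `make_missing_view`, `init_encoder`, and `mc_dropout_encode`, the weight initialization, the dropout probability, and the particular maps from variance to uncertainty index and confidence are all hypothetical choices, since the corresponding formulas are not legible in the text.

```python
import numpy as np

def make_missing_view(x, observed_mask, rate, rng, fill_value=0.0):
    """Mask roughly a `rate` fraction of entries on top of the true missing pattern."""
    drop = rng.random(x.shape) < rate
    view_mask = observed_mask * (~drop)          # truly missing entries stay missing
    x_view = x * view_mask + fill_value * (1.0 - view_mask)
    return x_view, view_mask

def init_encoder(d_in, d_hidden, rng):
    """Hypothetical initialization of a two-layer feedforward encoder."""
    return (rng.normal(scale=0.05, size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(scale=0.05, size=(d_hidden, d_hidden)), np.zeros(d_hidden))

def mc_dropout_encode(x_flat, params, n_passes, p_drop, rng):
    """Stochastic forward passes with dropout kept on; returns mean, variance,
    an uncertainty index, and a confidence level."""
    W1, b1, W2, b2 = params
    outs = []
    for _ in range(n_passes):
        h = np.maximum(x_flat @ W1 + b1, 0.0)                        # linear + ReLU
        h = h * (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)     # dropout
        z = np.maximum(h @ W2 + b2, 0.0)
        z = z * (rng.random(z.shape) >= p_drop) / (1.0 - p_drop)
        outs.append(z)
    outs = np.stack(outs)
    mean, var = outs.mean(axis=0), outs.var(axis=0)
    uncertainty = float(var.mean())           # assumed: mean variance over dimensions
    confidence = 1.0 / (1.0 + uncertainty)    # assumed: any decreasing map would serve
    return mean, var, uncertainty, confidence
```

A typical call would build one view per missing rate with `make_missing_view`, flatten it, and pass it to `mc_dropout_encode`; a higher variance across passes then yields a higher uncertainty index and a lower confidence.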

[0097] Step 2: Multi-scale feature extraction and semantic representation construction.

[0098] For the power load samples constructed in step 1 under different missing rates, a set of time scales is defined to characterize the operating patterns of transformer loads over different time spans, and the historical time series is divided into multiple scales. Each element of the set is the time window length of the corresponding time scale, and the size of the set is the number of time scales. Through multi-scale partitioning, each missing power load sample is mapped to a set of missing subsequences at different time scales; the subsequence at a given scale is obtained by downsampling, sliding-window aggregation, or segmented aggregation and characterizes the local dynamics or long-term trend at that scale. A feature extraction function is constructed for each time scale. It is a two-layer one-dimensional convolutional structure with a fixed kernel size of 3, a stride of 1, an output feature dimension of 128, and ReLU activation, and it performs feature mapping on the input missing subsequence together with a scale-level missing mask and a regulation factor derived from sample uncertainty, which adjusts the feature extraction intensity under different missing conditions. After the scale feature representations are obtained, missing-aware cross-scale fusion proceeds as follows. For each time scale, the scale availability ratio is computed from its scale-level missing mask. This ratio is combined with the sample uncertainty regulation factor, the scale embedding vector, and the periodic feature of the corresponding time position to construct a scale-level gating function, in which the activation is the Sigmoid function and the remaining parameters are learnable. To stabilize the scale weight allocation, the gating weights are normalized. The feature representations at different time scales are then weighted by the normalized scale weights and fused into a unified conditional semantic feature representation, where scale-specific linear mappings or multilayer perceptrons align the feature spaces of the different time scales.
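The gated cross-scale fusion can be illustrated with the sketch below. It assumes that the per-scale features have already been projected to a common dimension, that the gate input is the concatenation of the availability ratio, the uncertainty regulation factor, the scale embedding, and the periodic feature, and that normalization divides by the sum of the gate scores; the names `fuse_scales`, `alpha`, `w`, and `b` are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fuse_scales(features, scale_masks, scale_embeds, period_feat, alpha, w, b):
    """Weight per-scale features by normalized reliability gates.

    features:     list of (D,) arrays, one per scale, already in a common space
    scale_masks:  list of arrays, 1 = observed, 0 = missing, one per scale
    scale_embeds: list of (E,) scale embedding vectors
    period_feat:  (P,) periodic feature of the time position
    alpha:        scalar regulation factor derived from sample uncertainty
    w, b:         gate parameters of shape (2 + E + P,) and scalar
    """
    scores = []
    for mask, emb in zip(scale_masks, scale_embeds):
        avail = float(np.mean(mask))                      # scale availability ratio
        gate_in = np.concatenate([[avail, alpha], emb, period_feat])
        scores.append(sigmoid(gate_in @ w + b))
    scores = np.array(scores)
    weights = scores / scores.sum()                       # assumed normalization
    fused = sum(wk * fk for wk, fk in zip(weights, features))
    return fused, weights
```

Sigmoid scores are strictly positive, so the normalized weights always sum to one and no scale is discarded outright; a softmax over the raw gate inputs would be an equally plausible reading of the text.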

[0099] Step 3: A semantic feature alignment method based on uncertainty regulation with multiple missing rates

[0100] In industrial time series forecasting of power load, the proportion of missing data varies across time periods. If alignment or consistency constraints of uniform strength are applied to semantic features without a reliability assessment, erroneous semantic constraints are easily introduced and the impact of noise is amplified. Therefore, based on the uncertainty assessment results from step 1, this invention uniformly regulates the semantic alignment and consistency learning process across multiple missing rates. For a given missing rate condition, the quantities involved are the student semantic representation, the teacher semantic distribution center and variance obtained under stochastic inference, and the uncertainty index, from which a semantic alignment regulation factor is constructed. For any missing rate condition, an alignment constraint between the semantic representation and the teacher semantic distribution center is constructed using a distance function that measures differences between semantic features, with a positive constant included to avoid a zero denominator. The semantic alignment strength and consistency constraint weights under different missing rate conditions are further adjusted according to the semantic uncertainty index, and weighted semantic consistency constraints are constructed by adaptively controlling the weights of negative samples and periodic structure constraints, improving the stability and robustness of semantic representation learning under multiple missing rate conditions.

[0101] Step 4: Semantic Alignment and Consistency Learning Method Based on Uncertainty Regulation

[0102] In this embodiment, a semantic alignment and consistency learning method based on uncertainty regulation is employed to collaboratively optimize the semantic representations obtained under different missing rates. This method improves the stability of feature learning under conditions of high missing rates or unreliable semantics by dynamically adjusting the semantic alignment strength and consistency constraint weights. The specific execution flow is as follows:

[0103] S1: Initialize model parameters and state variables. Randomly initialize the network parameters of the student and teacher models, and initialize the semantic representation space. Define the missing rate set and, for the current missing rate condition, the number of samples participating in the semantic alignment calculation. Set the training iteration counter to its initial value and fix the maximum number of iterations. At the same time, initialize the sample-level semantic uncertainty index and the semantic alignment regulation factor.

[0104] S2: Construct semantic representations of missing samples. In the current iteration, for any missing rate condition, the corresponding missing multivariate time series sample is input to the multi-scale feature extraction and fusion model described in step 2 to obtain a unified semantic representation.

[0105] S3: Obtain the teacher semantic distribution and uncertainty index. For each missing sample, perform multiple stochastic forward passes according to the stochastic inference mechanism described in step 1 to obtain the set of teacher hidden representations, and compute, as described in step 1, the teacher semantic distribution center, the variance, the semantic uncertainty index, and the semantic alignment regulation factor.

[0106] S4: Construct the uncertainty-regulated semantic alignment loss. Under each missing rate condition, a semantic alignment constraint is constructed between the student semantic representation and the teacher semantic distribution center. Specifically, distribution-level alignment is performed on the student semantic representation using the mean and variance of the teacher semantic distribution; the semantic alignment loss includes a numerical stability term and is averaged over the samples in the batch. To avoid imposing overly strong alignment constraints when the semantics are unreliable or the degree of missingness is high, the sample-level alignment regulation factor obtained in S3 adaptively weights the semantic alignment loss. This design strengthens the semantic alignment constraint when semantic uncertainty is low and suppresses unreliable alignment when semantic uncertainty is high, improving the robustness of semantic consistency learning.
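One way to read S4 is sketched below. The variance-normalized squared distance is an assumption chosen to match the description of distribution-level alignment with a numerical stability term; the function name `alignment_loss` and the argument `alpha` (the per-sample regulation factor) are hypothetical.

```python
import numpy as np

def alignment_loss(student, teacher_mean, teacher_var, alpha, eps=1e-6):
    """Uncertainty-weighted alignment of student representations to the teacher center.

    student, teacher_mean, teacher_var: (B, D) arrays
    alpha: (B,) per-sample regulation factors, smaller when uncertainty is higher
    eps:   numerical stability term that keeps the denominator away from zero
    """
    # Assumed distance: squared deviation scaled by the teacher variance.
    per_sample = np.mean((student - teacher_mean) ** 2 / (teacher_var + eps), axis=1)
    # Down-weight unreliable samples, then average over the batch.
    return float(np.mean(alpha * per_sample))
```

With this form, a sample whose regulation factor is close to zero contributes almost nothing to the gradient, which is the behavior the paragraph above describes for unreliable semantics.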

[0107] S5: Construct consistency constraints across multiple missing rates. For the semantic representations of the same sample obtained under different missing rate conditions, a consistency regulation factor is constructed, continuous sample weights are introduced, and a weighted consistency constraint loss is built. The consistency regulation factor is computed adaptively from the semantic uncertainty index of the corresponding sample, so that the consistency constraint is weakened when semantic uncertainty is high. In addition, the similarity of each sample pair is computed in the teacher semantic space and used to construct continuous sample weights which, on top of the overall consistency strength, further refine the gradient contributions of different samples to the consistency constraint. On this basis, the weighted multi-missing-rate consistency constraint loss is constructed. Its positive set consists of the semantic representations formed under different missing rates for the same original time window; its negative set consists of samples in the current training batch that come from different original time windows. The remaining symbols in the loss denote the semantic representation obtained under a given missing rate, the index variable that traverses the set, the vector transpose operation, and the vector norm.

[0108] S6: Introduce a time-periodic structure and a teacher semantic bias-reduction mechanism. Multivariate time series exhibit significant periodicity and repetitive patterns, so samples from different time windows may be highly similar in semantic space. Treating such samples directly as negatives in the consistency constraint easily introduces spurious negative samples and leads to semantic drift. The candidate negative sample set is therefore filtered periodically, using the sample time index and periodic features. For each sample, a periodicity-aware candidate set is constructed whose time offset Δ is the product of the time series sampling period and a positive integer. On this basis, a similarity judgment on the teacher semantic distribution is introduced: the teacher semantic similarity of a sample pair is computed from the two samples' distribution centers in the teacher semantic space. A debiasing gating coefficient is then defined using an indicator function and a teacher semantic debiasing threshold. When the similarity is above the threshold, the pair is considered highly similar in the teacher semantic space and does not participate in the negative sample constraint; when it is below the threshold, the pair is considered a valid negative sample pair. A bias-reduction ranking constraint loss is then constructed over the set of candidate sample pairs that remain after periodic filtering, with a margin hyperparameter controlling the minimum separation distance for negative samples. This loss applies explicit margin constraints to valid negative pairs while avoiding incorrect separation constraints on semantically similar samples, reducing the risk of semantic drift caused by pseudo-negative samples in periodic scenarios.
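The two-stage filtering in S6 can be sketched as follows. Several details are assumptions rather than the filing's definitions: pairs whose time offset is a whole multiple of `period` are taken to be the ones removed by periodic filtering, teacher similarity is taken to be cosine similarity, and the ranking loss is a hinge on Euclidean distance. The function name and arguments are hypothetical.

```python
import numpy as np

def debiased_margin_loss(student, teacher_mean, time_idx, period, threshold, margin):
    """Hinge margin loss over negative pairs that survive both filters.

    student, teacher_mean: (B, D) arrays
    time_idx:              length-B integer time indices of the samples
    period:                assumed period length in time steps
    threshold:             teacher similarity at or above which a pair is not a negative
    margin:                minimum separation enforced between valid negatives
    """
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    total, count = 0.0, 0
    n = len(time_idx)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Periodic filter: same phase of the cycle, likely a spurious negative.
            if (int(time_idx[i]) - int(time_idx[j])) % period == 0:
                continue
            # Teacher debiasing gate: indicator that the pair is dissimilar enough.
            if cosine(teacher_mean[i], teacher_mean[j]) >= threshold:
                continue
            dist = float(np.linalg.norm(student[i] - student[j]))
            total += max(0.0, margin - dist)
            count += 1
    return total / max(count, 1), count
```

Only pairs that pass both checks are pushed apart, so samples that repeat the same operating pattern one or more periods later are never penalized for being close.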

[0109] S7: Update model parameters. Under the current missing rate condition, the semantic alignment loss, the multi-missing-rate consistency constraint loss, and the periodicity-aware bias-reduction ranking loss are combined by weighted summation to form the optimization objective of the current iteration, and the student model parameters are updated through backpropagation.

[0110] S8: Check the termination condition. If the iteration counter has not reached the maximum number of iterations, increment it and return to S2; otherwise, end step 4 and output the unified semantic representation that has converged under the different missing rates.

[0111] Step 5: Multi-objective joint optimization method based on uncertainty control.

[0112] In industrial operation scenarios for power load forecasting, the prediction error, semantic consistency, and missing feature reconstruction effect of the model under different missing rate conditions are mutually influential. To ensure the stability of the overall performance of the model under dynamic missing environment, this embodiment constructs a multi-objective joint optimization mechanism based on uncertainty control. This method uses the semantic uncertainty evaluation results obtained in step 4 as a basis to uniformly control the prediction loss, semantic alignment loss, multi-missing rate consistency constraint loss, and missing information compensation loss. The specific execution process is as follows.

[0113] S1: Initialize the joint optimization weight parameters. Define the missing rate set. For any missing rate condition, initialize the semantic alignment reliability adjustment coefficient and set its initial value from the semantic uncertainty index obtained in step 4 so that the coefficient decreases monotonically with the uncertainty index; it characterizes the reliability of the semantic alignment results under the current missing condition. At the same time, initialize the information compensation regulation factor so that it increases monotonically with the semantic uncertainty index, strengthening information compensation when the semantics are unreliable or missingness is severe.

[0114] S2: Calculate the prediction loss. In the current training round, for each missing rate condition, the missing multivariate time series sample is input to the prediction model to obtain the prediction results, and the prediction loss function is calculated against the ground-truth prediction target. In the loss, one index runs over the variables up to the total number of variables, another runs over the prediction time steps up to the prediction horizon length, and a prediction error measurement function compares each predicted value with its target.

[0115] S3: Calculate the information compensation and missing feature reconstruction loss. Based on the set of missing locations obtained in step 1 and the unified semantic representation constructed in step 2, information compensation and reconstruction are performed on the features at the missing locations to obtain the missing feature reconstruction loss, which applies a distance metric function between the reconstructed value and the actual value over the missing locations and feature channel indices. The information compensation regulation factor is then applied to obtain the weighted missing feature reconstruction loss.

[0116] S4: Construct the joint optimization objective under a single missing rate condition. For a given missing rate, the prediction loss is combined with the semantic alignment loss, the multi-missing-rate consistency constraint loss, and the periodicity-aware bias-reduction ranking loss constructed in step 4, and a scheduling factor based on semantic uncertainty is introduced to construct the joint optimization objective function for that missing rate condition. When semantic uncertainty is low, the model focuses on semantic alignment and consistency learning; when semantic uncertainty is high or the degree of missing information is large, the model focuses on missing information compensation and reconstruction.

[0117] S5: Construct the joint optimization objective across the missing rate set. The overall training objective function is obtained by summing the joint optimization objectives under all missing rate conditions in the set.
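The weighting logic of S1 to S5 can be sketched in plain Python. The specific forms `lam = 1 / (1 + u)` and `gamma = 1 - lam` are assumptions that merely satisfy the stated monotonicity (alignment weight decreasing and compensation weight increasing with the uncertainty index); the dictionary keys and function names are hypothetical.

```python
def joint_objective(losses, uncertainty):
    """Joint objective for one missing rate condition.

    losses:      dict with keys "pred", "align", "cons", "rank", "recon"
    uncertainty: non-negative semantic uncertainty index for this condition
    """
    lam = 1.0 / (1.0 + uncertainty)   # assumed: decreases as uncertainty grows
    gamma = 1.0 - lam                 # assumed: increases as uncertainty grows
    alignment_terms = losses["align"] + losses["cons"] + losses["rank"]
    return losses["pred"] + lam * alignment_terms + gamma * losses["recon"]

def overall_objective(per_rate):
    """Sum the joint objectives over all missing rate conditions.

    per_rate: iterable of (losses, uncertainty) pairs, one per missing rate
    """
    return sum(joint_objective(losses, u) for losses, u in per_rate)
```

At zero uncertainty the reconstruction term drops out entirely, and as the uncertainty index grows the alignment terms fade while reconstruction takes over, which is the trade-off described in S4.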

[0118] S6: Based on the overall training objective function, the Adam optimization algorithm is used with backpropagation to update the model parameters. The optimization parameters are set as follows: learning rate 0.0002, weight decay 0.0001, batch size 16, maximum gradient norm 5, and a maximum of 100 training epochs.

[0119] S7: After each iteration, calculate the relative rate of change of the overall training objective function from the objective value in the current training iteration and the value in the previous training iteration, and compare it with the convergence threshold. If the relative rate of change stays below the convergence threshold for 10 consecutive iterations, the model is considered to have converged and the multi-objective joint optimization process is terminated; otherwise, iteration continues until the maximum number of training rounds is reached, at which point training is forcibly terminated and the final model parameters are output.
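The stopping rule of S7 can be sketched as below. The relative rate of change is assumed here to be the absolute difference between consecutive objective values divided by the magnitude of the previous value, and the threshold is left as a caller-supplied argument because its value is not legible in the text; `should_stop` is a hypothetical helper name.

```python
def should_stop(history, threshold, patience=10, max_epochs=100):
    """Decide whether training should end after the latest epoch.

    history:    list of overall objective values, one per completed epoch
    threshold:  convergence threshold on the relative rate of change
    patience:   number of consecutive small changes required (10 in the text)
    max_epochs: hard limit on training epochs (100 in the text)
    """
    if len(history) >= max_epochs:
        return True                       # forced termination at the epoch limit
    if len(history) <= patience:
        return False                      # not enough consecutive changes yet
    recent = history[-(patience + 1):]
    for prev, cur in zip(recent, recent[1:]):
        rel_change = abs(cur - prev) / max(abs(prev), 1e-12)
        if rel_change >= threshold:
            return False                  # streak of small changes is broken
    return True
```

Calling `should_stop` once per epoch with the running list of objective values reproduces both exits described above: early termination after ten consecutive small changes, or forced termination at the epoch limit.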

Claims

1. A selective reconstruction and alignment method for missing multivariate time series, characterized by: Step 1. Construct a model for assessing missing multivariate time series samples and uncertainty; Based on the application scenario of power system load forecasting, a multivariate time series sample composed of historical operating data is obtained. This historical operating data is collected continuously for two years at 15-minute sampling intervals and includes transformer active load, oil temperature, ambient temperature, ambient humidity, voltage, current, and time period identifier variables; thus forming the multivariate time series sample. ,in This indicates the number of variables included in the time series. Indicates the length of historical observations. This represents the number of feature channels corresponding to each variable. In actual power grid operation, due to sensor failures, communication interruptions, or abnormal operating conditions, historical observation data may be missing to varying degrees during the acquisition and storage phases. Therefore, the multivariate time series samples used in the training phase... The data itself is incomplete observational data; let's assume... This represents the actual missing observation patterns in the training set, where This indicates that a valid observation exists at this location; otherwise, it is considered a true missing value. The corresponding predicted target is denoted as... , Indicates the length of the prediction period; Define the missing rate set ,in Indicates different missing percentages. This indicates the number of missing rate categories; the default missing rate set is... For each missing rate Generate the corresponding mask matrix based on random missing or structured missing strategies. And construct missing multivariate time series samples Its definition is: ;in This represents element-wise multiplication. 
Indicates the preset missing fill value and ; A set of multi-view missing samples with different missing rates is constructed within the same time window. This is used to simulate the actual operating state of power operation data under different missing intensities; to represent the variation characteristics of multivariate time series at different time scales, a time index sequence is constructed. And extract periodic feature information based on time index. Periodic features include hourly and weekly periodic information, used to represent the operating status of power load at different periodic positions, and serving as auxiliary information input for multi-scale feature modeling and uncertainty assessment; randomness is achieved by enabling random deactivation, parameter perturbation, or random sampling mechanisms during the inference phase, performing the same sample... A random forward propagation is performed to obtain a set of teacher hidden representations: ;in Indicates the first The teacher's hidden semantic representation obtained through random inference. Indicates the first The teacher encoder, which introduces randomness in the second forward propagation, employs a 3-layer MLP feedforward neural network structure with input feature dimensions of [dimensionality missing]. Corresponding to traffic flow, hour embedding, and weekday embedding, the hidden embedding dimension is set to The output embedding dimension is Therefore, this encoder implements arrive Mapping; each layer has a hidden dimension of 128; the first layer will transfer the input from... Dimension mapping to The second and third layers maintain the same input and output dimensions. Each linear mapping layer is followed by the following connections: (1) nonlinear activation function units, used to enhance the network's expressive power; in this invention, the activation function is a modified linear unit function. 
(2) A random deactivation (dropout) unit, used to randomly mask hidden neurons during the training and random inference phases to simulate model parameter perturbations; the unit sets neuron outputs to zero with a preset probability.

The mean $\mu$ and variance $\sigma^2$ of the teacher representations are then calculated to characterize the central tendency and dispersion of the teacher's semantic representations. With $h_k$ the teacher representation from the $k$-th of $K$ random forward propagations, $\mu = \frac{1}{K}\sum_{k=1}^{K} h_k$, and $\sigma^2$ is the variance of the $h_k$ around $\mu$. The mean reflects the dominant semantics learned by the model under the current missing conditions, while the variance quantifies the stability of the model's understanding of those semantics, i.e., semantic uncertainty. Based on the variance of the teacher representations, an uncertainty index $u$ is constructed for each missing multivariate time series sample, and the sample confidence level $c$ is calculated from it: a larger $u$ gives a smaller $c$, and a smaller $c$ indicates relatively large uncertainty. Through this uncertainty assessment process, each missing multivariate time series sample obtains a corresponding $u$ and $c$. The index $u$ quantifies the stability of the model's semantic understanding of the current sample under the given missing conditions, and $c$ turns semantic uncertainty into a control quantity that participates directly in subsequent feature alignment strength adjustment and weight allocation.

Step 2. Multi-scale feature extraction and semantic representation construction. For each missing sample $\tilde{X}^{(r)}$ constructed under missing rate $r$, the historical time series is divided into multiple scales according to a preset time scale set $\{L_s\}_{s=1}^{S}$, taken as {96, 192, 336, 720}, where $L_s$ is the length of the time window corresponding to the $s$-th time scale and $S$ is the number of time scales. Through multi-scale partitioning, each missing sample is mapped to $\{\tilde{X}^{(r)}_s\}_{s=1}^{S}$, where $\tilde{X}^{(r)}_s$ is the power operation subsequence at time scale $s$, obtained by truncation or aggregation.

For the missing subsequences at different time scales, a multi-scale feature extraction model is constructed to learn the dynamic feature representation of the time series at the corresponding scale. The model includes several feature extraction sub-modules, each corresponding to one time scale. Each sub-module applies its feature extraction function $F_s$ to the input missing subsequence together with the scale-level missing mask $M^{(r)}_s$, producing the scale feature representation $z_s$. This feature mapping incorporates the semantic confidence index $c$ obtained in step 1, which allows the scale feature extraction intensity to be adjusted adaptively under different missing conditions, so that $z_s$ reflects the distribution of missing values at the current time scale and their impact on semantic stability.

For each time scale $s$, a scale-level statistic is calculated from its scale-level missing mask $M^{(r)}_s$, and a scale embedding vector $e_s$ is associated with the scale. A scale-level reliability gating function $g_s$ is then constructed by applying the Sigmoid function to a mapping with learnable parameters. A time index sequence is constructed and periodic feature information $p_t$ is extracted from it, where $p_t$ is the periodic time feature embedding obtained by mapping the time index $t$; it represents the structural state of the time series at different periodic positions and serves as an auxiliary time condition input for subsequent multi-scale semantic modeling and uncertainty assessment.

To keep weight allocation stable across time scales, the scale-level gating scores are normalized over all scales to give the scale fusion weights $\alpha_s$, where $s$ is the time-scale index, $j$ is the index of the normalization sum, $S$ is the number of time scales, and $g_j$ is the gating score at the $j$-th time scale. With the normalized scale weights, the feature representations at the different time scales are weighted and fused to obtain a unified conditional semantic feature representation $z = \sum_{s=1}^{S} \alpha_s \, \phi_s(z_s)$, where $\phi_s$ is a scale-specific linear mapping or multilayer perceptron that aligns the feature representations of different time scales in a common space. Through multi-scale feature extraction, missing-aware and confidence-aware gated modeling, and cross-scale fusion, the model forms stable and consistent conditional semantic representations under different missing conditions and time scales, providing the basic representation for subsequent semantic alignment and information compensation.

Step 3. Uncertainty-regulated semantic feature alignment across multiple missing rates. Let the semantic representation obtained in step 2 under missing rate condition $r$ be $z^{(r)}$, let the semantic distribution center and variance obtained by the teacher under random inference be $\mu$ and $\sigma^2$ respectively, and let the uncertainty index obtained in step 1 be $u$. For any missing rate condition $r$, an alignment constraint between $z^{(r)}$ and the teacher semantic distribution center $\mu$ is constructed, explicitly weighted by the uncertainty of the teacher's semantic distribution. The resulting semantic feature alignment loss $\mathcal{L}^{(r)}_{\mathrm{align}}$ constrains the semantic representation learned by the student model under the current missing pattern to be consistent with the center of the teacher model's semantic distribution. It contains numerical stability terms that prevent instability during calculation, and an uncertainty regularization term that avoids over-strengthening the alignment constraint when the teacher's semantics are unstable.
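The uncertainty assessment of step 1 and the uncertainty-weighted alignment of step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed formulas: the uncertainty index (mean variance), the confidence (`exp(-u)`), and the inverse-variance weighting of the squared alignment error are assumed forms, and every function and parameter name is hypothetical. The uncertainty regularization term is not sketched.

```python
import math

def teacher_stats(samples):
    """Element-wise mean and population variance over K stochastic teacher outputs."""
    k = len(samples)
    dim = len(samples[0])
    mean = [sum(h[d] for h in samples) / k for d in range(dim)]
    var = [sum((h[d] - mean[d]) ** 2 for h in samples) / k for d in range(dim)]
    return mean, var

def uncertainty_and_confidence(var):
    """ASSUMED forms: u is the mean variance, c = exp(-u), so larger u gives smaller c."""
    u = sum(var) / len(var)
    return u, math.exp(-u)

def alignment_loss(student, mean, var, eps=1e-6):
    """ASSUMED form: squared error to the teacher center, down-weighted where
    the teacher variance is high; eps is the numerical stability term."""
    dim = len(student)
    return sum((student[d] - mean[d]) ** 2 / (var[d] + eps) for d in range(dim)) / dim

# Two hypothetical teachers with the same center but different dispersion.
stable = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
noisy = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
student = [1.5, 2.5]

for name, samples in (("stable", stable), ("noisy", noisy)):
    mean, var = teacher_stats(samples)
    u, c = uncertainty_and_confidence(var)
    print(name, "u =", u, "c =", c, "loss =", alignment_loss(student, mean, var))
```

For the same student offset, the noisier teacher yields a lower confidence and a weaker alignment penalty, which is the qualitative behaviour the text describes.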
Based on the sample-level semantic uncertainty index obtained in step 1, a consistency strength regulation mechanism is introduced for semantic representations under different missing conditions. Specifically, for the pair of semantic representations with indexes $a$ and $b$, a consistency strength regulation factor $\lambda_{ab}$ is defined from $u_a$ and $u_b$, the sample-level semantic uncertainties of the two representations under their respective missing conditions. When the sample semantic uncertainty is high, $\lambda_{ab}$ decreases, lowering the consistency constraint strength; when the semantic uncertainty is low, $\lambda_{ab}$ increases, reinforcing consistency learning.

Using the teacher semantic distribution as a continuous reference space, the gradient contribution of negative samples is adjusted continuously. For the semantic representations obtained under any two missing rates within the same time window, a semantic similarity $s_{ab}$ is defined. On this basis, a continuous negative sample weight $w_{ab}$ is defined using a similarity threshold $\tau$, which controls the smooth transition of the negative sample weight from valid to invalid, and a smoothing coefficient $\gamma$. When the teacher model judges a sample pair to be semantically similar, $w_{ab}$ decreases, reducing the gradient contribution of that sample as a negative sample. The weighted semantic consistency constraint is then constructed from the consistency strength regulation factor between sample $i$ and sample $j$, the semantic representations obtained under each missing rate, and a distance function that measures semantic feature similarity, computed from the vector transpose (inner product) and the vector norm.

Let sample $i$ have time index $t_i$ and periodic feature $p_i$. For sample $i$ and a candidate sample $j$, the periodicity-aware candidate negative sample set $\mathcal{N}_i$ is defined as the set of samples that are not at the same period position as sample $i$.
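The continuous negative-sample weighting and the periodicity-aware candidate set can be sketched as follows. This is an illustrative sketch only: the logistic form of the weight, the cosine similarity, the default threshold and smoothing values, and the use of a single daily period of 96 steps (15-minute sampling) are all assumptions, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Inner product normalized by the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def soft_negative_weight(similarity, tau=0.5, gamma=0.1):
    """ASSUMED logistic form: the weight moves smoothly from about 1 (valid negative)
    to about 0 (invalid negative) as the similarity rises past the threshold tau;
    gamma sets how sharp the transition is."""
    return 1.0 / (1.0 + math.exp((similarity - tau) / gamma))

def periodic_candidates(i, time_index, period=96):
    """Indices of samples whose position within the period differs from sample i's."""
    phase_i = time_index[i] % period
    return [j for j, t in enumerate(time_index)
            if j != i and t % period != phase_i]

# Hypothetical window start times, in sampling steps.
time_index = [0, 1, 96, 97, 192]
print(periodic_candidates(0, time_index))

# A pair that the teacher sees as similar receives a small negative weight.
sim = cosine_similarity([1.0, 0.2], [0.9, 0.3])
print(sim, soft_negative_weight(sim))
```

Samples 2 and 4 share the daily phase of sample 0 and are therefore kept out of its negative set, regardless of what the teacher model says about them.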
To avoid false exclusion caused by the continuity of short-term adjacent states, a minimum time interval threshold can be added: $|t_i - t_j| \ge \Delta$, where $t_i$ and $t_j$ are the time indexes of the sample and the candidate sample, and $\Delta$ is the minimum time interval threshold used to avoid redundant constraints at adjacent time points. It is set as $\Delta = k \cdot T_s$, where $T_s$ is the time series sampling period and $k$ is a positive integer. This filtering rule does not rely on the teacher model; it uses only time structure information to keep sample pairs in the same period out of the negative sample set, reducing the risk of false negative samples in periodic scenarios.

For candidate negative sample pairs $(i, j)$ that pass periodic filtering, a further bias removal judgment is made from the teacher's semantic distribution information. Let the teacher semantic distribution centers of samples $i$ and $j$ be $\mu_i$ and $\mu_j$, and let $s^{T}_{ij}$ denote their similarity at the teacher semantic distribution level. A bias gating coefficient $b_{ij}$ determines whether the pair should be treated as a valid negative sample: $b_{ij} = \mathbb{1}[\, s^{T}_{ij} < \tau_T \,]$, where $\mathbb{1}[\cdot]$ is the indicator function, equal to 1 when the condition in the brackets is true and 0 otherwise, and $\tau_T$ is the teacher semantic bias removal threshold used to decide whether two samples are semantically similar in the teacher's semantic space. When $s^{T}_{ij} \ge \tau_T$, samples $i$ and $j$ are considered semantically similar in the teacher's semantic space, $b_{ij} = 0$, and the pair does not participate in subsequent ranking or bias removal constraints. When $s^{T}_{ij} < \tau_T$, $b_{ij} = 1$, and the pair is treated as a valid negative sample pair used to construct the bias-reduction ranking constraint. On this basis, the bias-reduction ranking constraint loss $\mathcal{L}_{\mathrm{rank}}$ is constructed to explicitly constrain the semantic margin of valid negative sample pairs, with a margin hyperparameter $m$ controlling the minimum separation magnitude of the ranking constraint. To ensure that consistency constraints apply only to valid sample pairs that have passed periodic filtering, the weighted semantic consistency constraint is rewritten so that it is evaluated only over those pairs.

Step 4. Information compensation and missing feature reconstruction based on uncertainty control. For any missing view sample $\tilde{X}^{(r)}$ in the set of missing multivariate time series samples constructed in step 1, define its missing location set as $\Omega^{(r)} = \{(n, t, c) \mid m^{(r)}_{n,t,c} = 0\}$, where $n$, $t$, and $c$ are the variable index, time index, and feature channel index respectively, and $m^{(r)}_{n,t,c}$ is the missing indicator variable under missing rate view $r$, with a value of 0 indicating a missing value at the corresponding position. Based on the multi-scale semantic feature fusion result constructed in step 2, the fused semantic representation of the missing sample at time index $t$ is denoted $z^{(r)}_t \in \mathbb{R}^{d_z}$, where $d_z$ is the unified semantic representation dimension obtained after multi-scale semantic feature fusion. The time periodic feature $p_t \in \mathbb{R}^{d_p}$ obtained in step 2 is also introduced, where $d_p$ is the embedding dimension of the time-periodic features, which represent the periodic structure and phase information of the time series. A conditional semantic vector for missing feature reconstruction is constructed as $q^{(r)}_t = [\, z^{(r)}_t \,\Vert\, p_t \,]$, where $q^{(r)}_t$ is the conditional semantic context vector at time index $t$ under missing rate view $r$ and $\Vert$ denotes vector concatenation. This vector is shared by all variables and feature channels at the same time index.

For any missing location $(n, t, c) \in \Omega^{(r)}$, the corresponding missing feature reconstruction value is calculated through an explicit linear mapping: $\hat{x}^{(r)}_{n,t,c} = w_{n,c}^{\top} q^{(r)}_t + b_{n,c}$, where $\hat{x}^{(r)}_{n,t,c}$ is the reconstruction value predicted from the conditional semantic representation for variable $n$ at time $t$ and channel $c$ under missing rate view $r$, and $w_{n,c}$ and $b_{n,c}$ are learnable parameters. The missing reconstruction loss $\mathcal{L}^{(r)}_{\mathrm{rec}}$ is defined over the missing locations using a distance metric operator between $\hat{x}^{(r)}_{n,t,c}$ and $x_{n,t,c}$, the actual observed value at the corresponding location, taken from the original multivariate time series sample at variable $n$, time $t$, and feature channel $c$. An uncertainty-based information compensation regulation factor is introduced to adaptively adjust the strength of the missing feature reconstruction term in the overall optimization objective. It is constructed from the semantic uncertainty index obtained in step 1 and the semantic alignment reliability function defined in step 3. The overall information compensation loss $\mathcal{L}^{(r)}_{\mathrm{comp}}$ is defined from this factor and the missing reconstruction loss.

Step 5. Multi-objective joint optimization based on uncertainty control. For the missing multivariate time series sample $\tilde{X}^{(r)}$ constructed under any missing rate condition $r$, a prediction loss function $\mathcal{L}^{(r)}_{\mathrm{pred}}$ is defined on the prediction results output by the student model, to constrain the prediction accuracy of the model under the current missing conditions. It compares $\hat{y}^{(r)}_{n,h}$, the value predicted by the student model under missing condition $r$ for the $n$-th electricity variable at prediction time step $h$, with $y_{n,h}$, the corresponding actual target value, over the $N$ variables and the $H$ prediction time steps.

On this basis, the semantic feature alignment loss and the semantic consistency constraint loss constructed in step 3 are introduced to ensure the consistency of the semantic representations learned for the same sample under different missing rate views. A bias-reduction ranking constraint loss is further introduced, applying the minimum semantic margin constraint only to valid negative sample pairs confirmed by periodic filtering and teacher semantic discrimination. The information compensation and missing feature reconstruction loss constructed in step 4 is introduced to strengthen the model's ability to model and recover missing information under high missing rates. To balance the multiple objectives adaptively, a semantic alignment reliability adjustment coefficient is introduced to represent the overall reliability of the semantic alignment and consistency constraints under the current missing condition. From the semantic uncertainty index obtained in step 1 and the semantic alignment reliability function defined in step 3, joint optimization weights are derived to adjust the weights of the information compensation and missing feature reconstruction terms in the overall optimization objective. This yields the joint optimization objective function for missing rate condition $r$, and aggregating the joint objectives over the whole missing rate condition set gives the overall training objective function.

The Adam optimization algorithm is used to backpropagate the overall loss and update the model parameters so as to minimize the weighted joint objective of prediction loss, semantic alignment loss, semantic consistency loss, and missing feature reconstruction loss constructed under the different missing rate conditions. The optimization settings are fixed at a learning rate of 0.0002, weight decay of 0.0001, batch size of 16, maximum gradient norm of 5, and a maximum of 100 training epochs. To ensure convergence stability, training terminates early if the relative rate of change of the overall loss function between adjacent epochs stays below a preset threshold for 10 consecutive epochs; otherwise, training ends after the 100th epoch.
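The stopping rule described above can be sketched as a small helper. The hyperparameter values in `TRAIN_CONFIG` are the ones stated in the text; the relative-change threshold is not given there, so it is left as a required argument `rel_tol`, and the value used in the example call is purely illustrative. The names `TRAIN_CONFIG` and `stopping_epoch` are hypothetical.

```python
# Settings stated in the text; the early-stopping threshold is not specified there.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 2e-4,
    "weight_decay": 1e-4,
    "batch_size": 16,
    "max_grad_norm": 5.0,
    "max_epochs": 100,
    "patience": 10,
}

def stopping_epoch(losses, rel_tol, patience=10, max_epochs=100):
    """Return the 1-based epoch at which training stops.

    losses[k] is the overall loss after epoch k + 1. Training stops once the
    relative change between adjacent epochs has stayed below rel_tol for
    `patience` consecutive epochs, or at max_epochs otherwise.
    """
    streak = 0
    last = min(len(losses), max_epochs)
    for idx in range(1, last):
        prev, curr = losses[idx - 1], losses[idx]
        rel_change = abs(curr - prev) / max(abs(prev), 1e-12)
        streak = streak + 1 if rel_change < rel_tol else 0
        if streak >= patience:
            return idx + 1
    return last

# Hypothetical loss history: one large drop, then a plateau.
history = [1.0, 0.5] + [0.5] * 20
print(stopping_epoch(history, rel_tol=1e-3))  # rel_tol chosen only for illustration
```

Resetting the streak whenever the change exceeds the threshold is what makes the criterion require consecutive epochs rather than any 10 epochs in total.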