An electric energy meter anomaly detection method based on a dual-memory enhanced autoencoder
A dual-memory enhanced autoencoder extracts multi-scale robust features and combines global and local contextual information for electricity meter anomaly detection. This addresses the difficulty existing methods have in capturing complex structure in time series data and in coping with noise interference, thereby improving detection performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- BEIJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2023-11-01
- Publication Date
- 2026-06-30
AI Technical Summary
Existing anomaly detection methods for smart meters struggle to effectively capture complex time structures and nonlinear relationships when processing time-series data. Furthermore, their model generalization ability is insufficient in the presence of noise and abnormal interference, resulting in poor anomaly detection performance.
We employ a dual-memory-enhanced autoencoder approach, which extracts multi-scale robust features through global and local memory-enhanced encoders, reconstructs them by combining global and local contextual information, and utilizes multi-level feature fusion for anomaly detection.
It improves anomaly detection performance, increases the distinction between normal and anomalous data, suppresses the model's undesired generalization to anomalies, and maintains the ability to reconstruct atypical normal data.
Figure CN117491939B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of electricity metering technology, and more specifically to an anomaly detection method for electricity meters based on a dual-memory enhanced autoencoder.
Background Technology
[0002] Accelerating the digital transformation of the power system and building a smart grid in service of the "dual carbon" goals (carbon peaking and carbon neutrality) is a crucial development path for the power system. Smart meters are among the most important components of the smart grid: they are the foundation for metering and display, and they also enable two-way communication between electricity users and power companies. As the smart grid develops, the functions of smart meters are becoming increasingly diverse. Timely detection and repair of faulty meters is of great significance for the stable operation of the power grid and for users' stable electricity consumption. The traditional verification model, in which expired meters are replaced according to the verification cycle, cannot detect faulty meters in a timely manner and wastes a great deal of human and material resources. According to the relevant regulations of the State Administration for Market Regulation on mandatory verification, the traditional model of replacing residential electricity meters (hereinafter referred to as electricity meters) upon expiration should be reformed, and a new model of extending use or replacing meters based on their condition should be gradually implemented, so that "replacement upon expiration" gives way to "replacement upon inaccuracy." Replacing meters only when they become inaccurate is of great significance for maintaining fair and just trade settlement between power companies and electricity customers, promoting the reform of the legal regulatory model for electricity meters, and contributing to the national "dual carbon" goals. To achieve this, it is urgent to transform the current traditional physical verification and calibration model for electricity meters and to innovate verification and calibration methods.
[0003] To build a smart grid, advanced sensor technology, communication technology, and intelligent power equipment are being applied to the power system, bringing significant intelligent upgrades to the grid. Smart meters, as the most widely used intelligent power devices, can collect multivariate time series data such as current, voltage, and energy consumption readings from households and businesses. By analyzing the periodicity and trends of the multivariate time series and the correlations between variables, the normal patterns in the series can be learned, so that normal and abnormal data can be distinguished and the operation of smart meters can be comprehensively analyzed and monitored.
[0004] In reality, systems operate normally most of the time, with anomalies occurring infrequently and hidden within large amounts of normal data. Manually labeling such anomalies is both difficult and costly, resulting in a very limited pool of usable labeled data. Therefore, current research primarily focuses on unsupervised anomaly detection.
[0005] Classical unsupervised anomaly detection methods such as Local Outlier Factor (LOF), One-Class Support Vector Machine (OCSVM), and Isolation Forest (IF) have been applied to time series anomaly detection in smart meters. However, many of these traditional methods are not well suited to time series. Their main limitation is that they do not consider the temporal structure of the data: they treat a chronologically ordered series as a set of independent points and ignore the contextual dependencies between adjacent time points, so they fail to capture the complex structural information and nonlinear relationships within the series. Some methods use autoregressive models such as the Autoregressive Integrated Moving Average (ARIMA) model to fit the correlation between historical time points and predict future data, but they still struggle to model complex, nonlinear, multivariate time series. In time series data, the boundary between anomalous and normal behavior is often imprecise and constantly evolving, and this lack of a clear, representative normal boundary poses a challenge to traditional methods. Furthermore, during inference, some of these methods must re-traverse the entire training set to find the nearest samples, which makes them computationally inefficient. Together, these factors limit the effectiveness of traditional machine learning methods in detecting anomalies in large-scale multivariate time series data from smart meters. Recently, however, the rapid development of deep learning has led to impressive progress in unsupervised anomaly detection.
[0006] Deep learning models, with their strong nonlinearity, can effectively mine patterns in high-dimensional data and are therefore widely used for anomaly detection on smart meter multivariate time series (MTS). Deep learning-based anomaly detection methods for smart meters can be broadly categorized into reconstruction-based and prediction-based methods. Both share a fundamental assumption: the training data consists of normal data. Prediction-based methods train a prediction model on historical normal data to predict the time series at the next time point or over a specific period. During testing, the error between the actual value and the predicted value at the next time point or period is used as the anomaly score. Because time series inherently exhibit strong randomness and unpredictability, and because it is impossible in practice to collect all variables affecting the series, prediction-based methods cannot accurately predict future values when the prediction horizon is too long. Therefore, most current deep learning-based anomaly detection methods are reconstruction-based. Reconstruction-based methods use an encoder to map the original normal data to a low-dimensional space and a decoder to map the learned low-dimensional feature representation back to the original space. The information bottleneck forces the model to extract representative features from the original time series. Since the training data is assumed to be normal, a reconstruction-based model cannot reconstruct abnormal sequences in the test data well, so abnormal data has a larger reconstruction error, and hence a larger anomaly score, than normal data. During testing, the reconstruction error between the original time series and the reconstructed time series is used as the anomaly score to determine whether a sequence is anomalous.
[0007] Although reconstruction-based multivariate time series anomaly detection methods for smart meters have achieved some success, they still face several challenges. First, on real-world datasets that contain noise and anomalies, existing reconstruction-based methods generalize to anomalies in unintended ways, which reduces the distinction between normal and anomalous data and degrades detection performance. Some probabilistic generative models address this issue through their implicit robustness to noise, which allows them to remain robust even with contaminated training data. However, this implicit robustness does not carry over reliably to different datasets and is poorly interpretable. Furthermore, deep generative models exhibit training instability and poor generalization when the training dataset is contaminated. Memory modules, as an explicit design, have been used for multivariate time series anomaly detection in smart meters, but while suppressing generalization to anomalies they sacrifice the ability to reconstruct atypical normal data, which again reduces the distinction between normal and anomalous data. Second, existing time series anomaly detection methods struggle to account for long-term dependencies beyond the time window, which limits the model's ability to extract temporal patterns and reconstruct time windows.
Summary of the Invention
[0008] To address the shortcomings of existing technologies, this invention provides a method for detecting anomalies in energy meters based on a dual-memory enhanced autoencoder.
[0009] According to one aspect of the present invention, a method for detecting anomalies in an energy meter based on a dual-memory enhanced autoencoder is provided, comprising:
[0010] Acquiring multivariate long time series data of historical measurements of the energy meter under test;
[0011] Dividing the multivariate long time series data into multiple time windows of a preset window length;
[0012] Inputting the data of each time window, together with the data of its adjacent time windows, into a pre-trained anomaly detection model, and outputting the reconstructed data corresponding to that time window, wherein the anomaly detection model adopts a dual-memory enhanced autoencoder;
[0013] Determining, based on the reconstructed data and the original data of each time window, the anomaly score of each time point in that time window, and determining, based on the anomaly score, the degree of anomaly of the energy meter under test at each time point.
[0014] Optionally, after acquiring the multivariate long time series data of the historical measurements of the energy meter under test, the method further includes: normalizing the multivariate long time series data, wherein the preprocessing process is as follows:
[0015]
[0016]
[0017] where $P_i^j$ denotes the original input vector, from which the variable vector after learnable normalization is computed; $\alpha_j$ denotes the j-th element of the learnable mean vector; $\beta_j$ denotes the j-th element of the learnable variance vector; $W_{p1}$ and $W_{p2}$ denote the weights in the adaptive weighted summation; and $X_i$ denotes the model input after the learnable data preprocessing module.
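As a minimal sketch of the forward pass of such a preprocessing step, the following assumes that each variable is standardized by subtracting the learnable mean and dividing by the square root of the learnable variance, and that the two weights are scalars. The exact formulas are not reproduced in this text, so the division form, the scalar weights, the `eps` term, and the function name `learnable_preprocess` are all assumptions made for illustration. In a real model the four parameters would be updated by backpropagation rather than passed in by hand.

```python
import math


def learnable_preprocess(window, alpha, beta, w_p1, w_p2, eps=1e-8):
    """Forward pass of a learnable standardization plus weighted sum.

    window : list of T rows, each row holding F variable values
    alpha  : per-variable learnable means (length F)
    beta   : per-variable learnable variances (length F)
    w_p1   : weight of the normalized data (assumed scalar)
    w_p2   : weight of the original data (assumed scalar)
    """
    output = []
    for row in window:
        new_row = []
        for j, value in enumerate(row):
            # Assumed form: (value - mean) / sqrt(variance + eps)
            normalized = (value - alpha[j]) / math.sqrt(beta[j] + eps)
            # Adaptive weighted sum of normalized and original data
            new_row.append(w_p1 * normalized + w_p2 * value)
        output.append(new_row)
    return output


window = [[1.0, 10.0], [3.0, 30.0]]
only_normalized = learnable_preprocess(
    window, alpha=[2.0, 20.0], beta=[1.0, 100.0], w_p1=1.0, w_p2=0.0
)
only_original = learnable_preprocess(
    window, alpha=[2.0, 20.0], beta=[1.0, 100.0], w_p1=0.0, w_p2=1.0
)
```

Setting `w_p1=0.0, w_p2=1.0` passes the original data through unchanged, which illustrates how the weighted sum lets a model fall back on raw values when normalization is unhelpful.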
[0018] Optionally, the training process for the anomaly detection model is as follows:
[0019] Acquire multivariate time series sample data from multiple historical energy meter readings and merge them into a single multivariate long time series sample data.
[0020] The multivariate long time series sample data is divided by a sliding window into multiple time-window samples of a preset window length;
[0021] Based on the pre-built global memory augmented encoder, features are extracted from each time window sample to be detected in multiple time window sample data, and the reconstructed global latent variables of each time window sample to be detected are output.
[0022] Based on the pre-built local memory enhancement encoder and the neighboring window samples of each time window sample to be detected, the reconstructed local latent variables of each time window sample to be detected are output.
[0023] The reconstructed global latent variables and the reconstructed local latent variables are fused using a weighted learning layer to output the reconstructed latent variables for each sample in the time window to be detected.
[0024] The reconstructed latent variables of each time window sample to be detected are decoded using an MLP-based neural network decoder, and the reconstructed sample data of each time window sample to be detected is output.
[0025] The total loss of the anomaly detection model is determined by pre-defined global sparse loss, local latent variable loss, and time window reconstruction loss.
[0026] The anomaly detection model is determined by updating and optimizing the network and parameters of the anomaly detection model based on the total loss.
[0027] Optionally, features are extracted from each sample of the time window to be detected in multiple time window sample data according to the pre-built global memory augmentation encoder, and the reconstructed global latent variables of each sample of the time window to be detected are output, including:
[0028] Each sample in the time window to be detected first passes through the encoder layer, which outputs global latent variables. The encoder layer consists of multiple stacked GRUs.
[0029] The global latent variables are input into the global memory layer to reconstruct the global latent variables, and the reconstructed global latent variables are output. The calculation process of the encoder layer is as follows:
[0030]
[0031]
[0032]
[0033]
[0034] In the formula, $x_t$ denotes the input at time t of the sample to be detected within the time window; W and b are the learnable weight matrices and bias parameters of the network; ⊙ denotes the Hadamard product of two matrices. The output of the encoder layer is the global latent variable $Z_g = [h_1, h_2, \dots, h_T]$, where $h_t$ denotes the output at time t, σ denotes the sigmoid function σ(·), and T denotes the number of time points within the window. $u_t$ denotes the update gate of the GRU, which indicates how much of the previous information needs to be updated. The candidate intermediate state of the GRU is determined by the reset gate $r_t$, which controls the proportion of the previous unit's output $h_{t-1}$ that is retained;
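The gate definitions above can be illustrated with a single-layer GRU written in plain Python. The encoder-layer formulas themselves are not reproduced in this text, so this sketch uses the commonly used GRU formulation, with each weight matrix acting on the concatenation of the input and the previous hidden state. The parameter names (`W_u`, `W_r`, `W_h`, `b_u`, `b_r`, `b_h`) and the helper functions are assumptions for illustration, and the patent's encoder stacks several such layers.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def gru_step(x_t, h_prev, p):
    """One GRU step; p holds weights of shape hidden x (F + hidden)."""
    xh = x_t + h_prev  # concatenate input and previous hidden state
    u = [sigmoid(a + b) for a, b in zip(matvec(p["W_u"], xh), p["b_u"])]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], xh), p["b_r"])]
    # The reset gate scales how much of h_prev enters the candidate state
    x_rh = x_t + [r_i * h_i for r_i, h_i in zip(r, h_prev)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["W_h"], x_rh), p["b_h"])]
    # The update gate blends the previous state with the candidate state
    return [(1.0 - u_i) * h_i + u_i * c_i for u_i, h_i, c_i in zip(u, h_prev, cand)]


def gru_encode(window, p, hidden_size):
    """Return Z_g = [h_1, ..., h_T] for a window of T input vectors."""
    h = [0.0] * hidden_size
    outputs = []
    for x_t in window:
        h = gru_step(x_t, h, p)
        outputs.append(h)
    return outputs


# Toy parameters: one input variable, one hidden unit
params = {
    "W_u": [[0.0, 0.0]], "b_u": [0.0],
    "W_r": [[0.0, 0.0]], "b_r": [0.0],
    "W_h": [[1.0, 0.0]], "b_h": [0.0],
}
Z_g = gru_encode([[1.0], [1.0]], params, hidden_size=1)
```

With these toy parameters both gates equal 0.5, so each step moves the hidden state halfway toward `tanh(1.0)`.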
[0035] The calculation process of the global memory layer is as follows:
[0036]
[0037] where
[0038]
[0039] $\mathrm{Sim}(Z_g, m_i) = Z_g \cdot m_i^{T}$
[0040] In the formula, the learnable matrix is the memory matrix, H denotes the size of the memory matrix, K denotes the feature dimension of each typical pattern in the memory matrix, and $m_i$ denotes the i-th memory feature in the memory module. $\mathrm{Sim}(Z_g, m_i)$ denotes the similarity between the global latent variable and each memory feature. $S_g$ denotes the global memory score matrix, obtained by normalizing the similarity matrix, and the index i selects the i-th vector in the score matrix.
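A memory read of this kind can be sketched as follows. The similarity is the dot product given in [0039]. The text only says the similarity matrix is "normalized", so the use of a softmax here is an assumption, as is reconstructing each latent vector as the score-weighted sum of the memory features. The function names are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(queries, memory):
    """Reconstruct each query from memory features.

    queries : T x K latent vectors (rows of Z_g)
    memory  : H x K memory features m_1 ... m_H
    Returns (reconstructed T x K, score matrix T x H).
    """
    reconstructed, score_matrix = [], []
    feature_dim = len(memory[0])
    for q in queries:
        # Sim(Z_g, m_i) = Z_g . m_i^T for every memory feature
        sims = [sum(a * b for a, b in zip(q, m_i)) for m_i in memory]
        weights = softmax(sims)  # assumed normalization
        score_matrix.append(weights)
        reconstructed.append(
            [sum(w * m_i[k] for w, m_i in zip(weights, memory)) for k in range(feature_dim)]
        )
    return reconstructed, score_matrix


memory = [[1.0, 0.0], [0.0, 1.0]]
recon, scores = memory_read([[10.0, 0.0]], memory)
```

Because the output is always a combination of stored patterns, a query that resembles no memory feature cannot be reproduced faithfully, which is the mechanism that limits generalization to anomalies.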
[0041] Optionally, based on the pre-built local memory augmentation encoder and the neighboring window samples of each time window sample to be detected, the reconstructed local latent variables of each time window sample to be detected are output, including:
[0042] The adjacent window samples of each time window sample to be detected are encoded by a fully connected network to generate adjacent window encoded samples.
[0043] The adjacent window encoded samples are input into the encoder layer consisting of stacked GRUs, and the adjacent window sample matrix is output.
[0044] The global latent variable of each window sample to be detected is used as the query, and the local latent variables are reconstructed using the adjacent window sample matrix as the memory matrix. The reconstructed local latent variables are then output.
[0045] The calculation process for local latent variables is as follows:
[0046]
[0047] In the formula, $S_l$ denotes the local score matrix, $e_i$ denotes the i-th item in the adjacent window sample matrix, and $Z_g$ is the global latent variable. For the adjacent window sample matrix, L denotes the number of adjacent windows and B denotes the dimension after encoding by the fully connected network.
[0048] Optionally, the formula for reconstructing the sample data is:
[0049]
[0050] where
[0051]
[0052] In the formula, the fused quantity is the reconstructed latent variable; $W_1$ and $W_2$ are the weights of the reconstructed global latent variables and the reconstructed local latent variables, respectively; each granularity of reconstructed latent variable has its own corresponding weight; and o denotes the number of layers in the multi-level information fusion.
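One possible reading of this fusion can be sketched as below. The fusion formulas are not reproduced in this text, so the sketch assumes that, at each of the o levels, the reconstructed global and local latent vectors are combined linearly with scalar weights, and that the per-level results are then summed with one weight per level. The function name `fuse_latents` and the scalar weights are assumptions, and the result would then be passed to the MLP decoder.

```python
def fuse_latents(global_levels, local_levels, w1, w2, level_weights):
    """Fuse reconstructed global and local latent vectors over o levels.

    global_levels, local_levels : lists of o latent vectors of equal length
    w1, w2        : weights of the global and local latent variables (assumed scalar)
    level_weights : one weight per level (granularity)
    """
    fused = [0.0] * len(global_levels[0])
    for z_g, z_l, w_level in zip(global_levels, local_levels, level_weights):
        for k, (g, l) in enumerate(zip(z_g, z_l)):
            # Combine global and local information, then weight the level
            fused[k] += w_level * (w1 * g + w2 * l)
    return fused


fused = fuse_latents(
    global_levels=[[1.0, 2.0], [3.0, 4.0]],
    local_levels=[[0.0, 0.0], [1.0, 1.0]],
    w1=0.5,
    w2=0.5,
    level_weights=[0.5, 0.5],
)
```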
[0053] Optionally, the formula for the total loss is:
[0054] $L = \lambda_1 \mathrm{Loss}_{recon} + \lambda_2 \mathrm{Loss}_{latent} + \lambda_3 \mathrm{Loss}_{spar\_sum}$
[0055] where
[0056]
[0057]
[0058]
[0059] In the formula, $\mathrm{Loss}_{spar\_sum}$ is the global sparse loss, $\mathrm{Loss}_{latent}$ is the local latent variable loss, and $\mathrm{Loss}_{recon}$ is the time window reconstruction loss. The two latent terms denote the i-th latent variable from the first reconstruction and the i-th latent variable from the second reconstruction, respectively; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weighting factors used to balance the three losses. The two sparse terms denote, for the j-th sample at the i-th scale, the sparse loss of the first encoder E1 and the sparse loss of the second encoder E2 in the encoder-decoder-encoder structure used in the anomaly detection model. N is the number of training samples, and MSE() is the mean squared error.
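The weighted combination in [0054] can be illustrated with a small sketch. Only the top-level formula is given in this text, so the sketch assumes the reconstruction loss is the MSE between a window and its reconstruction, the latent loss is the MSE between the first and second latent variables, and the sparse terms are computed elsewhere and simply summed. The flattened-vector inputs and the function names are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(x, x_hat, z_first, z_second, sparse_terms, lambdas):
    """L = lam1 * Loss_recon + lam2 * Loss_latent + lam3 * Loss_spar_sum."""
    lam1, lam2, lam3 = lambdas
    loss_recon = mse(x, x_hat)            # window vs. reconstructed window
    loss_latent = mse(z_first, z_second)  # first vs. second latent variables
    loss_spar_sum = sum(sparse_terms)     # precomputed sparse losses (assumed)
    return lam1 * loss_recon + lam2 * loss_latent + lam3 * loss_spar_sum


loss = total_loss(
    x=[1.0, 2.0],
    x_hat=[1.0, 4.0],
    z_first=[0.0, 0.0],
    z_second=[1.0, 1.0],
    sparse_terms=[0.5, 0.5],
    lambdas=(1.0, 0.5, 0.1),
)
```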
[0060] Optionally, the formula for calculating the anomaly score is:
[0061]
[0062] In the formula, $AS_t$ is the anomaly score at time step t, $x_t$ is the original (unreconstructed) time window data at time t, and the remaining term is the reconstructed data at time t.
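A per-time-point score of this kind might be computed as in the sketch below. The score formula is not reproduced in this text, so the sketch assumes the squared reconstruction error summed over all variables at each time point. The function name is hypothetical.

```python
def anomaly_scores(original, reconstructed):
    """Return one score per time point of a window.

    original, reconstructed : T rows of F variable values each.
    Assumed score: squared error summed over the F variables.
    """
    if len(original) != len(reconstructed):
        raise ValueError("windows must have the same number of time points")
    scores = []
    for x_t, x_hat_t in zip(original, reconstructed):
        scores.append(sum((a - b) ** 2 for a, b in zip(x_t, x_hat_t)))
    return scores


window = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
recon = [[1.0, 2.0], [3.0, 5.0], [2.0, 2.0]]
scores = anomaly_scores(window, recon)
```

In this toy example the third time point is reconstructed poorly and therefore receives the largest score.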
[0063] According to another aspect of the present invention, an anomaly detection device for an energy meter based on a dual-memory enhanced autoencoder is provided, comprising:
[0064] An acquisition module, used to acquire multivariate long time series data of historical measurements of the energy meter under test;
[0065] A partitioning module, used to divide the multivariate long time series data into multiple time windows of a preset window length;
[0066] An output module, used to input the data of each time window, together with the data of its adjacent time windows, into a pre-trained anomaly detection model and to output the reconstructed data corresponding to that time window, wherein the anomaly detection model adopts a dual-memory enhanced autoencoder;
[0067] A determination module, used to determine the anomaly score of each time point in each time window based on the reconstructed data and the original data, and to determine the degree of anomaly of the energy meter under test at each time point based on the anomaly score.
[0068] According to another aspect of the present invention, a computer-readable storage medium is provided, the storage medium storing a computer program for performing the methods described in any of the above aspects of the present invention.
[0069] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising: a processor; a memory for storing executable instructions of the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement the method described in any of the preceding aspects of the present invention.
[0070] Therefore, this invention proposes a multi-scale, dual-memory-enhanced anomaly detection method for smart energy meters that couples global and local robust features. Unlike traditional anomaly detection methods, which use only a single fixed window for feature extraction and reconstruction, this invention proposes a global-local dual-memory enhanced autoencoder that takes consecutive adjacent windows as input. It extracts latent variables that fuse global and local features, and reconstructs these latent variables using typical global features and common features of the local context. Finally, it fuses multi-level features to construct the final latent variables for window reconstruction, thereby completing anomaly detection. The proposed method effectively suppresses the model's undesirable generalization to anomalies while limiting the loss of its ability to reconstruct normal patterns. Furthermore, by simultaneously considering semantic information at different levels, it obtains more representative robust features, increases the distinction between normal and anomalous data, and improves anomaly detection performance.
Attached Figure Description
[0071] Exemplary embodiments of the present invention can be more fully understood by referring to the following figures:
[0072] Figure 1 is a flowchart of a method for detecting anomalies in an energy meter based on a dual-memory enhanced autoencoder, provided in an exemplary embodiment of the present invention;
[0073] Figure 2 is a schematic diagram of the overall process of an energy meter anomaly detection method based on a dual-memory enhanced autoencoder, provided in an exemplary embodiment of the present invention;
[0074] Figure 3 is a flowchart of global and local context information fusion, provided in an exemplary embodiment of the present invention;
[0075] Figure 4 is a schematic diagram of the structure of an energy meter anomaly detection device based on a dual-memory enhanced autoencoder, provided in an exemplary embodiment of the present invention;
[0076] Figure 5 is the structure of an electronic device provided in an exemplary embodiment of the present invention.
Detailed Implementation
[0077] Hereinafter, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. Obviously, the described embodiments are merely some embodiments of the present invention, and not all embodiments of the present invention. It should be understood that the present invention is not limited to the exemplary embodiments described herein.
[0078] It should be noted that, unless otherwise specifically stated, the relative arrangement, numerical expressions, and values of the components and steps described in these embodiments do not limit the scope of the invention.
[0079] Those skilled in the art will understand that the terms "first," "second," etc., in the embodiments of the present invention are only used to distinguish different steps, devices, or modules, and do not represent any specific technical meaning, nor do they indicate a necessary logical order between them.
[0080] It should also be understood that in the embodiments of the present invention, "multiple" can refer to two or more, and "at least one" can refer to one, two or more.
[0081] It should also be understood that any component, data or structure mentioned in the embodiments of the present invention can generally be understood as one or more unless explicitly defined or given contrary instructions in the context.
[0082] Furthermore, the term "and / or" in this invention is merely a description of the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. Additionally, the character " / " in this invention generally indicates that the preceding and following related objects have an "or" relationship.
[0083] It should also be understood that the description of the various embodiments in this invention emphasizes the differences between them; for their identical or similar parts, the embodiments can be referred to each other. For the sake of brevity, these parts will not be described again in detail.
[0084] At the same time, it should be understood that, for ease of description, the dimensions of the various parts shown in the accompanying drawings are not drawn according to actual scale.
[0085] The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention or its application or use.
[0086] Techniques, methods, and equipment known to those skilled in the art may not be discussed in detail, but where appropriate, they should be considered part of the specification.
[0087] It should be noted that similar labels and letters in the following figures indicate similar items; therefore, once an item is defined in one figure, it does not need to be discussed further in subsequent figures.
[0088] This invention can be applied to electronic devices such as terminal devices, computer systems, and servers, and can operate with a wide range of other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and / or configurations suitable for use with terminal devices, computer systems, servers, and other electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments including any of the above systems, etc. Terminal devices, computer systems, servers, and other electronic devices can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system. Typically, program modules can include routines, programs, object programs, components, logic, data structures, etc., which perform specific tasks or implement specific abstract data types. Computer systems / servers can be implemented in distributed cloud computing environments where tasks are performed by remote processing devices linked via a communication network. In distributed cloud computing environments, program modules can reside on local or remote computing system storage media, including storage devices.
[0089] Exemplary methods
[0090] Figure 1 is a schematic flowchart of a method for detecting anomalies in an energy meter based on a dual-memory enhanced autoencoder, provided in an exemplary embodiment of the present invention. This embodiment can be applied to electronic devices. As shown in Figure 1, the energy meter anomaly detection method 100 based on a dual-memory enhanced autoencoder includes the following steps:
[0091] Step 101: Acquire multivariate long time series data of historical measurements of the energy meter under test;
[0092] Step 102: Divide the multivariate long time series data into multiple time windows of a preset window length;
[0093] Step 103: Input the data of each time window, together with the data of its adjacent time windows, into a pre-trained anomaly detection model, and output the reconstructed data corresponding to that time window. The anomaly detection model adopts a dual-memory enhanced autoencoder.
[0094] Step 104: Determine the anomaly score of each time point in each time window based on the reconstructed data and the original data, and determine the degree of anomaly of the energy meter under test at each time point based on the anomaly score.
[0095] Specifically, most existing smart meter anomaly detection methods based on reconstruction assume the use of clean data for training. However, in reality, noise and even contamination exist, and these methods can still reconstruct anomalies well, leading to a lack of distinction between normal and abnormal data, thus affecting the anomaly detection performance. Some anomaly detection methods targeting noise and anomalies suffer from limitations in the implicit structure itself or sacrifice of the ability to reconstruct normal time patterns, resulting in limited improvement in anomaly detection performance. This application proposes a multi-scale dual-memory enhanced global-local feature extraction framework, which extracts robust features for reconstruction from two dimensions: different semantic information and temporal dependency length. First, unlike previous methods where feature extraction is limited to a single fixed window, this application designs consecutive adjacent windows as model inputs to extract robust features that fuse global and local information. Second, this application proposes a dual-memory enhanced encoder that couples global and local features to simultaneously extract global and local contextual information, and reconstructs latent variables using global typical patterns and local common features, thus suppressing the model's ability to generalize to anomalies while ensuring the ability to reconstruct atypical normal data. Finally, this application fuses latent variables representing different levels of semantic information and uses an encoder-decoder-encoder structure to train the model. It uses the reconstructed latent variables to generate reconstructed samples for anomaly detection, which increases the distinction between normal and abnormal and improves the performance of anomaly detection.
[0096] The robust feature extraction framework proposed in this application, with multi-scale global-local feature coupling, is shown in Figure 2. It mainly adopts an Encoder-Decoder-Encoder structure and primarily contains a multi-scale global-local dual-memory enhanced encoding module and a nonlinear decoding module. Let $\mathcal{X}$ denote the domain of data samples and $\mathcal{Z}$ the domain of encoded latent variables. $f_E(\cdot): \mathcal{X} \to \mathcal{Z}$ denotes the encoder mapping from the data sample domain to the latent variable domain, and $f_D(\cdot): \mathcal{Z} \to \mathcal{X}$ denotes the decoder mapping from the latent variable domain to the data sample domain. Given a sample input $X \in \mathcal{X}$ after learnable data preprocessing, a multi-level global-local fusion encoder extracts and reconstructs features to obtain a reconstructed latent variable that integrates typical patterns and common features. The data is then reconstructed by the nonlinear decoding module to obtain the reconstructed data sample. To mitigate the effect on model training of the mismatch between the extracted abstract memory features and the requirements of a pixel-level reconstruction loss, the proposed method performs feature extraction again on the reconstructed data sample to obtain a second set of reconstructed latent variables. The overall process is shown in the following formulas:
[0097]
[0098]
[0099]
[0100] where $\theta_{E1}$ denotes the parameters of the first encoder, $\theta_D$ denotes the decoder parameters, and $\theta_{E2}$ denotes the parameters of the second encoder. During the testing phase, normal samples can be reconstructed well from multi-level typical patterns and common features, whereas for anomalies the reconstruction of the latent variables yields decoded data that differs significantly from the original anomalous samples. Therefore, the reconstruction error between the input sample and the reconstructed sample is used as the anomaly score, and a fixed threshold is applied: samples with anomaly scores greater than the threshold are classified as anomalous, and samples with anomaly scores less than or equal to the threshold are classified as normal.
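The encoder-decoder-encoder flow and the fixed-threshold rule can be sketched as follows. The encoders and decoder here are toy stand-in functions, not the memory-enhanced networks of this application, and the function names are hypothetical. Only the order of operations and the threshold comparison are taken from the text.

```python
def encode_decode_encode(x, encoder_1, decoder, encoder_2):
    """Run the three stages and return both latents and the reconstruction."""
    z = encoder_1(x)            # first latent variables
    x_hat = decoder(z)          # reconstructed data sample
    z_again = encoder_2(x_hat)  # second latent variables, from the reconstruction
    return z, x_hat, z_again


def classify(scores, threshold):
    """True marks an anomalous sample: score strictly greater than the threshold."""
    return [score > threshold for score in scores]


def halve(vector):  # toy encoder (assumption, for illustration only)
    return [0.5 * v for v in vector]


def double(vector):  # toy decoder (assumption, for illustration only)
    return [2.0 * v for v in vector]


z, x_hat, z_again = encode_decode_encode([2.0, 4.0], halve, double, halve)
labels = classify([0.1, 0.5, 2.3, 0.5], threshold=0.5)
```

A score exactly equal to the threshold is labeled normal, matching the "less than or equal to" rule stated above.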
[0101] 1. Problem Description: Given a time series $\chi$, that is, a sequence of data in chronological order, TL denotes the length of the time series, $x_t$ denotes the vector data at time point t, and F denotes the number of variables in the time series. For multivariate time series anomaly detection tasks, the goal is to determine whether an anomaly has occurred at each time point in the time series $\chi$. Since sparse anomalies are hidden within a large amount of normal data, and labeled data is difficult to obtain, it is not possible to determine directly whether a time point $x_i$ is anomalous. This application employs an unsupervised approach that compresses and reconstructs each time point, uses the reconstruction error between the reconstructed data and the input data as the anomaly score for that time point, and combines this score with a threshold to decide whether the point is anomalous. To better extract the complex temporal dependencies and the correlations between variables in the time series, this application uses a sliding window to divide the original time series. With a preset sliding step size SP and window size T, the original time series $\chi$ is divided into a window set D for subsequent model training and testing, i.e., $D = \{P_{0 \sim T}, P_{SP \sim SP+T}, \dots, P_{i \cdot SP \sim i \cdot SP+T}, \dots\}$, where $P_{i \cdot SP \sim i \cdot SP+T} = \{x_{i \cdot SP}, x_{i \cdot SP+1}, \dots, x_{i \cdot SP+T}\}$.
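The sliding-window division can be sketched in a few lines. This sketch assumes each window holds exactly T consecutive time points starting at index i·SP, and that trailing points that do not fill a complete window are dropped. Both choices, and the function name, are assumptions for illustration.

```python
def sliding_windows(series, window_size, step):
    """Split a series of time points into windows of window_size points.

    series      : list of TL rows, one row of F variable values per time point
    window_size : T, the number of time points per window
    step        : SP, the sliding step size
    """
    last_start = len(series) - window_size
    return [
        series[start:start + window_size]
        for start in range(0, last_start + 1, step)
    ]


# Ten time points with two variables each
series = [[float(t), float(10 * t)] for t in range(10)]
windows = sliding_windows(series, window_size=4, step=2)
```

With TL = 10, T = 4, and SP = 2, the windows start at time points 0, 2, 4, and 6, so consecutive windows overlap by two time points.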
[0102] Learnable data preprocessing module: Traditional data preprocessing typically uses fixed statistics to normalize or standardize data, scaling the original data to a fixed interval to remove differences in units and scale between variables, thus accelerating convergence and improving model accuracy. However, when the training data contains outliers or atypical distributions, traditional data preprocessing may compute inaccurate statistics, leading to unsatisfactory normalization results. Furthermore, real-world time series data have many complex characteristics; for example, a multivariate time series (MTS) is often a feature-coupled time series, and traditional data preprocessing methods may alter the relative weights between features, affecting model training.
[0103] In this application, we designed a learnable data preprocessing module, which includes a learnable standardization module and a learnable weighted combination module. This module allows the model to automatically learn and adapt to the mean and variance of different data distributions, which makes data preprocessing more flexible and adaptable to changes in data characteristics. Furthermore, the module uses learnable weights to compute a weighted sum of the learnably standardized data and the original data as the final input data, enabling the model to adaptively select information contained in the original data and avoid information loss during preprocessing. Traditional data preprocessing methods are usually performed before data input, while the method proposed in this application integrates the data preprocessing process into the model, allowing the entire model to learn end-to-end. This helps the model better understand the inherent characteristics of the data, and the preprocessing parameters can be optimized using the backpropagation algorithm and trained together with the other model parameters, further improving model performance. For the j-th variable vector $P_i^j$ of a sliding time window $P_i$, the specific computational process for learnable data preprocessing is as follows:
[0104]
[0105]
[0106] Where the result of the first step is the variable vector after learnable normalization, $\alpha_j$ represents the j-th element in the learnable mean vector, $\beta_j$ represents the j-th element in the learnable variance vector, $W_{p1}$ and $W_{p2}$ represent the weights in the adaptive weighted summation, and $X_i$ represents the model input after passing through the learnable data preprocessing module.
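Because the preprocessing formulas themselves are not reproduced in the text, the following NumPy sketch shows only one plausible forward pass that is consistent with the symbol definitions above. The normalization form (subtract the learnable mean, divide by the square root of the learnable variance), the `eps` term, the use of scalar weights, and the function name `learnable_preprocess` are all assumptions. In the actual model these parameters would be updated by backpropagation rather than fixed.

```python
import numpy as np


def learnable_preprocess(P, alpha, beta, w_p1, w_p2, eps=1e-8):
    """P: (T, F) window; alpha/beta: (F,) learnable mean/variance; w_p1/w_p2: mixing weights."""
    P_norm = (P - alpha) / np.sqrt(beta + eps)   # assumed form of learnable normalization
    return w_p1 * P_norm + w_p2 * P              # weighted sum of normalized and original data


# Toy window with two variables on very different scales (made-up values).
P = np.array([[1.0, 10.0], [3.0, 30.0]])
alpha = P.mean(axis=0)   # here initialised from the data; learned during training in practice
beta = P.var(axis=0)
X = learnable_preprocess(P, alpha, beta, w_p1=1.0, w_p2=0.0)
print(np.round(X, 3))
```

Setting `w_p2` above zero lets part of the original, unnormalized signal pass through, which is the information-preserving behaviour the paragraph describes.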
[0107] 2. The robust feature extraction framework with multi-scale global-local feature coupling proposed in this application is shown in Figure 2. Its main body adopts an encoder-decoder-encoder structure, mainly including a multi-scale global-local dual-memory enhancement coding module and a nonlinear decoding module.
[0108] 2.1 Multi-scale Global-Local Dual-Memory Enhanced Encoding Module: Complex time series generally contain semantic information at different levels, such as year, month, and day. A memory module of a single dimension can record only one level of high-level semantic information, ignoring the positive role that information at other levels plays in reconstruction. Therefore, this application proposes a multi-level encoding feature fusion module that integrates memory modules of different dimensions. This module extracts information from the window to be reconstructed at multiple levels, with each granularity containing the fusion of global and local information to better reconstruct typical and atypical normal patterns. Through the multi-level encoding feature fusion module, this application constructs a three-dimensional information extraction structure with different information granularities (high-dimensional and low-dimensional) and different information breadths (global and local). This fully extracts the pattern information of normal time series, while the global and local memory modules largely prevent the model from overgeneralizing to anomalies, enabling the model to better distinguish between normal and abnormal patterns during testing.
[0109] Global Memory Enhanced Encoder: To prevent anomalies from being reconstructed well, this application designs a global memory-enhanced encoder that preserves typical global normal patterns and reconstructs the encoded latent variables based on these patterns. First, the preprocessed time window to be reconstructed passes through an encoder layer to obtain the global latent variable $Z_g$. Then, a new latent vector is reconstructed by retrieving typical features from the memory module and linearly combining them. Throughout the training phase, the memory modules share parameters. The global memory-enhanced encoder is described in detail below, and its flowchart is shown in Figure 3. The encoder layer consists of multiple stacked GRU (Gated Recurrent Unit) layers. GRU is an improved version of the RNN (Recurrent Neural Network). While inheriting the RNN's good ability to extract temporal dependencies, it alleviates the vanishing gradient problem that occurs when processing long sequences, and therefore models long sequences better than a plain RNN. GRU controls the influence of previous data on subsequent data through update and reset gates, where $u_t$ represents the update gate of the GRU, which indicates how much of the previous information needs to be updated, and the candidate intermediate state of the GRU is determined by the reset gate $r_t$, which controls the proportion of the previous unit's output $h_{t-1}$ that is used. The calculation process for GRU is as follows:
[0110]
[0111]
[0112]
[0113]
[0114] Where $x_t$ represents the input, W and b are the learnable weight and bias parameter matrices of the network, ⊙ represents the Hadamard product of two matrices, the output of the encoder layer is $Z_g = [h_1, h_2, \ldots, h_T]$, and σ(·) denotes the sigmoid operation.
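Since the GRU formulas are not reproduced in the text, the sketch below uses the commonly published GRU formulation (update gate, reset gate, tanh candidate state, convex blend). It is an assumption that this application uses exactly this variant. The parameter names (`W_u`, `U_u`, and so on), the single-layer setup, and the random initial values are all hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_encode(X, params, hidden_size):
    """Run one GRU layer over a (T, F) window; return Z_g = [h_1, ..., h_T] with shape (T, hidden)."""
    h = np.zeros(hidden_size)
    outputs = []
    for x_t in X:
        u = sigmoid(params["W_u"] @ x_t + params["U_u"] @ h + params["b_u"])  # update gate u_t
        r = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h + params["b_r"])  # reset gate r_t
        # Candidate state: the reset gate scales how much of h_{t-1} is used.
        h_cand = np.tanh(params["W_h"] @ x_t + params["U_h"] @ (r * h) + params["b_h"])
        h = (1.0 - u) * h + u * h_cand   # blend the previous state with the candidate
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
T, F, hidden = 5, 3, 4
params = {}
for gate in ("u", "r", "h"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, F))
    params[f"U_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden))
    params[f"b_{gate}"] = np.zeros(hidden)

Z_g = gru_encode(rng.normal(size=(T, F)), params, hidden)
print(Z_g.shape)  # (5, 4)
```

In practice a framework layer such as a stacked GRU module would replace this loop; the explicit version is shown only to make the roles of the two gates visible.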
[0115] After obtaining the global latent variables, this application uses global memory to reconstruct the latent variables to alleviate the model's overgeneralization to anomalies. The memory module is essentially a learnable matrix, where H represents the size of the memory matrix, K represents the feature dimension of each typical pattern in the memory matrix, and $m_i$ represents the i-th memory feature in the memory module. This application treats the global latent variable $Z_g$ as a query and calculates its similarity $\mathrm{Sim}(Z_g, m_i)$ with each memory feature. $\mathrm{Sim}(\cdot)$ is calculated as follows:
[0116] $\mathrm{Sim}(Z_g, m_i) = Z_g \cdot m_i^{T}$
[0117] Where T denotes the transpose of the matrix. Then, the obtained similarity matrix is normalized to obtain the global memory score matrix $S_g$:
[0118]
[0119] The higher the similarity of a memory feature, the more likely it represents the normal pattern of the sample, and therefore the more this memory feature should contribute to reconstructing the latent variables. The final reconstructed latent variable can be obtained through the following calculation:
[0120]
[0121] Here, each memory feature is weighted by the corresponding i-th vector in the score matrix. However, a dense score matrix $S_g$ can still reconstruct anomalies through complex combinations of multiple memory items. To mitigate this, this application follows previous methods and uses a sparse loss $Loss_{spar}$ to constrain $S_g$:
[0122]
[0123] Sparse addressing encourages the model to use fewer but more relevant memory items to represent a sample, which leads the memory to learn more informative representations.
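The global memory read can be sketched in NumPy as below. Only the dot-product similarity is taken from the text. The softmax normalization, the entropy form of the sparse loss, and all names (`global_memory_read`, `M`, `Z_hat`) are assumptions borrowed from common memory-augmented autoencoder practice, not a statement of this application's exact formulas.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def global_memory_read(Z_g, M, eps=1e-12):
    """Z_g: (T, K) queries; M: (H, K) memory items. Returns (Z_hat, S_g, sparse loss)."""
    sim = Z_g @ M.T                  # Sim(Z_g, m_i) = Z_g · m_i^T, shape (T, H)
    S_g = softmax(sim, axis=-1)      # assumed normalization: softmax over the H memory items
    Z_hat = S_g @ M                  # linear combination of memory features
    # Assumed sparse loss: mean entropy of the addressing weights (lower means sparser).
    loss_spar = float(np.mean(np.sum(-S_g * np.log(S_g + eps), axis=-1)))
    return Z_hat, S_g, loss_spar


rng = np.random.default_rng(0)
Z_g = rng.normal(size=(5, 8))    # T = 5 time steps, K = 8 features (made-up sizes)
M = rng.normal(size=(10, 8))     # H = 10 memory items
Z_hat, S_g, loss_spar = global_memory_read(Z_g, M)
print(Z_hat.shape, S_g.shape)    # (5, 8) (5, 10)
```

Because every reconstructed latent is a weighted mix of stored normal patterns, an anomalous query cannot be reproduced exactly, which is what enlarges its reconstruction error.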
[0124] The encoder with local memory enhancement: Due to the information bottleneck of the memory matrix, it can only store typical features globally and reconstruct latent variables based on these typical features. For pattern features that are normal but not globally typical, relying solely on the memory module often fails to reconstruct the time window effectively. Therefore, this application proposes using local memory enhancement to complement global memory for better reconstruction and anomaly detection. Unlike traditional methods that only use the window to be reconstructed for information compression and reconstruction, this application uses adjacent windows to provide local memory information. Adjacent windows often share more commonalities. During training and testing, this application reconstructs local latent variable features by extracting common features from adjacent windows, making the local memory module similar to the global memory module. This also alleviates the model's overgeneralization of anomalies and, based on this, better reconstructs normal time windows.
[0125] For the time window to be detected $X_t$, this application takes its adjacent windows $\{\ldots, X_{t-2}, X_{t-1}, X_{t+1}, X_{t+2}, \ldots\}$ as the input to the Local Memory Augmented Encoder (LDE). This application does not include the time window to be detected in the LDE input, to avoid learning potential anomalous pattern information from the sample itself during training, which could blur the distinction between normal and anomalous samples during testing. First, this application encodes the adjacent windows using a fully connected network. The encoded adjacent windows are then input into an encoder layer composed of stacked GRUs for feature extraction to obtain the adjacent window matrix, where L represents the number of adjacent windows and N represents the dimension of the fully connected network after encoding.
[0126] Then, this application uses the global latent variable $Z_g$, obtained in global encoding for the window to be detected $X_t$, as a query, and reconstructs the local latent variables using the adjacent window matrix as the local memory matrix. The calculation process is as follows:
[0127]
[0128] Where $S_l$ represents the local score matrix and $e_i$ represents the i-th item in the adjacent window matrix.
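A minimal sketch of the local memory read follows, assuming the same attention-style addressing as for the global memory (softmax over items, then a weighted sum). The helper `adjacent_indices`, the symmetric `radius` parameter, and the random stand-in for the encoded adjacent window matrix `E` are hypothetical. The fully connected and GRU encoding of the neighbours is omitted here.

```python
import numpy as np


def adjacent_indices(t, num_windows, radius):
    """Indices of the windows within `radius` of window t, excluding t itself."""
    return [i for i in range(t - radius, t + radius + 1) if i != t and 0 <= i < num_windows]


def local_memory_read(Z_g, E):
    """Z_g: (T, N) query from the window under test; E: (L, N) encoded adjacent windows."""
    sim = Z_g @ E.T
    sim = sim - sim.max(axis=-1, keepdims=True)
    weights = np.exp(sim)
    S_l = weights / weights.sum(axis=-1, keepdims=True)   # assumed softmax normalization
    return S_l @ E, S_l                                   # local latents and local score matrix


neighbours = adjacent_indices(t=5, num_windows=10, radius=2)
print(neighbours)  # [3, 4, 6, 7]

rng = np.random.default_rng(1)
E = rng.normal(size=(len(neighbours), 8))   # stand-in for the encoded adjacent window matrix
Z_g = rng.normal(size=(5, 8))
Z_l, S_l = local_memory_read(Z_g, E)
```

Excluding index `t` from the neighbour list mirrors the requirement that the window under test is never part of its own local memory.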
[0129] Memory Feature Fusion: After obtaining the reconstructed global and local latent variables, this application designs a learnable weight layer to fuse global and local information, thereby obtaining the fused reconstructed latent variables. The weights of the global and local latent variables are continuously updated during training to find the most suitable weight combination for reconstruction. During testing, the weights are no longer updated; instead, the weights saved at the minimum training loss are used to combine the global and local latent variables. The fused reconstructed latent variables are calculated as follows:
[0130]
[0131] Where $W_1$ and $W_2$ are the weights of the global and local latent variables in the combination, respectively.
[0132] This invention sets up global and local memory modules with stepwise varying dimensions, forming global-local encoding fusion modules of different granularities. Similar to the feature fusion of the global and local memory modules, this invention designs a weight-learnable network layer to adaptively fuse the encoding modules of different granularities, so as to extract the information of the window to be reconstructed more completely for reconstruction and anomaly detection. The final fused latent variables used for reconstruction are calculated as follows:
[0133]
[0134] Where the weights correspond to the reconstructed latent variable at each granularity, and o represents the number of levels in the multi-level information fusion.
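The two fusion steps described above (global with local, then across granularities) can be sketched as plain weighted sums. This is an assumption-laden illustration: the weights are shown as fixed scalars although the text describes them as learned, the function names are hypothetical, and the latents of different granularities are assumed to have already been brought to a common shape, which the text does not specify.

```python
import numpy as np


def fuse_global_local(Z_g_hat, Z_l_hat, w1, w2):
    """Weighted combination of reconstructed global and local latent variables."""
    return w1 * Z_g_hat + w2 * Z_l_hat


def fuse_scales(latents, scale_weights):
    """Weighted sum over o granularities; all latents are assumed to share one shape."""
    if len(latents) != len(scale_weights):
        raise ValueError("one weight is required per granularity")
    return sum(w * z for w, z in zip(scale_weights, latents))


Z_g_hat = np.ones((2, 3))    # made-up reconstructed global latent
Z_l_hat = np.zeros((2, 3))   # made-up reconstructed local latent
fused = fuse_global_local(Z_g_hat, Z_l_hat, w1=0.7, w2=0.3)
multi = fuse_scales([fused, 2.0 * fused], scale_weights=[0.5, 0.5])   # o = 2 levels
```

At test time the same functions would be called with the weight values saved from training.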
[0135] 2.2 Nonlinear Decoder Module: This module processes the reconstructed latent variables produced by the multi-level global-local fusion encoder. This application designs an MLP-based neural network as the decoder to reconstruct the input from the latent variables. The decoder consists of two fully connected layers and an activation function. The two fully connected layers map the latent variables to the original data space, and the activation function increases the nonlinearity of the network. The designed decoder has a simple structure and few parameters, which helps prevent model overfitting while still reconstructing the data well. For the extracted features, this application obtains the reconstruction result of the input window through the decoder. The calculation process is as follows:
[0136]
[0137] Where $W_{D1}$ and $W_{D2}$ represent the weight parameters of the first and second linear layers in the decoder, respectively, and GELU(·) represents the GELU activation function. This application defines the decoding reconstruction loss $Loss_{recon}$ as the mean squared error (MSE) between the input sample $X_i$ and the decoded reconstruction output, calculated using the following formula:
[0138]
[0139] Where N represents the number of training samples, and MSE(·) represents the calculation of mean squared error.
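A minimal NumPy sketch of a two-layer MLP decoder with a GELU activation and an MSE reconstruction loss is given below. The layer sizes, the bias terms, the placement of the activation between the two layers, and the use of the tanh approximation of GELU are assumptions; the text states only that the decoder has two fully connected layers and one activation function.

```python
import numpy as np


def gelu(a):
    """Tanh approximation of the GELU activation."""
    return 0.5 * a * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (a + 0.044715 * a ** 3)))


def decode(Z, W_D1, b_D1, W_D2, b_D2):
    """Map latents (T, K) back to the data space (T, F) through two linear layers."""
    return gelu(Z @ W_D1 + b_D1) @ W_D2 + b_D2


def mse(X, X_hat):
    """Mean squared error between an input window and its reconstruction."""
    return float(np.mean((np.asarray(X) - np.asarray(X_hat)) ** 2))


rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 4))                    # made-up fused latent variables
W_D1, b_D1 = rng.normal(size=(4, 6)), np.zeros(6)
W_D2, b_D2 = rng.normal(size=(6, 3)), np.zeros(3)
X_hat = decode(Z, W_D1, b_D1, W_D2, b_D2)
print(X_hat.shape)  # (5, 3)
```

Averaging `mse` over the N training windows gives a reconstruction loss of the kind described above.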
[0140] To facilitate better model training, this application defines the overall model loss L as the sum of the global sparsity loss, the local latent variable loss, and the time window reconstruction loss, calculated as follows:
[0141] $L = \lambda_1 Loss_{recon} + \lambda_2 Loss_{latent} + \lambda_3 Loss_{spar\_sum}$,
[0142]
[0143]
[0144] Where the latent variable terms are the i-th latent variable from the first reconstruction and the i-th latent variable from the second reconstruction; λ1, λ2, and λ3 are weighting factors used to balance the three losses; and the sparse loss terms are the sparse loss of the E1 encoder and the sparse loss of the E2 encoder for the j-th sample at the i-th scale. In the training of this application, λ1 = λ2 = λ3 = 1 is set so that the three parts are equally important.
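The weighted total loss can be illustrated with plain Python. The weighted-sum form and the default λ values of 1 follow the text. The use of MSE between the first and second latent variables and the simple summation of per-scale sparse losses are assumptions, because the corresponding formulas are not reproduced. All numeric values are made up.

```python
def mse(a, b):
    """Mean squared error between two equally long sequences of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(loss_recon, loss_latent, loss_spar_sum, lambdas=(1.0, 1.0, 1.0)):
    """L = λ1·Loss_recon + λ2·Loss_latent + λ3·Loss_spar_sum."""
    l1, l2, l3 = lambdas
    return l1 * loss_recon + l2 * loss_latent + l3 * loss_spar_sum


# Hypothetical values for one training step.
z_first = [0.0, 1.0, 2.0]      # latent variable from the first encoding
z_second = [0.0, 1.0, 4.0]     # latent variable from the second encoding
loss_latent = mse(z_first, z_second)

spar_e1 = [0.2, 0.1]           # per-scale sparse losses of encoder E1
spar_e2 = [0.3, 0.4]           # per-scale sparse losses of encoder E2
loss_spar_sum = sum(spar_e1) + sum(spar_e2)

loss = total_loss(0.5, loss_latent, loss_spar_sum)
```

Changing `lambdas` would shift the balance between reconstruction quality, latent consistency, and addressing sparsity.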
[0145] 3. Anomaly Detection
[0146] The proposed deep hybrid normalization module is trained together with the autoencoder model. During the testing phase, similar to many other reconstruction-based methods, the energy meter anomaly detection method (MSDMM) based on a dual-memory augmented autoencoder in this application obtains its anomaly score by calculating the reconstruction error of the test data at each time step, i.e.:
[0147]
[0148] Where $AS_t$ represents the anomaly score at time step t, $x_t$ represents the unreconstructed data at time t, and the other term is the reconstructed data at time point t. The choice of threshold depends on the application scenario, and there are also many studies on dynamically configuring thresholds based on anomaly scores. This invention focuses on designing a framework for learning high-level semantic features of data and performing anomaly detection. Therefore, the experimental results reported in this invention are based on the best-performing threshold, as has been done in previous work.
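A per-time-step anomaly score based on reconstruction error can be sketched as follows. The exact error form is not reproduced in the text, so the squared error summed over the F variables is an assumption, as is the function name `anomaly_scores`.

```python
def anomaly_scores(x, x_hat):
    """Return one score per time step: squared reconstruction error summed over variables."""
    return [
        sum((a - b) ** 2 for a, b in zip(row, row_hat))
        for row, row_hat in zip(x, x_hat)
    ]


# Two time steps with two variables each (made-up values).
x = [[1.0, 2.0], [3.0, 4.0]]        # original data
x_hat = [[1.0, 2.0], [0.0, 0.0]]    # reconstructed data
scores = anomaly_scores(x, x_hat)
print(scores)  # [0.0, 25.0]
```

The first time step is reconstructed perfectly and scores zero, while the poorly reconstructed second step receives a large score and would exceed any reasonable threshold.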
[0149] Furthermore, MSDMM was compared with 15 other advanced models on five authoritative real-world datasets representing the diversity of time-series data distributions and actual smart meter datasets, demonstrating the effectiveness and advancement of the proposed multi-dimensional time-series anomaly detection method for electricity meters.
[0150] 1. Evaluation Metric: AUC-ROC was selected as the evaluation metric to assess the performance of the proposed method and the baseline. AUC-ROC is a commonly used metric in anomaly detection, representing the area under the receiver operating characteristic curve (ROC) with FPR (False Positive Rate) and TPR (True Positive Rate) as the x and y axes, obtained at different thresholds. AUC-ROC directly reflects the anomaly detection performance of the algorithm after excluding the influence of the threshold. Its range is from 0 to 1; a perfect algorithm has an AUC-ROC value of 1, while a random guessing model has an AUC value close to 0.5. The formulas for calculating TPR and FPR are shown below.
[0151] $TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{FP + TN}$
[0152] Wherein, TP (True Positive) and FP (False Positive) are the numbers of time points correctly and incorrectly detected as anomalous, respectively, and TN (True Negative) and FN (False Negative) are the numbers of time points correctly and incorrectly detected as normal, respectively.
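To make the metric concrete, the sketch below computes TPR and FPR from binary labels and predictions using the standard definitions, and estimates AUC-ROC by sweeping a threshold over the distinct scores and applying the trapezoidal rule. The function names and the toy labels and scores are illustrative assumptions, not data from this application.

```python
def tpr_fpr(labels, preds):
    """Return (TPR, FPR) for binary labels and predictions (1 = anomalous, 0 = normal)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def auc_roc(labels, scores):
    """Area under the ROC curve via a threshold sweep and the trapezoidal rule."""
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = [tpr_fpr(labels, [1 if s >= th else 0 for s in scores]) for th in thresholds]
    area = 0.0
    for (tpr0, fpr0), (tpr1, fpr1) in zip(points, points[1:]):
        area += (fpr1 - fpr0) * (tpr0 + tpr1) / 2.0
    return area


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, scores))  # 0.75
```

Because the sweep covers every threshold, the result does not depend on any single threshold choice, which is why the metric is described as threshold-independent.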
[0153] 2. Comparison Methods: This invention compares the proposed method with 15 multivariate time series anomaly detection methods, including traditional machine learning methods: LOF, OCSVM, Isolation Forest (IF); prediction-based methods: GDN, GTA; and reconstruction methods: MSCRED, USAD, InterFusion, TranAD, AT, CAE-AD, RAE, DiffAD, TSMAE, and MAUTAD. Specific information on the comparison methods is shown in Table 1.
[0154] Table 1 Comparison Methods
[0155]
[0156]
[0157] 3. Implementation Details: MSDMM was implemented using Python 3.8 and PyTorch 1.11. During training, MSDMM used the Adam optimizer for model optimization. The learning rate was set to 1e-4, the batch size to 32, and the number of epochs to 50. Early stopping was used: when the reconstruction loss on the validation set has stayed above its historical minimum for 5 consecutive epochs, training stops, and the network parameters with the minimum validation loss are retained as the optimal training result. All experiments were conducted on a workstation with an Intel(R) Core(TM) i9-10900X 10-core 3.70GHz CPU and an NVIDIA GeForce RTX 3090 GPU. All experimental results used in this application are the average of 5 independent runs with different seeds.
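The early-stopping rule can be sketched independently of any training framework. The function below is a hypothetical helper that scans a list of per-epoch validation losses and reports where training would stop and which epoch's parameters would be kept. Treating a loss equal to the minimum as "no improvement" is an assumption, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch   # stop here; keep the parameters of best_epoch
    return len(val_losses) - 1, best_epoch  # patience never exhausted; ran all epochs


# Made-up validation losses: the minimum occurs at epoch 1.
losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.95, 0.9, 0.7]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(stop_epoch, best_epoch)  # 6 1
```

In this example the lower loss at epoch 7 is never reached, which illustrates the trade-off between patience and the risk of stopping before a later improvement.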
[0158] 4. Introduction to Public Datasets: Five real-world datasets from three scenarios were used. Table 2 summarizes the properties of these datasets.
[0159] Secure Water Treatment (SWaT). SWaT is data collected by 51 sensors in a continuously operating water treatment system, which records anomalous events caused by cyber and physical attacks.
[0160] Server Machine Dataset (SMD). SMD is a dataset collected from server machines by a large internet company over a five-week period and publicly released, with 38 monitoring metrics.
[0161] PSM (Pooled Server Metrics) is a dataset collected from multiple application server nodes within eBay, comprising 26 dimensions.
[0162] Mars Science Laboratory (MSL) and Soil Moisture Active Passive (SMAP). Both MSL and SMAP are real-world datasets from NASA, with 55 and 25 dimensions respectively, and contain telemetry anomaly data derived from Incident Surprise Anomaly (ISA) reports from spacecraft monitoring systems.
[0163] Table 2 Attributes of the datasets in the experiment
[0164]
[0165]
[0166] 5. Evaluation of Results on Public Datasets: Table 3 reports the scores of MSDMM and other baseline models on the AUC metric. Bold text indicates the best score on each dataset, and underlined text indicates the second-best score. Overall, MSDMM's average AUC score is significantly higher than other baseline models, and it achieves the best average ranking, demonstrating that MSDMM outperforms all the comparison methods.
[0167] Table 3 Comparison results of baseline method and MSDMM under various indicators.
[0168]
[0169] 6. Introduction to the Smart Meter Dataset: The specific characteristics of the Smart Meter Dataset (ELE) are shown in Table 4. This dataset is collected from 9 physical three-phase meters across multiple distribution areas. Each meter includes 22 sensor values: current (phase A, phase B, phase C), voltage (phase A, phase B, phase C), energy readings (positive active), energy readings (reverse active), energy readings (positive reactive), energy readings (reverse reactive), active power (phase A, phase B, phase C, total), reactive power (phase A, phase B, phase C, total), and power factor (phase A, phase B, phase C, total).
[0170] Table 4 Characteristics of Actual Electricity Meter Data Sets
[0171]
[0172] 7. Evaluation of Results on the Smart Meter Dataset: Table 5 shows the AUC results of the embodiments of the present invention and other comparative methods on the smart meter dataset. MSDMM achieved the highest AUC (0.6231), ahead of the second-best method MAUTAD (0.6095), indicating that its anomaly detection performance on the actual smart meter dataset is superior to the other methods and demonstrating the reliability of MSDMM in real-world environments.
[0173] Table 5 Evaluation of the Electricity Meter Dataset Results
[0174]
| Method | AUC |
| --- | --- |
| OCSVM | 0.6009 |
| LOF | 0.5691 |
| iForest | 0.5256 |
| MSCRED | 0.5722 |
| USAD | 0.461 |
| InterFusion | 0.5976 |
| GDN | 0.5617 |
| GTA | 0.5429 |
| TranAD | 0.5048 |
| AT | 0.4947 |
| CAE_AD | 0.5797 |
| TSMAE | 0.5341 |
| RAE | 0.5347 |
| MAUTAD | 0.6095 |
| DiffAD | 0.5344 |
| MSDMM | 0.6231 |
[0175] Therefore, this invention proposes a multi-scale, dual-memory-enhanced anomaly detection method for smart energy meters that couples global and local robust features. Unlike traditional anomaly detection methods that use only a single fixed window for feature extraction and reconstruction, this invention proposes a global-local dual-memory-enhanced autoencoder that takes consecutive adjacent windows as input. It extracts latent variables that fuse global and local features, and reconstructs these latent variables using typical global features and common features of the local context. Finally, it fuses multi-level features to construct the final latent variables for window reconstruction, thereby completing anomaly detection. The proposed method effectively suppresses the model's undesirable generalization to anomalies while limiting the loss of normal pattern reconstruction ability. Furthermore, by simultaneously considering semantic information at different levels, it obtains more representative robust features, increases the distinction between normal and anomalous samples, and improves anomaly detection performance.
[0176] Exemplary device
[0177] Figure 4 is a schematic diagram of the structure of an energy meter anomaly detection device based on a dual-memory enhanced autoencoder provided in an exemplary embodiment of the present invention. As shown in Figure 4, the device 400 includes:
[0178] The acquisition module 410 is used to acquire multivariate long time series data from the historical detection of the energy meter under test;
[0179] The partitioning module 420 is used to partition the multivariate long time series data into multiple time windows of a preset window length;
[0180] The output module 430 is used to input the multiple time window data and their adjacent time window data into a pre-trained anomaly detection model and output the reconstructed data corresponding to the time window data, wherein the anomaly detection model adopts a dual-memory augmented autoencoder;
[0181] The determination module 440 is used to determine the anomaly score of each time point of the data in each time window based on the reconstructed data and the original data of each time window, and to determine the degree of anomaly of the energy meter under test at each time point based on the anomaly score.
[0182] Optionally, after acquiring the multivariate long time series data from the historical detection of the energy meter under test, the device 400 further includes: a preprocessing module for normalizing the multivariate long time series data, wherein the preprocessing process is as follows:
[0183]
[0184]
[0185] Where $P_i^j$ represents the original input vector, the result of the first step is the variable vector after learnable normalization, $\alpha_j$ represents the j-th element in the learnable mean vector, $\beta_j$ represents the j-th element in the learnable variance vector, $W_{p1}$ and $W_{p2}$ represent the weights in the adaptive weighted summation, and $X_i$ represents the model input after passing through the learnable data preprocessing module.
[0186] Optionally, the training process of the anomaly detection model in output module 430 is as follows:
[0187] The acquisition submodule is used to acquire multivariate time series sample data from multiple historical energy meters and merge them into a single set of multivariate long time series sample data.
[0188] The partitioning submodule is used to window the multivariate long time series sample data and divide it into multiple time windows of sample data with a preset window length;
[0189] The first output submodule is used to extract features from each time window sample to be detected in multiple time window sample data according to the pre-built global memory enhancement encoder, and output the reconstructed global latent variables of each time window sample to be detected.
[0190] The second output submodule is used to output the reconstructed local latent variables of each time window sample to be detected based on the pre-built local memory enhancement encoder and the neighboring window samples of each time window sample to be detected.
[0191] The third output submodule is used to fuse the reconstructed global latent variables and the reconstructed local latent variables using the weight learning layer to output the reconstructed latent variables of each sample in the time window to be detected.
[0192] The fourth output submodule is used to decode the reconstructed latent variables of each sample in the time window to be detected using an MLP-based neural network decoder, and output the reconstructed sample data of each sample in the time window to be detected.
[0193] The first determination submodule is used to determine the total loss of the anomaly detection model through preset global sparse loss, local latent variable loss and time window reconstruction loss;
[0194] The second determination submodule is used to update and optimize the network and parameters of the anomaly detection model based on the total loss, and to determine the anomaly detection model.
[0195] Optionally, the first output submodule includes:
[0196] The first output unit is used to pass each sample in the time window to be detected through the encoder layer and output the global latent variable, wherein the encoder layer is composed of multiple stacked GRUs;
[0197] The second output unit is used to input global latent variables into the global memory layer to reconstruct global latent variables, and output the reconstructed global latent variables. The calculation process of the encoder layer is as follows:
[0198]
[0199]
[0200]
[0201]
[0202] In the formula, $x_t$ represents the input sample to be detected within the time window; W and b are the learnable weight and bias parameter matrices of the network; ⊙ represents the Hadamard product of two matrices; the output of the encoder layer is the global latent variable $Z_g = [h_1, h_2, \ldots, h_T]$, where $h_t$ represents the output at time t; σ(·) denotes the sigmoid operation; T represents the number of time points within the window; $u_t$ represents the update gate of the GRU, which indicates how much of the previous information needs to be updated; and the candidate intermediate state of the GRU is determined by the reset gate $r_t$, which controls the proportion of the previous unit's output $h_{t-1}$ that is used;
[0203] The calculation process of the global memory layer is as follows:
[0204]
[0205] where,
[0206]
[0207] $\mathrm{Sim}(Z_g, m_i) = Z_g \cdot m_i^{T}$
[0208] In the formula, the memory module is a learnable matrix; H represents the size of the memory matrix; K represents the feature dimension of each typical pattern in the memory matrix; $m_i$ represents the i-th memory feature in the memory module; $\mathrm{Sim}(Z_g, m_i)$ represents the similarity between the global latent variable and each memory feature; each memory feature is weighted by the corresponding i-th vector in the score matrix; and $S_g$ is the global memory score matrix obtained by normalizing the similarity matrix.
[0209] Optionally, the second output submodule includes:
[0210] The generation unit is used to encode the adjacent window samples of each time window sample to be detected through a fully connected network, and generate adjacent window encoded samples.
[0211] The third output unit is used to input the coded samples of adjacent windows into the encoder layer stacked by GRUs and output the adjacent window sample matrix.
[0212] The fourth output unit is used to reconstruct local latent variables by using the global latent variables of each window sample as a query and the adjacent window sample matrix as a memory matrix, and outputs the reconstructed local latent variables.
[0213] The calculation process for local latent variables is as follows:
[0214]
[0215] In the formula, $S_l$ represents the local score matrix, $e_i$ represents the i-th item in the adjacent window sample matrix, $Z_g$ is the global latent variable, L represents the number of adjacent windows in the adjacent window sample matrix, and B represents the dimension of the fully connected network after encoding.
[0216] Optionally, the formula for reconstructing the sample data is:
[0217]
[0218] where,
[0219]
[0220] In the formula, the fused result is the reconstructed latent variable; $W_1$ and $W_2$ are the weights of the reconstructed global latent variables and the reconstructed local latent variables, respectively; the remaining weights correspond to the reconstructed latent variable at each granularity; and o represents the number of levels in the multi-level information fusion.
[0221] Optionally, the formula for the total loss is:
[0222] $L = \lambda_1 Loss_{recon} + \lambda_2 Loss_{latent} + \lambda_3 Loss_{spar\_sum}$
[0223] where,
[0224]
[0225]
[0226]
[0227] In the formula, $Loss_{spar\_sum}$ is the global sparse loss, $Loss_{latent}$ is the local latent variable loss, and $Loss_{recon}$ is the time window reconstruction loss; the latent variable terms are the i-th latent variable from the first reconstruction and the i-th latent variable from the second reconstruction; λ1, λ2, and λ3 are weighting factors used to balance the three losses; the sparse loss terms are the sparse loss of the first encoder E1 in the encoder-decoder-encoder structure used in the anomaly detection model for the j-th sample at the i-th scale, and the sparse loss of the second encoder E2 in that structure for the j-th sample at the i-th scale; N is the number of training samples; and MSE(·) is the mean squared error.
[0228] Optionally, the formula for calculating the outlier score is:
[0229]
[0230] In the formula, $AS_t$ is the anomaly score at time step t, $x_t$ is the unreconstructed time window data at time t, and the other term is the reconstructed data at time t.
[0231] Exemplary electronic devices
[0232] Figure 5 shows the structure of an electronic device provided in an exemplary embodiment of the present invention. As shown in Figure 5, the electronic device 50 includes one or more processors 51 and memory 52.
[0233] The processor 51 may be a central processing unit (CPU) or other form of processing unit with data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
[0234] The memory 52 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 51 may execute the program instructions to implement the methods of the software programs of the various embodiments of the present invention described above, and/or other desired functions. In one example, the electronic device may also include an input device 53 and an output device 54, these components being interconnected via a bus system and/or other forms of connection mechanisms (not shown).
[0235] In addition, the input device 53 may also include, for example, a keyboard, a mouse, etc.
[0236] The output device 54 can output various information to the outside. The output device 54 may include, for example, a display, a speaker, a printer, and a communication network and its connected remote output devices, etc.
[0237] Of course, for the sake of simplicity, Figure 5 shows only some of the components of the electronic device that are relevant to the present invention, omitting components such as buses and input / output interfaces. In addition, the electronic device may include any other suitable components depending on the specific application.
[0238] Exemplary computer program products and computer-readable storage media
[0239] In addition to the methods and apparatus described above, embodiments of the present invention may also be computer program products, which include computer program instructions that, when executed by a processor, cause the processor to perform the steps in the methods according to various embodiments of the present invention described in the "Exemplary Methods" section above.
[0240] The computer program product can be written in any combination of one or more programming languages to perform the operations of the embodiments of the present invention. The programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C or similar languages. The program code can be executed entirely on the user's computing device, partially on the user's computing device, as a standalone software package, partially on the user's computing device and partially on a remote computing device, or entirely on a remote computing device or server.
[0241] Furthermore, embodiments of the present invention may also be computer-readable storage media storing computer program instructions thereon, which, when executed by a processor, cause the processor to perform the steps of the methods according to various embodiments of the present invention described in the "Exemplary Methods" section above.
[0242] The computer-readable storage medium may be any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof.
[0243] The basic principles of the present invention have been described above with reference to specific embodiments. However, it should be noted that the advantages, benefits, and effects mentioned in the present invention are merely examples and not limitations, and should not be considered essential to every embodiment of the present invention. Furthermore, the specific details disclosed above are provided only for purposes of illustration and ease of understanding, and are not limiting. They do not require the present invention to be implemented using those specific details.
[0244] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments. For parts that are identical or similar between embodiments, the embodiments may be cross-referenced. Since the system embodiments largely correspond to the method embodiments, their description is relatively brief; for the relevant parts, refer to the descriptions in the method embodiments.
[0245] The block diagrams of components, apparatus, devices, and systems involved in this invention are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatus, devices, and systems can be connected, arranged, and configured in any manner. Words such as “comprising,” “including,” “having,” etc., are open-ended terms meaning “including but not limited to,” and are used interchangeably with that phrase. The terms “or” and “and” as used herein mean “and / or,” and are used interchangeably with it unless the context clearly indicates otherwise. The term “such as” as used herein means “such as but not limited to,” and is used interchangeably with it.
[0246] The methods and systems of the present invention may be implemented in many ways. For example, they may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order of steps for the methods is for illustrative purposes only, and the steps of the methods of the present invention are not limited to the order specifically described above unless otherwise specifically stated. Furthermore, in some embodiments, the present invention may also be implemented as a program recorded on a recording medium, the program comprising machine-readable instructions for implementing the methods according to the present invention. Thus, the present invention also covers recording media storing programs for performing the methods according to the present invention.
[0247] It should also be noted that in the systems, apparatus, and methods of the present invention, the components or steps can be decomposed and / or recombined. Such decompositions and / or recombinations should be considered equivalents of the present invention. The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein can be applied to other aspects without departing from the scope of the invention. Therefore, the invention is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
[0248] The above description has been given for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the invention to the forms disclosed herein. Although numerous exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, alterations, additions, and sub-combinations therein.
Claims
1. A method for detecting anomalies in an energy meter based on a dual-memory enhanced autoencoder, characterized in that it comprises:
acquiring multivariate long time series data of historical measurements of the energy meter under test;
dividing the multivariate long time series data into multiple time windows of a preset window length;
inputting the multiple time window data and their adjacent time window data into a pre-trained anomaly detection model, and outputting the reconstructed data corresponding to each time window, wherein the anomaly detection model adopts a dual-memory enhanced autoencoder;
determining the anomaly score for each time point in each time window based on the reconstructed data and the original data of that time window, and determining the degree of anomaly of the energy meter under test based on the anomaly score;
wherein the training process of the anomaly detection model is as follows:
acquiring multivariate time series sample data of historical measurements from multiple energy meters and merging them into a single multivariate long time series sample;
windowing the multivariate long time series sample and dividing it into multiple time window samples of the preset window length;
based on a pre-built global memory-enhanced encoder, performing feature extraction on each time window sample to be detected among the multiple time window samples, and outputting the reconstructed global latent variable of each time window sample to be detected;
based on a pre-built local memory-enhanced encoder and the neighboring window samples of each time window sample to be detected, outputting the reconstructed local latent variable of each time window sample to be detected;
fusing the reconstructed global latent variable and the reconstructed local latent variable through a weighted learning layer to generate memory features, and outputting the reconstructed latent variable of each time window sample to be detected;
decoding the reconstructed latent variable of each time window sample to be detected using an MLP-based neural network decoder, and outputting the reconstructed sample data of each time window sample to be detected;
determining the total loss of the anomaly detection model from a pre-defined global sparse loss, local latent variable loss, and time window reconstruction loss;
updating and optimizing the network and its parameters based on the total loss to obtain the anomaly detection model;
wherein the reconstructed sample data is calculated by a formula whose symbols denote, respectively: the reconstructed latent variable; the weights with which the reconstructed global latent variable and the reconstructed local latent variable are combined; the weight corresponding to each granularity of the reconstructed latent variable; the number of layers in the multi-level information fusion; the global latent variable; and Z_l, the local latent variable;
and wherein the anomaly score is calculated by a formula whose symbols denote, respectively: the outlier score at time step t; the original (unreconstructed) time window data at time t; and the reconstructed data at time t.
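The windowing step of claim 1 and the pairing of each window with its adjacent windows can be sketched as follows. This is only an illustration under stated assumptions: the windows are taken as non-overlapping, an incomplete trailing window is dropped, and neighbors are taken from both sides of the window under detection. The claim fixes none of these choices, and the function names `split_windows` and `adjacent_windows` are hypothetical.

```python
def split_windows(series, window_len):
    """Split a long series into consecutive windows of window_len steps.

    ASSUMPTION: windows do not overlap and a short tail is discarded.
    """
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    n_windows = len(series) // window_len
    return [series[i * window_len:(i + 1) * window_len] for i in range(n_windows)]


def adjacent_windows(windows, index, k=1):
    """Return up to k windows on each side of windows[index].

    ASSUMPTION: neighbors come from both sides; k is an illustrative parameter.
    """
    lo = max(0, index - k)
    hi = min(len(windows), index + k + 1)
    return [windows[i] for i in range(lo, hi) if i != index]


# Seven time steps of a two-variable series, window length 3.
series = [[t, 10 * t] for t in range(7)]
windows = split_windows(series, 3)
print(len(windows))                  # number of complete windows
print(adjacent_windows(windows, 0))  # neighbors of the first window
```

Each window and its neighbor list would then be passed together to the model, the window as the sample to be detected and the neighbors as local context.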
2. The method according to claim 1, characterized in that, after acquiring the multivariate long time series data of the historical measurements of the energy meter under test, the method further comprises: performing normalization preprocessing on the multivariate long time series data, wherein the symbols of the preprocessing formulas denote, respectively: the original input vector; the variable vector after learnable normalization; the j-th element of the learnable mean vector; the j-th element of the learnable variance vector; the weights of the adaptive weighted summation; and the model input after passing through the learnable data preprocessing module.
3. The method according to claim 1, characterized in that performing feature extraction, based on the pre-built global memory-enhanced encoder, on each time window sample to be detected among the multiple time window samples, and outputting the reconstructed global latent variable of each time window sample to be detected, comprises:
passing each time window sample to be detected through an encoder layer to output a global latent variable, wherein the encoder layer is composed of multiple stacked GRUs;
inputting the global latent variable into a global memory layer to reconstruct it, and outputting the reconstructed global latent variable;
wherein, in the calculation process of the encoder layer, the symbols denote, respectively: the input sample within the time window to be detected; the learnable weight and bias parameter matrices of the network; the Hadamard product of two matrices; the global latent variable output by the encoder layer, composed of the outputs at each time t, where T is the number of time points within the window; the update gate of the GRU, which indicates how much of the previous information needs to be updated; and the candidate intermediate state of the GRU, in which the reset gate controls the proportion of the output of the previous unit that is retained;
and wherein, in the calculation process of the global memory layer, the symbols denote, respectively: the learnable memory matrix; the size of the memory matrix; the feature dimension of each typical pattern in the memory matrix; the i-th memory feature of the memory module; the similarity between the global latent variable and each memory feature; the i-th vector of the score matrix; and the global memory score matrix, which is obtained by normalizing the similarity matrix.
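The global memory layer of claim 3 scores a latent variable against every memory feature, normalizes the scores, and reconstructs the latent variable from the memory. A minimal sketch of that read operation follows. It assumes cosine similarity as the similarity measure and softmax as the normalization, neither of which is confirmed by the text above, and `memory_read` is a hypothetical name. The local memory layer of claim 4 could reuse the same read, with the matrix of encoded adjacent windows in place of the learnable memory matrix.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(z, memory):
    """Reconstruct latent vector z as a weighted sum of memory rows.

    memory: list of N memory features, each a list of C floats.
    ASSUMPTION: similarity is cosine and normalization is softmax.
    Returns (reconstructed_z, score_vector).
    """
    scores = softmax([cosine(z, m) for m in memory])
    dim = len(memory[0])
    z_hat = [sum(scores[i] * memory[i][d] for i in range(len(memory)))
             for d in range(dim)]
    return z_hat, scores


z_hat, scores = memory_read([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(z_hat, scores)
```

Because the output is always a combination of stored patterns, an input that resembles no memory feature tends to be reconstructed poorly, which is the property a memory-enhanced autoencoder relies on to separate normal from anomalous data.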
4. The method according to claim 3, characterized in that outputting, based on the pre-built local memory-enhanced encoder and the neighboring window samples of each time window sample to be detected, the reconstructed local latent variable of each time window sample to be detected, comprises:
encoding the adjacent window samples of each time window sample to be detected through a fully connected network to generate adjacent window encoded samples;
inputting the adjacent window encoded samples into an encoder layer consisting of stacked GRUs, and outputting the adjacent window sample matrix;
using the global latent variable of each time window sample to be detected as the query and the adjacent window sample matrix as the memory matrix to reconstruct the local latent variable, and outputting the reconstructed local latent variable;
wherein, in the calculation process of the local latent variable, the symbols denote, respectively: the local score matrix; the i-th item of the adjacent window sample matrix; the global latent variable; the adjacent window sample matrix; the number of adjacent windows; and the dimension after encoding by the fully connected network.
5. The method according to claim 1, characterized in that, in the formula for the total loss, the symbols denote, respectively: Loss_spar_sum, the global sparse loss; Loss_latent, the local latent variable loss; Loss_recon, the time window reconstruction loss; the i-th latent variable from the first reconstruction; the i-th latent variable from the second reconstruction; the weighting factors used to balance the three losses; the sparse loss of the j-th sample at the i-th scale in the first encoder of the encoder-decoder-encoder structure used in the anomaly detection model; the sparse loss of the j-th sample at the i-th scale in the second encoder of the encoder-decoder-encoder structure; N, the number of training samples; and the mean square error operation.
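How the three losses of claim 5 might combine can be sketched as below. The loss formula is not reproduced in the text, so this sketch rests on several assumptions: the sparse loss of a sample is the entropy of its memory score vector, the sparse term averages those entropies over both encoders, the latent loss is the mean square error between the first and second reconstructed latent variables, and two weighting factors (`lam_spar`, `lam_latent`, both hypothetical names and values) scale the sparse and latent terms.

```python
import math


def mse(a, b):
    """Mean square error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def entropy(scores):
    """Shannon entropy of a score vector; zero entries contribute nothing."""
    return sum(-s * math.log(s) for s in scores if s > 0.0)


def total_loss(x, x_hat, z_first, z_second, scores_enc1, scores_enc2,
               lam_spar=0.01, lam_latent=0.1):
    """Weighted sum of reconstruction, sparse, and latent losses.

    scores_enc1 / scores_enc2: lists of memory score vectors produced by
    the first and second encoder (one vector per scale and sample).
    ASSUMPTION: entropy sparsity and this particular weighting scheme.
    """
    loss_recon = mse(x, x_hat)
    loss_latent = mse(z_first, z_second)
    all_scores = scores_enc1 + scores_enc2
    loss_spar_sum = sum(entropy(s) for s in all_scores) / len(all_scores)
    return loss_recon + lam_spar * loss_spar_sum + lam_latent * loss_latent


loss = total_loss([1.0, 2.0], [1.0, 4.0], [0.0, 0.0], [1.0, 1.0],
                  [[0.5, 0.5]], [[1.0, 0.0]], lam_spar=1.0, lam_latent=1.0)
print(loss)
```

Under these assumptions, a sharply peaked score vector has low entropy, so minimizing the sparse term pushes each sample to rely on only a few memory features.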
6. An energy meter anomaly detection device based on a dual-memory enhanced autoencoder, used to implement the method described in any one of claims 1-5, characterized in that it comprises:
an acquisition module, used to acquire multivariate long time series data of historical measurements of the energy meter under test;
a partitioning module, used to divide the multivariate long time series data into multiple time windows of a preset window length;
an output module, used to input the multiple time window data and their adjacent time window data into a pre-trained anomaly detection model and output the reconstructed data corresponding to each time window, wherein the anomaly detection model adopts a dual-memory enhanced autoencoder;
a determination module, used to determine the anomaly score of each time point in each time window based on the reconstructed data and the original data, and to determine the degree of anomaly of the energy meter under test at each time point based on the anomaly score.
7. A computer-readable storage medium, characterized in that the storage medium stores a computer program for performing the method described in any one of claims 1-5.