A user electricity theft identification method and system based on load modeling and multi-model detection
By constructing a baseline load curve and a typical daily electricity consumption behavior curve, and by combining deep neural networks with multiple traditional machine learning models, the invention addresses the accuracy and stability problems of electricity theft identification in low-voltage distribution networks and achieves refined modeling and interpretable risk classification of electricity theft risk.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: NANJING INST OF TECH
- Filing Date: 2026-03-19
- Publication Date: 2026-06-19
AI Technical Summary
Existing technologies are insufficient to effectively identify electricity theft in low-voltage distribution networks, especially when there are many users, complex electricity consumption behaviors, and significant noise interference. Traditional methods are prone to false alarms and missed alarms, and deep learning solutions lack explicit modeling of users' historical and current behaviors, resulting in insufficient stability and interpretability of the identification results.
By constructing an annual baseline load curve and a typical daily electricity consumption behavior curve, and combining deep neural networks with various traditional machine learning models, differential feature vectors are calculated and multi-model joint decision-making is performed. The probability of electricity theft risk is integrated to generate an electricity theft risk level, and the detection basis is displayed through visualization.
It improves the accuracy and robustness of electricity theft identification, reduces the false positive rate, realizes fine modeling and interpretable risk classification of electricity theft behavior, and enhances the intelligence level of low-voltage power distribution electricity theft identification.
Figure: abstract drawing (CN122241584A_ABST)
Description
Technical Field
[0001] This invention belongs to the technical field of abnormal electricity consumption behavior identification, and specifically relates to a user electricity theft identification method and system based on load modeling and multi-model detection.
Background Technology
[0002] Low-voltage distribution networks have a large number of end users with complex and variable electricity consumption behaviors. With the increasing integration of distributed power sources and the growing variety of electrical equipment, distribution companies face increasing pressure to control line losses and combat electricity theft. Low-voltage user electricity theft is typically characterized by its high degree of concealment, diverse methods, and long duration. Traditional methods relying on manual inspections, experience-based judgment, or simple report comparisons often only passively detect problems when there are obvious abnormalities in electricity consumption or a significant increase in line losses, making it difficult to locate suspicious users in a timely and accurate manner, easily leading to economic losses and management risks. In existing technologies, a common approach is to perform simple statistical analysis based on electricity meter readings, such as year-on-year or month-on-month comparisons of monthly electricity consumption, or setting fixed thresholds for a few indicators such as peak-valley electricity ratio and power factor, triggering an alarm when an indicator exceeds a preset range. While simple to implement, this method relies on only a few statistical quantities and cannot comprehensively depict the load variation patterns of users on an annual and daily scale. For users with significant seasonal fluctuations and large differences in electricity consumption patterns between weekdays and non-weekdays, false alarms and missed alarms are likely, limiting the ability to identify electricity theft.
[0003] To improve identification accuracy, existing technologies have attempted to incorporate traditional machine learning methods to model and classify historical electricity consumption data of low-voltage users. For example, models such as decision trees, logistic regression, or support vector machines are used to distinguish between "normal users" and "suspected electricity thieves" through supervised learning. While these methods introduce multi-dimensional features to some extent, they often rely on human experience to select a limited number of features, such as monthly electricity consumption, maximum daily load, and load factor. They fall short in characterizing the shape of the load time series itself, the timing of peak and valley occurrences, and the differences in patterns between weekdays and holidays. Furthermore, in actual low-voltage power distribution scenarios, the number of electricity theft samples is relatively small, the categories are severely imbalanced, and the field data is often mixed with measurement noise and outliers. Single traditional models are highly sensitive to small samples and noise disturbances, and their generalization ability and robustness are insufficient to meet engineering requirements.
[0004] In recent years, with the development of deep learning, some technical solutions have begun to use models such as convolutional neural networks and recurrent neural networks to automatically extract features from, and classify, electricity load time series, hoping to improve electricity theft detection by leveraging the representational capabilities of deep networks. However, most existing deep learning solutions directly use the original daily load curve or short-term historical data as input and lack explicit modeling of the deviation between the user's historical annual baseline behavior and the behavior during the current detection period. They fail to make effective use of long-term statistical information, and rarely quantify and jointly analyze the differences in amplitude deviation, shape similarity, and time-series offset between the detection period and the baseline curve. At the same time, many solutions rely solely on the output of a single deep model as the basis for judgment, rather than combining the high-dimensional features extracted by deep neural networks with the complementary strengths of traditional classifiers such as random forests and support vector machines for joint decision-making. When facing complex operating conditions such as seasonal variations, small and imbalanced samples, and noise interference, the stability and interpretability of the identification results remain insufficient.
Summary of the Invention
[0005] This invention addresses the shortcomings of existing technologies by providing a user electricity theft identification method and system based on load modeling and multi-model detection. It utilizes a baseline curve, the load curve of the period to be identified, and their differential characteristics, and combines deep neural networks with multiple traditional machine learning models for joint decision-making. While ensuring the feasibility and ease of deployment of the electricity theft identification method, it improves the accuracy, robustness, and interpretability of risk classification results in identifying electricity theft behavior of low-voltage distribution users.
[0006] This invention provides the following technical solution:
[0007] Firstly, a user electricity theft identification method based on load modeling and multi-model detection is provided, comprising the following steps:
S1: acquire electricity metering data of low-voltage distribution users for a historical period and for the period to be identified, and preprocess the data;
S2: classify and summarize the preprocessed historical electricity metering data according to different dimensions, and average the daily load curves of the same date point by point to obtain the user baseline curves, including an annual baseline load curve and a typical daily electricity consumption behavior curve;
S3: for the period to be identified, aggregate the preprocessed electricity metering data by day or by hour to construct the load curve of the period to be identified; calculate several difference features between this load curve and the baseline curve, and concatenate all the difference features into a difference feature vector;
S4: concatenate the statistical features of the load curve of the period to be identified with the difference feature vector to obtain a load feature sequence; extract features with a deep neural network and input them into at least one machine learning classifier to obtain the electricity theft risk probability output by each classifier;
S5: combine the electricity theft risk probabilities output by all classifiers to obtain the user's electricity theft risk level.
[0008] Optionally, in step S1, the preprocessing includes: standardization, filling missing values in the electricity metering data with the median of electricity consumption of the same user in adjacent time windows, and removing or truncating abnormal electricity metering data that exceeds preset upper and lower limits.
[0009] Optionally, in step S2, the typical daily electricity consumption behavior curve includes a weekday electricity consumption behavior curve, a non-weekday electricity consumption behavior curve, a seasonal weekday electricity consumption behavior curve, and a seasonal non-weekday electricity consumption behavior curve. The annual baseline load curve uses the day number within the year as the independent variable, and the typical daily electricity consumption behavior curve uses the hour within the day as the independent variable.
[0010] Optionally, step S2 obtains the user baseline curve, which specifically includes: grouping the historical period electricity metering data according to user type, season, working day and non-working day; averaging the daily load curves with the same day number in each group point by point to obtain the annual baseline load curve and the typical daily electricity consumption behavior curve; and smoothing the baseline curve by using a moving average or low-pass filtering.
[0011] Optionally, in step S3, if the period to be identified is measured on an hourly basis within a day or across days, the difference features between its load curve and the corresponding period of the typical daily electricity consumption behavior curve are calculated; if the period to be identified spans several days or a whole year, the difference features between its load curve and the corresponding period of the annual baseline load curve are calculated.
[0012] Optionally, the difference features in step S3 include at least one of amplitude deviation, shape similarity, and temporal offset. The amplitude deviation $D_{\mathrm{amp}}$ is calculated as:
$$D_{\mathrm{amp}} = \frac{1}{N}\sum_{t=1}^{N}\frac{\lvert x_t - b_t\rvert}{\max(b_t,\varepsilon)}$$
where $N$ is the number of sampling points in the period to be identified; $b_t$ and $x_t$ are the values of the baseline curve and of the load curve of the period to be identified at sampling point $t$; and $\varepsilon$ is a minimum power threshold that prevents the denominator from being zero.
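A minimal Python sketch of the amplitude deviation, assuming the mean relative absolute deviation form (1/N) Σ |x_t − b_t| / max(b_t, ε). The function name `amplitude_deviation` and the default `eps` value are illustrative assumptions, not values given in the source.

```python
def amplitude_deviation(baseline, observed, eps=1e-3):
    """Assumed form: mean over t of |x_t - b_t| / max(b_t, eps).

    baseline: b_t values of the baseline curve
    observed: x_t values of the load curve of the period to be identified
    eps:      minimum power threshold keeping the denominator non-zero
    """
    if not baseline or len(baseline) != len(observed):
        raise ValueError("curves must be non-empty and of equal length")
    n = len(baseline)
    return sum(abs(x - b) / max(b, eps) for b, x in zip(baseline, observed)) / n
```

Normalizing by the baseline value makes the feature comparable across users with very different consumption levels; `eps` only matters at sampling points where the baseline is zero or near zero.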
[0013] Optionally, the deep neural network in step S4 is a one-dimensional convolutional neural network, which includes an input layer connected in sequence, at least two one-dimensional convolutional layers and pooling layers, a fully connected layer, and a penultimate layer for outputting a high-dimensional representation vector. The one-dimensional convolutional layer is used to extract local patterns and periodic features of the load feature sequence in the time dimension, and the output of the penultimate layer is provided as a high-dimensional representation vector to multiple subsequent machine learning classifiers. The machine learning classifiers include: random forest classifier, support vector machine classifier, multilayer perceptron, and gradient boosting tree; The statistical characteristics of the load curve for the period to be identified in step S4 include at least one of the following: peak value, valley value, peak-valley difference, average load, load factor, volatility index, and the time when the peak occurs.
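The feature-extraction path (stacked one-dimensional convolution and pooling layers, then a fully connected layer whose output is handed to the classifiers) can be illustrated with a forward pass in plain Python. This is a toy sketch: the weights are hand-set rather than trained, the layer sizes are arbitrary, and `representation` and the `params` layout are hypothetical names. A practical implementation would use a deep learning framework and feed the resulting vector to the random forest, support vector machine, multilayer perceptron, and gradient boosting classifiers.

```python
def conv1d(channels, weights, biases):
    """Multi-channel 'valid' 1-D convolution (cross-correlation, as in most deep
    learning libraries) followed by ReLU.
    channels: C_in lists of length L; weights: C_out x C_in x K; biases: C_out."""
    k = len(weights[0][0])
    length = len(channels[0]) - k + 1
    return [
        [
            max(0.0, b + sum(w_out[c][j] * channels[c][i + j]
                             for c in range(len(channels))
                             for j in range(k)))
            for i in range(length)
        ]
        for w_out, b in zip(weights, biases)
    ]


def max_pool1d(channels, size=2):
    """Non-overlapping max pooling per channel; a trailing partial window is dropped."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in channels]


def dense_relu(vector, weights, biases):
    """Fully connected layer with ReLU."""
    return [max(0.0, b + sum(w * x for w, x in zip(row, vector)))
            for row, b in zip(weights, biases)]


def representation(sequence, params):
    """Forward pass: conv+pool stages, flatten, fully connected layer.
    The returned vector plays the role of the penultimate-layer output that
    would be passed to the downstream machine learning classifiers."""
    x = [list(sequence)]  # a single input channel (the load feature sequence)
    for weights, biases in params["conv"]:
        x = max_pool1d(conv1d(x, weights, biases))
    flat = [v for ch in x for v in ch]
    return dense_relu(flat, *params["fc"])
```

Each convolution kernel responds to a local pattern in the sequence (for example a ramp or a peak), and pooling keeps the strongest response in each window, which is why stacked conv/pool layers capture local and periodic structure along the time axis.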
[0014] Optionally, in step S5, the electricity theft risk probabilities output by all classifiers are fused by voting fusion or weighted fusion to obtain a comprehensive electricity theft risk probability $R$, and the electricity theft risk level is divided into normal, suspicious, and highly suspicious (serious) according to adaptive thresholds. The adaptive thresholds are determined as follows: an empirical distribution is constructed from the comprehensive risk probabilities of historical normal users; the upper $\alpha_1$ quantile of this distribution is taken as the first threshold $T_1$ between normal and suspicious, and the upper $\alpha_2$ quantile is taken as the second threshold $T_2$ between suspicious and highly suspicious, where $0 < \alpha_2 < \alpha_1 < 1$, so that $T_1 < T_2$. When $R < T_1$, the user is judged to be normal; when $T_1 \le R < T_2$, the user is identified as suspicious; when $R \ge T_2$, the user is identified as highly suspicious.
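Step S5 can be sketched in Python as weighted or voting fusion followed by quantile-based adaptive thresholds. The default tail probabilities `alpha1=0.05` and `alpha2=0.01`, the linear interpolation between order statistics, the 0.5 vote cutoff, and all function names are assumptions for illustration; the source does not fix these values.

```python
def fuse_weighted(probabilities, weights=None):
    """Weighted fusion of classifier probabilities (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)


def fuse_vote(probabilities, cutoff=0.5):
    """Voting fusion: fraction of classifiers whose probability reaches `cutoff`."""
    return sum(p >= cutoff for p in probabilities) / len(probabilities)


def upper_quantile(sorted_scores, alpha):
    """Empirical upper-alpha quantile (the (1 - alpha) quantile), linearly
    interpolated between neighbouring order statistics."""
    pos = (1.0 - alpha) * (len(sorted_scores) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (pos - lo) * (sorted_scores[hi] - sorted_scores[lo])


def adaptive_thresholds(normal_scores, alpha1=0.05, alpha2=0.01):
    """T1 = upper-alpha1 quantile, T2 = upper-alpha2 quantile of the fused risk
    scores of historical normal users; alpha2 < alpha1 gives T1 <= T2."""
    if not 0.0 < alpha2 < alpha1 < 1.0:
        raise ValueError("require 0 < alpha2 < alpha1 < 1")
    ordered = sorted(normal_scores)
    return upper_quantile(ordered, alpha1), upper_quantile(ordered, alpha2)


def risk_level(score, t1, t2):
    """Map a fused risk score to one of the three risk levels."""
    if score < t1:
        return "normal"
    if score < t2:
        return "suspicious"
    return "highly suspicious"
```

Because the thresholds come from the score distribution of normal users, roughly a fraction `alpha1` of normal users would be flagged as at least suspicious, which gives a direct handle on the false positive rate.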
[0015] Optionally, the method further includes step S6: generating a corresponding identification result based on the electricity theft risk level, outputting the user number, risk probability and risk level, and graphically displaying the baseline curve, the load curve of the period to be identified and their differences.
[0016] Secondly, a user electricity theft identification system based on load modeling and multi-model detection is provided, comprising:
a data acquisition and preprocessing module, which acquires and preprocesses electricity metering data of low-voltage distribution users for the historical period and for the period to be identified;
a baseline curve modeling module, which classifies and summarizes the preprocessed historical electricity metering data according to different dimensions and averages the daily load curves of the same date point by point to obtain the user baseline curves, including the annual baseline load curve and the typical daily electricity consumption behavior curve;
a difference construction module, which, for the period to be identified, aggregates the preprocessed electricity metering data by day or by hour to construct the load curve of the period to be identified, calculates several difference features between this load curve and the baseline curve, and concatenates all difference features into a difference feature vector;
a multi-model joint identification module, which concatenates the statistical features of the load curve of the period to be identified with the difference feature vector to obtain a load feature sequence, extracts features with the deep neural network, and inputs them into at least one machine learning classifier to obtain the electricity theft risk probability output by each classifier;
an electricity theft assessment module, which integrates the electricity theft risk probabilities output by all classifiers to obtain the user's electricity theft risk level; and
a visualization and report generation module, which graphically displays the electricity theft identification results together with the baseline curve, the load curve of the period to be identified, and their differences.
[0017] Compared with the prior art, the beneficial effects of the present invention are as follows: (1) Compared with traditional electricity theft identification schemes that rely on a single threshold or a single model, the user electricity theft identification method proposed in this invention offers significant improvements in load behavior modeling, abnormal feature characterization, and model structure design. First, on the basis of data preprocessing, this invention introduces an annual baseline load curve and a typical daily electricity consumption behavior curve to finely model users' normal electricity consumption patterns on the annual and daily scales, and groups the data by different dimensions so that the user under detection is always compared with historical behavior of the same type and the same scenario. Because the baseline curve is constructed by grouped statistics, point-by-point averaging, and smoothing of daily load curves, it effectively filters metering noise and reflects long-term, stable electricity consumption habits. Thus, even when the load fluctuates seasonally, the weather changes, or users naturally adjust their consumption behavior, the method remains sensitive to abnormal behavior while reducing the risk of normal users being misjudged as electricity thieves.
[0018] (2) Based on the load modeling of the period to be identified, this invention constructs a difference feature vector including at least one of amplitude deviation, curve shape similarity, and peak-valley timing offset. Information such as whether the consumption level is abnormal, whether the consumption pattern has changed, and whether the temporal distribution of consumption is misaligned is thereby integrated into a single feature space. Furthermore, the invention combines deep neural networks with various traditional machine learning models to form a structure of deep feature extraction plus multi-model joint identification. On the one hand, it exploits the strength of deep networks in high-dimensional feature representation to mine the temporal patterns and behavioral differences hidden in the load sequence; on the other hand, it leverages the complementary characteristics of classifiers such as random forests and support vector machines to obtain a comprehensive electricity theft risk score through voting or weighted fusion. This improves the detection rate and reduces the misjudgment rate in actual power distribution scenarios with small samples, imbalanced samples, and noise interference. Building on this foundation, the present invention also introduces an adaptive threshold mechanism based on the historical distribution of risk scores of normal users. This mechanism combines the comprehensive risk score with the deviation from the baseline curve to classify users into risk levels such as normal, suspicious, and highly suspicious. In addition, by visualizing the baseline curve and the load curve of the period to be identified, the detection basis and risk results are presented to maintenance personnel intuitively.
This makes the electricity theft identification process accurate, interpretable, and practical for engineering use, makes full use of existing electricity metering data, and enhances the intelligence level and application value of low-voltage distribution electricity theft identification.
Attached Figure Description
[0019] Figure 1 is a flowchart of the user electricity theft identification method based on load modeling and multi-model detection of the present invention. Figure 2 is a structural block diagram of the user electricity theft identification system based on load modeling and multi-model detection of the present invention. Figure 3 is a flowchart of the electricity metering data preprocessing and feature extraction process of the present invention. Figure 4 is a daily-scale comparison diagram of the baseline curve and the load curve of the period to be identified provided by the present invention; the horizontal axis represents the hour within the day, the vertical axis represents the electricity consumption or load value, the solid line represents the baseline curve, and the dashed line represents the load curve of the period to be identified. Figure 5 is an annual-scale comparison diagram of the baseline curve and the load curve of the period to be identified provided by the present invention; the horizontal axis represents the day number within the year, the vertical axis represents the electricity consumption or load value, the solid line represents the baseline curve, and the dashed line represents the load curve of the period to be identified.
Detailed Implementation
[0020] The present invention will be further described below with reference to the accompanying drawings. The following embodiments are only used to more clearly illustrate the technical solutions of the present invention and should not be used to limit the scope of protection of the present invention. It should be noted that the term "comprising" and any variations thereof in the specification, claims and the above-mentioned drawings of the present invention are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to these processes, methods, products or devices.
[0021] Example 1: As shown in Figure 1 and Figure 3, a user electricity theft identification method based on load modeling and multi-model detection comprises the following steps:
S1: acquire electricity metering data of low-voltage distribution users for a historical period and for the period to be identified, and preprocess the data;
S2: classify and summarize the preprocessed historical electricity metering data according to different dimensions, and average the daily load curves of the same date point by point to obtain the user baseline curves, including an annual baseline load curve and a typical daily electricity consumption behavior curve;
S3: for the period to be identified, aggregate the preprocessed electricity metering data by day or by hour to construct the load curve of the period to be identified; calculate several difference features between this load curve and the baseline curve, and concatenate all the difference features into a difference feature vector;
S4: concatenate the statistical features of the load curve of the period to be identified with the difference feature vector to obtain a load feature sequence; extract features with a deep neural network and input them into at least one machine learning classifier to obtain the electricity theft risk probability output by each classifier;
S5: combine the electricity theft risk probabilities output by all classifiers to obtain the user's electricity theft risk level.
[0022] In this embodiment, the electricity metering data for the historical period in step S1 refers to the set of metering data used to characterize the user's historical electricity consumption behavior, which may include electricity consumption or load value and its corresponding metering time, etc.; the electricity metering data for the period to be identified refers to the set of metering data within the time window to be identified, used to determine whether the user has abnormal electricity consumption behavior within the time window.
[0023] Electricity metering data can originate from electricity metering systems, electricity information collection terminals, or historical metering databases. The data carrier can be a structured file or database table record. Electricity metering data includes at least user identification information, metering time, and metering values related to electricity consumption behavior. These metering values can be one or more of electricity consumption, load values, or power values. To ensure consistency in subsequent modeling and comparison, this embodiment performs unified parsing of metering time and aligns metering records to a preset time granularity. The time granularity can be set to minute, hour, or day levels according to business needs. For cases where multiple records exist within the same time granularity, an aggregation method can be used to generate a representative value. The aggregation method can be the mean, median, or weighted mean.
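Aligning metering records to a preset time granularity and aggregating records that share a slot can be sketched with the Python standard library. The `(user_id, timestamp, value)` record layout and the function name are hypothetical; only mean and median aggregation are shown, since a weighted mean would need per-record weights that the source does not specify.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Functions that truncate a timestamp to the start of its time slot
TRUNCATORS = {
    "minute": lambda ts: ts.replace(second=0, microsecond=0),
    "hour": lambda ts: ts.replace(minute=0, second=0, microsecond=0),
    "day": lambda ts: ts.replace(hour=0, minute=0, second=0, microsecond=0),
}
AGGREGATORS = {"mean": mean, "median": median}


def align_to_granularity(records, granularity="hour", how="mean"):
    """Align (user_id, timestamp, value) records to a preset time granularity.

    When several records fall into the same slot for the same user, they are
    aggregated into one representative value. Returns {(user_id, slot): value}
    with keys in (user, time) order."""
    truncate = TRUNCATORS[granularity]
    aggregate = AGGREGATORS[how]
    buckets = defaultdict(list)
    for user_id, ts, value in records:
        buckets[(user_id, truncate(ts))].append(value)
    return {key: aggregate(values) for key, values in sorted(buckets.items())}


if __name__ == "__main__":
    example = [
        ("U001", datetime(2025, 1, 6, 8, 15), 1.0),
        ("U001", datetime(2025, 1, 6, 8, 45), 2.0),
        ("U001", datetime(2025, 1, 6, 9, 5), 0.9),
    ]
    print(align_to_granularity(example, "hour", "mean"))
```

Choosing `how="median"` makes the representative value less sensitive to a single spurious reading within a slot.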
[0024] In step S1, the preprocessing includes: standardization, filling missing values in the electricity metering data with the median of electricity consumption of the same user in adjacent time windows, and removing or truncating abnormal electricity metering data that exceeds preset upper and lower limits.
[0025] After data import, this embodiment handles missing values and outliers. Missing values in continuous measurements can be filled using methods such as median imputation, neighborhood interpolation, or time-window interpolation. Records with unparseable timestamps or missing key fields can be removed or corrected retrospectively. Outliers can be identified and processed based on the historical statistical distribution or on physically reasonable ranges; for example, extreme points that clearly exceed a reasonable range can be truncated, corrected, or removed to reduce the impact of noise and occasional anomalies on subsequent feature calculation and model discrimination. As an optional embodiment, data quality markers can be attached to samples to indicate the degree of missing data, the degree of anomaly, or the correction method applied. This allows low-quality samples to be down-weighted during the subsequent training and risk assessment stages, improving the robustness of the overall discrimination results.
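A sketch of median imputation from an adjacent time window plus truncation to preset limits, returning a per-point quality flag that later stages could use for down-weighting. The `window` size, the flag strings, and representing missing readings as `None` are assumptions for illustration.

```python
from statistics import median


def clean_series(values, lower, upper, window=3):
    """Fill missing points (None) with the median of the valid readings within
    +/- `window` positions, then truncate values to [lower, upper].

    Returns (cleaned, flags): flags marks each point as "ok", "filled",
    "clipped", "filled+clipped", or "unfilled" (no valid neighbour found)."""
    n = len(values)
    cleaned, flags = [], []
    for i, v in enumerate(values):
        if v is None:
            neighbours = [
                values[j]
                for j in range(max(0, i - window), min(n, i + window + 1))
                if j != i and values[j] is not None
            ]
            if not neighbours:
                # Leave the gap for removal or retrospective correction
                cleaned.append(None)
                flags.append("unfilled")
                continue
            v = median(neighbours)
            flag = "filled"
        else:
            flag = "ok"
        if v < lower or v > upper:
            v = min(max(v, lower), upper)  # truncate out-of-range readings
            flag = "clipped" if flag == "ok" else flag + "+clipped"
        cleaned.append(v)
        flags.append(flag)
    return cleaned, flags
```

Using the median of the neighbours rather than the mean keeps a single extreme neighbour from distorting the filled value.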
[0026] Subsequently, feature construction is performed based on the cleaned data. For categorical fields, categorical encoding can be used to convert them into numerical representations suitable for model calculations. For time fields, this embodiment decomposes the metering time into time features to form multi-scale time features. These time features include, but are not limited to, year, month, date, week number, hour, and whether it is a working day, used to characterize seasonality, periodicity, and intraday patterns. Simultaneously, this embodiment constructs statistical features related to load behavior. These statistical features may include peak values, valley values, peak-valley differences, average load, load factor, volatility indicators, and peak occurrence times, to characterize the overall characteristics of user electricity consumption patterns and provide a basis for subsequent differential feature calculations.
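The statistical features listed above could be computed as follows for a load curve sampled at equal intervals. The source names these features but not their formulas, so two definitions here are assumptions: the load factor is taken as average load divided by peak load (the usual convention), and the volatility indicator as the coefficient of variation (population standard deviation divided by the mean).

```python
from statistics import mean, pstdev


def load_statistics(curve):
    """Statistical features of a load curve sampled at equal intervals.

    Assumed definitions: load_factor = average / peak;
    volatility = population standard deviation / average.
    Peak and valley times are indices of the first maximum and minimum."""
    peak, valley, avg = max(curve), min(curve), mean(curve)
    return {
        "peak": peak,
        "valley": valley,
        "peak_valley_diff": peak - valley,
        "average": avg,
        "load_factor": avg / peak if peak > 0 else 0.0,
        "volatility": pstdev(curve) / avg if avg > 0 else 0.0,
        "peak_time": curve.index(peak),
        "valley_time": curve.index(valley),
    }
```

For an hourly daily curve the `peak_time` index is directly the hour of the day; for an annual curve it is the position of the peak day.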
[0027] To eliminate the impact of differences in units and value ranges across features on model training and discrimination, this embodiment standardizes the numerical features, including metering features and derived statistical features. The standardization parameters are preferably estimated from the data distribution during the training phase, and the same parameters are reused in the identification phase to keep training and identification consistent. Preferably, the standardization can be a linear transformation based on the mean and standard deviation, or a linear scaling based on the minimum and maximum values, so that features with different units lie on a comparable scale, thereby improving the training stability and generalization ability of the subsequent deep feature extraction and multi-model joint identification.
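The fit-on-training, reuse-at-identification pattern for mean/standard-deviation standardization can be sketched as a small class. `ZScoreScaler` is a hypothetical name, and replacing a zero standard deviation with 1.0 is an added assumption that keeps constant features from causing division by zero. A min-max variant would store column minima and maxima instead.

```python
from statistics import mean, pstdev


class ZScoreScaler:
    """Mean/standard-deviation standardization. Parameters are estimated once
    on training data and reused unchanged at the identification stage."""

    def __init__(self):
        self.means = None
        self.stds = None

    def fit(self, rows):
        columns = list(zip(*rows))  # transpose rows -> feature columns
        self.means = [mean(col) for col in columns]
        # A constant feature has zero spread; use 1.0 to avoid dividing by zero
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def transform(self, rows):
        if self.means is None:
            raise RuntimeError("call fit() on training data first")
        return [
            [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
            for row in rows
        ]
```

Fitting only on training data and calling `transform` on identification-period data avoids leaking information from the period being judged into the scaling parameters.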
[0028] In this embodiment, the baseline load modeling in step S2 forms, from historical metering data, a benchmark reference that characterizes the normal electricity consumption behavior of users. The baseline load modeling yields an annual baseline load curve on the annual scale and can further yield a typical daily electricity consumption behavior curve on the daily scale, thereby providing comparison objects for the load modeling and difference feature calculation of the period to be identified in the subsequent step S3. The typical daily electricity consumption behavior curve includes a weekday electricity consumption behavior curve, a non-weekday electricity consumption behavior curve, a seasonal weekday electricity consumption behavior curve, and a seasonal non-weekday electricity consumption behavior curve. The annual baseline load curve uses the day number within the year as the independent variable, and the typical daily electricity consumption behavior curve uses the hour within the day as the independent variable. User categories can also be further distinguished to construct corresponding typical daily electricity consumption behavior curves.
[0029] The annual baseline load curve is constructed based on normal electricity consumption samples from historical metering data. These samples can be obtained through preset rules or existing labeling information. During screening, data from periods with significant gaps, abnormal fluctuations, or confirmed electricity theft risks can be removed to prevent abnormal behavior from contaminating the baseline. To improve the baseline curve's adaptability to seasonal changes and periodic patterns, this embodiment optionally groups historical data for modeling. Grouping methods include, but are not limited to, grouping by season, by month, and by weekdays and non-working days.
[0030] After sample selection and grouping, this embodiment aligns the historical data of each group along the time axis and performs aggregation operations at the corresponding time positions to generate a baseline curve. The aggregation method can be either mean aggregation or median aggregation, with median aggregation being less sensitive to extreme points and thus improving the robustness of the baseline curve. For scenarios with high noise or significant sampling fluctuations, this embodiment can optionally smooth the aggregated baseline curve, for example, using sliding window smoothing or low-pass filtering, to reduce the impact of short-term random disturbances on the curve shape while preserving key behavioral characteristics such as peak and trough variations. Through the above processing, an annual baseline load curve for annual-scale comparison can be obtained, and its corresponding annual statistical characteristics can be output, such as annual total, annual peak and trough values, peak-to-trough difference, peak occurrence interval, and load rate, for subsequent differential feature construction and risk interpretation.
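The annual baseline construction can be sketched as day-number alignment, point-by-point aggregation (mean or median), and optional centred moving-average smoothing. The input format, function names, and edge handling (the window is truncated at the ends of the curve) are assumptions; the smoothing also treats consecutive list positions as neighbouring days, which holds only when no day numbers are missing.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median


def moving_average(curve, half_window):
    """Centred moving average of width 2*half_window+1, truncated at the ends."""
    n = len(curve)
    out = []
    for i in range(n):
        window = curve[max(0, i - half_window): min(n, i + half_window + 1)]
        out.append(sum(window) / len(window))
    return out


def annual_baseline(daily_records, how="median", half_window=0):
    """Annual baseline load curve from historical (date, value) daily records.

    Records should already be cleaned and restricted to normal-consumption
    samples. Values sharing the same day number within the year are
    aggregated point by point, then optionally smoothed.
    Returns {day_number: baseline_value}."""
    aggregate = {"mean": mean, "median": median}[how]
    by_day = defaultdict(list)
    for d, value in daily_records:
        by_day[d.timetuple().tm_yday].append(value)
    days = sorted(by_day)
    curve = [aggregate(by_day[k]) for k in days]
    if half_window > 0:
        curve = moving_average(curve, half_window)
    return dict(zip(days, curve))


if __name__ == "__main__":
    history = [(date(2023, 1, 1), 10.0), (date(2024, 1, 1), 20.0),
               (date(2023, 1, 2), 4.0), (date(2024, 1, 2), 6.0)]
    print(annual_baseline(history, how="mean"))
```

With only a few historical years per day number, the median choice guards against one anomalous year dominating the baseline value.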
[0031] Furthermore, to facilitate comparison of the behavior during the period to be identified on an intraday time scale, this embodiment also constructs a typical daily electricity consumption behavior curve as the daily-scale baseline. The typical daily electricity consumption behavior curve can be obtained by segmenting the historical metering data by day and then aligning and aggregating it along the intraday hour dimension; in some embodiments, separate typical daily baseline curves can be constructed for weekdays and non-weekdays to reflect the differences in users' consumption patterns between day types. The aggregation method can likewise be the mean or the median, and smoothing can be applied as needed to obtain a more stable typical daily curve shape. The resulting typical daily electricity consumption behavior curve serves as the daily-scale baseline shown in Figure 4, providing a reference for comparison with the load curve of the period to be identified. At the same time, this embodiment can output daily-scale statistical features of the typical daily curve, including the intraday peak, valley, peak-valley difference, peak time, valley time, and intraday volatility indicators, which serve as important inputs for the difference feature calculation and discriminant interpretation in the subsequent step S3.
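Typical daily curves by day type, optionally split by season, could be built as below from hourly records. The month-to-season map (Northern Hemisphere meteorological seasons), the optional `holidays` set used to mark additional non-working days, and the function name are assumptions for illustration; hours with no data are returned as `None`.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Assumed month -> season map (Northern Hemisphere meteorological seasons)
SEASONS = {12: "winter", 1: "winter", 2: "winter", 3: "spring", 4: "spring",
           5: "spring", 6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}


def typical_daily_curves(hourly_records, how="mean", by_season=False,
                         holidays=frozenset()):
    """Build 24-point typical daily curves from (datetime, value) hourly records.

    Group keys are "weekday" / "non-weekday", or (season, day_type) tuples
    when by_season is True. Dates in `holidays` count as non-weekdays."""
    aggregate = {"mean": mean, "median": median}[how]
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, value in hourly_records:
        is_workday = ts.weekday() < 5 and ts.date() not in holidays
        group = "weekday" if is_workday else "non-weekday"
        if by_season:
            group = (SEASONS[ts.month], group)
        buckets[group][ts.hour].append(value)
    return {
        group: [aggregate(by_hour[h]) if h in by_hour else None for h in range(24)]
        for group, by_hour in buckets.items()
    }


if __name__ == "__main__":
    sample = [(datetime(2025, 1, 6, 8), 2.0), (datetime(2025, 1, 4, 8), 1.0)]
    print(typical_daily_curves(sample)["weekday"][8])
```

Each resulting 24-point curve can then be passed to the same feature functions used for the period to be identified (peak, valley, peak and valley hours, volatility) to produce the daily-scale statistics.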
[0032] In this embodiment, the load modeling and difference feature calculation for the period to be identified in step S3 model the electricity consumption behavior within the time window to be identified and compare it with the baseline load to form a difference feature vector. The difference feature vector serves as an important input for the subsequent deep feature extraction and multi-model joint identification; its calculation process is illustrated below with reference to the daily-scale curve comparison diagram in Figure 4 and the annual-scale curve comparison diagram in Figure 5.
[0033] The period to be identified can be a target day, a window of several consecutive days, or a range covering an entire year, depending on the identification needs. For example, to characterize intraday behavioral anomalies, the window corresponding to the target day can be selected to form a daily-scale load curve for the period to be identified; to characterize annual behavioral shifts, the window corresponding to the target year can be selected to form an annual-scale load curve for the period to be identified. To ensure comparability, the data within the window to be identified should be processed with the same time granularity and alignment method as in the baseline modeling stage, consistent with the preprocessing strategy of step S1. The resulting load curve is compared with the typical daily electricity consumption behavior curve on the daily scale and with the annual baseline load curve on the annual scale; the daily-scale comparison is shown in Figure 4 and the annual-scale comparison in Figure 5.
[0034] After constructing the load curve for the period to be identified, this embodiment calculates a difference feature vector based on the difference between the load curve for the period to be identified and the baseline curve. The difference features may include multiple dimensions such as amplitude deviation, shape similarity, and time series offset, which are used to characterize the degree of deviation between the behavior of the period to be identified and the baseline behavior from different perspectives.
[0035] The amplitude deviation $D_A$ is calculated over the $N$ sampling points of the period to be identified, where $B_i$ and $L_i$ are the values of the baseline curve and of the load curve of the period to be identified at sampling point $i$, and $\varepsilon$ is a minimum power threshold that prevents the denominator from being zero.
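The exact amplitude-deviation formula is not reproduced in the text. The sketch below assumes one common form consistent with the definitions above, a mean relative absolute deviation $D_A = \frac{1}{N}\sum_{i=1}^{N}\frac{|L_i - B_i|}{\max(B_i, \varepsilon)}$; both this form and the default `eps` value are assumptions for illustration only.

```python
def amplitude_deviation(baseline, observed, eps=0.01):
    """Assumed form: D_A = (1/N) * sum_i |L_i - B_i| / max(B_i, eps).

    baseline holds B_i, observed holds L_i, and eps is the minimum power
    threshold that keeps the denominator away from zero.
    """
    if not baseline or len(baseline) != len(observed):
        raise ValueError("curves must be non-empty and aligned point by point")
    total = sum(abs(load - base) / max(base, eps) for base, load in zip(baseline, observed))
    return total / len(baseline)
```

With `eps` in the denominator, a near-zero baseline point (for example an empty dwelling at night) cannot make the deviation blow up.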
[0036] This embodiment also calculates the shape similarity between the load curve of the period to be identified and the baseline curve to reflect how consistent the trend patterns of the two curves are. Optionally, a correlation coefficient can be used as the similarity measure. For example, the aligned daily-scale or annual-scale curves are discretized into sequences $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$ of length $n$, and the correlation coefficient $r$ can be expressed as:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
[0037] where $\bar{x}$ and $\bar{y}$ are the mean values of the corresponding sequences. Higher similarity indicates that the load curve of the period to be identified is morphologically closer to the baseline curve; lower similarity indicates a more significant morphological deviation. In some scenarios, equivalent measures such as cosine similarity can also be used to calculate shape similarity.
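Both shape-similarity measures named above can be computed directly. This is a minimal sketch; returning 0.0 when a curve is completely flat (zero variance or zero norm) is a convention assumed here, not one stated in the text.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation r between two aligned, equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must be aligned and contain at least two points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A flat curve has no trend shape to compare; 0.0 is an assumed convention.
    return cov / (sx * sy) if sx and sy else 0.0


def cosine_similarity(x, y):
    """Cosine of the angle between the two curves viewed as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0
```

Unlike cosine similarity, the Pearson coefficient subtracts the means first, so a curve that keeps its shape but shifts uniformly up or down still scores close to 1; the amplitude deviation feature captures that level change separately.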
[0038] In addition, this embodiment can optionally calculate the time-series offset difference to characterize the offset of peak time, valley time, or major load change periods on the time axis. For example, offset features can be formed based on the peak time difference and valley time difference, or the optimal alignment offset can be estimated as a time-series offset index through cross-correlation methods. The time-series offset feature can supplement the time position change information that amplitude deviation and shape similarity cannot fully cover, and is particularly suitable for identifying abnormal behaviors with electricity consumption period migration.
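Both offset options described above (peak and valley time differences, and the optimal cross-correlation alignment) can be sketched as follows. The sign convention (positive means the observed curve occurs later than the baseline) and the search range `max_lag` are assumptions; offsets are in sampling steps, for example hours on a daily scale.

```python
def peak_valley_offsets(baseline, observed):
    """Signed shifts of the peak and valley positions (observed minus baseline)."""
    peak_shift = observed.index(max(observed)) - baseline.index(max(baseline))
    valley_shift = observed.index(min(observed)) - baseline.index(min(baseline))
    return peak_shift, valley_shift


def best_lag(baseline, observed, max_lag):
    """Lag in samples that maximizes the cross-correlation of mean-removed curves.

    A positive lag means the observed curve is delayed relative to the baseline.
    Only the overlapping samples contribute for each candidate lag.
    """
    n = len(baseline)
    mb, mo = sum(baseline) / n, sum(observed) / n
    b = [v - mb for v in baseline]
    o = [v - mo for v in observed]
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(b[i] * o[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best, best_score = lag, score
    return best
```

For example, a curve whose evening peak has moved two hours later yields a lag of 2 against its baseline.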
[0039] When the period to be identified is on a daily scale, it is preferable to select the corresponding typical daily electricity consumption behavior curve as the benchmark curve based on the type of the date for difference feature calculation. Specifically, first, it is determined whether the date to be identified is a weekday or a non-working day; if the date is a weekday, the load curve of that day is compared with the typical load curve of historical weekdays; if it is a non-working day, it is compared with the typical load curve of historical non-working days. By calculating the amplitude deviation, shape similarity, and time sequence offset between the two, a difference feature vector describing the degree of deviation of the daily electricity consumption behavior is obtained. Of course, in some other embodiments, it is preferred to first determine whether the date to be identified is a weekday or a non-working day, and then determine the season, thereby finding the corresponding seasonal weekday electricity consumption behavior curve or seasonal non-working day electricity consumption behavior curve.
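The selection of the benchmark curve by day type (and optionally season) amounts to a keyed lookup. In this sketch the key layout, the optional holiday set, and the Northern Hemisphere meteorological season mapping are all assumptions; real deployments would use the local calendar and season definitions.

```python
from datetime import date

# Meteorological seasons for the Northern Hemisphere (an assumption; adjust per region).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def benchmark_key(day: date, holidays=frozenset(), use_season=False):
    """Day type first (working / non-working), then optionally the season."""
    day_type = "non_working" if day.weekday() >= 5 or day in holidays else "working"
    return (SEASON_BY_MONTH[day.month], day_type) if use_season else (day_type,)


def select_benchmark(day, benchmark_curves, holidays=frozenset(), use_season=False):
    """Look up the benchmark curve keyed by (season, day type) or (day type,)."""
    return benchmark_curves[benchmark_key(day, holidays, use_season)]
```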
[0040] In this embodiment, the input features in step S4 include differential feature vectors and statistical features, which can be concatenated according to preset rules to form a unified feature input. Statistical features may include peak values, valley values, peak-valley differences, load rates, volatility indicators, and peak occurrence times, while differential features may include amplitude deviation, shape similarity, and temporal offset. To improve feature representation capabilities, this embodiment constructs a deep feature extraction network to learn the above input features. The deep feature extraction network may be a multi-layer fully connected network, a deep network with normalization and non-linear activation, or a one-dimensional convolutional network capable of processing sequence features. After the network training is completed, a high-dimensional representation vector is extracted from the intermediate layer output or the penultimate layer output of the deep feature extraction network. This vector is used to represent the comprehensive deviation pattern of the user's behavior during the period to be identified relative to the baseline behavior. Compared to directly using the original differential features, the high-dimensional representation vector can learn a more stable discriminative structure in the feature space, thereby improving the adaptability to noise disturbances and behavioral diversity.
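The feature concatenation and the extraction of a penultimate-layer representation can be sketched in NumPy. This is a forward-pass-only illustration under stated assumptions: the key order, layer sizes, kernel width, pool size of 2, and the random weights are hypothetical placeholders. In practice the weights would come from training the network with a classification head, after which the head is discarded and the penultimate output is kept.

```python
import numpy as np


def concat_features(stat_features, diff_features, order):
    """Concatenate feature dicts into one vector following a preset key order."""
    merged = {**stat_features, **diff_features}
    return np.array([merged[k] for k in order], dtype=float)


class TinyConv1DEncoder:
    """Forward-only 1D CNN: conv, ReLU, conv, ReLU, max-pool, dense (penultimate layer).

    Weights are random placeholders standing in for trained parameters.
    Requires seq_len >= 2 * kernel so that at least one pooled value remains.
    """

    def __init__(self, seq_len, channels=4, kernel=3, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.k = kernel
        self.w1 = rng.normal(0.0, 0.5, (channels, 1, kernel))
        self.w2 = rng.normal(0.0, 0.5, (channels, channels, kernel))
        conv_len = seq_len - 2 * (kernel - 1)  # two 'valid' convolutions
        pooled_len = conv_len // 2  # non-overlapping max-pool of size 2
        self.wd = rng.normal(0.0, 0.5, (channels * pooled_len, embed_dim))

    def _conv(self, x, w):
        # x: (in_ch, length), w: (out_ch, in_ch, k) -> (out_ch, length - k + 1)
        out_len = x.shape[1] - self.k + 1
        windows = np.stack([x[:, i:i + self.k] for i in range(out_len)])
        return np.einsum("lck,ock->ol", windows, w)

    def embed(self, seq):
        """Return the high-dimensional representation vector for one sequence."""
        x = np.asarray(seq, dtype=float)[None, :]
        h = np.maximum(self._conv(x, self.w1), 0.0)
        h = np.maximum(self._conv(h, self.w2), 0.0)
        usable = (h.shape[1] // 2) * 2
        h = h[:, :usable].reshape(h.shape[0], -1, 2).max(axis=2)
        return np.maximum(h.reshape(-1) @ self.wd, 0.0)
```

The returned vector is what the downstream classifiers would receive in place of the raw concatenated features.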
[0041] In optional embodiments, to improve the stability and generalization ability of model training, regularization training strategies can be used to constrain the deep feature extraction network. For example, dropout mechanisms, parameter norm constraints, or early stopping strategies can be introduced into the network layers to avoid overfitting. When there is class imbalance in the samples, class weights, resampling, or cost-sensitive learning methods can be used to adjust the training process, so that the model can reduce the risk of missed detections while maintaining overall accuracy. The above training strategies are preferred implementations and do not change the main process of deep representation vector extraction in step S4.
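One of the class-imbalance options above, class weighting, can be sketched with the widely used "balanced" rule $w_c = n / (K \cdot n_c)$, where $n$ is the number of samples, $K$ the number of classes, and $n_c$ the count of class $c$. Choosing this particular rule is an assumption; the text only says class weights may be used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """w_c = n_samples / (n_classes * n_c): rarer classes receive larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

Per-sample loss weights then follow as `[weights[y] for y in labels]`, so each rare theft sample counts more heavily during training.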
[0042] After obtaining the high-dimensional representation vector, this embodiment further employs multiple classification models to discriminate the high-dimensional representation vector, outputting the probability of electricity theft or a risk score. The classifier may include one or more of the following: random forest classifier, support vector machine classifier, multilayer perceptron, gradient boosting tree, and other supervised learning classifiers. Each classification model can output a corresponding risk probability or score, used to characterize the confidence level of whether the electricity consumption behavior during the identified period belongs to the abnormal category. By combining the high-dimensional representation vector learned by deep networks with the discriminative power of traditional classification models, this embodiment achieves a technical path of deep feature extraction and multi-model joint identification. It maintains good identification robustness even in scenarios with small samples, high noise levels, or complex differences in electricity consumption behavior, and provides a quantifiable model output basis for the subsequent fusion decision and risk classification in step S5.
[0043] In this embodiment, in step S5, the electricity theft risk probabilities output by all classifiers are fused by voting fusion or weighted fusion to obtain the final electricity theft risk probability, and the electricity theft risk level is divided according to the adaptive threshold, including normal, suspicious and serious.
[0044] The adaptive thresholds are determined by constructing an empirical distribution of the comprehensive risk probabilities of historical normal users. The upper $\alpha_1$ quantile of this distribution serves as the first threshold $\tau_1$, between the normal and suspicious levels, and the upper $\alpha_2$ quantile serves as the second threshold $\tau_2$, between the suspicious and highly suspicious levels, where $0<\alpha_2<\alpha_1<1$ so that $\tau_1<\tau_2$. For a comprehensive risk probability $P$, the user is judged normal when $P<\tau_1$, suspicious when $\tau_1 \le P < \tau_2$, and highly suspicious (the serious level) when $P \ge \tau_2$.
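The quantile-based thresholds and the three-level grading can be sketched with `numpy.quantile`, where the upper $\alpha$ quantile is the $1-\alpha$ quantile. The default values of `alpha1` and `alpha2` are hypothetical, and the treatment of scores exactly at a threshold follows the inequalities written above.

```python
import numpy as np


def adaptive_thresholds(normal_scores, alpha1=0.10, alpha2=0.01):
    """Upper-alpha quantiles of historical normal users' scores (alpha2 < alpha1)."""
    if not 0 < alpha2 < alpha1 < 1:
        raise ValueError("require 0 < alpha2 < alpha1 < 1")
    scores = np.asarray(normal_scores, dtype=float)
    tau1 = float(np.quantile(scores, 1 - alpha1))
    tau2 = float(np.quantile(scores, 1 - alpha2))
    return tau1, tau2


def risk_level(p, tau1, tau2):
    """Map a comprehensive risk probability to one of the three risk levels."""
    if p < tau1:
        return "normal"
    if p < tau2:
        return "suspicious"
    return "highly suspicious"
```

Because the thresholds come from the normal users' own score distribution, `alpha1` approximately equals the share of normal users that would be flagged as at least suspicious, which makes the false-alarm budget explicit.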
[0045] Weighted fusion can highlight the contribution of a stronger model when performance differs significantly between models. Let the risk score output by the $k$-th model be $s_k$ and its weight be $w_k$; the comprehensive risk score $S$ can then be expressed as:
$$S = \sum_{k=1}^{M} w_k s_k$$
[0046] where $M$ is the number of models participating in the fusion, and the weights $w_k$ satisfy preset constraints, for example $w_k \ge 0$ and $\sum_{k=1}^{M} w_k = 1$. Through this fusion, the judgments of multiple models are mapped onto the same scoring scale, which reduces the impact of a single model's misjudgment on the final conclusion and improves the stability of the risk assessment.
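The two fusion options of step S5 can be sketched as follows. Weighted fusion follows the weighted-sum form with non-negative weights summing to 1. For voting fusion, the 0.5 cutoff and the use of the vote fraction as the fused probability are assumptions.

```python
import math


def weighted_fusion(scores, weights):
    """Comprehensive score S = sum_k w_k * s_k under the preset weight constraints."""
    if len(scores) != len(weights):
        raise ValueError("one weight per model score is required")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


def voting_fusion(scores, cutoff=0.5):
    """Hard voting: fraction of models whose score is at or above the cutoff."""
    votes = [s >= cutoff for s in scores]
    return sum(votes) / len(votes)
```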
[0047] In an alternative embodiment, the fusion weights can be determined adaptively from each model's performance on validation data. For example, weights can be allocated according to metrics such as accuracy, recall, or F1 score, so that better-performing models receive higher weights in the fusion score. When the recognition scenario or data distribution changes, the weights can also be updated periodically to keep the fusion strategy effective. This adaptive weighting method is a preferred implementation and does not change the main flow of step S5.
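Metric-based weighting can be sketched by computing each model's F1 score on validation data and normalizing the scores to sum to 1. Proportional normalization, and the fall-back to equal weights when every metric is zero, are assumptions; the text does not fix the mapping from metric to weight.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 score of binary predictions for the given positive (theft) label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weights_from_metric(metric_values):
    """Proportional weights that are non-negative and sum to 1."""
    total = sum(metric_values)
    if total <= 0:
        return [1 / len(metric_values)] * len(metric_values)
    return [m / total for m in metric_values]
```

Re-running these two functions on a fresh validation window is one simple way to carry out the periodic weight update described above.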
[0048] In some other embodiments, this application further includes step S6: generating corresponding identification results based on the electricity theft risk level, outputting user number, risk probability and risk level, and graphically displaying the baseline curve, the load curve of the period to be identified and their differences.
[0049] In some embodiments, the identification results are output in the form of structured fields, including at least: object identification information, time window information for the identification period, comprehensive risk score, and corresponding risk level. Object identification information may be a user ID, device ID, or transformer area user ID, etc.; the time window information for the identification period may include the target day, target year, or the start and end times of a continuous identification period window. In addition to the core fields mentioned above, optional output fields may also include key difference indicators used to explain the risk conclusions, such as amplitude deviation summary indicators, shape similarity indicators, peak time offset, or trough time offset, etc., so that the risk score and risk level can be supported by interpretable difference features, thereby improving the understandability and traceability of the results.
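A structured identification result of the kind described above can be represented as a small record serialized to JSON. The class name, field names, and the JSON format are hypothetical choices for illustration; the text only lists the information the fields must carry.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IdentificationResult:
    object_id: str  # user ID, device ID, or transformer-area user ID
    window_start: str  # start of the identification window, e.g. "2024-01-01"
    window_end: str
    risk_score: float  # comprehensive risk score
    risk_level: str  # "normal", "suspicious" or "highly suspicious"
    # Optional interpretable indicators, e.g. {"amplitude_deviation": 0.31}
    difference_indicators: dict[str, float] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)
```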
[0050] To visually represent how the electricity consumption behavior of the period to be identified deviates from the baseline behavior, this embodiment provides a visual comparison of the baseline curve and the load curve of the period to be identified. For daily-scale visualization, the typical daily electricity consumption behavior curve is used as the daily-scale baseline curve and is overlaid with the load curve of the target day, with the time of day on the horizontal axis and the load or electricity consumption on the vertical axis, forming the curve comparison shown in Figure 4, which illustrates differences in intraday peak-valley patterns, load levels, and consumption distribution. For annual-scale visualization, the annual baseline curve is used as the annual-scale baseline curve and is overlaid with the curve of the target year or target period to be identified, with the day number within the year on the horizontal axis and the load or electricity consumption on the vertical axis, forming the curve comparison shown in Figure 5, from which annual trend shifts, seasonal variations, and long-term changes in load level can be observed.
[0051] In an optional embodiment, the visualization output can be presented in conjunction with difference indicators to enhance interpretability. For example, while generating curve comparisons, key difference information related to the curves can be output, including the maximum deviation interval, the main deviation period, and peak-valley shifts, thereby providing a clearer basis for risk level determination. To improve the display stability in scenarios with high noise or inconsistent data quality, this embodiment can optionally perform robust aggregation or smoothing on the curves, and reduce the weight of low-quality data samples in the visualization calculation to avoid distortion of curve comparison caused by occasional outliers; the above processing is only a preferred implementation method and does not change the main process of step S6.
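The key difference information and the robust smoothing mentioned above can be sketched as follows. Interpreting the "maximum deviation" as the single sample with the largest absolute deviation, the "main deviation period" as the longest contiguous run above a threshold, and using a 3-point rolling median for smoothing are all assumptions made for illustration.

```python
from statistics import median


def rolling_median(series, window=3):
    """Centered rolling median; damps isolated outliers before plotting."""
    half = window // 2
    return [median(series[max(0, i - half): i + half + 1]) for i in range(len(series))]


def max_deviation_point(baseline, observed):
    """Index of the sample with the largest absolute deviation from the baseline."""
    deviations = [abs(o - b) for b, o in zip(baseline, observed)]
    return deviations.index(max(deviations))


def main_deviation_period(baseline, observed, threshold):
    """Longest contiguous run (start, end inclusive) with |deviation| > threshold, or None."""
    best, start = None, None
    for i, (b, o) in enumerate(zip(baseline, observed)):
        if abs(o - b) > threshold:
            if start is None:
                start = i
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
        else:
            start = None
    return best
```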
[0052] In addition, this embodiment optionally provides a report export function, which is used to output the identification results, main difference indicators, and corresponding curve comparison results in electronic file form for easy archiving and sharing. The report may include information about the object to be identified, the time window to be identified, risk score, risk level, summary of difference indicators, and curve comparison chart, etc.; the report format may be one or more of table files, document files, or image files. Through step S6, this embodiment realizes the structured output, visualization, and optional report generation of the identification results, thereby completing the electricity theft risk identification process based on the deviation of the benchmark curve and providing an implementable output format for engineering applications.
[0053] Example 2: As shown in Figure 2, a user electricity theft identification system based on load modeling and multi-model detection includes: a data acquisition and preprocessing module, which acquires and preprocesses the historical electricity metering data and the electricity metering data of the period to be identified for low-voltage distribution users; a baseline curve modeling module, which classifies and summarizes the preprocessed historical electricity metering data according to different dimensions and averages the daily load curves of the same date point by point to obtain the user baseline curves, including the annual baseline load curve and the typical daily electricity consumption behavior curve; a difference construction module, which, for the period to be identified, aggregates the preprocessed electricity metering data by day or by hour to construct the load curve of the period to be identified, calculates several difference features between this load curve and the baseline curve, and concatenates all difference features into a difference feature vector; a multi-model joint identification module, which concatenates the statistical features of the load curve of the period to be identified with the difference feature vector to obtain the load feature sequence, extracts features with the deep neural network, and inputs them into at least one machine learning classifier to obtain the electricity theft probabilities output by multiple classifiers; an electricity theft assessment module, which fuses the electricity theft risk probabilities output by all classifiers to obtain the user's electricity theft risk level; and a visualization and report generation module, which graphically displays the electricity theft identification results together with the baseline curve, the load curve of the period to be identified, and their differences.
[0054] For more detailed information on the above modules, please refer to the relevant content disclosed in the foregoing embodiments, which will not be repeated here.
[0055] The embodiments in this specification are described progressively, with each embodiment focusing on its differences from the others; for parts that are identical or similar between embodiments, the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for the relevant parts, refer to the description of the method. Those skilled in the art will clearly understand that the technologies in the embodiments of this invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of this invention, in essence or in the parts that contribute to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in certain parts of the embodiments, of this invention.
[0056] The above are merely preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above embodiments. All technical solutions falling within the scope of the present invention's concept are within the scope of protection of the present invention. It should be noted that for those skilled in the art, any improvements and modifications made without departing from the principles of the present invention should be considered within the scope of protection of the present invention.
Claims
1. A user electricity theft identification method based on load modeling and multi-model detection, characterized in that it includes the following steps: S1: acquiring historical and target-period electricity metering data of low-voltage distribution users and preprocessing the data; S2: classifying and summarizing the preprocessed historical electricity metering data according to different dimensions, and averaging the daily load curves of the same date point by point to obtain the user baseline curves, including an annual baseline load curve and a typical daily electricity consumption behavior curve; S3: for the period to be identified, aggregating the preprocessed electricity metering data by day or by hour to construct the load curve of the period to be identified, calculating several difference features between the load curve of the period to be identified and the baseline curve, and concatenating all the difference features into a difference feature vector; S4: concatenating the statistical features of the load curve of the period to be identified with the difference feature vector to obtain a load feature sequence, extracting features with a deep neural network, and inputting the extracted features into at least one machine learning classifier to obtain the electricity theft probabilities output by multiple classifiers; S5: combining the electricity theft risk probabilities output by all classifiers to obtain the user's electricity theft risk level.
2. The user electricity theft identification method based on load modeling and multi-model detection according to claim 1, characterized in that in step S1, the preprocessing includes: standardization; filling missing values in the electricity metering data with the median electricity consumption of the same user in adjacent time windows; and removing or truncating abnormal electricity metering data that exceed preset upper and lower limits.
3. The user electricity theft identification method based on load modeling and multi-model detection according to claim 1, characterized in that in step S2, the typical daily electricity consumption behavior curve includes a weekday electricity consumption behavior curve, a non-weekday electricity consumption behavior curve, a seasonal weekday electricity consumption behavior curve, and a seasonal non-weekday electricity consumption behavior curve; the annual baseline load curve takes the day number within the year as its independent variable, and the typical daily electricity consumption behavior curve takes the hour within the day as its independent variable.
4. The user electricity theft identification method based on load modeling and multi-model detection according to claim 3, characterized in that obtaining the user baseline curve in step S2 specifically includes: grouping the historical electricity metering data by user type, season, and working versus non-working day; averaging, point by point, the daily load curves with the same day number in each group to obtain the annual baseline load curve and the typical daily electricity consumption behavior curve; and smoothing the baseline curves with a moving average or a low-pass filter.
5. The user electricity theft identification method based on load modeling and multi-model detection according to claim 1, characterized in that in step S3, if the period to be identified is measured on an intraday or interday hourly basis, the difference features between the load curve of the period to be identified and the corresponding period of the typical daily electricity consumption behavior curve are calculated; if the period to be identified spans several days or a year, the difference features between the load curve of the period to be identified and the corresponding period of the annual baseline load curve are calculated.
6. The user electricity theft identification method based on load modeling and multi-model detection according to claim 1, characterized in that the difference features in step S3 include at least one of amplitude deviation, shape similarity, and temporal offset; the amplitude deviation $D_A$ is calculated over the $N$ sampling points of the period to be identified, where $B_i$ and $L_i$ are the values of the baseline curve and of the load curve of the period to be identified at sampling point $i$, and $\varepsilon$ is a minimum power threshold that prevents the denominator from being zero.
7. The user electricity theft identification method based on load modeling and multi-model detection according to claim 1, characterized in that the deep neural network in step S4 is a one-dimensional convolutional neural network comprising, connected in sequence, an input layer, at least two one-dimensional convolutional layers and a pooling layer, a fully connected layer, and a penultimate layer that outputs a high-dimensional representation vector; the one-dimensional convolutional layers extract local patterns and periodic features of the load feature sequence in the time dimension, and the output of the penultimate layer is provided as the high-dimensional representation vector to the multiple subsequent machine learning classifiers; the machine learning classifiers include a random forest classifier, a support vector machine classifier, a multilayer perceptron, and a gradient boosting tree; and the statistical features of the load curve of the period to be identified in step S4 include at least one of the following: peak value, valley value, peak-valley difference, average load, load factor, volatility index, and peak occurrence time.
8. The user electricity theft identification method based on load modeling and multi-model detection according to claim 1, characterized in that in step S5, the electricity theft risk probabilities output by all classifiers are fused by voting fusion or weighted fusion to obtain the final electricity theft risk probability, and the electricity theft risk level is then divided according to adaptive thresholds into normal, suspicious, and serious; the adaptive thresholds are determined by constructing an empirical distribution of the comprehensive risk probabilities of historical normal users, taking the upper $\alpha_1$ quantile of this distribution as the first threshold $\tau_1$, between the normal and suspicious levels, and the upper $\alpha_2$ quantile as the second threshold $\tau_2$, between the suspicious and highly suspicious levels, where $0<\alpha_2<\alpha_1<1$ so that $\tau_1<\tau_2$; for a comprehensive risk probability $P$, the user is judged normal when $P<\tau_1$, suspicious when $\tau_1 \le P < \tau_2$, and highly suspicious when $P \ge \tau_2$.
9. The user electricity theft identification method based on load modeling and multi-model detection according to claim 1, characterized in that it further includes step S6: generating corresponding identification results based on the electricity theft risk level, outputting the user number, risk probability, and risk level, and graphically displaying the baseline curve, the load curve of the period to be identified, and their differences.
10. A user electricity theft identification system based on load modeling and multi-model detection, characterized in that it comprises: a data acquisition and preprocessing module, which acquires and preprocesses the historical electricity metering data and the electricity metering data of the period to be identified for low-voltage distribution users; a baseline curve modeling module, which classifies and summarizes the preprocessed historical electricity metering data according to different dimensions and averages the daily load curves of the same date point by point to obtain the user baseline curves, including the annual baseline load curve and the typical daily electricity consumption behavior curve; a difference construction module, which, for the period to be identified, aggregates the preprocessed electricity metering data by day or by hour to construct the load curve of the period to be identified, calculates several difference features between this load curve and the baseline curve, and concatenates all difference features into a difference feature vector; a multi-model joint identification module, which concatenates the statistical features of the load curve of the period to be identified with the difference feature vector to obtain the load feature sequence, extracts features with the deep neural network, and inputs them into at least one machine learning classifier to obtain the electricity theft probabilities output by multiple classifiers; an electricity theft assessment module, which fuses the electricity theft risk probabilities output by all classifiers to obtain the user's electricity theft risk level; and a visualization and report generation module, which graphically displays the electricity theft identification results together with the baseline curve, the load curve of the period to be identified, and their differences.