A user energy consumption profile analysis method driven by multi-source data
A user energy consumption profiling analysis method driven by multi-source data uses the SAX algorithm for dimensionality reduction, a simulated annealing particle swarm optimization algorithm, and an improved AP clustering algorithm to address low participation willingness and poor economic efficiency in user-side demand response, enabling in-depth analysis of user energy consumption behavior and optimized use of user-side resources.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: STATE GRID SHANGHAI MUNICIPAL ELECTRIC POWER CO
- Filing Date: 2022-12-19
- Publication Date: 2026-06-12
AI Technical Summary
Existing technologies for user-side demand response suffer from weak user participation, poor economic efficiency, and immature business models, and they lack in-depth data analysis models of end users' comprehensive energy consumption behavior.
A user energy consumption profile analysis method based on multi-source data is adopted: the time series symbolic aggregation approximation (SAX) algorithm reduces dimensionality, a simulated annealing particle swarm optimization algorithm and an improved AP clustering algorithm are used to analyze user energy consumption characteristics, and the CRITIC weighting method determines the indicator weights from which user energy consumption profiles are constructed.
It enables in-depth analysis of users' energy consumption behavior, improves users' willingness to participate and economic efficiency, builds a reasonable business model, and enhances the efficiency of user-side resource utilization.
Figure CN116304295B_ABST
Description
Technical Field
[0001] This invention relates to an analysis method, and more particularly to a user energy consumption profile analysis method based on multi-source data and its application.
Background Technology
[0002] User-side resources are generally utilized in three ways: peak shaving, valley filling, and precise real-time load control. Using these resources can reduce power system investment while maintaining the balance among generation, grid, and load, promoting the consumption of renewable energy, and mitigating the risk of environmental accidents. Smart energy, with big data technology at its core, can better understand user needs, allocate energy rationally, ensure that users' daily production and living needs are met, place greater emphasis on user experience, and let individual users complement one another's strengths. It can also create new models that turn energy data into public social value, balance energy supply and demand, and contribute to industrial upgrading and improvements in people's livelihoods.
[0003] Regarding user-side demand response, many experts in China and abroad have studied the optimization of user-side demand and identified problems such as weak user participation, poor economic viability of user-side projects, and immature business models. Among methods for analyzing user behavior characteristics, common data extraction methods include PCA evolutionary transformation and the k-means algorithm. PCA evolutionary transformation can handle large-scale data, retain the key information of the original data, reduce dimensionality, and improve clustering quality; the k-means method is simple and convenient, has a good clustering success rate, and scales well. Current research mainly focuses on analyzing customers' comprehensive energy consumption behavior, but data analysis models of end users' comprehensive energy consumption behavior from the perspective of integration capability are still at an exploratory stage. This research aims to address the current problems at the user end of integrated energy systems and to fill the research gap in this direction.
Summary of the Invention
[0004] To address the shortcomings of existing technologies, this invention discloses a user energy consumption profile analysis method based on multi-source data, the technical solution of which is as follows:
[0005] A user energy consumption profiling analysis method based on multi-source data, characterized by comprising the following steps:
[0006] Step 1: Use the time series symbolic aggregation approximation SAX algorithm to reduce the dimensionality of the load curve and extract features;
[0007] Step 2: Optimize the extracted features using a simulated annealing particle swarm optimization algorithm;
[0008] Step 3: Based on the user's energy consumption characteristics, perform cluster analysis on the load curve using the improved AP clustering algorithm;
[0009] Step 4: Analyze the energy consumption behavior of various types of users based on the clustering results.
[0010] The present invention also discloses a non-volatile storage medium, characterized in that the non-volatile storage medium includes a stored program, wherein the program, when running, controls the device where the non-volatile storage medium is located to execute the above-described method.
[0011] The present invention also discloses an electronic device, characterized in that it comprises a processor and a memory; the memory stores computer-readable instructions, and the processor is used to execute the computer-readable instructions, wherein the computer-readable instructions, when executed, perform the method described above.
[0012] This invention also discloses a user energy consumption profile analysis device based on multi-source data, characterized by comprising the following modules:
[0013] Dimensionality reduction and feature extraction module: used to reduce the dimensionality of the load curve and extract features using the time series symbolic aggregation approximation (SAX) algorithm;
[0014] Simulated annealing particle swarm optimization algorithm module: used to formulate the optimization of the SAX representation of the load-curve time series as a multi-objective optimization problem and solve it with the simulated annealing particle swarm optimization algorithm;
[0015] Clustering analysis module: Based on user energy consumption characteristics, the improved AP clustering algorithm is used to perform clustering analysis on the load curve;
[0016] Energy consumption analysis module for various user types: Analyzes the energy consumption behavior of various user types based on clustering results.
[0017] Beneficial effects
[0018] Based on users' current energy consumption status, this invention employs a suitable profile information extraction algorithm and an improved AP clustering algorithm to mine useful information from energy consumption data and apply it to the analysis of users' diverse energy consumption behaviors, thereby capturing the characteristics of users' energy consumption and constructing a set of user energy consumption behavior profiles.
Attached Figure Description
[0019] Figure 1 is a flowchart of the improved AP clustering algorithm of this invention.
[0020] Figure 2 shows the cluster center curves of the user dataset of this invention.
Detailed Implementation
[0021] Example 1
[0022] This invention discloses a user energy consumption profile analysis method based on multi-source data, including the following:
[0023] (1) Time Series Symbolic Aggregation Approximation Method Based on Particle Swarm Optimization
(1.1) Principle of the Time Series Symbolic Aggregation Approximation Algorithm
[0024] Symbolic Aggregation Approximation (SAX) is a method for representing continuous time series using symbolic representation. It converts time series into strings and exhibits good dimensionality reduction performance for high-dimensional sequences. The specific steps are as follows:
[0025] Step 1: Convert the n-dimensional time series into a w-dimensional vector. The original load curve $X = [x_1, x_2, \ldots, x_n]$ is approximated by piecewise aggregation, dividing the data into w segments. The i-th element is calculated as follows:
[0026] $$\bar{x}_i = \frac{w}{n}\sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w}i} x_j$$
[0027] The original n-dimensional time series vector is thus divided into w segments and reduced to w dimensions, where $x_j$ is the j-th element of the original load curve, $\bar{x}_i$ is the mean of the i-th segment, and $n/w$ is the compression ratio.
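As a minimal sketch of the PAA step above (Python with NumPy; the function name `paa` is hypothetical), the following assumes that n is an exact multiple of w, which the method does not state explicitly:

```python
import numpy as np


def paa(x, w):
    """Piecewise aggregate approximation: the mean of each of w equal-length segments."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n % w != 0:
        # Assumption of this sketch: uneven segments are not handled.
        raise ValueError("this sketch requires n to be a multiple of w")
    # Reshape into w rows of n/w points each, then average every row.
    return x.reshape(w, n // w).mean(axis=1)


# Example: an 8-point curve reduced to 4 segments (compression ratio n/w = 2).
print(paa([1, 2, 3, 4, 5, 6, 7, 8], 4))  # → [1.5 3.5 5.5 7.5]
```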
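[0028] Step 2: Symbolize the sequence obtained by piecewise aggregate approximation (PAA). Each time series is first normalized and converted into its PAA representation, and each PAA element is then mapped to a symbol:
[0029] $$\hat{c}_i = \alpha_j, \quad \text{if } \beta_{j-1} \le \bar{c}_i < \beta_j$$
[0030] where $\bar{c}_i$ is the i-th PAA element of a normalized sequence of length n and $\hat{c}_i$ is its symbol; $\alpha_j$ is the j-th element of the alphabet; $\beta_{j-1}$ and $\beta_j$ are the (j-1)-th and j-th values in the list of Gaussian breakpoints, which divide the standard Gaussian distribution into equiprobable regions.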
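The symbol mapping can be sketched as below. This is an illustration under stated assumptions: the breakpoints are taken as the standard normal quantiles at k/l (the usual SAX convention), letters are written as "a", "b", ..., and the names `gaussian_breakpoints`, `znormalize` and `symbolize` are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def gaussian_breakpoints(alphabet_size):
    """Breakpoints beta_1..beta_{l-1} splitting N(0, 1) into equiprobable regions."""
    nd = NormalDist()  # standard normal distribution
    return np.array([nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)])


def znormalize(x):
    """Normalize a series to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()


def symbolize(paa_coeffs, alphabet_size):
    """Map each PAA element c to the letter alpha_j with beta_{j-1} <= c < beta_j."""
    beta = gaussian_breakpoints(alphabet_size)
    # side="right" counts the breakpoints <= c, which is exactly the 0-based letter index.
    idx = np.searchsorted(beta, paa_coeffs, side="right")
    return "".join(chr(ord("a") + int(j)) for j in idx)


# Hypothetical 6-point curve: normalize, average pairs (w = 3), map to 3 letters.
coeffs = znormalize([1, 1, 2, 2, 9, 9]).reshape(3, 2).mean(axis=1)
print(symbolize(coeffs, 3))  # → aac
```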
[0031] Step 3: After the dimensionality of the time series is reduced, false negatives may occur in feature-space queries; the lower bounding theorem is used to guarantee that none occur. Extending this to SAX, the n-dimensional time series Q and C are transformed into w-dimensional vectors to obtain their PAA representations. Substituting the dimensionality reduction formula into the Euclidean distance yields the PAA distance measure:
[0032] $$D_{PAA}(\bar{Q}, \bar{C}) = \sqrt{\frac{n}{w}}\sqrt{\sum_{i=1}^{w}\left(\bar{q}_i - \bar{c}_i\right)^2}$$
[0033] where $\bar{Q}$ and $\bar{C}$ are the dimensionality-reduced time series of Q and C, and $\bar{q}_i$ and $\bar{c}_i$ are their i-th elements. The data are then converted into symbolic form, and the MINDIST function is defined; it returns the minimum distance between the original time series represented by two words:
[0034] $$MINDIST(\hat{Q}, \hat{C}) = \sqrt{\frac{n}{w}}\sqrt{\sum_{i=1}^{w}\operatorname{dist}\left(\hat{q}_i, \hat{c}_i\right)^2}$$
where $\operatorname{dist}(\cdot,\cdot)$ is the distance between two symbols, obtained from the Gaussian breakpoints.
[0035] Step 4: The representation can be optimized by improving the tightness of lower bound (TLB), defined here as:
[0036] $$TLB = \frac{MINDIST(\hat{Q}, \hat{C})}{D(Q, C)}$$
[0037] where D(Q, C) is the Euclidean distance between time series Q and C. TLB lies between 0 and 1; the closer it is to 1, the closer the lower-bound distance is to the true distance and the smaller the error.
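The following self-contained sketch puts Steps 1 to 4 together and computes MINDIST and TLB for two z-normalized curves. It assumes the standard SAX symbol-distance table (zero for identical or adjacent letters, otherwise the gap between the breakpoints that separate them) and that w divides n; all function names and the random-walk test curves are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def znormalize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def paa(x, w):
    # Assumes len(x) is a multiple of w.
    return x.reshape(w, -1).mean(axis=1)


def breakpoints(a):
    nd = NormalDist()
    return [nd.inv_cdf(k / a) for k in range(1, a)]


def sax_indices(x, w, a):
    """0-based SAX letter indices of an already z-normalized series."""
    return np.searchsorted(breakpoints(a), paa(x, w), side="right")


def symbol_dist(r, c, beta):
    """Lookup-table distance between letters r and c (0-based indices)."""
    if abs(r - c) <= 1:
        return 0.0
    return beta[max(r, c) - 1] - beta[min(r, c)]


def mindist(q_idx, c_idx, n, a):
    beta = breakpoints(a)
    w = len(q_idx)
    total = sum(symbol_dist(int(r), int(c), beta) ** 2 for r, c in zip(q_idx, c_idx))
    return math.sqrt(n / w) * math.sqrt(total)


def tlb(q, c, w, a):
    """Tightness of lower bound: MINDIST / true Euclidean distance (both z-normalized)."""
    q, c = znormalize(q), znormalize(c)
    lower = mindist(sax_indices(q, w, a), sax_indices(c, w, a), q.size, a)
    return lower / np.linalg.norm(q - c)


rng = np.random.default_rng(0)
q = rng.normal(size=96).cumsum()  # two synthetic random-walk "load curves"
c = rng.normal(size=96).cumsum()
print(round(tlb(q, c, w=8, a=6), 3))
```

Because MINDIST never exceeds the true distance, the printed TLB always falls in [0, 1]; searching over w and the alphabet size trades this tightness against compression.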
[0038] (1.2) Simulated annealing particle swarm optimization algorithm
[0039] Particle Swarm Optimization (PSO) is a swarm-based optimization algorithm with global search capability that searches for the optimum iteratively. The system is initialized with a set of random solutions, and the particles (candidate solutions) search for the optimum by following the best solutions found so far. However, PSO is prone to becoming trapped in local optima, which leads to slow convergence and poor accuracy in the later stages of evolution. To overcome this, this paper proposes a simulated annealing-based PSO algorithm, which retains the global optimization capability of traditional PSO, is simple, and effectively prevents the search from becoming trapped in local optima.
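The description does not give the velocity update, annealing schedule or parameter values, so the sketch below is only one common way to combine the two ideas, not the patented procedure: standard PSO velocity updates, plus a Metropolis rule that lets a worse position replace a particle's personal best with probability exp(-Δ/T) under geometric cooling. All parameter values and names are hypothetical.

```python
import numpy as np


def sa_pso(f, lower, upper, n_particles=30, n_iter=200, t0=1.0, cooling=0.9,
           inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lower, upper] with a simulated-annealing PSO hybrid."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))
    v = np.zeros_like(x)
    fx = np.array([f(p) for p in x])
    pbest, pbest_f = x.copy(), fx.copy()
    g = int(np.argmin(fx))
    best_x, best_f = x[g].copy(), float(fx[g])  # best solution ever seen
    temp = t0
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, lower.size))
        v = inertia * v + c1 * r1 * (pbest - x) + c2 * r2 * (best_x - x)
        x = np.clip(x + v, lower, upper)
        fx = np.array([f(p) for p in x])
        # Metropolis rule: a worse position may still replace a particle's personal
        # best with probability exp(-delta / T), which helps escape local optima.
        delta = fx - pbest_f
        accept = (delta <= 0) | (rng.random(n_particles) < np.exp(-np.maximum(delta, 0) / temp))
        pbest[accept], pbest_f[accept] = x[accept], fx[accept]
        i = int(np.argmin(fx))
        if fx[i] < best_f:
            best_x, best_f = x[i].copy(), float(fx[i])
        temp *= cooling  # geometric cooling schedule
    return best_x, best_f


# Toy usage on the sphere function (not the objective of this method).
best_x, best_f = sa_pso(lambda p: float(np.sum(p ** 2)), [-5, -5], [5, 5])
print(best_x, best_f)
```

To search SAX parameters, f would round its two inputs to integers w and l within the bounds of (10) and (11) and return a scalar score built from A, E and R; that scoring function is an assumption left to the reader, since the combined objective is not spelled out in this text.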
[0040] Using the simulated annealing particle swarm optimization algorithm, the optimization of the SAX representation of the load-curve time series (that is, the choice of the number of segments w and the number of characters l) is formulated as a multi-objective optimization problem with the following objective function:
[0041]
[0042] where:
[0043]
[0044]
[0045]
[0046] $$2 \le l \le l_m \quad (10)$$
[0047] $$2 \le w \le w_m \quad (11)$$
[0048] In the formula, A is the accuracy, which reflects how well the segmented load curve represents the original load curve; E is the information content, measured by information entropy (the smaller the information entropy, the greater the accuracy of predictions made from the existing signal, and the more information it contains); R is the simplification rate, which reflects the degree of compression of the original load curve. A is the correlation coefficient between the PAA of the load curve and the original load curve $X_i$: because the two have different dimensions, the PAA sequence is first expanded by spline interpolation into a sequence with the same dimension as $X_i$, and the correlation coefficient is then calculated. $p_i$ is the probability that character i occurs in the symbol string of $X_i$; $l_m$ is the maximum number of characters and $w_m$ is the maximum number of segments, and this paper takes $l_m = w_m = 10$; μ is the weighting coefficient used to combine the two simplification parameters, and this paper takes μ = 0.5.
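Two of the three metrics can be sketched directly from their descriptions. The sketch below assumes SciPy's `CubicSpline` as the spline, segment midpoints as the interpolation nodes, and base-2 logarithms for the entropy; the simplification rate R is omitted because its formula is not given. Function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def accuracy_a(x, w):
    """A: correlation between the original curve and its spline-expanded PAA."""
    x = np.asarray(x, dtype=float)
    n = x.size
    seg = n // w                                  # assumes w divides n
    coeffs = x.reshape(w, seg).mean(axis=1)       # PAA values
    centers = (np.arange(w) + 0.5) * seg - 0.5    # segment midpoints on the sample axis
    expanded = CubicSpline(centers, coeffs)(np.arange(n))  # back to n points
    return float(np.corrcoef(x, expanded)[0, 1])


def entropy_e(symbols):
    """E: Shannon entropy (base 2, an assumption) of the character frequencies p_i."""
    _, counts = np.unique(list(symbols), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Hypothetical inputs: a 24-point ramp split into 4 segments, and a short SAX word.
print(accuracy_a(np.arange(24), 4), entropy_e("aabb"))
```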
[0049] Each candidate representation is evaluated with the three metrics A, R, and E, and the segmented load-curve representation with the best overall performance is taken as optimal.
[0050] (2) AP clustering algorithm based on the optimized time series symbolic aggregation approximation algorithm and energy consumption characteristic indicators
[0051] (2.1) Description of user energy consumption characteristics
[0052] When processing user energy consumption data, appropriate feature extraction techniques preserve the quality of the results while reducing the computational load. Data mining aims to obtain data with a clearer physical meaning, enabling power companies to study and process the relevant data more effectively, for example for early warning, anomaly analysis, and demand-side management based on energy consumption data. Furthermore, by combining the discrete and time-domain characteristics obtained from key demand-side data with the features obtained through the time series symbolic aggregation approximation technique, the dimensionality of the load curve can be reduced, allowing its intrinsic meaning to be analyzed more efficiently and intuitively and evaluated more completely.
[0053] User energy consumption characteristic indicators reflect the internal patterns of load curves and allow useful information to be extracted quickly and efficiently from high-dimensional load curves. This paper introduces three typical categories of energy consumption characteristic indicators: energy load level, energy consumption stability, and energy interaction capability. Specific indicators, namely average daily load, daily load factor, peak energy consumption rate, and off-peak electricity coefficient, are selected as feature vectors for clustering the load curves. Used as the main data feature vectors together with the discrete features from the optimized SAX, these indicators comprehensively reflect the time-domain and state characteristics of the load curves and serve as the basis for load curve clustering. The selected indicators are shown in Table 1.
[0054] Table 1 User Energy Consumption Characteristics Indicators of Integrated Energy System
[0055]
[0056]
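The four indicators are not defined in the text, so the definitions below are assumptions based on common usage: the daily load factor as average load divided by peak load, and the peak and off-peak indicators as the shares of daily energy consumed in peak and valley hours. The tariff hours `PEAK_HOURS` and `VALLEY_HOURS` and the function name are hypothetical placeholders.

```python
import numpy as np

# Hypothetical tariff periods (hour indices of a 24-point daily curve); the method
# does not define them, so they should be replaced by the local time-of-use schedule.
PEAK_HOURS = list(range(8, 12)) + list(range(18, 22))
VALLEY_HOURS = list(range(0, 8))


def load_indicators(curve):
    """Four indicators for one 24-point daily load curve (assumed definitions)."""
    p = np.asarray(curve, dtype=float)
    if p.size != 24:
        raise ValueError("this sketch expects one value per hour")
    total = p.sum()
    return {
        "average_daily_load": p.mean(),
        "daily_load_factor": p.mean() / p.max(),             # average load / peak load
        "peak_consumption_rate": p[PEAK_HOURS].sum() / total,  # share of energy in peak hours
        "valley_coefficient": p[VALLEY_HOURS].sum() / total,   # share of energy in valley hours
    }


print(load_indicators([1.0] * 8 + [3.0] * 16))  # hypothetical curve
```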
[0057] (2.2) CRITIC weighting method
[0058] To avoid subjectivity in setting users' energy consumption characteristic indicators, this paper adopts the CRITIC weighting method to evaluate the contribution of each characteristic indicator to the clustering results and to determine the indicator weights objectively. The basic idea is to measure the objective weight of each indicator by combining its comparative strength with its conflict with the other indicators. Comparative strength uses the standard deviation to characterize the differences within each evaluation indicator: the larger the standard deviation, the more information the indicator contains. Conflict represents the correlation between different indicators: the larger the correlation coefficient between two indicators, the stronger their correlation and the lower the conflict between them.
[0059] The specific steps for obtaining objective weights using the CRITIC weighting method are as follows:
[0060] 1) Indicator normalization. Suppose there are m evaluation objects and n evaluation indicators. Because different indicators affect the final evaluation result in different directions, positive indicators and reverse indicators are normalized separately.
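[0061] Positive indicators are normalized as in (12):
[0062] $$b_{ij} = \frac{a_{ij} - \min_i a_{ij}}{\max_i a_{ij} - \min_i a_{ij}} \quad (12)$$
[0063] Reverse indicators are normalized as in (13):
[0064] $$b_{ij} = \frac{\max_i a_{ij} - a_{ij}}{\max_i a_{ij} - \min_i a_{ij}} \quad (13)$$
[0065] In the formula: i = 1, 2, ..., m; j = 1, 2, ..., n; $a_{ij}$ is the actual value of the j-th indicator for the i-th user, and $b_{ij}$ is the normalized value of the j-th indicator for the i-th user.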
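Assuming the min-max forms of (12) and (13), the normalization can be sketched with NumPy as follows; `normalize_indicators` is a hypothetical name, and which indicators count as positive or reverse has to be decided indicator by indicator.

```python
import numpy as np


def normalize_indicators(a, positive):
    """Min-max normalize each indicator column of an m x n matrix, per (12)-(13).

    positive[j] is True for a positive indicator and False for a reverse one.
    """
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(axis=0), a.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    pos = (a - lo) / span                   # Eq. (12)
    rev = (hi - a) / span                   # Eq. (13)
    return np.where(np.asarray(positive, dtype=bool), pos, rev)


# Three users, two hypothetical indicators: the first positive, the second reverse.
print(normalize_indicators([[1, 10], [3, 20], [5, 30]], positive=[True, False]))
```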
[0066] 2) Calculate the correlation coefficients of the evaluation indicator matrix. The correlation coefficient describes the conflict between indicators: a significant positive correlation between two indicators indicates less conflict and therefore a lower weight. The correlation coefficient is calculated as shown in equation (14):
[0067] $$r_{ij} = \frac{\sum_{k=1}^{m}\left(b_{ki}-\bar{b}_i\right)\left(b_{kj}-\bar{b}_j\right)}{\sqrt{\sum_{k=1}^{m}\left(b_{ki}-\bar{b}_i\right)^2}\sqrt{\sum_{k=1}^{m}\left(b_{kj}-\bar{b}_j\right)^2}} \quad (14)$$
[0068] Where: i = 1, 2, ..., n; j = 1, 2, ..., n; $r_{ij}$ is the correlation coefficient between the i-th and j-th indicators, and $\bar{b}_j$ is the mean of the normalized values of the j-th indicator.
[0069] 3) Calculate the weights. Using the correlation coefficient matrix, calculate the comparative strength and the conflict of each evaluation indicator, as shown in equation (15):
[0070] $$\sigma_j = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(b_{ij}-\bar{b}_j\right)^2}, \qquad R_j = \sum_{i=1}^{n}\left(1 - r_{ij}\right) \quad (15)$$
[0071] Where: j = 1, 2, ..., n; $\sigma_j$ is the comparative strength of the j-th indicator, expressed as its standard deviation; $r_{ij}$ is the correlation coefficient between the i-th and j-th indicators; $R_j$ quantifies the degree of conflict between the j-th indicator and the other indicators. Based on the comparative strength and conflict, the amount of information contained in each indicator is calculated as shown in equation (16):
[0072] $$G_j = \sigma_j R_j \quad (16)$$
[0073] The larger $G_j$ is, the more information the j-th indicator contains and the greater its weight should be.
[0074] The final objective weight $W_j$ of the j-th indicator is:
[0075] $$W_j = \frac{G_j}{\sum_{j=1}^{n} G_j} \quad (17)$$
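Steps 2) and 3) can be combined into a short NumPy sketch that maps a normalized indicator matrix (rows are users, columns are indicators) to objective weights. It assumes Pearson correlation for (14) and the population standard deviation for σ_j; the sample standard deviation would scale every σ_j by the same factor and leave the weights W_j unchanged. The function name is hypothetical.

```python
import numpy as np


def critic_weights(b):
    """CRITIC objective weights from an already-normalized m x n indicator matrix."""
    b = np.asarray(b, dtype=float)
    sigma = b.std(axis=0)                 # comparative strength sigma_j, Eq. (15)
    r = np.corrcoef(b, rowvar=False)      # indicator correlation matrix r_ij, Eq. (14)
    conflict = (1.0 - r).sum(axis=0)      # conflict R_j = sum_i (1 - r_ij), Eq. (15)
    info = sigma * conflict               # information content G_j, Eq. (16)
    return info / info.sum()              # objective weight W_j, Eq. (17)


# Hypothetical normalized matrix: 4 users x 3 indicators (columns 0 and 2 identical).
b = np.array([[0.0, 0.2, 0.0], [0.5, 1.0, 0.5], [1.0, 0.0, 1.0], [0.3, 0.6, 0.3]])
print(critic_weights(b))
```

A constant indicator column makes its correlations undefined (NaN), so such columns should be dropped before weighting.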
[0076] (2.3) Improved AP clustering algorithm
[0077] The AP clustering algorithm has advantages such as not requiring the number of clusters to be specified in advance and minimizing the sum of squared clustering errors, but its computational complexity is relatively high, so it often needs considerable computation time for multidimensional data. This paper therefore speeds up the calculation of the AP similarity matrix by selecting the discrete state variables and energy consumption characteristics of the load curve, and adjusts the bias parameter (preference) to improve clustering efficiency.
[0078] 1) Improve the similarity matrix
[0079] $$s(i,j) = -\left[\alpha d_{d,ij} + (1-\alpha) d_{t,ij}\right], \quad i \neq j \quad (18)$$
[0080]
[0081] Where s(i,j) is an element of the improved similarity matrix; $d_{d,ij}$ is the distance between the discrete state characteristics of load curves i and j obtained by SAX, and $d_{t,ij}$ is the distance between their energy consumption characteristics, measured by Euclidean distance; α is the feature weight coefficient.
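A sketch of the improved similarity matrix follows. How the discrete-state distance d_d is computed from the SAX words is not specified, so it is passed in precomputed (a MINDIST matrix would be one option); the diagonal is filled with the initial bias parameter p_m, the median of the off-diagonal entries. Function names are hypothetical.

```python
import numpy as np


def pairwise_euclidean(features):
    """Euclidean distance matrix between the rows of a feature matrix."""
    f = np.asarray(features, dtype=float)
    return np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)


def improved_similarity(d_discrete, d_features, alpha):
    """Eq. (18): s(i, j) = -[alpha * d_d,ij + (1 - alpha) * d_t,ij] for i != j."""
    s = -(alpha * np.asarray(d_discrete, float) + (1 - alpha) * np.asarray(d_features, float))
    off_diag = ~np.eye(len(s), dtype=bool)
    # Initial bias parameter p_m: median of the off-diagonal similarities.
    np.fill_diagonal(s, np.median(s[off_diag]))
    return s


# Hypothetical inputs: three curves with 2-D indicator vectors and zero SAX distance.
d_t = pairwise_euclidean([[0, 0], [3, 4], [6, 8]])
print(improved_similarity(np.zeros((3, 3)), d_t, alpha=0.5))
```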
[0082] 2) Improve the bias parameter
[0083] The element s(i,i) on the main diagonal of the similarity matrix is the bias parameter, and its value is related to the number of clustering results. Using clustering evaluation metrics to select a reasonable bias parameter value can effectively reduce the number of algorithm iterations and improve clustering accuracy.
[0084] The AP clustering algorithm is stable: the Davies-Bouldin (DB) index used to evaluate clustering performance varies little across multiple runs. The DB index is therefore used as the criterion for selecting the bias parameter and as the convergence criterion of the AP clustering algorithm, as shown in the following equation:
[0085] $$s(i,i) = p_m + \delta \, DB_{min} \quad (20)$$
[0086] Where $p_m$, the initial value, is the median of all elements not on the main diagonal; $DB_{min}$ is the minimum DB value calculated so far by the algorithm; δ is the search threshold, with δ > 0 for a forward search and δ < 0 for a backward search. The DB index is calculated as shown in (21); the smaller its value, the lower the similarity between clusters and the better the clustering effect.
[0087] $$DB = \frac{1}{n}\sum_{i=1}^{n}\max_{j \neq i}\left(\frac{W_i + W_j}{C_{ij}}\right) \quad (21)$$
[0088] In the formula, n is the number of clusters; $W_i$ and $W_j$ are the average distances from the data points in the i-th and j-th clusters to their respective cluster centers; $C_{ij}$ is the distance between the centers of clusters i and j.
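The sketch below combines standard affinity propagation message passing (damped responsibilities and availabilities), the DB index of (21) computed with cluster centroids and Euclidean distances, and a literal reading of (20) as a preference search that stops once DB_min no longer decreases. The stopping rule, the damping value and the use of centroids rather than exemplars as cluster centers are assumptions, and all names are hypothetical.

```python
import numpy as np


def affinity_propagation(s, preference, damping=0.5, max_iter=300, conv_iter=15):
    """Standard AP message passing; returns a cluster label for each point."""
    s = np.array(s, dtype=float)
    np.fill_diagonal(s, preference)  # s(i, i) is the bias parameter
    n = len(s)
    rows = np.arange(n)
    r, a = np.zeros((n, n)), np.zeros((n, n))
    last, stable = None, 0
    for _ in range(max_iter):
        # Responsibilities: r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        as_ = a + s
        first = as_.argmax(axis=1)
        max1 = as_[rows, first]
        as_[rows, first] = -np.inf
        max2 = as_.max(axis=1)
        r_new = s - max1[:, None]
        r_new[rows, first] = s[rows, first] - max2
        r = damping * r + (1 - damping) * r_new
        # Availabilities: a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        rp = np.maximum(r, 0)
        rp[rows, rows] = r[rows, rows]
        a_new = rp.sum(axis=0)[None, :] - rp
        diag = a_new[rows, rows].copy()
        a_new = np.minimum(a_new, 0)
        a_new[rows, rows] = diag
        a = damping * a + (1 - damping) * a_new
        exemplars = tuple(np.flatnonzero(np.diag(a + r) > 0))
        stable = stable + 1 if exemplars == last and exemplars else 0
        last = exemplars
        if stable >= conv_iter:
            break
    ex = np.array(last, dtype=int)
    if ex.size == 0:
        return np.zeros(n, dtype=int)  # no exemplar found: a single cluster
    labels = s[:, ex].argmax(axis=1)
    labels[ex] = np.arange(ex.size)    # exemplars label themselves
    return labels


def db_index(x, labels):
    """Eq. (21) with centroids as cluster centers and Euclidean distances."""
    x, labels = np.asarray(x, dtype=float), np.asarray(labels)
    ks = np.unique(labels)
    centers = np.array([x[labels == k].mean(axis=0) for k in ks])
    spread = np.array([np.linalg.norm(x[labels == k] - c, axis=1).mean()
                       for k, c in zip(ks, centers)])
    terms = [max((spread[i] + spread[j]) / np.linalg.norm(centers[i] - centers[j])
                 for j in range(len(ks)) if j != i) for i in range(len(ks))]
    return float(np.mean(terms))


def search_preference(x, s, delta, max_rounds=10):
    """Literal reading of Eq. (20): s(i, i) = p_m + delta * DB_min."""
    s = np.asarray(s, dtype=float)
    p_m = np.median(s[~np.eye(len(s), dtype=bool)])
    pref, db_min, best = p_m, np.inf, None
    for _ in range(max_rounds):
        labels = affinity_propagation(s, pref)
        db = db_index(x, labels) if np.unique(labels).size > 1 else np.inf
        if db >= db_min:  # DB no longer improves: stop
            break
        db_min, best = db, (pref, labels)
        pref = p_m + delta * db_min  # next bias parameter from Eq. (20)
    return best if best is not None else (pref, labels)
```

The search returns the bias parameter with the lowest DB value found together with its labels; the sign of `delta` selects the forward or backward search direction described above.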
[0089] The process of the improved AP clustering algorithm is shown in Figure 1.
[0090] (3) Case Analysis
[0091] This section uses user data from an integrated energy system park, from which 2000 load curves are randomly selected. The energy consumption characteristic indicators are initially weighted equally. Optimization with the simulated annealing particle swarm optimization algorithm yields an optimal number of segments w = 3 and an optimal number of characters l = 6. Clustering with the improved AP clustering algorithm of this paper yields four final cluster centers, as shown in Figure 2:
[0092] Figure 2 shows significant differences among the load curves: the energy consumption of the typical user types varies considerably, and each cluster center represents the energy consumption of one type of user. Type A users consume more energy in the morning and evening, with a significant drop at midday, and are likely working professionals. The energy consumption of Type B and Type C users rises after 7 am and falls after 8 pm, matching the daily routine of most residents. Type B users consume energy fairly evenly throughout the day, with slightly higher consumption in the morning and evening; their consumption is continuous but shows no obvious peak-valley pattern. Type C users consume more energy during the day than Type B users, with two peaks at midday and in the evening, i.e., a double-peak load. Type D users consume very little energy, mostly attributable to equipment losses, but possibly also to vacant rooms, business travelers, or other residents who use no electricity during the day. Based on the extracted load characteristics, user energy consumption behavior can be analyzed in depth.
[0093] Category D users have very low energy consumption and are therefore not analyzed further; this paper evaluates the energy consumption levels of users in categories A, B, and C. Category A users have a large daily peak-to-valley difference and should implement peak shaving and valley filling, making them a potential group for demand response. Category B and C users have high daily load rates and can be regarded as representative of residential demand response; higher peak-hour electricity prices should be set for them to guide peak shaving and valley filling and thus promote the optimal allocation of electricity resources. In addition, category B users have a large peak-shaving capacity and relatively stable daily energy consumption, and can be scheduled in coordination with category D users to fill load troughs.
[0094] The characteristic indicators of the cluster centers are shown in Table 2, and the corresponding initial weights and improved final weights for cluster A are shown in Table 3. To simplify the analysis, the cluster center of each type is used as the representative load curve of that type. As Table 3 shows, the daily average load has the highest weight and should be given primary consideration in the analysis.
[0095] Table 2 Cluster Center Characteristics Indicators
[0096]
[0097]
[0098] Table 3 Initial Weights and Update Results
[0099]
[0100] Meanwhile, based on the discrete state characteristics of each representative load, the CRITIC weighting method can be used to analyze energy consumption characteristics; combined with a qualitative analysis of the characteristic indicators, this allows the demand response potential of each user type to be analyzed further. According to formula (16) of the CRITIC weighting method, the more information an indicator contains, the greater its weight. Conflict represents the correlation between indicators and is expressed through the correlation coefficient: the more strongly an indicator correlates with the others, the less it conflicts with them and the more of the same information it duplicates. This weakens, to a certain extent, the indicator's evaluation power, so its weight should be reduced. Accordingly, users with a large amount of information can be considered suitable for price-based demand response, and users with a small amount of information for incentive-based demand response. Assuming the correlation coefficients remain unchanged, the greater the conflict (reflected here by the standard deviation), the greater the amount of information contained. The indicator conflict calculated for each user type is shown in Table 4. Class B users contain the largest amount of information and have a relatively average overall energy consumption level, making them suitable as price-based demand response customers; flexible electricity prices can be set to guide them to change their energy consumption behavior. Class A and Class C users consume more energy and contain less information, making them suitable as incentive-based demand response customers: taking each user's satisfaction into account, their electricity demand can be reduced when the system requires it or during power shortages.
[0101] Table 4 Indicator conflict of each user type
[0102]
User | Indicator conflict
A | 59.91
B | 89.81
C | 46.16
[0103] Example 2
[0104] Based on the same inventive concept, this application also provides a non-volatile storage medium, which includes a stored program, wherein the program, when running, controls the device where the non-volatile storage medium is located to execute the method in Embodiment 1 above.
[0105] Example 3
[0106] Based on the same inventive concept, this application also provides an electronic device comprising a processor and a memory; the memory stores computer-readable instructions, and the processor is used to execute the computer-readable instructions, wherein the computer-readable instructions execute the method in Embodiment 1 above.
[0107] Example 4
[0108] Based on the same inventive concept, this application also provides a user energy consumption profile analysis device driven by multi-source data, comprising the following modules:
[0109] Dimensionality reduction and feature extraction module: used to reduce the dimensionality of the load curve and extract features using the time series symbolic aggregation approximation (SAX) algorithm;
[0110] Simulated annealing particle swarm optimization algorithm module: used to formulate the optimization of the SAX representation of the load-curve time series as a multi-objective optimization problem and solve it with the simulated annealing particle swarm optimization algorithm;
[0111] Clustering analysis module: Based on user energy consumption characteristics, the improved AP clustering algorithm is used to perform clustering analysis on the load curve;
[0112] Energy consumption analysis module for various user types: Based on the clustering results, analyze the energy consumption behavior of various user types.
[0113] In summary, this algorithm can efficiently and accurately cluster load curves and extract their important features, which is beneficial for the analysis of user energy consumption behavior. This paper proposes an improved AP clustering algorithm based on SAX discrete state features and weighted energy consumption characteristic indicators, and uses the CRITIC weighting method to determine the objective weights of the energy consumption characteristic indicators. Numerical examples demonstrate that the extracted features not only ensure clustering accuracy but also facilitate the analysis of user energy consumption behavior. It can be extended to various applications such as demand response.
[0114] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the embodiments and the description in the specification merely illustrate the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.
Claims
1. A user energy consumption profiling analysis method based on multi-source data, characterized by comprising the following steps: Step 1: Use the time series symbolic aggregation approximation SAX algorithm to reduce the dimensionality of the load curve and extract features; Step 2: Based on the simulated annealing particle swarm optimization algorithm, transform the optimization problem of the SAX representation of the load-curve time series into a multi-objective optimization problem; Step 3: Based on the user's energy consumption characteristics, perform cluster analysis on the load curve using the improved AP clustering algorithm, the improved AP clustering algorithm including the following: 1) Improvement of the similarity matrix: by selecting the discrete state variables and energy consumption characteristics of the load curve, the dimensionality of the load curve is reduced and the calculation of the AP clustering similarity matrix is accelerated, and the bias parameter is adjusted to improve clustering efficiency:
$$s(i,j) = -\left[\alpha d_{d,ij} + (1-\alpha) d_{t,ij}\right], \quad i \neq j \quad (18)$$
where $d_{d,ij}$ is the distance between the discrete state characteristics of load curves i and j obtained by SAX, $d_{t,ij}$ is the distance between their energy consumption characteristics, expressed as Euclidean distance, and α is the feature weight coefficient; 2) Improvement of the bias parameter: the element s(i,i) on the main diagonal of the similarity matrix is the bias parameter, and its value is related to the number of clusters obtained; the DB index, which evaluates clustering performance, is used as the criterion for selecting the bias parameter and as the convergence criterion of the AP clustering algorithm, as shown in the following equation:
$$s(i,i) = p_m + \delta \, DB_{min} \quad (20)$$
where $p_m$, the initial value, is the median of all elements not on the main diagonal; $DB_{min}$ is the minimum DB value calculated by the current algorithm; δ is the search threshold, with δ > 0 for a forward search and δ < 0 otherwise. The DB index is calculated as shown in (21):
$$DB = \frac{1}{n}\sum_{i=1}^{n}\max_{j \neq i}\left(\frac{W_i + W_j}{C_{ij}}\right) \quad (21)$$
where n is the number of clusters; $W_i$ and $W_j$ are the average distances from the data points in clusters i and j to their respective cluster centers; $C_{ij}$ is the distance between the centers of clusters i and j; Step 4: Analyze the energy consumption behavior of various types of users based on the clustering results.
2. The user energy consumption profile analysis method based on multi-source data according to claim 1, characterized in that step 1 further includes the following: Step 1: Convert the n-dimensional time series into a w-dimensional vector: the original load curve $X = [x_1, x_2, \ldots, x_n]$ is approximated by piecewise aggregation and divided into w segments, the i-th element being calculated as follows:
$$\bar{x}_i = \frac{w}{n}\sum_{j=\frac{n}{w}(i-1)+1}^{\frac{n}{w}i} x_j$$
The original n-dimensional time series vector is thus divided into w segments and reduced to w dimensions, where $x_j$ is the j-th element of the original load curve, $\bar{x}_i$ is the mean of the i-th segment, and $n/w$ is the compression ratio; Step 2: Symbolize the sequence obtained by piecewise aggregate approximation (PAA): each time series is normalized and converted into its PAA representation, and each PAA element is mapped to a symbol:
$$\hat{c}_i = \alpha_j, \quad \text{if } \beta_{j-1} \le \bar{c}_i < \beta_j$$
where $\bar{c}_i$ is the i-th PAA element of a normalized sequence of length n and $\hat{c}_i$ is its symbol; $\alpha_j$ is the j-th element of the alphabet; $\beta_{j-1}$ and $\beta_j$ are the (j-1)-th and j-th values in the list of Gaussian breakpoints, respectively; Step 3: After dimensionality reduction of the time series, false negatives may occur in feature-space queries, and the lower bounding theorem is used to guarantee that none occur. Extending this to SAX, the n-dimensional time series Q and C are transformed into w-dimensional vectors to obtain their PAA representations, and substituting the dimensionality reduction formula into the Euclidean distance yields the PAA distance measure:
$$D_{PAA}(\bar{Q}, \bar{C}) = \sqrt{\frac{n}{w}}\sqrt{\sum_{i=1}^{w}\left(\bar{q}_i - \bar{c}_i\right)^2}$$
where $\bar{Q}$ and $\bar{C}$ are the dimensionality-reduced time series of Q and C, and $\bar{q}_i$ and $\bar{c}_i$ are their i-th elements.
3. The user energy consumption profile analysis method based on multi-source data as described in claim 1, characterized in that step 2 further includes the following. The objective function is built from the accuracy A, the information content E, and the simplification rate R, subject to:

$$2 \le l \le l_m \tag{10}$$

$$2 \le w \le w_m \tag{11}$$

In the formula, A is the accuracy, which reflects how well the segmented load curve represents the original load curve; E is the information content, measured by information entropy (the smaller the information entropy, the greater the accuracy when making predictions from the existing signal, and the greater the information content); and R is the simplification rate, which reflects the degree of compression of the original load curve. A is the correlation coefficient between the PAA values obtained by piecewise approximation of the load curve and the original load curve $X_i$. Because the two sequences have different dimensions, spline interpolation is first applied to the PAA values to form a sequence of the same dimension as $X_i$, and the correlation coefficient is then calculated. $p_i$ is the probability that character i occurs in the symbolic representation of $X_i$; $l_m$ is the maximum number of characters and $w_m$ is the maximum number of segments, both set here to $l_m = w_m = 10$; μ is the weighting coefficient of the two parameters after simplification.
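Two ingredients of this objective, the accuracy A and the information content E, can be illustrated on their own. This sketch assumes Shannon entropy in bits for E and uses linear interpolation (`np.interp`) in place of the spline interpolation described in the claim; the simplification rate R and the combined objective are not reproduced because their formulas are not given here. The function names are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def accuracy_A(x: np.ndarray, w: int) -> float:
    """Correlation between curve x and its w-segment PAA stretched back to len(x).

    Assumption: linear interpolation between segment midpoints replaces the
    spline interpolation named in the claim."""
    n = len(x)
    if n % w:
        raise ValueError("this sketch assumes n is a multiple of w")
    paa = x.reshape(w, n // w).mean(axis=1)          # segment means
    mids = (np.arange(w) + 0.5) * (n / w) - 0.5      # segment midpoints on x's index axis
    stretched = np.interp(np.arange(n), mids, paa)   # same dimension as x
    return float(np.corrcoef(x, stretched)[0, 1])


def entropy_E(symbols: str) -> float:
    """Shannon entropy in bits, -sum p_i log2 p_i, of the symbol frequencies p_i."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A search over the alphabet size l and segment count w (for example by the simulated annealing particle swarm optimization of the claims) would evaluate such quantities for each candidate pair within constraints (10) and (11).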
4. The user energy consumption profile analysis method based on multi-source data as described in claim 1, characterized in that step 3 further includes the following: three types of typical energy consumption characteristic indicators are introduced, namely energy load level, energy consumption stability, and energy consumption interaction capability, and the specific indicators average daily load, daily load rate, peak energy consumption rate, and off-peak electricity coefficient are selected as the feature vector for clustering the load curves. Combined with the discrete characteristics obtained from the SAX optimization, these indicators comprehensively reflect the time-domain and state characteristics of the load curves and serve as the basis for clustering them.
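The four indicators could be computed from an hourly daily load curve roughly as follows. The claim names the indicators but not their formulas, so the definitions here (daily load rate as average load over maximum load; the peak and off-peak indicators as the shares of daily energy used in peak and valley hours) and the tariff windows `PEAK_HOURS` and `VALLEY_HOURS` are hypothetical choices for illustration only.

```python
import numpy as np

# Hypothetical tariff windows (hour indices); real peak/valley periods depend on the local tariff.
PEAK_HOURS = [8, 9, 10, 18, 19, 20]
VALLEY_HOURS = [22, 23, 0, 1, 2, 3, 4, 5]


def load_features(day: np.ndarray) -> np.ndarray:
    """Feature vector for one 24-point hourly load curve.

    Returns [average daily load, daily load rate, peak energy rate,
    off-peak (valley) electricity coefficient] under assumed definitions."""
    if day.shape != (24,):
        raise ValueError("expected 24 hourly load values")
    avg = day.mean()                                  # average daily load
    load_rate = avg / day.max()                       # daily load rate = average / maximum load
    energy = day.sum()                                # daily energy (each sample covers 1 h)
    peak_rate = day[PEAK_HOURS].sum() / energy        # share of energy used in peak hours
    valley_coef = day[VALLEY_HOURS].sum() / energy    # share of energy used in valley hours
    return np.array([avg, load_rate, peak_rate, valley_coef])
```

A flat curve of 2.0 at every hour, for instance, yields an average load of 2.0, a load rate of 1.0, a peak share of 0.25, and a valley share of 1/3 under these windows.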
5. The user energy consumption profile analysis method based on multi-source data as described in claim 4, characterized in that the CRITIC weighting method is used to evaluate the contribution of each characteristic indicator to the clustering results and to determine the weights of the energy use characteristic indicators objectively. The objective weight of each indicator is measured by combining the contrast strength of the indicators with the conflict between them. Contrast strength characterizes the variation within an evaluation indicator: the larger its standard deviation, the more information the indicator contains. Conflict represents the correlation between different indicators: the larger the correlation coefficient between two indicators, the stronger their correlation and the lower the conflict between them.
6. The user energy consumption profile analysis method based on multi-source data as described in claim 5, characterized in that the specific steps for obtaining objective weights with the CRITIC weighting method are as follows: 1) Indicator normalization: assume there are m evaluation objects and n evaluation indicators. Because different indicators affect the final evaluation result in different directions, positive or reverse normalization is applied to each indicator accordingly. 2) Calculate the correlation coefficient matrix of the evaluation indicators: the correlation coefficients describe the conflict between indicators, and a significant positive correlation between two indicators means less conflict and therefore a lower weight. 3) Calculate the weights: use the correlation coefficient matrix to compute the contrast strength and conflict of each evaluation indicator, and derive the objective weights from them.
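The three CRITIC steps can be sketched in NumPy. This follows the common textbook form of CRITIC, in which the information amount of indicator j is $C_j = \sigma_j \sum_k (1 - r_{jk})$ and its weight is $C_j / \sum_j C_j$. Min-max scaling for the positive/reverse normalization and the sample standard deviation are assumptions, since the claim does not give the formulas; `critic_weights` is a hypothetical name.

```python
import numpy as np


def critic_weights(X: np.ndarray, positive: list[bool]) -> np.ndarray:
    """CRITIC objective weights for an m x n matrix (m objects, n indicators).

    positive[j] is True if larger values of indicator j are better.
    Columns must not be constant (their correlation would be undefined)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)           # guard against division by zero
    pos = (X - lo) / span                            # positive (benefit) normalization
    neg = (hi - X) / span                            # reverse (cost) normalization
    Z = np.where(np.array(positive), pos, neg)       # step 1: pick per column
    R = np.corrcoef(Z, rowvar=False)                 # step 2: n x n correlation matrix
    sigma = Z.std(axis=0, ddof=1)                    # contrast strength
    conflict = (1.0 - R).sum(axis=0)                 # sum_k (1 - r_jk)
    C = sigma * conflict                             # step 3: information amount C_j
    return C / C.sum()                               # normalized objective weights
```

The resulting weights would then scale the feature indicators before clustering; indicators that vary more and correlate less with the others receive more weight.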
7. A non-volatile storage medium, characterized in that the non-volatile storage medium includes a stored program, wherein, when the program runs, it controls the device on which the non-volatile storage medium is located to perform the method according to any one of claims 1 to 6.
8. An electronic device, characterized in that it includes a processor and a memory; the memory stores computer-readable instructions, and the processor is configured to execute them, wherein the computer-readable instructions, when executed, perform the method according to any one of claims 1 to 6.
9. A user energy consumption profile analysis device based on multi-source data, the device performing the method according to any one of claims 1 to 6, characterized in that it includes the following modules: a dimensionality reduction and feature extraction module, used to reduce the dimensionality of the load curve and extract its features with the time series symbolic aggregate approximation (SAX) algorithm; a simulated annealing particle swarm optimization module, which, based on the simulated annealing particle swarm optimization algorithm, transforms the problem of optimizing the SAX representation of the load curve into a multi-objective optimization problem; a clustering analysis module, which clusters the load curves with the improved AP clustering algorithm based on user energy consumption characteristics; and an energy consumption analysis module for the various user types, which analyzes the energy consumption behavior of each user type based on the clustering results.
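The claims do not detail what the improved AP algorithm changes, so the sketch below shows only the standard affinity propagation message passing (Frey and Dueck) that such variants build on. It assumes a precomputed similarity matrix whose diagonal holds the preferences (here negative squared distances and a hypothetical preference of -1), plus a fixed damping factor and iteration count; `affinity_propagation` is an illustrative name, not the patent's implementation.

```python
import numpy as np


def affinity_propagation(S: np.ndarray, damping: float = 0.5,
                         iters: int = 200) -> np.ndarray:
    """Standard affinity propagation message passing.

    S is an n x n similarity matrix whose diagonal holds the preferences.
    Returns, for every point, the index of its exemplar."""
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new
    # Points whose self-evidence r(k,k) + a(k,k) is positive become exemplars.
    exemplars = np.flatnonzero(np.diag(R) + np.diag(A) > 0)
    if exemplars.size == 0:
        raise ValueError("no exemplars found; try more iterations or higher preferences")
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return labels


# Toy 1-D "curves" forming two tight groups; hypothetical preference of -1.
x = np.array([0.0, 0.1, 0.25, 10.0, 10.12, 10.3])
S = -(x[:, None] - x[None, :]) ** 2
np.fill_diagonal(S, -1.0)
labels = affinity_propagation(S)
```

Raising the diagonal preferences tends to produce more clusters and lowering them fewer, which is why the preference value is usually tuned, for example against a validity index such as DB.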