Power load curve coding and clustering method and system, storage medium, terminal

By constructing a dedicated encoder and an unsupervised training mode for power load curve encoding and clustering, this method solves the problems of insufficient feature extraction, limited clustering ability, poor interpretability, and insufficient data security in existing technologies, and achieves efficient and secure power load data analysis.

CN122020228B · Active · Publication Date: 2026-06-26 · STATE GRID TIANJIN ELECTRIC POWER CO CHENGXI POWER SUPPLY BRANCH +2

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: STATE GRID TIANJIN ELECTRIC POWER CO CHENGXI POWER SUPPLY BRANCH
Filing Date: 2026-04-16
Publication Date: 2026-06-26

AI Technical Summary

Technical Problem

Existing power load curve analysis technologies suffer from insufficient feature extraction, limited clustering capabilities, poor interpretability, low computational efficiency, and inadequate data security. Furthermore, data annotation costs are high, making it difficult to meet the needs of large-scale, high-precision, and high-security power load data analysis.

Method used

A dedicated encoder is constructed by using orthogonal square wave basis function generation, frequency domain parameter extraction, hybrid coding and Transformer coding components, combined with adaptive weighted pooling units. Power load curves are encoded and clustered through unsupervised training mode and online clustering loss function. Clustering is performed using hybrid coding matrix and low-dimensional coding vector, and the final number of clusters is determined by combining multi-index decision rules.

Benefits of technology

It effectively captures the complex time-series patterns and frequency domain characteristics of power load curves, improves the accuracy and rationality of clustering, reduces computational complexity, enhances the real-time performance and security of data processing, and reduces data annotation costs.

✦ Generated by Eureka AI based on patent content.

Figure CN122020228B_ABST

Abstract

The application discloses a power load curve encoding and clustering method and system, storage medium, and terminal, belonging to the technical field of power load data analysis. The method includes: acquiring and preprocessing batch power load curve data; constructing an initial encoder containing an orthogonal square wave basis function generation unit, in which a hybrid coding unit fuses multiple encodings into a hybrid coding matrix; training the encoder in an unsupervised mode with an online clustering loss function that combines attraction loss and repulsion loss; inputting the preprocessed data into the trained encoder to obtain encodings; clustering the encodings through a multi-index consistency decision rule to obtain the final number of clusters and the cluster centers; and, after preprocessing new data, assigning its encoding to the cluster with the nearest cluster center. The application fully extracts load curve characteristics, improves clustering accuracy and efficiency, enhances model interpretability, ensures data security, reduces application cost, and achieves excellent cluster assignment, making it suitable for the refined management of power systems.

Description

Technical Field

[0001] This invention relates to the field of power load data analysis technology, specifically to a power load curve encoding and clustering method and system, storage medium, and terminal.

Background Technology

[0002] With the accelerated construction of smart grids and the continuous development of energy management systems, the analysis and processing technology of power load curves is playing an increasingly important role in power energy applications such as load forecasting, demand response, and energy efficiency assessment, becoming a key technical support for the refined management of power systems.

[0003] Existing technologies and methods for power load curve analysis still have many shortcomings and deficiencies in practical applications, making it difficult to meet the needs of large-scale, high-precision, and high-security power load data analysis. Specific problems are as follows:

[0004] (1) Insufficient feature extraction: Traditional analysis methods are mostly based on statistical feature extraction or simple data transformation, which cannot effectively capture the complex time series patterns and frequency domain characteristics contained in the power load curve. There is a significant deficiency in capturing key information reflecting load characteristics, such as load fluctuation patterns, equipment start-up and shutdown characteristics, and peak-valley power consumption distribution.

[0005] (2) Limited clustering ability: Existing clustering methods generally measure similarity based on the spatial distance of the original data, which is not accurate enough in identifying the morphological similarity of power load curves and makes it difficult to uncover the deep load patterns behind the data.

[0006] (3) Poor interpretability: The relevant deep learning models have typical "black box" characteristics. The encoding results output by the model lack clear physical meaning and cannot effectively explain the encoding process and results.

[0007] (4) Low computational efficiency: Some complex analysis models have long training cycles, high computational complexity, and insufficient real-time data processing.

[0008] (5) Data security risks are prominent: In the existing technology, load curves are mostly displayed, transmitted and stored in their original form. The curve data directly reflects key information such as the shape of the load curve and the energy consumption pattern, which poses a risk of direct exposure of key data and is prone to leakage of energy consumption characteristics, affecting data security.

[0009] (6) High data annotation cost: The coding method of supervised learning requires manual annotation of a large amount of load curve data, which not only consumes a lot of manpower and material resources, but also has high requirements for the accuracy and standardization of the annotation work. The annotation is difficult and greatly increases the cost and threshold of technology application.

[0010] In summary, existing power load curve analysis techniques have several unresolved issues regarding feature extraction, clustering effectiveness, interpretability, computational efficiency, data security, and annotation cost. A novel load curve processing method is urgently needed to overcome these shortcomings and improve the overall performance of power load curve analysis.

Summary of the Invention

[0011] In view of the technical problems mentioned in the background, the purpose of this invention is to provide a method and system for encoding and clustering power load curves, a storage medium, and a terminal.

[0012] To achieve the objectives of this invention, the technical solution provided by this invention is as follows:

[0013] First aspect

[0014] This invention provides a method for encoding and clustering power load curves, comprising the following steps:

[0015] Step 1: Collect a batch of power load curve data Lx, then clean and standardize the data to obtain preprocessed power load curve data.

[0016] Step 2: Construct the initial encoder, which includes an orthogonal square wave basis function generation unit, a frequency domain parameter extraction unit, a hybrid coding unit, a Transformer coding component, an adaptive weighted pooling unit, and an output coding layer connected in sequence.

[0017] The hybrid coding unit is used to perform feature coding, absolute position coding, relative frequency coding, and hybrid coding on the frequency domain parameter vector of batch power load curve data to obtain a hybrid coding matrix of power load curve data, and the hybrid coding matrix is the sum of the feature coding matrix, the relative frequency coding matrix, and the absolute position coding matrix.

[0018] Step 3: Train the initial encoder to obtain the trained encoder; the initial encoder is trained in an unsupervised training mode, using an online clustering loss function that combines attraction loss and repulsion loss.

[0019] Step 4: Input the preprocessed power load curve data into the trained encoder to obtain a batch of power load curve data codes z;

[0020] Step 5: Cluster the batch of power load curve data encodings z to obtain the final number of clusters for the batch and the cluster center of each cluster;

[0021] Step 6: Clean and standardize newly collected power load curve data to obtain preprocessed new power load curve data; input it into the trained encoder to obtain the new power load curve data encoding; and assign this encoding to the cluster of the nearest cluster center.

[0022] Second aspect

[0023] This invention provides a power load curve encoding and clustering system for executing the power load curve encoding and clustering method, comprising the following modules: a data acquisition and processing module, an initial encoder construction module, a training module, an encoding module, a cluster center output module, and a clustering module;

[0024] The data acquisition and processing module is used to acquire a batch of power load curve data Lx and to clean and standardize the data to obtain preprocessed power load curve data.

[0025] The initial encoder construction module is used to construct an initial encoder, which includes an orthogonal square wave basis function generation unit, a frequency domain parameter extraction unit, a hybrid coding unit, a Transformer coding component, an adaptive weighted pooling unit, and an output coding layer connected in sequence.

[0026] The hybrid coding unit is used to perform feature coding, absolute position coding, relative frequency coding, and hybrid coding on the frequency domain parameter vector of batch power load curve data to obtain a hybrid coding matrix of power load curve data, and the hybrid coding matrix is the sum of the feature coding matrix, the relative frequency coding matrix, and the absolute position coding matrix.

[0027] The training module is used to train the initial encoder to obtain the trained encoder; wherein, the training of the initial encoder adopts an unsupervised training mode and uses an online clustering loss function that combines attraction loss and repulsion loss.

[0028] The encoding module is used to input the preprocessed power load curve data into the trained encoder to obtain a batch of power load curve data encodings z.

[0029] The cluster center output module is used to cluster the batch of power load curve data encodings z to obtain the final number of clusters for the batch and the cluster center of each cluster;

[0030] The clustering module is used to clean and standardize newly collected power load curve data to obtain preprocessed new power load curve data, input it into the trained encoder to obtain the new power load curve data encoding, and assign this encoding to the cluster of the nearest cluster center.

[0031] Third aspect

[0032] This invention provides a storage medium storing at least one instruction, at least one program, a code set, or an instruction set, wherein the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the power load curve encoding and clustering method described above.

[0033] Fourth aspect

[0034] The present invention provides an electronic terminal, which includes a processor and a memory. The memory stores at least one instruction, at least one program, a code set, or an instruction set. The at least one instruction, the at least one program, the code set, or the instruction set are loaded and executed by the processor to implement the power load curve encoding and clustering method.

[0035] Compared with the prior art, the beneficial effects of the present invention are as follows:

[0036] This invention designs a dedicated encoder structure that includes orthogonal square wave basis function generation, frequency domain parameter extraction, and hybrid coding. It integrates feature coding, absolute position coding, and relative frequency coding to form a hybrid coding matrix. Combined with Transformer coding components and adaptive weighted pooling units, it can effectively capture the complex time-series patterns, frequency domain characteristics, and intrinsic relationships between frequencies of the power load curve. It fully explores key load characteristic information such as load fluctuation patterns, peak and valley power consumption distribution, and equipment start-up and shutdown characteristics, solving the problem of insufficient feature extraction in traditional methods.

[0037] Furthermore, clustering based on the low-dimensional encoded vectors output by the trained encoder breaks through the limitations of traditional methods that rely on spatial distance metrics of the original data, and better reflects the morphological similarity characteristics of the load curve. Simultaneously, the elbow method, silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index are combined with a consensus decision rule to determine the final number of clusters. Combined with the K-means clustering algorithm, this approach can uncover deeper load patterns behind the data, significantly improving the accuracy and rationality of clustering. Newly acquired data can be directly matched to the nearest cluster center based on the encoded data, resulting in excellent clustering assignment performance.

Attached Figure Description

[0038] Figure 1 is a schematic diagram of the power load curve encoding and clustering method provided in an embodiment of the present invention.

Detailed Implementation

[0039] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0040] It should be noted that the acquisition of data and collection of information in this application are legal, compliant, or obtained with the consent of the subject of the data collection.

[0041] As shown in Figure 1, this invention provides a method for encoding and clustering power load curves, including the following steps:

[0042] Step 1: Collect a batch of power load curve data Lx, then clean and standardize the data to obtain preprocessed power load curve data.

[0043] It should be noted that in step 1, each power load curve in Lx contains the active power values at L time points within one day.

[0044] In addition, data cleaning can be performed as follows: first, use the 3σ principle to detect outliers in the input data, and then use linear interpolation to fill in missing values and outliers.

[0045] Data standardization can be achieved through the following approach: Based on the characteristics of power load curves, a rated standardization method is used to process the load curves into per-unit values. This process preserves the physical meaning of the load curves, making load curves with different rated capacities comparable.
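A minimal sketch of the cleaning and standardization in [0044]-[0045], assuming NumPy, one curve per call, and a per-curve rated capacity passed in as the hypothetical parameter `rated_capacity`. The source does not say whether the 3σ statistics are computed per curve or over the batch; this sketch uses per-curve statistics, and the function names are hypothetical.

```python
import numpy as np

def clean_curve(x, k=3.0):
    """3-sigma outlier detection followed by linear interpolation of bad points."""
    x = np.asarray(x, dtype=float).copy()
    valid = ~np.isnan(x)
    mu, sigma = x[valid].mean(), x[valid].std()
    dev = np.abs(np.where(valid, x, mu) - mu)    # NaNs get deviation 0 here...
    bad = ~valid | (dev > k * sigma)             # ...and are flagged via ~valid
    if bad.all():
        raise ValueError("no usable points to interpolate from")
    idx = np.arange(x.size)
    # np.interp holds the nearest good value constant beyond the curve ends
    x[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return x

def to_per_unit(x, rated_capacity):
    """Rated (per-unit) standardization: active power divided by rated capacity."""
    return np.asarray(x, dtype=float) / rated_capacity
```

Dividing by rated capacity keeps the physical meaning of the curve (1.0 means full rated load), which is what makes curves of differently sized users comparable.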

[0046] Step 2: Construct the initial encoder, which includes an orthogonal square wave basis function generation unit, a frequency domain parameter extraction unit, a hybrid coding unit, a Transformer coding component, an adaptive weighted pooling unit, and an output coding layer connected in sequence.

[0047] The hybrid coding unit is used to perform feature coding, absolute position coding, relative frequency coding, and hybrid coding on the frequency domain parameter vector of batch power load curve data to obtain a hybrid coding matrix of power load curve data, and the hybrid coding matrix is the sum of the feature coding matrix, the relative frequency coding matrix, and the absolute position coding matrix.

[0048] Among them, the orthogonal square wave basis function generation unit is used to generate the orthogonal square wave basis function combination w, which contains square wave basis functions at a number of frequencies equal to half the number of data points in each power load curve (L/2);

[0049] The frequency domain parameter extraction unit is used to calculate the dot product between each preprocessed power load curve data and each frequency square wave in w, and combine the frequency dot product results of the same power load curve data into a vector to obtain the frequency domain parameter vector f of batch power load curve data.
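The patent does not name the square wave family. One standard family of mutually orthogonal square waves is the Walsh functions, so the sketch below assumes that choice, assumes L is a power of two (Sylvester construction), and keeps the L/2 lowest-sequency functions. `square_wave_basis` and `frequency_parameters` are hypothetical names.

```python
import numpy as np

def square_wave_basis(L: int) -> np.ndarray:
    """First L // 2 Walsh functions (sequency order) as rows of an (L//2, L) array."""
    if L < 2 or L & (L - 1):
        raise ValueError("this sketch assumes L is a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < L:
        H = np.block([[H, H], [H, -H]])          # Sylvester Hadamard construction
    sign_changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    W = H[np.argsort(sign_changes)]              # order rows by sequency ("frequency")
    return W[: L // 2]

def frequency_parameters(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Dot product of every curve (rows of X, shape (B, L)) with every basis row."""
    return X @ W.T                               # f: (B, L // 2)
```

Rows of a Hadamard matrix are orthogonal, so `W @ W.T` equals `L` times the identity. For curve lengths that are not powers of two, a different orthogonal square wave construction would be needed.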

[0050] The hybrid coding unit is specifically used for the following:

[0051] (1) The frequency domain parameter vector f of the batch of power load curve data is mapped to a high-dimensional feature space to obtain the feature encoding matrix. The formula is as follows:

[0052] ;

[0053] where the two parameters are the feature encoding weight matrix and the feature encoding bias vector;

[0054] the remaining symbols denote the feature dimension and the number of orthogonal square wave basis functions, and T denotes the matrix transpose;

[0055] (2) The absolute position of each element in the frequency domain parameter vector f of the power load curve data is encoded using sine and cosine functions. The position encoding for each absolute position pos is:

[0056] ;

[0057] where:

[0058] pos is the absolute position index in the frequency domain parameter vector;

[0059] p is the dimension index;

[0060] the first term is the encoded value of position pos in even dimension 2p;

[0061] the second term is the encoded value of position pos in odd dimension 2p+1;

[0062] The absolute position codes at all positions pos are combined into the position encoding matrix:

[0063] ;

[0064] (3) For the frequency domain parameter vector f of the batch power load curve data, the relative relationships between frequency domain parameters are considered in order to capture the intrinsic relationships between frequencies, and a relative relationship matrix R is constructed, whose element at position (i, j) is:

[0065]

[0066] where:

[0067] R is the relative relationship matrix;

[0068] its element at (i, j) is the strength of the relationship between the i-th and j-th frequency domain parameters;

[0069] the difference between frequency domain indices represents the frequency distance;

[0070] the absolute values of the frequency domain parameters are used;

[0071] a small constant is added to avoid division by zero;

[0072] one factor represents the relative distance between positions in the frequency domain;

[0073] the other factor indicates the relative magnitude of the frequency domain parameter values;

[0074] After constructing the relative relationship matrix R, calculate the relative frequency encoding matrix:

[0075]

[0076] where the relative frequency encoding weight matrix is learnable and trainable;

[0077] (4) The three encoding matrices are added together to obtain the hybrid encoding matrix H:

[0078] ;

[0079] The Transformer encoding component unit is used to input the hybrid encoding matrix of the power load curve data into the Transformer model for processing, and obtain the Transformer encoding of the batch of power load curve data.

[0080] It should be noted that the Transformer model uses a multi-layer (3-layer) cascaded structure to construct the Transformer encoding component. The first layer of the encoding component performs deep feature extraction on the hybrid encoding of each load curve. Further through the cascaded structure, each layer of the Transformer model deepens the feature extraction and feature fusion of the output data of the previous layer, and the last layer of the Transformer model outputs the Transformer encoding.
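The source specifies only a 3-layer cascade. A NumPy forward-pass sketch, assuming a standard post-layer-norm encoder layer with single-head self-attention, a ReLU feed-forward block, no biases, and random weights standing in for trained parameters (a real implementation would more likely use multi-head attention in a deep learning framework). Names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize over the feature dimension (no learnable scale/shift in this sketch)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def make_layer(d, d_ff, rng):
    # random weights stand in for trained parameters
    s = 1.0 / np.sqrt(d)
    shapes = {"Wq": (d, d), "Wk": (d, d), "Wv": (d, d), "Wo": (d, d),
              "W1": (d, d_ff), "W2": (d_ff, d)}
    return {name: rng.normal(0.0, s, shape) for name, shape in shapes.items()}

def encoder_layer(x, P):
    # x: (B, N, d); tokens are the N frequency positions of each curve
    q, k, v = x @ P["Wq"], x @ P["Wk"], x @ P["Wv"]
    att = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(x.shape[-1]))  # (B, N, N)
    x = layer_norm(x + att @ v @ P["Wo"])                              # attention + residual
    return layer_norm(x + np.maximum(x @ P["W1"], 0.0) @ P["W2"])      # FFN + residual

def transformer_encoder(H, layers):
    # cascade: each layer refines the output of the previous one
    for P in layers:
        H = encoder_layer(H, P)
    return H
```

For example, `layers = [make_layer(16, 32, np.random.default_rng(0)) for _ in range(3)]` builds the three cascaded layers, and `transformer_encoder(H, layers)` maps a hybrid encoding of shape (B, N, 16) to a Transformer encoding of the same shape.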

[0081] The adaptive weighted pooling unit performs weighted pooling over the frequency dimensions of the Transformer encoding of the batch power load curve data to extract a global feature vector, as follows:

[0082] ;

[0083] where: the first term is the output of the Transformer encoding component; the vector of each frequency dimension is taken from it; a learnable attention weight vector scores each frequency dimension; the normalized attention weight of each frequency dimension is obtained from these scores; and r is the frequency dimension index used in the denominator to sum the attention weights over all frequency dimensions for normalization;
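The pooling formula [0082] was lost. A sketch consistent with [0083], assuming the score of each frequency dimension is the dot product of its vector with the learnable attention vector, normalized by a softmax over all frequency dimensions r. `adaptive_weighted_pool` is a hypothetical name.

```python
import numpy as np

def adaptive_weighted_pool(T: np.ndarray, w: np.ndarray) -> np.ndarray:
    """T: (B, N, d) Transformer output; w: (d,) attention vector. Returns (B, d)."""
    scores = T @ w                                        # (B, N): one score per frequency dim
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)             # normalize over all frequency dims r
    return np.einsum("bn,bnd->bd", alpha, T)              # weighted sum = global feature vector
```

With `w` set to zeros every frequency dimension gets the same weight and the unit reduces to mean pooling; training moves `w` so that informative frequency dimensions dominate.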

[0084] The output encoding layer maps the pooled global feature vector to the final low-dimensional encoding vector, yielding the encoder's final encoding vector z1 for the batch of power load curve data, as follows:

[0085] ;

[0086] where the parameters are, in order, the first-layer weight matrix, the first-layer bias vector, the second-layer weight matrix, and the second-layer bias vector; the hidden layer dimension is set to ... ; and the final encoding dimension is 48 (see [0087]).
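The output encoding layer is described as two weight matrices with biases, which suggests a two-layer perceptron. A minimal sketch, assuming a ReLU activation between the layers (the activation is not stated in the source) and using hypothetical names:

```python
import numpy as np

def output_encoding_layer(g, W1, b1, W2, b2):
    """z = W2 @ ReLU(W1 @ g + b1) + b2, applied row-wise to a batch g of shape (B, d)."""
    h = np.maximum(g @ W1.T + b1, 0.0)   # hidden layer (ReLU assumed)
    return h @ W2.T + b2                 # final low-dimensional encoding
```

With a feature dimension of 16, a hypothetical hidden size of 128, and the final encoding dimension of 48 from [0087], `W1` would have shape (128, 16) and `W2` shape (48, 128).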

[0087] It should be noted that the encoder design revolves around the frequency domain characteristics, location characteristics, and relative frequency relationships of the power load curve. Each encoding unit and processing step corresponds to a clear physical meaning (such as frequency domain parameter extraction and frequency relative distance / size measurement), breaking the black-box nature of traditional deep learning models. This allows for effective interpretation of the encoding process and results, facilitating technology implementation and result analysis. The constructed encoder structure is specifically designed, using orthogonal square wave basis function dimensionality reduction and adaptive weighted pooling to extract global features, mapping the load curve into a fixed-dimensional low-dimensional encoding vector (the final encoding dimension is 48), significantly reducing the complexity of subsequent clustering calculations. Simultaneously, the online clustering loss function in unsupervised training mode optimizes the model training process, shortens the training cycle, and improves the computational efficiency and real-time performance of power load data processing, meeting the needs of large-scale data processing.

[0088] Step 3: Train the initial encoder to obtain the trained encoder. The initial encoder is trained using unsupervised training mode, employing batch gradient descent, with AdamW as the optimizer and annealing as the learning rate scheduling strategy. An online clustering loss function combining attraction loss and repulsion loss is used. The innovation of this loss function lies in constructing a dual-force driven, end-to-end optimized online clustering loss function. By combining attraction loss and repulsion loss, it simultaneously guides the encoder's learning and cluster structure formation during unsupervised training: attraction loss causes samples to cluster towards nearby cluster centers, while repulsion loss controls the separation between different clusters through interpretable distance boundaries. This directly promotes a clear, stable, and physically meaningful cluster distribution during the encoding process, without pre-setting the number of clusters or relying on post-processing clustering algorithms. It achieves joint adaptive learning of encoding optimization and cluster guidance, significantly improving the discriminative power of feature representation and the interpretability of clustering results.

[0089] The online clustering loss function includes the following:

[0090] (1) The attraction loss, which measures the average distance between each sample and its nearest cluster center, is calculated as follows:

[0091] ;

[0092] where: the number of samples processed in each training iteration is taken as... ;

[0093] the encoded vector of each sample, with dimension ... ;

[0094] the encoded vector of a candidate sample, which serves as a potential cluster center in online clustering;

[0095] the minimization operator, which finds the cluster center nearest to each sample;

[0096] the Euclidean distance function;

[0097] (2) The repulsion loss, which measures the extent to which cluster centers are closer together than the distance margin, is calculated as follows:

[0098] ;

[0099] where: the double summation traverses all distinct pairs of cluster centers, totaling ... pairs;

[0100] the hinge function returns the value in parentheses if it is positive and 0 otherwise;

[0101] the distance margin is a hyperparameter that controls the minimum expected distance between cluster centers, set to ... ;

[0102] the Euclidean distance between two cluster centers;

[0103] the normalization factor;

[0104] (3) The total loss, which is the objective function for model training and optimization, is as follows:

[0105] ;

[0106] where the weighting coefficient balances the importance of the attraction loss and the repulsion loss, set to ... .
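A NumPy sketch of the loss values in [0090]-[0106], assuming that the cluster centers in the repulsion term are the batch encodings themselves (as [0094] describes for the attraction term), that each sample's distance to itself is excluded from the attraction minimum, and that the normalization factor is the number of distinct pairs. The defaults for `margin` and `lam` are hypothetical because the source values were lost. In actual training the same expressions would be written in an autodiff framework so that AdamW can backpropagate through them.

```python
import numpy as np

def pairwise_dist(z):
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def online_cluster_loss(z, margin=1.0, lam=0.5):
    """Return (total, attraction, repulsion) for a batch of encodings z of shape (B, m)."""
    B = z.shape[0]
    D = pairwise_dist(z)
    # attraction: mean distance from each sample to its nearest other sample
    D_off = D + np.diag(np.full(B, np.inf))
    l_att = D_off.min(axis=1).mean()
    # repulsion: hinge penalty on every distinct pair closer than the margin
    iu = np.triu_indices(B, k=1)
    l_rep = np.maximum(0.0, margin - D[iu]).sum() / len(iu[0])
    return l_att + lam * l_rep, l_att, l_rep
```

The attraction term pulls each encoding toward its nearest neighbor, while the hinge in the repulsion term stops contributing once a pair is at least `margin` apart, so the two forces balance into compact but separated groups.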

[0107] Step 4: Input the preprocessed power load curve data into the trained encoder to obtain a batch of power load curve data codes z;

[0108] Step 5: Cluster the batch of power load curve data encodings z to obtain the final number of clusters for the batch and the cluster center of each cluster, specifically as follows:

[0109] Step 5.1: Determine the range of the candidate number of clusters K as [2, Kmax];

[0110] ;

[0111] where Kmax is the maximum number of clusters, and a rounding operation is applied;

[0112] Step 5.2: Based on the batch of power load curve data encodings z, calculate the set of valid candidate optimal K values, as follows:

[0113] (1) Compute the clustering results for each candidate number of clusters, as follows:

[0114] For each candidate K value, the K-means clustering algorithm is applied, with the Euclidean distance between encodings as the distance metric, to obtain the cluster center of each cluster and the cluster of power load curves belonging to each cluster center, where k is the label of each of the K clusters;

[0115] (2) Determine the first candidate optimal number of clusters using the elbow method, as follows:

[0116] For each candidate number of clusters K, calculate the clustering inertia of the corresponding clustering result:

[0117] ;

[0118] where the sum runs over the load curve encodings belonging to each cluster center, and k is the label of each of the K candidate clusters;

[0119] Based on the clustering inertia, the elbow method selects the point with the largest change in the curvature of the inertia curve as the first candidate optimal number of clusters;

[0120] (3) Determine the second candidate optimal number of clusters using the silhouette coefficient, as follows:

[0121] For each candidate number of clusters K, calculate the global silhouette coefficient of the corresponding clustering result, and select the K with the largest silhouette coefficient as the second candidate optimal number of clusters;

[0122] (4) Determine the third candidate optimal number of clusters using the Calinski-Harabasz index, as follows:

[0123] For each candidate number of clusters K, calculate the Calinski-Harabasz index CH(K) of the corresponding clustering result, and select the K with the largest CH(K) as the third candidate optimal number of clusters;

[0124] (5) Determine the fourth candidate optimal number of clusters using the Davies-Bouldin index, as follows:

[0125] For each candidate number of clusters K, calculate the Davies-Bouldin index DB(K) of the corresponding clustering result, and select the K with the smallest DB(K) (a lower Davies-Bouldin index indicates more compact, better separated clusters) as the fourth candidate optimal number of clusters;

[0126] (6) Collect the set of valid candidate optimal K values:

[0127] ;
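A sketch of Step 5.2 with scikit-learn, assuming Kmax is supplied by the caller (its formula was lost) and reading "largest change in curvature" as the largest discrete second difference of the inertia curve. `candidate_ks` is a hypothetical name; it also returns the silhouette value of every K, since rule a3 compares silhouette coefficients.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

def candidate_ks(z, k_max, seed=0):
    """Four candidate optimal Ks from K-means runs over K = 2..k_max."""
    ks = list(range(2, k_max + 1))
    inertia, sil, ch, db = [], [], [], []
    for k in ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(z)
        inertia.append(km.inertia_)
        sil.append(silhouette_score(z, km.labels_))
        ch.append(calinski_harabasz_score(z, km.labels_))
        db.append(davies_bouldin_score(z, km.labels_))
    # elbow: largest second difference of inertia (discrete curvature proxy)
    if len(ks) >= 3:
        k_elbow = ks[int(np.argmax(np.diff(inertia, 2))) + 1]
    else:
        k_elbow = ks[0]
    cands = {
        "elbow": k_elbow,
        "silhouette": ks[int(np.argmax(sil))],   # larger is better
        "ch": ks[int(np.argmax(ch))],            # larger is better
        "db": ks[int(np.argmin(db))],            # smaller is better
    }
    return cands, dict(zip(ks, sil))
```

On well-separated data the four indicators usually agree; disagreement among them is exactly what the consistency measures and decision rules of Step 5.3 are meant to resolve.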

[0128] Step 5.3: From the set of valid candidate optimal K values, generate the final number of clusters, together with the corresponding cluster centers and the cluster of power load curves for each cluster center, as follows:

[0129] Step 5.31: Based on the set of valid candidate optimal K values, compute consistency measures, including range consistency, mode consistency, and median consistency, as follows:

[0130] Range consistency is calculated as follows:

[0131] ;

[0132] where the two terms are the maximum and the minimum of the valid candidate optimal K values, respectively;

[0133] The formula for mode consistency is as follows:

[0134] ;

[0135] Median consistency is calculated as follows:

[0136] ;

[0137] Step 5.32: Determine the final number of clusters from the consistency measures and the decision rules, and then obtain the corresponding cluster centers and the cluster of power load curves for each cluster center, as follows:

[0138] Rules a1 to a4 are executed in order of decreasing priority. If a rule is satisfied, the remaining rules are skipped and its result is output directly; otherwise, the next rule in priority order is evaluated.

[0139] Rule a1: if the following holds:

[0140] ;

[0141] Condition: At least 3 indicators recommend the same K value;

[0142] Rule a2: if the following holds:

[0143] ;

[0144] Condition: The maximum difference between all K values does not exceed 2;

[0145] Rule a3: if there are only two primary candidate values:

[0146] ;

[0147] then choose the K value with the larger silhouette coefficient:

[0148] ;

[0149] Rule a4: Exists And there is no modality:

[0150] Trust the elbow rule first, but verify it:

[0151] ;

[0152] Otherwise, try the second-best option:

[0153] ;

[0154] The final number of clusters is obtained by executing the rules above;
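The rule conditions and selection formulas of a1-a4 were lost in extraction, so the following is one hypothetical reading in plain Python. Assumptions: a2 returns the lower median of the candidates; a4 accepts the elbow K only if its silhouette coefficient is at least `sil_tol` times the best candidate's, and otherwise falls back to the silhouette candidate. Any case not caught by a1-a3 is sent to a4 here; the source's exact handling may differ. `decide_k` and `sil_tol` are hypothetical names.

```python
from collections import Counter
from statistics import median_low

def decide_k(cands, sil, sil_tol=0.9):
    """Pick the final K from four candidate Ks (hypothetical reading of rules a1-a4).

    cands: {"elbow": K, "silhouette": K, "ch": K, "db": K}
    sil:   {K: silhouette coefficient}, must cover every candidate K
    """
    values = list(cands.values())
    counts = Counter(values)
    k_mode, n_mode = counts.most_common(1)[0]    # mode consistency
    k_range = max(values) - min(values)          # range consistency
    if n_mode >= 3:                              # a1: at least 3 indicators agree
        return k_mode
    if k_range <= 2:                             # a2: candidates lie close together
        return median_low(values)                # median consistency (lower median)
    if len(counts) == 2:                         # a3: two primary candidate values
        return max(counts, key=lambda k: sil[k]) # larger silhouette wins
    # a4: trust the elbow K if its silhouette is close to the best candidate's
    k_elbow = cands["elbow"]
    if sil[k_elbow] >= sil_tol * max(sil[k] for k in counts):
        return k_elbow
    return cands["silhouette"]                   # fallback "second-best" option
```

Using the lower median in a2 guarantees that the result is one of the recommended K values rather than a non-integer midpoint.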

[0155] Based on the final number of clusters, obtain the corresponding cluster centers and the cluster of power load curves for each cluster center.

[0156] Step 6: Clean and standardize newly collected power load curve data to obtain preprocessed new power load curve data; input it into the trained encoder to obtain the new power load curve data encoding; and assign this encoding to the cluster of the nearest cluster center.
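Step 6 reduces to a nearest-center lookup in encoding space. A minimal NumPy sketch for a batch of new encodings, using Euclidean distance as in Step 5; `assign_to_clusters` is a hypothetical name.

```python
import numpy as np

def assign_to_clusters(z_new, centers):
    """Index of the nearest cluster center (Euclidean) for each new encoding."""
    d = np.linalg.norm(z_new[:, None, :] - centers[None, :, :], axis=-1)  # (n_new, K)
    return d.argmin(axis=1)
```

Because only the low-dimensional encodings are compared, assigning a new curve costs one encoder pass plus K distance computations, with no re-clustering of the historical data.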

[0157] Experiments have shown that the clustering and allocation effect of the newly collected power load curve data using the technical solution of this invention is excellent.

[0158] Preferably, a step 7 is further performed after step 5 to generate a representative curve vector for each cluster center, specifically as follows:

[0159] Step 7.1: Calculate the weight of each power load curve q in each cluster. The weight is inversely related to the distance from the curve's encoding vector to the cluster center:

[0160] ;

[0161] where: the first term is the Euclidean distance from the encoding vector of power load curve q to its cluster center; o is the sample index within the cluster, and the corresponding term is the Euclidean distance from the encoding vector of sample o to the cluster center;

[0162] the weight decay coefficient controls the sensitivity of the weights to distance and is set to 0.5; the weights satisfy ... ;

[0163] Step 7.2: Based on the weights, generate the representative curve vector of each cluster center, as follows:

[0164]

[0165] where: the final number of clusters; the data of the q-th power load curve; ... .
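The weight formula [0160] was lost. The sketch below assumes w_q proportional to (d_q + ε)^(-β), normalized to sum to 1 over the cluster, with β = 0.5 as stated in [0162]. This makes weights decrease with encoding distance as [0159] describes; the representative curve is then the weighted sum of the cluster's preprocessed load curves. `representative_curve` is a hypothetical name.

```python
import numpy as np

def representative_curve(X, Z, center, beta=0.5, eps=1e-8):
    """X: (n, L) curves of one cluster; Z: (n, m) their encodings; center: (m,)."""
    d = np.linalg.norm(Z - center, axis=1)   # encoding distance of each curve
    w = (d + eps) ** (-beta)                 # closer curves get larger weights
    w /= w.sum()                             # normalize weights to sum to 1
    return w @ X, w
```

A smaller `beta` flattens the weights toward a plain average, while a larger one lets the few curves closest to the center dominate the representative curve.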

[0166] This invention designs a weight calculation method based on the inverse of the encoding distance to generate the representative curve vectors of the cluster centers. The method gives higher weights to load curves that are closer to the cluster centers and therefore more representative, so that the generated representative curves accurately reflect the load characteristics of the corresponding clusters and provide more valuable load pattern information for the refined management of power systems.

[0167] In addition, the present invention provides a power load curve encoding and clustering system for executing the power load curve encoding and clustering method, comprising the following modules: a data acquisition and processing module, an initial encoder construction module, a training module, an encoding module, a cluster center output module, and a clustering module;

[0168] The data acquisition and processing module is used to acquire batch power load curve data Lx and to clean and standardize the data to obtain preprocessed power load curve data.

[0169] The initial encoder construction module is used to construct an initial encoder, which includes an orthogonal square wave basis function generation unit, a frequency domain parameter extraction unit, a hybrid coding unit, a Transformer coding component, an adaptive weighted pooling unit, and an output coding layer connected in sequence.
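The first two encoder units (orthogonal square wave basis generation and frequency domain parameter extraction) can be sketched as below. The construction of the patent's square wave basis is not given in this text, so this example swaps in Walsh functions, a standard family of mutually orthogonal square waves, built from a Sylvester-Hadamard matrix; that construction requires the curve length L to be a power of two. Skipping the constant row and keeping L/2 rows (matching "half the number of data points") is an assumed choice, and the function names are hypothetical.

```python
import numpy as np


def walsh_square_wave_basis(L):
    """Orthogonal square-wave basis (Walsh functions): L // 2 rows of length L."""
    if L < 2 or L & (L - 1):
        raise ValueError("this sketch needs L to be a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < L:
        H = np.block([[H, H], [H, -H]])  # Sylvester doubling keeps rows orthogonal
    # Order rows by sequency (number of sign changes), the square-wave analogue of frequency.
    sequency = (np.diff(H, axis=1) != 0).sum(axis=1)
    H = H[np.argsort(sequency, kind="stable")]
    # Drop the constant row and keep L // 2 square waves (assumed choice).
    return H[1 : L // 2 + 1]


def frequency_parameters(curves, basis):
    """Dot product of each preprocessed curve with each square wave: shape (n, L // 2)."""
    return np.asarray(curves, dtype=float) @ basis.T
```

Each row of the result is one curve's frequency domain parameter vector f; because the basis rows are orthogonal, each parameter measures a distinct square-wave component.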

[0170] The hybrid coding unit is used to perform feature coding, absolute position coding, relative frequency coding, and hybrid coding on the frequency domain parameter vector of the batch power load curve data to obtain the hybrid coding matrix of the power load curve data; the hybrid coding matrix is the sum of the feature coding matrix, the relative frequency coding matrix, and the absolute position coding matrix.
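The hybrid coding matrix, the sum of the three coding matrices, can be sketched as follows. Several details are assumptions rather than the source's formulas: each scalar frequency parameter is mapped to a d-dimensional feature by a shared weight vector and bias; the absolute position code uses the common Transformer sine/cosine form with base 10000 (sine on even dimensions 2p, cosine on odd dimensions 2p+1, d even); and the relative frequency code is R @ W_rel with a learnable (M, d) matrix. The relative relation matrices R are taken as input because their element formula is not reproduced here.

```python
import numpy as np


def sinusoidal_position_encoding(M, d, base=10000.0):
    """Absolute position code: sine on even dims 2p, cosine on odd dims 2p+1 (d even)."""
    pos = np.arange(M)[:, None]        # positions in the frequency parameter vector
    p = np.arange(d // 2)[None, :]     # dimension-pair index
    angles = pos / base ** (2 * p / d)
    pe = np.zeros((M, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def hybrid_encoding(f, w_feat, b_feat, R, w_rel):
    """H = feature code + relative frequency code + absolute position code.

    f:      (n, M) frequency domain parameter vectors.
    w_feat: (d,) weights and b_feat: (d,) bias mapping each scalar to d dims (assumed).
    R:      (n, M, M) relative relation matrices (formula not reproduced here).
    w_rel:  (M, d) learnable relative frequency weight matrix (assumed shape).
    """
    M = f.shape[1]
    d = w_feat.shape[0]
    e_feat = f[:, :, None] * w_feat + b_feat    # (n, M, d)
    e_rel = R @ w_rel                           # (n, M, d)
    e_pos = sinusoidal_position_encoding(M, d)  # (M, d), broadcast over the batch
    return e_feat + e_rel + e_pos
```

The result H has one d-dimensional token per frequency, which is the sequence the Transformer coding component consumes.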

[0171] The training module is used to train the initial encoder to obtain the trained encoder; wherein, the training of the initial encoder adopts an unsupervised training mode and uses an online clustering loss function that combines attraction loss and repulsion loss.
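The online clustering loss can be sketched as a forward computation over one batch of encodings. Following the description in the claims, every encoding in the batch is treated as a potential cluster center; the attraction loss is the mean distance from each sample to its nearest other sample (excluding itself is an assumption, since self-distance is zero), and the repulsion loss is a hinge on pairwise distances averaged over the N(N-1)/2 distinct pairs. The `margin` and `lam` defaults are hypothetical. Actual training would compute the same quantities in an autodiff framework so gradients can flow to the encoder; this NumPy version only evaluates the loss.

```python
import numpy as np


def online_clustering_loss(z, margin=1.0, lam=0.5):
    """Total = attraction + lam * repulsion for a batch of encodings.

    z:      (N, d) encodings of one training batch (N >= 2).
    margin: distance boundary below which center pairs are penalized.
    lam:    weighting coefficient (value assumed; not given in this text).
    """
    z = np.asarray(z, dtype=float)
    N = z.shape[0]
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)  # (N, N)
    # Attraction: mean distance from each sample to its nearest other potential center.
    off_diag = dist + np.diag(np.full(N, np.inf))
    attraction = off_diag.min(axis=1).mean()
    # Repulsion: hinge max(0, margin - distance) over distinct pairs, normalized.
    iu = np.triu_indices(N, k=1)
    repulsion = np.maximum(0.0, margin - dist[iu]).sum() / (N * (N - 1) / 2)
    return attraction + lam * repulsion, attraction, repulsion
```

The two terms pull in opposite directions: attraction compacts groups of similar curves, while repulsion keeps distinct groups at least `margin` apart.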

[0172] The encoding module is used to input the preprocessed power load curve data into the trained encoder to obtain a batch of power load curve data encodings z.
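The last two encoder stages (adaptive weighted pooling and the output coding layer) can be sketched as follows. The normalized attention weights are assumed to be a softmax of the scores between each frequency dimension's Transformer output and a learnable vector u; the output layer is assumed to be a two-layer map with a ReLU hidden activation. Function names, shapes, and the activation are illustrative assumptions.

```python
import numpy as np


def adaptive_weighted_pooling(h, u):
    """Softmax attention pooling over frequency dimensions (assumed form).

    h: (n, M, d) Transformer output; u: (d,) learnable attention vector.
    Returns (n, d) global feature vectors and (n, M) attention weights.
    """
    scores = h @ u                                        # (n, M)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)      # normalize over all dimensions r
    g = (alpha[:, :, None] * h).sum(axis=1)               # weighted sum over frequencies
    return g, alpha


def output_encoding(g, W1, b1, W2, b2):
    """Two-layer map from the global feature vector to the low-dimensional code."""
    hidden = np.maximum(0.0, g @ W1 + b1)   # ReLU hidden layer (assumed activation)
    return hidden @ W2 + b2
```

With u set to zero the weights become uniform and the pooling reduces to a plain mean over frequencies; training u lets the encoder emphasize the most informative frequency dimensions.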

[0173] The cluster center output module is used to cluster the batch power load curve data encodings z to obtain the final number of clusters of the batch power load curve data and the cluster center of each cluster;

[0174] The clustering module is used to clean and standardize newly collected power load curve data to obtain preprocessed new power load curve data, input the preprocessed new data into the trained encoder to obtain its encoding, and assign that encoding to the cluster of its nearest cluster center.

[0175] Here, each power load curve data Lx contains the active power values at L time points within one day.

[0176] In addition, the present invention also provides a storage medium storing at least one instruction, at least one program, code set or instruction set, wherein the at least one instruction, the at least one program, the code set or instruction set is loaded and executed by a processor to implement the power load curve encoding and clustering method described above.

[0177] In addition, the present invention also provides an electronic terminal, which includes a processor and a memory. The memory stores at least one instruction, at least one program, code set or instruction set. The at least one instruction, the at least one program, the code set or instruction set is loaded and executed by the processor to implement the power load curve encoding and clustering method.

[0178] Finally, it should be noted that the above embodiments are merely illustrative and explanatory of the present invention, and are not intended to limit the present invention to the scope of the described embodiments. Furthermore, those skilled in the art will understand that the present invention is not limited to the above embodiments, and many more variations and modifications can be made based on the teachings of the present invention, all of which fall within the scope of protection claimed by the present invention.

Claims

1. A method for encoding and clustering power load curves, characterized in that it includes the following steps: Step 1: Collect batch power load curve data Lx, then clean and standardize the data to obtain preprocessed power load curve data; Step 2: Construct an initial encoder, which includes an orthogonal square wave basis function generation unit, a frequency domain parameter extraction unit, a hybrid coding unit, a Transformer coding component, an adaptive weighted pooling unit, and an output coding layer connected in sequence; the hybrid coding unit is used to perform feature coding, absolute position coding, relative frequency coding, and hybrid coding on the frequency domain parameter vector of the batch power load curve data to obtain the hybrid coding matrix of the power load curve data, the hybrid coding matrix being the sum of the feature coding matrix, the relative frequency coding matrix, and the absolute position coding matrix; Step 3: Train the initial encoder to obtain the trained encoder, the initial encoder being trained in an unsupervised training mode using an online clustering loss function that combines an attraction loss and a repulsion loss; Step 4: Input the preprocessed power load curve data into the trained encoder to obtain the batch power load curve data encodings z; Step 5: Cluster the batch power load curve data encodings z to obtain the final number of clusters of the batch power load curve data and the cluster center of each cluster; Step 6: Clean and standardize newly collected power load curve data to obtain preprocessed new power load curve data, input the preprocessed new data into the trained encoder to obtain its encoding, and assign that encoding to the cluster of its nearest cluster center.

2. The method for encoding and clustering power load curves according to claim 1, characterized in that, in step 1, each power load curve data Lx contains the active power values at L time points within one day.

3. The method for encoding and clustering power load curves according to claim 2, characterized in that, in step 2, the orthogonal square wave basis function generation unit is used to generate the orthogonal square wave basis function combination w, which contains square wave basis functions at a number of frequencies equal to half the number of data points in each power load curve; the frequency domain parameter extraction unit is used to calculate the dot product between each preprocessed power load curve data and the square wave at each frequency in w, and to combine the dot product results of the same power load curve data into a vector, thereby obtaining the frequency domain parameter vector f of the batch power load curve data; the hybrid coding unit is specifically used for the following: (1) the frequency domain parameter vector f of the batch power load curve data is mapped to a high-dimensional feature space to obtain the feature encoding matrix, according to the formula: ; where the terms denote the feature encoding weight matrix, the feature encoding bias vector, the feature dimension, and the number of orthogonal square wave basis functions, and T denotes matrix transposition; (2) the absolute position of each element in the frequency domain parameter vector f of the power load curve data is encoded using sine and cosine functions, the position encoding parameters for each absolute position pos being: ; where pos is the absolute position index in the frequency domain parameter vector, p is the dimension index, and the two encoded values are those of position pos in the even dimension 2p and in the odd dimension 2p+1, respectively; the absolute position codes at all positions pos are combined into the position encoding matrix: ; (3) for the frequency domain parameter vector f of the batch power load curve data, the relative relationships between frequency domain parameters are considered in order to capture the intrinsic relationships between frequencies, and the relative relation matrix R is constructed, in which the element at position (i, j) is: ; where the element is the strength of the relationship between the i-th and j-th frequency domain parameters; the frequency domain index difference represents the frequency distance; the absolute values of the frequency domain parameters are used; a small constant is added to avoid division by zero; one term represents the relative distance between positions in the frequency domain, and the other represents the relative magnitude of the frequency domain parameter values; after the relative relation matrix R is constructed, the relative frequency encoding matrix is calculated as: ; where the weight matrix is a learnable, trainable relative frequency encoding weight matrix; (4) the three encoding matrices are added together to obtain the hybrid encoding matrix H: ; the Transformer encoding component is used to input the hybrid encoding matrix of the power load curve data into a Transformer model for processing, obtaining the Transformer encoding of the batch power load curve data; the adaptive weighted pooling unit performs weighted pooling over the frequency dimensions of the Transformer encoding of the batch power load curve data to extract a global feature vector, according to the formula: ; where the terms denote the output of the Transformer encoding component, the vector of each frequency dimension, a learnable attention weight vector, and the normalized attention weight of that frequency dimension, and r is the frequency dimension index over which the attention weights of all frequency dimensions are summed in the denominator for normalization; the output encoding layer is used to map the pooled global feature vector to the final low-dimensional encoding vector, yielding the final encoding vector z1 of the encoder for the batch power load curve data, according to the formula: ; where the terms denote the first layer weight matrix, the first layer bias vector, the second layer weight matrix, and the second layer bias vector, together with the hidden layer dimension and the final encoding dimension.

4. The method for encoding and clustering power load curves according to claim 3, characterized in that, in step 3, the online clustering loss function includes the following: (1) the attraction loss, which measures the average distance between each sample and its nearest cluster center, calculated as: ; where the terms denote the number of samples processed in each training iteration; the encoded vector of a given sample, with the encoding dimension; the encoded vector of another sample, which serves as a potential cluster center in online clustering; the minimization operator, which finds the cluster center nearest to the given sample; and the Euclidean distance function; (2) the repulsion loss, which measures how insufficiently separated the cluster centers are, calculated as: ; where the double summation traverses all distinct pairs of cluster centers; the hinge function returns the value in parentheses if it is positive and 0 otherwise; the distance margin is a hyperparameter that controls the minimum expected distance between cluster centers; the distance term is the Euclidean distance between the two cluster centers of a pair; and the remaining term is a normalization factor; (3) the total loss, which is the objective function for model training and optimization: ; where the weighting coefficient balances the importance of the attraction loss and the repulsion loss.

5. The method for encoding and clustering power load curves according to claim 4, characterized in that step 5 specifically includes the following: Step 5.1: Determine the range of the candidate cluster number K as [2, K_max], where K_max is the maximum number of clusters, obtained with a rounding operation: ; Step 5.2: Based on the batch power load curve data encodings z, calculate the set of valid candidate optimal K values, as follows: (1) Compute the clustering result for each candidate cluster number: for each value of K, apply the K-means clustering algorithm, with the Euclidean distance between encodings as the distance metric, to calculate the cluster center of each cluster and the cluster of power load curves belonging to each cluster center, where k is the label of each of the K clusters; (2) Determine the first candidate optimal cluster number using the elbow method: for each candidate cluster number K, calculate the clustering inertia of the corresponding clustering result: ; where the summed terms are the load curve encodings belonging to cluster center k, and k is the label of each of the K candidate clusters; based on the clustering inertia, the elbow method selects the point at which the curvature of the inertia curve changes most as the first candidate optimal cluster number; (3) Determine the second candidate optimal cluster number using the silhouette coefficient: for each candidate cluster number K, calculate the global silhouette coefficient of the corresponding clustering result, and select the K with the largest silhouette coefficient as the second candidate optimal cluster number; (4) Determine the third candidate optimal cluster number using the Calinski-Harabasz index: for each candidate cluster number K, calculate the Calinski-Harabasz index CH(K) of the corresponding clustering result, and select the K with the largest CH(K) as the third candidate optimal cluster number; (5) Determine the fourth candidate optimal cluster number using the Davies-Bouldin index: for each candidate cluster number K, calculate the Davies-Bouldin index DB(K) of the corresponding clustering result, and select the K with the smallest DB(K), since a lower Davies-Bouldin index indicates better clustering, as the fourth candidate optimal cluster number; (6) Collect the set of valid candidate optimal K values: ; Step 5.3: From the set of valid candidate optimal K values, generate the final number of clusters, together with the corresponding cluster centers and the cluster of power load curves belonging to each cluster center, as follows: Step 5.31: Calculate consistency metrics from the set, including range consistency, mode consistency, and median consistency, where the range consistency is: ; with the two terms being the maximum and the minimum values in the set; the mode consistency is: ; and the median consistency is: ; Step 5.32: Determine the final number of clusters based on the consistency metrics and decision rules, and obtain the corresponding cluster centers and the cluster of power load curves belonging to each cluster center, as follows: rules a1 to a4 are executed in order of decreasing priority; if a rule is satisfied, the subsequent rules are skipped and its result is output directly; otherwise, the next rule is executed. Rule a1: if the following holds: ; condition: at least 3 indicators recommend the same K value. Rule a2: if the following holds: ; condition: the maximum difference among all K values does not exceed 2. Rule a3: if there are only two main candidate values: ; then choose the K value with the larger silhouette coefficient: ; Rule a4: if the following holds and there is no mode: ; first prefer the elbow-method result, subject to verification: ; otherwise, use the second-best option: ; The final number of clusters is obtained by executing the above rules: ; and the corresponding cluster centers and the cluster of power load curves belonging to each cluster center are obtained based on the final number of clusters.

6. The method for encoding and clustering power load curves according to claim 5, characterized in that, after step 5, the method further includes step 7 of generating the representative curve vector of each cluster center, which includes the following: Step 7.1: Calculate the weight of each power load curve data q in each cluster, the weight being inversely proportional to the distance from the encoding vector to the cluster center: ; where: one term is the Euclidean distance from the encoding vector of power load curve data q to the cluster center; o is the index of a sample in the cluster, and the corresponding term is the Euclidean distance from the encoding vector of sample o to the cluster center; the weight decay coefficient controls the sensitivity of the weights to distance and is set to 0.5; the weights satisfy ; Step 7.2: Generate the representative curve vector of each cluster center based on the weights, as follows: ; where the cluster index ranges over the final number of clusters, and the weighted term is the q-th power load curve data.

7. A power load curve encoding and clustering system, used to execute the power load curve encoding and clustering method according to any one of claims 1-6, characterized in that it includes the following modules: a data acquisition and processing module, an initial encoder construction module, a training module, an encoding module, a cluster center output module, and a clustering module; the data acquisition and processing module is used to acquire batch power load curve data Lx and to clean and standardize the data to obtain preprocessed power load curve data; the initial encoder construction module is used to construct an initial encoder, which includes an orthogonal square wave basis function generation unit, a frequency domain parameter extraction unit, a hybrid coding unit, a Transformer coding component, an adaptive weighted pooling unit, and an output coding layer connected in sequence; the hybrid coding unit is used to perform feature coding, absolute position coding, relative frequency coding, and hybrid coding on the frequency domain parameter vector of the batch power load curve data to obtain the hybrid coding matrix of the power load curve data, the hybrid coding matrix being the sum of the feature coding matrix, the relative frequency coding matrix, and the absolute position coding matrix; the training module is used to train the initial encoder to obtain the trained encoder, the initial encoder being trained in an unsupervised training mode using an online clustering loss function that combines an attraction loss and a repulsion loss; the encoding module is used to input the preprocessed power load curve data into the trained encoder to obtain the batch power load curve data encodings z; the cluster center output module is used to cluster the batch power load curve data encodings z to obtain the final number of clusters of the batch power load curve data and the cluster center of each cluster; the clustering module is used to clean and standardize newly collected power load curve data to obtain preprocessed new power load curve data, input the preprocessed new data into the trained encoder to obtain its encoding, and assign that encoding to the cluster of its nearest cluster center.

8. The power load curve encoding and clustering system according to claim 7, characterized in that each power load curve data Lx contains the active power values at L time points within one day.

9. A storage medium, characterized in that, The storage medium stores at least one instruction, at least one program, code set, or instruction set, wherein the at least one instruction, the at least one program, the code set, or instruction set is loaded and executed by a processor to implement the power load curve encoding and clustering method as described in any one of claims 1-6.

10. An electronic terminal, characterized in that, The electronic terminal includes a processor and a memory, wherein the memory stores at least one instruction, at least one program, code set, or instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the power load curve encoding and clustering method as described in any one of claims 1-6.