Low-voltage distribution area household-transformer relationship identification method based on t-SNE and improved GMM
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA THREE GORGES UNIV
- Filing Date
- 2026-02-02
- Publication Date
- 2026-06-19
Figure CN122241286A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of household-transformer relationship identification in low-voltage distribution areas, and specifically to a household-transformer relationship identification method for low-voltage distribution areas based on t-SNE and an improved GMM.
Background Technology
[0002] The household-transformer relationship refers to the subordinate connection between the end-user's electricity meter and the transformer connected to it in a low-voltage distribution area. It is a crucial piece of fundamental information reflecting the topology of the low-voltage distribution network. In recent years, the widespread application of new technologies such as distributed renewable energy generation, electric vehicle charging stations, and smart meters has led to increased complexity in the distribution network architecture while ensuring power quality. Simultaneously, with the continuous advancement of urban residential line renovations and rural urbanization, the number of users in low-voltage distribution areas has increased significantly, resulting in frequent changes to household-transformer relationship information. Therefore, constructing a high-precision and scalable method for identifying household-transformer relationships is of significant engineering importance.
[0003] Traditional methods for identifying transformer-household relationships, such as the instantaneous power outage method and the carrier communication method, rely on manual operation, resulting in low efficiency and high cost, making them unsuitable for the large-scale application of smart grids. With the widespread application of smart meters in low-voltage distribution areas, the amount of electrical data collected by the power system has increased significantly, leading to the development of data-driven methods that utilize daily voltage characteristics for transformer-household relationship identification. However, existing methods often require massive and complete electrical data, assume clear data distribution boundaries, have limited ability to identify "marginal users" or abnormal samples mixed into low-voltage distribution areas, and their clustering effect is greatly affected by initial parameters and threshold settings. These problems result in poor performance and accuracy of existing transformer-household relationship identification methods.
[0004] Therefore, there is an urgent need to design a new technical solution to solve the problem of identifying the relationship between households and transformers in low-voltage distribution areas more efficiently and accurately.
Summary of the Invention
[0005] To address the shortcomings of existing low-voltage distribution network transformer-household relationship identification methods, such as insufficient identification accuracy, limited ability to process high-dimensional voltage features, and inaccurate identification of marginal users, this invention proposes a low-voltage transformer-household relationship identification method based on t-SNE and an improved Gaussian Mixture Model (GMM). This method first performs missing value completion and normalization on the time-series voltage data of low-voltage transformer-household users to construct a high-dimensional feature matrix representing user voltage behavior characteristics. Then, it uses the t-SNE algorithm for nonlinear dimensionality reduction, reducing feature dimensionality and enhancing the separability between sample classes. Next, it achieves cluster identification through an improved GMM. Finally, it combines posterior probability and Mahalanobis distance distribution to perform secondary correction on marginal samples, thereby achieving high-precision identification of transformer-household relationships in low-voltage distribution networks.
[0006] The technical solution adopted in this invention is as follows:
[0007] The household-transformer relationship identification method for low-voltage distribution areas based on t-SNE and improved GMM includes the following steps:
Step 1: Collect electrical data of users and the corresponding transformers in each low-voltage distribution area, and construct a dataset;
Step 2: Use the t-SNE algorithm to perform nonlinear dimensionality reduction on the high-dimensional voltage features, preserving the local similarity and global distribution characteristics between user voltage curves, and obtain a visualized feature space;
Step 3: In the reduced feature space, use an improved Gaussian Mixture Model (GMM) to cluster and identify user samples;
Step 4: After clustering with the improved GMM, apply an adaptive maximum a posteriori probability threshold to the posterior probability matrix to determine whether each user is a marginal user; calculate the Mahalanobis distance between each marginal user and the cluster centers, and perform a secondary correction for the marginal users by an adaptive threshold reclassification method;
Step 5: Based on the final clustering labels, output the mapping between each end user and the corresponding transformer to form the household-transformer relationship identification result for the low-voltage distribution area.
[0008] In step 1, time-series voltage data of low-voltage distribution area users and the corresponding transformers are collected at a fixed sampling frequency. The time-series voltage data is then preprocessed as follows: convert the time-series voltage data of each user into a $1 \times D$ single-row vector $\mathbf{v} = [v_1, v_2, \dots, v_D]$, where $\mathbf{v}$ is the voltage data vector of a single user on a single day; $v_t$ is the voltage value at the $t$-th time point, $t = 1, 2, \dots, D$; and $D$ is the number of sampling points within a day.
[0009] The time-series voltage data of all users on the same date are assembled into a matrix, and interpolation is used to fill in any missing values, giving the original voltage data matrix $V$:
$$V = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1D} \\ v_{21} & v_{22} & \cdots & v_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ v_{N1} & v_{N2} & \cdots & v_{ND} \end{bmatrix} \tag{1}$$
In equation (1), $N$ is the total number of users in the distribution area; $D$ is the number of voltage data points collected per day; $v_{11}$ is the voltage measurement of the first user in the distribution area at the first moment; $v_{12}$ is the voltage measurement of the first user at the second moment; $v_{1D}$ is the voltage measurement of the first user at the $D$-th moment; $v_{ND}$ is the voltage measurement of the $N$-th user at the $D$-th moment.
[0010] The original voltage data matrix $V$ is standardized column by column using the Z-Score method:
$$X_j = \frac{V_j - m_j}{s_j}, \qquad V_j = [v_{1j}, v_{2j}, \dots, v_{Nj}]^{\mathsf{T}} \tag{2}$$
In equation (2), $V_j$ is the $j$-th column vector of the original voltage data matrix $V$, i.e., the voltage data of all users at the $j$-th moment; $v_{1j}$ is the voltage value of the first user at the $j$-th moment; $v_{2j}$ is the voltage value of the second user at the $j$-th moment; $v_{Nj}$ is the voltage value of the $N$-th user at the $j$-th moment; $\mathsf{T}$ denotes the transpose; $m_j$ is the mean voltage of all users at the $j$-th moment; $s_j$ is the standard deviation of the voltage of all users at the $j$-th moment; $X_j$ is the vector of standardized voltage values of all users in the same distribution area at the $j$-th moment. The standardized user voltage dataset of the distribution area is $X = [X_1, X_2, \dots, X_D] \in \mathbb{R}^{N \times D}$, where $\mathbb{R}$ is the set of real numbers, meaning that each element of the voltage dataset is a real number; $\mathbb{R}^{N \times D}$ is the set of all real matrices of size $N \times D$; $X_1$ holds the standardized voltage values of all users in the distribution area at the first moment; $X_D$ holds the standardized voltage values of all users at the $D$-th moment.
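The preprocessing of step 1 can be illustrated with a minimal sketch using only the Python standard library. The helper names `fill_missing` and `zscore_columns`, the use of linear interpolation, the copy-nearest rule for gaps at the ends of a day, and the population standard deviation are all assumptions made for illustration; the source only states that interpolation and Z-Score standardization are used.

```python
from statistics import mean, pstdev

def fill_missing(row):
    """Linearly interpolate None entries; gaps at either end copy the nearest known value."""
    known = [i for i, v in enumerate(row) if v is not None]
    if not known:
        raise ValueError("row has no valid samples")
    out = list(row)
    for i, v in enumerate(row):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = row[right]
        elif right is None:
            out[i] = row[left]
        else:
            w = (i - left) / (right - left)
            out[i] = row[left] + w * (row[right] - row[left])
    return out

def zscore_columns(V):
    """Standardize each column (one time point) across all users."""
    n_users, n_points = len(V), len(V[0])
    X = [[0.0] * n_points for _ in range(n_users)]
    for j in range(n_points):
        col = [V[i][j] for i in range(n_users)]
        m, s = mean(col), pstdev(col)  # population statistics: an assumption
        for i in range(n_users):
            X[i][j] = (V[i][j] - m) / s if s > 0 else 0.0
    return X

# Three hypothetical users, four sampling points, None marks a missing reading.
raw = [
    [230.0, None, 232.0, 231.0],
    [228.0, 229.0, None, 229.0],
    [221.0, 220.0, 222.0, None],
]
V = [fill_missing(r) for r in raw]
X = zscore_columns(V)
```

Standardizing per time point, rather than per user, keeps the comparison between users at each moment, which is what the column-wise definition in equation (2) describes.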
[0011] Step 2 includes the following steps:
Step 2.1: Determine the feature dimension of the user voltage dataset of the distribution area. With the target feature dimension set to $d$, the low-dimensional voltage feature matrix $Y$ obtained after dimensionality reduction is:
$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1d} \\ \vdots & \vdots & \ddots & \vdots \\ y_{N1} & y_{N2} & \cdots & y_{Nd} \end{bmatrix} \tag{3}$$
In equation (3), $y_{11}$ is the value of the first user in the first dimension after dimensionality reduction; $y_{12}$ is the value of the first user in the second dimension; $y_{1d}$ is the value of the first user in the $d$-th dimension; $y_{Nd}$ is the value of the $N$-th user in the $d$-th dimension.
Step 2.2: Model similarity in the high-dimensional space. For the set of user voltage feature vectors $\{x_1, x_2, \dots, x_N\}$, the similarity between each user and its neighbors in the high-dimensional space is calculated by transforming Euclidean distance into conditional probability, yielding the similarity distribution $P$ between users:
$$p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)} \tag{4}$$
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N} \tag{5}$$
In the above formulas, $p_{j|i}$ and $p_{i|j}$ are the conditional probabilities between users $i$ and $j$ in the same distribution area; $\sigma_i^2$ is the variance of the Gaussian distribution associated with user $i$; $p_{ij}$ is the joint probability; $x_i$ and $x_j$ are the $i$-th and $j$-th row vectors of the standardized voltage data matrix, i.e., the sequences of standardized voltage values of users $i$ and $j$ at all sampling times throughout the day; $k \neq i$ ranges over all users other than user $i$; $\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)$ is a Gaussian kernel function that transforms Euclidean distance into a similarity measure.
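The high-dimensional similarity of step 2.2 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, and the per-user bandwidth is simply passed in as a fixed list `sigma`, whereas practical t-SNE implementations usually tune each bandwidth by a perplexity search, which the source does not describe.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def conditional_probs(X, sigma):
    """Row i holds p_{j|i}: Gaussian-kernel similarity of user j as seen from user i."""
    n = len(X)
    P = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma[i] ** 2))
             for j in range(n)]
        total = sum(w)
        P.append([wj / total for wj in w])
    return P

def joint_probs(P_cond):
    """Symmetrize the conditional probabilities into joint probabilities p_ij."""
    n = len(P_cond)
    return [[(P_cond[i][j] + P_cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]

# Four hypothetical standardized users forming two tight pairs.
X = [[0.0, 0.1], [0.1, 0.0], [3.0, 3.1], [3.1, 3.0]]
P = joint_probs(conditional_probs(X, sigma=[1.0] * 4))
```

Users 0 and 1 sit close together, so `P[0][1]` comes out far larger than `P[0][2]`, which is the neighborhood structure the embedding is later asked to reproduce.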
[0012] Step 2.3: Construct the low-dimensional spatial similarity distribution $Q$:
$$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}} \tag{6}$$
In equation (6), $q_{ij}$ is the similarity of users $i$ and $j$ of the distribution area in the low-dimensional space; $y_i$ and $y_j$ are the low-dimensional coordinates of users $i$ and $j$ after dimensionality reduction; $\lVert y_i - y_j \rVert^2$ is the squared Euclidean distance between the two coordinate points in the low-dimensional space.
Step 2.4: Calculate the Kullback-Leibler (KL) divergence and solve iteratively for $Y$:
$$C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \tag{7}$$
In equation (7), $C$ measures the difference between the user similarity distributions before and after dimensionality reduction; the smaller the value of $C$, the more consistent the relative distributions. $\mathrm{KL}(P \,\Vert\, Q)$ is the function that measures the difference between the two probability distributions $P$ and $Q$; $p_{ij}$ is the symmetric joint probability similarity of users $i$ and $j$ in the high-dimensional space; $q_{ij}$ is the symmetric joint probability similarity of users $i$ and $j$ in the low-dimensional space.
Step 2.5: Minimize $C$ using gradient descent. Taking the partial derivative of $C$ with respect to $y_i$ gives its gradient:
$$\frac{\partial C}{\partial y_i} = 4 \sum_{j=1}^{N} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1} \tag{8}$$
In equation (8), $\partial C / \partial y_i$ is the gradient of the loss function $C$ with respect to the low-dimensional feature $y_i$ of user $i$; $y_i$ and $y_j$ are the low-dimensional features of users $i$ and $j$; $N$ is the number of users in the distribution area.
[0013] The gradients of all users are stacked into the gradient of the whole low-dimensional feature matrix:
$$\nabla_Y C = \left[\frac{\partial C}{\partial y_1}, \frac{\partial C}{\partial y_2}, \dots, \frac{\partial C}{\partial y_N}\right]^{\mathsf{T}} \tag{9}$$
In equation (9), $\nabla_Y C$ is the gradient of the loss function $C$ with respect to the entire low-dimensional feature matrix $Y$; $\partial C / \partial y_1$ is the gradient of the loss function with respect to the low-dimensional feature $y_1$ of the first user; $\partial C / \partial y_N$ is the gradient with respect to the low-dimensional feature $y_N$ of the $N$-th user. In each iteration, the low-dimensional feature coordinates are updated using gradient descent:
$$Y^{(t)} = Y^{(t-1)} - \eta \, \nabla_Y C + \alpha \left(Y^{(t-1)} - Y^{(t-2)}\right) \tag{10}$$
In equation (10), $\eta$ is the learning rate; $\alpha$ is the momentum factor, used to accelerate convergence and help avoid getting trapped in local minima; $Y^{(t)}$, $Y^{(t-1)}$, and $Y^{(t-2)}$ are the low-dimensional voltage feature matrices obtained after the $t$-th, $(t-1)$-th, and $(t-2)$-th iterations, respectively; $\nabla_Y C$ is the gradient of the loss function with respect to the entire low-dimensional feature matrix, evaluated at $Y^{(t-1)}$.
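Steps 2.3 to 2.5 can be tied together in a small sketch: the Student-t similarities of equation (6), the KL cost of equation (7), the standard t-SNE gradient, and a momentum update of the kind described for equation (10). The toy matrix `P`, the starting coordinates `Y0`, and the values `eta=0.1` and `alpha=0.5` are hypothetical, chosen only to make the example run; the source does not give concrete hyperparameters.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def low_dim_affinities(Y):
    """Student-t kernel weights W and normalized low-dimensional similarities Q."""
    n = len(Y)
    W = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, W))
    return W, [[w / total for w in row] for row in W]

def kl_divergence(P, Q):
    """Cost C = KL(P || Q); zero entries of P contribute nothing."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0.0)

def gradient(P, Q, W, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    n, d = len(Y), len(Y[0])
    G = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = 4.0 * (P[i][j] - Q[i][j]) * W[i][j]
            for k in range(d):
                G[i][k] += c * (Y[i][k] - Y[j][k])
    return G

def momentum_step(Y, Y_prev, G, eta, alpha):
    """One gradient-descent update with a momentum term."""
    return [[y - eta * g + alpha * (y - yp) for y, yp, g in zip(ry, rp, rg)]
            for ry, rp, rg in zip(Y, Y_prev, G)]

# Hypothetical symmetric joint probabilities for three users (entries sum to 1).
P = [[0.0, 0.30, 0.05], [0.30, 0.0, 0.15], [0.05, 0.15, 0.0]]
Y0 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

Y_prev, Y = Y0, Y0
for _ in range(200):
    W, Q = low_dim_affinities(Y)
    Y_prev, Y = Y, momentum_step(Y, Y_prev, gradient(P, Q, W, Y), eta=0.1, alpha=0.5)
```

In this toy case users 0 and 1 have the largest `p_ij`, so the update pulls their low-dimensional points together relative to user 2.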
[0014] In step 3, the improved Gaussian Mixture Model (GMM) introduces a k-means algorithm for parameter initialization and covariance matrix regularization on the basis of the traditional GMM to improve the stability and convergence accuracy of the algorithm under uneven intra-class distribution and noisy data. The model parameters are solved iteratively by the expectation-maximization (EM) algorithm to obtain the posterior probability matrix of each sample belonging to each cluster.
[0015] In step 3, after the t-SNE dimensionality reduction of the voltage features is completed, the resulting two-dimensional or three-dimensional feature space is used as the clustering input, and an improved Gaussian Mixture Model (GMM) is used to cluster and identify user samples. This includes the following steps:
Step 3.1: Let the distribution area contain $N$ users in total, and denote the voltage feature vector of each user as:
$$y_i \in \mathbb{R}^{d}, \quad i = 1, 2, \dots, N \tag{11}$$
In equation (11), $d$ is the feature dimension; $N$ is the number of users in the distribution area; $y_i$ is the low-dimensional feature of user $i$; $\mathbb{R}^{d}$ is the real vector space of dimension $d$.
[0016] These users are divided into $K$ categories, each category corresponding to a specific distribution transformer. The Gaussian Mixture Model (GMM) assumes that the data distribution is a combination of $K$ Gaussian distributions:
$$p(y_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\left(y_i \mid \mu_k, \Sigma_k\right) \tag{12}$$
In equation (12), $\pi_k$ is the mixing coefficient, i.e., the prior probability that a user belongs to the $k$-th distribution area; $\mathcal{N}(\cdot)$ is the probability density function of a Gaussian distribution; $\mu_k$ and $\Sigma_k$ are the mean and covariance matrix of the voltage features of the $k$-th distribution area; $p(y_i)$ is the probability density of the low-dimensional feature of user $i$.
Step 3.2: Use the k-means algorithm to pre-cluster the samples to obtain the initial mean vectors, covariance matrices, and mixing coefficients:
$$J = \sum_{k=1}^{K} \sum_{y_i \in S_k} \lVert y_i - c_k \rVert^2 \tag{13}$$
In equation (13), $c_k$ is the center of the $k$-th cluster; $S_k$ is the set of samples in that cluster; $J$ is the objective function of the k-means algorithm; $y_i$ is the voltage sample feature of user $i$. The algorithm repeatedly adjusts the cluster centers through an iterative "assignment-update" process until the objective function converges.
[0017] The cluster centers output by k-means clustering are used to initialize the mean vectors of the Gaussian mixture model (GMM). The covariance within each cluster is used as the initial covariance matrix, and the proportion of samples in each cluster is used as the initial value of the mixing coefficient $\pi_k$.
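The k-means initialization described above can be sketched in a few lines. This is an assumption-laden illustration: the function names are hypothetical, the starting centers are picked by hand rather than by repeated random initialization, and the sketch assumes no cluster ends up empty.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Assignment step: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]

def kmeans(points, centers, iters=50):
    """Alternate assignment and center update until the centers stop moving."""
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for k, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == k]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else c)
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def init_gmm(points, centers, labels):
    """Initial GMM parameters: cluster shares, k-means centers, within-cluster covariances."""
    n, d = len(points), len(points[0])
    weights, covs = [], []
    for k, mu in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == k]  # assumed non-empty
        weights.append(len(members) / n)
        covs.append([[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in members) / len(members)
                      for b in range(d)] for a in range(d)])
    return weights, centers, covs

# Eight hypothetical users in a 2-D reduced feature space, two well-separated groups.
points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0],
          [10.0, 10.0], [12.0, 10.0], [10.0, 12.0], [12.0, 12.0]]
centers, labels = kmeans(points, [points[0], points[4]])
weights, means, covs = init_gmm(points, centers, labels)
```

Starting EM from these values, instead of random parameters, is what the text credits for the reduced sensitivity to initialization.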
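[0018] Step 3.3: Add a small regularization term to the diagonal of the covariance matrix to ensure that the covariance matrix is positive definite:
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \left(y_i - \mu_k\right)\left(y_i - \mu_k\right)^{\mathsf{T}} \tag{14}$$
$$\tilde{\Sigma}_k = \Sigma_k + \epsilon I \tag{15}$$
where $N_k = \sum_{i=1}^{N} \gamma_{ik}$ is the number of effective samples in the $k$-th cluster; $\gamma_{ik}$ is the posterior probability that sample $i$ belongs to component $k$; $\mu_k$ is the mean vector of the $k$-th component; $\Sigma_k$ is the covariance matrix of the $k$-th cluster and $\tilde{\Sigma}_k$ is its regularized form; $\epsilon$ is a small positive number; $I$ is the identity matrix.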
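[0019] Step 3.4: Update the model parameters. Calculate the posterior probability of each user belonging to each distribution area based on the current parameters, then iteratively update the model parameters $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$. Introduce a latent categorical variable $z_i$ indicating which cluster user $i$ belongs to. The user data points are generated as follows. Sample a category from a categorical distribution:
$$z_i \sim \mathrm{Categorical}\left(\pi_1, \pi_2, \dots, \pi_K\right) \tag{16}$$
Given the selected category $k$, sample the data from a Gaussian distribution:
$$y_i \mid z_i = k \sim \mathcal{N}\left(\mu_k, \Sigma_k\right) \tag{17}$$
In equation (17), $y_i$ is the observed value, which follows a Gaussian distribution. Given the observation $y_i$, the posterior probability that it belongs to cluster $k$ is calculated using Bayes' theorem:
$$\gamma_{ik} = P\left(z_i = k \mid y_i\right) = \frac{\pi_k \, \mathcal{N}\left(y_i \mid \mu_k, \Sigma_k\right)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}\left(y_i \mid \mu_j, \Sigma_j\right)} \tag{18}$$
In equation (18), $\mathcal{N}(y_i \mid \mu_k, \Sigma_k)$ is the Gaussian density of the observation $y_i$ under the cluster centered at $\mu_k$. The parameters $\theta$ are estimated by the maximum likelihood method:
$$L(\theta) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}\left(y_i \mid \mu_k, \Sigma_k\right) \tag{19}$$
In equation (19), $L(\theta)$ is the likelihood function, i.e., the joint probability of all sample points under the assumption that they are independent and identically distributed; it gives the probability that the model with parameters $\theta$ generates the observed data. Taking the logarithm of equation (19) gives:
$$\ell(\theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}\left(y_i \mid \mu_k, \Sigma_k\right) \tag{20}$$
In equation (20), $\ell(\theta)$ is the log-likelihood function. The objective function $\ell(\theta)$ is maximized using the expectation-maximization algorithm.
E-step: Calculate the posterior probability $\gamma_{ik}$ using Bayes' theorem, equation (18).
M-step: Update the parameters:
$$\mu_k^{(t+1)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \, y_i, \qquad \Sigma_k^{(t+1)} = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} \left(y_i - \mu_k^{(t+1)}\right)\left(y_i - \mu_k^{(t+1)}\right)^{\mathsf{T}}, \qquad \pi_k^{(t+1)} = \frac{N_k}{N} \tag{21}$$
In equation (21), $N_k = \sum_{i=1}^{N} \gamma_{ik}$; $\mu_k^{(t+1)}$ is the mean voltage feature of the $k$-th distribution area after the $(t+1)$-th iteration; $\Sigma_k^{(t+1)}$ is the covariance matrix of the $k$-th distribution area after the $(t+1)$-th iteration; $\pi_k^{(t+1)}$ is the mixing coefficient of the $k$-th distribution area after the $(t+1)$-th iteration. Repeat the iterations until the log-likelihood function converges to obtain the Gaussian mixture model parameter values, and output the user posterior probability matrix $\Gamma = [\gamma_{ik}]$ and the preliminary clustering labels.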
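The EM iteration of step 3.4, together with the diagonal regularization of step 3.3, can be sketched for two-dimensional features as follows. This is a simplified illustration under stated assumptions: the features are fixed at two dimensions so that the covariance inverse has a closed form, the value `eps=1e-6` is hypothetical (the source only calls it a small positive number), the initial parameters are written by hand instead of coming from k-means, and the preliminary label is taken as the cluster with the largest posterior.

```python
import math

def gauss2(y, mu, S):
    """Density of a 2-D Gaussian with mean mu and 2x2 covariance S."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    dx, dy = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form (y - mu)^T S^{-1} (y - mu) via the closed-form 2x2 inverse.
    q = (S[1][1] * dx * dx - (S[0][1] + S[1][0]) * dx * dy + S[0][0] * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def e_step(Y, pi, mu, S):
    """Posterior probability of each cluster for each user (Bayes' theorem)."""
    G = []
    for y in Y:
        w = [pi[k] * gauss2(y, mu[k], S[k]) for k in range(len(pi))]
        total = sum(w)
        G.append([wk / total for wk in w])
    return G

def m_step(Y, G, eps=1e-6):
    """Weighted re-estimation of pi, mu, Sigma; eps is added to the covariance diagonal."""
    n, K = len(Y), len(G[0])
    pi, mu, S = [], [], []
    for k in range(K):
        Nk = sum(G[i][k] for i in range(n))
        m = [sum(G[i][k] * Y[i][a] for i in range(n)) / Nk for a in range(2)]
        C = [[sum(G[i][k] * (Y[i][a] - m[a]) * (Y[i][b] - m[b]) for i in range(n)) / Nk
              + (eps if a == b else 0.0) for b in range(2)] for a in range(2)]
        pi.append(Nk / n)
        mu.append(m)
        S.append(C)
    return pi, mu, S

def log_likelihood(Y, pi, mu, S):
    return sum(math.log(sum(pi[k] * gauss2(y, mu[k], S[k]) for k in range(len(pi))))
               for y in Y)

# Eight hypothetical users in the reduced 2-D space.
Y = [[0.0, 0.0], [0.5, 0.2], [-0.3, 0.4], [0.2, -0.5],
     [4.0, 4.0], [4.4, 3.8], [3.7, 4.3], [4.1, 4.5]]
pi, mu = [0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]]
S = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(2)]

prev = log_likelihood(Y, pi, mu, S)
for _ in range(100):
    G = e_step(Y, pi, mu, S)
    pi, mu, S = m_step(Y, G)
    cur = log_likelihood(Y, pi, mu, S)
    if abs(cur - prev) < 1e-9:
        break
    prev = cur

G = e_step(Y, pi, mu, S)
labels = [max(range(len(row)), key=row.__getitem__) for row in G]
```

The rows of `G` form the posterior probability matrix that step 4 inspects for low-confidence users.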
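[0020] Step 4 includes the following steps:
Step 4.1: Determine whether a user is a marginal user. Define the maximum posterior probability of user $i$:
$$\gamma_i^{\max} = \max_{k} \gamma_{ik} \tag{22}$$
In equation (22), $\gamma_{ik}$ is the posterior probability that user $i$ belongs to cluster $k$, and $\gamma_i^{\max}$ is the maximum posterior probability. Set an adaptive threshold $\tau$; if $\gamma_i^{\max} < \tau$, then user $i$ is a marginal user. The threshold $\tau$ is calculated as follows:
$$\tau = \mu_{\gamma} - \lambda \, \sigma_{\gamma} \tag{23}$$
In equation (23), $\mu_{\gamma}$ and $\sigma_{\gamma}$ are the mean and standard deviation of the within-cluster posterior probability, respectively, and $\lambda$ is the adjustment parameter.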
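A minimal sketch of the marginal-user test in step 4.1 follows. Several points are assumptions: "within-cluster" is read as statistics taken over the users sharing the same preliminary label, the threshold is taken as the mean minus the adjustment parameter times the standard deviation, and `lam=1.0` is a hypothetical setting that the source does not specify.

```python
from statistics import mean, pstdev

def find_marginal_users(G, lam=1.0):
    """Flag users whose maximum posterior falls below their cluster's adaptive threshold."""
    g_max = [max(row) for row in G]
    labels = [row.index(m) for row, m in zip(G, g_max)]
    tau = {}
    for k in set(labels):
        vals = [g for g, l in zip(g_max, labels) if l == k]
        tau[k] = mean(vals) - lam * pstdev(vals)  # per-cluster threshold (assumed reading)
    marginal = [i for i, (g, l) in enumerate(zip(g_max, labels)) if g < tau[l]]
    return tau, marginal

# Hypothetical posterior matrix for eight users and two clusters.
G = [[0.99, 0.01], [0.98, 0.02], [0.97, 0.03], [0.55, 0.45],
     [0.02, 0.98], [0.01, 0.99], [0.03, 0.97], [0.40, 0.60]]
tau, marginal = find_marginal_users(G)
```

Here users 3 and 7 have noticeably flatter posteriors than their cluster mates, so they are the ones passed on to the Mahalanobis-distance correction.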
[0021] Step 4.2: Reassign marginal users. For samples identified as marginal users, the Mahalanobis distance is used to measure the proximity of the user to each cluster center in order to correct the user assignment:
$$D_M\left(y_i, \mu_k\right) = \sqrt{\left(y_i - \mu_k\right)^{\mathsf{T}} \Sigma_k^{-1} \left(y_i - \mu_k\right)} \tag{24}$$
In equation (24), $D_M(y_i, \mu_k)$ is the Mahalanobis distance between user $i$ and the center $\mu_k$ of cluster $k$; $\Sigma_k^{-1}$ is the inverse of the covariance matrix. The final allocation rule is:
$$l_i = \arg\min_{k} D_M\left(y_i, \mu_k\right) \tag{25}$$
In equation (25), $l_i$ is the cluster category number of user $i$, corresponding to a certain distribution transformer in the low-voltage distribution area, which gives the corrected user cluster label; $\min_k D_M(y_i, \mu_k)$ is the minimum Mahalanobis distance between the user and the cluster centers.
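The reassignment rule of step 4.2 can be illustrated with a short sketch for two-dimensional features. The function names and the numerical example are hypothetical; the closed-form 2x2 inverse is used only to keep the sketch free of external libraries.

```python
import math

def mahalanobis(y, mu, S):
    """Mahalanobis distance for 2-D features using the closed-form 2x2 inverse."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    dx, dy = y[0] - mu[0], y[1] - mu[1]
    q = (S[1][1] * dx * dx - (S[0][1] + S[1][0]) * dx * dy + S[0][0] * dy * dy) / det
    return math.sqrt(q)

def reassign(y, mus, covs):
    """Return the index of the statistically nearest cluster and all distances."""
    dists = [mahalanobis(y, m, S) for m, S in zip(mus, covs)]
    return dists.index(min(dists)), dists

# Cluster 0 is wide along the first axis, cluster 1 is tight.
mus = [[0.0, 0.0], [3.0, 0.0]]
covs = [[[4.0, 0.0], [0.0, 0.25]], [[0.04, 0.0], [0.0, 0.04]]]
label, dists = reassign([2.0, 0.0], mus, covs)
```

The marginal user at (2, 0) is closer to the center of cluster 1 in Euclidean terms, but it lies one standard deviation from cluster 0 and about five from cluster 1, so the Mahalanobis rule assigns it to cluster 0. This is the effect of accounting for each cluster's spread.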
[0022] In step 5, the final category labels of all users are grouped by clustering component to obtain the user sets:
$$U_k = \left\{\, i \mid l_i = k \,\right\}, \quad i = 1, 2, \dots, N, \quad k = 1, 2, \dots, K \tag{26}$$
In equation (26), $i$ is the user index and $N$ is the number of users in the distribution area; $k$ is the distribution area index and $K$ is the number of distribution areas. The user sets are then matched with the corresponding transformer numbers to form the final household-transformer relationship mapping table. Specifically: the transformer number set $B = \{b_1, b_2, \dots, b_K\}$ is obtained from the distribution area archives; the transformers are numbered, and each cluster $U_k$ is matched with the corresponding transformer number $b_k$ to construct the household-transformer mapping $f: U_k \rightarrow b_k$, where $b_k$ is the mapped transformer number. The mapping can be represented as a household-transformer relationship mapping table, as shown in Table 1.
[0023]
[0024] This user-transformer relationship mapping table can clearly identify the power supply transformer to which each user belongs, thus enabling automatic identification of the user-transformer correspondence.
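Building the mapping table of step 5 amounts to grouping users by final label and attaching a transformer number to each group. In the sketch below, the function name, the transformer identifiers `"T1"` and `"T2"`, and the assumption that the cluster-to-transformer correspondence is already available as a dictionary are all hypothetical; the source does not detail how that correspondence is established.

```python
def build_mapping(labels, transformer_ids):
    """Group user numbers by final cluster label and attach the transformer number."""
    clusters = {}
    for user, k in enumerate(labels, start=1):  # users numbered from 1
        clusters.setdefault(k, []).append(user)
    return [(user, transformer_ids[k]) for k in sorted(clusters) for user in clusters[k]]

final_labels = [0, 1, 0, 1]
table = build_mapping(final_labels, {0: "T1", 1: "T2"})
for user, transformer in table:
    print(f"user {user} -> transformer {transformer}")
```

Each row of `table` pairs one user with one transformer, which is the form of record that a household-transformer ledger would store.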
[0025] A household-transformer relationship identification system for low-voltage distribution areas based on t-SNE and improved GMM includes a voltage data preprocessing module, a t-SNE dimensionality reduction module, an improved GMM clustering module, and an edge user correction module. The voltage data preprocessing module collects and cleans the time-series voltage features of low-voltage users. The t-SNE dimensionality reduction module uses the ability of t-SNE to preserve the local structure of samples, mapping the high-dimensional voltage features to a two-dimensional or three-dimensional space and thereby enhancing class clustering. The improved GMM clustering module introduces k-means parameter initialization on the basis of the traditional GMM to improve model stability and robustness. The edge user correction module, based on the posterior probability output by the GMM, the adaptive maximum posterior probability threshold, and the Mahalanobis distance distribution, performs secondary classification correction on low-confidence samples to ensure consistency between the identification results and the actual distribution area structure.
[0026] The voltage data preprocessing module collects voltage time-series data from all users within the distribution area, performs missing value repair, noise reduction, and standardization on the data, and constructs an $n \times D$ voltage feature matrix, where $n$ is the number of users and $D$ is the number of sampling points in a single day. The t-SNE dimensionality reduction module performs dimensionality reduction on the original voltage feature matrix, reducing the high-dimensional data to a two-dimensional or three-dimensional space as input for the subsequent clustering algorithm. The improved GMM clustering module performs Gaussian mixture clustering on the dimensionality-reduced feature data and optimizes the initial model by setting a covariance matrix regularization term and using multiple k-means initializations.
[0027] The edge user correction module performs secondary correction on edge samples with low posterior probabilities. This includes calculating the Mahalanobis distance from the sample to each class center and determining the optimal class of the sample based on the posterior probability and distance. Finally, it outputs the mapping relationship between users and transformers, forming a complete identification result for the low-voltage distribution area's user-transformer relationship.
[0028] This invention provides a method for identifying the relationship between low-voltage transformer substations and households based on t-SNE and an improved GMM. The technical advantages are as follows: 1) Step 1 of this invention transforms the originally discrete and noisy raw meter reading data into a high-dimensional voltage feature space with good separability and engineering robustness by uniformly collecting, structurally modeling, filling in missing values, and standardizing the full-cycle voltage time-series data of users in low-voltage distribution areas. This step preserves the overall consistency of voltage changes for users under the same distribution transformer, weakens the impact of absolute voltage level differences and local anomalies on the identification results, and provides a reliable data foundation for subsequent nonlinear dimensionality reduction, probabilistic clustering, and edge user correction, thereby improving the accuracy and applicability of the user-transformer relationship identification method in complex low-voltage distribution area environments.
[0029] 2) Step 2 of this invention introduces the t-distributed stochastic neighbor embedding (t-SNE) algorithm, which is based on probabilistic similarity modeling, to perform nonlinear dimensionality reduction on the high-dimensional voltage features of users in low-voltage distribution areas. While maintaining the local similarity of user voltage behavior, it reveals the overall distribution differences of users under different power supply transformers in the feature space. This step maps user samples that were originally difficult to distinguish and had overlapping distributions in the high-dimensional space to a low-dimensional separable space, enhancing the inter-class separation and intra-class compactness of user samples, reducing the sensitivity of subsequent clustering algorithms to initial parameters and noisy data, and providing a stable, intuitive, and discriminative feature representation for probabilistic model-based clustering identification and edge user determination, thereby improving the overall accuracy and robustness of household-transformer relationship identification in low-voltage distribution areas.
[0030] 3) Step 3 of this invention introduces an improved Gaussian mixture model (GMM) into the reduced-dimensional voltage feature space for user clustering identification. By combining a k-means initialization mechanism with a covariance matrix regularization strategy, it overcomes the problems of traditional GMMs, such as sensitivity to initial parameters, uneven intra-class distribution, and the presence of noisy samples, which can easily lead to local optima or numerical instability. This step can probabilistically characterize the membership relationship of users belonging to different distribution transformers, outputting a physically interpretable posterior probability matrix. This provides quantitative criteria for subsequent marginal user identification and secondary correction, thereby improving the stability, accuracy, and robustness of user-transformer relationship identification in complex low-voltage distribution area environments.
[0031] 4) Step 4 of this invention addresses practical problems such as overlapping voltage characteristics, noise, and blurred substation boundaries in low-voltage distribution areas. Based on probabilistic clustering results, it introduces an edge user identification and secondary correction mechanism. Low-confidence samples are filtered using an adaptive maximum a posteriori probability threshold, and Mahalanobis distance is combined to comprehensively determine the statistical proximity of users to each cluster center. This avoids the misjudgment problem caused by the one-time hard partitioning of boundary samples in traditional clustering methods. This step can perform targeted local corrections to the initial clustering results, improving the accuracy and consistency of substation boundary user identification without disrupting the overall clustering structure. This enhances the robustness and engineering practical value of the user-transformer relationship identification method in complex operating scenarios.
[0032] 5) Step 5 of this invention structurally classifies the final clustering labels and matches the clustering components with the corresponding distribution transformer numbers to construct a mapping table of user-transformer relationships. This transforms the algorithmic identification results into usable engineering ledger data. This step converts the originally abstract clustering output into clear, storable, and queryable user-transformer correspondences, enabling the identification results to directly serve transformer topology verification, operation and maintenance management, and system applications. This avoids the problems of existing technologies where identification results are difficult to implement or require manual secondary processing, thereby improving the practicality, scalability, and engineering application value of the user-transformer relationship identification method.
Attached Figure Description
[0033] The present invention is further described below with reference to the accompanying drawings and examples.
Figure 1 is a flowchart of the overall process of the household-transformer relationship identification method of the present invention.
Figure 2 is a structural diagram of the functional modules of the household-transformer relationship identification system of the present invention.
Figure 3 is a graph of voltage characteristic data.
[0034] Figure 4 is a schematic diagram of t-SNE dimensionality reduction.
[0035] Figure 5 is a flowchart of the improved GMM clustering.
[0036] Figure 6 is a schematic diagram of edge user identification and correction.
Detailed Implementation
[0037] The household-transformer relationship identification method for low-voltage distribution areas based on t-SNE and improved GMM includes the following steps:
Step 1: Collect electrical data of users and the corresponding transformers in each low-voltage distribution area, and construct a dataset;
Step 2: Use the t-distributed stochastic neighbor embedding (t-SNE) algorithm to perform nonlinear dimensionality reduction on the high-dimensional voltage features, preserving the local similarity and global distribution features between user voltage curves, and obtain a two-dimensional or three-dimensional visualization feature space, thereby reducing clustering complexity and enhancing class separation;
Step 3: In the dimensionality-reduced feature space, use an improved Gaussian Mixture Model (GMM) to cluster user samples. On top of the traditional GMM, the model introduces the k-means algorithm for parameter initialization together with covariance matrix regularization, to improve the stability and convergence accuracy of the algorithm under uneven intra-class distribution and noisy data. The model parameters are solved iteratively using the Expectation-Maximization (EM) algorithm to obtain the posterior probability matrix of each sample belonging to each cluster;
Step 4: Apply the adaptive maximum a posteriori probability threshold to the posterior probability matrix obtained after GMM clustering to determine whether each user is a marginal user; calculate the Mahalanobis distance between each marginal user and the cluster centers, and perform a secondary correction on the marginal users using the adaptive threshold reclassification method;
Step 5: Based on the final clustering labels, output the mapping between each end user and the corresponding distribution transformer to form the household-transformer relationship identification result for the low-voltage distribution area.
[0038] Figure 1 is a flowchart of the overall process for identifying the household-transformer relationship in a low-voltage distribution area proposed in this invention. It shows the complete sequence of steps: raw voltage data acquisition, t-SNE feature dimensionality reduction, improved Gaussian mixture model clustering, edge user identification and correction, and the final output of the household-transformer relationship mapping result.
[0039] Figure 2 shows the system functional module structure corresponding to the present invention, illustrating the specific implementation structure of the method at the device level, including a data acquisition and preprocessing module, a feature dimensionality reduction module, an improved GMM clustering module, an edge user identification and correction module, and a user-transformer relationship output module. The modules cooperate through data flow.
[0040] Figure 3 is a schematic diagram of the voltage characteristic data used in this invention, showing the high-dimensional time-series structure of user voltage curves at 96 points a day. It illustrates that user categories are difficult to distinguish intuitively from high-dimensional data in the original space, which highlights the necessity of dimensionality reduction.
[0041] Figure 4 illustrates t-SNE dimensionality reduction. By embedding high-dimensional voltage features into a two-dimensional visualization space, it shows the cluster structure formed by user samples in the low-dimensional space after dimensionality reduction, providing a clear feature representation for subsequent cluster-based identification.
[0042] Figure 5 is the flowchart of the improved GMM clustering of this invention. It illustrates the steps of constructing a stable and reliable mixture model using k-means initialization, covariance matrix regularization, EM iterative solution, and edge user identification and reclassification, and finally generating the posterior probability matrix of each sample to achieve a preliminary determination of the user category.
[0043] Figure 6 is a schematic diagram of the edge user identification and correction method of the present invention. It shows the position of edge samples with low posterior probability in the clustering space and the process of secondary correction by calculating the Mahalanobis distance, thereby optimizing the initial clustering results and improving the accuracy and robustness of household-transformer relationship identification.
Claims
1. A method for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM, characterized in that it includes the following steps:
Step 1: Collect electrical data of the users and corresponding transformers in each low-voltage distribution area, and construct a dataset;
Step 2: Use the t-SNE algorithm to perform nonlinear dimensionality reduction on the high-dimensional voltage features, preserving the local similarity and global distribution characteristics between user voltage curves, and obtain a low-dimensional feature space that can be visualized;
Step 3: In the reduced feature space, use an improved Gaussian Mixture Model (GMM) to cluster and identify the user samples;
Step 4: After clustering with the improved GMM, apply an adaptive maximum-posterior-probability threshold to the posterior probability matrix to determine whether each user is a marginal user; calculate the Mahalanobis distance between each marginal user and the cluster centers, and perform a secondary correction for the marginal users by adaptive-threshold reclassification;
Step 5: Based on the final clustering labels, output the mapping between each end user and the corresponding transformer to form the household-transformer relationship identification result for the low-voltage distribution area.
2. The method for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM according to claim 1, characterized in that: in step 1, time-series voltage data of the users and corresponding transformers in the low-voltage distribution area are collected at a fixed sampling frequency and then preprocessed as follows. The time-series voltage data of each user is converted into a one-dimensional single-row vector $\mathbf{v} = [v_1, v_2, \ldots, v_D]$, where $\mathbf{v}$ is the voltage data vector of a single user on a single day, $v_t$ is the voltage value at the $t$-th time point, and $D$ is the number of sampling points within a day. The vectors of all users on the same date are stacked into a matrix, and interpolation is used to fill in any missing values, giving the original voltage data matrix $V$:
$$V = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1D} \\ v_{21} & v_{22} & \cdots & v_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nD} \end{bmatrix} \tag{1}$$
In equation (1), $n$ is the total number of users in the distribution area; $D$ is the number of voltage data points collected per day; $v_{11}$ is the voltage measurement of the first user in the distribution area at the first time point; $v_{12}$ is the voltage measurement of the first user at the second time point; $v_{1D}$ is the voltage measurement of the first user at the $D$-th time point; $v_{nD}$ is the voltage measurement of the $n$-th user at the $D$-th time point.
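By way of non-limiting illustration, the gap-filling part of this preprocessing might be sketched as follows. The claim only says "interpolation"; linear interpolation along each user's daily curve, the function name `fill_missing`, and the use of NaN to mark missing readings are all assumptions made for this sketch.

```python
import numpy as np

def fill_missing(V):
    """Fill NaN gaps in each user's daily voltage curve (one row per user)."""
    V = np.array(V, dtype=float)           # copy, so the caller's data is untouched
    t = np.arange(V.shape[1])              # sampling indices 0 .. D-1
    for row in V:                          # each row is a view into V
        gap = np.isnan(row)
        if gap.any() and not gap.all():
            # Linear interpolation between known points (assumed scheme);
            # gaps at either end take the nearest known value.
            row[gap] = np.interp(t[gap], t[~gap], row[~gap])
    return V

# Hypothetical readings: 2 users, 4 sampling points, NaN = missing
raw = [[220.0, np.nan, 222.0, 223.0],
       [np.nan, 230.0, 231.0, np.nan]]
V = fill_missing(raw)
```

A row with no valid reading at all is left unchanged here; how such users should be handled is not stated in the claim.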
3. The method for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM according to claim 2, characterized in that: the original voltage data matrix $V$ is standardized using the Z-Score method. The standardization is:
$$\tilde{\mathbf{v}}_j = \frac{\mathbf{v}_j - \mu_j}{\sigma_j}, \qquad \mathbf{v}_j = [v_{1j}, v_{2j}, \ldots, v_{nj}]^{T} \tag{2}$$
In equation (2), $\mathbf{v}_j$ is the $j$-th column vector of the original voltage data matrix $V$ and represents the voltage data of all users at the $j$-th time point; $v_{1j}$ is the voltage value of the first user at the $j$-th time point; $v_{2j}$ is the voltage value of the second user at the $j$-th time point; $v_{nj}$ is the voltage value of the $n$-th user at the $j$-th time point; $T$ denotes the transpose; $\mu_j$ is the mean voltage of all users at the $j$-th time point; $\sigma_j$ is the standard deviation of the voltage of all users at the $j$-th time point; $\tilde{\mathbf{v}}_j$ is the standardized voltage of all users in the distribution area at the $j$-th time point, with the subtraction and division applied element-wise. The standardized user voltage dataset of the distribution area is $X = [\tilde{\mathbf{v}}_1, \tilde{\mathbf{v}}_2, \ldots, \tilde{\mathbf{v}}_D] \in \mathbb{R}^{n \times D}$, where $\mathbb{R}$ is the set of real numbers, meaning that every element of the voltage dataset $X$ is a real number; $\mathbb{R}^{n \times D}$ is the set of all real matrices of size $n \times D$; $\tilde{\mathbf{v}}_1$ is the standardized voltage of all users in the distribution area at the first time point; $\tilde{\mathbf{v}}_D$ is the standardized voltage of all users at the $D$-th time point.
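As a minimal sketch of this column-wise Z-Score standardization: each time point (column) is centered on the mean over all users and scaled by the standard deviation over all users. The function name `zscore_columns`, the use of the population standard deviation, and the guard for constant columns are assumptions of this sketch, not details taken from the claim.

```python
import numpy as np

def zscore_columns(V):
    """Standardize each time point (column) across all users."""
    V = np.asarray(V, dtype=float)
    mu = V.mean(axis=0)                       # mean over users, per time point
    sigma = V.std(axis=0)                     # population std (assumed), per time point
    sigma = np.where(sigma > 0, sigma, 1.0)   # assumed guard: constant column maps to 0
    return (V - mu) / sigma

# Hypothetical matrix: 3 users, 3 time points (third column is constant)
V = np.array([[220.0, 221.0, 219.0],
              [222.0, 223.0, 219.0],
              [224.0, 225.0, 219.0]])
X = zscore_columns(V)
```

After this step every non-constant column of `X` has zero mean and unit standard deviation, so time points with large absolute voltage swings do not dominate the distance computations that follow.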
4. The method for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM according to claim 3, characterized in that step 2 includes the following steps:
Step 2.1: Determine the feature dimension of the user voltage dataset of the distribution area and set the target feature dimension to $d$. The low-dimensional voltage feature matrix $Y$ obtained after dimensionality reduction is:
$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1d} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nd} \end{bmatrix} \tag{3}$$
In equation (3), $y_{11}$ is the value of the first user in the first dimension after dimensionality reduction; $y_{12}$ is the value of the first user in the second dimension; $y_{1d}$ is the value of the first user in the $d$-th dimension; $y_{nd}$ is the value of the $n$-th user in the $d$-th dimension.
Step 2.2: Model similarity in the high-dimensional space. For the set of user voltage feature vectors $X$, the similarity between each user and its neighbors in the high-dimensional space is calculated by transforming Euclidean distance into a conditional probability, which gives the similarity distribution $P$ between users:
$$p_{j|i} = \frac{\exp\left(-\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert \mathbf{x}_i - \mathbf{x}_k \rVert^2 / 2\sigma_i^2\right)} \tag{4}$$
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n} \tag{5}$$
In the above formulas, $p_{j|i}$ and $p_{i|j}$ are the conditional probabilities between users $i$ and $j$ in the distribution area; $\sigma_i^2$ is the variance of the Gaussian distribution associated with user $i$; $p_{ij}$ is the joint probability; $\mathbf{x}_i$ and $\mathbf{x}_j$ are the $i$-th and $j$-th row vectors of the standardized voltage data matrix $X$, that is, the sequences of standardized voltage values of users $i$ and $j$ at all sampling times of the day; $k \neq i$ ranges over all users other than user $i$; $\exp(\cdot)$ is the Gaussian kernel function that transforms Euclidean distance into a similarity measure.
Step 2.3: Construct the similarity distribution $Q$ in the low-dimensional space:
$$q_{ij} = \frac{\left(1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert \mathbf{y}_k - \mathbf{y}_l \rVert^2\right)^{-1}} \tag{6}$$
In equation (6), $q_{ij}$ is the similarity between users $i$ and $j$ of the distribution area in the low-dimensional space; $\mathbf{y}_i$ and $\mathbf{y}_j$ are the low-dimensional coordinates of users $i$ and $j$ after dimensionality reduction; $\lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2$ is the squared Euclidean distance between the two coordinate points in the low-dimensional space.
Step 2.4: Calculate the Kullback-Leibler (KL) divergence $C$ and solve for $Y$ iteratively:
$$C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \tag{7}$$
In equation (7), $C$ measures the difference between the similarity distributions of the users before and after dimensionality reduction; the smaller the value of $C$, the more consistent the two distributions are. $\mathrm{KL}(P \Vert Q)$ is the function that measures the difference between the two probability distributions $P$ and $Q$; $p_{ij}$ is the symmetric joint-probability similarity of users $i$ and $j$ in the high-dimensional space; $q_{ij}$ is the symmetric joint-probability similarity of users $i$ and $j$ in the low-dimensional space.
Step 2.5: Minimize $C$ using gradient descent. Taking the partial derivative of $C$ with respect to $\mathbf{y}_i$ gives the gradient:
$$\frac{\partial C}{\partial \mathbf{y}_i} = 4 \sum_{j=1}^{n} (p_{ij} - q_{ij})(\mathbf{y}_i - \mathbf{y}_j)\left(1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2\right)^{-1} \tag{8}$$
In equation (8), $\partial C / \partial \mathbf{y}_i$ is the gradient of the loss function $C$ with respect to the low-dimensional features $\mathbf{y}_i$ of user $i$; $\mathbf{y}_i$ and $\mathbf{y}_j$ are the low-dimensional features of users $i$ and $j$; $n$ is the number of users in the distribution area.
$$\frac{\partial C}{\partial Y} = \left[ \frac{\partial C}{\partial \mathbf{y}_1}, \frac{\partial C}{\partial \mathbf{y}_2}, \ldots, \frac{\partial C}{\partial \mathbf{y}_n} \right]^{T} \tag{9}$$
In equation (9), $\partial C / \partial Y$ is the gradient of the loss function $C$ with respect to the entire low-dimensional feature matrix $Y$; $\partial C / \partial \mathbf{y}_1$ is the gradient with respect to the low-dimensional features of the first user; $\partial C / \partial \mathbf{y}_n$ is the gradient with respect to the low-dimensional features of the $n$-th user.
In each iteration, the low-dimensional feature coordinates are updated by gradient descent:
$$Y^{(t)} = Y^{(t-1)} - \eta \frac{\partial C}{\partial Y} + \alpha \left( Y^{(t-1)} - Y^{(t-2)} \right) \tag{10}$$
In equation (10), $\eta$ is the learning rate; $\alpha$ is the momentum factor, used to accelerate convergence and to avoid getting trapped in local minima; $Y^{(t)}$, $Y^{(t-1)}$ and $Y^{(t-2)}$ are the low-dimensional voltage feature matrices obtained after the $t$-th, $(t-1)$-th and $(t-2)$-th iterations respectively; $\partial C / \partial Y$ is the gradient with respect to the entire low-dimensional feature matrix $Y$.
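The iteration of steps 2.2 to 2.5 can be sketched in a few lines of NumPy. This is a deliberately simplified illustration under stated assumptions: a single shared Gaussian width `sigma` replaces the per-user $\sigma_i$ (which practical t-SNE implementations usually tune through a perplexity search), the update is written as a descent step with momentum, and the function names, learning rate, momentum value and iteration count are hypothetical choices for a very small dataset.

```python
import numpy as np

def sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    s = (A * A).sum(axis=1)
    return np.maximum(s[:, None] + s[None, :] - 2.0 * A @ A.T, 0.0)

def high_dim_P(X, sigma=1.0):
    """Conditional p_{j|i} (shared sigma assumed), then symmetric p_ij."""
    n = X.shape[0]
    P = np.exp(-sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)                  # exclude k = i from the sum
    P /= P.sum(axis=1, keepdims=True)         # row i now holds p_{j|i}
    return (P + P.T) / (2.0 * n)

def low_dim_Q(Y):
    """Student-t similarities q_ij and the unnormalized kernel W."""
    W = 1.0 / (1.0 + sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl(P, Q):
    m = P > 0                                 # terms with p_ij = 0 contribute 0
    return float((P[m] * np.log(P[m] / np.maximum(Q[m], 1e-300))).sum())

def tsne_sketch(X, d=2, n_iter=300, eta=1.0, alpha=0.5, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    P = high_dim_P(X, sigma)
    Y = 1e-2 * rng.standard_normal((X.shape[0], d))   # small random start
    Y_prev = Y.copy()
    history = []
    for _ in range(n_iter):
        Q, W = low_dim_Q(Y)
        history.append(kl(P, Q))
        M = (P - Q) * W
        # Row i of grad equals 4 * sum_j M_ij * (y_i - y_j)
        grad = 4.0 * (np.diag(M.sum(axis=1)) - M) @ Y
        # Descent step plus momentum term alpha * (Y_{t-1} - Y_{t-2})
        Y, Y_prev = Y - eta * grad + alpha * (Y - Y_prev), Y
    history.append(kl(P, low_dim_Q(Y)[0]))
    return Y, history

# Hypothetical data: two groups of 10 users with 10 standardized features each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 10)), rng.normal(3.0, 0.1, (10, 10))])
Y, history = tsne_sketch(X)
```

The returned `history` holds the KL divergence before each update and after the last one, which makes it easy to check that the embedding is actually reducing $C$. For real data a library implementation with per-point perplexity calibration would normally be preferred over this sketch.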
5. The method for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM according to claim 4, characterized in that: in step 3, after the t-SNE dimensionality reduction of the voltage features is completed, the resulting two-dimensional or three-dimensional feature space is used as the clustering input, and the improved Gaussian Mixture Model (GMM) is used to cluster and identify the user samples. Step 3 includes the following steps:
Step 3.1: Let the distribution area contain $n$ users in total, and denote the voltage feature vector of each user as:
$$\mathbf{y}_i \in \mathbb{R}^{d}, \qquad i = 1, 2, \ldots, n \tag{11}$$
In equation (11), $d$ is the feature dimension; $n$ is the number of users in the distribution area; $\mathbf{y}_i$ is the low-dimensional feature vector of user $i$; $\mathbb{R}^{d}$ is the set of real vectors of dimension $d$. The users are divided into $K$ categories, each corresponding to one transformer. The GMM assumes that the data distribution is a combination of $K$ Gaussian distributions:
$$p(\mathbf{y}_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{y}_i \mid \boldsymbol{\mu}_k, \Sigma_k) \tag{12}$$
In equation (12), $\pi_k$ is the mixing coefficient, which is the prior probability that a user belongs to the $k$-th distribution area; $\mathcal{N}(\cdot)$ is the probability density function of a Gaussian distribution; $\boldsymbol{\mu}_k$ and $\Sigma_k$ are the mean and covariance matrix of the voltage features of the $k$-th distribution area; $p(\mathbf{y}_i)$ is the probability density of the low-dimensional features of user $i$.
Step 3.2: Use the k-means algorithm to pre-cluster the samples and obtain the initial mean vector, covariance matrix, and mixing coefficient of each component:
$$J = \sum_{k=1}^{K} \sum_{\mathbf{y}_i \in S_k} \lVert \mathbf{y}_i - \mathbf{m}_k \rVert^2 \tag{13}$$
In equation (13), $\mathbf{m}_k$ is the center of the $k$-th cluster; $S_k$ is the set of samples in that cluster; $J$ is the objective function of the k-means algorithm; $\mathbf{y}_i$ is the voltage feature sample of user $i$. The algorithm adjusts the cluster centers through an iterative "assignment-update" process until the objective function converges. The cluster centers output by k-means are used to initialize the mean vectors $\boldsymbol{\mu}_k$ of the Gaussian mixture model, the covariance within each cluster is used as the initial covariance matrix $\Sigma_k$, and the proportion of samples in each cluster is used as the initial value of the mixing coefficient $\pi_k$.
Step 3.3: Add a small regularization term to the diagonal of the covariance matrix to ensure that the covariance matrix is positive definite:
$$N_k = \sum_{i=1}^{n} \gamma_{ik} \tag{14}$$
$$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} (\mathbf{y}_i - \boldsymbol{\mu}_k)(\mathbf{y}_i - \boldsymbol{\mu}_k)^{T} + \varepsilon I \tag{15}$$
where $N_k$ is the effective number of samples in the $k$-th cluster; $\gamma_{ik}$ is the posterior probability that sample $i$ belongs to component $k$; $\boldsymbol{\mu}_k$ is the mean vector of the $k$-th component; $\Sigma_k$ is the covariance matrix of the $k$-th cluster; $\varepsilon$ is a small positive number; $I$ is the identity matrix.
Step 3.4: Update the model parameters. Calculate the posterior probability of each user belonging to each distribution area based on the current parameters, then iteratively update the model parameters $\theta = \{\pi_k, \boldsymbol{\mu}_k, \Sigma_k\}$. Introduce a latent categorical variable $z_i$ indicating which cluster user $i$ belongs to. The generation process of a user data point is as follows. First, sample a category from a categorical (multinomial) distribution:
$$z_i \sim \mathrm{Categorical}(\pi_1, \pi_2, \ldots, \pi_K) \tag{16}$$
Then, under the selected category $k$, sample the data from a Gaussian distribution:
$$\mathbf{y}_i \mid z_i = k \sim \mathcal{N}(\boldsymbol{\mu}_k, \Sigma_k) \tag{17}$$
In equation (17), $\mathbf{y}_i$ is the observed value and $\mathcal{N}$ is the Gaussian distribution it follows. Given the observation $\mathbf{y}_i$, the posterior probability that it belongs to cluster $k$ is calculated using Bayes' theorem:
$$\gamma_{ik} = P(z_i = k \mid \mathbf{y}_i) = \frac{\pi_k \, \mathcal{N}(\mathbf{y}_i \mid \boldsymbol{\mu}_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{y}_i \mid \boldsymbol{\mu}_j, \Sigma_j)} \tag{18}$$
In equation (18), $\mathcal{N}(\mathbf{y}_i \mid \boldsymbol{\mu}_k, \Sigma_k)$ is the density of the observed value $\mathbf{y}_i$ under the Gaussian distribution of cluster $k$. The parameters $\theta$ are estimated by the maximum likelihood method:
$$L(\theta) = \prod_{i=1}^{n} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{y}_i \mid \boldsymbol{\mu}_k, \Sigma_k) \tag{19}$$
In equation (19), $L(\theta)$ is the likelihood function, that is, the joint probability of all sample points under the assumption that they are independently distributed; it is the probability that the model with parameters $\theta$ generates the observed data. Taking the logarithm of equation (19) gives:
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{y}_i \mid \boldsymbol{\mu}_k, \Sigma_k) \tag{20}$$
In equation (20), $\ell(\theta)$ is the log-likelihood function. The objective function $\ell(\theta)$ is maximized using the expectation-maximization (EM) algorithm:
E-step: calculate the posterior probability $\gamma_{ik}$ using Bayes' theorem, as in equation (18);
M-step: update the parameters:
$$\boldsymbol{\mu}_k^{(t+1)} = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} \mathbf{y}_i, \qquad \Sigma_k^{(t+1)} = \frac{1}{N_k} \sum_{i=1}^{n} \gamma_{ik} \left(\mathbf{y}_i - \boldsymbol{\mu}_k^{(t+1)}\right)\left(\mathbf{y}_i - \boldsymbol{\mu}_k^{(t+1)}\right)^{T} + \varepsilon I, \qquad \pi_k^{(t+1)} = \frac{N_k}{n} \tag{21}$$
In equation (21), $\boldsymbol{\mu}_k^{(t+1)}$ is the mean voltage feature of the $k$-th distribution area after the $(t+1)$-th iteration; $\Sigma_k^{(t+1)}$ is the covariance matrix of the $k$-th distribution area after the $(t+1)$-th iteration, regularized as in equation (15); $\pi_k^{(t+1)}$ is the mixing coefficient of the $k$-th distribution area after the $(t+1)$-th iteration. The iterations are repeated until the log-likelihood function converges, which gives the Gaussian mixture model parameter values and outputs the user posterior probability matrix $\Gamma = [\gamma_{ik}]$ and the preliminary clustering labels $\hat{z}_i$.
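As a hedged sketch of steps 3.2 to 3.4, the following NumPy code initializes a Gaussian mixture from a plain k-means run, adds a small multiple of the identity to every covariance matrix, and runs EM until the log-likelihood stops changing. The function names, the value `eps=1e-6`, the tolerance and the synthetic data are illustrative assumptions; the E-step is computed in log space for numerical stability, which the claim does not prescribe.

```python
import numpy as np

def kmeans(Y, K, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns cluster centers and hard labels."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=K, replace=False)]
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                        # assignment step
        new = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])  # update step
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

def log_gauss(Y, mu, Sigma):
    """Log density of N(mu, Sigma) at every row of Y (Cholesky based)."""
    d = Y.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (Y - mu).T)
    return -0.5 * (d * np.log(2 * np.pi) + 2 * np.log(np.diag(L)).sum()
                   + (z ** 2).sum(axis=0))

def e_step(Y, pi, mu, Sigma):
    """Posterior matrix gamma (n x K) and the total log-likelihood."""
    log_r = np.stack([np.log(pi[k]) + log_gauss(Y, mu[k], Sigma[k])
                      for k in range(len(pi))], axis=1)
    m = log_r.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_r - m).sum(axis=1, keepdims=True))
    return np.exp(log_r - log_norm), float(log_norm.sum())

def fit_gmm(Y, K, eps=1e-6, n_iter=200, tol=1e-8, seed=0):
    n, d = Y.shape
    mu, labels = kmeans(Y, K, seed=seed)                  # k-means initialization
    pi = np.array([(labels == k).mean() for k in range(K)])
    Sigma = np.array([np.atleast_2d(np.cov(Y[labels == k].T, bias=True))
                      + eps * np.eye(d) for k in range(K)])
    prev = -np.inf
    for _ in range(n_iter):
        gamma, ll = e_step(Y, pi, mu, Sigma)
        if abs(ll - prev) < tol:                          # log-likelihood converged
            break
        prev = ll
        Nk = gamma.sum(axis=0)                            # effective sample counts
        pi = Nk / n
        mu = gamma.T @ Y / Nk[:, None]
        for k in range(K):
            diff = Y - mu[k]
            Sigma[k] = (gamma[:, [k]] * diff).T @ diff / Nk[k] + eps * np.eye(d)
    gamma, _ = e_step(Y, pi, mu, Sigma)
    return pi, mu, Sigma, gamma

# Hypothetical low-dimensional features: two groups of 30 users each
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal([0.0, 0.0], 0.5, (30, 2)),
               rng.normal([5.0, 5.0], 0.5, (30, 2))])
pi, mu, Sigma, gamma = fit_gmm(Y, K=2)
labels = gamma.argmax(axis=1)                             # preliminary labels
```

The sketch assumes every k-means cluster is non-empty; a production implementation would also need to handle degenerate clusters and would typically try several initializations.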
6. The method for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM according to claim 5, characterized in that step 4 includes the following steps:
Step 4.1: Determine whether each user is a marginal user. Define the maximum posterior probability of user $i$:
$$p_i^{\max} = \max_{k} \gamma_{ik} \tag{22}$$
In equation (22), $\gamma_{ik}$ is the posterior probability that user $i$ belongs to cluster $k$, and $p_i^{\max}$ is the maximum posterior probability of user $i$. Set an adaptive threshold $\tau$; if $p_i^{\max} < \tau$, then user $i$ is a marginal user. The threshold $\tau$ is calculated as:
$$\tau = \mu_p - \lambda \sigma_p \tag{23}$$
In equation (23), $\mu_p$ and $\sigma_p$ are the mean and standard deviation of the within-cluster posterior probability, respectively, and $\lambda$ is the adjustment parameter.
Step 4.2: Reassign the marginal users. For samples identified as marginal users, the Mahalanobis distance is used to measure the proximity of the user to each cluster center and to correct the user assignment:
$$D_M(\mathbf{y}_i, \boldsymbol{\mu}_k) = \sqrt{(\mathbf{y}_i - \boldsymbol{\mu}_k)^{T} \Sigma_k^{-1} (\mathbf{y}_i - \boldsymbol{\mu}_k)} \tag{24}$$
In equation (24), $D_M(\mathbf{y}_i, \boldsymbol{\mu}_k)$ is the Mahalanobis distance between the low-dimensional features $\mathbf{y}_i$ of user $i$ and the center $\boldsymbol{\mu}_k$ of cluster $k$; $\Sigma_k^{-1}$ is the inverse of the covariance matrix of cluster $k$. The final assignment rule is:
$$z_i^{*} = \arg\min_{k} D_M(\mathbf{y}_i, \boldsymbol{\mu}_k) \tag{25}$$
In equation (25), $z_i^{*}$ is the cluster category number of user $i$, which corresponds to one distribution transformer in the low-voltage distribution area, giving the corrected user cluster labels of the distribution area; $\min_k D_M(\mathbf{y}_i, \boldsymbol{\mu}_k)$ is the minimum Mahalanobis distance between the user and the cluster centers.
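One possible reading of steps 4.1 and 4.2 is sketched below. The threshold is taken here as the mean minus `lam` times the standard deviation of the maximum posterior probabilities, computed separately inside each preliminary cluster; that per-cluster interpretation, the sign of the threshold formula, the default `lam=1.0` and all function names are assumptions of this sketch. Non-marginal users keep their preliminary label.

```python
import numpy as np

def mahalanobis(Y, mu, Sigma):
    """Mahalanobis distance from every row of Y to the center mu."""
    diff = Y - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Sigma), diff))

def correct_marginal_users(Y, gamma, mu, Sigma, lam=1.0):
    K = gamma.shape[1]
    labels = gamma.argmax(axis=1)              # preliminary labels
    p_max = gamma.max(axis=1)                  # maximum posterior per user
    marginal = np.zeros(len(Y), dtype=bool)
    for k in range(K):
        members = labels == k
        if members.any():
            # Assumed adaptive threshold: within-cluster mean minus lam * std
            tau_k = p_max[members].mean() - lam * p_max[members].std()
            marginal[members] = p_max[members] < tau_k
    dist = np.stack([mahalanobis(Y, mu[k], Sigma[k]) for k in range(K)], axis=1)
    corrected = labels.copy()
    corrected[marginal] = dist[marginal].argmin(axis=1)   # nearest center wins
    return corrected, marginal

# Hypothetical example: user 3 sits between the clusters with a weak posterior
Y = np.array([[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1],
              [6.0, 0.0], [10.1, 0.0], [9.9, 0.0]])
gamma = np.array([[0.99, 0.01], [0.99, 0.01], [0.99, 0.01],
                  [0.55, 0.45], [0.02, 0.98], [0.02, 0.98]])
mu = np.array([[0.0, 0.0], [10.0, 0.0]])
Sigma = np.array([np.eye(2), np.eye(2)])
corrected, marginal = correct_marginal_users(Y, gamma, mu, Sigma)
```

In this toy input the posteriors are written by hand rather than derived from the Gaussians, purely to show the mechanics: user 3 falls below its cluster's threshold and is moved to the cluster whose center is closer in Mahalanobis distance.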
7. The method for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM according to claim 6, characterized in that: in step 5, the final category labels of all users are grouped by clustering component to obtain the user sets $C_k$:
$$C_k = \{\, i \mid z_i^{*} = k,\ i = 1, 2, \ldots, n \,\}, \qquad k = 1, 2, \ldots, K \tag{26}$$
In equation (26), $i$ is the user number and $n$ is the number of users in the distribution area; $z_i^{*}$ is the final category label of user $i$; $k$ is the distribution area number and $K$ is the number of distribution areas. Each user set is then matched with the transformer number of the corresponding component to form the final household-transformer relationship mapping table.
8. The method for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM according to claim 7, characterized in that: the set of transformer numbers $T = \{T_1, T_2, \ldots, T_K\}$ is obtained from the distribution area archives; the transformers are numbered, and each distribution area cluster $C_k$ is matched with the corresponding transformer number $T_k$ to construct the household-transformer mapping: $f(i) = T_k$ for $i \in C_k$, where $f(i)$ is the transformer number mapped to user $i$. The mapping is represented by the household-transformer relationship mapping table. Through this table, the supply transformer to which each user belongs can be clearly identified, which realizes automatic identification of the household-transformer correspondence.
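A minimal sketch of building the mapping table is shown below. It assumes that the correspondence between cluster index and transformer number is already known and is passed in as a dictionary; the claims do not spell out how that correspondence is established, and the identifiers `build_mapping_table`, `U1` and `T1` are hypothetical.

```python
from collections import defaultdict

def build_mapping_table(user_ids, final_labels, transformer_numbers):
    """Group users by final cluster label, then map each user to a transformer."""
    clusters = defaultdict(list)
    for uid, k in zip(user_ids, final_labels):
        clusters[int(k)].append(uid)          # user set for cluster k
    # Every user in cluster k is assigned transformer_numbers[k]
    table = {uid: transformer_numbers[k]
             for k, members in clusters.items() for uid in members}
    return dict(clusters), table

# Hypothetical inputs
user_ids = ["U1", "U2", "U3", "U4"]
final_labels = [0, 1, 0, 1]
transformer_numbers = {0: "T1", 1: "T2"}      # assumed cluster-to-transformer match
clusters, table = build_mapping_table(user_ids, final_labels, transformer_numbers)
```

The `table` dictionary plays the role of the household-transformer relationship mapping table: looking up a user identifier returns the number of the transformer that supplies that user.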
9. A system for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM, characterized in that the system includes a voltage data preprocessing module, a t-SNE dimensionality reduction module, an improved GMM clustering module, and a marginal user correction module, wherein:
The voltage data preprocessing module is used to collect and clean the time-series voltage features of low-voltage users;
The t-SNE dimensionality reduction module uses the ability of t-SNE to preserve the local structure of samples to map the high-dimensional voltage features to a two-dimensional or three-dimensional space, thereby enhancing the clustering of classes;
The improved GMM clustering module introduces k-means initialization of the parameters on the basis of the traditional GMM to improve model stability and robustness;
The marginal user correction module, based on the posterior probabilities output by the GMM, the adaptive maximum-posterior-probability threshold, and the Mahalanobis distance distribution, performs a secondary classification correction on low-confidence samples to ensure that the identification results are consistent with the actual distribution area structure.
10. The system for identifying household-transformer relationships in a low-voltage distribution area based on t-SNE and improved GMM according to claim 9, characterized in that:
The voltage data preprocessing module collects voltage time-series data from all users within the distribution area, performs missing value repair, noise reduction, and standardization on the data, and constructs an $n \times D$ voltage feature matrix, where $n$ is the number of users and $D$ is the number of sampling points in a single day;
The t-SNE dimensionality reduction module performs dimensionality reduction on the original voltage feature matrix, reducing the high-dimensional data to a two-dimensional or three-dimensional space as input for the subsequent clustering algorithm;
The improved GMM clustering module performs Gaussian mixture clustering on the dimensionality-reduced feature data and optimizes the initial model by setting a covariance matrix regularization term and using multiple k-means initializations;
The marginal user correction module performs a secondary correction on marginal samples with low posterior probabilities, which includes calculating the Mahalanobis distance from the sample to each class center and determining the optimal class of the sample based on the posterior probability and the distance. Finally, it outputs the mapping between users and transformers, forming a complete household-transformer relationship identification result for the low-voltage distribution area.