Method for automatically identifying and checking power distribution network topology by fusing multi-source data
By constructing a multi-source data fusion architecture and applying time-series feature mining and causal inference algorithms, the method automatically inverts the topological relationships of the distribution network and resolves the disconnect between static ledgers and dynamic operation. It achieves high-confidence closed-loop correction and precise synchronization of the digital twin model, thereby improving the intelligent operation and maintenance capabilities of the distribution network.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Application (China)
- Current Assignee / Owner: CHENGDU SUN HIGH-TECH CO LTD
- Filing Date: 2026-05-14
- Publication Date: 2026-06-12
AI Technical Summary
In existing technologies, the static records of distribution network topology are disconnected from dynamic operation, and there is a lack of multi-source data fusion mechanisms. This results in delayed topology error detection and a lack of automatic correction capabilities, affecting the smooth operation of distribution and dispatching and the reliability of power supply.
A three-layer integrated architecture of 'data layer - feature layer - decision layer' is constructed. By utilizing multi-source heterogeneous data from marketing, scheduling and production systems, and through time-series feature mining and causal inference algorithms, topological relationships are automatically inverted to achieve closed-loop correction under high confidence.
It enables automatic identification and dynamic verification of distribution network topology, significantly improving the accuracy of topology identification and closed-loop correction capability, ensuring real-time synchronization between digital twin model and physical entity, reducing manual verification workload, and improving the intelligent level of distribution network operation and maintenance.
Abstract drawing: Figure CN122203584A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of power system automation and digital twin technology, and specifically relates to a method for automatic identification and verification of distribution network topology that integrates multi-source data.
Background Technology
[0002] As the crucial link between the transmission network and end users, the distribution network depends on an accurate topology, which is the core foundation for integrated marketing, distribution, and dispatching operations, lean line-loss management, precise fault location, and power supply reliability analysis. At present, distribution network topology relies mainly on static ledger data maintained manually in GIS and production systems.
[0003] However, existing technologies have the following inherent drawbacks:
[0004] 1. Disconnect between static ledgers and dynamic operation: frequent distribution network upgrades and highly random user connections mean that manually entered static ledgers are updated with significant delays, so the ledgers diverge from the actual network. Traditional methods cannot use massive real-time operational data to dynamically verify static ledgers.
[0005] 2. Data silos across marketing, distribution, and dispatching: the marketing system (user electricity consumption), the dispatching system (operational power flow), and the production system (equipment ledger) are independent of each other. They lack an effective data fusion mechanism and cannot cross-verify the accuracy of the topology from multiple dimensions.
[0006] 3. Passive topology error detection and correction mechanism: Existing topology verification mostly relies on manual on-site inspection or simple comparison based on a single data source, which is inefficient and has poor timeliness, and lacks the closed-loop capability to automatically detect and correct erroneous topologies.
[0007] Therefore, how to break down data silos, use multi-source operational data to invert static topological relationships, and realize automatic identification, dynamic verification, and closed-loop correction of distribution network topology is a technical problem that urgently needs to be solved.
Summary of the Invention
[0008] The purpose of this invention is to provide an automatic identification and verification method for the topology of a power distribution network that integrates multi-source data. This method aims to solve the problems of inaccurate static ledgers, delayed topology error detection, and lack of closed-loop correction mechanisms in existing technologies. By constructing a three-layer integrated architecture of "data layer - feature layer - decision layer", it utilizes multi-source heterogeneous data from marketing, scheduling, and production systems. Through time-series feature mining and causal inference algorithms, it automatically inverts topological relationships, achieves closed-loop correction under high confidence, and realizes accurate synchronization between the digital twin model and physical entities of the power distribution network.
[0009] To achieve the above-mentioned objectives, the technical solution adopted by this invention is: an automatic identification and verification method for power distribution network topology that integrates multi-source data, comprising the following steps:
[0010] S1. Collect user electricity data from the marketing system, power flow data from the dispatching system, and equipment ledger data from the production system. Perform time-series alignment, outlier removal, and missing value imputation on the collected multi-source heterogeneous data to construct a standardized multi-source time-series dataset. The user electricity data from the marketing system includes high-frequency voltage, current, and electricity time series from user smart meters. The power flow data from the dispatching system includes high-frequency voltage, active power, and reactive power time series from feeder heads and distribution transformers. The equipment ledger data from the production system includes equipment parameters, static topology connections, and geographic information from the production management system / GIS system.
[0011] S2. Identify the actual household-transformer relationship based on the similarity between each user's voltage and the low-voltage side voltage of the transformer, and then identify the actual transformer-line relationship based on the causal relationship between transformer load power and feeder head-end power. A data-driven topology identification model is thus constructed to automatically invert the physical transformer-line-user topology.
[0012] S3. Compare the identified actual topological relationships with the static ledger of the production system to generate a topological difference map, and evaluate the confidence of the identification results through an ensemble learning algorithm.
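Step S3 leaves the data format of the topological difference map open. As a minimal Python sketch, assume each topology layer is a hypothetical dictionary mapping a child device (a user or a transformer) to its parent (a transformer or a feeder). The difference map is then the list of children whose parent disagrees between the identified topology and the production-system ledger.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: child id (user or transformer) -> parent id.
Relations = Dict[str, str]


def topology_diff(identified: Relations,
                  ledger: Relations) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (child, ledger_parent, identified_parent) for every disagreement.

    None on either side means the relation exists in only one of the two sources.
    """
    diffs = []
    for child in sorted(set(identified) | set(ledger)):
        recorded, found = ledger.get(child), identified.get(child)
        if recorded != found:
            diffs.append((child, recorded, found))
    return diffs


identified = {"A": "T05", "B": "T02"}
ledger = {"A": "T08", "B": "T02", "C": "T01"}
print(topology_diff(identified, ledger))  # [('A', 'T08', 'T05'), ('C', 'T01', None)]
```

Each entry of the difference map is then a candidate that the confidence model in S3 has to score before any alarm or correction is issued in S4.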
[0013] S4. Output topology anomaly alarms or automatic correction instructions according to confidence level, update the ledger and synchronize the digital twin model.
[0014] In a preferred technical solution, the household-transformer association identification in step S2 includes:
[0015] Obtain user voltage data and transformer low-voltage side voltage data, and use dynamic time warping (DTW) to calculate the cumulative distance matrix $D$ between the user voltage sequence $U=\{u_1,\dots,u_N\}$ and the transformer low-voltage side voltage sequence $V=\{v_1,\dots,v_M\}$. The dynamic programming recurrence is:
[0016] $$D(i,j)=d(u_i,v_j)+\min\{D(i-1,j),\,D(i,j-1),\,D(i-1,j-1)\},\quad d(u_i,v_j)=\lVert u_i-v_j\rVert,\quad i=1,\dots,N,\; j=1,\dots,M;$$
[0017] In the formula, $D(i,j)$ is the element in row $i$, column $j$ of the cumulative distance matrix, i.e. the cumulative distance of the shortest warping path from the start of the user sequence to its $i$-th point and from the start of the transformer sequence to its $j$-th point; $d(\cdot,\cdot)$ is the Euclidean distance; $u_i$ is the $i$-th voltage sample of the user and $v_j$ is the $j$-th voltage sample on the low-voltage side of the transformer; $N$ and $M$ are the lengths of the user and transformer voltage sequences, respectively; $\min\{\cdot\}$ selects the smallest of the three preceding cells, so that each step follows the most similar path and the final alignment has the highest overall similarity.
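The recurrence above can be sketched in plain Python as follows. This is an illustrative implementation, not the patent's code: it assumes scalar voltage samples (so the Euclidean distance reduces to an absolute difference) and the usual boundary convention of a zero start cell with infinite padding, and it backtracks the optimal path so that the aligned sequences can be passed to the correlation step. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dtw_matrix(user: Sequence[float], trafo: Sequence[float]) -> List[List[float]]:
    """Cumulative distance matrix; row 0 and column 0 are padding for the boundary."""
    n, m = len(user), len(trafo)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(user[i - 1] - trafo[j - 1])  # Euclidean distance of two scalars
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D


def dtw_align(user: Sequence[float],
              trafo: Sequence[float]) -> Tuple[float, List[float], List[float]]:
    """DTW distance plus both sequences re-sampled along the optimal warping path."""
    D = dtw_matrix(user, trafo)
    i, j = len(user), len(trafo)
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Step back to the predecessor cell holding the minimum (diagonal wins ties).
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1),
                      key=lambda cell: cell[0])
    path.reverse()
    return (D[len(user)][len(trafo)],
            [user[a] for a, _ in path],
            [trafo[b] for _, b in path])


# A user curve that lags the transformer curve by one sample still aligns perfectly.
dist, u_aligned, v_aligned = dtw_align([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0])
print(dist, len(u_aligned) == len(v_aligned))  # 0.0 True
```

The full matrix costs O(NM) time and memory, which is modest for one day of 15-minute samples; a band constraint such as Sakoe-Chiba would be a natural refinement for much longer series.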
[0018] Calculate the Pearson correlation coefficient of the DTW-aligned sequences:
[0019] $$r=\frac{\sum_{k=1}^{K}(x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_{k=1}^{K}(x_k-\bar{x})^{2}}\,\sqrt{\sum_{k=1}^{K}(y_k-\bar{y})^{2}}};$$
[0020] In the formula, $r$ is the Pearson correlation coefficient; $x_k$ and $y_k$ are the user voltage and the transformer low-voltage side voltage at the $k$-th sampling point after DTW alignment; $\bar{x}$ and $\bar{y}$ are the means of the aligned user voltage sequence and transformer low-voltage side voltage sequence, respectively; $K$ is the total number of sampling points;
[0021] When the Pearson correlation coefficient is higher than the first threshold, a strong household-transformer topological association is determined.
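A minimal sketch of the correlation check on already-aligned sequences follows. It assumes the example first threshold of 0.85 that the detailed description gives; the function names are hypothetical.

```python
import math
from typing import Sequence

FIRST_THRESHOLD = 0.85  # example value from the detailed description


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, already DTW-aligned sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat curve carries no fluctuation information
    return sxy / math.sqrt(sxx * syy)


def strong_association(x: Sequence[float], y: Sequence[float],
                       threshold: float = FIRST_THRESHOLD) -> bool:
    """True when the aligned curves indicate a strong household-transformer link."""
    return pearson(x, y) > threshold


# A meter reading a constant 2 V above the transformer bus still correlates fully.
print(strong_association([230, 231, 229, 232], [232, 233, 231, 234]))  # True
```

Because Pearson's r is unchanged by adding a constant or multiplying by a positive factor, systematic amplitude offsets between a user meter and the transformer's low-voltage bus do not lower the score, which is why the coefficient is computed after alignment rather than using the raw DTW distance alone.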
[0022] In a preferred technical solution, the transformer-line association identification in step S2 includes:
[0023] Obtain transformer load power data and feeder head-end power data, and perform the ADF stationarity test on the transformer load power sequence $\{P^{T}_{t}\}$ and the feeder head-end power sequence $\{P^{F}_{t}\}$;
[0024] Use the Granger causality test to determine whether transformer power fluctuations Granger-cause feeder power fluctuations, and construct the unconstrained regression model $M_U$ and the constrained regression model $M_R$ accordingly:
[0025] $$M_U:\; P^{F}_{t}=\alpha_0+\sum_{i=1}^{p}\alpha_i P^{F}_{t-i}+\sum_{j=1}^{q}\beta_j P^{T}_{t-j}+\varepsilon_t;$$
[0026] $$M_R:\; P^{F}_{t}=\alpha_0+\sum_{i=1}^{p}\alpha_i P^{F}_{t-i}+\varepsilon_t;$$
[0027] In the formula, $P^{F}_{t}$ is the feeder head-end power at sampling time $t$; $\alpha_0$ is the constant (intercept) term of the regression model, representing the baseline level of the feeder power sequence; $\alpha_i$ is the regression coefficient of the $i$-th feeder power lag term, reflecting the influence of historical feeder power on current feeder power; $P^{F}_{t-i}$ is the feeder active power at sampling time $t-i$ (historical feeder power data); $\beta_j$ is the regression coefficient of the $j$-th transformer power lag term, reflecting the influence of historical transformer power on current feeder power; $P^{T}_{t-j}$ is the transformer active power at sampling time $t-j$ (from distribution transformer operating data); $\varepsilon_t$ is the residual term of the regression model, representing random error; $p$ is the lag order of the feeder power sequence, i.e. the current value is predicted from the feeder power at the previous $p$ moments; $q$ is the lag order of the transformer power sequence, i.e. the transformer power at the previous $q$ moments is used to predict the current feeder power.
[0028] The joint significance of the transformer lag coefficients $\beta_1,\dots,\beta_q$ is tested with the $F$ statistic:
[0029] $$F=\frac{(RSS_R-RSS_U)/q}{RSS_U/(n-p-q-1)};$$
[0030] In the formula, $RSS_R$ and $RSS_U$ are the residual sums of squares of the constrained model and the unconstrained model, respectively; $n$ is the sample size;
[0031] If the p-value corresponding to the $F$ statistic is less than the significance level, transformer power fluctuations Granger-cause feeder power fluctuations, which confirms the topological attribution.
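The Granger test described above can be sketched with ordinary least squares in pure Python. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the effective sample size is the series length minus max(p, q), and converting F into a p-value (for example with `scipy.stats.f.sf(F, q, n - p - q - 1)`, or end to end with `statsmodels.tsa.stattools.grangercausalitytests`) is left out to keep the block dependency-free. The synthetic series below are not data from the patent.

```python
import random
from typing import List, Sequence


def ols_rss(X: List[List[float]], y: List[float]) -> float:
    """Residual sum of squares of an OLS fit, via the normal equations (X'X) b = X'y.

    Gauss-Jordan elimination with partial pivoting; adequate for a few lag terms.
    """
    k = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * t for row, t in zip(X, y))] for a in range(k)]
    for c in range(k):
        pivot_row = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[r][k] for r in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2 for row, t in zip(X, y))


def granger_f(feeder: Sequence[float], trafo: Sequence[float], p: int, q: int) -> float:
    """F statistic for H0: all transformer lag coefficients beta_1..beta_q are zero."""
    times = range(max(p, q), len(feeder))
    y = [feeder[t] for t in times]
    # Constrained model: intercept + p feeder lags.
    x_r = [[1.0] + [feeder[t - i] for i in range(1, p + 1)] for t in times]
    # Unconstrained model: additionally q transformer lags.
    x_u = [row + [trafo[t - j] for j in range(1, q + 1)] for row, t in zip(x_r, times)]
    rss_r, rss_u = ols_rss(x_r, y), ols_rss(x_u, y)
    n = len(y)  # effective sample size after dropping the first max(p, q) points
    return ((rss_r - rss_u) / q) / (rss_u / (n - p - q - 1))


rng = random.Random(0)
trafo = [rng.random() for _ in range(300)]
feeder = [0.0]
for t in range(1, 300):
    # Synthetic feeder that follows the transformer with a one-step lag.
    feeder.append(0.5 * feeder[-1] + 2.0 * trafo[t - 1] + rng.gauss(0, 0.05))
print(granger_f(feeder, trafo, p=2, q=2))  # large F: the transformer lags matter
```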
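[0032] In a preferred embodiment, evaluating the confidence of the identification results with an ensemble learning algorithm in step S3 includes:
[0033] This network structure integrates three heterogeneous models, namely logistic regression, gradient boosting decision tree, and support vector machine, as the base learner layer. The input to the base learner layer is the feature vector $\mathbf{x}=[\bar{r},\,\sigma_r^{2},\,g,\,s,\,h]^{\mathsf{T}}$, where $\bar{r}$ is the mean voltage similarity, $\sigma_r^{2}$ is the similarity variance, $g$ is the Granger causality value, $s$ is the time-series stability index, and $h$ is the historical identification consistency;
[0034] The output of the base learner layer is the probability vector $\mathbf{p}=[p_{LR},\,p_{GBDT},\,p_{SVM}]^{\mathsf{T}}$:
[0035] $$p_{LR}=\sigma(\mathbf{w}^{\mathsf{T}}\mathbf{x}+b),\quad \sigma(z)=\frac{1}{1+e^{-z}};$$
[0036] $$p_{GBDT}=\sum_{m=1}^{L}\gamma_m\,\mathbb{I}(\mathbf{x}\in R_m);$$
[0037] $$p_{SVM}=\operatorname{sign}\Big(\sum_{i=1}^{n_s}\lambda_i y_i K(\mathbf{x}_i,\mathbf{x})+b_s\Big),\quad K(\mathbf{x}_i,\mathbf{x})=\exp\Big(-\frac{\lVert\mathbf{x}-\mathbf{x}_i\rVert^{2}}{2\delta^{2}}\Big);$$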
[0038] In the formula, $p_{LR}$ is the confidence of the topological relationship output by logistic regression; $\mathbf{w}^{\mathsf{T}}$ is the transpose of the weight vector $\mathbf{w}$ of the logistic regression model, whose dimension matches that of the feature vector $\mathbf{x}$ and whose components are the importance weights of the individual topological features; $b$ is the bias (intercept) term of the logistic regression model; $\sigma(\cdot)$ is the Sigmoid activation function, which maps the linear score to the interval $(0,1)$ and outputs a confidence score in probability form.
[0039] $p_{GBDT}$ is the confidence of the topological relationship output by the gradient boosting decision tree; $R_m$ is the feature-space region partitioned by the $m$-th tree; $L$ is the total number of decision trees in the gradient boosting decision tree model; $\gamma_m$ is the output weight of the $m$-th decision tree, representing that tree's contribution to the final result; $\mathbb{I}(\cdot)$ is the indicator function, equal to 1 when the condition holds and 0 otherwise, i.e. it indicates whether the feature vector $\mathbf{x}$ falls in the region of the $m$-th tree;
[0040] $p_{SVM}$ is the classification result output by the support vector machine; $\operatorname{sign}(\cdot)$ is the sign function, which outputs +1 (trusted) when its argument is greater than 0 and -1 (not trusted) when it is less than 0; $K(\cdot,\cdot)$ is the kernel function, which maps feature vectors into a high-dimensional space to achieve nonlinear classification, and this invention uses the radial basis function (RBF) kernel, where $\delta$ is the kernel width parameter controlling the complexity of the mapping space; $\lambda_i$ is the Lagrange multiplier of the $i$-th support vector, representing the importance of that sample; $y_i$ is the label of the $i$-th training sample, where +1 indicates that the topological relationship is correct and -1 that it is incorrect; $\mathbf{x}_i$ is the feature vector of the $i$-th support vector; $n_s$ is the number of support vectors in the training samples; $b_s$ is the bias term of the support vector machine model, which adjusts the position of the classification hyperplane;
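To make the three base-learner formulas concrete, the sketch below evaluates them for a given feature vector. All parameters (weights, tree regions, support vectors, multipliers, kernel width) are hypothetical inputs that would come from training; representing each tree as a list of axis-aligned leaf boxes is an assumption chosen for readability, and each tree contributes the weight of the one leaf the vector falls in, matching the indicator-sum form of the formula.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Box = List[Tuple[float, float]]  # one (low, high] interval per feature
Tree = List[Tuple[Box, float]]   # leaves: (region R, output weight gamma)


def lr_confidence(x: Vector, w: Vector, b: float) -> float:
    """Logistic regression output: sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gbdt_confidence(x: Vector, trees: List[Tree]) -> float:
    """Sum of gamma over the leaf region each tree assigns x to (indicator = 1)."""
    total = 0.0
    for leaves in trees:
        for box, gamma in leaves:
            if all(lo < xi <= hi for xi, (lo, hi) in zip(x, box)):
                total += gamma
                break  # leaves of one tree do not overlap
    return total


def svm_label(x: Vector, svs: List[Vector], lambdas: Vector, labels: Sequence[int],
              b_s: float, delta: float) -> int:
    """sign(sum lambda_i * y_i * K(x_i, x) + b_s) with an RBF kernel of width delta."""
    s = b_s
    for sv, lam, y in zip(svs, lambdas, labels):
        dist2 = sum((a - c) ** 2 for a, c in zip(x, sv))
        s += lam * y * math.exp(-dist2 / (2.0 * delta ** 2))
    return 1 if s > 0 else -1  # a score of exactly 0 is treated as "not trusted" here
```

In practice these parameters would be fitted by a library (scikit-learn's `LogisticRegression`, `GradientBoostingClassifier`, and `SVC` are common choices); the functions above only show how a trained model maps the five features to the three outputs.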
[0041] A ridge regression model is used as the meta-learner layer. The input to the meta-learner layer is the probability vector output by the base learner layer, and the output of the meta-learner layer is the final fused confidence score. Its mathematical expression is as follows:
[0042] $$C=\sigma(\mathbf{w}_m^{\mathsf{T}}\mathbf{p}+b_m);$$
[0043] In the formula, $C$ is the final fused confidence score; $\sigma(\cdot)$ is the Sigmoid activation function; $\mathbf{p}$ is the probability vector output by the base learner layer; $\mathbf{w}_m$ and $b_m$ are the weight parameters of the meta-learner layer, obtained by training with five-fold cross-validation.
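A possible sketch of the meta-learner: ridge regression fitted in closed form on rows of base-learner outputs, followed by the Sigmoid mapping to a fused confidence. Assumptions, none of which the source specifies: the training rows would be out-of-fold base-learner predictions from the five-fold split, the targets are 0/1 labels, the intercept is left unpenalised by centring, and the regularisation strength `lam` is a hypothetical parameter.

```python
import math
from typing import List, Sequence, Tuple


def fit_ridge(P: List[Sequence[float]], y: Sequence[float],
              lam: float = 1.0) -> Tuple[List[float], float]:
    """Ridge regression with an unpenalised intercept (centre the data, then solve)."""
    n, k = len(P), len(P[0])
    mean_p = [sum(row[c] for row in P) / n for c in range(k)]
    mean_y = sum(y) / n
    Pc = [[row[c] - mean_p[c] for c in range(k)] for row in P]
    yc = [t - mean_y for t in y]
    # Augmented system [(Pc'Pc + lam*I) | Pc'yc], solved by Gauss-Jordan elimination.
    A = [[sum(r[a] * r[b] for r in Pc) + (lam if a == b else 0.0) for b in range(k)]
         + [sum(r[a] * t for r, t in zip(Pc, yc))] for a in range(k)]
    for c in range(k):
        pivot = A[c][c]  # positive, because Pc'Pc + lam*I is positive definite
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    w = [A[r][k] for r in range(k)]
    b = mean_y - sum(wi * mi for wi, mi in zip(w, mean_p))
    return w, b


def fused_confidence(p: Sequence[float], w: Sequence[float], b: float) -> float:
    """Final fused confidence C = sigmoid(w . p + b)."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * pi for wi, pi in zip(w, p)) + b)))


# Hypothetical out-of-fold base-learner outputs [p_LR, p_GBDT, p_SVM] and labels.
P = [[0.9, 0.8, 1.0], [0.2, 0.3, -1.0], [0.7, 0.9, 1.0], [0.1, 0.2, -1.0]]
w, b = fit_ridge(P, [1, 0, 1, 0], lam=0.1)
print(round(fused_confidence([0.8, 0.85, 1.0], w, b), 3))
```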
[0044] Compared with the prior art, the present invention has the following beneficial effects:
[0045] This invention integrates data from the marketing, dispatching, and production systems and uses operational data to invert and verify the static ledgers, building a dual-drive "data-driven + model-driven" mechanism that addresses the ledger consistency problem of integrated marketing, distribution, and dispatching at its source. It introduces dynamic time warping, Granger causality testing, and Stacking ensemble confidence assessment, each with a specific mathematical implementation, which overcomes the limitations of traditional single-correlation analysis in nonlinear and time-delay scenarios and significantly improves the accuracy of topology identification. It establishes a closed "perception-analysis-decision-execution" loop that not only detects errors but also corrects them automatically at high confidence, keeping the digital twin model of the distribution network synchronized in real time with the physical grid and providing a solid data foundation for advanced applications such as line-loss calculation and fault self-healing. It achieves automatic identification and maintenance of topological relationships, significantly reducing the workload of manual on-site verification and raising the intelligence level of distribution network operation and maintenance.
Attached Figure Description
[0046] Figure 1 is the overall architecture and flowchart of the automatic identification and verification method for power distribution network topology that integrates multi-source data according to the present invention;
[0047] Figure 2 is a schematic diagram of the principle of household-transformer relationship identification based on voltage time-series similarity in this invention;
[0048] Figure 3 is a schematic diagram of the principle of transformer-line relationship verification based on power time-series causality in this invention;
[0049] Figure 4 is a block diagram of the closed-loop correction logic driven by topology conflict detection and confidence level in this invention.
Detailed Implementation
[0050] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Unless otherwise specified, the technical means used in the embodiments are conventional means well known to those skilled in the art.
[0051] This invention discloses an automatic identification and verification method for distribution network topology that integrates multi-source data. The main inventive idea is to construct a three-layer fusion architecture of "data layer - feature layer - decision layer." By collecting user electricity data from the marketing system, power flow data from the dispatching system, and equipment ledger data from the production system, and utilizing time-series feature extraction, multi-dimensional correlation analysis, and Granger causality testing, a data-driven topology identification model is constructed to automatically invert the physical topology relationship of "transformer-line-customer." An ensemble learning algorithm is used to evaluate the confidence level of the identification results and to detect conflicts with static ledgers, ultimately achieving intelligent alarms for topology anomalies and automatic correction under high confidence. This invention transforms the traditional "ledger-driven" model into a dual-drive model of "data-driven + model-driven," realizing dynamic closed-loop management of distribution network topology relationships and precise synchronization of the digital twin model, significantly improving the accuracy and intelligence level of the integrated operation, distribution, and dispatching system.
[0052] As shown in Figure 1, this invention discloses an automatic identification and verification method for the topology of a power distribution network that integrates multi-source data, comprising the following steps:
[0053] S1. Collect user electricity data from the marketing system, power flow data from the dispatching system, and equipment ledger data from the production system. Perform time-series alignment, outlier removal, and missing value imputation on the collected multi-source heterogeneous data to construct a standardized multi-source time-series dataset. The user electricity data from the marketing system includes high-frequency voltage, current, and electricity time series from user smart meters. The power flow data from the dispatching system includes high-frequency voltage, active power, and reactive power time series from feeder heads and distribution transformers. The equipment ledger data from the production system includes equipment parameters, static topology connections, and geographic information from the production management system / GIS system.
[0054] S2. Identify the actual household-transformer relationship based on the similarity between each user's voltage and the low-voltage side voltage of the transformer (i.e., household-transformer association identification based on voltage time-series similarity), and then identify the actual transformer-line relationship based on the causal relationship between transformer load power and feeder head-end power (i.e., transformer-line association identification based on power time-series causality). A data-driven topology identification model is thus constructed to automatically invert the physical transformer-line-user topology.
[0055] S3. Compare the identified actual topological relationships with the static ledger of the production system to generate a topological difference map, and evaluate the confidence of the identification results through an ensemble learning algorithm.
[0056] S4. Output topology anomaly alarms or automatic correction instructions according to confidence level, update the ledger and synchronize the digital twin model.
[0057] In a further implementation, the household-transformer association identification in step S2 includes:
[0058] User voltage data and transformer low-voltage side voltage data are acquired. Because the voltage curves of a user and a transformer may be time-shifted and of unequal length, dynamic time warping (DTW) is used to compute the cumulative distance matrix $D$ between the user voltage sequence $U=\{u_1,\dots,u_N\}$ and the transformer low-voltage side voltage sequence $V=\{v_1,\dots,v_M\}$. DTW aligns the two curves, finds the best-matching path, and computes the overall distance: the smaller the distance, the more similar the voltage fluctuations. DTW seeks the optimal warping path that minimizes the cumulative distance, and its dynamic programming recurrence is:
[0059] $$D(i,j)=d(u_i,v_j)+\min\{D(i-1,j),\,D(i,j-1),\,D(i-1,j-1)\},\quad d(u_i,v_j)=\lVert u_i-v_j\rVert,\quad i=1,\dots,N,\; j=1,\dots,M;$$
[0060] In the formula, $D(i,j)$ is the element in row $i$, column $j$ of the cumulative distance matrix, i.e. the cumulative distance of the shortest warping path from the start of the user sequence to its $i$-th point and from the start of the transformer sequence to its $j$-th point; $d(\cdot,\cdot)$ is the Euclidean distance; $u_i$ is the $i$-th voltage sample of the user and $v_j$ is the $j$-th voltage sample on the low-voltage side of the transformer; $N$ and $M$ are the lengths of the user and transformer voltage sequences, respectively; $\min\{\cdot\}$ selects the smallest of the three preceding cells, so that each step follows the most similar path and the final alignment has the highest overall similarity.
[0061] On the DTW-aligned sequences, the Pearson correlation coefficient of the two aligned voltage curves is calculated to eliminate the influence of amplitude differences. The mathematical expression is as follows:
[0062] $$r=\frac{\sum_{k=1}^{K}(x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_{k=1}^{K}(x_k-\bar{x})^{2}}\,\sqrt{\sum_{k=1}^{K}(y_k-\bar{y})^{2}}};$$
[0063] In the formula, $r$ is the Pearson correlation coefficient; $x_k$ and $y_k$ are the user voltage and the transformer low-voltage side voltage at the $k$-th sampling point after DTW alignment; $\bar{x}$ and $\bar{y}$ are the means of the aligned user voltage sequence and transformer low-voltage side voltage sequence, respectively; $K$ is the total number of sampling points.
[0064] When the Pearson correlation coefficient is higher than the first threshold (e.g., 0.85), a strong household-transformer topological association is determined; the closer the coefficient is to 1, the more synchronized the voltage changes.
[0065] In a further implementation, the core logic of the transformer-feeder association identification in step S2 is that changes in a transformer's power cause changes in the power of the feeder it is connected to. Two models are therefore constructed: one predicts feeder power from the feeder's own history only, and the other additionally incorporates the transformer power. The specific process includes:
[0066] Obtain transformer load power data and feeder head-end power data, and perform the ADF stationarity test on the transformer load power sequence $\{P^{T}_{t}\}$ and the feeder head-end power sequence $\{P^{F}_{t}\}$ to confirm that the sequences are stationary, excluding abrupt changes and anomalous data so that the subsequent causality test is valid.
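The stationarity screen can be illustrated with the simplest (non-augmented) Dickey-Fuller regression, which is the ADF test with zero lagged-difference terms. This sketch is a simplification rather than the patent's procedure: it compares the t statistic of the lagged-level coefficient with the asymptotic 5% critical value of about -2.86 for a regression with a constant. A full ADF test with lag selection and p-values is available as `statsmodels.tsa.stattools.adfuller`.

```python
import math
import random
from typing import Sequence

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, regression with a constant


def df_tstat(y: Sequence[float]) -> float:
    """t statistic of gamma in  dy_t = a + gamma * y_{t-1} + e_t  (no augmentation lags)."""
    x = list(y[:-1])                                 # lagged levels y_{t-1}
    d = [y[t] - y[t - 1] for t in range(1, len(y))]  # first differences dy_t
    n = len(d)
    mx, md = sum(x) / n, sum(d) / n
    sxx = sum((v - mx) ** 2 for v in x)
    gamma = sum((v - mx) * (w - md) for v, w in zip(x, d)) / sxx
    a = md - gamma * mx
    rss = sum((w - a - gamma * v) ** 2 for v, w in zip(x, d))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


def looks_stationary(y: Sequence[float]) -> bool:
    """Reject the unit-root null when the statistic is below the critical value."""
    return df_tstat(y) < DF_CRITICAL_5PCT


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
print(looks_stationary(noise))  # white noise has no unit root
```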
[0067] Use the Granger causality test to determine whether transformer power fluctuations Granger-cause feeder power fluctuations, and construct the unconstrained regression model $M_U$ and the constrained regression model $M_R$ accordingly:
[0068] $$M_U:\; P^{F}_{t}=\alpha_0+\sum_{i=1}^{p}\alpha_i P^{F}_{t-i}+\sum_{j=1}^{q}\beta_j P^{T}_{t-j}+\varepsilon_t;$$
[0069] $$M_R:\; P^{F}_{t}=\alpha_0+\sum_{i=1}^{p}\alpha_i P^{F}_{t-i}+\varepsilon_t;$$
[0070] In the formula, $P^{F}_{t}$ is the feeder head-end power at sampling time $t$; $\alpha_0$ is the constant (intercept) term of the regression model, representing the baseline level of the feeder power sequence; $\alpha_i$ is the regression coefficient of the $i$-th feeder power lag term, reflecting the influence of historical feeder power on current feeder power; $P^{F}_{t-i}$ is the feeder active power at sampling time $t-i$ (historical feeder power data); $\beta_j$ is the regression coefficient of the $j$-th transformer power lag term, reflecting the influence of historical transformer power on current feeder power; $P^{T}_{t-j}$ is the transformer active power at sampling time $t-j$ (from distribution transformer operating data); $\varepsilon_t$ is the residual term of the regression model, representing random error; $p$ is the lag order of the feeder power sequence, i.e. the current value is predicted from the feeder power at the previous $p$ moments; $q$ is the lag order of the transformer power sequence, i.e. the transformer power at the previous $q$ moments is used to predict the current feeder power.
[0071] The joint significance of the transformer lag coefficients $\beta_1,\dots,\beta_q$ is tested with the $F$ statistic:
[0072] $$F=\frac{(RSS_R-RSS_U)/q}{RSS_U/(n-p-q-1)};$$
[0073] In the formula, $RSS_R$ and $RSS_U$ are the residual sums of squares of the constrained model and the unconstrained model, respectively; $n$ is the sample size.
[0074] If the p-value corresponding to the $F$ statistic is less than the significance level (e.g., 0.05), transformer power fluctuations Granger-cause feeder power fluctuations, which confirms the topological attribution.
[0075] In a further implementation, in step S3, a confidence evaluation model based on the Stacking ensemble learning framework is constructed, and the specific network structure and mathematical description are as follows:
[0076] The model comprises three heterogeneous base learners, Logistic Regression (LR), Gradient Boosting Decision Tree (GBDT), and Support Vector Machine (SVM), which together form the base learner layer. The input to the base learner layer is the feature vector $\mathbf{x}=[\bar{r},\,\sigma_r^{2},\,g,\,s,\,h]^{\mathsf{T}}$, where $\bar{r}$ is the mean voltage similarity, $\sigma_r^{2}$ the similarity variance, $g$ the Granger causality value, $s$ the time-series stability index, and $h$ the historical identification consistency.
[0077] In some implementations, the mean voltage similarity $\bar{r}$ is obtained by DTW-aligning the user voltage sequence with the transformer low-voltage side voltage sequence, calculating the Pearson correlation coefficient, and averaging the results over multiple days of sampling. The similarity variance $\sigma_r^{2}$ is the variance of the Pearson correlation coefficient over multiple time periods, reflecting the stability of the similarity. The time-series stability index $s$ is obtained from the ADF test statistic and the autocorrelation coefficient (the strength of correlation between values of the same time series at different time points), reflecting the stationarity of the voltage and power series. The historical identification consistency $h$ is obtained by comparing the identified topological relationships with the static ledger of the production system.
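The feature construction described above can be sketched as follows. The Granger value and the stability index are passed in precomputed, and the consistency measure (the share of past identification runs that matched the ledger) is a hypothetical stand-in, because the source only says that the result of the ledger comparison is output.

```python
import statistics
from typing import List, Sequence


def build_features(daily_r: Sequence[float], granger_value: float,
                   stability_index: float, matched_ledger: Sequence[bool]) -> List[float]:
    """Return [mean similarity, similarity variance, Granger value, stability, consistency]."""
    mean_r = statistics.fmean(daily_r)     # average of the per-day Pearson r values
    var_r = statistics.pvariance(daily_r)  # spread of r across periods
    # Hypothetical consistency measure: share of past runs that agreed with the ledger.
    consistency = sum(matched_ledger) / len(matched_ledger)
    return [mean_r, var_r, granger_value, stability_index, consistency]


print(build_features([0.90, 0.92, 0.94], 12.5, -4.1, [True, True, False, True]))
```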
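[0078] The output of the base learner layer is the probability vector $\mathbf{p}=[p_{LR},\,p_{GBDT},\,p_{SVM}]^{\mathsf{T}}$:
[0079] $$p_{LR}=\sigma(\mathbf{w}^{\mathsf{T}}\mathbf{x}+b),\quad \sigma(z)=\frac{1}{1+e^{-z}};$$
[0080] $$p_{GBDT}=\sum_{m=1}^{L}\gamma_m\,\mathbb{I}(\mathbf{x}\in R_m);$$
[0081] $$p_{SVM}=\operatorname{sign}\Big(\sum_{i=1}^{n_s}\lambda_i y_i K(\mathbf{x}_i,\mathbf{x})+b_s\Big),\quad K(\mathbf{x}_i,\mathbf{x})=\exp\Big(-\frac{\lVert\mathbf{x}-\mathbf{x}_i\rVert^{2}}{2\delta^{2}}\Big);$$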
[0082] In the formula, $p_{LR}$ is the confidence of the topological relationship output by logistic regression; $\mathbf{w}^{\mathsf{T}}$ is the transpose of the weight vector $\mathbf{w}$ of the logistic regression model, whose dimension matches that of the feature vector $\mathbf{x}$ and whose components are the importance weights of the individual topological features; $b$ is the bias (intercept) term of the logistic regression model; $\sigma(\cdot)$ is the Sigmoid activation function, which maps the linear score to the interval $(0,1)$ and outputs the confidence in probability form.
[0083] $p_{GBDT}$ is the confidence of the topological relationship output by the gradient boosting decision tree; $R_m$ is the feature-space region partitioned by the $m$-th tree; $L$ is the total number of decision trees in the gradient boosting decision tree model; $\gamma_m$ is the output weight of the $m$-th decision tree, representing that tree's contribution to the final result; $\mathbb{I}(\cdot)$ is the indicator function, equal to 1 when the condition holds and 0 otherwise, i.e. it indicates whether the feature vector $\mathbf{x}$ falls in the region of the $m$-th tree.
[0084] $p_{SVM}$ is the classification result output by the support vector machine; $\operatorname{sign}(\cdot)$ is the sign function, which outputs +1 (trusted) when its argument is greater than 0 and -1 (not trusted) when it is less than 0; $K(\cdot,\cdot)$ is the kernel function, which maps feature vectors into a high-dimensional space to achieve nonlinear classification, and this invention uses the radial basis function (RBF) kernel, where $\delta$ is the kernel width parameter controlling the complexity of the mapping space; $\lambda_i$ is the Lagrange multiplier of the $i$-th support vector, representing the importance of that sample; $y_i$ is the label of the $i$-th training sample, where +1 indicates that the topological relationship is correct and -1 that it is incorrect; $\mathbf{x}_i$ is the feature vector of the $i$-th support vector; $n_s$ is the number of support vectors in the training samples; $b_s$ is the bias term of the support vector machine model, which adjusts the position of the classification hyperplane.
[0085] A ridge regression model is used as the meta-learner layer. The input to the meta-learner layer is the probability vector output by the base learner layer, and the output of the meta-learner layer is the final fused confidence score. Its mathematical expression is as follows:
[0086] $$C=\sigma(\mathbf{w}_m^{\mathsf{T}}\mathbf{p}+b_m);$$
[0087] In the formula, $C$ is the final fused confidence score; $\sigma(\cdot)$ is the Sigmoid activation function, which maps the linear output to the interval $(0,1)$ and converts the fused model result into a confidence in probability form; $\mathbf{p}$ is the probability vector output by the base learner layer; $\mathbf{w}_m$ and $b_m$ are the weight vector and intercept of the meta-learner layer, obtained through five-fold cross-validation training.
[0088] In a further implementation, in step S4, when the fused confidence score is below the alarm threshold but above the error threshold, the system outputs a topology anomaly alarm and pushes it to maintenance personnel.
[0089] When the fused confidence score exceeds the preset high-confidence threshold (e.g., 0.95) for multiple consecutive data collection periods (e.g., 7 days), the system automatically generates a correction instruction, updates the equipment connection relationships in the production system through a standard interface (e.g., WebService or a message bus), drives the synchronous update of the distribution network digital twin model, and thereby achieves closed-loop correction of the topology.
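The step S4 decision rule can be sketched as a small function over the history of fused confidence scores. The 0.95 threshold and the 7-period window are the example values from the text; the alarm and error thresholds are not given in the source, so the values below are hypothetical placeholders, and scores that fall in neither band simply produce no action in this sketch.

```python
from enum import Enum
from typing import Sequence

HIGH_CONFIDENCE = 0.95   # example value from the description
CONSECUTIVE_PERIODS = 7  # example: 7 consecutive daily collection periods
ALARM_THRESHOLD = 0.80   # hypothetical: not specified in the source
ERROR_THRESHOLD = 0.30   # hypothetical: not specified in the source


class Action(Enum):
    AUTO_CORRECT = "auto_correct"  # generate correction command, update ledger and twin
    ALARM = "alarm"                # push an anomaly alarm to maintenance staff
    NONE = "none"                  # keep monitoring


def decide(confidences: Sequence[float]) -> Action:
    """Map the fused-confidence history (oldest first) to an S4 action."""
    recent = confidences[-CONSECUTIVE_PERIODS:]
    if len(recent) == CONSECUTIVE_PERIODS and all(c > HIGH_CONFIDENCE for c in recent):
        return Action.AUTO_CORRECT
    if ERROR_THRESHOLD < confidences[-1] < ALARM_THRESHOLD:
        return Action.ALARM
    return Action.NONE


print(decide([0.96] * 7))  # Action.AUTO_CORRECT
```

Requiring the whole window to stay above the threshold, rather than a single high score, is what keeps an occasional noisy day from triggering an automatic ledger change.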
[0090] Example 1
[0091] This embodiment takes a core power supply area of a city as an example to illustrate the specific implementation process of the present invention in detail.
[0092] Step 1: Data Layer Construction
[0093] A data acquisition front-end unit was deployed in the area to synchronize data from the marketing system (voltage curves of 1,200 low-voltage users at 15-minute granularity), the dispatching system (15-minute voltage and active power curves at the head ends of eight 10 kV feeders and at 45 distribution transformers), and the production system (the current static topology diagram of the "station-line-transformer-user" relationships). ETL tools were used to clean the data by removing outliers caused by communication interruptions, and missing points were filled by cubic spline interpolation. All data were then aligned to a common time axis to construct the multi-source time-series dataset.
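The cleaning steps of this data layer (outlier removal, cubic spline filling, time alignment) might look like the following NumPy-only sketch. It hand-rolls a natural cubic spline (the text does not state the spline boundary conditions, so "natural" is an assumption), and the plausibility bounds used to flag dropout readings are hypothetical per-series parameters; an actual ETL tool would differ.

```python
import numpy as np

def natural_cubic_spline(x, y, x_new):
    """Evaluate a natural cubic spline through (x, y) at the points x_new."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n, h = len(x), np.diff(x)
    # Tridiagonal system for the second derivatives M; natural ends M[0] = M[-1] = 0.
    A, rhs = np.zeros((n, n)), np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = [h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    x_new = np.asarray(x_new, dtype=float)
    k = np.clip(np.searchsorted(x, x_new) - 1, 0, n - 2)   # interval of each point
    hk = x[k + 1] - x[k]
    a, b = (x[k + 1] - x_new) / hk, (x_new - x[k]) / hk
    return (a * y[k] + b * y[k + 1]
            + ((a**3 - a) * M[k] + (b**3 - b) * M[k + 1]) * hk**2 / 6.0)

def clean_series(t, v, lower, upper):
    """Treat non-finite or out-of-range samples (e.g. 0 V during a communication
    dropout) as missing and refill them by cubic spline interpolation."""
    t, v = np.asarray(t, dtype=float), np.asarray(v, dtype=float).copy()
    bad = ~np.isfinite(v) | (v < lower) | (v > upper)
    v[bad] = natural_cubic_spline(t[~bad], v[~bad], t[bad])
    return v

def align(t, v, grid):
    """Resample a cleaned series onto the common time axis `grid` (inside t's range)."""
    return natural_cubic_spline(t, v, grid)

t = np.arange(0, 150, 15.0)        # minutes, 15-minute sampling
v = 220.0 + 0.1 * t
v[4], v[7] = 0.0, np.nan           # simulated dropouts
print(clean_series(t, v, lower=150.0, upper=260.0)[[4, 7]])
```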
[0094] Step 2: Feature Layer Topology Inversion
[0095] 2.1 User-Transformer Relationship Identification
[0096] As shown in Figure 2, user A, who is attached to transformer T08 in the ledger, is selected. The voltage sequences of user A and of all transformers (T01-T45) in the area are extracted, and dynamic time warping (DTW) is applied to eliminate possible time offsets. The calculation proceeds as follows:
[0097] Assume that user A's voltage sequence has length $N = 96$ (one sample every 15 minutes over one day) and that the transformer sequence has length $M = 96$. The cumulative distance matrix is constructed with the recurrence $D(i,j) = d(u_i, t_j) + \min\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\}$, and the Pearson correlation coefficient $r$ is then calculated on the DTW-aligned sequences.
[0098] The results show that the voltage-sequence correlation coefficient between user A and transformer T05 is $r_{A,\mathrm{T05}} = \ldots$, whereas the correlation coefficient with transformer T08, recorded in the ledger, is $r_{A,\mathrm{T08}} = \ldots$. The system therefore preliminarily determines that user A is actually connected to transformer T05 and that the original ledger record is in error.
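A compact sketch of the user-transformer identification in 2.1, using the DTW recurrence and the Pearson coefficient. Two choices here are assumptions not stated in the text: each sequence is z-scored before DTW, so that a constant voltage offset between user and transformer does not distort the warping path, and the local cost is the absolute difference (the Euclidean distance between scalar samples). The function names and the threshold argument are hypothetical.

```python
import numpy as np

def zscore(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def dtw_align(u, t):
    """DTW with D(i,j) = d(u_i,t_j) + min(D(i-1,j), D(i,j-1), D(i-1,j-1));
    returns both sequences re-indexed along the optimal warping path."""
    n, m = len(u), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(u[i - 1] - t[j - 1])          # Euclidean distance of scalars
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    i, j, path = n, m, []
    while i > 0 and j > 0:                            # backtrack from (N, M)
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    iu, it = map(list, zip(*reversed(path)))
    return np.asarray(u)[iu], np.asarray(t)[it]

def dtw_pearson(user_v, trafo_v):
    """Pearson r of the DTW-aligned, z-scored sequences."""
    a, b = dtw_align(zscore(user_v), zscore(trafo_v))
    return float(np.corrcoef(a, b)[0, 1])

def identify_transformer(user_v, trafo_vs, first_threshold):
    """Return (best-matching transformer or None, all scores)."""
    scores = {name: dtw_pearson(user_v, v) for name, v in trafo_vs.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > first_threshold else None), scores
```

The double loop is O(NM), which is modest for 96-point daily sequences; the candidate with the highest correlation is accepted only if it also clears the first threshold.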
[0099] 2.2 Transformer-Line Relationship Identification
[0100] As shown in Figure 3, the active power sequence $P_{\mathrm{T05}}$ of transformer T05 and the head-end active power sequences of the eight feeders in the area are extracted. An ADF stationarity test is performed on each sequence; once all ADF $p$-values are below 0.05, a Granger causality test is performed, with the lag order set to 2 according to the AIC criterion.
[0101] The $F$ statistic is calculated according to $F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n - m - q - 1)}$, where $RSS_R$ and $RSS_U$ are the residual sums of squares of the constrained and unconstrained regression models, $n$ is the sample size, and $m$ and $q$ are the lag orders of the feeder and transformer power sequences.
[0102] The test shows that for $P_{\mathrm{T05}}$ with respect to the head-end power of feeder L2, the $F$ statistic is 7.83, with a corresponding Granger causality $p$-value of … (much less than 0.05); whereas for feeder L1, to which T05 belongs according to the ledger, the $F$ statistic is 0.81, with a $p$-value of … (no significant causality). The system therefore determines that transformer T05 is actually fed by feeder L2, not L1 as recorded in the ledger.
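The Granger $F$ test of 2.2 can be computed with two ordinary least-squares fits of the feeder power, one without and one with the transformer lag terms. This NumPy sketch returns only the $F$ statistic; the lag orders m = q = 2 mirror the AIC choice in the example, and the function names are hypothetical.

```python
import numpy as np

def lag_matrix(x, lags, start):
    """Columns x[t-1], ..., x[t-lags] for t = start, ..., len(x) - 1."""
    return np.column_stack([x[start - k:len(x) - k] for k in range(1, lags + 1)])

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f(feeder, trafo, m=2, q=2):
    """F statistic for H0: beta_1 = ... = beta_q = 0, i.e. transformer power
    does not Granger-cause feeder head-end power. m, q: feeder / transformer lags."""
    P_L = np.asarray(feeder, dtype=float)
    P_T = np.asarray(trafo, dtype=float)
    start = max(m, q)
    y = P_L[start:]
    n = len(y)                                                      # effective sample size
    X_r = np.column_stack([np.ones(n), lag_matrix(P_L, m, start)])  # constrained model
    X_u = np.column_stack([X_r, lag_matrix(P_T, q, start)])         # unconstrained model
    rss_r, rss_u = rss(X_r, y), rss(X_u, y)
    return ((rss_r - rss_u) / q) / (rss_u / (n - m - q - 1))
```

The corresponding $p$-value is the upper tail of the $F(q, n - m - q - 1)$ distribution with `n = len(feeder) - max(m, q)`, for example `scipy.stats.f.sf(F, q, n - m - q - 1)`; the preceding ADF check is available as `statsmodels.tsa.stattools.adfuller(series)`, whose second return value is the $p$-value.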
[0103] Step 3: Decision-Layer Confidence Assessment and Conflict Detection
[0104] As shown in Figure 4, the feature vector $\mathbf{x} = (\ldots)$ is extracted, whose components are the mean and variance of the voltage similarity, the Granger causality $p$-value, the time-series stability index, and the historical identification consistency; it is input into the trained Stacking ensemble model.
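[0105] Logistic regression (LR) base learner output: $p_{\mathrm{LR}} = \ldots$;
[0106] Gradient boosting decision tree (GBDT) base learner output: $p_{\mathrm{GBDT}} = \ldots$;
[0107] Support vector machine (SVM) base learner output: $y_{\mathrm{SVM}} = \ldots$;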
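[0108] The Ridge meta-learner computes the linear combination $\mathbf{w}_{\mathrm{M}}^{\top}\mathbf{p} + b_{\mathrm{M}}$ with its trained weights $\mathbf{w}_{\mathrm{M}}$ and maps it through the Sigmoid function:
[0109] $C = \mathrm{Sigmoid}\left(\mathbf{w}_{\mathrm{M}}^{\top}\mathbf{p} + b_{\mathrm{M}}\right) = \ldots$;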
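How the first two entries of the probability vector $\mathbf{p}$ could be produced is sketched below, following the logistic regression and GBDT formulas of claims 5 and 6: LR applies the Sigmoid to a linear score, and GBDT applies it to the sum of the output weights of the regions the feature vector falls into. Each tree is modelled as a hypothetical callable returning its output weight; the weights, trees and feature values are illustrative and are not the trained model of this embodiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_confidence(x, w_lr, b_lr):
    """p_LR = Sigmoid(w_LR^T x + b_LR)."""
    return float(sigmoid(np.dot(w_lr, x) + b_lr))

def gbdt_confidence(x, trees):
    """p_GBDT = 1 / (1 + exp(-sum_m gamma_m)), where each tree returns the
    output weight gamma_m of the region R_m that x falls into."""
    return float(sigmoid(sum(tree(x) for tree in trees)))

# Hypothetical features: (mean sim, sim variance, Granger p, stability, consistency)
x = [0.93, 0.01, 0.001, 0.90, 0.95]
w_lr, b_lr = np.array([3.0, -5.0, -4.0, 1.0, 1.0]), -2.0   # hypothetical LR weights
trees = [lambda f: 0.8 if f[0] > 0.8 else -0.8,            # hypothetical depth-1 trees
         lambda f: 0.5 if f[2] < 0.05 else -0.5]

p = [lr_confidence(x, w_lr, b_lr), gbdt_confidence(x, trees)]
print([round(v, 3) for v in p])   # [0.93, 0.786]
```

The third entry, $y_{\mathrm{SVM}}$, is the +1/-1 output of the SVM base learner; together the three values form the input of the Ridge meta-learner.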
[0110] The system compares the identification results with the static ledger in the production system and generates a conflict report.
[0111] An example topology conflict alarm reads:
[0112] 1. User A (ID:10086) is linked to T08 in the ledger, but is actually linked to T05, with a confidence level of 0.93;
[0113] 2. Transformer T05 (ID:50012) is linked to L1 in the ledger, but is actually linked to L2, with a confidence level of 0.89.
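The comparison with the static ledger reduces to a dictionary difference. In this sketch, both the identification result and the ledger are hypothetical `child ID -> parent ID` maps, and the third user (10090) is an invented record whose ledger entry is already correct, so it produces no alarm.

```python
def conflict_report(identified, ledger, confidence):
    """List relations whose identified parent differs from the ledger.

    identified, ledger: dict child ID -> parent ID (user -> transformer or
    transformer -> feeder); confidence: dict child ID -> fused confidence C.
    """
    return [
        {"device": dev, "ledger": ledger.get(dev), "identified": parent,
         "confidence": confidence[dev]}
        for dev, parent in identified.items()
        if ledger.get(dev) != parent
    ]

def format_alarm(c):
    return (f"{c['device']} is linked to {c['ledger']} in the ledger, but is actually "
            f"linked to {c['identified']}, with a confidence level of {c['confidence']:.2f}")

identified = {"10086": "T05", "50012": "L2", "10090": "T08"}
ledger     = {"10086": "T08", "50012": "L1", "10090": "T08"}
confidence = {"10086": 0.93, "50012": 0.89, "10090": 0.97}
for k, c in enumerate(conflict_report(identified, ledger, confidence), start=1):
    print(f"{k}. {format_alarm(c)}")
```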
[0114] Step 4: Closed-Loop Execution
[0115] According to the preset strategy (automatic correction is triggered when the fused confidence remains above the high-confidence threshold for 7 consecutive days), the system continues to monitor the conflicts.
[0116] Over 7 consecutive collection periods (7 days), the similarity between user A and T05 remained stable between 0.92 and 0.95, the causal strength between T05 and L2 remained stable between 0.85 and 0.90, and the fused confidence output by the Stacking model stayed above 0.9.
[0117] The system determines that the automatic correction conditions are met and automatically generates a correction command:
[0118] Call the production system (GIS) through the WebService interface to change the attachment of user A from T08 to T05;
[0119] Change the connection of transformer T05 from L1 to L2;
[0120] Record a correction log, including the reason for the correction, confidence score, and timestamp.
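The correction step can be sketched as turning persistent conflicts into change commands and log records. The command and log dictionaries are hypothetical layouts (the GIS WebService payload is not described), the 0.9 threshold and 7 periods follow this example, and actually sending the commands is left out.

```python
from datetime import datetime, timezone

def build_corrections(conflicts, history, threshold=0.9, periods=7, now=None):
    """Turn persistent conflicts into correction commands plus log records.

    conflicts: dicts with device / ledger / identified / confidence keys.
    history: device ID -> fused confidence per collection period (oldest first).
    """
    now = now or datetime.now(timezone.utc)
    commands, log = [], []
    for c in conflicts:
        recent = history.get(c["device"], [])[-periods:]
        if len(recent) < periods or min(recent) <= threshold:
            continue                      # keep monitoring; no correction yet
        commands.append({"device": c["device"], "from": c["ledger"],
                         "to": c["identified"]})
        log.append({                      # reason, confidence score, timestamp
            "reason": f"identified parent {c['identified']} differs from ledger {c['ledger']}",
            "confidence": c["confidence"],
            "timestamp": now.isoformat(),
        })
    return commands, log
```

After the commands are applied, the same log records can accompany the refresh of the digital twin model so that every topology change remains traceable.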
[0121] After the correction is completed, the system triggers a synchronous refresh of the distribution network digital twin model, updating the basic topology parameters of the line loss calculation model.
[0122] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Any modifications, alterations, substitutions, or variations made by those skilled in the art to the technical solutions of the present invention without departing from the spirit of the present invention shall fall within the protection scope defined by the claims of the present invention.
Claims
1. A method for automatic identification and verification of power distribution network topology integrating multi-source data, characterized in that it comprises the following steps: S1. Collect user electricity consumption data from the marketing system, power flow data from the dispatching system, and equipment ledger data from the production system; perform time-series alignment, outlier removal, and missing-value imputation on the collected multi-source heterogeneous data to construct a standardized multi-source time-series dataset. S2. Identify the actual user-transformer relationship based on the similarity between the user voltage and the transformer low-voltage-side voltage, and then identify the actual transformer-line relationship based on the causal relationship between the transformer load power and the feeder head-end power, thereby constructing a data-driven topology identification model that automatically inverts the physical transformer-line-user topology. S3. Compare the identified actual topological relationships with the static ledger of the production system to generate a topology difference map, and evaluate the confidence of the identification results with an ensemble learning algorithm. S4. According to the confidence level, output a topology anomaly alarm or an automatic correction instruction, update the ledger, and synchronize the digital twin model.
2. The method for automatic identification and verification of power distribution network topology by integrating multi-source data according to claim 1, characterized in that, in step S2, the user-transformer association identification comprises: obtaining user voltage data and transformer low-voltage-side voltage data, and using dynamic time warping to calculate the cumulative distance matrix $D$ between the user voltage sequence $U = (u_1, \dots, u_N)$ and the transformer low-voltage-side voltage sequence $T = (t_1, \dots, t_M)$, with the dynamic programming recurrence $D(i,j) = d(u_i, t_j) + \min\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\}$; where $D(i,j)$ is the element in row $i$, column $j$ of the cumulative distance matrix; $d(u_i, t_j)$ is the Euclidean distance between $u_i$ and $t_j$; $u_i$ is the $i$-th voltage sample of the user and $t_j$ is the $j$-th low-voltage-side voltage sample of the transformer, with $i = 1, \dots, N$ and $j = 1, \dots, M$; $N$ and $M$ are the lengths of the user and transformer voltage sequences; and $\min\{\cdot\}$ selects the path with the smallest cumulative distance among the three preceding paths. The Pearson correlation coefficient is then calculated for the DTW-aligned sequences: $r = \frac{\sum_{k=1}^{K}(u'_k - \bar{u}')(t'_k - \bar{t}')}{\sqrt{\sum_{k=1}^{K}(u'_k - \bar{u}')^2}\,\sqrt{\sum_{k=1}^{K}(t'_k - \bar{t}')^2}}$; where $r$ is the Pearson correlation coefficient; $u'_k$ and $t'_k$ are the user voltage value and the transformer low-voltage-side voltage value at the $k$-th sampling point after DTW alignment; $\bar{u}'$ and $\bar{t}'$ are the means of the aligned user voltage and transformer low-voltage-side voltage sequences, respectively; and $K$ is the total number of sampling points. When the Pearson correlation coefficient is higher than a first threshold, a strong user-transformer topological association is determined.
3. The method for automatic identification and verification of power distribution network topology by integrating multi-source data according to claim 1, characterized in that, in step S2, the transformer-line association identification comprises: obtaining transformer load power data and feeder head-end power data, and performing an ADF stationarity test on the transformer load power sequence $\{P^{\mathrm{T}}_t\}$ and the feeder head-end power sequence $\{P^{\mathrm{L}}_t\}$; using a Granger causality test to determine whether transformer power fluctuations are a Granger cause of feeder power fluctuations, for which an unconstrained regression model and a constrained regression model are constructed: $P^{\mathrm{L}}_t = \alpha_0 + \sum_{i=1}^{m}\alpha_i P^{\mathrm{L}}_{t-i} + \sum_{j=1}^{q}\beta_j P^{\mathrm{T}}_{t-j} + \varepsilon_t$; $P^{\mathrm{L}}_t = \alpha_0 + \sum_{i=1}^{m}\alpha_i P^{\mathrm{L}}_{t-i} + \varepsilon_t$; where $P^{\mathrm{L}}_t$ is the feeder head-end power at sampling time $t$; $\alpha_0$ is the constant (intercept) term of the regression model; $\alpha_i$ is the regression coefficient of the feeder power lag terms; $P^{\mathrm{L}}_{t-i}$ is the active power of the feeder at sampling time $t-i$; $\beta_j$ is the regression coefficient of the transformer power lag terms; $P^{\mathrm{T}}_{t-j}$ is the active power of the transformer at sampling time $t-j$; $\varepsilon_t$ is the residual term of the regression model; $m$ is the lag order of the feeder power sequence; and $q$ is the lag order of the transformer power sequence. The joint significance of the lag coefficients $\beta_1, \dots, \beta_q$ is verified by calculating the $F$ statistic: $F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n - m - q - 1)}$; where $RSS_R$ and $RSS_U$ are the residual sums of squares of the constrained and unconstrained models, respectively, and $n$ is the sample size. If the $p$-value corresponding to the $F$ statistic is less than the significance level, the transformer power fluctuation is a Granger cause of the feeder power fluctuation, thereby confirming the topology attribution.
4. The method for automatic identification and verification of power distribution network topology based on multi-source data according to any one of claims 1-3, characterized in that, in step S3, evaluating the confidence of the identification result with an ensemble learning algorithm comprises: constructing a Stacking structure in which logistic regression, gradient boosting decision tree, and support vector machine models form the base learner layer; the input to the base learner layer is the feature vector $\mathbf{x} = [\bar{s}, \sigma_s^2, p_G, S, H]$, where $\bar{s}$ is the mean voltage similarity, $\sigma_s^2$ is the similarity variance, $p_G$ is the Granger causality $p$-value, $S$ is the time-series stability index, and $H$ is the historical identification consistency; the output of the base learner layer is the probability vector $\mathbf{p} = [p_{\mathrm{LR}}, p_{\mathrm{GBDT}}, y_{\mathrm{SVM}}]$, where $p_{\mathrm{LR}}$ is the confidence of the topological relationship output by logistic regression, $p_{\mathrm{GBDT}}$ is the confidence of the topological relationship output by the gradient boosting decision tree, and $y_{\mathrm{SVM}}$ is the classification result output by the support vector machine; a ridge regression model is used as the meta-learner layer, whose input is the probability vector output by the base learner layer and whose output is the final fused confidence score, expressed as $C = \mathrm{Sigmoid}\left(\mathbf{w}_{\mathrm{M}}^{\top}\mathbf{p} + b_{\mathrm{M}}\right)$; where $C$ is the final fused confidence score; $\mathrm{Sigmoid}(\cdot)$ is the activation function; $\mathbf{p}$ is the probability vector output by the base learner layer; and $\mathbf{w}_{\mathrm{M}}$ and $b_{\mathrm{M}}$ are the weight vector and intercept of the meta-learner layer obtained through five-fold cross-validation training.
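5. The method for automatic identification and verification of power distribution network topology by integrating multi-source data according to claim 4, characterized in that the logistic regression model is: $p_{\mathrm{LR}} = \mathrm{Sigmoid}\left(\mathbf{w}_{\mathrm{LR}}^{\top}\mathbf{x} + b_{\mathrm{LR}}\right) = \frac{1}{1 + e^{-(\mathbf{w}_{\mathrm{LR}}^{\top}\mathbf{x} + b_{\mathrm{LR}})}}$; where $\mathbf{w}_{\mathrm{LR}}^{\top}$ is the transpose of the weight vector $\mathbf{w}_{\mathrm{LR}}$; $b_{\mathrm{LR}}$ is the bias (intercept) term of the logistic regression model; and $\mathrm{Sigmoid}(\cdot)$ is the Sigmoid activation function.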
6. The method for automatic identification and verification of power distribution network topology by fusing multi-source data according to claim 4, characterized in that the gradient boosting decision tree model is: $p_{\mathrm{GBDT}} = \frac{1}{1 + \exp\left(-\sum_{m=1}^{J}\gamma_m\,\mathbb{I}(\mathbf{x} \in R_m)\right)}$; where $R_m$ is the feature-space region partitioned by the $m$-th tree; $J$ is the total number of decision trees in the gradient boosting decision tree model; $\gamma_m$ is the output weight of the $m$-th decision tree; $\mathbb{I}(\cdot)$ is the indicator function; and $\exp(\cdot)$ is the exponential function.
7. The method for automatic identification and verification of power distribution network topology by integrating multi-source data according to claim 4, characterized in that the support vector machine model is: $y_{\mathrm{SVM}} = \operatorname{sgn}\left(\sum_{i=1}^{n_{\mathrm{sv}}}\alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b_{\mathrm{SVM}}\right)$; where $\operatorname{sgn}(\cdot)$ is the classification function; $K(\cdot,\cdot)$ is the kernel function; $\alpha_i$ is the Lagrange multiplier corresponding to the $i$-th support vector; $y_i$ is the label of the $i$-th training sample; $\mathbf{x}_i$ is the feature vector of the $i$-th support vector; $n_{\mathrm{sv}}$ is the number of support vectors in the training samples; and $b_{\mathrm{SVM}}$ is the bias term of the support vector machine model.