Decision method and system for power system infrastructure projects with high-proportion new energy access
The decision-making model for power system infrastructure projects, constructed using clustering algorithms and Bayesian neural networks, solves the problem of mismatch between the construction sequence of new energy power sources and grid-connected power transmission in traditional methods. This model improves the reliability and accuracy of power system infrastructure projects with a high proportion of new energy access, thereby enhancing the operating efficiency and power supply reliability of the power system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- STATE GRID HUNAN ELECTRIC POWER COMPANY LIMITED
- Filing Date
- 2023-01-30
- Publication Date
- 2026-06-30
Description
Technical Field
[0001] This invention belongs to the field of electrical automation, specifically relating to a decision-making method and system for power system infrastructure projects with a high proportion of renewable energy integration.
Background Technology
[0002] With economic and technological development and the improvement of people's living standards, electricity has become an indispensable secondary energy source in people's production and daily life, bringing endless convenience. Therefore, ensuring a stable and reliable supply of electricity has become one of the most important tasks of the power system.
[0003] Currently, with more and more new energy power generation systems being integrated into the grid, decision-making on power system infrastructure projects is facing new challenges. Effective decision-making on power system infrastructure projects can significantly improve the operational efficiency and power supply reliability of the power system, promoting its sound development. Therefore, ensuring the scientific and rational nature of the project decision-making process, and making objective and effective decisions regarding project plans, has become a key focus for power system researchers.
[0004] Currently, traditional decision-making methods for power system infrastructure projects do not comprehensively consider the factors influencing project effectiveness and fail to prioritize the construction sequence of renewable energy power projects and grid-connected transmission projects. This results in low reliability, poor accuracy, and strong subjectivity in the decision-making results. Therefore, current traditional decision-making methods for power system infrastructure projects are no longer suitable for today's power systems.
Summary of the Invention
[0005] One of the objectives of this invention is to provide a reliable, accurate, objective, and scientific decision-making method for power system infrastructure projects with a high proportion of renewable energy integration.
[0006] The second objective of this invention is to provide a system for decision-making on power system infrastructure projects that enable the integration of high proportions of new energy sources.
[0007] The decision-making method for power system infrastructure projects with a high proportion of renewable energy integration provided by this invention includes the following steps:
[0008] S1. Obtain historical data on power system infrastructure projects;
[0009] S2. Clean and expand the historical data obtained in step S1;
[0010] S3. Cluster the data obtained in step S2 using a clustering algorithm to extract the typical construction progress curve of the project;
[0011] S4. Based on the typical construction progress curve of the project obtained in step S3, the strong correlation influencing factors of each performance indicator are calculated using the maximum mutual information coefficient method.
[0012] S5. Construct a decision-making model for power system infrastructure projects based on Bayesian neural networks, train and extrapolate various performance indicators, output predicted values for all performance indicators, and complete the decision-making process for power system infrastructure projects with a high proportion of renewable energy access.
[0013] The historical data of power system infrastructure projects mentioned in step S1 specifically includes a typical construction progress curve model of power system infrastructure projects.
[0014] Step S2, which involves cleaning and expanding the historical data obtained in step S1, specifically includes the following steps:
[0015] The following steps are used for data cleaning:
[0016] The acquired construction progress curve is standardized to remove unit limitations and convert it into dimensionless, purely numerical data: the original construction progress data sequence $T = (t_1, t_2, \ldots, t_L)$ is input and, after Z-Score normalization, becomes $x_i = \frac{t_i - \mu}{\delta}$, where $x_i$ is the transformed data, $t_i$ is the data before transformation, $\mu$ is the mean of the elements contained in sequence $T$, $\delta$ is the standard deviation of the elements contained in sequence $T$, and $i = 1, 2, \ldots, L$;
[0017] After conversion, historical projects are categorized based on voltage level, project attributes, regional attributes, and construction scale. Progress data from all historical projects with the same attributes are grouped together to generate several datasets of different project types. A KNN-based outlier detection algorithm is used to remove abnormal data from the construction progress curves of projects of the same type. The algorithm sets the number of nearest neighbors k and the outlier threshold m, and iterates through the construction progress datasets of different project types to calculate outliers in each dataset. Specifically, for a data point p, the k-nearest neighbor mean distance OF(p,k) is calculated using the following formula:
[0018] $OF(p,k) = \frac{1}{k}\sum_{q_i \in N_k(p)} f_{dist}(p, q_i)$
[0019] In the formula, $N_k(p)$ is the set of construction progress curve data points neighboring $p$ in the historical project construction progress curve set that do not exceed the outlier threshold $m$; $f_{dist}(p, q_i)$ is the distance between data points $p$ and $q_i$; $q_i$ is one of the $k$ neighboring data points of data point $p$;
[0020] Then, the average k-nearest neighbor distance of each data point is determined. If the average k-nearest neighbor distance is greater than the set outlier threshold m, the data point is added to the outlier set. Finally, the original construction progress curves corresponding to the data points in the outlier set are deleted from the dataset.
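The cleaning steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the function names `z_score` and `knn_outlier_filter`, the use of Euclidean distance for $f_{dist}$, the toy curves, and the values of `k` and `m` are all assumptions made for the example.

```python
import numpy as np

def z_score(curve):
    """Z-Score normalize one progress curve: x_i = (t_i - mu) / delta."""
    curve = np.asarray(curve, dtype=float)
    return (curve - curve.mean()) / curve.std()

def knn_outlier_filter(curves, k, m):
    """Drop curves whose mean distance to their k nearest neighbors exceeds m.

    curves: array of shape (n_curves, L), already normalized.
    Euclidean distance is an assumption; the source only names f_dist.
    """
    curves = np.asarray(curves, dtype=float)
    # Pairwise Euclidean distances between all curves.
    diff = curves[:, None, :] - curves[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)           # a curve is not its own neighbor
    nearest = np.sort(dist, axis=1)[:, :k]   # k smallest distances per curve
    of = nearest.mean(axis=1)                # OF(p, k) for every curve p
    keep = of <= m                           # curves above m form the outlier set
    return curves[keep], of

# Toy data: five similar S-shaped curves plus one abnormal step-like curve.
months = np.linspace(0.0, 1.0, 12)
rng = np.random.default_rng(0)
normal = [1 / (1 + np.exp(-10 * (months - 0.5))) + rng.normal(0, 0.01, 12)
          for _ in range(5)]
abnormal = [np.where(months < 0.9, 0.0, 1.0)]
data = np.array([z_score(c) for c in normal + abnormal])

kept, scores = knn_outlier_filter(data, k=2, m=1.5)
```

In this toy run the step-like curve receives a much larger `OF` score than the S-shaped curves and is the only one removed; in practice `k` and `m` would be tuned per project type.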
[0021] The following steps are used to augment the data:
[0022] After data cleaning, the construction progress curve dataset is $D = \{T_1, \ldots, T_N\}$, where each construction progress curve corresponds to a time series $T = (t_1, \ldots, t_L)$. All sequences in $D$ are weighted to obtain a weighted time series set $D' = \{(T_1, \omega_1), \ldots, (T_N, \omega_N)\}$, where $\omega_N$ is the weight corresponding to $T_N$; the weighted average $\bar{T}$ is calculated and used to synthesize a new construction progress curve;
[0023] Calculate the sum of squared DTW distances to each series in the weighted time series set $D'$:
[0024] $\bar{T} = \arg\min_{T \in E} \sum_{i=1}^{N} \omega_i \cdot DTW^2(T, T_i)$
[0025] In the formula, $E$ represents the sample space of the construction progress curve; $DTW(T, T_i)$ is the DTW distance between $T$ and $T_i$;
[0026] In the specific calculation, the DBA iterative algorithm is used for iteration: a time series is randomly selected from D' as the initial average series. Match the coordinates of the current average sequence with the coordinates of each sequence in D', and calculate the sum of squared DTW distances from the current average sequence to each sequence in D'; then, update the coordinates of the average sequence to the mean of all matching coordinates of the other sequences; continuously iterate and update the average sequence until the calculated sum of squared DTW distances no longer decreases;
[0027] The final average time series is used as the new synthetic construction progress curve and added to set $D$; by changing the weight values $\omega_i$ in $D'$, several different weighted time series sets are obtained, which in turn generate several average time series, thus expanding the original construction progress curve dataset.
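The augmentation procedure above (weighted DTW barycenter averaging) can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `dtw_path` and `weighted_dba` are hypothetical, the local DTW cost is the squared difference, all weights are assumed positive, and the three toy curves are invented for the example.

```python
import numpy as np

def dtw_path(a, b):
    """Return the squared DTW distance and the warping path between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack from the end to recover which coordinates were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[n, m], path

def weighted_dba(series, weights, n_iter=20, seed=0):
    """Iteratively refine an average series until the weighted sum of
    squared DTW distances stops decreasing."""
    series = [np.asarray(s, dtype=float) for s in series]
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    avg = series[rng.integers(len(series))].copy()   # random initial average
    best_avg, best = avg, np.inf
    for _ in range(n_iter):
        num = np.zeros_like(avg)
        den = np.zeros_like(avg)
        total = 0.0
        for s, w in zip(series, weights):
            d2, path = dtw_path(avg, s)
            total += w * d2
            for i, j in path:            # accumulate coordinates matched to avg[i]
                num[i] += w * s[j]
                den[i] += w
        if total >= best:                # objective no longer decreases: stop
            break
        best_avg, best = avg, total
        avg = num / den                  # weighted mean of all matched coordinates
    return best_avg, best

curves = [np.array([0, 0, 1, 2, 3, 3.0]),
          np.array([0, 1, 2, 3, 3, 3.0]),
          np.array([0, 0, 0, 1, 2, 3.0])]
w = [0.5, 0.3, 0.2]
synthetic, objective = weighted_dba(curves, w)
```

Calling `weighted_dba` repeatedly with different weight vectors yields different synthetic curves, which is how the text describes enlarging the dataset.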
[0028] Step S3 involves using a clustering algorithm to cluster the data obtained in step S2 and extracting a typical construction progress curve for the project. This specifically includes the following steps:
[0029] The data obtained in step S2 is clustered using the AutoEncoder deep embedded clustering model;
[0030] The AutoEncoder deep embedded clustering model includes an autoencoder layer for learning an initial compressed feature representation of the unlabeled dataset; and clustering layers stacked on top of the autoencoder layer for distributing the output of the encoding layer to the clusters.
[0031] First, the stacked autoencoder layers are initialized using a layer-by-layer greedy training strategy, with each autoencoder layer undergoing unsupervised training independently. The autoencoder layer is defined as follows:
[0032] $\tilde{x} \sim Dropout(x)$
[0033] $h = g_1(W_1\tilde{x} + b_1)$
[0034] $\tilde{h} \sim Dropout(h)$
[0035] $y = g_2(W_2\tilde{h} + b_2)$
[0036] In the formula, $\tilde{x}$ is the construction progress data after random mapping; $Dropout()$ is a random mapping function that randomly sets a portion of the input dimensions to 0; $x$ is the input construction progress data; $h$ is the encoded construction progress data; $g_1()$ is the encoder layer activation function; $W_1$, $W_2$, $b_1$, and $b_2$ are model parameters; $\tilde{h}$ is the encoded construction progress data after random mapping; $y$ is the reconstructed construction progress data; $g_2()$ is the activation function of the decoder layer.
[0037] Using the hidden layer output of the previous autoencoder layer as the input of the next autoencoder layer, and employing the backpropagation algorithm, the mean square error is minimized. By continuously training the network structure parameters, the weights and biases of each layer are obtained, ultimately minimizing the error between the input and the reconstruction result. After training the stacked autoencoder layers layer by layer, all the encoder layers are connected in reverse order to form the decoder layer, ultimately forming a multi-layer deep autoencoder, which is then adjusted to minimize the loss function.
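As a rough illustration of a single autoencoder layer as defined above, the following NumPy sketch runs one forward pass and computes the mean square reconstruction error that layer-wise training would minimize. The layer sizes, the dropout rate, the choice of ReLU for $g_1$ and the identity for $g_2$, and the random initial weights are assumptions for the example; the source does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(v, rate):
    """Randomly set a fraction `rate` of the entries of v to 0 (no rescaling)."""
    mask = rng.random(v.shape) >= rate
    return v * mask

def relu(v):
    return np.maximum(v, 0.0)

def autoencoder_layer(x, W1, b1, W2, b2, rate=0.2):
    x_tilde = dropout(x, rate)       # x~ ~ Dropout(x)
    h = relu(W1 @ x_tilde + b1)      # h = g1(W1 x~ + b1), g1 assumed to be ReLU
    h_tilde = dropout(h, rate)       # h~ ~ Dropout(h)
    y = W2 @ h_tilde + b2            # y = g2(W2 h~ + b2), g2 assumed to be identity
    return h, y

L, hidden = 12, 4                    # 12 monthly points compressed to 4 features
x = np.linspace(0.0, 1.0, L)         # toy normalized progress curve
W1 = rng.normal(0.0, 0.1, (hidden, L))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (L, hidden))
b2 = np.zeros(L)

h, y = autoencoder_layer(x, W1, b1, W2, b2)
mse = np.mean((x - y) ** 2)          # reconstruction error to be minimized
```

In greedy layer-wise training, `h` from one trained layer would become the input `x` of the next layer.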
[0038] The construction progress curve set is input into an initialized stacked autoencoder layer. Encoding is performed only using the encoding layer. The input construction progress data is then non-linearly mapped to a hidden layer, thus embedding it into another dimension, reducing the data dimensionality, and ultimately obtaining the feature information of the original construction progress data. This forms the initial mapping between the original data space and the feature space of the construction progress curve: fθ :X→Z, where X is the original data space, Z is the feature space, and θ is the parameter to be learned;
[0039] After the construction progress data has been passed through the initialized autoencoder layers to obtain low-dimensional mapped data, K-Means clustering is first performed in the feature space $Z$ to initialize the cluster centroids: centroids are extracted using the arithmetic mean of the corresponding coordinates of the sequences, and the set of $n$ construction progress curves is clustered into $k$ clusters, each represented by a centroid $c_j$, $j = 1, 2, \ldots, k$;
[0040] After obtaining the initial estimate of the centroids, an unsupervised algorithm is used to train a clustering model of the construction progress curves: the algorithm alternates between two steps until the set convergence condition is met.
[0041] The process of using a clustering algorithm to cluster the data obtained in step S2 and extract the typical construction progress curve of the project specifically includes the following steps:
[0042] Calculate the soft assignment between the low-dimensional feature representation of the construction progress curve mapped through $f_\theta$ and the cluster centroids:
[0043] $q_{ij} = \frac{\left(1 + \|z_i - c_j\|^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \|z_i - c_{j'}\|^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}$
[0044] In the formula, $q_{ij}$ is the probability that the input construction progress curve $x_i$ belongs to cluster $j$; $z_i$ is the low-dimensional feature vector obtained from the input construction progress curve $x_i$ after nonlinear mapping, and the closer $z_i$ is to the centroid $c_j$, the higher the probability that $x_i$ belongs to cluster $j$; $\|z_i - c_j\|$ is the distance between $z_i$ and $c_j$; $\alpha$ is the number of degrees of freedom of the t-distribution; $c_{j'}$ is the centroid of cluster $j'$;
[0045] The mapping $f_\theta$ and the cluster centroids are updated by learning from the current high-confidence assignments through an auxiliary target distribution: $p_{ij}$ is used as the auxiliary target distribution function, and $q_{ij}$ is raised to the second power to improve clustering accuracy:
[0046] $p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}$
[0047] $f_j = \sum_{i=1}^{n} q_{ij}$
[0048] In the formula, $p_{ij}$ is the auxiliary target distribution function; $f_j$ is the sum of the probabilities with which the feature vectors of the $n$ low-dimensional construction progress curves are assigned to cluster $j$ in the soft assignment; $q_{ij'}$ is the probability that the input construction progress curve $x_i$ belongs to cluster $j'$; $f_{j'}$ is the sum of the probabilities with which the feature vectors of the $n$ low-dimensional construction progress curves are assigned to cluster $j'$ in the soft assignment;
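A small NumPy sketch may help make the soft assignment and the auxiliary target distribution concrete. The function names, the four two-dimensional feature vectors, and the two centroids are made up for illustration; $\alpha = 1$ is an assumed default, not a value stated in the source.

```python
import numpy as np

def soft_assign(z, c, alpha=1.0):
    """q_ij: Student's t similarity between feature z_i and centroid c_j,
    normalized over all centroids."""
    d2 = ((z[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)   # ||z_i - c_j||^2
    u = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return u / u.sum(axis=1, keepdims=True)

def target_distribution(q):
    """p_ij: square q_ij, divide by the soft cluster frequency f_j, renormalize."""
    f = q.sum(axis=0)                                         # f_j = sum_i q_ij
    w = q ** 2 / f
    return w / w.sum(axis=1, keepdims=True)

# Toy low-dimensional features of four curves and two cluster centroids.
z = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.1, 1.9]])
c = np.array([[0.0, 0.0], [2.0, 2.0]])

q = soft_assign(z, c)
p = target_distribution(q)
```

Because the probabilities are squared before renormalization, each row of `p` is more sharply peaked than the corresponding row of `q`, which is what pushes training toward high-confidence assignments.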
[0049] The deep embedded clustering model is trained by matching the soft assignment of the construction progress curves with the target distribution; the KL divergence between the soft assignment $q_i$ and the auxiliary distribution $p_i$ is defined as the loss function to be minimized:
[0050] $L = KL(P \| Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$
[0051] In the formula, $L$ is the loss function;
[0052] The network is trained using the KL divergence, and the cluster centroids $c_j$ and the autoencoder parameters $\theta$ are optimized simultaneously, thereby improving the accuracy of clustering and assignment of construction progress curves; the gradients of $L$ with respect to the low-dimensional feature vector $z_i$ of the construction progress curve and the cluster centroid $c_j$ are calculated using the following formulas:
[0053] $\frac{\partial L}{\partial z_i} = \frac{\alpha+1}{\alpha} \sum_{j} \left(1 + \frac{\|z_i - c_j\|^2}{\alpha}\right)^{-1} (p_{ij} - q_{ij})(z_i - c_j)$
[0054] $\frac{\partial L}{\partial c_j} = -\frac{\alpha+1}{\alpha} \sum_{i} \left(1 + \frac{\|z_i - c_j\|^2}{\alpha}\right)^{-1} (p_{ij} - q_{ij})(z_i - c_j)$
[0055] In the formula, $\alpha$ represents the degrees of freedom of the t-distribution;
[0056] The gradient $\frac{\partial L}{\partial z_i}$ is passed to the encoding network of the stacked autoencoder and used to compute the gradients of the network parameters during standard backpropagation. When the change in the centroids between two consecutive iterations is less than a set value, the iteration process is stopped and the optimal cluster centroids are obtained. Finally, the centroids are decoded by the stacked decoder to generate the typical construction progress curve model.
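The KL loss and its gradients with respect to the feature vectors and centroids can be checked numerically. The sketch below assumes the standard deep embedded clustering form of the gradients (with the target distribution held fixed during the update) and compares one analytic gradient entry against a central finite difference; all function names and the random toy data are assumptions for illustration.

```python
import numpy as np

def soft_assign(z, c, alpha):
    d2 = ((z[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    u = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return u / u.sum(axis=1, keepdims=True)

def target_distribution(q):
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_loss(p, q):
    """L = sum_i sum_j p_ij * log(p_ij / q_ij)."""
    return np.sum(p * np.log(p / q))

def dec_gradients(z, c, p, alpha):
    """Analytic gradients of L with respect to z_i and c_j (p held fixed)."""
    q = soft_assign(z, c, alpha)
    diff = z[:, None, :] - c[None, :, :]                  # z_i - c_j, shape (n, k, d)
    d2 = (diff ** 2).sum(axis=2)
    coef = (alpha + 1.0) / alpha * (p - q) / (1.0 + d2 / alpha)
    grad_z = (coef[:, :, None] * diff).sum(axis=1)        # sum over clusters j
    grad_c = -(coef[:, :, None] * diff).sum(axis=0)       # sum over curves i
    return grad_z, grad_c

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 2))      # toy feature vectors
c = rng.normal(size=(2, 2))      # toy centroids
alpha = 1.0
p = target_distribution(soft_assign(z, c, alpha))
grad_z, grad_c = dec_gradients(z, c, p, alpha)

# Central finite difference on one coordinate of z as a sanity check.
eps = 1e-6
z_plus, z_minus = z.copy(), z.copy()
z_plus[0, 0] += eps
z_minus[0, 0] -= eps
numeric = (kl_loss(p, soft_assign(z_plus, c, alpha))
           - kl_loss(p, soft_assign(z_minus, c, alpha))) / (2 * eps)
analytic = grad_z[0, 0]
```

In a full model, `grad_z` would be backpropagated through the encoder while `grad_c` updates the centroids directly.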
[0057] Step S4, which involves calculating the strongly correlated influencing factors of each performance indicator based on the typical project construction progress curve obtained in step S3 using the maximum mutual information coefficient method, specifically includes the following steps:
[0058] The influencing factors of the planning effectiveness of power system infrastructure projects are analyzed from three dimensions: source-network construction sequence, network topology, and seasonal factors. The influencing factors include the grid connection and commissioning time of new energy sources, the synchronous commissioning rate, the location of new energy access, the balance of power flow distribution, and the demand from winter and summer load growth.
[0059] The performance indicators to be considered include the power supply capacity improvement rate, the capacity-to-load ratio compliance rate, the number of line N-1 problems resolved, the number of heavily overloaded lines resolved, the number of main transformer N-1 problems resolved, and the number of surrounding substations relieved of heavy overload.
[0060] Using the maximum mutual information coefficient method, a data sequence set X = {X1, X2, ..., X5} for performance influencing factors and a data sequence set Y = {Y1, Y2, ..., Y6} for performance indicators are established. The mutual information between each influencing factor and the performance indicator is calculated, representing the reduction in uncertainty of the performance indicator information due to information in the performance influencing factors. The results are obtained by calculating the difference between information entropy and conditional entropy.
[0061] $I(X_i, Y_j) = H(Y_j) - H(Y_j \mid X_i)$
[0062] $H(Y_j) = -\sum_{y} P(y) \log_2 P(y)$
[0063] $H(Y_j \mid X_i) = -\sum_{x} P(x) \sum_{y} P(y \mid x) \log_2 P(y \mid x)$
[0064] In the formula, $X_i$ is the sequence of the i-th influencing factor; $Y_j$ is the sequence of the j-th performance indicator; $I(X_i, Y_j)$ is the mutual information of $X_i$ and $Y_j$; $H(Y_j)$ is the information entropy; $H(Y_j \mid X_i)$ is the conditional entropy; $P(x)$ is the probability that the value of sequence $X_i$ is $x$; $P(y)$ is the probability that the value of sequence $Y_j$ is $y$; $P(y \mid x)$ is the probability that the value of $Y_j$ is $y$ given that the value of $X_i$ is $x$;
[0065] The data is gridded, with the grid size set to a×b, where a is the number of grid cells along the x-axis and b is the number of grid cells along the y-axis. Under constraints, the grid resolution that maximizes mutual information is determined, and the maximum mutual information value is normalized to obtain the MIC value.
[0066] $MIC(X_i, Y_j) = \max_{a \times b < B} \frac{I(X_i, Y_j)}{\log_2 \min(a, b)}$
[0067] In the formula, B is the sample data volume raised to the power of 0.6;
[0068] The MIC values between the different influencing factors and the different performance indicators are calculated in turn to obtain the correlation characteristic M.
[0069] For each performance indicator, the larger the MIC value, the deeper the influence of the influencing factor on that performance indicator and the stronger the correlation. Finally, based on the correlation characteristic M, the strongly correlated influencing factors for each performance indicator are screened out.
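The screening idea can be illustrated with a simplified MIC estimate. The sketch below is an approximation, not the full MIC algorithm: it only tries equal-frequency grids rather than searching for the optimal partition at each resolution, and the function names and toy data are assumptions. It does follow the structure in the text: mutual information as entropy minus conditional entropy, grids limited by $a \times b < B$ with $B = n^{0.6}$, and normalization by $\log_2 \min(a, b)$.

```python
import numpy as np

def mutual_information(xb, yb, a, b):
    """I(X;Y) in bits from integer bin labels, computed as H(Y) - H(Y|X)."""
    joint = np.zeros((a, b))
    np.add.at(joint, (xb, yb), 1.0)          # count samples per grid cell
    joint /= joint.sum()
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    h_y = -np.sum(py[py > 0] * np.log2(py[py > 0]))
    nz = joint > 0
    cond = joint / np.where(px[:, None] > 0, px[:, None], 1.0)   # P(y|x)
    h_y_given_x = -np.sum(joint[nz] * np.log2(cond[nz]))
    return h_y - h_y_given_x

def equal_frequency_bins(v, n_bins):
    """Assign each value to one of n_bins bins holding roughly equal counts."""
    ranks = np.argsort(np.argsort(v))
    return ranks * n_bins // len(v)

def mic_approx(x, y, exponent=0.6):
    """Largest normalized mutual information over grids with a * b < B = n**exponent."""
    n = len(x)
    B = n ** exponent
    best = 0.0
    for a in range(2, int(B) + 1):
        for b in range(2, int(B) + 1):
            if a * b >= B:
                break
            xb = equal_frequency_bins(x, a)
            yb = equal_frequency_bins(y, b)
            mi = mutual_information(xb, yb, a, b)
            best = max(best, mi / np.log2(min(a, b)))
    return best

rng = np.random.default_rng(0)
factor = rng.uniform(size=200)               # toy influencing-factor sequence
indicator_strong = 2.0 * factor + 1.0        # fully determined by the factor
indicator_weak = rng.uniform(size=200)       # unrelated to the factor

mic_strong = mic_approx(factor, indicator_strong)
mic_weak = mic_approx(factor, indicator_weak)
```

Ranking the factors of each indicator by such a score, and keeping those above a chosen cut-off, is one way to realize the screening step; the cut-off itself is not given in the source.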
[0070] Step S5, which involves constructing a decision-making model for power system infrastructure projects based on a Bayesian neural network, training and extrapolating various performance indicators, and outputting predicted values for all performance indicators, specifically includes the following steps:
[0071] Modeling is performed using a Bayesian neural network:
[0072] A Bayesian neural network (BNN) consists of an input layer, hidden layers, and an output layer. The hidden layers are probabilistic layers, enabling the network to describe uncertainty. The weights and biases in the probabilistic layers are assumed to follow a normal distribution. The weight set of the probabilistic layers is denoted as W, and p(W) is the prior distribution. Given observation data D = X, Y, X represents the input data, including basic project characteristics, data on strongly correlated influencing factors corresponding to various performance indicators, and monthly typical daily power flow distribution curves of the power system. The basic project characteristics include voltage level, engineering attributes, project area, construction scale, whether it belongs to an urban area, and whether it is a rigid project. The monthly typical daily power flow distribution curves of the power system are calculated based on power flow curves of relevant nodes in the power system. Y represents the output data, including performance indicators for power system infrastructure project planning decisions. The BNN provides the following distribution:
[0073] $p(W \mid X, Y) = \frac{p(Y \mid X, W)\, p(W)}{p(Y \mid X)}$
[0074] In the formula, $p(W \mid X, Y)$ is the posterior distribution of the probability layer weights; $p(W)$ is the prior distribution; $p(Y \mid X, W)$ is the probability distribution of the output performance data given the probability layer weights and the input data; and $p(Y \mid X)$ is the probability distribution of the output performance data given the input data.
[0075] Improvements were made to the constructed Bayesian neural network model:
[0076] The input data is standardized and transformed into dimensionless, purely numerical data: the original data sequences with different characteristics, $D = (d_1, d_2, \ldots, d_l)$, are input sequentially and, after Z-score normalization, become $d_i' = \frac{d_i - \mu}{\delta}$, where $d_i'$ is the converted data, $d_i$ is the original data before transformation, $\mu$ is the mean of the elements in sequence $D$, and $\delta$ is the standard deviation of the elements in sequence $D$; the processed data has a mean of 0 and a standard deviation of 1.
[0077] A deep fully connected neural network is used to process the basic feature data of the project and the data of strongly correlated influencing factors corresponding to various performance indicators: all input features are connected layer by layer, and the hidden layers of the neural network have a nonlinear activation function $F$ that relates the output of the upper-layer neurons to the input of the lower-layer neurons. The output $Y_i$ of the i-th layer is defined as $Y_i = F(W_i \cdot X_i + B_i)$, where $W_i$ is the weight of the i-th hidden layer, $X_i$ is the input to the i-th hidden layer, and $B_i$ is the bias of the i-th layer. A rectified linear unit (ReLU) is used as the nonlinear activation function, $F(x) = \max(0, x)$. After introducing the nonlinear activation function, the fully connected neural network can extract, layer by layer in the hidden layers, the nonlinear relationship between the inputs (the basic project features and the strongly correlated influencing factors of the various performance indicators) and the output performance, thereby providing more complete and effective information for subsequent model training.
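The layer rule $Y_i = F(W_i \cdot X_i + B_i)$ with $F(x) = \max(0, x)$ can be written out directly. The weights, biases, and input below are arbitrary toy values chosen so the result is easy to follow by hand; the function name `dense_forward` is hypothetical.

```python
import numpy as np

def relu(x):
    """F(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def dense_forward(x, layers):
    """Apply Y_i = F(W_i . X_i + B_i) layer by layer; layers is a list of (W, B)."""
    out = x
    for W, B in layers:
        out = relu(W @ out + B)
    return out

x = np.array([1.0, -2.0, 0.5])       # toy standardized project features
layers = [
    (np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 0.0]]), np.array([0.0, 1.0])),   # 3 inputs -> 2 units
    (np.array([[1.0, -1.0]]), np.array([0.5])),            # 2 inputs -> 1 unit
]
y = dense_forward(x, layers)
```

Here the first layer produces `[2.0, 0.0]` (the second unit is clipped by the ReLU) and the second layer produces `[2.5]`.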
[0078] A one-dimensional convolutional neural network is used to extract features from the monthly typical daily power flow distribution curve data of the power system: in the i-th convolutional layer of the one-dimensional convolutional neural network, the input monthly typical daily power flow distribution curve data $X_i = [x_{i1}, x_{i2}, \ldots, x_{it}]$ is convolved with a convolution kernel to obtain the output data $Y_i$, whose t-th element $y_{it}$ is calculated as $y_{it} = F(W_i \cdot X_{it} + b_{it})$, where $W_i$ is the weight of the convolution kernel in the i-th layer and $b_{it}$ is the unit bias of the i-th layer; the convolution kernel length is set to $2k+1$, where $k$ is any positive integer, so the $X_{it}$ corresponding to $y_{it}$ is $X_{it} = [x_{i(t-k)}, x_{i(t-k+1)}, \ldots, x_{i(t+k-1)}, x_{i(t+k)}]$. The result after convolution is subjected to average pooling, and the output of each pooling unit in the pooling layer is $\frac{1}{N}\sum_{j=1}^{N} x_{i,j}$, where $N$ is the kernel size of the i-th pooling layer and $x_{i,j}$ is the j-th element of the input data $X_i$ within the i-th pooling unit;
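The convolution and average pooling described above can be sketched as follows. Zero padding at the curve boundaries, a single scalar bias, ReLU as $F$, and the toy kernel are assumptions for the example; the source does not state how boundaries are handled.

```python
import numpy as np

def conv1d_same(x, w, bias):
    """y_t = F(W . X_t + b) with X_t = [x_{t-k}, ..., x_{t+k}], kernel length 2k+1."""
    k = (len(w) - 1) // 2
    padded = np.pad(x, k)                        # zero padding (assumption)
    windows = np.array([padded[t:t + 2 * k + 1] for t in range(len(x))])
    return np.maximum(0.0, windows @ w + bias)   # F assumed to be ReLU

def average_pool(y, n):
    """Mean of each consecutive block of n elements (a trailing remainder is dropped)."""
    usable = len(y) // n * n
    return y[:usable].reshape(-1, n).mean(axis=1)

flow = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # toy power flow curve
kernel = np.array([-1.0, 0.0, 1.0])              # kernel length 2k+1 with k = 1
features = conv1d_same(flow, kernel, bias=0.0)
pooled = average_pool(features, n=2)
```

With this difference-like kernel the rising curve gives `features = [2, 2, 2, 2, 2, 0]` (the last value is an artifact of the zero padding) and `pooled = [2, 2, 1]`.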
[0079] The input data is fed into the probability layer of a Bayesian neural network, and variational inference is used to optimize and improve the weights of the probability layer: a distribution q(W) is introduced to approximate the posterior distribution p(W|X,Y), with parameters θ=(μ,σ). The weights of each layer follow a normal distribution (μ,σ). KL divergence is used to measure the difference between the two distributions, and the weight distribution is optimized by minimizing the KL divergence.
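[0080] $\theta^* = \arg\min_{\theta} KL\left(q(W \mid \theta) \,\|\, p(W \mid X, Y)\right)$
[0081] In the formula, $\theta^*$ is the parameter value that minimizes the KL divergence.
[0082] Expanding the KL divergence gives $KL\left(q(W) \,\|\, p(W \mid X, Y)\right) = E_{q(W)}[\log q(W) - \log p(W) - \log p(Y \mid X, W)] + \log p(Y \mid X)$
[0083] The evidence lower bound is introduced as $ELBO = E_{q(W)}[\log p(Y \mid X, W)] - D_{KL}(q(W) \,\|\, p(W))$, where $D_{KL}(q(W) \,\|\, p(W))$ is the KL divergence between the two distributions $q(W)$ and $p(W)$;
[0084] Since the KL divergence is non-negative and $\log p(Y \mid X)$ does not depend on $\theta$, minimizing the KL divergence is equivalent to maximizing the evidence lower bound, and the optimization objective simplifies to:
[0085] $\theta^* = \arg\min_{\theta} E_{q(W)}[\log q(W) - \log p(W) - \log p(Y \mid X, W)]$
[0086] Where $E_{q(W)}[\log q(W) - \log p(W) - \log p(Y \mid X, W)]$ is the expectation under the distribution $q(W)$.
[0087] The weights are reparameterized as $\omega_i = \mu_i + \sigma_i \times \varepsilon_i$, where $\varepsilon_i \sim N(0,1)$; replacing $\omega$ with $\varepsilon$ gives
[0088] $\frac{\partial}{\partial \theta} E_{q(W)}[\log q(W) - \log p(W) - \log p(Y \mid X, W)] = E_{\varepsilon}\left[\frac{\partial}{\partial \theta}\left(\log q(W) - \log p(W) - \log p(Y \mid X, W)\right)\right]$
[0089] The expectation of the derivative is estimated using several different samples $\varepsilon_i \sim N(0,1)$, thus approximating the derivative of the KL divergence with respect to $\theta$;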
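To make the variational training step concrete, here is a deliberately tiny sketch: a "network" with a single weight $w$, a prior $N(0, 1)$, and a Gaussian likelihood, trained with the reparameterization $w = \mu + \sigma\varepsilon$. Everything in it is an assumption for illustration (the toy data, the noise level, the learning rate, and the parameterization $\sigma = e^{\rho}$); it is chosen because the exact posterior is known in this case and can be compared with the learned $q(w)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = w * x + noise, single weight w with prior N(0, 1).
noise_std = 0.5
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 * x + rng.normal(0.0, noise_std, size=50)

def dnll_dw(w):
    """Derivative of -log p(Y|X,w) with respect to w (Gaussian likelihood)."""
    return -np.sum(x * (y - w * x)) / noise_std ** 2

mu, rho = 0.0, -2.0             # q(w) = N(mu, sigma^2) with sigma = exp(rho)
lr, n_samples = 0.005, 8
for _ in range(2000):
    sigma = np.exp(rho)
    eps = rng.standard_normal(n_samples)
    w = mu + sigma * eps        # reparameterization: w = mu + sigma * eps
    # f = log q(w) - log p(w) - log p(Y|X,w). With w written in terms of eps,
    # log q depends on theta only through the term -log(sigma).
    df_dw = w + np.array([dnll_dw(wi) for wi in w])
    grad_mu = df_dw.mean()                            # Monte Carlo estimate
    grad_sigma = (-1.0 / sigma + df_dw * eps).mean()
    mu -= lr * grad_mu
    rho -= lr * grad_sigma * sigma                    # chain rule: d sigma / d rho = sigma

sigma = np.exp(rho)

# Exact posterior of this conjugate toy model, for comparison only.
precision = 1.0 + np.sum(x ** 2) / noise_std ** 2
exact_mean = np.sum(x * y) / noise_std ** 2 / precision
exact_std = precision ** -0.5

# Monte Carlo prediction for a new input: sample weights, then average.
x_new = 0.5
pred_samples = (mu + sigma * rng.standard_normal(1000)) * x_new
pred_mean = pred_samples.mean()
```

After training, `mu` and `sigma` should land close to `exact_mean` and `exact_std`. A real Bayesian neural network applies the same idea to every weight of the probabilistic layers and obtains the gradients by backpropagation rather than by hand.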
[0090] After training, the Monte Carlo sampling method is used to sample the current power system infrastructure project planning data, and finally the predicted values of various effectiveness indicators of power system infrastructure project planning decisions are output.
[0091] The method of using Monte Carlo sampling to sample current power system infrastructure project planning data and finally outputting predicted values for various effectiveness indicators of power system infrastructure project planning decisions includes the following steps:
[0092] A. Select a certain performance indicator and use the basic data of historical project planning and the data of that performance indicator to train the improved Bayesian neural network model;
[0093] B. Input the basic characteristics data of the current power system infrastructure project decision-making, and the data of factors that are strongly correlated with the selected performance indicators;
[0094] C. Starting from the earliest start time of the projects in the current project plan and taking months as the unit, set the initial k=1, input the typical daily power flow distribution curve of the corresponding month into the trained improved Bayesian neural network model, output the probability distribution of the selected performance indicator k months after the start of the current project plan, and calculate the corresponding mean as the prediction result;
[0095] D. Determine, based on the number of months elapsed, whether the current project plan has been fully completed:
[0096] If not fully completed, increase k by one month and return to step C;
[0097] If all is completed, proceed to the next step;
[0098] E. Determine whether all performance indicators have been successfully projected:
[0099] If not all are completed, select the next performance indicator and repeat steps A through D;
[0100] If all are completed, the projected values of all performance indicators after the project planning and launch will be output monthly to complete the decision-making for the current project.
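The month-by-month projection loop in steps B to E can be sketched as below. Training (step A) is omitted, and the trained Bayesian neural network is replaced by a hypothetical stand-in, `toy_model`, that simply returns one random draw per call; the function names, the indicator name, and the data shapes are all assumptions for illustration.

```python
import numpy as np

def project_indicators(models, features, monthly_flow_curves, n_samples=200, seed=0):
    """For each indicator and each month k, draw Monte Carlo samples from the
    model's predictive distribution and report their mean.

    models: dict mapping indicator name -> callable(features, flow_curve, rng)
            that returns one sampled prediction (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    results = {}
    for name, sample_fn in models.items():           # step E: next indicator
        monthly_means = []
        for flow_curve in monthly_flow_curves:       # steps C and D: k = 1, 2, ...
            draws = [sample_fn(features, flow_curve, rng) for _ in range(n_samples)]
            monthly_means.append(float(np.mean(draws)))
        results[name] = monthly_means
    return results

def toy_model(features, flow_curve, rng):
    """Hypothetical stand-in for one stochastic forward pass of a trained model."""
    return features.sum() + flow_curve.mean() + rng.normal(0.0, 0.1)

features = np.array([1.0, 2.0])                              # toy project features
flows = [np.full(24, level) for level in (1.0, 2.0, 3.0)]    # three months of curves
projection = project_indicators({"supply_capacity_gain": toy_model}, features, flows)
```

The result holds one projected value per month for each indicator, which corresponds to the monthly output described in the final step.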
[0101] This invention also provides a system for implementing the decision-making method for power system infrastructure projects with a high proportion of renewable energy access, comprising a data acquisition module, a data processing module, a progress curve calculation module, an influencing factor calculation module, and a decision-making module; the data acquisition module, data processing module, progress curve calculation module, influencing factor calculation module, and decision-making module are connected in series; the data acquisition module is used to acquire historical data of power system infrastructure projects and upload the data to the data processing module; the data processing module is used to clean and expand the historical data based on the received data and upload the data to the progress curve calculation module; the progress curve calculation module is used to cluster the received data using a clustering algorithm, extract the typical construction progress curve of the project, and upload the data to the influencing factor calculation module; the influencing factor calculation module is used to calculate the strongly correlated influencing factors of each performance indicator using the maximum mutual information coefficient method based on the received data and the obtained typical construction progress curve of the project, and upload the data to the decision-making module; the decision-making module is used to construct a decision-making model for power system infrastructure projects based on a Bayesian neural network based on the received data, train and extrapolate each performance indicator, output the predicted values of all performance indicators, and complete the decision-making process for power system infrastructure projects with a high proportion of renewable energy access.
[0102] The decision-making method and system for power system infrastructure projects with a high proportion of renewable energy integration provided by this invention, through its innovative power system infrastructure project decision-making scheme, not only enables decision-making for power system infrastructure projects with a high proportion of renewable energy integration, but also has high reliability, good accuracy, and a more objective and scientific decision-making process.
Attached Figure Description
[0103] Figure 1 is a schematic diagram of the process flow of the decision-making method of the present invention.
[0104] Figure 2 is a schematic diagram of the AutoEncoder deep embedded clustering model of the decision-making method of this invention.
[0105] Figure 3 is a schematic diagram of the original Bayesian neural network structure of the decision-making method of this invention.
[0106] Figure 4 is a schematic diagram of the functional modules of the system of the present invention.
Detailed Implementation
[0107] Figure 1 shows the process flow of the decision-making method of this invention. The decision-making method for power system infrastructure projects with a high proportion of renewable energy access provided by this invention includes the following steps:
[0108] S1. Obtain historical data for power system infrastructure projects; specifically, this includes typical construction progress curve models for power system infrastructure projects, etc.
[0109] S2. Clean and augment the historical data obtained in step S1; specifically, this includes the following steps:
[0110] The following steps are used for data cleaning:
[0111] The volume of historical power grid infrastructure project construction progress curve data is massive. To avoid the influence of interfering, redundant, and incomplete data, the raw data needs to be preprocessed to improve data quality. The acquired construction progress curves are standardized to remove unit limitations and convert them into dimensionless, purely numerical data: the original construction progress data sequence $T = (t_1, t_2, \ldots, t_L)$ is input and, after Z-Score normalization, becomes $x_i = \frac{t_i - \mu}{\delta}$, where $x_i$ is the transformed data, $t_i$ is the data before transformation, $\mu$ is the mean of the elements contained in sequence $T$, $\delta$ is the standard deviation of the elements contained in sequence $T$, and $i = 1, 2, \ldots, L$;
[0112] After conversion, historical projects are categorized based on voltage level, project attributes, regional attributes, and construction scale. Progress data from all historical projects with the same attributes are grouped together to generate several datasets of different project types. A KNN-based outlier detection algorithm is used to remove abnormal data from the construction progress curves of projects of the same type. The algorithm sets the number of nearest neighbors k and the outlier threshold m, and iterates through the construction progress datasets of different project types to calculate outliers in each dataset. Specifically, for a data point p, the k-nearest neighbor mean distance OF(p,k) is calculated using the following formula:
[0113] $OF(p,k) = \frac{1}{k}\sum_{q_i \in N_k(p)} f_{dist}(p, q_i)$
[0114] In the formula, $N_k(p)$ is the set of construction progress curve data points neighboring $p$ in the historical project construction progress curve set that do not exceed the outlier threshold $m$; $f_{dist}(p, q_i)$ is the distance between data points $p$ and $q_i$; $q_i$ is one of the $k$ neighboring data points of data point $p$;
[0115] Then, the average k-nearest neighbor distance of each data point is determined. If the average k-nearest neighbor distance is greater than the set outlier threshold m, the data point is added to the outlier set. Finally, the original construction progress curves corresponding to the data points in the outlier set are deleted from the dataset.
[0116] The following steps are used to augment the data:
[0117] To avoid insufficient sample size affecting clustering results, synthetic data was used to augment the sample; the cleaned construction progress curve dataset is D = {T1,...,T...} N}, where each construction progress curve corresponds to a time series T = (t1,...,t L Weight all sequences in D to obtain a weighted time series set D' = {(T1,ω2),...,(T...}; N ,ω n )},ω n For T N The corresponding weights; calculate the weighted average T, and use the weighted average to synthesize a new construction progress curve;
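[0118] Calculate the weighted sum of squared DTW distances to each series in the weighted time series set D', and take as the weighted average the series that minimizes it:
[0119] $$\bar{T} = \arg\min_{T \in E} \sum_{i=1}^{N} \omega_i \cdot DTW^2(T, T_i)$$
[0120] In the formula, E is the sample space of the construction progress curves; $DTW(T, T_i)$ is the DTW distance between T and $T_i$;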
[0121] In the specific calculation, the DBA iterative algorithm is used: a time series is randomly selected from D' as the initial average series. The coordinates of the current average series are matched with the coordinates of each series in D', and the sum of squared DTW distances from the current average series to each series in D' is calculated. Each coordinate of the current average series may be matched by one or more coordinates of the other series, so each coordinate of the average series is updated to the mean of all coordinates matched to it. This reduces the DTW distance and yields a new average series. The new average series forms new matching relationships with the series in D', so the update is repeated until the calculated sum of squared DTW distances no longer decreases.
[0122] The final average time series is used as the new synthetic construction progress curve and added to set D. By changing the weights $\omega_i$ in D', several different weighted time series sets are obtained, which in turn generate several average time series, thus expanding the original construction progress curve set.
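The DBA iteration described above can be sketched as follows. This is a simplified illustration under stated assumptions: series are one-dimensional lists, the DTW cost is the sum of squared pointwise differences along the optimal alignment, all weights are positive, and the function names `dtw_path` and `weighted_dba` are hypothetical.

```python
import random

def dtw_path(a, b):
    """Return (sum of squared differences along the best alignment, path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end of both series to recover matched index pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    return cost[n][m], path[::-1]

def weighted_dba(series, weights, n_iter=10, seed=0):
    """Weighted DBA: returns (best average series found, its weighted cost)."""
    rng = random.Random(seed)
    avg = list(rng.choice(series))          # random initial average
    best, best_avg = float("inf"), avg
    for _ in range(n_iter):
        sums, wsum, total = [0.0] * len(avg), [0.0] * len(avg), 0.0
        for s, w in zip(series, weights):
            d, path = dtw_path(avg, s)
            total += w * d
            for i, j in path:               # collect coordinates matched to avg[i]
                sums[i] += w * s[j]
                wsum[i] += w
        if total >= best:                   # cost no longer decreases: stop
            break
        best, best_avg = total, avg
        avg = [x / c for x, c in zip(sums, wsum)]
    return best_avg, best

# Hypothetical cleaned progress curves and weights.
curves = [[0.0, 0.2, 0.6, 1.0], [0.0, 0.1, 0.5, 1.0], [0.0, 0.3, 0.8, 1.0]]
synthetic, cost = weighted_dba(curves, weights=[0.5, 0.3, 0.2])
```

Calling `weighted_dba` again with a different weight vector gives a different synthetic curve, which is how the augmentation step enlarges the dataset.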
[0123] S3. Cluster the data obtained in step S2 using a clustering algorithm to extract typical project construction progress curves; specifically including the following steps:
[0124] The data obtained in step S2 are clustered using an AutoEncoder deep embedded clustering model (as shown in Figure 2);
[0125] The AutoEncoder deep embedded clustering model includes an autoencoder layer for learning an initial compressed feature representation of the unlabeled dataset; and clustering layers stacked on top of the autoencoder layer for distributing the output of the encoding layer to the clusters.
[0126] First, the stacked autoencoder layers are initialized using a layer-by-layer greedy training strategy, with each autoencoder layer undergoing unsupervised training independently. The autoencoder layer is defined as follows:
[0127] $$\tilde{x} \sim \mathrm{Dropout}(x)$$
[0128] $$h = g_1(W_1 \tilde{x} + b_1)$$
[0129] $$\tilde{h} \sim \mathrm{Dropout}(h)$$
[0130] $$y = g_2(W_2 \tilde{h} + b_2)$$
[0131] In the formula, $\tilde{x}$ is the construction progress data after random mapping; Dropout() is a random mapping function that randomly sets a portion of the input dimensions to 0; x is the input construction progress data; h is the encoded construction progress data; $g_1()$ is the activation function of the encoder layer; $W_1$, $W_2$, $b_1$, and $b_2$ are model parameters; $\tilde{h}$ is the encoded construction progress data after random mapping; y is the decoded construction progress data; $g_2()$ is the activation function of the decoder layer.
[0132] Using the hidden layer output of the previous autoencoder layer as the input of the next autoencoder layer, and employing the backpropagation algorithm, the mean square error is minimized. By continuously training the network structure parameters, the weights and biases of each layer are obtained, ultimately minimizing the error between the input and the reconstruction result. After training the stacked autoencoder layers layer by layer, all the encoder layers are connected in reverse order to form the decoder layer, ultimately forming a multi-layer deep autoencoder, which is then adjusted to minimize the loss function.
[0133] The construction progress curve set is input into the initialized stacked autoencoder layers, and only the encoding layers are used for encoding. The input construction progress data are non-linearly mapped to a hidden layer, which embeds them into a lower-dimensional space and yields the feature information of the original construction progress data. This forms the initial mapping between the original data space and the feature space of the construction progress curves, $f_\theta: X \to Z$, where X is the original data space, Z is the feature space, and θ is the parameter to be learned;
[0134] After the low-dimensional mapped data have been obtained by passing the construction progress data through the initialized autoencoder layers, K-Means clustering is first performed in the feature space Z to initialize the cluster centroids: centroids are extracted as the arithmetic mean of the corresponding coordinates of the sequences, and the set of n construction progress curves is clustered into k clusters, each represented by a centroid $c_j$, j = 1, 2, ..., k;
[0135] After the initial estimate of the centroids has been obtained, an unsupervised algorithm is used to train the clustering model of the construction progress curves: the algorithm alternates between the following two steps until the set convergence condition is met (preferably, the centroid iteration error is less than a set value);
[0136] The specific implementation includes the following steps:
[0137] Calculate the soft assignment between the low-dimensional feature representation of the construction progress curves mapped by $f_\theta$ and the cluster centroids:
[0138] $$q_{ij} = \frac{\left(1 + \|z_i - c_j\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \|z_i - c_{j'}\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}$$
[0139] In the formula, $q_{ij}$ is the probability that the input construction progress curve $x_i$ belongs to cluster j; $z_i$ is the low-dimensional feature vector obtained from the input construction progress curve $x_i$ after nonlinear mapping, and the closer $z_i$ is to the centroid $c_j$, the higher the probability that $x_i$ belongs to cluster j; $\|z_i - c_j\|$ is the distance between $z_i$ and $c_j$; α is the number of degrees of freedom of the t-distribution, usually set to 1; $c_{j'}$ is the centroid of cluster j';
[0140] Update the mapping $f_\theta$ and correct the cluster centroids by learning from the current high-confidence assignments through an auxiliary target distribution: $p_{ij}$ is used as the auxiliary target distribution function, and $q_{ij}$ is raised to the second power to improve clustering accuracy:
[0141] $$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}$$
[0142] $$f_j = \sum_{i=1}^{n} q_{ij}$$
[0143] In the formula, $p_{ij}$ is the auxiliary target distribution function; $f_j$ is the sum of the soft-assignment probabilities with which the n low-dimensional construction progress curve feature vectors are assigned to cluster j; $q_{ij'}$ is the probability that the input construction progress curve $x_i$ belongs to cluster j'; $f_{j'}$ is the sum of the soft-assignment probabilities with which the n low-dimensional construction progress curve feature vectors are assigned to cluster j';
[0144] Clustering is iteratively refined by learning from the high-confidence assignments through the auxiliary target distribution; the deep embedded clustering model is trained by matching the soft assignments of the construction progress curves to the target distribution; the KL divergence between the soft assignment $q_i$ and the auxiliary distribution $p_i$ is defined as the loss function to be minimized:
[0145] $$L = KL(P \,\|\, Q) = \sum_{i}\sum_{j} p_{ij} \log\frac{p_{ij}}{q_{ij}}$$
[0146] In the formula, L is the loss function;
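[0147] The network is trained using the KL divergence, and the cluster centroids $c_j$ and the autoencoder parameters θ are optimized at the same time, thereby improving the accuracy of the clustering and assignment of the construction progress curves; the gradients of L with respect to the low-dimensional feature vector $z_i$ of a construction progress curve and the cluster centroid $c_j$ are calculated using the following formulas:
[0148] $$\frac{\partial L}{\partial z_i} = \frac{\alpha+1}{\alpha}\sum_{j}\left(1 + \frac{\|z_i - c_j\|^2}{\alpha}\right)^{-1}(p_{ij} - q_{ij})(z_i - c_j)$$
[0149] $$\frac{\partial L}{\partial c_j} = -\frac{\alpha+1}{\alpha}\sum_{i}\left(1 + \frac{\|z_i - c_j\|^2}{\alpha}\right)^{-1}(p_{ij} - q_{ij})(z_i - c_j)$$
[0150] In the formula, α represents the degrees of freedom of the t-distribution;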
[0151] The gradient $\partial L/\partial z_i$ is passed to the encoding network of the stacked autoencoder and used to compute the gradients of the network parameters during standard backpropagation. When the change in the centroids between two consecutive iterations is less than a set value, the iteration stops and the optimal cluster centroids are obtained. Finally, the centroids are decoded by the stacked decoder to generate the typical construction progress curve model.
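The two alternating quantities of the clustering step (the Student-t soft assignment and the sharpened target distribution) and the KL loss can be sketched numerically as follows. This is only an illustration on hand-picked embeddings: the encoder, backpropagation, and centroid updates are omitted, and the names `soft_assign`, `target_distribution`, and `kl_loss` are hypothetical.

```python
import math

def soft_assign(z, centroids, alpha=1.0):
    """q_ij: Student-t similarity between embedding z_i and centroid c_j."""
    q = []
    for zi in z:
        row = [(1.0 + math.dist(zi, c) ** 2 / alpha) ** (-(alpha + 1.0) / 2.0)
               for c in centroids]
        total = sum(row)
        q.append([v / total for v in row])
    return q

def target_distribution(q):
    """p_ij: squared q_ij divided by the soft cluster frequency f_j, renormalised."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [qij ** 2 / fj for qij, fj in zip(row, f)]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_loss(p, q):
    """L = sum_i sum_j p_ij * log(p_ij / q_ij)."""
    return sum(pij * math.log(pij / qij)
               for pr, qr in zip(p, q)
               for pij, qij in zip(pr, qr) if pij > 0.0)

# Hypothetical 2-D embeddings of four curves and two centroids.
z = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
c = [(0.0, 0.0), (1.0, 1.0)]
q = soft_assign(z, c)
p = target_distribution(q)
loss = kl_loss(p, q)
```

In this example the first embedding sits exactly on the first centroid, so its soft assignment is 0.75 and the target distribution raises it to 0.9, which shows how the target emphasizes high-confidence assignments.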
[0152] S4. Based on the typical project construction progress curve obtained in step S3, the strongly correlated influencing factors of each performance indicator are calculated using the maximum mutual information coefficient method; specifically, the following steps are included:
[0153] The factors influencing the planning effectiveness of power system infrastructure projects are analyzed from three dimensions: the construction sequence of power sources and grid, network topology, and seasonal factors. The influencing factors include the grid connection and commissioning time of new energy sources, the synchronous commissioning rate, the location of new energy access, the balance of power distribution, and the demand for winter and summer load growth.
[0154] Regarding the grid connection and commissioning time of new energy: Based on the typical construction progress curve model of projects with different voltage levels, project attributes, regional attributes, and construction scale, the construction period value of the supporting transmission project of new energy power can be obtained. Combined with the construction progress of new energy power units, the grid connection and commissioning time of new energy projects can be inferred. Since the power flow distribution of the power system is dynamic, the climate characteristics, power demand, equipment operation status, etc. corresponding to different commissioning time points will affect the effectiveness of project planning.
[0155] Regarding the synchronous commissioning rate: To ensure the timely transmission of new energy power, new energy power projects and their grid-connected transmission projects should be planned, constructed, and put into operation simultaneously, achieving coordinated development between power sources and the grid. However, in practice, the construction progress of many supporting transmission projects lags significantly behind that of power projects. This mismatch in construction timing leads to delayed commissioning of new energy projects, hindering both the improvement of power supply capacity and the development of new energy. Based on the typical construction progress curves of different types of projects, the following formula should be satisfied:
[0156] $$(D_i + T_i) - (D_j + T_j) \le 0$$
[0157] In the formula, $D_i$ and $D_j$ are the commencement times of the supporting power grid project and the corresponding new energy power project, respectively, and $T_i$ and $T_j$ are their respective construction periods. The synchronous commissioning rate is expressed as the ratio of the number of projects in the power system project plan that satisfy the above formula to the total number of projects;
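The synchronous commissioning rate defined above reduces to a simple count. The sketch below assumes times are expressed in months from a common origin; the function name and the sample project pairs are hypothetical.

```python
def synchronous_commissioning_rate(pairs):
    """Share of (D_i, T_i, D_j, T_j) pairs with (D_i + T_i) - (D_j + T_j) <= 0.

    D_i, T_i: start time and construction period of the supporting grid project.
    D_j, T_j: start time and construction period of the new energy power project.
    """
    on_time = sum(1 for d_i, t_i, d_j, t_j in pairs
                  if (d_i + t_i) - (d_j + t_j) <= 0)
    return on_time / len(pairs)

# Hypothetical project pairs, all values in months.
pairs = [(0, 12, 2, 12), (3, 18, 0, 15), (1, 10, 1, 10), (6, 14, 0, 12)]
rate = synchronous_commissioning_rate(pairs)
```

Here two of the four grid projects finish no later than their power projects, so the rate is 0.5.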
[0158] Regarding the location of new energy access: The network topology of the power system is complex, and due to the fluctuation and randomness of the output of new energy itself, the access location of new energy power sources in the power system network topology has a certain impact on the power flow distribution of the entire system. Different access locations will result in different effects of new energy projects after grid connection and commissioning.
[0159] Regarding the balance of power distribution: Due to the inconsistent distribution of power resources and electrical loads, the layout of power sources in the power system is crucial to the efficiency of the entire system operation and also affects the effectiveness of power grid project planning and construction. The degree of matching of power distribution can be measured by the power distribution coefficient, which is the degree of deviation between the product of power capacity and impedance at the connection point.
[0160]
[0161] In the formula, $G_i$ is the capacity of the i-th power source, and $x_i$ is the impedance between the i-th power source and the connection point;
[0162] Regarding the demand for winter and summer load growth: Seasonal factors have a significant impact on the power grid's supply load. Due to extreme weather, residents' demand for heating and cooling increases, leading to a substantial increase in electricity load. Some areas face severe challenges during peak electricity consumption periods in winter and summer. Whether the peak summer and peak winter load demand can be met is one of the key considerations in the planning of power system infrastructure projects.
[0163] The performance indicators to be considered include the power supply capacity improvement rate, the capacity-to-load ratio compliance rate, the number of line N-1 problems resolved, the number of heavily overloaded lines relieved, the number of main transformer N-1 problems resolved, and the number of surrounding substations relieved of heavy overload.
[0164] Using the maximum mutual information coefficient method, a data sequence set X = {X1, X2, ..., X5} for performance influencing factors and a data sequence set Y = {Y1, Y2, ..., Y6} for performance indicators are established. The mutual information between each influencing factor and the performance indicator is calculated, representing the reduction in uncertainty of the performance indicator information due to information in the performance influencing factors. The results are obtained by calculating the difference between information entropy and conditional entropy.
[0165] $$I(X_i, Y_j) = H(Y_j) - H(Y_j | X_i)$$
[0166] $$H(Y_j) = -\sum_{y} P(y) \log P(y)$$
[0167] $$H(Y_j | X_i) = -\sum_{x} P(x) \sum_{y} P(y|x) \log P(y|x)$$
[0168] In the formula, $X_i$ is the sequence of the i-th influencing factor; $Y_j$ is the sequence of the j-th performance indicator; $I(X_i, Y_j)$ is the mutual information of $X_i$ and $Y_j$; $H(Y_j)$ is the information entropy; $H(Y_j | X_i)$ is the conditional entropy; P(x) is the probability that the value of sequence $X_i$ is x; P(y) is the probability that the value of sequence $Y_j$ is y; P(y|x) is the corresponding conditional probability;
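The entropy-difference form of mutual information above can be checked on small discrete sequences. The sketch below assumes base-2 logarithms and already-discretized values; the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(ys):
    """H(Y) = -sum_y P(y) log2 P(y), estimated from value frequencies."""
    n = len(ys)
    return -sum(c / n * math.log2(c / n) for c in Counter(ys).values())

def conditional_entropy(xs, ys):
    """H(Y|X) = -sum_{x,y} P(x,y) log2 P(y|x)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    return -sum(c / n * math.log2(c / px[x]) for (x, _), c in joint.items())

def mutual_information(xs, ys):
    """I(X, Y) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)

# Y fully determined by X: knowing X removes all uncertainty about Y.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# Y independent of X: knowing X removes none.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The first pair yields one bit of mutual information and the second yields zero, matching the interpretation of mutual information as the reduction in uncertainty of the performance indicator.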
[0169] Since mutual information (MI) is not normalized, its values vary greatly among different influencing factors and performance indicators, making direct comparison across them impossible. Therefore, the maximum mutual information coefficient (MIC) is introduced as the measure. The data are gridded, with the grid size set to a×b, where a is the number of grid cells along the x-axis and b is the number of grid cells along the y-axis. Under the constraint on grid size, the grid resolution that maximizes the mutual information is determined, and the maximum mutual information value is normalized to obtain the MIC value:
[0170] $$MIC(X_i, Y_j) = \max_{a \times b < B} \frac{I(X_i, Y_j)}{\log_2 \min(a, b)}$$
[0171] In the formula, B is the sample data volume raised to the power of 0.6;
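A simplified version of the grid search behind MIC can be sketched as follows. It is an approximation under a stated assumption: only equal-width grids are tried, whereas the full MIC procedure also optimizes where the grid lines are placed, so this sketch can underestimate the true MIC. The names `binned`, `grid_mi`, and `mic_equal_width` are hypothetical.

```python
import math
from collections import Counter

def binned(values, bins):
    """Assign each value to one of `bins` equal-width cells."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # constant series fall into a single cell
    return [min(int((v - lo) / width), bins - 1) for v in values]

def grid_mi(xs, ys, a, b):
    """Mutual information (bits) of xs and ys on an a x b equal-width grid."""
    n = len(xs)
    bx, by = binned(xs, a), binned(ys, b)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(c / n * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def mic_equal_width(xs, ys, exponent=0.6):
    """Largest normalised grid MI over all grids with a * b < n ** exponent."""
    limit = len(xs) ** exponent
    best = 0.0
    for a in range(2, int(limit) + 1):
        for b in range(2, int(limit) + 1):
            if a * b < limit:
                score = grid_mi(xs, ys, a, b) / math.log2(min(a, b))
                best = max(best, score)
    return best

xs = [float(v) for v in range(100)]
linear = mic_equal_width(xs, xs)              # perfectly related sequences
flat = mic_equal_width(xs, [1.0] * 100)       # indicator that never changes
```

Because the normalized score is bounded by 1, scores for different factor and indicator pairs become comparable, which is the reason the text gives for preferring MIC over raw mutual information.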
[0172] The MIC values between different influencing factors and different performance indicators were calculated sequentially to obtain the correlation characteristics. The details are shown in Table 1:
[0173] Table 1. Schematic diagram of the correlation characteristics between performance indicators and influencing factors of power system infrastructure projects.
[0174]
[0175] For each performance indicator, the larger the MIC value, the deeper the influence of the influencing factor on that performance indicator and the stronger the correlation. Finally, based on the correlation characteristic M, the strongly correlated influencing factors for each performance indicator are screened out.
[0176] S5. Construct a decision-making model for power system infrastructure projects based on Bayesian neural networks, train and extrapolate various performance indicators, output predicted values for all performance indicators, and complete the decision-making process for power system infrastructure projects with a high proportion of renewable energy integration; specifically including the following steps:
[0177] Modeling is performed using a Bayesian neural network (as shown in Figure 3):
[0178] A Bayesian neural network (BNN) consists of an input layer, hidden layers, and an output layer. The hidden layers are probabilistic layers, enabling the network to describe uncertainty. The weights and biases in the probabilistic layers are assumed to follow a normal distribution. The weight set of the probabilistic layers is denoted as W, and p(W) is the prior distribution. The observation data are given as D = (X, Y). X represents the input data, including basic project characteristics, data on the strongly correlated influencing factors corresponding to the various performance indicators, and monthly typical daily power flow distribution curves of the power system. The basic project characteristics include voltage level, engineering attributes, project area, construction scale, whether the project belongs to an urban area, and whether it is a rigid project. The monthly typical daily power flow distribution curves of the power system are calculated from the power flow curves of the relevant nodes in the power system. Y represents the output data, including the performance indicators for power system infrastructure project planning decisions. The BNN provides the following distribution:
[0179] $$p(W|X,Y) = \frac{p(Y|X,W)\,p(W)}{p(Y|X)}$$
[0180] In the formula, p(W|X,Y) is the posterior distribution of the probability layer weights; p(W) is the prior distribution; p(Y|X,W) is the probability distribution of the output performance data given the probability layer weights and the input data; and p(Y|X) is the probability distribution of the output performance data given the input data.
[0181] Improvements were made to the constructed Bayesian neural network model:
[0182] The input data contain multiple features of the project plan. These features have different attributes, different units, and significantly different orders of magnitude. Therefore, the input data are standardized and transformed into dimensionless, purely numerical data. For each feature in turn, the original data sequence $D = (d_1, d_2, \ldots, d_l)$ is input and Z-score normalized to obtain $d_i' = \frac{d_i - \mu}{\delta}$, where $d_i'$ is the converted data, $d_i$ is the original data before conversion, μ is the mean of the elements in sequence D, and δ is the standard deviation of the elements in sequence D; the processed data have zero mean and unit standard deviation.
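The Z-score step can be sketched in a few lines. The use of the population standard deviation is an assumption, since the text does not say which estimator is meant; the function name and sample values are hypothetical.

```python
import statistics

def z_score(seq):
    """Return d_i' = (d_i - mu) / delta for every element of seq."""
    mu = statistics.fmean(seq)
    delta = statistics.pstdev(seq)   # population standard deviation (assumed)
    return [(d - mu) / delta for d in seq]

# Hypothetical voltage levels in kV for three projects.
normalized = z_score([110.0, 220.0, 500.0])
```

After the transformation the feature is dimensionless, so it can be combined with features measured in other units.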
[0183] Because the input data are of diverse types and characteristics, different modules are inserted at the input of the Bayesian neural network to preprocess the input data, so as to fully extract the useful information they contain and improve model training. The basic project feature data and the performance influencing factor data are rich in latent variables and have highly complex nonlinear relationships with the output project planning performance indicators. To fully extract the key information, a deep fully connected neural network is used to process the basic project feature data and the strongly correlated influencing factor data corresponding to the various performance indicators: all input features are connected layer by layer. To describe the nonlinear relationships in the data, the hidden layers of the neural network apply a nonlinear activation function F, which represents the relationship between the output of the upper-layer neurons and the input of the lower-layer neurons. The output $Y_i$ of the i-th layer is defined as $Y_i = F(W_i \cdot X_i + B_i)$, where $W_i$ is the weight of the i-th hidden layer, $X_i$ is the input to the i-th hidden layer, and $B_i$ is the bias of the i-th layer. The rectified linear unit (ReLU) is used as the nonlinear activation function, $F(x) = \max(0, x)$. With the nonlinear activation function, the fully connected neural network can extract, layer by layer in the hidden layers, the nonlinear relationships between the inputs (the basic project features and the strongly correlated influencing factors of the various performance indicators) and the output performance, thereby providing more complete and effective information for subsequent model training.
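The layer rule $Y_i = F(W_i \cdot X_i + B_i)$ with ReLU can be sketched as a plain forward pass. The weights below are hand-picked for illustration only, and the names `relu`, `dense`, and `forward` are hypothetical.

```python
def relu(values):
    """F(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]

def dense(weights, inputs, bias):
    """Compute W . X + B for one fully connected layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def forward(layers, inputs):
    """Feed the output of each layer into the next one."""
    for weights, bias in layers:
        inputs = relu(dense(weights, inputs, bias))
    return inputs

# Two hypothetical layers: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[1.0, 1.0]], [0.5]),
]
features = forward(layers, [2.0, 1.0])
```

The first layer produces [1.0, 0.5] and the second combines them into a single value of 2.0.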
[0184] Unlike the basic project features and the performance influencing factors, the input monthly typical daily power flow curve data have pronounced time-series characteristics. A one-dimensional convolutional neural network is used to extract features from the input monthly typical daily power flow distribution curve data of the power system. In the i-th convolutional layer of the one-dimensional convolutional neural network, the input monthly typical daily power flow distribution curve data $X_i = [x_{i1}, x_{i2}, \ldots, x_{it}]$ are convolved with a convolution kernel to obtain the output data $Y_i$. The t-th element $y_{it}$ of $Y_i$ is calculated as $y_{it} = F(W_i \cdot X_{it} + b_{it})$, where $W_i$ is the convolution kernel weight of the i-th layer and $b_{it}$ is the unit bias matrix of the i-th layer. The convolution kernel length is set to 2k+1, where k is any positive integer; the $X_{it}$ corresponding to $y_{it}$ is then $X_{it} = [x_{i(t-k)}, x_{i(t-k+1)}, \ldots, x_{i(t+k-1)}, x_{i(t+k)}]$. The convolution result is average-pooled, and the output of each pooling unit in the pooling layer is $\frac{1}{N}\sum_{j=1}^{N} x_{i,j}$, where N is the kernel size of the i-th pooling layer and $x_{i,j}$ is the j-th element of the input data $X_i$ within the i-th pooling unit. After pooling, the input monthly typical daily power flow distribution curve data are transformed into variables with salient features, which captures the effective time-series information they contain while reducing the number of units and variables in the neural network;
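The convolution and average pooling described above can be sketched as follows. Several details are assumptions made for the illustration: a single scalar bias replaces $b_{it}$, only positions with a full window of 2k+1 samples are computed (no padding), the activation is ReLU, and pooling windows do not overlap. The names `conv1d` and `average_pool` are hypothetical.

```python
def conv1d(x, kernel, bias, k):
    """y_t = F(W . [x_(t-k), ..., x_(t+k)] + b) for every t with a full window."""
    assert len(kernel) == 2 * k + 1
    out = []
    for t in range(k, len(x) - k):
        window = x[t - k:t + k + 1]
        pre = sum(w * v for w, v in zip(kernel, window)) + bias
        out.append(max(0.0, pre))          # ReLU activation (assumed)
    return out

def average_pool(y, n):
    """Mean of each non-overlapping block of n consecutive values."""
    return [sum(y[i:i + n]) / n for i in range(0, len(y) - n + 1, n)]

# Hypothetical power flow samples for one typical day.
flow = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
conv_out = conv1d(flow, kernel=[0.25, 0.5, 0.25], bias=0.0, k=1)
pooled = average_pool(conv_out, n=2)
```

With a kernel of length 3 the six samples give four convolution outputs, and pooling in pairs halves them again, which illustrates how pooling reduces the number of variables passed on to the probability layers.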
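[0185] The input data are fed into the probability layers of the Bayesian neural network, and variational inference is used to optimize the weights of the probability layers: a distribution q(W) with parameters θ = (μ, σ) is introduced to approximate the posterior distribution p(W|X,Y), and the weights of each layer follow the normal distribution $N(\mu, \sigma^2)$. The KL divergence is used to measure the difference between the two distributions, and the weight distribution is optimized by minimizing the KL divergence:
[0186] $$\theta^* = \arg\min_{\theta} D_{KL}\big(q(W) \,\|\, p(W|X,Y)\big)$$
[0187] In the formula, $\theta^*$ is the parameter value at which the KL divergence is minimized.
[0188] Expanding the KL divergence gives $$D_{KL}\big(q(W) \,\|\, p(W|X,Y)\big) = D_{KL}\big(q(W) \,\|\, p(W)\big) - E_{q(W)}[\log p(Y|X,W)] + \log p(Y|X)$$
[0189] The evidence lower bound is introduced: $$ELBO = E_{q(W)}[\log p(Y|X,W)] - D_{KL}\big(q(W) \,\|\, p(W)\big)$$ where $D_{KL}(q(W) \,\|\, p(W))$ is the KL divergence between the two distributions q(W) and p(W);
[0190] Since the KL divergence is non-negative, $\log p(Y|X) \ge ELBO$; because $\log p(Y|X)$ does not depend on θ, minimizing the KL divergence is equivalent to maximizing the evidence lower bound, and the objective is simplified to:
[0191] $$\theta^* = \arg\min_{\theta} E_{q(W)}[\log q(W) - \log p(W) - \log p(Y|X,W)]$$
[0192] where $E_{q(W)}[\log q(W) - \log p(W) - \log p(Y|X,W)]$ is the expectation of $\log q(W) - \log p(W) - \log p(Y|X,W)$ under the distribution q(W).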
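[0193] The weights are reparameterized: $\omega_i = \mu_i + \sigma_i \times \varepsilon_i$, where $\varepsilon_i \sim N(0,1)$; replacing ω with ε gives
[0194] $$\frac{\partial}{\partial\theta} E_{q(W)}[\log q(W) - \log p(W) - \log p(Y|X,W)] = E_{\varepsilon \sim N(0,1)}\left[\frac{\partial}{\partial\theta}\big(\log q(W) - \log p(W) - \log p(Y|X,W)\big)\right], \quad W = \mu + \sigma \times \varepsilon$$
[0195] The expectation of the derivative is estimated from several different samples $\varepsilon_i \sim N(0,1)$, thereby approximating the derivative of the KL divergence with respect to θ;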
[0196] After training, the Monte Carlo sampling method is used to sample the current power system infrastructure project planning data, and finally the predicted values of various effectiveness indicators of power system infrastructure project planning decisions are output.
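The reparameterized sampling and the Monte Carlo prediction step can be sketched on a single probabilistic linear layer. This is a toy stand-in for the trained network, assuming independent normal weights whose means and standard deviations have already been learned; the names `sample_weights` and `mc_predict` and all numeric values are hypothetical.

```python
import random
import statistics

def sample_weights(mu, sigma, rng):
    """Reparameterisation: w_i = mu_i + sigma_i * eps_i with eps_i ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def mc_predict(x, mu, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of the predictive mean and standard deviation."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        w = sample_weights(mu, sigma, rng)
        outputs.append(sum(wi * xi for wi, xi in zip(w, x)))
    return statistics.fmean(outputs), statistics.stdev(outputs)

# Hypothetical learned weight distribution and one input vector.
mean, spread = mc_predict(x=[1.0, 2.0], mu=[0.5, -0.2], sigma=[0.1, 0.05])
```

For this linear toy model the exact predictive mean is 0.1 and the exact standard deviation is about 0.141, so the sampled estimates can be compared against known values; the spread is what lets the model report uncertainty alongside each predicted indicator.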
[0197] The specific implementation includes the following steps:
[0198] A. Select a certain performance indicator and use the basic data of historical project planning and the data of that performance indicator to train the improved Bayesian neural network model;
[0199] B. Input the basic characteristics data of the current power system infrastructure project decision-making, and the data of factors that are strongly correlated with the selected performance indicators;
[0200] C. Starting from the earliest start time of the projects in the current project plan, and taking one month as the unit, let the initial k=1; input the typical daily power flow distribution curves of the corresponding month into the trained improved Bayesian neural network model, output the probability distribution of the selected performance indicator k months after the start of the current project plan, and take the corresponding mean as the prediction result.
[0201] D. Determine, based on the number of months elapsed, whether the current project plan has been fully completed:
[0202] If it is not fully completed, advance the time by one month and return to step C.
[0203] If it is fully completed, proceed to the next step;
[0204] E. Determine whether all performance indicators have been successfully projected:
[0205] If not all are completed, select the next performance indicator and repeat steps A through D;
[0206] If all are completed, the projected values of all performance indicators after the project planning and launch will be output monthly to complete the decision-making for the current project.
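Steps A to E amount to a nested loop over performance indicators and months. The sketch below shows only that control flow: `train_stub` is a hypothetical placeholder for training the improved Bayesian neural network, it returns a function that yields sampled predictions, and the indicator names and curve values are invented for the illustration.

```python
def roll_out(indicators, plan_months, train, monthly_curves):
    """Return {indicator: [mean prediction after 1, 2, ..., plan_months months]}."""
    results = {}
    for name in indicators:                       # step E: next indicator
        predict = train(name)                     # steps A and B
        means = []
        for k in range(1, plan_months + 1):       # steps C and D: month by month
            samples = predict(monthly_curves[k - 1])
            means.append(sum(samples) / len(samples))
        results[name] = means
    return results

def train_stub(name):
    """Hypothetical stand-in for a trained model; not a real BNN."""
    scale = {"supply_capacity_gain": 1.0,
             "capacity_load_ratio_compliance": 0.5}[name]
    def predict(curve):
        peak = max(curve)
        return [scale * peak * f for f in (0.9, 1.0, 1.1)]
    return predict

# Hypothetical typical daily power flow curves for months 1 and 2.
curves = [[0.4, 0.8, 0.6], [0.5, 1.0, 0.7]]
forecast = roll_out(
    ["supply_capacity_gain", "capacity_load_ratio_compliance"],
    plan_months=2, train=train_stub, monthly_curves=curves)
```

The result holds one monthly series of mean predictions per indicator, which corresponds to the monthly output described in step E.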
[0207] Figure 4 shows the functional modules of the system of the present invention. The system provided by the present invention for implementing the decision-making method for power system infrastructure projects with a high proportion of new energy access includes a data acquisition module, a data processing module, a progress curve calculation module, an influencing factor calculation module, and a decision-making module, which are connected in series. The data acquisition module acquires historical data of power system infrastructure projects and uploads the data to the data processing module. The data processing module cleans and expands the received historical data and uploads the data to the progress curve calculation module. The progress curve calculation module clusters the received data using a clustering algorithm, extracts typical project construction progress curves, and uploads the data to the influencing factor calculation module. The influencing factor calculation module uses the maximum mutual information coefficient method to calculate the strongly correlated influencing factors of each performance indicator from the received data and the typical project construction progress curves, and uploads the data to the decision-making module. The decision-making module builds a decision-making model for power system infrastructure projects based on a Bayesian neural network from the received data, trains and extrapolates each performance indicator, outputs the predicted values of all performance indicators, and completes the decision-making for power system infrastructure projects with a high proportion of new energy access.
Claims
1. A decision-making method for power system infrastructure projects with a high proportion of renewable energy integration, comprising the following steps: S1. Obtain historical data on power system infrastructure projects; S2. Clean and expand the historical data obtained in step S1; S3. Cluster the data obtained in step S2 using a clustering algorithm to extract the typical construction progress curve of the project; S4. Based on the typical construction progress curve of the project obtained in step S3, the strong correlation influencing factors of each performance indicator are calculated using the maximum mutual information coefficient method. S5. Construct a decision-making model for power system infrastructure projects based on Bayesian neural networks, train and extrapolate various performance indicators, output predicted values for all performance indicators, and complete the decision-making process for power system infrastructure projects with a high proportion of renewable energy integration; specifically including the following steps: Modeling is performed using a Bayesian neural network: A Bayesian neural network consists of an input layer, hidden layers, and an output layer; the hidden layers are probability layers, enabling the network to describe uncertainty, and the weights and biases in the probability layers are assumed to follow a normal distribution; the weight set of the probability layers is defined as follows. , Given the prior distribution and the observed data ; The input data includes basic project characteristic data, data on strongly correlated influencing factors corresponding to various performance indicators, and monthly typical daily power flow distribution curve data of the power system. 
Wherein the basic project characteristic data include the voltage level, engineering attributes, project area, construction scale, whether the project is located in an urban area, and whether it is a rigid project; the monthly typical-day power flow distribution curve data of the power system are obtained by simulation based on the power flow curves of the relevant nodes of the power system.

For the output data, which comprise the performance indicators of power system infrastructure project planning decisions, the BNN gives the following distributions:

$$P(w \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid w)\,P(w)}{P(\mathcal{D})}, \qquad P(y \mid x) = \mathbb{E}_{P(w \mid \mathcal{D})}\big[P(y \mid x, w)\big]$$

where $\mathcal{D}$ is the training data; $P(w \mid \mathcal{D})$ is the posterior distribution of the probability layer weights $w$; $P(w)$ is the prior distribution; $P(y \mid x, w)$ is the probability distribution of the output performance data given the probability layer weights and the input data; and $P(y \mid x)$ is the probability distribution of the output performance data given the input data.

The constructed Bayesian neural network model is improved as follows:

The input data are standardized and transformed into dimensionless, purely numerical data: raw data sequences $D = \{d_1, d_2, \dots, d_n\}$ with different characteristics are input in turn, and after Z-score normalization the result is

$$d_i' = \frac{d_i - \mu_D}{\sigma_D}$$

where $d_i'$ is the converted data, $d_i$ is the original data before conversion, $\mu_D$ is the mean of the elements of sequence $D$, and $\sigma_D$ is the standard deviation of the elements of sequence $D$; the processed data conform to a standard normal distribution with mean 0 and standard deviation 1.

A deep fully connected neural network is used to process the basic project characteristic data and the data of the strongly correlated influencing factors of each performance indicator: all input features are connected layer by layer, and each hidden layer has a nonlinear activation function $F$ that represents the relationship between the output of the neurons of the previous layer and the input of the neurons of the next layer. The output of the $i$-th layer is defined as

$$h_i = F(W_i x_i + b_i)$$

where $W_i$ is the weight of the $i$-th hidden layer, $x_i$ is the input to the $i$-th hidden layer, and $b_i$ is the bias of the $i$-th layer. A rectified linear unit, $F(x) = \max(0, x)$, is used as the nonlinear activation function. With the nonlinear activation function, the fully connected network extracts, layer by layer in the hidden layers, the nonlinear relationship between the input (the basic project characteristics and the strongly correlated influencing factors of each performance indicator) and the output performance, thereby providing more complete and effective information for subsequent model training.

A one-dimensional convolutional neural network is used to extract features from the monthly typical-day power flow distribution curve data of the power system: the curve data $x^i$ are input into the $i$-th convolutional layer, and after convolution by a convolution kernel the output data are $y^i$. The $t$-th element $y_t^i$ of $y^i$ is calculated as

$$y_t^i = \sum_{j=-k}^{k} w_j^i\, x_{t+j}^i + b^i$$

where $w^i$ is the convolution kernel weight of the $i$-th layer and $b^i$ is the bias of the $i$-th layer; the convolution kernel length is set to $2k+1$, where $k$ is any positive integer, so that $y_t^i$ corresponds to the inputs $x_{t-k}^i, \dots, x_{t+k}^i$. The convolution result is subjected to average pooling, and the output of each pooling unit in the pooling layer is

$$p^i = \frac{1}{s_i} \sum_{j=1}^{s_i} u_j^i$$

where $s_i$ is the kernel size of the $i$-th pooling layer and $u_j^i$ is the $j$-th element of the input data of the $i$-th pooling unit.

The processed data are fed into the probability layer of the Bayesian neural network, and variational inference is used to optimize the weights of the probability layer: a distribution $q(w \mid \theta)$ is introduced to approximate the posterior distribution $P(w \mid \mathcal{D})$, with corresponding parameters $\theta = (\mu, \sigma)$, and the weights of each layer follow the normal distribution $\mathcal{N}(\mu, \sigma^2)$. The KL divergence is used to measure the difference between the two distributions, and the weight distribution is optimized by minimizing the KL divergence:

$$\theta^* = \arg\min_{\theta}\, \mathrm{KL}\big[q(w \mid \theta)\,\|\,P(w \mid \mathcal{D})\big]$$

where $\mathrm{KL}[\cdot\,\|\,\cdot]$ denotes the KL divergence value.

To calculate this divergence, the evidence lower bound $\mathrm{ELBO}(\theta)$ is introduced:

$$\log P(\mathcal{D}) = \mathrm{ELBO}(\theta) + \mathrm{KL}\big[q(w \mid \theta)\,\|\,P(w \mid \mathcal{D})\big]$$

where the last term is the KL divergence between $q(w \mid \theta)$ and $P(w \mid \mathcal{D})$. Since the KL divergence is non-negative and $\log P(\mathcal{D})$ does not depend on $\theta$, minimizing the KL divergence is equivalent to maximizing the evidence lower bound, whose calculation formula simplifies to

$$\mathrm{ELBO}(\theta) = \mathbb{E}_{q(w \mid \theta)}\big[\log P(\mathcal{D} \mid w)\big] - \mathrm{KL}\big[q(w \mid \theta)\,\|\,P(w)\big]$$

where $\mathbb{E}_{q(w \mid \theta)}$ is the expectation under the distribution $q(w \mid \theta)$.

The weights are reparameterized as $w = \mu + \sigma \odot \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, I)$; after $w$ is replaced by $\varepsilon$, the expectation of the derivative is estimated from several different samples of $\varepsilon$, which approximates the derivative of the KL divergence with respect to $\theta$.

After training, the Monte Carlo sampling method is used to sample on the current power system infrastructure project planning data, and finally the predicted values of the performance indicators of the power system infrastructure project planning decision are output.
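The weight reparameterization and the Monte Carlo prediction step of claim 1 can be illustrated with a minimal sketch. It assumes a single linear output y = w · x in place of the full network, and the names `sample_weights` and `mc_predict` are hypothetical; it is not the claimed model, only an illustration of drawing weights as mean plus standard deviation times standard normal noise and averaging the sampled outputs.

```python
import random
import statistics


def sample_weights(mu, sigma, rng):
    """Reparameterized draw: w = mu + sigma * eps, with eps ~ N(0, 1) per weight."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def mc_predict(x, mu, sigma, n_samples=1000, seed=0):
    """Monte Carlo estimate of the predictive mean and spread.

    Assumption: the model is a single linear output y = w . x, standing in
    for the trained probability layer described in the claim.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        w = sample_weights(mu, sigma, rng)
        draws.append(sum(wi * xi for wi, xi in zip(w, x)))
    # The mean is used as the point prediction; the spread reflects uncertainty.
    return statistics.fmean(draws), statistics.pstdev(draws)


if __name__ == "__main__":
    mean, spread = mc_predict([1.0, 2.0], [0.5, -1.0], [0.1, 0.1])
    print(f"predicted mean={mean:.3f}, spread={spread:.3f}")
```

Because the noise is sampled separately from the parameters, gradients with respect to the mean and standard deviation can pass through each sampled weight, which is the reason the claim reparameterizes before differentiating.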
2. The decision-making method for power system infrastructure projects with a high proportion of renewable energy access as described in claim 1, characterized in that the historical data of power system infrastructure projects mentioned in step S1 specifically include a typical construction progress curve model of power system infrastructure projects.
3. The decision-making method for power system infrastructure projects with a high proportion of renewable energy access as described in claim 2, characterized in that the cleaning and expansion of the historical data obtained in step S1, as described in step S2, specifically includes the following steps:

The following steps are used for data cleaning:

The acquired construction progress curves are standardized to remove the limitation of units and convert them into dimensionless, purely numerical data. The original construction progress data sequence $T = \{t_1, t_2, \dots, t_n\}$ is input, and after Z-score normalization the result is

$$t_i' = \frac{t_i - \mu_T}{\sigma_T}$$

where $t_i'$ is the converted data, $t_i$ is the data before conversion, $\mu_T$ is the mean of the elements of sequence $T$, and $\sigma_T$ is the standard deviation of the elements of sequence $T$.

After conversion, the historical projects are classified by voltage level, project attributes, regional attributes, and construction scale. The progress data of all historical projects with the same attributes are grouped together, generating several datasets of different project types.

A KNN-based outlier detection algorithm is used to remove abnormal data from the construction progress curves of projects of the same type. The number of neighboring objects $k$ and the outlier threshold $\delta$ are set, and the construction progress datasets of the different project types are traversed to find the outliers in each dataset. In practice, for a data point $p$, the corresponding k-nearest-neighbor average distance $\bar{d}_k(p)$ is calculated as

$$\bar{d}_k(p) = \frac{1}{k} \sum_{o \in N_k(p)} d(p, o), \qquad N_k(p) \subseteq C$$

where $C$ is the set of all construction progress curve data points of historical projects whose outlier degree does not exceed the outlier threshold; $d(p, o)$ is the distance between data points $p$ and $o$; and $o$ is one of the $k$ neighboring data points $N_k(p)$ of data point $p$.

The k-nearest-neighbor average distance of each data point is then checked: if it is greater than the set outlier threshold $\delta$, the data point is judged to be an outlier and placed in the outlier set. Finally, the original construction progress curves corresponding to the data points in the outlier set are removed from the dataset.

The following steps are used for data expansion:

The construction progress curve dataset after data cleaning is $S = \{T_1, T_2, \dots, T_N\}$, where each construction progress curve corresponds to a time series $T_i$. All sequences are weighted to obtain the weighted time series set $D = \{(T_1, w_1), (T_2, w_2), \dots, (T_N, w_N)\}$, where $w_i$ is the weight of $T_i$. The weighted average $\bar{T}$ of $D$ is calculated and used as a new synthetic construction progress curve; $\bar{T}$ minimizes the sum of squared DTW distances to the sequences of the weighted time series set $D$:

$$\bar{T} = \arg\min_{T \in E} \sum_{i=1}^{N} w_i\, \mathrm{DTW}^2(T, T_i)$$

where $E$ is the sample space of construction progress curves and $\mathrm{DTW}(T, T_i)$ is the DTW distance between $T$ and $T_i$.

In the actual calculation, the DBA iterative algorithm is used: a time series is randomly selected from $D$ as the initial average sequence; the coordinates of the current average sequence are matched with the coordinates of each sequence in $D$, and the sum of squared DTW distances from the current average sequence to each sequence is calculated; the coordinates of the average sequence are then updated to the mean of the coordinates of all matched sequences. The average sequence is updated iteratively until the calculated sum of squared DTW distances no longer decreases. The final average time series is taken as the new synthetic construction progress curve and added to the set $S$. By changing the weight values in $D$, several different weighted time series sets are obtained, which in turn generate several average time series, thereby expanding the original construction progress curves.
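The KNN-based cleaning step of claim 3 can be sketched as follows. This is a minimal illustration, assuming that each construction progress curve is a fixed-length list of normalized values and that the distance is Euclidean; the function names, the choice of distance, and the example threshold are assumptions, not details taken from the claim.

```python
import math


def knn_mean_distance(curves, k):
    """Mean distance from each curve to its k nearest neighbours.

    Assumption: Euclidean distance between equal-length curves.
    """
    scores = []
    for i, p in enumerate(curves):
        dists = sorted(math.dist(p, q) for j, q in enumerate(curves) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def remove_outliers(curves, k, threshold):
    """Split curves into kept curves and outliers by the k-NN mean distance."""
    scores = knn_mean_distance(curves, k)
    kept = [c for c, s in zip(curves, scores) if s <= threshold]
    outliers = [c for c, s in zip(curves, scores) if s > threshold]
    return kept, outliers


if __name__ == "__main__":
    curves = [
        [0.0, 0.50, 1.0],
        [0.0, 0.52, 1.0],
        [0.0, 0.48, 1.0],
        [0.0, 0.05, 1.0],  # progress far behind the other projects
    ]
    kept, outliers = remove_outliers(curves, k=2, threshold=0.1)
    print(len(kept), outliers)
```

The threshold trades off sensitivity against data loss: a smaller value removes more curves before the expansion step, so in practice it would be tuned per project type.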
4. The decision-making method for power system infrastructure projects with a high proportion of renewable energy access as described in claim 3, characterized in that the use of a clustering algorithm in step S3 to cluster the data obtained in step S2 and extract the typical construction progress curve of the project specifically includes the following steps:

The data obtained in step S2 are clustered using the AutoEncoder deep embedded clustering model. The model includes autoencoder layers for learning an initial compressed feature representation of the unlabeled dataset; a clustering layer is stacked on top of the autoencoder layers to assign the output of the encoding layer to clusters.

First, the stacked autoencoder layers are initialized with a greedy layer-wise training strategy, each autoencoder layer being trained independently and without supervision. An autoencoder layer is defined as follows:

$$\tilde{x} \sim \mathrm{Dropout}(x), \qquad h = g_1(W_1 \tilde{x} + b_1), \qquad \tilde{h} \sim \mathrm{Dropout}(h), \qquad y = g_2(W_2 \tilde{h} + b_2)$$

where $\tilde{x}$ is the construction progress data after random mapping; $\mathrm{Dropout}(\cdot)$ is a random mapping that randomly sets a portion of the input dimensions to 0; $x$ is the input construction progress data; $h$ is the transformed hidden representation; $g_1$ is the activation function of the encoder layer; $W_1$, $b_1$, $W_2$, and $b_2$ are model parameters; $\tilde{h}$ is the hidden representation after random mapping; $y$ is the construction progress data output after encoding and decoding; and $g_2$ is the activation function of the decoder layer.

The hidden layer output of the previous autoencoder layer is used as the input of the next autoencoder layer, and the backpropagation algorithm is used to minimize the mean square error $\lVert x - y \rVert_2^2$. By continuously training the network parameters, the weights and biases of each layer are obtained, ultimately minimizing the error between the input and the reconstruction result.

After the stacked autoencoder layers have been trained layer by layer, all the encoder layers are connected, followed by the decoder layers in reverse order, forming a multi-layer deep autoencoder, which is then fine-tuned to minimize the loss function.

The construction progress curve set is input into the initialized stacked autoencoder, and only the encoding layers are used for encoding. The input construction progress data are nonlinearly mapped to the hidden layer and thus embedded into a space of another dimension, which reduces the data dimensionality and yields the feature information of the original construction progress data, forming the initial mapping between the original data space and the feature space of the construction progress curves, $f_\theta: X \to Z$, where $X$ is the original data space, $Z$ is the feature space, and $\theta$ are the parameters to be learned.

After the low-dimensional mapped data have been obtained by passing the construction progress data through the initialized autoencoder layers, K-Means clustering is first performed in the feature space $Z$ to initialize the cluster centroids: centroids are computed as the arithmetic mean of the corresponding coordinates of the sequences, and the set of $n$ construction progress curves is clustered into $k$ clusters, each cluster being represented by a centroid $\mu_j$, where $j = 1, 2, \dots, k$.

After the initial estimate of the centroids has been obtained, an unsupervised algorithm is used to train the clustering model of the construction progress curves: the algorithm alternates between two steps until the set convergence condition is met.
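The centroid initialization at the end of claim 4 (K-Means in the feature space, with each centroid taken as the coordinate-wise arithmetic mean of its members) can be sketched as below. The sketch assumes that the encoder output is already available as a list of low-dimensional feature vectors; the evenly spaced seeding rule and the function name `kmeans` are illustrative assumptions, since the claim does not state how the first centroids are chosen.

```python
import math


def kmeans(features, k, max_iter=100):
    """Plain K-Means on low-dimensional feature vectors.

    Assumption: initial centroids are evenly spaced picks from the input,
    chosen here only to keep the sketch deterministic.
    """
    n = len(features)
    centroids = [list(features[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(z, centroids[j]))
            for z in features
        ]
        # Update step: centroid = arithmetic mean of member coordinates.
        new_centroids = []
        for j in range(k):
            members = [z for z, lab in zip(features, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    z = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(z, 2))
```

These centroids only serve as a starting point; in the method they are subsequently refined together with the encoder parameters.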
5. The decision-making method for power system infrastructure projects with a high proportion of renewable energy access as described in claim 4, characterized in that the use of a clustering algorithm to cluster the data obtained in step S2 and extract the typical construction progress curve of the project specifically includes the following steps:

The soft assignment between the low-dimensional feature representation of the construction progress curves mapped by $f_\theta$ and the cluster centroids is calculated:

$$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha + 1}{2}}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha + 1}{2}}}$$

where $q_{ij}$ is the probability that the input construction progress curve $x_i$ belongs to cluster $j$; $z_i = f_\theta(x_i)$ is the low-dimensional feature vector obtained by nonlinearly mapping the input construction progress curve $x_i$, and the closer $z_i$ is to the centroid $\mu_j$, the higher the probability that $x_i$ belongs to cluster $j$; $\lVert z_i - \mu_j \rVert$ is the distance between $z_i$ and $\mu_j$; $\alpha$ is the number of degrees of freedom of the t-distribution; and $\mu_j$ is the centroid of cluster $j$.

The mapping $f_\theta$ is updated and the cluster centroids are corrected by learning from the current high-confidence assignments through an auxiliary target distribution. In the auxiliary target distribution function, $q_{ij}$ is raised to the second power to improve the clustering accuracy:

$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \qquad f_j = \sum_{i} q_{ij}$$

where $p_{ij}$ is the auxiliary target distribution function; $f_j$ is the sum of the probabilities with which the soft assignment assigns the $n$ low-dimensional construction progress curve feature vectors to cluster $j$; $q_{ij'}$ is the probability that the input construction progress curve $x_i$ belongs to cluster $j'$; and $f_{j'}$ is the sum of the probabilities with which the soft assignment assigns the $n$ low-dimensional construction progress curve feature vectors to cluster $j'$.

The deep embedded clustering model is trained by matching the soft assignment of the construction progress curves to the target distribution; the KL divergence between the soft assignment $q_{ij}$ and the auxiliary distribution $p_{ij}$ is defined as the loss function to be minimized:

$$L = \mathrm{KL}(P \,\|\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

where $L$ is the loss function.

The network is trained with the KL divergence while the cluster centroids $\mu_j$ and the autoencoder parameters $\theta$ are optimized jointly, which improves the accuracy of clustering and assigning the construction progress curves. The gradients of $L$ with respect to the low-dimensional feature vector $z_i$ of a construction progress curve and the cluster centroid $\mu_j$ are calculated as follows:

$$\frac{\partial L}{\partial z_i} = \frac{\alpha + 1}{\alpha} \sum_{j} \left(1 + \frac{\lVert z_i - \mu_j \rVert^2}{\alpha}\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j)$$

$$\frac{\partial L}{\partial \mu_j} = -\frac{\alpha + 1}{\alpha} \sum_{i} \left(1 + \frac{\lVert z_i - \mu_j \rVert^2}{\alpha}\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j)$$

where $\alpha$ is the number of degrees of freedom of the t-distribution. The gradient $\partial L / \partial z_i$ is passed to the encoding network of the stacked autoencoder and used to compute the gradients of the network parameters $\partial L / \partial \theta$ during standard backpropagation.

When the change in the centroids between two consecutive iterations is less than a set value, the iteration is stopped and the optimal cluster centroids are obtained; finally, the centroids are decoded by the stacked decoder to generate the typical construction progress curve model.
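The two quantities that claim 5 alternates between, the Student-t soft assignment and the squared-and-renormalized auxiliary target distribution, together with the KL loss that compares them, can be sketched as below. The function names are hypothetical, and the default of one degree of freedom is an assumption for illustration; the claim does not fix a value.

```python
import math


def soft_assign(z, centroids, alpha=1.0):
    """Student-t soft assignment of each feature vector to each centroid."""
    q = []
    for zi in z:
        kern = [
            (1.0 + math.dist(zi, mu) ** 2 / alpha) ** (-(alpha + 1.0) / 2.0)
            for mu in centroids
        ]
        total = sum(kern)
        q.append([v / total for v in kern])
    return q


def target_distribution(q):
    """Auxiliary target: square q, divide by the soft cluster frequency, renormalize."""
    freq = [sum(col) for col in zip(*q)]  # sum over samples for each cluster
    p = []
    for row in q:
        w = [qij ** 2 / fj for qij, fj in zip(row, freq)]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_loss(p, q):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


if __name__ == "__main__":
    z = [[0.0, 0.0], [0.2, 0.0], [4.0, 0.0]]
    centroids = [[0.0, 0.0], [4.0, 0.0]]
    q = soft_assign(z, centroids)
    p = target_distribution(q)
    print(q[0], p[0], kl_loss(p, q))
```

Squaring sharpens assignments that are already confident, and dividing by the soft cluster frequency keeps large clusters from dominating the target.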
6. The decision-making method for power system infrastructure projects with a high proportion of renewable energy access as described in claim 5, characterized in that the calculation in step S4 of the strongly correlated influencing factors of each performance indicator, based on the typical project construction progress curve obtained in step S3 and using the maximum mutual information coefficient method, specifically includes the following steps:

The factors influencing the planning effectiveness of power system infrastructure projects are analyzed from three dimensions: source-grid construction sequence, network topology, and seasonal factors. The influencing factors include the grid connection and commissioning time of new energy sources, the synchronous commissioning rate, the access location of new energy, the balance of the power distribution, and the winter and summer load growth demand. The performance indicators considered include the power supply capacity improvement rate, the capacity-to-load ratio compliance rate, the number of line N-1 problems resolved, the number of heavily loaded or overloaded lines relieved, the number of main transformer N-1 problems resolved, and the number of surrounding substations relieved of heavy load or overload.

Using the maximum mutual information coefficient method, the data sequence set of influencing factors $X = \{X_1, X_2, \dots, X_m\}$ and the data sequence set of performance indicators $Y = \{Y_1, Y_2, \dots, Y_n\}$ are established.

The mutual information between each influencing factor and each performance indicator is calculated; it represents the reduction in the uncertainty of the performance indicator brought about by the information in the influencing factor, and is obtained as the difference between the information entropy and the conditional entropy:

$$I(X_i; Y_j) = H(Y_j) - H(Y_j \mid X_i) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$

where $X_i$ is the $i$-th influencing factor sequence; $Y_j$ is the $j$-th performance indicator sequence; $I(X_i; Y_j)$ is the mutual information of $X_i$ and $Y_j$; $H(\cdot)$ is the information entropy; $H(\cdot \mid \cdot)$ is the conditional entropy; $p(x)$ is the probability that the value of sequence $X_i$ is $x$; $p(y)$ is the probability that the value of sequence $Y_j$ is $y$; and $p(x, y)$ is the corresponding joint probability.

The data are gridded with grid size $n_x \times n_y$, where $n_x$ is the number of grid cells along the x-axis and $n_y$ is the number of grid cells along the y-axis. Under the constraint $n_x n_y < B$, the grid resolution that maximizes the mutual information is sought, and the maximum mutual information value is normalized to give the MIC value:

$$\mathrm{MIC}(X_i, Y_j) = \max_{n_x n_y < B} \frac{I(X_i; Y_j)}{\log_2 \min(n_x, n_y)}$$

where $B$ is the sample data volume raised to the power of 0.6.

The MIC values between the different influencing factors and the different performance indicators are calculated in turn to obtain the correlation characteristics $\mathrm{MIC}(X_i, Y_j)$. For each performance indicator, the larger the MIC value, the greater the influence of the influencing factor on that performance indicator and the stronger the correlation; finally, the strongly correlated influencing factors of each performance indicator are identified from the correlation characteristics.
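The grid search of claim 6 can be approximated with the sketch below. It is a deliberate simplification: it only tries equal-width grids, whereas a full MIC computation also optimizes where the grid lines are placed, so the value it returns is a lower-bound style estimate rather than the exact coefficient. The function names and the equal-width binning are assumptions made for illustration.

```python
import math
from collections import Counter


def grid_mutual_information(x, y, nx, ny):
    """Mutual information (in bits) of x and y on an equal-width nx-by-ny grid."""
    def bins(values, n):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n or 1.0  # constant sequence: one bin
        return [min(int((v - lo) / width), n - 1) for v in values]

    bx, by = bins(x, nx), bins(y, ny)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        c / n * math.log2(c * n / (px[i] * py[j])) for (i, j), c in pxy.items()
    )


def mic_equal_width(x, y, exponent=0.6):
    """Largest normalized mutual information over grids with nx * ny < n ** exponent."""
    budget = len(x) ** exponent
    best = 0.0
    nx = 2
    while nx * 2 < budget:
        ny = 2
        while nx * ny < budget:
            mi = grid_mutual_information(x, y, nx, ny)
            best = max(best, mi / math.log2(min(nx, ny)))
            ny += 1
        nx += 1
    return best


if __name__ == "__main__":
    factor = list(range(100))
    indicator = [2 * t + 1 for t in factor]
    print(mic_equal_width(factor, indicator))
```

Ranking the influencing factors of one performance indicator then amounts to calling the function once per factor and sorting by the returned value; a dedicated MIC library would be preferable when exact values matter.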
7. The decision-making method for power system infrastructure projects with a high proportion of renewable energy access as described in claim 6, characterized in that the use of the Monte Carlo sampling method to sample on the current power system infrastructure project planning data and finally output the predicted values of the performance indicators of the power system infrastructure project planning decision specifically includes the following steps:
A. Select one performance indicator, and train the improved Bayesian neural network model with the basic data of historical project planning and the data of that performance indicator;
B. Input the basic characteristic data of the current power system infrastructure project decision, together with the data of the influencing factors that are strongly correlated with the selected performance indicator;
C. Starting from the earliest project start time in the current project plan and counting in months, with the initial value k=1, input the typical-day power flow distribution curve of the corresponding month into the improved Bayesian neural network model; after training, output the probability distribution of the selected performance indicator k months after the start of the current project plan, and take the corresponding mean as the prediction result;
D. Determine, month by month, whether the current project plan has been fully completed: if it has not, advance the time by one month and return to step C; if it has, proceed to the next step;
E. Determine whether all performance indicators have been predicted: if not, select the next performance indicator and repeat steps A to D; if so, output the monthly predicted values of all performance indicators after the start of the project plan, completing the decision for the current project.
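The control flow of steps A to E in claim 7 can be sketched as two nested loops, one over performance indicators and one over months. Everything model-related is abstracted away: `train_model` is a hypothetical callable that returns a sampler for one indicator, and the sampler is assumed to return Monte Carlo draws of that indicator for month k. The sketch shows only how the monthly means would be collected.

```python
from statistics import fmean


def forecast_all_indicators(indicators, train_model, base_features,
                            strong_factors, monthly_flows):
    """Collect the monthly predicted mean of every performance indicator.

    Assumptions: train_model(name) returns a callable
    predict(inputs, flow, k) that yields Monte Carlo samples, and
    monthly_flows holds one typical-day power flow curve per project month.
    """
    forecasts = {}
    for name in indicators:                                  # step E: next indicator
        predict = train_model(name)                          # step A: train per indicator
        inputs = (base_features, strong_factors[name])       # step B: project inputs
        monthly_means = []
        for k, flow in enumerate(monthly_flows, start=1):    # steps C and D: month loop
            samples = predict(inputs, flow, k)
            monthly_means.append(fmean(samples))             # mean as the prediction
        forecasts[name] = monthly_means
    return forecasts


if __name__ == "__main__":
    def stub_train(name):
        # Hypothetical stand-in for a trained model: two fixed "samples" per month.
        return lambda inputs, flow, k: [float(k), float(k) + 2.0]

    result = forecast_all_indicators(
        ["supply_capacity_gain"], stub_train, {"voltage_kv": 220},
        {"supply_capacity_gain": [0.3]}, [[1.0, 2.0], [1.5, 2.5]],
    )
    print(result)
```

Training one model per indicator, as step A describes, keeps the strongly correlated factor set specific to each indicator at the cost of repeated training runs.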
8. A system for implementing the decision-making method for power system infrastructure projects with a high proportion of renewable energy access as described in any one of claims 1 to 7, characterized in that it includes a data acquisition module, a data processing module, a progress curve calculation module, an influencing factor calculation module, and a decision-making module, which are connected in series. The data acquisition module acquires historical data of power system infrastructure projects and passes the data to the data processing module. The data processing module cleans and expands the received historical data and passes the data to the progress curve calculation module. The progress curve calculation module clusters the received data with a clustering algorithm, extracts the typical construction progress curve of the project, and passes the data to the influencing factor calculation module. The influencing factor calculation module calculates, from the received data and the obtained typical construction progress curve of the project, the strongly correlated influencing factors of each performance indicator using the maximum mutual information coefficient method, and passes the data to the decision-making module. The decision-making module constructs the decision-making model for power system infrastructure projects from the received data on the basis of a Bayesian neural network, trains the model and predicts each performance indicator, outputs the predicted values of all performance indicators, and completes the decision-making for power system infrastructure projects with a high proportion of renewable energy access.