Method and system for extracting heterogeneous data features in typhoon operation scenarios of a power distribution network

By constructing a parallel dual-branch extraction path that uses an improved semi-supervised Laplace algorithm and a hierarchical mining strategy, the method extracts multi-source heterogeneous data features of the power distribution network under typhoon scenarios. This addresses the reliance on a single data source and improves the accuracy of feature extraction and the applicability of the model.

CN122286271A · Pending · Publication Date: 2026-06-26 · WENZHOU ELECTRIC POWER BUREAU +1

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Application (China)
Current Assignee / Owner: WENZHOU ELECTRIC POWER BUREAU
Filing Date: 2026-05-27
Publication Date: 2026-06-26

Abstract

This invention relates to the field of power system automation technology and discloses a method and system for feature extraction from heterogeneous data in a distribution network operating scenario under typhoon conditions. It constructs a multi-source heterogeneous dataset based on multi-source heterogeneous data from the distribution network under typhoon conditions; designs a parallel dual-branch extraction path, where the continuous feature extraction path extracts a subset of continuous features from the multi-source heterogeneous dataset using an improved semi-supervised Laplace algorithm incorporating meteorological intensity difference weights and topological constraints; and the discrete feature extraction path extracts a subset of discrete features from the multi-source heterogeneous dataset using a hierarchical mining strategy based on globally common and locally rare features. The continuous and discrete feature subsets are then fused to obtain the final set of key fault features. This effectively integrates multi-source heterogeneous data, overcoming the problems of scarce fault samples and missing physical mechanisms, and significantly improving the accuracy, robustness, and interpretability of fault feature extraction in distribution networks under typhoon conditions.

Description

Technical Field

[0001] This invention relates to the field of power system automation technology, and in particular to a method and system for extracting features from heterogeneous data in a typhoon operation scenario of a power distribution network.

Background Technology

[0002] As the scale of power distribution networks continues to expand, the impact of severe weather events such as typhoons on the safe operation of power distribution networks is becoming increasingly prominent. Therefore, extracting key features that are highly correlated with faults from relevant data of power distribution networks to predict the risk of power distribution network faults is of great significance for improving the disaster resistance capability of power grids and the level of operation and maintenance decision-making.

[0003] Existing research on feature extraction and selection mainly focuses on power system stability analysis or modeling a single data source for a local object within a power system. For example, for problems such as power system transient stability analysis, transformer fault diagnosis, and wind farm and photovoltaic power prediction, existing studies typically utilize only equipment operation data or a single type of time series data, extracting features through statistical analysis or machine learning methods. While this type of research is effective in its specific application scenarios, its research objects are mostly single devices or local systems, and the data sources are relatively limited. Furthermore, for feature extraction in the context of the overall operation of the distribution network under extreme weather conditions such as typhoons, considering only grid operation data or a single type of monitoring information is insufficient to reflect the overall state characteristics of the distribution network under complex operating scenarios.

[0004] Therefore, how to solve the problem of limited data sources and accuracy in existing typhoon scenario feature extraction and feature selection schemes for power distribution networks has become a technical problem that urgently needs to be solved by those skilled in the art.

Summary of the Invention

[0005] This invention provides a method and system for extracting features from heterogeneous data in a power distribution network operating scenario during a typhoon, solving the problem that existing feature extraction and selection schemes for power distribution networks in typhoon scenarios suffer from limited data sources and accuracy.

[0006] To address the aforementioned technical problems, the first aspect of this invention provides a method for extracting features from heterogeneous data in a typhoon-affected power distribution network, comprising: acquiring multi-source heterogeneous data of the power distribution network under typhoon scenarios, and processing the multi-source heterogeneous data to form a multi-source heterogeneous dataset; constructing a parallel dual-branch extraction path that includes a continuous feature extraction path and a discrete feature extraction path, wherein the continuous feature extraction path extracts features from the multi-source heterogeneous dataset using an improved semi-supervised Laplace algorithm that incorporates meteorological intensity difference weights and the topological constraints of the distribution network, to obtain a continuous feature subset, and the discrete feature extraction path extracts features from the multi-source heterogeneous dataset using a hierarchical mining strategy based on globally common and locally rare features, to obtain a discrete feature subset; and fusing the continuous feature subset and the discrete feature subset to obtain the final set of key fault features.

[0007] A second aspect of the present invention provides a heterogeneous data feature extraction system for a power distribution network operating scenario during a typhoon, comprising: a dataset construction module, used to acquire multi-source heterogeneous data of the power distribution network under typhoon scenarios and process the multi-source heterogeneous data to form a multi-source heterogeneous dataset; a feature extraction module, used to construct a parallel dual-branch extraction path that includes a continuous feature extraction path and a discrete feature extraction path, wherein the continuous feature extraction path extracts features from the multi-source heterogeneous dataset using an improved semi-supervised Laplace algorithm that incorporates meteorological intensity difference weights and the topological constraints of the distribution network, obtaining a continuous feature subset, and the discrete feature extraction path extracts features from the multi-source heterogeneous dataset using a hierarchical mining strategy based on globally common and locally rare features, obtaining a discrete feature subset; and a feature fusion module, used to fuse the continuous feature subset and the discrete feature subset to obtain the final set of key fault features.

[0008] Compared with the prior art, the beneficial effects of the embodiments of the present invention are as follows. The solution incorporates power grid operation data, static structural parameters, geographic information, meteorological data, and user load data, covering the key factors affecting distribution network faults under typhoon disasters. By processing the multi-source data uniformly, it resolves the problems of diverse data types, inconsistent formats, and significant differences in spatiotemporal scales, laying a high-quality data foundation for subsequent feature extraction. A parallel dual-branch extraction path is constructed that applies different algorithms to continuous and discrete features, avoiding the limitations of a single feature extraction method when processing certain types of information. Specifically, the continuous feature extraction path uses an improved semi-supervised Laplace algorithm with meteorological intensity difference weights and distribution network topology constraints, capturing the nonlinear changes in the impact of meteorological factors on the power grid and using the topology constraints to strengthen the correlation between features and faults. Based on the correlation of fault locations, key continuous features can be extracted reliably even in typhoon scenarios with limited labeled data. The discrete feature extraction path adopts a hierarchical mining strategy that combines globally common and locally rare features. This strategy identifies information in discrete features that is globally prevalent but easily overlooked, while also mining features that are locally rare but fault-sensitive, thus improving the representational power of the discrete features. Fusing the continuous and discrete feature subsets into the final set of key fault features balances the complementarity and synergy between features, improving generalization ability and interpretability in typhoon scenarios. The solution does not rely on a large number of manually labeled samples: by adopting semi-supervised and hierarchical mining strategies, it can still operate effectively when actual typhoon data is difficult to obtain and labels are scarce, demonstrating good feasibility and engineering applicability.

Description of the Drawings

[0009] To more clearly illustrate the technical solution of the present invention, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0010] Figure 1 is a flowchart of a method for extracting features of heterogeneous data in a typhoon operation scenario of a power distribution network, provided by an embodiment of the present invention; Figure 2 is a structural diagram of a heterogeneous data feature extraction system for a power distribution network operating scenario during a typhoon, provided by an embodiment of the present invention. Reference numerals: 10 is the dataset construction module; 20 is the feature extraction module; and 30 is the feature fusion module.

Detailed Implementation

[0011] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings and examples. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. The purpose of providing these embodiments is to make the disclosure of the present invention more thorough and comprehensive. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.

[0012] In the description of this application, the terms "first," "second," "third," etc., are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of indicated technical features. Thus, a feature defined with "first," "second," "third," etc., may explicitly or implicitly include one or more of that feature. In the description of this application, unless otherwise stated, "a plurality of" means two or more. The term "and / or" as used herein includes any and all combinations of one or more of the associated listed items. Those skilled in the art will be able to understand the specific meaning of the above terms in this application according to the specific circumstances.

[0013] In the description of this application, it should be noted that, unless otherwise defined, all technical and scientific terms used in this invention have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in this specification is merely for describing specific embodiments and is not intended to limit the invention. Those skilled in the art can understand the specific meaning of the above terms in this application based on the specific circumstances.

[0014] In one embodiment, as shown in Figure 1, the first aspect of the present invention provides a method for extracting features from heterogeneous data in a typhoon operation scenario of a power distribution network, comprising: S1. Acquire multi-source heterogeneous data of the power distribution network under typhoon scenarios, and process the multi-source heterogeneous data to form a multi-source heterogeneous dataset; the multi-source heterogeneous data includes power grid operation data, power grid static structural parameters, geographic information data, meteorological data, and user load data. Specifically, this invention defines the overall operating state of the distribution network at a certain moment or within a certain time window under a typhoon scenario as an operating scenario sample. Each operating scenario sample corresponds to one row of data in the dataset. A sample does not focus on a single node or a single line; it treats the entire distribution network as the object of study and comprehensively characterizes the state of the distribution network at that moment under the combined influence of structure, operation, and external environment.

[0015] The data sources for the operating scenario samples include historical data and predictive data. Historical data comprises: historical SCADA monitoring data (power grid operation data, i.e., the electrical quantities and operating status of each node and line during operation, such as node voltage amplitude, voltage phase, active power, reactive power, line power flow, and fault alarm data); historical AMI monitoring data from the electricity consumption information collection system (user load data, i.e., historical load curves on the user side); historical meteorological observation data; static structural parameters of the power grid obtained from power grid ledger information (describing the inherent structural and environmental attributes of the distribution network, including but not limited to line length, line type, and commissioning time; number of towers, tower material, and tower foundation type; whether nodes have undergone structural reinforcement; and the location, quantity, and topology of distribution network equipment); geographic information data (such as the underlying surface type along the line, surface roughness, elevation, slope, and slope position); and the corresponding power grid operation result information. In the historical data, each operating scenario sample can be labeled as a fault state or a normal state according to the actual operation results, serving as a labeled sample for subsequent feature selection and model training. Predictive data includes future distribution network operation data obtained from power grid operation prediction models (such as ARIMA), typhoon weather prediction data obtained from meteorological prediction models (such as NWP), and power grid structure parameters and geographical data. The power grid structure parameters and geographical data are static features and do not differ significantly from the historical data. The predicted operation data is used only to construct samples of future operating scenarios; these samples carry no fault or normal labels and participate in feature space construction and analysis as unlabeled samples.

[0016] This invention introduces historical data and predicted data simultaneously, so that the resulting multi-source heterogeneous data can reflect the typical operational characteristics of past failures and cover future typhoon operation scenarios, thereby improving the engineering applicability of feature extraction results.

[0017] In one embodiment, processing the multi-source heterogeneous data to form a multi-source heterogeneous dataset includes: taking each electrical quantity in the power grid operation data as an operation feature, and unifying the feature dimensions of the operation features to obtain power grid operation feature data; fusing the static structural parameters of the power grid and the geographic information data on a feeder-by-feeder basis to obtain a feeder-level integrated static feature vector; extracting target meteorological data from the meteorological data based on the impact of the typhoon, and processing the target meteorological data using the most severe meteorological condition representation method to obtain the network-wide meteorological features; aggregating the user load data on a per-feeder basis and clustering the aggregation results to obtain user load characteristics; and combining the power grid operation feature data, the feeder-level integrated static feature vector, the network-wide meteorological features, and the user load characteristics to form the multi-source heterogeneous dataset.

[0018] Specifically, this invention takes an electrical quantity in the power grid operation data as a power grid operation feature, comprehensively considers the electrical quantities dispersed on various nodes and lines to construct power grid operation data, and makes the data dimensions consistent between samples of different operation scenarios to obtain power grid operation feature data.

[0019] Although the static structural parameters of the power grid and the geographic information data are static and typically do not change over time, they significantly affect the disaster resistance and failure probability of the distribution network under typhoon disasters. Therefore, during dataset construction, this invention uses GIS technology to spatially match the geographic information data with the distribution network topology, mapping the geographic environmental features of lines and nodes onto the corresponding power grid structural units to obtain node-level and line-level object feature data. Distribution networks typically contain a large number of nodes and lines; directly incorporating all the original features of each node and line into the operating scenario samples would result in excessively high feature dimensionality. This invention therefore proposes a feature construction method based on the fusion of object-level risk characterization and feeder-level statistics, which fuses static data on a feeder-by-feeder basis and converts node-level and line-level features into unified feeder-level comprehensive features.

[0020] In one embodiment, fusing the power grid static structural parameters and the geographic information data on a feeder-by-feeder basis to obtain a feeder-level integrated static feature vector includes: constructing, based on the static structural parameters of the power grid and the geographic information data, an object-level risk representation vector for each node or line object in the distribution network, the vector including a structural vulnerability indicator, a geographic exposure indicator, and a topological criticality indicator; calculating, for all objects within the same feeder, the statistical features of the object-level risk representation vectors, and determining the proportion of objects at each risk level within each feeder by clustering and binning to obtain the risk distribution features; and combining the statistical features and the risk distribution features to form the feeder-level integrated static feature vector.

[0021] In this embodiment, an object-level risk representation vector is first constructed for each node object or each line object. Let the basic feature vector of the i-th object (node or line) be $x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,d}]$, where $d$ is the original static feature dimension.

[0022] Based on these basic feature vectors, three indicators are calculated: a structural vulnerability indicator, reflecting the disaster resistance of the equipment structure; a geographic exposure indicator, characterizing the degree to which the equipment's environment is affected by typhoons; and a topological criticality indicator, reflecting the importance of the equipment in the distribution network structure. Combining these indicators yields the object-level risk characterization vector. The structural vulnerability indicator is calculated as a weighted sum of four terms: the service age ratio of the equipment, i.e., the ratio of its service life to its design life; the pole material coefficient (steel poles 0.4, concrete poles 0.6, wooden poles 0.9); the foundation type coefficient (pile foundation 0.3, isolated foundation 0.5, shallow foundation 0.8); and the structural reinforcement coefficient (0.2 if reinforced, 1.0 if unreinforced). The four weights sum to 1 and are preferably 0.3, 0.25, 0.25, and 0.2.
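The structural vulnerability calculation can be sketched as follows. This is a minimal illustration, not the patented formula: it assumes the indicator is a plain weighted sum and that the preferred weights apply to the four terms in the order they are listed (age ratio, pole material, foundation type, reinforcement state). The function and dictionary names are hypothetical.

```python
# Coefficient tables quoted from paragraph [0022].
MATERIAL = {"steel": 0.4, "concrete": 0.6, "wood": 0.9}
FOUNDATION = {"pile": 0.3, "isolated": 0.5, "shallow": 0.8}
REINFORCEMENT = {True: 0.2, False: 1.0}


def structural_vulnerability(service_years, design_years, material,
                             foundation, reinforced,
                             weights=(0.3, 0.25, 0.25, 0.2)):
    """Weighted sum of the four terms (the linear form is an assumption)."""
    terms = (
        service_years / design_years,   # service age ratio
        MATERIAL[material],             # pole material coefficient
        FOUNDATION[foundation],         # foundation type coefficient
        REINFORCEMENT[reinforced],      # reinforcement state coefficient
    )
    return sum(w * t for w, t in zip(weights, terms))


# A 20-year-old unreinforced wooden pole on a shallow foundation,
# with an assumed 40-year design life.
score = structural_vulnerability(20, 40, "wood", "shallow", False)
```

Higher scores indicate weaker structures under this reading, since every coefficient grows with fragility.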

[0023] The geographic exposure indicator is calculated as a weighted sum of four terms: the normalized elevation, calculated as (actual elevation − minimum elevation) / (maximum elevation − minimum elevation); the normalized slope, i.e., the ratio of the actual slope to the maximum slope; the slope position coefficient, set to 1.0 at the top of the slope, 0.7 at mid-slope, and 0.4 at the toe of the slope; and the surface roughness coefficient, determined by the type of underlying surface and set to 1.0 for water surfaces, 0.8 for farmland, 0.6 for urban areas, and 0.4 for forest. The four weights sum to 1 and are preferably 0.2, 0.2, 0.25, and 0.35.
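A short sketch of the geographic exposure calculation, under the assumption that the four terms are combined as a weighted sum with the preferred weights applied in the order listed (elevation, slope, slope position, surface roughness). The names below are hypothetical.

```python
# Coefficient tables quoted from paragraph [0023].
SLOPE_POSITION = {"top": 1.0, "middle": 0.7, "toe": 0.4}
ROUGHNESS = {"water": 1.0, "farmland": 0.8, "urban": 0.6, "forest": 0.4}


def geographic_exposure(elev, elev_min, elev_max, slope, slope_max,
                        position, surface,
                        weights=(0.2, 0.2, 0.25, 0.35)):
    """Weighted sum of the four terms (the linear form is an assumption)."""
    terms = (
        (elev - elev_min) / (elev_max - elev_min),  # min-max normalized elevation
        slope / slope_max,                          # normalized slope
        SLOPE_POSITION[position],                   # slope position coefficient
        ROUGHNESS[surface],                         # surface roughness coefficient
    )
    return sum(w * t for w, t in zip(weights, terms))


# A tower at 150 m (range 50 to 250 m) on a 15-degree slope (maximum 30),
# at the top of the slope, surrounded by farmland.
exposure = geographic_exposure(150, 50, 250, 15, 30, "top", "farmland")
```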

[0024] The topological criticality indicator is calculated as a weighted combination of three terms: the normalized topological depth from the object to the power source, i.e., the ratio of the distance from the object to the power source to the total length of the feeder; the main-line coefficient, which is 1 if the object belongs to the main line and 0.5 otherwise; and the downstream impact, set as the ratio of the number of downstream nodes to the total number of feeder nodes. The three weights sum to 1 and are preferably 0.3, 0.3, and 0.4.
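The topological criticality indicator can be sketched in the same way, using the preferred weights 0.3, 0.3, and 0.4. Two assumptions are made here: the three terms are combined as a weighted sum, and the normalized depth enters the sum directly as defined. The source text does not show whether the original formula uses the depth or its complement, so treat that choice as illustrative. All names are hypothetical.

```python
def topological_criticality(dist_to_source, feeder_length, on_main_line,
                            n_downstream, n_feeder_nodes,
                            weights=(0.3, 0.3, 0.4)):
    """Weighted sum of depth, main-line membership, and downstream impact."""
    depth = dist_to_source / feeder_length       # normalized topological depth
    main_line = 1.0 if on_main_line else 0.5     # main-line coefficient
    downstream = n_downstream / n_feeder_nodes   # share of downstream nodes
    w_depth, w_main, w_down = weights
    return w_depth * depth + w_main * main_line + w_down * downstream


# An object 2 km from the source on a 10 km feeder, on the main line,
# with 30 of the feeder's 50 nodes downstream of it.
criticality = topological_criticality(2.0, 10.0, True, 30, 50)
```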

[0025] After the risk characterization of all objects is obtained, feeder-level statistical fusion is performed over all objects within the same feeder. Assuming a feeder contains N objects, the feeder-level statistical features can be represented as $[\mu, \sigma, Q_{25}, Q_{50}, Q_{75}]$, where $\mu$ and $\sigma$ are the mean and standard deviation of the object-level risk representation vectors, and $Q_{25}$, $Q_{50}$, and $Q_{75}$ are their 25th, 50th, and 75th percentiles, respectively.
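As an illustration of the feeder-level statistical fusion, the sketch below computes the mean, standard deviation, and 25th, 50th, and 75th percentiles for each component of the object-level risk vectors in one feeder. The use of the population standard deviation and of the "inclusive" percentile interpolation are assumptions; the source does not specify either.

```python
import statistics


def component_statistics(values):
    """Mean, std, and quartiles of one risk indicator over a feeder's objects."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return [statistics.fmean(values), statistics.pstdev(values), q25, q50, q75]


def feeder_statistical_features(risk_vectors):
    """Concatenate the five statistics for each component of the risk vectors.

    risk_vectors holds one (structural, exposure, topological) tuple per object.
    """
    features = []
    for component in zip(*risk_vectors):
        features.extend(component_statistics(component))
    return features


# Five objects on one feeder, three risk indicators each (illustrative values).
risk_vectors = [
    (0.2, 0.5, 0.6),
    (0.4, 0.7, 0.3),
    (0.6, 0.6, 0.9),
    (0.8, 0.4, 0.5),
    (1.0, 0.8, 0.2),
]
feeder_stats = feeder_statistical_features(risk_vectors)
```

The output length is fixed at 15 regardless of how many objects the feeder contains, which is what allows feeders of different sizes to share one feature layout.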

[0026] To characterize the risk distribution features within the feeder, this invention further performs clustering and binning on the object-level risk representation. First, nodes and lines are clustered according to their original features and divided into k risk level categories (e.g., low-risk, medium-risk, and high-risk). Then, the proportion of each category in each feeder is calculated, i.e., the ratio of the number of objects at each risk level to the total number of objects in the feeder, thus forming a feeder-level risk distribution feature vector; preferably, the number of risk levels is 3.
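The proportion step can be sketched as below. The sketch assumes the risk level of each object has already been assigned by some clustering step, and it covers only the per-feeder proportions; the feeder identifiers and level names are hypothetical.

```python
from collections import Counter, defaultdict

LEVELS = ("low", "medium", "high")  # k = 3 risk levels, as preferred in the text


def risk_distribution(object_feeders, object_levels, levels=LEVELS):
    """Share of objects at each risk level, per feeder."""
    counts = defaultdict(Counter)
    for feeder, level in zip(object_feeders, object_levels):
        counts[feeder][level] += 1
    distribution = {}
    for feeder, counter in counts.items():
        total = sum(counter.values())
        distribution[feeder] = [counter[level] / total for level in levels]
    return distribution


# Six objects on two feeders, with risk levels from a prior clustering step.
feeders = ["F1", "F1", "F1", "F1", "F2", "F2"]
levels = ["low", "high", "high", "medium", "low", "low"]
dist = risk_distribution(feeders, levels)
```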

[0027] Finally, the feeder-level statistical features and risk distribution features are combined to obtain a unified feeder-level comprehensive static feature vector. Through this method, the present invention not only solves the problems of excessively high original feature dimensionality and inconsistent feature quantities between feeders, but also significantly reduces the original feature dimensionality while maintaining key features of distribution network structure and geographical environment information. This achieves unified feature representation for feeders of different scales, providing standardized input for subsequent continuous feature screening and discrete feature mining.

[0028] Furthermore, considering the difficulty in obtaining high spatial resolution meteorological data in actual engineering projects, and the obvious localized nature of typhoon disasters, which often only significantly affect certain areas of the distribution network, introducing meteorological features node by node would significantly increase data dimensionality and modeling complexity. Therefore, this invention introduces the concept of a typhoon-affected area to model meteorological data: based on the typhoon's path, impact range, and the geographical distribution of the distribution network, the area directly affected by the typhoon is determined and defined as the typhoon-affected area, while the remaining areas are considered non-affected or weakly affected areas. Based on this, meteorological information is extracted only from the typhoon-affected area, and the "most severe meteorological condition representation method" is used to construct the network-wide meteorological features. That is, within the typhoon-affected area, meteorological variables such as wind speed, rainfall, wind direction, and lightning strikes are statistically analyzed; feature quantities reflecting the extreme nature of the disaster, such as maximum wind speed, maximum instantaneous wind speed, cumulative rainfall, and most unfavorable wind direction, are extracted; these most severe meteorological features are used as the network-wide meteorological features of the distribution network in the typhoon-affected area under this operating scenario. In this way, meteorological characteristics remain at a fixed dimension at the sample level, while highlighting the key environmental factors that have a decisive impact on the operation of the power distribution network.
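A minimal sketch of the most severe meteorological condition representation, assuming the observations have already been restricted to the typhoon-affected area. The record keys are hypothetical, and taking the largest per-station rainfall total as the cumulative rainfall feature is one possible reading of the text. The most unfavorable wind direction is omitted because the source does not say how it is determined.

```python
from collections import defaultdict


def worst_case_weather(records):
    """Reduce observations inside the affected area to fixed-size extremes."""
    rain_by_station = defaultdict(float)
    for rec in records:
        rain_by_station[rec["station"]] += rec["rainfall_mm"]
    return {
        "max_wind_speed": max(rec["wind_speed"] for rec in records),
        "max_gust": max(rec["gust"] for rec in records),
        # Largest accumulated rainfall over all stations (assumed reading).
        "max_cumulative_rainfall": max(rain_by_station.values()),
    }


# Three observations from two stations inside the affected area.
records = [
    {"station": "A", "wind_speed": 18.0, "gust": 25.0, "rainfall_mm": 40.0},
    {"station": "A", "wind_speed": 22.5, "gust": 31.0, "rainfall_mm": 70.0},
    {"station": "B", "wind_speed": 26.0, "gust": 29.5, "rainfall_mm": 55.0},
]
weather = worst_case_weather(records)
```

However many stations or time steps fall inside the affected area, the result has the same keys, which keeps the meteorological features at a fixed dimension per sample.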

[0029] The typhoon-affected area is determined as follows. The probability circle of the typhoon path forecast issued by the meteorological department, such as the 80% probability coverage circle of the typhoon path over the next 24 / 48 hours, is used as the base layer. The GIS spatial overlay tool is used to overlay the polygon vector data of the typhoon's level 7 / 8 / 10 wind circles onto the path probability circle, and the intersection of the two is taken as the initial overall impact range of the typhoon.

A 10-minute average wind speed ≥ 17.2 m / s is used as the critical disaster-causing wind speed for distribution network equipment failure, with cumulative rainfall ≥ 100 mm as an auxiliary threshold, specifically for coastal mountainous areas. For special terrain such as narrow channels and isolated hilltops, the wind speed threshold is corrected according to power industry standards; for example, wind speeds on the windward side of coastal mountains are increased by 15%, and those in narrow channels by 20%. The corrected disaster-causing wind speed is calculated as: basic threshold × (1 − terrain increase coefficient). In GIS, the refined typhoon wind speed / rainfall raster data is filtered by attribute, retaining cells whose attribute values are ≥ the disaster-causing threshold (such as wind speed ≥ 17.2 m / s and rainfall ≥ 100 mm), which generates the typhoon disaster-causing meteorological raster area.

For distribution network node vector data, GIS point-in-polygon spatial containment analysis is used to determine whether a node falls within the typhoon disaster-causing meteorological raster area; if it does, it is identified as a "node affected by the typhoon." For distribution network line vector data, GIS line-polygon spatial intersection analysis is used to determine whether a line intersects the typhoon disaster-causing meteorological raster area; if the intersection length is ≥ 10% of the total line length (adjustable), the line is identified as a "line affected by the typhoon." Based on the distribution network feeder topology partitioning, affected nodes / lines under the same feeder are aggregated to determine whether the entire feeder segment is affected, meeting the scheme's requirement for feeder-level feature extraction.

Based on the spatial overlay results, combined with typhoon wind speed levels, terrain correction coefficients, and the disaster resistance of distribution network components, the entire distribution network is divided into three levels. Direct impact area (high-risk area): distribution network components are located within the wind circle of level 10 or above (wind speed ≥ 24.5 m / s), or within the level 8~10 wind circle where the wind speed correction for special terrain (mountains / narrow channels) raises the wind to level 10 or above; here, distribution network equipment is directly impacted by strong typhoon winds and the failure risk is extremely high, making this the core area for meteorological feature extraction within the typhoon-affected area. Weak impact area (medium-risk area): distribution network components are located within the level 8~10 wind circle (wind speed 17.2~24.4 m / s) without wind speed amplification from special terrain, or meet only the rainfall disaster-causing threshold; here, the typhoon has an indirect impact on the distribution network and the failure risk is moderate, so this area can serve as an auxiliary analysis area for meteorological features. No-impact area (low-risk area): distribution network components are not located within the typhoon disaster-causing meteorological raster area, or the wind force is below level 8 and rainfall is < 100 mm; here, the typhoon has no significant impact on the distribution network, and the area does not need to be included in the typhoon scenario feature extraction scope.

For distribution network lines that cross impact areas, the boundary of the line's impact area is corrected according to the impact level of its tower nodes. If all tower nodes of a line segment are directly affected, the entire line segment is classified into the direct impact area, avoiding overly fragmented division of the line's impact area. The typhoon-affected area is thus obtained.
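The threshold logic in paragraph [0029] can be sketched without any GIS dependency, taking the local wind speed, rainfall, and intersection length as already-computed inputs. The terrain labels, function names, and the way terrain amplification is applied to the local wind speed are assumptions made for illustration; the numeric thresholds are the ones quoted in the text.

```python
BASE_WIND_THRESHOLD = 17.2   # m/s, 10-minute mean wind speed (level 8)
LEVEL_10_WIND = 24.5         # m/s, lower bound of level 10 wind
RAIN_THRESHOLD = 100.0       # mm, cumulative rainfall
TERRAIN_INCREASE = {"flat": 0.0, "windward_slope": 0.15, "narrow_channel": 0.20}


def corrected_threshold(terrain):
    """Disaster-causing wind threshold: base x (1 - terrain increase coefficient)."""
    return BASE_WIND_THRESHOLD * (1 - TERRAIN_INCREASE[terrain])


def classify_component(wind_speed, rainfall, terrain="flat"):
    """Return 'direct', 'weak', or 'none' for one node or tower."""
    # Assumed reading: special terrain amplifies the local wind speed.
    effective_wind = wind_speed * (1 + TERRAIN_INCREASE[terrain])
    if effective_wind >= LEVEL_10_WIND:
        return "direct"
    if wind_speed >= BASE_WIND_THRESHOLD or rainfall >= RAIN_THRESHOLD:
        return "weak"
    return "none"


def line_is_affected(intersection_length, total_length, min_fraction=0.10):
    """A line counts as affected if enough of it lies inside the raster area."""
    return intersection_length / total_length >= min_fraction
```

For example, `classify_component(22.0, 0.0)` gives a weak-impact result on flat terrain, while the same wind speed in a narrow channel is amplified past the level 10 bound and is classified as direct impact.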

[0030] To further characterize the impact of typhoon disaster warnings and user behavior on the operation of the power distribution network under extreme weather conditions, this invention extracts features from user load data on a feeder basis, based on the construction of power grid operation data, meteorological data, and geographic information data, in order to extract feature quantities that can reflect changes in group behavior from the user-side load curve.

[0031] In one embodiment, aggregating the user load data on a per-feeder basis and clustering the aggregation results to obtain user load characteristics includes: aggregating the user load data feeder by feeder to obtain a feeder-level aggregated load sequence; extracting statistical and correlation features from the feeder-level aggregated load sequence and using a clustering algorithm to classify the feeders into three categories: meteorologically sensitive, load-stable, and behaviorally random; and processing the feeder-level aggregated load sequence based on the classification labels to obtain the user load characteristics.

[0032] This invention acquires historical load curve data from the user side through the Advanced Metering Infrastructure (AMI) system, and spatially aggregates this user load data according to the distribution network feeder topology, constructing a feeder-level aggregated load sequence with the feeder as the basic analysis unit. This approach reduces data dimensionality while preserving the overall expressiveness of user behavior, improving the efficiency of subsequent feature analysis.
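The spatial aggregation step amounts to summing user load curves point by point within each feeder. A minimal sketch follows, assuming all curves share the same sampling times; the user and feeder identifiers are hypothetical.

```python
def aggregate_by_feeder(user_loads, user_feeder):
    """Sum user load curves point by point within each feeder."""
    feeder_loads = {}
    for user, curve in user_loads.items():
        feeder = user_feeder[user]
        if feeder not in feeder_loads:
            feeder_loads[feeder] = list(curve)
        else:
            feeder_loads[feeder] = [
                total + value for total, value in zip(feeder_loads[feeder], curve)
            ]
    return feeder_loads


# Three users on two feeders, three time steps each (kW, illustrative values).
user_loads = {
    "u1": [1.0, 2.0, 3.0],
    "u2": [0.5, 0.5, 1.0],
    "u3": [2.0, 2.0, 2.0],
}
user_feeder = {"u1": "F1", "u2": "F1", "u3": "F2"}
feeder_loads = aggregate_by_feeder(user_loads, user_feeder)
```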

[0033] This invention extracts two types of features from the feeder-level aggregated load sequence: statistical features and covariate correlation features, to characterize user behavior patterns and their response to external environmental factors. Statistical features describe the overall behavior of the aggregated load curve at different time scales, including but not limited to load scale features (maximum load, minimum load, average load, load standard deviation); load fluctuation features (daily peak-to-valley difference, load change rate, load fluctuation coefficient); and electricity consumption behavior features (daily electricity consumption, nighttime electricity consumption ratio, peak-hour electricity consumption ratio, load duration, etc.). These statistical features comprehensively reflect the scale, fluctuation characteristics, and electricity consumption habits of the user group, and are used to characterize load behavior patterns under different operating scenarios. The invention also quantifies the sensitivity of user electricity consumption behavior to environmental changes by calculating correlation indices between the load sequence and meteorological variables. These correlation features include, but are not limited to, the correlation coefficient between load and rainfall; the correlation coefficient between load and wind speed; the correlation coefficient between load and temperature; and the correlation coefficient between load and typhoon meteorological variables such as wind direction. Furthermore, the Pearson correlation coefficient method is used to measure the correlation between the load sequence and covariates in the correlation calculation. The correlation coefficient is used to characterize the degree of linear association between variables.
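By way of non-limiting illustration, the statistical features and the Pearson correlation features described above can be sketched as follows. The function names, the choice of population standard deviation, and the definition of the fluctuation coefficient as standard deviation divided by mean are assumptions made for this sketch and are not specified in the embodiment.

```python
import math

def load_statistics(load):
    """Statistical features of one feeder-level aggregated load sequence."""
    n = len(load)
    mean = sum(load) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in load) / n)  # population std
    return {
        "max": max(load),
        "min": min(load),
        "mean": mean,
        "std": std,
        "peak_valley_diff": max(load) - min(load),
        # Assumption: fluctuation coefficient taken as std / mean.
        "fluctuation_coeff": std / mean if mean else 0.0,
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (sx * sy)

# Hypothetical feeder load (MW) and wind speed (m/s) at the same time steps.
load = [10.0, 12.0, 15.0, 11.0]
wind = [5.0, 6.0, 7.5, 5.5]
stats = load_statistics(load)
r_load_wind = pearson(load, wind)
```

The same `pearson` helper would be applied to rainfall, temperature, and other typhoon meteorological covariates to obtain the remaining correlation features.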

[0034] To further explore the behavioral differences among different user groups, this invention uses the K-means clustering algorithm to classify user behavior features, thereby obtaining classification labels for user behavior. First, a user behavior feature matrix is constructed. Assuming there are N feeders or user aggregation units, and d-dimensional behavioral features are extracted for each unit, the user behavior feature matrix F is formed: $$F = \begin{bmatrix} f_{11} & \cdots & f_{1d} \\ \vdots & \ddots & \vdots \\ f_{N1} & \cdots & f_{Nd} \end{bmatrix}$$ In the formula, $f_{Nd}$ is the d-th behavioral feature of the N-th feeder aggregation unit.

[0035] Based on the user behavior feature matrix, the K-means clustering algorithm is used to perform cluster analysis on the feature vectors, thereby dividing the feeder aggregation units into user groups with similar electricity consumption behavior patterns, corresponding to three classification labels: weather-sensitive, load-stable, and random-behavior. After clustering, the corresponding classification label is assigned to each feeder unit, and these labels are introduced into subsequent model analysis as user load characteristics. The classification labels reflect the differences in electricity consumption habits and in responses to weather conditions among different user groups. This enhances the model's ability to characterize user behavior, depicts the impact of load changes on power grid operation risk from a social behavior perspective, and achieves collaborative feature modeling of power grid, meteorology, geography, and social behavior.
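By way of non-limiting illustration, the clustering of feeder aggregation units can be sketched with a plain K-means routine. The deterministic initialization (the first k rows), the two example features, and the sample values are assumptions made for this sketch. How cluster indices are mapped to the three named categories is not specified in the embodiment and is therefore left out.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain K-means; returns (labels, centroids)."""
    # Assumption for reproducibility: initialise centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical rows of F: [load fluctuation coefficient, |load-wind correlation|].
F = [
    [0.10, 0.90], [0.90, 0.10], [0.50, 0.50],
    [0.12, 0.88], [0.88, 0.12], [0.52, 0.48],
]
labels, centroids = kmeans(F, k=3)
```

In practice the columns of F would typically be standardized before clustering so that no single feature dominates the Euclidean distance.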

[0036] This invention aggregates data on a feeder-by-feeder basis, converging dispersed user load behavior into overall feeder load characteristics. This reduces data dimensionality and aligns load characteristics with the operating units (feeders) of the distribution network, facilitating unified modeling with static structural features and operational characteristics. By extracting statistical features to reflect the amplitude and fluctuation characteristics of the load, and simultaneously extracting correlation features with meteorological elements, it can quantitatively characterize the load's sensitivity to weather changes, providing multi-dimensional quantitative basis for distinguishing different types of feeders. Clustering forms clearly defined classification labels, facilitating understanding by maintenance personnel and providing prior category information for distinguishing the operational characteristics of different feeders in subsequent fault feature extraction.

[0037] Finally, the obtained power grid operation characteristic data, feeder-level integrated static feature vector, network-wide meteorological features, and user load features are combined to form a multi-source heterogeneous dataset of the distribution network under the influence of typhoons, providing input data foundation for subsequent key feature screening and fault risk analysis of the distribution network under typhoon operation scenarios.

[0038] This invention solves the problem of inconsistencies in the dimensions, sampling frequency, and data structure of different electrical quantities by treating each electrical quantity in the power grid operation data as an independent operation feature and unifying its dimensions, thus avoiding errors introduced by chaotic data formats. It integrates static structure and geographic information at the feeder level to enhance the spatial correlation of features. By using the most severe meteorological conditions as a representation method, it condenses the impact of extreme weather on the entire network into unified features, effectively compressing the dimensions of meteorological data while highlighting the most threatening meteorological conditions to power grid operation, avoiding noise interference in fine-grained meteorological information. Based on feeder aggregation and clustering, it extracts user load features, revealing load response patterns. By combining power grid operation features, feeder-level static features, network-wide meteorological features, and user load features into a unified dataset, it achieves spatiotemporal alignment and structured organization of multi-dimensional information such as electrical, meteorological, geographic, and load data. This allows subsequent parallel dual-branch extraction paths to directly and accurately mine continuous and discrete features based on this dataset, improving the overall stability and engineering practicality of the method.

[0039] After the aforementioned feeder-level integrated static features, meteorological features of typhoon-affected areas, and social behavioral features have been constructed and extracted, the multi-source heterogeneous dataset of this invention contains three main feature sources: first, feeder-level integrated features obtained through feature engineering, including feeder-level integrated static feature vectors and user load feature labels; second, representative meteorological features obtained through typhoon-affected area modeling; and third, raw monitoring features without feature extraction, mainly power grid operation data such as node voltage and line power obtained from the SCADA system. These features differ significantly in data type, dimensions, numerical range, and missing-data patterns. Using them directly for subsequent feature selection and modeling analysis may lead to model bias or computational distortion. Therefore, before the dual-path extraction of continuous and discrete features, unified data preprocessing is performed on the fused multi-source heterogeneous dataset to improve data quality and ensure the effectiveness of subsequent feature selection and the accuracy of model analysis. The specific preprocessing steps are as follows:
Missing value handling: For missing values in historical or predicted data caused by abnormal collection, communication failures, or incomplete predictions, an appropriate imputation strategy is adopted according to the feature type, including mean imputation, nearest-time interpolation, or estimation based on historical statistical characteristics, to ensure the integrity of sample features.
Feature encoding: Categorical features are numerically encoded so that they can be used as input features in subsequent analysis.
Numerical standardization: Taking into account the differences in units and numerical ranges of features from different sources, continuous numerical features are standardized or normalized to eliminate the influence of dimensions and prevent certain features from dominating the subsequent feature selection and modeling process.
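By way of non-limiting illustration, the three preprocessing steps can be sketched as follows. The use of `None` to mark a missing reading, integer ordinal codes for categories, and z-score standardization are choices made for this sketch; the embodiment also permits other imputation and normalization strategies.

```python
import math

def impute_mean(column):
    """Replace missing readings (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def encode_categories(column):
    """Map each category to an integer code (sorted for reproducibility)."""
    codes = {c: i for i, c in enumerate(sorted(set(column)))}
    return [codes[c] for c in column], codes

def standardize(column):
    """Z-score standardization: zero mean, unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0] * n  # a constant feature carries no information
    return [(v - mean) / std for v in column]

# Hypothetical node-voltage readings with one communication dropout.
voltage = impute_mean([1.0, None, 3.0])
voltage_z = standardize(voltage)
feeder_type, code_map = encode_categories(
    ["load-stable", "weather-sensitive", "load-stable"]
)
```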

[0040] Through the above steps, the multi-source heterogeneous dataset constructed in this invention has the following properties: each column of data corresponds to one power distribution network operation scenario sample; each row of data corresponds to an operation characteristic, structural characteristic, or environmental characteristic of the power distribution network; and all samples maintain consistent feature dimensions, meeting the input requirements for subsequent feature selection and fault correlation analysis. Assume there is a labeled sample set $\{(x_i, y_i)\}_{i=1}^{n_l}$ and an unlabeled sample set $\{x_j\}_{j=n_l+1}^{n_l+n_u}$, where $y_i \in Y$ and $Y$ is the set of labels for power distribution network operation scenarios. Let $X_L$ and $X_U$ denote the feature sets of the labeled and unlabeled samples, respectively, with $X_L = [x_1, \dots, x_{n_l}] \in \mathbb{R}^{P \times n_l}$ and $X_U = [x_{n_l+1}, \dots, x_{n_l+n_u}] \in \mathbb{R}^{P \times n_u}$; each entry of these matrices is a feature value, and the label set is $Y_L = [y_1, \dots, y_{n_l}]$. In the formula, $P$ is the overall feature dimension, which includes power grid operation characteristics, feeder-level integrated static characteristics, meteorological characteristics, and social behavior characteristics; $n_l$ is the number of labeled samples; and $n_u$ is the number of unlabeled samples. This provides a unified and standardized data foundation for subsequent feature extraction and fault feature screening of heterogeneous data in typhoon operation scenarios of power distribution networks.

[0041] S2. Construct a parallel dual-branch extraction path that includes a continuous feature extraction path and a discrete feature extraction path. The continuous feature extraction path uses an improved semi-supervised Laplace algorithm, incorporating meteorological intensity difference weights and the topological constraints of the power distribution network, to extract features from the multi-source heterogeneous dataset, obtaining a continuous feature subset. The discrete feature extraction path uses a hierarchical mining strategy based on globally common and locally rare features to extract features from the multi-source heterogeneous dataset, obtaining a discrete feature subset. Specifically, this invention designs a parallel dual-branch extraction path to address the different attributes and feature expression requirements of continuous and discrete data. The two branches separately complete the screening and refinement of continuous features and the mining and quantification of discrete features, solving the problem that a single processing method cannot accommodate the different attributes of continuous and discrete features at the same time.

[0042] This invention targets continuous features such as wind speed, temperature, air pressure, power flow, and voltage. It utilizes a continuous feature extraction path in a parallel dual-branch extraction path and adopts a two-stage semi-supervised feature selection framework. By comprehensively utilizing labeled historical data and unlabeled prediction data, it selects key continuous features with strong discriminative power and low redundancy from multi-source heterogeneous datasets, thus solving the problem that existing technologies rely on limited labeled samples and cannot characterize the physical laws of typhoons and power grids.

[0043] In one embodiment, the improved semi-supervised Laplace algorithm, which incorporates meteorological intensity difference weights and topological constraints of the distribution network, is used to extract features from the multi-source heterogeneous dataset to obtain a continuous feature subset, including: Extract all continuous features from the multi-source heterogeneous dataset to construct a semi-supervised temporal feature space; The improved semi-supervised Laplace algorithm is used to perform initial feature screening on the semi-supervised temporal feature space to obtain an initial feature subset. The initial feature subset is refined using a semi-supervised collaborative training random forest to obtain the continuous feature subset.

[0044] First, all continuous features are extracted from the multi-source heterogeneous dataset to construct a semi-supervised temporal feature space. Let $X = [X_L, X_U]$ be the feature data matrix, where $X_L$ contains the feature vectors of the labeled samples and $X_U$ contains the feature vectors of the unlabeled samples. The continuous features are extracted to form a continuous feature set $\{f_1, f_2, \dots, f_D\}$, where D is the number of continuous features. Let the number of continuous feature sample points be $n = n_l + n_u$. The semi-supervised temporal feature space (also known as the continuous feature sample set matrix) is represented as $X_C = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{D \times n}$, and the corresponding label set is $Y_L = [y_1, \dots, y_{n_l}]$, where: n is the number of samples, and each column corresponds to the operation scenario sample of the entire power distribution network under the typhoon scenario at a certain moment; D is the feature dimension; and the n-th sample $x_n$ contains D-dimensional features, including distribution network operation characteristics (node voltage amplitude, voltage phase angle, node active / reactive power, line active / reactive power, etc.), feeder-level static characteristics, and environmental characteristics of the typhoon-affected area (representative meteorological characteristics such as maximum wind speed and maximum rainfall within the typhoon-affected area). The sample set contains two types of data. Labeled samples come from historical operational data, and their labels indicate whether the distribution network is in a "fault state" or a "normal state" at that moment; $n_l$ is the number of labeled sample points. Unlabeled samples come from power distribution network operation forecasts and weather forecasts and do not contain explicit fault labels; $n_u$ is the number of unlabeled sample points.

[0045] In typhoon scenarios, the formation of power distribution network faults is not only related to numerical similarity in the feature space, but also significantly influenced by physical factors such as the electrical topology of the power distribution network, the dominant propagation direction of the typhoon, and the intensity gradient of meteorological impacts. Traditional semi-supervised Laplacian scoring methods only construct sample relationship graphs based on Euclidean distance in the feature space, which is insufficient to characterize the propagation characteristics of typhoon disasters on the power distribution network topology. To make the feature selection process more consistent with the physical mechanisms under typhoon scenarios, this invention structurally modifies the semi-supervised Laplacian scoring algorithm by introducing meteorological intensity difference weights, wind direction anisotropy propagation weights, and a feature-topology dual-graph coupling mechanism, constructing a topology-meteorology coupled Laplacian Score for Semi-supervised Feature Selection (TMC-LSDF) model as an algorithm for preliminary screening of continuous features in the filtering stage of feature selection. In one embodiment, the improved semi-supervised Laplacian algorithm is used to perform preliminary feature screening on the semi-supervised temporal feature space to obtain a preliminary feature subset, including: Based on the semi-supervised temporal feature space, the meteorological intensity difference weight and wind direction anisotropy propagation weight are calculated to construct a feature map. A power grid topology diagram is constructed using the multi-source heterogeneous dataset. The feature map and the power grid topology diagram are then fused to generate a coupled Laplacian matrix. The score of each continuous feature in the semi-supervised temporal feature space is calculated based on the coupled Laplacian matrix. 
Based on the scores, the continuous features in the semi-supervised temporal feature space are sorted, and the sorting results are filtered by a threshold according to a preset ratio to obtain the initial screening feature subset.

[0046] In the continuous feature extraction pathway, this invention introduces meteorological intensity difference weights (to adjust the influence of meteorological environment on sample similarity) and wind direction anisotropy propagation weights (to conform to the physical characteristics of typhoon propagation along wind direction) to correct the correlation between samples, and thereby construct a feature map characterizing the similarity of the feature space. In one embodiment, the step of calculating the meteorological intensity difference weights and wind direction anisotropy propagation weights based on the semi-supervised temporal feature space to construct the feature map includes: Based on the semi-supervised temporal feature space, the meteorological intensity value of each node object or each line object is determined, and the meteorological intensity difference weight is calculated by combining its structural vulnerability index. The connection direction angle of the power distribution network in the typhoon-affected area is determined based on the semi-supervised temporal feature space, and the wind direction anisotropy propagation weight is determined based on the connection direction angle. Calculate the spatial distance between each node object or each line object, and determine the intra-class feature map weight matrix by combining the meteorological intensity difference weight and the wind direction anisotropic propagation weight; The inter-class feature map weight matrix is ​​constructed using the multi-source heterogeneous dataset, and the feature map is constructed based on the intra-class feature map weight matrix and the inter-class feature map weight matrix.

[0047] In typhoon disaster scenarios, the probability of power distribution network failures is closely related to the distribution of meteorological intensity within the typhoon's impact area. In areas with similar meteorological intensities, power grid equipment is more likely to exhibit similar operating states or failure modes, while in areas with significant differences in meteorological intensity, the operating states of equipment often differ markedly. The similarity between operating scenarios depends not only on electrical characteristics but also on the intensity of the surrounding meteorological environment. Ignoring this factor and constructing a sample relationship graph solely based on feature distance may incorrectly classify samples in completely different meteorological environments as similar, thus affecting the feature selection results. To more accurately characterize the impact of typhoon disasters on the power distribution network, this invention further introduces a meteorological intensity difference weight in the sample relationship weighting process to reflect the moderating effect of meteorological environmental differences between different operating scenarios on the degree of sample correlation.

[0048] Let the meteorological intensity index of the operating scenario corresponding to sample i be $M_i$, and that of the operating scenario corresponding to sample j be $M_j$. The meteorological intensity value can be determined from meteorological observation data under typhoon conditions. It is a weighted meteorological intensity within the affected area, computed from the meteorological values at the affected nodes and lines weighted by their structural vulnerability, where $\Omega_i$ denotes the set of lines and nodes of sample i located in the typhoon-affected area; $m_{i,k}$ is the meteorological value at the location of node or line k in sample i, determined by GIS; $\lambda_k$ is the weight of node or line k in $\Omega_i$, taken as its structural vulnerability index; $\tilde{w}_{i,k}$ is the normalized wind speed index; $\tilde{r}_{i,k}$ is the normalized rainfall intensity index; and $\alpha_1$ and $\alpha_2$ are the weights of the meteorological elements, which sum to 1 and are taken as 0.7 and 0.3, respectively.

[0049] A weighting function for meteorological intensity differences is constructed from the differences in meteorological intensity values among samples, where $w_{ij}^{M}$ is the meteorological intensity difference weight between samples i and j, and $\gamma$ is the meteorological intensity attenuation coefficient, used to control the influence of meteorological intensity differences on the sample association weights, with a preferred value range of 2-3. The meteorological intensity value does not simply represent the strength of the weather, but rather the intensity of the meteorological conditions that effectively affect the power distribution network. When the difference in meteorological intensity between the operating scenarios corresponding to sample i and sample j is small, $|M_i - M_j|$ is small and $w_{ij}^{M}$ is large, indicating that the two operating scenarios are more similar in meteorological conditions, and the correlation between the samples is enhanced. When the meteorological intensity of the two operating scenarios differs greatly, the weight is reduced accordingly, thereby reducing unreasonable connections between samples.
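By way of non-limiting illustration, the meteorological intensity value and the meteorological intensity difference weight can be sketched as follows. The exact expressions are not reproduced in this text, so the sketch assumes (i) a vulnerability-weighted mean of a per-object meteorological value, (ii) a per-object value equal to 0.7 times the normalized wind speed index plus 0.3 times the normalized rainfall intensity index, and (iii) an exponential decay in the absolute intensity difference. All function names are hypothetical.

```python
import math

# Assumption: 0.7 applies to wind speed and 0.3 to rainfall.
ALPHA_WIND, ALPHA_RAIN = 0.7, 0.3

def meteorological_intensity(objects):
    """objects: (vulnerability, norm_wind, norm_rain) for each node or line
    of one operating scenario that lies inside the typhoon-affected area."""
    total_vulnerability = sum(v for v, _, _ in objects)
    if total_vulnerability == 0:
        return 0.0
    weighted = sum(
        v * (ALPHA_WIND * wind + ALPHA_RAIN * rain) for v, wind, rain in objects
    )
    return weighted / total_vulnerability  # assumed normalisation

def intensity_difference_weight(m_i, m_j, gamma=2.0):
    """Assumed decay form: exp(-gamma * |M_i - M_j|), gamma preferably 2-3."""
    return math.exp(-gamma * abs(m_i - m_j))

# Two hypothetical operating scenarios.
scene_i = [(1.0, 0.8, 0.6), (3.0, 0.4, 0.2)]
scene_j = [(2.0, 0.5, 0.5)]
m_i = meteorological_intensity(scene_i)
m_j = meteorological_intensity(scene_j)
w_met = intensity_difference_weight(m_i, m_j)
```

Under these assumptions the weight equals 1 for identical intensities and decreases monotonically as the intensity gap grows, which matches the qualitative behaviour described above.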

[0050] Considering that power distribution network faults exhibit a spatial evolution characteristic of spreading along the typhoon direction, constructing a sample relationship diagram solely based on feature spatial distances would assume that the disaster impact propagates uniformly in all directions, which deviates from actual engineering scenarios. Therefore, this invention introduces anisotropic propagation weights based on wind direction when constructing the sample relationship diagram. The connection strength of the samples is adjusted by the relationship between the spatial directional angles between samples and the prevailing typhoon wind direction. This allows the sample relationship diagram to more realistically reflect the spatial propagation patterns of typhoon disasters in the power distribution network, improving the physical rationality and engineering applicability of the feature selection results.

[0051] Although each sample represents the operation scenario of the entire power distribution network, in the typhoon scenario each operation scenario corresponds to a typhoon-affected area with a clear spatial range and center of gravity. Therefore, the center of gravity of the affected area, or its risk-weighted center, can be used to assign representative spatial coordinates to each operation scenario in order to characterize the directionality of typhoon propagation. Define $(x_i^c, y_i^c)$ as the geometric center coordinates of the typhoon-affected area at time i, used to characterize the spatial centroid of the area with the strongest typhoon effect and highest risk. The geometric center coordinates at time i are obtained as the risk-weighted average of the coordinates of the nodes in the typhoon-affected area, where k is the index of a node located in the typhoon-affected area, and $\rho_k$ is the risk weight of node k, determined by the wind speed at the node, the load of the node, and the structural fragility of the node.

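[0052] Therefore, the direction angle of the connecting line within the typhoon-affected area is: $$\theta_{ij} = \arctan\frac{y_j^c - y_i^c}{x_j^c - x_i^c}$$ In the formula, $\theta_{ij}$ is the spatial direction angle from sample i to sample j, and $(x_i^c, y_i^c)$ and $(x_j^c, y_j^c)$ are the representative center coordinates of samples i and j.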

[0053] The wind direction anisotropic propagation weight is constructed based on the direction angle of the connecting line, where $w_{ij}^{D}$ is the wind direction anisotropic propagation weight and $\kappa$ is the wind direction sensitivity coefficient, with a value ranging from 0.5 to 3 (1 is used in this invention). The weight is highest when $\theta_{ij}$ coincides with the real-time wind direction angle.
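By way of non-limiting illustration, the representative coordinates, the connecting-line direction angle, and the wind direction anisotropic propagation weight can be sketched as follows. The formulas are not reproduced in this text, so the sketch assumes (i) a node risk weight equal to the product of wind speed, load, and structural fragility, (ii) a risk-weighted centroid, (iii) a four-quadrant arctangent for the direction angle, and (iv) a weight of the form exp(κ(cos Δ − 1)), which peaks at 1 when the connecting line is aligned with the wind direction. All names are hypothetical.

```python
import math

def risk_weighted_centroid(nodes):
    """nodes: (x, y, wind_speed, load, fragility) for nodes in the affected area."""
    # Assumption: risk weight is the product of the three factors.
    weights = [wind * load * frag for _, _, wind, load, frag in nodes]
    total = sum(weights)
    cx = sum(w * node[0] for w, node in zip(weights, nodes)) / total
    cy = sum(w * node[1] for w, node in zip(weights, nodes)) / total
    return cx, cy

def direction_angle(center_i, center_j):
    """Angle (radians) of the line from scenario i to scenario j."""
    return math.atan2(center_j[1] - center_i[1], center_j[0] - center_i[0])

def wind_anisotropy_weight(theta_ij, wind_direction, kappa=1.0):
    """Assumed form: largest (1.0) when theta_ij equals the wind direction."""
    return math.exp(kappa * (math.cos(theta_ij - wind_direction) - 1.0))

# Hypothetical affected-area nodes at time i, and a centre for time j.
center_i = risk_weighted_centroid([(0.0, 0.0, 10.0, 1.0, 1.0),
                                   (2.0, 0.0, 10.0, 1.0, 1.0)])
center_j = (1.0, 1.0)
theta = direction_angle(center_i, center_j)
w_aligned = wind_anisotropy_weight(theta, wind_direction=math.pi / 2)
w_opposed = wind_anisotropy_weight(theta, wind_direction=-math.pi / 2)
```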

[0054] In typhoon-affected power distribution network scenarios, the relationships between samples are determined not only by feature similarity but also by the power grid topology. Therefore, to further enhance the algorithm's ability to represent power distribution network structural information, this invention employs a dual-graph coupled Laplacian structure. For each sample point $x_i$, k-NN with Euclidean distance (that is, the spatial distance between objects) is used to find its k nearest neighbors, which form the neighborhood set $N_k(x_i)$; the neighborhood size k is preferably 8-12. For labeled samples in the sample set that share the same known label, if their distance in the feature space is less than a preset neighborhood threshold, the sample pair is considered to belong to the same intra-class neighborhood, and a connection is established in the intra-class graph to characterize the compactness of same-class samples in the local space. The intra-class graph weight matrix is constructed with topological and meteorological factors taken into account, where $W_w^F$ is the weight matrix of the intra-class graph; $d_{ij}$ is the spatial distance between the node objects or line objects of samples i and j; and $\sigma_F$ is the feature space distance attenuation coefficient, with a value range of 1-3, preferably 2.

[0055] For labeled sample pairs with different labels, or for pairs formed by a labeled sample and its unlabeled neighbors, a discriminative constraint is introduced to construct an inter-class graph structure that characterizes the separability of different operating states (fault / normal) in the feature space; its weight matrix is the inter-class graph weight matrix $W_b^F$. Based on the above intra-class and inter-class graph weight matrices, the Laplacian matrix of the feature map is constructed as follows: $$L_F = D_F - W_F$$ In the formula, $L_F$ is the Laplacian matrix of the feature map (referred to simply as the feature map); $W_F$ is the adjacency (weight) matrix of the graph; and $D_F$ is the degree matrix of the feature map, a diagonal matrix whose entries are the row sums of $W_F$.
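By way of non-limiting illustration, an intra-class weight matrix and its graph Laplacian (degree matrix minus weight matrix) can be sketched as follows. The construction rule for the weights is not reproduced in this text, so the sketch assumes a Gaussian-style distance kernel multiplied by the meteorological and wind direction weights for same-label pairs within a distance threshold. The function names, the `radius` parameter, and the use of `None` for unlabeled samples are hypothetical.

```python
import math

def intra_class_weights(dist, labels, met_w, wind_w, sigma=2.0, radius=1.0):
    """Assumed rule: W[i][j] = exp(-d_ij^2 / sigma) * met_w[i][j] * wind_w[i][j]
    for labeled pairs sharing a label and closer than `radius`; otherwise 0."""
    n = len(dist)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_label = labels[i] is not None and labels[i] == labels[j]
            if i != j and same_label and dist[i][j] < radius:
                kernel = math.exp(-dist[i][j] ** 2 / sigma)
                W[i][j] = kernel * met_w[i][j] * wind_w[i][j]
    return W

def laplacian(W):
    """Graph Laplacian L = D - W, with D the diagonal matrix of row sums of W."""
    n = len(W)
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]

# Three hypothetical samples: two "fault" (1) scenarios and one "normal" (0).
dist = [[0.0, 0.5, 2.0], [0.5, 0.0, 2.0], [2.0, 2.0, 0.0]]
ones = [[1.0] * 3 for _ in range(3)]
W = intra_class_weights(dist, [1, 1, 0], met_w=ones, wind_w=ones)
L = laplacian(W)
```

If the wind direction weight is not symmetric in i and j, the weight matrix would need to be symmetrized (for example by averaging it with its transpose) before the Laplacian is formed; that step is omitted here.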

[0056] This invention introduces a structural vulnerability index to correct the weight of meteorological intensity differences, thereby improving the ability of feature extraction to focus on actual risk scenarios; calculates the anisotropic propagation weight of wind direction based on the direction angle of the connecting line, effectively characterizing the spatial pattern of typhoon disasters propagating along specific directions; distinguishes the weight matrices of intra-class and inter-class feature maps to enhance the topological consistency of feature extraction; and integrates multi-dimensional weights to construct feature maps, laying a high-quality foundation for coupling the Laplace matrix.

[0057] In one embodiment, the step of constructing a power grid topology map using the multi-source heterogeneous dataset, and fusing the feature map and the power grid topology map to generate a coupled Laplacian matrix includes: Based on the multi-source heterogeneous dataset, the adjacency relationship of each node object in the distribution network and the commissioning status of each line object are determined to construct a topology state matrix; The topological distance between samples is determined based on the topological state matrix, and the weight matrix of the intra-class graph of the topological structure is constructed based on the topological distance between samples; The inter-class graph weight matrix of the topology structure is constructed using the multi-source heterogeneous dataset, and then combined with the intra-class graph weight matrix of the topology structure to obtain the power grid topology structure graph. The feature map and the power grid topology map are weighted to obtain the coupling Laplace matrix.

[0058] In this embodiment, the present invention determines the adjacency relationships of each node object in the distribution network and the operational status of each line object based on the multi-source heterogeneous dataset, and constructs a topology state matrix accordingly, where $a_{pq}$ represents the adjacency relationship between node p and node q in the basic power distribution network; $s_{pq}^{(i)}$ indicates the operational status of branch (p, q) under the operating mode corresponding to sample i; $c_{pq}^{(i)}$ is the electrical connectivity coefficient of branch (p, q) under the operating mode corresponding to sample i, which integrates the static topology and the dynamic operating status; and $T^{(i)}$ is the topology state matrix of the operating mode corresponding to sample i.

[0059] Similarly, the topology state matrix $T^{(j)}$ under the operating mode corresponding to sample j can be calculated. The topological distance between samples is then obtained from the two topology state matrices, where $d_{ij}^{T}$ is the topological distance between samples i and j, and $n_b$ is the number of nodes in the distribution network.

[0060] The intra-class graph weight matrix of the topological structure, $W_w^T$, is constructed based on the topological distance between samples, where $\sigma_T$ is the topological distance attenuation coefficient, with a value range of 1-2.5, preferably 1.5.
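By way of non-limiting illustration, the topology state matrix, the topological distance between two samples, and the resulting topology weight can be sketched as follows. The formulas are not reproduced in this text, so the sketch assumes (i) a connectivity coefficient equal to the product of static adjacency and branch in-service status, (ii) a distance equal to the Frobenius norm of the difference of the two state matrices divided by the number of nodes, and (iii) a Gaussian-style decay with the topological distance attenuation coefficient. All function names are hypothetical.

```python
import math

def topology_state(adjacency, in_service):
    """Assumed connectivity coefficient: static adjacency x in-service status."""
    n = len(adjacency)
    return [[adjacency[p][q] * in_service[p][q] for q in range(n)]
            for p in range(n)]

def topological_distance(T_i, T_j):
    """Assumed distance: Frobenius norm of (T_i - T_j) over the node count."""
    n = len(T_i)
    squared = sum((T_i[p][q] - T_j[p][q]) ** 2
                  for p in range(n) for q in range(n))
    return math.sqrt(squared) / n

def topology_weight(distance, sigma_t=1.5):
    """Assumed decay form: exp(-d^2 / sigma_t), sigma_t preferably 1.5."""
    return math.exp(-distance ** 2 / sigma_t)

# Hypothetical 3-node radial feeder: 0 - 1 - 2.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
all_in = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
branch_12_out = [[1, 1, 1], [1, 1, 0], [1, 0, 1]]  # branch (1, 2) tripped
T_i = topology_state(adjacency, all_in)
T_j = topology_state(adjacency, branch_12_out)
d_topo = topological_distance(T_i, T_j)
w_topo = topology_weight(d_topo)
```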

[0061] The construction rules for the inter-class graph weight matrix of the topology structure are similar to those for the inter-class graph weight matrix of the feature map. Likewise, the Laplacian matrix of the topology structure graph (also referred to as the power grid topology map) is constructed in the same way as the Laplacian matrix of the feature map, and the details are not repeated here.

[0062] Finally, by weighting the feature map and the power grid topology map, the topology-feature coupled Laplacian matrix is obtained: $$L_C = \beta L_F + (1-\beta) L_T$$ In the formula, $L_F$ is the Laplacian matrix of the feature map, $L_T$ is the Laplacian matrix of the power grid topology map, and β∈[0,1] is the coupling weight coefficient. When β is large, the feature space structure is emphasized; when β is small, the electrical topology structure is emphasized. The preferred value range is 0.3-0.7.
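By way of non-limiting illustration, the weighted fusion of the two graphs can be sketched as a convex combination of their Laplacian matrices, consistent with the statement that a large β emphasizes the feature space and a small β emphasizes the electrical topology. The function name and the sample matrices are hypothetical.

```python
def couple_laplacians(L_feat, L_topo, beta=0.5):
    """Coupled Laplacian: beta * L_feat + (1 - beta) * L_topo, beta in [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    n = len(L_feat)
    return [
        [beta * L_feat[i][j] + (1.0 - beta) * L_topo[i][j] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 2-sample Laplacians of the feature map and the topology map.
L_feat = [[1.0, -1.0], [-1.0, 1.0]]
L_topo = [[2.0, -2.0], [-2.0, 2.0]]
L_coupled = couple_laplacians(L_feat, L_topo, beta=0.5)
```

Because each input has zero row sums, the combination also has zero row sums, so the result remains a valid graph Laplacian.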

[0063] Similarly, by applying the above weighting process to the two intra-class graphs and to the two inter-class graphs respectively, the coupled intra-class graph Laplacian matrix $L_w^C$ and the coupled inter-class graph Laplacian matrix $L_b^C$ are obtained.

[0064] During the operation of a power distribution network, the state changes between nodes depend not only on the similarity of operational characteristics but also on the constraints of the electrical topology. Traditional methods that rely solely on constructing sample relationship graphs based on feature space are insufficient to characterize the impact of the power grid structure on the propagation of operational states. Therefore, this invention constructs a dual-graph structure consisting of a feature graph and an electrical topology graph, and fuses the information from both types of graphs by coupling a Laplace matrix. This allows the feature selection process to consider both operational characteristic similarity and power grid structure constraints simultaneously, thereby improving the accuracy and stability of key feature selection in typhoon operation scenarios.

[0065] Subsequently, the semi-supervised Laplacian score is improved based on the Laplacian matrices obtained through coupling. For the r-th continuous feature $f_r$, the improved semi-supervised Laplacian score $S_r$ is defined in terms of the coupled intra-class and inter-class graph Laplacian matrices.

[0066] This score simultaneously measures intra-class topological-meteorological consistency, inter-class topological-meteorological separation, electrical coupling propagation characteristics, and typhoon directional influence; a smaller score indicates that the feature maintains topological-meteorological local structural consistency while having a stronger ability to distinguish between fault and normal states. In this way, the score for each continuous feature in the semi-supervised temporal feature space can be calculated.
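By way of non-limiting illustration, a score with the stated property (a smaller value indicates better intra-class consistency and stronger inter-class separation) can be sketched as the ratio of the intra-class quadratic form to the inter-class quadratic form. This ratio form is an assumption; the exact expression is not reproduced in this text. The function names and the small stabilizing constant `eps` are hypothetical.

```python
def quadratic_form(f, L):
    """f^T L f; for a graph Laplacian this equals the sum over edges of
    w_ij * (f_i - f_j)^2."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))

def laplacian_score(f, L_within, L_between, eps=1e-12):
    """Assumed score: intra-class scatter / inter-class scatter (smaller is better)."""
    return quadratic_form(f, L_within) / (quadratic_form(f, L_between) + eps)

# Four hypothetical samples: 0 and 1 are "normal", 2 and 3 are "fault".
L_within = [[1, -1, 0, 0], [-1, 1, 0, 0], [0, 0, 1, -1], [0, 0, -1, 1]]
L_between = [[2, 0, -1, -1], [0, 2, -1, -1], [-1, -1, 2, 0], [-1, -1, 0, 2]]

separating = [0.0, 0.1, 1.0, 1.1]     # tracks the fault / normal split
non_separating = [0.0, 1.0, 0.1, 1.1]  # varies within each class instead
score_good = laplacian_score(separating, L_within, L_between)
score_bad = laplacian_score(non_separating, L_within, L_between)
```

In this example the feature that follows the class split receives a much smaller score than the feature that varies mainly within each class.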

[0067] Based on the Laplacian scoring criterion above, the score of each feature is calculated and all continuous features are ranked by score, with smaller scores ranked first. The top d features (where d is 20%-40% of the total number of continuous features) are selected according to a preset ratio to form the initial feature subset SF. This significantly reduces the feature dimensionality and provides a high-quality candidate set for subsequent refinement. This feature subset has the following advantages: it eliminates redundant features with low correlation to distribution network faults; it makes full use of unlabeled prediction data without relying on a large number of labeled samples; it retains key structural information about the distribution network's operating status under typhoon scenarios; and it significantly reduces the computational complexity of the subsequent wrapper-based feature selection and fault prediction models.
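By way of non-limiting illustration, the ranking and ratio-based selection can be sketched as follows. The function name, the rounding rule (ceiling, with at least one feature kept), and the feature names and scores are assumptions made for this sketch.

```python
import math

def select_initial_subset(scores, ratio=0.3):
    """Keep the best `ratio` share of features; a smaller score ranks higher.

    scores: mapping from feature name to improved Laplacian score.
    """
    keep = max(1, math.ceil(ratio * len(scores)))
    ranked = sorted(scores, key=scores.get)  # ascending score
    return ranked[:keep]

# Hypothetical scores for five continuous features.
scores = {
    "max_wind_speed": 0.05,
    "node_voltage": 0.20,
    "line_active_power": 0.10,
    "temperature": 0.90,
    "air_pressure": 0.60,
}
initial_subset = select_initial_subset(scores, ratio=0.4)
```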

[0068] This invention introduces meteorological intensity difference weights into the semi-supervised Laplace algorithm to improve the responsiveness of features to typhoon dynamics, and introduces wind direction anisotropy propagation weights to characterize the directional propagation of typhoon disasters. It fuses the feature graph with the power grid topology graph to achieve feature learning under electrical constraints, and uses the coupled Laplace matrix for score calculation and threshold screening to achieve efficient initial feature screening. This reduces the computational complexity of subsequent feature refinement and improves overall feature extraction efficiency.

[0069] Finally, to further eliminate redundancy between features and improve the ability of the feature set to identify power distribution network faults in typhoon operation scenarios, this invention introduces a random forest model based on semi-supervised collaborative training to refine and screen candidate features during the encapsulation stage.

[0070] Constructing a Random Forest Model: A random forest model consists of multiple decision trees and can handle high-dimensional, nonlinear, and multi-feature coupled data. The importance of a feature can be measured by its contribution during the tree node splitting process. This invention uses random forests to evaluate the role of candidate features in distribution network fault identification. The process includes: constructing m training subsets from the labeled samples using bootstrap sampling; for each decision tree, randomly selecting h candidate features (h≪d) to split the nodes; and, using the sample purity index (Gini index) as the splitting criterion, generating m independent decision trees that form the random forest model. For a new sample, each decision tree votes independently, and the majority class is taken as the final decision result.
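The building blocks named above (bootstrap sampling, random feature subsets, the Gini index as splitting criterion, and majority voting) can be sketched as follows. This is not a full random forest; the helper names are hypothetical and the tree-growing loop is omitted.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(xs, ys, threshold):
    """Weighted Gini impurity after splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def bootstrap_indices(n, rng):
    """Sample n indices with replacement (one training subset per tree)."""
    return [rng.randrange(n) for _ in range(n)]

def random_feature_subset(d, h, rng):
    """Pick h of the d candidate features without replacement (h << d)."""
    return rng.sample(range(d), h)

def majority_vote(votes):
    """Final decision: the class predicted by most trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
subset = bootstrap_indices(5, rng)
feats = random_feature_subset(d=10, h=3, rng=rng)
```

A split that separates fault from normal samples perfectly has a weighted Gini impurity of zero, which is why lower values are preferred when choosing the split.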

[0071] Semi-supervised collaborative training mechanism: To fully utilize the distribution network operation information contained in a large amount of unlabeled prediction data, this invention introduces a collaborative training mechanism, constructing two independent random forest classifiers to form a semi-supervised collaborative training structure. The specific process is as follows: Labeled samples are randomly divided into two subsets, which are used to train the first random forest model and the second random forest model respectively, ensuring that the two models maintain certain differences in structure and training samples; the trained random forest model is used to predict unlabeled samples, and samples with high prediction confidence are selected and assigned corresponding pseudo-labels; the pseudo-labeled unlabeled samples are cross-added to the training set of the other random forest model to update the model parameters; the above process is repeated until the prediction results of unlabeled samples tend to stabilize or reach the preset iteration conditions. Through the above collaborative training method, the random forest model can gradually mine the distribution network operation modes and fault characteristics hidden in the unlabeled samples under limited labeled sample conditions, improving the model's generalization ability to typhoon operation scenarios. After the model training is completed, when the classification results of the first random forest model and the second random forest model for the same new sample are inconsistent, the classification result with the larger weight is selected as the final output based on the classification confidence or weight coefficient of each model.
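The collaborative training loop can be sketched as below. To keep the example self-contained, a toy nearest-centroid classifier stands in for each random forest, and the labeled samples are split by alternating index rather than randomly so that the run is reproducible; both are assumptions, as are the class and function names and the confidence threshold.

```python
import math

class CentroidClassifier:
    """Toy stand-in for a random forest: predicts the nearest class centroid."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for k, v in enumerate(x):
                acc[k] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        """Return (label, confidence), confidence in [0, 1]."""
        dist = {c: math.dist(x, m) for c, m in self.centroids.items()}
        label = min(dist, key=dist.get)
        total = sum(dist.values())
        return label, (1.0 - dist[label] / total if total > 0 else 1.0)

def co_train(X_lab, y_lab, X_unlab, make_model, threshold=0.8, max_rounds=10):
    # Two labeled subsets, one per model (alternating split for reproducibility).
    sets = [(list(X_lab[0::2]), list(y_lab[0::2])),
            (list(X_lab[1::2]), list(y_lab[1::2]))]
    pool = list(X_unlab)
    models = [make_model().fit(*s) for s in sets]
    for _ in range(max_rounds):
        remaining, moved = [], 0
        for x in pool:
            used = False
            for src, model in enumerate(models):
                label, conf = model.predict(x)
                if conf >= threshold:            # confident pseudo-label
                    sets[1 - src][0].append(x)   # cross-add to the OTHER model
                    sets[1 - src][1].append(label)
                    used = True
            if used:
                moved += 1
            else:
                remaining.append(x)
        if moved == 0:                           # predictions have stabilized
            break
        pool = remaining
        models = [make_model().fit(*s) for s in sets]
    return models, pool

def predict_pair(models, x):
    """On disagreement, keep the answer of the more confident model."""
    return max((m.predict(x) for m in models), key=lambda p: p[1])[0]

X_lab, y_lab = [[0.0], [0.2], [1.0], [1.2]], [0, 0, 1, 1]
X_unlab = [[0.1], [1.1], [0.6]]
models, leftover = co_train(X_lab, y_lab, X_unlab, CentroidClassifier)
```

In this toy run the two clear-cut unlabeled samples receive pseudo-labels in the first round, while the ambiguous sample at 0.6 never reaches the confidence threshold and stays unlabeled.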

[0072] Feature Refinement and Screening Based on Random Forest: Based on the semi-supervised collaborative random forest model, different candidate feature subsets are evaluated one by one. The specific steps are as follows. Feature Subset Construction: From the initially screened candidate feature set, feature subsets to be evaluated are constructed according to a preset feature selection strategy (including but not limited to stepwise forward selection, stepwise backward selection, or heuristic search methods). Model Performance Evaluation: For each feature subset, the semi-supervised collaborative random forest model is trained and tested, and classification accuracy, recognition stability, or a comprehensive performance indicator is used as the evaluation criterion for the feature subset. Feature Subset Optimization: The model performance corresponding to different feature subsets is compared, and the feature subset that achieves the best model performance or meets the preset performance threshold is selected as the output of the encapsulation stage. The result is the continuous feature subset, a set of features refined through semi-supervised collaborative training of random forests. This key continuous fault feature subset has the following characteristics: it can effectively reflect the relationship between the overall operating status of the distribution network and the fault risk under typhoon operation scenarios; it significantly reduces feature redundancy while integrating operating features, static structural features, and environmental features; it makes full use of prediction data under limited labeled sample conditions, improving the stability and reliability of feature selection results; and it provides high-quality, low-dimensional input features for subsequent distribution network fault prediction models or risk assessment models.
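Of the selection strategies listed above, stepwise forward selection is the simplest to sketch. The `evaluate` callback stands for training and testing the co-trained random forest on a feature subset; here it is replaced by a hypothetical toy scoring function, and the feature names and gain values are illustrative only.

```python
def forward_select(candidates, evaluate, min_gain=1e-6):
    """Greedy forward selection: add the feature that most improves the score,
    stop when no remaining feature improves it by more than min_gain."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda t: t[0])
        if score - best <= min_gain:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Hypothetical stand-in for "train and test the model on this subset".
gain = {"wind_speed": 0.30, "voltage_dev": 0.25, "load_rate": 0.02}

def toy_score(subset):
    # Each feature adds its gain; every extra feature costs a small penalty.
    return 0.5 + sum(gain[f] for f in subset) - 0.05 * len(subset)

subset, score = forward_select(list(gain), toy_score)
```

The weak feature is left out because adding it lowers the toy score, which mirrors the goal of removing redundancy in the encapsulation stage.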

[0073] In the process of extracting continuous features, this invention constructs a semi-supervised temporal feature space that fully utilizes unlabeled samples, avoiding the feature extraction bias caused by insufficient labels. By introducing meteorological intensity difference weights, it strengthens the influence of key meteorological periods, so that the extracted continuous features better reflect the key mechanism by which typhoons affect power grid operation. By incorporating the distribution network topology as a constraint into the construction of the Laplace matrix, it ensures that the spatial distribution of the extracted continuous features is consistent with the actual power grid topology, improving the interpretability of features in fault location and cause analysis. The extraction proceeds in two steps: first, the improved semi-supervised Laplace algorithm performs initial feature screening, quickly eliminating redundant and irrelevant features to obtain an initial feature subset; then, the semi-supervised co-trained random forest refines the features, further filtering highly discriminative features through cross-validation of multiple base learners. This two-step approach controls computational complexity while ensuring the robustness and generalization ability of the final continuous feature subset.

[0074] This invention targets discrete features such as line type, tower structure, terrain type, and whether a lightning strike occurred. It uses a hierarchical mining strategy based on globally common and locally rare features to extract, from the multi-source heterogeneous dataset, the key discrete features that are strongly correlated with power distribution network faults and carry high risk. This solves the problems in existing technologies of low automation in processing discrete features and the inability to identify rare, high-risk factors.

[0075] In one embodiment, the step of extracting features from the multi-source heterogeneous dataset using a hierarchical mining strategy based on globally common and locally rare features to obtain a discrete feature subset includes: All discrete features in the multi-source heterogeneous dataset are extracted to construct a standard transaction database, and the standard transaction database is initially screened using the chi-square test method to obtain the initially screened transaction database. The Apriori algorithm is used to mine global common features in the initially screened transaction database to obtain global high-frequency itemsets of common features. The occurrence frequency of the initially screened transaction database is counted, and rare feature local high-frequency itemsets are extracted by combining the frequency threshold. The common risk weights corresponding to common features in the global high-frequency item set of common features are calculated using the conditional failure frequency gain method, and the rare risk weights corresponding to rare features in the local high-frequency item set of rare features are calculated using the Birnbaum component importance measurement method. Based on the common risk weights and the rare risk weights, feature filtering is performed on the global high-frequency itemset of the common features and the local high-frequency itemset of the rare features to obtain the discrete feature subset.

[0076] In this embodiment, all discrete features are extracted from the labeled samples of the multi-source heterogeneous dataset, and v denotes the feature dimension of the initial discrete feature set. Each discrete feature has k categories in total, and a single category value of a feature is called an item in association rule mining. Each sample's discrete feature set and its fault state are combined to construct a transaction record, where the fault state indicates whether a fault has occurred in the corresponding sample. The discrete features include, but are not limited to: power distribution network structure features (whether lightning strikes occurred, whether bird damage occurred, whether trees were felled, line type, line commissioning time range, whether reinforcement was carried out, whether energy storage devices were installed, etc.); meteorological environment features (wind direction, etc.); and topographic environment features (underlying surface type, surface roughness level, elevation range, slope range, slope type, etc.). The discrete features extracted from the labeled sample set form a discrete feature sample set, i.e. the standard transaction database, in which each record is a "transaction" and each discrete feature value is an "item". The discrete features are explicitly categorized into three types: power grid structure, meteorological environment, and terrain environment. This transforms the discrete features into a format that can be processed by association rule mining, laying the foundation for subsequent analysis. Next, the standard transaction database undergoes initial screening of discrete features based on the chi-square test. This quickly identifies categorical features that are significantly correlated with the target variable and eliminates discrete features that are not significantly correlated with faults. This data-driven initial screening and dimensionality reduction prevents dimensionality explosion during subsequent association rule mining.
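A minimal sketch of building such a transaction database is shown below. The `feature=value` item encoding, the `FAULT` / `NORMAL` labels, and the field names are assumptions chosen for illustration.

```python
def to_transaction(record, fault_key="fault"):
    """Turn one labeled sample into a transaction: a set of 'feature=value'
    items plus one item for the fault state."""
    items = {f"{k}={v}" for k, v in record.items() if k != fault_key}
    items.add("FAULT" if record[fault_key] else "NORMAL")
    return frozenset(items)

# Illustrative records only; real samples would carry many more features.
records = [
    {"line_type": "overhead", "lightning": "yes", "terrain": "hill", "fault": 1},
    {"line_type": "cable", "lightning": "no", "terrain": "plain", "fault": 0},
]
database = [to_transaction(r) for r in records]
```

Encoding the fault state as an ordinary item lets the later mining step treat rules of the form "items → FAULT" like any other association rule.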

[0077] For each candidate discrete feature, a contingency table is constructed and a chi-square test is performed. First, a contingency (frequency) table of observed counts is created. If the feature has k categories and the target variable Y has 2 categories (fault / normal), the table has k rows and 2 columns (Table 1, contingency table based on samples). Under the assumption that the feature is independent of Y, an expected contingency table is then built (Table 2, expected contingency table), in which the expected count of each cell equals its row total multiplied by its column total, divided by the total number of samples. The chi-square statistic is calculated from the two tables as the sum, over all cells, of the squared difference between the observed and expected counts divided by the expected count. Then the p-value is calculated: the p-value is the probability of observing the current contingency table or a more extreme one when the null hypothesis (the feature is independent of the target Y, i.e., the feature has no effect on the fault) holds. Here, it is used to determine whether the association between the feature and the target variable is significant. The smaller the p-value, the less credible the null hypothesis. Based on the chi-square statistic and the degrees of freedom, (k−1)(2−1) = k−1, the p-value can be obtained by consulting the chi-square distribution table or by using statistical software.
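For reference, the standard Pearson chi-square test for a k × 2 table can be written as follows. The notation is assumed here, since the original symbols are not available: $O_{ij}$ is the observed count in category $i$ and class $j$, $R_i$ and $C_j$ are the row and column totals, and $N$ is the total number of samples.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{2} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}}, \qquad
\mathrm{df} = (k-1)(2-1) = k-1 .
```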

[0078] Feature selection based on significance level: Set a significance level threshold α, typically 0.05 or 0.01. If p-value ≤ α, the null hypothesis is rejected: there is a statistically significant correlation between the feature and the fault status, and the feature is retained for the next round of analysis. If p-value > α, there is insufficient evidence to reject the null hypothesis: the feature may be irrelevant to the target variable and is removed. Optionally, this can be combined with chi-square value ranking: among features with significant p-values, sort them from the largest to the smallest chi-square value. The larger the chi-square value, the stronger the correlation between the feature and the target variable, which can serve as a preliminary reference for feature importance ranking. After the chi-square test, the eliminated features are checked manually. Clearly high-risk factors in business operations, such as "lightning strike status" or "special terrain", should be retained even if their p-value is not significant. Based on the screening results of the chi-square test, an initially screened transaction database G is formed, where w represents the feature dimension of the transaction database after initial screening.
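The screening rule can be sketched in plain Python as follows. The p-value uses the closed-form finite sums for the chi-square survival function instead of a statistics library; the `keep_always` argument models the manual retention of known high-risk factors. All names and the toy data (including the deliberately irrelevant `paint_color` feature) are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom >= x), closed form for integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total

def chi_square_test(values, faults):
    """Return (statistic, p-value) for one discrete feature vs. fault (0/1)."""
    cats = sorted(set(values))
    n = len(values)
    obs = {c: [0, 0] for c in cats}              # [normal count, fault count]
    for v, y in zip(values, faults):
        obs[v][y] += 1
    col = [sum(obs[c][j] for c in cats) for j in (0, 1)]
    stat = 0.0
    for c in cats:
        row = sum(obs[c])
        for j in (0, 1):
            expected = row * col[j] / n
            if expected > 0:
                stat += (obs[c][j] - expected) ** 2 / expected
    df = len(cats) - 1
    return stat, (chi2_sf(stat, df) if df > 0 else 1.0)

def screen(features, faults, alpha=0.05, keep_always=()):
    """Keep features with p <= alpha, plus manually protected ones."""
    return [name for name, values in features.items()
            if chi_square_test(values, faults)[1] <= alpha or name in keep_always]

faults = [1] * 10 + [0] * 10
features = {
    "lightning": ["yes"] * 9 + ["no"] + ["yes"] + ["no"] * 9,
    "paint_color": ["a", "b"] * 10,
    "terrain": ["hill", "plain"] * 10,
}
kept = screen(features, faults, alpha=0.05, keep_always={"terrain"})
```

In this toy data `terrain` is statistically indistinguishable from `paint_color`, yet it survives screening because it is listed as a protected high-risk factor.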

[0079] Association rule mining was employed to analyze the transaction database, identifying high-frequency itemsets and high-confidence association rules. To ensure a direct correlation between the selected features and distribution network faults, only association rules whose consequent is a fault state were retained during the rule mining process. Specifically, when a discrete feature element is part of an association rule's antecedent and forms an association rule with a fault state that satisfies the support and confidence thresholds, that element is considered to have a significant statistical association with the occurrence of the fault. Furthermore, discrete feature elements entering high-frequency itemsets were also considered to have potential fault association value in the sample set.

[0080] By using the above method, a set of discrete feature elements that are significantly correlated with the occurrence of the fault is obtained, which serves as the candidate element set for subsequent risk weight assessment. Discrete feature elements that are not included in the high-frequency itemset and are not used as antecedents of the fault association rules will be removed and will no longer participate in subsequent calculations, thereby reducing the interference of irrelevant or weakly correlated features on the feature selection results.

[0081] This invention applies the Apriori algorithm to the initially screened transaction database G for association rule mining, and uses the traditional importance assessment criteria, namely support and confidence, to mine common high-frequency itemsets and the corresponding high-confidence association rules. In these criteria, X represents a single item or a set of items in the association rule mining process, and M represents the fault state corresponding to the sample. The support of a rule X → M is the cardinality of the set of transactions in G that simultaneously satisfy all the included conditions, divided by the total number of transactions in G; the confidence is the number of transactions containing both X and M divided by the number of transactions containing X. The index i = 2, 3, ..., l+1 denotes a column of the evaluated transaction database G, with column indices ranging from 2 to (w+1).

[0082] The minimum support is set to 0.1 (meaning it must appear in at least 10% of transactions), and the minimum confidence is set to 0.6. Based on the above calculation results, a global high-frequency itemset of common features is obtained.
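A compact sketch of this mining step, using the thresholds stated above (minimum support 0.1, minimum confidence 0.6), is given below. It is a straightforward level-wise Apriori written for readability rather than speed; the item encoding and the `FAULT` label are assumptions carried over for illustration.

```python
from itertools import combinations

def support(itemset, db):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)

def apriori(db, min_sup=0.1):
    """Level-wise search for all itemsets with support >= min_sup."""
    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i]), db) >= min_sup]
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s, db)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must already be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
                 and support(c, db) >= min_sup]
    return frequent

def fault_rules(frequent, db, fault="FAULT", min_conf=0.6):
    """Keep only rules whose consequent is the fault state."""
    rules = []
    for s, sup in frequent.items():
        if fault in s and len(s) > 1:
            antecedent = s - {fault}
            conf = sup / support(antecedent, db)
            if conf >= min_conf:
                rules.append((antecedent, sup, conf))
    return rules

db = [
    frozenset({"lightning=yes", "terrain=hill", "FAULT"}),
    frozenset({"lightning=yes", "terrain=hill", "FAULT"}),
    frozenset({"lightning=yes", "terrain=plain", "FAULT"}),
    frozenset({"lightning=no", "terrain=plain", "NORMAL"}),
    frozenset({"lightning=no", "terrain=hill", "NORMAL"}),
]
frequent = apriori(db, min_sup=0.1)
rules = fault_rules(frequent, db, min_conf=0.6)
```

Restricting the consequent to the fault state follows the requirement stated earlier that only rules pointing to a fault are retained.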

[0083] In addition, the importance assessment of traditional association rules is based on the frequency of element occurrence. This approach often ignores rare and high-risk factors. Therefore, this step improves the importance assessment formula based on the assessment logic of "conditional typicality in rare subsets". At the same time, the assessment is performed on the constructed rare factor database instead of the entire database, so as to effectively discover rare and high-risk factors.

[0084] Suppose a discrete feature has k categories in total, each category being one specific value of the feature. For each discrete feature, the frequency of occurrence of each of its values is counted. Values whose frequency is below the global threshold are classified as rare values, and values above the threshold are classified as common values, forming a rare item set Xr and a common item set Xc.
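The split into rare and common values can be sketched as follows. The threshold of 5% and the feature name are assumptions; the text only states that a global threshold is used.

```python
from collections import Counter

def split_rare_common(values, feature, threshold=0.05):
    """Return (rare items Xr, common items Xc) for one discrete feature."""
    n = len(values)
    rare, common = set(), set()
    for v, count in Counter(values).items():
        (rare if count / n < threshold else common).add(f"{feature}={v}")
    return rare, common

# Illustrative distribution of tower materials over 100 samples.
tower = ["concrete"] * 70 + ["steel"] * 27 + ["wood"] * 3
Xr, Xc = split_rare_common(tower, "tower", threshold=0.05)
```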

[0085] Take the first discrete feature as input and extract the fault records containing rare values of that feature to construct the corresponding sub-database, i.e., the transaction records of G that contain any rare element of the feature. In the sub-database, a feature-customized importance evaluation criterion is used to calculate local support and local confidence, and the support threshold min_SUs is lowered (e.g., 1% or an adaptive threshold). The Apriori algorithm is then called again to mine rare local high-frequency itemsets, each combining a rare element with common elements, together with the corresponding rules of the form rare element + common elements → Fault. These steps are performed sequentially on all w discrete features to achieve comprehensive and focused discovery of rare, high-risk combinations (single rare factor + multiple common factors → fault). In the calculation of local support and local confidence, j = 2, 3, ..., w+1 indexes the feature attributes.
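A minimal sketch of the local evaluation is shown below. Because the original formulas are not available, the definitions here are assumptions: local support is taken as the share of the rare sub-database that contains the itemset, and local confidence as the share of all transactions containing the itemset that are faults.

```python
def rare_sub_database(db, rare_items, fault="FAULT"):
    """Fault transactions that contain at least one rare value of the feature."""
    return [t for t in db if fault in t and t & rare_items]

def local_support(itemset, sub_db):
    """Assumed definition: fraction of the sub-database containing itemset."""
    if not sub_db:
        return 0.0
    return sum(1 for t in sub_db if itemset <= t) / len(sub_db)

def local_confidence(itemset, db, fault="FAULT"):
    """Assumed definition: fault share among all transactions containing itemset."""
    with_x = [t for t in db if itemset <= t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if fault in t) / len(with_x)

# Toy database; "tower=wood" plays the role of the rare value.
db = [
    frozenset({"tower=wood", "terrain=hill", "FAULT"}),
    frozenset({"tower=wood", "terrain=hill", "FAULT"}),
    frozenset({"tower=wood", "terrain=plain", "NORMAL"}),
    frozenset({"tower=steel", "terrain=hill", "NORMAL"}),
    frozenset({"tower=steel", "terrain=plain", "NORMAL"}),
]
sub = rare_sub_database(db, {"tower=wood"})
combo = frozenset({"tower=wood", "terrain=hill"})
```

Under these assumed definitions, the combination of the rare tower type with hilly terrain has a higher local confidence than the rare tower type alone, which is the kind of "rare factor plus common factor" pattern the step is meant to surface.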

[0086] Meanwhile, based on human experience and task requirements, local support thresholds and local confidence thresholds are set, and rare feature local high-frequency itemsets are obtained by combining the above calculation results.

[0087] This invention calculates the failure risk weight only for the feature values (items) that successfully enter a high-frequency itemset; items that were not discovered are disregarded by default, which automatically eliminates noise interference. Furthermore, to avoid overestimating common but low-risk elements by calculating risk weights solely from element frequency, this invention proposes a risk weight calculation method based on conditional fault frequency gain for common discrete feature elements. While maintaining frequency interpretability, this method introduces a fault discrimination constraint, ensuring that the risk weights of common elements accurately reflect their actual impact on fault occurrence. The common risk weight corresponding to a common feature in the global high-frequency itemset of common features is defined using the difference between the conditional fault frequency ratio and the baseline fault frequency ratio as a correction coefficient. In the formula, the first term is the frequency of occurrence of the element in the sample, which reflects the representativeness of the element. In the second term, the baseline fault frequency ratio of the entire sample characterizes the overall average fault level of the system without considering any feature, and the conditional fault frequency ratio of the element characterizes the relative frequency of faults when the element occurs. When the occurrence of an element does not increase the relative frequency of faults, its risk weight is suppressed to zero, thereby avoiding interference from common but low-risk elements in the key feature selection results.
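One way to read the description above is as the product of the element's occurrence frequency and its fault frequency gain, clipped at zero. That functional form is an assumption reconstructed from the prose, since the formula itself is not available, and the sketch below should be read in that light.

```python
def common_risk_weight(item, db, fault="FAULT"):
    """Assumed form: freq(item) * max(0, P(fault | item) - P(fault))."""
    n = len(db)
    with_item = [t for t in db if item in t]
    if not with_item:
        return 0.0
    freq = len(with_item) / n                               # representativeness
    base = sum(1 for t in db if fault in t) / n             # baseline fault ratio
    cond = sum(1 for t in with_item if fault in t) / len(with_item)
    return freq * max(0.0, cond - base)                     # no gain -> weight 0

# Toy database of 10 transactions, 4 of them faults.
db = (
    [frozenset({"lightning=yes", "line=overhead", "FAULT"})] * 3
    + [frozenset({"lightning=yes", "line=cable", "FAULT"})]
    + [frozenset({"lightning=yes", "line=overhead", "NORMAL"})]
    + [frozenset({"lightning=no", "line=overhead", "NORMAL"})] * 4
    + [frozenset({"lightning=no", "line=cable", "NORMAL"})]
)
w_lightning = common_risk_weight("lightning=yes", db)
w_overhead = common_risk_weight("line=overhead", db)
```

Here the overhead line type appears in most transactions but does not raise the fault ratio above the baseline, so its weight is suppressed to zero even though it is frequent.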

[0088] If an element is a rare element, a calculation method based on Birnbaum's Component Importance Measurement (CIM) is used to evaluate the difference in the element's impact on the overall failure risk of the system when it is present and when it is absent. Based on this difference in system failure risk, the "causal contribution" or "risk increment" of the element to the failure is quantified, so as to accurately characterize the contribution of low-frequency but high-risk factors to the occurrence of failure. This method is especially suitable for capturing low-frequency-high-risk (LFHR) events.

[0089] The rare risk weight corresponding to a rare feature in the local high-frequency itemset of rare features is defined as the difference of two terms: the first term is the proportion of failures occurring when the element is present, and the second term is the proportion of failures occurring when the element is absent. At the sample estimation level, these probabilities are approximated by the corresponding frequencies in the dataset.
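In Birnbaum-style notation, this difference and its sample estimate can be written as below. The symbols are assumptions introduced for illustration: $x$ is the rare element, $F$ the fault event, $n_x$ and $n_{\bar{x}}$ the numbers of samples with and without the element, and $n_{x,F}$, $n_{\bar{x},F}$ the numbers of faults among them.

```latex
w_r(x) = P(F \mid x) - P(F \mid \bar{x})
\;\approx\; \frac{n_{x,F}}{n_x} - \frac{n_{\bar{x},F}}{n_{\bar{x}}} .
```

For example, if 3 samples contain the element and 2 of them are faults, while 97 samples do not contain it and 10 of them are faults, the estimate is $2/3 - 10/97 \approx 0.56$.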

[0090] By combining the risk weights of rare and common elements, all candidate discrete feature elements are sorted according to their risk weights. Elements whose risk weight exceeds a preset threshold, or that rank among the top positions, are selected as key features of distribution network faults in typhoon scenarios, forming the discrete feature subset. This selection method effectively reduces feature dimensionality while preserving the integrity of fault-related information, and it highlights low-frequency, high-risk features and features with high risk gain, providing a reliable feature foundation for subsequent fault analysis, risk assessment, and prediction model construction. Before sorting, the calculated risk weights are normalized by the maximum absolute value of the original risk weights of all elements, so that the risk weights of different types of elements can be compared within the same sorting framework. During sorting, for each discrete feature, the maximum absolute risk weight among all its values is taken as the representative importance of that feature, and the top K features, ranked from high to low, are selected as key discrete features of distribution network faults in typhoon scenarios. By designing separate risk weight calculation methods for rare and common discrete values, and by using the extreme value principle to aggregate importance at the feature level, the method ensures effective identification of low-frequency high-risk factors and avoids interference from high-frequency low-risk factors, thereby achieving stable and reliable fault feature selection.
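The normalization and feature-level aggregation described above can be sketched as follows. The `feature=value` item naming and the weight values are illustrative assumptions.

```python
def rank_features(item_weights, top_k):
    """Max-abs normalize item weights, then score each feature by the largest
    normalized weight among its values and return the top_k features."""
    max_abs = max(abs(w) for w in item_weights.values()) or 1.0
    feature_score = {}
    for item, w in item_weights.items():
        feature = item.split("=", 1)[0]
        score = abs(w) / max_abs
        feature_score[feature] = max(feature_score.get(feature, 0.0), score)
    ranked = sorted(feature_score, key=feature_score.get, reverse=True)
    return ranked[:top_k], feature_score

# Illustrative weights: common-element and rare-element weights mixed together.
weights = {
    "lightning=yes": 0.2,
    "line=overhead": 0.0,
    "tower=wood": 0.4,
    "tower=steel": 0.05,
    "terrain=hill": 0.1,
}
key_features, scores = rank_features(weights, top_k=2)
```

Taking the maximum over a feature's values means a feature is kept if any one of its values is high-risk, which is how a single rare value such as a wooden tower can carry the whole feature into the final subset.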

[0091] In discrete feature extraction, this invention employs a hierarchical mining approach combining "globally common" and "locally rare" features. This allows for the simultaneous capture of common features across the entire network and unique features within specific feeders or regions, avoiding the problem of overlooking important local information due to single-threshold screening. Initial feature screening uses the chi-square test to eliminate discrete features that are irrelevant or weakly correlated with fault labels, significantly reducing the dimensionality of subsequent mining and improving mining efficiency and reliability. The classic Apriori algorithm is then used to perform global frequent itemset mining on the initially screened transaction database. This effectively identifies frequently occurring discrete feature combinations in typhoon scenarios, revealing the correlations between different equipment states and between equipment states and meteorological conditions, thus providing a basis for fault mechanism analysis. Rare features are identified by counting their frequency of occurrence and combining it with thresholds to extract local high-frequency itemsets. This reveals abnormal patterns that are rare across the entire network but occur frequently on specific feeders or during specific time periods, which is of great value for identifying local hidden dangers and for differentiated operation and maintenance. Common features are evaluated using the conditional fault frequency gain method to assess their gain under fault conditions, highlighting common patterns closely associated with faults. Rare features are evaluated using the Birnbaum component importance measurement method to assess the impact of changes from normal to abnormal states on system risk, highlighting rare but significant features. This differentiated processing ensures that the final selected discrete feature subset has both universal applicability and key specificity, improving the comprehensive discriminative ability of the features.

[0092] After the above processing, the parallel dual-branch extraction path outputs two results. The first is a continuous feature subset obtained by the semi-supervised filtering-encapsulation (filter-wrapper) method, which mainly reflects how the operating status of the distribution network changes under typhoon scenarios. The second is a discrete feature subset obtained by association rule mining and risk weight assessment, which mainly reflects how distribution network structure, equipment attributes, and environmental conditions affect fault risk.

[0093] S3. The continuous feature subset and the discrete feature subset are fused to obtain the final set of key fault features. Specifically, after extracting the key features of the continuous feature path and the discrete feature path, the present invention further performs fusion analysis on the two types of features to comprehensively confirm the importance of key fault features in the typhoon operation scenario of the distribution network, and constructs a feature-fault state mapping model with good interpretability, thereby enhancing the understandability and operability of the feature extraction results in actual power system operation and maintenance decision-making.

[0094] By directly integrating continuous and discrete feature subsets, a final set of key fault features is constructed. This set of features covers power grid operation, structure, meteorology, geography, and social behavior. The data attributes include both continuous and discrete quantities. All features are highly correlated with faults and have low redundancy, which can characterize the formation mechanism of distribution network faults in typhoon scenarios from multiple dimensions.

[0095] To further verify the effectiveness of the aforementioned dual-path feature extraction method and to conduct a unified importance assessment of the obtained key features, this invention introduces the Classification and Regression Tree (CART) decision tree algorithm to model and analyze the fused feature set. Specifically, using the fault state in historical operational data as the target variable and the fused key feature set as the input features, a CART decision tree model is constructed. The overall importance of each feature in fault identification is quantified by its contribution to the improvement of sample purity during decision tree node partitioning. A high importance for a feature indicates that the result obtained for it in the continuous or discrete feature selection path is consistent and reliable, thereby providing overall verification of the aforementioned feature extraction method.

[0096] Compared to black-box models, CART decision trees can intuitively express the logical relationship between features and fault states in the form of a tree structure and rules. Therefore, after completing model training, this invention further analyzes the decision tree structure, extracts the decision paths, and forms a set of clear qualitative mapping rules between features and fault states. Specifically, CART decision trees can clearly show how continuous feature values within a certain range, or a specific discrete feature value, are associated with fault states, forming a clear mapping from features to fault states. For example, the splitting conditions of tree nodes show that when a continuous feature (such as voltage amplitude) falls within a specific interval, or when a discrete feature (such as tower type) has a specific value, the probability of a fault occurring increases significantly. These mapping rules not only help verify the rationality of the extracted key features, but also provide distribution network operators with an intuitive and understandable basis for decision-making.
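The extraction of decision paths as rules can be sketched on a small hand-written tree. The nested-dictionary representation, the split conditions, and the fault rates below are hypothetical and purely illustrative; a trained CART model would supply its own structure and values.

```python
def extract_rules(node, path=()):
    """Walk a binary decision tree and return one rule per leaf as
    (list of conditions, predicted label, fault rate at the leaf)."""
    if "label" in node:
        return [(list(path), node["label"], node["fault_rate"])]
    cond = node["condition"]
    rules = extract_rules(node["yes"], path + (cond,))
    rules += extract_rules(node["no"], path + (f"not ({cond})",))
    return rules

# Hypothetical tree mixing a continuous split and a discrete split.
tree = {
    "condition": "wind_speed > 30",
    "yes": {
        "condition": "tower_type == wood",
        "yes": {"label": "fault", "fault_rate": 0.9},
        "no": {"label": "fault", "fault_rate": 0.6},
    },
    "no": {"label": "normal", "fault_rate": 0.05},
}
rules = extract_rules(tree)
for conditions, label, rate in rules:
    print(" AND ".join(conditions), "->", label, f"(fault rate {rate})")
```

Each printed line is one root-to-leaf path, which is the form of qualitative feature-to-fault mapping rule described above.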

[0097] This embodiment achieves unified verification and interpretation of the output results from continuous and discrete feature paths, ensuring that the final key feature subset is not only statistically highly correlated with distribution network faults but also possesses a clear physical and engineering explanation in terms of logical relationships. The resulting interpretability model helps improve the applicability and reliability of fault feature extraction results in distribution network typhoon operation scenarios for actual operation and maintenance, risk warning, and scheduling decisions.

[0098] This application addresses the limitations of existing feature extraction and selection schemes for distribution networks under typhoon scenarios, which suffer from single data sources and limited accuracy. It proposes a heterogeneous data feature extraction method for distribution networks operating during typhoons. This method simultaneously incorporates power grid operation data, static structural parameters, geographic information, meteorological data, and user load data, covering key factors affecting distribution network failures under typhoon disasters. By uniformly processing multi-source data, it solves the problems of diverse data types, inconsistent formats, and significant differences in spatiotemporal scales, laying a high-quality data foundation for subsequent feature extraction. A parallel dual-branch extraction path is constructed, employing different algorithms for continuous and discrete features respectively, avoiding the problem of insufficient processing capacity for certain types of information by a single feature extraction method. Specifically, the continuous feature extraction path introduces an improved semi-supervised Laplace algorithm based on meteorological intensity difference weights and distribution network topology constraints. It can effectively capture the nonlinear changes in the impact of meteorological factors on the power grid and enhance the correlation between features and fault locations by utilizing topological constraints. It can still stably extract key continuous features even in typhoon scenarios with limited labeled data. The discrete feature extraction path adopts a hierarchical mining strategy of global common and local rare features, which can identify globally common but easily ignored information in discrete features, while mining features that are rare in local areas but sensitive to faults, thus improving the representational ability of discrete features. 
The fusion of continuous feature subsets and discrete feature subsets forms the final set of key fault features, taking into account the complementarity and synergy between features, and improving the model's generalization ability and interpretability in typhoon scenarios. The solution does not rely on a large number of manually labeled samples and adopts a semi-supervised and hierarchical mining strategy. It can still run effectively under the conditions of high difficulty in obtaining actual typhoon data and scarce labels, and has good feasibility and engineering applicability.

[0099] It should be noted that although the steps in the flowchart above are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless otherwise specified in this document, there is no strict order requirement for the execution of these steps, and they can be executed in other orders.

[0100] In another embodiment, as shown in Figure 2, a second aspect of the present invention provides a heterogeneous data feature extraction system for a power distribution network typhoon operation scenario, comprising: a dataset construction module 10, used to acquire multi-source heterogeneous data of the power distribution network under typhoon scenarios and to process the multi-source heterogeneous data to form a multi-source heterogeneous dataset; a feature extraction module 20, used to construct a parallel dual-branch extraction path that includes a continuous feature extraction path and a discrete feature extraction path, wherein the continuous feature extraction path extracts features from the multi-source heterogeneous dataset using an improved semi-supervised Laplace algorithm that introduces meteorological intensity difference weights and the topological constraints of the distribution network, obtaining a continuous feature subset, and the discrete feature extraction path extracts features from the multi-source heterogeneous dataset using a hierarchical mining strategy based on globally common and locally rare features, obtaining a discrete feature subset; and a feature fusion module 30, used to fuse the continuous feature subset and the discrete feature subset to obtain the final set of key fault features.
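As a non-limiting illustration, the cooperation of modules 10, 20, and 30 can be sketched in Python as follows. The function names, the `feeder_id` key, the rule that routes float-valued columns to the continuous branch and string-valued columns to the discrete branch, and the order-preserving union used for fusion are all assumptions made for this sketch; the two branch selectors are passed in as placeholders rather than being the algorithms of this application.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]
Selector = Callable[[List[Record], List[str]], List[str]]


def build_dataset(sources: Sequence[List[Record]]) -> List[Record]:
    """Module 10 (sketch): merge records from several sources by feeder id."""
    merged: Dict[str, Record] = {}
    for source in sources:
        for record in source:
            # "feeder_id" is a hypothetical join key, not a name from the application.
            merged.setdefault(str(record["feeder_id"]), {}).update(record)
    return [merged[key] for key in sorted(merged)]


def extract_features(
    dataset: List[Record],
    continuous_selector: Selector,
    discrete_selector: Selector,
) -> Tuple[List[str], List[str]]:
    """Module 20 (sketch): route columns to two independent branches by value type."""
    continuous = sorted(
        {k for r in dataset for k, v in r.items() if isinstance(v, float)}
    )
    discrete = sorted(
        {k for r in dataset for k, v in r.items()
         if isinstance(v, str) and k != "feeder_id"}
    )
    return continuous_selector(dataset, continuous), discrete_selector(dataset, discrete)


def fuse(continuous_subset: List[str], discrete_subset: List[str]) -> List[str]:
    """Module 30 (sketch): order-preserving union of the two feature subsets."""
    return list(dict.fromkeys([*continuous_subset, *discrete_subset]))
```

Because the two selectors are independent callables, either branch can be replaced or run concurrently without affecting the other, which is the point of the parallel dual-branch structure.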

[0101] It should be noted that each module in the aforementioned heterogeneous data feature extraction system for a power distribution network under typhoon operation scenarios can be implemented entirely or partially through software, hardware, or a combination thereof. These modules can be embedded in or independent of the processor in a computer device, or stored in the memory of a computer device as software, so that the processor can call and execute the corresponding operations of each module. For specific limitations regarding the heterogeneous data feature extraction system for a power distribution network under typhoon operation scenarios, please refer to the limitations of the heterogeneous data feature extraction method for a power distribution network under typhoon operation scenarios described above; both have the same function and role, and will not be repeated here.

[0102] In summary, this invention relates to the field of power system automation technology, and discloses a method and system for feature extraction from heterogeneous data in a distribution network operating scenario under typhoon conditions. It constructs a multi-source heterogeneous dataset based on multi-source heterogeneous data from the distribution network under typhoon conditions; designs a parallel dual-branch extraction path, where the continuous feature extraction path extracts a subset of continuous features from the multi-source heterogeneous dataset using an improved semi-supervised Laplace algorithm incorporating meteorological intensity difference weights and topological constraints; and the discrete feature extraction path extracts a subset of discrete features from the multi-source heterogeneous dataset using a hierarchical mining strategy based on globally common and locally rare features. The continuous and discrete feature subsets are then fused to obtain the final set of key fault features. This effectively fuses multi-source heterogeneous data, overcoming the problems of scarce fault samples and missing physical mechanisms, and significantly improving the accuracy, robustness, and interpretability of fault feature extraction from the distribution network under typhoon conditions.

[0103] The embodiments in this specification are described in a progressive manner. For parts that are identical or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are basically similar to the method embodiments, their description is relatively brief; for relevant details, refer to the descriptions of the method embodiments. It should be noted that the technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.

[0104] The embodiments described above are merely preferred embodiments of this application, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention patent. It should be noted that those skilled in the art can make various improvements and substitutions without departing from the technical principles of this invention, and these improvements and substitutions should also be considered within the scope of protection of this application. Therefore, the scope of protection of this patent application should be determined by the scope of the claims.

Claims

1. A method for extracting features from heterogeneous data in a typhoon operation scenario of a power distribution network, characterized in that it comprises: acquiring multi-source heterogeneous data of the power distribution network under typhoon scenarios, and processing the multi-source heterogeneous data to form a multi-source heterogeneous dataset; constructing a parallel dual-branch extraction path that includes a continuous feature extraction path and a discrete feature extraction path, wherein the continuous feature extraction path extracts features from the multi-source heterogeneous dataset using an improved semi-supervised Laplace algorithm that introduces meteorological intensity difference weights and the topological constraints of the distribution network, to obtain a continuous feature subset, and the discrete feature extraction path extracts features from the multi-source heterogeneous dataset using a hierarchical mining strategy based on globally common and locally rare features, to obtain a discrete feature subset; and fusing the continuous feature subset and the discrete feature subset to obtain the final set of key fault features.

2. The method of claim 1, wherein the multi-source heterogeneous data includes power grid operation data, power grid static structural parameters, geographic information data, meteorological data, and user load data; and wherein processing the multi-source heterogeneous data to form a multi-source heterogeneous dataset includes: taking each electrical quantity in the power grid operation data as an operation feature, and unifying the feature dimensions of the operation features to obtain power grid operation feature data; fusing the power grid static structural parameters and the geographic information data on a feeder-by-feeder basis to obtain a feeder-level integrated static feature vector; extracting target meteorological data from the meteorological data based on the impact of the typhoon, and processing the target meteorological data using the worst meteorological conditions representation method to obtain whole-network meteorological characteristics; aggregating the user load data on a per-feeder basis, and clustering the aggregation results to obtain user load characteristics; and combining the power grid operation feature data, the feeder-level integrated static feature vector, the whole-network meteorological characteristics, and the user load characteristics to form the multi-source heterogeneous dataset.

3. The method of claim 2, wherein fusing the power grid static structural parameters and the geographic information data on a feeder-by-feeder basis to obtain a feeder-level integrated static feature vector includes: constructing, for each node object or line object in the distribution network and based on the power grid static structural parameters and the geographic information data, an object-level risk representation vector that includes structural vulnerability indicators, geographic exposure indicators, and topological criticality indicators; calculating, for all objects within the same feeder, the statistical features of the object-level risk representation vectors, and determining the proportion of objects at each risk level within each feeder by clustering and binning to obtain risk distribution features; and combining the statistical features and the risk distribution features to form the feeder-level integrated static feature vector.
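As a non-limiting illustration of the feeder-level aggregation recited in claim 3, the following Python sketch builds one feeder vector from object-level risk vectors. The choice of statistics (mean, maximum, population standard deviation), the composite risk taken as the mean of the three indicators, and the fixed bin edges 0.33 and 0.66 are assumptions for this sketch; the claim itself recites clustering and binning without fixing these details.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

RiskVector = Tuple[float, float, float]  # (structural, geographic, topological)


def feeder_static_vector(
    object_risks: Sequence[RiskVector],
    bin_edges: Tuple[float, float] = (0.33, 0.66),  # assumed low/mid/high cut points
) -> List[float]:
    """Combine per-indicator statistics with risk-level proportions for one feeder."""
    stats: List[float] = []
    for indicator_values in zip(*object_risks):
        # Three statistics per indicator: central tendency, worst case, spread.
        stats += [mean(indicator_values), max(indicator_values), pstdev(indicator_values)]

    # Composite risk per object: plain average of its three indicators (assumption).
    composite = [mean(obj) for obj in object_risks]
    n = len(composite)
    low = sum(c < bin_edges[0] for c in composite)
    high = sum(c >= bin_edges[1] for c in composite)
    mid = n - low - high

    # Risk distribution features: share of objects in each risk level.
    return stats + [low / n, mid / n, high / n]
```

The resulting vector has a fixed length (nine statistics plus three proportions) regardless of how many poles or line sections a feeder contains, which lets feeders of different sizes be compared directly.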

4. The method of claim 2, wherein aggregating the user load data on a per-feeder basis and clustering the aggregation results to obtain user load characteristics includes: aggregating the user load data on a feeder-by-feeder basis to obtain a feeder-level aggregated load sequence; extracting statistical and correlation features from the feeder-level aggregated load sequence, and using a clustering algorithm to classify the feeders into three categories: meteorologically sensitive, load stable, and behaviorally random; and processing the feeder-level aggregated load sequence based on the classification labels to obtain the user load characteristics.
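As a non-limiting illustration of the three-way feeder classification recited in claim 4, the following Python sketch describes each feeder by two numbers and groups the feeders with a small k-means routine. The two descriptors (absolute Pearson correlation between load and wind speed, and the coefficient of variation of the load), the use of k-means, and the prototype seed centers are assumptions for this sketch; the claim does not name a specific clustering algorithm or feature set.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Population Pearson correlation; returns 0.0 for a constant series."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    return mean((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def load_descriptor(load: Sequence[float], wind: Sequence[float]) -> Point:
    """(weather coupling, relative variability) of one feeder-level load sequence."""
    return (abs(pearson(load, wind)), pstdev(load) / mean(load))


def kmeans(points: List[Point], centers: List[Point], iters: int = 20):
    """Plain Lloyd iterations starting from caller-supplied seed centers."""
    labels: List[int] = []
    for _ in range(iters):
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        updated = []
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster received no members.
            updated.append(tuple(mean(d) for d in zip(*members)) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers
```

Seeding the three centers with rough prototypes, for example high correlation for the meteorologically sensitive class, low variability for the load stable class, and low correlation with high variability for the behaviorally random class, makes the cluster indices map to the three categories in a fixed order.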

5. The method of claim 3, wherein extracting features from the multi-source heterogeneous dataset using the improved semi-supervised Laplace algorithm, which introduces meteorological intensity difference weights and the topological constraints of the distribution network, to obtain a continuous feature subset includes: extracting all continuous features from the multi-source heterogeneous dataset to construct a semi-supervised temporal feature space; performing initial feature screening on the semi-supervised temporal feature space using the improved semi-supervised Laplace algorithm to obtain an initial feature subset; and refining the initial feature subset using a semi-supervised co-training random forest to obtain the continuous feature subset.

6. The method of claim 5, wherein performing initial feature screening on the semi-supervised temporal feature space using the improved semi-supervised Laplace algorithm to obtain an initial feature subset includes: calculating, based on the semi-supervised temporal feature space, the meteorological intensity difference weight and the wind direction anisotropy propagation weight to construct a feature map; constructing a power grid topology map using the multi-source heterogeneous dataset, and fusing the feature map and the power grid topology map to generate a coupled Laplacian matrix; calculating the score of each continuous feature in the semi-supervised temporal feature space based on the coupled Laplacian matrix; and sorting the continuous features in the semi-supervised temporal feature space by score, and filtering the sorted results by a threshold according to a preset ratio to obtain the initial feature subset.
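As a non-limiting illustration of the scoring and ratio-based screening recited in claim 6, the following Python sketch computes the classic unsupervised Laplacian score of each feature from a given sample-to-sample weight matrix and keeps a preset fraction of the features. It is a baseline stand-in, not the improved semi-supervised algorithm of this application: the weight matrix `W` is taken as an input, no label information is used, and the convention that a lower score is better comes from the classic formulation and is an assumption here.

```python
from typing import List, Sequence


def laplacian_scores(X: Sequence[Sequence[float]], W: Sequence[Sequence[float]]) -> List[float]:
    """Classic Laplacian score per feature.

    X: n samples x m features. W: symmetric, non-negative n x n graph weights.
    A lower score means the feature varies smoothly over the graph.
    """
    n, m = len(X), len(X[0])
    degree = [sum(row) for row in W]
    total_degree = sum(degree)
    scores = []
    for j in range(m):
        f = [X[i][j] for i in range(n)]
        # Remove the degree-weighted mean so constant features are not favoured.
        mu = sum(f[i] * degree[i] for i in range(n)) / total_degree
        f = [v - mu for v in f]
        # f^T L f with L = D - W equals 0.5 * sum_ij W_ij (f_i - f_j)^2 for symmetric W.
        smoothness = 0.5 * sum(
            W[i][k] * (f[i] - f[k]) ** 2 for i in range(n) for k in range(n)
        )
        variance = sum(degree[i] * f[i] ** 2 for i in range(n))  # f^T D f
        scores.append(smoothness / variance if variance > 0 else float("inf"))
    return scores


def select_by_ratio(scores: Sequence[float], ratio: float) -> List[int]:
    """Keep the best-scoring fraction of features (at least one); returns column indices."""
    keep = max(1, round(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j])
    return sorted(ranked[:keep])
```

In the claimed method the matrix passed as `W` would correspond to the graph behind the coupled Laplacian matrix, so that features which change abruptly between strongly connected samples are ranked lower.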

7. The method of claim 6, wherein calculating the meteorological intensity difference weight and the wind direction anisotropy propagation weight based on the semi-supervised temporal feature space to construct a feature map includes: determining, based on the semi-supervised temporal feature space, the meteorological intensity value of each node object or line object, and calculating the meteorological intensity difference weight by combining that value with the object's structural vulnerability indicator; determining, based on the semi-supervised temporal feature space, the connection direction angle of the power distribution network in the typhoon-affected area, and determining the wind direction anisotropy propagation weight based on the connection direction angle; calculating the spatial distance between node objects or line objects, and determining the intra-class feature map weight matrix by combining the spatial distance with the meteorological intensity difference weight and the wind direction anisotropy propagation weight; and constructing the inter-class feature map weight matrix using the multi-source heterogeneous dataset, and constructing the feature map based on the intra-class feature map weight matrix and the inter-class feature map weight matrix.

8. The method of claim 7, wherein constructing a power grid topology map using the multi-source heterogeneous dataset, and fusing the feature map and the power grid topology map to generate a coupled Laplacian matrix, includes: determining, based on the multi-source heterogeneous dataset, the adjacency relationship of each node object in the distribution network and the commissioning status of each line object to construct a topology state matrix; determining the topological distance between samples based on the topology state matrix, and constructing the intra-class graph weight matrix of the topology structure based on the topological distance between samples; constructing the inter-class graph weight matrix of the topology structure using the multi-source heterogeneous dataset, and combining it with the intra-class graph weight matrix of the topology structure to obtain the power grid topology map; and weighting the feature map and the power grid topology map to obtain the coupled Laplacian matrix.
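As a non-limiting illustration of the topology-related steps recited in claim 8, the following Python sketch derives hop distances from in-service lines, converts them to weights, and blends those weights with a feature map weight matrix before forming a Laplacian. The hop-count distance, the exponential kernel with parameter `scale`, the blending coefficient `alpha`, and the omission of the separate intra-class and inter-class graphs are all assumptions for this sketch.

```python
from collections import deque
from math import exp
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Line = Tuple[int, int, bool]  # (node u, node v, line in service?)


def hop_distances(n: int, lines: Sequence[Line]) -> Matrix:
    """Breadth-first hop counts over in-service lines; inf where nodes are disconnected."""
    adjacency: List[List[int]] = [[] for _ in range(n)]
    for u, v, in_service in lines:
        if in_service:  # out-of-service lines do not connect their end nodes
            adjacency[u].append(v)
            adjacency[v].append(u)
    dist = [[float("inf")] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0.0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] == float("inf"):
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def topology_weights(dist: Matrix, scale: float = 1.0) -> Matrix:
    """Exponential kernel on hop distance (assumed form); disconnected pairs get weight 0."""
    n = len(dist)
    return [
        [0.0 if i == j else exp(-dist[i][j] / scale) for j in range(n)]
        for i in range(n)
    ]


def coupled_laplacian(W_feature: Matrix, W_topology: Matrix, alpha: float = 0.5) -> Matrix:
    """L = D - W for the blended graph W = alpha * W_feature + (1 - alpha) * W_topology."""
    n = len(W_feature)
    W = [
        [alpha * W_feature[i][j] + (1 - alpha) * W_topology[i][j] for j in range(n)]
        for i in range(n)
    ]
    return [
        [(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]
```

Opening a line (setting its in-service flag to false) changes the hop distances and therefore the blended weights, which is how the commissioning status recited in the claim would influence the resulting matrix in this sketch.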

9. The method of claim 1, wherein extracting features from the multi-source heterogeneous dataset using the hierarchical mining strategy based on globally common and locally rare features to obtain a discrete feature subset includes: extracting all discrete features in the multi-source heterogeneous dataset to construct a standard transaction database, and performing initial screening on the standard transaction database using the chi-square test to obtain an initially screened transaction database; mining globally common features in the initially screened transaction database using the Apriori algorithm to obtain a global high-frequency itemset of common features; counting occurrence frequencies in the initially screened transaction database, and extracting a local high-frequency itemset of rare features by applying a frequency threshold; calculating the common risk weights corresponding to the common features in the global high-frequency itemset of common features using the conditional failure frequency gain method, and calculating the rare risk weights corresponding to the rare features in the local high-frequency itemset of rare features using the Birnbaum component importance measure; and performing feature filtering on the global high-frequency itemset of common features and the local high-frequency itemset of rare features based on the common risk weights and the rare risk weights to obtain the discrete feature subset.
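As a non-limiting illustration of the mining and weighting steps recited in claim 9, the following Python sketch implements level-wise Apriori mining and two simple risk weights. The formula used for the conditional failure frequency gain (fault rate given the itemset minus the overall fault rate) and the empirical analogue of Birnbaum importance (fault rate with the feature present minus fault rate with it absent) are assumptions for this sketch, since the claim names the methods without giving formulas; the chi-square pre-screening step is not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Sequence


def apriori(transactions: Sequence[Iterable[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support, with its support."""
    tsets = [frozenset(t) for t in transactions]
    n = len(tsets)
    candidates = [frozenset([item]) for item in sorted({i for t in tsets for i in t})]
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        level = {}
        for cand in candidates:
            support = sum(cand <= t for t in tsets) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # that has an infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def failure_frequency_gain(itemset: Iterable[str], transactions, faults: Sequence[int]) -> float:
    """Assumed form: P(fault | itemset present) - P(fault)."""
    wanted = set(itemset)
    base_rate = sum(faults) / len(faults)
    hits: List[int] = [f for t, f in zip(transactions, faults) if wanted <= set(t)]
    return sum(hits) / len(hits) - base_rate if hits else 0.0


def birnbaum_like_importance(item: str, transactions, faults: Sequence[int]) -> float:
    """Assumed empirical analogue: P(fault | item present) - P(fault | item absent)."""
    present = [f for t, f in zip(transactions, faults) if item in t]
    absent = [f for t, f in zip(transactions, faults) if item not in t]
    if not present or not absent:
        return 0.0
    return sum(present) / len(present) - sum(absent) / len(absent)
```

A feature that falls below the Apriori support threshold can still receive a large importance value from the second weight, which is why a separate treatment of locally rare features can retain it for the discrete feature subset.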

10. A system for extracting features from heterogeneous data in a typhoon operation scenario of a power distribution network, characterized in that it comprises: a dataset construction module, used to acquire multi-source heterogeneous data of the power distribution network under typhoon scenarios and to process the multi-source heterogeneous data to form a multi-source heterogeneous dataset; a feature extraction module, used to construct a parallel dual-branch extraction path that includes a continuous feature extraction path and a discrete feature extraction path, wherein the continuous feature extraction path extracts features from the multi-source heterogeneous dataset using an improved semi-supervised Laplace algorithm that introduces meteorological intensity difference weights and the topological constraints of the distribution network, obtaining a continuous feature subset, and the discrete feature extraction path extracts features from the multi-source heterogeneous dataset using a hierarchical mining strategy based on globally common and locally rare features, obtaining a discrete feature subset; and a feature fusion module, used to fuse the continuous feature subset and the discrete feature subset to obtain the final set of key fault features.