Adaptive DBSCAN wind power data cleaning method and system based on multiple physical quantities

The adaptive DBSCAN wind power data cleaning method based on multiple physical quantities solves the problems of neglecting pitch angle information and data confusion caused by heteroscedasticity in existing technologies, and achieves the preservation of high-quality training samples and the improvement of the accuracy of wind power prediction.

CN122286084A | Pending | Publication Date: 2026-06-26 | HUANENG GUANGXI CLEAN ENERGY CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
HUANENG GUANGXI CLEAN ENERGY CO LTD
Filing Date
2026-03-19
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing wind power data cleaning techniques ignore pitch angle information, leading to confusion between wind curtailment and fault data. Furthermore, the lack of consideration for data heteroscedasticity results in the accidental deletion of valid data in the rated power range, affecting the prediction accuracy and generalization ability of deep learning models.

Method used

An adaptive DBSCAN wind power data cleaning method based on multiple physical quantities is adopted. By acquiring SCADA time-series data, the wind curtailment data is identified using pitch angle, a baseline power curve is constructed, residual features are calculated, and after feature scaling, the DBSCAN algorithm is applied to identify outlier noise, and the data is labeled and filled.

Benefits of technology

Accurately distinguishing between wind curtailment and fault data, preserving true physical characteristics, improves the quality of training samples for deep learning models, and enhances the accuracy and generalization ability of wind power prediction.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses an adaptive DBSCAN wind power data cleaning method and system based on multiple physical quantities, following the principle of applying physical mechanisms first and statistical mining second. The invention integrates SCADA pitch angle information, overcoming the limitations of two-dimensional wind speed-power data and distinguishing between equipment faults and dispatch-induced power curtailment; it uses segmented hybrid quantiles to construct a baseline, avoiding curve collapse in the rated power range; and it adapts to the physical distribution of the data through anisotropic feature scaling combined with the adaptive DBSCAN algorithm, requiring only one set of parameters to clean the full curve. The method addresses over-cleaning in high wind speed ranges and the cumbersome multi-threshold tuning of traditional methods, preserving the integrity of training samples, restoring the true physical characteristics of the unit, improving the generalization ability of subsequent deep learning models, and making engineering application efficient and convenient.

Description

Technical Field

[0001] This invention belongs to the field of wind power generation technology and, more specifically, relates to an adaptive DBSCAN wind power data cleaning method and system based on multiple physical quantities.

Background Technology

[0002] Deep learning-based wind power prediction relies heavily on high-quality SCADA historical data to establish an accurate mapping between environmental variables and turbine output. However, the noise and abnormal operating conditions prevalent in the raw data severely distort its true distribution. Existing data cleaning techniques are mostly limited to statistical or cluster analysis in the two-dimensional "wind speed-power" plane. On the one hand, because they ignore the coupling information carried by the pitch angle, a key physical quantity, these algorithms struggle to distinguish active wind curtailment from passive performance failures of the turbine, so the deep learning model mistakenly learns low power output under human control as an inherent characteristic of the turbine. On the other hand, because wind power data is heteroscedastic, the density of valid data in the rated power stage is significantly lower than in the maximum power point tracking (MPPT) stage and can even approach the noise density. Traditional fixed-parameter DBSCAN clustering and the 3σ criterion lack an adaptive measurement mechanism, so they are prone to over-cleaning the sparse but valid data in this range. The resulting loss of key samples and confusion of labels mean that the training set cannot fully represent the nonlinear output pattern of the unit across the entire wind speed range. The deep learning model therefore cannot correctly capture the saturation characteristics and true boundaries of the power curve, which severely weakens the generalization ability and accuracy of the prediction model in practical applications.

Summary of the Invention

[0003] The purpose of this invention is to provide an adaptive DBSCAN wind power data cleaning method and system based on multiple physical quantities, which aims to solve the problems of existing cleaning methods confusing wind curtailment and fault data due to ignoring pitch angle information, and the problem of erroneous deletion of valid data in the rated power range due to not considering data heteroscedasticity. This provides high-quality training samples that retain the true physical characteristics for deep learning models of wind power prediction.

[0004] To achieve the above objectives, the present invention adopts the following technical solution. The adaptive DBSCAN wind power data cleaning method based on multiple physical quantities includes the following steps:
Step 1: Data acquisition and preprocessing. Acquire SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle.
Step 2: Elimination of abnormal operating conditions. Based on the relationship between pitch angle distribution and power, wind curtailment data are identified and labeled in the acquired SCADA time-series data.
Step 3: Construction of the reference power curve. The data is processed using a binning method, and the theoretical power of each wind speed range is calculated by combining the median and high quantile strategies. A continuous theoretical power curve is obtained through Gaussian smoothing and linear interpolation.
Step 4: Residual feature extraction. Calculate the residual between the measured power and the theoretical power, and calculate the standard deviation of the local residuals to perform preliminary threshold screening.
Step 5: Adaptive DBSCAN clustering cleaning. Feature scaling is applied to wind speed and power residuals to construct a dimensionless feature space, and the DBSCAN algorithm is applied to identify outlier noise.
Step 6: Data labeling and filling. The data is labeled according to the cleaning results, including normal, wind curtailment, abnormal, noise, and dead values, and missing or cleaned gaps are filled.

[0005] A further improvement of this invention is that, in step 1, acquiring SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle, includes: Export one year's worth of historical operating data for the target turbine from the wind farm's SCADA database, defining the dataset as follows:
$$D = \{(t_k, v_k, P_k, \beta_k)\}_{k=1}^{N} \tag{1}$$
where $t$ is the timestamp, $v$ is the 30-second average wind speed in m/s, $P$ is the active power in kW, and $\beta$ is the pitch angle in degrees.

[0006] A further improvement of this invention is that, in step 2, identifying and labeling wind curtailment data in the acquired SCADA time-series data based on the relationship between pitch angle distribution and power includes: First, determine the wind curtailment status based on the pitch angle distribution.
Curtailment criteria: During normal operation, the pitch angle β should be maintained at its minimum value during the low wind speed phase; if β increases significantly and the power P is much lower than the theoretical value under non-rated wind speed, the point is determined to be a curtailment condition.
High wind speed protection: When the wind speed reaches the rated wind speed or even exceeds the cut-out wind speed, the unit will shut down or limit its power. This type of data is directly marked as a high wind speed protection condition based on the wind speed threshold.
Time freeze: If v and P change asynchronously within multiple consecutive time steps, the point is determined to be a sensor "dead value".
Finally, the data under wind curtailment and power restriction conditions are identified and then removed.

[0007] A further improvement of this invention is that, in step 3, processing the data using a binning method, calculating the theoretical power for each wind speed segment by combining the median and high quantile strategies, and obtaining a continuous theoretical power curve through Gaussian smoothing and linear interpolation includes: To calculate the degree of deviation of each data point, a baseline power curve is constructed. The wind speed range is divided into several equal-width bins:
$$[v_{\min}, v_{\max}] = \bigcup_{i=1}^{M} B_i, \qquad B_i = [\,v_{\min} + (i-1)\Delta v,\; v_{\min} + i\,\Delta v\,) \tag{2}$$
where the interval of the $i$-th bin is $B_i$ and its central wind speed is $v_i^{c} = v_{\min} + (i - \tfrac{1}{2})\Delta v$. For the power data set of each bin, $S_i = \{P_k : v_k \in B_i\}$, a representative power $\hat{P}_i$ is calculated. To overcome the problem of sparse data and susceptibility to control strategies in high-wind-speed segments, a segmented strategy is adopted:
$$\hat{P}_i = \begin{cases} \mathrm{med}(S_i), & v_i^{c} < v_e \\ \mathrm{qua}_{0.95}(S_i), & v_i^{c} \ge v_e \end{cases} \tag{3}$$
where $v_e$ is the rated wind speed, $\mathrm{med}$ is the median, and $\mathrm{qua}$ is the 95th percentile. Next, Gaussian smoothing is applied to the discrete points $(v_i^{c}, \hat{P}_i)$, followed by linear interpolation to obtain a continuous theoretical power function $P_{\mathrm{th}}(v)$.

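[0008] A further improvement of this invention is that, in step 4, the residual for each data point is calculated based on the fitted curve obtained in step 3:
$$r_k = P_k - P_{\mathrm{th}}(v_k) \tag{4}$$
Then the local residual standard deviation is calculated over the $n$ points of the local wind speed range, with $\bar{r}$ their mean residual:
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n} (r_k - \bar{r})^2} \tag{5}$$
Hard boundaries are set according to the σ-based criterion, and data outside the boundaries is judged as obviously abnormal and no longer undergoes DBSCAN calculation, thus obtaining the in-boundary data that has not yet undergone noise detection.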

[0009] A further improvement of this invention is that step 5 performs feature scaling and adaptive DBSCAN clustering: Because the dimensions and numerical ranges of wind speed (0-25 m/s) and power residuals differ greatly, directly using Euclidean distance will lead to clustering failure; therefore, a feature space is constructed and feature transformation is performed first. Define the feature vectors used for clustering:

[0010] (6)
Execute the DBSCAN algorithm and set the parameters: (7)
Clustering is performed on the transformed dataset. Data that cannot be assigned to any core cluster is identified as "outlier noise," thus obtaining all data with anomaly labels.

[0011] A further improvement of this invention is that, in step 6, labeling the data according to the cleaning results, including normal, wind curtailment, abnormal, noise, and dead values, and filling missing or cleaned gaps includes: Based on the data label vector, power fitting is performed on low wind speed curtailment data points, with constraints applied according to time series rules, and linear interpolation is performed on outlier noise points.

[0012] The adaptive DBSCAN wind power data cleaning system based on multiple physical quantities includes:
Data acquisition and preprocessing unit: acquires SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle.
Abnormal operating condition elimination unit: based on the relationship between pitch angle distribution and power, identifies and marks wind curtailment data in the acquired SCADA time-series data.
Reference power curve construction unit: processes the data using a binning method, combines median and high quantile strategies to calculate the theoretical power of each wind speed range, and obtains a continuous theoretical power curve through Gaussian smoothing and linear interpolation.
Residual feature extraction unit: calculates the residual between measured power and theoretical power, and calculates the local residual standard deviation to perform preliminary threshold screening.
Adaptive DBSCAN clustering cleaning unit: applies feature scaling to wind speed and power residuals to construct a dimensionless feature space, and applies the DBSCAN algorithm to identify outlier noise.
Data labeling and filling unit: labels the data according to the cleaning results, including normal, wind curtailment, abnormal, noise, and dead values, and fills missing or cleaned gaps.

[0013] A further improvement of this invention is that, in the data acquisition and preprocessing unit, acquiring SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle, includes: Export one year's worth of historical operating data for the target turbine from the wind farm's SCADA database, defining the dataset as follows:
$$D = \{(t_k, v_k, P_k, \beta_k)\}_{k=1}^{N} \tag{1}$$
where $t$ is the timestamp, $v$ is the 30-second average wind speed in m/s, $P$ is the active power in kW, and $\beta$ is the pitch angle in degrees.

[0014] A further improvement of this invention is that, in the abnormal operating condition elimination unit, identifying and marking wind curtailment data in the acquired SCADA time-series data based on the relationship between pitch angle distribution and power includes: First, determine the wind curtailment status based on the pitch angle distribution.
Curtailment criteria: During normal operation, the pitch angle β should be maintained at its minimum value during the low wind speed phase; if β increases significantly and the power P is much lower than the theoretical value under non-rated wind speed, the point is determined to be a curtailment condition.
High wind speed protection: When the wind speed reaches the rated wind speed or even exceeds the cut-out wind speed, the unit will shut down or limit its power. This type of data is directly marked as a high wind speed protection condition based on the wind speed threshold.
Time freeze: If v and P change asynchronously within multiple consecutive time steps, the point is determined to be a sensor "dead value".
Finally, the data under wind curtailment and power restriction conditions are identified and then removed.

[0015] Compared with the prior art, the present invention has at least the following beneficial technical effects: 1. Correcting model learning bias: By introducing the pitch angle as a physical quantity, non-fault data such as wind curtailment are accurately separated, preventing the deep learning model from misjudging human-induced power curtailment logic as performance degradation of the unit itself, and ensuring that the model learns the correct physical laws of wind power conversion.

[0016] 2. Effective protection of rated power data: In view of the shortcomings of existing technologies that are prone to accidental deletion of sparse data in the high wind speed range, this invention uses high quantile fitting combined with feature scaling DBSCAN to reduce the stringent requirements on the data density of the high wind speed range during the clustering process, effectively preserving the real data points of the rated power range and improving the integrity of the data across the entire wind speed range.

[0017] 3. Adaptive noise identification: Through specific feature scaling, the clustering algorithm can adapt to the distribution characteristics of the wind power curve, namely "small variance in the low wind speed range and large variance in the high wind speed range". This enables accurate identification of noise in different areas without frequently adjusting the clustering parameters for different wind speed ranges.

Attached Figure Description

[0018] To more clearly illustrate the specific embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0019] Figure 1 is a schematic diagram of the data cleaning method of the present invention; Figure 2 is a schematic diagram showing the results of an embodiment of the data cleaning method of the present invention; Figure 3 is a structural block diagram of the data cleaning system of the present invention.

Detailed Implementation

[0020] In the following description, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments can be modified in various ways without departing from the spirit or scope of the invention. Therefore, the drawings and description are considered to be exemplary in nature and not restrictive.

[0021] In the description of this invention, it should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0022] It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms unless the context clearly indicates otherwise.

[0023] It should also be further understood that the term "and / or" as used in this specification and the appended claims refers to any combination of one or more of the associated listed items and all possible combinations, and includes such combinations.

[0024] The accompanying drawings illustrate various structural schematic diagrams according to embodiments disclosed in this invention. These drawings are not to scale, and some details have been enlarged for clarity, and some details may have been omitted. The shapes of the various regions and layers shown in the drawings, as well as their relative sizes and positional relationships, are merely exemplary and may deviate from reality due to manufacturing tolerances or technical limitations. Furthermore, those skilled in the art can design regions / layers with different shapes, sizes, and relative positions as needed.

[0025] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0026] Example 1

Referring to Figure 1, the adaptive DBSCAN wind power data cleaning method based on multiple physical quantities provided by this invention is implemented according to the following steps:

Step 1: Obtain SCADA time-series data, including active power, wind speed (30-second average), and blade pitch angle, as follows: Export one year's worth of historical operating data for the target turbine from the wind farm's SCADA database, defining the dataset as follows:
$$D = \{(t_k, v_k, P_k, \beta_k)\}_{k=1}^{N} \tag{1}$$
where $t$ is the timestamp, $v$ is the 30-second average wind speed (m/s), $P$ is the active power (kW), and $\beta$ is the blade pitch angle (in degrees). This dataset is the input data for step 2.

[0027] Step 2: Identification of abnormal operating conditions based on multiple physical quantities. The specific steps are as follows: First, determine the wind curtailment status based on the pitch angle distribution; Curtailment criteria: During normal operation, the pitch angle β should be maintained at its minimum value during the low wind speed phase; if β increases significantly and the power P is much lower than the theoretical value under non-rated wind speed, it is determined to be a curtailment condition (corresponding to label 1 in Table 1).

[0028] High wind speed protection: When the wind speed reaches the rated wind speed or even exceeds the cut-out wind speed, the unit will shut down or limit its power. This type of data is directly marked as high wind speed protection condition based on the wind speed threshold (corresponding to label 2 in Table 1).

[0029] Time freeze (dead value): If v and P change asynchronously within multiple consecutive time steps, the point is determined to be a sensor dead value (corresponding to label -2 in Table 1).

[0030] Finally, the data set with the wind curtailment and power restriction conditions marked and removed is obtained and serves as the basis for the curve fitting in step 3.
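The three rules of step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thresholds (`v_rated`, `v_protect`, `beta_margin`, `power_ratio`, `freeze_len`), the `reference_power` callable, and the code 0 for normal data are all hypothetical, since the text does not give numeric values. The dead-value rule is interpreted here as "power repeats exactly while wind speed varies", which is one possible reading of "change asynchronously".

```python
from dataclasses import dataclass

# Codes 1, 2 and -2 follow the labels named in the text; NORMAL = 0 is an assumption.
NORMAL, CURTAILMENT, HIGH_WIND, DEAD = 0, 1, 2, -2

@dataclass
class Sample:
    v: float     # 30-second average wind speed, m/s
    p: float     # active power, kW
    beta: float  # blade pitch angle, degrees

def label_conditions(samples, reference_power, v_rated=11.0, v_protect=25.0,
                     beta_min=0.0, beta_margin=3.0, power_ratio=0.7, freeze_len=4):
    """Label each sample; all thresholds are hypothetical placeholders."""
    labels = [NORMAL] * len(samples)
    for i, s in enumerate(samples):
        if s.v >= v_protect:
            labels[i] = HIGH_WIND      # wind speed threshold rule
        elif (s.v < v_rated and s.beta > beta_min + beta_margin
              and s.p < power_ratio * reference_power(s.v)):
            labels[i] = CURTAILMENT    # pitched out and far below expected power
    # Dead values: power repeats exactly over freeze_len steps while wind varies.
    for start in range(len(samples) - freeze_len + 1):
        window = samples[start:start + freeze_len]
        power_frozen = all(w.p == window[0].p for w in window)
        wind_varies = max(w.v for w in window) - min(w.v for w in window) > 0.5
        if power_frozen and wind_varies:
            for j in range(start, start + freeze_len):
                labels[j] = DEAD
    return labels

def cubic_reference(v):
    """Hypothetical rough reference curve in kW, capped at 2000 kW."""
    return min(2000.0, 1.5 * v ** 3)

samples = [
    Sample(6.0, 320.0, 0.0),    # normal operation
    Sample(8.0, 200.0, 8.0),    # pitched out, low power
    Sample(26.0, 0.0, 90.0),    # above the protection threshold
    Sample(5.0, 150.0, 0.0), Sample(6.0, 150.0, 0.0),
    Sample(7.0, 150.0, 0.0), Sample(8.0, 150.0, 0.0),  # frozen power reading
]
labels = label_conditions(samples, cubic_reference)
```

In practice the thresholds would be tuned per turbine type, and the rough reference could be a manufacturer curve or a first-pass fit.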

[0031] Table 1 is a label table for different data in this invention;

[0032] Step 3: Benchmark curve fitting based on quantiles: To accurately quantify the degree to which each data point deviates from the normal state, a baseline power curve representing the true performance of the unit needs to be constructed. This step employs a segmented hybrid strategy to address the fitting bias caused by data sparsity in the high-wind-speed range. The wind speed range is divided into several equal-width bins:
$$[v_{\min}, v_{\max}] = \bigcup_{i=1}^{M} B_i, \qquad B_i = [\,v_{\min} + (i-1)\Delta v,\; v_{\min} + i\,\Delta v\,) \tag{2}$$
where the interval of the $i$-th bin is $B_i$ and its central wind speed is $v_i^{c} = v_{\min} + (i - \tfrac{1}{2})\Delta v$.

[0033] For the power data set of each bin, $S_i = \{P_k : v_k \in B_i\}$, a representative power $\hat{P}_i$ is calculated. To overcome the problem of sparse data and susceptibility to control strategies in high-wind-speed segments, a segmented strategy is adopted:
$$\hat{P}_i = \begin{cases} \mathrm{med}(S_i), & v_i^{c} < v_e \\ \mathrm{qua}_{0.95}(S_i), & v_i^{c} \ge v_e \end{cases} \tag{3}$$
where $v_e$ is the rated wind speed, $\mathrm{med}$ is the median, and $\mathrm{qua}$ is the 95th percentile.

[0034] Next, Gaussian smoothing is applied to the discrete points $(v_i^{c}, \hat{P}_i)$, followed by linear interpolation to obtain a continuous theoretical power function $P_{\mathrm{th}}(v)$.
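The baseline construction of step 3 can be illustrated with a standard-library Python sketch. The function names, the bin width, the smoothing width `sigma_v`, and the percentile interpolation rule are assumptions made for illustration; the text only fixes the median below rated wind speed, the 95th percentile at or above it, Gaussian smoothing, and linear interpolation.

```python
import math
import statistics

def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def baseline_curve(v, p, v_rated, bin_width=0.5, sigma_v=0.5):
    """Return (bin centers, smoothed representative power) for non-empty bins."""
    bins = {}
    for vi, pi in zip(v, p):
        bins.setdefault(int(vi // bin_width), []).append(pi)
    centers, rep = [], []
    for k in sorted(bins):
        c = (k + 0.5) * bin_width
        centers.append(c)
        # Median below rated wind speed, 95th percentile at or above it.
        rep.append(statistics.median(bins[k]) if c < v_rated
                   else percentile(bins[k], 0.95))
    # Gaussian smoothing across bin centers with normalized weights.
    smooth = []
    for ci in centers:
        w = [math.exp(-0.5 * ((ci - cj) / sigma_v) ** 2) for cj in centers]
        smooth.append(sum(wj * rj for wj, rj in zip(w, rep)) / sum(w))
    return centers, smooth

def interpolate(centers, values, x):
    """Piecewise-linear interpolation, clamped to the end values."""
    if x <= centers[0]:
        return values[0]
    if x >= centers[-1]:
        return values[-1]
    for i in range(1, len(centers)):
        if x <= centers[i]:
            t = (x - centers[i - 1]) / (centers[i] - centers[i - 1])
            return values[i - 1] + t * (values[i] - values[i - 1])

# Tiny illustrative data set: one low-wind bin and one bin above rated speed.
v = [3.0, 3.2, 3.4, 12.0, 12.1, 12.2]
p = [100.0, 110.0, 500.0, 1900.0, 2000.0, 1200.0]
centers, smooth = baseline_curve(v, p, v_rated=11.0, bin_width=1.0, sigma_v=0.5)
mid_power = interpolate(centers, smooth, 8.0)
```

In the low-wind bin the median ignores the 500 kW spike, while in the rated bin the 95th percentile stays near the upper envelope despite the 1200 kW point, which is the behavior the segmented strategy is meant to provide.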

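[0035] Step 4: Calculate the residual for each data point based on the fitted curve obtained in Step 3:
$$r_k = P_k - P_{\mathrm{th}}(v_k) \tag{4}$$
Then calculate the local residual standard deviation over the $n$ points of the local wind speed range, with $\bar{r}$ their mean residual:
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n} (r_k - \bar{r})^2} \tag{5}$$
Hard boundaries are established according to the σ-based criterion; data outside the boundaries is considered significantly abnormal and is not subject to DBSCAN calculation. This yields the in-boundary data that has not yet undergone noise detection, which serves as input for step 5.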
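A minimal sketch of the residual screening in step 4 follows. The multiple `k = 3.0`, the per-bin definition of "local", and the `theoretical_power` callable are assumptions for illustration; the text does not state the threshold multiple or the window used for the local standard deviation.

```python
import statistics

def residual_screen(v, p, theoretical_power, bin_width=1.0, k=3.0):
    """Return residuals and a keep-mask from a per-bin k-sigma hard boundary."""
    residuals = [pi - theoretical_power(vi) for vi, pi in zip(v, p)]
    bins = {}
    for idx, vi in enumerate(v):
        bins.setdefault(int(vi // bin_width), []).append(idx)
    keep = [True] * len(v)
    for members in bins.values():
        if len(members) < 2:
            continue  # sample standard deviation is undefined for one point
        sigma = statistics.stdev([residuals[i] for i in members])
        for i in members:
            keep[i] = abs(residuals[i]) <= k * sigma
    return residuals, keep

# Twenty points scattered 1 kW around a flat 500 kW curve, plus one gross outlier.
v = [8.0 + 0.01 * i for i in range(21)]
p = [501.0 if i % 2 == 0 else 499.0 for i in range(20)] + [600.0]
residuals, keep = residual_screen(v, p, lambda wind: 500.0)
```

Points with `keep[i]` equal to `False` would be labeled as obviously abnormal and skipped by the clustering step; the remaining points go on to DBSCAN.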

[0036] Step 5: Perform feature scaling and adaptive DBSCAN clustering: Because the dimensions and numerical ranges of wind speed (0-25 m / s) and power residuals differ greatly, directly using Euclidean distance will lead to clustering failure. A specific feature space needs to be constructed, and feature transformation is performed first: Define the feature vectors used for clustering.

[0037] (6)
Through this non-uniform scaling, the originally narrow residual band distributed along the power axis is mapped to a relatively uniform region. Then, the DBSCAN algorithm is executed, with the parameters set as follows: (7)
Clustering is performed on the transformed dataset. Data that cannot be assigned to any core cluster is identified as "outlier noise" (corresponding to label -1 in Table 1). All anomaly-labeled data are then used as input for further processing in step 6. This process removes dense noise closely following the normal curve while preserving sparse rated power points to the greatest extent possible, thanks to the high quantile baseline from step 3 and the elliptical neighborhood from this step.
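To illustrate why scaling matters before density clustering, the sketch below pairs a simple per-axis scaling with a minimal textbook DBSCAN written with the standard library only. The scale factors, `eps`, and `min_pts` are hypothetical values chosen for the toy data; they are not the feature definition or parameters of the patent, which are not reproduced in the text.

```python
import math

def scaled_features(v, residuals, v_scale, r_scale):
    """Divide each axis by its own scale to get dimensionless features."""
    return [(vi / v_scale, ri / r_scale) for vi, ri in zip(v, residuals)]

def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN; returns cluster ids, with -1 for noise."""
    n = len(points)

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:    # only core points keep expanding the cluster
                queue.extend(nb)
    return labels

# Forty in-band points (residual +/-20 kW) plus one outlier at -400 kW.
v = [3.0 + 0.25 * i for i in range(40)] + [8.0]
r = [20.0 if i % 2 == 0 else -20.0 for i in range(40)] + [-400.0]

raw_labels = dbscan(list(zip(v, r)), eps=0.6, min_pts=4)
scaled_labels = dbscan(scaled_features(v, r, v_scale=1.0, r_scale=100.0),
                       eps=0.6, min_pts=4)
```

With raw units the 40 kW gap between neighbouring residuals dwarfs `eps`, so every point ends up as noise; after scaling, the same `eps` and `min_pts` group the band into one cluster and isolate only the outlier.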

[0038] Step 6: Fill in and output the data. Repair of low wind speed curtailment points (corresponding to label 1 in Table 1): Since this type of data corresponds to the situation of "should have generated more electricity", the theoretical power curve obtained in step 3 is used to reconstruct the value by combining it with the current wind speed, and the potential true power value is restored.

[0039] Filling of outlier noise points (corresponding to label -1 in Table 1): For random outlier noise, linear interpolation between adjacent normal points is used for filling.

[0040] Dead values (corresponding to label -2 in Table 1) are handled either by directly removing them or by using a Long Short-Term Memory (LSTM) network for temporal prediction and filling.

[0041] The final output is a dataset containing the cleaned power sequences, which serves as the standard training samples for the deep learning model.
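The filling rules of step 6 can be sketched as follows. This is a simplified illustration: the code 0 for normal data is an assumption, dead values are simply dropped (represented as `None`) rather than predicted with an LSTM, interpolation assumes uniformly spaced samples, and `theoretical_power` stands in for the curve from step 3.

```python
# Codes 1, -1 and -2 follow the labels named in the text; NORMAL = 0 is an assumption.
NORMAL, CURTAILMENT, NOISE, DEAD = 0, 1, -1, -2

def fill_power(v, p, labels, theoretical_power):
    """Return a repaired power list; dead values become None (removed)."""
    filled = list(p)
    normal_idx = [i for i, lab in enumerate(labels) if lab == NORMAL]
    for i, lab in enumerate(labels):
        if lab == CURTAILMENT:
            # Reconstruct the potential power from the baseline curve.
            filled[i] = theoretical_power(v[i])
        elif lab == DEAD:
            filled[i] = None
        elif lab == NOISE:
            left = max((j for j in normal_idx if j < i), default=None)
            right = min((j for j in normal_idx if j > i), default=None)
            if left is None and right is None:
                filled[i] = None
            elif left is None:
                filled[i] = p[right]
            elif right is None:
                filled[i] = p[left]
            else:
                # Linear interpolation between the nearest normal neighbours.
                t = (i - left) / (right - left)
                filled[i] = p[left] + t * (p[right] - p[left])
    return filled

v = [5.0, 6.0, 7.0, 8.0, 9.0]
p = [100.0, 900.0, 300.0, 50.0, 500.0]
labels = [NORMAL, NOISE, NORMAL, CURTAILMENT, DEAD]
filled = fill_power(v, p, labels, lambda wind: 50.0 * wind)
```

Points with other labels, such as high wind speed protection, are left unchanged by this sketch; how they are treated downstream is a separate design choice.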

[0042] Example 2

Referring to Figure 1, the adaptive DBSCAN wind power data cleaning method based on multiple physical quantities provided by this invention is implemented according to the following steps:
Step 1: Data acquisition and preprocessing. Acquire SCADA time-series data, including active power, wind speed (30-second average), and blade pitch angle.
Step 2: Elimination of abnormal operating conditions. Based on the relationship between pitch angle distribution and power, wind curtailment data are identified and labeled.
Step 3: Construction of the reference power curve. The data is processed using a binning method, and the theoretical power of each wind speed range is calculated by combining the median and high quantile strategies. A continuous theoretical power curve is obtained through Gaussian smoothing and linear interpolation.
Step 4: Residual feature extraction. Calculate the residual between the measured power and the theoretical power, and calculate the standard deviation of the local residuals to perform preliminary threshold screening.
Step 5: Adaptive DBSCAN clustering cleaning. By scaling the features of wind speed and power residuals, a dimensionless feature space is constructed, and the DBSCAN algorithm is applied to identify outlier noise.

[0043] Step 6: Data labeling and filling. The data is labeled according to the cleaning results (normal, wind curtailment, abnormal, noise, dead value), and missing or cleaned gaps are filled.

[0044] This invention prioritizes physical mechanisms over statistical mining. First, it utilizes the pitch angle as a physical quantity to eliminate human control factors and remove data distribution biases caused by operational strategies. Then, addressing the pain point of sparse data and susceptibility to accidental deletion in the high-wind-speed segment of wind power curves, it innovatively proposes a segmented hybrid quantile fitting method to construct a baseline. Finally, by constructing a non-uniformly scaled feature space, it leverages the density clustering characteristics of the DBSCAN algorithm in dimensionless space to achieve adaptive identification of noise around the nonlinear power curve.

[0045] By fusing multi-source features, the limitations of relying solely on two-dimensional wind speed-power data are overcome. Pitch angle information from the SCADA system is introduced to effectively distinguish between equipment failures and power curtailment. Simultaneously, a hybrid quantile baseline is adopted in the construction of the baseline power curve, using the median for low wind speed areas and the 95th quantile for high wind speed areas for differentiated processing. This overcomes the curve collapse problem caused by the decrease in data density in the rated power range of conventional fitting methods. Furthermore, through anisotropic feature scaling, different scaling factors are applied to wind speed and residuals, making the search neighborhood of the clustering algorithm anisotropic and accurately adapting to the physical distribution of wind power data.
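The claim that anisotropic scaling makes the search neighborhood anisotropic can be checked with a short derivation. The symbols $s_v$ and $s_r$ (scale factors for wind speed and residual) and $\varepsilon$ (the DBSCAN radius) are placeholder notation, not values from the patent. If two points differ by $\Delta v$ in wind speed and $\Delta r$ in residual, their distance in the scaled space satisfies

$$\sqrt{\left(\frac{\Delta v}{s_v}\right)^2 + \left(\frac{\Delta r}{s_r}\right)^2} \le \varepsilon \quad\Longleftrightarrow\quad \frac{\Delta v^2}{(\varepsilon s_v)^2} + \frac{\Delta r^2}{(\varepsilon s_r)^2} \le 1 .$$

A circular neighborhood of radius $\varepsilon$ in the scaled space is therefore an ellipse in the original units, with semi-axes $\varepsilon s_v$ along the wind speed axis and $\varepsilon s_r$ along the residual axis, so a single $\varepsilon$ can reach differently along the two axes.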

[0046] By optimizing the data cleaning logic, the true physical characteristics of the unit across the entire wind speed range are restored. This not only enables the subsequent deep learning prediction model to learn the correct nonlinear mapping law, significantly reducing prediction errors and improving model generalization ability, but also effectively solves the over-cleaning phenomenon caused by data sparsity in the high wind speed range of existing DBSCAN algorithms or 3σ criteria, ensuring the integrity of training samples in the high wind speed range. At the same time, compared with the shortcomings of traditional methods that require setting multiple thresholds for different wind speed ranges, the adaptive DBSCAN algorithm of this invention only needs one set of parameters to adapt to the cleaning requirements of the entire power curve, which has the advantage of strong parameter robustness and greatly improves the simplicity and efficiency of engineering applications.

[0047] Example 3

Referring to Figure 3, the present invention provides an adaptive DBSCAN wind power data cleaning system based on multiple physical quantities, comprising:
Data acquisition and preprocessing unit: acquires SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle.
Abnormal operating condition elimination unit: based on the relationship between pitch angle distribution and power, identifies and marks wind curtailment data in the acquired SCADA time-series data.
Reference power curve construction unit: processes the data using a binning method, combines median and high quantile strategies to calculate the theoretical power of each wind speed range, and obtains a continuous theoretical power curve through Gaussian smoothing and linear interpolation.
Residual feature extraction unit: calculates the residual between measured power and theoretical power, and calculates the local residual standard deviation to perform preliminary threshold screening.
Adaptive DBSCAN clustering cleaning unit: applies feature scaling to wind speed and power residuals to construct a dimensionless feature space, and applies the DBSCAN algorithm to identify outlier noise.
Data labeling and filling unit: labels the data according to the cleaning results, including normal, wind curtailment, abnormal, noise, and dead values, and fills missing or cleaned gaps.

[0048] In the data acquisition and preprocessing unit of this embodiment, acquiring SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle, includes: Export one year's worth of historical operating data for the target turbine from the wind farm's SCADA database, defining the dataset as follows:
$$D = \{(t_k, v_k, P_k, \beta_k)\}_{k=1}^{N} \tag{1}$$
where $t$ is the timestamp, $v$ is the 30-second average wind speed in m/s, $P$ is the active power in kW, and $\beta$ is the pitch angle in degrees.

[0049] In the abnormal operating condition elimination unit of this embodiment, identifying and marking wind curtailment data in the acquired SCADA time-series data based on the relationship between pitch angle distribution and power includes: First, determine the wind curtailment status based on the pitch angle distribution.
Curtailment criteria: During normal operation, the pitch angle β should be maintained at its minimum value during the low wind speed phase; if β increases significantly and the power P is much lower than the theoretical value under non-rated wind speed, the point is determined to be a curtailment condition.
High wind speed protection: When the wind speed reaches the rated wind speed or even exceeds the cut-out wind speed, the unit will shut down or limit its power. This type of data is directly marked as a high wind speed protection condition based on the wind speed threshold.
Time freeze: If v and P change asynchronously within multiple consecutive time steps, the point is determined to be a sensor "dead value".
Finally, the data under wind curtailment and power restriction conditions are identified and marked for elimination.
The results of the example calculation are shown in Figure 2. Normal data are represented by white hollow circles and are distributed in accordance with the theoretical power curve of the wind turbine, with power increasing as wind speed rises and then stabilizing. The four types of abnormal data are low-wind-speed curtailment points, high-wind-speed protection shutdown points, discrete noise points identified by DBSCAN, and time-series dead points. Each corresponds to a different invalid scenario, with clear markings and significantly different distribution characteristics, verifying the method's ability to distinguish between them.

[0050] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. It will be apparent to those skilled in the art that the invention is not limited to the details of the exemplary embodiments described above, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be considered illustrative and non-limiting in all respects, and the scope of the invention is defined by the appended claims rather than the foregoing description. Thus, all variations falling within the meaning and scope of equivalents of the claims are intended to be included within the scope of the invention. No reference numerals in the claims should be construed as limiting the scope of the claims.

[0051] Furthermore, it should be understood that although this specification describes embodiments, not every embodiment contains only one independent technical solution. This narrative style is merely for clarity. Those skilled in the art should consider the specification as a whole, and the technical solutions in each embodiment can be appropriately combined to form other embodiments that can be understood by those skilled in the art. The above content is only for illustrating the technical concept of the present invention and should not be construed as limiting the scope of protection of the present invention. Any modifications made based on the technical concept proposed in this invention shall fall within the scope of protection of the claims of this invention.

Claims

1. An adaptive DBSCAN wind power data cleaning method based on multiple physical quantities, characterized in that, it includes the following steps: Step 1, data acquisition and preprocessing: acquire SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle; Step 2, elimination of abnormal operating conditions: based on the relationship between pitch angle distribution and power, identify and label wind curtailment data in the acquired SCADA time-series data; Step 3, construction of the reference power curve: process the data using a binning method, calculate the theoretical power of each wind speed range by combining the median and high quantile strategies, and obtain a continuous theoretical power curve through Gaussian smoothing and linear interpolation; Step 4, residual feature extraction: calculate the residual between the measured power and the theoretical power, and calculate the standard deviation of the local residuals to perform preliminary threshold screening; Step 5, adaptive DBSCAN clustering cleaning: apply feature scaling to the wind speed and power residuals to construct a dimensionless feature space, and apply the DBSCAN algorithm to identify outlier noise; Step 6, data labeling and filling: label the data according to the cleaning results, including normal, wind curtailment, abnormal, noise, and dead values, and fill missing or cleaned gaps.

2. The adaptive DBSCAN wind power data cleaning method based on multiple physical quantities according to claim 1, characterized in that, in step 1, acquiring SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle, includes: exporting one year of historical operating data for the target turbine from the wind farm's SCADA database, and defining the dataset as in formula (1), where t is the timestamp, v is the 30-second average wind speed in m/s, P is the active power in kW, and β is the pitch angle in degrees.

3. The adaptive DBSCAN wind power data cleaning method based on multiple physical quantities according to claim 1, characterized in that, in step 2, identifying and labeling wind curtailment data in the acquired SCADA time-series data based on the relationship between pitch angle distribution and power includes: first, determining the wind curtailment status from the pitch angle distribution; curtailment criterion: during normal operation, the pitch angle β should remain at its minimum value in the low wind speed phase; if β increases significantly and the power P is much lower than the theoretical value at non-rated wind speed, the point is determined to be a curtailment condition; high wind speed protection: when the wind speed reaches the rated wind speed or even exceeds the cut-out wind speed, the unit shuts down or limits its power, and such data is marked directly as a high wind speed protection condition based on the wind speed threshold; time freeze: if v and P change asynchronously over multiple consecutive time steps, the points are determined to be sensor "dead values"; finally, the data under wind curtailment and power restriction conditions is obtained and removed.

4. The adaptive DBSCAN wind power data cleaning method based on multiple physical quantities according to claim 1, characterized in that, in step 3, processing the data using a binning method, calculating the theoretical power of each wind speed range by combining the median and high quantile strategies, and obtaining a continuous theoretical power curve through Gaussian smoothing and linear interpolation includes: to calculate the degree of deviation of each data point, a baseline power curve is constructed; the wind speed range is divided into several equal-width bins according to formula (2), in which the i-th bin has its own interval and central wind speed; for the power data set of each bin, a representative power is calculated; to overcome sparse data and susceptibility to control strategies in the high-wind-speed segment, a segmented strategy is adopted according to formula (3), where v_e is the rated wind speed, med denotes the median, and qua denotes the 95th percentile; Gaussian smoothing is then applied to the discrete points, followed by linear interpolation, to obtain a continuous theoretical power function.
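By way of illustration only, the baseline construction of claim 4 could be sketched with the Python standard library as follows. Formulas (2) and (3) are not reproduced in this text, so the sketch assumes that the median is used for bins whose central wind speed is below the rated wind speed and the 95th percentile is used at or above it; the function names, the `bin_width` and `sigma` parameters, and the choice of a kernel measured in bin indices are all hypothetical.

```python
import math
from bisect import bisect_right
from statistics import median, quantiles


def bin_representatives(v, p, bin_width, v_rated):
    """Representative power per equal-width wind speed bin.

    Assumption: median below rated wind speed, 95th percentile at or above it.
    """
    bins = {}
    for vi, pi in zip(v, p):
        bins.setdefault(int(vi // bin_width), []).append(pi)
    centers, reps = [], []
    for idx in sorted(bins):
        center = (idx + 0.5) * bin_width
        powers = bins[idx]
        if center < v_rated or len(powers) < 2:
            rep = median(powers)
        else:
            # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th.
            rep = quantiles(powers, n=100, method="inclusive")[94]
        centers.append(center)
        reps.append(rep)
    return centers, reps


def gaussian_smooth(values, sigma=1.0):
    """Gaussian-weighted average over neighbouring bins (sigma in bin units)."""
    out = []
    for i in range(len(values)):
        w = [math.exp(-0.5 * ((i - j) / sigma) ** 2) for j in range(len(values))]
        out.append(sum(wj * x for wj, x in zip(w, values)) / sum(w))
    return out


def make_curve(centers, values):
    """Continuous power function by linear interpolation between bin centers."""
    def curve(v):
        if v <= centers[0]:
            return values[0]
        if v >= centers[-1]:
            return values[-1]
        k = bisect_right(centers, v)
        x0, x1 = centers[k - 1], centers[k]
        y0, y1 = values[k - 1], values[k]
        return y0 + (y1 - y0) * (v - x0) / (x1 - x0)
    return curve
```

A typical call chain would be `centers, reps = bin_representatives(v, p, 0.5, v_rated)` followed by `curve = make_curve(centers, gaussian_smooth(reps))`. The high quantile in the rated range keeps curtailed or derated points from pulling the baseline down, which is the effect the abstract describes as avoiding curve collapse.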

5. The adaptive DBSCAN wind power data cleaning method based on multiple physical quantities according to claim 1, characterized in that, in step 4, the residual of each data point is calculated according to formula (4) based on the fitted curve obtained in step 3; the standard deviation is then calculated according to formula (5); hard boundaries are set according to a criterion based on this standard deviation, and data outside the boundaries is judged to be obviously abnormal and is excluded from the DBSCAN calculation, thus obtaining the data within the boundaries that has not yet undergone noise detection.
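A minimal sketch of this screening step is given below. Formulas (4) and (5) and the exact boundary criterion are not reproduced in this text, so the sketch assumes the residual is measured power minus theoretical power, that "local" means per wind speed bin, and that the hard boundary is k standard deviations with k = 3; `screen_by_residual`, `bin_width`, and `k` are hypothetical names.

```python
from statistics import pstdev


def screen_by_residual(v, p, curve, bin_width=1.0, k=3.0):
    """Return (residuals, keep) where keep[i] is False for obvious outliers.

    Assumptions: residual = measured - theoretical power; the standard
    deviation is taken per wind speed bin; the boundary is k * sigma.
    """
    residuals = [pi - curve(vi) for vi, pi in zip(v, p)]
    by_bin = {}
    for vi, ri in zip(v, residuals):
        by_bin.setdefault(int(vi // bin_width), []).append(ri)
    sigma = {b: pstdev(rs) for b, rs in by_bin.items()}
    keep = [abs(ri) <= k * sigma[int(vi // bin_width)]
            for vi, ri in zip(v, residuals)]
    return residuals, keep
```

Points with `keep[i]` equal to `False` would be labeled as obviously abnormal and skipped by the clustering step, so that gross errors do not distort the density estimate used by DBSCAN.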

6. The adaptive DBSCAN wind power data cleaning method based on multiple physical quantities according to claim 1, characterized in that, in step 5, feature scaling and adaptive DBSCAN clustering are performed: because wind speed, which ranges over 0-25 m/s, and the power residual differ greatly in dimension and numerical range, directly using the Euclidean distance would cause clustering to fail; therefore a feature space is constructed and a feature transformation is performed first, with the feature vector used for clustering defined by formula (6); the DBSCAN algorithm is then executed with the parameters set according to formula (7), and clustering is performed on the transformed dataset; data that cannot be assigned to any core cluster is identified as "outlier noise", thus obtaining all data with anomaly labels.
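The scaling and clustering step might look like the following sketch. Formulas (6) and (7) are not reproduced in this text, so the scale factors `v_scale` and `r_scale`, and the `eps` and `min_pts` values used in the example call, are assumptions chosen only to make the example concrete. The DBSCAN shown is a compact brute-force version written for illustration; a production pipeline would more likely use an existing library implementation.

```python
import math


def scale_features(v, residuals, v_scale, r_scale):
    """Map (wind speed, power residual) to a dimensionless 2-D feature space."""
    return [(vi / v_scale, ri / r_scale) for vi, ri in zip(v, residuals)]


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN; returns one cluster id per point, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nb)
    return labels
```

For example, `dbscan(scale_features(v, residuals, 25.0, 1500.0), eps=0.05, min_pts=3)` clusters the scaled data with a single parameter pair; dividing by different factors along each axis is what makes the neighbourhood anisotropic in the original units.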

7. The adaptive DBSCAN wind power data cleaning method based on multiple physical quantities according to claim 1, characterized in that, in step 6, labeling the data according to the cleaning results, including normal, wind curtailment, abnormal, noise, and dead values, and filling missing or cleaned gaps includes: based on the data label vector, performing power fitting on the low wind speed curtailment data points, with constraints applied according to time-series rules, and performing linear interpolation on the outlier noise points.
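The filling step can be illustrated roughly as follows. The source does not specify how the power fitting or the time-series constraints are carried out, so this sketch assumes that curtailment points are replaced by the theoretical power at their wind speed and that noise points are linearly interpolated in time between the nearest points labeled normal; the time-series constraints are not modeled. The label strings and the function name `fill_power` are hypothetical.

```python
def fill_power(t, v, p, labels, curve):
    """Return a filled copy of p.

    Assumptions: "curtailment" points take the theoretical power curve(v);
    "noise" points are interpolated in time between the nearest "normal"
    neighbours; all other labels are left unchanged.
    """
    filled = list(p)
    good = [i for i, lab in enumerate(labels) if lab == "normal"]
    for i, lab in enumerate(labels):
        if lab == "curtailment":
            filled[i] = curve(v[i])
        elif lab == "noise":
            left = max((g for g in good if g < i), default=None)
            right = min((g for g in good if g > i), default=None)
            if left is None and right is None:
                continue  # nothing to interpolate from
            if left is None:
                filled[i] = p[right]
            elif right is None:
                filled[i] = p[left]
            else:
                w = (t[i] - t[left]) / (t[right] - t[left])
                filled[i] = p[left] + w * (p[right] - p[left])
    return filled
```

Interpolating only between points labeled normal avoids propagating one anomalous value into its neighbours when several flagged points occur in a row.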

8. An adaptive DBSCAN wind power data cleaning system based on multiple physical quantities, characterized in that, it includes: a data acquisition and preprocessing unit, which acquires SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle; an abnormal operating condition elimination unit, which identifies and labels wind curtailment data in the acquired SCADA time-series data based on the relationship between pitch angle distribution and power; a reference power curve construction unit, which processes the data using a binning method, calculates the theoretical power of each wind speed range by combining the median and high quantile strategies, and obtains a continuous theoretical power curve through Gaussian smoothing and linear interpolation; a residual feature extraction unit, which calculates the residual between the measured power and the theoretical power, and calculates the standard deviation of the local residuals to perform preliminary threshold screening; an adaptive DBSCAN clustering cleaning unit, which applies feature scaling to the wind speed and power residuals to construct a dimensionless feature space, and applies the DBSCAN algorithm to identify outlier noise; and a data labeling and filling unit, which labels the data according to the cleaning results, including normal, wind curtailment, abnormal, noise, and dead values, and fills missing or cleaned gaps.

9. The adaptive DBSCAN wind power data cleaning system based on multiple physical quantities according to claim 8, characterized in that, in the data acquisition and preprocessing unit, acquiring SCADA time-series data, including active power, 30-second average wind speed, and blade pitch angle, includes: exporting one year of historical operating data for the target turbine from the wind farm's SCADA database, and defining the dataset as in formula (1), where t is the timestamp, v is the 30-second average wind speed in m/s, P is the active power in kW, and β is the pitch angle in degrees.

10. The adaptive DBSCAN wind power data cleaning system based on multiple physical quantities according to claim 8, characterized in that, in the abnormal operating condition elimination unit, identifying and labeling wind curtailment data in the acquired SCADA time-series data based on the relationship between pitch angle distribution and power includes: first, determining the wind curtailment status from the pitch angle distribution; curtailment criterion: during normal operation, the pitch angle β should remain at its minimum value in the low wind speed phase; if β increases significantly and the power P is much lower than the theoretical value at non-rated wind speed, the point is determined to be a curtailment condition; high wind speed protection: when the wind speed reaches the rated wind speed or even exceeds the cut-out wind speed, the unit shuts down or limits its power, and such data is marked directly as a high wind speed protection condition based on the wind speed threshold; time freeze: if v and P change asynchronously over multiple consecutive time steps, the points are determined to be sensor "dead values"; finally, the data under wind curtailment and power restriction conditions is obtained and removed.