A sea-land wind identification method based on density clustering
A density-clustering-based sea-land breeze identification method uses the recirculation factor and land-sea temperature difference features to adaptively identify wind clusters in coastal areas. It addresses the high false alarm rate and strong parameter dependence of existing techniques and improves the accuracy and robustness of wind type identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications (China)
- Current Assignee / Owner
- FUJIAN PROVINCIAL ACADEMY OF ENVIRONMENTAL SCI
- Filing Date
- 2026-06-03
- Publication Date
- 2026-06-30

Figure CN122310331A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of atmospheric science and meteorological data analysis, and in particular to a method for identifying sea and land breezes based on density clustering.
Background Technology
[0002] In coastal cities, sea and land breezes are key factors affecting air quality, the urban heat island effect, and wind energy utilization. Accurate identification of sea and land breezes is a prerequisite for achieving air pollution control, optimizing meteorological forecasts, and planning urban ventilation corridors. It can effectively reveal pollutant accumulation mechanisms and improve early warning accuracy.
[0003] Current mainstream sea and land breeze identification techniques rely mainly on threshold-based discrimination or on classification with traditional clustering (such as K-means), and both have significant limitations. Threshold-based methods typically judge by fixed wind direction (e.g., easterly/westerly) and wind speed thresholds, which ignores interference from background winds. In actual meteorological observations, local sea and land breezes often overlap with large-scale background winds (such as monsoons and the outer circulation of typhoons), so the wind direction data form a complex mixture of non-spherical clusters of uneven density. Simple threshold discrimination therefore tends to produce false alarms.
[0004] Traditional clustering methods force the data into spherical groups. Because of topography, however, sea and land breeze data are often distributed in chains or irregular shapes; traditional clustering cannot capture such non-spherical clusters, so the wind field structure is wrongly fragmented. Coastal meteorological sensor data are also affected by extreme weather and equipment errors, which produce outliers. Traditional clustering is sensitive to such noise: outliers can easily drag cluster centers away from their true positions and cause the identification model to fail. Finally, traditional clustering is highly parameter-dependent and requires the number of clusters to be set manually in advance. Wind fields differ greatly between coastal areas (such as plains, mountains, and cities), and manually set parameters cannot adapt to these differences, so the model generalizes poorly.
[0005] In addition, existing studies mostly focus on instantaneous wind speed and direction values and lack indicators that quantify the local "stagnation" and "circulation" of air masses. This makes it difficult to distinguish local sea and land breezes, in which the air mass moves back and forth, from background winds, in which the air mass moves in one direction.
Summary of the Invention
[0006] The main objective of this invention is to propose a density clustering-based method for identifying sea and land breezes, which can improve the accuracy and robustness of sea and land breeze identification.
[0007] This invention is achieved through the following technical solution:
[0008] A density-based clustering method for identifying sea and land breezes includes the following steps:
[0009] Step S1: For each day within the set period of N consecutive days, acquire hourly wind field data and land-sea temperature difference data for the target area within a unit time interval. Calculate the recirculation factor of the air mass from that day's wind field data, and extract the wind field feature data and land-sea temperature difference feature data for the first and second time intervals within the day's unit time interval. Map and combine the wind field feature data, land-sea temperature difference feature data, and recirculation factor into a single sample representing the day. The N samples form the dataset to be classified.
[0010] Step S2: Preprocess the dataset to be classified to obtain a standard classification dataset. Use the standard classification dataset to determine the optimal parameter combination of the density clustering model. Then, use the density clustering model with the optimal parameter combination to perform density clustering on the standard classification dataset, divide it into multiple wind clusters and remove noise points.
[0011] Step S3: Identify the wind type of each wind cluster in turn based on the cluster's statistical characteristics to obtain the wind type identification results.
[0012] Furthermore, the wind field data includes wind direction data and wind speed data. The wind direction data includes the sine and cosine of the wind direction angle, and the wind speed data includes the zonal wind component and the meridional wind component. The land-sea temperature difference data is the temperature difference between land observation points and ocean observation points within the target area.
[0013] Furthermore, in step S1, the recirculation factor R is calculated from the hourly wind field data. First, the zonal wind component $u_t$ and the meridional wind component $v_t$ are extracted for each hour $t$ within the unit time interval T. According to $\Delta x_t = 3600\,u_t$ and $\Delta y_t = 3600\,v_t$, $u_t$ and $v_t$ are converted into hourly displacement increments $\Delta x_t$ and $\Delta y_t$. Next, the hourly displacement increments within T are summed to obtain the net linear displacement $L$ of the air mass within that interval and the total length $S$ of its motion path. Finally, the recirculation factor is calculated as $R = 1 - L/S$, with $R = 1$ assigned when $S = 0$, where $L = \sqrt{\left(\sum_t \Delta x_t\right)^2 + \left(\sum_t \Delta y_t\right)^2}$ and $S = \sum_t \sqrt{\Delta x_t^2 + \Delta y_t^2}$.
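As a worked check, write $L$ for the net linear displacement and $S$ for the total path length, and assume the factor takes the usual form $R = 1 - L/S$. Two idealized days with hypothetical round-number winds then give its two extremes: a day whose wind reverses halfway through, and a day of steady wind.

```latex
\begin{aligned}
&\text{Reversing day: } u_t = 0,\quad v_t = +3\ \text{m/s}\ (t = 0,\dots,11),\quad v_t = -3\ \text{m/s}\ (t = 12,\dots,23)\\
&\Delta y_t = 3600 \times (\pm 3) = \pm 10\,800\ \text{m},\qquad
L = \Bigl|\sum_{t=0}^{23} \Delta y_t\Bigr| = 0,\qquad
S = 24 \times 10\,800 = 259\,200\ \text{m},\qquad
R = 1 - \frac{0}{259\,200} = 1\\[4pt]
&\text{Steady day: } u_t = 3\ \text{m/s},\quad v_t = 0\ (t = 0,\dots,23)\\
&L = \sum_{t=0}^{23} 10\,800 = 259\,200\ \text{m},\qquad
S = 259\,200\ \text{m},\qquad
R = 1 - \frac{259\,200}{259\,200} = 0
\end{aligned}
```

The first case is the reciprocating air mass motion the factor is meant to flag; the second is a unidirectional background wind.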
[0014] Furthermore, in step S1, the unit time interval T = 24 hours, the first time interval is from 3:00 to 7:00 local time in the target area, and the second time interval is from 14:00 to 18:00 local time in the target area.
[0015] Furthermore, in step S1, the wind field characteristic data includes the average wind field data in the first time period and the average wind field data in the second time period; the land-sea temperature difference characteristic data includes the average land-sea temperature difference in the first time period and the average land-sea temperature difference in the second time period; and the average wind field data includes the average wind direction data and the average wind speed data.
[0016] Furthermore, in step S2, the wind field feature data, land-sea temperature difference feature data, and recirculation factor in each sample are standardized to a mean of 0 and a standard deviation of 1, yielding standard wind field feature data, standard land-sea temperature difference feature data, and a standard recirculation factor. The standardized features of each day are concatenated to form a standard sample, giving a standard classification dataset of N1 standard samples.
[0017] Furthermore, in step S2, determining the optimal parameter combination of the density clustering model specifically includes: calculating the distance between every pair of standard samples in the standard classification dataset to form a sample distance matrix; traversing the parameter combinations according to a preset interval and a preset distance; and calculating the clustering evaluation index scores for each parameter combination in turn. The clustering evaluation indices are the silhouette coefficient, the Calinski-Harabasz (CH) index, and the Davies-Bouldin (DB) index. The optimal parameter combination is the one at which the silhouette coefficient approaches its peak, the CH index approaches its peak, and the DB index approaches its trough. Here, a parameter combination consists of the ε-neighborhood radius and the minimum number of contained points M.
[0018] Furthermore, in determining the optimal parameter combination for the density clustering model, three curves corresponding to the silhouette coefficient, CH index, and DB index are constructed with the parameter combination as the horizontal axis and the silhouette coefficient, CH index, and DB index scores as the vertical axis, respectively. By analyzing the changing trends of the three curves, the parameter combination in which the silhouette coefficient tends to the peak, the CH index tends to the peak, and the DB index tends to the trough is selected as the optimal parameter combination.
[0019] Furthermore, in step S2, density clustering specifically includes: based on the sample distance matrix, strictly according to the optimal combination of ε-neighborhood radius and minimum number of contained points M, identifying standard samples in the standard classification dataset as core points, boundary points, and noise points; starting from any unassigned core point, expanding outward using density reachability, continuously absorbing new core points through chain connections of core points to be classified into the same wind cluster, and classifying boundary points within the ε-neighborhood radius of the core point into that wind cluster; traversing all unvisited core points until all core points and boundary points are classified into the corresponding wind clusters; and removing noise points during this process. Here, a core point refers to a standard sample whose total number of standard samples contained within its ε-neighborhood radius is at least M; a boundary point refers to a standard sample covered by the ε-neighborhood radius of a core point but which is not itself a core point; and a noise point refers to a standard sample that is neither a core point nor a boundary point.
[0020] Furthermore, in step S3, the statistical characteristics of each wind cluster include the average wind speed, average wind direction, average land-sea temperature difference, and average recirculation factor. The wind type identification results include the northerly, southerly, westerly, easterly, and local land-sea breeze types.
[0021] As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following beneficial effects:
[0022] 1. For each day within a set period of N consecutive days, this invention acquires hourly meteorological data for the target area within a unit time interval, calculates the recirculation factor of the air mass from that day's wind field data, and extracts the wind field characteristic data and land-sea temperature difference characteristic data for the first and second time intervals of the day. These features and the recirculation factor are mapped and combined into a single sample representing the day, and the N samples form the dataset to be classified. The dataset is then preprocessed into a standard classification dataset, which is used to determine the optimal parameter combination of the density clustering model; the model with that combination then performs density clustering on the standard classification dataset, dividing it into multiple wind clusters. Finally, each wind cluster is identified in turn from its statistical characteristics to obtain the wind type identification result. Wind field data in coastal areas often show chain-like and irregular distributions because of topography and monsoons, and the invention adaptively identifies cluster structures of arbitrary shape in such data. It is also robust to noise: anomalous points caused by extreme weather or equipment errors are automatically identified and removed, so outliers cannot skew cluster centers, which improves the accuracy and robustness of wind type identification.
Furthermore, the invention introduces a recirculation factor to effectively quantify the local residence and circulation of air masses, thereby effectively distinguishing between local sea and land winds with reciprocating air mass movement and background winds with unidirectional air mass movement, further enhancing the accuracy of wind type identification.
[0023] 2. When determining the optimal parameter combination, this invention does not require the number of clusters to be preset manually; the calculation adapts entirely to the natural distribution of the data to be classified. This overcomes the poor generalization of manually set parameters caused by the large differences between wind fields in different coastal areas (such as plains, mountains, and cities), and ensures that the clustering results follow the density distribution of the data itself.
[0024] 3. When constructing samples, this invention uses as features the average land-sea temperature difference during two key periods, 3:00 to 7:00 and 14:00 to 18:00. This ties the features to the thermal mechanism of sea and land breeze formation and its diurnal cycle, further improving the accuracy of wind type identification.
Attached Figure Description
[0025] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0026] Figure 1 is a flowchart of the present invention.
Detailed Implementation
[0027] The present invention will be further described below through specific embodiments.
[0028] As shown in Figure 1, the density clustering-based sea and land breeze identification method includes the following steps:
[0029] Step S1: For each day within the set period of N consecutive days, acquire hourly meteorological data for the target area over the 24-hour unit time interval. The meteorological data includes wind field data and land-sea temperature difference data. Calculate the recirculation factor of the air mass from each day's wind field data, and extract the wind field characteristic data and land-sea temperature difference characteristic data for the first and second time intervals within the day's unit time interval. Map and combine the wind field characteristic data, land-sea temperature difference characteristic data, and recirculation factor into a single sample representing the day. The N samples form the dataset to be classified.
[0030] Here, N ≥ 180, the unit time interval is 24 hours, the first time interval is from 3:00 to 7:00 local time in the target area, and the second time interval is from 14:00 to 18:00 local time in the target area.
[0031] Wind field data includes wind direction data and wind speed data. Wind direction data includes the sine and cosine of the wind direction angle, and wind speed data includes the zonal and meridional wind components. The land-sea temperature difference data is the temperature difference between land and ocean observation points within the target area. Wind field data can be obtained from ground-based meteorological stations and anemometer towers; land-sea temperature difference data is obtained from temperature sensors deployed at the land and ocean observation points within the target area.
[0032] The process of obtaining the recirculation factor R includes:
[0033] For each unit time interval, extract the hourly zonal wind component $u_t$ and meridional wind component $v_t$, where $0 \le t \le 23$ is the hour index and $u_t$ and $v_t$ are in meters per second;
[0034] According to $\Delta x_t = 3600\,u_t$ and $\Delta y_t = 3600\,v_t$, convert $u_t$ and $v_t$ into hourly displacement increments $\Delta x_t$ and $\Delta y_t$ (in meters), where 3600 is the number of seconds in one hour; this converts wind speed into displacement over time;
[0035] Summing the magnitudes of all hourly displacement increments within the unit time interval gives the total path length of the air mass over the day, $L = \sum_{t=0}^{23} \sqrt{\Delta x_t^2 + \Delta y_t^2}$;
[0036] Summing all hourly displacement increments within the unit time interval gives the net displacement vector of the air mass within that interval; its magnitude $S = \sqrt{X^2 + Y^2}$ is the net linear displacement of the air mass within the unit time interval, where $X = \sum_{t=0}^{23} \Delta x_t$ and $Y = \sum_{t=0}^{23} \Delta y_t$;
[0037] Calculate the recirculation factor R from the net linear displacement and the total path length L of the air mass. As a special case, when L = 0 (calm wind throughout the unit time interval), R = 0 is assigned, indicating that the air mass is completely circulating and lingering locally. Each unit time interval has a unique recirculation factor, which is stored one-to-one with the wind field characteristic data and land-sea temperature difference characteristic data to form a dataset for wind type identification.
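A minimal NumPy sketch of the recirculation-factor steps in [0033] to [0037], assuming 24 hourly $u_t$, $v_t$ values in m/s. Because the formula itself is not legible in this text, the sketch assumes the usual form R = 1 - (net displacement)/(total path length), which is 0 for straight-line transport and 1 for an air mass that returns to its starting point. The value used for a calm day is exposed as the parameter `calm_value`, since [0013] assigns R = 1 and [0037] assigns R = 0 in that case. The function name `recirculation_factor` is hypothetical.

```python
import numpy as np


def recirculation_factor(u, v, calm_value=1.0):
    """Recirculation factor of one day's air mass from hourly wind components.

    u, v: hourly zonal and meridional wind components in m/s (e.g. 24 values).
    calm_value: R returned when the total path length is zero (calm day);
                an assumption, since the text gives 1 in [0013] and 0 in [0037].
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Hourly displacement increments in metres: 3600 s per hour times speed in m/s.
    dx = 3600.0 * u
    dy = 3600.0 * v
    # Total path length: sum of the magnitudes of the hourly increments.
    path_length = float(np.sum(np.hypot(dx, dy)))
    # Net linear displacement: magnitude of the summed displacement vector.
    net_displacement = float(np.hypot(dx.sum(), dy.sum()))
    if path_length == 0.0:
        return calm_value
    # Assumed form: 0 for straight-line transport, 1 for a closed loop.
    return 1.0 - net_displacement / path_length


# Idealized days from the worked example (hypothetical winds).
reversing = recirculation_factor(u=[0.0] * 24, v=[3.0] * 12 + [-3.0] * 12)  # 1.0
steady = recirculation_factor(u=[3.0] * 24, v=[0.0] * 24)                  # 0.0
```

Returning early for the calm case avoids a division by zero and makes the calm-day convention an explicit choice rather than a side effect.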
[0038] The wind field characteristic data includes the average wind field data in the first time period and the average wind field data in the second time period. The land-sea temperature difference characteristic data includes the average land-sea temperature difference in the first time period and the average land-sea temperature difference in the second time period. The average wind field data includes the average wind direction data and the average wind speed data.
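The sample construction of [0038], together with the recirculation factor, might look like the NumPy sketch below. Assumptions not stated in the text: observations are indexed by local hour 0 to 23, both window endpoints are included (hours 3 to 7 and 14 to 18), wind direction arrives in degrees, and the recirculation factor is computed separately and passed in. `WINDOWS` and `daily_sample` are hypothetical names.

```python
import numpy as np

# Local-time windows from [0030]; endpoints assumed inclusive.
WINDOWS = {"first": range(3, 8), "second": range(14, 19)}


def daily_sample(direction_deg, u, v, delta_t, r_factor):
    """Build one day's feature vector from 24 hourly observations.

    Per window: mean sin(direction), mean cos(direction), mean u, mean v,
    and mean land-sea temperature difference. The recirculation factor is appended last.
    """
    direction = np.radians(np.asarray(direction_deg, dtype=float))
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    delta_t = np.asarray(delta_t, dtype=float)
    if not (direction.shape == u.shape == v.shape == delta_t.shape == (24,)):
        raise ValueError("expected 24 hourly values per variable")

    features = []
    for hours in WINDOWS.values():
        idx = list(hours)
        features += [
            np.sin(direction[idx]).mean(),  # averaging sin/cos avoids the 359°/1° wrap-around
            np.cos(direction[idx]).mean(),
            u[idx].mean(),
            v[idx].mean(),
            delta_t[idx].mean(),
        ]
    features.append(float(r_factor))
    return np.array(features)
```

Each day thus yields an 11-element vector; stacking the N vectors row by row gives the dataset to be classified.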
[0039] Step S2: Preprocess the dataset to be classified to obtain a standard classification dataset. Use the standard classification dataset to determine the optimal parameter combination of the density clustering model. Then, use the density clustering model with the optimal parameter combination to perform density clustering on the standard classification dataset, divide it into N' wind clusters and remove noise points; where N' is automatically generated by the density clustering algorithm based on the natural distribution of the data.
[0040] The preprocessing specifically includes: for each of the three features in the sample (wind field feature data, land-sea temperature difference feature data, and recirculation factor), calculate the feature's mean and subtract it from every value to obtain an initial dataset with zero mean; then divide by the feature's standard deviation to obtain standard wind field feature data, standard land-sea temperature difference feature data, and a standard recirculation factor, each with a mean of 0 and a standard deviation of 1. The standardized features of each day are then concatenated into a standard sample, giving a standard classification dataset of N standard samples.
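The standardization above is the usual column-wise z-score. A NumPy sketch follows; leaving a zero-variance column at 0 rather than dividing by zero is an added assumption. The transform fixes each feature's mean and standard deviation but does not change the shape of its distribution.

```python
import numpy as np


def standardize(samples):
    """Z-score each feature column of an (N, d) array: subtract the mean, divide by the std."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    # Assumption: a constant feature is left at 0 instead of dividing by zero.
    std[std == 0.0] = 1.0
    return (x - mean) / std
```

Applied to the stacked daily feature vectors, each row of the result is one standard sample.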
[0041] Determining the optimal parameter combination for a density clustering model specifically includes:
[0042] Calculate the distance between every pair of standard samples in the standard classification dataset to form a sample distance matrix. The distance can be the Euclidean, Manhattan, or cosine distance; how these distances are calculated is well known in the art.
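A NumPy-only sketch of the sample distance matrix for the three distance options listed above; `scipy.spatial.distance.pdist` followed by `squareform` (metrics `"euclidean"`, `"cityblock"`, `"cosine"`) would give the same matrices. The function name `distance_matrix` is hypothetical, and the cosine branch assumes no all-zero sample rows.

```python
import numpy as np


def distance_matrix(x, metric="euclidean"):
    """Pairwise (N, N) distance matrix for the rows of x."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]  # (N, N, d) pairwise differences
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=-1)
    if metric == "cosine":
        norms = np.linalg.norm(x, axis=1)
        # Cosine similarity; undefined for all-zero rows.
        sim = (x @ x.T) / np.outer(norms, norms)
        return 1.0 - np.clip(sim, -1.0, 1.0)
    raise ValueError(f"unknown metric: {metric}")
```

The broadcasting approach uses O(N²d) memory, which is small for a few hundred daily samples.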
[0043] The ε-neighborhood radius and the minimum number of contained points M form a parameter combination. The parameter combinations are traversed according to a preset interval and a preset criterion, and the clustering evaluation index scores for each combination are calculated in turn. The clustering evaluation indices are the silhouette coefficient, the CH index, and the DB index; the optimal combination makes the silhouette coefficient approach its peak, the CH index approach its peak, and the DB index approach its trough. The preset interval (step size) can be 0.1 for the ε-neighborhood radius and 1 for the minimum number of contained points.
[0044] In determining the optimal parameter combination for the density clustering model, three curves are constructed with the parameter combination as the abscissa and the silhouette coefficient, CH index, and DB index scores as the ordinates, respectively. By analyzing the changing trends of the three curves, the parameter combination in which the silhouette coefficient tends to the peak, the CH index tends to the peak, and the DB index tends to the trough is selected as the optimal parameter combination.
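One way to carry out this traversal is with scikit-learn, assuming it is available: `DBSCAN` with `metric="precomputed"` runs on the sample distance matrix, and the three scores come from `sklearn.metrics`. The grid bounds, skipping combinations that yield fewer than two clusters, and scoring only non-noise samples are assumptions. Since the text selects the final combination by inspecting the three curves, this sketch only tabulates them; `scan_parameters` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)


def scan_parameters(z, dist, eps_values, m_values):
    """Score every (eps, M) combination; returns rows (eps, M, n_clusters, SC, CH, DB).

    z: standardized samples (N, d); dist: their (N, N) distance matrix.
    """
    rows = []
    for eps in eps_values:
        for m in m_values:
            labels = DBSCAN(eps=eps, min_samples=m, metric="precomputed").fit_predict(dist)
            kept = labels != -1  # drop noise points before scoring
            n_clusters = len(set(labels[kept]))
            # All three scores need at least 2 clusters and fewer clusters than samples.
            if n_clusters < 2 or n_clusters >= kept.sum():
                continue
            sc = silhouette_score(dist[np.ix_(kept, kept)], labels[kept], metric="precomputed")
            ch = calinski_harabasz_score(z[kept], labels[kept])
            db = davies_bouldin_score(z[kept], labels[kept])
            rows.append((eps, m, n_clusters, sc, ch, db))
    return rows
```

With the step sizes of [0043], a grid might be `eps_values = np.round(np.arange(0.1, 3.05, 0.1), 2)` and `m_values = range(2, 11)` (upper bounds hypothetical). Plotting SC, CH, and DB from the returned rows against the parameter combination gives the three curves.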
[0045] The silhouette coefficient is calculated as $SC = \frac{1}{n}\sum_{i=1}^{n} s(i)$ with $s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$. The closer the value is to 1, the better the clustering result under that parameter combination, so the parameters at the peak of the silhouette coefficient are preferred. Here, $s(i)$ is the silhouette coefficient of standard sample $i$ and $n$ is the number of clustered samples; if the wind cluster $C_i$ containing sample $i$ has only one sample, $s(i) = 0$ is assigned directly. $a(i) = \frac{1}{|C_i| - 1}\sum_{j \in C_i,\, j \ne i} d(i, j)$ is the average distance from standard sample $i$ to all other standard samples in its wind cluster, where $|C_i|$ is the total number of samples in wind cluster $C_i$ and $d(i, j)$ is the Euclidean distance between samples $i$ and $j$. $b(i) = \min_{C_k \ne C_i} \frac{1}{|C_k|}\sum_{j \in C_k} d(i, j)$ is the minimum, over all wind clusters not containing standard sample $i$, of the average distance from sample $i$ to that cluster, where $C_k$ is any wind cluster other than $C_i$.
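The silhouette definitions above translate directly into NumPy. This sketch assumes a precomputed distance matrix restricted to non-noise samples and integer cluster labels; it mirrors the formula for illustration and is not meant to replace a library implementation such as scikit-learn's `silhouette_score`. The function name `silhouette` is hypothetical.

```python
import numpy as np


def silhouette(dist, labels):
    """Mean silhouette coefficient from an (n, n) distance matrix and cluster labels."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        members = labels == own
        if members.sum() == 1:
            continue  # singleton cluster: s(i) = 0
        # a(i): mean distance to the other members of its own cluster (d(i, i) = 0).
        a = dist[i, members].sum() / (members.sum() - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == k].mean() for k in clusters if k != own)
        s[i] = (b - a) / max(a, b)
    return float(s.mean())
```

For example, points at 0, 1, 10, and 11 on a line with labels `[0, 0, 1, 1]` give 359/399 (about 0.90).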
[0046] The CH (Calinski-Harabasz) index is calculated as $CH = \frac{B / (K - 1)}{W / (n - K)}$. The higher the CH index, the more distinct the clusters, the more compact the samples within each cluster, and the better the clustering. Here, $B = \sum_{k=1}^{K} n_k \lVert c_k - c \rVert^2$ is the size-weighted sum of squared distances from the centers of all wind clusters to the global mean vector and represents the separation between wind clusters; $W = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - c_k \rVert^2$ is the sum of squared distances from all wind cluster samples to their cluster centers and reflects cluster compactness; $K$ is the total number of wind clusters under the current parameter combination; $n$ is the number of clustered samples and $n_k$ is the number of samples in the $k$-th wind cluster $C_k$; $c_k$ is the mean vector (i.e., cluster center) of the $k$-th wind cluster; $\lVert \cdot \rVert$ is the Euclidean norm; and $c$ is the global mean vector.
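A NumPy sketch of the CH index. It weights each cluster's between-cluster term by its size $n_k$, as in the usual Calinski-Harabasz definition and scikit-learn's `calinski_harabasz_score`; that weighting is an assumption where the text is not explicit. The function name is hypothetical.

```python
import numpy as np


def calinski_harabasz(x, labels):
    """CH index: (B / (K - 1)) / (W / (n - K)) for samples x (n, d) and cluster labels."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(x), len(clusters)
    if not 1 < k < n:
        raise ValueError("the CH index needs 2 <= K < n")
    overall_mean = x.mean(axis=0)  # global mean vector
    between = within = 0.0
    for c in clusters:
        members = x[labels == c]
        centre = members.mean(axis=0)  # mean vector (centre) of this cluster
        # Between-cluster dispersion, weighted by cluster size.
        between += len(members) * np.sum((centre - overall_mean) ** 2)
        # Within-cluster dispersion: squared distances of members to their centre.
        within += np.sum((members - centre) ** 2)
    return float((between / (k - 1)) / (within / (n - k)))
```

For the one-dimensional samples 0, 2, 10, 12 split into two clusters, B = 100 and W = 4, so CH = 100 / (4 / 2) = 50.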
[0047] The DB (Davies-Bouldin) index is calculated as $DB = \frac{1}{K}\sum_{i=1}^{K} \max_{j \ne i} R_{ij}$ with $R_{ij} = \frac{s_i + s_j}{d_{ij}}$; the closer the index is to 0, the better the clustering. Here, $R_{ij}$ is the similarity between the $i$-th and $j$-th wind clusters, $d_{ij}$ is the distance between the centers of the $i$-th and $j$-th wind clusters, and $s_i$ is the average intra-cluster distance of the $i$-th wind cluster.
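A NumPy sketch of the DB index, taking $s_i$ as the mean Euclidean distance of a cluster's members to its center and $d_{ij}$ as the Euclidean distance between centers. That is the usual definition (and the one behind scikit-learn's `davies_bouldin_score`), assumed here since the text does not spell out the distance used. The function name is hypothetical.

```python
import numpy as np


def davies_bouldin(x, labels):
    """DB index: mean over clusters of the worst-case ratio (s_i + s_j) / d_ij."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centres = np.array([x[labels == c].mean(axis=0) for c in clusters])
    # s_i: average distance of cluster members to their own centre.
    spread = np.array([np.linalg.norm(x[labels == c] - centres[i], axis=1).mean()
                       for i, c in enumerate(clusters)])
    # d_ij: Euclidean distance between cluster centres.
    centre_dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    k = len(clusters)
    worst = [max((spread[i] + spread[j]) / centre_dist[i, j] for j in range(k) if j != i)
             for i in range(k)]
    return float(np.mean(worst))
```

For the samples 0, 2, 10, 12 in two clusters, each spread is 1 and the centers are 10 apart, giving DB = 0.2.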
[0048] Performing density clustering specifically includes:
[0049] Based on the sample distance matrix, and strictly adhering to the optimal combination of ε-neighborhood radius and minimum number of contained points M, standard samples in the standard classification dataset are identified as core points, boundary points, and noise points. Starting from any unassigned core point, the algorithm expands outward using density reachability, continuously absorbing new core points to be classified into the same wind cluster through chain connections of core points. Boundary points within the ε-neighborhood radius of a core point are also classified into that wind cluster. This process is repeated for all unvisited core points until all core points and boundary points are classified into their respective wind clusters. During this process, noise points are removed. Here, a core point is defined as a standard sample whose total number of standard samples within its ε-neighborhood radius is at least M; a boundary point is a standard sample covered by the ε-neighborhood radius of a core point but which is not itself a core point; and a noise point is a standard sample that is neither a core point nor a boundary point.
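The core/boundary/noise procedure above is the DBSCAN algorithm. A compact NumPy sketch working on the precomputed distance matrix follows. Counting a sample in its own ε-neighborhood and using `<=` for the radius test are assumptions (they match scikit-learn's convention), and a boundary point reachable from several clusters goes to whichever cluster reaches it first. `density_cluster` and `NOISE` are hypothetical names.

```python
from collections import deque

import numpy as np

NOISE = -1


def density_cluster(dist, eps, m):
    """DBSCAN on an (N, N) distance matrix. Returns labels 0..K-1, or -1 for noise."""
    dist = np.asarray(dist, dtype=float)
    n = len(dist)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # includes i itself
    is_core = np.array([len(nb) >= m for nb in neighbours])
    labels = np.full(n, NOISE)
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != NOISE:
            continue  # only unassigned core points seed a new cluster
        labels[start] = cluster
        queue = deque([start])
        while queue:  # expand by density reachability
            p = queue.popleft()
            for q in neighbours[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster  # core or boundary point joins the cluster
                    if is_core[q]:
                        queue.append(q)  # only core points keep expanding
        cluster += 1
    return labels  # samples still labelled -1 are noise and are removed
```

With points at 0, 0.5, 1, 10, 10.5, 11, and 50 on a line, ε = 0.6 and M = 3 give two clusters; 0 and 1 join as boundary points of the cluster seeded by 0.5, and 50 is noise.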
[0050] Step S3: Identify the wind type of each wind cluster in turn based on the cluster's statistical characteristics to obtain the wind type identification results.
[0051] The statistical characteristics of each wind cluster include the average wind speed, average wind direction, average land-sea temperature difference, and average recirculation factor. The wind type identification results include the northerly, southerly, westerly, easterly, and local land-sea breeze types. How the wind type is determined from these statistics is known in the art.
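The text leaves the mapping from cluster statistics to wind types to existing practice. Purely as an illustration, the sketch below labels a cluster as a local land-sea breeze when its mean recirculation factor reaches a threshold, and otherwise by the dominant component of its mean wind vector. The threshold of 0.5, the rule order, the convention that R = 1 means full recirculation, the omission of temperature-difference criteria, and the function name are all hypothetical and would need calibration for a real site.

```python
def label_cluster(mean_u, mean_v, mean_r, r_threshold=0.5):
    """Hypothetical rule: classify one wind cluster from its mean statistics.

    mean_u, mean_v: mean zonal / meridional components in m/s
    (u > 0 blows toward the east, v > 0 toward the north).
    mean_r: mean recirculation factor, assuming 1 means full recirculation.
    """
    if mean_r >= r_threshold:
        return "local land-sea breeze"
    # Wind types are named by the direction the wind blows FROM.
    if abs(mean_v) >= abs(mean_u):
        return "southerly" if mean_v > 0 else "northerly"
    return "westerly" if mean_u > 0 else "easterly"
```

A wind blowing toward the south (negative v) comes from the north, hence the apparent sign inversion in the labels.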
[0052] In this invention, the terms "first," "second," and "third," etc., are used only to distinguish similar objects and are not necessarily used to describe a specific order or sequence, nor should they be construed as indicating or implying relative importance. The use of terms such as "upper," "lower," "left," "right," "front," and "rear" to indicate orientation or positional relationships is based on the orientation or positional relationships shown in the accompanying drawings and is only for the convenience of describing the invention, not to indicate or imply that the device referred to must have a specific orientation, or be constructed and operated in a specific orientation. Therefore, it should not be construed as a limitation on the scope of protection of this invention. Those skilled in the art can understand the specific meaning of the above terms in this application according to the specific circumstances.
[0053] Furthermore, in the description of this application, unless otherwise stated, "multiple" means two or more. "And / or" describes the relationship between related objects, indicating that three relationships can exist. For example, A and / or B can represent: A alone, A and B simultaneously, or B alone. The character " / " generally indicates that the preceding and following related objects have an "or" relationship.
[0054] The above are merely specific embodiments of the present invention, but the design concept of the present invention is not limited thereto. Any non-substantial modifications made to the present invention using this concept shall be considered as infringing upon the protection scope of the present invention.
Claims
1. A method for identifying sea and land breezes based on density clustering, characterized in that it comprises the following steps: Step S1: for each day within a set period of N consecutive days, acquire hourly meteorological data for the target area within a unit time interval, the meteorological data including wind field data and land-sea temperature difference data; calculate the recirculation factor of the air mass from each day's wind field data, and extract the wind field feature data and land-sea temperature difference feature data for the first and second time intervals within the day's unit time interval; map and combine the wind field feature data, land-sea temperature difference feature data, and recirculation factor into a single sample representing the day, the N samples forming a dataset to be classified. Step S2: preprocess the dataset to be classified to obtain a standard classification dataset; use the standard classification dataset to determine the optimal parameter combination of the density clustering model; then use the density clustering model with the optimal parameter combination to perform density clustering on the standard classification dataset, dividing it into multiple wind clusters and removing noise points. Step S3: identify the wind type of each wind cluster in turn based on its statistical characteristics to obtain the wind type identification results.
2. The sea and land breeze identification method based on density clustering according to claim 1, characterized in that the wind field data includes wind direction data and wind speed data; the wind direction data includes the sine and cosine of the wind direction angle, and the wind speed data includes the zonal wind component and the meridional wind component; and the land-sea temperature difference data is the temperature difference between land observation points and ocean observation points within the target area.
3. The sea and land breeze identification method based on density clustering according to claim 2, characterized in that, in step S1, the recirculation factor R is calculated from the hourly wind field data: first, the zonal wind component $u_t$ and meridional wind component $v_t$ are extracted for each hour $t$ within the unit time interval T; according to $\Delta x_t = 3600\,u_t$ and $\Delta y_t = 3600\,v_t$, $u_t$ and $v_t$ are converted into hourly displacement increments $\Delta x_t$ and $\Delta y_t$; then the hourly displacement increments within T are summed to obtain the net linear displacement $L$ of the air mass within that interval and the total length $S$ of its motion path; finally, the recirculation factor is calculated as $R = 1 - L/S$, with $R = 1$ assigned when $S = 0$, where $L = \sqrt{\left(\sum_t \Delta x_t\right)^2 + \left(\sum_t \Delta y_t\right)^2}$ and $S = \sum_t \sqrt{\Delta x_t^2 + \Delta y_t^2}$.
4. The sea and land breeze identification method based on density clustering according to claim 3, characterized in that: In step S1, the unit time interval T = 24 hours, the first time interval is from 3:00 to 7:00 local time in the target area, and the second time interval is from 14:00 to 18:00 local time in the target area.
5. The sea and land breeze identification method based on density clustering according to claim 4, characterized in that: In step S1, the wind field characteristic data includes the average wind field data in the first time period and the average wind field data in the second time period; the land-sea temperature difference characteristic data includes the average land-sea temperature difference in the first time period and the average land-sea temperature difference in the second time period; and the average wind field data includes the average wind direction data and the average wind speed data.
6. The sea and land breeze identification method based on density clustering according to claim 5, characterized in that, in step S2, the wind field feature data, land-sea temperature difference feature data, and recirculation factor in each sample are standardized to a mean of 0 and a standard deviation of 1, yielding standard wind field feature data, standard land-sea temperature difference feature data, and a standard recirculation factor; the standardized features of each day are concatenated to form a standard sample, giving a standard classification dataset of N1 standard samples.
7. The sea and land breeze identification method based on density clustering according to claim 6, characterized in that, in step S2, determining the optimal parameter combination of the density clustering model specifically includes: calculating the distance between every pair of standard samples in the standard classification dataset to form a sample distance matrix; traversing the parameter combinations according to a preset interval and a preset distance; and calculating the clustering evaluation index scores for each parameter combination in turn, the clustering evaluation indices being the silhouette coefficient, the CH index, and the DB index; the optimal parameter combination is the one at which the silhouette coefficient approaches its peak, the CH index approaches its peak, and the DB index approaches its trough; a parameter combination consists of the ε-neighborhood radius and the minimum number of contained points M.
8. The sea and land breeze identification method based on density clustering according to claim 6, characterized in that: In determining the optimal parameter combination for the density clustering model, three curves are constructed with the parameter combination as the abscissa and the silhouette coefficient, CH index, and DB index scores as the ordinates, respectively. By analyzing the changing trends of the three curves, the parameter combination in which the silhouette coefficient tends to the peak, the CH index tends to the peak, and the DB index tends to the trough is selected as the optimal parameter combination.
9. The sea and land breeze identification method based on density clustering according to claim 8, characterized in that: In step S2, density clustering specifically includes: based on the sample distance matrix, strictly according to the optimal combination of ε-neighborhood radius and minimum number of contained points M, identifying standard samples in the standard classification dataset as core points, boundary points, and noise points. Starting from any unassigned core point, expanding outward using density reachability, continuously absorbing new core points through chain connections of core points to be classified into the same wind cluster, and classifying boundary points within the ε-neighborhood radius of the core point into that wind cluster. Traversing all unvisited core points until all core points and boundary points are classified into the corresponding wind clusters. During this process, noise points are removed. Here, a core point refers to a standard sample whose total number of standard samples contained within its ε-neighborhood radius is at least M, a boundary point refers to a standard sample covered by the ε-neighborhood radius of a core point but which is not itself a core point, and a noise point refers to a standard sample that is neither a core point nor a boundary point.
10. The sea and land breeze identification method based on density clustering according to claim 9, characterized in that, in step S3, the statistical characteristics of each wind cluster include the average wind speed, average wind direction, average land-sea temperature difference, and average recirculation factor, and the wind type identification results include the northerly, southerly, westerly, easterly, and land-sea breeze types.