A hail label completion matching method based on DBSCAN clustering

Applying DBSCAN clustering to radar reflectivity data and geographic coordinate information, the method constructs the spatiotemporal trajectories of hail cells and extrapolates their positions. This enables efficient completion of missing hail labels and improves the completeness and quality of the label dataset.

CN122240637A · Pending · Publication Date: 2026-06-19 · JIANGXI PROVINCIAL METEOROLOGICAL DATA CENT (JIANGXI PROVINCIAL METEOROLOGICAL ARCHIVES)

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: JIANGXI PROVINCIAL METEOROLOGICAL DATA CENT (JIANGXI PROVINCIAL METEOROLOGICAL ARCHIVES)
Filing Date: 2026-05-20
Publication Date: 2026-06-19

AI Technical Summary

Technical Problem

Existing hail labeling workflows leave labels incomplete, especially where labeled data is scarce. Methods that rely on additional prediction models are computationally expensive and inefficient, which makes efficient label completion difficult.

Method used

DBSCAN clustering is applied to radar reflectivity data and geographic coordinate information from multiple time frames to extract hail feature regions. The method then constructs the spatiotemporal motion trajectory of each hail cell, extrapolates the hail position in the target time frame from the motion parameters, and completes the labels by matching unlabeled candidate positions to the extrapolated positions according to their spatial relationship.

Benefits of technology

It achieves low-cost and efficient hail label completion, adapts to scenarios without predictive samples, improves the completeness and quality of the label dataset, and solves the problem of missing labels caused by radar observation blind spots, data collection gaps, and human annotation omissions.



Abstract

This invention provides a hail label completion and matching method based on DBSCAN clustering. The method first acquires radar reflectivity data for multiple time frames, the corresponding geographic coordinate information, and the original label dataset of the sample to be completed. From these data it extracts the hail feature regions of each time frame by clustering, associates and matches the feature regions across time frames, and constructs the spatiotemporal motion trajectory of each hail cell. It then calculates the motion parameters of each hail cell and extrapolates them using time information to obtain the predicted hail location for the target time frame. Unlabeled candidate locations for the target time frame are selected from the original label dataset and matched against the predicted hail location according to their spatial relationship. The labels of successfully matched candidate locations are updated to hail labels, and the updated label dataset is output. The invention addresses the lack of a low-cost, efficient hail label completion and matching method that does not rely on additional prediction models.

Description

Technical Field

[0001] This invention relates to the field of label completion technology, and in particular to a hail label completion and matching method based on DBSCAN clustering.

Background Technology

[0002] With the rapid application of artificial intelligence technology in the meteorological field, hail identification and forecasting models based on machine learning and deep learning have become an important technical means in meteorological operations. The performance of such data-driven models is highly dependent on high-quality and complete labeled training data. Massive and accurate hail label data has become a key foundation for improving the model's forecast accuracy and generalization ability.

[0003] In hail monitoring and labeling, radar composite reflectivity (CR) data is a core data source for identifying and labeling hail cells because it directly reflects the reflection characteristics of precipitation particles in the atmosphere. Its reflectivity information, recorded at a fixed spatiotemporal resolution, characterizes the spatial distribution and temporal movement of hail cells. In practice, hail labels are mostly produced by manual annotation based on radar observation data. The label dataset contains the latitude and longitude coordinates of candidate locations within the radar coverage area at different time frames, along with hail occurrence markers. However, owing to radar coverage blind spots, time gaps in data collection, and omissions in manual labeling, some time frames have missing hail labels, that is, hail cells whose labels are blank and for which no predicted samples exist.

[0004] To address incomplete labeling, existing technologies include methods based on supervised prediction models, such as training a sequence prediction network to predict the location of hail in future frames directly from historical radar image sequences. While these methods are effective, their performance relies heavily on training with a large amount of fully labeled data. Where labeled data is scarce, as in this problem, a "chicken or egg" dilemma arises, and the models are also complex and computationally expensive.

Summary of the Invention

[0005] Therefore, the purpose of this invention is to provide a hail label completion and matching method based on DBSCAN clustering, aiming to solve the problem that the prior art lacks a low-cost, efficient hail label completion and matching method that does not rely on additional prediction models.

[0006] A hail label completion and matching method based on DBSCAN clustering according to an embodiment of the present invention comprises:
obtaining radar reflectivity data containing multiple time frames, the corresponding geographic coordinate information, and the original label dataset corresponding to the sample to be completed;
based on the radar reflectivity data of the multiple time frames and the corresponding geographic coordinate information, extracting the hail feature regions of each time frame by clustering, and associating and matching the feature regions of different time frames to construct the spatiotemporal motion trajectory of each hail cell;
based on the spatiotemporal motion trajectory, calculating the motion parameters of the hail cell, and extrapolating the predicted hail location of the target time frame from the motion parameters and time information;
obtaining the unlabeled candidate locations of the target time frame from the original label dataset, matching the candidate locations with the predicted hail locations by calculating their spatial relationship, and updating the labels of the successfully matched candidate locations to hail labels so as to output the updated label dataset.

[0007] In addition, the hail label completion and matching method based on DBSCAN clustering according to the above embodiment of the present invention may also have the following additional technical features. Furthermore, the step of extracting hail feature regions from each time frame through clustering includes: determining a reflectivity threshold based on the overall statistical distribution of the radar reflectivity data from the multiple time frames; dynamically determining the neighborhood radius parameter of the density clustering algorithm based on the spatial density of pixels whose reflectivity exceeds the reflectivity threshold in the current frame; and, using the reflectivity threshold and the neighborhood radius parameter, clustering the current frame data with the DBSCAN algorithm to obtain hail feature regions.

[0008] Furthermore, the step of associating and matching feature regions from different time frames includes: forming the matching cost from the spherical distance between the center points of the feature regions and the morphological similarity calculated from the pixel distribution of the feature regions; and using the Hungarian algorithm to find the optimal set of matching pairs that minimizes the total matching cost.

[0009] Furthermore, the motion parameters include at least the moving speed and the moving direction, and the step of calculating the motion parameters of the hail cell includes: comparing the calculated moving speed with a preset maximum reasonable speed value and, if the moving speed exceeds that value, correcting it to the maximum reasonable speed value; and applying a moving average filter to the calculated motion direction angle sequence to smooth out abrupt changes in angle between frames.

[0010] Furthermore, the step of extrapolating the predicted hail location for the target time frame based on the motion parameters and time information includes: starting from the last known position in the spatiotemporal motion trajectory, calculating the extrapolated position using a uniform linear motion model based on the corrected movement speed, the corrected movement direction angle, and the time interval from the last known time frame to the target time frame. The movement direction angle used to calculate the extrapolated position is the sum of the direction angle after moving average filtering and the direction angle correction amount. The direction angle correction amount is determined from the direction angle change trend of the spatiotemporal motion trajectory within a preset number of historical frames.

[0011] Furthermore, the step of matching the candidate locations with the predicted hail locations by calculating the spatial relationship includes:
calculating the minimum distance from each unlabeled candidate location to all predicted hail locations, and sorting the candidates by that distance;
starting from the initial search radius and gradually increasing it, adding in each round of search the candidate locations whose minimum distance is less than or equal to the current search radius to the undetermined set;
calculating a completion confidence score for each candidate location in the undetermined set, as a weighted sum based on the minimum distance of the location, the number of consecutive frames of the trajectory corresponding to the matched predicted hail location, and the historical maximum reflectivity intensity of the feature region associated with that trajectory;
selecting, based on the completion confidence score, the candidate locations to be finally completed from the undetermined set.

[0012] Furthermore, the step of selecting the candidate locations to be finally completed from the undetermined set based on the completion confidence score includes: within the current search radius, if the number of candidate locations in the undetermined set exceeds the maximum number of completions per frame, selecting the candidate locations with the higher completion confidence scores for completion.
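The radius-expansion and confidence-based selection described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Candidate` record, the weights, the normalisation caps, the radius schedule, and the rule of stopping at the first radius that yields any candidate are all hypothetical, since the text gives no numeric values or stopping criterion.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    idx: int            # index of the unlabeled candidate location
    min_dist_km: float  # distance to the nearest predicted hail location
    track_frames: int   # consecutive frames in the matched trajectory
    max_dbz: float      # historical maximum reflectivity on that trajectory


def confidence(c, radius_km, w_dist=0.5, w_frames=0.3, w_dbz=0.2,
               frames_cap=10, dbz_cap=70.0):
    """Weighted sum of three terms, each scaled to [0, 1].
    Weights and caps are assumed values, not taken from the source."""
    dist_term = max(0.0, 1.0 - c.min_dist_km / radius_km)
    frames_term = min(c.track_frames, frames_cap) / frames_cap
    dbz_term = min(max(c.max_dbz, 0.0), dbz_cap) / dbz_cap
    return w_dist * dist_term + w_frames * frames_term + w_dbz * dbz_term


def select_completions(cands, r0_km=2.0, step_km=2.0, r_max_km=10.0,
                       max_per_frame=3):
    """Grow the search radius until some candidates fall inside it, then keep
    at most `max_per_frame` of them, highest confidence first."""
    ordered = sorted(cands, key=lambda c: c.min_dist_km)
    radius = r0_km
    while radius <= r_max_km:
        pending = [c for c in ordered if c.min_dist_km <= radius]
        if pending:
            pending.sort(key=lambda c: confidence(c, radius), reverse=True)
            return [c.idx for c in pending[:max_per_frame]]
        radius += step_km
    return []  # nothing close enough: leave the frame unchanged
```

Scaling each term to [0, 1] before weighting keeps the three quantities (kilometres, frame counts, dBZ) comparable; other normalisations would serve equally well.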

[0013] Another objective of this invention is to provide a hail label completion and matching system based on DBSCAN clustering, for implementing the aforementioned hail label completion and matching method based on DBSCAN clustering, the system comprising:
a data acquisition module, used to acquire radar reflectivity data containing multiple time frames, the corresponding geographic coordinate information, and the original label dataset corresponding to the sample to be completed;
a data processing module, used to extract the hail feature regions of each time frame by clustering based on the radar reflectivity data of the multiple time frames and the corresponding geographic coordinate information, and to associate and match the feature regions of different time frames in order to construct the spatiotemporal motion trajectory of each hail cell;
a prediction module, used to calculate the motion parameters of the hail cell based on the spatiotemporal motion trajectory, and to extrapolate the predicted hail location of the target time frame from the motion parameters and time information;
a labeling module, used to obtain the unlabeled candidate locations of the target time frame from the original label dataset, match the candidate locations with the predicted hail locations by calculating their spatial relationship, and update the labels of the successfully matched candidate locations to hail labels so as to output an updated label dataset.

[0014] Another objective of this invention is to provide a storage medium storing a computer program that, when executed by a processor, implements the steps of the aforementioned hail label completion and matching method based on DBSCAN clustering.

[0015] Another objective of this invention is to provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the aforementioned hail label completion and matching method based on DBSCAN clustering.

[0016] This invention uses multi-time-frame radar reflectivity data, geographic coordinate information, and the original label dataset. By mining the spatiotemporal motion patterns of individual hail cells, it performs feature region extraction, trajectory construction, target location extrapolation, and label matching and completion. The entire completion process runs on historical radar data and existing labels alone, without relying on any additional supervised prediction model. This significantly reduces computational overhead and achieves low-cost, lightweight hail label completion suited to operational scenarios without predicted samples. Furthermore, by constructing the spatiotemporal motion trajectory of each hail cell, extrapolating its position in the target frame, and then labeling unlabeled candidate positions through spatial relationship matching, the method addresses the missing hail labels caused by radar coverage blind spots, data acquisition time gaps, and manual annotation omissions in operational practice. It automates the completion of missing hail labels in the absence of predicted samples and improves the completeness and quality of the hail label dataset. The invention therefore solves the problem that the prior art lacks a low-cost, efficient DBSCAN clustering-based hail label completion and matching method that does not rely on additional prediction models.

Attached Figure Description

[0017] Figure 1 is a flowchart of a hail label completion and matching method based on DBSCAN clustering in the first embodiment of the present invention;
Figure 2 is a schematic diagram of the structure of a hail label completion and matching system based on DBSCAN clustering in the second embodiment of the present invention;
Figure 3 is a schematic diagram of the structure of the electronic device in the third embodiment of the present invention.
The following detailed description, in conjunction with the accompanying drawings, further illustrates the present invention.

Detailed Implementation

[0018] To facilitate understanding of the present invention, a more complete description will be given below with reference to the accompanying drawings. Several embodiments of the invention are illustrated in the drawings. However, the invention can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

[0019] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and / or" as used herein includes any and all combinations of one or more of the associated listed items.

[0020] Example 1. Referring to Figure 1, a hail label completion and matching method based on DBSCAN clustering in the first embodiment of the present invention is shown. The method includes steps S01 to S04.

[0021] S01, obtain radar reflectivity data containing multiple time frames, the corresponding geographic coordinate information, and the original label dataset corresponding to the sample to be completed.

[0022] Specifically, the radar composite reflectivity (CR) factor is stored in NumPy array format, with a spatial resolution of [missing information] grid, where each grid point corresponds to a latitude and longitude coordinate. Data coverage: [missing information]. Frames are spaced 6 minutes apart. Each frame of CR data corresponds to one time point and contains reflectivity values (in dBZ). Each CR file has a corresponding radar station format file containing the start and end latitude and longitude and the numbers of grid rows and columns, which are used to convert pixel coordinates to geographic coordinates. The original label file is in CSV format and contains three columns: caseID, Time, and Label. caseID is the process number, Time is the occurrence time, and the Label column contains the latitude and longitude of all candidate points in that frame together with their labels (0 or 1), where 0 represents no occurrence and 1 represents occurrence. The format is... For each caseID that needs to be completed, its historical frame CR data and corresponding label information are extracted as input for subsequent trajectory tracking and extrapolation.

[0023] S02, based on the radar reflectivity data of the multiple time frames and the corresponding geographic coordinate information, extract the hail feature regions of each time frame by clustering, and associate and match the feature regions of different time frames to construct the spatiotemporal motion trajectory of each hail cell.

[0024] Specifically, the step of extracting hail feature regions from each time frame through clustering includes: determining a reflectivity threshold based on the overall statistical distribution of the radar reflectivity data across the multiple time frames; dynamically determining the neighborhood radius parameter of the density clustering algorithm based on the spatial density of pixels whose reflectivity exceeds the threshold in the current frame; and, using the reflectivity threshold and the neighborhood radius parameter, clustering the current frame data with the DBSCAN algorithm to obtain hail feature regions. Adapting the threshold to the reflectivity distribution of different weather scenarios avoids the missed or false detections that a fixed threshold can cause. Matching the dynamic neighborhood radius to the spatial clustering density of hail echoes improves clustering accuracy. DBSCAN does not require a preset number of clusters and can identify regions of arbitrary shape, which suits the irregular shapes of hail cells. Hail feature regions can therefore be separated from radar echoes and noise filtered out, giving the spatial location and morphological characteristics of each hail cell in a single frame.
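As a rough illustration of this clustering step, the sketch below derives a threshold from the pooled reflectivity values, sets the neighborhood radius from the spacing of above-threshold pixels, and runs a plain DBSCAN. The 95th percentile and the factor of 1.5 follow the example values given for this embodiment; everything else is an assumption made for illustration, including the nearest-rank percentile, the reading of "average distance" as mean nearest-neighbour distance, `min_samples=4`, and all function names.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile of a flat list (q in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def dbscan(points, eps, min_samples):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_samples:  # only core points expand the cluster
                queue.extend(nb)
    return labels


def hail_regions(frames, q=95.0, eps_factor=1.5, min_samples=4):
    """frames: list of 2-D lists of dBZ values.
    Returns (threshold, list of {cluster_id: [(row, col), ...]} per frame)."""
    threshold = percentile([v for f in frames for row in f for v in row], q)
    result = []
    for f in frames:
        pts = [(r, c) for r, row in enumerate(f)
               for c, v in enumerate(row) if v > threshold]
        if len(pts) < 2:
            result.append({})
            continue
        eps = eps_factor * mean_nn_distance(pts)
        clusters = {}
        for p, lab in zip(pts, dbscan(pts, eps, min_samples)):
            if lab != -1:
                clusters.setdefault(lab, []).append(p)
        result.append(clusters)
    return threshold, result
```

The pairwise distance loops are quadratic in the number of above-threshold pixels, which is acceptable for a sketch; a production version would normally use a library implementation with a spatial index.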

[0025] In practical implementation, the reflectivity threshold can be a preset value, but for greater accuracy an adaptive reflectivity threshold can be used. For example, the reflectivity values of all pixels in all T frames of the current case are pooled, their 95th percentile is calculated, and this value is used as the adaptive reflectivity threshold. The threshold is thus adjusted dynamically according to the data distribution itself. Similarly, the neighborhood radius can be a preset value or an adaptive one. For example, in the frame being processed, all pixels with reflectivity above the threshold are identified and the average distance between these high-value pixels is calculated. The neighborhood radius is then set to a multiple of this average distance, such as 1.5 times, so that it adjusts to the cluster density of the high-reflectivity areas. Because hail echoes typically appear as high-density areas in CR images while noise points have low density, the DBSCAN algorithm is used for clustering. It needs no preset number of clusters, identifies clusters of arbitrary shape automatically, and filters out noise. Each frame is clustered with DBSCAN, using the adaptively obtained neighborhood radius and a fixed minimum number of samples as parameters. For each cluster, the mean of its pixel coordinates is calculated first, and the result is then converted into geographic coordinates (latitude and longitude). Clusters can also be filtered by the number of pixels they contain, removing excessively small clusters and retaining the effective ones, which improves efficiency. The coordinate transformation formula is:

[0026]

[0027] In the formula, the symbols denote, in order: the width and height of the CR data grid; the column index and row index of a grid point, with value ranges of [missing information]; the converted geographic longitude and latitude; the minimum and maximum longitudes of the CR data coverage area; and its minimum and maximum latitudes, which correspond to the start and end latitude and longitude of the data. This processing yields the cluster set for frame t, where m is the total number of clusters obtained from clustering in frame t. Each cluster contains its center latitude and longitude and its maximum CR value. The sample CR data is in a ... grid format in which each pixel corresponds to a fixed geospatial unit, so this linear mapping is the only suitable way to convert pixel coordinates to geographic latitude and longitude. A non-linear mapping would break the spatial resolution consistency of the radar data and fail to meet the accuracy requirements of hail cell trajectory tracking. The latitude and longitude output by this formula matches the input required by the spherical distance formula in subsequent steps, ensuring a unified coordinate system throughout the process. A Mercator projection was also considered during scheme selection; however, it introduces projection distortion around the radar site and increases computational complexity, which does not meet the real-time label completion requirements of this invention.
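The formula itself did not survive in this text, so the following is only a plausible linear mapping built from the quantities listed above. It assumes that column 0 lies at the minimum longitude, that row 0 lies at the maximum latitude (a north-up image), and that the last column and row land exactly on the opposite bounds; the function names and the bounds used in the usage note are hypothetical.

```python
def pixel_to_geo(col, row, width, height, lon_min, lon_max, lat_min, lat_max):
    """Linear map from grid indices to (lon, lat).

    Assumption: col 0 -> lon_min, col width-1 -> lon_max,
                row 0 -> lat_max, row height-1 -> lat_min.
    """
    lon = lon_min + col / (width - 1) * (lon_max - lon_min)
    lat = lat_max - row / (height - 1) * (lat_max - lat_min)
    return lon, lat


def cluster_center(pixels, **grid):
    """Mean (row, col) of a cluster's pixels, converted to (lon, lat)."""
    mean_row = sum(r for r, _ in pixels) / len(pixels)
    mean_col = sum(c for _, c in pixels) / len(pixels)
    return pixel_to_geo(mean_col, mean_row, **grid)
```

With a hypothetical 101 by 101 grid covering 114 to 116 degrees east and 27 to 29 degrees north, `pixel_to_geo(50, 50, 101, 101, 114.0, 116.0, 27.0, 29.0)` falls at the centre of the coverage area. If the radar files store rows south to north, the latitude line would need its sign flipped.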

[0028] Furthermore, the spherical distance between the center points of the feature regions and the morphological similarity calculated from the pixel distribution of the feature regions together constitute the matching cost. The Hungarian algorithm is used to find the optimal set of matching pairs that minimizes the sum of the matching costs. The spherical distance ensures the accuracy of geospatial matching, the morphological similarity avoids mismatching different hail cells whose shapes differ greatly, and the Hungarian algorithm achieves a globally optimal matching that avoids local matching conflicts and preserves trajectory continuity. Hail feature regions from different time frames can thus be associated precisely, forming continuous hail cell trajectories.

[0029] In practical implementation, constructing the continuous trajectory of a hail cell requires associating the feature regions of two adjacent frames. The matching cost for two feature regions from adjacent frames is determined by both spatial distance and morphological similarity. The spatial distance is the approximate spherical distance between the center points of the two regions, calculated using the following formula:

[0030] In the formula, the symbols denote: the latitude of the center point of a cluster region in the previous frame; the latitude of the center point of the cluster region in the next frame; the longitude of the center point of the cluster region in the previous frame; the longitude of the center point of the cluster region in the next frame; the difference in latitude between the two center points; the difference in longitude between the two center points; and the average latitude of the two center points, used for latitude correction of the longitude distance. In short-term nowcasting, the movement range of hail cells is typically within 100 km, and the calculation error of this formula is below 0.1 km, which meets the accuracy requirements of inter-frame cluster matching. A high-precision spherical distance formula would increase the computation by a factor of 3 while improving accuracy by less than 0.1 km, which is excessive for this purpose. Moreover, the Hungarian algorithm requires the cost matrix to be "numerical, non-negative, and comparable". The distance value output by this formula can serve as the cost without additional normalization, making it a suitable cost representation for cluster matching. The latitude correction term, together with the latitude correction in the longitude increment calculation in subsequent steps, forms a unified geographic correction logic that keeps the geographic error consistent throughout the process. To further safeguard matching accuracy without greatly increasing the computational load, morphological similarity can be used alongside the distance to avoid mismatching different hail cells whose shapes differ greatly.
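One common approximation that fits the quantities listed above (latitude difference, longitude difference, and a mean-latitude correction on the longitude term) is the equirectangular distance. The sketch below is an assumed reconstruction, not the filed formula; the constant 111.0 km per degree is the value the description uses later for extrapolation, and the function name is hypothetical.

```python
import math

KM_PER_DEG = 111.0  # kilometres per degree of latitude


def approx_distance_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of the distance between two centers."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    # Longitude spacing shrinks with latitude, hence the cos(mean_lat) factor.
    return KM_PER_DEG * math.hypot(dlat, dlon * math.cos(mean_lat))
```

For two centers one degree of latitude apart on the same meridian this returns 111.0 km, and at 60 degrees north one degree of longitude comes out at about half that.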

[0031] Morphological similarity can be calculated from shape features of the two regions, such as area and aspect ratio, giving a similarity score between 0 and 1. The total matching cost can therefore be designed as a weighted combination of the normalized distance and the similarity score. The total matching cost measures the degree of matching between two feature regions in two adjacent frames: the smaller the value, the more likely they belong to the same hail cell. The distance term is the normalized distance, and the two weighting coefficients balance it against the term based on the similarity score. Finally, a cost matrix is constructed from these matching costs, and the Hungarian algorithm is used to find the optimal matching. Successfully matched regions are treated as the positions of the same hail cell at different times, thus forming or continuing a trajectory.
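A small sketch of the cost construction and optimal assignment follows. The cost form `alpha * d_norm + beta * (1 - similarity)`, the weights, and the 50 km normalisation distance are assumptions, since the formula is not legible in this text. For self-containment the assignment is found by exhaustive search, which returns the same minimum-cost matching as the Hungarian algorithm on small matrices; in practice `scipy.optimize.linear_sum_assignment` is the usual choice.

```python
from itertools import permutations


def match_cost(dist_km, similarity, max_dist_km=50.0, alpha=0.7, beta=0.3):
    """Assumed cost: weighted normalised distance plus weighted dissimilarity."""
    d_norm = min(dist_km / max_dist_km, 1.0)
    return alpha * d_norm + beta * (1.0 - similarity)


def best_assignment(cost):
    """cost[i][j]: cost of matching previous-frame region i to next-frame
    region j. Returns ([(i, j), ...], total) with minimum total cost.
    Exhaustive search; requires len(cost) <= len(cost[0])."""
    n_rows, n_cols = len(cost), len(cost[0])
    best, best_total = None, float("inf")
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if total < best_total:
            best, best_total = cols, total
    return list(enumerate(best)), best_total
```

A greedy nearest-first match can lock in a cheap pair that forces an expensive one later; minimising the total cost, as the description specifies, avoids that. A gating threshold that rejects pairs whose cost is too high would be a natural addition, though the text does not state one.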

[0032] S03, based on the spatiotemporal motion trajectory, calculate the motion parameters of the hail cell, and extrapolate the predicted hail location of the target time frame from the motion parameters and time information.

[0033] Specifically, the motion parameters include at least the moving speed and the moving direction. After the motion parameters of the hail cell are calculated, the following steps are performed: comparing the calculated moving speed with a preset maximum reasonable speed value and, if the moving speed exceeds that value, correcting it to the maximum reasonable speed value; and applying a moving average filter to the calculated motion direction angle sequence to smooth out abnormal abrupt changes in angle between frames.

[0034] In practical implementation, the formula for calculating the movement speed of a hail cell is as follows:

[0035] In the formula, t represents the time interval between two adjacent frames, s represents the distance traveled by the hail cell between the two adjacent frames, and the remaining symbol is a preset threshold for the maximum reasonable movement speed of a single hail cell. The movement speed of hail cells is governed by atmospheric circulation, and the maximum speed seen in actual observations does not exceed 80 km/h. Therefore, this threshold is set... It is a physically reasonable upper limit, and the extrapolation position formula in step S03 depends directly on the velocity. This speed constraint can keep the extrapolation error to at most ... every 6 minutes, which meets the error tolerance for short-term hail forecasting (0-30 minutes). If the unconstrained velocity formula is used directly... In extreme cases, the extrapolation error exceeds 50 km, and label completion becomes completely ineffective.
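The speed constraint amounts to dividing distance by time and capping the result. The sketch below assumes the cap equals the 80 km/h observed maximum quoted above; the threshold value itself is elided in this text, and the function names are hypothetical.

```python
def clamped_speed_kmh(distance_km, interval_min, v_max_kmh=80.0):
    """Speed from distance s and frame interval t, capped at v_max."""
    speed = distance_km * 60.0 / interval_min
    return min(speed, v_max_kmh)


def max_step_km(interval_min, v_max_kmh=80.0):
    """Largest displacement the cap allows in one frame interval."""
    return v_max_kmh * interval_min / 60.0
```

Under that assumed cap, a 6-minute frame interval bounds the per-frame displacement at 80 × 6 / 60 = 8 km, so a spurious 20 km jump between frames (200 km/h) is cut back to 80 km/h.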

[0036] The formula for calculating the direction angle of movement is:

[0037] The two-argument arctangent (atan2) function used here can output the entire angular range, which matches the meteorological direction convention in which true north is 0 and angles increase clockwise. The ordinary arctangent function can output only part of that range, so it requires additional angle conversion and can easily introduce directional ambiguity. The latitude correction term in the formula accounts for the decrease in longitude spacing with increasing latitude, and the angle output is used directly as the core parameter for the latitude and longitude increment calculations in subsequent steps without additional unit conversion, ensuring a seamless transition from direction calculation to position extrapolation.

After the initial motion direction angle sequence of a hail cell is calculated, noise inherent in the radar data and minor errors introduced when computing the geographic center and distance from discrete pixel coordinates may cause the instantaneous direction angle to exhibit non-physical, high-frequency jumps between frames. Used directly for position extrapolation, such jumps would make the extrapolated path unstable and significantly reduce the accuracy and reliability of the predicted position. To improve the physical plausibility and smoothness of the motion parameters, this embodiment adds a smoothing step after the initial direction angle sequence is obtained. Specifically, methods such as moving average filtering are applied to the sequence to filter out abnormal short-term fluctuations while retaining the low-frequency components that reflect the true motion trend of the hail cell. The smoothed direction angle is used for subsequent position extrapolation. This processing improves the smoothness and continuity of the extrapolated trajectory and suppresses the divergence of predicted positions caused by data noise, providing a more accurate and reliable predicted position for the subsequent label matching steps.
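A hedged sketch of the direction calculation and smoothing is given below. The argument order `atan2(east, north)` is what yields a bearing of 0 at true north increasing clockwise; the mean-latitude cosine factor, the trailing window of 3 frames, and the averaging of unit vectors (so that 350 and 10 degrees average to north rather than south) are assumptions, as the text only says "moving average filtering". Function names are hypothetical.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Direction of motion in degrees: 0 = true north, increasing clockwise."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    east = (lon2 - lon1) * math.cos(mean_lat)  # latitude-corrected eastward part
    north = lat2 - lat1
    return math.degrees(math.atan2(east, north)) % 360.0


def smooth_bearings(angles_deg, window=3):
    """Trailing moving average computed on unit vectors to handle wraparound."""
    out = []
    for i in range(len(angles_deg)):
        chunk = angles_deg[max(0, i - window + 1): i + 1]
        s = sum(math.sin(math.radians(a)) for a in chunk)
        c = sum(math.cos(math.radians(a)) for a in chunk)
        out.append(math.degrees(math.atan2(s, c)) % 360.0)
    return out
```

A plain arithmetic mean of 350 and 10 degrees would give 180, pointing the cell backwards, which is why the average is taken over sine and cosine components here.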

[0038] Furthermore, taking the last known position in the spatiotemporal motion trajectory as the starting point, the extrapolated position is calculated using a uniform linear motion model based on the corrected movement speed, the corrected movement direction angle, and the time interval from the last known time frame to the target time frame. The movement direction angle used to calculate the extrapolated position is the sum of the direction angle after moving average filtering and the direction angle correction amount. The direction angle correction amount is determined based on the direction angle change trend of the spatiotemporal motion trajectory within a preset number of historical frames.

[0039] In practice, the hail center position in future target frames is extrapolated from the last known position of the trajectory and its motion parameters. Assume the reference frame is ... and the target frame is offset from it by a step size of ...

[0040] The total displacement distance is:

[0041] The latitude increment is:

[0042] in, The direction angle used for extrapolation is 111.0, which is the number of kilometers corresponding to 1° of latitude near the equator. The distance in the latitudinal direction is approximately constant.

[0043] The longitude increment is:

[0044] in, The latitude of the reference frame is used to correct meridian convergence.

[0045] Final extrapolation position:

[0046] For each trajectory, the extrapolated point of the target frame is obtained as $(\lambda_k, \varphi_k)$, where $\lambda_k$ is the longitude of the hail center in the target frame obtained by extrapolation, $\lambda_0$ is the longitude of the hail center in the reference frame, which is the starting point of the extrapolation, and $\varphi_k$ is the latitude of the hail center in the target frame obtained by extrapolation. Furthermore, the direction angle used for extrapolation is $\theta = \theta_s + \Delta\theta$, where $\theta$ is the corrected movement direction angle, $\theta_s$ is the direction angle after moving average filtering, and $\Delta\theta$ is the direction angle correction amount, namely the average change in direction angle per frame calculated from the historical trend of the trajectory's direction angle.
This invention uses a latitude and longitude extrapolation formula based on the assumption of uniform linear motion, rather than complex dynamic models, nonlinear extrapolation, or machine learning extrapolation models, for the following reasons. First, the short-term extrapolation window for hail spans only a few frame intervals on the order of 6 minutes each, and the motion of a hail cell can be approximated as uniform linear motion over such a short time. This assumption is consistent with the physical behavior of severe convective weather, and it preserves extrapolation accuracy while avoiding unnecessary parameters. Second, the latitude-cosine correction term introduced in the longitude increment is consistent with the geographic correction used in the earlier spherical distance and direction angle calculations, which keeps the whole process geospatially consistent from trajectory tracking to position extrapolation and avoids cumulative errors caused by inconsistent coordinate handling. Furthermore, the formula is computationally efficient and highly interpretable, relying only on three existing trajectory parameters (speed, direction, and reference position) without requiring additional meteorological field data.
This fits the technical approach of this invention, which performs label completion based on radar CR data. High-precision dynamic extrapolation or machine learning extrapolation would increase model complexity and computational cost and would require additional input data, which would not meet this method's design goals of being lightweight, robust, and ready for engineering implementation. Therefore, this invention chooses the uniform linear extrapolation formula as the solution that best balances accuracy, efficiency, and consistency with the overall technical approach.
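The uniform linear extrapolation can be sketched as follows. This is an illustration under stated assumptions rather than the patent's exact formula: the function name, the units (speed in km/h, time interval in hours), and the optional speed cap argument are hypothetical. The 111.0 km per degree constant and the cosine-of-reference-latitude correction come from the text, and the speed cap mirrors the maximum reasonable speed value mentioned in claim 4.

```python
import math

KM_PER_DEG_LAT = 111.0  # kilometers per 1 degree of latitude, as stated in the text

def extrapolate(lon0, lat0, speed_kmh, theta_deg, dt_hours, max_speed_kmh=None):
    """Extrapolate a hail-cell center from (lon0, lat0) under uniform linear motion.

    theta_deg is the direction angle (0 = true north, clockwise).
    Units are assumptions: km/h for speed, hours for the time interval.
    """
    if max_speed_kmh is not None:
        speed_kmh = min(speed_kmh, max_speed_kmh)  # cap implausible speeds
    dist_km = speed_kmh * dt_hours                 # total displacement
    theta = math.radians(theta_deg)
    d_lat = dist_km * math.cos(theta) / KM_PER_DEG_LAT
    # Longitude degrees shrink with latitude, hence the cos(lat0) correction
    d_lon = dist_km * math.sin(theta) / (KM_PER_DEG_LAT * math.cos(math.radians(lat0)))
    return lon0 + d_lon, lat0 + d_lat
```

For example, a cell at 28°N moving due north at 111 km/h for one hour is displaced by one degree of latitude, while its longitude is unchanged.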

[0047] S04. Obtain the unlabeled candidate locations of the target time frame from the original label dataset, match the candidate locations against the predicted hail locations by calculating their spatial relationship, and update the labels of the successfully matched candidate locations to hail labels to output the updated label dataset.

[0048] Specifically, the minimum distance from each unlabeled candidate location to all predicted hail locations is calculated and sorted sequentially. Starting from the initial search radius, the search radius is gradually increased. In each round of search, candidate locations with a minimum distance less than or equal to the current search radius are added to the pending set. A completion confidence score is calculated for each candidate location in the pending set. The completion confidence score is calculated by weighted summation based on the minimum distance of the location, the number of consecutive frames of the trajectory corresponding to the matched predicted hail location, and the historical maximum reflectivity intensity of the feature region associated with the trajectory. Based on the completion confidence score, the candidate location to be finally completed is selected from the pending set.

[0049] In practical implementation, the minimum distance can be used as the ranking criterion for candidate points, rather than the average distance, maximum distance, density weight, or random ranking, for the following reasons. First, this distance directly reflects the spatial proximity between the point to be completed and the extrapolated hail center, and conforms to the physical prior that a point closer to the hail center is more likely to belong to a hail area, so the ranking result has clear meteorological significance. Second, the minimum distance directly reuses the result of the spherical distance formula described earlier, without redefining the distance metric, which keeps the distance definition consistent throughout the process from trajectory extrapolation to label matching. Third, this ranking provides an ordered candidate list for the subsequent radius-increasing matching strategy, so that the system expands the search range from near to far, matches the most reliable points first, satisfies the constraint on the maximum number of labels per frame, and reduces the false matching rate. Other ranking methods would violate the principle of spatial proximity priority and could select distant points first, increasing the probability of false label completion. Minimum distance ranking is therefore the most reasonable and stable ranking criterion in the label matching strategy of this invention. To further improve matching accuracy, features such as distance, trajectory reliability, and intensity can additionally be introduced on top of the minimum distance to quantitatively evaluate the credibility of each completion and guide the matching decision, making the completion process more intelligent and the results more interpretable.

[0050] Furthermore, within the current search radius, if the number of candidate positions in the undetermined set exceeds the maximum number of completions per frame constraint, then candidate positions with higher completion confidence scores are selected for completion.

[0051] In practical implementation, let the set of all unlabeled candidate points in the target frame be $C$ and the set of all predicted hail locations be $P$. For each candidate point $c \in C$, calculate the minimum distance from $c$ to all points in $P$: $d_{\min}(c) = \min_{p \in P} \operatorname{dist}(c, p)$.
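The minimum-distance ranking can be sketched as below. The spherical distance formula referred to in the text is not reproduced in this section, so the haversine formula and a mean Earth radius of 6371 km are used here as stand-in assumptions; the function names and the tuple layout of the result are likewise hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_distances(candidates, predictions):
    """Return [(d_min_km, candidate_index, nearest_prediction_index)], nearest first."""
    if not predictions:
        return []  # nothing to match against
    out = []
    for ci, (clon, clat) in enumerate(candidates):
        d, pi = min((haversine_km(clon, clat, plon, plat), pi)
                    for pi, (plon, plat) in enumerate(predictions))
        out.append((d, ci, pi))
    return sorted(out)
```

Keeping the index of the nearest predicted location alongside each distance lets a later scoring step look up the trajectory that produced that prediction.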

[0052] During the search and matching process, a completion confidence score is calculated for each candidate point. Starting from the initial search radius, the radius is gradually increased. Within each radius, candidate points are sorted from highest to lowest confidence score, with higher-scoring points matched first, until the maximum number of completions per frame is reached or the score falls below a threshold. The formula for calculating the completion confidence score is:

[0053] where $S_d$ is the distance confidence component, based on the distance from the candidate point to the predicted point; $R_{\max}$ is the maximum search radius; and $d_{\min}$ is the minimum distance from the candidate point to the nearest predicted point.

[0054]

[0055] where $S_t$ is the trajectory duration confidence component, based on the number of consecutive frames of the matched trajectory, and $N$ is the duration of the trajectory to which the matched predicted point belongs, i.e. the number of consecutive frames the trajectory has lasted up to the present.

[0056]

[0057] where $S_r$ is the historical reflectivity confidence component, based on the historical maximum dBZ of the matched trajectory, and $Z_{\max}$ is the historical maximum reflectivity of the feature region associated with the trajectory, reflecting the historical peak intensity of the hail cell.

[0058] $S = w_1 S_d + w_2 S_t + w_3 S_r$

[0059] where $S$ is the comprehensive completion confidence score and $w_1$, $w_2$, $w_3$ are the corresponding weighting coefficients. The components $S_d$, $S_t$, and $S_r$ each take values in [0, 1], and $S$ takes values in [0, 3], with a higher score indicating that the candidate point deserves higher priority for completion.
In summary, this invention provides a hail label completion and matching method based on DBSCAN clustering. The method takes radar reflectivity data from multiple time frames, geographic coordinate information, and the original label dataset as its basis. By mining the spatiotemporal motion patterns of hail cells, it performs feature region extraction, trajectory construction, target location extrapolation, and label matching completion. The entire completion process runs autonomously on historical radar data and existing labels, without relying on any additional supervised prediction model, which reduces computational overhead and achieves low-cost, lightweight hail label completion suited to operational scenarios without prediction samples. At the same time, by constructing the spatiotemporal motion trajectory of each hail cell, extrapolating the hail position in the target frame, and then completing the labels of unlabeled candidate positions through spatial relationship matching, the method specifically addresses missing hail labels caused by radar coverage blind spots, gaps in data acquisition time, and manual annotation omissions in operational practice. It thereby automates the completion of missing hail labels in the absence of prediction samples and improves the completeness and data quality of the hail label dataset. Therefore, this invention solves the problem that the prior art lacks a low-cost and efficient hail label completion and matching method based on DBSCAN clustering that does not rely on additional prediction models.
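The confidence scoring and radius-increasing matching of paragraphs [0048] to [0059] can be sketched as follows. Only the weighted-sum structure, the [0, 1] range of each component, and the matching order come from the text. The normalizations inside `confidence` (linear distance decay, a 10-frame reference duration, a 45 to 65 dBZ intensity range), the default weights of 1.0, the dictionary keys, and all function names are hypothetical choices made for illustration.

```python
def clip01(x):
    """Clamp a value to the [0, 1] range."""
    return max(0.0, min(1.0, x))

def confidence(d_min, n_frames, max_dbz, r_max,
               weights=(1.0, 1.0, 1.0), n_ref=10, dbz_lo=45.0, dbz_hi=65.0):
    """Weighted sum of distance, duration and intensity components (each in [0, 1])."""
    s_d = clip01(1.0 - d_min / r_max)                     # assumed: linear decay with distance
    s_t = clip01(n_frames / n_ref)                        # assumed: saturates at n_ref frames
    s_r = clip01((max_dbz - dbz_lo) / (dbz_hi - dbz_lo))  # assumed: linear in dBZ
    w_d, w_t, w_r = weights
    return w_d * s_d + w_t * s_t + w_r * s_r

def complete_labels(cands, r_init, r_step, r_max, max_per_frame, min_score=0.0):
    """Select candidate ids by growing the search radius and ranking by confidence.

    cands: list of dicts with keys "id", "d_min", "n_frames", "max_dbz".
    """
    selected = []
    radius = r_init
    while radius <= r_max and len(selected) < max_per_frame:
        pending = [c for c in cands
                   if c["d_min"] <= radius and c["id"] not in selected]
        scored = sorted(
            ((confidence(c["d_min"], c["n_frames"], c["max_dbz"], r_max), c["id"])
             for c in pending),
            reverse=True,  # highest-scoring candidates are matched first
        )
        for score, cid in scored:
            if len(selected) >= max_per_frame or score < min_score:
                break
            selected.append(cid)
        radius += r_step
    return selected
```

Because the radius grows in rounds, a nearby low-scoring candidate can still be completed before a distant high-scoring one, which preserves the near-to-far priority described above.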

[0060] Embodiment 2. Referring to Figure 2, which is a structural block diagram of a hail label completion and matching system based on DBSCAN clustering proposed in the second embodiment of the present invention, the hail label completion and matching system 200 based on DBSCAN clustering includes a data acquisition module 21, a data processing module 22, a prediction module 23, and a labeling module 24, wherein: the data acquisition module 21 is used to acquire radar reflectivity data containing multiple time frames, the corresponding geographic coordinate information, and the original label dataset corresponding to the sample to be completed; the data processing module 22 is used to extract, by clustering, the hail feature regions of each time frame based on the radar reflectivity data of the multiple time frames and the corresponding geographic coordinate information, and to associate and match the feature regions of different time frames in order to construct the spatiotemporal movement trajectory of each hail cell; the prediction module 23 is used to calculate the motion parameters of the hail cell based on the spatiotemporal movement trajectory, and to extrapolate the predicted hail location of the target time frame based on the motion parameters and time information; and the labeling module 24 is used to obtain the unlabeled candidate locations of the target time frame from the original label dataset, match the candidate locations against the predicted hail locations by calculating their spatial relationship, and update the labels of the successfully matched candidate locations to hail labels to output the updated label dataset.
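The division into four modules can be illustrated with a small composition sketch. The class name, field names, and call signatures are hypothetical; the sketch only shows how the outputs of modules 21 to 24 might be chained, and each stage is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class HailLabelCompletionSystem:
    """Chains four caller-supplied stages in the order described for system 200."""
    acquire: Callable[[], Any]          # data acquisition module 21
    build_tracks: Callable[[Any], Any]  # data processing module 22
    predict: Callable[[Any], Any]       # prediction module 23
    label: Callable[[Any, Any], Any]    # labeling module 24

    def run(self):
        data = self.acquire()             # radar frames, coordinates, original labels
        tracks = self.build_tracks(data)  # clustering and cross-frame association
        predicted = self.predict(tracks)  # motion parameters and extrapolation
        return self.label(data, predicted)  # updated label dataset
```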

[0061] Embodiment 3. In another aspect, the present invention also proposes an electronic device. Referring to Figure 3, which shows an electronic device according to the third embodiment of the present invention, the electronic device includes a memory 20, a processor 10, and a computer program 30 stored in the memory and executable on the processor. When the processor 10 executes the computer program 30, it implements the hail label completion and matching method based on DBSCAN clustering as described above.

[0062] In some embodiments, the processor 10 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip, used to run program code stored in memory 20 or process data, such as executing access restriction programs.

[0063] The memory 20 includes at least one type of readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 20 can be an internal storage unit of an electronic device, such as the hard disk of the electronic device. In other embodiments, the memory 20 can also be an external storage device of the electronic device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc. Furthermore, the memory 20 can include both internal and external storage units of the electronic device. The memory 20 can be used not only to store application software and various types of data of the electronic device, but also to temporarily store data that has been output or will be output.

[0064] It should be noted that the structure shown in Figure 3 does not constitute a limitation on the electronic device. In other embodiments, the electronic device may include fewer or more components than shown, or combine certain components, or have different component arrangements.

[0065] This invention also proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the hail label completion and matching method based on DBSCAN clustering as described above.

[0066] Those skilled in the art will understand that the logic and/or steps represented in the flowcharts or otherwise described herein can, for example, be considered an ordered list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit programs for use by, or in conjunction with, an instruction execution system, apparatus, or device.

[0067] More specific examples of computer-readable media (a non-exhaustive list) include: an electrical connection (electronic device) having one or more wires, a portable computer disk (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). Furthermore, the computer-readable medium can even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in computer memory.

[0068] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0069] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0070] The above embodiments merely illustrate several implementation methods of the present invention, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent should be determined by the appended claims.

Claims

1. A hail label completion and matching method based on DBSCAN clustering, characterized in that the method comprises: obtaining radar reflectivity data containing multiple time frames, the corresponding geographic coordinate information, and the original label dataset corresponding to the sample to be completed; extracting, by clustering, the hail feature regions of each time frame based on the radar reflectivity data of the multiple time frames and the geographic coordinate information, and associating and matching the feature regions of different time frames to construct the spatiotemporal movement trajectory of each hail cell; calculating the motion parameters of the hail cell based on the spatiotemporal movement trajectory, and extrapolating the predicted hail location of the target time frame based on the motion parameters and time information; and obtaining the unlabeled candidate locations of the target time frame from the original label dataset, matching the candidate locations against the predicted hail locations by calculating their spatial relationship, and updating the labels of the successfully matched candidate locations to hail labels to output the updated label dataset.

2. The hail label completion and matching method based on DBSCAN clustering according to claim 1, characterized in that the step of extracting the hail feature regions of each time frame by clustering comprises: determining a reflectivity threshold based on the overall statistical distribution of the radar reflectivity data of the multiple time frames; dynamically determining the neighborhood radius parameter of the density clustering algorithm based on the spatial density of the pixels in the current frame whose reflectivity is higher than the reflectivity threshold; and clustering the current frame data with the DBSCAN algorithm, using the reflectivity threshold and the neighborhood radius parameter, to obtain the hail feature regions.

3. The hail label completion and matching method based on DBSCAN clustering according to claim 1, characterized in that the step of associating and matching the feature regions of different time frames comprises: forming a matching cost from the spherical distance between the center points of the feature regions and the morphological similarity calculated from the pixel distribution of the feature regions; and using the Hungarian algorithm to find the optimal set of matching pairs that minimizes the total matching cost.

4. The hail label completion and matching method based on DBSCAN clustering according to claim 1, characterized in that the motion parameters include at least a movement speed and a movement direction, and the step of calculating the motion parameters of the hail cell comprises: comparing the calculated movement speed with a preset maximum reasonable speed value and, if the movement speed exceeds the maximum reasonable speed value, correcting it to the maximum reasonable speed value; and applying moving average filtering to the calculated movement direction angle sequence to smooth out abrupt changes in angle between frames.

5. The hail label completion and matching method based on DBSCAN clustering according to claim 4, characterized in that the step of extrapolating the predicted hail location of the target time frame based on the motion parameters and time information comprises: taking the last known position in the spatiotemporal movement trajectory as the starting point, and calculating the extrapolated position using a uniform linear motion model based on the corrected movement speed, the corrected movement direction angle, and the time interval from the last known time frame to the target time frame; wherein the movement direction angle used to calculate the extrapolated position is the sum of the direction angle after moving average filtering and a direction angle correction amount, and the direction angle correction amount is determined based on the direction angle change trend of the spatiotemporal movement trajectory within a preset number of historical frames.

6. The hail label completion and matching method based on DBSCAN clustering according to claim 1, characterized in that the step of matching the candidate locations against the predicted hail locations by calculating their spatial relationship comprises: calculating the minimum distance from each unlabeled candidate location to all predicted hail locations, and sorting the candidate locations accordingly; starting from an initial search radius and gradually increasing the search radius, wherein in each round of search the candidate locations whose minimum distance is less than or equal to the current search radius are added to a pending set; calculating a completion confidence score for each candidate location in the pending set, wherein the completion confidence score is calculated by weighted summation based on the minimum distance of the location, the number of consecutive frames of the trajectory corresponding to the matched predicted hail location, and the historical maximum reflectivity intensity of the feature region associated with the trajectory; and selecting, based on the completion confidence score, the candidate locations to be finally completed from the pending set.

7. The hail label completion and matching method based on DBSCAN clustering according to claim 6, characterized in that the step of selecting the candidate locations to be finally completed from the pending set based on the completion confidence score comprises: within the current search radius, if the number of candidate locations in the pending set exceeds the maximum number of completions per frame, selecting the candidate locations with higher completion confidence scores for completion.

8. A hail label completion and matching system based on DBSCAN clustering, characterized in that the system is used to implement the hail label completion and matching method based on DBSCAN clustering according to any one of claims 1 to 7, and comprises: a data acquisition module, used to acquire radar reflectivity data containing multiple time frames, the corresponding geographic coordinate information, and the original label dataset corresponding to the sample to be completed; a data processing module, used to extract, by clustering, the hail feature regions of each time frame based on the radar reflectivity data of the multiple time frames and the corresponding geographic coordinate information, and to associate and match the feature regions of different time frames in order to construct the spatiotemporal movement trajectory of each hail cell; a prediction module, used to calculate the motion parameters of the hail cell based on the spatiotemporal movement trajectory, and to extrapolate the predicted hail location of the target time frame based on the motion parameters and time information; and a labeling module, used to obtain the unlabeled candidate locations of the target time frame from the original label dataset, match the candidate locations against the predicted hail locations by calculating their spatial relationship, and update the labels of the successfully matched candidate locations to hail labels to output the updated label dataset.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the steps of the hail label completion and matching method based on DBSCAN clustering according to any one of claims 1 to 7.

10. An electronic device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the hail label completion and matching method based on DBSCAN clustering according to any one of claims 1 to 7.