Meteorological risk early warning method and system based on multi-source data fusion
By using a multi-source data fusion method, spatial feature point clusters and a grid system are generated. Combined with a BP neural network model and a logistic regression model, the problem of local differences in geological disaster early warning in mountainous counties is solved, and more accurate meteorological risk warnings are achieved.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: FUJIAN METEOROLOGICAL SERVICE CENT
- Filing Date: 2026-03-05
- Publication Date: 2026-06-19
AI Technical Summary
Existing meteorological risk early warning methods for geological disasters in mountainous counties fail to fully consider the spatial differences in local geological environments and the mutual influence of dynamic changes in precipitation, resulting in insufficient accuracy of early warning results in local areas.
Multi-source heterogeneous data are acquired and subjected to spatial granulation and structural reorganization to generate spatial feature point clusters, from which a grid system is constructed. A BP neural network model and a logistic regression model are then used for data fusion: the geological disaster risk index and the precipitation critical probability are calculated and combined into a comprehensive meteorological risk warning probability.
The method improves the accuracy and timeliness of geological disaster early warning, reduces early warning deviations caused by local differences, and enhances the precision of early warning results.
Figure CN121787917B_ABST
Description
Technical Field
[0001] This invention relates to the field of data processing technology, and in particular to a meteorological risk early warning method and system based on multi-source data fusion.
Background Technology
[0002] In meteorological risk early warning for geological disasters in mountainous counties, current methods mostly combine regional precipitation monitoring data with static geological and geomorphological data for the county. In practice these methods still have room for improvement: they fail to fully consider the spatial differences in local geological environments and their interaction with dynamic precipitation changes, which reduces the accuracy of early warning results in certain areas. Take a mountainous county in southern China as an example. The county has complex terrain, including slopes formed by granite and gully areas where shale is distributed, and soil layer thickness, rock mass integrity, groundwater depth, and other geological conditions differ significantly between areas. Precipitation is also unevenly distributed: short-term heavy rainfall alternates with periods of continuous overcast and rainy weather, and precipitation intensity and duration vary across the county. Existing early warning methods largely rely on standardized geological references and fixed precipitation criteria, and so do not adequately account for the spatial heterogeneity of local geological conditions or its interaction with precipitation processes. For instance, during the same round of continuous rainfall, the soil of granite slopes is relatively permeable, so short-duration heavy rainfall infiltrates rapidly and may trigger deep landslides. Conversely, the soil in shale gully areas retains water better, making these areas more susceptible to surface mudslides under continuous moderate to heavy rain. As a result, warnings may not be timely enough for granite slopes while warnings for shale gully areas are overly stringent, so the warning results do not match the actual local risk conditions and may not meet the precise early warning needs of different areas within the county.
Summary of the Invention
[0003] The technical problem to be solved by the present invention is to provide a meteorological risk early warning method and system based on multi-source data fusion, which improves the accuracy and timeliness of geological disaster early warning.
[0004] To solve the above-mentioned technical problems, the technical solution of the present invention is as follows:
[0005] Firstly, a meteorological risk early warning method based on multi-source data fusion, the method comprising:
[0006] Step 1: Acquire multi-source heterogeneous data, including geological and geomorphological element data and precipitation monitoring data;
[0007] Step 2: Perform spatial granulation and structural reorganization on the multi-source heterogeneous data to generate spatial feature point clusters; generate a spatial base surface based on the spatial feature point clusters; define initial analysis elements on the base surface, and further subdivide the initial analysis elements according to the heterogeneity of the spatial feature point cluster distribution to construct a grid system; map each spatial feature point to the corresponding bottom-level grid, and calculate a correction weight based on the aggregation characteristics of the feature points in each bottom-level grid;
[0008] Step 3: Correct the geological and geomorphological element data using the correction weights to obtain the corrected geological environment parameter field; based on the corrected geological environment parameter field, use a BP neural network model to perform evaluation and calculation to obtain the geological disaster risk index;
[0009] Step 4: Based on precipitation monitoring data, calculate the daily comprehensive effective precipitation dataset, and obtain the precipitation critical probability based on the statistical correlation between the daily comprehensive effective precipitation dataset and geological hazards;
[0010] Step 5: Based on the logistic regression model, the geological disaster risk index and the critical probability of precipitation are fused to obtain the comprehensive meteorological risk warning probability;
[0011] Step 6: Based on the comprehensive meteorological risk warning probability, match the preset meteorological disaster risk classification threshold value, determine the warning level, and obtain the meteorological risk warning result.
[0012] Secondly, a meteorological risk early warning system based on multi-source data fusion includes:
[0013] The data acquisition module is used to acquire multi-source heterogeneous data, including geological and geomorphological element data and precipitation monitoring data;
[0014] The processing module is used to perform spatial granulation and structural reorganization on the multi-source heterogeneous data to generate spatial feature point clusters; generate a spatial base surface based on the spatial feature point clusters; define initial analysis elements on the base surface, and subdivide the initial analysis elements according to the heterogeneity of the spatial feature point cluster distribution to construct a grid system; and map each spatial feature point to the corresponding bottom-level grid and calculate a correction weight based on the aggregation characteristics of the feature points in each bottom-level grid;
[0015] The correction module is used to correct the geological and geomorphological element data using the correction weights to obtain the corrected geological environment parameter field, and, based on the corrected geological environment parameter field, to use a BP neural network model for evaluation and calculation to obtain the geological disaster risk index.
[0016] The calculation module is used to calculate the daily comprehensive effective precipitation dataset based on precipitation monitoring data, and to obtain the precipitation critical probability based on the statistical correlation between the daily comprehensive effective precipitation dataset and geological hazards.
[0017] The fusion module is used to fuse the geological disaster risk index and the critical probability of precipitation based on the logistic regression model to obtain the comprehensive meteorological risk warning probability.
[0018] The matching module is used to match the preset meteorological disaster risk classification threshold value based on the comprehensive meteorological risk warning probability, determine the warning level, and obtain the meteorological risk warning result.
[0019] Thirdly, a computing device includes:
[0020] One or more processors;
[0021] A storage device for storing one or more programs that, when executed by one or more processors, cause the one or more processors to implement the method.
[0022] The above-described solution of the present invention has at least the following beneficial effects:
[0023] The method performs standardized spatial granulation and structural reorganization on multi-source heterogeneous data, and clusters spatial feature points by proximity and attribute similarity to generate point clusters. Based on the distribution of these point clusters, it constructs a spatial base surface and grid system, subdividing the initial analysis elements level by level according to a heterogeneity index until they meet the uniformity standard. It also calculates correction weights from the aggregation characteristics of the feature points within each grid. This approach takes into account the subtle spatial differences in the local geological environment of mountainous counties and avoids the limitations of a unified reference standard, so that data processing more closely matches the actual topographic and geological distribution of the region and early warning deviations caused by local differences are reduced.
Attached Figure Description
[0024] Figure 1 is a schematic diagram of the meteorological risk early warning method based on multi-source data fusion provided in an embodiment of the present invention.
[0025] Figure 2 is a schematic diagram of the meteorological risk early warning system based on multi-source data fusion provided in an embodiment of the present invention.
Detailed Implementation
[0026] Exemplary embodiments of the present disclosure will now be described in more detail with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited to the embodiments set forth herein. Rather, embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
[0027] As shown in Figure 1, an embodiment of the present invention proposes a meteorological risk early warning method based on multi-source data fusion, the method comprising the following steps:
[0028] Step 1: Acquire multi-source heterogeneous data, including geological and geomorphological element data and precipitation monitoring data;
[0029] Step 2: Perform spatial granulation and structural reorganization on the multi-source heterogeneous data to generate spatial feature point clusters; generate a spatial base surface based on the spatial feature point clusters; define initial analysis elements on the base surface, and further subdivide the initial analysis elements according to the heterogeneity of the spatial feature point cluster distribution to construct a grid system; map each spatial feature point to the corresponding bottom-level grid, and calculate a correction weight based on the aggregation characteristics of the feature points in each bottom-level grid;
[0030] Step 3: Correct the geological and geomorphological element data using the correction weights to obtain the corrected geological environment parameter field; based on the corrected geological environment parameter field, use a BP neural network model to perform evaluation and calculation to obtain the geological disaster risk index;
[0031] Step 4: Based on precipitation monitoring data, calculate the daily comprehensive effective precipitation dataset, and obtain the precipitation critical probability based on the statistical correlation between the daily comprehensive effective precipitation dataset and geological hazards;
[0032] Step 5: Based on the logistic regression model, the geological disaster risk index and the critical probability of precipitation are fused to obtain the comprehensive meteorological risk warning probability;
[0033] Step 6: Based on the comprehensive meteorological risk warning probability, match the preset meteorological disaster risk classification threshold value, determine the warning level, and obtain the meteorological risk warning result.
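Steps 5 and 6 can be illustrated with a minimal sketch. It assumes the logistic regression takes the geological disaster risk index and the precipitation critical probability as its two inputs, both scaled to [0, 1]. The coefficients, the threshold values, and the level names are hypothetical placeholders; the source does not state them, and in practice they would be fitted or preset from historical disaster records.

```python
import math

# Hypothetical logistic regression coefficients (intercept, weight of the
# geological index, weight of the precipitation probability).
B0, B_GEO, B_RAIN = -4.0, 3.0, 3.5

# Hypothetical classification thresholds: (lower bound, level), highest first.
LEVEL_THRESHOLDS = [(0.8, "red"), (0.6, "orange"), (0.4, "yellow"), (0.2, "blue")]


def warning_probability(geo_index: float, rain_prob: float) -> float:
    """Fuse the two inputs with a logistic (sigmoid) model."""
    z = B0 + B_GEO * geo_index + B_RAIN * rain_prob
    return 1.0 / (1.0 + math.exp(-z))


def warning_level(probability: float) -> str:
    """Match the probability against the preset thresholds."""
    for lower_bound, level in LEVEL_THRESHOLDS:
        if probability >= lower_bound:
            return level
    return "none"


if __name__ == "__main__":
    p = warning_probability(geo_index=0.7, rain_prob=0.8)
    print(round(p, 3), warning_level(p))
```

Ordering the thresholds from highest to lowest lets the first match decide the level, so changing the classification only requires editing the table.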
[0034] In this embodiment of the invention, the method performs standardized spatial granulation and structural reorganization on multi-source heterogeneous data, clusters spatial feature points into point clusters by combining their proximity and attribute similarity, and constructs a spatial base surface and grid system based on the distribution of the point clusters. The initial analysis elements are subdivided level by level according to a heterogeneity index until the uniformity standard is met. At the same time, correction weights are calculated from the aggregation characteristics of the feature points within each grid. The method takes into account the subtle spatial differences in the local geological environment of mountainous counties and avoids the limitations of a unified reference standard, so that data processing better matches the actual topographic and geological distribution of the region, which helps to reduce the early warning deviation caused by local differences.
[0035] In a preferred embodiment of the present invention, step 1, acquiring multi-source heterogeneous data including geological and geomorphological element data and precipitation monitoring data, specifically includes the following. Comprehensive data collection is carried out by deploying geological exploration equipment, fixed precipitation monitoring stations, and mobile precipitation monitoring equipment within the study area. The collection scope covers different terrain areas such as granite slopes, shale gullies, and gentle slopes, so that the collected data accurately reflect the spatial heterogeneity of geological conditions and precipitation distribution within the area and incomplete data coverage does not cause deviations in subsequent processing. Geological and geomorphological element data are collected by on-site survey and data verification: professional geological survey personnel carrying survey equipment such as ground-penetrating radar, thickness gauges, and slope meters survey the key areas within the study area point by point, following a preset survey point distribution plan. At each survey point, data on soil thickness, rock integrity, groundwater depth, topographic slope, and lithological distribution are collected together. These are static data reflecting the basic characteristics of the geological environment of the area. During collection, positioning equipment records the latitude and longitude coordinates of each survey point so that each set of geological and geomorphological data corresponds one-to-one with a specific spatial location, preventing data mismatch. The focus of data collection differs between terrain areas. For granite slopes, the focus is on auxiliary data related to soil permeability, to accurately assess the risk of landslides caused by precipitation infiltration. For shale gullies, the focus is on auxiliary data related to soil water retention, to assess the risk of debris flows induced by continuous precipitation.
[0036] After all survey data are collected, professionals carry out preliminary processing, verifying the data at each survey point one by one. Abnormal data caused by equipment malfunction, human error, or external environmental interference are eliminated, including outliers such as soil thickness outside the normal range for the area and contradictory rock mass integrity data. After processing and verification, a complete and accurate dataset of geological and geomorphological elements is formed. Precipitation monitoring data are collected in a dual mode: fixed stations supplemented by mobile equipment. Fixed precipitation monitoring stations, pre-deployed within the study area, collect real-time data on precipitation amount, duration, and intensity at each station once per hour to capture dynamic changes in precipitation. For remote areas and areas with complex terrain that are not covered by fixed precipitation monitoring stations, mobile precipitation monitoring equipment is used for supplementary collection. The mobile equipment collects data along a preset route at preset time intervals so that precipitation data cover the entire study area without blind spots. During collection, the latitude and longitude coordinates of each monitoring station and the time of each data collection are recorded. The real-time data are integrated with historical multi-day data to form a complete real-time and historical multi-day raw precipitation sequence. The collected precipitation data also undergo preliminary screening to remove abnormal data outside the normal precipitation range of the area, such as short-term precipitation far exceeding the historical extreme value of the area or data with obvious logical contradictions in precipitation duration. Finally, the organized and verified geological and geomorphological element dataset and precipitation monitoring dataset are integrated to form the multi-source heterogeneous data set.
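The preliminary screening of hourly precipitation records might look like the following sketch. The record field names, the `tolerance` factor used to interpret "far exceeding the historical extreme value", and the specific consistency checks are assumptions made for illustration; the source names only the categories of abnormal data.

```python
def screen_precipitation(records, historical_max_mm_per_hour, tolerance=1.5):
    """Keep only hourly records that pass simple plausibility checks.

    records: list of dicts with hypothetical keys "amount_mm" and "duration_min".
    tolerance: hypothetical factor; amounts above historical_max * tolerance
               are treated as far exceeding the historical extreme.
    """
    upper_limit = historical_max_mm_per_hour * tolerance
    kept = []
    for rec in records:
        amount = rec["amount_mm"]
        duration = rec["duration_min"]
        amount_in_range = 0.0 <= amount <= upper_limit
        # With hourly sampling, a duration cannot exceed 60 minutes.
        duration_in_range = 0.0 <= duration <= 60.0
        # Rain recorded with zero duration is a logical contradiction.
        consistent = not (amount > 0.0 and duration == 0.0)
        if amount_in_range and duration_in_range and consistent:
            kept.append(rec)
    return kept


if __name__ == "__main__":
    sample = [
        {"amount_mm": 12.0, "duration_min": 40.0},
        {"amount_mm": 400.0, "duration_min": 60.0},
    ]
    print(screen_precipitation(sample, historical_max_mm_per_hour=120.0))
```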
[0037] This embodiment, through a detailed multi-source heterogeneous data acquisition process, adopts a combination of field surveys and dual monitoring to ensure that the collected data can comprehensively cover different areas of the study area and accurately capture the geological and precipitation characteristics of different areas such as granite slopes and shale gullies.
[0038] In a preferred embodiment of the present invention, step 2, performing spatial granulation and structural reorganization on the multi-source heterogeneous data to generate spatial feature point clusters; generating a spatial base surface based on the spatial feature point clusters; defining initial analysis elements on the base surface and further subdividing them according to the heterogeneity of the spatial feature point cluster distribution to construct a grid system; and mapping each spatial feature point to its corresponding bottom-level grid and calculating correction weights based on the aggregation characteristics of the feature points within each bottom-level grid, may include:
[0039] Step 201: Perform spatial granulation on the multi-source heterogeneous data, transforming the geological and geomorphological element data and the precipitation monitoring data into discrete spatial feature points to generate a set of spatial feature points. Then perform structural reorganization on this set, clustering according to the spatial proximity and attribute similarity between spatial feature points to generate spatial feature point clusters. Specifically, spatial granulation is first applied to the processed and verified multi-source heterogeneous data. The core of spatial granulation is to bind the data tightly to spatial location, transforming it into discrete spatial feature points. The spatial location of each survey point in the geological and geomorphological element data, i.e., its latitude and longitude coordinates, is combined with the geological attribute values for that point, and each survey point is transformed into a discrete geological spatial feature point. Each geological spatial feature point includes the latitude and longitude coordinates, soil layer thickness, rock mass integrity, groundwater depth, topographic slope, lithological distribution, and corresponding auxiliary attribute data, so that the information of each feature point is complete and traceable. Similarly, the latitude and longitude coordinates of each monitoring station in the precipitation monitoring data are combined with the precipitation attribute values for that station, and each monitoring station is transformed into a discrete precipitation spatial feature point. Each precipitation spatial feature point includes the station's latitude and longitude coordinates, precipitation amount, precipitation duration, precipitation intensity, and data acquisition time, so that the precipitation feature points reflect both the spatial location and the dynamic characteristics of precipitation.
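One way to represent the two kinds of spatial feature points is sketched below. The class names, field names, units, and input dictionary keys are hypothetical; only the list of attributes carried by each point comes from the description above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GeoFeaturePoint:
    """A survey point bound to its location (all field names are illustrative)."""
    lat: float
    lon: float
    soil_thickness_m: float
    rock_integrity: float
    groundwater_depth_m: float
    slope_deg: float
    lithology: str
    aux: tuple = ()  # auxiliary attributes as (name, value) pairs


@dataclass(frozen=True)
class RainFeaturePoint:
    """A monitoring station observation bound to its location and time."""
    lat: float
    lon: float
    amount_mm: float
    duration_h: float
    intensity_mm_per_h: float
    observed_at: datetime


def granulate_survey(row: dict) -> GeoFeaturePoint:
    """Turn one verified survey record into a geological feature point."""
    return GeoFeaturePoint(
        lat=row["lat"],
        lon=row["lon"],
        soil_thickness_m=row["soil_thickness_m"],
        rock_integrity=row["rock_integrity"],
        groundwater_depth_m=row["groundwater_depth_m"],
        slope_deg=row["slope_deg"],
        lithology=row["lithology"],
        aux=tuple(sorted(row.get("aux", {}).items())),
    )


def granulate_station(row: dict) -> RainFeaturePoint:
    """Turn one verified station record into a precipitation feature point."""
    return RainFeaturePoint(
        lat=row["lat"],
        lon=row["lon"],
        amount_mm=row["amount_mm"],
        duration_h=row["duration_h"],
        intensity_mm_per_h=row["amount_mm"] / row["duration_h"] if row["duration_h"] else 0.0,
        observed_at=row["observed_at"],
    )
```

Keeping the two types separate mirrors the later rule that clustering is performed only among feature points of the same type.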
[0040] All transformed discrete geological and precipitation spatial feature points are compiled, and the information of each feature point is verified one by one. Feature points with missing information or incorrect locations are removed, producing a complete set of spatial feature points in which the type, spatial location, and attribute information of each feature point are clear. This set then undergoes structural reorganization. The core operation is to cluster spatial feature points based on spatial proximity and attribute similarity, generating spatial feature point clusters such that feature points within the same cluster have similar spatial locations and attribute characteristics. Spatial proximity is determined by calculating the straight-line distance between any two spatial feature points from their latitude and longitude coordinates using the geographic distance formula; the smaller the distance, the stronger the spatial proximity of the two points. Attribute similarity is determined by calculating the degree of difference between the corresponding attribute values of any two spatial feature points. For two geological feature points, the focus is on the difference in core attributes such as soil layer thickness, rock mass integrity, and groundwater depth; for two precipitation feature points, the focus is on the difference in core attributes such as precipitation amount and precipitation intensity. In both cases, the smaller the difference, the stronger the attribute similarity. For a geological feature point and a precipitation feature point, attribute similarity is not calculated because their attribute types differ; clustering is performed only among feature points of the same type.
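The two pairwise measures can be sketched as follows. The source refers only to "the geographic distance formula"; the haversine great-circle formula is used here as one common choice and is an assumption, as are the attribute names and the per-attribute normalisation scales.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def attribute_difference(attrs_a, attrs_b, scales):
    """Mean normalised absolute difference over the core attributes.

    attrs_a, attrs_b: dicts of core attribute values for two points of the
                      SAME type (geological or precipitation).
    scales: hypothetical per-attribute ranges used to make units comparable.
    A smaller result means stronger attribute similarity.
    """
    total = sum(abs(attrs_a[name] - attrs_b[name]) / scale for name, scale in scales.items())
    return total / len(scales)


if __name__ == "__main__":
    print(haversine_km(26.0, 118.0, 26.0, 118.1))
    print(attribute_difference(
        {"soil_thickness_m": 2.0, "rock_integrity": 0.8},
        {"soil_thickness_m": 1.0, "rock_integrity": 0.6},
        {"soil_thickness_m": 2.0, "rock_integrity": 0.4},
    ))
```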
[0041] The clustering is iterative. First, several spatial feature points are randomly selected as initial cluster centers. The number of initial cluster centers is predetermined based on the size of the study area and the number of feature points, so that the clustering results cover all regions with different characteristics. Next, the spatial proximity and attribute similarity between each spatial feature point and each initial cluster center are calculated. Combining the two indicators, each spatial feature point is assigned to the cluster whose center has the strongest spatial proximity and attribute similarity to it, so that each feature point is assigned to the cluster best suited to its own characteristics. After all feature points have completed their first assignment, the center of each cluster is recalculated: the average of the corresponding attribute values of all feature points within the cluster is combined with the average of their spatial locations to determine the new cluster center. The process of assigning feature points and recalculating cluster centers is then repeated, and the change in each cluster center is checked after each iteration. The iteration stops when the change in cluster centers is less than a preset minimum change threshold and the assignments of all spatial feature points no longer change, yielding multiple spatial feature point clusters. Within each cluster, spatial feature points are spatially close and share similar attributes. For example, geological feature points in granite slope areas, being similar in soil permeability and rock integrity, cluster into one or more feature point clusters. Similarly, geological feature points in shale gully areas, being similar in soil water retention and groundwater depth, cluster into another type of feature point cluster. Precipitation feature points in areas with abundant rainfall form their own clusters, while those in areas with scarce rainfall form another type of cluster.
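The iteration described above resembles k-means clustering with a combined distance. The sketch below follows that reading under several assumptions: points are given as tuples `(x, y, attr_1, ..., attr_n)` in planar coordinates with attributes already normalised, the two indicators are combined by a weighted sum with hypothetical weights `w_space` and `w_attr`, and `max_iter` is an added safety limit not mentioned in the source.

```python
import random


def combined_distance(point, center, w_space=0.5, w_attr=0.5):
    """Weighted sum of planar distance and summed attribute difference."""
    d_space = ((point[0] - center[0]) ** 2 + (point[1] - center[1]) ** 2) ** 0.5
    d_attr = sum(abs(a - b) for a, b in zip(point[2:], center[2:]))
    return w_space * d_space + w_attr * d_attr


def cluster_points(points, k, min_shift=1e-6, max_iter=100, seed=0):
    """Iteratively assign points to centers and recompute the centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(max_iter):
        # Assign every point to the center it is closest to.
        new_labels = [
            min(range(k), key=lambda j: combined_distance(p, centers[j]))
            for p in points
        ]
        # Recompute each center as the mean location and mean attributes.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        shift = max(
            max(abs(a - b) for a, b in zip(old, new))
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        # Stop when assignments are stable and the centers barely move.
        if new_labels == labels and shift < min_shift:
            break
        labels = new_labels
    return new_labels, centers


if __name__ == "__main__":
    pts = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (10.0, 10.0, 5.0), (10.1, 10.0, 5.0)]
    print(cluster_points(pts, k=2))
```

Geological and precipitation feature points would be clustered in separate calls, since the source clusters only points of the same type.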
[0042] Step 202: Based on the spatial feature point clusters, calculate and determine the outermost boundary point set containing all spatial feature point clusters. Connect the points of the boundary point set sequentially to obtain a convex polyhedron envelope surface, and use this envelope surface as the spatial base surface. On the spatial base surface, divide according to a preset fixed side length to generate initial analysis elements. Specifically, based on all spatial feature point clusters generated in step 201, first calculate and determine the outermost boundary point set containing all spatial feature point clusters, so that all feature point clusters are enclosed by this boundary point set. The process is as follows. For each spatial feature point cluster, calculate the cluster center, taking the average of the latitude and longitude coordinates of all feature points in the cluster as its spatial location. Then calculate the straight-line distance between each feature point in the cluster and the cluster center, and select the feature point that is farthest from the cluster center and located at the cluster edge as the outermost boundary point of the cluster. For each feature point cluster, one or more outermost boundary points are selected so that the edge outline of the cluster is completely delineated. The outermost boundary points of all spatial feature point clusters are pooled to obtain a preliminary set of boundary points. From this preliminary set, the boundary points that can enclose all spatial feature point clusters are then selected by determining whether each boundary point lies outside all feature point clusters. If no spatial feature point cluster lies outside a given boundary point, that point is retained as an outermost boundary point; if spatial feature point clusters still lie outside it, the point is removed. After selection is complete, the remaining boundary points together constitute the outermost boundary point set containing all spatial feature point clusters. The set is checked once more to confirm that all spatial feature point clusters lie within the area it encloses and that no feature point cluster exceeds this range.
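The result of this selection, a closed convex outline enclosing every feature point, can also be obtained with a standard convex hull algorithm. The sketch below uses Andrew's monotone chain on planar `(x, y)` coordinates; this is a substituted textbook technique offered for illustration, not the per-cluster farthest-point procedure described above, and it assumes coordinates have already been projected to a plane.

```python
def convex_hull(points):
    """Return the convex hull vertices in counter-clockwise order.

    points: iterable of (x, y) tuples. Collinear boundary points are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    sample = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
    print(convex_hull(sample))
```

Connecting the returned vertices in order, and the last back to the first, gives the closed outline that serves as the spatial base surface.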
[0043] The points of the outermost boundary point set are connected sequentially according to their spatial position. Adjacent boundary points are connected by straight lines, and the first and last boundary points are also connected by a straight line, forming a closed convex polyhedron envelope surface. This envelope surface covers the entire spatial range of the study area and conforms to its terrain contours, adapting to complex terrain such as granite slopes and shale gullies. It is therefore used as the spatial base surface, providing a stable spatial carrier for the subsequent division of analysis elements and construction of the grid system. On the spatial base surface, initial analysis elements are generated by uniform division according to a preset fixed side length. The preset fixed side length is predetermined based on the overall area of the study area and the actual early warning accuracy requirements. Specifically, a suitable fixed side length is calculated by combining the total area of the mountainous county with the early warning accuracy required for grassroots disaster prevention and mitigation. This ensures that the initial analysis elements completely cover the spatial base surface and that each has the same area. It also avoids a side length so large that local spatial heterogeneity cannot be captured later, or so small that computation becomes excessive. Division uses a uniform grid: starting from a vertex of the spatial base surface, the surface is divided sequentially along its length and width according to the preset fixed side length into multiple initial analysis elements of regular shape and identical size.
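A minimal sketch of the uniform division follows. It assumes the base surface is given as planar hull vertices and that the grid starts at the lower-left corner of the hull's bounding box, which is a simplification of "starting from a vertex of the spatial base surface". Each element is represented as a hypothetical `(x, y, side)` tuple giving its lower-left corner and side length.

```python
import math


def initial_elements(hull, side):
    """Cover the bounding box of the hull with equal square elements."""
    xs = [p[0] for p in hull]
    ys = [p[1] for p in hull]
    x0, y0 = min(xs), min(ys)
    # Round up so the elements cover the whole surface.
    nx = max(1, math.ceil((max(xs) - x0) / side))
    ny = max(1, math.ceil((max(ys) - y0) / side))
    return [
        (x0 + i * side, y0 + j * side, side)
        for j in range(ny)
        for i in range(nx)
    ]


if __name__ == "__main__":
    for element in initial_elements([(0, 0), (4, 0), (4, 3), (0, 3)], side=2):
        print(element)
```

Elements that fall entirely outside the hull could be discarded in a further step; that filtering is omitted here.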
[0044] Step 203: Based on the distribution density and attribute differences of the spatial feature point clusters within each initial analysis element, calculate the heterogeneity index of each initial analysis element. Initial analysis elements whose heterogeneity index is higher than a preset threshold are further divided into smaller analysis units until the distribution of spatial feature point clusters within all generated analysis units meets the preset uniformity standard, thereby constructing a grid system. The finest division unit of the grid system is defined as the bottom-level grid. Specifically, the heterogeneity index of each initial analysis element is calculated first. The heterogeneity index measures how uniformly the spatial feature point clusters are distributed within the initial analysis element and how consistent their attributes are. The higher the index, the stronger the spatial heterogeneity within the element and the greater the need for further subdivision. The index is obtained as a weighted sum of two quantities: the distribution density and the attribute difference of the spatial feature point clusters within the element. For the distribution density, the number of spatial feature point clusters within each initial analysis element is counted and divided by the area of the element. A higher distribution density indicates a denser distribution of feature point clusters and potentially stronger spatial heterogeneity. For the attribute difference, the average core attribute of all spatial feature point clusters within the initial analysis element is calculated. For geological feature point clusters, the core attributes are soil thickness and rock integrity; for precipitation feature point clusters, they are precipitation amount and precipitation intensity. The average core attribute is calculated for each type of feature point cluster, and the difference between each cluster's core attribute value and the corresponding average is then calculated. The absolute values of all differences are summed, and the sum is divided by the number of feature point clusters within the initial analysis element to obtain the average attribute difference. A higher average attribute difference indicates more significant attribute differences and stronger spatial heterogeneity within the initial analysis element.
[0045] The heterogeneity index is calculated by multiplying the distribution density by a preset density weighting coefficient, multiplying the average attribute difference by a preset difference weighting coefficient, and adding the two products. The sum is the heterogeneity index of the initial analysis element. The density weighting coefficient and the difference weighting coefficient are predetermined based on the historical geological disaster data of the mountainous county. Each takes a value from 0.4 to 0.6, and the two always sum to 1. In areas prone to geological disasters, the density weighting coefficient can be set to 0.5 to 0.6 to highlight the influence of the distribution density of feature point clusters on heterogeneity. In areas with fewer geological disasters, the difference weighting coefficient can be set to 0.5 to 0.6. The heterogeneity index of each initial analysis element is compared with a preset threshold. The preset threshold is determined based on the overall spatial heterogeneity of the study area and the early warning accuracy requirements, with a value range of 0.3 to 0.5. When the heterogeneity index is higher than this threshold, the initial analysis element has strong spatial heterogeneity and cannot meet the accuracy requirements of subsequent processing. It must be further divided into smaller analysis units, each half the size of the original initial analysis element. When the heterogeneity index is lower than or equal to this threshold, the distribution of feature point clusters in the initial analysis element is relatively uniform, and no further subdivision is required. After division, the heterogeneity index of each newly generated analysis unit is recalculated using the same method and compared with the preset threshold. 
Units that still need to be subdivided are processed again, and the calculation, comparison, and subdivision process is repeated until the heterogeneity index of every analysis unit is lower than or equal to the preset threshold. All generated analysis units together constitute a multi-level raster system, where the finest division unit is defined as the bottom raster.
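The index calculation and repeated subdivision of step 203 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each cluster carries a single pre-scaled attribute, density is taken as clusters per unit area, "half the size" is interpreted as halving the side length (a quadtree split), and a hypothetical `min_side` guard is added so the recursion always terminates. The names `Cluster`, `heterogeneity_index`, and `subdivide` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    x: float
    y: float
    attr: float  # one core attribute, assumed pre-scaled to a comparable range


def heterogeneity_index(clusters, area, w_density=0.5, w_diff=0.5):
    """Weighted sum of distribution density and mean absolute attribute difference."""
    if not clusters:
        return 0.0
    density = len(clusters) / area  # assumption: clusters per unit area
    mean_attr = sum(c.attr for c in clusters) / len(clusters)
    mean_abs_diff = sum(abs(c.attr - mean_attr) for c in clusters) / len(clusters)
    return w_density * density + w_diff * mean_abs_diff


def subdivide(x0, y0, side, clusters, threshold=0.4, min_side=1.0):
    """Return bottom-level cells as (x0, y0, side) tuples.

    A cell is split into four half-side cells while its index exceeds the
    threshold. Cells use half-open bounds, so a cluster lying exactly on the
    upper outer boundary is not counted. min_side is a hypothetical stop guard.
    """
    inside = [c for c in clusters
              if x0 <= c.x < x0 + side and y0 <= c.y < y0 + side]
    index = heterogeneity_index(inside, side * side)
    if index <= threshold or side / 2 < min_side:
        return [(x0, y0, side)]
    half = side / 2
    cells = []
    for dx in (0, half):
        for dy in (0, half):
            cells.extend(subdivide(x0 + dx, y0 + dy, half, inside,
                                   threshold, min_side))
    return cells


demo = [Cluster(0.5, 0.5, 0.1), Cluster(0.6, 0.6, 0.9), Cluster(3.5, 3.5, 0.5)]
cells = subdivide(0.0, 0.0, 4.0, demo, threshold=0.2, min_side=1.0)
```

In this toy input the corner containing two dissimilar clusters is refined down to the minimum side, while the sparse remainder of the surface stays coarse.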
[0046] Step 204: Map all spatial feature points to the raster system, establishing the affiliation between each spatial feature point and a bottom-level raster. Based on this affiliation, statistically analyze the attribute values of all spatial feature points affiliated with the same bottom-level raster to generate aggregated features, and process the aggregated features according to preset weight calculation rules to generate correction weights. Specifically, this includes: mapping all spatial feature points generated in step 201 to the raster system constructed in step 203, establishing a unique affiliation between each spatial feature point and a bottom-level raster. During mapping, first extract the latitude and longitude coordinates of each spatial feature point, as well as the maximum and minimum latitude and longitude coordinates of each bottom-level raster, and determine whether the coordinates of each spatial feature point fall within the range of a bottom-level raster. If a point falls within the range of one raster, it is affiliated with that raster; if it falls within the ranges of multiple rasters, it is affiliated with the bottom-level raster whose center is closest; if it does not fall within any raster range, it is affiliated with the nearest bottom-level raster and recorded for subsequent data verification. After mapping, check the number of feature points in each bottom-level raster and remove empty rasters with no affiliated feature points, so that the raster system remains valid.
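The three affiliation rules of step 204 can be illustrated with a short sketch. It assumes each bottom-level raster is an axis-aligned latitude/longitude box, measures closeness as planar distance to the box center in degrees, and treats "nearest raster" as "nearest raster center". These simplifications, and the name `assign_raster`, are assumptions made for the example.

```python
import math


def assign_raster(point, rasters):
    """Assign a (lon, lat) point to a bottom-level raster.

    rasters maps raster_id -> (min_lon, min_lat, max_lon, max_lat).
    Returns (raster_id, flagged); flagged is True when the point lay outside
    every raster and should be recorded for later data verification.
    """
    lon, lat = point

    def center_distance(raster_id):
        min_lon, min_lat, max_lon, max_lat = rasters[raster_id]
        return math.hypot(lon - (min_lon + max_lon) / 2,
                          lat - (min_lat + max_lat) / 2)

    containing = [rid for rid, (lo_x, lo_y, hi_x, hi_y) in rasters.items()
                  if lo_x <= lon <= hi_x and lo_y <= lat <= hi_y]
    if containing:
        # One match: that raster. Several matches: the closest center wins.
        return min(containing, key=center_distance), False
    # No match: fall back to the nearest raster center and flag the point.
    return min(rasters, key=center_distance), True


demo_rasters = {"A": (0.0, 0.0, 1.2, 1.0), "B": (1.0, 0.0, 2.0, 1.0)}
```

For example, `assign_raster((1.1, 0.5), demo_rasters)` lies in the overlap of both boxes and is resolved by center distance.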
[0047] For geological feature points, the sum, average, and fluctuation range of core geological attributes such as soil layer thickness and rock mass integrity are calculated. For precipitation feature points, the average and fluctuation range of core precipitation attributes such as precipitation amount and precipitation duration are calculated in the same way. These statistics are summarized as the aggregated features of the bottom-level raster and labeled with the corresponding raster identifier. Finally, based on the aggregated features, a correction weight is generated for each bottom-level raster using a preset weight calculation rule. The correction weight ranges from 0 to 2; a higher value indicates stronger spatial heterogeneity within the raster and a larger correction magnitude for subsequent data. The preset weight calculation rule assigns an attribute weight coefficient to the average value of each attribute in the aggregated features and a fluctuation weight coefficient to the fluctuation range of each attribute. The sum of all attribute weight coefficients is 0.5, and the sum of all fluctuation weight coefficients is 0.5. The attribute weight coefficients are determined according to the degree of influence of geological and precipitation attributes on geological hazards. The attribute weight coefficients for soil thickness and precipitation amount range from 0.12 to 0.15, those for rock mass integrity and precipitation intensity range from 0.08 to 0.1, and those for groundwater depth, topographic slope, and precipitation duration range from 0.05 to 0.07. The fluctuation weight coefficients are determined according to the regional spatial heterogeneity characteristics. For attributes with large fluctuation ranges, such as soil water retention and soil permeability, the fluctuation weight coefficients range from 0.1 to 0.12. 
For attributes with the next largest fluctuation ranges, such as rock mass integrity and precipitation intensity, the coefficients range from 0.07 to 0.09. The coefficients for the remaining attributes range from 0.04 to 0.06. During calculation, the average value of each attribute is multiplied by the corresponding attribute weight coefficient and the products are summed to obtain the weighted sum of the attribute averages; the fluctuation range of each attribute is multiplied by the corresponding fluctuation weight coefficient and the products are summed to obtain the weighted sum of the fluctuation ranges; the sum of the two is the correction weight of the bottom-level raster.
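The weighting rule described above can be written compactly. The symbols below are introduced here purely for illustration and do not appear in the original: for bottom-level raster $g$, let $\bar{x}_{g,i}$ be the average of attribute $i$, $r_{g,i}$ its fluctuation range, and $a_i$, $b_i$ the attribute and fluctuation weight coefficients.

```latex
w_g \;=\; \sum_{i} a_i\,\bar{x}_{g,i} \;+\; \sum_{i} b_i\,r_{g,i},
\qquad \sum_{i} a_i = 0.5, \qquad \sum_{i} b_i = 0.5 .
```

The text states that $w_g$ lies between 0 and 2. That presumably requires the attribute averages and fluctuation ranges to be expressed on comparable scales, but the scaling itself is not specified in this section.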
[0048] This embodiment transforms multi-source heterogeneous data into discrete spatial feature points and clusters them in an ordered manner. It then constructs an adaptive raster system from the feature point clusters, subdivides the analysis units according to local spatial heterogeneity, and generates correction weights that quantify the attribute differences between raster regions, thereby improving the precision of data processing.
[0049] In a preferred embodiment of the present invention, step 3 involves correcting the geological and geomorphological element data using corrected weights to obtain a corrected geological environment parameter field; based on the corrected geological environment parameter field, an evaluation calculation is performed using a BP neural network model to obtain a geological hazard index, which may include:
[0050] Step 301: Based on the correction weights, the attribute values of the spatial feature points corresponding to the geological and geomorphological element data in each bottom-level raster are weighted and adjusted to generate rasterized attribute baseline values. Specifically, this includes: for each bottom-level raster, according to the affiliation established in step 204, extracting the geological attribute values of all geological spatial feature points within that raster. The definitions and value ranges of the geological attribute parameters are as follows: Soil thickness refers to the vertical thickness of the soil layer at the exploration point corresponding to the geological feature point within the bottom-level raster. It is a static geological parameter with a value range of 0.5 meters to 5 meters: 0.5 meters to 2.5 meters on granite slopes, 1.0 meter to 3.5 meters in shale gullies, and 1.5 meters to 5 meters in flat areas. Rock mass integrity is represented by the rock mass integrity coefficient. It is a static geological parameter with a value range of 0.3 to 0.9; the closer the coefficient is to 0.9, the better the rock mass integrity. The coefficient ranges from 0.6 to 0.9 for granite areas and from 0.3 to 0.6 for shale areas. Groundwater depth refers to the vertical distance from the groundwater surface to the ground surface at the exploration point corresponding to the geological feature point within the bottom-level raster. It is a static geological parameter with a value range of 1 meter to 15 meters: 1 meter to 8 meters for gentle slopes, 3 meters to 10 meters for shale gully areas, and 5 meters to 15 meters for granite slopes. 
Topographic slope refers to the angle of inclination of the ground surface at the exploration point corresponding to the geological feature point within the bottom-level raster. It is a static geological parameter with a value range of 0 degrees to 60 degrees: 0 to 15 degrees for gentle slopes, 15 to 45 degrees for granite slopes, and 30 to 60 degrees for shale gullies. The lithological distribution quantification parameter refers to the quantified value set according to the lithological type corresponding to the geological feature points in the bottom-level raster. It is a static geological parameter. The quantified value is 1.0 to 1.2 for granite, 0.7 to 0.9 for shale, and 0.8 to 1.0 for other lithologies. During extraction, the attribute values of each geological feature point are checked one by one to ensure that the extraction is complete, without omissions or deviations.
[0051] Based on the correction weight of the bottom-level raster, each geological attribute value is weighted and adjusted. The correction weight is the core parameter generated in step 204 to quantify spatial heterogeneity within the bottom-level raster, ranging from 0 to 2; a higher value indicates stronger spatial heterogeneity within the raster and a greater subsequent data correction. The adjustment multiplies each geological attribute value of each geological feature point by the correction weight of the corresponding bottom-level raster to obtain the weighted adjustment value for that attribute. The weighted adjustment value is an intermediate parameter in the calculation process and has no fixed range. For the same geological attribute within the same bottom-level raster, the weighted adjustment values of all geological feature points within the raster are summed to obtain the total weighted adjustment of the geological attribute. The total weighted adjustment is then divided by the number of geological feature points corresponding to the attribute to obtain the rasterized attribute baseline value of the geological attribute within the raster. The rasterized attribute baseline value is the core output parameter after correction, and its value range is consistent with the original value range of the corresponding geological attribute, namely, soil layer thickness baseline value 0.5 to 5 meters, rock mass integrity baseline value 0.3 to 0.9, groundwater depth baseline value 1 to 15 meters, terrain slope baseline value 0 to 60 degrees, and lithology distribution quantification baseline value 0.7 to 1.2.
[0052] This method sequentially calculates the rasterized attribute baseline values for all geological attributes within the bottom-level raster. Each bottom-level raster corresponds to a complete set of rasterized attribute baseline values and is uniquely identified to ensure a one-to-one correspondence between baseline values and raster cells. Rasterized attribute baseline values correct for deviations in the original geological data caused by local spatial heterogeneity and closely reflect the actual geological conditions of different areas such as granite slopes and shale gullies, thus improving data accuracy. After calculation, all rasterized attribute baseline values for each bottom-level raster are verified, and outliers exceeding the normal range are removed to ensure data accuracy. Outliers are identified using two core parameters of the rasterized attribute baseline values of the corresponding geological attribute: the average value and the standard deviation. The average value is the arithmetic mean of all rasterized attribute baseline values of a given geological attribute within the bottom-level raster, and its range is consistent with the baseline value range of the corresponding geological attribute. The standard deviation reflects the dispersion of the baseline values of a given geological attribute within the bottom-level raster, and its range is 0.1 to 0.2 times the average value of the corresponding geological attribute, specifically determined based on the historical geological data of the region. A baseline value is treated as an outlier if it deviates from the average value of the corresponding geological attribute by more than 2 times the standard deviation.
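The baseline calculation of step 301 and the outlier screening that follows can be sketched together. This is a hedged illustration only: the text gives the standard deviation as 0.1 to 0.2 times the average, so the sketch takes it as a preset ratio (`sigma_ratio`, a hypothetical parameter name) instead of estimating it from the sample. The function names are likewise invented for the example.

```python
from statistics import mean


def baseline_value(values, correction_weight):
    """Mean of weight-adjusted attribute values for one attribute in one raster."""
    adjusted = [v * correction_weight for v in values]
    return sum(adjusted) / len(adjusted)


def drop_outliers(baselines, sigma_ratio=0.15, k=2.0):
    """Keep baseline values within k standard deviations of the average.

    baselines maps raster_id -> baseline value of one geological attribute.
    Assumption: the standard deviation is a preset fraction of the average
    (0.1 to 0.2 per the text), not computed from the data.
    """
    average = mean(baselines.values())
    sigma = sigma_ratio * average
    return {rid: v for rid, v in baselines.items()
            if abs(v - average) <= k * sigma}


soil_baselines = {"r1": 2.0, "r2": 2.2, "r3": 1.9, "r4": 3.9}
kept = drop_outliers(soil_baselines)
```

With these sample numbers the average is about 2.5 m and the tolerance is about 0.75 m, so only the 3.9 m value is removed.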
[0053] Step 302: Based on the rasterized attribute baseline values, an interpolation network is constructed to calculate the parameter values at any point within the study area from the rasterized attribute baseline values of adjacent rasters, generating a corrected geological environment parameter field. Specifically, this includes: based on the rasterized attribute baseline values of all bottom-level rasters obtained in step 301, constructing an interpolation network that transforms discrete data into continuous spatial data. During construction, the center coordinates of each bottom-level raster, i.e., the average of the maximum and minimum latitude and longitude coordinates of the raster, are first extracted as interpolation nodes. The rasterized attribute baseline value corresponding to each node is used as the node attribute data. All interpolation nodes are then connected to their neighbors in spatial order to form an interpolation network covering the entire study area. In areas with complex terrain, the density of interpolation node connections is increased to improve interpolation accuracy. After the interpolation network is constructed, the parameter values of any point within the study area are calculated. First, the latitude and longitude coordinates of the point to be calculated are extracted. Then, the 4 to 8 nearest interpolation nodes are selected in the interpolation network, with 8 selected for complex terrain and 4 for flat terrain. The straight-line distance between the point to be calculated and each neighboring node is calculated. The closer the node, the greater its weight. The weight of a node is obtained by dividing the sum of the distances between the point to be calculated and all neighboring nodes by the distance between the point and that node, and the weights are scaled so that the weights of all neighboring nodes sum to 1. 
The rasterized attribute baseline value of each neighboring node is multiplied by the corresponding weight, and the products are summed to obtain the geological attribute parameter value of the point to be calculated. All geological attribute parameter values are calculated in this way. Repeating the calculation for each point within the study area yields the complete set of geological attribute parameter values, which together constitute a corrected geological environment parameter field that continuously covers the entire study area.
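The distance-weighted interpolation of step 302 resembles inverse-distance weighting and can be sketched as follows. The ratio "sum of distances divided by the distance to one node" does not sum to 1 by itself, so this sketch adds an explicit normalization step; that step, the planar distance, and the name `interpolate` are assumptions made for the illustration.

```python
import math


def interpolate(point, nodes, k=4):
    """Interpolate one attribute at `point` from its k nearest nodes.

    nodes is a list of ((x, y), baseline_value). Raw weights follow the ratio
    described in the text; they are then normalized to sum to 1 (assumption).
    """
    nearest = sorted(((math.dist(point, xy), value) for xy, value in nodes),
                     key=lambda pair: pair[0])[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # the point coincides with a node
    total = sum(d for d, _ in nearest)
    raw = [total / d for d, _ in nearest]
    norm = sum(raw)
    return sum((w / norm) * value for w, (_, value) in zip(raw, nearest))


demo_nodes = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]
midpoint_value = interpolate((1.0, 0.0), demo_nodes, k=2)
```

Per the text, `k` would be 8 in complex terrain and 4 in flat terrain; the two-node call above is only a small check that closer nodes dominate.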
[0054] Step 303: Extract a set of location point parameter values from the corrected geological environment parameter field that match the input structure of the BP neural network model to form a multi-parameter input tensor. Specifically, this includes: defining the pre-constructed input structure of the BP neural network model, with 5 neurons in the input layer corresponding to 5 core geological attribute parameters: soil layer thickness, rock mass integrity, groundwater depth, topographic slope, and the lithological distribution quantification parameter; selecting uniformly distributed location points from the corrected geological environment parameter field at a preset sampling interval, where the sampling interval is determined based on the study area and the early warning accuracy and the number of location points ranges from 1000 to 2000 to ensure coverage of all areas and uniform distribution; and extracting the five core geological attribute parameters for each selected location point. The parameters are arranged in the order of soil layer thickness, rock mass integrity, groundwater depth, topographic slope, and lithological distribution, forming a one-dimensional parameter vector of dimension 5 that matches the number of neurons in the model input layer. The one-dimensional parameter vectors of all location points are combined in spatial order to form a multi-parameter input tensor whose dimension is the number of location points multiplied by 5. After the tensor is constructed, the data is standardized to map all parameter values to the interval from 0 to 1. For each parameter, the minimum value of that parameter is subtracted from the value, and the result is divided by the difference between the maximum and minimum values of the parameter. The standardized multi-parameter input tensor is used as the final model input data.
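The standardization at the end of step 303 is ordinary min-max scaling applied column by column. A minimal sketch, assuming the tensor is held as a list of 5-element rows and that a constant column maps to 0.0 (a choice made here to avoid division by zero, not something the text specifies):

```python
def min_max_normalize(rows):
    """Scale each column of an N x 5 table to the interval [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Column order: soil thickness, rock integrity, groundwater depth, slope, lithology
sample_rows = [
    [0.5, 0.3, 1.0, 0.0, 0.7],
    [5.0, 0.9, 15.0, 60.0, 1.2],
    [2.75, 0.6, 8.0, 30.0, 0.95],
]
scaled = min_max_normalize(sample_rows)
```

The sample rows are illustrative values chosen from the parameter ranges given in step 301; they are not data from the study area.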
[0055] Step 304a: Input the multi-parameter input tensor into the input layer of the pre-trained BP neural network model; the first hidden layer of the BP neural network model receives and processes the multi-parameter input tensor from the input layer, performs the first round of nonlinear transformation and weighted calculation, and generates the first intermediate layer feature data. Specifically, this includes: pre-constructing and training the BP neural network model so that it adapts to the geological characteristics of mountainous counties and captures the relationship between geological environment parameters and the geological disaster risk index. The model has a four-layer structure, with 5 neurons in the input layer, 7 neurons in the first hidden layer, 6 neurons in the second hidden layer, and 1 neuron in the output layer for outputting the geological disaster risk index. The core parameters of the model are set as follows: the activation function is the Sigmoid function, which maps the input data to the range 0 to 1; the loss function is the mean squared error loss function, which measures the difference between the predicted and actual values; the optimizer is the gradient descent optimizer, with a learning rate ranging from 0.008 to 0.012 and a default value of 0.01, balancing training convergence speed and stability. The model training process is as follows: collect no less than 5 years of regional historical geological disaster data and the corresponding historical geological environment parameter data, and process the historical geological environment parameter data according to steps 301 to 303 to form a multi-parameter input tensor as training input data; quantify the risk level of historical geological disasters, with no risk, low risk, medium risk, high risk, and extremely high risk quantified as 0, 0.2, 0.4, 0.6, and 0.8 respectively, as training label data in one-to-one correspondence with the input data.
[0056] The training input data and label data are divided into training and validation sets in a 7:3 ratio. Training is set to 100 rounds. After each round, the loss value is calculated using a loss function, and the model connection weights and biases are adjusted using an optimizer. Every 5 rounds, the model performance is validated using the validation set, and the loss value is recorded. A minimum loss threshold of 0.001 is preset. Training stops and model parameters are saved when the validation set loss value falls below this threshold and the change in loss value is less than 0.0001 for three consecutive rounds. After model training, the standardized multi-parameter input tensor is input into the input layer. Five neurons receive the corresponding core geological parameter values and pass them to the first hidden layer. The first hidden layer performs the first round of nonlinear transformation and weighted calculation on the input data. Each neuron first multiplies the input layer parameter value with the corresponding connection weight and sums the results, adds its own preset bias value to obtain the input sum, and then maps it to the 0-1 interval using the Sigmoid activation function to obtain the neuron's output value. The output values of the seven neurons together constitute the feature data of the first intermediate layer.
[0057] Step 304b: The second hidden layer of the BP neural network model receives and processes the feature data of the first intermediate layer, performs a second round of nonlinear transformation and weighted calculation, and generates the feature data of the second intermediate layer. Specifically, the second hidden layer receives the feature data of the first intermediate layer and passes all of it to each neuron. Following the same calculation logic as the first hidden layer, each neuron multiplies each feature value of the first intermediate layer by the corresponding connection weight, sums the products, adds its own preset bias value to obtain the input sum, and maps the input sum to the 0 to 1 interval through the Sigmoid activation function to obtain the output value of the neuron. The output values of the six neurons together constitute the feature data of the second intermediate layer.
[0058] Step 304c: The output layer of the BP neural network model receives and processes the feature data from the second intermediate layer, performing final linear or nonlinear calculations to obtain the geological hazard index. Specifically, this includes: the output layer receiving all feature values from the second intermediate layer and transmitting them to the output layer neurons; multiplying each feature value by its corresponding connection weight and summing the results; adding the preset bias value of the output layer to obtain the input sum; mapping the sum to the 0-1 range using the Sigmoid activation function; and the output value being the geological hazard index for the corresponding location. A larger value indicates a higher potential risk of geological disasters at that location, while a smaller value indicates a more stable geological foundation. Geological hazard indices vary significantly across different areas; some locations in granite slopes and shale gullies have higher indices, while flatter areas have lower indices. The geological hazard indices of all locations are summarized, and their corresponding coordinates and area information are labeled to form a geological hazard index set, providing accurate geological risk quantification for subsequent comprehensive meteorological risk warnings. After the geological hazard index calculation is completed, accuracy verification is performed by selecting historical disaster-occurring locations and disaster-free locations, comparing the actual risk level quantification values with the calculated geological hazard index.
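The forward pass through steps 304a to 304c (5 inputs, hidden layers of 7 and 6 neurons, 1 output, Sigmoid throughout) can be sketched in plain Python. The weights below are random placeholders, not trained parameters, so the resulting number has no geological meaning; the helper names `dense`, `init_layer`, and `hazard_index` are invented for the example, and training by backpropagation is deliberately left out.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then Sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Placeholder (untrained) weights in [-1, 1] and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def hazard_index(features, layers):
    """Propagate one standardized 5-element vector through all layers."""
    activations = features
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations[0]


rng = random.Random(0)
layers = [init_layer(5, 7, rng), init_layer(7, 6, rng), init_layer(6, 1, rng)]
index = hazard_index([0.5, 0.5, 0.5, 0.5, 0.5], layers)
```

Because the output neuron also uses the Sigmoid function, the index always lies strictly between 0 and 1, which matches the label scale of 0 to 0.8 used during training.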
[0059] In this embodiment, the construction, training, and implementation of the BP neural network model fully incorporates historical data from the study area. Through multiple rounds of training and validation, the model parameters are optimized to ensure that the model can adapt to the geological characteristics of the area. The model's double-layer hidden layer structure can fully extract complex features from the geological environment parameters, thereby improving the calculation accuracy of the geological hazard index.
[0060] In a preferred embodiment of the present invention, step 4, calculating the daily comprehensive effective precipitation dataset based on precipitation monitoring data, and obtaining the precipitation critical probability based on the statistical correlation between the daily comprehensive effective precipitation dataset and geological hazards, may include:
[0061] Step 401: Based on precipitation monitoring data, obtain the real-time and historical multi-day raw precipitation sequences for each precipitation monitoring station within the study area. Based on the raw precipitation sequences, define an initial circular spatial influence range centered on each precipitation monitoring station. Specifically, this includes: First, collecting precipitation monitoring data from all precipitation monitoring stations within the study area. These stations include those previously deployed in different areas of the mountainous county and existing meteorological observation stations within the area, covering different terrain types such as steep slopes, valleys, and gentle slopes, so that the monitoring data comprehensively covers the precipitation situation in the study area. The monitoring data includes real-time precipitation data and historical multi-day precipitation data. Real-time precipitation data is collected hourly and summarized into the daily real-time precipitation at the end of the day. Historical multi-day precipitation data spans no less than five years, with priority given to data from the last ten years, to ensure coverage of different precipitation seasons and intensities. This matches the uneven precipitation distribution, strong seasonality, and concentrated heavy rainfall of mountainous counties and comprehensively reflects the precipitation patterns in the study area. For each precipitation monitoring station, the collected real-time precipitation data is organized chronologically and summarized by calendar day to form a daily real-time precipitation record. Historical multi-day precipitation data is likewise organized by calendar day, from earliest to latest, to form a unique raw precipitation sequence for each monitoring station. 
The raw precipitation sequence contains the daily precipitation record for that station. If there is no precipitation on a given day, the precipitation record is recorded as zero. If precipitation is interrupted on a given day, the total daily precipitation is still summarized by calendar day, ensuring that each calendar day has a unique corresponding precipitation record.
[0062] After data processing, the raw precipitation sequence of each precipitation monitoring station is checked, and records with missing or abnormal data are handled. Abnormal data is defined as a daily precipitation value that is at least three times the average precipitation for the same period (month and solar term) at that station with no corresponding record of heavy rain. Such abnormal data is removed directly. For missing dates, if no more than three consecutive days are missing, the average precipitation of the adjacent dates is used to fill the gap. If more than three consecutive days are missing, the corresponding time period is removed. This ensures the completeness and accuracy of the raw precipitation sequences and prevents data issues from affecting subsequent calculation results. An initial circular spatial influence range is then defined with each precipitation monitoring station as its center. Taking into account the distribution density of precipitation monitoring stations in mountainous counties and the terrain characteristics, the straight-line distances between all precipitation monitoring stations in the study area are first computed to obtain the average spacing. The center of the initial circular spatial influence range is the actual geographical coordinates (latitude and longitude) of the precipitation monitoring station. The initial radius is set to one-third of the average spacing, so that the initial circular spatial influence range does not contain too many adjacent monitoring stations while still covering the precipitation area that the station can effectively monitor.
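Two of the rules in this step, short-gap filling and the initial radius, can be sketched as follows. The sketch makes several assumptions that the text does not fix: missing days are marked with `None`, gaps at the very start or end of a sequence are dropped because they lack two neighbors, "average spacing" is the mean over all station pairs, and distances use the haversine formula with a mean Earth radius of 6,371 km. The function names are invented for the example, and a production version would keep the calendar dates alongside the values.

```python
import math
from itertools import combinations


def fill_short_gaps(series, max_gap=3):
    """Fill gaps of up to max_gap days with the mean of the adjacent days.

    Longer gaps, and gaps touching either end of the series, are dropped.
    """
    out, i, n = [], 0, len(series)
    while i < n:
        if series[i] is not None:
            out.append(series[i])
            i += 1
            continue
        j = i
        while j < n and series[j] is None:
            j += 1
        if i > 0 and j < n and (j - i) <= max_gap:
            out.extend([(series[i - 1] + series[j]) / 2] * (j - i))
        i = j
    return out


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def initial_radius_m(stations):
    """One-third of the mean pairwise spacing between stations."""
    pairs = list(combinations(stations, 2))
    return sum(haversine_m(a, b) for a, b in pairs) / len(pairs) / 3


filled = fill_short_gaps([1.0, None, 3.0, None, None, None, None, 5.0])
radius = initial_radius_m([(0.0, 0.0), (0.0, 0.01), (0.0, 0.02)])
```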
[0063] Step 402: Iteratively expand the radius of the initial circular spatial influence range. After each expansion, the number of newly included neighboring stations is counted. Expansion stops when the number of newly included neighboring stations reaches a preset gain threshold, thus determining the final effective influence area. Specifically, this includes: first, defining the rules for iterative expansion. The radius increment is the same in every iteration. Considering the complexity of the mountainous terrain and the distribution of monitoring stations, the increment is fixed at 500 meters. This value is determined based on the precipitation propagation distance and terrain barrier characteristics in mountainous areas, ensuring the stability and rationality of the iteration process. It avoids including too many irrelevant stations at once because the increment is too large, and avoids too many iterations and low efficiency because the increment is too small. Iterative expansion begins with the initial circular spatial influence range of each precipitation monitoring station. After the first expansion, a new circular spatial influence range is obtained, with a radius equal to the initial radius plus 500 meters. The number of neighboring precipitation monitoring stations newly included within this range after the expansion is then counted. The number of newly included neighboring stations refers to the number of precipitation monitoring stations that lie outside the original circular spatial influence range but inside the new circular spatial influence range. It is determined by extracting the geographical coordinates of each adjacent station and calculating the straight-line distance between the adjacent station and the central station using the geographical distance calculation formula. 
The distance is then compared with the radius of the expanded circle; if the distance is less than or equal to the radius, the station is covered by the new circular spatial influence range. The coordinates of each adjacent station are verified in this way, ensuring accurate counts and preventing over- or under-counting. After counting, the number of newly included adjacent stations is compared with a preset gain threshold. The preset gain threshold is determined based on the distribution density of precipitation monitoring stations in the study area, topographic heterogeneity, and the actual needs of the precipitation influence range. It is generally set to 3 to 5 stations, with 3 stations for flat areas with higher distribution density and 5 stations for steep slopes and valleys with lower distribution density. Its purpose is to determine whether the current circular spatial influence range covers enough adjacent stations to comprehensively reflect the precipitation impact of the central station. If the number of newly included neighboring stations does not reach the preset gain threshold, the next iteration of expansion continues, repeating the above expansion, counting, and comparison process, with each expansion increasing the radius by 500 meters. If the number of newly included neighboring stations reaches the preset gain threshold, the expansion of the circular spatial influence range of the precipitation monitoring station is stopped, and the current circular spatial influence range is the final effective influence area of the precipitation monitoring station. 
If, after multiple iterations of expansion, the radius of the circular spatial influence range has reached the maximum limit radius set for the study area (generally set to 5 kilometers based on the overall area of the study area), and the number of newly included neighboring stations still does not reach the preset gain threshold, the expansion is also stopped, and the current circular spatial influence range is taken as the final effective influence area of the precipitation monitoring station. This avoids unlimited expansion that could lead to the influence range exceeding the study area or including too many irrelevant areas. Each precipitation monitoring station determines its final effective influence area through the above iterative expansion process.
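The iterative expansion of step 402 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `final_influence_radius` is hypothetical, station coordinates are assumed to be already projected to metres so that a planar distance can stand in for the geographical distance formula, and "newly included" is read as stations lying beyond the initial radius but within the current radius.

```python
import math

STEP_M = 500.0          # fixed radius increment per iteration (from the text)
MAX_RADIUS_M = 5000.0   # maximum limit radius, generally 5 km (from the text)


def final_influence_radius(center, neighbors, initial_radius_m, gain_threshold,
                           step_m=STEP_M, max_radius_m=MAX_RADIUS_M):
    """Return (final_radius_m, newly_included) for one central station.

    center and neighbors are (x, y) tuples in metres (assumed projected
    coordinates); gain_threshold is typically 3 to 5 stations.
    """
    # Straight-line distance from the central station to every neighbor.
    distances = [math.hypot(x - center[0], y - center[1]) for x, y in neighbors]

    radius = initial_radius_m
    newly_included = 0
    while radius < max_radius_m:
        # Expand by one fixed step, never beyond the maximum limit radius.
        radius = min(radius + step_m, max_radius_m)
        # Assumption: count stations beyond the initial circle, inside the new one.
        newly_included = sum(1 for d in distances if initial_radius_m < d <= radius)
        if newly_included >= gain_threshold:
            break  # enough neighbors covered: stop expanding
    return radius, newly_included
```

With four neighbors at 1200 m, 1400 m, 1800 m, and 4000 m, an initial radius of 1000 m, and a threshold of 3, the loop stops at 2000 m; with a threshold of 5 it runs until the 5 km limit.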
[0064] Step 403a: Based on the original precipitation sequence and the preset attenuation coefficient sequence within the final effective influence area, the daily comprehensive effective precipitation corresponding to each central station is generated through weighted calculation and accumulation, and a daily comprehensive effective precipitation dataset is generated. Specifically, this includes: obtaining the original precipitation sequence of all precipitation monitoring stations within the final effective influence area of each precipitation monitoring station, including the original precipitation sequence of the central station itself and the original precipitation sequences of all adjacent precipitation monitoring stations included in the final effective influence area; extracting the daily precipitation data of each station one by one, classifying and organizing it by date to ensure that the extracted data accurately corresponds to the corresponding date; grouping the precipitation data of all stations on the same date together for easy subsequent daily weighted calculation; and determining the preset attenuation coefficient sequence, which is based on the distance between precipitation monitoring stations. The core principle is that the farther the adjacent station is from the central station, the smaller its contribution to the precipitation in the central station's influence area, and the smaller the corresponding attenuation coefficient. The closer an adjacent station is to the central station, the greater its contribution to the precipitation in the central station's influence area, and the larger its corresponding attenuation coefficient. The central station's own attenuation coefficient is set to 1, which is the maximum attenuation coefficient, ensuring that the central station's own precipitation data dominates the calculation and avoiding interference from precipitation data from adjacent stations in the central station's precipitation assessment. 
Each attenuation coefficient in the attenuation coefficient sequence is calculated based on the actual distance between the central station and adjacent stations. Specifically, the straight-line distance between the central station and adjacent stations is divided by the radius of the central station's final effective influence area to obtain the distance proportion. This distance proportion is then subtracted from 1 to obtain the corresponding attenuation coefficient. The attenuation coefficient is controlled between 0.1 and 1. If the calculated attenuation coefficient is less than 0.1, it is set to 0.1 to ensure the attenuation coefficient is reasonable and reflects the impact of mountainous terrain on precipitation propagation. In areas with severe terrain obstruction, the above calculation results can be multiplied by a correction factor of 0.8 to further reflect the actual precipitation impact.
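The attenuation rule above can be expressed as a small function. This is a sketch under stated assumptions: the name `attenuation_coefficient` and the `terrain_obstructed` flag are hypothetical, and the 0.8 terrain correction is applied before the lower bound of 0.1 is enforced, an ordering the text does not specify. The central station itself is not passed through this function; its coefficient is fixed at 1.

```python
def attenuation_coefficient(distance_m, final_radius_m, terrain_obstructed=False):
    """Attenuation coefficient of an adjacent station.

    distance_m: straight-line distance from the central station.
    final_radius_m: radius of the central station's final effective influence area.
    """
    distance_ratio = distance_m / final_radius_m
    coeff = 1.0 - distance_ratio           # farther stations contribute less
    if terrain_obstructed:
        coeff *= 0.8                       # correction factor from the text
    # Keep the coefficient within [0.1, 1] (assumed to apply after correction).
    return min(1.0, max(0.1, coeff))
```

For example, a station 500 m away inside a 2000 m influence area gets a coefficient of 0.75, while a station at 1900 m would compute to 0.05 and is raised to the floor of 0.1.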
[0065] For each central station, its corresponding daily comprehensive effective precipitation is calculated. The calculation process is carried out daily, and the calculation method is consistent every day. Specifically, the precipitation data of the central station for that day is extracted first. The precipitation data of the central station for that day is multiplied by the attenuation coefficient 1 corresponding to the central station to obtain the weighted value of the precipitation of the central station for that day. This weighted value is the precipitation of the central station for that day. Then, the precipitation data of each adjacent station in the final effective influence area for that day is extracted one by one. The precipitation data of each adjacent station for that day is multiplied by the attenuation coefficient corresponding to the adjacent station to obtain the weighted value of the precipitation of each adjacent station for that day. The weighted value of each adjacent station is calculated one by one, without omitting any adjacent station included in the effective influence area. Then, the weighted value of the precipitation of the central station for that day is added to the weighted values of the precipitation of all adjacent stations for that day. The sum is the daily comprehensive effective precipitation of the central station for that day. During the addition process, each weighted value is checked one by one to ensure that the addition calculation is accurate. Following the above method, calculate the daily comprehensive effective precipitation for each central station one by one. After the daily calculation is completed, record the corresponding information of the central station, the date, and the daily comprehensive effective precipitation value. Then, compile and summarize the daily comprehensive effective precipitation of all central stations to form a daily comprehensive effective precipitation dataset.
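The daily weighted accumulation can be sketched as below. The data layout (dictionaries keyed by date and station identifier) and the function name are illustrative assumptions, as is the choice to treat a missing neighbor reading on a given date as 0 mm; the text does not say how missing readings are handled.

```python
def daily_effective_precipitation(center_series, neighbor_series, neighbor_coeffs):
    """Daily comprehensive effective precipitation for one central station.

    center_series:   {date: precipitation_mm} for the central station
    neighbor_series: {station_id: {date: precipitation_mm}}
    neighbor_coeffs: {station_id: attenuation coefficient in [0.1, 1]}
    """
    result = {}
    for date, center_mm in center_series.items():
        total = 1.0 * center_mm  # the central station's coefficient is fixed at 1
        for station_id, series in neighbor_series.items():
            # Assumption: a missing reading contributes 0 mm.
            total += neighbor_coeffs[station_id] * series.get(date, 0.0)
        result[date] = total
    return result
```

Running this for every central station and collecting the `(station, date, value)` triples would give the daily comprehensive effective precipitation dataset described above.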
[0066] Step 403b involves dividing the values in the daily comprehensive effective precipitation dataset into multiple consecutive numerical intervals, counting the number of daily comprehensive effective precipitation samples and the number of geological disaster occurrence samples within each interval, and calculating the frequency of geological disaster occurrences as the ratio of the number of geological disaster occurrence samples to the number of daily comprehensive effective precipitation samples. Specifically, this includes: collecting historical geological disaster occurrence records within the study area, ensuring that the time span of these records matches the time span of the original precipitation sequence (both are no less than five years) to guarantee accurate correspondence. The historical geological disaster occurrence records include the specific date, geographical coordinates of the location, and disaster type information for each event. The records are then organized, and incomplete records and records with unclear locations are removed. Records with unclear locations are those that lack specific geographical coordinates and therefore cannot be located to a specific area; such records cannot be correlated with precipitation data and are removed directly to ensure the accuracy and usability of the historical geological disaster occurrence records.
[0067] Extract all daily comprehensive effective precipitation values from the daily comprehensive effective precipitation dataset, determine the maximum and minimum values, analyze the distribution range of these values, and, considering the correlation between geological disasters and precipitation in mountainous counties, divide the values into multiple continuous numerical intervals. During the division, each numerical interval has a uniform span of 5 mm, starting from the minimum value of zero and dividing upwards sequentially until the maximum value in the daily comprehensive effective precipitation dataset is covered. For example, if the minimum value is 0 mm and the maximum value is 100 mm, then the intervals are 0 to 5 mm, 5 to 10 mm, 10 to 15 mm, and so on, up to 95 to 100 mm. Each numerical interval is a left-closed, right-open interval, meaning it includes the left endpoint but not the right endpoint, ensuring seamless connection between adjacent intervals without any gaps. At the same time, the actual situation of geological disasters is taken into account: if a certain interval contains few values, two adjacent intervals can be merged into one, so that each numerical interval has a sufficient number of daily comprehensive effective precipitation samples. This avoids inaccurate statistical results due to insufficient sample size in a certain interval. Every value in the daily comprehensive effective precipitation dataset can be assigned to its corresponding interval without omission.
[0068] After the division is completed, the number of relevant samples for each numerical interval is counted one by one. First, the number of daily comprehensive effective precipitation samples in each numerical interval is counted, which is to count the number of all values in the daily comprehensive effective precipitation dataset that belong to that numerical interval. Each value corresponds to one sample. During the statistics, each daily comprehensive effective precipitation value is extracted one by one, and its interval is determined. The interval to which each value belongs is checked one by one, and the number of samples in each interval is recorded to ensure the accuracy of the statistical results. After the statistics are completed, the number of samples in each interval is recorded in relation to that interval for easy verification later. Then, the number of geological disaster samples within each numerical interval is counted. Combining historical geological disaster records, the specific date of each geological disaster is found. Then, the daily comprehensive effective precipitation values of all central stations corresponding to that date are found, and the numerical interval to which these values belong is determined. For each date of a geological disaster, regardless of how many central stations' precipitation values correspond to it, as long as these values belong to a certain interval, a geological disaster sample is added to that interval. That is, one disaster date corresponds to one disaster sample, avoiding the same disaster date being counted multiple times. During the statistics, the date of each geological disaster record and the corresponding daily comprehensive effective precipitation value are checked one by one to confirm the corresponding interval, ensuring the accuracy of the number of geological disaster samples counted and avoiding over- or under-counting. 
After the sample counts are obtained, the frequency of geological disasters corresponding to each numerical interval is calculated by dividing the number of geological disaster samples in that interval by the number of daily comprehensive effective precipitation samples in the same interval. The result is the frequency of geological disasters corresponding to that numerical interval, rounded to four decimal places. Each numerical interval corresponds to a unique frequency of geological disasters.
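The interval statistics of step 403b can be sketched as follows. This is an illustrative reading rather than a definitive implementation: the record layout and the name `interval_frequencies` are assumptions, intervals are identified by an integer index `k` standing for the left-closed, right-open range `[5k, 5k + 5)` mm, and the optional merging of sparse adjacent intervals is omitted.

```python
from collections import defaultdict

BIN_WIDTH_MM = 5.0  # uniform interval span (from the text)


def interval_frequencies(records, disaster_dates, bin_width=BIN_WIDTH_MM):
    """Count samples and disaster samples per interval and derive frequencies.

    records:        iterable of (station_id, date, effective_precip_mm)
    disaster_dates: set of dates on which a geological disaster was recorded
    Returns {bin_index: (n_samples, n_disaster_samples, frequency)}.
    """
    sample_counts = defaultdict(int)
    disaster_days = defaultdict(set)
    for _station, date, value in records:
        idx = int(value // bin_width)       # left-closed, right-open interval
        sample_counts[idx] += 1
        if date in disaster_dates:
            # A set ensures one disaster date adds at most one sample per interval.
            disaster_days[idx].add(date)

    table = {}
    for idx in sorted(sample_counts):
        n = sample_counts[idx]
        k = len(disaster_days[idx])
        table[idx] = (n, k, round(k / n, 4))  # frequency to four decimal places
    return table
```

Using a set of dates per interval reflects the rule that one disaster date corresponds to one disaster sample, however many central stations fall into the interval on that date.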
[0069] Step 403c: Based on all numerical intervals and their corresponding geological disaster occurrence frequencies, establish the correspondence between the daily comprehensive effective precipitation numerical intervals and the geological disaster occurrence frequencies. Specifically, this includes: organizing all the divided daily comprehensive effective precipitation numerical intervals, arranging them in ascending order of interval size, and organizing the geological disaster occurrence frequencies corresponding to each numerical interval, matching the intervals with their corresponding frequencies one by one, ensuring that each numerical interval has a corresponding geological disaster occurrence frequency, without omissions or errors in correspondence. During the organization process, the upper and lower limits of each interval, the number of daily comprehensive effective precipitation samples, the number of geological disaster occurrence samples, and the corresponding occurrence frequencies are marked to form a preliminary list of correspondences.
[0070] Establish a correspondence between the two values according to the numerical range from smallest to largest, clarifying the frequency of geological disasters corresponding to each daily comprehensive effective precipitation range. For example, the frequency of occurrence for the 0 to 5 mm range is 0.0012, and for the 5 to 10 mm range it is 0.0035, and so on. This ensures that if the range to which a given daily comprehensive effective precipitation value belongs can be known, the corresponding frequency of geological disasters can be quickly found. When establishing the correspondence, the characteristics of geological disasters in mountainous counties should be considered, focusing on ranges with larger daily comprehensive effective precipitation values. These ranges usually correspond to higher frequencies of geological disasters and are the focus of subsequent precipitation critical probability calculations. The frequency calculation results for these high-frequency ranges need to be checked one by one to ensure accuracy. After the correspondence is established, a comprehensive check should be conducted to verify the correctness of the frequency of geological disasters corresponding to each numerical range. The check should include the upper and lower limits of the range, the number of occurrence samples, the sample size, and the frequency calculation results to ensure accuracy. At the same time, a clear list of correspondences should be compiled, arranged from smallest to largest range, clearly marking the relevant information and corresponding frequency for each range.
[0071] Step 404: Based on the correspondence, map each value in the daily comprehensive effective precipitation dataset to the corresponding frequency of geological disasters to obtain the corresponding critical precipitation probability. Specifically, this includes: extracting each value from the daily comprehensive effective precipitation dataset and processing them one by one according to the station number and date, ensuring no value is missed. During processing, the corresponding central station number and date are recorded simultaneously for subsequent association and organization. For each daily comprehensive effective precipitation value, first refer to the correspondence established in step 403c to determine the interval to which it belongs. During the determination, the value is checked against the upper and lower limits of each interval to ensure accurate assignment. If the value is exactly equal to the right endpoint of a certain interval, then, under the left-closed, right-open convention, it is assigned to the next interval, so that the judgment rule is consistent. For example, a value of 7 mm is determined to belong to the 5 to 10 mm interval by referring to the correspondence. After the interval is found, the frequency of geological disasters in that interval is looked up. This frequency is the critical probability of precipitation corresponding to the daily comprehensive effective precipitation value, that is, the likelihood of a geological disaster occurring under that precipitation level. The larger the value, the higher the probability that precipitation induces a geological disaster, and vice versa. For example, the interval frequency corresponding to 7 mm is 0.0035, which means the probability of geological disasters occurring under that precipitation level is 0.35%. Following the above method, all values in the daily comprehensive effective precipitation dataset are mapped to the corresponding critical probability of precipitation.
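The lookup in step 404 can be sketched with a binary search over the interval lower bounds. The function name and the table layout are assumptions; the first two frequencies come from the example in the text, while the third (0.0100) is a made-up placeholder. Values above the last lower bound are assumed to fall into the last interval.

```python
import bisect


def critical_probability(value_mm, lower_bounds, frequencies):
    """Map an effective precipitation value to its interval's disaster frequency.

    lower_bounds: ascending left endpoints of the left-closed, right-open intervals
    frequencies:  disaster frequency of each interval, in the same order
    """
    # bisect_right sends a value equal to a boundary into the interval that
    # starts at that boundary, matching the left-closed, right-open rule.
    idx = bisect.bisect_right(lower_bounds, value_mm) - 1
    idx = max(0, min(idx, len(frequencies) - 1))  # assumed clamping at the ends
    return frequencies[idx]


lower_bounds = [0.0, 5.0, 10.0]
frequencies = [0.0012, 0.0035, 0.0100]  # last value is hypothetical
```

With this table, 7 mm maps to 0.0035, and a value of exactly 5 mm also maps to 0.0035 because it belongs to the 5 to 10 mm interval.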
[0072] In this embodiment, the calculation process of the critical probability of precipitation fully combines the characteristics of precipitation distribution and the occurrence pattern of geological disasters in mountainous counties. By using real-time and historical precipitation monitoring data, the effective influence area of precipitation monitoring stations is determined through iterative expansion. The daily comprehensive effective precipitation is calculated by weighted accumulation in combination with the attenuation coefficient. Then, the critical probability of precipitation is obtained through statistical correlation. This can accurately quantify the impact of precipitation on the occurrence of geological disasters and solve the problem of inaccurate assessment of the impact of precipitation.
[0073] In a preferred embodiment of the present invention, step 5, which involves fusing the geological disaster risk index and the precipitation critical probability based on a logistic regression model to obtain a comprehensive meteorological risk warning probability, may include:
[0074] Step 501: Construct a fused feature vector based on the geological hazard index and precipitation critical probability. Input the fused feature vector into a pre-trained logistic regression model. Specifically, this includes: clarifying that the fused feature vector is constructed based on the geological hazard index calculated in Step 3 and the precipitation critical probability calculated in Step 4. The geological hazard index reflects the basic geological risk at different spatial locations in the study area, encompassing the comprehensive influence of geological factors such as topographic slope, lithology, and groundwater depth. A higher value indicates a more unstable geological foundation and a higher basic risk of inducing geological disasters. The precipitation critical probability reflects the probability of geological disasters occurring under different precipitation conditions. A higher value indicates a higher risk of geological disasters induced by the current precipitation. The combination of the two can comprehensively reflect the comprehensive risk of geological disasters, avoiding the limitations of single-factor assessment and conforming to the occurrence pattern of geological disasters in mountainous counties. Construct a fused feature vector for each spatial location point in the study area divided into 100m × 100m grids. Each grid cell corresponds to a unique spatial location point and a unique fused feature vector. During construction, the geological hazard index corresponding to that location point is first extracted, which is obtained by weighted calculation of geological data in step 3, with a value range of 0 to 1. Then, the precipitation critical probability corresponding to that location point is extracted, which is obtained by mapping in step 4, with a value range of 0 to 1. During extraction, it is ensured that the two values accurately correspond, with no missing data or correspondence errors. 
If any data point is missing for any location, it is discarded and does not participate in subsequent fusion calculations. The two values are arranged in a fixed order: geological hazard index first, precipitation critical probability second, forming the fused feature vector for that location point. For example, if the geological hazard index for a location point is 0.6 and the precipitation critical probability is 0.0035, its fused feature vector is 0.6, 0.0035. It is ensured that the fused feature vector structure of all location points is consistent, all being two-dimensional vectors, which facilitates processing by the logistic regression model and avoids model calculation anomalies caused by inconsistent vector structures. After construction, each fused feature vector is checked to ensure that the geological hazard index and precipitation critical probability accurately correspond to the corresponding spatial location points. The check includes the coordinates of the spatial location points and the accuracy of the two values to avoid affecting the subsequent model calculation results due to errors in the fused feature vectors. All constructed fused feature vectors are input into the pre-trained logistic regression model. Before input, it is confirmed that the model input layer is a two-dimensional input, consistent with the dimension of the fused feature vectors, to ensure that the vector structure meets the model input requirements. During the input process, each fused feature vector is transmitted one by one and associated with its corresponding spatial location point coordinates to ensure that all vectors are accurately received by the model without omissions or transmission errors, thus preparing for subsequent model calculations.
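Construction of the fused feature vectors can be sketched as below. The dictionary layout keyed by grid cell and the name `build_fused_vectors` are illustrative assumptions; a missing value is represented here as an absent key or `None`.

```python
def build_fused_vectors(risk_index, precip_probability):
    """Build one two-dimensional fused feature vector per grid cell.

    risk_index:         {(row, col): geological index in [0, 1] or None}
    precip_probability: {(row, col): precipitation critical probability or None}
    Cells with either value missing are discarded, as described in the text.
    """
    vectors = {}
    for cell, risk in risk_index.items():
        prob = precip_probability.get(cell)
        if risk is None or prob is None:
            continue  # incomplete cell: excluded from fusion
        # Fixed order: geological index first, precipitation probability second.
        vectors[cell] = (risk, prob)
    return vectors
```

Keying the result by cell keeps each vector associated with its spatial location, which the later steps rely on when attaching the output probability to a grid point.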
[0075] The construction, training, and implementation process of the logistic regression model, which integrates the geological disaster risk index and the precipitation critical probability to output a comprehensive meteorological risk warning probability in line with the comprehensive geological disaster early warning needs of mountainous counties, is as follows: The core of constructing the logistic regression model is determining the input layer, output layer, and parameters to ensure the model structure meets the integration and early warning requirements. The input layer corresponds to the fusion feature vector. Since the vector contains two components (the geological disaster risk index and the precipitation critical probability), the input layer dimension is set to two, with one neuron receiving each component. The input range is set to 0 to 1, consistent with the component value range, to avoid calculation anomalies caused by input data exceeding the receiving range. The output layer corresponds to the comprehensive meteorological risk warning probability, which is a value between 0 and 1. This probability quantifies the comprehensive risk of geological disasters at different spatial locations: the larger the value, the higher the comprehensive risk. Therefore, the output layer dimension is set to one, and the neuron output range is 0 to 1 to meet the risk quantification requirements. The model parameters include the weights corresponding to the two feature components and a bias value. The first weight corresponds to the geological disaster risk index, and the second weight corresponds to the precipitation critical probability. The weights measure the degree of influence of the two components on the comprehensive early warning probability. Considering the characteristics of geological disasters in mountainous counties, the two weights are initially set to be of equal importance.
The bias value is used to adjust the model output benchmark to avoid output deviation caused by input data offset. After the model is built, the parameters are initialized. The initial values of the weights are all set to 0.5, and the initial values of the bias values are set to zero to ensure reasonable initialization and lay the foundation for subsequent parameter optimization.
[0076] The core of training a logistic regression model is to optimize model parameters through a large number of training samples, ensuring that the model can accurately capture the correlation between the geological disaster risk index, the critical probability of precipitation, and the probability of comprehensive meteorological risk warning, thus aligning with the characteristics of geological disasters in mountainous counties. The specific process is as follows: Training samples are collected from historical data of the study area to ensure representativeness and relevance. Each sample contains a fusion feature vector and a label value. The fusion feature vector is constructed from historical geological disaster risk indices and historical critical probabilities of precipitation, corresponding to the same spatial location and the same date. The label value is determined based on historical geological disaster records; if a geological disaster occurs at a certain spatial location at a certain time, the label value is set to one; otherwise, it is set to zero, used only to identify the risk status of the sample. The number of training samples should be no less than 1000, covering different geological disaster risk indices and different critical probabilities of precipitation, while also encompassing different terrain areas within the study area, such as granite slopes, shale gullies, and gentle slopes, to avoid insufficient or singular samples leading to inadequate model training. After collection, the samples are randomly divided into training and validation sets in a 7:3 ratio. 70% is used for parameter optimization and 30% is used to verify the training effect. After the division, the samples are checked to ensure that there are no duplicates, no missing parts, no label errors, and that the distribution of the two sets of samples is consistent with the overall sample.
[0077] Initiate model training, conducting a fixed number of iterations from 1000 to 5000, which can be adjusted based on training results. Each iteration proceeds as follows: Input the fused feature vectors from the training set one by one into the initialized model. The model calls the initial parameters to calculate the predicted value for each sample. This predicted value is a value between 0 and 1, used to predict the comprehensive risk probability corresponding to the sample. Then, compare the predicted value of each sample with the label value, and calculate the error using mean squared error. The calculation method is to sum the squared differences between the predicted value and the label value for each sample, and then divide by the number of training samples. The smaller the mean squared error, the more accurate the model's prediction results. Adjust the model weights and biases based on the error. The adjustment principle is to reduce the prediction error. If the error is large, appropriately increase the weight of the corresponding component. If the overall prediction value is low, appropriately increase the bias value. The adjustment range is controlled between 0.01 and 0.1 to ensure stable parameter adjustment and avoid model oscillation and non-convergence. After each iteration, input the vectors from the validation set into the current model, calculate the mean squared error of the validation set, and record the error changes of the training set and validation set to form an error change curve for easy observation of the training effect.
[0078] Iterative training continues until the stopping condition is met: the mean squared error of the validation set has dropped below a preset threshold of 0.05, and over 50 consecutive iterations the validation set error shows no significant decrease, with the difference between successive iterations less than 0.001. This ensures the model is sufficiently trained, accurately capturing risk correlations while avoiding overtraining that could lead to a decline in generalization ability. After training, the optimized model parameters, namely the weights of the two feature components and the bias value, are saved to the model file for subsequent comprehensive early warning probability calculations. The trained model can accurately capture the correlation among the three factors and fit the characteristics of geological disasters in mountainous counties; for example, the geological disaster risk index carries a higher weight in granite slope areas, and the critical probability of precipitation carries a higher weight in areas with concentrated heavy rainfall. The implementation process of the logistic regression model is as follows: the constructed fusion feature vector is input into the trained model, the model calls the saved optimized parameters to process the vector, and finally outputs the comprehensive meteorological risk warning probability. During implementation, it is necessary to ensure the accuracy of parameter calls by checking the weights and bias value when they are read from the model file, and to ensure the stability of the calculation process so that no program anomalies or output deviations arise from parameter or calculation errors. At the same time, the spatial location coordinates of each vector are associated with it to ensure that the output comprehensive warning probability corresponds accurately with the location point.
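The training loop described in the preceding paragraphs can be sketched as below. The text only states the adjustment principle (reduce the mean squared error, with adjustments kept between 0.01 and 0.1), so this sketch substitutes plain batch gradient descent on the mean squared error and reads the 0.01 to 0.1 range as a learning rate; both are assumptions, as are all function and parameter names. The stopping rule is read as: validation error below 0.05 and an improvement of less than 0.001 over the last 50 iterations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mse(samples, w1, w2, b):
    """Mean squared error over samples of the form ((x1, x2), label)."""
    return sum((sigmoid(w1 * x1 + w2 * x2 + b) - y) ** 2
               for (x1, x2), y in samples) / len(samples)


def train(train_set, val_set, lr=0.1, max_iters=5000,
          mse_target=0.05, patience=50, min_delta=0.001):
    w1, w2, b = 0.5, 0.5, 0.0   # initial weights 0.5 and bias 0 (from the text)
    history = []                # validation MSE after each iteration
    for _ in range(max_iters):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in train_set:
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            # Gradient of the squared error through the sigmoid.
            common = 2.0 * (p - y) * p * (1.0 - p) / len(train_set)
            g1 += common * x1
            g2 += common * x2
            gb += common
        w1, w2, b = w1 - lr * g1, w2 - lr * g2, b - lr * gb
        history.append(mse(val_set, w1, w2, b))
        # Assumed stopping rule: below target and no meaningful recent gain.
        if (len(history) > patience and history[-1] < mse_target
                and history[-1 - patience] - history[-1] < min_delta):
            break
    return (w1, w2, b), history
```

The returned `history` corresponds to the validation error curve mentioned in the text; in practice the 7:3 split would be applied to the historical samples before calling `train`.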
[0079] Step 502: The logistic regression model performs a linear combination calculation on the components of the fused feature vector to generate a linear weighted sum; based on the linear weighted sum, a nonlinear transformation is performed using a logistic function to generate an intermediate probability value; based on the intermediate probability value, normalization is applied to obtain the comprehensive meteorological risk warning probability. Specifically, after receiving the fused feature vector, the model performs a linear combination on the two components to generate a linear weighted sum. The calculation involves multiplying the geological disaster risk index by its optimized weight, multiplying the precipitation critical probability by its optimized weight, adding the two products, and then adding the model bias value. The result is the linear weighted sum. During calculation, the correspondence between weights and components must be verified to ensure the accuracy of the multiplication and addition. For example, if a fused feature vector is (0.6, 0.0035), the optimized geological disaster risk index weight is 0.6, the precipitation critical probability weight is 0.4, and the bias value is 0.02, the linear weighted sum is 0.6 × 0.6 + 0.0035 × 0.4 + 0.02 = 0.3814. The linear weighted sum comprehensively reflects the combined influence of the two components. The larger the weight of a component, the more significant its influence, which aligns with the geological disaster occurrence patterns in mountainous counties. For example, the geological disaster risk index weight is higher in steep slope areas, and the precipitation critical probability weight is higher in areas with frequent rainstorms.
[0080] After the linear weighted sum is generated, an intermediate probability value is generated through a nonlinear transformation using a logistic function. The logistic function used is the Sigmoid function, which converts the linear weighted sum into a value between 0 and 1, initially reflecting the probability of geological disasters and meeting the needs of comprehensive early warning probability quantification. The transformation process involves inputting the linear weighted sum into the Sigmoid function, which calculates the intermediate probability value. The Sigmoid function is calculated as $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z$ is the linear weighted sum and $e$ takes a value of approximately 2.71828. After conversion, a larger linear weighted sum corresponds to an intermediate probability value closer to 1, while a smaller linear weighted sum corresponds to an intermediate probability value closer to 0. This aligns with the requirements of probability quantification. A larger intermediate probability value indicates a higher probability of geological disasters occurring at the corresponding spatial location, and vice versa. For example, the linear weighted sum above is 0.3814, and the intermediate probability value obtained after inputting it into the Sigmoid function is approximately 0.594. After conversion, the calculation should be repeated and checked to avoid conversion errors.
[0081] After the intermediate probability values are generated, the final comprehensive meteorological risk warning probability is obtained through normalization. The purpose of normalization is to ensure that the comprehensive warning probability of all spatial locations is within the range of 0 to 1, reasonably reflecting risk differences, avoiding warning deviations caused by unreasonable distribution of intermediate probability values, and eliminating deviations in calculations from different batches. The specific processing procedure is as follows: First, the intermediate probability values of all spatial locations are statistically analyzed, and the maximum and minimum values are extracted and recorded to ensure statistical accuracy. For example, the maximum value is 0.95 and the minimum value is 0.12. Then, the minimum value is subtracted from the intermediate probability value of each location to obtain the difference. This difference is then divided by the difference between the maximum and minimum values. The result is the comprehensive meteorological risk warning probability of that location. During the calculation, it is necessary to ensure that the maximum and minimum values are statistically accurate and that the division is performed correctly. For example, the intermediate probability value above is 0.594, the difference is 0.594 minus 0.12 equals 0.474, and the difference between the maximum and minimum values is 0.95 minus 0.12 equals 0.83. The corresponding comprehensive meteorological risk warning probability is approximately 0.571.
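The three stages of step 502 can be traced with the worked numbers from the text. The function names are illustrative, and the handling of the degenerate case in which all intermediate values are equal (returning 0) is an assumption the text does not cover.

```python
import math


def linear_weighted_sum(vector, weights, bias):
    """w1 * risk index + w2 * precipitation critical probability + bias."""
    return weights[0] * vector[0] + weights[1] * vector[1] + bias


def sigmoid(z):
    """Logistic function mapping the linear weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def min_max_normalize(p, p_min, p_max):
    """Rescale an intermediate probability using the batch minimum and maximum."""
    if p_max == p_min:
        return 0.0  # assumption: no spread across locations
    return (p - p_min) / (p_max - p_min)


z = linear_weighted_sum((0.6, 0.0035), (0.6, 0.4), 0.02)   # about 0.3814
p = sigmoid(z)                                              # about 0.594
warning_probability = min_max_normalize(p, 0.12, 0.95)      # about 0.571
```

Note that the minimum and maximum are taken over all spatial locations in the same batch, so the normalized probability of one location depends on the values computed for the others.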
[0082] In this embodiment, the construction, training, and implementation of the logistic regression model closely align with the actual geological disaster situation in mountainous counties. With model parameters optimized on historical data, the model effectively integrates the geological disaster risk index and the precipitation critical probability, achieving a comprehensive assessment of basic geological risk and meteorological precipitation risk. This avoids the limitations of single-factor assessment and enhances the rationality and reliability of comprehensive risk assessment.
[0083] In a preferred embodiment of the present invention, step 6, which involves matching a preset meteorological disaster risk classification threshold with the comprehensive meteorological risk warning probability to determine the warning level and obtain the meteorological risk warning result, may include:
[0084] Step 601: Compare the comprehensive meteorological risk warning probability with the preset meteorological disaster risk classification thresholds to obtain the risk level range in which the probability falls. Specifically, the preset meteorological disaster risk classification thresholds are first determined based on the geological disaster prevention and control needs of mountainous counties, historical disaster occurrence patterns, and grassroots disaster prevention and mitigation capabilities. The number of thresholds matches the number of warning levels: typically, four thresholds are set, defining five risk level ranges. The four thresholds are 0.2, 0.4, 0.6, and 0.8, set according to the historical distribution of disaster occurrence probability so that the risk level corresponding to each threshold aligns with actual prevention and control needs. The corresponding risk level ranges are: below 0.2, extremely low risk; 0.2 to 0.4, low risk; 0.4 to 0.6, medium risk; 0.6 to 0.8, high risk; and 0.8 and above, extremely high risk. The actual risk distribution of geological disasters is also analyzed; the high-risk and extremely high-risk ranges mainly correspond to disaster-prone areas such as steep slopes and valleys. Once determined, the thresholds are kept fixed and used to compare the comprehensive warning probabilities of all spatial locations. The thresholds are verified for reasonableness and accuracy to avoid errors in warning level judgments caused by improperly set thresholds. Subsequently, the comprehensive warning probability of each spatial location is extracted and compared with the thresholds one by one in ascending order. The threshold that the probability is greater than or equal to and the threshold that it is less than are identified, which determines the corresponding risk level range. During the comparison, the probability and the thresholds are carefully verified to ensure accuracy. For example, a probability of 0.15 falls within the extremely low risk range (0 to 0.2), 0.3 within the low risk range (0.2 to 0.4), 0.5 within the medium risk range (0.4 to 0.6), 0.7 within the high risk range (0.6 to 0.8), and 0.85 within the extremely high risk range (0.8 and above).
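The ascending threshold comparison in step 601 can be sketched in Python with the standard `bisect` module. The names `THRESHOLDS`, `RISK_RANGES`, and `risk_range` are illustrative, and the sketch assumes that each range includes its lower bound, following the "greater than or equal to" wording of the comparison rule.

```python
from bisect import bisect_right

# Thresholds and range names as given in step 601.
THRESHOLDS = [0.2, 0.4, 0.6, 0.8]
RISK_RANGES = ["extremely low", "low", "medium", "high", "extremely high"]


def risk_range(probability: float) -> str:
    """Return the risk level range for a comprehensive warning probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    # bisect_right counts the thresholds that are <= probability, so a
    # value exactly equal to a threshold falls into the higher range.
    return RISK_RANGES[bisect_right(THRESHOLDS, probability)]


for p in (0.15, 0.3, 0.5, 0.7, 0.85):
    print(p, risk_range(p))
```

Under this assumption a probability of exactly 0.2 is classified as low risk rather than extremely low risk.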
[0085] Step 602: Based on the risk level range, match the corresponding preset warning level to determine the preliminary warning level for each spatial location. Specifically, a correspondence between risk level ranges and preset warning levels is first established. Combining grassroots disaster prevention and mitigation needs with the geological disaster prevention and control procedures in mountainous counties, each range corresponds to a unique warning level. There are five warning levels, from low to high, corresponding to different prevention and control requirements: the extremely low risk range of 0 to 0.2 corresponds to no warning level; the low risk range of 0.2 to 0.4 corresponds to the blue warning level; the medium risk range of 0.4 to 0.6 corresponds to the yellow warning level; the high risk range of 0.6 to 0.8 corresponds to the orange warning level; and the extremely high risk range of 0.8 and above corresponds to the red warning level. The red warning level has the highest prevention and control requirements and requires activating an emergency response; the orange warning level is next and requires strengthened patrols and monitoring; the yellow warning level requires preparedness for prevention and control; the blue warning level requires strengthened monitoring; and no warning level indicates extremely low risk and requires only routine monitoring. This correspondence aligns with grassroots disaster prevention and mitigation capabilities and work priorities, and avoids unreasonable warning levels that could waste prevention and control resources or leave prevention and control inadequate. After the correspondence is established, the warning level corresponding to each range is checked for errors, and the correspondence is organized into a lookup list for later query and matching. For each spatial location, the warning level corresponding to the risk level range determined in step 601 is looked up and taken as the preliminary warning level, and the range and preliminary warning level of every location are checked to ensure there are no omissions or errors. For example, the preliminary warning level corresponding to a probability of 0.3 is blue, and the preliminary warning level corresponding to a probability of 0.7 is orange.
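The correspondence list and lookup in step 602 can be illustrated as a small table-driven sketch in Python. The `WarningLevel` class, the `LEVEL_BY_RANGE` table, and the location identifiers `P1` and `P2` are hypothetical names introduced for illustration; the range names, colors, and requirements are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningLevel:
    color: str        # "none", "blue", "yellow", "orange" or "red"
    requirement: str  # prevention and control requirement for the level


# Correspondence between risk level ranges and preset warning levels.
LEVEL_BY_RANGE = {
    "extremely low": WarningLevel("none", "routine monitoring"),
    "low": WarningLevel("blue", "strengthened monitoring"),
    "medium": WarningLevel("yellow", "preparedness for prevention and control"),
    "high": WarningLevel("orange", "strengthened patrols and monitoring"),
    "extremely high": WarningLevel("red", "activate emergency response"),
}


def preliminary_levels(range_by_location: dict[str, str]) -> dict[str, WarningLevel]:
    """Look up the preliminary warning level of every spatial location."""
    # Check for omissions first: every location must map to a known range.
    unknown = {loc: r for loc, r in range_by_location.items() if r not in LEVEL_BY_RANGE}
    if unknown:
        raise KeyError(f"no warning level defined for: {unknown}")
    return {loc: LEVEL_BY_RANGE[r] for loc, r in range_by_location.items()}


# Hypothetical locations: P1 has probability 0.3 (low), P2 has 0.7 (high).
levels = preliminary_levels({"P1": "low", "P2": "high"})
print(levels["P1"].color, levels["P2"].color)
```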
[0086] Step 603: Process the preliminary warning levels to generate warning areas of different levels, and obtain the meteorological risk warning result based on the warning areas and their corresponding warning levels. Specifically, the preliminary warning levels and corresponding geographic coordinates of all spatial locations are collected and organized into a complete dataset in which the warning level and coordinates of each location are accurately matched, with no missing data or mismatches; locations with missing coordinates or warning levels are excluded from warning area generation so that they do not affect the accuracy of the areas. The preliminary warning levels are then integrated. The core of this process is to consolidate spatial locations with the same warning level into continuous warning areas. The levels are processed sequentially from lowest to highest, starting with no warning level, followed by blue, yellow, orange, and red, to avoid confusion between levels. For each warning level, the discrete coordinate points are fitted based on the geographic coordinates of all locations at that level. Specifically, all discrete coordinate points of the same warning level are first sorted by longitude and latitude, and outliers with excessive coordinate deviations are eliminated. Then, taking the mountainous terrain into account and following the principles of proximity and conformity to terrain barriers, the sorted points are connected to form a closed contour, yielding a continuous polygonal region that is the warning area for that warning level. During fitting, terrain barriers such as ridges, valleys, and roads are considered so that the area boundaries conform to the actual terrain and no area crosses a terrain barrier. For example, discrete points on opposite sides of a ridge are not fitted into one continuous region, which keeps the warning areas reasonable and consistent with the actual spatial distribution of disasters. At the same time, isolated warning points covering too small an area are removed. A point is removed if its coverage area is less than 1 square kilometer and its straight-line distance from the warning area of the same level is greater than 1 kilometer. After removal, each warning area is continuous and reasonable, which facilitates grassroots prevention and control work and avoids wasted effort.
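The isolated-point criterion at the end of step 603 can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented fitting procedure: coordinates are assumed to be already projected to kilometers, the distance to "the warning area of the same level" is approximated by the distance to the nearest other point of that level, and terrain barriers are ignored. The `WarningPoint` class and both function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningPoint:
    x_km: float      # projected easting in kilometers (assumed)
    y_km: float      # projected northing in kilometers (assumed)
    level: str       # preliminary warning level, e.g. "orange"
    area_km2: float  # coverage area represented by the point


def is_isolated(point: WarningPoint, points: list[WarningPoint],
                max_area_km2: float = 1.0, max_gap_km: float = 1.0) -> bool:
    """Apply the removal rule: area < 1 km2 and more than 1 km from its level."""
    same_level = [o for o in points if o is not point and o.level == point.level]
    # With no other point of the same level, the distance is treated as infinite.
    nearest = min(
        (math.hypot(o.x_km - point.x_km, o.y_km - point.y_km) for o in same_level),
        default=math.inf,
    )
    return point.area_km2 < max_area_km2 and nearest > max_gap_km


def drop_isolated(points: list[WarningPoint]) -> list[WarningPoint]:
    """Return the points that survive a single pass of the removal rule."""
    return [p for p in points if not is_isolated(p, points)]


a = WarningPoint(0.0, 0.0, "orange", 0.5)
b = WarningPoint(0.5, 0.0, "orange", 0.5)
c = WarningPoint(5.0, 5.0, "orange", 0.5)  # small and far from other orange points
d = WarningPoint(5.2, 5.0, "yellow", 2.0)  # alone at its level, but large enough
kept = drop_isolated([a, b, c, d])
print(len(kept))
```

Both conditions must hold for removal, so a lone point covering 1 square kilometer or more, such as `d`, is kept.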
[0087] After the warning areas are generated, the warning level of each area is verified. Each area is checked against its geographic coordinates to clarify its spatial extent, and the geographic coordinates of the area boundaries are marked and compiled into a coordinate list so that grassroots staff can locate each area accurately and blind spots in prevention and control work are avoided. All warning areas and their corresponding warning levels are then summarized into a complete meteorological risk warning result. The result specifies the area extent for each warning level, together with the prevention and control requirements for that level, in line with grassroots disaster prevention and mitigation needs. The specific prevention and control requirements are as follows: red warning areas require activating the emergency response, arranging 24-hour duty, and preparing to evacuate residents; orange warning areas require strengthened patrols, conducted every 2 hours, to promptly identify potential disaster hazards; yellow warning areas require preparing prevention and control materials and assigning dedicated personnel for monitoring; blue warning areas require strengthened monitoring of precipitation and geological changes, with daily reporting of monitoring data; areas with no warning level require no special prevention and control measures and are monitored according to routine procedures.
[0088] In this embodiment, the determination of the warning level and the generation of the warning areas are in line with the actual needs of grassroots disaster prevention and mitigation. The warning probability is matched against the preset classification thresholds, invalid isolated points are eliminated, and continuous and reasonable warning areas are generated. The risk level and the prevention and control focus of each warning area are clearly defined, which addresses the problem of warning results that are too unspecific to guide action and improves the efficiency of grassroots disaster prevention and mitigation.
[0089] As shown in Figure 2, embodiments of the present invention also provide a meteorological risk early warning system based on multi-source data fusion, including:
[0090] The data acquisition module is used to acquire multi-source heterogeneous data, including geological and geomorphological element data and precipitation monitoring data;
[0091] The processing module is used to perform spatial granulation and structural reorganization on the multi-source heterogeneous data to generate spatial feature point clusters; generate a spatial base surface based on the spatial feature point clusters; define initial analytic elements on the spatial base surface and subdivide them according to the heterogeneity of the distribution of spatial feature point clusters to construct a grid system; and map each spatial feature point to its corresponding bottom-level grid and calculate a correction weight based on the aggregation characteristics of the feature points within each bottom-level grid;
[0092] The correction module is used to correct the geological and geomorphological element data using the correction weights to obtain the corrected geological environment parameter field, and, based on the corrected geological environment parameter field, to perform evaluation and calculation with a BP neural network model to obtain the geological disaster risk index.
[0093] The calculation module is used to calculate the daily comprehensive effective precipitation dataset based on precipitation monitoring data, and to obtain the precipitation critical probability based on the statistical correlation between the daily comprehensive effective precipitation dataset and geological hazards.
[0094] The fusion module is used to fuse the geological disaster risk index and the critical probability of precipitation based on the logistic regression model to obtain the comprehensive meteorological risk warning probability.
[0095] The matching module is used to match the preset meteorological disaster risk classification threshold value based on the comprehensive meteorological risk warning probability, determine the warning level, and obtain the meteorological risk warning result.
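The data flow between the six modules can be pictured with a minimal Python sketch. It only illustrates the order in which the modules hand results to each other for a single location; the `WarningSystem` class, its field names, and the stub functions in the usage example are hypothetical placeholders, not the models described in the method embodiments.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class WarningSystem:
    acquire: Callable[[], dict[str, Any]]            # data acquisition module
    process: Callable[[dict[str, Any]], Any]         # processing module: correction weights
    correct: Callable[[dict[str, Any], Any], float]  # correction module: risk index
    calculate: Callable[[dict[str, Any]], float]     # calculation module: precipitation probability
    fuse: Callable[[float, float], float]            # fusion module: warning probability
    match: Callable[[float], str]                    # matching module: warning level

    def run(self) -> str:
        """Chain the modules in the order given in the system description."""
        data = self.acquire()
        weights = self.process(data)
        risk_index = self.correct(data, weights)
        precipitation_probability = self.calculate(data)
        probability = self.fuse(risk_index, precipitation_probability)
        return self.match(probability)


# Usage with stub modules; every value and rule below is a placeholder.
demo = WarningSystem(
    acquire=lambda: {"geology": [0.6], "precipitation": [35.0]},
    process=lambda data: [1.0],
    correct=lambda data, weights: 0.6,
    calculate=lambda data: 0.5,
    fuse=lambda risk, precip: (risk + precip) / 2,  # stand-in for the logistic regression
    match=lambda p: "yellow" if 0.4 <= p < 0.6 else "other",
)
result = demo.run()
print(result)
```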
[0096] It should be noted that this system is a system corresponding to the above method. All implementation methods in the above method embodiments are applicable to this embodiment and can achieve the same technical effect.
[0097] Embodiments of the present invention also provide a computing device, including: a processor and a memory storing a computer program, wherein the computer program, when executed by the processor, performs the method described above. All implementations in the above method embodiments are applicable to this embodiment and can achieve the same technical effects.
[0098] The above description represents the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be considered within the scope of protection of the present invention.
Claims
1. A meteorological risk early warning method based on multi-source data fusion, characterized in that the method includes:
Step 1: Acquire multi-source heterogeneous data, including geological and geomorphological element data and precipitation monitoring data;
Step 2: Perform spatial granulation processing on the multi-source heterogeneous data, transforming the geological and geomorphological element data and the precipitation monitoring data into discrete spatial feature points to generate a set of spatial feature points; based on the set of spatial feature points, perform structural reorganization processing, clustering according to the spatial proximity and attribute similarity between spatial feature points to generate spatial feature point clusters; based on the spatial feature point clusters, calculate and determine the outermost boundary point set containing all spatial feature point clusters, and connect the boundary points sequentially to obtain a convex polyhedron envelope surface, which is used as the spatial base surface; divide the spatial base surface according to a preset fixed side length to generate initial analytic elements; based on the distribution density and attribute differences of the spatial feature point clusters within each initial analytic element, calculate a heterogeneity index for each initial analytic element; further divide each initial analytic element whose heterogeneity index exceeds a preset threshold into smaller analytic units until the distribution of spatial feature point clusters within all generated analytic units meets a preset uniformity standard, thereby constructing a grid system, the finest unit of which is defined as the bottom-level grid; map each spatial feature point to its corresponding bottom-level grid, and calculate a correction weight based on the aggregation characteristics of the feature points within each bottom-level grid;
Step 3: Based on the correction weights, weight and adjust the spatial feature point attribute values corresponding to the geological and geomorphological element data belonging to each bottom-level grid to generate rasterized attribute benchmark values; based on the rasterized attribute benchmark values, construct an interpolation network that calculates the parameter values of any point in the study area from the rasterized attribute benchmark values of adjacent grid cells, to generate a corrected geological environment parameter field; based on the corrected geological environment parameter field, perform evaluation and calculation with a BP neural network model to obtain the geological disaster risk index;
Step 4: Based on the precipitation monitoring data, obtain the real-time and historical multi-day raw precipitation sequences for each precipitation monitoring station within the study area; based on the raw precipitation sequences, define an initial circular spatial influence range centered on each precipitation monitoring station; iteratively expand the radius of the initial circular spatial influence range, counting the number of newly included neighboring stations after each expansion, and stop expanding when the number of neighboring stations reaches a preset gain threshold to determine the final effective influence area; extract the raw precipitation sequences within the final effective influence area and calculate the daily comprehensive effective precipitation dataset using an attenuation-weighted accumulation method; based on the daily comprehensive effective precipitation dataset and historical geological disaster records, perform statistical analysis to establish a correspondence between the daily comprehensive effective precipitation value intervals and the frequency of geological disasters; based on this correspondence, map each value in the daily comprehensive effective precipitation dataset to the corresponding frequency of geological disasters to obtain the corresponding precipitation critical probability;
Step 5: Based on the logistic regression model, fuse the geological disaster risk index and the precipitation critical probability to obtain the comprehensive meteorological risk warning probability;
Step 6: Based on the comprehensive meteorological risk warning probability, match the preset meteorological disaster risk classification thresholds, determine the warning level, and obtain the meteorological risk warning result.
2. The meteorological risk early warning method based on multi-source data fusion according to claim 1, characterized in that mapping each spatial feature point to its corresponding bottom-level grid and calculating a correction weight based on the aggregation characteristics of the feature points within each bottom-level grid includes: mapping all spatial feature points to the grid system to establish the affiliation relationship between each spatial feature point and a bottom-level grid; based on the affiliation relationship, statistically analyzing and calculating the attribute values of all spatial feature points belonging to the same bottom-level grid to generate aggregation characteristics; and processing the aggregation characteristics according to preset weight calculation rules to generate the correction weights.
3. The meteorological risk early warning method based on multi-source data fusion according to claim 2, characterized in that performing evaluation and calculation with a BP neural network model based on the corrected geological environment parameter field to obtain the geological disaster risk index includes: extracting, from the corrected geological environment parameter field, a set of location point parameter values that match the input structure of the BP neural network model to form a multi-parameter input tensor; inputting the multi-parameter input tensor into a pre-trained BP neural network model, and using the hidden layers of the BP neural network model to calculate and generate intermediate layer feature data; and receiving and processing the intermediate layer feature data at the output layer of the BP neural network model to obtain the geological disaster risk index.
4. The meteorological risk early warning method based on multi-source data fusion according to claim 3, characterized in that inputting the multi-parameter input tensor into the pre-trained BP neural network model, using the hidden layers of the BP neural network model to calculate and generate intermediate layer feature data, and receiving and processing the intermediate layer feature data at the output layer of the BP neural network model to obtain the geological disaster risk index includes: inputting the multi-parameter input tensor into the input layer of the pre-trained BP neural network model; receiving and processing, at the first hidden layer of the BP neural network model, the multi-parameter input tensor from the input layer, performing a first round of nonlinear transformation and weighted calculation, and generating first intermediate layer feature data; receiving and processing, at the second hidden layer of the BP neural network model, the first intermediate layer feature data, performing a second round of nonlinear transformation and weighted calculation, and generating second intermediate layer feature data; and receiving and processing, at the output layer of the BP neural network model, the second intermediate layer feature data and performing the final linear or nonlinear calculation to obtain the geological disaster risk index.
5. The meteorological risk early warning method based on multi-source data fusion according to claim 4, characterized in that extracting the raw precipitation sequences within the final effective influence area, calculating the daily comprehensive effective precipitation dataset using an attenuation-weighted accumulation method, and performing statistical analysis based on the daily comprehensive effective precipitation dataset and historical geological disaster records to establish the correspondence between the daily comprehensive effective precipitation value intervals and the frequency of geological disasters includes: based on the raw precipitation sequences within the final effective influence area and a preset attenuation coefficient sequence, generating the daily comprehensive effective precipitation corresponding to each central station through weighted calculation and accumulation, thereby generating the daily comprehensive effective precipitation dataset; based on the daily comprehensive effective precipitation dataset and historical geological disaster records, dividing the values in the daily comprehensive effective precipitation dataset into multiple continuous value intervals, and counting the number of daily comprehensive effective precipitation samples and the number of geological disaster occurrence samples in each value interval; calculating the frequency of geological disaster occurrence based on the number of geological disaster occurrence samples and the number of daily comprehensive effective precipitation samples; and establishing, based on all value intervals and the corresponding frequencies of geological disasters, the correspondence between the daily comprehensive effective precipitation value intervals and the frequency of geological disasters.
6. The meteorological risk early warning method based on multi-source data fusion according to claim 5, characterized in that fusing the geological disaster risk index and the precipitation critical probability based on the logistic regression model to obtain the comprehensive meteorological risk warning probability includes: constructing a fused feature vector based on the geological disaster risk index and the precipitation critical probability; inputting the fused feature vector into a pre-trained logistic regression model, which performs a linear combination calculation on the components of the fused feature vector to generate a linear weighted sum; performing, based on the linear weighted sum, a nonlinear transformation through a logistic function to generate an intermediate probability value; and obtaining, based on the intermediate probability value, the comprehensive meteorological risk warning probability through normalization.
7. The meteorological risk early warning method based on multi-source data fusion according to claim 6, characterized in that matching the preset meteorological disaster risk classification thresholds based on the comprehensive meteorological risk warning probability, determining the warning level, and obtaining the meteorological risk warning result includes: comparing the comprehensive meteorological risk warning probability with the preset meteorological disaster risk classification thresholds to obtain the risk level range in which the comprehensive meteorological risk warning probability falls; matching, based on the risk level range, the corresponding preset warning level to determine the preliminary warning level for each spatial location; and processing the preliminary warning levels to generate warning areas of different levels, and obtaining the meteorological risk warning result based on the warning areas of different levels and their corresponding warning levels.
8. A meteorological risk early warning system based on multi-source data fusion, the system implementing the method of any one of claims 1 to 7, characterized in that the system includes:
a data acquisition module, used to acquire multi-source heterogeneous data, including geological and geomorphological element data and precipitation monitoring data;
a processing module, used to perform spatial granulation and structural reorganization on the multi-source heterogeneous data to generate spatial feature point clusters; generate a spatial base surface based on the spatial feature point clusters; define initial analytic elements on the spatial base surface and further subdivide them according to the heterogeneity of the distribution of spatial feature point clusters to construct a grid system; and map each spatial feature point to its corresponding bottom-level grid and calculate a correction weight based on the aggregation characteristics of the feature points within each bottom-level grid;
a correction module, used to correct the geological and geomorphological element data using the correction weights to obtain the corrected geological environment parameter field, and, based on the corrected geological environment parameter field, to perform evaluation and calculation with a BP neural network model to obtain the geological disaster risk index;
a calculation module, used to calculate the daily comprehensive effective precipitation dataset based on the precipitation monitoring data, and to obtain the precipitation critical probability based on the statistical correlation between the daily comprehensive effective precipitation dataset and geological disasters;
a fusion module, used to fuse the geological disaster risk index and the precipitation critical probability based on the logistic regression model to obtain the comprehensive meteorological risk warning probability; and
a matching module, used to match the preset meteorological disaster risk classification thresholds based on the comprehensive meteorological risk warning probability, determine the warning level, and obtain the meteorological risk warning result.
9. A computing device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any one of claims 1 to 7.