Land subsidence spatial structure prediction and driving mechanism partitioning method
By constructing a multi-source explanatory variable database and a nonlinear mapping model, and combining explanatory variable contribution analysis and spatial clustering, the problem of regional-scale land subsidence prediction and driving mechanism differentiation was solved, realizing the prediction of subsidence spatial structure and the zoning of driving mechanisms, and supporting regional differentiated prevention and control.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- INST OF GEOGRAPHICAL SCI & NATURAL RESOURCE RES CAS
- Filing Date
- 2026-03-25
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies are insufficient for predicting land subsidence and identifying differences in driving mechanisms at the regional scale. They lack quantitative analysis of subsidence driving factors in different regions, making it difficult to support differentiated prevention and control in different regions. Furthermore, they lack a systematic characterization of the spatial structure characteristics of land subsidence.
By constructing a multi-source explanatory variable database, performing spatial registration and resampling, screening key explanatory variables, establishing a predictive model of nonlinear mapping relationships, conducting model training and accuracy evaluation, and combining explanatory variable contribution analysis and spatial clustering, the settlement driving mechanism is identified and partitioned.
It realizes the zoning of the prediction and driving mechanism of the spatial structure of ground subsidence, improves the applicability and scalability of regional-scale subsidence prediction, supports the optimization of geological disaster prevention and monitoring networks, and provides a scientific basis for regionally differentiated prevention and control.
Smart Images

Figure CN122241479A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of geological disaster monitoring and assessment technology, and in particular to a method for predicting and zoning land subsidence driving mechanisms using multi-source geographic information data and data-driven models. This method integrates geographic information systems, spatial data analysis, and interpretable model analysis techniques, and can be used for land subsidence risk assessment, monitoring network optimization, and regional prevention and control decision support in coastal areas. Background Technology
[0002] Land subsidence refers to the continuous or phased decline of the earth's surface caused by the combined effects of natural factors and human activities. It is one of the most common and widespread geological hazards in urbanized areas, plains, and areas where groundwater is extracted. Land subsidence reduces the water storage capacity of underground aquifers, induces ground fissures and surface deformation, causes damage to buildings and infrastructure, and significantly increases the risk of flooding, thus having a long-term impact on regional economic and social development and public safety.
[0003] Currently, the prediction and assessment of land subsidence mainly employs physical mechanism models and statistical analysis methods. Physical mechanism models are typically based on soil consolidation theory or numerical simulation methods, requiring a large number of high-precision hydrogeological parameters as input, such as elastic modulus, permeability coefficient, and pore water pressure. However, obtaining these parameters over large areas is difficult and costly, hindering the widespread application of such methods at regional scales.
[0004] With the development of remote sensing monitoring and geographic information technology, data-driven methods are increasingly being applied to land subsidence prediction. Existing technologies utilize machine learning models to predict subsidence using multi-source environmental variables, improving prediction accuracy to some extent. However, most existing prediction models are "black box models," making it difficult to explain the contribution and direction of each driving factor to subsidence formation. Existing technologies still have the following shortcomings:
[0005] (1) Existing studies usually analyze the driving factors as independent variables, and lack the identification of the differences in the subsidence driving mechanisms under different geological and hydrological backgrounds;
[0006] (2) Existing methods focus on settlement prediction itself and lack technical means to combine the prediction results with regionally differentiated prevention and control needs;
[0007] (3) In cases where data sources are complex and regional differences are significant, there is a lack of a comprehensive technical solution that can simultaneously achieve settlement prediction, explanation of driving mechanisms and regional zoning.
[0008] (4) Existing technologies lack a systematic characterization of the spatial structure characteristics of land subsidence, making it difficult to reveal the differences in the subsidence driving mechanisms in different regions from the perspective of spatial structure, thus limiting the effectiveness of land subsidence risk assessment and regional differentiated prevention and control.
[0009] Therefore, there is an urgent need for a technical method that can predict land subsidence at the regional scale, quantify the contribution of explanatory variables, and identify the subsidence driving mechanisms in different regions, so as to improve the scientific nature and pertinence of land subsidence risk assessment and regional prevention and control. Summary of the Invention
[0010] To address the shortcomings of existing technologies, the present invention aims to provide a method for predicting the spatial structure of land subsidence and zoning its driving mechanisms. This method enables the prediction of the spatial distribution characteristics of land subsidence at the regional scale, solves the problems of difficulty in interpreting subsidence prediction results and implementing differentiated regional prevention and control in existing technologies, and quantifies the contribution of driving factors to achieve automatic identification of subsidence driving modes in different regions. This provides a scientific basis for monitoring network optimization and regional prevention and control decisions.
[0011] To address the aforementioned technical problems, the present invention proposes a land subsidence prediction and driving mechanism zoning method, comprising the following steps:
[0012] Step 1: Obtain ground subsidence observation data of the study area, construct a ground subsidence sample database, and collect multi-source explanatory variable data related to ground subsidence;
[0013] Step 2: Spatial registration and resampling are performed on the settlement sample database and explanatory variable data to match data from different sources on a unified spatial grid, forming a standardized spatial dataset;
[0014] Step 3: Perform correlation analysis on the explanatory variables of the standardized spatial dataset to remove redundant explanatory variables; and determine the set of key explanatory variables through feature selection;
[0015] Step 4: Construct a ground subsidence prediction model based on the set of key explanatory variables, establish a nonlinear mapping relationship between the explanatory variables and the subsidence rate, and train the model;
[0016] Step 5: Use the trained model to make spatial predictions for the study area, generate spatial prediction results of the settlement rate for the study area; and evaluate the accuracy of the ground settlement prediction model.
[0017] Step 6: Based on the ground settlement prediction model, perform interpretability analysis to calculate the contribution and direction of each explanatory variable to the settlement prediction results;
[0018] Step 7: Based on the contribution characteristics of the explanatory variables and combined with spatial adjacency relationships, perform spatial clustering analysis to obtain the zoning results of the settlement driving mechanism of the study area.
[0019] Furthermore, in step 7, the key explanatory variable SHAP value of each sample point is used as the clustering input, and spatial continuity constraints are applied in combination with K-nearest neighbor relationships to construct the clustering objective function.
[0020] Furthermore, in step 7, the clustering results based on the driving mechanism are spatially overlaid with the administrative division boundaries to form a comprehensive zoning that takes into account both natural mechanisms and administrative management, providing an operational basis for regional differentiated prevention and control.
[0021] Compared with the prior art, the present invention has the following beneficial effects:
[0022] (1) To achieve the integration of prediction of spatial structure of ground subsidence and analysis of driving mechanism.
[0023] This invention not only enables the prediction of regional-scale ground subsidence, but also characterizes the spatial structural features of subsidence and analyzes the contribution of driving factors, thereby revealing the subsidence formation mechanism in different regions.
[0024] (2) Improve the applicability and scalability of regional-scale settlement prediction
[0025] This invention establishes a settlement prediction model based on multi-source data fusion, avoiding the dependence of traditional physical mechanism models on high-precision hydrogeological parameters. It can achieve settlement prediction over a wide area, and is especially suitable for risk assessment in areas where hydrogeological parameters are difficult to obtain or monitoring data is insufficient.
[0026] (3) Spatial partition identification of settlement driving mechanism
[0027] This invention conducts spatial cluster analysis based on the contribution characteristics of explanatory variables, which can identify the differences in subsidence driving patterns in different regions and provide a technical basis for differentiated regional governance.
[0028] (4) Support the optimization of geological disaster prevention and monitoring networks
[0029] The results of this invention on the distribution of settlement rate and the zoning of driving mechanisms can be used to guide the optimization of monitoring site layout, groundwater management and the formulation of engineering control measures.
[0030] (5) It has good engineering application value.
[0031] This invention can be applied to fields such as geological disaster risk assessment in coastal areas, land spatial planning, urban infrastructure safety assessment, and disaster prevention and mitigation decision support, and has broad application prospects. Attached Figure Description
[0032] To more clearly illustrate the technical solution of the present invention, the present invention will be further described below with reference to the accompanying drawings. The accompanying drawings are only used to explain the present invention and are not intended to limit the scope of protection of the present invention.
[0033] Figure 1 This is a spatial distribution map of sample points in an embodiment of the present invention.
[0034] Figure 2 This is a technical flowchart of the method of the present invention.
[0035] Figure 3 This is a graph showing the simulation results of the settlement rate in an embodiment of the present invention.
[0036] Figure 4 This is a graph showing the contribution results of explanatory variables in an embodiment of the present invention.
[0037] Figure 5 This is a spatial clustering partitioning result diagram of an embodiment of the present invention.
[0038] Figure 6 This is a graph showing the contribution results of the partitioned explanatory variables in an embodiment of the present invention. Detailed Implementation
[0039] This embodiment takes the coastal area of China as the study area and uses the method described in this invention to predict and partition the driving mechanism of land subsidence. The specific steps are as follows:
[0040] Step 1: Construct a settlement sample database and a multi-source explanatory variable database related to ground settlement.
[0041] In this embodiment, the ground subsidence observation data is based on a systematic review of Chinese and English literature from the past 10 years. Data sources include InSAR remote sensing data, GNSS ground monitoring station data, leveling data, and subsidence rate information from published literature. Subsidence locations and their corresponding vertical deformation rates are extracted by digitizing and georegistering the subsidence rate maps from the literature. This embodiment only retains subsidence records with subsidence rates > 0 mm / yr, excluding samples with positive vertical deformation. To supplement the zero-value area of ground subsidence, samples are randomly selected from recorded extremely low subsidence sensitivity areas as non-subsidence control points (zero-value points), referencing existing global subsidence sensitivity research results. These extremely low sensitivity areas are further determined by combining topographic and geological indicators such as bedrock exposure, elevation, and slope. For the same 1 km... 2 For each sampling point within a grid, its arithmetic mean is calculated and used as the representative value for that grid. After resampling and cleaning, the final dataset contains 9237 sample points, including 6081 settlement points and 3156 non-settlement points. The spatial distribution of sample points in some areas is shown below. Figure 1 As shown.
[0042] In this embodiment, the multi-source explanatory variables are selected from dimensions such as geology, soil, climate, vegetation, groundwater, regional functional classification, socio-economic development, and spatial proxy, totaling 32 explanatory variables. After subsequent correlation analysis, redundant explanatory variables need to be removed, so 18 explanatory variables are ultimately retained. This embodiment only lists the 18 ultimately retained explanatory variables. Figure 4 and Figure 6 The meanings of the explanatory variables in this context are explained below, including:
[0043] (1) Geological and soil condition data: soil and sediment thickness (SST), lithological classification (LTH);
[0044] (2) Climate conditions data: annual average precipitation (PRCP), average temperature of the hottest season (TEMPWQ);
[0045] (3) Topographic data: Digital Elevation Model (DEM), Slope (SLP);
[0046] (4) Groundwater dynamic data: Groundwater storage anomaly (GWSA), groundwater level (GWL), groundwater level depth (WTD), aquifer thickness (AQFT), topographic moisture index (TWI);
[0047] (5) Regional functional classification data: Regional functional system classification (LS), enhanced vegetation index (EVI);
[0048] (6) Socioeconomic activity data and other spatial proxy variables: mining land use (MINING), building height (BDHT), building weight (BDWT), distance to coastline (DCOAST), distance to railway (DRAIL).
[0049] All 18 explanatory variables mentioned above are well-known in the field and can be obtained through publicly available data.
[0050] The study area is a coastal region with complex and diverse coastal landforms. Precipitation increases from north to south, ranging from 400-600 mm to 1500-2500 mm annually. Hundreds of rivers flow into the sea along the coast, forming vast river deltas, providing a typical and diverse geological and hydrological background for the study of land subsidence.
[0051] Step 2: Data preprocessing and spatial unification.
[0052] To ensure consistency of data from different sources across spatial scales, the underlying database undergoes standardization processing, specifically including:
[0053] (1) Project all explanatory variable data to the same coordinate system and resample and align to 1 km. 2On regular grid cells with spatial resolution;
[0054] (2) Neighborhood filling is performed on explanatory variables with missing values at the land-sea interface. Continuous variables are filled with the mean of a 5×5 neighborhood, and categorical variables are filled with the mode, to ensure that each grid cell has complete variable values.
[0055] (3) Numerical encoding is performed on categorical variables. In this embodiment, ordinal encoding is used for two types of categorical variables: lithology and land system. Their categorical attributes are preserved, and the built-in categorical support of the XGBoost model is enabled during subsequent model regression.
[0056] The above steps form a standardized spatial dataset, thereby improving the stability and reliability of subsequent model training.
[0057] Step 3: Feature correlation analysis and selection of key explanatory variables.
[0058] To avoid overfitting of the model due to collinearity among high-dimensional explanatory variables and to prevent interference with the accurate determination of the explanatory contributions of each factor, this embodiment adopts a two-step feature selection strategy:
[0059] The first step is to remove redundant variables based on the Spearman correlation coefficient.
[0060] The Spearman correlation coefficient between explanatory variables is calculated using the following formula.
[0061]
[0062] in, Let be the correlation coefficient between the i-th and j-th explanatory variables, and n be the number of sample points. For each variable, sort them in ascending order of observations, and denote the rank of the k-th sample with respect to variables i and j as follows: , ,but , where is the difference in rank between the two variables in the k-th sample. For variable pairs with an absolute correlation coefficient exceeding the threshold of 0.8, the explanatory variable with the higher absolute Spearman correlation coefficient with the settling rate is retained, while the other explanatory variable is removed.
[0063] The second step is the final selection of variables based on forward feature selection.
[0064] Forward Feature Selection (FFS) was employed to determine the set of key explanatory variables. Starting with an empty feature set, the negative mean squared error (Neg-MSE) under 5-fold cross-validation was used as the evaluation metric. A greedy strategy was employed to iteratively add features that maximized the model's generalization ability. FFS can automatically detect and remove variables that are counterproductive to the objective, effectively improving the model's generalization ability and ultimately determining the set of key explanatory variables used for settlement prediction.
[0065] The above two-step strategy is used to determine the set of key explanatory variables for settlement prediction.
[0066] Step 4: Establish a ground subsidence prediction model.
[0067] Based on the selected set of key explanatory variables, this embodiment uses the XGBoost algorithm to construct a settlement prediction model to describe the nonlinear relationship between the explanatory variables and the settlement rate:
[0068]
[0069] in, Let x be the predicted settlement rate of the i-th grid cell. i For the explanatory variable vector, Let M be the m-th regression tree model, where M is the number of regression trees. This represents the overall integrated function model.
[0070] The objective function for model optimization is as follows:
[0071]
[0072] Among them, y i This represents the actual settlement rate. Let i be the predicted settlement rate of the i-th grid cell. The number of sample points. For the number of regression trees, Let be the regularization term for the m-th decision tree. The loss function is calculated using the mean squared error in this example.
[0073] The expression for the regularization term is as follows:
[0074]
[0075] Where γ is the structural penalty coefficient, T is the number of leaf nodes, λ is the weight penalty coefficient, and w is the leaf node weight vector. This is a regularization term for a single decision tree, used to control model complexity and prevent overfitting.
[0076] In this embodiment, to objectively evaluate model performance and perform hyperparameter tuning, a nested cross-validation strategy is adopted, combined with the Conditional Latin Hypercube Sampling (cLHS) algorithm to ensure the representativeness of sample features. cLHS selects n representative sample points from a multidimensional auxiliary variable space containing N candidate points. Assuming the auxiliary variable set contains K continuous variables and L categorical variables, the algorithm minimizes the comprehensive objective function. To balance the representativeness of the sample, among which ( For continuous variable distribution matching degree, For the proportion of categorical variables, For the correlation matrix matching degree, , , These represent the weights, which are all set to 1 in this embodiment.
[0077] For continuous variable distribution matching, the aim is to ensure that samples achieve uniform coverage of the marginal distributions of each continuous variable. The range of values for each continuous variable is divided into n strata, and the number of samples actually falling into each stratum is calculated:
[0078]
[0079] in, This represents the number of samples for the k-th continuous variable in the j-th layer.
[0080] The proportion matching degree for categorical variables aims to ensure that the proportion of each category in the sample is consistent with that in the population.
[0081]
[0082] in, For the first The total number of categories for a categorical variable. For the first in the sample Class ratio, For the whole The proportion of classes.
[0083] The correlation matrix matching degree aims to maintain the correlation structure among variables:
[0084]
[0085] in, These are the elements of the correlation coefficient matrix between sample variables. These are the elements of the correlation coefficient matrix among the population variables.
[0086] In the outer loop, the entire sample is divided into training and test sets in a 9:1 ratio using the cLHS algorithm with 1000 iterations. This ensures the test set's distribution in the multidimensional covariate space is consistent with the entire set, avoiding spatial autocorrelation errors caused by random sampling of the test set. This process is repeated 10 times, generating 10 independent test sets. In each outer loop iteration, the corresponding training set data is further divided into 5 folds using cLHS. In each inner iteration, 4 / 5 of the data (4 folds) is used for model calibration and parameter fitting; the remaining 1 / 5 (1 fold) is used for model validation to evaluate the generalization performance of the current hyperparameter combination. This 5-fold rotation ensures that each fold of samples is used as a validation set, reducing the dependence of hyperparameter selection on the distribution of a specific subset.
[0087] The Optuna framework is used to tune the hyperparameters of the calibration set in each inner loop. This method is based on a Bayesian optimization algorithm using a Tree-structured Parzen Estimator (TPE). After each iteration, a surrogate probability model is constructed based on the existing evaluation results to intelligently infer the most promising parameter regions, finding better parameter combinations within a finite number of iterations. Each set of hyperparameter configurations undergoes 100 iterations, with the optimization objective being the negative mean square error on the validation set. The optimized hyperparameters include:
[0088] n_estimators: Number of base learners, ranging from [100, 500];
[0089] max_depth: The maximum depth of the tree, ranging from [3, 7];
[0090] learning_rate: learning rate, ranging from [0.005, 0.1];
[0091] min_child_weight: The minimum sum of weights of leaf node samples, ranging from [1, 20].
[0092] subsample: The sampling ratio of training samples, ranging from [0.6, 1.0];
[0093] colsample_bytree: Feature sampling ratio, range [0.6, 1.0];
[0094] reg_alpha: L1 regularization coefficient, ranging from [0.001, 1];
[0095] reg_lambda: L2 regularization coefficient, ranging from [0.1, 5].
[0096] Step 5: Spatial prediction and model evaluation.
[0097] In this step, the trained model is used to make spatial predictions for the study area. The models trained in 10 outer loops are integrated, and the arithmetic mean is used to generate the final prediction result, which generates a continuous subsidence rate distribution with a resolution of 1 km for the study area. The subsidence rate prediction result is used to characterize the spatial structure features of ground subsidence, including spatial distribution patterns and spatial heterogeneity.
[0098]
[0099] Where d is the number of repetitions of the outer loop, which is 10 in this example. For the first The optimal model obtained by training the outer loop. This is the final integrated model's prediction of the settlement rate for the grid cell at location x. The prediction results are output in a spatially continuous distribution to characterize the spatial structure of ground settlement, including high-value areas, low-value areas, and spatial gradient changes. Based on this, four key regions are identified and analyzed. Figure 3 ).
[0100] In this step, the coefficient of determination R is used. 2 The model is evaluated using metrics such as mean squared error (RMSE) and mean absolute error (MAE). In this embodiment, the model's predictive performance is as follows: R 2 =0.672±0.034, RMSE=5.733±0.206 mm / yr, MAE=3.698±0.108 mm / yr. The spatial autocorrelation of the residuals remained at a low level (Moran's I=0.066±0.023), indicating that there was no significant spatial clustering of the model residuals and no systematic geographical bias.
[0101] Simultaneously, prediction uncertainty is assessed from two dimensions: relative dispersion and absolute fluctuation range. The coefficient of variation (CV, the ratio of standard deviation to mean) is introduced to measure the spatial distribution of uncertainty in the model's prediction results across multiple iterations; a higher CV value indicates higher uncertainty in the prediction results for that region. Furthermore, the range of the simulated values for each grid cell (the difference between the 90th and 10th percentiles) is calculated to visually represent the prediction confidence bandwidth.
[0102] The forecast results indicate that the coastal area exceeds 2.02 × 10 5 km 2The subsidence rate in 41.3% of the study area is >3 mm / yr. Subsidence hotspots are concentrated in deltas and densely populated areas within the study area. Furthermore, the model reveals moderate-intensity (3–10 mm / yr) continuous subsidence corridors in the urban-rural transition zone.
[0103] Step 6: Analysis of the contribution of explanatory variables.
[0104] To ensure the interpretability of the settlement prediction model, this embodiment employs both prior and posterior interpretation methods:
[0105] (1) Prior interpretation: The importance of the built-in Gain feature of XGBoost is used as the core indicator to quantify the reduction in the average loss function brought about by each feature when it is a split node in all decision trees, thereby measuring the contribution of different features to the overall prediction performance of the model.
[0106] (2) Posterior interpretation: Based on the game theory-based SHAP (SHapley Additive exPlanations) method, the model interpretation expression is constructed as follows:
[0107]
[0108] in, The contribution value (SHAP value) of the j-th explanatory variable, where p is the number of explanatory variables. The baseline value represents the expected value of all sample predictions, which is the average predicted output of the model without considering any feature contributions.
[0109] Contribution value calculation:
[0110]
[0111] In the formula, The factorial of the number of elements in subset S. The factorial of the number of elements in the feature set F after feature filtering. The number of elements in the feature set F (i.e., the total number of explanatory variables). The number of elements in the feature subset S that does not contain feature j. To increase the model's predicted value by adding feature j to the feature subset S, The predicted values of the model when only a subset of features S is used.
[0112] By calculating the SHAP values of each explanatory variable, not only can the degree of influence of the explanatory variables on the settlement prediction results be quantified, but the direction of their effect (positive or negative) can also be determined, thereby achieving a quantitative analysis of the settlement driving mechanism. Figure 4 ).
[0113] (3) Feature importance measurement
[0114] For a single sample, the absolute value of the SHAP value It reflects the strength of the feature's influence on the prediction results; the sign of the SHAP value indicates the direction of the influence. Explain the characteristics This increases the risk of settlement. For the entire dataset, global feature importance is obtained by calculating the average absolute value of the SAP values for all samples:
[0115]
[0116] in, For the first Features in each sample The SHAP value, where n is the number of samples.
[0117] In this embodiment, the prior interpretation analysis results show that soil and sediment thickness (SST) is the most critical explanatory variable, followed by average annual precipitation (PRCP) and average temperature of the hottest season (TEMPWQ), with groundwater storage anomaly (GWSA) ranking fourth. The posterior interpretation analysis further reveals the nonlinear effects and directionality of the explanatory variables: negative GWSA values correspond to positive SHAP values, indicating that groundwater depletion is associated with a higher risk of subsidence; high TEMPWQ values also correspond to a higher probability of subsidence; high SST values tend to contribute positively to SHAP, confirming that deep, loose sedimentary layers are more prone to compaction.
[0118] Step 7: Driven mechanism spatial clustering partitioning.
[0119] Based on the contribution characteristics of explanatory variables, this embodiment employs a spatially constrained clustering method based on SKATER (Spatial 'K'luster Analysis by Tree Edge Removal) for partitioning. This method uses the key explanatory variable SHAP value of each sample point as the clustering input, and applies spatial continuity constraints based on K-nearest neighbor relationships to construct the clustering objective function:
[0120] Spatial adjacency constraints are added to achieve region continuity. Based on Constructing a spatial adjacency graph based on neighbor relationships .in, This represents the vertex set, i.e., all sample points; Denotes the edge set, if the sample and If spatial adjacency (K-nearest neighbor relationship) exists, then an edge exists. .
[0121] The SKATER algorithm iteratively removes edges from the minimum spanning tree, ensuring that each cluster is a connected subgraph, thus guaranteeing the spatial continuity of the partitions.
[0122]
[0123] in, Contribution vector to the explanatory variables (i.e., the SHAP value vector of each sample point). For the k-th type region, K represents the cluster center, and K is the number of clusters, which is determined by combining the pseudo-F statistic (the ratio of between-group to within-group variance) and the sample size in each category.
[0124] By spatially overlaying the clustering results with the administrative boundaries of prefecture-level cities, a sub-region integrating natural mechanisms and administrative management is ultimately formed. Figure 5 ).
[0125] Unlike directly using raw hydrogeological data, clustering based on SHAP values can identify areas with similar subsidence driving mechanisms, not just those with similar climate or geological conditions. Combining the clustering results, the areas are categorized according to variable importance, SHAP characteristic patterns, and regional background. If subsidence and SHAP patterns have a significant coupling relationship with natural hydrological-topographic variables such as precipitation, temperature, groundwater, and DEM, they are initially classified as the natural hydrological-topographic dominant category (candidates C1-C6). If subsidence mainly occurs suddenly or is concentrated near or along infrastructure such as mining sites, foundation pits, building complexes, and railways, and is not sensitive to seasonal variations in regional climate and groundwater, it is initially classified as the engineering-infrastructure dominant category (candidates C7-C8). Within the natural hydrological-topographic dominant category, further subdivisions are made based on dominant variables, regional topography, and location conditions. Within the engineering / infrastructure dominant category, C7 and C8 are distinguished based on whether the disturbance surface is isal (mining / new town) or linear (railway). In this embodiment, the clustering results divide the study area into four driving mechanism zones (…). Figure 5 , Figure 6 The characteristics of each partition are as follows:
[0126] Cluster 1 (C1): Northern Coastal – Climate-Groundwater Coupled Type. C1 encompasses all coastal cities in the north. This region is characterized by deep Quaternary sedimentary layers and extensive alluvial plains, with high-intensity agricultural and industrial groundwater extraction. In C1, annual average precipitation and average temperature of the hottest season are the most important, and groundwater-related variables (GWSA, GWL) show a strong positive SHAP contribution, indicating that groundwater depletion is the dominant driver of subsidence in this region.
[0127] Cluster 2 (C2): Central Coastal – Sedimentation-Hydrology-Load Synergistic Type. C2 includes cities along the central coast. This region is located in a deltaic zone and is characterized by extremely thick soft soil deposits, a high groundwater level, and intensive urbanization. In C2, precipitation and subsidence show an inverse relationship: the lower the precipitation, the higher the corresponding positive SHAP value, reflecting the high natural groundwater level in the region and the crucial role of precipitation in maintaining pore pressure balance.
[0128] Cluster 3 (C3): Southeast Coastal – Moderate Background Controlled Type. C3 consists of southeastern coastal cities. This region is characterized by mountainous coastal topography, thin sedimentary layers, crystalline bedrock along the coast, and relatively low groundwater extraction intensity. DEM (Depth-Earth Elevation) is a relatively more important explanatory variable in C3, with low-elevation coastal plains showing higher positive SHAP values. As shown in the figure, annual precipitation (PRCP) is the largest explanatory variable, topography (DEM) is the second largest, and the hottest season mean temperature (TEMPWQ) is relatively less important, ranking seventh. This region is less sensitive to temperature changes, and topography plays a minor moderating role in hydrological processes within the background environment.
[0129] Cluster 4 (C4): Southern Coast – Topographically Constrained. C4 includes southern coastal cities. This region features highly differentiated sediment distribution and a tropical monsoon climate. The DEM (Depth Equation of Earth's Microstructure) is the top explanatory variable in C4, reflecting the region's extremely high topographic heterogeneity and the concentration of subsidence in low-lying delta and coastal plain environments. The importance of the hottest season mean temperature (TEMPWQ) is significantly increased (ranked 3rd), reflecting the high coupling between hydrothermal conditions and groundwater dynamics under topographic constraints, and extremely high topographic sensitivity. A notable feature of C4 is the positive GWSA value corresponding to a positive SHAP contribution, which differs from the pattern in regions C1 to C3. This may be due to groundwater accumulation in karst and fractured rock aquifers, leading to reduced drainage efficiency and increased pore pressure, thus triggering localized land subsidence.
[0130] In addition, there are other potential typical settlement-driving areas, including:
[0131] C5: Salinized Compression Type. These areas are typically very close to the coastline and are affected by tidal dynamics, seawater intrusion, and highly saline clayey soils. Due to the high salt content of the soil, changes in chemical osmotic pressure caused by dynamic changes in groundwater lead to microscopic collapse of the soil structure, resulting in continuous and slow settlement.
[0132] C6: Vegetation-soil moisture transpiration driven type. This type of area is usually located in non-urbanized areas with deep expansive soil or clay layers. During extremely dry or hot seasons, high-coverage vegetation excessively extracts shallow soil moisture through strong transpiration, causing the clay soil to lose water and shrink, triggering periodic subsidence of the land surface.
[0133] C7: Engineering Disturbance Dominant Type. These areas are typically located in mining areas or large newly developed urban areas. Settlement does not change with seasonal rainfall or groundwater level fluctuations, but rather exhibits sudden or localized settlement that is related to underground space excavation, forced groundwater drainage, or instantaneous loading height of high-rise building complexes.
[0134] C8: Infrastructure Axial Compression Type. These areas are typically located around existing railways, primarily driven by the soil compaction effect caused by long-term traffic load vibrations and the continuous underground drainage effect created during corridor construction.
[0135] The above steps are intended to output the following results:
[0136] (1) Spatial distribution of subsidence rate: Spatial prediction map of land subsidence rate along the Chinese coast with a resolution of 1 km ( Figure 3 );
[0137] (2) Explanatory variable contribution results: including the global feature importance ranking (prior and posterior), the contribution degree and direction of each explanatory variable to the settlement prediction results ( Figure 4 );
[0138] (3) Settlement driving mechanism partitioning results: Four types of driving mechanism partitioning based on SHAP value space clustering and the contribution characteristics of explanatory variables in each partition ( Figure 5 , Figure 6 ).
[0139] The above results provide a scientific basis for formulating differentiated prevention and control strategies for different regions: Region C1 should focus on controlling deep groundwater extraction and strengthening drought early warning; Region C2 should focus on soft soil engineering reinforcement and groundwater level maintenance; Region C3 should maintain low-intensity extraction and strengthen ecological protection; Region C4 should improve drainage systems and optimize groundwater management.
[0140] The above-described embodiments are merely illustrative of several implementations of the present invention, and while the descriptions are relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the protection scope of the present invention.
[0141] In addition to the embodiments described above, the present invention may have other implementations. All technical solutions formed by equivalent substitution or equivalent transformation fall within the protection scope claimed by the present invention.
Claims
1. A method for predicting the spatial structure and driving mechanism of land subsidence, comprising the following steps: Step 1: Obtain ground subsidence observation data of the study area, construct a ground subsidence sample database, and collect multi-source explanatory variable data related to ground subsidence; Step 2: Spatial registration and resampling are performed on the settlement sample database and explanatory variable data to match data from different sources on a unified spatial grid, forming a standardized spatial dataset; Step 3: Perform correlation analysis on the explanatory variables of the standardized spatial dataset to remove redundant explanatory variables; and determine the set of key explanatory variables through feature selection; Step 4: Construct a ground subsidence prediction model based on the set of key explanatory variables, establish a nonlinear mapping relationship between the explanatory variables and the subsidence rate, and train the model; Step 5: Use the trained model to make spatial predictions for the study area, generate spatial prediction results of the settlement rate for the study area; and evaluate the accuracy of the ground settlement prediction model. Step 6: Based on the ground settlement prediction model, perform interpretability analysis to calculate the contribution and direction of each explanatory variable to the settlement prediction results; Step 7: Based on the contribution characteristics of the explanatory variables and combined with spatial adjacency relationships, perform spatial clustering analysis to obtain the zoning results of the settlement driving mechanism of the study area.
2. The method according to claim 1, characterized in that: In step 1, the ground subsidence observation data includes remote sensing monitoring data, historical survey data, or ground monitoring station data, and the explanatory variable data includes geological and soil condition data, climate condition data, topographic data, groundwater dynamic data, regional functional classification data, socio-economic activity data, and other spatial proxy variable data.
3. The method according to claim 1, characterized in that: In step 2, the spatial registration and resampling process includes uniformly converting all explanatory variable data to a regular grid with the same spatial resolution.
4. The method according to claim 1, characterized in that: In step 3, the feature selection includes calculating the correlation between explanatory variables to remove redundant explanatory variables, and using a forward feature selection method to determine the set of key explanatory variables.
5. The method according to claim 1, characterized in that: In step 4, the settlement prediction model is established using a machine learning regression model, and the model's prediction performance is improved through nested cross-validation, conditional Latin hypercube sampling, and Bayesian hyperparameter optimization.
6. The method according to claim 1, characterized in that: In step 5, the models obtained from the training of the 10 outer loops are integrated, and the final prediction result is generated by taking the arithmetic mean, thus generating the continuous settlement rate distribution of the study area.
7. The method according to claim 1, characterized in that: In step 6, the contribution values of the explanatory variables are calculated by using prior interpretation and posterior interpretation respectively to achieve model interpretability analysis, so as to determine the positive or negative impact of the explanatory variables on the settlement prediction results.
8. The method according to claim 1, characterized in that: In step 7, the key explanatory variable SHAP value of each sample point is used as the clustering input, and spatial continuity constraints are applied in combination with K-nearest neighbor relationship to construct the clustering objective function.
9. The method according to claim 1, characterized in that: In step 7, the clustering results based on the driving mechanism are spatially overlaid with the administrative division boundaries to form a comprehensive zoning that takes into account both natural mechanisms and administrative management, providing an operational basis for regional differentiated prevention and control.
10. The method according to claim 1, characterized in that: The settling driving mechanism in step 7 includes one or more of the following types: Climate-groundwater coupled type: The groundwater storage anomaly GWSA and groundwater level GWL show a strong positive SHAP contribution, indicating that groundwater depletion is the dominant driving force of subsidence in this region; Sedimentation-hydrology-load synergistic type: precipitation and settlement show an inverse relationship: the lower the precipitation, the higher the corresponding positive SHAP value, reflecting the high natural groundwater level in the region and the key role of precipitation in maintaining pore pressure balance; Moderate background controlled type: annual precipitation PRCP is the number one explanatory variable, digital elevation model DEM is the number two explanatory variable, and the average temperature of the hottest season TEMPWQ ranks no higher than 7. Terrain-constrained type: The Digital Elevation Model (DEM) is the number one explanatory variable, the hottest season mean temperature (TEMPWQ) ranks no less than third, and the positive value of the groundwater storage anomaly (GWSA) corresponds to the positive contribution of SHAP. Salinized compression type: Located near the coastline, affected by tidal dynamics, seawater intrusion and highly salinized clay soil, the soil has a high salt content. In the dynamic changes of groundwater, the changes in chemical osmotic pressure cause microscopic collapse of the soil structure, resulting in continuous slow settlement. Vegetation-soil moisture transpiration driven type: Located in non-urbanized areas with deep expansive soil or clay layers, during extremely dry or hot seasons, high-coverage vegetation excessively extracts shallow soil moisture through strong transpiration, causing the clay soil to lose water and shrink, triggering periodic subsidence of the land surface. Engineering disturbance-driven type: Located in mining areas or large newly developed cities, the settlement does not change with seasonal precipitation or groundwater level fluctuations, and exhibits sudden or localized settlement related to underground space excavation, forced groundwater drainage or instantaneous loading height of high-rise building complexes. Infrastructure axial compression type: Located around existing railways, the main driving force is the soil compaction effect caused by long-term traffic load vibration, and the underground continuous drainage effect caused during the construction of the corridor.