A method and system for predicting soil thickness in hilly and mountainous areas of southern China

By dividing the hilly and mountainous areas of southern China into geomorphic units, training multiple models in parallel for each unit, and constraining the results with physical prior data, the method addresses the insufficient accuracy and physical rationality of existing soil thickness predictions and achieves efficient, accurate soil thickness prediction.

CN122198264A · Pending · Publication Date: 2026-06-12 · JIANGXI AGRICULTURAL UNIVERSITY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
JIANGXI AGRICULTURAL UNIVERSITY
Filing Date
2026-05-14
Publication Date
2026-06-12

AI Technical Summary

Technical Problem

Existing technologies suffer from insufficient accuracy and poor rationality in predicting soil thickness in hilly and mountainous areas of southern China. They are unable to effectively characterize the spatial non-stationarity of soil-environment relationships as they change with different landform locations, resulting in bottlenecks in prediction accuracy and geographically unreasonable results.

Method used

A slope-based partitioning method is adopted, in which a decision tree regression model divides the terrain into geomorphic units. Random forest, quantile random forest, and extreme gradient boosting models are trained in parallel for each unit. Their predictions are then constrained by and fused with physical prior data to produce the final prediction data.

Benefits of technology

It enables accurate prediction of soil thickness in hilly and mountainous areas of southern China. It improves prediction accuracy and physical rationality, accommodates the strong spatial heterogeneity of soil thickness, and addresses the lack of accuracy and rationality in existing technologies.

✦ Generated by Eureka AI based on patent content.

Figure CN122198264A_ABST (abstract drawing)

Abstract

The application provides a method and system for predicting soil thickness in hilly and mountainous areas. The method comprises the following steps:

- Screen and preprocess the collected data to determine target feature data, then divide a target mountainous area into multiple geomorphic units using a preset model, based on the target feature data and a slope division threshold.
- Train multiple preset prediction models in parallel for each geomorphic unit to determine the corresponding optimal prediction model, and output a prediction result from the optimal prediction model.
- Determine the physical prior data corresponding to each geomorphic unit based on the target feature data. Perform constrained fusion on the prediction result according to the physical prior data to determine the final prediction data of each geomorphic unit. Integrate all the final prediction data to determine the prediction information of the target mountainous area.

The application addresses a gap in the prior art: no existing soil thickness prediction method for hilly and mountainous areas offers high precision, accuracy, and rationality.

Description

Technical Field

[0001] This invention relates to the field of soil thickness prediction technology, and in particular to a method and system for predicting soil thickness in hilly and mountainous areas of southern China.

Background Technology

[0002] Soil thickness is a key attribute characterizing soil resources and ecological functions, and is of great significance for soil erosion assessment, productivity estimation, hydrological process simulation, and ecological environment management. However, influenced by multiple factors such as parent material, topography, climate, biology, and human activities, soil thickness exhibits strong spatial heterogeneity. Traditional methods, which use point data to represent areas, struggle to accurately depict its continuous spatial distribution. To address this, Digital Soil Mapping (DSM) has emerged. By constructing quantitative relationship models between soil properties and environmental covariates (such as topography, climate, and remote sensing data), it enables spatial prediction and mapping of soil properties, and has become a core tool in current soil geography research. Among these methods, machine learning models, such as random forests and gradient boosting, have gained widespread application in DSM due to their ability to effectively handle nonlinear relationships and high-dimensional data.

[0003] In the hilly and mountainous areas of southern China, dramatic topographic relief strongly differentiates surface processes such as erosion, transport, and deposition. As a result, the spatial distribution of soil thickness is jointly controlled by complex geomorphological processes and is highly non-stationary. The relationship between soil properties and environmental variables is therefore not globally consistent; it changes significantly with geomorphic position and dominant process. This characteristic poses a serious challenge to the traditional DSM paradigm, which relies on a single global model.

[0004] When current mainstream digital soil mapping methods are applied to soil thickness prediction in mountainous areas, they typically fit a single machine learning model to the entire study area. This approach ignores the spatial heterogeneity of geomorphic processes, such as erosion, transport, and deposition, that are controlled by topographic gradients. The model therefore cannot characterize the spatial non-stationarity of soil-environment relationships across geomorphic positions, which creates a bottleneck in prediction accuracy. Furthermore, such "black box" prediction relies entirely on statistical correlation and may yield results that are statistically optimal but geographically unreasonable. It also fails to reveal how the contributions of different environmental factors change with the dominant geomorphic process, which limits the model's mechanistic interpretability. The existing technical framework therefore faces challenges in accuracy, rationality, and interpretability. At the root of these challenges is the disconnect between data-driven models and the understanding of geographical processes.

Summary of the Invention

[0005] Based on this, the purpose of this invention is to provide a method and system for predicting soil thickness in hilly and mountainous areas of southern China, aiming to solve the problem that there is a lack of a highly accurate and reasonable method for predicting soil thickness in hilly and mountainous areas of southern China in the existing technology.

[0006] According to an embodiment of the present invention, a method for predicting soil thickness in hilly and mountainous areas of southern China comprises the following steps:

- Filter and preprocess the collected data to determine target feature data.
- Based on the target feature data, use slope as the partitioning criterion and determine the partitioning thresholds through a preset model, dividing the target mountainous area into multiple geomorphic units.
- Train multiple preset prediction models in parallel for each geomorphic unit to determine the corresponding optimal prediction model, and output prediction results from the optimal prediction model.
- Based on the target feature data, determine the physical prior data corresponding to each geomorphic unit. Perform constrained fusion of the prediction results according to the physical prior data to determine the final prediction data of each geomorphic unit, then integrate all the final prediction data to determine the prediction information of the target mountainous area.

[0007] In addition, the method according to the above embodiments may also have the following additional technical features. The step of filtering and preprocessing the collected data to determine the target feature data includes:

- acquiring multi-source environmental covariate data, including topographic attributes, climate data, remote sensing spectra, and vegetation indices, together with soil thickness sample data;
- screening the multi-source environmental covariate data by recursive feature elimination to determine the associated feature data related to the soil thickness sample data;
- combining different environmental covariates to construct interactive feature data, so that the interactive feature data and the associated feature data together form the target feature data.

[0008] Furthermore, dividing the target mountainous area into multiple geomorphic units, with slope as the partitioning criterion and the partitioning thresholds determined by a preset model, includes the following steps:

- Construct and train a decision tree regression model with a predefined structure, with slope as the independent variable and soil thickness as the dependent variable. Limit the maximum depth or the number of leaf nodes so that the tree outputs a fixed number of split points.
- Use the split points as partitioning thresholds to divide the target mountainous area into multiple geomorphic units. These include at least a deposition-dominated gentle slope unit, a transitional unit in which erosion and deposition are in dynamic equilibrium, and an erosion-dominated steep slope unit.
- Perform a one-way ANOVA on the soil thickness samples grouped by geomorphic unit. If the resulting significance level is below a preset threshold, the partitioning scheme is confirmed as effective.
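The zoning step can be illustrated with a minimal, dependency-free sketch. It assumes a single slope covariate and a best-first tree that stops at a fixed number of leaves. This is the same growth strategy scikit-learn's `DecisionTreeRegressor` uses when `max_leaf_nodes` is set. The helper names `slope_thresholds` and `assign_unit` are hypothetical, and unit indices 0, 1, 2 are assumed to correspond to the gentle, transitional, and steep units.

```python
import bisect
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[float, float]  # (slope, soil thickness)


def _sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _best_split(leaf: List[Pair], min_leaf: int) -> Optional[Tuple[float, float]]:
    """Return (sse_reduction, threshold) for the best split of a slope-sorted leaf."""
    ys = [t for _, t in leaf]
    parent = _sse(ys)
    best = None
    for i in range(min_leaf, len(leaf) - min_leaf + 1):
        if leaf[i - 1][0] == leaf[i][0]:
            continue  # cannot split between equal slope values
        gain = parent - _sse(ys[:i]) - _sse(ys[i:])
        if best is None or gain > best[0]:
            best = (gain, (leaf[i - 1][0] + leaf[i][0]) / 2)
    return best


def slope_thresholds(slope: Sequence[float], thickness: Sequence[float],
                     max_leaves: int = 3, min_leaf: int = 5) -> List[float]:
    """Grow a 1-D regression tree best-first until it has max_leaves leaves."""
    leaves = [sorted(zip(slope, thickness))]
    thresholds: List[float] = []
    while len(leaves) < max_leaves:
        candidates = []
        for idx, leaf in enumerate(leaves):
            split = _best_split(leaf, min_leaf)
            if split is not None:
                candidates.append((split[0], idx, split[1]))
        if not candidates:
            break
        _, idx, thr = max(candidates)  # split the leaf with the largest SSE reduction
        leaf = leaves.pop(idx)
        leaves[idx:idx] = [[p for p in leaf if p[0] <= thr], [p for p in leaf if p[0] > thr]]
        thresholds.append(thr)
    return sorted(thresholds)


def assign_unit(slope_value: float, thresholds: List[float]) -> int:
    """0 = gentlest unit, len(thresholds) = steepest unit."""
    return bisect.bisect_left(thresholds, slope_value)
```

With three leaves, the two returned split points act as the partitioning thresholds. The hypothetical `min_leaf` parameter keeps each unit large enough to train a model on.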

[0009] Furthermore, training multiple preset prediction models in parallel for each geomorphic unit to determine the corresponding optimal prediction model includes the following steps:

- For each geomorphic unit, construct a random forest model, a quantile random forest model, and an extreme gradient boosting model in parallel, and divide the unit's target feature data into a training set and a validation set.
- On the training set, tune the hyperparameters of each preset model by five-fold cross-validation, with estimation accuracy as the criterion, to obtain the corresponding optimized model.
- On the validation set, evaluate each optimized model across multiple performance metrics, and select the model with the best overall performance as the optimal prediction model for that geomorphic unit.

[0010] Furthermore, outputting the prediction result based on the optimal prediction model includes the following steps:

- On the training set of the geomorphic unit, score and rank all features in the target feature data using the feature importance mechanism of the corresponding optimal prediction model.
- Remove the features with the lowest importance scores one at a time, and evaluate the optimal prediction model on the unit's validation set after each removal.
- Take the feature subset that performs best on the validation set as the prediction feature data of the optimal prediction model for the current geomorphic unit, and output the prediction result based on the prediction feature data and the optimal prediction model.

[0011] Furthermore, determining the physical prior data corresponding to each geomorphic unit based on the target feature data, and performing constrained fusion of the prediction results according to the physical prior data to determine the final prediction data of each geomorphic unit, includes the following steps:

- For each geomorphic unit, generate a preliminary predicted probability distribution of soil thickness at each spatial location within the unit, based on the prediction results output by the corresponding optimal prediction model.
- Obtain the target physical prior distribution of soil thickness corresponding to the unit's dominant geomorphic process.
- Use a preset distribution alignment function to reweight and deform the preliminary predicted probability distribution, mapping it to a physically constrained posterior distribution. The final prediction data for each spatial location are then sampled from this posterior.

The preset alignment function minimizes the KL divergence between the adjusted preliminary predicted probability distribution and the target physical prior distribution.

[0012] Furthermore, generating a preliminary predicted probability distribution of soil thickness at each spatial location within the geomorphic unit depends on the type of optimal prediction model:

- When the optimal prediction model is a quantile random forest, an empirical distribution function is constructed directly from the multiple quantiles it outputs.
- When the optimal prediction model outputs only point predictions, a parameterized distribution is constructed, centered on the point prediction. Its standard deviation is taken from the prediction error statistics of the optimal prediction model on the validation set of the geomorphic unit.

[0013] Another objective of this invention is to provide a soil thickness prediction system for hilly and mountainous areas in southern China, which implements the above soil thickness prediction method. The system comprises the following modules:

- a data processing module, configured to filter and preprocess the collected data to determine target feature data, and to divide the target mountainous area into multiple geomorphic units based on the target feature data, using slope as the partitioning criterion and a preset model to determine the partitioning thresholds;
- a model training module, configured to train multiple preset prediction models in parallel for each geomorphic unit to determine the corresponding optimal prediction model, and to output the prediction result based on the optimal prediction model;
- a global prediction module, configured to determine the physical prior data corresponding to each geomorphic unit based on the target feature data, to perform constrained fusion of the prediction results based on the physical prior data to determine the final prediction data of each geomorphic unit, and to integrate all the final prediction data to determine the prediction information of the target mountainous area.

[0014] Another objective of this invention is to provide a storage medium storing a computer program that, when executed by a processor, implements the steps of the above-described method for predicting soil thickness in hilly and mountainous areas of southern China.

[0015] Another objective of this invention is to provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the above-described method for predicting soil thickness in hilly and mountainous areas of southern China.

[0016] This invention builds on the target feature data obtained by screening and preprocessing the collected data. It divides geomorphic units using slope as the zoning criterion, with zoning thresholds determined by a preset model. It then trains multiple preset prediction models in parallel for each geomorphic unit and selects the optimal model to output the prediction result. Finally, it applies constrained fusion to the prediction result by combining the target feature data with the physical prior data of each geomorphic unit. The entire prediction process is driven by the target feature data, the zoning rules, and the physical prior data rather than by a single global machine learning model.

This design overcomes the limitation that the spatial non-stationarity of soil-environment relationships imposes on prediction in complex terrain. It enables accurate prediction of soil thickness in hilly and mountainous areas of southern China and meets the practical needs of regions where soil thickness is strongly spatially heterogeneous. Zoned modeling by geomorphic unit, combined with physical prior constraint fusion, matches the surface process characteristics of each unit. This directly addresses three shortcomings of existing single global models: neglect of topographic gradients and geomorphic differentiation, physically unreasonable predictions, and insufficient accuracy. Efficient prediction is achieved without additional process models, and the accuracy, physical rationality, and spatial applicability of the soil thickness predictions are significantly improved. This invention therefore provides the highly accurate and reasonable method for predicting soil thickness in hilly and mountainous areas of southern China that the prior art lacks.
Attached Figure Description

[0017] Figure 1 is a flowchart of a method for predicting soil thickness in hilly and mountainous areas of southern China according to the first embodiment of the present invention. Figure 2 is a schematic diagram of the structure of a soil thickness prediction system for hilly and mountainous areas of southern China according to the second embodiment. Figure 3 is a schematic diagram of the structure of the electronic device according to the third embodiment. Figure 4 shows the spatial prediction results of the models in one embodiment of the present invention. Panels (a), (c), and (e) show the global prediction results of RF (random forest), QRF (quantile random forest), and XGBoost (extreme gradient boosting), respectively. Panels (b), (d), and (f) show the partitioned prediction results of RF, QRF, and XGBoost, respectively. Panel (g) shows the prediction results of the optimal combination model.

[0018] The present invention is further illustrated by the following detailed description in conjunction with the accompanying drawings.

Detailed Implementation

[0019] To facilitate understanding of the present invention, a more complete description will be given below with reference to the accompanying drawings. Several embodiments of the invention are illustrated in the drawings. However, the invention can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

[0020] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

[0021] Example 1. Figure 1 shows a method for predicting soil thickness in hilly and mountainous areas of southern China according to the first embodiment of the present invention. The method specifically includes steps S01 to S03.

[0022] S01: Filter and preprocess the collected data to determine the target feature data. Then, based on the target feature data, use slope as the partitioning criterion and determine the partitioning thresholds through a preset model to divide the target mountainous area into multiple geomorphic units.

[0023] Specifically, multi-source environmental covariate data are acquired together with soil thickness sample data. The covariates include topographic attributes, climate data, remote sensing spectra, and vegetation indices. Recursive feature elimination screens the multi-source environmental covariate data to determine the associated feature data related to the soil thickness samples. Different environmental covariates are also combined to construct interactive feature data, which, together with the associated feature data, form the target feature data.

In practice, redundant or irrelevant features are eliminated iteratively. This reduces dimensionality and the risk of overfitting, and ensures that the associated feature data fed to the model are compact and discriminative. The interactive features capture nonlinear coupling effects of environmental factors on soil thickness. They strengthen the model's ability to represent complex geomorphic processes, such as erosion-deposition coupling, and improve prediction accuracy.
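The following is a minimal sketch of the two feature steps, under stated assumptions. `add_interactions` builds pairwise product terms, which is one possible way to combine covariates; the text does not fix the form. `rfe` repeats the score-and-drop loop of recursive feature elimination, re-scoring the surviving features each round. The default importance, absolute Pearson correlation, is only a stand-in for the importances of a fitted model. All names are hypothetical.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence

Columns = Dict[str, List[float]]
ImportanceFn = Callable[[Columns, Sequence[float]], Dict[str, float]]


def add_interactions(cols: Columns) -> Columns:
    """Copy cols and add every pairwise product feature, named 'a*b'."""
    out = {name: list(values) for name, values in cols.items()}
    for a, b in itertools.combinations(sorted(cols), 2):
        out[f"{a}*{b}"] = [x * y for x, y in zip(cols[a], cols[b])]
    return out


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """|Pearson r|; 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def correlation_importance(cols: Columns, y: Sequence[float]) -> Dict[str, float]:
    """Stand-in importance; a fitted model's importances would go here instead."""
    return {name: abs_corr(values, y) for name, values in cols.items()}


def rfe(cols: Columns, y: Sequence[float], n_keep: int,
        importance: ImportanceFn = correlation_importance) -> List[str]:
    """Recursive feature elimination: re-score survivors, drop the weakest, repeat."""
    kept = dict(cols)
    while len(kept) > n_keep:
        scores = importance(kept, y)
        del kept[min(scores, key=scores.get)]
    return sorted(kept)
```

In a scikit-learn workflow, `sklearn.feature_selection.RFE` wrapped around a tree ensemble could take the place of `rfe` here.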

[0024] Furthermore, a decision tree regression model with a predefined structure is constructed and trained, with slope as the independent variable and soil thickness as the dependent variable. Limiting the maximum depth or the number of leaf nodes makes the tree output a fixed number of split points. These split points serve as zoning thresholds that divide the target mountainous area into multiple geomorphic units. The units include at least a deposition-dominated gentle slope unit, a transitional unit where erosion and deposition are in dynamic equilibrium, and an erosion-dominated steep slope unit. A one-way ANOVA is then performed on the soil thickness samples grouped by geomorphic unit. If the resulting significance level is below the predefined threshold, the zoning scheme is considered effective.

In practice, the decision tree automatically identifies the critical slope values at which soil thickness changes significantly. The zoning boundaries therefore carry clear geomorphological meaning, which addresses the disconnect between traditional zoning methods and geomorphic processes. Within each geomorphic unit, the dominant soil-forming process is relatively consistent. The zoning is thus data-driven and responds directly to soil thickness rather than to subjective or purely geometric rules. The ANOVA tests whether mean soil thickness differs significantly among geomorphic units. A significant difference shows that the zoning separates soil thickness regimes produced by different geomorphic processes, which statistically verifies the effectiveness and necessity of the zoning scheme.
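The ANOVA check reduces to an F statistic over the soil thickness samples grouped by unit, which the sketch below computes directly. The function name is hypothetical. The p-value that is compared with the preset significance threshold would come from the upper tail of the F distribution, for example via SciPy's `scipy.stats.f.sf(F, df_between, df_within)` or `scipy.stats.f_oneway(*groups)`.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for soil thickness samples grouped by unit.

    Assumes at least two groups, more samples than groups, and non-zero
    within-group variance (otherwise F is undefined).
    """
    means = [sum(g) / len(g) for g in groups]
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-unit variation: how far each unit mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-unit variation: spread of samples around their own unit mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = len(groups) - 1, n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```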

[0025] S02: Train multiple preset prediction models in parallel for each geomorphic unit to determine the corresponding optimal prediction model, and output the prediction result based on the optimal prediction model.

[0026] Specifically, for each geomorphic unit, a random forest model, a quantile random forest model, and an extreme gradient boosting model are constructed in parallel, and the unit's target feature data are divided into a training set and a validation set. On the training set, the hyperparameters of each preset model are tuned by five-fold cross-validation, with estimation accuracy as the criterion, to obtain the corresponding optimized model. On the validation set, each optimized model is evaluated across multiple performance metrics, and the model with the best overall performance is selected as the optimal prediction model for the unit.

In practice, parallel training ensures a fair comparison among the models. Five-fold cross-validation with hyperparameter tuning ensures that each model competes at its best, avoiding underestimated performance caused by poor parameter settings. The method can thus identify which model performs best in each region and exploit those differences for targeted, more accurate local predictions.
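The tune-then-compare loop can be sketched as follows, assuming a hypothetical model interface in which a fit function returns a predictor. The toy `fit_mean` and `fit_line` models stand in for the random forest, quantile random forest, and XGBoost models. Validation RMSE stands in for the multidimensional evaluation, whose metrics the text does not specify.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Predictor = Callable[[Row], float]
FitFn = Callable[[List[Row], List[float]], Predictor]  # hypothetical model interface


def kfold(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle indices once and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(predict: Predictor, X: Sequence[Row], y: Sequence[float]) -> float:
    return math.sqrt(sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y))


def cv_rmse(fit: FitFn, X: List[Row], y: List[float], k: int = 5) -> float:
    """Mean held-out RMSE over k folds of the training set."""
    scores = []
    for fold in kfold(len(y), k):
        held = set(fold)
        tr = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in tr], [y[i] for i in tr])
        scores.append(rmse(predict, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k


def select_model(families: Dict[str, Dict[str, FitFn]], X_tr, y_tr, X_val, y_val):
    """Tune each family by k-fold CV on the training set, then compare on validation."""
    results: Dict[str, Tuple[str, float]] = {}
    for name, grid in families.items():
        setting = min(grid, key=lambda s: cv_rmse(grid[s], X_tr, y_tr))
        predict = grid[setting](X_tr, y_tr)  # refit the tuned setting on all training data
        results[name] = (setting, rmse(predict, X_val, y_val))
    best = min(results, key=lambda n: results[n][1])
    return best, results


def fit_mean(X: List[Row], y: List[float]) -> Predictor:
    """Toy stand-in model: always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda x: m


def fit_line(X: List[Row], y: List[float]) -> Predictor:
    """Toy stand-in model: 1-D least squares on the first feature."""
    xs = [r[0] for r in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in xs)
    b = sum((a - mx) * (t - my) for a, t in zip(xs, y)) / sxx if sxx else 0.0
    return lambda x: my + b * (x[0] - mx)
```

Each `grid` maps a hyperparameter-setting name to a fit function. With scikit-learn, `GridSearchCV(..., cv=5)` could cover the tuning half of this loop.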

[0027] Furthermore, on the training set of each geomorphic unit, the feature importance mechanism of the corresponding optimal prediction model scores and ranks all features in the target feature data. Features with the lowest importance scores are removed one at a time, and the optimal prediction model is evaluated on the unit's validation set after each removal. The feature subset with the best validation performance becomes the prediction feature data of the optimal prediction model for the current geomorphic unit. The prediction result is then output based on these features and the optimal prediction model.

In practice, the model's built-in importance mechanism provides an objective ranking. Low-contribution features are eliminated step by step while the change in model performance is checked at each step. This yields, for each partition, the smallest feature subset that attains the best validation performance. It improves computational efficiency without reducing prediction accuracy, and it jointly optimizes the features, the model, and the geomorphic unit, further improving prediction accuracy.
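The elimination loop itself is model-agnostic. In this sketch, importances are computed once and features are dropped from the bottom of the ranking. `evaluate` is a hypothetical callback that re-evaluates the unit's optimal model on a feature subset and returns its validation error. Ties favour the smaller subset, which matches the aim of a minimal feature set.

```python
from typing import Callable, Dict, List, Tuple


def backward_eliminate(importance: Dict[str, float],
                       evaluate: Callable[[List[str]], float],
                       min_features: int = 1) -> Tuple[List[str], float]:
    """Drop features from least to most important; return the best subset.

    importance: scores from the unit's optimal model, computed on its training set.
    evaluate:   hypothetical callback giving validation error (lower is better)
                for the model restricted to the given features.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)  # most important first
    best_subset, best_err = ranked, evaluate(ranked)
    for size in range(len(ranked) - 1, min_features - 1, -1):
        subset = ranked[:size]  # remove the current least important feature
        err = evaluate(subset)
        if err <= best_err:     # '<=' prefers the smaller subset on ties
            best_subset, best_err = subset, err
    return best_subset, best_err
```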

[0028] The target feature data can be adjusted after the optimal model has been determined, as described above. The geomorphic unit, the model, and the target features can also be adapted to one another during model training.

For example, during training, a model is constructed for each geomorphic unit that consists of a feature adaptive modulation network (hereafter, the modulation network) and a base predictor connected in series. The modulation network takes the target feature data as input and outputs a set of sample-specific feature modulation weights based on the feature values of each input sample. The base predictor is a random forest, quantile random forest, or extreme gradient boosting model. Its input is not the original target feature data but the feature vector weighted by the modulation weights.

During training, the training set of the geomorphic unit is fed through the modulation network and the base predictor for end-to-end joint training. The objective is to minimize the error between the soil thickness predicted by the base predictor and the observed value. Through this joint training, the modulation network learns to generate weights that help the base predictor make more accurate judgments in the context of each input sample. The base predictor, in turn, learns to predict in the modulated feature space. After training, the modulation network and the base predictor together constitute the optimal prediction model for the geomorphic unit.

More specifically, the modulation network is a small feedforward neural network whose output layer has one neuron per dimension of the target feature data. A Sigmoid activation restricts each neuron's output to the interval (0, 1), and that output serves directly as the modulation weight of the corresponding feature.

Joint training uses gradient descent, with backpropagation updating the parameters of both the modulation network and the base predictor. For base predictors that are random forests or gradient boosting trees, training includes optimizing the tree split points on the modulated features produced by the current modulation network.

In practice, this upgrades feature optimization from static selection to dynamic modulation learned jointly with the prediction model. The resulting sample-level feature adaptation is finer-grained than partition-level feature selection. The model can focus on the most relevant features given the context of the current sample, that is, its combination of feature values, and can handle complex nonlinear relationships more effectively, further improving prediction performance.
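Standard random forest and XGBoost training is not gradient-based. The sketch below therefore swaps in a differentiable linear base predictor to show how a sigmoid gate and a predictor can be trained jointly by backpropagation. It is a simplified illustration rather than the model described above: the gate has no hidden layer, and `GatedLinear`, `train`, and the learning-rate settings are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class GatedLinear:
    """Sample-wise feature gate (sigmoid outputs in (0, 1)) feeding a linear predictor."""

    def __init__(self, n_features: int, seed: int = 0):
        rng = random.Random(seed)
        d = n_features
        self.A = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)]  # gate weights
        self.c = [0.0] * d     # gate biases
        self.beta = [0.0] * d  # base predictor coefficients
        self.bias = 0.0        # base predictor intercept

    def gates(self, x: Sequence[float]) -> List[float]:
        """One modulation weight per feature, computed from the whole sample."""
        return [sigmoid(sum(a * xk for a, xk in zip(row, x)) + cj)
                for row, cj in zip(self.A, self.c)]

    def predict(self, x: Sequence[float]) -> float:
        w = self.gates(x)
        return self.bias + sum(b * wj * xj for b, wj, xj in zip(self.beta, w, x))

    def sgd_step(self, x: Sequence[float], y: float, lr: float) -> None:
        """Update gate and predictor together on the squared error (pred - y)^2."""
        w = self.gates(x)
        g = 2.0 * (self.predict(x) - y)  # dLoss/dPrediction
        for j, xj in enumerate(x):
            d_beta = g * w[j] * xj
            d_z = g * self.beta[j] * xj * w[j] * (1.0 - w[j])  # chain rule through the sigmoid
            for k, xk in enumerate(x):
                self.A[j][k] -= lr * d_z * xk
            self.c[j] -= lr * d_z
            self.beta[j] -= lr * d_beta
        self.bias -= lr * g


def mse(model: GatedLinear, data: Sequence[Tuple[List[float], float]]) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def train(model: GatedLinear, data, epochs: int = 300, lr: float = 0.05) -> GatedLinear:
    for _ in range(epochs):
        for x, y in data:
            model.sgd_step(x, y, lr)
    return model
```

On a small toy dataset, `train(GatedLinear(2), data)` should lower the mean squared error relative to the untrained model. Every gate value stays strictly between 0 and 1 because of the sigmoid.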

[0029] S03: Based on the target feature data, determine the physical prior data corresponding to each geomorphic unit. Perform constrained fusion of the prediction results according to the physical prior data to determine the final prediction data of each geomorphic unit, and integrate all the final prediction data to determine the prediction information of the target mountainous area.

[0030] Specifically, for each geomorphic unit, a preliminary predicted probability distribution of soil thickness at each spatial location within the unit is generated from the prediction results of the corresponding optimal prediction model. The target physical prior distribution of soil thickness corresponding to the unit's dominant geomorphic process is then obtained. A preset distribution alignment function reweights and deforms the preliminary predicted probability distribution, mapping it to a physically constrained posterior distribution. Samples are drawn from this posterior to obtain the final predicted data for each spatial location. The preset alignment function minimizes the Kullback-Leibler (KL) divergence between the adjusted preliminary predicted probability distribution and the target physical prior distribution.

In practice, expanding a single predicted value into a probability distribution quantifies the prediction uncertainty at each spatial location and shows the full range over which the prediction may vary. Geographical knowledge is also made quantitative: gentle slopes tend to carry thick soils and steep slopes thin soils, and this is translated into target thickness distributions for each geomorphic unit. The erosion- or deposition-controlled distribution of soil thickness thereby enters as a physical constraint prior. Adjusting the predicted distribution by minimizing the KL divergence then corrects geographically unreasonable predictions and brings the predicted distribution into agreement with the geomorphic process. This avoids results that are statistically optimal but geographically wrong.
Finally, the probability distribution is converted into concrete, mappable predicted values. The resulting single-point prediction data combine statistical accuracy with physical plausibility.
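Minimizing the KL divergence to the prior alone would simply return the prior, so some fidelity to the model's own distribution must be kept. One way to do this, assumed here rather than taken from the text, is to minimize `(1 - lam) * KL(q || pred) + lam * KL(q || prior)` over discretized thickness bins. The minimizer of that objective is the normalized geometric blend of the two distributions. `lam`, the bins, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def align_to_prior(pred: Sequence[float], prior: Sequence[float], lam: float) -> List[float]:
    """Return q proportional to pred**(1 - lam) * prior**lam over shared thickness bins.

    This q minimises (1 - lam) * KL(q || pred) + lam * KL(q || prior), so lam in
    [0, 1] sets how strongly the physical prior pulls on the model's distribution.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    raw = [(p ** (1.0 - lam)) * (r ** lam) for p, r in zip(pred, prior)]
    total = sum(raw)
    return [v / total for v in raw]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sample_thickness(bin_centres: Sequence[float], probs: Sequence[float],
                     rng: random.Random) -> float:
    """Draw one final thickness value from the constrained posterior."""
    return rng.choices(bin_centres, weights=probs, k=1)[0]
```

Consider a steep-slope unit whose prior favours thin bins, for example `prior = [0.6, 0.3, 0.1]` over hypothetical bin centres of 20, 60, and 100 cm. Here `align_to_prior(pred, prior, 0.5)` shifts probability mass toward the thin bins, and `sample_thickness` draws the final value. Taking the posterior mean instead would give a deterministic map.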

[0031] Furthermore, when the optimal prediction model is a quantile random forest, the empirical distribution function is constructed directly from the multiple quantiles it outputs. When the optimal prediction model outputs only point predictions, a parameterized distribution is constructed, centered on the point prediction. Its standard deviation is derived from the prediction error statistics of the optimal prediction model on the validation set of the geomorphic unit. Handling each model type in this way converts the outputs of all models into standardized probability distributions, so that a single physical constraint procedure can be applied downstream.
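Both conversions can be sketched with the standard library under a few assumptions. The QRF output is taken to be a mapping from quantile level to thickness, and the CDF is linearly interpolated between quantiles. For point-prediction models, the distribution is assumed normal, with the validation RMSE as its standard deviation; this is one reading of "prediction error statistics". Function names are hypothetical.

```python
import bisect
import math
from statistics import NormalDist
from typing import Callable, Dict, Sequence


def quantile_cdf(quantiles: Dict[float, float]) -> Callable[[float], float]:
    """Piecewise-linear CDF through QRF (level -> value) pairs.

    Assumes values are non-decreasing in level. Mass outside the outermost
    quantiles is treated as 0 below and 1 above (a simplifying assumption).
    """
    levels = sorted(quantiles)
    values = [quantiles[q] for q in levels]

    def cdf(x: float) -> float:
        if x < values[0]:
            return 0.0
        if x >= values[-1]:
            return levels[-1] if x == values[-1] else 1.0
        i = bisect.bisect_right(values, x)  # values[i-1] <= x < values[i]
        x0, x1, q0, q1 = values[i - 1], values[i], levels[i - 1], levels[i]
        return q0 + (x - x0) / (x1 - x0) * (q1 - q0)

    return cdf


def point_to_normal(point: float, val_residuals: Sequence[float]) -> NormalDist:
    """Normal distribution centred on the point prediction, sigma = validation RMSE."""
    sigma = math.sqrt(sum(r * r for r in val_residuals) / len(val_residuals))
    return NormalDist(mu=point, sigma=sigma)
```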

[0032] Furthermore, when prior physical data are used to correct the preliminary prediction results for plausibility, targeted methods can be applied to specific models, in addition to the general method above, to further improve the accuracy and physical plausibility of the prediction data. For example, when the optimal prediction model can output a prediction interval, its prediction results comprising at least a lower quantile, the median, and an upper quantile are obtained. Based on the slope zone type of the geomorphic unit, its dominant geomorphic process, and the thickness category into which the preliminary prediction result falls, a set of differentiated fusion weights is dynamically assigned to the lower-quantile, median, and upper-quantile predictions, and these weights are used to compute a weighted combination that forms the final prediction data. The dynamic weight allocation follows these physical laws: in deposition-dominated gentle slope areas, the weights for the thick soil layer category favor the quantiles representing larger predicted values; in transition areas where erosion and deposition are in dynamic equilibrium, the weights for the middle soil layer category favor the median; and in erosion-dominated steep slope areas, the weights for the thin soil layer category favor the quantiles representing smaller predicted values.
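The dynamic weighting in [0032] can be illustrated with a small lookup table. The zone and category labels, the class thresholds in `thickness_category`, the default weights, and most table entries are hypothetical placeholders; only the transition/middle weights (0.15, 0.7, 0.15) and the 0.4 upper-quantile weight for gentle/thick come from the worked example in [0040]. The preliminary prediction used to pick the thickness category is assumed to be the median.

```python
import numpy as np

# Hypothetical (lower, median, upper) weights keyed by (slope zone, thickness class).
FUSION_WEIGHTS = {
    ("gentle", "thick"): (0.15, 0.45, 0.40),      # upper weight 0.4 per [0040]; rest assumed
    ("transition", "middle"): (0.15, 0.70, 0.15), # values from [0040]
    ("steep", "thin"): (0.40, 0.45, 0.15),        # illustrative mirror of the gentle case
}
DEFAULT_WEIGHTS = (0.25, 0.50, 0.25)  # placeholder for all other combinations


def thickness_category(value, thin_max=40.0, thick_min=70.0):
    """Hypothetical class thresholds (cm) separating thin / middle / thick soil."""
    if value < thin_max:
        return "thin"
    if value >= thick_min:
        return "thick"
    return "middle"


def fuse_quantiles(lower, median, upper, zone):
    """Weighted fusion of three quantile predictions for one location."""
    category = thickness_category(median)  # class of the preliminary prediction
    w = np.array(FUSION_WEIGHTS.get((zone, category), DEFAULT_WEIGHTS), dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("fusion weights must sum to 1")
    return float(w @ np.array([lower, median, upper], dtype=float))
```

In this sketch a gentle-slope location whose median falls in the thick class is pulled toward its upper quantile, while a steep-slope thin-class location is pulled toward its lower quantile, mirroring the three physical rules listed above.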

[0033] By way of example and not limitation, in some alternative embodiments the above method is applied to a mountainous and hilly area as follows, to aid understanding of the solution.
1. Regional Overview and Data Preparation
The target area is mainly mountainous and hilly, with elevations of 168–1503 m. The terrain is high in the east, north, and west and low in the southwest. The area lies in the subtropical red soil region, and soil thickness varies strongly in space.

[0034] Field survey data: Historical profile data from the second and third soil surveys of the target area were collected and supplemented by newly measured field profiles. After outliers were removed using the three-standard-deviation (3σ) rule, 513 valid sample points remained. Each sample point includes latitude and longitude coordinates and the measured soil thickness (depth from the surface to bedrock). The environmental variables used in this case study are listed in Table 1. They include topographic attributes such as a 30 m resolution digital elevation model (DEM), slope, aspect, slope length, and topographic position index; climate and rainfall erosivity data; a 10 m resolution normalized difference vegetation index (NDVI); and remote sensing bands B2-B8. Climate factors comprise monthly mean temperature, precipitation, and potential evapotranspiration over the past 40 years. Feature combinations were used to explore how environmental variables and topography jointly influence soil thickness.
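The three-standard-deviation screening mentioned above can be sketched as a single-pass filter on the measured thickness values. The source does not state whether the rule was applied once or iteratively, or whether the sample or population standard deviation was used; a single pass with the sample standard deviation is assumed here, and the function name is illustrative.

```python
import numpy as np


def remove_outliers_3sigma(values, k=3.0):
    """Boolean mask keeping values within k sample SDs of the mean (single pass)."""
    values = np.asarray(values, dtype=float)
    mu, sd = values.mean(), values.std(ddof=1)
    return np.abs(values - mu) <= k * sd
```

The returned mask can be applied to the thickness column and, with the same indices, to the coordinate and covariate arrays so that every retained sample keeps its full record.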

[0035] Table 1 Environmental Variables and Their Sources

[0036] 2. Dynamic slope zoning
A decision tree regression model with at most 3 leaf nodes was built with slope as the independent variable and soil thickness as the dependent variable. The model automatically identified two split thresholds, 11.35° and 30.25°, on the basis of which the target area (Xunwu County) was divided into three geomorphic units (Table 2). After partitioning, the mean soil thickness was 58.42 cm in the gentle slope area, 68.01 cm in the transition area, and 48.31 cm in the steep slope area. One-way ANOVA showed that the differences in soil thickness among the three slope classes were highly significant (F = 18.51, p < 0.001), confirming that this zoning effectively distinguishes soil thickness variation under different slope conditions and provides a sound basis for subsequent topographic zoning analysis.
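The zoning step can be reproduced with a regression tree restricted to three leaves, which yields exactly two slope thresholds. The sketch below implements best-first CART splitting on a single feature in NumPy so that it runs without extra dependencies (scikit-learn's `DecisionTreeRegressor(max_leaf_nodes=3)` is the usual library equivalent), then computes the one-way ANOVA F statistic for the resulting units. Function names are illustrative; a p-value additionally requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
import numpy as np


def best_split(x, y):
    """Best single split of y on x by reduction in sum of squared errors (CART)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
    n_left = np.arange(1, n)                  # left child holds xs[0..i]
    s_left, q_left = csum[:-1], csq[:-1]
    s_right, q_right = csum[-1] - s_left, csq[-1] - q_left
    sse = (q_left - s_left ** 2 / n_left) + (q_right - s_right ** 2 / (n - n_left))
    valid = xs[1:] > xs[:-1]                  # only split between distinct slopes
    if not valid.any():
        return None
    sse = np.where(valid, sse, np.inf)
    i = int(np.argmin(sse))
    total_sse = csq[-1] - csum[-1] ** 2 / n
    return 0.5 * (xs[i] + xs[i + 1]), total_sse - sse[i]


def slope_thresholds(slope, thickness, max_leaves=3):
    """Grow a best-first regression tree on slope until it has max_leaves leaves."""
    leaves = [(np.asarray(slope, float), np.asarray(thickness, float))]
    thresholds = []
    while len(leaves) < max_leaves:
        best = None                           # (leaf index, threshold, SSE gain)
        for j, (x, y) in enumerate(leaves):
            split = best_split(x, y) if len(x) > 1 else None
            if split is not None and (best is None or split[1] > best[2]):
                best = (j, split[0], split[1])
        if best is None:
            break
        j, thr, _ = best
        x, y = leaves.pop(j)
        leaves += [(x[x <= thr], y[x <= thr]), (x[x > thr], y[x > thr])]
        thresholds.append(float(thr))
    return sorted(thresholds)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of 1-D arrays."""
    all_y = np.concatenate(groups)
    grand, k, n = all_y.mean(), len(groups), len(all_y)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))
```

To group samples into units after fitting, `np.digitize(slope, thresholds, right=True)` reproduces the tree's `x <= threshold` left-branch rule; the resulting groups can then be passed to `one_way_anova_f`.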

[0037] Table 2 Comprehensive Evaluation of Slope Zoning Scheme

[0038] 3. Partition-based independent modeling and adaptive model selection
Within each partition, three machine learning models, Random Forest (RF), Quantile Random Forest (QRF), and Extreme Gradient Boosting (XGB), were trained, with hyperparameters optimized by five-fold cross-validation. Evaluation metrics included the coefficient of determination (R²), Lin's concordance correlation coefficient (CCC), relative mean absolute error (RMAE), and prediction interval coverage probability (PICP).
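The four evaluation metrics can be written compactly as below. R², CCC (Lin's formula with population moments), and PICP follow standard definitions; RMAE is assumed here to be the mean absolute error divided by the mean observed thickness, since definitions of relative MAE vary and the source does not give its formula.

```python
import numpy as np


def r2(obs, pred):
    """Coefficient of determination."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))


def ccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    return float(2 * cov / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2))


def rmae(obs, pred):
    """MAE divided by the mean observation (assumed definition)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(obs - pred)) / obs.mean())


def picp(obs, lower, upper):
    """Fraction of observations falling inside [lower, upper]."""
    obs = np.asarray(obs, float)
    inside = (obs >= np.asarray(lower, float)) & (obs <= np.asarray(upper, float))
    return float(np.mean(inside))
```

Unlike R², CCC penalizes systematic bias: a prediction that is perfectly correlated with the observations but offset by a constant still scores below 1.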

[0039] Table 3 Comparison of mapping accuracy and interval prediction of machine learning models under different slope units

[0040] In the transition zone, the weights of the three quantiles output by QRF (5%, 50%, and 95%) are adjusted dynamically according to the physical distribution characteristics of each thickness category. For example, in theory the middle-layer category accounts for the largest share of soil thickness in the transition zone (the target distribution assigns 60% to the middle layer), so the median weight is set to 0.7 and the lower and upper quantile weights to 0.15 each. In the gentle slope zone, the target distribution assigns a high share (40%) to thick layers, so the upper quantile weight is raised to 0.4, bringing the predictions closer to the physical process of sediment accumulation.
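Written as a formula, the fusion in [0040] is a weighted sum of the three QRF quantiles $q_{0.05}$, $q_{0.50}$, $q_{0.95}$. The sum-to-one constraint is an assumption, consistent with the transition-zone weights; for the gentle slope zone the text fixes only the upper weight, so how the remaining 0.6 is split is left open.

```latex
\begin{aligned}
\hat{y} &= w_L\, q_{0.05} + w_M\, q_{0.50} + w_U\, q_{0.95}, \qquad w_L + w_M + w_U = 1 \\
\text{Transition zone (middle layer):}\quad \hat{y} &= 0.15\, q_{0.05} + 0.70\, q_{0.50} + 0.15\, q_{0.95} \\
\text{Gentle slope zone (thick layer):}\quad w_U &= 0.4, \qquad w_L + w_M = 0.6
\end{aligned}
```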

[0041] According to Table 3, the zonal combined model achieves an overall R² of 0.434 on the validation set, higher than any single global model (the best global model, QRF, reaches an R² of 0.302), while the prediction interval coverage remains at a reasonable level.

[0042] The spatial distribution maps of soil thickness generated by the three models are shown in Figure 4. All of them reasonably reflect the basic spatial pattern of soil being "thicker at the foot of slopes and in valleys, and thinner on ridges and steep slopes," confirming the fundamental control of topography over material transport. They differ, however, in detail. A comparison of Figures 4(a) to 4(g) shows that the zonal XGBoost prediction surface is the smoothest and accurately depicts the continuous gradation on gentle slopes. The prediction interval map generated by QRF directly reveals the spatial distribution of uncertainty, clearly showing that prediction uncertainty on steep slopes and ridgelines is significantly higher than in gentle slope accumulation areas, which is consistent with the understanding of actual geographical processes. The RF predictions show slightly larger local fluctuations than those of XGBoost.

[0043] In summary, this invention provides a method for predicting soil thickness in hilly and mountainous areas of southern China. Target feature data are obtained by screening and preprocessing the collected data. Using slope as the zoning criterion and a preset model to determine the zoning thresholds, the area is divided into landform units. Multiple preset prediction models are trained in parallel for each landform unit, and the optimal model is selected to output the prediction results, which are then fused under constraints using the physical prior data determined from the target feature data for each landform unit. The entire prediction process is driven by the target feature data, zoning rules, and physical prior data rather than by a single global machine learning model. This overcomes the prediction limitations caused by the spatial non-stationarity of soil-environment relationships in complex terrain, enables accurate prediction of soil thickness in hilly and mountainous areas of southern China, and meets the practical needs of settings where soil thickness is strongly spatially heterogeneous. By combining geomorphic-unit zonal modeling with physical prior constraints, the method matches the surface process characteristics of each geomorphic unit. It specifically addresses the shortcomings of existing single-model approaches: neglect of geomorphic differentiation along topographic gradients, physically implausible prediction results, and insufficient prediction accuracy. It thus achieves efficient soil thickness prediction in these areas without requiring additional process models, and significantly improves the accuracy, physical plausibility, and spatial applicability of the prediction results.
Therefore, this invention solves the problem of the lack of an accurate, reliable, and physically plausible method for predicting soil thickness in hilly and mountainous areas of southern China.

[0044] Example 2
Figure 2 is a structural block diagram of a soil thickness prediction system for hilly and mountainous areas in southern China according to the second embodiment of the present invention. The soil thickness prediction system 200 includes a data processing module 21, a model training module 22, and a global prediction module 23, wherein: the data processing module 21 is used to filter and preprocess the collected data to determine target feature data, and to divide the target mountain area into multiple geomorphic units based on the target feature data, using slope as the partitioning basis and a preset model to determine the partitioning threshold; the model training module 22 is used to train multiple preset prediction models in parallel for each landform unit to determine the corresponding optimal prediction model, and to output prediction results from the optimal prediction model; and the global prediction module 23 is used to determine the physical prior data of each landform unit based on the target feature data, to perform constrained fusion of the prediction results according to the physical prior data to determine the final prediction data of each landform unit, and to integrate all final prediction data into the prediction information for the target mountain area.
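The module structure of system 200 can be sketched as a small pipeline in which each module is an injected function. The class and field names are hypothetical, and the callables stand in for the zoning, model selection, and prior-constrained fusion steps described above; this is an illustration of the data flow, not the embodiment's actual software design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SoilThicknessSystem:
    """Hypothetical wiring of modules 21, 22 and 23 as injected callables."""
    # Module 21: raw data -> {unit id: feature data of that geomorphic unit}
    process_data: Callable[[object], Dict[str, object]]
    # Module 22: unit features -> preliminary predictions of the unit's optimal model
    train_and_predict: Callable[[object], List[float]]
    # Module 23 (per unit): (unit id, features, predictions) -> constrained predictions
    constrain: Callable[[str, object, List[float]], List[float]]

    def run(self, raw_data) -> Dict[str, List[float]]:
        units = self.process_data(raw_data)
        final = {}
        for unit_id, features in units.items():
            preliminary = self.train_and_predict(features)
            final[unit_id] = self.constrain(unit_id, features, preliminary)
        # Module 23 also integrates all units into the area-wide prediction.
        return final
```

Keeping each module behind a plain function boundary lets the zoning rule, the candidate models, or the prior-alignment method be swapped independently.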

[0045] Example 3
In another aspect, the present invention also proposes an electronic device. Figure 3 shows the electronic device according to the third embodiment of the present invention, which includes a memory 20, a processor 10, and a computer program 30 stored in the memory and executable on the processor. When the processor 10 executes the computer program 30, it implements the above-described method for predicting soil thickness in hilly and mountainous areas of southern China.

[0046] In some embodiments, the processor 10 may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip, used to run program code stored in memory 20 or process data, such as executing access restriction programs.

[0047] The memory 20 includes at least one type of readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 20 can be an internal storage unit of an electronic device, such as the hard disk of the electronic device. In other embodiments, the memory 20 can also be an external storage device of the electronic device, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, flash card, etc. Furthermore, the memory 20 can include both internal and external storage units of the electronic device. The memory 20 can be used not only to store application software and various types of data of the electronic device, but also to temporarily store data that has been output or will be output.

[0048] It should be noted that the structure shown in Figure 3 does not constitute a limitation on the electronic device. In other embodiments, the electronic device may include fewer or more components than shown, combine certain components, or have a different arrangement of components.

[0049] This invention also proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-described method for predicting soil thickness in hilly and mountainous areas of southern China.

[0050] Those skilled in the art will understand that the logic and/or steps represented in the flowcharts or otherwise described herein can, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device.

[0051] More specific examples of computer-readable media (a non-exhaustive list) include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and portable compact disc read-only memory (CD-ROM). Furthermore, the computer-readable medium can even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in computer memory.

[0052] It should be understood that various parts of the present invention can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods can be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, it can be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logical functions on data signals, application-specific integrated circuits (ASICs) having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), etc.

[0053] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.

[0054] The above embodiments merely illustrate several implementation methods of the present invention, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent should be determined by the appended claims.

Claims

1. A method for predicting soil thickness in hilly and mountainous areas of southern China, characterized in that the method includes: determining target feature data by filtering and preprocessing the collected data; dividing the target mountain area into multiple geomorphic units based on the target feature data, using slope as the partitioning basis and a preset model to determine the partition threshold; training multiple preset prediction models in parallel for each of the landform units to determine the corresponding optimal prediction model, and outputting prediction results based on the optimal prediction model; determining physical prior data corresponding to each landform unit based on the target feature data, and fusing the prediction results under the constraint of the physical prior data to determine the final prediction data of each landform unit; and integrating all the final prediction data to determine the prediction information of the target mountain area.

2. The method for predicting soil thickness in hilly and mountainous areas of southern China according to claim 1, characterized in that the steps of determining target feature data by filtering and preprocessing the collected data include: acquiring multi-source environmental covariate data, including topographic attributes, climate data, remote sensing spectra, and vegetation indices, as well as soil thickness sample data; screening the features of the multi-source environmental covariate data by recursive feature elimination to determine the associated feature data related to the soil thickness sample data; and combining different environmental covariates to construct interaction feature data, the interaction feature data and the associated feature data together constituting the target feature data.

3. The method for predicting soil thickness in hilly and mountainous areas of southern China according to claim 1, characterized in that the steps of dividing the target mountainous area into multiple geomorphic units based on the target feature data, using slope as the partitioning criterion and a preset model to determine the partitioning threshold, include: constructing and training a decision tree regression model with a predefined structure, using slope as the independent variable and soil thickness as the dependent variable, wherein limiting the maximum depth or the number of leaf nodes of the decision tree controls the output of a fixed number of split points; using the split points as partition thresholds to divide the target mountain area into multiple geomorphic units, wherein the geomorphic units include at least a deposition-dominated gentle slope unit, a transition unit in which erosion and deposition are in dynamic balance, and an erosion-dominated steep slope unit; and performing one-way ANOVA on the soil thickness samples within the divided geomorphic units, the zoning scheme being confirmed as effective if the significance level obtained from the analysis is lower than a preset threshold.

4. The method for predicting soil thickness in hilly and mountainous areas of southern China according to claim 1, characterized in that the steps of training multiple preset prediction models in parallel for each of the geomorphic units to determine the corresponding optimal prediction model include: for each geomorphic unit, constructing a random forest model, a quantile random forest model, and an extreme gradient boosting model in parallel, and dividing the target feature data of the geomorphic unit into a training set and a validation set; on the training set, tuning the hyperparameters of each preset model by five-fold cross-validation with estimation accuracy as the criterion to obtain the corresponding optimized model; and on the validation set, performing a multidimensional performance evaluation of each optimized model and determining the model with the best overall performance as the optimal prediction model for the corresponding landform unit.

5. The method for predicting soil thickness in hilly and mountainous areas of southern China according to claim 1, characterized in that the steps of outputting prediction results based on the optimal prediction model include: on the training set of the landform unit, scoring and ranking the importance of all features in the target feature data using the feature importance evaluation mechanism provided by the corresponding optimal prediction model; removing the features with the lowest importance scores one by one, and evaluating the performance of the optimal prediction model on the validation set of the landform unit after each change of the feature subset; and determining the feature subset that performs best on the validation set as the prediction feature data of the optimal prediction model for the current landform unit, so as to output the prediction results based on the prediction feature data and the optimal prediction model.

6. The method for predicting soil thickness in hilly and mountainous areas of southern China according to claim 1, characterized in that the steps of determining the physical prior data corresponding to each landform unit based on the target feature data, and performing constrained fusion of the prediction results based on the physical prior data to determine the final prediction data of each landform unit, include: for each geomorphic unit, generating a preliminary predicted probability distribution of soil thickness at each spatial location point within the geomorphic unit based on the prediction results output by the corresponding optimal prediction model; obtaining the target physical prior distribution of soil thickness corresponding to the dominant geomorphic process of the geomorphic unit; and using a preset distribution alignment function to reweight and deform the preliminary predicted probability distribution and map it to a physically constrained posterior distribution, so as to sample from the physically constrained posterior distribution to obtain the final prediction data of the spatial location point; wherein the preset alignment function minimizes the KL divergence between the adjusted preliminary predicted probability distribution and the target physical prior distribution.

7. The method for predicting soil thickness in hilly and mountainous areas of southern China according to claim 6, characterized in that the steps of generating a preliminary predicted probability distribution of soil thickness at each spatial location point within the geomorphic unit include: when the optimal prediction model is a quantile random forest, constructing the empirical distribution function directly from the multiple quantiles it outputs; and when the optimal prediction model outputs only point predictions, constructing, based on the prediction error statistics of the optimal prediction model on the validation set of the landform unit, a parameterized distribution centered on the point prediction value and with a standard deviation derived from those error statistics.

8. A soil thickness prediction system for hilly and mountainous areas in southern China, characterized in that the system, which implements the method for predicting soil thickness in hilly and mountainous areas of southern China as described in any one of claims 1 to 7, comprises: a data processing module, used to filter and preprocess the collected data to determine target feature data, and to divide the target mountain area into multiple geomorphic units based on the target feature data, using slope as the partitioning basis and a preset model to determine the partitioning threshold; a model training module, used to train multiple preset prediction models in parallel for each of the landform units to determine the corresponding optimal prediction model, and to output the prediction results based on the optimal prediction model; and a global prediction module, used to determine the physical prior data corresponding to each of the landform units based on the target feature data, to perform constrained fusion of the prediction results based on the physical prior data, to determine the final prediction data of each landform unit, and to integrate all the final prediction data to determine the prediction information of the target mountain area.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, When the program is executed by the processor, it implements the steps of a method for predicting soil thickness in hilly and mountainous areas in southern China as described in any one of claims 1 to 7.

10. An electronic device, characterized in that, It includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement a method for predicting soil thickness in hilly and mountainous areas in southern China as described in any one of claims 1-7.