A method and system for decoupling influencing factors of spatial variation of sediment discharge in a karst basin based on model combination

By combining correlation analysis and random forest models with partial least squares structural equation modeling, the method solves the problem of decoupling coupled multi-factor effects on sediment transport in karst watersheds, enabling accurate identification and effect decomposition of the factors driving sediment transport changes and improving the robustness and explanatory power of the model.

CN121456444B · Active · Publication Date: 2026-06-19 · INSTITUTE OF SUBTROPICAL AGRICULTURE CHINESE ACADEMY OF SCIENCES

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INSTITUTE OF SUBTROPICAL AGRICULTURE CHINESE ACADEMY OF SCIENCES
Filing Date
2025-12-02
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing research has difficulty in systematically quantifying the comprehensive impact of multi-factor coupling on sediment transport changes in karst basins. Traditional methods suffer from collinearity and path estimation reliability issues when identifying key driving factors, resulting in scattered conclusions, poor reproducibility, and a lack of robust decoupling analysis methods.

Method used

A combined approach of correlation analysis, random forest, and partial least squares structural equation modeling was adopted. By constructing the correspondence between latent variables and observed variables, representative observed variables were selected, a partial least squares structural equation model was constructed, path coefficients and effect strengths were calculated, and the direction and contribution rate of each latent variable on the spatial variation of sediment transport were analyzed.

Benefits of technology

It significantly improves the quantitative decoupling accuracy and model stability of the multi-factor effects of sediment transport in karst watersheds, and can accurately identify the main controlling factors under complex geomorphological and multi-source data conditions, providing a quantitative basis for soil and water conservation and ecological restoration decisions.

✦ Generated by Eureka AI based on patent content.

Figure CN121456444B_ABST (abstract drawing)

Abstract

This invention provides a method and system for decoupling the influencing factors of spatial variation in sediment transport in karst watersheds based on model combination, belonging to the field of hydrogeomorphological modeling technology. This invention constructs a basic watershed dataset, establishes the relationship between latent and observed variables within a framework of five factors: climate, lithology, soil, topography, and landscape, uses correlation analysis to screen variables, combines random forest analysis to assess importance and select representative variables, and utilizes partial least squares structural equation modeling to calculate path coefficients and effect strengths, quantitatively analyzing the direction and contribution rate of each latent factor to the spatial variation of sediment transport. This method achieves quantitative decoupling of multi-factor effects, improving the accuracy of identifying controlling factors, the depth of mechanism analysis, and model stability.

Description

Technical Field

[0001] This invention relates to the field of hydrogeomorphological modeling technology, and in particular to a method and system for decoupling the influencing factors of spatial variation of sediment transport in karst watersheds based on model combination.

Background Technology

[0002] Sediment transport in karst regions is influenced by multiple factors, including climate, lithology, soil, topography, and landscape patterns. Most existing studies focus on only one or a few factors, making it difficult to systematically quantify the combined impact of multi-factor coupling on sediment transport variations at a spatial scale. Particularly in karst watersheds characterized by widespread carbonate rock distribution, highly interconnected surface and groundwater hydrological processes, and complex and diverse landscape structures, the interaction of precipitation input, topographic relief, and landscape patterns results in significant spatial heterogeneity in sediment transport. Traditional single-model analysis or simple correlation analysis methods struggle to reveal the nonlinear coupling relationships between factors and their contribution to sediment transport mechanisms, making it difficult to achieve quantitative decoupling of the driving mechanisms.

[0003] Existing methods generally suffer from a disconnect between variable selection and causal identification. On the one hand, the set of variables affecting sediment transport is large and severely collinear, making it difficult for traditional selection methods based on linear assumptions to reliably identify key driving factors. On the other hand, while structural equation modeling can reflect the direct and indirect effects between latent variables, it is highly dependent on the representativeness of the observed indicators; if the initial variable selection is inadequate, the reliability of path estimation suffers. The unique hydrogeological structure, discontinuous soil distribution, and highly fragmented landscape of karst regions further amplify these problems, so existing studies reach scattered and poorly reproducible conclusions about the dominant factors, contribution rates, and interactions underlying the spatial variation of sediment transport. A comprehensive analytical method capable of robust decoupling under complex geomorphological conditions and multi-source data is still lacking.

Summary of the Invention

[0004] To overcome the shortcomings of existing technologies, the purpose of this invention is to provide a method and system, based on model combination, for decoupling the factors influencing the spatial variation of sediment transport in karst basins. By combining correlation analysis, random forest, and partial least squares structural equation modeling, the invention achieves quantitative decoupling of the effects of multiple factors on sediment transport in karst basins, significantly improving the accuracy of identifying the main controlling factors, the depth of mechanism analysis, and model stability.

[0005] To achieve the above objectives, the present invention provides the following solution:

[0006] A method for decoupling factors influencing spatial variation of sediment transport in karst watersheds based on model combination includes:

[0007] Construct the basic dataset for the target karst watershed;

[0008] Within the framework of five factors—climate, lithology, soil, topography, and landscape—a correspondence between latent variables and observed variables is established based on the aforementioned basic dataset.

[0009] Using the observed variables as input and sediment transport as output, Pearson correlation analysis is performed to select the observed variables with significant correlations and generate a candidate factor set;

[0010] A random forest model is established using the candidate factor set as input, and the importance of variables is evaluated based on the change in out-of-bag error (OOB error) to select representative observed variables;

[0011] Using climate, lithology, soil, topography, and landscape as latent variables, a partial least squares structural equation model is constructed in combination with the representative observed variables, and the path coefficients and effect strengths are calculated to obtain the model estimation results;

[0012] Based on the model estimation results, the influence direction and contribution rate of each latent variable on the spatial variation of sediment transport are analyzed, the dominant influencing factors are determined, and the decoupling results are output.
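The Pearson screening step in [0009] can be sketched as follows. This is a minimal Python illustration, assuming SciPy's `scipy.stats.pearsonr` as a stand-in for whatever statistics tool the embodiment actually used, and assuming the 0.05 significance threshold stated later in [0060]; the function name `pearson_screen` and the toy data are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_screen(X, y, names, alpha=0.05):
    """Keep observed variables whose Pearson correlation with y is significant.

    X: (n_watersheds, n_vars) array of observed variables.
    y: (n_watersheds,) array of sediment transport values.
    Returns (name, r, p) for every variable with p < alpha (the candidate factor set).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    candidates = []
    for j, name in enumerate(names):
        r, p = pearsonr(X[:, j], y)
        if p < alpha:
            candidates.append((name, float(r), float(p)))
    return candidates


# Hypothetical toy data: 40 "watersheds", one informative and one unrelated variable.
y = np.arange(40, dtype=float) - 19.5      # centred, symmetric sediment values
x_strong = y + 0.5 * np.sin(y)             # nearly linear in y
x_null = y ** 2                            # symmetric in y, so r is essentially 0
X = np.column_stack([x_strong, x_null])
selected = pearson_screen(X, y, ["x_strong", "x_null"])
print([name for name, _, _ in selected])   # ['x_strong']
```

Only variables that pass this screen would be handed to the random forest step, so the screen mainly serves to shrink the candidate pool before the more expensive importance analysis.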

[0013] A decoupling system for factors influencing spatial variation of sediment transport in karst watersheds based on model combination includes:

[0014] Basic data construction unit, used to build the basic dataset for the target karst watershed;

[0015] The latent variable mapping unit is used to establish the correspondence between latent variables and observed variables based on the basic dataset within the framework of five factors: climate, lithology, soil, topography, and landscape.

[0016] The correlation screening unit is used to perform Pearson correlation analysis with the observed variable as input and sediment transport as output, select the observed variable with significant correlation, and generate a candidate factor set.

[0017] The variable selection unit is used to build a random forest model with the candidate factor set as input, evaluate the importance of variables based on the change in out-of-bag error, and select representative observed variables.

[0018] The structural equation modeling unit is used to construct a partial least squares structural equation model using climate, lithology, soil, topography and landscape as latent variables, combined with the representative observed variables, to calculate path coefficients and effect strengths, and obtain model estimation results.

[0019] The decoupling analysis unit is used to analyze the direction and contribution rate of each latent variable to the spatial variation of sediment transport based on the model estimation results, determine the dominant influencing factors, and output the decoupling results.

[0020] The present invention discloses the following technical effects:

[0021] This invention incorporates five categories of latent variables—climate, lithology, soil, topography, and landscape—into a unified framework. By combining initial screening with correlation analysis and optimal selection using random forest, it effectively overcomes the limitations of traditional linear models in identifying multicollinear and high-dimensional heterogeneous data, achieving robust screening of key observational variables influencing the spatial variation of sediment transport. Compared to methods relying solely on single statistical correlation or principal component analysis, this method significantly improves the accuracy of identifying multi-factor synergistic effects and the discriminative power of dominant factors.

[0022] This invention introduces a PLS-SEM model based on the results of random forest optimization. By decomposing direct, indirect, and total effects using path coefficients, it achieves quantitative decoupling of the mechanisms by which climate, lithology, soil, topography, and landscape affect sediment transport. This model can simultaneously assess the causal chains and interaction directions among latent variables, making the positive and negative influence relationships in complex systems clearer. Its explanatory power is significantly higher than that of a single model, making it suitable for analyzing the highly heterogeneous characteristics of karst watersheds.

[0023] This invention establishes a closed-loop mechanism from variable selection to structural equation verification through multi-model cascading, avoiding dependence on large samples or normal distributions and improving the robustness of the models under complex terrain and limited sample conditions. The resulting decoupling results not only reveal the dominant role of landscape and topographic factors but also serve as a quantitative decision-making basis for watershed soil and water conservation and ecological restoration, providing scientific support for the research and management of water and sediment processes in karst regions.

Attached Figure Description

[0024] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0025] Figure 1 is a flowchart of the method provided in an embodiment of the present invention;

[0026] Figure 2 is a schematic diagram of the temporal variation of annual sediment transport in a typical karst basin from 1950 to 2015, provided in an embodiment of the present invention;

[0027] Figure 3 is a schematic diagram of the temporal variation of monthly sediment transport in three typical karst basins from 2003 to 2015, provided in an embodiment of the present invention;

[0028] Figure 4 is a schematic diagram of the spatiotemporal variation of watershed runoff, provided in an embodiment of the present invention;

[0029] Figure 5 is a schematic diagram of the spatiotemporal variation of watershed sediment transport, provided in an embodiment of the present invention;

[0030] Figure 6 is a schematic diagram of the Pearson correlation analysis between runoff, sediment transport, and climate, soil, topography, land use, and landscape factors, provided in an embodiment of the present invention;

[0031] Figure 7 is a schematic diagram of the relative importance of different variables to runoff as analyzed with the random forest algorithm, provided in an embodiment of the present invention;

[0032] Figure 8 is a schematic diagram of the results of the first PLS-SEM model analysis, provided in an embodiment of the present invention;

[0033] Figure 9 is a schematic diagram of the results of the second PLS-SEM model analysis, provided in an embodiment of the present invention;

[0034] Figure 10 is a schematic diagram of the spatiotemporal variations of RDs, R25, and PT in a study watershed, provided in an embodiment of the present invention;

[0035] Figure 11 is a schematic diagram of the correlation analysis between climate factors and runoff, and between landscape factors and sediment transport, provided in an embodiment of the present invention;

[0036] Figure 12 is a schematic diagram of the relationship between R25 and runoff, provided in an embodiment of the present invention;

[0037] Figure 13 is a schematic diagram of the relationship between RX3 and sediment transport volume, provided in an embodiment of the present invention;

[0038] Figure 14 is a schematic diagram of the changing trends of the water area (W), area-weighted mean shape index (SHAPE-AM), and landscape shape index (LSI) among the landscape factors, provided in an embodiment of the present invention.

Detailed Implementation

[0039] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0040] The purpose of this invention is to provide a method and system for decoupling the influencing factors of spatial variation of sediment transport in karst watersheds based on model combination. By constructing a combined system of correlation analysis, random forest and structural equation model, the method achieves accurate identification and effect decomposition of the influencing factors of spatial variation of sediment transport in karst watersheds, and significantly improves the explanatory power of multi-factor coupling relationship and the robustness of the model.

[0041] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0042] Figure 1 is the flowchart of the method provided in an embodiment of the present invention. As shown in Figure 1, this invention provides a method for decoupling the influencing factors of spatial variation of sediment transport in karst watersheds based on model combination, including:

[0043] Step 100: Construct the basic dataset for the target karst watershed;

[0044] Step 200: Within the framework of five factors—climate, lithology, soil, topography, and landscape—establish the correspondence between latent variables and observed variables based on the basic dataset;

[0045] Step 300: Using the observed variables as input and sediment transport as output, perform Pearson correlation analysis, select the observed variables with significant correlation, and generate a candidate factor set;

[0046] Step 400: Build a random forest model with the candidate factor set as input, evaluate the importance of variables based on the change in out-of-bag (OOB) error, and select representative observed variables;

[0047] Step 500: Using climate, lithology, soil, topography, and landscape as latent variables, and combining them with representative observed variables, construct a partial least squares structural equation model, calculate path coefficients and effect strengths, and obtain model estimation results;

[0048] Step 600: Based on the model estimation results, analyze the direction and contribution rate of each latent variable to the spatial variation of sediment transport, determine the dominant influencing factor, and output the decoupling results.

[0049] Specifically, this embodiment first involves basic data collection and processing. Meteorological data (precipitation, temperature, and potential evapotranspiration data) were obtained from 52 meteorological stations in the study watershed (the target karst watershed) and surrounding regions, provided by the China Meteorological Information Network. Potential evapotranspiration (PET) was calculated using the Penman formula, and precipitation and potential evapotranspiration were then interpolated with the CoKriging algorithm in ArcGIS 10.8 to obtain the spatial average for each watershed. Lithological maps were obtained from the Institute of Geochemistry, Chinese Academy of Sciences, and used to calculate the carbonate rock cover of the karst watershed. The Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) were retrieved from the Google Earth Engine platform. Land use data were taken from the 30 m annual land cover dataset of China for 1990 to 2021, which includes nine land use types: farmland, forest, shrubland, grassland, water area, snowfield, bare land, impervious surfaces, and wetlands. Soil data were extracted and classified from the Harmonized World Soil Database (HWSD) to obtain soil particle size distribution, soil bulk density (SBD), electrical conductivity (EC), calcium carbonate content (CAC), pH, and soil organic carbon content (SOC).

[0050] Topographic data were extracted from a 30-meter resolution digital elevation model (DEM) downloaded from NASA. The NASA DEM underwent preprocessing including format conversion, projection conversion, and masking, and the boundaries of the 40 watersheds were generated from the DEM in ArcGIS 10.8. For the landscape data, commonly used landscape indices reflecting changes in landscape pattern (edge indices, shape indices, diversity indices, etc.) were computed at three levels (patch, class, and landscape) from the land use maps using Fragstats 4.2. Runoff and sediment transport datasets were obtained from watershed hydrological stations and the *China Sediment Bulletin*, measured at the watershed outlets. All datasets were checked before publication to ensure their reliability and consistency.
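The diversity indices mentioned above are computed in this embodiment with Fragstats 4.2. As a hedged illustration of what two of them measure, the sketch below reimplements the landscape-level Shannon (SHDI) and Simpson (SIDI) diversity indices from class proportions, using the FRAGSTATS definitions SHDI = -Σ P_i ln P_i and SIDI = 1 - Σ P_i². It is not a substitute for Fragstats, and the tiny raster is hypothetical.

```python
import math
from collections import Counter


def class_proportions(cells):
    """Proportion of the landscape occupied by each land use class (raster cells)."""
    counts = Counter(cells)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def shdi(props):
    """Shannon's diversity index: -sum(P_i * ln P_i)."""
    return -sum(p * math.log(p) for p in props.values() if p > 0)


def sidi(props):
    """Simpson's diversity index: 1 - sum(P_i ** 2)."""
    return 1.0 - sum(p * p for p in props.values())


# Hypothetical 2 x 4 land use raster, flattened row by row.
cells = ["forest", "forest", "farmland", "farmland",
         "forest", "forest", "farmland", "farmland"]
props = class_proportions(cells)
print(round(shdi(props), 4), round(sidi(props), 4))  # 0.6931 0.5
```

Both indices increase as land use becomes more evenly split among more classes; two equally common classes give SHDI = ln 2 and SIDI = 0.5.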

[0051] Optionally, this embodiment adopts Partial Least Squares Structural Equation Modeling (PLS-SEM), a causal modeling method that combines principal component analysis with multiple regression and aims to maximize the explained variance of the relevant constructs. Under certain assumptions, the complex relationships between latent variables can be measured through their corresponding observed variables. The model consists of an inner model and an outer model: the inner model (structural model) describes the relationships among the latent variables, while the outer model (measurement model) describes the relationship between each latent variable and its observed variables. By combining the measurement model and the structural model, PLS-SEM establishes a conceptual model of the relationships between the independent and dependent variables. It uses an iterative algorithm to estimate the latent variable components of the measurement model and estimates the path coefficients of the structural model by partial least squares regression. The relationship between latent variables $\xi_j$ (structural model) can be represented as:

[0052] $$\xi_j = \sum_{i} \beta_{ji}\,\xi_i + \zeta_j$$

[0053] where $\xi_j$ ($j = 1, \dots, J$) denotes an endogenous latent variable; $\beta_{ji}$ is the path coefficient between the $i$-th exogenous latent variable $\xi_i$ and the $j$-th endogenous latent variable $\xi_j$; and $\zeta_j$ is the error term of the inner (structural) relationship. The relationship between a latent variable $\xi_j$ and its observed variables $x_{jk}$ (measurement model) can be expressed as:

[0054] $$x_{jk} = \lambda_{jk}\,\xi_j + \varepsilon_{jk}$$

[0055] where $\lambda_{jk}$ is the loading of the $k$-th observed variable in the $j$-th block (i.e. on latent variable $\xi_j$), and the error term $\varepsilon_{jk}$ represents the measurement error. The goodness-of-fit (GoF) index is used to validate the model and assess its predictive ability; it is the geometric mean of the average communality and the average $R^2$ value:

[0056] $$\mathrm{GoF} = \sqrt{\overline{\mathrm{Communality}} \times \overline{R^2}}$$
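The GoF index in [0056] is a plain geometric mean, so its arithmetic is easy to show. The sketch below assumes unweighted means over the supplied communalities and $R^2$ values (software packages may average communalities differently, e.g. over indicators rather than blocks); the input values are hypothetical.

```python
import math


def gof(communalities, r_squared):
    """Goodness of fit: sqrt(mean communality * mean R^2), unweighted means assumed."""
    mean_comm = sum(communalities) / len(communalities)
    mean_r2 = sum(r_squared) / len(r_squared)
    return math.sqrt(mean_comm * mean_r2)


# Hypothetical: communalities of two measurement blocks, R^2 of two endogenous latent variables.
print(round(gof([0.6, 0.8], [0.5, 0.3]), 4))  # 0.5292
```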

[0057] The PLS-SEM goodness-of-fit index is primarily used to evaluate the overall predictive ability of the model. In this model, the total effect between two variables is the sum of the direct and indirect effects: direct effects are given by the corresponding path coefficients, while indirect effects are carried along paths through mediating variables. By constructing a causal network between latent and observed variables, the model quantifies the direct and indirect effects among multiple factors, making it suitable for analyzing relationships in complex systems with small sample sizes. The method imposes few data requirements and is mainly applied to theory development and outcome prediction. This embodiment uses R and the "plspm" package to fit a component-based PLS-SEM model.
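The split of a total effect into direct and indirect parts described in [0057] can be illustrated with a small path matrix. This sketch assumes a recursive (acyclic) structural model, in which summing the products of path coefficients along every path gives $(I - B)^{-1} - I$; the three-variable chain and its coefficients are hypothetical, not estimates from the embodiment.

```python
import numpy as np


def effect_decomposition(B):
    """Direct, indirect, and total effects for a recursive path model.

    B[i, j] is the path coefficient from latent variable i to latent variable j.
    Total effects add up coefficient products along all paths:
    T = B + B^2 + B^3 + ... = (I - B)^-1 - I   (valid because B is nilpotent).
    """
    B = np.asarray(B, dtype=float)
    identity = np.eye(B.shape[0])
    total = np.linalg.inv(identity - B) - identity
    indirect = total - B          # everything not carried by the single direct arrow
    return B, indirect, total


# Hypothetical chain: climate -> soil -> sediment, plus a direct climate -> sediment path.
B = np.zeros((3, 3))
B[0, 1] = 0.5   # climate -> soil
B[1, 2] = 0.4   # soil -> sediment
B[0, 2] = 0.2   # climate -> sediment (direct)
direct, indirect, total = effect_decomposition(B)
print(round(indirect[0, 2], 3), round(total[0, 2], 3))  # 0.2 0.4
```

Here the indirect effect of climate on sediment is 0.5 × 0.4 = 0.2, and the total effect adds the direct path 0.2 to give 0.4.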

[0058] Random Forest (RF) is a nonparametric machine learning algorithm that builds an ensemble of randomized classification and regression trees and is used mainly for classification and prediction. It ranks influencing factors by their relative importance, which supports classification and prediction tasks. The algorithm uses bootstrap resampling to draw N training samples from the original dataset and builds a classification and regression tree on each, producing a forest of N trees; the final result is the majority vote of the trees (classification) or their average prediction (regression). In each bootstrap draw, the observations that are not selected form the out-of-bag (OOB) sample, which is used to evaluate model performance. The increase in mean squared error (MSE) on the OOB data after the values of a predictor are randomly permuted serves as a metric of that predictor's importance: the larger the increase, the more important the variable and the greater its contribution to the model. The algorithm requires little parameter tuning, can handle small sample sizes and complex data structures, and is widely used because it copes well with large and complex datasets. The RF implementation is based on the "randomForest" package in the R statistical environment, with computations performed within the Matlab 2016a processing framework.
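The OOB permutation importance described above can be sketched in Python. This is an assumption-laden stand-in for the R `randomForest` importance measure: it assembles the forest by hand from scikit-learn `DecisionTreeRegressor` trees (one bootstrap sample per tree, a random feature subset at each split via `max_features`) so that each tree's OOB rows are explicit, and it reports the unscaled mean increase in OOB MSE. The function name and the `max_features=1/3` default are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def oob_permutation_importance(X, y, n_trees=200, max_features=1 / 3, seed=0):
    """Mean increase in OOB mean squared error when each column is permuted.

    Each tree is grown on a bootstrap sample; rows never drawn for that tree
    form its out-of-bag (OOB) set and are used only to score that tree.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n, p = X.shape
    inc_mse = np.zeros(p)
    used_trees = 0
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)          # bootstrap draw with replacement
        oob = np.setdiff1d(np.arange(n), boot)     # rows not drawn for this tree
        if oob.size == 0:
            continue
        tree = DecisionTreeRegressor(max_features=max_features,  # random subset per split
                                     random_state=int(rng.integers(2**31)))
        tree.fit(X[boot], y[boot])
        base_mse = np.mean((tree.predict(X[oob]) - y[oob]) ** 2)
        for j in range(p):
            X_perm = X[oob].copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between column j and y
            inc_mse[j] += np.mean((tree.predict(X_perm) - y[oob]) ** 2) - base_mse
        used_trees += 1
    return inc_mse / max(used_trees, 1)


# Hypothetical data: only column 0 drives the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)
imp = oob_permutation_importance(X, y, n_trees=50)
print(imp[0] > imp[1])  # True
```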

[0059] As an optional implementation, this embodiment introduces a "core factor priority probability" and a "redundancy suppression weighted partitioning criterion" into random forest training. The core factor priority probability is the probability used for node feature sampling; it combines the statistical correlation strength between each observed variable and sediment transport volume obtained in the previous steps with the predictive importance reflected by that variable's error increment on the out-of-bag samples, and the higher the probability, the more often the variable enters the candidate set. When selecting the partitioning (split) variable for a node, the redundancy suppression weighted partitioning criterion simultaneously considers the error reduction at the current node, the variable's global core factor priority probability, and the maximum correlation between the variable and the variables already selected on the tree path. This balances local gains in partitioning accuracy against the global independence of variables, favoring candidates that significantly reduce error without duplicating variables already used. This embodiment handles a total of 103 observed variables and 5 latent variable categories, and ensures that at least one already-used variable participates in the redundancy check at each node, guaranteeing the effectiveness and interpretability of the path constraints.

[0060] This embodiment merges two types of evidence into a single, comparable sampling probability: the strength of the correlation with sediment transport (taken as an absolute value so that the sign does not affect the weight), and the increase in mean squared error on the out-of-bag samples after the variable is permuted, which measures how much the model depends on the variable. To obtain a stable probability distribution, exponential normalization is applied over all candidate variables, bringing the two types of evidence, which have different dimensions, onto a common zero-to-one scale and ensuring that the probabilities of all candidate variables sum exactly to one. To avoid over-concentration, the smoothing temperature is kept at its default value of 1 and no additional tunable weights are introduced; the ratio of the number of candidate features considered at each node to the total number of features is not less than 0.2, balancing exploration against computational cost; and significance is judged by the same criterion as the earlier screening, with a threshold of 0.05, so that variables with weak evidence are kept out of the sampling pool.
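The sampling probability in [0060] can be sketched as a softmax over fused evidence. The text does not state how the correlation strength and the OOB MSE increment are combined before normalization, so the fusion used here (|r| plus the MSE increment scaled by its maximum) is a hypothetical choice; the temperature of 1, the 0.2 minimum feature fraction, and the 0.05 threshold follow the text. Function names are hypothetical.

```python
import math

import numpy as np


def priority_probabilities(r_values, delta_mse, p_values, alpha=0.05, temperature=1.0):
    """Hypothetical 'core factor priority probability': softmax over fused evidence."""
    abs_r = np.abs(np.asarray(r_values, dtype=float))          # sign must not affect the weight
    inc = np.clip(np.asarray(delta_mse, dtype=float), 0.0, None)
    eligible = np.asarray(p_values, dtype=float) < alpha        # same 0.05 screen as before
    if not eligible.any():
        raise ValueError("no variable passes the significance screen")
    scale = inc.max() if inc.max() > 0 else 1.0
    evidence = abs_r + inc / scale                              # assumed fusion rule
    logits = np.where(eligible, evidence / temperature, -np.inf)
    logits = logits - logits[eligible].max()                    # numerical stability
    weights = np.exp(logits)                                    # exp(-inf) = 0 for excluded variables
    return weights / weights.sum()


def sample_node_features(probs, min_fraction=0.2, rng=None):
    """Draw a node's candidate features without replacement, weighted by probs."""
    rng = np.random.default_rng() if rng is None else rng
    m = max(1, math.ceil(min_fraction * len(probs)))            # at least 20% of all features
    m = min(m, int(np.count_nonzero(probs)))                    # cannot exceed eligible count
    return rng.choice(len(probs), size=m, replace=False, p=probs)


probs = priority_probabilities([0.8, -0.5, 0.3, 0.1], [4.0, 1.0, 2.0, 0.0],
                               [0.001, 0.01, 0.02, 0.4])
picked = sample_node_features(probs, rng=np.random.default_rng(0))
print(np.round(probs, 3), picked)
```

The fourth variable fails the significance screen and receives probability zero, so it can never be drawn; the others are drawn in proportion to their fused evidence.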

[0061] After completing node-level feature sampling, this embodiment calculates a weighted score for each candidate variable. This score consists of three parts: the error reduction of the current node (reflecting immediate gains), the priority probability of core factors (reflecting global importance), and redundancy penalty (reflecting the maximum correlation with variables from previous paths). When multiple candidates have similar scores, the candidate with the larger error reduction is selected first to ensure improved local fit. The lower limit of the redundancy penalty is fixed at 1, corresponding to the strength of a baseline without redundancy. The upper limit of the correlation is 1; when it approaches the upper limit, the penalty term increases to suppress duplicate selections that are highly correlated with variables already used on the path. This embodiment completes the above decision without introducing any new adjustable weights, ensuring that the selection logic is stable and consistent with prior evidence.
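A possible form of the node-level score in [0061] is sketched below. The text fixes only the qualitative behaviour (a penalty equal to 1 without redundancy that grows as the maximum correlation approaches 1, with ties broken by the larger error reduction), so the specific penalty 1/(1 - ρ_max), the multiplicative combination, and the `tie_tol` used to define "similar scores" are all assumptions for illustration.

```python
import numpy as np


def redundancy_penalty(rho_max, eps=1e-12):
    """Assumed penalty: 1 when there is no redundancy, unbounded as |rho_max| -> 1."""
    rho = min(abs(rho_max), 1.0 - eps)
    return 1.0 / (1.0 - rho)


def choose_split_variable(candidates, abs_corr, used_on_path, tie_tol=0.01):
    """Pick a split variable by error reduction * priority probability / redundancy penalty.

    candidates: {variable index: (error_reduction, priority_probability)}
    abs_corr: matrix of |Pearson r| between observed variables
    used_on_path: indices already used by ancestor nodes on this tree path
    """
    scored = []
    for j, (err_red, prob) in candidates.items():
        rho_max = max((abs_corr[j][k] for k in used_on_path), default=0.0)
        score = err_red * prob / redundancy_penalty(rho_max)
        scored.append((score, err_red, j))
    best = max(score for score, _, _ in scored)
    # Candidates within tie_tol of the best score count as "similar";
    # among them, the larger immediate error reduction wins.
    similar = [s for s in scored if s[0] >= best * (1.0 - tie_tol)]
    return max(similar, key=lambda s: s[1])[2]


abs_corr = np.array([[1.0, 0.95, 0.1],
                     [0.95, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
cands = {1: (5.0, 0.4), 2: (3.0, 0.3)}
print(choose_split_variable(cands, abs_corr, used_on_path=[]))   # 1
print(choose_split_variable(cands, abs_corr, used_on_path=[0]))  # 2
```

Once variable 0 is on the path, variable 1 (|r| = 0.95 with it) is heavily penalized, so the less redundant variable 2 is chosen even though its raw error reduction is smaller.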

[0062] Across the entire forest, this embodiment accumulates the weighted scores of the variables selected at each node to form a forest-level variable importance ranking. From this ranking, a representative set of observed variables is output in order of decreasing importance for direct use in the subsequent structural equation modeling. To reduce sampling randomness, bootstrap resampling is used during training: each bootstrap sample contains approximately 63% of the distinct observations in the original dataset, leaving approximately 37% as out-of-bag samples. The forest size is chosen by comprehensive validation from four candidate sizes: 50, 100, 200, and 300 trees. The empirical upper limit for the number of representative observed variables is 30, which controls the dimensionality and stability of the subsequent measurement models.
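Two quantitative statements in [0062] can be checked or sketched directly. The roughly 63%/37% split follows from the fact that a bootstrap sample of size n omits any given row with probability (1 - 1/n)^n, which tends to e^-1 ≈ 0.368; the forest-level ranking is a sum of per-node weighted scores truncated at 30 variables. The helper names below are hypothetical.

```python
import numpy as np


def unique_fraction(n, n_draws=200, seed=0):
    """Average share of distinct rows in a bootstrap sample of size n."""
    rng = np.random.default_rng(seed)
    shares = [np.unique(rng.integers(0, n, size=n)).size / n for _ in range(n_draws)]
    return float(np.mean(shares))


def representative_set(node_records, max_vars=30):
    """node_records: (variable_name, weighted_score) for every split in the forest."""
    totals = {}
    for name, score in node_records:
        totals[name] = totals.get(name, 0.0) + score    # accumulate across nodes and trees
    ranked = sorted(totals, key=totals.get, reverse=True)
    return ranked[:max_vars]                            # empirical cap on representative variables


print(round(1 - (1 - 1 / 1000) ** 1000, 3))  # 0.632 (analytical share of distinct rows)
print(representative_set([("R25", 0.5), ("SHAPE_AM", 0.7), ("R25", 0.4), ("LSI", 0.2)]))
```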

[0063] This embodiment departs from the conventional practice in traditional random forests of relying entirely on random selection or fixed parameters for variable sampling and node partitioning. By integrating the core factor priority probability with redundancy suppression, it achieves dynamic self-calibration of variable importance without adding manual tuning or hyperparameters.

[0064] The core of this embodiment lies in generating probabilistic variable sampling weights from the statistical evidence provided by the prior correlation analysis and the out-of-bag error increments, making it easier for high-contribution variables to be selected at nodes. At the same time, node-level redundancy penalties limit the repeated selection of collinear variables, improving interpretability and sparsity while maintaining model stability. This structure lets the variable selection process combine global relevance guidance with local error minimization, forming an adaptively adjusted ensemble of decision trees. Compared to traditional random forests that select partitioning variables solely based on information gain or pure error metrics, this scheme, through a joint design of probabilistic feature sampling and redundancy self-suppression, achieves robust high-dimensional feature selection in multi-factor coupled scenarios in karst basins, significantly improving the model's interpretability and the reliability of variable importance assessment.

[0065] To investigate the spatial differences in runoff and sediment transport variations and their influencing factors in karst watersheds, this study focuses on 40 sub-basins of the Xijiang and Wujiang rivers within the study area. To control severe soil erosion and restore the degraded environment, the region implemented the long-term, policy-driven Grain for Green Project from the late 1990s into the early 21st century, converting farmland into forest or grassland through natural vegetation restoration. Sediment transport showed a significant downward trend in the early 21st century and became relatively stable during the study period (as shown in Figure 2). Therefore, data records from 2009 to 2012 were selected for the study period (as shown in Figure 3).

[0066] The PLS-SEM model can maximize the explained variance of the relevant constructs and has low requirements for sample size and data distribution. This embodiment aims to leverage these advantages to provide a comprehensive and detailed understanding of the various factors controlling the IC (internal capacity) of karst watersheds. To determine the main factors controlling runoff and sediment transport changes in the 40 karst watersheds from 2009 to 2012, Pearson correlation analysis was first used to quantify the relationships between runoff, sediment transport, and the candidate influencing variables. Random forest (RF) analysis was then performed on the factors that significantly influenced runoff and sediment transport changes. Finally, the main influencing factors selected by the RF algorithm were input into the PLS-SEM model to quantitatively analyze the impact of climate, lithology, soil, topography, and landscape on watershed runoff and sediment transport changes.

[0067] The study selected 103 potential variables affecting runoff and sediment transport, grouped into five types: climate (17), lithology (1), soil (9), topography (27) and landscape (49). To decouple the complex relationships between runoff, sediment transport and the potential variables, PLS-SEM analysis was performed in R with the plspm package. This appendix systematically summarizes the potential factors affecting sediment transport in karst watersheds.

- Climatic factors include consecutive dry days (CDD), consecutive wet days (CWD), potential evapotranspiration (PE), rainfall indices of different intensities (such as R25, R50 and RX1) and temperature-related variables.
- Topographic factors include elevation (E), slope (S), aspect (A), the topographic wetness index (TWI), the stream power index (SPI) and watershed morphological characteristics (such as area, shape and river network density).
- Soil factors include physicochemical properties such as particle size composition (GC, SC, STC, CC), bulk density (SBD), organic carbon (SOC) and pH.
- Landscape indices describe landscape structure from multiple perspectives: patch scale (NP, PD, LPI), shape (LSI, SHAPE), spatial configuration (CONTIG, COHESION, AI) and diversity (SHDI, SIDI).
- Lithological factors focus on the extent of carbonate rock cover (CBC).

Together, these factors form a comprehensive parameter system for analyzing sediment transport mechanisms in karst regions. In this embodiment, the PLS-SEM model is used to explore a conceptual model of the interactions among multiple variables. It is hypothesized that the latent variables (climate, lithology, soil, topography and landscape factors) significantly affect the dependent variables (runoff and sediment transport).
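The "latent variable - observed variable" correspondence described here, and formalized in the claims as a one-to-one attribution, can be represented as a mapping in which each observed variable belongs to exactly one latent block. The following Python sketch assumes a dictionary representation. The block contents are only the example abbreviations listed in this paragraph, not the full 103-variable library. `LATENT_BLOCKS` and `invert_mapping` are hypothetical names.

```python
# Hypothetical excerpt of the latent -> observed mapping, using only the
# abbreviations listed in [0067]; the full 103-variable library is not reproduced.
LATENT_BLOCKS = {
    "climate": ["CDD", "CWD", "PE", "R25", "R50", "RX1"],
    "lithology": ["CBC"],
    "soil": ["GC", "SC", "STC", "CC", "SBD", "SOC", "pH"],
    "topography": ["E", "S", "A", "TWI", "SPI"],
    "landscape": ["NP", "PD", "LPI", "LSI", "SHAPE", "CONTIG", "COHESION", "AI", "SHDI", "SIDI"],
}


def invert_mapping(blocks):
    """Return observed -> latent, raising if any observed variable is attributed twice."""
    owner = {}
    for latent, observed in blocks.items():
        for var in observed:
            if var in owner:
                raise ValueError(f"{var} assigned to both {owner[var]} and {latent}")
            owner[var] = latent
    return owner
```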

[0068] This embodiment takes 40 watersheds in the Southwest China karst region as research objects to explore the main factors controlling watershed runoff and sediment transport variation. Tables 1 and 2 list the location, drainage area, mean runoff and mean sediment transport of the 40 watersheds. Drainage areas range from 420 km² to 339,175 km². Hefeng (No. 2) has the smallest drainage area and Gaoyao (No. 19) the largest. Mean runoff and sediment transport were calculated from four years of data (2009-2012). During the study period, mean runoff ranged from 146 to 1,445 mm, with Xiqiao (No. 37) having the smallest value and Guilin (No. 20) the largest. Mean sediment transport ranged from 2.0 to 246 t km⁻² (Tables 1 and 2). In general, both runoff and sediment transport tend to decrease as drainage area increases. For example, Guilin (No. 20, 2,527 km²), Lipu (No. 25, 892 km²) and Taiping (No. 33, 2,744 km²) have relatively small drainage areas. Their runoff is large (1,445, 1,038 and 1,201 mm), and so is their mean annual sediment transport (102, 170 and 164 t km⁻², respectively). Conversely, the Tian'e (No. 34, 104,772 km²) and Wuzhou (No. 35, 314,326 km²) basins are relatively large but have relatively small runoff (347 and 514 mm) and sediment transport (2 and 28 t km⁻²).

[0069] Table 1. Basic Statistical Information of 20 Hydrological Stations

[0070]
No. | Station | Longitude (E) | Latitude (N) | Area (km²) | Mean runoff (mm) | Mean sediment transport (t km⁻²)
1 | Baojialou | 108°19′ | 29°27′ | 3903 | 731 | 210
2 | Hefeng | 106°55′ | 26°56′ | 420 | 446 | 89
3 | Carp Pond | 107°16′ | 27°26′ | 4821 | 376 | 16
4 | Qixingguan | 104°57′ | 27°9′ | 2880 | 281 | 148
5 | Shibantang | 106°12′ | 27°9′ | 1441 | 325 | 53
6 | Shiqian | 108°13′ | 27°31′ | 758 | 479 | 34
7 | Sinan | 108°15′ | 27°56′ | 50352 | 398 | 9
8 | Wulong | 107°44′ | 29°20′ | 83035 | 475 | 29
9 | Yanhe | 108°30′ | 28°34′ | 54412 | 387 | 16
10 | Yangchang | 105°11′ | 26°39′ | 2438 | 392 | 122
11 | Changba | 107°41′ | 28°48′ | 5499 | 459 | 132
12 | Put this | 107°51′ | 25°59′ | 1442 | 640 | 86
13 | Caotouping | 104°57′ | 25°52′ | 4957 | 998 | 128
14 | Dadukou | 104°43′ | 26°17′ | 8104 | 341 | 246
15 | Dahuang River Estuary | 110°12′ | 23°35′ | 276582 | 504 | 31
16 | Duiting | 109°40′ | 24°26′ | 7456 | 860 | 80
17 | Fuyang | 111°16′ | 24°51′ | 482 | 578 | 31
18 | Gaoche | 105°40′ | 25°52′ | 2146 | 566 | 76
19 | Gaoyao | 112°28′ | 23°30′ | 339175 | 521 | 40
20 | Guilin | 110°19′ | 25°14′ | 2527 | 1445 | 102

[0071] Table 2. Basic Statistical Information of the Other 20 Hydrological Stations

[0072]
No. | Station | Longitude (E) | Latitude (N) | Area (km²) | Mean runoff (mm) | Mean sediment transport (t km⁻²)
21 | Jinji | 110°50′ | 23°13′ | 9094 | 753 | 163
22 | Laocun | 110°57′ | 24°15′ | 1582 | 659 | 145
23 | Leigongtan | 106°35′ | 25°25′ | 5439 | 589 | 24
24 | Libo | 107°52′ | 25°25′ | 1283 | 624 | 146
25 | Lipu | 110°24′ | 24°3′ | 892 | 1038 | 170
26 | Liuzhou | 109°24′ | 24°20′ | 45469 | 719 | 98
27 | Maling | 104°55′ | 25°11′ | 2143 | 469 | 156
28 | Malong | 108°19′ | 24°14′ | 3092 | 645 | 74
29 | Pingle | 110°40′ | 24°37′ | 11883 | 920 | 53
30 | Pingli River | 107°3′ | 25°50′ | 1410 | 545 | 35
31 | Rongshui | 109°15′ | 25°4′ | 23516 | 792 | 96
32 | Sancha | 108°57′ | 24°28′ | 16488 | 639 | 81
33 | Taiping | 110°37′ | 23°43′ | 2744 | 1201 | 164
34 | Tian'e | 107°10′ | 24°59′ | 104772 | 347 | 2
35 | Wuzhou | 111°20′ | 23°28′ | 314326 | 514 | 28
36 | Wuxuan | 109°39′ | 23°35′ | 195730 | 502 | 34
37 | Xiqiao | 103°38′ | 25°1′ | 3042 | 146 | 23
38 | Yongwei | 109°17′ | 25°42′ | 12883 | 546 | 213
39 | Zhanyi | 103°50′ | 25°35′ | 572 | 156 | 8
40 | Zouwei | 108°53′ | 23°23′ | 1867 | 814 | 38

[0073] Annual runoff varied over time from 83 to 1,781 mm. The minimum occurred in the Xiqiao basin (No. 37) in 2011 and the maximum in the Taiping basin (No. 33) in 2010 (Figure 4). Annual sediment transport ranged from 1 to 427 t km⁻². The minimum occurred in the Zhanyi basin (No. 39) in 2011 and the maximum in the Taiping basin (No. 33) in 2012 (Figure 5).

[0074] Correlation analysis between the 103 potential variables and runoff or sediment transport showed that most factors were significantly correlated with runoff or sediment transport (P < 0.05) (Figure 6). In Figure 6, * and ** indicate that a variable is significantly correlated with sediment transport at P < 0.05 and P < 0.01, respectively. Likewise, ^ and ^^ indicate significant correlation with runoff at P < 0.05 and P < 0.01, respectively. In total, 60 factors were significantly correlated with runoff and 23 with sediment transport. The variables correlated with runoff are mainly landscape and climate variables, whereas topographic factors have a greater influence on sediment transport. Specifically, the factors significantly correlated with runoff comprise 16 climate, 1 lithology, 5 soil, 8 topographic and 30 landscape factors. The factors significantly correlated with sediment transport comprise 2 climate, 1 lithology, 3 soil, 10 topographic and 7 landscape factors.

[0075] Descriptive statistics of the selected factors influencing runoff or sediment transport are given in Tables 3 and 4: mean, median, maximum, minimum, standard deviation and coefficient of variation (CV). Several topographic factors exhibit stronger variability (CV > 100%) than the other factors.

- Climate factors have CVs from 8% to 48%. Maximum daily temperature (Tx) is relatively stable, with low variability (CV ≤ 10%).
- Soil properties have CVs from 2% to 60%. Soil bulk density (SBD) and pH show low dispersion and weak variability (CV ≤ 10%).
- Topographic factors have CVs from 4% to 217%. The topographic wetness index (TWI), drainage density (DD) and sediment transport capacity index (LS) are relatively stable, with low variability (CV ≤ 10%). In contrast, the following show strong variability (CV > 100%): catchment area (AREA), catchment perimeter (CP), watershed length (L), river length (SL), fracture magnitude (SM), stream power index (SPI), minimum elevation (Emin) and watershed outlet elevation (EO).
- Landscape factors have CVs from 1% to 212%. The mean shape index (SHAPE_MN), mean fractal dimension index (FRAC_MN), mean perimeter-area ratio (PARA_MN) and perimeter-area fractal dimension (PAFRAC) show weak variability (CV ≤ 10%). The number of patches (NP) and landscape shape index (LSI) show strong variability (CV > 100%). The remaining landscape variables show moderate variability (10% < CV ≤ 100%).
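The CV and the variability classes used above can be computed as shown below. The source does not state whether the population or the sample standard deviation was used, so this sketch assumes the sample standard deviation. The function names are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent, using the sample standard deviation (n - 1 denominator)."""
    return stdev(values) / mean(values) * 100.0


def variability_class(cv_percent):
    """Thresholds used in [0075]: weak <= 10%, moderate 10-100%, strong > 100%."""
    if cv_percent <= 10.0:
        return "weak"
    if cv_percent <= 100.0:
        return "moderate"
    return "strong"
```

For example, the summary values for AREA in Table 3 give 86340 / 40277 × 100 ≈ 214%, which falls in the strong-variability class. The water percentage W, at exactly 100%, falls in the moderate class.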

[0076] Table 3. Descriptive statistics of the selected potential variables affecting runoff or sediment transport (first group).

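[0077]
Factor | Variable | Mean | Median | Maximum | Minimum | SD | CV
Climate | RDs | 102.2 | 103.4 | 118.8 | 70.3 | 11.7 | 11%
Climate | RX3 | 116.4 | 108.3 | 178.1 | 66.4 | 28.2 | 24%
Climate | CDD | 30.7 | 29.5 | 58.5 | 19.3 | 9.6 | 31%
Climate | CWD | 7.2 | 7.2 | 11.0 | 4.8 | 1.5 | 21%
Climate | PT | 1106.4 | 1067.8 | 1621.1 | 655.3 | 273.5 | 25%
Climate | RD25 | 12.3 | 12.1 | 957.3 | 5.5 | 4.4 | 36%
Climate | R25 | 520.9 | 532.6 | 957.3 | 209.6 | 199.6 | 38%
Climate | RD50 | 3.3 | 2.8 | 458.0 | 0.5 | 1.6 | 47%
Climate | R50 | 221.5 | 190.9 | 458.0 | 27.3 | 105.7 | 48%
Climate | RI | 7.3 | 7.0 | 178.1 | 4.0 | 1.9 | 25%
Climate | RX1 | 76.9 | 76.7 | 94.6 | 46.6 | 8.6 | 11%
Climate | RX2 | 100.5 | 95.6 | 145.1 | 56.4 | 20.5 | 20%
Climate | RX4 | 131.0 | 121.6 | 203.4 | 70.4 | 32.5 | 25%
Climate | RX5 | 140.2 | 133.5 | 212.0 | 75.4 | 32.6 | 23%
Climate | T | 17.2 | 16.5 | 21.6 | 12.7 | 2.5 | 14%
Climate | Tx | 28.7 | 29.1 | 50.0 | 23.9 | 2.4 | 8%
Lithology | CBC | 54.4 | 57.2 | 93.2 | 0.8 | 28.9 | 53%
Soil | pH | 5.7 | 5.6 | 6.5 | 5.1 | 0.3 | 6%
Soil | CAC | 1.2 | 1.0 | 3.2 | 0.2 | 0.7 | 60%
Soil | EC | 0.1 | 0.1 | 0.2 | 0.1 | 0.0 | 11%
Soil | STC | 32.0 | 31.3 | 44.3 | 25.5 | 3.7 | 11%
Soil | CC | 34.5 | 34.2 | 44.3 | 23.2 | 5.5 | 16%
Soil | SBD | 1.3 | 1.3 | 1.6 | 1.3 | 0.0 | 2%
Soil | SOC | 1.3 | 1.3 | 6.5 | 1.1 | 0.2 | 12%
Topography | S | 17.8 | 18.0 | 23.8 | 11.0 | 2.7 | 15%
Topography | TWI | 6.0 | 6.0 | 6.7 | 5.5 | 0.3 | 4%
Topography | AREA | 40277 | 4365 | 339175 | 420.0 | 86340 | 214%
Topography | CP | 1136 | 428.9 | 6579 | 102.5 | 1637 | 144%
Topography | L | 237.3 | 114.2 | 1082 | 30.3 | 278.3 | 117%
Topography | FF | 0.3 | 0.3 | 0.5 | 0.2 | 0.1 | 22%
Topography | SL | 19854 | 1998 | 169083 | 207.2 | 42986 | 217%
Topography | DD | 0.5 | 0.5 | 0.6 | 0.4 | 0.0 | 7%
Topography | SM | 4025 | 425.5 | 34268 | 38.0 | 8734 | 217%
Topography | LS | 64.2 | 63.1 | 76.8 | 56.0 | 5.3 | 8%
Topography | SPI | 49170 | 18642 | 236604 | 1721 | 69597 | 142%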

[0078] Table 4. Descriptive statistics of the selected potential variables affecting runoff or sediment transport (second group).

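[0079]
Factor | Variable | Mean | Median | Maximum | Minimum | SD | CV
Topography | Emin | 442.3 | 200.0 | 2888.0 | 2.0 | 492.3 | 111%
Topography | Emax | 2144.3 | 2014.5 | 2888.0 | 1114.0 | 532.9 | 25%
Topography | E | 984.0 | 875.9 | 2076.0 | 294.9 | 524.0 | 53%
Topography | EO | 461.7 | 237.0 | 1877.0 | 35.0 | 488.3 | 106%
Topography | HI | 0.3 | 0.3 | 93.2 | 0.1 | 0.1 | 35%
Topography | DF | 522.3 | 454.9 | 1183.5 | 104.2 | 262.1 | 50%
Landscape | SHDI | 1.0 | 1.0 | 1.3 | 0.6 | 0.2 | 16%
Landscape | W | 0.6 | 0.4 | 2.3 | 0.0 | 0.6 | 100%
Landscape | NP | 3150 | 316.5 | 26072 | 57.0 | 6687 | 212%
Landscape | LSI | 28.0 | 15.0 | 108.4 | 6.1 | 29.9 | 107%
Landscape | SHAPE_AM | 14.0 | 8.5 | 54.7 | 3.7 | 13.4 | 95%
Landscape | PR | 5.3 | 5.0 | 6.0 | 3.0 | 0.7 | 14%
Landscape | RPR | 87.5 | 83.3 | 100.0 | 50.0 | 12.4 | 14%
Landscape | FL | 23.8 | 22.1 | 83.6 | 7.6 | 8.2 | 34%
Landscape | WL | 57.6 | 58.3 | 83.9 | 29.1 | 13.8 | 24%
Landscape | GL | 17.0 | 14.5 | 47.6 | 2.5 | 10.6 | 63%
Landscape | LPI | 50.4 | 52.9 | 83.6 | 11.9 | 18.8 | 37%
Landscape | ED | 7.9 | 8.1 | 10.3 | 4.1 | 1.6 | 20%
Landscape | GYRATE_MN | 927.4 | 913.2 | 1155.4 | 754.5 | 101.0 | 11%
Landscape | GYRATE_CV | 179.4 | 186.8 | 234.0 | 117.1 | 32.9 | 14.1%
Landscape | SHAPE_MN | 1.3 | 1.3 | 1.5 | 1.2 | 0.1 | 6%
Landscape | SHAPE_CV | 69.1 | 70.1 | 91.1 | 49.1 | 10.4 | 15%
Landscape | FRAC_MN | 1.0 | 1.0 | 22.6 | 1.0 | 0.0 | 1%
Landscape | FRAC_CV | 3.9 | 3.9 | 4.8 | 3.0 | 0.4 | 11%
Landscape | PARA_MN | 34.8 | 34.7 | 36.8 | 32.9 | 1.0 | 3%
Landscape | PARA_AM | 17.1 | 17.3 | 22.6 | 9.4 | 3.4 | 20%
Landscape | PARA_CV | 19.7 | 19.9 | 23.4 | 15.8 | 1.9 | 10%
Landscape | CONTIG_MN | 0.1 | 0.1 | 0.2 | 0.1 | 0.0 | 19%
Landscape | CONTIG_AM | 0.6 | 0.6 | 1.7 | 0.4 | 0.1 | 15%
Landscape | CONTIG_CV | 131.3 | 125.1 | 189.1 | 104.0 | 19.8 | 15%
Landscape | PAFRAC | 1.6 | 1.7 | 76.5 | 1.6 | 0.0 | 2%
Landscape | ENN_MN | 2801.6 | 2768.2 | 3750.8 | 2253.9 | 313.5 | 11%
Landscape | CONTAG | 42.9 | 42.8 | 66.3 | 9.1 | 10.9 | 25%
Landscape | PLADJ | 57.3 | 56.8 | 76.5 | 43.5 | 8.5 | 15%
Landscape | DIVISION | 0.7 | 0.7 | 1.0 | 0.3 | 0.2 | 25%
Landscape | SPLIT | 5.2 | 3.5 | 22.9 | 1.4 | 4.8 | 92%
Landscape | SIDI | 0.6 | 0.6 | 1.2 | 0.3 | 0.1 | 19%
Landscape | MSIDI | 0.8 | 0.9 | 1.2 | 0.3 | 0.2 | 26%
Landscape | SHEI | 0.6 | 0.6 | 1.0 | 0.4 | 0.1 | 19%
Landscape | SIEI | 0.7 | 0.7 | 78.39 | 0.4 | 0.1 | 19%
Landscape | MSIEI | 0.5 | 0.5 | 0.91 | 0.2 | 0.2 | 29%
Landscape | AI | 58.8 | 58.2 | 78.4 | 45.3 | 8.3 | 14%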

[0080] In Tables 3 and 4, SD represents standard deviation; CV represents coefficient of variation. Italics indicate variables significantly correlated with runoff, regular font indicates variables significantly correlated with sediment transport, and bold font indicates variables significantly correlated with both runoff and sediment transport.

[0081] Specifically, Pearson correlation analysis was used to select the factors significantly correlated with runoff (P < 0.05). The importance of each of these factors to runoff was then determined with the RF algorithm (Figure 7). Based on the normalized RF importance values, the three variables with the highest relative importance in each factor group were selected as observed variables for the PLS-SEM framework:

- Climate factors: heavy rainfall (R25), rainy days (RDs) and total annual precipitation (PT).
- Soil factors: pH, SBD and SOC.
- Topographic factors: outlet elevation (EO), elevation (E) and the hypsometric integral (HI).
- Landscape factors: farmland (FL), edge density (ED) and the area-weighted mean perimeter-area ratio (PARA_AM).
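The top-three selection can be sketched as follows. The sketch assumes the importance scores have already been produced by a random forest, for example as the increase in out-of-bag mean squared error when a variable is permuted. Dividing by the largest score is an assumed normalization; the source only says that the values were normalized. Function and parameter names are hypothetical.

```python
def top_k_per_block(importance, blocks, k=3):
    """Keep the k variables with the highest normalized RF importance in each block.

    importance: observed variable -> raw RF importance (e.g. OOB MSE increase).
    blocks: latent variable -> candidate observed variables that passed Pearson screening.
    Scores are divided by the largest score (assumed normalization).
    """
    top = max(importance.values())
    if top <= 0:
        raise ValueError("at least one importance score must be positive")
    relative = {var: score / top for var, score in importance.items()}
    selected = {}
    for latent, candidates in blocks.items():
        ranked = sorted((v for v in candidates if v in relative),
                        key=lambda v: relative[v], reverse=True)
        selected[latent] = ranked[:k]
    return selected
```

Blocks with three or fewer screened candidates, such as lithology with CBC alone, pass through unchanged.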

[0082] The key factors selected by the RF analysis were used as observed variables for the corresponding latent variables. The PLS-SEM model was then used to decouple the relative importance of climate, lithology, soil, topography and landscape for runoff. The goodness of fit (GOF) of the model was 0.76. Because this exceeds 0.5, the model is considered meaningful. Together, climate, lithology, soil, topography and landscape explain 79% of the total runoff variation (Figure 8). The path coefficients (β) rank in the order climate > lithology > topography > soil > landscape. Climate factors (R25, RDs and PT) have the greatest impact on runoff (β = 0.589), with a highly significant positive effect (P < 0.01). Climate therefore plays the dominant role in runoff variation, while the other factors have smaller impacts. Lithology, soil, topography and landscape all have non-significant negative effects on runoff.

[0083] Optionally, for sediment transport, Pearson correlation analysis was used to identify the influencing climatic factors (RDs and maximum 3-day precipitation, RX3) and soil factors (EC, CAC and pH). The topographic and landscape factors controlling sediment transport were obtained with the RF algorithm to form the PLS-SEM framework. However, the PLS-SEM model predicted poorly when the topographic and landscape observed variables determined by the RF algorithm were used. Therefore, various candidate PLS-SEM frameworks were simulated, and the optimal solution was obtained with reselected topographic and landscape factors. The form factor (FF) and slope (S) were selected as the main topographic factors. The landscape shape index (LSI), area-weighted mean shape index (SHAPE_AM) and water percentage (W) were selected as the main landscape factors.

[0084] The PLS-SEM model was used to decouple the relative importance of climate, lithology, soil, topography and landscape for sediment transport. The model's GOF was 0.51, indicating that the model is meaningful. As Figure 9 shows, climate, lithology, soil, topography and landscape together explain 59% of the variability in sediment transport. Except for climate, lithology, soil, topography and landscape all have significant effects on sediment transport variation (Table 5). Landscape factors have the greatest impact (P < 0.01, β = -0.458), followed by lithology (P < 0.05, β = -0.337), topography (P < 0.05, β = 0.246), soil (P < 0.1, β = -0.198) and climate (β = -0.005). Topographic factors have a significant positive effect on sediment transport, whereas lithology, soil and landscape factors have significant negative effects (Figure 9).
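The GOF values reported in [0082] and [0084] come without a formula. This sketch assumes two things: the commonly used definition for PLS path models, namely the geometric mean of the average communality of the measurement blocks and the average R² of the endogenous constructs; and a single endogenous construct per model. Under these assumptions, the reported GOF and R² values imply the following average communalities:

```latex
\begin{aligned}
\mathrm{GoF} &= \sqrt{\overline{\mathrm{Com}}\cdot\overline{R^{2}}}
  \quad\Rightarrow\quad \overline{\mathrm{Com}} = \mathrm{GoF}^{2} / \overline{R^{2}} \\
\text{runoff:}\quad \overline{\mathrm{Com}} &= 0.76^{2} / 0.79 = 0.5776 / 0.79 \approx 0.731 \\
\text{sediment transport:}\quad \overline{\mathrm{Com}} &= 0.51^{2} / 0.59 = 0.2601 / 0.59 \approx 0.441
\end{aligned}
```

If this definition applies, the lower GOF of the sediment model reflects both its lower R² and weaker shared variance between the reselected indicators and their latent variables.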

[0085] Table 5. Standardized path coefficients (direct effects) of the PLS-SEM models.

[0086]
Latent variable | Runoff: direct effect | P | Sediment transport: direct effect | P
Topography (landform) | -0.106 | 0.409 | 0.246 | 0.044
Landscape | -0.050 | 0.734 | -0.458 | 0.000
Climate | 0.589 | 0.000 | -0.005 | 0.971
Soil | -0.079 | 0.543 | -0.198 | 0.099
Lithology | -0.181 | 0.106 | -0.337 | 0.023
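The rankings stated in [0082] and [0084] follow from sorting the Table 5 coefficients by absolute value. The Python sketch below reproduces them and labels significance with the thresholds used in the text. The coefficients are transcribed from Table 5. The "share" field is an assumed definition of contribution rate (|β| divided by the sum of |β|), which the source does not define. Function names are hypothetical.

```python
# Standardized direct effects and P-values transcribed from Table 5:
# latent variable -> (runoff beta, runoff P, sediment beta, sediment P)
TABLE5 = {
    "topography": (-0.106, 0.409, 0.246, 0.044),
    "landscape": (-0.050, 0.734, -0.458, 0.000),
    "climate": (0.589, 0.000, -0.005, 0.971),
    "soil": (-0.079, 0.543, -0.198, 0.099),
    "lithology": (-0.181, 0.106, -0.337, 0.023),
}


def significance_level(p):
    """Smallest threshold used in the text (0.01, 0.05, 0.1) that p falls below."""
    for level in (0.01, 0.05, 0.1):
        if p < level:
            return level
    return None  # not significant at P < 0.1


def rank_effects(response):
    """Rank latent variables by |beta| for response 'runoff' or 'sediment'.

    Each entry is (name, beta, direction, significance level, share), where
    share = |beta| / sum of |beta| is an assumed definition of contribution rate.
    """
    col = 0 if response == "runoff" else 2
    total = sum(abs(vals[col]) for vals in TABLE5.values())
    ranked = sorted(TABLE5.items(), key=lambda item: abs(item[1][col]), reverse=True)
    return [
        (name, vals[col], "+" if vals[col] > 0 else "-",
         significance_level(vals[col + 1]), abs(vals[col]) / total)
        for name, vals in ranked
    ]
```

For sediment transport, the ranking is landscape, lithology, topography, soil, climate, matching [0084]. Under the assumed share definition, landscape accounts for about 37% (0.458 / 1.244) of the summed absolute path coefficients.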

[0087] Runoff increases significantly with increasing precipitation, consistent with many previous studies. Based on the RF results, PT, RDs and R25 were selected as the main climate variables. They reflect precipitation amount and intensity and have a significant positive impact on runoff. R25 is often used to represent extreme weather events, so frequent extreme events may strongly influence runoff variation in the climate-sensitive Southwest China karst region. The spatiotemporal variations of R25, RDs and PT are largely consistent with those of runoff (Figure 10), and all three are highly significantly correlated with runoff (P < 0.01; Figure 11). Furthermore, a larger proportion of karst area generally corresponds to lower runoff and R25 (Figure 12). In Figure 12, r denotes the Pearson correlation coefficient, and r_CBC denotes the partial correlation between R25 and runoff controlling for the proportion of karst area (CBC). This is likely due to the complex hydrogeological conditions of karst regions, where precipitation enters the groundwater system through numerous surface pores and fissures. Generally, surface runoff occurs on slopes in karst watersheds only when precipitation exceeds 60 mm. However, extreme rainfall can rapidly saturate the infiltration capacity of the epikarst (surface karst zone), producing surface runoff. The high hydrological connectivity of karst regions further strengthens the influence of climate on runoff. Because rainfall intensity in these areas is high, antecedent soil moisture in the watershed is relatively high. This reduces the infiltration capacity for subsequent rainfall and plays a crucial role in the generation and intensity of runoff.
In the karst regions of the Mediterranean, extreme rainfall also leads to significant variations in river flow. For non-karst watersheds, previous studies have also shown that climate is the main influencing factor on runoff variation. Overall, runoff in karst regions is mainly influenced by climate, especially extreme weather events, while lithology, soil, topography, and landscape have less significant impact on runoff.

[0088] Climate, lithology, soil, topography and landscape factors together explain 59% of the total variability in sediment transport. Notably, sediment transport in karst regions is more sensitive to specific geological features, soil properties, topography and highly heterogeneous landscapes. Climate factors, by contrast, have a relatively small impact on its variation among karst basins. Normal precipitation events generally have no significant impact on soil erosion in this region. During extreme or erosive rainfall events, eroded sediment is typically transported into the underground system through seepage structures such as fissures and sinkholes. As a result, eroded sediment may block sinkhole outlets in karst depressions and cause flooding. Moreover, large amounts of eroded sediment are deposited in karst depressions or transported to another basin through underground channels. These distinctive hydrogeological structures weaken the effect of precipitation on sediment transport variation. Compared with runoff, sediment transport is less sensitive to extreme rainfall (r = 0.33; Figure 13). In Figure 13, r denotes the Pearson correlation coefficient, and r_CBC denotes the partial correlation between RX3 and sediment transport controlling for the proportion of karst area. Partial correlation analysis with the proportion of karst area as a control variable shows that the correlation between runoff and extreme rainfall is significant, whereas the correlation between sediment transport and extreme rainfall is not. Previous studies have also shown that extreme rainfall affects runoff more than sediment transport.
Analyses of the factors influencing sediment transport variation therefore need to account for the complex topography and high heterogeneity of the karst system, because precipitation and other external conditions have a relatively small impact on sediment transport variation among karst basins.

[0089] Generally, variations in sediment transport are closely related to lithological characteristics. Higher coverage of soluble carbonate rocks results in lower surface runoff and sediment transport. Furthermore, soil in karst regions is much thinner than in non-karst regions, and its spatial distribution is extremely uneven and discontinuous. Eroded sediment is deposited primarily in low-lying areas such as karst depressions, leading to a continuous reduction of erodible soil on steep slopes. Topographic factors can provide kinetic energy for surface sediment transport, affecting not only soil thickness but also the occurrence and intensity of sediment transport. These distinctive topographic features strongly influence sediment generation and its delivery to rivers in karst regions.

[0090] It is worth emphasizing that landscape factors have the greatest impact on sediment transport. As expected, the spatial variations of W, SHAPE_AM and LSI are consistent with that of sediment transport, and each is significantly correlated with it (P < 0.05; Figures 11 and 14). Overall, landscape is a crucial factor influencing soil erosion processes in watersheds. From a landscape ecology perspective, the landscape structure of a watershed governs surface nutrient, soil erosion and sediment transport processes. In karst regions, landscape heterogeneity is high because of the widespread soluble carbonate rocks, the discontinuous distribution of soils and the spatial fragmentation of vegetation. In recent years, China has implemented a series of ecological restoration projects, such as converting farmland to forest and controlling karst rocky desertification. These projects improve ecosystem stability and thereby reduce the risk of rocky desertification in karst regions. They thus contribute to the sustainable development of the landscape and ecological environment in karst areas.

[0091] Forests and grasslands benefit soil and water conservation, whereas large-scale expansion of farmland has led to significant soil erosion. Before the 21st century, land cover change in the southwestern karst region intensified runoff and sediment storage and transport processes. Since 2000, ecological restoration projects have reduced the risk of soil erosion and rocky desertification in the southwestern karst region of China. These projects include vegetation restoration and reconstruction, soil and water conservation, water resource development and soil quality improvement. However, the permissible soil loss in the karst region is only 30–68 t km⁻², and once soil is eroded it is difficult to restore. In the future, two directions apply. First, the distribution and quantity of arable land must be considered, with greater emphasis on controlling activities such as deforestation to further reduce soil erosion. Second, a highly heterogeneous and fragmented landscape should be maintained in the karst regions of southwestern China to reduce surface runoff and soil erosion.

[0092] It is noteworthy that the main factors influencing sediment transport variation differ between karst and non-karst regions. The Loess Plateau is one of the most severely eroded regions in China and worldwide, and there topography and land use are the primary factors affecting sediment transport variation. In recent years, a series of soil and water conservation measures have been implemented on the Loess Plateau:

- engineering measures (dams, reservoirs);
- biological measures (afforestation and grassland restoration);
- agricultural measures (no-till farming and crop rotation).

Vegetation cover has generally increased. The main driver of sediment reduction has gradually shifted from terrace construction, dam construction and precipitation change to vegetation restoration. An analysis of sediment transport controls in the Godavari Basin of the Indian Peninsula found that topography, lithology and land use significantly affect sediment transport variation. In the hilly red soil regions of southern China, land use composition and pattern, as well as geomorphology, significantly affect specific sediment yield. However, previous studies have not quantified the relative importance of different factors for sediment transport variation in heterogeneous watersheds.

[0093] In this embodiment, mean sediment transport in the 40 karst basins ranged from 2 to 246 t km⁻² (Tables 1 and 2), markedly lower than in non-karst basins. Mean sediment transport in the red soil region of southern China and on the Loess Plateau is much higher than in karst regions. In karst regions, most of the soil covering the bedrock has already been eroded, and the soil formation rate is much lower than the erosion rate. Karst regions have a high proportion of carbonate rocks. Their widespread fissures and conduits let precipitation infiltrate into and be stored in the underground karst system. They also suppress the formation of surface runoff, which reduces sediment transport driven by surface runoff. Some eroded soil fills karst conduits, blocking drainage outlets in karst depressions. Although sediment transport in karst basins is relatively low, the risk of soil erosion remains high.

[0094] In karst regions, such as the Mediterranean region of Europe, terraced agriculture and irrigation-based sustainable agricultural systems are widely used. Adopting new vegetation management models and soil and water conservation plant species are important measures for soil resource protection in karst areas. Karst development in Southwest China is more intense than in the Mediterranean region and deserves greater attention. Embedding arable land into karst watersheds with diverse landscape types and implementing appropriate soil and water conservation measures can effectively reduce the risk of soil erosion. Furthermore, attention should be paid to soil erosion on low-lying slopes covered by woodlands and grasslands; reducing the impact on underground porosity through engineering and vegetation measures may be an effective approach. In addition, developing abundant tourism resources related to karst landforms may be a way to reduce land demand. This approach can improve the fragile ecological environment. This example highlights the advantages of the PLS-SEM model in decoupling the relationship between runoff, sediment transport, and their latent variables. The research results provide a scientific basis for developing effective water and soil resource management strategies for karst watersheds.

[0095] Specifically, this embodiment explored the characteristics of runoff and sediment transport variation in different karst watersheds. Using a PLS-SEM model, the coupling effects of multiple environmental factors and water and sediment processes in 40 karst watersheds were quantified, revealing the dominant role of landscape heterogeneity in sediment transport. The research results contribute to a deeper understanding of the runoff and sediment transport characteristics of karst watersheds, providing a scientific basis for effective soil and water conservation and sustainable ecological environment development. The main results are as follows:

[0096] (1) Pearson correlation analysis showed that most of the 103 variables in climate, lithology, soil, topography and landscape factors were significantly correlated with runoff or sediment transport. Among them, 60 factors were significantly correlated with runoff, mainly landscape and climate factors; 23 factors were significantly correlated with sediment transport, with topography and landscape factors having the greatest impact on sediment transport.

[0097] (2) The PLS-SEM model was used to quantitatively distinguish the drivers of runoff and sediment transport variation in karst watersheds. Climate, lithology, soil, topography and landscape factors together explain 79% of runoff variation. Climate has the greatest impact on runoff, with a significant positive effect, while lithology, soil, topography and landscape all have non-significant negative effects on runoff. Climate, lithology, soil, topography and landscape together explain 59% of sediment transport variation. Except for climate, lithology, soil, topography and landscape factors all have significant effects on sediment transport. Landscape has the greatest impact, followed by lithology, topography, soil and climate. Topographic factors have a significant positive effect on sediment transport, while lithology, soil and landscape factors have significant negative effects.

[0098] Corresponding to the above method, this embodiment also provides a decoupling system for the spatial variation influencing factors of sediment transport in karst watersheds based on model combination, including:

[0099] Basic data construction unit, used to build the basic dataset for the target karst watershed;

[0100] The latent variable mapping unit is used to establish the correspondence between latent variables and observed variables based on the basic dataset within the framework of five factors: climate, lithology, soil, topography, and landscape.

[0101] The correlation screening unit is used to perform Pearson correlation analysis with the observed variable as input and sediment transport as output, select the observed variable with significant correlation, and generate a candidate factor set.

[0102] The variable selection unit is used to build a random forest model with the candidate factor set as input, evaluate variable importance based on the change in out-of-bag error, and select representative observed variables.

[0103] The structural equation modeling unit is used to construct a partial least squares structural equation model using climate, lithology, soil, topography and landscape as latent variables, combined with the representative observed variables, to calculate path coefficients and effect strengths, and obtain model estimation results.

[0104] The decoupling analysis unit is used to analyze, based on the model estimation results, the direction and contribution rate of each latent variable with respect to the spatial variation of sediment transport, determine the dominant influencing factors, and output the decoupling results.
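The estimation step of the structural equation modeling unit can be illustrated with a minimal PLS path-modeling sketch in Python/NumPy. It is not the plspm implementation used in the embodiment. It assumes mode A outer estimation, the centroid inner weighting scheme and a single endogenous latent variable, namely sediment transport measured by one indicator. It performs no bootstrap, so it returns path coefficients and R² but not the P-values reported in Table 5. The function `pls_sem` and its parameters are hypothetical.

```python
import numpy as np


def _standardize(a):
    """Column-wise z-scores (population SD), as is usual in PLS path modeling."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def pls_sem(blocks, endo, exo, tol=1e-8, max_iter=500):
    """Minimal PLS path model: several exogenous latent variables -> one endogenous.

    blocks: dict latent name -> (n, p) array or length-n vector of observed variables;
            its keys must be exactly `exo` plus `endo`.
    Returns (path coefficients, R^2 of the endogenous LV, outer weights).
    """
    X = {k: _standardize(np.asarray(v, dtype=float).reshape(len(v), -1))
         for k, v in blocks.items()}
    n = X[endo].shape[0]
    w = {k: np.ones(x.shape[1]) for k, x in X.items()}
    neighbours = {endo: list(exo), **{e: [endo] for e in exo}}

    def scores(weights):
        return {k: _standardize(X[k] @ weights[k]) for k in X}

    for _ in range(max_iter):
        Y = scores(w)
        new_w = {}
        for j in X:
            # Inner estimate (centroid scheme): signed sum of adjacent LV scores.
            z = sum(np.sign(np.corrcoef(Y[j], Y[k])[0, 1]) * Y[k] for k in neighbours[j])
            # Outer estimate (mode A): covariances of the indicators with the inner
            # estimate, rescaled so that the resulting LV score has unit variance.
            wj = X[j].T @ z / n
            new_w[j] = wj / np.std(X[j] @ wj)
        converged = max(np.max(np.abs(new_w[j] - w[j])) for j in X) < tol
        w = new_w
        if converged:
            break

    # Structural model: OLS of the endogenous LV scores on the exogenous LV scores.
    Y = scores(w)
    Z = np.column_stack([Y[e] for e in exo])
    beta, *_ = np.linalg.lstsq(Z, Y[endo], rcond=None)
    resid = Y[endo] - Z @ beta
    r2 = 1.0 - resid @ resid / (Y[endo] @ Y[endo])
    return dict(zip(exo, beta)), r2, w
```

In a full analysis the exogenous blocks would be the representative observed variables for climate, lithology, soil, topography and landscape. Significance would come from bootstrap resampling of the whole procedure.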

[0105] This document uses specific examples to illustrate the principles and implementation methods of the present invention. The descriptions of the above embodiments are only for the purpose of helping to understand the method and core ideas of the present invention. Furthermore, those skilled in the art will recognize that, based on the ideas of the present invention, there will be changes in the specific implementation methods and application scope. Therefore, the content of this specification should not be construed as a limitation of the present invention.

Claims

1. A model combination-based method for decoupling the factors influencing the spatial variation of sediment discharge in a karst watershed, characterized in that it comprises: constructing a basic dataset for a target karst watershed; within a five-factor framework of climate, lithology, soil, topography, and landscape, establishing a correspondence between latent variables and observed variables based on the basic dataset; taking the observed variables as input and sediment transport as output, performing Pearson correlation analysis and selecting the significantly correlated observed variables to generate a candidate factor set; building a random forest model with the candidate factor set as input, evaluating variable importance by the change in out-of-bag error, and selecting representative observed variables; taking climate, lithology, soil, topography, and landscape as latent variables, constructing a partial least squares structural equation model (PLS-SEM) with the representative observed variables, and calculating path coefficients and effect strengths to obtain model estimation results; and, based on the model estimation results, analyzing the direction and contribution rate of each latent variable with respect to the spatial variation of sediment transport, determining the dominant influencing factors, and outputting the decoupling results;
wherein establishing the correspondence between latent variables and observed variables comprises: taking climate, lithology, soil, topography, and landscape as five categories of latent variables, which serve as the superordinate variables of the subsequent measurement model; extracting from the basic dataset a list of observed variables for each latent variable category to form a candidate observed variable library; mapping each observed variable in the library to its latent variable category according to its physical meaning and data source, forming a hierarchical "latent variable-observed variable" mapping in which each observed variable is attributed to exactly one latent variable; and, on a unified basis of sub-basins and study period, retrieving the corresponding entries of the basic dataset to verify and finalize the hierarchical mapping, thereby obtaining the correspondence between latent variables and observed variables;
wherein performing the Pearson correlation analysis and generating the candidate factor set comprises: retrieving the observed variable matrix and the corresponding sediment transport series from the basic dataset as input data for the correlation analysis; calculating, for each observed variable in turn, its Pearson correlation coefficient with sediment transport and the corresponding significance level, to obtain paired results of correlation coefficient and p-value; selecting the significantly correlated observed variables, with p < 0.05 as the inclusion criterion, to form a candidate factor list; and organizing the candidate factor list into the candidate factor set; and
wherein constructing the PLS-SEM and calculating the path coefficients and effect strengths to obtain the model estimation results comprises: retrieving the set of representative observed variables, dividing it into climate, lithology, soil, topography, and landscape measurement blocks according to the correspondence, generating the observed variable matrix for modeling, and specifying the measurement model and the structural model with sediment transport as the dependent variable; establishing the PLS-SEM with climate, lithology, soil, topography, and landscape as the latent variables of the structural model, the measurement model and the structural model being configured jointly; estimating the outer weights and path coefficients with the iterative PLS algorithm to obtain the direct, indirect, and total effects of each latent variable on sediment transport, and calculating their significance levels; and calculating the goodness of fit and the explanatory power for sediment transport, and outputting the path coefficients, effect decomposition results, and evaluation indicators as the basis for identifying the dominant factors and ranking their contribution rates, thereby obtaining the model estimation results.
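The Pearson screening step of claim 1 is a per-variable test with p < 0.05 as the inclusion criterion. The following is a minimal sketch of it. It assumes the observed variables are held in a dict of name to 1-D array, aligned sub-basin by sub-basin with the sediment transport series. The helper name `screen_by_pearson` and the toy data are hypothetical. `scipy.stats.pearsonr` supplies the coefficient and its two-sided p-value.

```python
import numpy as np
from scipy.stats import pearsonr


def screen_by_pearson(observed, sediment, alpha=0.05):
    """Return (candidate_names, results) for observed variables with p < alpha.

    observed: dict mapping observed-variable name -> 1-D array (one value per sub-basin)
    sediment: 1-D array of sediment transport aligned with the observed arrays
    """
    sediment = np.asarray(sediment, dtype=float)
    results = {}
    for name, values in observed.items():
        r, p = pearsonr(np.asarray(values, dtype=float), sediment)
        results[name] = (float(r), float(p))
    # Inclusion criterion stated in the claim: p-value below 0.05
    candidates = [name for name, (_, p) in results.items() if p < alpha]
    return candidates, results


# Hypothetical toy data: x0 tracks sediment closely, x1 has zero correlation with it
sediment = [1, 2, 3, 4, 5, 6]
observed = {
    "x0": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "x1": [1, 0, 0, 0, 0, 1],
}
candidates, results = screen_by_pearson(observed, sediment)
print(candidates)  # ['x0']
```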
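Claim 1 refers to "the iterative PLS algorithm" without further detail. The sketch below assumes a common textbook variant: standardized data, Mode A outer weights, the path weighting inner scheme, and iteration until the outer weights stabilize. The function `pls_sem` and the toy latent variables A, M and S are hypothetical.

- Direct effects are the structural path coefficients B.
- Total effects follow from (I - B)^-1 - I for an acyclic model.
- Indirect effects are the difference between total and direct effects.

Significance levels are not computed. In PLS-SEM these are usually obtained by bootstrapping.

```python
import numpy as np


def _standardize(a):
    # Column-wise z-scores (population standard deviation)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def _as_block(a):
    # Accept a 1-D single-indicator series or an (n, k) indicator matrix
    a = np.asarray(a, dtype=float)
    return _standardize(a.reshape(a.shape[0], -1))


def _scores(X, w):
    # Latent variable scores: standardized weighted sums of each block's indicators
    return np.column_stack([_standardize(x @ wi) for x, wi in zip(X, w)])


def pls_sem(blocks, paths, max_iter=300, tol=1e-8):
    """PLS path modeling with Mode A outer weights and the path weighting inner scheme.

    blocks: dict latent name -> indicators; paths: (source, target) pairs forming a DAG.
    Returns names, direct effects, total effects, indirect effects, R^2 of endogenous LVs.
    """
    names = list(blocks)
    m = len(names)
    pos = {name: i for i, name in enumerate(names)}
    X = [_as_block(blocks[name]) for name in names]
    n = X[0].shape[0]
    D = np.zeros((m, m))  # D[i, j] = 1 when latent variable i points to j
    for src, dst in paths:
        D[pos[src], pos[dst]] = 1.0

    w = [np.ones(x.shape[1]) for x in X]
    for _ in range(max_iter):
        Y = _scores(X, w)
        E = np.zeros((m, m))  # column j holds the inner weights used to build LV j's proxy
        for j in range(m):
            pred = np.flatnonzero(D[:, j])
            if pred.size:  # predecessors: multiple-regression coefficients
                E[pred, j] = np.linalg.lstsq(Y[:, pred], Y[:, j], rcond=None)[0]
            for i in np.flatnonzero(D[j, :]):  # successors: correlations
                E[i, j] = np.corrcoef(Y[:, i], Y[:, j])[0, 1]
        Z = _standardize(Y @ E)
        # Mode A: outer weights = covariances of indicators with the inner proxy
        w_new = [x.T @ Z[:, j] / n for j, x in enumerate(X)]
        change = max(np.max(np.abs(a - b)) for a, b in zip(w_new, w))
        w = w_new
        if change < tol:
            break

    Y = _scores(X, w)
    direct = np.zeros((m, m))
    r2 = {}
    for j in range(m):
        pred = np.flatnonzero(D[:, j])
        if pred.size:
            coef = np.linalg.lstsq(Y[:, pred], Y[:, j], rcond=None)[0]
            direct[pred, j] = coef
            r2[names[j]] = 1.0 - np.var(Y[:, j] - Y[:, pred] @ coef)  # Var(Y_j) = 1
    # For an acyclic structural model, total effects = (I - B)^-1 - I
    total = np.linalg.inv(np.eye(m) - direct) - np.eye(m)
    return names, direct, total, total - direct, r2


# Hypothetical toy data: A affects S directly and through the mediator M
rng = np.random.default_rng(0)
n = 400
a = rng.normal(size=n)
mid = 0.6 * a + rng.normal(scale=0.8, size=n)
s = 0.5 * a - 0.6 * mid + rng.normal(scale=0.5, size=n)


def noisy(v):
    # Add measurement noise to create an indicator of a latent series
    return v + rng.normal(scale=0.3, size=n)


blocks = {"A": np.column_stack([noisy(a), noisy(a)]),
          "M": np.column_stack([noisy(mid), noisy(mid)]),
          "S": s}
names, direct, total, indirect, r2 = pls_sem(blocks, [("A", "M"), ("A", "S"), ("M", "S")])
```

In the five-factor model, each of climate, lithology, soil, topography and landscape would be one block. Sediment transport would be a single-indicator endogenous block. The inner paths would be whatever structure the analyst specifies.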

2. The method for decoupling the factors influencing the spatial variation of sediment discharge in a karst watershed based on model combination according to claim 1, characterized in that constructing the basic dataset for the target karst watershed comprises: dividing the target karst basin into several sub-basins as spatial analysis units and determining the study period for data processing, forming a reference framework for data registration and aggregation; obtaining precipitation and temperature observations from meteorological stations in and around the target karst basin, calculating potential evapotranspiration with the Penman formula, and completing spatial and temporal registration in the sub-basin and study-period dimensions to obtain a meteorological variable set; determining the carbonate rock coverage of each sub-basin from a lithological map to obtain the lithological variable; extracting particle size distribution, bulk density, electrical conductivity, calcium carbonate content, pH, and organic carbon content from the World Soil Database and registering them in the sub-basin and study-period dimensions to obtain a soil variable set; extracting elevation and slope from a digital elevation model and aggregating them at the sub-basin scale to obtain a topographic variable set; calculating landscape indices from remotely sensed land use data and annual land cover data and aggregating them at the sub-basin scale to obtain a landscape variable set; obtaining runoff and sediment transport observation series from hydrological stations in the watershed and from published annual reports, and registering them in the sub-basin and study-period dimensions as response variables; and
performing missing-value handling and standardization on the meteorological, lithological, soil, topographic, and landscape variable sets and on the response variables to obtain the basic dataset.
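Claim 2 does not say how missing values are handled or which standardization is used. The following is a minimal numpy sketch that assumes two choices, neither taken from the patent:

- Missing values are filled with the column mean.
- Each variable is then converted to z-scores across sub-basins.

The function name `preprocess` and the toy matrix are hypothetical.

```python
import numpy as np


def preprocess(matrix):
    """Fill missing values with column means, then z-score each column.

    matrix: (n_subbasins, n_variables) array with NaN marking missing entries.
    Constant columns are only centered (their scale is left at 1) to avoid dividing by zero.
    An all-NaN column cannot be filled this way and would stay NaN.
    """
    data = np.asarray(matrix, dtype=float).copy()
    col_mean = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    data[rows, cols] = col_mean[cols]  # column-mean imputation (assumed strategy)
    std = data.std(axis=0)
    std[std == 0] = 1.0
    return (data - data.mean(axis=0)) / std


# Hypothetical sub-basin x variable matrix with two gaps
raw = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan], [4.0, 40.0]])
clean = preprocess(raw)
```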

3. The method for decoupling the factors influencing the spatial variation of sediment discharge in a karst watershed based on model combination according to claim 1, characterized in that the candidate observed variable library consists of 103 observed variables: 17 climate variables, 1 lithology variable, 9 soil variables, 27 topographic variables, and 49 landscape variables.
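The hierarchical latent-variable/observed-variable mapping of claim 1 can be held as an inverted dictionary, sized by the pool counts of claim 3. Inverting it also enforces the rule that each observed variable belongs to exactly one latent variable. The 103 variables are not listed in this text, so the names such as `climate_01` and the helper `build_mapping` are hypothetical placeholders.

```python
from collections import Counter

# Pool sizes per latent variable from claim 3; the variable names below are placeholders
CATEGORY_COUNTS = {"climate": 17, "lithology": 1, "soil": 9, "topography": 27, "landscape": 49}


def build_mapping(library):
    """Invert latent -> [observed] into observed -> latent, enforcing one-to-one attribution."""
    owner = {}
    for latent, observed_vars in library.items():
        for var in observed_vars:
            if var in owner:
                raise ValueError(f"{var!r} assigned to both {owner[var]!r} and {latent!r}")
            owner[var] = latent
    return owner


library = {cat: [f"{cat}_{i:02d}" for i in range(1, k + 1)] for cat, k in CATEGORY_COUNTS.items()}
mapping = build_mapping(library)
print(len(mapping))                             # 103
print(Counter(mapping.values())["landscape"])   # 49
```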

4. The method for decoupling the factors influencing the spatial variation of sediment discharge in a karst watershed based on model combination according to claim 1, characterized in that building the random forest model with the candidate factor set as input, evaluating variable importance by the change in out-of-bag error, and selecting the representative observed variables comprises: constructing a regression modeling dataset with the observed variable matrix corresponding to the candidate factor set as the independent variables and the sediment transport series as the dependent variable; generating training subsets from the regression modeling dataset by bootstrap resampling, training the random forest model, composed of multiple regression decision trees, on the training subsets, and retaining the corresponding out-of-bag samples; for each observed variable, perturbing its values in the out-of-bag samples and calculating the increase in out-of-bag mean squared error between before and after the perturbation, to obtain a variable importance score for that observed variable; and sorting the observed variables by importance score from highest to lowest and determining the representative observed variables from the ranking.
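The out-of-bag importance procedure of claim 4 can be sketched without a machine-learning library. The sketch makes these assumptions:

- Each tree is a plain CART-style regression tree with sum-of-squared-error splits.
- The depth and leaf-size limits are illustrative values.
- The "perturbation" is a random permutation of one variable on a tree's out-of-bag samples, following Breiman's permutation importance.
- A variable's score is the OOB mean squared error increase, averaged over trees.

The function names, hyperparameters and toy data are hypothetical.

```python
import numpy as np


def fit_tree(X, y, rng, depth=0, max_depth=5, min_leaf=3, mtry=None):
    """Grow a CART-style regression tree.

    Leaves are floats; internal nodes are tuples (feature, threshold, left, right).
    At each node, `mtry` candidate features are drawn at random (all features if None).
    """
    n, p = X.shape
    if depth >= max_depth or n < 2 * min_leaf or y.max() == y.min():
        return float(y.mean())
    best = None  # (sum of squared errors after split, feature, threshold)
    for j in rng.choice(p, size=mtry or p, replace=False):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        for i in range(min_leaf, n - min_leaf + 1):
            if xs[i - 1] == xs[i]:
                continue  # no threshold separates identical values
            sse_left = csq[i - 1] - csum[i - 1] ** 2 / i
            sse_right = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / (n - i)
            if best is None or sse_left + sse_right < best[0]:
                best = (sse_left + sse_right, j, (xs[i - 1] + xs[i]) / 2)
    if best is None:
        return float(y.mean())
    _, j, thr = best
    go_left = X[:, j] <= thr
    kwargs = dict(max_depth=max_depth, min_leaf=min_leaf, mtry=mtry)
    return (j, thr,
            fit_tree(X[go_left], y[go_left], rng, depth + 1, **kwargs),
            fit_tree(X[~go_left], y[~go_left], rng, depth + 1, **kwargs))


def predict_tree(node, X):
    # Route each row down the tree until it reaches a leaf value
    out = np.empty(len(X))
    for r, row in enumerate(X):
        nd = node
        while isinstance(nd, tuple):
            j, thr, left, right = nd
            nd = left if row[j] <= thr else right
        out[r] = nd
    return out


def oob_permutation_importance(X, y, n_trees=100, mtry=None, seed=0):
    """Mean increase in each tree's OOB MSE after permuting one variable on its OOB samples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    increments, used = np.zeros(p), 0
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)        # bootstrap resample
        oob = np.setdiff1d(np.arange(n), boot)   # out-of-bag samples for this tree
        if oob.size == 0:
            continue
        tree = fit_tree(X[boot], y[boot], rng, mtry=mtry)
        base = np.mean((predict_tree(tree, X[oob]) - y[oob]) ** 2)
        for j in range(p):
            Xp = X[oob].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # scramble variable j only
            increments[j] += np.mean((predict_tree(tree, Xp) - y[oob]) ** 2) - base
        used += 1
    return increments / used


# Hypothetical toy data: x0 dominates, x1 is weak, x2 is noise
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=150)
imp = oob_permutation_importance(X, y, n_trees=50, mtry=2)
ranking = np.argsort(imp)[::-1]  # highest importance first
```

The representative observed variables would then be taken from the top of `ranking`. The claim does not state the cut-off rule.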

5. The method for decoupling the factors influencing the spatial variation of sediment discharge in a karst watershed based on model combination according to claim 1, characterized in that analyzing, based on the model estimation results, the direction and contribution rate of each latent variable with respect to the spatial variation of sediment transport, identifying the dominant influencing factors, and outputting the decoupling results comprises: determining the direction of action from the sign of the path coefficient pointing to sediment transport as the dependent variable, with the corresponding significance level as the criterion of validity; using the total effect of each latent variable on sediment transport as the basis of relative contribution, ranking the contribution rates of the five latent variables (climate, lithology, soil, topography, and landscape) from high to low and marking their significance; and identifying the highest-ranked significant latent variables as the dominant influencing factors, which, together with their direction of action, contribution rate, significance, goodness of fit, and explanatory power, form the decoupling results for use in decisions on watershed soil and water conservation and landscape optimization.
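The decoupling output of claim 5 can be assembled from the PLS-SEM estimates. The sketch rests on three assumptions:

- A latent variable's contribution rate is its absolute total effect divided by the sum of absolute total effects. The claim only says total effects are the basis.
- Significance means p < 0.05.
- The dominant factor is the highest-ranked significant latent variable.

The effect values in the usage lines are invented placeholders, not results from the patent.

```python
from dataclasses import dataclass


@dataclass
class LatentResult:
    name: str
    path_coef: float     # direct path coefficient on sediment transport
    total_effect: float  # total effect on sediment transport
    p_value: float       # significance of the path


def decouple(results, alpha=0.05):
    """Rank latent variables by |total effect| share and flag the dominant one."""
    denom = sum(abs(r.total_effect) for r in results)
    rows = []
    for r in results:
        if r.path_coef > 0:
            direction = "positive"
        elif r.path_coef < 0:
            direction = "negative"
        else:
            direction = "none"
        rows.append({
            "name": r.name,
            "direction": direction,
            "contribution": abs(r.total_effect) / denom if denom else 0.0,
            "significant": r.p_value < alpha,
        })
    rows.sort(key=lambda row: row["contribution"], reverse=True)
    significant = [row for row in rows if row["significant"]]
    dominant = significant[0]["name"] if significant else None
    return rows, dominant


# Placeholder estimates for illustration only
demo = [
    LatentResult("climate", 0.30, 0.35, 0.01),
    LatentResult("lithology", -0.45, -0.50, 0.001),
    LatentResult("soil", 0.05, 0.05, 0.40),
    LatentResult("topography", 0.20, 0.25, 0.03),
    LatentResult("landscape", -0.10, -0.15, 0.08),
]
rows, dominant = decouple(demo)
print(dominant)  # lithology
```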

6. The method for decoupling the factors influencing the spatial variation of sediment discharge in a karst watershed based on model combination according to claim 1, characterized in that the random forest model adopts a self-calibrating structure based on core-factor priority probability and redundancy suppression. First, a core-factor priority probability is constructed for each candidate observed variable from the Pearson correlation analysis and the out-of-bag error increment. Then, features are sampled at each tree node according to these priority probabilities. Finally, the splitting variable is selected according to a redundancy-suppression weighted splitting criterion, and the importance weight of each variable in the forest is determined. The expression of the redundancy-suppression weighted splitting criterion is: […], in which the priority probability is determined by the formula […]. At each node, the variable that maximizes the criterion is taken as the splitting variable of the current node, and the criterion values obtained at the nodes are accumulated as a forest-level measure of variable importance. The terms of these expressions are: the decrease in mean squared error produced by splitting the current node on the variable; the core-factor priority probability of the variable; the set of variables already selected as splitting variables along the path from the root to the current node in the current tree; the Pearson correlation coefficient between the variable and each variable in this path set; the Pearson correlation coefficient between the variable and sediment transport; and the increase in mean squared error caused by permuting the variable in the out-of-bag samples.
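The formulas of claim 6 are not preserved in this text. The sketch below only illustrates the described structure under an assumed functional form; both expressions and all function names are hypothetical, not the patented formulas.

- The priority probability of a variable is |r with sediment| multiplied by the positive part of its OOB MSE increment, normalized over variables.
- The splitting criterion multiplies a variable's MSE decrease at the node by its priority probability. It then multiplies by one minus the largest absolute correlation with the variables already used on the root-to-node path.

```python
import numpy as np


def priority_probabilities(r_y, oob_increment):
    """ASSUMED form: |r_jy| * max(dOOB_j, 0), normalized to sum to 1 (uniform if all zero)."""
    score = np.abs(np.asarray(r_y, float)) * np.clip(np.asarray(oob_increment, float), 0, None)
    if score.sum() == 0:
        return np.full(len(score), 1.0 / len(score))
    return score / score.sum()


def choose_split(candidates, mse_decrease, prior, corr, path_set):
    """ASSUMED criterion: dMSE_j * prior_j * (1 - max_{k in path} |r_jk|); returns (variable, value).

    candidates: variable indices sampled at this node
    mse_decrease: dict index -> MSE decrease of that variable's best split at this node
    corr: (p, p) Pearson correlation matrix among observed variables
    path_set: indices already used as splitting variables on the root-to-node path
    """
    best_j, best_g = None, -np.inf
    for j in candidates:
        redundancy = max((abs(corr[j, k]) for k in path_set), default=0.0)
        g = mse_decrease[j] * prior[j] * (1.0 - redundancy)
        if g > best_g:
            best_j, best_g = j, g
    return best_j, best_g


# Hypothetical three-variable example: variable 1 is nearly a copy of variable 0
rng = np.random.default_rng(0)
prior = priority_probabilities([0.8, 0.75, 0.3], [2.0, 1.8, 0.4])
corr = np.array([[1.0, 0.98, 0.1], [0.98, 1.0, 0.2], [0.1, 0.2, 1.0]])
sampled = rng.choice(3, size=3, replace=False, p=prior)  # priority-weighted feature sampling
j, g = choose_split(sampled, {0: 1.0, 1: 0.9, 2: 0.5}, prior, corr, path_set=[0])
```

Here variable 0 is already on the path, so the redundant variable 1 is suppressed and variable 2 is chosen. Summing the returned criterion values per chosen variable over all nodes and trees would give the forest-level importance measure that the claim describes.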

7. A model combination-based system for decoupling the factors influencing the spatial variation of sediment discharge in a karst watershed, characterized in that the system, which implements the method according to any one of claims 1 to 6, comprises: a basic data construction unit, configured to construct the basic dataset for the target karst watershed; a latent variable mapping unit, configured to establish, within the five-factor framework of climate, lithology, soil, topography, and landscape, the correspondence between latent variables and observed variables based on the basic dataset; a correlation screening unit, configured to perform Pearson correlation analysis with the observed variables as input and sediment transport as output, select the significantly correlated observed variables, and generate the candidate factor set; a variable selection unit, configured to build the random forest model with the candidate factor set as input, evaluate variable importance by the change in out-of-bag error, and select the representative observed variables; a structural equation modeling unit, configured to construct the partial least squares structural equation model with climate, lithology, soil, topography, and landscape as latent variables in combination with the representative observed variables, and to calculate path coefficients and effect strengths to obtain the model estimation results; and a decoupling analysis unit, configured to analyze, based on the model estimation results, the direction and contribution rate of each latent variable with respect to the spatial variation of sediment transport, determine the dominant influencing factors, and output the decoupling results.