A strategy optimization method for digital mapping of soil carbon-nitrogen ratio
By optimizing the soil carbon-nitrogen ratio prediction model using the quantile random forest algorithm and multiple modeling strategies, the problem of insufficient modeling applicability under different land use types was solved, and high-precision digital mapping of soil carbon-nitrogen ratio was achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- XINJIANG UNIVERSITY
- Filing Date
- 2026-03-20
- Publication Date
- 2026-06-19
Figure CN122245523A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the technical field of soil fertility assessment and land health management, and specifically relates to a strategy optimization method suitable for digital mapping of soil carbon-nitrogen ratio.
Background Technology
[0002] Soil carbon-nitrogen ratio (CNR) is a crucial indicator for measuring soil nutrient cycling and fertility, and it plays a significant role in ecosystem function assessment and agricultural management. Traditional studies of soil CNR largely rely on laboratory analysis, making it difficult to characterize large-scale, continuous spatial distributions. With the development of digital soil mapping technology, predictive modeling methods based on the spatial relationship between environmental covariates and soil properties have gradually become a research hotspot, providing a new technical approach for the spatial visualization of soil CNR.
[0003] Currently, soil carbon-nitrogen ratio (CNR) modeling mainly employs two methods: direct modeling and indirect modeling. Direct modeling uses the CNR as the dependent variable for prediction, while indirect modeling predicts organic carbon and total nitrogen separately before calculating the ratio. However, the applicability of these two methods to different land use types is still unclear, and they do not fully consider the impact of spatial heterogeneity on modeling accuracy. Therefore, it is necessary to explore more adaptive modeling strategies by incorporating land use information to improve the accuracy of digital mapping of soil CNR.
[0004] To improve the stability and prediction accuracy of modeling, machine learning methods have been widely used in soil property mapping in recent years. Among them, the random forest algorithm has attracted much attention due to its good anti-overfitting ability and adaptability to high-dimensional data. This study introduces the quantile random forest algorithm and combines it with various modeling strategies and parameter optimization methods to explore an optimization scheme for digital mapping of soil carbon-nitrogen ratios suitable for different regions, providing technical support for precise soil resource management.
Summary of the Invention
[0005] To address the problems existing in the prior art, this invention provides a strategy optimization method suitable for digital mapping of soil carbon-nitrogen ratio, comprising the following steps: S1, acquiring soil organic carbon and total nitrogen sample data from the 0-0.3m depth; S2, collecting data on five types of environmental variables (climate, topography, vegetation, soil properties, and human activities) and extracting them to the sampling points; S3, designing carbon-nitrogen ratio modeling strategies: direct modeling, indirect modeling, and zonal direct modeling; S4, constructing a soil carbon-nitrogen ratio prediction model based on the quantile random forest algorithm; S5, validating the model accuracy and evaluating the applicability and stability of the different modeling strategies through cross-validation and error analysis.
[0006] Furthermore, step S1 of the strategy optimization method applicable to soil carbon-nitrogen ratio digital mapping specifically involves: systematic sampling using a gridded sampling method (30m×30m grid). Sampling grids are set up in forest, shrub, grassland, farmland, and other land types, with a total of approximately 250 sampling points. Among them, forest accounts for 30%, with one sampling point per 2km², totaling 60 points; shrub accounts for 15%, with one sampling point per 1.5km², totaling 30 points; grassland accounts for 20%, with one sampling point per 1km², totaling 40 points; farmland accounts for 25%, with one sampling point per 0.5km², totaling 100 points; and other types account for 10%, with one sampling point per 1km², totaling 20 points.
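The allocation above can be cross-checked with a few lines of arithmetic. The sketch below only restates the point counts and sampling densities given in the text; the dictionary layout, the function names, and the derived "implied area" figure are illustrative assumptions rather than part of the described method.

```python
# Point counts and sampling densities (km² per point) as stated in the text.
# The structure and names are hypothetical, chosen only for illustration.
SAMPLING_PLAN = {
    "forest":    {"points": 60,  "km2_per_point": 2.0},
    "shrub":     {"points": 30,  "km2_per_point": 1.5},
    "grassland": {"points": 40,  "km2_per_point": 1.0},
    "farmland":  {"points": 100, "km2_per_point": 0.5},
    "other":     {"points": 20,  "km2_per_point": 1.0},
}


def total_points(plan):
    """Sum the sampling points over all land use types."""
    return sum(entry["points"] for entry in plan.values())


def implied_area_km2(plan):
    """Area each class would cover at the stated density (points x km² per point)."""
    return {name: e["points"] * e["km2_per_point"] for name, e in plan.items()}


if __name__ == "__main__":
    print("total points:", total_points(SAMPLING_PLAN))
    for name, area in implied_area_km2(SAMPLING_PLAN).items():
        print(f"{name}: {area} km²")
```

The per-class counts add up to the stated total of about 250 points.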
[0007] Use a stainless steel soil auger to collect 0-0.3m topsoil samples. Place the collected soil samples in a clean cloth or plastic bag, label them, and record information such as the geographical coordinates of the sampling point, sampling time, and land use type. Transport the samples back to the laboratory promptly, air-dry them in a cool, ventilated place, remove stones, roots, and other impurities, grind and sieve them (2mm sieve), and store them in sealed bags away from light to prevent contamination and compositional changes.
[0008] Soil organic carbon was determined using the potassium dichromate-concentrated sulfuric acid oxidation method. The specific steps are as follows: Sample weighing: Accurately weigh 0.2 g of sieved soil sample into a digestion tube; Addition of oxidant: Add 10 mL of 0.4 mol / L potassium dichromate solution and 10 mL of concentrated sulfuric acid, mix well, and heat to digest; Titration: After cooling, titrate with 0.2 mol / L ferrous ammonium sulfate solution and record the titration volume; Calculation: Calculate the organic carbon content based on the difference between the blank test and the sample titration volume.
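The calculation step can be illustrated with a minimal sketch. It assumes the conventional back-titration formula for the heated dichromate method, in which 1 mmol of Fe(II) corresponds to 0.003 g of carbon and an oxidation correction factor of 1.1 is applied. The text does not state the correction factor, so it is exposed as a parameter, and the function name and example titration volumes are hypothetical.

```python
def soc_g_per_kg(v_blank_ml, v_sample_ml, c_fe_mol_l, soil_mass_g,
                 oxidation_factor=1.1):
    """Soil organic carbon (g/kg) from a dichromate back-titration.

    v_blank_ml / v_sample_ml: ferrous ammonium sulfate volumes (mL) used for
    the blank and for the sample. c_fe_mol_l: titrant concentration (mol/L).
    oxidation_factor: ASSUMED correction for incomplete oxidation (1.1 is the
    value commonly used with external heating; not given in the source text).
    """
    # 0.003 g C per mmol Fe(II): carbon gives up 4 electrons, so 12.011 / 4 / 1000.
    carbon_g = (v_blank_ml - v_sample_ml) * c_fe_mol_l * 0.003 * oxidation_factor
    return carbon_g / soil_mass_g * 1000.0


if __name__ == "__main__":
    # Hypothetical readings: 20.0 mL blank, 15.0 mL sample, 0.2 mol/L titrant, 0.2 g soil.
    print(round(soc_g_per_kg(20.0, 15.0, 0.2, 0.2), 2))
```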
[0009] The Kjeldahl method was used to determine total nitrogen in soil. The specific steps are as follows: Sample weighing: Accurately weigh 0.5 g of soil sample into a Kjeldahl flask; Digestion: Add catalyst and concentrated sulfuric acid, heat until the solution is clear and transparent; Distillation: Transfer the digestion solution to a distillation apparatus, add concentrated NaOH for distillation, and absorb the released NH3 with boric acid solution; Titration: Titrate the absorbent solution with standard hydrochloric acid solution and calculate the total nitrogen content; Conversion: Convert the titration results to the total nitrogen content of the soil (%).
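The conversion step, and the ratio it feeds, can be sketched as follows. This assumes the standard Kjeldahl relation in which each mole of hydrochloric acid consumed corresponds to one mole of nitrogen (14.007 g/mol). The function names and example readings are hypothetical, and the ratio is computed as a simple mass ratio of organic carbon to total nitrogen.

```python
N_MOLAR_MASS = 14.007  # g/mol


def total_n_percent(v_sample_ml, v_blank_ml, c_hcl_mol_l, soil_mass_g):
    """Soil total nitrogen (%) from the HCl titration of the boric acid absorbent."""
    n_g = (v_sample_ml - v_blank_ml) * c_hcl_mol_l * N_MOLAR_MASS / 1000.0
    return n_g / soil_mass_g * 100.0


def cn_ratio(soc_g_per_kg, tn_percent):
    """Carbon-nitrogen ratio as a mass ratio; 1 % equals 10 g/kg."""
    return soc_g_per_kg / (tn_percent * 10.0)


if __name__ == "__main__":
    # Hypothetical readings: 2.5 mL sample, 0.5 mL blank, 0.02 mol/L HCl, 0.5 g soil.
    tn = total_n_percent(2.5, 0.5, 0.02, 0.5)
    print(round(tn, 4), round(cn_ratio(16.5, tn), 2))
```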
[0010] Visual analysis: the conceptual layout of the soil carbon-nitrogen ratio sampling points is shown in Figure 2.
[0011] Furthermore, step S2 of the strategy optimization method applicable to soil carbon-nitrogen ratio digital mapping specifically includes: environmental variable system construction: systematically integrating five major categories of environmental driving factors (climate factors, topographic indicators, vegetation indices, soil properties, and human activity data); data standardization processing: all raster data are resampled to a 1km spatial resolution and spatially registered using the WGS84 geographic coordinate system; spatial data extraction: based on the GPS coordinates of the sampling points, the "Extract Multi Values to Points" tool in ArcGIS Pro is used to extract the environmental variable information corresponding to each sampling point; variable screening and optimization: the variance inflation factor (VIF) is calculated, a threshold of 10 is set, highly correlated variables with VIF>10 are gradually eliminated, and the retained variables must pass the significance test (p<0.05), finally forming the optimal combination of environmental variables for modeling.
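The VIF screening described above can be sketched with NumPy alone. The sketch computes each variable's VIF as 1 / (1 - R²) from a regression on the remaining variables and removes the variable with the largest VIF until all values are at or below the threshold of 10. The function names are hypothetical, and the significance test (p<0.05) mentioned in the text is not implemented here.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of every column of X (rows = sampling points)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def drop_high_vif(X, names, threshold=10.0):
    """Remove the worst variable one at a time until max VIF <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        values = vif(X)
        worst = int(np.argmax(values))
        if values[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.normal(size=200), rng.normal(size=200)
    c = a + b + rng.normal(scale=0.01, size=200)  # nearly collinear with a and b
    _, kept = drop_high_vif(np.column_stack([a, b, c]), ["a", "b", "c"])
    print(kept)
```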
[0012] Table organization: the environmental variables are listed in the environmental variable table (Figure 3).
[0013] Furthermore, step S3 of the strategy optimization method applicable to soil carbon-nitrogen ratio digital mapping specifically involves the following: To explore the optimal modeling strategy for soil carbon-nitrogen ratio, we selected two currently mainstream soil carbon-nitrogen ratio modeling methods: direct modeling and indirect modeling. The direct modeling method uses the carbon-nitrogen ratio as the dependent variable and environmental covariates as the independent variables. The final predicted result of the model is the carbon-nitrogen ratio value for each spatial unit. The indirect modeling method uses organic carbon and total nitrogen as the dependent variables and environmental covariates as the independent variables. The final predicted result of the model is a spatial distribution map of organic carbon and total nitrogen, and then the carbon-nitrogen ratio map is obtained through calculation. In addition, considering spatial heterogeneity, we further optimized the strategy to a zonal direct modeling method. The zonal direct modeling method first divides the study area into five major regions according to land use type: forest, shrubland, grassland, farmland, and others. The dependent variable is the soil carbon-nitrogen ratio, and the independent variable is the environmental covariate. The final predicted result of the model is a spatial map of the carbon-nitrogen ratio for each region, and then these maps are stitched together to form a complete layer.
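The difference between the three strategies comes down to what is used as the dependent variable and which samples are used for fitting. The sketch below illustrates this with an ordinary least-squares learner as a stand-in. It is not the quantile random forest used by the method, and all function names are hypothetical.

```python
import numpy as np


def linear_learner(X_train, y_train, X_new):
    """Stand-in learner (ordinary least squares), NOT the quantile random forest."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def predict_direct(X, cnr, X_new, learner=linear_learner):
    """Direct: the carbon-nitrogen ratio itself is the dependent variable."""
    return learner(X, cnr, X_new)


def predict_indirect(X, soc, tn, X_new, learner=linear_learner):
    """Indirect: predict organic carbon and total nitrogen, then take the ratio."""
    return learner(X, soc, X_new) / learner(X, tn, X_new)


def predict_zonal_direct(X, cnr, zones, X_new, zones_new, learner=linear_learner):
    """Zonal direct: one direct model per land use zone, results merged."""
    pred = np.full(len(X_new), np.nan)
    for zone in np.unique(zones_new):
        train, target = zones == zone, zones_new == zone
        if train.any():  # zones without training samples stay NaN
            pred[target] = learner(X[train], cnr[train], X_new[target])
    return pred
```

Any learner with the same `(X_train, y_train, X_new)` call shape could be passed in place of the stand-in.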
[0014] Visual analysis: the concept of the soil carbon-nitrogen ratio modeling strategy optimization is shown in Figure 4.
[0015] Furthermore, step S4 of the strategy optimization method applicable to soil carbon-nitrogen ratio digital mapping specifically involves: quantile random forest modeling framework: standardized measured soil carbon-nitrogen ratio data are used as the dependent variable, and environmental covariate data corresponding to each soil sampling point are used as the independent variable. A soil carbon-nitrogen ratio inversion model is constructed by combining the quantile random forest algorithm.
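For readers unfamiliar with quantile random forests, the following is a minimal sketch of the idea, assuming scikit-learn is available. A regular `RandomForestRegressor` is fitted, and prediction quantiles are then read from the training responses, weighted by how often each training sample shares a leaf with the new point. The function name is hypothetical, and leaf weights are computed from all training samples rather than only the in-bag ones, which is a simplification. The source does not specify which implementation was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def qrf_quantiles(forest, X_train, y_train, X_new, quantiles=(0.05, 0.5, 0.95)):
    """Approximate conditional quantiles from a fitted random forest.

    `forest` must have been fitted on (X_train, y_train).
    """
    y_train = np.asarray(y_train, dtype=float)
    q = np.asarray(quantiles, dtype=float)
    train_leaves = forest.apply(X_train)  # shape (n_train, n_trees)
    new_leaves = forest.apply(X_new)      # shape (n_new, n_trees)
    order = np.argsort(y_train)
    y_sorted = y_train[order]
    out = np.empty((new_leaves.shape[0], q.size))
    for i, leaves in enumerate(new_leaves):
        same_leaf = train_leaves == leaves  # co-membership per tree
        # Each tree spreads weight 1 evenly over the samples in the shared leaf.
        weights = (same_leaf / same_leaf.sum(axis=0)).mean(axis=1)
        cdf = np.cumsum(weights[order])     # weighted empirical CDF
        idx = np.clip(np.searchsorted(cdf, q), 0, y_sorted.size - 1)
        out[i] = y_sorted[idx]
    return out
```

The spread between the lower and upper quantiles gives a per-location indication of prediction uncertainty, which a plain random forest mean does not provide.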
[0016] Three modeling strategies are employed: direct modeling, indirect modeling, and zonal modeling. Direct modeling uses the carbon-to-nitrogen ratio as the dependent variable and environmental covariates as the independent variables. The model predicts the carbon-to-nitrogen ratio for each spatial unit. Indirect modeling uses organic carbon and total nitrogen as the dependent variables and environmental covariates as the independent variables. The model predicts the spatial distribution of organic carbon and total nitrogen, and then calculates the carbon-to-nitrogen ratio. Zonal direct modeling divides the study area into five zones based on land use: forest, shrubland, grassland, farmland, and others. The soil carbon-to-nitrogen ratio is the dependent variable, and environmental covariates are the independent variables. The model predicts the carbon-to-nitrogen ratio for each zone, and these maps are then combined to form a complete layer.
[0017] Model parameter optimization: Hyperparameter tuning was performed using GridSearchCV. The core parameter search range was as follows: decision tree depth: 8-15 layers (step size 2), number of trees: 100-500 (step size 100), learning rate: 0.01-0.3 (log interval), subsampling ratio: 0.6-1.0. 10-fold cross-validation was set to evaluate the parameter combination, and the parameter combination with the smallest RMSE on the validation set was selected.
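A grid search along these lines might look like the sketch below, assuming scikit-learn. The tree depth and tree count grids follow the stated ranges, and the subsampling ratio is mapped to `max_samples`. A learning rate is not a parameter of scikit-learn's `RandomForestRegressor`, so it is omitted here; how the source applied it is not stated. The function name, the mapping to these parameter names, and the discrete subsampling values are assumptions.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Search grid derived from the stated ranges (names are scikit-learn's).
PARAM_GRID = {
    "max_depth": [8, 10, 12, 14],                # 8-15, step 2
    "n_estimators": [100, 200, 300, 400, 500],   # 100-500, step 100
    "max_samples": [0.6, 0.8, None],             # None = use all samples (ratio 1.0)
}


def tune_forest(X, y, param_grid=PARAM_GRID, n_splits=10, seed=42):
    """Return the parameter set with the lowest cross-validated RMSE."""
    search = GridSearchCV(
        estimator=RandomForestRegressor(random_state=seed),
        param_grid=param_grid,
        scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence negated
        cv=KFold(n_splits=n_splits, shuffle=True, random_state=seed),
    )
    search.fit(X, y)
    return search.best_params_, -search.best_score_
```

With the full grid this fits 60 parameter combinations across 10 folds, so a reduced grid is advisable for a first trial run.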
[0018] Furthermore, step S5 of the strategy optimization method applicable to soil carbon-nitrogen ratio digital mapping specifically includes: cross-validation scheme: adopting a stratified 10-fold cross-validation strategy, with 70% as the training set and 30% as the validation set, ensuring consistent data distribution in each fold, and setting a random seed (random_state=42) to ensure the repeatability of results; evaluation index system: calculating three evaluation indices, R², RMSE, and MAE, and comparing the model accuracy of the three strategies; visualization analysis: plotting a regression fitting scatter plot of predicted values and measured values.
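The three evaluation indices can be computed directly from the measured and predicted values. The sketch below uses the usual definitions of R² (one minus the ratio of residual to total sum of squares), RMSE, and MAE; the function name is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for measured versus predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = float((err ** 2).sum())
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": float(np.sqrt((err ** 2).mean())),
        "MAE": float(np.abs(err).mean()),
    }


if __name__ == "__main__":
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```

Running the same function on the predictions of each strategy gives directly comparable accuracy figures.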
[0019] The advantages and positive effects of the technical solution to be protected by this invention are as follows: 1. Advantages of the optimized carbon-nitrogen ratio modeling strategy: This invention proposes a zonal direct modeling strategy for soil carbon-nitrogen ratio, dividing the study area into five zones: forest, shrubland, grassland, farmland, and others. For the first time, it compares the direct method, the indirect method, and the zonal direct method as soil carbon-nitrogen ratio modeling strategies, and systematically identifies the optimal modeling strategy for soil carbon-nitrogen ratio.
[0020] 2. Advantages in improving the accuracy of soil carbon-nitrogen ratio prediction: This invention adopts the quantile random forest algorithm, optimizes model parameters through grid search, and further improves the accuracy of soil carbon-nitrogen ratio prediction over direct modeling by using the zonal direct modeling strategy.
Attached Figure Description
[0021] To facilitate understanding of the technical solutions of this invention, the accompanying drawings involved in this specification are now described. It should be particularly noted that the accompanying drawings shown herein are merely exemplary illustrations illustrating the technical solutions of this invention, intended to more intuitively demonstrate the core concept of this invention. Any reasonable modifications or equivalent substitutions made by those skilled in the art based on the inventive concept and in conjunction with the accompanying drawings should fall within the protection scope of this invention.
[0022] Figure 1 is an overall flowchart of the strategy optimization method for digital mapping of soil carbon-nitrogen ratio provided by the present invention.
[0023] Figure 2 is a conceptual diagram of the layout of soil carbon-nitrogen ratio sampling points provided in an embodiment of the present invention.
[0024] Figure 3 is a conceptual diagram of the soil carbon-nitrogen ratio modeling strategy optimization provided in an embodiment of the present invention.
[0025] Figure 4 is a scatter plot of predicted versus measured soil carbon-nitrogen ratio values under the direct modeling strategy provided in an embodiment of the present invention.
[0026] Figure 5 is a scatter plot of predicted versus measured soil carbon-nitrogen ratio values under the indirect modeling strategy provided in an embodiment of the present invention.
[0027] Figure 6 is a scatter plot of predicted versus measured soil carbon-nitrogen ratio values under the zonal modeling strategy provided in an embodiment of the present invention.
[0028] Figure 7 shows partial soil carbon-nitrogen ratio mapping results obtained by applying the zonal direct modeling optimization strategy provided in an embodiment of the present invention.
Detailed Implementation
[0029] To make the technical solution of the present invention clearer and easier to understand, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, without departing from the basic principles of the present invention, the various embodiments and their technical features described in this application can be supplemented, combined, or substituted for each other, and these variations of implementation should all be considered within the scope of protection of the present invention.
[0030] To enable those skilled in the art to fully understand the technical solutions of the present invention, detailed descriptions of several specific embodiments are provided in the following specification. It should be particularly noted that the technical solutions covered by the present invention are not limited to the embodiments listed herein; any equivalent modifications or reasonable extensions made based on the core concept of the present invention are within the protection scope of the present invention.
[0031] This invention discloses a strategy optimization method suitable for digital mapping of soil carbon-nitrogen ratio. As shown in Figure 1, the method includes the following steps: S1, acquiring soil organic carbon and total nitrogen sample data from the 0-0.3m depth.
[0032] S2, collecting data on five types of environmental variables (climate, topography, vegetation, soil properties, and human activities) and extracting them to the sampling points.
[0033] S3, designing carbon-nitrogen ratio modeling strategies: direct modeling, indirect modeling, and zonal direct modeling.
[0034] S4, constructing a soil carbon-nitrogen ratio prediction model based on the quantile random forest algorithm.
[0035] S5, validating model accuracy and evaluating the applicability and stability of the different modeling strategies through cross-validation and error analysis.
[0036] Further, step S1 specifically involves: using a gridded sampling method (30m×30m) for systematic sampling. Sampling grids are set up in forest, shrub, grassland, farmland, and other land types, with a total of approximately 250 sampling points. Specifically, forest ecosystems account for about 30% of the total area, with one sampling point set up every 2km², for a total of 60 points; shrub ecosystems account for about 15% of the total area, with one sampling point set up every 1.5km², for a total of 30 points; grassland ecosystems account for about 20% of the total area, with one sampling point set up every 1km², for a total of 40 points; farmland ecosystems account for about 25% of the total area, with one sampling point set up every 0.5km², for a total of 100 points; other ecosystems (such as bare land and wetlands) account for the remaining 10%, with one sampling point set up every 1km², for a total of 20 points. This allocation ensures the spatial uniformity and representativeness of the sampling points across the different ecosystems.
[0037] Use a stainless steel soil auger to collect 0-0.3m topsoil samples. Place the collected soil samples in a clean cloth or plastic bag, label them, and record information such as the geographical coordinates of the sampling point, sampling time, and land use type. Transport the samples back to the laboratory promptly, air-dry them in a cool, ventilated place, remove stones, roots, and other impurities, grind and sieve them (2mm sieve), and store them in sealed bags away from light to prevent contamination and compositional changes.
[0038] Soil organic carbon was determined using the potassium dichromate-concentrated sulfuric acid oxidation method, and the specific steps are as follows:
[0039] Sample weighing: Accurately weigh 0.2 g of sieved soil sample into a digestion tube.
[0040] Add oxidizing agent: Add 10 mL of 0.4 mol / L potassium dichromate solution and 10 mL of concentrated sulfuric acid, mix well, and heat to digest.
[0041] Titration: After cooling, titrate with 0.2 mol / L ferrous ammonium sulfate solution and record the titration volume.
[0042] Calculation: Calculate the organic carbon content based on the difference in titration volume between the blank test and the sample.
[0043] The Kjeldahl method was used to determine total nitrogen in soil. The specific steps are as follows:
[0044] Sample weighing: Accurately weigh 0.5 g of soil sample into a Kjeldahl flask.
[0045] Digestion: Add catalyst and concentrated sulfuric acid, and heat until the solution is clear and transparent.
[0046] Distillation: Transfer the digestion solution to a distillation apparatus, add concentrated NaOH for distillation, and absorb the released NH3 with boric acid solution.
[0047] Titration: Titrate the absorbent solution with standard hydrochloric acid solution and calculate the total nitrogen content.
[0048] Conversion: Convert the titration results to soil total nitrogen content (%).
[0049] Furthermore, step S2 specifically involves:
[0050] Environmental variable system construction: The system integrates five major categories of environmental driving factors (climate factors, topographic indicators, vegetation indices, soil properties, and human activity data).
[0051] Data standardization: All raster data were resampled to a spatial resolution of 1km and spatially registered using the WGS84 geographic coordinate system.
[0052] Spatial data extraction: Based on the GPS coordinates of the sampling points, the "Extract Multi Values to Points" tool in ArcGIS Pro was used to extract the environmental variable information corresponding to each sampling point.
[0053] Variable selection optimization: Calculate the variance inflation factor (VIF), set a threshold of 10, and gradually remove highly correlated variables with VIF>10. The retained variables must pass the significance test (p<0.05) to finally form the optimal combination of environmental variables for modeling.
[0054] Further, step S3 specifically involves: To explore the optimal modeling strategy for soil carbon-nitrogen ratio, we selected two currently mainstream soil carbon-nitrogen ratio modeling methods: direct modeling and indirect modeling. The direct modeling method uses the carbon-nitrogen ratio as the dependent variable and environmental covariates as the independent variables. The final predicted result is the carbon-nitrogen ratio value for each spatial unit. The indirect modeling method uses organic carbon and total nitrogen as the dependent variables and environmental covariates as the independent variables. The final predicted result is a spatial distribution map of organic carbon and total nitrogen, which is then further calculated to obtain the carbon-nitrogen ratio map. Furthermore, considering spatial heterogeneity, we further optimized the strategy to a zonal direct modeling method. This method first divides the study area into five major regions according to land use type: forest, shrubland, grassland, farmland, and others. The dependent variable is the soil carbon-nitrogen ratio, and the independent variables are environmental covariates. The final predicted result is a spatial map of the carbon-nitrogen ratio for each region, which is then stitched together to form a complete layer.
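The final stitching step of the zonal direct method can be sketched as a masked merge of per-zone prediction rasters. The sketch assumes that the land use raster and every per-zone map are NumPy arrays on the same grid, and that zones are identified by integer codes. The function name and the codes are hypothetical.

```python
import numpy as np


def mosaic_by_zone(zone_raster, zone_maps, nodata=np.nan):
    """Merge per-zone prediction maps into one layer using a land use raster.

    zone_raster: 2-D array of integer land use codes.
    zone_maps: dict mapping a code to a 2-D prediction array on the same grid.
    Cells whose code has no map keep the nodata value.
    """
    out = np.full(zone_raster.shape, nodata, dtype=float)
    for code, zone_map in zone_maps.items():
        mask = zone_raster == code
        out[mask] = zone_map[mask]  # take each cell from its own zone's model
    return out


if __name__ == "__main__":
    zones = np.array([[1, 1], [2, 3]])
    maps = {1: np.full((2, 2), 10.0), 2: np.full((2, 2), 20.0)}
    print(mosaic_by_zone(zones, maps))
```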
[0055] Furthermore, step S4 specifically involves: quantile random forest modeling framework: standardized measured soil carbon-nitrogen ratio data are used as the dependent variable, and environmental covariate data corresponding to each soil sampling point are used as the independent variable. A soil carbon-nitrogen ratio inversion model is constructed by combining the quantile random forest algorithm.
[0056] Three modeling strategies are employed: direct modeling, indirect modeling, and zonal modeling. Direct modeling uses the carbon-to-nitrogen ratio as the dependent variable and environmental covariates as the independent variables. The model predicts the carbon-to-nitrogen ratio for each spatial unit. Indirect modeling uses organic carbon and total nitrogen as the dependent variables and environmental covariates as the independent variables. The model predicts the spatial distribution of organic carbon and total nitrogen, and then calculates the carbon-to-nitrogen ratio. Zonal direct modeling divides the study area into five zones based on land use: forest, shrubland, grassland, farmland, and others. The soil carbon-to-nitrogen ratio is the dependent variable, and environmental covariates are the independent variables. The model predicts the carbon-to-nitrogen ratio for each zone, and these maps are then combined to form a complete layer.
[0057] Model parameter optimization: Hyperparameter tuning was performed using GridSearchCV. The core parameter search range was as follows: decision tree depth: 8-15 layers (step size 2), number of trees: 100-500 (step size 100), learning rate: 0.01-0.3 (log interval), subsampling ratio: 0.6-1.0. 10-fold cross-validation was set to evaluate the parameter combination, and the parameter combination with the smallest RMSE on the validation set was selected.
[0058] Furthermore, step S5 specifically involves the following cross-validation scheme: a stratified 10-fold cross-validation strategy is adopted, with 70% used as the training set and 30% as the validation set to ensure consistent data distribution in each fold. A random seed (random_state=42) is set to ensure the reproducibility of the results.
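One way to keep the land use composition consistent across folds is to assign fold indices within each land use type separately. The sketch below does this with NumPy and a fixed seed of 42. Stratifying by land use type is an assumption, since the text does not say which variable the stratification uses, and the 70%/30% split mentioned alongside the 10 folds is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def stratified_folds(labels, n_splits=10, seed=42):
    """Assign each sample a fold index so every fold mirrors the label mix."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = np.empty(labels.size, dtype=int)
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        rng.shuffle(idx)                                # random order within the stratum
        folds[idx] = np.arange(idx.size) % n_splits     # deal out round-robin
    return folds


if __name__ == "__main__":
    land_use = np.repeat(["forest", "shrub", "grassland", "farmland", "other"],
                         [60, 30, 40, 100, 20])
    print(np.bincount(stratified_folds(land_use), minlength=10))
```

Each fold index can then serve in turn as the validation set, with the remaining folds used for training.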
[0059] Evaluation index system: Calculate three evaluation indices: R², RMSE, and MAE, and compare the model accuracy of the three strategies.
[0060] Visualization analysis: Plot a scatter plot of the regression fit between predicted and measured values.
[0061] The embodiments of this invention are merely specific implementations of the invention, but the scope of protection of this invention is not limited thereto. For those skilled in the art, any obvious modifications, equivalent substitutions, or improvements within the scope of the technical concept disclosed in this invention should be included within the scope of protection of this invention. The final scope of protection of this invention is determined by the scope defined in the claims.
Claims
1. A strategy optimization method suitable for digital mapping of soil carbon-nitrogen ratio, characterized in that it includes the following steps: S1, acquiring soil organic carbon and total nitrogen sample data from the 0-0.3m depth; S2, collecting data on five categories of environmental variables (climate, topography, vegetation, soil properties, and human activities) and extracting them to the sampling points; S3, designing carbon-nitrogen ratio modeling strategies: direct modeling, indirect modeling, and zonal direct modeling; S4, constructing a soil carbon-nitrogen ratio prediction model based on the quantile random forest algorithm; S5, validating model accuracy and evaluating the applicability and stability of the different modeling strategies through cross-validation and error analysis.
2. The strategy optimization method for soil carbon-nitrogen ratio digital mapping according to claim 1, characterized in that step S1 is as follows: a 30m×30m grid sampling method is used to establish approximately 250 sampling points across forest, shrubland, grassland, farmland, and other land types, ensuring reasonable sampling density and uniform spatial distribution for each ecosystem. Specifically, forest accounts for 30%, with one sampling point per 2km² (60 points); shrubland accounts for 15%, with one sampling point per 1.5km² (30 points); grassland accounts for 20%, with one sampling point per 1km² (40 points); farmland accounts for 25%, with one sampling point per 0.5km² (100 points); and other land types account for 10%, with one sampling point per 1km² (20 points). During sampling, the latitude and longitude coordinates (precisely positioned by GPS), altitude (based on DEM data), and land use type (such as cultivated land, forest land, grassland, saline-alkali land, etc.) of each sampling point are recorded simultaneously to ensure comprehensive coverage of environmental variables and provide basic data support for subsequent modeling. Soil samples from the 0-0.3m topsoil layer are collected, air-dried, cleared of impurities, ground, and sieved (2mm), then sealed and stored. Soil organic carbon is determined using the potassium dichromate-concentrated sulfuric acid oxidation method, and total nitrogen is determined using the Kjeldahl method.
3. The strategy optimization method for soil carbon-nitrogen ratio digital mapping according to claim 1, characterized in that step S2 is as follows: Environmental variable system construction: systematically integrate five major categories of environmental driving factors (climate factors, topographic indicators, vegetation indices, soil property data, and human activities); Data standardization: all raster data are resampled to a spatial resolution of 1km and spatially registered using the WGS84 geographic coordinate system; Spatial data extraction: based on the GPS coordinates of the sampling points, the "Extract Multi Values to Points" tool in ArcGIS Pro is used to extract the environmental variable information corresponding to each sampling point; Variable selection optimization: calculate the variance inflation factor (VIF), set a threshold of 10, and gradually remove highly correlated variables with VIF>10. The retained variables must pass the significance test (p<0.05) to finally form the optimal combination of environmental variables for modeling.
4. The strategy optimization method for soil carbon-nitrogen ratio digital mapping according to claim 1, characterized in that step S3 is as follows: to optimize the modeling approach, three strategies are compared and analyzed: Direct modeling method: using the carbon-nitrogen ratio as the dependent variable and environmental covariates as independent variables, the spatial distribution of the carbon-nitrogen ratio is predicted directly; Indirect modeling method: organic carbon and total nitrogen are each modeled as dependent variables, their spatial distributions are predicted, and the carbon-nitrogen ratio is then calculated; Zonal direct modeling method: the study area is divided by land use type into five zones (forest, shrubland, grassland, farmland, and others), the carbon-nitrogen ratio is predicted separately in each zone, and the results are stitched together to generate a complete spatial map of the carbon-nitrogen ratio.
5. The strategy optimization method for soil carbon-nitrogen ratio digital mapping according to claim 1, characterized in that step S4 is as follows: a quantile random forest inversion model is constructed using standardized measured soil carbon-nitrogen ratio data as the dependent variable and environmental covariates as independent variables. Three modeling strategies are compared: Direct modeling method: the carbon-nitrogen ratio is used directly as the dependent variable for modeling; Indirect modeling method: organic carbon and total nitrogen are modeled and predicted separately, and the carbon-nitrogen ratio is then calculated; Zonal modeling method: the study area is partitioned by land use type (forest, shrubland, grassland, farmland, others), each zone is modeled separately, and the results are stitched together to form a complete layer.
6. The strategy optimization method for soil carbon-nitrogen ratio digital mapping according to claim 1, characterized in that step S5 is as follows: Cross-validation scheme: a stratified 10-fold cross-validation strategy is adopted, with 70% as the training set and 30% as the validation set to ensure that the data distribution is consistent in each fold. A random seed (random_state=42) is set to ensure the repeatability of the results; Evaluation index system: calculate three evaluation indices, R², RMSE, and MAE, and compare the model accuracy of the three strategies; Visualization analysis: plot a scatter plot of the regression fit between predicted and measured values.