A soil organic carbon identification method, electronic equipment and computer program product
By embedding salinity gradient constraints and environmental factor interaction mechanisms into the machine learning model, a composite weight function is constructed, which solves the problems of low prediction accuracy and poor robustness of traditional methods in coastal saline-alkali areas. This achieves high-precision and highly interpretable SOC spatial mapping, improving the prediction accuracy and stability of high-salinity areas.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANDONG NORMAL UNIV
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional spatial mapping methods for soil organic carbon (SOC) have low prediction accuracy and poor robustness in coastal saline-alkali areas, making it difficult to reveal the true interaction pathways between salinity gradient, environmental covariates, and SOC. This results in unstable prediction results in high-salt/extremely saline-alkali areas.
By embedding the salt gradient constraint and the interaction mechanism of environmental factors into the machine learning model, a gradient threshold weight function and an interaction effect function are constructed to generate a composite weight function, which is then embedded into the training objective function of the machine learning model. This optimizes the data fitting and physical mechanism conformity, thereby improving prediction accuracy and robustness.
It achieves high-precision, highly interpretable, and robust SOC spatial mapping, significantly improving the prediction accuracy and stability of high-salt/extremely saline-alkali areas, ensuring continuous and smooth spatial distribution of mapping results, and meeting the needs of fine digital soil mapping.
Description
Technical Field
[0001] This application relates to the field of soil measurement technology, and more specifically to a method for identifying soil organic carbon, an electronic device, and a computer program product.
Background Technology
[0002] Soil organic carbon (SOC) is a key indicator for measuring soil fertility and quality, and an important component of the global carbon cycle. Accurately mapping the spatial distribution of SOC is crucial for understanding regional carbon storage, assessing soil health, and developing sound land use and management policies. Traditional SOC spatial mapping methods primarily rely on classical geostatistical methods (such as Kriging interpolation) and general machine learning models (such as multiple linear regression, random forests, support vector machines, and gradient boosting decision trees) to make spatial predictions by establishing statistical relationships between SOC and a range of environmental covariates (such as topography, climate, vegetation, and soil properties).
[0003] However, in special regions with complex ecological processes and strong spatial heterogeneity, especially coastal saline-alkali areas, traditional mapping methods face significant challenges. Coastal saline-alkali soils are influenced by land-sea interactions, and soil salinity (SSC) exhibits a pronounced gradient from nearshore to inland, exerting a strong nonlinear inhibitory effect on the accumulation and decomposition of soil organic carbon (SOC). Traditional general models typically input salinity as an ordinary environmental covariate and fail to fully consider its unique gradient effect and piecewise constraints. This leads to systematic biases in high-salinity / extremely saline-alkali areas, resulting in unstable predictions and significantly reduced accuracy. Furthermore, traditional models are mostly based on the assumption of feature independence or consider only simple linear interactions, making it difficult to reveal the true interaction paths between salinity gradients, environmental covariates, and SOC, which limits the interpretability and physical meaning of the models.
Summary of the Invention
[0004] The purpose of this application is to provide a method, electronic device and computer program product for identifying soil organic carbon, in order to solve the problems of low prediction accuracy, poor robustness in high salinity areas and weak interpretability of physical mechanisms in the application of existing technologies in coastal saline-alkali areas.
[0005] This application provides a method for identifying soil organic carbon, including:
[0006] Acquire soil salinity data and environmental covariate data for the area to be tested;
[0007] Based on soil salinity data, the area to be tested is divided into different salinity gradient levels;
[0008] By constructing constraint models for soil organic carbon, soil salinity, and environmental covariates at each salinity gradient level, the threshold and constraint range of soil organic carbon at the corresponding salinity gradient level are determined.
[0009] Based on the threshold and constraint interval, a gradient threshold weighting function is generated to constrain the spatial distribution continuity of soil organic carbon.
[0010] Based on environmental covariate data, the impact of nonlinear interactions among multiple environmental covariates on soil organic carbon is quantified and embedded to generate an interaction effect function.
[0011] A composite weighting function for characterizing the spatial differentiation of soil organic carbon is constructed by coupling the gradient threshold weighting function and the interaction effect function.
[0012] By using a composite weight function as a penalty or weight term, embedding it into the training objective function of the machine learning model, and then training the model, an organic carbon identification model is obtained.
[0013] Soil salinity and environmental covariate data are input into the organic carbon identification model, which predicts and outputs the organic carbon identification results.
[0014] The above technical solution embeds the interaction mechanism between salinity gradient constraints and environmental factors into a machine learning model, achieving high-precision, highly interpretable, and robust SOC spatial mapping. By analyzing the SSC-SOC relationship through piecewise quantile regression, SOC thresholds and constraint intervals under each gradient are identified and set, ensuring the physical rationality of the model output values. This effectively avoids the problems of large prediction bias and instability in high-salinity areas found in traditional models, significantly improving the prediction accuracy and robustness in high-salinity / extremely saline-alkali areas. By constructing a gradient threshold weight function to constrain the spatial continuity of SOC and quantifying the interaction effects between environmental covariates, the model can characterize the multi-level action paths of salinity gradient, environmental covariates, and SOC. By embedding a composite weight function as a penalty or weight term into the model training objective, the model is guided to simultaneously optimize data fitting and physical mechanism conformity, improving prediction accuracy while ensuring a continuous and smooth spatial distribution of the mapping results, meeting the needs of refined digital soil mapping.
[0015] In some alternative implementations, the area to be tested is divided into different salinity gradient levels based on soil salinity data, including:
[0016] The K-means clustering method was used to classify the soil salinity data, dividing the area under test into four salinity gradient levels: slightly, moderately, severely, and extremely severely saline-alkali areas.
[0017] In the above technical solution, K-means clustering is used to divide the soil into four levels, effectively identifying and separating the inflection points or segmentation points of SOC response to salinity. This provides accurate gradient boundaries and sample basis for subsequent steps (such as piecewise quantile regression) to build dedicated constraint models for severely and extremely severely saline-alkali areas. This allows the model to independently learn the slowing pattern of the salinity inhibition effect unique to high-salinity areas, effectively avoiding systematic bias caused by using a single model to fit all gradients, thus significantly improving the prediction accuracy and stability of the model in high-salinity / extremely severely saline-alkali areas. The clearly defined four levels provide clear computational units for subsequently determining the thresholds and constraint intervals of soil organic carbon at the corresponding salinity gradient levels. The SSC-SOC relationship can be analyzed separately for each independent gradient level (such as severely saline-alkali areas), setting specific thresholds and constraint intervals that better fit the physical laws of that gradient.
[0018] In some optional implementations, by constructing a constraint model of soil organic carbon, soil salinity, and environmental covariates at each salinity gradient level, the threshold and constraint range of soil organic carbon at the corresponding salinity gradient level are determined, including:
[0019] For different salinity gradient levels, a piecewise quantile regression method was used to construct a constraint model between soil salinity and soil organic carbon;
[0020] The AIC criterion is used to optimize the structure of the piecewise regression model and determine the optimal piecewise model.
[0021] Based on the inflection point and slope of the regression coefficient changes in the optimal piecewise model, the threshold and constraint range of soil organic carbon at different salinity gradient levels are identified.
[0022] In some optional implementations, a gradient threshold weighting function constraining the spatial distribution continuity of soil organic carbon is generated based on a threshold and a constraint interval, including:
[0023] A spatial continuity constraint function is constructed using the Laplace operator to express the gradient relationship of soil organic carbon constrained by salinity;
[0024] Based on the threshold and constraint interval, set the gradient threshold weight function:
[0025] [formula missing from the source text];
[0026] where the symbols denote, respectively, the spatial gradient of soil organic carbon (∇SOC), the attenuation coefficient, and the dynamic threshold parameter, which is determined by the threshold and the constraint interval.
[0027] In the above technical solution, the Laplacian operator is used to constrain the spatial continuity of SOC and to express the gradient relationship of SOC under salinity constraints, thereby ensuring the continuity of the SOC spatial mapping results. The formula is as follows: [formula missing from the source text], where ∇SOC is the SOC spatial gradient operator and the remaining symbols are the spatial coordinates of the SOC. As the grading standard, the second derivative is calculated using the finite difference method to constrain the continuity of the SOC spatial distribution. Based on the gradient constraint threshold determined previously, the gradient threshold weight is set as follows: [formula missing from the source text], where the absolute value of the gradient represents the gradient strength, the dynamic threshold parameter controls the salinity gradient threshold, and the attenuation coefficient controls the decay rate of the weight.
[0028] In some optional implementations, based on environmental covariate data, the impact of nonlinear interactions between multiple environmental covariates on soil organic carbon is quantified and embedded to generate an interaction effect function, including:
[0029] The Hadamard product is used to characterize the nonlinear interaction effect between environmental covariates, resulting in the interaction effect function:
[0030] [formula missing from the source text];
[0031] where the two operands are environmental covariates, the operator between them denotes the Hadamard product, and the coefficient is the interaction weight coefficient.
[0032] In some optional implementations, a composite weighting function is constructed by coupling a gradient threshold weighting function and an interaction effect function to characterize the spatial differentiation of soil organic carbon, including:
[0033] By weighting and coupling the gradient threshold weight function and the interaction effect function, a composite weight function is obtained:
[0034] [formula missing from the source text];
[0035] where the coupling coefficients are parameters that can be dynamically adjusted.
[0036] In some optional implementations, a composite weight function is used as a penalty term or a weight term, embedded in the training objective function of the machine learning model, and the model is trained to obtain an organic carbon identification model, including:
[0037] In the objective function of the machine learning model, a Kullback-Leibler divergence constraint term and a spatial variance penalty term are added, resulting in the objective function:
[0038] [formula missing from the source text];
[0039] where the first term is the loss function measuring the difference between the predicted and measured values; the Kullback-Leibler divergence term constrains the consistency between the distribution of the composite weights and that of the theoretical weights, i.e., it keeps the allowed SOC values within the threshold range, so that an XGBoost machine learning model based on the composite weight function is constructed; and the remaining coefficients are tuning parameters.
[0040] In some alternative implementations, the machine learning model is an XGBoost model, LightGBM, CatBoost, or a random forest model;
[0041] Environmental covariates include at least one of the following: micro-topographic features, vegetation type and its growth indicators, land use patterns, farming patterns, and farmland management measures.
[0042] An electronic device provided in this application includes a processor and a memory, wherein the memory stores machine-readable instructions executable by the processor, and the machine-readable instructions, when executed by the processor, perform any of the methods described above.
[0043] This application provides a computer program product, including a computer program / instruction, which, when executed by a processor, implements the steps of any of the methods described above.
[0044] The beneficial effects of the embodiments of this application include: By embedding the interaction mechanism between salinity gradient constraints and environmental factors into a machine learning model, high-precision, highly interpretable, and highly robust SOC spatial mapping is achieved.
[0045] By analyzing the SSC-SOC relationship through piecewise quantile regression, we can identify and set the SOC threshold and constraint interval under each gradient, ensuring that the model output value is physically reasonable. This effectively avoids the problem of large prediction bias and instability of traditional models in high-salinity areas, and significantly improves the prediction accuracy and robustness of high-salinity / extremely saline-alkali areas.
[0046] By constructing a gradient threshold weight function to constrain the continuity of the SOC space and quantifying the interaction effects among environmental covariates, the model can characterize the multi-level action paths of salinity gradient, environmental covariates, and SOC.
[0047] By embedding a composite weight function as a penalty or weight term into the model training objective, the model is guided to simultaneously optimize data fitting and physical mechanism conformity. This improves prediction accuracy while ensuring continuous and smooth spatial distribution of mapping results, meeting the needs of refined digital soil mapping.
Brief Description of the Drawings
[0048] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments of this application will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0049] Figure 1 is a flowchart illustrating the steps of a soil organic carbon identification method provided in an embodiment of this application.
Detailed Implementation
[0050] The technical solutions in the embodiments of this application will now be described with reference to the accompanying drawings.
[0051] Referring to Figure 1, which is a flowchart of a soil organic carbon identification method provided in an embodiment of this application, the method includes:
[0052] Step 1: Obtain soil salinity data and environmental covariate data for the area to be tested;
[0053] Specifically, the Yellow River Delta and the southern shore of Laizhou Bay (118°32′00″-120°38′00″E, 36°40′00″-37°20′00″N) are characterized by strong land-sea interaction. The extent of saline-alkali soil decreases from the coast toward inland areas, making this a typical coastal saline-alkali region in China. The region also serves as a testing ground for exploring the comprehensive utilization of saline-alkali land and developing modern agriculture on such land, and is considered an important reserve land resource. Data on saline-alkali land in the Yellow River Delta and the southern shore of Laizhou Bay were collected. Following the principles of representativeness and typicality, K-means clustering was used to classify salinity into four gradients: slightly, moderately, severely, and extremely severely saline-alkali. A grid method was then used to determine the sampling point layout for each salinity gradient: 10-20 sampling points were set up in each gradient plot, and multi-temporal sampling was conducted in 2026, collecting more than 200 soil samples in total. At each sampling point, surface debris was removed in situ, topsoil (0-20 cm) was collected using the five-point sampling method (five-point mixing method) centered on the point, and the five subsamples were mixed; 2 kg of the mixture was taken as the actual sample. A handheld GPS receiver was used to record the coordinates of each sampling point, high-resolution photographs of the sample plot were taken, and the surrounding environment and basic soil information were recorded in detail.
[0054] After removing mixed animal and plant residues in the air-drying chamber, the soil is naturally air-dried, ground, and passed through a 2mm (10-mesh) nylon sieve. It is then thoroughly mixed and packaged for salt content determination. Another portion of the soil is passed through a 0.25mm nylon sieve for SOC content determination.
[0055] Specific determination methods: SOC was measured by the potassium dichromate oxidation volumetric method; SSC was determined by the gravimetric method.
[0056] Step 2: Based on soil salinity data, the area to be tested is divided into different salinity gradient levels;
[0057] Specifically, to determine more reasonably the gradient relationship of the influence of salinization on SOC, K-means clustering is used to identify the segmentation points of the relationship between SSC and SOC. The salinity gradient is divided into four levels (mild / moderate / severe / extremely severe) to reflect the stage characteristics of the influence of salinization on SOC and to ensure that the gradient division accurately matches the response pattern of SOC.
[0058] Step 3: By constructing a constraint model for soil organic carbon, soil salinity, and environmental covariates at each salinity gradient level, determine the threshold and constraint range of soil organic carbon at the corresponding salinity gradient level.
[0059] Step 4: Generate a gradient threshold weighting function to constrain the spatial distribution continuity of soil organic carbon based on the threshold and constraint interval;
[0060] Step 5: Based on the environmental covariate data, quantify and embed the impact of nonlinear interactions between multiple environmental covariates on soil organic carbon, and generate an interaction effect function;
[0061] Step 6: Couple the gradient threshold weight function and the interaction effect function to construct a composite weight function for characterizing the spatial differentiation of soil organic carbon;
[0062] Step 7: Using a composite weight function as a penalty or weight term, embed it into the training objective function of the machine learning model, and train the model to obtain the organic carbon identification model.
[0063] Step 8: Input the salinity and environmental covariates into the organic carbon identification model, predict and output the organic carbon identification results.
[0064] The above technical solution embeds the interaction mechanism between salinity gradient constraints and environmental factors into a machine learning model, achieving high-precision, highly interpretable, and robust SOC spatial mapping. By analyzing the SSC-SOC relationship through piecewise quantile regression, SOC thresholds and constraint intervals under each gradient are identified and set, ensuring the physical rationality of the model output values. This effectively avoids the problems of large prediction bias and instability in high-salinity areas found in traditional models, significantly improving the prediction accuracy and robustness in high-salinity / extremely saline-alkali areas. By constructing a gradient threshold weight function to constrain the spatial continuity of SOC and quantifying the interaction effects between environmental covariates, the model can characterize the multi-level action paths of salinity gradient, environmental covariates, and SOC. By embedding a composite weight function as a penalty or weight term into the model training objective, the model is guided to simultaneously optimize data fitting and physical mechanism conformity, improving prediction accuracy while ensuring a continuous and smooth spatial distribution of the mapping results, meeting the needs of refined digital soil mapping.
[0065] In some alternative implementations, the area to be tested is divided into different salinity gradient levels based on soil salinity data, including:
[0066] The K-means clustering method was used to classify the soil salinity data, dividing the area under test into four salinity gradient levels: slightly, moderately, severely, and extremely severely saline-alkali areas.
[0067] In the above technical solution, K-means clustering is used to divide the soil into four levels, effectively identifying and separating the inflection points or segmentation points of SOC response to salinity. This provides accurate gradient boundaries and sample basis for subsequent steps (such as piecewise quantile regression) to build dedicated constraint models for severely and extremely severely saline-alkali areas. This allows the model to independently learn the slowing pattern of the salinity inhibition effect unique to high-salinity areas, effectively avoiding systematic bias caused by using a single model to fit all gradients, thus significantly improving the prediction accuracy and stability of the model in high-salinity / extremely severely saline-alkali areas. The clearly defined four levels provide clear computational units for subsequently determining the thresholds and constraint intervals of soil organic carbon at the corresponding salinity gradient levels. The SSC-SOC relationship can be analyzed separately for each independent gradient level (such as severely saline-alkali areas), setting specific thresholds and constraint intervals that better fit the physical laws of that gradient.
[0068] In some alternative implementations, gradient partitioning methods can use quantile grading or natural breakpoints instead of K-means clustering for salt gradient grading. The core is to ensure that the salt gradient can be accurately partitioned and the constraint characteristics of different gradients on SOC can be identified.
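The salinity grading step can be illustrated with a minimal sketch of one-dimensional K-means (Lloyd's algorithm) applied to SSC values. The function name `kmeans_1d`, the quantile-based initialisation, and the sample values are assumptions made for illustration; they are not taken from the application.

```python
import numpy as np

def kmeans_1d(values, k=4, n_iter=100):
    """Lloyd's algorithm on 1-D data; label 0 is the lowest-salinity cluster."""
    x = np.asarray(values, dtype=float)
    # Deterministic start (an assumption): spread centres over the data quantiles.
    centres = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        new_centres = np.array([
            x[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Final assignment against the converged centres.
    labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
    return labels, centres

# Synthetic SSC values (hypothetical units), three samples per level.
ssc = np.array([1.1, 1.3, 0.9, 3.0, 3.2, 2.8, 6.1, 5.9, 6.3, 12.0, 11.5, 12.4])
labels, centres = kmeans_1d(ssc, k=4)
names = ["slight", "moderate", "severe", "extremely severe"]
levels = [names[i] for i in labels]
```

Because the initial centres are sorted and the data are one-dimensional, the cluster index doubles as the gradient level. Quantile grading or natural breaks could be substituted by replacing the assignment rule.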
[0069] In some optional implementations, by constructing a constraint model of soil organic carbon, soil salinity, and environmental covariates at each salinity gradient level, the threshold and constraint range of soil organic carbon at the corresponding salinity gradient level are determined, including:
[0070] For different salinity gradient levels, a piecewise quantile regression method was used to construct a constraint model between soil salinity and soil organic carbon;
[0071] The AIC criterion was used to optimize the structure of the piecewise quantile regression model and determine the optimal piecewise model.
[0072] Based on the inflection point and slope of the regression coefficient changes in the optimal piecewise model, the threshold and constraint range of soil organic carbon at different salinity gradient levels are identified.
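The AIC-based choice of a piecewise structure can be sketched as follows. To keep the example self-contained, ordinary least squares is swapped in for the quantile loss, so this is not piecewise quantile regression itself; the helper names, the candidate breakpoints, and the synthetic SSC-SOC data are all assumptions.

```python
import numpy as np

def fit_hinge(x, y, breakpoint=None):
    """Least-squares fit of y on x, optionally with one slope change at `breakpoint`."""
    cols = [np.ones_like(x), x]
    if breakpoint is not None:
        cols.append(np.maximum(0.0, x - breakpoint))  # extra slope beyond the break
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef, rss

def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * np.log(rss / n) + 2 * n_params

def select_piecewise(x, y, candidates):
    """Compare a single line against one-break models and keep the lowest AIC."""
    n = len(x)
    _, rss0 = fit_hinge(x, y)
    best = {"breakpoint": None, "aic": aic(rss0, n, 2)}
    for b in candidates:
        _, rss = fit_hinge(x, y, b)
        score = aic(rss, n, 4)  # intercept, slope, extra slope, break location
        if score < best["aic"]:
            best = {"breakpoint": float(b), "aic": score}
    return best

# Synthetic data: SOC falls with SSC, and the decline slows beyond SSC = 6.
x = np.linspace(0.0, 10.0, 41)
y = 20.0 - 2.0 * x + 1.5 * np.maximum(0.0, x - 6.0) + 0.05 * np.sin(7.0 * x)
best = select_piecewise(x, y, candidates=np.arange(2.0, 9.0, 1.0))
```

The selected breakpoint plays the role of the inflection point, and the slopes on either side of it indicate how the constraint on SOC changes across the threshold.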
[0073] In some optional implementations, a gradient threshold weighting function constraining the spatial distribution continuity of soil organic carbon is generated based on a threshold and a constraint interval, including:
[0074] A spatial continuity constraint function is constructed using the Laplace operator to express the gradient relationship of soil organic carbon constrained by salinity;
[0075] Based on the threshold and constraint interval, set the gradient threshold weight function:
[0076] [formula missing from the source text];
[0077] where the symbols denote, respectively, the spatial gradient of soil organic carbon (∇SOC), the attenuation coefficient, and the dynamic threshold parameter, which is determined by the threshold and the constraint interval.
[0078] In the above technical solution, the Laplacian operator is used to constrain the spatial continuity of SOC and to express the gradient relationship of SOC under salinity constraints, thereby ensuring the continuity of the SOC spatial mapping results. The formula is as follows: [formula missing from the source text], where ∇SOC is the SOC spatial gradient operator and the remaining symbols are the spatial coordinates of the SOC. As the grading standard, the second derivative is calculated using the finite difference method to constrain the continuity of the SOC spatial distribution. Based on the gradient constraint threshold determined previously, the gradient threshold weight is set as follows: [formula missing from the source text], where the absolute value of the gradient represents the gradient strength, the dynamic threshold parameter controls the salinity gradient threshold, and the attenuation coefficient controls the decay rate of the weight.
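A minimal sketch of the finite-difference step and of one possible gradient threshold weight is given below. The five-point stencil is the standard discretisation of the Laplacian. The exponential weight, with `tau` standing for the dynamic threshold parameter and `kappa` for the attenuation coefficient, is only an assumed form, since the application's own formula is not available in this text.

```python
import numpy as np

def laplacian_5pt(grid, h=1.0):
    """Five-point finite-difference Laplacian; border cells are left at 0."""
    g = np.asarray(grid, dtype=float)
    lap = np.zeros_like(g)
    lap[1:-1, 1:-1] = (
        g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    ) / h**2
    return lap

def gradient_strength(grid, h=1.0):
    """|grad SOC| from central differences."""
    gy, gx = np.gradient(np.asarray(grid, dtype=float), h)
    return np.hypot(gx, gy)

def threshold_weight(strength, tau, kappa):
    """Assumed form: weight 1 up to tau, exponential decay beyond it."""
    return np.exp(-kappa * np.maximum(0.0, strength - tau))

# Synthetic SOC surface x^2 + y^2, whose Laplacian is 4 everywhere.
yy, xx = np.mgrid[0:5, 0:5]
soc = (xx**2 + yy**2).astype(float)
lap = laplacian_5pt(soc)
weights = threshold_weight(gradient_strength(soc), tau=4.0, kappa=0.5)
```

Cells whose gradient strength stays below `tau` keep full weight, while steeper cells are down-weighted at a rate set by `kappa`.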
[0079] In some optional implementations, based on environmental covariate data, the impact of nonlinear interactions between multiple environmental covariates on soil organic carbon is quantified and embedded to generate an interaction effect function, including:
[0080] The Hadamard product is used to characterize the nonlinear interaction effect between environmental covariates, resulting in the interaction effect function:
[0081] [formula missing from the source text];
[0082] where the two operands are environmental covariates, the operator between them denotes the Hadamard product, and the coefficient is the interaction weight coefficient.
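The Hadamard-product interaction can be sketched as an element-wise product of two co-registered covariate grids scaled by an interaction weight. The function name, the single-pair form, the name `gamma` for the weight, and the sample rasters are assumptions for illustration.

```python
import numpy as np

def interaction_effect(x_i, x_j, gamma):
    """gamma * (x_i ⊙ x_j): element-wise product of two covariate arrays."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    if x_i.shape != x_j.shape:
        raise ValueError("covariate arrays must have the same shape")
    return gamma * (x_i * x_j)

# Synthetic 2x2 rasters standing in for a vegetation index and water content.
ndvi = np.array([[0.2, 0.4], [0.6, 0.8]])
moisture = np.array([[0.5, 0.5], [0.25, 0.25]])
effect = interaction_effect(ndvi, moisture, gamma=2.0)
```

With more than two covariates, one such term per covariate pair could be summed, each with its own weight.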
[0083] In some optional implementations, a composite weighting function is constructed by coupling a gradient threshold weighting function and an interaction effect function to characterize the spatial differentiation of soil organic carbon, including:
[0084] By weighting and coupling the gradient threshold weight function and the interaction effect function, a composite weight function is obtained:
[0085] [formula missing from the source text];
[0086] where the coupling coefficients are parameters that can be dynamically adjusted.
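One plausible reading of the weighted coupling is a linear combination of the two functions at each location s. The symbols below (W for the composite weight, w_g for the gradient threshold weight, f_int for the interaction effect, and the coefficients ω₁, ω₂) are hypothetical names chosen for this sketch and are not the application's own notation.

```latex
W(s) = \omega_1 \, w_g(s) + \omega_2 \, f_{\mathrm{int}}(s),
\qquad \omega_1, \omega_2 \ge 0
```

Under this assumed form, raising ω₁ emphasises spatial continuity under the salinity constraint, while raising ω₂ emphasises the covariate interactions.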
[0087] In some optional implementations, a composite weight function is used as a penalty term or a weight term, embedded in the training objective function of the machine learning model, and the model is trained to obtain an organic carbon identification model, including:
[0088] In the objective function of the machine learning model, a Kullback-Leibler divergence constraint term and a spatial variance penalty term are added, resulting in the objective function:
[0089] [formula missing from the source text];
[0090] where the first term is the loss function measuring the difference between the predicted and measured values; the Kullback-Leibler divergence term constrains the consistency between the distribution of the composite weights and that of the theoretical weights, i.e., it keeps the allowed SOC values within the threshold range, so that an XGBoost machine learning model based on the composite weight function is constructed; and the remaining coefficients are tuning parameters.
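The structure of such an objective (a fitting loss plus a Kullback-Leibler term plus a spatial variance penalty) can be sketched as below. The mean squared error, the use of the plain variance of the predictions as a stand-in for the spatial variance, and the names `lam_kl` and `lam_var` are all assumptions; the application's exact terms are not available in this text.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two non-negative weight vectors, normalised to sum to 1."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def objective(y_true, y_pred, w_composite, w_theory, lam_kl, lam_var):
    """Fitting loss + lam_kl * KL term + lam_var * variance penalty."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fit = float(np.mean((y_true - y_pred) ** 2))   # assumed loss: MSE
    kl = kl_divergence(w_composite, w_theory)      # weight-distribution consistency
    spread = float(np.var(y_pred))                 # crude stand-in for spatial variance
    return fit + lam_kl * kl + lam_var * spread

# Synthetic example: three samples with mismatched weight distributions.
value = objective(
    y_true=[10.0, 8.0, 6.0], y_pred=[9.5, 8.5, 6.0],
    w_composite=[0.5, 0.3, 0.2], w_theory=[0.4, 0.4, 0.2],
    lam_kl=1.0, lam_var=0.1,
)
```

The two coefficients trade data fitting against mechanism conformity: setting both to zero recovers a plain regression loss.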
[0091] In some alternative implementations, the machine learning model is an XGBoost model, LightGBM, CatBoost, or a random forest model;
[0092] Environmental covariates include at least one of the following: micro-topographic features, vegetation type and its growth indicators, land use patterns, farming patterns, and farmland management measures.
[0093] Specifically, in this embodiment, the machine learning model adopts the XGBoost model, and uses R², RMSE, and RPD as core evaluation indicators to optimize the key parameters of the model. The model is also compared and verified with traditional models to ensure high accuracy and high stability.
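The three evaluation indicators can be computed as in the following sketch. RPD is taken here as the standard deviation of the measured values divided by the RMSE; the use of the sample standard deviation (`ddof=1`) and the sample data are assumptions.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return R2, RMSE and RPD for measured vs. predicted SOC."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    rpd = float(np.std(y_true, ddof=1) / rmse)  # sample SD of observations / RMSE
    return {"R2": r2, "RMSE": rmse, "RPD": rpd}

# Synthetic measured and predicted SOC values.
metrics = evaluate([1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.5, 3.5, 4.5, 5.5])
```

For this toy input the constant offset of 0.5 gives an RMSE of 0.5 and an R² of 0.875.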
[0094] The training process of the XGBoost model includes:
[0095] Prepare the training dataset, including measured values of soil organic carbon (SOC) and soil salinity (SSC) data from sample points, as well as multi-source environmental covariates (such as soil texture, water content, vegetation index, etc.).
[0096] Salinity gradient constraint quantification: Based on SSC data, the study area is divided into different salinity gradients (e.g., mild, moderate, severe, and extremely severe) using methods such as K-means. For each gradient, the relationship between SOC and SSC is analyzed using a piecewise quantile regression model to determine the reasonable threshold and constraint range of SOC under that gradient.
[0097] Construct the customized objective function L.
[0098] Parameter initialization and iterative training: Initialize the XGBoost model parameters (such as tree structure and learning rate) and mechanism parameters (gradient constraint coefficients, threshold adjustment parameters, interaction weight coefficients, balance parameters, penalty coefficients, etc.).
[0099] Start iteration (boosting): a. Prediction: calculate the SOC prediction value for all samples based on the current model (a set of decision trees). b. Loss calculation: calculate the value of the complete objective function L. This loss value measures not only the prediction error but also the degree to which the predicted values violate the physical constraints (exceeding the threshold range, spatial discontinuity). c. Gradient calculation: calculate the gradient (first derivative) and the second derivative (Hessian) of the objective function L with respect to the model's predicted values; these are used to construct the next decision tree. The gradient contains signals from both data fitting and mechanism penalties. The XGBoost algorithm uses the gradient and the Hessian to greedily find the split points that maximize the reduction of the objective function L. The new tree is then added to the model, updating the predictions.
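The gradient and Hessian that drive each boosting round can be sketched for a simplified objective: squared error plus a quadratic penalty on predictions that leave an allowed SOC interval. The penalty form, the function name, and the parameters `lower`, `upper`, and `penalty` are assumptions. In the XGBoost Python package, a function of this kind could be wrapped as a custom objective, which receives the predictions and the training matrix and returns the per-sample gradient and Hessian.

```python
import numpy as np

def constrained_grad_hess(y_pred, y_true, lower, upper, penalty):
    """Per-sample gradient and Hessian of
       0.5*(y_pred - y_true)**2 + 0.5*penalty*violation**2,
    where violation is the distance of y_pred outside [lower, upper]."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    over = np.maximum(0.0, y_pred - upper)    # amount above the upper bound
    under = np.maximum(0.0, lower - y_pred)   # amount below the lower bound
    grad = (y_pred - y_true) + penalty * (over - under)
    hess = 1.0 + penalty * ((over > 0) | (under > 0)).astype(float)
    return grad, hess

# Synthetic case: one sample inside the interval, one above it, one below it.
grad, hess = constrained_grad_hess(
    y_pred=[5.0, 12.0, 1.0], y_true=[5.0, 10.0, 3.0],
    lower=2.0, upper=10.0, penalty=2.0,
)
```

Samples that violate the interval receive a larger gradient and Hessian, so the next tree is pushed to correct them more strongly than the data error alone would.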
[0100] During training or through cross-validation, the mechanism parameters (gradient constraint coefficients, threshold adjustment parameters, interaction weight coefficients, balance parameters, penalty coefficients, etc.) are tuned (e.g., by using variable iterative experiments or grid search) to find the parameter combination that makes the model simultaneously achieve the highest prediction accuracy (maximum R², minimum RMSE) and the best mechanism fit (minimum spatial variance) on the validation set.
[0101] Repeat the above steps until the preset number of iterations is reached or the loss function converges to obtain the trained organic carbon identification model.
[0102] In this embodiment, the key parameters to be optimized include: gradient constraint coefficient α, threshold adjustment parameter τ, decay coefficient κ, interaction weight coefficient γ, and spatial smoothness coefficient β. The parameters are preferably obtained using the variable iterative experiment method, as follows:
[0103] Optimal salt gradient partitioning: K-means clustering was used to divide the SSC of the study area into four gradients: mild, moderate, severe, and extremely severe. Validated by the AIC criterion, this hierarchical method can optimally express the segmented constraint characteristics of salt on SOC, ensuring the rationality of the gradient partitioning.
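The partitioning step can be sketched with a one-dimensional K-means (Lloyd's algorithm) written in plain Python. The starting centers, the sample values, and the function name `kmeans_1d` are assumptions of this sketch, and the AIC validation described above is not covered.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 4,
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Cluster scalar values into k groups; labels are ranked by ascending center."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: evenly spaced order statistics (an assumption of this sketch).
    centers = [xs[int((2 * c + 1) * n / (2 * k))] for c in range(k)]
    for _ in range(max_iter):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(g) / len(g) if g else centers[c]
                       for c, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    # Relabel so that 0 is the lowest-salinity cluster and k - 1 the highest.
    order = sorted(range(k), key=lambda c: centers[c])
    rank = {c: r for r, c in enumerate(order)}
    labels = [rank[min(range(k), key=lambda c: abs(v - centers[c]))] for v in values]
    return [centers[c] for c in order], labels


# Made-up salt contents, for illustration only.
ssc = [0.8, 1.0, 1.2, 3.0, 3.2, 5.9, 6.1, 9.8, 10.2]
centers, labels = kmeans_1d(ssc, k=4)
names = ["mild", "moderate", "severe", "extremely severe"]
classes = [names[i] for i in labels]
```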
[0104] Optimization of composite weight parameters: Conduct variable experiments within each salinity gradient, fix other parameters, adjust the values of individual parameters one by one, and screen out the parameter combination that makes the model simultaneously satisfy the highest prediction accuracy (maximum R², minimum RMSE, RPD≥2) and optimal spatial continuity (minimum spatial variance), ensuring that the composite weight function can stably represent gradient constraints and interaction driving mechanisms.
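The one-parameter-at-a-time experiment can be sketched as a simple sweep. The function `evaluate` stands in for training and validating the model within one salinity gradient and is hypothetical, as are the parameter values. The sketch minimizes RMSE subject to RPD ≥ 2 and does not model the spatial-variance criterion.

```python
from typing import Callable, Dict, Sequence, Tuple

Params = Dict[str, float]


def one_at_a_time(evaluate: Callable[[Params], Tuple[float, float]],
                  start: Params,
                  candidates: Dict[str, Sequence[float]],
                  min_rpd: float = 2.0) -> Tuple[Params, float]:
    """Vary one parameter at a time, holding the others at their current best."""
    best = dict(start)
    best_rmse, best_rpd = evaluate(best)
    if best_rpd < min_rpd:
        best_rmse = float("inf")  # the start point does not meet the RPD floor
    for name, values in candidates.items():
        for value in values:
            trial = dict(best, **{name: value})
            trial_rmse, trial_rpd = evaluate(trial)
            if trial_rpd >= min_rpd and trial_rmse < best_rmse:
                best, best_rmse = trial, trial_rmse
    return best, best_rmse


def toy_evaluate(params: Params) -> Tuple[float, float]:
    """Made-up response surface returning (RMSE, RPD); not a real model."""
    rmse = (params["gamma"] - 0.3) ** 2 + (params["kappa"] - 2.0) ** 2 + 0.1
    rpd = 3.0 if params["kappa"] <= 2.0 else 1.0
    return rmse, rpd


best, best_rmse = one_at_a_time(
    toy_evaluate,
    start={"gamma": 0.1, "kappa": 1.0},
    candidates={"gamma": [0.1, 0.3, 0.5], "kappa": [1.0, 2.0, 3.0]},
)
```

A sweep of this kind costs far fewer model fits than a full grid, but it can miss combinations in which two parameters must change together.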
[0105] Model regularization parameter optimization: Regularization parameters such as α, β, γ, and λ are optimized using a grid search method to avoid model overfitting and improve the model's generalization ability.
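A grid search of this kind can be sketched with the standard library alone. The function `score` is a hypothetical stand-in for a validation error (for example, cross-validated RMSE) that should be computed on held-out data so that the selected regularization strength reflects generalization rather than training fit. The grid values are made up.

```python
import itertools
from typing import Callable, Dict, Optional, Sequence, Tuple

Params = Dict[str, float]


def grid_search(score: Callable[[Params], float],
                grid: Dict[str, Sequence[float]]) -> Tuple[Optional[Params], float]:
    """Evaluate every combination in the grid and keep the lowest score."""
    names = list(grid)
    best_params: Optional[Params] = None
    best_score = float("inf")
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params: Params) -> float:
    """Made-up validation error with its minimum at alpha = 0.5, lam = 1.0."""
    return (params["alpha"] - 0.5) ** 2 + (params["lam"] - 1.0) ** 2


best_params, best_score = grid_search(
    toy_score,
    {"alpha": [0.1, 0.5, 1.0], "lam": [0.0, 1.0, 10.0]},
)
```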
[0106] One possible structure of the electronic device provided in this application includes: a processor, a memory, and a communication interface, which are interconnected and communicate with each other via a communication bus and / or other forms of connection mechanisms.
[0107] There may be one or more memories, each of which may be, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The processor and other possible components can access the memory, reading and / or writing data therein.
[0108] There may be one or more processors, each of which can be an integrated circuit chip with signal processing capabilities. A processor can be a general-purpose processor, such as a Central Processing Unit (CPU), Microcontroller Unit (MCU), Network Processor (NP), or other conventional processor; it can also be a special-purpose processor, such as a Neural-network Processing Unit (NPU), Graphics Processing Unit (GPU), Digital Signal Processor (DSP), Application Specific Integrated Circuit (ASIC), Field Programmable Gate Array (FPGA), or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component. Furthermore, when there are multiple processors, some may be general-purpose processors, while others may be special-purpose processors.
[0109] There may be one or more communication interfaces, which can be used to communicate directly or indirectly with other devices to exchange data. A communication interface may include interfaces for wired and / or wireless communication.
[0110] One or more computer program instructions may be stored in the memory, and the processor may read and execute these computer program instructions to implement the methods provided in the embodiments of this application.
[0111] It will be understood that the electronic device can include more or fewer components, or have a different structure, and that these components can be implemented using hardware, software, or a combination thereof. The electronic device can be a physical device, such as a PC, laptop, tablet, mobile phone, server, or embedded device, or it can be a virtual device, such as a virtual machine or a virtualized container. Furthermore, the electronic device is not limited to a single device; it can also be a combination of multiple devices or a cluster of numerous devices.
[0112] This application provides a computer program product, including a computer program / instruction, which, when executed by a processor, implements the steps of any of the methods described above.
[0113] In the embodiments provided in this application, it should be understood that the disclosed apparatus and methods can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. Furthermore, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Additionally, the displayed or discussed mutual couplings, direct couplings, or communication connections may be through some communication interfaces; indirect couplings or communication connections between devices or units may be electrical, mechanical, or other forms.
[0114] Furthermore, the units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0115] Furthermore, the functional modules in the various embodiments of this application can be integrated together to form an independent part, or each module can exist independently, or two or more modules can be integrated to form an independent part.
[0116] In this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any such actual relationship or order between these entities or operations.
[0117] The above description is merely an embodiment of this application and is not intended to limit the scope of protection of this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this application should be included within the scope of protection of this application.
Claims
1. A soil organic carbon identification method, characterized by comprising: acquiring soil salinity data and environmental covariate data for the area to be tested; dividing the area to be tested into different salinity gradient levels based on the soil salinity data; constructing constraint models for soil organic carbon, soil salinity, and environmental covariates at each salinity gradient level to determine the threshold and constraint interval of soil organic carbon at the corresponding salinity gradient level; generating, based on the threshold and constraint interval, a gradient threshold weight function that constrains the spatial distribution continuity of soil organic carbon; quantifying and embedding, based on the environmental covariate data, the impact of nonlinear interactions among multiple environmental covariates on soil organic carbon to generate an interaction effect function; coupling the gradient threshold weight function and the interaction effect function to construct a composite weight function that characterizes the spatial differentiation of soil organic carbon; embedding the composite weight function, as a penalty term or a weight term, into the training objective function of a machine learning model, and training the model to obtain an organic carbon identification model; and inputting the soil salinity data and environmental covariate data into the organic carbon identification model to predict and output the organic carbon identification results.
2. The method of claim 1, wherein dividing the area to be tested into different salinity gradient levels based on the soil salinity data includes: classifying the soil salinity data using the K-means clustering method, so that the area to be tested is divided into four salinity gradient levels: mildly, moderately, severely, and extremely severely saline-alkali areas.
3. The method of claim 1, wherein constructing constraint models for soil organic carbon, soil salinity, and environmental covariates at each salinity gradient level to determine the threshold and constraint interval of soil organic carbon at the corresponding salinity gradient level includes: constructing, for the different salinity gradient levels, a constraint model between soil salinity and soil organic carbon using a piecewise quantile regression method; optimizing the structure of the piecewise regression model using the AIC criterion to determine the optimal piecewise model; and identifying the threshold and constraint interval of soil organic carbon at the different salinity gradient levels based on the inflection points and slopes of the regression coefficient changes in the optimal piecewise model.
4. The method of claim 1, wherein generating, based on the threshold and constraint interval, a gradient threshold weight function that constrains the spatial distribution continuity of soil organic carbon includes: constructing a spatial continuity constraint function using the Laplace operator to express the gradient relationship of soil organic carbon constrained by salinity; and setting, based on the threshold and constraint interval, the gradient threshold weight function [formula not reproduced in the source text], whose terms are the spatial gradient of soil organic carbon, a decay coefficient, and a dynamic threshold parameter determined by the threshold and the constraint interval.
5. The method of claim 4, wherein quantifying and embedding, based on the environmental covariate data, the impact of nonlinear interactions among multiple environmental covariates on soil organic carbon to generate an interaction effect function includes: characterizing the nonlinear interaction effect between environmental covariates using the Hadamard product to obtain the interaction effect function [formula not reproduced in the source text], whose terms are the environmental covariates, the Hadamard product operator, and an interaction weight coefficient.
6. The method of claim 5, wherein coupling the gradient threshold weight function and the interaction effect function to construct a composite weight function that characterizes the spatial differentiation of soil organic carbon includes: obtaining the composite weight function by weighted coupling of the gradient threshold weight function and the interaction effect function [formula not reproduced in the source text], which includes a dynamic adjustment parameter.
7. The method of claim 6, wherein embedding the composite weight function, as a penalty term or a weight term, into the training objective function of the machine learning model and training the model to obtain the organic carbon identification model includes: adding a Kullback-Leibler divergence constraint term and a spatial variance penalty term to the objective function of the machine learning model to obtain the objective function [formula not reproduced in the source text], whose terms are a loss function of the predicted values and true values, a distribution consistency model parameter of the composite weights and theoretical weights, and an adjustment parameter.
8. The method of claim 1, wherein the machine learning model is an XGBoost, LightGBM, CatBoost, or random forest model; and the environmental covariates include at least one of the following: micro-topographic features, vegetation type and its growth indicators, land use patterns, farming patterns, and farmland management measures.
9. An electronic device, characterized by comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, which, when executed by the processor, perform the method of any one of claims 1-8.
10. A computer program product, comprising a computer program / instructions, characterized in that, when the computer program / instructions are executed by a processor, they implement the steps of the method of any one of claims 1-8.
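The formulas of claims 4 to 7 are not reproduced in this text, so the following is only a sketch of the generic operations those claims name: a discrete Laplace operator on a raster, an element-wise (Hadamard) product of two covariate vectors, and a Kullback-Leibler divergence between two weight vectors. How these are combined in the claimed functions is not shown, and all names and values are hypothetical.

```python
import math
from typing import List, Sequence


def laplacian(grid: Sequence[Sequence[float]]) -> List[List[float]]:
    """Five-point discrete Laplacian on interior cells; border cells stay at 0."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = (grid[r - 1][c] + grid[r + 1][c]
                         + grid[r][c - 1] + grid[r][c + 1] - 4.0 * grid[r][c])
    return out


def hadamard(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise product of two equally long covariate vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) after normalizing both vectors to sum to 1.

    Assumes non-negative entries and q[i] > 0 wherever p[i] > 0.
    """
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / (qi / sq))
               for pi, qi in zip(p, q) if pi > 0)


# Made-up inputs, for illustration only.
soc_raster = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]
roughness = laplacian(soc_raster)
interaction = hadamard([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
mismatch = kl_divergence([0.5, 0.5], [0.9, 0.1])
```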