A forest carbon sink prediction method based on spatial heterogeneous decoupling and hierarchical distributed autoregressive meta-learning fusion
By fusing spatial heterogeneous decoupling with hierarchical distributed autoregressive meta-learning, this method solves the problems of ecological and environmental heterogeneity, multi-source driving factor coupling interference, and time-series lag effects in forest carbon sink prediction. It achieves high-precision, stable, and interpretable prediction of forest carbon density and carbon sink, supporting the realization of carbon peaking and carbon neutrality strategic goals.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- INST OF FOREST ECOLOGY ENVIRONMENT & PROTECTION CHINESE ACAD OF FORESTRY
- Filing Date
- 2026-03-31
- Publication Date
- 2026-06-26
Figure CN122288015A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of ecological environment monitoring and carbon cycle assessment technology, and in particular to a forest carbon sink prediction method based on the fusion of spatial heterogeneous decoupling and hierarchical distributed autoregressive meta-learning.
Background Technology
[0002] As the largest carbon sink among terrestrial ecosystems, forest ecosystems play an irreplaceable and crucial role in the global carbon cycle. Forests absorb carbon dioxide from the atmosphere through photosynthesis and fix it in vegetation and soil, forming forest carbon sinks. Forest carbon sinks not only mitigate climate change caused by rising greenhouse gas concentrations but also play a vital role in maintaining ecological security, preserving biodiversity, and achieving carbon peaking and carbon neutrality strategic goals. Therefore, establishing high-precision, multi-scale, long-term series methods for estimating and predicting forest carbon sinks is a key technical issue in the field of ecological environment monitoring and carbon sink management.
[0003] Early forest carbon sink predictions primarily relied on process-based ecosystem models. These models simulate the mechanisms of carbon, water, and energy exchange by constructing biophysical and biochemical process equations. However, due to the large number of model parameters and their high dependence on field survey data, it is difficult to obtain complete and reliable input information in complex terrain areas or regions with scarce data, resulting in high uncertainty of model parameters and large prediction errors. Furthermore, differences in survey methods and data standards across different regions further affect the stability and comparability of model results.
[0004] With the development of remote sensing and geographic information technologies, machine learning and deep learning algorithms that combine multi-source data, such as ground plot survey data, satellite imagery data (e.g., Landsat, MODIS), and meteorological reanalysis data (e.g., ERA5), have gradually become the mainstream technical approach for predicting forest vegetation carbon density. These data-driven models can improve prediction accuracy and spatial resolution to some extent, while reducing reliance on complex mechanistic equations.
[0005] However, existing data-driven models still have the following problems. First, spatial heterogeneity is handled insufficiently. China's forests are distributed across multiple climate zones and topographic regions, and the mechanisms driving growth differ significantly between ecological regions. Traditional global models typically fit uniform parameters and ignore spatial non-stationarity. As a result, regions with large sample sizes tend to dominate the model's weight allocation, which reduces prediction accuracy in ecological edge areas and other atypical regions.
[0006] Second, multi-source feature coupling modeling suffers from interference effects. Complex nonlinear relationships exist between ground survey data, remote sensing features, climate factors, topographic factors, and soil properties. Traditional models often directly input all variables into the same network or regression framework for unified training, leading to coupling interference between different physical driving layers and affecting the accuracy of feature contribution representation.
[0007] Third, the ensemble strategy is relatively simple. Some existing studies use model ensembles to improve stability, but most rely on simple weighted averages or linear combinations. Such schemes cannot identify the failure modes of each sub-model under different ecological and environmental conditions, and they lack dynamic weight adjustment and nonlinear bias correction mechanisms.
[0008] Fourth, the time lag effect is not adequately depicted. Forest growth and carbon sequestration processes are affected by long-term lag effects of climate factors, and traditional static models or simple time-series models are unable to adaptively adjust the memory length for different ecological regions, thus failing to fully express the differences in the response of ecosystems in different regions to climate.
[0009] Fifth, insufficient interpretability. While end-to-end deep learning models have certain advantages in numerical prediction, the internal decision-making process of the models is difficult to interpret, which is not conducive to ecological management decisions and policy formulation.
[0010] Therefore, for predicting carbon density and carbon sinks of forest vegetation at large scales, there is an urgent need for a comprehensive prediction method that explicitly handles spatial heterogeneity, separates the influence of different driving layers, supports distributed modeling and dynamic fusion, and is interpretable, so as to improve the overall accuracy, robustness and practical engineering value of the model.
Summary of the Invention
[0011] In view of this, the purpose of this invention is to provide a forest carbon sink prediction method based on the fusion of spatial heterogeneous decoupling and hierarchical distributed autoregressive meta-learning. The method addresses four shortcomings of existing forest carbon sink prediction methods: difficulty in characterizing ecological heterogeneity over large spatial extents, significant interference from the coupling of multiple driving factors, insufficient expression of time lag effects, and limited dynamic bias correction in ensemble models. It thereby improves the accuracy, stability and generalization ability of forest vegetation carbon density and carbon sink prediction.
[0012] Furthermore, this invention constructs a hierarchical distributed learning architecture of "multi-source feature decoupling modeling - spatial homogeneity partitioning - distributed temporal autoregressive prediction - meta-learning fusion and system bias correction" to achieve a structured expression of the driving mechanism of complex ecosystems and progressive error convergence control.
[0013] To achieve the above objectives, the present invention provides the following technical solution. In one embodiment of the present invention, a forest carbon sink prediction method based on the fusion of spatial heterogeneous decoupling and hierarchical distributed autoregressive meta-learning is provided, comprising the following steps:
S1. Construction and unified representation of multi-source heterogeneous features: Acquire multi-source ecological attribute data of the target forest area; clean, align and standardize data from different sources and at different temporal and spatial scales; and construct multi-dimensional spatial feature attribute data including ground structure features, remote sensing temporal features, meteorological driving features, topographic and geological features, and soil ecological features.
S2. Decoupling modeling of category drivers: Independent nonlinear prediction sub-models are established for the different categories of features, so that different ecological driving factors are mapped and modeled in mutually isolated modeling channels, and the intermediate representation vectors or stage prediction results of the category features are output. The parameters of each category's prediction sub-model are independent and not shared.
S3. Spatial homogeneity partitioning based on the multidimensional feature space: Using the multidimensional feature space constructed in step S1 and the category features output in step S2 as joint inputs, an unsupervised clustering method spatially divides the target forest area, generating multiple ecologically homogeneous clusters with relatively consistent ecological attributes.
S4. Distributed deep autoregressive enhanced prediction: For each of the aforementioned ecologically homogeneous clusters, an independent nonlinear autoregressive prediction model is constructed, wherein:
the intermediate representation results output by the category feature decoupling modeling are used as exogenous input variables of the autoregressive prediction model;
the historical forest vegetation carbon density sequence is used as the input for the autoregressive term;
the parameters of the autoregressive prediction models corresponding to each ecologically homogeneous cluster are independent and not shared.
By modeling each ecologically homogeneous cluster separately, the initial forest vegetation carbon density prediction value of each cluster is obtained.
S5. Cross-cluster meta-learning fusion and systematic bias correction: The initial forest vegetation carbon density prediction values of each ecologically homogeneous cluster are used as the output of the first-level learner, and key original ecological covariate features are reintroduced to construct a meta-learning fusion model. The meta-learning fusion model performs nonlinear mapping on the joint input of the initial predicted values and the key original ecological covariate features, realizing dynamic weighted fusion and systematic bias correction of the distributed prediction results, and yields the final forest vegetation carbon density prediction result.
S6. Calculation of annual carbon sink: Based on the final forest vegetation carbon density (tC·ha⁻¹) prediction results, forest vegetation carbon storage (tC) is calculated from forest area, and the annual carbon sink of forest vegetation (tC·ha⁻¹·a⁻¹) is calculated using the continuous period difference method.
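Step S6 can be illustrated with a short Python sketch. It assumes that the "continuous period difference method" means dividing the change in carbon storage between two survey years by the number of years elapsed, which is the usual stock-difference reading; the source does not spell the formula out. The function names and all numbers are hypothetical.

```python
def carbon_storage(density_tc_per_ha, area_ha):
    """Carbon storage (tC) = carbon density (tC/ha) x forest area (ha)."""
    return density_tc_per_ha * area_ha


def annual_carbon_sink(storage_start, storage_end, year_start, year_end):
    """Mean annual change in carbon storage between two years (tC per year).

    Assumption: the difference method is the storage change divided by
    the length of the period in years.
    """
    if year_end <= year_start:
        raise ValueError("year_end must be later than year_start")
    return (storage_end - storage_start) / (year_end - year_start)


# Illustrative values only, not taken from the source.
area_ha = 1000.0
storage_2020 = carbon_storage(50.0, area_ha)   # predicted density in 2020
storage_2025 = carbon_storage(52.5, area_ha)   # predicted density in 2025

sink_total = annual_carbon_sink(storage_2020, storage_2025, 2020, 2025)
sink_per_ha = sink_total / area_ha             # tC per ha per year
print(sink_total, sink_per_ha)
```

Dividing the total sink by area gives the per-hectare rate in the unit quoted in S6; a positive value indicates net uptake over the period.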
[0014] Furthermore, the ground survey data includes stand origin, dominant tree species, canopy closure, tree diameter at breast height (cm), stand age (a), and stand density.
[0015] Preferably, the remote sensing time series features include time series data of NDVI, NPP and LAI, and the area under the time series curve, maximum value and growth rate are extracted as input features.
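As a minimal sketch of this feature extraction, the function below computes the area under the curve, the maximum, and a growth rate for one time series. The trapezoidal rule for the area and the endpoint slope for the growth rate are assumptions; the source names the features but not their formulas. The function name and sample values are hypothetical.

```python
def temporal_features(times, values):
    """Summarise one index time series (for example NDVI) as scalar features."""
    if len(times) != len(values) or len(values) < 2:
        raise ValueError("need two or more aligned observations")
    # Area under the curve by the trapezoidal rule (assumed definition).
    auc = sum(
        (times[i + 1] - times[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(values) - 1)
    )
    peak_index = max(range(len(values)), key=values.__getitem__)
    # Growth rate as the slope between first and last observation (assumed).
    growth_rate = (values[-1] - values[0]) / (times[-1] - times[0])
    return {
        "auc": auc,
        "max": values[peak_index],
        "peak_time": times[peak_index],
        "growth_rate": growth_rate,
    }


# Illustrative series only.
features = temporal_features([0, 1, 2, 3], [0.2, 0.4, 0.8, 0.6])
print(features)
```

A least-squares slope would be a more noise-tolerant growth rate than the endpoint slope used here; either choice yields one scalar per series.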
[0016] Furthermore, the meteorological driving characteristics include monthly and annual temperature (°C) and precipitation (mm), consecutive dry days (a), consecutive wet days (a), growing season length (a), relative humidity (%) and potential evapotranspiration (mm), from which lagged climate accumulation characteristics over multiple preceding years are constructed.
[0017] Optionally, the geological features include elevation, aspect, and slope, and a composite topographic index is constructed.
[0018] Furthermore, the soil characteristics include soil density (g / cm³), soil porosity (mm), cation exchange capacity (cmol+ / kg), soil texture (%), soil pH, soil moisture (%), soil organic carbon content (g / kg), soil total nitrogen content (g / kg), soil available nitrogen (mg / kg), soil total phosphorus (g / kg), soil available phosphorus (mg / kg), soil total potassium (g / kg), and soil available potassium (mg / kg).
[0019] Preferably, in step S2: the ground survey data are modeled using regression models such as generalized additive models or machine learning algorithms such as random forests; remote sensing temporal features are modeled using deep learning algorithms such as long short-term memory networks or one-dimensional convolutional neural networks; meteorological driving features are modeled using machine learning algorithms such as gradient boosting tree models (LightGBM / XGBoost); geological features are modeled using geographically weighted regression or neural network embedding layers; and soil characteristics are modeled using regression models such as partial least squares regression or machine learning algorithms such as support vector regression.
[0020] Furthermore, in step S3, a self-organizing map neural network is used for unsupervised clustering to reduce the non-stationarity of the modeled geospatial environment.
[0021] Preferably, the autoregressive deep model in step S4 is a nonlinear autoregressive model with exogenous input variables (NARX), and the learning rate and time memory step size are adaptively adjusted for different ecological homogeneous clusters.
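The input structure of such a model can be sketched as follows, assuming the common NARX form y(t) = f(y(t-1), ..., y(t-p), u(t-1), ..., u(t-q)), where y is the historical carbon density series and u holds the exogenous vectors from the decoupled sub-models. The sketch only builds the lagged training samples; the nonlinear function f and the per-cluster choice of p and q are left open. All names and values are hypothetical.

```python
def build_narx_samples(y, exogenous, p, q):
    """Build (inputs, targets) for a NARX model.

    y          -- carbon density series, one value per time step
    exogenous  -- list of exogenous feature vectors, one per time step
    p, q       -- memory steps for the autoregressive and exogenous terms
    """
    if len(y) != len(exogenous):
        raise ValueError("y and exogenous must have the same length")
    start = max(p, q)
    inputs, targets = [], []
    for t in range(start, len(y)):
        row = [y[t - k] for k in range(1, p + 1)]     # y(t-1) ... y(t-p)
        for k in range(1, q + 1):                     # u(t-1) ... u(t-q)
            row.extend(exogenous[t - k])
        inputs.append(row)
        targets.append(y[t])
    return inputs, targets


# Illustrative series for one cluster with memory steps p=2, q=1.
density = [10.0, 11.0, 12.0, 13.0, 14.0]
drivers = [[1.0], [2.0], [3.0], [4.0], [5.0]]
narx_inputs, narx_targets = build_narx_samples(density, drivers, p=2, q=1)
print(narx_inputs, narx_targets)
```

Calling the function with a different `p` for each ecologically homogeneous cluster is one simple way to give each cluster its own memory length.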
[0022] Furthermore, the meta-learning model in step S5 is a LightGBM, XGBoost or Lasso regression model. While receiving the prediction results of each ecologically homogeneous cluster, the meta-learning model takes the key original covariate features as additional input, and the SHAP interpretation framework is then used to perform contribution attribution analysis on the final forest vegetation carbon density prediction, quantifying the marginal contribution of each input variable to the predicted value.
[0023] In one possible implementation, a forest carbon sink prediction system based on the fusion of spatial heterogeneous decoupling and hierarchical distributed autoregressive meta-learning is provided, comprising:
a data construction and unified mapping module, used to obtain multi-source ecological attribute data of the target forest area, and to clean, align and standardize data from different sources and at different spatial and temporal scales to construct multi-dimensional feature spatial attribute data;
a category-driven factor decoupling modeling module, used to build independent nonlinear prediction sub-models for different categories of features and to output intermediate representation vectors or stage prediction results of category features in mutually isolated modeling channels, wherein the parameters of each category's prediction sub-model are independent and not shared;
a spatial homogeneity partitioning module, used to perform unsupervised clustering based on the multidimensional feature space and the category feature representation results to generate multiple ecologically homogeneous clusters with relatively consistent ecological attributes;
a distributed autoregressive prediction module, used to construct an independent nonlinear autoregressive prediction model for each ecologically homogeneous cluster and to output the initial forest vegetation carbon density prediction value for each cluster, wherein the intermediate representation results output by the category feature decoupling modeling module are used as exogenous input variables of the autoregressive prediction model, the historical forest vegetation carbon density sequence is used as the input for the autoregressive term, and the parameters of the autoregressive prediction models corresponding to each cluster are independent and not shared;
a meta-learning fusion and bias correction module, used to receive the initial forest vegetation carbon density prediction values from each ecologically homogeneous cluster, to reintroduce the key original ecological covariate features to construct a meta-learning fusion model, and to apply nonlinear weighted fusion and systematic bias correction to the distributed prediction results to output the final forest vegetation carbon density prediction result;
a carbon sink calculation module, used to calculate forest vegetation carbon storage from the forest vegetation carbon density prediction results and forest area, and to calculate the annual forest carbon sink using the continuous period difference method.
The category-driven factor decoupling modeling module and the distributed autoregressive prediction module constitute the first-layer learning structure, and the meta-learning fusion and bias correction module constitutes the second-layer learning structure. The two-layer learning structure forms a progressive error convergence system that substantially improves the performance of the multi-level architecture.
[0024] The forest carbon sink prediction method based on spatial heterogeneous decoupling and distributed deep autoregressive meta-learning of the present invention achieves refined modeling and high-precision prediction of forest vegetation carbon density by constructing a multi-level prediction architecture of "multi-source feature decoupling - ecological cluster classification - distributed autoregressive modeling - meta-learning nonlinear fusion".
[0025] Specifically, this invention first establishes a comprehensive feature system that encompasses multi-dimensional information such as biophysical driving forces, spectral productivity, climate stress, topographic constraints, and soil nutrient supply by fusing multi-source heterogeneous data. This overcomes the problem of insufficient information expression caused by the dependence of traditional models on a single data source and improves the efficiency of basic data utilization.
[0026] Secondly, by decoupling and modeling different categories of features, a nonlinear mapping relationship between them and forest carbon density is established, avoiding the feature weight imbalance problem caused by multivariate coupling interference in traditional single-unit models, and improving the model's ability to express the driving mechanism of complex ecosystems from a structural level.
[0027] Furthermore, by introducing a self-organizing map neural network for spatial physical similarity classification, large-scale forest areas are divided into several ecologically homogeneous clusters, which effectively reduces the model generalization bias caused by geospatial nonstationarity, allowing each local model to learn in a relatively stable ecological environment, thereby improving the prediction accuracy and robustness of local areas.
[0028] Based on this, the present invention constructs an independent nonlinear autoregressive deep model for each ecological homogeneous cluster, and adaptively adjusts the model memory step size and learning rate parameters for different ecological characteristics, so as to achieve differentiated characterization of the "time lag effect" of different ecosystems and avoid the time-series interference problem brought about by the traditional global weight sharing mechanism.
[0029] Furthermore, by constructing a meta-learning fusion layer with a rescanning mechanism, key original covariate features are reintroduced while receiving the prediction results of each distributed sub-model, thereby achieving nonlinear reweighting and dynamic bias correction of the sub-model outputs, which significantly enhances the robustness and generalization ability of the ensemble model.
[0030] After predicting the carbon density of forest vegetation, this invention further combines forest area data to calculate the carbon storage of forest vegetation, and calculates the carbon sink of forest through the difference method, realizing a complete calculation chain from carbon density to carbon storage and then to carbon sink, thereby improving the systematicness and engineering practical value of forest carbon sink estimation.
[0031] Meanwhile, by introducing the SHAP interpretation framework, the contribution of each input variable in different ecological classification regions is quantitatively analyzed, making the model results interpretable and providing a scientific basis at the physical mechanism level for forest management decisions, carbon sink management and policy formulation.
[0032] Compared with traditional global single-unit models or end-to-end deep learning models, this invention addresses the difficulty of handling spatial non-stationarity, insufficient local accuracy, weak ensemble robustness, and poor interpretability in existing technologies through the collaborative design of explicit spatial heterogeneity processing, isolation of distributed expert models, and a meta-learning bias correction mechanism. It thereby improves the accuracy, stability, and generalization ability of forest carbon density and carbon sink prediction.
[0033] Therefore, this invention has higher prediction accuracy, stronger model robustness and better interpretability in national-scale forest carbon sink estimation, and can provide highly reliable data support for achieving the strategic goals of carbon peaking and carbon neutrality.
Attached Figure Description
[0034] Figure 1 is a schematic diagram of the overall process framework of the forest carbon sink prediction method based on the fusion of spatial heterogeneous decoupling and hierarchical distributed autoregressive meta-learning according to the present invention. Figure 2 is a schematic diagram of the multi-source heterogeneous feature decoupling modeling structure of the present invention. Figure 3 is a schematic diagram of the fusion structure of the distributed deep autoregressive model and meta-learning of the present invention.
Detailed Implementation
[0035] The technical solution of the present invention will be further described below with reference to the accompanying drawings. Those skilled in the art should understand that the following embodiments are for illustrative purposes only and are not intended to limit the scope of protection of the present invention. Equivalent substitutions or improvements to the technical solution without departing from the spirit of the present invention should fall within the scope of protection of the present invention.
[0036] I. Overall Implementation Structure Description (in conjunction with Figure 1)
In this embodiment, as shown in Figure 1, this invention provides a forest carbon sink prediction method based on spatial heterogeneous decoupling and distributed deep autoregressive meta-learning, which is constructed as a multi-level, modular, and distributed prediction architecture. This architecture is based on the fusion of multi-source heterogeneous data, uses spatial physical similarity classification as an intermediate transition layer, and employs a distributed deep autoregressive model as the core prediction unit. It achieves global fusion and bias correction through a meta-learning mechanism, ultimately calculating forest vegetation carbon density, carbon storage, and annual carbon sink.
[0037] As shown in Figure 1, the method of the present invention generally includes the following functional modules:
(i) Multi-source heterogeneous data acquisition and preprocessing module;
(ii) Feature construction and decoupling modeling module;
(iii) Ecological homogeneous cluster partitioning module based on self-organizing mapping;
(iv) Distributed deep autoregressive forest vegetation carbon density prediction module;
(v) Meta-learning fusion and dynamic bias correction module;
(vi) Module for calculating forest vegetation carbon storage and annual carbon sink;
(vii) Explainability attribution analysis module.
[0038] The modules described above are connected sequentially according to a preset data flow order, forming a complete prediction chain from bottom-level data-driven to high-level decision output.
[0039] Specifically, firstly, the multi-source heterogeneous data acquisition and preprocessing module performs spatial registration, time scale unification, outlier removal, and standardization on ground survey data, remote sensing time series data, meteorological driving data, geological and topographic data, and soil attribute data to construct a unified data input matrix.
[0040] Subsequently, the feature construction and decoupling modeling module performs hierarchical processing on different categories of features, structurally separating and modeling biophysical driving factors, spectral productivity factors, climate and environmental factors, topographic constraint factors, and soil nutrient factors to generate corresponding intermediate prediction outputs or embedded feature representations.
[0041] Based on this, an ecological homogeneous clustering module based on self-organizing mapping is used to perform unsupervised clustering of forest samples across the entire region. The study area is divided into several ecological homogeneous regions according to physical similarity, thereby reducing the impact of geospatial nonstationarity on modeling stability.
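The clustering step can be illustrated with a deliberately small self-organizing map written in plain Python. This is a sketch under several assumptions not stated in the source: a one-dimensional chain of nodes, linear initialisation along the data bounding box, a Gaussian neighbourhood, and linearly decaying learning rate and radius. A practical implementation would use a two-dimensional grid and a tuned schedule. The sample vectors are hypothetical standardized features.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Index of the node whose weight vector is closest to the sample."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], sample)),
    )


def train_som(samples, n_nodes, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a one-dimensional SOM (a chain of n_nodes units, n_nodes >= 2)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    lo = [min(s[d] for s in samples) for d in range(dim)]
    hi = [max(s[d] for s in samples) for d in range(dim)]
    # Linear initialisation: spread nodes along the bounding-box diagonal.
    weights = [
        [lo[d] + (hi[d] - lo[d]) * j / (n_nodes - 1) for d in range(dim)]
        for j in range(n_nodes)
    ]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 1e-3)
        for sample in rng.sample(samples, len(samples)):
            bmu = best_matching_unit(weights, sample)
            for j in range(n_nodes):
                # Gaussian neighbourhood on the node chain.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (sample[d] - weights[j][d])
    return weights


# Two well-separated groups of illustrative feature vectors.
plots = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
         [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
som = train_som(plots, n_nodes=2)
cluster_ids = [best_matching_unit(som, p) for p in plots]
print(cluster_ids)
```

Each sample's best matching unit serves as its cluster label, so the number of nodes bounds the number of ecologically homogeneous clusters.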
[0042] Furthermore, the distributed deep autoregressive prediction module constructs an independent nonlinear autoregressive model for each ecologically homogeneous cluster, and performs time-series prediction of forest vegetation carbon density within each cluster, thereby achieving independent learning and parameter isolation of local expert models.
[0043] After obtaining the prediction results of each ecological cluster, the meta-learning fusion and dynamic bias correction module takes the prediction results of the distributed sub-model as new input features and combines them with key original covariates. Through the meta-learner, it performs nonlinear fusion to achieve reweighting and error correction of the sub-model output.
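The data flow into the meta-learner can be sketched as below. Each sample receives an initial prediction from the model of its own cluster, and that prediction is joined with selected original covariates to form the meta-learner's input row. The per-cluster models here are placeholder lambdas and the covariate names are hypothetical; in the described method the first-level models are the per-cluster autoregressive models and the meta-learner is a regressor such as LightGBM, XGBoost or Lasso fitted on these rows.

```python
def first_level_predictions(samples, cluster_ids, cluster_models):
    """Predict each sample with the model that belongs to its cluster."""
    return [cluster_models[cid](s) for s, cid in zip(samples, cluster_ids)]


def build_meta_features(initial_predictions, samples, key_covariates):
    """One meta-learner input row per sample: [prediction, covariates...]."""
    return [
        [pred] + [s[name] for name in key_covariates]
        for pred, s in zip(initial_predictions, samples)
    ]


# Placeholder per-cluster models (hypothetical stand-ins).
cluster_models = {
    0: lambda s: 40.0 + 0.5 * s["stand_age"],
    1: lambda s: 20.0 + 0.8 * s["stand_age"],
}
samples = [
    {"stand_age": 30.0, "elevation": 800.0, "annual_precip": 1200.0},
    {"stand_age": 15.0, "elevation": 2100.0, "annual_precip": 600.0},
]
cluster_ids = [0, 1]

initial = first_level_predictions(samples, cluster_ids, cluster_models)
meta_rows = build_meta_features(initial, samples, ["elevation", "annual_precip"])
print(meta_rows)
```

Because the covariates re-enter alongside the first-level prediction, the meta-learner can learn corrections that depend on environmental conditions rather than a single global offset.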
[0044] Finally, the forest vegetation carbon storage and annual carbon sink calculation module calculates forest vegetation carbon storage based on the predicted forest vegetation carbon density results and forest area data, and calculates the annual forest carbon sink through the difference method, forming a complete carbon sink estimation result output.
[0045] Meanwhile, the interpretability attribution analysis module performs variable contribution analysis on the final prediction results, quantifies the marginal impact of each driving factor on changes in forest vegetation carbon density in different ecological classification regions, and thus provides an interpretable scientific basis for forest management and policy formulation.
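The attribution idea behind SHAP can be shown with an exact Shapley value computation on a toy model. This sketch does not use the SHAP library. It enumerates all feature subsets, which is only feasible for a handful of features, and it replaces absent features with a single baseline vector, which is one of several possible conventions. The toy model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(present):
        # Features outside `present` are set to their baseline value.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {i}) - value(present))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical predictor with one interaction term."""
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split equally between the two features involved.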
[0046] As can be seen, this invention achieves full-process collaborative optimization from multi-source data input, spatial heterogeneity processing, distributed modeling to meta-learning fusion through modular and hierarchical design, and constructs an overall architecture for forest carbon sink prediction that can explicitly handle spatial non-stationarity, support temporal lag modeling and has interpretability.
[0047] II. Multi-source heterogeneous data acquisition and preprocessing (corresponding to the front end of Figure 1)
In this embodiment, as shown in Figure 1, the multi-source heterogeneous data acquisition and preprocessing module is located at the front end of the overall prediction process. Its main function is to build a unified, standardized, and computable input data system, providing a high-quality data foundation for subsequent feature decoupling modeling and distributed prediction.
[0048] (i) Multi-source data acquisition
The multi-source heterogeneous data in this embodiment include ground survey data, remote sensing time series data, meteorological driving data, geological and topographic data, and soil property data.
[0049] 1. Ground survey data
These data are derived from national or regional forest resource inventories and sample plot surveys, and mainly include ground structure characteristics such as stand origin, dominant tree species, canopy closure, tree diameter at breast height, stand age, and stand density.
[0050] 2. Remote sensing time series data
These data are derived from satellite remote sensing imagery products, including but not limited to Landsat series data and MODIS products. Time-series data such as the Normalized Difference Vegetation Index (NDVI), Net Primary Productivity (NPP), and Leaf Area Index (LAI) are obtained through image processing.
[0051] 3. Meteorological driving data
These data are derived from meteorological station observations or reanalysis data products, and mainly include meteorological driving characteristics such as monthly and annual temperature and precipitation, consecutive dry days, consecutive wet days, growing season length, relative humidity, and potential evapotranspiration. They can also be used to construct lagged climate accumulation characteristics over multiple preceding years.
[0052] 4. Geological and topographic data
These data are derived from Digital Elevation Model (DEM) data, and mainly include geological and topographic features such as elevation, aspect and slope. They can further be used to construct composite topographic indices.
[0053] 5. Soil property data
These data are derived from soil databases or soil surveys, and mainly include soil characteristics such as soil density, soil porosity, cation exchange capacity, soil texture, soil pH, soil moisture, soil organic carbon content, soil total nitrogen content, soil available nitrogen, soil total phosphorus, soil available phosphorus, soil total potassium, and soil available potassium.
[0054] The aforementioned multi-source heterogeneous data are collected through a unified data interface, and a corresponding multi-source feature database is established.
[0055] (ii) Unification of spatial and temporal scales
Because the data sources have different spatial resolutions and time scales, they need to be processed uniformly.
[0056] 1. Spatial registration processing
All data are unified to the same geographic coordinate system and projected coordinate system, and resampled to a uniform spatial resolution.
[0057] For point plot data, spatial correspondence with remote sensing raster data is achieved through spatial interpolation or raster matching.
[0058] 2. Time scale alignment
Remote sensing data and meteorological data are unified to an annual scale.
[0059] For high temporal resolution data, calculate the annual average or the cumulative value of the growing season.
[0060] Through the above steps, a spatially and temporally consistent data matrix is constructed.
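The two alignment steps above can be sketched in a few lines. The raster matching assumes a north-up grid with square cells whose origin is the upper-left corner, and the growing season is assumed to run from May to September; neither detail is given in the source. Function names, coordinates and monthly values are hypothetical.

```python
def point_to_cell(x, y, origin_x, origin_y, cell_size, n_rows, n_cols):
    """Map a plot coordinate to the (row, col) of the raster cell containing it.

    Assumes a north-up raster with its origin at the upper-left corner.
    Returns None when the point lies outside the raster.
    """
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None


def annual_mean(monthly_values):
    """Annual mean of twelve monthly values."""
    return sum(monthly_values) / len(monthly_values)


def growing_season_sum(monthly_values, season_months=(5, 6, 7, 8, 9)):
    """Cumulative value over the growing season (months are 1-based)."""
    return sum(monthly_values[m - 1] for m in season_months)


# Illustrative plot location on a 30 m grid.
cell = point_to_cell(500045, 3999940, 500000, 4000000, 30, 100, 100)

# Illustrative monthly precipitation in mm, January to December.
monthly_precip = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
mean_precip = annual_mean(monthly_precip)
season_precip = growing_season_sum(monthly_precip)
print(cell, mean_precip, season_precip)
```

Once each plot has a cell index and each driver has one value per year, the sources can be joined on cell and year to form the consistent data matrix.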
[0061] (iii) Data quality control
To ensure model stability, this embodiment further performs quality control on the raw data, including:
1. Handling missing values: Missing data can be filled using mean interpolation, K-nearest neighbor interpolation, or time series interpolation methods.
[0062] 2. Outlier detection and removal: Abnormal samples can be identified using box plots or the Z-score method.
[0063] 3. Consistency check: Perform a reasonableness check on the range of physical quantities between different data sources.
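Two of the quality control options listed above, mean filling and Z-score outlier detection, can be sketched as follows. The threshold of 2.5 standard deviations is an illustrative choice rather than a value from the source, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute Z-score exceeds the threshold."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return [False] * len(values)
    return [abs(v - centre) / spread > threshold for v in values]


# Illustrative canopy closure readings with one gap.
filled = fill_missing_with_mean([10, 12, None, 11, 13])

# Illustrative series with one implausible reading.
readings = [10, 11, 12, 11, 10, 12, 11, 10, 12, 100]
flags = zscore_outliers(readings, threshold=2.5)
print(filled, flags)
```

With only ten samples a single extreme value can never exceed a Z-score of 3, which is why a lower threshold is used in this example; box-plot fences are less sensitive to this effect.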
[0064] (iv) Feature construction and generation of derived variables
After data cleaning, further feature engineering is performed:
1. Remote sensing temporal feature extraction
The following are extracted from the NDVI, NPP, and LAI time series: area under the curve (AUC), maximum value, growth rate, and peak occurrence time.
2. Construction of climate lag characteristics
The cumulative or average values of climate factors over the past 1, 3, and 5 years are computed to form lagged variables.
[0065] 3. Construction of composite terrain indices
A composite topographic index is calculated from slope and aspect.
[0066] 4. Identification of soil limiting factors
Dominant soil nutrient factors are identified through correlation analysis or variance contribution analysis.
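The climate lag construction in item 2 above can be sketched as a trailing-window sum over an annual series. The sketch assumes that "the past N years" excludes the current year, which the source does not specify, and it marks years with insufficient history as missing. The function name and precipitation values are hypothetical.

```python
def lagged_accumulation(annual_values, window):
    """Sum of the `window` years preceding each year.

    Years without enough history are returned as None.
    """
    result = []
    for i in range(len(annual_values)):
        if i < window:
            result.append(None)
        else:
            result.append(sum(annual_values[i - window:i]))
    return result


# Illustrative annual precipitation in mm for six consecutive years.
precip = [800, 900, 700, 1000, 850, 950]
lag_features = {w: lagged_accumulation(precip, w) for w in (1, 3, 5)}
print(lag_features)
```

Dividing each sum by the window length gives the lagged average instead of the lagged accumulation.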
[0067] (v) Data standardization and input matrix construction
Finally, all continuous variables are normalized or standardized to ensure that features of different dimensions are comparable during model training.
[0068] After completing the above steps, a unified input feature matrix is formed, which serves as the input for the subsequent feature decoupling modeling module.
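The two scaling options mentioned above can be sketched for a single feature column. Z-score standardization and min-max normalization are shown as the usual readings of "standardized" and "normalized"; the source does not say which is applied to which variable. The sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score standardization: zero mean and unit (population) variance."""
    centre = mean(column)
    spread = pstdev(column)
    if spread == 0:
        return [0.0 for _ in column]
    return [(v - centre) / spread for v in column]


def min_max_normalize(column):
    """Min-max normalization to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(v - low) / (high - low) for v in column]


# Illustrative elevation values in km.
elevation = [2.0, 4.0, 6.0]
z_scores = standardize(elevation)
scaled = min_max_normalize(elevation)
print(z_scores, scaled)
```

In practice the mean, spread, minimum and maximum would be computed on the training data only and reused for prediction, so that the same transformation is applied throughout.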
[0069] Through the above-mentioned multi-source data acquisition, scale unification, quality control and feature construction process, this invention establishes a multi-dimensional, highly consistent and low-noise input data system, which provides a reliable data foundation for subsequent spatial heterogeneous decoupling modeling and distributed deep autoregressive prediction, while improving the stability and generalization ability of model training.
[0070] III. Multi-source feature decoupling modeling module (with reference to Figure 2) In this embodiment, as shown in Figure 2, the multi-source feature decoupling modeling module is located after data preprocessing. Its core objective is to structurally separate the different physical driving layers, model each one separately, and characterize the nonlinear relationship between each layer and forest vegetation carbon density. This avoids the weight interference caused by feeding directly coupled multi-source features into a single model, and improves the physical expressive power and stability of the model.
[0071] As shown in Figure 2, this module includes five independent sub-modeling units that run in parallel, each corresponding to a different category of driving feature layer: (i) biophysical driving layer modeling unit; (ii) spectral productivity driving layer modeling unit; (iii) environmental stress driving layer modeling unit; (iv) terrain constraint driving layer modeling unit; (v) soil nutrient driving layer modeling unit.
[0072] Each sub-modeling unit independently receives the feature variables of the corresponding category, outputs intermediate predicted values or embedded feature vectors, and passes them to the subsequent distributed prediction module.
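One way to picture the structural separation is a routing table that sends each feature only to its own driving layer. The sketch below is illustrative: the column names are hypothetical, and the sub-models that would consume each group are omitted.

```python
# Hypothetical column names grouped by the five driving layers.
FEATURE_LAYERS = {
    "biophysical": ["stand_age", "canopy_closure", "dbh", "stand_density"],
    "spectral_productivity": ["ndvi_auc", "npp_max", "lai_peak_time"],
    "environmental_stress": ["temp_mean", "precip_cum_3y", "pet"],
    "terrain": ["elevation", "slope", "aspect"],
    "soil_nutrient": ["soil_ph", "soil_organic_carbon", "total_nitrogen"],
}


def split_by_layer(sample):
    """Return one feature dictionary per driving layer for a single sample."""
    return {
        layer: {name: sample[name] for name in names if name in sample}
        for layer, names in FEATURE_LAYERS.items()
    }


sample = {"stand_age": 35, "slope": 12.0, "soil_ph": 5.8, "ndvi_auc": 2.1}
parts = split_by_layer(sample)
```

Because each sub-model only ever sees its own group, no weight in one layer can be fitted against features from another layer.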
[0073] (I) Biophysical driving layer modeling In this embodiment, the biophysical driving layer is mainly based on ground survey data to build the model. Input variables include stand age, dominant tree species, canopy closure, diameter at breast height (DBH), stand density, and stand origin.
[0074] Preferably, a generalized additive model (GAM) or a random forest model is used to model the above variables to construct a baseline mapping relationship between forest vegetation carbon density and stand structure factors.
[0075] This sub-model focuses on characterizing the nonlinear trend between forest growth and carbon sequestration, and is used to provide a structural baseline estimate of forest vegetation carbon density.
[0076] (II) Modeling of the Spectral Productivity-Driven Layer The spectral productivity-driven layer primarily processes remote sensing temporal features, including time-series derived variables of NDVI, NPP, and LAI.
[0077] In this embodiment, a Long Short-Term Memory (LSTM) network or a one-dimensional convolutional neural network (1D-CNN) is preferably used to model the temporal features in order to extract the dynamic change patterns over time.
[0078] In some implementations, statistical features (such as area under the time series curve, extreme values, growth rate, etc.) can be extracted first, and then input into the gradient boosting tree model for modeling.
[0079] This sub-model is used to characterize the dynamic response relationship between changes in vegetation productivity and carbon sequestration.
[0080] (III) Modeling of the Environmental Stress-Driven Layer The environmental stress driving layer mainly deals with climate variables, including temperature, precipitation, relative humidity, growing season length, and potential evapotranspiration.
[0081] In this embodiment, a gradient boosting tree model (such as XGBoost / LightGBM) is preferably used for modeling, and lagged feature variables are introduced to characterize the delayed impact of climate factors on forest carbon sequestration.
[0082] Specifically, cumulative climate variables from the past 1, 3, and 5 years can be constructed as input features to enhance the model's ability to express long-term climate change trends.
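The lagged climate features could be built as in the following sketch. It assumes that each window covers the years strictly before the target year and that an incomplete window yields no value; both conventions, the function name, and the precipitation figures are assumptions for illustration.

```python
def lagged_cumulative(series, year, windows=(1, 3, 5)):
    """Cumulative climate value over the w years preceding `year`.

    series: dict mapping year -> annual value.
    Returns None for a window that is not fully covered by the data.
    """
    features = {}
    for w in windows:
        span = range(year - w, year)  # target year itself is excluded
        if all(y in series for y in span):
            features[f"cum_{w}y"] = sum(series[y] for y in span)
        else:
            features[f"cum_{w}y"] = None
    return features


# Hypothetical annual precipitation (mm).
precip = {2013: 600.0, 2014: 650.0, 2015: 700.0, 2016: 550.0, 2017: 500.0}
lags_2018 = lagged_cumulative(precip, 2018)
lags_2017 = lagged_cumulative(precip, 2017)
```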
[0083] (iv) Terrain-constrained driving layer modeling The terrain constraint driving layer is mainly based on terrain factors such as elevation, slope and aspect to build the model.
[0084] In this embodiment, a geographically weighted regression (GWR) model can be used to assign differentiated weights to different spatial locations, or an embedding layer can be constructed in the neural network structure to express spatial location features.
[0085] This sub-model is used to characterize the indirect effects of topography on the distribution of hydrothermal conditions and the forest growth environment.
[0086] (v) Modeling of the soil nutrient driving layer The soil nutrient driving layer mainly processes soil characteristics, including soil density, soil porosity, cation exchange capacity, soil texture, soil pH, soil moisture, soil organic carbon content, soil total nitrogen content, soil available nitrogen, soil total phosphorus, soil available phosphorus, soil total potassium, and soil available potassium.
[0087] In this embodiment, partial least squares regression or other regression models, or support vector regression or other machine learning algorithms are preferably used for modeling to reduce the impact of multicollinearity on model stability.
[0088] This sub-model focuses on identifying limiting soil factors and characterizing the regulatory effects of soil nutrient supply capacity and soil physicochemical properties on forest vegetation carbon density.
[0089] (vi) Collaborative mechanism of decoupling modeling As shown in Figure 2, the five sub-models mentioned above are not simply arranged side by side; rather, they structurally separate the driving layers. Each sub-model outputs either an intermediate predicted value of forest vegetation carbon density or a feature embedding vector representation.
[0090] These outputs serve as the input basis for subsequent distributed deep autoregressive models.
[0091] Through the above decoupling modeling structure, the present invention achieves: 1. Weight isolation between different physical driving layers; 2. Avoid gradient interference caused by strong coupling of multiple variables; 3. Improve the clarity of the expression of the contribution of each category of features; 4. Provide structured input for subsequent meta-learning fusion.
[0092] Therefore, this module solves the feature weight aliasing problem caused by traditional unified modeling of all variables from the model structure level, laying the core foundation for realizing spatial heterogeneous decoupling and distributed prediction.
[0093] IV. Self-organizing map ecological cluster division (with reference to the first part of Figure 3) In this embodiment, as shown in the first part of Figure 3, the self-organizing map ecological cluster division module is located after the multi-source feature decoupling modeling module. Its core function is to group forest samples across the entire region by physical similarity. Through unsupervised learning, it divides samples with similar ecological attributes into several ecologically homogeneous clusters, thereby reducing the impact of geospatial nonstationarity on subsequent prediction models.
[0094] (a) Input feature composition In this embodiment, the input vector of the self-organizing map (SOM) model includes: 1. Standardized multi-source feature vectors; 2. Intermediate predicted values or embedded features output by each decoupled sub-model; 3. Spatial location characteristics.
[0095] By integrating the above multidimensional features, the clustering results not only reflect spatial proximity, but also the similarity of ecological and physical driving mechanisms.
[0096] (II) Construction of SOM Model In this embodiment, a two-dimensional topological self-organizing mapping neural network is used, whose network structure includes an input layer and an output mapping layer (competition layer).
[0097] 1. The input layer dimension is consistent with the multi-source feature dimension; 2. The competition layer consists of several neuron nodes, forming a pre-defined topological grid structure; 3. Each neuron node corresponds to a weight vector.
[0098] During training, the ecological clusters are divided through the following steps: (1) Calculate the Euclidean distance between the input sample and the weight vector of each neuron; (2) Determine the best matching unit (BMU); (3) Update the weights of the BMU and its neighboring neurons; (4) Gradually converge to form a stable topological mapping structure.
[0099] Through the above iterative training, samples with similar ecological driving characteristics are clustered into adjacent nodes in the topological space, thereby forming several ecologically homogeneous clusters.
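The four training steps above can be sketched in a few lines. This is a deliberately small, pure-Python illustration rather than the embodiment's implementation: the linear decay schedules, the Gaussian neighborhood function, the map size, and the sample data are all assumptions.

```python
import math
import random


def bmu(weights, x):
    """Steps (1)-(2): grid position of the best matching unit for x."""
    return min(weights, key=lambda rc: math.dist(weights[rc], x))


def train_som(samples, rows, cols, iterations=500, lr0=0.5, sigma0=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighborhood
        x = rng.choice(samples)
        br, bc = bmu(weights, x)
        # Step (3): pull the BMU and its grid neighbors toward the sample.
        for (r, c), w in weights.items():
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for j in range(dim):
                w[j] += lr * h * (x[j] - w[j])
    return weights  # Step (4): the map after the schedule has decayed


def quantization_error(weights, samples):
    """Mean distance between each sample and the weight vector of its BMU."""
    return sum(math.dist(weights[bmu(weights, x)], x) for x in samples) / len(samples)


# Hypothetical standardized samples forming two loose groups.
samples = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.9], [0.85, 0.95]]
som = train_som(samples, rows=1, cols=2)
labels = [bmu(som, x) for x in samples]
qe = quantization_error(som, samples)
```

Each sample's cluster label is the grid position of its BMU, and `quantization_error` corresponds to the quantization error index used when comparing candidate map sizes.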
[0100] (III) Determining the number of ecological clusters In this embodiment, the number of ecological clusters can be determined based on the following indicators: 1. Quantization Error (QE); 2. Topological Error (TE); 3. Davies-Bouldin index or silhouette coefficient.
[0101] By comparing results obtained with different numbers of clusters, the number of clusters that offers better error indices and better ecological interpretability is selected.
[0102] (iv) Results of Ecological Homogeneous Cluster Classification After training, each sample is assigned a corresponding ecological cluster label. Different ecological clusters have the following physical characteristics: Similar microclimate background; Similar terrain constraints; Similar forest stand structure characteristics; Similar soil nutrient levels.
[0103] Through this classification mechanism, the present invention enables the decomposition of complex ecological non-stationary signals into multiple relatively stable local sub-regions within a large spatial scale.
[0104] (v) Technical Effects Description By introducing a self-organizing mapping ecological cluster partitioning mechanism, this invention achieves the following technical effects: 1. Explicitly address spatial heterogeneity issues to avoid bias caused by uniformly fitting different climate zones to a single global model; 2. Reduce sample entropy during model training to improve local modeling stability; 3. Provides a structured partitioning basis for subsequent distributed deep autoregressive models; 4. Improve the prediction accuracy and generalization ability of local ecological edge areas.
[0105] Therefore, this module plays a crucial role in the overall architecture, serving as a bridge between the previous and subsequent modules. On the one hand, it receives the decoupled modeling results, and on the other hand, it provides ecological classification basis for the distributed deep autoregressive prediction module, thereby improving the accuracy and robustness of forest vegetation carbon density prediction from a structural perspective.
[0106] V. Distributed Deep Autoregressive Model (Level-0 Layer) In this embodiment, as shown in the middle part of Figure 3, the distributed deep autoregressive model constitutes the core prediction layer (Level-0 layer) of the overall prediction architecture. Based on the aforementioned self-organizing map ecological cluster division results, this layer constructs an independent nonlinear autoregressive deep model for each ecologically homogeneous cluster, realizing time-series prediction of forest vegetation carbon density in local areas.
[0107] (I) Overall Structure of Distributed Modeling As shown in Figure 3, the Level-0 layer adopts a "multi-expert parallel structure," which specifically includes: 1. Several ecologically homogeneous clusters; 2. An independent deep autoregressive model corresponding to each ecological cluster; 3. The predicted vegetation carbon density output by each model for the samples of its cluster.
[0108] Model parameters are not shared between different ecological clusters, thus achieving weight isolation and avoiding gradient interference between different ecological zones.
[0109] (II) Construction of NARX Model Structure In this embodiment, a nonlinear autoregressive model with exogenous inputs (NARX) is preferably used as the distributed prediction unit.
[0110] For the i-th ecologically homogeneous cluster, the prediction model can be expressed as: $y_i(t) = f\big(y_i(t-1), \ldots, y_i(t-k),\; x_i(t),\; x_i(t-1), \ldots, x_i(t-k)\big)$; where: $y_i(t)$ denotes the current forest vegetation carbon density; $y_i(t-1), \ldots, y_i(t-k)$ denote the forest vegetation carbon density at historical time steps; $x_i(t)$ denotes the current vector of exogenous driving variables; $x_i(t-1), \ldots, x_i(t-k)$ denote the historical exogenous variables; $k$ denotes the memory step size; and $f$ denotes a nonlinear mapping function implemented by a deep neural network.
[0111] Exogenous variables include: Intermediate features output during the decoupling modeling phase; Lag characterization data of climate factors; Key structural covariates.
[0112] By incorporating historical vegetation carbon density sequences and lagged climate factors, this model can characterize the time delay effect in forest carbon sequestration.
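To make the NARX input structure concrete, the sketch below builds the regressor rows that would be fed to the nonlinear mapping for one sample's yearly series. It only assembles inputs and targets; the deep network itself is omitted, and the assumption that carbon density and exogenous variables share the same memory step size k, as well as the sample values, are illustrative.

```python
def narx_design(y, x, k):
    """Build NARX regressors and targets from aligned yearly series.

    y: list of carbon densities; x: list of exogenous vectors, same length.
    The row for time t is [y(t-1..t-k), x(t), x(t-1..t-k)]; the target is y(t).
    """
    rows, targets = [], []
    for t in range(k, len(y)):
        past_y = [y[t - j] for j in range(1, k + 1)]
        exogenous = [v for j in range(0, k + 1) for v in x[t - j]]
        rows.append(past_y + exogenous)
        targets.append(y[t])
    return rows, targets


# Hypothetical series: carbon density (t C/ha) and one exogenous variable.
density = [40.0, 41.0, 42.5, 43.0, 44.0]
climate = [[1.0], [2.0], [3.0], [4.0], [5.0]]
rows, targets = narx_design(density, climate, k=2)
```

With k = 2 the first two years serve only as history, so five years of data yield three training rows.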
[0113] (III) Adaptive mechanism for differentiated parameters To enhance the model's adaptability to different ecological types, this embodiment adaptively configures model parameters for each ecologically homogeneous cluster, including: 1. Differentiated settings for the memory step size k; 2. Dynamic adjustment of the number of neurons in the hidden layer; 3. Cluster-specific optimization of the learning rate. Differentiated configuration of model parameters:
[0114] For different homogeneous ecological clusters, the parameters of the autoregressive prediction model are configured differently, including memory step size, learning rate, hidden layer structure, training batch size, and number of iterations.
[0115] Since different ecological homogeneous clusters differ in dominant driving factors, time lag response characteristics of carbon sink changes, and sample distribution structure, their corresponding models should adopt different parameter settings to more accurately characterize the dynamic response patterns of various ecosystems.
[0116] For example, for ecologically homogeneous clusters where carbon sink changes are strongly influenced by historical cumulative effects and have significant time lag characteristics, a longer memory step can be set; for ecologically homogeneous clusters with weaker historical dependence and faster response processes, a shorter historical dependence window can be used.
[0117] Meanwhile, the training batch size and number of iterations are set differently according to the data scale and complexity of different homogeneous ecological clusters in order to improve the model training stability and convergence efficiency.
[0118] By using this parameter differentiation configuration mechanism based on ecological homogeneous clusters, each cluster model can better fit the response characteristics of the corresponding ecological type, thereby improving the accuracy and generalization ability of forest carbon sink prediction.
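A per-cluster configuration of this kind might be organized as in the following sketch. The cluster names, the fallback configuration, and every numeric value are hypothetical; they are chosen only to show a longer memory step for a slowly responding cluster and a shorter one for a quickly responding cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    memory_steps: int      # memory step size k, in years
    hidden_units: int      # neurons in the hidden layer
    learning_rate: float
    batch_size: int
    epochs: int            # number of training iterations


# Hypothetical clusters and values, for illustration only.
CLUSTER_CONFIGS = {
    "slow_response": ClusterConfig(5, 128, 0.001, 64, 300),
    "fast_response": ClusterConfig(3, 32, 0.01, 32, 150),
}
DEFAULT_CONFIG = ClusterConfig(4, 64, 0.005, 32, 200)


def config_for(cluster_id):
    """Look up a cluster's settings, falling back to the default."""
    return CLUSTER_CONFIGS.get(cluster_id, DEFAULT_CONFIG)
```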
[0119] (iv) Training and Prediction Process For each ecologically homogeneous cluster, the training process includes: 1. Construct a time series dataset of samples within the cluster; 2. Divide the dataset into training and validation sets; 3. Parameter optimization based on the backpropagation algorithm; 4. Use mean squared error or other loss functions to determine model convergence.
[0120] During the prediction phase, each distributed model outputs the predicted forest vegetation carbon density for the corresponding cluster samples.
[0121] (v) Analysis of technical effects By constructing a distributed deep autoregressive model, this invention achieves the following technical effects: 1. Explicitly characterize the time lag effect of forest vegetation carbon density; 2. Avoid the regional interference problem caused by weight sharing in traditional global models; 3. Improve the fitting accuracy of local ecological areas; 4. Enhance the model's generalization ability in large-scale spaces; 5. Provide diverse expert outputs for subsequent meta-learning integration.
[0122] Therefore, the Level-0 layer plays a core role in "local expert prediction" in the overall architecture, laying the foundation for the dynamic fusion of the meta-learning layer.
[0123] VI. Meta-learning Fusion Layer (Level-1 Layer) In this embodiment, as shown in the latter part of Figure 3, the meta-learning fusion layer constitutes the second layer (Level-1 layer) of the overall prediction architecture. Its main function is to perform nonlinear fusion and dynamic bias correction on the output results of the distributed deep autoregressive model (Level-0 layer), thereby improving the overall prediction accuracy and model robustness.
[0124] (I) Overall Structure of Meta-Learning Integration As shown in Figure 3, the Level-1 layer receives the following two types of input data: 1. Predicted forest vegetation carbon density output by the distributed NARX models of the ecologically homogeneous clusters; 2. Key original covariate features.
[0125] Key original covariates include, but are not limited to: forest stand age; NDVI or productivity-related indicators; dominant climate factors; and the main controlling soil or topographic variables.
[0126] By introducing the original key variables, this layer constructs a "re-scanning mechanism," which means re-examining the underlying feature information during the fusion stage, rather than relying solely on the output results of the sub-models.
[0127] (II) Implementation of the rescanning mechanism In traditional ensemble models, the fusion phase typically involves only a linear weighted average of the outputs of multiple sub-models, which cannot dynamically adjust the contribution of each sub-model according to the current ecological environment.
[0128] In this embodiment, by inputting the Level-0 layer output and the original key covariates into the meta-learner, the model can: 1. Identify the effectiveness of sub-models under different ecological conditions; 2. Automatically reduce the weight of failing sub-models; 3. Perform nonlinear correction on local deviations.
[0129] This rescanning mechanism essentially constructs a secondary feature learning process, enabling the remodeling of the output of the underlying model.
[0130] (III) Construction of Meta-learner Model In this embodiment, a gradient boosting tree model (such as XGBoost) or a Lasso regression model is preferably used as the meta-learner.
[0131] Its mapping relationship can be expressed as: $\hat{y} = g(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n, z)$; where: $\hat{y}$ represents the predicted vegetation carbon density after final fusion; $\hat{y}_i$ represents the output result of the i-th ecological cluster model ($i = 1, \ldots, n$, with $n$ the number of ecological clusters); $z$ represents the key original covariate vector; and $g$ represents the nonlinear mapping function implemented by the meta-learner.
[0132] This fusion model enables the dynamic integration of prediction results from multiple experts.
[0133] (iv) Training strategies During model training: 1. First, complete the training of the distributed model at Level-0 layer; 2. Fix the model parameters for each cluster and generate prediction results; 3. Combine the prediction results with key covariates to form a new feature matrix; 4. Train the meta-learner using real vegetation carbon density data; 5. Select the optimal parameters through cross-validation.
[0134] This hierarchical training mechanism avoids gradients being directly passed to the underlying models, thus maintaining the independence of each cluster of models.
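Steps 2, 3, and 5 of the training strategy can be illustrated by assembling the Level-1 feature matrix from frozen Level-0 experts and generating cross-validation splits. In this sketch the experts are stand-in callables rather than trained NARX models, the interleaved fold assignment is an assumption, and the meta-learner fit itself (for example a gradient boosting tree model) is left out.

```python
def level1_features(experts, sample_inputs, covariates):
    """Stack frozen Level-0 predictions with key original covariates.

    experts: list of callables, one per ecological cluster model. They are
    only evaluated here, never updated, so no gradient reaches Level-0.
    """
    matrix = []
    for inputs, cov in zip(sample_inputs, covariates):
        predictions = [expert(inputs) for expert in experts]
        matrix.append(predictions + list(cov))
    return matrix


def kfold_indices(n, k):
    """Return (train_indices, validation_indices) pairs for k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for fold in folds[:i] + folds[i + 1:] for j in fold], folds[i])
        for i in range(k)
    ]


# Hypothetical stand-ins for two trained cluster models.
experts = [lambda v: 2.0 * v[0], lambda v: v[0] + 1.0]
inputs = [[1.0], [2.0]]
key_covariates = [[30.0, 0.5], [45.0, 0.75]]  # e.g. stand age, NDVI peak

meta_matrix = level1_features(experts, inputs, key_covariates)
splits = kfold_indices(5, 2)
```

Each row of `meta_matrix` holds one prediction per expert followed by the covariates, which is the input layout implied by the mapping relationship of the meta-learner.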
[0135] (v) Technical Effects Description By constructing a meta-learning fusion layer, this invention achieves the following technical effects: 1. Improve the overall model's generalization ability; 2. Dynamically adjust the contribution weights of different ecological cluster models; 3. Effectively corrects local prediction biases; 4. Improve the stability of large-scale predictions; 5. Avoid information loss caused by simple average integration.
[0136] At the same time, the meta-learning layer forms a collaborative relationship with the aforementioned distributed structure: Level-0 is responsible for local expert predictions, while Level-1 is responsible for global integration and error control.
[0137] This leads to the construction of a two-layer prediction architecture of "local fine modeling + global dynamic fusion", which significantly improves the accuracy and robustness of forest vegetation carbon density prediction.
[0138] VII. Carbon Storage and Carbon Sequestration Calculation Module In this embodiment, the forest vegetation carbon storage and carbon sink calculation module is located at the end of the overall prediction process. It is used to convert the forest vegetation carbon density prediction results output by the meta-learning fusion layer into forest vegetation carbon storage and annual carbon sink indicators with practical ecological significance, thereby realizing the conversion from model prediction results to carbon management decision indicators.
[0139] As shown in Figure 1, this module includes a forest vegetation carbon storage calculation unit and a forest annual carbon sink calculation unit.
[0140] (a) Calculation of carbon storage in forest vegetation After the Level-1 layer fusion prediction is completed, the predicted forest vegetation carbon density per unit area is obtained, in t C·ha⁻¹.
[0141] In this embodiment, the formula for calculating forest vegetation carbon storage is: $S_t = C_t \times A_t$; where: $S_t$ denotes the forest vegetation carbon storage (t C) in year t; $C_t$ denotes the forest vegetation carbon density (t C·ha⁻¹) in year t; and $A_t$ denotes the forest area (ha) in year t.
[0142] Forest area data can be obtained from forest resource inventory data or remote sensing classification data, and spatially matched with carbon density prediction raster.
[0143] In the specific implementation process, the following steps can be adopted: 1. Convert the forest vegetation carbon density prediction results into raster data format; 2. Overlay a forest cover layer and extract the forest area; 3. Perform area-weighted summation on the grid cells within the forest area; 4. Calculate the forest vegetation carbon storage at the regional or national scale.
[0144] This method enables spatial aggregation calculation of forest carbon storage.
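The raster aggregation steps above can be sketched as follows. The example assumes a constant cell area (a 1 km × 1 km cell is 100 ha), a binary forest mask, and `None` for cells without a prediction; a real workflow would typically use per-cell areas or forest cover fractions, and the grid values are illustrative.

```python
def carbon_storage(density, forest_mask, cell_area_ha):
    """Sum carbon density (t C/ha) times area (ha) over forest cells."""
    total = 0.0
    for density_row, mask_row in zip(density, forest_mask):
        for cell_density, is_forest in zip(density_row, mask_row):
            if is_forest and cell_density is not None:
                total += cell_density * cell_area_ha
    return total


# Hypothetical 2 x 2 carbon density raster and forest mask.
density_raster = [[50.0, 60.0], [40.0, None]]
forest_mask = [[1, 0], [1, 1]]
storage_t_c = carbon_storage(density_raster, forest_mask, cell_area_ha=100.0)
```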
[0145] (II) Calculation of forest carbon sink (carbon stock difference method) In this embodiment, the annual carbon sink of the forest is calculated using the carbon stock difference method.
[0146] The basic principle of the carbon stock difference method is to calculate the difference in carbon storage between two adjacent periods to obtain the annual carbon sink or source, expressed in carbon dioxide equivalent (t CO2e·ha⁻¹·a⁻¹).
[0147] The calculation formula is as follows: $\Delta C = \dfrac{C_{t_2} - C_{t_1}}{t_2 - t_1} \times \dfrac{44}{12}$; where: $\Delta C$ represents the average annual carbon sink or carbon source per unit area, expressed in t CO2e·ha⁻¹·a⁻¹; $C_{t_1}$ and $C_{t_2}$ represent the carbon storage per unit area of forest vegetation at times $t_1$ and $t_2$ respectively, expressed in t C·ha⁻¹; $t_1$ and $t_2$ represent different points in time, with the unit being years (a), and $t_2 > t_1$; the coefficient $44/12$ is the molecular weight conversion factor for converting carbon to carbon dioxide equivalent.
[0148] When $\Delta C > 0$, the forest ecosystem is a carbon sink; when $\Delta C < 0$, it is a carbon source.
[0149] The above methods enable standardized calculation of forest carbon sinks, providing a unified quantitative indicator for subsequent model evaluation and spatial analysis.
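The stock difference calculation is simple enough to check by hand. The sketch below applies it to two hypothetical pairs of carbon density values; the factor 44/12 is the ratio of the molar mass of CO2 to that of carbon, and the function name and inputs are illustrative.

```python
def annual_sink(c_t1, c_t2, t1, t2):
    """Average annual sink (t CO2e/ha/a) from two carbon stocks (t C/ha).

    A positive result indicates a carbon sink, a negative one a carbon source.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    # Multiply by 44 and divide by 12 to convert carbon to CO2 equivalent.
    return (c_t2 - c_t1) / (t2 - t1) * 44.0 / 12.0


sink = annual_sink(48.0, 51.0, 2017, 2018)      # stock increased
source = annual_sink(50.0, 47.0, 2016, 2018)    # stock decreased
```

Here a gain of 3 t C·ha⁻¹ in one year corresponds to a sink of 11 t CO2e·ha⁻¹·a⁻¹, while a loss of 3 t C·ha⁻¹ over two years corresponds to a source of 5.5 t CO2e·ha⁻¹·a⁻¹.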
[0150] (iii) Continuous time-series calculation mechanism In multi-year forecasting scenarios, this module supports year-by-year calculation of forest vegetation carbon storage and carbon sink, including: 1. Construct a multi-year vegetation carbon density prediction sequence; 2. Calculate vegetation carbon storage on an annual basis; 3. Calculate the carbon storage difference year by year; 4. Generate a time series curve of annual carbon sink changes.
[0151] This time-series analysis can further assess the changing trends and fluctuation characteristics of forest carbon sinks.
[0152] (iv) Description of technical effects By constructing a module for calculating vegetation carbon storage and annual carbon sink, this invention achieves the following: 1. Transform model predictions into carbon management indicators that can be used for policy analysis; 2. Establish a complete calculation chain from vegetation carbon density → carbon storage → carbon sink; 3. Support carbon sink assessments at the national or regional scale; 4. Enhance the engineering application value of model results in carbon neutrality strategic decision-making.
[0153] Meanwhile, due to the improved accuracy of vegetation carbon density prediction, the stability and reliability of the carbon sink calculation results are significantly enhanced, avoiding the problem of abnormal fluctuations in carbon sink estimation caused by error accumulation in traditional models.
[0154] VIII. Interpretability Attribution Analysis Module In this embodiment, the interpretability attribution analysis module is located at the output end of the overall prediction process. It is used to analyze the variable contributions behind the final forest vegetation carbon density predictions, thereby revealing the marginal impact of different driving factors on changes in forest vegetation carbon density in each ecologically homogeneous cluster, and improving the transparency of model results and their usability for decision-making.
[0155] As shown in the final part of Figure 3, this module is connected to the meta-learning fusion layer to interpret and analyze the output results of the Level-1 layer.
[0156] (I) Overall Structure of Attribution Analysis In this embodiment, the SHAP (Shapley Additive Explanations) method, based on game theory, is used to calculate the contribution of variables. This method quantifies the degree of influence of each input variable in the model decision-making process by calculating the marginal contribution of each input variable to the prediction result under different feature combinations.
[0157] The SHAP method represents the final predicted value as: $\hat{y} = \phi_0 + \sum_{i=1}^{M} \phi_i$; where: $\hat{y}$ is the final predicted carbon density; $\phi_0$ is the baseline prediction value; $\phi_i$ is the Shapley contribution value of the i-th input variable; and $M$ is the number of input variables.
[0158] Calculating $\phi_i$ makes it possible to determine the degree of positive or negative influence of each variable on the prediction result.
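The additive decomposition can be verified on a toy model by computing exact Shapley values through enumeration of all feature coalitions. This sketch is not how the SHAP method is applied to a large model in practice: it assumes that features outside a coalition are replaced by a single baseline vector, it scales exponentially with the number of features, and the toy model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # One additive term and one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
phi_0 = toy_model(baseline)
```

The additive feature receives its own term in full, while the interaction term is shared equally between the two features involved, and the baseline value plus all contributions reproduces the model output.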
[0159] (II) Analytical Objects and Variable Scope In this embodiment, the attribution analysis objects include: 1. Biophysical driving variables; 2. Spectra and productivity variables; 3. Climate driving variables; 4. Topographical features; 5. Soil nutrient variables; 6. Distributed model output features.
[0160] By calculating the contribution of the above variables, we can obtain the following: Univariate contribution values; Ranking of contributions of categorical variables; Differences in the contribution of variables within different ecological clusters.
[0161] (III) Differentiation Attribution of Ecological Clusters Since the aforementioned module has already divided the forest area into several ecologically homogeneous clusters, this module can further perform SHAP analysis on different ecological clusters separately.
[0162] The specific steps include: 1. Calculate the SHAP value separately for each ecological cluster sample; 2. Calculate the average contribution of each variable within the cluster; 3. Construct a comparison chart of the importance of variables among clusters.
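The per-cluster aggregation in steps 1 and 2 could look like the following sketch. Averaging absolute SHAP values is an assumption (it is a common importance measure, but the embodiment only speaks of an average contribution), and the SHAP rows, cluster labels, and feature names are hypothetical.

```python
from collections import defaultdict


def cluster_importance(shap_rows, cluster_labels, feature_names):
    """Mean absolute SHAP value of each feature within each cluster."""
    sums = defaultdict(lambda: [0.0] * len(feature_names))
    counts = defaultdict(int)
    for row, label in zip(shap_rows, cluster_labels):
        counts[label] += 1
        for j, contribution in enumerate(row):
            sums[label][j] += abs(contribution)
    return {
        label: {
            name: total / counts[label]
            for name, total in zip(feature_names, sums[label])
        }
        for label in sums
    }


# Hypothetical SHAP values for three samples in two clusters.
shap_rows = [[1.0, -2.0], [3.0, 0.0], [-0.5, 0.5]]
cluster_labels = [0, 0, 1]
importance = cluster_importance(shap_rows, cluster_labels, ["stand_age", "precip"])
```

The resulting dictionary has one entry per cluster and can be plotted directly as the comparison chart of variable importance among clusters.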
[0163] This method can reveal, for example, that: in arid regions, moisture-related climate variables may be the dominant factor; in high-nutrient areas, the contribution of soil variables decreases; and in areas with young forest stands, stand age has a significant impact.
[0164] This enables a quantitative analysis of the driving mechanisms of forest vegetation carbon density under different ecological backgrounds.
[0165] (iv) Visualization of results In this embodiment, the following analysis graphs can be generated: 1. Global variable importance ranking chart; 2. Bar chart showing the contribution of variables from each ecological cluster; 3. Univariate influence curve; 4. Spatial distribution contribution heatmap.
[0166] The above results can be used to support ecological management decisions and policy formulation.
[0167] (v) Technical Effects Description By constructing an interpretable attribution analysis module, this invention achieves the following: 1. Improve the transparency of deep models to avoid the "black box" problem; 2. Quantitatively reveal the mechanisms by which different driving factors affect forest vegetation carbon density; 3. Support the development of differentiated management strategies for different ecological regions; 4. Enhance the credibility of model results and their support for science policy formulation.
[0168] By combining the aforementioned spatial heterogeneous decoupling structure and distributed prediction mechanism, this invention not only achieves high-precision forest carbon sink prediction, but also provides an interpretable physical-driven analysis framework, thereby constructing a comprehensive model system that combines predictive performance and scientific explanatory capabilities.
[0169] IX. Specific Application Examples (Taking the Simulation and Prediction of Forest Vegetation Carbon Density in China in 2018 as an Example) To further illustrate the implementation process and technical effects of the method of the present invention, the following describes the specific application of the forest carbon sink prediction method of the present invention in conjunction with relevant forest vegetation data of China in 2018. This embodiment is only used to illustrate the technical solution of the present invention and is not intended to limit the scope of protection.
[0170] (I) Study Area and Data Sources This embodiment selects forest distribution areas within China as the research object, with a time scale of 2018.
[0171] The data used include: 1. Forest plot data: derived from the National Continuous Forest Resources Inventory, including attributes such as stand age, dominant tree species, canopy closure, stand density, and tree diameter at breast height (DBH).
[0172] 2. Remote sensing time series data: time series data of NDVI, NPP, and LAI from 2014 to 2018 were obtained using MODIS products, with a uniform spatial resolution of 30 m.
[0173] 3. Meteorological data: derived from the China regional surface meteorological element driving dataset and ERA5 reanalysis data, from which temperature, precipitation, relative humidity, and potential evapotranspiration for 2014 to 2018 were obtained.
[0174] 4. Topographic data: NASA SRTM digital elevation model (DEM) data were used to extract elevation, slope, and aspect information.
[0175] 5. Soil data: derived from a Chinese soil dataset used for land surface simulation, including indicators such as soil organic carbon, total nitrogen, available phosphorus, available potassium, and pH.
[0176] All data are uniformly projected to the same coordinate system and resampled to a spatial resolution of 1km.
[0177] (II) Data Preprocessing and Feature Construction 1. Perform spatial registration and temporal alignment; 2. Construct climate lag variables (climate accumulation from 2014 to 2017). 3. Extract time series features of NDVI, NPP, and LAI, including: Area under the curve; Peak value; Annual growth rate; 4. Construct a composite terrain index; 5. Standardize all continuous variables.
[0178] Finally, a comprehensive input matrix containing approximately 50-80 dimensions of features is constructed.
[0179] (III) Classification of Ecological Homogeneous Clusters A self-organizing map neural network was used to perform unsupervised clustering of forest samples from across the country.
[0180] 1. The network structure is set to a 10×10 two-dimensional topology; 2. The training iterations are 5000. 3. Select the optimal clustering result based on the quantization error and topological error indices.
[0181] Ultimately, the country's forest regions were divided into 12 ecologically homogeneous clusters, which correspond to seven forest distribution areas based on climate zones: the cold temperate humid coniferous forest region; the temperate semi-arid and semi-humid mixed coniferous and broad-leaved forest region; the warm-temperate semi-arid and semi-humid deciduous broad-leaved forest region; the northern subtropical humid evergreen broad-leaved forest region; the central subtropical humid evergreen broad-leaved forest region; the tropical and southern subtropical humid rainforest and monsoon forest region; and the Qinghai-Tibet Plateau alpine forest region.
[0182] (IV) Training of Distributed Deep Autoregressive Models A NARX model (nonlinear autoregressive model with exogenous inputs) is constructed for each ecological homogeneous cluster.
[0183] The model settings are as follows: Memory step size: 3-5 years (adaptively selected based on cluster characteristics); Number of neurons in the hidden layer: 32-128; Learning rate: 0.001–0.01; Loss function: mean squared error (MSE).
[0184] The model training uses 80% of the samples as the training set and 20% as the validation set.
[0185] After training, the predicted 2018 forest vegetation carbon density is output for each ecological cluster.
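As an illustrative sketch only, the inputs of a NARX model can be assembled by pairing each target year with the preceding carbon density values (the autoregressive term) and the preceding exogenous feature vectors. The function name `narx_regressors` and the choice to use only past exogenous values (rather than also the current year's) are assumptions; the network that maps these regressors to the target is not shown.

```python
def narx_regressors(carbon_density, exogenous, memory):
    """Build (input, target) pairs for y[t] = f(y[t-memory..t-1], x[t-memory..t-1]).

    carbon_density: list of yearly carbon density values for one sample.
    exogenous: list of yearly exogenous feature vectors, aligned with carbon_density.
    memory: number of past years fed to the model (the memory step size).
    """
    pairs = []
    for t in range(memory, len(carbon_density)):
        past_y = list(carbon_density[t - memory:t])
        # Flatten the exogenous vectors of the same past years into one list.
        past_x = [v for row in exogenous[t - memory:t] for v in row]
        pairs.append((past_y + past_x, carbon_density[t]))
    return pairs

# Hypothetical plot with one exogenous feature (NDVI peak) per year.
density = [40.0, 41.0, 42.5, 43.0, 44.2]
ndvi_peak = [[0.60], [0.62], [0.65], [0.63], [0.66]]
pairs = narx_regressors(density, ndvi_peak, memory=3)
```

A longer memory step leaves fewer training pairs per sample, which is one reason the step size might be chosen per cluster.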
[0186] (V) Meta-learning Integration An XGBoost meta-learner is constructed, taking as inputs the outputs of the 12 ecological cluster models together with key original variables (stand age, NDVI peak, annual average temperature, etc.).
[0187] The optimal parameters were determined using 5-fold cross-validation.
[0188] The final result was the fusion prediction of the carbon density of forest vegetation in China in 2018.
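The sketch below shows one way the meta-learner's input rows and the 5-fold cross-validation splits might be prepared. It assumes that every sample carries one prediction per cluster model, which the description does not state explicitly; `meta_features` and `k_fold_indices` are hypothetical helper names, and the XGBoost model itself is not shown.

```python
import random

def meta_features(cluster_predictions, covariates):
    """Concatenate first-level predictions with the reintroduced raw covariates.

    cluster_predictions: per sample, the list of cluster-model outputs.
    covariates: per sample, key original variables (e.g. stand age, NDVI peak).
    """
    return [list(p) + list(c) for p, c in zip(cluster_predictions, covariates)]

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs that partition range(n_samples) into k folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, valid

# Hypothetical sample: two first-level predictions, then stand age and NDVI peak.
rows = meta_features([[48.0, 51.0]], [[35.0, 0.82]])
```

Each candidate parameter setting would be scored on the held-out fold of every split, and the setting with the best average score retained.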
[0189] (VI) Calculation of Carbon Storage and Carbon Sink 1. Forest vegetation carbon storage is calculated from the 2018 forest vegetation carbon density prediction results; 2. The 2018 carbon sink is calculated as the difference between the 2018 and 2017 carbon storage.
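The arithmetic of this step can be sketched as follows, assuming carbon density in tC·ha⁻¹ and forest area in ha for each spatial unit. The function names and the numerical values are hypothetical and serve only to show the unit handling.

```python
def carbon_storage(density_tc_per_ha, area_ha):
    """Total carbon storage (tC): sum of density times area over all spatial units."""
    return sum(d * a for d, a in zip(density_tc_per_ha, area_ha))

def annual_carbon_sink(storage_end_tc, storage_start_tc, years=1.0):
    """Carbon sink (tC per year) as the storage difference over the period length."""
    return (storage_end_tc - storage_start_tc) / years

def sink_per_hectare(sink_tc_per_year, total_area_ha):
    """Carbon sink intensity (tC per ha per year)."""
    return sink_tc_per_year / total_area_ha

# Two hypothetical spatial units.
area = [100.0, 200.0]
storage_2018 = carbon_storage([50.0, 42.0], area)   # 13400.0 tC
storage_2017 = carbon_storage([49.0, 41.5], area)   # 13200.0 tC
sink_2018 = annual_carbon_sink(storage_2018, storage_2017)  # 200.0 tC per year
intensity = sink_per_hectare(sink_2018, sum(area))
```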
[0190] (VII) Verification of Prediction Results and Technical Effectiveness A comparative analysis with the traditional global random forest model shows that: 1. The coefficient of determination is significantly improved; 2. The root mean square error is significantly reduced; 3. Model training efficiency is improved; 4. The prediction accuracy in marginal ecological areas is significantly improved.
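For reference, the two accuracy indices named above can be computed as in the following sketch. The function names are chosen for this example, and the input values are hypothetical; they are not results of the described comparison.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical validation values (tC per ha).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
error = rmse(observed, predicted)       # 1.0
fit = r_squared(observed, predicted)    # approximately 0.2
```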
[0191] (VIII) Results of Interpretability Analysis SHAP analysis revealed that: in temperate forest regions, stand age and temperature-related climatic factors are the dominant factors; in subtropical forest regions, NDVI and precipitation are the dominant factors; in high-altitude forest areas, altitude and temperature are the dominant factors.
[0192] The above results are consistent with the expectations of ecological theory and verify the physical rationality of the model.
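To illustrate the attribution principle behind SHAP, the sketch below computes exact Shapley values for a small model by enumerating feature subsets, with absent features replaced by a single baseline value. This is not the SHAP library and not the procedure used in the example; the baseline convention, the function name `shapley_values` and the toy linear model are assumptions, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features in the subset take their real value, the others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Toy linear model over three hypothetical standardized inputs.
def toy_model(z):
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2]

contributions = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

For a linear model the contribution of each feature reduces to its coefficient times its offset from the baseline, and the contributions sum to the difference between the prediction and the baseline prediction.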
[0193] (IX) Conclusion of the Implementation Example As can be seen from this specific application example, when used to predict forest vegetation carbon density and carbon sink at the national scale, the method of the present invention: 1. Significantly improves prediction accuracy; 2. Effectively addresses the problem of spatial heterogeneity; 3. Improves model stability and generalization ability; 4. Offers high interpretability.
[0194] In summary, this invention achieves structured modeling and progressive error control of forest vegetation carbon density and carbon sink by constructing a hierarchical distributed learning architecture of "multi-source feature decoupling modeling—spatial homogeneity partitioning—distributed temporal autoregressive prediction—meta-learning dynamic fusion and bias correction." Compared to traditional globally unified modeling or simple ensemble models, this invention effectively reduces prediction bias caused by multi-source feature coupling interference and spatial non-stationarity by explicitly handling spatial heterogeneity, separating the influence of different ecological driving layers, constructing a distributed expert model, and introducing a rescanning meta-learning fusion mechanism. This enhances the model's ability to characterize temporal lag effects and its cross-regional generalization ability. Furthermore, by introducing an interpretable attribution analysis mechanism, the prediction results possess ecological mechanism-level explanatory power, improving the model's engineering application value in carbon sink management, ecological assessment, and carbon neutrality strategic decision-making. Therefore, this invention has high accuracy, stability, and practical application significance in forest carbon sink prediction at the national or regional scale.
Claims
1. A forest carbon sink prediction method based on the fusion of spatial heterogeneous decoupling and hierarchical distributed autoregressive meta-learning, characterized in that it includes the following steps:
S1. Construction and unified representation of multi-source heterogeneous features: acquire multi-source ecological attribute data of the target forest area; clean, align and standardize data from different sources and at different temporal and spatial scales; and construct multi-dimensional spatial feature attribute data including ground structure features, remote sensing temporal features, meteorological driving features, topographic and geological features and soil ecological features;
S2. Decoupling modeling of category drivers: independent nonlinear prediction sub-models are established for the different categories of features, so that different ecological driving factors are mapped and modeled in mutually isolated modeling channels, and the intermediate representation vectors or stage prediction results of the category features are output, wherein the parameters of each category's prediction sub-model are independent and not shared;
S3. Spatial homogeneity partitioning based on multidimensional feature space: using the multidimensional feature space constructed in step S1 and the category features output in step S2 as joint inputs, an unsupervised clustering method is used to spatially divide the target forest area, generating multiple ecologically homogeneous clusters with relatively consistent ecological attributes;
S4. Distributed deep autoregressive enhanced prediction: for each of the aforementioned ecological homogeneous clusters, an independent nonlinear autoregressive prediction model is constructed, wherein: the intermediate representation results output by the category feature decoupling modeling are used as exogenous input variables of the autoregressive prediction model; the historical forest vegetation carbon density sequence is used as the input of the autoregressive term; the parameters of the autoregressive prediction models corresponding to each ecological homogeneous cluster are independent and not shared; and, by modeling each ecological homogeneous cluster separately, the initial forest vegetation carbon density prediction value of each ecological homogeneous cluster is obtained;
S5. Cross-cluster meta-learning fusion and systematic bias correction: the initial forest vegetation carbon density prediction values of each ecological homogeneous cluster are used as the output of the first-level learner, while key original ecological covariate features are reintroduced to construct a meta-learning fusion model; the meta-learning fusion model performs nonlinear mapping modeling based on the joint input of the initial predicted values and the key original ecological covariate features, realizing dynamic weighted fusion and systematic bias correction of the distributed prediction results, and obtaining the final forest vegetation carbon density prediction result;
S6. Calculation of annual carbon sequestration: based on the final forest vegetation carbon density (tC·ha⁻¹) prediction result, forest vegetation carbon storage (tC) is calculated from forest area, and the annual carbon sink of forest vegetation (tC·ha⁻¹·a⁻¹) is calculated using the continuous period difference method.
2. The method according to claim 1, characterized in that the ground survey data includes stand origin, dominant tree species, canopy closure, tree diameter at breast height (cm), stand age (a), and stand density.
3. The method according to claim 1, characterized in that the remote sensing time series features include time series data of NDVI, NPP and LAI, and the area under the time series curve, the maximum value and the growth rate are extracted as input features.
4. The method according to claim 1, characterized in that the meteorological driving features include monthly and annual temperature (°C) and precipitation (mm), consecutive dry days (a), consecutive wet days (a), growing season length (a), relative humidity (%) and potential evapotranspiration (mm), and a lagged climate accumulation feature over multiple preceding years is constructed.
5. The method according to claim 1, characterized in that the soil characteristics include soil density (g / cm³), soil porosity (mm), cation exchange capacity (cmol+ / kg), soil texture (%), soil pH, soil moisture (%), soil organic carbon content (g / kg), soil total nitrogen content (g / kg), soil available nitrogen (mg / kg), soil total phosphorus (g / kg), soil available phosphorus (mg / kg), soil total potassium (g / kg), and soil available potassium (mg / kg).
6. The method according to claim 1, characterized in that, in step S2: ground survey data are modeled using regression models such as generalized additive models or machine learning algorithms such as random forests; remote sensing temporal features are modeled using deep learning algorithms such as long short-term memory networks or one-dimensional convolutional neural networks; meteorological driving features are modeled using machine learning algorithms such as gradient boosting tree models (LightGBM / XGBoost); geological features are modeled using geographically weighted regression or neural network embedding layer algorithms; and soil characteristics are modeled using regression models such as partial least squares regression or machine learning algorithms such as support vector regression.
7. The method according to claim 1, characterized in that, in step S3, a self-organizing map neural network is used for unsupervised clustering to reduce the non-stationarity of the modeled geospatial environment.
8. The method according to claim 1, characterized in that the autoregressive deep model in step S4 is a nonlinear autoregressive model with exogenous input variables, and it adaptively adjusts the learning rate and time memory step size for different ecological homogeneous clusters.
9. The method according to claim 1, characterized in that the meta-learning model in step S5 is selected from LightGBM, XGBoost, or Lasso regression models; while receiving the prediction results of each ecological homogeneous cluster, the meta-learning model re-inputs the key original covariate features and further uses the SHAP (Shapley Additive Explanations) interpretation framework to perform contribution attribution analysis on the final prediction result of forest vegetation carbon density, quantifying the marginal contribution of each input variable to the predicted value of forest vegetation carbon density.
10. A forest carbon sink prediction system based on the fusion of spatial heterogeneous decoupling and hierarchical distributed autoregressive meta-learning, characterized in that it includes:
a data construction and unified mapping module, used to obtain multi-source ecological attribute data of the target forest area, and to clean, align and standardize data from different sources and at different spatial and temporal scales to construct multi-dimensional feature spatial attribute data;
a category-driven factor decoupling modeling module, used to build independent nonlinear prediction sub-models for different categories of features, and to output intermediate representation vectors or stage prediction results of category features in mutually isolated modeling channels, wherein the parameters of each category's prediction sub-model are independent and not shared;
a spatial homogeneity partitioning module, used to perform unsupervised clustering based on the multidimensional feature space and the category feature representation results to generate multiple ecological homogeneous clusters with relatively consistent ecological attributes;
a distributed autoregressive prediction module, used to construct an independent nonlinear autoregressive prediction model for each ecologically homogeneous cluster and to output the initial forest vegetation carbon density prediction value for each ecological homogeneous cluster, wherein: the intermediate representation results output by the category feature decoupling modeling module are used as exogenous input variables of the autoregressive prediction model; the historical forest vegetation carbon density sequence is used as the input of the autoregressive term; and the parameters of the autoregressive prediction models corresponding to each ecological homogeneous cluster are independent and not shared;
a meta-learning fusion and bias correction module, used to receive the initial forest vegetation carbon density prediction values from each ecological homogeneous cluster, to reintroduce the key original ecological covariate features to construct a meta-learning fusion model, and to perform nonlinear weighted fusion and systematic bias correction on the distributed prediction results to output the final prediction result of forest vegetation carbon density;
a carbon sink calculation module, used to calculate forest vegetation carbon storage from the forest vegetation carbon density prediction results and forest area, and to calculate the annual forest carbon sink using the continuous period difference method;
wherein the category-driven factor decoupling modeling module and the distributed autoregressive prediction module constitute the first-layer learning structure, the meta-learning fusion and bias correction module constitutes the second-layer learning structure, and the two-layer learning structure forms a progressive error convergence system, achieving a leapfrog improvement in the performance of the multi-level architecture.