Fine-grained phenological parameter extraction method based on feature fusion network
By employing a deep learning method based on feature fusion networks and a ResFormer dual-branch structure, this method addresses the issues of complex operation and limited phenological period extraction in existing technologies. It achieves high-precision, simplified operation for fine-grained phenological parameter extraction, applicable to various vegetation types and phenological camera sites, thereby improving the efficiency of ecological research.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- NORTHEAST FORESTRY UNIV
- Filing Date
- 2025-11-06
- Publication Date
- 2026-06-18
AI Technical Summary
Existing methods for extracting vegetation phenological parameters are complex to operate and can only extract a small number of specific phenological periods, which limits a comprehensive understanding of plant growth processes, especially in response to climate change and ecological management needs.
A feature fusion network-based approach is adopted, which uses time-series images captured by a phenological camera to construct a training dataset. Phenological parameters are extracted through a deep learning model, and combined with the ResFormer dual-branch feature fusion network structure, up to seven different phenological periods can be identified, improving the extraction accuracy and richness.
It achieves high-precision and simplified operation of phenological parameter extraction, can identify multiple phenological stages, is applicable to different tree species and phenological camera sites, supports fine-grained monitoring, and improves the efficiency of ecological research.
Abstract
Description
A Fine-Grained Phenological Parameter Extraction Method Based on Feature Fusion Network
Technical Field
[0001] This invention provides a fine-grained phenological parameter extraction method based on feature fusion networks, belonging to the field of image processing technology.
Background Technology
[0002] Vegetation phenology refers to the seasonal physiological changes and phenomena that occur in plants during their growth cycle, mainly including important stages such as germination, flowering, fruiting, and leaf fall. These changes are influenced by climate, environmental, and ecological factors, reflecting the plant's response to climate change. By monitoring vegetation phenology, we can understand the health of ecosystems, predict agricultural production, assess the impacts of climate change, and provide a scientific basis for ecological restoration and resource management. With the development of remote sensing technology and data analysis methods, the monitoring and research of vegetation phenology has become increasingly precise and efficient.
[0003] Methods for extracting vegetation phenological parameters mainly operate on time-series curves. The calculation consists of two steps: time-series reconstruction and phenological parameter extraction. Time-series reconstruction mainly uses harmonic analysis of time series, polynomial fitting, Savitzky-Golay (SG) filtering, and local spline function fitting. Phenological parameter extraction mainly uses absolute thresholding, dynamic thresholding, and derivative extremum methods.
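As an illustration of the dynamic thresholding approach named above, the following is a minimal sketch, not taken from the patent. The function name `start_of_season`, the 0.5 amplitude ratio, and the sample values are all assumptions chosen for illustration. It marks the start of the growing season as the first observation on the rising limb whose vegetation index reaches a fixed fraction of the seasonal amplitude.

```python
def start_of_season(series, ratio=0.5):
    """Return the index of the first sample, up to the seasonal peak, whose
    value reaches min + ratio * (max - min); return None if none does."""
    lo, hi = min(series), max(series)
    threshold = lo + ratio * (hi - lo)
    peak = series.index(hi)
    for day, value in enumerate(series[:peak + 1]):
        if value >= threshold:
            return day
    return None

# Hypothetical, already-smoothed vegetation-index values (one per observation).
vi = [0.20, 0.21, 0.25, 0.40, 0.60, 0.72, 0.70, 0.50, 0.30, 0.22]
print(start_of_season(vi))
```

A curve-based method of this kind yields only a few transition dates per season, which is the limitation the image-classification approach of this invention is meant to address.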
[0004] Most phenological parameter extraction methods operate only on time series of vegetation indices derived from remote sensing images and rely on steps such as smoothing, curve fitting, and thresholding, which makes the process extremely complex. Furthermore, most phenological products extract only the start and end of the growing season, so the granularity of phenological period extraction is coarse. This limitation restricts a deeper understanding of plant responses to environmental changes, particularly in addressing climate change and ecological management needs.
Summary of the Invention
[0005] This invention provides a fine-grained phenological parameter extraction method based on feature fusion networks to achieve the following objectives:
[0006] (1) To address the complex operation of existing phenological parameter extraction methods, this invention proposes a fine-grained phenological parameter extraction framework (RBPhenology) suitable for real-time monitoring of multiple vegetation types using phenological cameras. Based on time-series images captured by phenological cameras, spatial and temporal information from several images is used to determine the phenological parameters of each image in the series, a training dataset is constructed, and a deep learning-based vegetation phenological parameter extraction model is trained. This removes the complex operations otherwise required to obtain high-quality phenological data.
[0007] (2) Existing methods can usually only extract a small number of specific phenological stages, which limits the comprehensive understanding of plant growth processes. This invention proposes a ResFormer dual-branch feature fusion network structure, which can effectively identify up to seven different phenological stages, is suitable for fine-grained phenological monitoring tasks, and significantly improves the accuracy and richness of phenological parameter extraction.
[0008] The specific technical solution is as follows:
[0009] A fine-grained phenological parameter extraction method based on feature fusion networks includes the following steps:
[0010] (1) Collect and preprocess the photos taken by the phenology camera to construct a dataset suitable for training;
[0011] Image quality and content detection are performed, manually identifying difficult-to-classify or unclear images, marking them as "needing inspection," and then deleting them manually or by other means. The dataset is then labeled and annotated; images with incorrect labels are identified, deleted, or re-labeled through manual review. Data augmentation is then performed, using methods such as rotation, cropping, and random occlusion to increase the model's robustness, thus constructing a dataset suitable for training.
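The augmentation step described above (rotation and random occlusion) can be sketched as follows. This is a minimal pure-Python illustration on a grayscale image stored as a list of rows. The function names, the 0.3 maximum occlusion fraction, and the fill value are assumptions for illustration and are not specified in the patent.

```python
import random

def random_occlusion(image, max_fraction=0.3, fill=0, rng=None):
    """Return a copy of `image` (list of rows) with one random rectangle set to `fill`."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    box_h = rng.randint(1, max(1, int(height * max_fraction)))
    box_w = rng.randint(1, max(1, int(width * max_fraction)))
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)
    out = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(top, top + box_h):
        for c in range(left, left + box_w):
            out[r][c] = fill
    return out

def rotate90(image):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

# Hypothetical 10x10 image with constant intensity 1.
img = [[1] * 10 for _ in range(10)]
augmented = rotate90(random_occlusion(img, rng=random.Random(0)))
```

In practice, an image library would apply the same operations to full-colour camera frames; the sketch only shows the logic of each transform.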
[0012] (2) Extraction of phenological parameters;
[0013] First, ResNet18 is introduced, and global feature extraction is performed using 3×3 convolutional kernels;
[0014] Pooling layers downsample the feature maps through max pooling, gradually reducing their spatial resolution, and skip connections pass the input directly to the output layer. After being processed by a series of convolutional and pooling layers, the feature maps are passed through fully connected layers to extract global semantic information about the phenological stage.
[0015] The focal loss function is used to mitigate the sensitivity of the residual network to gradient changes. The formula is as follows:
$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
[0016] where $p_t$ is the model's predicted probability for different phenological periods, $\alpha_t$ is the weighting coefficient for different categories, and $\gamma$ is the focusing parameter.
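To make the behaviour of the loss formula above concrete, here is a minimal single-sample sketch. The default values `alpha_t=0.25` and `gamma=2.0` are assumptions chosen for illustration; the patent does not state the values used.

```python
import math

def focal_loss(p_t, alpha_t=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t) for one sample.

    p_t is the predicted probability of the true phenological class.
    alpha_t and gamma are illustrative defaults, not values from the patent.
    """
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Well-classified samples (p_t near 1) contribute far less than hard ones.
for p in (0.5, 0.9, 0.99):
    print(p, focal_loss(p))
```

With `gamma = 0` the modulating factor equals 1 and the expression reduces to a class-weighted cross-entropy term.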
[0017] Second, local features are extracted in a lightweight manner;
[0018] A two-layer (bi-level) routing module is used to rearrange the input data into a region-partitioned form, denoted $X^r$. Next, the query, key, and value tensors are obtained through linear projection, that is, by multiplying the rearranged input data by the corresponding projection weights. Routing between regions is implemented using a directed graph.
[0019] In this stage, a region-level query and key matrix is generated by averaging the queries and keys for each region. Only the connections between each region and the top k most relevant regions are retained.
[0020] A local context enhancement term is introduced to further improve the representational power of local features. The specific formulas are as follows:
$$O = \mathrm{Attention}(X^r W^q, K^g, V^g) + \mathrm{LCE}(X^r W^v)$$
$$K^g = \mathrm{gather}(K, \mathrm{topIndex}(Q^r (X^r W^k)^T))$$
$$V^g = \mathrm{gather}(V, \mathrm{topIndex}(Q^r (X^r W^v)^T))$$
[0021] Here, $W^q$, $W^k$, and $W^v$ are the projection weights for the query Q, key K, and value V, respectively; O is the attention output; Q is the query obtained by linear projection, and $Q^r$ is its region-level form; $K^g$ is the gathered key tensor; $V^g$ is the gathered value tensor; and $\mathrm{LCE}(X^r W^v)$ is the local context enhancement term applied to the value tensor. The number of channels for each attention head is set to 32, and the kernel size is 5.
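The routing step described above (region-level averaging, top-k selection, and gathering) can be sketched in pure Python as follows. This is an illustrative sketch only: the function names, the toy input, and the use of a dot product as the region affinity are assumptions, and the attention and local context enhancement computations are omitted.

```python
def region_means(tokens_per_region):
    """Average the token vectors inside each region: one vector per region."""
    means = []
    for tokens in tokens_per_region:
        dim = len(tokens[0])
        means.append([sum(t[d] for t in tokens) / len(tokens) for d in range(dim)])
    return means

def top_k_routing(region_q, region_k, k):
    """For each region, return the indices of the k regions with the highest
    affinity (dot product of region-level query and region-level key)."""
    routes = []
    for q in region_q:
        scores = [sum(a * b for a, b in zip(q, key)) for key in region_k]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        routes.append(ranked[:k])
    return routes

def gather(per_region, routes):
    """Collect, for each region, the tokens of the regions it is routed to."""
    return [[tok for j in idx for tok in per_region[j]] for idx in routes]

# Hypothetical toy input: 3 regions, 2 tokens per region, 2 channels per token.
queries = [[[3, 0], [3, 0]], [[0, 3], [0, 3]], [[2, 1], [2, 1]]]
keys = queries  # same values reused as keys, purely for illustration
routes = top_k_routing(region_means(queries), region_means(keys), k=2)
gathered_keys = gather(keys, routes)
print(routes)
```

Because each region attends only to the tokens of its k routed regions, the cost of the subsequent attention grows with k rather than with the total number of regions.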
[0022] The lightweight local feature extraction uses the GELU activation function and the cross-entropy loss function, and the model parameters are optimized using the AdamW algorithm for training together with the RandAugment automated augmentation strategy. The GELU activation function is as follows:
$$\mathrm{GELU}(x) = 0.5\,x\,\bigl(1 + \tanh\bigl(0.79788\,(x + 0.044715\,x^{3})\bigr)\bigr)$$
[0023] where 0.5 is a coefficient used to scale a portion of the input, x is the input value, and tanh is the hyperbolic tangent function. 0.79788 is a constant factor, approximately equal to $\sqrt{2/\pi}$, used to scale the input. 0.044715 is a constant used to adjust the cubic term of x to improve the approximation of the GELU function.
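The constants named above correspond to the widely used tanh approximation of GELU. The short sketch below, using only the Python standard library, compares that approximation with the exact definition based on the Gaussian error function; the function names are illustrative.

```python
import math

def gelu_tanh(x):
    """Tanh approximation of GELU using the constants 0.5, 0.79788, and 0.044715."""
    return 0.5 * x * (1.0 + math.tanh(0.79788 * (x + 0.044715 * x ** 3)))

def gelu_exact(x):
    """Exact GELU: x times the standard normal cumulative distribution at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, gelu_tanh(x), gelu_exact(x))
```

The two curves stay close over the typical activation range, which is why the cheaper tanh form is commonly substituted for the exact one.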
[0024] The cross-entropy loss function is as follows:
$$L = -\sum_{i=1}^{C} y_i \log(p_i)$$
[0025] where C is the total number of categories, which is set to 7 in this invention; $y_i$ is an indicator variable for the true label; $p_i$ is the probability of the i-th class predicted by the model; and $\log(p_i)$ is the logarithm of the model's predicted probability.
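A minimal sketch of this loss for a single sample follows, assuming the standard form consistent with the symbol descriptions above (a one-hot indicator for the true label and one predicted probability per class). The function name and the example values are illustrative.

```python
import math

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Return -sum_i y_i * log(p_i) for one sample.

    y_true: one-hot indicator list of length C.
    p_pred: predicted class probabilities of length C.
    eps:    small floor that avoids log(0); an implementation assumption.
    """
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, p_pred))

# C = 7 phenological classes; hypothetical sample whose true class is index 2.
one_hot = [0, 0, 1, 0, 0, 0, 0]
uniform = [1.0 / 7] * 7
print(cross_entropy(one_hot, uniform))  # equals log(7) for a uniform prediction
```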
[0026] (3) After global and local features are extracted by the two branches, an adaptive feature fusion module is used to fuse the two features. The GELU activation function is then used to process the concatenated features to generate a more comprehensive feature representation:
$$F_{fuse} = \mathrm{Softmax}(\alpha \times \text{feature vector1} + \beta \times \text{feature vector2})$$
[0027] α and β are learnable weight parameters that control the weights of feature vector1 and feature vector2 in feature fusion, respectively. Feature vector1 and feature vector2 represent global and local features, respectively.
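The weighted fusion can be illustrated with a small sketch. In the method, α and β are learned during training; here they are fixed hypothetical numbers, and the feature values and function names are likewise illustrative.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(global_feat, local_feat, alpha, beta):
    """Softmax(alpha * feature vector1 + beta * feature vector2), element-wise."""
    mixed = [alpha * g + beta * l for g, l in zip(global_feat, local_feat)]
    return softmax(mixed)

# Hypothetical 4-dimensional global and local feature vectors.
fused = fuse([0.2, 1.0, -0.5, 0.3], [0.1, 0.4, 0.9, -0.2], alpha=0.6, beta=0.4)
print(fused)
```

Making the two weights learnable lets the network decide, per task, how much the global branch and the local branch should each contribute to the fused representation.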
[0028] (4) The classification layer uses the Softmax function and incorporates Dropout technology to map the fused features to the category space of different phenological periods, achieving accurate classification of multiple phenological periods. Once the model starts training, the model parameters are fine-tuned in each round of training.
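The classification step can be sketched as dropout followed by a linear layer and Softmax over the seven phenological classes. This is a simplified pure-Python illustration: the dropout rate of 0.5, the zero-initialised weights, and the function names are assumptions, not details from the patent.

```python
import math
import random

def dropout(features, rate, rng, training=True):
    """Inverted dropout: zero each feature with probability `rate` and
    rescale the survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return list(features)
    keep = 1.0 - rate
    return [f / keep if rng.random() < keep else 0.0 for f in features]

def classify(features, weights, biases):
    """Linear layer followed by Softmax over the phenological classes."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical fused feature vector and an untrained 7-class layer.
rng = random.Random(0)
feats = [1.0, 1.0, 1.0, 1.0]
weights = [[0.0] * 4 for _ in range(7)]
biases = [0.0] * 7
probs = classify(dropout(feats, 0.5, rng), weights, biases)
```

Dropout is active only during training; at inference time the features pass through unchanged.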
[0029] Compared with the prior art, the present invention has the following technical advantages:
[0030] 1) This invention realizes a high-precision phenological period monitoring framework, which improves the previous complex phenological parameter extraction methods, and therefore has certain commercial value.
[0031] 2) Compared with previous methods for extracting phenological parameters, this invention is simple to operate, has a high degree of fineness in the extraction of phenological periods, and is suitable for ecological research on plant growth processes.
[0032] 3) The method does not require long-term time-series records and can be readily applied to different tree species and various phenological camera sites.
Description of the Attached Figures
[0033] Figure 1 is a flowchart of the present invention;
[0034] Figure 2 is a fine-grained phenological period dataset from the embodiment;
[0035] Figure 3 shows the fine-grained phenological monitoring framework of the embodiment;
[0036] Figure 4 is the fine-grained phenological period identification confusion matrix of site 1 in the embodiment;
[0037] Figure 5 is the fine-grained phenological period identification confusion matrix of site 2 in the embodiment;
[0038] Figure 6 is the fine-grained phenological period identification confusion matrix of site 3 in the embodiment;
[0039] Figure 7 is the fine-grained phenological period identification confusion matrix of site 4 in the embodiment.
Detailed Implementation
[0040] This embodiment is implemented using the existing deep learning framework PyTorch and supporting programming libraries, mainly NumPy, PIL, and SciPy. From PyTorch, it mainly uses pre-trained deep learning models together with the linear and convolutional modules.
[0041] As shown in Figure 1, a fine-grained phenological parameter extraction method based on feature fusion networks includes the following steps:
[0042] (1) Collect and preprocess photos taken by phenological cameras to ensure the accuracy and consistency of the data, thereby improving the performance of subsequent models. Perform image quality and image content detection, manually identify images that are difficult to classify or unclear, mark these images as "need to be checked", and then delete them manually or by other means. In addition, label the dataset, identify images with incorrect labels through manual review and delete or relabel them, and then perform data augmentation by rotating, cropping, and randomly covering parts of the data to increase the robustness of the model and construct a dataset suitable for training, as shown in Figure 2.
[0043] (2) Extraction of phenological parameters;
[0044] First, ResNet18 is introduced, and features are extracted using 3×3 convolutional kernels. Pooling layers downsample through max pooling, gradually reducing the spatial resolution of the feature maps. Skip connections directly pass the input to the output layer, alleviating the vanishing gradient problem in deep networks. After processing through a series of convolutional and pooling layers, the feature maps pass through fully connected layers to extract global semantic information. Since residual networks are sensitive to gradient changes, this invention proposes using a focal loss function to alleviate this sensitivity, as shown in the following formula:
$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
[0045] where $p_t$ is the model's predicted probability for different phenological periods, $\alpha_t$ is the weighting coefficient for different categories, and $\gamma$ is the focusing parameter.
[0046] Second, for the lightweight extraction of local features, this embodiment employs a two-layer (bi-level) routing module, which rearranges the input data into a region-partitioned form, denoted $X^r$. Next, the query, key, and value tensors are obtained through linear projection, that is, by multiplying the rearranged input data by the corresponding projection weights. Routing between regions is implemented through a directed graph. In this stage, region-level query and key matrices are generated by averaging the queries and keys within each region. To improve computational efficiency, this invention retains only the connections between each region and its top k most relevant regions, which significantly reduces computational complexity. Finally, a local context enhancement term is introduced to further enhance the representational power of local features. The specific formulas are as follows:
$$O = \mathrm{Attention}(X^r W^q, K^g, V^g) + \mathrm{LCE}(X^r W^v)$$
$$K^g = \mathrm{gather}(K, \mathrm{topIndex}(Q^r (X^r W^k)^T))$$
$$V^g = \mathrm{gather}(V, \mathrm{topIndex}(Q^r (X^r W^v)^T))$$
[0047] Here, $W^q$, $W^k$, and $W^v$ are the projection weights for the query Q, key K, and value V, respectively; O is the attention output; Q is the query obtained by linear projection, and $Q^r$ is its region-level form; $K^g$ is the gathered key tensor; $V^g$ is the gathered value tensor; and $\mathrm{LCE}(X^r W^v)$ is the local context enhancement term applied to the value tensor. The number of channels for each attention head is set to 32, and the kernel size is 5.
[0048] The lightweight local feature extraction uses the GELU activation function and the cross-entropy loss function, and the model parameters are optimized using the AdamW algorithm for training together with the RandAugment automated augmentation strategy. The GELU activation function is as follows:
$$\mathrm{GELU}(x) = 0.5\,x\,\bigl(1 + \tanh\bigl(0.79788\,(x + 0.044715\,x^{3})\bigr)\bigr)$$
[0049] where 0.5 is a coefficient used to scale a portion of the input, x is the input value, and tanh is the hyperbolic tangent function. 0.79788 is a constant factor used to scale the input. 0.044715 is a constant used to adjust the cubic term of x to improve the approximation of the GELU function. The GELU function has a smoother shape than ReLU, making it suitable for complex nonlinear tasks in neural networks. The cross-entropy loss function is as follows:
$$L = -\sum_{i=1}^{C} y_i \log(p_i)$$
[0050] where C is the total number of categories, which is set to 7 in this experiment; $y_i$ is an indicator variable for the true label; $p_i$ is the probability of the i-th class predicted by the model; and $\log(p_i)$ is the logarithm of the model's predicted probability.
[0051] (3) After global and local features are extracted by the two branches, the model uses an adaptive feature fusion module to fuse the two features, and uses the GELU activation function to further process the concatenated features in order to generate a more comprehensive feature representation:
$$F_{fuse} = \mathrm{Softmax}(\alpha \times \text{feature vector1} + \beta \times \text{feature vector2})$$
[0052] α and β are learnable weight parameters that control the weights of feature vector1 and feature vector2 in feature fusion, respectively. Feature vector1 and feature vector2 represent global and local features, respectively.
[0053] (4) The classification layer uses the Softmax function and incorporates Dropout technology to map the fused features to the category space of different phenological periods, thereby achieving accurate classification of multiple phenological periods. Once the model starts training, the model parameters are fine-tuned in each round of training.
[0054] This invention proposes a framework for extracting fine-grained phenological parameters of various vegetation types in real-time monitoring using phenological cameras (RBPhenology) and a ResFormer-based dual-branch feature fusion network structure.
[0055] As shown in Figure 3, the real-time fine-grained phenological monitoring framework of this invention has the ability to monitor and evaluate in real time, supporting the classification of up to seven different phenological stages, including but not limited to budding, flowering, fruiting, and leaf fall. This refined classification method can better understand the plant's response to environmental changes and improve the efficiency of ecological monitoring.
[0056] This invention modifies the ResNet network structure by adding a dual-layer (bi-level) routing attention mechanism and introducing a focal loss function to enhance the network's feature extraction capabilities. A dual-branch network model structure is constructed, achieving mutual optimization of local and global features and improving the network's expressive power. On top of this fine-tuned network model, the MLP classification module is improved, and Dropout is added to enhance the model's generalization ability. This not only enhances the model's learning ability but also optimizes its performance in different scenarios, making it more adaptable to practical applications in phenological monitoring.
[0057] Finally, experiments were conducted on phenological cameras at four publicly available sites. Compared with algorithms such as Swin Transformer, the proposed ResFormer method comprehensively outperforms state-of-the-art methods, achieving an accuracy improvement of 0.31% at site 1, 0.25% at site 2, 0.18% at site 3, and 2.25% at site 4. The confusion matrices for each dataset are shown in Figures 4 to 7.
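For readers interpreting the confusion matrices, the sketch below shows how overall accuracy and per-class recall are typically derived from one. It assumes rows are true classes and columns are predicted classes; the 3×3 matrix is a hypothetical example and does not reproduce any result from the experiments.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def per_class_recall(confusion):
    """Recall of each class: diagonal entry divided by its row sum."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [[8, 1, 1],
      [0, 9, 1],
      [2, 0, 8]]
print(overall_accuracy(cm), per_class_recall(cm))
```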
Claims
1. A method for extracting fine-grained phenological parameters based on feature fusion networks, characterized in that it includes the following steps: (1) collecting and preprocessing the photos taken by the phenology camera to construct a dataset suitable for training; (2) extracting phenological parameters: first, ResNet18 is introduced, and global feature extraction is performed using 3×3 convolutional kernels; second, local features are extracted in a lightweight manner; (3) after global and local features are extracted from the two branches respectively, an adaptive feature fusion module is used to fuse the two parts of features, and the GELU activation function is used to process the concatenated features to generate a comprehensive feature representation: $F_{fuse} = \mathrm{Softmax}(\alpha \times \text{feature vector1} + \beta \times \text{feature vector2})$, where α and β are learnable weight parameters that control the weights of feature vector1 and feature vector2 in feature fusion, respectively, and feature vector1 and feature vector2 are the global features and local features, respectively; (4) the classification layer uses the Softmax function and adds Dropout technology to map the fused features to the category space of different phenological periods, so as to achieve accurate classification of multiple phenological periods; once the model starts training, the model parameters are fine-tuned in each round of training.
2. The method for extracting fine-grained phenological parameters based on feature fusion networks according to claim 1, characterized in that the specific method of step (1) is as follows: image quality and image content detection are performed, images that are difficult to classify or unclear are identified manually and marked as "need to be checked", and these images are then deleted manually or by other means; the dataset is labeled, and images with incorrect labels are identified through manual review and deleted or re-labeled; data augmentation is then performed by rotating, cropping, and randomly covering parts of the data to increase the robustness of the model and construct a dataset suitable for training.
3. The method for extracting fine-grained phenological parameters based on feature fusion networks according to claim 1, characterized in that the specific method for global feature extraction in step (2) is as follows: the pooling layer downsamples the feature map through max pooling, gradually reducing the spatial resolution of the feature map; skip connections directly pass the input to the output layer; the feature map, after being processed by a series of convolutional and pooling layers, passes through a fully connected layer to extract the global semantic information of the phenological stage; the focal loss function is used to mitigate the sensitivity of the residual network to gradient changes, as shown in the following formula: $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the model's predicted probability for different phenological periods, $\alpha_t$ is the weighting coefficient for different categories, and $\gamma$ is the focusing parameter.
4. The method for extracting fine-grained phenological parameters based on feature fusion networks according to claim 1, characterized in that the specific method for local feature extraction in step (2) is as follows: a two-layer (bi-level) routing module is used to rearrange the input data into a region-partitioned form, denoted $X^r$; the query, key, and value tensors are then obtained through linear projection, which is calculated by multiplying the rearranged input data by the projection weights to obtain the corresponding query, key, and value; the routing between regions is implemented through a directed graph; in this stage, region-level query and key matrices are generated by averaging the queries and keys for each region, and only the connections between each region and the top k most relevant regions are retained; a local context enhancement term is introduced to further improve the representational power of local features, with the specific formulas as follows: $O = \mathrm{Attention}(X^r W^q, K^g, V^g) + \mathrm{LCE}(X^r W^v)$, $K^g = \mathrm{gather}(K, \mathrm{topIndex}(Q^r (X^r W^k)^T))$, $V^g = \mathrm{gather}(V, \mathrm{topIndex}(Q^r (X^r W^v)^T))$, where $W^q$, $W^k$, and $W^v$ are the projection weights for the query Q, key K, and value V, respectively; O is the attention output; Q is the query obtained by linear projection, and $Q^r$ is its region-level form; $K^g$ is the gathered key tensor; $V^g$ is the gathered value tensor; and $\mathrm{LCE}(X^r W^v)$ is the local context enhancement term applied to the value tensor; the number of channels for each attention head is set to 32, and the kernel size is 5; the lightweight local feature extraction uses the GELU activation function and the cross-entropy loss function, and the model parameters are optimized using the AdamW algorithm for training together with the RandAugment automated augmentation strategy; the GELU activation function is as follows: $\mathrm{GELU}(x) = 0.5\,x\,(1 + \tanh(0.79788\,(x + 0.044715\,x^{3})))$, where 0.5 is a coefficient used to scale a portion of the input, x is the input value, tanh is the hyperbolic tangent function, 0.79788 is a constant factor used to scale the input, and 0.044715 is a constant used to adjust the cubic term of x to improve the approximation of the GELU function; the cross-entropy loss function is as follows: $L = -\sum_{i=1}^{C} y_i \log(p_i)$, where C is the total number of categories, $y_i$ is an indicator variable for the true label, $p_i$ is the probability of the i-th class predicted by the model, and $\log(p_i)$ is the logarithm of the model's predicted probability.