Crop growth season phenology prediction method and device, electronic equipment and medium
By using interpolation models and feature extraction techniques, the irregularity problem in optical-SAR remote sensing data fusion was solved, achieving high-precision phenological period prediction and improving the monitoring reliability of remote sensing data.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NAT GEOMATICS CENT OF CHINA
- Filing Date
- 2026-04-03
- Publication Date
- 2026-06-19
AI Technical Summary
Existing optical-SAR remote sensing data fusion methods rely on regular time-series images and prior assumptions, which leads to irregular and incomplete remote sensing data affecting the extraction of phenological features and reducing the accuracy and reliability of phenological period prediction.
An interpolation model combined with a feedforward neural network and a bidirectional long short-term memory network is used to process the irregularities of optical-SAR time-series images. Through interpolation and feature extraction, the spatiotemporal dependence of crop phenology is captured, time dependence is established, and prediction accuracy is improved.
It enables high-precision phenological period prediction under irregular and incomplete data conditions, enhancing the reliability and accuracy of monitoring.
Figure CN122244716A_ABST
Description
Technical Field
[0001] This invention relates to the interdisciplinary field of intelligent remote sensing image processing and agricultural science, and more specifically to a method, device, electronic equipment, and medium for predicting phenological periods during the crop growing season.
Background Technology
[0002] Crop phenological information is crucial foundational data for agricultural management, climate change research, and ecosystem monitoring, and has significant application value in optimizing agricultural production and increasing crop yields. Current ground-based observation systems can achieve high-precision local monitoring, but their spatial coverage is limited and cannot meet regional or global monitoring needs. Remote sensing (RS) technology compensates for this deficiency. Optical remote sensing enables fine-scale monitoring but is susceptible to cloud cover, so data continuity is difficult to guarantee. SAR (Synthetic Aperture Radar) remote sensing is not limited by weather but cannot comprehensively reflect crop phenological characteristics. Fusing data from these two sources has therefore become an important research direction.
[0003] However, existing fusion monitoring methods have significant drawbacks. They rely on regular time-series images and prior assumptions, whereas actual optical-SAR time-series images are often irregular and incomplete because of weather, task scheduling, and similar factors. This leads to inaccurate extraction of phenological features and difficulty in establishing time dependence, which ultimately reduces the accuracy of phenological period prediction and the reliability of phenological monitoring.
Summary of the Invention
[0004] In view of this, the purpose of this application is to provide a method, device, electronic device and medium for predicting the phenological period of crop growing season, which can adapt to the irregular and incomplete characteristics of optical-SAR time series images, without relying on regular data and prior assumptions, accurately extract phenological features, establish time dependence, and improve prediction accuracy and monitoring reliability.
[0005] In a first aspect, embodiments of this application provide a method for predicting the phenological stages of a crop growing season, the method comprising: Acquire optical time-series images and SAR time-series images of the crop to be predicted; The optical time-series image and the SAR time-series image are input into the interpolation model to interpolate the initial NDVI time-series image in the optical time-series image based on the VV time-series image and VH time-series image in the SAR time-series image, so as to obtain the target NDVI time-series data. The target NDVI time-series data, the VV time-series image, and the VH time-series image are input into the spatial feature extraction module of the phenological prediction model to obtain a spatial feature representation. The spatial feature representation is used to characterize the greenness difference of crops in different spatial locations, the greenness change trend of crops over time, the physical morphology of crop canopy, and the water content of crop canopy. The target NDVI time series data, the VV time series image, the VH time series image, and the spatial feature representation are input into the multi-scale feature fusion module of the phenological period prediction model to obtain a fused feature tensor. The fused feature tensor is used to characterize the phenological change pattern of crops at different time spans, and the correlation between the spatial distribution and temporal change of crop phenology. The fused feature tensor is input into the prediction module of the phenological prediction model to obtain the phenological prediction result of the crop to be predicted.
[0006] In one possible implementation, the step of inputting the optical time-series image and the SAR time-series image into an interpolation model to interpolate the initial NDVI time-series image in the optical time-series image based on the VV and VH time-series images in the SAR time-series image to obtain target NDVI time-series data includes: The optical time-series image and the SAR time-series image are input into the feature extraction module of the interpolation model to extract full-cycle growth features by capturing the spatial distribution differences of crop phenology and the dynamic changes in the temporal dimension; the full-cycle growth features are used to reflect the growth status of each growth stage in the entire growth cycle of the crop from seedling to maturity. The full-cycle growth characteristics are input into the bidirectional long short-term memory module in the interpolation model to capture the spatiotemporal dependence of crop phenological changes and obtain context-aware output. The context-aware output is input into the output mapping module in the interpolation model to obtain the target NDVI time series data.
[0007] In one possible implementation, the step of inputting the target NDVI time-series data, the VV time-series image, and the VH time-series image into the spatial feature extraction module of the phenological period prediction model to obtain a spatial feature representation includes: The target NDVI time series data is input into the first feature extractor in the spatial feature extraction module to capture the greenness difference of crops in different spatial locations and the greenness change trend of crops over time, and to obtain the first pooling representation. The VV time series image is input into the second feature extractor in the spatial feature extraction module to capture the physical morphology of the crop canopy and obtain the second pooling representation; The VH time series image is input into the third feature extractor in the spatial feature extraction module to capture the water content of the crop canopy and obtain the third pooling representation; The first pooling representation, the second pooling representation, and the third pooling representation are concatenated to obtain a spatial feature representation.
[0008] In one possible implementation, any time-series image is input into the corresponding feature extractor in the spatial feature extraction module according to the following steps to obtain the corresponding feature representation: The time-series image is input into the first convolutional block of the feature extractor to obtain the first local spatial features; The first local spatial features are input into the second convolutional block of the feature extractor to obtain the second local spatial features; The second local spatial features are input into the third convolutional block of the feature extractor to obtain the third local spatial features; The third local spatial features are input into the global pooling layer of the feature extractor to obtain the feature representation.
[0009] In one possible implementation, inputting the time-series image into the first convolutional block of the feature extractor to obtain the first local spatial features includes: The time-series image is convolved using the convolution kernel in the first convolutional block to obtain the initial first local spatial features; The initial first local spatial features are input into the batch normalization layer in the first convolutional block to obtain the intermediate first local spatial features. The intermediate first local spatial features are input into the max pooling layer in the first convolutional block to obtain the final first local spatial features.
[0010] In one possible implementation, the step of inputting the second local spatial features into the third convolutional block of the feature extractor to obtain the third local spatial features includes: The second local spatial features are convolved by the convolution kernel in the third convolutional block to obtain the initial third local spatial features; The initial third local spatial features are input into the batch normalization layer in the third convolutional block to obtain the final third local spatial features.
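The three-block feature extractor described in [0008] to [0010] can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the 3×3 kernels, the channel counts, treating the frames of a time-series image as input channels, giving the second block the same layout as the first, using per-channel spatial statistics as a stand-in for batch normalization, and choosing global max pooling are all assumptions that the text does not fix.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, kernels):
    """Zero-padded 2D convolution. x: (C_in, H, W), kernels: (C_out, C_in, k, k)."""
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", windows, kernels)   # (C_out, H, W)

def batch_norm(x, eps=1e-5):
    """Stand-in for batch normalization: per-channel zero mean, unit variance."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool2(x):
    """2x2 max pooling; an odd trailing row or column is dropped."""
    c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    return x[:, :h2 * 2, :w2 * 2].reshape(c, h2, 2, w2, 2).max(axis=(2, 4))

def feature_extractor(image, k1, k2, k3):
    x = max_pool2(batch_norm(conv2d_same(image, k1)))  # block 1: conv, BN, max pool
    x = max_pool2(batch_norm(conv2d_same(x, k2)))      # block 2: assumed same layout
    x = batch_norm(conv2d_same(x, k3))                 # block 3: conv and BN only
    return x.max(axis=(1, 2))                          # global pooling (max assumed)

rng = np.random.default_rng(0)
image = rng.normal(size=(3, 19, 19))            # hypothetical 3-frame 19x19 patch
k1 = rng.normal(scale=0.1, size=(8, 3, 3, 3))   # hypothetical channel counts
k2 = rng.normal(scale=0.1, size=(16, 8, 3, 3))
k3 = rng.normal(scale=0.1, size=(32, 16, 3, 3))
pooled = feature_extractor(image, k1, k2, k3)   # one pooled representation
```

With a 19×19 input, the two max pooling steps reduce the spatial size to 9×9 and then 4×4 before the third block and the global pooling layer produce one value per output channel.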
[0011] In one possible implementation, the step of inputting the target NDVI time-series data, the VV time-series image, the VH time-series image, and the spatial feature representation into the multi-scale feature fusion module of the phenological period prediction model to obtain a fused feature tensor includes: The target NDVI time series data, the VV time series image, the VH time series image, and the spatial feature representation are concatenated to obtain an initial fused feature tensor; The initial fusion feature tensor is input into each convolution branch in the multi-scale feature fusion module to convolve the initial fusion feature tensor with convolution kernels of different sizes to obtain multiple intermediate fusion feature tensors. All intermediate fusion feature tensors are input into the fusion layer of the multi-scale feature fusion module to obtain the final fusion feature tensor.
[0012] In a second aspect, embodiments of this application also provide a device for predicting the phenological stages of a crop growing season, the device comprising: an acquisition module, used to acquire optical time-series images and SAR time-series images of the crop to be predicted; and an input module, used to input the optical time-series image and the SAR time-series image into the interpolation model, so as to interpolate the initial NDVI time-series image in the optical time-series image based on the VV time-series image and VH time-series image in the SAR time-series image to obtain the target NDVI time-series data. The input module is also used to input the target NDVI time-series data, the VV time-series image, and the VH time-series image into the spatial feature extraction module of the phenological period prediction model to obtain a spatial feature representation; the spatial feature representation is used to characterize the greenness difference of crops in different spatial locations, the greenness change trend of crops over time, the physical morphology of the crop canopy, and the water content of the crop canopy. The input module is further used to input the target NDVI time-series data, the VV time-series image, the VH time-series image, and the spatial feature representation into the multi-scale feature fusion module of the phenological period prediction model to obtain a fused feature tensor; the fused feature tensor is used to characterize the phenological change pattern of crops at different time spans, and the correlation between the spatial distribution and temporal change of crop phenology. The input module is also used to input the fused feature tensor into the prediction module of the phenological period prediction model to obtain the phenological period prediction result of the crop to be predicted.
[0013] In a third aspect, embodiments of this application also provide an electronic device, including: a processor, a storage medium, and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, and when the electronic device is running, the processor communicates with the storage medium via the bus, and the processor executes the machine-readable instructions to perform the steps of the crop growing season phenological period prediction method as described in any implementation of the first aspect.
[0014] In a fourth aspect, embodiments of this application also provide a computer-readable storage medium storing a computer program, which, when executed by a processor, performs the steps of the crop growing season phenological period prediction method as described in any implementation of the first aspect.
[0015] This application provides a method, apparatus, electronic device, and medium for predicting the phenological period of a crop growing season. The method includes: acquiring optical time-series images and SAR time-series images of the crop to be predicted; inputting the optical time-series images and SAR time-series images into an interpolation model to obtain target NDVI time-series data; inputting the target NDVI time-series data, together with the VV time-series images and VH time-series images from the SAR time-series images, into a spatial feature extraction module in the phenological period prediction model to obtain a spatial feature representation; inputting the target NDVI time-series data, VV time-series images, VH time-series images, and spatial feature representation into a multi-scale feature fusion module in the phenological period prediction model to obtain a fused feature tensor; and inputting the fused feature tensor into a prediction module in the phenological period prediction model to obtain the phenological period prediction result of the crop to be predicted. This application can adapt to the irregular and incomplete characteristics of optical-SAR time-series images without relying on regular data and prior assumptions, accurately extracting phenological features, establishing time dependencies, and improving prediction accuracy and monitoring reliability.
Brief Description of the Drawings
[0016] To more clearly illustrate the technical solutions of the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. It should be understood that the following drawings only show some embodiments of this application and should not be regarded as a limitation of the scope. For those skilled in the art, other related drawings can be obtained based on these drawings without creative effort.
[0017] Figure 1 shows a flowchart of a method for predicting the phenological stages of a crop growing season provided in an embodiment of this application;
Figure 2 shows a flowchart of interpolating the initial NDVI time-series image in the optical time-series image using the interpolation model, provided in an embodiment of this application;
Figure 3 shows a structural schematic diagram of a crop growing season phenological prediction device provided in an embodiment of this application;
Figure 4 shows a schematic diagram of the structure of an electronic device provided in an embodiment of this application.
Detailed Implementation
[0018] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. It should be understood that the accompanying drawings in this application are for illustrative and descriptive purposes only and are not intended to limit the scope of protection of this application. Furthermore, it should be understood that the schematic drawings are not drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of this application. It should be understood that the operations in the flowcharts may not be implemented in sequence, and steps without logical contextual relationships may be reversed or implemented simultaneously. In addition, those skilled in the art, guided by the content of this application, may add one or more other operations to the flowcharts, or remove one or more operations from the flowcharts.
[0019] Furthermore, the described embodiments are merely some, not all, of the embodiments of this application. The components of the embodiments of this application described and illustrated herein can typically be arranged and designed in various different configurations. Therefore, the following detailed description of the embodiments of this application provided in the accompanying drawings is not intended to limit the scope of the claimed application, but merely to illustrate selected embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of this application without inventive effort are within the scope of protection of this application.
[0020] To enable those skilled in the art to utilize the content of this application, and in conjunction with the specific application scenario of "the interdisciplinary field of intelligent remote sensing image processing and agricultural science," the following embodiments are provided. For those skilled in the art, the general principles defined herein can be applied to other embodiments and application scenarios without departing from the spirit and scope of this application. Although this application primarily describes the field of "the interdisciplinary field of intelligent remote sensing image processing and agricultural science," it should be understood that this is merely an exemplary embodiment.
[0021] It should be noted that the term "comprising" will be used in the embodiments of this application to indicate the presence of the features declared thereafter, but does not exclude the addition of other features.
[0022] The following is a detailed description of a method for predicting the phenological stages of a crop growing season provided in the embodiments of this application.
[0023] Referring to Figure 1, a flowchart of a method for predicting the phenological stages of a crop growing season according to an embodiment of this application is shown. The exemplary steps of this embodiment are described below:
S101. Obtain optical time-series images and SAR time-series images of the crop to be predicted.
[0024] In the embodiments of this application, the optical time-series images include initial NDVI (Normalized Difference Vegetation Index) time-series images, which are acquired by optical remote sensing and reflect differences in crop greenness and its spatial distribution. The SAR time-series images are acquired by synthetic aperture radar, are sensitive to crop canopy structure and water content, and include VV time-series images and VH time-series images. VV time-series images are acquired through the co-polarization channel of the SAR and are sensitive to crop canopy geometry. VH time-series images are acquired through the cross-polarization channel of the SAR and are sensitive to crop canopy water content.
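For reference, NDVI is conventionally computed from near-infrared and red reflectance as (NIR − Red) / (NIR + Red). The sketch below illustrates that standard definition only; the reflectance values are hypothetical, and the small `eps` term guarding against division by zero is an assumption rather than part of this application.

```python
import numpy as np

def ndvi(nir, red, eps=1e-8):
    """Standard NDVI: (NIR - Red) / (NIR + Red); eps avoids division by zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

# Hypothetical surface reflectance for three pixels:
# dense canopy, sparse canopy, and bare soil.
nir = np.array([0.50, 0.30, 0.10])
red = np.array([0.10, 0.20, 0.10])
values = ndvi(nir, red)
```

Higher values correspond to greener vegetation, which is why gaps in the NDVI series caused by cloud cover directly affect the phenological features extracted later.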
[0025] S102. Input the optical time-series image and the SAR time-series image into the interpolation model, and interpolate the initial NDVI time-series image in the optical time-series image based on the VV time-series image and VH time-series image in the SAR time-series image to obtain the target NDVI time-series data.
[0026] In this embodiment, since optical remote sensing is easily affected by cloud cover, making it difficult to guarantee the continuity of NDVI time-series images, this embodiment employs a hybrid architecture combining a feedforward neural network and a bidirectional long short-term memory network to construct an interpolation model. This interpolation model is specifically tailored for phenological prediction tasks and can extract key spatiotemporal features from optical-SAR multimodal time-series images, capturing long-term dependencies in irregular time series. It aims to fill in missing data by interpolating NDVI values, effectively handling the irregularities of optical-SAR time-series images, thereby better capturing the spatiotemporal characteristics of phenological changes and addressing the challenges of phenological feature extraction caused by irregular sampling and data gaps.
[0027] Specifically, the interpolation model comprises a feature extraction module, a bidirectional long short-term memory module, and an output mapping module. These three modules are optimized for feature extraction, temporal dynamic modeling, and NDVI value interpolation in phenological prediction, respectively. Referring to Figure 2, a flowchart of interpolating the initial NDVI time-series image in the optical time-series image using the interpolation model, as provided in an embodiment of this application, is shown. The interpolation process is as follows:
S201. Input the optical time-series image and the SAR time-series image into the feature extraction module of the interpolation model to extract full-cycle growth features by capturing the spatial distribution differences of crop phenology and the dynamic change patterns in the temporal dimension.
[0028] In this embodiment, the full-cycle growth features reflect the growth status of crops at each growth stage throughout the entire growth cycle from seedling emergence to maturity. They cover the growth status and phenological changes of crops from sowing/emergence through jointing, flowering, and grain filling to maturity and harvest, including: (1) growth vigor: greenness intensity and growth rate (e.g., rapid growth during the jointing stage, slow decline during maturity); (2) physiological characteristics: canopy water content and nutrient accumulation status (e.g., nutrient enrichment and peak water content during the grain filling stage); (3) morphological characteristics: canopy structure (density, plant height) and the status of key phenological nodes (e.g., panicle development during the heading stage, degree of yellowing during maturity). To improve the predictive ability of the interpolation model, multi-source input data is used. Optical time-series images reflect vegetation greenness, while SAR time-series images provide structural information (such as canopy density, plant height morphology, canopy coverage, and the spatial distribution structure of plant populations) that is insensitive to weather conditions (that is, its sensitivity to weather conditions is below a preset sensitivity threshold, where sensitivity measures the degree to which crop information is affected by cloud cover and sunlight). The time step, in turn, encodes the temporal context of irregular sampling. A feature extraction module is therefore designed to integrate these heterogeneous data into a unified feature representation through a fully connected layer, making full use of the complementarity of multi-source data to enhance the robustness of phenological features.
The specific process is as follows: Step 1: Perform structured preprocessing and feature integration on the optical time-series images and SAR time-series images to obtain the crop feature tensor.
[0029] In this embodiment, 19×19 pixel NDVI time-series images (from the optical time-series images) and 19×19 pixel VV and VH time-series images (from the SAR time-series images) are first extracted. The center pixel value of each image is taken as input (focusing on the core crop area, avoiding edge interference, and characterizing the local growth state, greenness level, and canopy structure of the crop), and this value is written to the corresponding date position in a year-round array (containing 365 elements, each corresponding to one date in the year). At the same time, a time step array (with elements covering time intervals from 0 to 364) is generated to record the time intervals of the data throughout the year. Through this process, the image data is converted into four one-dimensional tensors: the NDVI, VV, VH, and time step tensors, denoted here $x_{\mathrm{NDVI}}$, $x_{\mathrm{VV}}$, $x_{\mathrm{VH}}$, and $x_{\Delta t}$. After these four one-dimensional tensors are input into the feature extraction module, they are concatenated along the feature dimension to form the crop feature tensor $X \in \mathbb{R}^{B \times T \times 4}$, where B is the batch size, T is the number of time steps, and 4 is the number of input channels. This concatenation can be expressed as: $X = \mathrm{Concat}(x_{\mathrm{NDVI}}, x_{\mathrm{VV}}, x_{\mathrm{VH}}, x_{\Delta t})$.
[0030] Here, $\mathrm{Concat}(\cdot)$ denotes the feature concatenation operation.
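The preprocessing in Step 1 can be sketched as follows for a single sample (B = 1). This is a minimal illustration, not the patented implementation: the acquisition dates and pixel values are synthetic, filling days without an acquisition with 0.0 is an assumption, and interpreting the time step array as the day offsets 0 to 364 is also an assumption, since the text does not spell out its exact contents.

```python
import numpy as np

DAYS = 365           # length of the year-round array
PATCH = 19           # patch size in pixels
CENTER = PATCH // 2  # index 9 is the center of a 19x19 patch

def to_year_array(patches, days_of_year, fill=0.0):
    """Write the center pixel of each dated patch to its day index (0..364)."""
    series = np.full(DAYS, fill, dtype=float)  # fill value is an assumption
    for patch, day in zip(patches, days_of_year):
        series[day] = patch[CENTER, CENTER]
    return series

def build_feature_tensor(ndvi, vv, vh):
    """Concatenate NDVI, VV, VH, and time step along the feature dimension."""
    time_steps = np.arange(DAYS, dtype=float)            # assumed: day offsets
    x = np.stack([ndvi, vv, vh, time_steps], axis=-1)    # (T, 4)
    return x[np.newaxis, ...]                            # (B=1, T, 4)

rng = np.random.default_rng(0)
optical_days = [10, 45, 120, 200]      # hypothetical cloud-free optical dates
sar_days = [5, 17, 29, 41, 53, 65]     # hypothetical SAR acquisition dates
ndvi_patches = rng.uniform(0.1, 0.9, size=(len(optical_days), PATCH, PATCH))
vv_patches = rng.uniform(-15.0, -5.0, size=(len(sar_days), PATCH, PATCH))
vh_patches = rng.uniform(-22.0, -12.0, size=(len(sar_days), PATCH, PATCH))

ndvi_series = to_year_array(ndvi_patches, optical_days)
vv_series = to_year_array(vv_patches, sar_days)
vh_series = to_year_array(vh_patches, sar_days)
X = build_feature_tensor(ndvi_series, vv_series, vh_series)
```

The resulting tensor has shape (1, 365, 4), matching the B×T×4 layout described above with T = 365.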
[0031] Step 2: Because remote sensing data is often affected by cloud cover, sensor noise, and similar factors, a feedforward neural network (FNN) is used to filter noise through learned weights. It maps the crop feature tensor to a high-dimensional latent space, captures complex spatiotemporal patterns related to phenological changes, and extracts key features related to the crop growth cycle, yielding the full-cycle growth features. This feedforward neural network contains linear layers and the ReLU activation function: $F = \mathrm{ReLU}(X W_f + b_f)$, where $W_f \in \mathbb{R}^{4 \times H}$ and $b_f \in \mathbb{R}^{H}$ are the trainable weight matrix and bias matrix, respectively, and H (set to 128) is the hidden layer dimension. The ReLU activation function is defined as ReLU(x) = max(0, x).
[0032] Here, the nonlinearity introduced by ReLU allows the interpolation model to learn complex feature representations from multimodal inputs, strengthening its ability to extract phenologically relevant features such as key dynamic patterns in the vegetation growth cycle. Time step information is incorporated into the feature extraction process, providing the interpolation model with contextual information about irregular data sampling and thereby enhancing its sensitivity to phenological changes.
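The feature extraction step, a linear layer followed by ReLU applied independently at each time step, can be sketched as below. This is a minimal sketch assuming a single linear layer with randomly initialized weights; the names `weight` and `bias` are illustrative, and a trained model would learn these parameters.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def feature_extraction(x, weight, bias):
    """Map (B, T, 4) inputs to (B, T, H) features with one linear layer + ReLU."""
    return relu(x @ weight + bias)

B, T, C, H = 2, 365, 4, 128          # H = 128 as stated in the text
rng = np.random.default_rng(0)
x = rng.normal(size=(B, T, C))       # stand-in for the crop feature tensor
weight = rng.normal(scale=0.1, size=(C, H))   # untrained, illustrative weights
bias = np.zeros(H)
features = feature_extraction(x, weight, bias)
```

Because the time step is one of the four input channels, each output feature vector carries information about where in the year the sample falls, which is how the sampling context reaches the later modules.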
[0033] S202. Input the full-cycle growth characteristics into the bidirectional long short-term memory module in the interpolation model to capture the spatiotemporal dependence of crop phenological changes and obtain context-aware output.
[0034] In this application's implementation, the context-aware output is used to characterize the spatiotemporal correlation of crop phenological changes and the long-term correlation of irregular time series, while adapting to the irregular sampling-related feature patterns of optical-SAR data caused by cloud cover and other factors. The spatiotemporal correlation of crop phenological changes refers to changes in crop phenology (such as emergence and heading), which are related to both "spatial location" (e.g., differences in growth in different areas of the field) and "time sequence" (e.g., early growth influences later growth); the relationship between these two is spatiotemporal dependence. The long-term correlation of irregular time series refers to the fact that due to cloud cover and other factors, the sampling time of optical-SAR data is not fixed (irregular sequence), while crop phenology lasts for several months; "long-term dependence" means that the later phenological state of the crop is related to the earlier (even much earlier) growth state and data characteristics, and the model needs to capture this long-term correlation. Irregular sampling-related feature patterns refer to the feature output's ability to fit this irregularity and accurately extract effective feature patterns (unaffected by chaotic sampling intervals) to address the problem of irregular data sampling caused by cloud cover.
[0035] Here, since crop phenological periods can last for several months, this embodiment designs a bidirectional long short-term memory (Bi-LSTM) module to address long-term dependency modeling in irregular time-series data. The Bi-LSTM module processes sequences through both forward and backward paths, effectively capturing the spatiotemporal dependencies of phenological changes, and is particularly suitable for the irregular sampling of optical-SAR time-series data caused by factors such as cloud cover. The module takes the full-cycle growth features (a tensor in $\mathbb{R}^{B \times T \times H}$) produced by the feature extraction module and processes this representation bidirectionally to generate the context-aware output. The specific implementation is as follows: at each time step t, the Bi-LSTM module updates its hidden state $h_t$ and cell state $c_t$. Specifically, the Bi-LSTM processes the sequence along both the forward and backward directions to generate the hidden states $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$, and concatenates them into the final output: $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$.
[0036] Here, by encoding contextual information at time steps, Bi-LSTM can capture the bidirectional dynamic characteristics of crop growth cycles (such as the trend of NDVI values rising rapidly in spring and falling in autumn), thus providing support for accurate phenological prediction.
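The bidirectional processing can be illustrated with a small NumPy LSTM that runs once forward and once backward over a sequence and concatenates the two hidden states at each time step. This is a sketch of the standard LSTM recurrence with untrained random weights, not the patented network; the gate ordering, parameter layout, and the small dimensions are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(x, w, u, b, reverse=False):
    """One LSTM direction over x of shape (T, D); returns hidden states (T, H).

    w: (D, 4H), u: (H, 4H), b: (4H,), gates ordered input, forget, cell, output.
    """
    T = x.shape[0]
    H = u.shape[0]
    h = np.zeros(H)
    c = np.zeros(H)
    out = np.zeros((T, H))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        gates = x[t] @ w + h @ u + b
        i, f, g, o = np.split(gates, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g          # update the cell state
        h = o * np.tanh(c)         # update the hidden state
        out[t] = h
    return out

def bilstm(x, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states: (T, D) -> (T, 2H)."""
    h_fwd = lstm_pass(x, *fwd_params)
    h_bwd = lstm_pass(x, *bwd_params, reverse=True)
    return np.concatenate([h_fwd, h_bwd], axis=-1)

def init_params(rng, d, h):
    """Random, untrained parameters for one direction (illustrative only)."""
    return (rng.normal(scale=0.1, size=(d, 4 * h)),
            rng.normal(scale=0.1, size=(h, 4 * h)),
            np.zeros(4 * h))

rng = np.random.default_rng(0)
T, D, H = 12, 8, 4                  # small hypothetical sizes
x = rng.normal(size=(T, D))         # stand-in for one sample's features
fwd = init_params(rng, D, H)
bwd = init_params(rng, D, H)
context = bilstm(x, fwd, bwd)       # shape (T, 2H)
```

The first H entries at step t depend only on inputs up to t, while the last H entries depend only on inputs from t onward, so each output vector summarizes both earlier and later observations around a gap.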
[0037] S203. Input the context-aware output into the output mapping module of the interpolation model to obtain the target NDVI time series data.
[0038] In this embodiment, the output mapping module converts the context-aware output generated by the bidirectional long short-term memory network into interpolated normalized difference vegetation index (NDVI) values at each time step (i.e., the target NDVI time-series data). It performs a fine mapping of the features of the bidirectional long short-term memory network through a multi-layer fully connected network, so that the interpolated NDVI values closely match the actual observed values, thereby further improving the input data quality for phenological prediction. This module consists of two linear layers connected by a ReLU activation function: $z_t = \mathrm{ReLU}(h_t W_1 + b_1)$; $\hat{y}_t = z_t W_2 + b_2$; where $h_t$ is the context-aware output at time step t, and $W_1 \in \mathbb{R}^{2H \times H}$, $b_1 \in \mathbb{R}^{H}$, $W_2 \in \mathbb{R}^{H \times 1}$, $b_2 \in \mathbb{R}^{1}$ are the first weight matrix and the first bias matrix, and the second weight matrix and the second bias matrix, respectively. The final output $\hat{y}_t$ is the interpolated NDVI value. This provides continuous and smooth time-series data for subsequent phenological prediction tasks.
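The output mapping described above, two linear layers joined by a ReLU that reduce each 2H-dimensional context vector to a single NDVI value, can be sketched as follows. The weights here are random and untrained, and the variable names are illustrative assumptions.

```python
import numpy as np

def output_mapping(context, w1, b1, w2, b2):
    """Map context-aware output (B, T, 2H) to interpolated NDVI values (B, T)."""
    hidden = np.maximum(0.0, context @ w1 + b1)   # first linear layer + ReLU
    return (hidden @ w2 + b2)[..., 0]             # second linear layer, squeeze

B, T, H = 2, 365, 128
rng = np.random.default_rng(0)
context = rng.normal(size=(B, T, 2 * H))     # stand-in for the Bi-LSTM output
w1 = rng.normal(scale=0.05, size=(2 * H, H)) # first weight matrix, (2H, H)
b1 = np.zeros(H)
w2 = rng.normal(scale=0.05, size=(H, 1))     # second weight matrix, (H, 1)
b2 = np.zeros(1)
ndvi_hat = output_mapping(context, w1, b1, w2, b2)
```

The result contains one value per sample and day, which is the shape needed for a gap-free daily NDVI series.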
[0039] Furthermore, to improve the interpolation performance on NDVI time series, an improved reconstruction loss function is designed for the interpolation model. It integrates daily GCC (green chromatic coordinate, a relative greenness index regarded as the "true" phenological observation at the ground scale) time-series data to enhance model accuracy. Since GCC data captures the day-by-day trend of vegetation greenness, the auxiliary information it provides is highly correlated with NDVI. By incorporating the dynamic trend of GCC into the loss function, the interpolation model is guided to generate NDVI interpolation results that better reflect the vegetation growth cycle, making the results more biologically plausible. The core of this method is a trend loss computed from GCC data: the consistency between the temporal differences of the predicted NDVI time series and those of the GCC time series measures how well the dynamic trend is fitted. In addition, the contribution of GCC data to the loss function is controlled by the hyperparameter β, which balances the trend loss against the other loss terms. This method makes effective use of the additional environmental information provided by GCC data and specifically addresses potential gaps or noise in the NDVI time series, thereby improving interpolation accuracy and the model's ability to represent vegetation dynamics.
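One way such a loss could look is sketched below. The exact form is not given in the text, so everything beyond "trend loss on temporal differences, weighted by β" is an assumption: the mean-squared-error form of both terms, restricting the reconstruction term to observed days through a mask, comparing raw first-order differences (which presumes NDVI and GCC have been brought to comparable scales), and the example value of β.

```python
import numpy as np

def reconstruction_loss(ndvi_pred, ndvi_obs, obs_mask):
    """Mean squared error over the days that actually have an NDVI observation."""
    diff = (ndvi_pred - ndvi_obs) * obs_mask
    return np.sum(diff ** 2) / max(obs_mask.sum(), 1.0)

def trend_loss(ndvi_pred, gcc):
    """Mean squared mismatch between day-to-day changes of NDVI and GCC."""
    return np.mean((np.diff(ndvi_pred) - np.diff(gcc)) ** 2)

def total_loss(ndvi_pred, ndvi_obs, obs_mask, gcc, beta=0.1):
    """Reconstruction term plus beta-weighted GCC trend term (assumed form)."""
    return (reconstruction_loss(ndvi_pred, ndvi_obs, obs_mask)
            + beta * trend_loss(ndvi_pred, gcc))

# Synthetic seasonal curves standing in for real data.
days = np.arange(365)
season = np.sin(2 * np.pi * (days - 100) / 365)
gcc = 0.35 + 0.05 * season
ndvi_obs = 0.50 + 0.30 * season
obs_mask = np.zeros(365)
obs_mask[::16] = 1.0                 # hypothetical 16-day revisit
ndvi_pred = ndvi_obs.copy()          # a prediction that matches every observation
loss = total_loss(ndvi_pred, ndvi_obs, obs_mask, gcc, beta=0.1)
```

In this example the reconstruction term is zero, so the remaining loss comes entirely from the trend term, which shows how β alone determines how strongly the GCC dynamics constrain the interpolated curve between observations.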
[0040] Through the above design, the bidirectional long short-term memory interpolation module effectively captures the spatiotemporal patterns related to phenology during the feature extraction stage, solves the problem of long-term dependency modeling of irregular time series, and generates high-quality interpolated NDVI data in the output mapping module, thus providing a reliable basis for phenological prediction.
[0041] S103. Input the target NDVI time series data, VV time series image and VH time series image into the spatial feature extraction module in the phenological period prediction model to obtain the spatial feature representation.
[0042] In the embodiments of this application, the spatial feature representation characterizes four aspects: the differences in greenness of crops at different spatial locations; the trend of crop greenness over time; the physical morphology of the crop canopy (e.g., the canopy becomes taller and denser during the jointing stage of wheat, and the canopy spread changes during the grain filling stage of rice); and the water content of the crop canopy, i.e., the water content of crop leaves, stems, and ears, which is a key indicator of crop growth status and phenological stage (e.g., water content peaks during the grain filling stage and decreases rapidly during the ripening stage).
(1) The differences in greenness at different spatial locations refer to differences in NDVI values and their distribution gradient at the same time point, reflecting the spatial heterogeneity of vegetation greenness. For example, within the same farmland, the NDVI value is higher (greener) where crops emerge early and lower (yellowish) where crops emerge late or grow poorly, forming a greenness gradient. Likewise, the differences in the spatial distribution of NDVI values between the edge and core areas of a plot, and between plots with different water and fertilizer conditions, are manifestations of the greenness gradient.
(2) The trend of crop greenness over time reflects the temporal change pattern of vegetation greenness. For example, during the jointing stage, in plots with a large greenness gradient (caused by differences in the timing of seedling emergence), the NDVI value increases as crops grow synchronously and the spatial gradient gradually shrinks (greenness tends to become uniform). As another example, from the grain filling stage to the maturity stage, the overall greenness of crops decreases and areas with uneven maturity form a new greenness gradient (NDVI drops sharply in areas that mature earlier, while immature areas maintain a higher value). This process of gradient formation, shrinkage, and re-formation, together with the rise and fall of NDVI values, constitutes the dynamic change of the greenness gradient.
[0043] Specifically, the spatial feature extraction module aims to extract high-dimensional spatial features from the reconstructed NDVI time series and SAR time series images, providing rich spatial contextual information for subsequent multi-scale feature fusion and phenological prediction. This application's embodiment designs a feature extractor based on a multi-branch CNN framework, combining global pooling operations to capture the spatial pattern and texture features of the images, thereby improving the phenological prediction model's ability to predict key phenological stages of crops.
[0044] The spatial feature extraction module comprises three parallel CNN-based feature extractors (first, second, and third feature extractors) that process NDVI, VV, and VH data respectively, capturing spatial information from different data sources to provide more comprehensive features for subsequent phenological prediction. Since different data reflect different aspects of crop growth (spectral, structural, moisture, etc.), each feature extractor is independently optimized for its corresponding input modality to avoid information confusion, while all three follow the same architectural design to keep feature extraction consistent and to facilitate subsequent fusion for collaborative modeling. Each feature extractor receives a 4-dimensional input tensor with a shape of (batch, number of channels, height, width), where the channel dimension corresponds to the time series.
[0045] Each feature extractor architecture consists of three convolutional blocks (first, second, and third) designed to capture hierarchical spatial features.
[0046] (a) The first two convolutional blocks (the first and second convolutional blocks) each contain a two-dimensional convolutional layer with a 3×3 kernel and a padding of 1, which preserves the spatial dimensions of the feature map. The convolution used to extract local spatial features is: $Y_c(i,j) = \sum_{k}\sum_{m}\sum_{n} w_{c,k}(m,n)\,X_k(i+m,\,j+n) + b_c$; where $X$ is the input feature map, $w$ is the convolution kernel weights, $b$ is the bias term, and $Y$ is the output feature map; $(i, j)$ is the spatial location, $c$ is the output channel index, $(m, n)$ is the spatial index within the convolution kernel, and $k$ is the input channel index.
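For illustration only, a 3×3 convolution with a padding of 1 can be written out in plain Python to show why the spatial dimensions are preserved. The function name `conv2d_same`, the nested-list data layout, and the use of zero padding are assumptions of this sketch; a practical implementation would rely on a deep learning framework.

```python
def conv2d_same(x, w, b):
    """x: [K][H][W] input, w: [C][K][3][3] kernels, b: [C] biases -> [C][H][W]."""
    K, H, W = len(x), len(x[0]), len(x[0][0])
    out = []
    for c in range(len(w)):
        plane = [[0.0] * W for _ in range(H)]
        for i in range(H):
            for j in range(W):
                acc = b[c]
                for k in range(K):
                    for m in range(3):
                        for n in range(3):
                            ii, jj = i + m - 1, j + n - 1  # shift by the padding of 1
                            if 0 <= ii < H and 0 <= jj < W:  # zero padding outside
                                acc += w[c][k][m][n] * x[k][ii][jj]
                plane[i][j] = acc
        out.append(plane)
    return out

# One input channel of ones, one all-ones kernel: each output counts its neighbours.
x = [[[1.0] * 3 for _ in range(3)]]
w = [[[[1.0] * 3 for _ in range(3)]]]
y = conv2d_same(x, w, [0.0])
```

The output keeps the 3×3 size of the input; border positions see fewer valid neighbours because the padded cells contribute zero.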
[0047] Subsequently, a batch normalization layer normalizes the output of the convolutional layer to stabilize the training process and reduce internal covariate shift. This process can be defined as: $\hat{Y} = \gamma \cdot \frac{Y - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$; where $Y$ is the output of the convolutional layer, $\mu_B$ and $\sigma_B^2$ are the mean and variance of the batch data, respectively, $\gamma$ and $\beta$ are learnable scaling and translation parameters, $\epsilon$ is a very small constant used to prevent division by zero, and $\hat{Y}$ is the output feature map. A LeakyReLU activation function then introduces non-linearity, enhances the model's expressive power, prevents neuron inactivation, and keeps gradients flowing even when the input is negative, which is crucial for capturing subtle spatial variations in SAR data.
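A minimal sketch of batch normalization followed by LeakyReLU, applied to the values of a single channel, is given below. The default values (`eps=1e-5`, `negative_slope=0.01`) and the function names are assumptions of this sketch and are not stated in this application.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's values across the batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

def leaky_relu(values, negative_slope=0.01):
    """Keep positive values; scale negative values instead of zeroing them."""
    return [v if v >= 0 else negative_slope * v for v in values]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
activated = leaky_relu(normalized)
```

After normalization the channel has approximately zero mean and unit variance, and the negative half of the values is attenuated rather than discarded.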
[0048] Subsequently, a max-pooling layer with a 2×2 window and a stride of 2 downsamples the feature maps output by the batch normalization layer to reduce the spatial dimensions and enhance feature robustness. This process is defined by the following formula: $P_c(i,j) = \max_{m,n \in \{0,1\}} \hat{Y}_c(2i+m,\,2j+n)$; where $\hat{Y}_c$ is channel $c$ of the feature map being pooled and $P$ is the final output feature map.
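The downsampling step can be illustrated on a single channel as follows. The function name `max_pool_2x2` and the nested-list layout are assumptions of this sketch.

```python
def max_pool_2x2(plane):
    """plane: [H][W] -> [H // 2][W // 2], 2x2 window with stride 2."""
    return [[max(plane[i][j], plane[i][j + 1], plane[i + 1][j], plane[i + 1][j + 1])
             for j in range(0, len(plane[0]) - 1, 2)]
            for i in range(0, len(plane) - 1, 2)]

plane = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
pooled = max_pool_2x2(plane)
```

Each non-overlapping 2×2 window is reduced to its maximum, halving both the height and the width.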
[0049] (b) The third convolutional block omits the max-pooling layer to maintain feature resolution before global pooling. Across the three convolutional blocks, the channel configuration gradually transitions from 365 input channels to 64, 128, and 256 output channels, enabling the phenological prediction model to learn progressively more abstract spatial patterns. Specifically, the NDVI branch primarily captures dynamic changes such as vegetation greenness gradients, while the VV and VH branches extract, from SAR data, structural and textural features sensitive to crop canopy geometry and water content, respectively.
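[0050] After the third convolutional block, a global average pooling layer compresses each channel of the feature map into a single value: $g_c = \frac{1}{H_f W_f}\sum_{i=1}^{H_f}\sum_{j=1}^{W_f} X_c(i,j)$; where $H_f$ and $W_f$ are the height and width of the feature map, respectively, $X_c(i,j)$ is the pixel value at coordinates $(i, j)$ on channel $c$ of the input feature map, and $g_c$ is the feature value after pooling. This process aggregates global spatial information, enhancing the robustness of the phenological period prediction model to spatial changes within the image.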
[0051] Finally, the pooled features of the NDVI, VV, and VH branches are concatenated along the channel dimension to generate a feature vector with 256 × 3 = 768 channels, which constitutes a unified spatial feature representation: $F_{\text{spatial}} = \mathrm{Concat}(g^{\text{NDVI}}, g^{\text{VV}}, g^{\text{VH}})$; where $g^{\text{NDVI}}$ is the first pooling representation, $g^{\text{VV}}$ is the second pooling representation, and $g^{\text{VH}}$ is the third pooling representation. This design ensures that the model can effectively capture the spatial patterns unique to each modality and integrate them seamlessly for subsequent phenological prediction.
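As a hedged illustration of the pooling and concatenation, the sketch below assumes that the global pooling is an average over the spatial grid and uses constant toy feature maps of 256 channels per branch. The function names and the 4×4 map size are hypothetical.

```python
def global_avg_pool(feature_map):
    """feature_map: [C][H][W] -> [C], averaging each channel over its spatial grid."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]

def spatial_representation(ndvi_map, vv_map, vh_map):
    """Concatenate the three pooled vectors along the channel dimension."""
    return global_avg_pool(ndvi_map) + global_avg_pool(vv_map) + global_avg_pool(vh_map)

def toy_map(value, channels=256, size=4):
    """Constant feature map standing in for the output of one branch."""
    return [[[value] * size for _ in range(size)] for _ in range(channels)]

f_spatial = spatial_representation(toy_map(0.1), toy_map(0.2), toy_map(0.3))
```

With 256 channels per branch, `f_spatial` has 768 entries, ordered as NDVI, VV, then VH.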
[0052] In summary, the process of extracting spatial feature representations using the spatial feature extraction module is as follows: Step 1: Input the target NDVI time series data into the first feature extractor in the spatial feature extraction module to capture the greenness differences of crops in different spatial locations and the greenness change trend of crops over time, and obtain the first pooling representation.
[0053] Step 2: Input the VV time series image into the second feature extractor in the spatial feature extraction module to capture the physical morphology of the crop canopy and obtain the second pooling representation.
[0054] Step 3: Input the VH time series image into the third feature extractor in the spatial feature extraction module to capture the water content of the crop canopy and obtain the third pooling representation.
[0055] Step 4: Perform feature concatenation on the first pooling representation, the second pooling representation, and the third pooling representation to obtain the spatial feature representation.
[0056] Specifically, any time-series image (target NDVI time-series data, VV time-series image, and VH time-series image) is input into the corresponding feature extractor in the spatial feature extraction module according to the following steps to obtain the corresponding feature representation: the time-series image is input into the first convolutional block of the feature extractor to obtain the first local spatial feature; the first local spatial feature is input into the second convolutional block of the feature extractor to obtain the second local spatial feature; the second local spatial feature is input into the third convolutional block of the feature extractor to obtain the third local spatial feature; and the third local spatial feature is input into the global pooling layer of the feature extractor to obtain the feature representation.
[0057] The process of inputting the temporal image into the first convolutional block of the feature extractor to obtain the first local spatial features includes: convolving the temporal image with the convolutional kernel in the first convolutional block to obtain initial first local spatial features; inputting the initial first local spatial features into the batch normalization layer in the first convolutional block to obtain intermediate first local spatial features; and inputting the intermediate first local spatial features into the max pooling layer in the first convolutional block to obtain the final first local spatial features.
[0058] The process of inputting the first local spatial feature into the second convolutional block of the feature extractor to obtain the second local spatial feature includes: convolving the first local spatial feature with the convolutional kernel in the second convolutional block to obtain an initial second local spatial feature; inputting the initial second local spatial feature into the batch normalization layer in the second convolutional block to obtain an intermediate second local spatial feature; and inputting the intermediate second local spatial feature into the max pooling layer in the second convolutional block to obtain the final second local spatial feature.
[0059] The process of inputting the second local spatial feature into the third convolutional block of the feature extractor to obtain the third local spatial feature includes: convolving the second local spatial feature with the convolution kernel in the third convolutional block to obtain the initial third local spatial feature; and inputting the initial third local spatial feature into the batch normalization layer in the third convolutional block to obtain the final third local spatial feature.
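The flow through one feature extractor can be summarized by tracing tensor shapes, as sketched below. The 32×32 patch size and the function name `extractor_shapes` are hypothetical; the channel counts (365 to 64, 128, 256) and the pooling pattern follow the description above.

```python
def extractor_shapes(time_steps=365, height=32, width=32):
    """Trace (channels, height, width) through the three blocks and global pooling."""
    shapes = [("input", (time_steps, height, width))]
    c, h, w = time_steps, height, width
    for name, out_channels, pooled in [("block1", 64, True),
                                       ("block2", 128, True),
                                       ("block3", 256, False)]:
        c = out_channels              # 3x3 convolution with padding 1 keeps h and w
        if pooled:
            h, w = h // 2, w // 2     # 2x2 max pooling with stride 2
        shapes.append((name, (c, h, w)))
    shapes.append(("global_pool", (c,)))
    return shapes

for name, shape in extractor_shapes():
    print(name, shape)
```

Only the first two blocks halve the spatial size, so a 32×32 patch reaches the global pooling layer as a 256-channel map of size 8×8.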
[0060] S104. Input the target NDVI time series data, VV time series image, VH time series image and spatial feature representation into the multi-scale feature fusion module in the phenological period prediction model to obtain the fused feature tensor.
[0061] In the embodiments of this application, the fused feature tensor characterizes the phenological change patterns of crops in the same space over different time spans (from a few days to several months), as well as the correlation between the spatial distribution and the temporal change of crop phenology. Specifically, the phenological changes of crops at different spatial locations are related over time and the overall trend of phenological evolution is consistent: for example, crops in the same region emerge and head synchronously over time, while the growth of crops in different regions varies over time.
[0062] Here, the multi-scale feature fusion module is used to process the fused temporal and spatial features to capture the dynamic changes of crop phenology at different time scales. We design this module based on an Inception-like architecture, combining multi-branch convolutional kernels to extract multi-scale spatiotemporal features in parallel, thereby improving the model's ability to represent key stages of the crop growth cycle (such as emergence and maturity). Since the time scale of phenological changes for different crops varies from a few days to several months, a single convolutional kernel cannot capture all the change patterns. Multi-scale convolution adapts to changes over different time spans through various convolutional kernel sizes, enhancing the model's ability to model complex temporal patterns, such as the rapid rise (spring) or slow decline (autumn) of crop growth. Therefore, this design captures the changing patterns of time series data at different time scales through parallel branches with different convolutional kernel sizes, while combining global information from spatial features to construct a robust feature representation that can adapt to the complexity of irregular optical-synthetic aperture radar time series images.
[0063] The input of the multi-scale feature fusion module includes two parts: (1) Time-series features: the target NDVI time series data, VV time series image, and VH time series image, in the form of $X_{\text{temporal}} \in \mathbb{R}^{B \times T \times D}$, where $B$ is the batch size, $T$ is the sequence length, and $D$ is the input feature dimension. (2) Spatial features: the spatial features extracted from the NDVI, VV, and VH images by the spatial feature extraction module, in the form of $F_{\text{spatial}} \in \mathbb{R}^{B \times (256 \times 3)}$, where 256×3 is the result of concatenating the 256 channels of each of the three modalities.
[0064] Furthermore, the specific implementation process of determining the fused feature tensor through this multi-scale feature fusion module is as follows: Step 1: Perform feature concatenation on the target NDVI time series data, VV time series image, VH time series image and spatial feature representation to obtain the initial fused feature tensor.
[0065] In this embodiment of the application, the spatial features are expanded along the temporal dimension and concatenated with the time-series features to form the initial fusion feature tensor $X_0 \in \mathbb{R}^{B \times T \times (D + 768)}$.
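One way to realize this expansion and concatenation is sketched below, under the assumption that the spatial feature vector of each sample is repeated at every time step and appended along the feature dimension. The function name `fuse_inputs` and the toy sizes are hypothetical.

```python
def fuse_inputs(temporal, spatial):
    """temporal: [B][T][D], spatial: [B][S] -> [B][T][D + S].

    Assumption: the spatial vector of each sample is repeated at every time step.
    """
    return [[step + spatial_b for step in seq]
            for seq, spatial_b in zip(temporal, spatial)]

# Toy sizes: B = 1, T = 2, D = 3, S = 2 (S would be 768 in the described design).
temporal = [[[1, 2, 3], [4, 5, 6]]]
spatial = [[9, 9]]
x0 = fuse_inputs(temporal, spatial)
```

Every time step now carries both its own time-series features and the sample-level spatial context.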
[0066] Step 2: Input the initial fusion feature tensor into each convolution branch in the multi-scale feature fusion module, so as to convolve the initial fusion feature tensor with convolution kernels of different sizes to obtain multiple intermediate fusion feature tensors.
[0067] In this embodiment, the core architecture of the multi-scale feature fusion module consists of three parallel convolutional branches. Each branch employs a one-dimensional convolution with a different kernel size (3, 5, and 7) to capture short-term, medium-term, and long-term temporal patterns, respectively. Remote sensing data often has missing features due to cloud cover or sensor limitations. By extracting features from different time windows, this design enhances the model's robustness to sparse data and captures nonlinear time series features, thereby improving prediction accuracy. The convolution operation of each branch can be represented as follows: $F_k = f(\mathrm{Conv1D}_k(X_0;\, W_k, b_k))$; where $W_k$ and $b_k$ are the kernel weights and bias term of the k-th convolutional branch ($k = 1, 2, 3$), respectively, $X_0$ is the initial fusion feature tensor, $\mathrm{Conv1D}$ denotes a one-dimensional convolution operation, and $f$ is the LeakyReLU activation function, defined as: $f(x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases}$, where $\alpha$ is the negative slope. $F_k$ is the intermediate fused feature tensor output by the k-th convolutional branch. The choice of convolutional kernel size reflects the need for feature extraction at different time scales: smaller kernels (e.g., size 3) focus on local temporal changes and are suitable for capturing rapid phenological changes; larger kernels (e.g., size 7) capture patterns over longer time spans and are suitable for analyzing the overall trend of phenological cycles. Each convolutional branch is followed by a batch normalization layer to accelerate training and stabilize gradient flow, and then by the LeakyReLU activation function and a Dropout layer to enhance the model's generalization ability.
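The effect of the three kernel sizes can be illustrated with a plain-Python one-dimensional convolution that preserves the sequence length. Zero padding of half the kernel size, the all-ones toy kernels, and the function names are assumptions of this sketch; batch normalization and Dropout are omitted for brevity.

```python
def conv1d_same(x, w, b):
    """x: [C_in][T], w: [C_out][C_in][K] with odd K, b: [C_out] -> [C_out][T]."""
    T, K = len(x[0]), len(w[0][0])
    pad = K // 2
    out = []
    for c_out in range(len(w)):
        row = []
        for t in range(T):
            acc = b[c_out]
            for c_in in range(len(x)):
                for k in range(K):
                    tt = t + k - pad
                    if 0 <= tt < T:          # zero padding outside the sequence
                        acc += w[c_out][c_in][k] * x[c_in][tt]
            row.append(acc)
        out.append(row)
    return out

def leaky_relu(row, negative_slope=0.01):
    return [v if v >= 0 else negative_slope * v for v in row]

def multi_scale_branches(x, kernels):
    """kernels: {kernel_size: (w, b)} -> one activated feature map per branch."""
    return {k: [leaky_relu(r) for r in conv1d_same(x, w, b)]
            for k, (w, b) in kernels.items()}

# One input channel, five time steps, all-ones kernels of size 3, 5, and 7.
x = [[1.0, 2.0, 3.0, 4.0, 5.0]]
kernels = {k: ([[[1.0] * k]], [0.0]) for k in (3, 5, 7)}
branches = multi_scale_branches(x, kernels)
```

With all-ones kernels each output is a windowed sum, so the size-3 branch responds to immediate neighbours while the size-7 branch covers almost the entire toy sequence.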
[0068] Step 3: Input all intermediate fusion feature tensors into the fusion layer of the multi-scale feature fusion module to obtain the final fusion feature tensor.
[0069] In this embodiment, to integrate multi-scale features, the fusion layer concatenates the intermediate fusion feature tensors $F_1$, $F_2$, and $F_3$ output by the three convolutional branches along the channel dimension to obtain the concatenated feature tensor: $F_{\text{cat}} = \mathrm{Concat}(F_1, F_2, F_3)$, where $F_{\text{cat}}$ has 192 channels. Subsequently, a one-dimensional convolution with a kernel size of 1 reduces the number of channels from 192 to 128. The calculation formula is as follows: $F_{\text{fused}} = f(\mathrm{Conv1D}_{1}(F_{\text{cat}};\, W_f, b_f))$; where $W_f$ and $b_f$ are the weights and biases of the fusion layer, $f$ is the LeakyReLU activation function, and $F_{\text{fused}}$ is the final fusion feature tensor.
[0070] In addition, the fusion layer is followed by a batch normalization layer and a LeakyReLU activation function, and a Dropout layer is used to further prevent overfitting, thus providing compact and information-rich input for subsequent processing steps.
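A convolution with a kernel size of 1 mixes channels independently at each time step, which is how the channel count can be reduced without touching the temporal axis. The sketch below uses three single-channel toy branches and a 3-to-2 reduction instead of 192-to-128; the function names and the toy weights are hypothetical, and batch normalization and Dropout are omitted.

```python
def pointwise_conv(x, w, b):
    """Kernel-size-1 convolution: x [C_in][T], w [C_out][C_in], b [C_out] -> [C_out][T]."""
    T = len(x[0])
    return [[sum(w[o][c] * x[c][t] for c in range(len(x))) + b[o] for t in range(T)]
            for o in range(len(w))]

def fuse(branches, w, b, negative_slope=0.01):
    """Concatenate branch outputs along channels, mix them, apply LeakyReLU."""
    stacked = [row for branch in branches for row in branch]
    mixed = pointwise_conv(stacked, w, b)
    return [[v if v >= 0 else negative_slope * v for v in row] for row in mixed]

# Toy example: three branches with one channel each, two time steps.
branches = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]
w = [[1.0, 1.0, 1.0],      # output channel 0: sum of the three branches
     [1.0, 0.0, -1.0]]     # output channel 1: first branch minus third branch
fused = fuse(branches, w, [0.0, 0.0])
```

The sequence length is unchanged, and negative mixed values are scaled by the negative slope rather than set to zero.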
[0071] S105. Input the fused feature tensor into the prediction module of the phenological prediction model to obtain the phenological prediction results of the crop to be predicted.
[0072] In this embodiment, the phenological period is ultimately predicted by a prediction module that integrates a bidirectional long short-term memory network, an attention module, and a fully convolutional network, yielding the phenological period prediction results for the crop to be predicted. These results include the predicted time of the start of the growing season (SOS, the green-up period) and the predicted time of the end of the growing season (EOS, the yellowing period).
[0073] The loss function module of the phenological period prediction model combines an L1 loss and a boundary constraint loss. The L1 loss directly measures the absolute error between the predicted SOS (start of season) and EOS (end of season) and the true labels, driving the model to learn accurate phenological time points. The boundary constraint loss ensures that the prediction results conform to temporal constraints. Weighting and combining the two losses balances prediction accuracy against the physical plausibility of the results, thus effectively optimizing the model's predictions of SOS and EOS.
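The specific temporal constraints are not enumerated above. As a hedged sketch, the example below assumes three of them: SOS not earlier than day 1, EOS not later than the last day of the season, and EOS at least `min_gap` days after SOS. The hinge-style penalty, the parameter names, and the default values are assumptions of this sketch, not details taken from this application.

```python
def phenology_loss(pred_sos, pred_eos, true_sos, true_eos,
                   season_length=365, weight=0.5, min_gap=1.0):
    """L1 error on SOS and EOS plus a weighted penalty for implausible dates."""
    l1 = abs(pred_sos - true_sos) + abs(pred_eos - true_eos)
    boundary = (max(0.0, 1.0 - pred_sos)                     # SOS before day 1
                + max(0.0, pred_eos - season_length)         # EOS after the season
                + max(0.0, pred_sos + min_gap - pred_eos))   # EOS must follow SOS
    return l1 + weight * boundary

plausible = phenology_loss(100.0, 280.0, 105.0, 275.0)   # only the L1 term is active
reversed_order = phenology_loss(200.0, 150.0, 200.0, 150.0)
```

The penalty is zero whenever the predicted dates are plausible, so it only shapes the gradient for predictions that violate the assumed constraints.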
[0074] Based on the same inventive concept, this application also provides a phenological prediction device for the crop growing season, which corresponds to the method for predicting the phenological period of the crop growing season. Since the principle of the device in this application is similar to the method for predicting the phenological period of the crop growing season described above, the implementation of the device can refer to the implementation of the method, and the repeated parts will not be described again.
[0075] Referring to Figure 3, which is a schematic diagram of a crop growing season phenological prediction device provided in an embodiment of this application, the device includes: an acquisition module 301, used to acquire optical time-series images and SAR time-series images of the crop to be predicted; and an input module 302, used to input the optical time series image and the SAR time series image into the interpolation model, so as to interpolate the initial NDVI time series image in the optical time series image based on the VV time series image and VH time series image in the SAR time series image to obtain the target NDVI time series data. The input module 302 is further configured to input the target NDVI time series data, the VV time series image, and the VH time series image into the spatial feature extraction module of the phenological prediction model to obtain a spatial feature representation; the spatial feature representation is used to characterize the greenness difference of crops in different spatial locations, the greenness change trend of crops over time, the physical morphology of crop canopy, and the water content of crop canopy. The input module 302 is further configured to input the target NDVI time series data, the VV time series image, the VH time series image, and the spatial feature representation into the multi-scale feature fusion module of the phenological prediction model to obtain a fused feature tensor; the fused feature tensor is used to characterize the phenological change pattern of crops at different time spans, and the correlation between the spatial distribution and temporal change of crop phenology. The input module 302 is further configured to input the fused feature tensor into the prediction module of the phenological prediction model to obtain the phenological prediction result of the crop to be predicted.
[0076] As shown in Figure 4, an embodiment of this application provides an electronic device 400, which includes a processor 401, a memory 402, and a bus. The memory 402 stores machine-readable instructions that can be executed by the processor 401. When the electronic device is running, the processor 401 communicates with the memory 402 via the bus. The processor 401 executes the machine-readable instructions to perform the steps of the crop growing season phenological prediction method described above.
[0077] Specifically, the memory 402 and processor 401 mentioned above can be general-purpose memory and processor, without any specific limitations. When the processor 401 runs the computer program stored in the memory 402, it can execute the above-mentioned method for predicting the phenological period of the crop growing season.
[0078] Corresponding to the above-mentioned method for predicting the phenological stages of the crop growing season, this application also provides a computer-readable storage medium storing a computer program, which, when run by a processor, executes the steps of the above-mentioned method for predicting the phenological stages of the crop growing season.
[0079] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems and devices described above can be referred to the corresponding processes in the method embodiments, and will not be repeated here. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of modules is only a logical functional division, and in actual implementation, there may be other division methods. Furthermore, multiple modules or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the displayed or discussed mutual coupling or direct coupling or communication connection can be through some communication interfaces; the indirect coupling or communication connection of devices or modules can be electrical, mechanical, or other forms.
[0080] The modules described as separate components may or may not be physically separate. The components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.
[0081] In addition, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit.
[0082] If the aforementioned functions are implemented as software functional units and sold or used as independent products, they can be stored in a processor-executable, non-volatile, computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a portion of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, ROM, RAM, magnetic disks, or optical disks.
[0083] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for predicting phenological stages during the crop growing season, characterized in that the method includes: acquiring optical time-series images and SAR time-series images of the crop to be predicted; inputting the optical time-series image and the SAR time-series image into the interpolation model to interpolate the initial NDVI time-series image in the optical time-series image based on the VV time-series image and VH time-series image in the SAR time-series image, so as to obtain the target NDVI time-series data; inputting the target NDVI time-series data, the VV time-series image, and the VH time-series image into the spatial feature extraction module of the phenological prediction model to obtain a spatial feature representation, wherein the spatial feature representation is used to characterize the greenness difference of crops in different spatial locations, the greenness change trend of crops over time, the physical morphology of crop canopy, and the water content of crop canopy; inputting the target NDVI time series data, the VV time series image, the VH time series image, and the spatial feature representation into the multi-scale feature fusion module of the phenological period prediction model to obtain a fused feature tensor, wherein the fused feature tensor is used to characterize the phenological change pattern of crops at different time spans, and the correlation between the spatial distribution and temporal change of crop phenology; and inputting the fused feature tensor into the prediction module of the phenological prediction model to obtain the phenological prediction result of the crop to be predicted.
2. The method according to claim 1, characterized in that, The step of inputting the optical time-series image and the SAR time-series image into an interpolation model, and interpolating the initial NDVI time-series image in the optical time-series image based on the VV and VH time-series images in the SAR time-series image to obtain the target NDVI time-series data includes: The optical time-series image and the SAR time-series image are input into the feature extraction module of the interpolation model to extract full-cycle growth features by capturing the spatial distribution differences of crop phenology and the dynamic changes in the temporal dimension; the full-cycle growth features are used to reflect the growth status of each growth stage in the entire growth cycle of the crop from seedling to maturity. The full-cycle growth characteristics are input into the bidirectional long short-term memory module in the interpolation model to capture the spatiotemporal dependence of crop phenological changes and obtain context-aware output. The context-aware output is input into the output mapping module in the interpolation model to obtain the target NDVI time series data.
3. The method according to claim 1, characterized in that, The step of inputting the target NDVI time-series data, the VV time-series image, and the VH time-series image into the spatial feature extraction module of the phenological period prediction model to obtain spatial feature representation includes: The target NDVI time series data is input into the first feature extractor in the spatial feature extraction module to capture the greenness difference of crops in different spatial locations and the greenness change trend of crops over time, and to obtain the first pooling representation. The VV time series image is input into the second feature extractor in the spatial feature extraction module to capture the physical morphology of the crop canopy and obtain the second pooling representation; The VH time series image is input into the third feature extractor in the spatial feature extraction module to capture the water content of the crop canopy and obtain the third pooling representation; The first pooling representation, the second pooling representation, and the third pooling representation are concatenated to obtain a spatial feature representation.
4. The method according to claim 3, characterized in that, The following steps are used to input any temporal image into the corresponding feature extractor in the spatial feature extraction module to obtain the corresponding feature representation: The time-series image is input into the first convolutional block of the feature extractor to obtain the first local spatial features; The first local spatial features are input into the second convolutional block of the feature extractor to obtain the second local spatial features; The second local spatial feature is input into the third convolutional block of the feature extractor to obtain the third local spatial feature; The third local spatial features are input into the global pooling layer of the feature extractor to obtain the feature representation.
5. The method according to claim 4, characterized in that, The step of inputting the temporal image into the first convolutional block of the feature extractor to obtain the first local spatial features includes: The temporal image is convolved using the convolution kernel in the first convolution block to obtain the initial first local spatial features; The initial first local spatial features are input into the batch normalization layer in the first convolutional block to obtain the intermediate first local spatial features. The intermediate first local spatial features are input into the max pooling layer in the first convolutional block to obtain the final first local spatial features.
6. The method according to claim 4, characterized in that the step of inputting the second local spatial features into the third convolutional block of the feature extractor to obtain the third local spatial features includes: convolving the second local spatial features with the convolution kernel in the third convolutional block to obtain the initial third local spatial features; and inputting the initial third local spatial features into the batch normalization layer in the third convolutional block to obtain the final third local spatial features.
7. The method according to claim 1, characterized in that, The step of inputting the target NDVI time-series data, the VV time-series image, the VH time-series image, and the spatial feature representation into the multi-scale feature fusion module of the phenological period prediction model to obtain a fused feature tensor includes: The target NDVI time series data, the VV time series image, the VH time series image, and the spatial feature representation are concatenated to obtain an initial fused feature tensor; The initial fusion feature tensor is input into each convolution branch in the multi-scale feature fusion module to convolve the initial fusion feature tensor with convolution kernels of different sizes to obtain multiple intermediate fusion feature tensors. All intermediate fusion feature tensors are input into the fusion layer of the multi-scale feature fusion module to obtain the final fusion feature tensor.
8. A device for predicting phenological periods during the crop growing season, characterized in that the device includes: an acquisition module, used to acquire optical time-series images and SAR time-series images of the crop to be predicted; and an input module, used to input the optical time-series image and the SAR time-series image into the interpolation model, so as to interpolate the initial NDVI time-series image in the optical time-series image based on the VV time-series image and the VH time-series image in the SAR time-series image to obtain the target NDVI time-series data; the input module is also used to input the target NDVI time-series data, the VV time-series image, and the VH time-series image into the spatial feature extraction module of the phenological period prediction model to obtain a spatial feature representation, the spatial feature representation being used to characterize the greenness difference of crops at different spatial locations, the greenness change trend of crops over time, the physical morphology of the crop canopy, and the water content of the crop canopy; the input module is further used to input the target NDVI time-series data, the VV time-series image, the VH time-series image, and the spatial feature representation into the multi-scale feature fusion module of the phenological period prediction model to obtain a fused feature tensor, the fused feature tensor being used to characterize the phenological change pattern of crops over different time spans and the correlation between the spatial distribution and the temporal change of crop phenology; and the input module is also used to input the fused feature tensor into the prediction module of the phenological period prediction model to obtain the phenological period prediction result of the crop to be predicted.
9. An electronic device, characterized in that the electronic device includes a processor, a storage medium, and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the electronic device is in operation, the processor communicates with the storage medium via the bus and executes the machine-readable instructions to perform the steps of the crop growing season phenological prediction method as described in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the crop growing season phenological prediction method as described in any one of claims 1 to 7.
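For illustration only, the layer order recited in claims 5 to 7 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the claimed implementation: it uses a single channel, fixed (unlearned) kernels, batch statistics taken from one feature map, per-pixel time series of equal length as inputs to the fusion step, uniform averaging kernels in each branch, and a fusion layer that simply stacks the branch outputs. All function names, the pool size, and the kernel sizes are hypothetical; the claims do not specify them.

```python
"""Illustrative stand-ins for the layers named in claims 5 to 7.
Every name and numeric choice here is an assumption for the sketch."""


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of one single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def batch_norm(fmap, eps=1e-5):
    """Normalize a feature map to zero mean and (near) unit variance.

    Assumption: statistics come from this single map, with no learned
    scale or shift parameters.
    """
    values = [v for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    inv_std = (var + eps) ** -0.5
    return [[(v - mean) * inv_std for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_block(image, kernel, pool=True):
    """Convolution, then batch normalization, then optional max pooling.

    pool=True follows the order in claim 5; pool=False stops after batch
    normalization, as in claim 6.
    """
    fmap = batch_norm(conv2d(image, kernel))
    return max_pool(fmap) if pool else fmap


def multi_scale_fuse(ndvi, vv, vh, spatial, kernel_sizes=(1, 3, 5)):
    """Stack four equal-length series, filter them at several temporal scales,
    and return one output series per branch (the assumed 'fusion' is stacking).

    Each branch applies a uniform kernel of odd size k over time and across
    all channels, with zero padding at the ends of the series.
    """
    stacked = [ndvi, vv, vh, spatial]  # concatenation along a channel axis
    steps = len(ndvi)
    branches = []
    for k in kernel_sizes:
        half = k // 2
        series = []
        for t in range(steps):
            window = range(max(0, t - half), min(steps, t + half + 1))
            total = sum(channel[s] for channel in stacked for s in window)
            series.append(total / (k * len(stacked)))
        branches.append(series)
    return branches
```

Calling `conv_block(image, kernel)` on a 6x6 image with a 3x3 kernel yields a 2x2 map, while `pool=False` keeps the 4x4 normalized map. In a trained model the kernels, the normalization parameters, and the fusion layer would be learned rather than fixed as they are here.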