Method for fine extraction of winter wheat planting areas based on high-resolution imagery and edge-enhanced DeepLabV3+
By introducing an edge enhancement module and a joint loss function into the DeepLabV3+ model, the problem of insufficient edge localization accuracy in winter wheat planting areas in Gaofen-2 images was solved, and the fine extraction and automated processing of winter wheat field edges were realized.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NORTHWEST A & F UNIV
- Filing Date
- 2026-03-17
- Publication Date
- 2026-06-19
Figure CN122244707A_ABST
Description
Technical Field
[0001] This invention relates to the fields of remote sensing image processing and agricultural information technology, and in particular to a method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+.
Background Technology
[0002] Winter wheat is an important food crop in China, and obtaining its spatial distribution accurately and quickly is of great significance for food security assessment, agricultural policy formulation, and precision agriculture management. With the development of remote sensing technology, high-resolution remote sensing satellites have provided a data foundation for precision agricultural surveys. The Gaofen-2 (GF-2) satellite is a high-resolution Earth observation satellite independently developed by China, equipped with panchromatic and multispectral cameras. Its panchromatic resolution reaches 0.8 meters, and its multispectral resolution reaches 3.2 meters. After fusion processing, it can produce multispectral images with a resolution of 1 meter. However, Gaofen-2 images carry limited spectral information (only blue, green, red, and near-infrared bands) and show complex ground textures, and the boundaries of winter wheat fields are irregular and often intersect with roads, ditches, and other vegetation. As a result, traditional classification methods and conventional deep learning semantic segmentation models struggle to extract planting boundaries accurately.
[0003] In recent years, deep learning-based semantic segmentation models have achieved significant results in remote sensing ground object extraction. DeepLabV3+, a semantic segmentation network proposed by Google in 2018, captures multi-scale contextual information through an Atrous Spatial Pyramid Pooling (ASPP) module and uses an encoder-decoder structure to recover spatial details. Compared to DeepLabV3, the added decoder module in DeepLabV3+ recovers spatial details lost to downsampling in the encoder, resulting in stronger segmentation capabilities for object boundaries. However, the standard DeepLabV3+ still has the following shortcomings when processing winter wheat fields in Gaofen-2 imagery: (1) Insufficient edge localization accuracy: the limited details recovered by the decoder result in obvious jagged edges on the field blocks in the segmentation results, with low consistency with the true boundaries. (2) Poor adaptability to limited spectral information: Gaofen-2 only has four bands, making it difficult for the model to distinguish winter wheat from other green vegetation growing at the same time. (3) Lack of targeted edge constraint mechanisms: model training relies solely on pixel-level classification loss, without explicitly guiding the network to focus on boundary regions, leading to misclassification near the boundaries.
[0004] Existing research has attempted to enhance boundary segmentation capabilities by improving loss functions. For example, Boundary DoU Loss guides boundary segmentation by calculating the ratio of the difference set to the union of the predicted and ground-truth regions. Other studies have proposed multi-feature fusion structures to expand the receptive field and introduced attention mechanisms to improve feature capture in key regions. However, improvements tailored specifically to the characteristics of Gaofen-2 imagery and winter wheat cultivation are still rare.
[0005] Therefore, there is an urgent need for a refined extraction method for winter wheat planting areas that can fully utilize the high spatial resolution characteristics of Gaofen-2 imagery while enhancing edge extraction capabilities.
Summary of the Invention
[0006] In order to overcome the above-mentioned defects of the prior art, the present invention provides a method for fine extraction of winter wheat planting areas based on Gaofen-2 image and edge-enhanced DeepLabV3+, so as to solve the problems existing in the background art.
[0007] This invention provides the following technical solution: a method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+, comprising the following steps:
Step 1: Acquire Gaofen-2 remote sensing images and preprocess them to construct a training dataset containing winter wheat label maps;
Step 2: Construct an edge-enhanced DeepLabV3+ network. This network adds an edge enhancement module to the standard DeepLabV3+ network. The edge enhancement module is used to explicitly learn the edge features of winter wheat fields and inject the edge features into the decoder to enhance the semantic representation of the boundary region;
Step 3: Design a joint loss function, including segmentation loss, edge loss, and boundary-aware loss, and train the network end-to-end;
Step 4: Input the high-resolution image to be extracted into the trained network, and obtain the refined extraction results of the winter wheat planting area through sliding window prediction and post-processing.
[0008] Furthermore, the preprocessing described in step 1 includes the following sub-steps:
Step 1.1 Radiometric calibration: Radiometric calibration of Gaofen-2 panchromatic and multispectral data is performed using official calibration coefficients. Multispectral data is calibrated to radiance, and panchromatic data is calibrated to reflectance;
Step 1.2 Atmospheric correction: Atmospheric correction is performed on the multispectral data using the FLAASH module to obtain surface reflectance data;
Step 1.3 Geometric fine correction: Geometric fine correction is performed on the image based on ground control points, with the RMS error controlled within 0.5 pixels;
Step 1.4 Image fusion: The panchromatic and multispectral images are fused using the NNDiffuse or Gram-Schmidt algorithm to generate a multispectral image with a resolution of 1 meter;
Step 1.5 Image cropping and sample labeling: The image is cropped into 512×512 pixel image blocks, and the winter wheat planting area is labeled based on the field survey data. The image blocks are divided into a training set, a validation set, and a test set in a 6:2:2 ratio.
[0009] Furthermore, the edge enhancement module described in step 2 specifically includes:
Step 2.1 Multi-scale feature input: Extract the multi-scale features downsampled by 8 times and by 16 times from the encoder;
Step 2.2 Feature alignment and fusion: Upsample the 16x downsampled features and concatenate them with the 8x downsampled features, then compress the number of channels using a 1×1 convolution;
Step 2.3 Edge feature extraction subnetwork: Composed of multiple convolutional layers, it outputs an edge probability map with the same resolution as the input image;
Step 2.4 Edge feature injection path: Extract features from the intermediate layer of the edge feature extraction subnetwork, upsample them, and add them element-wise to the features in the decoder to inject edge information.
[0010] Furthermore, the edge feature extraction subnetwork consists of four convolutional layers: the first 3×3 convolutional layer compresses the input channels to 128, the second 3×3 convolutional layer compresses them to 64, the third 3×3 convolutional layer compresses them to 32, and the fourth 1×1 convolutional layer outputs a single-channel edge probability map and is then activated by a Sigmoid function; the 32-channel features output by the third layer are used for edge feature injection.
[0011] Furthermore, the joint loss function described in step 3 is:
$$L_{total} = L_{seg} + \alpha L_{edge} + \beta L_{boundary}$$
where $L_{seg}$ is the cross-entropy segmentation loss, $L_{edge}$ is the binary cross-entropy edge loss, and $L_{boundary}$ is the boundary-aware loss; α and β are balancing factors, with values ranging from 0.2 to 0.5 and from 0.1 to 0.3, respectively.
[0012] Furthermore, the boundary-aware loss employs Boundary Difference over Union Loss, which guides boundary region segmentation by calculating the ratio of the difference set to the union between the predicted and ground-truth regions. The calculation formula is as follows: [formula not preserved in the source text], where P is the predicted segmentation region, G is the true segmentation region, and B is the boundary region.
[0013] Furthermore, the sliding window prediction in step 4 uses a 50% overlapping sliding window, and the prediction results of the overlapping areas are weighted and fused using Gaussian weights; the post-processing includes conditional random field optimization, small patch removal, and hole filling.
[0014] Furthermore, the standard DeepLabV3+ includes an encoder, an ASPP module, and a decoder. The encoder uses ResNet-50, ResNet-101, Xception, or MobileNetV2 as the backbone network, and the ASPP module contains multiple dilated convolutional branches with different dilation rates.
[0015] A computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method as described in any one of claims 1 to 8 above.
[0016] An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor executes the program to implement the method as described in any one of the above descriptions.
[0017] The technical effects and advantages of this invention are as follows: By designing an edge enhancement module and introducing edge loss constraints, this invention enables the network to explicitly learn field boundary features, significantly improving the segmentation accuracy of winter wheat field edges, effectively suppressing jagged edges, and making the extracted field boundaries follow the real field boundaries more closely. The edge enhancement module fuses boundary features with semantic features, prompting the network to pay more attention to spatial details in the vicinity of the boundary, thereby better distinguishing winter wheat from other spectrally similar vegetation and reducing misclassification. Given that Gaofen-2 imagery has limited spectral information but high spatial resolution, this invention makes full use of the spatial resolution advantage, achieving fine characterization of complex and fragmented fields through multi-scale features and edge enhancement mechanisms. Through joint loss and multi-task learning, the method is more robust to the varying appearance of winter wheat in different regions and at different growth stages, making it easy to extend to other major winter wheat producing areas. The method is implemented end-to-end, requires no manual design of complex features, and is suitable for rapid and automated extraction over large-scale winter wheat planting areas.
Attached Figure Description
[0018] Figure 1 is a flowchart of the overall method of the present invention.
Detailed Implementation
[0019] The technical solution of the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. These embodiments are only used to explain the present invention and are not intended to limit the scope of protection of the present invention.
[0020] As shown in Figure 1, a method for fine-grained extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+ includes the following steps:
Step 1: Data Acquisition and Preprocessing
1.1 Image Data Acquisition: Gaofen-2 satellite images covering the study area are acquired, selecting image data from the key growth stages of winter wheat (from the greening stage to the heading stage, generally March-April). Gaofen-2 satellite data products include Level 1 and Level 2 products. Level 1 products undergo radiometric and sensor correction, while Level 2 products undergo geometric and orthorectification correction. This invention preferably uses Level 1 products for subsequent processing to retain more original information for refined processing.
[0021] 1.2 Radiometric Calibration: The Radiometric Calibration tool in ENVI software is used to perform radiometric calibration on the multispectral and panchromatic data respectively. The calibration type for the multispectral data is set to Radiance, and the calibration type for the panchromatic data is set to Reflectance, with the ScaleFactor set to 10000. The calibration coefficients used are the annually updated coefficients provided by the China Center for Resources Satellite Data and Application.
[0022] 1.3 Atmospheric Correction: Atmospheric correction is performed on the multispectral data using ENVI's FLAASH Atmospheric Correction module. The parameter settings are as follows: the sensor type is UNKNOWN, the flight date and time are set according to the image metadata, the atmospheric model is selected as Mid-Latitude Winter or Mid-Latitude Summer based on the latitude of the study area, the aerosol model is Rural, and the initial visibility is set to 40 km. Surface reflectance data are obtained through atmospheric correction.
[0023] 1.4 Geometric Fine Correction: Based on 1:10000 topographic maps or high-precision ground control points (GCPs), a quadratic polynomial model is used to perform geometric fine correction on the imagery. Control points should be evenly distributed, with a minimum of 20 points, and the RMS error should be controlled within 0.5 pixels.
[0024] 1.5 Image Fusion: The NNDiffuse fusion algorithm or Gram-Schmidt fusion algorithm is used to fuse the 0.8-meter panchromatic band with the atmospherically corrected 3.2-meter multispectral band to generate a 1-meter resolution four-band (blue, green, red, and near-infrared) multispectral fused image.
[0025] 1.6 Image cropping: The fused image is cropped without overlap into blocks of 512×512 pixels. Edge blocks smaller than 512 pixels are padded by mirroring.
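The cropping rule in 1.6 can be sketched with NumPy as follows. This is a minimal illustration, not the patent's implementation: the function name `tile_image` is hypothetical, and reading "filled by mirroring" as `np.pad` with `mode="reflect"` is an assumption.

```python
import numpy as np


def tile_image(image, tile=512):
    """Split an (H, W, C) image into non-overlapping tile x tile blocks.

    Assumption: edges that do not fill a whole tile are padded on the
    bottom and right by mirroring (np.pad with mode="reflect").
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile  # rows needed to reach the next multiple of `tile`
    pad_w = (-w) % tile  # columns needed to reach the next multiple of `tile`
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
    tiles = []
    for top in range(0, padded.shape[0], tile):
        for left in range(0, padded.shape[1], tile):
            tiles.append(padded[top:top + tile, left:left + tile])
    return tiles


# Synthetic four-band image standing in for a fused Gaofen-2 scene.
fused = np.random.default_rng(0).random((600, 700, 4), dtype=np.float32)
blocks = tile_image(fused)
```

A 600×700 image is padded to 1024×1024 and yields four blocks; the padded rows and columns repeat image content in reverse order rather than introducing zeros.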
[0026] 1.7 Sample Dataset Construction: Based on field survey data and visual interpretation, winter wheat planting areas were marked on preprocessed images. The marking criteria included: the hue (bright green during the greening stage), texture (uniform and fine), and shape (regular or irregular field boundaries) of the winter wheat in the image.
[0027] The labeled winter wheat sample areas should be evenly distributed within the study area, covering different planting structures (large fields, fragmented fields) and different growth conditions.
[0028] Generate pixel-level binary label images: 1 represents winter wheat, and 0 represents non-winter wheat.
[0029] The labeled image patches and their corresponding labels are combined to form a dataset with no fewer than 2,000 samples.
[0030] The dataset is randomly divided into training, validation, and test sets in a 6:2:2 ratio.
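A random 6:2:2 split can be sketched as below. The function name `split_indices`, the fixed seed, and the use of 2,000 samples (the minimum dataset size stated above) are illustrative assumptions.

```python
import numpy as np


def split_indices(n_samples, seed=0):
    """Randomly split sample indices into train/val/test in a 6:2:2 ratio."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = n_samples * 6 // 10  # integer arithmetic avoids float rounding
    n_val = n_samples * 2 // 10
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(2000)
```

Splitting by tile index keeps each image block and its label together; the three index sets are disjoint and cover every sample.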
[0031] Step 2: Construct an edge-enhanced DeepLabV3+ network
The edge-enhanced DeepLabV3+ network is constructed by adding an edge enhancement module (EEM) to the standard DeepLabV3+. The overall structure includes an encoder, an ASPP module, a decoder, and an edge enhancement module.
[0032] 2.1 Standard DeepLabV3+ base architecture: Encoder: A deep convolutional neural network is used as the backbone network. This invention preferably uses a ResNet-101 pre-trained on ImageNet, with the fully connected layers removed so that the network is retained only up to its last convolutional layer. The encoder downsamples the input image, and the spatial resolution of the output feature map is 1/16 of the input image.
[0033] ASPP module: Connected to the back end of the encoder, containing four parallel convolutional branches and a global average pooling branch: a 1×1 convolution with 256 output channels; three 3×3 dilated convolutions with dilation rates of 6, 12, and 18, respectively, each with 256 output channels; and a global average pooling branch whose output passes through a 1×1 convolution and is then upsampled to restore the spatial size. After all branch outputs are concatenated, a 1×1 convolution reduces the dimensionality to 256 channels, yielding multi-scale context-enhanced semantic features.
[0034] Decoder: The 256-channel feature map output by ASPP is upsampled by 4 times using bilinear upsampling. Low-level features (the output of the first residual block of ResNet-101) are extracted from the encoder and compressed to 48 channels using a 1×1 convolution to balance the weights of high-level semantic features and low-level spatial details. The upsampled high-level features and the compressed low-level features are concatenated along the channel dimension. Feature refinement is performed using two 3×3 convolutions, each followed by Batch Normalization and ReLU activation. Finally, the feature map is restored to the input image size through 4x bilinear upsampling, and a pixel-level classification probability map is output via softmax.
2.2 Edge Enhancement Module (EEM) Design: The edge enhancement module is the core improvement of this invention, used to explicitly learn the edge features of winter wheat fields and inject the edge information into the segmentation network. The module design includes the following sub-modules:
Multi-scale feature input: The input to the EEM is the multi-scale features output by the second and third residual stages of the encoder, which correspond to feature maps downsampled by 8 times (F8, 512 channels) and 16 times (F16, 1024 channels), respectively.
[0035] Feature alignment and fusion: F16 is bilinearly upsampled to obtain the feature F16_up with the same spatial size as F8. F8 and F16_up are concatenated along the channel dimension to obtain the fused feature F_fuse (1536 channels). The number of F_fuse channels is then compressed to 256 using a 1×1 convolution, reducing the number of parameters.
Edge feature extraction subnetwork:
First layer: 3×3 convolution, 256 input channels, 128 output channels, stride 1, padding 1, followed by Batch Normalization (BN) and ReLU.
Second layer: 3×3 convolution, 128 input channels, 64 output channels, stride 1, padding 1, followed by BN and ReLU.
Third layer: 3×3 convolution, 64 input channels, 32 output channels, stride 1, padding 1, followed by BN and ReLU.
Fourth layer: 1×1 convolution, 32 input channels, 1 output channel, followed by Sigmoid activation, generating an edge probability map E_edge with the same resolution as the input image (requires 8x upsampling).
Edge feature injection path: A branch is derived from the third layer of the edge feature extraction subnetwork (which outputs 32-channel features). These features are bilinearly upsampled 8-fold to restore them to the input image size. The upsampled features are added element-wise to the upsampled ASPP output features in the decoder, thereby injecting edge features into the semantic features. The injected features continue to participate in subsequent processing by the decoder.
Edge label generation: Canny edge detection is performed on the winter wheat label map constructed in step 1, with the threshold set to (50, 150), to generate a binary edge label map (1 represents edge pixels and 0 represents non-edge pixels), which serves as the supervision signal for the edge enhancement module.
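The module described above can be sketched in PyTorch as follows. This is a hedged sketch rather than the patent's network. The text adds 32-channel edge features to 256-channel decoder features without saying how channel counts and resolutions are matched, so the 1×1 projection `project` and the resizing to the decoder feature size are assumptions of this sketch, as are all class and argument names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_bn_relu(in_ch, out_ch):
    """3x3 convolution (stride 1, padding 1) followed by BN and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class EdgeEnhancementModule(nn.Module):
    def __init__(self, ch8=512, ch16=1024, decoder_ch=256):
        super().__init__()
        self.compress = nn.Conv2d(ch8 + ch16, 256, kernel_size=1)  # 1536 -> 256
        self.conv1 = conv_bn_relu(256, 128)
        self.conv2 = conv_bn_relu(128, 64)
        self.conv3 = conv_bn_relu(64, 32)
        self.conv4 = nn.Conv2d(32, 1, kernel_size=1)
        # Assumption: project 32 -> decoder_ch so the element-wise add is defined.
        self.project = nn.Conv2d(32, decoder_ch, kernel_size=1)

    def forward(self, f8, f16, decoder_feat):
        # Align F16 with F8, concatenate, and compress the channels.
        f16_up = F.interpolate(f16, size=f8.shape[-2:], mode="bilinear",
                               align_corners=False)
        fused = self.compress(torch.cat([f8, f16_up], dim=1))
        mid = self.conv3(self.conv2(self.conv1(fused)))  # 32-channel features
        # Edge probability map, upsampled 8x back to the input resolution.
        edge = torch.sigmoid(self.conv4(mid))
        edge_prob = F.interpolate(edge, scale_factor=8, mode="bilinear",
                                  align_corners=False)
        # Assumption: resize the injected features to the decoder feature size.
        inject = F.interpolate(self.project(mid), size=decoder_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        return edge_prob, decoder_feat + inject


eem = EdgeEnhancementModule().eval()
with torch.no_grad():
    f8 = torch.randn(1, 512, 16, 16)    # stride-8 features of a 128x128 input
    f16 = torch.randn(1, 1024, 8, 8)    # stride-16 features
    dec = torch.randn(1, 256, 32, 32)   # 4x-upsampled ASPP output (stride 4)
    edge_prob, enhanced = eem(f8, f16, dec)
```

The small 128×128 input keeps the example light; with 512×512 tiles the same code produces a 512×512 edge probability map.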
[0036] 2.3 Overall network structure characteristics: The edge-enhanced DeepLabV3+ network implements a dual-task learning framework:
Main task: Semantic segmentation of winter wheat growing areas, completed via the encoder-ASPP-decoder path.
Auxiliary task: Edge detection of winter wheat fields, completed by the edge enhancement module.
Feature sharing: The two tasks share encoder features, promoting mutual improvement through multi-task learning.
Feature injection: Edge features are fed back to the decoder through the injection path to enhance the semantic representation of the boundary region.
Step 3: Design the joint loss function
To simultaneously optimize segmentation accuracy and edge consistency, this invention employs a joint loss function to train the network, including segmentation loss, edge loss, and boundary-aware loss.
[0037] 3.1 Segmentation loss $L_{seg}$: The multi-class cross-entropy loss function is used to calculate the pixel-level classification error between the predicted segmentation map and the ground truth label map:
$$L_{seg} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(p_{i,c})$$
where N is the total number of pixels, C is the number of categories (in this invention, C=2: winter wheat and non-winter wheat), $y_{i,c}$ is the true label (0 or 1) of pixel i for class c, and $p_{i,c}$ is the probability predicted by the model that pixel i belongs to class c.
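[0038] 3.2 Edge loss $L_{edge}$: The error between the edge probability map output by the edge enhancement module and the true edge map is calculated using the binary cross-entropy loss function:
$$L_{edge} = -\frac{1}{M} \sum_{j=1}^{M} \left[ e_j \log(\hat{e}_j) + (1 - e_j) \log(1 - \hat{e}_j) \right]$$
where M is the total number of pixels in the edge map, $e_j$ is the true edge label (0 or 1) of pixel j, and $\hat{e}_j$ is the edge probability predicted by the model for pixel j.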
[0039] 3.3 Boundary-aware loss $L_{boundary}$: To further enhance the segmentation accuracy of boundary regions, Boundary Difference over Union Loss is introduced. This loss guides boundary segmentation by calculating the ratio of the difference set to the union set between the predicted and ground-truth regions: [formula not preserved in the source text], where P represents the predicted segmentation region, G represents the true segmentation region, and B represents the boundary region (obtained by morphological dilation of the true edge map). This loss function focuses on the prediction error in the boundary region, making the network more attentive to the optimization of the edge parts.
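Since the formula itself is not reproduced in this text, the following is only one plausible reading of the verbal description: the difference set and the union of P and G are both restricted to the band B, and their pixel counts are divided. The function names, the dilation radius, and the use of hard binary masks (which is not differentiable and therefore not a training loss as written) are all assumptions; the open-source Boundary DoU implementation mentioned in Example 1 may differ.

```python
import numpy as np


def dilate(mask, radius=1):
    """Binary dilation with a (2r+1) x (2r+1) square, done by shifting."""
    h, w = mask.shape
    padded = np.pad(mask.astype(bool), radius, mode="constant")
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def boundary_dou(pred, target, edge, radius=2, eps=1e-7):
    """Assumed form: |(P xor G) within B| / |(P or G) within B|."""
    band = dilate(edge, radius)            # B: dilated true edge map
    p = pred.astype(bool) & band           # P restricted to the band
    g = target.astype(bool) & band         # G restricted to the band
    diff = np.logical_xor(p, g).sum()      # difference set
    union = np.logical_or(p, g).sum()      # union set
    return float(diff / (union + eps))


target = np.zeros((16, 16), dtype=bool)
target[4:12, 4:12] = True
edge = dilate(target, 1) & ~target         # one-pixel ring around the field
perfect = boundary_dou(target, target, edge)
off_by_one = boundary_dou(np.roll(target, 1, axis=1), target, edge)
```

A perfect prediction gives 0, and a prediction shifted by one pixel gives a value strictly between 0 and 1, so errors near the field boundary are penalized even when the interior is correct.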
[0040] 3.4 Total loss function:
$$L_{total} = L_{seg} + \alpha L_{edge} + \beta L_{boundary}$$
where α and β are balance factors. Experiments determined that α=0.3 and β=0.2 give the best results.
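The segmentation and edge terms, and their weighted combination, can be checked numerically with a small NumPy sketch. It assumes that α weights the edge loss and β weights the boundary-aware loss, which follows the order in which the three terms and two factors are listed; the function names are illustrative.

```python
import numpy as np


def cross_entropy(probs, labels, eps=1e-7):
    """Mean multi-class cross-entropy. probs: (N, C); labels: (N,) class ids."""
    picked = probs[np.arange(labels.size), labels]  # p_{i,c} of the true class
    return float(-np.mean(np.log(np.clip(picked, eps, 1.0))))


def binary_cross_entropy(edge_probs, edge_labels, eps=1e-7):
    """Mean binary cross-entropy between edge probabilities and 0/1 labels."""
    p = np.clip(edge_probs, eps, 1.0 - eps)
    return float(-np.mean(edge_labels * np.log(p)
                          + (1 - edge_labels) * np.log(1 - p)))


def joint_loss(l_seg, l_edge, l_boundary, alpha=0.3, beta=0.2):
    """Assumed combination: L_seg + alpha * L_edge + beta * L_boundary."""
    return l_seg + alpha * l_edge + beta * l_boundary


seg_probs = np.array([[0.9, 0.1], [0.2, 0.8]])  # two pixels, two classes
seg_labels = np.array([0, 1])
edge_probs = np.array([0.7, 0.1])
edge_labels = np.array([1, 0])

l_seg = cross_entropy(seg_probs, seg_labels)
l_edge = binary_cross_entropy(edge_probs, edge_labels)
total = joint_loss(l_seg, l_edge, l_boundary=0.25)
```

With α=0.3 and β=0.2 the segmentation term dominates, and the two auxiliary terms act as regularizers that pull the prediction toward the true field edges.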
[0041] Step 4: Model Training and Optimization
4.1 Training environment configuration:
- Deep learning framework: PyTorch 1.10+
- GPU: NVIDIA Tesla V100 or equivalent GPU
- Video memory requirement: ≥16 GB
4.2 Training parameter settings:
- Optimizer: Adam, initial learning rate 0.001
- Learning rate decay strategy: polynomial decay (power=0.9), updated every epoch
- Batch size: 8 (adjusted based on GPU memory)
- Number of epochs: 100
- Input image size: 512×512
- Data augmentation: random horizontal flip, random vertical flip, random rotation (90°, 180°, 270°), random brightness adjustment (±10%), random contrast adjustment (±10%)
4.3 Training strategy: The encoder is initialized using ResNet-101 weights pre-trained on ImageNet. The newly added layers (some of the convolutions in ASPP and the edge enhancement module) use He initialization. After each epoch, the mean intersection over union (mIoU) is calculated on the validation set and the optimal model parameters are saved. Early stopping strategy: training stops when the validation set mIoU does not improve for 10 consecutive epochs.
Step 5: Refined extraction of winter wheat growing areas
5.1 Sliding window prediction: After the Gaofen-2 image to be predicted is preprocessed, the sliding window method is used for prediction.
- Window size: 512×512 pixels
- Sliding step size: 256 pixels (50% overlap)
- A prediction is made for each window to obtain a segmentation probability map.
5.2 Results fusion: For pixels in overlapping regions, a weighted average is used to fuse the probability values from multiple predictions. The weights are set so that the central region of a window has a higher weight and the peripheral region a lower weight (Gaussian weights). After fusion, the final category of each pixel is obtained through argmax.
5.3 Post-processing optimization:
- Conditional Random Field (CRF) optimization: A fully connected CRF is used to refine the boundaries of the segmentation results. Parameter settings: 10 iterations, compatibility parameters are compatible.
- Small patch removal: Based on eight-neighbor connectivity analysis, isolated patches with an area of less than 10 pixels are removed.
- Hole filling: Morphological closing operations are performed to fill holes within winter wheat regions.
5.4 Accuracy verification: Overall accuracy (OA), mean intersection over union (mIoU), and the Kappa coefficient are calculated on the test set. Boundary segmentation accuracy is evaluated using the Boundary F1 Score. Verification points are randomly selected for field verification.
The present invention will be further described in detail below with reference to specific embodiments.
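The area-based accuracy indicators named in 5.4 (OA, mIoU, and the Kappa coefficient) can all be derived from a confusion matrix, as in the sketch below. The function names are illustrative, and the Boundary F1 Score is not covered here.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    idx = n_classes * y_true.ravel() + y_pred.ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)


def accuracy_metrics(cm):
    """Return (overall accuracy, mean IoU, Cohen's kappa)."""
    total = cm.sum()
    tp = np.diag(cm)
    oa = tp.sum() / total
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)   # per-class IoU
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return float(oa), float(iou.mean()), float(kappa)


# Toy labels: 1 = winter wheat, 0 = non-winter wheat.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])
cm = confusion_matrix(y_true, y_pred)
oa, miou, kappa = accuracy_metrics(cm)
```

For these eight toy pixels the confusion matrix is [[3, 1], [1, 3]], giving OA = 0.75, mIoU = 0.6, and Kappa = 0.5.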
[0042] Example 1: Extraction from a winter wheat growing area in a county in Shaanxi Province
This embodiment uses a county in Shaanxi Province as the research area to verify the effectiveness of the method of the present invention. This area is a typical major winter wheat producing area, with a planting structure including large-scale contiguous planting and some fragmented fields, and land cover types including winter wheat, villages, roads, woodlands, rivers, etc.
[0043] Step 1: Data Acquisition and Preprocessing
1.1 Image Acquisition: Gaofen-2 Level 1 product images covering the study area were ordered from the China Center for Resources Satellite Data and Application. Image data from March 25, 2024 (winter wheat heading stage) were selected, with cloud cover <5%. The images include panchromatic bands (0.8 m resolution) and multispectral bands (3.2 m resolution).
[0044] 1.2 Radiometric Calibration: Using the Radiometric Calibration tool in ENVI 5.6 software, the official 2024 calibration coefficient file was loaded. The calibration type for multispectral data was set to Radiance, and for panchromatic data to Reflectance, with ScaleFactor=10000.
[0045] 1.3 Atmospheric Correction: Atmospheric correction was performed on the multispectral data using the FLAASH Atmospheric Correction module. Parameter settings: Sensor type: UNKNOWN; Imaging date: 2024-03-25; Imaging time: 03:25:00 based on metadata; Atmospheric model: Mid-Latitude Winter; Aerosol model: Rural; Initial visibility: 40km; Output: Reflectance data.
[0046] 1.4 Geometric fine correction: Based on the 1:10000 topographic map, 25 evenly distributed ground control points were collected, and a quadratic polynomial model was used for geometric fine correction, with the RMS error controlled within 0.43 pixels.
[0047] 1.5 Image Fusion: The NNDiffuse fusion algorithm is used to fuse the 0.8m panchromatic band and the 3.2m multispectral band to generate a four-band multispectral fused image with a resolution of 1m.
[0048] 1.6 Image cropping: The fused image was cropped without overlap into blocks of 512×512 pixels, resulting in a total of 2850 image blocks.
[0049] 1.7 Sample Dataset Construction:
Field survey: From March 26 to 30, 2024, a field survey was conducted, and the latitude and longitude of 120 winter wheat interpretation markers were collected using handheld GPS.
Visual interpretation: Based on field survey data and historical Google Earth imagery, winter wheat planting areas were marked on the preprocessed images. The marking criteria included hue (bright green), texture (uniform and fine), and shape (field boundaries).
A total of 1,850 winter wheat sample areas were labeled, covering an area of approximately 120 square kilometers. Cropping the data to 512×512 pixels generated 2850 sample-label pairs. The dataset was randomly divided into a training set (1710 samples), a validation set (570 samples), and a test set (570 samples) in a 6:2:2 ratio.
Step 2: Construct an edge-enhanced DeepLabV3+ network
2.1 Basic Network Construction: Using the PyTorch framework, the pre-trained ResNet-101 from torchvision is loaded as the encoder backbone. The ASPP module is configured according to Section 2.1 of the technical solution, with the dilation rates set to [6, 12, 18].
[0050] 2.2 Edge Enhancement Module Construction: Multi-scale features are extracted from ResNet-101 layer 2 (downsampling 8x, 512 channels) and layer 3 (downsampling 16x, 1024 channels). The layer 3 features are upsampled by 2x, concatenated with the layer 2 features, and then compressed to 256 channels via a 1×1 convolution. A four-layer convolutional network is constructed for edge feature extraction:
conv1: 3×3, 256→128, BN+ReLU
conv2: 3×3, 128→64, BN+ReLU
conv3: 3×3, 64→32, BN+ReLU
conv4: 1×1, 32→1, Sigmoid
An edge feature injection branch is derived from conv3, and after 8x upsampling, it is added to the ASPP output features.
2.3 Edge Label Generation: The Canny function of OpenCV is used to perform edge detection on the winter wheat label map, with a threshold of (50, 150), to generate a binary edge label map.
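As a dependency-free illustration of edge label generation, the sketch below marks every winter wheat pixel that touches a non-wheat pixel. This is a plainly named stand-in for the OpenCV Canny call, not a reimplementation of it, and the function name `edge_labels` is hypothetical. Note that if Canny with thresholds (50, 150) is applied to a 0/1 label map, the map likely needs to be scaled to 0/255 first, because the gradients of a 0/1 image are far below those thresholds.

```python
import numpy as np


def edge_labels(mask):
    """Return 1 where a wheat pixel has a non-wheat 4-neighbour, else 0."""
    m = np.pad(mask.astype(bool), 1, mode="edge")  # replicate the border
    center = m[1:-1, 1:-1]
    differs = (
        (center != m[:-2, 1:-1])    # neighbour above
        | (center != m[2:, 1:-1])   # neighbour below
        | (center != m[1:-1, :-2])  # neighbour to the left
        | (center != m[1:-1, 2:])   # neighbour to the right
    )
    return (center & differs).astype(np.uint8)


label = np.zeros((8, 8), dtype=np.uint8)
label[2:6, 2:6] = 1               # a 4x4 winter wheat field
edges = edge_labels(label)
```

For the 4×4 field the result is its 12-pixel outline; interior and background pixels stay 0.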
[0051] Step 3: Set the loss function
The joint loss function is set according to Section 3 of the technical solution, with α=0.3 and β=0.2. The Boundary DoU Loss is adapted from the open-source implementation.
[0052] Step 4: Model Training
4.1 Training environment: Ubuntu 20.04, PyTorch 1.12, NVIDIA Tesla V100 (32 GB) ×1
4.2 Training parameters:
- Optimizer: Adam, initial lr=0.001
- Learning rate decay: polynomial decay, power=0.9
- Batch size: 8
- Epochs: 100
- Input dimensions: 512×512
- Data augmentation: random horizontal/vertical flip, random rotation, random brightness/contrast adjustment
4.3 Training monitoring: mIoU is calculated on the validation set every 5 epochs. The optimal mIoU of 90.3% is reached in the 42nd epoch, and the model weights are saved.
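The learning rate schedule and the early-stopping rule from the training strategy can be sketched in plain Python. The formula lr = lr0 · (1 − epoch / max_epochs)^power is the common "poly" policy and is assumed here, since the text only names the decay type and the power; the class and function names are illustrative.

```python
def poly_lr(epoch, base_lr=0.001, max_epochs=100, power=0.9):
    """Polynomial decay, evaluated once per epoch (assumed 'poly' policy)."""
    return base_lr * (1.0 - epoch / max_epochs) ** power


class EarlyStopping:
    """Signal a stop after `patience` epochs without a validation mIoU gain."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, miou):
        """Record one validation result; return True when training should stop."""
        if miou > self.best:
            self.best = miou        # new best: this is where weights are saved
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


schedule = [poly_lr(epoch) for epoch in range(100)]
```

The schedule starts at 0.001 and decreases monotonically toward zero at epoch 100.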
[0053] Step 5: Refined extraction of winter wheat growing areas
5.1 Sliding window prediction: The sliding window method is used to predict the test set images. The window size is 512×512, the step size is 256, and the overlapping areas are weighted and fused.
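Overlapping-window prediction with Gaussian-weighted fusion can be sketched as follows. `predict_fn` is a hypothetical stand-in for the trained network that returns a winter wheat probability map for one window; the Gaussian width (`sigma_ratio`) and the rule of shifting the last window to touch the image border are assumptions, and the image is assumed to be at least one window in size.

```python
import numpy as np


def gaussian_window(size, sigma_ratio=0.25):
    """2-D Gaussian weights: highest at the window centre, lowest at the rim."""
    axis = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(axis ** 2) / (2.0 * (sigma_ratio * size) ** 2))
    return np.outer(g, g)


def window_starts(length, window, stride):
    """Start offsets along one axis; the last window is shifted to the border."""
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts


def sliding_window_predict(image, predict_fn, window=512, stride=256):
    """Fuse per-window probabilities by a Gaussian-weighted average."""
    h, w = image.shape[:2]
    weight = gaussian_window(window)
    prob_sum = np.zeros((h, w))
    weight_sum = np.zeros((h, w))
    for top in window_starts(h, window, stride):
        for left in window_starts(w, window, stride):
            patch = image[top:top + window, left:left + window]
            prob = predict_fn(patch)
            prob_sum[top:top + window, left:left + window] += weight * prob
            weight_sum[top:top + window, left:left + window] += weight
    return prob_sum / weight_sum


# Small demo: the "model" simply returns the fourth band as a probability.
demo = np.random.default_rng(0).random((22, 30, 4))
fused_prob = sliding_window_predict(demo, lambda p: p[..., 3], window=8, stride=4)
wheat_mask = (fused_prob > 0.5).astype(np.uint8)
```

For a two-class problem, thresholding the fused winter wheat probability at 0.5 is equivalent to taking the argmax over the two classes.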
[0054] 5.2 Post-processing: CRF optimization was performed using pydensecrf with 10 iterations. Small patches with an area of less than 10 pixels were removed. A morphological closing operation (kernel size 3×3) was used to fill holes.
5.3 Accuracy verification: The indicators were calculated on the test set, and the results are shown in Table 1.
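The small-patch removal step can be sketched with a breadth-first search over 8-connected components. This is an illustrative pure-Python/NumPy version; the function name is hypothetical, and the CRF and closing steps are not covered.

```python
from collections import deque

import numpy as np


def remove_small_patches(mask, min_area=10):
    """Drop 8-connected foreground components smaller than `min_area` pixels."""
    mask = mask.astype(bool)
    h, w = mask.shape
    visited = np.zeros((h, w), dtype=bool)
    out = mask.copy()
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or visited[sy, sx]:
                continue
            # Breadth-first search collects one connected component.
            queue = deque([(sy, sx)])
            visited[sy, sx] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and not visited[ny, nx]):
                            visited[ny, nx] = True
                            queue.append((ny, nx))
            if len(component) < min_area:
                for y, x in component:
                    out[y, x] = False
    return out.astype(np.uint8)


seg = np.zeros((12, 12), dtype=np.uint8)
seg[1:5, 1:5] = 1      # 16-pixel field: kept
seg[8:10, 8:10] = 1    # 4-pixel isolated patch: removed
cleaned = remove_small_patches(seg)
```

On full-size mosaics a library routine for connected-component labeling would be faster; the loop above is meant only to make the rule explicit.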
[0055] Comparative experimental setup
To verify the effectiveness of the method of the present invention, the following comparative experiment was conducted.
Comparison methods:
- U-Net (classic encoder-decoder architecture)
- PSPNet (Pyramid Scene Parsing Network)
- Standard DeepLabV3+ (ResNet-101 backbone)
- DeepLabV3+ + Attention (with the CBAM attention module added)
- The method of this invention (edge-enhanced DeepLabV3+)
Evaluation indicators:
- Overall Accuracy (OA)
- Mean Intersection over Union (mIoU)
- Kappa coefficient
- Boundary F1 Score (evaluates the accuracy of edge segmentation)
- Parameters (M)
- Inference time (ms per 512×512 image)
Experimental results:
Table 1. Performance comparison of different methods on the test set
As can be seen from Table 1, the method of this invention outperforms the comparison methods on four metrics: OA, mIoU, Kappa, and Boundary F1. Compared to the standard DeepLabV3+, mIoU is improved by 2.3 percentage points, and Boundary F1 is improved by 7.2%. The number of parameters increases slightly (2.1M), and the inference time increases by 14 ms, which is within an acceptable range.
Ablation experiment
To verify the effectiveness of each module, an ablation experiment was conducted.
Table 2. Ablation experiment results
The ablation experiments show that:
- The Edge Enhancement Module (EEM) alone improves mIoU by 1.4% and Boundary F1 by 0.04%.
- Adding the edge loss L_edge brings a further improvement.
- The best results are obtained after adding the boundary-aware loss L_boundary.
- CRF post-processing brings a slight additional improvement.
Example 2: Cross-regional transfer capability test
To verify the generalization ability of the method, the model trained in Shaanxi Province was directly applied to Gaofen-2 imagery of a county in Shandong Province (without fine-tuning). The test results are as follows:
Table 3. Cross-regional transfer test results
In cross-regional transfer, the mIoU of the method of this invention drops by 3.9 percentage points (from 90.1% to 86.2%), which is less than the 5.3 percentage point drop of the standard DeepLabV3+ (from 87.8% to 82.5%), indicating that the method of this invention has stronger generalization ability.
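For reference, three of the evaluation indicators listed above (OA, mIoU, and the Kappa coefficient) can be derived from a confusion matrix as in the following sketch. It uses the standard textbook definitions; the function name `segmentation_metrics` and the row/column convention are assumptions, and Boundary F1 is not covered.

```python
import numpy as np


def segmentation_metrics(cm):
    """Return (OA, mIoU, kappa) from a confusion matrix.

    Assumed convention: rows are ground-truth classes, columns are predictions.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)                       # correctly classified pixels per class
    pred = cm.sum(axis=0)                  # pixels predicted as each class
    truth = cm.sum(axis=1)                 # pixels truly belonging to each class
    oa = tp.sum() / total                  # overall accuracy
    iou = tp / (pred + truth - tp)         # per-class intersection over union
    pe = (pred * truth).sum() / total**2   # agreement expected by chance
    kappa = (oa - pe) / (1.0 - pe)
    return oa, iou.mean(), kappa
```

For a two-class problem (winter wheat versus background), `cm` is a 2×2 matrix accumulated over all test tiles.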
[0056] Example 3: Comparison of different backbone networks
To verify the universality of the method, experiments were conducted using different backbone networks.
Table 4. Comparison of different backbone networks (mIoU, %)
The results show that the edge enhancement module proposed in this invention yields a stable performance improvement (about 2.2-2.3 percentage points) across different backbone networks, indicating that the method has good universality.
[0057] The above are merely specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
Claims
1. A method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+, characterized in that it includes the following steps:
Step 1: Acquire Gaofen-2 remote sensing images and preprocess them to construct a training dataset containing winter wheat label maps;
Step 2: Construct an edge-enhanced DeepLabV3+ network, which adds an edge enhancement module to the standard DeepLabV3+ network, the edge enhancement module being used to explicitly learn the edge features of winter wheat fields and inject the edge features into the decoder to enhance the semantic representation of the boundary region;
Step 3: Design a joint loss function, including segmentation loss, edge loss, and boundary-aware loss, and train the network end-to-end;
Step 4: Input the high-resolution image to be extracted into the trained network, and obtain the refined extraction results of the winter wheat planting area through sliding window prediction and post-processing.
2. The method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+ according to claim 1, characterized in that the preprocessing described in step 1 includes the following sub-steps:
Step 1.1 Radiometric calibration: Radiometric calibration of the Gaofen-2 panchromatic and multispectral data is performed using the official calibration coefficients; the multispectral data is calibrated to radiance, and the panchromatic data is calibrated to reflectance;
Step 1.2 Atmospheric correction: Atmospheric correction is performed on the multispectral data using the FLAASH module to obtain surface reflectance data;
Step 1.3 Geometric fine correction: Geometric fine correction is performed on the image based on ground control points, with the RMS error controlled within 0.5 pixels;
Step 1.4 Image fusion: The panchromatic and multispectral images are fused using the NNDiffuse or Gram-Schmidt algorithm to generate a multispectral image with a resolution of 1 meter;
Step 1.5 Image cropping and sample labeling: The image is cropped into 512×512 pixel image blocks, the winter wheat planting area is labeled based on field survey data, and the image blocks are divided into a training set, a validation set, and a test set in a 6:2:2 ratio.
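Step 1.5 can be illustrated with a short sketch that enumerates 512×512 tile positions and splits them 6:2:2. The claim does not state whether tiles overlap or how the split is drawn, so non-overlapping tiles and a seeded random shuffle are assumptions here, and the names `tile_origins` and `split_622` are hypothetical.

```python
import random


def tile_origins(height, width, tile=512):
    """Top-left corners of non-overlapping tile x tile blocks that fit inside the image."""
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]


def split_622(items, seed=0):
    """Shuffle reproducibly, then split into train/validation/test at a 6:2:2 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10   # integer arithmetic avoids float rounding
    n_val = len(items) * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]   # remainder goes to the test set
    return train, val, test
```

For example, a 2048×2560 pixel scene yields 20 tiles under these assumptions, which split into 12, 4, and 4.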
3. The method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+ according to claim 1, characterized in that the edge enhancement module mentioned in step 2 specifically includes:
Step 2.1 Multi-scale feature input: Extract multi-scale features downsampled by 8 times and by 16 times from the encoder;
Step 2.2 Feature alignment and fusion: Upsample the 16x downsampled features and concatenate them with the 8x downsampled features, then compress the number of channels using a 1×1 convolution;
Step 2.3 Edge feature extraction subnetwork: Composed of multiple convolutional layers, it outputs an edge probability map with the same resolution as the input image;
Step 2.4 Edge feature injection path: Extract features from the intermediate layer of the edge feature extraction subnetwork, upsample them, and add them element-wise to the features in the decoder to inject edge information.
4. The method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+ according to claim 3, characterized in that the edge feature extraction subnetwork consists of four convolutional layers: the first 3×3 convolutional layer compresses the input channels to 128, the second 3×3 convolutional layer compresses them to 64, the third 3×3 convolutional layer compresses them to 32, and the fourth 1×1 convolutional layer, followed by a Sigmoid activation, outputs a single-channel edge probability map; the 32-channel features output by the third layer are used for edge feature injection.
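The feature alignment of step 2.2 and the channel plan of the four-layer subnetwork can be made concrete with a small NumPy sketch. Several details are assumptions not stated in the claims: nearest-neighbour upsampling, the omission of bias and normalization in the 1×1 convolution, and a hypothetical 256-channel input to the subnetwork. The function names are illustrative only.

```python
import numpy as np


def fuse_features(feat8, feat16, weight):
    """Align a 1/16-scale map to 1/8 scale, concatenate, and apply a 1x1 convolution.

    feat8:  (C8, H, W) features downsampled 8x
    feat16: (C16, H/2, W/2) features downsampled 16x
    weight: (C_out, C8 + C16) kernel of the 1x1 convolution (bias omitted)
    """
    # Nearest-neighbour 2x upsampling (assumed; the interpolation is not specified).
    up = feat16.repeat(2, axis=1).repeat(2, axis=2)
    stacked = np.concatenate([feat8, up], axis=0)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    return np.einsum("oc,chw->ohw", weight, stacked)


def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution layer."""
    return c_in * c_out * k * k + c_out


def edge_subnetwork_params(c_in=256):
    """Parameter count for the 3x3 -> 128, 3x3 -> 64, 3x3 -> 32, 1x1 -> 1 channel plan."""
    plan = [(c_in, 128, 3), (128, 64, 3), (64, 32, 3), (32, 1, 1)]
    return sum(conv_params(*layer) for layer in plan)
```

For a 512×512 input tile, the 8x and 16x feature maps are 64×64 and 32×32, so `fuse_features` returns a 64×64 map with `C_out` channels.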
5. The method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+ according to claim 1, characterized in that the joint loss function mentioned in step 3 is:
L = L_seg + α·L_edge + β·L_boundary
where L_seg is the cross-entropy segmentation loss, L_edge is the binary cross-entropy edge loss, and L_boundary is the boundary-aware loss; α and β are balance factors, with values ranging from 0.2 to 0.5 and from 0.1 to 0.3, respectively.
6. The method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+ according to claim 5, characterized in that the boundary-aware loss employs the Boundary Difference over Union Loss, which guides boundary region segmentation by calculating the ratio of the difference set to the union set between the predicted and ground-truth regions. The calculation formula is as follows: where P is the predicted segmentation region, G is the true segmentation region, and B is the boundary region.
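A possible reading of the joint loss is sketched below in NumPy. It assumes a weighted sum of the form L_seg + α·L_edge + β·L_boundary, a two-class setting in which the cross-entropy segmentation loss reduces to binary cross-entropy, and a soft difference-over-union restricted to the boundary region B. The exact formula of this application is not reproduced in the text, so the `boundary_dou` form and the default values of `alpha` and `beta` (chosen inside the stated ranges) are assumptions.

```python
import numpy as np


def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)).mean())


def boundary_dou(prob, target, boundary, eps=1e-7):
    """Soft (union - intersection) / union of P and G, restricted to boundary mask B."""
    p = prob * boundary
    g = target * boundary
    inter = (p * g).sum()
    union = p.sum() + g.sum() - inter
    return float((union - inter) / (union + eps))


def joint_loss(seg_prob, seg_gt, edge_prob, edge_gt, boundary, alpha=0.3, beta=0.2):
    """Assumed form: L_seg + alpha * L_edge + beta * L_boundary."""
    return (bce(seg_prob, seg_gt)
            + alpha * bce(edge_prob, edge_gt)
            + beta * boundary_dou(seg_prob, seg_gt, boundary))
```

Under this reading the boundary term is 0 when prediction and ground truth agree everywhere inside B and approaches 1 when they do not overlap there, so it penalizes exactly the boundary errors that the pixel-wise segmentation loss weights only lightly.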
7. The method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+ according to claim 1, characterized in that the sliding window prediction in step 4 uses a sliding window with 50% overlap, and the prediction results of the overlapping areas are fused by Gaussian weighting; the post-processing includes conditional random field optimization, small patch removal, and hole filling.
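The overlapping sliding-window prediction with Gaussian-weighted fusion can be sketched as follows. This is a minimal illustration under stated assumptions: `predict_fn` is a hypothetical stand-in for the trained network, the Gaussian width (`sigma_ratio`) is not given in the text, and the image is assumed to be at least one tile in each dimension.

```python
import numpy as np


def gaussian_window(size, sigma_ratio=0.25):
    """2-D Gaussian weight map that peaks at the tile center."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-0.5 * (ax / (sigma_ratio * size)) ** 2)
    return np.outer(g, g)


def sliding_window_predict(image, predict_fn, tile=512, overlap=0.5):
    """Fuse per-tile probability maps with Gaussian weights.

    image:      (H, W, C) array with H, W >= tile
    predict_fn: maps a (tile, tile, C) patch to a (tile, tile) probability map
    """
    h, w = image.shape[:2]
    stride = int(tile * (1.0 - overlap))
    # Regular grid plus a final window flush with the image edge.
    rows = sorted(set(list(range(0, h - tile + 1, stride)) + [h - tile]))
    cols = sorted(set(list(range(0, w - tile + 1, stride)) + [w - tile]))
    weight = gaussian_window(tile)
    acc = np.zeros((h, w))
    norm = np.zeros((h, w))
    for r in rows:
        for c in cols:
            prob = predict_fn(image[r:r + tile, c:c + tile])
            acc[r:r + tile, c:c + tile] += prob * weight
            norm[r:r + tile, c:c + tile] += weight
    # Every pixel is covered by at least one window, so norm is strictly positive.
    return acc / norm
```

Because the weights fall off toward tile edges, predictions near a tile border, where the network sees the least context, contribute less to the fused result than predictions from a neighboring tile in which the same pixel lies near the center.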
8. The method for refined extraction of winter wheat planting areas based on Gaofen-2 imagery and edge-enhanced DeepLabV3+ according to claim 1, characterized in that the standard DeepLabV3+ includes an encoder, an ASPP module, and a decoder; the encoder uses ResNet-50, ResNet-101, Xception, or MobileNetV2 as the backbone network; and the ASPP module contains multiple dilated convolutional branches with different dilation rates.
9. A computer-readable storage medium having a computer program stored thereon that, when executed by a processor, implements the method as claimed in any one of claims 1 to 8.
10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the method as described in any one of claims 1 to 8.