Method for extracting farmland parcels in arid regions of Africa based on MAA-BCNet
By combining a dual-branch encoder with multi-scale axial attention in the MAA-BCNet model, the method addresses blurred boundaries and misclassification in farmland plot extraction in arid regions of Africa, achieving high-precision extraction at lower monitoring cost.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HOHAI UNIV
- Filing Date
- 2025-06-30
- Publication Date
- 2026-06-23
AI Technical Summary
In arid regions of Africa, the edges of farmland plots in medium-resolution remote sensing images are often blurred or broken against a high-contrast desert background. Traditional methods lack sufficient extraction accuracy and explicit constraints on farmland boundaries, leading to misjudgments or missed detections.
A method based on MAA-BCNet, combined with a dual-branch encoder structure and multi-scale axial attention, is used to accurately extract farmland plots through Sentinel-2 image feature optimization and multi-level boundary constraints.
It improves the accuracy of farmland plot extraction, reduces computational workload and monitoring costs, adapts to the complex geographical environment of arid regions in Africa, and enhances edge constraint capabilities.
Figure CN120808145B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of satellite remote sensing image recognition technology, specifically to a method for extracting farmland plots in arid regions of Africa based on MAA-BCNet.
Background Technology
[0002] Faced with the severe challenges of global climate change, water scarcity, and population growth, accurate monitoring of arable land in arid regions of Africa is crucial for ensuring food security. Satellite remote sensing, with its ability to rapidly acquire large-scale surface information, has become a core tool for arable land monitoring. However, its application in arid regions of Africa faces significant technical bottlenecks: against high-contrast desert backgrounds, the edges of arable land plots in medium-resolution remote sensing images are prone to blurring and fragmentation, and the complex terrain and irregular distribution of arable land leave traditional methods with insufficient accuracy. Existing techniques such as unsupervised classification and edge detection suffer from low accuracy because of limited resolution and noise interference. Object-oriented analysis and machine learning methods (such as SVM and random forests) are susceptible to salt-and-pepper noise or fit edges poorly when dealing with heterogeneous agricultural landscapes. Deep learning-based encoder-decoder models improve on these approaches, but single encoders struggle to accommodate multi-scale features, fusion encoders suffer from information loss, and their ability to distinguish the spectral differences between arable land and similar features (such as grassland and woodland) against desert backgrounds is limited. Furthermore, existing models lack explicit constraints on arable land boundaries, leading to misclassification or missed detection at irregular plot edges. To address these issues, there is an urgent need for a remote sensing farmland extraction technology that effectively integrates multi-scale features and strengthens edge constraints, is adapted to the complex geographical environment and farmland morphology of arid regions in Africa, improves the pixel-level accuracy of farmland plot extraction, and provides reliable technical support for large-scale farmland monitoring.
Summary of the Invention
[0003] The purpose of this invention is to provide a method for extracting farmland plots in arid regions of Africa based on MAA-BCNet. This method can integrate multi-scale axial attention and multi-level boundary constraints to achieve accurate extraction of farmland plots in arid regions of Africa.
[0004] To achieve the above functions, this invention designs a method for extracting arable land parcels in arid regions of Africa based on MAA-BCNet, performing the following steps S1-S5 to complete the extraction of arable land parcels in the target area:
[0005] Step S1: Acquire Sentinel-2 L2A-level optical satellite imagery of the region to be identified via the GEE platform;
[0006] Step S2: Based on band and spectral index thresholds, initially extract the extent of cultivated land from the Sentinel-2 optical satellite imagery. Based on visual interpretation of the extracted areas, label cultivated land and non-cultivated land as binary classes and generate sample points.
[0007] Step S3: Extract spectral features from the Sentinel-2 optical satellite imagery obtained in Step S2, rank the spectral features by importance and then randomly combine them, evaluate the candidate combinations to find the best feature combination, and perform principal component analysis to obtain a dimensionality-reduced three-band image of the Sentinel-2 optical satellite imagery, which is used as input data.
[0008] Step S4: Build the MAA-BCNet model based on the dual-branch encoder structure. Divide the Sentinel-2 optical satellite imagery into training and testing areas. Use the three-band imagery and corresponding samples obtained after dimensionality reduction as input to the MAA-BCNet model. Train the MAA-BCNet model to obtain the trained MAA-BCNet model.
[0009] Step S5: For the target area, apply the trained MAA-BCNet model to extract the cultivated land plots in the target area and draw a distribution map of the cultivated land plots in the target area.
[0010] As a preferred technical solution of the present invention, the specific steps of step S2 are as follows:
[0011] Step S2.1: For Sentinel-2 optical satellite imagery, the blue, green, and red bands B2, B3, and B4, the shortwave-infrared bands B11 and B12, and the red-edge bands B5, B6, and B7 are used as the bands for initially extracting the cultivated land area. The spatial resolution is uniformly resampled to 10 m, and the relevant vegetation indices are calculated before band synthesis is performed. The cultivated land area is initially extracted based on the vegetation spectral threshold.
[0012] Step S2.2: Visually interpret the Sentinel-2 optical satellite imagery using PIE and ArcGIS software, create binary label data for cultivated land and non-cultivated land, and automatically generate sample points from the label data using ArcGIS.
[0013] As a preferred technical solution of the present invention, the specific method of step S3 is as follows:
[0014] After ranking the importance of spectral features of Sentinel-2 optical satellite imagery using the out-of-bag data error of the RF algorithm, the features are randomly combined. Each random combination of spectral features is taken as a subset, and the TOP-K method is introduced to cross-validate and evaluate each subset of spectral features to obtain the optimal feature combination. Principal component analysis is used to reduce the dimensionality of the optimal feature combination, retaining the three principal components with a cumulative contribution rate of over 98%, and thus obtaining the feature map.
[0015] As a preferred technical solution of the present invention: the MAA-BCNet model built in step S4 is based on a dual-branch encoder structure, including two branches. One branch is a backbone network encoder composed of a convolutional neural network VGG16, which extracts local features at different scales in the feature map; the other branch is an RMT encoder composed of Manhattan self-attention and Vision Transformer, which extracts global features at different scales in the feature map.
[0016] The local features extracted by the backbone network encoder and the global features extracted by the RMT encoder are input into the feature fusion module for feature fusion. The feature map after feature fusion is input into the decoder, and the multi-scale feature map is upsampled through skip connections.
[0017] As a preferred embodiment of the present invention: the feature fusion module sequentially includes a convolutional layer, an axial attention module, an edge detection module, a pooling layer, a convolutional layer, and a ReLU activation function;
[0018] The axial attention module uses horizontal and vertical axis convolutions to capture long-range dependencies in the spatial dimension and uses residual connections for the output; the edge detection module uses four different Sobel filters to supplement the edge details after the axial attention feature enhancement.
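As a rough illustration of the axial idea described above, the following sketch slides a 1-D kernel along each row (horizontal axis), then along each column (vertical axis), and adds the result back to the input as a residual. The kernel weights, the zero padding, the horizontal-then-vertical ordering, the single-channel grid, and all function names are assumptions made for illustration; the patent does not specify them, and the actual module operates on learned multi-channel feature maps.

```python
from typing import List

Grid = List[List[float]]


def conv_rows(x: Grid, kernel: List[float]) -> Grid:
    """Slide a 1-D kernel along every row (zero padding, same output size)."""
    radius = len(kernel) // 2
    height, width = len(x), len(x[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                jj = j + k - radius
                if 0 <= jj < width:  # positions outside the grid count as zero
                    acc += weight * x[i][jj]
            out[i][j] = acc
    return out


def transpose(x: Grid) -> Grid:
    return [list(col) for col in zip(*x)]


def conv_cols(x: Grid, kernel: List[float]) -> Grid:
    """Slide the same 1-D kernel along every column."""
    return transpose(conv_rows(transpose(x), kernel))


def axial_block(x: Grid, kernel: List[float]) -> Grid:
    """Horizontal pass, then vertical pass, then residual addition (assumed order)."""
    enhanced = conv_cols(conv_rows(x, kernel), kernel)
    return [[a + b for a, b in zip(row_x, row_e)] for row_x, row_e in zip(x, enhanced)]


# A single bright pixel spreads along both axes, while the residual keeps the
# original value on top of the enhanced response.
impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
result = axial_block(impulse, [1.0, 1.0, 1.0])
```

Two 1-D passes of length k touch 2k weights per pixel instead of the k×k weights of a full 2-D kernel, which is the usual motivation for axial designs.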
[0019] As a preferred embodiment of the present invention, the training process of the MAA-BCNet model in step S4 is as follows:
[0020] Step S4.1: Select the training area and the test area on the feature map obtained in step S3, and crop the images and labels into non-overlapping image blocks of size 256×256;
[0021] Step S4.2: Perform positive and negative sample balancing so that the ratio of positive to negative samples is approximately 1:1. After data augmentation using geometric transformation, the data is randomly divided into training and validation sets with a ratio of 8:2.
[0022] Step S4.3: Train the MAA-BCNet model using the Dice Loss + Focal Loss dual loss function. The Dice Loss + Focal Loss dual loss function is as follows:
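[0023] Dice Loss (standard smoothed form, consistent with the symbols defined below): $$\text{Dice Loss} = 1 - \frac{2\sum_{i} p_i g_i + \varepsilon}{\sum_{i} p_i + \sum_{i} g_i + \varepsilon}$$
[0024] $$\text{Focal Loss} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
[0025] Where $\alpha_t$ is the balance factor; $p_t$ is the predicted probability, equal to $p$ for the positive class and $1-p$ for the negative class; $\gamma$ is the modulating factor; $p_i$ is the probability predicted by the model for the i-th pixel; $g_i$ is the true label of the i-th pixel, which takes the value 0 or 1; and $\varepsilon$ is a smoothing term that prevents the denominator from being zero;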
[0026] Step S4.4: The MAA-BCNet model training process uses AdamW as the optimizer, with an initial learning rate of 0.0001, cosine annealing for learning rate reduction, a batch size of 8, and 100 training iterations.
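The learning-rate behaviour implied by step S4.4 can be sketched in closed form. The snippet below assumes that the "100 training iterations" are epochs, that the rate is updated once per epoch, and that it decays to a floor of zero; none of these details is stated in the text. The function name `cosine_annealing_lr` is hypothetical. In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` produces a schedule of this shape.

```python
import math


def cosine_annealing_lr(epoch: int, total_epochs: int = 100,
                        base_lr: float = 1e-4, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given epoch.

    base_lr=1e-4 and total_epochs=100 come from step S4.4;
    min_lr=0.0 is an assumption.
    """
    cos_term = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cos_term


# Learning rate for every epoch boundary from 0 to 100.
schedule = [cosine_annealing_lr(epoch) for epoch in range(101)]
```

The rate starts at the initial value, reaches half of it at the midpoint, and falls smoothly toward the floor at the end of training.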
[0027] Beneficial effects: Compared with the prior art, the advantages of the present invention include:
[0028] 1. This invention utilizes Sentinel-2 imagery for feature optimization and dimensionality reduction, proposes a dual-branch encoder structure, and introduces multi-scale axial attention and multi-level boundary constraints for farmland extraction, thereby reducing computational load while improving extraction accuracy;
[0029] 2. This invention utilizes Sentinel-2 image data to rapidly acquire multi-scale data on the distribution of cultivated land in arid regions, reducing the consumption of manpower and resources and greatly saving monitoring costs.
Attached Figure Description
[0030] Figure 1 is a flowchart of a method for extracting farmland plots in arid regions of Africa based on MAA-BCNet, according to an embodiment of the present invention.
[0031] Figure 2 is a schematic diagram of the MAA-BCNet model provided according to an embodiment of the present invention.
Detailed Implementation
[0032] The present invention will be further described below with reference to the accompanying drawings. The following embodiments are only used to more clearly illustrate the technical solution of the present invention, and should not be used to limit the scope of protection of the present invention.
[0033] Referring to Figure 1, the method for extracting arable land parcels in arid regions of Africa based on MAA-BCNet provided in this embodiment of the invention performs the following steps S1-S5 to complete the extraction of cultivated land plots in the target area:
[0034] Step S1: Obtain Sentinel-2 L2A-level optical satellite imagery of the region to be identified using the GEE (Google Earth Engine) platform;
[0035] Step S2: Based on band and spectral index thresholds, initially extract the cultivated land area from the Sentinel-2 optical satellite imagery. Based on visual interpretation of the extracted area, label cultivated land and non-cultivated land as binary classes and generate sample points.
[0036] The specific steps of step S2 are as follows:
[0037] Step S2.1: For Sentinel-2 optical satellite imagery, the blue, green, and red bands B2, B3, and B4, the shortwave-infrared bands B11 and B12, and the red-edge bands B5, B6, and B7 are used as the bands for initially extracting the cultivated land area. The spatial resolution is uniformly resampled to 10 m, and the relevant vegetation indices are calculated before band synthesis is performed. The cultivated land area is initially extracted based on the vegetation spectral threshold.
[0038] Step S2.2: Visually interpret the Sentinel-2 optical satellite imagery using PIE and ArcGIS software, create binary label data for cultivated land and non-cultivated land, and automatically generate sample points from the label data using ArcGIS.
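The threshold-based initial extraction in step S2.1 can be pictured with a normalized-difference index. The text does not name the vegetation indices or the threshold, so the sketch below makes assumptions throughout: it pairs the red-edge band B7 with the red band B4, uses a hypothetical threshold of 0.3, and runs on invented reflectance values.

```python
from typing import List

Band = List[List[float]]


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when both inputs are zero."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def candidate_mask(band_high: Band, band_low: Band, threshold: float) -> List[List[int]]:
    """Mark pixels whose index exceeds the threshold as candidate cultivated land (1)."""
    return [
        [1 if normalized_difference(hi, lo) > threshold else 0
         for hi, lo in zip(row_hi, row_lo)]
        for row_hi, row_lo in zip(band_high, band_low)
    ]


# Invented 2x2 surface reflectance values, for illustration only.
b7_red_edge = [[0.40, 0.30], [0.25, 0.45]]
b4_red = [[0.10, 0.28], [0.24, 0.09]]

# 0.3 is a hypothetical threshold, not a value taken from the patent.
mask = candidate_mask(b7_red_edge, b4_red, threshold=0.3)
```

Vegetated pixels reflect strongly in the red-edge region and weakly in red, so they score high on this index, while bare desert soil scores near zero. The mask is only a first pass; step S2.2 refines it through visual interpretation.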
[0039] Step S3: Extract spectral features from the Sentinel-2 optical satellite imagery obtained in Step S2, rank the importance of the spectral features, randomly combine them, evaluate the best feature combination, and perform principal component analysis to obtain the three-band imagery after dimensionality reduction of the Sentinel-2 optical satellite imagery, which is used as input data to maximize the use of the spectral features of the satellite imagery while reducing computational costs.
[0040] The specific method is as follows:
[0041] After ranking the importance of spectral features of Sentinel-2 optical satellite imagery using the out-of-bag error of the Random Forest (RF) algorithm, the features are randomly combined. Each random combination of spectral features is taken as a subset, and the TOP-K method is introduced to cross-validate and evaluate each subset of spectral features to obtain the optimal feature combination. Principal Component Analysis (PCA) is used to reduce the dimensionality of the optimal feature combination, retaining the three principal components with a cumulative contribution rate of over 98%, thus obtaining the feature map.
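The "cumulative contribution rate" criterion used to keep three principal components can be checked with a few lines of arithmetic. The sketch below assumes the PCA eigenvalues (component variances) are already available. The eigenvalues shown are invented for illustration, and the function name `components_for_contribution` is hypothetical.

```python
from typing import List, Tuple


def components_for_contribution(eigenvalues: List[float],
                                target: float = 0.98) -> Tuple[int, List[float]]:
    """Return how many leading components reach the target cumulative
    contribution rate, together with the cumulative rates themselves."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    rates = []
    for value in ordered:
        cumulative += value
        rates.append(cumulative / total)
    # Fall back to all components if rounding keeps the last rate below target.
    count = next((i + 1 for i, rate in enumerate(rates) if rate >= target), len(ordered))
    return count, rates


# Invented eigenvalues for a five-feature combination (illustration only).
count, rates = components_for_contribution([6.2, 2.1, 1.55, 0.1, 0.05])
```

With these example values the first three components carry about 98.5% of the variance, so three bands would be retained as model input.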
[0042] Step S4: Build the MAA-BCNet (Multi-scale Axial Attention and Boundary-Constrained U-Net with Dual Encoders) model based on the dual-branch encoder structure. Divide the Sentinel-2 optical satellite imagery into training and testing areas. Use the three-band images and corresponding samples obtained after dimensionality reduction as input to the MAA-BCNet model to train the MAA-BCNet model and obtain the trained MAA-BCNet model.
[0043] Referring to Figure 2, the MAA-BCNet model is built using the PyTorch deep learning library. Based on U-Net++, this model employs a dual-branch encoder structure for farmland feature extraction. Farmland plot segmentation results are obtained through a feature fusion module and a multi-scale decoder. The dual-branch encoder structure includes two branches. One branch is a backbone network encoder composed of a VGG16 convolutional neural network, which acts as a local convolutional encoder to provide local features at different scales for subsequent feature fusion. The other branch is an RMT (Retentive Networks Meet Vision Transformers) encoder composed of Manhattan self-attention and Vision Transformer, which works in parallel with the backbone network to extract global features at different scales from the feature map.
[0044] The local features extracted by the backbone network encoder and the global features extracted by the RMT encoder are input into the feature fusion module for feature fusion to generate an attention map with more prominent farmland features; the feature map after feature fusion is input into the decoder, and upsampled multi-scale feature maps are obtained through skip connections.
[0045] The feature fusion module consists of convolutional layers, axial attention modules (CBAMs), edge detection modules (ED), pooling layers, convolutional layers, and ReLU activation functions.
[0046] The Axial Attention Module (CBAMs) replaces the spatial attention mechanism of the traditional CBAM with axial attention. It utilizes horizontal and vertical axis convolutions to capture long-range dependencies in the spatial dimension, enhancing the spectral and edge representation of farmland in the corresponding arid background. Residual connections replace the traditional Sigmoid function output. In scenes with low contrast between farmland and background, this design preserves original features and prevents the model from losing important semantic information due to excessive focus on edges. The Edge Detection Module (ED) applies four different Sobel filters to supplement edge details that axial attention fails to capture in arid backgrounds, forming a cascaded optimization process from semantic enhancement to spatial localization and then to edge refinement. The enhanced feature maps are fused and input into the decoder, where multi-scale feature maps are upsampled via skip connections.
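The edge detection module is described only as applying "four different Sobel filters". One plausible reading, sketched below, uses the horizontal, vertical, and two diagonal 3×3 Sobel kernels and keeps the strongest absolute response per pixel. The choice of these four orientations, the max-combination rule, and the names `SOBEL_KERNELS`, `correlate_valid`, and `edge_strength` are assumptions, not details from the patent.

```python
from typing import Dict, List

Grid = List[List[float]]

# Four 3x3 Sobel-style kernels (assumed orientations: 0, 90, 45, 135 degrees).
SOBEL_KERNELS: Dict[str, Grid] = {
    "0deg": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    "90deg": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    "45deg": [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],
    "135deg": [[-2, -1, 0], [-1, 0, 1], [0, 1, 2]],
}


def correlate_valid(img: Grid, kernel: Grid) -> Grid:
    """Apply a 3x3 kernel at every interior position (no padding)."""
    height, width = len(img), len(img[0])
    return [
        [sum(kernel[a][b] * img[i + a][j + b] for a in range(3) for b in range(3))
         for j in range(width - 2)]
        for i in range(height - 2)
    ]


def edge_strength(img: Grid) -> Grid:
    """Strongest absolute response over the four orientations (assumed rule)."""
    responses = [correlate_valid(img, kernel) for kernel in SOBEL_KERNELS.values()]
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[max(abs(r[i][j]) for r in responses) for j in range(cols)]
            for i in range(rows)]


# A vertical boundary between dark (0) and bright (1) columns.
stripe = [[0, 0, 1, 1] for _ in range(4)]
edges = edge_strength(stripe)
```

Diagonal kernels matter for irregular plots whose boundaries do not align with the image axes. In a network these fixed kernels would typically be applied per channel to the attention-enhanced feature map rather than to a raw image.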
[0047] The training process for the MAA-BCNet model is as follows:
[0048] Step S4.1: Select the training area and the test area on the feature map obtained in step S3, and crop the images and labels into non-overlapping image blocks of size 256×256;
[0049] Step S4.2: Perform positive and negative sample balancing so that the ratio of positive to negative samples is approximately 1:1. After data augmentation using geometric transformation, the data is randomly divided into training and validation sets with a ratio of 8:2.
[0050] Step S4.3: Train the MAA-BCNet model using the Dice Loss + Focal Loss dual loss function. The Dice Loss and Focal Loss loss functions complement each other to solve the problems of classification bias and segmentation accuracy, and improve the segmentation performance of small-scale plots and edge regions; the Dice Loss + Focal Loss dual loss function is as follows:
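[0051] Dice Loss (standard smoothed form, consistent with the symbols defined below): $$\text{Dice Loss} = 1 - \frac{2\sum_{i} p_i g_i + \varepsilon}{\sum_{i} p_i + \sum_{i} g_i + \varepsilon}$$
[0052] $$\text{Focal Loss} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
[0053] Where $\alpha_t$ is the balance factor; $p_t$ is the predicted probability, equal to $p$ for the positive class and $1-p$ for the negative class; $\gamma$ is the modulating factor; $p_i$ is the probability predicted by the model for the i-th pixel; $g_i$ is the true label of the i-th pixel, which takes the value 0 or 1; and $\varepsilon$ is a smoothing term that prevents the denominator from being zero;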
[0054] Step S4.4: The MAA-BCNet model training process uses AdamW as the optimizer, with an initial learning rate of 0.0001, cosine annealing for learning rate reduction, a batch size of 8, and 100 training iterations.
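The dual loss of step S4.3 can be sketched on flattened pixel lists using only the symbols defined above. The Dice term uses the common smoothed form with ε in both numerator and denominator. The values α = 0.25 and γ = 2, the use of 1 − α for negative pixels, the averaging of the focal term over pixels, and the equal weighting of the two terms are assumptions, since the text does not give them. In the PyTorch model the same arithmetic would run on tensors.

```python
import math
from typing import List


def dice_loss(probs: List[float], labels: List[int], eps: float = 1e-6) -> float:
    """1 - (2*sum(p_i*g_i) + eps) / (sum(p_i) + sum(g_i) + eps)."""
    intersection = sum(p * g for p, g in zip(probs, labels))
    return 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(labels) + eps)


def focal_loss(probs: List[float], labels: List[int],
               alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Mean over pixels of -alpha_t * (1 - p_t)**gamma * log(p_t).

    alpha and gamma are assumed defaults, not values from the patent.
    """
    total = 0.0
    for p, g in zip(probs, labels):
        p_t = p if g == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if g == 1 else 1.0 - alpha
        p_t = min(max(p_t, 1e-12), 1.0)         # guard against log(0)
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def combined_loss(probs: List[float], labels: List[int]) -> float:
    """Dice Loss + Focal Loss with equal weights (assumed weighting)."""
    return dice_loss(probs, labels) + focal_loss(probs, labels)


labels = [1, 0, 1]
confident = combined_loss([0.9, 0.1, 0.95], labels)
uncertain = combined_loss([0.6, 0.4, 0.7], labels)
```

The Dice term measures region overlap and is insensitive to the large non-cultivated background. The focal term down-weights easy pixels through the factor (1 − p_t)^γ, so hard pixels near plot edges dominate the gradient.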
[0055] Step S5: For the target area, apply the trained MAA-BCNet model to extract the cultivated land plots in the target area and draw a distribution map of the cultivated land plots in the target area.
[0056] The embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the above embodiments. Within the scope of knowledge possessed by those skilled in the art, various changes can be made without departing from the spirit of the present invention.
Claims
1. A method for extracting arable land parcels in arid regions of Africa based on MAA-BCNet, characterized in that the following steps S1-S5 are performed to complete the extraction of cultivated land parcels in the target area:
Step S1: Acquire Sentinel-2 L2A-level optical satellite imagery of the region to be identified using the GEE platform;
Step S2: Based on band and spectral index thresholds, initially extract the cultivated land area from the Sentinel-2 optical satellite imagery; based on visual interpretation of the extracted area, label cultivated land and non-cultivated land as binary classes and generate sample points;
Step S3: Extract spectral features from the Sentinel-2 optical satellite imagery obtained in Step S2, rank the spectral features by importance and then randomly combine them, evaluate the candidate combinations to find the best feature combination, and perform principal component analysis to obtain a dimensionality-reduced three-band image of the Sentinel-2 optical satellite imagery, which is then used as the input imagery;
Step S4: Build the MAA-BCNet model based on the dual-branch encoder structure, the full name of the MAA-BCNet model being Multi-scale Axial Attention and Boundary-Constrained U-Net with Dual Encoders. Divide the Sentinel-2 optical satellite imagery into training and testing areas. Use the three-band images and corresponding samples obtained after dimensionality reduction as input to the MAA-BCNet model to train the MAA-BCNet model and obtain the trained MAA-BCNet model. The MAA-BCNet model built in step S4 is based on a dual-branch encoder structure, which includes two branches. One branch is a backbone network encoder composed of a convolutional neural network VGG16, which extracts local features at different scales in the feature map. The other branch is an RMT encoder composed of Manhattan self-attention and Vision Transformer, which extracts global features at different scales in the feature map. The local features extracted by the backbone network encoder and the global features extracted by the RMT encoder are input into the feature fusion module for feature fusion. The feature map after feature fusion is input into the decoder, and the multi-scale feature map is upsampled through skip connections. The feature fusion module consists of a convolutional layer, an axial attention module, an edge detection module, a pooling layer, another convolutional layer, and a ReLU activation function. The axial attention module uses horizontal and vertical axis convolutions to capture long-range dependencies in the spatial dimension and uses residual connections for the output; the edge detection module uses four different Sobel filters to supplement the edge details after the axial attention feature enhancement;
Step S5: For the target area, apply the trained MAA-BCNet model to extract the cultivated land plots in the target area and draw a distribution map of the cultivated land plots in the target area.
2. The method for extracting arable land parcels in arid African regions based on MAA-BCNet according to claim 1, characterized in that the specific steps of step S2 are as follows:
Step S2.1: For Sentinel-2 optical satellite imagery, the blue, green, and red bands B2, B3, and B4, the shortwave-infrared bands B11 and B12, and the red-edge bands B5, B6, and B7 are used as the bands for initially extracting the cultivated land area. The spatial resolution is uniformly resampled to 10 m, and the relevant vegetation indices are calculated before band synthesis is performed. The cultivated land area is initially extracted based on the vegetation spectral threshold;
Step S2.2: Visually interpret the Sentinel-2 optical satellite imagery using PIE and ArcGIS software, create binary label data for cultivated land and non-cultivated land, and automatically generate sample points from the label data using ArcGIS.
3. The method for extracting arable land plots in arid African regions based on MAA-BCNet according to claim 1, characterized in that the specific method for step S3 is as follows: After ranking the importance of spectral features of Sentinel-2 optical satellite imagery using the out-of-bag error of the RF algorithm, the features are randomly combined. Each random combination of spectral features is taken as a subset, and the TOP-K method is introduced to cross-validate and evaluate each subset of spectral features to obtain the optimal feature combination. Principal component analysis is used to reduce the dimensionality of the optimal feature combination, retaining the three principal components with a cumulative contribution rate of over 98%, and thus obtaining the input data.
4. The method for extracting arable land plots in arid African regions based on MAA-BCNet according to claim 1, characterized in that the training process for the MAA-BCNet model in step S4 is as follows:
Step S4.1: Select the training area and the test area on the feature map obtained in step S3, and crop the images and labels into non-overlapping image blocks of size 256×256;
Step S4.2: Perform positive and negative sample balancing to make the ratio of positive to negative samples 1:1. After data augmentation using geometric transformation, the samples are randomly divided into training and validation sets with a ratio of 8:2;
Step S4.3: Train the MAA-BCNet model using the Dice Loss + Focal Loss dual loss function. The Dice Loss + Focal Loss dual loss function is as follows (the Dice Loss is given in the standard smoothed form, consistent with the symbols defined below):
$$\text{Dice Loss} = 1 - \frac{2\sum_{i} p_i g_i + \varepsilon}{\sum_{i} p_i + \sum_{i} g_i + \varepsilon}$$
$$\text{Focal Loss} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$$
where $\alpha_t$ is the balance factor; $p_t$ is the predicted probability, equal to $p$ for the positive class and $1-p$ for the negative class; $\gamma$ is the modulating factor; $p_i$ is the probability predicted by the model for the i-th pixel; $g_i$ is the true label of the i-th pixel, with a value of 0 or 1; and $\varepsilon$ is a smoothing term that prevents the denominator from being zero;
Step S4.4: The MAA-BCNet model training process uses AdamW as the optimizer, with an initial learning rate of 0.0001, cosine annealing for learning rate reduction, a batch size of 8, and 100 training iterations.