A farmland plot boundary instance segmentation method, system, device and medium

By using a multi-level teacher distillation method to segment farmland boundaries, the problem of insufficient segmentation accuracy in complex scenarios in existing technologies is solved, achieving high-precision and efficient farmland boundary segmentation that is adaptable to farmland environments with different resolutions and geometric complexities.

CN121962618B · Active · Publication Date: 2026-06-26 · SHANDONG UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHANDONG UNIV OF SCI & TECH
Filing Date
2026-04-01
Publication Date
2026-06-26


Abstract

The present application relates to the technical field of image instance segmentation, in particular to a farmland plot boundary instance segmentation method, system, device and medium; the method of the present application firstly obtains a training subset by assigning resolution and geometric complexity two-dimensional condition labels to multi-source satellite remote sensing images and carrying out double-level constraint sampling; then a teacher model containing a condition feature adaptive layer and a boundary perception auxiliary branch is constructed, which can allow the model to adaptively adjust the feature map according to the two-dimensional labels and strengthen the perception and representation ability of the farmland plot boundary; then the knowledge distillation of source domain training and target domain migration is used to migrate the complex scene adaptation ability and high-precision boundary segmentation ability of the teacher model to a lightweight student model; finally, the distilled student model is applied to the processing of the satellite remote sensing images to be segmented, effectively making up for the poor adaptability of the traditional model to complex scenes and the weak boundary perception ability, and effectively improving the instance segmentation accuracy of the farmland plot boundary.

Description

Technical Field

[0001] This invention relates to the field of image instance segmentation technology, specifically to a method, system, device, and medium for segmenting farmland plot boundaries.

Background Technology

[0002] With the development of high-resolution Earth observation platforms and the widespread application of agricultural remote sensing, multi-source, multi-resolution satellite remote sensing imagery is widely used for farmland plot identification and boundary extraction. The results of farmland plot boundary segmentation are not only fundamental data for plot ownership confirmation and plot-scale yield estimation, but also crucial support for precision agriculture operations such as precision fertilization, agricultural machinery operation planning, and crop condition monitoring. Therefore, how to achieve stable and precise farmland plot boundary segmentation on large-scale, multi-resolution, and cross-regional remote sensing imagery has become a hot topic in the fields of intelligent remote sensing interpretation and precision agriculture.

[0003] In recent years, with the development of deep learning, methods such as Mask R-CNN and YOLO series instance segmentation networks have been gradually introduced into farmland segmentation tasks. Some studies have trained unified instance segmentation models based on large-scale datasets containing farmland samples from multiple countries and at multiple resolutions worldwide, achieving certain results in the automatic segmentation of farmland boundaries.

[0004] However, existing methods typically train by directly mixing samples of different resolutions and geometries, relying mainly on the network's own representational capabilities to implicitly learn multi-scale and complexity differences. They lack explicit modeling at the data construction and model structure levels, and their boundary accuracy and shape preservation capabilities remain significantly insufficient in extremely high-resolution, low-resolution, or geometrically complex plot scenarios.

Summary of the Invention

[0005] The purpose of this invention is to provide a method, system, device, and medium for segmenting farmland plot boundaries using multi-level teacher distillation.

[0006] The technical solution of this invention is as follows:

[0007] A method for segmenting farmland plot boundaries includes the following operations:

[0008] S1. Acquire multi-source satellite remote sensing images and corresponding farmland plot vector annotation data. Assign two-dimensional conditional labels to each satellite remote sensing image, including resolution level and geometric complexity level. Construct a two-dimensional spatial grid based on resolution level and complexity level. Count the sample size of each grid and allocate corresponding sampling quotas. Perform two-level constraint sampling on the multi-source satellite remote sensing images according to the sampling quotas to obtain the training subset.

[0009] S2. Train the teacher model using the training subset to obtain the trained teacher network; perform knowledge distillation with source domain training and target domain transfer on the trained teacher network and the student model in sequence to obtain the distilled student model. The teacher model includes: the backbone network and feature pyramid neck network of the YOLOv11 network, and a conditional feature adaptation layer located after the backbone network and neck network and before the output layer. The conditional feature adaptation layer is used to adaptively adjust the feature map according to the two-dimensional conditional label, and a boundary-aware auxiliary branch is set in the output layer to output the boundary mask map and boundary distance map. The student model is a lightweight YOLOv11 network with fewer network layers and channels than the YOLOv11 network.

[0010] S3. Use the distilled student model to process the satellite remote sensing image to be segmented to obtain the segmentation results of farmland plot instances.

[0011] The two-level constraint sampling in S1 includes: selecting all samples for grids with no more than the corresponding sampling quota; performing K-means clustering on the representation vectors of all samples for grids with more than the corresponding sampling quota to obtain several clusters; within each cluster, clustering is performed according to the geographical region to which the samples belong, and redundancy removal is performed based on the intra-cluster similarity threshold; according to the sampling quota of the grid, a corresponding number of samples are selected from the redundancy-removed candidate samples in ascending order of distance from the cluster center as the selected training samples for the grid.

[0012] The geometric complexity level in S1 is obtained as follows: based on the vector annotation data of farmland plots in the satellite remote sensing images, calculate the geometric complexity features to form a comprehensive complexity index, and classify the comprehensive complexity index to obtain the geometric complexity level. The geometric complexity features include at least: plot density, mean perimeter-to-area ratio, compactness, and boundary length density. After each geometric complexity feature is normalized, a weighted sum is taken to obtain the comprehensive complexity index.

[0013] In S2, adaptive adjustments include resolution adaptation and complexity adaptation. Resolution adaptation is used to enhance the boundaries or structure of the corresponding feature map according to the resolution level in the two-dimensional conditional label, resulting in a resolution-adapted feature map. The feature map is the output of the backbone network and the neck network of the feature pyramid. Complexity adaptation is used to enhance the boundaries of the corresponding feature map or resolution-adapted feature map according to the geometric complexity level in the two-dimensional conditional label, resulting in a complexity-adapted feature map.

[0014] In the resolution adaptation operation, for high-resolution images, the feature maps corresponding to the high-resolution level in the two-dimensional conditional label are processed for boundary localization to obtain a boundary confidence map representing the boundary region. Based on the boundary confidence map, the boundary direction is estimated within the boundary region, and the boundary direction is discretized into several direction categories to obtain a direction category map. Based on the direction category map, multiple sets of directional convolution, attention processing, or weighted fusion of multiple sets of directional convolution and attention processing are performed on the boundary region to obtain a resolution adaptation feature map. The high-resolution image is the highest level of resolution in the two-dimensional conditional label.

[0015] When training the teacher model in S2, a boundary perception auxiliary branch output loss is applied to the boundary perception auxiliary branch output. The boundary perception auxiliary branch output loss is a weighted sum of the bidirectional contour transport loss and the hierarchical ranking distance field loss. The bidirectional contour transport loss is obtained based on the shortest distance from the predicted boundary to the true boundary and the shortest distance from the true boundary to the predicted boundary. The hierarchical ranking distance field loss is obtained based on the point state zeroing loss and the hierarchical ranking loss.

[0016] In the knowledge distillation process of S2, for each training sample, the teacher model and the student model generate detection category output, instance mask output, and boundary information output respectively, and construct a distillation training objective function formed by weighting the category prediction distillation loss, mask distillation loss, and boundary feature distillation loss to train the distilled student model.

[0017] A farmland plot boundary instance segmentation system, used to implement the above-mentioned farmland plot boundary instance segmentation method, includes:

[0018] The training subset generation module is used to acquire multi-source satellite remote sensing images and corresponding farmland plot vector annotation data. Each satellite remote sensing image is assigned a two-dimensional conditional label, including resolution level and geometric complexity level. A two-dimensional spatial grid is constructed based on the resolution level and complexity level. The sample size of each grid is counted and a corresponding sampling quota is allocated. The multi-source satellite remote sensing images are sampled under two-level constraints according to the sampling quota to obtain the training subset.

[0019] The distilled student model generation module is used to train the teacher model using the training subset to obtain the trained teacher network. Knowledge distillation, involving source domain training and target domain transfer, is then performed on the trained teacher network and the student model sequentially to obtain the distilled student model. The teacher model includes: the backbone network and feature pyramid neck network of the YOLOv11 network, and a conditional feature adaptation layer located after the backbone network and neck network and before the output layer. The conditional feature adaptation layer is used to adaptively adjust the feature map based on the two-dimensional conditional labels, and a boundary-aware auxiliary branch is set in the output layer to output boundary mask maps and boundary distance maps. The student model is a lightweight YOLOv11 network with fewer layers and channels than the standard YOLOv11 network.

[0020] The farmland plot instance segmentation result generation module is used to process the satellite remote sensing image to be segmented using the distilled student model to obtain farmland plot instance segmentation results.

[0021] A farmland plot boundary instance segmentation device includes a processor and a memory, wherein the processor implements the above-described farmland plot boundary instance segmentation method when executing a computer program stored in the memory.

[0022] A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the above-described method for segmenting farmland plot boundaries.

[0023] The beneficial effects of this invention are as follows:

[0024] This invention provides a method for segmenting farmland plot boundaries. First, it assigns two-dimensional conditional labels (resolution and geometric complexity) to multi-source satellite remote sensing images and performs two-level constraint sampling to obtain a training subset, which effectively addresses the uneven sample distribution across resolution and geometric complexity scenarios. Then, it constructs a teacher model containing a conditional feature adaptation layer and a boundary-aware auxiliary branch, which allows the model to adaptively adjust feature maps based on the two-dimensional labels and enhances its perception and representation of farmland plot boundaries. Next, through knowledge distillation with source domain training and target domain transfer, the teacher model's complex-scene adaptation capability and high-precision boundary segmentation capability are transferred to a lightweight student model, balancing the model's adaptability to complex scenarios, boundary segmentation accuracy, and lightweight deployment requirements. Finally, the distilled student model is applied to the satellite remote sensing images to be segmented, where it adapts to farmland boundary environments with different resolution and geometric complexity levels. This compensates for the poor adaptability and weak boundary perception of traditional models in such environments, thereby improving the instance segmentation accuracy of farmland plot boundaries. At the same time, the lightweight student model keeps remote sensing image processing efficient and suits practical farmland segmentation applications.

Attached Figure Description

[0025] The solutions and advantages of this application will become clear to those skilled in the art upon reading the following detailed description of preferred embodiments. The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of the invention.

[0026] In the attached diagram:

[0027] Figure 1 is a flowchart illustrating the method of this embodiment.

[0028] Figure 2 is a comparison diagram of the segmentation results of the method of this embodiment and the prior art in different remote sensing scenarios.

Detailed Implementation

[0029] To make the objectives, technical solutions, and advantages of the exemplary embodiments of this application clearer, the technical solutions in the exemplary embodiments of this application are described clearly and completely below. Obviously, the described exemplary embodiments are only some embodiments of this application, and not all embodiments.

[0030] This embodiment provides a multi-level teacher distillation method for segmenting farmland plot boundary instances. Referring to Figure 1, the method includes the following operations:

[0031] S1. Acquire multi-source satellite remote sensing images and corresponding farmland plot vector annotation data. Assign two-dimensional conditional labels to each satellite remote sensing image, including resolution level and geometric complexity level. Construct a two-dimensional spatial grid based on resolution level and complexity level. Count the sample size of each grid and allocate corresponding sampling quotas. Perform two-level constraint sampling on the multi-source satellite remote sensing images according to the sampling quotas to obtain the training subset.

[0032] S2. Train the teacher model using the training subset to obtain the trained teacher network; perform knowledge distillation with source domain training and target domain transfer on the trained teacher network and the student model in sequence to obtain the distilled student model.

[0033] S3. Process the satellite remote sensing image to be segmented using the distilled student model to obtain the segmentation results of farmland plot instances.

[0034] The specific steps are detailed below.

[0035] S1. Acquire multi-source satellite remote sensing images and corresponding farmland plot vector annotation data. Assign two-dimensional conditional labels to each satellite remote sensing image, including resolution level and geometric complexity level. Construct a two-dimensional spatial grid based on resolution level and complexity level. Count the sample size of each grid and allocate corresponding sampling quotas. Perform two-level constraint sampling on the multi-source satellite remote sensing images according to the sampling quotas to obtain the training subset.

[0036] In a massive dataset of farmland plots with multiple resolutions and morphologies, an explicit resolution and geometric complexity hierarchy partitioning method is constructed, and constrained sampling is performed accordingly. This method can specifically address the uneven distribution of samples under complex resolution and geometric scenarios, thereby forming a representative and balanced training subset in the two-dimensional space of resolution and complexity. This provides a high-quality sample base that fits the actual boundary environment of complex farmland plots for model training, avoiding insufficient boundary segmentation accuracy due to sample bias.

[0037] First, acquire multi-source satellite remote sensing images and corresponding farmland plot vector annotation data, which contains farmland plot boundary information. To improve training consistency, preprocessing can be performed on the satellite remote sensing images and farmland plot vector annotation data, including format unification, coordinate system / projection calibration, and basic cleaning. These data are then stored in separate databases (image database and annotation database), and a standardized data interface is provided for subsequent use.

[0038] Then, a two-dimensional condition label is calculated and assigned to each satellite remote sensing image, which includes a resolution level and a geometric complexity level.

[0039] The resolution levels are obtained as follows: the spatial resolution of each satellite remote sensing image is mapped to a resolution level according to thresholds, giving R1 (low resolution level), R2 (medium resolution level), R3 (medium-high resolution level), and R4 (high resolution level). The mapping formula is as follows:

[0040] ,

[0041] The thresholds {0.5, 2, 5, 10} are used to cover common agricultural remote sensing scenarios ranging from sub-meter to 10m. On different sensors / datasets, the thresholds can be linearly scaled according to the data distribution or equivalently implemented by using quantile partitioning.
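The threshold mapping can be sketched as follows. Because the mapping formula itself is not reproduced in this text, the bin edges used here are an assumption: 0.5 m, 2 m, and 5 m are treated as cut points, and 10 m as the upper end of the covered range. The names `resolution_level` and `quantile_cuts` are hypothetical.

```python
from bisect import bisect_left
import statistics

# ASSUMPTION: cut points in metres per pixel; the source lists {0.5, 2, 5, 10}
# but does not show how they delimit R1..R4.
DEFAULT_CUTS = (0.5, 2.0, 5.0)
LEVELS = ("R4", "R3", "R2", "R1")  # finest resolution first


def resolution_level(gsd_m, cuts=DEFAULT_CUTS):
    """Map a ground sample distance (m/pixel) to R1 (low) .. R4 (high)."""
    if gsd_m <= 0:
        raise ValueError("spatial resolution must be positive")
    # bisect_left counts the cut points strictly below gsd_m,
    # so a value equal to a cut point falls into the finer level.
    return LEVELS[bisect_left(cuts, gsd_m)]


def quantile_cuts(gsd_values):
    """Alternative mentioned in the text: derive three cut points from quartiles."""
    return tuple(statistics.quantiles(gsd_values, n=4))
```

With the assumed cut points, a 0.3 m image maps to R4 and a 10 m image to R1; passing `cuts=quantile_cuts(all_gsd_values)` switches to the quantile-based partition.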

[0042] The method for obtaining the geometric complexity level is as follows: based on the vector annotation data of farmland plots in satellite remote sensing images, calculate the geometric complexity features, form a comprehensive complexity index, classify the comprehensive complexity index, and obtain the geometric complexity level.

[0043] The geometric complexity features include at least: plot density, mean perimeter-to-area ratio, compactness, and boundary length density. After each geometric complexity feature is normalized, a weighted sum is taken to obtain the comprehensive complexity index.

[0044] The plot density $D_p$ is calculated as follows:

[0045] $$D_p = \frac{N}{A_{img}},$$

[0046] where $N$ is the total number of plots in the image and $A_{img}$ is the area covered by the image.

[0047] The mean perimeter-to-area ratio $\bar{R}_{PA}$ is calculated as follows:

[0048] $$\bar{R}_{PA} = \frac{1}{N}\sum_{i=1}^{N}\frac{P_i}{A_i + \varepsilon},$$

[0049] where $\bar{R}_{PA}$ is the mean ratio of plot perimeter to plot area over the $N$ plots in the image, $P_i$ is the perimeter of the polygon of the $i$-th plot, $A_i$ is the area of the polygon of the $i$-th plot, and $\varepsilon$ is the compensation term in the denominator.

[0050] The compactness of the plots is calculated as follows:

[0051] .

[0052] The plot boundary length density is calculated as follows:

[0053] .

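[0054] The comprehensive complexity index $S$ is then divided into C1 (low complexity level), C2 (medium complexity level), and C3 (high complexity level):

$$C = \begin{cases} \mathrm{C1}, & S < q_{33} \\ \mathrm{C2}, & q_{33} \le S < q_{66} \\ \mathrm{C3}, & S \ge q_{66} \end{cases}$$

where $q_{33}$ and $q_{66}$ are the 33% and 66% percentiles of the comprehensive complexity index.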
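A minimal sketch of the complexity labelling is given below. The plot density and mean perimeter-to-area ratio follow the definitions above. The compactness (Polsby-Popper form, 4πA/P²) and the boundary length density (total perimeter per unit image area) are assumptions, as are min-max normalization, linear percentile interpolation, and all function names. The weights are left to the caller, including how compactness should be oriented.

```python
import math


def polygon_area_perimeter(pts):
    """Shoelace area and perimeter of a simple polygon [(x, y), ...]."""
    area2, perim = 0.0, 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        area2 += x0 * y1 - x1 * y0
        perim += math.hypot(x1 - x0, y1 - y0)
    return abs(area2) / 2.0, perim


def geometric_features(polygons, image_area, eps=1e-6):
    """Per-image geometric complexity features from plot polygons."""
    if not polygons:
        raise ValueError("at least one plot polygon is required")
    stats = [polygon_area_perimeter(p) for p in polygons]
    n = len(stats)
    return {
        "plot_density": n / image_area,
        "mean_pa_ratio": sum(p / (a + eps) for a, p in stats) / n,
        # ASSUMPTION: mean Polsby-Popper compactness, 4*pi*A / P^2.
        "compactness": sum(4 * math.pi * a / (p * p) for a, p in stats) / n,
        # ASSUMPTION: total boundary length per unit image area.
        "boundary_length_density": sum(p for _, p in stats) / image_area,
    }


def complexity_index(rows, weights):
    """Min-max normalise each feature over the dataset, then weighted sum."""
    scores = [0.0] * len(rows)
    for name, w in weights.items():
        col = [r[name] for r in rows]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # constant feature contributes zero
        for i, v in enumerate(col):
            scores[i] += w * (v - lo) / span
    return scores


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def complexity_levels(scores):
    """Split scores into C1 / C2 / C3 at the 33% and 66% percentiles."""
    q33, q66 = percentile(scores, 33), percentile(scores, 66)
    return ["C1" if s < q33 else "C2" if s < q66 else "C3" for s in scores]
```

Because a higher compactness usually indicates a simpler shape, a caller might give it a negative weight or invert it before the weighted sum; the text does not say which convention is used.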

[0055] Finally, a grid (R×C) is constructed in the two-dimensional space corresponding to the resolution level R and the geometric complexity level C. The sample size of each grid is counted, and the corresponding sampling quota of each grid is allocated according to the preset allocation rules. The multi-source satellite remote sensing images are sampled under two-level constraints according to the sampling quota to form a representative training set of about 15k, which is the training subset.
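The grid counting and quota allocation can be illustrated as below. The preset allocation rule is not specified in the text, so this sketch assumes an equal-share rule in which cells with too few samples keep all of them and their unused share is redistributed to the remaining cells. `allocate_quotas` and `budget` are hypothetical names.

```python
from collections import Counter


def allocate_quotas(labels, budget):
    """Allocate per-cell sampling quotas over the (resolution, complexity) grid.

    labels: iterable of (resolution_level, complexity_level), one per image.
    budget: target size of the training subset.
    ASSUMPTION: equal share per cell, with surplus from under-filled cells
    redistributed to the cells that still have samples to spare.
    """
    counts = Counter(labels)
    quotas = {}
    remaining = budget
    # Visit cells from the sparsest to the densest.
    open_cells = sorted(counts, key=lambda cell: counts[cell])
    while open_cells:
        share = remaining // len(open_cells)
        cell = open_cells[0]
        if counts[cell] <= share:
            # Under-filled cell: keep every sample, free the unused share.
            quotas[cell] = counts[cell]
            remaining -= counts[cell]
            open_cells.pop(0)
        else:
            # Every remaining cell has more samples than the equal share.
            for c in open_cells:
                quotas[c] = share
            break
    return quotas
```

For example, with 2, 10, and 50 samples in three cells and a budget of 30, the two sparse cells keep all their samples and the dense cell receives the remaining 18.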

[0056] The two-level constraint sampling includes: for grids with a sample size not exceeding the corresponding sampling quota (i.e., for grids with insufficient samples), all samples are selected; for grids with a sample size exceeding the corresponding sampling quota, the representation vectors of all samples are subjected to K-means clustering to obtain several clusters, where the representation vectors are obtained based on the image features and geometric complexity of the samples; within each cluster, secondary clustering is performed according to the geographic region (coordinates of the center point of the image slice) to which the samples belong, and redundancy removal is performed based on the intra-cluster similarity threshold to constrain the repeated selection of spatially adjacent samples and control geographic diversity; finally, according to the sampling quota of the grid, from the redundancy-removed candidate samples, a corresponding number of samples are selected according to the representativeness within the cluster and the selection order based on the distance from the cluster center from smallest to largest, as the selected training samples for that grid. The training set obtained in this way has a more balanced coverage across different resolution and complexity combinations, reduces redundant samples and training costs, and improves the model's ability to learn "rare scene combinations".
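The sampling procedure for a single grid cell can be sketched as follows. Several details are assumptions made for illustration: a plain Lloyd's K-means, a coarse geographic tile standing in for the secondary geographic clustering, cosine similarity as the redundancy measure, and round-robin selection across clusters. The sample fields `vec` (representation vector) and `xy` (slice center coordinates) are hypothetical.

```python
import math
import random


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]

    def nearest(v):
        return min(range(k), key=lambda j: _dist(v, centers[j]))

    for _ in range(iters):
        assign = [nearest(v) for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(v) for v in vectors]


def sample_cell(samples, quota, k=2, region_size=1.0, sim_threshold=0.98, seed=0):
    """Select at most `quota` samples from one (resolution, complexity) cell."""
    if len(samples) <= quota:
        return list(samples)  # under-filled cell: keep everything
    vecs = [s["vec"] for s in samples]
    centers, assign = kmeans(vecs, min(k, len(samples)), seed=seed)
    ranked = []
    for j, center in enumerate(centers):
        members = [s for s, a in zip(samples, assign) if a == j]
        members.sort(key=lambda s: _dist(s["vec"], center))  # most central first
        kept = []
        for s in members:
            # ASSUMPTION: geographic region = coarse tile of the slice center.
            region = (math.floor(s["xy"][0] / region_size),
                      math.floor(s["xy"][1] / region_size))
            redundant = any(r == region and _cosine(s["vec"], t["vec"]) > sim_threshold
                            for r, t in kept)
            if not redundant:
                kept.append((region, s))
        ranked.append([s for _, s in kept])
    # ASSUMPTION: round-robin over clusters, each in ascending center distance.
    selected, rank = [], 0
    while len(selected) < quota and any(rank < len(c) for c in ranked):
        for c in ranked:
            if rank < len(c) and len(selected) < quota:
                selected.append(c[rank])
        rank += 1
    return selected
```

Sorting each cluster by distance to its center before redundancy removal means that, among near-duplicate neighbouring slices, the most representative one is the one retained.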

[0057] S2. Train the teacher model using the training subset to obtain the trained teacher network; perform knowledge distillation with source domain training and target domain transfer on the trained teacher network and the student model in sequence to obtain the distilled student model.

[0058] A teacher model with a conditional feature adaptation layer and a boundary-aware auxiliary branch is constructed, which allows the model to adaptively adjust the feature map according to the two-dimensional label and enhance the perception and representation ability of farmland plot boundaries. A unified lightweight student model is trained on the source domain data through resolution and complexity conditional distillation. Under the condition that there is a lack of labeled data in the target area, the model can achieve cross-regional adaptation with the help of pseudo-labels generated by the teacher model and a small number of real labels, so that the student model still has good generalization ability with a significant reduction in the number of parameters.

[0059] First, the teacher model is trained using the training subset to obtain the trained teacher network.

[0060] The teacher model includes: the backbone network and neck network of the YOLOv11 network, used to extract multi-scale features; a conditional feature adaptation layer located after the backbone network and neck network and before the output layer, used to adaptively adjust the feature map according to the two-dimensional conditional labels; and a boundary-aware auxiliary branch output in the output layer, used to output boundary mask map and boundary distance map, enhancing the model's ability to depict the details of the plot edges.

[0061] The adaptive adjustments in the conditional feature adaptation layer include resolution adaptation and complexity adaptation. Resolution adaptation is used to perform boundary enhancement or structural enhancement on the corresponding feature map (output of the backbone network and the neck network of the feature pyramid) according to the resolution level in the two-dimensional conditional label, so as to obtain the resolution-adapted feature map. Complexity adaptation is used to perform boundary separability enhancement or internal path denoising on the corresponding feature map (output of the backbone network and the neck network of the feature pyramid) or resolution-adapted feature map according to the geometric complexity level in the two-dimensional conditional label.

[0062] The resolution adaptation process is as follows: For high-resolution images, the feature map corresponding to the high-resolution level (R4) in the 2D conditional label is processed for boundary localization to obtain a boundary confidence map representing the boundary region. Based on the boundary confidence map, the boundary direction is estimated within the boundary region, and the boundary direction is discretized into several direction categories to obtain a direction category map. Based on the direction category map, multiple sets of directional convolutions, attention processing, or weighted fusion of multiple sets of directional convolutions and attention processing are performed on the boundary region to achieve directional enhancement processing of the boundary region, thereby strengthening the feature response along the boundary direction and obtaining a high-resolution adaptation feature map after boundary refinement to improve the boundary segmentation accuracy in high-resolution scenes. For low-resolution images, the feature map corresponding to the low-resolution level (R1) in the 2D conditional label is processed by low-pass filtering and downsampling to suppress the aliasing effect caused by high-frequency components, obtaining an anti-aliasing feature map. The anti-aliasing feature map is then expanded by sparse sampling through dilated convolution and multi-scale pooling to achieve multi-scale context convergence to form a structural aggregation feature map with a large receptive field, which serves as the structurally enhanced resolution adaptation feature map.
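The direction discretization step for high-resolution images can be illustrated with a small sketch. The text does not state how the boundary direction is estimated; here it is assumed to be the direction perpendicular to a local gradient `(gx, gy)` supplied by the caller, folded to an orientation in [0, π) and quantized into `num_bins` categories. The function names, the threshold on the boundary confidence map, and the value -1 for non-boundary pixels are all hypothetical.

```python
import math


def direction_category(gx, gy, num_bins=4):
    """Quantise the boundary tangent orientation into num_bins categories.

    ASSUMPTION: the tangent is perpendicular to the gradient (gx, gy).
    Bin 0 is centred on the horizontal orientation; bins advance by pi/num_bins.
    """
    theta = (math.atan2(gy, gx) + math.pi / 2) % math.pi  # orientation in [0, pi)
    width = math.pi / num_bins
    # Shift by half a bin so that each bin is centred on its nominal angle.
    return int(((theta + width / 2) % math.pi) // width)


def direction_category_map(conf, gx, gy, thresh=0.5, num_bins=4):
    """Direction categories inside the boundary region, -1 elsewhere."""
    h, w = len(conf), len(conf[0])
    return [[direction_category(gx[y][x], gy[y][x], num_bins)
             if conf[y][x] >= thresh else -1
             for x in range(w)]
            for y in range(h)]
```

With four bins, a gradient along the x axis (a vertical boundary) falls into category 2 and a gradient along the y axis (a horizontal boundary) into category 0; each category could then select one of the directional convolution groups described above.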

[0063] The complexity adaptation process is as follows. For high-complexity images, obtain the resolution-adapted feature map or the boundary confidence map of the feature map (without resolution adaptation processing) corresponding to the high-complexity level (C3) in the 2D conditional label. Calculate the pixel connectivity of the boundary neighborhood based on the boundary confidence map to obtain a neighborhood connectivity gating map. Based on the boundary confidence map and the neighborhood connectivity gating map, assign higher separation weights to instance-separation-related loss terms or features within the boundary neighborhood than to non-boundary regions to enhance the separability of small plots and tortuous boundaries. Perform morphological processing to suppress boundary breaks and reduce the adhesion of adjacent instances under differentiable conditions, thereby strengthening the independent component expression of fragmented small plots and obtaining a complexity-adapted feature map, which facilitates improved instance segmentation accuracy in complex scenes. For low-complexity images, a gated map is obtained for the resolution-adapted feature map or the (unadapted) feature map corresponding to the low-complexity level (C1) in the two-dimensional conditional label. Based on the gated map, channel recalibration is performed on the resolution-adapted feature map or the (unadapted) feature map to enhance the homogeneous response within the plot and suppress false activation of boundary patterns. The channel-recalibrated feature map is then subjected to low-frequency structure aggregation (preferably achieved through pyramid pooling), and the plot area is smoothed to suppress false boundary responses and reduce noise segmentation of large regular plots, thereby improving the regional consistency of the segmentation results in low-complexity scenes and reducing the probability of noise segmentation, thus obtaining a complexity-adapted feature map.

[0064] The features that pass through the conditional feature adaptation layer will enter the multi-head output layer for prediction. The multi-head output layer mainly refers to multi-task output, including conventional object detection and instance segmentation output, as well as boundary-aware output added in this embodiment to enhance boundary details.

[0065] The target detection and instance segmentation outputs are the regular outputs of the YOLOv11 network. They output the bounding boxes and class probabilities of the targets to locate the approximate location and class of the land parcels, and output the corresponding instance masks to characterize the pixel-level regions of each land parcel instance.

[0066] The boundary-aware auxiliary branch output includes a boundary mask map and a boundary distance map predicted through multiple convolutional layers. The boundary mask map is a binary map, the same size as the input image, used to mark pixels belonging to the boundaries of plots, indicating pixels in the image that correspond to the boundary lines of plots (which can be considered the edge contour extraction result of the instance mask). The boundary distance map is a continuous value map used to represent the distance of a pixel to the nearest plot boundary. Each pixel is assigned a distance value representing its distance to the nearest plot boundary (pixels on the boundary have a distance of 0, increasing inwards and outwards). As a continuous value output, the boundary distance map can more finely depict the spatial relationships near the boundaries, constraining the smoothness and topological correctness of the segmentation shape. By focusing on feature learning in local areas of plot edges, the boundary-aware auxiliary branch output helps the teacher model output clear contours and reasonable topological structures even in complex scenarios with adjacent plots and blurred boundaries.
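As an illustration of the two boundary targets, the sketch below derives a boundary mask map and a boundary distance map from an instance label map. The boundary definition (a plot pixel with a 4-neighbour of a different label) and the brute-force Euclidean distance computation are assumptions chosen for clarity, not the procedure of this embodiment.

```python
import math


def boundary_mask(labels):
    """Binary boundary mask from an instance label map (0 = background).

    ASSUMPTION: a plot pixel is a boundary pixel if any 4-neighbour inside
    the image carries a different label.
    """
    h, w = len(labels), len(labels[0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if labels[y][x] == 0:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    mask[y][x] = 1
                    break
    return mask


def boundary_distance_map(mask):
    """Euclidean distance from every pixel to the nearest boundary pixel."""
    h, w = len(mask), len(mask[0])
    pts = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    if not pts:
        raise ValueError("mask contains no boundary pixels")
    # Brute force for clarity; a distance transform would be used at scale.
    return [[min(math.hypot(y - by, x - bx) for by, bx in pts)
             for x in range(w)]
            for y in range(h)]
```

For a 3×3 plot centered in a 5×5 image, the eight outer plot pixels form the boundary (distance 0), and the distance grows both toward the plot center and toward the image corners.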

[0067] When training the teacher model, a targeted loss constraint is imposed on the boundary-aware auxiliary branch output: the boundary-aware auxiliary branch output loss is a weighted sum of the bidirectional contour transport loss and the hierarchical ranking distance field loss.

[0068] The bidirectional contour transport loss is based on the shortest distance from the predicted boundary to the true boundary and the shortest distance from the true boundary to the predicted boundary. That is, the two nearest-distance maps are generated from the probability map of the predicted boundary and the set of true boundary points, respectively. The loss simultaneously penalizes offset and missed detection in the symmetrical form "distance cost from the predicted boundary mass to the true boundary + distance cost from the true boundary to the predicted boundary".

[0069] The formula for calculating the bidirectional contour transport loss is as follows:

[0070] ,

[0071] In the formula, the left-hand side is the bidirectional contour transport loss. The predicted boundary probability map gives, at each location (x, y), the boundary probability output by the boundary-aware auxiliary branch. The two nearest-distance maps give, at location (x, y), the nearest distance from the predicted boundary to the true boundary and from the true boundary to the predicted boundary, respectively. The true boundary indicator function takes the value 1 if pixel (x, y) belongs to the set of true boundary points and 0 otherwise, so that the distance cost to the predicted boundary is computed only at true boundary points, precisely penalizing missed detections at the true boundary. H and W are the height and width of the image, respectively. In the first term, the higher the predicted boundary probability of a pixel and the farther that pixel is from the true boundary (the larger its distance value), the heavier the penalty, which specifically constrains the offset of the predicted boundary. In the second term, the farther a true boundary point is from the predicted boundary (the larger its distance value), the heavier the penalty, which specifically constrains missed detections at the true boundary. The formula in this embodiment is a symmetric sum that imposes the dual constraint of "offset + missed detection", overcoming the one-sidedness of a distance loss in a single direction.
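The formula image for the bidirectional contour transport loss is not reproduced in this text. One form consistent with the symbol descriptions above is sketched below. The symbol names are assumptions made here: $P$ for the predicted boundary probability map, $D_{p\to g}$ and $D_{g\to p}$ for the two nearest-distance maps, and $\mathbb{1}_{G}$ for the indicator of the true boundary point set $G$. The $1/(HW)$ normalization is also an assumption, suggested only by the mention of H and W; any term weights the original formula may carry are not recoverable.

```latex
\mathcal{L}_{\mathrm{BCT}}
  = \frac{1}{HW} \sum_{x=1}^{W} \sum_{y=1}^{H}
    \Big[ P(x,y)\, D_{p\to g}(x,y) \;+\; \mathbb{1}_{G}(x,y)\, D_{g\to p}(x,y) \Big]
```

The first summand grows when high-probability predictions lie far from the true boundary (offset), and the second grows when true boundary points lie far from any predicted boundary point (missed detection).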

[0072] The nearest-distance map from the predicted boundary to the true boundary is obtained as follows: for each pixel (x, y) in the predicted boundary probability map, calculate its Euclidean distance to every point in the true boundary point set and take the minimum as the distance value of that pixel; collecting these values over all pixels gives the nearest-distance map from the predicted boundary to the true boundary, i.e. $D_{p\to g}(x,y) = \min_{k} \sqrt{(x - x_k)^2 + (y - y_k)^2}$, where $(x_k, y_k)$ are the coordinates of the k-th point of the true boundary point set $G$.

[0073] The nearest-distance map from the true boundary to the predicted boundary is obtained as follows: the predicted boundary probability map is thresholded to obtain a binarized predicted boundary mask, and the boundary point set of that mask is extracted as the feature boundary point set. For each pixel (x, y), the minimum Euclidean distance to the feature boundary point set is calculated and used as the distance value of that pixel; collecting these values over all pixels gives the nearest-distance map from the true boundary to the predicted boundary, i.e. $D_{g\to p}(x,y) = \min_{k} \sqrt{(x - x'_k)^2 + (y - y'_k)^2}$, where $(x'_k, y'_k)$ are the coordinates of the k-th point of the feature boundary point set $B$.
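The two nearest-distance maps and the symmetric loss described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the embodiment's code: the function names, the 0.5 threshold, the normalization by H x W, and the fallback to the image diagonal when no predicted boundary point exists are all choices made for this example. As a simplification, every pixel of the thresholded mask is used as a feature boundary point.

```python
import numpy as np


def nearest_distance_map(shape, points):
    """Distance from every pixel to its nearest point in `points` (N x 2, row/col)."""
    if points.shape[0] == 0:
        # Assumption: with no reference points, fall back to the image diagonal.
        return np.full(shape, float(np.hypot(*shape)))
    rows, cols = np.indices(shape)
    d2 = (rows[..., None] - points[:, 0]) ** 2 + (cols[..., None] - points[:, 1]) ** 2
    return np.sqrt(d2.min(axis=-1))


def bidirectional_contour_transport_loss(prob, gt_boundary, threshold=0.5):
    """Symmetric distance cost: offset term plus missed-detection term."""
    gt_boundary = np.asarray(gt_boundary, dtype=bool)
    h, w = prob.shape
    gt_points = np.argwhere(gt_boundary)          # true boundary point set
    pred_points = np.argwhere(prob >= threshold)  # thresholded predicted boundary
    d_pred_to_gt = nearest_distance_map((h, w), gt_points)
    d_gt_to_pred = nearest_distance_map((h, w), pred_points)
    # Offset term: probability mass weighted by its distance to the true boundary.
    offset_term = float((prob * d_pred_to_gt).sum())
    # Missed-detection term: evaluated only at true boundary pixels.
    missed_term = float(d_gt_to_pred[gt_boundary].sum())
    return (offset_term + missed_term) / (h * w)


# Toy example: true boundary in column 1, prediction shifted to column 2.
gt = np.zeros((4, 4), dtype=bool)
gt[:, 1] = True
shifted = np.zeros((4, 4))
shifted[:, 2] = 1.0
loss_shifted = bidirectional_contour_transport_loss(shifted, gt)
```

In a training framework the distance maps would typically be treated as constants, since thresholding is not differentiable; gradients would then flow only through the probability map in the offset term.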

[0074] Meanwhile, the hierarchical ranking distance field loss is based on two constraints: the margin-constrained hierarchical relationship of the distances from pixels within the same instance to the true boundary, and the baseline constraint that the predicted distance at the true boundary must be 0. It is obtained by weighted fusion of the point-state zeroing loss and the margin-constrained hierarchical ranking loss over pixel pairs within an instance. Specifically, the hierarchical ranking distance field loss is computed on the predicted distance map: pixel pairs are sampled within the same instance, and the relative order of their predicted distances is constrained to follow the order of their distances from the true boundary (subject to the margin constraint), while the predicted distance is constrained to be zero at the true boundary. In this way the model learns the geometric hierarchical structure of the distance field rather than fitting numerical values point by point.

[0075] The formula for calculating the hierarchical ranking distance field loss is as follows:

[0076] ,

[0077] In the formula, the left-hand side is the hierarchical ranking distance field loss. The first term is the point-state zeroing loss, which constrains the predicted distance at the true boundary to be 0. The second term is the hierarchical ranking loss with a margin constraint, which constrains the relative order of the predicted distances within the same instance to match the order of the true distances. The two hyperparameters are the weights that balance the zeroing constraint and the ranking constraint.

[0078] To impose a precise distance-to-zero constraint on the true boundary point set, the mean squared error is used (a classic loss for regression tasks with a stable constraining effect). The loss is calculated only over the true boundary pixels to avoid interference from background pixels. The formula for calculating the point-state zeroing loss is as follows:

[0079] ,

[0080] In the formula, the predicted value is the value at point (x, y) in the predicted boundary distance map, i.e. the distance map output by the boundary-aware auxiliary branch, which represents the distance from each pixel to the predicted boundary and is the quantity being constrained. The true value is the value at point (x, y) in the true boundary distance map, i.e. the nearest Euclidean distance from each pixel to the true boundary point set. Because the true distance is 0 at every true boundary point, the formula simplifies to the above form, which directly constrains the predicted distance to 0. The formula forces the predicted distance at the true boundary to be 0, ensuring the absolute baseline accuracy of the distance map.
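The formula images for the weighted combination and the point-state zeroing loss are not reproduced in this text. A form consistent with the descriptions above is sketched below; the symbol names ($\lambda_{1}$, $\lambda_{2}$ for the two weights, $\hat{D}$ for the predicted boundary distance map, $D$ for the true boundary distance map, $G$ for the true boundary point set) are assumptions made here.

```latex
\mathcal{L}_{\mathrm{HR}} = \lambda_{1}\,\mathcal{L}_{\mathrm{zero}} + \lambda_{2}\,\mathcal{L}_{\mathrm{rank}},
\qquad
\mathcal{L}_{\mathrm{zero}}
  = \frac{1}{|G|} \sum_{(x,y)\in G} \big( \hat{D}(x,y) - D(x,y) \big)^{2}
  = \frac{1}{|G|} \sum_{(x,y)\in G} \hat{D}(x,y)^{2}
```

The second equality uses $D(x,y) = 0$ for every $(x,y) \in G$, which is the simplification the paragraph above refers to.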

[0081] The core constraint of the hierarchical ranking loss is as follows: pixel pairs are randomly sampled within the same instance region, and only the pairs that satisfy the true distance hierarchy are retained, i.e. pairs in which the first pixel is closer to the true boundary, the second pixel is farther from it, and the difference between their true distances exceeds the margin threshold. The predicted distance map is then constrained to satisfy the same hierarchical relationship, and pixel pairs that violate it are penalized. A hinge loss implements the ranking constraint, applying a linear penalty to pairs that violate the order, which makes training robust. The calculation formulas are as follows:

[0082] ,

[0083] ,

[0084] In the formulas, the pair set is the set of pixel pairs that satisfy the true hierarchical ordering with the margin constraint, and the normalizing factor is the number of pixel pairs in that set. The ranking relaxation factor avoids meaningless penalties caused by numerical precision issues. The true distances of the two pixels of a pair are their values in the true boundary distance map. The ranking threshold suppresses noise from pixel pairs whose true distances are too close: the ranking constraint is applied only to pixel pairs whose true distance difference exceeds this threshold. The predicted distances of the two pixels are their values in the predicted boundary distance map. With this margin-constrained hierarchical ranking, the formulas ensure that the relative order of the predicted distance map is reasonable, that is, the near/far relationship of pixels within the same instance with respect to the true boundary is consistent with the ground truth, avoiding local ordering errors in the distance map.
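The zeroing term and the margin-constrained ranking term can be sketched together as follows. This is a hedged illustration, not the embodiment's code: the function names, default values, and number of sampled pairs are assumptions, and because the formula images are not available, treating the relaxation factor `eps` as a small margin inside the hinge is an assumption as well.

```python
import numpy as np


def point_state_zeroing_loss(pred_dist, gt_boundary):
    """Mean squared predicted distance over true boundary pixels (target is 0)."""
    values = pred_dist[np.asarray(gt_boundary, dtype=bool)]
    return float((values ** 2).mean()) if values.size else 0.0


def hierarchical_ranking_loss(pred_dist, gt_dist, instance_mask,
                              tau=1.0, eps=0.1, n_pairs=256, seed=0):
    """Hinge loss on sampled pixel pairs whose true distances differ by more than tau."""
    rng = np.random.default_rng(seed)
    pixels = np.argwhere(instance_mask)
    if pixels.shape[0] < 2:
        return 0.0
    p = pixels[rng.integers(0, pixels.shape[0], size=n_pairs)]
    q = pixels[rng.integers(0, pixels.shape[0], size=n_pairs)]
    gt_p = gt_dist[p[:, 0], p[:, 1]]
    gt_q = gt_dist[q[:, 0], q[:, 1]]
    keep = (gt_q - gt_p) > tau  # keep pairs where p is clearly closer to the true boundary
    if not keep.any():
        return 0.0
    pr_p = pred_dist[p[:, 0], p[:, 1]][keep]
    pr_q = pred_dist[q[:, 0], q[:, 1]][keep]
    # Assumption: eps is a small margin; a pair is penalised linearly when the
    # predicted distance of p is not smaller than that of q by at least eps.
    return float(np.maximum(0.0, pr_p - pr_q + eps).mean())


def distance_field_loss(pred_dist, gt_dist, gt_boundary, instance_mask,
                        w_zero=1.0, w_rank=1.0):
    """Weighted sum of the zeroing term and the ranking term."""
    return (w_zero * point_state_zeroing_loss(pred_dist, gt_boundary)
            + w_rank * hierarchical_ranking_loss(pred_dist, gt_dist, instance_mask))


# Toy example: the true boundary is column 0, so the true distance is the column index.
gt_dist = np.tile(np.arange(8.0), (8, 1))
gt_boundary = np.zeros((8, 8), dtype=bool)
gt_boundary[:, 0] = True
instance = np.ones((8, 8), dtype=bool)
```

A prediction that preserves the true ordering incurs no ranking penalty even if its values are rescaled, which is the sense in which the loss constrains hierarchy rather than point-by-point values.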

[0085] Next, based on the training subset, knowledge distillation is performed on the trained teacher network and the student model in two successive stages, source domain training and target domain transfer, to obtain the distilled student model. The student model is a lightweight YOLOv11 network with fewer layers and channels than the standard YOLOv11 network, which reduces the number of parameters and the computational cost.

[0086] In the knowledge distillation process, for each training sample, the teacher model and the student model generate detection category output, instance mask output, and boundary information output, respectively. A distillation training objective function is constructed by weighting the category prediction distillation loss, mask distillation loss, and boundary feature distillation loss to train the distilled student model.

[0087] In the knowledge distillation process during source domain training, for each training sample in the training subset, the teacher model generates a set of prediction outputs based on the sample's two-dimensional conditional label. These outputs include: the detected category logits (probability values or raw scores for each category), the segmentation mask prediction for each instance, and the boundary mask / distance map prediction. The student model outputs the corresponding prediction results under the same input.

[0088] In the category prediction distillation loss, for the category probability distribution of each predicted bounding box or each pixel location, a confusion-graph margin distillation loss conditioned on the two-dimensional conditional label (R, C) is used. Specifically, samples are first grouped by resolution level R and complexity level C, and the class-confusion adjacency relationships of the teacher model are collected within each group to form a set of easily confused classes for each two-dimensional conditional label (R, C). During training, distillation is performed only on the teacher's Top-K classes and their easily confused neighbors. Within this set, the student's class ranking and logit margins are constrained to be consistent with the teacher's, and samples in the boundary neighborhood are given higher distillation weights. This allows the student to learn a more stable separability structure under different two-dimensional conditional labels (R, C). KL divergence or mean squared error loss is used to make the student model's class logit distribution approximate the teacher model's output distribution. Compared with training on hard labels alone (0 / 1 classification results), this lets the student learn how the teacher's confidence differs among classes.
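The restriction of distillation to the teacher's Top-K classes and their easily confused neighbors can be sketched as follows. This minimal example covers only the restricted KL-divergence part; the ranking and margin consistency constraints are not implemented. The function names, the temperature parameter, the dictionary format of the confusion sets, and the scalar `sample_weight` are assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def confusion_set_kl(teacher_logits, student_logits, confusable,
                     k=2, temperature=2.0, sample_weight=1.0):
    """KL(teacher || student) over the teacher's Top-K classes plus confusable neighbours.

    `confusable` maps a class index to the classes the teacher tends to confuse
    it with under the sample's (R, C) label (hypothetical lookup format).
    """
    top_k = np.argsort(teacher_logits)[::-1][:k]
    keep = {int(c) for c in top_k}
    for c in top_k:
        keep.update(confusable.get(int(c), ()))
    idx = np.array(sorted(keep))
    p = softmax(teacher_logits[idx] / temperature)  # teacher distribution on the subset
    q = softmax(student_logits[idx] / temperature)  # student distribution on the subset
    return float(sample_weight * np.sum(p * (np.log(p) - np.log(q))))


# Hypothetical confusion sets, one dictionary per (R, C) condition label.
confusable_by_condition = {("high", "complex"): {0: [1]}}
teacher = np.array([4.0, 1.0, 3.0, -2.0, 0.0])
student = np.array([3.5, 2.0, 3.0, 9.0, 0.0])
loss = confusion_set_kl(teacher, student, confusable_by_condition[("high", "complex")])
```

In this example classes 0 and 2 are the teacher's Top-2 and class 1 is added as a confusable neighbor of class 0, so the student's large logit for class 3 does not contribute to the loss. A higher `sample_weight` could be passed for samples in the boundary neighborhood.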

[0089] In the mask distillation loss, the instance mask output by the teacher model is treated as a soft label, so that the corresponding mask prediction of the student model approximates the teacher's output. A boundary-band consistency mask distillation loss conditioned on the two-dimensional conditional label (R, C) is employed. Specifically, the teacher instance mask is softened to obtain a probability map, and a boundary-band weight map is generated from the teacher's boundary. Within the boundary band, higher weights are used to constrain the consistency between the student and teacher masks. At the same time, morphological soft dilation / soft erosion is applied to the teacher and student masks to obtain two band-like regions, which constrain the consistency of boundary expansion and boundary contraction, respectively. In this way the student not only approximates the overall shape but also explicitly inherits the teacher's boundary thickness and contour stability, and the boundary bandwidth and weights are adjusted adaptively under different two-dimensional conditional labels (R, C). Pixel-wise binary cross-entropy loss or Dice loss can be used to make the student's segmentation result closely approximate the teacher's result in shape. Because of the high performance of the teacher model, its output can be regarded as an enhanced pseudo-label, allowing the student to acquire better boundary detail representation through imitation than by learning directly from the original labels.
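A boundary-band weight map and the corresponding weighted binary cross-entropy can be sketched as follows. This is a simplified illustration under stated assumptions: it uses hard binary dilation and erosion with a 4-neighbourhood instead of the soft morphology described above, and the function names, band width, and band weight are hypothetical. In the conditional setting, `band_width` and `band_weight` would be chosen per (R, C) label.

```python
import numpy as np


def _dilate(mask):
    """4-neighbourhood binary dilation by one pixel."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:]


def boundary_band_weights(teacher_prob, band_width=1, band_weight=4.0):
    """Weight map: band_weight inside the band around the teacher's mask edge, 1 elsewhere."""
    hard = teacher_prob >= 0.5
    dilated, eroded = hard.copy(), hard.copy()
    for _ in range(band_width):
        dilated = _dilate(dilated)
        eroded = ~_dilate(~eroded)  # erosion expressed as dilation of the complement
    band = dilated & ~eroded
    return np.where(band, band_weight, 1.0)


def band_weighted_bce(teacher_prob, student_prob, weights, eps=1e-7):
    """Pixel-wise BCE against the teacher's soft mask, averaged with band weights."""
    s = np.clip(student_prob, eps, 1.0 - eps)
    bce = -(teacher_prob * np.log(s) + (1.0 - teacher_prob) * np.log(1.0 - s))
    return float((weights * bce).sum() / weights.sum())


# Toy example: a soft 3 x 3 teacher mask inside a 7 x 7 image.
teacher = np.full((7, 7), 0.1)
teacher[2:5, 2:5] = 0.9
weights = boundary_band_weights(teacher)
uniform_student = np.full((7, 7), 0.5)
loss = band_weighted_bce(teacher, uniform_student, weights)
```

Because the target is a soft mask, the loss is minimized, but not zero, when the student reproduces the teacher's probabilities exactly; an error inside the band costs more than the same error elsewhere.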

[0090] In boundary feature distillation loss, if the student model also has similar boundary outputs (such as edge masks or other edge features), the loss can be calculated on the teacher's boundary output and the student's output, allowing the student to learn the teacher's strengths in edge detection. If the student model does not have independent boundary branches, it is also possible to strengthen the gradient of the student mask output through the teacher's boundary information during the distillation process (for example, by giving stronger constraints to the pixels of the student mask in the teacher's boundary region).

[0091] The three losses described above are combined with appropriate weights to form the total distillation loss of the student model. The student network is trained by minimizing this loss, so that it inherits the discriminative ability of the teacher model in different scenarios as far as possible while significantly reducing the number of parameters. Specifically, since the teacher model has different output characteristics for samples with different two-dimensional conditional labels (R, C), the student model, through conditional distillation, essentially "broadens its knowledge", learning how to handle situations such as high-resolution complex plots and low-resolution fragmented plots, which improves its adaptability across a broad range of conditions.

[0092] In the knowledge distillation process of target domain transfer, the teacher model parameters are frozen, the teacher model is applied to the target region image to generate pseudo-labels containing instance segmentation results and boundary information, and the student model is trained based on the pseudo-labels.

[0093] The pseudo-labels include the segmentation results of farmland plot instances and boundary information on each image, which are the prediction outputs of the teacher model on the target image and are regarded as automatic annotations of the target region.

[0094] To ensure the quality of pseudo-labels, the following strategies can be adopted to control them during training.

[0095] Confidence threshold filtering: only plot instances predicted with high confidence by the teacher model (confidence greater than the confidence threshold) are retained as pseudo-labels. Specifically, for each plot output by the teacher, a threshold can be set on the confidence score of the detection branch or on the consistency evaluation index of the boundary branch to eliminate low-confidence predictions. This reduces the propagation to the student of errors from regions where the teacher is uncertain.

[0096] Shape and topology validation involves simple post-processing of the plot boundaries output by the teacher, such as removing extremely small (area less than the area threshold) isolated fragments, smoothing jagged edges, and ensuring that the polygons of adjacent plots do not have obvious overlaps or gaps.

[0097] Balanced sampling of pseudo-labels: if the amount of data in the target area is very large (greater than the data threshold), the pseudo-labels generated by the teacher can be sampled so that pseudo-labels of different plot types and different image conditions are all covered and too many similar samples are not labeled repeatedly, thereby improving the effectiveness of the student's transfer training.
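The confidence filtering, small-fragment removal, and balanced sampling strategies above can be sketched as follows. This is a minimal illustration, assuming hypothetical names (`PseudoInstance`, `filter_pseudo_labels`, `balanced_sample`) and threshold values. Boundary smoothing and overlap/gap checks are omitted, and using the (R, C) condition label as the balancing key is an assumption.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PseudoInstance:
    mask: np.ndarray    # boolean H x W instance mask predicted by the teacher
    confidence: float   # detection confidence score from the teacher


def filter_pseudo_labels(instances, conf_threshold=0.6, area_threshold=16):
    """Keep instances with confidence above the threshold and area not below the threshold."""
    kept = []
    for inst in instances:
        if inst.confidence <= conf_threshold:
            continue  # low-confidence prediction
        if int(inst.mask.sum()) < area_threshold:
            continue  # extremely small isolated fragment
        kept.append(inst)
    return kept


def balanced_sample(condition_labels, per_cell, seed=0):
    """Pick at most `per_cell` image indices for each condition label."""
    rng = np.random.default_rng(seed)
    by_cell = {}
    for idx, cell in enumerate(condition_labels):
        by_cell.setdefault(cell, []).append(idx)
    chosen = []
    for idxs in by_cell.values():
        take = min(per_cell, len(idxs))
        chosen.extend(int(i) for i in rng.choice(idxs, size=take, replace=False))
    return sorted(chosen)


# Toy example: three teacher predictions and six target images.
big = np.ones((5, 5), dtype=bool)
tiny = np.zeros((5, 5), dtype=bool)
tiny[0, 0] = True
candidates = [PseudoInstance(big, 0.9), PseudoInstance(big, 0.3), PseudoInstance(tiny, 0.95)]
kept = filter_pseudo_labels(candidates)
chosen = balanced_sample([(0, 0)] * 5 + [(1, 2)], per_cell=2)
```

Capping each condition cell keeps a rare condition (here the single `(1, 2)` image) in the sample while limiting how many similar images from a common condition are used.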

[0098] After obtaining a high-quality pseudo-label set, it is merged with a small number of manually labeled samples (if any) of the target region to form a training set for transfer learning. This data is then used to retrain (fine-tune) the student model. During training, a loss strategy similar to that used for the source domain can be employed, but the "teacher guidance" here becomes the pseudo-labels themselves. That is, the pseudo-segmentation results generated by the teacher model in the target domain are treated as ground truth to supervise the student model's learning. Simultaneously, if there is real-labeled data, normal supervised loss is applied to these samples. Since pseudo-labels may be less accurate than real-labels, real-labeled samples can be given higher weights during training to guide the student model in correcting potential teacher biases. After a certain number of fine-tuning iterations, the student model's parameters will adapt to the data distribution of the target region, and without large-scale reliance on manual annotation, its segmentation performance in the new region can quickly approach the teacher model's performance in that region.
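The idea of giving real-labeled samples a higher weight than pseudo-labeled samples during fine-tuning can be sketched as a weighted average of per-sample losses. The function name and the weight values below are assumptions for illustration; the text does not specify how the weights are chosen.

```python
import numpy as np


def mixed_supervision_loss(per_sample_loss, is_real, real_weight=2.0, pseudo_weight=1.0):
    """Weighted mean of per-sample losses; manually labelled samples get real_weight."""
    losses = np.asarray(per_sample_loss, dtype=float)
    weights = np.where(np.asarray(is_real, dtype=bool), real_weight, pseudo_weight)
    return float((weights * losses).sum() / weights.sum())


# Toy example: one real-labeled sample (loss 1.0) and one pseudo-labeled sample (loss 3.0).
batch_loss = mixed_supervision_loss([1.0, 3.0], [True, False])
```

With these weights the real-labeled sample counts twice as much as the pseudo-labeled one, so the batch loss is pulled toward the real-labeled sample's value.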

[0099] S3. Process the satellite remote sensing image to be segmented using the distilled student model to obtain the farmland plot instance segmentation results. To verify the segmentation effect, the method of this embodiment is compared with the existing yolo11n-seg and DelAny models in three scenarios: an extremely high-resolution scene, an extremely low-resolution scene, and a scene with extremely complex geometry. As can be seen from Figure 2, the method of this embodiment achieves higher instance segmentation accuracy.

[0100] This embodiment also provides a multi-level teacher distillation farmland parcel boundary instance segmentation system to implement the above-mentioned multi-level teacher distillation farmland parcel boundary instance segmentation method, including:

[0101] The training subset generation module is used to acquire multi-source satellite remote sensing images and corresponding farmland plot vector annotation data. Each satellite remote sensing image is assigned a two-dimensional conditional label, including resolution level and geometric complexity level. A two-dimensional spatial grid is constructed based on the resolution level and complexity level. The sample size of each grid is counted and a corresponding sampling quota is allocated. The multi-source satellite remote sensing images are sampled under two-level constraints according to the sampling quota to obtain the training subset.

[0102] The distilled student model generation module is used to train the teacher model using the training subset to obtain the trained teacher network. Knowledge distillation is then performed on the trained teacher network and the student model in two successive stages, source domain training and target domain transfer, to obtain the distilled student model. The teacher model includes: the backbone network and feature pyramid neck network of the YOLOv11 network, and a conditional feature adaptation layer located after the backbone network and neck network and before the output layer. The conditional feature adaptation layer is used to adaptively adjust the feature map based on the two-dimensional conditional label, and a boundary-aware auxiliary branch is set in the output layer to output boundary mask maps and boundary distance maps. The student model is a lightweight YOLOv11 network with fewer layers and channels than the standard YOLOv11 network.

[0103] The farmland plot instance segmentation result generation module is used to process the satellite remote sensing image to be segmented using the distilled student model to obtain farmland plot instance segmentation results.

[0104] This embodiment also provides a multi-level teacher distillation farmland plot boundary instance segmentation device, including a processor and a memory, wherein the processor executes the computer program stored in the memory to implement the above-described multi-level teacher distillation farmland plot boundary instance segmentation method.

[0105] This embodiment also provides a computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the above-described multi-level teacher distillation farmland plot boundary instance segmentation method.

[0106] This embodiment provides a multi-level teacher distillation method for farmland parcel boundary instance segmentation. First, it assigns two-dimensional conditional labels (resolution and geometric complexity) to multi-source satellite remote sensing images and performs two-level constrained sampling to obtain a training subset, which effectively addresses the uneven sample distribution across resolutions and geometric scenes. Then, it constructs a teacher model containing a conditional feature adaptation layer and a boundary-aware auxiliary branch, which allows the model to adaptively adjust feature maps based on the two-dimensional labels and enhances its perception and representation of farmland parcel boundaries. Next, through knowledge distillation with source domain training and target domain transfer, the complex-scene adaptation capability and high-precision boundary segmentation capability of the teacher model are transferred to a lightweight student model, which balances adaptability to complex scenarios, boundary segmentation accuracy, and lightweight deployment requirements. Finally, the distilled student model is applied to the satellite remote sensing images to be segmented, adapting to farmland boundary environments with different resolution and geometric complexity levels. This specifically addresses the challenges of boundary segmentation in complex scenarios, compensating for the poor adaptability and weak boundary perception of traditional models in such environments and thus improving the instance segmentation accuracy of farmland boundaries. At the same time, the lightweight student model ensures efficient remote sensing image processing, suiting practical farmland segmentation applications.

[0107] This embodiment provides a multi-level teacher distillation method for farmland plot boundary instance segmentation. At the data level, it introduces resolution hierarchy partitioning and geometric complexity index calculation, constructing a two-dimensional hierarchical grid of resolution and complexity. Based on this, constrained sampling is performed, selecting only representative training samples from massive datasets. Compared to simple random sampling or region-balanced sampling methods, this approach significantly reduces the amount of training data while still covering various resolution and plot morphology combinations, eliminating redundant samples and invalid calculations. This reduces the cost of data annotation and model training, and provides a more comprehensive and balanced sample distribution foundation for model training, helping to improve the model's learning performance in rare scenarios.

[0108] This embodiment provides a multi-level teacher distillation method for farmland plot boundary instance segmentation. Based on the YOLOv11 shared backbone network, it adds resolution and complexity adaptation modules. By introducing (R, C) conditional vectors to control different feature paths, it achieves differentiated modeling for high-resolution detailed scenes and low-resolution complex plot scenes. Samples at different levels follow different or weighted "expert paths" in the network, avoiding the limitations of traditional single-structure "one set of parameters for all" approaches. This allows the model to maintain stable boundary segmentation performance and robustness in various scenarios, including high / low resolution and simple / complex plots.

[0109] This embodiment provides a multi-level teacher distillation method for farmland parcel boundary instance segmentation. A boundary-aware branch is specifically designed in the teacher model, employing a multi-task mechanism including edge feature extraction, outputting binary boundary maps and distance maps to focus on learning the parcel edge regions, and assigning higher weights to these regions in the loss function. Compared to instance segmentation methods that only use ordinary masks as output targets, this scheme can better identify the true contours of elongated parcels, adjacent parcels, and irregularly shaped parcels, effectively alleviating problems such as parcel adhesion, blurred boundaries, and local breaks. The output parcel polygons are more accurate and consistent in geometry and topology, providing a high-quality data foundation for subsequent vectorization processing and area calculation.

[0110] This embodiment provides a multi-level teacher distillation method for farmland plot boundary instance segmentation. The proposed knowledge distillation framework, based on resolution-complexity conditions, no longer uses a single teacher model to uniformly distill all samples. Instead, it selectively transmits knowledge from different teacher "expert paths" according to the (R, C) level of each image, performing multi-level distillation of logits, masks, and boundary features on the student model. On the one hand, while significantly compressing network parameters and reducing inference overhead, the lightweight student model still inherits the discriminative ability of the teacher model in complex boundary scenarios as much as possible. On the other hand, the adaptability of the student model to data with different resolutions and complexities is also enhanced, achieving a better balance between "small model" and "high accuracy."

[0111] This embodiment provides a multi-level teacher distillation method for farmland plot boundary instance segmentation. It proposes a training scheme based on pseudo-label generation from a teacher model and cross-regional distillation for target regions (such as farmland in different countries or different provinces within China). For data domains lacking large-scale annotations, the system uses a multi-teacher model to automatically generate pseudo-labels containing plot boundary information, and combines this with a small number of real-labeled samples for secondary training of a lightweight student model. The result allows the student model to quickly approach the segmentation performance of the teacher model in new regions. Compared to traditional methods that rely solely on fine-tuning or re-labeling, this scheme significantly reduces the workload and time cost of manual annotation, while improving the model's generalization ability across different countries and cropping systems, making widespread application possible.

[0112] While exemplary embodiments of the invention have been described herein, many other variations or modifications conforming to the principles of the invention can be directly determined or derived from the disclosure of this invention without departing from its spirit and scope. Therefore, the scope of the invention should be understood and recognized to cover all such other variations or modifications.

Claims

1. A method for segmenting farmland plot boundaries, characterized in that the method includes the following operations: S1. Acquire multi-source satellite remote sensing images and corresponding farmland plot vector annotation data, and assign to each satellite remote sensing image a two-dimensional conditional label including a resolution level and a geometric complexity level; construct a two-dimensional spatial grid based on the resolution level and the complexity level, count the sample size of each grid and allocate a corresponding sampling quota, and sample the multi-source satellite remote sensing images under two-level constraints according to the sampling quota to obtain a training subset; S2. Train the teacher model using the training subset to obtain the trained teacher network; for the trained teacher network and the student model, perform knowledge distillation by source domain training and target domain transfer in sequence to obtain the distilled student model; the teacher model includes: the backbone network and neck network of the YOLOv11 network, and a conditional feature adaptation layer located after the backbone network and neck network and before the output layer; the conditional feature adaptation layer is used to adaptively adjust the feature map according to the two-dimensional conditional label, and a boundary-aware auxiliary branch output is set in the output layer to output a boundary mask map and a boundary distance map; the adaptive adjustment includes resolution adaptation and complexity adaptation; resolution adaptation is used to perform boundary enhancement or structural enhancement on the corresponding feature map according to the resolution level in the two-dimensional conditional label to obtain a resolution-adapted feature map, the feature map being the output of the backbone network and the feature pyramid neck network; complexity adaptation is used to perform boundary separability enhancement or internal path denoising on the corresponding feature map or resolution-adapted feature map according to the geometric complexity level in the two-dimensional conditional label to obtain a complexity-adapted feature map; the student model is a lightweight YOLOv11 network with fewer layers and channels than the YOLOv11 network; S3. Use the distilled student model to process the satellite remote sensing image to be segmented to obtain the farmland plot instance segmentation results.

2. The method for segmenting farmland plot boundaries according to claim 1, characterized in that the two-level constraint sampling in S1 includes: for grids with a sample size not exceeding the corresponding sampling quota, all samples are selected; for grids with more samples than the corresponding sampling quota, K-means clustering is performed on the representation vectors of all samples to obtain several clusters; within each cluster, the samples are grouped according to their geographical regions, and redundancy removal is performed based on the intra-cluster similarity threshold; according to the grid's sampling quota, a corresponding number of samples are selected from the redundancy-removed candidate samples in ascending order of distance from the cluster center, and these samples are used as the grid's training samples.

3. The method for segmenting farmland plot boundaries according to claim 1, characterized in that the geometric complexity level in S1 is obtained as follows: based on the farmland plot vector annotation data of the satellite remote sensing images, geometric complexity features are calculated to form a comprehensive complexity index, and the comprehensive complexity index is then divided into levels to obtain the geometric complexity level; the geometric complexity features include at least: plot density, mean perimeter-to-area ratio, compactness, and boundary length density; each geometric complexity feature is normalized, and a weighted sum of the normalized features gives the comprehensive complexity index.

4. The method for segmenting farmland plot boundaries according to claim 1, characterized in that, in the resolution adaptation operation, for high-resolution images, the feature maps corresponding to the high-resolution level in the two-dimensional conditional label are processed for boundary localization to obtain a boundary confidence map representing the boundary region; based on the boundary confidence map, the boundary direction is estimated within the boundary region, and the boundary direction is discretized into several direction categories to obtain a direction category map; based on the direction category map, multiple sets of directional convolution, attention processing, or a weighted fusion of multiple sets of directional convolution and attention processing are performed on the boundary region to obtain a resolution-adapted feature map; a high-resolution image is an image whose resolution level in the two-dimensional conditional label is the highest level.

5. The method for segmenting farmland plot boundaries according to claim 1, characterized in that, when training the teacher model in S2, a boundary-aware auxiliary branch output loss is applied to the boundary-aware auxiliary branch output; the boundary-aware auxiliary branch output loss is a weighted sum of the bidirectional contour transport loss and the hierarchical ranking distance field loss; the bidirectional contour transport loss is obtained based on the shortest distance from the predicted boundary to the true boundary and the shortest distance from the true boundary to the predicted boundary; the hierarchical ranking distance field loss is obtained based on the point-state zeroing loss and the hierarchical ranking loss.

6. The farmland plot boundary instance segmentation method according to claim 1, characterized in that, in the knowledge distillation process of S2, for each training sample, the teacher model and the student model each generate a detection category output, an instance mask output, and a boundary information output, and a distillation training objective function is constructed as a weighted combination of a category prediction distillation loss, a mask distillation loss, and a boundary feature distillation loss, the objective function being used to train the student model to obtain the distilled student model.
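The weighted distillation objective of claim 6 might be sketched as below. The concrete loss forms are assumptions that the claim does not specify: a temperature-softened KL divergence for the category outputs and a mean squared error for the mask and boundary outputs. The dictionary keys and default weights are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q), with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two flattened maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_objective(
    teacher: Dict[str, Sequence[float]],
    student: Dict[str, Sequence[float]],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
    temperature: float = 2.0,
) -> float:
    """Weighted sum of category, mask, and boundary distillation terms."""
    w_cls, w_mask, w_boundary = weights
    cls_loss = kl_divergence(
        softmax(teacher["cls_logits"], temperature),
        softmax(student["cls_logits"], temperature),
    )
    mask_loss = mse(teacher["mask"], student["mask"])
    boundary_loss = mse(teacher["boundary"], student["boundary"])
    return w_cls * cls_loss + w_mask * mask_loss + w_boundary * boundary_loss
```

In a real training loop the same structure would operate on tensors with automatic differentiation; plain lists are used here only to keep the sketch self-contained.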

7. A farmland plot boundary instance segmentation system, used to implement the farmland plot boundary instance segmentation method of claim 1, characterized in that it comprises: a training subset generation module, used to acquire multi-source satellite remote sensing images and the corresponding farmland plot vector annotation data, and to assign to each satellite remote sensing image a two-dimensional conditional label comprising a resolution level and a geometric complexity level; a two-dimensional grid is constructed from the resolution levels and the geometric complexity levels, the number of samples in each grid cell is counted and a corresponding sampling quota is allocated, and the multi-source satellite remote sensing images are sampled under two-level constraints according to the sampling quotas to obtain a training subset; a distilled student model generation module, used to train the teacher model with the training subset to obtain the training teacher network, and to perform knowledge distillation on the training teacher network and the student model, sequentially for source domain training and target domain transfer, to obtain the distilled student model; the teacher model includes the backbone network and the feature pyramid neck network of the YOLOv11 network, and a conditional feature adaptation layer located after the backbone network and the neck network and before the output layer; the conditional feature adaptation layer is used to adaptively adjust the feature map according to the two-dimensional conditional label, and a boundary perception auxiliary branch is provided in the output layer to output a boundary mask map and a boundary distance map; the student model is a lightweight YOLOv11 network with fewer network layers and channels than the YOLOv11 network; and a farmland plot instance segmentation result generation module, used to process the satellite remote sensing image to be segmented with the distilled student model to obtain the farmland plot instance segmentation results.
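The grid-based quota sampling performed by the training subset generation module can be illustrated with a minimal sketch. The quota rule used here (the same quota for every non-empty grid cell, capped by the cell size) is an assumption, as are the dictionary keys and the function name; the claim states only that a quota is allocated per grid cell.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_by_grid(
    samples: List[Dict[str, int]], per_cell_quota: int, seed: int = 0
) -> List[Dict[str, int]]:
    """Draw up to per_cell_quota samples from each (resolution, complexity) cell."""
    rng = random.Random(seed)  # seeded for reproducible subsets

    # Group images by their two-dimensional conditional label.
    cells: Dict[Tuple[int, int], List[Dict[str, int]]] = defaultdict(list)
    for sample in samples:
        key = (sample["resolution_level"], sample["complexity_level"])
        cells[key].append(sample)

    subset: List[Dict[str, int]] = []
    for key in sorted(cells):
        members = cells[key]
        quota = min(per_cell_quota, len(members))  # cannot exceed the cell size
        subset.extend(rng.sample(members, quota))
    return subset
```

Capping every cell at the same quota keeps densely populated cells from dominating the training subset, while sparse cells contribute all the images they have.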

8. A farmland plot boundary instance segmentation device, characterized in that it comprises a processor and a memory, wherein the processor executes a computer program stored in the memory to implement the farmland plot boundary instance segmentation method according to any one of claims 1-6.

9. A computer-readable storage medium, characterized in that it stores a computer program, wherein the computer program, when executed by a processor, implements the farmland plot boundary instance segmentation method according to any one of claims 1-6.