Crop remote sensing image segmentation method based on convolutional neural network

A crop remote sensing image segmentation method based on convolutional neural networks combines topographic and spectral correction with growth-gradient dual-branch decoding. It addresses insufficient segmentation accuracy for fragmented farmland in the hilly, foggy mountainous areas of southern China, achieving high-precision field boundary positioning and crop monitoring.

CN122244679A · Pending · Publication Date: 2026-06-19 · HENAN BAITUO INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Application (China)
Current Assignee / Owner
HENAN BAITUO INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-03-31
Publication Date
2026-06-19

Figure CN122244679A_ABST

Abstract

This invention relates to the fields of agricultural remote sensing monitoring and digital image processing, and particularly to a crop remote sensing image segmentation method based on convolutional neural networks, comprising the following steps: S1, input and preprocessing of remote sensing images and DEM data; S2, topographic and spectral correction; S3, terrain-adaptive convolutional coding; S4, growth gradient dual-branch decoding; S5, output of topologically constrained segmentation results. This invention establishes a joint topographic and spectral correction mechanism, simultaneously achieving geometric inverse correction of topographic distortion, elevation-layered spectral calibration, topographically constrained cloud and fog repair, and adaptive spectral compensation. It eliminates errors caused by topographic undulations and cloud / fog interference from both geometric and spectral dimensions, effectively reducing field boundary misalignment rates and improving the classification accuracy of foggy areas.

Description

Technical Field

[0001] This invention relates to the fields of agricultural remote sensing monitoring and digital image processing, and in particular to a crop remote sensing image segmentation method based on convolutional neural networks.

Background Technology

[0002] The hilly and mountainous areas of southern China are among the core production areas for grain crops such as rice and corn. Arable land in this region is characterized by fragmented plots, scattered distribution, significant topographic relief, substantial local differences in water and fertilizer conditions, and frequent cloudy and foggy weather throughout the year. Fragmented plots smaller than 100 square meters account for more than 40% of this land and are a crucial component of the region's arable land resources. Accurate identification and dynamic monitoring of these fragmented plots are the foundation for implementing arable land protection policies, promoting the construction of high-standard farmland, estimating crop growth and yield, and ensuring food security.

[0003] Existing remote sensing farmland segmentation technologies achieve high recognition accuracy in the regular farmland of plains. In the fragmented farmland of the hilly, foggy mountainous areas of southern China, however, the following technical challenges leave segmentation accuracy severely insufficient. The terrain is highly undulating, so during remote sensing imaging steep slopes undergo severe perspective stretching distortion, causing pixel misalignment and field boundary shifts. Existing terrain correction methods mostly use radiometric schemes such as cosine correction and C-correction, which remove only the illumination deviations caused by terrain and cannot resolve the pixel misalignment caused by geometric distortion. Furthermore, general semantic segmentation models use fixed rectangular convolution kernels that cannot adapt to terrain deformation, so the extracted crop features are severely distorted and field boundary localization errors are large. This application therefore proposes a crop remote sensing image segmentation method based on convolutional neural networks.

Summary of the Invention

[0004] The purpose of this invention is to address the shortcomings of existing technologies by proposing a crop remote sensing image segmentation method based on convolutional neural networks.

[0005] To achieve the above objectives, the present invention adopts the following technical solution:

[0006] The crop remote sensing image segmentation method based on convolutional neural networks includes the following steps:

[0007] S1. Input and preprocessing of remote sensing images and DEM data: Collect multispectral remote sensing images of the target area in hilly and mountainous terrain, acquiring raster data in four bands: red (R), green (G), blue (B), and near-infrared (NIR). At the same time, collect 30 m resolution digital elevation model (DEM) data covering the same target area. Perform pixel-by-pixel registration, radiometric calibration, and atmospheric correction on the remote sensing images and DEM data.

[0008] S2. Topographic and spectral correction: Based on the registered remote sensing images and DEM data, perform in sequence inverse geometric correction of topographic distortion, elevation-stratified spectral consistency calibration, repair of cloud- and fog-obscured areas, and adaptive compensation of residual cloud and fog spectra, generating a 6-channel topographic and spectral fusion feature map free of geometric distortion and spectral bias.

[0009] S3. Terrain-adaptive convolutional encoding: Perform terrain-guided deformable convolutional feature extraction, elevation-stratified pooling aggregation, and channel-attention weighting on the fused feature map, extracting both deep semantic features of crops and NDVI growth gradient features, and output multi-scale encoded feature maps.

[0010] S4. Growth gradient dual-branch decoding: Construct a dual-path decoding structure with a main branch for category segmentation and an auxiliary branch for growth gradient. Perform synchronous upsampling and feature fusion on the multi-scale encoded feature map. Through dual-branch cross-supervision constraints, eliminate pixel confusion and field boundary offset caused by uneven crop growth, and output the optimized dual-branch fused segmentation feature map.

[0011] S5. Output of topologically constrained segmentation results: Perform pixel classification normalization and connected component extraction on the dual-branch fused segmentation feature map. Based on the topological constraint rules for fragmented farmland, complete connected component verification and false target removal, and output the final farmland crop segmentation mask.

[0012] Preferably, in step S1, the remote sensing image is a UAV multispectral remote sensing image with a spatial resolution ≤0.5m, and the image size is uniformly normalized to 512×512 pixels;

[0013] The DEM data are resampled to the spatial resolution of the remote sensing image by bicubic interpolation, and after registration the geometric root mean square error (RMSE) between the two is kept within 1 pixel.
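The 1-pixel registration criterion can be checked with a minimal sketch like the one below. It assumes the RMSE is measured over matched control-point pairs expressed in image pixel coordinates; the helper names `registration_rmse` and `registration_ok` are hypothetical, since the text does not say how the RMSE is evaluated.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def registration_rmse(image_pts: Sequence[Point], dem_pts: Sequence[Point]) -> float:
    """RMSE (in pixels) between matched control points.

    Hypothetical helper: assumes each image control point has a matched DEM
    control point, both expressed in the image pixel grid.
    """
    if len(image_pts) != len(dem_pts) or not image_pts:
        raise ValueError("need two non-empty lists of matched points")
    sq = [(xi - xd) ** 2 + (yi - yd) ** 2
          for (xi, yi), (xd, yd) in zip(image_pts, dem_pts)]
    return math.sqrt(sum(sq) / len(sq))


def registration_ok(image_pts, dem_pts, max_rmse_px: float = 1.0) -> bool:
    # Acceptance criterion from the text: RMSE within 1 pixel.
    return registration_rmse(image_pts, dem_pts) <= max_rmse_px
```

For example, one point displaced by (0.6, 0.8) pixels and one exact match give an RMSE of about 0.71 pixels, which passes the check.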

[0014] Preferably, the joint topographic and spectral correction in step S2 is performed through the following sub-steps: S201, terrain parameter calculation: based on the registered DEM data, the slope Slp (0° to 90°) and aspect Asp (0° to 360°) of the target area are calculated pixel by pixel, and a pixel-by-pixel terrain distortion matrix M is generated. In the formula for M, (i, j) denotes the pixel coordinates, M(i, j) is the topographic distortion coefficient at (i, j), Slp(i, j) is the slope at (i, j), and the distortion correction baseline coefficient is set to 1.0;

[0015] S202, inverse correction of terrain distortion: based on the terrain distortion matrix M, inverse coordinate offset correction is applied pixel by pixel to the R, G, B, and NIR band data.

[0016] S203. Elevation-stratified spectral consistency calibration: The target area is divided into three layers by DEM elevation: a 0-200 m low-altitude gentle-slope layer, a 200-400 m mid-altitude gentle-slope layer, and a 400-600 m high-altitude steep-slope layer. Within each layer, the within-band mean and variance of the corrected four-band data are standardized to eliminate the spectral radiation deviation caused by illumination differences between elevations.

[0017] S204. Cloud and fog mask generation and occlusion area marking: Generate a binary cloud and fog mask based on the spectral ratio of near-infrared and blue light bands.

[0018] S205. Terrain-constrained cloud and fog occlusion repair: For each marked cloud- or fog-occluded pixel, the unoccluded valid pixels in its 3×3 neighborhood that share its elevation layer and aspect are selected, and the four-band values of the occluded pixel are filled by inverse distance weighting, completing the initial cloud and fog removal.

[0019] S206. Adaptive compensation of residual cloud and fog spectra: The four-band spectral mean of pure crop pixels within each elevation layer is extracted as a reference value, and residual spectral deviation compensation is applied to the pixels after the initial repair. In the compensation formula, the compensated band pixel value is obtained from the band pixel value after the initial repair using an adaptive compensation coefficient;

[0020] S207. Generation of fused topographic and spectral feature map: The compensated and corrected R, G, B, and NIR four-band data are stitched together with the pixel-by-pixel slope Slp and aspect Asp data to generate a 6-channel fused topographic and spectral feature map.

[0021] Preferably, the adaptive compensation coefficient in step S206 is calculated from a base compensation coefficient, set to 0.7, and a per-pixel cloud and fog residual value ranging from 0 to 1. The residual value is in turn calculated from the near-infrared band value of the current pixel after repair and the mean near-infrared value of pure crop pixels within the corresponding elevation layer.

[0022] Preferably, the terrain-adaptive convolutional coding in step S3 is specifically performed through the following sub-steps: S301, terrain-guided deformable convolutional feature extraction: input the 6-channel terrain and spectral fusion feature map into the coding network, use 3×3 deformable convolution as the basic convolutional unit, and subject the offset of the deformable convolution to terrain constraints;

[0023] S302, Elevation-level Layered Pooling Feature Aggregation: The feature map output by the convolution is masked according to the three elevation levels in step S203. A 3×3 fusion pooling operation is performed independently within each level. The fusion pooling is a fusion of average pooling and max pooling with a weight of 1:1 and a pooling step size of 2. Cross-regional feature aggregation is not performed between different elevation levels, and the independent semantic features of the fragmented fields are preserved.

[0024] S303, Channel Attention Weighted Optimization: For the feature map after hierarchical pooling, the SE channel attention module is used to learn the weight coefficients of each feature channel, strengthen the high weight channels corresponding to crop spectrum and growth characteristics, suppress the low weight channels corresponding to soil, rock and water background, and output the optimized multi-scale deep features.

[0025] S304, NDVI growth gradient feature extraction: The pixel-wise normalized difference vegetation index (NDVI) is calculated from the corrected remote sensing image, and the gradient magnitude of the NDVI image, computed with the 3×3 Sobel operator, is used as the growth gradient feature. This feature is concatenated with the deep semantic features to output the final multi-scale encoded feature map.

[0026] Preferably, the growth gradient dual-branch decoding in step S4 is performed through the following sub-steps: S401, class segmentation main-branch decoding: with the goal of binary crop/background classification, four successive deconvolution upsampling operations are applied to the multi-scale encoded feature map. At each upsampling step, the shallow edge features of the corresponding scale from the encoding stage are fused through skip connections. The branch finally outputs a pixel-level crop/background classification probability map of the same size as the input image, with values from 0 to 1.

[0027] S402 Growth Gradient Assisted Branch Decoding: With the goal of field boundary identification, it simultaneously performs four same-scale deconvolution upsampling operations on the multi-scale encoded feature map to specifically extract the abrupt change features of the NDVI growth gradient, and finally outputs a growth boundary probability map with the same size as the input image, with a value range of 0~1.

[0028] S403 Dual-Branch Cross-Supervision and Boundary Alignment: Construct a joint loss function to perform end-to-end supervised training on dual branches, forcing the segmentation boundary of the main branch to be pixel-level aligned with the growth mutation boundary of the auxiliary branch, eliminating pixel confusion caused by growth differences within the same field, and boundary adhesion caused by similar growth between adjacent fields.

[0029] S404 Dual-Branch Feature Fusion: The classification probability map output by the main branch of category segmentation and the boundary probability map output by the growth gradient auxiliary branch are weighted and fused with a weight ratio of 0.7:0.3 to generate an optimized dual-branch fused segmentation feature map.

[0030] Preferably, the topology constraint segmentation result output in step S5 is specifically executed through the following sub-steps:

[0031] S501, Pixel Classification Normalization: Softmax normalization is applied to the dual-branch fused segmentation feature map, mapping pixel values to classification probabilities from 0 to 1. Using 0.5 as the classification threshold, pixels with probability ≥ 0.5 are marked as crop (cultivated land) pixels and pixels with probability < 0.5 as background pixels, generating a preliminary binary segmentation.

[0032] S502. Connected Component Extraction and Topology Verification: Perform 8-neighborhood connected component analysis on the preliminary results of binary segmentation to extract all independent connected components, and verify each connected component based on the topology constraint rules of fragmented farmland.

[0033] S503, False Target Removal and Result Output: Retain all connected components that satisfy the topological constraint rules, remove false targets such as noise, field ridges, roads, and rocks that do not satisfy the rules, and generate the final farmland crop segmentation mask result.

[0034] Preferably, the topological constraint rules for fragmented farmland in step S502 are as follows: the connected component area is ≥16 pixels; the aspect ratio of its bounding rectangle is ≤4:1; its convexity is ≥0.6; and the mean growth gradient magnitude within it is ≤0.1.

[0035] The present invention has the following beneficial effects:

[0036] 1. This invention establishes a joint correction of topography and spectral data, simultaneously achieving geometric reverse correction of topographic distortion, elevation-layered spectral calibration, cloud and fog repair under topographic constraints, and adaptive spectral compensation. It eliminates errors caused by topographic undulations and cloud and fog interference from both geometric and spectral dimensions, effectively reducing the misalignment rate of field boundaries and improving the classification accuracy of cloud and fog areas.

[0037] 2. By cross-supervision between the main branch of category segmentation and the auxiliary branch of growth gradient, the crop growth boundary is used as a strong constraint for segmentation, which solves the problems of field confusion, missed detection, and adhesion caused by uneven crop growth.

[0038] 3. By combining the two techniques above, errors caused by terrain and fog are eliminated while crop growth boundary features are enhanced. For fragmented farmland in the hilly, foggy mountainous areas of southern China, this improves the overall segmentation mIoU as well as field extraction and positioning accuracy, meeting the needs of precise monitoring of complex farmland in these areas.

Attached Figure Description

[0039] Figure 1 is a flowchart of the crop remote sensing image segmentation method based on convolutional neural networks proposed in this invention.

[0040] Figure 2 is a schematic diagram of the logical structure of the growth gradient dual-branch decoding and cross-supervision in this invention.

[0041] Figure 3 shows a multispectral remote sensing image used by the crop remote sensing image segmentation method based on convolutional neural networks proposed in this invention.

[0042] Figure 4 shows a topographic and spectral fusion feature map produced by the crop remote sensing image segmentation method based on convolutional neural networks proposed in this invention.

Detailed Implementation

[0043] The technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments.

[0044] Referring to Figures 1-4, a crop remote sensing image segmentation method based on convolutional neural networks includes the following steps:

[0045] S1. Input and preprocessing of remote sensing images and DEM data: Collect multispectral remote sensing images of the target area in hilly and mountainous terrain, acquiring raster data in four bands: red (R), green (G), blue (B), and near-infrared (NIR). At the same time, collect 30 m resolution digital elevation model (DEM) data covering the same target area. Perform pixel-by-pixel registration, radiometric calibration, and atmospheric correction on the remote sensing images and DEM data.

[0046] S2. Topographic and spectral correction: Based on the registered remote sensing images and DEM data, perform in sequence inverse geometric correction of topographic distortion, elevation-stratified spectral consistency calibration, repair of cloud- and fog-obscured areas, and adaptive compensation of residual cloud and fog spectra, generating a 6-channel topographic and spectral fusion feature map free of geometric distortion and spectral bias.

[0047] Joint topographic and spectral correction is performed through the following sub-steps: S201, topographic parameter calculation: based on the registered DEM data, the slope Slp (0° to 90°) and aspect Asp (0° to 360°) of the target area are calculated pixel by pixel, and a pixel-by-pixel topographic distortion matrix M is generated. In the formula for M, (i, j) denotes the pixel coordinates, M(i, j) is the topographic distortion coefficient at (i, j) and ranges from 0 to 1, Slp(i, j) is the slope at (i, j), and the distortion correction baseline coefficient is set to 1.0;
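Slope and aspect can be derived from the DEM by finite differences, as in the minimal NumPy sketch below. It assumes rows increase southward, aspect is the downslope direction measured clockwise from north, and flat cells report aspect 0. Because the formula for M is not reproduced here, `distortion_matrix` uses M = k0 · cos(Slp) purely as a hypothetical stand-in that stays in [0, 1] for the stated baseline coefficient k0 = 1.0.

```python
import numpy as np


def slope_aspect(dem: np.ndarray, cell_size: float):
    """Per-pixel slope (0-90 deg) and aspect (0-360 deg, clockwise from north).

    Assumes row index increases southward and column index eastward;
    aspect is the downslope direction and is reported as 0 on flat cells.
    """
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    slope = np.degrees(np.arctan(np.hypot(dz_dcol, dz_drow)))
    # Downslope vector: east component -dz/dcol, north component +dz/drow.
    aspect = np.degrees(np.arctan2(-dz_dcol, dz_drow)) % 360.0
    return slope, aspect


def distortion_matrix(slope_deg: np.ndarray, k0: float = 1.0) -> np.ndarray:
    """HYPOTHETICAL stand-in for the distortion matrix M (formula not given):
    M = k0 * cos(Slp), which lies in [0, 1] for k0 = 1.0."""
    return k0 * np.cos(np.radians(slope_deg))
```

On a DEM that rises 1 m per 1 m cell toward the east, this gives a 45° slope facing west (aspect 270°).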

[0048] S202. Inverse correction of terrain distortion: Based on the terrain distortion matrix M, inverse coordinate offset correction is performed pixel-by-pixel on the R, G, B, and NIR band remote sensing data. Specifically, the correction algorithm formula is as follows:

[0049] In the formula, the terms are the original pixel coordinates, the corrected pixel coordinates, and the unit normal vector corresponding to the slope aspect. The reverse offset along this vector counteracts the perspective stretching caused by terrain undulation, eliminating, from a geometric perspective, the pixel stretching misalignment and field boundary offset caused by hilly terrain.
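The mechanics of applying a per-pixel offset along the aspect direction can be sketched as a nearest-neighbour backward warp. The offset magnitude in pixels is taken as a hypothetical input `magnitude`, because the formula linking it to M is not reproduced in this text. The sign convention (sampling from the position displaced along the aspect vector) is also an assumption.

```python
import numpy as np


def aspect_unit_vector(aspect_deg):
    """Unit vector (d_row, d_col) of the aspect direction, clockwise from
    north, with rows increasing southward."""
    a = np.radians(aspect_deg)
    return -np.cos(a), np.sin(a)


def inverse_offset_correct(band, aspect_deg, magnitude):
    """Backward-warp one band: output pixel (i, j) samples the input at
    (i, j) + magnitude * aspect unit vector (nearest neighbour, clipped).

    `magnitude` (pixels) is a HYPOTHETICAL input; how the source derives it
    from the distortion matrix M is not reproduced here.
    """
    h, w = band.shape
    d_row, d_col = aspect_unit_vector(aspect_deg)
    rows, cols = np.indices((h, w))
    src_r = np.clip(np.rint(rows + magnitude * d_row), 0, h - 1).astype(int)
    src_c = np.clip(np.rint(cols + magnitude * d_col), 0, w - 1).astype(int)
    return band[src_r, src_c]
```

The same call is applied to each of the R, G, B, and NIR bands; a zero magnitude leaves a band unchanged.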

[0050] S203. Elevation-stratified spectral consistency calibration: The target area is divided into three layers by DEM elevation: a 0-200 m low-altitude gentle-slope layer, a 200-400 m mid-altitude gentle-slope layer, and a 400-600 m high-altitude steep-slope layer. Within each layer, the within-band mean and variance of the corrected four-band data are standardized to eliminate the spectral radiation deviation caused by illumination differences between elevations.
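One reading of "standardizing the within-band mean and variance" is a per-layer z-score, sketched below under that assumption. Pixels above 600 m, which the text does not cover, are assigned to the top layer here. Both choices belong to this sketch, not to the source.

```python
import numpy as np

LAYER_EDGES_M = (200.0, 400.0)  # boundaries of the 0-200 / 200-400 / 400-600 m layers


def elevation_layers(dem):
    """Layer index 0, 1 or 2 per pixel. Elevations above 600 m fall into
    layer 2 here (an assumption; the text only defines 0-600 m)."""
    return np.digitize(dem, LAYER_EDGES_M)


def stratified_standardize(bands, dem, eps=1e-6):
    """Z-score each band within each elevation layer (assumed interpretation).

    bands: (C, H, W) array; dem: (H, W) elevations in metres.
    """
    layers = elevation_layers(dem)
    out = np.empty_like(bands, dtype=float)
    for k in np.unique(layers):
        m = layers == k
        vals = bands[:, m]                      # (C, N) pixels of this layer
        mean = vals.mean(axis=1, keepdims=True)
        std = vals.std(axis=1, keepdims=True)
        out[:, m] = (vals - mean) / (std + eps)
    return out
```

The layer masks from `elevation_layers` can be reused for the layered pooling in S302 and the per-layer references in S206.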

[0051] S204. Cloud and Fog Mask Generation and Occlusion Area Marking: Based on the spectral ratio of the near-infrared and blue light bands, a binary cloud and fog mask is generated; the binary cloud and fog mask is generated according to the following formula:

[0052] In the formula, the binary cloud and fog mask value at pixel coordinates (i, j) can only be 0 or 1; the near-infrared remote sensing reflectance at (i, j) ranges from 0 to 1; and the blue-band remote sensing reflectance at (i, j) ranges from 0 to 1. A mask value of 1 indicates a pixel obscured by cloud or fog, and a mask value of 0 indicates a valid, unoccluded pixel;
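A ratio-based mask can be sketched as follows. The threshold and the direction of the inequality are not given in this text, so `ratio_threshold=1.2` and the "ratio below threshold means cloud" rule are hypothetical. The rule rests on the general observation that cloud and fog reflect fairly evenly across blue and NIR, whereas vegetation has NIR far above blue.

```python
import numpy as np


def cloud_fog_mask(nir, blue, ratio_threshold=1.2, eps=1e-6):
    """Binary cloud/fog mask from the NIR/Blue reflectance ratio (both in [0, 1]).

    ASSUMPTIONS: the source's threshold and inequality are not reproduced;
    here a pixel is flagged (1) when NIR / Blue < ratio_threshold.
    """
    ratio = nir / (blue + eps)
    return (ratio < ratio_threshold).astype(np.uint8)
```

A vegetated pixel with NIR 0.5 and blue 0.05 (ratio 10) stays valid, while a bright, flat pixel with NIR 0.4 and blue 0.45 is flagged.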

[0053] S205. Terrain-constrained cloud and fog occlusion repair: For each marked cloud- or fog-occluded pixel, the unoccluded valid pixels in its 3×3 neighborhood that share its elevation layer and aspect are selected, and the four-band values of the occluded pixel are filled by inverse distance weighting, completing the initial cloud and fog removal.
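The 3×3 inverse-distance-weighted fill can be sketched as below. "Same aspect" is represented by a hypothetical integer `aspect_class` array (for example, aspect binned into 8 sectors), since the text does not say how aspects are compared. Occluded pixels with no valid neighbour are left unchanged.

```python
import math
import numpy as np


def idw_repair(bands, mask, layers, aspect_class):
    """Fill cloud/fog pixels (mask == 1) by inverse distance weighting over
    the 3x3 neighbourhood, using only unmasked neighbours in the same
    elevation layer and the same aspect class.

    bands: (C, H, W); mask, layers, aspect_class: (H, W) integer arrays.
    `aspect_class` is a HYPOTHETICAL discretisation of aspect.
    """
    out = bands.astype(float)
    _, h, w = bands.shape
    for i, j in zip(*np.nonzero(mask)):
        num = np.zeros(bands.shape[0])
        den = 0.0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                r, c = i + di, j + dj
                if (di, dj) == (0, 0) or not (0 <= r < h and 0 <= c < w):
                    continue
                if (mask[r, c] or layers[r, c] != layers[i, j]
                        or aspect_class[r, c] != aspect_class[i, j]):
                    continue
                wgt = 1.0 / math.hypot(di, dj)  # 1 for edge, 1/sqrt(2) for corner neighbours
                num += wgt * bands[:, r, c]
                den += wgt
        if den > 0:
            out[:, i, j] = num / den
    return out
```

Weights use the original (unrepaired) neighbour values, so the fill order does not affect the result.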

[0054] S206. Adaptive compensation of residual cloud and fog spectra: The four-band spectral mean of pure crop pixels within each elevation layer is extracted as a reference value, and residual spectral deviation compensation is applied to the pixels after the initial repair. In the compensation formula, the compensated band pixel value is obtained from the band pixel value after the initial repair using an adaptive compensation coefficient;

[0055] The adaptive compensation coefficient in step S206 is calculated from a base compensation coefficient, set to 0.7, and a per-pixel cloud and fog residual value ranging from 0 to 1. The residual value is in turn calculated from the near-infrared band value of the current pixel after repair and the mean near-infrared value of pure crop pixels within the corresponding elevation layer.

[0056] This compensation mechanism adjusts the compensation strength dynamically according to the slope and the degree of cloud and fog residue: the gentler the slope and the greater the residue, the stronger the compensation, thereby eliminating residual spectral deviation in thin cloud and fog areas.
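The behaviour described for S206 can be illustrated with a hypothetical formula. Compensation grows with the cloud and fog residue and shrinks on steeper slopes, with base coefficient 0.7. The specific choices below are assumptions, not the source's equations: residual = clipped relative NIR deviation from the layer reference, alpha = 0.7 · residual · cos(Slp), and a linear pull toward the reference mean.

```python
import numpy as np


def residual_degree(nir_repaired, nir_ref, eps=1e-6):
    """HYPOTHETICAL residual value in [0, 1]: relative NIR deviation from the
    pure-crop reference of the pixel's elevation layer."""
    return np.clip(np.abs(nir_repaired - nir_ref) / (nir_ref + eps), 0.0, 1.0)


def compensate(band_repaired, band_ref, nir_repaired, nir_ref, slope_deg, alpha0=0.7):
    """HYPOTHETICAL compensation: pull the repaired value toward the layer's
    pure-crop reference mean by alpha = alpha0 * residual * cos(slope).
    Gentler slopes and stronger residue give stronger compensation, matching
    the described behaviour; the source's exact formula is not reproduced."""
    alpha = alpha0 * residual_degree(nir_repaired, nir_ref) * np.cos(np.radians(slope_deg))
    return band_repaired + alpha * (band_ref - band_repaired)
```

With a residual of 0.5 on flat ground, alpha is 0.35, so a repaired value of 0.10 with reference 0.30 moves to about 0.17; at a 90° slope it is left essentially unchanged.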

[0057] S207. Generation of fused feature map of terrain and spectrum: The compensated and corrected R, G, B, and NIR four-band data are concatenated with the pixel-by-pixel slope Slp and aspect Asp data to generate a 6-channel fused feature map of terrain and spectrum, providing high-quality feature input with terrain constraints for subsequent encoding and decoding.

[0058] S3. Terrain-adaptive convolutional encoding: Perform terrain-guided deformable convolutional feature extraction, elevation-stratified pooling aggregation, and channel-attention weighting on the fused feature map, extracting both deep semantic features of crops and NDVI growth gradient features, and output multi-scale encoded feature maps.

[0059] Step S3, terrain-adaptive convolutional coding, is specifically executed through the following sub-steps: S301, Terrain-guided deformable convolutional feature extraction: The 6-channel terrain-spectral fusion feature map is input into the coding network, and a 3×3 deformable convolution is used as the basic convolutional unit. The offset of the deformable convolution is subject to terrain constraints; the specific constraints are as follows:

[0060] In the formula, the original learned offset of the deformable convolution is constrained by the slope to give the effective offset. This constraint lets the convolution kernel adapt to the terrain: the steeper the slope, the stricter the constraint on the offset. At a slope of 90°, the offset is 0 and the convolution degenerates into a standard 3×3 convolution. This avoids feature distortion from excessive convolution offsets on steep slopes while retaining the deformation adaptability of deformable convolution on flat ground, so the edge features of fragmented fields are extracted accurately.
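The offset constraint can be sketched as a slope-dependent scaling of the learned offsets. The factor cos(Slp) is a hypothetical choice that reproduces the stated behaviour: 1 on flat ground and 0 at 90°, where the convolution reduces to a standard 3×3 one. The text's actual formula is not reproduced.

```python
import numpy as np


def constrain_offsets(offsets, slope_deg):
    """Scale learned deformable-convolution offsets by a slope factor.

    offsets: (2*K, H, W) learned offsets for a K-tap kernel (K = 9 for 3x3);
    slope_deg: (H, W) slope in degrees, aligned with the offset grid.
    HYPOTHETICAL factor cos(Slp): offsets kept on flat ground, removed at 90 deg.
    """
    factor = np.cos(np.radians(np.clip(slope_deg, 0.0, 90.0)))
    return offsets * factor[None, :, :]
```

In a PyTorch implementation, the constrained offsets could be passed as the `offset` argument of `torchvision.ops.deform_conv2d`. This sketch stays in NumPy to show only the scaling step.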

[0061] S302, Elevation-level Layered Pooling Feature Aggregation: The feature map output by convolution is masked according to the three elevation levels in step S203. A 3×3 fusion pooling operation is performed independently within each level. The fusion pooling is a fusion of average pooling and max pooling with a weight of 1:1 and a pooling stride of 2. Cross-regional feature aggregation is not performed between different elevation levels to preserve the independent semantic features of fragmented fields. This can prevent the features of adjacent fields from sticking together.
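The layered fusion pooling can be sketched as follows. It uses a 3×3 window, stride 2, and 0.5 × average + 0.5 × max computed only over pixels of one elevation layer, so windows never mix layers. Returning one pooled map per layer, with no padding, and zero-filling windows that contain none of a layer's pixels are assumptions of this sketch.

```python
import numpy as np


def layered_fusion_pool(feat, layers, size=3, stride=2):
    """3x3, stride-2 pooling = 0.5 * average + 0.5 * max, computed separately
    for each elevation layer.

    feat: (C, H, W) features; layers: (H, W) layer index (0, 1, 2).
    Returns {layer: (C, H_out, W_out)}. Windows with no pixel of a layer stay 0.
    """
    c, h, w = feat.shape
    h_out = (h - size) // stride + 1
    w_out = (w - size) // stride + 1
    result = {}
    for k in np.unique(layers):
        pooled = np.zeros((c, h_out, w_out))
        for oi in range(h_out):
            for oj in range(w_out):
                r, s = oi * stride, oj * stride
                m = layers[r:r + size, s:s + size] == k
                if not m.any():
                    continue
                win = feat[:, r:r + size, s:s + size][:, m]  # (C, N) same-layer pixels
                pooled[:, oi, oj] = 0.5 * win.mean(axis=1) + 0.5 * win.max(axis=1)
        result[int(k)] = pooled
    return result
```

For a 3×3 window holding 0-8 in a single layer, the pooled value is 0.5 × 4 + 0.5 × 8 = 6.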

[0062] S303, Channel Attention Weighted Optimization: For the feature map after hierarchical pooling, the SE channel attention module is used to learn the weight coefficients of each feature channel, strengthen the high weight channels corresponding to crop spectrum and growth characteristics, suppress the low weight channels corresponding to soil, rock and water background, and output the optimized multi-scale deep features.
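A forward pass of a standard Squeeze-and-Excitation block is sketched below: global average pooling, then FC-ReLU-FC-sigmoid, then per-channel rescaling. The weights, biases, and reduction ratio are placeholders; in the method they would be learned during training.

```python
import numpy as np


def se_block(feat, w1, b1, w2, b2):
    """Squeeze-and-Excitation channel attention (forward pass only).

    feat: (C, H, W); w1: (C // r, C), b1: (C // r,); w2: (C, C // r), b2: (C,).
    The reduction ratio r and all weights are placeholders for learned values.
    """
    z = feat.mean(axis=(1, 2))                # squeeze: global average pooling -> (C,)
    s = np.maximum(w1 @ z + b1, 0.0)          # excitation: FC + ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))  # FC + sigmoid -> channel weights in (0, 1)
    return feat * a[:, None, None], a         # reweight each channel
```

Channels with high learned weights (crop spectrum and growth) pass through almost unchanged, while low-weight channels (soil, rock, water background) are attenuated.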

[0063] S304, NDVI growth gradient feature extraction: The pixel-wise normalized difference vegetation index (NDVI) is calculated from the corrected remote sensing image as follows:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - R}{\mathrm{NIR} + R}$$

[0064] The gradient magnitude of the NDVI image is calculated with the 3×3 Sobel operator as

$$G = \sqrt{G_x^2 + G_y^2}$$

where $G_x$ is the gradient of the NDVI image in the x-direction and $G_y$ is the gradient in the y-direction. This gradient magnitude is used as the growth gradient feature and concatenated with the deep semantic features to output the final multi-scale encoded feature map.
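The NDVI and its Sobel gradient magnitude can be computed with plain NumPy as sketched below. Edge replication at the image border and the small `eps` in the NDVI denominator are assumptions of this sketch. The kernels are applied as a correlation; flipping them would only change the gradient sign, not the magnitude.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def ndvi(nir, red, eps=1e-6):
    # NDVI = (NIR - R) / (NIR + R); eps guards against division by zero.
    return (nir - red) / (nir + red + eps)


def sobel_magnitude(img):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) with 3x3 Sobel kernels.
    Borders are handled by edge replication (an assumption)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            patch = p[di:di + h, dj:dj + w]  # neighbour at offset (di - 1, dj - 1)
            gx += SOBEL_X[di, dj] * patch
            gy += SOBEL_Y[di, dj] * patch
    return np.hypot(gx, gy)
```

For a horizontal ramp that increases by 1 per pixel, the unnormalized Sobel magnitude is 8 in the image interior. This scale matters when the result is compared against a fixed threshold such as the 0.1 growth gradient rule in S502.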

[0065] S4. Growth gradient dual-branch decoding: Construct a dual-path decoding structure with a main branch for category segmentation and an auxiliary branch for growth gradient. Perform synchronous upsampling and feature fusion on the multi-scale encoded feature map. Through dual-branch cross-supervision constraints, eliminate pixel confusion and field boundary offset caused by uneven crop growth, and output the optimized dual-branch fused segmentation feature map.

[0066] Step S4, the dual-branch decoding of the growth gradient, is specifically executed through the following sub-steps: S401 Class segmentation main branch decoding: With the goal of binary classification of crops and background, four layer-by-layer deconvolution upsampling operations are performed on the multi-scale encoded feature map. During each upsampling process, shallow edge features of the corresponding scale in the encoding stage are fused through skip connections. Finally, the pixel-level crop and background classification probability map with the same size as the input image is output, and the value range is 0~1.

[0067] S402 Growth Gradient Assisted Branch Decoding: With the goal of field boundary identification, it simultaneously performs four same-scale deconvolution upsampling operations on the multi-scale encoded feature map to specifically extract the abrupt change features of the NDVI growth gradient. Finally, it outputs a growth boundary probability map with the same size as the input image, with a value range of 0 to 1. The higher the probability value, the greater the possibility that the pixel is the field growth boundary.

[0068] S403, dual-branch cross-supervision and boundary alignment: A joint loss function is constructed for end-to-end supervised training of the two branches:

$$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{BCE}} + \lambda_2 \mathcal{L}_{\mathrm{MSE}} + \lambda_3 \mathcal{L}_{\mathrm{Dice}}$$

where $\mathcal{L}_{\mathrm{BCE}}$ is the binary cross-entropy loss of the main branch, constraining crop/background classification accuracy; $\mathcal{L}_{\mathrm{MSE}}$ is the growth gradient regression loss of the auxiliary branch, computed as mean squared error (MSE) to constrain field boundary positioning accuracy; and $\mathcal{L}_{\mathrm{Dice}}$ is the boundary alignment loss, computed as Dice loss, which forces the segmentation boundary of the main branch to align at the pixel level with the abrupt-change boundary of the auxiliary branch. The loss weights $\lambda_1$, $\lambda_2$, and $\lambda_3$ are set to 0.7, 0.2, and 0.1, respectively.

[0069] This cross-supervision mechanism makes the "growth boundary" a strong constraint on "category segmentation". It forces the segmentation boundary of the main branch to align at the pixel level with the growth-change boundary of the auxiliary branch, eliminating pixel confusion caused by growth differences within the same field as well as boundary adhesion caused by similar growth in adjacent fields.
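The joint loss can be evaluated numerically with the NumPy sketch below, using the stated weights 0.7, 0.2, and 0.1. The Dice term compares a boundary map derived from the main branch with the auxiliary branch's boundary probability map. The hypothetical `main_bnd` argument leaves that derivation (for example, an edge operator applied to the class probability map) to the caller, because the text does not specify it. A training implementation would use an autodiff framework instead.

```python
import numpy as np


def bce(p, y, eps=1e-7):
    # Binary cross-entropy, averaged over pixels; p is clipped for stability.
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def dice_loss(a, b, eps=1e-7):
    # 1 - soft Dice coefficient between two maps in [0, 1].
    return float(1 - (2 * np.sum(a * b) + eps) / (np.sum(a) + np.sum(b) + eps))


def joint_loss(seg_prob, seg_gt, bnd_prob, bnd_gt, main_bnd, w=(0.7, 0.2, 0.1)):
    """0.7 * BCE(main) + 0.2 * MSE(auxiliary) + 0.1 * Dice(main boundary, auxiliary boundary).

    `main_bnd` is a HYPOTHETICAL boundary map derived from the main branch.
    """
    l_seg = bce(seg_prob, seg_gt)
    l_grad = float(np.mean((bnd_prob - bnd_gt) ** 2))
    l_align = dice_loss(main_bnd, bnd_prob)
    return w[0] * l_seg + w[1] * l_grad + w[2] * l_align
```

With perfect predictions all three terms are close to 0. A main branch that outputs 0.5 everywhere for an all-crop target contributes 0.7 · ln 2 ≈ 0.485.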

[0070] S404 Dual-Branch Feature Fusion: The classification probability map output by the main branch of category segmentation and the boundary probability map output by the growth gradient auxiliary branch are weighted and fused with a weight ratio of 0.7:0.3 to generate an optimized dual-branch fused segmentation feature map.

[0071] S5. Output of topologically constrained segmentation results: Perform pixel classification normalization and connected component extraction on the dual-branch fused segmentation feature map. Based on the topological constraint rules for fragmented farmland, complete connected component verification and false target removal, and output the final farmland crop segmentation mask.

[0072] S501, Pixel Classification Normalization: Softmax normalization is applied to the dual-branch fused segmentation feature map, mapping pixel values to classification probabilities from 0 to 1. Using 0.5 as the classification threshold, pixels with probability ≥ 0.5 are marked as crop (cultivated land) pixels and pixels with probability < 0.5 as background pixels, generating a preliminary binary segmentation.
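The S404 fusion and the S501 thresholding can be sketched together. Because both branch outputs already lie in [0, 1], their 0.7:0.3 weighted sum does too. This sketch therefore applies the 0.5 threshold directly and does not reproduce the softmax step, which is a simplification of the described procedure.

```python
import numpy as np


def fuse_and_threshold(cls_prob, bnd_prob, w_cls=0.7, w_bnd=0.3, thr=0.5):
    """S404: fused = 0.7 * class probability + 0.3 * boundary probability.
    S501 (simplified, no softmax): fused >= 0.5 -> 1 (crop), else 0 (background)."""
    fused = w_cls * cls_prob + w_bnd * bnd_prob
    return fused, (fused >= thr).astype(np.uint8)
```

For a class probability of 0.6, a boundary probability of 0.2 gives 0.48 (background), while 0.4 gives 0.54 (crop).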

[0073] S502, Connected Component Extraction and Topology Verification: 8-neighborhood connected component analysis is performed on the preliminary binary segmentation to extract all independent connected components, and each connected component is verified against the topological constraint rules for fragmented farmland. The pixel area of the connected component is ≥16 pixels; for a 0.5 m resolution image this corresponds to an actual area of ≥4 m² (16 × 0.5 m × 0.5 m), covering the smallest fragmented farmland unit in the hilly and mountainous areas of southern China.

[0074] The aspect ratio of the bounding rectangle of the connected domain is ≤4:1, which can eliminate false targets such as narrow field ridges and roads;

[0075] The convexity of the connected component is ≥0.6, where convexity is the ratio of the area of the connected component to the area of its circumscribed convex polygon. This removes irregularly shaped non-arable targets such as rocks and water bodies.

[0076] The mean growth gradient within the connected component is ≤0.1, which ensures that crop growth is continuous within the component and that the component is a single field unit.
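The three shape and growth rules above can be checked per component as sketched below. This sketch assumes an axis-aligned bounding rectangle (the text does not say whether a rotated minimum-area rectangle is meant), computes convexity against the convex hull of the pixel corners so that a solid rectangle scores exactly 1, and takes the growth gradient map as an input; all function names are hypothetical.

```python
import numpy as np

MAX_ASPECT = 4.0      # [0074] bounding-rectangle aspect ratio <= 4:1
MIN_CONVEXITY = 0.6   # [0075] area / convex-hull area >= 0.6
MAX_MEAN_GRAD = 0.1   # [0076] mean growth gradient <= 0.1


def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    # Shoelace formula over consecutive vertex pairs (closing the polygon).
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def passes_shape_rules(component_mask, growth_gradient):
    """component_mask: boolean H x W mask of ONE non-empty connected component.
    growth_gradient: H x W NDVI gradient-magnitude map."""
    ys, xs = np.nonzero(component_mask)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    aspect = max(h, w) / min(h, w)
    # Treat each pixel as a unit square and use its four corners as hull input.
    corners = [(x + dx, y + dy) for y, x in zip(ys.tolist(), xs.tolist())
               for dy in (0, 1) for dx in (0, 1)]
    convexity = len(ys) / polygon_area(convex_hull(corners))
    mean_grad = float(growth_gradient[component_mask].mean())
    return bool(aspect <= MAX_ASPECT and convexity >= MIN_CONVEXITY
                and mean_grad <= MAX_MEAN_GRAD)
```

For example, a thin L-shaped component one pixel wide and ten pixels long on each arm has a square bounding box (aspect 1) but a convexity of about 0.32, so it is rejected by the convexity rule rather than the aspect rule.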

[0077] S503, False Target Removal and Result Output: Retain all connected components that satisfy the topological constraint rules, remove false targets such as noise, field ridges, roads, and rocks that do not satisfy the rules, and generate the final farmland crop segmentation mask result.

[0078] The above description is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitutions or modifications made by those skilled in the art within the scope of the technology disclosed in the present invention, based on the technical solution and inventive concept of the present invention, should be covered within the scope of protection of the present invention.

Claims

1. A crop remote sensing image segmentation method based on a convolutional neural network, characterized in that it comprises the following steps:
S1. Input and preprocessing of remote sensing images and DEM data: collect multispectral remote sensing images of the target area in hilly and mountainous regions, acquiring raster data in four bands: red (R), green (G), blue (B), and near-infrared (NIR); simultaneously collect 30 m resolution DEM (digital elevation model) data covering the same extent of the target area; and complete pixel-by-pixel registration, radiometric calibration, and atmospheric correction of the remote sensing images and DEM data.
S2. Topographic and spectral correction: based on the registered remote sensing images and DEM data, perform in sequence reverse geometric correction of topographic distortion, elevation-layered spectral consistency calibration, repair of cloud- and fog-obscured areas, and adaptive compensation of residual cloud and fog spectra, generating a 6-channel fused topographic and spectral feature map free of geometric distortion and spectral bias.
S3. Terrain-adaptive convolutional coding: perform terrain-guided deformable convolutional feature extraction, elevation-layered pooling aggregation, and channel-attention weighted optimization on the fused feature map, simultaneously extracting deep semantic features of crops and NDVI growth gradient features, and output multi-scale encoded feature maps.
S4. Growth gradient dual-branch decoding: construct a dual-path decoding structure consisting of a main branch for category segmentation and an auxiliary branch for growth gradient; perform synchronous upsampling and feature fusion on the multi-scale encoded feature map; use dual-branch cross-supervision constraints to eliminate pixel confusion and field boundary offset caused by uneven crop growth; and output the optimized dual-branch fused segmentation feature map.
S5. Output of topologically constrained segmentation results: perform pixel classification normalization and connected component extraction on the dual-branch fused segmentation feature map; based on the fragmented-farmland topological constraint rules, complete connected component verification and false target removal; and output the final farmland crop segmentation mask.

2. The crop remote sensing image segmentation method based on a convolutional neural network according to claim 1, characterized in that, in step S1, the remote sensing image is a UAV multispectral remote sensing image with a spatial resolution of ≤0.5 m, and the image size is uniformly normalized to 512×512 pixels; the DEM data is resampled to the same spatial resolution as the remote sensing image by bicubic interpolation, and after registration the geometric root mean square error between the two is kept within 1 pixel.
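Claim 2 fixes two numeric relationships: a 30 m DEM resampled to 0.5 m imagery is upsampled by a factor of 60 along each axis, and registration accuracy is judged by the geometric RMSE over matched control points. A small sketch of both, assuming control points are supplied as matched (x, y) pixel coordinates; the function names are hypothetical.

```python
import math

DEM_RES_M = 30.0    # DEM resolution from claim 1
IMAGE_RES_M = 0.5   # coarsest image resolution allowed by claim 2
MAX_RMSE_PX = 1.0   # registration tolerance from claim 2


def upsampling_factor(dem_res=DEM_RES_M, image_res=IMAGE_RES_M):
    # Image pixels per DEM cell along each axis after resampling.
    return dem_res / image_res


def registration_rmse(ref_pts, moved_pts):
    """Geometric RMSE, in pixels, between matched (x, y) control points."""
    if not ref_pts or len(ref_pts) != len(moved_pts):
        raise ValueError("need the same, non-zero number of control points")
    sq = [(rx - mx) ** 2 + (ry - my) ** 2
          for (rx, ry), (mx, my) in zip(ref_pts, moved_pts)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    print(upsampling_factor())  # 60.0
    rmse = registration_rmse([(0, 0), (10, 10)], [(0.6, 0.8), (10, 10)])
    print(round(rmse, 3), rmse <= MAX_RMSE_PX)
```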

3. The crop remote sensing image segmentation method based on a convolutional neural network according to claim 1, characterized in that the joint topographic and spectral correction in step S2 is performed through the following sub-steps:
S201. Terrain parameter calculation: based on the registered DEM data, calculate the slope Slp and aspect Asp of the target area pixel by pixel, where Slp ranges from 0° to 90° and Asp ranges from 0° to 360°, and generate a pixel-by-pixel terrain distortion matrix M whose element at each pixel coordinate is the topographic distortion coefficient at that location, computed from the slope at that coordinate and a distortion correction baseline coefficient with a value of 1.0.
S202. Reverse correction of terrain distortion: based on the terrain distortion matrix M, apply reverse coordinate offset correction pixel by pixel to the R, G, B, and NIR band data.
S203. Elevation-stratified spectral consistency calibration: divide the target area by DEM elevation into three layers: a 0-200 m low-altitude gentle-slope layer, a 200-400 m mid-altitude gentle-slope layer, and a 400-600 m high-altitude steep-slope layer; within each layer, standardize the mean and variance of each corrected band to eliminate spectral radiance deviations caused by illumination differences at different elevations.
S204. Cloud and fog mask generation and occlusion marking: generate a binary cloud and fog mask based on the spectral ratio of the near-infrared and blue bands.
S205. Terrain-constrained cloud and fog occlusion repair: for each marked occluded pixel, select the non-occluded valid pixels in its 3×3 neighborhood that lie in the same elevation layer and have the same aspect, and fill the four band values of the occluded pixel by inverse distance weighting to complete the initial cloud and fog removal.
S206. Adaptive compensation of residual cloud and fog spectra: extract the four-band spectral mean of pure crop pixels within each elevation layer as a reference value, and perform residual spectral deviation compensation on the initially repaired pixels using a compensation formula whose terms are the compensated band pixel value, the initially repaired band pixel value, and an adaptive compensation coefficient.
S207. Generation of the fused topographic and spectral feature map: stack the compensated and corrected R, G, B, and NIR band data with the pixel-by-pixel slope Slp and aspect Asp data to generate a 6-channel fused topographic and spectral feature map.
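Sub-steps S201, S203, and S205 can be sketched in NumPy as follows. Several details are assumptions rather than part of the claim: the aspect convention (degrees clockwise from north, with rows increasing southward), the calibration target (each layer rescaled to the band's scene-wide mean and standard deviation), the IDW power of 2, and the hypothetical `aspect_sectors` rule (8 compass sectors) used to decide "same aspect".

```python
import numpy as np

LAYER_EDGES = (200.0, 400.0)  # S203 boundaries between the 0-200, 200-400, 400-600 m layers


def slope_aspect(dem, cell_size):
    """Slope (0-90 deg) and aspect (0-360 deg clockwise from north) for each DEM cell.
    Assumes rows increase southward and columns increase eastward."""
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    slope = np.degrees(np.arctan(np.hypot(dz_dcol, dz_drow)))
    # Downslope direction has east component -dz/dcol and north component +dz/drow.
    aspect = np.degrees(np.arctan2(-dz_dcol, dz_drow)) % 360.0  # flat cells get 0
    return slope, aspect


def elevation_layers(dem):
    # 0: below 200 m, 1: 200-400 m, 2: 400 m and above.
    return np.digitize(dem, LAYER_EDGES)


def aspect_sectors(aspect, n=8):
    # Hypothetical "same aspect" rule: n compass sectors centred on N, NE, E, ...
    width = 360.0 / n
    return (((aspect + width / 2) % 360.0) // width).astype(int)


def stratified_calibration(bands, layer):
    """bands: (C, H, W). Rescale each band inside each layer to the band's
    scene-wide mean and standard deviation (assumed calibration target)."""
    bands = bands.astype(float)
    out = bands.copy()
    for c in range(bands.shape[0]):
        g_mean, g_std = bands[c].mean(), bands[c].std()
        for k in np.unique(layer):
            sel = layer == k
            m, s = bands[c][sel].mean(), bands[c][sel].std()
            out[c][sel] = (bands[c][sel] - m) / s * g_std + g_mean if s > 0 else g_mean
    return out


def idw_repair(bands, cloud_mask, layer, aspect_class, power=2.0):
    """S205: fill each cloud pixel from clear 3x3 neighbours in the same elevation
    layer and aspect class, weighted by 1 / distance**power. Pixels with no such
    neighbour are left unchanged."""
    out = bands.astype(float).copy()
    h, w = cloud_mask.shape
    for y, x in zip(*np.nonzero(cloud_mask)):
        wts, vals = [], []
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                ok = ((dy or dx)
                      and 0 <= ny < h and 0 <= nx < w
                      and not cloud_mask[ny, nx]
                      and layer[ny, nx] == layer[y, x]
                      and aspect_class[ny, nx] == aspect_class[y, x])
                if ok:
                    wts.append(1.0 / np.hypot(dy, dx) ** power)
                    vals.append(bands[:, ny, nx])  # original clear values only
        if wts:
            wts = np.asarray(wts)
            out[:, y, x] = (wts[:, None] * np.asarray(vals, dtype=float)).sum(axis=0) / wts.sum()
    return out
```

Reading neighbour values from the original `bands` array rather than from `out` keeps the repair independent of the order in which cloud pixels are visited.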

4. The crop remote sensing image segmentation method based on a convolutional neural network according to claim 3, characterized in that the adaptive compensation coefficient in step S206 is calculated from a base compensation coefficient, set to 0.7, and a per-pixel cloud/fog residual degree, which ranges from 0 to 1 and is calculated from the repaired near-infrared band value of the current pixel and the mean near-infrared value of pure crop pixels within the corresponding elevation layer.

5. The crop remote sensing image segmentation method based on a convolutional neural network according to claim 1, characterized in that the terrain-adaptive convolutional coding in step S3 is performed through the following sub-steps:
S301. Terrain-guided deformable convolutional feature extraction: input the 6-channel fused topographic and spectral feature map into the coding network, use 3×3 deformable convolution as the basic convolutional unit, and apply terrain constraints to the offsets of the deformable convolution.
S302. Elevation-layered pooling feature aggregation: mask the feature map output by the convolution according to the three elevation layers of step S203 and perform a 3×3 fusion pooling operation independently within each layer, the fusion pooling combining average pooling and max pooling with 1:1 weights and a stride of 2; no cross-region feature aggregation is performed between different elevation layers, so the independent semantic features of fragmented fields are preserved.
S303. Channel attention weighted optimization: apply an SE (squeeze-and-excitation) channel attention module to the layer-pooled feature map to learn a weight coefficient for each feature channel, strengthening the high-weight channels corresponding to crop spectral and growth characteristics and suppressing the low-weight channels corresponding to soil, rock, and water backgrounds, and output the optimized multi-scale deep features.
S304. NDVI growth gradient feature extraction: calculate the pixel-wise normalized difference vegetation index (NDVI) from the corrected remote sensing image, compute the gradient magnitude of the NDVI image with the 3×3 Sobel operator as the growth gradient feature, and concatenate this feature with the deep semantic features to output the final multi-scale encoded feature map.
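S302 and S304 can be sketched for a single feature channel as below. The layered pooling assigns each output cell to the elevation layer of its window centre and pools only over same-layer pixels, which is one possible reading of "independently within each layer"; the NDVI epsilon, edge padding, and function names are likewise assumptions. Whether the Sobel magnitude is rescaled before the ≤0.1 growth-gradient rule of S502 is not specified.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T


def ndvi(red, nir, eps=1e-6):
    # NDVI = (NIR - R) / (NIR + R); eps (an assumption) avoids division by zero.
    return (nir - red) / (nir + red + eps)


def sobel_magnitude(img):
    """3x3 Sobel gradient magnitude with edge-replicated borders (S304 growth gradient)."""
    h, w = img.shape
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            win = p[dy:dy + h, dx:dx + w]  # input shifted by (dy - 1, dx - 1)
            gx += SOBEL_X[dy, dx] * win
            gy += SOBEL_Y[dy, dx] * win
    return np.hypot(gx, gy)


def layered_fusion_pool(feat, layer, k=3, stride=2):
    """S302 for one channel: k x k window with the given stride and padding k // 2;
    output = 0.5 * mean + 0.5 * max over pixels sharing the window centre's layer."""
    h, w = feat.shape
    pad = k // 2
    oh = (h + 2 * pad - k) // stride + 1
    ow = (w + 2 * pad - k) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            cy, cx = i * stride, j * stride  # window centre in input coordinates
            ys = slice(max(cy - pad, 0), min(cy + pad + 1, h))
            xs = slice(max(cx - pad, 0), min(cx + pad + 1, w))
            # Never empty: the centre pixel always matches its own layer.
            vals = feat[ys, xs][layer[ys, xs] == layer[cy, cx]]
            out[i, j] = 0.5 * vals.mean() + 0.5 * vals.max()
    return out
```

For instance, a 4×4 map whose left half (value 1, layer 0) and right half (value 100, layer 1) lie in different layers pools to exactly 1 and 100 on either side, whereas ordinary pooling would mix the two values at the layer boundary.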

6. The crop remote sensing image segmentation method based on a convolutional neural network according to claim 1, characterized in that the growth gradient dual-branch decoding in step S4 is performed through the following sub-steps:
S401. Category segmentation main branch decoding: with crop/background binary classification as the target, apply four successive deconvolution upsampling operations to the multi-scale encoded feature map, fusing at each upsampling, through skip connections, the shallow edge features of the corresponding scale from the encoding stage; finally output a pixel-level crop/background classification probability map of the same size as the input image, with values in the range 0 to 1.
S402. Growth gradient auxiliary branch decoding: with field boundary identification as the target, simultaneously apply four same-scale deconvolution upsampling operations to the multi-scale encoded feature map to specifically extract abrupt-change features of the NDVI growth gradient, and finally output a growth boundary probability map of the same size as the input image, with values in the range 0 to 1.
S403. Dual-branch cross-supervision and boundary alignment: construct a joint loss function for end-to-end supervised training of both branches, forcing the segmentation boundary of the main branch to align at the pixel level with the abrupt growth-change boundary of the auxiliary branch, thereby eliminating pixel confusion caused by growth differences within the same field and boundary adhesion caused by similar growth in adjacent fields.
S404. Dual-branch feature fusion: fuse the classification probability map output by the category segmentation main branch and the boundary probability map output by the growth gradient auxiliary branch by weighted summation with a weight ratio of 0.7:0.3 to generate the optimized dual-branch fused segmentation feature map.

7. The crop remote sensing image segmentation method based on a convolutional neural network according to claim 1, characterized in that outputting the topologically constrained segmentation result in step S5 is performed through the following sub-steps:
S501. Pixel classification normalization: perform softmax normalization on the dual-branch fused segmentation feature map, mapping pixel values to classification probabilities in the range 0 to 1; using 0.5 as the classification threshold, mark pixels with a probability ≥ 0.5 as crop/cultivated-land pixels and pixels with a probability < 0.5 as background pixels, generating a preliminary binary segmentation result.
S502. Connected component extraction and topology verification: perform 8-neighborhood connected component analysis on the preliminary binary segmentation result to extract all independent connected components, and verify each connected component against the topological constraint rules for fragmented farmland.
S503. False target removal and result output: retain all connected components that satisfy the topological constraint rules, remove false targets that do not satisfy them, such as noise, field ridges, roads, and rocks, and generate the final farmland crop segmentation mask.

8. The crop remote sensing image segmentation method based on a convolutional neural network according to claim 7, characterized in that the topological constraint rules for fragmented farmland in step S502 are as follows: the pixel area of the connected component is ≥16 pixels; the aspect ratio of the bounding rectangle of the connected component is ≤4:1; the convexity of the connected component is ≥0.6; and the mean growth gradient magnitude within the connected component is ≤0.1.