A cornfield headland positioning and path extraction method
By combining the YOLOv12-LiteRep and AFFormer models, the method achieves stable ridge-end identification and headland localization in complex farmland environments, improving the accuracy and real-time performance of headland localization and path extraction in cornfields and supporting the safety and reliability of agricultural machinery operations.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- JILIN UNIVERSITY
- Filing Date
- 2026-05-21
- Publication Date
- 2026-06-19
AI Technical Summary
Existing farmland boundary detection methods are easily affected by factors such as light fluctuations, ground shadows, broken rows of plants, and weeds in complex and variable field environments, leading to misjudgments or omissions of boundary areas. This makes it difficult to achieve stable and reliable positioning of ridge ends and headlands, affecting the safety and accuracy of turning operations.
A lightweight detection triggering network based on YOLOv12-LiteRep is combined with temporal frame analysis and the lightweight semantic segmentation model AFFormer. A smooth turning path is then generated by polynomial fitting and high-order Bézier curves, improving recognition stability and accuracy.
It effectively suppresses interference such as light fluctuations and weed obstruction, improves the stability of ridge end identification and the feasibility of turning paths, and meets the requirements of real-time performance and robustness.
Figure CN122244770A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of image recognition technology, and in particular to a method for locating cornfield headlands and extracting paths.
Background Technology
[0002] When autonomous navigation agricultural machinery performs cross-row operations, the accuracy of identifying the ridge end and the headland area directly affects the safety and reliability of turning operations. In particular, because the headland is the turning area at the edge of the field, the quality of its boundary detection further determines the accuracy and stability of turning path planning.
[0003] Existing farmland boundary detection methods mostly employ deep learning vision technology based on single-frame images. Although they can learn multi-level features from large-scale training data and improve robustness to a certain extent, their architecture essentially relies on static single-frame input and ignores the temporal continuity of the scene as the machinery moves. In complex and changing field environments, they are easily affected by light fluctuations, ground shadows, broken plant rows, and weed occlusion, leading to false or missed detection of boundary areas. They also struggle to distinguish real ridge-end signals from transient noise, so actual deployments require manual verification, which prevents fully autonomous headland turning.
[0004] To improve perception, the industry has also tried dual-modal strategies such as RGB-D fusion, which combine RGB texture information with the terrain-undulation cues provided by depth information to improve semantic segmentation. However, such methods often rely on the elevation differences captured by depth information to segment headlands. Their applicability is limited where the surface height difference in the headland area is small, the boundary is a non-linear curve, and dense weeds are present, making it difficult to obtain stable and reliable boundary positioning results.
[0005] Meanwhile, although two-stage and multi-stage frameworks that decompose complex perception tasks and integrate information step by step to balance efficiency and accuracy already exist, they have not been sufficiently explored or engineered for the specific task of locating ridge ends and headlands in farmland. In dryland fields, where the environment is complex, interference is frequent, and real-time requirements are high, single-stage models generally generalize poorly and struggle to balance real-time performance and robustness. There is an urgent need for a technical solution that achieves stable ridge-end identification and headland localization in complex scenarios.
Summary of the Invention
[0006] To solve the above technical problems, this invention provides a method for locating cornfield headlands and extracting paths, comprising the following steps:
Step 1: Construct a lightweight detection triggering network, YOLOv12-LiteRep, based on YOLOv12. The network uses YOLOv12 as the baseline model and introduces DySample and Focaler-CIoU to improve feature representation and bounding box regression stability; it uses a newly designed MBConv_detect detection head to enhance key feature representation; and it adopts an improved MobileOne module as the backbone network through a structural reparameterization strategy. This compresses model size and computational load while maintaining detection accuracy, providing a deployable perception front-end for agricultural machinery operations.
Step 2: Apply a triggering strategy based on temporal frame analysis. The strategy takes the frame-by-frame detection results output by the YOLOv12-LiteRep detector and reduces the detection state of each frame to a binary judgment: a valid crop target is detected, or it is not. A trigger signal is generated from the judgments of multiple consecutive frames.
Step 3: Introduce the lightweight semantic segmentation model AFFormer, whose purpose is to quickly perform fine segmentation of the drivable area once the trigger signal is activated, so as to meet real-time requirements.
Step 4: Design a turning path generation method based on polynomial fitting and higher-order Bézier curves. Using the contour of the segmented drivable area, the headland boundary is extracted through polynomial fitting, and a smooth turning trajectory is then generated with higher-order Bézier curves in combination with the curvature change points.
[0007] Further, in step 1, constructing the lightweight detection triggering network YOLOv12-LiteRep includes the following. First, the detection head is rebuilt in a lightweight manner using the MBConv module from EfficientNet, and the original upsampling operation is replaced with the dynamic upsampling operator DySample, significantly reducing the number of parameters and computational complexity. Second, an improved structural reparameterization module, MobileOne, is introduced into the backbone network. During training, a multi-branch topology enhances feature representation; during inference, it can be equivalently converted into a lightweight single-branch structure, significantly improving forward computation efficiency. Finally, to make the model more robust when localizing difficult samples, the Focaler-IoU loss function is adopted. By dynamically adjusting the weights of samples of different difficulty in gradient backpropagation, it optimizes the gradient distribution during training and improves the model's localization accuracy and convergence stability in complex scenarios.
[0008] Further, in step 2, the triggering strategy based on temporal frame analysis includes: making a binary judgment from the frame-by-frame detection results of YOLOv12-LiteRep. For 25 fps video, a frame is judged as detected when it contains a crop target with a confidence of not less than 0.30 and a detection box area of not less than 0.1% of the total image area; otherwise it is judged as not detected. On this basis, a consecutive-miss counter CN and a consecutive-detection counter CP are maintained. When CN reaches 5 frames, a ridge-end arrival trigger signal is generated. After triggering, the triggered state is held until CP reaches 3 consecutive frames, at which point the trigger is cancelled. This forms a triggering mechanism with hysteresis, improving the stability of boundary recognition and suppressing false triggers caused by short-term interference.
Further, in step 3, the use of the lightweight semantic segmentation model AFFormer includes: when the trigger signal is activated, the lightweight semantic segmentation network is called to perform pixel-level fine segmentation of the drivable area, reducing redundant computation and meeting real-time processing requirements while maintaining segmentation accuracy; the lightweight semantic segmentation network is preferably AFFormer.
Further, in step 4, the turning path generation method includes: extracting a set of boundary contour points from the drivable area segmentation result obtained in step 3, performing polynomial fitting on the boundary contour to determine the headland boundary and key turning points, and constructing a high-order Bézier curve from the key points to generate a turning path. The generated turning path satisfies smoothness and continuity constraints and lies within the drivable area, so that the turning trajectory is smooth and safe.
[0009] Compared with the prior art, the advantages of the present invention are as follows: (1) The timing triggering mechanism can suppress short-term interference such as light fluctuations, row breaks and weed blockage, significantly reduce misjudgment of the end of the row and improve the recognition stability.
[0010] (2) A two-stage model architecture is adopted, in which the lightweight semantic segmentation network is called only after the trigger signal is activated to finely segment the drivable area, reducing redundant computation and meeting the real-time requirements of edge devices.
[0011] (3) Lightweight reconstruction and reparameterization of the detection network reduce the number of parameters and computational complexity, and the improved loss function makes the localization of difficult samples more robust, improving detection accuracy and inference efficiency in complex scenarios.
[0012] (4) Boundary fitting of the drivable area combined with high-order Bézier curve generation yields a turning trajectory that is smooth, continuous, and located within the drivable area, improving the executability and tracking stability of the headland turning path.
Description of the Drawings
[0013] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the structures shown in these drawings without creative effort.
[0014] Figure 1 is a flowchart of the cornfield headland location and path extraction method of the present invention;
Figure 2 is a schematic diagram of the network structure of the lightweight detection triggering network YOLOv12-LiteRep of the present invention;
Figure 3 is a schematic diagram of the improved modules of the present invention;
Figure 4 is a schematic diagram of the network structure of AFFormer, the lightweight semantic segmentation network of the present invention;
Figure 5 is a schematic diagram of the fitting process for generating the turning path according to the present invention;
Figure 6 is a schematic diagram of the deployment of the algorithm of the present invention on an embedded platform.
Detailed Implementation
[0015] This invention provides a method for locating cornfield headlands and extracting paths, aiming to obtain reliable path information and improve the stability and safety of headland turning during cross-row operations.
[0016] The technical solution of the present invention is described below with reference to specific embodiments. In this embodiment, as shown in Figure 1, the method for locating cornfield headlands and extracting paths includes the following steps:
Step 1: Construct a lightweight detection triggering network, YOLOv12-LiteRep, based on YOLOv12. The network uses YOLOv12 as the baseline model and introduces DySample and Focaler-CIoU to improve feature representation and bounding box regression stability; it uses a newly designed MBConv_detect detection head to enhance key feature representation; and it adopts an improved MobileOne module as the backbone network through a structural reparameterization strategy. This compresses model size and computational load while maintaining detection accuracy, providing a deployable perception front-end for agricultural machinery operations.
Step 2: Apply a triggering strategy based on temporal frame analysis. The strategy takes the frame-by-frame detection results output by the YOLOv12-LiteRep detector and reduces the detection state of each frame to a binary judgment: a valid crop target is detected, or it is not. A trigger signal is generated from the judgments of multiple consecutive frames.
Step 3: Introduce the lightweight semantic segmentation model AFFormer, whose purpose is to quickly perform fine segmentation of the drivable area once the trigger signal is activated, so as to meet real-time requirements.
Step 4: Design a turning path generation method based on polynomial fitting and higher-order Bézier curves. Using the contour of the segmented drivable area, the headland boundary is extracted through polynomial fitting, and a smooth turning trajectory is then generated with higher-order Bézier curves in combination with the curvature change points.
[0017] Further, in step 1, constructing the lightweight detection triggering network YOLOv12-LiteRep includes the following. First, the detection head is rebuilt in a lightweight manner using the MBConv module from EfficientNet, and the original upsampling operation is replaced with the dynamic upsampling operator DySample, significantly reducing the number of parameters and computational complexity. Second, an improved structural reparameterization module, MobileOne, is introduced into the backbone network. During training, a multi-branch topology enhances feature representation; during inference, it can be equivalently converted into a lightweight single-branch structure, significantly improving forward computation efficiency. Finally, to make the model more robust when localizing difficult samples, the Focaler-IoU loss function is adopted. By dynamically adjusting the weights of samples of different difficulty in gradient backpropagation, it optimizes the gradient distribution during training and improves the model's localization accuracy and convergence stability in complex scenarios.
[0018] Further, in step 2, the triggering strategy based on temporal frame analysis includes: making a binary judgment from the frame-by-frame detection results of YOLOv12-LiteRep. For 25 fps video, a frame is judged as detected when it contains a crop target with a confidence of not less than 0.30 and a detection box area of not less than 0.1% of the total image area; otherwise it is judged as not detected. On this basis, a consecutive-miss counter CN and a consecutive-detection counter CP are maintained. When CN reaches 5 frames, a ridge-end arrival trigger signal is generated. After triggering, the triggered state is held until CP reaches 3 consecutive frames, at which point the trigger is cancelled. This forms a triggering mechanism with hysteresis, improving the stability of boundary recognition and suppressing false triggers caused by short-term interference.
Further, in step 3, the use of the lightweight semantic segmentation model AFFormer includes: when the trigger signal is activated, the lightweight semantic segmentation network is called to perform pixel-level fine segmentation of the drivable area, reducing redundant computation and meeting real-time processing requirements while maintaining segmentation accuracy; the lightweight semantic segmentation network is preferably AFFormer.
Further, in step 4, the turning path generation method includes: extracting a set of boundary contour points from the drivable area segmentation result obtained in step 3, performing polynomial fitting on the boundary contour to determine the headland boundary and key turning points, and constructing a high-order Bézier curve from the key points to generate a turning path. The generated turning path satisfies smoothness and continuity constraints and lies within the drivable area, so that the turning trajectory is smooth and safe.
Example 1
[0019] A method for locating cornfield headlands and extracting paths includes the following steps: Step 1: Construct a real-time detection model better suited to edge computing environments, improving its ability to perceive small and occluded targets in complex scenes. Based on YOLOv12, this invention first uses the MBConv module from EfficientNet to rebuild the detection head in a lightweight manner and replaces the original upsampling operation with the dynamic upsampling operator DySample, significantly reducing the number of parameters and computational complexity. Second, an improved structural reparameterization module, MobileOne, is introduced into the backbone network. During training, a multi-branch topology enhances feature representation; during inference, it can be equivalently converted into a lightweight single-branch structure, significantly improving forward computation efficiency. Finally, to make the model more robust to difficult samples, the Focaler-IoU loss function is adopted. By dynamically adjusting the weights of samples of different difficulty in gradient backpropagation, it optimizes the gradient distribution during training and improves the model's localization accuracy and convergence stability in complex scenes. The improved YOLOv12-LiteRep architecture is shown in Figure 2.
[0020] The specific improvements to the YOLOv12-LiteRep network model are explained below. In cross-row vision systems for farmland scenes, accurate perception of crop targets depends directly on the quality of feature reconstruction in the decoding stage. Under varying field lighting and interference from soil and weeds, traditional upsampling methods easily introduce blur and lose semantic information during feature recovery, which particularly hinders preservation of the edge structures and textures of small seedlings and partially occluded targets. To address this, this invention introduces the DySample dynamic upsampling module, which dynamically adjusts convolutional kernel parameters based on the semantic content and spatial distribution of the input features through a content-adaptive kernel weight generation mechanism, achieving context-aware feature reconstruction. The computational efficiency and feature extraction capability of the detection head are equally crucial, because they directly affect real-time performance on mobile deployments. To further improve the efficiency of YOLOv12 on resource-constrained devices, this invention rebuilds the detection head as a lightweight design based on the MBConv module, as shown in Figure 3(a). The MBConv_detect module employs an inverted bottleneck structure. It first raises the channel dimension with a 1×1 convolution to expand feature representation capacity, then uses a depthwise separable convolution to separate spatial and channel feature extraction: the depthwise convolution extracts spatial features, while the pointwise convolution fuses channel features. On this basis, the efficient channel attention (ECA) mechanism is introduced, adaptively calibrating channel feature weights through a one-dimensional convolution to enhance key feature representation while avoiding dimensionality reduction. Finally, a 1×1 convolution reduces the channel dimension to compress and fuse features, and a shortcut connection promotes gradient flow, improving inference efficiency while maintaining multi-scale feature discrimination.
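The parameter savings that motivate depthwise separable and inverted bottleneck designs can be checked with simple counting. The sketch below is illustrative only: the channel widths, kernel size, and expansion ratio are hypothetical values chosen for the example, not figures from this patent, and the small ECA one-dimensional convolution and all biases are left out of the count.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def inverted_bottleneck_params(c_in: int, c_out: int, k: int, expand: int) -> int:
    """Weights in 1x1 expand -> k x k depthwise -> 1x1 project (bias omitted)."""
    hidden = c_in * expand
    return c_in * hidden + hidden * k * k + hidden * c_out


if __name__ == "__main__":
    c = 128  # hypothetical channel width
    print("standard 3x3:        ", conv_params(c, c, 3))
    print("depthwise separable: ", depthwise_separable_params(c, c, 3))
    print("inverted bottleneck: ", inverted_bottleneck_params(c, c, 3, expand=2))
```

The count shows that the saving of an inverted bottleneck depends strongly on the expansion ratio, which is why that ratio is usually treated as a tuning parameter on resource-constrained devices.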
[0021] To optimize the practical performance of the YOLOv12 model in farmland crop detection, and considering the regular spatial structure of crop rows, the backbone network was redesigned as the MobileRepOne architecture shown in Figure 3(b). This backbone extracts features efficiently with a feedforward structure. The input image first passes through standard convolutions to build low-level visual features; strip convolutions then capture the strong correlations along the vertical and horizontal directions of the image, explicitly enhancing the model's ability to perceive the geometric structure of crop rows. The features then pass through three cascaded MobileRepOne modules, which complete multi-scale feature fusion and semantic enhancement.
[0022] The designed MobileRepOne module uses a multi-branch topology during training to enhance feature representation. The input feature map is first processed by three parallel branches. The first branch uses a 3×3 standard convolution to capture basic local details and texture features and adds a depthwise separable convolution, in which the depthwise convolution extracts key spatial features related to crop rows and plant contours and the pointwise convolution fuses information across channels, improving feature discrimination. The second branch is a batch normalization layer that serves as a clean skip connection to keep gradients flowing smoothly. The third branch is a 1×1 standard convolution with a batch normalization skip connection. The outputs of the three branches are concatenated to aggregate features. An activation function then introduces non-linearity, and the result is fed into a dual-path enhancement block: one path uses a single batch normalization layer, while the other repeats the cascaded 1×1 convolution + batch normalization operation three times. This forms a hierarchical feature extraction mechanism that lets the model select and integrate features at multiple stages and thereby learn a more robust representation of complex field environments. During inference, structural reparameterization converts the multi-branch training topology into a single convolution and batch normalization operation. The model can then be deployed without any branches, reducing memory access costs and meeting the low-latency requirements of real-time operation.
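The principle behind structural reparameterization can be illustrated on a single-channel, one-dimensional toy case. The sketch below assumes the branch outputs are summed, as in published RepVGG/MobileOne-style blocks; the paragraph above describes the aggregation as concatenation, so this shows only the general folding idea, not the patent's exact block. All function names and numeric values are hypothetical.

```python
import math

EPS = 1e-5  # batch-norm epsilon (assumed value)


def conv1d(x, kernel, bias=0.0):
    """Zero-padded, single-channel 1D convolution with a 3-tap kernel."""
    padded = [0.0] + list(x) + [0.0]
    return [sum(kernel[j] * padded[i + j] for j in range(3)) + bias
            for i in range(len(x))]


def batch_norm(y, gamma, beta, mean, var):
    """Inference-mode batch normalization with fixed statistics."""
    scale = gamma / math.sqrt(var + EPS)
    return [(v - mean) * scale + beta for v in y]


def fold_bn(kernel, gamma, beta, mean, var):
    """Fold a BN layer into the preceding conv: returns (kernel', bias')."""
    scale = gamma / math.sqrt(var + EPS)
    return [w * scale for w in kernel], beta - mean * scale


def multi_branch(x, k3, k1, bn3, bn1, bn_id):
    """Training-time form: 3-tap conv+BN, 1-tap conv+BN, and identity+BN, summed."""
    a = batch_norm(conv1d(x, k3), *bn3)
    b = batch_norm(conv1d(x, [0.0, k1, 0.0]), *bn1)
    c = batch_norm(list(x), *bn_id)
    return [p + q + r for p, q, r in zip(a, b, c)]


def reparameterize(k3, k1, bn3, bn1, bn_id):
    """Inference-time form: one 3-tap kernel and one bias equivalent to all branches."""
    ka, ba = fold_bn(k3, *bn3)
    kb, bb = fold_bn([0.0, k1, 0.0], *bn1)       # 1-tap kernel padded to 3 taps
    kc, bc = fold_bn([0.0, 1.0, 0.0], *bn_id)    # identity written as a kernel
    return [a + b + c for a, b, c in zip(ka, kb, kc)], ba + bb + bc


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.0]
    k3, k1 = [0.2, -0.4, 0.1], 0.7
    bn3, bn1, bn_id = (1.1, 0.05, 0.3, 0.9), (0.8, -0.1, 0.0, 1.2), (1.0, 0.0, 0.1, 1.0)
    kernel, bias = reparameterize(k3, k1, bn3, bn1, bn_id)
    print(multi_branch(x, k3, k1, bn3, bn1, bn_id))
    print(conv1d(x, kernel, bias))
```

Both printed lists agree up to floating-point rounding, which is the property that lets a multi-branch training block be deployed as a single convolution.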
[0023] For bounding box regression, this invention introduces the Focaler-CIoU loss function to address the uneven distribution of sample difficulty in object detection. The loss is built on an IoU-based adaptive focusing mechanism that dynamically adjusts the weights assigned to samples of different difficulty. This is achieved by introducing a focusing coefficient β and a modulation factor α. The calculation is given in formula (1). In formula (1), the penalty term is the one used in CIoU, which accounts for center point distance and aspect ratio, and IoU (Intersection over Union) is the ratio of the overlap area between the predicted and ground-truth bounding boxes to their union area. The loss combines IoU with the CIoU penalty term and applies a focusing mechanism when measuring the regression error between predicted and ground-truth boxes, so that during training the model pays more attention to boxes that are difficult to regress accurately and less to easy samples that have already been learned sufficiently.
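Formula (1) is not reproduced in this text, so the patent's exact parameterization with β and α cannot be shown. As a hedged illustration of the general idea, the sketch below implements the standard CIoU loss together with the interval-based remapping of IoU used in the published Focaler-IoU formulation, where the thresholds `d` and `u` are assumed hyperparameters. It may differ from the loss actually used in this invention.

```python
import math


def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, gt):
    """Standard CIoU loss: 1 - IoU + center-distance term + aspect-ratio term."""
    iou = iou_xyxy(pred, gt)
    # Squared distance between box centers
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 \
         + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw * cw + ch * ch
    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (
        math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))
    ) ** 2
    denom = (1 - iou) + v
    weight = v / denom if denom > 0 else 0.0
    return 1 - iou + rho2 / c2 + weight * v


def focaler_iou(iou, d, u):
    """Linearly remap IoU from [d, u] to [0, 1], clipping outside the interval."""
    return min(1.0, max(0.0, (iou - d) / (u - d)))


def focaler_ciou_loss(pred, gt, d=0.0, u=0.95):
    """CIoU loss with the IoU term replaced by its remapped (focused) version."""
    iou = iou_xyxy(pred, gt)
    return ciou_loss(pred, gt) + iou - focaler_iou(iou, d, u)


if __name__ == "__main__":
    print(focaler_ciou_loss((10, 10, 50, 60), (12, 8, 55, 58)))
```

Choosing the interval `[d, u]` shifts which IoU range contributes gradient: a low interval emphasizes hard samples, a high one emphasizes easy samples.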
[0024] Step 2 involves a triggering strategy based on temporal frame analysis to stably determine whether the agricultural machinery has reached the boundary of the crop ridge during operation. This strategy is based on the frame-by-frame detection results output by the YOLOv12-LiteRep detector, reducing the detection state of each frame to a binary judgment: a valid crop target is detected or not. A trigger signal is generated based on the judgment results of multiple consecutive frames. Specifically, with a video frame rate of 25fps, a frame is considered to have detected a valid crop target when a target with a confidence level of not less than 0.30 and a detection box area not less than 0.1% of the total image area is present. The confidence level and area thresholds are set as fixed values based on statistical analysis of the actual field images collected, and are not dynamically adjusted with the environment, effectively suppressing false detection interference caused by distant small targets or changes in field lighting.
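The single-frame binary judgment described above can be sketched as follows. The thresholds 0.30 and 0.1% come from the text; the detection tuple layout `(confidence, x1, y1, x2, y2)` and the function name are assumptions made for this example and do not reflect the detector's actual output format.

```python
CONF_MIN = 0.30         # minimum confidence, from the text
AREA_FRAC_MIN = 0.001   # minimum box area as a fraction of image area (0.1%)


def frame_has_valid_crop(detections, image_w, image_h):
    """Return True if any detection meets both the confidence and area thresholds.

    `detections` is an iterable of (confidence, x1, y1, x2, y2) in pixels
    (an assumed layout for this sketch).
    """
    min_area = AREA_FRAC_MIN * image_w * image_h
    for conf, x1, y1, x2, y2 in detections:
        box_area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        if conf >= CONF_MIN and box_area >= min_area:
            return True
    return False


if __name__ == "__main__":
    # 640 x 480 image: the area threshold is 307.2 px^2
    print(frame_has_valid_crop([(0.55, 100, 100, 120, 120)], 640, 480))  # 400 px^2 box
    print(frame_has_valid_crop([(0.90, 100, 100, 110, 110)], 640, 480))  # 100 px^2 box
```

Requiring both conditions on the same box is what filters out distant, tiny detections that happen to score a high confidence.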
[0025] However, in actual operation scenarios, single-frame detection results often fail to accurately reflect the true spatial position of the agricultural machinery within the ridge. One typical scenario involves a certain width of dividing strip or bare soil area between adjacent plots. When the agricultural machinery moves from one plot to the next, it sequentially experiences a series of frames: where crop targets can be detected within the ridge, where crop targets are almost undetectable in the dividing strip, and where crop targets are detected again after entering the next plot. During this process, although several frames within the dividing area do not detect valid crop targets, these frames do not correspond to the turning point but are merely transition areas between plots. If the turning logic is triggered solely based on the absence of a crop target in a single frame, false triggering at the dividing strip location is likely. To enhance the stability of the timing judgment, this invention introduces a continuous frame counting mechanism based on the aforementioned single-frame binary judgment. Two counters are initialized: the number of consecutive frames without detected crop targets (CN) and the number of consecutive frames with detected crop targets (CP). When CN accumulates to 5 frames, it is determined that the end of the ridge has been reached, and a trigger signal is generated. After triggering, the trigger state is deactivated only after CP recovers to 3 frames. This mechanism can effectively filter out the interference of short-term absence of crop targets in the plot separation area on decision-making, and avoid false triggering caused by single-frame missed detection or fluctuations in detection results, thereby significantly improving the robustness of the model to the operation boundary in complex farmland environments.
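The consecutive-frame counting mechanism with hysteresis can be sketched as a small state machine. The counter names CN and CP and the thresholds of 5 and 3 frames follow the text; the class name, method name, and the choice to reset one counter whenever the other advances are assumptions of this sketch.

```python
class RidgeEndTrigger:
    """Hysteresis trigger driven by per-frame binary crop detections."""

    def __init__(self, miss_frames: int = 5, hit_frames: int = 3):
        self.miss_frames = miss_frames  # CN threshold that raises the trigger
        self.hit_frames = hit_frames    # CP threshold that clears the trigger
        self.cn = 0                     # consecutive frames without a valid crop target
        self.cp = 0                     # consecutive frames with a valid crop target
        self.triggered = False

    def update(self, crop_detected: bool) -> bool:
        """Feed one frame's judgment and return the current trigger state."""
        if crop_detected:
            self.cp += 1
            self.cn = 0
        else:
            self.cn += 1
            self.cp = 0
        if not self.triggered and self.cn >= self.miss_frames:
            self.triggered = True       # ridge end reached
        elif self.triggered and self.cp >= self.hit_frames:
            self.triggered = False      # crop rows reliably visible again
        return self.triggered


if __name__ == "__main__":
    trigger = RidgeEndTrigger()
    frames = [True, True, False, False, False, True,      # 3-frame gap: ignored
              False, False, False, False, False,          # 5 misses: trigger
              True, True, False, True, True, True]        # 3 hits in a row: release
    print([trigger.update(f) for f in frames])
```

At 25 fps, 5 frames correspond to 0.2 s and 3 frames to 0.12 s, so a gap shorter than 0.2 s never raises the trigger and a single detection never clears it.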
[0026] Step 3: Based on the boundary signal output by the trigger module, this invention introduces the head-free lightweight semantic segmentation model AFFormer in the second stage. Its architecture is shown in Figure 4. Its purpose is to quickly perform fine-grained segmentation of the boundary region once the trigger module is activated, in order to meet real-time requirements. AFFormer uses a prototype clustering mechanism to compress high-resolution pixel features into a small number of representative prototype vectors and carries out semantic learning in this low-dimensional space. This design reduces the computational complexity of traditional self-attention from O(N²) to O(N), making it particularly suitable for high-resolution farmland images: it maintains low computational latency while meeting the lightweight deployment requirements of mobile platforms.
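The scaling argument can be made concrete by counting similarity scores. This is a generic illustration of why attending to a fixed number of prototypes grows linearly with the number of pixels; it is not AFFormer's implementation, and the token and prototype counts are hypothetical.

```python
def pairwise_attention_scores(n_tokens: int) -> int:
    """Similarity scores computed by full self-attention: every token vs every token."""
    return n_tokens * n_tokens


def prototype_attention_scores(n_tokens: int, n_prototypes: int) -> int:
    """Similarity scores when each token is compared only with K prototypes."""
    return n_tokens * n_prototypes


if __name__ == "__main__":
    k = 16  # hypothetical number of prototypes, fixed regardless of resolution
    for side in (32, 64, 128):  # hypothetical feature-map side lengths
        n = side * side
        full = pairwise_attention_scores(n)
        proto = prototype_attention_scores(n, k)
        print(f"N={n:6d}  full={full:12d}  prototype={proto:9d}  ratio={full // proto}")
```

Doubling the feature-map side quadruples N; the full-attention count then grows sixteenfold while the prototype count grows only fourfold.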
[0027] Step 4: After the constructed segmentation model finely segments the ridge-end boundary region, it outputs a drivable area mask. This invention proposes a method for generating smooth navigation trajectories for autonomous cross-row operations from this mask. First, the contour boundary at the ridge end is extracted from the segmentation result, and polynomial fitting is used to smooth the discrete boundary points, improving the continuity and geometric consistency of the boundary line. Then, a moving-window curvature analysis is applied to the boundary line: the first-order derivative $y'$ and second-order derivative $y''$ are computed by numerical differentiation, curvature extrema on the boundary line are identified, and these are selected as control points for navigation line fitting. The curvature is calculated as follows:
$$\kappa = \frac{|y''|}{\left(1 + (y')^{2}\right)^{3/2}} \tag{2}$$
Finally, a fifth-order Bézier curve fitted through multiple control points connects the discrete path points into a continuous, smooth navigation trajectory. Its parametric model is:
$$B(t) = \sum_{i=0}^{5} \binom{5}{i} (1 - t)^{5 - i}\, t^{i}\, P_i, \quad t \in [0, 1] \tag{3}$$
In the formula, $P_i$ ($i = 0, \dots, 5$) are the coordinates of the six control points (the start point, the end point, and four intermediate control points), and $\binom{5}{i}$ are the binomial coefficients. Figure 5 illustrates the navigation path fitting process. The resulting trajectory is stable and practical and can guide agricultural machinery through headland turning operations.
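The curvature analysis and Bézier evaluation in this step can be sketched with the standard library alone. The code below assumes a boundary sampled as y(x) on a uniform grid, uses three-point central differences as the smallest possible moving window, and evaluates a Bézier curve of any order from its control points. The function names, window size, and control point values are hypothetical, and the polynomial fitting step itself is omitted.

```python
from math import comb


def curvature(xs, ys):
    """Curvature |y''| / (1 + y'^2)^(3/2) at interior samples (uniform spacing assumed)."""
    h = xs[1] - xs[0]
    kappa = []
    for i in range(1, len(xs) - 1):
        d1 = (ys[i + 1] - ys[i - 1]) / (2 * h)              # first derivative
        d2 = (ys[i + 1] - 2 * ys[i] + ys[i - 1]) / (h * h)  # second derivative
        kappa.append(abs(d2) / (1 + d1 * d1) ** 1.5)
    return kappa


def curvature_maxima(kappa):
    """Indices (into kappa) of local curvature maxima, candidate control points."""
    return [i for i in range(1, len(kappa) - 1)
            if kappa[i] > kappa[i - 1] and kappa[i] >= kappa[i + 1]]


def bezier_point(ctrl, t):
    """Evaluate a Bézier curve of order len(ctrl) - 1 at parameter t in [0, 1]."""
    n = len(ctrl) - 1
    x = y = 0.0
    for i, (px, py) in enumerate(ctrl):
        basis = comb(n, i) * (1 - t) ** (n - i) * t ** i  # Bernstein polynomial
        x += basis * px
        y += basis * py
    return x, y


def bezier_path(ctrl, samples=50):
    """Sample the curve uniformly in t, including both end points."""
    return [bezier_point(ctrl, k / (samples - 1)) for k in range(samples)]


if __name__ == "__main__":
    xs = [i * 0.5 for i in range(-4, 5)]
    ys = [x * x for x in xs]                    # toy boundary: a parabola
    kappa = curvature(xs, ys)
    print(curvature_maxima(kappa), max(kappa))  # sharpest bend at the vertex
    # Six hypothetical control points give a fifth-order curve
    ctrl = [(0, 0), (0, 2), (1, 4), (3, 4), (4, 2), (4, 0)]
    print(bezier_path(ctrl, samples=5))
```

A Bézier curve always starts at the first control point and ends at the last, so the start and end of the turn are fixed exactly while the four intermediate points shape the arc.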
[0028] To verify that the proposed framework can be deployed on actual agricultural equipment, the entire algorithm was deployed and tested on an NVIDIA Jetson TX2 edge device, including the lightweight object detection triggering module, the AFFormer segmentation module, and the path generation module based on polynomial fitting and higher-order Bézier curves. Compared with a high-performance desktop platform, the algorithm on the embedded platform showed a slight increase in inference time and a slight decrease in the average processing frame rate of each module. The detection, segmentation, and path extraction results are shown in Figure 6: Figure 6(a) shows the input image sequence, and Figure 6(b) shows the processing results on the Jetson platform. Experimental results show that, on the same dataset, the detection, segmentation, and path generation modules run at approximately 21.7 fps, 18.2 fps, and 200 fps respectively on the Jetson TX2, somewhat slower than on a high-performance desktop. According to field experiment statistics, in typical cornfield operations the segmentation and path generation stages are triggered infrequently, accounting for approximately 5%-10% of the overall operation time. The system therefore runs only the detection module, at 21.7 fps, during straight-line driving, which makes up the vast majority of the time, and runs all three stages together at about 9.43 fps only when the complete path planning process is executed. This design preserves real-time performance while minimizing computational resource consumption, meeting the real-time requirements of agricultural machinery operations.
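The reported combined frame rate is consistent with running the three stages one after another on each frame. The source does not state how the 9.43 fps figure was obtained, so sequential execution is an assumption of this sketch; the per-stage rates are taken from the paragraph above.

```python
def sequential_fps(*stage_fps: float) -> float:
    """Frame rate when stages run back to back: per-frame times add up."""
    return 1.0 / sum(1.0 / f for f in stage_fps)


if __name__ == "__main__":
    detect, segment, path = 21.7, 18.2, 200.0  # per-stage rates on the Jetson TX2
    print(f"combined: {sequential_fps(detect, segment, path):.2f} fps")
```

Because per-frame times add, the slowest stages dominate: the 200 fps path generation stage contributes only 5 ms of the roughly 106 ms total.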
[0029] The above description is merely a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for locating cornfield headlands and extracting paths, characterized in that it includes the following steps: Step 1: Construct a lightweight detection triggering network, YOLOv12-LiteRep, based on YOLOv12. The YOLOv12-LiteRep network uses YOLOv12 as the baseline model, introduces DySample and FocalerCIoU to improve feature representation ability and bounding box regression stability, uses a purpose-designed MBConv_detect detection head to enhance key feature representation, and uses a MobileOne module improved through a structural reparameterization strategy as the backbone network. Step 2: Propose a triggering strategy based on temporal frame analysis. The strategy takes the frame-by-frame detection results output by the YOLOv12-LiteRep detector, reduces the detection state of each frame to a binary judgment of whether a valid crop target is detected, and generates a trigger signal from the judgments of multiple consecutive frames. Step 3: Introduce the lightweight semantic segmentation model AFFormer to perform fine segmentation of the drivable area once activated by the trigger signal. Step 4: Design a turning path generation method based on polynomial fitting and higher-order Bézier curves. Using the contour of the segmented drivable area, extract the headland boundary through polynomial fitting, and then combine the curvature change points to generate a smooth turning trajectory with a higher-order Bézier curve.
2. The method for locating cornfield headlands and extracting paths according to claim 1, characterized in that, in step 1, the construction of the YOLOv12-LiteRep network includes: using the EfficientNet-based MBConv module to rebuild the detection head in lightweight form; replacing the original sampling operation with the dynamic upsampling operator DySample; introducing the improved structural reparameterization module MobileOne into the backbone network; and using the Focaler-IoU loss function to optimize the gradient distribution during training.
3. The method for locating cornfield headlands and extracting paths according to claim 1, characterized in that, in step 2, the triggering strategy based on temporal frame analysis is as follows: for 25 fps video, a frame is judged as detected when it contains a crop target with a confidence level ≥ 0.30 and a detection box area ≥ 0.1% of the total image area, and as undetected otherwise. A consecutive-undetected counter CN and a consecutive-detected counter CP are maintained. When CN reaches 5 frames, a ridge end arrival trigger signal is generated. Once triggered, the triggered state is held until CP reaches 3 consecutive frames, at which point the trigger is canceled.
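The counter logic of claim 3 can be sketched as a small state machine. This is a minimal illustration, not the patented implementation: the class name, the method names, and the `(confidence, box_area_px)` detection format are assumptions, while the thresholds (0.30, 0.1%, 5 frames, 3 frames) come from the claim.

```python
class RidgeEndTrigger:
    """Hysteresis trigger driven by per-frame detected/undetected judgments."""

    def __init__(self, conf_threshold=0.30, min_area_ratio=0.001,
                 miss_frames=5, hit_frames=3):
        self.conf_threshold = conf_threshold
        self.min_area_ratio = min_area_ratio  # 0.1% of the image area
        self.miss_frames = miss_frames        # CN needed to raise the trigger
        self.hit_frames = hit_frames          # CP needed to cancel it
        self.cn = 0                           # consecutive undetected frames
        self.cp = 0                           # consecutive detected frames
        self.triggered = False

    def is_detected(self, detections, image_area):
        # detections: iterable of (confidence, box_area_px) pairs (assumed format)
        return any(conf >= self.conf_threshold
                   and area / image_area >= self.min_area_ratio
                   for conf, area in detections)

    def update(self, detections, image_area):
        """Process one frame and return the current trigger state."""
        if self.is_detected(detections, image_area):
            self.cp += 1
            self.cn = 0
        else:
            self.cn += 1
            self.cp = 0
        if not self.triggered and self.cn >= self.miss_frames:
            self.triggered = True    # ridge end reached
        elif self.triggered and self.cp >= self.hit_frames:
            self.triggered = False   # crops visible again
        return self.triggered
```

At 25 fps, five missed frames correspond to 0.2 s and three detected frames to 0.12 s, so a single dropped detection or a single spurious box does not flip the state.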
4. The method for locating cornfield headlands and extracting paths according to claim 1, characterized in that, in step 3, the lightweight semantic segmentation model AFFormer compresses high-resolution pixel features into a small number of prototype vectors through a prototype clustering mechanism, moving the semantic learning process into a low-dimensional space, reducing the computational complexity of the self-attention mechanism from O(N²) to O(N), and achieving pixel-level fine segmentation of the drivable area.
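The complexity argument in claim 4 can be illustrated with a toy example. The sketch below is not AFFormer's actual architecture. It assumes that prototypes are formed by average-pooling contiguous groups of pixel features and that each pixel attends only to those k prototypes, so the score matrix has shape N×k instead of N×N. The function name and the pooling scheme are illustrative.

```python
import numpy as np


def prototype_attention(pixels, num_prototypes):
    """Attend N pixel features to k pooled prototypes: O(N * k) scores."""
    n, d = pixels.shape
    # Assumption: prototypes are means of contiguous pixel groups.
    groups = np.array_split(pixels, num_prototypes, axis=0)
    prototypes = np.stack([g.mean(axis=0) for g in groups])   # (k, d)

    scores = pixels @ prototypes.T / np.sqrt(d)               # (n, k)
    scores = scores - scores.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)    # softmax over prototypes

    return weights @ prototypes, weights                      # (n, d), (n, k)
```

Because k is fixed and small, the cost grows linearly with the number of pixels N, whereas full self-attention would need an N×N score matrix.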
5. The method for locating cornfield headlands and extracting paths according to claim 1, characterized in that, in step 4, the turning path generation method specifically includes: extracting a set of boundary contour points from the drivable area segmentation results, performing polynomial fitting on the boundary contour to determine the headland boundary and key turning points, constructing a high-order Bézier curve from the key points, and generating a turning path that satisfies the smoothness and continuity constraints and lies within the drivable area.
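The polynomial fitting step of claim 5 might look like the following sketch. It assumes the boundary can be expressed as y = f(x) in image coordinates and uses a cubic by default. The claim states neither the parameterization nor the polynomial degree, so both are assumptions, as is the function name.

```python
import numpy as np


def fit_boundary(contour_points, degree=3):
    """Least-squares polynomial y = f(x) through boundary contour points."""
    pts = np.asarray(contour_points, dtype=float)
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], degree)
    return np.poly1d(coeffs)  # callable polynomial; .deriv() gives the slope
```

The returned `np.poly1d` object can be evaluated at any x and differentiated, which is convenient when looking for turning points along the fitted boundary.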
6. The method for locating cornfield headlands and extracting paths according to claim 2, characterized in that the MBConv_detect head adopts an inverted bottleneck structure. It first raises the channel dimension with a 1×1 convolution to expand feature representation capability, then uses depthwise separable convolution to separate spatial and channel feature extraction, introduces an ECA attention mechanism to adaptively calibrate channel feature weights, and finally reduces the dimension with a 1×1 convolution to compress and fuse features, with a shortcut connection to promote gradient flow.
7. The method for locating cornfield headlands and extracting paths according to claim 2, characterized in that the improved MobileOne module (MobileRepOne) adopts a multi-branch topology during the training phase, with three parallel branches: the first branch is a combination of a 3×3 standard convolution and a depthwise separable convolution, the second branch is a batch normalization layer, and the third branch is a skip connection consisting of a 1×1 standard convolution and batch normalization. The outputs of the three branches are concatenated and aggregated before being fed into a dual-path enhancement block. During the inference phase, the structure is reparameterized into a single convolution and batch normalization operation.
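The principle behind structural reparameterization in claim 7 is that parallel linear branches can be merged into one equivalent operation. The sketch below is a deliberately simplified illustration, not the MobileRepOne module: it keeps only a 1×1 convolution with batch normalization and a batch-normalization-only branch, treats a 1×1 convolution as a channel-mixing matrix, and also folds the batch normalization into the weights. The 3×3 and depthwise branches and the dual-path enhancement block are omitted, and all names are illustrative.

```python
import numpy as np

EPS = 1e-5


def batch_norm(x, gamma, beta, mean, var):
    """Inference-time batch norm on features of shape (channels, pixels)."""
    scale = gamma / np.sqrt(var + EPS)
    return scale[:, None] * (x - mean[:, None]) + beta[:, None]


def fold_bn(weight, gamma, beta, mean, var):
    """Fold batch norm into a preceding 1x1 convolution (a matrix)."""
    scale = gamma / np.sqrt(var + EPS)
    return scale[:, None] * weight, beta - scale * mean


def multi_branch(x, conv_w, bn_conv, bn_identity):
    """Training-time form: (1x1 conv + BN) branch plus a BN-only branch."""
    return batch_norm(conv_w @ x, *bn_conv) + batch_norm(x, *bn_identity)


def reparameterize(conv_w, bn_conv, bn_identity):
    """Inference-time form: one weight matrix and one bias vector."""
    w1, b1 = fold_bn(conv_w, *bn_conv)
    # The BN-only branch is a 1x1 convolution with identity weights.
    w2, b2 = fold_bn(np.eye(conv_w.shape[0]), *bn_identity)
    return w1 + w2, b1 + b2
```

The fused weights give the same output as the two-branch form, so the extra branches cost nothing at inference time.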
8. The method for locating cornfield headlands and extracting paths according to claim 2, characterized in that the Focaler-IoU loss function is calculated by the following equation: [equation not reproduced in the source text], where α is the modulation factor, β is the focus coefficient, IoU is the intersection-over-union ratio between the predicted bounding box and the ground-truth bounding box, and $R_{CIoU}$ is the penalty term in CIoU, which accounts for center point distance and aspect ratio.
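The two ingredients named in claim 8 that have a standard definition, IoU and the CIoU penalty term, can be computed as in the sketch below. It follows the commonly published CIoU formulation (normalized center distance plus a weighted aspect-ratio term) and does not reproduce the claim's own loss equation. The variable `trade_off` is the aspect-ratio weight of standard CIoU and is unrelated to the modulation factor α of the claim. Boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive width and height.

```python
import math


def iou_and_ciou_penalty(pred, gt):
    """Return (IoU, CIoU penalty) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter)

    # Squared center distance over squared diagonal of the enclosing box.
    rho2 = ((px1 + px2 - gx1 - gx2) / 2) ** 2 + ((py1 + py2 - gy1 - gy2) / 2) ** 2
    c2 = ((max(px2, gx2) - min(px1, gx1)) ** 2
          + (max(py2, gy2) - min(py1, gy1)) ** 2)

    # Aspect-ratio consistency term and its weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    trade_off = v / ((1 - iou) + v) if v > 0 else 0.0

    return iou, rho2 / c2 + trade_off * v
```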
9. The method for locating cornfield headlands and extracting paths according to claim 5, characterized in that the curvature change points are extracted using a moving-window curvature analysis method: the first-order and second-order numerical derivatives of the boundary contour points are calculated, the curvature extremum points are determined through the curvature formula, and these points are used as control points for fitting the turning path.
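One way to carry out the curvature analysis of claim 9 is sketched below. It assumes the boundary is an ordered sequence of (x, y) points, uses a moving average as the moving window, and applies the standard plane-curve curvature κ = |x'y'' − y'x''| / (x'² + y'²)^(3/2). The claim's exact formula and window size are not reproduced in the text, so these choices and the function name are assumptions.

```python
import numpy as np


def curvature_extrema(points, window=5):
    """Curvature along an ordered contour and indices of its local maxima."""
    pts = np.asarray(points, dtype=float)
    kernel = np.ones(window) / window
    # Moving-average smoothing; "valid" drops window // 2 points at each end.
    x = np.convolve(pts[:, 0], kernel, mode="valid")
    y = np.convolve(pts[:, 1], kernel, mode="valid")

    # First- and second-order numerical derivatives with respect to point index.
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    kappa = np.abs(dx * ddy - dy * ddx) / (dx**2 + dy**2 + 1e-12) ** 1.5

    # Local maxima of the curvature profile.
    inner = kappa[1:-1]
    is_peak = (inner > kappa[:-2]) & (inner >= kappa[2:])
    peaks = np.where(is_peak)[0] + 1
    return kappa, peaks + window // 2  # peak indices refer to the input points
```

On a contour made of two straight segments meeting at a right angle, the only curvature maximum falls at the corner point, which is the kind of location that would serve as a control point for the turning path.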
10. The method for locating cornfield headlands and extracting paths according to claim 5, characterized in that the higher-order Bézier curve is a fifth-order Bézier curve with the parametric model $B(t) = \sum_{i=0}^{5} C_5^i (1-t)^{5-i} t^i P_i$, where $t \in [0,1]$, $P_i$ ($i = 0, \dots, 5$) are the coordinates of the six control points, comprising the start point, the end point, and four intermediate control points, and $C_5^i$ are the binomial coefficients.
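A fifth-order Bézier curve with six control points can be evaluated directly from the Bernstein basis, as in the sketch below. The function name and the sampling density are illustrative, and how the four intermediate control points are chosen is outside the scope of this example.

```python
from math import comb

import numpy as np


def quintic_bezier(control_points, num_samples=50):
    """Sample a fifth-order Bezier curve defined by six control points."""
    p = np.asarray(control_points, dtype=float)
    if p.shape[0] != 6:
        raise ValueError("a fifth-order Bezier curve needs six control points")
    n = 5
    t = np.linspace(0.0, 1.0, num_samples)
    # Bernstein basis: binomial coefficient * (1 - t)^(n - i) * t^i.
    basis = np.stack(
        [comb(n, i) * (1 - t) ** (n - i) * t**i for i in range(n + 1)],
        axis=1,
    )  # (num_samples, 6)
    return basis @ p  # (num_samples, dims)
```

The curve starts at the first control point and ends at the last, and it stays inside the convex hull of the six control points, which helps keep the turning path within the drivable area when the control points themselves lie inside it.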