YOLOv4-based optimization method for rapid identification and detection of field sow thistle obstacles
By optimizing YOLOv4's prior box re-clustering, pooling method, and network structure, the invention addresses YOLOv4's slow detection of sow thistle in orchards and achieves real-time, efficient detection on embedded systems.
Patent Information
- Authority / Receiving Office
- WO
- Patent Type
- Application
- Current Assignee / Owner
- INST OF AGRI RESOURCES & REGIONAL PLANNING CHINESE ACADEMY OF AGRI SCI
- Filing Date
- 2025-11-14
- Publication Date
- 2026-06-18
AI Technical Summary
The existing YOLOv4 detects obstacles such as sow thistle in orchards too slowly to meet real-time and resource-efficiency requirements, especially on embedded systems with limited resources.
YOLOv4 is optimized in three respects: prior box re-clustering, pooling method, and network structure. Specifically, the K-means algorithm is used to initialize the anchor boxes, the detection output for targets that need not be detected is pruned, and pooling uses an exponentially weighted average filtering method.
The method accelerates target detection, reduces computing and memory usage so that it suits embedded systems, and maintains high accuracy in identifying sow thistle.
Abstract
Description
An Optimized Method for Fast Recognition and Detection of Sow Thistle Obstacles Based on YOLOv4
Technical Field
[0001] This invention relates to the field of agricultural robot technology, and specifically to an optimized method, based on YOLOv4, for rapid recognition and detection of sow thistle obstacles.
Background Technology
[0002] For an agricultural robot to drive autonomously, it must first be able to avoid obstacles on its own and drive safely. In orchards, sow thistle is the most common obstacle. Because sow thistle grows in irregular clusters and varied patterns, it exhibits both rigid and flexible growth characteristics. Furthermore, environmental influences prevent sow thistle from being detected consistently and accurately in images. YOLOv4, with its backbone and detection head networks, is applicable to detecting sow thistle in its diverse growth forms in complex orchard scenes, but its detection speed is relatively slow, which can limit its performance, especially in resource-constrained embedded systems or real-time applications. It is therefore necessary to develop an optimized YOLOv4-based method for fast recognition and detection that meets real-time and resource-efficiency requirements.
Summary of the Invention
[0003] The purpose of this invention is to overcome the problems of the prior art and provide an optimized YOLOv4-based method for fast recognition and detection of sow thistle obstacles. Optimizations are made in three respects (prior box re-clustering, pooling method, and network structure) in order to accelerate target detection so that it suits real-time applications, to reduce computing and memory usage so that it suits embedded systems and other resource-limited devices, and to minimize the loss of detection accuracy while maintaining high-quality recognition of sow thistle.
[0004] To achieve the above-mentioned technical objectives and effects, the present invention is implemented through the following technical solution:
[0005] An optimized method, based on YOLOv4, for fast recognition and detection of sow thistle (*Sonchus oleraceus*) obstacles, the method comprising the following steps:
[0006] Step S1: Initialize the anchor boxes using the K-means algorithm to generate anchor boxes with stronger scale adaptability, making the model better suited to detecting slender sow thistle;
[0007] Step S2: Optimize the network structure for the specific application scene: remove distant sow thistle and nearby small sow thistle targets from detection, and stably detect only near-field targets in each frame;
[0008] Step S3: Optimize the pooling method by using an exponentially weighted average filtering method to retain as much useful information as possible.
[0009] Furthermore, the specific steps for initializing the anchor boxes in step S1 are as follows:
[0010] Step S1.1: Randomly select 9 boxes as initial anchor boxes;
[0011] Step S1.2: Assign each box to the nearest anchor box using the IOU-based distance metric:
$$d(\text{box}, \text{centre}) = 1 - \mathrm{IOU}(\text{box}, \text{centre})$$
[0012] In the formula, box represents the actual (ground-truth) bounding box, centre represents the anchor box (cluster centre), IOU represents the intersection over union, and d represents the distance between them;
[0013] Step S1.3: Calculate the mean of the width and height of all boxes in each cluster, and update the anchor boxes;
[0014] Step S1.4: Repeat steps S1.2 and S1.3 until the selected anchor box no longer changes, or the maximum number of iterations is reached.
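Steps S1.1 to S1.4 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the applicant's implementation: boxes are given as (width, height) pairs already scaled to the network input, boxes and anchors are compared as if they shared a common centre, and the names `wh_iou` and `kmeans_anchors` are hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height); positions are ignored (assumption)

def wh_iou(box: Box, anchor: Box) -> float:
    """IOU of two boxes compared by width/height only, as if they share a centre."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes: List[Box], k: int = 9, max_iter: int = 300, seed: int = 0) -> List[Box]:
    """Hypothetical K-means anchor clustering with distance d = 1 - IOU."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # S1.1: random initial anchors drawn from the boxes
    for _ in range(max_iter):       # S1.4: cap on the number of iterations
        # S1.2: assign each box to the anchor with the smallest distance 1 - IOU
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda i: 1.0 - wh_iou(b, anchors[i]))
            clusters[nearest].append(b)
        # S1.3: move each anchor to the mean width and height of its cluster
        new_anchors: List[Box] = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor as-is
        # S1.4: stop once the anchors no longer change
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest area first
```

In use, `boxes` would hold the sizes of all labelled sow thistle boxes and `k = 9` as in step S1.1; sorting the result by area makes it easy to hand three anchors to each detection scale.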
[0015] Furthermore, in step S2, the anchor boxes corresponding to the largest feature map are responsible for detecting small targets. Since small targets do not need to be detected, the largest feature map is deleted, which reduces the number of prior boxes placed and thereby increases inference speed.
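Step S2 can be illustrated by how YOLOv4 distributes its nine anchors over the three output scales. The sketch below assumes a 608×608 input (so the grids are 76×76, 38×38 and 19×19) and the usual convention that the three smallest anchors go to the finest grid. The default YOLOv4 COCO anchors serve only as placeholder input; in the method they would come from the re-clustering of step S1. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Anchor = Tuple[int, int]  # (width, height) in input-image pixels

def assign_anchors_to_scales(anchors: Sequence[Anchor],
                             grids: Sequence[int] = (76, 38, 19)) -> Dict[int, List[Anchor]]:
    """Sort anchors by area and give the smallest group to the finest grid (76x76)."""
    ordered = sorted(anchors, key=lambda a: a[0] * a[1])
    per_scale = len(ordered) // len(grids)
    return {g: ordered[i * per_scale:(i + 1) * per_scale] for i, g in enumerate(grids)}

def prune_finest_scale(scales: Dict[int, List[Anchor]]) -> Dict[int, List[Anchor]]:
    """Drop the largest feature map, i.e. the output that detects the smallest targets."""
    finest = max(scales)
    return {g: a for g, a in scales.items() if g != finest}

# Default YOLOv4 anchors (COCO), used here only as placeholder input.
COCO_ANCHORS = [(12, 16), (19, 36), (40, 28), (36, 75), (76, 55),
                (72, 146), (142, 110), (192, 243), (459, 401)]

pruned = prune_finest_scale(assign_anchors_to_scales(COCO_ANCHORS))
```

After pruning, only the 38×38 and 19×19 outputs remain, each keeping its three larger anchors, so targets that would have matched the three smallest anchors are no longer detected.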
[0016] The beneficial effects of this invention are:
[0017] This invention uses YOLOv4 for accurate detection of sow thistle and optimizes it in three respects: prior box re-clustering, pooling method, and network structure. The optimized YOLOv4 accelerates target detection, making it suitable for real-time applications; reduces computing and memory usage, adapting it to embedded systems and other resource-constrained devices; minimizes the loss of detection accuracy; and maintains high-quality target recognition.
Attached Figure Description
[0018] Figure 1 is a flowchart of the optimization method of the present invention;
[0019] Figure 2 shows the YOLOv4 network structure optimized by the method of the present invention;
[0020] Figure 3 is a comparison between YOLOv4 after re-clustering optimization by the method of the present invention and the standard YOLOv4.
[0021] Figure 4 is a schematic diagram of the anchor boxes optimized according to the method of the present invention;
[0022] Figure 5 shows the detection results of the method of the present invention, taking sow thistle as an example.
Detailed Implementation
[0023] The present invention will now be described in detail with reference to the accompanying drawings and embodiments.
[0024] An optimized method, based on YOLOv4, for fast recognition and detection of sow thistle (*Sonchus oleraceus*) obstacles, as shown in Figure 1, comprises the following steps:
[0025] Step S1: Initialize the anchor boxes using the K-means algorithm to generate anchor boxes with stronger scale adaptability, making the model better suited to detecting slender sow thistle;
[0026] Step S2: Optimize the network structure for the specific application scene. Distant sow thistle and nearby small sow thistle targets do not affect the obstacle avoidance decisions of the transport vehicle. By cropping the network output and deleting the large feature map, the network no longer recognizes such targets; that is, distant and nearby small sow thistle targets are cropped, and only near-field targets are stably detected in each frame;
[0027] Step S3: Optimize the pooling method. Because max pooling can lose important features, an exponentially weighted average filtering method is used for pooling to retain as much useful information as possible.
[0028] In step S1, note that the preset prior boxes of YOLOv4 were obtained by K-means clustering on the COCO dataset. Because COCO contains objects of many different sizes, these presets are not tailored to any particular object, so K-means re-clustering is performed on the specific target objects.
[0029] The specific steps for initializing the anchor boxes are as follows:
[0030] Step S1.1: Randomly select 9 boxes as initial anchor boxes;
[0031] Step S1.2: Assign each box to the nearest anchor box using the IOU-based distance metric:
$$d(\text{box}, \text{centre}) = 1 - \mathrm{IOU}(\text{box}, \text{centre})$$
[0032] In the formula, box represents the actual (ground-truth) bounding box, centre represents the anchor box (cluster centre), IOU represents the intersection over union, and d represents the distance between them;
[0033] Step S1.3: Calculate the mean of the width and height of all boxes in each cluster, and update the anchor boxes;
[0034] Step S1.4: Repeat steps S1.2 and S1.3 until the selected anchor box no longer changes, or the maximum number of iterations is reached;
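The IOU used in step S1.2 can be made concrete. In YOLO-style anchor clustering, boxes are usually compared by width and height only, as if they shared a common centre; this is an assumption here, since the source does not say how box positions are handled. With w_b, h_b the width and height of box and w_c, h_c those of centre, the IOU and the usual clustering distance d = 1 - IOU become the following, worked for a hypothetical 20×60 box and 30×40 anchor:

$$\mathrm{IOU}(\text{box}, \text{centre}) = \frac{\min(w_b, w_c)\,\min(h_b, h_c)}{w_b h_b + w_c h_c - \min(w_b, w_c)\,\min(h_b, h_c)}$$

$$\mathrm{IOU} = \frac{20 \cdot 40}{1200 + 1200 - 800} = \frac{800}{1600} = 0.5, \qquad d = 1 - 0.5 = 0.5$$

Using 1 - IOU rather than Euclidean distance keeps large boxes from dominating the clustering, so slender and small boxes can still form their own clusters.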
[0035] Taking the detection of sow thistle as an example, the anchor boxes obtained by K-means clustering are shown in the table below. The training results obtained after re-clustering are shown in Figure 3. After re-clustering, the anchor boxes better match the actual targets, which increases the F1 value of the optimized YOLOv4 by 5%.
[0036] In step S2, when detecting sow thistle obstacles, the method of the present invention does not need to stably detect distant obstacles; it only needs to stably detect nearby obstacles in each frame. Because near objects appear larger and distant objects appear smaller, small targets in the image do not pose a threat to the vehicle and are therefore not detected. The largest scale (76×76) has the smallest receptive field, so it is this scale that detects small sow thistle targets. The anchor boxes for this scale were obtained in step S1, and the anchor boxes in the image are shown in Figure 4. Since the anchor boxes corresponding to the large feature map are responsible for small targets, which pose no threat to the vehicle and need not be detected, the output of the large scale is cropped during YOLOv4 pruning optimization. Deleting the largest feature map greatly reduces the number of prior boxes placed and thus greatly increases inference speed.
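The effect of deleting the 76×76 output on the number of prior boxes can be checked with simple arithmetic. The sketch assumes a 608×608 input (grids of 76, 38 and 19 cells per side) and three anchors per cell, consistent with nine anchors over three scales; the function name is hypothetical.

```python
def prior_box_count(grids, anchors_per_cell=3):
    """Number of prior boxes placed: anchors_per_cell boxes in every cell of every grid."""
    return sum(g * g * anchors_per_cell for g in grids)

full = prior_box_count((76, 38, 19))    # standard YOLOv4 output scales
pruned = prior_box_count((38, 19))      # 76x76 output removed
reduction = 1 - pruned / full
print(full, pruned, f"{reduction:.1%}")  # 22743 5415 76.2%
```

Under these assumptions about 76% of the prior boxes disappear. The measured 24.3% speed-up is smaller, which would be expected if most of the computation sits in layers that the pruning leaves in place; the source does not break the timing down further.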
[0037] Taking the detection of sow thistle as an example, the detection speed before and after the deletion was compared; the detection speed increased by 24.3%, as shown in the table below:
[0038] [Revised according to Rule 26, 23.12.2025] YOLOv4 enlarges the receptive field of the network through SPP (spatial pyramid pooling, implemented with max pooling). Max pooling keeps only the maximum value within each pooling window and discards the rest of the region, which may lose useful information. In step S3, pooling is instead performed with an exponentially weighted average filtering method to retain as much useful information as possible. The principle is shown in the table below. Compared with max pooling, exponentially weighted average pooling retains more information when downsampling activation maps while still letting higher activations dominate, which significantly reduces the risk of information loss and thus improves accuracy.
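The source does not give the exact pooling formula. One plausible reading of "exponentially weighted average" pooling, with the same form as the published SoftPool method, weights each activation a_i in a window by exp(a_i) / Σ_j exp(a_j) and returns the weighted sum, so larger activations dominate while smaller ones still contribute. The sketch compares this with max pooling on a single-channel map; it omits the stride-1, padded windows an SPP block would use, and all names are hypothetical.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]

def pool2d(x: Matrix, k: int, stride: int, reduce: Callable[[List[float]], float]) -> Matrix:
    """Slide a k x k window (no padding) over a single-channel map and reduce each window."""
    out: Matrix = []
    for i in range(0, len(x) - k + 1, stride):
        row = []
        for j in range(0, len(x[0]) - k + 1, stride):
            window = [x[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(reduce(window))
        out.append(row)
    return out

def max_reduce(window: List[float]) -> float:
    """Standard max pooling: keep only the largest activation."""
    return max(window)

def ewa_reduce(window: List[float]) -> float:
    """Exponentially weighted average (SoftPool-style): weights exp(a_i) / sum_j exp(a_j)."""
    m = max(window)  # shifting by the max avoids overflow and leaves the weights unchanged
    weights = [math.exp(a - m) for a in window]
    return sum(w * a for w, a in zip(weights, window)) / sum(weights)

feature_map = [[0.0, 0.0, 1.0, 1.0],
               [0.0, 2.0, 1.0, 1.0],
               [3.0, 3.0, 0.0, 0.0],
               [3.0, 3.0, 0.0, 0.0]]
print(pool2d(feature_map, 2, 2, max_reduce))  # [[2.0, 1.0], [3.0, 0.0]]
print(pool2d(feature_map, 2, 2, ewa_reduce))
```

On this sample map the top-left window [0, 0, 0, 2] max-pools to 2.0 but EWA-pools to about 1.42, a value between the window mean and its maximum, so the weaker activations are not simply discarded.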
[0039] The YOLOv4 network structure after the above three optimization steps is shown in Figure 2. Taking sow thistle as an example, the detection results of the optimized YOLOv4 are shown in Figure 5. Since the network no longer has the large-scale prior boxes, small targets in the foreground and small targets elsewhere in the field of view are no longer detected (red dashed boxes). After the improvement, the network detects sow thistle accurately, whether near or far, and under alternating occlusion and illumination shadows.
[0040] Taking the detection of sow thistle as an example, the table below compares the YOLOv4 model optimized by the method of this invention with the unoptimized YOLOv4 model:
[0041] To further verify how the YOLOv4 optimized by the proposed method differs from other detection models, it was compared with Fast R-CNN, SSD300, and YOLOv3 in terms of precision (P), recall (R), F1, inference speed, and model size. The results are shown in the table below. The optimized YOLOv4 obtained by the method of this invention has a significant real-time advantage over Fast R-CNN and SSD300, and a significant accuracy advantage over the single-stage network YOLOv3 and the standard YOLOv4.
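For reference, P, R and F1 are conventionally computed from true positives (TP), false positives (FP) and false negatives (FN) as below. How detections are matched to ground truth (for example, the IOU threshold) is not stated in the source, and the counts in the usage line are hypothetical.

```python
from typing import Tuple

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean of P and R); 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=20)  # hypothetical counts
```

Because F1 is a harmonic mean, it rewards models that balance precision and recall rather than maximizing one at the expense of the other.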
[0042] The above description is merely a preferred embodiment of the present invention and is not intended to limit the invention. Various modifications and variations can be made to the present invention by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the scope of protection of the present invention.
Claims
1. An optimized method for rapid recognition and detection of sow thistle (*Sonchus oleraceus*) obstacles based on YOLOv4, characterized in that the method comprises the following steps: Step S1: initialize the anchor boxes using the K-means algorithm to generate anchor boxes with stronger scale adaptability, making the model better suited to detecting slender sow thistle; Step S2: optimize the network structure for the specific application scene, removing distant sow thistle and nearby small sow thistle targets from detection and stably detecting only near-field targets in each frame; Step S3: optimize the pooling method by using an exponentially weighted average filtering method to retain as much useful information as possible.
2. The optimized method for rapid recognition and detection of sow thistle (*Sonchus oleraceus*) obstacles based on YOLOv4 according to claim 1, characterized in that, in step S1, the specific steps for initializing the anchor boxes are as follows: Step S1.1: randomly select 9 boxes as initial anchor boxes; Step S1.2: assign each box to the nearest anchor box using the IOU-based distance metric $d(\text{box}, \text{centre}) = 1 - \mathrm{IOU}(\text{box}, \text{centre})$, in which box represents the actual bounding box, centre represents the anchor box, IOU represents the intersection over union, and d represents the distance between them; Step S1.3: calculate the mean width and height of all boxes in each cluster and update the anchor boxes; Step S1.4: repeat steps S1.2 and S1.3 until the selected anchor boxes no longer change or the maximum number of iterations is reached.
3. The optimized method for rapid recognition and detection of sow thistle (*Sonchus oleraceus*) obstacles based on YOLOv4 according to claim 2, characterized in that, in step S2, the anchor boxes corresponding to the largest feature map are responsible for detecting small targets; since small targets do not need to be detected, the largest feature map is deleted to reduce the number of prior boxes placed, thereby increasing inference speed.