An improved YOLOv5-based water surface target detection method and device

An improved YOLOv5 water surface target detection method uses Gaussian filtering, a multi-angle attention mechanism, and a feature scale transformation and transfer fusion network, combined with the Distance-IoU loss function, to address insufficient multi-target detection capability in water surface target detection and thereby improve detection accuracy.

CN116311011B · Active · Publication Date: 2026-06-23 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: SOUTHEAST UNIV
Filing Date: 2023-03-10
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing target detection algorithms have poor multi-target detection capability. In water surface target detection, complex scenes, diverse target types, large differences in scale distribution, and frequent occlusion lead to unsatisfactory detection results.

Method used

An improved Yolov5 water surface target detection method is adopted, which improves detection accuracy by using Gaussian filtering, a multi-angle attention mechanism, and a feature scale transformation and transfer fusion network, combined with the Distance-IoU loss function.

Benefits of technology

It improves the precision and accuracy of water surface target detection, reduces missed and false detections in complex backgrounds, and enhances multi-target detection performance.

✦ Generated by Eureka AI based on patent content.


Abstract

The application provides a water surface target detection method and device based on an improved Yolov5, including the following steps: 1) acquire images and construct a water surface target dataset after data augmentation; 2) apply a Gaussian filter to smooth the shallow feature map as preprocessing; 3) embed a multi-angle attention mechanism in the Yolov5 network so that the extracted features are richer, mitigating the resolution reduction and local information loss caused by ordinary methods; 4) perform feature processing with the feature scale transformation and transfer fusion Yolov5 network; 5) calculate the loss using the Distance-IoU loss method, then combine the three kinds of losses by weighted calculation and optimize the network to obtain more accurate results. The method combines computer vision and detection technology, improves the efficiency and accuracy of water surface target detection, and avoids the low detection accuracy, slow recognition speed, and unsatisfactory multi-target detection of existing methods.

Description

Technical Field

[0001] This invention relates to the fields of computer vision and detection, specifically to an improved YOLOv5-based method and apparatus for detecting water surface targets.

Background Technology

[0002] With the rapid development of modern shipbuilding technology and the proposal of the national maritime power strategy, the ocean has been elevated to a national strategic level. Domestically, China has a long coastline and favorable marine resources and environmental conditions, but the overall utilization and development of marine resources are currently insufficient, which indicates a large untapped potential for marine resources to support national socio-economic development. Internationally, competition among maritime powers in marine development is ongoing, and China's marine construction also faces various international challenges. To better utilize marine resources, alleviate the urgent demand for resource supply in modern society, and strengthen China's initiative in international maritime competition, improving the intelligence level of ships and ensuring navigation safety has become a hot research topic in the shipbuilding field. With the successive emergence of new concepts such as the Internet of Things, cloud computing, and artificial intelligence, and the continuous upgrading of sensor equipment and computer hardware, target detection technology is playing an increasingly important role.

[0003] However, current object detection algorithms have certain limitations; they perform well for detecting single, simple targets, but are less effective at detecting multiple target types. Water surface target detection faces challenges such as complex water scenes, diverse types of water targets with significant intra-class variations, large differences in target scale distribution, and frequent occlusion. These issues lead to unsatisfactory detection results. Accurate identification and classification of water targets and improving the accuracy of target detection are key research areas, as they are crucial for future practical applications.

Summary of the Invention

[0004] To address the aforementioned issues, this invention provides an improved Yolov5-based method and apparatus for water surface target detection. It utilizes Gaussian filtering for smoothing, introduces a multi-angle attention mechanism, and integrates feature scale transformation and transfer with the Yolov5 network to improve detection accuracy.

[0005] This invention provides an improved YOLOv5-based method for detecting water surface targets, comprising the following steps:

[0006] Step S1: Use the camera on the unmanned vessel to collect several images of the water surface environment, and use methods such as random scaling, random cropping, random arrangement, mix-up and mosaic to perform data augmentation and create a water surface target dataset;

[0007] Step S2: Introduce Gaussian filtering. Before fusing the shallow and deep features of the PAN structure, use a Gaussian filter with a small kernel size and variance to smooth the shallow feature map. This can preserve useful high-frequency information and reduce the impact of Gaussian noise on the fused features.

[0008] Step S3: Introduce a multi-angle attention mechanism. First, flip the image three times by 90° to obtain four sets of samples from different orientations. Then, perform 2D convolution on each of the four sets of samples to extract features. Concatenate the four extracted features, and then perform 2D convolution followed by Batch Normalization and ReLU operations to form new features. Flip the new features by 180° to obtain two sets of samples, and apply the same sequence of operations (convolution, concatenation, normalization, and ReLU) to them. Output the extracted features.

[0009] Step S4: Feature scale transformation and transfer fusion of the Yolov5 network. The feature map scale is transformed without destroying the original feature data. The low-level features are scaled down, and then convolutional dimensionality reduction, convolutional feature extraction, and convolutional dimensionality increase are performed. Finally, the result is added to the fusion layer as the input of the subsequent network.

[0010] Step S5: Improve loss calculation. The Distance-IoU loss method is used to improve the localization loss, which speeds up convergence and eliminates bounding box redundancy. At the same time, the target confidence losses on the three prediction branches are first weighted and then summed to obtain the total target confidence loss. The weighting is used to improve the target detection accuracy.
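The weighted summation of the target confidence loss over the three prediction branches in step S5 can be sketched as follows. This is only a minimal illustration: the function name, the per-branch loss values, and the branch weights are hypothetical, since the source does not state the weights it uses.

```python
def total_confidence_loss(branch_losses, branch_weights):
    """Weight the confidence loss of each prediction branch, then sum."""
    if len(branch_losses) != len(branch_weights):
        raise ValueError("one weight is required per prediction branch")
    return sum(w * l for w, l in zip(branch_weights, branch_losses))


# Hypothetical confidence losses on the three prediction branches.
losses = [0.5, 0.25, 0.125]
# Hypothetical branch weights; the source does not give the actual values.
weights = [4.0, 1.0, 0.5]
total = total_confidence_loss(losses, weights)
```

A larger weight makes the corresponding branch contribute more to the gradient, which is one way the weighting could steer accuracy toward a particular target scale.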

[0011] Furthermore, in step S1, the detection category labels include at least the categories of ships, yachts, sailboats, humans, birds, and bottles;

[0012] Furthermore, in step S3, a multi-angle attention mechanism is introduced. First, the image is flipped three times by 90° to obtain four sets of samples from different orientations. Then, two-dimensional convolution is applied to each of the four sets of samples for feature extraction. The extracted features are then concatenated and subjected to Batch Normalization and ReLU operations to form new features. The new features are then flipped by 180° to obtain two sets of samples. Two-dimensional convolution is applied to each of the two sets of samples for feature extraction. The two sets of features are then concatenated and subjected to Batch Normalization and ReLU operations before being output.
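The two-stage structure of the multi-angle attention mechanism can be sketched on a single-channel feature map as follows. Several things here are assumptions rather than details from the source: the 90° and 180° "flips" are interpreted as rotations, each branch is rotated back before fusion so that non-square maps stay aligned, `fuse` replaces the learned concatenation and convolution with a plain average and omits Batch Normalization, and `horizontal_diff` is a hypothetical stand-in for a learned 2D convolution.

```python
def rotate(m, quarter_turns):
    """Rotate a 2D list clockwise by quarter_turns * 90 degrees."""
    for _ in range(quarter_turns % 4):
        m = [list(row) for row in zip(*m[::-1])]
    return m


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def fuse(branches):
    """Average the branches element-wise (stand-in for concat + convolution)."""
    n = len(branches)
    h, w = len(branches[0]), len(branches[0][0])
    return [[sum(b[y][x] for b in branches) / n for x in range(w)]
            for y in range(h)]


def horizontal_diff(m):
    """Hypothetical direction-sensitive filter standing in for a learned conv."""
    return [[row[x] - row[max(x - 1, 0)] for x in range(len(row))] for row in m]


def multi_angle_block(image, conv):
    # Stage 1: views at 0, 90, 180 and 270 degrees give four sets of samples.
    stage1 = []
    for k in range(4):
        features = conv(rotate(image, k))
        # Assumption: rotate back so that all branches are spatially aligned.
        stage1.append(rotate(features, -k))
    new_features = relu(fuse(stage1))
    # Stage 2: the new features and their 180 degree view give two sets.
    stage2 = []
    for k in (0, 2):
        features = conv(rotate(new_features, k))
        stage2.append(rotate(features, -k))
    return relu(fuse(stage2))


image = [[1.0, 5.0, 2.0], [0.0, 3.0, 7.0]]
attended = multi_angle_block(image, horizontal_diff)
```

Because the stand-in filter responds only to changes along one axis, applying it to rotated views lets the same filter respond to structure in other directions, which is the intuition behind extracting features from several orientations.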

[0013] Further, in step S4, the feature scale transformation and transfer fusion Yolov5 network transforms the scale of the feature map without destroying the original feature data. The low-level features are scaled down by a sampling factor of 2. A 1×1 convolution kernel first reduces their dimensionality, a 3×3 convolution kernel then extracts features, and a 1×1 convolution kernel increases the dimensionality to match the fusion layer. Finally, the result is added to the fusion layer as the input of the subsequent network.
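One way to reduce the scale by a factor of 2 without discarding any feature values is a space-to-depth rearrangement, which moves each 2×2 spatial neighbourhood into separate channels. The sketch below assumes that interpretation; the source does not name the operation. It then shows the 1×1 dimensionality reduction and increase with hypothetical weights and channel counts. The 3×3 feature extraction between them and the final addition to the fusion layer are left out for brevity.

```python
def space_to_depth(fmap, factor=2):
    """Rearrange [C][H][W] into [C*factor*factor][H/factor][W/factor].

    Every input value is kept; spatial neighbours are moved into channels.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by the factor")
    out = []
    for ch in range(c):
        for dy in range(factor):
            for dx in range(factor):
                out.append([[fmap[ch][y][x] for x in range(dx, w, factor)]
                            for y in range(dy, h, factor)])
    return out


def pointwise_conv(fmap, weights):
    """1x1 convolution: weights[o][i] mixes input channel i into output o."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[sum(wt * fmap[i][y][x] for i, wt in enumerate(row))
              for x in range(w)] for y in range(h)]
            for row in weights]


# One 4 x 4 low-level channel holding the values 0..15.
low = [[[float(4 * y + x) for x in range(4)] for y in range(4)]]
packed = space_to_depth(low)  # 4 channels of 2 x 2, no value lost
# Hypothetical 1x1 weights: reduce 4 channels to 2, then expand to 3,
# where 3 stands in for the channel count of the fusion layer.
reduced = pointwise_conv(packed, [[0.25] * 4, [1.0, 0.0, 0.0, -1.0]])
expanded = pointwise_conv(reduced, [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

Unlike strided pooling, the rearrangement keeps all sixteen input values, so the later convolutions can still use the fine detail of the low-level features.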

[0014] Further, in step S5, the Distance-IoU loss function is used to calculate the loss, which is defined as:

[0015] $$L_{DIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}}$$

[0016] Where IoU is the intersection over union, $b$ and $b^{gt}$ represent the center points of the predicted box and the ground truth box, ρ(·) represents the Euclidean distance, and c is the diagonal length of the minimum bounding box that covers both boxes.

[0017] This invention provides a water surface target detection device, the device comprising:

[0018] The image preprocessing module is configured to acquire several images, perform data augmentation on each image by randomly scaling, cropping, arranging, mixing up, and mosaicking, and stitch the images and prior bounding boxes together.

[0019] The Gaussian filtering module performs a small smoothing operation on the shallow feature map before fusing shallow and deep features in the PAN structure. This retains useful high-frequency information while reducing the impact of Gaussian noise on the fused features.

[0020] The multi-angle attention mechanism module flips the image to obtain multiple sets of samples from different orientations. It then extracts features from each sample using 2D convolution. The extracted features are concatenated and then processed through 2D convolution, a Batch Normalization (BN) layer, and a ReLU function to form new features. These new features are then flipped and concatenated with the original features, and the same operation is performed to complete feature extraction. This multi-angle attention mechanism module aims to reduce false detections and missed detections when dealing with complex backgrounds.

[0021] The feature scale transformation module transfers and fuses the Yolov5 network, transforming the scale of the feature map without destroying the original feature data. It performs feature scale reduction on low-level features, then performs convolutional dimensionality reduction, extracts features through convolution, performs convolutional dimensionality increase, and finally adds it to the fusion layer as the input of subsequent networks.

[0022] The loss calculation module is configured to use the DIoU loss function to calculate the localization loss, and combine it with other losses to mitigate the overfitting and low accuracy caused by an uneven distribution of sample classes, improve the regression accuracy of the detection boxes, and obtain the final target detection network.

[0023] The beneficial effects of this invention are:

[0024] 1. Gaussian filtering is introduced to smooth the shallow feature map before fusing the shallow and deep features of the PAN structure. This retains useful high-frequency information and reduces the impact of Gaussian noise on the fused features.

2. A multi-angle attention mechanism is introduced, which performs multiple image flips followed by convolution and ReLU operations to extract features from multiple angles. This makes the extracted features richer and mitigates the resolution reduction and local information loss caused by ordinary methods. This is beneficial for subsequent detection and classification and improves recognition accuracy.

[0025] 3. By adopting the Distance-IoU loss method, the bounding box is still given a direction of movement even when it does not overlap the target box, which accelerates convergence. Therefore, the improved Yolov5-based water surface target detection method proposed in this invention improves the detection accuracy of water surface targets.

Attached Figure Description

[0026] Figure 1 is a flowchart of a water surface target detection method based on an improved YOLOv5 according to an embodiment of the present invention;

[0027] Figure 2 is a diagram of the network model of the multi-angle attention mechanism;

[0028] Figure 3 is an example diagram of a dataset according to an embodiment of the present invention;

[0029] Figure 4 is a structural diagram of a water surface target detection device based on an improved Yolov5 according to an embodiment of the present invention.

Detailed Implementation

[0030] The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that the following specific embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. It should be noted that the terms "front," "rear," "left," "right," "up," and "down" used in the following description refer to directions in the accompanying drawings, and the terms "inner" and "outer" refer to directions toward or away from the geometric center of a specific component, respectively.

[0031] This invention provides an improved Yolov5-based method for detecting water surface targets, comprising the following steps:

[0032] Step 1: Obtain several images of the water surface environment, and perform data augmentation using random scaling, random cropping, random arrangement, MixUp, and mosaic to create a water surface target dataset. The detection category labels should at least include ships, yachts, sailboats, humans, birds, and bottles.

[0033] Step 2: Introduce Gaussian filtering. Before fusing the shallow and deep features of the PAN structure, use a Gaussian filter with a small kernel size and variance to smooth the shallow feature map. This can preserve useful high-frequency information and reduce the impact of Gaussian noise on the fused features.
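The smoothing in Step 2 can be illustrated on a single channel of a shallow feature map. This is a minimal sketch: the kernel size of 3, the standard deviation of 0.5, and the border handling (replicating edge values) are assumed values chosen only to represent "a small kernel size and variance", and the function names are hypothetical.

```python
import math


def gaussian_kernel(size=3, sigma=0.5):
    """Return a normalized size x size Gaussian kernel (size must be odd)."""
    if size % 2 == 0:
        raise ValueError("kernel size must be odd")
    half = size // 2
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            for dx in range(-half, half + 1)]
           for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def smooth(feature_map, kernel):
    """Filter one 2D channel with the kernel, replicating border values."""
    h, w = len(feature_map), len(feature_map[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky, row in enumerate(kernel):
                for kx, weight in enumerate(row):
                    # Clamp indices so border pixels reuse the nearest value.
                    sy = min(max(y + ky - half, 0), h - 1)
                    sx = min(max(x + kx - half, 0), w - 1)
                    acc += weight * feature_map[sy][sx]
            out[y][x] = acc
    return out


# A single noisy spike in an otherwise flat 5 x 5 shallow feature channel.
channel = [[1.0] * 5 for _ in range(5)]
channel[2][2] = 10.0
kernel = gaussian_kernel(size=3, sigma=0.5)
smoothed = smooth(channel, kernel)
```

With a small standard deviation most of the kernel weight stays on the center element, so the spike is damped while the surrounding values change only slightly, which matches the stated goal of reducing noise while keeping useful high-frequency detail.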

[0034] Step 3: Introduce a multi-angle attention mechanism. First, flip the image three times by 90° to obtain four sets of samples from different orientations. Then, perform 2D convolution on each of the four sets of samples to extract features. Concatenate the four extracted features, and then perform Batch Normalization and ReLU operations to form new features. Flip the new features by 180° to obtain two sets of samples. Perform 2D convolution on each of the two sets of samples to extract features. Concatenate the two sets of features, and then perform Batch Normalization and ReLU operations before outputting the result.

[0035] Step 4: Feature Scale Transformation and Transfer Fusion of the Yolov5 Network. The feature map scale is transformed without destroying the original feature data. The low-level features are downsampled by a sampling factor of 2, and then passed through a 1×1 convolution kernel for dimensionality reduction. Then, features are extracted by a 3×3 convolution kernel. Finally, they are added to the fusion layer as the input of the subsequent network.

[0036] Step 5: Calculate the loss using the Distance-IoU loss function, defined as:

[0037] $$L_{DIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}}$$

[0038] Where IoU is the intersection over union, $b$ and $b^{gt}$ represent the center points of the boxes $B$ and $B^{gt}$ (the predicted box and the ground truth box), ρ(·) represents the Euclidean distance, and c is the diagonal length of the minimum bounding box covering the two boxes.
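The Distance-IoU loss can be computed directly from its definition (one minus IoU, plus the squared center distance divided by the squared diagonal of the enclosing box). The sketch below assumes boxes are given as corner coordinates `(x1, y1, x2, y2)`; that representation and the function name are illustrative choices, not taken from the source.

```python
def diou_loss(pred, target):
    """Distance-IoU loss for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared Euclidean distance between the two box centers.
    rho2 = (((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2
            + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2)

    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    return 1.0 - iou + (rho2 / c2 if c2 > 0 else 0.0)


box = (0.0, 0.0, 1.0, 1.0)
near = diou_loss(box, (2.0, 0.0, 3.0, 1.0))  # no overlap, centers 2 apart
far = diou_loss(box, (4.0, 0.0, 5.0, 1.0))   # no overlap, centers 4 apart
```

Both example pairs have an IoU of zero, yet the farther pair receives the larger loss. A plain IoU loss would score them identically, so the distance term is what still gives the predicted box a direction to move in when the boxes do not overlap.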

[0039] This invention also provides a water surface target detection device, the device comprising:

[0040] The image preprocessing module 601 is configured to acquire several images, perform data augmentation on each image by randomly scaling, randomly cropping, randomly arranging, mixing up and mosaicking, and stitch the images and prior boxes together.

[0041] The Gaussian filter module 602 is configured to perform a small smoothing operation on the shallow feature map using a Gaussian filter with a small kernel size and standard deviation before fusing shallow and deep features in the PAN structure. This can preserve useful high-frequency information and reduce the impact of Gaussian noise on the fused features.

[0042] The multi-angle attention mechanism module 603 is configured to flip the image to obtain multiple sets of samples from different orientations, use two-dimensional convolution to extract features from the samples, concatenate the extracted features, and pass them through two-dimensional convolution, the BN layer, and the ReLU function to form new features, which are then flipped again to obtain new samples. This reduces false detections and missed detections when dealing with complex backgrounds.

[0043] The feature scale transformation module 604 is configured as a transfer fusion Yolov5 network. It transforms the scale of the feature map without destroying the original feature data. It performs feature scale reduction operation on low-level features, performs convolution dimensionality reduction operation, extracts features through convolution operation, performs convolution dimensionality increase operation, and finally adds it to the fusion layer as the input of the subsequent network to continue extracting features.

[0044] The loss calculation module 605 is configured to use a classification loss function to mitigate the overfitting and low accuracy caused by uneven sample classification, improve the regression accuracy of the detection boxes, and obtain the final target detection network. The localization loss is:

[0045] $$L_{DIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}}$$

[0046] Where IoU is the intersection over union, $b$ and $b^{gt}$ represent the center points of the predicted bounding box and the ground truth bounding box, ρ(·) represents the Euclidean distance, and c is the diagonal length of the minimum bounding box covering both boxes.

The technical means disclosed in this invention are not limited to those disclosed in the above embodiments, but also include technical solutions composed of any combination of the above technical features.

Claims

1. An improved YOLOv5-based water surface target detection method, characterized in that it includes the following steps:

Step S1: Use the camera on the unmanned vessel to collect several images of the water surface environment, and use random scaling, random cropping, random arrangement, mix-up and mosaic methods to perform data augmentation and create a water surface target dataset;

Step S2: Introduce Gaussian filtering. Before fusing the shallow and deep features of the PAN structure, use a Gaussian filter with a small kernel size and variance to smooth the shallow feature map. This can preserve useful high-frequency information and reduce the impact of Gaussian noise on the fused features.

Step S3: Introduce a multi-angle attention mechanism; in step S3, the constructed multi-angle attention mechanism includes the following steps:

Step 3.1: Flip the image three times by 90° to obtain four sets of samples from different orientations. Then, perform two-dimensional convolution on each of the four sets of samples to extract features. Concatenate the four extracted features and then perform Batch Normalization and ReLU operations after two-dimensional convolution to form new features.

Step 3.2: Flip the newly formed features by 180° to obtain two sets of samples. Extract features from the two sets of samples using two-dimensional convolution. Concatenate the two sets of features, perform two-dimensional convolution, and then output the results after Batch Normalization and ReLU operations.

Step S4: Feature scale transformation and transfer fusion of the Yolov5 network. This step transforms the scale of the feature maps while preserving the original feature data. Low-level features are scaled down, then subjected to convolutional dimensionality reduction, convolutional feature extraction, and convolutional dimensionality increase. Finally, the result is added to the fusion layer as input to subsequent networks. Step S4 specifically includes the following processes: after downsampling with a sampling factor of 2, the features first undergo convolutional dimensionality reduction using a 1×1 convolutional kernel, then feature extraction using a 3×3 convolutional kernel, then convolutional dimensionality increase using a 1×1 convolutional kernel to match the fusion layer, and the result is finally added to the fusion layer as the input to the subsequent network.

Step S5: Improve loss calculation. The Distance-IoU loss method is used to improve the localization loss, which speeds up convergence and eliminates bounding box redundancy. At the same time, the target confidence losses on the three prediction branches are first weighted and then summed to obtain the total target confidence loss. The weighting is used to improve the target detection accuracy.

2. The improved YOLOv5-based water surface target detection method according to claim 1, characterized in that: In step S1, the detection category labels of the method include at least the categories of ships, yachts, sailboats, humans, birds, and bottles.

3. The improved YOLOv5-based water surface target detection method according to claim 1, characterized in that: in step S5, the loss is calculated using the Distance-IoU loss function, defined as: $$L_{DIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}}$$ where IoU is the intersection over union, $b$ and $b^{gt}$ represent the center points of the predicted box and the ground truth box, respectively, ρ(·) represents the Euclidean distance, and c is the diagonal length of the minimum bounding box that covers both boxes.