A method for improving target positioning speed based on YOLOv5 and OpenCV

By combining OpenCV and YOLOv5 algorithms, and using OpenCV for initial bounding and a small classification network for classification, the problems of misjudging adjacent targets and slow speed of deep learning in traditional methods are solved, thereby improving the speed of target localization and accurately distinguishing multiple adjacent targets.

CN116051827B · Active · Publication Date: 2026-06-19 · ANHUI JICUI INTELLIGENT ROBOT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ANHUI JICUI INTELLIGENT ROBOT TECH CO LTD
Filing Date
2022-11-22
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

In existing technologies, traditional image processing methods are fast in target localization but are prone to misjudging overlapping targets, while deep learning methods can solve the problem of multiple overlapping targets but are slow, resulting in insufficient production capacity in industrial projects.

Method used

Combining OpenCV and YOLOv5 algorithms, the target is initially bounded using OpenCV, a small classification network is designed for classification, and the use of YOLOv5 for relocalization is determined based on the number of adhering objects. Median filtering and Otsu thresholding are used for background segmentation and target extraction, and the Mish activation function and multi-channel convolutional layers are used for training.

Benefits of technology

It improves target localization speed and can effectively distinguish multiple targets stuck together, with a speed improvement of 6.2% compared to using YOLOv5 alone.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention discloses a method for improving target localization speed based on YOLOv5 and OpenCV. The method uses the OpenCV algorithm to extract targets from zircon and film, designs a classification network to classify the targets extracted by OpenCV, applies no further processing to single targets, and feeds adhered multi-target regions into YOLOv5 for relocalization. Compared with using YOLOv5 alone, the speed is improved by 6.2%, and the method can distinguish multiple targets that are stuck together.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method for improving target localization speed based on YOLOv5 and OpenCV.

Background Technology

[0002] Currently, target localization in an image can be broadly categorized into two types: traditional image processing methods and deep learning-based methods. Traditional image processing methods offer advantages such as speed and accuracy, but they tend to treat overlapping targets as a single entity. Deep learning-based localization can address the issue of multiple overlapping targets, but it is slow and inefficient for industrial projects, hindering production capacity. Therefore, a combination of traditional and deep learning approaches is considered for target localization. This approach retains the speed advantage of traditional methods while addressing the problem of mislocalizing overlapping targets.

Summary of the Invention

[0003] The purpose of this section is to outline some aspects of embodiments of the present invention and to briefly describe some preferred embodiments. Simplifications or omissions may be made in this section, as well as in the abstract and title of this application, to avoid obscuring the purpose of these documents; however, such simplifications or omissions should not be construed as limiting the scope of the invention.

[0004] In view of the problems described above and / or the problems in existing target localization methods, this invention is proposed.

[0005] Therefore, the problem to be solved by this invention is how to provide a method for improving target localization speed based on YOLOv5 and OpenCV.

[0006] To solve the above-mentioned technical problems, the present invention provides the following technical solution:

[0007] The OpenCV algorithm is used to extract targets from the film and initially define the targets on the film.

[0008] Design a classification network to classify the targets extracted by the OpenCV algorithm;

[0009] The adhesion quantity is analyzed based on the classified film targets. No processing is performed on the film targets with an adhesion quantity of 0, and the remaining adhered film targets are sent to YOLOv5 for relocalization.

[0010] The relocalized film target is bounded using the OpenCV algorithm.

[0011] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the OpenCV algorithm for target extraction involves using median filtering to remove noise from the image, and then using the Otsu thresholding method to segment the background and extract the target.

[0012] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the target classification design of the classification network is a separate small network designed to classify the acquired targets. This network includes 2 convolutional layers, 2 pooling layers, 2 fully connected layers, and 2 activation function layers.

[0013] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the classification network aims to obtain a model by training a small classification network to extract the target, and to classify the target segmented by OpenCV using the model.

[0014] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the relocalization includes:

[0015] The system classifies and processes the objects based on their adhesion and quantity. When the initial adhesion quantity is 1, the initial object is placed into the YOLOv5 network for model training. When the initial adhesion quantity exceeds 1, the initial object is segmented again using the Otsu thresholding method to extract the segmented objects. The adhesion quantity analysis is performed again on the segmented objects, and the relocalization operation is repeated. The single film after YOLOv5 model training is bounded by OpenCV.

[0016] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the activation function layer uses the Mish activation function.

[0017] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the OpenCV algorithm for target extraction includes:

[0018] Extract the gradient value of the image grayscale;

[0019] Take the gradient values of the image in the horizontal and vertical directions respectively;

[0020] Construct a three-dimensional normalized unit coordinate system for the gradients of the x-axis and y-axis;

[0021] Determine the top-down (elevation) angle of the light source, in radians;

[0022] Determine the orientation (azimuth) angle of the light source, in radians;

[0023] The image is reconstructed after the light source is normalized.

[0024] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the activation function layer outputs 0 when the input is less than 0, and outputs a value equal to the input when the input is greater than zero.

[0025] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the convolutional layers used in the classification network design are multi-channel input multi-convolutional layers.

[0026] As a preferred embodiment of the method for improving target localization speed based on YOLOv5 and OpenCV described in this invention, the two fully connected layers of the classification network are used to train a black and white image with a resolution of 784.
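As a rough illustration of how a network with 2 convolutional, 2 pooling, and 2 fully connected layers could process such an image, the sketch below traces feature-map sizes layer by layer. The patent does not give kernel sizes, channel counts, or the image side length, so every number here is an assumption: a 28×28 single-channel input (28 × 28 = 784 pixels), 5×5 convolutions without padding, 2×2 pooling, and hypothetical channel counts of 6 and 16.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output side length of non-overlapping pooling."""
    return size // window


def trace_shapes(side=28, channels=(1, 6, 16), kernel=5):
    """List (name, channels, height, width) after each conv and pool stage.

    All defaults are assumed values, not taken from the patent.
    """
    shapes = [("input", channels[0], side, side)]
    for i, c_out in enumerate(channels[1:], start=1):
        side = conv_out(side, kernel)
        shapes.append((f"conv{i}+activation", c_out, side, side))
        side = pool_out(side)
        shapes.append((f"pool{i}", c_out, side, side))
    return shapes


shapes = trace_shapes()
_, c, h, w = shapes[-1]
flat_features = c * h * w  # input width of the first fully connected layer
for entry in shapes:
    print(entry)
print("flattened features:", flat_features)
```

Under these assumptions the first fully connected layer would receive 256 features and the second would map to the two adhesion classes.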

[0027] The beneficial effects of this invention are that it provides a method for improving target localization speed based on YOLOv5 and OpenCV. Compared with using YOLOv5 alone, the speed is improved by 6.2%, and it can distinguish the situation of multiple targets sticking together.

Attached Figure Description

[0028] To more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort. Wherein:

[0029] Figure 1 is a flowchart of the method for improving target localization speed based on YOLOv5 and OpenCV in Example 1.

[0030] Figure 2 is the original image used for target extraction in Example 1.

[0031] Figure 3 is the background segmentation image after OpenCV target extraction in Example 1.

[0032] Figure 4 is a plot of the Mish activation function used in Example 1.

[0033] Figure 5 is a classification diagram for Example 1. The left image shows the image type when the film adhesion quantity is 0; the right image shows the image type when the adhesion quantity is 1.

[0034] Figure 6 shows the final target localization based on YOLOv5 and OpenCV in Example 1.

[0035] Figure 7 is the localization time graph for YOLOv5 alone in Example 2.

[0036] Figure 8 is the localization time graph for the combined YOLOv5 and OpenCV method in Example 2.

Detailed Implementation

[0037] To make the above-mentioned objects, features and advantages of the present invention more apparent and understandable, the specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

[0038] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the invention. Therefore, the invention is not limited to the specific embodiments disclosed below.

[0039] Secondly, the term "one embodiment" or "embodiment" as used herein refers to a specific feature, structure, or characteristic that may be included in at least one implementation of the present invention. The phrase "in one embodiment" appearing in different places in this specification does not necessarily refer to the same embodiment, nor is it a single or selective embodiment that is mutually exclusive with other embodiments.

[0040] This invention uses YOLOv5 and OpenCV to improve the speed of target localization. Those skilled in the art will know that YOLO is an object detection model. YOLOv5 differs from earlier models such as YOLOv3 and YOLOv4: apart from the full-size model, YOLOv3 and YOLOv4 offered only a lightweight Tiny variant, which notably had only two output layers. YOLOv5, by contrast, provides four network models: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. These models control the network's depth and width through `depth_multiple` and `width_multiple`, similar to the concept used in EfficientNet.

[0041] Among them, YOLOv5s has the smallest depth and the smallest feature map width in this series. Other networks are built upon this foundation, continuously increasing in depth and width.
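The scaling idea can be sketched as follows. The multiplier values and the rounding rules below follow the publicly released YOLOv5 model configurations as best understood here; they are not stated in the patent, and the helper names `scale_depth` and `scale_width` are illustrative.

```python
import math

# (depth_multiple, width_multiple) per variant; values assumed from the
# public YOLOv5 configuration files, not from the patent text.
VARIANTS = {
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}


def scale_depth(repeats, depth_multiple):
    """Scale how many times a block is repeated, never dropping below 1."""
    return max(round(repeats * depth_multiple), 1) if repeats > 1 else repeats


def scale_width(channels, width_multiple, divisor=8):
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(channels * width_multiple / divisor) * divisor


for name, (depth, width) in VARIANTS.items():
    # A base block repeated 9 times with 1024 output channels.
    print(name, scale_depth(9, depth), scale_width(1024, width))
```

With these values, YOLOv5s keeps about a third of the block repeats and half of the channels of YOLOv5l, which is why it is the fastest member of the series.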

[0042] OpenCV is a cross-platform computer vision and machine learning software library released under the Apache 2.0 license (open source). It can run on Linux, Windows, Android, and Mac OS operating systems. It is lightweight and efficient, consisting of a series of C functions and a small number of C++ classes, while also providing interfaces for languages such as Python, Ruby, and MATLAB, and implementing many common algorithms in image processing and computer vision.

[0043] Example 1

[0044] Referring to Figures 1 to 7, this is the first embodiment of the present invention, which provides a method for improving target localization speed based on YOLOv5 and OpenCV. Specifically:

[0045] (1) Extract the target using the OpenCV algorithm:

[0046] First, median filtering is used to remove noise from the image. Then, Otsu's thresholding method is used to segment the background and obtain the target, as shown in Figure 1.
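To make this step concrete, the sketch below implements a 3×3 median filter, Otsu's threshold, and a bounding box in plain Python on a tiny grayscale grid. In an OpenCV pipeline these steps would typically map to `cv2.medianBlur` and `cv2.threshold` with the `cv2.THRESH_OTSU` flag. The function names, the 3×3 window, and the 7×7 sample image are illustrative assumptions, not taken from the patent.

```python
from statistics import median


def median_filter_3x3(img):
    """3x3 median filter; pixels outside the image repeat the nearest edge."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = int(median(window))
    return out


def otsu_threshold(img):
    """Return the threshold t (0..255) that maximises between-class variance.

    Pixels <= t form the background class, pixels > t the foreground class.
    """
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the True pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical sample: dark background, one bright 4x4 target, one noise pixel.
image = [[10] * 7 for _ in range(7)]
for y in range(2, 6):
    for x in range(2, 6):
        image[y][x] = 200
image[0][0] = 255  # isolated salt noise

denoised = median_filter_3x3(image)
t = otsu_threshold(denoised)
mask = [[v > t for v in row] for row in denoised]
box = bounding_box(mask)
```

The median filter removes the isolated bright pixel before thresholding, so the bounding box covers only the target rather than stretching to the noise.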

[0047] The OpenCV algorithm is used to extract targets from the film and initially define the targets on the film.

[0048] Design a classification network to classify the targets extracted by the OpenCV algorithm;

[0049] The adhesion quantity is analyzed based on the classified film targets. No processing is performed on the film targets with an adhesion quantity of 0, and the remaining adhered film targets are sent to YOLOv5 for relocalization.

[0050] The relocalized film target is bounded using the OpenCV algorithm.

[0051] The OpenCV algorithm described above uses median filtering to remove noise from the image, and then uses the Otsu thresholding method to segment the background and extract the target.

[0052] Algorithms for target acquisition include:

[0053] The OpenCV algorithm used for target extraction includes:

[0054] Extract the gradient value of the image grayscale;

[0055] Take the gradient values of the image in the horizontal and vertical directions respectively;

[0056] Construct a three-dimensional normalized unit coordinate system for the gradients of the x-axis and y-axis;

[0057] Determine the top-down (elevation) angle of the light source, in radians;

[0058] Determine the orientation (azimuth) angle of the light source, in radians;

[0059] The image is reconstructed after the light source is normalized.
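The patent lists these steps without formulas. One plausible reading, sketched below, treats the grayscale image as a height map: the horizontal and vertical gradients give a unit surface normal, the light direction is built from an elevation and an azimuth angle in radians, and the image is reconstructed as the dot product of the two. This interpretation, the function name `relight`, and the default angles are all assumptions rather than the patented procedure.

```python
import math


def relight(gray, elevation_deg=45.0, azimuth_deg=315.0):
    """Re-render a grayscale grid under a fixed, normalised light source.

    Assumed interpretation of the listed steps; angles are hypothetical defaults.
    """
    h, w = len(gray), len(gray[0])
    elevation = math.radians(elevation_deg)
    azimuth = math.radians(azimuth_deg)
    # Unit vector pointing towards the light source.
    lx = math.cos(elevation) * math.cos(azimuth)
    ly = math.cos(elevation) * math.sin(azimuth)
    lz = math.sin(elevation)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx = (gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]) / 2.0
            gy = (gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]) / 2.0
            # Unit normal of the surface z = gray(x, y).
            norm = math.sqrt(gx * gx + gy * gy + 1.0)
            nx, ny, nz = -gx / norm, -gy / norm, 1.0 / norm
            out[y][x] = max(0.0, nx * lx + ny * ly + nz * lz)
    return out


flat_dark = relight([[20] * 4 for _ in range(4)])
flat_bright = relight([[220] * 4 for _ in range(4)])
```

Under this reading, two uniformly lit regions of different brightness map to the same output value, so the result depends on local structure rather than on absolute illumination.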

[0060] (2) Design a small classification network:

[0061] The target classification design of the classification network is to design a small network to classify the acquired target. This network contains 2 convolutional layers, 2 pooling layers, 2 fully connected layers, and 2 activation function layers.

[0062] A small network is designed to classify the targets obtained in the previous step. It contains 2 convolutional layers, 2 pooling layers, 2 fully connected layers, and 2 activation function layers. The activation function is Mish, which has performed better than other functions such as ReLU and Swish in many experiments. Mish is unbounded above and bounded below, smooth, and non-monotonic. Figure 4 shows the Mish activation function.
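For reference, Mish is commonly defined as mish(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The minimal sketch below uses that standard definition (the patent itself does not write out the formula) and checks the properties named above on a few sample points.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


# Sample the curve: it dips slightly below zero for negative inputs,
# then approaches the identity for large positive inputs.
samples = {x: mish(x) for x in (-5.0, -1.2, 0.0, 1.0, 10.0)}
```

The dip near x ≈ -1.2 is what makes the function non-monotonic: the output there is lower than at both x = -5 and x = 0, while large positive inputs pass through almost unchanged.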

[0063] (3) The classification network classifies the targets:

[0064] First, a large number of targets are acquired. The small classification network described above is then trained on these targets to obtain a model, and the targets segmented by OpenCV are classified using this model. The results are shown in Figure 5. The activation function layer outputs 0 when the input is less than 0, and outputs a value equal to the input when the input is greater than zero.

[0065] Figure 2 is the original image used for target extraction. Figure 3 is the background segmentation image after target extraction using the OpenCV algorithm. Figure 5 is the classification diagram: the left image shows the image type when the film adhesion quantity is 0, and the right image shows the image type when the adhesion quantity is 1.

[0066] The relocalization includes:

[0067] The system classifies and processes the objects based on their adhesion and quantity. When the initial adhesion quantity is 1, the initial object is placed into the YOLOv5 network for model training. When the initial adhesion quantity exceeds 1, the initial object is segmented again using the Otsu thresholding method to extract the segmented objects. The adhesion quantity analysis is performed again on the segmented objects, and the relocalization operation is repeated. The single film after YOLOv5 model training is bounded by OpenCV.

[0068] The algorithms for classification networks include:

[0069] (4) Incorporate categories with multiple objectives into the YOLOv5 network:

[0070] The premise here is that category 1 images are labeled and then used to train the YOLOv5 model. After passing through the small classification network, the bounding boxes generated by OpenCV are retained for category 0, while category 1 images are fed into the YOLOv5 network to regenerate the target boxes. The final localization is shown in Figure 6.
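The routing described in this step can be sketched as a small dispatch function. The `classify` and `detect` callables stand in for the small classification network and the trained YOLOv5 model; both, along with the box format and the convention that the detector returns boxes relative to the crop, are hypothetical placeholders and not interfaces defined by the patent.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed format


def localize(
    candidates: Sequence[Tuple[Box, object]],
    classify: Callable[[object], int],
    detect: Callable[[object], List[Box]],
) -> List[Box]:
    """Keep OpenCV boxes for category 0; re-detect category 1 crops."""
    final: List[Box] = []
    for box, crop in candidates:
        if classify(crop) == 0:
            # Single target: the initial OpenCV box is already good enough.
            final.append(box)
        else:
            # Adhered targets: run the slower detector on this crop only and
            # shift its crop-relative boxes back into image coordinates.
            x0, y0 = box[0], box[1]
            for dx0, dy0, dx1, dy1 in detect(crop):
                final.append((x0 + dx0, y0 + dy0, x0 + dx1, y0 + dy1))
    return final


# Stub models for illustration only.
candidates = [((0, 0, 10, 10), "single"), ((20, 30, 60, 70), "stuck")]
result = localize(
    candidates,
    classify=lambda crop: 0 if crop == "single" else 1,
    detect=lambda crop: [(0, 0, 20, 40), (20, 0, 40, 40)],
)
```

The speed gain comes from this branch: the detector runs only on the crops that the classifier flags as adhered, while every other target keeps its cheap OpenCV box.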

[0071] Figure 6 shows the final target localization based on YOLOv5 and OpenCV.

[0072] Example 2

[0073] Referring to Figures 7 to 8, this is the second embodiment of the present invention. It differs from the first embodiment in that it further includes the following timing comparison for the method for improving target localization speed based on YOLOv5 and OpenCV described in the previous embodiment:

[0074] The test environment is Ubuntu 20.04 with an i7-11700 CPU and a 2080Ti GPU. Figure 7 is the localization time graph for YOLOv5 alone, and Figure 8 is the localization time graph for the combined YOLOv5 and OpenCV method. Figures 7 and 8 show that YOLOv5 alone takes 97 ms to localize 13 images at a network input size of 640×640, while the OpenCV+YOLOv5 method takes 91 ms to localize 13 images, an overall speed improvement of 6.2%.
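The 6.2% figure follows directly from the two reported timings, as the short calculation below shows. Computing the improvement as the relative reduction in total time is an assumption about how the patent derived the number, although it reproduces the stated value.

```python
def speedup_percent(baseline_ms, combined_ms):
    """Relative reduction in localization time, in percent."""
    return (baseline_ms - combined_ms) / baseline_ms * 100.0


yolo_only_ms = 97.0      # YOLOv5 alone, 13 images
opencv_yolo_ms = 91.0    # OpenCV + YOLOv5, 13 images
num_images = 13

improvement = speedup_percent(yolo_only_ms, opencv_yolo_ms)
per_image_yolo = yolo_only_ms / num_images
per_image_combined = opencv_yolo_ms / num_images

print(f"improvement: {improvement:.1f}%")
print(f"per image: {per_image_yolo:.2f} ms vs {per_image_combined:.2f} ms")
```

On a per-image basis this is roughly 7.46 ms against 7.00 ms.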

[0075] It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all such modifications or substitutions should be covered within the scope of the claims of the present invention.

Claims

1. A method for improving target positioning speed based on YOLOv5 and OpenCV, characterized in that it comprises: using the OpenCV algorithm to extract targets from the film and initially bound the targets on the film; designing a classification network to classify the targets extracted by the OpenCV algorithm; analyzing the adhesion quantity based on the classified film targets, performing no processing on the film targets with an adhesion quantity of 0, and sending the remaining adhered film targets to YOLOv5 for relocalization; and bounding the relocalized film target using the OpenCV algorithm.

2. The method for improving target positioning speed based on YOLOv5 and OpenCV according to claim 1, wherein: the OpenCV algorithm uses median filtering to remove noise from the image, and then uses the Otsu thresholding method to segment the background and extract the target.

3. The method for improving target positioning speed based on YOLOv5 and OpenCV of claim 2, wherein: the classification network is a small network designed to classify the acquired targets, and contains 2 convolutional layers, 2 pooling layers, 2 fully connected layers, and 2 activation function layers.

4. The method for improving target positioning speed based on YOLOv5 and OpenCV of claim 3, wherein: the small classification network is trained on the extracted targets to obtain a model, and the model is then used to classify the targets segmented by OpenCV.

5. The method for improving target positioning speed based on YOLOv5 and OpenCV of claim 4, wherein the relocalization includes: classifying and processing according to the input adhered target and its adhesion quantity; when the adhesion quantity of the initial target is 1, putting the initial target into the YOLOv5 network for model training; when the adhesion quantity of the initial target exceeds 1, segmenting the initial target again using the Otsu thresholding method to extract the segmented targets, analyzing the adhesion quantity of the segmented targets again, and repeating the relocalization operation; and bounding the single film after YOLOv5 model training using OpenCV.

6. The method for improving target positioning speed based on YOLOv5 and OpenCV of claim 5, wherein: the activation function layer uses the Mish activation function.

7. The method for improving target positioning speed based on YOLOv5 and OpenCV of claim 6, wherein the OpenCV algorithm for target extraction includes: extracting the gradient value of the image grayscale; taking the gradient values of the image in the horizontal and vertical directions respectively; constructing a three-dimensional normalized unit coordinate system for the gradients of the x-axis and y-axis; determining the top-down (elevation) angle of the light source, in radians; determining the orientation (azimuth) angle of the light source, in radians; and reconstructing the image after the light source is normalized.

8. The method for improving target positioning speed based on YOLOv5 and OpenCV of claim 7, wherein: the activation function layer outputs 0 when the input is less than 0, and outputs a value equal to the input when the input is greater than zero.

9. The method for improving target positioning speed based on YOLOv5 and OpenCV of claim 8, wherein: the convolutional layers used in the classification network design are multi-channel input multi-convolutional layers.

10. The method for improving target positioning speed based on YOLOv5 and OpenCV of claim 9, wherein: the classification network includes a two-layer fully connected network, trained on black and white images with a resolution of 784.

Citation Information

Patent Citations

  • Non-inductive dinner plate image data automatic annotation method based on adversarial learning

    CN110765844A

  • Intelligent lossless apple sorting method

    CN113569922A