Object detection method, and method and apparatus for determining training samples for object detection

Positive samples are determined based on the aspect ratio of the target boxes, by adjusting the intersection-union ratio (IOU) threshold between candidate boxes and target boxes in the object detection algorithm. This addresses the low detection accuracy for objects with large aspect ratios, yielding higher detection accuracy and a lower false negative rate.

CN116630688B · Active · Publication Date: 2026-06-30 · ZHEJIANG SMART VIDEO SECURITY INNOVATION CENT CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
ZHEJIANG SMART VIDEO SECURITY INNOVATION CENT CO LTD
Filing Date
2023-04-25
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing object detection algorithms have low accuracy and high false negative rate for objects with large aspect ratios, especially long and narrow objects, and cannot effectively learn their features and location information.

Method used

Positive samples are determined based on the aspect ratio of the target boxes in the training images. When the aspect ratio is large, a first preset threshold inversely proportional to the aspect ratio is used as the intersection-union ratio (IOU) threshold between the candidate boxes and the target boxes, so that positive samples are still selected for such targets. The target detection model is then trained with a preset loss function.

Benefits of technology

It improves the accuracy of detecting objects with large aspect ratios, reduces the false negative rate, and enhances the model's ability to learn the features and positions of objects with large aspect ratios.

✦ Generated by Eureka AI based on patent content.


Abstract

This invention relates to an object detection method, and a method and apparatus for determining training samples for object detection. When dividing positive and negative samples, the method determines positive samples of target boxes based on the aspect ratio of the target boxes in the training image. When the aspect ratio of a target box is large, a first preset threshold inversely proportional to the aspect ratio is set, so that the first preset threshold is smaller for such a target box. Using the intersection-union ratio (IOU) of candidate boxes and target boxes together with the first preset threshold, it is then still possible to effectively determine whether a candidate box is a positive sample, avoiding missed detections. This lays the foundation for training an object detection model based on positive and negative samples of target boxes in the training image. The trained object detection model can better learn the position and feature information of the object to be detected, reducing the missed detection rate of objects with large aspect ratios and thereby improving the accuracy of object detection.

Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to an object detection method, and a method and apparatus for determining training samples for object detection.

Background Technology

[0002] Object detection is an important research area in computer vision and digital image processing, with wide applications in fields such as robot navigation, intelligent video surveillance, industrial inspection, and aerospace. The goal of object detection is to find objects of interest in an image, encompassing two sub-tasks: object localization and object classification—that is, simultaneously determining the object's category and location.

[0003] Currently, object detection using neural networks trained on large amounts of image data is the mainstream approach in the industry. Neural network-based algorithms can be broadly divided into two types: two-stage algorithms, represented by Faster R-CNN, proposed in the paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," and one-stage algorithms, such as You Only Look Once (YOLO) and Single Shot MultiBox Detector (SSD), which predict the location and category of multiple boxes at once. In both one-stage and two-stage algorithms, the basic idea is to randomly define candidate boxes of different aspect ratios and sizes in various regions of the training image during the model training phase. Based on the positional relevance of these candidate boxes to the real target, they are divided into three categories: positive samples, negative samples, and ignored samples. A candidate box that is relatively close to a real target's location is designated as a positive sample. A candidate box that is far from all targets is designated as a negative sample. A candidate box that is neither close enough to be positive nor far enough to be negative is designated as an ignored sample. Generally, the intersection-over-union (IOU) ratio between the candidate box and the target box is used as the measure of positional relevance.

[0004] Currently, both one-stage and two-stage algorithms show a significant performance degradation when detecting long, narrow objects with large aspect ratios, such as skis and pencils, compared to detecting objects with small aspect ratios, such as soccer balls and cars. In some cases, the network cannot detect objects with large aspect ratios at all, resulting in a high false negative rate. In other cases, although the network can detect objects with large aspect ratios, the predicted target location is not accurate enough.

Summary of the Invention

[0005] This invention provides a target detection method, a method and apparatus for determining training samples for target detection, to solve the problem in the prior art that objects with a large aspect ratio are missed, resulting in low accuracy of target detection. This invention aims to reduce the missed detection rate of objects with a large aspect ratio and improve the detection accuracy of such objects.

[0006] A method for determining training samples for target detection includes: acquiring the length, width, and position information of a target bounding box and candidate bounding boxes in a training image, wherein the target bounding box is the smallest bounding box of the object to be detected in the training image, and the candidate bounding boxes are bounding boxes with different aspect ratios randomly drawn in the training image; determining the intersection-union ratio (IOU) of the target bounding box and each candidate bounding box according to the length, width, and position information of the target bounding box and each candidate bounding box; and determining positive and negative samples of the target bounding box from the candidate bounding boxes based on the IOU of each candidate bounding box, wherein the IOU between the candidate bounding box corresponding to a positive sample and the target bounding box is greater than a first preset threshold, the first preset threshold is determined based on the aspect ratio of the target bounding box, and when the aspect ratio of the target bounding box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target bounding box.

[0007] In one embodiment, determining the positive and negative samples of the target box from each candidate box based on the intersection-union ratio (IOU) of each candidate box includes: if the aspect ratio of the target box is greater than a second preset threshold, calculating a first ratio of the aspect ratio of the target box to the second preset threshold, and calculating the ratio of a third preset threshold to the first ratio to obtain the first preset threshold corresponding to the target box; or, if the aspect ratio of the target box is less than or equal to the second preset threshold, determining the third preset threshold as the first preset threshold corresponding to the target box.

[0008] In one embodiment, determining the positive and negative samples of the target box from each candidate box based on the intersection-union ratio (IOU) of each candidate box further includes: determining the candidate box as a positive sample of the target box when the IOU of the candidate box and the target box in the training image is greater than the first preset threshold; or, determining the candidate box as a negative sample of the target box when the IOU of the candidate box and the target box is less than a fourth preset threshold, wherein the fourth preset threshold is less than the first preset threshold; or, determining the candidate box as an ignored sample of the target box when the IOU of the candidate box and the target box is greater than or equal to the fourth preset threshold and less than or equal to the first preset threshold.

[0009] In one embodiment, determining the intersection-union ratio (IOU) of the target box and each candidate box based on the length, width, and position information of the target box and each candidate box includes: determining the aspect ratio of the target box based on the length and width of the target box in the training image, wherein the aspect ratio of the target box is the ratio of the larger of the width and length of the target box to the smaller of the width and length of the target box; and determining the IOU of the target box and each candidate box based on the length, width, and position information of the target box and each candidate box in the training image.

[0010] In one embodiment, the method further includes:

[0011] The target detection model is trained based on the positive and negative samples and a preset loss function, wherein the preset loss function is as follows: $L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \alpha \frac{1}{N_{reg}} \sum_i \delta_i L_{reg}(t_i, t_i^*)$, where $N_{cls}$ represents the sum of the number of positive and negative samples; $N_{reg}$ represents the number of positive samples; $L_{cls}$ represents the classification loss function, used to determine the difference between the predicted output class and the true class of the candidate box; $L_{reg}$ represents the location loss function, used to determine the difference between the predicted output location information of the candidate box and the true location information of its corresponding target box; $\alpha$ is the weight balancing the classification loss function and the location loss function; $i$ represents the index number of the candidate box; $p_i$ represents the predicted output category corresponding to the i-th candidate box; $p_i^*$ represents the true category of the i-th candidate box; $t_i$ represents the predicted output location information corresponding to the i-th candidate box; $t_i^*$ represents the true location information of the target corresponding to the i-th candidate box; and $\delta_i$ indicates whether the i-th candidate box is a positive sample. If the i-th candidate box is a positive sample, $\delta_i = 1$; when the i-th candidate box is a negative sample or an ignored sample, $\delta_i = 0$.

[0012] An object detection method includes: acquiring a target image to be detected; and inputting the target image to be detected into a trained object detection model to obtain the position information and final category of each object in the target image; wherein the object detection model is trained based on positive and negative samples of target boxes in a training image; the intersection-union ratio (IOU) between the candidate box corresponding to a positive sample and the target box in the training image is greater than a first preset threshold, the first preset threshold being determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold corresponding to the target box is inversely proportional to the aspect ratio of the target box.

[0013] An apparatus for determining training samples for object detection includes: a first acquisition module, configured to acquire the length, width, and position information of a target bounding box and candidate bounding boxes in a training image, wherein the target bounding box is the smallest bounding box of the object to be detected in the training image, and the candidate bounding boxes are bounding boxes with different aspect ratios randomly drawn in the training image; a first determination module, configured to determine the intersection-union ratio (IOU) of the target bounding box and each candidate bounding box based on the length, width, and position information of the target bounding box and each candidate bounding box; and a second determination module, configured to determine positive and negative samples of the target bounding box from the candidate bounding boxes based on the IOU of each candidate bounding box, wherein the IOU between the candidate bounding box corresponding to a positive sample and the target bounding box is greater than a first preset threshold, the first preset threshold is determined based on the aspect ratio of the target bounding box, and when the aspect ratio of the target bounding box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target bounding box.

[0014] An object detection device includes: a first acquisition module for acquiring a target image to be detected; and a processing module for inputting the target image to be detected into a trained object detection model to obtain position information and a final category for each object in the target image; wherein the object detection model is trained based on positive and negative samples of target boxes in a training image; the intersection-union ratio (IOU) between the candidate box corresponding to a positive sample and the target box in the training image is greater than a first preset threshold, the first preset threshold being determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold corresponding to the target box is inversely proportional to the aspect ratio of the target box.

[0015] The present invention also provides a computer device, including a memory and a processor, wherein the memory stores computer-readable instructions, which, when executed by the processor, cause the processor to perform the steps of the object detection method or the method for determining training samples for object detection described above.

[0016] The present invention also provides a storage medium storing computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the object detection method or the method for determining training samples for object detection described above.

[0017] In the aforementioned object detection method, and method and apparatus for determining training samples for object detection, positive samples of target boxes are determined based on the aspect ratio of the target boxes in the training image when dividing positive and negative samples. When the aspect ratio of a target box is large, a first preset threshold inversely proportional to the aspect ratio is set, so that the first preset threshold is smaller for such a target box. Using the intersection-union ratio (IOU) of candidate boxes and target boxes together with the first preset threshold, it is then still possible to effectively determine whether a candidate box is a positive sample, avoiding the omission of such candidate boxes. This lays the foundation for training an object detection model based on positive and negative samples of target boxes in the training image and a preset loss function, enabling the trained object detection model to better learn the position information and feature information of the objects to be detected, reducing the false negative rate of objects with large aspect ratios, and thereby improving the accuracy of object detection.

Brief Description of the Drawings

[0018] Figure 1 is a first schematic flowchart of the method for determining training samples for target detection provided by the present invention;

[0019] Figure 2 is a second schematic flowchart of the method for determining training samples for target detection provided by the present invention;

[0020] Figure 3 is a schematic flowchart of the target detection method provided by the present invention;

[0021] Figure 4 is a schematic structural diagram of the apparatus for determining training samples for target detection provided by the present invention;

[0022] Figure 5 is a schematic structural diagram of the target detection apparatus provided by the present invention;

[0023] Figure 6 is a schematic diagram of the electronic device provided by the present invention.

Detailed Implementation

[0024] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0025] It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of this disclosure should have the ordinary meaning understood by one of ordinary skill in the art to which this disclosure pertains. The terms "first," "second," and similar terms used in the embodiments of this disclosure do not indicate any order, quantity, or importance, but are merely used to distinguish different components. Terms such as "comprising" or "including" mean that the element or object preceding the word encompasses the elements or objects listed following the word and their equivalents, without excluding other elements or objects. Terms such as "connected" or "linked" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. Terms such as "upper," "lower," "left," and "right" are used only to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may also change accordingly.

[0026] To facilitate understanding, the inventive concept of this invention will be explained first.

[0027] Currently, both one-stage and two-stage object detection algorithms show a significant performance degradation when detecting long, narrow objects with large aspect ratios, such as skis and pencils, compared to detecting objects with small aspect ratios, such as soccer balls and cars. In some cases, the network cannot detect objects with large aspect ratios at all, resulting in a high false negative rate. In other cases, although the network can detect objects with large aspect ratios, the predicted target location is not accurate enough.

[0028] One reason for this phenomenon is that when the aspect ratio of the target object is large, the intersection-union ratio (IOU) between the candidate boxes and the target box tends to be low. These candidate boxes are therefore likely to be set as ignored samples or negative samples, which leaves few positive samples for targets with large aspect ratios. As a result, the model fails to learn the features of objects with large aspect ratios well, and thus fails to classify the target correctly or locate it accurately.
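The effect described above can be illustrated with a small numeric sketch. The function name `shifted_iou` and the box sizes are hypothetical choices made only for this illustration; they do not come from the patent. The sketch compares a box with a copy of itself shifted by a few pixels, for a square box and for a long, narrow box of similar area.

```python
def shifted_iou(w: float, h: float, dx: float, dy: float) -> float:
    """IOU between a w-by-h box and a copy of it shifted by (dx, dy)."""
    inter_w = max(0.0, w - abs(dx))   # overlap along the width
    inter_h = max(0.0, h - abs(dy))   # overlap along the length
    inter = inter_w * inter_h
    union = 2.0 * w * h - inter       # area(A) + area(B) - intersection
    return inter / union


# Two boxes of nearly equal area, each compared with a copy shifted by 5 px
# along the short side of the narrow box.
square = shifted_iou(32, 32, 0, 5)    # aspect ratio 1
narrow = shifted_iou(100, 10, 0, 5)   # aspect ratio 10
print(f"square: {square:.3f}, narrow: {narrow:.3f}")
# prints "square: 0.730, narrow: 0.333"
```

With a fixed positive-sample threshold such as 0.7, the shifted square box would still count as a positive sample, while the equally well-placed narrow box would not.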

[0029] The following description, in conjunction with the accompanying drawings, illustrates the target detection method, the method for determining training samples for target detection, and the apparatus provided by this invention.

[0030] Figure 1 is a flowchart illustrating a method for determining training samples for object detection provided by the present invention. It is understood that this method can be executed by an apparatus for determining training samples for object detection, which can be a computer device.

[0031] As shown in Figure 1, in one embodiment, a method for determining training samples for object detection is proposed, which may specifically include the following steps:

[0032] Step 110: Obtain the length, width, and position information of the target bounding box and candidate bounding boxes in the training image.

[0033] The target bounding box is the ground truth bounding box of the object to be detected in the training image, which can be the smallest bounding box of the object to be detected in the training image, used to provide feedback on the true location and size of the object to be detected. The candidate bounding box is a candidate bounding box with different aspect ratios randomly drawn in the training image.

[0034] The training images are images containing objects to be detected. These objects can be, for example, vehicles, cats, dogs, or people.

[0035] It is understandable that a training image contains at least one object to be detected, and therefore, a training image contains at least one bounding box. Furthermore, since different objects to be detected have different sizes, the aspect ratios of different bounding boxes are different.

[0036] Step 120: Determine the intersection-union ratio of the target box and each candidate box based on the length, width and position information of the target box and each candidate box.

[0037] In one embodiment, determining the intersection-union ratio (IOU) of the target bounding box and each candidate bounding box based on the length, width, and position information of the target bounding box and each candidate bounding box includes:

[0038] The aspect ratio of the target box is determined based on the length and width of the target box in the training image; wherein the aspect ratio of the target box is the ratio of the larger value of the width and length of the target box to the smaller value of the width and length of the target box; the intersection-union ratio of the target box and each candidate box is determined based on the length, width and position information of the target box and each candidate box in the training image.

[0039] The aspect ratio of the target bounding box can be determined based on the length and width of the target bounding box.

[0040] Specifically, the aspect ratio of the target bounding box is calculated as: wh_ratio = max(w, h) / min(w, h), where w is the width of the target box, h is the length of the target box, max(w, h) represents the larger of the width w and the length h, and min(w, h) represents the smaller of the width w and the length h.
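As a minimal sketch of this formula (the function name `aspect_ratio` and the input check are illustrative assumptions, not part of the patent text):

```python
def aspect_ratio(w: float, h: float) -> float:
    """wh_ratio = max(w, h) / min(w, h); always >= 1 for a valid box."""
    if w <= 0 or h <= 0:
        raise ValueError("width and length must be positive")
    return max(w, h) / min(w, h)


print(aspect_ratio(200, 20))  # 10.0
print(aspect_ratio(20, 200))  # 10.0
print(aspect_ratio(50, 50))   # 1.0
```

Because the larger side is always divided by the smaller one, a tall narrow box and a wide flat box of the same proportions give the same value.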

[0041] The intersection-union ratio (IOU) of the target bounding box and a candidate bounding box is the ratio of the area of their intersection to the area of their union. Specifically, the area of the target bounding box and the area of each candidate bounding box can be calculated using the length, width, and position information of the target bounding box and each candidate bounding box. From these, the area of the intersection of the target bounding box and each candidate bounding box and the area of their union are determined. Finally, the IOU is calculated as the ratio of the area of the intersection of the target bounding box and each candidate bounding box to the area of their union. The length, width, and position information of the target bounding box or a candidate bounding box can be determined from the position coordinates of its diagonal or four corners. The process for determining the IOU can be found in existing technologies and will not be elaborated here for simplicity.
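The calculation can be sketched as follows, assuming axis-aligned boxes given by the coordinates of one diagonal, `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`. The corner format and the names `Box`, `box_area`, and `iou` are assumptions made for this illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(target: Box, candidate: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(target[0], candidate[0])
    iy1 = max(target[1], candidate[1])
    ix2 = min(target[2], candidate[2])
    iy2 = min(target[3], candidate[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(target) + box_area(candidate) - inter
    return inter / union if union > 0 else 0.0


target = (0.0, 0.0, 100.0, 10.0)
candidate = (0.0, 5.0, 100.0, 15.0)
print(round(iou(target, candidate), 3))  # 0.333
```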

[0042] Step 130: Based on the intersection-union ratio (IOU) of each candidate box, determine the positive and negative samples of the target box from the candidate boxes; the IOU between the candidate box corresponding to a positive sample and the target box is greater than a first preset threshold, the first preset threshold is determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target box.

[0043] The positive and negative samples of the target bounding boxes form the training samples, which are used in conjunction with the loss function to train the target detection model.

[0044] Understandably, during the training phase of an object detection model, candidate boxes with different aspect ratios and sizes are randomly drawn in the training images. Based on the positional relevance of these candidate boxes to the target boxes of the object to be detected (usually the Intersection over Union (IOU) between the candidate and target boxes), the candidate boxes are divided into three categories: positive samples, negative samples, and ignored samples. Specifically, if a candidate box is relatively close to the ground truth location of the target box, it is designated as a positive sample. If a candidate box is far from all target boxes, it is designated as a negative sample. If the candidate box is neither close enough nor far enough, it is designated as an ignored sample. In other words, if the IOU of a candidate box and the target box is greater than a preset threshold, the candidate box is designated as a positive sample of the target box; if the IOU of a candidate box and the target box is less than another preset threshold, the candidate box is designated as a negative sample of the target box. However, in reality, for some objects with large aspect ratios, the IOU of candidate boxes and target boxes is relatively small. This means that candidate boxes are very likely to be classified as negative samples or ignored samples. Consequently, during the subsequent training of the object detection model based on positive samples and the preset location loss function, the model cannot learn the relevant position information of the positive samples or some of the feature information of the object to be detected. As a result, the object detection model trained based on the preset loss function cannot effectively determine the position information and feature information of the target box, leading to missed detections.

[0045] Therefore, in view of the above situation, the method for determining training samples for object detection in this invention, when dividing positive and negative samples, determines the positive samples of the target boxes based on the aspect ratio of the target boxes in the training image. When the aspect ratio of a target box is large, a first preset threshold inversely proportional to the aspect ratio is set, so that the first preset threshold is smaller for such a target box. Using the IOU of the candidate boxes and the target boxes together with the first preset threshold, it is then still possible to effectively determine whether a candidate box is a positive sample, avoiding the omission of such candidate boxes. This lays the foundation for training an object detection model based on the positive and negative samples of the target boxes in the training image and a preset loss function. The trained object detection model can better learn the positional and feature information of the objects to be detected, reducing the false negative rate of objects with large aspect ratios and thereby improving the accuracy of object detection.

[0046] In one embodiment, as shown in Figure 2, determining the positive and negative samples of the target box from each candidate box based on the intersection-union ratio (IOU) of each candidate box includes the following steps:

[0047] Step 210: Determine the first preset threshold corresponding to the target box based on the aspect ratio of the target box.

[0048] Specifically, step 210 includes either step 2101 or step 2102.

[0049] Step 2101: If the aspect ratio of the target box is greater than the second preset threshold, the third preset threshold is divided by the ratio of the aspect ratio of the target box to the second preset threshold, and the result is determined as the first preset threshold corresponding to the target box.

[0050] Understandably, in this case, the first preset threshold corresponding to the target box is given by the formula: th1 = th3 / (wh_ratio / th2), where th1 is the first preset threshold corresponding to the target box, th2 is the second preset threshold, th3 is the third preset threshold, and wh_ratio is the aspect ratio of the target bounding box.

[0051] The second preset threshold can be, for example, 4, and the third preset threshold can be, for example, 0.7. It is understood that the second and third preset thresholds can also be other values.

[0052] Step 2102: If the aspect ratio of the target box is less than or equal to the second preset threshold, the third preset threshold is determined as the first preset threshold corresponding to the target box.
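Steps 2101 and 2102 can be sketched as a single function using the example values th2 = 4 and th3 = 0.7 given above. The function name `first_threshold` is hypothetical. The sketch also assumes that, for aspect ratios at or below th2, the threshold falls back to th3: this keeps th1 continuous at wh_ratio = th2 and keeps it within the valid IOU range of 0 to 1.

```python
def first_threshold(wh_ratio: float, th2: float = 4.0, th3: float = 0.7) -> float:
    """First preset threshold th1 for a target box with the given aspect ratio."""
    if wh_ratio > th2:
        # Step 2101: th1 = th3 / (wh_ratio / th2), inversely proportional
        # to the aspect ratio.
        return th3 / (wh_ratio / th2)
    # Step 2102 (assumed fallback): ordinary targets use th3 unchanged.
    return th3


for ratio in (2.0, 4.0, 8.0):
    print(ratio, round(first_threshold(ratio), 3))
# 2.0 0.7
# 4.0 0.7
# 8.0 0.35
```

Doubling the aspect ratio beyond th2 halves the threshold, which is the inverse proportionality the method relies on.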

[0053] It is understood that after step 210 above, the step of determining the positive and negative samples of the target box from each candidate box based on the intersection-union ratio corresponding to each candidate box may also include any one of steps 220 to 240.

[0054] Step 220: If the intersection-union ratio of the candidate box and the target box in the training image is greater than a first preset threshold, the candidate box is determined as a positive sample of the target box.

[0055] Step 230: If the intersection-union ratio of the candidate box and the target box is less than a fourth preset threshold, the candidate box is determined as a negative sample of the target box; the fourth preset threshold is less than the first preset threshold.

[0056] Step 240: If the intersection-union ratio of the candidate box and the target box is greater than or equal to a fourth preset threshold and less than or equal to a first preset threshold, the candidate box is determined as an ignored sample of the target box.

[0057] In other words, if the Intersection over Union (IOU) ratio between a candidate bounding box and the target bounding box satisfies th4 ≤ IOU ≤ th1, the candidate bounding box is determined as an ignored sample of the target bounding box. Here, th4 is a fourth preset threshold.

[0058] The fourth preset threshold can be, for example, 0.3. It is understood that the fourth preset threshold can also be other values.
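Steps 220 to 240 amount to a three-way split on the IOU value. A minimal sketch follows, assuming the IOU values have already been computed for one target box. The function name `assign_samples` and the string labels are illustrative choices, and the explicit check reflects the stated condition that the fourth preset threshold is less than the first.

```python
from typing import List


def assign_samples(ious: List[float], th1: float, th4: float = 0.3) -> List[str]:
    """Label each candidate box by its IOU with one target box."""
    if th4 >= th1:
        raise ValueError("the fourth threshold must be less than the first")
    labels = []
    for value in ious:
        if value > th1:
            labels.append("positive")   # step 220: IOU > th1
        elif value < th4:
            labels.append("negative")   # step 230: IOU < th4
        else:
            labels.append("ignored")    # step 240: th4 <= IOU <= th1
    return labels


# th1 = 0.35 corresponds to an aspect ratio of 8 with th2 = 4 and th3 = 0.7.
print(assign_samples([0.10, 0.30, 0.33, 0.35, 0.50], th1=0.35))
# ['negative', 'ignored', 'ignored', 'ignored', 'positive']
```

Under a fixed threshold of 0.7, the candidate with an IOU of 0.50 would have been ignored; with the aspect-ratio-dependent threshold it becomes a positive sample.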

[0059] As mentioned above, the positive and negative samples of the target bounding boxes constitute the training samples, which are combined with the loss function to train the target detection model. Therefore, in one embodiment, the method further includes:

[0060] The target detection model is trained based on the positive and negative samples and the preset loss function.

[0061] The preset loss function is as follows: $L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \alpha \frac{1}{N_{reg}} \sum_i \delta_i L_{reg}(t_i, t_i^*)$, where $N_{cls}$ represents the sum of the number of positive and negative samples; $N_{reg}$ represents the number of positive samples; $L_{cls}$ represents the classification loss function, used to determine the difference between the predicted output class and the true class of the candidate box; $L_{reg}$ represents the location loss function, used to determine the difference between the predicted output location information of the candidate box and the true location information of its corresponding target box; $\alpha$ is the weight balancing the classification loss function and the location loss function; $i$ represents the index number of the candidate box; $p_i$ represents the predicted output category corresponding to the i-th candidate box; $p_i^*$ represents the true category of the i-th candidate box; $t_i$ represents the predicted output location information corresponding to the i-th candidate box; $t_i^*$ represents the true location information of the target corresponding to the i-th candidate box; and $\delta_i$ indicates whether the i-th candidate box is a positive sample. If the i-th candidate box is a positive sample, $\delta_i = 1$; when the i-th candidate box is a negative sample or an ignored sample, $\delta_i = 0$.

[0062] The object detection model is a model used for object detection, which can be a neural network-based algorithm model, such as any one of Faster R-CNN, YOLO or SSD. This invention does not limit this.

[0063] The classification loss function is the loss function that enables the model to classify data by reducing the gap between the predicted output class and the true class; for example, it can be the cross-entropy loss function. The location loss function is the loss function that enables the model to learn location information; for example, it can be the smooth-L1 loss or one of various IOU-based loss functions.
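
As a rough illustration of how the two example losses could be combined, the sketch below uses cross-entropy for classification and smooth-L1 for location, normalized as described in paragraph [0061]. Everything about the data layout is an assumption: the dictionary keys, the `beta` parameter of the smooth-L1 loss, and the choice to drop ignored samples before calling `detection_loss` are hypothetical.

```python
import math

def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the true class."""
    return -math.log(max(probs[true_class], 1e-12))

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss summed over the coordinates of one box."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def detection_loss(samples, alpha=1.0):
    """Combine both losses over positive and negative samples.

    Each sample is a dict with keys: probs, true_class, pred_box,
    true_box, positive. Ignored samples are assumed to be removed
    beforehand, so len(samples) plays the role of N_cls.
    """
    n_cls = len(samples)
    n_reg = sum(1 for s in samples if s["positive"])
    cls_sum = sum(cross_entropy(s["probs"], s["true_class"]) for s in samples)
    # Only positive samples (delta_i = 1) contribute a location term.
    reg_sum = sum(smooth_l1(s["pred_box"], s["true_box"])
                  for s in samples if s["positive"])
    loss = cls_sum / n_cls if n_cls else 0.0
    if n_reg:
        loss += alpha * reg_sum / n_reg
    return loss
```

A negative sample with a badly placed predicted box adds nothing to the location term, which matches the role of δ_i described above.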

[0064] Specifically, based on the preset loss function, the loss of the target detection model's predicted output for the positive and negative samples given as inputs can be calculated, and the parameters of the target detection model can be iteratively optimized until the model converges. For example, a stochastic gradient descent algorithm can be used for the iterative optimization. When the number of iterations reaches a preset upper limit, or the loss value falls below a preset target threshold, optimization stops, yielding the final trained target detection model.
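
The stopping rule described here can be sketched with a toy problem. The sketch does not train a detection model: `train` and `make_toy_step` are hypothetical helpers, and the one-parameter regression is only a stand-in that lets the two stopping conditions (iteration cap, loss below target) be exercised with stochastic gradient descent.

```python
import random

def train(step_fn, max_iters, target_loss):
    """Call step_fn until the iteration cap is reached or the
    returned loss drops below target_loss.

    Returns (iterations_run, last_loss).
    """
    loss = float("inf")
    for iteration in range(1, max_iters + 1):
        loss = step_fn()
        if loss < target_loss:
            return iteration, loss
    return max_iters, loss

def make_toy_step(lr=0.1, seed=0):
    """Build one SGD step for fitting y = w * x (true w is 2.0)."""
    rng = random.Random(seed)
    data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
    state = {"w": 0.0}

    def step():
        x, y = rng.choice(data)                  # pick one sample at random
        grad = 2.0 * (state["w"] * x - y) * x    # gradient of squared error
        state["w"] -= lr * grad                  # parameter update
        # Report the mean squared error over the whole toy data set.
        return sum((state["w"] * xi - yi) ** 2 for xi, yi in data) / len(data)

    return step

iterations, final_loss = train(make_toy_step(), max_iters=1000, target_loss=1e-6)
```

In a real training run, `step_fn` would perform one optimizer update of the detection model on a batch of positive and negative samples and return the loss value.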

[0065] It is understandable that, as mentioned earlier, even for target boxes with large aspect ratios in the training image, positive samples can be effectively identified. Therefore, during the training process, the object detection model can continuously learn the positional information of the target boxes in the training image based on the identified positive samples and the positional loss function, even for target boxes with large aspect ratios. Furthermore, it can continuously learn feature information for classification based on the positive and negative samples of the target boxes in the training image. Therefore, in practical applications, a well-trained object detection model can effectively detect objects with large aspect ratios in the target image, thereby reducing the false negative rate of large-aspect-ratio objects and improving the accuracy of object detection.

[0066] Figure 3 is a schematic flowchart illustrating a target detection method provided by the present invention. It can be understood that this target detection method can be executed by a target detection device, which can be a computer device.

[0067] As shown in Figure 3, in one embodiment, a target detection method is proposed, which may specifically include the following steps:

[0068] Step 310: Obtain the target image to be detected.

[0069] The target image to be detected can be an image containing the object to be detected. For example, in a vehicle detection scenario, the target image can be an image containing a vehicle. Similarly, in an animal detection scenario, the target image can be an image containing a cat, a dog, or a person.

[0070] Step 320: Input the target image to be detected into the trained target detection model to obtain the position information and final category of each object in the target image.

[0071] The target detection model is trained based on positive and negative samples of target boxes in the training image; the intersection-union ratio (IOU) between the candidate box corresponding to a positive sample and the target box in the training image is greater than a first preset threshold, which is determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold corresponding to the target box is inversely proportional to the aspect ratio of the target box.

[0072] Specifically, the process of determining the positive and negative samples of the target boxes in the training image can be referred to the relevant description in the previous embodiments, and will not be repeated here for the sake of brevity.

[0073] It can be understood that the output of the object detection model includes the location information and category information of each object (target box) in the image to be detected. The category information is the probability that each object belongs to each preset category. The final category of the object at each location is determined by the maximum probability value among the probabilities of the object at each location belonging to each preset category.
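
Selecting the final category from the per-category probabilities amounts to taking the maximum. A minimal sketch, in which the function name `final_category` and the example category names are hypothetical:

```python
def final_category(class_probs, class_names):
    """Return the category with the highest probability and that probability."""
    best = max(range(len(class_probs)), key=class_probs.__getitem__)
    return class_names[best], class_probs[best]

# Example: one detected object scored against three preset categories.
label, score = final_category([0.1, 0.7, 0.2], ["cat", "dog", "person"])
```

Here `label` is "dog" and `score` is 0.7, since 0.7 is the largest of the three probabilities.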

[0074] The object detection method of this invention, when dividing positive and negative samples, determines positive samples of target boxes based on the aspect ratio of the target boxes in the training image. When the aspect ratio of a target box is large, a first preset threshold inversely proportional to the aspect ratio is set, so that the first preset threshold becomes smaller as the aspect ratio grows. Using the intersection-union ratio (IOU) of the candidate box and the target box together with the first preset threshold, it is then possible to effectively determine whether a candidate box is a positive sample even for such target boxes, avoiding missed detections. This lays the foundation for training an object detection model based on positive and negative samples of target boxes in the training image and a preset loss function. The trained object detection model can better learn the position and feature information of the object to be detected, reducing the missed detection rate of objects with large aspect ratios and thereby improving the accuracy of object detection.
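
The aspect-ratio-dependent threshold can be sketched as follows, using the computation spelled out in the claims: the aspect ratio is the larger of length and width divided by the smaller, and for aspect ratios above the second preset threshold th2, the first preset threshold is the third preset threshold th3 divided by (aspect ratio / th2). The function names are hypothetical, and the value used when the aspect ratio does not exceed th2 is deliberately left as a caller-supplied argument `th_default` rather than fixed by this sketch.

```python
def aspect_ratio(length: float, width: float) -> float:
    """Larger side divided by smaller side, so the result is always >= 1."""
    return max(length, width) / min(length, width)

def first_threshold(ratio: float, th2: float, th3: float, th_default: float) -> float:
    """First preset threshold (positive-sample IOU threshold) for one target box.

    ratio      -- aspect ratio of the target box
    th2        -- second preset threshold (aspect-ratio cutoff)
    th3        -- third preset threshold (numerator of the scaled threshold)
    th_default -- value used when ratio <= th2 (supplied by the caller)
    """
    if ratio > th2:
        # Inversely proportional to the aspect ratio: th3 / (ratio / th2)
        return th3 / (ratio / th2)
    return th_default
```

With illustrative values th2 = 3 and th3 = 0.5, a target box with aspect ratio 6 gets a first preset threshold of 0.25, so a candidate box needs only a quarter overlap to count as a positive sample.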

[0075] As shown in Figure 4, in one embodiment, an apparatus for determining training samples for object detection is provided, comprising:

[0076] The first acquisition module 410 is used to acquire the length, width, and position information of the target box and candidate boxes in the training image; the target box is the smallest bounding box of the object to be detected in the training image, and the candidate boxes are candidate boxes with different aspect ratios randomly drawn in the training image.

[0077] The first determining module 420 is used to determine the intersection-union ratio of the target box and each candidate box based on the length, width and position information of the target box and each candidate box;

[0078] The second determining module 430 is used to determine positive and negative samples of the target box from the candidate boxes based on the intersection-union ratio (IOU) corresponding to each candidate box; the IOU of the candidate box corresponding to a positive sample and the target box is greater than a first preset threshold, the first preset threshold is determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target box.
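
The computation performed by the first determining module, deriving the intersection-union ratio from the length, width, and position information of two boxes, can be sketched as below. The box representation is an assumption: `Box` is a hypothetical type whose position (x, y) is taken to be the top-left corner, with `length` as the horizontal extent and `width` as the vertical extent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    x: float       # left edge (assumed position convention)
    y: float       # top edge (assumed position convention)
    length: float  # horizontal extent
    width: float   # vertical extent

def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    overlap_x = min(a.x + a.length, b.x + b.length) - max(a.x, b.x)
    overlap_y = min(a.y + a.width, b.y + b.width) - max(a.y, b.y)
    if overlap_x <= 0 or overlap_y <= 0:
        return 0.0  # boxes do not overlap
    inter = overlap_x * overlap_y
    union = a.length * a.width + b.length * b.width - inter
    return inter / union
```

For a long, narrow target box, a candidate box shifted by a small amount along the short side loses a large share of the overlap, which is why a fixed threshold tends to reject otherwise reasonable candidates.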

[0079] As shown in Figure 5, in one embodiment, a target detection device is provided, which may include:

[0080] The second acquisition module 510 is used to acquire the target image to be detected;

[0081] Processing module 520 is used to input the target image to be detected into a trained target detection model to obtain the position information and final category of each object in the target image;

[0082] The target detection model is trained based on positive and negative samples of target boxes in the training image; the intersection-union ratio (IOU) between the candidate box corresponding to a positive sample and the target box in the training image is greater than a first preset threshold, which is determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold corresponding to the target box is inversely proportional to the aspect ratio of the target box.

[0083] The object detection apparatus and the apparatus for determining training samples for object detection provided by this invention, when dividing positive and negative samples, determine the positive samples of the target boxes based on the aspect ratio of the target boxes in the training image. When the aspect ratio of a target box is large, a first preset threshold that is inversely proportional to the aspect ratio is set, so that the first preset threshold becomes smaller as the aspect ratio grows. This enables the intersection-union ratio (IOU) of the candidate boxes and the target boxes, together with the first preset threshold, to be used to effectively determine whether a candidate box is a positive sample. This lays the foundation for subsequent training of an object detection model based on the positive and negative samples of the target boxes in the training image. The trained object detection model can effectively determine the position information of the object to be detected, reducing the false negative rate of objects with large aspect ratios and thereby improving the accuracy of object detection.

[0084] Figure 6 is a schematic diagram of the physical structure of an example electronic device. As shown in Figure 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other through the communication bus 640. The processor 610 can call logic instructions in the memory 630 to execute a method for determining training samples for object detection and an object detection method. The method for determining training samples for object detection includes: acquiring the length, width, and position information of target boxes and candidate boxes in a training image; the target box is the smallest bounding box of the object to be detected in the training image, and the candidate boxes are candidate boxes with different aspect ratios randomly drawn in the training image; determining the intersection-union ratio (IOU) of the target box and each candidate box according to the length, width, and position information of the target box and each candidate box; determining positive and negative samples of the target box from the candidate boxes based on the IOU corresponding to each candidate box; the IOU of the candidate box corresponding to a positive sample and the target box is greater than a first preset threshold, the first preset threshold being determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target box.
The target detection method includes: acquiring a target image to be detected; inputting the target image to be detected into a trained target detection model to obtain the position information and final category of each object in the target image; wherein the target detection model is trained based on positive and negative samples of target boxes in the training image; the IOU of the candidate box corresponding to a positive sample with the target box in the training image is greater than a first preset threshold, the first preset threshold being determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold corresponding to the target box is inversely proportional to the aspect ratio of the target box.

[0085] Furthermore, the logical instructions in the aforementioned memory 630 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0086] In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, and when the program instructions are executed by a computer, the computer is able to execute the target detection method and the method for determining training samples for target detection provided by the present invention. The method for determining training samples for target detection comprises: acquiring the length, width, and position information of a target box and candidate boxes in a training image; the target box is the smallest bounding box of the object to be detected in the training image, and the candidate boxes are candidate boxes with different aspect ratios randomly drawn in the training image; determining the intersection-union ratio (IOU) of the target box and each candidate box according to the length, width, and position information of the target box and each candidate box; determining positive and negative samples of the target box from the candidate boxes based on the IOU corresponding to each candidate box; the IOU of the candidate box corresponding to a positive sample and the target box is greater than a first preset threshold, the first preset threshold being determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target box.
The target detection method includes: acquiring a target image to be detected; inputting the target image to be detected into a trained target detection model to obtain the position information and final category of each object in the target image; wherein the target detection model is trained based on positive and negative samples of target boxes in the training image; the IOU of the candidate box corresponding to a positive sample with the target box in the training image is greater than a first preset threshold, the first preset threshold being determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold corresponding to the target box is inversely proportional to the aspect ratio of the target box.

[0087] In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program thereon. When executed by a processor, the computer program implements the target detection method and the method for determining training samples for target detection provided by the present invention. The method for determining training samples for target detection includes: acquiring the length, width, and position information of a target box and candidate boxes in a training image; the target box is the smallest bounding box of the object to be detected in the training image, and the candidate boxes are candidate boxes with different aspect ratios randomly drawn in the training image; determining the intersection-union ratio (IOU) of the target box and each candidate box according to the length, width, and position information of the target box and each candidate box; determining positive and negative samples of the target box from the candidate boxes based on the IOU corresponding to each candidate box; the IOU of the candidate box corresponding to a positive sample and the target box is greater than a first preset threshold, the first preset threshold being determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target box.
The target detection method includes: acquiring a target image to be detected; inputting the target image to be detected into a trained target detection model to obtain the position information and final category of each object in the target image; wherein the target detection model is trained based on positive and negative samples of target boxes in the training image; the IOU of the candidate box corresponding to a positive sample with the target box in the training image is greater than a first preset threshold, the first preset threshold being determined based on the aspect ratio of the target box; and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold corresponding to the target box is inversely proportional to the aspect ratio of the target box.

[0088] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0089] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

[0090] It is understood that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for determining training samples for object detection, characterized in that the method comprises: obtaining the length, width, and position information of a target box and candidate boxes in a training image, wherein the target box is the smallest bounding box of the object to be detected in the training image, the candidate boxes are candidate boxes with different aspect ratios randomly drawn in the training image, the training image contains at least one target box, and different target boxes have different aspect ratios; determining the intersection-union ratio (IOU) of the target box and each candidate box based on the length, width, and position information of the target box and each candidate box; and determining positive and negative samples of the target box from the candidate boxes based on the IOU corresponding to each candidate box, wherein the IOU of the candidate box corresponding to a positive sample and the target box is greater than a first preset threshold, the first preset threshold is determined based on the aspect ratio of the target box, and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target box; wherein the step of determining positive and negative samples of the target box from the candidate boxes based on the IOU corresponding to each candidate box includes: if the aspect ratio of the target box is greater than the second preset threshold, calculating the ratio of the aspect ratio of the target box to the second preset threshold, and calculating the ratio of a third preset threshold to that ratio to obtain the first preset threshold corresponding to the target box; or, if the aspect ratio of the target box is less than or equal to the second preset threshold, determining the second preset threshold as the first preset threshold corresponding to the target box.

2. The method for determining training samples for target detection according to claim 1, characterized in that the step of determining positive and negative samples of the target box from the candidate boxes based on the intersection-union ratio (IOU) corresponding to each candidate box further includes: if the IOU between the candidate box and the target box in the training image is greater than the first preset threshold, determining the candidate box as a positive sample of the target box; or, if the IOU between the candidate box and the target box is less than a fourth preset threshold, determining the candidate box as a negative sample of the target box, the fourth preset threshold being less than the first preset threshold; or, if the IOU between the candidate box and the target box is greater than or equal to the fourth preset threshold and less than or equal to the first preset threshold, determining the candidate box as an ignored sample of the target box.

3. The method for determining training samples for target detection according to claim 1, characterized in that the step of determining the intersection-union ratio (IOU) of the target box and each candidate box based on the length, width, and position information of the target box and each candidate box includes: determining the aspect ratio of the target box based on the length and width of the target box in the training image, wherein the aspect ratio of the target box is the ratio of the larger of the width and length of the target box to the smaller of the width and length of the target box; and determining the IOU of the target box and each candidate box based on the length, width, and position information of the target box and each candidate box in the training image.

4. The method for determining training samples for target detection according to claim 1, characterized in that the method further includes: training the target detection model based on the positive and negative samples and a preset loss function; wherein the preset loss function is defined in terms of the following quantities: N_cls represents the sum of the number of positive and negative samples; N_reg represents the number of positive samples; L_cls represents the classification loss function, used to determine the difference between the predicted output category and the true category of the candidate box; L_reg represents the location loss function, used to determine the difference between the predicted output location information of the candidate box and the true location information of its corresponding target box; α is the weight that balances the classification loss function and the location loss function; i represents the index number of the candidate box; p_i represents the predicted output category corresponding to the i-th candidate box, which is compared with the true category of the i-th candidate box; t_i represents the predicted output location information corresponding to the i-th candidate box, which is compared with the true location information of the target box corresponding to the i-th candidate box; and δ_i indicates whether the i-th candidate box is a positive sample.

5. A target detection method, characterized in that the method comprises: acquiring the target image to be detected; and inputting the target image to be detected into a trained target detection model to obtain the position information and final category of each object in the target image; wherein the trained target detection model is obtained by training based on the method for determining training samples for target detection as described in any one of claims 1 to 4.

6. An apparatus for determining training samples for target detection, characterized in that the apparatus comprises: a first acquisition module, used to acquire the length, width, and position information of a target box and candidate boxes in a training image, wherein the target box is the smallest bounding box of the object to be detected in the training image, the candidate boxes are candidate boxes with different aspect ratios randomly drawn in the training image, the training image contains at least one target box, and different target boxes have different aspect ratios; a first determining module, used to determine the intersection-union ratio (IOU) of the target box and each candidate box based on the length, width, and position information of the target box and each candidate box; and a second determining module, used to determine positive and negative samples of the target box from the candidate boxes based on the IOU corresponding to each candidate box, wherein the IOU of the candidate box corresponding to a positive sample and the target box is greater than a first preset threshold, the first preset threshold is determined based on the aspect ratio of the target box, and when the aspect ratio of the target box is greater than a second preset threshold, the first preset threshold is inversely proportional to the aspect ratio of the target box; wherein determining positive and negative samples of the target box from the candidate boxes based on the IOU corresponding to each candidate box includes: if the aspect ratio of the target box is greater than the second preset threshold, calculating the ratio of the aspect ratio of the target box to the second preset threshold, and calculating the ratio of a third preset threshold to that ratio to obtain the first preset threshold corresponding to the target box; or, if the aspect ratio of the target box is less than or equal to the second preset threshold, determining the second preset threshold as the first preset threshold corresponding to the target box.

7. A target detection device, characterized in that the device comprises: a second acquisition module, used to acquire the target image to be detected; and a processing module, used to input the target image to be detected into a trained target detection model to obtain the position information and final category of each object in the target image; wherein the trained target detection model is obtained by training based on the method for determining training samples for target detection as described in any one of claims 1 to 4.

8. An electronic device comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, characterized in that when the processor executes the computer program, it implements the steps of the method for determining training samples for target detection as described in any one of claims 1 to 4, or the target detection method as described in claim 5.

9. A computer storage medium having a computer program stored thereon, characterized in that when the program is executed by a processor, it implements the steps of the method for determining training samples for target detection as described in any one of claims 1 to 4, or the target detection method as described in claim 5.