Training method and device of target detection model, target detection method and device

CN116368537B · Active · Publication Date: 2026-06-26 · BOE TECHNOLOGY GROUP CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
BOE TECHNOLOGY GROUP CO LTD
Filing Date
2021-10-28
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

In existing technologies, the detection of self-similar targets (such as smoke and fire, dust, clouds, coastlines, etc.) suffers from the problem of "difficult labeling and inaccurate detection", resulting in low detection accuracy.

Method used

By determining the relationship between the intersection region and the prediction region during the training of the object detection model, and setting the loss function value of the prediction region to a low value when the intersection region and the prediction region satisfy the target relationship, the update of the loss function value is suppressed, thus avoiding incorrect update of the model parameters.

Benefits of technology

It improved the model's accuracy in detecting self-similar targets, reduced ambiguity in data annotation, and enhanced the model's recognition accuracy.

✦ Generated by Eureka AI based on patent content.


Abstract

Embodiments of the present disclosure provide a target detection model training method and device, and a target detection method and device. The method comprises: determining a first region on a sample image, the first region being a target region predicted by a target detection model on the sample image; determining a relationship between an intersection region and the first region, the intersection region being an intersection of the first region and a second region, the second region being a labeled region labeled for a target on the sample image in a data labeling stage, the second region enclosing the whole of the target on the sample image; when the relationship between the intersection region and the first region satisfies a target relationship, setting a loss function value of the first region as a preset low loss function value, the low loss function value being a constant; and training the target detection model using the loss function value.

Description

Technical Field

[0001] This disclosure relates to the field of image detection, and in particular to a training method and apparatus for an object detection model, and an object detection method and apparatus.

Background Technology

[0002] Object detection refers to identifying objects of interest in images or videos, such as people, animals, flames, and smoke. Typically, object detection models are used to detect and identify these objects.

[0003] If the whole of a target and its parts are similar (including outline, texture, etc.), and the target itself can be infinitely divided, then the target is considered to have self-similarity. For example, flames have self-similarity.

[0004] When detecting self-similar targets, if the target detection model identifies the target's location in a region that differs from the pre-labeled region, the detection is deemed inaccurate.

Summary of the Invention

[0005] This disclosure provides a training method and apparatus for an object detection model, as well as an object detection method and apparatus, to improve the accuracy of the model in detecting self-similar objects. The technical solution is as follows:

[0006] In a first aspect, a method for training an object detection model is provided, the method comprising:

[0007] A first region is determined on the sample image, which is the target region predicted by the target detection model on the sample image;

[0008] Determine the relationship between the intersection region and the first region. The intersection region is the intersection of the first region and the second region. The second region is the annotation region on the sample image for the target during the data annotation stage. The second region surrounds the entire target on the sample image.

[0009] When the relationship between the intersection region and the first region satisfies the target relationship, the loss function value of the first region is set to a preset low loss function value, where the low loss function value is a constant.

[0010] The target detection model is trained using the loss function value.

[0011] Optionally, determining the relationship between the intersection region and the first region includes:

[0012] Calculate the area ratio of the intersection region to the first region;

[0013] The relationship between the intersection region and the first region satisfies the target relationship, including:

[0014] The area ratio is not less than the threshold.

[0015] Optionally, the threshold value ranges from 0.9 to 1.

[0016] Optionally, the low loss function value is less than 0.001.

[0017] Optionally, the sample image includes targets of at least one category, and each category of targets corresponds to at least one first region;

[0018] Training the target detection model using the loss function value includes:

[0019] The loss function values of multiple first regions corresponding to each category of target are weighted and summed.

[0020] The loss function value obtained by weighted summation is used for model training.

[0021] Optionally, when performing a weighted summation of the loss function values of multiple first regions corresponding to each category of target, the weight of the low loss function value is greater than the weight of the other loss function values.

[0022] Optionally, the first regions of different targets have different identifiers, and the second regions of different targets have different identifiers;

[0023] The first region and the second region of the same target have the same identifier.

[0024] Optionally, the method further includes: counting the training cycles of the target detection model;

[0025] When the count value of the training cycle of the target detection model reaches the target value, the steps of determining the relationship between the intersection region and the first region, and setting the loss function value of the first region to a preset low loss function value when the relationship between the intersection region and the first region satisfies the target relationship are executed.

[0026] Optionally, the method further includes:

[0027] When the relationship between the intersection region and the first region does not satisfy the target relationship, the loss function value of the first region is determined using the loss function value calculation formula.

[0028] In a second aspect, a target detection method is provided, the method comprising:

[0029] Target detection is performed using a target detection model, which is trained using the method described in any implementation of the first aspect.

[0030] In a third aspect, a training device for an object detection model is provided, the device comprising:

[0031] The first determining module is used to determine a first region on the sample image, wherein the first region is a target region predicted by the target detection model on the sample image.

[0032] The second determining module is used to determine the relationship between the intersection region and the first region. The intersection region is the intersection of the first region and the second region. The second region is the annotation region for the target annotation on the sample image during the data annotation stage. The second region surrounds the entire target on the sample image.

[0033] The processing module is configured to set the loss function value of the first region to a preset low loss function value when the relationship between the intersection region and the first region satisfies the target relationship, wherein the low loss function value is a constant;

[0034] The training module is used to train the target detection model using the loss function value.

[0035] Optionally, the second determining module is used to calculate the area ratio between the intersection region and the first region;

[0036] The relationship between the intersection region and the first region satisfies the target relationship, including:

[0037] The area ratio is not less than the threshold.

[0038] Optionally, the threshold value ranges from 0.9 to 1.

[0039] Optionally, the low loss function value is less than 0.001.

[0040] Optionally, the sample image includes targets of at least one category, and each category of targets corresponds to at least one first region;

[0041] The training module is used to perform a weighted summation of the loss function values of multiple first regions corresponding to each category of target, and to use the loss function value obtained by weighted summation for model training.

[0042] In a fourth aspect, a target detection device is provided, the device comprising:

[0043] The detection module is used to perform target detection using a target detection model, which is trained using the method described in any implementation of the first aspect.

[0044] In a fifth aspect, a computer device is provided, the computer device including a processor and a memory;

[0045] The memory is used to store computer programs;

[0046] The processor is configured to execute a computer program stored in the memory to implement the training method of the target detection model described in any implementation of the first aspect, or the target detection method described in the second aspect.

[0047] In a sixth aspect, a computer-readable storage medium is provided, in which computer instructions are stored; when executed by a processor, the stored computer instructions implement the training method of the target detection model described in any implementation of the first aspect, or the target detection method described in the second aspect.

[0048] The beneficial effects of the technical solutions provided in this disclosure are:

[0049] During model calibration, there may be cases where the calibrated region is only a part of the target, and this is accurate. In the training method provided in this embodiment, since the second region during annotation is the entire target area, when the intersection of the first and second regions satisfies the target relationship with the first region, it indicates that the first region calibrates the target. In this case, the loss function value of the first region is set to a preset low loss function value to suppress the loss function value. Thus, even if the model only calibrates a part of the target, it is considered to be correctly calibrated, avoiding incorrect updates to the model parameters and thereby improving the model's recognition accuracy.

Attached Figure Description

[0050] To more clearly illustrate the technical solutions in the embodiments of this disclosure, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of this disclosure. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0051] Figure 1 is a schematic diagram of furnace combustion provided in an embodiment of this disclosure;

[0052] Figure 2 is a schematic diagram of furnace combustion provided in an embodiment of this disclosure;

[0053] Figure 3 is a schematic diagram of furnace combustion provided in an embodiment of this disclosure;

[0054] Figure 4 is a flowchart of a training method for an object detection model provided in an embodiment of this disclosure;

[0055] Figure 5 is a flowchart of a training method for an object detection model provided in an embodiment of this disclosure;

[0056] Figure 6 is a schematic diagram of smoke provided in an embodiment of this disclosure;

[0057] Figure 7 is a combustion diagram provided in an embodiment of the present disclosure;

[0058] Figure 8 is a schematic diagram illustrating the effect of target detection using the method provided in the embodiments of this disclosure;

[0059] Figure 9 is a schematic diagram illustrating the effect of target detection using methods provided by related technologies;

[0060] Figure 10 is a block diagram of a training device for an object detection model provided in an embodiment of this disclosure;

[0061] Figure 11 is a schematic diagram of the structure of a computer device provided in an embodiment of this disclosure.

Detailed Implementation

[0062] To make the objectives, technical solutions, and advantages of this disclosure clearer, the embodiments of this disclosure will be described in further detail below with reference to the accompanying drawings.

[0063] Currently, in the field of deep learning-based object detection, there is no specific labeling standard or detection method for self-similar objects (such as smoke and fire, dust, clouds, coastlines, etc.). Instead, they are handled like general object detection, that is, through unlimited sample expansion and tedious manual labeling.

[0064] The detection of self-similar targets causes problems for data annotation and model recognition accuracy because of the self-similarity of the targets themselves. Taking smoke and fire detection as an example, flames are self-similar: the entire flame and a part of it look alike. Figure 1 is a schematic diagram of furnace combustion. As shown in Figure 1, two parts of the flame are labeled 1 and 2, and both parts resemble the whole. This creates ambiguity when labeling the data: some annotators label only the entire flame, such as label 3 in Figure 2, while others consider that the flames around the main body and the main body itself need to be labeled separately, such as labels 4 and 5 in Figure 3. As a result, different annotators interpret the flame region differently during the initial data annotation phase, which makes the annotations ambiguous, hinders model convergence, and leads to low detection accuracy for such objects. During model training, because of the self-similarity of flames, the region marked by the model at different scales may cover only a portion (a fractal) of the flame in the original image, so the marked region differs significantly from the overall target region. Related techniques calculate a high loss function value in this situation, even though the model's detection is actually correct: any detected part of the flame is still flame, and a detection need not match the overall extent of the flame.

[0065] In summary, when detecting self-similar targets, if the region where the target is located, as identified by the target detection model, differs from the pre-labeled region, the detection is deemed inaccurate. However, the region identified by the target detection model may actually belong to the target, and the target may have already been detected. In such cases, judging the detection as inaccurate and updating the model's parameters might actually lead to lower model recognition accuracy. In other words, the task of detecting self-similar targets presents a problem of "difficult to label, inaccurate detection."

[0066] Figure 4 is a flowchart illustrating a training method for an object detection model provided in an embodiment of this disclosure. Referring to Figure 4, the method includes:

[0067] 101: Determine a first region on the sample image, the first region being the target region predicted by the target detection model on the sample image.

[0068] Here, the object detection model refers to a neural network model used to identify objects in an image. The sample image refers to the image used during the training process of the object detection model. The first region refers to the region where the object detection model detects and marks the object in the image during the training phase; that is, the target region.

[0069] 102: Determine the relationship between the intersection region and the first region, wherein the intersection region is the intersection of the first region and the second region.

[0070] The second region is the annotation region on the sample image for the target during the data annotation stage, and the second region surrounds the entire target on the sample image.

[0071] In the field of object detection, the labeling and annotation of objects are usually done using rectangular boxes. Since the data annotation stage is done manually, the second region is also called the ground truth box, while the first region is predicted by the object detection model, so the first region is also called the predicted box.

[0072] 103: When the relationship between the intersection region and the first region satisfies the target relationship, the loss function value of the first region is set to a preset low loss function value, where the low loss function value is a constant.

[0073] Here, the preset low loss function value can be set before training begins. The low loss function value is relatively small, for example, less than 0.001.

[0074] In this embodiment of the disclosure, when the relationship between the intersection region and the first region satisfies the target relationship, most or even all of the first region lies within the second region. The loss function value of the first region is suppressed, and the loss function value is controlled to be a very small value, for example, less than a set value.

[0075] In this step, the loss function value of the first region, calculated using the loss function value formula, may be either a small or a large value. However, as long as the relationship between the intersection region and the first region satisfies the target relationship, regardless of whether the value calculated using the loss function value formula is large or small, a smaller value is uniformly used to represent it, thereby achieving the purpose of suppressing the loss function value.

[0076] 104: Train the target detection model using the loss function value.

[0077] That is, the suppressed loss function value is used to update the parameters of the object detection model, and then the next training cycle is performed.

[0078] During model calibration, there may be cases where the calibrated region is only a part of the target, and this is accurate. In the training method provided in this embodiment, since the second region during annotation is the entire target area, when the intersection of the first and second regions satisfies the target relationship with the first region, it indicates that the first region calibrates the target. In this case, the loss function value of the first region is set to a preset low loss function value to suppress the loss function value. Thus, even if the model only calibrates a part of the target, it is considered to be correctly calibrated, avoiding incorrect updates to the model parameters and thereby improving the model's recognition accuracy.

[0079] Figure 5 is a flowchart illustrating a training method for an object detection model provided in an embodiment of this disclosure. Referring to Figure 5, the method includes:

[0080] 200: Count the training cycles of the target detection model.

[0081] Once the count value of the training cycles reaches the target value, proceed to step 201 and subsequent steps. Otherwise, continue training in the normal manner.

[0082] Training of the object detection model is performed in cycles. In each cycle, object detection is performed on the input sample image. Then, the loss function value is determined based on the first region identified by the detection and the pre-labeled second region. The parameters of the object detection model are updated based on the loss function value, and then the next cycle begins.

[0083] For example, the total training period of the object detection model is A, and the target value is A / 2.

[0084] For example, the total training cycle of the object detection model is 300. In the first 150 cycles, the conventional training method is used, that is, the training method without loss function value suppression is used. When the count value of the training cycle reaches 150, step 201 and subsequent steps are executed to perform the training method with loss function value suppression.
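The gating in step 200 can be sketched as follows. This is a minimal illustration, assuming the target value is half of the total number of cycles as in the example above; the function name `suppression_enabled` is hypothetical and not part of the disclosure.

```python
def suppression_enabled(cycle_count: int, total_cycles: int) -> bool:
    """Return True once the cycle count reaches the target value.

    Assumption: the target value is total_cycles / 2, as in the example
    in the text (A / 2). Other target values are equally possible.
    """
    target_value = total_cycles // 2
    return cycle_count >= target_value


# Example from the text: 300 cycles in total, suppression starts at cycle 150.
schedule = [suppression_enabled(c, 300) for c in (1, 149, 150, 300)]
print(schedule)  # [False, False, True, True]
```

Before the target value is reached, every first region would use the ordinary loss function value calculation formula; afterwards, steps 201 onward apply.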

[0085] In this embodiment of the disclosure, loss function value suppression needs to be performed after training has been carried out for a period of time and the model's detection has a certain accuracy. This is to ensure that loss function value suppression can further improve the model's performance.

[0086] Taking smoke and fire detection as an example, a point in time is set at which training with loss function value suppression is enabled. That is, once the target detection model has been trained to a certain stage and has learned the shape of smoke and fire to a certain extent, training with loss function value suppression is enabled. The loss function value is actively suppressed for any first region that is completely surrounded by the second region. This increases the model's confidence in detecting fractals of smoke and fire targets, strengthening its perception of the targets' self-similarity and yielding better model performance.

[0087] 201: Determine a first region on the sample image, the first region being the target region predicted by the target detection model on the sample image.

[0088] In the field of object detection, the labeling and annotation of objects are usually done using rectangular boxes. Since the data annotation stage is done manually, the second region is also called the ground truth box, while the first region is predicted by the object detection model, so the first region is also called the predicted box.

[0089] 202: Calculate the area ratio between the intersecting region and the first region. The intersecting region is the intersection of the first region and the second region.

[0090] The second region is the annotation region on the sample image for the target during the data annotation stage, and the second region surrounds the entire target on the sample image.

[0091] The second region is a rectangular frame, which can be the bounding rectangle of the target, such as the maximum bounding rectangle.

[0092] In this embodiment of the disclosure, when annotating targets on sample images, the annotation is performed by marking the largest bounding box (e.g., the largest outer rectangle of the target) of the same source targets. "Same source" refers to the same target, and the interval between the parts does not exceed a distance threshold, such as 30% of the image width.

[0093] As shown in Figure 6, although both locations marked in the image are smoke, they are far apart and belong to different sources. Therefore, they need to be labeled with separate rectangles, that is, labels 6 and 7 in Figure 6. If the targets are from the same source, the entire object is labeled with a single rectangle, for example, label 3 in Figure 2.
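The same-source labeling rule can be illustrated with a small sketch. In the disclosure this annotation is done by a person; the code below only shows the geometry, assuming boxes are given as `(x1, y1, x2, y2)` corners and that the "interval" between parts is the gap between their rectangles. The helper names (`gap`, `enclose`, `label_same_source`) and the gap definition are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def gap(a: Box, b: Box) -> float:
    """Distance between two rectangles (0 if they touch or overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def enclose(a: Box, b: Box) -> Box:
    """Smallest rectangle that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def label_same_source(parts: List[Box], image_width: float, frac: float = 0.3) -> List[Box]:
    """Merge parts whose gap is within frac * image_width into one outer box."""
    boxes = list(parts)
    limit = frac * image_width
    merged = True
    while merged:  # repeat until no pair is close enough to merge
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if gap(boxes[i], boxes[j]) <= limit:
                    boxes[i] = enclose(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes


# Two nearby flame parts and one distant plume in a 1000-pixel-wide image.
parts = [(0, 0, 100, 100), (120, 0, 200, 100), (900, 0, 1000, 100)]
labels = label_same_source(parts, image_width=1000)
```

Here the first two parts are 20 pixels apart and are merged into one second region, while the third part is 700 pixels away and keeps its own rectangle.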

[0094] This embodiment of the disclosure marks the largest and outermost contours of the target, rather than marking the fractals of the target, so that the accuracy of the first region can be determined based on the intersection area between the first region and the second region.

[0095] For example, the area ratio in step 202 is calculated according to the following formula:

[0096] $$\text{ratio} = \frac{\text{intersection}(pred, gt)}{\text{area}(pred)} \tag{1}$$

[0097] Where pred is the first region, gt is the second region, intersection(pred,gt) is the area of the intersection region, and area(pred) is the area of the first region.

[0098] In this embodiment, the areas of the first and second regions are calculated by identifying the length and width of the rectangular boxes in the sample image. The intersection region is calculated by identifying the length and width of the intersection region in the sample image.
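The area ratio of step 202 can be sketched as follows. This is a minimal illustration, assuming axis-aligned rectangles given as `(x1, y1, x2, y2)` corner coordinates; the box format and the function names are assumptions, not part of the disclosure.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def area(box: Box) -> float:
    """Area of a rectangle from its width and height."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection(pred: Box, gt: Box) -> float:
    """Area of the overlap between the predicted box and the labeled box."""
    w = min(pred[2], gt[2]) - max(pred[0], gt[0])
    h = min(pred[3], gt[3]) - max(pred[1], gt[1])
    return max(0.0, w) * max(0.0, h)


def area_ratio(pred: Box, gt: Box) -> float:
    """intersection(pred, gt) / area(pred); 0 for a degenerate prediction."""
    a = area(pred)
    return intersection(pred, gt) / a if a > 0 else 0.0


gt = (0.0, 0.0, 100.0, 100.0)          # second region: whole target
inside = (10.0, 10.0, 30.0, 30.0)      # first region fully inside gt
straddling = (90.0, 90.0, 110.0, 110.0)  # first region mostly outside gt

ratio_inside = area_ratio(inside, gt)          # 1.0
ratio_straddling = area_ratio(straddling, gt)  # 0.25
```

Note that the ratio is normalized by the area of the first region only, so a small prediction that lies entirely inside the labeled box yields 1 regardless of how much of the target it covers.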

[0099] When the area ratio between the intersection region and the first region is not less than the threshold, it indicates that the first region is accurately calibrated, and step 203 is executed. When the area ratio between the intersection region and the first region is less than the threshold, it indicates that the first region is inaccurately calibrated, and step 204 is executed.

[0100] Optionally, the threshold value ranges from 0.9 to 1. Setting the threshold high enough ensures that loss function value suppression is performed only when the calibrated region really belongs to the target, and that the model parameters can still be updated through feedback from the loss function value when the calibration is inaccurate, thereby improving model accuracy.

[0101] For example, the threshold is 0.998.

[0102] 203: Set the loss function value of the first region to a preset low loss function value.

[0103] An area ratio not less than the threshold indicates that the first region is well calibrated. Suppressing the loss function value in this case can improve the model's recognition accuracy.

[0104] In this step, the value of the suppressed loss function is directly set to a numerical value that is small enough to avoid drastically changing the model parameters when the model is correctly calibrated.

[0105] In some possible implementations, the low loss function value is a constant, and the low loss function value is less than 0.001.

[0106] For example, the low loss function value is 0.0001.

[0107] 204: The loss function value of the first region is determined using the loss function value calculation formula.

[0108] For the first region that does not satisfy the target relationship, the normal loss function value calculation formula is used to ensure that the model training proceeds normally.

[0109] For example, the formula for calculating the loss function value is as follows:

[0110] $$L_{CIOU} = 1 - IOU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v \tag{2}$$

[0111] Here, $L_{CIOU}$ is the loss function value; IOU is the intersection-over-union ratio of the first and second regions; $\alpha v$ is an influence factor, where $\alpha$ is a trade-off parameter and $v$ measures the consistency of the aspect ratios of the rectangles; $B$ is the prediction box (the first region) and $B^{gt}$ is the ground truth box (the second region); $b$ and $b^{gt}$ are the center points of $B$ and $B^{gt}$, respectively; $\rho$ is the Euclidean distance; and $c$ is the diagonal length of the minimum bounding rectangle enclosing $B$ and $B^{gt}$.

[0112] The IOU is calculated using the following formula:

[0113] $$IOU = \frac{\text{intersection}(pred, gt)}{\text{union}(pred, gt)} \tag{3}$$

[0114] Where union(pred,gt) is the area of the union of the first and second regions.
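The IOU and the CIoU-style loss described above can be sketched in plain Python. The text does not spell out the expressions for v and α; the ones below follow the commonly used CIoU definition and should be read as assumptions, as should the `(x1, y1, x2, y2)` box format and the small `eps` guard.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(pred: Box, gt: Box) -> float:
    w = min(pred[2], gt[2]) - max(pred[0], gt[0])
    h = min(pred[3], gt[3]) - max(pred[1], gt[1])
    return max(0.0, w) * max(0.0, h)


def iou(pred: Box, gt: Box) -> float:
    """Intersection area divided by union area."""
    inter = intersection(pred, gt)
    union = area(pred) + area(gt) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """1 - IOU + rho^2 / c^2 + alpha * v (common CIoU form, assumed here)."""
    i = iou(pred, gt)
    # Squared distance between the two box centers.
    bx, by = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (bx - gx) ** 2 + (by - gy) ** 2
    # Squared diagonal of the smallest rectangle enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / (hg + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - i + v + eps)
    return 1 - i + rho2 / c2 + alpha * v


gt = (0.0, 0.0, 100.0, 100.0)
part = (10.0, 10.0, 30.0, 30.0)  # a small fractal fully inside the labeled box

loss_identical = ciou_loss(gt, gt)
loss_part = ciou_loss(part, gt)
```

With these inputs the fully contained part has an IOU of only 0.04 and a loss of about 1.05, which illustrates why the ordinary formula penalizes a detection that the disclosure regards as correct.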

[0115] In fact, steps 203 and 204 can be expressed by the following formula:

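[0116] $$L = \begin{cases} \varepsilon, & \dfrac{\text{intersection}(pred, gt)}{\text{area}(pred)} \ge T \\ L_{CIOU}, & \text{otherwise} \end{cases} \tag{4}$$ where $T$ is the threshold and $\varepsilon$ is the preset low loss function value.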
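Steps 203 and 204 together amount to a simple branch, sketched below. The default threshold 0.998 and low loss function value 0.0001 are the example values given in the text; the function name `region_loss` and its signature are hypothetical.

```python
LOW_LOSS = 0.0001   # example low loss function value from the text
THRESHOLD = 0.998   # example threshold from the text


def region_loss(ratio: float, computed_loss: float,
                threshold: float = THRESHOLD,
                low_loss: float = LOW_LOSS) -> float:
    """Return the loss function value of one first region.

    ratio:         intersection(pred, gt) / area(pred) from step 202
    computed_loss: value from the ordinary loss formula (e.g. CIoU)
    """
    if ratio >= threshold:
        # Step 203: target relationship satisfied, suppress to a constant.
        return low_loss
    # Step 204: keep the value from the ordinary formula.
    return computed_loss


suppressed = region_loss(ratio=1.0, computed_loss=1.05)
unchanged = region_loss(ratio=0.25, computed_loss=1.30)
```

Because the substituted value is a constant, a suppressed region contributes no gradient with respect to its box coordinates when the loss is differentiated, which is consistent with the stated aim of not altering the model parameters for a correct detection.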

[0117] 205: Perform a weighted summation of the loss function values for multiple first regions corresponding to each category of target.

[0118] The sample images include targets of at least one category, and each category of targets corresponds to at least one first region.

[0119] In this step, the loss function value of the entire sample image is obtained by weighting. This allows the weight of the first region to be set according to whether it is suppressed, which is more conducive to suppressing the loss function value of the first region and improving the model's recognition accuracy.

[0120] For example, step 205 can be calculated using the following formula:

[0121] $$L_{box} = \frac{\sum \sum M \odot L_{CIOU}}{\sum \sum M} \tag{5}$$

[0122] Where M is a weight matrix of the same size as $L_{CIOU}$, and $L_{box}$ is the loss function value for each sample image. In formula (5), both the numerator and the denominator contain two summation symbols. The first summation symbol represents the summation of the loss function values of each first region of a target. Figure 7 is a schematic diagram of flame calibration. Referring to Figure 7, the model identifies two first regions for the flame, namely the two small rectangles in the image. During calculation, the loss function values corresponding to the two small rectangles are determined and summed to obtain the loss function value of the flame target. The second summation symbol represents the summation of the loss function values of each target in the sample image.

[0123] Optionally, when performing a weighted summation of the loss function values for multiple first regions corresponding to each category of target, the weight of the low loss function value (determined in step 203) is greater than the weight of other loss function values (determined in step 204). By increasing the weight of the suppressed loss function values, significant changes to the model parameters are further avoided when the model is correctly calibrated.

[0124] For example, when performing weighted summation, there are a first loss function value and a second loss function value. The first loss function value is the lower loss function value, and the second loss function value is calculated using the loss function value calculation formula. When performing weighted summation, the weight of the first loss function value is greater than that of the second loss function value. For example, the weight of the first loss function value is twice that of the second loss function value.
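The weighted summation of step 205 can be sketched as a weighted sum divided by the sum of the weights. The normalized form is an assumption based on the description of a numerator and a denominator that each contain two summations; the nested-list layout (targets, then first regions) and the name `weighted_box_loss` are hypothetical.

```python
from typing import List


def weighted_box_loss(losses: List[List[float]], weights: List[List[float]]) -> float:
    """Weighted loss of one sample image.

    losses[t][r]  : loss function value of first region r of target t
    weights[t][r] : weight of that value (same shape as losses)
    Assumed form  : sum(M * L) / sum(M), summed over regions and targets.
    """
    numerator = sum(
        w * l
        for loss_row, weight_row in zip(losses, weights)
        for l, w in zip(loss_row, weight_row)
    )
    denominator = sum(w for weight_row in weights for w in weight_row)
    return numerator / denominator if denominator > 0 else 0.0


# One flame target with two first regions, as in Figure 7:
# the first was suppressed in step 203, the second was computed in step 204.
losses = [[0.0001, 0.6]]
weights = [[2.0, 1.0]]  # suppressed value weighted twice as heavily, per the text
image_loss = weighted_box_loss(losses, weights)
```

Giving the suppressed value the larger weight pulls the image-level loss further toward the low constant, in line with the rationale given for step 205.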

[0125] Optionally, the first regions of different targets have different labels, and the second regions of different targets have different labels; the first regions and second regions of the same target have the same label. For a sample image, multiple targets can be detected simultaneously. The first regions (or second regions) of different targets are distinguished by different labels, and the first regions and second regions of the same target are associated through labels, thereby ensuring simultaneous training for multiple target detections.

[0126] 206: The loss function value obtained by weighted summation is used for model training.

[0127] The following comparison uses a target detection model implemented with a YOLOv5-s network as an example to compare the effectiveness of the model training method provided in this disclosure with that in related technologies. In this comparative experiment, the flame is further subdivided into conflagration and candlelight.

[0128] Figure 8 is a schematic diagram illustrating the effect of target detection using the method provided in the embodiments of this disclosure. Referring to Figure 8, numbers 11 through 15 denote the precision-recall (PR) curves for candlelight, fire, all classes, smoke, and nighttime light disturbance, respectively. In a PR curve, the horizontal axis R is recall and the vertical axis P is precision. Recall is the proportion of labeled targets that are identified in the image, while precision is the proportion of identified targets that are correct. Figure 9 is a schematic diagram illustrating the effect of target detection using methods provided by related technologies.
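For reference, the two quantities plotted on a PR curve can be computed from detection counts as in the sketch below. The counts used here are made-up illustrative numbers, not results from the disclosure, and the function name is hypothetical.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute precision and recall from detection counts.

    tp: detections that match a labeled target (true positives)
    fp: detections that match no labeled target (false positives)
    fn: labeled targets that were not detected (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Illustrative counts only: 50 of 100 labeled targets found, 20 false alarms.
p, r = precision_recall(tp=50, fp=20, fn=50)
```

Each point on a PR curve corresponds to one such pair, obtained at a particular detection confidence cutoff.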

[0129] In Figure 8, at a recall of 0.5, the precision for fire, smoke, and candlelight is 0.747, 0.682, and 0.482, respectively, and the overall precision across all categories is 0.704. In Figure 9, at a recall of 0.5, the precision for fire, smoke, and candlelight is 0.729, 0.625, and 0.437, respectively, and the overall precision across all categories is 0.670.

[0130] Referring to Figure 8 and Figure 9, compared with models trained by methods provided in related technologies, the model trained by the method provided in this disclosure improves the detection accuracy of large fires, smoke, and candle flames by 2.47%, 9.12%, and 2.02%, respectively. Fluctuations in the precision-recall curves are also significantly reduced, which verifies the effectiveness of the training method provided in this disclosure.

[0131] Although the embodiments provided in this disclosure use the detection of smoke and fire as an example, the training method is also applicable to the detection of other targets with self-similar statistical properties, such as dust, clouds, steam, and coastlines. Furthermore, although this disclosure uses the YOLOv5-s network for experiments, the training method is also applicable to training models based on other networks.

[0132] Current methods for detecting self-similar targets often rely on simply accumulating data and adding samples, or on overfitting to a particular engineering scenario. This disclosure instead labels only the largest, outermost contour of the target. This reduces the workload of data labeling and preparation while improving detection results, resolving the "difficult to label, inaccurate detection" problem in self-similar target detection. Furthermore, by forcibly suppressing, during training, the loss generated when the model detects a fractal part of the target, the model's accuracy in detecting such self-similar targets is improved.

[0133] This disclosure also provides a target detection method, the method comprising: performing target detection using a target detection model, wherein the target detection model is trained using the method shown in Figure 4 or Figure 5.

[0134] Figure 10 is a block diagram of a training device 300 for a target detection model provided in an embodiment of this disclosure. As shown in Figure 10, the training device 300 for the target detection model includes: a first determining module 301, a second determining module 302, a processing module 303, and a training module 304.

[0135] The first determining module 301 is used to determine a first region on the sample image, wherein the first region is a target region predicted by the target detection model on the sample image.

[0136] The second determining module 302 is used to determine the relationship between the intersection region and the first region. The intersection region is the intersection of the first region and the second region. The second region is the region labeled for the target on the sample image in the data labeling stage, and it encloses the whole of the target on the sample image.

[0137] The processing module 303 is used to set the loss function value of the first region to a preset low loss function value when the relationship between the intersection region and the first region satisfies the target relationship, wherein the low loss function value is a constant.

[0138] The training module 304 is used to train the target detection model using the loss function value.

[0139] Optionally, the second determining module 302 is used to calculate the area ratio of the intersection region to the first region.

[0140] The relationship between the intersection region and the first region satisfying the target relationship includes:

[0141] the area ratio being not less than a threshold.

[0142] Optionally, the threshold value ranges from 0.9 to 1.

[0143] Optionally, the low loss function value is less than 0.001.
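
The area-ratio test and the constant low loss described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiments: regions are assumed to be axis-aligned rectangles given as `(x_min, y_min, x_max, y_max)`, `THRESHOLD = 0.9` and `LOW_LOSS = 0.0005` are example values chosen inside the stated ranges, and `regular_loss` stands for whatever value the model's usual loss formula would produce.

```python
Box = tuple  # assumed layout: (x_min, y_min, x_max, y_max)

THRESHOLD = 0.9    # assumed example; the text gives a range of 0.9 to 1
LOW_LOSS = 0.0005  # assumed example; the text only requires a constant below 0.001


def box_area(box: Box) -> float:
    """Area of a rectangle; degenerate or inverted boxes count as zero."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def intersection_ratio(first_region: Box, second_region: Box) -> float:
    """Area of (first ∩ second) divided by the area of the first (predicted) region."""
    intersection = (
        max(first_region[0], second_region[0]),
        max(first_region[1], second_region[1]),
        min(first_region[2], second_region[2]),
        min(first_region[3], second_region[3]),
    )
    first_area = box_area(first_region)
    return box_area(intersection) / first_area if first_area > 0 else 0.0


def region_loss(first_region: Box, second_region: Box, regular_loss: float) -> float:
    """Return the constant low loss when the predicted region lies (almost) inside the label."""
    if intersection_ratio(first_region, second_region) >= THRESHOLD:
        return LOW_LOSS
    return regular_loss


labeled = (0.0, 0.0, 10.0, 10.0)
inside = (2.0, 2.0, 6.0, 6.0)      # fully inside the labeled region
partial = (8.0, 8.0, 12.0, 12.0)   # only a quarter overlaps the labeled region
print(region_loss(inside, labeled, regular_loss=0.7))
print(region_loss(partial, labeled, regular_loss=0.7))
```

Note that the ratio is taken against the predicted region rather than the union, so a small prediction that falls entirely within the labeled contour yields a ratio of 1 even though its conventional IoU with the label would be low.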

[0144] Optionally, the sample image includes targets of at least one category, and each category of targets corresponds to at least one first region;

[0145] The training module 304 is used to perform a weighted summation of the loss function values of the multiple first regions corresponding to each category of target, and to use the loss function value obtained by the weighted summation for model training.

[0146] Optionally, when performing the weighted summation of the loss function values of the multiple first regions corresponding to each category of target, the weight of the low loss function value is greater than the weight of the other loss function values.
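
A weighted summation of this kind can be sketched as follows. The weights `low_weight=2.0` and `other_weight=1.0` are hypothetical values chosen only to satisfy the stated ordering (the text does not give concrete weights), and each region's loss is assumed to be carried together with a flag recording whether it was set to the low loss function value.

```python
def weighted_category_loss(region_losses, low_weight=2.0, other_weight=1.0):
    """Weighted sum of per-region losses for one category.

    region_losses: iterable of (loss_value, is_low_loss) pairs, where
    is_low_loss is True for regions whose loss was set to the preset constant.
    """
    total = 0.0
    for loss_value, is_low_loss in region_losses:
        weight = low_weight if is_low_loss else other_weight
        total += weight * loss_value
    return total


# Hypothetical category with one suppressed region and two regular regions.
smoke_losses = [(0.0005, True), (0.4, False), (0.25, False)]
print(weighted_category_loss(smoke_losses))
```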

[0147] Optionally, the first regions of different targets have different identifiers, and the second regions of different targets have different identifiers;

[0148] The first region and the second region of the same target have the same identifier.
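
One way such identifiers could be used is to pair each first region with the second region of the same target before the two are compared. The sketch below is only an assumed illustration: the data layout (a list of `(identifier, box)` pairs for predictions and a dictionary from identifier to labeled box) and the function name `pair_by_identifier` are hypothetical, and how identifiers are assigned to predicted regions is not covered here.

```python
def pair_by_identifier(first_regions, second_regions):
    """Pair every predicted region with the labeled region carrying the same identifier.

    first_regions: list of (identifier, box) pairs predicted by the model.
    second_regions: dict mapping identifier -> labeled box.
    Predicted regions whose identifier has no labeled region are skipped.
    """
    pairs = []
    for identifier, first_box in first_regions:
        if identifier in second_regions:
            pairs.append((first_box, second_regions[identifier]))
    return pairs


# Hypothetical example: two predictions for target 1, one for target 2.
predicted = [(1, (2, 2, 6, 6)), (1, (4, 4, 9, 9)), (2, (20, 20, 30, 30))]
labeled = {1: (0, 0, 10, 10), 2: (18, 18, 32, 32)}
pairs = pair_by_identifier(predicted, labeled)
print(pairs)
```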

[0149] Optionally, the processing module 303 is further configured to determine the loss function value of the first region using a loss function value calculation formula when the relationship between the intersection region and the first region does not satisfy the target relationship.

[0150] Optionally, the device further includes:

[0151] The counting module 305 is used to count the training cycles of the target detection model;

[0152] The second determining module 302 and the processing module 303 are used to determine the relationship between the intersection region and the first region when the count value of the training cycle of the target detection model reaches the target value, and to set the loss function value of the first region to a preset low loss function value when the relationship between the intersection region and the first region satisfies the target relationship.
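
The cycle-count gating can be sketched as follows. This is an assumed reading in which suppression stays active for every cycle from the target value onward; the names `loss_for_cycle`, `target_count`, `threshold`, and `low_loss`, as well as the default values, are hypothetical and used only for illustration.

```python
def loss_for_cycle(cycle_count, target_count, area_ratio, regular_loss,
                   threshold=0.9, low_loss=0.0005):
    """Apply the low-loss substitution only once enough training cycles have run.

    Before the counter reaches target_count the regular loss is always used;
    afterwards the area-ratio test decides whether the constant low loss applies.
    """
    suppression_active = cycle_count >= target_count  # assumed: stays on after the target
    if suppression_active and area_ratio >= threshold:
        return low_loss
    return regular_loss


# Hypothetical schedule: suppression starts at cycle 10.
print(loss_for_cycle(cycle_count=5, target_count=10, area_ratio=0.95, regular_loss=0.3))
print(loss_for_cycle(cycle_count=10, target_count=10, area_ratio=0.95, regular_loss=0.3))
```

Delaying the substitution in this way lets the model first learn from the regular loss before predictions inside the labeled contour are exempted from it.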

[0153] It should be noted that the training device for the target detection model provided in the above embodiments is only illustrated by the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the training device for the target detection model provided in the above embodiments and the training method embodiments for the target detection model belong to the same concept, and the implementation process is detailed in the method embodiments, which will not be repeated here.

[0154] This disclosure also provides a target detection device, which includes a detection module. The detection module is used to perform target detection using a target detection model, wherein the target detection model is trained using the method shown in Figure 4 or Figure 5.

[0155] As shown in Figure 11, this disclosure also provides a computer device 400, which may be a training device for a target detection model or a target detection device. The computer device 400 can be used to execute the training method for the target detection model or the target detection method provided in the above embodiments. Referring to Figure 11, the computer device 400 includes a memory 401, a processor 402, and a display component 403. Those skilled in the art will understand that the structure of the computer device 400 shown in Figure 11 does not constitute a limitation on the computer device 400; in practical applications, it may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components. Wherein:

[0156] The memory 401 can be used to store computer programs and modules. The memory 401 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, application programs required for at least one function, and the like. The memory 401 may include high-speed random access memory and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 401 may also include a memory controller to provide the processor 402 with access to the memory 401.

[0157] The processor 402 executes various functional applications and data processing by running software programs and modules stored in the memory 401.

[0158] Display component 403 is used to display images. Display component 403 may include a display panel, which may optionally be configured as an LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other similar form.

[0159] In an exemplary embodiment, a computer-readable storage medium is also provided. This computer-readable storage medium is a non-volatile storage medium that stores a computer program. When the computer program in the computer-readable storage medium is executed by a processor, it can execute the training method of the target detection model or the target detection method provided in the embodiments of this disclosure.

[0160] In an exemplary embodiment, a computer program product is also provided, which stores instructions that, when run on a computer, enable the computer to execute the training method of the target detection model or the target detection method provided in the embodiments of this disclosure.

[0161] In an exemplary embodiment, a chip is also provided, which includes programmable logic circuits and/or program instructions. When the chip is running, it is able to execute the training method of the target detection model or the target detection method provided in the embodiments of this disclosure.

[0162] Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware or by a program instructing related hardware. The program can be stored in a computer-readable storage medium, such as a read-only memory, a disk, or an optical disk.

[0163] The above description is merely an optional embodiment of this disclosure and is not intended to limit this disclosure. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the protection scope of this disclosure.

Claims

1. A training method for a target detection model, characterized in that the method comprises: determining a first region on a sample image, the first region being a target region predicted by the target detection model on the sample image; calculating an area ratio of an intersection region to the first region, wherein the intersection region is the intersection of the first region and a second region, the second region is a labeled region labeled for a target on the sample image in a data labeling stage, the second region encloses the whole of the target on the sample image, the target is a target with self-similar statistical characteristics, and the target includes one or more of the following: smoke, dust, clouds, flames, smoke, steam, and nighttime lights; when the area ratio of the intersection region to the first region is not less than a threshold, setting the loss function value of the first region to a preset low loss function value, wherein the low loss function value is a constant and is less than 0.001; and updating the parameters of the target detection model using the loss function value as feedback, and then performing the next training cycle.

2. The method according to claim 1, characterized in that the threshold value ranges from 0.9 to 1.

3. The method according to any one of claims 1 to 2, characterized in that the sample image includes targets of at least one category, and each category of targets corresponds to at least one first region; and training the target detection model using the loss function value comprises: performing a weighted summation of the loss function values of the multiple first regions corresponding to each category of target; and using the loss function value obtained by the weighted summation for model training.

4. The method according to claim 3, characterized in that, when performing the weighted summation of the loss function values of the multiple first regions corresponding to each category of target, the weight of the low loss function value is greater than the weight of the other loss function values.

5. The method according to any one of claims 1 to 2, characterized in that the method further comprises: counting the training cycles of the target detection model; and when the count value of the training cycles of the target detection model reaches a target value, performing the steps of calculating the area ratio of the intersection region to the first region and, when the area ratio of the intersection region to the first region is not less than the threshold, setting the loss function value of the first region to the preset low loss function value.

6. The method according to any one of claims 1 to 2, characterized in that the method further comprises: when the relationship between the intersection region and the first region does not satisfy the target relationship, determining the loss function value of the first region using a loss function value calculation formula.

7. A target detection method, characterized in that the method comprises: performing target detection using a target detection model, wherein the target detection model is trained using the method according to any one of claims 1 to 6.

8. A training device for a target detection model, characterized in that the device comprises: a first determining module, used to determine a first region on a sample image, wherein the first region is a target region predicted by the target detection model on the sample image; a second determining module, used to calculate an area ratio of an intersection region to the first region, wherein the intersection region is the intersection of the first region and a second region, the second region is a labeled region labeled for a target on the sample image in a data labeling stage, the second region encloses the whole of the target on the sample image, the target is a target with self-similar statistical characteristics, and the target includes one or more of the following: smoke, dust, clouds, flames, smoke, steam, and nighttime lights; a processing module, configured to set the loss function value of the first region to a preset low loss function value when the area ratio of the intersection region to the first region is not less than a threshold, wherein the low loss function value is a constant and is less than 0.001; and a training module, used to update the parameters of the target detection model using the loss function value as feedback, and then perform the next training cycle.

9. A target detection device, characterized in that the device comprises: a detection module, used to perform target detection using a target detection model, wherein the target detection model is trained using the method according to any one of claims 1 to 6.

10. A computer device, characterized in that the computer device comprises a processor and a memory; the memory is used to store a computer program; and the processor is configured to execute the computer program stored in the memory to implement the training method of the target detection model according to any one of claims 1 to 6, or the target detection method according to claim 7.

11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the training method of the target detection model according to any one of claims 1 to 6, or the target detection method according to claim 7.