Model evaluation method and device, electronic equipment and storage medium
By performing video file detection and result calculation on the target detection model, the problem of the inability to evaluate the model's video analysis and tracking capabilities in existing technologies is solved, and the accurate evaluation and improvement of the model's video detection capabilities are achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN INTELLIFUSION TECHNOLOGIES CO LTD
- Filing Date
- 2022-12-30
- Publication Date
- 2026-06-23
AI Technical Summary
Existing object detection model testing methods mainly rely on image evaluation, which cannot effectively assess the model's video analysis and tracking capabilities, making it difficult to evaluate accuracy.
The method acquires a test set of multiple video files, inputs them into the model to be evaluated for detection, determines the target detection results in the video files, and calculates the model's evaluation result from the positive and negative results, using trajectory matching and overlap calculation, to assess the model's video analysis and tracking capabilities.
The method enables the accuracy of the target detection model's video detection capability to be assessed, supports a comprehensive evaluation of the model's video analysis and tracking capabilities, and helps improve the model's video detection accuracy.
Figure CN116206178B_ABST
Description
Technical Field
[0001] This invention relates to the field of computer technology, and more specifically to a model evaluation method, apparatus, electronic device, and storage medium.
Background Technology
[0002] Object detection models are mainly used to detect the specific location and size of objects in images or videos, thereby achieving object localization and detection. With the rapid development of smart cities, object detection models are increasingly being used in various scenarios of smart urban governance. Before object detection models are used to detect objects in images or videos, they need to be tested. However, current object detection model testing methods usually use images for evaluation, and the analysis results obtained are relatively fixed. This cannot cover the video analysis and tracking capabilities of object detection models, making it difficult to evaluate the accuracy of object detection models in video detection.
Summary of the Invention
[0003] Firstly, the main objective of this invention is to provide a model evaluation method, including:
[0004] Obtain the test set; the test set includes multiple video files;
[0005] Each video file in the test set is input into the model to be evaluated for detection, and the corresponding target detection results in each video file are determined; the target detection results include positive results and negative results;
[0006] The evaluation result of the model to be evaluated is obtained by calculating the positive and negative results corresponding to each video file in the test set.
[0007] Preferably, the step of inputting each video file in the test set into the model to be evaluated for detection, and determining the target detection result corresponding to each video file, includes:
[0008] For each video file in the test set, if a target object is detected in the video file, determine the appearance time and detection box of the target object;
[0009] If the occurrence time falls within the time interval of any pre-labeled object, trajectory matching calculation is performed on the target object based on the detection box to obtain the calculation result; the pre-labeled object is the target object appearing in the video file;
[0010] Based on the calculation results, the target detection result is determined to be either a positive or negative result.
[0011] Preferably, for each video file in the test set, if a target object is detected in the video file, after determining the appearance time and detection bounding box of the target object, the process includes:
[0012] If the occurrence time does not fall within the time interval of any pre-labeled object, the detection result of the target object is determined to be a negative result.
[0013] Preferably, the step of performing trajectory matching calculation on the target object based on the detection box to obtain the calculation result includes:
[0014] The center point is calculated based on the detection box, and the distance between the coordinates of the center point and each pixel coordinate point in the pre-stored pixel coordinate point set is calculated to obtain the target trajectory point with the shortest distance; the pixel coordinate points are the pixel coordinates corresponding to the multiple trajectory points marked by the pre-annotated object in the video file;
[0015] The target bounding box corresponding to the target trajectory point is determined, and the overlap ratio is calculated based on the target bounding box and the detection box to obtain the overlap ratio value.
[0016] Preferably, determining whether the target detection result is positive or negative based on the calculation result includes:
[0017] The overlap value is compared with a preset threshold. If the overlap value is greater than the preset threshold, the target detection result is determined to be a positive result.
[0018] If the overlap value is less than or equal to the preset threshold, the target detection result is determined to be a negative result.
[0019] Preferably, the step of calculating the evaluation result of the model to be evaluated based on the positive and negative results corresponding to each video file in the test set includes:
[0020] Based on the positive and negative results corresponding to each video file in the test set, determine the number of positive results and the number of negative results;
[0021] The evaluation result of the model to be evaluated is obtained by calculating the number of positive results and the number of negative results.
[0022] Preferably, the calculation of the evaluation result of the model to be evaluated based on the number of positive results and the number of negative results includes:
[0023] The accuracy, precision, and recall of the model to be evaluated are determined by calculating the number of positive results and the number of negative results.
[0024] The evaluation score of the model to be evaluated is calculated based on its accuracy, precision, and recall.
[0025] Secondly, embodiments of the present invention provide a model evaluation device, comprising:
[0026] The acquisition module is used to acquire a test set; the test set includes multiple video files.
[0027] The detection module is used to input each video file in the test set into the model to be evaluated for detection, and determine the target detection result for each video file; the target detection result includes positive results and negative results;
[0028] The calculation module is used to calculate the evaluation result of the model to be evaluated based on the positive and negative results corresponding to each video file in the test set.
[0029] Thirdly, embodiments of the present invention provide an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the model evaluation method described above.
[0030] Fourthly, embodiments of the present invention provide a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the model evaluation method described above.
[0031] The above-described solution of the present invention has at least the following beneficial effects:
[0032] The model evaluation method provided by this invention first obtains a test set, which includes multiple video files. Then, each video file in the test set is input into the model to be evaluated for detection, determining the corresponding target detection results in each video file. The target detection results include positive and negative results. Finally, based on the positive and negative results corresponding to each video file in the test set, the evaluation result of the model to be evaluated is obtained. This allows for a comprehensive evaluation of the video analysis and tracking capabilities of the target detection model, ensuring the accuracy of the target detection model's video detection capabilities.
Attached Figure Description
[0033] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the structures shown in these drawings without creative effort.
[0034] Figure 1 is a schematic diagram of the overall process of the model evaluation method provided in the embodiments of the present invention;
[0035] Figure 2 is an example diagram of the model evaluation method provided in the embodiments of the present invention;
[0036] Figure 3 is a structural block diagram of the model evaluation device provided in an embodiment of the present invention;
[0037] Figure 4 is a structural block diagram of an electronic device provided in an embodiment of the present invention.
[0038] The realization of the objective, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0039] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present invention.
[0040] The terms "first," "second," and "third," etc., used in the specification, claims, and accompanying drawings of this invention are used to distinguish different objects and not to describe a specific order. Furthermore, the term "comprising," and any variations thereof, is intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or apparatus that includes a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or apparatuses.
[0041] Since this application involves relevant privacy data, the specific implementation of this application involves data such as captured images, facial images, or human body images. When the embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use, and processing of relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
[0042] The solution of the embodiments of this application is first described with reference to the relevant accompanying drawings.
[0043] As shown in Figure 1, a specific embodiment of the present invention provides a model evaluation method, including:
[0044] S10. Obtain the test set; the test set includes multiple video files.
[0045] In this embodiment, the test set may include video files collected from different locations, such as roads, residential areas, scenic spots, and other public places. Each public place may be equipped with image acquisition devices, which can capture video streams of target objects in real-time or at set intervals. For example, the image acquisition devices may capture video streams hourly or in real-time. Each video file may be extracted from a segment of a video stream or be a video stream of a specific duration. The video file may contain facial images or human images, which may appear at different times within the video file, enabling the model to detect people at different times. Video files can vary in resolution, for example, they can include low-resolution video files and high-resolution video files. Low-resolution video files can be used as easy-to-test sample materials for evaluation, while high-resolution video files can be used as difficult-to-test sample materials. Furthermore, different video files can include those containing the target object and those not containing the target object. Video files containing the target object can be used as positive sample materials, while those not containing the target object can be used as negative sample materials. Thus, by inputting different video files as the test set into the model to be evaluated for detection, different detection results can be obtained based on the model to be evaluated.
[0046] S20. Input each video file in the test set into the model to be evaluated for detection, and determine the corresponding target detection results in each video file; the target detection results include positive results and negative results.
[0047] In this embodiment, a positive result indicates that the target object in a positive sample is correctly identified by the model to be evaluated, or that no target object is detected by the model to be evaluated in a negative sample. A negative result indicates that the target object in a positive sample is not identified by the model to be evaluated, or that a non-existent target object is misidentified by the model to be evaluated in a negative sample. Therefore, after inputting each video file into the model to be evaluated for detection, the target detection result of each video file can be determined as either a positive or negative result. It is understood that the video file can be a pre-annotated video file. When annotating the video file, corresponding annotation coordinate boxes can be marked for the target objects in the video file, or the trajectory can be marked according to the movement trajectory of the target objects in the video file. Thus, the target objects in the video file can be detected according to the marked trajectory and annotation coordinate boxes to determine the target detection result.
[0048] Specifically, the above-mentioned process involves inputting each video file in the test set into the model to be evaluated for detection, and determining the target detection result for each video file. This includes: for each video file in the test set, if a target object is detected in the video file, determining the appearance time and detection box of the target object; if the appearance time falls within the time interval of any pre-labeled object, performing trajectory matching calculation on the target object based on the detection box to obtain the calculation result; the pre-labeled object is the target object appearing in the video file; and determining whether the target detection result is a positive or negative result based on the calculation result.
[0049] In this embodiment, the time interval of the pre-labeled object represents the time interval from the appearance time to the disappearance time of the labeled object in the video file. When detecting the video file, each frame of the video file can be detected. When a target object is detected in a certain frame, the appearance time of the target object can be determined and the corresponding detection box can be output. It can be understood that when matching the appearance time of the target object with the time interval of the pre-labeled object, the appearance time of the target object can be matched with the time intervals of multiple pre-labeled objects in the video file to determine whether the appearance time of the target object belongs to the time interval of a certain pre-labeled object in the video file. If it belongs to the time interval of a certain pre-labeled object, the detection box output by the target object can be used for trajectory matching calculation, thereby determining whether the target detection result is a positive result or a negative result.
[0050] Here, the appearance time of the target object can be denoted $t_0$, and a pre-labeled object can be denoted $e$, with appearance time $t_s$ and disappearance time $t_e$. Therefore, it can be determined whether the appearance time $t_0$ of the target object falls within the time interval $[t_s, t_e]$ of the pre-labeled object. It can be understood that all of these times are measured within the video file: the appearance time of the target object is the time at which the model detects it, and the appearance and disappearance times of the pre-labeled object are the times at which the labeled object appears in and disappears from the video file. For example, a video file includes multiple pre-labeled objects A, B, and C. A appears at 3 minutes and 10 seconds and disappears at 4 minutes and 16 seconds, B appears at 4 minutes and 10 seconds and disappears at 4 minutes and 50 seconds, and C appears at 3 minutes and 10 seconds and disappears at 4 minutes and 20 seconds. Therefore, after the model under evaluation detects the target object, it can determine whether the appearance time of the detected target object falls within the time interval of any of the pre-labeled objects A, B, and C. It can also be understood that when the model under evaluation detects multiple target objects simultaneously, the appearance times of the multiple target objects can be compared with the time interval of each pre-labeled object one by one to determine whether trajectory matching calculation is performed for each of them.
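The interval check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `LabeledInterval` class, the `matching_labels` function, and the `mmss` helper are hypothetical names that do not appear in the patent, times are assumed to be expressed in seconds, and the interval is treated as closed at both ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledInterval:
    """Appearance and disappearance times (seconds) of one pre-labeled object."""
    name: str
    t_s: float  # appearance time in the video file
    t_e: float  # disappearance time in the video file


def matching_labels(t0: float, labels: list[LabeledInterval]) -> list[str]:
    """Return the names of all pre-labeled objects whose [t_s, t_e] contains t0."""
    return [lab.name for lab in labels if lab.t_s <= t0 <= lab.t_e]


def mmss(minutes: int, seconds: int) -> float:
    """Convert a minutes/seconds timestamp to seconds (hypothetical helper)."""
    return minutes * 60 + seconds


# The A, B, C example from the paragraph above.
labels = [
    LabeledInterval("A", mmss(3, 10), mmss(4, 16)),
    LabeledInterval("B", mmss(4, 10), mmss(4, 50)),
    LabeledInterval("C", mmss(3, 10), mmss(4, 20)),
]

print(matching_labels(mmss(4, 12), labels))  # detection at 4:12
print(matching_labels(mmss(2, 0), labels))   # detection at 2:00
```

An empty list means the appearance time falls inside no labeled interval, so no trajectory matching would be attempted for that detection.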
[0051] Furthermore, for each video file in the test set, if a target object is detected in the video file, after determining the appearance time and detection box of the target object, the following steps are taken: if the appearance time is not within the time interval of any pre-labeled object, the detection result of the target object is determined to be a negative result.
[0052] In this process, after the model under evaluation detects the target object, the appearance time of the target object can be compared with the time intervals of each pre-labeled object. If the appearance time of the target object does not fall within the time interval of any pre-labeled object, it indicates that the model under evaluation has misidentified the target object, meaning that a non-existent target object has been mistakenly identified in the video file. Therefore, the detection result for this target object can be determined to be negative. For example, if video file M contains buildings in a certain area and is 3 minutes long, and the model under evaluation identifies a floating object as the target object after detecting the video file, but no object is labeled during the labeling process for this video file, it can be determined that the model under evaluation has misidentified the floating object, and the detection result for this target object can be determined to be negative.
[0053] Specifically, the above-mentioned trajectory matching calculation of the target object based on the detection box to obtain the calculation result includes: calculating the center point based on the detection box, and calculating the distance between the coordinates of the center point and each pixel coordinate point in the pre-stored pixel coordinate point set to obtain the target trajectory point with the shortest distance; the pixel coordinate point set represents the pixel coordinates corresponding to multiple trajectory points of the pre-annotated object in the video file; determining the target box corresponding to the target trajectory point, and calculating the overlap between the target box and the detection box to obtain the overlap value.
[0054] In this embodiment, when the appearance time of a target object is determined to fall within the time interval of any pre-labeled object, trajectory matching is performed on the target object. A corresponding detection box is generated for the detected target object, and its center point is determined using the detection box. The center point of the detection box is then matched with the trajectory points of the pre-labeled object. It can be understood that the pixel coordinate set represents the pixel coordinates corresponding to multiple trajectory points labeled for the pre-labeled object in the video file. Each trajectory point can correspond to one pixel coordinate. The distance between the center point and each point in the pixel coordinate set can be calculated as the Euclidean distance, which determines the target trajectory point with the shortest distance in the pixel coordinate set. Based on this target trajectory point and the width and height of the detection box, the target box corresponding to the target trajectory point can be reconstructed for the overlap calculation. The overlap calculation involves calculating the intersection over union (IoU) of the target box and the detection box to determine the overlap value.
[0055] The detection box mentioned above can be represented as $\{x_0, y_0, w_0, h_0\}$, where $x_0$ represents the x-coordinate of the top-left point of the detection box, $y_0$ represents the y-coordinate of the top-left point of the detection box, $w_0$ represents the width of the detection box, and $h_0$ represents the height of the detection box. Therefore, the center point $P$ of the detection box can be calculated as $P = \{x_0 + w_0/2,\ y_0 + h_0/2\}$. The pixel coordinate set can be represented as $A = \{a_1, a_2, \ldots, a_n\}$, where each $a_i = \{x_i, y_i\}$, $i \in [1, n]$, represents the coordinates of a point on the trajectory. By calculating the Euclidean distance between the coordinates of the center point $P$ and the coordinates of each trajectory point, the distances between each trajectory point and the center point $P$ can be determined as $D_1, D_2, \ldots, D_n$. The shortest distance can be represented as $D_q = \min(D_1, D_2, \ldots, D_n)$. Therefore, after determining the target trajectory point with the shortest distance, the coordinates of the target trajectory point can be determined as $q = \{x_q, y_q\}$. The target trajectory point can correspond to the center point of the target bounding box. Therefore, by reconstructing the target bounding box of the target trajectory point using the width and height of the detection box, the target bounding box can be determined as $\{x_q - w_0/2,\ y_q - h_0/2,\ w_0,\ h_0\}$. It can be understood that, after determining the detection bounding box as $\{x_0, y_0, w_0, h_0\}$ and the target bounding box as $\{x_q - w_0/2,\ y_q - h_0/2,\ w_0,\ h_0\}$, the intersection over union (IoU) of the detection bounding box and the target bounding box can be calculated to determine the degree of overlap between them.
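The center-point, nearest-trajectory-point, and box-reconstruction steps can be sketched as below. The function names, the `(x, y, w, h)` tuple layout, and the sample coordinates are assumptions made for illustration; only the formulas themselves come from the description.

```python
import math

Box = tuple[float, float, float, float]  # (x, y, w, h): top-left corner plus size
Point = tuple[float, float]


def box_center(box: Box) -> Point:
    """Center point P = (x0 + w0/2, y0 + h0/2) of a detection box."""
    x0, y0, w0, h0 = box
    return (x0 + w0 / 2, y0 + h0 / 2)


def nearest_trajectory_point(center: Point, trajectory: list[Point]) -> Point:
    """Trajectory point with the smallest Euclidean distance to the center."""
    # min() raises ValueError if the trajectory is empty.
    return min(trajectory, key=lambda a: math.dist(center, a))


def reconstruct_target_box(q: Point, det_box: Box) -> Box:
    """Box centered on q that reuses the detection box's width and height."""
    _, _, w0, h0 = det_box
    x_q, y_q = q
    return (x_q - w0 / 2, y_q - h0 / 2, w0, h0)


# Hypothetical sample data, not taken from the patent.
det_box = (100.0, 50.0, 40.0, 80.0)
trajectory = [(90.0, 80.0), (118.0, 92.0), (150.0, 100.0)]

p = box_center(det_box)
q = nearest_trajectory_point(p, trajectory)
target_box = reconstruct_target_box(q, det_box)
print(p, q, target_box)
```

Because the reconstructed box borrows the detection box's width and height, the resulting overlap depends only on how far the detection's center lies from the labeled trajectory.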
[0056] As shown in Figure 2, when calculating the intersection over union (IoU) of the detection box and the target box, the coordinates of the top-left corner of the detection box can be defined as (x, y), and the coordinates of the bottom-right corner as (x+w, y+h). The coordinates of the top-left corner of the target box can be defined as (x′, y′), and the coordinates of the bottom-right corner as (x′+w′, y′+h′). Therefore, the coordinates of the top-left and bottom-right corners of the overlapping region between the detection box and the target box can be defined as follows:
[0057] $x_l = \max(x, x')$
[0058] $y_l = \max(y, y')$
[0059] $x_k = \min(x + w, x' + w')$
[0060] $y_k = \min(y + h, y' + h')$
[0061] where $x_l$ and $y_l$ represent the coordinates of the top-left corner of the overlapping region between the detection box and the target box, and $x_k$ and $y_k$ represent the coordinates of its bottom-right corner. Therefore, after calculating the area $S_1$ of the detection box and the area $S_2$ of the target box, the overlap value between the detection box and the target box can be calculated using the following formulas:
[0062] $S_1 \cap S_2 = (x_k - x_l) \times (y_k - y_l)$, taken as 0 when the boxes do not overlap (that is, when $x_k \le x_l$ or $y_k \le y_l$)
[0063] $S_1 \cup S_2 = w \times h + w' \times h' - S_1 \cap S_2$
[0064] $\delta = \dfrac{S_1 \cap S_2}{S_1 \cup S_2}$
[0065] Here, $\delta$ represents the overlap value between the detection box and the target box. Therefore, by calculating the intersection over union (IoU) between the detection box and the target box, the overlap value is obtained. This overlap value can then be used to determine whether the detection result for the target object is positive or negative. For example, as shown in Figure 2, the IoU between the detection box and the target box can be calculated from the area of the overlapping region between them. If the overlap value between the detection box and the target box is small, it can be determined that the detection result of the target object is negative.
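A minimal sketch of the overlap calculation follows, assuming boxes are given as `(x, y, w, h)` tuples with a top-left origin. The function name `iou` and the clamping of a negative width or height to zero (so that disjoint boxes give an overlap of 0) are choices made for this illustration rather than details stated in the description.

```python
Box = tuple[float, float, float, float]  # (x, y, w, h): top-left corner plus size


def iou(det_box: Box, target_box: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    x, y, w, h = det_box
    x2, y2, w2, h2 = target_box

    # Corners of the overlapping region.
    x_l, y_l = max(x, x2), max(y, y2)
    x_k, y_k = min(x + w, x2 + w2), min(y + h, y2 + h2)

    # Clamp to zero so that non-overlapping boxes yield no intersection.
    intersection = max(0.0, x_k - x_l) * max(0.0, y_k - y_l)
    union = w * h + w2 * h2 - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical boxes: a detection and a target box offset by (-2, +2) pixels.
delta = iou((100.0, 50.0, 40.0, 80.0), (98.0, 52.0, 40.0, 80.0))
print(round(delta, 4))
```

For these two boxes the intersection is 38 × 78 = 2964 and the union is 3200 + 3200 − 2964 = 3436, so the overlap value is 2964 / 3436.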
[0066] Furthermore, the above-mentioned determination of whether the target detection result is positive or negative based on the calculation results includes: comparing the overlap value with a preset threshold; if the overlap value is greater than the preset threshold, the target detection result is determined to be positive; if the overlap value is less than or equal to the preset threshold, the target detection result is determined to be negative.
[0067] In this embodiment, the preset threshold is pre-set. When the overlap value is greater than the preset threshold, it indicates that the loss between the detection box and the target box output by the model under evaluation is small. Therefore, it can be determined that the model under evaluation can correctly identify the target object, and its corresponding detection result can be output as a positive result for subsequent statistics. When the overlap value is less than or equal to the preset threshold, it indicates that the loss between the detection box and the target box output by the model under evaluation is large. Therefore, it can be determined that the recognition accuracy of the model under evaluation is insufficient, and its corresponding detection result can be output as a negative result for subsequent statistics. It can be understood that the detection result output by each video file in the test set can be either a positive result or a negative result. By statistically analyzing the detection results of each video file in the test set, the number of positive results and the number of negative results can be determined. Thus, the model under evaluation can be comprehensively evaluated and scored to determine the evaluation result of the model under evaluation.
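The decision rule for a single detection can be summarized in a short sketch. The function name `classify_detection` and the default threshold of 0.5 are assumptions for illustration; the description only states that the threshold is preset and does not give a value.

```python
from typing import Optional


def classify_detection(
    in_labeled_interval: bool,
    overlap: Optional[float],
    threshold: float = 0.5,  # hypothetical value; the source gives no number
) -> str:
    """Return "positive" or "negative" for one detected target object."""
    # A detection outside every labeled time interval is a misidentification.
    if not in_labeled_interval:
        return "negative"
    if overlap is None:
        raise ValueError("overlap is required when the time interval matches")
    # Strictly greater than the threshold is positive; equal counts as negative.
    return "positive" if overlap > threshold else "negative"


print(classify_detection(True, 0.86))
print(classify_detection(True, 0.5))
print(classify_detection(False, None))
```

Note that an overlap exactly equal to the threshold is treated as negative, matching the "less than or equal to" wording of the rule.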
[0068] S30. Calculate the evaluation results of the model to be evaluated based on the positive and negative results corresponding to each video file in the test set.
[0069] In this embodiment, a positive result indicates that the target object in a positive sample was correctly identified by the model under evaluation, or that no target object was detected by the model under evaluation in a negative sample. A negative result indicates that the target object in a positive sample was not detected by the model under evaluation, or that a non-existent target object was misidentified by the model under evaluation in a negative sample. Therefore, the number of target objects correctly identified by the model under evaluation in video files containing target objects can be counted and defined as TP, and the number of cases in which the model under evaluation identifies no target object in video files not containing target objects can be counted and defined as TN, thus obtaining the number of positive results. The number of target objects not identified by the model under evaluation in video files containing target objects can be defined as FN, and the number of target objects misidentified by the model under evaluation in video files not containing target objects can be defined as FP, thus obtaining the number of negative results. Therefore, the evaluation result of the model under evaluation can be calculated based on the detection results of each video file in the test set.
[0070] Specifically, the above calculations based on the positive and negative results corresponding to each video file in the test set yield the evaluation results of the model to be evaluated, including: determining the number of positive and negative results based on the positive and negative results corresponding to each video file in the test set; and calculating the evaluation results of the model to be evaluated based on the number of positive and negative results.
[0071] In this embodiment, after the model to be evaluated has detected the target, the number of correctly identified targets (TP) in the positive samples, the number of unidentified targets (TN) in the negative samples, the number of unidentified targets (FN) in the positive samples, and the number of misidentified targets (FP) in the negative samples can be determined. The corresponding percentages are then calculated to determine the evaluation result of the model to be evaluated. It can be understood that by determining the evaluation result of the model to be evaluated, the model can be updated in the future to improve its video analysis and tracking capabilities, thereby improving the accuracy of the model's video detection capabilities.
[0072] Furthermore, the above calculations based on the number of positive and negative results yield the following evaluation results for the model to be evaluated: the accuracy, precision, and recall of the model to be evaluated are determined based on the number of positive and negative results; and the evaluation score of the model to be evaluated is obtained based on the accuracy, precision, and recall of the model to be evaluated.
[0073] In this embodiment, accuracy can be the proportion of positive results among all results, precision can be the proportion of correctly identified target objects among all target objects the model identifies in both positive and negative samples, and recall can be the proportion of correctly identified target objects in positive samples. Therefore, based on the number of correctly identified targets (TP) in positive samples, the number of unidentified target objects (TN) in negative samples, the number of unidentified targets (FN) in positive samples, and the number of misidentified target objects (FP) in negative samples, the accuracy can be calculated using the following formula:
[0074] $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
[0075] Understandably, by calculating the accuracy, we can determine the proportion of correct results (TP and TN) among all results. For example, with TP = 5, TN = 20, FP = 3, and FN = 10, the accuracy is 25/38, or approximately 0.66.
[0076] Furthermore, the precision can be calculated using the following formula:
[0077] $\text{Precision} = \dfrac{TP}{TP + FP}$
[0078] The precision rate can be calculated to obtain the percentage of correctly identified positive samples (TP). For example, the precision rate can be calculated to be 0.625 using the example above.
[0079] Furthermore, recall can be calculated using the following formula:
[0080] $\text{Recall} = \dfrac{TP}{TP + FN}$
[0081] Understandably, the recall rate can be calculated to be 0.33 using the example above. After determining the precision, accuracy, and recall, a comprehensive calculation can be performed based on these three rates to obtain the corresponding evaluation score. The evaluation score can be calculated using the following formula:
[0082] $F_1 = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
[0083] The F1 score can serve as the evaluation score. For example, with the precision (0.625) and recall (0.33) calculated above, the F1 score is approximately 0.43. It can be understood that precision and recall typically trade off against each other: raising the recall of the model being evaluated tends to lower its precision, and vice versa. Therefore, by combining precision and recall through their harmonic mean to obtain the evaluation score, the evaluation result of the model being evaluated can be determined, so that its performance is known and the model can be adjusted to improve it. It can be understood that by comprehensively evaluating the model being evaluated, the performance of the model can be determined to ensure more accurate target detection in subsequent target object detection processes.
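The worked numbers in this passage can be reproduced with a short sketch. The function name `evaluation_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions for illustration; the formulas are the standard definitions of accuracy, precision, recall, and the F1 score.

```python
def evaluation_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 score from the four result counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of dividing by zero (a convention chosen here).
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts from the example in the text: TP = 5, TN = 20, FP = 3, FN = 10.
metrics = evaluation_metrics(tp=5, tn=20, fp=3, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 25/38, the precision 5/8, the recall 5/15, and the F1 score 10/23, which agrees with the rounded values quoted in the description.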
[0084] The model evaluation method provided by this invention first obtains a test set, which includes multiple video files. Then, each video file in the test set is input into the model to be evaluated for detection, determining the corresponding target detection results for each video file. The target detection results include positive and negative results. Finally, based on the positive and negative results corresponding to each video file in the test set, the evaluation result of the model to be evaluated is obtained. This allows for a comprehensive evaluation of the video analysis and tracking capabilities of the target detection model, ensuring the accuracy of the target detection model's video detection capabilities.
[0085] As shown in Figure 3, this embodiment of the invention provides a model evaluation device 10, comprising:
[0086] The acquisition module 11 is used to acquire the test set; the test set includes multiple video files;
[0087] The detection module 12 is used to input each video file in the test set into the model to be evaluated for detection, and to determine the target detection result in each video file; the target detection result includes positive results and negative results;
[0088] The calculation module 13 is used to calculate the evaluation results of the model to be evaluated based on the positive and negative results corresponding to each video file in the test set.
[0089] The model evaluation device 10 provided by this invention first acquires a test set, which includes multiple video files. Then, it inputs each video file from the test set into the model to be evaluated for detection, determining the corresponding target detection results for each video file. The target detection results include positive and negative results. Finally, it calculates the evaluation results of the model based on the positive and negative results corresponding to each video file in the test set. This allows for a comprehensive evaluation of the video analysis and tracking capabilities of the target detection model, ensuring the accuracy of the target detection model's video detection capabilities.
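The division of the device into three modules can be pictured with a minimal sketch. Everything named here is hypothetical: the class `ModelEvaluationDevice`, the representation of a video file as a path string, and the stand-in `fake_detector`, which replaces the model to be evaluated and returns one boolean per detection (True for a positive result). The calculation module is reduced to a simple tally for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Video = str                               # assumed: a video is referenced by its path
Detector = Callable[[Video], List[bool]]  # assumed: True means a positive result


@dataclass
class ModelEvaluationDevice:
    detector: Detector  # the model to be evaluated, wrapped as a callable

    def acquire(self, paths: Iterable[Video]) -> List[Video]:
        """Acquisition module 11: collect the video files of the test set."""
        return list(paths)

    def detect(self, test_set: List[Video]) -> List[List[bool]]:
        """Detection module 12: one list of results per video file."""
        return [self.detector(video) for video in test_set]

    def calculate(self, results: List[List[bool]]) -> dict:
        """Calculation module 13: tally positive and negative results."""
        flat = [r for per_video in results for r in per_video]
        positives = sum(flat)
        return {"positive": positives, "negative": len(flat) - positives}


def fake_detector(video: Video) -> List[bool]:
    # Stand-in for a real model: every video yields exactly two results.
    return [True, video.endswith("a.mp4")]


device = ModelEvaluationDevice(detector=fake_detector)
summary = device.calculate(device.detect(device.acquire(["a.mp4", "b.mp4"])))
print(summary)
```

Keeping the detector as an injected callable means the same three-stage flow can be exercised with a stub, as here, or with a real model wrapper.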
[0090] It should be noted that the model evaluation device 10 provided in this specific embodiment of the present invention corresponds to the model evaluation method described above. All embodiments of the model evaluation method are applicable to the model evaluation device 10, and each embodiment of the model evaluation device 10 has modules corresponding to the steps of the model evaluation method, which can achieve the same or similar beneficial effects. To avoid excessive repetition, the modules of the model evaluation device 10 are not described in detail here.
[0091] As shown in Figure 4, a specific embodiment of the present invention also provides an electronic device 20, including a memory 202, a processor 201, and a computer program stored in the memory 202 and executable on the processor 201. When the processor 201 executes the computer program, it implements the steps of the model evaluation method described above.
[0092] Specifically, processor 201 calls the computer program stored in memory 202 and performs the following steps:
[0093] Obtain the test set; the test set includes multiple video files;
[0094] Each video file in the test set is input into the model to be evaluated for detection, and the target detection results corresponding to each video file are determined; the target detection results include positive results and negative results;
[0095] The evaluation results of the model under evaluation are obtained by calculating the positive and negative results corresponding to each video file in the test set.
[0096] Optionally, the step performed by the processor 201 of inputting each video file in the test set into the model to be evaluated for detection and determining the target detection result corresponding to each video file includes:
[0097] For each video file in the test set, if a target object is detected in the video file, determine the appearance time and detection bounding box of the target object;
[0098] If the appearance time falls within the time interval of any pre-labeled object, trajectory matching calculation is performed on the target object based on the detection bounding box to obtain the calculation result; a pre-labeled object is a target object annotated in advance as appearing in the video file;
[0099] Based on the calculation result, the target detection result is determined to be a positive result or a negative result.
[0100] Optionally, after determining, for each video file in the test set, the appearance time and detection bounding box of a target object detected in the video file, the processor 201 further performs:
[0101] If the appearance time does not fall within the time interval of any pre-labeled object, the detection result of the target object is determined to be a negative result.
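The time-interval check described above can be sketched as follows. The representation of each pre-labeled object's interval as a `(start, end)` pair in seconds, the inclusive bounds, and the function name `find_matching_annotation` are assumptions made for illustration; the specification does not fix a data format.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # assumed: (start_seconds, end_seconds), inclusive


def find_matching_annotation(appearance_time: float,
                             intervals: List[Interval]) -> Optional[int]:
    """Return the index of the first pre-labeled object whose time interval
    contains the appearance time, or None when no interval contains it."""
    for index, (start, end) in enumerate(intervals):
        if start <= appearance_time <= end:
            return index
    return None


# Two hypothetical pre-labeled objects in one video file.
labeled_intervals = [(0.0, 4.5), (10.0, 12.0)]

inside = find_matching_annotation(11.2, labeled_intervals)   # proceeds to trajectory matching
outside = find_matching_annotation(7.0, labeled_intervals)   # treated as a negative result
print(inside, outside)
```

A detection whose lookup returns None is counted as a negative result without any trajectory matching.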
[0102] Optionally, the step performed by the processor 201 of performing trajectory matching calculation on the target object based on the detection bounding box to obtain the calculation result includes:
[0103] The center point is calculated from the detection bounding box, and the distance between the coordinates of the center point and each pixel coordinate point in the pre-stored set of pixel coordinate points is calculated to obtain the target trajectory point with the shortest distance; the pixel coordinate points are the pixel coordinates of the multiple trajectory points marked for the pre-labeled object in the video file;
[0104] Determine the bounding box corresponding to the target trajectory point, and calculate the overlap between the bounding box and the detection box to obtain the overlap value.
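The trajectory matching steps above can be sketched as follows. The box format `(x_min, y_min, x_max, y_max)`, the helper names, and the use of intersection over union (IoU) as the overlap measure are assumptions for illustration; the Euclidean distance follows the wording of the claims.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed: (x_min, y_min, x_max, y_max) in pixels
Point = Tuple[float, float]


def center(box: Box) -> Point:
    """Center point of a detection bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def nearest_index(point: Point, trajectory: List[Point]) -> int:
    """Index of the trajectory point with the shortest Euclidean distance."""
    return min(range(len(trajectory)),
               key=lambda i: math.dist(point, trajectory[i]))


def overlap(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (assumed overlap measure)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical annotation: trajectory points and the bounding box at each point.
trajectory = [(5.0, 5.0), (22.0, 21.0), (60.0, 60.0)]
labeled_boxes = [(0.0, 0.0, 10.0, 10.0), (12.0, 11.0, 32.0, 31.0), (50.0, 50.0, 70.0, 70.0)]

detection = (10.0, 10.0, 30.0, 30.0)
idx = nearest_index(center(detection), trajectory)
overlap_value = overlap(labeled_boxes[idx], detection)
print(idx, round(overlap_value, 3))
```

Selecting the nearest trajectory point first means only one annotated bounding box has to be compared against the detection, rather than every box on the trajectory.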
[0105] Optionally, the step performed by the processor 201 of determining whether the target detection result is positive or negative based on the calculation result includes:
[0106] The overlap value is compared with a preset threshold. If the overlap value is greater than the preset threshold, the target detection result is determined to be a positive result.
[0107] If the overlap value is less than or equal to the preset threshold, the target detection result is determined to be a negative result.
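The threshold comparison can be sketched in a few lines. The threshold value 0.5 and the name `classify_detection` are assumptions; the specification only refers to a preset threshold.

```python
DEFAULT_THRESHOLD = 0.5  # assumed value; the source only says "preset threshold"


def classify_detection(overlap_value: float,
                       threshold: float = DEFAULT_THRESHOLD) -> str:
    """Positive only when the overlap is strictly greater than the threshold;
    an overlap equal to the threshold counts as negative."""
    return "positive" if overlap_value > threshold else "negative"


labels = [classify_detection(v) for v in (0.75, 0.5, 0.2)]
print(labels)
```

Note that the comparison is strict, so a detection whose overlap exactly equals the threshold is a negative result, as stated in the text.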
[0108] Optionally, the step performed by the processor 201 of calculating the evaluation result of the model to be evaluated based on the positive and negative results corresponding to each video file in the test set includes:
[0109] Based on the positive and negative results corresponding to each video file in the test set, determine the number of positive results and the number of negative results.
[0110] The evaluation results of the model under evaluation are obtained by calculating the number of positive and negative results.
[0111] Optionally, the step performed by the processor 201 of calculating the evaluation result of the model to be evaluated based on the number of positive results and the number of negative results includes:
[0112] The accuracy, precision, and recall of the model under evaluation are determined by calculating the number of positive and negative results.
[0113] The evaluation score of the model to be evaluated is calculated based on its accuracy, precision, and recall.
[0114] That is, in a specific embodiment of the present invention, when the processor 201 of the electronic device 20 executes the computer program, it implements the steps of the above-mentioned model evaluation method, thereby comprehensively evaluating the video analysis and tracking capabilities of the target detection model and ensuring the accuracy of the target detection model's video detection capabilities.
[0115] It should be noted that since the processor 201 of the electronic device 20 implements the steps of the above model evaluation method when executing the computer program, all embodiments of the above model evaluation method are applicable to the electronic device 20 and can achieve the same or similar beneficial effects.
[0116] The computer-readable storage medium provided in this embodiment of the invention stores a computer program. When the computer program is executed by a processor, it implements the various processes of the model evaluation method or the application-side model evaluation method provided in this embodiment of the invention and can achieve the same technical effect. To avoid repetition, it will not be described again here.
[0117] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, it can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, optical disk, read-only memory (ROM), or random access memory (RAM), etc.
[0118] In the description of this specification, references to terms such as "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., indicate that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the illustrative expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in one or more embodiments or examples.
[0119] The above description is only a preferred embodiment of the present invention and does not limit the patent scope of the present invention. All equivalent structural transformations made under the concept of the present invention using the contents of the present invention specification and drawings, or direct / indirect applications in other related technical fields, are included within the patent protection scope of the present invention.
Claims
1. A model evaluation method, characterized by comprising: acquiring a test set, the test set including multiple video files; inputting each video file in the test set into the model to be evaluated for detection, and determining the target detection result corresponding to each video file, the target detection results including positive results and negative results; and calculating the evaluation result of the model to be evaluated based on the positive results and negative results corresponding to each video file in the test set; wherein the step of inputting each video file in the test set into the model to be evaluated for detection and determining the target detection result corresponding to each video file includes: for each video file in the test set, if a target object is detected in the video file, determining the appearance time and detection box of the target object; if the appearance time falls within the time interval of any pre-labeled object, performing trajectory matching calculation on the target object based on the detection box to obtain the calculation result, the pre-labeled object being a target object appearing in the video file, wherein the appearance time of the target object is matched against the time intervals of the multiple pre-labeled objects in the video file to determine whether it belongs to the time interval of one of the pre-labeled objects, and, if it does, trajectory matching calculation is performed on the detection box output for the target object; and determining, based on the calculation result, whether the target detection result is a positive result or a negative result;
wherein the step of performing trajectory matching calculation on the target object based on the detection box to obtain the calculation result includes: calculating a center point from the detection box, and calculating the Euclidean distance between the coordinates of the center point and each pixel coordinate point in a pre-stored set of pixel coordinate points to obtain the target trajectory point with the shortest distance, the pixel coordinate points being the pixel coordinates of the multiple trajectory points marked for the pre-labeled object in the video file; and determining the target bounding box corresponding to the target trajectory point, and calculating the overlap ratio between the target bounding box and the detection box to obtain the overlap ratio value.
2. The model evaluation method of claim 1, wherein, for each video file in the test set, after the appearance time and detection box of a target object detected in the video file are determined, the method further includes: if the appearance time does not fall within the time interval of any pre-labeled object, determining that the detection result of the target object is a negative result.
3. The model evaluation method of claim 1, wherein determining whether the target detection result is a positive result or a negative result based on the calculation result includes: comparing the overlap value with a preset threshold; if the overlap value is greater than the preset threshold, determining that the target detection result is a positive result; and if the overlap value is less than or equal to the preset threshold, determining that the target detection result is a negative result.
4. The model evaluation method of claim 1, wherein calculating the evaluation result of the model to be evaluated based on the positive results and negative results corresponding to each video file in the test set includes: determining the number of positive results and the number of negative results based on the positive results and negative results corresponding to each video file in the test set; and calculating the evaluation result of the model to be evaluated based on the number of positive results and the number of negative results.
5. The model evaluation method of claim 4, wherein calculating the evaluation result of the model to be evaluated based on the number of positive results and the number of negative results includes: calculating the accuracy, precision, and recall of the model to be evaluated based on the number of positive results and the number of negative results; and calculating the evaluation score of the model to be evaluated based on the accuracy, precision, and recall.
6. A model evaluation device, characterized by comprising: an acquisition module, used to acquire a test set, the test set including multiple video files; a detection module, used to input each video file in the test set into the model to be evaluated for detection and to determine the target detection result corresponding to each video file, the target detection results including positive results and negative results; and a calculation module, used to calculate the evaluation result of the model to be evaluated based on the positive results and negative results corresponding to each video file in the test set; wherein inputting each video file in the test set into the model to be evaluated for detection and determining the target detection result corresponding to each video file includes: for each video file in the test set, if a target object is detected in the video file, determining the appearance time and detection box of the target object; if the appearance time falls within the time interval of any pre-labeled object, performing trajectory matching calculation on the target object based on the detection box to obtain the calculation result, the pre-labeled object being a target object appearing in the video file, wherein the appearance time of the target object is matched against the time intervals of the multiple pre-labeled objects in the video file to determine whether it belongs to the time interval of one of the pre-labeled objects, and, if it does, trajectory matching calculation is performed on the detection box output for the target object; and determining, based on the calculation result, whether the target detection result is a positive result or a negative result;
wherein performing trajectory matching calculation on the target object based on the detection box to obtain the calculation result includes: calculating a center point from the detection box, and calculating the Euclidean distance between the coordinates of the center point and each pixel coordinate point in a pre-stored set of pixel coordinate points to obtain the target trajectory point with the shortest distance, the pixel coordinate points being the pixel coordinates of the multiple trajectory points marked for the pre-labeled object in the video file; and determining the target bounding box corresponding to the target trajectory point, and calculating the overlap ratio between the target bounding box and the detection box to obtain the overlap ratio value.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the steps of the model evaluation method of any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the model evaluation method of any one of claims 1 to 5.