A multi-target tracking method in sports video based on repeated detection optimization
By using a self-generalized intersection-over-union (Self-GIoU) loss function and a duplicate detection optimizer to optimize the detection loss of the network model, the duplicate detections caused by the similar appearance of athletes in sports videos are eliminated, thereby improving the accuracy and training efficiency of multi-target tracking.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents (China)
- Current Assignee / Owner: BEIHANG UNIV
- Filing Date: 2022-09-06
- Publication Date: 2026-06-19
AI Technical Summary
Existing multi-object tracking methods in sports videos suffer from repeated detection problems due to the similarity of athletes' appearances, and the training time is too long, making them unable to effectively solve the challenges of multi-object tracking in sports videos.
A duplicate detection optimization method is adopted, which optimizes the detection loss of the network model by using a self-generalized cross-union loss function and a duplicate detection optimizer. By combining the original detection loss and the duplicate detection loss, duplicate detection can be quickly identified and eliminated, thereby improving training efficiency.
It effectively solves the problem of repeated detection in sports videos, reduces training time while maintaining tracking quality, and improves the accuracy and efficiency of multi-target tracking.
Figure CN115439786B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of video surveillance technology, and more specifically, to a multi-target tracking method in sports videos based on repeat detection optimization.
Background Technology
[0002] The multi-target tracking problem can be defined as follows: Given a video segment consisting of several consecutive frames, from the initial frame to the final frame, there are multiple targets randomly entering and exiting the field of view. Our goal is to track the trajectory of each target across different frames. The most common application is pedestrian tracking in surveillance videos.
[0003] Compared with pedestrian tracking in general surveillance videos, multi-target tracking in sports videos presents unique challenges that pose significant difficulties for existing multi-target tracking methods. Because teammates wear the same uniform, multiple targets have extremely similar appearances, so the visual features of different targets follow similar distributions, as shown in the dashed box in Figure 1. This can cause one or more targets to be detected repeatedly in a single frame, as shown in the circle in Figure 2. Tracking multiple targets in sports videos makes it possible to obtain athletes' movement information. For competitors, this helps analyze the competitive habits of multiple athletes and develop targeted strategies. For viewers, it provides a comprehensive view of the competition dynamics and athlete statistics.
[0004] Existing multi-object tracking methods mostly address the problem of tracking multiple pedestrians in surveillance videos. In this scenario, although the number of targets is large, the camera perspective is fixed, the targets have certain differences in appearance, and the targets move at a relatively slow speed, allowing multi-object tracking methods to achieve good tracking results. However, they still have certain limitations. Currently, there is very little research specifically on multi-object tracking for sports videos. Even if existing methods can train models based on sports videos to address the aforementioned challenges, they lack reasonable algorithm design for sports video scenarios. Therefore, existing multi-object tracking methods are generally not directly applicable to sports video scenarios. Direct application would lead to the following problems:
[0005] Duplicate detection. Athletes in sports videos look alike. During training, the model may learn to detect multiple athletes that are close together or overlapping. If, during inference, the model cannot tell whether targets are merely close or actually overlapping, duplicate detections occur, as shown in the circle in Figure 2.
[0006] Excessive training time. Subtle duplicate detection problems also occur in multi-object tracking on pedestrian videos. They are called "subtle" because, after long training, the model can learn the small differences between pedestrians and eliminate the problem by itself to reach optimal performance. However, the model needs a long training period to resolve the problem on its own.
[0007] To address this, a multi-target tracking method for sports videos based on repeat detection optimization is proposed.
Summary of the Invention
[0008] The present invention aims to provide a multi-target tracking method in sports videos based on repeat detection optimization, so as to solve or improve at least one of the above-mentioned technical problems.
[0009] In view of this, a first aspect of the present invention is to provide a multi-target tracking method in sports videos based on repeat detection optimization.
[0010] The first aspect of the present invention provides a multi-target tracking method in sports videos based on repeat detection optimization, comprising the following steps: S1, inputting a frame image from a training round into a network model to obtain preliminary detection results; S2, computing the network model's own detection loss on the bounding box information of the detection results to obtain the original detection loss; S3, applying a preset method to the bounding box information of the detection results to obtain the repeat detection loss; S4, merging the repeat detection loss and the original detection loss into a total detection loss and backpropagating it through the network model for weight optimization to obtain a new network model; S5, inputting the image again into the new network model to obtain a detection result without repeat detection; wherein the preset method transforms the bounding box information and computes on it through a repeat detection optimizer, and sets a detection lower bound to filter out the repeat detection loss.
[0011] This invention provides a multi-target tracking method in sports videos based on duplicate detection optimization. By using a pre-defined duplicate detection optimizer and existing network models for detection, the network model can better adapt to the problem of multiple detections caused by similar appearances of athletes in sports videos, thus solving the problem of duplicate detection in existing multi-target tracking methods in sports videos.
[0012] Meanwhile, since repeated detection problems also occur in pedestrian videos, but are very subtle, the original detection loss and the re-detection loss are used to optimize the network model. This optimization is achieved in a single image information detection and is combined into a total detection loss, which can accelerate the self-optimization of the network model. Therefore, this invention can significantly reduce the training time of existing multi-object tracking methods in pedestrian videos while maintaining tracking quality and improving training efficiency.
[0013] In addition, the technical solutions provided by embodiments of the present invention may also have the following additional technical features:
[0014] In any of the above technical solutions, the bounding box information is obtained by the network model computing on a frame image and outputting the result, specifically: $B = \{b_i \mid i = 1, \dots, N\}$, where $b_i$ is a single bounding box detected in the image, specifically $b_i = (x_i^{tl}, y_i^{tl}, x_i^{br}, y_i^{br})$, in which $x_i^{tl}$ and $y_i^{tl}$ are the coordinates of the top-left corner of the bounding box, and $x_i^{br}$ and $y_i^{br}$ are the coordinates of the lower-right corner of the bounding box.
[0015] In this technical solution, since several people may appear in the same frame of a sports video, multiple bounding boxes at different positions are detected in that frame, each corresponding to one moving person. These bounding boxes are collected into a single bounding box information set B, recorded as $B = \{b_i \mid i = 1, \dots, N\}$, where N is an integer not less than 1.
[0016] For a single bounding box $b_i$, two corners of the box are recorded. The two corners are diagonally opposite within $b_i$, and the top-left and lower-right corners are chosen. Recording them accurately and converting them into a usable form is necessary for the subsequent algorithm processing.
[0017] In any of the above technical solutions, the preset method includes a self-generalized intersection-over-union (Self-GIoU) loss function and duplicate detection optimization, wherein the output of the Self-GIoU loss function is used as the input to the duplicate detection optimization, which then yields the duplicate detection loss.
[0018] In this technical solution, in the processing flow of the preset method, the Self-GIoU loss function first receives the bounding box information and converts it into a form on which the loss can be computed, so that the duplicate detection loss can then be calculated.
[0019] In any of the above technical solutions, the Self-GIoU loss function is denoted SGL. Substituting B into SGL yields the Self-GIoU loss, specifically as follows:
[0020] $M = \mathrm{SGL}(B) = 1 - \mathrm{GIoU}(B, B)$, where GIoU is the generalized intersection over union.
[0021] In this technical solution, GIoU (Generalized Intersection over Union) is the generalized intersection over union. According to B, a self-generalized intersection over union loss matrix M is constructed through the self-generalized intersection over union loss function Self-GIoU Loss (SGL).
[0022] In any of the above technical solutions, the duplicate detection optimization specifically includes the following steps. First, the elements of GIoU are calculated using the following formula: $\mathrm{GIoU}(b_i, b_j) = \mathrm{IoU}(b_i, b_j) - \dfrac{|C \setminus (b_i \cup b_j)|}{|C|}$. Then, by the commutative law of the intersection and union operations: $M_{ij} = M_{ji}$. Finally, a duplicate detection optimization lower bound LB is set in the duplicate detection optimizer, and all values in M that are less than LB are output as the duplicate detection loss $\mathrm{Loss}_{D^3}$, specifically: $\mathrm{Loss}_{D^3} = \sum_{M_{ij} < LB} M_{ij}$. Here C is the smallest rectangle that covers the region $b_i \cup b_j$, $\setminus$ denotes set subtraction, IoU is the intersection over union, $M_{ij}$ is the element in the i-th row and j-th column of M, M is a symmetric matrix whose main diagonal is 0, and $\sum$ denotes summation.
[0023] In this technical solution, through the three steps above, the Self-GIoU loss is computed and then reduced to the duplicate detection loss. The lower bound LB screens the candidate entries before they are summed into the duplicate detection loss, so that only the intended entries are output. $b_i$ and $b_j$ have the same meaning: both are elements of the set B.
[0024] In any of the above technical solutions, the value range of the GIoU is [-1, 1], and the value range of the elements in M is [0, 2].
[0025] In this technical solution, since the elements of the Self-GIoU loss matrix M equal 1 minus the corresponding GIoU values, limiting the value range of GIoU in turn limits the value range of the elements of M.
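[0026] In any of the above technical solutions, in S4 the merged loss is given by the following formula: $\mathrm{Loss}_{\text{Total-Boxes}} = \mathrm{Loss}_{\text{Origin-Boxes}} + \mathrm{Loss}_{D^3}$, where $\mathrm{Loss}_{\text{Origin-Boxes}}$ is the original detection loss, $\mathrm{Loss}_{D^3}$ is the duplicate detection loss, and $\mathrm{Loss}_{\text{Total-Boxes}}$ is the total detection loss.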
[0027] In this technical solution, the original detection loss and the duplicate detection loss can be added directly to obtain the total detection loss. The total detection loss carries more information than the original detection loss of the original network model, which helps the network model update and optimize quickly. On the one hand, it enables detection of fast-moving, highly similar people in sports videos; on the other hand, it quickly optimizes the detection of people in ordinary videos, eliminating duplicate detection in subsequent detections.
[0028] Additional aspects and advantages of embodiments of the invention will become apparent in the following description or may be learned by practice of embodiments of the invention.
Attached Figure Description
[0029] The accompanying drawings are for illustrative purposes only and are not intended to limit the scope of the invention.
[0030] Figure 1 shows multi-target tracking in sports videos in the prior art;
[0031] Figure 2 shows the repeated detection problem in the prior art;
[0032] Figure 3 is a flowchart of the algorithm of the present invention;
[0033] Figure 4 is a graph showing the relationship between training loss and number of training epochs in the present invention;
[0034] Figure 5 shows the results of the present invention on a volleyball video;
[0035] Figure 6 shows the results of the present invention on a football video;
[0036] Figure 7 shows the results of the present invention on a basketball video.
Detailed Implementation
[0037] To better understand the above-mentioned objectives, features, and advantages of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, unless otherwise specified, the embodiments and features described in these embodiments can be combined with each other.
[0038] Many specific details are set forth in the following description in order to provide a full understanding of the invention. However, the invention may also be practiced in other ways different from those described herein, and therefore the scope of protection of the invention is not limited to the specific embodiments disclosed below.
[0039] Referring to Figures 1-7, the first aspect of the present invention provides a multi-target tracking method in sports videos based on repeat detection optimization.
[0040] The first aspect of the present invention provides a multi-target tracking method in sports videos based on repeat detection optimization, comprising the following steps: S1, inputting a frame image from a training round into a network model to obtain preliminary detection results; S2, computing the network model's own detection loss on the bounding box information of the detection results to obtain the original detection loss; S3, applying a preset method to the bounding box information of the detection results to obtain the repeat detection loss; S4, merging the repeat detection loss and the original detection loss into a total detection loss and backpropagating it through the network model for weight optimization to obtain a new network model; S5, inputting the image again into the new network model to obtain a detection result without repeat detection; wherein the preset method transforms the bounding box information and computes on it through a repeat detection optimizer, and sets a detection lower bound to filter out the repeat detection loss.
[0041] This invention provides a multi-target tracking method in sports videos based on duplicate detection optimization. By using a pre-defined duplicate detection optimizer and existing network models for detection, the network model can better adapt to the problem of multiple detections caused by similar appearances of athletes in sports videos, thus solving the problem of duplicate detection in existing multi-target tracking methods in sports videos.
[0042] Meanwhile, since repeated detection problems also occur in pedestrian videos, but are very subtle, the original detection loss and the re-detection loss are used to optimize the network model. This optimization is achieved in a single image information detection and is combined into a total detection loss, which can accelerate the self-optimization of the network model. Therefore, this invention can significantly reduce the training time of existing multi-object tracking methods in pedestrian videos while maintaining tracking quality and improving training efficiency.
[0043] Specifically, the network model in step S1 is an existing attention-mechanism network model. The information marked by the dashed lines in Figure 1 is acceptable and does not require re-detection.
[0044] In any of the above embodiments, as shown in Figure 3, the bounding box information is obtained by the network model computing on a frame image and outputting the result, specifically: $B = \{b_i \mid i = 1, \dots, N\}$, where $b_i$ is a single bounding box detected in the image, specifically $b_i = (x_i^{tl}, y_i^{tl}, x_i^{br}, y_i^{br})$, in which $x_i^{tl}$ and $y_i^{tl}$ are the coordinates of the top-left corner of the bounding box, and $x_i^{br}$ and $y_i^{br}$ are the coordinates of the lower-right corner of the bounding box.
[0045] In this embodiment, since several people may appear in the same frame of a sports video, multiple bounding boxes at different locations are detected in that frame, each corresponding to one moving person. These bounding boxes are aggregated into a single bounding box information set B, recorded as $B = \{b_i \mid i = 1, \dots, N\}$, where N is an integer not less than 1.
[0046] For a single bounding box $b_i$, two corners of the box are recorded. The two corners are diagonally opposite within $b_i$, and the top-left and lower-right corners are chosen. Recording them accurately and converting them into a usable form is necessary for the subsequent algorithm processing.
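As a minimal illustration of this representation, the sketch below stores the set B as a Python list of corner tuples and checks the constraints stated above (N not less than 1, one top-left and one lower-right corner per box). The names `Box`, `validate_boxes`, and `box_area` are hypothetical, the coordinates are made up, and image coordinates with y growing downward are assumed; none of this comes from the patent text.

```python
from typing import List, Tuple

# One bounding box b_i as (x_tl, y_tl, x_br, y_br): top-left and lower-right corners.
# Assumption: image coordinates, so x grows rightward and y grows downward.
Box = Tuple[float, float, float, float]


def validate_boxes(boxes: List[Box]) -> List[Box]:
    """Check that B holds N >= 1 boxes, each with positive width and height."""
    if len(boxes) < 1:
        raise ValueError("B must contain at least one bounding box")
    for x_tl, y_tl, x_br, y_br in boxes:
        if x_br <= x_tl or y_br <= y_tl:
            raise ValueError("lower-right corner must lie below and right of top-left")
    return boxes


def box_area(box: Box) -> float:
    """Area of one box, computed from its two diagonal corners."""
    x_tl, y_tl, x_br, y_br = box
    return (x_br - x_tl) * (y_br - y_tl)


# Hypothetical frame with three detections; the first two nearly coincide.
B = validate_boxes([
    (10.0, 20.0, 50.0, 120.0),
    (12.0, 22.0, 52.0, 118.0),
    (200.0, 40.0, 240.0, 140.0),
])
areas = [box_area(b) for b in B]
```

Two diagonal corners are enough to recover width, height, and area, which is all the later intersection and union computations need.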
[0047] In any of the above embodiments, as shown in Figure 3, the preset method includes a self-generalized intersection-over-union (Self-GIoU) loss function and duplicate detection optimization, wherein the output of the Self-GIoU loss function is used as the input to the duplicate detection optimization, which then yields the duplicate detection loss.
[0048] In this embodiment, in the processing flow of the preset method, the Self-GIoU loss function first receives the bounding box information and converts it into a form on which the loss can be computed, so that the duplicate detection loss can then be calculated.
[0049] In any of the above embodiments, as shown in Figure 3, the Self-GIoU loss function is denoted SGL. Substituting B into SGL yields the Self-GIoU loss, specifically as follows:
[0050] $M = \mathrm{SGL}(B) = 1 - \mathrm{GIoU}(B, B)$, where GIoU is the generalized intersection over union.
[0051] In this embodiment, GIoU (Generalized Intersection over Union) is the generalized intersection-over-union ratio. The self-generalized intersection-over-union ratio loss matrix M is constructed based on B using the self-generalized intersection-over-union ratio loss function Self-GIoU Loss (SGL).
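The construction of M can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the patented implementation: it uses the standard GIoU definition (IoU minus the fraction of the smallest enclosing rectangle C that is not covered by the union), assumes boxes with positive area in (x_tl, y_tl, x_br, y_br) order, and the function names `giou` and `self_giou_loss_matrix` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_tl, y_tl, x_br, y_br), assumed layout


def _area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Standard GIoU of two axis-aligned boxes with positive area."""
    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = _area(a) + _area(b) - inter
    # Smallest rectangle C covering both boxes.
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (c_area - union) / c_area


def self_giou_loss_matrix(boxes: Sequence[Box]) -> List[List[float]]:
    """M = 1 - GIoU(B, B): pairwise loss of the box set against itself."""
    n = len(boxes)
    return [[1.0 - giou(boxes[i], boxes[j]) for j in range(n)] for i in range(n)]


# Hypothetical boxes: two overlapping, one far away.
boxes = [(0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0), (10.0, 10.0, 12.0, 12.0)]
M = self_giou_loss_matrix(boxes)
```

Entries near 0 correspond to pairs of boxes that almost coincide, which is the situation the duplicate detection step looks for; entries near 2 correspond to boxes that are far apart.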
[0052] In any of the above embodiments, as shown in Figure 3, the duplicate detection optimization specifically includes the following steps. First, the elements of GIoU are calculated using the following formula: $\mathrm{GIoU}(b_i, b_j) = \mathrm{IoU}(b_i, b_j) - \dfrac{|C \setminus (b_i \cup b_j)|}{|C|}$. Then, by the commutative law of the intersection and union operations: $M_{ij} = M_{ji}$. Finally, a lower bound LB is set in the duplicate detection optimizer, and all values in M that are less than LB are output as the duplicate detection loss $\mathrm{Loss}_{D^3}$, specifically: $\mathrm{Loss}_{D^3} = \sum_{M_{ij} < LB} M_{ij}$. Here C is the smallest rectangle that covers the region $b_i \cup b_j$, $\setminus$ denotes set subtraction, IoU is the intersection over union, $M_{ij}$ is the element in the i-th row and j-th column of M, M is a symmetric matrix whose main diagonal is 0, and $\sum$ denotes summation.
[0053] In this embodiment, through the three steps above, the Self-GIoU loss is computed and then reduced to the duplicate detection loss. The lower bound LB screens the candidate entries before they are summed into the duplicate detection loss, so that only the intended entries are output. $b_i$ and $b_j$ have the same meaning: both are elements of the set B.
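The screening step can be sketched as follows. The text says that all values of M below LB are output as the duplicate detection loss; the sketch assumes this means a plain sum over every entry of M that passes the test, so each symmetric pair is counted twice and the zero diagonal contributes nothing. That index range, the function names, and the example matrix are assumptions; the LB value 0.011 is one of the settings discussed with Table 1.

```python
from typing import List, Sequence, Tuple


def duplicate_detection_loss(m: Sequence[Sequence[float]], lb: float) -> float:
    """Sum the entries of M that fall below the lower bound LB."""
    return sum(value for row in m for value in row if value < lb)


def flagged_pairs(m: Sequence[Sequence[float]], lb: float) -> List[Tuple[int, int]]:
    """Index pairs (i, j) with i < j whose loss entry is below LB."""
    n = len(m)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if m[i][j] < lb]


# Hypothetical Self-GIoU loss matrix: boxes 0 and 1 almost coincide.
M = [
    [0.0,   0.005, 1.9],
    [0.005, 0.0,   1.2],
    [1.9,   1.2,   0.0],
]
loss_d3 = duplicate_detection_loss(M, lb=0.011)
pairs = flagged_pairs(M, lb=0.011)
```

Because the threshold is so close to 0, only pairs whose boxes are nearly identical are flagged, which matches the observation in the experiments that results change noticeably between LB values of 0.010, 0.011, and 0.012.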
[0054] In any of the above embodiments, as shown in Figure 3, the value range of GIoU is [-1, 1], and the value range of the elements in M is [0, 2].
[0055] In this embodiment, since the elements of the Self-GIoU loss matrix M equal 1 minus the corresponding GIoU values, restricting the value range of GIoU in turn restricts the value range of the elements of M.
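The stated ranges follow from the standard definition of GIoU. As a short worked derivation, write $P_{ij}$ (a symbol introduced here only for illustration) for the penalty term $|C \setminus (b_i \cup b_j)| / |C|$, assuming boxes with positive area:

```latex
\begin{aligned}
0 \le \mathrm{IoU}(b_i, b_j) \le 1, \quad 0 \le P_{ij} < 1
  &\;\Longrightarrow\; -1 < \mathrm{GIoU}(b_i, b_j) = \mathrm{IoU}(b_i, b_j) - P_{ij} \le 1, \\
M_{ij} = 1 - \mathrm{GIoU}(b_i, b_j)
  &\;\Longrightarrow\; 0 \le M_{ij} < 2, \\
b_i = b_j
  &\;\Longrightarrow\; \mathrm{GIoU}(b_i, b_j) = 1 \;\Longrightarrow\; M_{ii} = 0 .
\end{aligned}
```

The closed intervals [-1, 1] and [0, 2] are therefore valid bounds; the value -1 for GIoU (and 2 for $M_{ij}$) is only approached in the limit of boxes that are very far apart relative to their size.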
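[0056] In any of the above embodiments, as shown in Figure 3, in S4 the merged loss is given by the following formula: $\mathrm{Loss}_{\text{Total-Boxes}} = \mathrm{Loss}_{\text{Origin-Boxes}} + \mathrm{Loss}_{D^3}$, where $\mathrm{Loss}_{\text{Origin-Boxes}}$ is the original detection loss, $\mathrm{Loss}_{D^3}$ is the duplicate detection loss, and $\mathrm{Loss}_{\text{Total-Boxes}}$ is the total detection loss.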
[0057] In this embodiment, the original detection loss and the duplicate detection loss can be added directly to obtain the total detection loss. The total detection loss carries more information than the original detection loss produced by the self-detection of the original network model, which helps the network model update and optimize quickly. On the one hand, it enables detection of fast-moving, highly similar people in sports videos; on the other hand, it quickly optimizes the detection of people in ordinary videos, eliminating duplicate detection in subsequent detections.
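Steps S1 to S4 can be put together in a framework-agnostic sketch. Everything here is a stand-in: `detect`, `original_loss`, `duplicate_loss`, and `apply_update` are hypothetical callables, the losses are plain floats, and the stub values are made up. In a real training setup the losses would be tensors in an automatic differentiation framework and `apply_update` would run backpropagation and an optimizer step; S5 then corresponds to calling the updated detector on the same image again.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_tl, y_tl, x_br, y_br), assumed layout


def training_step(
    detect: Callable[[object], List[Box]],
    original_loss: Callable[[List[Box]], float],
    duplicate_loss: Callable[[List[Box]], float],
    apply_update: Callable[[float], None],
    image: object,
) -> Tuple[float, float, float]:
    """One pass through S1-S4; returns (original, duplicate, total) losses."""
    boxes = detect(image)                 # S1: preliminary detection results
    loss_origin = original_loss(boxes)    # S2: the model's own detection loss
    loss_d3 = duplicate_loss(boxes)       # S3: duplicate detection loss
    loss_total = loss_origin + loss_d3    # S4: losses are added directly
    apply_update(loss_total)              # S4: stand-in for backpropagation
    return loss_origin, loss_d3, loss_total


# Stub components with made-up values, only to show the data flow.
updates: List[float] = []
result = training_step(
    detect=lambda image: [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)],
    original_loss=lambda boxes: 1.5,
    duplicate_loss=lambda boxes: 0.25,
    apply_update=updates.append,
    image=None,
)
```

Keeping the duplicate detection term as a separate callable mirrors the description, in which the optimizer is attached to an existing network model without changing its original loss.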
[0058] Comparative Example 1
[0059] An ablation experiment on the repeated detection optimizer is conducted to verify the effectiveness of the present invention, using several of the established evaluation metrics for the multi-object tracking (Multiple Object Tracking) task. The metrics are as follows:
[0060] IDF1: Identification F-Score refers to the F-value identified by the target ID in each target bounding box. The ID is the identification number assigned to each target during tracking, and the same applies below.
[0061] MT: Mostly Tracked trajectories, which is the number of true trajectories that are successfully tracked in more than 80% of the total frames.
[0062] FP: False Positive refers to a negative sample that is predicted as positive by the model, also known as a false alarm.
[0063] FN: False Negative refers to a positive sample that is predicted as negative by the model, also known as a missed detection.
[0064] IDS: ID switches refers to the number of times the target ID changes instantaneously in the tracking trajectory, which usually reflects the stability of the tracking.
[0065] MOTA: Multiple Object Tracking Accuracy is an indicator that comprehensively measures the accuracy of multi-object tracking by a single camera based on FP, FN, and IDS, and is usually the main indicator used for measurement.
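For orientation, two of these metrics can be computed from raw counts as sketched below. The text does not spell out the formulas, so the sketch assumes the commonly used definitions: MOTA = 1 - (FN + FP + IDS) / GT, where GT is the total number of ground-truth objects over all frames, and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function names and all counts are hypothetical and are not taken from Table 1 or Table 2.

```python
def mota(fn: int, fp: int, ids: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy from error counts (assumed standard form)."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + ids) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identification F-score from identity-level true/false positives and negatives."""
    denominator = 2 * idtp + idfp + idfn
    return 2 * idtp / denominator if denominator else 0.0


# Made-up counts for a short sequence.
mota_value = mota(fn=100, fp=50, ids=10, num_gt=1000)
idf1_value = idf1(idtp=80, idfp=10, idfn=30)
```

Because FP, FN, and IDS all enter the numerator, removing duplicate detections (which count as false positives) raises MOTA directly.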
[0066] Table 1 Comparison of various metrics in the MOT-Rally dataset
[0067]
[0068] Table 1 compares the metrics on the MOT-Rally dataset for the benchmark method TransTrack and for the method using the duplicate detection optimizer (D³), under various settings of the duplicate detection optimization lower bound (LB) and of the number of training epochs. The best results are indicated in bold. The table shows that, after 40 training epochs, the model using D³ with LB set to 0.011 (sixth row) improves on the baseline model without D³ trained for the same number of epochs (fifth row) by 6.9% in MOTA, 0.9% in IDF1, and 10% in MT, with significant decreases in FP and FN. An LB of 0.011 significantly outperforms values of 0.010 and 0.012, indicating that the identification of duplicate detections is sensitive to LB. Excessive training (150 epochs) can lead to slight overfitting and a slight decrease in the metrics.
[0069] Figure 4 shows the relationship between training loss and the number of training epochs. The training loss of the method using the D³ proposed in this invention is significantly lower than that of the benchmark method, which indicates that the present invention can improve tracking quality.
[0070] Figure 5 shows the results of this method on a volleyball video. The left image shows the baseline method, which exhibits duplicate detection and target loss. This may occur because the model treats the repeatedly detected target as an occluded target rather than a missing one. The right image shows the method using D³: the duplicate detection has disappeared and the lost target has been recovered.
[0071] Example 1
[0072] This invention can also be applied to pedestrian tracking videos. Table 2 compares the metrics of this invention on the MOT17 dataset, with the best results in bold and the second-best results underlined. Table 2 shows that this invention accelerates convergence on pedestrian tracking videos: when the baseline method TransTrack is equipped with D³ and LB is set to 0.010, training for only 20 epochs comes close to the benchmark trained for 40 or even 150 epochs. This experiment shows that the present invention can significantly reduce model training time while maintaining tracking quality.
[0073] Table 2 Comparison of various indicators in the MOT17 dataset
[0074]
[0075] In the description of this invention, it should be understood that the terms "longitudinal", "lateral", "up", "down", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate the orientation or positional relationship based on the orientation or positional relationship shown in the accompanying drawings, and are only for the convenience of describing this invention, and are not intended to indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of this invention.
[0076] The embodiments described above are merely preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Various modifications and improvements made by those skilled in the art to the technical solutions of the present invention without departing from the spirit of the present invention should fall within the protection scope defined by the claims of the present invention.
Claims
1. A multi-target tracking method in sports videos based on repeat detection optimization, characterized in that it includes the following steps: S1, inputting a frame image from a training round into the network model to obtain preliminary detection results; S2, subjecting the bounding box information set of the detection results to self-detection by the network model to obtain the original detection loss; S3, applying a preset method to the bounding box information set of the detection results to obtain the duplicate detection loss; S4, combining the duplicate detection loss and the original detection loss into a total detection loss, and backpropagating it through the network model for weight optimization to obtain a new network model; S5, inputting the image into the new network model again to obtain a detection result without duplicate detection; wherein the preset method transforms the bounding box information and computes on it through a repeat detection optimizer, and sets a detection lower bound to filter out the duplicate detection loss; the bounding box information is obtained by the network model computing on a frame image and outputting the result, specifically: $B = \{b_i \mid i = 1, \dots, N\}$, where $b_i$ is a single bounding box detected in the image, specifically $b_i = (x_i^{tl}, y_i^{tl}, x_i^{br}, y_i^{br})$, in which $x_i^{tl}$ and $y_i^{tl}$ are the coordinates of the top-left corner of the bounding box, $x_i^{br}$ and $y_i^{br}$ are the coordinates of the lower-right corner of the bounding box, and $N$ is the number of bounding boxes detected in the current frame; the preset method includes a self-generalized intersection-over-union loss function and duplicate detection optimization; the result of the self-generalized intersection-over-union loss function is used as the input to the duplicate detection optimization, which then yields the duplicate detection loss; the self-generalized intersection-over-union loss function is $\mathrm{SGL}$, and substituting $B$ into $\mathrm{SGL}$ yields the self-generalized intersection-over-union loss matrix, specifically: $M = \mathrm{SGL}(B) = 1 - \mathrm{GIoU}(B, B)$, where $\mathrm{GIoU}$ is the generalized intersection over union; the duplicate detection optimization specifically includes the following steps: the elements of $\mathrm{GIoU}$ are calculated using the following formula: $\mathrm{GIoU}(b_i, b_j) = \mathrm{IoU}(b_i, b_j) - \dfrac{|C \setminus (b_i \cup b_j)|}{|C|}$; then, by the commutative law of the intersection and union operations: $M_{ij} = M_{ji}$; a lower bound LB for duplicate detection optimization is set in the repeat detection optimizer, and all values in $M$ that are less than LB are output as the duplicate detection loss $\mathrm{Loss}_{D^3}$, where $D^3$ is an abbreviation for Duplicate Detection Decontaminator, specifically: $\mathrm{Loss}_{D^3} = \sum_{M_{ij} < LB} M_{ij}$; where $C$ is the smallest rectangle that covers the region $b_i \cup b_j$, $\setminus$ denotes set subtraction, $\mathrm{IoU}$ is the intersection over union, $M_{ij}$ is the element in the $i$-th row and $j$-th column of $M$, $M$ is a symmetric matrix whose main diagonal is 0, and $\sum$ denotes summation.
2. The multi-target tracking method in sports videos based on repeat detection optimization according to claim 1, characterized in that the value range of $\mathrm{GIoU}$ is $[-1, 1]$, and the value range of the elements of $M$ is $[0, 2]$.
3. The multi-target tracking method in sports videos based on repeat detection optimization according to claim 1, characterized in that, in S4, the merged result is given by the following formula: $\mathrm{Loss}_{\text{Total-Boxes}} = \mathrm{Loss}_{\text{Origin-Boxes}} + \mathrm{Loss}_{D^3}$, where $\mathrm{Loss}_{\text{Origin-Boxes}}$ is the original detection loss and $\mathrm{Loss}_{\text{Total-Boxes}}$ is the total detection loss.